Public functions
Documentation for JWAS.jl's public interface. Below are functions available to general users.
Index
JWAS.GWAS
JWAS.build_model
JWAS.get_genotypes
JWAS.outputEBV
JWAS.outputMCMCsamples
JWAS.runMCMC
JWAS.set_covariate
JWAS.set_random
JWAS.showMME
JWAS.solve
Public Interface
JWAS.build_model
— Functionbuild_model(model_equations::AbstractString,R=false; df::AbstractFloat=4.0)
- Build a model from model equations with the residual variance R. In Bayesian analysis, R is the mean for the prior assigned for the residual variance with degree of freedom df, defaulting to 4.0. If R is not provided, a value is calculated from responses (phenotypes).
- By default, all variabels in modelequations are factors (categorical) and fixed. Set variables to be covariates (continuous) or random using functions `setcovariate()
or
set_random()`.
#single-trait
model_equations = "BW = intercept + age + sex"
R = 6.72
models = build_model(model_equations,R);
#multi-trait
model_equations = "BW = intercept + age + sex
CW = intercept + litter";
R = [6.72 24.84
24.84 708.41]
models = build_model(model_equations,R);
JWAS.PedModule.get_pedigree
— Functionget_pedigree(pedfile::AbstractString;header=false,separator=',')
- Get pedigree informtion from a pedigree file with header (defaulting to
false
) and separator (defaulting to,
). - Pedigree file format:
a,0,0
c,a,b
d,a,c
JWAS.get_genotypes
— Functionget_genotypes(file::Union{AbstractString,Array{Float64,2},Array{Float32,2},Array{Any,2},DataFrames.DataFrame},G=false;
separator=',',header=true,rowID=false,
center=true,quality_control=false,
method = "RR-BLUP",Pi = 0.0,estimatePi = true,estimateScale=false,
G_is_marker_variance = false,df = 4.0)
- Get marker informtion from a genotype file/matrix. This file needs to be column-wise sorted by marker positions.
- If a text file is provided, the file format should be:
Animal,marker1,marker2,marker3,marker4,marker5 S1,1,0,1,1,1 D1,2,0,2,2,1 O1,1,2,0,1,0 O3,0,0,2,1,1
- If an nxp Matrix of genotypes (Array or DataFrame) is provided, where n is the number of individuals and p is the number of markers,
- This matrix needs to be column-wise sorted by marker positions.
- rowID is a vector of individual IDs, e.g.,rowID=["a1","b2","c1"]; if it is omitted, IDs will be set to 1:n
- header is a header vector such as ["id"; "mrk1"; "mrk2";...;"mrkp"]. If omitted, marker names will be set to 1:p
- If a text file is provided, the file format should be:
- If
quality_control
=true,- Missing genotypes should be denoted as
9
, and will be replace by column means. Users can also impute missing genotypes before the analysis. - Minor allele frequency
MAF
threshold, defaulting to0.01
, is uesd, and fixed loci are removed.
- Missing genotypes should be denoted as
- G is the mean for the prior assigned for the genomic variance with degree of freedom df, defaulting to 4.0. If G is not provided, a value is calculated from responses (phenotypes).
- Available
methods
include "conventional (no markers)", "RR-BLUP", "BayesA", "BayesB", "BayesC", "Bayesian Lasso", and "GBLUP". - In Bayesian variable selection methods,
Pi
for single-trait analyses is a number;Pi
for multi-trait analyses is a dictionary such asPi=Dict([1.0; 1.0]=>0.7,[1.0; 0.0]=>0.1,[0.0; 1.0]=>0.1,[0.0; 0.0]=>0.1)
, defaulting toall markers have effects (Pi = 0.0)
in single-trait analysis andall markers have effects on all traits (Pi=Dict([1.0; 1.0]=>1.0,[0.0; 0.0]=>0.0))
in multi-trait analysis.Pi
is estimated ifestimatePi
= true, , defaulting tofalse
. - Scale parameter for prior of marker effect variance is estimated if
estimateScale
= true, defaulting tofalse
.
JWAS.set_covariate
— Functionset_covariate(model::MME,variables::AbstractString...)
- set variables as covariates; model is the output of function
build_model()
.
#After running build_model, variabels age and year can be set to be covariates as
set_covariate(model,"age","year")
#or
set_covariate(model,"age year")
JWAS.set_random
— Functionset_random(mme::MME,randomStr::AbstractString,ped::Pedigree, G;df=4)
- set variables as random polygenic effects with pedigree information ped. and variances G.
- G is the mean for the prior assigned for the variance with degree of freedom df, defaulting to 4.0. If G is not provided, a value is calculated from responses (phenotypes).
#single-trait (example 1)
model_equation = "y = intercept + age + animal"
model = build_model(model_equation,R)
ped = get_pedigree(pedfile)
G = 1.6
set_random(model,"animal", ped, G)
#single-trait (example 2)
model_equation = "y = intercept + age + animal + animal*age"
model = build_model(model_equation,R)
ped = get_pedigree(pedfile)
G = [1.6 0.2
0.2 1.0]
set_random(model,"animal animal*age", ped,G)
#multi-trait
model_equations = "BW = intercept + age + sex + animal
CW = intercept + age + sex + animal"
model = build_model(model_equations,R);
ped = get_pedigree(pedfile);
G = [6.72 2.84
2.84 8.41]
set_random(model,"animal",ped,G)
set_random(mme::MME,randomStr::AbstractString,G;Vinv=0,names=[],df=4)
- set variables as random effects, defaulting to i.i.d effects, with variances G.
- G is the mean for the prior assigned for the variance with degree of freedom df, defaulting to 4.0. If G is not provided, a value is calculated from responses (phenotypes).
- the random effects are assumed to be i.i.d by default and it can be defined with any (inverse of) covariance structure Vinv with its index (row names) provided by names.
#single-trait (i.i.d randome effects)
model_equation = "y = intercept + litter + sex"
model = build_model(model_equation,R)
G = 0.6
set_random(model,"litter",G)
#multi-trait (i.i.d randome effects)
model_equations = "BW = intercept + litter + sex
CW = intercept + litter + sex"
model = build_model(model_equations,R);
G = [3.72 1.84
1.84 3.41]
set_random(model,"litter",G)
#single-trait (randome effects with specific covariance structures)
model_equation = "y = intercept + litter + sex"
model = build_model(model_equation,R)
V = [1.0 0.5 0.25
0.5 1.0 0.5
0.25 0.5 1.0]
G = 0.6
set_random(model,"litter",G,Vinv=inv(V),names=[a1;a2;a3])
JWAS.runMCMC
— FunctionrunMCMC(model::MME,df::DataFrame;
weight = false,
### MCMC
chain_length = 1000,
starting_value = false,
burnin = 0,
output_samples_frequency = chain_length/1000,
output_folder = results,
update_priors_frequency = 0,
### Methods
estimate_variance = true,
estimateScale = false,
single_step_analysis = false,
pedigree = false,
categorical_trait = false,
missing_phenotypes = true,
constraint = false,
causal_structure = false,
### Genomic Prediction
outputEBV = true,
output_heritability = true,
### MISC
seed = false,
printout_model_info = true,
printout_frequency = 0)
Run MCMC for Bayesian Linear Mixed Models with or without estimation of variance components.
- Markov chain Monte Carlo
- The first
burnin
iterations are discarded at the beginning of a MCMC chain of lengthchain_length
. - Save MCMC samples every
output_samples_frequency
iterations, defaulting tochain_length/1000
, to a folderoutput_folder
, defaulting toresults
. MCMC samples for hyperparametes (variance componets) and marker effects are saved by default. MCMC samples for location parametes can be saved usingoutput_MCMC_samples()
. Note that saving MCMC samples too frequently slows down the computation. - The
starting_value
can be provided as a vector for all location parameteres and marker effects, defaulting to0.0
s. The order of starting values for location parameters and marker effects should be the order of location parameters in the Mixed Model Equation for all traits (This can be obtained by getNames(model)) and then markers for all traits (all markers for trait 1 then all markers for trait 2...). - Miscellaneous Options
- Priors are updated every
update_priors_frequency
iterations, defaulting to0
.
- Priors are updated every
- The first
- Methods
- Single step analysis is allowed if
single_step_analysis
=true
andpedigree
is provided. - Variance components are estimated if
estimate_variance
=true, defaulting totrue
. - Miscellaneous Options
- Missing phenotypes are allowed in multi-trait analysis with
missing_phenotypes
=true, defaulting totrue
. - Catogorical Traits are allowed if
categorical_trait
=true, defaulting tofalse
. Phenotypes should be coded as 1,2,3... - Censored traits are allowed if the upper bounds are provided in
censored_trait
as an array, and lower bounds are provided as phenotypes. - If
constraint
=true, defaulting tofalse
, constrain residual covariances between traits to be zeros. - If
causal_structure
is provided, e.g., causal_structure = [0.0,0.0,0.0;1.0,0.0,0.0;1.0,0.0,0.0] for trait 2 -> trait 1 and trait 3 -> trait 1, phenotypic causal networks will be incorporated using structure equation models.
- Missing phenotypes are allowed in multi-trait analysis with
- Single step analysis is allowed if
- Genomic Prediction
- Individual estimted breeding values (EBVs) and prediction error variances (PEVs) are returned if
outputEBV
=true, defaulting totrue
. Heritability and genetic
output_heritability
=true
, defaulting totrue
. Note that estimation of heritability is computaionally intensive. - Individual estimted breeding values (EBVs) and prediction error variances (PEVs) are returned if
- Miscellaneous Options
- Print out the model information in REPL if
printout_model_info
=true; print out the monte carlo mean in REPL withprintout_frequency
, defaulting tofalse
. - If
seed
, defaulting tofalse
, is provided, a reproducible sequence of numbers will be generated for random number generation.
- Print out the model information in REPL if
JWAS.outputEBV
— FunctionoutputEBV(model,IDs::Array)
Output estimated breeding values and prediction error variances for IDs.
JWAS.outputMCMCsamples
— FunctionoutputMCMCsamples(mme::MME,trmStrs::AbstractString...)
Get MCMC samples for specific location parameters.
JWAS.showMME
— FunctionshowMME(mme::MME,df::DataFrame)
- Show left-hand side and right-hand side of mixed model equations (no markers).
JWAS.solve
— Functionsolve(mme::MME,df::DataFrame;solver="default",printout_frequency=100,tolerance = 0.000001,maxiter = 5000)
- Solve the mixed model equations (no marker information) without estimating variance components.
Available solvers include default
, Jacobi
, Gauss-Seidel
, and Gibbs sampler
.
JWAS.GWAS
— FunctionGWAS(marker_effects_file,model;header=true,window_size=100,threshold=0.001)
run genomic window-based GWAS without a map file
- MCMC samples of marker effects are stored in markereffectsfile with delimiter ','.
- window_size is either a constant (identical number of markers in each window) or an array of number of markers in each window
- model is either the model::MME used in analysis or the genotypic covariate matrix M::Array
- File format:
GWAS(marker_effects_file;header=true)
Compute the model frequency for each marker (the probability the marker is included in the model) using samples of marker effects stored in markereffectsfile.
GWAS(model,map_file,marker_effects_file...;
window_size = "1 Mb",sliding_window = false,
GWAS = true, threshold = 0.001,
genetic_correlation = false,
header = true)
run genomic window-based GWAS
- MCMC samples of marker effects are stored in markereffectsfile with delimiter ','.
- model is either the model::MME used in analysis or the genotype cavariate matrix M::Array
- map_file has the (sorted) marker position information with delimiter ','. If the map file is not provided, a fake map file will be generated with 100 markers in each 1 Mb window.
- If two markereffectsfile are provided, and genetic_correlation = true, genomic correlation for each window is calculated.
- Statistics are computed for nonoverlapping windows of size window_size by default. If sliding_window = true, those for overlapping sliding windows are calculated.
- map file format:
markerID,chromosome,position
m1,1,16977
m2,1,434311
m3,1,1025513
m4,2,70350
m5,2,101135