Public functions
Documentation for JWAS.jl's public interface. Below are functions available to general users.
Index
JWAS.GWAS
JWAS.add_genotypes
JWAS.build_model
JWAS.outputEBV
JWAS.outputMCMCsamples
JWAS.runMCMC
JWAS.set_covariate
JWAS.set_random
JWAS.showMME
JWAS.solve
Public Interface
JWAS.build_model
— Function.build_model(model_equations::AbstractString,R=false; df::Float64=4.0)
- Build a model from model equations with the residual variance R. In Bayesian analysis, R is the mean for the prior assigned for the residual variance with degree of freedom df defaulting to 4.0. If R is not provided, a value is calculate from responses (phenotypes).
- By default, all variabels in modelequations are factors (categorical) and fixed. Set variables to be covariates (continuous) or random using functions `setcovariate()
or
set_random()`.
#single-trait
model_equations = "BW = intercept + age + sex"
R = 6.72
models = build_model(model_equations,R);
#multi-trait
model_equations = "BW = intercept + age + sex
CW = intercept + litter";
R = [6.72 24.84
24.84 708.41]
models = build_model(model_equations,R);
JWAS.set_covariate
— Function.set_covariate(model::MME,variables::AbstractString...)
- set variables as covariates; model is the output of function
build_model()
.
#After running build_model, variabels age and year can be set to be covariates as
set_covariate(model,"age","year")
#or
set_covariate(model,"age year")
JWAS.set_random
— Function.set_random(mme::MME,randomStr::AbstractString,ped::Pedigree, G;df=4)
- set variables as random polygenic effects with pedigree information ped and variances G with degree of freedom df, defaulting to
4.0
.
#single-trait (example 1)
model_equation = "y = intercept + age + animal"
model = build_model(model_equation,R)
ped = get_pedigree(pedfile)
G = 1.6
set_random(model,"animal", ped, G)
#single-trait (example 2)
model_equation = "y = intercept + age + animal + animal*age"
model = build_model(model_equation,R)
ped = get_pedigree(pedfile)
G = [1.6 0.2
0.2 1.0]
set_random(model,"animal animal*age", ped,G)
#multi-trait
model_equations = "BW = intercept + age + sex + animal
CW = intercept + age + sex + animal"
model = build_model(model_equations,R);
ped = get_pedigree(pedfile);
G = [6.72 2.84
2.84 8.41]
set_random(model,"animal",ped,G)
set_random(mme::MME,randomStr::AbstractString,G;Vinv=0,names=[],df=4)
- set variables as random effects, defaulting to i.i.d effects, with variances G with degree of freedom df, defaulting to 4.0.
- the random effects are assumed to be i.i.d by default and it can be defined with any (inverse of) covariance structure Vinv with its index (row names) provided by names.
#single-trait (i.i.d randome effects)
model_equation = "y = intercept + litter + sex"
model = build_model(model_equation,R)
G = 0.6
set_random(model,"litter",G)
#multi-trait (i.i.d randome effects)
model_equations = "BW = intercept + litter + sex
CW = intercept + litter + sex"
model = build_model(model_equations,R);
G = [3.72 1.84
1.84 3.41]
set_random(model,"litter",G)
#single-trait (randome effects with specific covariance structures)
model_equation = "y = intercept + litter + sex"
model = build_model(model_equation,R)
V = [1.0 0.5 0.25
0.5 1.0 0.5
0.25 0.5 1.0]
G = 0.6
set_random(model,"litter",G,Vinv=inv(V),names=[a1;a2;a3])
JWAS.PedModule.get_pedigree
— Function.get_pedigree(pedfile::AbstractString;header=false,separator=',')
- Get pedigree informtion from a pedigree file with header (defaulting to
false
) and separator (defaulting to,
). - Pedigree file format:
a,0,0
c,a,b
d,a,c
JWAS.add_genotypes
— Function.add_genotypes(mme::MME,M,G;header=false,center=true,rowID=false,G_is_marker_variance=false,df=4)
- Get marker informtion from an nxp Matrix M of genotypes (Array or DataFrame), where n is the number of individuals and p is the number of markers. This matrix needs to be column-wise sorted by marker positions.
- rowID is a vector of individual IDs, e.g.,rowID=["a1","b2","c1"]; if it is omitted, IDs will be set to 1:n
- header is a header vector such as ["id"; "mrk1"; "mrk2";...;"mrkp"]. If omitted, marker names will be set to 1:p
- G defaults to the genetic variance with degree of freedom df=4.0.
add_genotypes(mme::MME,file,G;separator=' ',header=true,center=true,G_is_marker_variance=false,df=4.0)
- Get marker informtion from a genotype file. This file needs to be column-wise sorted by marker positions.
- G defaults to the genetic variance with degree of freedom df=4.0.
- File format:
Animal,marker1,marker2,marker3,marker4,marker5
S1,1,0,1,1,1
D1,2,0,2,2,1
O1,1,2,0,1,0
O3,0,0,2,1,1
JWAS.runMCMC
— Function.runMCMC(model::MME,df::DataFrame;
### MCMC
chain_length = 1000,
starting_value = false,
burnin = 0,
output_samples_frequency = chain_length/1000,
output_samples_file = "MCMC_samples",
update_priors_frequency = 0,
### Methods
methods = "conventional (no markers)",
estimate_variance = true,
Pi = 0.0,
estimatePi = false,
estimateScale = false,
single_step_analysis = false,
pedigree = false,
categorical_trait = false,
missing_phenotypes = true,
constraint = false,
causal_structure = false,
### Genomic Prediction
outputEBV = true,
output_heritability = true,
### MISC
seed = false,
printout_model_info = true,
printout_frequency = 0)
Run MCMC for Bayesian Linear Mixed Models with or without estimation of variance components.
- Markov chain Monte Carlo
- The first
burnin
iterations are discarded at the beginning of a MCMC chain of lengthchain_length
. - The
starting_value
can be provided as a vector for all location parameteres and marker effects, defaulting to0.0
s. - Save MCMC samples every
output_samples_frequency
iterations, defaulting tochain_length/1000
, to filesoutput_samples_file
, defaulting toMCMC_samples.txt
. MCMC samples for hyperparametes (variance componets) and marker effects are saved by default. MCMC samples for location parametes can be saved usingoutput_MCMC_samples()
. Note that saving MCMC samples too frequently slows down the computation. - Miscellaneous Options
- Priors are updated every
update_priors_frequency
iterations, defaulting to0
.
- Priors are updated every
- The first
- Methods
- Available
methods
include "conventional (no markers)", "RR-BLUP", "BayesB", "BayesC", "Bayesian Lasso", and "GBLUP". - Single step analysis is allowed if
single_step_analysis
=true
andpedigree
is provided. - In Bayesian variable selection methods,
Pi
for single-trait analyses is a number;Pi
for multi-trait analyses is a dictionary such asPi=Dict([1.0; 1.0]=>0.7,[1.0; 0.0]=>0.1,[0.0; 1.0]=>0.1,[0.0; 0.0]=>0.1)
, defaulting toall markers have effects (Pi = 0.0)
in single-trait analysis andall markers have effects on all traits (Pi=Dict([1.0; 1.0]=>1.0,[0.0; 0.0]=>0.0))
in multi-trait analysis.Pi
is estimated ifestimatePi
= true, , defaulting tofalse
. - Variance components are estimated if
estimate_variance
=true, defaulting totrue
. - Scale parameter for prior of marker effect variance is estimated if
estimateScale
= true, defaulting tofalse
. - Miscellaneous Options
- Missing phenotypes are allowed in multi-trait analysis with
missing_phenotypes
=true, defaulting totrue
. - Catogorical Traits are allowed if
categorical_trait
=true, defaulting tofalse
. - If
constraint
=true, defaulting tofalse
, constrain residual covariances between traits to be zeros. - If
causal_structure
is provided, e.g., causal_structure = [0.0,0.0,0.0;1.0,0.0,0.0;1.0,0.0,0.0] for trait 2 -> trait 1 and trait 3 -> trait 1, phenotypic causal networks will be incorporated using structure equation models.
- Missing phenotypes are allowed in multi-trait analysis with
- Available
- Genomic Prediction
- Individual estimted breeding values (EBVs) and prediction error variances (PEVs) are returned if
outputEBV
=true, defaulting totrue
. Heritability and genetic
output_heritability
=true
, defaulting totrue
. Note that estimation of heritability is computaionally intensive. - Individual estimted breeding values (EBVs) and prediction error variances (PEVs) are returned if
- Miscellaneous Options
- Print out the model information in REPL if
printout_model_info
=true; print out the monte carlo mean in REPL withprintout_frequency
, defaulting tofalse
. - If
seed
, defaulting tofalse
, is provided, a reproducible sequence of numbers will be generated for random number generation.
- Print out the model information in REPL if
JWAS.outputEBV
— Function.outputEBV(model,IDs::Array;PEV=false)
Output estimated breeding values and prediction error variances (defaulting to false) for IDs.
JWAS.outputMCMCsamples
— Function.outputMCMCsamples(mme::MME,trmStr::AbstractString...)
Get MCMC samples for specific location parameters.
JWAS.showMME
— Function.showMME(mme::MME,df::DataFrame)
- Show left-hand side and right-hand side of mixed model equations (no markers).
JWAS.solve
— Function.solve(mme::MME,df::DataFrame;solver="default",printout_frequency=100,tolerance = 0.000001,maxiter = 5000)
- Solve the mixed model equations (no marker information) without estimating variance components.
Available solvers include default
, Jacobi
, GaussSeidel
, and Gibbs sampler
.
GWAS
get_additive_genetic_variances
get_heritability
get_correlations
get_breeding_values
QC