Public

Public functions

Documentation for JWAS.jl's public interface. Below are functions available to general users.

Index

Public Interface

JWAS.build_modelFunction.
build_model(model_equations::AbstractString,R=false; df::Float64=4.0)
  • Build a model from model equations with the residual variance R. In Bayesian analysis, R is the mean for the prior assigned for the residual variance with degree of freedom df defaulting to 4.0. If R is not provided, a value is calculate from responses (phenotypes).
  • By default, all variabels in modelequations are factors (categorical) and fixed. Set variables to be covariates (continuous) or random using functions `setcovariate()orset_random()`.
#single-trait
model_equations = "BW = intercept + age + sex"
R               = 6.72
models          = build_model(model_equations,R);

#multi-trait
model_equations = "BW = intercept + age + sex
                   CW = intercept + litter";
R               = [6.72   24.84
                   24.84  708.41]
models          = build_model(model_equations,R);
source
JWAS.set_covariateFunction.
set_covariate(model::MME,variables::AbstractString...)
  • set variables as covariates; model is the output of function build_model().
#After running build_model, variabels age and year can be set to be covariates as
set_covariate(model,"age","year")
#or
set_covariate(model,"age year")
source
JWAS.set_randomFunction.
set_random(mme::MME,randomStr::AbstractString,ped::Pedigree, G;df=4)
  • set variables as random polygenic effects with pedigree information ped and variances G with degree of freedom df, defaulting to 4.0.
#single-trait (example 1)
model_equation  = "y = intercept + age + animal"
model           = build_model(model_equation,R)
ped             = get_pedigree(pedfile)
G               = 1.6
set_random(model,"animal", ped, G)

#single-trait (example 2)
model_equation  = "y = intercept + age + animal + animal*age"
model           = build_model(model_equation,R)
ped             = get_pedigree(pedfile)
G               = [1.6   0.2
                   0.2  1.0]
set_random(model,"animal animal*age", ped,G)

#multi-trait
model_equations = "BW = intercept + age + sex + animal
                   CW = intercept + age + sex + animal"
model           = build_model(model_equations,R);
ped             = get_pedigree(pedfile);
G               = [6.72   2.84
                   2.84  8.41]
set_random(model,"animal",ped,G)
source
set_random(mme::MME,randomStr::AbstractString,G;Vinv=0,names=[],df=4)
  • set variables as random effects, defaulting to i.i.d effects, with variances G with degree of freedom df, defaulting to 4.0.
  • the random effects are assumed to be i.i.d by default and it can be defined with any (inverse of) covariance structure Vinv with its index (row names) provided by names.
#single-trait (i.i.d randome effects)
model_equation  = "y = intercept + litter + sex"
model           = build_model(model_equation,R)
G               = 0.6
set_random(model,"litter",G)

#multi-trait (i.i.d randome effects)
model_equations = "BW = intercept + litter + sex
                   CW = intercept + litter + sex"
model           = build_model(model_equations,R);
G               = [3.72  1.84
                   1.84  3.41]
set_random(model,"litter",G)

#single-trait (randome effects with specific covariance structures)
model_equation  = "y = intercept + litter + sex"
model           = build_model(model_equation,R)
V               = [1.0  0.5 0.25
                   0.5  1.0 0.5
                   0.25 0.5 1.0]
G               = 0.6
set_random(model,"litter",G,Vinv=inv(V),names=[a1;a2;a3])
source
get_pedigree(pedfile::AbstractString;header=false,separator=',')
  • Get pedigree informtion from a pedigree file with header (defaulting to false) and separator (defaulting to ,).
  • Pedigree file format:
a,0,0
c,a,b
d,a,c
source
JWAS.add_genotypesFunction.
add_genotypes(mme::MME,M,G;header=false,center=true,rowID=false,G_is_marker_variance=false,df=4)
  • Get marker informtion from an nxp Matrix M of genotypes (Array or DataFrame), where n is the number of individuals and p is the number of markers. This matrix needs to be column-wise sorted by marker positions.
  • rowID is a vector of individual IDs, e.g.,rowID=["a1","b2","c1"]; if it is omitted, IDs will be set to 1:n
  • header is a header vector such as ["id"; "mrk1"; "mrk2";...;"mrkp"]. If omitted, marker names will be set to 1:p
  • G defaults to the genetic variance with degree of freedom df=4.0.
source
add_genotypes(mme::MME,file,G;separator=' ',header=true,center=true,G_is_marker_variance=false,df=4.0)
  • Get marker informtion from a genotype file. This file needs to be column-wise sorted by marker positions.
  • G defaults to the genetic variance with degree of freedom df=4.0.
  • File format:
Animal,marker1,marker2,marker3,marker4,marker5
S1,1,0,1,1,1
D1,2,0,2,2,1
O1,1,2,0,1,0
O3,0,0,2,1,1
source
JWAS.runMCMCFunction.
runMCMC(model::MME,df::DataFrame;
        ### MCMC
        chain_length             = 1000,
        starting_value           = false,
        burnin                   = 0,
        output_samples_frequency = chain_length/1000,
        output_samples_file      = "MCMC_samples",
        update_priors_frequency  = 0,
        ### Methods
        methods                  = "conventional (no markers)",
        estimate_variance        = true,
        Pi                       = 0.0,
        estimatePi               = false,
        estimateScale            = false,
        single_step_analysis     = false,
        pedigree                 = false,
        categorical_trait        = false,
        missing_phenotypes       = true,
        constraint               = false,
        causal_structure         = false,
        ### Genomic Prediction
        outputEBV                = true,
        output_heritability      = true,
        ### MISC
        seed                     = false,
        printout_model_info      = true,
        printout_frequency       = 0)

Run MCMC for Bayesian Linear Mixed Models with or without estimation of variance components.

  • Markov chain Monte Carlo
    • The first burnin iterations are discarded at the beginning of a MCMC chain of length chain_length.
    • The starting_value can be provided as a vector for all location parameteres and marker effects, defaulting to 0.0s.
    • Save MCMC samples every output_samples_frequency iterations, defaulting to chain_length/1000, to files output_samples_file, defaulting to MCMC_samples.txt. MCMC samples for hyperparametes (variance componets) and marker effects are saved by default. MCMC samples for location parametes can be saved using output_MCMC_samples(). Note that saving MCMC samples too frequently slows down the computation.
    • Miscellaneous Options
      • Priors are updated every update_priors_frequency iterations, defaulting to 0.
  • Methods
    • Available methods include "conventional (no markers)", "RR-BLUP", "BayesB", "BayesC", "Bayesian Lasso", and "GBLUP".
    • Single step analysis is allowed if single_step_analysis = true and pedigree is provided.
    • In Bayesian variable selection methods, Pi for single-trait analyses is a number; Pi for multi-trait analyses is a dictionary such as Pi=Dict([1.0; 1.0]=>0.7,[1.0; 0.0]=>0.1,[0.0; 1.0]=>0.1,[0.0; 0.0]=>0.1), defaulting to all markers have effects (Pi = 0.0) in single-trait analysis and all markers have effects on all traits (Pi=Dict([1.0; 1.0]=>1.0,[0.0; 0.0]=>0.0)) in multi-trait analysis. Pi is estimated if estimatePi = true, , defaulting to false.
    • Variance components are estimated if estimate_variance=true, defaulting to true.
    • Scale parameter for prior of marker effect variance is estimated if estimateScale = true, defaulting to false.
    • Miscellaneous Options
      • Missing phenotypes are allowed in multi-trait analysis with missing_phenotypes=true, defaulting to true.
      • Catogorical Traits are allowed if categorical_trait=true, defaulting to false.
      • If constraint=true, defaulting to false, constrain residual covariances between traits to be zeros.
      • If causal_structure is provided, e.g., causal_structure = [0.0,0.0,0.0;1.0,0.0,0.0;1.0,0.0,0.0] for trait 2 -> trait 1 and trait 3 -> trait 1, phenotypic causal networks will be incorporated using structure equation models.
  • Genomic Prediction
    • Individual estimted breeding values (EBVs) and prediction error variances (PEVs) are returned if outputEBV=true, defaulting to true. Heritability and genetic
    variances are returned if output_heritability=true, defaulting to true. Note that estimation of heritability is computaionally intensive.
  • Miscellaneous Options
    • Print out the model information in REPL if printout_model_info=true; print out the monte carlo mean in REPL with printout_frequency, defaulting to false.
    • If seed, defaulting to false, is provided, a reproducible sequence of numbers will be generated for random number generation.
source
JWAS.outputEBVFunction.
outputEBV(model,IDs::Array;PEV=false)

Output estimated breeding values and prediction error variances (defaulting to false) for IDs.

source
outputMCMCsamples(mme::MME,trmStr::AbstractString...)

Get MCMC samples for specific location parameters.

source
JWAS.showMMEFunction.
showMME(mme::MME,df::DataFrame)
  • Show left-hand side and right-hand side of mixed model equations (no markers).
source
JWAS.solveFunction.
solve(mme::MME,df::DataFrame;solver="default",printout_frequency=100,tolerance = 0.000001,maxiter = 5000)
  • Solve the mixed model equations (no marker information) without estimating variance components.

Available solvers include default, Jacobi, GaussSeidel, and Gibbs sampler.

source
GWAS
get_additive_genetic_variances
get_heritability
get_correlations
get_breeding_values
QC