Internal functions

Documentation for JWAS.jl's internal (private) interface, which are not available to general users. These internal functions are small blocks that public function build on.

<!–-

Index

JWAS.GWAS
JWAS.GWAS
JWAS.GWAS
JWAS.PedModule.get_pedigree
JWAS.PedModule.getinfo
JWAS.add_genotypes
JWAS.add_genotypes
JWAS.add_term
JWAS.build_model
JWAS.getEBV
JWAS.getEBV
JWAS.getEBV
JWAS.getMCMCinfo
JWAS.getMME
JWAS.get_outputX_others
JWAS.getinfo
JWAS.outputEBV
JWAS.outputMCMCsamples
JWAS.runMCMC
JWAS.set_covariate
JWAS.set_random
JWAS.set_random
JWAS.showMME
JWAS.solve

Internal interface

JWAS.GWAS — Method.

GWAS(marker_effects_file,map_file,model;header=true,window_size="1 Mb",threshold=0.001)

run genomic window-based GWAS

MCMC samples of marker effects are stored in markereffectsfile with delimiter ','.
model is either the model::MME used in analysis or the genotypic cavariate matrix M::Array
map_file has the (sorted) marker position information with delimiter ','.
File format:

markerID,chromosome,position
m1,1,16977
m2,1,434311
m3,1,1025513
m4,2,70350
m5,2,101135

source

JWAS.GWAS — Method.

GWAS(marker_effects_file,model;header=true,window_size=100,threshold=0.001)

run genomic window-based GWAS without a map file

MCMC samples of marker effects are stored in markereffectsfile with delimiter ','.
window_size is either a constant (identical number of markers in each window) or an array of number of markers in each window
model is either the model::MME used in analysis or the genotypic covariate matrix M::Array
File format:

source

JWAS.GWAS — Method.

GWAS(marker_effects_file;header=true)

Compute the model frequency for each marker (the probability the marker is included in the model) using samples of marker effects stored in markereffectsfile.

source

JWAS.add_genotypes — Function.

add_genotypes(mme::MME,M,G;header=false,center=true,rowID=false,G_is_marker_variance=false,df=4)

Get marker informtion from an nxp Matrix M of genotypes (Array or DataFrame), where n is the number of individuals and p is the number of markers. This matrix needs to be column-wise sorted by marker positions.
rowID is a vector of individual IDs, e.g.,rowID=["a1","b2","c1"]; if it is omitted, IDs will be set to 1:n
header is a header vector such as ["id"; "mrk1"; "mrk2";...;"mrkp"]. If omitted, marker names will be set to 1:p
G defaults to the genetic variance with degree of freedom df=4.0.

source

JWAS.add_genotypes — Function.

add_genotypes(mme::MME,file,G;separator=' ',header=true,center=true,G_is_marker_variance=false,df=4.0)

Get marker informtion from a genotype file. This file needs to be column-wise sorted by marker positions.
G defaults to the genetic variance with degree of freedom df=4.0.
File format:

Animal,marker1,marker2,marker3,marker4,marker5
S1,1,0,1,1,1
D1,2,0,2,2,1
O1,1,2,0,1,0
O3,0,0,2,1,1

source

JWAS.build_model — Function.

build_model(model_equations::AbstractString,R=false; df::Float64=4.0)

Build a model from model equations with the residual variance R. In Bayesian analysis, R is the mean for the prior assigned for the residual variance with degree of freedom df defaulting to 4.0. If R is not provided, a value is calculate from responses (phenotypes).
By default, all variabels in modelequations are factors (categorical) and fixed. Set variables to be covariates (continuous) or random using functions `setcovariate()orset_random()`.

#single-trait
model_equations = "BW = intercept + age + sex"
R               = 6.72
models          = build_model(model_equations,R);

#multi-trait
model_equations = "BW = intercept + age + sex
                   CW = intercept + litter";
R               = [6.72   24.84
                   24.84  708.41]
models          = build_model(model_equations,R);

source

JWAS.getEBV — Method.

getEBV(model::MME,genotypefile::AbstractString;header=true,separator=',')

Get estimated breeding values for individuals with genotypes defined by genotypefile. The genotype file format is same to that used in add_genotypes(). The same order for markers (columns) in genotypefile as in add_genotypes() is assumed for simplicity now. The file format is as follows,

Animal,marker1,marker2,marker3,marker4,marker5
A1,1,0,1,1,1
C3,1,2,0,1,0
B2,0,0,2,1,1

source

JWAS.getEBV — Method.

getEBV(model::MME,genotypes::Array{Float64,2})

Get estimated breeding values for individuals with genotypes. The genotypes are provided as an Array. The same order for markers (columns)in genotypes::Array{Float64,2} as in add_genotypes() is assumed for simplicity now.

source

JWAS.getEBV — Method.

getEBV(model::MME)

Get estimated breeding values for individuals defined by outputEBV() (defaulting to all genotyped individuals).

source

JWAS.getinfo — Method.

getinfo(model::MME)

Print out model information.

source

JWAS.outputEBV — Method.

outputEBV(model,IDs::Array;PEV=false)

Output estimated breeding values and prediction error variances (defaulting to false) for IDs.

source

JWAS.outputMCMCsamples — Method.

outputMCMCsamples(mme::MME,trmStr::AbstractString...)

Get MCMC samples for specific location parameters.

source

JWAS.runMCMC — Method.

runMCMC(model::MME,df::DataFrame;
        ### MCMC
        chain_length             = 1000,
        starting_value           = false,
        burnin                   = 0,
        output_samples_frequency = chain_length/1000,
        output_samples_file      = "MCMC_samples",
        update_priors_frequency  = 0,
        ### Methods
        methods                  = "conventional (no markers)",
        estimate_variance        = true,
        Pi                       = 0.0,
        estimatePi               = false,
        estimateScale            = false,
        single_step_analysis     = false,
        pedigree                 = false,
        categorical_trait        = false,
        missing_phenotypes       = true,
        constraint               = false,
        causal_structure         = false,
        ### Genomic Prediction
        outputEBV                = true,
        output_heritability      = true,
        ### MISC
        seed                     = false,
        printout_model_info      = true,
        printout_frequency       = 0)

Run MCMC for Bayesian Linear Mixed Models with or without estimation of variance components.

Markov chain Monte Carlo
- The first burnin iterations are discarded at the beginning of a MCMC chain of length chain_length.
- The starting_value can be provided as a vector for all location parameteres and marker effects, defaulting to 0.0s.
- Save MCMC samples every output_samples_frequency iterations, defaulting to chain_length/1000, to files output_samples_file, defaulting to MCMC_samples.txt. MCMC samples for hyperparametes (variance componets) and marker effects are saved by default. MCMC samples for location parametes can be saved using output_MCMC_samples(). Note that saving MCMC samples too frequently slows down the computation.
- Miscellaneous Options
  - Priors are updated every update_priors_frequency iterations, defaulting to 0.
Methods
- Available methods include "conventional (no markers)", "RR-BLUP", "BayesB", "BayesC", "Bayesian Lasso", and "GBLUP".
- Single step analysis is allowed if single_step_analysis = true and pedigree is provided.
- In Bayesian variable selection methods, Pi for single-trait analyses is a number; Pi for multi-trait analyses is a dictionary such as Pi=Dict([1.0; 1.0]=>0.7,[1.0; 0.0]=>0.1,[0.0; 1.0]=>0.1,[0.0; 0.0]=>0.1), defaulting to all markers have effects (Pi = 0.0) in single-trait analysis and all markers have effects on all traits (Pi=Dict([1.0; 1.0]=>1.0,[0.0; 0.0]=>0.0)) in multi-trait analysis. Pi is estimated if estimatePi = true, , defaulting to false.
- Variance components are estimated if estimate_variance=true, defaulting to true.
- Scale parameter for prior of marker effect variance is estimated if estimateScale = true, defaulting to false.
- Miscellaneous Options
  - Missing phenotypes are allowed in multi-trait analysis with missing_phenotypes=true, defaulting to true.
  - Catogorical Traits are allowed if categorical_trait=true, defaulting to false.
  - If constraint=true, defaulting to false, constrain residual covariances between traits to be zeros.
  - If causal_structure is provided, e.g., causal_structure = [0.0,0.0,0.0;1.0,0.0,0.0;1.0,0.0,0.0] for trait 2 -> trait 1 and trait 3 -> trait 1, phenotypic causal networks will be incorporated using structure equation models.
Genomic Prediction
- Individual estimted breeding values (EBVs) and prediction error variances (PEVs) are returned if outputEBV=true, defaulting to true. Heritability and genetic
variances are returned if output_heritability=true, defaulting to true. Note that estimation of heritability is computaionally intensive.
Miscellaneous Options
- Print out the model information in REPL if printout_model_info=true; print out the monte carlo mean in REPL with printout_frequency, defaulting to false.
- If seed, defaulting to false, is provided, a reproducible sequence of numbers will be generated for random number generation.

source

JWAS.set_covariate — Method.

set_covariate(model::MME,variables::AbstractString...)

set variables as covariates; model is the output of function build_model().

#After running build_model, variabels age and year can be set to be covariates as
set_covariate(model,"age","year")
#or
set_covariate(model,"age year")

source

JWAS.set_random — Function.

set_random(mme::MME,randomStr::AbstractString,ped::Pedigree, G;df=4)

set variables as random polygenic effects with pedigree information ped and variances G with degree of freedom df, defaulting to 4.0.

#single-trait (example 1)
model_equation  = "y = intercept + age + animal"
model           = build_model(model_equation,R)
ped             = get_pedigree(pedfile)
G               = 1.6
set_random(model,"animal", ped, G)

#single-trait (example 2)
model_equation  = "y = intercept + age + animal + animal*age"
model           = build_model(model_equation,R)
ped             = get_pedigree(pedfile)
G               = [1.6   0.2
                   0.2  1.0]
set_random(model,"animal animal*age", ped,G)

#multi-trait
model_equations = "BW = intercept + age + sex + animal
                   CW = intercept + age + sex + animal"
model           = build_model(model_equations,R);
ped             = get_pedigree(pedfile);
G               = [6.72   2.84
                   2.84  8.41]
set_random(model,"animal",ped,G)

source

JWAS.set_random — Function.

set_random(mme::MME,randomStr::AbstractString,G;Vinv=0,names=[],df=4)

set variables as random effects, defaulting to i.i.d effects, with variances G with degree of freedom df, defaulting to 4.0.
the random effects are assumed to be i.i.d by default and it can be defined with any (inverse of) covariance structure Vinv with its index (row names) provided by names.

#single-trait (i.i.d randome effects)
model_equation  = "y = intercept + litter + sex"
model           = build_model(model_equation,R)
G               = 0.6
set_random(model,"litter",G)

#multi-trait (i.i.d randome effects)
model_equations = "BW = intercept + litter + sex
                   CW = intercept + litter + sex"
model           = build_model(model_equations,R);
G               = [3.72  1.84
                   1.84  3.41]
set_random(model,"litter",G)

#single-trait (randome effects with specific covariance structures)
model_equation  = "y = intercept + litter + sex"
model           = build_model(model_equation,R)
V               = [1.0  0.5 0.25
                   0.5  1.0 0.5
                   0.25 0.5 1.0]
G               = 0.6
set_random(model,"litter",G,Vinv=inv(V),names=[a1;a2;a3])

source

JWAS.showMME — Method.

showMME(mme::MME,df::DataFrame)

Show left-hand side and right-hand side of mixed model equations (no markers).

source

JWAS.solve — Method.

solve(mme::MME,df::DataFrame;solver="default",printout_frequency=100,tolerance = 0.000001,maxiter = 5000)

Solve the mixed model equations (no marker information) without estimating variance components.

Available solvers include default, Jacobi, GaussSeidel, and Gibbs sampler.

source

JWAS.add_term — Method.

add to model an extra term: imputation_residual

source

JWAS.getMCMCinfo — Method.

getMCMCinfo(model::MME)

Print out MCMC information.

source

JWAS.getMME — Method.

Construct mixed model equations with

incidence matrix: X ; response : ySparse; left-hand side : mmeLhs ; right-hand side : mmeLhs ;

source

JWAS.get_outputX_others — Method.

get_outputX_others(model)

Make incidence matrices for effects involved in calculation of EBV including J, ϵ, pedTrmVec except marker covariates.

source

JWAS.PedModule.get_pedigree — Method.

get_pedigree(pedfile::AbstractString;header=false,separator=',')

Get pedigree informtion from a pedigree file with header (defaulting to false) and separator (defaulting to ,).
Pedigree file format:

a,0,0
c,a,b
d,a,c

source

JWAS.PedModule.getinfo — Method.

get_info(pedigree::Pedigree)

Print summary informtion from a pedigree object including number of individulas, sires. dams and founders. Return individual IDs, inverse of numerator relationship matrix, and inbreeding coefficients.

source

–>