Internal functions
Documentation for JWAS.jl's internal (private) interface, which are not available to general users. These internal functions are small blocks that public function build on.
<!–-
Index
JWAS.GWAS
JWAS.GWAS
JWAS.GWAS
JWAS.PedModule.get_pedigree
JWAS.PedModule.getinfo
JWAS.add_genotypes
JWAS.add_genotypes
JWAS.add_term
JWAS.build_model
JWAS.getEBV
JWAS.getEBV
JWAS.getEBV
JWAS.getMCMCinfo
JWAS.getMME
JWAS.get_outputX_others
JWAS.getinfo
JWAS.outputEBV
JWAS.outputMCMCsamples
JWAS.runMCMC
JWAS.set_covariate
JWAS.set_random
JWAS.set_random
JWAS.showMME
JWAS.solve
Internal interface
JWAS.GWAS
— Method.GWAS(marker_effects_file,map_file,model;header=true,window_size="1 Mb",threshold=0.001)
run genomic window-based GWAS
- MCMC samples of marker effects are stored in markereffectsfile with delimiter ','.
- model is either the model::MME used in analysis or the genotypic cavariate matrix M::Array
- map_file has the (sorted) marker position information with delimiter ','.
- File format:
markerID,chromosome,position
m1,1,16977
m2,1,434311
m3,1,1025513
m4,2,70350
m5,2,101135
JWAS.GWAS
— Method.GWAS(marker_effects_file,model;header=true,window_size=100,threshold=0.001)
run genomic window-based GWAS without a map file
- MCMC samples of marker effects are stored in markereffectsfile with delimiter ','.
- window_size is either a constant (identical number of markers in each window) or an array of number of markers in each window
- model is either the model::MME used in analysis or the genotypic covariate matrix M::Array
- File format:
JWAS.GWAS
— Method.GWAS(marker_effects_file;header=true)
Compute the model frequency for each marker (the probability the marker is included in the model) using samples of marker effects stored in markereffectsfile.
JWAS.add_genotypes
— Function.add_genotypes(mme::MME,M,G;header=false,center=true,rowID=false,G_is_marker_variance=false,df=4)
- Get marker informtion from an nxp Matrix M of genotypes (Array or DataFrame), where n is the number of individuals and p is the number of markers. This matrix needs to be column-wise sorted by marker positions.
- rowID is a vector of individual IDs, e.g.,rowID=["a1","b2","c1"]; if it is omitted, IDs will be set to 1:n
- header is a header vector such as ["id"; "mrk1"; "mrk2";...;"mrkp"]. If omitted, marker names will be set to 1:p
- G defaults to the genetic variance with degree of freedom df=4.0.
JWAS.add_genotypes
— Function.add_genotypes(mme::MME,file,G;separator=' ',header=true,center=true,G_is_marker_variance=false,df=4.0)
- Get marker informtion from a genotype file. This file needs to be column-wise sorted by marker positions.
- G defaults to the genetic variance with degree of freedom df=4.0.
- File format:
Animal,marker1,marker2,marker3,marker4,marker5
S1,1,0,1,1,1
D1,2,0,2,2,1
O1,1,2,0,1,0
O3,0,0,2,1,1
JWAS.build_model
— Function.build_model(model_equations::AbstractString,R=false; df::Float64=4.0)
- Build a model from model equations with the residual variance R. In Bayesian analysis, R is the mean for the prior assigned for the residual variance with degree of freedom df defaulting to 4.0. If R is not provided, a value is calculate from responses (phenotypes).
- By default, all variabels in modelequations are factors (categorical) and fixed. Set variables to be covariates (continuous) or random using functions `setcovariate()
or
set_random()`.
#single-trait
model_equations = "BW = intercept + age + sex"
R = 6.72
models = build_model(model_equations,R);
#multi-trait
model_equations = "BW = intercept + age + sex
CW = intercept + litter";
R = [6.72 24.84
24.84 708.41]
models = build_model(model_equations,R);
JWAS.getEBV
— Method.getEBV(model::MME,genotypefile::AbstractString;header=true,separator=',')
Get estimated breeding values for individuals with genotypes defined by genotypefile
. The genotype file format is same to that used in add_genotypes()
. The same order for markers (columns) in genotypefile
as in add_genotypes()
is assumed for simplicity now. The file format is as follows,
Animal,marker1,marker2,marker3,marker4,marker5
A1,1,0,1,1,1
C3,1,2,0,1,0
B2,0,0,2,1,1
JWAS.getEBV
— Method.getEBV(model::MME,genotypes::Array{Float64,2})
Get estimated breeding values for individuals with genotypes. The genotypes are provided as an Array. The same order for markers (columns)in genotypes::Array{Float64,2}
as in add_genotypes()
is assumed for simplicity now.
JWAS.getEBV
— Method.getEBV(model::MME)
Get estimated breeding values for individuals defined by outputEBV()
(defaulting to all genotyped individuals).
JWAS.getinfo
— Method.getinfo(model::MME)
- Print out model information.
JWAS.outputEBV
— Method.outputEBV(model,IDs::Array;PEV=false)
Output estimated breeding values and prediction error variances (defaulting to false) for IDs.
JWAS.outputMCMCsamples
— Method.outputMCMCsamples(mme::MME,trmStr::AbstractString...)
Get MCMC samples for specific location parameters.
JWAS.runMCMC
— Method.runMCMC(model::MME,df::DataFrame;
### MCMC
chain_length = 1000,
starting_value = false,
burnin = 0,
output_samples_frequency = chain_length/1000,
output_samples_file = "MCMC_samples",
update_priors_frequency = 0,
### Methods
methods = "conventional (no markers)",
estimate_variance = true,
Pi = 0.0,
estimatePi = false,
estimateScale = false,
single_step_analysis = false,
pedigree = false,
categorical_trait = false,
missing_phenotypes = true,
constraint = false,
causal_structure = false,
### Genomic Prediction
outputEBV = true,
output_heritability = true,
### MISC
seed = false,
printout_model_info = true,
printout_frequency = 0)
Run MCMC for Bayesian Linear Mixed Models with or without estimation of variance components.
- Markov chain Monte Carlo
- The first
burnin
iterations are discarded at the beginning of a MCMC chain of lengthchain_length
. - The
starting_value
can be provided as a vector for all location parameteres and marker effects, defaulting to0.0
s. - Save MCMC samples every
output_samples_frequency
iterations, defaulting tochain_length/1000
, to filesoutput_samples_file
, defaulting toMCMC_samples.txt
. MCMC samples for hyperparametes (variance componets) and marker effects are saved by default. MCMC samples for location parametes can be saved usingoutput_MCMC_samples()
. Note that saving MCMC samples too frequently slows down the computation. - Miscellaneous Options
- Priors are updated every
update_priors_frequency
iterations, defaulting to0
.
- Priors are updated every
- The first
- Methods
- Available
methods
include "conventional (no markers)", "RR-BLUP", "BayesB", "BayesC", "Bayesian Lasso", and "GBLUP". - Single step analysis is allowed if
single_step_analysis
=true
andpedigree
is provided. - In Bayesian variable selection methods,
Pi
for single-trait analyses is a number;Pi
for multi-trait analyses is a dictionary such asPi=Dict([1.0; 1.0]=>0.7,[1.0; 0.0]=>0.1,[0.0; 1.0]=>0.1,[0.0; 0.0]=>0.1)
, defaulting toall markers have effects (Pi = 0.0)
in single-trait analysis andall markers have effects on all traits (Pi=Dict([1.0; 1.0]=>1.0,[0.0; 0.0]=>0.0))
in multi-trait analysis.Pi
is estimated ifestimatePi
= true, , defaulting tofalse
. - Variance components are estimated if
estimate_variance
=true, defaulting totrue
. - Scale parameter for prior of marker effect variance is estimated if
estimateScale
= true, defaulting tofalse
. - Miscellaneous Options
- Missing phenotypes are allowed in multi-trait analysis with
missing_phenotypes
=true, defaulting totrue
. - Catogorical Traits are allowed if
categorical_trait
=true, defaulting tofalse
. - If
constraint
=true, defaulting tofalse
, constrain residual covariances between traits to be zeros. - If
causal_structure
is provided, e.g., causal_structure = [0.0,0.0,0.0;1.0,0.0,0.0;1.0,0.0,0.0] for trait 2 -> trait 1 and trait 3 -> trait 1, phenotypic causal networks will be incorporated using structure equation models.
- Missing phenotypes are allowed in multi-trait analysis with
- Available
- Genomic Prediction
- Individual estimted breeding values (EBVs) and prediction error variances (PEVs) are returned if
outputEBV
=true, defaulting totrue
. Heritability and genetic
output_heritability
=true
, defaulting totrue
. Note that estimation of heritability is computaionally intensive. - Individual estimted breeding values (EBVs) and prediction error variances (PEVs) are returned if
- Miscellaneous Options
- Print out the model information in REPL if
printout_model_info
=true; print out the monte carlo mean in REPL withprintout_frequency
, defaulting tofalse
. - If
seed
, defaulting tofalse
, is provided, a reproducible sequence of numbers will be generated for random number generation.
- Print out the model information in REPL if
JWAS.set_covariate
— Method.set_covariate(model::MME,variables::AbstractString...)
- set variables as covariates; model is the output of function
build_model()
.
#After running build_model, variabels age and year can be set to be covariates as
set_covariate(model,"age","year")
#or
set_covariate(model,"age year")
JWAS.set_random
— Function.set_random(mme::MME,randomStr::AbstractString,ped::Pedigree, G;df=4)
- set variables as random polygenic effects with pedigree information ped and variances G with degree of freedom df, defaulting to
4.0
.
#single-trait (example 1)
model_equation = "y = intercept + age + animal"
model = build_model(model_equation,R)
ped = get_pedigree(pedfile)
G = 1.6
set_random(model,"animal", ped, G)
#single-trait (example 2)
model_equation = "y = intercept + age + animal + animal*age"
model = build_model(model_equation,R)
ped = get_pedigree(pedfile)
G = [1.6 0.2
0.2 1.0]
set_random(model,"animal animal*age", ped,G)
#multi-trait
model_equations = "BW = intercept + age + sex + animal
CW = intercept + age + sex + animal"
model = build_model(model_equations,R);
ped = get_pedigree(pedfile);
G = [6.72 2.84
2.84 8.41]
set_random(model,"animal",ped,G)
JWAS.set_random
— Function.set_random(mme::MME,randomStr::AbstractString,G;Vinv=0,names=[],df=4)
- set variables as random effects, defaulting to i.i.d effects, with variances G with degree of freedom df, defaulting to 4.0.
- the random effects are assumed to be i.i.d by default and it can be defined with any (inverse of) covariance structure Vinv with its index (row names) provided by names.
#single-trait (i.i.d randome effects)
model_equation = "y = intercept + litter + sex"
model = build_model(model_equation,R)
G = 0.6
set_random(model,"litter",G)
#multi-trait (i.i.d randome effects)
model_equations = "BW = intercept + litter + sex
CW = intercept + litter + sex"
model = build_model(model_equations,R);
G = [3.72 1.84
1.84 3.41]
set_random(model,"litter",G)
#single-trait (randome effects with specific covariance structures)
model_equation = "y = intercept + litter + sex"
model = build_model(model_equation,R)
V = [1.0 0.5 0.25
0.5 1.0 0.5
0.25 0.5 1.0]
G = 0.6
set_random(model,"litter",G,Vinv=inv(V),names=[a1;a2;a3])
JWAS.showMME
— Method.showMME(mme::MME,df::DataFrame)
- Show left-hand side and right-hand side of mixed model equations (no markers).
JWAS.solve
— Method.solve(mme::MME,df::DataFrame;solver="default",printout_frequency=100,tolerance = 0.000001,maxiter = 5000)
- Solve the mixed model equations (no marker information) without estimating variance components.
Available solvers include default
, Jacobi
, GaussSeidel
, and Gibbs sampler
.
JWAS.add_term
— Method.add to model an extra term: imputation_residual
JWAS.getMCMCinfo
— Method.getMCMCinfo(model::MME)
- Print out MCMC information.
JWAS.getMME
— Method.Construct mixed model equations with
incidence matrix: X ; response : ySparse; left-hand side : mmeLhs ; right-hand side : mmeLhs ;
JWAS.get_outputX_others
— Method.get_outputX_others(model)
Make incidence matrices for effects involved in calculation of EBV including J, ϵ, pedTrmVec except marker covariates.
JWAS.PedModule.get_pedigree
— Method.get_pedigree(pedfile::AbstractString;header=false,separator=',')
- Get pedigree informtion from a pedigree file with header (defaulting to
false
) and separator (defaulting to,
). - Pedigree file format:
a,0,0
c,a,b
d,a,c
JWAS.PedModule.getinfo
— Method.get_info(pedigree::Pedigree)
- Print summary informtion from a pedigree object including number of individulas, sires. dams and founders. Return individual IDs, inverse of numerator relationship matrix, and inbreeding coefficients.
–>