Workflow
A step by step workflow for how to run JWAS is shown in this section. The workflow below is used to demonstrate a three-trait Bayesian linear mixed model fitting fixed effects (x1, x3), random effects (x2), direct genetic effects (ID), maternal genetic effects (dam) and genomic information.
Available Models
Given the data and model equations, several different types of models are available in JWAS as shown below. In the table below, "X" denotes the type of available data, and "Y<=A" denotes that Y individuals is a subset of A individuals.
Linear Mixed Models (LMM) | phenotypes (Y) | pedigree (A) | genotypes (G) | notes |
---|---|---|---|---|
Conventional LMM | X | |||
Pedigree-based LMM | X | X | Y<=A | |
Complete Genomic LMM | X | optional | X | Y<=G |
Incomplete Genomic LMM | X | X | X | Y<=A,G<=A |
Incomplete Genomic LMM is also called "single-step" methods in animal breeding.
Pedigree information may be used in Complete Genomic LMM for extra polygenic effects to account for genetic variance not explained by the genomic data (e.g., SNPs).
Pedigree-based LMM (none of the individuals in the pedigree are genotyped) and Complete Genomic LMM (all individuals in the pedigree are genotyped) are special cases of Incomplete Genomic LMM (part of the individuals in the pedigree are genotyped).
Get Data Ready
By default, input data files are comma-separated values (CSV) files, where each line of the file consists of one or more fields, separated by commas. Other field separators such as space (' ') or tab ('\t') can be used if you supply the keyword argument, e.g, CSV.read(...,delim='\t')
or add_genotypes(...,separator='\t')
Click on the buttons inside the tabbed menu to see the data:
ID,y1,y2,y3,x1,x2,x3,dam
a1,-0.06,3.58,-1.18,0.9,2,m,NA
a2,-0.6,4.9,0.88,0.3,1,f,NA
a3,-2.07,3.19,0.73,0.7,2,f,NA
a4,-2.63,6.97,-0.83,0.6,1,m,a2
a5,2.31,3.5,-1.52,0.4,2,m,a2
a6,0.93,4.87,-0.01,05,2,f,a3
a7,-0.69,3.1,-1.47,0.5,2,f,a3
a8,-4.69,7.31,-1.09,0.3,2,m,a6
a9,-2.81,7.18,0.76,0.4,2,m,a6
a10,1.92,1.78,-0.88,0.2,1,m,a7
ID,Sire,Dam
a1,0,0
a2,0,0
a3,0,0
a4,a1,a2
a5,a1,a2
a6,a1,a3
a7,a1,a3
a8,a4,a6
a9,a4,a6
a10,a5,a7
ID,m1,m2,m3,m4,m5
a1,1,2,1,1,0
a2,2,1,1,1,1
a3,1,1,0,1,1
a4,2,2,0,1,0
a5,1,1,2,1,1
a6,2,1,0,0,0
a7,0,2,1,2,1
a8,2,2,0,0,0
a9,2,1,0,1,0
a10,0,2,2,2,1
markerID,chromosome,position
m1,1,16977
m2,1,434311
m3,1,1025513
m4,2,70350
m5,2,101135
Step 1: Load Packages
using JWAS,CSV,DataFrames
The JWAS
package is loaded, as well as the CSV
and DataFrame
packages for reading text files.
Step 2: Read Data
Read Phenotypic Data
phenotypes = CSV.read("phenotypes.txt",DataFrame,delim = ',',header=true,,missingstrings=["NA"])
first(phenotypes)
output:
6×8 DataFrames.DataFrame
│ Row │ ID │ y1 │ y2 │ y3 │ x1 │ x2 │ x3 │ dam │
├─────┼────┼───────┼──────┼───────┼─────┼────┼────┼───────────┤
│ 1 │ a1 │ -0.06 │ 3.58 │ -1.18 │ 0.9 │ 2 │ m │ missing │
│ 2 │ a2 │ -0.6 │ 4.9 │ 0.88 │ 0.3 │ 1 │ f │ missing │
│ 3 │ a3 │ -2.07 │ 3.19 │ 0.73 │ 0.7 │ 2 │ f │ missing │
│ 4 │ a4 │ -2.63 │ 6.97 │ -0.83 │ 0.6 │ 1 │ m │ a2 │
│ 5 │ a5 │ 2.31 │ 3.5 │ -1.52 │ 0.4 │ 2 │ m │ a2 │
│ 6 │ a6 │ 0.93 │ 4.87 │ -0.01 │ 5.0 │ 2 │ f │ a3 │
The phenotypic data is read on line 1. On line 2, the first several rows of the phenotypic data are shown.
Read Pedigree Data
pedigree = get_pedigree("pedigree.txt",separator=",",header=true)
- link to documentation for
get_pedigree
The pedigree data is read on line 1.
Read Genomic Data
genotypes = get_genotypes("genotypes.txt",G,method="BayesC",separator=",",header=true) #G is optional
- link to documentation for
get_genotypes
On line 1, the genomic information is read on line 1 with the genotype file. and variance G
(a 3x3 matrix). In Bayesian analysis, the G
is the mean for the prior assigned for the genomic variance with degree of freedom df
, defaulting to 4.0. If G
is not provided, a value is calculated from responses (phenotypes).
Step 3: Build Model Equations
model_equation = "y1 = intercept + x1 + x3 + ID + dam + genotypes
y2 = intercept + x1 + x2 + x3 + ID + genotypes
y3 = intercept + x1 + x1*x3 + x2 + ID + genotypes"
model=build_model(model_equation,R) #R is optional
- link to documentation for
build_model
The model equation for a 3-trait analysis is defined on the first 3 lines.
- The effects fitted in the model for trait
y1
are the intercept,x1
,x3
, direct genetic effects (ID
), maternal genetic effects (dam
), and molecular marker effects (genotypes
). - The effects fitted in the model for trait
y2
are the intercept,x1
,x2
,x3
, direct genetic effects (ID
), and molecular marker effects (genotypes
). - The effects fitted in the model for trait
y3
are the intercept,x1
, the interaction betweenx1
andx3
,x2
, direct genetic effects (ID
), and , and molecular marker effects (genotypes
).
On the last line, the model is built given the model equation and residual variance R
(a 3x3 matrix). In Bayesian analysis, R
is the mean for the prior assigned for the residual variance with degree of freedom df
, defaulting to 4.0. If R
is not provided, a value is calculated from responses (phenotypes). By default, all effects are treated as fixed and classed as factors (categorical variables) rather than covariates (quantitative variables).
Step 4: Set Factors or Covariate
set_covariate(model,"x1")
- link to documentation for
set_covariate
On line 1, the effect x1
is defined to be a covariate (a quantitative variable) rather than a factor (a categorical variable).
Step 5: Set Random or Fixed Effects
set_random(model,"x2",G1) #G1 is optional
set_random(model,"ID dam",pedigree,G2) #G2 is optional
- link to documentation for
set_random
On line 1, the x2
class effect is defined as random with variance G1
(a 2x2 matrix). On line 2, direct genetic effects and maternal genetic effects are fitted as ID
and dam
with G2
(a 4x4 matrix) and the inverse of the numerator relationship matrix defined from pedigree. In Bayesian analysis, G1
and G2
are the means for the priors assigned for the variances with degree of freedom df
, defaulting to 4.0. If G1
or G2
is not provided, a value is calculated from responses (phenotypes).
Step 6: Run Bayesian Analysis
outputMCMCsamples(model,"dam")
out=runMCMC(model,phenotypes)
- link to documentation for
outputMCMCsamples
- link to documentation for
runMCMC
On line 1, MCMC samples from runMCMC
for x2
is saved to a file, where each row represents one sample from the MCMC. On line 2, a multi-trait BayesC analysis is performed with model
and phenotypes
as had been defined in step 1-6. MCMC samples for marker effects, location parameters specified on line 1, and all variance components from this analysis are saved every output_samples_frequency
iterations to files.
Several steps above can be skipped if no related information is available, e.g., step 4 is skipped if all effects are classed as factors. Several detailed examples are available in the examples section. Here is the link to documentation for all Public functions.
check results
Posterior means (estimate) and standard deviations (SD) of location parameters, most variance components, and marker effects are saved as the variable out
and in text files. They can be listed and obtained as
keys(out)
# output:
#
# Base.KeyIterator for a Dict{Any,Any} with 7 entries. Keys:
# "polygenic effects covariance matrix"
# "Model frequency"
# "residual covariance matrix"
# "marker effects"
# "marker effects variance"
# "location parameters"
# "Pi"
out["residual variance"]
# output:
#
#Covariance Estimate SD
#y1_y1 1.65265 0.29405
#y1_y2 -0.0290279 0.02347
#y1_y3 -0.252009 0.048289
#y2_y1 -0.0290279 0.02347
#y2_y2 0.977405 0.009732
#y2_y3 0.0451994 0.095828
#y3_y1 -0.252009 0.048289
#y3_y2 0.0451994 0.095828
#y3_y3 0.363878 0.049278
In addition, MCMC samples from posterior distributions of marker effects, all variance components, and location parameters specified in step 7, are saved to text files in your working directory. Users can compute the posterior distributions of parameters of interest using these MCMC samples files. A list of output files are shown below.
Output files:
Below is a list of files containing estimates and standard deviations for variables of interest.
file name | description |
---|---|
EBV_y1.txt | estimated breeding values for trait named "y1" |
EBV_y2.txt | estimated breeding values for trait named "y2" |
EBV_y3.txt | estimated breeding values for trait named "y3" |
genetic_variance.txt | estimated genetic variance-covariance for all traits |
heritability.txt | estimated heritability |
location_parameters.txt | estimated non-genetic effects |
pi_genotypes.txt | estimated pi |
polygeniceffectscovariance_matrix.txt | estimated variance-covariance between polygenic effects (y1ID, y2ID, y3ID, y1dam) |
markereffectsgenotypes.txt | estimated marker effects for all traits |
residual_variance.txt | estimated residual variance-covariance for all traits |
Below is a list of files containing MCMC samples for variables of interest.
file name | description |
---|---|
MCMCsamplesEBV_y1.txt | MCMC samples from the posterior distribution of breeding values for trait named "y1" |
MCMCsamplesEBV_y2.txt | MCMC samples from the posterior distribution of breeding values for trait named "y2" |
MCMCsamplesEBV_y3.txt | MCMC samples from the posterior distribution of breeding values for trait named "y3" |
MCMCsamplesgenetic_variance.txt | MCMC samples from the posterior distribution of genetic variance-covariance for all traits |
MCMCsamplesheritability.txt | MCMC samples from the posterior distribution of heritability |
MCMCsamplesmarkereffectsgenotypes_y1 | MCMC samples from the posterior distribution of marker effects for trait named "y1" |
MCMCsamplesmarkereffectsgenotypes_y2 | MCMC samples from the posterior distribution of marker effects for trait named "y2" |
MCMCsamplesmarkereffectsgenotypes_y3 | MCMC samples from the posterior distribution of marker effects for trait named "y3" |
MCMCsamplesmarkereffectsvariances_genotypes.txt | MCMC samples from the posterior distribution of marker effect variance for all traits |
MCMCsamplespi_genotypes.txt | MCMC samples from the posterior distribution of pi |
MCMCsamplespolygeniceffectsvariance.txt | MCMC samples from the posterior distribution of variance-covariance between y1ID, y2ID, y3ID, y1dam |
MCMCsamplesresidual_variance.txt | MCMC samples from the posterior distribution of residual variance-covariance for all traits |
MCMCsamplesy1.dam.txt | MCMC samples from the posterior distribution of dam effect for y1 |
MCMCsamplesy1.IDy2.IDy3.IDy1.damvariances.txt | MCMC samples from the posterior distribution of variance-covariance between y1ID, y2ID, y3ID, y1dam |
MCMCsamplesy2.x2y3.x2variances.txt | MCMC samples from the posterior distribution of variance-covariance between y2x2 and y3x2 |