Workflow

A step by step workflow for how to run JWAS is shown in this section. The workflow below is used to demonstrate a three-trait Bayesian linear mixed model fitting fixed effects (x1, x3), random effects (x2), direct genetic effects (ID), maternal genetic effects (dam) and genomic information.

Workflow

Available Models
Get Data Ready
Step 1: Load Packages
Step 2: Read data
Step 3: Build Model Equations
Step 4: Set Factors or Covariate
Step 5: Set Random or Fixed Effects
Step 6: Use Genomic Information
Step 7: Run Bayesian Analysis
check results

Available Models

Given the data and model equations, several different types of models are available in JWAS as shown below. In the table below, "X" denotes the type of available data, and "Y<=A" denotes that Y individuals is a subset of A individuals.

Linear Mixed Models (LMM)	phenotypes (Y)	pedigree (A)	genotypes (G)	notes
Conventional LMM	X
Pedigree-based LMM	X	X		Y<=A
Complete Genomic LMM	X	optional	X	Y<=G
Incomplete Genomic LMM	X	X	X	Y<=A,G<=A

Note

Incomplete Genomic LMM is also called "single-step" methods in animal breeding.
Pedigree information may be used in Complete Genomic LMM for extra polygenic effects to account for genetic variance not explained by the genomic data (e.g., SNPs).
Pedigree-based LMM (none of the individuals in the pedigree are genotyped) and Complete Genomic LMM (all individuals in the pedigree are genotyped) are special cases of Incomplete Genomic LMM (part of the individuals in the pedigree are genotyped).

Get Data Ready

By default, input data files are comma-separated values (CSV) files, where each line of the file consists of one or more fields, separated by commas. Other field separators such as space (' ') or tab ('\t') can be used if you supply the keyword argument, e.g, CSV.read(...,delim='\t') or add_genotypes(...,separator='\t')

Click on the buttons inside the tabbed menu to see the data:

ID,y1,y2,y3,x1,x2,x3,dam

a1,-0.06,3.58,-1.18,0.9,2,m,0

a2,-0.6,4.9,0.88,0.3,1,f,0

a3,-2.07,3.19,0.73,0.7,2,f,0

a4,-2.63,6.97,-0.83,0.6,1,m,a2

a5,2.31,3.5,-1.52,0.4,2,m,a2

a6,0.93,4.87,-0.01,05,2,f,a3

a7,-0.69,3.1,-1.47,0.5,2,f,a3

a8,-4.69,7.31,-1.09,0.3,2,m,a6

a9,-2.81,7.18,0.76,0.4,2,m,a6

a10,1.92,1.78,-0.88,0.2,1,m,a7

ID,Sire,Dam

a1,0,0

a2,0,0

a3,0,0

a4,a1,a2

a5,a1,a2

a6,a1,a3

a7,a1,a3

a8,a4,a6

a9,a4,a6

a10,a5,a7

ID,m1,m2,m3,m4,m5

a1,1,2,1,1,0

a2,2,1,1,1,1

a3,1,1,0,1,1

a4,2,2,0,1,0

a5,1,1,2,1,1

a6,2,1,0,0,0

a7,0,2,1,2,1

a8,2,2,0,0,0

a9,2,1,0,1,0

a10,0,2,2,2,1

markerID,chromosome,position

m1,1,16977

m2,1,434311

m3,1,1025513

m4,2,70350

m5,2,101135

Step 1: Load Packages

using JWAS,CSV,DataFrames

The JWAS package is loaded, as well as the CSV and DataFrame packages for reading text files.

Step 2: Read data

phenotypes = CSV.read("phenotypes.txt",delim = ',',header=true)
pedigree   = get_pedigree("pedigree.txt",separator=",",header=true)
head(phenotypes)

output:

6×8 DataFrames.DataFrame
│ Row │ ID │ y1    │ y2   │ y3    │ x1  │ x2 │ x3 │ dam │
├─────┼────┼───────┼──────┼───────┼─────┼────┼────┼─────┤
│ 1   │ a1 │ -0.06 │ 3.58 │ -1.18 │ 0.9 │ 2  │ m  │ 0   │
│ 2   │ a2 │ -0.6  │ 4.9  │ 0.88  │ 0.3 │ 1  │ f  │ 0   │
│ 3   │ a3 │ -2.07 │ 3.19 │ 0.73  │ 0.7 │ 2  │ f  │ 0   │
│ 4   │ a4 │ -2.63 │ 6.97 │ -0.83 │ 0.6 │ 1  │ m  │ a2  │
│ 5   │ a5 │ 2.31  │ 3.5  │ -1.52 │ 0.4 │ 2  │ m  │ a2  │
│ 6   │ a6 │ 0.93  │ 4.87 │ -0.01 │ 5.0 │ 2  │ f  │ a3  │

link to documentation for get_pedigree

The phenotypic data is read on line 1, and the pedigree data is read on line 2. On line 3, the first several rows of data are shown.

Step 3: Build Model Equations

model_equation = "y1 = intercept + x1 + x3 + ID + dam
                  y2 = intercept + x1 + x2 + x3 + ID  
                  y3 = intercept + x1 + x1*x3 + x2 + ID"
model=build_model(model_equation, R)

link to documentation for build_model

The non-genomic part of the model equation for a 3-trait analysis is defined on the first 3 lines.

The effects fitted in the model for trait y1 are the intercept, x1, x3, direct genetic effects (ID) and maternal genetic effects (dam).
The effects fitted in the model for trait y2 are the intercept, x1, x2, x3 and direct genetic effects (ID).
The effects fitted in the model for trait y3 are the intercept, x1, the interaction between x1 and x3, x2 and direct genetic effects (ID).

On the last line, the model is built given the model equation and residual variance R (a 3x3 matrix). By default, all effects are treated as fixed and classed as factors (categorical variables) rather than covariates (quantitative variables).

Step 4: Set Factors or Covariate

set_covariate(model,"x1")

link to documentation for set_covariate

On line 1, the effect x1 is defined to be a covariate rather than class effect.

Step 5: Set Random or Fixed Effects

set_random(model,"x2",G1)
set_random(model,"ID dam",pedigree,G2)

link to documentation for set_random

On line 1, the x2 class effect is defined as random with variance G1(a 2x2 matrix). On line 2, direct genetic effects and maternal genetic effects are fitted as ID and dam with G2 (a 4x4 matrix) and the inverse of the numerator relationship matrix defined from pedigree.

Step 6: Use Genomic Information

add_genotypes(model,"genotypes.txt",G3,separator=',')

link to documentation for add_genotypes

On line 1, the genomic part of the model is defined with the genotype file and variance G3 (a 3x3 matrix).

Step 7: Run Bayesian Analysis

outputMCMCsamples(model,"x2")
out=runMCMC(model,phenotypes,methods="BayesC",output_samples_frequency=100)

link to documentation for outputMCMCsamples
link to documentation for runMCMC

On line 1, MCMC samples from runMCMC for x2 is saved to a file, where each row represents one sample from the MCMC. On line 2, a multi-trait BayesC analysis is performed with model and phenotypes as had been defined in step 1-6. MCMC samples for marker effects, location parameters specified on line 1, and all variance components from this analysis are saved every output_samples_frequency iterations to files.

Several steps above can be skipped if no related information is available, e.g., step 6 is skipped for pedigree-based LMM. Several detailed examples are available in the examples section. Here is the link to documentation for all Public functions.

check results

Posterior means of location parameters, most variance components, and marker effects are saved in out. They can be listed and obtained as

keys(out)

# output:
#
# Base.KeyIterator for a Dict{Any,Any} with 7 entries. Keys:
#   "Posterior mean of polygenic effects covariance matrix"
#   "Model frequency"
#   "Posterior mean of residual covariance matrix"
#   "Posterior mean of marker effects"
#   "Posterior mean of marker effects covariance matrix"
#   "Posterior mean of location parameters"
#   "Posterior mean of Pi"

out["Posterior mean of residual covariance matrix"]

# output:
#
# 3×3 Array{Float64,2}:
#   0.674651   -0.103877   0.0834044
#  -0.103877    0.828135  -0.121798
#   0.0834044  -0.121798   0.720751

MCMC samples for marker effects, location parameters specified in step 7, and all variance components are saved to text files in your working directory. They can be obtained as

res=readdlm("MCMC_samples_marker_effects_y1.txt",',',header=true)