Genetic Evaluation (JWAS)
Users can select individuals by random, phenotypes, or esti- mated breeding values from genetic evaluations. A genome-enabled analysis package JWAS (Cheng et al. 2018) has been already incorporated into XSimV2. Multiple methods can be per- formed in XSimV2 for genetic evaluations, including pedigree- based BLUP (Henderson 1984), GBLUP (Habier et al. 2007; Van- Raden 2008), Bayesian Alphabet (Meuwissen et al. 2001; Park and Casella 2008; Kizilkaya et al. 2010; Habier et al. 2011; Erbe et al. 2012; Moser et al. 2015; Gianola and Fernando 2020), and single-step methods (Legarra et al. 2009; Fernando et al. 2014) for both single-trait and multiple-trait analysis (Gianola and Fernando 2020).
genetic_evaluation(cohort ::Cohort,
phenotypes ::DataFrame=DataFrame();
model_equation ::String="",
covariates ::String="",
random_iid ::String="",
random_str ::String="",
methods ::String="GBLUP",
add_genotypes ::Bool=true,
idx_missing_p ::Any=[],
return_out ::Bool=false,
args...)
Arguments
cohort
: The evaluatedcohort
.phenotypes
: Adataframe
wiht columns ofid
, traits, and other studied factors.h2
: Default 0.5. Heritability for simulating cohort phenotypes if nophenotype
is provided.ve
: Default 1. Residual variance for simulating cohort phenotypes if nophenotype
is provided.model_equation
: Default "y ~ intercept". The equation for fitting the breeding values.covariates
: Specifies terms inphenotypes
as continuous factors. Must be included inmodel_equation
as well.random_iid
: Specifies terms inphenotypes
as random effects (i.i.d.). Must be included inmodel_equation
as well.random_str
: Specifies terms inphenotypes
as random effects (pedigree). Must be included inmodel_equation
as well.methods
: Default "GBLUP". Defines what models/methods used in the genetic evaluation.add_genotypes
: Defaulttrue
. Genotypes will be included in the equation.idx_missing_p
: A vector assigning which individuals are not phenotyped.return_out
: Defaultfalse
. Set totrue
to return the completeJWAS
outputs.
Outputs
A n
by t
matrix containing breeding values will be return if return_out = false
, where n
is number of individuals and t
is number of evaluated traits. A complete JWAS
outputs will be returned if return_out = true
.
Example of the phenotypes
Dataframe
Row │ ID y1 y2 factor_1 factor_2
│ String Float64? Float64? Int64 Int64
─────┼─────────────────────────────────────────────────────────────
1 │ 1 0.88976 -0.0798048 1 1
2 │ 2 -0.783203 0.988616 1 1
3 │ 3 missing missing 1 1
4 │ 4 missing missing 1 1
5 │ 5 missing missing 1 1
6 │ 6 missing missing 1 2
7 │ 7 missing missing 1 2
8 │ 8 -1.76058 0.277289 1 2
9 │ 9 0.938871 2.57784 1 2
10 │ 10 0.37026 3.15993 1 2
11 │ 11 -1.91869 0.0935064 2 3
12 │ 12 -0.89847 1.8987 2 3
13 │ 13 1.69663 -0.949513 2 3
14 │ 14 3.48862 0.654378 2 3
15 │ 15 1.39615 1.80355 2 3
16 │ 16 2.31685 2.13446 2 4
17 │ 17 -3.81017 0.0186156 2 4
18 │ 18 -1.71216 -0.0976809 2 4
19 │ 19 1.80917 1.34104 2 4
20 │ 20 -0.504771 3.22665 2 4
Examples
We will use demo genome
and phenome
for the example. A cohort
with 20 individuals is simulated.
julia> build_demo()
[ Info: --------- Genome Summary ---------
[ Info: Number of Chromosome : 10
[ Info:
[ Info: Chromosome Length (cM):
[ Info: [150.0, 150.0, 150.0, 150.0, 150.0, 150.0, 150.0, 150.0, 150.0, 150.0]
[ Info:
[ Info: Number of Loci : 1000
[ Info: [100, 100, 100, 100, 100, 100, 100, 100, 100, 100]
[ Info:
[ Info: Genotyping Error : 0.0
[ Info: Mutation Rate : 0.0
[ Info:
[ Info: --------- Phenome Summary ---------
[ Info: Number of Traits : 2
[ Info: Heritability (h2) : [0.5, 0.5]
┌ Info:
│ Genetic_Variance =
│ 2×2 Array{Float64,2}:
│ 1.0 0.0
└ 0.0 1.0
┌ Info:
│ Residual_Variance =
│ 2×2 Array{Float64,2}:
│ 1.0 0.0
└ 0.0 1.0
[ Info: Number of QTLs : [3 8]
julia> cohort = Cohort(20)
[ Info: Cohort (20 individuals)
[ Info:
[ Info: Mean of breeding values:
[ Info: [-0.862 -0.913]
[ Info:
[ Info: Variance of breeding values:
[ Info: [0.814 1.448]
Basic Usage
By default, GBLUP will be performed without providing any argument.
julia> out = genetic_evaluation(cohort)
20×2 Array{Float64,2}:
42.2908 -5.89224
6.38041 -4.5895
19.5065 -57.4513
24.05 -84.5086
11.5925 45.5247
-11.5926 -1.91879
-29.2152 -35.5977
-1.92563 63.8841
19.8686 76.2722
22.6728 45.5179
-58.2754 42.5953
28.556 -21.0314
11.5918 25.1126
-0.517653 -0.237724
-11.4712 -47.3073
-10.9575 -3.58754
-11.7835 -9.62802
-17.7895 -22.0821
-39.5114 -3.46026
6.53008 -1.61416
The breeding values out
can be used as criteria for the selection directly.
julia> select(cohort, 5, out)
[ Info: --------- Selection Summary ---------
[ Info: Select 5 individuals out of 20 individuals
[ Info: Selection differential (P): [0.783 1.192]
[ Info: Selection response (G): [0.344 1.063]
┌ Info:
│ Residual_Variance =
│ 2×2 Array{Float64,2}:
│ 1.0 0.0
└ 0.0 1.0
[ Info: --------- Offsprings Summary ---------
[ Info: Cohort (5 individuals)
[ Info:
[ Info: Mean of breeding values:
[ Info: [-0.552 0.367]
[ Info:
[ Info: Variance of breeding values:
[ Info: [0.181 0.539]
Conditions of the evaluation can be further defined by h2
or ve
.
julia> out = genetic_evaluation(cohort, ve = [1 0.5
0.5 1])
Set return_out = true
to obtain the complete JWAS
outputs.
julia> out = genetic_evaluation(cohort, return_out = true)
Dict{Any,Any} with 7 entries:
"EBV_y2" => 20×3 DataFrame…
"EBV_y1" => 20×3 DataFrame…
"heritability" => 2×3 DataFrame…
"location parameters" => 2×5 DataFrame…
"residual variance" => 4×3 DataFrame…
"marker effects geno" => 40×5 DataFrame…
"genetic_variance" => 4×3 DataFrame…
Customized Phenotypes and Factors.
Obtain JWAS-compatible dataframe.
julia> dt_p = get_phenotypes(cohort, "JWAS")
Assign un-phenotyped individuals
julia> idx = 3:6
julia> allowmissing!(dt_p);
julia> dt_p[idx, 2:end] .= missing;
julia> first(dt_p, 10)
10×3 DataFrame
Row │ ID y1 y2
│ String? Float64? Float64?
─────┼──────────────────────────────────────────
1 │ 1 -0.0933375 -0.882781
2 │ 2 -1.50748 -2.12898
3 │ 3 missing missing
4 │ 4 missing missing
5 │ 5 missing missing
6 │ 6 missing missing
7 │ 7 -0.431361 -0.624666
8 │ 8 -1.17867 0.415607
9 │ 9 0.266733 1.02123
10 │ 10 -1.15588 0.737982
Provide customized phenotypes for the evaluation
julia> out = genetic_evaluation(dams, dt_p)
It's equivalent to set idx_missing_p = 3:6
for the missing phenotypes.
julia> out = genetic_evaluation(dams, dt_p, idx_missing_p = 3:6)
The marker IDs are set to 1,2,...,#markers
The individual IDs is set to 1,2,...,#observations
Genotype informatin:
#markers: 1000; #individuals: 20
The folder results is created to save results.
Checking genotypes...
Checking phenotypes...
Individual IDs (strings) are provided in the first column of the phenotypic data.
Phenotypes for 16 individuals are used in the analysis.These individual IDs are saved in the file IDs_for_individuals_with_phenotypes.txt.
We can simulated factor_1
and factor_2
as fixed and random effects, respectively. And we use both factor_1
and factor_2
to fit y1
, and factor_1
alone to fit y2
.
julia> dt_p[:, "factor_1"] = [i for i in 1:4 for j in 1:5];
julia> dt_p[:, "factor_2"] = [i for i in 1:2 for j in 1:10];
julia> out = genetic_evaluation(cohort, dt_p,
model_equation="y1 = intercept + factor_1 + factor_2
y2 = intercept + factor_1",
random_iid="factor_2",
return_out=true)
factor_2 is not found in model equation 2.
The marker IDs are set to 1,2,...,#markers
The individual IDs is set to 1,2,...,#observations
Genotype informatin:
#markers: 1000; #individuals: 20
The folder results is created to save results.
Checking genotypes...
Checking phenotypes...
Individual IDs (strings) are provided in the first column of the phenotypic data.
Phenotypes for 16 individuals are used in the analysis.These individual IDs are saved in the file IDs_for_individuals_with_phenotypes.txt.
Prior information for random effect variance is not provided and is generated from the data.
Pi (Π) is not provided.
Pi (Π) is generated assuming all markers have effects on all traits.
A Linear Mixed Model was build using model equations:
y1 = intercept + factor_1 + factor_2
y2 = intercept + factor_1
Model Information:
Term C/F F/R nLevels
intercept factor fixed 1
factor_1 factor fixed 4
factor_2 factor random 2
MCMC Information:
chain_length 100
burnin 0
starting_value true
printout_frequency 101
output_samples_frequency 1
constraint false
missing_phenotypes true
update_priors_frequency 0
seed false
Hyper-parameters Information:
random effect variances (y1:factor_2):
0.45
residual variances:
1.0f0 0.0f0
0.0f0 1.0f0
Genomic Information:
complete genomic data (i.e., non-single-step analysis)
Genomic Category geno
Method GBLUP
genetic variances (genomic):
1.0 0.0
0.0 1.0
estimateScale false
Degree of freedom for hyper-parameters:
residual variances: 6.000
random effect variances: 5.000
marker effect variances: 6.000
The version of Julia and Platform in use:
Julia Version 1.5.4
Commit 69fcb5745b (2021-03-11 19:13 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin18.7.0)
CPU: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
Environment:
JULIA_EDITOR = code
JULIA_NUM_THREADS =
The analysis has finished. Results are saved in the returned variable and text files. MCMC samples are saved in text files.
Dict{Any,Any} with 7 entries:
"EBV_y2" => 20×3 DataFrame…
"EBV_y1" => 20×3 DataFrame…
"heritability" => 2×3 DataFrame…
"location parameters" => 12×5 DataFrame…
"residual variance" => 4×3 DataFrame…
"marker effects geno" => 32×5 DataFrame…
"genetic_variance" => 4×3 DataFrame…
References
Cheng, H., R. Fernando, and D. Garrick, 2018 JWAS: Julsi- taaimnoptlheemrentation of Whole-genome Analysis Soft- ware. Proceedings of the world congress on genetics applied to livestock production 11: 859.
Erbe, M., B. J. Hayes, L. K. Matukumalli, S. Goswami, P. J. Bow- man, et al., 2012 Improving accuracy of genomic predictions within and between dairy cattle breeds with imputed high- density single nucleotide polymorphism panels. Journal of Dairy Science 95: 4114–4129.
Fernando, R. L., J. C. Dekkers, and D. J. Garrick, 2014 A class of Bayesian methods to combine large numbers of genotyped and non-genotyped animals for whole-genome analyses. Ge- netics Selection Evolution 46: 50.
Gianola, D. and R. L. Fernando, 2020 A Multiple-Trait Bayesian Lasso for Genome-Enabled Analysis and Prediction of Com- plex Traits. Genetics 214: 305–331, Publisher: Genetics Section: Investigations.
Habier, D., R. L. Fernando, and J. C. M. Dekkers, 2007 The Im- pact of Genetic Relationship Information on Genome-Assisted Breeding Values. Genetics 177: 2389–2397, Publisher: Genetics Section: Investigations.
Habier, D., R. L. Fernando, K. Kizilkaya, and D. J. Garrick, 2011 Extension of the bayesian alphabet for genomic selection. BMC Bioinformatics 12: 186.
Henderson, C. R., 1984 Applications of linear models in animal breeding.. Publisher: University of Guelph.
Kizilkaya, K., R. L. Fernando, and D. J. Garrick, 2010 Genomic prediction of simulated multibreed and purebred performance using observed fifty thousand single nucleotide polymor- phism genotypes. Journal of Animal Science 88: 544–551.
Legarra, A., I. Aguilar, and I. Misztal, 2009 A relationship matrix including full pedigree and genomic information. Journal of Dairy Science 92: 4656–4663.
Meuwissen, T. H. E., B. J. Hayes, and M. E. Goddard, 2001 Pre- diction of Total Genetic Value Using Genome-Wide Dense Marker Maps. Genetics 157: 1819–1829, Publisher: Genetics Section: Investigations.
Moser, G., S. H. Lee, B. J. Hayes, M. E. Goddard, N. R. Wray, et al., 2015 Simultaneous Discovery, Estimation and Prediction Analysis of Complex Traits Using a Bayesian Mixture Model. PLOS Genetics 11: e1004969, Publisher: Public Library of Science.
Park, T. and G. Casella, 2008 The Bayesian Lasso. Journal of the American Statistical Association 103: 681–686.
VanRaden, P. M., 2008 Efficient Methods to Compute Genomic Predictions. Journal of Dairy Science 91: 4414–4423.