Library
Index
Main.XSim.Cohort
Main.XSim.breed
Main.XSim.build_genome
Main.XSim.build_phenome
Main.XSim.genetic_evaluation
Main.XSim.mate
Main.XSim.select
Contents
Main.XSim.build_genome
— FunctionQuick Start
Quick setup by assigning number of markers
and chromosomes
.
build_genome(;n_loci ::Int64=-1,
n_chr ::Int64=10,
species ::String="none",
args...)
Arguments
n_loci
: Number of simulated markers for each chromosomen_chr
: Number of simulated chromosome with length of 100 centimorganspecies
: Infer genetic position (Morgan) by pre-load linkage maps, available species are: ["cattle", and "pig"]
Examples
julia> build_genome(n_chr = 2,
n_loci = 10000)
[ Info: --------- Genome Summary ---------
[ Info: Number of Chromosome : 2
[ Info:
[ Info: Chromosome Length (cM): 200.0
[ Info: [100.0, 100.0]
[ Info:
[ Info: Number of Loci : 20000
[ Info: [10000, 10000]
[ Info:
[ Info: Genotyping Error : 0.0
[ Info: Mutation Rate : 0.0
[ Info:
Define genome by a file or a DataFrame
Define genome by providing a formatted dataframe or a path to the file.
build_genome(dt ::DataFrame;
species ::String="none",
args...)
build_genome(filename::String;
species ::String="none",
args...)
Arguments
dt
: ADataFrame
with required columns ofchr
andcM
, orchr
andbp
ifspecies
is provided for the inference.filename
: A filepath to the file containing genome information.species
: Adjust genetic position (Morgan) by pre-load linkage maps, available species are: ["cattle", and "pig"]
Example of the DataFrame
4×7 DataFrame
Row │ id chr bp cM MAF eff_1 eff_2
│ String Int64 Int64 Float64 Float64 Float64 Float64
─────┼────────────────────────────────────────────────────────────
1 │ snp_1 1 1818249 50.8 0.5 0.1 0.0
2 │ snp_2 1 6557697 80.3 0.5 0.0 0.0
3 │ snp_3 2 2298800 39.2 0.5 0.2 0.0
4 │ snp_4 2 5015698 66.3 0.5 0.0 0.5
Examples
By a filepath
julia> build_genome("path/map.csv";
rate_mutation=0.005, rate_error=0.01)
or a dataframe
julia> using DataFrames
julia> data = CSV.read("path/map.csv", DataFrame)
julia> build_genome(data;
rate_mutation=0.005, rate_error=0.01)
[ Info: --------- Genome Summary ---------
[ Info: Number of Chromosome : 2
[ Info:
[ Info: Chromosome Length (cM): 146.6
[ Info: [80.3, 66.3]
[ Info:
[ Info: Number of Loci : 4
[ Info: [2, 2]
[ Info:
[ Info: Genotyping Error : 0.01
[ Info: Mutation Rate : 0.005
[ Info:
Use cattle genome as reference to infer the genetic positions
julia> build_genome("path/map.csv"; species="cattle")
[ Info: Arias,J.A. et al. (2009) A high density linkage map of the bovine genome. BMC Genetics, 10, 18.
[ Info: Reference Genome : Btau 4.0
[ Info: SNP Chip : Affymetrix GeneChip Bovine Mapping 10K SNP kit
┌ Warning: The provided genetic distances will be replaced with ones infered from preloaded linkage maps
└ @ XSim ~/Dropbox/projects/XSim/src/objects/global.jl:118
[ Info: --------- Genome Summary ---------
[ Info: Number of Chromosome : 2
[ Info:
[ Info: Chromosome Length (cM): 16.8
[ Info: [15.1, 1.7]
[ Info:
[ Info: Number of Loci : 4
[ Info: [2, 2]
[ Info:
[ Info: Genotyping Error : 0.0
[ Info: Mutation Rate : 0.0
[ Info:
Explict Definition
Define genome by providing genetic information of each loci explicitly.
build_genome(chromosome ::Array{Int64, 1},
bp ::Array{Int64, 1},
cM ::Array{Float64, 1},
maf ::Array{Float64, 1};
rate_mutation ::Float64=0.0,
rate_error ::Float64=0.0)
Arguments
chromosome
: Chromosome codesbp
: Physical positionscM
: Genetic positionsmaf
: Minor allele frequenciesrate_mutation
: Mutation raterate_error
: Error rate of genotyping
Examples
julia> ch = [1, 1, 2, 2, 2]
julia> bp = [130, 205, 186, 503, 780]
julia> cM = [85.7, 149.1, 37.4, 83.6, 134.3]
julia> maf = [0.5, 0.5, 0.5, 0.5, 0.5]
julia> build_genome(ch, bp, cM, maf)
[ Info: --------- Genome Summary ---------
[ Info: Number of Chromosome : 2
[ Info:
[ Info: Chromosome Length (cM): 283.4
[ Info: [149.1, 134.3]
[ Info:
[ Info: Number of Loci : 5
[ Info: [2, 3]
[ Info:
[ Info: Genotyping Error : 0.0
[ Info: Mutation Rate : 0.0
[ Info:
Main.XSim.build_phenome
— FunctionQuick Start
Quick setup by assigning number of QTL
.
build_phenome(n_qtls ::Union{Array{Int64, 1}, Int64};
args...)
Arguments
n_qtls
: Number of simulatedQTLs
. It can be an array of integers for multiple traits.vg
: Genetic (co)variances ofQTLs
.h2
: Heritability of simulated traits. This will define the residual (co)variances.
Examples
Single trait
julia> build_phenome(10)
[ Info: --------- Phenome Summary ---------
[ Info: Number of Traits : 1
[ Info: Heritability (h2) : [0.5]
┌ Info:
│ Genetic_Variance =
│ 1×1 Array{Float64,2}:
└ 1.0
┌ Info:
│ Residual_Variance =
│ 1×1 Array{Float64,2}:
└ 1.0
[ Info: Number of QTLs : [10]
Multi-trait with additional information
julia> build_phenome([10, 15];
vg = [1 .5
.5 1],
h2 = [.3, .8])
[ Info: --------- Phenome Summary ---------
[ Info: Number of Traits : 2
[ Info: Heritability (h2) : [0.3, 0.8]
┌ Info:
│ Genetic_Variance =
│ 2×2 Array{Float64,2}:
│ 1.0 0.5
└ 0.5 1.0
┌ Info:
│ Residual_Variance =
│ 2×2 Array{Float64,2}:
│ 2.33333 0.0
└ 0.0 0.25
[ Info: Number of QTLs : [10 25]
────────────────────────────────────────────────────────────────
Define phenome by a file or a DataFrame
Define genome by providing a formatted dataframe or a path to the file.
build_phenome(dt ::DataFrame; args...)
build_phenome(filename ::String; args...)
Arguments
dt
: ADataFrame
with required columns ofeff_
prefixed specifying marker effects.filename
: A filepath to the file containing phenome information.
Example of the DataFrame
4×7 DataFrame
Row │ id chr bp cM MAF eff_1 eff_2
│ String Int64 Int64 Float64 Float64 Float64 Float64
─────┼────────────────────────────────────────────────────────────
1 │ snp 1 1 1818249 50.8 0.5 0.1 0.0
2 │ snp 2 1 6557697 80.3 0.5 0.0 -0.3
3 │ snp_3 2 2298800 39.2 0.5 0.2 0.0
4 │ snp 4 2 5015698 66.3 0.5 0.0 0.5
Examples
By a filepath
julia> build_phenome("path/map.csv", h2 = [0.3, 0.5])
or a dataframe
julia> using DataFrames
julia> data = CSV.read("path/map.csv", DataFrame)
julia> build_phenome(data, h2 = [0.3, 0.5])
[ Info: --------- Phenome Summary ---------
[ Info: Number of Traits : 2
[ Info: Heritability (h2) : [0.3, 0.5]
┌ Info:
│ Genetic_Variance =
│ 2×2 Array{Float64,2}:
│ 1.0 0.0
└ 0.0 1.0
┌ Info:
│ Residual_Variance =
│ 2×2 Array{Float64,2}:
│ 2.33333 0.0
└ 0.0 1.0
[ Info: Number of QTLs : [2 2]
────────────────────────────────────────────────────────────────
Define phenome by a matrix of QTL effects
build_phenome(QTL_effects ::Union{Array{Float64}, SparseMatrixCSC}; args...)
Arguments
QTL_effects
: A matrix storing marker effects with the dimension of individuals by markers.
Examples
julia> effects = [0.1 0.0
0.0 -0.3
0.2 0.0
0.0 0.5]
julia> build_phenome(effects)
[ Info: --------- Phenome Summary ---------
[ Info: Number of Traits : 2
[ Info: Heritability (h2) : [0.5, 0.5]
┌ Info:
│ Genetic_Variance =
│ 2×2 Array{Float64,2}:
│ 1.0 0.0
└ 0.0 1.0
┌ Info:
│ Residual_Variance =
│ 2×2 Array{Float64,2}:
│ 1.0 0.0
└ 0.0 1.0
[ Info: Number of QTLs : [2 2]
It's also possible to add additional information such as heritability.
julia> build_phenome(effects, h2=[0.1, 0.8])
[ Info: --------- Phenome Summary ---------
[ Info: Number of Traits : 2
[ Info: Heritability (h2) : [0.1, 0.8]
┌ Info:
│ Genetic_Variance =
│ 2×2 Array{Float64,2}:
│ 1.0 0.0
└ 0.0 1.0
┌ Info:
│ Residual_Variance =
│ 2×2 Array{Float64,2}:
│ 9.0 0.0
└ 0.0 0.25
[ Info: Number of QTLs : [2 2]
Main.XSim.Cohort
— TypeInitialize a cohort by population size
Cohort(n::Int64=0)
Arguments
n
: An integer to assign the population size.
Examples
julia> cohort = Cohort(5)
[ Info: Cohort (5 individuals)
[ Info:
[ Info: Mean of breeding values:
[ Info: [1.265 1.697]
[ Info:
[ Info: Variance of breeding values:
[ Info: [1.6 1.4]
──────────────────────────────────────────────────────────────
Initialize a cohort by genotypes/haplotypes files
Cohort(genetic_data ::Union{DataFrame, Array{Int64}}; args...)
Cohort(filename ::String; args...)
Arguments
genetic_data
: Adataframe
/2D-array
that stores genotypes/haplotypes in the dimension of individuals by markers.filename
: Afilepath
to a file storing genotypes/haplotypes data.n
: Number of lines to be loaded from the file. The default value is-1
and the entire file will be loaded.random
: By default it's set totrue
to randomly selectn
lines (individuals) from the file to generate the cohort.alter_maf
: It will update MAF based on the provided genotypes if it's set totrue
(default).
Example of the demo_genotypes.csv
and demo_haplotypes.csv
Both demo files store marker information for 5 individuals and 4 markers. Use DATA("demo_genotypes.csv")
to interact with demo files.
# demo_genotypes.csv
# rows: individuals, columns: markers
# Homozygote is coded as 0 and 2, otherwise is coded as 1
2,0,0,1
0,0,1,0
0,1,0,2
1,1,0,2
2,0,2,0
# demo_haplotypes.csv
# rows: individuals, columns: alleles
# Reference allele is coded as 0, otherwise is coded as 1
1,1,0,0,0,0,1,0
0,0,0,0,1,0,0,0
0,0,0,1,0,0,1,1
1,0,1,0,0,0,1,1
1,1,0,0,1,1,0,0
Example
# Load entire file
julia> cohort = Cohort("demo_haplotypes.csv")
julia> get_genotypes(cohort)
5×4 Array{Int64,2}:
2 0 0 1
2 0 2 0
0 0 1 0
1 1 0 2
0 1 0 2
# Randomly load 3 individuals with a dataframe.
julia> data = DATA("demo_haplotypes.csv", header=false)
julia> cohort = Cohort(data, random=true, n=3)
julia> get_genotypes(cohort)
3×4 Array{Int64,2}:
2 0 2 0
0 1 0 2
1 1 0 2
# Replace marker MAF by the provided file
julia> cohort = Cohort("demo_haplotypes.csv", alter_maf=true)
[ Info: MAF has been updated based on provided haplotypes/genotypes
[ Info: Cohort (5 individuals)
[ Info:
[ Info: Mean of breeding values:
[ Info: [1.418]
[ Info:
[ Info: Variance of breeding values:
[ Info: [2.012]
──────────────────────────────────────────────────────────────
Functions that insepct Cohort
properties:
All the listed functions can take a keyword argument ID=true
to insert individuals' IDs as the first column.
Genotypes
Genotype matrix in the dimension of individuals
by markers
julia> get_genotypes(cohort)
5×4 LinearAlgebra.Adjoint{Int64,Array{Int64,2}}:
0 0 1 0
2 0 2 0
2 0 0 1
0 1 0 2
1 1 0 2
QTLs
QTLs matrix in the dimension of individuals
by markers
julia> get_QTLs(cohort)
5×3 Array{Int64,2}:
2 2 0
0 0 2
0 1 0
1 0 2
2 0 1
Breeding values
Breeding values in the dimenstion individuals
by traits
julia> get_BVs(cohort)
5×2 LinearAlgebra.Adjoint{Float64,Array{Float64,2}}:
1.26491 0.0
3.79473 0.0
1.26491 1.21268
0.0 1.69775
0.632456 1.69775
Pedigree
Pedigree matrix, listed columns are in the order of individuals' ID, sire ID, and dam ID.
julia> get_pedigree(cohort)
5×3 LinearAlgebra.Adjoint{Int64,Array{Int64,2}}:
1 0 0
2 0 0
3 0 0
4 0 0
5 0 0
Minor Allele Frequencies (MAF)
In the case where we have 3 QTLs out of 4 markers, we want to compare their allel frequencies.
julia> get_MAF(cohort)
4-element Array{Float64,1}:
0.5
0.2
0.3
0.5
Phenotypes
Simulate cohort phenotypes based on the defined phenome
. h2
and ve
can be assigned specifically for this one-time simulation.
julia> get_phenotypes(cohort)
5×1 Array{Float64,2}:
1.1126064336992942
-0.8337021175232547
-0.363621019381922
4.042256656472762
1.7828800511223049
Main.XSim.mate
— FunctionMating function
mate(cohort_A ::Cohort,
cohort_B ::Cohort;
nA ::Int64=cohort_A.n,
nB_per_A ::Int64=1,
n_per_mate ::Int64=1,
replace_A ::Bool =false,
replace_B ::Bool =false,
ratio_malefemale ::Union{Float64, Int64}=0,
scheme ::String ="none",
args...)
mate(cohort::Cohort; args...) = mate(cohort, cohort; args...)
Arguments
Positional arguments
cohort_A
: Acohort
object that is treated as common mating parents.cohort_B
: Acohort
object that is a mating pool from which individuals are sampled to mate withcohort_A
.
Keyword arguments
nA
:nA
individuals will be sampled fromcohort_A
and treated as common parents.nB_per_A
:nB_per_A
individuals sampled fromcohort_B
will mate with each individual fromcohort_A
.n_per_mate
:n_per_mate
progenies will be reproduced from each pair of mating parent.replace_A
: Whether the sampling is replacable incohort_A
.replace_B
: Whether the sampling is replacable incohort_B
.ratio_malefemale
: By default, two cohorts which are male and female progenies will be returned.ratio_malefemale
defined the progenies ratio of males over females. Ifratio_malefemale=0
, only one cohort will be returned.scheme
: Available options are ["random", "diallel cross", "selfing", "DH"]. See the examples for more details.
Outputs
By default, two cohort
objects will be returned. The first cohort
is assumed to be male progenies and the other cohort
are female progenies. The size of two cohorts will folow the ratio raiot_malefemale
. When ratio_malefemale
is set to 0
, only one cohort
will be returned.
Example
Random mating (Default)
Initialize cohorts
julia> cohort_A = Cohort(5)
julia> cohort_B = Cohort(10)
Define mating events
julia> args = Dict(:nA => cohort_A.n,
:nB_per_A => 1,
:replace_A => false,
:replace_B => false,
:n_per_mate => 1)
julia> progenies = mate(cohort_A, cohort_B; args...)
# Equivalent
julia> progenies = mate(cohort_A, cohort_B)
# Equivalent
julia> progenies = mate(cohort_A, cohort_B; scheme="random")
# Equivalent
julia> progenies = cohort_A * cohort_B
Check the pedigree to see if the mating goes as desired.
julia> get_pedigree(progenies)
5×3 LinearAlgebra.Adjoint{Int64,Array{Int64,2}}:
19 1 8
16 2 6
17 3 10
20 4 15
18 5 14
Diallel cross
Initialize cohorts
julia> cohort_A = Cohort(2)
julia> cohort_B = Cohort(5)
Define mating events
julia> args = Dict(:nA => sires.n,
:nB_per_A => dams.n,
:replace_A => false,
:replace_B => false,
:n_per_mate => 1,
:ratio_malefemale=> 1)
julia> male, female = mate(cohort_A, cohort_B; args...)
# Equivalent
julia> male, female = mate(cohort_A, cohort_B; scheme="diallel cross")
Check the pedigree to see if the mating goes as desired.
julia> get_pedigree(male + female)
10×3 LinearAlgebra.Adjoint{Int64,Array{Int64,2}}:
12 2 7
10 2 6
11 2 4
14 1 3
15 1 5
9 2 5
13 1 6
17 1 4
8 2 3
16 1 7
Selfing
Initialize cohorts
julia> parents = Cohort(5)
In the selfing scheme, only one cohort
is required.
julia> args = Dict(:nA => 3,
:replace_A => false,
:n_per_mate => 5,
:scheme => "selfing")
julia> progenies = mate(parents; args...)
Inspect the pedigree to verify the mating behavior
julia> get_pedigree(progenies)
15×3 LinearAlgebra.Adjoint{Int64,Array{Int64,2}}:
6 4 4
7 4 4
8 4 4
9 4 4
10 4 4
11 1 1
12 1 1
13 1 1
14 1 1
15 1 1
16 5 5
17 5 5
18 5 5
19 5 5
20 5 5
Main.XSim.select
— FunctionSelection function
select(cohort ::Cohort,
n ::Int64,
criteria ::Union{String, Array} = "phenotypes";
h2 ::Union{Array{Float64}, Float64}=GLOBAL("h2"),
ve ::Union{Array{Float64}, Float64}=GLOBAL("Ve"),
weights ::Array{Float64, 1} =[1.0],
return_log ::Bool =false,
is_random ::Bool =false,
silent ::Bool =GLOBAL("silent")
select(cohort::Cohort, ratio::Float64; args...)
Arguments
Positional arguments
cohort
: Acohort
from which individuals are selected.n
:n
individuals are selected.ratio
:ratio
portion of individuals are selected.criteria
:Criteria
that will be used for the selecition. Default "phenotypes", the options are ["phenotypes", "GBLUP", array]. If set to "GBLUP", a genetic evaluation is carried out byJWAS
and the estimated breeding values will be thecriteria
. It's also avaialbe to provdie thecriteria
(e.g., phenotypes matrix) directly for the selection.
Keyword arguments
h2
: The heritabilityh2
of the simulated phenotypes.ve
: The residual covarianceve
of the simulated phenotypes.weight
: Linear coefficients of traits for the selection. The selection is more sensitive to traits with greaterweight
. Negativereturn_log
: Defaultfalse
. Settrue
to return selection differential and selection response besides the selected cohort.silent
: Defaultfalse
. Settrue
to mute the log messages.
Outputs
A selected cohort
object will be returned. If return_log
is set to true
, a dictionary
object containing the selected cohort, selection differential, and selection response will be returned.
─────────────────────────────────────────────────────────
Example
Single Trait Selection
Set demo genome and phenome with single traits controlled by 50 QTLs.
julia> build_demo()
julia> build_phenome(50)
julia> summary()
[ Info: --------- Genome Summary ---------
[ Info: Number of Chromosome : 10
[ Info:
[ Info: Chromosome Length (cM): 1500.0
[ Info: [150.0, 150.0, 150.0, 150.0, 150.0, 150.0, 150.0, 150.0, 150.0, 150.0]
[ Info:
[ Info: Number of Loci : 1000
[ Info: [100, 100, 100, 100, 100, 100, 100, 100, 100, 100]
[ Info:
[ Info: Genotyping Error : 0.0
[ Info: Mutation Rate : 0.0
[ Info:
[ Info: --------- Phenome Summary ---------
[ Info: Number of Traits : 1
[ Info: Heritability (h2) : [0.5]
┌ Info:
│ Genetic_Variance =
│ 1×1 Array{Float64,2}:
└ 1.0
┌ Info:
│ Residual_Variance =
│ 1×1 Array{Float64,2}:
└ 1.0
[ Info: Number of QTLs : [50]
Initialize a cohort with 100 individuals
julia> cohort = Cohort(100)
Select 30 individuals
# Select top 30 individuals
julia> cohort_s = select(cohort, 30)
# Equivalent
julia> cohort_s = select(cohort, 0.3)
[ Info: --------- Selection Summary ---------
[ Info: Select 30 individuals out of 100 individuals
[ Info: Selection differential (P): [1.174]
[ Info: Selection response (G): [0.843]
┌ Info:
│ Residual_Variance =
│ 1×1 Array{Float64,2}:
└ 1.0
[ Info: --------- Offsprings Summary ---------
[ Info: Cohort (30 individuals)
[ Info:
[ Info: Mean of breeding values:
[ Info: [1.448]
[ Info:
[ Info: Variance of breeding values:
[ Info: [0.367]
Assign Heritability h2
or Residual Covariance ve
# Assign heritability
julia> progenies = select(cohort, 30, h2=0.1)
# Equivalent in the case where genetic variance `vg` is 1.0
julia> progenies = select(cohort, 30, ve=9.0)
[ Info: --------- Selection Summary ---------
[ Info: Select 30 individuals out of 100 individuals
[ Info: Selection differential (P): [1.182]
[ Info: Selection response (G): [0.338]
┌ Info:
│ Residual_Variance =
│ 1×1 Array{Float64,2}:
└ 9.0
[ Info: --------- Offsprings Summary ---------
[ Info: Cohort (30 individuals)
[ Info:
[ Info: Mean of breeding values:
[ Info: [0.956]
[ Info:
[ Info: Variance of breeding values:
[ Info: [0.643]
Negative Selection
Set is_positive=false
to rank individuals in ascending order
julia> progenies = select(cohort, 30, is_positive=false)
[ Info: --------- Selection Summary ---------
[ Info: Select 30 individuals out of 100 individuals
[ Info: Selection differential (P): [-1.19]
[ Info: Selection response (G): [-0.89]
┌ Info:
│ Residual_Variance =
│ 1×1 Array{Float64,2}:
└ 1.0
[ Info: --------- Offsprings Summary ---------
[ Info: Cohort (30 individuals)
[ Info:
[ Info: Mean of breeding values:
[ Info: [-0.24]
[ Info:
[ Info: Variance of breeding values:
[ Info: [0.566]
Random Selection
julia> progenies = select(cohort, 30, is_random=true)
[ Info: --------- Selection Summary ---------
[ Info: Select 30 individuals out of 100 individuals
[ Info: Selection differential (P): [-0.06]
[ Info: Selection response (G): [-0.191]
┌ Info:
│ Residual_Variance =
│ 1×1 Array{Float64,2}:
└ 1.0
[ Info: --------- Offsprings Summary ---------
[ Info: Cohort (30 individuals)
[ Info:
[ Info: Mean of breeding values:
[ Info: [0.441]
[ Info:
[ Info: Variance of breeding values:
[ Info: [0.946]
Selection wiht Multiple Parameters
It's possible specify multiple parameters described above in one selection. User can either enclose parameters as keyword arguments, or pass them through a dictionary
object.
# Keyword args
julia> progenies = select(cohort, 30, h2=0.3, is_positive=false)
# Equivalent
julia> args = Dict(:h2=>0.3,
:is_positive=>false)
julia> progenies = select(cohort, 30; args...)
[ Info: --------- Selection Summary ---------
[ Info: Select 30 individuals out of 100 individuals
[ Info: Selection differential (P): [-1.086]
[ Info: Selection response (G): [-0.486]
┌ Info:
│ Residual_Variance =
│ 1×1 Array{Float64,2}:
└ 2.3333333333333335
[ Info: --------- Offsprings Summary ---------
[ Info: Cohort (30 individuals)
[ Info:
[ Info: Mean of breeding values:
[ Info: [0.154]
[ Info:
[ Info: Variance of breeding values:
[ Info: [0.818]
─────────────────────────────────────────────────────────
Multi-Trait Selection
Set demo genome and phenome with single traits controlled by 50 QTLs.
julia> build_demo()
julia> summary()
[ Info: --------- Genome Summary ---------
[ Info: Number of Chromosome : 10
[ Info:
[ Info: Chromosome Length (cM): 1500.0
[ Info: [150.0, 150.0, 150.0, 150.0, 150.0, 150.0, 150.0, 150.0, 150.0, 150.0]
[ Info:
[ Info: Number of Loci : 1000
[ Info: [100, 100, 100, 100, 100, 100, 100, 100, 100, 100]
[ Info:
[ Info: Genotyping Error : 0.0
[ Info: Mutation Rate : 0.0
[ Info:
[ Info: --------- Phenome Summary ---------
[ Info: Number of Traits : 2
[ Info: Heritability (h2) : [0.5, 0.5]
┌ Info:
│ Genetic_Variance =
│ 2×2 Array{Float64,2}:
│ 1.0 0.0
└ 0.0 1.0
┌ Info:
│ Residual_Variance =
│ 2×2 Array{Float64,2}:
│ 1.0 0.0
└ 0.0 1.0
[ Info: Number of QTLs : [3 8]
Initialize a cohort with 100 individuals
julia> cohort = Cohort(100)
Assign Heritabilities for Multiple Traits
julia> progenies = select(cohort, 30, h2=[0.9, 0.3])
[ Info: --------- Selection Summary ---------
[ Info: Select 30 individuals out of 100 individuals
[ Info: Selection differential (P): [0.468 1.028]
[ Info: Selection response (G): [0.383 0.636]
┌ Info:
│ Residual_Variance =
│ 2×2 Array{Float64,2}:
│ 0.111111 0.0
└ 0.0 2.33333
[ Info: --------- Offsprings Summary ---------
[ Info: Cohort (30 individuals)
[ Info:
[ Info: Mean of breeding values:
[ Info: [-0.889 0.28]
[ Info:
[ Info: Variance of breeding values:
[ Info: [0.947 0.625]
Assign Trait Correlations via Residual Covariance
julia> progenies = select(cohort, 30, ve=[1 0.3
0.3 1])
[ Info: --------- Selection Summary ---------
[ Info: Select 30 individuals out of 100 individuals
[ Info: Selection differential (P): [0.866 0.925]
[ Info: Selection response (G): [0.662 0.762]
┌ Info:
│ Residual_Variance =
│ 2×2 Array{Float64,2}:
│ 1.0 0.3
└ 0.3 1.0
[ Info: --------- Offsprings Summary ---------
[ Info: Cohort (30 individuals)
[ Info:
[ Info: Mean of breeding values:
[ Info: [-0.608 0.406]
[ Info:
[ Info: Variance of breeding values:
[ Info: [0.549 0.476]
Derive Selection Index for Multiple Traits
Assigning a vector to the parameter weights
to derive a selection index which is a linear combintation of the weights and the phenotypes. In this example, we demonstrate two traits with the heritability of 0.3 and 0.8, respectively. And we can select traits with more weight on the second trait which is more heritable, and negatively select the first trait.
julia> progenies = select(cohort, 30, h2=[.3, .8], weights=[-0.1, 0.9])
[ Info: --------- Selection Summary ---------
[ Info: Select 30 individuals out of 100 individuals
[ Info: Selection differential (P): [-0.318 1.027]
[ Info: Selection response (G): [-0.233 0.869]
┌ Info:
│ Residual_Variance =
│ 2×2 Array{Float64,2}:
│ 2.33333 0.0
└ 0.0 0.25
[ Info: --------- Offsprings Summary ---------
[ Info: Cohort (30 individuals)
[ Info:
[ Info: Mean of breeding values:
[ Info: [-1.508 0.513]
[ Info:
[ Info: Variance of breeding values:
[ Info: [1.053 0.458]
Main.XSim.breed
— FunctionWrap-up function: breed()
breed(cohort_A ::Cohort,
cohort_B ::Cohort;
n_gens ::Int64=1,
n_select ::Int64=cohort_A.n + cohort_B.n,
n_select_A ::Int64=cohort_A.n,
n_select_B ::Int64=cohort_B.n,
select_all_gens ::Bool=false,
args...)
breed(cohort::Cohort, n_gens::Int64, args...) =
breed(cohort, cohort, n_gens; args...)
Arguments
Positional arguments
cohort_A
: Acohort
object that is treated as common mating parents. It's asssumed to be sires/males parents.cohort_B
: Acohort
object that is a mating pool from which individuals are sampled to mate withcohort_A
. It's assumed to be dams/female parents.
Keywords arguments
n_gens
: An integer specifying how many mate-select generations are carried out.n_select
: Used whenratio_malefemale
is set to0
.n_select
individuals are selected as parents for the next generation.n_select_A
: Used whenratio_malefemale
is not0
.n_select_A
will be selected as male parents for the next generation.n_select_B
: Used whenratio_malefemale
is not0
.n_select_B
will be selected as female parents for the next generation.select_all_gens
: Default "false" and parents are not included in the next generation pool for selections. Setselect_all_gens
to "true" if the selections consider individuals from all generations.
Outputs
By default, two cohort
objects will be returned. The first cohort
is assumed to be male progenies and the other cohort
are female progenies. The size of two cohorts will folow the ratio raiot_malefemale
. When ratio_malefemale
is set to 0
, only one cohort
will be returned.
Examples
We can have 10
sires and mate each sire with 5
dams for 5
generations. In each generation, we randomly select 10
male progenies as sires and all female progenies as dams for the next generation. We can derive such breeding scheme as below:
julia> args = Dict(# mating arguments
:nA => 10,
:nB_per_A => 5,
:n_per_mate => 2,
:ratio_malefemale => 1.0,
# selection arguments
:is_random => true,
# breeding arguments
:n_gens => 5,
:nA_select => 10,
:select_all_gens => true)
Simulate 10 sires and 50 dams as founders.
julia> sires = Founders(10)
julia> dams = Founders(50)
Breed cohorts based on the defined arguments.
julia> sires, dams = breed(sires, dams; args...)
[ Info: Gen 0 -> Mean of BVs: [1.665 2.745], Variance of BVs: [1.008 0.479]
[ Info: Gen 1 -> Mean of BVs: [1.719 2.715], Variance of BVs: [0.96 0.546]
[ Info: Gen 2 -> Mean of BVs: [1.754 2.777], Variance of BVs: [0.936 0.724]
[ Info: Gen 3 -> Mean of BVs: [1.766 2.796], Variance of BVs: [0.927 0.775]
[ Info: Gen 4 -> Mean of BVs: [1.773 2.771], Variance of BVs: [0.951 0.781]
[ Info: Gen 5 -> Mean of BVs: [1.813 2.751], Variance of BVs: [0.952 0.804]
([ Info: Cohort (60 individuals)
[ Info:
[ Info: Mean of breeding values:
[ Info: [1.855 2.593]
[ Info:
[ Info: Variance of breeding values:
[ Info: [0.884 0.697]
, [ Info: Cohort (300 individuals)
[ Info:
[ Info: Mean of breeding values:
[ Info: [1.805 2.782]
[ Info:
[ Info: Variance of breeding values:
[ Info: [0.969 0.822]
The result is equivalent to the following mate-select iterations:
julia> for _ in 1:5
males, females = mate(sires, dams, args...)
sires += select(males, n_sires, args...)
dams += females
end
Main.XSim.genetic_evaluation
— FunctionGenetic evaluation
genetic_evaluation(cohort ::Cohort,
phenotypes ::DataFrame=DataFrame();
model_equation ::String="",
covariates ::String="",
random_iid ::String="",
random_str ::String="",
methods ::String="GBLUP",
add_genotypes ::Bool=true,
idx_missing_p ::Any=[],
return_out ::Bool=false,
args...)
Arguments
cohort
: The evaluatedcohort
.phenotypes
: Adataframe
wiht columns ofid
, traits, and other studied factors.h2
: Default 0.5. Heritability for simulating cohort phenotypes if nophenotype
is provided.ve
: Default 1. Residual variance for simulating cohort phenotypes if nophenotype
is provided.model_equation
: Default "y ~ intercept". The equation for fitting the breeding values.covariates
: Specifies terms inphenotypes
as continuous factors. Must be included inmodel_equation
as well.random_iid
: Specifies terms inphenotypes
as random effects (i.i.d.). Must be included inmodel_equation
as well.random_str
: Specifies terms inphenotypes
as random effects (pedigree). Must be included inmodel_equation
as well.methods
: Default "GBLUP". Defines what models/methods used in the genetic evaluation.add_genotypes
: Defaulttrue
. Genotypes will be included in the equation.idx_missing_p
: A vector assigning which individuals are not phenotyped.return_out
: Defaultfalse
. Set totrue
to return the completeJWAS
outputs.
Outputs
A n
by t
matrix containing breeding values will be return if return_out = false
, where n
is number of individuals and t
is number of evaluated traits. A complete JWAS
outputs will be returned if return_out = true
.
Example of the phenotypes
dataframe
Row │ ID y1 y2 factor_1 factor_2
│ String Float64? Float64? Int64 Int64
─────┼─────────────────────────────────────────────────────────────
1 │ 1 0.88976 -0.0798048 1 1
2 │ 2 -0.783203 0.988616 1 1
3 │ 3 missing missing 1 1
4 │ 4 missing missing 1 1
5 │ 5 missing missing 1 1
6 │ 6 missing missing 1 2
7 │ 7 missing missing 1 2
8 │ 8 -1.76058 0.277289 1 2
9 │ 9 0.938871 2.57784 1 2
10 │ 10 0.37026 3.15993 1 2
11 │ 11 -1.91869 0.0935064 2 3
12 │ 12 -0.89847 1.8987 2 3
13 │ 13 1.69663 -0.949513 2 3
14 │ 14 3.48862 0.654378 2 3
15 │ 15 1.39615 1.80355 2 3
16 │ 16 2.31685 2.13446 2 4
17 │ 17 -3.81017 0.0186156 2 4
18 │ 18 -1.71216 -0.0976809 2 4
19 │ 19 1.80917 1.34104 2 4
20 │ 20 -0.504771 3.22665 2 4
Examples
We will use demo genome
and phenome
for the example. A cohort
with 20 individuals is simulated.
julia> build_demo()
[ Info: --------- Genome Summary ---------
[ Info: Number of Chromosome : 10
[ Info:
[ Info: Chromosome Length (cM):
[ Info: [150.0, 150.0, 150.0, 150.0, 150.0, 150.0, 150.0, 150.0, 150.0, 150.0]
[ Info:
[ Info: Number of Loci : 1000
[ Info: [100, 100, 100, 100, 100, 100, 100, 100, 100, 100]
[ Info:
[ Info: Genotyping Error : 0.0
[ Info: Mutation Rate : 0.0
[ Info:
[ Info: --------- Phenome Summary ---------
[ Info: Number of Traits : 2
[ Info: Heritability (h2) : [0.5, 0.5]
┌ Info:
│ Genetic_Variance =
│ 2×2 Array{Float64,2}:
│ 1.0 0.0
└ 0.0 1.0
┌ Info:
│ Residual_Variance =
│ 2×2 Array{Float64,2}:
│ 1.0 0.0
└ 0.0 1.0
[ Info: Number of QTLs : [3 8]
julia> cohort = Cohort(20)
[ Info: Cohort (20 individuals)
[ Info:
[ Info: Mean of breeding values:
[ Info: [-0.862 -0.913]
[ Info:
[ Info: Variance of breeding values:
[ Info: [0.814 1.448]
─────────────────────────────────────────────────────────
Basic usage
By default, GBLUP will be performed without providing any argument.
julia> out = genetic_evaluation(cohort)
20×2 Array{Float64,2}:
42.2908 -5.89224
6.38041 -4.5895
19.5065 -57.4513
24.05 -84.5086
11.5925 45.5247
-11.5926 -1.91879
-29.2152 -35.5977
-1.92563 63.8841
19.8686 76.2722
22.6728 45.5179
-58.2754 42.5953
28.556 -21.0314
11.5918 25.1126
-0.517653 -0.237724
-11.4712 -47.3073
-10.9575 -3.58754
-11.7835 -9.62802
-17.7895 -22.0821
-39.5114 -3.46026
6.53008 -1.61416
The breeding values out
can be used as criteria for the selection directly.
julia> select(cohort, 5, out)
[ Info: --------- Selection Summary ---------
[ Info: Select 5 individuals out of 20 individuals
[ Info: Selection differential (P): [0.783 1.192]
[ Info: Selection response (G): [0.344 1.063]
┌ Info:
│ Residual_Variance =
│ 2×2 Array{Float64,2}:
│ 1.0 0.0
└ 0.0 1.0
[ Info: --------- Offsprings Summary ---------
[ Info: Cohort (5 individuals)
[ Info:
[ Info: Mean of breeding values:
[ Info: [-0.552 0.367]
[ Info:
[ Info: Variance of breeding values:
[ Info: [0.181 0.539]
Conditions of the evaluation can be further defined by h2
or ve
.
julia> out = genetic_evaluation(cohort, ve = [1 0.5
0.5 1])
Set return_out = true
to obtain the complete JWAS
outputs.
julia> out = genetic_evaluation(cohort, return_out = true)
Dict{Any,Any} with 7 entries:
"EBV_y2" => 20×3 DataFrame…
"EBV_y1" => 20×3 DataFrame…
"heritability" => 2×3 DataFrame…
"location parameters" => 2×5 DataFrame…
"residual variance" => 4×3 DataFrame…
"marker effects geno" => 40×5 DataFrame…
"genetic_variance" => 4×3 DataFrame…
─────────────────────────────────────────────────────────
Customized phenotypes and factors.
Obtain JWAS-compatible dataframe.
dt_p = get_phenotypes(cohort, "JWAS")
Assign un-phenotyped individuals
julia> idx = 3:6
julia> allowmissing!(dt_p);
julia> dt_p[idx, 2:end] .= missing;
julia> first(dt_p, 10)
10×3 DataFrame
Row │ ID y1 y2
│ String? Float64? Float64?
─────┼──────────────────────────────────────────
1 │ 1 -0.0933375 -0.882781
2 │ 2 -1.50748 -2.12898
3 │ 3 missing missing
4 │ 4 missing missing
5 │ 5 missing missing
6 │ 6 missing missing
7 │ 7 -0.431361 -0.624666
8 │ 8 -1.17867 0.415607
9 │ 9 0.266733 1.02123
10 │ 10 -1.15588 0.737982
Provide customized phenotypes for the evaluation
julia> out = genetic_evaluation(dams, dt_p)
It's equivalent to set idx_missing_p = 3:6
for the missing phenotypes.
julia> out = genetic_evaluation(dams, dt_p, idx_missing_p = 3:6)
The marker IDs are set to 1,2,...,#markers
The individual IDs is set to 1,2,...,#observations
Genotype informatin:
#markers: 1000; #individuals: 20
The folder results is created to save results.
Checking genotypes...
Checking phenotypes...
Individual IDs (strings) are provided in the first column of the phenotypic data.
Phenotypes for 16 individuals are used in the analysis.These individual IDs are saved in the file IDs_for_individuals_with_phenotypes.txt.
We can simulated factor_1
and factor_2
as fixed and random effects, respectively. And we use both factor_1
and factor_2
to fit y1
, and factor_1
alone to fit y2
.
julia> dt_p[:, "factor_1"] = [i for i in 1:4 for j in 1:5];
julia> dt_p[:, "factor_2"] = [i for i in 1:2 for j in 1:10];
julia> out = genetic_evaluation(cohort, dt_p,
model_equation="y1 = intercept + factor_1 + factor_2
y2 = intercept + factor_1",
random_iid="factor_2",
return_out=true)
factor_2 is not found in model equation 2.
The marker IDs are set to 1,2,...,#markers
The individual IDs is set to 1,2,...,#observations
Genotype informatin:
#markers: 1000; #individuals: 20
The folder results is created to save results.
Checking genotypes...
Checking phenotypes...
Individual IDs (strings) are provided in the first column of the phenotypic data.
Phenotypes for 16 individuals are used in the analysis.These individual IDs are saved in the file IDs_for_individuals_with_phenotypes.txt.
Prior information for random effect variance is not provided and is generated from the data.
Pi (Π) is not provided.
Pi (Π) is generated assuming all markers have effects on all traits.
A Linear Mixed Model was build using model equations:
y1 = intercept + factor_1 + factor_2
y2 = intercept + factor_1
Model Information:
Term C/F F/R nLevels
intercept factor fixed 1
factor_1 factor fixed 4
factor_2 factor random 2
MCMC Information:
chain_length 100
burnin 0
starting_value true
printout_frequency 101
output_samples_frequency 1
constraint false
missing_phenotypes true
update_priors_frequency 0
seed false
Hyper-parameters Information:
random effect variances (y1:factor_2):
0.45
residual variances:
1.0f0 0.0f0
0.0f0 1.0f0
Genomic Information:
complete genomic data (i.e., non-single-step analysis)
Genomic Category geno
Method GBLUP
genetic variances (genomic):
1.0 0.0
0.0 1.0
estimateScale false
Degree of freedom for hyper-parameters:
residual variances: 6.000
random effect variances: 5.000
marker effect variances: 6.000
The version of Julia and Platform in use:
Julia Version 1.5.4
Commit 69fcb5745b (2021-03-11 19:13 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin18.7.0)
CPU: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
Environment:
JULIA_EDITOR = code
JULIA_NUM_THREADS =
The analysis has finished. Results are saved in the returned variable and text files. MCMC samples are saved in text files.
Dict{Any,Any} with 7 entries:
"EBV_y2" => 20×3 DataFrame…
"EBV_y1" => 20×3 DataFrame…
"heritability" => 2×3 DataFrame…
"location parameters" => 12×5 DataFrame…
"residual variance" => 4×3 DataFrame…
"marker effects geno" => 32×5 DataFrame…
"genetic_variance" => 4×3 DataFrame…