Library

Index

Contents

Main.XSim.build_genomeFunction

Quick Start

Quick setup by assigning number of markers and chromosomes.

build_genome(;n_loci ::Int64=-1,
              n_chr    ::Int64=10,
              species  ::String="none",
              args...)

Arguments

  • n_loci : Number of simulated markers for each chromosome
  • n_chr: Number of simulated chromosome with length of 100 centimorgan
  • species : Infer genetic position (Morgan) by pre-load linkage maps, available species are: ["cattle", and "pig"]

Examples

julia> build_genome(n_chr  = 2,
                    n_loci = 10000)

[ Info: --------- Genome Summary ---------
[ Info: Number of Chromosome  : 2
[ Info:
[ Info: Chromosome Length (cM): 200.0
[ Info: [100.0, 100.0]
[ Info:
[ Info: Number of Loci        : 20000
[ Info: [10000, 10000]
[ Info:
[ Info: Genotyping Error      : 0.0
[ Info: Mutation Rate         : 0.0
[ Info:

Define genome by a file or a DataFrame

Define genome by providing a formatted dataframe or a path to the file.

build_genome(dt      ::DataFrame;
             species ::String="none",
             args...)

build_genome(filename::String;
             species ::String="none",
             args...)

Arguments

  • dt : A DataFrame with required columns of chr and cM, or chr and bp if species is provided for the inference.
  • filename : A filepath to the file containing genome information.
  • species : Adjust genetic position (Morgan) by pre-load linkage maps, available species are: ["cattle", and "pig"]

Example of the DataFrame

4×7 DataFrame
 Row │ id      chr    bp       cM       MAF      eff_1    eff_2
     │ String  Int64  Int64    Float64  Float64  Float64  Float64
─────┼────────────────────────────────────────────────────────────
   1 │ snp_1       1  1818249     50.8      0.5      0.1      0.0
   2 │ snp_2       1  6557697     80.3      0.5      0.0      0.0
   3 │ snp_3       2  2298800     39.2      0.5      0.2      0.0
   4 │ snp_4       2  5015698     66.3      0.5      0.0      0.5

Examples

By a filepath

julia> build_genome("path/map.csv";
                    rate_mutation=0.005, rate_error=0.01)

or a dataframe

julia> using DataFrames
julia> data = CSV.read("path/map.csv", DataFrame)
julia> build_genome(data;
                    rate_mutation=0.005, rate_error=0.01)

[ Info: --------- Genome Summary ---------
[ Info: Number of Chromosome  : 2
[ Info:
[ Info: Chromosome Length (cM): 146.6
[ Info: [80.3, 66.3]
[ Info:
[ Info: Number of Loci        : 4
[ Info: [2, 2]
[ Info:
[ Info: Genotyping Error      : 0.01
[ Info: Mutation Rate         : 0.005
[ Info:

Use cattle genome as reference to infer the genetic positions

julia> build_genome("path/map.csv"; species="cattle")

[ Info: Arias,J.A. et al. (2009) A high density linkage map of the bovine genome. BMC Genetics, 10, 18.
[ Info: Reference Genome : Btau 4.0
[ Info: SNP Chip         : Affymetrix GeneChip Bovine Mapping 10K SNP kit

┌ Warning: The provided genetic distances will be replaced with ones infered from preloaded linkage maps
└ @ XSim ~/Dropbox/projects/XSim/src/objects/global.jl:118
[ Info: --------- Genome Summary ---------
[ Info: Number of Chromosome  : 2
[ Info:
[ Info: Chromosome Length (cM): 16.8
[ Info: [15.1, 1.7]
[ Info:
[ Info: Number of Loci        : 4
[ Info: [2, 2]
[ Info:
[ Info: Genotyping Error      : 0.0
[ Info: Mutation Rate         : 0.0
[ Info:

Explict Definition

Define genome by providing genetic information of each loci explicitly.

build_genome(chromosome    ::Array{Int64,   1},
             bp            ::Array{Int64,   1},
             cM            ::Array{Float64, 1},
             maf           ::Array{Float64, 1};
             rate_mutation ::Float64=0.0,
             rate_error    ::Float64=0.0)

Arguments

  • chromosome : Chromosome codes
  • bp : Physical positions
  • cM : Genetic positions
  • maf : Minor allele frequencies
  • rate_mutation : Mutation rate
  • rate_error : Error rate of genotyping

Examples

julia> ch  = [1,    1,     2,    2,    2]
julia> bp  = [130,  205,   186,  503,  780]
julia> cM  = [85.7, 149.1, 37.4, 83.6, 134.3]
julia> maf = [0.5,  0.5,   0.5,  0.5,  0.5]
julia> build_genome(ch, bp, cM, maf)

[ Info: --------- Genome Summary ---------
[ Info: Number of Chromosome  : 2
[ Info:
[ Info: Chromosome Length (cM): 283.4
[ Info: [149.1, 134.3]
[ Info:
[ Info: Number of Loci        : 5
[ Info: [2, 3]
[ Info:
[ Info: Genotyping Error      : 0.0
[ Info: Mutation Rate         : 0.0
[ Info:
source
Main.XSim.build_phenomeFunction

Quick Start

Quick setup by assigning number of QTL.

build_phenome(n_qtls ::Union{Array{Int64, 1}, Int64};
              args...)

Arguments

  • n_qtls : Number of simulated QTLs. It can be an array of integers for multiple traits.
  • vg : Genetic (co)variances of QTLs.
  • h2 : Heritability of simulated traits. This will define the residual (co)variances.

Examples

Single trait

julia> build_phenome(10)
[ Info: --------- Phenome Summary ---------
[ Info: Number of Traits      : 1
[ Info: Heritability (h2)     : [0.5]
┌ Info:
│   Genetic_Variance =
│    1×1 Array{Float64,2}:
└     1.0
┌ Info:
│   Residual_Variance =
│    1×1 Array{Float64,2}:
└     1.0
[ Info: Number of QTLs        : [10]

Multi-trait with additional information

julia> build_phenome([10, 15];
                     vg = [1 .5
                          .5  1],
                     h2 = [.3, .8])
[ Info: --------- Phenome Summary ---------
[ Info: Number of Traits      : 2
[ Info: Heritability (h2)     : [0.3, 0.8]
┌ Info:
│   Genetic_Variance =
│    2×2 Array{Float64,2}:
│     1.0  0.5
└     0.5  1.0
┌ Info:
│   Residual_Variance =
│    2×2 Array{Float64,2}:
│     2.33333  0.0
└     0.0      0.25
[ Info: Number of QTLs        : [10 25]

────────────────────────────────────────────────────────────────

Define phenome by a file or a DataFrame

Define genome by providing a formatted dataframe or a path to the file.

build_phenome(dt        ::DataFrame; args...)
build_phenome(filename  ::String;    args...)

Arguments

  • dt : A DataFrame with required columns of eff_ prefixed specifying marker effects.
  • filename : A filepath to the file containing phenome information.

Example of the DataFrame

4×7 DataFrame
 Row │ id      chr    bp       cM       MAF      eff_1    eff_2   
     │ String  Int64  Int64    Float64  Float64  Float64  Float64 
─────┼────────────────────────────────────────────────────────────
   1 │ snp 1       1  1818249     50.8      0.5      0.1      0.0
   2 │ snp 2       1  6557697     80.3      0.5      0.0     -0.3
   3 │ snp_3       2  2298800     39.2      0.5      0.2      0.0
   4 │ snp 4       2  5015698     66.3      0.5      0.0      0.5

Examples

By a filepath

julia> build_phenome("path/map.csv", h2 = [0.3, 0.5])

or a dataframe

julia> using DataFrames
julia> data = CSV.read("path/map.csv", DataFrame)
julia> build_phenome(data, h2 = [0.3, 0.5])

[ Info: --------- Phenome Summary ---------
[ Info: Number of Traits      : 2
[ Info: Heritability (h2)     : [0.3, 0.5]
┌ Info:
│   Genetic_Variance =
│    2×2 Array{Float64,2}:
│     1.0  0.0
└     0.0  1.0
┌ Info:
│   Residual_Variance =
│    2×2 Array{Float64,2}:
│     2.33333  0.0
└     0.0      1.0
[ Info: Number of QTLs        : [2 2]

────────────────────────────────────────────────────────────────

Define phenome by a matrix of QTL effects

build_phenome(QTL_effects ::Union{Array{Float64}, SparseMatrixCSC}; args...)

Arguments

  • QTL_effects : A matrix storing marker effects with the dimension of individuals by markers.

Examples

julia> effects = [0.1  0.0
                  0.0 -0.3
                  0.2  0.0
                  0.0  0.5]
julia> build_phenome(effects)
[ Info: --------- Phenome Summary ---------
[ Info: Number of Traits      : 2
[ Info: Heritability (h2)     : [0.5, 0.5]
┌ Info:
│   Genetic_Variance =
│    2×2 Array{Float64,2}:
│     1.0  0.0
└     0.0  1.0
┌ Info:
│   Residual_Variance =
│    2×2 Array{Float64,2}:
│     1.0  0.0
└     0.0  1.0
[ Info: Number of QTLs        : [2 2]

It's also possible to add additional information such as heritability.

julia> build_phenome(effects, h2=[0.1, 0.8])
[ Info: --------- Phenome Summary ---------
[ Info: Number of Traits      : 2
[ Info: Heritability (h2)     : [0.1, 0.8]
┌ Info:
│   Genetic_Variance =
│    2×2 Array{Float64,2}:
│     1.0  0.0
└     0.0  1.0
┌ Info:
│   Residual_Variance =
│    2×2 Array{Float64,2}:
│     9.0  0.0
└     0.0  0.25
[ Info: Number of QTLs        : [2 2]
source
Main.XSim.CohortType

Initialize a cohort by population size

Cohort(n::Int64=0)

Arguments

  • n : An integer to assign the population size.

Examples

julia> cohort = Cohort(5)
[ Info: Cohort (5 individuals)
[ Info:
[ Info: Mean of breeding values:
[ Info: [1.265 1.697]
[ Info:
[ Info: Variance of breeding values:
[ Info: [1.6 1.4]

──────────────────────────────────────────────────────────────

Initialize a cohort by genotypes/haplotypes files

Cohort(genetic_data ::Union{DataFrame, Array{Int64}}; args...)
Cohort(filename     ::String; args...)

Arguments

  • genetic_data : A dataframe/2D-array that stores genotypes/haplotypes in the dimension of individuals by markers.
  • filename : A filepath to a file storing genotypes/haplotypes data.
  • n : Number of lines to be loaded from the file. The default value is -1 and the entire file will be loaded.
  • random : By default it's set to true to randomly select n lines (individuals) from the file to generate the cohort.
  • alter_maf : It will update MAF based on the provided genotypes if it's set to true (default).

Example of the demo_genotypes.csv and demo_haplotypes.csv

Both demo files store marker information for 5 individuals and 4 markers. Use DATA("demo_genotypes.csv") to interact with demo files.

# demo_genotypes.csv
# rows: individuals, columns: markers
# Homozygote is coded as 0 and 2, otherwise is coded as 1
2,0,0,1
0,0,1,0
0,1,0,2
1,1,0,2
2,0,2,0

# demo_haplotypes.csv
# rows: individuals, columns: alleles
# Reference allele is coded as 0, otherwise is coded as 1
1,1,0,0,0,0,1,0
0,0,0,0,1,0,0,0
0,0,0,1,0,0,1,1
1,0,1,0,0,0,1,1
1,1,0,0,1,1,0,0

Example

# Load entire file
julia> cohort = Cohort("demo_haplotypes.csv")
julia> get_genotypes(cohort)
5×4 Array{Int64,2}:
 2  0  0  1
 2  0  2  0
 0  0  1  0
 1  1  0  2
 0  1  0  2

# Randomly load 3 individuals with a dataframe.
julia> data = DATA("demo_haplotypes.csv", header=false)
julia> cohort = Cohort(data, random=true, n=3)
julia> get_genotypes(cohort)
3×4 Array{Int64,2}:
 2  0  2  0
 0  1  0  2
 1  1  0  2

# Replace marker MAF by the provided file
julia> cohort = Cohort("demo_haplotypes.csv", alter_maf=true)
[ Info: MAF has been updated based on provided haplotypes/genotypes
[ Info: Cohort (5 individuals)
[ Info:
[ Info: Mean of breeding values:
[ Info: [1.418]
[ Info:
[ Info: Variance of breeding values:
[ Info: [2.012]

──────────────────────────────────────────────────────────────

Functions that insepct Cohort properties:

All the listed functions can take a keyword argument ID=true to insert individuals' IDs as the first column.

Genotypes

Genotype matrix in the dimension of individuals by markers

julia> get_genotypes(cohort)
5×4 LinearAlgebra.Adjoint{Int64,Array{Int64,2}}:
 0  0  1  0
 2  0  2  0
 2  0  0  1
 0  1  0  2
 1  1  0  2

QTLs

QTLs matrix in the dimension of individuals by markers

julia> get_QTLs(cohort)
5×3 Array{Int64,2}:
 2  2  0
 0  0  2
 0  1  0
 1  0  2
 2  0  1

Breeding values

Breeding values in the dimenstion individuals by traits

julia> get_BVs(cohort)
5×2 LinearAlgebra.Adjoint{Float64,Array{Float64,2}}:
 1.26491   0.0
 3.79473   0.0
 1.26491   1.21268
 0.0       1.69775
 0.632456  1.69775

Pedigree

Pedigree matrix, listed columns are in the order of individuals' ID, sire ID, and dam ID.

julia> get_pedigree(cohort)
5×3 LinearAlgebra.Adjoint{Int64,Array{Int64,2}}:
1  0  0
2  0  0
3  0  0
4  0  0
5  0  0

Minor Allele Frequencies (MAF)

In the case where we have 3 QTLs out of 4 markers, we want to compare their allel frequencies.

julia> get_MAF(cohort)
4-element Array{Float64,1}:
 0.5
 0.2
 0.3
 0.5

Phenotypes

Simulate cohort phenotypes based on the defined phenome. h2 and ve can be assigned specifically for this one-time simulation.

julia> get_phenotypes(cohort)
5×1 Array{Float64,2}:
  1.1126064336992942
 -0.8337021175232547
 -0.363621019381922
  4.042256656472762
  1.7828800511223049
source
Main.XSim.mateFunction

Mating function

mate(cohort_A         ::Cohort,
     cohort_B         ::Cohort;
     nA               ::Int64=cohort_A.n,
     nB_per_A         ::Int64=1,
     n_per_mate       ::Int64=1,
     replace_A        ::Bool =false,
     replace_B        ::Bool =false,
     ratio_malefemale ::Union{Float64, Int64}=0,
     scheme           ::String ="none",
     args...)

mate(cohort::Cohort; args...) =  mate(cohort, cohort; args...)

Arguments

Positional arguments

  • cohort_A : A cohort object that is treated as common mating parents.
  • cohort_B : A cohort object that is a mating pool from which individuals are sampled to mate with cohort_A.

Keyword arguments

  • nA : nA individuals will be sampled from cohort_A and treated as common parents.
  • nB_per_A : nB_per_A individuals sampled from cohort_B will mate with each individual from cohort_A.
  • n_per_mate : n_per_mate progenies will be reproduced from each pair of mating parent.
  • replace_A : Whether the sampling is replacable in cohort_A.
  • replace_B : Whether the sampling is replacable in cohort_B.
  • ratio_malefemale : By default, two cohorts which are male and female progenies will be returned. ratio_malefemale defined the progenies ratio of males over females. If ratio_malefemale=0, only one cohort will be returned.
  • scheme : Available options are ["random", "diallel cross", "selfing", "DH"]. See the examples for more details.

Outputs

By default, two cohort objects will be returned. The first cohort is assumed to be male progenies and the other cohort are female progenies. The size of two cohorts will folow the ratio raiot_malefemale. When ratio_malefemale is set to 0, only one cohort will be returned.

Example

Random mating (Default)

Initialize cohorts

julia> cohort_A = Cohort(5)
julia> cohort_B = Cohort(10)

Define mating events

julia> args = Dict(:nA               => cohort_A.n,
                   :nB_per_A         => 1,
                   :replace_A        => false,
                   :replace_B        => false,
                   :n_per_mate       => 1)
julia> progenies = mate(cohort_A, cohort_B; args...)

# Equivalent
julia> progenies = mate(cohort_A, cohort_B)

# Equivalent
julia> progenies = mate(cohort_A, cohort_B; scheme="random")

# Equivalent
julia> progenies = cohort_A * cohort_B

Check the pedigree to see if the mating goes as desired.

julia> get_pedigree(progenies)
5×3 LinearAlgebra.Adjoint{Int64,Array{Int64,2}}:
 19  1   8
 16  2   6
 17  3  10
 20  4  15
 18  5  14

Diallel cross

Initialize cohorts

julia> cohort_A = Cohort(2)
julia> cohort_B = Cohort(5)

Define mating events

julia> args = Dict(:nA              => sires.n,
                   :nB_per_A        => dams.n,
                   :replace_A       => false,
                   :replace_B       => false,
                   :n_per_mate      => 1,
                   :ratio_malefemale=> 1)
julia> male, female = mate(cohort_A, cohort_B; args...)

# Equivalent
julia> male, female = mate(cohort_A, cohort_B; scheme="diallel cross")

Check the pedigree to see if the mating goes as desired.

julia> get_pedigree(male + female)
10×3 LinearAlgebra.Adjoint{Int64,Array{Int64,2}}:
 12  2  7
 10  2  6
 11  2  4
 14  1  3
 15  1  5
  9  2  5
 13  1  6
 17  1  4
  8  2  3
 16  1  7

Selfing

Initialize cohorts

julia> parents = Cohort(5)

In the selfing scheme, only one cohort is required.

julia> args = Dict(:nA          => 3,
                   :replace_A   => false,
                   :n_per_mate  => 5,
                   :scheme      => "selfing")
julia> progenies = mate(parents; args...)

Inspect the pedigree to verify the mating behavior

julia> get_pedigree(progenies)
15×3 LinearAlgebra.Adjoint{Int64,Array{Int64,2}}:
  6  4  4
  7  4  4
  8  4  4
  9  4  4
 10  4  4
 11  1  1
 12  1  1
 13  1  1
 14  1  1
 15  1  1
 16  5  5
 17  5  5
 18  5  5
 19  5  5
 20  5  5
source
Main.XSim.selectFunction

Selection function

select(cohort      ::Cohort,
       n           ::Int64,
       criteria    ::Union{String, Array} = "phenotypes";
       h2          ::Union{Array{Float64}, Float64}=GLOBAL("h2"),
       ve          ::Union{Array{Float64}, Float64}=GLOBAL("Ve"),
       weights     ::Array{Float64, 1}  =[1.0],
       return_log  ::Bool               =false,
       is_random   ::Bool               =false,
       silent      ::Bool               =GLOBAL("silent")

select(cohort::Cohort, ratio::Float64; args...)

Arguments

Positional arguments

  • cohort : A cohort from which individuals are selected.
  • n : n individuals are selected.
  • ratio : ratio portion of individuals are selected.
  • criteria : Criteria that will be used for the selecition. Default "phenotypes", the options are ["phenotypes", "GBLUP", array]. If set to "GBLUP", a genetic evaluation is carried out by JWAS and the estimated breeding values will be the criteria. It's also avaialbe to provdie the criteria (e.g., phenotypes matrix) directly for the selection.

Keyword arguments

  • h2 : The heritability h2 of the simulated phenotypes.
  • ve : The residual covariance ve of the simulated phenotypes.
  • weight : Linear coefficients of traits for the selection. The selection is more sensitive to traits with greater weight. Negative
  • return_log : Default false. Set true to return selection differential and selection response besides the selected cohort.
  • silent : Default false. Set true to mute the log messages.

Outputs

A selected cohort object will be returned. If return_log is set to true, a dictionary object containing the selected cohort, selection differential, and selection response will be returned.

─────────────────────────────────────────────────────────

Example

Single Trait Selection

Set demo genome and phenome with single traits controlled by 50 QTLs.

julia> build_demo()
julia> build_phenome(50)
julia> summary()
[ Info: --------- Genome Summary ---------
[ Info: Number of Chromosome  : 10
[ Info: 
[ Info: Chromosome Length (cM): 1500.0
[ Info: [150.0, 150.0, 150.0, 150.0, 150.0, 150.0, 150.0, 150.0, 150.0, 150.0]
[ Info: 
[ Info: Number of Loci        : 1000
[ Info: [100, 100, 100, 100, 100, 100, 100, 100, 100, 100]
[ Info: 
[ Info: Genotyping Error      : 0.0
[ Info: Mutation Rate         : 0.0
[ Info: 
[ Info: --------- Phenome Summary ---------
[ Info: Number of Traits      : 1
[ Info: Heritability (h2)     : [0.5]
┌ Info: 
│   Genetic_Variance =
│    1×1 Array{Float64,2}:
└     1.0
┌ Info: 
│   Residual_Variance =
│    1×1 Array{Float64,2}:
└     1.0
[ Info: Number of QTLs        : [50]

Initialize a cohort with 100 individuals

julia> cohort = Cohort(100)

Select 30 individuals

# Select top 30 individuals
julia> cohort_s = select(cohort, 30)
# Equivalent
julia> cohort_s = select(cohort, 0.3)

[ Info: --------- Selection Summary ---------
[ Info: Select 30 individuals out of 100 individuals
[ Info: Selection differential (P): [1.174]
[ Info: Selection response     (G): [0.843]
┌ Info:
│   Residual_Variance =
│    1×1 Array{Float64,2}:
└     1.0
[ Info: --------- Offsprings Summary ---------
[ Info: Cohort (30 individuals)
[ Info:
[ Info: Mean of breeding values:
[ Info: [1.448]
[ Info:
[ Info: Variance of breeding values:
[ Info: [0.367]

Assign Heritability h2 or Residual Covariance ve

# Assign heritability
julia> progenies = select(cohort, 30, h2=0.1)

# Equivalent in the case where genetic variance `vg` is 1.0
julia> progenies = select(cohort, 30, ve=9.0)

[ Info: --------- Selection Summary ---------
[ Info: Select 30 individuals out of 100 individuals
[ Info: Selection differential (P): [1.182]
[ Info: Selection response     (G): [0.338]
┌ Info:
│   Residual_Variance =
│    1×1 Array{Float64,2}:
└     9.0
[ Info: --------- Offsprings Summary ---------
[ Info: Cohort (30 individuals)
[ Info: 
[ Info: Mean of breeding values: 
[ Info: [0.956]
[ Info: 
[ Info: Variance of breeding values: 
[ Info: [0.643]

Negative Selection

Set is_positive=false to rank individuals in ascending order

julia> progenies = select(cohort, 30, is_positive=false)
[ Info: --------- Selection Summary ---------
[ Info: Select 30 individuals out of 100 individuals
[ Info: Selection differential (P): [-1.19]
[ Info: Selection response     (G): [-0.89]
┌ Info: 
│   Residual_Variance =
│    1×1 Array{Float64,2}:
└     1.0
[ Info: --------- Offsprings Summary ---------
[ Info: Cohort (30 individuals)
[ Info: 
[ Info: Mean of breeding values: 
[ Info: [-0.24]
[ Info: 
[ Info: Variance of breeding values: 
[ Info: [0.566]

Random Selection

julia> progenies = select(cohort, 30, is_random=true)
[ Info: --------- Selection Summary ---------
[ Info: Select 30 individuals out of 100 individuals
[ Info: Selection differential (P): [-0.06]
[ Info: Selection response     (G): [-0.191]
┌ Info: 
│   Residual_Variance =
│    1×1 Array{Float64,2}:
└     1.0
[ Info: --------- Offsprings Summary ---------
[ Info: Cohort (30 individuals)
[ Info: 
[ Info: Mean of breeding values: 
[ Info: [0.441]
[ Info: 
[ Info: Variance of breeding values: 
[ Info: [0.946]

Selection wiht Multiple Parameters

It's possible specify multiple parameters described above in one selection. User can either enclose parameters as keyword arguments, or pass them through a dictionary object.

# Keyword args
julia> progenies = select(cohort, 30, h2=0.3, is_positive=false)

# Equivalent
julia> args = Dict(:h2=>0.3,
                   :is_positive=>false)
julia> progenies = select(cohort, 30; args...)

[ Info: --------- Selection Summary ---------
[ Info: Select 30 individuals out of 100 individuals
[ Info: Selection differential (P): [-1.086]
[ Info: Selection response     (G): [-0.486]
┌ Info: 
│   Residual_Variance =
│    1×1 Array{Float64,2}:
└     2.3333333333333335
[ Info: --------- Offsprings Summary ---------
[ Info: Cohort (30 individuals)
[ Info: 
[ Info: Mean of breeding values: 
[ Info: [0.154]
[ Info: 
[ Info: Variance of breeding values: 
[ Info: [0.818]

─────────────────────────────────────────────────────────

Multi-Trait Selection

Set demo genome and phenome with single traits controlled by 50 QTLs.

julia> build_demo()
julia> summary()
[ Info: --------- Genome Summary ---------
[ Info: Number of Chromosome  : 10
[ Info: 
[ Info: Chromosome Length (cM): 1500.0
[ Info: [150.0, 150.0, 150.0, 150.0, 150.0, 150.0, 150.0, 150.0, 150.0, 150.0]
[ Info: 
[ Info: Number of Loci        : 1000
[ Info: [100, 100, 100, 100, 100, 100, 100, 100, 100, 100]
[ Info: 
[ Info: Genotyping Error      : 0.0
[ Info: Mutation Rate         : 0.0
[ Info: 
[ Info: --------- Phenome Summary ---------
[ Info: Number of Traits      : 2
[ Info: Heritability (h2)     : [0.5, 0.5]
┌ Info: 
│   Genetic_Variance =
│    2×2 Array{Float64,2}:
│     1.0  0.0
└     0.0  1.0
┌ Info: 
│   Residual_Variance =
│    2×2 Array{Float64,2}:
│     1.0  0.0
└     0.0  1.0
[ Info: Number of QTLs        : [3 8]

Initialize a cohort with 100 individuals

julia> cohort = Cohort(100)

Assign Heritabilities for Multiple Traits

julia> progenies = select(cohort, 30, h2=[0.9, 0.3])
[ Info: --------- Selection Summary ---------
[ Info: Select 30 individuals out of 100 individuals
[ Info: Selection differential (P): [0.468 1.028]
[ Info: Selection response     (G): [0.383 0.636]
┌ Info: 
│   Residual_Variance =
│    2×2 Array{Float64,2}:
│     0.111111  0.0
└     0.0       2.33333
[ Info: --------- Offsprings Summary ---------
[ Info: Cohort (30 individuals)
[ Info: 
[ Info: Mean of breeding values: 
[ Info: [-0.889 0.28]
[ Info: 
[ Info: Variance of breeding values: 
[ Info: [0.947 0.625]

Assign Trait Correlations via Residual Covariance

julia> progenies = select(cohort, 30, ve=[1   0.3
                                          0.3   1])
[ Info: --------- Selection Summary ---------
[ Info: Select 30 individuals out of 100 individuals
[ Info: Selection differential (P): [0.866 0.925]
[ Info: Selection response     (G): [0.662 0.762]
┌ Info: 
│   Residual_Variance =
│    2×2 Array{Float64,2}:
│     1.0  0.3
└     0.3  1.0
[ Info: --------- Offsprings Summary ---------
[ Info: Cohort (30 individuals)
[ Info: 
[ Info: Mean of breeding values: 
[ Info: [-0.608 0.406]
[ Info: 
[ Info: Variance of breeding values: 
[ Info: [0.549 0.476]

Derive Selection Index for Multiple Traits

Assigning a vector to the parameter weights to derive a selection index which is a linear combintation of the weights and the phenotypes. In this example, we demonstrate two traits with the heritability of 0.3 and 0.8, respectively. And we can select traits with more weight on the second trait which is more heritable, and negatively select the first trait.

julia> progenies = select(cohort, 30, h2=[.3, .8], weights=[-0.1, 0.9])
[ Info: --------- Selection Summary ---------
[ Info: Select 30 individuals out of 100 individuals
[ Info: Selection differential (P): [-0.318 1.027]
[ Info: Selection response     (G): [-0.233 0.869]
┌ Info:
│   Residual_Variance =
│    2×2 Array{Float64,2}:
│     2.33333  0.0
└     0.0      0.25
[ Info: --------- Offsprings Summary ---------
[ Info: Cohort (30 individuals)
[ Info:
[ Info: Mean of breeding values:
[ Info: [-1.508 0.513]
[ Info:
[ Info: Variance of breeding values:
[ Info: [1.053 0.458]
source
Main.XSim.breedFunction

Wrap-up function: breed()

breed(cohort_A         ::Cohort,
      cohort_B         ::Cohort;
      n_gens           ::Int64=1,
      n_select         ::Int64=cohort_A.n + cohort_B.n,
      n_select_A   ::Int64=cohort_A.n,
      n_select_B ::Int64=cohort_B.n,
      select_all_gens  ::Bool=false,
      args...)

breed(cohort::Cohort, n_gens::Int64, args...) =
    breed(cohort, cohort, n_gens; args...)

Arguments

Positional arguments

  • cohort_A : A cohort object that is treated as common mating parents. It's asssumed to be sires/males parents.
  • cohort_B : A cohort object that is a mating pool from which individuals are sampled to mate with cohort_A. It's assumed to be dams/female parents.

Keywords arguments

  • n_gens : An integer specifying how many mate-select generations are carried out.
  • n_select : Used when ratio_malefemale is set to 0. n_select individuals are selected as parents for the next generation.
  • n_select_A : Used when ratio_malefemale is not 0. n_select_A will be selected as male parents for the next generation.
  • n_select_B : Used when ratio_malefemale is not 0. n_select_B will be selected as female parents for the next generation.
  • select_all_gens : Default "false" and parents are not included in the next generation pool for selections. Set select_all_gens to "true" if the selections consider individuals from all generations.

Outputs

By default, two cohort objects will be returned. The first cohort is assumed to be male progenies and the other cohort are female progenies. The size of two cohorts will folow the ratio raiot_malefemale. When ratio_malefemale is set to 0, only one cohort will be returned.

Examples

We can have 10 sires and mate each sire with 5 dams for 5 generations. In each generation, we randomly select 10 male progenies as sires and all female progenies as dams for the next generation. We can derive such breeding scheme as below:

julia> args  = Dict(# mating arguments
                    :nA               => 10,
                    :nB_per_A         => 5,
                    :n_per_mate       => 2,
                    :ratio_malefemale => 1.0,
                    # selection arguments
                    :is_random        => true,
                    # breeding arguments
                    :n_gens           => 5,
                    :nA_select        => 10,
                    :select_all_gens  => true)

Simulate 10 sires and 50 dams as founders.

julia> sires = Founders(10)
julia> dams  = Founders(50)

Breed cohorts based on the defined arguments.

julia> sires, dams = breed(sires, dams; args...)
[ Info: Gen 0 -> Mean of BVs: [1.665 2.745], Variance of BVs: [1.008 0.479]
[ Info: Gen 1 -> Mean of BVs: [1.719 2.715], Variance of BVs: [0.96 0.546]
[ Info: Gen 2 -> Mean of BVs: [1.754 2.777], Variance of BVs: [0.936 0.724]
[ Info: Gen 3 -> Mean of BVs: [1.766 2.796], Variance of BVs: [0.927 0.775]
[ Info: Gen 4 -> Mean of BVs: [1.773 2.771], Variance of BVs: [0.951 0.781]
[ Info: Gen 5 -> Mean of BVs: [1.813 2.751], Variance of BVs: [0.952 0.804]
([ Info: Cohort (60 individuals)
[ Info: 
[ Info: Mean of breeding values: 
[ Info: [1.855 2.593]
[ Info: 
[ Info: Variance of breeding values: 
[ Info: [0.884 0.697]
, [ Info: Cohort (300 individuals)
[ Info: 
[ Info: Mean of breeding values: 
[ Info: [1.805 2.782]
[ Info: 
[ Info: Variance of breeding values: 
[ Info: [0.969 0.822]

The result is equivalent to the following mate-select iterations:

julia> for _ in 1:5
            males, females = mate(sires, dams, args...)
            sires         += select(males, n_sires, args...)
            dams          += females
       end
source
Main.XSim.genetic_evaluationFunction

Genetic evaluation

genetic_evaluation(cohort         ::Cohort,
                   phenotypes     ::DataFrame=DataFrame();
                   model_equation ::String="",
                   covariates     ::String="",
                   random_iid     ::String="",
                   random_str     ::String="",
                   methods        ::String="GBLUP",
                   add_genotypes  ::Bool=true,
                   idx_missing_p  ::Any=[],
                   return_out     ::Bool=false,
                   args...)

Arguments

  • cohort : The evaluated cohort.
  • phenotypes : A dataframe wiht columns of id, traits, and other studied factors.
  • h2 : Default 0.5. Heritability for simulating cohort phenotypes if no phenotype is provided.
  • ve : Default 1. Residual variance for simulating cohort phenotypes if no phenotype is provided.
  • model_equation : Default "y ~ intercept". The equation for fitting the breeding values.
  • covariates : Specifies terms in phenotypes as continuous factors. Must be included in model_equation as well.
  • random_iid : Specifies terms in phenotypes as random effects (i.i.d.). Must be included in model_equation as well.
  • random_str : Specifies terms in phenotypes as random effects (pedigree). Must be included in model_equation as well.
  • methods : Default "GBLUP". Defines what models/methods used in the genetic evaluation.
  • add_genotypes : Default true. Genotypes will be included in the equation.
  • idx_missing_p : A vector assigning which individuals are not phenotyped.
  • return_out : Default false. Set to true to return the complete JWAS outputs.

Outputs

A n by t matrix containing breeding values will be return if return_out = false, where n is number of individuals and t is number of evaluated traits. A complete JWAS outputs will be returned if return_out = true.

Example of the phenotypes dataframe

 Row │ ID           y1              y2               factor_1  factor_2
     │ String       Float64?        Float64?         Int64     Int64
─────┼─────────────────────────────────────────────────────────────
   1 │ 1             0.88976        -0.0798048         1         1
   2 │ 2            -0.783203        0.988616          1         1
   3 │ 3             missing         missing           1         1
   4 │ 4             missing         missing           1         1
   5 │ 5             missing         missing           1         1
   6 │ 6             missing         missing           1         2
   7 │ 7             missing         missing           1         2
   8 │ 8            -1.76058         0.277289          1         2
   9 │ 9             0.938871        2.57784           1         2
  10 │ 10            0.37026         3.15993           1         2
  11 │ 11           -1.91869         0.0935064         2         3
  12 │ 12           -0.89847         1.8987            2         3
  13 │ 13            1.69663        -0.949513          2         3
  14 │ 14            3.48862         0.654378          2         3
  15 │ 15            1.39615         1.80355           2         3
  16 │ 16            2.31685         2.13446           2         4
  17 │ 17           -3.81017         0.0186156         2         4
  18 │ 18           -1.71216        -0.0976809         2         4
  19 │ 19            1.80917         1.34104           2         4
  20 │ 20           -0.504771        3.22665           2         4

Examples

We will use demo genome and phenome for the example. A cohort with 20 individuals is simulated.

julia> build_demo()
[ Info: --------- Genome Summary ---------
[ Info: Number of Chromosome  : 10
[ Info:
[ Info: Chromosome Length (cM):
[ Info: [150.0, 150.0, 150.0, 150.0, 150.0, 150.0, 150.0, 150.0, 150.0, 150.0]
[ Info:
[ Info: Number of Loci        : 1000
[ Info: [100, 100, 100, 100, 100, 100, 100, 100, 100, 100]
[ Info:
[ Info: Genotyping Error      : 0.0
[ Info: Mutation Rate         : 0.0
[ Info:
[ Info: --------- Phenome Summary ---------
[ Info: Number of Traits      : 2
[ Info: Heritability (h2)     : [0.5, 0.5]
┌ Info:
│   Genetic_Variance =
│    2×2 Array{Float64,2}:
│     1.0  0.0
└     0.0  1.0
┌ Info:
│   Residual_Variance =
│    2×2 Array{Float64,2}:
│     1.0  0.0
└     0.0  1.0
[ Info: Number of QTLs        : [3 8]

julia> cohort = Cohort(20)
[ Info: Cohort (20 individuals)
[ Info:
[ Info: Mean of breeding values:
[ Info: [-0.862 -0.913]
[ Info:
[ Info: Variance of breeding values:
[ Info: [0.814 1.448]

─────────────────────────────────────────────────────────

Basic usage

By default, GBLUP will be performed without providing any argument.

julia> out = genetic_evaluation(cohort)
20×2 Array{Float64,2}:
  42.2908     -5.89224
   6.38041    -4.5895
  19.5065    -57.4513
  24.05      -84.5086
  11.5925     45.5247
 -11.5926     -1.91879
 -29.2152    -35.5977
  -1.92563    63.8841
  19.8686     76.2722
  22.6728     45.5179
 -58.2754     42.5953
  28.556     -21.0314
  11.5918     25.1126
  -0.517653   -0.237724
 -11.4712    -47.3073
 -10.9575     -3.58754
 -11.7835     -9.62802
 -17.7895    -22.0821
 -39.5114     -3.46026
   6.53008    -1.61416

The breeding values out can be used as criteria for the selection directly.

julia> select(cohort, 5, out)
[ Info: --------- Selection Summary ---------
[ Info: Select 5 individuals out of 20 individuals
[ Info: Selection differential (P): [0.783 1.192]
[ Info: Selection response     (G): [0.344 1.063]
┌ Info: 
│   Residual_Variance =
│    2×2 Array{Float64,2}:
│     1.0  0.0
└     0.0  1.0
[ Info: --------- Offsprings Summary ---------
[ Info: Cohort (5 individuals)
[ Info:
[ Info: Mean of breeding values:
[ Info: [-0.552 0.367]
[ Info:
[ Info: Variance of breeding values:
[ Info: [0.181 0.539]

Conditions of the evaluation can be further defined by h2 or ve.

julia> out = genetic_evaluation(cohort, ve = [1  0.5
                                             0.5  1])

Set return_out = true to obtain the complete JWAS outputs.

julia> out = genetic_evaluation(cohort, return_out = true)
Dict{Any,Any} with 7 entries:
  "EBV_y2"              => 20×3 DataFrame…
  "EBV_y1"              => 20×3 DataFrame…
  "heritability"        => 2×3 DataFrame…
  "location parameters" => 2×5 DataFrame…
  "residual variance"   => 4×3 DataFrame…
  "marker effects geno" => 40×5 DataFrame…
  "genetic_variance"    => 4×3 DataFrame…

─────────────────────────────────────────────────────────

Customized phenotypes and factors.

Obtain JWAS-compatible dataframe.

dt_p = get_phenotypes(cohort, "JWAS")

Assign un-phenotyped individuals

julia> idx = 3:6
julia> allowmissing!(dt_p);
julia> dt_p[idx, 2:end] .= missing;
julia> first(dt_p, 10)
10×3 DataFrame
 Row │ ID            y1               y2
     │ String?       Float64?         Float64?
─────┼──────────────────────────────────────────
   1 │ 1             -0.0933375       -0.882781
   2 │ 2             -1.50748         -2.12898
   3 │ 3              missing          missing
   4 │ 4              missing          missing
   5 │ 5              missing          missing
   6 │ 6              missing          missing
   7 │ 7             -0.431361        -0.624666
   8 │ 8             -1.17867          0.415607
   9 │ 9              0.266733         1.02123
  10 │ 10            -1.15588          0.737982

Provide customized phenotypes for the evaluation

julia> out = genetic_evaluation(dams, dt_p)

It's equivalent to set idx_missing_p = 3:6 for the missing phenotypes.

julia> out = genetic_evaluation(dams, dt_p, idx_missing_p = 3:6)
The marker IDs are set to 1,2,...,#markers
The individual IDs is set to 1,2,...,#observations
Genotype informatin:
#markers: 1000; #individuals: 20
The folder results is created to save results.
Checking genotypes...
Checking phenotypes...
Individual IDs (strings) are provided in the first column of the phenotypic data.
Phenotypes for 16 individuals are used in the analysis.These individual IDs are saved in the file IDs_for_individuals_with_phenotypes.txt.

We can simulated factor_1 and factor_2 as fixed and random effects, respectively. And we use both factor_1 and factor_2 to fit y1, and factor_1 alone to fit y2.

julia> dt_p[:, "factor_1"] = [i for i in 1:4 for j in 1:5];
julia> dt_p[:, "factor_2"] = [i for i in 1:2 for j in 1:10];
julia> out = genetic_evaluation(cohort, dt_p,
                model_equation="y1 = intercept + factor_1 + factor_2
                                y2 = intercept + factor_1",
                random_iid="factor_2",
                return_out=true)

factor_2 is not found in model equation 2.
The marker IDs are set to 1,2,...,#markers
The individual IDs is set to 1,2,...,#observations
Genotype informatin:
#markers: 1000; #individuals: 20
The folder results is created to save results.
Checking genotypes...
Checking phenotypes...
Individual IDs (strings) are provided in the first column of the phenotypic data.
Phenotypes for 16 individuals are used in the analysis.These individual IDs are saved in the file IDs_for_individuals_with_phenotypes.txt.
Prior information for random effect variance is not provided and is generated from the data.

Pi (Π) is not provided.
Pi (Π) is generated assuming all markers have effects on all traits.

A Linear Mixed Model was build using model equations:

y1 = intercept + factor_1 + factor_2
y2 = intercept + factor_1

Model Information:

Term            C/F          F/R            nLevels
intercept       factor       fixed                1
factor_1        factor       fixed                4
factor_2        factor       random               2

MCMC Information:

chain_length                                    100
burnin                                            0
starting_value                                 true
printout_frequency                              101
output_samples_frequency                          1
constraint                                    false
missing_phenotypes                             true
update_priors_frequency                           0
seed                                          false

Hyper-parameters Information:

random effect variances (y1:factor_2):
 0.45
residual variances:
 1.0f0  0.0f0
 0.0f0  1.0f0

Genomic Information:

complete genomic data (i.e., non-single-step analysis)

Genomic Category                               geno
Method                                        GBLUP
genetic variances (genomic):
 1.0  0.0
 0.0  1.0
estimateScale                                 false

Degree of freedom for hyper-parameters:

residual variances:                           6.000
random effect variances:                      5.000
marker effect variances:                      6.000

The version of Julia and Platform in use:

Julia Version 1.5.4
Commit 69fcb5745b (2021-03-11 19:13 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  CPU: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS =

The analysis has finished. Results are saved in the returned variable and text files. MCMC samples are saved in text files.

Dict{Any,Any} with 7 entries:
  "EBV_y2"              => 20×3 DataFrame…
  "EBV_y1"              => 20×3 DataFrame…
  "heritability"        => 2×3 DataFrame…
  "location parameters" => 12×5 DataFrame…
  "residual variance"   => 4×3 DataFrame…
  "marker effects geno" => 32×5 DataFrame…
  "genetic_variance"    => 4×3 DataFrame…
source