Internal functions
Documentation for JWAS.jl's internal (private) interface, which are not available to general users. These internal functions are small blocks that public function build on.
<!–-
Index
JWAS.Packed2BitBackendDataAPI.describeJWAS.GWASJWAS.GWASJWAS.PedModule.get_infoJWAS.PedModule.get_pedigreeJWAS.SSBRrunJWAS.add_genotypesJWAS.add_termJWAS.block_rhs!JWAS.build_modelJWAS.center!JWAS.check_marker_memory_guard!JWAS.decode_marker!JWAS.estimate_marker_memoryJWAS.format_bytes_humanJWAS.getEBVJWAS.getMCMCinfoJWAS.getMMEJWAS.get_column_refJWAS.get_genotypesJWAS.is_unit_weightsJWAS.load_streaming_backendJWAS.make_incidence_matricesJWAS.mkDictJWAS.mkmat_incidence_factorJWAS.outputEBVJWAS.outputMCMCsamplesJWAS.output_MCMC_samplesJWAS.output_MCMC_samples_setupJWAS.output_location_parameters_samplesJWAS.prediction_setupJWAS.prepare_streaming_genotypesJWAS.runMCMCJWAS.set_covariateJWAS.set_randomJWAS.set_randomJWAS.solveJWAS.streaming_mul_alpha!JWAS.transubstrarr
Internal interface
DataAPI.describe — Method
describe(model::MME)- Print out model information.
JWAS.GWAS — Method
GWAS(model,map_file,marker_effects_file...;
window_size = "1 Mb",sliding_window = false,
GWAS = true, threshold = 0.001,
genetic_correlation = false,
header = true)run genomic window-based GWAS
- MCMC samples of marker effects are stored in markereffectsfile with delimiter ','.
- model is either the model::MME used in analysis or the genotype cavariate matrix M::Array
- map_file has the (sorted) marker position information with delimiter ','. If the map file is not provided, i.e., map_file=
false, a fake map file will be generated with window_size markers in each 1 Mb window, and each 1 Mb window will be tested. - If two markereffectsfile are provided, and genetic_correlation = true, genomic correlation for each window is calculated.
- Statistics are computed for nonoverlapping windows of size window_size by default. If sliding_window = true, those for overlapping sliding windows are calculated.
- map file format:
markerID,chromosome,position
m1,1,16977
m2,1,434311
m3,1,1025513
m4,2,70350
m5,2,101135JWAS.SSBRrun — Function
(internal) Incomplete Genomic Data (Single-Step)
- reorder in A (pedigree) as ids for genotyped then non-genotyped inds
- impute genotypes for non-genotyped individuals
- add ϵ (imputation errors) and J as variables in data for non-genotyped inds
JWAS.add_genotypes — Function
DEPRECATED!! Please use get_genotypes()
add_genotypes(mme::MME,M::Union{AbstractString,Array{Float64,2},Array{Float32,2},Array{Any,2},DataFrames.DataFrame},G=false;
header=false,rowID=false,separator=',',
center=true,G_is_marker_variance=false,df=4)- Get marker informtion from a genotype file or an nxp Matrix M of genotypes (Array or DataFrame), where n is the number of individuals and p is the number of markers. This file/matrix needs to be column-wise sorted by marker positions.
- G is the mean for the prior assigned for the genomic variance with degree of freedom df, defaulting to 4.0. If G is not provided, a value is calculated from responses (phenotypes)
- If a text file is provided, the file format should be:
Animal,marker1,marker2,marker3,marker4,marker5 S1,1,0,1,1,1 D1,2,0,2,2,1 O1,1,2,0,1,0 O3,0,0,2,1,1
- If an nxp Matrix of genotypes (Array or DataFrame) is provided, where n is the number of individuals and p is the number of markers,
- This matrix needs to be column-wise sorted by marker positions.
- rowID is a vector of individual IDs, e.g.,rowID=["a1","b2","c1"]; if it is omitted, IDs will be set to 1:n
- header is a header vector such as ["id"; "mrk1"; "mrk2";...;"mrkp"]. If omitted, marker names will be set to 1:p
JWAS.add_term — Method
add to model an extra term: imputation_residual
JWAS.block_rhs! — Method
block_rhs!(rhs, Xblock, y, Rinv, unit_weights)Compute block RHS Xblock' * Diagonal(Rinv) * y in-place without storing Xblock' * Diagonal(Rinv) as a persistent matrix.
JWAS.build_model — Function
build_model(model_equations::AbstractString,R=false; df::AbstractFloat=4.0, estimate_variance=true)- Build a model from model equations with the residual variance R. In Bayesian analysis, R is the mean for the prior assigned for the residual variance with degree of freedom df, defaulting to 4.0. If R is not provided, a value is calculated from responses (phenotypes).
- By default, all variabels in modelequations are factors (categorical) and fixed. Set variables to be covariates (continuous) or random using functions `setcovariate()
orset_random()`. - The argument
estimate_varianceindicates whether to estimate the residual variance;estimate_variance=trueis the default.
#single-trait
model_equations = "BW = intercept + age + sex"
R = 6.72
models = build_model(model_equations,R);
#multi-trait
model_equations = "BW = intercept + age + sex
CW = intercept + litter";
R = [6.72 24.84
24.84 708.41]
models = build_model(model_equations,R);JWAS.center! — Method
This function centers columns of the input matrix X by subtracting their means along each column. The function operates in-place by modifying the original matrix X.
Input:
- X::AbstractMatrix: a matrix to be centered
Output:
- col_means::Vector: a vector of mean values for each column in the original matrix, computed before centering.JWAS.check_marker_memory_guard! — Method
check_marker_memory_guard!(; mode, ratio, estimated_bytes, total_memory_bytes, context_string)Apply guard policy for estimated marker memory usage.
JWAS.decode_marker! — Method
decode_marker!(dest, backend, marker_index)Decode one marker into dest with centering and missing-value mean-imputation.
JWAS.estimate_marker_memory — Method
estimate_marker_memory(nObs, nMarkers;
element_bytes,
has_nonunit_weights=false,
block_starts=false,
storage_mode=:dense)Estimate major marker-path memory components in bytes.
JWAS.format_bytes_human — Method
format_bytes_human(bytes)Format byte counts into a compact human-readable string.
JWAS.getEBV — Method
getEBV(model::MME,traiti)(internal function) Get breeding values for individuals defined by outputEBV(), defaulting to all genotyped individuals. This function is used inside MCMC functions for one MCMC samples from posterior distributions. e.g., non-NNBayespartial (multi-classs Bayes) : y1=M1α1[1]+M2α2[1]+M3α3[1] y2=M1α1[2]+M2α2[2]+M3α3[2]; NNBayespartial: y1=M1α1[1] y2=M2α2[1] y3=M3*α3[1];
JWAS.getMCMCinfo — Method
getMCMCinfo(model::MME)- (internal function) Print out MCMC information.
JWAS.getMME — Method
Construct mixed model equations with
incidence matrix: X ; response : ySparse; left-hand side : mmeLhs ; right-hand side : mmeLhs ;
JWAS.get_column_ref — Method
get_column_ref(X::Vector{T})
To obtain a vector of views (alias/pointer) for each column of the input matrix WITHOUT COPYING the underlying data.
input: a matrix X
output: a vector containing views of each column of the input matrix XJWAS.get_genotypes — Function
get_genotypes(file::Union{AbstractString,Array{Float64,2},Array{Float32,2},Array{Int64,2}, Array{Int32,2}, Array{Any,2}, DataFrames.DataFrame}, G = false;
## method:
method = "BayesC",Pi = 0.0,estimatePi = true,
## variance:
G_is_marker_variance = false, df = 4.0,
estimate_variance=true, estimate_scale=false,
constraint = false, #for multi-trait only, constraint=true means no genetic covariance among traits
## format:
separator=',',header=true,
## quality control:
quality_control=true, MAF = 0.01, missing_value = 9.0,
## others:
center=true,starting_value=false,
annotations=false,
storage=:dense)- Get marker informtion from a genotype file/matrix. This file needs to be column-wise sorted by marker positions.
- If
a text fileis provided, the file format should be:
Animal,marker1,marker2,marker3,marker4,marker5
S1,1,0,1,1,1
D1,2,0,2,2,1
O1,1,2,0,1,0
O3,0,0,2,1,1- If
a DataFrameis provided, where n is the number of individuals and p is the number of markers,- This matrix needs to be column-wise sorted by marker positions.
- The first column in the DataFrame should be individual IDs
- The marker IDs can be provided as the header of the DataFrame. If omitted, marker IDs will be set to 1,2,3...
- If
an nxp Matrixof genotypes (Array) is provided, where n is the number of individuals and p is the number of markers,- This matrix needs to be column-wise sorted by marker positions.
- Individual IDs will be set to 1:n;
- Marker IDs will be set to 1:p
- If
quality_control=true, defaulting totrue,- Missing genotypes should be denoted as
9, and will be replaced by column means. Users can also impute missing genotypes before the analysis. - Minor allele frequency
MAFthreshold, defaulting to0.01, is uesd, and fixed loci are removed.
- Missing genotypes should be denoted as
- G is the mean for the prior assigned for the genomic variance with degree of freedom df, defaulting to 4.0. If G is not provided, a value is calculated from responses (phenotypes).
- Available
methodsinclude "conventional (no markers)", "RR-BLUP", "BayesA", "BayesB", "BayesC", "Bayesian Lasso", and "GBLUP". - In Bayesian variable selection methods,
Pifor single-trait analyses is a number;Pifor multi-trait analyses is a dictionary such asPi=Dict([1.0; 1.0]=>0.7,[1.0; 0.0]=>0.1,[0.0; 1.0]=>0.1,[0.0; 0.0]=>0.1), defaulting toall markers have effects (Pi = 0.0)in single-trait analysis andall markers have effects on all traits (Pi=Dict([1.0; 1.0]=>1.0,[0.0; 0.0]=>0.0))in multi-trait analysis.Piis estimated ifestimatePi= true, , defaulting tofalse. - Scale parameter for prior of marker effect variance is estimated if
estimate_scale= true, defaulting tofalse. annotationsenables AnnotatedBayesC. Supply a numeric matrix with one row per raw marker; JWAS prepends an intercept column internally after marker QC/filtering.storage=:dense(default) keeps the existing in-memory dense loading behavior.storage=:streamloads an opt-in packed backend prepared byprepare_streaming_genotypes.
JWAS.is_unit_weights — Method
is_unit_weights(Rinv)Return true if all residual weights are exactly one.
JWAS.load_streaming_backend — Method
load_streaming_backend(path::AbstractString)Load a packed genotype backend from a prefix (or .jgb2 / .meta path).
JWAS.make_incidence_matrices — Method
make_incidence_matrices(mme,df_whole,train_index)- (internal function) Make incidence matrices for effects involved in
calculation of EBV except marker covariates.
- Both incidence matrices for non-missing observations (used in mixed model equations)
and individuals of interest (output IDs) are obtained.
JWAS.mkDict — Method
mkDict(a::Vector{T}) where T <: Any
Get column index in the incidence matrix for each level of a factor (categorical variable)
input: a=["a1","a4","a1","a2"]
output: d=Dict("a2" => 3, "a1" => 1, "a4" => 2), level_names=["a1","a4","a2"]
note: enumerate(level_names) gives a list of tuples (index, element), reverse() to reverse (index,element) to (element,index)JWAS.mkmat_incidence_factor — Method
mkmatincidencefactor(yID::Vector, uID::Vector) create an incidence matrix Z to reorder uID to yID by yID = ZuID. input: - yID: a vector containing the desired order of IDs - uID: a vector containing the original order of IDs output: - Z: a sparse matrix representing the incidence relationship between yID and uID (yID = ZuID)
JWAS.outputEBV — Method
outputEBV(model,IDs::Array)Output estimated breeding values and prediction error variances for IDs.
JWAS.outputMCMCsamples — Method
outputMCMCsamples(mme::MME,trmStrs::AbstractString...)Get MCMC samples for specific location parameters.
JWAS.output_MCMC_samples — Function
output_MCMC_samples(mme,vRes,G0,outfile=false)(internal function) Save MCMC samples every outputsamplesfrequency iterations to the text file.
JWAS.output_MCMC_samples_setup — Function
output_MCMC_samples_setup(mme,nIter,output_samples_frequency,file_name="MCMC_samples")(internal function) Set up text files to save MCMC samples.
JWAS.output_location_parameters_samples — Method
output_location_parameters_samples(mme::MME,sol,outfile)(internal function) Save MCMC samples for location parameers
JWAS.prediction_setup — Method
prediction_setup(mme::MME)- (internal function) Create incidence matrices for individuals of interest based on a usere-defined
prediction equation, defaulting to genetic values including effects defined with genomic and pedigre information. For now, genomic data is always included.
- J and ϵ are always included in single-step analysis (added in SSBR.jl)
JWAS.prepare_streaming_genotypes — Method
prepare_streaming_genotypes(file::AbstractString;
output_prefix=nothing,
separator=',',
header=true,
missing_value=9.0,
quality_control=true,
MAF=0.01,
center=true,
conversion_mode=:lowmem,
auto_dense_max_bytes=2^30,
tmpdir=nothing,
cleanup_temp=true,
disk_guard_ratio=0.9)Convert a dense text genotype file to a marker-major 2-bit packed backend.
Conversion backend selection:
conversion_mode=:lowmemuses out-of-core staged conversion (disk-backed).conversion_mode=:denseuses in-memory conversion.conversion_mode=:autochooses between the two based onauto_dense_max_bytes.
Low-memory conversion options:
tmpdir: optional location for temporary conversion files.cleanup_temp: remove temporary files after successful conversion.disk_guard_ratio: fail fast when estimated required bytes exceeddisk_guard_ratio * available_bytes.
JWAS.runMCMC — Method
runMCMC(model::MME,df::DataFrame;
#Data
heterogeneous_residuals = false,
#MCMC
chain_length::Integer = 100,
starting_value = false,
burnin::Integer = 0,
output_samples_frequency::Integer = chain_length>1000 ? div(chain_length,1000) : 1,
update_priors_frequency::Integer = 0,
#Methods
single_step_analysis = false, #parameters for single-step analysis
pedigree = false, #parameters for single-step analysis
fitting_J_vector = true, #parameters for single-step analysis
categorical_trait = false,
censored_trait = false,
causal_structure = false,
mega_trait = false,
missing_phenotypes = true,
constraint = false,
#Genomic Prediction
outputEBV = true,
output_heritability = true,
prediction_equation = false,
#MISC
seed = false,
printout_model_info = true,
printout_frequency = chain_length+1,
big_memory = false,
double_precision = false,
fast_blocks = false,
memory_guard = :error,
memory_guard_ratio = 0.80,
##MCMC samples (defaut to marker effects and hyperparametes (variance components))
output_folder = "results",
output_samples_for_all_parameters = false,
##for deprecated JWAS
methods = "conventional (no markers)",
Pi = 0.0,
estimatePi = false)Run MCMC for Bayesian Linear Mixed Models with or without estimation of variance components.
- Markov chain Monte Carlo
- The first
burniniterations are discarded at the beginning of a MCMC chain of lengthchain_length. - Save MCMC samples every
output_samples_frequencyiterations, defaulting tochain_length/1000, to a folderoutput_folder, defaulting toresults. MCMC samples for hyperparametes (variance componets) and marker effects are saved by default. MCMC samples for location parametes can be saved using functionoutput_MCMC_samples(). Note that saving MCMC samples too frequently slows down the computation. - The
starting_valuecan be provided as a vector for all location parameteres and marker effects, defaulting to0.0s. The order of starting values for location parameters and marker effects should be the order of location parameters in the Mixed Model Equation for all traits (This can be obtained by getNames(model)) and then markers for all traits (all markers for trait 1 then all markers for trait 2...). - Miscellaneous Options
- Priors are updated every
update_priors_frequencyiterations, defaulting to0.
- Priors are updated every
- The first
- Methods
- Single step analysis is allowed if
single_step_analysis=trueandpedigreeis provided. - Miscellaneous Options
- Missing phenotypes are allowed in multi-trait analysis with
missing_phenotypes=true, defaulting totrue. - Catogorical Traits are allowed if
categorical_trait=true, defaulting tofalse. Phenotypes should be coded as 1,2,3... - Censored traits are allowed if the upper bounds are provided in
censored_traitas an array, and lower bounds are provided as phenotypes. - If
constraint=true, defaulting tofalse, constrain residual covariances between traits to be zeros. - If
causal_structureis provided, e.g., causal_structure = [0.0 0.0 0.0;1.0 0.0 0.0;1.0 0.0 0.0] for trait 1 -> trait 2 and trait 1 -> trait 3 (column index affacts row index, and a lower triangular matrix is required), phenotypic causal networks will be incorporated using structure equation models.
- Missing phenotypes are allowed in multi-trait analysis with
- Single step analysis is allowed if
- Genomic Prediction
- Predicted values for individuals of interest can be obtained based on a user-defined prediction equation
prediction_equation, e.g., "y1:animal + y1:age".
prediction_equation= false, defaulting tofalse.- Individual estimted genetic values and prediction error variances (PEVs) are returned if
outputEBV=true, defaulting totrue. Heritability and genetic
output_heritability=true, defaulting totrue. Note that estimation of heritability is computaionally intensive. - Predicted values for individuals of interest can be obtained based on a user-defined prediction equation
- Miscellaneous Options
- Print out the model information in REPL if
printout_model_info=true; print out the monte carlo mean in REPL withprintout_frequency, defaulting tofalse. - If
seed, defaulting tofalse, is provided, a reproducible sequence of numbers will be generated for random number generation. - If
big_memory=true, defaulting tofalse, a machine with lots of memory is assumed which may speed up the analysis. memory_guardcontrols the marker-memory precheck before MCMC (:error,:warn,:off; default:error).memory_guard_ratiosets the allowed fraction ofSys.total_memory()for the precheck (default0.80).
- Print out the model information in REPL if
JWAS.set_covariate — Method
set_covariate(model::MME,variables::AbstractString...)- set variables as covariates; model is the output of function
build_model().
#After running build_model, variabels age and year can be set to be covariates as
set_covariate(model,"age","year")
#or
set_covariate(model,"age year")JWAS.set_random — Function
set_random(mme::MME,randomStr::AbstractString,G;Vinv=0,names=[],df=4)- set variables as random effects, defaulting to i.i.d effects, with variances G.
- G is the mean for the prior assigned for the variance with degree of freedom df, defaulting to 4.0. If G is not provided, a value is calculated from responses (phenotypes).
- the random effects are assumed to be i.i.d by default and it can be defined with any (inverse of) covariance structure Vinv with its index (row names) provided by names.
#single-trait (i.i.d randome effects)
model_equation = "y = intercept + litter + sex"
model = build_model(model_equation,R)
G = 0.6
set_random(model,"litter",G)
#multi-trait (i.i.d randome effects)
model_equations = "BW = intercept + litter + sex
CW = intercept + litter + sex"
model = build_model(model_equations,R);
G = [3.72 1.84
1.84 3.41]
set_random(model,"litter",G)
#single-trait (randome effects with specific covariance structures)
model_equation = "y = intercept + litter + sex"
model = build_model(model_equation,R)
V = [1.0 0.5 0.25
0.5 1.0 0.5
0.25 0.5 1.0]
G = 0.6
set_random(model,"litter",G,Vinv=inv(V),names=[a1;a2;a3])JWAS.set_random — Function
set_random(mme::MME,randomStr::AbstractString,ped::Pedigree, G;df=4)- set variables as random polygenic effects with pedigree information ped. and variances G.
- G is the mean for the prior assigned for the variance with degree of freedom df, defaulting to 4.0. If G is not provided, a value is calculated from responses (phenotypes).
#single-trait (example 1)
model_equation = "y = intercept + age + animal"
model = build_model(model_equation,R)
ped = get_pedigree(pedfile)
G = 1.6
set_random(model,"animal", ped, G)
#single-trait (example 2)
model_equation = "y = intercept + age + animal + animal*age"
model = build_model(model_equation,R)
ped = get_pedigree(pedfile)
G = [1.6 0.2
0.2 1.0]
set_random(model,"animal animal*age", ped,G)
#multi-trait
model_equations = "BW = intercept + age + sex + animal
CW = intercept + age + sex + animal"
model = build_model(model_equations,R);
ped = get_pedigree(pedfile);
G = [6.72 2.84
2.84 8.41]
set_random(model,"animal",ped,G)JWAS.solve — Method
solve(mme::MME,df::DataFrame;solver="default",printout_frequency=100,tolerance = 0.000001,maxiter = 5000)- Solve the mixed model equations (no marker information) without estimating variance components.
Available solvers include default, Jacobi, Gauss-Seidel, and Gibbs sampler.
JWAS.streaming_mul_alpha! — Method
streaming_mul_alpha!(out, backend, α)Compute out = X*α from a streaming backend without materializing dense X.
JWAS.transubstrarr — Method
transubstrarr(vec)(internal function) Transpose a column vector of strings (vec' doesn't work here)
JWAS.Packed2BitBackend — Type
Packed2BitBackendStreaming backend for marker-major 2-bit packed genotypes used by storage=:stream in get_genotypes.
JWAS.PedModule.get_info — Method
get_info(pedigree::Pedigree;Ai=false)- Print summary informtion from a pedigree object including number of individulas, sires. dams and founders. Return individual IDs, inverse of numerator relationship matrix, and inbreeding coefficients if Ai=
true.
JWAS.PedModule.get_pedigree — Method
get_pedigree(pedfile::AbstractString;header=false,separator=',',missingstring=["0"])- Get pedigree informtion from a pedigree file with header (defaulting to
false) , separator (defaulting to,) and missing values (defaulting to ["0"]) - Pedigree file format:
a,0,0
c,a,b
d,a,c–>