Mixed effect neural network: Genotypes -> Unobserved intermediate traits -> Phenotypes

Tip: center the phenotypes so that they have zero mean.
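As a minimal sketch of the centering step, the snippet below subtracts the mean of the observed records from a phenotype column. The toy data frame and its values are made up for illustration; only the column name "y1" is taken from the example that follows. With real data, the same two lines would be applied to the phenotype table after it is read.

```julia
using DataFrames, Statistics

# Toy phenotype table (hypothetical values); "y1" matches the trait name used in the example
phenotypes = DataFrame(ID = ["a1", "a2", "a3", "a4"],
                       y1 = [1.5, missing, 2.5, 4.0])

# Mean of the observed (non-missing) records only
y1_mean = mean(skipmissing(phenotypes.y1))

# Center the trait; missing records stay missing
phenotypes.y1 = phenotypes.y1 .- y1_mean
```

Using skipmissing keeps missing records from turning the mean itself into missing.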

Example(a): fully-connected neural network, all intermediate traits are unobserved

  • nonlinear function (defines the relationship between the middle layer and the phenotype): tanh (other supported activation functions: "sigmoid", "relu", "leakyrelu", "linear")
  • number of nodes in the middle layer: 3
  • Bayesian model: multiple independent single-trait BayesC (to sample marker effects on intermediate traits). Note: to use multi-trait Bayesian Alphabet models instead, set mega_trait=false in the runMCMC() function.
  • sampler for the unobserved intermediate traits in the middle layer: Hamiltonian Monte Carlo
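The structure described by these settings can be sketched as a plain forward pass. This is an illustration only, not how JWAS is implemented. All sizes and values are hypothetical, residual terms are omitted, and the form "bias + weighted sum of activated middle nodes" for the phenotype is an assumption based on the output file descriptions further down.

```julia
using Random

Random.seed!(1)
n, p, k = 5, 10, 3                  # individuals, markers, middle nodes (toy sizes)

X  = Float64.(rand(0:2, n, p))      # hypothetical genotype matrix coded 0/1/2
mu = randn(k)                       # hypothetical intercept of each middle node
A  = 0.1 .* randn(p, k)             # hypothetical marker effects, one column per middle node

# Input layer -> middle layer: each node follows "middle node = intercept + genotypes"
Z = mu' .+ X * A                    # n x k matrix of intermediate traits

# Middle layer -> phenotype: bias plus weighted sum of tanh-activated middle nodes
w0 = 0.0                            # hypothetical bias of the phenotype
w  = randn(k)                       # hypothetical weights between middle nodes and phenotype
y_hat = w0 .+ tanh.(Z) * w          # length-n vector of fitted phenotypes
```

In the actual analysis none of these quantities are fixed: marker effects, intermediate traits, bias, and weights are all sampled by MCMC.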

# Step 1: Load packages
using JWAS,DataFrames,CSV,Statistics,JWAS.Datasets,Random
Random.seed!(123)

# Step 2: Read data 
phenofile  = dataset("phenotypes.csv") #get example data path
genofile   = dataset("genotypes.csv")  #get example data path

phenotypes = CSV.read(phenofile,DataFrame,delim = ',',header=true,missingstring="NA")    #read phenotypes (output layer); "NA" is treated as missing
genotypes  = get_genotypes(genofile,separator=',',method="BayesC");                      #read genotypes  (input layer)


# Step 3: Build Model Equations
model_equation = "y1 = intercept + genotypes"  #name of phenotypes is "y1" in the phenotypes data
                                               #name of genotypes is "genotypes" (user-defined in the previous step)
                                               #the single-trait mixed model used between input and each node in middle layer is: middle node = intercept + genotypes
model = build_model(model_equation,
                    num_hidden_nodes=3,            #number of nodes in middle layer is 3
                    nonlinear_function="tanh");    #tanh function is used to approximate relationship between middle layer and phenotype


# Step 4: Run Analysis
out=runMCMC(model,phenotypes,chain_length=5000); 

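# Step 5: Check Accuracy
results  = innerjoin(out["EBV_NonLinear"], phenotypes, on = :ID)  #match estimated breeding values to phenotype records by ID
accuracy = cor(results[!,:EBV],results[!,:bv1])                   #correlation between the EBV column and the bv1 column of the phenotype data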

Example output files

The i-th middle node is named "trait name" + "i". In this example, the observed trait is named "y1" and there are 3 middle nodes, so the middle nodes are named "y11", "y12", and "y13".
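The naming rule can be reproduced with a one-line comprehension. The variable names here are illustrative and not part of the JWAS API.

```julia
trait_name       = "y1"   # observed trait name from the model equation
num_hidden_nodes = 3      # same value passed to build_model in the example

# Append the node index to the trait name
middle_nodes = [trait_name * string(i) for i in 1:num_hidden_nodes]
println(middle_nodes)     # prints ["y11", "y12", "y13"]
```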

Below is a list of files containing estimates and standard deviations for variables of interest.

| file name | description |
| --- | --- |
| EBV_NonLinear.txt | estimated breeding values for observed trait |
| EBV_y11.txt | estimated breeding values for middle node 1 |
| EBV_y12.txt | estimated breeding values for middle node 2 |
| EBV_y13.txt | estimated breeding values for middle node 3 |
| genetic_variance.txt | estimated genetic variance-covariance of all middle nodes |
| heritability.txt | estimated heritability of all middle nodes |
| location_parameters.txt | estimated bias of all middle nodes |
| neural_networks_bias_and_weights.txt | estimated bias of phenotypes and weights between middle nodes and phenotypes |
| pi_genotypes.txt | estimated pi of all middle nodes |
| marker_effects_genotypes.txt | estimated marker effects of all middle nodes |
| residual_variance.txt | estimated residual variance-covariance for all middle nodes |

Below is a list of files containing MCMC samples for variables of interest.

| file name | description |
| --- | --- |
| MCMC_samples_EBV_NonLinear.txt | MCMC samples from the posterior distribution of breeding values for phenotypes |
| MCMC_samples_EBV_y11.txt | MCMC samples from the posterior distribution of breeding values for middle node 1 |
| MCMC_samples_EBV_y12.txt | MCMC samples from the posterior distribution of breeding values for middle node 2 |
| MCMC_samples_EBV_y13.txt | MCMC samples from the posterior distribution of breeding values for middle node 3 |
| MCMC_samples_genetic_variance.txt | MCMC samples from the posterior distribution of genetic variance-covariance for all middle nodes |
| MCMC_samples_heritability.txt | MCMC samples from the posterior distribution of heritability for all middle nodes |
| MCMC_samples_marker_effects_genotypes_y11 | MCMC samples from the posterior distribution of marker effects for middle node 1 |
| MCMC_samples_marker_effects_genotypes_y12 | MCMC samples from the posterior distribution of marker effects for middle node 2 |
| MCMC_samples_marker_effects_genotypes_y13 | MCMC samples from the posterior distribution of marker effects for middle node 3 |
| MCMC_samples_marker_effects_variances_genotypes.txt | MCMC samples from the posterior distribution of marker effect variance for all middle nodes |
| MCMC_samples_neural_networks_bias_and_weights.txt | MCMC samples from the posterior distribution of bias of observed trait and weights between middle nodes and phenotypes |
| MCMC_samples_pi_genotypes.txt | MCMC samples from the posterior distribution of pi for all middle nodes |
| MCMC_samples_residual_variance.txt | MCMC samples from the posterior distribution of residual variance-covariance for all middle nodes |
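
A rough sketch of how a posterior mean and standard deviation could be computed from saved MCMC samples is shown below. The matrix is a simulated stand-in for a samples file, and the layout (one row per saved sample, one column per variable) is an assumption. Values computed this way from real sample files need not match the estimate files exactly.

```julia
using Random, Statistics

Random.seed!(2)
# Toy stand-in for an MCMC samples file: 1000 saved samples (rows) of 3 variables (columns)
samples = randn(1000, 3) .* [0.1 0.2 0.3] .+ [0.5 -1.0 2.0]

# Column-wise summaries over the saved samples
posterior_mean = vec(mean(samples, dims=1))
posterior_sd   = vec(std(samples, dims=1))
```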