Public API Reference
Documentation for NNMM.jl's public interface. Below are functions available to general users.
This documentation reflects NNMM.jl v0.3+ using the Layer/Equation/runNNMM API.
Index
- NNMM.Equation
- NNMM.Layer
- NNMM.Omics
- NNMM.Phenotypes
- NNMM.GWAS
- NNMM.getEBV
- NNMM.get_genotypes
- NNMM.nnmm_get_genotypes
- NNMM.nnmm_get_omics
- NNMM.read_phenotypes
- NNMM.runNNMM
Core Types
Layer
Defines a layer in the neural network architecture.
NNMM.Layer — Type
Layer
A struct representing a layer in the Neural Network Mixed Model (NNMM).
Fields
- layer_name::String: Name identifier for the layer
- data_path: Path to the data file(s): a String for a single file, or a Vector{String} for partial connection
- separator: Column separator in the data file (default: ',')
- header: Whether the data file has a header row (default: true)
- data: Loaded data (initialized as empty)
- quality_control: Whether to perform quality control on genotypes (default: true)
- MAF: Minor Allele Frequency threshold for QC (default: 0.01)
- missing_value: Value representing missing data (default: 9.0)
- center: Whether to center the data (default: true)
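The quality-control fields (quality_control, MAF, missing_value, center) can be pictured with a small, self-contained sketch. This is not NNMM.jl's implementation: the helper name `qc_and_center` is hypothetical, genotypes are assumed to be coded 0/1/2, and entries equal to `missing_value` are assumed to be replaced by the marker mean before filtering.

```julia
using Statistics

# Hypothetical helper (not part of NNMM.jl): illustrate MAF filtering and centering
# on a genotype matrix coded 0/1/2, with `missing_value` marking missing entries.
function qc_and_center(M::Matrix{Float64}; MAF=0.01, missing_value=9.0, center=true)
    X = copy(M)
    keep = Int[]
    for j in axes(X, 2)
        col = view(X, :, j)
        observed = col[col .!= missing_value]
        isempty(observed) && continue           # drop markers with no observed genotypes
        col_mean = mean(observed)
        col[col .== missing_value] .= col_mean  # assumption: impute with the marker mean
        p = col_mean / 2                        # allele frequency under 0/1/2 coding
        min(p, 1 - p) >= MAF && push!(keep, j)  # keep markers passing the MAF threshold
    end
    X = X[:, keep]
    if center
        X = X .- mean(X, dims=1)                # center each retained marker column
    end
    return X, keep
end

M = [0.0 2.0 0.0;
     1.0 2.0 9.0;
     2.0 2.0 1.0;
     1.0 2.0 1.0]
X, keep = qc_and_center(M)
```

In this toy matrix the second marker is monomorphic, so it fails the MAF threshold and only markers 1 and 3 are retained.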
Usage:
# Genotype layer (input) - note the [] around the path
layer1 = Layer(layer_name="geno", data_path=["genotypes.csv"])
# Omics layer (hidden) with missing value handling
layer2 = Layer(layer_name="omics", data_path="omics.csv", missing_value="NA")
# Phenotype layer (output)
layer3 = Layer(layer_name="pheno", data_path="phenotypes.csv", missing_value="NA")
Equation
Defines the statistical model connecting two layers.
NNMM.Equation — Type
Equation
A struct representing an equation connecting layers in NNMM.
Fields
- from_layer_name: Source layer name
- to_layer_name: Target layer name
- equation: Model equation string (e.g., "omics = intercept + geno")
- traits: Output trait column names (e.g., ["omic1", "omic2"] or ["trait1"])
- covariate: Covariate variable names
- random: Random effect specifications
- activation_function: Activation function, either a string ("linear", "sigmoid", "tanh", "relu", "leakyrelu") or a custom Function
- partial_connect_structure: Structure for partial connectivity
- starting_value: Starting values for MCMC
- method: Bayesian method ("BayesC", "BayesA", "BayesB", etc.)
- Pi: Prior probability of zero effect
- estimatePi: Whether to estimate Pi
- Variance parameters for G (genetic) and R (residual)
Note: omics_name and phenotype_name are deprecated aliases for traits.
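The string names accepted by activation_function correspond to standard formulas (tabulated under Activation Functions later in this reference). The sketch below writes them out in plain Julia as an illustration; the function names and the `activations` dictionary are illustrative and are not claimed to be how NNMM.jl defines them internally.

```julia
# Plain-Julia versions of the standard formulas (names are illustrative).
linear(x)    = x
sigmoid(x)   = 1 / (1 + exp(-x))
relu(x)      = max(zero(x), x)
leakyrelu(x) = max(0.01x, x)
# `tanh` is already provided by Julia's Base.

activations = Dict("linear" => linear, "sigmoid" => sigmoid, "tanh" => tanh,
                   "relu" => relu, "leakyrelu" => leakyrelu)

# Evaluate every activation on the same inputs to compare their ranges.
x = [-2.0, 0.0, 2.0]
outputs = Dict(name => f.(x) for (name, f) in activations)
```

Because activation_function also accepts a custom Function, any one-argument function of this shape could in principle be passed instead of a string.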
Usage:
# Genotypes → Omics (BayesC with 3 omics features)
eq1 = Equation(
    from_layer_name = "geno",
    to_layer_name = "omics",
    equation = "omics = intercept + geno",
    traits = ["omic1", "omic2", "omic3"],
    method = "BayesC",
    estimatePi = true
)
# Omics → Phenotypes (with sigmoid activation)
eq2 = Equation(
    from_layer_name = "omics",
    to_layer_name = "pheno",
    equation = "pheno = intercept + omics",
    traits = ["y1"],
    method = "BayesC",
    activation_function = "sigmoid"
)
Supporting Types
NNMM.Omics — Type
Omics
A struct representing omics data (e.g., transcriptomics, metabolomics) in NNMM.
Fields
- name: Name identifier for this omics category
- trait_names: Names for corresponding traits
- obsID: Row IDs for individuals
- featureID: Feature/omics variable IDs
- nObs: Number of observations
- nFeatures: Number of omics features
- nMarkers: Alias for nFeatures (compatibility)
- centered: Whether data is centered
- data: The omics data matrix
- ntraits: Number of traits in the model
- Additional fields for Bayesian MCMC computations
NNMM.Phenotypes — Type
Phenotypes
A struct representing phenotype data in NNMM.
Fields
- obsID: Individual IDs
- featureID: Phenotype/trait names
- nObs: Number of observations
- nPheno: Number of phenotypes
- data: Phenotype data matrix
- nFeatures: Alias for nPheno
Main Functions
runNNMM
The primary function for running NNMM analysis.
NNMM.runNNMM — Function
runNNMM(layers, equations; kwargs...)
Run Neural Network Mixed Model (NNMM) analysis.
Arguments
- layers: Vector of 3 Layer objects defining the network architecture
  - Layer 1: Genotypes (SNPs)
  - Layer 2: Omics/Latent traits
  - Layer 3: Phenotypes
- equations: Vector of 2 Equation objects defining relationships
  - Equation 1: Genotypes → Omics (1→2)
  - Equation 2: Omics → Phenotypes (2→3)
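As a rough picture of the three-layer structure listed above, the sketch below pushes genotypes through a linear first equation and an activation in the second. It only illustrates the shape of the computation, not the MCMC sampler. All names (`forward_pass`, `alpha`, `w`, `mu1`, `mu2`) are hypothetical, the effect values are made up, and it assumes the activation is applied to the omics layer before it enters the second equation.

```julia
# Hypothetical illustration of the 1→2→3 structure (not NNMM.jl code).
# geno:  n × p genotype matrix (layer 1)
# alpha: p × q marker effects, one column per omics feature (equation 1)
# w:     q-vector of omics effects on the phenotype (equation 2)
function forward_pass(geno, alpha, w; mu1=0.0, mu2=0.0, activation=tanh)
    omics = mu1 .+ geno * alpha             # "omics = intercept + geno"
    pheno = mu2 .+ activation.(omics) * w   # "pheno = intercept + omics" after activation
    return omics, pheno
end

geno  = [0.0 1.0; 2.0 1.0; 1.0 0.0]   # 3 individuals, 2 SNPs
alpha = [0.5 -0.2; 0.1 0.3]           # 2 SNPs → 2 omics features
w     = [1.0, -1.0]

# With a linear (identity) activation this reduces to two stacked regressions.
omics, pheno = forward_pass(geno, alpha, w; activation=identity)
```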
Keyword Arguments
MCMC Settings
- chain_length::Integer=100: Total MCMC iterations
- burnin::Integer=0: Number of burn-in iterations to discard
- output_samples_frequency::Integer: Save every nth sample (default: auto)
- output_prediction_frequency::Integer: Save every nth prediction sample for EBV/EPV (default: output_samples_frequency)
- update_priors_frequency::Integer=0: Update prior parameters every n iterations
Output Settings
- outputEBV=true: Output estimated breeding values
- output_heritability=true: Calculate heritability estimates
- output_folder="nnmm_results": Directory for output files
Computational Settings
- seed=false: Random seed for reproducibility
- double_precision=false: Use Float64 instead of Float32
- big_memory=false: Enable memory-intensive optimizations
Returns
A dictionary containing:
- Posterior means for all parameters
- MCMC samples (saved to files)
- EBV estimates
Example
# Define layers
layer1 = Layer(layer_name="geno", data_path=["genotypes.csv"])
layer2 = Layer(layer_name="omics", data_path="omics.csv")
layer3 = Layer(layer_name="pheno", data_path="phenotypes.csv")
# Define equations
eq1 = Equation(from_layer_name="geno", to_layer_name="omics",
               equation="omics = intercept + geno",
               method="BayesC", traits=["o1","o2"])
eq2 = Equation(from_layer_name="omics", to_layer_name="pheno",
               equation="pheno = intercept + omics",
               activation_function=tanh, traits=["y"])
# Run analysis
results = runNNMM([layer1, layer2, layer3], [eq1, eq2],
                  chain_length=5000, burnin=1000)
Usage:
results = runNNMM(layers, equations;
    chain_length = 10000,
    burnin = 2000,
    output_folder = "my_results",
    seed = 42
)
describe
Print model summary information.
DataAPI.describe — Function
describe(model::MME)
Print a summary of the mixed model equations (MME) structure.
Displays:
- Model equations (truncated if >5)
- Term information (classification, fixed/random, number of levels)
- Prior distributions and hyperparameters
- Variance component settings
- Marker effect settings (if genomic data included)
Arguments
model::MME: The mixed model equations object (created internally by runNNMM)
Output
Prints formatted model summary to stdout including:
- Model equations
- Term classification (covariate/factor, fixed/random)
- Prior settings for variance components
- MCMC configuration
Example
# After running runNNMM, the describe function is called automatically.
# For manual inspection:
describe(results["mme"])
Data Reading Functions
read_phenotypes
Read phenotype data from a file.
NNMM.read_phenotypes — Function
read_phenotypes(file; separator=',', header=true, missing_value="NA", output_folder=".")
Read phenotype data from a CSV file.
Arguments
file: Path to CSV file containing phenotype data
Keyword Arguments
- separator: Column delimiter (default: ',')
- header: Whether file has header row (default: true)
- missing_value: String representing missing values (default: "NA")
- output_folder: Directory for output files (default: current directory)
Returns
DataFrame with phenotype data. First column is individual IDs.
Example
pheno_df = read_phenotypes("phenotypes.csv", missing_value="NA")
Notes
- First column must contain individual IDs
- Missing values are converted to Julia's missing type
- A file IDs_for_individuals_with_phenotypes.txt is written to output_folder
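The conversion of a missing-value marker such as "NA" into Julia's `missing` can be sketched with Base Julia alone. The helper `parse_phenotype_text` below is hypothetical and works on an in-memory string; read_phenotypes itself reads from a file and returns a DataFrame, so treat this only as an illustration of the ID column and missing-value conventions in the notes above.

```julia
# Hypothetical helper (not NNMM.jl's reader): parse delimited phenotype text,
# keep the first column as string IDs, and map `missing_value` to `missing`.
function parse_phenotype_text(text::AbstractString; separator=',', missing_value="NA")
    lines = split(strip(text), '\n')
    header = String.(strip.(split(lines[1], separator)))
    ids = String[]
    values = Vector{Vector{Union{Missing,Float64}}}()
    for line in lines[2:end]
        fields = strip.(split(line, separator))
        push!(ids, String(fields[1]))           # first column holds individual IDs
        row = Union{Missing,Float64}[f == missing_value ? missing : parse(Float64, f)
                                     for f in fields[2:end]]
        push!(values, row)
    end
    return header, ids, values
end

text = """
ID,y1,y2
a1,1.5,NA
a2,NA,0.3
"""
header, ids, values = parse_phenotype_text(text)
```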
nnmm_get_genotypes
Read genotype data from a file or matrix.
NNMM.nnmm_get_genotypes — Function
nnmm_get_genotypes(file, G=false; kwargs...)
Read genotype data for the NNMM model (layer 1).
Arguments
- file: Genotype data source, one of:
  - String: Path to CSV file
  - DataFrame: DataFrame with ID column + marker columns
  - Matrix: Numeric matrix of genotypes
- G: Prior genetic variance (default: estimated from data)
Keyword Arguments
- method: Bayesian method ("BayesA", "BayesB", "BayesC", "RR-BLUP", "GBLUP", "BayesL")
- Pi: Prior probability that a marker has zero effect (default: 0.0)
- estimatePi: Whether to estimate Pi (default: true)
- G_is_marker_variance: If true, G is marker variance; if false, G is genetic variance
- df: Degrees of freedom for prior (default: 4.0)
- quality_control: Perform QC filtering (default: true)
- MAF: Minor allele frequency threshold (default: 0.01)
- center: Center genotypes (default: true)
- separator: File delimiter (default: ',')
- header: File has header row (default: true)
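To make the roles of method and Pi more concrete, the sketch below draws marker effects from a BayesC-style mixture prior. It takes Pi to be the prior probability that a marker effect is zero, as the Equation field list describes it. The function `draw_bayesc_effects` and the parameter `sigma2` are hypothetical names for illustration; this is not the sampler used by NNMM.jl.

```julia
using Random

# Hypothetical sketch of a BayesC-style prior draw (not the NNMM.jl sampler):
# each marker effect is zero with probability `Pi`, otherwise Normal(0, sigma2)
# with one variance shared by all markers.
function draw_bayesc_effects(rng::AbstractRNG, n_markers::Integer; Pi=0.9, sigma2=1.0)
    effects = zeros(n_markers)
    for j in 1:n_markers
        if rand(rng) >= Pi                       # marker is included in the model
            effects[j] = sqrt(sigma2) * randn(rng)
        end
    end
    return effects
end

rng = MersenneTwister(1)
effects = draw_bayesc_effects(rng, 1000; Pi=0.9)
included = count(!iszero, effects) / length(effects)   # expected to be near 1 - Pi
```

Under this reading, Pi = 0.0 means every marker is included, which matches the common-variance, all-markers description of RR-BLUP.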
Returns
Genotypes struct containing processed genotype data and method parameters.
get_genotypes
Alias for nnmm_get_genotypes.
NNMM.get_genotypes — Function
get_genotypes(args...; kwargs...)
Alias for nnmm_get_genotypes. See that function for full documentation.
nnmm_get_omics
Read omics data from a file.
NNMM.nnmm_get_omics — Function
nnmm_get_omics(file, G=false; kwargs...)
Read omics data for the NNMM model (layer 2 / middle layer).
Arguments
- file: Path to CSV file containing omics data
- G: Prior genetic variance (default: estimated from data)
Keyword Arguments
- omics_name: Vector of column names to use as omics features (required)
- method: Bayesian method ("BayesA", "BayesB", "BayesC", "RR-BLUP", "BayesL")
- Pi: Prior probability that a feature has zero effect (default: 0.0)
- estimatePi: Whether to estimate Pi (default: true)
- G_is_marker_variance: If true, G is marker variance; if false, G is genetic variance
- df: Degrees of freedom for prior (default: 4.0)
- constraint: Use independent variances for multi-trait (default: true)
- separator: File delimiter (default: ',')
- header: File has header row (default: true)
- missing_value: String/value representing missing data (default: false)
Returns
Omics struct containing the omics data and method parameters.
Example
omics = nnmm_get_omics("omics_data.csv",
                       omics_name=["gene1", "gene2", "gene3"],
                       missing_value="NA")
Notes
- First column must be individual IDs
- Missing values will be sampled during MCMC (HMC for latent traits)
- Unlike genotypes, no quality control (MAF filtering) is applied
Post-Analysis Functions
GWAS
Genome-wide association study on MCMC results.
NNMM.GWAS — Function
GWAS(marker_effects_file; header=true)
Compute the model frequency for each marker.
Model frequency is the probability that a marker is included in the model (i.e., has non-zero effect) across MCMC samples.
Arguments
- marker_effects_file: Path to CSV file with MCMC samples of marker effects
- header: Whether file has header row (default: true)
Returns
DataFrame with columns:
- marker_ID: Marker identifier
- modelfrequency: Proportion of samples where effect ≠ 0
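The model frequency defined above is simply the share of saved MCMC samples in which a marker's sampled effect is non-zero. A minimal sketch follows, assuming the samples are already loaded as a matrix with one row per saved sample and one column per marker. The helper `model_frequency` is hypothetical and stands in for what GWAS computes from the samples file.

```julia
# Hypothetical helper: model frequency from a samples × markers matrix of
# sampled marker effects (rows are saved MCMC samples).
function model_frequency(samples::AbstractMatrix{<:Real})
    n_samples = size(samples, 1)
    # fraction of saved samples in which each marker's effect is non-zero
    return vec(sum(samples .!= 0, dims=1)) ./ n_samples
end

# Four saved samples for three markers (made-up values).
samples = [0.0  0.12 0.0;
           0.0  0.08 0.3;
           0.05 0.10 0.0;
           0.0  0.09 0.0]
freq = model_frequency(samples)
```

Here the second marker is non-zero in every sample, so its model frequency is 1.0, while the other two are non-zero in one sample out of four.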
Example
# After running NNMM with BayesB or BayesC
freq = GWAS("nnmm_results/MCMC_samples_marker_effects_genotypes.txt")
GWAS(model,map_file,marker_effects_file...;
     window_size = "1 Mb",sliding_window = false,
     GWAS = true, threshold = 0.001,
     genetic_correlation = false,
     header = true)
Run genomic window-based GWAS.
- MCMC samples of marker effects are stored in marker_effects_file with delimiter ','.
- model is either the model::MME used in analysis or the genotype covariate matrix M::Array.
- map_file has the (sorted) marker position information with delimiter ','. If the map file is not provided, i.e., map_file=false, a fake map file will be generated with window_size markers in each 1 Mb window, and each 1 Mb window will be tested.
- If two marker_effects_file are provided, and genetic_correlation = true, genomic correlation for each window is calculated.
- Statistics are computed for nonoverlapping windows of size window_size by default. If sliding_window = true, those for overlapping sliding windows are calculated.
- map file format:
markerID,chromosome,position
m1,1,16977
m2,1,434311
m3,1,1025513
m4,2,70350
m5,2,101135
Usage:
# Run GWAS on marker effect samples
gwas_result = GWAS("results/MCMC_samples_marker_effects_geno_omic1.txt")
# Sort by model frequency
sorted = sort(gwas_result, :modelfrequency, rev=true)
println(first(sorted, 10))
getEBV
Extract estimated breeding values from results.
NNMM.getEBV — Function
getEBV(model::MME,traiti)
(internal function) Get breeding values for individuals defined by outputEBV(), defaulting to all genotyped individuals. This function is used inside MCMC functions for one MCMC sample from the posterior distribution, e.g.,
- non-NNBayes_partial (multi-class Bayes):
  y1=M1*α1[1]+M2*α2[1]+M3*α3[1]
  y2=M1*α1[2]+M2*α2[2]+M3*α3[2]
- NNBayes_partial:
  y1=M1*α1[1]
  y2=M2*α2[1]
  y3=M3*α3[1]
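The two cases in the getEBV description can be written out as a small numeric sketch: in the fully connected case every genotype group contributes to every trait, and in the partially connected case group k contributes only to trait k. The functions `ebv_full` and `ebv_partial` and all values are hypothetical; they mirror the formulas, not the internal getEBV code.

```julia
# Hypothetical illustration of the two cases (not NNMM.jl code).
# Ms:     genotype matrices M1, M2, M3 (one per genotype group)
# alphas: alphas[k][t] is the effect vector of group k for trait t

# Fully connected: every group contributes to every trait.
ebv_full(Ms, alphas, t) = sum(Ms[k] * alphas[k][t] for k in eachindex(Ms))

# Partially connected: group k contributes only to trait k.
ebv_partial(Ms, alphas, k) = Ms[k] * alphas[k][1]

M1 = [1.0 0.0; 0.0 1.0]
M2 = [2.0 1.0; 1.0 0.0]
M3 = [0.0 1.0; 1.0 1.0]
Ms = [M1, M2, M3]

alphas_full = [[[0.1, 0.2], [0.0, 0.1]],    # group 1: effects for traits 1 and 2
               [[0.3, 0.0], [0.2, 0.2]],    # group 2
               [[0.0, 0.5], [0.1, 0.0]]]    # group 3
y1 = ebv_full(Ms, alphas_full, 1)           # M1*α1[1] + M2*α2[1] + M3*α3[1]

alphas_partial = [[[0.1, 0.2]], [[0.3, 0.0]], [[0.0, 0.5]]]
y3 = ebv_partial(Ms, alphas_partial, 3)     # M3*α3[1]
```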
Built-in Datasets
dataset
Access built-in example datasets.
NNMM.Datasets.dataset — Function
dataset(file_name::AbstractString; dataset_name::AbstractString="")
Get the path to a built-in dataset file.
Arguments
- file_name::AbstractString: The name of the file to retrieve
- dataset_name::AbstractString="": Optional subdirectory name within the data folder
Returns
String: Full path to the requested data file
Examples
phenofile = dataset("phenotypes.csv")
genofile = dataset("genotypes.txt", dataset_name="example")
Usage:
using NNMM.Datasets
# Access default example data
pheno_path = Datasets.dataset("phenotypes.csv")
geno_path = Datasets.dataset("genotypes.csv")
# Access simulated omics dataset
geno_path = Datasets.dataset("genotypes_1000snps.txt", dataset_name="simulated_omics_data")
pheno_path = Datasets.dataset("phenotypes_sim.txt", dataset_name="simulated_omics_data")
Available Datasets:
| Dataset | Files | Description |
|---|---|---|
| (default) | phenotypes.csv, genotypes.csv, genotypes0.csv, pedigree.csv, GRM.csv, map.csv | Small example data |
| (default) | genotypes_group1.csv, genotypes_group2.csv, genotypes_group3.csv | Genotype groups for partial networks |
| example | phenotypes.txt, genotypes.txt, pedigree.txt, etc. | Tab-separated example data |
| simulated_omics_data | genotypes_1000snps.txt, phenotypes_sim.txt, pedigree.txt | Simulated dataset with 1000 SNPs and 10 omics |
Pedigree Functions
get_pedigree
Read and process pedigree information.
NNMM.PedModule.get_pedigree — Function
get_pedigree(pedfile::AbstractString;header=false,separator=',',missingstrings=["0"],output_folder=".")
- Get pedigree information from a pedigree file with header (defaulting to false), separator (defaulting to ,) and missing values (defaulting to ["0"])
- output_folder: Directory to save diagnostic files (defaulting to current directory)
- Pedigree file format:
a,0,0
c,a,b
d,a,c
Usage:
# Read pedigree file
pedigree = get_pedigree("pedigree.csv", separator=',', header=true)
# Use in random effect specification
random_spec = [(name="ID", pedigree=pedigree)]
Parameter Reference Tables
Bayesian Methods
| Method | Description |
|---|---|
| "BayesA" | All markers have non-zero effects with marker-specific variances |
| "BayesB" | Subset of markers have non-zero effects with marker-specific variances |
| "BayesC" | Subset of markers have non-zero effects with common variance |
| "BayesL" | Bayesian LASSO |
| "RR-BLUP" | Ridge regression BLUP (all markers, common variance) |
| "GBLUP" | Genomic BLUP using relationship matrix (Layer 1→2 only) |
Activation Functions
| Function | Formula | Range | Use Case |
|---|---|---|---|
| "linear" | f(x) = x | (-∞, ∞) | Traditional regression |
| "sigmoid" | f(x) = 1/(1+e^(-x)) | (0, 1) | Bounded outputs |
| "tanh" | f(x) = tanh(x) | (-1, 1) | Centered bounded outputs |
| "relu" | f(x) = max(0, x) | [0, ∞) | Sparse activation |
| "leakyrelu" | f(x) = max(0.01x, x) | (-∞, ∞) | Sparse with gradient flow |
runNNMM Keyword Arguments
| Argument | Type | Default | Description |
|---|---|---|---|
| chain_length | Integer | 100 | Total MCMC iterations |
| burnin | Integer | 0 | Burn-in iterations to discard |
| output_samples_frequency | Integer | auto | Save every Nth sample |
| outputEBV | Bool | true | Output estimated breeding values |
| output_heritability | Bool | true | Calculate heritability |
| output_folder | String | "nnmm_results" | Output directory |
| seed | Int/Bool | false | Random seed (false = random) |
| printout_frequency | Integer | chain_length+1 | Print progress frequency |
| double_precision | Bool | false | Use Float64 instead of Float32 |
| big_memory | Bool | false | Enable memory-intensive optimizations |
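To get a feel for how chain_length, burnin, and output_samples_frequency interact, the sketch below counts which iterations would be saved. It rests on an assumption that is not confirmed by this reference: that the first `burnin` iterations are discarded and every `output_samples_frequency`-th iteration after that is kept. The helper `saved_iterations` is hypothetical, and NNMM.jl's actual bookkeeping may differ.

```julia
# Hypothetical helper: which iterations would be saved under the assumption that
# the first `burnin` iterations are discarded and every
# `output_samples_frequency`-th iteration after that is kept.
function saved_iterations(chain_length::Integer, burnin::Integer,
                          output_samples_frequency::Integer)
    return [iter for iter in (burnin + 1):chain_length
            if (iter - burnin) % output_samples_frequency == 0]
end

# Example: 10,000 iterations, 2,000 burn-in, saving every 100th iteration.
kept = saved_iterations(10_000, 2_000, 100)
```

Under that assumption, the setting above retains 80 samples, which is a quick way to check that a chosen frequency leaves enough samples for posterior summaries.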