Public API Reference

This page documents NNMM.jl's public interface: the types and functions available to general users.

Version Compatibility

This documentation reflects NNMM.jl v0.3+ using the Layer/Equation/runNNMM API.

Core Types

Layer

Defines a layer in the neural network architecture.

NNMM.Layer - Type
Layer

A struct representing a layer in the Neural Network Mixed Model (NNMM).

Fields

  • layer_name::String: Name identifier for the layer
  • data_path: Path to the data file(s) - String for single file, Vector{String} for partial connection
  • separator: Column separator in the data file (default: ',')
  • header: Whether the data file has a header row (default: true)
  • data: Loaded data (initialized as empty)
  • quality_control: Whether to perform quality control on genotypes (default: true)
  • MAF: Minor Allele Frequency threshold for QC (default: 0.01)
  • missing_value: Value representing missing data (default: 9.0)
  • center: Whether to center the data (default: true)

Usage:

# Genotype layer (input) - note the [] around the path
layer1 = Layer(layer_name="geno", data_path=["genotypes.csv"])

# Omics layer (hidden) with missing value handling
layer2 = Layer(layer_name="omics", data_path="omics.csv", missing_value="NA")

# Phenotype layer (output)
layer3 = Layer(layer_name="pheno", data_path="phenotypes.csv", missing_value="NA")
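
The quality_control, MAF, missing_value, and center fields describe preprocessing applied to genotype data. The sketch below shows one plausible reading of those steps on a toy matrix, using only Julia's standard library. The helpers allele_frequency and qc_and_center are hypothetical names introduced for illustration, and the rule of replacing missing genotypes by the marker mean is an assumption; NNMM.jl's own implementation may differ.

```julia
using Statistics

# Toy genotype matrix: rows = individuals, columns = markers, coded 0/1/2.
# 9.0 marks a missing genotype, mirroring the documented default `missing_value`.
genotypes = [0.0 1.0 2.0;
             0.0 2.0 9.0;
             0.0 1.0 2.0;
             0.0 0.0 1.0]

# Hypothetical helper (not part of NNMM.jl): allele frequency of one marker,
# computed from the observed genotypes only.
function allele_frequency(column; missing_value = 9.0)
    observed = filter(x -> x != missing_value, column)
    return mean(observed) / 2
end

# Hypothetical helper: drop markers whose minor allele frequency is below the
# threshold, fill missing entries with the marker mean (an assumption), then
# center each remaining column.
function qc_and_center(M; maf_threshold = 0.01, missing_value = 9.0)
    freqs = [allele_frequency(M[:, j]; missing_value) for j in axes(M, 2)]
    maf = min.(freqs, 1 .- freqs)
    keep = findall(>=(maf_threshold), maf)
    cleaned = M[:, keep]
    for (k, j) in enumerate(keep)
        col = view(cleaned, :, k)
        col[col .== missing_value] .= 2 * freqs[j]
        col .-= mean(col)
    end
    return cleaned, keep
end

centered, kept_markers = qc_and_center(genotypes)
println("markers kept: ", kept_markers)
```

In this toy matrix the first marker is monomorphic, so it fails the MAF threshold and is dropped; the remaining columns have mean zero after centering.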

Equation

Defines the statistical model connecting two layers.

NNMM.Equation - Type
Equation

A struct representing an equation connecting layers in NNMM.

Fields

  • from_layer_name: Source layer name
  • to_layer_name: Target layer name
  • equation: Model equation string (e.g., "omics = intercept + geno")
  • traits: Output trait column names (e.g., ["omic1", "omic2"] or ["trait1"])
  • covariate: Covariate variable names
  • random: Random effect specifications
  • activation_function: Activation function - string ("linear", "sigmoid", "tanh", "relu", "leakyrelu") or custom Function
  • partial_connect_structure: Structure for partial connectivity
  • starting_value: Starting values for MCMC
  • method: Bayesian method ("BayesC", "BayesA", "BayesB", etc.)
  • Pi: Prior probability of zero effect
  • estimatePi: Whether to estimate Pi
  • Variance parameters for G (genetic) and R (residual)

Note: omics_name and phenotype_name are deprecated aliases for traits.

Usage:

# Genotypes → Omics (BayesC with 3 omics features)
eq1 = Equation(
    from_layer_name = "geno",
    to_layer_name = "omics",
    equation = "omics = intercept + geno",
    traits = ["omic1", "omic2", "omic3"],  # replaces the deprecated omics_name
    method = "BayesC",
    estimatePi = true
)

# Omics → Phenotypes (with sigmoid activation)
eq2 = Equation(
    from_layer_name = "omics",
    to_layer_name = "pheno",
    equation = "pheno = intercept + omics",
    traits = ["y1"],  # replaces the deprecated phenotype_name
    method = "BayesC",
    activation_function = "sigmoid"
)

Supporting Types

NNMM.Omics - Type
Omics

A struct representing omics data (e.g., transcriptomics, metabolomics) in NNMM.

Fields

  • name: Name identifier for this omics category
  • trait_names: Names for corresponding traits
  • obsID: Row IDs for individuals
  • featureID: Feature/omics variable IDs
  • nObs: Number of observations
  • nFeatures: Number of omics features
  • nMarkers: Alias for nFeatures (compatibility)
  • centered: Whether data is centered
  • data: The omics data matrix
  • ntraits: Number of traits in the model
  • Additional fields for Bayesian MCMC computations

NNMM.Phenotypes - Type
Phenotypes

A struct representing phenotype data in NNMM.

Fields

  • obsID: Individual IDs
  • featureID: Phenotype/trait names
  • nObs: Number of observations
  • nPheno: Number of phenotypes
  • data: Phenotype data matrix
  • nFeatures: Alias for nPheno

Main Functions

runNNMM

The primary function for running NNMM analysis.

NNMM.runNNMM - Function
runNNMM(layers, equations; kwargs...)

Run Neural Network Mixed Model (NNMM) analysis.

Arguments

  • layers: Vector of 3 Layer objects defining the network architecture
    • Layer 1: Genotypes (SNPs)
    • Layer 2: Omics/Latent traits
    • Layer 3: Phenotypes
  • equations: Vector of 2 Equation objects defining relationships
    • Equation 1: Genotypes → Omics (1→2)
    • Equation 2: Omics → Phenotypes (2→3)

Keyword Arguments

MCMC Settings

  • chain_length::Integer=100: Total MCMC iterations
  • burnin::Integer=0: Number of burn-in iterations to discard
  • output_samples_frequency::Integer: Save every nth sample (default: auto)
  • output_prediction_frequency::Integer: Save every nth prediction sample for EBV/EPV (default: output_samples_frequency)
  • update_priors_frequency::Integer=0: Update prior parameters every n iterations

Output Settings

  • outputEBV=true: Output estimated breeding values
  • output_heritability=true: Calculate heritability estimates
  • output_folder="nnmm_results": Directory for output files

Computational Settings

  • seed=false: Random seed for reproducibility
  • double_precision=false: Use Float64 instead of Float32
  • big_memory=false: Enable memory-intensive optimizations

Returns

A dictionary containing:

  • Posterior means for all parameters
  • MCMC samples (saved to files)
  • EBV estimates

Example

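# Define layers (keyword names follow the Layer fields documented above)
layer1 = Layer(layer_name="geno", data_path=["genotypes.csv"])
layer2 = Layer(layer_name="omics", data_path="omics.csv")
layer3 = Layer(layer_name="pheno", data_path="phenotypes.csv")

# Define equations (keyword names follow the Equation fields documented above)
eq1 = Equation(from_layer_name="geno", to_layer_name="omics",
               equation="omics = intercept + geno",
               method="BayesC", traits=["o1","o2"])
eq2 = Equation(from_layer_name="omics", to_layer_name="pheno",
               equation="pheno = intercept + omics",
               activation_function=tanh, traits=["y"])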

# Run analysis
results = runNNMM([layer1, layer2, layer3], [eq1, eq2],
                  chain_length=5000, burnin=1000)

See also: Layer, Equation, describe

Usage:

results = runNNMM(layers, equations;
    chain_length = 10000,
    burnin = 2000,
    output_folder = "my_results",
    seed = 42
)
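
To see how chain_length, burnin, and output_samples_frequency interact, the following sketch lists which iterations would be saved. It assumes that a sample is kept at every output_samples_frequency-th iteration after burn-in; this rule and the helper name saved_iterations are illustrative assumptions, not NNMM.jl's documented behavior.

```julia
# Hypothetical helper (not an NNMM.jl function). Assumes a sample is kept at
# every `frequency`-th iteration counted from the end of burn-in.
saved_iterations(chain_length, burnin, frequency) =
    collect((burnin + frequency):frequency:chain_length)

kept = saved_iterations(10_000, 2_000, 10)
println("number of saved samples: ", length(kept))
println("first and last saved iteration: ", first(kept), ", ", last(kept))
```

Under that assumption, the settings in the usage example above combined with a frequency of 10 would leave 800 saved samples for posterior summaries.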

describe

Print model summary information.

DataAPI.describe - Function
describe(model::MME)

Print a summary of the mixed model equations (MME) structure.

Displays:

  • Model equations (truncated if >5)
  • Term information (classification, fixed/random, number of levels)
  • Prior distributions and hyperparameters
  • Variance component settings
  • Marker effect settings (if genomic data included)

Arguments

  • model::MME: The mixed model equations object (created internally by runNNMM)

Output

Prints formatted model summary to stdout including:

  • Model equations
  • Term classification (covariate/factor, fixed/random)
  • Prior settings for variance components
  • MCMC configuration

Example

# After running runNNMM, the describe function is called automatically.
# For manual inspection:
describe(results["mme"])

Data Reading Functions

read_phenotypes

Read phenotype data from a file.

NNMM.read_phenotypes - Function
read_phenotypes(file; separator=',', header=true, missing_value="NA", output_folder=".")

Read phenotype data from a CSV file.

Arguments

  • file: Path to CSV file containing phenotype data

Keyword Arguments

  • separator: Column delimiter (default: ',')
  • header: Whether file has header row (default: true)
  • missing_value: String representing missing values (default: "NA")
  • output_folder: Directory for output files (default: current directory)

Returns

DataFrame with phenotype data. First column is individual IDs.

Example

pheno_df = read_phenotypes("phenotypes.csv", missing_value="NA")

Notes

  • First column must contain individual IDs
  • Missing values are converted to Julia's missing type
  • A file IDs_for_individuals_with_phenotypes.txt is written to output_folder
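
The notes above can be pictured with a small standalone parser. This is a sketch only: parse_phenotype_lines is a hypothetical helper, it works on in-memory lines rather than a file, and it returns plain vectors, whereas read_phenotypes returns a DataFrame.

```julia
# Hypothetical parser for illustration (not NNMM.jl's implementation).
# The first column is treated as the individual ID; every other field is parsed
# as Float64 unless it equals `missing_value`, in which case it becomes `missing`.
function parse_phenotype_lines(lines; separator = ',', missing_value = "NA")
    header = String.(split(lines[1], separator))
    rows = map(lines[2:end]) do line
        fields = split(line, separator)
        id = String(fields[1])
        traits = [f == missing_value ? missing : parse(Float64, f) for f in fields[2:end]]
        (id, traits)
    end
    return header, rows
end

lines = ["ID,y1,y2", "a1,1.5,NA", "a2,NA,0.7", "a3,2.1,1.2"]
header, rows = parse_phenotype_lines(lines)

# The IDs collected here play the role of the ID list that is written to output_folder.
ids = first.(rows)
println(header)
println(ids)
```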

nnmm_get_genotypes

Read genotype data from a file or matrix.

NNMM.nnmm_get_genotypes - Function
nnmm_get_genotypes(file, G=false; kwargs...)

Read genotype data for the NNMM model (layer 1).

Arguments

  • file: Genotype data source - can be:
    • String: Path to CSV file
    • DataFrame: DataFrame with ID column + marker columns
    • Matrix: Numeric matrix of genotypes
  • G: Prior genetic variance (default: false, in which case it is estimated from the data)

Keyword Arguments

  • method: Bayesian method ("BayesA", "BayesB", "BayesC", "RR-BLUP", "GBLUP", "BayesL")
  • Pi: Prior probability that a marker has zero effect, as in the Equation field of the same name (default: 0.0)
  • estimatePi: Whether to estimate Pi (default: true)
  • G_is_marker_variance: If true, G is marker variance; if false, G is genetic variance
  • df: Degrees of freedom for prior (default: 4.0)
  • quality_control: Perform QC filtering (default: true)
  • MAF: Minor allele frequency threshold (default: 0.01)
  • center: Center genotypes (default: true)
  • separator: File delimiter (default: ',')
  • header: File has header row (default: true)

Returns

Genotypes struct containing processed genotype data and method parameters.

get_genotypes

Alias for nnmm_get_genotypes.

nnmm_get_omics

Read omics data from a file.

NNMM.nnmm_get_omics - Function
nnmm_get_omics(file, G=false; kwargs...)

Read omics data for the NNMM model (layer 2 / middle layer).

Arguments

  • file: Path to CSV file containing omics data
  • G: Prior genetic variance (default: false, in which case it is estimated from the data)

Keyword Arguments

  • omics_name: Vector of column names to use as omics features (required)
  • method: Bayesian method ("BayesA", "BayesB", "BayesC", "RR-BLUP", "BayesL")
  • Pi: Prior probability that a feature has zero effect, as in the Equation field of the same name (default: 0.0)
  • estimatePi: Whether to estimate Pi (default: true)
  • G_is_marker_variance: If true, G is marker variance; if false, G is genetic variance
  • df: Degrees of freedom for prior (default: 4.0)
  • constraint: Use independent variances for multi-trait (default: true)
  • separator: File delimiter (default: ',')
  • header: File has header row (default: true)
  • missing_value: String/value representing missing data (default: false)

Returns

Omics struct containing the omics data and method parameters.

Example

omics = nnmm_get_omics("omics_data.csv", 
                       omics_name=["gene1", "gene2", "gene3"],
                       missing_value="NA")

Notes

  • First column must be individual IDs
  • Missing values will be sampled during MCMC (HMC for latent traits)
  • Unlike genotypes, no quality control (MAF filtering) is applied

Post-Analysis Functions

GWAS

Genome-wide association study on MCMC results.

NNMM.GWAS - Function
GWAS(marker_effects_file; header=true)

Compute the model frequency for each marker.

Model frequency is the probability that a marker is included in the model (i.e., has non-zero effect) across MCMC samples.

Arguments

  • marker_effects_file: Path to CSV file with MCMC samples of marker effects
  • header: Whether file has header row (default: true)

Returns

DataFrame with columns:

  • marker_ID: Marker identifier
  • modelfrequency: Proportion of samples where effect ≠ 0

Example

# After running NNMM with BayesB or BayesC
freq = GWAS("nnmm_results/MCMC_samples_marker_effects_genotypes.txt")

GWAS(model, map_file, marker_effects_file...;
     window_size = "1 Mb", sliding_window = false,
     GWAS = true, threshold = 0.001,
     genetic_correlation = false,
     header = true)

Run genomic window-based GWAS.

  • MCMC samples of marker effects are stored in marker_effects_file with delimiter ','.
  • model is either the model::MME used in the analysis or the genotype covariate matrix M::Array.
  • map_file has the (sorted) marker position information with delimiter ','. If the map file is not provided, i.e., map_file=false, a fake map file will be generated with window_size markers in each 1 Mb window, and each 1 Mb window will be tested.
  • If two marker_effects_file arguments are provided and genetic_correlation = true, the genomic correlation for each window is calculated.
  • Statistics are computed for nonoverlapping windows of size window_size by default. If sliding_window = true, statistics for overlapping sliding windows are calculated instead.
  • map file format:
markerID,chromosome,position
m1,1,16977
m2,1,434311
m3,1,1025513
m4,2,70350
m5,2,101135
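
As a rough picture of nonoverlapping windows, the sketch below groups the five markers of the example map into 1 Mb windows per chromosome. The grouping rule (integer division of the position by the window size) is an assumption made for illustration; the actual window boundaries used by GWAS may be defined differently.

```julia
# Rows of the example map file: (markerID, chromosome, position in base pairs).
marker_map = [("m1", 1, 16977), ("m2", 1, 434311), ("m3", 1, 1025513),
              ("m4", 2, 70350), ("m5", 2, 101135)]

window_size = 1_000_000  # 1 Mb

# Assumed rule: window index = position ÷ window_size, counted within each chromosome.
windows = Dict{Tuple{Int,Int},Vector{String}}()
for (id, chr, pos) in marker_map
    key = (chr, pos ÷ window_size)
    push!(get!(windows, key, String[]), id)
end

for key in sort(collect(keys(windows)))
    println("chromosome ", key[1], ", window ", key[2], ": ", join(windows[key], ", "))
end
```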

Usage:

# Run GWAS on marker effect samples
gwas_result = GWAS("results/MCMC_samples_marker_effects_geno_omic1.txt")

# Sort by model frequency
sorted = sort(gwas_result, :modelfrequency, rev=true)
println(first(sorted, 10))
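
Model frequency, as defined above, is the proportion of saved MCMC samples in which a marker's effect is non-zero. A minimal standalone sketch of that calculation on a made-up sample matrix (not output from NNMM.jl) looks like this:

```julia
using Statistics

# Toy MCMC output: rows = saved samples, columns = markers (invented values).
samples = [0.0  0.12  0.0;
           0.0  0.09  0.3;
           0.0  0.15  0.0;
           0.0  0.0   0.0]

# Proportion of samples in which each marker has a non-zero effect.
model_frequency = vec(mean(samples .!= 0, dims = 1))

# Marker indices ranked from highest to lowest model frequency.
ranking = sortperm(model_frequency, rev = true)
println(model_frequency)
println(ranking)
```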

getEBV

Extract estimated breeding values from results.

NNMM.getEBV - Function
getEBV(model::MME, traiti)

(internal function) Get breeding values for individuals defined by outputEBV(), defaulting to all genotyped individuals. This function is used inside MCMC functions for one MCMC sample from the posterior distribution. For example:

  • non-NNBayes_partial (multi-class Bayes): y1 = M1*α1[1] + M2*α2[1] + M3*α3[1]; y2 = M1*α1[2] + M2*α2[2] + M3*α3[2]
  • NNBayes_partial: y1 = M1*α1[1]; y2 = M2*α2[1]; y3 = M3*α3[1]
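
The two prediction rules in this docstring can be written out with small matrices. The sketch below is illustrative only: the matrices, the effect values, and the layout of α (one vector of marker effects per trait within each genotype class) are assumptions, not NNMM.jl internals.

```julia
# Toy genotype matrices for three genotype classes, two individuals each.
M1 = [1.0 0.0; 0.0 1.0]
M2 = [2.0 1.0; 1.0 0.0]
M3 = [0.0 1.0; 1.0 1.0]

# Assumed layout: αk[t] holds the marker effects of class k for trait t.
α1 = [[0.1, 0.2], [0.3, 0.1]]
α2 = [[0.0, 0.5], [0.2, 0.2]]
α3 = [[0.4, 0.0], [0.1, 0.3]]

# Fully connected case: every genotype class contributes to every trait.
y1 = M1 * α1[1] + M2 * α2[1] + M3 * α3[1]
y2 = M1 * α1[2] + M2 * α2[2] + M3 * α3[2]

# Partially connected case: class k contributes only to output k.
p1 = M1 * α1[1]
p2 = M2 * α2[1]
p3 = M3 * α3[1]

println(y1)
println(p2)
```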

Built-in Datasets

dataset

Access built-in example datasets.

NNMM.Datasets.dataset - Function
dataset(file_name::AbstractString; dataset_name::AbstractString="")

Get the path to a built-in dataset file.

Arguments

  • file_name::AbstractString: The name of the file to retrieve
  • dataset_name::AbstractString="": Optional subdirectory name within the data folder

Returns

  • String: Full path to the requested data file

Examples

phenofile = dataset("phenotypes.csv")
genofile = dataset("genotypes.txt", dataset_name="example")

Usage:

using NNMM.Datasets

# Access default example data
pheno_path = Datasets.dataset("phenotypes.csv")
geno_path = Datasets.dataset("genotypes.csv")

# Access simulated omics dataset
geno_path = Datasets.dataset("genotypes_1000snps.txt", dataset_name="simulated_omics_data")
pheno_path = Datasets.dataset("phenotypes_sim.txt", dataset_name="simulated_omics_data")

Available Datasets:

| Dataset | Files | Description |
|---|---|---|
| (default) | phenotypes.csv, genotypes.csv, genotypes0.csv, pedigree.csv, GRM.csv, map.csv | Small example data |
| (default) | genotypes_group1.csv, genotypes_group2.csv, genotypes_group3.csv | Genotype groups for partial networks |
| example | phenotypes.txt, genotypes.txt, pedigree.txt, etc. | Tab-separated example data |
| simulated_omics_data | genotypes_1000snps.txt, phenotypes_sim.txt, pedigree.txt | Simulated dataset with 1000 SNPs and 10 omics |

Pedigree Functions

get_pedigree

Read and process pedigree information.

NNMM.PedModule.get_pedigree - Function
get_pedigree(pedfile::AbstractString; header=false, separator=',', missingstrings=["0"], output_folder=".")
  • Get pedigree information from a pedigree file with header (defaulting to false), separator (defaulting to ,) and missing values (defaulting to ["0"])
  • output_folder: Directory to save diagnostic files (defaulting to current directory)
  • Pedigree file format:
a,0,0
c,a,b
d,a,c
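
The three-column layout above (individual, sire, dam, with "0" for an unknown parent) can be read with a few lines of plain Julia. The parser below is a hypothetical illustration of the file format, not the code behind get_pedigree.

```julia
pedigree_text = """
a,0,0
c,a,b
d,a,c
"""

# Hypothetical parser: maps each individual to a (sire, dam) tuple, with
# `missing` for any parent listed in `missingstrings`.
function parse_pedigree(text; separator = ',', missingstrings = ["0"])
    to_parent(s) = s in missingstrings ? missing : s
    parents = Dict{String,Tuple{Union{String,Missing},Union{String,Missing}}}()
    for line in split(strip(text), '\n')
        id, sire, dam = String.(strip.(split(line, separator)))
        parents[id] = (to_parent(sire), to_parent(dam))
    end
    return parents
end

ped = parse_pedigree(pedigree_text)

# Parents that are named in the file but have no row of their own (here "b").
known_parents = String[]
for (sire, dam) in values(ped), p in (sire, dam)
    ismissing(p) || push!(known_parents, p)
end
unlisted = setdiff(known_parents, keys(ped))
println(unlisted)
```

In the example file, individual b appears only as a dam, which is the kind of case a pedigree reader has to handle.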

Usage:

# Read pedigree file
pedigree = get_pedigree("pedigree.csv", separator=',', header=true)

# Use in random effect specification
random_spec = [(name="ID", pedigree=pedigree)]

Parameter Reference Tables

Bayesian Methods

| Method | Description |
|---|---|
| "BayesA" | All markers have non-zero effects with marker-specific variances |
| "BayesB" | Subset of markers have non-zero effects with marker-specific variances |
| "BayesC" | Subset of markers have non-zero effects with common variance |
| "BayesL" | Bayesian LASSO |
| "RR-BLUP" | Ridge regression BLUP (all markers, common variance) |
| "GBLUP" | Genomic BLUP using relationship matrix (Layer 1→2 only) |

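To make the "subset of markers, common variance" description of BayesC concrete, the sketch below draws marker effects from a two-component mixture prior: with probability Pi an effect is exactly zero, otherwise it is normal with a variance shared by all markers. The function name sample_bayesc_prior and the parameter values are hypothetical, and this is a draw from a prior, not NNMM.jl's MCMC sampler.

```julia
using Random

# Hypothetical sampler for illustration (not NNMM.jl code).
# With probability `Pi` the effect is exactly zero; otherwise it is drawn from
# a normal distribution whose variance is shared by all markers.
function sample_bayesc_prior(rng, n_markers; Pi, variance)
    effects = zeros(n_markers)
    for j in 1:n_markers
        if rand(rng) >= Pi
            effects[j] = sqrt(variance) * randn(rng)
        end
    end
    return effects
end

rng = MersenneTwister(1)
effects = sample_bayesc_prior(rng, 1_000; Pi = 0.9, variance = 0.05)
println("non-zero effects: ", count(!iszero, effects), " of ", length(effects))
```

Setting Pi = 0.0 in this sketch gives every marker a non-zero effect with a shared variance, which corresponds to the RR-BLUP row of the table.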
Activation Functions

| Function | Formula | Range | Use Case |
|---|---|---|---|
| "linear" | f(x) = x | (-∞, ∞) | Traditional regression |
| "sigmoid" | f(x) = 1/(1+e^(-x)) | (0, 1) | Bounded outputs |
| "tanh" | f(x) = tanh(x) | (-1, 1) | Centered bounded outputs |
| "relu" | f(x) = max(0, x) | [0, ∞) | Sparse activation |
| "leakyrelu" | f(x) = max(0.01x, x) | (-∞, ∞) | Sparse with gradient flow |

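The formulas in the activation table translate directly into one-line Julia functions. These are standalone definitions written for illustration, so NNMM.jl's internal versions may differ in detail. The softplus function is a hypothetical example of a custom Function of the kind the activation_function field is documented to accept.

```julia
# Standalone definitions matching the formulas in the table (illustrative only).
linear(x) = x
sigmoid(x) = 1 / (1 + exp(-x))
relu(x) = max(zero(x), x)
leakyrelu(x) = max(0.01x, x)
# tanh is already provided by Julia's Base.

# Hypothetical custom activation: softplus, a smooth approximation of relu.
softplus(x) = log1p(exp(x))

xs = [-2.0, 0.0, 2.0]
for f in (linear, sigmoid, tanh, relu, leakyrelu, softplus)
    println(rpad(string(f), 10), f.(xs))
end
```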
runNNMM Keyword Arguments

| Argument | Type | Default | Description |
|---|---|---|---|
| chain_length | Integer | 100 | Total MCMC iterations |
| burnin | Integer | 0 | Burn-in iterations to discard |
| output_samples_frequency | Integer | auto | Save every Nth sample |
| outputEBV | Bool | true | Output estimated breeding values |
| output_heritability | Bool | true | Calculate heritability |
| output_folder | String | "nnmm_results" | Output directory |
| seed | Int/Bool | false | Random seed (false = random) |
| printout_frequency | Integer | chain_length+1 | Print progress frequency |
| double_precision | Bool | false | Use Float64 instead of Float32 |
| big_memory | Bool | false | Enable memory-intensive optimizations |