Public API Reference

Documentation for NNMM.jl's public interface. The types and functions below are available to general users.

Version Compatibility

This documentation reflects NNMM.jl v0.3+ using the Layer/Equation/runNNMM API.

Index

NNMM.Equation
NNMM.Layer
NNMM.Omics
NNMM.Phenotypes
NNMM.GWAS
NNMM.getEBV
NNMM.get_genotypes
NNMM.nnmm_get_genotypes
NNMM.nnmm_get_omics
NNMM.read_phenotypes
NNMM.runNNMM

Core Types

Layer

Defines a layer in the neural network architecture.

NNMM.Layer — Type

Layer

A struct representing a layer in the Neural Network Mixed Model (NNMM).

Fields

layer_name::String: Name identifier for the layer
data_path: Path to the data file(s) - String for single file, Vector{String} for partial connection
separator: Column separator in the data file (default: ',')
header: Whether the data file has a header row (default: true)
data: Loaded data (initialized as empty)
quality_control: Whether to perform quality control on genotypes (default: true)
MAF: Minor Allele Frequency threshold for QC (default: 0.01)
missing_value: Value representing missing data (default: 9.0)
center: Whether to center the data (default: true)


Usage:

# Genotype layer (input) - note the [] around the path
layer1 = Layer(layer_name="geno", data_path=["genotypes.csv"])

# Omics layer (hidden) with missing value handling
layer2 = Layer(layer_name="omics", data_path="omics.csv", missing_value="NA")

# Phenotype layer (output)
layer3 = Layer(layer_name="pheno", data_path="phenotypes.csv", missing_value="NA")

Equation

Defines the statistical model connecting two layers.

NNMM.Equation — Type

Equation

A struct representing an equation connecting layers in NNMM.

Fields

from_layer_name: Source layer name
to_layer_name: Target layer name
equation: Model equation string (e.g., "omics = intercept + geno")
traits: Output trait column names (e.g., ["omic1", "omic2"] or ["trait1"])
covariate: Covariate variable names
random: Random effect specifications
activation_function: Activation function - string ("linear", "sigmoid", "tanh", "relu", "leakyrelu") or custom Function
partial_connect_structure: Structure for partial connectivity
starting_value: Starting values for MCMC
method: Bayesian method ("BayesC", "BayesA", "BayesB", etc.)
Pi: Prior probability of zero effect
estimatePi: Whether to estimate Pi
Variance parameters for G (genetic) and R (residual)

Note: omics_name and phenotype_name are deprecated aliases for traits.


Usage:

# Genotypes → Omics (BayesC with 3 omics features)
eq1 = Equation(
    from_layer_name = "geno",
    to_layer_name = "omics",
    equation = "omics = intercept + geno",
    traits = ["omic1", "omic2", "omic3"],  # omics_name is a deprecated alias
    method = "BayesC",
    estimatePi = true
)

# Omics → Phenotypes (with sigmoid activation)
eq2 = Equation(
    from_layer_name = "omics",
    to_layer_name = "pheno",
    equation = "pheno = intercept + omics",
    traits = ["y1"],  # phenotype_name is a deprecated alias
    method = "BayesC",
    activation_function = "sigmoid"
)
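
To make the two-equation structure concrete, the following is a minimal sketch in plain Julia of how values could flow from genotypes through omics to a phenotype. It does not call NNMM.jl. The toy matrix `M`, the weights `alpha` and `w`, the zero intercepts, and the choice to apply the sigmoid to the omics values before the final linear combination are all assumptions made for illustration.

```julia
# Hypothetical toy data: 4 individuals, 5 markers, 3 omics features.
M = Float64[0 1 2 0 1
            1 1 0 2 0
            2 0 1 1 1
            0 2 2 0 0]          # genotype matrix (individuals × markers)
alpha = fill(0.1, 5, 3)         # assumed marker effects on each omics feature
w     = [0.5, -0.2, 0.3]        # assumed omics effects on the phenotype

sigmoid(x) = 1 / (1 + exp(-x))

# Equation 1: omics = intercept + geno (intercept set to 0 in this sketch)
omics = M * alpha               # 4 × 3 matrix

# Equation 2: pheno = intercept + omics, with a sigmoid applied to the omics values
pheno = sigmoid.(omics) * w     # one value per individual
```

In the real model these effects are sampled by MCMC rather than fixed in advance; the sketch only shows the shape of the computation.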

Supporting Types

NNMM.Omics — Type

Omics

A struct representing omics data (e.g., transcriptomics, metabolomics) in NNMM.

Fields

name: Name identifier for this omics category
trait_names: Names for corresponding traits
obsID: Row IDs for individuals
featureID: Feature/omics variable IDs
nObs: Number of observations
nFeatures: Number of omics features
nMarkers: Alias for nFeatures (compatibility)
centered: Whether data is centered
data: The omics data matrix
ntraits: Number of traits in the model
Additional fields for Bayesian MCMC computations


NNMM.Phenotypes — Type

Phenotypes

A struct representing phenotype data in NNMM.

Fields

obsID: Individual IDs
featureID: Phenotype/trait names
nObs: Number of observations
nPheno: Number of phenotypes
data: Phenotype data matrix
nFeatures: Alias for nPheno


Data Reading Functions

read_phenotypes

Read phenotype data from a file.

NNMM.read_phenotypes — Function

read_phenotypes(file; separator=',', header=true, missing_value="NA", output_folder=".")

Read phenotype data from a CSV file.

Arguments

file: Path to CSV file containing phenotype data

Keyword Arguments

separator: Column delimiter (default: ',')
header: Whether file has header row (default: true)
missing_value: String representing missing values (default: "NA")
output_folder: Directory for output files (default: current directory)

Returns

DataFrame with phenotype data. First column is individual IDs.

Example

pheno_df = read_phenotypes("phenotypes.csv", missing_value="NA")

Notes

First column must contain individual IDs
Missing values are converted to Julia's missing type
A file IDs_for_individuals_with_phenotypes.txt is written to output_folder
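
As an illustration of the missing-value conversion described in the notes, here is a small plain-Julia sketch that turns a missing-value string into `missing` and then summarizes only the observed records. The `raw` column is made up, and this is not the code `read_phenotypes` runs internally.

```julia
using Statistics

# Hypothetical raw column as read from a CSV, with "NA" marking missing records.
raw = ["1.5", "NA", "2.5", "3.0"]

# Replace the missing-value string with `missing`; parse everything else as Float64.
y = [s == "NA" ? missing : parse(Float64, s) for s in raw]

n_missing = count(ismissing, y)     # number of missing records
y_mean    = mean(skipmissing(y))    # mean over observed records only
```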


nnmm_get_genotypes

Read genotype data from a file or matrix.

NNMM.nnmm_get_genotypes — Function

nnmm_get_genotypes(file, G=false; kwargs...)

Read genotype data for the NNMM model (layer 1).

Arguments

file: Genotype data source - can be:
- String: Path to CSV file
- DataFrame: DataFrame with ID column + marker columns
- Matrix: Numeric matrix of genotypes
G: Prior genetic variance (default: estimated from data)

Keyword Arguments

method: Bayesian method ("BayesA", "BayesB", "BayesC", "RR-BLUP", "GBLUP", "BayesL")
Pi: Prior probability that a marker has zero effect (default: 0.0, i.e., all markers have effects)
estimatePi: Whether to estimate Pi (default: true)
G_is_marker_variance: If true, G is marker variance; if false, G is genetic variance
df: Degrees of freedom for prior (default: 4.0)
quality_control: Perform QC filtering (default: true)
MAF: Minor allele frequency threshold (default: 0.01)
center: Center genotypes (default: true)
separator: File delimiter (default: ',')
header: File has header row (default: true)

Returns

Genotypes struct containing processed genotype data and method parameters.
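
The `quality_control`, `MAF`, and `center` options can be pictured with the following plain-Julia sketch. It assumes genotypes coded as 0/1/2 copies of one allele and a filter that keeps markers whose minor allele frequency is at least the threshold; the matrix `M` is made up, and the exact rule NNMM.jl applies may differ.

```julia
using Statistics

# Hypothetical genotypes coded 0/1/2; rows are individuals, columns are markers.
M = Float64[0 0 2 1
            1 0 2 0
            2 0 2 1
            1 0 2 2]

# Allele frequency per marker, folded to the minor allele frequency.
p   = vec(mean(M, dims=1)) ./ 2
maf = min.(p, 1 .- p)

# Keep markers whose MAF reaches the threshold (0.01 mirrors the documented default).
keep = maf .>= 0.01
M_qc = M[:, keep]

# Center each remaining marker column at zero.
M_centered = M_qc .- mean(M_qc, dims=1)
```

Markers 2 and 3 are monomorphic in this toy matrix, so only markers 1 and 4 survive the filter.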


get_genotypes

Alias for nnmm_get_genotypes.

NNMM.get_genotypes — Function

get_genotypes(args...; kwargs...)

Alias for nnmm_get_genotypes. See that function for full documentation.


nnmm_get_omics

Read omics data from a file.

NNMM.nnmm_get_omics — Function

nnmm_get_omics(file, G=false; kwargs...)

Read omics data for the NNMM model (layer 2 / middle layer).

Arguments

file: Path to CSV file containing omics data
G: Prior genetic variance (default: estimated from data)

Keyword Arguments

omics_name: Vector of column names to use as omics features (required)
method: Bayesian method ("BayesA", "BayesB", "BayesC", "RR-BLUP", "BayesL")
Pi: Prior probability that a feature has zero effect (default: 0.0, i.e., all features have effects)
estimatePi: Whether to estimate Pi (default: true)
G_is_marker_variance: If true, G is marker variance; if false, G is genetic variance
df: Degrees of freedom for prior (default: 4.0)
constraint: Use independent variances for multi-trait (default: true)
separator: File delimiter (default: ',')
header: File has header row (default: true)
missing_value: String/value representing missing data (default: false)

Returns

Omics struct containing the omics data and method parameters.

Example

omics = nnmm_get_omics("omics_data.csv", 
                       omics_name=["gene1", "gene2", "gene3"],
                       missing_value="NA")

Notes

First column must be individual IDs
Missing values will be sampled during MCMC (HMC for latent traits)
Unlike genotypes, no quality control (MAF filtering) is applied


Post-Analysis Functions

GWAS

Genome-wide association study on MCMC results.

NNMM.GWAS — Function

GWAS(marker_effects_file; header=true)

Compute the model frequency for each marker.

Model frequency is the probability that a marker is included in the model (i.e., has non-zero effect) across MCMC samples.

Arguments

marker_effects_file: Path to CSV file with MCMC samples of marker effects
header: Whether file has header row (default: true)

Returns

DataFrame with columns:

marker_ID: Marker identifier
modelfrequency: Proportion of samples where effect ≠ 0

Example

# After running NNMM with BayesB or BayesC
freq = GWAS("nnmm_results/MCMC_samples_marker_effects_genotypes.txt")
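
The definition of model frequency given above can be sketched in plain Julia as the share of saved samples in which a marker's effect is non-zero. The `samples` matrix and `marker_ID` vector are made-up stand-ins for the contents of a marker-effects file; this is an illustration of the definition, not the body of `GWAS`.

```julia
using Statistics

# Hypothetical MCMC output: rows are saved samples, columns are markers.
samples = [0.0  0.2  0.0
           0.1  0.3  0.0
           0.0  0.1  0.0
           0.4  0.2  0.0]
marker_ID = ["m1", "m2", "m3"]

# Model frequency = proportion of samples in which the effect is non-zero.
model_frequency = vec(mean(samples .!= 0.0, dims=1))
```

Here `m1` is non-zero in two of four samples, `m2` in all of them, and `m3` in none.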


GWAS(model,map_file,marker_effects_file...;
     window_size = "1 Mb",sliding_window = false,
     GWAS = true, threshold = 0.001,
     genetic_correlation = false,
     header = true)

Run a genomic window-based GWAS.

MCMC samples of marker effects are stored in marker_effects_file with delimiter ','.
model is either the model::MME used in the analysis or the genotype covariate matrix M::Array.
map_file has the (sorted) marker position information with delimiter ','. If no map file is provided, i.e., map_file=false, a fake map file is generated with window_size markers in each 1 Mb window, and each 1 Mb window is tested.
If two marker_effects_file arguments are provided and genetic_correlation = true, the genomic correlation for each window is calculated.
Statistics are computed for nonoverlapping windows of size window_size by default. If sliding_window = true, they are computed for overlapping sliding windows instead.
Map file format:

markerID,chromosome,position
m1,1,16977
m2,1,434311
m3,1,1025513
m4,2,70350
m5,2,101135
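
To show what nonoverlapping 1 Mb windows mean for the map rows above, here is a plain-Julia sketch that groups markers by chromosome and window index. The window boundaries used here (window 1 covers positions below 1,000,000 bp, and so on) are an assumption; the boundaries `GWAS` uses internally may be defined differently.

```julia
# Map rows from the example above: (markerID, chromosome, position in bp).
map_rows = [("m1", 1, 16977), ("m2", 1, 434311), ("m3", 1, 1025513),
            ("m4", 2, 70350), ("m5", 2, 101135)]

window_bp = 1_000_000   # 1 Mb

# Group markers into nonoverlapping windows keyed by (chromosome, window index).
windows = Dict{Tuple{Int,Int},Vector{String}}()
for (id, chr, pos) in map_rows
    key = (chr, pos ÷ window_bp + 1)
    push!(get!(windows, key, String[]), id)
end
```

With this grouping, `m1` and `m2` share the first window on chromosome 1, `m3` falls in the second, and `m4` and `m5` share the first window on chromosome 2.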


Usage:

# Run GWAS on marker effect samples
gwas_result = GWAS("results/MCMC_samples_marker_effects_geno_omic1.txt")

# Sort by model frequency
sorted = sort(gwas_result, :modelfrequency, rev=true)
println(first(sorted, 10))

getEBV

Extract estimated breeding values from results.

NNMM.getEBV — Function

getEBV(model::MME,traiti)

(internal function) Get breeding values for the individuals defined by outputEBV(), defaulting to all genotyped individuals. This function is used inside the MCMC functions on a single MCMC sample from the posterior distribution. For example:
non-NNBayes_partial (multi-class Bayes): y1=M1*α1[1]+M2*α2[1]+M3*α3[1]; y2=M1*α1[2]+M2*α2[2]+M3*α3[2]
NNBayes_partial: y1=M1*α1[1]; y2=M2*α2[1]; y3=M3*α3[1]


Built-in Datasets

dataset

Access built-in example datasets.

NNMM.Datasets.dataset — Function

dataset(file_name::AbstractString; dataset_name::AbstractString="")

Get the path to a built-in dataset file.

Arguments

file_name::AbstractString: The name of the file to retrieve
dataset_name::AbstractString="": Optional subdirectory name within the data folder

Returns

String: Full path to the requested data file

Examples

phenofile = dataset("phenotypes.csv")
genofile = dataset("genotypes.txt", dataset_name="example")


Usage:

using NNMM.Datasets

# Access default example data
pheno_path = Datasets.dataset("phenotypes.csv")
geno_path = Datasets.dataset("genotypes.csv")

# Access simulated omics dataset
geno_path = Datasets.dataset("genotypes_1000snps.txt", dataset_name="simulated_omics_data")
pheno_path = Datasets.dataset("phenotypes_sim.txt", dataset_name="simulated_omics_data")

Available Datasets:

Dataset	Files	Description
(default)	`phenotypes.csv`, `genotypes.csv`, `genotypes0.csv`, `pedigree.csv`, `GRM.csv`, `map.csv`	Small example data
(default)	`genotypes_group1.csv`, `genotypes_group2.csv`, `genotypes_group3.csv`	Genotype groups for partial networks
`example`	`phenotypes.txt`, `genotypes.txt`, `pedigree.txt`, etc.	Tab-separated example data
`simulated_omics_data`	`genotypes_1000snps.txt`, `phenotypes_sim.txt`, `pedigree.txt`	Simulated dataset with 1000 SNPs and 10 omics

Pedigree Functions

get_pedigree

Read and process pedigree information.

NNMM.PedModule.get_pedigree — Function

get_pedigree(pedfile::AbstractString;header=false,separator=',',missingstrings=["0"],output_folder=".")

Get pedigree information from a pedigree file with header (defaulting to false), separator (defaulting to ,), and missing values (defaulting to ["0"]).
output_folder: Directory to save diagnostic files (defaulting to current directory)
Pedigree file format:

a,0,0
c,a,b
d,a,c
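
The file layout above can be read as one record per line. The plain-Julia sketch below parses those three lines, assuming the columns are individual, sire, and dam (the column meaning is not stated in this section) and treating "0" as missing. The helper `parse_parent` is hypothetical, and this is not how `get_pedigree` is implemented.

```julia
# Pedigree lines from the example above; column meaning (individual, sire, dam) is assumed.
lines = ["a,0,0", "c,a,b", "d,a,c"]

# "0" (the documented default missing string) becomes `missing`.
parse_parent(s) = s == "0" ? missing : String(s)

ped = map(lines) do line
    id, sire, dam = split(line, ',')
    (id = String(id), sire = parse_parent(sire), dam = parse_parent(dam))
end

# Parents that never appear as an individual in the first column.
ids = Set(r.id for r in ped)
unlisted = unique(p for r in ped for p in (r.sire, r.dam) if !ismissing(p) && !(p in ids))
```

In this example `b` is named as a parent of `c` but has no line of its own.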


Usage:

# Read pedigree file
pedigree = get_pedigree("pedigree.csv", separator=',', header=true)

# Use in random effect specification
random_spec = [(name="ID", pedigree=pedigree)]

Parameter Reference Tables

Bayesian Methods

Method	Description
`"BayesA"`	All markers have non-zero effects with marker-specific variances
`"BayesB"`	Subset of markers have non-zero effects with marker-specific variances
`"BayesC"`	Subset of markers have non-zero effects with common variance
`"BayesL"`	Bayesian LASSO
`"RR-BLUP"`	Ridge regression BLUP (all markers, common variance)
`"GBLUP"`	Genomic BLUP using relationship matrix (Layer 1→2 only)
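
The difference between "all markers" and "subset of markers" in the table can be sketched by drawing effects from a BayesC-style mixture prior, where `Pi` is the probability that an effect is exactly zero. This is an illustrative sketch in plain Julia; the function `draw_effects` and the variance value are hypothetical and are not part of NNMM.jl.

```julia
using Random

# Draw marker effects from a BayesC-style mixture prior: with probability Pi the
# effect is exactly zero, otherwise it is drawn from Normal(0, sigma2).
function draw_effects(rng, n_markers, Pi, sigma2)
    effects = zeros(n_markers)
    for j in 1:n_markers
        if rand(rng) >= Pi
            effects[j] = sqrt(sigma2) * randn(rng)
        end
    end
    return effects
end

rng = MersenneTwister(1)
sparse_effects = draw_effects(rng, 1000, 0.9, 0.01)   # most effects are zero
dense_effects  = draw_effects(rng, 1000, 0.0, 0.01)   # Pi = 0.0: every marker has an effect
```

With `Pi = 0.0` and a common variance the prior has the same form as the RR-BLUP row; BayesA and BayesB would additionally give each marker its own variance.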

Activation Functions

Function	Formula	Range	Use Case
`"linear"`	f(x) = x	(-∞, ∞)	Traditional regression
`"sigmoid"`	f(x) = 1/(1+e^(-x))	(0, 1)	Bounded outputs
`"tanh"`	f(x) = tanh(x)	(-1, 1)	Centered bounded outputs
`"relu"`	f(x) = max(0, x)	[0, ∞)	Sparse activation
`"leakyrelu"`	f(x) = max(0.01x, x)	(-∞, ∞)	Sparse with gradient flow
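
The formulas in the table translate directly into one-line Julia functions. The sketch below is a plain-Julia rendering of those formulas for experimentation; the names and the lookup `Dict` are illustrative and are not NNMM.jl's internal definitions.

```julia
# Plain-Julia versions of the five named activation functions.
linear(x)    = x
sigmoid(x)   = 1 / (1 + exp(-x))
relu(x)      = max(zero(x), x)
leakyrelu(x) = max(0.01x, x)
# tanh is already provided by Julia's Base.

activations = Dict("linear" => linear, "sigmoid" => sigmoid, "tanh" => tanh,
                   "relu" => relu, "leakyrelu" => leakyrelu)

# Apply every function to the same inputs to compare their ranges.
x = [-2.0, 0.0, 2.0]
outputs = Dict(name => f.(x) for (name, f) in activations)
```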

runNNMM Keyword Arguments

Argument	Type	Default	Description
`chain_length`	Integer	100	Total MCMC iterations
`burnin`	Integer	0	Burn-in iterations to discard
`output_samples_frequency`	Integer	auto	Save every Nth sample
`outputEBV`	Bool	true	Output estimated breeding values
`output_heritability`	Bool	true	Calculate heritability
`output_folder`	String	"nnmm_results"	Output directory
`seed`	Int/Bool	false	Random seed (false = random)
`printout_frequency`	Integer	chain_length+1	Print progress frequency
`double_precision`	Bool	false	Use Float64 instead of Float32
`big_memory`	Bool	false	Enable memory-intensive optimizations
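
One way to keep these settings together is a NamedTuple built from the documented keyword names. The values below are illustrative only, and the sample-count arithmetic assumes that burn-in iterations are dropped first and every Nth remaining iteration is saved; the exact bookkeeping inside runNNMM may differ.

```julia
# Settings collected under the documented keyword names; the values are illustrative.
mcmc_settings = (
    chain_length             = 5_000,
    burnin                   = 1_000,
    output_samples_frequency = 10,
    output_folder            = "nnmm_results",
    seed                     = 123,
)

# Iterations left after burn-in, and how many would be saved under the assumption above.
kept_iterations = mcmc_settings.chain_length - mcmc_settings.burnin
saved_samples   = kept_iterations ÷ mcmc_settings.output_samples_frequency

# The tuple could then be splatted into the call, e.g. runNNMM(...; mcmc_settings...),
# where the positional arguments are not covered in this section.
```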
