Annotated BayesC

Annotated BayesC extends single-trait BayesC by letting marker annotations change the prior exclusion probability for each marker. Instead of using one shared exclusion probability π, JWAS maintains a marker-specific vector π_j.

Method Overview

Standard BayesC uses:

  • δ_j = 0 when marker j is excluded
  • δ_j = 1 when marker j is included
  • a common exclusion probability π

Annotated BayesC replaces that common prior with annotation-driven probabilities:

\[\Pr(\delta_j = 1 \mid a_j, \gamma) = \Phi(a_j^\top \gamma)\]

where:

  • a_j is the annotation row for marker j
  • γ is a vector of annotation coefficients
  • Φ is the standard normal CDF

JWAS stores BayesC π as an exclusion probability, so the internal update is:

\[\pi_j = 1 - \Phi(a_j^\top \gamma)\]

During MCMC, JWAS alternates between:

  1. sampling marker inclusion indicators and marker effects using current π_j
  2. sampling latent liabilities for the annotation model
  3. sampling annotation coefficients γ
  4. refreshing the per-marker exclusion probabilities π_j

Input Requirements

  • Current support is single-trait method="BayesC" only.
  • Pass annotations through get_genotypes(...; annotations=...).
  • annotations must be a numeric matrix with one row per marker in the raw genotype input.
  • JWAS applies the same marker QC/filtering mask to annotations as it applies to genotypes.
  • JWAS prepends an intercept column automatically after filtering. Users should not include an intercept column.

Dense Example

using JWAS, CSV, DataFrames

phenotypes = CSV.read("phenotypes.txt", DataFrame, delim=',', missingstring=["NA"])

annotations = [
    0.0 1.0
    1.0 0.0
    1.0 1.0
    0.0 0.0
    0.5 0.5
]

genotypes = get_genotypes(
    "genotypes.txt", 1.0;
    method="BayesC",
    separator=',',
    quality_control=false,
    annotations=annotations,
)

model = build_model("y1 = intercept + genotypes", 1.0)

output = runMCMC(
    model,
    phenotypes;
    chain_length=2000,
    burnin=500,
    output_samples_frequency=10,
)

The returned output includes an annotation-coefficient table:

output["annotation coefficients genotypes"]

The first row is the automatically added intercept, followed by Annotation_1, Annotation_2, and so on.

Dense Example with fast_blocks

Annotated BayesC also works with the dense block sampler:

output = runMCMC(
    model,
    phenotypes;
    chain_length=6000,
    burnin=500,
    output_samples_frequency=10,
    fast_blocks=true,
)

This uses the same annotation-driven π_j updates, but samples marker effects through the block BayesC path. JWAS rescales the outer chain_length in block mode, so the nominal chain_length should be chosen large enough that the effective post-scaling chain is still longer than burnin. For block-update details, see Block BayesC.

Streaming Example

The streaming backend is also supported for Annotated BayesC:

using JWAS, CSV, DataFrames

prefix = prepare_streaming_genotypes(
    "genotypes.txt";
    separator=',',
    header=true,
    quality_control=false,
    center=true,
)

annotations = [
    0.0 1.0
    1.0 0.0
    1.0 1.0
    0.0 0.0
    0.5 0.5
]

genotypes = get_genotypes(
    prefix, 1.0;
    method="BayesC",
    storage=:stream,
    annotations=annotations,
)

phenotypes = DataFrame(
    ID=copy(genotypes.obsID),
    y1=Float32[1.1, -0.3, 0.8, -0.9, 0.5, -0.1, 0.2],
)

model = build_model("y1 = intercept + genotypes", 1.0)

output = runMCMC(
    model,
    phenotypes;
    chain_length=2000,
    burnin=500,
    output_samples_frequency=10,
    outputEBV=false,
    output_heritability=false,
    memory_guard=:off,
)

Streaming keeps the same annotation logic, but it still inherits the current streaming MVP constraints:

  • single trait only
  • BayesC only
  • fast_blocks=false
  • unit residual weights only
  • double_precision=false
  • phenotype IDs must exactly match genotype IDs in the same order

For more on the streaming backend, see Streaming Genotype Walkthrough.

Practical Notes

  • Standard BayesC is still the same model when no annotations are provided; JWAS simply fills the π vector with one repeated value.
  • Annotation rows are defined on the raw marker order, not the post-QC marker order.
  • If QC drops markers, JWAS drops the corresponding annotation rows before adding the intercept column.