Annotated BayesC
Annotated BayesC extends single-trait BayesC by letting marker annotations change the prior exclusion probability for each marker. Instead of using one shared exclusion probability π, JWAS maintains a marker-specific vector π_j.
Method Overview
Standard BayesC uses:
δ_j = 0when markerjis excludedδ_j = 1when markerjis included- a common exclusion probability
π
Annotated BayesC replaces that common prior with annotation-driven probabilities:
\[\Pr(\delta_j = 1 \mid a_j, \gamma) = \Phi(a_j^\top \gamma)\]
where:
a_jis the annotation row for markerjγis a vector of annotation coefficientsΦis the standard normal CDF
JWAS stores BayesC π as an exclusion probability, so the internal update is:
\[\pi_j = 1 - \Phi(a_j^\top \gamma)\]
During MCMC, JWAS alternates between:
- sampling marker inclusion indicators and marker effects using current
π_j - sampling latent liabilities for the annotation model
- sampling annotation coefficients
γ - refreshing the per-marker exclusion probabilities
π_j
Input Requirements
- Current support is single-trait
method="BayesC"only. - Pass annotations through
get_genotypes(...; annotations=...). annotationsmust be a numeric matrix with one row per marker in the raw genotype input.- JWAS applies the same marker QC/filtering mask to
annotationsas it applies to genotypes. - JWAS prepends an intercept column automatically after filtering. Users should not include an intercept column.
Dense Example
using JWAS, CSV, DataFrames
phenotypes = CSV.read("phenotypes.txt", DataFrame, delim=',', missingstring=["NA"])
annotations = [
0.0 1.0
1.0 0.0
1.0 1.0
0.0 0.0
0.5 0.5
]
genotypes = get_genotypes(
"genotypes.txt", 1.0;
method="BayesC",
separator=',',
quality_control=false,
annotations=annotations,
)
model = build_model("y1 = intercept + genotypes", 1.0)
output = runMCMC(
model,
phenotypes;
chain_length=2000,
burnin=500,
output_samples_frequency=10,
)The returned output includes an annotation-coefficient table:
output["annotation coefficients genotypes"]The first row is the automatically added intercept, followed by Annotation_1, Annotation_2, and so on.
Dense Example with fast_blocks
Annotated BayesC also works with the dense block sampler:
output = runMCMC(
model,
phenotypes;
chain_length=6000,
burnin=500,
output_samples_frequency=10,
fast_blocks=true,
)This uses the same annotation-driven π_j updates, but samples marker effects through the block BayesC path. JWAS rescales the outer chain_length in block mode, so the nominal chain_length should be chosen large enough that the effective post-scaling chain is still longer than burnin. For block-update details, see Block BayesC.
Streaming Example
The streaming backend is also supported for Annotated BayesC:
using JWAS, CSV, DataFrames
prefix = prepare_streaming_genotypes(
"genotypes.txt";
separator=',',
header=true,
quality_control=false,
center=true,
)
annotations = [
0.0 1.0
1.0 0.0
1.0 1.0
0.0 0.0
0.5 0.5
]
genotypes = get_genotypes(
prefix, 1.0;
method="BayesC",
storage=:stream,
annotations=annotations,
)
phenotypes = DataFrame(
ID=copy(genotypes.obsID),
y1=Float32[1.1, -0.3, 0.8, -0.9, 0.5, -0.1, 0.2],
)
model = build_model("y1 = intercept + genotypes", 1.0)
output = runMCMC(
model,
phenotypes;
chain_length=2000,
burnin=500,
output_samples_frequency=10,
outputEBV=false,
output_heritability=false,
memory_guard=:off,
)Streaming keeps the same annotation logic, but it still inherits the current streaming MVP constraints:
- single trait only
- BayesC only
fast_blocks=false- unit residual weights only
double_precision=false- phenotype IDs must exactly match genotype IDs in the same order
For more on the streaming backend, see Streaming Genotype Walkthrough.
Practical Notes
- Standard BayesC is still the same model when no
annotationsare provided; JWAS simply fills theπvector with one repeated value. - Annotation rows are defined on the raw marker order, not the post-QC marker order.
- If QC drops markers, JWAS drops the corresponding annotation rows before adding the intercept column.