Annotated BayesR
Annotated BayesR extends single-trait dense BayesR by letting marker annotations change the full four-class mixture prior for each marker. It is the individual-level JWAS analogue of the sbayesrc.R summary-statistics sampler.
Method Overview
Standard BayesR uses one shared class-probability vector:
\[\pi = (\pi_1, \pi_2, \pi_3, \pi_4)\]
where:
- pi_1 is the zero-effect class
- pi_2, pi_3, and pi_4 are the nonzero mixture classes
Annotated BayesR replaces that shared prior with marker-specific class probabilities pi_j. JWAS does this with three conditional probit models:
\[p_{1j} = \Pr(\delta_j > 1 \mid a_j, \alpha_1) = \Phi(a_j^\top \alpha_1)\]
\[p_{2j} = \Pr(\delta_j > 2 \mid \delta_j > 1, a_j, \alpha_2) = \Phi(a_j^\top \alpha_2)\]
\[p_{3j} = \Pr(\delta_j > 3 \mid \delta_j > 2, a_j, \alpha_3) = \Phi(a_j^\top \alpha_3)\]
with:
- a_j as the annotation row for marker j
- alpha_1, alpha_2, alpha_3 as step-specific annotation coefficients
- Phi as the standard normal CDF
JWAS then reconstructs the four-class prior for each marker:
\[\pi_{j1} = 1 - p_{1j}\]
\[\pi_{j2} = p_{1j}(1 - p_{2j})\]
\[\pi_{j3} = p_{1j}p_{2j}(1 - p_{3j})\]
\[\pi_{j4} = p_{1j}p_{2j}p_{3j}\]
The BayesR marker update still samples delta_j in one four-way draw. The sequential structure is in the annotation model that generates pi_j, not in a chained marker-state sampler.
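The mapping from annotations to the four-class prior can be sketched in a few lines of Julia. This is a minimal illustration, not JWAS code: the helper name `class_probabilities`, the coefficient values, and the toy annotation matrix are all made up, and it assumes Distributions.jl is installed for the standard normal CDF.

```julia
using Distributions  # assumed to be installed; provides Normal() and cdf

# Hypothetical helper, not a JWAS function.
# A: m x (k+1) annotation matrix whose first column is the intercept (all ones)
# alpha1, alpha2, alpha3: step-specific coefficient vectors of length k+1
function class_probabilities(A::AbstractMatrix, alpha1, alpha2, alpha3)
    Phi(x) = cdf(Normal(), x)      # standard normal CDF
    p1 = Phi.(A * alpha1)          # Pr(delta_j > 1)
    p2 = Phi.(A * alpha2)          # Pr(delta_j > 2 | delta_j > 1)
    p3 = Phi.(A * alpha3)          # Pr(delta_j > 3 | delta_j > 2)
    # Columns are pi_j1 ... pi_j4; each row sums to 1 by construction.
    return hcat(1 .- p1,
                p1 .* (1 .- p2),
                p1 .* p2 .* (1 .- p3),
                p1 .* p2 .* p3)
end

# Three markers, intercept plus one annotation column (made-up values).
A = [1.0 0.0;
     1.0 1.0;
     1.0 0.5]
alpha1 = [-1.6, 0.5]    # made-up coefficients for illustration only
alpha2 = [-0.25, 0.0]
alpha3 = [-0.7, 0.2]

pi_mat = class_probabilities(A, alpha1, alpha2, alpha3)
```

Each row of `pi_mat` is one marker's prior over the four classes. With a positive step-1 coefficient, the marker whose annotation value is 1 gets a smaller zero-class probability than the marker whose annotation value is 0.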
Initialization
Annotated BayesR starts from the same default prior as standard BayesR:
\[\pi = (0.95, 0.03, 0.015, 0.005)\]
JWAS converts this into the three conditional probabilities:
- p1 = 0.05
- p2 = 0.40
- p3 = 0.25
and then initializes the probit intercepts as:
- Phi^{-1}(0.05) = -1.64485362695
- Phi^{-1}(0.40) = -0.25334710314
- Phi^{-1}(0.25) = -0.67448975020
All non-intercept annotation coefficients start at zero. So all markers begin with the same BayesR prior, and then the annotation model learns marker-specific class probabilities during MCMC.
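The conversion from the default four-class prior to the three conditional probabilities is plain arithmetic. The sketch below uses only Base Julia; `conditional_probabilities` is a hypothetical helper name, not a JWAS function.

```julia
# Hypothetical helper, not a JWAS function: convert a four-class prior
# (zero-effect class first) into the three conditional step-up probabilities.
function conditional_probabilities(prior::AbstractVector{<:Real})
    length(prior) == 4 || throw(ArgumentError("expected four class probabilities"))
    p1 = prior[2] + prior[3] + prior[4]      # Pr(delta > 1)
    p2 = (prior[3] + prior[4]) / p1          # Pr(delta > 2 | delta > 1)
    p3 = prior[4] / (prior[3] + prior[4])    # Pr(delta > 3 | delta > 2)
    return (p1, p2, p3)
end

p1, p2, p3 = conditional_probabilities([0.95, 0.03, 0.015, 0.005])

# Each probit intercept is the standard normal quantile of the matching
# probability, e.g. quantile(Normal(), p1) if Distributions.jl is available.
```

For the default prior this gives 0.05, 0.40, and 0.25 up to floating-point rounding, matching the values listed above.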
Sampler Order
For each MCMC iteration, JWAS runs annotated BayesR in this order:
- update location parameters and yCorr
- sample BayesR marker classes and marker effects using the current marker-specific pi_j
- build the step-up indicators:
  - z1_j = 1(delta_j > 1)
  - z2_j = 1(delta_j > 2)
  - z3_j = 1(delta_j > 3)
- update the three conditional annotation models:
  - step 1 on all markers
  - step 2 on markers with z1_j = 1
  - step 3 on markers with z2_j = 1
- rebuild all marker-specific class probabilities pi_j
- sample the shared BayesR marker variance sigmaSq
- sample the residual variance
This follows the same high-level Gibbs ordering as sbayesrc.R, but uses JWAS individual-level marker updates instead of summary-statistics equations.
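The step-up indicators and the marker subsets used by each conditional model can be illustrated with a toy class vector. This is a sketch under assumptions, not the JWAS implementation: the `delta` values are made up and the variable names are chosen for illustration.

```julia
# Made-up class labels for six markers (1 = zero-effect class, 4 = largest class).
delta = [1, 2, 4, 1, 3, 2]

# Step-up indicators, used as binary responses in the three probit models.
z1 = delta .> 1
z2 = delta .> 2
z3 = delta .> 3

# Step 1 uses every marker; steps 2 and 3 use only the markers that
# reached the previous step.
step1_markers = collect(eachindex(delta))
step2_markers = findall(z1)
step3_markers = findall(z2)

# Responses restricted to the markers that enter each step.
step1_response = z1
step2_response = z2[step2_markers]
step3_response = z3[step3_markers]
```

Here step 2 is fitted on markers 2, 3, 5, and 6, and step 3 only on markers 3 and 5, so the later steps see fewer markers whenever most markers sit in the zero-effect class.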
Input Requirements
- Current support is single-trait method="BayesR" only.
- Current support is dense storage only.
- Pass annotations through get_genotypes(...; annotations=...).
- annotations must be a numeric matrix with one row per marker in the raw genotype input.
- JWAS applies the same marker QC/filtering mask to annotations as it applies to genotypes.
- JWAS prepends an intercept column automatically after filtering. Users should not include an intercept column.
Current v1 exclusions:
- storage=:stream
- multi-trait BayesR
- random regression models (RRM)
fast_blocks is supported for dense annotated BayesR. The block sampler uses the same annotation-induced marker-specific class probabilities pi_j as the dense sampler. As with ordinary BayesR, block mode is an accelerated approximation to the dense transition kernel rather than the exact same sampler.
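The annotation handling described in the requirements above (filter rows with the marker QC mask, then prepend an intercept) can be pictured as follows. This is only a sketch of the idea in Base Julia: the `keep` mask is hypothetical, and in JWAS the mask comes from genotype QC rather than from user code.

```julia
# Raw annotations: one row per marker in the raw genotype input, no intercept.
annotations_raw = [0.0 1.0;
                   1.0 0.0;
                   1.0 1.0;
                   0.0 0.0;
                   0.5 0.5]

# Hypothetical QC mask (true = marker kept); marker 3 is dropped here.
keep = [true, true, false, true, true]

# Drop the rows of filtered markers, then prepend the intercept column.
annotations_kept = annotations_raw[keep, :]
design = hcat(ones(size(annotations_kept, 1)), annotations_kept)
```

The resulting `design` has one row per retained marker and one more column than the user-supplied matrix, which is why a user-supplied intercept column would be redundant.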
Dense Example
using JWAS, CSV, DataFrames
phenotypes = CSV.read("phenotypes.txt", DataFrame, delim=',', missingstring=["NA"])
# One row per marker in the raw genotype input; do not add an intercept column.
annotations = [
    0.0 1.0
    1.0 0.0
    1.0 1.0
    0.0 0.0
    0.5 0.5
]
genotypes = get_genotypes(
    "genotypes.txt", 1.0;
    method="BayesR",
    separator=',',
    quality_control=false,
    annotations=annotations,
)
model = build_model("y1 = intercept + genotypes", 1.0)
output = runMCMC(
    model,
    phenotypes;
    chain_length=2000,
    burnin=500,
    output_samples_frequency=10,
    outputEBV=false,
    output_heritability=false,
)
Dense Block Example
# Same call as in the dense example, with fast_blocks=true added.
output = runMCMC(
    model,
    phenotypes;
    chain_length=2000,
    burnin=500,
    output_samples_frequency=10,
    fast_blocks=true,
    outputEBV=false,
    output_heritability=false,
)
Output
Annotated BayesR keeps the standard BayesR outputs, including:
- marker effects
- posterior Model_Frequency
- posterior shared marker variance
- posterior residual variance
It also keeps the pi_<genotype name> table, but the meaning is slightly different from ordinary BayesR. For annotated BayesR, that table reports posterior means of the current annotation-induced class probabilities averaged across markers.
It also adds a step-specific annotation-coefficient table:
output["annotation coefficients genotypes"]
with columns:
- Annotation
- Step
- Estimate
- SD
The Step labels are:
- step1_zero_vs_nonzero
- step2_small_vs_larger
- step3_medium_vs_large
These are the sampled annotation-model parameters. They describe how annotations change the current conditional prior class probabilities, not the final posterior class probabilities.
Practical Notes
- Standard BayesR is unchanged when no annotations are provided.
- Annotation rows are defined on the raw marker order, not the post-QC marker order.
- If QC drops markers, JWAS drops the corresponding annotation rows before adding the intercept column.
- Posterior PIP is still read from the marker-effects output as Model_Frequency = Pr(delta_j > 1 | data).