Benchmark
This page records a real benchmark run comparing runMCMC(...; fast_blocks=true) and runMCMC(...; fast_blocks=false) for the same target-scale question:
N = 50,000individuals- target
P = 2,000,000markers - target chain length
L = 2000
Date of run: February 26, 2026.
Goal
Estimate wall-clock MCMC time for a single-trait BayesC run at target scale, and compare fast-block vs standard BayesC.
Environment
- Platform: UCD farm cluster (Slurm)
- Julia:
1.12.1 - Model:
y1 = intercept + geno - Data: synthetic dense
Float32genotype matrix and synthetic phenotype - Settings:
method="BayesC"memory_guard=:warnoutputEBV=falseoutput_heritability=false
Fast-Block Benchmark (fast_blocks=true)
Design
- Slurm job:
30951323 - Warmup run before timed cases
- Timed grid:
P ∈ {100,000, 200,000}- input
chain_length ∈ {223, 446} REPS = 2
- Because
fast_blocks=trueandN=50,000:block_size = floor(sqrt(50000)) = 223- outer iterations are
floor(chain_length / 223)(1 or 2 in timed runs)
- Extrapolation:
- For each
P, fitt_mcmc = a + b * outer - Convert target to outer iterations:
floor(2000 / 223) = 8 - Fit target time vs
PusingP = 100,000andP = 200,000, then extrapolate toP = 2,000,000
- For each
Measured t_mcmc (seconds)
| Rep | P | chain_length input | outer | t_mcmc |
|---|---|---|---|---|
| 1 | 100,000 | 223 | 1 | 76.515 |
| 1 | 100,000 | 446 | 2 | 84.821 |
| 1 | 200,000 | 223 | 1 | 147.150 |
| 1 | 200,000 | 446 | 2 | 171.600 |
| 2 | 100,000 | 223 | 1 | 70.930 |
| 2 | 100,000 | 446 | 2 | 84.973 |
| 2 | 200,000 | 223 | 1 | 142.255 |
| 2 | 200,000 | 446 | 2 | 169.454 |
Extrapolated Target Time
SUMMARY_EST target_seconds = 3449.100target_hours = 0.958(about 57.5 minutes)- Replicate-based range: about
3274to3624seconds (0.91to1.01hours)
Standard BayesC Benchmark (fast_blocks=false)
Design
- Slurm job:
30951596 - Warmup run before timed cases
- Timed grid:
P ∈ {100,000, 200,000}chain_length ∈ {2, 4}REPS = 2
- Extrapolation:
- For each
P, fitt_mcmc = a + b * chain_length - Predict
t_mcmcatchain_length = 2000 - Fit those predicted times vs
Pand extrapolate toP = 2,000,000
- For each
Measured t_mcmc (seconds)
| Rep | P | chain_length | t_mcmc |
|---|---|---|---|
| 1 | 100,000 | 2 | 27.457 |
| 1 | 100,000 | 4 | 40.214 |
| 1 | 200,000 | 2 | 55.186 |
| 1 | 200,000 | 4 | 107.764 |
| 2 | 100,000 | 2 | 35.555 |
| 2 | 100,000 | 4 | 53.252 |
| 2 | 200,000 | 2 | 56.058 |
| 2 | 200,000 | 4 | 109.762 |
Extrapolated Target Time
SUMMARY_EST target_seconds = 735340.060target_hours = 204.261(about 8.5 days)- Replicate-based range: about
701539to769141seconds (194.9to213.7hours)
Summary Comparison
| Path | Estimated MCMC time at target (N=50k, P=2M, L=2000) |
|---|---|
fast_blocks=true | 3449 sec (0.958 h) |
fast_blocks=false | 735340 sec (204.261 h) |
Approximate speedup from block mode in this benchmark setup: about 213x.
Notes and Interpretation
- These are empirical estimates from short-chain calibration runs, not exact guarantees.
- For
fast_blocks=true, chain interpretation is two-level (outer iterations + inner block repeats). - The two Slurm jobs landed on different CPU node types, so absolute times include some hardware variance.
- Even with that caveat, the gap between block and non-block is very large for this target-scale scenario.