Memory Usage (BayesA/B/C Marker Paths)
This page summarizes memory usage for JWAS marker samplers, with emphasis on the non-block path (fast_blocks=false) and how block mode changes memory requirements.
Notation
- N: number of records (nObs)
- P: number of markers (nMarkers)
- b: nominal block size
- B = ceil(P / b): number of blocks
- s_i: size of block i, with sum_i s_i = P
- t: bytes per stored value (t=4 for Float32, t=8 for Float64)
Main allocations are created in GibbsMats(...) (src/1.JWAS/src/markers/tools4genotypes.jl).
Non-Block Mode (fast_blocks=false)
What is stored
- X (genotype matrix): N x P
- xArray: column views of X (metadata only, no data copy)
- xpRinvx: length P
- xRinvArray:
  - aliases xArray when Rinv == ones(length(Rinv)) (no extra N x P copy)
  - materializes [x .* Rinv for x in xArray] when weights are non-unit (extra N x P)
There are also marker-state vectors (alpha, beta, delta, posterior means, etc.) that scale as O(P) and are usually much smaller than N x P matrices.
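The aliasing rule for xRinvArray can be illustrated on a toy matrix. The following is a minimal sketch, not the JWAS source: `make_xRinvArray` is a hypothetical helper written only for this illustration, and the sizes are arbitrary.

```julia
# Toy-sized illustration; names mirror the list above, but this is not JWAS code.
N, P = 6, 4
X = rand(Float32, N, P)

# Column views share X's memory: only small wrapper objects are allocated.
xArray = [view(X, :, j) for j in 1:P]

# Hypothetical helper reproducing the aliasing rule described above.
function make_xRinvArray(xArray, Rinv)
    if Rinv == ones(length(Rinv))
        return xArray                       # alias: no extra N x P storage
    else
        return [x .* Rinv for x in xArray]  # materialized: P new length-N vectors
    end
end

unit     = make_xRinvArray(xArray, ones(Float32, N))
weighted = make_xRinvArray(xArray, Float32[1, 2, 1, 2, 1, 2])

println(unit === xArray)         # same object, so no second copy exists
println(weighted === xArray)     # a distinct object holding new data
println(sum(length, weighted))   # number of newly stored values (N * P)
```

With unit weights the returned object is the same array of views, so the only dense storage is X itself. With non-unit weights each column is copied and scaled, which is where the second N x P term comes from.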
Approximate memory formulas
- Unit weights (default when no weights column is used):
Mem_nonblock_unit ~= t * (N*P + P) + O(P*t)
- Non-unit weights:
Mem_nonblock_nonunit ~= t * (2*N*P + P) + O(P*t)
Interpretation: non-block memory is dominated by one dense N x P array (X, with unit weights) or two (X plus the weighted copy held in xRinvArray, with non-unit weights).
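The two formulas above can be evaluated directly. This is a rough sketch: `nonblock_bytes` is a hypothetical helper (not part of JWAS), it ignores the O(P*t) marker-state vectors, and the sizes used are arbitrary illustration values.

```julia
# Leading-order estimate in bytes; the O(P*t) marker-state vectors are ignored.
# `nonblock_bytes` is a hypothetical helper, not a JWAS function.
function nonblock_bytes(N::Integer, P::Integer; t::Integer=4, unit_weights::Bool=true)
    dense_copies = unit_weights ? 1 : 2   # X alone, or X plus the weighted copy
    return t * (dense_copies * N * P + P) # the trailing P is xpRinvx
end

unit_f32    = nonblock_bytes(1_000, 50_000)                             # Float32, unit weights
nonunit_f64 = nonblock_bytes(1_000, 50_000; t=8, unit_weights=false)    # Float64, non-unit weights

println(unit_f32 / 1e9, " GB")
println(nonunit_f64 / 1e9, " GB")
```

Switching from Float32 with unit weights to Float64 with non-unit weights roughly quadruples the estimate, since both the element size and the number of dense copies double.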
Block Mode Additions (fast_blocks != false)
Block mode keeps all non-block structures and adds:
- XArray: block views of X (metadata only)
- XpRinvX: per-block Gram matrices; total elements sum_i s_i^2 (approximately P*b for near-uniform blocks)
- a temporary block RHS workspace of length up to block size (reused, not N*P scale)
Approximate totals:
- Unit weights:
Mem_block_unit ~= t * (N*P + sum_i s_i^2 + P) + O(P*t)
- Non-unit weights:
Mem_block_nonunit ~= t * (2*N*P + sum_i s_i^2 + P) + O(P*t)
Worked Example (N=500,000, P=2,000,000)
Assume fast_blocks=true, so b=floor(sqrt(N))=707.
- B = ceil(P/b) = 2,829
- sum_i s_i^2 = 1,413,937,788
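These block counts can be reproduced with a few lines of arithmetic. The sketch below assumes that markers are split into consecutive blocks of size b with one shorter final block; that partitioning is an assumption made for illustration (it does reproduce the numbers above) and is not taken from the JWAS source.

```julia
N, P = 500_000, 2_000_000

b = isqrt(N)    # floor(sqrt(N)), the nominal block size
B = cld(P, b)   # ceil(P / b), the number of blocks

# Assumed partition: B - 1 full blocks plus one shorter final block.
sizes = fill(b, B)
sizes[end] = P - b * (B - 1)

gram_elements = sum(abs2, sizes)   # sum_i s_i^2, total elements in XpRinvX

println("b = ", b, ", B = ", B, ", last block = ", sizes[end])
println("sum_i s_i^2 = ", gram_elements)
println("P*b approximation = ", P * b)
```

The P*b approximation slightly overestimates the exact sum because the final block is smaller than b.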
Original non-block version (fast_blocks=false)
This is the base/original sampler memory footprint without block matrices.
| Component | Elements | Float32 | Float64 |
|---|---|---|---|
| X | N*P = 1,000,000,000,000 | 4.00 TB (3.64 TiB) | 8.00 TB (7.28 TiB) |
| xRinvArray (unit weights) | alias of xArray (no data copy) | 0 | 0 |
| xRinvArray (non-unit weights) | N*P | 4.00 TB (3.64 TiB) | 8.00 TB (7.28 TiB) |
| xpRinvx | P | 8.0 MB (7.63 MiB) | 16.0 MB (15.26 MiB) |
Non-block totals:
| Non-block case | Float32 | Float64 |
|---|---|---|
| Unit weights | ~4.00 TB | ~8.00 TB |
| Non-unit weights | ~8.00 TB | ~16.00 TB |
Extra objects introduced by block mode (fast_blocks != false)
Component sizes:
| Component | Elements | Float32 | Float64 |
|---|---|---|---|
| X | N*P = 1,000,000,000,000 | 4.00 TB (3.64 TiB) | 8.00 TB (7.28 TiB) |
| xRinvArray (only non-unit weights) | N*P | 4.00 TB (3.64 TiB) | 8.00 TB (7.28 TiB) |
| XpRinvX (block mode) | sum_i s_i^2 | 5.66 GB (5.27 GiB) | 11.31 GB (10.53 GiB) |
| xpRinvx | P | 8.0 MB (7.63 MiB) | 16.0 MB (15.26 MiB) |
Approximate totals (dominated by N*P terms):
| Mode | Float32 | Float64 |
|---|---|---|
| Non-block, unit weights | ~4.00 TB | ~8.00 TB |
| Non-block, non-unit weights | ~8.00 TB | ~16.00 TB |
| Block, unit weights | ~4.01 TB | ~8.01 TB |
| Block, non-unit weights | ~8.01 TB | ~16.01 TB |
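The totals in this table follow from summing the component terms. The sketch below is one way to recompute them: `total_bytes` is a hypothetical helper (not a JWAS function), the O(P*t) marker-state vectors are ignored, and the value of sum_i s_i^2 is the one quoted in the worked example.

```julia
# Hypothetical helper: leading-order bytes for X, the optional weighted copy,
# the per-block Gram matrices (gram_elements = sum_i s_i^2), and xpRinvx.
function total_bytes(N, P; t=4, unit_weights=true, gram_elements=0)
    dense_copies = unit_weights ? 1 : 2
    return t * (dense_copies * N * P + gram_elements + P)
end

N, P = 500_000, 2_000_000
G = 1_413_937_788   # sum_i s_i^2 from the worked example

for (mode, g) in (("Non-block", 0), ("Block", G)), unit in (true, false)
    f32 = total_bytes(N, P; t=4, unit_weights=unit, gram_elements=g)
    f64 = total_bytes(N, P; t=8, unit_weights=unit, gram_elements=g)
    weights = unit ? "unit" : "non-unit"
    # 1 TB = 1e12 bytes (decimal); 1 TiB = 2^40 bytes (binary).
    println(mode, ", ", weights, " weights: ",
            round(f32 / 1e12; digits=2), " TB / ",
            round(f64 / 1e12; digits=2), " TB (",
            round(f32 / 2^40; digits=2), " TiB / ",
            round(f64 / 2^40; digits=2), " TiB)")
end
```

The block-mode rows differ from the non-block rows only by the XpRinvX term, which is a few gigabytes against several terabytes of dense genotype storage.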
Practical Takeaways
- At very large N and P, dense in-memory genotype matrices dominate memory in both non-block and block modes.
- Non-block mode is memory-cheapest when weights are unit (xRinvArray aliases xArray).
- Current block mode does not persist XRinvArray; extra memory is mainly XpRinvX (plus small reusable block workspaces).
- For datasets like N=500k, P=2M, dense storage is multi-terabyte and typically requires a different data representation strategy.