b3chain — B3PoW-Scratch vs SHA-256d (PoW-level)

1. Side-by-side

Property	SHA-256d (Bitcoin)	B3PoW-Scratch v1.1.1 (B3Chain)
Inner round function	SHA-256 (Merkle-Damgård)	BLAKE3 (Bao tree)
State per attempt	32 bytes	1 048 576 bytes (1 MiB scratchpad)
Lanes	1	8 lanes × 128 KiB; cross-lane diffusion via `LANE_SHUFFLE` permutation L' = (5L+1) mod 8 = `[1,6,3,0,5,2,7,4]`
Inner rounds per nonce	2	2048 outer × 2 inner = 4096
Memory bandwidth required	trivial (fits in L1)	8 lanes × 64 B = 512 B of working set per attempt; traffic is dominated by re-reads of the ~1 MiB pad
Time per attempt on CPU (single thread, median)	~1.5 µs	~700 µs
ASIC speed-up upper bound vs 1-thread CPU	~10⁷× (mature market, custom ALUs)	bounded by on-die SRAM / HBM bandwidth; at a commercially typical 1-3 GB/s per chip this is roughly 10³ to 10^3.5× (≈ 1 000-3 000×), not 10⁷×
Verifier wall-clock budget	none (verification is microseconds)	50 ms per header (`consensus.b3pow_verify_budget_ms`)
Verifier cache	not needed	per-`prev_block_hash` LRU (`b3pow::Cache`), 4 entries by default (≈ 4 MB resident)
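
The geometry figures in the table can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not code from the repository: the constant and function names are hypothetical, and only the numeric values (8 lanes, 128 KiB per lane, 2048 × 2 rounds, the formula L' = (5L+1) mod 8) come from the table. It says nothing about how `LANE_SHUFFLE` is applied inside the algorithm.

```python
# Values copied from the B3PoW-Scratch v1.1.1 column of the table.
# All names below are illustrative, not identifiers from the C++ port.
LANES = 8
LANE_BYTES = 128 * 1024          # 128 KiB per lane
OUTER_ROUNDS = 2048
INNER_PER_OUTER = 2


def lane_shuffle(lanes=LANES):
    """Return the mapping L -> (5L + 1) mod lanes as a list."""
    return [(5 * lane + 1) % lanes for lane in range(lanes)]


def is_permutation(mapping):
    """True if every lane index appears exactly once in the mapping."""
    return sorted(mapping) == list(range(len(mapping)))


def cycle_length(mapping, start=0):
    """Count how many applications of the mapping return to `start`."""
    steps, lane = 1, mapping[start]
    while lane != start:
        lane = mapping[lane]
        steps += 1
    return steps


shuffle = lane_shuffle()
scratchpad_bytes = LANES * LANE_BYTES
inner_rounds = OUTER_ROUNDS * INNER_PER_OUTER

print("LANE_SHUFFLE    :", shuffle)
print("is permutation  :", is_permutation(shuffle))
print("cycle from lane0:", cycle_length(shuffle))
print("scratchpad bytes:", scratchpad_bytes)
print("inner rounds    :", inner_rounds)
```

The mapping is a single 8-cycle (0 → 1 → 6 → 7 → 4 → 5 → 2 → 3 → 0), so repeated application moves a given lane index through all eight positions before it returns to its starting point.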

2. Why this matters

The headline number is the ASIC speed-up upper bound. A SHA-256d ASIC achieves enormous gains because the working state fits in 32 bytes of registers: the whole datapath can be unrolled into combinational logic, pipelined to 1-3 GHz, and replicated as O(10⁴) parallel pipelines on one die.

A B3PoW-Scratch ASIC cannot do that: every nonce has to read megabytes of memory in a data-dependent pattern, so die area that would otherwise go to more pipelines has to be spent on more SRAM. The ASIC advantage therefore caps near the memory-bandwidth-per-watt frontier, where commodity GPUs and CPUs already operate.
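
To put the table's figures on one scale, the sketch below converts the per-attempt times into single-thread attempt rates and multiplies them by the quoted ASIC speed-up bounds. It is plain arithmetic on numbers copied from the table; the variable names are illustrative, and the results are implied ceilings relative to the one-thread CPU baseline, not measurements.

```python
# Figures copied from the side-by-side table. They are single-thread medians
# and upper-bound estimates; this snippet measures nothing itself.
SHA256D_US_PER_ATTEMPT = 1.5
B3POW_US_PER_ATTEMPT = 700.0
SHA256D_ASIC_GAIN = 1e7
B3POW_ASIC_GAIN_LOW = 1e3
B3POW_ASIC_GAIN_HIGH = 10 ** 3.5
VERIFY_BUDGET_MS = 50.0


def attempts_per_second(us_per_attempt):
    """Convert microseconds per attempt into attempts per second."""
    return 1_000_000 / us_per_attempt


sha_cpu_rate = attempts_per_second(SHA256D_US_PER_ATTEMPT)
b3_cpu_rate = attempts_per_second(B3POW_US_PER_ATTEMPT)

# How much slower one B3PoW-Scratch attempt is than one SHA-256d attempt.
cpu_slowdown = B3POW_US_PER_ATTEMPT / SHA256D_US_PER_ATTEMPT

# Implied ASIC ceilings: CPU baseline rate times the quoted speed-up bound.
sha_asic_ceiling = sha_cpu_rate * SHA256D_ASIC_GAIN
b3_asic_ceiling_low = b3_cpu_rate * B3POW_ASIC_GAIN_LOW
b3_asic_ceiling_high = b3_cpu_rate * B3POW_ASIC_GAIN_HIGH

# How many median-cost attempts fit inside the 50 ms verifier budget.
budget_headroom = VERIFY_BUDGET_MS * 1000 / B3POW_US_PER_ATTEMPT

print(f"SHA-256d CPU rate : {sha_cpu_rate:,.0f} attempts/s")
print(f"B3PoW CPU rate    : {b3_cpu_rate:,.0f} attempts/s")
print(f"CPU slowdown      : {cpu_slowdown:.0f}x")
print(f"SHA-256d ASIC cap : {sha_asic_ceiling:.2e} attempts/s")
print(f"B3PoW ASIC cap    : {b3_asic_ceiling_low:.2e} to {b3_asic_ceiling_high:.2e} attempts/s")
print(f"Budget headroom   : {budget_headroom:.1f}x median attempt time")
```

On these inputs a B3PoW-Scratch attempt costs roughly 467 times as much CPU time as a SHA-256d attempt, and the 50 ms verifier budget is about 71 times the median attempt time.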

3. Honesty caveats

The "≈ 1 000-3 000×" ceiling is an upper-bound estimate from analogous memory-bound PoW deployments (RandomX, Equihash 144/5). Real B3PoW-Scratch ASICs are not deployed at scale at the time of writing, so this number will be refined as data appears.
Memory-hardness is a defense-in-depth layer on top of the identity-hash isolation audited at H-1. It does not make a weak PoW algorithm strong; it makes a strong PoW algorithm harder to accelerate with an ASIC.
Wall-clock budget enforcement (`b3pow_verify_budget_ms`) and the HEADERS verification cap (`MAX_B3POW_VERIFY_PER_BATCH`) are audited separately at H-1.1 and H-1.3; this page only covers the algorithmic comparison.
The primitive throughput cards (pow-throughput, block-validation) still measure the BLAKE3 round function in isolation. They are useful because BLAKE3 is the inner function of B3PoW-Scratch, but they do not measure chain-level B3PoW-Scratch throughput.

4. Source files

contrib/miner/b3miner-rtl/SPEC.md — normative algorithm spec
contrib/testing/compare/compare-b3pow-vs-sha256d.md — this page's source
src/crypto/b3pow_scratch.cpp — the C++ port
src/crypto/b3pow_cache.cpp — b3pow::Cache
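
For readers who have not opened src/crypto/b3pow_cache.cpp, the sketch below shows the general shape of an LRU keyed by `prev_block_hash` with the default of 4 entries quoted in the table. It is a hypothetical illustration, not the `b3pow::Cache` API: the class name, method names, and the assumption that each entry holds one 1 MiB pad are all invented here (the table's "≈ 4 MB resident" is consistent with that assumption, but the source does not state what an entry contains).

```python
from collections import OrderedDict

PAD_BYTES = 1_048_576  # one scratchpad, per the table; entry contents are assumed


class PadCache:
    """Hypothetical LRU keyed by prev_block_hash (not the real b3pow::Cache)."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self._entries = OrderedDict()   # oldest entry first, newest last

    def get(self, prev_block_hash):
        """Return the cached pad, marking it most recently used, or None."""
        pad = self._entries.get(prev_block_hash)
        if pad is not None:
            self._entries.move_to_end(prev_block_hash)
        return pad

    def put(self, prev_block_hash, pad):
        """Insert or refresh an entry, evicting the least recently used."""
        self._entries[prev_block_hash] = pad
        self._entries.move_to_end(prev_block_hash)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)

    def resident_bytes(self):
        """Total size of all cached pads."""
        return sum(len(pad) for pad in self._entries.values())


cache = PadCache()
for height in range(5):
    # Stand-in keys and zero-filled pads; real keys would be block hashes.
    cache.put(bytes([height]) * 32, bytes(PAD_BYTES))

print("oldest key evicted:", cache.get(bytes([0]) * 32) is None)
print("resident bytes    :", cache.resident_bytes())
```

With five distinct keys inserted into a four-entry cache, the first key is evicted and the resident size stays at 4 × 1 048 576 bytes.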
