B3PoW-Scratch v1.1.1 vs SHA-256d — PoW-level comparison

Companion to the BLAKE3-primitive throughput pages. This one zooms out from "how fast is the inner hash" to "what does an ASIC have to do to beat a commodity CPU on this PoW?".

Source: compare-b3pow-vs-sha256d.md
Spec: contrib/miner/b3miner-rtl/SPEC.md
Type: data-only

1. Side-by-side

Property | SHA-256d (Bitcoin) | B3PoW-Scratch v1.1.1 (B3Chain)
Inner round function | SHA-256 (Merkle–Damgård) | BLAKE3 (Bao tree)
State per attempt | 32 bytes | 1 048 576 bytes (1 MiB scratchpad)
Lanes | 1 | 8 lanes × 128 KiB; cross-lane diffusion via the LANE_SHUFFLE permutation L' = (5L + 1) mod 8 = [1, 6, 3, 0, 5, 2, 7, 4]
Inner rounds per nonce | 2 | 2 048 outer × 2 inner = 4 096
Memory bandwidth required | trivial (fits in L1) | ~512 B per-attempt working set (8 lanes × 64 B), but traffic is dominated by re-reads of the ~1 MiB pad
Time per attempt on CPU (single thread, median) | ~1.5 µs | ~700 µs
ASIC speed-up upper bound vs. 1-thread CPU | ~10^7× (mature market, custom ALUs) | bounded by on-die SRAM / HBM bandwidth; at a commercially typical 1–3 GB/s per chip, ≈ 10^3–10^3.5×, not 10^7×
Verifier wall-clock budget | none needed (verification takes microseconds) | 50 ms per header (consensus.b3pow_verify_budget_ms)
Verifier cache | not needed | per-prev_block_hash LRU (b3pow::Cache), 4 entries by default (≈ 4 MiB resident)
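The verifier-cache row can be illustrated with a generic LRU keyed by prev_block_hash. This is a minimal sketch, assuming the cached value is a roughly 1 MiB artifact that depends only on the parent block (so 4 entries come to about 4 MiB resident, matching the table). What b3pow::Cache actually stores, and how it evicts, is defined in SPEC.md and not on this page; PrevHashLRU, get_or_build and the build callback are hypothetical names.

```python
from collections import OrderedDict
from typing import Callable

MIB = 1024 * 1024


class PrevHashLRU:
    """Hypothetical LRU keyed by prev_block_hash (not the real b3pow::Cache)."""

    def __init__(self, capacity: int = 4) -> None:
        self.capacity = capacity
        self._entries: "OrderedDict[bytes, bytes]" = OrderedDict()
        self.builds = 0  # how many times the expensive build step ran

    def get_or_build(self, prev_block_hash: bytes,
                     build: Callable[[bytes], bytes]) -> bytes:
        if prev_block_hash in self._entries:
            # Hit: mark this parent as most recently used.
            self._entries.move_to_end(prev_block_hash)
            return self._entries[prev_block_hash]
        # Miss: pay the build cost once, then keep the result.
        value = build(prev_block_hash)
        self.builds += 1
        self._entries[prev_block_hash] = value
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return value

    def resident_bytes(self) -> int:
        return sum(len(v) for v in self._entries.values())
```

Headers that share a parent (for example a burst of competing tips) then pay the build cost once; a fifth distinct parent evicts the least recently used entry, so residency stays bounded at capacity × entry size.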

2. Why this matters

The headline number is the ASIC speed-up upper bound. A SHA-256d ASIC achieves enormous gains because the algorithm's entire working state fits in a 32-byte register file: each round can be built as combinational logic, deeply pipelined at 1–3 GHz, and replicated as O(10^4) parallel pipelines on one die.
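For contrast, the whole SHA-256d attempt loop is small enough to show. This sketch uses Python's hashlib and Bitcoin's 80-byte header layout (a 76-byte prefix followed by a 4-byte little-endian nonce, with the digest compared to the target as a little-endian integer); search_nonce and the easy target in the usage note are illustrative, not mining code.

```python
import hashlib
import struct
from typing import Optional, Tuple


def sha256d(data: bytes) -> bytes:
    """SHA-256 applied twice, as used for Bitcoin block-header hashes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def search_nonce(header76: bytes, target: int,
                 max_nonce: int = 1 << 32) -> Optional[Tuple[int, bytes]]:
    """Try 32-bit nonces appended to a 76-byte header prefix.

    Per attempt the only live data is the 80-byte header and a 32-byte
    digest; there is no large memory region to fill or read.
    """
    if len(header76) != 76:
        raise ValueError("expected the 76 header bytes that precede the nonce")
    for nonce in range(max_nonce):
        header = header76 + struct.pack("<I", nonce)  # nonce is little-endian
        # The digest is compared to the target as a little-endian integer.
        if int.from_bytes(sha256d(header), "little") <= target:
            return nonce, header
    return None
```

With an easy target such as 1 << 248 (about one success per 256 attempts) the loop returns almost immediately. On real targets the same loop just runs for vastly more iterations, and that tiny, memory-free loop body is exactly what O(10^4) replicated pipelines accelerate.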

A B3PoW-Scratch ASIC cannot do that: every nonce must read megabytes of memory in a data-dependent pattern, so die area that would otherwise go to more pipelines has to be spent on SRAM instead. The ASIC advantage is therefore capped near the memory-bandwidth-per-watt frontier, where commodity GPUs and CPUs already operate.
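The memory-hardness argument can be made concrete with a toy scratchpad loop. This is a minimal sketch, not the algorithm in SPEC.md: only the lane geometry (8 × 128 KiB), the 2 048 × 2 round counts and the LANE_SHUFFLE formula come from the table. The 64-byte block size, the fill-then-read structure, building the pad once per seed, and the indexing rule are assumptions chosen for illustration, and BLAKE2b (hashlib.blake2b) stands in for BLAKE3 because the Python standard library has no BLAKE3. build_pad, toy_scratch_pow and h are hypothetical names.

```python
import hashlib

# Geometry and round counts from the comparison table; everything else
# (block size, fill/mix structure, indexing) is a hypothetical toy choice.
LANES = 8
LANE_BYTES = 128 * 1024                 # 128 KiB per lane -> 1 MiB pad
BLOCK = 64                              # assumed bytes per pad block
BLOCKS_PER_LANE = LANE_BYTES // BLOCK   # 2048 blocks per lane
OUTER_ROUNDS = 2048
INNER_ROUNDS = 2


def h(data: bytes) -> bytes:
    # Stand-in for BLAKE3: the Python stdlib has no BLAKE3, so BLAKE2b
    # with a 64-byte digest is used purely for illustration.
    return hashlib.blake2b(data, digest_size=BLOCK).digest()


def lane_shuffle(lane: int) -> int:
    # L' = (5L + 1) mod 8; a bijection on 0..7 because gcd(5, 8) = 1.
    return (5 * lane + 1) % LANES


def build_pad(seed: bytes) -> bytes:
    # Fill phase: each lane is a sequential hash chain, so block i of a
    # lane cannot be produced without first producing block i - 1.
    pad = bytearray(LANES * LANE_BYTES)
    for lane in range(LANES):
        x = h(seed + bytes([lane]))
        base = lane * LANE_BYTES
        for i in range(BLOCKS_PER_LANE):
            pad[base + i * BLOCK : base + (i + 1) * BLOCK] = x
            x = h(x)
    return bytes(pad)


def toy_scratch_pow(pad: bytes, header: bytes, nonce: int) -> bytes:
    # Mix phase: each read address is derived from the running state, so
    # the next address is unknown until the previous hash completes.
    state = h(header + nonce.to_bytes(8, "little"))
    lane = 0
    for _ in range(OUTER_ROUNDS):
        for _ in range(INNER_ROUNDS):
            idx = int.from_bytes(state[:4], "little") % BLOCKS_PER_LANE
            off = lane * LANE_BYTES + idx * BLOCK
            state = h(state + pad[off : off + BLOCK])
        lane = lane_shuffle(lane)  # starting from 0, cycles through all 8 lanes
    return state[:32]
```

Under these assumptions a miner can build the pad once per seed and reuse it across nonces, but every attempt still needs the full 1 MiB resident with fast random access, because any of the 16 384 blocks may be read next. That storage, not the hash core, is what an ASIC has to pay for.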

3. Honesty caveats

  • The "≈ 1 000–3 000×" ceiling (10^3–10^3.5×) is an upper-bound estimate drawn from analogous memory-bound PoW deployments (RandomX, Equihash 144/5). B3PoW-Scratch ASICs have not been deployed at scale at the time of writing, so this number will be refined as data appears.
  • Memory-hardness is a defense-in-depth layer on top of the identity-hash isolation audited at H-1. It does not make a weak PoW algorithm strong; it makes a strong PoW algorithm harder to implement efficiently in an ASIC.
  • Wall-clock budget enforcement (b3pow_verify_budget_ms) and the HEADERS verification cap (MAX_B3POW_VERIFY_PER_BATCH) are audited separately at H-1.1 and H-1.3; this page only covers the algorithmic comparison.
  • The primitive throughput cards (pow-throughput, block-validation) still measure the BLAKE3 round-function in isolation. They are useful because BLAKE3 is the inner function of B3PoW-Scratch, but they are not the chain-level number.

4. Source files