B3PoW-Scratch v1.1.1

The proof-of-work algorithm that runs B3Chain. Memory-hard, BLAKE3-based, designed to be FPGA-economical and GPU-hostile — not ASIC-proof.

Scratchpad: 1 MB on-chip · Lanes: 8 × 128 KB · Iterations: 2048 per hash · Primitive: BLAKE3 (2-round + full)

Spec: contrib/miner/b3miner-rtl/SPEC.md · Whitepaper: doc/whitepaper/B3POW-SCRATCH-WHITEPAPER.md

Version & activation. The current consensus version is B3PoW-Scratch v1.1.1 (SPEC_VERSION = 0x00010101; FPGA REG_ID magic 0xB3110002). The F-1 fix made ITER_MUL[7] distinct from ITER_MUL[1], closing a lane-degeneracy weakness flagged in B3POW-51-ATTACK-ANALYSIS.md. v1.1.1 has been live from genesis on every b3chain network (mainnet, testnet, testnet4, regtest), so there is no pre-F-1 chain history to migrate.
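As a small illustration of how the version constant lines up with the version string, the sketch below decodes SPEC_VERSION under the assumption that the low three bytes hold major, minor and patch. That packing is consistent with 0x00010101 and v1.1.1 but is not stated here, so treat `decode_spec_version` as a hypothetical helper and check SPEC.md for the authoritative layout.

```python
SPEC_VERSION = 0x00010101  # value quoted in the text above


def decode_spec_version(value: int) -> str:
    """Render a packed version word as 'vMAJOR.MINOR.PATCH'.

    Assumption (not confirmed by this README): one byte per component,
    with the top byte unused.
    """
    major = (value >> 16) & 0xFF
    minor = (value >> 8) & 0xFF
    patch = value & 0xFF
    return f"v{major}.{minor}.{patch}"


print(decode_spec_version(SPEC_VERSION))  # v1.1.1
```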

What B3PoW-Scratch is, in one paragraph

A 256-bit proof-of-work function over an 80-byte Bitcoin-style block header. The hash is computed by initialising a 1 MB scratchpad deterministically from the parent block hash (a one-time cost amortised across many nonces), then running 2 048 sequential read–modify–write iterations across 8 parallel lanes (each lane owns 128 KB of the scratchpad), and finally hashing the per-lane state plus the nonce into a 32-byte output. The mixing primitive is a 2-round reduced BLAKE3 compress; the final compression is the standard 7-round BLAKE3. See SPEC.md §5–§7 for the byte-level definition.
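The three phases described above (scratchpad initialisation, lane iterations, finalisation) can be sketched structurally as follows. This is a toy, not the consensus function: it uses `hashlib.blake2s` as a stand-in for both the 2-round and the full BLAKE3 compress, and the 32-byte word size, the address derivation, the XOR write-back and the per-nonce scratchpad copy are all illustrative assumptions. Only the sizes (1 MB, 8 lanes of 128 KB, 2048 iterations, 80-byte header, 32-byte output) come from the text; SPEC.md §5–§7 remains the byte-level definition.

```python
import hashlib

SCRATCHPAD_BYTES = 1 << 20                 # 1 MB working set (from the text)
LANES = 8                                  # 8 lanes (from the text)
LANE_BYTES = SCRATCHPAD_BYTES // LANES     # 128 KB per lane
ITERATIONS = 2048                          # iterations per hash (from the text)
WORD = 32                                  # hypothetical read/write granularity


def _mix(data: bytes) -> bytes:
    # Stand-in primitive: BLAKE2s, NOT the reduced-round BLAKE3 compress.
    return hashlib.blake2s(data).digest()


def init_scratchpad(parent_hash: bytes) -> bytearray:
    """Fill the scratchpad deterministically from the parent block hash."""
    pad = bytearray(SCRATCHPAD_BYTES)
    for i in range(SCRATCHPAD_BYTES // WORD):
        pad[i * WORD:(i + 1) * WORD] = _mix(parent_hash + i.to_bytes(4, "little"))
    return pad


def toy_pow(header: bytes, scratchpad: bytes) -> bytes:
    """Toy hash of an 80-byte header against a pre-initialised scratchpad."""
    if len(header) != 80:
        raise ValueError("expected an 80-byte header")
    # Assumption: work on a private copy so the shared initial scratchpad
    # can be reused for the next nonce.
    pad = bytearray(scratchpad)
    states = [_mix(header + bytes([lane])) for lane in range(LANES)]
    for _ in range(ITERATIONS):
        for lane in range(LANES):
            # Data-dependent address, confined to this lane's 128 KB region.
            slot = int.from_bytes(states[lane][:4], "little") % (LANE_BYTES // WORD)
            off = lane * LANE_BYTES + slot * WORD
            word = bytes(pad[off:off + WORD])                        # read
            states[lane] = _mix(states[lane] + word)                 # modify
            pad[off:off + WORD] = bytes(a ^ b for a, b in zip(word, states[lane]))  # write
    nonce = header[76:80]  # nonce field of a Bitcoin-style header
    return _mix(b"".join(states) + nonce)
```

Calling `init_scratchpad` once per parent hash and `toy_pow` once per candidate header mirrors the amortisation the paragraph describes: the expensive fill is shared, and only the lane iterations are paid per nonce.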

What it is — and is not

Is: a memory-hard, data-dependent PoW with a 1 MB on-chip working set, designed so the most economical implementation is a single-chip FPGA with on-chip block RAM (B3Miner-1 reference card, ≈ 10 W, < $300 BoM target). Reference implementations in Python, C++, TypeScript and SystemVerilog all ship in-tree and are byte-equivalent.

Is not: "ASIC-proof", "perfectly decentralized", or permanently resistant to specialized hardware. No PoW is. The design goal is to shift the economic frontier toward small, low-power FPGA cards and away from the SHA-256d ASIC monoculture. Custom B3PoW ASICs are possible; the economic-rationality analysis is published openly.

Reference implementations (all four must stay byte-equivalent)

Throughput by backend

Preliminary. The numbers below are the first authoritative bench run (results/r0/). The CPU rows are real measurements on the bench machine documented in HARDWARE.md. The FPGA row is a synthetic placeholder matching the algebraic estimate in FPGA-FEASIBILITY.md; it will be replaced by a real B3Miner-1 telemetry row at bring-up. See methodology.md for what was held constant and how to reproduce.
[Figure: hashrate-by-backend]
Full-PoW throughput per backend (log scale). Python reference is single-digit H/s/core by design; C++ consensus impl is ~30× faster; FPGA target is ~106× faster again.

Verifier latency (single-block)

The most operationally important number is how long a full node spends checking the PoW of one inbound block. Spec target (SPEC §8.E): p95 < 50 ms on a modern x86_64 core, so PoW verification stays well under 1 % of the 10-minute block interval (50 ms is about 0.008 % of 600 s).

[Figure: verify-latency]
Verifier latency at p50 / p95 / p99 across a deterministic 10 000-header corpus (seed 0xB3110002). The Python reference row shown here is the correctness floor, not the C++ verifier; the C++ row populates on the first bench-b3pow-cpp run.
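For readers who want to sanity-check a verifier against the p95 target, the percentile summary can be approximated along these lines. This is a minimal sketch, not the in-tree bench: `time_verifier`, `latency_percentiles` and `budget_fraction` are hypothetical names, `verify` is any callable you supply, and methodology.md describes what the real harness holds constant.

```python
import statistics
import time

BLOCK_INTERVAL_S = 600.0   # 10-minute block interval (from the text)
P95_TARGET_S = 0.050       # spec target quoted above: p95 < 50 ms


def time_verifier(verify, headers):
    """Return one wall-clock latency sample (seconds) per header."""
    samples = []
    for header in headers:
        start = time.perf_counter()
        verify(header)
        samples.append(time.perf_counter() - start)
    return samples


def latency_percentiles(samples):
    """Summarise samples at p50 / p95 / p99."""
    cuts = statistics.quantiles(samples, n=100)  # 99 cut points: p1 .. p99
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


def budget_fraction(latency_s):
    """Share of one block interval consumed by a single verification."""
    return latency_s / BLOCK_INTERVAL_S


if __name__ == "__main__":
    # Dummy verifier so the sketch runs standalone; swap in a real one.
    samples = time_verifier(lambda h: sum(h), [bytes(80)] * 1000)
    summary = latency_percentiles(samples)
    print(summary, "meets target:", summary["p95"] < P95_TARGET_S)
```

At the target itself, `budget_fraction(P95_TARGET_S)` is 0.05 / 600, or roughly 0.008 % of a block interval.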

Raw results & reproducibility

Every CSV row that produced these charts is committed in-tree:

The dry-run FPGA row is intentionally distinguished so charts can colour it differently; live FPGA numbers populate when the B3Miner-1 card reaches bench bring-up. We commit empty (header-only) CSVs for benches without measured rows yet rather than make up numbers.