b3chain — B3PoW-Scratch v1.1.1

What B3PoW-Scratch is, in one paragraph

A 256-bit proof-of-work function over an 80-byte Bitcoin-style block header. The hash is computed by initialising a 1 MB scratchpad deterministically from the parent block hash (a one-time cost amortised across many nonces), then running 2 048 sequential read–modify–write iterations across 8 parallel lanes (each lane owns 128 KB of the scratchpad), and finally hashing the per-lane state plus the nonce into a 32-byte output. The mixing primitive is a 2-round reduced BLAKE3 compress; the final compression is the standard 7-round BLAKE3. See SPEC.md §5–§7 for the byte-level definition.
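The shape of the computation described above can be sketched in a few lines of Python. This is a structural illustration only and is not consensus-compatible: BLAKE2s stands in for both the 2-round reduced BLAKE3 compress and the final 7-round BLAKE3 compression, SHAKE-256 stands in for the scratchpad initialiser, and the 32-byte access width, the lane seeding, the addressing rule and the Bitcoin field offsets are all assumptions. The byte-level definition lives in SPEC.md §5–§7.

```python
import hashlib

SCRATCH_BYTES = 1 << 20                # "1 MB" read as 2**20 bytes (8 x 128 KB)
LANES = 8                              # parallel lanes, from the text
LANE_BYTES = SCRATCH_BYTES // LANES    # 128 KB owned by each lane
ITERATIONS = 2048                      # sequential read-modify-write steps
WORD = 32                              # ASSUMPTION: bytes touched per access


def _mix(data: bytes) -> bytes:
    # STAND-IN for the 2-round reduced BLAKE3 compress. BLAKE2s is used only
    # because it ships with the standard library.
    return hashlib.blake2s(data, digest_size=32).digest()


def init_scratchpad(parent_hash: bytes) -> bytes:
    # Deterministic fill derived from the parent block hash; computed once
    # per parent and reused for every nonce. ASSUMPTION: SHAKE-256 expander.
    return hashlib.shake_256(parent_hash).digest(SCRATCH_BYTES)


def pow_hash(header: bytes, base_pad: bytes) -> bytes:
    if len(header) != 80:
        raise ValueError("header must be 80 bytes")
    pad = bytearray(base_pad)          # private copy, so base_pad stays reusable
    nonce = header[76:80]              # ASSUMPTION: Bitcoin header field layout
    # ASSUMPTION: each lane's state is seeded from the header and lane index.
    states = [_mix(header + bytes([lane])) for lane in range(LANES)]
    slots = LANE_BYTES // WORD
    for _ in range(ITERATIONS):
        for lane in range(LANES):
            state = states[lane]
            # Data-dependent address, confined to this lane's 128 KB region.
            slot = int.from_bytes(state[:4], "little") % slots
            off = lane * LANE_BYTES + slot * WORD
            word = bytes(pad[off:off + WORD])                      # read
            state = _mix(state + word)                             # modify
            mixed = int.from_bytes(word, "little") ^ int.from_bytes(state, "little")
            pad[off:off + WORD] = mixed.to_bytes(WORD, "little")   # write
            states[lane] = state
    # STAND-IN for the final standard BLAKE3 compression over lane state + nonce.
    return hashlib.blake2s(b"".join(states) + nonce, digest_size=32).digest()
```

A caller would build the scratchpad once with `init_scratchpad(header[4:36])` and then call `pow_hash` for each candidate nonce. The sketch copies the scratchpad per nonce to keep the shared copy pristine; how the real design handles that is defined by the spec, not by this example.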

What it is — and is not

Is: a memory-hard, data-dependent PoW with a 1 MB on-chip working set, designed so the most economical implementation is a single-chip FPGA with on-chip block RAM (B3Miner-1 reference card, ≈ 10 W, < $300 BoM target). Reference implementations in Python, C++, TypeScript and SystemVerilog all ship in-tree and are byte-equivalent.

Is not: "ASIC-proof", "perfectly decentralized", or permanently resistant to specialized hardware. No PoW is. The design goal is to shift the economic frontier toward small, low-power FPGA cards and away from the SHA-256d ASIC monoculture. Custom B3PoW ASICs are possible; the economic-rationality analysis is published openly.

Reference implementations (all four must stay byte-equivalent)

Python (canonical): contrib/miner/b3miner-rtl/ref/b3pow_ref.py
C++ (consensus): src/crypto/b3pow_scratch.cpp
TypeScript (pool, browser): contrib/testnet/pool/src/lib/b3pow-scratch.ts
SystemVerilog (FPGA): contrib/miner/b3miner-rtl/src/
Vectors (CI gate): src/test/data/b3pow_consensus_vectors.json
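A byte-equivalence gate of this kind can be sketched as a loop that feeds every vector to every implementation and reports any digest that differs. The JSON schema used here (a top-level list of objects with `header_hex` and `pow_hex` fields) and the function name `check_vectors` are hypothetical; the real layout of b3pow_consensus_vectors.json may differ.

```python
import json
from typing import Callable, Dict, List

PowFn = Callable[[bytes], bytes]   # 80-byte header in, 32-byte digest out


def check_vectors(vectors_json: str, impls: Dict[str, PowFn]) -> List[str]:
    """Return one message per (vector, implementation) mismatch.

    ASSUMPTION: vectors are a JSON list of {"header_hex": ..., "pow_hex": ...}.
    """
    failures = []
    for index, vec in enumerate(json.loads(vectors_json)):
        header = bytes.fromhex(vec["header_hex"])
        expected = bytes.fromhex(vec["pow_hex"])
        for name, pow_fn in impls.items():
            got = pow_fn(header)
            if got != expected:
                failures.append(
                    f"vector {index}: {name} gave {got.hex()}, "
                    f"expected {expected.hex()}"
                )
    return failures
```

A CI job would fail the build whenever the returned list is non-empty. Comparing raw bytes rather than parsed integers keeps endianness mistakes visible.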

Throughput by backend

Preliminary. The numbers below are the first authoritative bench run (results/r0/). The CPU rows are real measurements on the bench machine documented in HARDWARE.md. The FPGA row is a synthetic placeholder matching the algebraic estimate in FPGA-FEASIBILITY.md; it will be replaced by a real B3Miner-1 telemetry row at bring-up. See methodology.md for what was held constant and how to reproduce.

Full-PoW throughput per backend (log scale). Python reference is single-digit H/s/core by design; C++ consensus impl is ~30× faster; the FPGA target (~8 MH/s estimate) is ~10⁶× faster than the Python reference.

Verifier latency (single-block)

The most operationally important number: how long a full node spends checking the PoW of one inbound block. Spec target (SPEC §8.E): p95 < 50 ms on a modern x86_64 core, so PoW verification stays well under 1 % of the 10-minute block interval (50 ms is about 0.008 % of 600 s).

Verifier latency at p50 / p95 / p99 across a deterministic 10 000-header corpus (seed 0xB3110002). The Python reference row shown here is the correctness floor, not the C++ verifier; the C++ row is populated by the first bench-b3pow-cpp run.
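The measurement described above can be approximated as follows: build a reproducible corpus from the seed, time one verification per header, and report nearest-rank percentiles. This is a minimal sketch, not the bench suite's code. The corpus generator (Python's `random.Random`), the nearest-rank percentile rule and the names `make_corpus`, `percentile` and `bench` are assumptions; methodology.md is the authority on what the real bench holds constant.

```python
import math
import random
import time
from typing import Callable, Dict, List

CORPUS_SEED = 0xB3110002   # seed quoted in the text
HEADER_BYTES = 80


def make_corpus(n: int, seed: int = CORPUS_SEED) -> List[bytes]:
    # ASSUMPTION: headers are plain pseudo-random bytes drawn from the seed.
    rng = random.Random(seed)
    return [
        rng.getrandbits(8 * HEADER_BYTES).to_bytes(HEADER_BYTES, "little")
        for _ in range(n)
    ]


def percentile(samples: List[float], p: int) -> float:
    # Nearest-rank percentile: smallest sample with at least p % at or below it.
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def bench(verify: Callable[[bytes], object], corpus: List[bytes]) -> Dict[str, float]:
    latencies_ms = []
    for header in corpus:
        start = time.perf_counter()
        verify(header)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}
```

Calling `bench(verify_fn, make_corpus(10_000))` returns a dictionary with `p50`, `p95` and `p99` in milliseconds; the spec target corresponds to `result["p95"] < 50`.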

Raw results & reproducibility

Every CSV row that produced these charts is committed in-tree:

results/r0/ — canonical first run (CPU + verify + FPGA-dry)
methodology.md — what we measure, what we hold constant, honesty notes
contrib/testing/bench/ — full bench suite (cpu, cpp, fpga, verify) + chart generator

The dry-run FPGA row is intentionally distinguished so charts can colour it differently; live FPGA numbers populate when the B3Miner-1 card reaches bench bring-up. We commit empty (header-only) CSVs for benches without measured rows yet rather than make up numbers.
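One way to keep dry-run rows distinguishable, and to commit a header-only file when nothing has been measured, is sketched below. The column names (`backend`, `hashes_per_s`, `dry_run`) and the `0`/`1` flag encoding are hypothetical; the committed CSVs in results/r0/ define the real schema.

```python
import csv
import io
from typing import Dict, List, Tuple

FIELDS = ["backend", "hashes_per_s", "dry_run"]   # hypothetical column names


def render_csv(rows: List[Dict[str, object]]) -> str:
    # Always emit the header, so a bench with no measurements still
    # produces a valid (header-only) file instead of invented rows.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()


def split_rows(text: str) -> Tuple[List[dict], List[dict]]:
    # Separate measured rows from dry-run rows so a chart generator can
    # colour the two groups differently.
    measured, dry = [], []
    for row in csv.DictReader(io.StringIO(text)):
        (dry if row["dry_run"] == "1" else measured).append(row)
    return measured, dry
```

With an empty row list, `render_csv([])` yields just the header line, and `split_rows` on that text returns two empty lists, so downstream chart code needs no special case for benches that have not run yet.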

Bench corner

Run: r0 (first authoritative)
Corpus seed: 0xB3110002
Verifier target: p95 < 50 ms (C++ impl)
FPGA estimate: ~8 MH/s @ 200 MHz, 8 lanes

Numbers refresh whenever .github/workflows/benchmark.yml completes a successful run on b3chain-main.

Honesty notes

We publish slow Python rows on purpose; they are the correctness floor, not a competitive miner.
Dry-run FPGA rows are clearly marked; charts colour them differently and the methodology documents exactly when they apply.
J/hash columns are NaN unless an out-of-band wall-power reading is wired in.
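The J/hash rule in the last note reduces to dividing wall power by hash rate and refusing to guess when no power reading exists. A minimal sketch, assuming a hypothetical helper named `joules_per_hash` that receives the wall-power reading as an optional argument:

```python
import math
from typing import Optional


def joules_per_hash(hashes_per_s: float, wall_watts: Optional[float] = None) -> float:
    # Energy per hash = power (J/s) / throughput (H/s).
    # Returns NaN when no out-of-band wall-power reading is available,
    # or when the throughput is not a positive number.
    if wall_watts is None or not hashes_per_s > 0:
        return math.nan
    return wall_watts / hashes_per_s
```

Plugging in the ≈ 10 W card target and the ~8 MH/s estimate quoted on this page would give about 1.25 µJ/hash. That figure only illustrates the arithmetic on two estimates; it is not a measurement.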