JouleProof · a Transaction Science pillar

The energy-optimized formal prover.

AI theorem-proving has one recipe — a model writes proofs in Lean, a verifier checks every step, and the loop trains on what compiles. The axis no one optimizes is the only one that makes verifying real code affordable: joules per verified proof. JouleProof reports the joules next to the pass rate, and sends each obligation to the cheapest tier that can close it.

T0–T3

Solver tiers

Grader (the verifier)

Correctness traded

same verified output · an order of magnitude less energy

The bet

Make verification cheap, not just capable.

The technique is not the gate. The recipe — a prover model, a formal language, a verifier as the grader, autoformalized data, and search — is published and increasingly open. What is gated is energy: the flywheel costs millions of tokens per hard proof, and "verify every line of generated code" is impossible at that price.

So energy efficiency is not an optimization on top of the market — it is the precondition for the market. JouleProof attacks the actual cost center and reports the result as a first-class number. Correctness is never on the table: the verifier is the grader, so a proof is machine-checked or it is not recorded. JouleProof only changes which tier pays for the proof.

The ladder

The cheapest sufficient solver wins.

Every proof obligation goes to the minimum viable tool and escalates only on failure. The joules spent on a cheaper tier that ran and failed before the winner are recorded as the disclosed cost of escalation — never hidden.

Automation

Lean tactics

Lean's built-in automation — simp, omega, ring, aesop, polyrith. Deterministic, CPU, effectively free. A large fraction of the subgoals in any real proof close here, and a frontier forward pass is never spent on what aesop discharges instantly.

SMT

decision procedures

Z3 / CVC5 for the decidable fragments they cover — orders of magnitude cheaper than model search where it applies.

Small model

distilled prover

A small, in-domain distilled prover (1–7B / few active params) for the steps that genuinely need learned search. Energy scales with active params × tokens, so this is far cheaper per proof than a frontier model.

Frontier model

last resort

A frontier reasoning model — the last resort, for high-level strategy and the genuinely hard leaves only. The most expensive rung, so the router sends as little here as it can.

The flywheel

The data flywheel becomes an energy flywheel.

Verified proofs are content-addressed and memoized — premise selection over the library plus an accumulating proof database. An identical goal is reused for zero new joules, and the more that has been proved, the cheaper every future proof gets. The cache is durable, because the asset that compounds is the corpus of verified proofs, not the routing trick.

Efficiency, on its own, diffuses. What lasts is the accumulated verified data and the integration around it — so the cache is built to persist.

The receipt

A verified proof that carries its own energy.

Each proof seals as a signed receipt: the machine-checked artifact, the tier that closed it, the joules it cost, and the energy a frontier-only run would have spent. A verified result that travels with a verifiable record of what it cost — the measurable claim, signed.

proved · closed at T0:automation · 50 uj
baseline (frontier-only): 50,000,000 uj
spec: Ungrounded   (proof attested, statement not validated against intent)
signature: verified ✓

Honesty

What's recorded, not glossed.

Three things are first-class in the receipt rather than smoothed over in the framing.

Proof ≠ spec

A machine can prove a formal goal perfectly while that goal fails to encode what anyone meant — a verified proof of the wrong specification is worthless. Every goal carries a spec-grounding (Ungrounded / HumanReviewed / CrossChecked), and the prover never upgrades it on its own. A receipt attests the proof; Ungrounded says, in the artifact itself, that the statement was not validated against intent.

The multiple shrinks with difficulty

Routing wins the easy cases, but the few goals that reach the frontier model dominate the joules — so the energy multiple is smaller on harder work. The receipt reports energy per closing tier, so that degradation is visible, never averaged away.

Energy is reported at all

Per-proof energy is something the field leaves out — it is inferable only from device-days. A signed, per-proof joule receipt, with honest provenance (Measured / ModelBased / Estimator), is itself the contribution.

JouleProof claims no secret technique. It claims the discipline the field skipped: attack the real cost center, and report it.

The standard

An open standard underneath.

The vendor-neutral core is published as proof — a Transaction Science open standard, "a carrier for verified work." It fixes the content-addressed proof obligation, the cheapest-sufficient cascade with disclosed escalation, the honest energy-provenance model, and the signed receipt — Apache-2.0 for the reference code, CC-BY-4.0 for the specification. The protocol is ownerless: any prover can emit a proof receipt, and any auditor can verify it with the family's shared JCR-1 tooling.

JouleProof is the steward and the services layer over that standard — not its proprietor. What endures is not the format, which diffuses by design; it is the accumulating corpus of verified proofs and the integration around it.

proof/1 receipt profileJCR-1 interoperableApache-2.0 / CC-BY-4.0C1–C5 conformance

Read the proof standard

Reference

The router, the cache, the receipt — real and tested.

The orchestration is the wedge: routing trivial goals away from the model is the immediate, large energy cut, and the harness measures it. The bottom two rungs are real, with energy measured from execution — an in-process propositional decider at T0, and a resident Z3 process at T1 (assert the negation; unsat means proved). The model rungs sit behind the same trait until their backends are wired. Measuring rather than asserting is the point: it was the measurement that exposed Z3's per-spawn startup cost and argued for keeping the solver resident.

jouleproof-core

Content-addressed goals, the tier ladder, the honest joule (Measured / ModelBased / Estimator), the verdict, and the Solver trait every backend implements.

jouleproof-router

The escalation engine, the content-addressed proof cache (durable), the signed proof receipt, and batch energy accounting against a frontier-only baseline.

jouleproof-solvers

Two real rungs with measured energy — an in-process propositional decider (T0) and a resident Z3 process (T1) — over model-tier stand-ins, all behind one trait. Each refuses rather than fakes, so the router escalates.

jouleproof-cli

A headless run of a proof workload reporting joules-per-proof, the tier each goal closed at, and the energy a frontier-only run would have spent.