attest

"DONE" is a claim, not proof. Grade the act, not the output.
A local, deterministic, zero-LLM Claude Code hook that verifies a subagent's
Status: DONE / ## Handoff claim against the real git working-tree delta —
and, opt-in, blocks a DONE whose claimed files never actually landed on disk.
It adds no tokens, cannot itself hallucinate, and fails open on every doubt.
The insight
Every eval and observability tool grades the output or asks the model "are you done?" —
self-report, the one signal you cannot take on trust. The git tree is the only ground
truth. So Attest verifies the act, not the output: it diffs the working tree before and
after the subagent runs and checks whether the files a DONE claims it changed actually
changed. Deterministic, read-only, and — because it never calls a model — it cannot
fabricate its own verdict.
The real target is not a lying agent (well-trained agents resist that). It is the
silent write-failure: a Write tool call that returns success but never lands on disk,
behind a confident Status: DONE.
Evidence
Attest was built proof-first and validated against real Claude Code v2.1.170 — not mocks.
- 325 tests: 304 Python (unittest) + 21 BATS (16 in tests/hooks.bats, 5 in
tests/install.bats). The Python suite runs green (Ran 304 tests … OK).
- Real captured payloads ship in the repo. Four sanitized SubagentStart/SubagentStop
fixtures plus a transcript sample live in fixtures/ and are pinned byte-for-byte
by tests/test_real_fixtures.py, including the load-bearing safety case: an honest subagent
that created nothing and explained why in prose (mentioning a path, a files_changed: line, and
even the word DONE) from which the conservative parser correctly extracts zero claimed files.
- A live empirical battery on real Claude Code. End-to-end claude -p dispatches confirmed
the boundary cases: an honest agent that changed nothing was correctly not blocked; a
multi-line Status: DONE with a real file was allowed and parsed; and a genuine false DONE
was blocked, after which the blocked subagent self-corrected.
Honesty about that last result: the self-correcting block is non-deterministic —
well-trained agents resist fabricating claims, so the live "lie → block → fix" path can't be
relied on to reproduce. The deterministic proof of blocking is the mechanism test
(below) plus the unit suites (tests/test_hook.py, tests/test_enforce.py). Enforcement is
off by default.
See docs/VALIDATION.md for the full evidence dossier, and
scripts/live-capture-test.sh to re-run the capture
harness yourself against your own Claude Code install.
How it works
Two hooks, three pure layers, one source of truth (git):
- SubagentStart snapshots the git working tree as {path: sha256} for every file that
differs from HEAD (modified / added / untracked / deleted).
- SubagentStop recomputes the delta, parses the subagent's final claim
(## Handoff block first, then an anchored Status: / Files changed: fallback, never
scraped from prose), and evaluates whether it is a proven false DONE: status DONE,
a claim actually present, and a claimed file that is absent from the delta and not
present on disk.
- In enforce mode only, a proven false DONE is blocked: Claude Code feeds the reason
back and the same subagent is forced to continue and fix it.
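
The snapshot, delta, and false-DONE check described above can be sketched roughly as follows. This is an illustration, not Attest's actual source: the function names, the DELETED sentinel, and the choice of `git diff --name-only HEAD` plus `git ls-files --others` to enumerate dirty paths are all assumptions made for the example, and the sketch assumes the repository has at least one commit.

```python
import hashlib
import subprocess
from pathlib import Path

DELETED = "<deleted>"  # hypothetical sentinel for a path that no longer exists on disk


def _git_paths(root, *args):
    """Run a git command that prints NUL-separated paths and return them as strings."""
    out = subprocess.run(
        ["git", "-C", str(root), *args], check=True, capture_output=True
    ).stdout
    return [p.decode("utf-8", "surrogateescape") for p in out.split(b"\0") if p]


def snapshot(root):
    """Map every path that differs from HEAD to a sha256 of its current content."""
    root = Path(root)
    # Tracked paths that are modified, added, or deleted relative to HEAD.
    paths = set(_git_paths(root, "diff", "--name-only", "-z", "HEAD"))
    # Untracked paths that are not ignored.
    paths |= set(_git_paths(root, "ls-files", "--others", "--exclude-standard", "-z"))
    snap = {}
    for rel in sorted(paths):
        f = root / rel
        snap[rel] = hashlib.sha256(f.read_bytes()).hexdigest() if f.is_file() else DELETED
    return snap


def delta(before, after):
    """Paths whose snapshot entry changed between the start and stop hooks."""
    return {p for p in before.keys() | after.keys() if before.get(p) != after.get(p)}


def is_proven_false_done(status, claimed, changed, root):
    """True only when all three conditions from the text hold at once."""
    if status != "DONE" or not claimed:
        return False  # no DONE or no claim: never a false DONE
    root = Path(root)
    # A claimed file must be both absent from the delta and missing on disk.
    return any(p not in changed and not (root / p).exists() for p in claimed)
```

In this sketch a start hook would store `snapshot(root)`, and a stop hook would call `delta(stored, snapshot(root))` and pass the result to `is_proven_false_done`. A claimed file that exists on disk but did not change is deliberately not flagged, which mirrors the "absent from the delta and not present on disk" wording above.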
The claim parser is conservative by construction: a missing or prose-only claim yields
status=None and is never treated as a false DONE. A path mentioned in prose never
becomes a claimed file.
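
A conservative parser of this kind might look like the sketch below. The exact layout of a ## Handoff block is not specified here, so the example assumes a hypothetical one in which `Status:` and `Files changed:` each start their own line and files are comma-separated; `Claim` and `parse_claim` are illustrative names, not Attest's API.

```python
import re
from typing import NamedTuple, Optional, Tuple


class Claim(NamedTuple):
    status: Optional[str]
    files: Tuple[str, ...]


# Anchored patterns: the key must start a line, so prose mentions never match.
_STATUS = re.compile(r"^Status:[ \t]*([A-Z_]+)[ \t]*$", re.MULTILINE)
_FILES = re.compile(r"^Files changed:[ \t]*(.+)$", re.MULTILINE)
# Body of a "## Handoff" section, up to the next "## " heading or end of text.
_HANDOFF = re.compile(r"^## Handoff[ \t]*\n(.*?)(?=^## |\Z)", re.MULTILINE | re.DOTALL)


def parse_claim(text):
    """Prefer the Handoff block; otherwise fall back to anchored lines in the whole text."""
    block = _HANDOFF.search(text)
    scope = block.group(1) if block else text
    status = _STATUS.search(scope)
    if status is None:
        return Claim(None, ())  # missing or prose-only claim: nothing to verify
    files = _FILES.search(scope)
    paths = ()
    if files:
        paths = tuple(p.strip() for p in files.group(1).split(",") if p.strip())
    return Claim(status.group(1), paths)
```

Because both patterns are anchored to the start of a line, a sentence such as "I did not create src/a.py, so I am not DONE" produces `Claim(status=None, files=())` in this sketch, while a line reading exactly `Status: DONE` followed by `Files changed: a.py, b.py` yields both files.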
What the report looks like
In detect mode (the default), the stop hook prints to stdout after every subagent completes:
attest: stop: <key>: CLAIMED [a.py] OBSERVED [a.py] -> OK [source=payload]
attest: stop: <key>: CLAIMED [a.py, b.py] OBSERVED [a.py] -> MISMATCH: b.py claimed-but-unchanged (would block in enforce mode) [source=payload]
attest: stop: <key>: CLAIMED [a.py] OBSERVED [a.py, c.py] -> SCOPE_CREEP: c.py observed-but-unclaimed [source=payload]
attest: stop: <key>: claim source=none — cannot verify (never treating as false DONE)
<key> is the agent identifier; source=payload or source=transcript shows where the
claim was read from. In enforce mode (ATTEST_ENFORCE=1) the human-readable lines move to
stderr and the (would block in enforce mode) cases become real blocks.
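
If you want to consume these lines from a script (for example, to count mismatches in CI), a small parser along the following lines may be enough. It is a sketch based only on the four sample lines shown above; the function name, the returned dictionary shape, and the "UNVERIFIABLE" label for the source=none case are assumptions of this example, and the real output may contain variants not covered here.

```python
import re

# Matches the CLAIMED/OBSERVED form; the detail after the verdict is optional.
_VERDICT = re.compile(
    r"^attest: stop: (?P<key>.+?): CLAIMED \[(?P<claimed>[^\]]*)\] "
    r"OBSERVED \[(?P<observed>[^\]]*)\] -> (?P<verdict>[A-Z_]+)"
    r"(?:: (?P<detail>.*?))? \[source=(?P<source>\w+)\]$"
)
# Matches the "cannot verify" form, where no claim was found.
_NO_CLAIM = re.compile(r"^attest: stop: (?P<key>.+?): claim source=none\b")


def _split(items):
    """Turn 'a.py, b.py' into ['a.py', 'b.py']; an empty string gives []."""
    return [p.strip() for p in items.split(",") if p.strip()]


def parse_report_line(line):
    """Return a dict describing one report line, or None if the line is unrecognized."""
    line = line.strip()
    m = _VERDICT.match(line)
    if m:
        return {
            "key": m.group("key"),
            "claimed": _split(m.group("claimed")),
            "observed": _split(m.group("observed")),
            "verdict": m.group("verdict"),
            "detail": m.group("detail") or "",
            "source": m.group("source"),
        }
    m = _NO_CLAIM.match(line)
    if m:
        return {
            "key": m.group("key"),
            "claimed": [],
            "observed": [],
            "verdict": "UNVERIFIABLE",  # hypothetical label, not printed by the hook
            "detail": "",
            "source": "none",
        }
    return None
```

Returning None for anything unrecognized keeps the consumer as cautious as the hook itself: an unknown line is ignored rather than guessed at.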
Install