attest

"DONE" is a claim, not proof. Grade the act, not the output.

A local, deterministic, zero-LLM Claude Code hook that verifies a subagent's Status: DONE / ## Handoff claim against the real git working-tree delta — and, opt-in, blocks a DONE whose claimed files never actually landed on disk. It adds no tokens, cannot itself hallucinate, and fails open on every doubt.

The insight

Every eval and observability tool grades the output or asks the model "are you done?" — self-report, the one signal you cannot take on trust. The git tree is the only ground truth. So Attest verifies the act, not the output: it diffs the working tree before and after the subagent runs and checks whether the files a DONE claims it changed actually changed. Deterministic, read-only, and — because it never calls a model — it cannot fabricate its own verdict.

The real target is not a lying agent (well-trained agents resist that). It is the silent write-failure: a Write tool call that returns success but never lands on disk, behind a confident Status: DONE.

Evidence

Attest was built proof-first and validated against real Claude Code v2.1.170 — not mocks.

325 tests — 304 Python (unittest) + 21 BATS (16 in tests/hooks.bats, 5 in tests/install.bats). The Python suite runs green (Ran 304 tests … OK).
Real captured payloads ship in the repo. Four sanitized SubagentStart/SubagentStop fixtures plus a transcript sample live in fixtures/ and are pinned byte-for-byte by tests/test_real_fixtures.py — including the load-bearing safety case: an honest subagent that created nothing and explained why in prose (mentioning a path, a files_changed: line, and even the word DONE) from which the conservative parser correctly extracts zero claimed files.
A live empirical battery on real Claude Code. End-to-end claude -p dispatches confirmed the boundary cases: an honest agent that changed nothing was correctly not blocked; a multi-line Status: DONE with a real file was allowed and parsed; and a genuine false DONE was blocked — and the blocked subagent self-corrected.

Honesty about that last result: the self-correcting block is non-deterministic — well-trained agents resist fabricating claims, so the live "lie → block → fix" path can't be relied on to reproduce. The deterministic proof of blocking is the mechanism test (below) plus the unit suites (tests/test_hook.py, tests/test_enforce.py). Enforcement is off by default.

See docs/VALIDATION.md for the full evidence dossier, and scripts/live-capture-test.sh to re-run the capture harness yourself against your own Claude Code install.

How it works

Two hooks, three pure layers, one source of truth (git):

SubagentStart snapshots the git working tree — {path: sha256} for every file that differs from HEAD (modified / added / untracked / deleted).
SubagentStop recomputes the delta, parses the subagent's final claim (## Handoff block first, then an anchored Status: / Files changed: fallback — never scraped from prose), and evaluates whether it is a proven false DONE: status DONE, a claim actually present, and a claimed file that is absent from the delta and not present on disk.
In enforce mode only, a proven false DONE is blocked — Claude Code feeds the reason back and the same subagent is forced to continue and fix it.

The claim parser is conservative by construction: a missing or prose-only claim yields status=None and is never treated as a false DONE. A path mentioned in prose never becomes a claimed file.

What the report looks like

In detect mode (the default), the stop hook prints to stdout after every subagent completes:

attest: stop: <key>: CLAIMED [a.py] OBSERVED [a.py] -> OK [source=payload]
attest: stop: <key>: CLAIMED [a.py, b.py] OBSERVED [a.py] -> MISMATCH: b.py claimed-but-unchanged (would block in enforce mode) [source=payload]
attest: stop: <key>: CLAIMED [a.py] OBSERVED [a.py, c.py] -> SCOPE_CREEP: c.py observed-but-unclaimed [source=payload]
attest: stop: <key>: claim source=none — cannot verify (never treating as false DONE)

<key> is the agent identifier; source=payload or source=transcript shows where the claim was read from. In enforce mode (ATTEST_ENFORCE=1) the human-readable lines move to stderr and the (would block in enforce mode) cases become real blocks.

Attest — completion-attestation gate

Popularity

What's Inside

README

attest

The insight

Evidence

How it works

What the report looks like

Install

Confidence

Similar Plugins

claude-mem

caveman

llm-council-plugin

self-improving-agent

antigravity-bundle-web-designer

ecc

More by ek33450505

CAST — Claude Agent Specialist Team

misfire — trace-grounded CLAUDE.md adherence auditor

looptrip — multi-agent coordination-pathology detector