From ax
Guided experiment-loop retrospective over the ax agent-experience graph. Walks through open proposals, pending verdicts, and harness-hook effectiveness. Invoke via "ax retro" or "/ax:retro".
How this skill is triggered — by the user, by Claude, or both
Slash command
/ax:retro
The summary Claude sees in its skill listing, used to decide when to auto-load this skill
Closes the self-improvement loop. Claude orchestrates `ax improve …`
commands; the user decides each row.
Assumes ax (axctl) is on PATH and the local SurrealDB is running. If
ax improve list fails with a connection error, tell the user to run
scripts/db-start.sh and stop.
ONLY fire on explicit triggers:
/ax:retro slash command (if the plugin marketplace publishes one)
Do NOT fire on a generic "look at my recent work" - that risks dragging unrelated context into the loop.
Before the proposal queue, check whether prior sessions still owe a retro. This is the "quota arbitrage" path - idle Opus budget chews through the backlog so the experiment loop has signal next time.
Run:
ax retro pending --since=7 --idle-min=30 --json
Returns sessions from the last 7 days that have no reviewed graph
edge yet AND look finished (explicit ended_at, or last turn idle for
at least 30 min). If the list is empty, skip to Step 1.
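The selection rule above can be sketched as a small filter. This is an illustration only: the keys last_turn_at, ended_at, and reviewed are hypothetical stand-ins rather than the real ax schema, and the actual ax retro pending command may apply the 7-day window to a different timestamp.

```python
from datetime import datetime, timedelta, timezone


def is_pending(session, now, since_days=7, idle_min=30):
    """Return True if a session still owes a retro under the rule above.

    `session` is a plain dict with hypothetical keys (not the ax schema):
    last_turn_at (datetime), ended_at (datetime or None), reviewed (bool).
    """
    # Outside the look-back window: not considered at all.
    if session["last_turn_at"] < now - timedelta(days=since_days):
        return False
    # A reviewed edge already exists: nothing is owed.
    if session["reviewed"]:
        return False
    # "Finished" means an explicit end, or a last turn that has gone idle.
    explicitly_ended = session["ended_at"] is not None
    idle = now - session["last_turn_at"] >= timedelta(minutes=idle_min)
    return explicitly_ended or idle


now = datetime(2025, 1, 10, 12, 0, tzinfo=timezone.utc)
sessions = [
    # Still active: last turn 5 minutes ago, no explicit end.
    {"id": "a", "last_turn_at": now - timedelta(minutes=5), "ended_at": None, "reviewed": False},
    # Idle for 2 hours and never reviewed: pending.
    {"id": "b", "last_turn_at": now - timedelta(hours=2), "ended_at": None, "reviewed": False},
    # Idle but already reviewed.
    {"id": "c", "last_turn_at": now - timedelta(hours=2), "ended_at": None, "reviewed": True},
    # Ended, but older than the 7-day window.
    {"id": "d", "last_turn_at": now - timedelta(days=9), "ended_at": now - timedelta(days=9), "reviewed": False},
    # Recently and explicitly ended: pending even though it is not idle yet.
    {"id": "e", "last_turn_at": now - timedelta(minutes=5), "ended_at": now - timedelta(minutes=4), "reviewed": False},
]
pending_ids = [s["id"] for s in sessions if is_pending(s, now)]
```

With this sample data only sessions b and e qualify, which mirrors the two ways a session can "look finished".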
Show the list to the user as 1 line per session (project · turns · model · reason). Ask:
N session(s) pending retro. Want me to dispatch the retro-reviewer subagent for all of them in parallel, or pick a subset?
On all or <subset>: for each chosen session, write a brief:
ax retro brief --session=<session_id>
This writes .ax/tasks/retro/<key>.md with frontmatter (transcript
path, suggested model, turn count, etc.) and a body that tells the
reviewer what to do.
Dispatch one retro-reviewer subagent per brief, in parallel. Pass
each brief path in the prompt; let the subagent's frontmatter pin
model: opus (override per session if suggested_model differs and
the user asked you to economize).
Wait for all subagents. Aggregate results: counts of retros emitted, proposals recommended, model-fit suggestions. Render as a short summary. The user does not approve retro emissions per row - the subagent already wrote them. The user DOES decide on resulting proposals in Step 2.
The reviewed edge now exists for each drained session, so a
re-run of ax retro pending should show fewer rows.
If the user declines Step 0, move on. The backlog stays - next retro picks it up.
Run silently (parallel where possible):
ax improve list --status=open --json
ax improve list --status=accepted --json
ax improve verdict --json
ax retro list --since=7 --json # cluster-derived friction summary
ax hooks summary --since=7 --tail=20 # optional; tolerate failure
ax retro list reflects three pattern types now:
Pre-<Tool> guard proposals
CLAUDE.md
Address recurring <kind> friction proposals
If any of those surfaced, mention them so the user knows to triage in Step 2.
Compute counts: open proposals (by form), accepted experiments with
locked_verdict IS NONE, checkpoints due since last lock. Then render
to the user as 2-4 lines, e.g.:
7 open proposals (3 skill, 4 guidance). 2 accepted experiments are waiting on a verdict. Hook activity last 7d: 142 invocations, 3 blocking errors. Want to triage proposals first, lock the pending verdicts, or skim hook signals?
If both proposal/verdict queues are empty: tell the user nothing's due
and offer ax ingest --derive-only to refresh evidence.
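The count-and-render step can be sketched as follows. It is a rough illustration that assumes the --json output parses to a list of objects with form and locked_verdict keys; that shape is an assumption, not a documented contract, and the hook-activity sentence is left out.

```python
from collections import Counter


def summarize(open_proposals, accepted_experiments):
    """Build the opening status line from already-fetched rows.

    Both arguments are lists of dicts. The keys `form` and `locked_verdict`
    are assumptions about the JSON shape, not a documented schema.
    """
    by_form = Counter(p["form"] for p in open_proposals)
    # An experiment is waiting on a verdict while nothing has been locked.
    waiting = sum(1 for e in accepted_experiments if e.get("locked_verdict") is None)
    if not open_proposals and waiting == 0:
        return "Nothing is due."
    head = f"{len(open_proposals)} open proposals"
    if by_form:
        # Largest form first, e.g. "4 guidance, 3 skill".
        head += " (" + ", ".join(f"{n} {form}" for form, n in by_form.most_common()) + ")"
    return f"{head}. {waiting} accepted experiments are waiting on a verdict."
```

When summarize returns "Nothing is due.", that is the point at which to offer ax ingest --derive-only.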
Order open proposals by frequency desc. For each, in turn:
Run ax improve show <dedupe_sig> --json (or reuse the row from
step 1).
Render as 3-5 lines. Example for a skill proposal:
Schema change guardrail (skill · freq=9 · confidence=high)
Hypothesis: schema edits often surface in fix-chains within ~14d.
Trigger: fix commits overlap SurrealDB schema files.
Behavior: run schema lint + one read/write smoke before edit.
Ask the user: accept, reject, or skip.
Branch:
accept: run ax improve accept <dedupe_sig>.
Tell the user where the SKILL.md was scaffolded.
Offer: "Want to refine the scaffolded SKILL.md right now?"
If yes: read the file, propose edits, write them back.
reject: run ax improve reject <dedupe_sig> --reason "<reason>".
After the loop, summarize: "Accepted 3, rejected 1, skipped 2."
For each experiment whose latest checkpoint is unlocked
(locked_verdict IS NONE), in age order:
Run ax improve verdict <dedupe_sig> to fetch the experiment +
checkpoint history.
Render the most recent checkpoint as 2-3 lines:
Schema change guardrail - t+30 checkpoint
12 opportunities in window, 8 addressed (67%).
Suggested: adopted.
Ask the user to confirm the suggested verdict OR override:
adopted (artifact is doing real work)
ignored (user wrote it but never invoked it)
regressed (it made things worse)
partial (mixed signal)
no_longer_needed (pattern self-resolved; trigger stopped firing)
Run ax improve verdict <dedupe_sig> --set <verdict> to lock it.
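A minimal sketch of the checkpoint rendering and a guard on the verdict value before it is passed to --set. The function names and arguments are hypothetical; only the five verdict strings come from the list above. The rate is rounded to the nearest whole percent, which is one reasonable choice rather than the CLI's documented behavior.

```python
# The five verdicts listed above; anything else should not reach --set.
VERDICTS = {"adopted", "ignored", "regressed", "partial", "no_longer_needed"}


def is_valid_verdict(verdict):
    """Return True if `verdict` is one of the five lockable values."""
    return verdict in VERDICTS


def render_checkpoint(title, day, opportunities, addressed, suggested):
    """Format one checkpoint as the three lines shown to the user."""
    # Guard against an empty window so the rate never divides by zero.
    rate = round(100 * addressed / opportunities) if opportunities else 0
    return "\n".join([
        f"{title} - t+{day} checkpoint",
        f"{opportunities} opportunities in window, {addressed} addressed ({rate}%).",
        f"Suggested: {suggested}.",
    ])
```

Checking the value first keeps a typo in a user override from being sent to the CLI.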
Only run if the user asked for hook review OR if step-1 found ≥3 blocking errors. Light touch - this section is read-only.
Show top hooks from ax hooks summary --since=7 --tail=20 if not
already shown.
If a hook keeps blocking, ask: "Want to inspect a recent
invocation?" Then run
ax hooks invocations --command="<hook>" --tail=5 and render.
Backtest known feedback cases:
ax hooks cases enforce-worktree --tail=50 --window=3
Treat each backtest result as one case type. Report pass/fail/inconclusive counts.
Interpretation:
hook_progress without a terminal success/blocking event is a
telemetry gap unless correlated with visible behavior.
ax hooks init, write a defineHook hook in ~/.ax/hooks/, ax hooks backtest it against history, then ax hooks install --providers=claude,codex.
Output a one-paragraph summary:
experiment.created_at + 7d among accepted-but-unlocked
experiments, formatted as "next retro suggested around YYYY-MM-DD".
Then ask whether the user wants to commit the scaffolded skill files + proposal-status changes (DB is local, but SKILL.md files are on disk and may belong in version control).
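The suggested-date line could be derived roughly like this. It is a sketch under stated assumptions: each experiment is a dict with a created_at ISO 8601 string and a locked_verdict that is None while unlocked (both key names are hypothetical), and it picks the earliest created_at + 7d, which the text above does not spell out.

```python
from datetime import datetime, timedelta


def next_retro_line(experiments):
    """Suggest the next retro date from accepted-but-unlocked experiments.

    Key names are hypothetical; choosing the earliest candidate date is
    also an assumption rather than something the source states.
    """
    due = [
        datetime.fromisoformat(e["created_at"]) + timedelta(days=7)
        for e in experiments
        if e.get("locked_verdict") is None
    ]
    if not due:
        # Nothing is waiting on a verdict, so there is no date to suggest.
        return None
    return f"next retro suggested around {min(due).date().isoformat()}"
```

Returning None when every experiment is locked lets the close-out paragraph simply omit the line.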
The retro itself produces durable signal that the experiment loop already captures:
Acceptance rate by form - after the session, derive from
proposal.status. If skill-form gets accepted 80% but guidance gets
rejected 80%, the derive-proposals stage is over-eager on the wrong
form. Surface as an observation.
Reject reasons - proposal.reject_reason is a free-text corpus.
After the session run:
ax improve list --status=rejected --json | jq '.[].reject_reason'
Look for repeated phrases ("duplicate of existing hook"). When a pattern emerges, the derive-proposals stage should dedupe against it.
Verdict surprises - when the user overrides a suggested verdict, note it. Repeated overrides mean the verdict math is biased.
These are observations, not actions. Report in the close-out; don't write to insight tables.
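The first two observations can be computed from the proposal rows with a short helper. This is a sketch that assumes each row is a dict exposing form, status, and reject_reason (the field names mentioned above); the exact JSON layout of ax improve list --json is an assumption, and nothing here writes anything back.

```python
from collections import Counter, defaultdict


def acceptance_rate_by_form(proposals):
    """Return {form: accepted / (accepted + rejected)} for decided proposals.

    Open or superseded rows are ignored because no decision was made on them.
    """
    decided = defaultdict(Counter)
    for p in proposals:
        if p["status"] in ("accepted", "rejected"):
            decided[p["form"]][p["status"]] += 1
    return {
        form: c["accepted"] / (c["accepted"] + c["rejected"])
        for form, c in decided.items()
    }


def repeated_reject_reasons(proposals, min_count=2):
    """Return (reason, count) pairs for reasons seen at least `min_count` times.

    Reasons are compared after trimming whitespace and lowercasing, which is
    a crude stand-in for spotting repeated phrases by eye.
    """
    reasons = Counter(
        p["reject_reason"].strip().lower()
        for p in proposals
        if p.get("status") == "rejected" and p.get("reject_reason")
    )
    return [(reason, n) for reason, n in reasons.most_common() if n >= min_count]
```

A large gap between forms in the first result, or any entry in the second, is the kind of observation to mention in the close-out.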
ax improve list [--form=skill|subagent|hook|guidance|automation] \
[--status=open|accepted|rejected|superseded|all] [--json]
ax improve show <dedupe_sig> [--json]
ax improve accept <dedupe_sig> [--force]
ax improve reject <dedupe_sig> --reason "<text>"
ax improve verdict [<dedupe_sig>] [--set <verdict>] [--json]
ax improve checkpoint [--force]
ax improve reset --yes # destructive; only when user requests
ax retro pending [--since=N] [--idle-min=N] [--json] # Step 0 backlog
ax retro brief --session=<id> [--out-dir=<path>] [--json]
ax retro emit --session=<id> [--source=<src>] [--from-file=<json>]
ax retro list [--since=N] [--limit=N] [--json]
ax hooks summary [--since=N] [--tail=N]
ax hooks invocations [--command="<name>"] [--tail=N]
ax hooks cases <case-name> [--tail=N] [--window=N]
--force on accept overwrites an existing SKILL.md scaffold. Only use
when the user explicitly says so.
reset --yes wipes ALL proposal/experiment/checkpoint state. NEVER run
without explicit user confirmation in this session.
ax improve list returns empty → run ax ingest --derive-only once,
retry. If still empty, evidence is genuinely thin; tell the user.
ax improve accept reports scaffold_exists → ask the user if they
want --force or to abandon.
ax improve verdict --set reports verdict_locked → that experiment
is already finalized; show the locked value and move on.
ax hooks summary returns nothing → retry with --since=30; if
still empty, the hook telemetry pipeline is idle, surface as a TODO.
Connection error → tell the user to run scripts/db-start.sh and stop.
Do not run ax improve accept for every open proposal in a batch; the
user must say yes per row.
Do not write to ~/.claude/skills/ directly. The CLI handles that.
npx claudepluginhub necmttn/ax