From claude-mods
Designs and orchestrates scheduled, multi-step agent loops (discover→triage→verify→land/escalate) with risk tiers, budget tracking, and a kill switch. Composes inner loops and native scheduling.
Slash command
/claude-mods:loop-ops
When to use
Use when designing or running a recurring/scheduled agent loop rather than a one-shot task — e.g. 'set up a loop that triages PRs every 10 minutes', 'design an autonomous CI-failure recovery loop', 'how risky is this loop / is it ready to run unattended', 'estimate what this loop costs per month', 'build a loop-engineering setup'. For a single-session improvement loop against one metric, use iterate instead.
- assets/STATE.template.md
- assets/examples/pr-watch/STATE.md
- assets/examples/pr-watch/github-actions.yml
- assets/examples/pr-watch/loop-run.sh
- assets/examples/pr-watch/loop.config.yaml
- assets/examples/pr-watch/run-log.md
- assets/examples/pr-watch/run.md
- assets/loop.config.template.yaml
- assets/model-pricing.json
- assets/run.sh.template
- assets/run.template.md
- references/claude-code-loops.md
- references/failure-modes.md
- references/pattern-catalog.md
- references/risk-tiers.md
- references/state-spine.md
- scripts/check-pricing-sync.py
- scripts/loop-check.sh
- scripts/loop-doctor.sh
- scripts/loop-estimate.py

**A loop is not a prompt.** Turn-by-turn prompting puts you in the loop forever. *Loop engineering* inverts it: you design a recurring process with memory, verification, and boundaries that discovers work, hands it to agents, verifies the result, and decides, on a schedule or until a goal is met, whether to land it or escalate to a human.
"You shouldn't be prompting coding agents anymore. You should be designing the loops that prompt your agents." — Peter Steinberger
This skill is the outer loop: the orchestration layer above a single agent run. It
is the twin of iterate — iterate is the inner loop (one
metric, one session, git-as-memory); loop-ops is the design discipline for the loop
that schedules and gates inner runs. It does not reimplement spawning or landing; it
composes what this repo already ships.
Every durable loop rests on six primitives. The discipline is wiring them; the parts already exist:
| Primitive | What it is | Owned in claude-mods by |
|---|---|---|
| Schedule | fire the loop on a cadence | native-first: /loop (in-session), Desktop scheduled task (local, unattended), /schedule cloud routines (no local files); /goal is the native completion gate. External (cron/Task Scheduler + loop-run.sh) only for non-Claude-Code control |
| Worktree | isolated, discardable execution context | git-ops worktrees, fleet-worker (per-task worktree) |
| Skills | persistent project knowledge the run loads | this repo's skill layer + your CLAUDE.md |
| Sub-agents | maker/checker separation | Agent/Task; dispatching skills (review, testgen) |
| Connectors | reach tickets / CI / chat | MCP tools, gh, github-ops |
| + State | a durable spine outside the conversation | STATE.md + run-log + budget (this skill) |
The inner improvement loop is iterate; cheap parallel makers are fleet-worker; the
test-gated merge queue is fleet-ops; inter-loop signalling is pigeon. loop-ops is
the doctrine that connects them.
┌──────────────────────────────────────────────────────────────┐
│ SCHEDULE (cadence)                                           │
│  └─▶ TRIAGE    read STATE.md → pick the next unit of work    │
│    └─▶ WORKTREE  isolate (git worktree)                      │
│      └─▶ MAKER     implementer run (or fleet-worker)         │
│        └─▶ CHECKER   verify gate + guard (tests)             │
│          └─▶ GATE      safe & allowlisted?                   │
│              ├─ yes → LAND (commit/PR)                       │
│              └─ no  → ESCALATE (+context)                    │
│  └─▶ write STATE.md, append run-log, decrement budget ───────┘
The gate is the load-bearing decision. Everything before it is mechanical; the gate is where a loop earns the right to run unattended — or doesn't.
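One tick of the diagram above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the skill's implementation: `Unit`, `run_tick`, the `maker`/`checker` callables, the `allowed_classes` allowlist, and the flat per-run token charge are all hypothetical names and simplifications.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass
class Unit:
    """One unit of work picked during triage (hypothetical shape)."""
    name: str
    change_class: str  # e.g. "docs" or "migration"


def run_tick(
    queue: list,
    maker: Callable[[Unit], str],
    checker: Callable[[str], bool],
    allowed_classes: set,
    budget_tokens: int,
    tokens_per_run: int,
    kill_switch: Optional[Path] = None,
) -> tuple:
    """Run one tick and return (outcome, remaining_budget)."""
    # Kill switch is checked first: exit before doing any work.
    if kill_switch is not None and kill_switch.exists():
        return "paused", budget_tokens
    # Budget boundary: never start a run the budget cannot cover.
    if budget_tokens < tokens_per_run:
        return "escalate: budget exhausted", budget_tokens
    # TRIAGE: pick the next unit of work (here, simply the head of the queue).
    if not queue:
        return "idle", budget_tokens
    unit = queue.pop(0)
    # MAKER produces a candidate; CHECKER verifies it separately.
    candidate = maker(unit)
    verified = checker(candidate)
    remaining = budget_tokens - tokens_per_run
    # GATE: land only if verified AND the change class is allowlisted.
    if verified and unit.change_class in allowed_classes:
        return f"land: {unit.name}", remaining
    return f"escalate: {unit.name}", remaining
```

The gate is deliberately the last branch: a verified change outside the allowlist still escalates, which is the "safe & allowlisted?" question in the diagram.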
Never start a loop unattended. Graduate it. Each tier maps to a concrete Claude Code permission mode — full mapping, the headless-profile table, and the enumerate vs isolate fork in references/risk-tiers.md.
| Tier | Posture | Permission mode | May do | Lands by |
|---|---|---|---|---|
| L1 Report | read-only discovery + triage | plan / dontAsk+read allowlist | scan, summarize, propose — writes nothing | a human reads the report |
| L2 Assisted | suggest changes, human gates the merge | dontAsk+narrow allowlist, or auto | edit in a worktree, run tests, open a PR | a human approves the PR (or fleet-ops) |
| L3 Unattended | autonomous land within a denylist | bypassPermissions in an isolated container only | commit/merge allowlisted classes | the loop itself, inside its boundary |
The cardinal rule, straight from Claude Code's own gate model: an unattended loop is a
scheduler/script that invokes claude -p, not a Claude session that spawns ungated
children. A session in auto mode that tries to launch a --permission-mode bypassPermissions child is blocked as Create Unsafe Agents — by design. See
references/risk-tiers.md and the repo's
auto-mode-classifier reference.
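The tier table can be read as a consistency rule between a loop's declared tier and its permission mode. The sketch below transcribes only the mode names from the table; `TIER_MODES` and `mode_consistent` are hypothetical names, allowlists are not modelled, and the real rubric in `loop-check` may be stricter.

```python
# Tier -> permission modes, transcribed from the table above.
TIER_MODES = {
    "L1": {"plan", "dontAsk"},
    "L2": {"dontAsk", "auto"},
    "L3": {"bypassPermissions"},
}


def mode_consistent(tier: str, mode: str, isolated: bool = False) -> bool:
    """Return True if `mode` is a permission mode the tier table permits."""
    if mode not in TIER_MODES.get(tier, set()):
        return False
    # bypassPermissions is only acceptable inside an isolated container.
    if mode == "bypassPermissions" and not isolated:
        return False
    return True
```

For example, `mode_consistent("L3", "bypassPermissions")` is false until `isolated=True` is passed, mirroring the "isolated container only" cell.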
What a loop may land vs what it must escalate is not a vibe — it mirrors Claude
Code's classifier tiers. Bake these into the config's escalation: field:
- … main, production deploys or migrations, mass deletion, granting IAM/repo permissions, anything destroying pre-session files, editing .claude//settings (self-modification), curl | bash.
- … gh pr merge out and land_via: fleet-ops in. Full connector/MCP-scope discipline + the auto-merge guard: references/risk-tiers.md.

A loop's memory lives outside the conversation, in three files (schemas + read/write contract in references/state-spine.md):
- STATE.md: the triage snapshot (priority / watch / noise + a readiness line). Read at the top of every run, rewritten at the end.
- run-log.md: append one line per run (timestamp, action, outcome, tokens). The audit trail that answers "what has this loop been doing?"
- loop.config.yaml: the loop's definition (goal, tier, cadence, scope, gate, budget, escalation). Scaffolded by loop-scaffold, scored by loop-check.

Patterns are compositions of three axes: trigger (cadence / event via a Channel / goal) × posture (L1/L2/L3) × locus (connector→cloud routine / local→Desktop task).
The named patterns are well-trodden points in that space; compose your own from the axes.
Full recipes + the morphology in references/pattern-catalog.md:
| Pattern | Trigger · Locus | Tier | One-line job |
|---|---|---|---|
| daily-scan | cadence · local | L1 | discover + prioritize, report only |
| pr-watch | event\|cadence · connector | L1 | watch review state, surface stuck PRs |
| ci-watch | event · local | L2 | triage build failures, propose a fix |
| dep-bump | cadence · local | L2 | patch-only bumps behind cooldown + guard |
| changelog-gen | event(tag)\|cadence · local | L1 | draft release notes for approval |
| merge-hygiene | cadence · local | L1 | dead branches, stale flags |
| issue-sort | cadence · connector | L1 | classify + label, propose only |
| metric-chase | goal · local | L2 | drive a metric (coverage/latency/eval) via iterate |
| regression-watch | cadence\|event · local | L1 | run a benchmark/eval, flag a regression |
| digest | cadence · connector | L1 | summarize email/Asana/news (cloud routine) |
| backfill | goal · local | L2 | drain a migration/queue to completion |
| monitor | event · local | L1 | error/deploy webhook → triage + page |
| freshness | cadence · local | L1 | re-check docs/data/deps vs reality |
Start any pattern at L1. Graduate to L2 only after the L1 reports prove its judgment.
Prefer event over cadence where a webhook exists (cheaper, faster than polling).
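The three axes and the two rules of thumb above (start at L1, prefer event where a webhook exists) can be sketched as a small data model. `Pattern`, `first_deployment`, and the axis constants are hypothetical names for illustration; the catalog in references/pattern-catalog.md remains the source of the real recipes.

```python
from dataclasses import dataclass

TRIGGERS = frozenset({"cadence", "event", "goal"})
TIERS = ("L1", "L2", "L3")
LOCI = frozenset({"connector", "local"})


@dataclass(frozen=True)
class Pattern:
    name: str
    triggers: frozenset  # one or more of TRIGGERS
    tier: str            # the tier listed in the catalog
    locus: str

    def __post_init__(self) -> None:
        # Reject anything that is not a point in the trigger x posture x locus space.
        if not self.triggers or not self.triggers <= TRIGGERS:
            raise ValueError(f"unknown trigger in {sorted(self.triggers)}")
        if self.tier not in TIERS:
            raise ValueError(f"unknown tier {self.tier!r}")
        if self.locus not in LOCI:
            raise ValueError(f"unknown locus {self.locus!r}")


def first_deployment(p: Pattern, webhook_available: bool) -> tuple:
    """Return (trigger, tier) for a pattern's first deployment."""
    if "event" in p.triggers and webhook_available:
        trigger = "event"  # cheaper and faster than polling
    else:
        non_event = sorted(p.triggers - {"event"})
        # An event-only pattern has nothing to fall back to.
        trigger = non_event[0] if non_event else "event"
    return trigger, "L1"  # every pattern starts at L1, whatever its catalog tier
```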
Running several loops? Two non-negotiables (detail in references/state-spine.md):
- Priority order: CI Watch → PR Watch → Dependency Bump → Merge-Hygiene/Changelog → Daily Scan (off-peak). A higher-priority loop's worktree wins; lower-priority loops defer. Loops signal each other via pigeon.
- Kill switch: a PAUSED sentinel file or a loop-pause label that every loop checks at the top of its run and exits on. No loop ships without one. Put it in kill_switch: and check it first.

| You need to… | Use | Not |
|---|---|---|
| improve one metric in one session | iterate | a hand-rolled inner loop |
| spawn cheap parallel makers | fleet-worker | bespoke claude -p plumbing |
| test-gate + land winning branches | fleet-ops | a manual merge step |
| fire on a cadence | native /loop · Desktop scheduled task · /schedule cloud routine; /goal for completion | a custom cron in this skill |
| commit / PR / release | git-ops, github-ops | raw git push |
| signal between loops | pigeon | a shared scratch file |
loop-ops is the design layer; these are the execution layers.
Five scripts, all following the Skill Resource Protocol
(stdout = data, semantic exit codes, --help with EXAMPLES, --json envelopes): init
(loop-scaffold.sh) scaffolds the loop, audit (loop-check.sh) scores whether the config
is well-formed, doctor (loop-doctor.sh) preflights whether it will actually run, cost
(loop-estimate.py) estimates spend (caching-aware), and check-pricing-sync gates pricing
drift in CI. The discipline before scheduling is
init → fill → cost → audit → doctor --live.
scripts/loop-scaffold.sh: scaffold a loop's state spine
Writes <dir>/<name>/ with five files from the bundled templates:
loop.config.yaml (assets/loop.config.template.yaml),
STATE.md (assets/STATE.template.md), run-log.md, run.md
(the headless run prompt, assets/run.template.md), and an
executable loop-run.sh (assets/run.sh.template) — the
runner-agnostic tick wrapper any scheduler invokes (cron / Windows Task Scheduler /
systemd / by hand), no GitHub Actions required. Pass a known --pattern
(pr-watch, ci-watch, dep-bump, …) and the config is seeded with that
pattern's scope/goal/escalation — and, at L2+, its gate — so you get a near-ready config to
review, not blank placeholders (it audits clean immediately). Doctrine holds: it still
scaffolds at L1 by default with a graduation block.
# Create .loops/pr-watch/ with config + STATE.md + run-log.md + run.md + loop-run.sh from templates:
bash scripts/loop-scaffold.sh --name pr-watch --pattern pr-watch --tier L1
# Custom dir + cadence, preview without writing:
bash scripts/loop-scaffold.sh --name dep-bump --pattern dep-bump \
--tier L2 --cadence 1d --dir .loops --dry-run
Refuses to overwrite a populated <dir>/<name>/ (exit 5) unless --force. Atomic
writes. --dry-run prints what it would create and writes nothing. stdout = the created
config path.
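The three guarantees just described (refuse a populated directory, write atomically, write nothing on a dry run) can be sketched as below. This is not the code of `loop-scaffold.sh`, which is a shell script whose internals are not shown here; `scaffold` and `PopulatedDirError` are hypothetical names, and the temp-file-plus-rename approach is one common way to get atomic writes.

```python
import os
import tempfile
from pathlib import Path


class PopulatedDirError(Exception):
    """Target loop dir already has files (the case the script reports as exit 5)."""


def scaffold(target: Path, files: dict, force: bool = False,
             dry_run: bool = False) -> list:
    """Write `files` (name -> body) into `target`; return the planned paths."""
    planned = [target / name for name in files]
    if dry_run:
        return planned  # report what would be created, write nothing
    if target.exists() and any(target.iterdir()) and not force:
        raise PopulatedDirError(str(target))
    target.mkdir(parents=True, exist_ok=True)
    for path, body in zip(planned, files.values()):
        # Atomic write: fill a temp file in the same directory, then rename it
        # over the destination so readers never see a half-written file.
        fd, tmp = tempfile.mkstemp(dir=target)
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(body)
        os.replace(tmp, path)
    return planned
```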
scripts/loop-check.sh: readiness scorer (run before you schedule)
The question this answers: is this loop safe to turn on at its declared tier? It scores
a loop.config.yaml against the readiness rubric — gate present, scope bounded,
escalation defined, guard + worktree at L2+, budget + kill switch set, permission mode
consistent with tier — and refuses a green light if any critical gap exists.
bash scripts/loop-check.sh .loops/pr-watch/loop.config.yaml # exit 0 ready, 10 not ready
bash scripts/loop-check.sh --json .loops/dep-bump/loop.config.yaml | jq '.data[] | select(.severity=="error")'
bash scripts/loop-check.sh --min 80 .loops/ci-watch/loop.config.yaml # raise the score bar
Exit 0 = ready (no errors, score ≥ --min), 10 = not ready (findings on stdout),
2 usage, 3 config not found, 4 config unparseable. --strict counts warnings
toward the not-ready signal.
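A scheduler or CI step that calls the scorer can branch on those exit codes. The wrapper below is a sketch that assumes it runs from the skill's root directory so the relative script path resolves; `describe_exit` and `check_ready` are hypothetical helper names, while the exit codes and the `--min` flag are the ones documented above.

```python
import subprocess
from typing import Optional

# Exit codes as documented for scripts/loop-check.sh.
EXIT_MEANING = {
    0: "ready",
    2: "usage error",
    3: "config not found",
    4: "config unparseable",
    10: "not ready",
}


def describe_exit(code: int) -> str:
    """Translate a loop-check exit code into a short label."""
    return EXIT_MEANING.get(code, f"unexpected exit code {code}")


def check_ready(config_path: str, min_score: Optional[int] = None) -> tuple:
    """Run loop-check and return (is_ready, label). Assumes cwd is the skill root."""
    cmd = ["bash", "scripts/loop-check.sh"]
    if min_score is not None:
        cmd += ["--min", str(min_score)]
    cmd.append(config_path)
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode == 0, describe_exit(proc.returncode)
```

Only exit 0 counts as ready; treating every other code as "do not schedule" keeps usage errors and parse failures from being mistaken for a pass.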
scripts/loop-doctor.sh: live preflight (will it actually run?)
loop-check proves the config is well-formed; loop-doctor proves the loop will
execute — catching the "blocked at 3am" failures audit can't see. --offline (CI-safe):
the budget fits a tick's estimated tokens, the permission mode is achievable (not
interactive), an L3 bypass declares an isolation boundary. --live adds runtime preflight:
the verify/guard gate's leading binary resolves on PATH, claude/git are present,
the kill-switch sentinel's parent dir exists.
bash scripts/loop-doctor.sh --offline .loops/pr-watch/loop.config.yaml # CI gate
bash scripts/loop-doctor.sh --live .loops/ci-watch/loop.config.yaml # before scheduling
bash scripts/loop-doctor.sh --live --json .loops/dep-bump/loop.config.yaml | jq '.data[] | select(.state=="bad")'
Exit 0 = will run, 10 = a check predicts a runtime failure (gate binary missing,
bypass on host without isolation, budget too small for a tick), 2 usage, 3 not found,
4 unparseable, 5 missing core dep. Run it after loop-check and before scheduling.
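One of the `--live` checks, "the gate's leading binary resolves on PATH", is easy to picture in isolation. The sketch below uses the standard-library `shlex` and `shutil.which`; `leading_binary` and `gate_binary_resolves` are hypothetical names, and the real script may parse the gate command differently (for example around environment-variable prefixes, which this sketch does not handle).

```python
import shlex
import shutil


def leading_binary(command: str) -> str:
    """Return the first token of a shell-style command string."""
    tokens = shlex.split(command)
    if not tokens:
        raise ValueError("empty gate command")
    return tokens[0]


def gate_binary_resolves(command: str) -> bool:
    """True if the command's leading binary can be found on PATH."""
    return shutil.which(leading_binary(command)) is not None
```

A gate such as `pytest -q tests/` passes this check only where `pytest` is installed for the user the scheduler runs as, which is exactly the "blocked at 3am" case a config-only audit cannot see.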
scripts/loop-estimate.py: token/$ estimate by pattern × cadence × model (caching-aware)
Estimate spend before committing to a cadence: the cost of an outer loop is
runs/day × tokens/run × price, and sub-agents multiply it. It also models prompt
caching: a loop re-sends the same run.md+system prefix every tick (the Ralph
property), so the prefix should be cache-written once then read (~0.1×) — but only if the
tick interval fits the cache TTL. A loop slower than ~1h can't cache (the entry expires
between ticks); the estimator says so and recommends the TTL. Pricing reads from
assets/model-pricing.json (date-stamped; claude-api-ops
is the source of truth — run its check-model-table.py if you suspect drift).
python scripts/loop-estimate.py --pattern pr-watch --cadence 10m --model claude-haiku-4-5
python scripts/loop-estimate.py --pattern ci-watch --cadence 15m --model claude-sonnet-4-6 --days 30 --json
python scripts/loop-estimate.py --list-models # the pricing table + its as-of date
Exit 0 ok, 2 usage, 3 pricing file missing, 4 bad cadence/model. Output names
every assumption (runs/day, tokens/run, sub-agent multiplier) — it's an estimate, and it
says so.
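The arithmetic behind such an estimate can be sketched in a few lines. This is not `loop-estimate.py`: the function names, the split into `prefix_tokens` and `fresh_tokens`, and the defaults are assumptions for illustration. It covers input tokens only and ignores the cache-write premium and output tokens, and the price in the usage note is a placeholder, not a real model price.

```python
def runs_per_day(cadence_s: float) -> float:
    """Number of ticks in 24 hours for a given tick interval in seconds."""
    return 86_400 / cadence_s


def estimate_cost(
    cadence_s: float,
    prefix_tokens: int,           # run.md + system prefix re-sent every tick
    fresh_tokens: int,            # per-tick tokens that are never cached
    price_per_mtok: float,        # placeholder price per million input tokens
    cache_ttl_s: float = 3600,    # "~1h" from the text above (assumed default)
    cache_read_factor: float = 0.1,
    subagent_multiplier: float = 1.0,
    days: int = 30,
) -> float:
    """runs/day x tokens/run x price, with the prefix discounted when cacheable."""
    # The prefix is only read from cache if the next tick arrives before expiry.
    cacheable = cadence_s <= cache_ttl_s
    prefix_factor = cache_read_factor if cacheable else 1.0
    tokens_per_run = prefix_tokens * prefix_factor + fresh_tokens
    runs = runs_per_day(cadence_s) * days
    return runs * tokens_per_run * subagent_multiplier * price_per_mtok / 1_000_000
```

With a placeholder price of 1.0 per million tokens, a 10,000-token prefix, and 1,000 fresh tokens, a 10-minute cadence costs 8.64 over 30 days (4,320 cached runs). A 2-hour cadence costs 3.96 (360 uncached runs), but each slow tick is 5.5 times more expensive than a cached one.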
scripts/check-pricing-sync.py: offline drift guard (CI)
model-pricing.json is a copy of claude-api-ops's authoritative model table, and a copy
drifts silently. This offline verifier asserts every model in
assets/model-pricing.json matches claude-api-ops's "Current
Models" table (prices included). Both files are in-repo, so it's network-free and gates PR
CI via tests/check-resources.sh; live model-id drift is owned by claude-api-ops's
check-model-table.py.
python scripts/check-pricing-sync.py --offline # exit 0 in sync, 10 drift, 3 a file missing
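The drift check reduces to comparing every model in the copy against the source table. The sketch below assumes both tables have already been parsed into `{model: {field: price}}` dictionaries, which is a guess at the shape rather than the actual schema of `model-pricing.json`; `find_drift` and `exit_code` are hypothetical names, and the model names in the usage note are made up.

```python
def find_drift(copy: dict, source: dict) -> list:
    """Return one finding per model in `copy` that does not match `source`."""
    findings = []
    for model, prices in sorted(copy.items()):
        if model not in source:
            findings.append(f"{model}: missing from source table")
        elif prices != source[model]:
            findings.append(f"{model}: price mismatch")
    return findings


def exit_code(findings: list) -> int:
    """0 when in sync, 10 on drift, matching the documented exit codes."""
    return 10 if findings else 0
```

For instance, comparing `{"model-a": {"input": 1.0}}` against a source where `model-a` has `{"input": 2.0}` yields one finding and exit code 10. The check is one-directional: models present only in the source table are not flagged, since the text only requires that every model in the copy matches.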
1. … custom). Start at L1.
2. bash scripts/loop-scaffold.sh --name <n> --pattern <p> --tier L1.
3. Fill loop.config.yaml: the real goal, scope (bounded globs, never *), verify gate, escalation rule, budget_tokens, kill_switch.
4. python scripts/loop-estimate.py --pattern <p> --cadence <c> --model <m>, to sanity-check the monthly spend against the value.
5. bash scripts/loop-check.sh .loops/<n>/loop.config.yaml, then fix every error before scheduling. Don't schedule a loop that fails its own audit.
6. bash scripts/loop-doctor.sh --live .loops/<n>/loop.config.yaml, to prove it will actually run (gate binary on PATH, budget fits a tick). Audit = well-formed; doctor = will-run.
7. … claude -p every ~270 s), not /loop (which grows a session and chews tokens); fixed-criteria long task → /goal; quick supervised polling → /loop. (L1 is read-only: it just writes STATE.md + a report.)
8. … fleet-ops landing) and re-audit at the higher tier.

A complete, audit + doctor-clean L1 loop ships at
assets/examples/pr-watch/: a filled
loop.config.yaml, a populated STATE.md, the run.md run prompt, a sample
run-log.md, the runner-agnostic loop-run.sh (the tick wrapper, with the
kill-switch gate and dontAsk + allowlist baked in — point cron / Task Scheduler at it),
and an optional github-actions.yml for repos already on GitHub. Copy the dir, adjust
scope/cadence, run loop-check + loop-doctor --live, then wire loop-run.sh to your
scheduler. The other patterns don't ship as
static dirs that rot — loop-scaffold --pattern <name> generates the same, seeded and
gate-clean, for any pattern at any tier. CI runs loop-check + loop-doctor on this
example every build, so it can't drift out of validity.
The incident-shaped catalog — symptom → mechanism → the control that catches each — is references/failure-modes.md (runaway budget, the 3am-dead loop, cache-cold, force-push, ungated-child spawn, colliding loops, silent-stop, gate reward-hacking, …). The headline ones:
- claude -p --permission-mode bypassPermissions in a script to dodge the classifier is Auto-Mode Bypass: a hard_deny nothing clears. If an outcome is blocked, authorize it (a narrow allow rule, or run the scheduler outside the auto-mode session), never disguise it.
- … auto mode is the wrong place to launch the loop. The scheduler/cron/Task-Scheduler/CI runner that invokes claude -p is the authorizer. See references/risk-tiers.md §"enumerate vs isolate".
- A loop whose verify: is empty is not a loop, it's an unsupervised typer. loop-check errors on it.
- scope: "*" means "may touch anything"; the audit rejects it.

… /loop, /schedule, hooks, the scheduler pattern.
npx claudepluginhub 0xdarkmatter/claude-mods --plugin claude-mods