From agent-loops
Iteratively refactors code or SQL queries under a hard correctness bound: cuts complexity or latency while tests or result sets remain unchanged. Each change is validated and kept only if it improves the metric.
Slash command: /agent-loops:optimize-loop
An evaluator-optimizer loop with a pluggable correctness gate + minimized metric. The artifact is some editable thing (a code module or a SQL query); the feedback signal is two-part: a bound gate that must pass (behaviour/results unchanged) and a bound metric that must drop (the cost you minimize). You apply one change, check the gate, measure the metric, and keep the change only if the gate passes AND the metric improves — otherwise you revert. Repeat until the metric stops improving or the budget runs out. Once the loop starts, do not pause for permission.
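The keep-or-revert rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: `optimize_loop`, its parameters, and the toy list artifact are hypothetical names invented here, not part of the skill's shipped tools, and the real loop works on files rather than in-memory values.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def optimize_loop(
    artifact: T,
    changes: Iterable[Callable[[T], T]],   # hypothetical: one callable per candidate change
    gate: Callable[[T], bool],             # must stay True (behaviour/results unchanged)
    metric: Callable[[T], float],          # the cost to minimize
) -> Tuple[T, float, List[str]]:
    """Keep a change only if the gate passes AND the metric drops."""
    if not gate(artifact):
        raise ValueError("baseline must pass the gate before the loop starts")
    best, best_metric = artifact, metric(artifact)
    log = [f"baseline {best_metric}"]
    for change in changes:
        candidate = change(best)           # apply exactly one change
        if not gate(candidate):            # correctness bound broken: revert
            log.append("discard (gate failed)")
            continue
        value = metric(candidate)
        if value < best_metric:            # strictly better: keep
            best, best_metric = candidate, value
            log.append(f"keep {value}")
        else:                              # no improvement: revert
            log.append(f"discard (metric {value})")
    return best, best_metric, log


# Toy artifact: a list whose sum must stay 6 (gate) and whose length we minimize (metric).
def drop_zeros(xs): return [x for x in xs if x != 0]
def drop_last(xs): return xs[:-1]
def add_zero(xs): return xs + [0]


best, best_metric, log = optimize_loop(
    [1, 2, 3, 0],
    [drop_zeros, drop_last, add_zero],
    gate=lambda xs: sum(xs) == 6,
    metric=len,
)
print(best, best_metric)
print(log)
```

The second change shortens the list further but breaks the gate, so it is discarded even though the metric would have improved.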
Two ready bindings ship in tools/ (both vendored, stdlib-only):
- code mode: the gate is that <gate_cmd> (the test suite) exits 0; the metric is what tools/metrics.py prints: complexity (primary), then max_nesting and loc (lexicographic tie-breakers). Lower is better.
- sql mode: the gate is that the result hash from tools/bench.py matches the baseline; the metric is the same tool's median_ms. Lower is better.

The gate is non-negotiable in both modes: a change that fails it is a regression, not an improvement.
Never edit the ground truth (the tests / tools/metrics.py in code mode, the database / tools/bench.py
in sql mode) — editing what measures you to move the number defeats the loop.
Use when there is a clear correctness bound to hold and a number to minimize: refactoring code that
has a passing test suite (cut complexity), or tuning a SQL query that has a fixed result-set (cut
latency). The default is the matching shipped tool; the escape hatch is to bind any <gate_cmd> that
exits 0 on pass and any <metric_cmd> that prints a single number to minimize (e.g. a linter's issue
count, or a non-SQLite engine's timing + result fingerprint). Not for adding features or fixing
bugs — those intend to change behaviour, which this loop is built to forbid.
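The escape-hatch contract (a gate command that exits 0 on pass, a metric command that prints a single number) could be wrapped roughly as follows. The helper names `gate_passes` and `read_metric` are hypothetical, and the commands in the usage lines are stand-ins run through the current Python interpreter, not the shipped tools.

```python
import subprocess
import sys
from typing import Sequence


def gate_passes(cmd: Sequence[str]) -> bool:
    """The gate passes exactly when the command exits with status 0."""
    return subprocess.run(cmd, capture_output=True).returncode == 0


def read_metric(cmd: Sequence[str]) -> float:
    """Run the metric command and parse its stdout as one number.

    Raises CalledProcessError if the command fails, and ValueError if the
    output is not a single number, so a broken tool never counts as progress.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return float(result.stdout.strip())


# Stand-in commands (assumptions for illustration only).
passing_gate = [sys.executable, "-c", "pass"]
failing_gate = [sys.executable, "-c", "raise SystemExit(1)"]
metric_cmd = [sys.executable, "-c", "print(7)"]

print(gate_passes(passing_gate), gate_passes(failing_gate), read_metric(metric_cmd))
```

Treating a non-numeric or failed metric run as an error, rather than as a value, matches the rule that a tool error means discard.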
Resolve bindings interactively. If loop.run.yaml exists in the working dir, load it, confirm the
values back in one line, and skip to the loop. Otherwise pick <mode> first (it selects the gate +
metric), then on Claude Code (the AskUserQuestion tool is available) infer a likely value for each
binding and present it as the recommended option; on other hosts ask each as a quoted plain-text
prompt. Then write loop.run.yaml and confirm the values before creating any other files.
<gate_cmd> and <metric_cmd> are the pluggable core: bind them per <mode> from the table. In sql
mode one bench command supplies both — its hash is the gate, its median_ms is the metric.
| binding | meaning | default | how to infer |
|---|---|---|---|
| <mode> | code (refactor under test) or sql (query, fixed results) | — | the artifact's kind |
| <editable_files> | the file(s) the loop may change | — | code: source files (not tests/configs/the tool); sql: the query file (+ optional indexes file) |
| <gate_cmd> | the bound gate that must PASS, else revert | — | code: the test command (exits 0 on pass); sql: implicit — candidate hash must equal the baseline hash from the bench command |
| <metric_cmd> | the bound metric printing a number to minimize | — | code: python3 <skill_dir>/tools/metrics.py <editable_files> (→ complexity, then max_nesting, loc); sql: python3 <skill_dir>/tools/bench.py --db <db> --query <query_file> --setup <indexes_file> --repeat 5 (→ median_ms, hash) |
| <sandbox_root> | where snapshots + the ledger live | ./sandbox | — |
| <budget> | max iterations (hard cap) | 8 | — |
| <patience> | stop after N consecutive no-improvement iterations | 3 | — |
<skill_dir> is this skill's installed folder; substitute the real path when writing loop.run.yaml.
For non-default engines/languages, bind any <gate_cmd>/<metric_cmd> meeting the contract above.
Two worked configs: examples/refactor.run.yaml (code) and examples/sql.run.yaml (sql).
Copy this checklist and tick items off (the loop runs for at most <budget> iterations):
- [ ] Baseline: run <gate_cmd> (code); if it is not green, stop (the loop needs a passing gate to protect behaviour). Run <metric_cmd>; record the metric as the current best, and in sql mode record the baseline hash as the correctness reference. Log the baseline row.
- [ ] Snapshot: copy <editable_files> to <sandbox_root>/iter<N>/ so the iteration reverts.
- [ ] Change: apply exactly one change to <editable_files>.
- [ ] Gate: code: run <gate_cmd>; sql: read the candidate's hash from <metric_cmd>.
- [ ] If the gate fails (code: tests fail; sql: hash ≠ baseline / the tool errored), discard: restore from the snapshot, log the reason, continue.
- [ ] Keep only if the metric improves; otherwise restore from the snapshot. code: compare (complexity, max_nesting, loc) lexicographically (complexity first; only on a tie consult max_nesting, then loc); sql: compare median_ms, keeping only on a margin clear of timing noise (default ≥ 3% relative).
- [ ] Stop on plateau (<patience>) or <budget>.

Change ideas — code mode: flatten nested if/else into guard clauses, replace a hand-rolled loop
with a stdlib call (sum, min, max, statistics.*), collapse duplicated branches, remove dead
code. Preserve public behaviour — names, signatures, return shapes, raised exceptions; the test suite
is the contract.
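As an illustration of the code-mode ideas above, here is a before/after pair that flattens nesting into a guard clause and swaps a hand-rolled loop for `sum`. The function `summarize` and its return shape are hypothetical, made up for this sketch; the assertions at the end play the role of the test suite that must stay green.

```python
def summarize_nested(values):
    """Before: nested if/else plus a hand-rolled accumulation loop."""
    if values is not None:
        if len(values) > 0:
            total = 0
            for v in values:
                total = total + v
            return {"count": len(values), "total": total}
        else:
            return {"count": 0, "total": 0}
    else:
        raise ValueError("values is required")


def summarize_flat(values):
    """After: one guard clause, stdlib sum, same name/return shape/exception."""
    if values is None:
        raise ValueError("values is required")
    # sum([]) is 0 and len([]) is 0, so the empty branch collapses into this one.
    return {"count": len(values), "total": sum(values)}


# The "gate": both versions must agree on every case, including the error case.
for case in ([], [5], [1, 2, 3]):
    assert summarize_nested(case) == summarize_flat(case)
```

The refactor keeps the raised exception type and the dictionary keys, which is what "preserve public behaviour" asks for.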
Change ideas — sql mode: add an index to <indexes_file> covering filtered/joined/grouped columns;
rewrite the query (correlated subquery → JOIN + GROUP BY, hoist a repeated computation, replace
SELECT * with needed columns, push a filter earlier, drop a redundant DISTINCT/sort). The hash is
over the multiset of rows, so it does not catch a changed row order — if ORDER BY is part of
the contract, eyeball that the rewrite preserves it.
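The row-order caveat above can be seen with a small sketch. How tools/bench.py actually computes its hash is not shown here, so `multiset_hash` below is an assumed stand-in that sorts the rows before hashing; `ordered_hash` is a hypothetical contrast that hashes rows in the order received.

```python
import hashlib
from typing import Iterable, Tuple


def _digest(lines: Iterable[str]) -> str:
    h = hashlib.sha256()
    for line in lines:
        h.update(line.encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()


def multiset_hash(rows: Iterable[Tuple]) -> str:
    """Order-insensitive: sorting first means only the multiset of rows matters."""
    return _digest(sorted(repr(tuple(row)) for row in rows))


def ordered_hash(rows: Iterable[Tuple]) -> str:
    """Order-sensitive: the same rows in a different order hash differently."""
    return _digest(repr(tuple(row)) for row in rows)


rows = [(1, "alice", 30.0), (2, "bob", 12.5)]
reordered = list(reversed(rows))

print(multiset_hash(rows) == multiset_hash(reordered))   # same multiset of rows
print(ordered_hash(rows) == ordered_hash(reordered))     # row order differs
```

A multiset fingerprint still notices duplicated or missing rows, so a dropped DISTINCT that changes row counts would fail the gate; only a pure reordering slips through.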
Lexicographic keep (code mode), current best (18, 3, 64): (15, 3, 45) keep (lower complexity);
(18, 2, 70) keep (tie complexity, lower nesting); (18, 3, 61) keep (tie, fewer lines); (18, 3, 64)
discard (no progress); (19, 1, 20) discard (higher complexity outweighs simpler nesting/loc).
Plateau counting: increment the no-improvement counter on every iteration that does not set a new
best — discarded for a failed gate, a broken change, or an insufficient metric gain — and reset it to 0
on each keep. <patience> fruitless iterations in a row ends the run; <budget> is the hard cap. On
stop, restore the working files to the best iteration (if the latest was a discard) and report:
baseline vs best metric (and, in sql mode, the speedup factor), the trajectory, and the winning change
set. If you run low on ideas before the budget, look harder rather than stopping early.
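The stopping rule above can be sketched as a counter over per-iteration outcomes. `run_until_stop` is a hypothetical helper; it takes a pre-recorded sequence of "keep"/"discard" outcomes purely so the counting logic can be shown in isolation, with the documented defaults of 8 for the budget and 3 for patience.

```python
from typing import Iterable, Tuple


def run_until_stop(
    outcomes: Iterable[str], budget: int = 8, patience: int = 3
) -> Tuple[int, str]:
    """Return (iterations_run, stop_reason) for a sequence of outcomes."""
    stale = 0   # consecutive iterations without a new best
    ran = 0
    for outcome in outcomes:
        if ran >= budget:          # hard cap reached
            break
        ran += 1
        if outcome == "keep":
            stale = 0              # a keep resets the plateau counter
        else:
            stale += 1             # any discard counts, whatever the reason
        if stale >= patience:
            return ran, "patience"
    reason = "budget" if ran >= budget else "exhausted"
    return ran, reason


print(run_until_stop(
    ["keep", "discard", "discard", "keep", "discard", "discard", "discard", "keep"]
))
print(run_until_stop(["keep"] * 20))
```

In the first run the keep at iteration 4 resets the counter, so the run ends at iteration 7 rather than iteration 5.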
Ledger: log one row per iteration to <sandbox_root>/ledger.tsv, tab-separated; never put commas in the description. status ∈
{keep, discard, baseline}. Use the columns for the active <mode>.
Code mode columns (iter, complexity, max_nesting, loc, status, description):

| iter | complexity | max_nesting | loc | status | description |
|---|---|---|---|---|---|
| 0 | 23 | 7 | 86 | baseline | unmodified module |
| 1 | 19 | 5 | 78 | keep | flatten summarize guard clauses |
| 2 | 19 | 5 | 80 | discard | extract helper (no complexity gain) |
| 3 | 13 | 3 | 40 | keep | use statistics + min/max/median |

SQL mode columns (iter, median_ms, rows, hash_ok, status, description; hash_ok ∈ {yes,no,-}):

| iter | median_ms | rows | hash_ok | status | description |
|---|---|---|---|---|---|
| 0 | 1121.06 | 10 | - | baseline | correlated subquery no index |
| 1 | 6.82 | 10 | yes | keep | rewrite correlated subquery as JOIN + GROUP BY |
| 2 | 1.18 | 10 | yes | keep | add index orders(customer_id, amount) |
| 4 | 0.40 | 10 | no | discard | drop ORDER BY — changed result set |

Report the best iteration, not necessarily the last.
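A tab-separated ledger of this shape can be written and read back with the standard `csv` module. The sketch below uses the code-mode columns and the sample rows from this section; `write_ledger` and `best_iteration` are hypothetical helpers, and an in-memory buffer stands in for <sandbox_root>/ledger.tsv.

```python
import csv
import io

CODE_HEADER = ["iter", "complexity", "max_nesting", "loc", "status", "description"]


def write_ledger(rows, fh) -> None:
    """Write the header plus one tab-separated row per iteration."""
    writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
    writer.writerow(CODE_HEADER)
    for row in rows:
        description = str(row[-1])
        if "," in description or "\t" in description:
            raise ValueError("description must not contain commas or tabs")
        writer.writerow(row)


def best_iteration(fh) -> dict:
    """Return the baseline/keep row with the lowest lexicographic metric."""
    reader = csv.DictReader(fh, delimiter="\t")
    candidates = [r for r in reader if r["status"] in ("keep", "baseline")]
    return min(
        candidates,
        key=lambda r: (int(r["complexity"]), int(r["max_nesting"]), int(r["loc"])),
    )


buffer = io.StringIO()  # stand-in for the ledger file
write_ledger(
    [
        (0, 23, 7, 86, "baseline", "unmodified module"),
        (1, 19, 5, 78, "keep", "flatten summarize guard clauses"),
        (2, 19, 5, 80, "discard", "extract helper (no complexity gain)"),
        (3, 13, 3, 40, "keep", "use statistics + min/max/median"),
    ],
    buffer,
)
buffer.seek(0)
winner = best_iteration(buffer)
print(winner["iter"], winner["description"])
```

Discarded rows are excluded before taking the minimum, so a discarded iteration with a tempting metric can never be reported as the best.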
- Edit only <editable_files>. The gate's ground truth is read-only: the tests and tools/metrics.py (code), the database and tools/bench.py (sql). Editing what measures you to move the number invalidates the run.
- A change that fails the gate (code: tests fail; sql: hash ≠ baseline) is a regression, not an optimization; revert it regardless of the metric. A green gate after a behaviour change means the gate is too weak, not that the change is safe; prefer holding behaviour identical over trusting a thin gate.
- Measure the same way every time (code: the same <gate_cmd>; sql: the same --repeat); compare the metric, not a single noisy run.
- No path may use ../ escapes beyond the bound <sandbox_root>.

npx claudepluginhub gaasher/agent-loop-skills --plugin agent-loops