Skill

auto-improve-skill

Autonomously improve any Oblique Claude skill using an AutoResearch-style loop — no API key needed. Runs eval cases against the current skill, grades outputs, proposes improvements to SKILL.md, keeps if better or reverts if worse, and iterates until the target pass rate is hit or max iterations are reached. Use this skill whenever Sean asks to "improve a skill", "auto-improve", "run the loop on [skill]", "audit [skill name]", "make [skill] better", or "run autoresearch on [skill]". Also trigger when Sean says he wants to improve multiple skills overnight or asks to run evals on any skill.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/oblique-cowork:auto-improve-skill

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Autonomous AutoResearch-style loop for improving Claude skills. Mirrors karpathy/autoresearch:

SKILL.md

218 lines · ~2.1k tokens

Stats

Language: Python

Parent stars: 0

Maintenance: Excellent

Last Commit: Jun 26, 2026

Auto-Improve Skill

Autonomous AutoResearch-style loop for improving Claude skills. Mirrors karpathy/autoresearch:

autoresearch	this skill
`train.py`	`SKILL.md` (the file being improved)
`program.md`	`evals/evals.json` (what "good" looks like)
5-min GPU budget	N eval cases (fixed evaluation budget)
`val_bpb` metric	assertion pass rate (higher = better)
keep/revert	commit if improved, revert if regression
overnight loop	`--max-iter N` autonomous iterations

Step 0: Clarify inputs

Ask (or infer from context) if not already clear:

Which skill? — skill name or path
Evals file? — does it have evals/evals.json? If not, offer to generate one from the skill's purpose
Max iterations? — default 8
Target pass rate? — default 85%

Skill directories live at:

Installed skills: /var/folders/.../T/claude-hostloop-plugins/.../skills/<skill-name>/ (read-only — copy to vault first)
Vault skills: /Users/seanng/Documents/Sean's Oblique Vault/Skills & Evals/<skill-name>/ (writable)

If the skill is in the read-only installed location, copy it to the vault before starting:

cp -r /path/to/installed/skills/<skill-name> \
  "/Users/seanng/Documents/Sean's Oblique Vault/Skills & Evals/<skill-name>-improved/"

Step 1: Load skill and evals

Read SKILL.md from the skill directory. Read evals/evals.json. If no evals exist, generate them now (see Step 1b).

Eval format (compatible with skill-creator):

{
  "skill_name": "skill-name",
  "evals": [
    {
      "id": 1,
      "prompt": "A realistic user prompt that should trigger this skill",
      "assertions": [
        {"name": "assertion_name", "criteria": "Plain-English description of what the output must do or contain"}
      ]
    }
  ]
}

Good assertions are:

Objectively verifiable (yes/no, not "is it good?")
Specific to what the skill is supposed to do
Covering the most common failure modes
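A structural check on the evals file before the loop starts can catch an empty or malformed case early. The following is a minimal sketch, assuming the JSON shape shown above; `load_evals` and `validate_evals` are hypothetical helper names, not part of skill-creator.

```python
import json
from pathlib import Path


def validate_evals(data):
    """Raise ValueError on structural problems; return the data unchanged otherwise."""
    if not isinstance(data.get("skill_name"), str):
        raise ValueError("skill_name must be a string")
    evals = data.get("evals")
    if not isinstance(evals, list) or not evals:
        raise ValueError("evals must be a non-empty list")
    seen_ids = set()
    for case in evals:
        case_id = case.get("id")
        if case_id in seen_ids:
            raise ValueError(f"duplicate eval id: {case_id}")
        seen_ids.add(case_id)
        if not case.get("prompt"):
            raise ValueError(f"eval {case_id} has no prompt")
        assertions = case.get("assertions")
        if not assertions:
            raise ValueError(f"eval {case_id} has no assertions")
        for assertion in assertions:
            # Every assertion needs a short name and a plain-English criteria string.
            if not assertion.get("name") or not assertion.get("criteria"):
                raise ValueError(f"eval {case_id}: assertion needs name and criteria")
    return data


def load_evals(path):
    """Read evals/evals.json from disk and validate its structure."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return validate_evals(data)
```

This only checks shape. Whether an assertion is objectively verifiable still needs a human or model read.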

Step 1b: Generate evals if none exist

If evals/evals.json doesn't exist, generate 5–8 eval cases based on the skill's purpose. Each case should be a realistic prompt a user would actually send. Write assertions that test the most important output properties. Save to evals/evals.json and show Sean before proceeding.

Step 2: Run the autoresearch loop

Track: best_pass_rate, best_skill_content, current_skill_content, no_improvement_streak, iteration, history[]

Initialise:

best_pass_rate = -1
best_skill_content = current SKILL.md content
current_skill_content = best_skill_content
no_improvement_streak = 0
iteration = 0
history = []

For each iteration:

2a. Evaluate the skill

For each eval case, spawn a subagent with this prompt:

You are Claude operating with the following skill instructions. Follow them exactly.

[SKILL.md content]

---

User: [eval prompt]
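If the subagent prompt is assembled programmatically, the template above could be filled in as follows. This is a sketch only; `build_eval_prompt` is a hypothetical helper, and how the prompt is actually handed to a subagent depends on the host environment.

```python
PROMPT_TEMPLATE = (
    "You are Claude operating with the following skill instructions. "
    "Follow them exactly.\n\n"
    "{skill}\n\n"
    "---\n\n"
    "User: {prompt}"
)


def build_eval_prompt(skill_content, eval_prompt):
    """Combine the SKILL.md text and one eval prompt into a single subagent prompt."""
    # str.format only parses the template, so braces inside the skill text are safe.
    return PROMPT_TEMPLATE.format(skill=skill_content.strip(), prompt=eval_prompt.strip())
```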

Collect the output. Then grade it against the assertions.

Grade each assertion as PASS or FAIL with a one-line reason. Calculate pass_rate = assertions_passed / total_assertions.
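The pass-rate arithmetic can be sketched as below. It assumes the grades from all eval cases are pooled into one list, which is one reading of assertions_passed / total_assertions; the `passed` key and the function name are illustrative.

```python
def pass_rate(grades):
    """Return the fraction of passed assertions.

    grades: list of dicts, one per assertion across all eval cases,
    each with a boolean 'passed' key and a one-line 'reason'.
    """
    total = len(grades)
    if total == 0:
        raise ValueError("no assertions graded")
    passed = sum(1 for grade in grades if grade["passed"])
    return passed / total
```

For example, three PASS grades out of four assertions give a pass rate of 0.75.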

In Cowork: spawn all eval subagents in a single message (parallel). While they run, draft or review the assertions. When results return, grade inline — no need for a separate grader subagent unless there are many evals.

2b. Commit or revert (the autoresearch core)

if pass_rate > best_pass_rate:
    best_pass_rate = pass_rate
    best_skill_content = current_skill_content
    no_improvement_streak = 0
    → "↑ New best — locking this version"

elif pass_rate < best_pass_rate:
    current_skill_content = best_skill_content   # revert
    no_improvement_streak += 1
    → "↓ Regression — reverting to best"

else:  # equal
    no_improvement_streak += 1
    → "→ No change"
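The same decision can be written as a runnable Python function. This is a sketch under the assumption that loop state lives in a small dataclass; `LoopState` and the returned labels are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class LoopState:
    best_pass_rate: float = -1.0
    best_skill_content: str = ""
    current_skill_content: str = ""
    no_improvement_streak: int = 0


def commit_or_revert(state, pass_rate):
    """Update state in place and return a label for the outcome."""
    if pass_rate > state.best_pass_rate:
        # Strict improvement: lock the current content in as the new best.
        state.best_pass_rate = pass_rate
        state.best_skill_content = state.current_skill_content
        state.no_improvement_streak = 0
        return "new_best"
    if pass_rate < state.best_pass_rate:
        # Regression: throw away the edit and go back to the best version.
        state.current_skill_content = state.best_skill_content
        state.no_improvement_streak += 1
        return "regression"
    # Equal score: as in the pseudocode, the current content is kept, not reverted.
    state.no_improvement_streak += 1
    return "no_change"
```

Because best_pass_rate starts at -1, the baseline run always takes the first branch and becomes the initial best.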

2c. Check stopping conditions

Stop if any of:

pass_rate >= target → "✓ Target reached"
iteration == max_iter → "Max iterations reached"
no_improvement_streak >= 3 → "No improvement for 3 iterations — stopping"
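The three stopping conditions can be checked in one place. A minimal sketch, assuming pass rates are fractions (so the 85% default target is 0.85); `stop_reason` and the `patience` parameter are illustrative names.

```python
def stop_reason(pass_rate, target, iteration, max_iter, no_improvement_streak, patience=3):
    """Return a message if the loop should stop, or None to keep iterating."""
    if pass_rate >= target:
        return "Target reached"
    if iteration == max_iter:
        return "Max iterations reached"
    if no_improvement_streak >= patience:
        return f"No improvement for {patience} iterations"
    return None
```

The conditions are tested in the order listed above, so a run that hits the target on its final iteration reports the target rather than the iteration cap.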

2d. Propose improvements

If not stopping, read the failures and improve the skill content:

What assertions are failing and why?
What's missing from the skill instructions that would fix them?
What's over-constraining the output and causing unnecessary failures?
Are there instructions that need to explain the why, not just the what?

Apply the improvements to current_skill_content. Write a brief note on what changed and why.

Key improvement principles (from skill-creator):

Explain the why behind instructions — LLMs follow reasoning better than rigid rules
Remove anything not pulling its weight
If all evals are independently writing the same helper code, bundle it into the skill
Prefer loosening over-constrained instructions over adding more rules
Don't overfit to specific examples — generalise from the failure pattern

2e. Save version and loop

Save the updated SKILL.md to the skill directory. Save a versioned copy named after the skill so it's unambiguous:

<skill-dir>/improvement_history/<skill-name>_v<N>.md

Example: copywriting/improvement_history/copywriting_v3.md
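Saving both files each iteration could look like the sketch below. It assumes the skill name is passed in explicitly, since the working directory in the vault may carry a suffix; `save_version` is a hypothetical helper.

```python
from pathlib import Path


def save_version(skill_dir, skill_name, iteration, content):
    """Write SKILL.md and a versioned copy; return the path of the versioned copy."""
    skill_dir = Path(skill_dir)
    history_dir = skill_dir / "improvement_history"
    history_dir.mkdir(parents=True, exist_ok=True)

    # Versioned copy, e.g. improvement_history/copywriting_v3.md
    version_path = history_dir / f"{skill_name}_v{iteration}.md"
    version_path.write_text(content, encoding="utf-8")

    # The live skill file that the next evaluation round reads.
    (skill_dir / "SKILL.md").write_text(content, encoding="utf-8")
    return version_path
```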

Increment iteration and go to 2a.

Step 3: Write best version and report

When the loop ends:

Write best_skill_content back to SKILL.md
Show the improvement history table:

Skill: copywriting
Run: 20/04/2026 14:32
──────────────────────────────────────────────────
Iteration  Pass Rate  Change       File
──────────────────────────────────────────────────
Iter 0     54%        (baseline)   copywriting_v0.md
Iter 1     67%        ↑ +13%       copywriting_v1.md
Iter 2     58%        ↓ reverted   copywriting_v2.md
Iter 3     75%        ↑ +8%        copywriting_v3.md
Iter 4     83%        ↑ +8%        copywriting_v4.md ← best
Iter 5     83%        → no change
Iter 6     83%        → no change
──────────────────────────────────────────────────
Best: copywriting_v4.md (83%)
Written back to: copywriting/SKILL.md

Summarise what changed between v0 and the best version — what instructions were added, removed, or rewritten, and why
If the target wasn't reached, note which assertions are still failing most often and suggest what might fix them
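The Change column of the history table can be derived from the sequence of pass rates alone. In this sketch each rate is compared against the best rate seen so far, which is an assumption about how the example table was produced; `change_notes` and `format_history` are hypothetical names.

```python
def change_notes(pass_rates):
    """Return one Change label per iteration, comparing each rate to the best so far."""
    notes = []
    best = None
    for rate in pass_rates:
        if best is None:
            notes.append("(baseline)")
            best = rate
        elif rate > best:
            # Gain is reported in percentage points over the previous best.
            notes.append(f"↑ +{round((rate - best) * 100)}%")
            best = rate
        elif rate < best:
            notes.append("↓ reverted")
        else:
            notes.append("→ no change")
    return notes


def format_history(pass_rates, filenames):
    """Render the table rows as aligned plain text."""
    lines = [f"{'Iteration':<11}{'Pass Rate':<11}{'Change':<13}File"]
    for i, (rate, note, name) in enumerate(zip(pass_rates, change_notes(pass_rates), filenames)):
        label = f"Iter {i}"
        lines.append(f"{label:<11}{rate:<11.0%}{note:<13}{name}".rstrip())
    return "\n".join(lines)
```

Feeding in the rates from the example run (0.54, 0.67, 0.58, 0.75, 0.83, 0.83, 0.83) gives the same Change labels as the table above.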

Step 4: Optional — run description optimiser

After the skill body is improved, offer to run the description optimisation loop from skill-creator to improve triggering accuracy. This is a separate step and uses scripts/run_loop.py from the skill-creator directory.

Step 5: Update Skills Map (mandatory if scope or triggers changed)

If the improvement loop changed the skill's description, trigger phrases, output type, or scope, update Sean's Skills Map at /Users/seanng/Documents/Sean's Oblique Vault/Skills Map.md so the vault's AI navigation stays accurate.

Skip this step only if the changes were purely internal (rewording instructions, fixing logic) without affecting how a user would invoke the skill or what it produces.

What to update:

Find the row for this skill in the appropriate domain table.
Update the trigger phrases, purpose, or output type to match the new SKILL.md.
If the description optimiser produced new trigger language, replace the old triggers with the new ones (use the actual user-facing phrases from the new description).
Update the file's "Last updated" line to today's date in DD/MM/YYYY format.
Read back the changed section to verify it parses cleanly.

How to update: use the Edit tool. Don't rewrite the file.

Notes for Cowork

Skills at the installed path are read-only. Always copy to the vault before running.
Spawn eval subagents in parallel (all in one message turn) — don't do them serially.
Save improvement history files as you go — don't batch at the end.
If an eval subagent produces a file output (like a .docx), note the path. Grade against what the output says it produced, not the file itself, unless the assertion explicitly requires file inspection.
The vault path for storing improved skills: Skills & Evals/<skill-name>-improved/
