From oblique-cowork
Autonomously improve any Oblique Claude skill using an AutoResearch-style loop — no API key needed. Runs eval cases against the current skill, grades outputs, proposes improvements to SKILL.md, keeps if better or reverts if worse, and iterates until the target pass rate is hit or max iterations are reached. Use this skill whenever Sean asks to "improve a skill", "auto-improve", "run the loop on [skill]", "audit [skill name]", "make [skill] better", or "run autoresearch on [skill]". Also trigger when Sean says he wants to improve multiple skills overnight or asks to run evals on any skill.
Slash command: /oblique-cowork:auto-improve-skill
Autonomous AutoResearch-style loop for improving Claude skills. Mirrors karpathy/autoresearch:
| autoresearch | this skill |
|---|---|
| train.py | SKILL.md (the file being improved) |
| program.md | evals/evals.json (what "good" looks like) |
| 5-min GPU budget | N eval cases (fixed evaluation budget) |
| val_bpb metric | assertion pass rate (higher = better) |
| keep/revert | commit if improved, revert if regression |
| overnight loop | --max-iter N autonomous iterations |
Ask (or infer from context) if not already clear:
evals/evals.json? If not, offer to generate one from the skill's purpose.

Skill directories live at:
- /var/folders/.../T/claude-hostloop-plugins/.../skills/<skill-name>/ (read-only; copy to the vault first)
- /Users/seanng/Documents/Sean's Oblique Vault/Skills & Evals/<skill-name>/ (writable)

If the skill is in the read-only installed location, copy it to the vault before starting:
```bash
cp -r /path/to/installed/skills/<skill-name> \
  "/Users/seanng/Documents/Sean's Oblique Vault/Skills & Evals/<skill-name>-improve/"
```
Read SKILL.md from the skill directory. Read evals/evals.json. If no evals exist, generate them now (see Step 1b).
Eval format (compatible with skill-creator):
```json
{
  "skill_name": "skill-name",
  "evals": [
    {
      "id": 1,
      "prompt": "A realistic user prompt that should trigger this skill",
      "assertions": [
        {"name": "assertion_name", "criteria": "Plain-English description of what the output must do or contain"}
      ]
    }
  ]
}
```
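A minimal sketch of checking a hand-written or generated evals.json against this format before the loop starts. load_evals is a hypothetical helper (not part of skill-creator), and the example case inside it is illustrative only.

```python
import json

def load_evals(text: str) -> dict:
    """Parse evals.json text and check it matches the format above.

    Hypothetical helper: raises ValueError on the first structural problem.
    """
    data = json.loads(text)
    if not isinstance(data.get("skill_name"), str):
        raise ValueError("missing or non-string 'skill_name'")
    evals = data.get("evals")
    if not isinstance(evals, list) or not evals:
        raise ValueError("'evals' must be a non-empty list")
    seen_ids = set()
    for case in evals:
        case_id = case.get("id")
        if case_id in seen_ids:
            raise ValueError(f"duplicate eval id {case_id!r}")
        seen_ids.add(case_id)
        if not isinstance(case.get("prompt"), str) or not case["prompt"].strip():
            raise ValueError(f"eval {case_id!r} has an empty prompt")
        assertions = case.get("assertions")
        if not isinstance(assertions, list) or not assertions:
            raise ValueError(f"eval {case_id!r} has no assertions")
        for a in assertions:
            # Every assertion needs both a short name and plain-English criteria.
            if not a.get("name") or not a.get("criteria"):
                raise ValueError(f"eval {case_id!r} has an assertion without name/criteria")
    return data

example = """{
  "skill_name": "copywriting",
  "evals": [
    {"id": 1,
     "prompt": "Write a landing-page headline for a budgeting app",
     "assertions": [{"name": "single_headline", "criteria": "Outputs one headline"}]}
  ]
}"""
data = load_evals(example)
print(len(data["evals"]))  # 1
```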
Good assertions are:
If evals/evals.json doesn't exist, generate 5–8 eval cases based on the skill's purpose. Each case should be a realistic prompt a user would actually send. Write assertions that test the most important output properties. Save to evals/evals.json and show Sean before proceeding.
Track: best_pass_rate, best_skill_content, current_skill_content, iteration, history[]
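One way to hold this state, as a sketch: LoopState and initial_state are hypothetical names, and starting iteration at 0 for the baseline run is an assumption taken from the summary example further down (Iter 0 is the baseline).

```python
from dataclasses import dataclass, field

@dataclass
class LoopState:
    """Variables the loop tracks (names follow the Track line above)."""
    current_skill_content: str
    best_skill_content: str
    best_pass_rate: float = -1.0   # -1 so the baseline run always counts as a new best
    iteration: int = 0             # assumption: 0 is the baseline run
    no_improvement_streak: int = 0
    history: list = field(default_factory=list)  # one entry per iteration

def initial_state(skill_md: str) -> LoopState:
    # Both copies start as the SKILL.md content read at the start.
    return LoopState(current_skill_content=skill_md, best_skill_content=skill_md)
```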
Initialise:
- best_pass_rate = -1
- best_skill_content = current SKILL.md content
- no_improvement_streak = 0

2a. Evaluate the skill
For each eval case, spawn a subagent with this prompt:
```text
You are Claude operating with the following skill instructions. Follow them exactly.
[SKILL.md content]
---
User: [eval prompt]
```
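Filling the template is plain string substitution. A sketch, with build_eval_prompt as a hypothetical helper name:

```python
def build_eval_prompt(skill_md: str, user_prompt: str) -> str:
    """Fill the subagent template above with the skill text and one eval prompt."""
    return (
        "You are Claude operating with the following skill instructions. "
        "Follow them exactly.\n"
        f"{skill_md}\n"
        "---\n"
        f"User: {user_prompt}"
    )
```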
Collect the output. Then grade it against the assertions.
Grade each assertion as PASS or FAIL with a one-line reason. Calculate pass_rate = assertions_passed / total_assertions.
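The pass rate pools assertions across all eval cases, so a case with more assertions carries more weight than one with fewer. A worked sketch, assuming grades are recorded as one list of PASS/FAIL booleans per eval case (pass_rate here is a hypothetical helper):

```python
def pass_rate(grades: list[list[bool]]) -> float:
    """Pooled pass rate: every assertion across every eval case counts once.

    grades[i][j] is True if assertion j of eval case i was graded PASS.
    """
    total = sum(len(case) for case in grades)
    passed = sum(sum(case) for case in grades)
    return passed / total if total else 0.0

# Example: 3 eval cases with 4, 3 and 2 assertions.
grades = [
    [True, True, False, True],
    [True, False, False],
    [True, True],
]
print(f"{pass_rate(grades):.0%}")  # 6 of 9 assertions -> 67%
```

Averaging per-case rates instead would give (3/4 + 1/3 + 2/2) / 3 ≈ 69%, so whichever definition is used should stay fixed across iterations for the keep/revert comparison in 2b to be like-for-like.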
In Cowork: spawn all eval subagents in a single message (parallel). While they run, draft or review the assertions. When results return, grade inline — no need for a separate grader subagent unless there are many evals.
2b. Commit or revert (the autoresearch core)
```python
if pass_rate > best_pass_rate:
    best_pass_rate = pass_rate
    best_skill_content = current_skill_content
    no_improvement_streak = 0
    print("↑ New best: locking this version")
elif pass_rate < best_pass_rate:
    current_skill_content = best_skill_content  # revert
    no_improvement_streak += 1
    print("↓ Regression: reverting to best")
else:  # equal
    no_improvement_streak += 1
    print("→ No change")
```
2c. Check stopping conditions
Stop if any of:
- pass_rate >= target → "✓ Target reached"
- iteration == max_iter → "Max iterations reached"
- no_improvement_streak >= 3 → "No improvement for 3 iterations; stopping"

2d. Propose improvements
If not stopping, read the failures and improve the skill content:
Apply the improvements to current_skill_content. Write a brief note on what changed and why.
Key improvement principles (from skill-creator):
2e. Save version and loop
Save the updated SKILL.md to the skill directory. Save a versioned copy named after the skill so it's unambiguous:
<skill-dir>/improvement_history/<skill-name>_v<N>.md
Example: copywriting/improvement_history/copywriting_v3.md
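A sketch of the save step using pathlib; version_path and save_version are hypothetical helper names, and creating improvement_history/ on first use is an assumption:

```python
from pathlib import Path

def version_path(skill_dir: str, skill_name: str, n: int) -> Path:
    """Versioned copy location: <skill-dir>/improvement_history/<skill-name>_v<N>.md"""
    return Path(skill_dir) / "improvement_history" / f"{skill_name}_v{n}.md"

def save_version(skill_dir: str, skill_name: str, n: int, content: str) -> Path:
    """Write SKILL.md and its versioned copy; returns the versioned path."""
    (Path(skill_dir) / "SKILL.md").write_text(content, encoding="utf-8")
    path = version_path(skill_dir, skill_name, n)
    path.parent.mkdir(parents=True, exist_ok=True)  # create improvement_history/ if missing
    path.write_text(content, encoding="utf-8")
    return path

print(version_path("copywriting", "copywriting", 3).as_posix())
# copywriting/improvement_history/copywriting_v3.md
```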
Increment iteration and go to 2a.
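Putting 2a to 2e together, the whole loop can be sketched as one function. evaluate and propose are hypothetical callables standing in for the eval subagents plus grading (2a) and for the proposal step (2d); the versioned save from 2e is left out to keep it short. Rates are fractions (0.83 rather than 83%), and the target and max_iter defaults are placeholders, not values from this skill.

```python
from typing import Callable

def improve(
    skill_md: str,
    evaluate: Callable[[str], float],      # hypothetical: runs all evals, returns pass rate
    propose: Callable[[str, float], str],  # hypothetical: returns revised skill text
    target: float = 0.9,
    max_iter: int = 10,
) -> tuple[str, float, list]:
    current = best = skill_md
    best_rate = -1.0
    streak = 0
    history = []
    iteration = 0
    while True:
        rate = evaluate(current)                 # 2a: evaluate
        if rate > best_rate:                     # 2b: keep
            best_rate, best, streak = rate, current, 0
            change = "new best"
        elif rate < best_rate:                   # 2b: revert
            current = best
            streak += 1
            change = "reverted"
        else:                                    # 2b: equal
            streak += 1
            change = "no change"
        history.append((iteration, rate, change))
        # 2c: stop on target, iteration budget, or 3 iterations without improvement
        if rate >= target or iteration == max_iter or streak >= 3:
            break
        current = propose(current, rate)         # 2d: propose improvements
        iteration += 1                           # 2e: (saving omitted) and loop
    return best, best_rate, history
```

Fed the pass rates from the summary example (0.54, 0.67, 0.58, 0.75, 0.83, 0.83, 0.83) with max_iter=6, this stops after iteration 6 and returns the iteration-4 content as best, matching copywriting_v4.md in that table.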
When the loop ends:
- Write best_skill_content back to SKILL.md.

```text
Skill: copywriting
Run: 20/04/2026 14:32
─────────────────────────────────────────────────────
Iteration   Pass Rate   Change        File
─────────────────────────────────────────────────────
Iter 0      54%         (baseline)    copywriting_v0.md
Iter 1      67%         ↑ +13%        copywriting_v1.md
Iter 2      58%         ↓ reverted    copywriting_v2.md
Iter 3      75%         ↑ +8%         copywriting_v3.md
Iter 4      83%         ↑ +8%         copywriting_v4.md  ← best
Iter 5      83%         → no change
Iter 6      83%         → no change
─────────────────────────────────────────────────────
Best: copywriting_v4.md (83%)
Written back to: copywriting/SKILL.md
```
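The Change column compares each run against the best so far, not the previous row: Iter 3's +8% is 75 - 67, not 75 - 58, because Iter 2 was reverted. A sketch that reproduces the column from whole-number percentages (change_column is a hypothetical name):

```python
def change_column(rates: list[int]) -> list[str]:
    """Labels for the Change column, given whole-number pass rates in run order."""
    labels: list[str] = []
    best = None
    for rate in rates:
        if best is None:
            labels.append("(baseline)")
            best = rate
        elif rate > best:
            labels.append(f"↑ +{rate - best}%")  # gain over the best so far
            best = rate
        elif rate < best:
            labels.append("↓ reverted")          # best is unchanged
        else:
            labels.append("→ no change")
    return labels

rates = [54, 67, 58, 75, 83, 83, 83]  # pass rates from the summary example
for i, (rate, label) in enumerate(zip(rates, change_column(rates))):
    print(f"Iter {i}  {rate}%  {label}")
```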
After the skill body is improved, offer to run the description optimisation loop from skill-creator to improve triggering accuracy. This is a separate step and uses scripts/run_loop.py from the skill-creator directory.
If the improvement loop changed the skill's description, trigger phrases, output type, or scope, update Sean's Skills Map at /Users/seanng/Documents/Sean's Oblique Vault/Skills Map.md so the vault's AI navigation stays accurate.
Skip this step only if the changes were purely internal (rewording instructions, fixing logic) without affecting how a user would invoke the skill or what it produces.
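A cheap first check for whether this step applies is whether the frontmatter description changed. A sketch, assuming SKILL.md opens with YAML frontmatter between --- lines and a single-line description: field (the usual Claude skill layout); the helper names are hypothetical, and the check says nothing about output type or scope, which still need a judgment call.

```python
from typing import Optional

def frontmatter_field(skill_md: str, key: str) -> Optional[str]:
    """Return a top-level `key: value` from frontmatter delimited by '---' lines.

    Deliberately naive: single-line values only, no YAML library.
    """
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        if line.startswith(f"{key}:"):
            return line[len(key) + 1:].strip()
    return None

def needs_skills_map_update(old_md: str, new_md: str) -> bool:
    # Only catches description edits; trigger, output-type and scope changes need review.
    return frontmatter_field(old_md, "description") != frontmatter_field(new_md, "description")
```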
What to update:
How to update: use the Edit tool. Don't rewrite the file.
Skills & Evals/<skill-name>-improved/

npx claudepluginhub seanng23/oblique-power-skills --plugin oblique-cowork