From dev
Generates a `program.md` for autonomous AI research experiments in the style of Karpathy's autoresearch. Interviews the user about the codebase, metrics, and constraints; explores the code; and tailors agent instructions from a template.
Slash command: `/dev:auto-research`
This skill is limited to the following tools:
You are creating a `program.md` — a natural language program that instructs an AI agent to conduct
autonomous research experiments. The generated document is not documentation; it is executable
instructions that an AI agent will follow literally, running experiments in an infinite loop.
The human writes a research plan (program.md). The AI agent executes the experiment loop. Code is the agent's operating target, not the human's. The human sleeps; the agent works.
Before generating anything, you need to understand the research context. Ask the user about these areas (adapt based on what they've already told you — skip questions they've answered):
`uv run train.py`, `python train.py`)
After the interview, read the key files to understand:
Read the template at `references/program-template.md` and fill it in based on the interview
and codebase exploration. The template contains `{{PLACEHOLDER}}` markers; replace each one
with content tailored to the user's project.
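A minimal Python sketch of the substitution step, assuming markers are uppercase `{{NAME}}` tokens and that the values come from the interview. `fill_template` is a hypothetical helper, not part of the skill. It refuses to produce output while any marker is still unfilled, so the agent never has to guess.

```python
import re

# Assumption: placeholders look like {{RUN_CMD}} (uppercase letters, digits, underscores).
PLACEHOLDER = re.compile(r"\{\{([A-Z0-9_]+)\}\}")


def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace every {{NAME}} marker; raise KeyError if any marker has no value."""
    missing = sorted(set(PLACEHOLDER.findall(template)) - set(values))
    if missing:
        raise KeyError(f"no value for placeholders: {missing}")
    # A function replacement avoids re treating backslashes in values as escapes.
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)
```

Usage: `fill_template(open("references/program-template.md").read(), answers)`, where `answers` is the dictionary collected during the interview.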
Preserve the original spirit: The generated program.md must retain ALL sections from the template. Never remove sections — only customize their content. The structure (Setup, Experimentation rules, Output format, Logging, Experiment loop, Timeout, Crashes, NEVER STOP) is sacred.
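To confirm that no section was dropped during customization, a check like the following could run over the finished file. It assumes each section is a markdown heading whose text contains the section name; `missing_sections` is a hypothetical helper.

```python
import re

# Section names as listed in this skill; matching them against headings is an assumption.
REQUIRED_SECTIONS = [
    "Setup", "Experimentation rules", "Output format", "Logging",
    "Experiment loop", "Timeout", "Crashes", "NEVER STOP",
]


def missing_sections(program_md: str) -> list[str]:
    """Return required section names that no markdown heading mentions."""
    headings = [
        m.group(1).strip().lower()
        for m in re.finditer(r"^#{1,6}\s+(.+)$", program_md, flags=re.MULTILINE)
    ]
    return [s for s in REQUIRED_SECTIONS if not any(s.lower() in h for h in headings)]
```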
Be specific: Replace generic placeholders with actual file names, actual commands, actual grep patterns. The agent following this document should not need to guess anything.
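As an example of that specificity, the metric-extraction step (the equivalent of `grep "^val_bpb:" run.log`) can be sketched in Python. The `val_bpb: <number>` log line is an assumption modeled on autoresearch-style training scripts; substitute whatever line the user's script actually prints. A missing line is treated as a crash.

```python
import re
from typing import Optional


def read_metric(log_text: str, name: str = "val_bpb") -> Optional[float]:
    """Return the last value logged as '<name>: <number>', or None if it never appeared."""
    pattern = re.compile(
        rf"^{re.escape(name)}:\s*([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)\s*$",
        flags=re.MULTILINE,
    )
    matches = pattern.findall(log_text)
    # No match usually means the run crashed before printing its summary.
    return float(matches[-1]) if matches else None
```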
Calibrate the noise threshold: Based on the metric and experiment duration, set an appropriate threshold for distinguishing real improvement from noise. Short runs with high variance need larger thresholds.
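One way to calibrate the threshold, sketched under the assumption that the unchanged baseline can be run a few times and that lower is better: take a multiple of the standard deviation across those repeats. The factor `k=2.0` is illustrative, not a value prescribed by the skill, and both helpers are hypothetical.

```python
import statistics


def noise_threshold(baseline_runs: list[float], k: float = 2.0) -> float:
    """k sample standard deviations of repeated, unmodified baseline runs."""
    if len(baseline_runs) < 2:
        raise ValueError("need at least two baseline runs to estimate noise")
    return k * statistics.stdev(baseline_runs)


def is_improvement(candidate: float, best: float, threshold: float) -> bool:
    """Lower-is-better metric: keep only if it beats the best by more than the noise."""
    return candidate < best - threshold
```

Short, high-variance runs produce a larger standard deviation and therefore a larger threshold, which matches the guidance above.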
Right-size the experiment priority list: Suggest experiment directions that make sense for the specific domain. An LLM training project has different levers than a reinforcement learning project or an image classifier.
Adapt constraints to the environment: A Mac with MPS has different constraints than an H100. Adjust VRAM warnings, batch size advice, and timeout values accordingly.
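A hedged sketch of how the hardware could be confirmed during the interview, assuming a PyTorch project. The `torch` calls are standard PyTorch APIs; the helper names are hypothetical. It falls back to `cpu` when PyTorch is not installed.

```python
def detect_accelerator() -> str:
    """Return 'cuda', 'mps', or 'cpu' depending on what PyTorch can see."""
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # absent in older PyTorch releases
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


def accelerator_summary() -> str:
    """Device name plus total memory for CUDA, useful when writing VRAM warnings."""
    device = detect_accelerator()
    if device == "cuda":
        import torch
        props = torch.cuda.get_device_properties(0)
        return f"cuda: {props.name}, {props.total_memory / 1e9:.1f} GB"
    return device
```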
Enforce non-interactive operations: The generated program.md must emphasize that all commands run unattended. The template includes a "Non-interactive principle" section — ensure the run command, git operations, and any project-specific commands are configured with non-interactive flags. If the user's workflow involves commands that might prompt for input, identify and document the non-interactive alternatives during the interview phase.
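The non-interactive principle can be illustrated with Python's `subprocess`: stdin is closed so any prompt fails fast instead of hanging, `GIT_TERMINAL_PROMPT=0` stops git from asking for credentials, and a timeout bounds each run. `run_unattended` is a hypothetical helper, and returning `None` on timeout is a design choice of this sketch.

```python
import os
import subprocess
import sys
from typing import Optional


def run_unattended(cmd: list[str], log_path: str, timeout_s: float) -> Optional[int]:
    """Run cmd with no stdin, logging stdout+stderr; return exit code or None on timeout."""
    env = dict(os.environ, GIT_TERMINAL_PROMPT="0")  # git must never prompt for input
    with open(log_path, "w") as log:
        try:
            proc = subprocess.run(
                cmd,
                stdin=subprocess.DEVNULL,  # any input() call hits EOF instead of waiting
                stdout=log,
                stderr=subprocess.STDOUT,
                env=env,
                timeout=timeout_s,  # subprocess.run kills the child when this expires
            )
        except subprocess.TimeoutExpired:
            return None
    return proc.returncode
```

Usage: `run_unattended(["uv", "run", "train.py"], "run.log", timeout_s=600)`, with the timeout chosen to fit the project's run length.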
Present the generated program.md to the user. Walk them through the key sections and confirm:
Make adjustments based on feedback.
Save the program.md to the project directory. Advise the user on how to start:
To start autonomous research:
1. Open a new Claude Code / AI agent session in the project directory
2. Prompt: "Read program.md and let's start. Do the setup first."
3. Confirm the setup, then let the agent run
4. Check results.tsv when you return
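For the last step, a small reader for `results.tsv`. The column names (`commit`, `val_bpb`, `memory_gb`, `status`, `description`) and the `keep`/`discard`/`crash` status values are assumptions; match them to whatever the generated program.md's Logging section specifies.

```python
import csv
import io
from collections import Counter
from typing import Optional


def _rows(results_tsv: str) -> list[dict]:
    return list(csv.DictReader(io.StringIO(results_tsv), delimiter="\t"))


def best_kept(results_tsv: str, metric: str = "val_bpb") -> Optional[dict]:
    """Row with the lowest metric among experiments marked 'keep', or None."""
    kept = [r for r in _rows(results_tsv) if r.get("status") == "keep" and r.get(metric)]
    return min(kept, key=lambda r: float(r[metric]), default=None)


def status_counts(results_tsv: str) -> Counter:
    """How many experiments were kept, discarded, or crashed overnight."""
    return Counter(r.get("status") for r in _rows(results_tsv))
```

Usage: `best_kept(open("results.tsv").read())`.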
npx claudepluginhub yanmxa/cc-plugins --plugin dev