From magi-researchers
Executes research code in src/ to generate artifacts in results/, reading commands from plan/research_plan.md YAML frontmatter or execution_manifest.json. Phase 3.5 of research pipeline with prerequisite checks.
Slash command
/magi-researchers:research-execute
Executes the research code in src/ to generate result artifacts in results/. This is Phase 3.5
of the research pipeline, sitting between Implementation (Phase 3) and Testing & Visualization (Phase 4).
Reads execution commands deterministically from the YAML frontmatter of plan/research_plan.md — no
keyword heuristics, no entry-point guessing. The full run command is defined once during planning and
executed here.
/research-execute [path/to/output/dir]
$ARGUMENTS: Optional path to the research output directory. If not provided, uses the most recent outputs/*/ directory.
When --claude-only is active, there are no Gemini/Codex calls in this skill. All steps are performed by Claude directly.
Resolve the output directory ($ARGUMENTS or most recent outputs/*/).
Read plan/research_plan.md and parse the YAML frontmatter:
---
languages: ["rust", "python"]
ecosystem: ["cargo", "uv"]
execution_cmd: "bash run_all.sh"
dry_run_cmd: "bash run_all.sh --dry-run"
expected_outputs:
- "results/metrics.csv"
- "results/checkpoint.pt"
estimated_runtime: "~30 minutes"
---
execution_manifest.json: Check if execution_manifest.json exists in the output directory root. If it does, read execution fields from this file instead of the YAML frontmatter:
{
"schema_version": "1.0.0",
"languages": ["rust", "python"],
"ecosystem": ["cargo", "uv"],
"execution_cmd": "bash run_all.sh",
"dry_run_cmd": "bash run_all.sh --dry-run",
"expected_outputs": [
{"path": "results/metrics.csv", "required": true},
{"path": "results/checkpoint.pt", "required": false}
],
"estimated_runtime": "~30 minutes"
}
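The choice between the two sources can be sketched as follows. This is a minimal sketch under stated assumptions, not the skill's actual implementation: `load_execution_config` is a hypothetical helper, the frontmatter is assumed to be already parsed into a dict, and treating a plain-string `expected_outputs` entry as required is an assumption.

```python
import json
from pathlib import Path


def load_execution_config(output_dir, frontmatter):
    """Return execution fields, preferring execution_manifest.json.

    `frontmatter` is the already-parsed YAML frontmatter of
    plan/research_plan.md (parsing it is outside this sketch).
    """
    manifest_path = Path(output_dir) / "execution_manifest.json"
    if manifest_path.is_file():
        config = json.loads(manifest_path.read_text(encoding="utf-8"))
        source = "manifest"
    else:
        config = dict(frontmatter)  # backward-compatible fallback
        source = "frontmatter"

    # Normalize both expected_outputs shapes to {"path", "required"} objects.
    # Assumption: a bare string from the frontmatter counts as required.
    normalized = []
    for item in config.get("expected_outputs", []):
        if isinstance(item, str):
            normalized.append({"path": item, "required": True})
        else:
            normalized.append(
                {"path": item["path"], "required": item.get("required", True)}
            )
    config["expected_outputs"] = normalized
    config["source"] = source  # hypothetical field, kept for reporting only
    return config
```

Normalizing both shapes up front lets later steps check `required` without caring which source was used.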
If execution_manifest.json exists, it takes precedence over YAML frontmatter fields. If it does not exist, fall back to the YAML frontmatter (backward compatibility).
If neither source defines execution_cmd: announce the problem to the user and ask them to provide the execution command manually. Do not guess. Suggest adding the frontmatter to research_plan.md following the schema above.
Verify that src/ exists and contains at least one file.
Check if results/ already exists and contains at least one file that is not run_log.txt, pre_execution_status.json, or pre_execution_status.md (legacy):
Glob: results/**/*
Exclusion: Exclude results/.staging/ from the existence check. Files under .staging/ are incomplete and must not trigger the 'results already exist' early-exit path.
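The existence check and its exclusions might look like the sketch below. `has_existing_results` is a hypothetical name, and matching the three ignored files only at the top level of results/ is an assumption; the text does not say whether nested copies should also be ignored.

```python
from pathlib import Path

# Files at the top level of results/ that do not count as artifacts.
IGNORED_FILES = {
    "run_log.txt",
    "pre_execution_status.json",
    "pre_execution_status.md",  # legacy
}


def has_existing_results(results_dir):
    """True if results/ holds at least one real artifact outside .staging/."""
    root = Path(results_dir)
    if not root.is_dir():
        return False
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        if rel.parts[0] == ".staging":
            continue  # incomplete files must not trigger the early exit
        if rel.as_posix() in IGNORED_FILES:
            continue
        return True
    return False
```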
If populated:
Compute SHA-256 hashes of all files in src/ and plan/research_plan.md. Compare against hashes stored in results/.source_hashes.json (if it exists).
If hashes match: Announce: "results/ already contains artifacts and source code is unchanged. Skipping re-execution."
If hashes differ or .source_hashes.json is missing: Announce: "results/ contains artifacts but source code has changed since they were generated. Re-execution recommended." Ask the user: "(a) Re-execute with current code, or (b) Keep existing results?"
Write results/pre_execution_status.json (if not already present) with the canonical EXISTING schema:
{
"state": "EXISTING",
"error_class": null,
"severity": null,
"retryable": false,
"downstream_allowed": true,
"traceback_ref": null,
"next_action": "proceed"
}
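The staleness comparison in this step could be sketched as below. `results_are_stale` is a hypothetical helper; it assumes the current hashes have already been computed into a dict that maps relative paths to `sha256:<hex>` strings, and that the stored file keeps them under a `hashes` key.

```python
import json
from pathlib import Path


def results_are_stale(current_hashes, results_dir):
    """Compare current source hashes with results/.source_hashes.json.

    Returns True when the stored file is missing or any hash differs,
    which is the case where the user is asked whether to re-execute.
    """
    stored_path = Path(results_dir) / ".source_hashes.json"
    if not stored_path.is_file():
        return True
    stored = json.loads(stored_path.read_text(encoding="utf-8"))
    # Dict equality also catches files that were added or removed.
    return stored.get("hashes", {}) != current_hashes
```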
If dry_run_cmd is specified in the frontmatter, run it first as a fast sanity check:
{dry_run_cmd} 2>&1 | tee results/dry_run_log.txt
Timeout: 60 seconds.
| Outcome | Action |
|---|---|
| Exit 0 | Continue to Step 3 |
| Non-zero exit | Read results/dry_run_log.txt, extract the traceback |
| Timeout | Kill process; report to user; ask whether to proceed to full run anyway |
On dry-run failure:
Trivial errors (missing results/ subdirectory, simple import): attempt one auto-fix, re-run dry-run.
Any other error: record the failure in results/pre_execution_status.json. Do NOT attempt auto-fix. Report to user with full traceback and recommend investigating the root cause before retrying.
If dry_run_cmd is not specified, skip this step and proceed directly to Step 3.
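A minimal sketch of the dry-run step, assuming a POSIX shell is available. `run_dry_run` is a hypothetical helper, and it redirects output straight into the log file rather than piping through `tee`, so nothing is echoed to the terminal.

```python
import subprocess
from pathlib import Path


def run_dry_run(dry_run_cmd, results_dir="results", timeout_s=60):
    """Run the dry-run command, log its output, and classify the outcome.

    Returns "ok" (exit 0), "failed" (non-zero exit), or "timeout".
    """
    log_path = Path(results_dir) / "dry_run_log.txt"
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with open(log_path, "w", encoding="utf-8") as log:
        try:
            proc = subprocess.run(
                dry_run_cmd,
                shell=True,
                stdout=log,
                stderr=subprocess.STDOUT,  # same effect as 2>&1
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the shell itself; children it spawned
            # are not cleaned up in this sketch.
            return "timeout"
    return "ok" if proc.returncode == 0 else "failed"
```

On "failed", the caller would then read dry_run_log.txt and extract the traceback as the table describes.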
Before executing the full run, announce:
Ready to execute:
Command: {execution_cmd}
Estimated runtime: {estimated_runtime or "unknown"}
Output will be captured to: results/run_log.txt
Pause for user confirmation before running.
If estimated_runtime suggests a long job (> 15 minutes), add:
⚠ This job may take a long time. If you prefer to run it manually:
1. Run externally: {execution_cmd}
2. Copy results to the `results/` directory, then call `/research-execute [output_dir]` — the skill will detect existing results and skip re-execution automatically (Step 1 Early Exit).
Wait for explicit user confirmation before executing.
Create results/ directory if it does not exist, then run:
Manifest overrides: If execution_manifest.json was loaded in Step 0:
If cwd is specified, cd to that directory before executing the command.
If env is specified (object of key-value pairs), prepend each as environment variable exports to the command (e.g., FOO=bar BAZ=qux {execution_cmd}).
If timeout_override_ms is specified, use it instead of the default 30-minute timeout below.
Command validation: Before executing, inspect execution_cmd for shell metacharacters (;, &&, ||, |, $(, `, >, <, &). If any are found beyond simple pipes to tee, warn the user and require explicit confirmation before proceeding. Validate that the cwd field (if present) is a relative subdirectory path with no .. traversal and that the directory exists.
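The validation rules above can be illustrated with a short sketch. Both function names are hypothetical, and the regular expression is only one possible reading of "simple pipes to tee" (an optional `2>&1` followed by `| tee <path>` at the end of the command).

```python
import re
from pathlib import Path

# A trailing "2>&1 | tee <path>" is treated as harmless (assumed reading).
TEE_SUFFIX = re.compile(r"\s*(2>&1\s*)?\|\s*tee\s+[\w./-]+\s*$")

# Longer tokens first so "&&" is not also reported as "&".
METACHARS = ["&&", "||", "$(", ";", "|", "`", ">", "<", "&"]


def risky_metacharacters(execution_cmd):
    """Return the shell metacharacters that need user confirmation."""
    rest = TEE_SUFFIX.sub("", execution_cmd)
    found = []
    for token in METACHARS:
        if token in rest:
            found.append(token)
            rest = rest.replace(token, " ")
    return found


def is_safe_cwd(cwd, base_dir):
    """True if cwd is a relative path without '..' that exists under base_dir."""
    rel = Path(cwd)
    if rel.is_absolute() or ".." in rel.parts:
        return False
    return (Path(base_dir) / rel).is_dir()
```

A non-empty result from `risky_metacharacters` is a reason to warn and ask, not to reject: a legitimate command may well chain steps with `&&`.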
Execute in an isolated process group to prevent orphaned child processes on timeout:
setsid bash -c '{execution_cmd} 2>&1 | tee results/run_log.txt'
Timeout: 30 minutes (adjust based on estimated_runtime if provided and > 30 min, or timeout_override_ms from manifest).
On timeout — staggered teardown:
1. Send SIGTERM to the process group: kill -TERM -$PGID
2. If the group is still running after a grace period, send SIGKILL: kill -KILL -$PGID
3. Append "EXECUTION TIMED OUT. Process group terminated." to results/run_log.txt
Atomic results staging: To prevent half-written results from triggering false early-exit on subsequent runs:
1. Create the staging directory: mkdir -p results/.staging/
2. Set RESULTS_DIR=results/.staging/ (if the execution script respects it) or configure output paths to write to results/.staging/
3. On success, move all files from results/.staging/ to results/: mv results/.staging/* results/ followed by rmdir results/.staging/
4. On failure or timeout, leave partial files in results/.staging/ (they will not trigger early-exit in Step 1)
5. Record in pre_execution_status.json that partial results are in .staging/
Note: If the execution script writes directly to paths that cannot be redirected, skip atomic staging and document this in the run log.
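The promotion of staged results after a successful run might be sketched as follows. `promote_staged_results` is a hypothetical helper. Unlike `mv results/.staging/*`, it also moves dotfiles, and like `mv` it is not atomic across the whole file set.

```python
import shutil
from pathlib import Path


def promote_staged_results(results_dir="results"):
    """Move everything from results/.staging/ into results/, then remove it.

    Call this only after a successful run; on failure the staged files
    are left in place so they cannot trigger the Step 1 early exit.
    Returns the names of the promoted entries.
    """
    root = Path(results_dir)
    staging = root / ".staging"
    if not staging.is_dir():
        return []
    moved = []
    for item in sorted(staging.iterdir()):
        shutil.move(str(item), str(root / item.name))
        moved.append(item.name)
    staging.rmdir()  # succeeds only because the directory is now empty
    return moved
```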
| Outcome | Detection | Action |
|---|---|---|
| Success | Exit code 0 | Continue to Step 5 |
| Runtime error | Non-zero exit | Step 4-FAIL path |
| Timeout | Still running | Kill process → Step 4-TIMEOUT path |
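The three outcomes in this table, together with the process-group teardown, can be sketched in Python. This is a POSIX-only illustration, not the skill's own mechanism: `run_with_group_timeout` is a hypothetical helper, `start_new_session=True` stands in for `setsid`, and the 10-second grace period is an assumed value.

```python
import os
import signal
import subprocess


def run_with_group_timeout(cmd, log_path, timeout_s=30 * 60, grace_s=10):
    """Run `cmd` in its own process group; tear the group down on timeout.

    Returns (outcome, exit_code), where outcome is "success", "error",
    or "timeout" (exit_code is None on timeout).
    """
    with open(log_path, "w", encoding="utf-8") as log:
        proc = subprocess.Popen(
            cmd,
            shell=True,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # new session and process group
        )
        try:
            code = proc.wait(timeout=timeout_s)
            return ("success" if code == 0 else "error", code)
        except subprocess.TimeoutExpired:
            pgid = os.getpgid(proc.pid)
            os.killpg(pgid, signal.SIGTERM)  # polite request first
            try:
                proc.wait(timeout=grace_s)
            except subprocess.TimeoutExpired:
                os.killpg(pgid, signal.SIGKILL)  # force after the grace period
                proc.wait()
            log.write("EXECUTION TIMED OUT. Process group terminated.\n")
            return ("timeout", None)
```

Signalling the group rather than the single PID is what keeps child processes started by the run script from being orphaned.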
Step 4-FAIL path (non-zero exit):
Read results/run_log.txt and extract the final traceback.
Record the failure in results/pre_execution_status.json. Do NOT attempt auto-fix. Report to user with full traceback and recommend investigating the root cause before retrying.
Step 4-TIMEOUT path:
Append "EXECUTION TIMED OUT." to results/run_log.txt.
Inventory what results/ contains so far.
Also check results/.staging/: if atomic staging was active, partial artifacts may be there instead of results/. Include both locations in the inventory presented to the user.
Write results/pre_execution_status.json → proceed to Step 4-PARTIAL.
Step 4-PARTIAL (failure or timeout):
Write results/pre_execution_status.json:
{
"state": "FAILED | PARTIAL",
"error_class": "dependency|compilation|runtime|timeout|resource|fatal|unknown",
"severity": "recoverable|blocking|fatal",
"retryable": true,
"downstream_allowed": true | false,
"traceback_ref": "results/run_log.txt",
"next_action": "retry|abort|user_decision"
}
Choose the appropriate state, error_class, and severity based on the failure mode:
Timeout: "state": "PARTIAL", "error_class": "timeout"
Failure with partial artifacts: "state": "PARTIAL"
Failure with no usable artifacts: "state": "FAILED"
Set "downstream_allowed": true if partial artifacts exist that downstream phases can use, false if nothing usable was produced. Set "retryable": true for transient failures (timeout, resource), false for deterministic failures (compilation, logic).
Announce clearly to the user. Do NOT block the pipeline; proceed to Step 6.
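One way to turn these rules into a status record is sketched below. `build_failure_status` is a hypothetical helper, and the state mapping (a timeout or any partial artifacts gives PARTIAL, otherwise FAILED) is an assumed reading of the rules above. `severity` and `next_action` are passed in because the text does not define how to derive them.

```python
# Failure classes the text calls transient, and therefore retryable.
TRANSIENT_CLASSES = {"timeout", "resource"}


def build_failure_status(error_class, has_partial_artifacts, severity, next_action):
    """Build the pre_execution_status.json payload for a failed or timed-out run."""
    if error_class == "timeout" or has_partial_artifacts:
        state = "PARTIAL"
    else:
        state = "FAILED"
    return {
        "state": state,
        "error_class": error_class,
        "severity": severity,
        "retryable": error_class in TRANSIENT_CLASSES,
        "downstream_allowed": bool(has_partial_artifacts),
        "traceback_ref": "results/run_log.txt",
        "next_action": next_action,
    }
```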
If execution succeeded:
Glob results/**/* and categorize by extension.
If expected_outputs is specified in frontmatter, verify each file exists. Report any missing ones.
Silent failure detection: If exit code was 0 but one or more required: true expected outputs are missing, treat this as state PARTIAL (not SUCCESS). Write pre_execution_status.json with "state": "PARTIAL", "error_class": "silent_failure", "severity": "recoverable", and "downstream_allowed": true. Announce the discrepancy to the user.
Write results/pre_execution_status.json:
{
"state": "SUCCESS",
"error_class": null,
"severity": null,
"retryable": false,
"downstream_allowed": true,
"traceback_ref": "results/run_log.txt",
"next_action": "proceed"
}
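The expected-output check behind silent failure detection could be sketched as below. `classify_zero_exit` is a hypothetical helper; it assumes `expected_outputs` uses the object form with `path` and `required`, and that an entry without `required` counts as required.

```python
from pathlib import Path


def classify_zero_exit(output_dir, expected_outputs):
    """Decide SUCCESS vs PARTIAL for a run that exited with code 0.

    Returns (state, missing_required, missing_optional).
    """
    base = Path(output_dir)
    missing_required, missing_optional = [], []
    for item in expected_outputs:
        if (base / item["path"]).is_file():
            continue
        if item.get("required", True):
            missing_required.append(item["path"])
        else:
            missing_optional.append(item["path"])
    # A clean exit with a missing required file is a silent failure.
    state = "PARTIAL" if missing_required else "SUCCESS"
    return state, missing_required, missing_optional
```

Missing optional files are still returned so they can be reported, but they do not downgrade the state.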
Note for downstream consumers: When reading pre_execution_status.json, always check the state field value; do not treat file existence alone as an indicator of success. See research-test Step 0 for the correct guard logic.
Legacy fallback: If pre_execution_status.md exists (legacy v0.8.x workspace), read it and treat any line containing SUCCESS/FAILED/PARTIAL/EXISTING as the state. New runs always write .json.
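A downstream reader following both notes might look like this sketch. `read_execution_state` is a hypothetical helper, not the guard logic of research-test itself, and preferring the JSON file when both files exist is an assumption.

```python
import json
from pathlib import Path

KNOWN_STATES = ("SUCCESS", "FAILED", "PARTIAL", "EXISTING")


def read_execution_state(results_dir):
    """Return the recorded state, or None if no status file exists."""
    root = Path(results_dir)
    json_path = root / "pre_execution_status.json"
    if json_path.is_file():
        # Check the field value; the file existing proves nothing by itself.
        data = json.loads(json_path.read_text(encoding="utf-8"))
        return data.get("state")
    legacy_path = root / "pre_execution_status.md"
    if legacy_path.is_file():
        for line in legacy_path.read_text(encoding="utf-8").splitlines():
            for state in KNOWN_STATES:
                if state in line:
                    return state
    return None
```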
Save source fingerprints for future staleness detection:
Compute SHA-256 hashes of all files in src/ and plan/research_plan.md.
Write them to results/.source_hashes.json:
{
"generated_at": "ISO-8601 timestamp",
"execution_cmd": "{execution_cmd}",
"hashes": {
"src/main.py": "sha256:abc123...",
"plan/research_plan.md": "sha256:def456..."
}
}
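Producing this fingerprint file might be done as in the sketch below. Both helpers are hypothetical names; using UTC for `generated_at` and keys relative to the output directory are assumptions chosen to match the example above.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def compute_source_hashes(output_dir):
    """Hash every file under src/ plus plan/research_plan.md."""
    base = Path(output_dir)
    files = sorted(p for p in (base / "src").rglob("*") if p.is_file())
    plan = base / "plan" / "research_plan.md"
    if plan.is_file():
        files.append(plan)
    return {
        p.relative_to(base).as_posix():
            "sha256:" + hashlib.sha256(p.read_bytes()).hexdigest()
        for p in files
    }


def save_source_hashes(output_dir, execution_cmd):
    """Write results/.source_hashes.json and return the payload."""
    base = Path(output_dir)
    payload = {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "execution_cmd": execution_cmd,
        "hashes": compute_source_hashes(base),
    }
    target = base / "results" / ".source_hashes.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return payload
```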
Announce success with the artifact summary.
Present to the user:
Command executed: {execution_cmd}
Next step: proceed to Phase 4 (/research-test) when ready
If the job was run externally, copy the artifacts into results/ before calling this skill. Step 1 (Early Exit) will detect the populated results/ and skip re-execution automatically.
Do not modify src/ files during this phase. If errors require code changes, roll back to Phase 3 (Implement).
npx claudepluginhub axect/magi-researchers --plugin magi-researchers