Running Skills EDD Cycle

Run evaluation-driven development cycle for agent skills.

Workflow

Step 1: Build Evaluations First

Create evaluations BEFORE writing documentation. This ensures skills solve real problems.

Run Claude on representative tasks WITHOUT the skill
Document specific failures or missing context
Create 3+ evaluation scenarios that test these gaps

Evaluation scenarios are saved to tests/scenarios.md as part of /creating-effective-skills workflow (Step 7).

Step 2: Establish Baseline

Measure Claude's performance WITHOUT the skill:

Run each evaluation scenario
Record: success/failure, missing context, wrong approaches
This becomes comparison baseline

Step 3: Write Minimal Instructions

Create just enough content to address the gaps:

Start with core workflow only
Add detail only when tests fail
Avoid over-explaining

REQUIRED: Run /creating-effective-skills before writing any skill content. This ensures proper naming, description format, and structure from the start.

Step 4: Evaluate with Multiple Models

Note: This step requires Claude Code CLI. Skip this step if using Claude.ai (model selection not available).

REQUIRED: Run /evaluating-skills-with-models with the skill path.

The skill will:

Auto-load scenarios from tests/scenarios.md
Execute with sub-agents across models (sonnet, opus, haiku)
Evaluate against expected behaviors
Determine recommended model (least capable with full compatibility)

After evaluation: Document recommended model in skill's metadata.

REQUIRED: Run /improving-skills when observations reveal issues.

Step 5: Final Review

Before considering the skill complete:

REQUIRED: Run /reviewing-skills to verify compliance with best practices.

Address all compliance issues identified
Re-run evaluations after fixes
Repeat until skill passes review

Step 6: User Validation Guide

After all reviews pass, output instructions for user to validate in a fresh session:

## Test Your Skill

Run this command in a new terminal to test with a fresh Claude session:

claude --model {recommended_model} "{evaluation_query}"

After testing, paste the output file or result back to this session for final confirmation.

Replace:

{recommended_model}: Model determined in Step 4 (e.g., sonnet)
{evaluation_query}: A representative query from your evaluations

Quick Reference

Cycle

Identify gaps -> Create evaluations -> Baseline -> Write minimal -> Model eval (sub-agents) -> Review -> User validation

What Observations Indicate

Observation	Indicates
Unexpected file reading order	Structure not intuitive
Missed references	Links need to be explicit
Repeated reads of same file	Move content to SKILL.md
Never accessed file	Unnecessary or poorly signaled

running-skills-edd-cycle