Skill

spec-review

Verifies implementation matches design specification for functional completeness, test adequacy, and test coverage. Stage 1 of two-stage review.

testing

code-quality

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/exarchos:skills-copilot-spec-review

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Stage 1 of two-stage review: Verify implementation matches specification and follows TDD.

Supporting Files

references/rationalization-refutation.md
references/review-checklist.md
references/worked-example.md

SKILL.md

316 lines · ~3.3k tokens

Stats

Language: TypeScript

Stars: 45

Maintenance: Excellent

Last Commit: Jun 24, 2026


Spec Review Skill

Overview

Stage 1 of two-stage review: Verify implementation matches specification and follows TDD.

For a complete worked example, see references/worked-example.md.

MANDATORY: Before accepting any rationalization for approving without full verification, consult references/rationalization-refutation.md. Every common excuse is catalogued with a counter-argument and the correct action.

Triggers

Activate this skill when:

User runs /review command (first stage)
Task implementation is complete
Need to verify spec compliance before quality review
Subagent reports task completion

Execution Context

This skill runs in a SUBAGENT spawned by the orchestrator, not inline.

The orchestrator provides:

State file path (preferred) OR design/plan paths
Diff output from exarchos_orchestrate({ action: "review_diff" }) (context-efficient)
Task ID being reviewed

The subagent:

Reads state file to get artifact paths
Uses diff output instead of reading full files
Runs verification commands
Generates report
Returns verdict to orchestrator

Data Handoff Protocol

The orchestrator is responsible for generating the diff before dispatching the spec-review subagent. The subagent does NOT generate its own diff.

Orchestrator responsibilities:

Generate diff: exarchos_orchestrate({ action: "review_diff", worktreePath: "<worktree-path>", baseBranch: "main" })
Pass diff content in the subagent dispatch prompt
Include state file path for artifact resolution

Subagent responsibilities:

Receive diff content from dispatch prompt (do NOT re-generate)
Read state file for design/plan artifact paths
Run verification commands against the working tree
Return structured JSON verdict
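
The handoff above can be sketched as follows. This is a minimal illustration, assuming a hypothetical `buildSpecReviewPrompt` helper on the orchestrator side; the `HandoffInput` field names and the prompt layout are invented for the example and are not part of the exarchos API. The point it shows is that the diff is generated once upstream and embedded verbatim in the dispatch prompt.

```typescript
// Hypothetical shape of what the orchestrator hands to the spec-review subagent.
interface HandoffInput {
  taskId: string;        // task ID being reviewed
  stateFilePath: string; // the subagent reads this to resolve design/plan artifact paths
  diff: string;          // review_diff output, generated once by the orchestrator
}

// Builds the dispatch prompt. The diff is embedded as-is so the subagent
// consumes it from the prompt instead of regenerating it.
function buildSpecReviewPrompt(input: HandoffInput): string {
  return [
    `# Spec Review: ${input.taskId}`,
    `State file: ${input.stateFilePath}`,
    "",
    "## Diff (provided by orchestrator, do NOT re-generate)",
    input.diff,
  ].join("\n");
}

const examplePrompt = buildSpecReviewPrompt({
  taskId: "task-003",
  stateFilePath: "/tmp/feature.state.json",
  diff: "diff --git a/src/auth.ts b/src/auth.ts",
});
```

Keeping diff generation on the orchestrator side means every reviewer sees the same snapshot of the changes, even if the worktree moves between dispatches.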

Context-Efficient Input

Instead of per-worktree diffs, receive an integrated diff from the integration branch (e.g., feature/integration-branch) against main:

# Generate integrated diff for review
git diff main...feature/integration-branch > /tmp/combined-diff.patch

# Alternative: use review-diff script against integration branch via orchestrate
# exarchos_orchestrate({ action: "review_diff", worktreePath: "<worktree-path>", baseBranch: "main" })

This gives the reviewer every change across all tasks in a single view and, compared with reading full files, reduces context consumption by 80-90%.

Pre-Review Schema Discovery

Before evaluating, query the review-strategy runbook with exarchos_orchestrate({ action: "runbook", id: "review-strategy" }). The runbook determines the review approach from the diff scope, prior fix cycles, and review stage.

Review Scope

Review Scope: Combined Changes

After delegation completes, spec review examines:

The complete integrated diff (main...feature/integration-branch)
All changes across all tasks in one view
The full picture of combined functionality

This enables catching:

Cross-task interface mismatches
Bugs not visible in isolation
Combined behavior vs specification

Spec Review focuses on:

Functional completeness
Test adequacy (outcome-based, tier-scaled — not test-first ordering)
Specification alignment
Test coverage
Intended-vs-delivered: the delivered diff fulfils artifacts.intent — no intended-but-missing or delivered-but-unintended (scope-creep) work (when intentGrounding is supplied)

Does NOT cover (that's Quality Review):

Code style
SOLID principles
Performance optimization
Error handling elegance

Intended-vs-Delivered Grounding

The orchestrator captures the intended change as artifacts.intent (surfaces, a summary, and — when available — a one-line transcript summary) and threads it into your dispatch as an intentGrounding directive on the back-of-pipeline code-review path. When present, you MUST verify the delivered diff against the intended change:

Intended-but-missing — a surface or outcome the intent calls for that the diff does not deliver. Flag as a spec issue.
Delivered-but-unintended (scope creep) — changes outside the intended surfaces/summary with no spec justification. Flag as a spec issue.

When no intentGrounding is supplied (an empty or un-resolvable diff), proceed against the diff alone — do NOT fabricate an intent. This grounding is additive to the spec-alignment checks below, not a replacement for them.
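
The two grounding checks can be sketched as a set comparison. This is a minimal sketch under stated assumptions: the source only says artifacts.intent carries surfaces and a summary, so the `Intent` shape, the treatment of surfaces as path prefixes, and the `groundDiffAgainstIntent` helper are all hypothetical.

```typescript
// Hypothetical shape: surfaces are assumed to be path prefixes such as "src/auth/".
interface Intent {
  surfaces: string[];
  summary: string;
}

interface GroundingResult {
  intendedButMissing: string[];     // intended surfaces with no changed file under them
  deliveredButUnintended: string[]; // changed files outside every intended surface
}

function groundDiffAgainstIntent(
  intent: Intent | undefined,
  changedFiles: string[],
): GroundingResult | null {
  // No intentGrounding supplied: review against the diff alone, never fabricate an intent.
  if (!intent) return null;

  const intendedButMissing = intent.surfaces.filter(
    (surface) => !changedFiles.some((file) => file.startsWith(surface)),
  );
  const deliveredButUnintended = changedFiles.filter(
    (file) => !intent.surfaces.some((surface) => file.startsWith(surface)),
  );
  return { intendedButMissing, deliveredButUnintended };
}

const groundingExample = groundDiffAgainstIntent(
  { surfaces: ["src/auth/", "src/rate-limit/"], summary: "Add login rate limiting" },
  ["src/auth/login.ts", "src/billing/invoice.ts"],
);
// groundingExample.intendedButMissing     -> ["src/rate-limit/"]
// groundingExample.deliveredButUnintended -> ["src/billing/invoice.ts"]
```

The output is only a candidate list. A file outside the intended surfaces still needs a check against the spec before it is flagged as scope creep, because the rule above applies only to changes with no spec justification.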

Review Checklist

For the full checklist with verification commands, tables, and report template, see references/review-checklist.md.

Verification:

npm run test:run
npm run test:coverage
npm run typecheck

exarchos_orchestrate({
  action: "check_test_adequacy",
  featureId: "<featureId>",
  taskId: "<taskId>",
  branch: "<branch>",
  riskTier: "<low|medium|high>"
})

Fix Loop

If the review FAILS, the fix loop is bounded by the shared escalation policy (the config-resolvable escalation.maxIterations, default 5); it does NOT loop unboundedly. check_review_verdict returns the escalate decision, which the loop MUST honor. On a NEEDS_FIXES verdict it carries escalate: true plus an escalationReason whenever the loop must stop, that is, when the auto-fix bound was hit OR a finding is intent-touching. The report's routing instruction reflects the same decision.

Two outcomes:

- Auto-fix (under the bound, MECHANICAL findings): escalate is absent or falsy. Re-dispatch to the implementer and re-review, as below. The verdict report surfaces the remaining budget (e.g. "fix cycle N/maxIterations").
- Escalate to the user (ask-user): escalate is true. Do NOT re-dispatch /delegate --fixes. Surface the findings and escalationReason to the user and ask how to proceed (accept, redesign, or adjust scope). This happens when EITHER:
  - the auto-fix bound (escalation.maxIterations, default 5) is reached, meaning the loop has fixed and re-reviewed that many times without converging; OR
  - a finding is intent-touching, i.e. a spec-category issue (intended-but-missing or scope creep) that changes what was asked for, so a human decides rather than the loop silently "fixing" it. Intent-touching findings escalate immediately, regardless of how many cycles have run.

The fix-cycle count is event-sourced (prior review-verdict NEEDS_FIXES gate events) — there is no separate counter to maintain, and check_review_verdict reads it for you.
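
The escalation rule can be modelled in a few lines. This is an illustrative sketch only: the real decision comes from check_review_verdict, and the `Finding` shape, the `decideEscalation` name, and the exact comparison against the bound (`>=`) are assumptions made for the example.

```typescript
type FindingKind = "mechanical" | "intent-touching";

interface Finding {
  description: string;
  kind: FindingKind;
}

interface EscalationDecision {
  escalate: boolean;
  escalationReason?: string;
}

// priorFixCycles stands in for the event-sourced count of earlier NEEDS_FIXES verdicts.
function decideEscalation(
  findings: Finding[],
  priorFixCycles: number,
  maxIterations = 5,
): EscalationDecision {
  // Intent-touching findings escalate immediately, whatever the cycle count.
  const intentTouching = findings.find((f) => f.kind === "intent-touching");
  if (intentTouching) {
    return {
      escalate: true,
      escalationReason: `intent-touching finding: ${intentTouching.description}`,
    };
  }
  // Mechanical findings escalate only once the auto-fix bound is reached.
  if (priorFixCycles >= maxIterations) {
    return {
      escalate: true,
      escalationReason: `auto-fix bound reached (${priorFixCycles}/${maxIterations})`,
    };
  }
  return { escalate: false };
}

const escalationExample = decideEscalation(
  [{ description: "missing 429 response test", kind: "mechanical" }],
  2,
);
// escalationExample.escalate -> false, so the auto-fix path below applies
```

Checking intent-touching findings first mirrors the rule that they bypass the cycle budget entirely.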

Auto-fix path — re-dispatch to implementer:

Create fix task with specific issues
Dispatch to implementer (same or new)
Re-review after fixes — check_review_verdict re-evaluates the bound each pass

// Return to implementer (auto-fix path only — when escalate is falsy)
Task({
  model: "opus",
  description: "Fix spec review issues",
  prompt: `
# Fix Required: Spec Review Failed

## Issues to Fix
1. Missing rate limiting implementation
   - Add rate limiter middleware
   - Test: RateLimiter_ExceedsLimit_Returns429

2. Email validation incomplete
   - Add MX record check
   - Test: ValidateEmail_InvalidDomain_ReturnsError

## Success Criteria
- All tests pass
- Coverage >80%
- All issues resolved
`
})

Required Output Format

The subagent MUST return results as structured JSON. The orchestrator parses this JSON to populate state. Any other format is an error.

{
  "verdict": "pass | fail | blocked",
  "summary": "1-2 sentence summary",
  "issues": [
    {
      "severity": "HIGH | MEDIUM | LOW",
      "category": "spec | tdd | coverage",
      "file": "path/to/file",
      "line": 123,
      "description": "Issue description",
      "required_fix": "What must change"
    }
  ],
  "test_results": {
    "passed": 0,
    "failed": 0,
    "coverage_percent": 0
  }
}
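
On the consuming side, the format above could be checked before it is written to state. The following is a sketch, assuming every field shown in the template is required; the `parseSpecReviewResult` helper and the type names are hypothetical and not part of the exarchos codebase.

```typescript
const VERDICTS = ["pass", "fail", "blocked"] as const;
const SEVERITIES = ["HIGH", "MEDIUM", "LOW"] as const;
const CATEGORIES = ["spec", "tdd", "coverage"] as const;

interface ReviewIssue {
  severity: (typeof SEVERITIES)[number];
  category: (typeof CATEGORIES)[number];
  file: string;
  line: number;
  description: string;
  required_fix: string;
}

interface SpecReviewResult {
  verdict: (typeof VERDICTS)[number];
  summary: string;
  issues: ReviewIssue[];
  test_results: { passed: number; failed: number; coverage_percent: number };
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function isIssue(v: unknown): v is ReviewIssue {
  return (
    isRecord(v) &&
    (SEVERITIES as readonly unknown[]).includes(v.severity) &&
    (CATEGORIES as readonly unknown[]).includes(v.category) &&
    typeof v.file === "string" &&
    typeof v.line === "number" &&
    typeof v.description === "string" &&
    typeof v.required_fix === "string"
  );
}

// Parses the subagent's raw output; anything that is not the expected JSON shape throws.
function parseSpecReviewResult(raw: string): SpecReviewResult {
  const data: unknown = JSON.parse(raw); // throws SyntaxError on non-JSON output
  if (!isRecord(data)) throw new Error("result must be a JSON object");
  const { verdict, summary, issues, test_results: tests } = data;
  if (!(VERDICTS as readonly unknown[]).includes(verdict)) throw new Error("invalid verdict");
  if (typeof summary !== "string") throw new Error("summary must be a string");
  if (!Array.isArray(issues) || !issues.every(isIssue)) throw new Error("invalid issues");
  if (
    !isRecord(tests) ||
    typeof tests.passed !== "number" ||
    typeof tests.failed !== "number" ||
    typeof tests.coverage_percent !== "number"
  ) {
    throw new Error("invalid test_results");
  }
  return data as unknown as SpecReviewResult;
}
```

Note that the template's "pass | fail | blocked" notation lists alternatives; an actual result carries exactly one of those values.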

Anti-Patterns

| Don't | Do Instead |
| --- | --- |
| Skip to quality review | Complete spec review first |
| Accept incomplete work | Return for fixes |
| Review code style here | Save for quality review |
| Approve without tests | Require test coverage |
| Let scope creep pass | Flag over-engineering |

Cross-Task Integration Issues

If an issue spans multiple tasks:

Classify as "cross-task integration"
Create fix task specifying ALL affected tasks
Dispatch fix to implementer with context from all affected tasks
Mark original tasks as blocked until cross-task fix completes

State Management

On Review Complete

Pass:

exarchos_workflow({ action: "update", featureId: "<id>", updates: {
  "reviews": { "spec-review": { "status": "pass", "summary": "...", "issues": [] } }
}})

Fail:

exarchos_workflow({ action: "update", featureId: "<id>", updates: {
  "reviews": { "spec-review": { "status": "fail", "summary": "...", "issues": [{ "severity": "...", "file": "...", "description": "..." }] } }
}})

Important: The review value MUST be an object with a status field (e.g., { "status": "pass" }), not a flat string (e.g., "pass"). The all-reviews-passed guard silently ignores non-object entries. Accepted statuses: pass, passed, approved, fixes-applied.
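
To see why a flat string blocks the transition, here is a rough model of the guard. The actual all-reviews-passed implementation is not shown in this document, so `allReviewsPassed` and its exact semantics (at least one object entry, all of them passing) are assumptions for illustration.

```typescript
const PASSING_STATUSES = new Set(["pass", "passed", "approved", "fixes-applied"]);

// Assumed semantics: only object entries with a string status are counted;
// at least one must exist and every counted status must be a passing one.
function allReviewsPassed(reviews: Record<string, unknown>): boolean {
  const statuses: string[] = [];
  for (const value of Object.values(reviews)) {
    if (typeof value === "object" && value !== null && "status" in value) {
      const status = (value as { status: unknown }).status;
      if (typeof status === "string") statuses.push(status);
    }
    // Flat strings such as "pass" fall through here and are ignored.
  }
  return statuses.length > 0 && statuses.every((s) => PASSING_STATUSES.has(s));
}

const objectEntry = allReviewsPassed({ "spec-review": { status: "pass" } }); // true
const flatString = allReviewsPassed({ "spec-review": "pass" });              // false: entry ignored
```

Under this model the flat string is not treated as a failure; it simply never counts as a pass, which is why the transition stalls without an error.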

Phase Transitions and Guards

For the full transition table, consult @skills/workflow-state/references/phase-transitions.md.

Quick reference:

review → synthesize requires guard all-reviews-passed — all reviews.{name}.status must be passing
review → delegate requires guard any-review-failed — triggers fix cycle when any review fails

Schema Discovery

Use exarchos_workflow({ action: "describe", actions: ["update", "init"] }) for parameter schemas and exarchos_workflow({ action: "describe", playbook: "feature" }) for phase transitions, guards, and playbook guidance. Use exarchos_orchestrate({ action: "describe", actions: ["check_test_adequacy", "check_static_analysis"] }) for orchestrate action schemas.

Transition

All transitions happen immediately without user confirmation:

Pre-Chain Validation (MANDATORY)

Before invoking quality-review:

Verify reviews["spec-review"].status === "pass" in workflow state (all tasks passed)
If not: "Spec review did not pass, cannot proceed to quality review"

Guard shape: The all-reviews-passed guard requires reviews["spec-review"] to be an object with a status field set to a passing value (pass, passed, approved, fixes-applied). Flat strings like reviews: { "spec-review": "pass" } are silently ignored and will block the review → synthesize transition.

If PASS:

Record results — the reviews value MUST be an object with a status field, not a flat string:

exarchos_workflow({ action: "update", featureId: "<id>", updates: {
  reviews: { "spec-review": { status: "pass", summary: "...", issues: [] } }
}})

Output: "Spec review passed. Auto-continuing to quality review..."
Orchestrator dispatches quality-review subagent immediately

Gate events: Do NOT manually emit gate.executed events via exarchos_event. Gate events are automatically emitted by the check_review_verdict orchestrate handler. Manual emission causes duplicates.

If FAIL:

Record results with failing status and issue details:

exarchos_workflow({ action: "update", featureId: "<id>", updates: {
  reviews: { "spec-review": { status: "fail", summary: "...", issues: [{ severity: "HIGH", file: "...", description: "..." }] } }
}})

Output: "Spec review found [N] issues. Auto-continuing to fixes..."

Auto-invoke delegate with fix tasks:

[Invoke the exarchos:delegate skill with args: --fixes <plan-path>]

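This is NOT a human checkpoint: on the auto-fix path the workflow continues autonomously. The exception is a verdict with escalate: true (see Fix Loop), where you stop and ask the user instead of invoking delegate.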

Troubleshooting

| Issue | Cause | Resolution |
| --- | --- | --- |
| Test file not found | Task didn't create expected test | Check plan for test file paths, verify worktree contents |
| Coverage below threshold | Implementation incomplete or tests superficial | Add missing test cases, verify assertions are meaningful |
| Test-adequacy kill-probe fails | A new/changed test still passes against reverted source (vacuous) | Strengthen the test so reverting the implementation makes it fail |
| Diff too large for context | Many tasks with large changes | Generate per-worktree diffs with `exarchos_orchestrate({ action: "review_diff", worktreePath: "<task-worktree>" })` to review incrementally |

Performance Notes

Use the integrated diff (exarchos_orchestrate({ action: "review_diff" })) instead of reading full files — reduces context by 80-90%
Review per-task when the combined diff exceeds 2,000 lines
Run the test-adequacy kill-probe (exarchos_orchestrate({ action: "check_test_adequacy" })) in parallel with spec tracing
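
The 2,000-line rule of thumb can be applied mechanically. This is a small sketch, assuming "lines" means total lines of diff text (headers and context included); `chooseReviewMode` and the constant name are hypothetical.

```typescript
const PER_TASK_REVIEW_THRESHOLD = 2000;

// Counts lines of diff text, ignoring a single trailing newline.
function countDiffLines(diff: string): number {
  if (diff.length === 0) return 0;
  return diff.replace(/\n$/, "").split("\n").length;
}

// Falls back to per-task review only when the combined diff exceeds the threshold.
function chooseReviewMode(diff: string): "combined" | "per-task" {
  return countDiffLines(diff) > PER_TASK_REVIEW_THRESHOLD ? "per-task" : "combined";
}

const smallDiffMode = chooseReviewMode("diff --git a/a.ts b/a.ts\n+const x = 1;\n");
// smallDiffMode -> "combined"
```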
