From prd2impl
Milestone gate verification — run automated checks and produce a structured pass/fail report for a milestone. Use when the user says 'smoke test', 'verify milestone', 'M1 gate check', or runs /smoke-test.
Slash command
/prd2impl:skill-10-smoke-test
<SUBAGENT-STOP>
Run milestone gate verification: check task completion, run automated tests, verify artifacts, and produce a structured go/no-go report.
/smoke-test {milestone} (e.g., /smoke-test M1)
- {milestone}: milestone ID (M0, M1, M2)
- {plans_dir}/*-execution-plan.yaml (milestone definitions, gate checks)
- {plans_dir}/tasks.yaml or {plans_dir}/task-status.md (task statuses)
- .artifacts/registry.json (artifact completeness)
Path resolution: Before constructing any read path, resolve {plans_dir} per lib/plans-dir-resolver.md. All docs/plans/ references (except docs/plans/project.yaml, which stays at repo root) are relative to that resolved directory. .artifacts/ paths are NOT scoped: they remain shared across plans_dir (see design spec §8 Limitation 1).
dev-loop-skills:skill-0-project-builder maintains a
baseline_commit frontmatter on the project's "Skill 1" knowledge
file plus a self-update.sh --check script returning drift count
(new top-level modules, renamed dirs, etc. since the last bootstrap).
Before running any milestone test verification, check it:
/project-builder self-update --check
Gate rule:
- drift_count > 50 (configurable via {plans_dir}/project.yaml::drift_threshold) → emit a STAGED warning prompting the user to run /bootstrap re-baseline before gate close. NOT an automatic NO-GO: drift can be intentional.
- drift_count <= 50 → proceed silently.
Why: a stale module map silently passes milestone gates against
imagined code structure. PV2 shipped pipeline_v2/kb_mcp/ because
the planning step didn't know cc_pool.py:691 already auto-injects
an MCP server. A re-baseline before PV2 task-gen would have surfaced
the duplication.
Graceful degradation: when dev-loop-skills is missing, skip this check with a
logged warning; the gate proceeds.
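The drift rule above can be sketched as a small function. This is a minimal illustration, not the skill's actual implementation: the function name `drift_gate`, the `None` convention for "dev-loop-skills missing", and the `"SKIPPED"` / `"PROCEED"` labels are assumptions made for the example. Only the threshold of 50 and the STAGED outcome come from the text.

```python
def drift_gate(drift_count, threshold=50):
    """Map a drift count to a gate signal.

    drift_count is None when dev-loop-skills is missing (hypothetical
    convention for this sketch); threshold mirrors drift_threshold.
    """
    if drift_count is None:
        # Graceful degradation: skip with a logged warning, gate proceeds.
        return "SKIPPED"
    if drift_count > threshold:
        # A warning for the user, not an automatic NO-GO.
        return "STAGED"
    return "PROCEED"
```

Note that the comparison is strict: a drift count exactly equal to the threshold proceeds silently.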
Verify all tasks in this milestone's phase are completed:
## Task Completion — M1
| Task | Name | Status | Result |
|------|------|--------|--------|
| T1A.1 | Mode/Gate | 🟩 | ✅ Pass |
| T1A.2 | Timer | 🟩 | ✅ Pass |
| T1A.3 | EventBus | 🟩 | ✅ Pass |
| T1B.2 | Message UI | 🟦 | ❌ Still in progress |
Result: 16/17 complete — FAIL (1 task remaining)
If any tasks are not complete, report and ask whether to proceed with partial verification.
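A rough sketch of how the completion result line could be derived is shown below. It assumes task statuses are available as plain strings and that "completed" and "verified" both count as done (the two values the E2E precondition later in this document accepts). The function name and the `in_progress` value are hypothetical.

```python
DONE_STATUSES = {"completed", "verified"}  # assumption: both count as done


def completion_summary(task_status):
    """Return (result line, list of unfinished task IDs) for a milestone."""
    pending = sorted(t for t, s in task_status.items() if s not in DONE_STATUSES)
    done = len(task_status) - len(pending)
    result = "PASS" if not pending else "FAIL"
    return f"{done}/{len(task_status)} complete: {result}", pending
```

The returned list of pending task IDs is what would be reported to the user before asking whether to proceed with partial verification.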
Triggers when: at least one task in the milestone has source_plan_path in its tasks.yaml entry.
For each such task, read the matching task-hints.yaml entries (rich per-plan-task data) to extract per-plan-task files.create + files.modify lists. If task-hints.yaml is missing or out of sync, re-parse the plan with skill-0-ingest/lib/plan-parser.md (Rule 3) as a fallback. Cross-check against actual git history.
The report breaks each task down into per-PLAN-TASK rows (e.g. T1 / plan-task-1, T1 / plan-task-2) for granularity, even though the prd2impl task is defined at plan-FILE level. This is the "richness preserved" half of the plan-passthrough deal: task-hints.yaml is the source of truth for that richness.
For each prd2impl task T with source_plan_path = P:
- Select the task_hints.tasks[] entries whose source_plan_path == P (these are the plan-tasks within this prd2impl task).
- For each plan-task pt (1-based index i), record:
  - declared_create[T/pt_i] = pt.files.create
  - declared_modify[T/pt_i] = pt.files.modify
If task-hints.yaml cannot be located (e.g. ingest-docs was never run or task-hints.yaml was deleted), parse P directly with plan-parser and use the parsed tasks[]. Surface a WARN: "task-hints.yaml not found for {P}; re-parsed plan at smoke-test time (slower; please regenerate via /ingest-docs)."
git diff --name-status {base_branch}...HEAD | awk '$1 == "A" { print $2 }' # actually created
git diff --name-status {base_branch}...HEAD | awk '$1 == "M" { print $2 }' # actually modified
Define:
- actual_create = the "A" set
- actual_modify = the "M" set
The actual set is computed ONCE for the whole milestone (not per-plan-task): it is the cumulative diff vs the milestone's base branch.
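As an alternative to the two awk pipelines, the same sets can be built from a single capture of the diff output. The sketch below assumes the text of `git diff --name-status {base_branch}...HEAD` has already been captured as a string. The function name is hypothetical. Like the awk filters, it ignores deletions and renames.

```python
def parse_name_status(diff_output):
    """Split `git diff --name-status` output into (created, modified) path sets.

    Each line is a status field, a tab, then the path. Only "A" and "M"
    rows are kept, matching the awk filters above.
    """
    created, modified = set(), set()
    for line in diff_output.splitlines():
        parts = line.split("\t")
        if len(parts) < 2:
            continue  # blank or malformed line
        status, path = parts[0], parts[1]
        if status == "A":
            created.add(path)
        elif status == "M":
            modified.add(path)
    return created, modified
```

Splitting on tabs rather than arbitrary whitespace keeps paths that contain spaces intact, which the `$2` field in awk would truncate.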
For each T/pt_i row built in Step 2.5.1:
| Delta | Definition | Severity |
|---|---|---|
| missing_create | declared_create[T/pt_i] ∩ NOT(actual_create) | NO-GO (declared file does not exist) |
| unexpected_create | actual_create - ⋃(declared_create across all T/pt_i) | WARN (file created outside any plan; reported ONCE at milestone level, not per-plan-task) |
| declared_modify_not_modified | declared_modify[T/pt_i] ∩ NOT(actual_modify ∪ actual_create) | NO-GO (plan said to modify but no diff) |
| unexpected_modify | actual_modify - ⋃(declared_modify across all T/pt_i) | WARN (modification outside any plan; reported ONCE at milestone level) |
declared_modify_not_modified subtracts actual_create because a file declared as "modify" but actually created from scratch in this milestone is a NAMING mismatch, not a missing change — surface it as a WARN with hint "plan said modify; actual was create — was this file new this milestone?"
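The delta table maps directly onto set operations. The following is a minimal sketch under stated assumptions: declared files are given as dicts keyed by the `T/pt_i` label, actual files as plain sets, and the key `modify_was_create` is a hypothetical name for the "plan said modify; actual was create" WARN case.

```python
def plan_deltas(declared_create, declared_modify, actual_create, actual_modify):
    """Compute per-plan-task and milestone-level deltas.

    declared_* map a 'T/pt_i' label to a set of paths;
    actual_* are milestone-wide sets of paths.
    """
    per_task = {}
    for key in sorted(set(declared_create) | set(declared_modify)):
        dc = declared_create.get(key, set())
        dm = declared_modify.get(key, set())
        per_task[key] = {
            # NO-GO: declared file does not exist
            "missing_create": dc - actual_create,
            # NO-GO: plan said modify, but no diff at all
            "declared_modify_not_modified": dm - (actual_modify | actual_create),
            # WARN: plan said modify, file was created this milestone
            "modify_was_create": dm & actual_create,
        }
    all_declared_create = set().union(*declared_create.values())
    all_declared_modify = set().union(*declared_modify.values())
    milestone = {
        # WARN rows, reported once at milestone level
        "unexpected_create": actual_create - all_declared_create,
        "unexpected_modify": actual_modify - all_declared_modify,
    }
    return per_task, milestone
```

Following the table literally, a file that was declared as "modify" but actually created also appears under `unexpected_create`, since it is not in any declared-create list.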
Add a new section to the gate report (Step 6):
## Plan vs Actual File Structure
Each prd2impl task with `source_plan_path` is broken down to its plan-tasks
(read from task-hints.yaml). The "Plan-Task" column shows `{prd2impl-task} /
plan-task-{N}` where N is the 1-based ordinal within the plan.
| Plan-Task | Status | Declared (C/M) | Actual (C/M) | Delta |
|-----------|--------|----------------|--------------|-------|
| T1 / plan-task-1 | ✅ | 5/0 | 5/0 | none |
| T1 / plan-task-2 | ❌ | 2/3 | 2/1 | declared_modify_not_modified: api_routes.py, auth.py |
| T1 / plan-task-4 | ⚠️ | 1/2 | 3/2 | unexpected_create: helpers/utils.py, helpers/__init__.py |
### Blocking deltas (NO-GO contributors)
- T1 / plan-task-2: declared but missing modify on `autoservice/api_routes.py`
- T1 / plan-task-2: declared but missing modify on `autoservice/auth.py`
### Warning deltas (CONDITIONAL GO contributors)
- T1 / plan-task-4: unexpected create `helpers/utils.py` — scope creep or incidental?
- A missing_create row → contributes a NO-GO to the milestone gate.
- A declared_modify_not_modified row → contributes a NO-GO.
- unexpected_create and unexpected_modify rows are WARNINGs only: they contribute a CONDITIONAL GO if no NO-GO is otherwise present.
If NO tasks in the milestone have source_plan_path, skip this step entirely (silent, no warning). The milestone may simply not be a plan-passthrough milestone.
If a task has source_plan_path but the file is missing, surface a CONDITIONAL GO with the diagnostic "plan file missing — cannot verify file structure" and proceed with Step 3.
Build a matrix showing every requirement mapped to its tests and execution status.
Input:
- {plans_dir}/*-prd-structure.yaml (user_stories + acceptance_criteria)
- {plans_dir}/*-gap-analysis.yaml (gaps + expected_behavior)
- {plans_dir}/tasks.yaml (test_requirements + covers + status per task)
- .artifacts/registry.json (test execution records)
Matrix construction procedure:
1. Build requirement index:
a. From prd-structure.yaml, extract all user_stories and their acceptance_criteria
b. From gap-analysis.yaml, extract all P0 and P1 gaps
c. Sort by priority: P0 AC > P0 gap > P1 AC > P1 gap
2. Build test index:
a. From tasks.yaml, collect all test_requirements entries across all milestone tasks
b. From execution-plan.yaml, collect all e2e_scenarios
c. Map each test to requirements via the `covers` field
3. For each requirement, determine status:
- Find all test entries whose `covers` includes this requirement
- If ≥1 test covers it AND test PASS in registry → ✅ COVERED
- If ≥1 test covers it BUT test not executed or FAIL → ⚠️ UNTESTED
- If no test covers it → ❌ ORPHAN
- If the only covering tests are status=deferred → 🪦 DEFERRED
4. Gate rules:
- ORPHAN > 0 with P0 requirement → NO-GO
- ORPHAN only P1 and below → WARN
- UNTESTED > 0 → CONDITIONAL GO (with remediation list)
Output (write to {plans_dir}/traceability-matrix-{milestone}.md):
Milestone {milestone} Requirement→Test Traceability Matrix
═══════════════════════════════════════════════════════════════
Requirement                              Test coverage                   Status
────────────────────────────────────────────────────────────────
US-001 User login
AC-1 Login with valid credentials        T1A.1:test_login_success        ✅ COVERED
AC-2 Reject invalid credentials          T1A.1:test_login_failure        ✅
AC-3 Lock account after 3 failures       T1A.2:test_account_lock         ✅
── E2E-M1-1: register→login flow                                         ✅
US-002 Password reset
AC-1 Send reset email                    T1A.3:test_reset_email_sent     ✅
AC-2 Token expires after 30 minutes      T1A.3:test_token_expiry         ⚠️ UNTESTED
AC-3 Old password invalidated at once    ── (no coverage)                ❌ ORPHAN
═══════════════════════════════════════════════════════════════
Summary: 12 COVERED / 1 UNTESTED / 1 ORPHAN / 0 DEFERRED
Degradation: If prd-structure.yaml or gap-analysis.yaml unavailable, note "TRACEABILITY MATRIX DEGRADED — requirement sources missing" and build from tasks.yaml covers fields only.
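The status and gate rules of the matrix can be sketched as two small functions. Several details are assumptions for illustration only: each covering test is represented as a `(task_status, registry_result)` pair, and a requirement counts as COVERED only when every non-deferred covering test has a PASS record (the procedure says "test PASS in registry" without stating whether one passing test is enough). The function names are hypothetical.

```python
def requirement_status(covering_tests):
    """Classify one requirement from the tests whose `covers` includes it.

    covering_tests: list of (task_status, registry_result) pairs, where
    registry_result is "PASS", "FAIL", or None (not executed).
    """
    if not covering_tests:
        return "ORPHAN"
    active = [result for status, result in covering_tests if status != "deferred"]
    if not active:
        return "DEFERRED"  # the only covering tests are deferred
    if all(result == "PASS" for result in active):
        return "COVERED"
    return "UNTESTED"  # not executed, or at least one FAIL


def traceability_gate(rows):
    """rows: list of (priority, status) pairs, e.g. ("P0", "ORPHAN")."""
    statuses = [status for _, status in rows]
    if any(p == "P0" and s == "ORPHAN" for p, s in rows):
        return "NO-GO"
    if "UNTESTED" in statuses:
        return "CONDITIONAL GO"
    if "ORPHAN" in statuses:
        return "WARN"  # ORPHAN only at P1 and below
    return "PASS"
```

The checks in `traceability_gate` are ordered from most to least severe, so a P0 ORPHAN wins over any UNTESTED rows.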
Aggregate all degradations recorded during the session from capability-profile.yaml.
Procedure:
1. Read {plans_dir}/capability-profile.yaml
2. Collect all degraded_paths entries
3. Classify each unique degradation:
🛑 CRITICAL (affects gate validity):
- code-review skipped on ≥1 task
- dev-loop test pipeline missing (entire milestone)
- TDD rhythm disabled on ≥1 task
⚠️ SIGNIFICANT (reduces confidence):
- systematic-debugging skipped on test failures
- parallel dispatch disabled (time impact only)
- completion_criteria absent on legacy tasks
ℹ️ MINOR:
- preflight N/A on greenfield tasks
- greenfield contract-check skipped
4. Output degradation summary in gate report
Output format:
╔══════════════════════════════════════════════════════════════╗
║ DEGRADATION SUMMARY — {milestone} ║
╠══════════════════════════════════════════════════════════════╣
║ ║
║ 🛑 CRITICAL ║
║ code-review skipped on 3 tasks (T1A.1, T1A.2, T1B.1) ║
║ → These tasks shipped without independent review. ║
║ → Gate decision confidence reduced. ║
║ → Install superpowers to restore. ║
║ ║
║ ⚠️ SIGNIFICANT ║
║ completion_criteria absent on 2 tasks (T1A.3, T1B.2) ║
║ → Legacy tasks.yaml; re-run /task-gen to upgrade. ║
║ ║
║ ───────────────────────────────────────────────────────── ║
║ Overall degradation level: ⚠️ SIGNIFICANT ║
║ Gate decision MAY be affected. Review above before GO. ║
╚══════════════════════════════════════════════════════════════╝
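Gate impact:
- CRITICAL degradation present → downgrade GO to CONDITIONAL GO
- SIGNIFICANT degradation present → WARN in gate report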
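A minimal sketch of how the overall degradation level and its effect on the gate might be computed follows. It assumes each degradation has already been classified as "CRITICAL", "SIGNIFICANT", or "MINOR"; the function names and the numeric ranking are hypothetical. The downgrade from GO to CONDITIONAL GO on CRITICAL degradation follows the decision priority list later in this document.

```python
SEVERITY = {"MINOR": 1, "SIGNIFICANT": 2, "CRITICAL": 3}


def overall_degradation(levels):
    """Return the most severe level in a list, or None if nothing degraded."""
    return max(levels, key=SEVERITY.__getitem__) if levels else None


def apply_degradation(decision, level):
    """Downgrade a GO decision when CRITICAL degradation is present."""
    if level == "CRITICAL" and decision == "GO":
        return "CONDITIONAL GO"
    return decision  # SIGNIFICANT and MINOR only add notes to the report
```

For the sample summary box above, which lists one CRITICAL and one SIGNIFICANT entry, this sketch would report CRITICAL as the overall level.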
Invoke dev-loop-skills:skill-4-test-runner and consume its e2e-report
artifact. Unlike raw pytest, the runner mechanically distinguishes new
failures from regression failures and emits an evidence manifest the
gate can read.
Run the test runner scoped to this milestone's phase keyword:
/test-runner --phase {phase_keyword} --emit-report
Read the resulting artifact at .artifacts/e2e-report-{milestone}-*.yaml.
Parse three signal classes from the report:
- new_failure: count of failures in tests added during this milestone
- regression_failure: count of failures in tests that previously passed (auto-escalates to NO-GO regardless of other counts)
- pass_count, skip_count
Gate rule:
- regression_failure > 0 → NO-GO (do NOT downgrade to "1 env-blocked")
- new_failure > 0 → STAGED (review with the user before declaring GO)
If dev-loop-skills is unavailable, fall back to raw pytest with a logged warning. Without dev-loop, the gate cannot mechanically distinguish new vs regression failures. This is a structural weakness, not a stylistic preference.
Unit/Integration tests:
echo "WARN: dev-loop-skills not detected; smoke-test cannot distinguish"
echo " new vs regression failures. Install dev-loop-skills for"
echo " milestone-grade reporting."
pytest tests/ -k "{phase_keyword}" --tb=short
Contract tests (if applicable):
pytest tests/contract/ --tb=short
Type checks (if configured):
# Python
mypy autoservice/ --ignore-missing-imports
# TypeScript
npx tsc --noEmit
Build check:
make check # or equivalent
Treat any failure as ambiguous in the fallback path. Prompt the user to triage manually before declaring GO.
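Both paths of this step, the runner report and the raw-pytest fallback, can be summarized in one function. This is a sketch under stated assumptions: the report is taken to be a dict already parsed from the e2e-report YAML with the field names listed earlier, `report=None` stands for the fallback path, and the `"MANUAL TRIAGE"` label and function name are hypothetical.

```python
def runner_gate(report=None, fallback_failures=0):
    """Derive the Step 3 signal from a parsed e2e-report dict.

    report is None on the raw-pytest fallback path, where new and
    regression failures cannot be told apart.
    """
    if report is None:
        # Any failure is ambiguous here; the user must triage manually.
        return "MANUAL TRIAGE" if fallback_failures > 0 else "PASS"
    if report.get("regression_failure", 0) > 0:
        return "NO-GO"  # never downgraded, regardless of other counts
    if report.get("new_failure", 0) > 0:
        return "STAGED"  # review with the user before declaring GO
    return "PASS"
```

Checking `regression_failure` first means a report with both kinds of failure still yields NO-GO.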
Execute the E2E scenarios defined in execution-plan.yaml for this milestone.
Precondition: Step 3 (per-task automated test verification) has passed.
Procedure:
1. Read e2e_scenarios from execution-plan.yaml → milestone.gates.e2e_scenarios
2. If no e2e_scenarios defined → skip with note: "No E2E scenarios defined for this milestone."
3. For each e2e_scenario:
a. Check covers_tasks — all tasks must be status=completed or verified
- Any task not complete → scenario is BLOCKED, skip
Output: "E2E-{id}: BLOCKED — {task_id} not yet completed"
b. Execute based on scenario type:
happy_path → Run full positive-path journey
error_path → Inject error, verify degrade behavior
edge_case → Boundary conditions + concurrency
auth_boundary → Test permission crossings
c. If dev-loop-skills E2E capability available:
Invoke: dev-loop-skills:skill-4-test-runner --scenario {e2e_id}
d. If dev-loop-skills unavailable:
Manual execution with pytest (degradation warning at Step 3)
4. Collect results and determine gate:
- Any E2E scenario FAIL → NO-GO (same severity as regression failure)
- All PASS → ✅
- Any BLOCKED → CONDITIONAL GO (document which are pending)
Output: {plans_dir}/e2e-results-{milestone}.yaml
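The blocking check and the gate determination of this step could look roughly like the sketch below. It assumes scenario results are collected as a dict from scenario ID to "PASS", "FAIL", or "BLOCKED", and that task statuses are plain strings. The function names and the `"SKIPPED"` label for the no-scenarios case are hypothetical.

```python
def blocking_tasks(covers_tasks, task_status):
    """Return the covered tasks that are not yet completed or verified."""
    return [t for t in covers_tasks
            if task_status.get(t) not in ("completed", "verified")]


def e2e_gate(results):
    """results: dict of e2e_id -> "PASS" | "FAIL" | "BLOCKED"."""
    if not results:
        return "SKIPPED"  # no E2E scenarios defined for this milestone
    outcomes = set(results.values())
    if "FAIL" in outcomes:
        return "NO-GO"  # same severity as a regression failure
    if "BLOCKED" in outcomes:
        return "CONDITIONAL GO"  # document which scenarios are pending
    return "PASS"
```

A scenario whose `blocking_tasks` list is non-empty would be recorded as BLOCKED and skipped rather than executed.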
Check that all tasks in this phase have the expected artifacts:
## Artifact Completeness — M1
| Task | eval-doc | test-plan | test-diff | e2e-report |
|------|----------|-----------|-----------|------------|
| T1A.1 | ✅ eval-003 | ✅ plan-003 | ✅ diff-003 | ✅ e2e-003 |
| T1A.2 | ✅ eval-T1A.2 | ✅ plan-T1A.2 | ⚠️ missing | ⚠️ missing |
| T1B.1 | ✅ eval-004 | ✅ plan-003 | ✅ diff-003 | ✅ e2e-003 |
Yellow/Red tasks (no dev-loop): Check deliverable files exist
| T1A.4 | N/A | N/A | N/A | ✅ agents/*/soul.md |
Result: 15/17 complete artifacts — WARN (2 missing)
Execute or guide E2E scenarios from the kickoff doc:
For each scenario:
### Scenario: Customer sends first message
Steps:
1. Start web server: make run-web
2. Open browser to localhost:8000
3. Send a test message in the chat widget
4. Verify: Message appears, AI response within 5s
5. Verify: Conversation ID assigned
Result: [ ] Auto-testable [x] Manual verification needed
Categorize each scenario as auto-testable or manual verification needed.
If Step 3's e2e-report listed any regression_failure rows, copy
each row into the gate report's ## Blocking failures section
verbatim. Do NOT downgrade these to "1 env-blocked, structurally
identical to verified counterpart" — that footnote pattern is what
the design spec §1 explicitly forbids. A regression failure in the
e2e-report means a previously-passing test is now red; that is a
NO-GO regardless of how the new tests perform.
# Milestone M1 Gate Report — {date}
## Summary
| Check | Result | Details |
|-------|--------|---------|
| Task completion | ✅ PASS | 17/17 tasks completed |
| Plan vs Actual files | ✅ PASS | (0.4.1+) 0 missing_create, 0 declared_not_modified, 1 unexpected_create (WARN) |
| Automated tests | ✅ PASS | 42 tests, 0 failures |
| Contract tests | ✅ PASS | 109 cases, all green |
| Artifact completeness | ⚠️ WARN | 2 artifacts missing (non-critical) |
| Build check | ✅ PASS | make check successful |
| Smoke scenarios | 🔵 PARTIAL | 3/5 auto-verified, 2 need manual |
## Overall: ✅ GO (with 2 manual verifications pending)
## Manual Verification Checklist
- [ ] Scenario 3: Browser chat widget renders correctly
- [ ] Scenario 5: Reconnection after network drop
## Recommended Actions
1. Complete manual verifications
2. If all pass → merge to integration branch
3. Run: /retro M1 for retrospective
4. Proceed to M2: /next-task
Priority (top to bottom):
1. Layer-3 drift > threshold? → STAGED (warning, not blocking)
2. Task completion (Step 2) incomplete? → ask whether to proceed
3. Plan-vs-Actual (Step 2.5):
missing_create > 0 → NO-GO
declared_modify_not_modified > 0 → NO-GO
unexpected_create > 0 → WARN
unexpected_modify > 0 → WARN
4. Traceability matrix (Step 2.6):
ORPHAN > 0 (P0 requirement uncovered) → NO-GO
UNTESTED > 0 → CONDITIONAL GO (with remediation list)
ORPHAN only P1 and below → WARN
5. Per-task tests (Step 3):
regression_failure > 0 → NO-GO
new_failure > 0 → SELECTIVE NO-GO (affected tasks only)
6. E2E scenarios (Step 3.5):
E2E FAIL → NO-GO
E2E BLOCKED → CONDITIONAL GO
7. Artifact completeness (Step 4):
Required artifact missing → NO-GO
Optional artifact missing → WARN
8. Superpowers review (Step 7):
Independent review fails → NO-GO
9. Degradation assessment (Step 2.8):
CRITICAL degradation present → downgrade GO to CONDITIONAL GO
SIGNIFICANT degradation present → WARN in gate report
All pass + no CRITICAL degradation → GO ✅
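One way to read the priority list is "the most severe verdict wins". The sketch below illustrates that reading and is not the skill's actual logic: it assumes each check has already produced one of the verdict strings used above, treats WARN and STAGED as non-blocking notes, and leaves out SELECTIVE NO-GO handling because its per-task semantics are not spelled out here. The function name is hypothetical.

```python
def final_decision(check_verdicts, degradation=None):
    """Fold per-check verdicts into a milestone decision.

    check_verdicts: verdict strings such as "PASS", "WARN", "STAGED",
    "CONDITIONAL GO", "NO-GO". degradation: overall degradation level.
    """
    if "NO-GO" in check_verdicts:
        return "NO-GO"
    if "CONDITIONAL GO" in check_verdicts:
        return "CONDITIONAL GO"
    if degradation == "CRITICAL":
        # CRITICAL degradation downgrades GO to CONDITIONAL GO.
        return "CONDITIONAL GO"
    return "GO"  # WARN and STAGED are reported but do not block
```

A caller would still list every WARN and STAGED item in the gate report so the user can review them before acting on a GO.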
Before declaring GO, apply the following additional layers when the respective skills are available (non-blocking — skip any layer whose skill is unavailable):
- Run superpowers:requesting-code-review. This dispatches the code-reviewer subagent to audit the milestone's merged changes against the plan and coding standards. Process its feedback via superpowers:receiving-code-review (rigorous verification, not blind agreement). Rationale: Green-task closures inside /continue-task already run a per-task review; the milestone-level review catches cross-task integration issues that per-task review cannot see.
- Run superpowers:verification-before-completion as a final check before declaring GO. It enforces "evidence before assertions": every ✅ in the gate report must be backed by an observed command output.
If superpowers:requesting-code-review or superpowers:verification-before-completion cannot be resolved:
⚠️ INDEPENDENT REVIEW LAYERS DISABLED — superpowers review capabilities not available.
Milestone gate decision relies SOLELY on automated checks.
No independent code review will be performed.
No evidence-before-assertion verification will be enforced.
Install superpowers to restore full milestone gating.
Record to capability-profile.yaml.
All three layers are advisory: if none are available, the gate decision falls back to the automated-test / artifact / scenario checks above. If they flag critical issues, downgrade the gate decision from GO to CONDITIONAL GO (or NO-GO) regardless of automated-test results.
─────────────────────────────────────────────────────
⬆ /smoke-test complete
─────────────────────────────────────────────────────
📋 Next:
/retro {M}: milestone retrospective analysis
/next-task: continue with next milestone tasks
─────────────────────────────────────────────────────
npx claudepluginhub ezagent42/prd2impl --plugin prd2impl