Validates analysis outputs against SPEC.md requirements using data quality checks. Runs between implement and review phases to ensure every requirement has a corresponding output artifact.
Slash command: /workflows:ds-validate
Hooks (tool matcher: command):
- Agent: uv run python3 ${CLAUDE_PLUGIN_ROOT}/hooks/ds-pre-subagent-clear.py
- Read: uv run python3 ${CLAUDE_PLUGIN_ROOT}/hooks/ds-read-after-subagent-guard.py
- Grep: uv run python3 ${CLAUDE_PLUGIN_ROOT}/hooks/ds-read-after-subagent-guard.py
- Glob: uv run python3 ${CLAUDE_PLUGIN_ROOT}/hooks/ds-read-after-subagent-guard.py
- Write: uv run python3 ${CLAUDE_PLUGIN_ROOT}/hooks/ds-no-main-chat-code-guard.py
- Edit: uv run python3 ${CLAUDE_PLUGIN_ROOT}/hooks/ds-no-main-chat-code-guard.py
- Bash: uv run python3 ${CLAUDE_PLUGIN_ROOT}/hooks/ds-no-main-chat-code-guard.py
- Agent|Workflow: GATE_ARTIFACT=.planning/IMPLEMENT_COMPLETE.md GATE_STATUS=COMPLETE GATE_DESCRIPTION="Implementation complete" GATE_REMEDY="Finish ds-implement (all PLAN.md tasks verified in LEARNINGS.md) before validating outputs." GATE_BLOCKED_TOOLS=Agent,Workflow uv run python3 ${CLAUDE_PLUGIN_ROOT}/hooks/phase-gate-guard.py
- Agent: uv run python3 ${CLAUDE_PLUGIN_ROOT}/hooks/ds-post-subagent-guard.py
Announce: "Using ds-validate (Phase 3.5) to validate analysis outputs against SPEC.md requirements."
Phase 3.5 of the DS workflow (between implement and review). Maps every SPEC.md requirement to an output artifact and runs data quality checks.
## The Iron Law of Validation
NO REVIEW WITHOUT VALIDATION. This is not negotiable.
ds-review MUST NOT start until .planning/VALIDATION.md confirms all requirements have outputs. Validation is the DS equivalent of test coverage — without it, review is theater.
DS validation does NOT auto-fill gaps. Dev's test-gap-auditor can write missing tests. DS gaps require human judgment — a wrong output means a wrong analysis, not just a missing test. When gaps are found, present them to the user and let the user decide: fix (return to implement) or accept (proceed to review).
Before running runtime DQ checks, run the static analysis constraint check suite:
bash "${CLAUDE_SKILL_DIR}/../../scripts/check-all-ds.sh" "$(pwd)"
This runs all DS constraint check scripts (determinism, join audits, idempotency, error handling, schema contracts, standard errors, visualization integrity).
If any check FAILS: Report the failures in LEARNINGS.md. These are code quality issues in the analysis scripts that must be fixed before proceeding. Dispatch a fix subagent if needed.
If all checks PASS: Proceed to runtime DQ checks.
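The pass/fail branch above can be sketched in Python. This is a minimal sketch, assuming the suite signals failure through a non-zero exit status (the source does not state this); the function name and the heading written to LEARNINGS.md are hypothetical.

```python
import subprocess
from pathlib import Path


def run_static_suite(command: list[str], learnings: Path) -> bool:
    """Run the static suite; on failure, append its output to LEARNINGS.md.

    Assumption: a non-zero exit status means at least one check failed.
    """
    proc = subprocess.run(command, capture_output=True, text=True)
    if proc.returncode == 0:
        return True  # all checks passed: proceed to runtime DQ checks
    with learnings.open("a", encoding="utf-8") as fh:
        fh.write("\n## Static analysis failures\n")  # hypothetical heading
        fh.write(proc.stdout + proc.stderr)
    return False  # failures must be fixed before proceeding
```

A caller would pass something like `["bash", "<path to check-all-ds.sh>", "<project dir>"]` as `command` and dispatch a fix subagent when the function returns `False`.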
This flowchart IS the specification. If prose elsewhere and this diagram disagree, the diagram wins.
┌──────────────────────────────────────────────┐
│ 0. RUN static analysis suite (check-all-ds.sh)│
└───────────────────┬──────────────────────────┘
all pass? │
┌──── no ───────┴────── yes ──────┐
▼ ▼
┌──────────────────┐ ┌───────────────────────────────────┐
│ log to LEARNINGS │ │ 1-4. READ SPEC / PLAN / LEARNINGS, │
│ + dispatch fix │ │ DISCOVER ds-checks.md │
│ subagent, re-run │ └─────────────────┬─────────────────┘
└────────┬─────────┘ ▼
│ ┌────────────────────────────────────┐
│ │ 5. RUN ds-validate-coverage workflow│
│ │ (one read-only validator/requirement│
│ │ → JS gate, NOT a hand-tallied score)│
│ └─────────────────┬──────────────────┘
│ ▼
│ ┌────────────────────────────────────┐
│ │ 6. RENDER .planning/VALIDATION.md │
│ │ from the workflow result │
│ └─────────────────┬──────────────────┘
│ JS gate │
│ ┌── gaps_found ───────┴── validated ──┐
│ ▼ ▼
│ ┌──────────────────────┐ ┌──────────────────────┐
└──▶│ decision checkpoint: │ │ proceed to ds-review │
│ user fix-vs-accept │ │ (gate: status= │
│ (see Gate section); │ │ validated) │
│ accept ⇒ flip status │ └──────────────────────┘
│ to validated │
└──────────────────────┘
Note: Steps 1-4 stay in this skill as the reading/discovery preamble — the workflow's own Discover phase re-resolves them authoritatively, but reading them here lets the skill present context and decide scope before invoking the workflow.
Read .planning/SPEC.md and extract every requirement:
For each requirement in SPEC.md:
- Extract the requirement description
- Note the success criteria
- Note the expected output (table, figure, file, etc.)
Read .planning/PLAN.md and extract:
Read .planning/LEARNINGS.md and extract:
Read ${CLAUDE_SKILL_DIR}/../../skills/ds-implement/references/ds-checks.md and follow its instructions.
The per-requirement DQ fan-out and the COVERED/PARTIAL/MISSING + validated|gaps_found gate are owned by an ultracode workflow: a script, not hand-dispatched agents. The reason: the validators return RAW DQ statuses and the gate is computed in pure JS from those statuses, so the model can no longer tally the composite by hand (the old honor-system gate). The workflow also isolates one validation transcript per requirement out of main context.
1. Resolve the cached workflow path:
WF=$(command ls -d ~/.claude/plugins/cache/edwinhu-plugins/workflows/*/workflows/ds-validate-coverage.js 2>/dev/null | sort -V | tail -1)
# Local-plugin fallback (running from source, cache empty):
[ -z "$WF" ] && WF="${CLAUDE_SKILL_DIR}/../../workflows/ds-validate-coverage.js"
echo "$WF"
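For readers who want the same lookup outside the shell, here is a rough Python equivalent. It assumes the wildcard directory under the cache root is a version-named folder (implied by `sort -V`, not stated outright); `resolve_workflow` and `version_key` are hypothetical helper names.

```python
from pathlib import Path


def version_key(name: str) -> tuple:
    """Approximate `sort -V`: numeric dot-parts compare as integers."""
    return tuple(
        (0, int(part), "") if part.isdigit() else (1, 0, part)
        for part in name.split(".")
    )


def resolve_workflow(cache_root: Path, fallback: Path) -> Path:
    """Pick the highest-versioned cached workflow, else the local fallback."""
    candidates = sorted(
        cache_root.glob("*/workflows/ds-validate-coverage.js"),
        # Assumption: the first path component under cache_root is the version.
        key=lambda p: version_key(p.relative_to(cache_root).parts[0]),
    )
    return candidates[-1] if candidates else fallback
```

Sorting on parsed integers matters because a plain string sort would rank `1.9.0` above `1.10.0`.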
2. Run it (full pass first; on a re-run after fixes, pass onlyChecks + priorReviews from the prior result):
Workflow({ scriptPath: "<WF>", args: { projectDir: "<abs project dir>", pluginRoot: "<abs .../workflows dir>" } })
The workflow fans out one read-only validator per in-scope SPEC requirement (running DQ1-DQ5 + M1 from ds-checks.md), then computes — in JS, from raw statuses — each requirement's classification and the overall status. It returns { overallPass, status, counts, scoreTable, findings, reviews, reviewersThatFlagged }.
Do NOT recompute or rationalize the gate — result.status and result.overallPass are computed in JS. Write .planning/VALIDATION.md using result.scoreTable as the Requirements Map, result.counts for the frontmatter totals, and result.findings under DQ Details:
status: <result.status> # validated | gaps_found — verbatim from the workflow
requirements_total / covered / partial / missing: <result.counts>
Requirements Map: <result.scoreTable>
DQ Details: <result.findings>
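As an illustration of the mapping above, the sketch below renders a VALIDATION.md body from a workflow result. The shape of `result.counts` is not given in the source: the keys `total`, `covered`, `partial`, and `missing` are assumptions, as is treating `scoreTable` and `findings` as ready-to-write markdown strings.

```python
from datetime import date
from typing import Optional


def render_validation(result: dict, today: Optional[str] = None) -> str:
    """Render VALIDATION.md text; status is copied verbatim, never recomputed."""
    counts = result["counts"]  # assumed keys: total, covered, partial, missing
    lines = [
        "---",
        f"status: {result['status']}",
        f"date: {today or date.today().isoformat()}",
        f"requirements_total: {counts['total']}",
        f"covered: {counts['covered']}",
        f"partial: {counts['partial']}",
        f"missing: {counts['missing']}",
        "---",
        "# Output Validation",
        "## Requirements Map",
        result["scoreTable"],  # assumed to be a markdown table string
        "## DQ Details",
        result["findings"],  # assumed to be markdown text
    ]
    return "\n".join(lines) + "\n"
```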
The /goal fix loop stays in this skill: if status: gaps_found, present gaps (Step "Gate" below) and let the user decide fix vs accept. On a fix-and-re-validate cycle, re-run the workflow with onlyChecks: <prev result.reviewersThatFlagged> and priorReviews: <prev result.reviews> so unflagged requirements carry forward and only the gaps re-run live.
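The re-run arguments described above can be assembled from the previous result. The sketch below builds them as a Python dict purely for illustration; the actual call is the Workflow invocation, and `rerun_args` is a hypothetical helper name.

```python
def rerun_args(prev_result: dict, project_dir: str, plugin_root: str) -> dict:
    """Build args for a fix-and-re-validate pass from the prior result."""
    return {
        "projectDir": project_dir,
        "pluginRoot": plugin_root,
        # Only the validators that flagged gaps are re-run live.
        "onlyChecks": prev_result["reviewersThatFlagged"],
        # Unflagged requirements carry forward from the prior reviews.
        "priorReviews": prev_result["reviews"],
    }
```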
Each requirement is validated at four levels, in order:
| Level | Check | Example |
|---|---|---|
| 1. Exists | Output file/variable present | output/results.csv exists |
| 2. Substantive | Real data, not empty | >0 rows, expected columns present |
| 3. DQ Passes | DQ1-DQ5 pass | No dupes on key, nulls handled, row counts trace |
| 4. Answers Question | Addresses SPEC.md requirement | Table includes specified variables |
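The first two levels in the table are mechanical enough to sketch for a CSV output. This is an illustrative sketch only, not the validators' actual code; the function name and return shape are hypothetical, and it uses just the standard library.

```python
import csv
from pathlib import Path


def check_exists_and_substantive(path: Path, expected_columns: set[str]) -> dict:
    """Level 1 (exists) and level 2 (substantive) for a CSV output."""
    if not path.is_file():
        return {"exists": False, "substantive": False}
    with path.open(newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        header = set(reader.fieldnames or [])
        has_rows = next(reader, None) is not None  # more than zero data rows
    return {
        "exists": True,
        "substantive": has_rows and expected_columns <= header,
    }
```

Levels 3 and 4 depend on the DQ definitions in ds-checks.md and on the wording of each SPEC.md requirement, so they are left out of this sketch.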
For each requirement, assign a classification:
| Classification | Criteria |
|---|---|
| COVERED | All 4 validation levels pass |
| PARTIAL | Output exists but DQ issues found or doesn't fully address requirement |
| MISSING | No output found for this requirement |
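The classification table can be read as a small decision function. The sketch below is only an illustration of the table; the authoritative classification is computed by the workflow in JS. Treating an output that exists but is empty as PARTIAL is an assumption, since the table does not cover that case explicitly.

```python
def classify(exists: bool, substantive: bool, dq_passes: bool, answers_question: bool) -> str:
    """Map the four validation levels to COVERED / PARTIAL / MISSING."""
    if not exists:
        return "MISSING"  # no output found for this requirement
    if substantive and dq_passes and answers_question:
        return "COVERED"  # all four levels pass
    # Assumption: any other combination with an existing output is PARTIAL.
    return "PARTIAL"
```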
---
status: validated | gaps_found
date: [ISO 8601]
requirements_total: N
covered: N
partial: N
missing: N
---
# Output Validation
## Requirements Map
| # | Requirement | Output | DQ1 | DQ2 | DQ3 | DQ4 | DQ5 | M1 | Classification |
|---|-------------|--------|-----|-----|-----|-----|-----|----|----------------|
| 1 | [from SPEC] | [path] | PASS | PASS | PASS | PASS | PASS | PASS | COVERED |
| 2 | [from SPEC] | [path] | PASS | WARN | PASS | PASS | PASS | PASS | PARTIAL |
| 3 | [from SPEC] | — | — | — | — | — | — | — | MISSING |
## DQ Details
[For any non-PASS check, include the specific finding]
## Summary
- Requirements: N total
- Covered: X
- Partial: Y
- Missing: Z
| Condition | Status |
|---|---|
| All requirements COVERED | validated |
| Any PARTIAL or MISSING remain, user has NOT yet decided | gaps_found |
| Gaps remain BUT the user explicitly accepted them | validated (+ ## Accepted Gaps section) |
Status validated means "dispositioned and cleared to proceed" — either clean, OR gaps the user explicitly accepted. The downstream ds-review gate (GATE_STATUS=validated) blocks on gaps_found, so an undispositioned gaps_found cannot silently pass into review. This is the structural backstop for the decision checkpoint below — do not rely on the prose alone.
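The status rules can be summarized in a few lines. This sketch exists only to make the table concrete: the real gate value comes from the workflow and should not be recomputed by hand. The function and parameter names are hypothetical.

```python
def overall_status(classifications: list[str], user_accepted_gaps: bool = False) -> str:
    """Return 'validated' or 'gaps_found' per the status rules table."""
    has_gaps = any(c != "COVERED" for c in classifications)
    if has_gaps and not user_accepted_gaps:
        return "gaps_found"  # undispositioned gaps block ds-review
    return "validated"  # clean, or gaps explicitly accepted by the user
```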
When the user accepts gaps, rewrite VALIDATION.md frontmatter status: gaps_found → status: validated and append:
## Accepted Gaps
The user reviewed and accepted these gaps before proceeding to review:
- [REQ-ID] [PARTIAL/MISSING]: [what is incomplete and why the user accepted it]
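The acceptance rewrite can be sketched as a pure string transformation. This is a minimal sketch, assuming the frontmatter contains a literal `status: gaps_found` line; `accept_gaps` is a hypothetical helper, and each entry in `accepted` is expected to already follow the `[REQ-ID] [PARTIAL/MISSING]: reason` pattern.

```python
import re


def accept_gaps(validation_md: str, accepted: list[str]) -> str:
    """Flip frontmatter status to validated and append an Accepted Gaps section."""
    updated, n = re.subn(
        r"^status:\s*gaps_found\b.*$",
        "status: validated",
        validation_md,
        count=1,
        flags=re.MULTILINE,
    )
    if n == 0:
        raise ValueError("no 'status: gaps_found' line found")
    section = [
        "",
        "## Accepted Gaps",
        "The user reviewed and accepted these gaps before proceeding to review:",
    ]
    section += [f"- {item}" for item in accepted]
    return updated.rstrip("\n") + "\n" + "\n".join(section) + "\n"
```

Raising when the status line is absent avoids appending an Accepted Gaps section to a file that was never in the gaps_found state.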
When presenting validation results to the user (especially gaps), generate diagnostic plots to accelerate the decision:
| Validation Finding | Diagnostic to Generate |
|---|---|
| DQ2: High-null columns | Missingness heatmap (columns × rows) |
| DQ3: Duplicate rows | Duplicate count bar chart by key columns |
| DQ4: Row count mismatch | Pipeline waterfall chart (stage × row count) |
| DQ5: Suspicious cardinality | Value frequency distribution plot |
| PARTIAL requirements | Side-by-side: expected vs actual output summary |
When to generate: Only at decision checkpoints where the user must choose fix vs accept. Do not generate plots for COVERED requirements (no decision needed).
Format: Inline matplotlib/seaborn plots in notebooks, or saved to scratch/diagnostics/ for script-based workflows.
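The selection rule (which diagnostics to generate, and when to skip them) can be sketched as a lookup. The plotting itself is omitted; the function name and the idea of passing a list of non-passing check IDs are assumptions made for illustration.

```python
DIAGNOSTICS = {
    "DQ2": "missingness heatmap (columns x rows)",
    "DQ3": "duplicate count bar chart by key columns",
    "DQ4": "pipeline waterfall chart (stage x row count)",
    "DQ5": "value frequency distribution plot",
}


def diagnostics_for(classification: str, failed_checks: list[str]) -> list[str]:
    """List the diagnostics to generate for one requirement."""
    if classification == "COVERED":
        return []  # no decision needed, so no plots
    plots = [DIAGNOSTICS[c] for c in failed_checks if c in DIAGNOSTICS]
    if classification == "PARTIAL":
        plots.append("side-by-side expected vs actual output summary")
    return plots
```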
Checkpoint type: human-verify (VALIDATION.md status is machine-verifiable)
.planning/VALIDATION.md must exist before proceeding.
- validated: human-verify checkpoint, auto-advanceable; proceed to ds-review.
- gaps_found: decision checkpoint; present gaps to the user before proceeding.
When the user accepts gaps, rewrite the VALIDATION.md frontmatter to status: validated and append the ## Accepted Gaps section (see Status Rules) BEFORE proceeding. The ds-review gate hooks on status: validated; leaving it at gaps_found will (correctly) block review, because an undispositioned gaps_found is indistinguishable from "user never decided."
When the user chooses fix, the cycle ds-validate → ds-implement → ds-validate repeats. This loop is bounded; it does not cycle indefinitely. Track it in .planning/VALIDATE_STATE.md (analogous to ds-review's REVIEW_STATE.md):
---
iteration: 1
max_iterations: 3
status: gaps_found # gaps_found | validated
last_gaps: [REQ-ID, ...] # requirement IDs still PARTIAL/MISSING
---
Increment iteration on each fix cycle. Once iteration reaches max_iterations and the status is still gaps_found, STOP looping. Escalate to the user with a structured choice (AskUserQuestion): fix again (override the cap with explicit instruction), accept remaining gaps (flip to validated + Accepted Gaps), or rethink (return to /ds for re-planning). Do not silently start a 4th fix cycle: repeated failure to close the same gap is a signal the plan or data is wrong, not that one more pass will help.
This is the critical difference from dev-test-gaps. In dev, missing tests can be auto-generated. In DS, missing or wrong outputs mean the analysis itself may be wrong. Only the user can judge whether a gap is acceptable.
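The bounded fix loop tracked in VALIDATE_STATE.md can be sketched as a small decision function. It assumes the cap triggers once `iteration` has reached `max_iterations` while the status is still gaps_found, which matches the "no 4th fix cycle" rule with the default cap of 3; the function name and return labels are hypothetical.

```python
def next_step(iteration: int, max_iterations: int, status: str) -> str:
    """Decide what follows a validation pass: proceed, fix, or escalate."""
    if status == "validated":
        return "proceed"  # cleared for ds-review
    if iteration >= max_iterations:
        # Assumption: cap reached, so ask the user (fix again / accept / rethink).
        return "escalate"
    return "fix"  # return to ds-implement, then re-validate
```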
After validation is complete, discover and read the ds-review skill:
Read ${CLAUDE_SKILL_DIR}/../../skills/ds-review/SKILL.md and follow its instructions.