Skill

testing

Enforces testing methodology: test requirements not code, edge cases first, strong assertions, one behavior per test. Guides users through specification-first TDD and risk-prioritized testing flows.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command: /kernel:testing

User invocable

Model invocable

Inline context

Default effort

Tool Access

This skill is limited to the following tools:

Read, Bash, Grep, Glob

Supporting Files

reference/testing-research.md

SKILL.md

140 lines · ~2.7k tokens

Stats

Language: JavaScript
Stars: 11
Maintenance: Excellent
Last commit: Jun 19, 2026

Core Laws (non-negotiable)

TEST REQUIREMENTS, NOT CODE. Tests that an AI generates from existing code assert whatever that code currently does, so they validate its bugs. Test what SHOULD happen.
EDGE CASES FIRST. Empty, null, boundary, concurrent, error paths. Happy path is least valuable.
STRONG ASSERTIONS ONLY. .toBeTruthy() catches nothing. Assert specific values.
ONE BEHAVIOR PER TEST. If "and" appears in the test name, split the test.
REGRESSION OVER COVERAGE. One test that catches a real bug beats 10 that pad metrics.
NEVER use .skip() or .only(). Claude tends to disable or rewrite tests so that buggy code passes rather than fix the bug.

Flow

Step 1 — Specify before writing

Write test case descriptions (inputs + expected outputs) BEFORE requesting implementation.
For AI-generated code: provide spec, then ask for tests, then ask for implementation. (gate: spec exists as comments or descriptions before any test code is written)
If a test name cannot be written in GIVEN/WHEN/SHOULD form, the test is ambiguous — clarify first.
Explicitly declare TDD before requesting tests: state "We are using Test-Driven Development." Without this signal, Claude defaults to implementation-first and writes tests that validate existing code rather than define requirements.
Vague test detection gate: if Claude's implementation cannot pass the tests on first generation, treat the spec as underspecified rather than the implementation as faulty. Stop and clarify before iterating.

Step 2 — Prioritize by risk

Order of testing priority:

Business logic (core correctness rules)
Error handling (what fails gracefully vs. crashes)
Boundary conditions (edges of valid input ranges)
State transitions (auth flows, multi-step workflows)
Integration points (external systems, DB, APIs)
Regression cases (every bug fix gets a test that would have caught it)

Step 3 — Write edge cases explicitly

For every function, enumerate:

Empty / null / undefined inputs
Boundary values (0, -1, MAX_INT, empty string, oversized input)
Invalid type/shape → error, not corrupted result
Security inputs (injection payloads, newline injection) → rejected at entry
Concurrent access or race conditions where applicable (gate: minimum 3 edge cases per non-trivial function)

Step 4 — Name tests as specifications

Format: GIVEN <state> WHEN <action> SHOULD <expected>

GOOD: test('GIVEN email without domain WHEN validated SHOULD return false')
POOR: test('validateEmail regex check')

Pattern for file/function naming: test_{function}_{scenario}_{expected}
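
The naming rule can be checked mechanically. The helper below is a hypothetical sketch, not part of the skill: it flags names that do not follow the GIVEN/WHEN/SHOULD format and names that contain "and", which usually signals two behaviors in one test.

```javascript
// Hypothetical helper: checks a test name against the GIVEN/WHEN/SHOULD format.
const SPEC_NAME = /^GIVEN .+ WHEN .+ SHOULD .+$/;

function checkTestName(name) {
  const problems = [];
  if (!SPEC_NAME.test(name)) {
    problems.push('not in GIVEN <state> WHEN <action> SHOULD <expected> form');
  }
  // A standalone "and" in the name suggests two behaviors in one test.
  if (/\band\b/i.test(name)) {
    problems.push('contains "and": consider splitting the test');
  }
  return problems;
}

console.log(checkTestName('GIVEN email without domain WHEN validated SHOULD return false'));
console.log(checkTestName('validateEmail regex check'));
```

The first call reports no problems; the second reports that the name is not in specification form.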

Step 5 — Layer by pyramid

Unit tests — isolated functions, fast, many. (primary layer)
Integration tests — component boundaries, DB/API calls, medium speed.
E2E / browser — critical user-visible flows only, slow, few. (gate: E2E count stays low; flaky tests fixed or deleted immediately)

Step 5b — Test signal flexibility

Any deterministic output Claude can read counts as a gate: test suite exit code, linter report, build failure, fixture diff, browser screenshot delta. Don't limit "testing" to unit test runners — if it produces a signal, it can gate a decision.
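
A minimal sketch of treating an exit code as a gate, using Node's child_process.spawnSync. The gate helper is hypothetical, and the two commands are stand-ins that run the Node binary itself so the example works anywhere; a real project would pass its own test runner, linter, or build command.

```javascript
const { spawnSync } = require('node:child_process');

// Runs a command and reduces its result to a pass/fail gate on the exit code.
function gate(command, args) {
  const result = spawnSync(command, args, { encoding: 'utf8' });
  return {
    passed: result.status === 0,
    exitCode: result.status,
    output: `${result.stdout || ''}${result.stderr || ''}`,
  };
}

// Stand-in commands: one exits cleanly, one reports an error and exits with 2.
const ok = gate(process.execPath, ['-e', 'process.exit(0)']);
const failing = gate(process.execPath, ['-e', 'console.error("lint error"); process.exit(2)']);

console.log(ok.passed, failing.passed, failing.exitCode);
```

The captured output is kept alongside the verdict so a failure can be reported with its cause, not just a boolean.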

Step 6 — Review AI-generated tests

Before accepting any AI-generated test, verify:

Does the assertion verify the RIGHT thing? (not just .toBe(true))
Is the test coupled to implementation? (mocks of internals → brittle)
Is state shared between tests? (order-dependent failures)
Does it test requirements or does it mirror code that may be buggy? (gate: at least one negative/rejection case per test file)

Step 7 — Multi-agent test patterns (tier 2+)

Writer/Reviewer split: one agent writes tests (spec only, no implementation), separate agent writes code to pass them.
Parallel per module: when coverage gaps span multiple modules, spawn one agent per module boundary.
Surgeon done-when: acceptance criteria = runnable tests passing, not "code written."
Effort level: use effort: high for test-generation agents; default under-generates edge cases.
BQ testing (agent output): AI agent output is non-deterministic — validate action patterns and behavioral invariants, not exact text. Test WHAT the agent did, not how it phrased it.
Subagent scope reviewer: after implementation, spawn a reviewer agent to verify: every requirement is implemented, listed edge cases have tests, nothing outside task scope changed.
Context-aware verification rigor: solo dev → verify logic + edge cases; team → systematic peer review; production → mandatory gating tests. Match rigor to deployment stakes.
Stop hook gates: configure Stop hooks in hooks/scripts/ to run tests/lint checks mechanically after each turn — moves quality gates from agent honor-system to enforcement (I0.15).
Five agentic workflow patterns (match to test scope): sequential (one agent, ordered steps) · operator (supervisor routes to specialists) · split-and-merge (parallel execution + synthesis) · agent teams (specialist groups per module) · headless (no-UI CI-style automated).
TDD as oracle: tests are the only spec that survives context compression. When the session fills, code and comments drift — tests don't. A passing test suite is stronger evidence of correctness than the agent's own assessment. For AI-assisted work, tests are the persistent oracle.
CI-triggered flaky test repair: detect flaky tests via CI retry logs and trigger an automated agent run per flaky test (inputs: test name + last failure output + relevant source files). Cap auto-fix attempts at 3; escalate to human if the agent can't stabilize the test.
Framework-aware generation: before writing new tests, detect the framework in use (Jest/Vitest/Pytest etc.) and read 2-3 existing test files for pattern conventions. Run coverage gap analysis to find untested branches. Match existing patterns first, fill gaps second — never impose a foreign test style.
Fan-out test migrations: for large codebases, generate a task list then loop claude -p "Write tests for $file covering edge cases" --allowedTools "Edit,Bash(npm test *)" per file. Prototype on 2-3 files, refine the prompt, then run at scale. --allowedTools prevents scope bleed in unattended runs.
Vitest as the 2026 JS/TS default: Vitest has displaced Jest as the primary unit testing framework for JS/TS projects. Before writing new tests, check package.json for vitest vs jest. If Vitest: use describe.concurrent for independent tests, vi.mock() not jest.mock(), and MSW for fetch mocking. Don't assume Jest patterns apply to a Vitest project.
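
The package.json check from the last two items can be sketched as a small function. detectTestFramework is a hypothetical helper, and the rule it applies (look at dependencies and the test script, prefer Vitest when both appear) is an assumption rather than something the skill prescribes.

```javascript
// Hypothetical helper: infers the test framework from a parsed package.json.
function detectTestFramework(pkg) {
  const deps = { ...(pkg.dependencies || {}), ...(pkg.devDependencies || {}) };
  const testScript = (pkg.scripts && pkg.scripts.test) || '';

  if ('vitest' in deps || /\bvitest\b/.test(testScript)) return 'vitest';
  if ('jest' in deps || /\bjest\b/.test(testScript)) return 'jest';
  return 'unknown';
}

// Example input shaped like a parsed package.json.
const examplePkg = {
  scripts: { test: 'vitest run' },
  devDependencies: { vitest: '^1.0.0' },
};

console.log(detectTestFramework(examplePkg)); // "vitest"
```

In a real run the object would come from JSON.parse(fs.readFileSync('package.json', 'utf8')), and an 'unknown' result is a cue to read existing test files before generating anything.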

Step 8 — Grader pattern (complex features)

Define success rubric (expected behaviors, edge handling, perf bounds) BEFORE tests or code.
After implementation, spawn a grader agent in fresh context (no knowledge of implementation).
Grader evaluates output against rubric. Failure → specific issues → surgeon takes another pass. (gate: grader verdict is PASS before declaring done on complex features)
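
A rubric is easier to enforce when it is written as data. The sketch below is a deterministic stand-in for the grader agent, under stated assumptions: the criteria, the output shape (items, elapsedMs), and the grade function are all hypothetical. It shows the shape of the loop, which is rubric first, then a verdict with specific issues.

```javascript
// The rubric is written before tests or code. Each criterion is a named
// predicate over the observable output; the feature graded is hypothetical.
const rubric = [
  { id: 'returns-sorted', check: (out) => out.items.every((v, i, a) => i === 0 || a[i - 1] <= v) },
  { id: 'returns-array', check: (out) => Array.isArray(out.items) },
  { id: 'within-perf-bound', check: (out) => out.elapsedMs <= 200 },
];

// The grader sees only the rubric and the output, never the implementation.
function grade(criteria, output) {
  const issues = criteria.filter((c) => !c.check(output)).map((c) => c.id);
  return { verdict: issues.length === 0 ? 'PASS' : 'FAIL', issues };
}

console.log(grade(rubric, { items: [1, 2, 3], elapsedMs: 40 }));
console.log(grade(rubric, { items: [3, 1, 2], elapsedMs: 450 }));
```

The list of failed criterion ids is what goes back to the implementing agent for the next pass.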

Step 9 — JiT testing for high-churn code

Generate tests during code review, not into the static suite, when:

Code changes faster than test suites can track
Refactoring: generate behavioral tests BEFORE changes, run AFTER to confirm preservation

Meta 2026: JiT generates ~4x more bug-catching tests than static suite additions for AI-generated code. (reference: testing-research.md § Just-in-Time Testing)

Anti-Patterns (block on sight)

Pattern	Block
`coverage_theater`	High coverage, weak assertions. 100% + `.toBeTruthy()` = nothing caught.
`implementation_coupling`	Tests break on refactor. Test behavior, not structure.
`happy_path_only`	Normal inputs rarely fail. Test edges, nulls, boundaries, concurrent access.
`ai_test_trust`	AI synthesizes tests FROM code → validates bugs. Review what assertions ACTUALLY check.
`flaky_tolerance`	Flaky test = broken test. Fix or delete. Never ignore.
`skip_or_only`	`.skip()` / `.only()` become permanent. Fix or delete.
`print_over_assert`	Print statements are not assertions. Require formal `expect`/`assert` calls.
`snapshot_relaxation`	Never update snapshots to make failing tests pass. Snapshots are the source of truth — if they fail, the code changed unexpectedly. Fix the code, not the snapshot.

Verification Gate

Always provide runnable verification. If you can't verify it, don't ship it. AI writes tests prolifically, so review each one: does it test the actual risk area, or just the happy path the code already handles?

When reviewing AI-generated code in 2026 (40–70% of production code is AI-generated), check for intent drift: the AI correctly implements what it inferred from the prompt, not what was actually needed. Verify against the original spec, not just the code structure.

<on_complete>
agentdb write-end '{"skill":"testing","tests_added":<N>,"coverage_delta":"<+X%>","edge_cases":[""],"assertions":"<strong|weak>"}'

Record what you tested and WHY. Prevent duplicate coverage.
</on_complete>
