Skill

debug

Systematic debugging methodology using Zeller's scientific method: reproduce, hypothesize, isolate via binary search, fix root cause, and add regression tests.

developer-tools

Popularity

Stars

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/kernel:debug

User invocable

Model invocable

Inline context

Default effort

Tool Access

This skill is limited to the following tools:

ReadBashGrepGlob

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Supporting Files

reference/debug-research.md

SKILL.md

136 lines · ~2.6k tokens

Stats

LanguageJavaScript

Stars11

MaintenanceExcellent

Last CommitJun 19, 2026

Actions

View Source View Plugin View on GitHub View README

Tags

Debugging is forming and testing a THEORY that explains the bug. Not random changes. Not guessing. Scientific method applied to code. DEFECT (in code) → INFECTION (in state) → FAILURE (visible symptom). The failure you see is NOT where the bug is. Binary search upstream. Systematic methodology achieves ~95% first-time fix rate vs ~40% ad-hoc. The process is the multiplier. AgentDB read-start has run. Check past failures—you may have seen this pattern. Skill-specific: skills/debug/reference/debug-research.md

REPRODUCE — get specific before touching code.
- Document: exact input, expected output, actual output (full stack trace), environment, frequency.
- "Sometimes fails" is not a reproduction. Get deterministic.
- (gate: can reproduce consistently, OR have added targeted logging to wait for next occurrence)
HYPOTHESIZE — list 3 causes before pursuing any.
- Read ALL error output first (anchoring bias mitigation).
- Write each hypothesis to AgentDB. Prevents circular re-investigation.
- (gate: 3 candidate hypotheses written; none pursued yet)
ISOLATE — binary search, O(log n) not O(n).
- Code: call chain A→B→C→D→E fails → check midpoint C → recurse into failing half.
- Time: git bisect between known-good and known-bad commit. ~10 tests for 1000 commits.
- Input: large failing input → split in half → recurse to minimal reproduction case.
- Instrument at boundaries: log inputs/outputs at each layer boundary.
- Mock external dependencies to isolate which one causes failure.
- (gate: failure localized to a specific function/commit/input subset)
ROOT CAUSE — the error line is the FAILURE. The DEFECT is upstream.
- Ask: what assumption was violated? What invariant broke?
- If you can't explain WHY it broke, you haven't found root cause.
- Top causes by frequency: wrong input shape/type · off-by-one · missing null check · race condition · shared-state mutation · wrong comparison operator · variable scope · swallowed error · API contract mismatch · environment difference.
- (gate: can state root cause in one sentence explaining the violated invariant)
FIX — root cause, not symptom.
- Fix the DEFECT, not the FAILURE site. (Null check at crash site = symptom fix.)
- Write regression test that fails before fix, passes after.
- Run: original failing case + edge cases + full regression suite.
- Commit fix + test together.
- (gate: regression test green; original failing case passes)

<cognitive_biases> Seek DISCONFIRMING evidence first. Ask: "What would I see if my hypothesis were WRONG?" Read ALL output before investigating. List 3 causes before pursuing any. Past patterns are hypotheses, not conclusions. Check AgentDB but test them. 15 min with no evidence for hypothesis → abandon it. Write what you tested, move on. Re-run the EXACT original failing case. "Seems to work" is not evidence. </cognitive_biases>

<anti_patterns> Random changes until it works. Even if it works, you don't know why. Fragile. Change something, don't test original case, declare victory. Bug returns. Null check at crash instead of asking why null. Real defect is upstream. Logging everywhere. Use binary search first, then targeted logging. It's almost never the library. Check your usage first. Asking Claude to "investigate" without scope. Claude reads hundreds of files, filling context before doing anything useful. Scope investigations narrowly ("look at src/auth only") or use a subagent — the summary returns, the file reads don't. </anti_patterns>

<when_stuck>

Explain problem in writing (rubber duck — engages System 2 cognition).
Read error message carefully. Answer is there 80% of the time.
Simplify to minimal reproduction case.
"What changed?" Check git log, git diff, dependency updates, env changes.
Search exact error message in quotes.
Step away. Bias accumulates under sustained focus.
Plan Mode: Paste stack traces in Plan Mode first. Analyze, form hypotheses, get approval — then switch to Act mode for fixes.
Rewind: Esc+Esc or /rewind restores code to a pre-bug checkpoint without losing conversation context. Checkpoints survive terminal closes.
Visibility-first: Paste raw terminal output / error logs / screenshots directly. Raw data beats your interpretation.
Evidence-first, not assertion: Show the actual evidence (test output, exact command + result, screenshot) rather than stating "it works." Reviewing evidence is faster than re-running verification, and catches cases where "seems to work" masks a different failure path.
Extended thinking: for complex bugs with no clear hypothesis after all above, request deep analysis using extended thinking. Deliberate multi-step reasoning before output catches subtle root causes that fast responses miss.
--verbose mode: run claude --verbose for hard-to-reproduce bugs — shows tool calls, thinking steps, and execution paths in real time. Catches silent failures and misparsed outputs.
Native debugger: when an IDE or runtime debugger is available (VS Code, Xcode LLDB, Chrome DevTools), prefer it over print-statement flooding — set a breakpoint at the suspected boundary, inspect locals at the failure point, evaluate expressions mid-run without re-running from scratch. The call stack tells you the execution path that led to the state.
Pipe raw data: cat error.log | claude or npm run build 2>&1 | claude — pipe terminal output directly instead of copy-pasting. Claude receives full fidelity output without truncation or formatting artifacts.
Test-driven debugging: when failing behavior is covered by a test suite, run the tests first — failing test names + assertions mark the exact defect entry point. Trace backwards from the failing assertion rather than reading code cold; the test already has the failure isolated to a function boundary. npm test -- --watch <file> or pytest -x stops on first failure. Lets the test suite do the reproduction work. </when_stuck>

- 30+ min on single hypothesis with no evidence → abandon it. - 3+ hypotheses rejected → step back, re-examine assumptions. - 2 failed fix attempts → orchestrator invokes tearitapart. May be design problem, not implementation. - 2+ failed corrections in same session → `/clear`, rewrite initial prompt with lessons learned. Clean session beats polluted long session. - 3 consecutive responses under ~500 tokens with no progress → anchor-drift. `/clear`, strip to minimal reproduction, restart. - Bug only in production → add targeted monitoring, document, move on.

<parallel_debug_strategy> Competing hypothesis agents: for bugs with 3+ plausible causes, spawn one agent per hypothesis with fresh context. Each reports: evidence_for | evidence_against | confidence (0-1). Coroner agent synthesizes. Fresh context catches what a long session anchors past. When stuck >30 min, spawn fresh agent with ONLY: exact input, expected vs actual, relevant code section. Trial segmentation: when debugging multi-agent or long-running systems, break the execution trace into segments and analyze each independently. Shorter segments → cleaner causal attribution. Full traces cause spurious correlations that obscure root cause. One segment = one causal claim. See: skills/debug/reference/debug-research.md — Parallel Debug Strategy. </parallel_debug_strategy>

<agentic_debugging>

Check whether bug was introduced by an AI edit: git log --oneline -20, then git bisect.
AI edits tend to: drop early returns, misplace null checks, silently change API call signatures.
Distinguish tool-call failure (environment/permissions) from logic failure (code) — different fixes.
Check git status / git diff for interrupted-state partial writes before assuming canonical version.
Reduce to minimal reproduction BEFORE spawning agents. Agents given vague reproductions reproduce the wrong thing. </agentic_debugging>

<persistent_truth_file> Create _meta/context/DEBUG.md at session start. Update throughout. Claude reads it before each attempt. Prevents circular re-investigation across context compression boundaries.

Template fields: Problem Statement · What We Know · Approaches Tried (with failure reason) · Current Hypothesis · Next Steps. Full template: skills/debug/reference/debug-research.md — Persistent Investigation File. </persistent_truth_file>

<on_complete> agentdb write-end '{"skill":"debug","bug":"","root_cause":"<what_broke>","fix":"<what_fixed>","test":"<regression_test_name>","learned":"<pattern_for_future>"}'

Always record debugging learnings. They prevent repeat investigation. </on_complete>

debug

Popularity

Invocation

Tool Access

Context Preview

Supporting Files

SKILL.md

debug

Popularity

Invocation

Tool Access

Context Preview

Supporting Files

SKILL.md

Similar Skills

Similar Skills