Git Bisect Debugging

Overview

Systematically identify which commit introduced a bug or regression using git bisect. This skill provides a structured workflow for automated, manual, and hybrid bisect approaches.

Core principle: Binary search through commit history to find the exact commit that introduced the issue. Main agent orchestrates, subagents execute verification at each step.

Announce at start: "I'm using git-bisect-debugging to find which commit introduced this issue."

Quick Reference

Phase	Key Activities	Output
1. Setup & Verification	Identify good/bad commits, verify clean state	Confirmed commit range
2. Strategy Selection	Choose automated/manual/hybrid approach	Test script or verification steps
3. Execution	Run bisect with subagents	First bad commit hash
4. Analysis & Handoff	Show commit details, analyze root cause	Root cause understanding

MANDATORY Requirements

These are non-negotiable. No exceptions for time pressure, production incidents, or "simple" cases:

✅ ANNOUNCE skill usage at start:

"I'm using git-bisect-debugging to find which commit introduced this issue."

✅ CREATE TodoWrite checklist immediately (before Phase 1):
- Copy the exact checklist from "The Process" section below
- Update status as you progress through phases
- Mark phases complete ONLY when finished
✅ VERIFY safety checks (Phase 1 - no skipping):
- Working directory MUST be clean (git status)
- Good commit MUST be verified (actually good)
- Bad commit MUST be verified (actually bad)
- If ANY check fails → abort and fix before proceeding
✅ USE AskUserQuestion for strategy selection (Phase 2):
- Present all 3 approaches (automated, manual, hybrid)
- Don't default to automated without asking
- User must explicitly choose
✅ LAUNCH subagents for verification (Phase 3):
- Main agent: orchestrates git bisect state
- Subagents: execute verification at each commit (via Task tool)
- NEVER run verification in main context
- Each commit tested in isolated subagent
✅ HANDOFF to systematic-debugging (Phase 4):
- After finding bad commit, announce handoff
- Use superpowers:systematic-debugging skill
- Investigate root cause, not just WHAT changed

Red Flags - STOP and Follow the Skill

If you catch yourself thinking ANY of these, you're about to violate the skill:

"User is in a hurry, I'll skip safety checks" → NO. Run all safety checks.
"This is simple, no need for TodoWrite" → NO. Create the checklist.
"I'll just use automated approach" → NO. Use AskUserQuestion.
"I'll run the test in my context" → NO. Launch subagent.
"Found the commit, that's enough" → NO. Handoff to systematic-debugging.
"Working directory looks clean" → NO. Run git status to verify.
"I'll verify good/bad commits later" → NO. Verify BEFORE starting bisect.

All of these mean: STOP. Follow the 4-phase workflow exactly.

The Process

Copy this checklist to track progress:

Git Bisect Progress:
- [ ] Phase 1: Setup & Verification (good/bad commits identified)
- [ ] Phase 2: Strategy Selection (approach chosen, script ready)
- [ ] Phase 3: Execution (first bad commit found)
- [ ] Phase 4: Analysis & Handoff (root cause investigation complete)

Phase 1: Setup & Verification

Purpose: Ensure git bisect is appropriate and safe to run.

Steps:

Verify prerequisites:
- Check we're in a git repository
- Verify working directory is clean (git status)
- If uncommitted changes exist, abort and ask user to commit or stash
Identify commit range:
- Ask user for good commit (where it worked)
  - Suggestions: last release tag, last passing CI, commit from when it worked
  - Commands to help: git log --oneline, git tag, git log --since="last week"
- Ask user for bad commit (where it's broken)
  - Usually: HEAD or a specific commit where issue confirmed
- Calculate estimated steps: ~log2(commits between good and bad)
Verify the range:
- Checkout bad commit and verify issue exists
- Checkout good commit and verify issue doesn't exist
- If reversed, offer to swap them
- Return to original branch/commit
Safety checks:
- Warn if range is >1000 commits (ask for confirmation)
- Verify good commit is ancestor of bad commit
- Note current branch/commit for cleanup later

Output: Confirmed good commit hash, bad commit hash, estimated steps

Phase 2: Strategy Selection

Purpose: Choose the most efficient bisect approach.

Assessment: Can we write an automated test script that deterministically identifies good vs bad?

MANDATORY: Use AskUserQuestion tool to present these three approaches (do NOT default to automated):

AskUserQuestion({
  questions: [{
    question: "Which git bisect approach should we use?",
    header: "Strategy",
    multiSelect: false,
    options: [
      {
        label: "Automated - test script runs automatically",
        description: "Fast, no manual intervention. Best for: test failures, crashes, deterministic behavior. Requires: working test script."
      },
      {
        label: "Manual - you verify each commit",
        description: "Handles subjective issues. Best for: UI/UX changes, complex scenarios. Requires: you can manually check each commit."
      },
      {
        label: "Hybrid - script + manual confirmation",
        description: "Efficient with reliability. Best for: mostly automated but needs judgment. Requires: script for most cases, manual for edge cases."
      }
    ]
  }]
})

Three approaches to present:

Approach 1: Automated Bisect

When to use: Test failure, crash, deterministic behavior
How it works: Script returns exit 0 (good) or 1 (bad), fully automatic
Benefits: Fast, no manual intervention, reproducible
Requirements: Can write a script that runs the test/check

Approach 2: Manual Bisect

When to use: UI/UX changes, subjective behavior, complex scenarios
How it works: User verifies at each commit, Claude guides
Benefits: Handles non-deterministic or subjective issues
Requirements: User can manually verify each commit

Approach 3: Hybrid Bisect

When to use: Mostly automatable but needs human judgment
How it works: Script narrows range, manual verification for final confirmation
Benefits: Efficiency of automation with reliability of manual check
Requirements: Can automate most checks, manual for edge cases

If automated or hybrid selected:

Write test script following this template:

#!/bin/bash
# Exit codes: 0 = good, 1 = bad, 125 = skip (can't test)

# Setup/build (required for each commit)
npm install --silent 2>/dev/null || exit 125

# Run the actual test
npm test -- path/to/specific-test.js
exit $?

Script guidelines:

Make it deterministic (no random data, use fixed seeds)
Make it fast (runs ~log2(N) times)
Exit codes: 0 = good, 1 = bad, 125 = skip
Include build/setup (each commit might need different deps)
Test ONE specific thing, not entire suite
Make it read-only (no data modification)

If manual selected:

Write specific verification steps for subagent:

Good example:

1. Run `npm start`
2. Open browser to http://localhost:3000
3. Click the "Login" button
4. Check if it redirects to /dashboard
5. Respond 'good' if redirect happens, 'bad' if it doesn't

Bad example:

See if the login works

Output: Selected approach, test script (if automated/hybrid), or verification steps (if manual)

Phase 3: Execution

Architecture: Main agent orchestrates bisect, subagents verify each commit in isolated context.

Main agent responsibilities:

Manage git bisect state (start, good, bad, reset)
Track progress and communicate remaining steps
Launch subagents for verification
Handle errors and cleanup

Subagent responsibilities:

Execute verification in clean context (no bleeding between commits)
Report result: "good", "bad", or "skip"
Provide brief reasoning for result

Execution flow:

Main agent: Run git bisect start <bad> <good>

Loop until bisect completes:

a. Git checks out a commit to test

b. Main agent launches subagent using Task tool:

For automated:

Prompt: "Run this test script and report the result:

<script content>

Report 'good' if exit code is 0, 'bad' if exit code is 1, 'skip' if exit code is 125.
Include the output of the script in your response."

For manual:

Prompt: "We're testing commit <hash> (<message>).

Follow these verification steps:
<verification steps>

Report 'good' if the issue doesn't exist, 'bad' if it does exist.
Explain what you observed."

For hybrid:

Prompt: "Run this test script:

<script content>

If exit code is 0 or 1, report that result.
If exit code is 125 or script is ambiguous, perform manual verification:
<verification steps>

Report 'good', 'bad', or 'skip' with explanation."

c. Subagent returns: Result ("good", "bad", or "skip") with explanation

d. Main agent: Run git bisect good|bad|skip based on result

e. Main agent: Update progress

Show commit that was tested and result
Calculate remaining steps: git bisect log | grep "# .*step" | tail -1
Example: "Tested commit abc123 (bad). ~4 steps remaining."

f. Repeat until git bisect identifies first bad commit

Main agent: Run git bisect reset to cleanup
Main agent: Return to original branch/commit

Error handling during execution:

Subagent timeout/error: Allow user to manually mark as "skip"
Build failures: Use git bisect skip
Too many skips (>5): Suggest manual investigation, show untestable commits
Bisect interrupted: Ensure git bisect reset runs in cleanup

Output: First bad commit hash, bisect log showing the path taken

Phase 4: Analysis & Handoff

Purpose: Present findings and analyze root cause.

Steps:

Present the identified commit:

Found first bad commit: <hash>

Author: <author>
Date: <date>

<commit message>

Files changed:
<list of files from git show --stat>

Show how to view details:

View full diff: git show <hash>
View file at that commit: git show <hash>:<file>

Handoff to root cause analysis:
- Announce: "Now that we've found the breaking commit at <hash>, I'm using systematic-debugging to analyze why this change caused the issue."
- Use superpowers:systematic-debugging skill to investigate
- Focus analysis on the changes in the bad commit
- Identify the specific line/change that caused the issue
- Explain WHY it broke (not just WHAT changed)

Output: Root cause understanding of why the commit broke functionality

Safety & Error Handling

Pre-flight Checks (Phase 1)

✅ Working directory is clean
✅ In a git repository
✅ Good/bad commits exist and are valid
✅ Good commit is actually good (issue doesn't exist)
✅ Bad commit is actually bad (issue exists)
✅ Good is ancestor of bad
⚠️ Warn if >1000 commits in range

During Execution (Phase 3)

Subagent fails: Log error, allow skip or abort
Build fails: Use git bisect skip, continue
Ambiguous result: Use git bisect skip, max 5 skips
Can't determine good/bad: Ask user for guidance

Cleanup & Recovery

Always run git bisect reset when done (success or failure)
If interrupted, prompt user to run git bisect reset
Return to original branch/commit
If bisect is running and skill exits, warn user to cleanup

Failure Modes

Too many skips: Report untestable commits, suggest narrower range or manual review
Good/bad reversed: Detect pattern (all results opposite), offer to restart with swapped inputs
No bad commit found: Verify bad commit is actually bad, check if issue is environmental

Best Practices

Optimizing Commit Range

Narrow the range first if possible:
- Issue appeared last week? Start from last week, not 6 months ago
- Use git log --since="2 weeks ago" to find starting point
- Use tags/releases as good commits when possible

Writing Good Test Scripts

Do:

✅ Test ONE specific thing
✅ Make it deterministic (fixed seeds, no random data)
✅ Make it fast (runs log2(N) times)
✅ Include setup/build in script
✅ Use proper exit codes (0=good, 1=bad, 125=skip)

Don't:

❌ Run entire test suite (too slow)
❌ Depend on external state (databases, APIs)
❌ Use random data or timestamps
❌ Modify production data

Manual Verification

Be specific:

✅ "API returns 200 for GET /health"
✅ "Login button redirects to /dashboard"
❌ "See if it works"
❌ "Check if login is broken"

Give exact steps:

Run server with npm start
Open browser to http://localhost:3000
Click element with id="login-btn"
Verify URL changes to /dashboard

Common Patterns

Issue Type	Recommended Approach	Script/Steps Example
Test failure	Automated	`npm test -- failing-test.spec.js`
Crash/error	Automated	`node app.js 2>&1
Performance	Automated	`time npm run benchmark \| awk '{if ($1 > 5.0) exit 1}'`
UI/UX change	Manual	"Click X, verify Y appears"
Behavior change	Manual or Hybrid	Script to check, manual to confirm subjective aspects

Progress Communication

After each step: "Tested commit abc123 (<result>). ~X steps remaining."
Show bisect log periodically: git bisect log
Estimate remaining steps: log2(commits in range)
Example: 100 commits → ~7 steps, 1000 commits → ~10 steps

Common Rationalizations (Resist These!)

Rationalization	Reality	What to Do Instead
"User is in a hurry, skip safety checks"	Broken bisect from dirty state wastes MORE time	Run all Phase 1 checks. Always.
"This is simple, no need for TodoWrite"	You'll skip phases without tracking	Create checklist immediately
"I'll just use automated approach"	User might prefer manual for vague issues	Use AskUserQuestion tool
"I'll run the test in my context"	Context bleeding between commits breaks bisect	Launch subagent for each verification
"Working directory looks clean"	Assumptions cause failures	Run `git status` to verify
"I'll verify good/bad commits later"	Starting with wrong good/bad wastes all steps	Verify BEFORE `git bisect start`
"Found the commit, user knows why"	User asked to FIND it, not debug it	Hand off to systematic-debugging
"Production incident, no time for process"	Skipping process in incidents causes MORE incidents	Follow workflow. It's faster.
"I remember from baseline, no need to test"	Skills evolve, baseline was different session	Test at each commit with subagent

If you catch yourself rationalizing, STOP. Go back to MANDATORY Requirements section.

Integration with Other Skills

Called BY systematic-debugging

When systematic-debugging determines an issue is historical:

systematic-debugging detects:
- Issue doesn't exist in commit from 2 weeks ago
- Issue exists now
→ Suggests: "This appears to be a regression. I'm using git-bisect-debugging to find when it was introduced."
→ Invokes: git-bisect-debugging skill
→ Returns: First bad commit for analysis
→ Resumes: systematic-debugging analyzes the breaking change

Calls systematic-debugging

In Phase 4, after finding the bad commit:

git-bisect-debugging completes:
→ Announces: "Found commit abc123. Using systematic-debugging to analyze root cause."
→ Invokes: superpowers:systematic-debugging
→ Context: "Focus on changes in commit abc123"
→ Goal: Understand WHY the change broke functionality

Limitations (By Design)

This skill focuses on straightforward scenarios. It does NOT handle:

❌ Complex merge commit issues (would need --first-parent)
❌ Flaky/intermittent test failures (would need statistical approaches)
❌ Build system failures across many commits (would need advanced skip strategies)

For these scenarios, manual git bisect with user guidance is recommended.

Example Workflows

Example 1: Automated Test Failure

User: "The login test started failing sometime in the last 50 commits."

Claude: "I'm using git-bisect-debugging to find which commit introduced this issue."

[Phase 1: Setup]
- git status → clean
- Good commit: v1.2.0 tag (last release)
- Bad commit: HEAD
- Verify: checkout v1.2.0, run test → passes
- Verify: checkout HEAD, run test → fails
- Range: 47 commits, ~6 steps estimated

[Phase 2: Strategy Selection]
AskUserQuestion: Which approach?
User selects: Automated

Script created:
#!/bin/bash
npm install --silent 2>/dev/null || exit 125
npm test -- tests/login.spec.js
exit $?

[Phase 3: Execution]
Step 1: Launch subagent for commit abc123
  Subagent: "Test failed. Exit code 1. bad"
  Main: git bisect bad → ~3 steps remaining

Step 2: Launch subagent for commit def456
  Subagent: "Test passed. Exit code 0. good"
  Main: git bisect good → ~2 steps remaining

Step 3: Launch subagent for commit ghi789
  Subagent: "Test failed. Exit code 1. bad"
  Main: git bisect bad → ~1 step remaining

Step 4: Launch subagent for commit jkl012
  Subagent: "Test passed. Exit code 0. good"
  Main: git bisect good

Git bisect identifies: ghi789 is first bad commit

[Phase 4: Analysis]
Found first bad commit: ghi789

Author: John Doe
Date: 2025-11-10

feat: update authentication middleware

Files changed:
  src/auth/middleware.js | 12 ++++---
  tests/login.spec.js    |  3 +-

Using systematic-debugging to analyze why this change broke the test...
[Hands off to systematic-debugging skill]

Example 2: Manual UI Regression

User: "The dashboard layout looks wrong, but I'm not sure when it broke."

Claude: "I'm using git-bisect-debugging to find which commit introduced this issue."

[Phase 1: Setup]
- git status → clean
- Good commit: 2 weeks ago (user remembers it working)
- Bad commit: HEAD
- Range: 89 commits, ~7 steps estimated

[Phase 2: Strategy Selection]
AskUserQuestion: Which approach?
User selects: Manual

Verification steps created:
1. Run `npm run dev`
2. Open browser to http://localhost:3000/dashboard
3. Check if the sidebar and main content are side-by-side
4. Respond 'good' if layout is correct, 'bad' if sidebar overlaps content

[Phase 3: Execution]
Step 1: Checkout commit abc123
  Launch subagent with verification steps
  User verifies: "Layout is correct. good"
  Main: git bisect good → ~3 steps remaining

Step 2: Checkout commit def456
  Launch subagent with verification steps
  User verifies: "Sidebar overlaps content. bad"
  Main: git bisect bad → ~2 steps remaining

... continues until first bad commit found

[Phase 4: Analysis]
Found first bad commit: mno345

Author: Jane Smith
Date: 2025-11-08

refactor: migrate to CSS Grid layout

Files changed:
  src/components/Dashboard.css | 25 +++++++++++----------

Using systematic-debugging to analyze the CSS changes...
[Hands off to systematic-debugging skill]

Example 3: Hybrid Performance Regression

User: "API response times got slower recently."

Claude: "I'm using git-bisect-debugging to find which commit introduced this issue."

[Phase 1: Setup]
- git status → clean
- Good commit: last release tag v2.1.0
- Bad commit: HEAD
- Range: 120 commits, ~7 steps estimated

[Phase 2: Strategy Selection]
AskUserQuestion: Which approach?
User selects: Hybrid

Script created:
#!/bin/bash
npm install --silent 2>/dev/null || exit 125

# Run benchmark 3 times, take average
total=0
for i in 1 2 3; do
  time=$(npm run benchmark:api 2>/dev/null | grep "response_time" | awk '{print $2}')
  [ -z "$time" ] && exit 125  # Can't test
  total=$(echo "$total + $time" | bc)
done
avg=$(echo "scale=2; $total / 3" | bc)

# Threshold: 500ms is acceptable
if (( $(echo "$avg > 500" | bc -l) )); then
  exit 1  # bad (too slow)
else
  exit 0  # good (fast enough)
fi

Manual fallback steps:
"If script is ambiguous, manually test API and verify response time is <500ms"

[Phase 3: Execution]
Steps proceed with script automation...
If script returns 125 (can't test), subagent asks user to manually verify

[Phase 4: Analysis]
Found first bad commit: pqr678

Author: Bob Johnson
Date: 2025-11-11

feat: add caching layer for user preferences

Files changed:
  src/api/middleware/cache.js | 45 ++++++++++++++++++++++++++++++++

Using systematic-debugging to analyze the caching implementation...
[Reveals: Cache lookup is synchronous and blocking, causing slowdown]

Troubleshooting

"Good and bad are reversed"

If early results suggest good/bad are swapped:

Stop bisect
Verify issue description is correct
Swap good/bad commits and restart

"Too many skips, can't narrow down"

If >5 commits skipped:

Review skipped commits manually
Check if builds are broken in that range
Consider narrowing the range or manual investigation

"Bisect is stuck/interrupted"

If bisect state is corrupted or interrupted:

git bisect reset  # Clean up bisect state
git checkout main  # Return to main branch
# Restart bisect with better range/script

"Subagent is taking too long"

Set reasonable timeout for verification
If automated: optimize test script
If manual: simplify verification steps
Consider marking commit as 'skip'

Summary

When to use: Historical bugs, regressions, "when did this break" questions

Key strengths:

✅ Systematic binary search (efficient)
✅ Subagent isolation (clean context)
✅ Automated + manual + hybrid approaches
✅ Integrates with systematic-debugging

Remember:

Always verify good is good, bad is bad
Keep test scripts deterministic and fast
Use subagents for each verification step
Clean up with git bisect reset
Hand off to systematic-debugging for root cause

git-bisect-debugging