Skill

code-review-deep

In-depth multi-pass code review of a GitHub change (local checkout). Decomposes the diff into logical review units, reviews each on the best available model (docs on the Haiku model), runs an advisory architectural pass on the highest-risk units, scores findings with an FP-aware scorer, consults the second-brain for conventions and prior reviews, and records false positives. Local output by default; --comment posts to the PR.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/second-brain:code-review-deep [<PR#>] [--comment] [--base <branch>]

User invocable

Model invocable

Inline context

Default effort

Argument hint[<PR#>] [--comment] [--base <branch>]

Tool Access

This skill is limited to the following tools:

ReadWriteBash(gh pr view *)Bash(gh pr comment *)Bash(gh pr list *)Bash(gh pr diff *)Bash(gh api *)Bash(gh repo view *)Bash(git diff *)Bash(git log *)Bash(git blame *)Bash(git rev-parse *)Bash(git merge-base *)Bash(git branch *)Bash(git status *)Bash(git remote *)Agentmcp__plugin_second-brain_knowledge-base__knowledge_searchmcp__plugin_second-brain_knowledge-base__episodic_search

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Thorough, multi-pass review of a GitHub change using the LOCAL git checkout for

SKILL.md

318 lines · ~5.3k tokens(exceeds 5k compaction limit)

Stats

LanguageShell

Stars0

MaintenanceExcellent

Last CommitJun 24, 2026

Actions

View Source View Plugin View on GitHub View README

Stats

Actions

Deep Code Review

Thorough, multi-pass review of a GitHub change using the LOCAL git checkout for diffs and file contents, and gh only for PR metadata and posting. Make a todo list first, then follow these passes precisely.

Arguments

<PR#> (optional): review that GitHub PR. Without it, review the current branch vs its base.
--comment: post the result as a PR comment (requires a PR context). Without it, output to the terminal only.
--base <branch>: override the auto-detected base branch.

Pass 0 — Eligibility + context load

Dispatch an agent on the Haiku model (pass model: "haiku") for each mechanical sub-step below — eligibility (step 2), CLAUDE.md discovery (step 3), and the change summary (step 4) — so the expensive orchestrator model is not spent on, and its context not bloated by, mechanical work. The decomposition (Pass 1) and the Pass 4 re-check likewise run on the Haiku model.

Resolve scope & base.
- Determine owner/repo from git remote get-url origin (or gh repo view --json nameWithOwner).
- Base branch: --base if given; else the PR's base (gh pr view <#> --json baseRefName); else git merge-base HEAD origin/main (fall back to origin/master). Record the base ref to use in diffs as origin/<base>.
- Head SHA: full local git rev-parse HEAD. Use this SHA LITERALLY in output links — never compute it inside a URL.
Eligibility (only if a PR is in scope). gh pr view <#> --json state,isDraft,headRefName,headRefOid,baseRefName,title,body,author. Stop (do not review) if: closed; draft / WIP; obviously automated or trivial; or we already posted a review (scan gh pr view <#> --comments for our "### Deep code review" + "Generated with [Claude Code]"). Sync guard: if local branch != headRefName → stop: "Local branch <cur> != PR source <headRefName>. Run: git checkout ". If local HEAD != headRefOid → stop: "Local HEAD <local[:8]> != PR head <headRefOid[:8]>. Run: git pull".
CLAUDE.md discovery. git diff --name-only origin/<base>...HEAD; Read the root CLAUDE.md (if any) and any CLAUDE.md in directories of changed files.
Change summary. From the PR title/body (or git log origin/<base>..HEAD --oneline when no PR), produce a concise summary.
Second-brain reads.
- knowledge_search with 3–5 keywords drawn from the changed paths/stack → collect convention/decision pages. Pass their text as "project conventions" alongside CLAUDE.md.
- episodic_search for prior reviews touching these files/this repo → distill a short "previously flagged / previously dismissed here" note.
- Prior-PR review-comment mining (Haiku step, finding-generating). Discover prior PRs touching the changed files: git log origin/<base> -n 200 -- <changed files>, parse PR numbers from merge/squash subjects (Merge pull request #N, (#N)), cap at the ~10 most-recent distinct PRs. For each, gh api repos/<owner>/<repo>/pulls/<N>/comments for inline review comments; keep only comments whose path is among the currently changed files (or same directory). Produce TWO outputs: (a) fold durable observations into the prior-review note above; (b) for each comment that STILL APPLIES (the change re-introduces/retains the concern), emit a prior-review candidate finding {file, lines, category: prior-review, severity, title, explanation citing PR #N + the comment, is_migrated_code}. These findings flow into Pass 3 dedup + scoring exactly like the per-unit findings. Best-effort: no remote / no gh / no PR history → skip silently and note "no prior-PR signal".
- Read ~/.second-brain/review-false-positives.md if it exists (else treat as empty). Hold its contents for Pass 3.
Change-intent classification (Haiku step). From the PR title/body (or git log origin/<base>..HEAD --oneline when no PR), set is_bugfix = does this change CLAIM TO FIX a reported runtime behavior (vs. a feature / refactor / docs / test-only change)? Record it — it gates Pass 3.5.
Fragile-premises note. Read ~/.second-brain/review-fragile-premises.md if it exists (else treat as empty). Hold its contents for Pass 2d.

Pass 1 — Review-unit decomposition (Haiku model)

git diff --stat origin/<base>...HEAD. Group changed files into logical review units (implementation + its tests; module/package cohesion; cross-layer feature slices; config/infra serving one purpose). Skip 100%-deleted files and trivial-only changes (whitespace, import reorder, version bump). Split any unit

15 files or ~3000 lines. Cap at 15 units (merge smallest if over). Tag each unit priority: critical (auth/security/data/access) | high (core logic, user-facing) | medium (utilities/internal) | low (config). Set docs_only: true ONLY when every file is prose documentation — matches *.md/*.mdx/*.txt/*.rst or lives under docs/** — AND none lives under skills/**, agents/**, tests/**, or any other executable/prompt tree. Those files ARE the product (a SKILL.md or agent .md is code-as-prompt) and must be reviewed on the best model, not the Haiku model. Config files (*.json, *.yaml, *.toml, dotfiles) are code-side, NOT docs. Any unit tagged critical or high is docs_only: false regardless of extension. There is no early-exit — true docs units are still reviewed, just on the Haiku model. Emit JSON:

[{"name":"...","files":["..."],"priority":"critical","skip":false,"docs_only":false}, ...]

Filter skip:true; sort critical-first; record the skipped count.

Pass 2 — Per-unit review (parallel agents, model by code-vs-docs)

Dispatch one Agent(subagent_type: "second-brain:code-review-unit-reviewer") per non-skipped unit, choosing the model by unit kind:

code units (docs_only: false): dispatch with NO model override — the agent inherits the session model, i.e. the best model available (the v2 directive).
doc units (docs_only: true): dispatch with model: "haiku" — docs don't need deep reasoning.

Dispatch in waves of at most 5 concurrent agents (not all 15 at once): run critical/high code units first, then medium/low code units, then doc units. Pack each wave to the cap from the priority-sorted list — backfill spare slots from the next tier; don't leave slots idle. The wave cap bounds peak agent count and RAM. Pass each agent: unit name + file list, origin/<base> as the base ref, the change summary, the combined project conventions (CLAUDE.md + wiki pages), and the episodic prior-review note. Each agent returns structured findings only (no file bodies). Collect them.

Pass 2b — Architectural pass (advisory, parallel)

If at least one critical or high unit exists, dispatch exactly ONE Agent(subagent_type: "second-brain:quality-reviewer") over the deduped union of all critical+high unit files. It occupies one slot in wave 1 (as do the Pass 2c history reviewer and the Pass 2d premise reviewer when they run) — so wave 1 holds at most 2 unit-reviewers + the architectural + history + premise reviewers (≤5 concurrent total), keeping the cap intact. Each skipped advisory/lens pass returns its slot to unit-reviewers (all three of 2b/2c/2d run → ≤2 unit-reviewers; any two run → ≤3; any one runs → ≤4; none → ≤5 unit-reviewers) — the ≤5 cap holds in every combination. It depends only on Pass 1's unit list, not Pass 2's findings. Pass it origin/<base> (the SAME base-ref form Pass 2 uses), the change summary, and the file set, and instruct it to scope findings to lines changed since that ref — ignore pre-existing issues on untouched lines. If there are no critical/high units, skip this pass.

Its CRITICAL/WARNING/INFO output is collected verbatim for a separate "Architectural notes (advisory)" section in Pass 4. These notes are advisory only: they are never scored or recorded as false positives, and are kept distinct from the numbered bug findings.

Pass 2c — History / regression pass (scored, parallel)

If at least one non-skipped code unit exists (docs_only: false), dispatch exactly ONE Agent(subagent_type: "second-brain:code-review-history-reviewer") over the deduped union of all non-skipped code-unit files. It occupies one slot in wave 1 alongside the architectural reviewer (see the Pass 2b wave-1 note). It depends only on Pass 1's unit list, not Pass 2's findings, so it runs concurrently. Pass it origin/<base> (the SAME base-ref form Pass 2 uses), the change summary, the combined project conventions (CLAUDE.md + wiki), and the episodic prior-review note. Unlike the architectural pass, its findings ARE bugs (category regression): they flow into Pass 3 dedup + scoring exactly like the per-unit findings. If every unit is docs-only, skip this pass.

Pass 2d — Runtime-premise pass (scored, parallel)

If at least one non-skipped code unit exists (docs_only:false), dispatch exactly ONE Agent(subagent_type:"second-brain:code-review-premise-reviewer") over the deduped union of all non-skipped code-unit files. It occupies one slot in wave 1 alongside the architectural (2b) and history (2c) reviewers. It depends only on Pass 1's unit list, not Pass 2's findings, so it runs concurrently. Pass it origin/<base>, the change summary, the combined project conventions (CLAUDE.md + wiki), the prior-review note, and the review-fragile-premises.md contents from Pass 0. Its findings (category premise) flow into Pass 3 dedup + scoring exactly like the per-unit findings. The premise reviewer NAMES unproven runtime premises (the bug class diff-static review misses); Pass 3.5 PROBES them. If every unit is docs-only, skip this pass.

Pass 3 — Dedup + scoring + filter

Dedup: if a shared file produced the same finding in two units, keep the better-explained one. On a cross-pass collision between a regression finding (Pass 2c, which cites a prior commit short-SHA) and a non-regression finding on the same line, prefer the regression one — its commit citation is what makes it actionable and would otherwise be lost. prior-review findings (Pass 0) participate in dedup and scoring like any other finding.
Score: for each unique finding dispatch Agent(subagent_type: "second-brain:code-review-scorer"), passing the finding, its file paths, the project conventions, and the false-positive store contents from Pass 0. A premise finding (Pass 2d) scores HIGH when the premise is load-bearing AND unproven AND — if Pass 3.5 ran — shown BROKEN; LOW when Pass 3.5 confirmed it holds or it is established/defended. A premise Pass 3.5 marked BROKEN is force-promoted to confirmed (≥70) regardless of the scorer's number. For findings of severity critical or high, run a 3-vote refuter panel instead of a single scorer: one normal code-review-scorer plus two dispatched with REFUTE MODE in their task. The final score is the median of the three (mathematically identical to "confirmed iff >= 2 of 3 score >= 70"), so it drops into the >= 70 / 16-69 / <= 15 partition with no rule change. The panel scorers inherit the session/best model (a quality floor — a refuter must out-reason the finder); do NOT pin them to a cheaper model. Medium/low findings keep the single scorer. Dispatch scorers and refuter-panel votes in the same waves of at most 5 concurrent agents as Pass 2 — Pass 3 scoring shares the ≤5 wave cap (the refuter panel multiplies Pass-3 agents ×3 for each critical/high finding, so this bounds peak agent count and RAM on a constrained host, exactly as in Pass 2).
Partition the scored findings into three buckets (keep all until Pass 4):
- confirmed (score ≥ 70): the numbered review output, sorted by severity then score.
- low-confidence (score 16–69): NOT confirmed, but surfaced in Pass 4 as a separate, clearly-labeled "Lower-confidence findings" section so a real but hard-to-verify bug is never silently dropped. Retain until Pass 4.
- killed-hard (score ≤ 15): dropped — neither shown nor recorded. The scorer now inherits the session model (matches the reviewer it gates), so a ≤15 kill is trustworthy enough to drop without recording.

Pass 3.5 — Bug-fix real-env verification (orchestrator, gated)

Runs ONLY when is_bugfix (Pass 0) AND Pass 2d flagged ≥1 load-bearing premise. This is the ONE step that executes code — run by the orchestrator (this trusted session), NEVER by a sandboxed PR-influenced agent.

Confirm with the user. Print exactly what each proof_probe will run; it executes code. On decline: skip, mark the premise findings "unverified (user declined)", continue to Pass 4. Never blocks the review.
Probe each flagged premise via its proof_probe, exercising the changed code path in the real env — the actual environment state, NOT a sandbox that sets convenient values. Record holds / BROKEN. A BROKEN premise elevates its finding to confirmed critical ("fix does not hold in the real runtime").
Failure-regime test check. Confirm the change adds/modifies a test that exercises the premise's FALSE regime (e.g. the env var UNSET). Missing → a test-gap finding ("no test covers the regime where the bug occurs").

Best-effort: any probe error is reported, never fails the review.

Pass 4 — Output + false-positive write-back

Output.
- Default (no --comment): print the formatted review to the terminal.
- --comment: re-run the eligibility check (still open, no competing deep review posted since we started), then gh pr comment <#> --body "...".
- Comment format (no emojis):
```
### Deep code review

Analyzed X review units (Y files, Z skipped as trivial). Found N issues:

1. **<brief description>** (category: severity)

<link>

...

Generated with [Claude Code](https://claude.ai/code) using second-brain:code-review-deep
```
  Categories include: logic-error, type-safety, cross-file, edge-case, test-gap, convention, security, infrastructure, regression, premise, prior-review.
  
  Or, if none: Analyzed X review units (Y files, Z skipped as trivial). No issues found.
- Lower-confidence findings (unverified). If the low-confidence bucket (score 16–69) is non-empty, append — after the numbered confirmed findings and before the architectural notes — a section titled Lower-confidence findings (unverified — may be false positives). Lead with one line noting these were found but not confirmed at high confidence and may include false positives, then list each as - **<brief>** (category: severity) followed by its <link>. Keep it visually distinct from the numbered confirmed list and from the architectural notes so the three are never conflated. For --comment, post it under that same subhead. If the bucket is empty, omit the section.
- Architectural notes (advisory). If Pass 2b ran, append after the numbered findings a section titled Architectural notes (advisory — not blocking) containing the quality-reviewer output. For --comment, post it under that same labelled subhead, visually separated from the numbered bug list so a reader never mistakes an architectural opinion for a confirmed bug.
- Runtime-premise verification. If Pass 3.5 ran, append a section titled Runtime-premise verification listing each probed premise with holds / BROKEN and a one-line real-env evidence note. A confirmed BROKEN premise may be appended (user-confirmed) to ~/.second-brain/review-fragile-premises.md using the format below — same best-effort, never-fail discipline as the false-positive store.
- Link format (literal full SHA, renders in Markdown): https://github.com/<owner>/<repo>/blob/<FULL-SHA>/<path>#L<start>-L<end> — full SHA written literally (NOT $(git rev-parse …)), # after the path, range L<start>-L<end>, ≥1 line of context each side.
False-positive write-back (user dismissals only — no auto-record).
- Do NOT auto-record killed-hard or low-confidence findings. v2.1 removes the auto-record ratchet: a wrong auto-suppression hides a real bug indefinitely, whereas a missing entry just means the finding is re-judged next run (cheap, now that the scorer matches the reviewer). The store grows ONLY from explicit user action.
- After a terminal review, offer: "Mark any shown finding (confirmed or lower-confidence) as a false positive to remember it?" Record only the findings the user dismisses.
- For each user-dismissed pattern, append an entry to ~/.second-brain/review-false-positives.md (read current contents with Read, append, Write back; if the file is absent create it with the header below). Recording is best-effort — a write failure must NOT fail the review.
File header (only when creating it):
```
  # Review false-positive patterns
  
```
Per entry:
```
  ## <short pattern title>
  - repo: <owner/repo>
  - where: <path or glob> (<category>)
  - why not a bug: <one-line reason>
  - source: user-dismissed
  - date: <YYYY-MM-DD>
```
Fragile-premises file (~/.second-brain/review-fragile-premises.md) — header on create:
```
  # Review fragile-premise patterns
  
```
Per entry:
```
  ## <short premise title>
  - repo: <owner/repo>
  - premise: <the assumption that proved fragile>
  - why fragile: <one-line: how it fails in the real runtime>
  - source: pass-3.5-confirmed | user
  - date: <YYYY-MM-DD>
```

Degradation

If parallel subagent dispatch is unavailable, fall back to a single-context review over the full git diff origin/<base>...HEAD (no unit fan-out, no parallel scoring) and say so in the output. Second-brain reads and FP write-back still apply.

False positives to avoid (carried from the standard reviewer)

Pre-existing issues; not-actually-a-bug; senior-engineer nitpicks; anything a linter/typechecker/compiler catches; general quality gripes unless a convention requires them; convention issues explicitly silenced in code; intentional functional changes; real issues on lines this change did not modify.

Notes

Do not build, typecheck, or run the app — CI handles that. The ONE exception is Pass 3.5: a narrow, orchestrator-run, user-confirmed, bug-fix-only premise probe (a specific proof_probe, not a general build/typecheck/test run).
Use gh for PR metadata/posting; use local git diff + Read for code.
Small changes (< 20 files) may yield only 1–3 units. That's fine.
If repeated runs leave "ghost" agents / RAM growth, triage on the affected box: (a) ps -eo pid,etimes,args | grep -E 'claude --bare|claude -p' — recursive extractors (fires only in API-key mode; OAuth queues them) → fix at the Stop-hook extractor; (b) ps -eo pid,ppid,args | grep server.bundle — orphaned MCP servers whose parent claude exited → reap them; (c) the session claude RSS climbing run-over-run is parent-context bloat, inherent to inline fan-out — the wave cap + lean sub-agent returns above are the mitigation (bounded, not a true leak).
Cost ownership. Model tiering here is the skill's OWN (self-routing). cost-router does not override it — cost-router only routes a review request to this skill (see cost-router setup), because it cannot tier a running skill's internal dispatches. The per-pass model choices in this skill are correctness/resource decisions, not price decisions: best-model-for-code and the code-as-prompt .md exception are accuracy floors; the wave caps and unit-size bounds are RAM/peak-agent ceilings.
Model vs. agent names: whenever a pass names a model (Haiku / Sonnet / Opus) it means the dispatch model parameter (e.g. model: "haiku") — NEVER an agent name. Subagents are always named via subagent_type: "second-brain:<agent>". Don't go looking for an agent called "Haiku".

code-review-deep

Invocation

Tool Access

Context Preview

SKILL.md

code-review-deep

Invocation

Tool Access

Context Preview

SKILL.md

Deep Code Review

Arguments

Pass 0 — Eligibility + context load

Pass 1 — Review-unit decomposition (Haiku model)

Pass 2 — Per-unit review (parallel agents, model by code-vs-docs)

Pass 2b — Architectural pass (advisory, parallel)

Pass 2c — History / regression pass (scored, parallel)

Pass 2d — Runtime-premise pass (scored, parallel)

Pass 3 — Dedup + scoring + filter

Pass 3.5 — Bug-fix real-env verification (orchestrator, gated)

Pass 4 — Output + false-positive write-back

Degradation

False positives to avoid (carried from the standard reviewer)

Notes

Similar Skills

Deep Code Review

Arguments

Pass 0 — Eligibility + context load

Pass 1 — Review-unit decomposition (Haiku model)

Pass 2 — Per-unit review (parallel agents, model by code-vs-docs)

Pass 2b — Architectural pass (advisory, parallel)

Pass 2c — History / regression pass (scored, parallel)

Pass 2d — Runtime-premise pass (scored, parallel)

Pass 3 — Dedup + scoring + filter

Pass 3.5 — Bug-fix real-env verification (orchestrator, gated)

Pass 4 — Output + false-positive write-back

Degradation

False positives to avoid (carried from the standard reviewer)

Notes

Similar Skills