From second-brain
In-depth multi-pass code review of a GitHub change (local checkout). Decomposes the diff into logical review units, reviews each on the best available model (docs on the Haiku model), runs an advisory architectural pass on the highest-risk units, scores findings with an FP-aware scorer, consults the second-brain for conventions and prior reviews, and records false positives. Local output by default; --comment posts to the PR.
How this skill is triggered — by the user, by Claude, or both
Slash command
/second-brain:code-review-deep [<PR#>] [--comment] [--base <branch>][<PR#>] [--comment] [--base <branch>]This skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
Thorough, multi-pass review of a GitHub change using the LOCAL git checkout for
Thorough, multi-pass review of a GitHub change using the LOCAL git checkout for
diffs and file contents, and gh only for PR metadata and posting. Make a todo
list first, then follow these passes precisely.
<PR#> (optional): review that GitHub PR. Without it, review the current
branch vs its base.--comment: post the result as a PR comment (requires a PR context). Without
it, output to the terminal only.--base <branch>: override the auto-detected base branch.Dispatch an agent on the Haiku model (pass model: "haiku") for each mechanical
sub-step below — eligibility (step 2), CLAUDE.md discovery (step 3), and the change
summary (step 4) — so the expensive orchestrator model is not spent on, and its
context not bloated by, mechanical work. The decomposition (Pass 1) and the Pass 4
re-check likewise run on the Haiku model.
owner/repo from git remote get-url origin (or gh repo view --json nameWithOwner).--base if given; else the PR's base (gh pr view <#> --json baseRefName); else git merge-base HEAD origin/main (fall back to origin/master). Record the base ref to use in diffs as origin/<base>.git rev-parse HEAD. Use this SHA LITERALLY in output links — never compute it inside a URL.gh pr view <#> --json state,isDraft,headRefName,headRefOid,baseRefName,title,body,author. Stop (do not review) if: closed; draft / WIP; obviously automated or trivial; or we already posted a review (scan gh pr view <#> --comments for our "### Deep code review" + "Generated with [Claude Code]"). Sync guard: if local branch != headRefName → stop: "Local branch <cur> != PR source <headRefName>. Run: git checkout ". If local HEAD != headRefOid → stop: "Local HEAD <local[:8]> != PR head <headRefOid[:8]>. Run: git pull".git diff --name-only origin/<base>...HEAD; Read the root CLAUDE.md (if any) and any CLAUDE.md in directories of changed files.git log origin/<base>..HEAD --oneline when no PR), produce a concise summary.knowledge_search with 3–5 keywords drawn from the changed paths/stack → collect convention/decision pages. Pass their text as "project conventions" alongside CLAUDE.md.episodic_search for prior reviews touching these files/this repo → distill a short "previously flagged / previously dismissed here" note.git log origin/<base> -n 200 -- <changed files>, parse PR numbers from merge/squash subjects (Merge pull request #N, (#N)), cap at the ~10 most-recent distinct PRs. For each, gh api repos/<owner>/<repo>/pulls/<N>/comments for inline review comments; keep only comments whose path is among the currently changed files (or same directory). Produce TWO outputs: (a) fold durable observations into the prior-review note above; (b) for each comment that STILL APPLIES (the change re-introduces/retains the concern), emit a prior-review candidate finding {file, lines, category: prior-review, severity, title, explanation citing PR #N + the comment, is_migrated_code}. These findings flow into Pass 3 dedup + scoring exactly like the per-unit findings. Best-effort: no remote / no gh / no PR history → skip silently and note "no prior-PR signal".~/.second-brain/review-false-positives.md if it exists (else treat as empty). Hold its contents for Pass 3.git log origin/<base>..HEAD --oneline when no PR), set is_bugfix = does this
change CLAIM TO FIX a reported runtime behavior (vs. a feature / refactor / docs /
test-only change)? Record it — it gates Pass 3.5.~/.second-brain/review-fragile-premises.md if it
exists (else treat as empty). Hold its contents for Pass 2d.git diff --stat origin/<base>...HEAD. Group changed files into logical review
units (implementation + its tests; module/package cohesion; cross-layer feature
slices; config/infra serving one purpose). Skip 100%-deleted files and
trivial-only changes (whitespace, import reorder, version bump). Split any unit
15 files or ~3000 lines. Cap at 15 units (merge smallest if over). Tag each unit priority: critical (auth/security/data/access) | high (core logic, user-facing) | medium (utilities/internal) | low (config). Set
docs_only: trueONLY when every file is prose documentation — matches*.md/*.mdx/*.txt/*.rstor lives underdocs/**— AND none lives underskills/**,agents/**,tests/**, or any other executable/prompt tree. Those files ARE the product (aSKILL.mdor agent.mdis code-as-prompt) and must be reviewed on the best model, not the Haiku model. Config files (*.json,*.yaml,*.toml, dotfiles) are code-side, NOT docs. Any unit taggedcriticalorhighisdocs_only: falseregardless of extension. There is no early-exit — true docs units are still reviewed, just on the Haiku model. Emit JSON:
[{"name":"...","files":["..."],"priority":"critical","skip":false,"docs_only":false}, ...]
Filter skip:true; sort critical-first; record the skipped count.
Dispatch one Agent(subagent_type: "second-brain:code-review-unit-reviewer") per
non-skipped unit, choosing the model by unit kind:
docs_only: false): dispatch with NO model override — the agent
inherits the session model, i.e. the best model available (the v2 directive).docs_only: true): dispatch with model: "haiku" — docs don't
need deep reasoning.Dispatch in waves of at most 5 concurrent agents (not all 15 at once): run
critical/high code units first, then medium/low code units, then doc units. Pack
each wave to the cap from the priority-sorted list — backfill spare slots from the
next tier; don't leave slots idle. The wave cap bounds peak agent count and RAM.
Pass each agent: unit name + file list,
origin/<base> as the base ref, the change summary, the combined project
conventions (CLAUDE.md + wiki pages), and the episodic prior-review note. Each
agent returns structured findings only (no file bodies). Collect them.
If at least one critical or high unit exists, dispatch exactly ONE
Agent(subagent_type: "second-brain:quality-reviewer") over the deduped union of
all critical+high unit files. It occupies one slot in wave 1 (as do the Pass 2c
history reviewer and the Pass 2d premise reviewer when they run) — so wave 1 holds at
most 2 unit-reviewers + the architectural + history + premise reviewers (≤5 concurrent
total), keeping the cap intact. Each skipped advisory/lens pass returns its slot to
unit-reviewers (all three of 2b/2c/2d run → ≤2 unit-reviewers; any two run → ≤3; any
one runs → ≤4; none → ≤5 unit-reviewers) — the ≤5 cap holds in every combination. It
depends only on Pass 1's unit list, not Pass 2's
findings. Pass it origin/<base> (the SAME base-ref form Pass 2 uses), the change
summary, and the file set, and instruct it to scope findings to lines changed since
that ref — ignore pre-existing issues on untouched lines. If there are no
critical/high units, skip this pass.
Its CRITICAL/WARNING/INFO output is collected verbatim for a separate
"Architectural notes (advisory)" section in Pass 4. These notes are advisory only:
they are never scored or recorded as false positives, and are kept distinct from
the numbered bug findings.
If at least one non-skipped code unit exists (docs_only: false), dispatch
exactly ONE Agent(subagent_type: "second-brain:code-review-history-reviewer") over
the deduped union of all non-skipped code-unit files. It occupies one slot in wave
1 alongside the architectural reviewer (see the Pass 2b wave-1 note). It depends
only on Pass 1's unit list, not Pass 2's findings, so it runs concurrently. Pass it
origin/<base> (the SAME base-ref form Pass 2 uses), the change summary, the combined
project conventions (CLAUDE.md + wiki), and the episodic prior-review note. Unlike the
architectural pass, its findings ARE bugs (category regression): they flow into
Pass 3 dedup + scoring exactly like the per-unit findings. If every unit is docs-only,
skip this pass.
If at least one non-skipped code unit exists (docs_only:false), dispatch exactly
ONE Agent(subagent_type:"second-brain:code-review-premise-reviewer") over the deduped
union of all non-skipped code-unit files. It occupies one slot in wave 1 alongside
the architectural (2b) and history (2c) reviewers. It depends only on Pass 1's unit
list, not Pass 2's findings, so it runs concurrently. Pass it origin/<base>, the
change summary, the combined project conventions (CLAUDE.md + wiki), the prior-review
note, and the review-fragile-premises.md contents from Pass 0. Its findings (category
premise) flow into Pass 3 dedup + scoring exactly like the per-unit findings. The
premise reviewer NAMES unproven runtime premises (the bug class diff-static review
misses); Pass 3.5 PROBES them. If every unit is docs-only, skip this pass.
regression finding
(Pass 2c, which cites a prior commit short-SHA) and a non-regression finding on
the same line, prefer the regression one — its commit citation is what makes it
actionable and would otherwise be lost. prior-review findings (Pass 0) participate in dedup and scoring like any other finding.Agent(subagent_type: "second-brain:code-review-scorer"), passing the finding,
its file paths, the project conventions, and the false-positive store contents
from Pass 0. A premise finding (Pass 2d) scores HIGH when the premise is
load-bearing AND unproven AND — if Pass 3.5 ran — shown BROKEN; LOW when Pass 3.5
confirmed it holds or it is established/defended. A premise Pass 3.5 marked BROKEN
is force-promoted to confirmed (≥70) regardless of the scorer's number.
For findings of severity critical or high, run a 3-vote refuter panel
instead of a single scorer: one normal code-review-scorer plus two dispatched
with REFUTE MODE in their task. The final score is the median of the three
(mathematically identical to "confirmed iff >= 2 of 3 score >= 70"), so it drops into
the >= 70 / 16-69 / <= 15 partition with no rule change. The panel scorers inherit the
session/best model (a quality floor — a refuter must out-reason the finder); do NOT
pin them to a cheaper model. Medium/low findings keep the single scorer.
Dispatch scorers and refuter-panel votes in the same waves of at most 5
concurrent agents as Pass 2 — Pass 3 scoring shares the ≤5 wave cap (the
refuter panel multiplies Pass-3 agents ×3 for each critical/high finding, so this
bounds peak agent count and RAM on a constrained host, exactly as in Pass 2).Runs ONLY when is_bugfix (Pass 0) AND Pass 2d flagged ≥1 load-bearing premise. This
is the ONE step that executes code — run by the orchestrator (this trusted session),
NEVER by a sandboxed PR-influenced agent.
proof_probe will run; it
executes code. On decline: skip, mark the premise findings "unverified (user
declined)", continue to Pass 4. Never blocks the review.proof_probe, exercising the changed code
path in the real env — the actual environment state, NOT a sandbox that sets
convenient values. Record holds / BROKEN. A BROKEN premise elevates its finding
to confirmed critical ("fix does not hold in the real runtime").test-gap finding ("no test covers the regime where the bug occurs").Best-effort: any probe error is reported, never fails the review.
Output.
Default (no --comment): print the formatted review to the terminal.
--comment: re-run the eligibility check (still open, no competing deep
review posted since we started), then gh pr comment <#> --body "...".
Comment format (no emojis):
### Deep code review
Analyzed X review units (Y files, Z skipped as trivial). Found N issues:
1. **<brief description>** (category: severity)
<link>
...
Generated with [Claude Code](https://claude.ai/code) using second-brain:code-review-deep
Categories include: logic-error, type-safety, cross-file, edge-case, test-gap, convention, security, infrastructure, regression, premise, prior-review.
Or, if none: Analyzed X review units (Y files, Z skipped as trivial). No issues found.
Lower-confidence findings (unverified). If the low-confidence bucket
(score 16–69) is non-empty, append — after the numbered confirmed findings and
before the architectural notes — a section titled Lower-confidence findings (unverified — may be false positives). Lead with one line noting these were
found but not confirmed at high confidence and may include false positives, then
list each as - **<brief>** (category: severity) followed by its <link>. Keep
it visually distinct from the numbered confirmed list and from the architectural
notes so the three are never conflated. For --comment, post it under that same
subhead. If the bucket is empty, omit the section.
Architectural notes (advisory). If Pass 2b ran, append after the numbered
findings a section titled Architectural notes (advisory — not blocking)
containing the quality-reviewer output. For --comment, post it under that
same labelled subhead, visually separated from the numbered bug list so a
reader never mistakes an architectural opinion for a confirmed bug.
Runtime-premise verification. If Pass 3.5 ran, append a section titled
Runtime-premise verification listing each probed premise with holds / BROKEN
and a one-line real-env evidence note. A confirmed BROKEN premise may be appended
(user-confirmed) to ~/.second-brain/review-fragile-premises.md using the format
below — same best-effort, never-fail discipline as the false-positive store.
Link format (literal full SHA, renders in Markdown):
https://github.com/<owner>/<repo>/blob/<FULL-SHA>/<path>#L<start>-L<end>
— full SHA written literally (NOT $(git rev-parse …)), # after the path,
range L<start>-L<end>, ≥1 line of context each side.
False-positive write-back (user dismissals only — no auto-record).
~/.second-brain/review-false-positives.md (read current contents with Read,
append, Write back; if the file is absent create it with the header below).
Recording is best-effort — a write failure must NOT fail the review.File header (only when creating it):
# Review false-positive patterns
<!-- Read by code-review-scorer to suppress known non-issues. Append-only. -->
Per entry:
## <short pattern title>
- repo: <owner/repo>
- where: <path or glob> (<category>)
- why not a bug: <one-line reason>
- source: user-dismissed
- date: <YYYY-MM-DD>
Fragile-premises file (~/.second-brain/review-fragile-premises.md) — header on create:
# Review fragile-premise patterns
<!-- Read by code-review-premise-reviewer to raise severity on known-fragile runtime premises. Append-only. -->
Per entry:
## <short premise title>
- repo: <owner/repo>
- premise: <the assumption that proved fragile>
- why fragile: <one-line: how it fails in the real runtime>
- source: pass-3.5-confirmed | user
- date: <YYYY-MM-DD>
If parallel subagent dispatch is unavailable, fall back to a single-context
review over the full git diff origin/<base>...HEAD (no unit fan-out, no parallel
scoring) and say so in the output. Second-brain reads and FP write-back still apply.
Pre-existing issues; not-actually-a-bug; senior-engineer nitpicks; anything a linter/typechecker/compiler catches; general quality gripes unless a convention requires them; convention issues explicitly silenced in code; intentional functional changes; real issues on lines this change did not modify.
proof_probe, not a general build/typecheck/test run).gh for PR metadata/posting; use local git diff + Read for code.ps -eo pid,etimes,args | grep -E 'claude --bare|claude -p' — recursive
extractors (fires only in API-key mode; OAuth queues them) → fix at the Stop-hook
extractor; (b) ps -eo pid,ppid,args | grep server.bundle — orphaned MCP servers
whose parent claude exited → reap them; (c) the session claude RSS climbing
run-over-run is parent-context bloat, inherent to inline fan-out — the wave cap +
lean sub-agent returns above are the mitigation (bounded, not a true leak).cost-router setup), because it cannot tier a running skill's
internal dispatches. The per-pass model choices in this skill are
correctness/resource decisions, not price decisions: best-model-for-code and the
code-as-prompt .md exception are accuracy floors; the wave caps and unit-size
bounds are RAM/peak-agent ceilings.model parameter (e.g. model: "haiku") — NEVER an agent name.
Subagents are always named via subagent_type: "second-brain:<agent>". Don't go
looking for an agent called "Haiku".npx claudepluginhub cain-ish/claude-code-plugin --plugin second-brainCreates bite-sized, testable implementation plans from specs or requirements, with file structure and task decomposition. Activates before coding multi-step tasks.