Run a high-signal dual-reviewer code review (latest Claude Opus + latest GPT in parallel) with structural merge, hallucination guards, and iterate-until-clean loop. Use when reviewing any non-trivial diff where you want two independent senior reviewers before merging.
Slash command: /dfrysinger-skills:dual-review
A code review protocol that runs **two independent senior reviewers in parallel** against the same diff, merges their findings structurally, applies fixes, and iterates until both reviewers return zero in-scope findings (or you hit the round-11 escape hatch).
Invoke this skill when:
Do not invoke for:
| Slot | Model family | Agent type | Mode |
|---|---|---|---|
| A | latest Claude Opus | code-review | background |
| B | latest GPT (non-mini, non-codex) | code-review | background |
Both reviewers run in parallel. Do not run them sequentially.
For each slot, pick the highest-numbered offering currently available from that model family on the Copilot proxy — at time of writing that's claude-opus-4.7-high and gpt-5.5, but the orchestrator should query available models at session start and resolve "latest Opus" / "latest GPT" rather than hard-coding versions. The point of the pair is independent perspectives from two different model families, not specific version numbers. When the proxy adds newer revisions (Opus 4.8, GPT-5.6, etc.), pick those automatically.
If only one family is available (e.g. all OpenAI models temporarily down), abort the review and tell the user. Do not substitute the other family on both slots — you lose the independent-perspective guarantee that justifies running two reviewers in the first place. A one-family fallback is not a dual review.
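The resolution and abort rules above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: it assumes the orchestrator already has a plain list of model ID strings from the proxy, and the helper names (`version_key`, `resolve_latest`, `resolve_reviewers`) and the family prefixes are hypothetical.

```python
from __future__ import annotations

import re

# Slot B excludes mini and codex variants; applying the same filter to slot A is harmless.
EXCLUDED_MARKERS = ("mini", "codex")


def version_key(model_id: str) -> tuple:
    """Extract the first dotted version number, e.g. 'gpt-5.5' -> (5, 5)."""
    match = re.search(r"(\d+(?:\.\d+)*)", model_id)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group(1).split("."))


def resolve_latest(available: list[str], prefix: str) -> str | None:
    """Pick the highest-versioned model whose ID starts with the family prefix."""
    candidates = [
        model
        for model in available
        if model.startswith(prefix)
        and not any(marker in model for marker in EXCLUDED_MARKERS)
    ]
    return max(candidates, key=version_key, default=None)


def resolve_reviewers(available: list[str]) -> tuple:
    """Return (slot A, slot B), or abort if either family is missing."""
    slot_a = resolve_latest(available, "claude-opus")  # assumed ID prefix
    slot_b = resolve_latest(available, "gpt")  # assumed ID prefix
    if slot_a is None or slot_b is None:
        # Never fill both slots from one family: that is not a dual review.
        raise RuntimeError("dual review aborted: one model family is unavailable")
    return slot_a, slot_b
```

Comparing versions as integer tuples rather than strings keeps a future `5.10` ahead of `5.9`, which a plain string sort would get wrong.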
Reviewer A and Reviewer B failures are treated identically:
Do not use different banner severity, copy, or process for A-vs-B failures. They are equally important.
Both reviewers receive the same prompt (modulo their reviewer name in the JSON header). Each prompt must include:
1. **Role scoping (open with this).** Explicit framing that there is no "approve" verdict in this protocol; the reviewer's only job is finding defects. A review with zero findings is a failure of diligence, not a certificate of correctness. This removes the RLHF "helpful approval" shortcut. Suggested opener:
   "You are a hostile, proof-focused reviewer. Your entire job is to find defects, risks, and omissions. There is no 'approve' path. Do NOT compliment the code. Do NOT say 'overall looks good.' A review with no findings is a failure of diligence, not a certificate of correctness. The author's PR description, commit messages, and inline comments are NOT evidence; evaluate the code itself. Treat this as if a junior engineer on a different team wrote it and you are on-call if it ships broken."
2. **Scope.** Repo path, path to the diff file (e.g., /tmp/<project>-diff-rN.patch), list of changed files, brief architectural context. Strip PR descriptions, commit messages, and author claims from this context; they anchor sycophancy.
3. **Signal threshold.** Explicit instruction to report only blocker / high / medium severity findings. NO style/format/naming/DRY/"could extract"/missing-JSDoc/prefer-const/etc. Tell the reviewer explicitly that the merger will discard such findings.
4. **Step ordering before verdict.** Instruct the reviewer to follow this order before emitting any findings JSON: (a) enumerate every code path / branch / entry point in the diff; (b) for each path, ask what fails on null, timeout, thrown exception, wrong input, concurrent caller, or upstream returning a different shape than expected; (c) only then emit findings. This is chain-of-thought before verdict: it prevents shortcutting to a sycophantic summary.
5. **Hallucination guard.** Every claim about code must include a verbatim quote (≥12 source tokens) from inside the cited line_range; bogus line numbers or invented identifiers are worse than no review. If the reviewer cannot quote it, the finding is invalid.
6. **Forced engagement on clean diffs.** If findings: [], the reviewer MUST populate acknowledgements with an explicit per-class denial that each of the following common bug classes applies: null/nil dereference, race condition / shared mutable state, swallowed exception or no-op error handler, missing authentication or authorization check, resource leak (connection / file handle / goroutine / timer), unbounded retry or unbounded recursion, unhandled timeout, integer overflow / off-by-one on a boundary. This prevents lazy passes on diffs that look clean at first glance.
7. **Output format.** Strict JSON schema below; NO prose outside the JSON.
8. **Particular things worth checking.** Non-exhaustive list of suspected risk areas tailored to this diff (race conditions, error paths, missing pagination, security holes, etc.). This is the per-diff customization on top of the standing contract above.
For a ready-to-use prompt that implements all 8 contract items, see references/reviewer-prompt-template.md. The orchestrator fills two content placeholders per run: <SCOPE> (diff path + changed files + brief architectural context) and <THINGS_WORTH_CHECKING> (diff-specific risk areas). It also fills <REVIEWER_NAME> and <ROUND_NUMBER> for routing.
The contract above is authoritative. If you edit the template, verify every contract item is still satisfied — the template is convenience, not spec.

    {
      "reviewer": "claude-opus-latest",
      "findings": [
        {
          "file": "path/to/file.ts",
          "line_range": [start, end],
          "severity": "blocker | high | medium",
          "category": "security | correctness | data-integrity | error-handling | ux | other",
          "title": "Short verb phrase",
          "body": "Explanation including a verbatim quote >=12 source tokens from inside the line_range.",
          "suggested_fix": "Specific suggestion or code block."
        }
      ],
      "acknowledgements": [
        "Brief notes on things explicitly checked and found acceptable, so the merger knows the reviewer saw them."
      ]
    }

If a reviewer has zero findings, findings: [] is the correct return — they should still populate acknowledgements with what they explicitly checked.
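Before merging, the orchestrator could reject malformed reviewer output mechanically. The sketch below checks a raw reviewer response against the schema above; the function name `validate_review` and the wording of the problem messages are illustrative assumptions, not part of the skill.

```python
import json

ALLOWED_SEVERITIES = {"blocker", "high", "medium"}
ALLOWED_CATEGORIES = {
    "security", "correctness", "data-integrity", "error-handling", "ux", "other",
}
REQUIRED_FINDING_KEYS = {
    "file", "line_range", "severity", "category", "title", "body", "suggested_fix",
}


def validate_review(raw: str) -> list:
    """Return a list of problems; an empty list means the output is usable."""
    try:
        review = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Prose outside the JSON makes the whole response unparseable.
        return [f"not valid JSON: {exc}"]
    if not isinstance(review, dict):
        return ["top-level value must be a JSON object"]

    problems = []
    for key in ("reviewer", "findings", "acknowledgements"):
        if key not in review:
            problems.append(f"missing top-level key: {key}")

    findings = review.get("findings") or []
    for index, finding in enumerate(findings):
        if not isinstance(finding, dict):
            problems.append(f"finding {index}: must be an object")
            continue
        missing = REQUIRED_FINDING_KEYS - finding.keys()
        if missing:
            problems.append(f"finding {index}: missing keys {sorted(missing)}")
        if finding.get("severity") not in ALLOWED_SEVERITIES:
            problems.append(f"finding {index}: severity below threshold or unknown")
        if finding.get("category") not in ALLOWED_CATEGORIES:
            problems.append(f"finding {index}: unknown category")
        line_range = finding.get("line_range")
        if not (
            isinstance(line_range, list)
            and len(line_range) == 2
            and all(isinstance(n, int) for n in line_range)
            and line_range[0] <= line_range[1]
        ):
            problems.append(f"finding {index}: line_range must be [start, end]")

    # A clean review must still say what was checked.
    if not findings and not review.get("acknowledgements"):
        problems.append("zero findings requires populated acknowledgements")
    return problems
```

A non-empty result is a reason to re-prompt the reviewer rather than to merge partial output.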
After round 1, every reviewer prompt must also ask for a roundN_resolution array confirming whether each prior-round finding was actually resolved (with verbatim evidence). This catches papered-over fixes.

    {
      "reviewer": "...",
      "round": 2,
      "round1_resolution": [
        { "finding": "Title from round 1", "resolved": true, "evidence": "Verbatim quote demonstrating the fix." }
      ],
      "findings": [...],
      "acknowledgements": [...]
    }

After both reviewers return, the merger (you, in your main agent loop) reconciles findings:
Findings collide into the same bucket — i.e., are treated as the same finding from two reviewers — only when ALL of:
- same file
- line_range intervals overlap
- same category

Weak signals (title similarity, suggested_fix Jaccard overlap, etc.) never trigger collapse. Two reviewers describing the same general area in different words remain two separate findings unless the verbatim-quote condition is met.
When two reviewers bucket together but disagree on severity: highest severity wins. Don't average, don't compromise, don't downgrade.
If one reviewer flags a finding and the other explicitly acknowledges that area as acceptable (i.e., the area appears in the other reviewer's acknowledgements): keep the finding, but lower confidence. Surface the disagreement honestly in your summary to the user. Do not collapse the finding away.
A finding from one reviewer that the other simply didn't mention is still in scope for the iteration loop. Apply standard severity rules. Don't drop a finding just because only one reviewer caught it — the value of running two reviewers is catching things one would miss.
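The bucketing and severity rules can be illustrated with a small merge function. This is a sketch under stated assumptions: it applies only the file, line_range, and category conditions, because the exact form of the verbatim-quote condition is not spelled out here, and it leaves out the lowered-confidence handling for acknowledgement disagreements. The function names and the "A" / "B" values for `agreed_by` are hypothetical.

```python
SEVERITY_RANK = {"medium": 1, "high": 2, "blocker": 3}


def overlaps(range_a, range_b) -> bool:
    """True when two inclusive [start, end] line ranges share at least one line."""
    return range_a[0] <= range_b[1] and range_b[0] <= range_a[1]


def same_bucket(finding_a: dict, finding_b: dict) -> bool:
    """Structural collision test; weak signals such as title similarity are ignored."""
    return (
        finding_a["file"] == finding_b["file"]
        and finding_a["category"] == finding_b["category"]
        and overlaps(finding_a["line_range"], finding_b["line_range"])
    )


def merge_findings(findings_a: list, findings_b: list) -> list:
    """Merge two reviewers' findings without dropping single-reviewer findings."""
    merged = []
    unmatched_b = list(findings_b)
    for finding_a in findings_a:
        partner = next((fb for fb in unmatched_b if same_bucket(finding_a, fb)), None)
        if partner is None:
            merged.append({**finding_a, "agreed_by": "A"})
            continue
        unmatched_b.remove(partner)
        # Highest severity wins: no averaging, no downgrade.
        severity = max(
            finding_a["severity"], partner["severity"], key=SEVERITY_RANK.__getitem__
        )
        merged.append({**finding_a, "severity": severity, "agreed_by": "both"})
    # Findings only reviewer B raised stay in scope.
    merged.extend({**finding_b, "agreed_by": "B"} for finding_b in unmatched_b)
    return merged
```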
For every finding with a quoted code snippet, the merger must grep-verify the quote in the cited file. If the quote doesn't appear verbatim within the line_range:
- (unverified)

If multiple findings from the same reviewer fail the grep check in a single round, treat that reviewer's round output as suspect and re-prompt before merging.
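A grep check of this kind might look like the sketch below. Two assumptions are made and should be adjusted to taste: tokens are counted by splitting on whitespace, and runs of whitespace are collapsed before comparison so that a multi-line quote embedded in a JSON string can still match. The function names are hypothetical, and the quote is assumed to have been extracted from the finding's body already.

```python
def normalize(text: str) -> str:
    """Collapse all whitespace runs to single spaces (an assumed relaxation)."""
    return " ".join(text.split())


def quote_in_range(source_text: str, line_range, quote: str, min_tokens: int = 12) -> bool:
    """Check that the quote is long enough and appears inside the cited lines."""
    start, end = line_range  # 1-indexed, inclusive
    cited = "\n".join(source_text.splitlines()[start - 1:end])
    if len(quote.split()) < min_tokens:
        return False  # too short to count as evidence
    return normalize(quote) in normalize(cited)


def reviewer_is_suspect(check_results: list) -> bool:
    """Two or more failed quote checks in one round make the output suspect."""
    return sum(1 for passed in check_results if not passed) >= 2
```

Restricting the search to the cited lines, rather than the whole file, is what catches a real quote paired with a wrong line number.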

    round = 1
    while True:
        prepare diff for current state of the working tree → /tmp/<project>-diff-rN.patch
        launch reviewer A (model A) and reviewer B (model B) in parallel, background mode
        wait for BOTH to complete
        merge findings per the structural rules above
        if merged findings is empty:
            break (clean)
        if round == 11:  # 10 rounds have already iterated and round 11 still finds issues
            ask the user whether to continue iterating or stop here
            if stop: break
        apply fixes for in-scope findings
        file GitHub issues for any out-of-scope findings discovered during review
        run typecheck/build/test as appropriate before next round
        round += 1

If round 11 begins (i.e., 10 rounds have iterated and findings are still appearing), pause and ask the user:
We've completed 10 rounds of review and round 11 is still surfacing in-scope findings: [summary]. Continue iterating until clean, or stop here and ship as-is?
Do not silently continue past 10 rounds.
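For readers who want something executable, the loop can be sketched with the side effects injected as callables. Everything here is an assumption for illustration: the function name `review_loop`, the callable parameters, and the use of a thread pool to stand in for parallel background agents. Diff preparation, issue filing, and the typecheck/build/test step are assumed to live inside `apply_fixes` or the reviewer callables.

```python
from concurrent.futures import ThreadPoolExecutor

ESCAPE_ROUND = 11  # the round at which the user must be asked


def review_loop(run_reviewer_a, run_reviewer_b, merge, apply_fixes, ask_user_to_continue):
    """Iterate until the merged findings are empty or the user stops at round 11.

    Returns a (status, round_number) tuple, where status is "clean" or "stopped".
    """
    round_number = 1
    while True:
        # Launch both reviewers in parallel and wait for BOTH to complete.
        with ThreadPoolExecutor(max_workers=2) as pool:
            future_a = pool.submit(run_reviewer_a, round_number)
            future_b = pool.submit(run_reviewer_b, round_number)
            findings_a, findings_b = future_a.result(), future_b.result()

        merged = merge(findings_a, findings_b)
        if not merged:
            return ("clean", round_number)

        # Never continue silently past 10 completed rounds.
        if round_number == ESCAPE_ROUND and not ask_user_to_continue(merged):
            return ("stopped", round_number)

        apply_fixes(merged)
        round_number += 1
```

Passing the reviewers and the user prompt in as callables keeps the control flow testable without real model calls.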
During review, reviewers may surface issues that are real but out of scope for the current diff (e.g., bugs in adjacent code that the current change didn't touch). For those:
This skill explicitly does not:
If the user explicitly opts in to one of these later, layer it on top — don't bake it into the default protocol.
roundN_resolution evidence requirement and consider asking the user to inspect.

After each round, give the user:

- agreed_by: both vs single-reviewer noted

Keep the summary concise: a short table or bullet list, not a full reproduction of the JSON. Cite line numbers when describing what you'll change.