Skill

dual-review

Run a high-signal dual-reviewer code review (latest Claude Opus + latest GPT in parallel) with a structural merge, hallucination guards, and an iterate-until-clean loop. Use when reviewing any non-trivial diff where you want two independent senior reviewers before merging.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/dfrysinger-skills:dual-review

Supporting Files

references/reviewer-prompt-template.md

SKILL.md

211 lines · ~3.2k tokens

Stats

Language: Shell

Stars: 0

Maintenance: Excellent

Last Commit: Jun 23, 2026

Dual Review

A code review protocol that runs two independent senior reviewers in parallel against the same diff, merges their findings structurally, applies fixes, and iterates until both reviewers return zero in-scope findings (or you hit the round-11 escape hatch).

When to invoke

Invoke this skill when:

- You've just authored a diff that involves multiple files, architectural choices, error paths, security-sensitive code, or domain logic that's hard to one-shot
- You want to catch blind spots before merging or shipping
- The user explicitly asks for a code review

Do not invoke for:

- Trivial one-line changes, typo fixes, dependency bumps
- Pure style/format/lint changes (this skill explicitly suppresses those)
- Documentation-only changes (unless the doc is contractual, e.g., an API spec)

Reviewer pair (fixed)

| Slot | Model family | Agent type | Mode |
|---|---|---|---|
| A | latest Claude Opus | `code-review` | `background` |
| B | latest GPT (non-mini, non-codex) | `code-review` | `background` |

Both reviewers run in parallel. Do not run them sequentially.

For each slot, pick the highest-numbered offering currently available from that model family on the Copilot proxy — at time of writing that's claude-opus-4.7-high and gpt-5.5, but the orchestrator should query available models at session start and resolve "latest Opus" / "latest GPT" rather than hard-coding versions. The point of the pair is independent perspectives from two different model families, not specific version numbers. When the proxy adds newer revisions (Opus 4.8, GPT-5.6, etc.), pick those automatically.

If only one family is available (e.g. all OpenAI models temporarily down), abort the review and tell the user. Do not substitute the other family on both slots — you lose the independent-perspective guarantee that justifies running two reviewers in the first place. A one-family fallback is not a dual review.
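
The resolution rule above can be sketched as follows. This is a minimal illustration, not the orchestrator's actual code: the `claude-opus-` and `gpt-` prefixes are inferred from the two example names in the text, and `resolve_pair`, `version_key`, and `ReviewerPairUnavailable` are hypothetical names. It does not break ties between variants of the same version (for example a `-high` suffix).

```python
import re


class ReviewerPairUnavailable(Exception):
    """Raised when either model family has no eligible model."""


def version_key(model_name):
    # Use the first dotted number in the name: "claude-opus-4.7-high" -> (4, 7).
    match = re.search(r"\d+(?:\.\d+)*", model_name)
    if match is None:
        return (0,)
    return tuple(int(part) for part in match.group().split("."))


def resolve_pair(available_models):
    """Pick the highest-numbered Opus and GPT models, or abort if a family is missing."""
    opus = [m for m in available_models if m.startswith("claude-opus-")]
    gpt = [
        m for m in available_models
        if m.startswith("gpt-") and "mini" not in m and "codex" not in m
    ]
    if not opus or not gpt:
        missing = "Claude Opus" if not opus else "GPT"
        # Never fall back to one family on both slots; abort instead.
        raise ReviewerPairUnavailable(f"no {missing} model available; aborting dual review")
    return max(opus, key=version_key), max(gpt, key=version_key)
```

Comparing versions as integer tuples rather than strings keeps a hypothetical `4.10` ahead of `4.7`.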

Symmetric failure handling

Reviewer A and Reviewer B failures are treated identically:

- Either reviewer failing → retry that reviewer once
- Second failure → fail loudly with the same banner copy regardless of which slot failed
- Never silently proceed with a single-reviewer result

Do not use different banner severity, copy, or process for A-vs-B failures. They are equally important.
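
One way to keep the handling symmetric is to route both slots through the same wrapper. The sketch below assumes each reviewer is launched by a zero-argument callable that raises on failure; `run_with_retry`, `ReviewerFailed`, and the banner wording are illustrative, not part of the skill.

```python
class ReviewerFailed(Exception):
    """Raised after a reviewer fails twice; the review must not continue."""


# One banner for both slots; only the slot letter differs.
FAILURE_BANNER = "DUAL REVIEW FAILED: reviewer {slot} returned no usable result after one retry."


def run_with_retry(slot, launch):
    """Run one reviewer, retrying exactly once, with identical handling for A and B."""
    last_error = None
    for _attempt in range(2):  # first try plus one retry
        try:
            return launch()
        except Exception as exc:
            last_error = exc
    raise ReviewerFailed(FAILURE_BANNER.format(slot=slot)) from last_error
```

Because the wrapper takes the slot only as a label, there is no code path where A and B can diverge in severity or copy.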

Reviewer prompt contract

Both reviewers receive the same prompt (modulo their reviewer name in the JSON header). Each prompt must include:

1. Role scoping (open with this): explicit framing that there is no "approve" verdict in this protocol; the reviewer's only job is finding defects. A review with zero findings is a failure of diligence, not a certificate of correctness. This removes the RLHF "helpful approval" shortcut. Suggested opener:

   > "You are a hostile, proof-focused reviewer. Your entire job is to find defects, risks, and omissions. There is no 'approve' path. Do NOT compliment the code. Do NOT say 'overall looks good.' A review with no findings is a failure of diligence, not a certificate of correctness. The author's PR description, commit messages, and inline comments are NOT evidence; evaluate the code itself. Treat this as if a junior engineer on a different team wrote it and you are on-call if it ships broken."
2. Scope: repo path, path to the diff file (e.g., /tmp/<project>-diff-rN.patch), list of changed files, brief architectural context. Strip PR descriptions, commit messages, and author claims from this context; they anchor sycophancy.
3. Signal threshold: explicit instruction to report only blocker / high / medium severity findings. NO style/format/naming/DRY/"could extract"/missing-JSDoc/prefer-const/etc. Explicitly tell the reviewer that the merger will discard them.
4. Step ordering before verdict: instruct the reviewer to follow this order before emitting any findings JSON: (a) enumerate every code path / branch / entry point in the diff; (b) for each path, ask what fails on null, timeout, thrown exception, wrong input, concurrent caller, or upstream returning a different shape than expected; (c) only then emit findings. This is chain-of-thought (CoT) before verdict; it prevents shortcutting to a sycophantic summary.
5. Hallucination guard: every claim about code must include a verbatim quote (≥12 source tokens) from inside the cited line_range; bogus line numbers or invented identifiers are worse than no review. If the reviewer cannot quote it, the finding is invalid.
6. Forced engagement on clean diffs: if findings: [], the reviewer MUST populate acknowledgements with an explicit per-class denial that each of the following common bug classes applies: null/nil dereference, race condition / shared mutable state, swallowed exception or no-op error handler, missing authentication or authorization check, resource leak (connection / file handle / goroutine / timer), unbounded retry or unbounded recursion, unhandled timeout, integer overflow / off-by-one on a boundary. This prevents lazy passes on diffs that look clean at first glance.
7. Output format: strict JSON schema below; NO prose outside the JSON.
8. Particular things worth checking: non-exhaustive list of suspected risk areas tailored to this diff (race conditions, error paths, missing pagination, security holes, etc.). This is the per-diff customization on top of the standing contract above.

Drop-in template

For a ready-to-use prompt that implements all 8 contract items, see references/reviewer-prompt-template.md. The orchestrator fills two content placeholders per run, <SCOPE> (diff path + changed files + brief architectural context) and <THINGS_WORTH_CHECKING> (diff-specific risk areas), plus <REVIEWER_NAME> and <ROUND_NUMBER> for routing.

The contract above is authoritative. If you edit the template, verify every contract item is still satisfied — the template is convenience, not spec.

Finding schema

{
  "reviewer": "claude-opus-latest",
  "findings": [
    {
      "file": "path/to/file.ts",
      "line_range": [start, end],
      "severity": "blocker | high | medium",
      "category": "security | correctness | data-integrity | error-handling | ux | other",
      "title": "Short verb phrase",
      "body": "Explanation including a verbatim quote >=12 source tokens from inside the line_range.",
      "suggested_fix": "Specific suggestion or code block."
    }
  ],
  "acknowledgements": [
    "Brief notes on things explicitly checked and found acceptable, so the merger knows the reviewer saw them."
  ]
}

If a reviewer has zero findings, findings: [] is the correct return — they should still populate acknowledgements with what they explicitly checked.
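
Before merging, the merger may want to reject output that does not match this shape. The following is a rough validation sketch under that assumption; `validate_review` is a hypothetical helper, and the allowed severities and categories are copied from the schema above.

```python
SEVERITIES = {"blocker", "high", "medium"}
CATEGORIES = {"security", "correctness", "data-integrity", "error-handling", "ux", "other"}
REQUIRED_KEYS = ("file", "line_range", "severity", "category", "title", "body", "suggested_fix")


def validate_review(review):
    """Return a list of human-readable problems; an empty list means the shape is valid."""
    problems = []
    if not isinstance(review.get("reviewer"), str):
        problems.append("reviewer must be a string")
    findings = review.get("findings")
    if not isinstance(findings, list):
        return problems + ["findings must be a list"]
    for i, finding in enumerate(findings):
        if not isinstance(finding, dict):
            problems.append(f"findings[{i}] must be an object")
            continue
        missing = [key for key in REQUIRED_KEYS if key not in finding]
        if missing:
            problems.append(f"findings[{i}] is missing {', '.join(missing)}")
        line_range = finding.get("line_range")
        if line_range is not None and not (
            isinstance(line_range, list)
            and len(line_range) == 2
            and all(isinstance(n, int) for n in line_range)
            and line_range[0] <= line_range[1]
        ):
            problems.append(f"findings[{i}].line_range must be [start, end] with start <= end")
        if "severity" in finding and finding["severity"] not in SEVERITIES:
            problems.append(f"findings[{i}].severity must be blocker, high, or medium")
        if "category" in finding and finding["category"] not in CATEGORIES:
            problems.append(f"findings[{i}].category is not a known category")
    # A clean result still has to show what was checked.
    if not findings and not review.get("acknowledgements"):
        problems.append("empty findings require non-empty acknowledgements")
    return problems
```

A non-empty result is a reasonable trigger for re-prompting that reviewer rather than merging partial data.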

Round-N additions to the schema

After round 1, every reviewer prompt must also ask for a roundN_resolution array (named for the prior round, e.g. round1_resolution in a round-2 prompt) confirming whether each prior-round finding was actually resolved (with verbatim evidence). This catches papered-over fixes.

{
  "reviewer": "...",
  "round": 2,
  "round1_resolution": [
    { "finding": "Title from round 1", "resolved": true, "evidence": "Verbatim quote demonstrating the fix." }
  ],
  "findings": [...],
  "acknowledgements": [...]
}
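
A small check along these lines can flag prior findings that were skipped or marked resolved without evidence. It is a sketch only: `unresolved_titles` is a hypothetical helper, and matching resolution entries to prior findings by exact title is an assumption based on the example above.

```python
def unresolved_titles(prior_titles, resolution):
    """Return prior-round finding titles that lack a resolved entry backed by evidence."""
    by_title = {entry["finding"]: entry for entry in resolution}
    unresolved = []
    for title in prior_titles:
        entry = by_title.get(title)
        evidence = str(entry.get("evidence") or "").strip() if entry else ""
        # Missing entry, resolved=false, or blank evidence all count as unresolved.
        if entry is None or not entry.get("resolved") or not evidence:
            unresolved.append(title)
    return unresolved
```

Treating a blank evidence string as unresolved is what makes the check useful against cosmetic fixes.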

Structural merge rules

After both reviewers return, the merger (you, in your main agent loop) reconciles findings:

Bucket collision (agreed-by-both)

Findings collide into the same bucket — i.e., are treated as the same finding from two reviewers — only when ALL of:

- Same file
- line_range intervals overlap
- Same category
- Verbatim quote of ≥12 source tokens shared within the line_range intersection (the strong signal)

Weak signals (title similarity, suggested_fix Jaccard overlap, etc.) never trigger collapse. Two reviewers describing the same general area in different words remain two separate findings unless the verbatim-quote condition is met.
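
The four conditions can be sketched as a single predicate. This is an illustration under stated assumptions: tokens are whitespace-separated, line_range is 1-indexed and inclusive, the quote is found by searching each finding's body for a 12-token window taken from the source, and `collide`, `share_quote`, and the other names are hypothetical.

```python
def normalize(text):
    # Collapse all whitespace so indentation and line breaks do not matter.
    return " ".join(text.split())


def ranges_overlap(a, b):
    # Inclusive [start, end] intervals.
    return a[0] <= b[1] and b[0] <= a[1]


def share_quote(f1, f2, source_lines, min_tokens=12):
    """True if some run of min_tokens source tokens from the overlap is quoted by both."""
    start = max(f1["line_range"][0], f2["line_range"][0])
    end = min(f1["line_range"][1], f2["line_range"][1])
    src_tokens = " ".join(source_lines[start - 1:end]).split()
    body1, body2 = normalize(f1["body"]), normalize(f2["body"])
    for i in range(len(src_tokens) - min_tokens + 1):
        window = " ".join(src_tokens[i:i + min_tokens])
        if window in body1 and window in body2:
            return True
    return False


def collide(f1, f2, source_lines):
    """All four conditions must hold; title or suggested_fix similarity is never consulted."""
    return (
        f1["file"] == f2["file"]
        and ranges_overlap(f1["line_range"], f2["line_range"])
        and f1["category"] == f2["category"]
        and share_quote(f1, f2, source_lines)
    )
```

Note that the window is drawn from the source inside the intersection, so a quote that both reviewers invented cannot cause a collision.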

Severity resolution

When two reviewers bucket together but disagree on severity: highest severity wins. Don't average, don't compromise, don't downgrade.

Disagreement on existence

If one reviewer flags a finding and the other explicitly acknowledges that area as acceptable (i.e., the area appears in the other reviewer's acknowledgements): keep the finding, but lower confidence. Surface the disagreement honestly in your summary to the user. Do not collapse the finding away.

Single-reviewer findings

A finding from one reviewer that the other simply didn't mention is still in scope for the iteration loop. Apply standard severity rules. Don't drop a finding just because only one reviewer caught it — the value of running two reviewers is catching things one would miss.

Hallucination demotion

For every finding with a quoted code snippet, the merger must grep-verify the quote in the cited file. If the quote doesn't appear verbatim within the line_range:

- Demote severity by one notch (blocker → high → medium → drop)
- Flag in your summary as (unverified)

If multiple findings from the same reviewer fail the grep check in a single round, treat that reviewer's round output as suspect and re-prompt before merging.
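
The check and the one-notch demotion might look like the sketch below. It assumes the merger has already extracted the quote from the finding body and loaded the cited file as a list of lines. Comparing with whitespace collapsed is a relaxation of strict verbatim matching that this example chooses, and it does not enforce the 12-token minimum. `quote_verified` and `demote_if_unverified` are hypothetical names.

```python
# One notch down per failed check; None means the finding is dropped.
DEMOTION = {"blocker": "high", "high": "medium", "medium": None}


def quote_verified(quote, source_lines, line_range):
    """True if the quote appears inside the cited (1-indexed, inclusive) line range."""
    start, end = line_range
    cited = " ".join(" ".join(source_lines[start - 1:end]).split())
    needle = " ".join(quote.split())
    return bool(needle) and needle in cited


def demote_if_unverified(finding, quote, source_lines):
    """Return the finding unchanged, a demoted copy flagged (unverified), or None."""
    if quote_verified(quote, source_lines, finding["line_range"]):
        return finding
    demoted = DEMOTION[finding["severity"]]
    if demoted is None:
        return None
    return {**finding, "severity": demoted, "title": finding["title"] + " (unverified)"}
```

Restricting the search to the cited lines, rather than the whole file, is what catches a real quote attached to a bogus line number.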

Iteration loop

round = 1
while True:
    prepare diff for current state of the working tree → /tmp/<project>-diff-rN.patch
    launch reviewer A (model A) and reviewer B (model B) in parallel, background mode
    wait for BOTH to complete
    merge findings per the structural rules above
    if merged findings is empty:
        break (clean)
    if round == 11 (i.e., 10 rounds have already iterated and round 11 still finds issues):
        ask the user whether to continue iterating or stop here
        if stop: break
    apply fixes for in-scope findings
    file GitHub issues for any out-of-scope findings discovered during review
    run typecheck/build/test as appropriate before next round
    round += 1
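
The control flow of the pseudocode above can be written as runnable Python if the side effects are injected as callables. In this sketch, `run_round`, `apply_fixes`, and `ask_continue` are hypothetical stand-ins for launching and merging both reviewers, applying fixes, and prompting the user. Issue filing and the typecheck/build/test step are left out for brevity.

```python
def iterate_until_clean(run_round, apply_fixes, ask_continue, ask_at_round=11):
    """Loop until a round returns no merged findings, or the user stops at round 11.

    run_round(n)        -> list of merged in-scope findings for round n
    apply_fixes(found)  -> applies fixes for those findings
    ask_continue(found) -> True to keep iterating, False to stop
    """
    round_no = 1
    while True:
        findings = run_round(round_no)
        if not findings:
            return ("clean", round_no)
        # Ten rounds have already iterated and findings are still appearing.
        if round_no == ask_at_round and not ask_continue(findings):
            return ("stopped", round_no)
        apply_fixes(findings)
        round_no += 1
```

As in the pseudocode, the user is asked once, at round 11; a stricter variant could ask again on every later round.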

Round 11 escape hatch

If round 11 begins (i.e., 10 rounds have iterated and findings are still appearing), pause and ask the user:

We've completed 10 rounds of review and round 11 is still surfacing in-scope findings: [summary]. Continue iterating until clean, or stop here and ship as-is?

Do not silently continue past 10 rounds.

Out-of-scope handling

During review, reviewers may surface issues that are real but out of scope for the current diff (e.g., bugs in adjacent code that the current change didn't touch). For those:

- Do NOT fix them in the current diff
- File a GitHub issue capturing the finding, with a link/reference to where it was discovered
- Continue the loop with only in-scope findings

Non-goals (do NOT do these)

This skill explicitly does not:

- Use tier-aware reviewer pairing (the pair is fixed regardless of change size or risk)
- Use cost gating (run both reviewers even on small changes; the value-of-information justifies it for any non-trivial diff)
- Use SHA-pinned approval markers
- Parse Layer 3/4 token-level outputs (we operate on Layer 2 finding JSON only)
- Maintain a JSONL audit ledger of all findings
- Run a single reviewer and call it a "review" (that's a one-reviewer review, not this skill)

If the user explicitly opts in to one of these later, layer it on top — don't bake it into the default protocol.

Failure modes to watch for

- Both reviewers agreeing on a hallucinated finding: rare but possible. The verbatim-quote grep check is your safety net. Always grep before applying a fix.
- One reviewer dramatically over-firing on style: re-prompt with a sharper signal-threshold reminder, don't merge style noise.
- Round-1 findings reappearing in round 3+ in mutated form: usually indicates the round-2 fix was cosmetic. Tighten the roundN_resolution evidence requirement and consider asking the user to inspect.
- Reviewer returning prose outside the JSON envelope: accept-and-parse the JSON, ignore the prose; if the JSON itself is malformed, re-prompt that reviewer once.
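
For the last failure mode, one possible way to recover the envelope is to scan for the first JSON object that carries a findings key. This is a sketch using the standard library's `json.JSONDecoder.raw_decode`; `extract_review` is a hypothetical name, and requiring the findings key is this example's heuristic.

```python
import json


def extract_review(raw):
    """Return the first JSON object in raw that has a "findings" key, ignoring prose."""
    decoder = json.JSONDecoder()
    start = raw.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(raw, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and "findings" in obj:
            return obj
        # Not a review envelope here; try the next opening brace.
        start = raw.find("{", start + 1)
    raise ValueError("no review JSON found; re-prompt this reviewer once")
```

A `ValueError` here corresponds to the malformed-JSON case, where the protocol calls for a single re-prompt.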

Reporting to the user

After each round, give the user:

- Round number and reviewer status (both completed cleanly, or which failed)
- Findings count by severity, with agreed_by: both vs single-reviewer noted
- Resolution status of prior-round findings (resolved / not resolved)
- Whether you're applying fixes and iterating, or asking for a decision

Keep the summary concise — a short table or bullet list, not a full reproduction of the JSON. Cite line numbers when describing what you'll change.
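
As an illustration of the findings-count line, the sketch below tallies merged findings by severity and agreement. It assumes each merged finding carries an agreed_by field that is either "both" or a single reviewer's name; that field shape and the `summarize` helper are assumptions, not part of the schema above.

```python
from collections import Counter


def summarize(round_no, merged):
    """Build a short per-severity summary for the user."""
    counts = Counter(
        (f["severity"], "both" if f.get("agreed_by") == "both" else "single")
        for f in merged
    )
    lines = [f"Round {round_no}: {len(merged)} in-scope finding(s)"]
    for severity in ("blocker", "high", "medium"):
        both = counts[(severity, "both")]
        single = counts[(severity, "single")]
        if both or single:
            lines.append(
                f"  {severity}: {both + single} "
                f"({both} agreed by both, {single} single-reviewer)"
            )
    return "\n".join(lines)
```

Severities with no findings are omitted so a clean round collapses to a single line.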
