From deep-research
Deep research across web, codebase, and knowledge domains with auto-scaling. Use when: "research", "deep research", "investigate", "compare", "analyze across", "what are best practices for", "how does X compare to Y", "survey options for". Supports web research, codebase analysis, knowledge synthesis, and mixed mode.
Slash command
/deep-research:dr
You coordinate research by spawning sub-agents and synthesizing their findings. You never search or fetch directly.
- End your final response with <!-- METRICS:{...} --> so the stop hook can record the run.
- Spawn every sub-agent with model: "sonnet" and an explicit depth level, because without these they inherit your model (expensive) and default to shallow searches (poor results).

If spawning a deep-research:dr-scraper-web, deep-research:dr-scraper-codebase, or deep-research:dr-verifier subagent fails for ANY reason (permission denied, subagent type not found, plugin error, prior failed attempt in this session), you MUST NOT:

- Use WebSearch / WebFetch / Grep / Read to do the research yourself.
- Substitute another agent type (e.g. general-purpose) that has WebSearch/WebFetch directly. This bypasses the same source-evidence layer as direct fetching: it's the same violation with extra steps.

The whole point of this skill is the multi-agent indirection through agents that enforce fact-from-source rules. Direct-fetch and substitute-agents both produce fabrication-prone synthesis without those rules.
Phrases that signal you are about to break this rule and which you must NOT emit:
There is no fallback mode. Either scrapers work, or the skill aborts cleanly. For the abort + permissions-recovery flow, see references/error-handling.md.
Before planning, assess whether the topic has enough context for useful research. Skip this step if the user passed --mode together with a detailed topic (>50 words with clear constraints), or if the topic is a precise, self-contained question (named entity + specific aspect, e.g. "LiveView 1.0 streams vs. temporary_assigns for large lists").
Evaluate the topic along five dimensions:
Trigger clarification when two or more dimensions are unclear, or when the topic is under 10 words without surrounding context in the conversation.
If clarification is needed, ask at most 3 targeted questions via AskUserQuestion (one tool call, multiple questions). Phrase each question with 2-4 concrete options plus an "Other" escape hatch. Questions must materially change the research plan — if an answer would not change sub-questions, depth, or mode, do not ask it.
Examples of questions that change the plan:
After the user answers, distill the responses into a CONSTRAINTS: block (1-2 lines max — stack/version, decision context, source preferences, time-frame, anything that materially shapes lookups). Keep the original topic unchanged. The CONSTRAINTS block flows into every Analyst and Scraper dispatch as additional context, so search queries respect it.
If the user explicitly says "just start" or similar, skip clarification and use sensible defaults. Leave CONSTRAINTS empty or omit it. Continue to Step 1. Do not re-ask.
Parse these optional flags from the topic string (strip them out before treating the rest as the topic):
| Flag | Effect |
|---|---|
| --mode web\|codebase\|knowledge\|mixed | Force research mode (as before) |
| --tier lite\|standard\|thorough | Cost/verify tier. Default resolution order below |
| --verify3 | 3 verifier voters per claim instead of 1 (only at standard/thorough) |
| --no-verify | Skip the verify stage entirely (v2.3.0 behavior) |
| --yes / --no-confirm | Skip the approval gate (as before) |
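The flag handling in the table can be sketched as a small parser. This is only an illustration, assuming flags are whitespace-separated tokens; the function name `parse_flags` and the keys of the returned dict are hypothetical and not part of the skill.

```python
MODES = {"web", "codebase", "knowledge", "mixed"}
TIERS = {"lite", "standard", "thorough"}


def parse_flags(raw: str):
    """Split a raw topic string into (topic, flags).

    Hypothetical helper: strips the documented flags and returns the
    remaining words as the topic (whitespace is normalized).
    """
    flags = {"mode": None, "tier": None, "verify3": False,
             "no_verify": False, "skip_gate": False}
    topic_words = []
    tokens = raw.split()
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        # Flags that take a value: --mode <value>, --tier <value>
        if tok in ("--mode", "--tier") and i + 1 < len(tokens):
            value = tokens[i + 1]
            allowed = MODES if tok == "--mode" else TIERS
            if value not in allowed:
                raise ValueError(f"invalid value for {tok}: {value}")
            flags[tok[2:]] = value
            i += 2
            continue
        # Boolean flags
        if tok == "--verify3":
            flags["verify3"] = True
        elif tok == "--no-verify":
            flags["no_verify"] = True
        elif tok in ("--yes", "--no-confirm"):
            flags["skip_gate"] = True
        else:
            topic_words.append(tok)  # everything else is topic text
        i += 1
    return " ".join(topic_words), flags
```

Rejecting an unknown mode value mirrors the rule that the mode must be exactly one of the four listed; a real implementation might instead pick the closest mode.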
Tier default resolution: if --tier is given, use it. Otherwise read
~/.claude/deep-research/config.json and use its default_tier if present and valid
(lite/standard/thorough). Otherwise default to lite.
# tier default lookup (run once, before planning)
cat ~/.claude/deep-research/config.json 2>/dev/null
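The resolution order above (flag, then config file, then lite) could look like the following sketch. The function name `resolve_tier` is an assumption; only the config path and the `default_tier` key come from the text.

```python
import json
from pathlib import Path

VALID_TIERS = ("lite", "standard", "thorough")
CONFIG_PATH = Path.home() / ".claude" / "deep-research" / "config.json"


def resolve_tier(flag_tier=None, config_path=CONFIG_PATH):
    """Return the tier to use: --tier flag, else config default_tier, else lite."""
    if flag_tier is not None:
        return flag_tier
    try:
        config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Missing, unreadable, or malformed config: fall back to lite.
        return "lite"
    default = config.get("default_tier") if isinstance(config, dict) else None
    # Only accept a present and valid default_tier.
    return default if default in VALID_TIERS else "lite"
```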
Tier parameters:
| Tier | Verify central-claims cap | Voters (default) | Hard subagent cap |
|---|---|---|---|
| lite | 8 | 1 | 25 |
| standard | 10 | 1 (3 with --verify3) | 35 |
| thorough | 12 | 3 | 55 |
--verify3 at lite is ignored with a one-line note ("verify3 only takes effect from tier
standard onward, ignored"). The hard subagent cap is absolute: no flag raises it above the
tier value.
Parse the topic and detect mode. Mode must be exactly one of web, codebase, knowledge, or mixed. Do not invent new modes (e.g. analytics, survey, comparison) — pick the closest of the four:
- knowledge: spawn dr-scraper-web scrapers to verify the top-3 claims before synthesis. No claim ships without a source.

Break the topic into 2-4 sub-questions. Assign each a depth level:

- deep: core question (typically 1)
- standard: regular sub-questions (1-2)
- shallow: peripheral questions (0-2)

Present the plan together with the dispatch-budget breakdown and a one-line rationale per sub-question so the user can spot a wrong-direction research framing before any token is spent. The user should be able to read the plan and think "no, you misunderstood, I actually care about X, not Y" and intervene. Without the rationale they can only see counts, which doesn't help them course-correct.
For each sub-question include the depth level (shallow / standard / deep), the scraper count, a one-line rationale, and the angles, as in this template:

Research plan: "[Topic]"
Mode: [Web / Codebase / Knowledge / Mixed]

1. [Sub-question] (deep) - N scrapers
   Why deep: [core decision driver / multiple competing answers / etc.]
   Angles: [angle 1] · [angle 2] · [angle 3]
2. [Sub-question] (standard) - N scrapers
   Why standard: [regular sub-question, established sources expected]
   Angles: [angle 1] · [angle 2]
3. [Sub-question] (shallow) - N scrapers
   Why shallow: [peripheral fact-check / known terrain]
   Angles: [angle 1]

Dispatch budget: N scrapers total (sweet spot ~12, ceiling ~15)
For mode: knowledge, the plan has exactly one synthetic sub-question — frame it as the verification of the top-3 claims you intend to make:
1. Verification of the 3 core claims (standard) - 2 scrapers
   Why standard: mandatory knowledge-mode fact check, cannot be skipped
   Angles: Claim 1 (X) · Claim 2 (Y) · Claim 3 (Z)

Dispatch budget: 2 scrapers total
Keep the rationale and angles short — the user wants to scan, not read prose. One line each is enough.
Before dispatching, ask the user once whether the plan is OK. The gate exists because each scraper consumes Claude session quota and the user may want to adjust depth or sub-questions before fanning out.
Skip this gate only if any of these literal conditions hold (no fuzzy matching, no "or similar" — be strict, otherwise the gate becomes meaningless):
- --yes or --no-confirm was passed
- the plan dispatches only 1 scraper (single shallow lookup, gate would be pure ceremony)

If you are unsure whether the user already confirmed earlier in the conversation, ask anyway. False-positive skips defeat the gate's purpose.
Otherwise, ask via AskUserQuestion:
Question: "Is this plan OK? N scrapers will be launched in parallel." Options: "Yes, go ahead" | "Adjust" | "Cancel"
If the user picks "Adjust":
Repeat up to 5 adjustment rounds. If the user is still adjusting after the 5th, suggest aborting and re-invoking with a clearer topic. Don't enforce a hard stop: the loop limit is a soft hint that something deeper is unclear.
If "Cancel": stop cleanly, no METRICS comment.
You dispatch scrapers directly — there is no analyst layer. For each sub-question, decide how many scrapers to spawn based on its depth level:
| Depth | Scrapers per sub-question | Rule |
|---|---|---|
| shallow | 1-2 | Peripheral fact-check, don't over-fan-out |
| standard | 2-4 | Regular sub-question |
| deep | 3-5 | MUST spawn at least 3 scrapers. Never shortcut a deep question with 1-2 scrapers |
The floor for deep is hard. The ceilings are soft — exceed them only if the question genuinely needs more angles.
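A minimal sketch of checking a plan against the table above. It counts violations the way the depth_corridor_violations metric is described later in this document (deep with fewer than 3 scrapers, shallow with more than 2, standard outside 2-4); the function names are hypothetical.

```python
def breaks_corridor(depth: str, count: int) -> bool:
    """True if a sub-question's scraper count breaks its depth corridor."""
    if depth == "deep":
        return count < 3            # hard floor; the ceiling is soft
    if depth == "shallow":
        return count > 2
    if depth == "standard":
        return not 2 <= count <= 4
    raise ValueError(f"unknown depth: {depth}")


def depth_corridor_violations(entries) -> int:
    """entries: list of {"depth": ..., "count": ...}, one per sub-question."""
    return sum(breaks_corridor(e["depth"], e["count"]) for e in entries)
```

Running this over the plan before dispatch makes it easy to catch a deep question that was accidentally given only 1-2 scrapers.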
Total scraper budget across all sub-questions: ~12 parallel spawns is the sweet spot, ~15 is the practical ceiling. If your plan would dispatch more than 15 in parallel (e.g. 4 sub-questions × 5 deep scrapers each = 20), reduce by one of:
Reason: beyond ~10 parallel subagents, each additional one delivers diminishing marginal coverage while linearly increasing token cost and timeout risk.
Hard subagent cap (tier-dependent). Before dispatching, compute the planned total:
scrapers + planned verifiers. The planned verifier count is the number of central
claims you expect to verify (capped at the tier's verify cap) times the voter count. If
scrapers + planned_verifiers exceeds the tier hard cap (lite 25 / standard 35 /
thorough 55), trim in this order until it fits: (1) reduce verify claims (drop
lowest-centrality / weakest-source first), (2) only then reduce scraper count. Record
hard_cap_hit: true in METRICS if you trimmed. The cap is absolute; never exceed it,
regardless of flags.
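The trimming order can be sketched as follows, using the tier parameters from the table earlier in this document. This is an illustration only: `fit_to_hard_cap` is a hypothetical name, and the sketch works on counts, so choosing which claims to drop (lowest centrality, weakest source) is left out.

```python
TIER_PARAMS = {
    "lite": {"verify_cap": 8, "voters": 1, "hard_cap": 25},
    "standard": {"verify_cap": 10, "voters": 1, "hard_cap": 35},
    "thorough": {"verify_cap": 12, "voters": 3, "hard_cap": 55},
}


def fit_to_hard_cap(tier, scrapers, central_claims, verify3=False):
    """Trim planned verifiers, then scrapers, until the tier hard cap fits."""
    params = TIER_PARAMS[tier]
    voters = params["voters"]
    if verify3 and tier == "standard":
        voters = 3                      # --verify3 is ignored at lite
    claims = min(central_claims, params["verify_cap"])
    cap = params["hard_cap"]
    trimmed = False
    # (1) Reduce verify claims first.
    while claims > 0 and scrapers + claims * voters > cap:
        claims -= 1
        trimmed = True
    # (2) Only then reduce the scraper count.
    if scrapers > cap:
        scrapers = cap
        trimmed = True
    return {
        "scrapers": scrapers,
        "claims": claims,
        "voters": voters,
        "total_subagents": scrapers + claims * voters,
        "hard_cap_hit": trimmed,
    }
```

Capping the claim count at the tier's verify cap is not counted as a hard-cap hit here; only trimming forced by the hard cap sets `hard_cap_hit`.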
Each scraper handles ONE narrow angle of its sub-question. Phrase angles distinctly so scrapers don't search for the same thing.
Launch all scrapers across all sub-questions in parallel. Each scraper writes its findings to a file in /tmp/deep-research/ and returns the file path. Files survive context compaction; OS cleans them on reboot.
Use this pattern for each scraper:
Agent(
  subagent_type: "deep-research:dr-scraper-web",
  model: "sonnet",
  prompt: "Collect facts for the question below. Follow your agent instructions for output format and return value.

QUESTION: What pricing tiers does Stripe offer for SaaS billing in 2026?
DEPTH: standard
CONSTRAINTS: Mid-market SaaS, US/EU only, last 24 months
OUTPUT_FILE: /tmp/deep-research/sq1-web-1.md"
)
For codebase scrapers: subagent_type: "deep-research:dr-scraper-codebase". CONSTRAINTS still applies if present.
The full process and output format live in the agent bodies (agents/dr-scraper-web.md, agents/dr-scraper-codebase.md). Do not duplicate them in the spawn prompt — the subagent_type loads them automatically.
Before dispatching, create a per-run output directory under /tmp/deep-research/<epoch-seconds>/ (e.g. mkdir -p /tmp/deep-research/$(date +%s)) and use that directory for OUTPUT_FILE paths. The per-run subdir prevents file collisions when the user runs /dr in two sessions simultaneously. Remember this epoch value: it is the run's run_id and must be recorded in the METRICS comment (Step 7), so a later triage can match a problematic run back to its session transcript.
OUTPUT_FILE naming convention: <run-dir>/sq{N}-{type}-{M}.md where N=sub-question index, type=web or codebase, M=scraper index within that sub-question. Example: /tmp/deep-research/1746619200/sq2-web-3.md is the 3rd web scraper for sub-question 2. Adapt QUESTION, DEPTH, and CONSTRAINTS per scraper.
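The per-run directory and the naming convention could be generated like this. A sketch only: the helper names `make_run_dir` and `scraper_output_file` are hypothetical, and the `run_id` parameter exists purely to make the example reproducible.

```python
import time
from pathlib import Path


def make_run_dir(base="/tmp/deep-research", run_id=None):
    """Create <base>/<epoch-seconds>/ and return (run_id, run_dir)."""
    run_id = str(int(time.time())) if run_id is None else str(run_id)
    run_dir = Path(base) / run_id
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_id, run_dir


def scraper_output_file(run_dir, subquestion, kind, scraper):
    """Build <run-dir>/sq{N}-{type}-{M}.md for one scraper."""
    if kind not in ("web", "codebase"):
        raise ValueError(f"unknown scraper type: {kind}")
    return Path(run_dir) / f"sq{subquestion}-{kind}-{scraper}.md"
```

For example, `scraper_output_file(run_dir, 2, "web", 3)` yields the file for the 3rd web scraper of sub-question 2, and the returned `run_id` is the value to record in the METRICS comment.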
For knowledge mode: do NOT skip this step. Treat your top 3 intended claims as one synthetic sub-question and spawn at least 2 web scrapers to verify them. Do not synthesize before reading their files. Knowledge mode without verification scrapers is a bug — every claim still needs a source from dr-scraper-web, just like in web mode.
After all scrapers complete, read every file they wrote (under your run directory), grouped by sub-question:
Read /tmp/deep-research/<run-dir>/sq1-web-1.md
Read /tmp/deep-research/<run-dir>/sq1-web-2.md
Read /tmp/deep-research/<run-dir>/sq2-web-1.md
...
Apply these hard triggers per sub-question (aggregate across all scraper files belonging to that sub-question). If any fires, spawn one or more follow-up scrapers targeting that sub-question with rephrased queries:
- Memory markers: a scraper file contains any of insufficient data, from memory, from training memory, from training data, training data through, training cutoff, memory cutoff, from prior knowledge, based on memory, I recall, as I recall, verify against. Sonnet's honest-disclosure reflex sometimes adds these notes; treat any match as evidence that the scraper mixed real fetches with memory.
- Root-only sources: source URLs are bare domains (e.g. https://hex.pm/, https://github.com/) without a deep path.
- No evidence markers: zero deep-URL markers such as /issues/<digits>, /pull/<digits>, /releases/tag/, /commit/<hash>, date stamps (/YYYY/MM/ or -YYYY-MM-DD-), query string ?v= or ?id=, fragment #section. AND zero quoted strings (no "..." or '...') AND zero version numbers (no v?\d+\.\d+(\.\d+)?). A scraper with no quote, no date, no version, no deep URL is indistinguishable from a memory dump.

Skip Step 3 entirely if no trigger fires. Continue to Step 4.
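Two of these triggers are mechanical enough to sketch as text checks. The regular expressions below are one possible reading of the markers listed above, not the skill's actual implementation; the names `memory_markers` and `lacks_evidence` are hypothetical.

```python
import re

# Phrases copied from the trigger list above.
MEMORY_PHRASES = [
    "insufficient data", "from memory", "from training memory",
    "from training data", "training data through", "training cutoff",
    "memory cutoff", "from prior knowledge", "based on memory",
    "I recall", "as I recall", "verify against",
]

# Deep-URL markers: issue/PR/release/commit paths, date stamps,
# ?v= / ?id= query strings, and URL fragments.
DEEP_URL = re.compile(
    r"/issues/\d+|/pull/\d+|/releases/tag/|/commit/[0-9a-f]+"
    r"|/\d{4}/\d{2}/|-\d{4}-\d{2}-\d{2}-"
    r"|\?(?:v|id)=|https?://\S+#\S+"
)
# Quoted strings; the lookarounds keep apostrophes in words like "don't"
# from counting as quotes (an assumption of this sketch).
QUOTED = re.compile(r'"[^"\n]+"' r"|(?<!\w)'[^'\n]+'(?!\w)")
VERSION = re.compile(r"v?\d+\.\d+(\.\d+)?")


def memory_markers(text: str):
    """Return the memory phrases found in a scraper file (case-insensitive)."""
    lowered = text.lower()
    return [p for p in MEMORY_PHRASES if p.lower() in lowered]


def lacks_evidence(text: str) -> bool:
    """True if the file has no deep URL, no quoted string, and no version."""
    return not (DEEP_URL.search(text) or QUOTED.search(text)
                or VERSION.search(text))
```

Either a non-empty `memory_markers` result or `lacks_evidence` returning True would count as a fired trigger for the sub-question the file belongs to.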
Recovery strategy — resume before respawn. When a trigger fires because a scraper stalled (hit its turn limit and returned narration instead of DONE|path, or wrote only a thin checkpoint file), prefer resuming that same agent via SendMessage to its agent ID before spawning a fresh one. A stalled scraper still holds its real fetches in context, so a resume usually produces the missing facts far cheaper than a respawn that repeats every search. Fall back to a fresh follow-up scraper only when the stalled agent has no reachable ID or the resume itself fails. Either path counts toward the 2-round limit below.
Maximum 2 follow-up rounds total per sub-question. If a sub-question still triggers after both rounds, mark it under Contradictions & Open Questions in the final output instead of papering over the gap with [interpretation].
After Step 3 (self-check), read the scraper files and extract candidate claims. This is orchestrator work — spawn no agents here.
For each concrete, falsifiable statement in the scraper files, record:
- quote: snippet from the scraper file if present; else leave empty (the verifier will fetch the source itself)
- centrality: central (directly answers the research question), supporting (useful context), or tangential (peripheral)

Only central claims enter Step 5. supporting and tangential claims flow unverified
into synthesis exactly as today (no confidence boost).
Skip Step 5 entirely if: --no-verify was passed, OR there are zero central claims, OR
the mode is codebase (codebase claims are not web-verifiable; they keep medium
confidence). In mixed mode, only central claims whose source is a URL (not a file
path) are eligible.
Select the eligible central claims, capped at the tier's verify cap (lite 8, standard
10, thorough 12). If there are more eligible central claims than the cap, keep the ones
with the strongest centrality and best source type; list the dropped ones under the
report's Verifikation section as "nicht verifiziert (Cap)".
For each selected claim, spawn one dr-verifier subagent per voter (voters = 1 for lite
and standard, or 3 for thorough and for standard with --verify3). Respect the hard cap
from Step 2: if scrapers + claims*voters would exceed it, reduce the claim count first.
Write verifier outputs into the same per-run directory used for scrapers, named
<run-dir>/verify-{claimIndex}-{voterIndex}.md.
Spawn pattern per voter:
Agent(
  subagent_type: "deep-research:dr-verifier",
  model: "sonnet",
  prompt: "Verify one claim. Follow your agent instructions for output format and return value.

CLAIM: Stripe charges 0.4% for ACH payments in 2026.
QUOTE: "ACH Direct Debit ... 0.4% per transaction (capped at $5.00)"
SOURCE_URL: https://stripe.com/pricing
SOURCE_TYPE: doc
QUESTION: What are Stripe's 2026 payment fees for SaaS?
OUTPUT_FILE: /tmp/deep-research/<run-dir>/verify-1-1.md"
)
Launch all verifiers in parallel. Each returns only DONE|{path}; read the verdict files
afterward.
Aggregation. For a single voter, the verdict is that voter's verdict. For 3 voters,
take the majority: contradicted only if ≥2 voters say contradicted; otherwise the
finding survives with the majority of confirmed/uncertain. Map to confidence:
- confirmed + primary source → high
- confirmed with secondary source, or split votes → medium
- uncertain, or single weak source → low

contradicted claims are removed from the main findings and listed under the report's
Verifikation section with their counter-source.
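The aggregation rule can be sketched as below. Several details are assumptions of this sketch rather than statements from the text: "split votes" is read as any non-unanimous confirmed majority, and a tie between confirmed and uncertain is resolved conservatively as uncertain. The function name `aggregate` is hypothetical.

```python
from collections import Counter

VERDICTS = {"confirmed", "uncertain", "contradicted"}


def aggregate(votes, primary_source: bool):
    """Return (verdict, confidence) for one claim from 1 or 3 voter verdicts."""
    if not votes or any(v not in VERDICTS for v in votes):
        raise ValueError(f"invalid votes: {votes}")
    counts = Counter(votes)
    # Contradicted only with a majority (>=2 of 3, or the single voter).
    if counts["contradicted"] * 2 > len(votes):
        return "contradicted", None
    if counts["confirmed"] > counts["uncertain"]:
        unanimous = counts["confirmed"] == len(votes)
        # high needs a primary source and no dissent; otherwise medium.
        return "confirmed", "high" if unanimous and primary_source else "medium"
    # Majority uncertain, or a tie (assumed to resolve to uncertain).
    return "uncertain", "low"
```

A claim that comes back as contradicted carries no confidence value because it leaves the main findings.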
Same no-fallback rule as scrapers. If spawning dr-verifier fails for any reason, you
MUST NOT verify claims yourself with direct WebSearch/WebFetch, and MUST NOT substitute
another agent type. Either the verifier works, or you skip verification for that claim and
mark it unverifiziert in the Verifikation section. See references/error-handling.md.
Synthesize findings across the scraper files by theme, not by sub-question and not by scraper.
Present in chat using the structure from references/output-format.md. Every Kernpunkt and every Finding-statement must end with a [^N] inline citation pointing to the numbered Sources section. Build the Sources list from the actual URLs in the scraper files. If a statement cannot be tied to a source from the files, either remove it or mark it [interpretation] and explain why. No claim ships without either a citation or an [interpretation] tag.
After presenting, ask: "Should I save the results as a report? (The file will be stored under ~/.claude/deep-research/)"
If yes, write to ~/.claude/deep-research/YYYY-MM-DD-<topic-slug>.md following these rules:
topic-slug: lowercase, ASCII-only (ä→ae, ö→oe, ü→ue, ß→ss, drop other accents), keep [a-z0-9] and replace runs of other characters with a single -, trim leading/trailing dashes, max 60 characters total
Collision: if the target file already exists, append -2, -3, ... before .md (e.g. 2026-04-28-caching-2.md). Never overwrite.
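The slug and collision rules can be sketched as follows. The helper names `topic_slug` and `report_path` are hypothetical; the transliteration table and the 60-character limit come from the rules above.

```python
import re
import unicodedata
from pathlib import Path

UMLAUTS = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def topic_slug(topic: str, max_len: int = 60) -> str:
    """Lowercase ASCII slug: [a-z0-9] kept, other runs become a single '-'."""
    text = unicodedata.normalize("NFC", topic).lower().translate(UMLAUTS)
    # Drop remaining accents by decomposing and discarding non-ASCII marks.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    slug = re.sub(r"[^a-z0-9]+", "-", text).strip("-")
    return slug[:max_len].strip("-")


def report_path(directory, date: str, slug: str) -> Path:
    """Return a non-existing path, appending -2, -3, ... on collision."""
    directory = Path(directory)
    candidate = directory / f"{date}-{slug}.md"
    n = 2
    while candidate.exists():
        candidate = directory / f"{date}-{slug}-{n}.md"
        n += 1
    return candidate
```

Because `report_path` only ever returns a path that does not exist yet, writing to it cannot overwrite an earlier report.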
Frontmatter: prepend YAML frontmatter so the file is later indexable, then a blank line, then the report:
---
topic: <original topic verbatim>
date: YYYY-MM-DD
mode: <web | codebase | knowledge | mixed>
sources_count: <integer>
---
End your final response with the METRICS comment so the stop hook can record the run.
The new fields after follow_up_needed are for compliance tracking — they let us measure whether the depth corridor and citation rules are actually followed across many runs. Compute them from your own dispatch records and your final output:
- run_id: the epoch-seconds value from your per-run output directory (Step 2, e.g. "1746619200"). Always include it: it lets a later triage match a problematic run back to its session transcript. Without it, a bad run is invisible in the aggregated metrics.
- scraper_count_per_subquestion: list of {depth, count}, one entry per sub-question with the scraper count you dispatched for it
- depth_corridor_violations: integer count of sub-questions that broke the corridor (deep with <3 scrapers, shallow with >2, standard outside 2-4)
- claims_with_citation: integer count of factual statements ending with [^N] or [interpretation] in your final response
- claims_total: integer count of factual statements in your final response (denominator for compliance)
- constraints_used: boolean. Did Step 0 produce a CONSTRAINTS block that was passed to scrapers?
- knowledge_factcheck_done: boolean. For mode=knowledge, did you spawn verification scrapers with web lookups? Use null for non-knowledge modes.
- approval_gate_action: one of "skipped" (skip-condition matched), "approved" (user chose to proceed), "adjusted" (user went through one or more adjustment rounds before approving), or "cancelled" (user chose to cancel; in that case you should not be emitting METRICS at all, this value exists only to make the schema complete).

Verify-stage fields (v3):

- verify_tier: "lite" | "standard" | "thorough"
- verify_voters: 1 or 3
- claims_verified: int, central claims sent to Step 5
- claims_confirmed: int
- claims_uncertain: int
- claims_contradicted: int
- total_subagents: int, scrapers + verifiers actually spawned
- hard_cap_hit: bool, true if the run was trimmed to fit the tier hard cap

Template:
<!-- METRICS:{"run_id":"<epoch-seconds>","topic":"...","mode":"...","scrapers":N,"scraper_errors":N,"sources_total":N,"sources_by_type":{"doc":N,"blog":N,"forum":N,"github":N,"code":N},"gaps_found":N,"self_check_passed":BOOL,"follow_up_needed":BOOL,"scraper_count_per_subquestion":[{"depth":"deep","count":4}],"depth_corridor_violations":0,"claims_with_citation":N,"claims_total":N,"constraints_used":BOOL,"knowledge_factcheck_done":BOOL_OR_NULL,"approval_gate_action":"approved","verify_tier":"lite","verify_voters":1,"claims_verified":N,"claims_confirmed":N,"claims_uncertain":N,"claims_contradicted":N,"total_subagents":N,"hard_cap_hit":false} -->
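For illustration, the comment could be produced and read back as in this sketch. How the stop hook actually parses the comment is not described here, so `parse_metrics` is a guess at one workable approach, and both function names are hypothetical.

```python
import json
import re

METRICS_RE = re.compile(r"<!-- METRICS:(\{.*\}) -->\s*$", re.DOTALL)


def metrics_comment(metrics: dict) -> str:
    """Serialize the metrics dict into the trailing HTML comment."""
    payload = json.dumps(metrics, separators=(",", ":"))
    if "-->" in payload:
        # A literal "-->" (e.g. inside the topic) would end the comment early.
        raise ValueError("metrics payload must not contain '-->'")
    return "<!-- METRICS:" + payload + " -->"


def parse_metrics(response: str):
    """Extract the metrics dict from the end of a response, or None."""
    match = METRICS_RE.search(response)
    return json.loads(match.group(1)) if match else None
```

Python's `None`, `True`, and `False` serialize to the `null`, `true`, and `false` values the template expects for fields such as knowledge_factcheck_done and hard_cap_hit.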
| Level | What you see | Max total |
|---|---|---|
| Scraper return values | DONE\|path only | |
| File reads | 600 words x ~12 files max | ~7,200 words |
| Verifier return values | DONE\|path only | |
| Verifier file reads | ~120 words x ≤24 files | ~2,900 words |
Scrapers return only DONE|{path}. The orchestrator reads files on demand. Each scraper file is capped at ~600 words; for a typical 4-sub-question / 3-scraper-each run that's ~12 files.
Read references/error-handling.md for failures, vague questions, and quality issues.
Before finishing, check three things:
- Does every factual statement carry a [^N] citation or an [interpretation] tag?
- Does the Sources section list an entry for every [^N] used above?
- Does every central claim either carry a confidence marker from a verifier verdict, or appear in the Verifikation section as unverified? A central claim with no verdict and no Verifikation entry is a bug.

If any check fails, re-read the scraper files and fix the gaps before sending. A claim without a source is a bug, not an output.
npx claudepluginhub phyr97/phyr97-marketplace --plugin deep-research