From llm-gateway
Dispatches tasks to the optimal LLM (Codex, Gemini, Claude, etc.) based on task type and security needs, with configurable approval strategies and polling behavior.
Slash command: /llm-gateway:model-routing
Choose the right LLM for each task. Based on real usage across 11+ VerivusAI projects.
Apply these on every dispatch unless the caller has explicitly overridden a rule in the current turn:
- Omit model: let the gateway use its configured default per CLI. Nominating a model risks deprecated IDs (o3, o3-pro, gpt-4o, …) and capability mismatches. Call list_models only when the caller has asked for a specific variant.
- approvalStrategy:"mcp_managed" is the skill dispatch default (the gateway schema default is "legacy"). It gates the request before execution, then sets each provider to a safe accept-edits-level mode (file edits are auto-accepted; Bash and other dangerous tools stay gated): Claude and Grok use --permission-mode acceptEdits, Mistral uses --agent accept-edits, and Gemini uses its prompted default (the agy CLI has no accept-edits rung, so Gemini cannot auto-approve mutating tools under mcp_managed).
- Codex still needs fullAuto:true for autonomous file/shell work (its sandboxed workspace-write mode is unchanged).
- Full unattended execution requires the operator opt-in LLM_GATEWAY_APPROVAL_ALLOW_BYPASS=1, which restores each provider's full auto-approve mode (Claude bypassPermissions, Grok --always-approve, Mistral auto-approve, Gemini --dangerously-skip-permissions).
- Poll every 60 s. idleTimeoutMs is a separate no-output safeguard.
- Loop on reviews: on NOT APPROVED or a conditional verdict, fix and re-review, then repeat. Escalate after 3 rounds. This rule does not apply to pure implementation or non-review analysis dispatches.

All tool invocations below use the dispatch defaults above (omit model, approvalStrategy:"mcp_managed", fullAuto:true for Codex, poll every 60 s, loop on reviews).
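The provider modes described above can be read as a simple lookup. The sketch below is illustrative only: the flag strings are copied from the text, but `approvalMode`, `SAFE_MODE`, and `BYPASS_MODE` are hypothetical names, not gateway internals. Codex is left out because it is controlled by fullAuto rather than by a permission flag.

```typescript
// Hypothetical lookup; flag strings come from the text above, names are illustrative.
type Provider = "claude" | "grok" | "mistral" | "gemini";

// Modes applied under approvalStrategy:"mcp_managed".
const SAFE_MODE: Record<Provider, string> = {
  claude: "--permission-mode acceptEdits",
  grok: "--permission-mode acceptEdits",
  mistral: "--agent accept-edits",
  gemini: "prompted default", // no accept-edits rung, so mutating tools stay gated
};

// Modes restored when the operator sets LLM_GATEWAY_APPROVAL_ALLOW_BYPASS=1.
const BYPASS_MODE: Record<Provider, string> = {
  claude: "bypassPermissions",
  grok: "--always-approve",
  mistral: "auto-approve",
  gemini: "--dangerously-skip-permissions",
};

// The environment is passed in explicitly so the sketch does not depend on a runtime.
function approvalMode(
  provider: Provider,
  env: Record<string, string | undefined>
): string {
  const bypass = env["LLM_GATEWAY_APPROVAL_ALLOW_BYPASS"] === "1";
  return bypass ? BYPASS_MODE[provider] : SAFE_MODE[provider];
}
```

Calling `approvalMode("gemini", {})` yields the prompted default, which is why Gemini cannot auto-approve mutating tools without the operator opt-in.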
| Task | Best LLM | Why | Tool |
|---|---|---|---|
| Code implementation | Codex | Strongest at writing correct code, handles large codebases | codex_request (fullAuto:true, approvalStrategy:"mcp_managed") |
| Code review (quality) | Codex | Thorough, finds real issues, gives actionable feedback | codex_request (fullAuto:true, approvalStrategy:"mcp_managed") |
| Code review (security) | Gemini | Strong security focus, OWASP awareness, edge case detection | gemini_request (approvalStrategy:"mcp_managed") |
| Architecture review | Claude | Best at high-level design, pattern recognition, trade-off analysis | claude_request (approvalStrategy:"mcp_managed") |
| Design doc review | Codex | Checks feasibility, completeness, finds gaps in plans | codex_request (fullAuto:true, approvalStrategy:"mcp_managed") |
| Bug investigation | Codex | Can read code, trace logic, identify root causes | codex_request (fullAuto:true, approvalStrategy:"mcp_managed") |
| Refactoring | Codex | Handles multi-file changes reliably | codex_request (fullAuto:true, approvalStrategy:"mcp_managed") |
| Documentation | Claude | Best prose quality, understands audience | claude_request (approvalStrategy:"mcp_managed") |
| Test generation | Codex | Understands test frameworks, generates comprehensive cases | codex_request (fullAuto:true, approvalStrategy:"mcp_managed") |
| Security audit | Gemini | Security-focused analysis, threat modeling | gemini_request (approvalStrategy:"mcp_managed") |
| Multi-file analysis | Codex | Handles large codebases with sqry integration | codex_request (fullAuto:true, approvalStrategy:"mcp_managed") |
| Diversity / tie-breaker review | Grok (xAI) | Independent fourth model from a different vendor family — useful when Claude/Codex/Gemini might share a blind spot | grok_request (approvalStrategy:"mcp_managed") |
| Maximum diversity | Mistral Vibe | Fifth independent vendor (EU / open-weights family); uncorrelated with OpenAI/Anthropic/Google/xAI | mistral_request (approvalStrategy:"mcp_managed") |
| Consensus / unanimous gate | All five in parallel | Catches issues any single model misses; use when correctness > cost | *_request_async for Claude/Codex/Gemini/Grok/Mistral |
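As a rough illustration, the single-model rows of the table can be encoded as a lookup. `routeTask`, `Route`, and `ROUTES` are hypothetical names introduced for this sketch; the consensus row is omitted because it fans out to all five tools rather than to one.

```typescript
// Hypothetical encoding of the routing table; only the tool names come from the table.
interface Route {
  tool: string;
  fullAuto: boolean; // true only for Codex rows, per the dispatch defaults
}

const codex: Route = { tool: "codex_request", fullAuto: true };
const gemini: Route = { tool: "gemini_request", fullAuto: false };
const claude: Route = { tool: "claude_request", fullAuto: false };

const ROUTES = new Map<string, Route>([
  ["code implementation", codex],
  ["code review (quality)", codex],
  ["code review (security)", gemini],
  ["architecture review", claude],
  ["design doc review", codex],
  ["bug investigation", codex],
  ["refactoring", codex],
  ["documentation", claude],
  ["test generation", codex],
  ["security audit", gemini],
  ["multi-file analysis", codex],
  ["diversity / tie-breaker review", { tool: "grok_request", fullAuto: false }],
  ["maximum diversity", { tool: "mistral_request", fullAuto: false }],
]);

// Returns undefined for unknown task types so the caller can decide what to do.
function routeTask(task: string): Route | undefined {
  return ROUTES.get(task.trim().toLowerCase());
}
```

Returning `undefined` for an unlisted task keeps the choice with the caller instead of silently defaulting to one model.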
The gateway uses sensible configured defaults. Omitting model is almost always correct.
codex_request({prompt: "...", fullAuto: true, approvalStrategy: "mcp_managed"})
gemini_request({prompt: "...", approvalStrategy: "mcp_managed"})
claude_request({prompt: "...", approvalStrategy: "mcp_managed"})
grok_request({prompt: "...", approvalStrategy: "mcp_managed"})
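One way to avoid repeating these arguments by hand is a small builder that applies the dispatch defaults. This is a sketch under assumptions: `withDispatchDefaults` and `DispatchParams` are hypothetical names, and only the fields shown in the calls above are modelled.

```typescript
// Hypothetical helper; field names mirror the calls above, nothing else is assumed.
interface DispatchParams {
  prompt: string;
  approvalStrategy: string;
  fullAuto?: boolean;
  model?: string;
}

function withDispatchDefaults(
  cli: string,
  prompt: string,
  explicitModel?: string // set only when the caller named a model in the current turn
): DispatchParams {
  const params: DispatchParams = { prompt, approvalStrategy: "mcp_managed" };
  if (cli === "codex") {
    params.fullAuto = true; // Codex needs this for autonomous file/shell work
  }
  if (explicitModel !== undefined) {
    params.model = explicitModel;
  }
  return params;
}
```

With no third argument the result has no model key at all, so the gateway's configured default applies.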
Treat old memory/config IDs such as o3, o3-pro, and gpt-4o as legacy unless list_models currently reports them for the target CLI.
If you see stale IDs in old configs or memory, prefer the configured default or call list_models.
The dispatch default is to omit model. Only include it if the user has explicitly named a model in the current turn.
// Only when the caller asked for this specific variant:
gemini_request({prompt: "...", model: "<explicit-user-request>", approvalStrategy: "mcp_managed"})
list_models() // All CLIs
list_models({cli: "codex"}) // Codex models only
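The stale-ID rule can be sketched as a guard that passes a model through only when the target CLI currently reports it. `resolveModel` is a hypothetical helper, and `available` stands in for whatever list_models returns; the real response shape is not shown in this document.

```typescript
// Hypothetical guard; `available` is an assumed flat list of model IDs.
function resolveModel(
  requested: string | undefined,
  available: string[]
): string | undefined {
  if (requested === undefined) {
    return undefined; // dispatch default: omit model
  }
  // A stale or unknown ID falls back to the configured default by being dropped.
  return available.indexOf(requested) !== -1 ? requested : undefined;
}
```

Dropping an unlisted ID rather than raising an error matches the advice to prefer the configured default when old configs or memory contain stale IDs.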
Use promptParts to share stable prefix bytes across calls

When the same long system / tools / context block is sent to multiple CLIs (parallel reviews, consensus, multi-round loops), switch from prompt to the structured promptParts field:
codex_request({
  promptParts: {
    system: "<long stable system instruction>",
    tools: "<long stable tool description>",
    context: "<long stable file dump / spec>",
    task: "Implement X per the above."
  },
  fullAuto: true,
  approvalStrategy: "mcp_managed"
})
The gateway concatenates in canonical order system → tools → context → task, so the stable prefix bytes are byte-identical across the parallel dispatch and across rounds. That raises the implicit cache hit rate at each provider with no special-case API contortions. prompt and promptParts are mutually exclusive (the runtime returns "provide exactly one of `prompt` or `promptParts`" if both are supplied). The stable prefix hash is recorded in the flight recorder and queryable via cache-state://prefix/{hash}, so you can verify the prefix actually got shared.
For short one-off questions, plain prompt is fine.
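To illustrate why the canonical order yields a shared prefix, here is a minimal sketch of the assembly step. It is not the gateway's implementation: `assemblePrompt`, `stablePrefix`, the blank-line separator, and the choice to make system, tools, and context optional are all assumptions made for the example.

```typescript
// Hypothetical model of prompt assembly; separator and optionality are assumptions.
interface PromptParts {
  system?: string;
  tools?: string;
  context?: string;
  task: string;
}

interface DispatchInput {
  prompt?: string;
  promptParts?: PromptParts;
}

const SEPARATOR = "\n\n"; // assumed; the real joiner is not documented here

// Everything before `task`, in canonical order: system, tools, context.
function stablePrefix(parts: PromptParts): string {
  return [parts.system, parts.tools, parts.context]
    .filter((s): s is string => s !== undefined)
    .join(SEPARATOR);
}

function assemblePrompt(input: DispatchInput): string {
  const { prompt, promptParts } = input;
  if (prompt !== undefined && promptParts === undefined) {
    return prompt;
  }
  if (promptParts !== undefined && prompt === undefined) {
    const prefix = stablePrefix(promptParts);
    return prefix === "" ? promptParts.task : prefix + SEPARATOR + promptParts.task;
  }
  // Both or neither supplied.
  throw new Error("provide exactly one of `prompt` or `promptParts`");
}
```

Two dispatches that differ only in task produce strings that start with the same bytes, which is the property provider-side prefix caching relies on.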
The most common pattern. Codex with fullAuto handles implementation + testing:
codex_request({
  prompt: "Implement [feature] in [path]. Requirements:\n- [req 1]\n- [req 2]\n\nInclude tests.",
  fullAuto: true,
  approvalStrategy: "mcp_managed"
})
Second most common. Codex reviews with full codebase access:
codex_request({
  prompt: "Review [path] for [criteria]. End with APPROVED or NOT APPROVED with findings.",
  fullAuto: true,
  approvalStrategy: "mcp_managed"
})
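Because review prompts end with APPROVED or NOT APPROVED, the review loop from the dispatch defaults can be sketched as below. The helper names are hypothetical, the verdict parsing is a simple substring heuristic (including the assumption that the word "conditional" marks a conditional verdict), and the callbacks are synchronous only to keep the example short; real dispatches are asynchronous.

```typescript
// Hypothetical sketch of the fix and re-review loop; parsing is a heuristic.
function isApprovedVerdict(response: string): boolean {
  const text = response.toUpperCase();
  // NOT APPROVED contains APPROVED, so rule it out first.
  if (text.indexOf("NOT APPROVED") !== -1) return false;
  // Assumption: conditional verdicts mention the word "conditional".
  if (text.indexOf("CONDITIONAL") !== -1) return false;
  return text.indexOf("APPROVED") !== -1;
}

interface LoopResult {
  approved: boolean;
  rounds: number;
  escalate: boolean;
}

function reviewLoop(
  review: () => string,               // dispatches one review, returns its text
  fix: (findings: string) => void,    // applies fixes for the reported findings
  maxRounds: number = 3
): LoopResult {
  for (let round = 1; round <= maxRounds; round++) {
    const response = review();
    if (isApprovedVerdict(response)) {
      return { approved: true, rounds: round, escalate: false };
    }
    // No point fixing after the final round; the result is escalated instead.
    if (round < maxRounds) {
      fix(response);
    }
  }
  return { approved: false, rounds: maxRounds, escalate: true };
}
```

A response with no recognisable verdict is treated as not approved, which errs on the side of another round.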
For security-sensitive changes:
gemini_request({
  prompt: "Security audit [path]. Check for injection, auth bypass, data leaks, OWASP Top 10. End with APPROVED or NOT APPROVED with findings.",
  approvalStrategy: "mcp_managed"
})
For comprehensive coverage:
codex_request_async({prompt: "Review [path] for correctness... End with APPROVED or NOT APPROVED with findings.", fullAuto: true, approvalStrategy: "mcp_managed", correlationId: "review-codex"})
gemini_request_async({prompt: "Security audit [path]... End with APPROVED or NOT APPROVED with findings.", approvalStrategy: "mcp_managed", correlationId: "review-gemini"})
grok_request_async({prompt: "Independent review of [path]... End with APPROVED or NOT APPROVED with findings.", approvalStrategy: "mcp_managed", correlationId: "review-grok"})
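Once the parallel jobs finish, their verdicts need to be combined. The sketch below shows one possible unanimous gate keyed by correlationId; `unanimousGate` and `ReviewResult` are hypothetical, and the result shape is an assumption rather than the gateway's actual response format.

```typescript
// Hypothetical aggregation of parallel review results; shapes are assumed.
interface ReviewResult {
  correlationId: string;
  response: string;
}

function approved(response: string): boolean {
  const text = response.toUpperCase();
  // NOT APPROVED contains APPROVED, so both checks are needed.
  return text.indexOf("APPROVED") !== -1 && text.indexOf("NOT APPROVED") === -1;
}

function unanimousGate(results: ReviewResult[]): { pass: boolean; blockers: string[] } {
  const blockers = results
    .filter((r) => !approved(r.response))
    .map((r) => r.correlationId);
  // An empty result set never passes: no reviews means no consensus.
  return { pass: results.length > 0 && blockers.length === 0, blockers };
}
```

The `blockers` list names which reviewers withheld approval, so the follow-up fix round can target their findings.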
Model routing affects session strategy:
| LLM | Session Continuity | Implication |
|---|---|---|
| Claude | Real (--continue / --session-id) | Can do multi-turn refinement |
| Codex | Real (codex exec resume <UUID> / --last) — sessionId must be a real Codex UUID from ~/.codex/sessions/; --full-auto dropped on resume | Good for iterative work; pass resumeLatest:true for the most recent cwd session |
| Gemini | Real (--resume) | Good for iterative analysis |
| Grok | Real (--resume <id> / --continue) | Good for iterative review/diversity rounds |
This means:
- For Codex, pass resumeLatest:true or a real Codex session UUID; gateway-generated gw-* IDs are rejected.
- For Codex, fullAuto:true is silently dropped on resume.
- gw-* IDs are bookkeeping handles and are rejected if replayed.

- Codex with fullAuto is the most autonomous but most expensive per call.
- Omit model so configured fast defaults such as Haiku or Flash can apply.

- Use fullAuto: true and approvalStrategy: "mcp_managed" when the task needs autonomous code edits, tests, or shell commands.
- Omit model unless the caller asked for a specific variant.
- Set correlationId on every request for tracing.
- Watch for status:"deferred" in responses, then poll every 60s. Results are durable for 30 days (LLM_GATEWAY_JOB_RETENTION_DAYS); re-issuing the same call within the dedup window (LLM_GATEWAY_DEDUP_WINDOW_MS, default 1h) reattaches to the live job. Pass forceRefresh:true only when inputs actually changed.
- Use cli_versions to inspect installed CLI versions. Use cli_upgrade with dryRun:true first; run real upgrades only when the caller wants the local CLI updated. Grok self-updates via grok update; the same cli_upgrade tool routes it for you.
- Prefer promptParts over prompt when a long stable prefix is shared. The same routing rules apply per CLI, but the gateway hashes the stable prefix so you can verify cache effectiveness via cache-state://global or cache-state://prefix/{hash} (tokens / hashes only, no prompt text).
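The Codex resume rules in this section could be checked client-side before dispatch. The guard below is a sketch: `prepareCodexResume` is a hypothetical helper, and the generic 8-4-4-4-12 hexadecimal UUID pattern is an assumption about what a real Codex session ID looks like.

```typescript
// Hypothetical pre-dispatch guard for Codex resume; the UUID pattern is assumed.
interface CodexResumeRequest {
  sessionId?: string;
  resumeLatest?: boolean;
  fullAuto?: boolean;
}

interface CodexResumePlan {
  target: string;     // "--last" or the session UUID
  fullAuto: boolean;  // always false: the flag is dropped on resume
}

const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function prepareCodexResume(req: CodexResumeRequest): CodexResumePlan {
  if (req.resumeLatest === true) {
    return { target: "--last", fullAuto: false };
  }
  const id = req.sessionId;
  if (id === undefined) {
    throw new Error("pass resumeLatest:true or a Codex session UUID");
  }
  if (id.indexOf("gw-") === 0) {
    throw new Error("gw-* IDs are gateway bookkeeping handles, not Codex sessions");
  }
  if (!UUID_PATTERN.test(id)) {
    throw new Error("sessionId does not look like a Codex session UUID");
  }
  return { target: id, fullAuto: false };
}
```

The plan always reports `fullAuto: false`, which makes the silent drop on resume visible to the caller instead of surprising them later.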