By sethdford
SRE plugin to measure and compare Claude Code CLI performance across Anthropic Direct API and AWS Bedrock. Tracks TTFB, throughput, P50/P90/P99, and generates latency reports.
Matches all tools
Hooks run on every tool call, not just specific ones
Measure and compare Claude API performance across Anthropic Direct API and AWS Bedrock directly from Claude Code. Pure bash/jq/curl/perl — zero Python dependency.
| Metric | Description |
|---|---|
| TTFB (Time to First Byte) | Network round-trip time to the API endpoint |
| TTFT (Time to First Token) | How long until the first content token streams back |
| Server Latency | Server-side processing time (Bedrock metrics.latencyMs) |
| Generation Time | Token streaming duration after first token |
| Output Throughput | Tokens generated per second |
| Percentiles | P50, P90, P95, P99 for all timing metrics |
| Tool Budget | Time breakdown: Model vs Bash vs MCP vs CLI tools |
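The percentile rows above can be illustrated with a small nearest-rank calculation. This is a minimal sketch using `sort` and `awk`, not the plugin's own code; the `percentile` function name and the sample TTFT values are hypothetical, and the plugin may use a different percentile definition (for example, interpolation).

```bash
#!/usr/bin/env bash
# Nearest-rank percentile over newline-separated numeric samples on stdin.
# Usage: percentile <p> < samples.txt   (p is an integer from 1 to 100)
# Hypothetical helper for illustration; not taken from the plugin.
percentile() {
  local p="$1"
  sort -n | awk -v p="$p" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      # Nearest rank: smallest r with r/N >= p/100, i.e. ceil(p*N/100).
      r = int((p * NR + 99) / 100)
      if (r < 1) r = 1
      if (r > NR) r = NR
      print v[r]
    }'
}

# Made-up TTFT samples in milliseconds.
samples=$(printf '%s\n' 900 410 1500 450 480 520 530 560 600 640)
for p in 50 90 95 99; do
  printf 'P%s = %sms\n' "$p" "$(printf '%s\n' "$samples" | percentile "$p")"
done
```

With only ten samples, P95 and P99 both resolve to the largest value, which is one reason tail percentiles need larger `-n` values to be meaningful.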
/sre-latency:benchmark
Full benchmark comparing both providers with statistical analysis.
/sre-latency:benchmark -n 10 --prompt-size medium --output results.json
/sre-latency:benchmark --providers anthropic-direct -n 20
/sre-latency:latency-check
Quick single-request probe for spot-checking.
/sre-latency:latency-check both
/sre-latency:latency-check direct
/sre-latency:latency-check bedrock
/sre-latency:report
Generate formatted reports from saved benchmark data.
/sre-latency:report results.json
/sre-latency:grade
Grade benchmark results against SLO thresholds (A-F).
/sre-latency:grade results.json
/sre-latency:tool-timings
View categorized time-spent analysis: Model vs Tools vs MCP breakdown.
/sre-latency:tool-timings
/sre-latency:compare
Side-by-side comparison of multiple benchmark runs.
/sre-latency:compare results/budget-run1.json results/budget-run2.json
/sre-latency:compare results/ --latest 3
Compare real Claude Code coding sessions across providers:
scripts/session_benchmark.sh --task simple --output results/session.json
scripts/session_benchmark.sh --task medium
scripts/session_benchmark.sh --task complex --model opus
Task levels:
The latency-advisor skill activates automatically when you discuss latency issues, Bedrock performance, or TTFT optimization. It provides SRE-focused guidance.
# Required tools (all standard on macOS/Linux)
# bash, jq, curl, perl, bc
# Anthropic Direct API
export ANTHROPIC_API_KEY=sk-ant-...
# AWS Bedrock (configure at least one)
export AWS_REGION=us-east-1
# Plus standard AWS credentials (aws configure, env vars, or SSO)
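Before running a benchmark, it can help to confirm that the tools and variables listed above are present. The following is a minimal preflight sketch; the function names `missing_tools` and `configured_providers` are hypothetical and not part of the plugin, and a set `AWS_REGION` only suggests that Bedrock is intended, since it does not prove that AWS credentials are valid.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helpers for illustration; not part of the plugin.

# Print each named tool that is not found on PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Print which providers look configured, based only on the environment
# variables shown above. AWS credentials themselves are not checked here.
configured_providers() {
  [ -n "${ANTHROPIC_API_KEY:-}" ] && printf 'anthropic-direct\n'
  [ -n "${AWS_REGION:-}" ] && printf 'bedrock\n'
  return 0
}

missing=$(missing_tools bash jq curl perl bc)
if [ -n "$missing" ]; then
  # Report on one line; do not abort, so the check stays informational.
  echo "Missing tools:" $missing >&2
fi
configured_providers
```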
# Load locally during development
claude --plugin-dir ./sre-latency-monitor
# Or install from a marketplace
/plugin install sre-latency@your-marketplace
The plugin includes a statusline script that displays real-time tool latency in the Claude Code status bar:
[D] Sonnet 4.5 | Ctx 45% | $0.15 | 5m30s | Bash:2m CLI:30s [2m30s tools] | TTFT 450ms
Provider indicators: [D] = Direct API, [BR] = Bedrock, [VX] = Vertex.
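The pieces of that status line can be sketched in a few lines of bash. This is an illustration only: the helper names are hypothetical, and the provider detection assumes the `CLAUDE_CODE_USE_BEDROCK` and `CLAUDE_CODE_USE_VERTEX` environment variables that Claude Code uses to select a provider. The plugin's actual statusline script may detect the provider differently.

```bash
#!/usr/bin/env bash
# Hypothetical statusline helpers for illustration; not the plugin's script.

# Map the active provider to the indicator shown in the status bar.
# Assumption: provider selection follows Claude Code's environment variables.
provider_tag() {
  if [ "${CLAUDE_CODE_USE_BEDROCK:-}" = "1" ]; then
    printf '[BR]\n'
  elif [ "${CLAUDE_CODE_USE_VERTEX:-}" = "1" ]; then
    printf '[VX]\n'
  else
    printf '[D]\n'
  fi
}

# Format whole seconds in the compact style of the example (5m30s, 2m, 30s).
fmt_duration() {
  local s="$1"
  if [ "$s" -ge 60 ]; then
    printf '%dm' $((s / 60))
    s=$((s % 60))
    [ "$s" -gt 0 ] && printf '%ds' "$s"
    printf '\n'
  else
    printf '%ds\n' "$s"
  fi
}

# Assemble a fragment similar to the example line, using made-up durations.
printf '%s Bash:%s CLI:%s [%s tools]\n' \
  "$(provider_tag)" "$(fmt_duration 120)" "$(fmt_duration 30)" "$(fmt_duration 150)"
```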
The grade command assigns letter grades (A-F) based on configurable SLO thresholds:
| Grade | TTFT P50 | TTFT P99 | Throughput Mean | Error Rate |
|---|---|---|---|---|
| A | ≤ 500ms | ≤ 1500ms | ≥ 80 t/s | 0% |
| B | ≤ 800ms | ≤ 2500ms | ≥ 60 t/s | ≤ 1% |
| C | ≤ 1200ms | ≤ 4000ms | ≥ 40 t/s | ≤ 5% |
| D | ≤ 2000ms | ≤ 6000ms | ≥ 20 t/s | ≤ 10% |
| F | > 2000ms | > 6000ms | < 20 t/s | > 10% |
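The threshold table can be read as a cascade of checks. The sketch below hard-codes the values from the table and assumes a grade is awarded only when all four metrics meet that row's thresholds, so the weakest metric decides the result. The source does not state how the plugin combines metrics, and the `grade` function here is a hypothetical illustration rather than the plugin's implementation.

```bash
#!/usr/bin/env bash
# Usage: grade <ttft_p50_ms> <ttft_p99_ms> <throughput_tps> <error_rate_pct>
# Hypothetical helper; thresholds are copied from the table above.
# Assumption: every metric must satisfy a row for that grade to apply.
grade() {
  awk -v p50="$1" -v p99="$2" -v tps="$3" -v err="$4" 'BEGIN {
    if      (p50 <= 500  && p99 <= 1500 && tps >= 80 && err == 0)  g = "A"
    else if (p50 <= 800  && p99 <= 2500 && tps >= 60 && err <= 1)  g = "B"
    else if (p50 <= 1200 && p99 <= 4000 && tps >= 40 && err <= 5)  g = "C"
    else if (p50 <= 2000 && p99 <= 6000 && tps >= 20 && err <= 10) g = "D"
    else                                                           g = "F"
    print g
  }'
}

# Made-up inputs: fast and error-free, then the same run with a slow tail.
grade 450 1400 85 0
grade 450 3000 85 0
```

Under this reading, a run with an excellent P50 still drops to C when its P99 exceeds 2500ms, which matches the usual SRE emphasis on tail latency.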
Bedrock adds measurable per-request overhead from AWS network routing and the Bedrock invocation layer. This overhead exists even with no Guardrails configured.
We measured this overhead across two independent test runs on Haiku 4.5 (the fastest model, where overhead is most visible):
| Run | Direct API (avg) | Bedrock (avg) | Overhead |
|---|---|---|---|
| Guardrail benchmark (N=5) | 581ms | 1,135ms | +554ms (+95%) |
| Latency budget (N=5) | 568ms | 1,402ms | +834ms (+147%) |
Both runs: Haiku 4.5, "count 1-20" prompt, max_tokens=128. Run-to-run variance is significant with N=5.
The measured overhead ranges from ~550–850ms per call. This variance is expected with small sample sizes and network conditions. The key finding is directional: Bedrock consistently adds hundreds of milliseconds of per-request overhead.
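The Overhead column follows directly from the two averages: the absolute difference in milliseconds, and that difference as a percentage of the Direct API average. A minimal sketch of the arithmetic, using a hypothetical `overhead` helper:

```bash
#!/usr/bin/env bash
# Usage: overhead <direct_avg_ms> <bedrock_avg_ms>
# Prints the absolute and relative overhead of Bedrock over Direct API.
# Hypothetical helper for illustration; not part of the plugin.
overhead() {
  awk -v d="$1" -v b="$2" 'BEGIN {
    if (d <= 0) exit 1            # avoid dividing by zero
    delta = b - d
    printf "%+dms (%+.0f%%)\n", delta, 100 * delta / d
  }'
}

overhead 581 1135   # Guardrail benchmark row
overhead 568 1402   # Latency budget row
```

Running it on the two rows reproduces the table values: `+554ms (+95%)` and `+834ms (+147%)`.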
Adding a Guardrail (content filter) adds another ~800ms on top of the base Bedrock overhead:
npx claudepluginhub sethdford/sre-latency-monitor