sre-latency

Name: sre-latency
Author: sethdford

SRE plugin to measure and compare Claude Code CLI performance across Anthropic Direct API and AWS Bedrock. Tracks TTFB, throughput, P50/P90/P99, and generates latency reports.

npx claudepluginhub sethdford/sre-latency-monitor


What's Inside

Slash Commands (9)

/benchmark
/compare
/grade: Grade benchmark results against SLO thresholds (A-F) for each provider
/http-trace
/latency-budget

Skills (1)

/latency-advisor: Provides SRE latency optimization advice for Claude API usage. Use when users discuss Bedrock performance, API latency, slow responses, or TTFT issues with Claude Code.

Hooks (1)

Event Hooks (all tools): 3 hooks across 3 events

Stats

Version: 2.2.0
Language: Shell
Stars: 0
Copy clicks: 1
Maintenance: Excellent
License: MIT
Last Commit: Feb 21, 2026
Added: Feb 7, 2026


Safety Signals

Critical: Matches all tools. Hooks run on every tool call, not just specific ones.

README

SRE Latency Monitor — Claude Code Plugin

Measure and compare Claude API performance across Anthropic Direct API and AWS Bedrock directly from Claude Code. Pure bash/jq/curl/perl — zero Python dependency.

What It Measures

Metric	Description
TTFB (Time to First Byte)	Network round-trip time to the API endpoint
TTFT (Time to First Token)	How long until the first content token streams back
Server Latency	Server-side processing time (Bedrock `metrics.latencyMs`)
Generation Time	Token streaming duration after first token
Output Throughput	Tokens generated per second
Percentiles	P50, P90, P95, P99 for all timing metrics
Tool Budget	Time breakdown: Model vs Bash vs MCP vs CLI tools
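
As a rough illustration of how the percentile and throughput rows above could be derived from raw samples, here is a minimal sketch using only sort and awk. The function names `percentile` and `throughput`, the nearest-rank method, and the sample values are assumptions made for this example; the plugin's own scripts may compute these differently.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not taken from the plugin's scripts.

# percentile P SAMPLE... : nearest-rank percentile (integer P, 1-100).
percentile() {
  local p=$1
  shift
  [ "$#" -gt 0 ] || return 1
  printf '%s\n' "$@" | sort -n | awk -v p="$p" '
    { v[NR] = $1 }
    END {
      rank = int((p * NR + 99) / 100)   # ceil(p * NR / 100) for integer p
      if (rank < 1) rank = 1
      print v[rank]
    }'
}

# throughput TOKENS GENERATION_MS : output tokens per second.
throughput() {
  awk -v t="$1" -v ms="$2" 'BEGIN {
    if (ms <= 0) exit 1
    printf "%.1f\n", t * 1000 / ms
  }'
}

# Made-up TTFT samples in milliseconds.
samples=(450 520 610 480 1500 495 530 470 505 700)
echo "P50=$(percentile 50 "${samples[@]}")ms P90=$(percentile 90 "${samples[@]}")ms P99=$(percentile 99 "${samples[@]}")ms"
# P50=505ms P90=700ms P99=1500ms
echo "throughput=$(throughput 128 1600) t/s"
# throughput=80.0 t/s
```

With only ten samples, P99 is simply the slowest request, which is one reason small runs make tail percentiles hard to interpret.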

Commands

`/sre-latency:benchmark`

Full benchmark comparing both providers with statistical analysis.

/sre-latency:benchmark -n 10 --prompt-size medium --output results.json
/sre-latency:benchmark --providers anthropic-direct -n 20

`/sre-latency:latency-check`

Quick single-request probe for spot-checking.

/sre-latency:latency-check both
/sre-latency:latency-check direct
/sre-latency:latency-check bedrock

`/sre-latency:report`

Generate formatted reports from saved benchmark data.

/sre-latency:report results.json

`/sre-latency:grade`

Grade benchmark results against SLO thresholds (A-F).

/sre-latency:grade results.json

`/sre-latency:tool-timings`

View categorized time-spent analysis — Model vs Tools vs MCP breakdown.

/sre-latency:tool-timings

`/sre-latency:compare`

Side-by-side comparison of multiple benchmark runs.

/sre-latency:compare results/budget-run1.json results/budget-run2.json
/sre-latency:compare results/ --latest 3

Session Benchmark

Compare real Claude Code coding sessions across providers:

scripts/session_benchmark.sh --task simple --output results/session.json
scripts/session_benchmark.sh --task medium
scripts/session_benchmark.sh --task complex --model opus

Task levels:

simple — Write a function + verify (~2 tool calls)
medium — Write a module with tests (~5-8 tool calls)
complex — Read existing code, refactor, add tests (~10-15 tool calls)

Auto-Invoked Skill

The latency-advisor skill activates automatically when you discuss latency issues, Bedrock performance, or TTFT optimization. It provides SRE-focused guidance.

Prerequisites

# Required tools (all standard on macOS/Linux)
# bash, jq, curl, perl, bc

# Anthropic Direct API
export ANTHROPIC_API_KEY=sk-ant-...

# AWS Bedrock (configure at least one)
export AWS_REGION=us-east-1
# Plus standard AWS credentials (aws configure, env vars, or SSO)
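
Before running a benchmark it can help to confirm that the prerequisites listed above are in place. The following is a small sketch of such a preflight check; the `missing_tools` function is hypothetical and is not part of the plugin.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check; not part of the plugin.

# missing_tools TOOL... : print each tool that is not found on PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing=$(missing_tools bash jq curl perl bc)
if [ -n "$missing" ]; then
  # Unquoted on purpose so the names are joined onto one line.
  echo "Missing required tools:" $missing >&2
else
  echo "All required tools found."
fi

# Warn about the environment variables named in the prerequisites.
[ -n "${ANTHROPIC_API_KEY:-}" ] || echo "ANTHROPIC_API_KEY is not set (needed for the Direct API)." >&2
[ -n "${AWS_REGION:-}" ] || echo "AWS_REGION is not set (needed for Bedrock)." >&2
```

The check only verifies that the variables are set; it does not validate the API key or the AWS credentials themselves.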

Installation

# Load locally during development
claude --plugin-dir ./sre-latency-monitor

# Or install from a marketplace
/plugin install sre-latency@your-marketplace

Statusline

The plugin includes a statusline script that displays real-time tool latency in the Claude Code status bar:

[D] Sonnet 4.5 | Ctx 45% | $0.15 | 5m30s | Bash:2m CLI:30s [2m30s tools] | TTFT 450ms

Provider indicators: [D] = Direct API, [BR] = Bedrock, [VX] = Vertex.

SLO Grading

Assigns letter grades (A-F) based on configurable SLO thresholds:

Grade	TTFT P50	TTFT P99	Throughput Mean	Error Rate
A	≤ 500ms	≤ 1500ms	≥ 80 t/s	0%
B	≤ 800ms	≤ 2500ms	≥ 60 t/s	≤ 1%
C	≤ 1200ms	≤ 4000ms	≥ 40 t/s	≤ 5%
D	≤ 2000ms	≤ 6000ms	≥ 20 t/s	≤ 10%
F	> 2000ms	> 6000ms	< 20 t/s	> 10%
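
The table above can be read as a lookup per metric. The sketch below grades each metric separately and reports the worst one as the overall grade. That "worst metric wins" rule and the `grade` function itself are assumptions for illustration; the README does not state how the plugin combines the four columns.

```shell
#!/usr/bin/env bash
# Hypothetical grading helper; the combination rule is an assumption.

# grade TTFT_P50_MS TTFT_P99_MS THROUGHPUT_TPS ERROR_RATE_PCT
grade() {
  awk -v p50="$1" -v p99="$2" -v tps="$3" -v err="$4" '
    # Return 1 (A) through 5 (F) for a metric where lower is better.
    function at_most(v, a, b, c, d) {
      if (v <= a) return 1
      if (v <= b) return 2
      if (v <= c) return 3
      if (v <= d) return 4
      return 5
    }
    # Return 1 (A) through 5 (F) for a metric where higher is better.
    function at_least(v, a, b, c, d) {
      if (v >= a) return 1
      if (v >= b) return 2
      if (v >= c) return 3
      if (v >= d) return 4
      return 5
    }
    function worst(x, y) { return x > y ? x : y }
    BEGIN {
      p50 += 0; p99 += 0; tps += 0; err += 0   # force numeric comparison
      g = at_most(p50, 500, 800, 1200, 2000)
      g = worst(g, at_most(p99, 1500, 2500, 4000, 6000))
      g = worst(g, at_least(tps, 80, 60, 40, 20))
      g = worst(g, at_most(err, 0, 1, 5, 10))
      print substr("ABCDF", g, 1)
    }'
}

grade 450 1400 85 0     # every metric meets the A thresholds
grade 450 5000 85 0     # TTFT P99 alone pulls the result down to D
```

Taking the worst metric is the conservative choice: a provider with excellent median latency but a poor tail still gets a low grade.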

Findings: Why Bedrock Feels Slow

The Root Cause: Per-Request Overhead

Bedrock adds measurable per-request overhead from AWS network routing and the Bedrock invocation layer. This overhead exists even with no Guardrails configured.

We measured this overhead across two independent test runs on Haiku 4.5 (the fastest model, where overhead is most visible):

Run	Direct API (avg)	Bedrock (avg)	Overhead
Guardrail benchmark (N=5)	581ms	1,135ms	+554ms (+95%)
Latency budget (N=5)	568ms	1,402ms	+834ms (+147%)

Both runs: Haiku 4.5, "count 1-20" prompt, max_tokens=128. Run-to-run variance is significant with N=5.

The measured overhead ranges from ~550–850ms per call. This variance is expected with small sample sizes and network conditions. The key finding is directional: Bedrock consistently adds hundreds of milliseconds of per-request overhead.
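
The Overhead column in the table above is plain arithmetic on the two averages: the absolute difference, and that difference as a percentage of the Direct API figure. A small sketch that reproduces it (the `overhead` function name is made up for this example):

```shell
#!/usr/bin/env bash
# Hypothetical helper that reproduces the Overhead column.

# overhead DIRECT_MS BEDROCK_MS : absolute and relative overhead of Bedrock.
overhead() {
  awk -v d="$1" -v b="$2" 'BEGIN {
    if (d <= 0) exit 1
    delta = b - d
    printf "%+dms (%+.0f%%)\n", delta, 100 * delta / d
  }'
}

overhead 581 1135   # Guardrail benchmark run: +554ms (+95%)
overhead 568 1402   # Latency budget run:      +834ms (+147%)
```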

Guardrail Impact

A Guardrail (content filter) adds another ~800ms on top of the base Bedrock overhead.
