From builder-ai
Use before merging any PR that adds an LLM API call. Every call must handle timeout, malformed output, low confidence, and refusal — with a defined, user-safe fallback for each. Blocks "add error handling later" completions.
Slash command: `/builder-ai:fallback-required`
```
LLM CALLS WITHOUT FALLBACKS ARE TICKING FAILURES.
```
Every model times out. Every model returns garbage sometimes.
"The model is reliable" is a claim about averages — users experience tails.
A defined, tested fallback path for each failure mode IS reliability.
Trigger on every PR that adds an LLM API call.
Every LLM call must handle all four:
| Failure Mode | What Happens | Required Response |
|---|---|---|
| Timeout / API error | Network down, provider outage, slow response | Retry with exponential backoff (max 3), then graceful degradation |
| Malformed output | Wrong format, truncated JSON, schema violation | Schema validation → fallback to rule-based default |
| Low confidence | Model expresses uncertainty, output score below threshold | Route to fallback model, simpler rule, or human review |
| Refusal | Model declines to answer, content filter triggered | Detect refusal pattern → user-friendly error, do not surface raw refusal |
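The table names schema validation and refusal-pattern detection without showing either. Below is a minimal stdlib sketch of a `parse_and_validate` helper that raises one exception per failure mode. It assumes the model is prompted to return JSON shaped like `{"answer": ..., "confidence": ...}`; the refusal phrases are hypothetical. Phrase matching is only a heuristic: some provider APIs also report refusals or content filtering in response metadata, and that signal is preferable where it exists. A schema library such as `jsonschema` or `pydantic` could replace the hand-rolled checks.

```python
import json
from dataclasses import dataclass


class OutputParseError(Exception):
    """Model output is not valid JSON or violates the expected schema."""


class RefusalError(Exception):
    """Model output looks like a refusal rather than an answer."""


# Hypothetical refusal phrases; tune them to your provider and prompts.
REFUSAL_MARKERS = ("i can't help", "i cannot help", "i'm unable to", "i am unable to")


@dataclass
class Parsed:
    answer: str
    confidence: float


def parse_and_validate(raw: str) -> Parsed:
    """Turn raw model text into a Parsed value, or raise RefusalError / OutputParseError."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Only non-JSON text is checked for refusals, so a valid answer that
        # happens to contain a refusal phrase is not misclassified.
        if any(marker in raw.lower() for marker in REFUSAL_MARKERS):
            raise RefusalError(raw[:200]) from exc
        raise OutputParseError(f"invalid JSON: {exc}") from exc
    # Hand-rolled schema check for {"answer": str, "confidence": number in [0, 1]}.
    if not isinstance(data, dict):
        raise OutputParseError("expected a JSON object")
    answer, confidence = data.get("answer"), data.get("confidence")
    if not isinstance(answer, str):
        raise OutputParseError("'answer' must be a string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise OutputParseError("'confidence' must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise OutputParseError("'confidence' must be between 0 and 1")
    return Parsed(answer=answer, confidence=float(confidence))
```

Each exception maps onto one row of the table, so the caller can route it to the matching fallback without inspecting the raw text again.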
Before writing the LLM call, answer: what does this feature return when the model fails?
The fallback must be defined, user-safe, and tested.
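The `call_llm` pattern in this section returns `fallback_result(...)`, and its tests check `result.is_fallback` and `result.reason`, but neither type is defined. One minimal sketch, assuming a frozen dataclass and a hypothetical table of user-facing messages keyed by failure reason:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical user-facing copy, decided at design time, one entry per failure reason.
# Users see these messages, never a stack trace or the model's raw refusal text.
GENERIC_MESSAGE = "Something went wrong. Please try again."
SAFE_MESSAGES = {
    "timeout": "This is taking longer than usual. Please try again in a moment.",
    "max_retries_exceeded": "This is taking longer than usual. Please try again in a moment.",
    "malformed_output": "We couldn't produce a reliable answer this time.",
    "low_confidence": "We're not confident enough to answer automatically; a person will review this.",
    "refused": "We can't help with that request here.",
}


@dataclass(frozen=True)
class Result:
    answer: str
    is_fallback: bool = False
    reason: Optional[str] = None  # set only on fallback results, for logging and tests


def fallback_result(reason: str) -> Result:
    """Build the user-safe result for a failure reason; unknown reasons get a generic message."""
    return Result(
        answer=SAFE_MESSAGES.get(reason, GENERIC_MESSAGE),
        is_fallback=True,
        reason=reason,
    )
```

Keeping `reason` machine-readable and separate from the user-facing text lets logs and alerts group fallbacks by cause without parsing messages.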
```python
async def call_llm(prompt: str) -> Result:
    # llm, parse_and_validate, backoff, log_fallback, fallback_result and the
    # constants are assumed to be defined elsewhere in your codebase.
    for attempt in range(MAX_RETRIES):  # MAX_RETRIES caps total attempts, e.g. 3
        try:
            response = await llm.complete(prompt, timeout=TIMEOUT_SECONDS)
            parsed = parse_and_validate(response)  # raises OutputParseError on bad schema
            if parsed.confidence < CONFIDENCE_THRESHOLD:  # default 0.7; use 0.85 for high-stakes domains
                log_fallback("low_confidence", attempt)
                return fallback_result(reason="low_confidence")
            return parsed
        # APIError is a placeholder for your SDK's transient error type; without it,
        # provider errors escape the handler instead of being retried.
        except (TimeoutError, APIError):
            if attempt == MAX_RETRIES - 1:
                log_fallback("timeout", attempt)
                return fallback_result(reason="timeout")
            await backoff(attempt)
        except OutputParseError:
            log_fallback("malformed_output", attempt)
            return fallback_result(reason="malformed_output")
        except RefusalError:
            log_fallback("refused", attempt)
            return fallback_result(reason="refused")
    # Defensive: only reachable if MAX_RETRIES <= 0.
    return fallback_result(reason="max_retries_exceeded")
```
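The retry loop awaits `backoff(attempt)` without defining it. A minimal sketch of capped exponential backoff with full jitter (a uniform random delay between zero and the exponential ceiling, which keeps many clients from retrying in lockstep during a provider outage). `BASE_DELAY_SECONDS` and `MAX_DELAY_SECONDS` are hypothetical values to tune:

```python
import asyncio
import random
from typing import Optional

BASE_DELAY_SECONDS = 0.5  # hypothetical: ceiling for the first retry
MAX_DELAY_SECONDS = 8.0   # hypothetical: no single wait exceeds this


def backoff_delay(attempt: int, base: float = BASE_DELAY_SECONDS,
                  cap: float = MAX_DELAY_SECONDS,
                  rng: Optional[random.Random] = None) -> float:
    """Full jitter: a uniform delay in [0, min(cap, base * 2**attempt)] seconds."""
    ceiling = min(cap, base * (2 ** attempt))
    source = rng if rng is not None else random  # the random module also exposes uniform()
    return source.uniform(0.0, ceiling)


async def backoff(attempt: int) -> None:
    """Sleep before the next attempt; matches the `await backoff(attempt)` call in call_llm."""
    await asyncio.sleep(backoff_delay(attempt))
```

With `MAX_RETRIES = 3`, only attempts 0 and 1 back off (the last failed attempt returns the fallback), so the added sleep is at most 1.5 s on top of the request timeouts.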
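```python
import asyncio

# mock_llm_timeout and mock_llm_response are test helpers assumed to patch the LLM client.


def test_returns_fallback_on_timeout():
    with mock_llm_timeout():
        result = asyncio.run(call_llm("..."))  # call_llm is async, so run it to completion
        assert result.is_fallback is True
        assert result.reason == "timeout"


def test_returns_fallback_on_malformed_output():
    with mock_llm_response("not valid json{{{"):
        result = asyncio.run(call_llm("..."))
        assert result.is_fallback is True
        assert result.reason == "malformed_output"
```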
A fallback without a test is a promise, not an implementation.
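The completion report below asks for four failure-mode tests, while only two are shown. Here is a self-contained sketch covering all four. To stay runnable it simplifies the pattern: a hypothetical injected `client` stands in for `llm.complete` plus `parse_and_validate` and returns an already-parsed `Result`, logging is omitted, and `asyncio.sleep(0)` replaces `backoff` so the tests run instantly. `fake_client` and `run` are illustrative helpers, not part of any library.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional, Tuple, Union

MAX_RETRIES = 3  # total attempts
CONFIDENCE_THRESHOLD = 0.7


class OutputParseError(Exception):
    pass


class RefusalError(Exception):
    pass


@dataclass
class Result:
    answer: str
    confidence: float = 1.0
    is_fallback: bool = False
    reason: Optional[str] = None


def fallback_result(reason: str) -> Result:
    return Result(answer="safe default", confidence=0.0, is_fallback=True, reason=reason)


# Hypothetical client: an async callable that returns a parsed Result or raises.
Client = Callable[[str], Awaitable[Result]]


async def call_llm(prompt: str, client: Client) -> Result:
    for attempt in range(MAX_RETRIES):
        try:
            parsed = await client(prompt)
            if parsed.confidence < CONFIDENCE_THRESHOLD:
                return fallback_result("low_confidence")
            return parsed
        except TimeoutError:
            if attempt == MAX_RETRIES - 1:
                return fallback_result("timeout")
            await asyncio.sleep(0)  # stand-in for backoff(attempt)
        except OutputParseError:
            return fallback_result("malformed_output")
        except RefusalError:
            return fallback_result("refused")
    return fallback_result("max_retries_exceeded")


def fake_client(outcome: Union[Result, Exception], calls: List[str]) -> Client:
    """Records each prompt in `calls`, then raises `outcome` or returns it."""
    async def client(prompt: str) -> Result:
        calls.append(prompt)
        if isinstance(outcome, Exception):
            raise outcome
        return outcome
    return client


def run(outcome: Union[Result, Exception]) -> Tuple[Result, List[str]]:
    calls: List[str] = []
    return asyncio.run(call_llm("...", fake_client(outcome, calls))), calls


def test_timeout_retries_then_falls_back():
    result, calls = run(TimeoutError())
    assert result.is_fallback and result.reason == "timeout"
    assert len(calls) == MAX_RETRIES  # every attempt was made before giving up


def test_malformed_output():
    result, _ = run(OutputParseError("not valid json{{{"))
    assert result.is_fallback and result.reason == "malformed_output"


def test_low_confidence():
    result, _ = run(Result(answer="maybe", confidence=0.4))
    assert result.is_fallback and result.reason == "low_confidence"


def test_refusal():
    result, _ = run(RefusalError("I can't help with that"))
    assert result.is_fallback and result.reason == "refused"
```

Injecting the client, rather than patching a global, lets each test force exactly one failure mode and count how many attempts were made.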
Set an alert for when the fallback rate exceeds a threshold (e.g., more than 5% of calls in a 5-minute window). High fallback rates signal prompt regressions, provider incidents, or input distribution shifts; none of these should go unnoticed.
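A minimal in-process sketch of that sliding-window check, assuming a single process sees every call; in production this usually lives in your metrics system as a ratio of two counters. The defaults mirror the 5%-in-5-minutes example, and the `min_calls` guard is an added assumption so that one fallback out of two calls does not page anyone.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class FallbackRateMonitor:
    """Tracks the share of LLM calls that ended in a fallback over a sliding time window."""

    def __init__(self, threshold: float = 0.05, window_seconds: float = 300.0,
                 min_calls: int = 20, clock: Callable[[], float] = time.monotonic):
        self.threshold = threshold            # 0.05 = alert above 5% of calls
        self.window_seconds = window_seconds  # 300 s = 5 minutes
        self.min_calls = min_calls            # hypothetical guard against tiny samples
        self.clock = clock                    # injectable for tests
        self.events: Deque[Tuple[float, bool]] = deque()

    def record(self, is_fallback: bool) -> None:
        self.events.append((self.clock(), is_fallback))
        self._evict()

    def _evict(self) -> None:
        # Drop events older than the window; timestamps are appended in order.
        cutoff = self.clock() - self.window_seconds
        while self.events and self.events[0][0] < cutoff:
            self.events.popleft()

    def rate(self) -> float:
        self._evict()
        if not self.events:
            return 0.0
        return sum(1 for _, fallback in self.events if fallback) / len(self.events)

    def should_alert(self) -> bool:
        self._evict()
        return len(self.events) >= self.min_calls and self.rate() > self.threshold
```

After each call, `monitor.record(result.is_fallback)`, then notify your alert channel when `should_alert()` returns True.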
Stop if you catch yourself thinking "I'll add error handling later" or "the model is reliable": these thoughts mean fallback handling is incomplete.
When fallback-required is satisfied, state it like this:
```
Fallbacks implemented.
Timeout/API error: retry (max N, backoff Xs–Ys), then fallback_result("timeout") ✓
Malformed output: schema validation → fallback_result("malformed_output") ✓
Low confidence: threshold = X (default 0.7; 0.85 for medical/legal/financial) → fallback_result("low_confidence") ✓
Refusal: refusal pattern detection → fallback_result("refused") ✓
Tests: 4 failure-mode tests passing ✓
Fallback logging: reason field → <log destination> ✓
Alert: fallback rate > N% triggers <alert channel> ✓
```
All four modes required. A partially-handled call is an unhandled call.
LLM products fail differently than deterministic software. Timeouts spike under load. Output schemas break when models update. Confidence degrades on edge-case inputs. The fallback IS the product's reliability — the model is just the happy path.
Install the plugin with `npx claudepluginhub rbraga01/a-team --plugin builder-ai`.