Use when reviewing agent behavior patterns, improving CLAUDE.md based on past failures, or checking ReasoningBank health. REQUIRES contextd MCP server - this skill is inoperable without it.
Slash command: /contextd:self-reflection
Mine memories and remediations for behavior patterns, surface findings to user, remediate docs with pressure-tested improvements.
Core loop: Search -> Report -> User prioritizes -> Brainstorm -> Pressure test -> Apply
See also: the contextd-workflow skill and /remember.
Focus on agent behaviors, not technical failures:
| Behavior Type | Description | Examples |
|---|---|---|
| rationalized-skip | Justified skipping a required step | "too simple to test", "user implied consent" |
| overclaimed | Used absolute language inappropriately | "ensures", "guarantees", "production ready" |
| ignored-instruction | Didn't follow CLAUDE.md/skill | Skipped contextd search, ignored TDD |
| assumed-context | Assumed without verification | Assumed permission, requirements, state |
| undocumented-decision | Significant choice without rationale | Changed architecture without comparison |
| Severity | Combination |
|---|---|
| CRITICAL | rationalized-skip + destructive/security operation |
| HIGH | rationalized-skip + validation skip, ignored-instruction |
| MEDIUM | overclaimed, assumed-context |
| LOW | undocumented-decision, style issues |
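The two tables above can be read as a small lookup. The following is a minimal Python sketch of that reading, not the skill's actual implementation. The `context` argument and its values (`"destructive"`, `"security"`, `"validation"`) are hypothetical names introduced here, and the fallback for a rationalized skip with no listed context is an assumption, since the table does not cover that case.

```python
from typing import Optional

# Behaviors whose severity does not depend on context, per the severity table.
BASE_SEVERITY = {
    "ignored-instruction": "HIGH",
    "overclaimed": "MEDIUM",
    "assumed-context": "MEDIUM",
    "undocumented-decision": "LOW",
}


def classify_severity(behavior: str, context: Optional[str] = None) -> str:
    """Map a behavior type (plus optional operation context) to a severity.

    `context` is a hypothetical field describing the affected operation.
    """
    if behavior == "rationalized-skip":
        if context in ("destructive", "security"):
            return "CRITICAL"
        # "validation" is HIGH per the table; treating every other
        # rationalized skip as HIGH too is an assumption of this sketch.
        return "HIGH"
    # Anything not listed (e.g. style issues) falls back to LOW.
    return BASE_SEVERITY.get(behavior, "LOW")
```

For example, `classify_severity("rationalized-skip", "security")` yields `"CRITICAL"`, while `classify_severity("overclaimed")` yields `"MEDIUM"`.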
For each finding, surface:
Present findings
|
User selects findings to remediate
|
Generate doc improvements
|
Generate pressure scenarios (from real failures)
|
Run batch tests via subagents
|
Pass? --No--> Iterate
| Yes
Create Issue/PR
|
Apply changes
|
Close feedback loop:
memory_feedback(memory_id, helpful=true)
Tag original memories as remediated
# Rationalized skips
memory_search("skip OR skipped OR bypass OR ignored")
memory_search("too simple OR trivial OR obvious")
# User feedback indicating ignored instructions
memory_search("why did you OR should have OR forgot to")
# Assumptions without verification
memory_search("assumed OR without checking")
# Overclaiming
memory_search("ensures OR guarantees OR production ready")
Filter out technical bugs: Exclude memories with error:* tags or stack traces.
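The filtering rule could look roughly like the sketch below. It assumes each memory is a dict with `tags` and `content` keys, which is a hypothetical shape and not the documented contextd record format. The stack-trace pattern is a rough heuristic, not an exhaustive detector.

```python
import re

# Heuristic: Python traceback headers or JavaScript/Java-style "at f (file:line" frames.
STACK_TRACE_RE = re.compile(
    r"Traceback \(most recent call last\)|^\s+at \S+ \(.*:\d+",
    re.MULTILINE,
)


def is_technical_bug(memory: dict) -> bool:
    """Return True when a memory looks like a technical failure record."""
    tags = memory.get("tags", [])
    if any(tag.startswith("error:") for tag in tags):
        return True
    return bool(STACK_TRACE_RE.search(memory.get("content", "")))


def behavioral_only(memories: list) -> list:
    """Keep only memories that describe agent behavior, not technical bugs."""
    return [m for m in memories if not is_technical_bug(m)]
```

Applying `behavioral_only` to the combined search results leaves only the memories worth classifying against the behavior taxonomy.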
--health flag analyzes:
| Action | Command |
|---|---|
| Full report | /reflect |
| Health only | /reflect --health |
| Apply fixes | /reflect --apply |
| Recent only | /reflect --since=7d |
| Filter by behavior | /reflect --behavior=rationalized-skip |
| Filter by severity | /reflect --severity=HIGH |
| Mistake | Why It Fails |
|---|---|
| Skipping pressure tests | "Fixed" docs don't actually prevent behavior |
| Modifying plugin source | Breaks on update; use includes |
| Auto-applying security fixes | High-stakes changes need review |
| Ignoring frequency | 10 TDD skips is systemic, not minor |
| Absolute claims in fixes | "This prevents X" -> "This helps reduce X" |
Go beyond symptoms to find root causes:
{
"finding_id": "ref_001",
"behavior": "rationalized-skip",
"symptom": "Skipped tests before claiming fix complete",
"causal_chain": [
{
"level": 1,
"cause": "Agent claimed fix without running tests",
"evidence": ["mem_123", "mem_124"]
},
{
"level": 2,
"cause": "CLAUDE.md test instruction buried in long section",
"evidence": ["claude_md_line_245"]
},
{
"level": 3,
"cause": "No PreToolUse hook enforcing test requirement",
"evidence": ["hooks.json missing enforcement"]
}
],
"root_cause": "Missing automated enforcement of test-before-fix policy",
"fix_target": "hooks.json + CLAUDE.md restructure"
}
| Level | Description | Fix Location |
|---|---|---|
| 1 | Immediate behavior | Agent prompt/skill |
| 2 | Missing guidance | CLAUDE.md/documentation |
| 3 | Missing enforcement | Hooks/automation |
| 4 | Systemic gap | Plugin/skill redesign |
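One way to use the level table is to walk a finding's causal chain to its deepest level and look up where the fix belongs. The sketch below assumes the finding has the JSON shape shown earlier. Treating the deepest level as the primary fix location is an assumption of this sketch; the example finding also lists a level 2 target alongside it.

```python
# Fix locations per causal level, copied from the table above.
FIX_LOCATION = {
    1: "Agent prompt/skill",
    2: "CLAUDE.md/documentation",
    3: "Hooks/automation",
    4: "Plugin/skill redesign",
}


def deepest_cause(finding: dict) -> dict:
    """Return the causal-chain link with the highest level."""
    return max(finding["causal_chain"], key=lambda link: link["level"])


def primary_fix_location(finding: dict) -> str:
    """Look up the fix location for the deepest cause in the chain."""
    return FIX_LOCATION[deepest_cause(finding)["level"]]
```

For the `ref_001` finding above, whose chain ends at level 3, this returns `"Hooks/automation"`.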
Find patterns across incidents:
causal_correlate(findings: [ref_001, ref_002, ref_003])
Returns:
shared_root_causes: [
{ cause: "Missing hook enforcement", incidents: [ref_001, ref_002] },
{ cause: "Ambiguous CLAUDE.md section", incidents: [ref_002, ref_003] }
]
recommended_fixes: [
{ target: "hooks.json", impact: "high", fixes_incidents: 2 }
]
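The grouping behind `shared_root_causes` can be approximated with a plain inversion from causes to findings. This is a sketch under assumptions: each finding is taken to be a dict with a hypothetical `causes` list, and the function name `correlate` is illustrative and not the `causal_correlate` tool itself.

```python
from collections import defaultdict


def correlate(findings: list) -> list:
    """Group finding IDs by cause and keep causes shared by 2+ findings."""
    incidents_by_cause = defaultdict(list)
    for finding in findings:
        # dict.fromkeys drops duplicate causes within one finding, keeping order.
        for cause in dict.fromkeys(finding["causes"]):
            incidents_by_cause[cause].append(finding["finding_id"])

    shared = [
        {"cause": cause, "incidents": ids}
        for cause, ids in incidents_by_cause.items()
        if len(ids) > 1
    ]
    # Most widely shared causes first; ties keep first-seen order.
    shared.sort(key=lambda entry: len(entry["incidents"]), reverse=True)
    return shared
```

Sorting by incident count puts the cause whose fix would touch the most incidents at the top, which is the same ordering idea as the `fixes_incidents` field.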
Track improvement (or regression):
{
"benchmark_period": "2026-01-01 to 2026-01-28",
"metrics": {
"rationalized_skip": {
"count": 5,
"previous_period": 12,
"trend": "improving",
"change_pct": -58
},
"ignored_instruction": {
"count": 8,
"previous_period": 6,
"trend": "regressing",
"change_pct": 33
},
"assumed_context": {
"count": 3,
"previous_period": 3,
"trend": "stable",
"change_pct": 0
}
}
}
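The `trend` and `change_pct` fields follow directly from the two counts. A minimal sketch of the arithmetic, assuming percentages are rounded to the nearest integer and that a previous count of zero leaves the percentage undefined (both are assumptions; the source does not state the rounding rule):

```python
from typing import Optional


def compare_periods(count: int, previous: int) -> dict:
    """Derive trend label and percentage change from two period counts."""
    change_pct: Optional[int]
    if previous == 0:
        # Growth from zero has no meaningful percentage.
        change_pct = 0 if count == 0 else None
    else:
        change_pct = round((count - previous) / previous * 100)

    if count < previous:
        trend = "improving"  # fewer incidents than before
    elif count > previous:
        trend = "regressing"
    else:
        trend = "stable"

    return {
        "count": count,
        "previous_period": previous,
        "trend": trend,
        "change_pct": change_pct,
    }
```

With the counts above, 5 against 12 gives -58 and 8 against 6 gives 33, matching the example record.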
| Metric | Target | Good | Warning | Critical |
|---|---|---|---|---|
| rationalized_skip/week | 0 | < 2 | 2-5 | > 5 |
| ignored_instruction/week | 0 | < 3 | 3-7 | > 7 |
| overclaimed/week | 0 | < 5 | 5-10 | > 10 |
| test_coverage_skip | 0% | < 5% | 5-15% | > 15% |
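The threshold table maps onto a two-boundary check per metric. The sketch below encodes the Good, Warning, and Critical bands as written; the function and constant names are illustrative only.

```python
# metric -> (warning starts at, critical above), taken from the table above.
# test_coverage_skip is expressed in percent; the others are weekly counts.
THRESHOLDS = {
    "rationalized_skip": (2, 5),
    "ignored_instruction": (3, 7),
    "overclaimed": (5, 10),
    "test_coverage_skip": (5, 15),
}


def health_status(metric: str, value: float) -> str:
    """Classify a weekly metric value as good, warning, or critical."""
    warning_from, critical_above = THRESHOLDS[metric]
    if value > critical_above:
        return "critical"
    if value >= warning_from:
        return "warning"
    return "good"
```

Note that the inputs are weekly values, so a multi-week benchmark count needs to be divided by the number of weeks before it is classified.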
/reflect --benchmark --compare-periods "2026-01" "2025-12"
Output:
| Behavior | Dec 2025 | Jan 2026 | Change |
|----------|----------|----------|--------|
| rationalized-skip | 12 | 5 | -58% |
| ignored-instruction | 6 | 8 | +33% |
Top Improvement: Hook enforcement reduced skips
Top Regression: New skills lack CLAUDE.md entries
Predict likely future failures based on patterns:
{
"prediction": {
"behavior": "rationalized-skip",
"likelihood": 0.75,
"conditions": [
"Complex task with > 5 sub-steps",
"Time pressure mentioned in prompt",
"No explicit test requirement in task"
],
"historical_basis": ["mem_101", "mem_102", "mem_103"],
"prevention": "Add explicit test checkpoint to complex task prompts"
}
}
| Factor | Risk Increase | Mitigation |
|---|---|---|
| Task complexity > 5 steps | +40% skip risk | Explicit checkpoints |
| "Quick fix" language | +60% skip risk | Reject quick-fix framing |
| No acceptance criteria | +50% assumption risk | Require criteria |
| Security-adjacent code | +30% overclaim risk | Require review |
{
"alert": "high_risk_task_detected",
"task_description": "Quick fix for authentication bug",
"risk_factors": ["quick_fix_language", "security_adjacent"],
"predicted_behaviors": ["rationalized-skip", "assumed-context"],
"recommended_guardrails": [
"Require explicit test plan before starting",
"Trigger consensus-review before merge"
]
}
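Risk-factor detection of the kind that produces this alert could be as simple as keyword matching on the task description. The keyword lists below are hypothetical placeholders chosen for illustration; the source does not say how the factors are actually detected.

```python
# Hypothetical keyword lists; a real detector would likely be richer.
RISK_KEYWORDS = {
    "quick_fix_language": ("quick fix", "quickly", "just a small"),
    "security_adjacent": ("auth", "password", "token", "permission"),
}


def detect_risk_factors(task_description: str) -> list:
    """Return the names of risk factors whose keywords appear in the task."""
    text = task_description.lower()
    return [
        factor
        for factor, keywords in RISK_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    ]


def build_alert(task_description: str):
    """Build an alert record, or return None when no risk factor matches."""
    factors = detect_risk_factors(task_description)
    if not factors:
        return None
    return {
        "alert": "high_risk_task_detected",
        "task_description": task_description,
        "risk_factors": factors,
    }
```

For the task description in the alert above, this sketch flags both `quick_fix_language` and `security_adjacent`.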
Auto-intervene when risk detected:
{
"hook_type": "PreToolUse",
"tool_name": "Edit",
"condition": "file_path.contains('auth') AND prediction.skip_risk > 0.5",
"prompt": "High skip risk detected for security code. Before editing, confirm: 1) Tests exist 2) Review planned 3) No assumptions about user state"
}
Tag reflection findings with standard types:
| Finding Type | Tag | Purpose |
|---|---|---|
| Behavior pattern | type:pattern, category:behavior | Track patterns |
| Root cause | type:decision, category:analysis | Document cause |
| Fix proposal | type:learning, category:improvement | Capture fix |
| Regression | type:failure, category:regression | Track setbacks |
| Policy update | type:policy, category:enforcement | New rules |
<org>/<project>/reflections/<reflection_id>
Examples:
fyrsmithlabs/contextd/reflections/2026-01-weekly
fyrsmithlabs/marketplace/reflections/v1.6-pre-release
<reflection_namespace>/findings/<finding_id>
Example:
fyrsmithlabs/contextd/reflections/2026-01-weekly/findings/ref_001
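The two path patterns compose, so they can be built with a pair of small helpers. This is an illustrative sketch; the function names are not part of contextd, and rejecting empty segments or segments containing `/` is an assumption made to keep the paths unambiguous.

```python
def _segment(value: str) -> str:
    """Validate a single path segment (assumed rule: non-empty, no slash)."""
    if not value or "/" in value:
        raise ValueError(f"invalid path segment: {value!r}")
    return value


def reflection_namespace(org: str, project: str, reflection_id: str) -> str:
    """Build <org>/<project>/reflections/<reflection_id>."""
    return f"{_segment(org)}/{_segment(project)}/reflections/{_segment(reflection_id)}"


def finding_path(namespace: str, finding_id: str) -> str:
    """Build <reflection_namespace>/findings/<finding_id>."""
    return f"{namespace}/findings/{_segment(finding_id)}"
```

Calling `finding_path(reflection_namespace("fyrsmithlabs", "contextd", "2026-01-weekly"), "ref_001")` reproduces the example path above.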
All reflection records include:
| Field | Description | Auto-set |
|---|---|---|
| created_by | Reflection agent/session | Yes |
| created_at | Analysis timestamp | Yes |
| period_start | Analysis period start | Yes |
| period_end | Analysis period end | Yes |
| memory_count | Memories analyzed | Yes |
| finding_count | Findings generated | Yes |
| remediation_count | Fixes applied | Yes |
Run reflection analysis without blocking:
Task(
subagent_type: "general-purpose",
prompt: "Analyze memories for behavior patterns over past 7 days",
run_in_background: true,
description: "Background reflection analysis"
)
// Continue other work...
// Collect results later:
TaskOutput(task_id, block: true)
Chain reflection phases:
search_task = Task(prompt: "Search memories for behavior patterns")
analyze_task = Task(prompt: "Analyze patterns, build causal chains", addBlockedBy: [search_task.id])
benchmark_task = Task(prompt: "Compare to previous period", addBlockedBy: [analyze_task.id])
predict_task = Task(prompt: "Generate predictions", addBlockedBy: [analyze_task.id])
report_task = Task(prompt: "Synthesize report", addBlockedBy: [benchmark_task.id, predict_task.id])
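The `addBlockedBy` links above form a dependency graph in which the benchmark and prediction phases can run side by side. The sketch below models only that ordering with Python's standard `graphlib` module (Python 3.9+); it does not call the `Task` tool, and the short phase names are stand-ins for the task IDs.

```python
from graphlib import TopologicalSorter

# phase -> phases it is blocked by, mirroring the addBlockedBy links above.
BLOCKED_BY = {
    "search": [],
    "analyze": ["search"],
    "benchmark": ["analyze"],
    "predict": ["analyze"],
    "report": ["benchmark", "predict"],
}


def execution_waves(blocked_by: dict) -> list:
    """Return phases grouped into waves; phases in one wave can run in parallel."""
    sorter = TopologicalSorter(blocked_by)
    sorter.prepare()  # raises graphlib.CycleError if dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

For this graph the result is four waves, with `benchmark` and `predict` sharing the third one, so the chain takes four sequential steps and not five.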
Auto-alert on predicted risky operations:
{
"hook_type": "PreToolUse",
"tool_name": "Edit|Bash",
"condition": "prediction_model.risk_score > 0.7",
"prompt": "High-risk operation predicted. Review risk factors and confirm guardrails are in place before proceeding."
}
Auto-record behavior patterns:
{
"hook_type": "PostToolUse",
"tool_name": "Task",
"condition": "task_description.contains('reflection')",
"prompt": "Reflection complete. Record findings to memory with type:pattern tags. Update benchmarks."
}
Self-reflection emits events for other skills:
{
"event": "reflection_complete",
"payload": {
"reflection_id": "2026-01-weekly",
"findings_count": 12,
"critical_count": 1,
"high_count": 3,
"trend": "improving",
"top_behavior": "rationalized-skip"
},
"notify": ["setup", "workflow", "orchestration"]
}
Subscribe to reflection events:
- reflection_started - Analysis began
- reflection_complete - Analysis finished
- critical_finding - CRITICAL behavior detected
- regression_detected - Metrics worsening
- benchmark_updated - New baseline recorded
- prediction_generated - Risk prediction available
- intervention_triggered - Auto-guardrail activated
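How another skill might consume these events can be illustrated with a minimal in-process publish/subscribe registry. This is a sketch only: the `ReflectionEvents` class is hypothetical, and the source does not describe the actual delivery mechanism behind the `notify` field.

```python
from collections import defaultdict
from typing import Callable

# Event names listed in this section.
EVENT_NAMES = {
    "reflection_started",
    "reflection_complete",
    "critical_finding",
    "regression_detected",
    "benchmark_updated",
    "prediction_generated",
    "intervention_triggered",
}


class ReflectionEvents:
    """Tiny in-memory event registry (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._handlers = defaultdict(list)

    def subscribe(self, event: str, handler: Callable[[dict], None]) -> None:
        if event not in EVENT_NAMES:
            raise ValueError(f"unknown reflection event: {event}")
        self._handlers[event].append(handler)

    def emit(self, event: str, payload: dict) -> int:
        """Call every handler for `event`; return how many were notified."""
        handlers = self._handlers.get(event, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)
```

A subscriber would register a handler with `subscribe("critical_finding", handler)` and receive the event payload as a plain dict when that event is emitted.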