Create Architecture Decision Records (ADRs) and Runbooks for operational documentation.
name: documentation
description: "Create Architecture Decision Records (ADRs) and Runbooks for operational documentation."
autoInvoke: false
priority: medium
triggers:
**Category:** Documentation | **Version:** 1.0.0 | **Used By:** All agents, Phase 8
### ADR Template
# ADR-[NUMBER]: [TITLE]
**Status:** [Proposed | Accepted | Deprecated | Superseded by ADR-XXX]
**Date:** YYYY-MM-DD
**Deciders:** [Names/Teams]
## Context
[What is the issue? Why do we need to make a decision?]
## Decision
[What is the change being proposed/decided?]
## Options Considered
### Option 1: [Name]
- **Pros:** [Benefits]
- **Cons:** [Drawbacks]
### Option 2: [Name]
- **Pros:** [Benefits]
- **Cons:** [Drawbacks]
### Option 3: [Name]
- **Pros:** [Benefits]
- **Cons:** [Drawbacks]
## Consequences
### Positive
- [Benefit 1]
- [Benefit 2]
### Negative
- [Tradeoff 1]
- [Tradeoff 2]
### Risks
- [Risk 1] - Mitigation: [How to handle]
## References
- [Link to relevant docs/discussions]
### ADR Example
# ADR-001: Use PostgreSQL for Primary Database
**Status:** Accepted
**Date:** 2025-01-15
**Deciders:** Backend Team, DevOps
## Context
We need a relational database for our new application. The application requires ACID compliance, complex queries, and JSON support.
## Decision
Use PostgreSQL 16 as the primary database.
## Options Considered
### Option 1: PostgreSQL
- **Pros:** ACID, JSON support, excellent performance, open source
- **Cons:** Requires more ops expertise than managed solutions
### Option 2: MySQL
- **Pros:** Familiar, widely supported
- **Cons:** Weaker JSON support, licensing concerns
### Option 3: MongoDB
- **Pros:** Flexible schema, easy scaling
- **Cons:** Not ideal for relational data; reads from secondaries are eventually consistent
## Consequences
### Positive
- Full ACID compliance
- Native JSON/JSONB support
- Strong ecosystem and tooling
### Negative
- Team needs PostgreSQL training
- More complex backup strategy
### ADR Naming Convention
docs/adr/
├── ADR-001-database-selection.md
├── ADR-002-authentication-strategy.md
├── ADR-003-api-versioning.md
└── README.md (index)
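
The naming convention above can be applied mechanically. The following is a minimal sketch, assuming ADR files follow the `ADR-NNN-short-title.md` pattern shown in the tree; the function name `next_adr_filename` and the slug rules (lowercase, hyphen-separated) are illustrative assumptions rather than part of this skill.

```bash
# Sketch: print the filename for the next ADR in a directory.
# next_adr_filename and the slug rules are hypothetical helpers.
next_adr_filename() (
  dir="$1"; title="$2"; max=0
  for file in "$dir"/ADR-[0-9][0-9][0-9]-*.md; do
    [ -e "$file" ] || continue        # glob matched nothing
    num="${file##*/ADR-}"             # drop the path and the "ADR-" prefix
    num="${num%%-*}"                  # keep only the three digits
    num="${num#0}"; num="${num#0}"    # strip up to two leading zeros (avoids octal parsing)
    if [ "$num" -gt "$max" ]; then max="$num"; fi
  done
  # Lowercase the title and collapse runs of non-alphanumerics into "-"
  slug="$(printf '%s' "$title" | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-//; s/-$//')"
  printf 'ADR-%03d-%s.md\n' "$((max + 1))" "$slug"
)
```

For example, `next_adr_filename docs/adr "Caching Layer"` would print a name that continues the existing numbering, so numbers are never reused even when older ADRs are superseded.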
### Runbook Template
# Runbook: [Service/Task Name]
**Service:** [Service name]
**Owner:** [Team/Person]
**Last Updated:** YYYY-MM-DD
**On-Call:** [Rotation/Contact]
## Overview
[Brief description of what this runbook covers]
## Prerequisites
- [ ] Access to [system/tool]
- [ ] Credentials for [service]
- [ ] VPN connected (if applicable)
## Common Operations
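### Start Service
```bash
# Start the service
systemctl start service-name
# Verify it is running
systemctl status service-name
```

### Stop Service
```bash
# Graceful shutdown
systemctl stop service-name
# Force stop (if graceful shutdown fails); SIGKILL cannot be caught or ignored
systemctl kill --signal=SIGKILL service-name
```

### Check Logs
```bash
# Recent logs
journalctl -u service-name -n 100
# Follow logs
journalctl -u service-name -f
# Search for errors
journalctl -u service-name | grep -i error
```

### Health Check
```bash
# Endpoint check
curl -s http://localhost:8080/health | jq .
# Expected response
# { "status": "healthy", "version": "1.0.0" }
```

## Troubleshooting

### Issue: Service Won't Start

**Symptoms:** Service fails to start or exits immediately.

**Diagnosis:**
```bash
journalctl -u service-name -n 50
```

**Common Causes:**
- Environment variables: check the `.env` file
- Port conflict: check with `lsof -i :8080`

**Resolution:**
```bash
# Fix env vars (sourcing only affects the current shell; correct the file the service reads)
source /etc/service-name/env
# Restart
systemctl restart service-name
```

### Issue: High Memory Usage

**Symptoms:** Memory usage above the 80% threshold.

**Diagnosis:**
```bash
# Check overall memory
free -h
# Top 10 processes by memory (the first output line is the header)
ps aux --sort=-%mem | head -n 11
```

**Resolution:**
```bash
# Restart service (temporary relief)
systemctl restart service-name
# Scale out if needed (applies to Kubernetes deployments)
kubectl scale deployment service-name --replicas=3
```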
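
After a restart, a runbook step often needs to confirm that the service actually recovered. The following is a minimal sketch of a retry loop around any health probe; the function name `wait_healthy` and its argument order (attempts, delay in seconds, then the probe command) are assumptions made for illustration.

```bash
# Sketch: retry a health probe until it succeeds or attempts run out.
# wait_healthy is a hypothetical helper; the probe is any command
# whose exit status signals health (0 = healthy).
wait_healthy() (
  attempts="$1"; delay="$2"; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      echo "healthy after $i attempt(s)"
      exit 0
    fi
    i=$((i + 1))
    # Sleep only when another attempt will follow
    if [ "$i" -le "$attempts" ]; then sleep "$delay"; fi
  done
  echo "still unhealthy after $attempts attempt(s)" >&2
  exit 1
)
```

A possible use with the health endpoint from this runbook is `wait_healthy 10 3 curl -fsS http://localhost:8080/health`, where `-f` makes `curl` exit non-zero on HTTP error responses. A non-zero result from `wait_healthy` is a natural point to escalate.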
## Alerts

| Alert | Severity | Action | Escalate After |
|---|---|---|---|
| Service Down | Critical | Restart, check logs | 5 min |
| High CPU | Warning | Monitor, scale if needed | 15 min |
| High Memory | Warning | Restart if > 90% | 10 min |
| Error Rate > 5% | Critical | Check logs, rollback | 5 min |

## Contacts

| Role | Name | Contact |
|---|---|---|
| Primary On-Call | [Name] | [Slack/Phone] |
| Secondary | [Name] | [Slack/Phone] |
| Team Lead | [Name] | [Slack/Phone] |
### Runbook Naming Convention
docs/runbooks/
├── api-service.md
├── database-maintenance.md
├── deployment-rollback.md
├── incident-response.md
└── README.md (index)
---
## Documentation Checklist
### ADR Checklist
- [ ] Clear problem statement
- [ ] Options evaluated objectively
- [ ] Decision clearly stated
- [ ] Consequences documented
- [ ] Numbered and indexed
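
The "numbered and indexed" item can be partly automated. Below is a minimal sketch that builds an index table from the ADR files themselves, assuming each file starts with a `# ADR-NNN: Title` heading and carries a `**Status:**` line as in the template; the function name `adr_index` and the two-column table layout are illustrative assumptions.

```bash
# Sketch: print a Markdown index table for all ADRs in a directory.
# adr_index is a hypothetical helper; it reads the first "# " heading
# and the first "**Status:**" line of each ADR file.
adr_index() (
  dir="$1"
  printf '| ADR | Status |\n|---|---|\n'
  for file in "$dir"/ADR-[0-9][0-9][0-9]-*.md; do
    [ -e "$file" ] || continue        # glob matched nothing
    title="$(sed -n 's/^# //p' "$file" | head -n 1)"
    status="$(sed -n 's/^\*\*Status:\*\* *//p' "$file" | head -n 1)"
    printf '| [%s](%s) | %s |\n' "$title" "${file##*/}" "${status:-unknown}"
  done
)
```

Running `adr_index docs/adr` and pasting the output into `docs/adr/README.md` keeps the index consistent with the files, since the titles and statuses come from a single source.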
### Runbook Checklist
- [ ] Prerequisites listed
- [ ] Commands are copy-paste ready
- [ ] Common issues documented
- [ ] Escalation path defined
- [ ] Contacts current
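
Parts of the runbook checklist can be verified with a small lint step. This is a minimal sketch, assuming runbooks use the section headings and bold metadata fields from the runbook template in this document; the function name `check_runbook` and the exact set of required items are assumptions and would need adjusting to a team's own template.

```bash
# Sketch: report template sections and metadata fields missing from a runbook.
# check_runbook is a hypothetical helper; exit status 0 means nothing is missing.
check_runbook() (
  file="$1"; missing=0
  for section in "## Overview" "## Prerequisites" "## Common Operations"; do
    # -x: match the whole line, -F: treat the heading as a fixed string
    if ! grep -qxF "$section" "$file"; then
      echo "missing section: $section"
      missing=1
    fi
  done
  for field in "Service" "Owner" "Last Updated" "On-Call"; do
    if ! grep -q "^\*\*${field}:\*\*" "$file"; then
      echo "missing field: $field"
      missing=1
    fi
  done
  exit "$missing"
)
```

A loop such as `for f in docs/runbooks/*.md; do check_runbook "$f"; done` could run in CI. It only checks structure, so whether the commands work and the contacts are current still needs human review.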
---
## Best Practices
### Do's
- Keep ADRs immutable (supersede, don't edit)
- Test runbook commands before documenting
- Include "why" not just "what"
- Review runbooks after incidents
- Index all documentation
### Don'ts
- Delete old ADRs (mark them superseded instead)
- Write runbooks without testing
- Assume reader knows context
- Let docs become stale
- Skip the troubleshooting section
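
The "supersede, don't edit" practice can be sketched as a status-only update: the old ADR keeps its context and decision text, and only its `**Status:**` line changes to point at the replacement. The function name `supersede_adr` is a hypothetical helper, and the sketch assumes the status line format from the ADR template.

```bash
# Sketch: mark an ADR as superseded without touching any other line.
# supersede_adr is a hypothetical helper. Usage: supersede_adr FILE NEW_ADR_ID
supersede_adr() (
  file="$1"; new_id="$2"
  tmp="$(mktemp)" || exit 1
  # Rewrite only the line that starts with "**Status:**"
  sed "s/^\*\*Status:\*\* .*/**Status:** Superseded by ${new_id}/" "$file" > "$tmp" || exit 1
  # Copy the content back so the original file keeps its permissions
  cat "$tmp" > "$file"
  rm -f "$tmp"
)
```

For example, `supersede_adr docs/adr/ADR-001-database-selection.md ADR-007` would leave the original decision readable while making it clear where the current one lives.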
---
**Version:** 1.0.0 | **Last Updated:** 2025-11-28