Use when designing system architecture, making high-level technical decisions, or planning major system changes. Focuses on structure, patterns, and long-term strategy.
name: architecture-design description: Use when designing system architecture, making high-level technical decisions, or planning major system changes. Focuses on structure, patterns, and long-term strategy. allowed-tools:
Design system architecture and make strategic technical decisions.
Good architecture enables change while maintaining simplicity.
Architecture Design (this skill):
Technical Planning (technical-planning skill):
Use architecture when:
Use planning when:
Business context:
Technical context:
Team context:
Functional requirements:
Non-functional requirements:
Example:
```markdown
## Requirements
### Functional
- Users can search products by name/category
- Users can add items to cart
- Users can checkout and pay
### Non-Functional
- Search response time < 200ms (p95)
- Support 10,000 concurrent users
- 99.9% uptime
- PCI DSS compliant for payments
- Maintainable by a team of 5 developers
```
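Non-functional targets are easiest to verify once they are written as numbers. The sketch below assumes a nearest-rank definition of p95 (monitoring tools often interpolate instead) and a 30-day month. The latency samples are hypothetical, not measurements.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('No samples')
  const sorted = [...samples].sort((a, b) => a - b)
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length)) // 1-based rank
  return sorted[rank - 1]
}

// Downtime allowed by an availability target over a period, in minutes.
function downtimeBudgetMinutes(availability: number, days: number): number {
  return (1 - availability) * days * 24 * 60
}

// Hypothetical search latencies (ms) from a load test.
const searchLatenciesMs = [120, 95, 180, 150, 110, 130, 170, 140, 160, 190]
const p95 = percentile(searchLatenciesMs, 95)
console.log(`Search p95 = ${p95} ms, meets < 200 ms: ${p95 < 200}`)
console.log(`99.9% over 30 days allows ${downtimeBudgetMinutes(0.999, 30).toFixed(1)} min of downtime`)
```

For example, a 99.9% uptime target allows roughly 43.2 minutes of downtime per 30-day month. That budget is what the failure-handling design has to fit inside.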
Technical constraints:
Business constraints:
Team constraints:
Never design in a vacuum; consider the alternatives:
Example: Data storage choice
Option 1: PostgreSQL
Option 2: MongoDB
Option 3: DynamoDB
Decision: PostgreSQL
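A weighted decision matrix is one common way to make such a comparison explicit. It is offered here as a technique, not as the method this skill prescribes. In the sketch below, the criteria, weights and 1-5 scores are hypothetical placeholders. The weights lean on team expertise because that is the deciding factor in the ADR later in this skill.

```typescript
type Criterion = 'teamExpertise' | 'queryFlexibility' | 'consistency' | 'operationalCost'

interface OptionScore {
  name: string
  scores: Record<Criterion, number> // 1 (poor) to 5 (excellent), hypothetical
}

// Hypothetical weights; they should sum to 1.
const weights: Record<Criterion, number> = {
  teamExpertise: 0.35,
  queryFlexibility: 0.25,
  consistency: 0.25,
  operationalCost: 0.15,
}

function weightedScore(option: OptionScore): number {
  return (Object.keys(weights) as Criterion[]).reduce(
    (sum, c) => sum + weights[c] * option.scores[c],
    0,
  )
}

// Highest weighted score first; the input array is not mutated.
function rankOptions(options: OptionScore[]): OptionScore[] {
  return [...options].sort((a, b) => weightedScore(b) - weightedScore(a))
}

const options: OptionScore[] = [
  { name: 'PostgreSQL', scores: { teamExpertise: 5, queryFlexibility: 5, consistency: 5, operationalCost: 3 } },
  { name: 'MongoDB', scores: { teamExpertise: 2, queryFlexibility: 3, consistency: 3, operationalCost: 3 } },
  { name: 'DynamoDB', scores: { teamExpertise: 2, queryFlexibility: 2, consistency: 4, operationalCost: 4 } },
]

for (const o of rankOptions(options)) {
  console.log(`${o.name}: ${weightedScore(o).toFixed(2)}`)
}
```

The value of the matrix lies less in the final number than in forcing the team to agree on the weights before looking at the scores.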
Define components and their responsibilities:
```
┌─────────────────────────────────────────────┐
│                 Client Apps                 │
│             (Web, iOS, Android)             │
└────────────────┬────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────┐
│         API Gateway / Load Balancer         │
└────────────────┬────────────────────────────┘
                 │
        ┌────────┴────────┐
        ▼                 ▼
┌───────────────┐ ┌───────────────┐
│     Auth      │ │   Core API    │
│    Service    │ │    Service    │
└───────┬───────┘ └───────┬───────┘
        │                 │
        │        ┌────────┴────────┐
        │        ▼                 ▼
        │ ┌─────────────┐   ┌─────────────┐
        │ │ PostgreSQL  │   │    Redis    │
        │ │  (Primary)  │   │   (Cache)   │
        │ └─────────────┘   └─────────────┘
        │
        ▼
┌───────────────┐
│    User DB    │
└───────────────┘
```
Component descriptions:
```markdown
## Components
### API Gateway
**Responsibility:** Route requests, rate limiting, authentication
**Technology:** Nginx
**Dependencies:** Auth Service, Core API Service
**Scale:** 2-3 instances behind load balancer
### Auth Service
**Responsibility:** User authentication, session management, JWT issuing
**Technology:** Python (Flask), PostgreSQL
**API:** REST
**Scale:** Stateless, 2-N instances
### Core API Service
**Responsibility:** Business logic, data access, external integrations
**Technology:** Python (FastAPI), PostgreSQL, Redis
**API:** REST
**Scale:** Stateless, 2-N instances
### PostgreSQL
**Responsibility:** Primary data store
**Scale:** Primary with read replica
### Redis
**Responsibility:** Session storage, caching, rate limiting
**Scale:** Cluster mode (3 nodes)
```
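The Core API Service lists Redis as a cache in front of PostgreSQL. A common way to use such a cache is the cache-aside pattern: read the cache first, and on a miss load from the database and populate the cache. This is a minimal sketch under two assumptions. An in-memory `TtlCache` stands in for Redis, and the hypothetical `loadProduct` callback stands in for a PostgreSQL query.

```typescript
interface Product { id: string; name: string }

// In-memory stand-in for Redis: entries expire ttlMs after being set.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>()
  constructor(private ttlMs: number, private now: () => number = Date.now) {}

  get(key: string): V | undefined {
    const entry = this.entries.get(key)
    if (!entry) return undefined
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key) // expired: treat as a miss
      return undefined
    }
    return entry.value
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs })
  }
}

class ProductReader {
  constructor(
    private cache: TtlCache<Product>,
    private loadProduct: (id: string) => Promise<Product>, // hypothetical database lookup
  ) {}

  async getProduct(id: string): Promise<Product> {
    const cached = this.cache.get(id)
    if (cached) return cached // hit: skip the database
    const product = await this.loadProduct(id) // miss: read the source of truth
    this.cache.set(id, product) // populate for subsequent reads
    return product
  }
}
```

The database remains the source of truth; the TTL bounds how stale a cached read can be.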
API contracts:
````markdown
## API Design
### POST /api/auth/login
**Purpose:** Authenticate user, issue JWT
**Request:**
```json
{
  "email": "user@example.com",
  "password": "secure_password"
}
```
**Response (200):**
```json
{
  "token": "eyJ...",
  "user": {
    "id": "123",
    "email": "user@example.com",
    "name": "John Doe"
  }
}
```
**Errors:**
````
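One way to keep a contract like this honest in code is to mirror it as types and validate untrusted input at the boundary. This is a sketch only. The field names come from the JSON example, while the validation rules, the result shape and `parseLoginRequest` are hypothetical.

```typescript
interface LoginRequest { email: string; password: string }

interface LoginResponse {
  token: string // JWT issued by the Auth Service
  user: { id: string; email: string; name: string }
}

type ValidationResult =
  | { ok: true; value: LoginRequest }
  | { ok: false; error: string }

// Checks an untrusted JSON body against the request contract.
function parseLoginRequest(body: unknown): ValidationResult {
  if (typeof body !== 'object' || body === null) {
    return { ok: false, error: 'Body must be a JSON object' }
  }
  const { email, password } = body as Record<string, unknown>
  if (typeof email !== 'string' || !email.includes('@')) {
    return { ok: false, error: 'A valid email is required' }
  }
  if (typeof password !== 'string' || password.length === 0) {
    return { ok: false, error: 'Password is required' }
  }
  return { ok: true, value: { email, password } }
}

// The response example from the contract, checked against the type.
const exampleResponse: LoginResponse = {
  token: 'eyJ...',
  user: { id: '123', email: 'user@example.com', name: 'John Doe' },
}
```

Sharing types like these between client and server means a contract change surfaces as a compile error rather than a runtime surprise.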
### 7. Plan for Failure
**What can go wrong?**
- Database unavailable
- External API down
- Network partition
- High load
- Data corruption
**Mitigation strategies:**
- Retry with exponential backoff
- Circuit breakers for external services
- Graceful degradation
- Health checks and monitoring
- Database backups
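Two of the mitigations above, retry with exponential backoff and circuit breakers, can be sketched in a few lines. Several pieces are assumptions chosen for illustration: the base delay, the cap, the cooldown, the injectable `sleep` and `now` parameters, and the consecutive-failure trip rule. Production libraries usually also add jitter to the delays.

```typescript
// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 100, maxMs = 5_000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt)
}

// Calls fn, retrying up to `retries` more times with exponential backoff between tries.
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  retries = 3,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn()
    } catch (err) {
      if (attempt >= retries) throw err // out of retries: surface the error
      await sleep(backoffDelayMs(attempt))
    }
  }
}

// Opens after `threshold` consecutive failures; allows a trial request after `cooldownMs`.
class CircuitBreaker {
  private failures = 0
  private openedAt: number | null = null

  constructor(
    private threshold = 3,
    private cooldownMs = 30_000,
    private now: () => number = Date.now,
  ) {}

  canRequest(): boolean {
    if (this.openedAt === null) return true // closed
    return this.now() - this.openedAt >= this.cooldownMs // half-open after cooldown
  }

  recordSuccess(): void {
    this.failures = 0
    this.openedAt = null // close the circuit
  }

  recordFailure(): void {
    this.failures++
    if (this.failures >= this.threshold) this.openedAt = this.now() // open, or re-open after a failed trial
  }
}
```

A caller checks `canRequest()` before contacting the external service. If it returns false, the caller degrades gracefully instead of waiting on a dependency that is known to be failing. After the call, the caller reports the outcome with `recordSuccess()` or `recordFailure()`.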
**Example:**
```markdown
## Failure Scenarios
### Database Unavailable
**Impact:** Cannot read/write data
**Mitigation:**
- Reads fail over to the read replica (automated)
- Circuit breaker after 3 failures
- Cache serves stale data for 5 minutes
- User sees degraded experience message
**Recovery:** Manual failover to replica, fix primary
### External Payment API Down
**Impact:** Cannot process payments
**Mitigation:**
- Retry 3 times with exponential backoff
- Queue payments for later processing
- User notified of delay
- Alert on-call engineer
**Recovery:** Process queued payments once API recovers
```
Architecture Decision Record (ADR):
```markdown
# ADR-001: Use PostgreSQL for Primary Database
**Status:** Accepted
**Date:** 2024-01-15
**Deciders:** Tech Lead, Backend Team
## Context
We need to choose a primary database for user data, products, and orders.
Requirements:
- Strong consistency (ACID)
- Complex queries (joins, aggregations)
- < 200ms query time for 90% of queries
- Support 100k users initially
## Decision
Use PostgreSQL as the primary database.
## Alternatives Considered
### MongoDB
- **Pros:** Flexible schema, horizontal scaling
- **Cons:** Team unfamiliar, eventual consistency issues
- **Why not:** Team expertise more valuable than flexibility
### DynamoDB
- **Pros:** Managed service, auto-scaling
- **Cons:** Vendor lock-in, limited query capability, cost
- **Why not:** Query limitations would hurt development velocity
### MySQL
- **Pros:** Similar to PostgreSQL, team knows it
- **Cons:** Less feature-rich than PostgreSQL
- **Why not:** PostgreSQL offers JSON support, better full-text search
## Consequences
**Positive:**
- Team can be productive immediately
- Strong consistency guarantees
- Rich query capabilities
- JSON support for flexible data
**Negative:**
- Vertical scaling limits (mitigated: can add read replicas)
- More complex than managed services (mitigated: use RDS)
- Higher operational overhead
**Trade-offs:**
- Chose familiarity over horizontal scaling
- Chose rich queries over eventual consistency
- Can re-evaluate if scale requirements change
## Validation
- Team confirmed expertise in PostgreSQL
- Load testing shows it meets performance requirements
- Cost analysis shows costs are acceptable for the first year
```
Start simple, add complexity only when needed.
❌ BAD: Microservices from day 1 with 20 services
✅ GOOD: Start with monolith, split when needed
Apply YAGNI: You Aren't Gonna Need It
Each component has one clear responsibility.
✅ GOOD:
- Auth Service: Authentication only
- User Service: User profile management
- Order Service: Order processing
❌ BAD:
- God Service: Does everything
Apply SOLID principles:
Components depend on interfaces, not implementations.
```typescript
// ❌ BAD: Tight coupling
class OrderService {
  constructor(private db: PostgresDatabase) {}
}

// ✅ GOOD: Loose coupling
class OrderService {
  constructor(private db: Database) {} // Interface
}
```
Benefits:
Related functionality stays together.
✅ GOOD:
user/
- create_user.ts
- update_user.ts
- delete_user.ts
- user_repository.ts
❌ BAD:
create/
- create_user.ts
- create_order.ts
update/
- update_user.ts
- update_order.ts
Make dependencies and contracts clear.
```typescript
// ❌ BAD: Implicit dependency
function processOrder(orderId: string) {
  const db = global.database // Where does this come from?
  // ...
}

// ✅ GOOD: Explicit dependency
function processOrder(
  orderId: string,
  db: Database,
  logger: Logger
) {
  // Dependencies are clear
}
```
Detect and report errors early.
```typescript
// ❌ BAD: Silent failure
function divide(a: number, b: number) {
  if (b === 0) return 0 // Wrong!
  return a / b
}

// ✅ GOOD: Fail fast
function divide(a: number, b: number) {
  if (b === 0) {
    throw new Error('Division by zero')
  }
  return a / b
}
```
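At the architecture level, the same principle applies at service startup: validate configuration once and refuse to boot if it is incomplete, rather than failing on the first request. In this sketch, `loadConfig` and the variable names `DATABASE_URL` and `PORT` are hypothetical examples. The environment is passed in as a plain object, so the function does not depend on a particular runtime.

```typescript
interface Config { databaseUrl: string; port: number }

// Throws at startup if required settings are missing or malformed.
function loadConfig(env: Record<string, string | undefined>): Config {
  const databaseUrl = env.DATABASE_URL
  if (!databaseUrl) {
    throw new Error('DATABASE_URL is required')
  }
  const port = Number(env.PORT ?? '8080') // hypothetical default port
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`PORT must be an integer between 1 and 65535, got "${env.PORT}"`)
  }
  return { databaseUrl, port }
}
```

In Node.js this would typically be called as `loadConfig(process.env)` before the server starts listening.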
Make it easy to test.
```typescript
// ❌ BAD: Hard to test
class OrderService {
  processOrder(orderId: string) {
    const db = new PostgresDatabase() // Can't mock
    const api = new PaymentAPI() // Can't mock
    // ...
  }
}

// ✅ GOOD: Easy to test
class OrderService {
  constructor(
    private db: Database, // Can inject mock
    private api: PaymentAPI // Can inject mock
  ) {}

  processOrder(orderId: string) {
    // ...
  }
}
```
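With constructor injection in place, a test can hand the service in-memory fakes instead of PostgreSQL and a real payment provider. The `Database` and `PaymentAPI` interfaces and their methods here are hypothetical and kept synchronous for brevity; real implementations would usually be async.

```typescript
interface Order { id: string; amountCents: number; status: 'pending' | 'paid' }

interface Database {
  getOrder(id: string): Order | undefined
  saveOrder(order: Order): void
}

interface PaymentAPI {
  charge(amountCents: number): boolean // true if the charge succeeded
}

class OrderService {
  constructor(private db: Database, private api: PaymentAPI) {}

  processOrder(orderId: string): Order {
    const order = this.db.getOrder(orderId)
    if (!order) throw new Error(`Order ${orderId} not found`) // fail fast
    if (!this.api.charge(order.amountCents)) throw new Error('Payment declined')
    const paid: Order = { ...order, status: 'paid' }
    this.db.saveOrder(paid)
    return paid
  }
}

// Fakes: deterministic, fast, and they record what happened.
class InMemoryDatabase implements Database {
  private orders = new Map<string, Order>()
  getOrder(id: string) { return this.orders.get(id) }
  saveOrder(order: Order) { this.orders.set(order.id, order) }
}

class FakePaymentAPI implements PaymentAPI {
  charged: number[] = []
  charge(amountCents: number) {
    this.charged.push(amountCents)
    return true
  }
}

const db = new InMemoryDatabase()
db.saveOrder({ id: 'o1', amountCents: 2500, status: 'pending' })
const payments = new FakePaymentAPI()
const result = new OrderService(db, payments).processOrder('o1')
console.log(result.status, payments.charged)
```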
Layered architecture:
```
┌─────────────────────┐
│    Presentation     │  (UI, API controllers)
├─────────────────────┤
│   Business Logic    │  (Domain, services)
├─────────────────────┤
│     Data Access     │  (Repositories, ORMs)
├─────────────────────┤
│      Database       │  (Storage)
└─────────────────────┘
```
When to use: Simple to moderate complexity
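In code, the layers become classes where each one calls only the layer directly below it. This is a sketch with hypothetical names (`UserController`, `UserService`, `UserRepository`). A `Map` stands in for the database, and every service error is mapped to HTTP 409 for brevity.

```typescript
interface User { id: string; email: string }

// Data Access layer: the only code that touches storage (a Map stands in for the database).
class UserRepository {
  private rowsByEmail = new Map<string, User>()
  findByEmail(email: string): User | undefined {
    return this.rowsByEmail.get(email)
  }
  insert(user: User): void {
    this.rowsByEmail.set(user.email, user)
  }
}

// Business Logic layer: rules live here, with no HTTP or SQL details.
class UserService {
  private nextId = 1
  constructor(private repo: UserRepository) {}
  register(email: string): User {
    if (this.repo.findByEmail(email)) throw new Error('Email already registered')
    const user: User = { id: String(this.nextId++), email }
    this.repo.insert(user)
    return user
  }
}

// Presentation layer: maps a request to a service call and a response.
class UserController {
  constructor(private service: UserService) {}
  post(body: { email: string }): { status: number; body: unknown } {
    try {
      return { status: 201, body: this.service.register(body.email) }
    } catch (e) {
      return { status: 409, body: { error: (e as Error).message } } // simplified error mapping
    }
  }
}
```

The duplicate-email rule lives in `UserService`, so it applies whether the request arrives over HTTP, a CLI, or a queue.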
Hexagonal architecture (ports and adapters):
```
┌───────────────────────┐
│   External Systems    │
│    (UI, DB, APIs)     │
└───────────┬───────────┘
            │
┌───────────▼───────────┐
│       Adapters        │  (Implementation)
│  (REST, PostgreSQL)   │
└───────────┬───────────┘
            │
┌───────────▼───────────┐
│         Ports         │  (Interfaces)
│  (IUserRepo, IAuth)   │
└───────────┬───────────┘
            │
┌───────────▼───────────┐
│      Core Domain      │  (Business logic)
│     (Pure logic)      │
└───────────────────────┘
```
When to use: Want to isolate business logic, multiple frontends
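A minimal ports-and-adapters sketch follows. `IUserRepo` comes from the diagram. Its method, the greeting logic and the adapter names are hypothetical. The core depends only on the port, so swapping the in-memory adapter for a PostgreSQL one would not touch the core.

```typescript
// Port: an interface owned by the core, expressed in domain terms.
interface IUserRepo {
  findById(id: string): { id: string; name: string } | undefined
}

// Core domain: pure logic that knows only the port, never a concrete database.
function greetingFor(repo: IUserRepo, userId: string): string {
  const user = repo.findById(userId)
  return user ? `Hello, ${user.name}` : 'Hello, guest'
}

// Driven adapter: an in-memory implementation of the port. A PostgreSQL
// adapter would implement the same interface with real queries.
class InMemoryUserRepo implements IUserRepo {
  constructor(private users: Record<string, { id: string; name: string }>) {}
  findById(id: string) {
    return this.users[id]
  }
}

// Driving adapter: translates an incoming request into a core call.
function handleGreetingRequest(query: { userId: string }, repo: IUserRepo) {
  return { status: 200, body: greetingFor(repo, query.userId) }
}
```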
Microservices:
```
┌─────────┐  ┌─────────┐  ┌─────────┐
│  User   │  │  Order  │  │ Payment │
│ Service │  │ Service │  │ Service │
└────┬────┘  └────┬────┘  └────┬────┘
     │            │            │
     └────────────┴────────────┘
                  │
          ┌───────▼────────┐
          │  Message Bus   │
          │ (Event-driven) │
          └────────────────┘
```
When to use: Large team, need independent deploy, clear boundaries
Avoid when: Small team, unclear boundaries, early stage
Event-driven architecture:
```
┌──────────┐      ┌─────────────┐      ┌──────────┐
│ Producer │─────▶│  Event Bus  │─────▶│ Consumer │
└──────────┘      └─────────────┘      └──────────┘
```
When to use: Async processing, decoupled systems, audit trails
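The core idea fits in a small in-process event bus. Producers publish to a topic and consumers subscribe to it, so neither holds a reference to the other. This sketch delivers events synchronously and in memory. A production message broker would add durability, retries and asynchronous delivery. The topic name and payload are hypothetical.

```typescript
type Handler<T> = (payload: T) => void

class EventBus {
  private handlers = new Map<string, Handler<unknown>[]>()

  subscribe<T>(topic: string, handler: Handler<T>): void {
    const list = this.handlers.get(topic) ?? []
    list.push(handler as Handler<unknown>)
    this.handlers.set(topic, list)
  }

  // Delivers the payload to every subscriber of the topic, in subscription order.
  publish<T>(topic: string, payload: T): void {
    for (const handler of this.handlers.get(topic) ?? []) handler(payload)
  }
}

// Two independent consumers react to the same event; the producer knows neither.
const bus = new EventBus()
const log: string[] = []
bus.subscribe<{ orderId: string }>('order.placed', (e) => log.push(`audit ${e.orderId}`))
bus.subscribe<{ orderId: string }>('order.placed', (e) => log.push(`email ${e.orderId}`))
bus.publish('order.placed', { orderId: 'o1' })
console.log(log)
```

Adding a third consumer, such as analytics, means one more `subscribe` call and no change to the producer. That is the decoupling the pattern is chosen for.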
Don't optimize for scale you don't have.
BAD: Build microservices for 100 users
GOOD: Start with monolith, split when needed
Don't choose technology to pad your resume.
BAD: "I want to learn Kubernetes, let's use it"
GOOD: "Kubernetes fits our scale needs"
Microservices that are tightly coupled.
BAD: Service A can't deploy without Service B
GOOD: Services are independently deployable
No structure, everything depends on everything.
BAD: Any code can call any other code
GOOD: Clear layers and boundaries
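Boundaries stay clear longer when they are checked automatically rather than by convention. The sketch below takes a layer order and a list of module dependencies and reports any dependency that points up to a higher layer. The layer names follow the layered diagram, and the module list is hypothetical. This version allows skipping layers; a stricter rule could reject those as well.

```typescript
const layerOrder = ['presentation', 'business', 'dataAccess', 'database'] as const
type Layer = (typeof layerOrder)[number]

interface Dependency { from: string; fromLayer: Layer; to: string; toLayer: Layer }

// Returns one message per dependency that points upward (toward the UI).
function findViolations(deps: Dependency[]): string[] {
  const rank = (layer: Layer) => layerOrder.indexOf(layer)
  return deps
    .filter((d) => rank(d.toLayer) < rank(d.fromLayer))
    .map((d) => `${d.from} (${d.fromLayer}) must not depend on ${d.to} (${d.toLayer})`)
}

const deps: Dependency[] = [
  { from: 'UserController', fromLayer: 'presentation', to: 'UserService', toLayer: 'business' },
  { from: 'UserService', fromLayer: 'business', to: 'UserRepository', toLayer: 'dataAccess' },
  { from: 'UserRepository', fromLayer: 'dataAccess', to: 'UserController', toLayer: 'presentation' },
]
console.log(findViolations(deps))
```

Run as part of CI, a check like this turns "any code can call any other code" into a failing build instead of a slow drift.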
Over-analyzing, never shipping.
BAD: Spend 6 months on perfect architecture
GOOD: Design enough to start, iterate
The best architecture is the one that's simple enough to ship and flexible enough to evolve.