Comprehensive microservices architecture patterns covering service decomposition, communication, data management, and resilience strategies. Use when designing distributed systems, breaking down monoliths, or implementing service-to-service communication.
This skill inherits all available tools. When active, it can use any tool Claude has access to.
Expert guidance for designing, implementing, and operating microservices architectures with proven patterns for service decomposition, inter-service communication, data management, and operational resilience.
Each microservice should have one reason to change and own a single business capability.
Good Service Boundaries:
✓ User Authentication Service
✓ Order Management Service
✓ Inventory Service
✓ Notification Service
✗ User and Order Management Service
✗ Everything Service
Services should be deployable without coordinating with other services.
Each service owns its data and doesn't share databases.
Failures are inevitable in distributed systems; design for resilience.
Automate deployment, scaling, and recovery.
Definition: Organize services around business capabilities, not technical layers.
Example:
Business Capabilities → Services
Order Management:
- Order Service (create, track, cancel orders)
- Fulfillment Service (pick, pack, ship)
Customer Management:
- Customer Profile Service
- Customer Preferences Service
Inventory:
- Stock Management Service
- Warehouse Service
Benefits:
- Boundaries follow the business, so they stay stable as the technology changes
- Teams can own a capability end to end
Trade-offs:
- Requires a clear map of business capabilities up front
- Capabilities that share data (e.g., customer records) still need careful contracts
Definition: Use Domain-Driven Design to identify bounded contexts as service boundaries.
Example:
E-commerce Domain:
Core Subdomains (competitive advantage):
- Product Catalog Service
- Order Processing Service
- Pricing Engine Service
Supporting Subdomains:
- Customer Service
- Notification Service
Generic Subdomains (buy vs. build):
- Payment Gateway (integrate Stripe)
- Shipping (integrate FedEx/UPS)
Bounded Context Indicators:
- The same term means different things in different contexts ("Product" in the catalog vs. the warehouse)
- Each context has its own ubiquitous language and domain model
- A distinct team or business owner is responsible for the context
Definition: Group operations that must share an ACID transaction into the same service.
Example:
Order Service includes:
- Create Order
- Reserve Inventory
- Calculate Total
- Apply Discount
Why? These operations need to be atomic and consistent.
When to Use:
- The operations must succeed or fail together and eventual consistency is not acceptable
- Splitting them would force a distributed transaction or saga
Purpose: Single entry point for all clients, routing to appropriate services.
Client → API Gateway → [Auth, Rate Limiting, Routing] → Microservices
Benefits:
- Simplified client interface
- Centralized cross-cutting concerns
- Protocol translation (REST → gRPC)
- Request aggregation
Implementations: Kong, AWS API Gateway, Nginx, Traefik
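For illustration, a routing sketch using Spring Cloud Gateway (one possible gateway among those above); the route ids, paths, and lb:// discovery URIs are assumptions about the deployment:

// Illustrative Spring Cloud Gateway routes: single entry point routing to services.
import org.springframework.cloud.gateway.route.RouteLocator;
import org.springframework.cloud.gateway.route.builder.RouteLocatorBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class GatewayRoutes {

    @Bean
    public RouteLocator routes(RouteLocatorBuilder builder) {
        return builder.routes()
            // Route /orders/** to the order service via service discovery ("lb://")
            .route("orders", r -> r.path("/orders/**")
                .filters(f -> f.addRequestHeader("X-Gateway", "api-gateway"))
                .uri("lb://order-service"))
            // Route /customers/** to the customer service
            .route("customers", r -> r.path("/customers/**")
                .uri("lb://customer-service"))
            .build();
    }
}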
Best Practices:
# Use service discovery (Consul, Eureka, Kubernetes DNS)
GET http://order-service:8080/orders/123
# Include correlation IDs for tracing
X-Correlation-ID: a3f7c9b2-d8e1-4f6a-9c0b-2e5d8f1a7b3c
# Use circuit breakers (Hystrix, Resilience4j)
@CircuitBreaker(name = "inventory-service")
public Product getProduct(String id) {
    return restTemplate.getForObject(
        "http://inventory-service/products/" + id,
        Product.class
    );
}
# Implement timeouts
connect-timeout: 2000
read-timeout: 5000
When to Use:
service OrderService {
  rpc GetOrder (GetOrderRequest) returns (Order);
  rpc CreateOrder (CreateOrderRequest) returns (Order);
  rpc StreamOrders (StreamRequest) returns (stream Order);
}

message Order {
  string id = 1;
  string customer_id = 2;
  repeated OrderItem items = 3;
  double total = 4;
}
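A client-side sketch assuming Java stubs generated from the proto above; GetOrderRequest's fields are not defined here, so setOrderId is an assumed field name, and the host/port are illustrative:

// Blocking gRPC client call against the OrderService defined above.
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

public class OrderClient {
    public static void main(String[] args) {
        ManagedChannel channel = ManagedChannelBuilder
            .forAddress("order-service", 9090)   // host/port typically via service discovery
            .usePlaintext()                      // mTLS is often handled by a service mesh
            .build();

        OrderServiceGrpc.OrderServiceBlockingStub stub =
            OrderServiceGrpc.newBlockingStub(channel);

        // setOrderId(...) is an assumed field name; the proto above omits GetOrderRequest's fields
        Order order = stub.getOrder(
            GetOrderRequest.newBuilder().setOrderId("ord_123").build());

        System.out.println(order.getId() + " total=" + order.getTotal());
        channel.shutdown();
    }
}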
Purpose: Services communicate via events, decoupled in time and space.
Event Types:
1. Domain Events (business events):
- OrderCreated
- PaymentProcessed
- InventoryReserved
2. Change Data Capture (CDC):
- OrderStatusChanged
- CustomerUpdated
3. Integration Events (cross-service):
- SendWelcomeEmail
- UpdateRecommendations
Event Structure:
{
  "event_id": "evt_a3f7c9b2",
  "event_type": "order.created",
  "event_version": "1.0",
  "timestamp": "2024-01-15T10:30:00Z",
  "source": "order-service",
  "correlation_id": "corr_x1y2z3",
  "data": {
    "order_id": "ord_123",
    "customer_id": "cust_456",
    "total": 99.99,
    "items": [...]
  },
  "metadata": {
    "user_id": "user_789",
    "tenant_id": "tenant_abc"
  }
}
Use Case: Work distribution, background jobs, reliable delivery.
Producer → Queue → Consumer(s)
Examples:
- Order placed → Queue → Payment processor
- Email requested → Queue → Email sender
- Image uploaded → Queue → Thumbnail generator
Implementations: RabbitMQ, AWS SQS, Azure Service Bus
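A minimal point-to-point sketch using the RabbitMQ Java client (amqp-client); the queue name, host, and payload are illustrative, and a real consumer would run as its own long-lived worker:

// Producer enqueues a payment job; consumer takes jobs off the "payments" queue.
import com.rabbitmq.client.*;
import java.nio.charset.StandardCharsets;

public class PaymentQueue {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("rabbitmq");

        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel()) {

            // Durable queue so messages survive broker restarts
            channel.queueDeclare("payments", true, false, false, null);

            // Producer side: order service publishes a payment job
            channel.basicPublish("", "payments", null,
                "{\"order_id\":\"ord_123\",\"amount\":99.99}".getBytes(StandardCharsets.UTF_8));

            // Consumer side: payment processor handles jobs from the queue
            channel.basicConsume("payments", true,
                (tag, delivery) -> System.out.println(
                    "Processing: " + new String(delivery.getBody(), StandardCharsets.UTF_8)),
                tag -> { });
        }
    }
}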
Use Case: Broadcasting events to multiple interested services.
Publisher → Topic → Subscriber 1
                  → Subscriber 2
                  → Subscriber N
Example:
OrderCreated event published to "orders" topic
Subscribers:
- Inventory Service (reserve stock)
- Fulfillment Service (prepare shipment)
- Analytics Service (update metrics)
- Notification Service (send confirmation email)
Implementations: Apache Kafka, AWS SNS, Google Pub/Sub
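A producer-side sketch using the Kafka Java client, publishing an OrderCreated event to the "orders" topic; the broker address and payload are illustrative:

// Publish an order.created event; each subscriber reads the topic with its own
// consumer group and receives every event (inventory, fulfillment, analytics, ...).
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;

public class OrderEventPublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            String event = "{\"event_type\":\"order.created\",\"order_id\":\"ord_123\"}";
            // Key by order id so events for the same order land on one partition, in order
            producer.send(new ProducerRecord<>("orders", "ord_123", event));
        }
    }
}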
Definition: Store all state changes as a sequence of events, not current state.
Traditional (CRUD):
orders table: id, customer_id, status, total
Event Sourcing:
order_events table:
- OrderCreated(order_id, customer_id, items, total)
- PaymentReceived(order_id, amount, payment_method)
- OrderShipped(order_id, tracking_number)
- OrderDelivered(order_id, delivery_time)
Current state = replay all events
Benefits:
- Complete audit trail
- Temporal queries ("what was the state at time T?")
- Event replay for debugging
- Easy to add new projections
Challenges:
- Query complexity
- Event versioning
- Storage growth
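A minimal sketch of deriving current state by replaying events, with simplified event records standing in for the order_events rows above:

// Current state = left-fold over the event stream, in order.
import java.util.List;

interface OrderEvent {}
record OrderCreated(String orderId, String customerId, double total) implements OrderEvent {}
record PaymentReceived(String orderId, double amount) implements OrderEvent {}
record OrderShipped(String orderId, String trackingNumber) implements OrderEvent {}

class OrderState {
    String orderId;
    String status = "NEW";
    double total;

    static OrderState replay(List<OrderEvent> events) {
        OrderState state = new OrderState();
        for (OrderEvent e : events) {
            if (e instanceof OrderCreated c) {
                state.orderId = c.orderId();
                state.total = c.total();
                state.status = "CREATED";
            } else if (e instanceof PaymentReceived) {
                state.status = "PAID";
            } else if (e instanceof OrderShipped) {
                state.status = "SHIPPED";
            }
        }
        return state;   // replaying a prefix of events gives the state at that point in time
    }
}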
Principle: Each service has its own database, never shared.
Order Service → Orders DB (PostgreSQL)
Inventory Service → Inventory DB (PostgreSQL)
Product Service → Products DB (MongoDB)
Analytics Service → Analytics DB (ClickHouse)
Benefits:
- Independent scaling
- Technology choice flexibility
- Loose coupling
- Clear ownership
Challenges:
- No cross-service joins
- Distributed transactions
- Data consistency
Purpose: Maintain data consistency across services without two-phase commit (2PC).
Order Saga Orchestrator:
1. Create Order (Order Service)
↓ success
2. Reserve Inventory (Inventory Service)
↓ success
3. Process Payment (Payment Service)
↓ success
4. Update Order Status (Order Service)
↓ failure → Compensate
5. Compensating Transactions:
- Release Inventory
- Refund Payment
- Cancel Order
Implementation:
- Orchestrator maintains state machine
- Explicit control flow
- Centralized logic
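A minimal orchestrator sketch showing the execute/compensate control flow; the SagaStep interface is illustrative, not a specific framework:

// Run each step in order; on failure, compensate the completed steps in reverse
// (release inventory, refund payment, cancel order).
import java.util.ArrayDeque;
import java.util.Deque;

public class OrderSagaOrchestrator {
    interface SagaStep {
        void execute() throws Exception;
        void compensate();
    }

    public boolean run(SagaStep... steps) {
        Deque<SagaStep> completed = new ArrayDeque<>();
        for (SagaStep step : steps) {
            try {
                step.execute();
                completed.push(step);        // remember for possible rollback
            } catch (Exception failure) {
                while (!completed.isEmpty()) {
                    completed.pop().compensate();   // compensating transactions, reverse order
                }
                return false;
            }
        }
        return true;
    }
}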
Event-Driven Saga:
OrderCreated event
→ Inventory Service reserves stock
→ InventoryReserved event
→ Payment Service processes payment
→ PaymentProcessed event
→ Order Service updates status
If failure at any step:
→ Compensating events cascade backwards
Implementation:
- Decentralized coordination
- Event-driven
- Implicit control flow
Definition: Separate read and write models for optimal performance.
Write Model (Commands):
- Optimized for consistency
- Normalized schema
- Transactional
Read Model (Queries):
- Optimized for performance
- Denormalized views
- Eventually consistent
- Materialized views/caching
Sync via events:
Command → Write DB → Event → Read DB(s)
Example:
Write: OrderCreated → Orders DB (PostgreSQL)
Event: OrderCreated published
Read: Update OrderSummary view (Redis)
      Update OrderAnalytics (Elasticsearch)
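A minimal sketch of the command → event → projection flow above, using in-memory maps as stand-ins for the write database, the broker, and the Redis view:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

public class CqrsSketch {
    record CreateOrder(String orderId, String customerId, double total) {}
    record OrderCreated(String orderId, String customerId, double total) {}

    // Write model: optimized for consistency (stands in for PostgreSQL)
    private final Map<String, CreateOrder> writeDb = new ConcurrentHashMap<>();
    // Read model: denormalized summaries (stands in for Redis)
    private final Map<String, String> orderSummaryView = new ConcurrentHashMap<>();
    // Event hook connecting the two sides (stands in for a message broker)
    private final Consumer<OrderCreated> projector = this::project;

    public void handle(CreateOrder cmd) {
        writeDb.put(cmd.orderId(), cmd);                      // 1. transactional write
        projector.accept(new OrderCreated(cmd.orderId(),      // 2. publish event
                cmd.customerId(), cmd.total()));
    }

    // Projection: eventually consistent read view updated from events
    private void project(OrderCreated e) {
        orderSummaryView.put(e.orderId(),
            "{\"customer\":\"" + e.customerId() + "\",\"total\":" + e.total() + "}");
    }
}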
Purpose: Join data from multiple services at the API layer.
GET /customers/123/dashboard
API Gateway:
1. GET /customers/123 (Customer Service)
2. GET /orders?customer_id=123 (Order Service)
3. GET /recommendations/123 (Recommendation Service)
4. Compose response:
   {
     "customer": {...},
     "recent_orders": [...],
     "recommendations": [...]
   }
Challenges:
- N+1 queries
- Slower response time
- Complex error handling
Optimizations:
- Parallel requests
- GraphQL (client-controlled aggregation)
- Backend for Frontend (BFF) pattern
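A sketch of the parallel-requests optimization using Java's built-in HttpClient; the service URLs mirror the example above and the composed JSON is simplified:

// Fan out the three calls concurrently, then compose a single dashboard response.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.CompletableFuture;

public class CustomerDashboardComposer {
    private final HttpClient http = HttpClient.newHttpClient();

    public String compose(String customerId) {
        CompletableFuture<String> customer = get("http://customer-service/customers/" + customerId);
        CompletableFuture<String> orders   = get("http://order-service/orders?customer_id=" + customerId);
        CompletableFuture<String> recs     = get("http://recommendation-service/recommendations/" + customerId);

        // Wait for all three, then build the aggregated payload
        return CompletableFuture.allOf(customer, orders, recs)
            .thenApply(v -> "{\"customer\":" + customer.join()
                + ",\"recent_orders\":" + orders.join()
                + ",\"recommendations\":" + recs.join() + "}")
            .join();
    }

    private CompletableFuture<String> get(String url) {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return http.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                   .thenApply(HttpResponse::body);
    }
}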
Purpose: Prevent cascading failures by failing fast.
States:
Closed → Normal operation, requests pass through
Open → Failure threshold reached, fail fast
Half-Open → Test if service recovered
Configuration:
failure_threshold: 5 failures in 10s
timeout: 30s
half_open_max_calls: 3
@CircuitBreaker(name = "payment-service")
public PaymentResult processPayment(Payment payment) {
    return paymentClient.process(payment);
}
Libraries: Resilience4j, Hystrix, Polly
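For reference, roughly the same thresholds expressed as Resilience4j programmatic configuration (annotation users would put equivalent values in application.yml):

import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;
import java.time.Duration;

public class PaymentCircuitBreaker {
    public static CircuitBreaker build() {
        CircuitBreakerConfig config = CircuitBreakerConfig.custom()
            .slidingWindowSize(10)                            // evaluate the last 10 calls
            .failureRateThreshold(50)                         // open at >= 50% failures
            .waitDurationInOpenState(Duration.ofSeconds(30))  // stay open for 30s
            .permittedNumberOfCallsInHalfOpenState(3)         // probe calls when half-open
            .build();

        return CircuitBreakerRegistry.of(config).circuitBreaker("payment-service");
    }
}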
Purpose: Retry failed requests with increasing delays.
@Retryable(                    // Spring Retry annotation
    maxAttempts = 3,
    backoff = @Backoff(
        delay = 1000,          // 1s initial delay
        multiplier = 2,        // then 2s, 4s, ...
        maxDelay = 10000
    )
)
public Order getOrder(String id) {
    return orderClient.getOrder(id);
}
Add jitter to prevent thundering herd:
delay = base_delay * (2 ^ attempt) + random(0, 1000)
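The formula translated directly into Java (values in milliseconds):

import java.util.concurrent.ThreadLocalRandom;

public class Backoff {
    static long delayMillis(int attempt, long baseDelayMs, long maxDelayMs) {
        long exponential = baseDelayMs * (1L << attempt);              // base_delay * 2^attempt
        long jitter = ThreadLocalRandom.current().nextLong(0, 1000);   // random(0, 1000)
        return Math.min(exponential + jitter, maxDelayMs);             // cap at max delay
    }
}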
Purpose: Isolate resources to prevent one failure affecting others.
Thread Pool Isolation:
payment-service:
  thread_pool_size: 10
  queue_size: 20

inventory-service:
  thread_pool_size: 20
  queue_size: 50
If payment-service threads exhaust, inventory-service unaffected.
@Bulkhead(
    name = "payment-service",
    type = Bulkhead.Type.THREADPOOL
)
// Pool sizing (e.g., maxThreadPoolSize: 10) is set in Resilience4j configuration
// rather than on the annotation.
Purpose: Protect services from overload.
Strategies:
1. Token Bucket:
- Tokens refill at fixed rate
- Request consumes token
- Burst capacity allowed
2. Leaky Bucket:
- Requests queued
- Processed at fixed rate
- Queue overflow rejected
3. Fixed Window:
- 100 requests per minute
- Counter resets each minute
4. Sliding Window:
- More accurate
- Prevents burst at window boundary
Implementation:
@RateLimiter(name = "api")
// Limits live in Resilience4j configuration, e.g.:
//   limitForPeriod: 100
//   limitRefreshPeriod: 1m
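A minimal token bucket implementation (strategy 1 above), independent of any library:

// Tokens refill at a fixed rate; each request consumes one; unused capacity allows bursts.
public class TokenBucket {
    private final long capacity;
    private final double refillPerSecond;
    private double tokens;
    private long lastRefillNanos;

    public TokenBucket(long capacity, double refillPerSecond) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity;
        this.lastRefillNanos = System.nanoTime();
    }

    public synchronized boolean tryAcquire() {
        refill();
        if (tokens >= 1) {
            tokens -= 1;          // request consumes one token
            return true;
        }
        return false;             // over the limit: caller should reject (e.g., HTTP 429)
    }

    private void refill() {
        long now = System.nanoTime();
        double elapsedSeconds = (now - lastRefillNanos) / 1_000_000_000.0;
        tokens = Math.min(capacity, tokens + elapsedSeconds * refillPerSecond);
        lastRefillNanos = now;
    }
}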
Purpose: Prevent indefinite waiting.
Timeout Hierarchy:
Client → (5s) → API Gateway → (3s) → Service A → (1s) → Service B
Each layer uses a shorter timeout than its caller.
RestTemplate (timeouts set on its request factory):
    .setConnectTimeout(2000)
    .setReadTimeout(5000)
HTTP Client:
    HttpClient.newBuilder()
        .connectTimeout(Duration.ofSeconds(2))
        .build()
Request ID propagation:
Client → API Gateway [trace_id: abc123]
       → Service A [trace_id: abc123, span_id: 001]
       → Service B [trace_id: abc123, span_id: 002]
       → Service C [trace_id: abc123, span_id: 003]
Implementations: Jaeger, Zipkin, AWS X-Ray
Correlation:
X-Correlation-ID: abc123
X-Request-ID: req_xyz789
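A sketch of propagating the correlation ID into logs: a servlet filter (jakarta.servlet assumed) reads or generates X-Correlation-ID and stores it in the SLF4J MDC so every log line for the request carries it:

import jakarta.servlet.*;
import jakarta.servlet.http.HttpServletRequest;
import org.slf4j.MDC;
import java.io.IOException;
import java.util.UUID;

public class CorrelationIdFilter implements Filter {
    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        String correlationId = ((HttpServletRequest) req).getHeader("X-Correlation-ID");
        if (correlationId == null || correlationId.isBlank()) {
            correlationId = UUID.randomUUID().toString();   // start a new trace at the edge
        }
        MDC.put("correlation_id", correlationId);
        try {
            chain.doFilter(req, res);
            // Outbound calls should copy this value into their own X-Correlation-ID header
        } finally {
            MDC.remove("correlation_id");                    // avoid leaking across pooled threads
        }
    }
}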
Log aggregation pattern:
Services → Log Shipper → Log Aggregator → Search/Analysis
Structure logs (JSON):
{
  "timestamp": "2024-01-15T10:30:00Z",
  "level": "INFO",
  "service": "order-service",
  "trace_id": "abc123",
  "span_id": "001",
  "message": "Order created",
  "order_id": "ord_123",
  "customer_id": "cust_456"
}
Stack: Filebeat → Logstash → Elasticsearch → Kibana
Fluentd → Kafka → Splunk
Key Metrics (RED method):
- Rate: requests per second
- Errors: error rate
- Duration: response time (p50, p95, p99)
USE method (infrastructure):
- Utilization: CPU, memory, disk
- Saturation: queue depth
- Errors: error counts
Implementations: Prometheus, Grafana, DataDog
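A sketch of recording RED metrics with Micrometer (a common metrics facade; Prometheus, Datadog, etc. plug in as registries); the metric names are illustrative:

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

public class OrderMetrics {
    private final Counter errors;
    private final Timer duration;

    public OrderMetrics(MeterRegistry registry) {
        // Rate comes from the timer's request count; Errors and Duration are explicit
        this.errors = Counter.builder("orders.errors").register(registry);
        this.duration = Timer.builder("orders.duration")
            .publishPercentiles(0.5, 0.95, 0.99)   // p50, p95, p99
            .register(registry);
    }

    public void handleRequest(Runnable handler) {
        try {
            duration.record(handler);   // Duration (and Rate via the call count)
        } catch (RuntimeException e) {
            errors.increment();         // Errors
            throw e;
        }
    }

    public static void main(String[] args) {
        OrderMetrics metrics = new OrderMetrics(new SimpleMeterRegistry());
        metrics.handleRequest(() -> { /* handle the request */ });
    }
}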
Purpose: Infrastructure layer handling service-to-service communication.
Features:
- Traffic management (routing, retries, timeouts)
- Security (mTLS, authentication)
- Observability (metrics, tracing)
- Resilience (circuit breaking, rate limiting)
Architecture:
Service A ←→ Sidecar Proxy (Envoy)
                    ↕
          Control Plane (Istio/Linkerd)
                    ↕
Service B ←→ Sidecar Proxy (Envoy)
Implementations: Istio, Linkerd, Consul Connect
Load Balancer
  → Blue (current version, 100% traffic)
  → Green (new version, 0% traffic)
Deploy to Green → Test → Switch traffic → Blue becomes standby
Load Balancer
  → v1 (95% traffic)
  → v2 (5% traffic - canary)
Monitor metrics → Increase traffic → Full rollout
Instances: [v1, v1, v1, v1]
Step 1: [v2, v1, v1, v1]
Step 2: [v2, v2, v1, v1]
Step 3: [v2, v2, v2, v1]
Step 4: [v2, v2, v2, v2]
Purpose: Gradually migrate from monolith to microservices.
Phase 1: Routing layer intercepts requests
Client → Router → Monolith (all traffic)
Phase 2: Extract first service
Client → Router → Service A (10% traffic)
                → Monolith (90% traffic)
Phase 3: Extract more services
Client → Router → Service A (all orders)
                → Service B (all users)
                → Monolith (remaining)
Phase N: Retire monolith
Client → Router → Services A, B, C, ... (all traffic)
Purpose: Refactor incrementally without feature branches.
1. Create abstraction layer
2. Implement new service behind abstraction
3. Gradually migrate calls to new implementation
4. Remove old implementation
5. Remove abstraction (optional)
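A minimal branch-by-abstraction sketch; the notification example and class names are illustrative:

// Callers depend on the abstraction; a toggle decides whether the legacy monolith
// code path or the new extracted service handles the call.
public interface NotificationSender {                       // step 1: abstraction layer
    void send(String customerId, String message);
}

class MonolithNotificationSender implements NotificationSender {
    public void send(String customerId, String message) {
        // existing in-process monolith code path
    }
}

class NotificationServiceClient implements NotificationSender {   // step 2: new implementation
    public void send(String customerId, String message) {
        // HTTP/gRPC call to the extracted notification service
    }
}

class NotificationSenderFactory {
    // Step 3: migrate gradually by flipping the toggle (or routing a percentage of calls)
    static NotificationSender create(boolean useNewService) {
        return useNewService ? new NotificationServiceClient()
                             : new MonolithNotificationSender();
    }
}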