Database sharding for PostgreSQL/MySQL with hash, range, and directory strategies. Use for horizontal scaling, multi-tenant isolation, or billions of records, or when debugging wrong shard keys, hotspots, cross-shard transactions, or rebalancing issues.
```
/plugin marketplace add secondsky/claude-skills
/plugin install database-sharding@claude-skills
```

This skill inherits all available tools. When active, it can use any tool Claude has access to.
Comprehensive database sharding patterns for horizontal scaling with hash, range, and directory-based strategies.
Step 1: Choose sharding strategy from templates:
```shell
# Hash-based (even distribution)
cat templates/hash-router.ts

# Range-based (time-series data)
cat templates/range-router.ts

# Directory-based (multi-tenancy)
cat templates/directory-router.ts
```
Step 2: Select a shard key. It should have high cardinality, be evenly distributed, and be immutable (see the criteria table below).
Step 3: Implement router:
```typescript
import { HashRouter } from './hash-router';

const router = new HashRouter([
  { id: 'shard_0', connection: { host: 'db0.example.com' } },
  { id: 'shard_1', connection: { host: 'db1.example.com' } },
  { id: 'shard_2', connection: { host: 'db2.example.com' } },
  { id: 'shard_3', connection: { host: 'db3.example.com' } },
]);

// Query single shard
const user = await router.query('user_123', 'SELECT * FROM users WHERE id = $1', ['user_123']);
```
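The routing step behind a call like this can be sketched in a few lines. This is a minimal illustration assuming modulo-style hashing; the bundled `templates/hash-router.ts` is the actual implementation and also handles connections and query dispatch.

```typescript
// Minimal sketch of modulo hash routing (illustration only).
import { createHash } from 'crypto';

interface ShardConfig {
  id: string;
  connection: { host: string };
}

function pickShard(key: string, shards: ShardConfig[]): ShardConfig {
  // Hash the shard key to a stable 32-bit integer, then take it modulo
  // the shard count so the same key always lands on the same shard.
  const n = createHash('md5').update(key).digest().readUInt32BE(0);
  return shards[n % shards.length];
}

const shards: ShardConfig[] = [
  { id: 'shard_0', connection: { host: 'db0.example.com' } },
  { id: 'shard_1', connection: { host: 'db1.example.com' } },
];

const shard = pickShard('user_123', shards); // deterministic: same key, same shard
```

Note that plain modulo routing breaks when the shard count changes; that is exactly the rebalancing problem the consistent-hashing section below addresses.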
Best practices:

| Rule | Reason |
|---|---|
| Include shard key in queries | Avoid scanning all shards (100x slower) |
| Monitor shard distribution | Detect hotspots before they cause outages |
| Plan for rebalancing upfront | Cannot easily add shards later |
| Choose immutable shard key | Changing key = data migration nightmare |
| Test distribution with production data | Synthetic data hides real hotspots |
| Denormalize for data locality | Keep related data on same shard |
Anti-patterns:

| Anti-Pattern | Why It's Bad |
|---|---|
| Sequential ID with range sharding | Latest shard gets all writes (hotspot) |
| Timestamp as shard key | Recent shard overwhelmed |
| Cross-shard transactions without 2PC | Data corruption, inconsistency |
| Simple modulo without consistent hashing | Cannot add shards without full re-shard |
| Nullable shard key | Special NULL handling creates hotspots |
| No shard routing layer | Hardcoded shards = cannot rebalance |
Symptom: One shard receives 80%+ of traffic.
Fix:

```typescript
// ❌ Bad: Low cardinality (status field)
shard_key = order.status; // 90% are 'pending' → shard_0 overloaded

// ✅ Good: High cardinality (user_id)
shard_key = order.user_id; // Millions of users, even distribution
```
Symptom: Queries scan ALL shards (extremely slow).
Fix:

```sql
-- ❌ Bad: No shard key
SELECT * FROM orders WHERE status = 'shipped'; -- Scans all 100 shards!

-- ✅ Good: Include shard key
SELECT * FROM orders WHERE user_id = ? AND status = 'shipped'; -- Targets 1 shard
```
Symptom: Latest shard gets all writes.
Fix:

```typescript
// ❌ Bad: Range sharding with auto-increment
// Shard 0: 1-1M, Shard 1: 1M-2M, Shard 2: 2M+ → All new writes to Shard 2!

// ✅ Good: Hash-based sharding
const shardId = hash(id) % shardCount; // Even distribution
```
Symptom: Stuck with initial shard count, cannot scale.
Fix:

```typescript
// ❌ Bad: Simple modulo
const shardId = hash(key) % shardCount; // Adding a 5th shard remaps almost ALL keys

// ✅ Good: Consistent hashing
const ring = new ConsistentHashRing(shards);
const shardId = ring.getNode(key); // Only ~1/(N+1) of keys move when adding a shard to N
```
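For reference, a consistent hash ring like the one used above can be sketched with virtual nodes. This is a hypothetical helper for illustration, not the template's actual class:

```typescript
// Sketch of a consistent hash ring. Each shard gets many virtual points
// on the ring; a key routes to the first point clockwise from its hash,
// so adding a shard only reassigns the keys nearest its new points.
import { createHash } from 'crypto';

class ConsistentHashRing {
  private ring: { point: number; node: string }[] = [];

  constructor(nodes: string[], private vnodes: number = 100) {
    for (const node of nodes) this.add(node);
  }

  private hash(s: string): number {
    return createHash('md5').update(s).digest().readUInt32BE(0);
  }

  add(node: string): void {
    // Virtual nodes smooth out the distribution across the ring.
    for (let i = 0; i < this.vnodes; i++) {
      this.ring.push({ point: this.hash(`${node}#${i}`), node });
    }
    this.ring.sort((a, b) => a.point - b.point);
  }

  getNode(key: string): string {
    const h = this.hash(key);
    // First virtual point at or past the key's hash; wrap to the start.
    const entry = this.ring.find(e => e.point >= h) ?? this.ring[0];
    return entry.node;
  }
}

const ring = new ConsistentHashRing(['shard_0', 'shard_1', 'shard_2', 'shard_3']);
const home = ring.getNode('user_123'); // stable until the ring membership changes
```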
Symptom: Data inconsistency, partial writes.
Fix:

```sql
-- ❌ Bad: Cross-shard transaction (will corrupt)
BEGIN;
UPDATE shard_1.accounts SET balance = balance - 100 WHERE id = 'A';
UPDATE shard_2.accounts SET balance = balance + 100 WHERE id = 'B';
COMMIT; -- If shard_2 fails, shard_1 already committed!
```

```typescript
// ✅ Good: Two-Phase Commit or Saga pattern
const txn = new TwoPhaseCommitTransaction();
txn.addOperation(shard_1, 'UPDATE accounts SET balance = balance - 100 WHERE id = ?', ['A']);
txn.addOperation(shard_2, 'UPDATE accounts SET balance = balance + 100 WHERE id = ?', ['B']);
await txn.execute(); // Atomic across shards
```
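The prepare/commit protocol behind a `TwoPhaseCommitTransaction`-style helper (a template name, not a standard API) can be sketched with abstract participants. In PostgreSQL, each participant would wrap `PREPARE TRANSACTION` / `COMMIT PREPARED`:

```typescript
// Hypothetical sketch of a two-phase-commit coordinator.
interface Participant {
  prepare(): Promise<boolean>; // phase 1: participant votes commit/abort
  commit(): Promise<void>;     // phase 2: finalize on unanimous "yes"
  rollback(): Promise<void>;   // phase 2: undo if any vote was "no"
}

async function twoPhaseCommit(participants: Participant[]): Promise<boolean> {
  // Phase 1: collect votes. A single "no" aborts the whole transaction.
  const votes = await Promise.all(participants.map(p => p.prepare()));
  if (votes.every(v => v)) {
    await Promise.all(participants.map(p => p.commit()));
    return true;
  }
  await Promise.all(participants.map(p => p.rollback()));
  return false;
}
```

A real coordinator also has to persist its decision and recover in-doubt transactions after a crash, which is why the reference material covers this in depth.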
Symptom: Records move shards, causing duplicates.
Fix:

```typescript
// ❌ Bad: Shard by country (user relocates)
shard_key = user.country; // User moves US → CA, now in a different shard!

// ✅ Good: Shard by immutable user_id
shard_key = user.id; // Never changes
```
Symptom: Silent hotspots, sudden performance degradation.
Fix: monitor these metrics:

- Per-shard record counts (should be within 20% of each other)
- Query distribution (no shard > 40% of queries)
- Storage per shard (alert at 80%)
- p99 latency per shard
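The 20% record-count guideline above can be checked mechanically. A sketch, using made-up illustration counts (real numbers would come from per-shard `COUNT(*)` metrics):

```typescript
// Sketch of a distribution check against the "within 20%" guideline.
function distributionSkew(counts: Record<string, number>): number {
  const values = Object.values(counts);
  const avg = values.reduce((a, b) => a + b, 0) / values.length;
  // Worst relative deviation of any shard from the mean.
  return Math.max(...values.map(v => Math.abs(v - avg) / avg));
}

// Illustration values only: shard_3 is a hotspot.
const counts = { shard_0: 1_200_000, shard_1: 1_250_000, shard_2: 1_230_000, shard_3: 2_400_000 };
if (distributionSkew(counts) > 0.2) {
  console.warn('Shard distribution outside the 20% band: likely hotspot');
}
```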
Load references/error-catalog.md for all 10 errors with detailed fixes.
| Strategy | Best For | Pros | Cons |
|---|---|---|---|
| Hash | User data, even load critical | No hotspots, predictable | Range queries scatter |
| Range | Time-series, logs, append-only | Range queries efficient, archival | Recent shard hotspot |
| Directory | Multi-tenancy, complex routing | Flexible, easy rebalancing | Lookup overhead, SPOF |
Load references/sharding-strategies.md for detailed comparisons with production examples (Instagram, Discord, Salesforce).
| Criterion | Importance | Check Method |
|---|---|---|
| High cardinality | Critical | COUNT(DISTINCT shard_key) > shard_count × 100 |
| Even distribution | Critical | No value > 5% of total |
| Immutable | Critical | Value never changes |
| Query alignment | High | 80%+ queries include it |
| Data locality | Medium | Related records together |
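The three critical criteria can be encoded as a quick viability check. A sketch: `KeyStats` is a hypothetical shape whose numbers would come from queries such as `SELECT COUNT(DISTINCT shard_key)` and a `GROUP BY` frequency scan.

```typescript
// Sketch of a shard-key viability check built from the criteria table.
interface KeyStats {
  distinctValues: number; // cardinality of the candidate key
  topValueShare: number;  // fraction of rows held by the most common value
  mutable: boolean;       // can the value change after insert?
}

function isViableShardKey(stats: KeyStats, shardCount: number): boolean {
  return (
    stats.distinctValues > shardCount * 100 && // high cardinality
    stats.topValueShare <= 0.05 &&             // no value > 5% of total
    !stats.mutable                             // immutable
  );
}

// Illustration: user_id passes; a status-style field fails every criterion.
const userId: KeyStats = { distinctValues: 10_000_000, topValueShare: 0.00001, mutable: false };
const status: KeyStats = { distinctValues: 4, topValueShare: 0.9, mutable: true };
```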
Decision tree — candidate shard keys:

- `user_id`
- `tenant_id`
- `timestamp` (range sharding)
- `product_id`

Load references/shard-key-selection.md for comprehensive decision trees and testing strategies.
```typescript
import { HashRouter } from './templates/hash-router';

const router = new HashRouter([
  { id: 'shard_0', connection: { /* PostgreSQL config */ } },
  { id: 'shard_1', connection: { /* PostgreSQL config */ } },
]);

// Automatically routes to correct shard
const user = await router.query('user_123', 'SELECT * FROM users WHERE id = $1', ['user_123']);
```
```typescript
import { RangeRouter } from './templates/range-router';

const router = new RangeRouter(shardConfigs, [
  { start: Date.parse('2024-01-01'), end: Date.parse('2024-04-01'), shardId: 'shard_q1' },
  { start: Date.parse('2024-04-01'), end: Date.parse('2024-07-01'), shardId: 'shard_q2' },
  { start: Date.parse('2024-07-01'), end: Infinity, shardId: 'shard_q3' },
]);

// Range queries target specific shards
const janEvents = await router.queryRange(
  Date.parse('2024-01-01'),
  Date.parse('2024-02-01'),
  'SELECT * FROM events WHERE created_at BETWEEN $1 AND $2'
);
```
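Internally, a range router's lookup is just an interval search. A minimal sketch with simplified placeholder types mirroring the config above:

```typescript
// Sketch of the interval lookup inside a range router.
interface ShardRange {
  start: number;
  end: number; // exclusive upper bound
  shardId: string;
}

function findShard(ranges: ShardRange[], key: number): string | undefined {
  // Linear scan is fine for a handful of ranges; a production router
  // would binary-search a sorted range list.
  return ranges.find(r => key >= r.start && key < r.end)?.shardId;
}

const ranges: ShardRange[] = [
  { start: Date.parse('2024-01-01'), end: Date.parse('2024-04-01'), shardId: 'shard_q1' },
  { start: Date.parse('2024-04-01'), end: Date.parse('2024-07-01'), shardId: 'shard_q2' },
  { start: Date.parse('2024-07-01'), end: Infinity, shardId: 'shard_q3' },
];
```

The open-ended `Infinity` range catches all future writes, which is also why range sharding needs periodic range splits to avoid the recent-shard hotspot described earlier.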
```typescript
import { DirectoryRouter } from './templates/directory-router';

const router = new DirectoryRouter(directoryDBConfig, shardConfigs);

// Assign tenant to specific shard
await router.assignShard('tenant_acme', 'shard_enterprise');

// Route automatically
const users = await router.query('tenant_acme', 'SELECT * FROM users');
```
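The directory pattern itself is just an explicit lookup table. A minimal in-memory sketch (the real `DirectoryRouter` persists this mapping in a directory database and caches lookups, which is where the lookup-overhead and SPOF trade-offs come from):

```typescript
// Minimal in-memory sketch of a tenant → shard directory.
class ShardDirectory {
  private assignments = new Map<string, string>();

  assign(tenantId: string, shardId: string): void {
    this.assignments.set(tenantId, shardId);
  }

  lookup(tenantId: string): string {
    const shardId = this.assignments.get(tenantId);
    if (!shardId) throw new Error(`No shard assigned for ${tenantId}`);
    return shardId;
  }
}

const directory = new ShardDirectory();
directory.assign('tenant_acme', 'shard_enterprise');
const target = directory.lookup('tenant_acme'); // 'shard_enterprise'
```

Because assignments are explicit, rebalancing a tenant is a directory update plus a data copy, with no rehashing of other tenants.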
Load the matching reference as needed:

- references/sharding-strategies.md - comparing hash, range, and directory strategies
- references/shard-key-selection.md - evaluating shard key candidates
- references/implementation-patterns.md - building the router layer
- references/cross-shard-queries.md - scatter-gather and aggregation patterns
- references/rebalancing-guide.md - adding shards and migrating data
- references/error-catalog.md - diagnosing any of the 10 documented errors
Implementation checklist areas:

- Before sharding
- Router implementation
- Shard configuration
- Application changes
Before (single database overwhelmed):

```typescript
import { Pool } from 'pg';

// Single PostgreSQL instance
const db = new Pool({ host: 'db.example.com' });

// All 10M users on one server
const users = await db.query('SELECT * FROM users WHERE status = $1', ['active']);
// Query time: 5000ms (slow!)
// DB CPU: 95%
// Disk: 500GB, growing
```
After (sharded across 8 servers):

```typescript
// Hash-based sharding with 8 shards
const router = new HashRouter([
  { id: 'shard_0', connection: { host: 'db0.example.com' } },
  { id: 'shard_1', connection: { host: 'db1.example.com' } },
  // ... 6 more shards
]);

// Query single user (targets 1 shard)
const user = await router.query('user_123', 'SELECT * FROM users WHERE id = $1', ['user_123']);
// Query time: 10ms (500x faster!)

// Query all shards (scatter-gather)
const allActive = await router.queryAll('SELECT * FROM users WHERE status = $1', ['active']);
// Query time: 800ms (parallelized across 8 shards, ~6x faster than single)

// Result: Each shard handles ~1.25M users
// DB CPU per shard: 20%
// Disk per shard: 65GB
// Can scale to 16 shards easily (consistent hashing)
```
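The `queryAll` scatter-gather above can be sketched as a parallel fan-out and merge. The `Shard` interface here is a simplified placeholder for illustration:

```typescript
// Sketch of scatter-gather: run the query on every shard in parallel,
// then merge the per-shard result sets into one list.
interface Shard {
  id: string;
  query(sql: string, params: unknown[]): Promise<unknown[]>;
}

async function queryAll(shards: Shard[], sql: string, params: unknown[]): Promise<unknown[]> {
  // Total latency is bounded by the slowest shard, not the sum of all.
  const perShard = await Promise.all(shards.map(s => s.query(sql, params)));
  return perShard.flat();
}
```

Cross-shard sorting, limits, and aggregates need extra merge logic on top of this; see templates/cross-shard-aggregation.ts.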
All 10 documented errors are prevented by these patterns. See references/error-catalog.md for detailed fixes.
Templates:

- templates/hash-router.ts - Hash-based sharding
- templates/range-router.ts - Range-based sharding
- templates/directory-router.ts - Directory-based sharding
- templates/cross-shard-aggregation.ts - Aggregation patterns

References:

- references/sharding-strategies.md - Strategy comparison
- references/shard-key-selection.md - Key selection guide
- references/implementation-patterns.md - Router implementations
- references/cross-shard-queries.md - Query patterns
- references/rebalancing-guide.md - Migration strategies
- references/error-catalog.md - All 10 errors documented

Production Examples: see references/sharding-strategies.md (Instagram, Discord, Salesforce).
Production-tested | 10 errors prevented | MIT License