Performance optimization patterns for Mem0 memory operations including query optimization, caching strategies, embedding efficiency, database tuning, batch operations, and cost reduction for both Platform and OSS deployments. Use when optimizing memory performance, reducing costs, improving query speed, implementing caching, tuning database performance, analyzing bottlenecks, or when user mentions memory optimization, performance tuning, cost reduction, slow queries, caching, or Mem0 optimization.
This skill is limited to a specific set of tools and ships with the following assets:
- QUICK_START.md
- README.md
- examples/before-after-benchmarks.md
- examples/optimization-case-studies.md
- scripts/analyze-costs.sh
- scripts/analyze-performance.sh
- scripts/deduplicate-memories.sh
- scripts/diagnose-slow-queries.sh
- scripts/generate-cache-config.sh
- scripts/setup-monitoring.sh
- scripts/suggest-vector-db.sh
- templates/cache-strategies/redis-cache.py
- templates/embedding-configs/cost-optimized.py
- templates/embedding-configs/performance-optimized.py
- templates/optimized-memory-config.py

Performance optimization patterns and tools for Mem0 memory systems. This skill provides comprehensive optimization techniques for query performance, cost reduction, caching strategies, and infrastructure tuning for both Platform and OSS deployments.
Start by analyzing your current memory system performance:
```bash
bash scripts/analyze-performance.sh [project_name]
```
This generates a comprehensive performance report including:
Review the output to identify optimization priorities:
Optimize memory search operations for speed and efficiency.
Problem: Retrieving too many results increases latency and costs.
Solution: Use appropriate limit values based on use case.
```python
# ❌ BAD: Using default or excessive limits
memories = memory.search(query, user_id=user_id)  # Default: 10

# ✅ GOOD: Optimized limits
memories = memory.search(query, user_id=user_id, limit=5)   # Chat apps
memories = memory.search(query, user_id=user_id, limit=3)   # Quick context
memories = memory.search(query, user_id=user_id, limit=10)  # RAG systems
```
Impact: 30-40% reduction in query time
Guidelines:
Problem: Searching entire index is slow and expensive.
Solution: Apply filters to narrow search scope.
```python
# ❌ BAD: Full index scan
memories = memory.search(query)

# ✅ GOOD: Filtered search
memories = memory.search(
    query,
    filters={
        "user_id": user_id,
        "categories": ["preferences", "profile"]
    },
    limit=5
)

# ✅ BETTER: Multiple filter conditions
memories = memory.search(
    query,
    filters={
        "AND": [
            {"user_id": user_id},
            {"agent_id": "support_v2"},
            {"created_after": "2025-01-01"}
        ]
    },
    limit=5
)
```
Impact: 40-60% reduction in query time
Available Filters:
- user_id: Scope to specific user
- agent_id: Scope to specific agent
- run_id: Scope to session/run
- categories: Filter by memory categories
- metadata: Custom metadata filters
- created_after, created_before: Filter by creation date

Problem: Default reranking may be overkill for simple queries.
Solution: Configure reranker based on accuracy requirements.
```python
# Platform Mode (Mem0 Cloud)
from mem0 import MemoryClient

# Disable reranking for fast, simple queries
memory = MemoryClient(api_key=api_key)
memories = memory.search(
    query,
    user_id=user_id,
    rerank=False  # 2x faster, slightly lower accuracy
)
```
```python
# OSS Mode
from mem0 import Memory
from mem0.configs.base import MemoryConfig

# Use lightweight reranker
config = MemoryConfig(
    reranker={
        "provider": "cohere",
        "config": {
            "model": "rerank-english-v3.0",  # Fast model
            "top_n": 5  # Rerank only top results
        }
    }
)
memory = Memory(config)
```
Reranker Options:
Decision Guide:
Problem: Blocking operations limit throughput.
Solution: Use async for high-concurrency scenarios.
```python
import asyncio
from mem0 import AsyncMemory

async def get_user_context(user_id: str, queries: list[str]):
    memory = AsyncMemory()
    # Run multiple searches concurrently
    results = await asyncio.gather(*[
        memory.search(q, user_id=user_id, limit=3)
        for q in queries
    ])
    return results

# Usage
contexts = await get_user_context(
    "user_123",
    ["preferences", "recent activity", "goals"]
)
```
Impact: 3-5x throughput improvement under load
Implement multi-layer caching to reduce API calls and improve response times.
Use for: Frequently accessed, rarely changing data (user preferences).
```python
from functools import lru_cache

@lru_cache(maxsize=1000)
def get_user_preferences(user_id: str) -> list:
    """Cache user preferences (note: lru_cache has no TTL; clear it explicitly on updates)."""
    return memory.search(
        "user preferences",
        user_id=user_id,
        limit=5
    )

# Clear cache when preferences update
get_user_preferences.cache_clear()
```
Impact: Near-instant response for cached queries
Configuration:
- maxsize=1000: Cache up to 1000 users' preferences

Use for: Shared caching across services, TTL control.
```python
import hashlib
import json

import redis

redis_client = redis.Redis(host='localhost', port=6379, db=0)

def get_user_context_cached(user_id: str, query: str) -> list:
    # Generate cache key
    cache_key = f"mem0:search:{user_id}:{hashlib.md5(query.encode()).hexdigest()}"

    # Check cache
    cached = redis_client.get(cache_key)
    if cached:
        return json.loads(cached)

    # Cache miss - query Mem0
    result = memory.search(query, user_id=user_id, limit=5)

    # Cache result (5 minute TTL)
    redis_client.setex(
        cache_key,
        300,  # 5 minutes
        json.dumps(result)
    )
    return result

# Invalidate cache on update
def update_memory(user_id: str, message: str):
    memory.add(message, user_id=user_id)
    # Clear user's cache
    pattern = f"mem0:search:{user_id}:*"
    for key in redis_client.scan_iter(match=pattern):
        redis_client.delete(key)
```
Impact: 50-70% reduction in API calls
TTL Guidelines:
Use the caching template generator:
```bash
bash scripts/generate-cache-config.sh redis [ttl_seconds]
```
Use for: Global applications, very high traffic.
See template: templates/edge-cache-config.yaml
Optimize embedding generation and storage costs.
Problem: Oversized embeddings increase cost and latency.
Solution: Match model to use case.
```python
from mem0 import Memory
from mem0.configs.base import MemoryConfig

# ❌ EXPENSIVE: Large model for simple data
config = MemoryConfig(
    embedder={
        "provider": "openai",
        "config": {
            "model": "text-embedding-3-large",  # 3072 dims, $0.13/1M tokens
        }
    }
)

# ✅ OPTIMIZED: Appropriate model
config = MemoryConfig(
    embedder={
        "provider": "openai",
        "config": {
            "model": "text-embedding-3-small",  # 1536 dims, $0.02/1M tokens
        }
    }
)
```
Model Selection Guide:
| Use Case | Recommended Model | Dimensions | Cost |
|---|---|---|---|
| User preferences | text-embedding-3-small | 1536 | $0.02/1M |
| Simple chat context | text-embedding-3-small | 1536 | $0.02/1M |
| Advanced RAG | text-embedding-3-large | 3072 | $0.13/1M |
| Multilingual | text-embedding-3-large | 3072 | $0.13/1M |
| Legacy deployments | text-embedding-ada-002 | 1536 | $0.10/1M |
Impact: 70-85% cost reduction with appropriate model selection
Problem: Individual embedding calls have overhead.
Solution: Batch multiple texts for embedding.
```python
# ❌ BAD: Individual embedding calls
for message in messages:
    memory.add(message, user_id=user_id)  # Separate API call each

# ✅ GOOD: Batched operation
memory.add(messages, user_id=user_id)  # Single batched call
```
Impact: 40-60% reduction in embedding costs
Batch Size Guidelines:
Problem: Re-embedding same text wastes costs.
Solution: Cache embeddings for frequent queries.
```python
import hashlib

embedding_cache = {}

def get_or_create_embedding(text: str) -> list[float]:
    # Generate hash of text
    text_hash = hashlib.sha256(text.encode()).hexdigest()

    # Check cache
    if text_hash in embedding_cache:
        return embedding_cache[text_hash]

    # Generate embedding (generate_embedding stands in for your embedder call)
    embedding = generate_embedding(text)
    embedding_cache[text_hash] = embedding
    return embedding
```
Use Cases:
Optimize vector database performance for self-hosted deployments.
Decision Matrix:
```bash
bash scripts/suggest-vector-db.sh
```
| Database | Best For | Performance | Setup Complexity |
|---|---|---|---|
| Qdrant | Production, high scale | Excellent | Medium |
| Chroma | Development, prototyping | Good | Low |
| pgvector | Existing PostgreSQL | Good | Low |
| Milvus | Enterprise, billions of vectors | Excellent | High |
Recommendation:
For Qdrant:
```python
config = MemoryConfig(
    vector_store={
        "provider": "qdrant",
        "config": {
            "collection_name": "memories",
            "host": "localhost",
            "port": 6333,
            "on_disk": True,  # Reduce memory usage
            "hnsw_config": {
                "m": 16,  # Balance between speed and accuracy
                "ef_construct": 200  # Higher = better quality
            },
            "quantization_config": {
                "scalar": {
                    "type": "int8",  # Reduce storage by 4x
                    "quantile": 0.99
                }
            }
        }
    }
)
```
For pgvector (Supabase):
```bash
# Use the specialized Supabase integration skill
bash ../supabase-integration/scripts/optimize-pgvector.sh
```
See: templates/vector-db-optimization/ for database-specific configs.
Problem: Creating new connections on each request is slow.
Solution: Use connection pooling.
```python
from mem0 import Memory
from mem0.configs.base import MemoryConfig

config = MemoryConfig(
    vector_store={
        "provider": "qdrant",
        "config": {
            "host": "localhost",
            "port": 6333,
            "grpc_port": 6334,
            "prefer_grpc": True,  # Faster protocol
            "timeout": 5,
            "connection_pool_size": 50  # Reuse connections
        }
    }
)
```
Impact: 30-50% reduction in connection overhead
Pool Size Guidelines:
Optimize bulk operations for efficiency.
```python
# ❌ BAD: Individual operations
for msg in conversation_history:
    memory.add(msg, user_id=user_id)

# ✅ GOOD: Batched operation
memory.add(conversation_history, user_id=user_id)
```
Impact: 60% faster, 40% lower cost
```python
import asyncio
from mem0 import AsyncMemory

# ❌ BAD: Sequential searches
results = []
for query in queries:
    results.append(memory.search(query, user_id=user_id))

# ✅ GOOD: Parallel searches
async def batch_search(queries, user_id):
    memory = AsyncMemory()
    return await asyncio.gather(*[
        memory.search(q, user_id=user_id, limit=5)
        for q in queries
    ])

results = await batch_search(queries, user_id)
```
Impact: 4-5x faster for multiple searches
Reduce operational costs for memory systems.
```bash
bash scripts/analyze-costs.sh [user_id] [date_range]
```
This generates:
Strategy 1: Memory Deduplication
```bash
bash scripts/deduplicate-memories.sh [user_id]
```
Removes similar/duplicate memories to reduce storage and query costs.
Impact: 20-40% storage reduction
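For a sense of what the script does under the hood, here is a minimal sketch of similarity-based deduplication, assuming the OSS `Memory` API (`get_all`, `delete`); the exact return shape of `get_all` varies by Mem0 version, and `embed` is a hypothetical helper wrapping your embedder:

```python
import numpy as np

def dedupe_memories(memory, user_id: str, threshold: float = 0.92):
    """Delete near-duplicate memories, keeping the first occurrence of each."""
    kept_vectors = []
    for item in memory.get_all(user_id=user_id):
        vec = np.asarray(embed(item["memory"]), dtype=float)  # embed() is hypothetical
        vec /= np.linalg.norm(vec)
        # Cosine similarity against every memory we have decided to keep
        if any(float(kept @ vec) > threshold for kept in kept_vectors):
            memory.delete(memory_id=item["id"])  # near-duplicate: drop it
        else:
            kept_vectors.append(vec)
```

Thresholds around 0.9-0.95 are a reasonable starting point; verify on a sample before deleting in bulk.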
Strategy 2: Archival and Tiered Storage
```bash
bash scripts/setup-memory-archival.sh [retention_days]
```
Move old memories to cheaper storage:
Impact: 50-70% storage cost reduction
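Conceptually the archival pass is a filtered move. A hedged sketch, assuming `get_all`/`delete` from the OSS API, ISO-formatted `created_at` timestamps, and a hypothetical `archive_store.save` cold-storage writer:

```python
from datetime import datetime, timedelta, timezone

def archive_old_memories(memory, archive_store, user_id: str, retention_days: int = 90):
    """Move memories older than retention_days out of the hot vector store."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=retention_days)
    for item in memory.get_all(user_id=user_id):
        created = datetime.fromisoformat(item["created_at"].replace("Z", "+00:00"))
        if created < cutoff:
            archive_store.save(item)             # hypothetical cold-storage writer
            memory.delete(memory_id=item["id"])  # free the hot index
```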
Strategy 3: Smaller Embeddings for Archives
```python
# Use a cheaper embedding model for archived memories
archived_config = MemoryConfig(
    embedder={
        "provider": "openai",
        "config": {
            "model": "text-embedding-ada-002",  # Cheaper than text-embedding-3-large
        }
    }
)
```
Strategy 4: Smart Pruning
```bash
bash scripts/prune-low-value-memories.sh [user_id] [score_threshold]
```
Remove memories that:
Impact: 30-50% cost reduction
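Pruning follows the same pattern as the other strategies: a filtered delete. A sketch, assuming a hypothetical relevance score stored in each memory's metadata:

```python
def prune_low_value(memory, user_id: str, score_threshold: float = 0.2):
    """Delete memories whose (hypothetical) metadata score falls below the threshold."""
    for item in memory.get_all(user_id=user_id):
        score = (item.get("metadata") or {}).get("score", 1.0)  # default: keep
        if score < score_threshold:
            memory.delete(memory_id=item["id"])
```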
Set up performance monitoring and alerts.
```bash
bash scripts/setup-monitoring.sh [project_name]
```
Tracks:
Use the alert configuration template:
```bash
bash scripts/generate-alert-config.sh
```
Recommended Alerts:
Benchmark and validate optimizations.
```bash
bash scripts/benchmark-performance.sh [config_name]
```
Measures:
```bash
bash scripts/load-test.sh [concurrent_users] [duration_seconds]
```
Simulates real-world load to identify bottlenecks.
```bash
bash scripts/compare-configs.sh [config1] [config2]
```
A/B test different optimization strategies.
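For a quick in-process check before reaching for the scripts, a simple latency micro-benchmark might look like the sketch below (the queries, run count, and percentile choices are illustrative); run it once per configuration to A/B test by hand:

```python
import statistics
import time

def benchmark_search(memory, user_id: str, queries: list[str], runs: int = 20):
    """Measure search latency and report p50/p95 in milliseconds."""
    latencies = []
    for _ in range(runs):
        for q in queries:
            start = time.perf_counter()
            memory.search(q, user_id=user_id, limit=5)
            latencies.append((time.perf_counter() - start) * 1000)
    latencies.sort()
    p50 = statistics.median(latencies)
    p95 = latencies[int(len(latencies) * 0.95) - 1]
    print(f"p50={p50:.1f}ms  p95={p95:.1f}ms  over {len(latencies)} searches")
```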
```bash
bash scripts/diagnose-slow-queries.sh
```
Diagnostic Flow:
```bash
bash scripts/diagnose-high-costs.sh
```
Diagnostic Flow:
```bash
bash scripts/optimize-cache.sh
```
Diagnostic Flow:
Above 80% hit rate? → Excellent, no action needed
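To know your hit rate in the first place, instrument the cache lookup with counters. A sketch, reusing the key scheme and the `redis_client`/`memory` objects from the Redis example above (module-level counters are simplified; use your metrics library in production):

```python
import hashlib
import json

hits = 0
misses = 0

def cached_search(user_id: str, query: str) -> list:
    """Same lookup as get_user_context_cached, instrumented with hit/miss counters."""
    global hits, misses
    cache_key = f"mem0:search:{user_id}:{hashlib.md5(query.encode()).hexdigest()}"
    cached = redis_client.get(cache_key)
    if cached:
        hits += 1
        return json.loads(cached)
    misses += 1
    result = memory.search(query, user_id=user_id, limit=5)
    redis_client.setex(cache_key, 300, json.dumps(result))
    return result

def hit_rate() -> float:
    total = hits + misses
    return hits / total if total else 0.0
```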
Scripts (all functional):
- scripts/analyze-performance.sh - Comprehensive performance analysis
- scripts/analyze-costs.sh - Cost breakdown and optimization
- scripts/benchmark-performance.sh - Performance benchmarking
- scripts/load-test.sh - Load testing and stress testing
- scripts/compare-configs.sh - A/B test configurations
- scripts/diagnose-slow-queries.sh - Query performance diagnostics
- scripts/diagnose-high-costs.sh - Cost diagnostics
- scripts/optimize-cache.sh - Cache tuning recommendations
- scripts/deduplicate-memories.sh - Remove duplicate memories
- scripts/prune-low-value-memories.sh - Remove unused memories
- scripts/setup-memory-archival.sh - Configure archival system
- scripts/setup-monitoring.sh - Configure performance monitoring
- scripts/generate-alert-config.sh - Create alert rules
- scripts/generate-cache-config.sh - Generate cache configurations
- scripts/suggest-vector-db.sh - Vector database recommendations

Templates:

- templates/optimized-memory-config.py - Production-ready configuration
- templates/cache-strategies/ - Caching implementation patterns
  - in-memory-cache.py - Python LRU cache
  - redis-cache.py - Redis caching layer
  - edge-cache-config.yaml - CDN/edge caching
- templates/vector-db-optimization/ - Database-specific tuning
  - qdrant-config.py - Optimized Qdrant setup
  - pgvector-config.py - Optimized pgvector setup
  - milvus-config.py - Optimized Milvus setup
- templates/embedding-configs/ - Embedding optimization
  - cost-optimized.py - Minimal cost configuration
  - performance-optimized.py - Maximum performance
  - balanced.py - Cost/performance balance
- templates/monitoring/ - Monitoring configurations
  - prometheus-metrics.yaml - Metrics collection
  - grafana-dashboard.json - Performance dashboard
  - alert-rules.yaml - Alert configurations

Examples:

- examples/optimization-case-studies.md - Real-world optimization examples
- examples/before-after-benchmarks.md - Performance improvement results
- examples/cost-reduction-strategies.md - Cost optimization success stories
- examples/caching-patterns.md - Effective caching implementations
- examples/oss-vs-platform-optimization.md - Platform-specific strategies

Query Latency:
Cache Performance:
Cost Efficiency:
Resource Usage (OSS):
Slow Queries Despite Optimization:
Cache Not Improving Performance:
High Costs After Optimization:
Optimization Caused Accuracy Issues:
Plugin: mem0 | Version: 1.0.0 | Last Updated: 2025-10-27