RAG (Retrieval Augmented Generation) implementation patterns including document chunking, embedding generation, vector database integration, semantic search, and RAG pipelines. Use when building RAG systems, implementing semantic search, creating knowledge bases, or when user mentions RAG, embeddings, vector database, retrieval, document chunking, or knowledge retrieval.
This skill is limited to using the following tools:
- scripts/validate-rag-setup.sh
- templates/rag-pipeline.ts

Purpose: Provide complete RAG pipeline templates, chunking strategies, vector database schemas, and retrieval patterns for building production-ready RAG systems with the Vercel AI SDK.
Activation Triggers:
- Building RAG systems or knowledge bases
- Implementing semantic search or knowledge retrieval
- User mentions RAG, embeddings, vector databases, retrieval, or document chunking
Key Resources:
- templates/rag-pipeline.ts - Complete RAG pipeline template
- templates/vector-db-schemas/ - Database schemas for Pinecone, Chroma, pgvector, Weaviate
- templates/chunking-strategies.ts - Document chunking implementations
- templates/retrieval-patterns.ts - Semantic search and hybrid search patterns
- scripts/chunk-documents.sh - Document chunking utility
- scripts/generate-embeddings.sh - Batch embedding generation
- scripts/validate-rag-setup.sh - Validate RAG configuration
- examples/ - Complete RAG implementations (chatbot, Q&A, search)

Template: templates/rag-pipeline.ts
Workflow:
```typescript
import { embed, embedMany, generateText } from 'ai'
import { openai } from '@ai-sdk/openai'

// 1. Ingest documents
const documents = await loadDocuments()

// 2. Chunk documents
const chunks = await chunkDocuments(documents, {
  chunkSize: 1000,
  overlap: 200,
  strategy: 'semantic'
})

// 3. Generate embeddings
const embeddings = await embedMany({
  model: openai.embedding('text-embedding-3-small'),
  values: chunks.map(c => c.text)
})

// 4. Store in vector DB
await vectorDB.upsert(chunks.map((chunk, i) => ({
  id: chunk.id,
  embedding: embeddings.embeddings[i],
  metadata: chunk.metadata
})))

// 5. Retrieve relevant chunks
const query = await embed({
  model: openai.embedding('text-embedding-3-small'),
  value: userQuestion
})
const results = await vectorDB.query({
  vector: query.embedding,
  topK: 5
})

// 6. Generate response with context
const response = await generateText({
  model: openai('gpt-4o'),
  messages: [
    {
      role: 'system',
      content: `Answer based on this context:\n\n${results.map(r => r.text).join('\n\n')}`
    },
    { role: 'user', content: userQuestion }
  ]
})
```
Chunking Strategies:

1. Fixed-Size Chunking
When to use: Simple documents, consistent structure
Template: templates/chunking-strategies.ts#fixedSize
```typescript
function chunkByFixedSize(text: string, chunkSize: number, overlap: number): string[] {
  const chunks: string[] = []
  for (let i = 0; i < text.length; i += chunkSize - overlap) {
    chunks.push(text.slice(i, i + chunkSize))
  }
  return chunks
}
```
Best for: Articles, blog posts, documentation
2. Semantic Chunking
When to use: Preserve meaning and context
Template: templates/chunking-strategies.ts#semantic
```typescript
function chunkBySemantic(text: string): string[] {
  // Split on paragraphs, headings, or natural breaks
  const sections = text.split(/\n\n+/)
  const chunks: string[] = []
  let currentChunk = ''
  for (const section of sections) {
    if ((currentChunk + section).length > 1000) {
      if (currentChunk) chunks.push(currentChunk.trim())
      currentChunk = section
    } else {
      currentChunk += '\n\n' + section
    }
  }
  if (currentChunk) chunks.push(currentChunk.trim())
  return chunks
}
```
Best for: Books, research papers, structured content
3. Recursive Chunking
When to use: Hierarchical documents with sections/subsections
Template: templates/chunking-strategies.ts#recursive
Best for: Technical docs, manuals, legal documents
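The recursive strategy splits on progressively finer separators (sections, paragraphs, sentences) until each piece fits the size budget. The full implementation lives in templates/chunking-strategies.ts#recursive; the sketch below is only illustrative, and the separator hierarchy and hard-split fallback are assumptions:

```typescript
// Minimal sketch: split on the coarsest separator first, then recurse
// into any piece that is still over the size budget.
function chunkRecursive(
  text: string,
  maxSize = 1000,
  separators = ['\n\n', '\n', '. ', ' ']
): string[] {
  if (text.length <= maxSize) return [text]
  const [sep, ...rest] = separators
  if (sep === undefined) {
    // No separators left: fall back to a hard character split
    const chunks: string[] = []
    for (let i = 0; i < text.length; i += maxSize) {
      chunks.push(text.slice(i, i + maxSize))
    }
    return chunks
  }
  return text
    .split(sep)
    .filter(piece => piece.trim().length > 0)
    .flatMap(piece => chunkRecursive(piece, maxSize, rest))
}
```

A production version would also merge small adjacent pieces back up toward maxSize so chunk sizes stay roughly uniform.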
Vector Database Options:

1. Pinecone (Fully Managed)
Template: templates/vector-db-schemas/pinecone-schema.ts
```typescript
import { Pinecone } from '@pinecone-database/pinecone'

const pinecone = new Pinecone({
  apiKey: process.env.PINECONE_API_KEY!
})
const index = pinecone.index('knowledge-base')

// Upsert embeddings
await index.upsert([
  {
    id: 'doc-1-chunk-1',
    values: embedding,
    metadata: {
      text: chunk.text,
      source: chunk.source,
      timestamp: Date.now()
    }
  }
])

// Query
const results = await index.query({
  vector: queryEmbedding,
  topK: 5,
  includeMetadata: true
})
```
2. Chroma (Open Source)
Template: templates/vector-db-schemas/chroma-schema.ts
Best for: Local development, prototyping
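A minimal sketch with the chromadb JS client; the collection name and metadata fields are illustrative, and the canonical schema is in templates/vector-db-schemas/chroma-schema.ts:

```typescript
import { ChromaClient } from 'chromadb'

// Assumes a Chroma server reachable at the default local address
const chroma = new ChromaClient()
const collection = await chroma.getOrCreateCollection({ name: 'knowledge-base' })

// Store chunks with precomputed embeddings
await collection.add({
  ids: chunks.map(c => c.id),
  embeddings: chunks.map((_, i) => embeddings.embeddings[i]),
  documents: chunks.map(c => c.text),
  metadatas: chunks.map(c => c.metadata)
})

// Query by embedding
const results = await collection.query({
  queryEmbeddings: [queryEmbedding],
  nResults: 5
})
```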
3. pgvector (Postgres Extension)
Template: templates/vector-db-schemas/pgvector-schema.sql
Best for: Existing Postgres infrastructure, cost-effective
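A hedged sketch of querying pgvector from Node with the pg client; the table and column names are illustrative, and the canonical DDL is in templates/vector-db-schemas/pgvector-schema.sql:

```typescript
import { Pool } from 'pg'

const pool = new Pool({ connectionString: process.env.DATABASE_URL })

// Assumed schema (run once):
//   CREATE EXTENSION IF NOT EXISTS vector;
//   CREATE TABLE chunks (id text PRIMARY KEY, text text, embedding vector(1536));

// Insert a chunk; pgvector accepts a '[x,y,...]' string literal,
// which JSON.stringify produces for a number array
await pool.query(
  'INSERT INTO chunks (id, text, embedding) VALUES ($1, $2, $3)',
  [chunk.id, chunk.text, JSON.stringify(embedding)]
)

// Nearest neighbors by cosine distance (the <=> operator)
const { rows } = await pool.query(
  'SELECT id, text FROM chunks ORDER BY embedding <=> $1 LIMIT 5',
  [JSON.stringify(queryEmbedding)]
)
```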
4. Weaviate (Open Source/Cloud)
Template: templates/vector-db-schemas/weaviate-schema.ts
Best for: Advanced filtering, hybrid search
Retrieval Patterns:

Simple Semantic Search
Template: templates/retrieval-patterns.ts#simpleSearch
```typescript
async function semanticSearch(query: string, topK: number = 5) {
  // Embed the query
  const { embedding } = await embed({
    model: openai.embedding('text-embedding-3-small'),
    value: query
  })
  // Search the vector DB
  const results = await vectorDB.query({
    vector: embedding,
    topK
  })
  return results
}
```
Hybrid Search
Template: templates/retrieval-patterns.ts#hybridSearch
```typescript
async function hybridSearch(query: string, topK: number = 10) {
  // Vector search
  const vectorResults = await semanticSearch(query, topK)
  // Keyword search (BM25 or full-text)
  const keywordResults = await fullTextSearch(query, topK)
  // Combine and re-rank
  const combined = rerank(vectorResults, keywordResults)
  return combined.slice(0, topK)
}
```
Best practice: Use hybrid search for better recall
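The fullTextSearch and rerank helpers above are assumed, not defined here. One simple, widely used way to implement rerank is reciprocal rank fusion, sketched below for results that carry a stable id:

```typescript
// Reciprocal rank fusion (RRF): score each document by the sum of
// 1 / (k + rank) across result lists, then sort by combined score.
function rerank<T extends { id: string }>(...lists: T[][]): T[] {
  const k = 60 // standard RRF damping constant
  const scores = new Map<string, { item: T; score: number }>()
  for (const list of lists) {
    list.forEach((item, rank) => {
      const entry = scores.get(item.id) ?? { item, score: 0 }
      entry.score += 1 / (k + rank + 1)
      scores.set(item.id, entry)
    })
  }
  return [...scores.values()]
    .sort((a, b) => b.score - a.score)
    .map(e => e.item)
}
```

RRF needs no score calibration between the vector and keyword searches, which is why it is a common default for hybrid retrieval.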
Re-ranking
Template: templates/retrieval-patterns.ts#reranking
```typescript
import { generateObject } from 'ai'
import { z } from 'zod'

async function rerankResults(query: string, results: any[]) {
  // Use a cross-encoder or an LLM for re-ranking
  const reranked = await generateObject({
    model: openai('gpt-4o'),
    schema: z.object({
      rankedIds: z.array(z.string())
    }),
    messages: [
      {
        role: 'system',
        content: 'Rank these documents by relevance to the query.'
      },
      {
        role: 'user',
        content: `Query: ${query}\n\nDocuments: ${JSON.stringify(results)}`
      }
    ]
  })
  return reranked.object.rankedIds.map(id =>
    results.find(r => r.id === id)
  )
}
```
```bash
# Check dependencies and configuration
./scripts/validate-rag-setup.sh
```
Checks:
Decision tree:
Considerations:
```bash
# Batch generate embeddings
./scripts/generate-embeddings.sh ./documents/ openai
```
Optimization:
- Use embedMany for batch processing

Use template: templates/retrieval-patterns.ts
Customize:
Pattern:
```typescript
const context = retrievedChunks.map(chunk => chunk.text).join('\n\n')

const response = await generateText({
  model: openai('gpt-4o'),
  messages: [
    {
      role: 'system',
      content: `Answer based on this context. If the answer is not in the context, say so.\n\nContext:\n${context}`
    },
    { role: 'user', content: query }
  ]
})
```
Guideline:
- Test with your data: run scripts/chunk-documents.sh with different chunk sizes
Embedding Model Options:
- OpenAI text-embedding-3-small: 1536 dimensions; low cost, solid default
- OpenAI text-embedding-3-large: 3072 dimensions; higher quality at higher cost
- Cohere embed-english-v3.0: 1024 dimensions; strong English retrieval performance
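Whichever model you choose, embed documents and queries with the same model, since vectors from different models are not comparable. For quick relevance experiments, the AI SDK exports a cosineSimilarity helper:

```typescript
import { embedMany, cosineSimilarity } from 'ai'
import { openai } from '@ai-sdk/openai'

// Embed two texts with the same model and compare them directly
const { embeddings } = await embedMany({
  model: openai.embedding('text-embedding-3-small'),
  values: ['how do I reset my password?', 'password reset instructions']
})
console.log(cosineSimilarity(embeddings[0], embeddings[1]))
```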
Multi-query retrieval:
```typescript
// Generate multiple query variations
const { text } = await generateText({
  model: openai('gpt-4o'),
  messages: [{
    role: 'user',
    content: `Generate 3 variations of this query, one per line: "${query}"`
  }]
})
const variations = [query, ...text.split('\n').filter(Boolean)]

// Search with all variations and combine results
const allResults = await Promise.all(
  variations.map(v => semanticSearch(v))
)
const combined = deduplicateAndRank(allResults.flat())
```
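deduplicateAndRank is assumed above; a minimal version, assuming each result exposes an id and a similarity score, could be:

```typescript
// Deduplicate by id, keeping each document's best score,
// then sort by that score descending.
function deduplicateAndRank<T extends { id: string; score: number }>(
  results: T[]
): T[] {
  const best = new Map<string, T>()
  for (const r of results) {
    const existing = best.get(r.id)
    if (!existing || r.score > existing.score) best.set(r.id, r)
  }
  return [...best.values()].sort((a, b) => b.score - a.score)
}
```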
Error handling:

```typescript
try {
  const results = await ragPipeline(query)
  return results
} catch (error: any) {
  if (error.code === 'RATE_LIMIT') {
    // Implement exponential backoff and retry
  } else if (error.code === 'VECTOR_DB_ERROR') {
    // Fallback to keyword search
  }
  throw error
}
```
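For the rate-limit branch, a small retry helper with exponential backoff and jitter (the attempt count and base delay are arbitrary defaults) might look like:

```typescript
// Retry an async operation with exponential backoff plus jitter.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxAttempts = 5,
  baseDelayMs = 500
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn()
    } catch (error) {
      if (attempt >= maxAttempts - 1) throw error
      const delay = baseDelayMs * 2 ** attempt * (0.5 + Math.random())
      await new Promise(resolve => setTimeout(resolve, delay))
    }
  }
}

// Usage: const results = await withBackoff(() => ragPipeline(query))
```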
```typescript
// Cache embeddings
const model = openai.embedding('text-embedding-3-small')
const cache = new Map<string, number[]>()

async function getEmbedding(text: string) {
  if (cache.has(text)) {
    return cache.get(text)!
  }
  const { embedding } = await embed({ model, value: text })
  cache.set(text, embedding)
  return embedding
}
```
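A raw-text Map key grows without bound in a long-running process; a variant that keys on a content hash and evicts oldest-first (a sketch using Node's built-in crypto, reusing embed and model from the snippet above) keeps memory predictable:

```typescript
import { createHash } from 'node:crypto'

const MAX_ENTRIES = 10_000
const hashedCache = new Map<string, number[]>()

function cacheKey(text: string): string {
  return createHash('sha256').update(text).digest('hex')
}

async function getEmbeddingHashed(text: string) {
  const key = cacheKey(text)
  const hit = hashedCache.get(key)
  if (hit) return hit
  const { embedding } = await embed({ model, value: text })
  // Naive FIFO eviction: Maps iterate in insertion order
  if (hashedCache.size >= MAX_ENTRIES) {
    const oldest = hashedCache.keys().next().value
    if (oldest !== undefined) hashedCache.delete(oldest)
  }
  hashedCache.set(key, embedding)
  return embedding
}
```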
// Track RAG metrics
```typescript
// Track RAG metrics
metrics.record({
  operation: 'rag_query',
  latency: Date.now() - startTime,
  chunksRetrieved: results.length,
  vectorDBCalls: 1,
  embeddingCost: calculateCost(query.length)
})
```
Example: examples/conversational-rag.ts
Maintains conversation context while retrieving relevant information
Example: examples/multi-document-rag.ts
Retrieves from multiple knowledge bases
Example: examples/agentic-rag.ts
Uses tools to decide when and what to retrieve
Scripts:
- chunk-documents.sh - Chunk documents with different strategies
- generate-embeddings.sh - Batch embedding generation
- validate-rag-setup.sh - Validate configuration

Templates:
- rag-pipeline.ts - Complete RAG implementation
- chunking-strategies.ts - All chunking approaches
- retrieval-patterns.ts - Search and re-ranking patterns
- vector-db-schemas/ - Database-specific schemas

Examples:
- conversational-rag.ts - Chat with memory
- multi-document-rag.ts - Multiple sources
- agentic-rag.ts - Tool-based retrieval

Supported Vector DBs: Pinecone, Chroma, pgvector, Weaviate, Qdrant
SDK Version: Vercel AI SDK 5+
Embedding Models: OpenAI, Cohere, Custom
Best Practice: Start with simple semantic search, add complexity as needed