LlamaIndex implementation patterns with templates, scripts, and examples for building RAG applications. Use when implementing LlamaIndex, building RAG pipelines, creating vector indices, setting up query engines, implementing chat engines, integrating LlamaCloud, or when user mentions LlamaIndex, RAG, VectorStoreIndex, document indexing, semantic search, or question answering systems.
This skill is limited to using specific tools.

Additional assets for this skill:

examples/chatbot-with-memory.py
examples/multi-document-rag.py
examples/question-answering.py
scripts/create-index.sh
scripts/setup-llamaindex.sh
scripts/test-llamaindex.sh
templates/basic-rag-pipeline.py
templates/custom-retriever.py
templates/llamacloud-integration.py

Comprehensive implementation patterns, templates, and examples for building production-ready RAG (Retrieval-Augmented Generation) applications with LlamaIndex.
This skill provides complete, functional implementations for:
All scripts, templates, and examples are production-ready and fully functional.
Automated LlamaIndex installation with dependency management and environment setup.
bash scripts/setup-llamaindex.sh
Features:
Output:
.env file with API key templates
requirements.txt with pinned versions
data/ directory for documents
storage/ directory for persisted indices

Create a VectorStoreIndex from documents with progress tracking.
bash scripts/create-index.sh [data_dir] [storage_dir] [index_name]
Arguments:
data_dir: Directory containing documents (default: ./data)
storage_dir: Where to persist the index (default: ./storage)
index_name: Name for the index (default: default_index)

Features:
Example:
bash scripts/create-index.sh ./documents ./indices my_knowledge_base
Comprehensive validation tests for LlamaIndex installation and configuration.
bash scripts/test-llamaindex.sh
Tests:
Output:
Complete RAG pipeline implementation with best practices.
Features:
Key Components:
class BasicRAGPipeline:
    def load_or_create_index()   # Smart index loading/creation
    def query()                  # Simple question answering
    def query_with_sources()     # Answers with citations
    def chat()                   # Interactive chat mode
Usage:
from basic_rag_pipeline import BasicRAGPipeline
pipeline = BasicRAGPipeline(
data_dir="./data",
storage_dir="./storage",
model="gpt-4o-mini"
)
pipeline.load_or_create_index()
response = pipeline.query("What is LlamaIndex?")
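For reference, citations can also be pulled straight from a llama_index query engine response via its source_nodes attribute. The sketch below uses only core llama_index calls and assumes an existing VectorStoreIndex named index; it is independent of the template's query_with_sources() helper.

# Sketch: answer a question and list the retrieved source chunks
query_engine = index.as_query_engine(similarity_top_k=3)
response = query_engine.query("What is LlamaIndex?")
print(response)
for node in response.source_nodes:
    # each retrieved chunk carries its text, similarity score, and metadata
    print(f"score={node.score} source={node.metadata.get('file_name', 'unknown')}")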
Use Cases:
Advanced retrieval strategies with filtering, reranking, and hybrid search.
Retrievers Included:
MetadataFilteredRetriever:
HybridRetriever:
RerankedRetriever:
Example:
from custom_retriever import MetadataFilteredRetriever
retriever = MetadataFilteredRetriever(
index=index,
similarity_top_k=10,
metadata_filters={"category": "technical", "year": 2024}
)
nodes = retriever.retrieve("How to deploy?")
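These retrievers presumably follow llama_index's standard custom-retriever pattern: subclass BaseRetriever and implement _retrieve(). Below is a minimal, hedged sketch of that pattern; ThresholdRetriever and its cutoff logic are illustrative, not the template's actual classes.

from typing import List
from llama_index.core.retrievers import BaseRetriever
from llama_index.core.schema import NodeWithScore, QueryBundle

class ThresholdRetriever(BaseRetriever):
    """Illustrative custom retriever: drop nodes below a similarity cutoff."""

    def __init__(self, index, similarity_top_k: int = 10, min_score: float = 0.7):
        self._base = index.as_retriever(similarity_top_k=similarity_top_k)
        self._min_score = min_score
        super().__init__()

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        nodes = self._base.retrieve(query_bundle)
        return [n for n in nodes if n.score is None or n.score >= self._min_score]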
Use Cases:
LlamaCloud managed services integration template.
Features:
Components:
LlamaParse Integration:
Managed Indices:
Example:
from llamacloud_integration import LlamaCloudRAG
rag = LlamaCloudRAG(api_key="your_key")
documents = rag.parse_with_llamaparse("complex.pdf")
rag.create_managed_index(documents, "prod-index")
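A method like parse_with_llamaparse would typically wrap the llama-parse client; a hedged sketch of the raw client usage (LlamaCloudRAG comes from the template, the rest is the public llama-parse API) looks like:

# Sketch: call LlamaParse directly (requires the llama-parse package and a LlamaCloud API key)
import os
from llama_parse import LlamaParse

parser = LlamaParse(
    api_key=os.environ["LLAMA_CLOUD_API_KEY"],  # or set it in your .env file
    result_type="markdown",                     # "markdown" or "text"
)
documents = parser.load_data("complex.pdf")     # returns llama_index Document objects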
Use Cases:
Note: Requires LlamaCloud account and llama-parse package for full functionality. Template includes fallbacks for development.
Complete Q&A system with citations and interactive mode.
Run:
python examples/question-answering.py
Features:
Demonstrates:
Conversational AI with memory management and context awareness.
Run:
python examples/chatbot-with-memory.py
Features:
Components:
class ConversationalChatbot:
    def load_knowledge_base()        # Setup knowledge
    def initialize_chat_engine()     # Configure memory
    def chat()                       # Send/receive messages
    def reset_conversation()         # Clear memory
    def get_conversation_summary()   # History summary
    def interactive_mode()           # CLI interface
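The memory wiring presumably builds on llama_index's standard chat engine components; a minimal sketch under that assumption (index is an existing VectorStoreIndex):

# Sketch: chat engine with a token-limited conversation buffer
from llama_index.core.memory import ChatMemoryBuffer

memory = ChatMemoryBuffer.from_defaults(token_limit=3000)
chat_engine = index.as_chat_engine(
    chat_mode="condense_plus_context",
    memory=memory,
    system_prompt="Answer using only the indexed documents.",
)
print(chat_engine.chat("What does the knowledge base cover?"))
print(chat_engine.chat("Summarize that in one sentence."))  # follow-up uses buffered history
chat_engine.reset()  # clear memory, like /reset in the CLI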
Commands:
/help - Show available commands
/reset - Reset conversation memory
/summary - View conversation history
/exit - Exit chatbot

Use Cases:
Advanced RAG with cross-document reasoning and filtering.
Run:
python examples/multi-document-rag.py
Features:
Components:
class MultiDocumentRAG:
    def build_index()             # Index multiple docs
    def query_by_category()       # Filtered queries
    def cross_document_query()    # Search all docs
    def compare_documents()       # Compare specific docs
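A method like query_by_category() presumably relies on llama_index metadata filters; a hedged sketch of filtered and cross-document querying with the core API (assuming documents were indexed with a "category" metadata field):

# Sketch: filter retrieval by document metadata
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

filters = MetadataFilters(filters=[ExactMatchFilter(key="category", value="technical")])
category_engine = index.as_query_engine(similarity_top_k=5, filters=filters)
print(category_engine.query("How is deployment handled?"))

# Cross-document query: no filters, search every indexed document
print(index.as_query_engine(similarity_top_k=8).query("Which topics appear in more than one document?"))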
Demonstrates:
Use Cases:
Step 1: Install Dependencies
cd plugins/rag-pipeline/skills/llamaindex-patterns
bash scripts/setup-llamaindex.sh
Step 2: Configure API Keys
Edit .env file:
OPENAI_API_KEY=sk-your-actual-key-here
ANTHROPIC_API_KEY=sk-ant-your-key-here # Optional
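If you load the keys in Python rather than exporting them in the shell, a small sketch using python-dotenv (assumed here to be in the generated requirements.txt) is:

# Sketch: load API keys from .env before creating any indices
import os
from dotenv import load_dotenv  # python-dotenv; assumed installed by the setup script

load_dotenv()
assert os.getenv("OPENAI_API_KEY"), "OPENAI_API_KEY missing - edit your .env file"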
Step 3: Validate Installation
bash scripts/test-llamaindex.sh
Option 1: Using Scripts
# 1. Add your documents to ./data directory
mkdir -p data
cp /path/to/your/docs/* data/
# 2. Create index
bash scripts/create-index.sh data storage my_index
# 3. Use the index in your code
python -c "
from llama_index.core import StorageContext, load_index_from_storage
storage_context = StorageContext.from_defaults(persist_dir='storage/my_index')
index = load_index_from_storage(storage_context)
response = index.as_query_engine().query('Your question?')
print(response)
"
Option 2: Using Templates
# Copy template to your project
cp templates/basic-rag-pipeline.py my_rag_app.py
# Customize and run
python my_rag_app.py
Option 3: Using Examples
# Run examples directly
python examples/question-answering.py
python examples/chatbot-with-memory.py
python examples/multi-document-rag.py
For Next.js Applications:
# Use in API route: app/api/chat/route.ts
# Create Python backend with FastAPI:
from fastapi import FastAPI
from basic_rag_pipeline import BasicRAGPipeline
app = FastAPI()
pipeline = BasicRAGPipeline()
pipeline.load_or_create_index()
@app.post("/query")
async def query(question: str):
    response = pipeline.query(question)
    return {"answer": response}
For FastAPI Projects:
# Integrate into existing FastAPI app
from contextlib import asynccontextmanager
from fastapi import FastAPI
from basic_rag_pipeline import BasicRAGPipeline

rag_pipeline = None

@asynccontextmanager
async def lifespan(app: FastAPI):
    global rag_pipeline
    rag_pipeline = BasicRAGPipeline()
    rag_pipeline.load_or_create_index()
    yield

app = FastAPI(lifespan=lifespan)
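A route that uses the shared pipeline might then look like this sketch; the /ask path and response shape are illustrative, not part of the template:

@app.post("/ask")
async def ask(question: str):
    # rag_pipeline was initialized in the lifespan handler above
    answer = rag_pipeline.query(question)
    return {"answer": str(answer)}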
For Standalone Python Applications:
# Use directly in your Python code
from basic_rag_pipeline import BasicRAGPipeline
def main():
    pipeline = BasicRAGPipeline(
        data_dir="./knowledge_base",
        storage_dir="./indices"
    )
    pipeline.load_or_create_index()
    while True:
        question = input("Ask: ")
        answer = pipeline.query(question)
        print(f"Answer: {answer}")

if __name__ == "__main__":
    main()
from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

node_parser = SentenceSplitter(
    chunk_size=512,   # Smaller chunks for precise retrieval
    chunk_overlap=50, # Overlap for context continuity
)
index = VectorStoreIndex.from_documents(
    documents,
    transformations=[node_parser],
)
# Use custom retriever template for routing between indices
tech_index = VectorStoreIndex.from_documents(tech_docs)
business_index = VectorStoreIndex.from_documents(business_docs)
# Route queries based on content
if "technical" in query.lower():
response = tech_index.as_query_engine().query(query)
else:
response = business_index.as_query_engine().query(query)
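For routing that does not depend on keyword matching, llama_index ships a RouterQueryEngine that lets the LLM pick the index; a hedged sketch:

# Sketch: LLM-based routing between the two indices
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector
from llama_index.core.tools import QueryEngineTool

router = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[
        QueryEngineTool.from_defaults(
            query_engine=tech_index.as_query_engine(),
            description="Technical documentation: deployment, APIs, architecture",
        ),
        QueryEngineTool.from_defaults(
            query_engine=business_index.as_query_engine(),
            description="Business documents: pricing, strategy, contracts",
        ),
    ],
)
response = router.query("How do we deploy the service?")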
query_engine = index.as_query_engine(streaming=True)
streaming_response = query_engine.query("Your question")
for text in streaming_response.response_gen:
    print(text, end="", flush=True)
# Persist
index.storage_context.persist(persist_dir="./storage")
# Load
from llama_index.core import StorageContext, load_index_from_storage
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)
# Development
export OPENAI_API_KEY=sk-dev-key
export ENVIRONMENT=development
# Production
export OPENAI_API_KEY=sk-prod-key
export ENVIRONMENT=production
export REDIS_URL=redis://prod-cache:6379 # For caching
# Enable logging
import logging
logging.basicConfig(level=logging.INFO)
# Track usage
from llama_index.core import set_global_handler
set_global_handler("simple")
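For per-query token accounting, a hedged sketch using llama_index's TokenCountingHandler callback:

# Sketch: track LLM and embedding token usage across queries
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler

token_counter = TokenCountingHandler()
Settings.callback_manager = CallbackManager([token_counter])

# ...run queries as usual, then inspect the totals
print("LLM tokens:", token_counter.total_llm_token_count)
print("Embedding tokens:", token_counter.total_embedding_token_count)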
import logging
logger = logging.getLogger(__name__)

try:
    response = pipeline.query(question)
except Exception as e:
    logger.error(f"Query failed: {e}")
    response = "I encountered an error. Please try again."
from ratelimit import limits, sleep_and_retry
@sleep_and_retry
@limits(calls=10, period=60) # 10 calls per minute
def query_with_rate_limit(question: str):
    return pipeline.query(question)
# Enable response caching
from llama_index.core.storage.cache import SimpleCache
Settings.cache = SimpleCache()
# Process multiple queries efficiently
questions = ["Q1", "Q2", "Q3"]
responses = [pipeline.query(q) for q in questions]
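Query engines also expose an async aquery() method, so batches can run concurrently; a sketch, assuming an existing index and a context where asyncio.run is allowed:

# Sketch: run several queries concurrently with the async query API
import asyncio

async def batch_query(questions):
    engine = index.as_query_engine()
    return await asyncio.gather(*(engine.aquery(q) for q in questions))

responses = asyncio.run(batch_query(["Q1", "Q2", "Q3"]))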
# Use appropriate similarity_top_k
query_engine = index.as_query_engine(
similarity_top_k=3 # Lower for speed, higher for accuracy
)
# Validate environment
bash scripts/test-llamaindex.sh
# Check .env file
cat .env | grep OPENAI_API_KEY
# Reinstall dependencies
bash scripts/setup-llamaindex.sh
# Verify installation
python -c "import llama_index; print(llama_index.__version__)"
# Check storage directory exists
import os
assert os.path.exists("./storage"), "Storage directory not found"
# Verify index files
assert os.path.exists("./storage/docstore.json"), "Index not persisted"
# Reduce chunk size
node_parser = SentenceSplitter(chunk_size=256) # Smaller chunks
# Process documents in batches, merging into one index
index = None
for batch in document_batches:
    if index is None:
        index = VectorStoreIndex.from_documents(batch)
    else:
        for doc in batch:
            index.insert(doc)  # merge into the existing index