From pdf-research
Indexes PDF documents with LightRAG, extracts text via PyMuPDF, builds embeddings and knowledge graphs, enables hybrid semantic searches with citations for document Q&A.
Slash command
/pdf-research:pdf-research
LightRAG-based PDF document indexing and semantic search for Claude Code research workflows.
When the user invokes /pdf-research, Claude should:
Run python pdf_research.py status to see the current configuration.
# Always run from scripts directory
cd ~/.claude/skills/pdf-research/scripts
# Check current status
python pdf_research.py status
# Index PDFs (when user provides a directory)
python pdf_research.py index /path/to/pdfs
# Search (single query)
python pdf_research.py search "user's question" --mode hybrid
# Interactive search session
python pdf_research.py search
Before running commands, ensure:
# Activate Python environment with dependencies
source /path/to/venv/bin/activate # or use system Python with deps installed
# Ensure OpenAI API key is set
export OPENAI_API_KEY=sk-...
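The two prerequisites above can also be checked from Python before any command runs. The following is a minimal sketch, not part of the skill itself: the `preflight` helper is hypothetical, and the import names (`lightrag`, `fitz`, `dotenv`) are assumptions about what the installed packages expose.

```python
import importlib.util
import os

# Assumed import names for the packages this skill installs:
# lightrag-hku -> "lightrag", pymupdf -> "fitz", python-dotenv -> "dotenv".
REQUIRED_MODULES = ("lightrag", "fitz", "dotenv")


def preflight(env=None, modules=REQUIRED_MODULES):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    # The skill expects an OpenAI API key in the environment.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    # find_spec returns None when a top-level module cannot be located.
    for name in modules:
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing Python module: {name}")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```

Returning a list instead of raising on the first failure lets the caller report every missing prerequisite in one pass.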
index command: python pdf_research.py index <path>
search command: python pdf_research.py search "<question>"
status command: python pdf_research.py status
config command: python pdf_research.py config --pdf-dir <path> --storage-dir <path>
# Configure defaults (run once)
python pdf_research.py config --pdf-dir /path/to/pdfs --storage-dir ./rag_storage
# Index PDFs
python pdf_research.py index [pdf_dir] [--storage <path>]
# Search (single query)
python pdf_research.py search "query" [--mode hybrid|local|global|naive]
# Search (interactive)
python pdf_research.py search
# Check status
python pdf_research.py status
| Mode | Best For | Description |
|---|---|---|
| hybrid | General queries | Combined local + global (default) |
| local | Specific facts | Names, numbers, definitions |
| global | Summaries | Themes, trends, overviews |
| naive | Exact terms | Simple keyword matching |
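For scripted use, the search command and its four modes could be wrapped in a small helper that rejects an unknown mode before the script is launched. This is only a sketch: `build_search_command` and `run_search` are hypothetical names, and the only things taken from this document are the scripts directory, the `search` subcommand, and the `--mode` flag.

```python
import subprocess
from pathlib import Path

# Directory and script name as given in this document.
SCRIPTS_DIR = Path.home() / ".claude" / "skills" / "pdf-research" / "scripts"
QUERY_MODES = ("hybrid", "local", "global", "naive")


def build_search_command(query=None, mode="hybrid"):
    """Build the argv for `pdf_research.py search`; no query means interactive."""
    if mode not in QUERY_MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {QUERY_MODES}")
    argv = ["python", "pdf_research.py", "search"]
    if query is not None:
        # Pass the query as one argv element so no shell quoting is needed.
        argv += [query, "--mode", mode]
    return argv


def run_search(query, mode="hybrid"):
    """Run a single query from the scripts directory and return its stdout."""
    result = subprocess.run(
        build_search_command(query, mode),
        cwd=SCRIPTS_DIR,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Building the argument list separately from running it keeps the validation testable without the skill installed, and passing a list to `subprocess.run` avoids quoting problems with questions that contain spaces or quotes.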
After indexing, rag_storage/ contains:
| File | Description |
|---|---|
| config.json | User configuration |
| kv_store_full_docs.json | Full document text |
| kv_store_text_chunks.json | Semantic chunks |
| kv_store_full_entities.json | Extracted entities |
| vdb_*.json | Vector embeddings |
| graph_*.graphml | Knowledge graph |
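One way to confirm that an indexing run produced these files is to match each name or pattern from the table against the storage directory. The sketch below uses hypothetical helpers (`inspect_storage`, `missing_patterns`) and assumes the files sit directly inside the storage directory, as the table suggests.

```python
from pathlib import Path

# File names and glob patterns taken from the table above.
EXPECTED_PATTERNS = (
    "config.json",
    "kv_store_full_docs.json",
    "kv_store_text_chunks.json",
    "kv_store_full_entities.json",
    "vdb_*.json",
    "graph_*.graphml",
)


def inspect_storage(storage_dir):
    """Map each expected pattern to the sorted file names that match it."""
    root = Path(storage_dir)
    return {
        pattern: sorted(p.name for p in root.glob(pattern))
        for pattern in EXPECTED_PATTERNS
    }


def missing_patterns(storage_dir):
    """Return the patterns that have no matching file in storage_dir."""
    return [pat for pat, names in inspect_storage(storage_dir).items() if not names]
```

For example, `missing_patterns("./rag_storage")` returning an empty list would indicate that every expected file type is present.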
User: /pdf-research please index ~/Documents/papers
Claude: [Runs indexing]
Indexing complete!
- Documents: 5
- Chunks: 247
- Storage: 32.5 MB
User: Tell me about the AI talent development strategy
Claude: [Runs search]
Based on the indexed documents...
[Detailed response with references]
export OPENAI_API_KEY=sk-your-key
python pdf_research.py index /path/to/pdfs
pip install lightrag-hku[api] pymupdf python-dotenv
lightrag-hku[api]>=1.4.9
pymupdf>=1.24.0
python-dotenv>=1.0.0
npx claudepluginhub hongsw/plugin-for-claude-research --plugin pdf-research