Automatically convert arXiv papers to well-structured Markdown documentation. Invoke with an arXiv ID to fetch materials (LaTeX source or PDF), convert to Markdown, and generate implementation-ready reference documentation with preserved mathematics and section structure.
This skill inherits all available tools. When active, it can use any tool Claude has access to.
references/arxiv-fetch.md
references/latex-conversion.md
references/output-format.md
references/pdf-conversion.md
scripts/check_pdf_skill.py
scripts/convert_latex.py
scripts/convert_paper.py
scripts/convert_pdf_double_column.py
scripts/convert_pdf_extract.py
scripts/convert_pdf_simple.py
scripts/convert_pdf_split_columns.py
scripts/convert_pdf_with_vision.py
scripts/fetch_paper.py
scripts/pdf_converter_lib.py
Automatically converts arXiv papers into structured Markdown documentation for implementation reference.
This skill automatically:
Fetches paper materials from arXiv
Converts to structured Markdown
Preserves mathematics ($...$, $$...$$)
Generates implementation-ready documentation
Writes output to papers/{ARXIV_ID}/{ARXIV_ID}.md
Invoke this skill when the user requests:
Use the main orchestrator script which handles everything automatically:
python scripts/convert_paper.py ARXIV_ID [--output-dir DIR]
The orchestrator:
Runs fetch_paper.py to download materials (with automatic source→PDF fallback)
Runs the matching converter (convert_latex.py or convert_pdf_simple.py)
Writes the result to papers/{ARXIV_ID}/{ARXIV_ID}.md
All HTTP requests (curl), file extraction (tar), and directory creation (mkdir) are handled automatically.
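The converter dispatch described above can be sketched roughly as follows. This is an illustration only, not the actual convert_paper.py: the function name `choose_converter` and the rule "use the LaTeX converter when source/ contains a .tex file" are assumptions. Only the script and directory names come from this document.

```python
from pathlib import Path


def choose_converter(paper_dir: Path) -> str:
    """Pick a converter script name for an already-fetched paper directory.

    Hypothetical rule: prefer the LaTeX converter when source/ holds at
    least one .tex file, otherwise fall back to the simple PDF converter.
    """
    source_dir = paper_dir / "source"
    if source_dir.is_dir() and any(source_dir.rglob("*.tex")):
        return "convert_latex.py"

    pdf_dir = paper_dir / "pdf"
    if pdf_dir.is_dir() and any(pdf_dir.glob("*.pdf")):
        return "convert_pdf_simple.py"

    raise FileNotFoundError(f"no LaTeX source or PDF found under {paper_dir}")
```

Checking the filesystem rather than passing a flag keeps the decision independent of how the materials were fetched. The real orchestrator may well track the format explicitly instead.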
The fetcher tries LaTeX source first, then PDF:
LaTeX source: downloads the .tar.gz, extracts to papers/{ID}/source/, converts with pandoc
PDF: saves to papers/{ID}/pdf/, extracts text with pdfplumber
No manual intervention is needed; the skill handles format detection and fallback automatically.
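The source-first, PDF-second order can be sketched as below. This is not the code in fetch_paper.py. The `download` callable is a hypothetical stand-in for whatever performs the HTTP request, and the two URL patterns are assumptions based on arXiv's public endpoints.

```python
from typing import Callable, Optional


def fetch_with_fallback(
    arxiv_id: str,
    download: Callable[[str], Optional[bytes]],
) -> tuple[str, bytes]:
    """Try the LaTeX source first, then the PDF.

    `download` is a caller-supplied (hypothetical) function that returns the
    response body, or None when the resource is unavailable.
    """
    # Assumed URL patterns; adjust if the real fetcher uses different ones.
    candidates = [
        ("source", f"https://arxiv.org/e-print/{arxiv_id}"),
        ("pdf", f"https://arxiv.org/pdf/{arxiv_id}"),
    ]
    for kind, url in candidates:
        body = download(url)
        if body:
            return kind, body
    raise RuntimeError(f"neither source nor PDF available for {arxiv_id}")
```

Injecting the downloader keeps the fallback logic testable without network access.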
Generated Markdown includes:
Inline math: $f(x) = x^2$
Display math: $$\int_0^\infty e^{-x} dx = 1$$
Output location: papers/{ARXIV_ID}/{ARXIV_ID}.md
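One way to spot-check that math survived conversion is to list the inline and display spans found in the generated Markdown. The helper below is a hypothetical sketch and not part of the skill. It assumes plain `$...$` and `$$...$$` delimiters and does not handle escaped dollar signs such as `\$5`.

```python
import re

# Display math: $$ ... $$ (may span lines).
DISPLAY_MATH = re.compile(r"\$\$(.+?)\$\$", re.DOTALL)
# Inline math: a single $ on each side, not adjacent to another $.
INLINE_MATH = re.compile(r"(?<!\$)\$(?!\$)(.+?)(?<!\$)\$(?!\$)", re.DOTALL)


def find_math(markdown: str) -> dict[str, list[str]]:
    """Return the inline and display math bodies found in a Markdown string."""
    display = DISPLAY_MATH.findall(markdown)
    # Remove display blocks first so their delimiters are not read as inline math.
    remaining = DISPLAY_MATH.sub("", markdown)
    inline = INLINE_MATH.findall(remaining)
    return {"inline": inline, "display": display}
```

Comparing these counts against the number of equations in the source paper gives a quick signal that a conversion dropped formulas.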
Three specialized scripts for direct PDF conversion:
Convert all pages as single-column layout.
uv run convert_pdf_simple.py paper.pdf -o output.md
Convert all pages as double-column layout (for academic papers).
uv run convert_pdf_double_column.py paper.pdf -o output.md
Extract specific pages with optional double-column processing.
# Extract specific pages
uv run convert_pdf_extract.py paper.pdf --pages 1-5,10 -o output.md
# Extract with mixed column layouts
uv run convert_pdf_extract.py paper.pdf --pages 1-10 --double-column-pages 3-7 -o output.md
Note: --double-column-pages must be a subset of --pages. An invalid page range causes an immediate error.
All three scripts share common conversion logic through pdf_converter_lib.py, ensuring consistent behavior while keeping each script focused on its specific use case.
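The page-range rule for convert_pdf_extract.py (a spec such as `1-5,10`, with --double-column-pages required to be a subset of --pages) can be sketched as follows. The function names `parse_pages` and `check_subset` are hypothetical, and the actual parsing in the script or in pdf_converter_lib.py may differ.

```python
def parse_pages(spec: str) -> list[int]:
    """Expand a spec such as "1-5,10" into a sorted list of 1-based page numbers."""
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty page range in {spec!r}")
        start_text, sep, end_text = part.partition("-")
        start = int(start_text)  # raises ValueError on non-numeric input
        end = int(end_text) if sep else start
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)


def check_subset(pages: list[int], double_column_pages: list[int]) -> None:
    """Fail fast when a double-column page was not selected for extraction."""
    extra = sorted(set(double_column_pages) - set(pages))
    if extra:
        raise ValueError(f"double-column pages not in --pages: {extra}")


print(parse_pages("1-5,10"))  # [1, 2, 3, 4, 5, 10]
check_subset(parse_pages("1-10"), parse_pages("3-7"))  # passes silently
```

Validating both specs before opening the PDF matches the "immediate error" behavior described in the note.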
For papers with complex mathematical formulas where text extraction fails, a vision-based approach is available as a manual fallback:
# Generate high-resolution images from PDF
python scripts/convert_pdf_with_vision.py paper.pdf --dpi 300 --columns 2
This creates page images (with optional column splitting) that can be read manually with Claude's vision capabilities for maximum accuracy. This is NOT part of the automatic workflow; use it only when automatic conversion produces poor results.
See references/pdf-conversion.md for details on vision-based conversion.
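As a rough idea of what column splitting involves, the sketch below computes equal-width crop boxes for a rendered page image. It is a hypothetical helper, not the logic of convert_pdf_with_vision.py, which may account for gutters or detected margins. The (left, top, right, bottom) tuple order follows the convention used by Pillow's `Image.crop`.

```python
def column_boxes(
    width: int, height: int, columns: int = 2
) -> list[tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) crop boxes, ordered left to right."""
    if columns < 1:
        raise ValueError("columns must be at least 1")
    # Evenly spaced vertical cut positions across the page width.
    edges = [round(i * width / columns) for i in range(columns + 1)]
    return [(edges[i], 0, edges[i + 1], height) for i in range(columns)]


# A US Letter page rendered at 300 DPI is 2550 x 3300 pixels.
print(column_boxes(2550, 3300, columns=2))
# [(0, 0, 1275, 3300), (1275, 0, 2550, 3300)]
```

Reading the boxes in list order gives left column first, then right, which is the usual reading order for two-column papers.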
papers/
└── {ARXIV_ID}/
    ├── source/          # LaTeX source files (if available)
    ├── pdf/             # PDF file
    ├── {ARXIV_ID}.md    # Generated Markdown output
    └── figures/         # Extracted figures (if any)
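For scripts that need to locate these files, the layout can be captured in a small helper. The `PaperLayout` class below is a hypothetical sketch and not something the skill ships. It only encodes the directory names shown in the tree.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class PaperLayout:
    """Paths for one paper under the output root (hypothetical helper)."""

    root: Path  # the output directory, e.g. Path("papers")
    arxiv_id: str

    @property
    def paper_dir(self) -> Path:
        return self.root / self.arxiv_id

    @property
    def source_dir(self) -> Path:
        return self.paper_dir / "source"

    @property
    def pdf_dir(self) -> Path:
        return self.paper_dir / "pdf"

    @property
    def figures_dir(self) -> Path:
        return self.paper_dir / "figures"

    @property
    def markdown_path(self) -> Path:
        return self.paper_dir / f"{self.arxiv_id}.md"

    def create(self) -> None:
        """Create the per-paper directories if they do not exist yet."""
        for directory in (self.source_dir, self.pdf_dir, self.figures_dir):
            directory.mkdir(parents=True, exist_ok=True)
```

Keeping the path rules in one place avoids each script rebuilding `papers/{ARXIV_ID}/...` by hand.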