Official codebase for PixelRAG: Web Screenshots Beat Text for Retrieval-Augmented Generation
Yichuan Wang*, Zhifei Li*, Zirui Wang, Paul Teiletche, Lesheng Jin, Matei Zaharia†, Joseph E. Gonzalez†, Sewon Min†
* Equal contribution † Equal advising
Work done at Berkeley SkyLab, BAIR, and Berkeley NLP
Search any document by how it looks, not just the text it contains.
pip install pixelrag
The two core operations — render a page to screenshots, search a visual index:
# Render any page or document to screenshot tiles
pixelshot https://en.wikipedia.org/wiki/Python --output ./tiles
# Search a hosted index of 8.28M Wikipedia pages — no setup, runs against the live API
curl -X POST https://api.pixelrag.ai/search \
-H "Content-Type: application/json" \
-d '{"queries": [{"text": "What is the capital of France?"}], "n_docs": 5}'
Live, hosted endpoint: https://api.pixelrag.ai serves a pre-built index of 8.28M Wikipedia pages, with no setup and no API key. It also accepts an image as the query (visual search); see the API reference.
Or try it in the browser at pixelrag.ai, or run the demo notebook in Colab, which renders a page and searches the hosted index with the images shown inline.
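The request body for the hosted search endpoint can also be assembled by a small shell helper, which makes it easier to send several questions at once. This is a minimal sketch, assuming the endpoint accepts more than one entry in the "queries" list (the curl example in the quick start sends only one) and that text queries need nothing beyond JSON string escaping. The function names json_escape, build_search_payload, and pixelrag_search are hypothetical and are not part of pixelrag.

```shell
#!/usr/bin/env bash
# Sketch only: batches text queries into one request to the hosted search API.
# Assumption: the endpoint accepts several entries in "queries".

# Escape backslashes and double quotes so a query can sit inside a JSON string.
# Newlines and other control characters are NOT handled in this sketch.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}   # \  -> \\
  s=${s//\"/\\\"}   # "  -> \"
  printf '%s' "$s"
}

# Print the JSON request body.
# Usage: build_search_payload N_DOCS QUERY [QUERY...]
build_search_payload() {
  local n_docs=$1
  shift
  local items="" q
  for q in "$@"; do
    [ -n "$items" ] && items+=", "   # comma-separate after the first item
    items+="{\"text\": \"$(json_escape "$q")\"}"
  done
  printf '{"queries": [%s], "n_docs": %d}\n' "$items" "$n_docs"
}

# POST the body to the live endpoint (needs network access).
# curl reads the request body from stdin via --data-binary @-.
pixelrag_search() {
  build_search_payload "$@" |
    curl -sS -X POST https://api.pixelrag.ai/search \
      -H "Content-Type: application/json" \
      --data-binary @-
}
```

Usage: pixelrag_search 5 "What is the capital of France?" "Who wrote Hamlet?" sends both questions in one request, assuming batching is supported; with a single question it produces the same body as the curl example.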
What it is
PixelRAG renders documents — web pages, PDFs, images — as screenshots and retrieves over the
images directly. Visual structure that HTML parsing throws away — tables, charts, layout,
infographics — stays intact, so the reader model can actually answer questions about it.
Wikipedia's 8.28M articles ship as a pre-built index; the pipeline itself is general-purpose.
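A long page is written out as several screenshot tiles rather than one very tall image. The tile height and any overlap pixelshot actually uses are not stated in this section, so the following is only a hypothetical sketch of the general idea: cut a page of known pixel height into fixed-height windows that share a few rows with their neighbours, so content cut at one boundary also appears uncut in the next tile, provided it is no taller than the overlap. tile_offsets is an illustrative name, not a pixelrag function.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print "start end" pixel rows for each tile of a page.
# Usage: tile_offsets PAGE_HEIGHT TILE_HEIGHT OVERLAP
tile_offsets() {
  local page_h=$1 tile_h=$2 overlap=$3
  local step=$(( tile_h - overlap )) y=0 end
  # Each new tile starts (tile_h - overlap) rows below the previous one.
  if (( step <= 0 )); then
    echo "overlap must be smaller than tile height" >&2
    return 1
  fi
  while :; do
    end=$(( y + tile_h ))
    (( end > page_h )) && end=$page_h   # last tile is clipped to the page
    echo "$y $end"
    (( end >= page_h )) && break
    y=$(( y + step ))
  done
}
```

For a 2500-pixel page with 1000-pixel tiles and 100 pixels of overlap, this prints three tiles: 0 1000, 900 1900, and 1800 2500.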
Give Claude eyes
The renderer also ships as a Claude Code plugin — the pixelbrowse skill. Instead of fetching
raw HTML, Claude screenshots a page with pixelshot and reads the image, so it sees
charts, diagrams, tables, and layout the way a person does.
Install it — no clone needed (pixelshot comes from pip install pixelrag):
pip install pixelrag # provides the pixelshot command
claude plugin marketplace add StarTrail-org/PixelRAG
claude plugin install pixelbrowse@pixelrag-plugins
Then just ask Claude to look at a page:
claude -p "screenshot https://news.ycombinator.com and summarize the top stories"
claude -p "screenshot https://arxiv.org/abs/2404.12387 and explain the key findings"
Or, in an interactive session, use the slash command: /screenshot https://example.com
No MCP server, no backend: the skill just calls pixelshot (Playwright/CDP) on your machine.
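Because the skill is only a wrapper around pixelshot, the same command can be run directly, for example to render several pages ahead of time. This is a minimal sketch that uses only the "pixelshot URL --output DIR" form shown in the quick start; url_to_dir and render_all are hypothetical helper names, and the directory naming scheme is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Sketch only: render several pages, one output directory per URL.

# Turn a URL into a filesystem-safe directory name (hypothetical scheme).
url_to_dir() {
  local s=${1#*://}              # drop the scheme, e.g. "https://"
  s=${s%/}                       # drop one trailing slash
  s=${s//[![:alnum:]._-]/_}      # replace anything unusual with "_"
  printf '%s' "$s"
}

# Usage: render_all OUTPUT_ROOT URL [URL...]
render_all() {
  local out_root=$1
  shift
  local url
  for url in "$@"; do
    # Keep going if one page fails; report it on stderr.
    pixelshot "$url" --output "$out_root/$(url_to_dir "$url")" \
      || echo "failed: $url" >&2
  done
}
```

For example, render_all ./tiles https://news.ycombinator.com https://arxiv.org/abs/2404.12387 writes tiles to ./tiles/news.ycombinator.com and ./tiles/arxiv.org_abs_2404.12387.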
How it works