Your LLM can't watch a screen recording. Screex turns one into text it can read.
Screex
Screen-recording understanding for agents. Screex turns a screencast into a queryable
index of UI states — each with the on-screen text (OCR), what text changed since the
previous state, a thumbnail, and a full-resolution keyframe — so an LLM/agent can produce an
action transcript, answer questions, or generate a how-to guide / bug report from a recording.
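To make "what text changed since the previous state" concrete, here is a minimal sketch of a per-state record and a line-level text diff. The `UIState` class, its field names, and the ordered set-difference rule are illustrative assumptions. They are not Screex's actual index.json schema or diff logic.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class UIState:
    """Hypothetical per-state record (not Screex's real schema)."""
    start_s: float                   # state start time, seconds
    end_s: float                     # state end time, seconds
    text: List[str]                  # OCR'd text lines visible in this state
    thumbnail: Optional[str] = None  # path to a small preview image
    keyframe: Optional[str] = None   # path to the full-resolution frame


def text_delta(prev: List[str], curr: List[str]) -> Tuple[List[str], List[str]]:
    """Return (appeared, gone) text lines, preserving on-screen order."""
    prev_set, curr_set = set(prev), set(curr)
    appeared = [line for line in curr if line not in prev_set]
    gone = [line for line in prev if line not in curr_set]
    return appeared, gone


login = UIState(0.0, 1.0, ["Acme Console", "Sign in"])
dashboard = UIState(1.0, 2.0, ["Dashboard", "Welcome back, Rushi", "Projects: 3"])
appeared, gone = text_delta(login.text, dashboard.text)
print("Appeared:", appeared)
print("Gone:", gone)
```

A plain set difference is the simplest possible rule; Screex evidently applies its own filtering (its example output below does not list every new line as "Appeared").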
- Training-free & model-agnostic — no fine-tuned UI model; any LLM can read the index.
- pip install-only — OCR via rapidocr-onnxruntime, no system binaries.
- Server-friendly runtime — uses headless OpenCV, so CI and Linux servers do not need GUI
libraries just to build indexes.
- Cheap by design — the on-screen text is plain text (nearly free for a model to read); full-res
keyframes are consulted only when the text is insufficient.
- ~70% lower token cost — in our GUI video-QA benchmarks, handing an agent the Screex
index instead of raw video frames cut the input tokens sent to the model by around 70%,
with little loss in answer accuracy.
- Fast OCR — tuned onnxruntime threading makes text extraction ~3.85× faster than with the default threading settings.
- Narration-aware — with pip install 'screex[audio]', the index includes a timestamped transcript of the spoken audio, interleaved into the step transcript.
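The "text first, keyframes only when needed" idea can be sketched as a tiny routing policy. This is a hypothetical heuristic for illustration only: the dict keys `"text"` and `"keyframe"`, the keyword-overlap test, and the `min_hits` parameter are assumptions, not Screex's own escalation rule.

```python
from typing import Dict, List


def choose_context(question: str, states: List[Dict[str, str]],
                   min_hits: int = 1) -> Dict[str, List[str]]:
    """Send OCR text only when it covers the question; otherwise attach keyframes.

    `states` items are hypothetical dicts with "text" and "keyframe" keys.
    """
    # Crude keyword extraction: words longer than 3 chars, punctuation stripped.
    terms = {w.strip("?.,").lower() for w in question.split() if len(w) > 3}
    text_hits = [s["text"] for s in states
                 if any(t in s["text"].lower() for t in terms)]
    if len(text_hits) >= min_hits:
        # Cheap path: plain text only, no images sent to the model.
        return {"text": text_hits, "images": []}
    # Expensive path: the OCR text mentions nothing relevant, so escalate.
    return {"text": [s["text"] for s in states],
            "images": [s["keyframe"] for s in states]}


states = [
    {"text": "Error: invalid API key format", "keyframe": "frames/00003.png"},
    {"text": "Dashboard · Projects: 3", "keyframe": "frames/00001.png"},
]
print(choose_context("What error message appeared?", states))
print(choose_context("Which color was the button?", states))
```

The first question is answerable from OCR text alone; the second (a purely visual property) falls through to the full-resolution frames.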
Good for: bug repros → reproduction reports · demos & Loom videos → how-to docs ·
tutorials → step lists · "what did the user do / what URL did they open?" Q&A over a recording.
Best on screen recordings. Screex is tuned for screencasts — mostly-static UI punctuated
by discrete changes (clicks, typing, navigation). On that input it quickly segments the video into a handful of
meaningful states. For general / continuous-motion video (camera footage, gameplay,
talking-head clips) the change detector fires on nearly every frame, so prefer --fast with a
higher --change-threshold (e.g. --fast --change-threshold 0.10) to avoid over-segmentation.
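Why a higher threshold helps on continuous motion can be seen with a toy change detector. This sketch assumes the threshold is "fraction of pixels that changed between consecutive frames"; Screex's actual change metric and the exact meaning of --change-threshold may differ. Frames here are flattened grayscale pixel lists.

```python
from typing import List, Sequence

Frame = Sequence[int]  # flattened grayscale pixels, 0-255


def change_score(a: Frame, b: Frame, pixel_delta: int = 25) -> float:
    """Fraction of pixels whose intensity changed by more than pixel_delta."""
    changed = sum(1 for x, y in zip(a, b) if abs(x - y) > pixel_delta)
    return changed / len(a)


def segment(frames: List[Frame], change_threshold: float) -> List[int]:
    """Return indices of frames that start a new state (consecutive-frame diff)."""
    starts = [0]
    for i in range(1, len(frames)):
        if change_score(frames[i - 1], frames[i]) > change_threshold:
            starts.append(i)
    return starts


# Screencast: static UI, then one click changes 20% of the screen.
static = [10] * 100
clicked = [10] * 80 + [200] * 20
screencast = [static, static, clicked, clicked]

# "Continuous motion": a small bright band slides a little every frame (8% change).
motion = [[200 if 4 * k <= j < 4 * k + 4 else 10 for j in range(100)]
          for k in range(5)]

print(segment(screencast, 0.10))  # one real state change
print(segment(motion, 0.05))      # fires on every frame
print(segment(motion, 0.10))      # small per-frame motion is ignored
```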
Example
A short screen recording of a login → settings → error flow becomes a timestamped step list:
screex transcript bug-repro.mp4 -o steps.md
steps.md:
# Transcript — bug-repro.mp4 (0:06)
## 0:00–0:01 · State 1
Acme Console · Sign in · Email: [email protected]
**Appeared:** Acme Console, Sign in
## 0:01–0:02 · State 2
Dashboard · Welcome back, Rushi · Projects: 3
**Appeared:** Dashboard, Welcome back, Rushi
**Gone:** Acme Console, Sign in
## 0:03–0:04 · State 3
Settings > API Keys · New key: sk-live-9f2a · [ Save ]
**Appeared:** Settings > API Keys, New key: sk-live-9f2a
## 0:04–0:06 · State 4
Error: invalid API key format · Expected prefix 'sk_' not 'sk-'
**Appeared:** Error: invalid API key format
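The step sections above follow a simple layout: a timestamp range, the state's text joined with " · ", then optional Appeared/Gone lines. A minimal sketch that reproduces that layout from hypothetical inputs (the `render_state` and `fmt_ts` helpers are illustrative, not Screex's renderer, and how Screex picks which lines count as "Appeared" is not described here):

```python
from typing import Sequence


def fmt_ts(seconds: float) -> str:
    """Format seconds as m:ss, e.g. 64 -> '1:04'."""
    total = int(seconds)
    return f"{total // 60}:{total % 60:02d}"


def render_state(n: int, start: float, end: float, text: Sequence[str],
                 appeared: Sequence[str] = (), gone: Sequence[str] = ()) -> str:
    """Render one state section in the layout of steps.md."""
    lines = [f"## {fmt_ts(start)}–{fmt_ts(end)} · State {n}", " · ".join(text)]
    if appeared:
        lines.append("**Appeared:** " + ", ".join(appeared))
    if gone:
        lines.append("**Gone:** " + ", ".join(gone))
    return "\n".join(lines)


print(render_state(2, 1, 2, ["Dashboard", "Welcome back, Rushi", "Projects: 3"],
                   appeared=["Dashboard", "Welcome back, Rushi"],
                   gone=["Acme Console", "Sign in"]))
```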
Want richer output? Hand index.json to Claude via the bundled skill and ask it for a
bug report, a how-to guide, or answers to questions about the recording.
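Because the index is plain JSON, any LLM can consume it without the skill. The sketch below turns an index into a compact, text-only bug-report prompt. The key names (`"states"`, `"start"`, `"end"`, `"text"`) are hypothetical; check the real index.json layout before relying on them.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_index(path: str) -> Dict[str, Any]:
    """Read an index.json file from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def build_bug_report_prompt(index: Dict[str, Any], max_chars: int = 8000) -> str:
    """Turn a (hypothetically shaped) index into a text-only LLM prompt."""
    parts = ["Write a bug report (steps to reproduce, expected, actual) "
             "for this screen recording.",
             "UI states in order:"]
    for i, state in enumerate(index.get("states", []), start=1):
        # Assumed keys: "start"/"end" in seconds, "text" as the OCR string.
        parts.append(f"State {i} [{state['start']:.0f}s-{state['end']:.0f}s]: {state['text']}")
    return "\n".join(parts)[:max_chars]  # crude cap to keep the prompt small


index = {"states": [
    {"start": 0, "end": 1, "text": "Acme Console · Sign in"},
    {"start": 4, "end": 6, "text": "Error: invalid API key format"},
]}
print(build_bug_report_prompt(index))
```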
Install
From PyPI
pip install screex
For spoken-word narration in the index, also install the audio extra: pip install 'screex[audio]'.
From source
git clone https://github.com/blueprintparadise/Screex.git
cd Screex
pip install -e .   # or: pip install -e ".[test]" to also install pytest
Both give you a screex command (entry point screex.cli:main). Requires Python ≥ 3.9.
The OCR models ship inside the rapidocr-onnxruntime dependency, so no separate download is needed.
Quickstart (CLI)
# Build the index for a screen recording
screex index path/to/recording.mp4 --fps 2
# (or, without installing the package:)
python -m screex.cli index path/to/recording.mp4 --fps 2
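To drive indexing from a Python script, one option is to shell out to the same module invocation shown above. This wrapper is a sketch: the helper names are hypothetical, and it uses only the `index` subcommand and `--fps` flag documented here.

```python
import subprocess
import sys
from typing import List


def index_command(recording: str, fps: float = 2) -> List[str]:
    """Equivalent of: python -m screex.cli index <recording> --fps <fps>."""
    return [sys.executable, "-m", "screex.cli", "index", recording, "--fps", str(fps)]


def build_index(recording: str, fps: float = 2) -> None:
    # check=True raises CalledProcessError if screex exits with a non-zero status.
    subprocess.run(index_command(recording, fps), check=True)
```

Usage: `build_index("path/to/recording.mp4")`. Using `sys.executable` runs screex under the same interpreter (and virtualenv) as the calling script.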
This writes:
path/to/recording.screex/
index.json               # the ScreenIndex (ordered UI states)
frames/00000.png         # full-res keyframe per state
frames/00000_thumb.png   # thumbnail per state
...
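A small helper can locate these outputs from the recording path. It assumes, based only on the listing above, that the output folder replaces the video's extension with `.screex` and that frame files are numbered by 0-based state index with five-digit zero padding; the function names are hypothetical.

```python
from pathlib import Path
from typing import Tuple


def output_dir(recording: str) -> Path:
    """path/to/recording.mp4 -> path/to/recording.screex"""
    return Path(recording).with_suffix(".screex")


def state_frames(recording: str, state_idx: int) -> Tuple[Path, Path]:
    """(full-res keyframe, thumbnail) paths for a 0-based state index."""
    frames = output_dir(recording) / "frames"
    stem = f"{state_idx:05d}"
    return frames / f"{stem}.png", frames / f"{stem}_thumb.png"


print(output_dir("path/to/recording.mp4") / "index.json")
print(state_frames("path/to/recording.mp4", 3))
```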