Skill

screex

Indexes screen recordings to extract UI state sequences, on-screen text, and events. Generates transcripts, how-to docs, or bug reports from screencasts and demos.

developer-tools

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/screex:screex

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

The user points you at a screen recording (a screencast, demo, tutorial, or bug repro) and

SKILL.md

93 lines · ~1.5k tokens

Stats

Language: Python

Stars: 12

Maintenance: Excellent

Last Commit: Jun 22, 2026

Screex — screen-recording understanding

When to use

The user points you at a screen recording (a screencast, demo, tutorial, or bug repro) and wants a step transcript, a how-to doc, a bug report, or answers to questions about it.

Build the index

Run: screex index <recording> --fps 2

By default, Screex segments by on-screen text change, so even a subtle local change (a dialog, a status line, a new field) becomes its own UI state, usually with no threshold tuning needed. This writes <recording>.screex/index.json plus, for each state, frames/NNNNN.png (full-res keyframe) and frames/NNNNN_thumb.png (thumbnail).

Raise --fps for fast-moving recordings.
Lower --text-threshold (default 0.80) to split states more eagerly; raise it to merge.
Add --fast for motion-only segmentation (no per-frame OCR) on simple clips — faster, but it misses subtle local changes.
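If the index is built from a script rather than typed by hand, the flags above can be assembled programmatically. The following is a minimal sketch: the helper names build_index_command and run_index are hypothetical, only flags named in this document are used, and whether every flag combination is accepted by the CLI is an assumption.

```python
import subprocess
from typing import List, Optional


def build_index_command(
    recording: str,
    fps: float = 2,
    text_threshold: Optional[float] = None,
    fast: bool = False,
    max_frames: Optional[int] = None,
) -> List[str]:
    """Assemble argv for `screex index` from the flags described in this document."""
    cmd = ["screex", "index", recording, "--fps", str(fps)]
    if text_threshold is not None:
        # Lower values split states more eagerly; higher values merge them.
        cmd += ["--text-threshold", str(text_threshold)]
    if fast:
        # Motion-only segmentation, no per-frame OCR.
        cmd.append("--fast")
    if max_frames is not None:
        # Caps the work on long recordings (see the performance notes).
        cmd += ["--max-frames", str(max_frames)]
    return cmd


def run_index(cmd: List[str]) -> int:
    """Run the command; stderr is inherited so progress lines stay visible."""
    return subprocess.run(cmd, check=False).returncode
```

For example, build_index_command("demo.mp4", fps=4) yields the argv for a faster-moving recording, and run_index(...) executes it without hiding the progress output.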

Performance — long or fast-moving recordings

Text mode OCRs every changed frame, so it is slow on long or busy video (a 2-minute clip can take several minutes). Choose options up front:

Recording longer than ~30s, or anything that isn't a calm UI screencast → start with --fast (motion-only) or cap the work with --max-frames 60.
Only use full text mode when subtle on-screen text changes actually matter.

Screex prints progress to stderr (index: state N …) as it builds. Watch that to see it working — do not sit in a long sleep. If you must run it in the background, poll the output file for new state lines rather than blind-waiting.
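When the indexer runs in the background with stderr redirected to a file, polling can be done incrementally instead of sleeping blindly. This sketch assumes the progress lines contain the literal prefix "index: state N" (the rest of the line is not specified here, so the pattern ignores it); the function name read_new_states and the log file are hypothetical.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Assumed shape of a progress line; the source only shows "index: state N ...".
STATE_LINE = re.compile(r"index: state (\d+)")


def read_new_states(log_path: Path, offset: int = 0) -> Tuple[List[int], int]:
    """Return state numbers logged since byte `offset`, plus the new offset."""
    if not log_path.exists():
        return [], offset
    with log_path.open("rb") as fh:
        fh.seek(offset)
        chunk = fh.read()
    # Consume complete lines only, so a half-written line is re-read next poll.
    end = chunk.rfind(b"\n") + 1
    text = chunk[:end].decode("utf-8", errors="replace")
    states = [int(m.group(1)) for m in STATE_LINE.finditer(text)]
    return states, offset + end
```

A caller would loop while the background process is alive, passing the returned offset back in on each poll and sleeping only a second or two between polls.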

Read the index

Read index.json. It is an ordered list of UI states, each with t_start/t_end, ocr_text (the on-screen text), text_added / text_removed (what text appeared or disappeared vs the previous state — the strongest signal of what the user did), and paths to a thumbnail and full-res keyframe. The on-screen text is plain text — reading it across states is cheap. If the recording was narrated and screex[audio] is installed, the index also has a narration field (timestamped spoken text) — use it to explain why each step happened and to answer questions about what the narrator said (--no-audio skips it).
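Walking the index might look like the sketch below. The field names come from the description above; everything else is an assumption: that the states sit either at the top level or under a "states" key, and that text_added / text_removed may be strings or lists. The helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def load_states(index_path: str) -> List[Dict[str, Any]]:
    """Load index.json and return its ordered list of UI states."""
    data = json.loads(Path(index_path).read_text(encoding="utf-8"))
    # Assumption: states are the top-level list, or live under a "states" key.
    if isinstance(data, dict):
        return list(data.get("states", []))
    return list(data)


def as_text(value: Any) -> str:
    """Normalise a text field that may be missing, a string, or a list."""
    if value is None:
        return ""
    if isinstance(value, (list, tuple)):
        return " | ".join(str(v) for v in value)
    return str(value)


def summarize(states: List[Dict[str, Any]]) -> List[str]:
    """One line per state: time span, text that appeared, text that vanished."""
    lines = []
    for s in states:
        added = as_text(s.get("text_added"))
        removed = as_text(s.get("text_removed"))
        lines.append(f"{s.get('t_start')}-{s.get('t_end')}s  +[{added}]  -[{removed}]")
    return lines
```

Reading only these text fields keeps the first pass cheap; keyframe paths can be followed later for the few states that need an image.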

Optional fields (enable when useful)

--events: adds an event per state (after the first): a typed action (navigate/type/click/open_dialog/error/scroll/edit) grounded to the changed screen region. When present, it is the most direct "what did the user do" signal and more specific than text_added. It is still heuristic (inferred from pixels and OCR, not from real click events), so when an event conflicts with text_added or ocr_text, trust the text.
--interactions: adds an interactions array per state ({t, x, y, label}) — a heuristic estimate of where the user was acting and the nearest on-screen text. Great for "what did they click?" questions, but it's approximate (no real click events) — trust text_added over it when they disagree.
--boxes: adds boxes per state ({text, box:[x,y,w,h]}) so you can reason about where text is (e.g. "the button in the top-right").
--redact: masks secrets/PII (keys, emails, tokens, cards) in the text and blurs those regions in the keyframes. Use it whenever the recording may contain credentials before you read or share keyframes.
--keyframe-budget N: scores each state's salience (how much its text changed + how crisp the keyframe is + whether it carries a typed event) so the index can surface the N most informative, temporally-spread keyframes. Read them via ScreenIndex.compact_dict(keyframe_budget=N)["curated_keyframes"]. When you need to escalate to images but have a tight budget, read these keyframes rather than guessing which states to open.
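As an illustration of spatial reasoning over --boxes output, the sketch below picks out text in the top-right quadrant. It uses the {text, box:[x,y,w,h]} shape described above and assumes pixel units with the origin at the top-left, plus frame dimensions supplied by the caller (for example, read from the keyframe PNG). The function name is hypothetical.

```python
from typing import Any, Dict, List


def boxes_in_top_right(
    boxes: List[Dict[str, Any]], frame_w: int, frame_h: int
) -> List[str]:
    """Return the text of boxes whose centre lies in the top-right quadrant."""
    hits = []
    for item in boxes:
        x, y, w, h = item["box"]
        cx, cy = x + w / 2, y + h / 2  # box centre
        # Assumption: origin is top-left, so "top" means a small y value.
        if cx >= frame_w / 2 and cy < frame_h / 2:
            hits.append(item["text"])
    return hits
```

The same centre-point test generalises to other regions (bottom bar, left sidebar) by changing the two comparisons.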

Produce one of three views

Action transcript: walk the states in order; use text_added/text_removed plus the thumbnail to narrate timestamped steps, e.g. "0:04 opened Settings; 0:09 entered an API key; 0:14 an 'invalid key' error appeared."
- Shortcut: for a quick deterministic markdown transcript without reasoning over the index yourself, run screex transcript <recording> -o steps.md and read/return that file.
Q&A: answer the user's question by scanning ocr_text across states (cheap). Read the full-res keyframe PNG for a state only when the text is insufficient (small icons, layout, colour).
Doc / bug report: format the transcript into a how-to guide, or a structured reproduction report (steps to reproduce, expected vs actual).
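Once the steps have been worked out from the index, formatting them is mechanical. The sketch below assumes each step is a (t_start, description) pair with t_start in seconds (the unit is not stated in this document) and produces the transcript and reproduction-report shapes described above. The function names are hypothetical.

```python
from typing import Iterable, List, Tuple

Step = Tuple[float, str]  # (t_start in seconds, description); seconds is an assumption


def fmt_ts(seconds: float) -> str:
    """Format seconds as m:ss, e.g. 4 -> '0:04'."""
    total = int(seconds)
    return f"{total // 60}:{total % 60:02d}"


def action_transcript(steps: Iterable[Step]) -> str:
    """Join timestamped steps into a one-line transcript."""
    return "; ".join(f"{fmt_ts(t)} {desc}" for t, desc in steps) + "."


def bug_report(steps: List[Step], expected: str, actual: str) -> str:
    """Structured reproduction report: numbered steps, then expected vs actual."""
    lines = ["Steps to reproduce:"]
    lines += [f"{i}. ({fmt_ts(t)}) {desc}" for i, (t, desc) in enumerate(steps, 1)]
    lines += ["", f"Expected: {expected}", f"Actual: {actual}"]
    return "\n".join(lines)
```

Passing the three steps from the transcript example above to action_transcript reproduces that sentence.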

Inspect or query the index from the shell

You don't have to read the whole JSON: screex info <index.json> prints a summary (state count, duration, event histogram, warnings), and screex search <index.json> "<text>" [--event click] [--since 5] [--until 20] returns the matching states. Both take --json. Use these to jump to the right states before escalating to keyframes.
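From a script, the search command can be wrapped as below. Only the flags listed above are used. That --json writes a JSON document to stdout is an assumption, as is the shape of whatever it returns, so the sketch hands back the parsed value untouched. The helper names are hypothetical.

```python
import json
import subprocess
from typing import Any, List, Optional


def build_search_command(
    index_json: str,
    text: str,
    event: Optional[str] = None,
    since: Optional[float] = None,
    until: Optional[float] = None,
) -> List[str]:
    """Assemble argv for `screex search` with --json output."""
    cmd = ["screex", "search", index_json, text]
    if event is not None:
        cmd += ["--event", event]
    if since is not None:
        cmd += ["--since", str(since)]
    if until is not None:
        cmd += ["--until", str(until)]
    cmd.append("--json")
    return cmd


def run_search(cmd: List[str]) -> Any:
    """Run the search and parse stdout as JSON (assumed output channel)."""
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return json.loads(out)
```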

Cost discipline

The ocr_text and text_* fields are text and nearly free to read. Escalate to a keyframe image only for the few states where the text doesn't answer the question.

Caveats

ocr_text can contain minor OCR noise (stray glyphs); collapse states whose ocr_text is essentially identical when you narrate.
If a long recording produced only one state, re-run with a lower --text-threshold or a higher --fps (or drop --fast).
If the index has no on-screen text at all, the recording isn't a text UI; re-run with --fast to get visual (motion) states instead.
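Collapsing near-identical states before narrating can be sketched with the standard library's difflib. This is one possible approach, not Screex's own logic: it assumes ocr_text is a string, and the 0.95 similarity cutoff is a hypothetical value to tune per recording.

```python
from difflib import SequenceMatcher
from typing import Any, Dict, List


def collapse_similar(
    states: List[Dict[str, Any]], min_ratio: float = 0.95
) -> List[Dict[str, Any]]:
    """Merge consecutive states whose ocr_text differs only by minor noise."""
    merged: List[Dict[str, Any]] = []
    for state in states:
        text = state.get("ocr_text", "")
        if merged:
            prev_text = merged[-1].get("ocr_text", "")
            if SequenceMatcher(None, prev_text, text).ratio() >= min_ratio:
                # Extend the previous state's time span instead of adding a step.
                merged[-1]["t_end"] = state.get("t_end", merged[-1].get("t_end"))
                continue
        merged.append(dict(state))  # copy so the caller's states are not mutated
    return merged
```

Only consecutive states are compared, so a screen that genuinely reappears later in the recording still gets its own step.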
