From screex
Indexes screen recordings to extract UI state sequences, on-screen text, and events. Generates transcripts, how-to docs, or bug reports from screencasts and demos.
How this skill is triggered — by the user, by Claude, or both
Slash command
/screex:screex
The summary Claude sees in its skill listing, used to decide when to auto-load this skill
The user points you at a screen recording (a screencast, demo, tutorial, or bug repro) and wants a step transcript, a how-to doc, a bug report, or answers to questions about it.
Run:
screex index <recording> --fps 2
By default Screex segments by on-screen text change, so even a subtle local change (a
dialog, a status line, a new field) becomes its own UI state — no threshold tuning needed.
This writes <recording>.screex/index.json plus per-state frames/NNNNN.png (full-res
keyframe) and frames/NNNNN_thumb.png (thumbnail).
Raise --fps for fast-moving recordings.
Lower --text-threshold (default 0.80) to split states more eagerly; raise it to merge.
Use --fast for motion-only segmentation (no per-frame OCR) on simple clips; it is faster, but it misses subtle local changes.
Text mode OCRs every changed frame, so it is slow on long or busy video (a 2-minute clip can take several minutes). Choose options up front: use --fast (motion-only) or cap the work with --max-frames 60.
Screex prints progress to stderr (index: state N …) as it builds. Watch that to see it working; do not sit in a long sleep. If you must run it in the background, poll the output file for new state lines rather than blind-waiting.
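One way to watch the progress lines instead of sleeping is to read the child process's stderr as it is produced. The sketch below is a minimal illustration in Python, not part of Screex: the helper name stream_stderr is hypothetical, and it only assumes that progress arrives on stderr one line at a time, as described above.

```python
import subprocess
from typing import Iterator, List


def stream_stderr(cmd: List[str]) -> Iterator[str]:
    """Run cmd and yield each stderr line as soon as the process writes it.

    stdout is left attached to the parent, so only stderr is piped and
    there is no second pipe that could fill up and block the child.
    """
    proc = subprocess.Popen(cmd, stderr=subprocess.PIPE, text=True)
    assert proc.stderr is not None
    try:
        for line in proc.stderr:
            yield line.rstrip("\n")
    finally:
        proc.stderr.close()
        proc.wait()
```

For example, iterating over stream_stderr(["screex", "index", "demo.mp4", "--fps", "2"]) and printing the lines that start with "index: state" would show each state as it is found; demo.mp4 is a placeholder file name.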
Read index.json. It is an ordered list of UI states, each with t_start/t_end,
ocr_text (the on-screen text), text_added / text_removed (what text appeared or
disappeared vs the previous state — the strongest signal of what the user did), and paths to
a thumbnail and full-res keyframe. The on-screen text is plain text — reading it across
states is cheap. If the recording was narrated and screex[audio] is installed, the index also has a narration field (timestamped spoken text) — use it to explain why each step happened and to answer questions about what the narrator said (--no-audio skips it).
--events: adds an event per state (after the first), a typed action (navigate/type/click/open_dialog/error/scroll/edit) grounded to the changed screen region. When present it is the strongest "what did the user do" signal, above text_added. It is still heuristic (inferred from pixels + OCR, no real click events), so trust text_added/ocr_text when they disagree.
--interactions: adds an interactions array per state ({t, x, y, label}), a heuristic estimate of where the user was acting and the nearest on-screen text. Great for "what did they click?" questions, but it's approximate (no real click events), so trust text_added over it when they disagree.
--boxes: adds boxes per state ({text, box:[x,y,w,h]}) so you can reason about where text is (e.g. "the button in the top-right").
--redact: masks secrets/PII (keys, emails, tokens, cards) in the text and blurs those regions in the keyframes. Use it before you read or share keyframes whenever the recording may contain credentials.
--keyframe-budget N: scores each state's salience (how much its text changed + how crisp the keyframe is + whether it carries a typed event) so the index can surface the N most informative, temporally-spread keyframes. Read them via ScreenIndex.compact_dict(keyframe_budget=N)["curated_keyframes"]. When you need to escalate to images but have a tight budget, read these keyframes rather than guessing which states to open.
Use text_added/text_removed plus the thumbnail to narrate timestamped steps, e.g. "0:04 opened Settings; 0:09 entered an API key; 0:14 an 'invalid key' error appeared."
Or run screex transcript <recording> -o steps.md and read/return that file.
Read ocr_text across states (cheap). Read the full-res keyframe PNG for a state only when the text is insufficient (small icons, layout, colour).
You don't have to read the whole JSON: screex info <index.json> prints a summary (state count, duration, event histogram, warnings), and screex search <index.json> "<text>" [--event click] [--since 5] [--until 20] returns the matching states. Both take --json. Use these to jump to the right states before escalating to keyframes.
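If the states are already loaded in Python, the same kind of narrowing can be approximated locally. This is a hedged sketch, not the implementation behind screex search: the function find_states is hypothetical, it assumes t_start/t_end are in seconds, and it treats the time window as "any state that overlaps it", which may differ from how --since and --until behave in the CLI.

```python
from typing import Any, Dict, List, Optional


def find_states(
    states: List[Dict[str, Any]],
    text: str,
    since: Optional[float] = None,
    until: Optional[float] = None,
) -> List[Dict[str, Any]]:
    """Return states whose ocr_text contains text (case-insensitive)
    and whose time span overlaps the optional [since, until] window."""
    needle = text.casefold()
    hits = []
    for state in states:
        if needle not in (state.get("ocr_text") or "").casefold():
            continue
        if since is not None and state.get("t_end", 0) < since:
            continue  # state ended before the window opens
        if until is not None and state.get("t_start", 0) > until:
            continue  # state starts after the window closes
        hits.append(state)
    return hits
```

Only the states returned by a filter like this would then be candidates for opening a keyframe image.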
The ocr_text and text_* fields are text and nearly free to read. Escalate to a
keyframe image only for the few states where the text doesn't answer the question.
ocr_text can contain minor OCR noise (stray glyphs); collapse states whose ocr_text is
essentially identical when you narrate. If a long recording produced only one state, re-run
with a lower --text-threshold or a higher --fps (or drop --fast).
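Collapsing near-identical states before narrating can be sketched with Python's standard difflib. The function name collapse_similar and the 0.95 similarity cutoff are illustrative assumptions; the cutoff is unrelated to --text-threshold and would need tuning against real OCR noise.

```python
from difflib import SequenceMatcher
from typing import Any, Dict, List


def collapse_similar(
    states: List[Dict[str, Any]], min_ratio: float = 0.95
) -> List[Dict[str, Any]]:
    """Merge consecutive states whose ocr_text is essentially identical.

    A merged state keeps the first state's fields and takes the t_end of
    the last state folded into it. Input dicts are copied, not mutated.
    """
    merged: List[Dict[str, Any]] = []
    for state in states:
        text = state.get("ocr_text") or ""
        if merged:
            prev = merged[-1]
            ratio = SequenceMatcher(None, prev.get("ocr_text") or "", text).ratio()
            if ratio >= min_ratio:
                # Same screen apart from OCR noise: extend the previous state.
                prev["t_end"] = state.get("t_end", prev.get("t_end"))
                continue
        merged.append(dict(state))
    return merged
```

Comparing each state only with the previously kept one preserves ordering, so a screen that reappears later in the recording still shows up as its own step.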
If the index has no on-screen text at all, the recording isn't a text UI — re-run with --fast
to get visual (motion) states instead.
npx claudepluginhub blueprintparadise/screex --plugin screex