Skill

ReadAware Ingest

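Import a book into the readaware library: extract the body text, parse the volume/chapter structure, and generate a locatable manifest. Triggered when the user wants to "add a book", "import this epub/txt to read", or "start reading a new book". Ingest a book into the reading library so read can locate passages in it.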

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/readaware:ingest

User invocable

Model invocable

Inline context

Default effort

Tool Access

This skill is limited to the following tools:

Bash, Read, Write, AskUserQuestion

SKILL.md

142 lines · ~1.9k tokens

Stats

Language: Python

Stars: 0

Maintenance: Excellent

Last Commit: Jun 12, 2026

readaware:ingest — turn a book into a "locatable structure"

Goal: put the user's book into the library ~/.claude/readaware/books/<slug>/, holding text.txt (cleaned body text) and manifest.json (which paragraph each volume/chapter starts at, and the front-matter/back-matter boundaries). Afterwards readaware:read uses this manifest to precisely locate the passages the user gives it.

The scripts live in ${CLAUDE_PLUGIN_ROOT}/scripts/: extract_text.py turns .epub/.html/.txt into the body format, and build_manifest.py does the structure parsing.

Steps

1. Get the body text as a txt

Use the bundled extractor — it's pure-stdlib, no pandoc/Calibre needed, and handles .epub, .html/.xhtml, and .txt:

python3 "${CLAUDE_PLUGIN_ROOT}/scripts/extract_text.py" input.epub \
    -o ~/.claude/readaware/books/<slug>/text.txt

For .epub it walks the spine in reading order, strips tags/scripts, decodes entities, and collapses each block into one line. It also writes a text.struct.json sidecar from the epub's own NCX/nav table of contents — build_manifest.py uses that authoritative structure instead of guessing chapter boundaries from text patterns.
Target format: one paragraph per line (blank lines are dropped anyway; build_manifest.py treats every non-empty line as one paragraph).
If extraction comes back empty (DRM, scanned-image, or a weird container), say so and ask the user for a .txt. pandoc/ebook-convert remain fine alternatives if the user prefers them.
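Before moving on, it can help to confirm that the extracted file really has the one-paragraph-per-line shape and is not empty. A minimal sketch, assuming text.txt is UTF-8; count_paragraphs and check_extraction are hypothetical helpers, not part of the plugin's scripts:

```python
from pathlib import Path


def count_paragraphs(text: str) -> int:
    """Count non-empty lines, i.e. paragraphs in the target format."""
    return sum(1 for line in text.splitlines() if line.strip())


def check_extraction(path: Path) -> int:
    """Return the paragraph count, or raise if the extraction came back empty."""
    # Assumption: text.txt is UTF-8 encoded.
    n = count_paragraphs(path.read_text(encoding="utf-8"))
    if n == 0:
        raise ValueError(
            f"{path} has no body text (DRM, scanned images, or an odd container?)"
        )
    return n
```

A suspiciously low count for a full-length book (a few dozen paragraphs, say) is a hint that blocks were merged and the file should be inspected before parsing.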

2. Pick a slug, make the dir, drop in the text

Give the book a short slug (lowercase English, e.g. karamazov, brothers-k).
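Run mkdir -p ~/.claude/readaware/books/<slug>, copy the body text in, and name it text.txt. If step 1 already wrote text.txt to this path with -o, nothing needs copying; in that case create the directory before running the extractor.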
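One way to derive a slug and create the book directory, as a sketch only: slugify and book_dir are hypothetical helpers, and the "lowercase ASCII, hyphen-separated" rule is an assumption drawn from the examples above rather than something the plugin enforces.

```python
import re
from pathlib import Path

# Library root as described in this document.
LIBRARY = Path.home() / ".claude" / "readaware" / "books"


def slugify(title: str) -> str:
    """Lowercase ASCII letters and digits; runs of anything else become '-'."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    if not slug:
        # e.g. a title written entirely in Chinese: choose a slug by hand.
        raise ValueError("title has no ASCII letters or digits; pick a slug by hand")
    return slug


def book_dir(slug: str, library: Path = LIBRARY) -> Path:
    """Create the book's directory if needed and return its path."""
    path = library / slug
    path.mkdir(parents=True, exist_ok=True)
    return path
```

For example, slugify("Brothers K") gives "brothers-k", and book_dir("brothers-k") is the equivalent of the mkdir -p call above.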

3. Self-check the structure before writing the manifest (key — don't skip)

First run --toc to see whether parsing is right, and hand the result to the user to verify:

python3 "${CLAUDE_PLUGIN_ROOT}/scripts/build_manifest.py" \
    ~/.claude/readaware/books/<slug>/text.txt \
    --title "Title" --translator "Translator" --toc

The parser reports a parse mode:

epub-struct — used the epub's own NCX/nav TOC (via the text.struct.json sidecar). Most reliable; this is what you get for a normal epub.
toc — a .txt with an in-text table of contents; parsed it and matched chapters in the body.
scan — a .txt with no TOC; scanned the body directly for headings.

It auto-detects the layout either way, both Chinese (第X卷/第X章/bare 一标题) and Western (Part/Book + Chapter, Arabic/Roman numbering), and the volume/part layer is optional (flat chapter-only books work too). Check two things: (1) that the part and chapter counts look right; (2) in toc mode, that there is no "TOC chapter count ≠ body match count" warning, which means some chapter headings didn't line up and locating will drift.

4. If the layout is off, tune parameters — don't edit the script

Most books need no tuning. When an unusual layout doesn't line up, adjust the command-line parameters and retry. locate.py never changes; the book-specific knowledge stays here, in the ingest parameters:

--part-re (volume/part-title regex(es); group1 = number, group2 = title) — defaults cover 第X卷/部/篇 and Part/Book/Volume X
--chap-re (extra prefixed chapter-title regex(es), appended to the built-in 第X章/Chapter X)
--epilogue (names of unnumbered closing parts, e.g. 尾声, Epilogue)
--front (front-matter section titles), --back (back-matter/afterword titles)

Once --toc looks right, go to the next step.
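To illustrate the "group1 = number, group2 = title" convention, a candidate pattern can be tried out with Python's re module before it is passed to --part-re. This is a sketch: the pattern targets a hypothetical book that uses 卷一-style headings, and how build_manifest.py applies the regex internally (match vs. search, stripped or raw lines) is not stated in this document.

```python
import re

# Hypothetical pattern for part headings like "卷一 一个家庭的历史".
# group 1 = the number, group 2 = the title (may be empty).
PART_RE = r"^卷([一二三四五六七八九十百]+)\s*(.*)$"


def split_heading(line: str):
    """Return (number, title) if the line looks like a part heading, else None."""
    m = re.match(PART_RE, line.strip())
    return (m.group(1), m.group(2)) if m else None
```

If the pattern picks out the headings you expect and nothing else, pass the same string as the --part-re argument and re-run --toc.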

5. Write the manifest

python3 "${CLAUDE_PLUGIN_ROOT}/scripts/build_manifest.py" \
    ~/.claude/readaware/books/<slug>/text.txt \
    -o ~/.claude/readaware/books/<slug>/manifest.json \
    --title "Title" --translator "Translator"   # carry over any parameters you tuned in step 4

6. Verify the parse — and repair if it's wrong (never edit the shared scripts)

Don't trust the parse by eye. Run the checker:

python3 "${CLAUDE_PLUGIN_ROOT}/scripts/verify_manifest.py" ~/.claude/readaware/books/<slug>

Exit 0 = good (skim any ⚠️ WARN — usually fine). Exit 1 = FAIL: the structure is wrong (citation initials like "V." mistaken for chapter numbers, the whole book collapsed into "one chapter", counts that don't add up, markers out of order…). Fix it before continuing.
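If the check is driven from a script rather than by hand, the exit-code contract above can be wrapped as follows. A minimal sketch: run_verifier is a hypothetical helper, and treating any exit code other than 0 or 1 as an error (a crash or missing script, rather than a parse verdict) is an assumption.

```python
import subprocess
import sys


def run_verifier(cmd: list[str]) -> bool:
    """Run a verifier command; True on exit 0 (good), False on exit 1 (FAIL)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    # Echo the checker's report so any WARN lines stay visible.
    sys.stdout.write(result.stdout)
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    # Assumption: other codes mean the checker itself did not run properly.
    raise RuntimeError(
        f"verifier exited with {result.returncode}: {result.stderr.strip()}"
    )
```

Here cmd would be the same command line as above, for example ["python3", "<path to verify_manifest.py>", "<book dir>"] with the placeholders filled in.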

The repair principle: the shared scripts (build_manifest.py, locate.py, extract_text.py) stay universal — never edit them per book. We can't make one parser fit every book; a book the defaults can't handle is fixed with data in that book's own directory, then re-verified. Escalate only as far as you need:

Tune parameters (cheapest). Re-run step 5 with --part-re/--chap-re/--front/--back, then re-verify. Good when the layout is regular but unusual (卷一 not 第一卷, Book I headings…).
Hand-author the structure sidecar when parameters can't express the layout (chapter numbers buried in markup, OCR noise, an epub with no usable TOC). You become the parser: read text.txt, find where each chapter/part actually starts, and write text.struct.json beside it — build_manifest.py uses it verbatim (mode epub-struct). Format (reading order; depth 1 = part, 2 = chapter; drop parts entirely for a flat book):
```
{"headings": [
  {"para_idx": 12, "title": "Part One", "depth": 1},
  {"para_idx": 13, "title": "Chapter 1: Adventurers", "depth": 2}
]}
```
para_idx is the 0-based index among non-empty lines of text.txt. To find indices, dump the short (heading-ish) lines with their indices, then pick the real ones:
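```
python3 - ~/.claude/readaware/books/<slug>/text.txt <<'PY'
import sys

# Print every short (heading-like) paragraph with its para_idx, i.e. its
# 0-based index among the non-empty lines. Assumes text.txt is UTF-8.
with open(sys.argv[1], encoding="utf-8") as f:
    paragraphs = [line.strip() for line in f if line.strip()]
for i, para in enumerate(paragraphs):
    if len(para) <= 40:
        print(i, repr(para))
PY
```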
Then re-run step 5 (it picks up the sidecar) and re-verify.
Edit manifest.json directly (last resort) for a handful of wrong markers: fix, add, or remove markers[] entries or body_start/body_end, keeping para_idx strictly increasing, then re-verify.
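A hand-written headings list is easy to get subtly wrong, so a quick sanity check before re-running step 5 can save a round trip. This is a sketch under assumptions: check_headings is a hypothetical helper, and it assumes each heading sits on its own paragraph (so para_idx strictly increases in reading order) and that depth is only ever 1 or 2, as in the format shown above. verify_manifest.py remains the real check.

```python
def check_headings(headings: list[dict], n_paragraphs: int) -> list[str]:
    """Return a list of problems found in a hand-written headings list."""
    problems = []
    prev = -1
    for i, h in enumerate(headings):
        idx, depth = h.get("para_idx"), h.get("depth")
        if not isinstance(idx, int) or not 0 <= idx < n_paragraphs:
            problems.append(f"entry {i}: para_idx {idx!r} is outside 0..{n_paragraphs - 1}")
            continue
        if depth not in (1, 2):
            problems.append(f"entry {i}: depth {depth!r} should be 1 (part) or 2 (chapter)")
        if not h.get("title"):
            problems.append(f"entry {i}: missing title")
        if idx <= prev:
            # Assumption: headings are listed in reading order, one per paragraph.
            problems.append(f"entry {i}: para_idx {idx} does not come after {prev}")
        prev = idx
    return problems
```

An empty result means the list is at least internally consistent; n_paragraphs is the number of non-empty lines in text.txt.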

Loop until verify_manifest.py passes (or you've reasoned a lone WARN is genuinely fine). Whatever you wrote — tuned params noted in your summary, or the sidecar/manifest in the book dir — stays with the book, so re-ingest is reproducible.

7. (Optional but recommended) Write a book card so read "knows" the book better

In the book dir, write card.md: this book's core motifs, main characters, and structures worth stopping for. read references it during analysis so the interpretation hugs this book instead of being generic. Write it from what you know about the book, flag anything uncertain, and don't make things up.

8. Set it as the active book

Write it into ~/.claude/readaware/state.json:

{"active_book": "<slug>"}

(Create the file if it doesn't exist. read defaults to active_book; it can be omitted when the library has only one book.)
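Written out in Python, the update might look like the sketch below. set_active_book is a hypothetical helper; preserving any other keys already present in state.json is an assumption, since only active_book is described here.

```python
import json
from pathlib import Path

STATE = Path.home() / ".claude" / "readaware" / "state.json"


def set_active_book(slug: str, state_path: Path = STATE) -> dict:
    """Set active_book in state.json, creating the file if it is missing."""
    state = {}
    if state_path.exists():
        # Assumption: keep whatever other keys the file already holds.
        state = json.loads(state_path.read_text(encoding="utf-8"))
    state["active_book"] = slug
    state_path.parent.mkdir(parents=True, exist_ok=True)
    state_path.write_text(
        json.dumps(state, ensure_ascii=False, indent=2) + "\n", encoding="utf-8"
    )
    return state
```

Reading the file first, instead of overwriting it, avoids clobbering anything else that may be stored alongside active_book.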

When done

Tell the user: which book was ingested, how many parts/chapters were parsed (and the parse mode), that it passed verify_manifest.py (and any repair you had to do), where it lives, and whether a card was written. They can start readaware:read.
