From newsroom-os
Config-driven, channel-agnostic INTAKE capability for an installed newsroom: it brings external signals in through the INGEST ADAPTER SLOT and emits one provenance-bearing WIRE NOTE per signal into newsroom/wire/. The agent DETECTs which ingest adapter the installation has wired (email-imap / m365-connector / rss-atom / http-api / manual-drop), fetches via that adapter, and DEGRADES GRACEFULLY when an adapter is absent — manual-drop is the irreducible floor that always works. Cleaning runs on the GENERIC CLEANER FLOOR (a defuddle-style pass that works on ANY source, lossy) as the v1 acceptance path; robust per-source cleaning is OPT-IN hand-authored refinement the operator adds later (PROPOSE→CONFIRM), NEVER auto-derived from a few samples (D1). Markdown-first (P13): ALL intelligence — what is signal vs noise, the signal_reason, the provenance anchor judgment — is the agent reading and writing prose; any bundled script is mechanical-only (fetch/convert/strip-tracking), never extraction or summarization. Wire notes conform to the bundled wire-note schema and carry a resolvable provenance anchor (url / corpus_note / raw). The skill never folds wires into dossiers (that is topic-editor) and never normalizes the controlled tag vocabulary. Use when the user asks to "ingest", "import sources", "new sources", "run ingestion", "scout the pool", "fetch the feeds", "pull the newsletters", or "bring these sources in". Manifest-first: reads MANIFEST.md before globbing zones.
How this skill is triggered — by the user, by Claude, or both
Slash command
/newsroom-os:newsroom-ingestion
The intake layer of the newsroom: it turns external signals — whatever channel they arrive on — into atomic, provenance-bearing wire notes that the topic-editor later folds into dossiers. It is the first phase of the substrate spine.
It is a skill, not a CLI and not a scraper — the agent reads the invocation, DETECTs the wired ingest adapter, runs the mechanical fetch/convert/clean wrappers, and then reads each cleaned source and writes wire notes by judgment. All intelligence is the agent reading content and deciding what is signal, why it might matter, and which provenance anchor resolves it. No bundled script extracts topics, summarizes, or judges signal strength — those are agent reasoning (P13 / the E3 fence).
This skill operates on an installed newsroom (a project-workspace-contract@2
workspace). Every installation-specific value it needs — which adapter is wired,
the source registry, the relevance lens that informs signal judgment — is read from
config and the bundled contracts at runtime. Nothing about any one company, tool,
or source is hardcoded (R63 self-containment; the config-as-truth invariant).
Generic by construction. This skill ships zero source-installation content. All worked examples below use the synthetic Meridian Robotics (a fictional industrial-automation vendor) carried by the bundled contracts: mechanics only, never a real installation's sources, tool choices, or audience. The adapter names below (email-imap, rss-atom, …) are generic capability labels, not product names; any reference to a named external tool is a clearly-labelled generic example, never a hardcoded tool binding (AC2).
Per the substrate I/O spine
([[newsroom-os/skills/newsroom-install/templates/substrate/contracts/io-contracts.md]];
ingestion conforms to it and operates on the installed copy at runtime), ingestion
is the Intake step — it writes the wire notes everything downstream consumes:
external signal ─┐
                ├─▶ INGESTION ─▶ wire notes ─▶ topic-editor (fold-in N:M) ─▶ dossiers
(via the ingest ┘   (this skill) newsroom/wire/<id>.md                          │
 adapter slot)                                                                  ▼
                                                                   curation / managing-editor
| Direction | Artifact | Path | Role |
|---|---|---|---|
| Reads | MANIFEST.md | <root>/MANIFEST.md | routing surface (Phase 0) — what wire notes already exist |
| Reads | source registry | config/<source_registry_ref> (per the channel module) | which sources to pull + their channel/adapter |
| Reads | company-context | config/company-context.md §5 | relevance lens — informs signal_strength judgment only |
| Reads (in) | raw signals | the wired adapter's surface (mailbox / feed / API / research/sources/<id>/raw.*) | the material to file |
| Writes | wire notes | newsroom/wire/<wire-id>.md | ONE per filed signal — the deliverable (wire_state: filed) |
| Writes (overlay) | source overlay | research/sources/<source-id>/source.md | optional Kind-2 AI overlay on a human-dropped raw source |
Ingestion writes ONLY wire notes (+ the optional source overlay). It does not write
linked_dossiers (topic-editor's canonical surface), does not normalize the controlled tags:/entities: vocabulary (topic-editor's job at fold-in), and does not transition any downstream *_state:. A wire note is a filing ("we noticed this, here's why"), not a story draft.
Operational runbook. Read references/ingestion-runbook.md before executing: it carries the step-by-step procedure (adapter readiness handshake, the mechanical fetch/convert/clean wrappers, the generic-cleaner-floor pass, batch-and-sub-agent sizing, the write-first rule, and per-adapter error recovery). The wire-emission contract lives in references/wire-note-emission.md; the opt-in refinement guide in references/per-source-refinement.md; the end-of-run gate in checklists/wire-note-emission.md.
Read MANIFEST.md at the newsroom root FIRST, before any files-first globbing
of newsroom/wire/. The manifest is the routing surface: it tells you which wire
notes already exist (so intake is idempotent — never re-file a signal that
already has a wire note). Use the newsroom/wire/ folder only to resolve the
concrete files the manifest points at. The meta/wire-index cache is a derived
cache subordinate to the manifest (manifest wins on conflict). Files-first
discovery — globbing the wire zone to learn what exists — is the refused
anti-pattern.
The newsroom must not hard-assume any mailbox, feed, or tool. Getting signals in
is an abstract capability slot; a concrete adapter implements it. Ingestion
DETECTs which adapter the installation has wired (from the channel module's
ingest.ingest_slot + the credential --discover probe), fetches through it, and
degrades gracefully when an adapter is absent.
The closed slot enum (channel-module schema §6.1) — these are generic capability labels, not product names:
| ingest_slot | What it abstracts | Reference adapter (clearly-labelled generic example) |
|---|---|---|
| manual-drop | Files a human drops into the newsroom | the research/sources/<id>/raw.* drop folder, the irreducible floor |
| rss-atom | A no-auth structured feed | an RSS/Atom poller wrapper |
| http-api | A public API returning structured items | an API-fetch wrapper (e.g. a papers-feed-like API) |
| email-imap | A mailbox over IMAP | an IMAP fetch wrapper |
| m365-connector | A mailbox over a hosted connector | a connector-fetch path (AI-mediated .eml contract; see runbook §"Connector fetch") |
The lowest-auth-channel heuristic (generic guidance, ported as policy, not a
mandate). When a source offers more than one channel, prefer the lowest-auth,
most-structured one: a no-auth structured feed (rss-atom / http-api) over an
auth-heavy mailbox (email-imap / m365-connector). The one documented exception:
a pre-curated source whose editorial filter is its value (someone already
selected the day's signal) may justifiably stay on its richer email channel even
though email is more auth-heavy — the curation it carries is worth the cost. The
installation decides this at install time and records it in the source registry;
ingestion reads the decision, it does not re-litigate it.
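The preference order above can be sketched as a small install-time helper. This is an illustration only, not part of the skill: the `AUTH_COST` ranking, the `pick_channel` name, and the `pre_curated` flag are assumptions made for the example. Ingestion itself only reads the decision already recorded in the source registry.

```python
# Hypothetical ranking for illustration: lower cost = less auth, more structure.
AUTH_COST = {
    "rss-atom": 0,
    "http-api": 0,
    "email-imap": 1,
    "m365-connector": 1,
}


def pick_channel(offered, pre_curated=False):
    """Pick an ingest_slot from the channels a source offers.

    pre_curated=True models the documented exception: a source whose
    editorial filter is its value may stay on its email channel.
    """
    known = [slot for slot in offered if slot in AUTH_COST]
    if not known:
        # Nothing fetchable is offered, so only the floor remains.
        return "manual-drop"
    if pre_curated:
        mailboxes = [slot for slot in known if AUTH_COST[slot] == 1]
        if mailboxes:
            return mailboxes[0]
    # Default policy: the lowest-auth, most-structured channel wins.
    return min(known, key=AUTH_COST.get)
```

For a source offering both a feed and a mailbox, `pick_channel(["email-imap", "rss-atom"])` selects the feed, while the same call with `pre_curated=True` keeps the mailbox.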
Mirror the research-slot readiness pattern. Before depending on an adapter:
Resolve the wired ingest_slot from the channel module's config + registry.yaml.
Probe presence without echoing secrets — --discover reports key_present
booleans per the ingest credential namespace; never print a value, hash, or
length (R58 redaction).
If the wired adapter is unavailable (no credential, tool absent, fetch fails), degrade down the ladder rather than blocking:
email-imap / m365-connector / rss-atom / http-api (the wired adapter)
│ unavailable
▼
manual-drop ← the irreducible floor: ALWAYS works,
needs no credential, no network, no tool
manual-drop is the floor. A human-dropped research/sources/<id>/raw.* can
always be read and filed as a wire note. If nothing is wired and the drop
folder is empty, STOP and tell the user there is nothing to ingest — never
fabricate signals.
Record the degradation. When you fall back, note it in the run report
("wired adapter email-imap unavailable → manual-drop only this run") so the
operator sees the reduced coverage — exactly as the research slot surfaces its
degraded mode. Degraded intake is an acceptable state; silent degradation is
not.
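A minimal sketch of the ladder, assuming the readiness probe has already reduced to two booleans. The `AdapterChoice` type, the `choose_adapter` name, and its parameters are hypothetical; the slot names and the report wording come from the text above.

```python
from dataclasses import dataclass
from typing import Optional

# The closed slot enum, as listed in this document.
INGEST_SLOTS = {"manual-drop", "rss-atom", "http-api", "email-imap", "m365-connector"}


@dataclass
class AdapterChoice:
    slot: Optional[str]           # adapter to use this run; None means nothing to ingest
    degraded_note: Optional[str]  # run-report line when a fallback happened


def choose_adapter(wired, wired_available, drop_folder_has_files):
    """Resolve the adapter for this run, degrading to manual-drop when needed."""
    if wired is not None and wired not in INGEST_SLOTS:
        raise ValueError(f"unknown ingest_slot: {wired}")
    if wired and wired != "manual-drop" and wired_available:
        return AdapterChoice(wired, None)
    note = None
    if wired and wired != "manual-drop":
        # Degradation is acceptable; silent degradation is not.
        note = f"wired adapter {wired} unavailable → manual-drop only this run"
    if drop_folder_has_files:
        return AdapterChoice("manual-drop", note)
    # Nothing wired (or reachable) and an empty drop folder: stop, never fabricate.
    return AdapterChoice(None, note)
```

A `slot` of `None` is the STOP case: the caller tells the user there is nothing to ingest and still surfaces `degraded_note` if one was produced.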
How a fetched signal becomes readable markdown the agent can file. The design is deliberately two-tiered (D1):
Tier 1, the generic cleaner floor. A defuddle-style mechanical pass that works on ANY source: strip boilerplate (nav, footers, tracking pixels), convert HTML → markdown, drop tracking URL parameters. It is lossy: it does not know any one source's exact ad blocks or section structure, so some noise survives and the agent finishes the cleanup by judgment when it reads the source. This floor is the v1 pass bar: a newsroom is fully functional on the generic floor alone, on day one, with zero per-source tuning. The floor is a mechanical wrapper (P13 / R28): it transforms bytes; it does not extract topics, summarize, or judge signal. That is the agent's job.
Tier 2, opt-in per-source refinement. A robust per-source cleaning pattern is an optional refinement the operator adds
later, on top of the floor, for a high-volume source whose noise is worth tuning
away. It is PROPOSE→CONFIRM (the agent may propose a pattern from observed test
items; the operator ratifies it) and hand-authored — it is NEVER auto-derived
from a few samples and silently applied. A few samples do not generalize; a
pattern the operator did not ratify is not a contract. See
references/per-source-refinement.md for the proposal-and-ratify loop and where a
ratified pattern lives. The floor must work without any refinement; refinement is
hand-maintained gravy, not a dependency.
The fence (D1, reviewer-refusable). The substrate ships the generic floor only. It does NOT ship a corpus of per-source cleaners, and it NEVER auto-derives one. Per-source patterns are the operator's hand-authored IP, layered in after install. A reviewer who finds an auto-derived-and-applied per-source cleaner in the payload refuses it.
In order. The fetch/convert/clean steps are mechanical wrappers; reading and filing
are agent judgment. None of these transitions a downstream *_state:.
Read MANIFEST.md (Phase 0), then the source registry the channel module points at
(config/<source_registry_ref>) to learn which sources to pull and which adapter
each uses. Cross-reference the manifest's existing wire notes to compute the
delta — what has not yet been filed (idempotency).
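The delta computation can be sketched as a set difference. This assumes the manifest exposes some stable per-signal key; the document does not define that key, so the `signal_key` field and the `unfiled_delta` name are hypothetical.

```python
def unfiled_delta(manifest_keys, candidates):
    """Return candidates not yet filed, in order, without in-run duplicates.

    manifest_keys: keys of signals that already have a wire note (from MANIFEST.md).
    candidates:    dicts carrying a hypothetical "signal_key" field.
    """
    seen = set(manifest_keys)
    delta = []
    for candidate in candidates:
        key = candidate["signal_key"]
        if key in seen:
            continue  # already filed, or already queued earlier in this run
        seen.add(key)
        delta.append(candidate)
    return delta
```

Adding each key to `seen` as it is queued also guards against filing the same signal twice within one run.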
Run the readiness handshake; fetch through the wired adapter (or the degraded
fallback). Fetch is a mechanical wrapper — it retrieves raw bytes to a working
location (workspace/ scratch or the research/sources/<id>/raw.* Kind-2 layer for
human drops). It makes no editorial judgment.
Run the generic cleaner floor over each fetched item: boilerplate strip → HTML→md → tracking-URL strip. If a ratified per-source refinement exists for this source (Tier 2), apply it on top; otherwise the floor stands alone. Output is readable markdown in the working area. Still mechanical — no extraction.
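The tracking-URL strip is the easiest of the three floor steps to show in isolation. The sketch below uses only the Python standard library; the list of tracking parameters is an illustrative assumption, not the bundled floor's actual list, and the boilerplate strip and HTML → markdown steps are left out.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Illustrative assumption: a real floor would carry its own list.
TRACKING_PREFIXES = ("utm_",)
TRACKING_KEYS = {"fbclid", "gclid", "mc_cid", "mc_eid"}


def strip_tracking(url):
    """Drop tracking query parameters; keep everything else in the URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in TRACKING_KEYS
    ]
    # urlencode re-serializes the surviving pairs, which may normalize escaping.
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

The function is purely mechanical: it transforms a string and makes no judgment about whether the linked content is signal.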
This is the intelligence step. For each cleaned source, the agent reads it and files one or more wire notes — one atomic filed signal each. For each wire note the agent decides, by judgment:
- what is signal, read through the relevance lens (config/company-context.md §5); the lens informs signal_strength, it does not gate or auto-score anything;
- signal_strength (low | medium | high) and a one-sentence signal_reason;
- hype_filter when the material is vendor PR / marketing-flavoured (what may be overstated);
- provenance_anchor (url | corpus_note | raw) and its resolution; see the anchor rule below.

Be selective: a dense source yields a handful of wire notes, not one per
paragraph; merge near-duplicates within a source. Write the wire note per
references/wire-note-emission.md and the bundled schema. Write-first rule: for
high-volume runs, the write IS the deliverable — file each wire note as you finish
reading its source; do not batch all reading then run out of context before writing
(runbook §"Sub-agent delegation").
Walk checklists/wire-note-emission.md: every wire note validates against the
schema, every wire note resolves to an anchor, no signal was filed twice, and the
run report records the adapter used + any degradation. Ingestion is not complete
until every intended source is either filed or explicitly reported as skipped/failed.
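Two of the gate's checks are mechanical enough to sketch: every intended source is accounted for, and no wire_id appears twice. The `run_gaps` helper and its argument shapes are assumptions for illustration; schema validation and anchor resolution are separate checks in the checklist.

```python
def run_gaps(intended_sources, filed_notes, skipped_or_failed):
    """Report what still blocks the end of a run.

    intended_sources:  source names the run meant to cover.
    filed_notes:       dicts with "wire_id" and "source_name".
    skipped_or_failed: source names explicitly reported in the run report.
    """
    filed_sources = {note["source_name"] for note in filed_notes}
    unaccounted = sorted(
        set(intended_sources) - filed_sources - set(skipped_or_failed)
    )
    wire_ids = [note["wire_id"] for note in filed_notes]
    duplicates = sorted({wid for wid in wire_ids if wire_ids.count(wid) > 1})
    return {"unaccounted_sources": unaccounted, "duplicate_wire_ids": duplicates}
```

The run counts as complete only when both lists come back empty.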
Each wire note conforms to the bundled wire-note schema
([[newsroom-os/skills/newsroom-install/templates/substrate/schemas/wire-note.md]]).
Ingestion emits wire notes at wire_state: filed with an empty
linked_dossiers: [] (linking is the topic-editor's canonical surface, set at the
filed → linked fold step) and empty tags:/entities: (topic-editor
normalizes the controlled vocabulary at fold-in). What ingestion DOES set:
| Field | Ingestion sets | Notes |
|---|---|---|
| type | wire-note | - |
| wire_id | <filing-date>-<source-slug>[-<n>] | matches ^\d{4}-\d{2}-\d{2}-[a-z0-9-]+$ |
| status | draft | the RESERVED enum (R61), never the intake lifecycle |
| wire_state | filed | the intake lifecycle, ALONGSIDE status: (R61) |
| source_channel | from the registry | closed enum: newsletter \| papers \| manual \| rss \| api \| x |
| source_name | from the registry | human-readable source identifier |
| source_url | the canonical external URL, or null | required non-null when provenance_anchor: url |
| filed_at / filed_by | ISO timestamp / newsroom-ingestion | - |
| signal_strength / signal_reason | agent judgment | the one-line reason it might matter |
| hype_filter | optional skeptic note | when material is vendor PR |
| linked_dossiers | [] (empty) | topic-editor's surface; ingestion leaves it empty |
| provenance_anchor | url \| corpus_note \| raw | see the anchor rule |
| tags / entities | [] (empty) | topic-editor normalizes at fold-in |
| decision_history | one filed entry | decided_by: newsroom-ingestion |
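As a sketch of the mechanical part of this table, the helper below builds a wire_id and checks it against the pattern given above, then fills in the constant fields ingestion sets. The function names and the example slug are hypothetical; the judgment fields (signal_strength, signal_reason, hype_filter, provenance_anchor) are left out because the agent writes them.

```python
import re
from datetime import date

# Pattern copied from the wire_id row of the table.
WIRE_ID_RE = re.compile(r"^\d{4}-\d{2}-\d{2}-[a-z0-9-]+$")


def make_wire_id(filing_date, source_slug, n=None):
    """Build <filing-date>-<source-slug>[-<n>] and validate it."""
    wire_id = f"{filing_date.isoformat()}-{source_slug}"
    if n is not None:
        wire_id = f"{wire_id}-{n}"
    if not WIRE_ID_RE.match(wire_id):
        raise ValueError(f"wire_id does not match the schema pattern: {wire_id}")
    return wire_id


def filed_defaults(wire_id):
    """Fields ingestion always sets the same way for a newly filed wire note."""
    return {
        "type": "wire-note",
        "wire_id": wire_id,
        "status": "draft",
        "wire_state": "filed",
        "filed_by": "newsroom-ingestion",
        "linked_dossiers": [],
        "tags": [],
        "entities": [],
    }
```

For example, `make_wire_id(date(2025, 3, 4), "meridian-robotics-blog", 2)` yields `2025-03-04-meridian-robotics-blog-2`, while a slug containing uppercase letters raises `ValueError`.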
The anchor rule (Commandment V at intake grain, reviewer-refusable). Every wire note must resolve to a concrete provenance anchor:
- provenance_anchor: url → source_url: is non-null and stable (the common case for public web sources). Prefer this: a stable public URL is the cheapest durable anchor.
- provenance_anchor: raw → related: links the retained raw source under research/sources/<id>/raw.* (when there is no stable public URL, e.g. a local drop, a paywalled copy with no public edition).
- provenance_anchor: corpus_note → related: links a durable in-vault note (incl. a knowledge/<subzone>/ digested-evergreen note) the wire cites.

A wire note that resolves to no anchor is a provenance gap: STOP and either
retain the raw (raw) or capture the URL (url); never file an unanchored wire.
This is the contract the two-path provenance integrity audit (AC10) ultimately
walks back to.
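The structural half of the anchor rule can be checked mechanically, as in the sketch below. It assumes a wire note's frontmatter is available as a dict and that `related` is a list of links; the `anchor_gap` name is hypothetical. Whether a URL is actually stable, or a linked file actually exists, remains outside this check.

```python
def anchor_gap(note):
    """Return None when the note names a resolvable anchor, else a short reason."""
    anchor = note.get("provenance_anchor")
    if anchor == "url":
        if note.get("source_url"):
            return None
        return "provenance_anchor is url but source_url is null"
    if anchor in ("raw", "corpus_note"):
        if note.get("related"):
            return None
        return f"provenance_anchor is {anchor} but related is empty"
    return f"unknown provenance_anchor: {anchor!r}"
```

Any non-`None` result is a provenance gap, so the note should not be filed until the raw is retained or the URL captured.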
Ingestion is not a newsletter feature. A wire note is channel-agnostic: it
records a signal, not a destination. The same wire pool feeds a dossier that may
later spawn a story-arc commissioned to any installed channel (newsletter,
blog, …). Ingestion reads which sources to pull from the registry, but it never
assumes the wire will be published anywhere, nor in any particular format. A channel
module declares ingest.needs_ingest + an ingest_slot if it needs intake at all
(a blog module typically does not — it commissions from existing dossiers); a module
that needs no ingest simply wires no source.
- Writes wire notes to newsroom/wire/ (wire_state: filed, empty linked_dossiers/tags/entities) and, optionally, a Kind-2 research/sources/<id>/source.md AI overlay on a human-dropped raw source.
- Never writes linked_dossiers (topic-editor's canonical surface), never transitions wire_state past filed, never proposes or mutates a dossier, never normalizes the controlled tags:/entities: vocabulary, and never writes any downstream status:/*_state:.

This skill's own runtime references, the runbooks under references/ and the
checklist under checklists/, resolve inside this skill dir, with no read
outside the plugin. Intra-skill ../ traversal (SKILL.md ↔ references/ ↔
checklists/) is legitimate self-contained navigation and is not an R63
violation; the AC1 self-containment grep should not false-flag it.
The schemas + I/O contract ingestion conforms to are not bundled here. Following
the plugin's bundle-vs-install rule (the plugin ships templates under
newsroom-install; skills reference those templates and read the installed
copies at runtime — no per-skill duplication, exactly as curation, managing-editor,
and story-research do), ingestion cites them by in-plugin wikilink:
- [[newsroom-os/skills/newsroom-install/templates/substrate/contracts/io-contracts.md]] (the spine; output = wire notes)
- [[newsroom-os/skills/newsroom-install/templates/substrate/schemas/wire-note.md]] (output schema: the contract every wire note conforms to)
- [[newsroom-os/skills/newsroom-install/templates/substrate/contracts/company-context-contract.md]] (§5 relevance lens: informs signal_strength only)
- [[newsroom-os/skills/newsroom-install/templates/substrate/contracts/channel-module-contract.md]] (ingest.ingest_slot + source_registry_ref: which adapter + sources)

The wikilinks resolve inside the plugin (R63-clean); the operative copies at ingestion time are the newsroom's installed ones.
newsroom-ingestion/
├── SKILL.md # this file — the workflow
├── references/
│ ├── ingestion-runbook.md # step-by-step: readiness, fetch/clean wrappers, generic floor, batch sizing, write-first, recovery
│ ├── adapter-slots.md # the ingest slot enum, DETECT handshake, degradation ladder, lowest-auth heuristic
│ ├── generic-cleaner-floor.md # the defuddle-style floor (any-source, lossy) — the v1 pass bar
│ ├── per-source-refinement.md # OPT-IN hand-authored refinement (PROPOSE→CONFIRM), never auto-derived (D1)
│ └── wire-note-emission.md # the wire-note output contract + the Commandment-V anchor rule
└── checklists/
└── wire-note-emission.md # the end-of-run gate
| Condition | Action |
|---|---|
| MANIFEST.md missing | Stop. The newsroom is not installed / not manifest-routable; tell the user to run the installer. |
| No source registry / no ingest_slot wired | Fall back to manual-drop. If the drop folder is empty too, stop: there is nothing to ingest. |
| Wired adapter unavailable (no credential / tool absent / fetch error) | Degrade down the ladder to manual-drop; record the degradation in the run report. Do NOT block. Do NOT fabricate. |
| Generic floor leaves heavy noise | The agent finishes the cleanup by judgment when reading (Move 4); optionally PROPOSE a per-source refinement (never auto-apply). |
| A source yields no resolvable anchor | Retain the raw (provenance_anchor: raw) or capture the URL; never file an unanchored wire. |
| A signal already has a wire note (manifest) | Skip silently — idempotency. Re-file only if the source materially changed. |
| config/company-context.md §5 missing | Continue intake (signal judgment is coarser without the lens); note it in the report and do NOT stop. The lens informs signal_strength; it is not a hard input to filing. |
| Move | Type | What happens |
|---|---|---|
| 0. Manifest-first | Verification | Read MANIFEST first; compute the unfiled delta |
| 1. Resolve source pool | File I/O | Read registry + manifest; idempotency delta |
| 2. Fetch via adapter | Mechanical wrapper + DETECT | Adapter readiness; fetch raw; degrade gracefully |
| 3. Generic-floor clean | Mechanical wrapper | Boilerplate strip → HTML→md → tracking strip (lossy) |
| 4. Read + file wire notes | Agent judgment | What is signal, why it matters, anchor, signal_strength |
| 5. End-of-run gate | Verification | Schema + anchor + idempotency + report |
Ingestion's human-facing output language is config-driven, never hardcoded — the
wire-note prose (the "What the signal is" / "Why it might matter" sections,
signal_reason, hype_filter) and the run report follow the installed
config/company-context.md §7 output_language. Respond to the user in the
conversation's language unless asked otherwise.
- Wire-note prose and the run report follow output_language (company-context §7): honor that config declaration; do not hardcode a language. Quote the source material in its own language when the exact wording is the signal.
- Where the output language has umlauts or ß, write them as such rather than as ASCII substitutes (ue, oe, ae, ss).
- Keep the schema's field names and enum values as defined (type, status, wire_state, source_channel, signal_strength, provenance_anchor) regardless of content language.

Standard runs end normally: the filed wire notes + the run report (adapter used, any degradation, idempotency delta). Do not fire a generic "anything to add?" prompt.
When a deviation specific to ingestion occurred during the run, surface it and
ask whether to fold the change back into SKILL.md / the bundled references (the
ingestion runbook, the adapter-slots doc, the generic-cleaner-floor doc, the
per-source-refinement guide, the wire-note-emission contract) before going idle.
Ingestion-specific triggers:
Name the concrete source / wire note and the reference doc the fix would touch. If nothing deviated, end without the prompt.
npx claudepluginhub cmgramse/skill-development --plugin newsroom-os