Skill

voice

Manage Claude Code, Codex, or Opencode voice assistant — TTS with personality. Use when the user says /voice, asks about speech settings, wants to change voice persona, enable/disable TTS, or configure voice providers. Also use when user says "stop talking", "disable speech", "enable voice", or similar. Also use when user mentions voice names, accents, pacing, style, or says things like "more expressive", "use a male voice", "speak in Dutch", "slow down", "what voice is this?".

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/voice-assistant:voice-assistant

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Text-to-speech with configurable personality for Claude Code, Codex, and

Supporting Files

SKILL.md

1012 lines · ~14.1k tokens(exceeds 5k compaction limit)

Stats

LanguagePython

Parent stars0

MaintenanceGood

Last CommitJun 30, 2026

Actions

View Source View Plugin View on GitHub View README

Stats

Actions

Voice Assistant Skill

Text-to-speech with configurable personality for Claude Code, Codex, and Opencode project configuration.

Current Readiness

The voice system supports five provider families: macos, voxtral, gemini, local_tts, and omlx. In provider names, voxtral means cloud Voxtral through the Mistral API. Local Voxtral models run through provider=local_tts. The local_tts provider is a generic OpenAI-compatible HTTP adapter and has been real-E2E verified with MLX-Audio for Kokoro 4-bit, PocketTTS 4-bit, and Voxtral TTS 4-bit. Do not claim every MLX-Audio model is production-ready until that specific model has a real audio test on the user's machine.

omlx is the same OpenAI-compatible /v1/audio/speech client pointed at a separate, externally-managed oMLX server (default :8001, auth-gated) instead of the bundled mlx-audio server (local_tts, :8000). It exists as a distinct provider so both can be configured at once and switched, and so it never auto-starts a server. It is unverified for TTS until oMLX is confirmed to serve /v1/audio/speech with a TTS model loaded — see the omlx schema below.

Read references/local-tts-onboarding.md before advising users about local models, PocketTTS, device support, custom voices, or installation mode.

For the EliteExperts project-local SAMI setup, read requirements.md before installing, repairing, or handing off the skill. Run python3 <skill>/scripts/check_requirements.py --target both from the project root to verify Apple Silicon, uv, Claude CLI, Hugging Face CLI, real project-local skill copies, local sami_de persona files, Voxtral model cache, and the shared MLX-Audio server. The checker must not install anything silently; when a requirement is missing, report the missing item and ask the user before installing dependencies, downloading the model, or starting the server.

Project-local lifecycle hooks run a deliberately small SessionStart preflight: the project must explicitly define enabled, provider, active_profile, and primary_language, and the active persona must exist in the project-local host directory when project_local_persona_required is true. Vendored handoff projects should set that flag. Symlinked personal workspaces may continue to use global or built-in personas intentionally. Treat broader readiness as a markdown rubric plus check_requirements.py, not as a large programmatic test framework.

Provider Choice

Provider	Best for	Main tradeoff
`macos`	Zero-setup speech and emergency fallback	Robotic, limited personality
`gemini`	Most expressive cloud voice, Director's Notes, accents	Requires Gemini credentials/config
`voxtral`	Cloud Voxtral through the Mistral API with simple emotion-mapped voices	Requires Mistral credentials/config
`local_tts`	Local/private speech through MLX-Audio or another OpenAI-compatible server	Needs a running local server, model downloads, RAM, and real E2E testing
`omlx`	Local/private speech through a separate, externally-managed oMLX server (`:8001`, auth-gated) — co-exists with `local_tts`	Needs oMLX running with a TTS model loaded + a bearer token; no auto-start

Machine-readable provider credential requirements live in defaults/provider-requirements.json. That file contains only non-secret metadata. It must distinguish cloud Voxtral (provider=voxtral, Mistral API key) from local Voxtral (provider=local_tts, no Mistral API key unless the local server itself uses a bearer-token setting).

For Apple Silicon users who want local/private English summaries, start with Kokoro. The built-in vulcan-science-officer persona uses Kokoro 4-bit with the English bm_george voice. For German or Dutch projects, prefer local Voxtral 4-bit. Vollständige Modell/Sprach-Hardrules + Auswahl-Logik: authoritative in defaults/local-tts-models.json project_language_rules. Keep fallbacks disabled until failures are understood; this system should fail loudly during rollout.

Commands

Command	Description
`/voice`	Show current status (see Status Display below)
`/voice setup`	Guided first-time setup (interactive Q&A)
`/voice create`	Create a new persona interactively
`/voice list`	List all available personas with traits preview
`/voice switch <name>`	Switch active persona for this project
`/voice provider macos\|voxtral\|gemini\|local_tts\|omlx`	Switch voice provider
`/voice language <code>`	Persist primary project conversation language for voice/model checks
`/voice local status`	Show local TTS endpoint, model, voice, fallback, health, and last synthesis diagnostics
`/voice local check`	Check configured local TTS server health without generating audio
`/voice local doctor`	Run full local TTS readiness checks and print actionable setup notices
`/voice local test`	Generate and play a short local TTS test phrase
`/voice local models`	List OpenAI-compatible model IDs when the server supports `/v1/models`
`/voice local-models --language <code>`	Show known local TTS model/language compatibility before choosing a model
`/voice doctor`	Cross-cutting voice self-diagnostic — config, credential presence, both servers + loaded models, last synthesis, recent announcer outcomes, and observations that flag the recurring failure modes (see Voice self-diagnostic below)
`/voice credentials status`	Show redacted credential readiness by provider and target
`/voice credentials set gemini\|voxtral`	Prompt locally for a cloud-provider key and write it to a user-global credential file
`/voice credentials verify gemini\|voxtral\|local_tts`	Exit nonzero when required credentials for that provider are missing
`/voice server status`	Show shared local TTS server readiness, managed PID, model, voice, and log path
`/voice server ensure`	Reuse a ready shared server or start one detached shared MLX-Audio server
`/voice server logs`	Show recent shared MLX-Audio server logs
`/voice on`	Enable TTS (persists across sessions)
`/voice off`	Disable TTS (persists across sessions)
`/voice test`	Speak a test sentence with current persona + provider
`/voice voice`	Browse all 30 Gemini voices with traits
`/voice voice preview <name>`	Play a sample in that voice
`/voice voice <name>`	Set the active Gemini voice
`/voice style`	Show current Director's Notes (style, pacing, accent)
`/voice style <description>`	Update style/pacing/accent from description
`/voice cache warm`	Pre-generate cached audio for acknowledgments

Fresh Project Onboarding (`I want to engage voice`)

This is a markdown-driven setup conversation, not a silent wizard. Ask one decision at a time, wait for the user's answer, then act. Do not install hooks, choose a provider, start a local server, or enable voice until that exact decision has been made.

Comprehensive walkthrough: see references/onboarding/00-master-flow.md for the full ordered checklist linking persona setup, language alignment, and platform-specific persona blocks. Use that as the master conductor; this SKILL.md section covers the technical gates only (install scope, agent target, provider, language, audible test, enable).

Start by using the helper script for state:

uv run <skill>/scripts/voice.py status --target auto
uv run <skill>/scripts/voice.py doctor --target auto

Run those commands from the consuming project root, or set VOICE_ASSISTANT_PROJECT_DIR=/absolute/path/to/project when invoking helpers from the skill-development checkout. The helpers use that explicit env var, then CLAUDE_PROJECT_DIR/CODEX_PROJECT_DIR, then the current working directory, then the skill path.

The helper script is not the onboarding experience. The skill is. The script exists so the AI can inspect state, persist explicit choices, and fail loudly without inventing shell snippets. Hook installation and audio tests remain in their specialist scripts.

Required order:

Prompt 1: Install Scope

Ask exactly one:

Should I install voice for this project only, or as your user-wide default? Recommended for a fresh repo: project-only, so hooks and skill code are explicit here.

Choices:

project-only: use .agents/skills/voice-assistant and/or .claude/skills/voice-assistant; hooks point at this project copy.
user-wide: use the global skill/preferences; suitable only when the user wants the same voice behavior across projects.

For project-local hooks, the project-local skill copy is the runtime source of truth. Do not install both unless the user explicitly asks.

Prompt 2: Agent Target

Ask exactly one:

Which agent should I wire voice into: Codex, Claude Code, Opencode config, or all supported targets?

Choices:

codex: install Codex hooks only.
claude: install Claude hooks only.
opencode: distribute skill/persona/config AND install the Opencode lifecycle adapter (an ESM plugin at .opencode/plugins/voice-assistant.plugin.ts). Audible lifecycle hooks are implemented for Opencode, pending live verification — the adapter, event mapping, detached Stop dispatch, and the hook stdin/project-root handoff are tested in isolation, but an end-to-end audible run inside a live Opencode TUI / opencode run session has not yet been confirmed on a user machine. See references/opencode-harness-contract.md.
both: keep .agents/skills/voice-assistant and .claude/skills/voice-assistant mirrored except documented host-specific differences.
all: install Codex and Claude hooks, plus the Opencode lifecycle adapter and skill/persona/config distribution.

For both, persist provider and enabled state by running each host's own scripts/voice.py helper. Do not use the Codex helper to write Claude config or the Claude helper to write Codex config; config reads and writes must use the same runtime source of truth.

Prompt 3: Provider Choice

Ask exactly one:

Which voice provider do you want first? macos is fastest, gemini is most polished, voxtral is cloud Voxtral through the Mistral API, and local_tts is private/local but requires a running server.

Provider rules:

macos: zero credentials; acceptable fallback; least expressive.
gemini: cloud; needs Gemini credentials saved where hooks can read them; best Director's Notes/style support.
voxtral: cloud Voxtral through the Mistral API; needs Mistral credentials; use documented cloud Voxtral voices.
local_tts: local/private, including local Voxtral models; requires explicit model + voice, loopback server, diagnostics, and successful audible test before enabling.

Prompt 4: Project Language

Ask exactly one before selecting a local TTS model:

What is the primary language for instructions and AI replies in this project?

Persist the answer, then inspect known local model support:

uv run <skill>/scripts/voice.py language <code> --scope project
uv run <skill>/scripts/voice.py local-models --language <code>

This is a project/persona setting today; the skill does not dynamically switch local models per response. Modell/Sprach-Hardrules (z.B. Kokoro/PocketTTS-Sperre für DE, Voxtral als DE/NL-Empfehlung): authoritative in defaults/local-tts-models.json project_language_rules. Den Helper-Aufruf oben nutzen, dessen Output beachten, nicht parallel pflegen.

Local TTS Setup Gate

If the user chooses local_tts, branch to the platform-specific docs:

Persona-Side (Modell/Voice/emotion_map): siehe references/onboarding/05-platform-local-tts.md.
Server-Side (uv, MLX-Audio, shared manager, pre-download, fail-loud): siehe references/local-tts-onboarding.md.
Modell-Sprach-Kompatibilität + Hardrules (z.B. „Kokoro nicht für Deutsch"): authoritative in defaults/local-tts-models.json project_language_rules + models[]. Andere Docs verweisen.

Apple Silicon zuerst; auf Intel/Windows/Linux funktioniert local_tts mit beliebigem OpenAI-kompatiblem Server, aber MLX-Audio nicht versprechen. Hooks importieren nie MLX direkt; SessionStart reuse-t den shared Server oder fails loud.

Completion Gate

Voice is not considered engaged until:

Hooks are installed for the selected target.
The claude CLI is installed and authenticated because greeting and summary rewrites use claude -p --model sonnet before TTS.
Provider config resolves without validation errors.
For local_tts, voice.py server status shows a ready shared server or voice.py server ensure starts/reuses one; failures must produce an explicit user-facing diagnostic.
Active persona has a non-"User" user_name. Built-in professional / casual ship with "User" as a placeholder; the notification hook explicitly suppresses the name-prefix in that case, so onboarding is not complete until user_name has been set per references/onboarding/02-persona-naming-and-tone.md.
Active persona has a populated voice.<provider> block matching the chosen provider. For Gemini that means voice_name, audio_profile, scene, directors_notes (style/pacing/accent), and sample_context; for Voxtral cloud primary_voice + emotions; for local_tts model + voice + emotion_map.
If primary_language != "en": persona's acknowledgments, greeting_style, language_style, and (if used) notification_message / precompact_message are in the target language.
A short audible test succeeds through the selected provider, in the user's persona AND in the user's language.
The user confirms enabling voice.

Command mapping:

uv run <skill>/scripts/voice.py status --target codex
uv run <skill>/scripts/voice.py doctor --target codex
uv run <skill>/scripts/install_hooks.py --target codex
uv run <skill>/scripts/voice.py server status --target codex
uv run <skill>/scripts/voice.py server ensure --target codex --timeout 30
uv run <skill>/scripts/local_tts.py --target codex doctor
uv run <skill>/scripts/local_tts.py --target codex test --text "Local TTS is working."
uv run <skill>/scripts/voice.py provider local_tts --target codex --scope project --confirm-audible-test
uv run <skill>/scripts/voice.py on --target codex --scope project
uv run <skill>/scripts/test_tts.py --provider local_tts --text "Voice is working."

Custom voice/reference-audio cloning is not implemented. If requested, explain that current behavior is consent-gated but intentionally unsupported: custom_voice_consent_required for missing consent and unsupported_custom_voice even when valid consent/reference data is supplied. Do not clone, ignore the custom voice block, or silently substitute a built-in voice.

Status Display

When running /voice, show this information:

Always show:

Enabled/disabled
Active persona name + core_tone
Provider (macos, voxtral, gemini, or local_tts)

When provider is gemini, also show:

Voice name + gender + official trait (read from defaults/gemini-voices.md)
Director's Notes: style, pacing, accent (from persona's voice.gemini.directors_notes)
If persona has no voice.gemini block, note "Using defaults (Aoede, Breezy)" and offer to configure

Example output:

Voice Assistant Status
  Enabled:    yes
  Persona:    Sami (warm and personal)
  Provider:   gemini

  Gemini TTS:
    Voice:    Aoede (Female, Breezy)
    Style:    Warm, witty, affectionate. Light British charm...
    Pacing:   Varied — brisk for wins, slower for reassurance
    Accent:   Modern British, London-adjacent

When provider is local_tts, also show:

Endpoint and whether the resolved host is loopback (127.0.0.1, ::1, or localhost)
Model and voice from the active persona's voice.local_tts block
Response format (wav, mp3, flac, ogg, or m4a)
Timeout and retry settings (connect_timeout_seconds, read_timeout_seconds, retry_count)
Fallback state: enabled/disabled, fallback provider, and eligible error types
Custom voice state: disabled, consent-required, unsupported, or future supported state
Last health check: status (not_checked, disabled, healthy, reachable_warning, or failed), endpoint, HTTP status, latency, and error type when present. Show disabled when health.enabled: false is configured.
Last synthesis: status, response format, byte count, playback/cache result, latency, and fallback use
Last error: error type, exception class, message, and request ID when available
Shared server status: ready/not-ready, managed PID when known, log file path, and whether the server was reused or started.
If local_tts is configured and a loopback server is unreachable, say so plainly, show the shared server manager command, and ask whether the user wants you to start/reuse it or will start it themselves.

Example output:

Local TTS:
  Endpoint:        http://127.0.0.1:8000/v1/audio/speech (loopback: yes)
  Model:           mlx-community/Kokoro-82M-4bit
  Voice:           af_sarah
  Response format: wav
  Timeout/retry:   connect 2s, read 20s, retries 0
  Fallback:        disabled (provider: macos; on: connection_error, timeout)
  Custom voice:    disabled
  Last health:     not checked
  Last synthesis:  not checked
  Last error:      none

Voice Browser (`/voice voice`)

When running /voice voice (no argument), show all 30 Gemini voices grouped by gender. Read defaults/gemini-voices.md for the complete voice table with official traits.

Authority: The trait labels in the block below mirror defaults/gemini-voices.md. If a trait drifts, fix it in defaults/gemini-voices.md first — that file is the single source of truth. Do not edit the labels here in isolation.

Display format:

Female voices:
  Achernar (Soft) · Aoede* (Breezy) · Autonoe (Bright)
  Callirrhoe (Easy-going) · Despina (Smooth) · Erinome (Clear)
  Gacrux (Mature) · Kore (Firm) · Laomedeia (Upbeat)
  Leda (Youthful) · Pulcherrima (Forward) · Schedar (Even)
  Sulafat (Warm) · Vindemiatrix (Gentle) · Zephyr (Bright)

Male voices:
  Achird (Friendly) · Algenib (Gravelly) · Algieba (Smooth)
  Alnilam (Firm) · Charon (Informative) · Enceladus (Breathy)
  Fenrir (Excitable) · Iapetus (Clear) · Orus (Firm)
  Puck (Upbeat) · Rasalgethi (Informative) · Sadachbia (Lively)
  Sadaltager (Knowledgeable) · Umbriel (Easy-going) · Zubenelgenubi (Casual)

* = currently active

Say "preview <name>" to hear a sample, or "<name>" to switch.

To preview a voice: Run uv run <skill>/scripts/preview_voice.py --voice-name <name>. To set a voice: Run uv run <skill>/scripts/update_gemini_config.py --persona <id> --voice-name <name>, then speak a test phrase.

Style Configuration (`/voice style`)

/voice style (no argument) — Show the current Director's Notes:

Read the persona's voice.gemini.directors_notes object
Display style, pacing, and accent fields
If no gemini block, show "No Gemini style configured" and offer to create one

/voice style <description> — Update style conversationally:

Parse the user's description into style/pacing/accent fields
Run uv run <skill>/scripts/update_gemini_config.py with the new values
Speak a test phrase to confirm

Natural Language Triggers

Basic controls

"enable voice" / "turn on voice" / "start talking"
"disable voice" / "turn off voice" / "stop talking" / "be quiet"
"switch to sami" / "use professional persona"

Gemini voice and style

"more expressive" / "more energy" / "more personality" — Read current directors_notes.style, add more vivid descriptors, save via update_gemini_config.py, test
"less expressive" / "tone it down" / "more subtle" — Simplify style descriptors
"I want a male voice" / "use a female voice" — Show filtered voice list from gemini-voices.md, offer preview
"change voice" / "different voice" / "preview voices" — Show /voice voice browser
"what voice is this?" / "who is speaking?" — Show current Gemini voice + style + pacing + accent

Pacing and accent

"slow down" / "speak slower" — Update directors_notes.pacing to slower variant, save
"faster" / "speed up" — Update pacing to brisker variant
"speak in Dutch" / "switch to French" / "use Japanese" — Update directors_notes.accent with the language directive (e.g., "Dutch, natural Netherlands accent"). Check gemini-voices.md for BCP-47 code support.
"sound more British" / "American accent" / "Australian" — Update accent field specifically

How to handle style/voice changes

Read current persona from ~/.codex/voice/profiles/<id>.json with fallback to ~/.claude/voice/profiles/<id>.json
Modify the voice.gemini.directors_notes or voice.gemini.voice_name fields
Write back via uv run <skill>/scripts/update_gemini_config.py
Speak a test phrase: "How's this? I've adjusted the [style/voice/pacing]."
Ask: "Better? Want me to tweak anything else?"

How It Works

When enabled, Codex speaks its responses using a configurable personality. The Stop hook extracts Codex's last response, sends it to the configured summarizer backend (Claude Haiku/Sonnet by default, or an opt-in local model — see below) for persona-flavored rewriting with emotion/style/pacing/ accent selection, and speaks via macOS say, Mistral Voxtral, Google Gemini TTS, or local HTTP TTS.

If the rewrite fails, the Stop hook degrades gracefully (it does not announce a raw failure): it speaks a short un-stylized excerpt of the actual message and writes a visible stderr diagnostic with the failure reason.

Summary/greeting rewrite backend (`providers.summarizer`)

The Stop summary and SessionStart greeting rewrites run through a config-driven provider (utils/summarizer.py). Two backends:

claude_cli (default, shipped) — the legacy claude -p path (--model haiku for the Stop summary, --model sonnet for the greeting). Unchanged behavior.
openai_compatible (opt-in, per project) — POSTs to a local OpenAI-compatible /v1/chat/completions (e.g. an oMLX server running a fast local model such as gemma-4-12B-it-OptiQ-4bit). Reach for this when running several projects in parallel makes the Anthropic CLI calls contend and time out — a local model has no shared rate limit and returns in seconds.

By default the local path reuses the same persona-tuned rewrite prompt the Claude path builds (prompt_style: "shared") — there is no second prompt to maintain — and parses the response leniently: prose that skips the JSON envelope is still used as the spoken line, so the local hop succeeds rather than falling through. prompt_style: "minimal" is an optional stripped-down prompt kept purely as a per-platform A/B tuning lever (see scripts/test_summarizer.py). On local failure the chain runs a short-budgeted claude -p fallback (claude_fallback_timeout_seconds, default 12s — deliberately not the full 110s, which would recreate the timeout), then the hook's terminal excerpt fallback. Set allow_claude_fallback: false for a fully-offline chain.

Opt a project in — in <project>/.claude/tts_config.json (deep-merges over the shipped defaults, so only the changed keys are needed):

{
  "providers": {
    "summarizer": {
      "type": "openai_compatible",
      "openai_compatible": {
        "base_url": "http://127.0.0.1:8001",
        "model": "gemma-4-12B-it-OptiQ-4bit",
        "credential_key": "omlx_api_key"
      }
    }
  }
}

The bearer token (oMLX is auth-gated) is read from ~/.skills/voice-assistant/credentials.json under credential_key via the R50 loader — never place the token in project config. Store it with scripts/setup_credentials.py. base_url is loopback-only by default (set require_loopback: false to allow a remote host); redirects and URL-embedded credentials are refused, and header values redact in diagnostics. Leave type at claude_cli to keep the Anthropic path — openai_compatible never becomes a silent global default.

Reasoning models — disable "thinking". A reasoning/thinking model (e.g. gemma-4-12B-it on oMLX) spends the token budget on chain-of-thought and returns message.reasoning_content with an empty message.content (the rewrite then fails closed and falls back). Disable thinking server-side (oMLX: turn off the model's thinking; OpenAI-compatible flag chat_template_kwargs: {"enable_thinking": false} or reasoning_effort: "none" also works) — measured ~2–5s clean output vs ~30s+ with thinking on. Note model IDs are server-specific: oMLX uses the bare name (gemma-4-12B-it-OptiQ-4bit), other MLX servers use the mlx-community/… form.

Detached announcer (Stop summary never blocks the hook)

Local TTS synthesis of a summary-length utterance is slow — Qwen3/Ryan via oMLX takes ~30–40s for ~200 chars, then playback adds ~15–20s — and the rewrite calls a model that serializes across concurrent projects on a shared local model. Running all of that inline would (a) trip the provider's HTTP read timeout, aborting long synths so the summary never plays, and (b) blow the harness hook budget.

So the Stop hook is a thin dispatcher: after its cheap guards + cooldown it hands the raw input to a detached worker (scripts/speak_worker.py, spawned by utils/announce.py with start_new_session=True) and returns immediately. The worker owns the whole slow chain — transcript extract → produce_spoken rewrite → TTS synth → playback — and survives the hook's exit, so audio duration is fully decoupled from the hook budget and concurrent Stop hooks never queue model calls inside it. The SessionStart greeting (short, once per session) detaches only its playback via announce.speak_detached. Every attempt appends a durable line to ~/.<host>/voice/logs/announcer.log (result=ok|fail|no_text|…); the per-provider status/omlx_tts.json keeps only the most recent synthesis.

Two settings make this robust: providers.omlx.read_timeout_seconds (default 120s, comfortably above worst-case synth) bounds only the detached worker, and announce.cap_spoken_text caps a runaway summary.

Failures are heard — in the project's own voice

A broken pipeline must be heard, not silently logged. When synth/playback fails the worker speaks a short audit notice via speak.speak_audit_failure — and it uses the active persona's voice.macos voice + rate (macOS say, which needs no key/quota and works even when the failing provider is the real TTS). So a failure in a Vulcan project is heard in Daniel's voice and a failure in a Californian persona in hers — when several projects run at once you can tell which one broke from the voice alone. Give every persona a voice.macos block (voice + rate) so its failures are identifiable; without one the notice uses the system default voice. (The rare "couldn't even launch the announcer" case is voiced from the hook the same way.)

Voice self-diagnostic (`/voice doctor`)

scripts/voice_doctor.py gathers factual evidence into JSON — config summary, credential presence (booleans, never values), each configured server's reachability

loaded models, the most recent synthesis + last error, recent announcer.log outcomes, hook-install state, and a list of plain observations that encode the failure modes we keep re-deriving (TTS read timeout too low for slow local synth, configured server unreachable, recent result=fail, stale hook install). Run it and interpret the evidence into a health read + concrete fixes. Per R32 it emits facts and observations only — there is no numeric health score and no PASS/FAIL gate; the judgment is yours, over the evidence. This is the runtime half of the self-improvement loop: each hard-won lesson becomes a live, visible check instead of a surprise the next time.

Two checks make that loop concrete (added 2026-06-10 after a missing-dependency incident — see references/retro-2026-06-10-doctor-gaps.md):

C1 synth_path_dep_coverage — verifies the active provider's SDK is declared in the PEP 723 deps of every script that actually synthesizes (the detached scripts/speak_worker.py plus the hook entries), not merely that a credential exists. Catches the class where a refactor moves synthesis into a new entry script but leaves its inline deps short (e.g. worker declared only requests while Gemini needs google-genai). The synth_path_dep_coverage block carries the per-script matrix; a shortfall also emits a provider_dep_missing_on_synth_path observation.
C2 failure_signature — classifies announcer-log fail-lines and status.*.last_error text against the living FAILURE_SIGNATURES registry in voice_doctor.py into a typed cause + concrete fix, so the doctor interprets a known error string instead of dumping it raw. Add a row to that registry (and a runbook entry) whenever a new failure mode is root-caused — that is how the loop stays closed.

The checks field in the output is the tracked manifest of everything the doctor covers.

Gemini TTS

Gemini TTS (gemini-3.1-flash-tts-preview) is an LLM that generates audio directly. Prompts follow Google's official TTS prompting guide structure:

# AUDIO PROFILE: Character Name
### DIRECTOR'S NOTES
Style: Warm, witty, affectionate...
Pacing: Varied — brisk for wins, slower for reassurance
Accent: Modern British
### TRANSCRIPT
The actual text to speak.

The rewriter (Sonnet) produces {text, emotion, style, pacing, accent}. The _build_gemini_prompt() function in speak.py renders the structured prompt. Each TTS call is independent (32k token context window, per-call).

Gemini also supports inline audio tags ([whispers], [excited], [warmly]) as delivery modifiers; when a persona declares voice.gemini.preferred_audio_tags, the rewriter emits a transcript_with_tags field with sparse inline tags. Details: 03-platform-gemini.md §Audio Tags.

Custom personas without a voice.gemini block fall back to GEMINI_DEFAULTS (Aoede voice, no style).

Proactive Gemini configuration

Wenn aktive Persona keinen voice.gemini-Block hat, proaktiv das Onboarding aus references/onboarding/03-platform-gemini.md (Steps G1–G6) anbieten statt stillschweigend GEMINI_DEFAULTS zu nutzen.

Hook Events

SessionStart — Greet the user in the active persona's style
UserPromptSubmit — Quick acknowledgment phrase (no LLM call)
Stop — Persona-voiced summary of what Codex did; for Gemini provider with preferred_audio_tags, also emits transcript_with_tags
PreCompact — Claude-only context memory warning

When provider=local_tts, SessionStart first runs a bounded shared-server readiness check. If a compatible loopback server is already running, it reuses it. If none is running, it starts one detached shared MLX-Audio server. If the server cannot become ready, SessionStart writes the diagnostic and skips the greeting/rewrite work instead of silently falling back.

Codex currently installs only SessionStart, UserPromptSubmit, and Stop. Do not install PreCompact for Codex: current Codex hook review UX can surface it as needing review without offering a reliable approval path. Claude also supports Notification, PreCompact, and SubagentStop hooks. The shared defaults include a harmless notification flag for Claude/project compatibility, but the Codex installer does not wire a Notification hook.

Opencode installs the lifecycle adapter plugin, which maps Opencode's native event stream onto the same Python hooks:

Opencode event	Voice hook	Notes
`session.created`	SessionStart	Fires once per new session
user message text part (`message.part.updated`)	UserPromptSubmit	First text part of a new user message
`session.status` with `status.type === "idle"`	Stop	Reliable in both TUI and `opencode run`; `session.idle` is deprecated. Passes the current turn's assistant text as `last_assistant_message`. Dispatch depends on the client (via `OPENCODE_CLIENT`): interactive desktop/TUI → plain awaited spawn (loop stays alive); `opencode run` one-shot / unknown client → detached via `opencode/detach_hook.py` so the 35-75s `claude -p` rewrite survives teardown. Detached is the safe default. Skips cleanly if the turn has no assistant text
`session.compacted`	PreCompact	Opencode surfaces compaction natively

Opencode has no Notification/SubagentStop equivalents wired today (no native event maps cleanly); those remain Claude-only. Full contract: references/opencode-harness-contract.md.

Install Codex hooks with:

uv run .agents/skills/voice-assistant/scripts/install_hooks.py --target codex

Install the Opencode lifecycle adapter with:

uv run .opencode/skills/voice-assistant/scripts/install_hooks.py --target opencode

This writes the ESM plugin into <project>/.opencode/plugins/ (OpenCode cannot reference a local plugin by path, so placement there is mandatory) and adopts the project's existing .claude/.codex voice config so Opencode runs the identical voice; if no config exists anywhere it points you at onboarding. The plugin is installed as a real-file copy (never a symlink — Bun resolves a symlink's target, which would break the plugin's import.meta.dir skill-path anchor), so plugin edits need a re-install; hook/config edits are live via the project skill symlink.

Hook Message Localization

Notification and PreCompact speak fixed-template messages — no LLM rewrite. Their text is resolved by utils/locales.py:pick_message() using a three-tier priority chain:

Persona override — persona["<event>_message"] (e.g. notification_message, precompact_message). Single string or list of variants (random.choice picks one per call so repeated messages don't feel monotonous).
Locale default — .agents/locales/<primary_language>.json (e.g. .agents/locales/de.json). Same string-or-list shape.
English fallback — .agents/locales/en.json. Always present.

Locales ship inside the skill under .agents/locales/ and are accessible to any AI that loads this skill via the standard skill mount points. To add a new language, drop a <lang>.json file in .agents/locales/ with the same keys (notification, precompact) — no hook edits needed.

For a German-speaking project, set primary_language: "de" in the project's tts_config.json; the German locale defaults will be used automatically for any persona that doesn't override.

SessionStart and Stop use full LLM rewrites and follow the persona's voice naturally — they don't need locale files. SubagentStop plays only a chime. UserPromptSubmit reads acknowledgments directly from the persona's acknowledgments array (translate per-persona).

Local TTS Setup Flow (`/voice setup` with local_tts provider)

Cross-Ref only. SKILL.md beschreibt den Provider-Wahl-Gate; die Setup-Schritte stehen anderswo:

Persona-Side-Conversation (Modell, Voice, emotion_map, Sprach-Coordination): references/onboarding/05-platform-local-tts.md.
Server-Side-Technik (uv, MLX-Audio, shared manager, pre-download, missing-server-notice, fail-loud-rules): references/local-tts-onboarding.md.
Hardrules pro Sprache: defaults/local-tts-models.json project_language_rules.

Gemini Setup Flow (`/voice setup` with Gemini provider)

Persona-Side-Onboarding-Conversation für Gemini siehe references/onboarding/03-platform-gemini.md (Steps G1–G6 + Pre-Flight-Checks). Voice-Katalog + offizielle Traits: defaults/gemini-voices.md.

Configuration

Tier	Path	Purpose
Project	`.codex/tts_config.json`	Per-project Codex settings
Project	`.claude/tts_config.json`	Per-project Claude fallback
Project	`.opencode/tts_config.json`	Per-project Opencode voice settings
Shared global	`~/.skills/voice-assistant/preferences.json`	Host-agnostic user defaults
Global	`~/.codex/voice/preferences.json`	User-wide Codex defaults
Global	`~/.claude/voice/preferences.json`	User-wide Claude fallback
Global	`~/.config/opencode/voice/preferences.json`	User-wide Opencode defaults
Shared global	`~/.skills/voice-assistant/profiles/*.json`	Custom personas available to all hosts
Global	`~/.codex/voice/profiles/*.json`	Custom Codex personas
Global	`~/.claude/voice/profiles/*.json`	Custom Claude fallback personas
Shared global	`~/.skills/voice-assistant/credentials.json`	Recommended user-level API keys (Voxtral, Gemini, local bearer tokens)
Global	`~/.codex/voice/credentials.json`	Optional Codex-specific API key overrides
Global	`~/.claude/voice/credentials.json`	Optional Claude-specific API key overrides
Global	`~/.config/opencode/voice/credentials.json`	Optional Opencode-specific API key overrides
Env var	`GEMINI_API_KEY`	Gemini API key fallback when no file key exists
Env var	`MISTRAL_API_KEY`	Mistral/Voxtral API key fallback when no file key exists
Defaults	`<skill>/defaults/settings.json`	Fallback settings
Defaults	`<skill>/defaults/personas/*.json`	Built-in personas
Locales	`<skill>/.agents/locales/en.json`	English message templates for `Notification` + `PreCompact` (always present, ultimate fallback)
Locales	`<skill>/.agents/locales/<lang>.json`	Localized message templates loaded when `config.primary_language` matches (e.g. `de.json`, `nl.json`)

Credential policy

Do not store API keys in the skill folder, project config, persona JSON, hook settings, or repository files. The workable second-best credential model is:

Put user-wide shared keys in ~/.skills/voice-assistant/credentials.json.
Use host-specific credential files only for deliberate overrides.
Use env vars only as a process-level override.
Keep project configs limited to provider/model/persona choices.
Do not auto-read legacy shared stores such as ~/.voice-assistant or ~/.iurfriend-skills as silent fallbacks. If a legacy credentials file exists, ask the user before copying or moving secrets into the shared skill root and keep the resulting file mode at 600.

Credential resolution merges the shared file first, then the selected host's credential file, then process environment values when file keys are absent. Host-specific keys override shared keys for that host only.

For managed installs and tests, VOICE_ASSISTANT_SHARED_DIR may point at an explicit shared directory. VOICE_ASSISTANT_SKILLS_ROOT may point at a different parent root; the default shared directory remains ~/.skills/voice-assistant.

For onboarding, never ask the user to paste an API key into chat. Use the local helper instead:

uv run <skill>/scripts/voice.py credentials status --target auto
uv run <skill>/scripts/voice.py credentials set gemini --target shared
uv run <skill>/scripts/voice.py credentials set voxtral --target shared
uv run <skill>/scripts/voice.py credentials verify gemini --target auto

credentials set voxtral stores mistral_api_key for cloud Voxtral through the Mistral API. It is not used for local Voxtral under provider=local_tts. The helper prompts in the local terminal with hidden input and writes only to a user-global credential file with restricted file permissions when supported by the filesystem.

Installation and source of truth

The skill can be installed globally for the user or copied into a project. When project hooks point at .agents/skills/voice-assistant, that project-local copy is the code that actually runs. Global preferences and profiles still merge in, but runtime behavior comes from the hooked project skill. Keep the Codex copy (.agents/skills/voice-assistant) and Claude copy (.claude/skills/voice-assistant) synchronized unless an agent-specific hook difference is intentional.

Opencode does not use the Claude/Codex hook JSON formats. It loads ESM plugins from .opencode/plugins/ via its Bun runtime. The voice-assistant Opencode lifecycle adapter is .opencode/plugins/voice-assistant.plugin.ts, installed by install_hooks.py --target opencode. It subscribes to Opencode's event stream and dispatches the SAME project-local Python hooks the Claude/Codex installers use (via scripts/hook_wrapper.sh + stdin JSON) — it adds no voice logic of its own, so the Python hooks stay the single runtime source of truth. For Opencode, keep .opencode/skills/voice-assistant, .opencode/tts_config.json, and .opencode/agents/*.md synchronized with the shared project persona, and run the installer to (re)generate the adapter. Full harness contract + event mapping: references/opencode-harness-contract.md.

Gemini persona schema

voice.gemini-Block-Schema authoritative in references/onboarding/03-platform-gemini.md §The voice.gemini block. Optionales preferred_audio_tags-Feld (Persona-Tag-Bias für den Rewriter) ebd. §Audio Tags. Gelebte Beispiele: defaults/personas/*.json.

Local TTS provider schema

Local TTS is a generic OpenAI-compatible HTTP adapter. Hooks never import MLX, load models directly, or choose a hidden model or voice. The user must configure both providers.local_tts transport settings and each persona's voice.local_tts model/voice values before switching the provider. When provider=local_tts, SessionStart may run the shared server manager to reuse or start a detached bundled MLX-Audio server after making failures visible.

Project or global settings:

{
  "provider": "local_tts",
  "primary_language": "en",
  "providers": {
    "local_tts": {
      "base_url": "http://127.0.0.1:8000",
      "endpoint": "/v1/audio/speech",
      "models_endpoint": "/v1/models",
      "api_key_env": null,
      "credential_key": null,
      "headers": {},
      "timeout_seconds": 20,
      "connect_timeout_seconds": 2,
      "read_timeout_seconds": 20,
      "retry_count": 0,
      "response_format": "wav",
      "request_format": "openai_audio_speech",
      "response_encoding": "binary",
      "server": {
        "mode": "shared",
        "auto_start_on_session_start": true,
        "startup_timeout_seconds": 30,
        "session_start_timeout_seconds": 8
      },
      "health": {
        "enabled": true,
        "method": "GET",
        "endpoint": "/health",
        "timeout_seconds": 2,
        "healthy_statuses": [200, 204],
        "treat_404_as_reachable_warning": true
      },
      "fallback": {
        "enabled": false,
        "provider": "macos",
        "on": ["connection_error", "timeout"]
      },
      "diagnostics": {
        "status_filename": "local_tts.json",
        "redact_headers": ["authorization", "x-api-key"]
      }
    }
  }
}

Persona-Side-voice.local_tts-Schema (model/voice/emotion_map/custom_voice) authoritative in references/onboarding/05-platform-local-tts.md §The voice.local_tts block. PocketTTS-Model-IDs + Voice-Liste in references/local-tts-onboarding.md §PocketTTS.

Switch explicitly with /voice provider local_tts or by setting "provider": "local_tts" in .codex/tts_config.json or the global Codex voice preferences. The resolver intentionally has no hidden Kokoro or af_heart defaults: missing model returns missing_model, and missing voice returns missing_voice.

Custom voice/reference-audio support is intentionally blocked in Phase 2. A persona with voice.local_tts.custom_voice.enabled: true must return unsupported_custom_voice after consent/reference-audio validation; it must not silently clone a voice or fall back to an implicit voice.

oMLX provider schema

The omlx provider reuses the entire local_tts synth/play/cache/fallback engine; its config and persona blocks mirror local_tts, just under the omlx keys and pointed at the oMLX server (default :8001).

{
  "provider": "omlx",
  "primary_language": "en",
  "providers": {
    "omlx": {
      "base_url": "http://127.0.0.1:8001",
      "endpoint": "/v1/audio/speech",
      "credential_key": "omlx_api_key",
      "diagnostics": { "status_filename": "omlx_tts.json" }
    }
  }
}

The active persona must carry a voice.omlx block whose model + voice the loaded oMLX TTS model actually serves:

{ "voice": { "omlx": { "model": "mlx-community/Voxtral-4B-TTS-2603-mlx-4bit", "voice": "<a voice the model serves>", "response_format": "wav" } } }

Differences from local_tts: no server block and no auto-start (oMLX is run by its own app/CLI, e.g. omlx serve … --port 8001), plus a separate omlx_tts.json status file. The bearer token (oMLX is auth-gated) resolves from ~/.skills/voice-assistant/credentials.json via credential_key — store it with scripts/setup_credentials.py, never in project config.

Verification gate — do this before relying on omlx for speech. The provider ships regardless, but oMLX may only have an LLM (e.g. gemma) loaded:

# 1) confirm a TTS model is loadable (auth-gated)
curl -H "Authorization: Bearer $TOKEN" http://127.0.0.1:8001/v1/models
# 2) confirm /v1/audio/speech actually returns audio bytes
curl -s -H "Authorization: Bearer $TOKEN" -H 'Content-Type: application/json' \
  -d '{"model":"<tts-model-id>","input":"oMLX speech test.","voice":"<voice>"}' \
  http://127.0.0.1:8001/v1/audio/speech --output /tmp/omlx_test.wav && file /tmp/omlx_test.wav

If only the LLM is loaded, load a TTS model first (via omlx serve / the oMLX app) before switching provider to omlx. The Kokoro local_tts path (:8000) is unaffected by configuring omlx.

TTS model lenses (`defaults/tts-lenses.json`)

Different TTS models behave differently along three axes — voice strategy (named voices vs zero-shot clone), emotion (voice-map vs instruct-scene vs none), and transport (oMLX :8001 vs mlx-audio :8000). A lens is a per-model preset that encodes exactly that, so a persona can pick a model by name instead of restating its quirks. A persona opts in with voice.<provider>.lens: "<name>"; speak() resolves it and maps the turn's emotion onto the model's controls (no-op when no lens is set). The default voice across lenses is a happy, cheerful female.

{ "voice": { "omlx": { "lens": "voxtral-omlx" } } }

Lens	Transport	Emotion mechanism	Status
`voxtral-omlx`	oMLX `:8001`	voice-map (emotion → a named voice; `cheerful_female`, `de_`, `nl_`, …)	✅ verified
`pocket-omlx`	oMLX `:8001`	none (single voice)	✅ verified
`kokoro-local`	mlx-audio `:8000`	none (voice selection)	✅ verified
`higgs-omlx`	oMLX `:8001`	instruct-scene (emotion → natural-language `instruct`)	⏳ gated on oMLX (Higgs adapter bug)

voice_map lenses set the voice from the emotion (and expose emotion_map so the synth layer re-resolves per turn). instruct_scene lenses (Higgs) put a scene string in extra_body.instruct and thread the reference-clone ref_audio/ref_text (base64 audio) + temperature — that path is blocked by an oMLX-side bug (Kokoro via oMLX likewise fails on oMLX's torch; both keep working on their healthy paths). Lens resolution + emotion mapping live in utils/lens.py; the registry is defaults/tts-lenses.json.

Local TTS safe server example

Shared-Server-Runtime, MLX-Audio-Pin-Commit, Voxtral-Pre-Download, fail-loud-rules + missing-server-notice authoritative in references/local-tts-onboarding.md §Shared Server Runtime

§Local Voxtral + §Missing Server Notice + §Fail-Loud Rules. Normal-Bootstrap:

uv run <skill>/scripts/voice.py server ensure --timeout 30

Reference files

<skill>/requirements.md — Projektlokale EliteExperts-Requirements fuer SAMI mit Apple-Silicon-, MLX-Audio-, Voxtral-, Persona- und Setup-Gates
<skill>/references/troubleshooting-and-audit.md — Start here when voice misbehaves. Ordered debugging runbook (reproduce the Stop hook with stderr, failure-signature table, rewriter/Gemini/credential checks) plus a copy-paste audit rubric to certify a healthy install. Captures the known failure modes (rewrite timeout, MCP-overflow on Haiku, preview-model throttle) and the fail-loud design invariants.
<skill>/references/timeouts.md — Single reference for every timeout. Knob table (timeouts.* in settings.json / per-project tts_config.json — greeting_rewrite_seconds, summary_rewrite_seconds, rewrite_hook_headroom_seconds), the values derived from them (hook ceilings = budget + headroom, via utils/timeouts.py), the fixed safety ceilings that must NOT be tuned, and the known no-timeout gap on the Gemini synth call. Read before changing any timeout.
<skill>/references/opencode-harness-contract.md — Opencode lifecycle adapter: the harness plugin contract, event → voice-hook mapping, stdin/env/project-root preservation, install/uninstall, and the supported/unsupported event matrix
<skill>/references/local-tts-onboarding.md — Local provider readiness, model guide, setup steps, installation model, custom voice position, and fail-loud rules
<skill>/references/voxtral-best-practices.md — Voxtral TTS local prompting best practices (Voice-as-an-Instruction paradigm, 20-preset voice catalog verified against Hugging Face model card, CC BY-NC 4.0 production caveat, provider-asymmetry vs. Gemini audio tags)
<skill>/references/onboarding/00-master-flow.md — Master conversation flow for onboarding a fresh user end-to-end (technical + persona + platform); start here for new-user setup
<skill>/references/onboarding/01-persona-anatomy.md — Field-by-field reference for every persona JSON key (built-in vs user-tier, validation rules, anti-patterns)
<skill>/references/onboarding/02-persona-naming-and-tone.md — Conversation script for naming, user_name, relationship, traits, language alignment
<skill>/references/onboarding/02b-persona-from-character-concept.md — Generator-Pfad: vollständige Persona aus einem Charakter-Konzept ableiten (Trigger, Mapping-Heuristik pro Feld, Worked Examples, Validation-Loop)
<skill>/references/onboarding/03-platform-gemini.md — Gemini-specific persona block (voice_name, audio_profile, scene, directors_notes, sample_context) with onboarding prompts
<skill>/references/onboarding/04-platform-voxtral.md — Voxtral cloud persona block (primary_voice + emotion-mapping) with onboarding prompts
<skill>/references/onboarding/05-platform-local-tts.md — Local TTS persona block (model + voice + emotion_map + language coordination) with onboarding prompts
<skill>/defaults/local-tts-models.json — AI-facing local model/language compatibility table with E2E verification status
<skill>/defaults/gemini-voices.md — All 30 voices with official traits, 87 supported languages with BCP-47 codes, pacing examples, prompting guide
<skill>/defaults/voxtral-voices.md — Voxtral voice reference
<skill>/.agents/locales/*.json — Hook message templates per language (Notification + PreCompact); resolved by utils/locales.py:pick_message() with priority persona override > locale > English fallback. Add a new language by dropping <lang>.json with the same keys; no hook edits required.

Helper scripts

<skill>/scripts/check_requirements.py --target both — Projektlokaler Requirements-Check fuer EliteExperts/SAMI; installiert nichts automatisch und gibt konkrete Nutzerfragen bei fehlenden Voraussetzungen aus
<skill>/scripts/preview_voice.py — Preview a Gemini voice: uv run preview_voice.py --voice-name <name> [--text "sample"]
<skill>/scripts/update_gemini_config.py — Update persona Gemini config: uv run update_gemini_config.py --persona <id> [--voice-name X] [--style "..."] [--pacing "..."] [--accent "..."]
<skill>/scripts/warm_cache.py — Pre-generate acknowledgment audio cache: uv run warm_cache.py [--persona <id>]
<skill>/scripts/voice.py status|doctor|on|off|provider — Thin AI-facing status and switching helper
<skill>/scripts/voice.py local-models --language <code> — Show local TTS model/language compatibility before choosing local_tts model/voice
<skill>/scripts/test_tts.py — Test TTS with current config
<skill>/scripts/voice.py server status|ensure|start|stop|logs — Manage the shared local TTS server via the AI-facing helper
<skill>/scripts/voice_server.py status|ensure|start|stop|logs — Direct shared local TTS server manager; PID/logs live in ~/.skills/voice-assistant/local-tts-server/
<skill>/scripts/mlx_audio_server.py --host 127.0.0.1 --port 8000 — Run a local MLX-Audio OpenAI-compatible server with voice-assistant-safe defaults
<skill>/scripts/local_tts.py status — Read durable local TTS diagnostics
<skill>/scripts/local_tts.py check — Check local server health
<skill>/scripts/local_tts.py doctor — Check readiness and print missing-server onboarding notices
<skill>/scripts/local_tts.py test --text "Hello" — Generate and play a test phrase
<skill>/scripts/local_tts.py models — List models when supported
<skill>/scripts/test_tts.py --provider local_tts — Exercise local TTS through normal provider dispatch

voice

Invocation

Context Preview

Supporting Files

SKILL.md

voice

Invocation

Context Preview

Supporting Files

SKILL.md

Voice Assistant Skill

Current Readiness

Provider Choice

Commands

Fresh Project Onboarding (I want to engage voice)

Prompt 1: Install Scope

Prompt 2: Agent Target

Prompt 3: Provider Choice

Prompt 4: Project Language

Local TTS Setup Gate

Completion Gate

Status Display

Voice Browser (/voice voice)

Style Configuration (/voice style)

Natural Language Triggers

Basic controls

Gemini voice and style

Pacing and accent

How to handle style/voice changes

How It Works

Summary/greeting rewrite backend (providers.summarizer)

Detached announcer (Stop summary never blocks the hook)

Failures are heard — in the project's own voice

Voice self-diagnostic (/voice doctor)

Gemini TTS

Proactive Gemini configuration

Hook Events

Hook Message Localization

Local TTS Setup Flow (/voice setup with local_tts provider)

Gemini Setup Flow (/voice setup with Gemini provider)

Configuration

Credential policy

Installation and source of truth

Gemini persona schema

Local TTS provider schema

oMLX provider schema

TTS model lenses (defaults/tts-lenses.json)

Local TTS safe server example

Reference files

Helper scripts

Similar Skills

Voice Assistant Skill

Current Readiness

Provider Choice

Commands

Fresh Project Onboarding (I want to engage voice)

Prompt 1: Install Scope

Prompt 2: Agent Target

Prompt 3: Provider Choice

Prompt 4: Project Language

Local TTS Setup Gate

Completion Gate

Status Display

Voice Browser (/voice voice)

Style Configuration (/voice style)

Natural Language Triggers

Basic controls

Gemini voice and style

Pacing and accent

How to handle style/voice changes

How It Works

Summary/greeting rewrite backend (providers.summarizer)

Detached announcer (Stop summary never blocks the hook)

Failures are heard — in the project's own voice

Voice self-diagnostic (/voice doctor)

Gemini TTS

Proactive Gemini configuration

Hook Events

Hook Message Localization

Local TTS Setup Flow (/voice setup with local_tts provider)

Fresh Project Onboarding (`I want to engage voice`)

Voice Browser (`/voice voice`)

Style Configuration (`/voice style`)

Summary/greeting rewrite backend (`providers.summarizer`)

Voice self-diagnostic (`/voice doctor`)

Local TTS Setup Flow (`/voice setup` with local_tts provider)

Gemini Setup Flow (`/voice setup` with Gemini provider)

TTS model lenses (`defaults/tts-lenses.json`)

Fresh Project Onboarding (`I want to engage voice`)

Voice Browser (`/voice voice`)

Style Configuration (`/voice style`)

Summary/greeting rewrite backend (`providers.summarizer`)

Voice self-diagnostic (`/voice doctor`)

Local TTS Setup Flow (`/voice setup` with local_tts provider)

Gemini Setup Flow (`/voice setup` with Gemini provider)

TTS model lenses (`defaults/tts-lenses.json`)