This skill provides voice interaction capabilities for AI assistants. This skill should be used when users mention "voice mode", "voicemode", "speak to me", "talk to me", "have a voice conversation", "converse", ask to "check voice service status", "start Whisper", "start Kokoro", ask about "voice configuration", mention "STT", "TTS", "speech-to-text", "text-to-speech", or need help with "voice setup", "voice troubleshooting", or "voice preferences".
This skill inherits all available tools. When active, it can use any tool Claude has access to.
VoiceMode adds voice interaction to Claude Code, enabling natural conversations through speech-to-text (STT) and text-to-speech (TTS) services.
When a user wants to use voice mode for the first time, guide them through these steps:
First, check if voice services are already running:
# Check STT service (Whisper)
voicemode:service("whisper", "status")
# Check TTS service (Kokoro)
voicemode:service("kokoro", "status")
If services aren't installed, guide the user to install them:
Prerequisites:
Installation:
# Install VoiceMode with uv (recommended)
uv tool install voice-mode-install
voice-mode-install
# Or update to latest version
voicemode update
Install Voice Services:
# Install Whisper for local STT
voicemode whisper service install
# Install Kokoro for local TTS
voicemode kokoro install
Services auto-start after installation.
Once services are running, start a voice conversation:
# Simple greeting
voicemode:converse("Hello! I'm ready to talk. What would you like to discuss?")
# The tool will:
# - Speak the message using TTS
# - Listen for the user's response
# - Return the transcribed text
That's it! You're now in a voice conversation.
The converse tool is your primary interface for voice interactions:
# Basic usage - speak and listen
voicemode:converse("How can I help you today?")
# Speak without waiting for response (for narration)
voicemode:converse("Let me search for that information", wait_for_response=False)
# With specific voice
voicemode:converse(
message="I found the answer",
voice="nova",
tts_provider="openai"
)
Key Parameters:
message (required): Text to speak
wait_for_response (default: true): Whether to listen for user response
voice: TTS voice name (auto-selected if not specified)
tts_provider: "openai" or "kokoro" (auto-selected based on availability)
listen_duration_max: Maximum listening time in seconds (default: 120)

Manage voice services using the service tool:
# Check status
voicemode:service("whisper", "status")
voicemode:service("kokoro", "status")
# Start/stop services
voicemode:service("whisper", "start")
voicemode:service("kokoro", "stop")
voicemode:service("whisper", "restart")
# View logs for troubleshooting
voicemode:service("whisper", "logs", lines=50)
Available Services:
whisper: Local STT using Whisper.cpp
kokoro: Local TTS with multiple voices
livekit: Room-based real-time communication (advanced)

Service Actions:
status: Check if running and resource usage
start: Start the service
stop: Stop the service
restart: Restart the service
logs: View recent logs
enable: Configure to start at boot/login
disable: Remove from startup

Pattern 1: Question and Answer
# Ask a question
voicemode:converse("What would you like to work on today?")
# User responds via voice
# Response text is returned for you to process
# Continue the conversation
voicemode:converse("Great! Let me help you with that.")
Pattern 2: Narrating Actions (Default Behavior)
When performing actions, speak without waiting to create natural flow:
# Announce action without waiting
voicemode:converse("Let me search the codebase for that", wait_for_response=False)
# Perform the action in parallel
Grep(pattern="function_name", path="/path/to/code")
# Announce results
voicemode:converse("I found 5 matches. Would you like me to show them?")
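The narrate-then-act pattern above can be sketched in plain Python. This is a minimal illustration, not VoiceMode's implementation: `converse` is a caller-supplied stand-in for the voicemode:converse tool, and `narrate_then_run` is a hypothetical helper name.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def narrate_then_run(
    converse: Callable[..., object],
    announcement: str,
    action: Callable[[], T],
) -> T:
    """Announce an action without waiting for a reply, then run it."""
    # wait_for_response=False mirrors the narration call shown above:
    # the message is spoken and control returns without listening.
    converse(announcement, wait_for_response=False)
    return action()


if __name__ == "__main__":
    # Stand-in for the real tool so the sketch runs anywhere.
    def fake_converse(message: str, wait_for_response: bool = True) -> None:
        print(f"[speak] {message} (wait={wait_for_response})")

    matches = narrate_then_run(
        fake_converse,
        "Let me search the codebase for that",
        lambda: ["a.py", "b.py"],
    )
    print(f"{len(matches)} matches")
```

Passing the tool in as a parameter keeps the helper testable without audio hardware or running services.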
Pattern 3: Step-by-Step Guidance
When asking questions in voice mode:
# Good - one question at a time
voicemode:converse("Would you like to use local or cloud TTS?", wait_for_response=True)
# Wait for answer...
voicemode:converse("Should I install Kokoro for you?", wait_for_response=True)
# Avoid - multiple questions bundled together
# This is overwhelming in voice conversations
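As a rough sketch of the one-question-per-turn approach, the loop below asks each question in its own call and collects the replies. It assumes only what this guide states, that the converse tool returns the transcribed text. `converse` is a caller-supplied stand-in for the real tool and `ask_one_at_a_time` is a hypothetical name.

```python
from typing import Callable, Dict, Iterable


def ask_one_at_a_time(
    converse: Callable[..., str],
    questions: Iterable[str],
) -> Dict[str, str]:
    """Ask each question in its own turn and collect the transcribed replies."""
    answers: Dict[str, str] = {}
    for question in questions:
        # One question per call; wait for the reply before asking the next.
        answers[question] = converse(question, wait_for_response=True)
    return answers


if __name__ == "__main__":
    # Scripted replies stand in for a real voice session.
    scripted = iter(["local", "yes"])

    def fake_converse(message: str, wait_for_response: bool = True) -> str:
        return next(scripted)

    print(
        ask_one_at_a_time(
            fake_converse,
            [
                "Would you like to use local or cloud TTS?",
                "Should I install Kokoro for you?",
            ],
        )
    )
```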
Check if everything is working:
# Check service status
voicemode:service("whisper", "status")
voicemode:service("kokoro", "status")
# If services aren't running, start them
voicemode:service("whisper", "start")
voicemode:service("kokoro", "start")
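The check-then-start sequence above can be expressed as a small loop. This guide does not specify what the status action returns, so the sketch takes a caller-supplied `is_running` predicate instead of guessing a format. `service` stands in for the voicemode:service tool, and both `ensure_running` and `is_running` are hypothetical names.

```python
from typing import Callable, Iterable, List


def ensure_running(
    service: Callable[[str, str], str],
    is_running: Callable[[str], bool],
    names: Iterable[str] = ("whisper", "kokoro"),
) -> List[str]:
    """Start every named service whose status does not look running.

    Returns the names of the services that were started.
    """
    started: List[str] = []
    for name in names:
        status_text = service(name, "status")
        if not is_running(status_text):
            service(name, "start")
            started.append(name)
    return started


if __name__ == "__main__":
    # Fake backend: whisper is up, kokoro is down.
    state = {"whisper": "running", "kokoro": "stopped"}

    def fake_service(name: str, action: str) -> str:
        if action == "start":
            state[name] = "running"
        return state[name]

    print(ensure_running(fake_service, lambda text: text == "running"))
```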
Using CLI for diagnostics:
# Check all dependencies
voicemode deps
# Diagnostic information
voicemode diag info
voicemode diag devices # List audio devices
voicemode diag registry # Show provider registry
# View service logs
voicemode whisper service logs
voicemode kokoro logs
Voice Selection:
Available voices depend on your TTS provider:
OpenAI Voices: alloy, echo, fable, onyx, nova, shimmer
Kokoro Voices: Multiple voices (check with voicemode kokoro voices)
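A simple pre-flight check can catch a mismatched voice before a converse call. The sketch below treats the OpenAI list above as exhaustive, which is an assumption, since providers may add voices over time. Kokoro voices are not enumerated in this guide, so any Kokoro voice name is accepted. `check_voice` is a hypothetical helper.

```python
from typing import Optional

# Voice names listed above for the OpenAI provider (assumed exhaustive here).
OPENAI_VOICES = frozenset({"alloy", "echo", "fable", "onyx", "nova", "shimmer"})
KNOWN_PROVIDERS = frozenset({"openai", "kokoro"})


def check_voice(provider: str, voice: Optional[str]) -> Optional[str]:
    """Return an error message for an invalid combination, else None."""
    if provider not in KNOWN_PROVIDERS:
        return f"unknown provider: {provider!r}"
    if voice is None:
        # No voice given: this guide says one is auto-selected.
        return None
    if provider == "openai" and voice not in OPENAI_VOICES:
        return f"{voice!r} is not in the listed OpenAI voices"
    # Kokoro voices are not enumerated in this guide; list them with
    # `voicemode kokoro voices` rather than guessing.
    return None


if __name__ == "__main__":
    for provider, voice in [("openai", "nova"), ("openai", "bogus"), ("kokoro", None)]:
        print(provider, voice, "->", check_voice(provider, voice))
```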
Configuration:
# View current configuration
voicemode config list
# Set default voice
voicemode config set VOICEMODE_TTS_VOICE nova
# Set default provider
voicemode config set VOICEMODE_TTS_PROVIDER kokoro
# Edit full configuration
voicemode config edit
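When several defaults need to change at once, the `voicemode config set` invocations can be generated from a mapping. This sketch only builds and prints the command lines shown above and does not run them. `config_set_commands` is a hypothetical helper.

```python
import shlex
from typing import List, Mapping


def config_set_commands(settings: Mapping[str, str]) -> List[List[str]]:
    """Build one `voicemode config set KEY VALUE` argv list per setting."""
    return [
        ["voicemode", "config", "set", key, value]
        for key, value in settings.items()
    ]


if __name__ == "__main__":
    desired = {
        "VOICEMODE_TTS_VOICE": "nova",
        "VOICEMODE_TTS_PROVIDER": "kokoro",
    }
    for argv in config_set_commands(desired):
        # shlex.join renders the argv list as a copy-pasteable shell line.
        print(shlex.join(argv))
```

Each argv list can be passed to `subprocess.run` if the commands should be executed instead of printed.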
Project and User Preferences:
.voicemode file in project root
~/.voicemode file in home directory
~/.voicemode/config/config.yaml

VoiceMode uses OpenAI-compatible endpoints for all services:
Cloud Providers:
Local Providers:
The system automatically:
Requirements:
Supported Formats:
Configuration Options:
disable_silence_detection: Keep listening even during silence
vad_aggressiveness: 0-3 (default: 2) - how strict voice detection is
listen_duration_min: Minimum recording time before silence detection (default: 2.0s)
speed: Speech rate 0.25-4.0 (default: 1.0)
chime_enabled: Enable/disable audio feedback chimes

When playing audio files, you can batch multiple announcements and playback commands. Tools execute sequentially within the batch:
# Batch announce-play sequences
voicemode:converse("Chapter 1 - Introduction", wait_for_response=False)
Bash(command="mpv --start=00:00 --length=3 song.mp3")
voicemode:converse("Chapter 2 - Main Theme", wait_for_response=False)
Bash(command="mpv --start=00:10 --length=5 song.mp3")
This creates natural narration with audio playback.
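For longer playlists, the alternating announce and play calls can be generated from a chapter list. The sketch below only produces the ordered steps and leaves it to the caller to issue them as converse and Bash tool calls. The `Chapter` tuple layout and `announce_play_steps` are hypothetical, and the `mpv` flags are the ones used in the example above.

```python
import shlex
from typing import List, Tuple

# (title, start timestamp, length in seconds)
Chapter = Tuple[str, str, int]


def announce_play_steps(
    audio_file: str, chapters: List[Chapter]
) -> List[Tuple[str, str]]:
    """Expand chapters into alternating ("converse", text) and ("bash", command) steps."""
    steps: List[Tuple[str, str]] = []
    for title, start, length in chapters:
        # Announce first (intended for wait_for_response=False), then play.
        steps.append(("converse", title))
        command = f"mpv --start={start} --length={length} {shlex.quote(audio_file)}"
        steps.append(("bash", command))
    return steps


if __name__ == "__main__":
    chapters = [
        ("Chapter 1 - Introduction", "00:00", 3),
        ("Chapter 2 - Main Theme", "00:10", 5),
    ]
    for kind, payload in announce_play_steps("song.mp3", chapters):
        print(f"{kind}: {payload}")
```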
Configure VoiceMode behavior:
VOICEMODE_TTS_VOICE: Default TTS voice
VOICEMODE_TTS_PROVIDER: Default TTS provider (openai, kokoro)
VOICEMODE_STT_PROVIDER: Default STT provider
VOICEMODE_AUDIO_FORMAT: Audio format preference
VOICEMODE_DEBUG: Enable debug logging

VoiceMode maintains logs in ~/.voicemode/:
Log Structure:
logs/conversations/: Daily conversation transcripts
logs/events/: Operational events and errors
audio/: Saved audio recordings
config/: Configuration files

Enable Debug Mode:
# Via environment variable
export VOICEMODE_DEBUG=true
# Via CLI flag
voicemode converse --debug
# Via MCP parameter
voicemode:converse(message="Test", debug=True)
# Start conversation
voicemode:converse("Hello!")
# Speak without waiting
voicemode:converse("Working on it...", wait_for_response=False)
# Check service status
voicemode:service("whisper", "status")
voicemode:service("kokoro", "status")
# Start services
voicemode:service("whisper", "start")
voicemode:service("kokoro", "start")
# View logs
voicemode:service("whisper", "logs", lines=50)
# Check status
voicemode whisper service status
voicemode kokoro status
# Start services
voicemode whisper service start
voicemode kokoro start
# View logs
voicemode whisper service logs
voicemode kokoro logs
# Configuration
voicemode config list
voicemode config set VOICEMODE_TTS_VOICE nova
voicemode config edit
# Diagnostics
voicemode deps
voicemode diag info
voicemode diag devices
When using CLI commands directly (not MCP tools), redirect STDERR to save tokens:
# Suppresses FFmpeg warnings and debug output
voicemode converse -m "Hello" 2>/dev/null
# Omit when debugging
voicemode converse -m "Hello" # Shows all diagnostic info
This only applies to Bash tool calls - MCP tools handle this automatically.
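When the CLI is driven from a script instead of a shell, the same effect as `2>/dev/null` is available through Python's standard `subprocess` module. In this minimal sketch, `run_quiet` is a hypothetical helper, and the demo runs the Python interpreter itself so it works without VoiceMode installed.

```python
import subprocess
import sys
from typing import List


def run_quiet(argv: List[str], debug: bool = False) -> str:
    """Run a command and return its stdout; discard stderr unless debug is set."""
    completed = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        # DEVNULL is the subprocess equivalent of `2>/dev/null`;
        # None leaves stderr attached to the terminal for debugging.
        stderr=None if debug else subprocess.DEVNULL,
        text=True,
        check=False,
    )
    return completed.stdout


if __name__ == "__main__":
    # For the real CLI, pass ["voicemode", "converse", "-m", "Hello"] instead.
    noisy = [
        sys.executable,
        "-c",
        "import sys; sys.stderr.write('warn\\n'); print('ok')",
    ]
    print(run_quiet(noisy), end="")
```

Setting `debug=True` corresponds to omitting the redirect, so diagnostics stay visible while troubleshooting.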
~/.voicemode

For detailed documentation:
docs/reference/: Complete API and parameter documentation
docs/tutorials/: Step-by-step guides
docs/services/: Service-specific documentation
docs/testing/installer-testing.md: Installer testing guide for Tart VMs

Services won't start:
ffmpeg -version
voicemode:service("whisper", "logs")
voicemode:service("whisper", "restart")

Audio quality issues:
voicemode diag devices
vad_aggressiveness=1 (more permissive)
~/.voicemode/logs/conversations/

Conversations not working:
voicemode:service("whisper", "status")
voicemode diag registry

Configuration issues:
voicemode config list
~/.voicemode/config/config.yaml