From apple-kit-skills
Transcribes live and pre-recorded audio to text using Apple's Speech framework. Covers SpeechAnalyzer, SpeechTranscriber, SFSpeechRecognizer, authorization, and result handling.
Slash command: /apple-kit-skills:speech-recognition
Transcribe live and pre-recorded audio to text using Apple's Speech framework.
Covers SpeechAnalyzer / SpeechTranscriber (iOS 26+) and
SFSpeechRecognizer (iOS 10+). Targets Swift 6.3 / iOS 26+ while preserving
fallback guidance for apps that support older OS versions.
Scope boundary: Use this skill for speech-to-text recognition, speech
authorization, microphone capture plumbing, and result handling. Hand off text
analysis, language identification after transcription, sentiment, embeddings,
and translation to natural-language; hand off audio playback UI to avkit;
hand off summarization or generation over transcripts to apple-on-device-ai.
Use SpeechAnalyzer for modern iOS 26+ speech analysis, especially long-form
recordings, live transcription, time-indexed transcripts, and fully on-device
flows. Keep SFSpeechRecognizer for iOS 10+ deployment targets, server-backed
locale coverage, or existing callback/delegate implementations.
Read SpeechAnalyzer patterns when implementing an iOS 26+ transcription pipeline, model asset handling, volatile results, or file/buffer examples.
- SpeechTranscriber for the newer general-purpose on-device model.
- DictationTranscriber when SpeechTranscriber is unavailable for the current device or locale and dictation-compatible support is acceptable.
- SpeechDetector only in conjunction with a transcriber when voice activity detection is worth the accuracy/power tradeoff.
- SpeechTranscriber.isAvailable
- SpeechTranscriber.supportedLocale(equivalentTo:)
- SpeechTranscriber.installedLocales / supportedLocales when showing language choices.
- .transcription for basic accurate transcription.
- .progressiveTranscription for live UI updates.
- .timeIndexedProgressiveTranscription when playback highlighting needs audioTimeRange.
- AssetInventory.assetInstallationRequest.
- SpeechAnalyzer.bestAvailableAudioFormat(compatibleWith:) before yielding AnalyzerInput.
- AsyncSequence in a separate task.
- finalizeAndFinish(through:), finalizeAndFinishThroughEndOfInput(), or cancelAndFinishNow().
Do not use an offlineTranscription preset; Apple does not document one.
Finishing an AsyncStream input sequence does not finish the analyzer session.
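The module guidance above can be read as a small decision table. The sketch below models it in plain Swift; `SpeechCapabilities`, `RecognitionBackend`, and `chooseBackend(for:)` are hypothetical names introduced for illustration, and the Boolean flags stand in for the real availability and locale checks rather than calling the Speech framework.

```swift
// Hypothetical stand-ins for the availability checks listed above. In an app
// these values would come from the Speech framework, not be set by hand.
struct SpeechCapabilities {
    var osSupportsSpeechAnalyzer: Bool      // e.g. running iOS 26 or later
    var speechTranscriberAvailable: Bool    // device can run SpeechTranscriber
    var localeSupportedByTranscriber: Bool  // a matching locale was found
    var dictationFallbackAcceptable: Bool   // product decision, not an API value
}

enum RecognitionBackend {
    case speechTranscriber
    case dictationTranscriber
    case legacyRecognizer   // SFSpeechRecognizer
}

// Prefer SpeechTranscriber, then DictationTranscriber if that is acceptable,
// and otherwise fall back to SFSpeechRecognizer.
func chooseBackend(for caps: SpeechCapabilities) -> RecognitionBackend {
    guard caps.osSupportsSpeechAnalyzer else { return .legacyRecognizer }
    if caps.speechTranscriberAvailable && caps.localeSupportedByTranscriber {
        return .speechTranscriber
    }
    return caps.dictationFallbackAcceptable ? .dictationTranscriber : .legacyRecognizer
}
```

Keeping the decision in one pure function makes the fallback order easy to unit test without a device.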
import Speech

// Default locale (user's current language)
let recognizer = SFSpeechRecognizer()

// Specific locale (an alternative to the default-locale initializer above)
let usRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))

// Check if recognition is available for this locale
guard let recognizer, recognizer.isAvailable else {
    print("Speech recognition not available")
    return
}
final class SpeechManager: NSObject, SFSpeechRecognizerDelegate {
    private let recognizer = SFSpeechRecognizer()!

    override init() {
        super.init()
        recognizer.delegate = self
    }

    func speechRecognizer(
        _ speechRecognizer: SFSpeechRecognizer,
        availabilityDidChange available: Bool
    ) {
        // Update UI: disable the record button when unavailable
    }
}
Request both speech recognition and microphone permissions before starting
live transcription. Add these keys to Info.plist:
- NSSpeechRecognitionUsageDescription
- NSMicrophoneUsageDescription

import Speech
import AVFoundation

func requestPermissions() async -> Bool {
    let speechStatus = await withCheckedContinuation { continuation in
        SFSpeechRecognizer.requestAuthorization { status in
            continuation.resume(returning: status)
        }
    }
    guard speechStatus == .authorized else { return false }

    let micStatus: Bool
    if #available(iOS 17, *) {
        micStatus = await AVAudioApplication.requestRecordPermission()
    } else {
        micStatus = await withCheckedContinuation { continuation in
            AVAudioSession.sharedInstance().requestRecordPermission { granted in
                continuation.resume(returning: granted)
            }
        }
    }
    return micStatus
}
The standard pattern: AVAudioEngine captures microphone audio → buffers are
appended to SFSpeechAudioBufferRecognitionRequest → results stream in.
import Speech
import AVFoundation

final class LiveTranscriber {
    private let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))!
    private let audioEngine = AVAudioEngine()
    private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
    private var recognitionTask: SFSpeechRecognitionTask?

    func startTranscribing() throws {
        // Cancel any in-progress task
        recognitionTask?.cancel()
        recognitionTask = nil

        // Configure audio session
        let audioSession = AVAudioSession.sharedInstance()
        try audioSession.setCategory(.record, mode: .measurement, options: .duckOthers)
        try audioSession.setActive(true, options: .notifyOthersOnDeactivation)

        // Create request
        let request = SFSpeechAudioBufferRecognitionRequest()
        request.shouldReportPartialResults = true
        self.recognitionRequest = request

        // Start recognition task. Capture self weakly so the handler stored by
        // the task does not create a retain cycle with this object.
        recognitionTask = recognizer.recognitionTask(with: request) { [weak self] result, error in
            if let result {
                let text = result.bestTranscription.formattedString
                print("Transcription: \(text)")
                if result.isFinal {
                    self?.stopTranscribing()
                }
            }
            if let error {
                print("Recognition error: \(error)")
                self?.stopTranscribing()
            }
        }

        // Install audio tap
        let inputNode = audioEngine.inputNode
        let recordingFormat = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) {
            buffer, _ in
            request.append(buffer)
        }

        audioEngine.prepare()
        try audioEngine.start()
    }

    func stopTranscribing() {
        audioEngine.stop()
        audioEngine.inputNode.removeTap(onBus: 0)
        recognitionRequest?.endAudio()
        recognitionRequest = nil
        recognitionTask?.cancel()
        recognitionTask = nil
    }
}
Use SFSpeechURLRecognitionRequest for audio files on disk:
// Error type used by the example below; define it to suit your app.
enum SpeechError: Error {
    case unavailable
}

func transcribeFile(at url: URL) async throws -> String {
    guard let recognizer = SFSpeechRecognizer(), recognizer.isAvailable else {
        throw SpeechError.unavailable
    }
    let request = SFSpeechURLRecognitionRequest(url: url)
    request.shouldReportPartialResults = false

    return try await withCheckedThrowingContinuation { continuation in
        // The handler can fire more than once; resume the continuation only once.
        var didResume = false
        recognizer.recognitionTask(with: request) { result, error in
            guard !didResume else { return }
            if let error {
                didResume = true
                continuation.resume(throwing: error)
            } else if let result, result.isFinal {
                didResume = true
                continuation.resume(
                    returning: result.bestTranscription.formattedString
                )
            }
        }
    }
}
SFSpeechRecognizer can use on-device recognition for supported locales on
iOS 13+. If supportsOnDeviceRecognition is false, the recognizer requires a
network connection. requiresOnDeviceRecognition only has effect when the
recognizer supports it.
let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))!

// Check if on-device is supported for this locale
if recognizer.supportsOnDeviceRecognition {
    let request = SFSpeechAudioBufferRecognitionRequest()
    request.requiresOnDeviceRecognition = true // Force on-device
}
SFSpeechRecognizer requests may still be a poor fit for long-form capture.
Apple documents a roughly one-minute task limit for speech recognition and
other service limits. For long recordings on iOS 26+, prefer SpeechAnalyzer;
otherwise chunk or restart recognition before the limit and preserve transcript
state across tasks.
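One way to preserve transcript state across restarted tasks is to keep committed text separate from the current task's partial text. The following is a minimal sketch in plain Swift; `SegmentedTranscript` and its members are hypothetical names, and wiring it to the recognition callbacks is left to the app.

```swift
import Foundation

// Hypothetical helper: holds text from finished recognition tasks plus the
// latest partial text from the task that is currently running.
struct SegmentedTranscript {
    private(set) var committed: [String] = []
    private(set) var currentPartial = ""

    init() {}

    // Call with each partial result from the active task.
    mutating func updatePartial(_ text: String) {
        currentPartial = text
    }

    // Call just before stopping a task (for example at a rollover) so its
    // text survives the restart. Empty segments are skipped.
    mutating func commitCurrentSegment() {
        let trimmed = currentPartial.trimmingCharacters(in: .whitespacesAndNewlines)
        if !trimmed.isEmpty {
            committed.append(trimmed)
        }
        currentPartial = ""
    }

    // Committed segments followed by whatever the active task has produced.
    var fullText: String {
        let pieces = committed + (currentPartial.isEmpty ? [] : [currentPartial])
        return pieces.joined(separator: " ")
    }
}
```

Because committing clears the partial text, calling `commitCurrentSegment()` twice in a row does not duplicate a segment.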
let request = SFSpeechAudioBufferRecognitionRequest()
request.shouldReportPartialResults = true // default is true

recognizer.recognitionTask(with: request) { result, error in
    guard let result else { return }
    if result.isFinal {
        // Final transcription: recognition is complete
        let finalText = result.bestTranscription.formattedString
    } else {
        // Partial result: may change as more audio is processed
        let partialText = result.bestTranscription.formattedString
    }
}
recognizer.recognitionTask(with: request) { result, error in
    guard let result else { return }

    // Best transcription
    let best = result.bestTranscription

    // All alternatives (sorted by confidence, descending)
    for transcription in result.transcriptions {
        for segment in transcription.segments {
            print("\(segment.substring): \(segment.confidence)")
        }
    }
}
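Per-segment confidence can be used to flag words worth a second look in the UI. The sketch below uses a hypothetical `SegmentScore` struct that mirrors the `substring` and `confidence` values printed above, so it runs without the Speech framework; the 0.5 threshold is an arbitrary assumption, not a documented recommendation.

```swift
// Hypothetical mirror of the two segment fields used above.
struct SegmentScore {
    let substring: String
    let confidence: Float   // 0.0 (no confidence) through 1.0 (maximum)
}

// Returns the words whose confidence falls below the threshold, in order.
func lowConfidenceWords(
    in segments: [SegmentScore],
    below threshold: Float = 0.5
) -> [String] {
    segments
        .filter { $0.confidence < threshold }
        .map { $0.substring }
}
```

A helper like this is probably best applied to final results, since scores on partial results may not be meaningful.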
let request = SFSpeechAudioBufferRecognitionRequest()
request.addsPunctuation = true // Available on iOS 16+
Improve recognition of domain-specific terms:
let request = SFSpeechAudioBufferRecognitionRequest()
request.contextualStrings = ["SwiftUI", "Xcode", "CloudKit"]
// ❌ DON'T: Only request speech authorization for live audio
SFSpeechRecognizer.requestAuthorization { status in
    // Missing microphone permission: audio engine will fail
    self.startRecording()
}

// ✅ DO: Request both permissions before recording
SFSpeechRecognizer.requestAuthorization { status in
    guard status == .authorized else { return }
    AVAudioSession.sharedInstance().requestRecordPermission { granted in
        guard granted else { return }
        self.startRecording()
    }
}

// ❌ DON'T: Assume recognizer stays available after initial check
let recognizer = SFSpeechRecognizer()!
// Recognition may fail if network drops or locale changes

// ✅ DO: Monitor availability via delegate
recognizer.delegate = self

func speechRecognizer(
    _ speechRecognizer: SFSpeechRecognizer,
    availabilityDidChange available: Bool
) {
    recordButton.isEnabled = available
}

// ❌ DON'T: Leave audio engine running after recognition finishes
recognizer.recognitionTask(with: request) { result, error in
    if result?.isFinal == true {
        // Audio engine still running, wasting resources and battery
    }
}

// ✅ DO: Clean up all audio resources
recognizer.recognitionTask(with: request) { result, error in
    if result?.isFinal == true || error != nil {
        self.audioEngine.stop()
        self.audioEngine.inputNode.removeTap(onBus: 0)
        self.recognitionRequest?.endAudio()
        self.recognitionRequest = nil
    }
}

// ❌ DON'T: Force on-device without checking support
let request = SFSpeechAudioBufferRecognitionRequest()
request.requiresOnDeviceRecognition = true // Ignored unless the recognizer supports it

// ✅ DO: Check support before requiring on-device
if recognizer.supportsOnDeviceRecognition {
    request.requiresOnDeviceRecognition = true
} else {
    // Fall back to server-based or inform user
}

// ❌ DON'T: Start one long continuous recognition session
func startRecording() {
    // SFSpeechRecognizer tasks can be cut off after about 60 seconds
}

// ✅ DO: Roll the segment before the limit and let cleanup end audio once
func scheduleRecognitionRollover() {
    recognitionTimer = Timer.scheduledTimer(withTimeInterval: 55, repeats: false) { [weak self] _ in
        self?.commitLatestPartialText()
        self?.stopTranscribing() // owns endAudio(), tap removal, and task cancellation
        try? self?.startTranscribing()
    }
}
SFSpeechRecognitionTask exposes finish(), cancel(), state, and error;
do not invent task properties such as recognitionTask to restart work. Keep
the active SFSpeechAudioBufferRecognitionRequest in your manager and call
endAudio() from one cleanup path only.
// ❌ DON'T: Only finish the AsyncStream and expect result streams to close
inputBuilder.finish()

// ✅ DO: Explicitly finish or cancel the analyzer session
let lastSampleTime = try await analyzer.analyzeSequence(inputSequence)
if let lastSampleTime {
    try await analyzer.finalizeAndFinish(through: lastSampleTime)
} else {
    // cancelAndFinishNow() is async and does not throw
    await analyzer.cancelAndFinishNow()
}
// ✅ Replace volatile text with the finalized result for the same audio range
for try await result in transcriber.results {
    if result.isFinal {
        volatileTranscript = AttributedString()
        finalizedTranscript.append(result.text)
    } else {
        volatileTranscript = result.text
    }
}
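The volatile/final handling in the loop above can be isolated into a small value type so it is testable without an analyzer session. This is a sketch using plain `String` instead of `AttributedString`; `TranscriptUpdate` and `LiveTranscript` are hypothetical names standing in for the transcriber's result type and the app's view state.

```swift
// Hypothetical stand-in for one transcriber result.
struct TranscriptUpdate {
    let text: String
    let isFinal: Bool
}

// Keeps finalized text stable and treats volatile text as replaceable.
struct LiveTranscript {
    private(set) var finalized = ""
    private(set) var volatile = ""

    init() {}

    mutating func apply(_ update: TranscriptUpdate) {
        if update.isFinal {
            // The final result supersedes the volatile guess for that range.
            finalized += update.text
            volatile = ""
        } else {
            // Each volatile result replaces the previous one; it never appends.
            volatile = update.text
        }
    }

    // What the UI would show: settled text followed by the current guess.
    var displayText: String { finalized + volatile }
}
```

For example, applying a volatile "Hel", then a final "Hello ", then a volatile "wor" leaves `finalized` as "Hello " and `displayText` as "Hello wor".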
// ❌ DON'T: Start a new task without canceling the previous one
func startRecording() {
    recognitionTask = recognizer.recognitionTask(with: request) { ... }
    // Previous task is still running: undefined behavior
}

// ✅ DO: Cancel existing task before creating a new one
func startRecording() {
    recognitionTask?.cancel()
    recognitionTask = nil
    recognitionTask = recognizer.recognitionTask(with: request) { ... }
}
- NSSpeechRecognitionUsageDescription is in Info.plist
- NSMicrophoneUsageDescription is in Info.plist (if using live audio)
- SFSpeechRecognizerDelegate is set to handle availabilityDidChange
- recognitionRequest.endAudio() is called when done recording
- recognitionTask is canceled before starting a new one
- supportsOnDeviceRecognition is checked before requiring on-device mode
- isFinal) results
- SFSpeechRecognizer one-minute/service limits are accounted for
- AssetInventory assets are installed before using SpeechAnalyzer
- SpeechTranscriber.isAvailable and locale support are checked