Use when implementing on-device AI with Apple's Foundation Models framework — prevents context overflow, blocking UI, wrong model use cases, and manual JSON parsing when @Generable should be used. iOS 26+, macOS 26+, iPadOS 26+, visionOS 26+
Use when: implementing on-device AI features with Apple's Foundation Models framework. Related skills:
- foundation-models-diag for systematic troubleshooting (context exceeded, guardrail violations, availability problems)
- foundation-models-ref for complete API reference with all WWDC code examples

These are real questions developers ask that this skill is designed to answer:
→ The skill explains @Generable macro for guaranteed structured output with constrained decoding
→ The skill covers streaming with PartiallyGenerated, prewarming sessions, and performance optimization
→ The skill demonstrates transcript condensing strategies and context management patterns
→ The skill clarifies when each is appropriate (privacy, offline, world knowledge trade-offs)
→ The skill demonstrates Tool protocol implementation with MapKit, WeatherKit, Contacts examples
Why it fails: The on-device model is 3 billion parameters, optimized for summarization, extraction, classification — NOT world knowledge or complex reasoning.
Example of wrong use:
// ❌ BAD - Asking for world knowledge
let session = LanguageModelSession()
let response = try await session.respond(to: "What's the capital of France?")
Why: Model will hallucinate or give low-quality answers. It's trained for content generation, not encyclopedic knowledge.
Correct approach: Use server LLMs (ChatGPT, Claude) for world knowledge, or provide factual data through Tool calling.
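If the facts your feature needs are small and app-local, you can hand them to the model through a tool instead (Pattern 4 covers tools in depth). A minimal sketch, assuming a hypothetical CapitalLookupTool whose hard-coded table stands in for a real data source:
import FoundationModels

struct CapitalLookupTool: Tool {
    let name = "lookupCapital"
    let description = "Look up the capital city of a country"

    @Generable
    struct Arguments {
        @Guide(description: "The country to look up")
        var country: String
    }

    func call(arguments: Arguments) async throws -> ToolOutput {
        // Stand-in for a real source (bundled data, database, server)
        let capitals = ["France": "Paris", "Japan": "Tokyo"]
        let capital = capitals[arguments.country] ?? "unknown"
        return ToolOutput("The capital of \(arguments.country) is \(capital).")
    }
}

let session = LanguageModelSession(tools: [CapitalLookupTool()])
let response = try await session.respond(to: "What's the capital of France?")
// The model answers from tool output instead of guessing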
Why it fails: session.respond() takes seconds to complete; if the main thread blocks waiting for it, the UI freezes.
Example of wrong use:
// ❌ BAD - Blocking main thread
Button("Generate") {
let response = try await session.respond(to: prompt) // UI frozen!
}
Why: Generation takes 1-5 seconds. User sees frozen app, bad reviews follow.
Correct approach:
// ✅ GOOD - Async on background
Button("Generate") {
Task {
let response = try await session.respond(to: prompt)
// Update UI with response
}
}
Why it fails: Prompting for JSON and parsing with JSONDecoder leads to hallucinated keys, invalid JSON, no type safety.
Example of wrong use:
// ❌ BAD - Manual JSON parsing
let prompt = "Generate a person with name and age as JSON"
let response = try await session.respond(to: prompt)
let data = response.content.data(using: .utf8)!
let person = try JSONDecoder().decode(Person.self, from: data) // CRASHES!
Why: Model might output {firstName: "John"} when you expect {name: "John"}. Or invalid JSON entirely.
Correct approach:
// ✅ GOOD - @Generable guarantees structure
@Generable
struct Person {
let name: String
let age: Int
}
let response = try await session.respond(
to: "Generate a person",
generating: Person.self
)
// response.content is type-safe Person instance
Why it fails: Foundation Models only runs on Apple Intelligence devices in supported regions. App crashes or shows errors without check.
Example of wrong use:
// ❌ BAD - No availability check
let session = LanguageModelSession() // Might fail!
Correct approach:
// ✅ GOOD - Check first
switch SystemLanguageModel.default.availability {
case .available:
    let session = LanguageModelSession()
    // proceed
case .unavailable(let reason):
    // Show graceful UI: "AI features require Apple Intelligence"
    print("Unavailable: \(reason)")
}
Why it fails: The context window is 4096 tokens (input + output combined). One massive prompt hits the limit and produces poor results.
Example of wrong use:
// ❌ BAD - Everything in one prompt
let prompt = """
Generate a 7-day itinerary for Tokyo including hotels, restaurants,
activities for each day, transportation details, budget breakdown...
"""
// Exceeds context, poor quality
Correct approach: Break into smaller tasks, use tools for external data, multi-turn conversation.
Why it fails: Multi-turn conversations grow transcript. Eventually exceeds 4096 tokens, throws error, conversation ends.
Must handle:
// ✅ GOOD - Handle overflow
do {
let response = try await session.respond(to: prompt)
} catch LanguageModelSession.GenerationError.exceededContextWindowSize {
// Condense transcript and create new session
session = condensedSession(from: session)
}
Why it fails: Model has content policy. Certain prompts trigger guardrails, throw error.
Must handle:
// ✅ GOOD - Handle guardrails
do {
let response = try await session.respond(to: userInput)
} catch LanguageModelSession.GenerationError.guardrailViolation {
// Show message: "I can't help with that request"
}
Why it fails: Model supports specific languages. User input might be unsupported, throws error.
Must check:
// ✅ GOOD - Check supported languages
let supported = SystemLanguageModel.default.supportedLanguages
guard supported.contains(Locale.current.language) else {
// Show disclaimer
return
}
Before writing any Foundation Models code, complete these steps:
switch SystemLanguageModel.default.availability {
case .available:
// Proceed with implementation
print("✅ Foundation Models available")
case .unavailable(let reason):
// Handle gracefully - show UI message
print("❌ Unavailable: \(reason)")
}
Why: Foundation Models requires an Apple Intelligence-compatible device, Apple Intelligence enabled, and a supported region and language.
Failure mode: App crashes or shows confusing errors without check.
Ask yourself: What is my primary goal?
| Use Case | Foundation Models? | Alternative |
|---|---|---|
| Summarization | ✅ YES | |
| Extraction (key info from text) | ✅ YES | |
| Classification (categorize content) | ✅ YES | |
| Content tagging | ✅ YES (built-in adapter!) | |
| World knowledge | ❌ NO | ChatGPT, Claude, Gemini |
| Complex reasoning | ❌ NO | Server LLMs |
| Mathematical computation | ❌ NO | Calculator, symbolic math |
Critical: If your use case requires world knowledge or advanced reasoning, stop. Foundation Models is the wrong tool.
If you need structured output (not just plain text):
Bad approach: Prompt for "JSON" and parse manually
Good approach: Define @Generable type
@Generable
struct SearchSuggestions {
@Guide(description: "Suggested search terms", .count(4))
var searchTerms: [String]
}
Why: Constrained decoding guarantees structure. No parsing errors, no hallucinated keys.
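For illustration, a usage sketch of the type above (the prompt wording is hypothetical):
let session = LanguageModelSession()
let response = try await session.respond(
    to: "Suggest search terms for a camping trip",
    generating: SearchSuggestions.self
)
// Constrained decoding guarantees exactly 4 strings, per @Guide(.count(4))
print(response.content.searchTerms)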
If your feature needs external information:
Don't try to get this information from the model (it will hallucinate). Do define Tool protocol implementations.
If generation takes >1 second, use streaming:
let stream = session.streamResponse(
to: prompt,
generating: Itinerary.self
)
for try await partial in stream {
// Update UI incrementally
self.itinerary = partial
}
Why: Users see progress immediately, perceived latency drops dramatically.
Need on-device AI?
│
├─ World knowledge/reasoning?
│ └─ ❌ NOT Foundation Models
│ → Use ChatGPT, Claude, Gemini, etc.
│ → Reason: 3B parameter model, not trained for encyclopedic knowledge
│
├─ Summarization?
│ └─ ✅ YES → Pattern 1 (Basic Session)
│ → Example: Summarize article, condense email
│ → Time: 10-15 minutes
│
├─ Structured extraction?
│ └─ ✅ YES → Pattern 2 (@Generable)
│ → Example: Extract name, date, amount from invoice
│ → Time: 15-20 minutes
│
├─ Content tagging?
│ └─ ✅ YES → Pattern 3 (contentTagging use case)
│ → Example: Tag article topics, extract entities
│ → Time: 10 minutes
│
├─ Need external data?
│ └─ ✅ YES → Pattern 4 (Tool calling)
│ → Example: Fetch weather, query contacts, get locations
│ → Time: 20-30 minutes
│
├─ Long generation?
│ └─ ✅ YES → Pattern 5 (Streaming)
│ → Example: Generate itinerary, create story
│ → Time: 15-20 minutes
│
└─ Dynamic schemas (runtime-defined structure)?
└─ ✅ YES → Pattern 6 (DynamicGenerationSchema)
→ Example: Level creator, user-defined forms
→ Time: 30-40 minutes
Use when: Simple text generation, summarization, or content analysis.
Create a LanguageModelSession with instructions:
import FoundationModels
func respond(userInput: String) async throws -> String {
let session = LanguageModelSession(instructions: """
You are a friendly barista in a pixel art coffee shop.
Respond to the player's question concisely.
"""
)
let response = try await session.respond(to: userInput)
return response.content
}
Code from: WWDC 301:1:05
let session = LanguageModelSession()
// First turn
let first = try await session.respond(to: "Write a haiku about fishing")
print(first.content)
// "Silent waters gleam,
// Casting lines in morning mist—
// Hope in every cast."
// Second turn - model remembers context
let second = try await session.respond(to: "Do another one about golf")
print(second.content)
// "Silent morning dew,
// Caddies guide with gentle words—
// Paths of patience tread."
// Inspect full transcript
print(session.transcript)
Code from: WWDC 286:17:46
Why this works: Session retains transcript automatically. Model uses context from previous turns.
let transcript = session.transcript
// Use for:
// - Debugging generation issues
// - Showing conversation history in UI
// - Exporting chat logs
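A minimal sketch of walking the transcript for debugging (using the entries property shown later in the context-management strategies):
for entry in session.transcript.entries {
    print(entry) // prompts, responses, and tool calls, in order
}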
do {
let response = try await session.respond(to: prompt)
} catch LanguageModelSession.GenerationError.guardrailViolation {
// Content policy triggered
print("Cannot generate that content")
} catch LanguageModelSession.GenerationError.unsupportedLanguageOrLocale {
// Language not supported
print("Please use English or another supported language")
}
✅ Good for: summarization, simple content generation, multi-turn chat over provided text
❌ Not good for: world knowledge, complex reasoning, or structured output (use @Generable, Pattern 2)
Implementation: 10-15 minutes for basic usage
Debugging: +5-10 minutes if hitting errors
Use when: You need structured data from model, not just plain text.
Without @Generable:
// ❌ BAD - Unreliable
let prompt = "Generate a person with name and age as JSON"
let response = try await session.respond(to: prompt)
// Might get: {"firstName": "John"} when you expect {"name": "John"}
// Might get invalid JSON entirely
// Must parse manually, prone to crashes
With @Generable:

@Generable
struct Person {
let name: String
let age: Int
}
let session = LanguageModelSession()
let response = try await session.respond(
to: "Generate a person",
generating: Person.self
)
let person = response.content // Type-safe Person instance!
Code from: WWDC 301:8:14
The @Generable macro generates the schema at compile time.

From WWDC 286: "Constrained decoding masks out invalid tokens. Model can only pick tokens valid according to schema."
Primitives:
String, Int, Float, Double, Bool

Arrays:
@Generable
struct SearchSuggestions {
var searchTerms: [String]
}
Nested/Composed:
@Generable
struct Itinerary {
var destination: String
var days: [DayPlan] // Composed type
}
@Generable
struct DayPlan {
var activities: [String]
}
Code from: WWDC 286:6:18
Enums with Associated Values:
@Generable
struct NPC {
let name: String
let encounter: Encounter
@Generable
enum Encounter {
case orderCoffee(String)
case wantToTalkToManager(complaint: String)
}
}
Code from: WWDC 301:10:49
Recursive Types:
@Generable
struct Itinerary {
var destination: String
var relatedItineraries: [Itinerary] // Recursive!
}
Control generated values with @Guide:
Natural Language Description:
@Generable
struct NPC {
@Guide(description: "A full name with first and last")
let name: String
}
Numeric Ranges:
@Generable
struct Character {
@Guide(.range(1...10))
let level: Int
}
Code from: WWDC 301:11:20
Array Count:
@Generable
struct Suggestions {
@Guide(description: "Suggested search terms", .count(4))
var searchTerms: [String]
}
Code from: WWDC 286:5:32
Maximum Count:
@Generable
struct Result {
@Guide(.maximumCount(3))
let topics: [String]
}
Regex Patterns:
@Generable
struct NPC {
@Guide(Regex {
Capture {
ChoiceOf {
"Mr"
"Mrs"
}
}
". "
OneOrMore(.word)
})
let name: String
}
// Output: {name: "Mrs. Brewster"}
Code from: WWDC 301:13:40
Properties generated in declaration order:
@Generable
struct Itinerary {
var destination: String // Generated first
var days: [DayPlan] // Generated second
var summary: String // Generated last
}
From WWDC 286: "You may find the model produces the best summaries when they're the last property."
Why: Later properties can reference earlier ones. Put most important properties first for streaming.
Use when: Generation takes >1 second and you want progressive UI updates.
Without streaming:
// User waits 3-5 seconds seeing nothing
let response = try await session.respond(to: prompt, generating: Itinerary.self)
// Then entire result appears at once
User experience: Feels slow, frozen UI.
@Generable
struct Itinerary {
var name: String
var days: [DayPlan]
}
let stream = session.streamResponse(
to: "Generate a 3-day itinerary to Mt. Fuji",
generating: Itinerary.self
)
for try await partial in stream {
print(partial) // Incrementally updated
}
Code from: WWDC 286:9:40
@Generable macro automatically creates PartiallyGenerated type:
// Compiler generates:
extension Itinerary {
struct PartiallyGenerated {
var name: String? // All properties optional!
var days: [DayPlan]?
}
}
Why optional: Properties fill in as model generates them.
struct ItineraryView: View {
let session: LanguageModelSession
@State private var itinerary: Itinerary.PartiallyGenerated?
var body: some View {
VStack {
if let name = itinerary?.name {
Text(name)
.font(.title)
}
if let days = itinerary?.days {
ForEach(days, id: \.self) { day in
DayView(day: day)
}
}
Button("Generate") {
Task {
let stream = session.streamResponse(
to: "Generate 3-day itinerary to Tokyo",
generating: Itinerary.self
)
for try await partial in stream {
self.itinerary = partial
}
}
}
}
}
}
Code from: WWDC 286:10:05
Add polish:
if let name = itinerary?.name {
Text(name)
.transition(.opacity)
}
if let days = itinerary?.days {
ForEach(days, id: \.self) { day in
DayView(day: day)
.transition(.slide)
}
}
From WWDC 286: "Get creative with SwiftUI animations to hide latency. Turn waiting into delight."
Critical for arrays:
// ✅ GOOD - Stable identity
ForEach(days, id: \.id) { day in
DayView(day: day)
}
// ❌ BAD - Identity changes, animations break
ForEach(days.indices, id: \.self) { index in
DayView(day: days[index])
}
// ✅ GOOD - Title appears first, summary last
@Generable
struct Itinerary {
var name: String // Shows first
var days: [DayPlan] // Shows second
var summary: String // Shows last (can reference days)
}
// ❌ BAD - Summary before content
@Generable
struct Itinerary {
var summary: String // Doesn't make sense before days!
var days: [DayPlan]
}
Code from: WWDC 286:11:00
✅ Use for: generations longer than ~1 second, multi-property @Generable output shown in UI
❌ Skip for: short responses that complete almost instantly
Implementation: 15-20 minutes with SwiftUI
Polish (animations): +5-10 minutes
Use when: Model needs external data (weather, locations, contacts) to generate response.
// ❌ BAD - Model will hallucinate
let response = try await session.respond(
to: "What's the temperature in Cupertino?"
)
// Output: "It's about 72°F" (completely made up!)
Why: 3B parameter model doesn't have real-time weather data.
Let model autonomously call your code to fetch external data.
import FoundationModels
import WeatherKit
import CoreLocation
struct GetWeatherTool: Tool {
let name = "getWeather"
let description = "Retrieve latest weather for a city"
@Generable
struct Arguments {
@Guide(description: "The city to fetch weather for")
var city: String
}
func call(arguments: Arguments) async throws -> ToolOutput {
let places = try await CLGeocoder().geocodeAddressString(arguments.city)
let weather = try await WeatherService.shared.weather(for: places.first!.location!)
let temp = weather.currentWeather.temperature.value
return ToolOutput("\(arguments.city)'s temperature is \(temp) degrees.")
}
}
Code from: WWDC 286:13:42
let session = LanguageModelSession(
tools: [GetWeatherTool()],
instructions: "Help user with weather forecasts."
)
let response = try await session.respond(
to: "What's the temperature in Cupertino?"
)
print(response.content)
// "It's 71°F in Cupertino!"
Code from: WWDC 286:15:03
Model autonomously recognizes it needs weather data and calls GetWeatherTool.

protocol Tool {
var name: String { get }
var description: String { get }
associatedtype Arguments: Generable
func call(arguments: Arguments) async throws -> ToolOutput
}
Name: Short, verb-based (e.g. getWeather, findContact)
Description: One sentence explaining purpose
Arguments: Must be @Generable (guarantees valid input)
call: Your code — fetch data, process, return
Two forms:

A plain string:
return ToolOutput("Temperature is 71°F")

Or structured content:
let content = GeneratedContent(properties: ["temperature": 71])
return ToolOutput(content)
let session = LanguageModelSession(
tools: [
GetWeatherTool(),
FindRestaurantTool(),
FindHotelTool()
],
instructions: "Plan travel itineraries."
)
let response = try await session.respond(
to: "Create a 2-day plan for Tokyo"
)
// Model autonomously decides:
// - Calls FindRestaurantTool for dining
// - Calls FindHotelTool for accommodation
// - Calls GetWeatherTool to suggest activities
Tools can maintain state across calls:
class FindContactTool: Tool {
let name = "findContact"
let description = "Find a contact from a specific generation"
var pickedContacts = Set<String>() // State!
@Generable
struct Arguments {
let generation: Generation
@Generable
enum Generation {
case babyBoomers
case genX
case millennial
case genZ
}
}
func call(arguments: Arguments) async throws -> ToolOutput {
// Use Contacts API
var contacts = fetchContacts(for: arguments.generation)
// Remove already picked
contacts.removeAll(where: { pickedContacts.contains($0.name) })
guard let picked = contacts.randomElement() else {
return ToolOutput("No more contacts")
}
pickedContacts.insert(picked.name) // Update state
return ToolOutput(picked.name)
}
}
Code from: WWDC 301:21:55
Why class, not struct: call(arguments:) is a nonmutating protocol requirement, so a struct could not modify pickedContacts. A class's reference semantics let the tool keep state across calls.
1. Session initialized with tools
2. User prompt: "What's Tokyo's weather?"
3. Model analyzes: "Need weather data"
4. Model generates tool call: getWeather(city: "Tokyo")
5. Framework calls your tool's call() method
6. Your tool fetches real data from API
7. Tool output inserted into transcript
8. Model generates final response using tool output
From WWDC 301: "Model decides autonomously when and how often to call tools. Can call multiple tools per request, even in parallel."
✅ Guaranteed: tool arguments are always valid instances of your @Generable Arguments type, thanks to constrained decoding.
❌ Not guaranteed: whether, when, or how often the model calls a given tool; it decides autonomously.
struct FindPointsOfInterestTool: Tool {
let name = "findPointsOfInterest"
let description = "Find restaurants, museums, parks near a landmark"
let landmark: String
@Generable
struct Arguments {
let category: Category
@Generable
enum Category {
case restaurant
case museum
case park
case marina
}
}
func call(arguments: Arguments) async throws -> ToolOutput {
// Use MapKit
let request = MKLocalSearch.Request()
request.naturalLanguageQuery = "\(arguments.category) near \(landmark)"
let search = MKLocalSearch(request: request)
let response = try await search.start()
let names = response.mapItems.prefix(5).map { $0.name ?? "" }
return ToolOutput(names.joined(separator: ", "))
}
}
From WWDC 259 summary: "Tool fetches points of interest from MapKit. Model uses world knowledge to determine promising categories."
✅ Use for: real-time data (weather, locations, contacts), app-private data, stateful lookups like the contact picker above
❌ Don't use for: data that fits easily in the prompt, or logic better written directly in Swift
Simple tool: 20-25 minutes
Complex tool with state: 30-40 minutes
Use when: Multi-turn conversations that might exceed 4096 token limit.
// Long conversation...
for i in 1...100 {
let response = try await session.respond(to: "Question \(i)")
// Eventually...
// Error: exceededContextWindowSize
}
Context window: 4096 tokens (input + output combined)
Average: ~3 characters per token in English

Rough calculation: 4096 tokens × ~3 characters ≈ ~12,000 characters of combined prompt and response. Long conversations or verbose prompts/responses will exceed the limit.
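As a rough budgeting aid, a sketch of the heuristic above; estimatedTokenCount is a hypothetical helper, not a framework API:
// Rough heuristic only; real tokenization varies by content and language
func estimatedTokenCount(for text: String) -> Int {
    max(1, text.count / 3)
}

let prompt = "Summarize this article's key points"
print(estimatedTokenCount(for: prompt)) // ~11 of the 4096-token budget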
var session = LanguageModelSession()
do {
let response = try await session.respond(to: prompt)
print(response.content)
} catch LanguageModelSession.GenerationError.exceededContextWindowSize {
// New session, no history
session = LanguageModelSession()
}
Code from: WWDC 301:3:37
Problem: Loses entire conversation history.
var session = LanguageModelSession()
do {
let response = try await session.respond(to: prompt)
} catch LanguageModelSession.GenerationError.exceededContextWindowSize {
// New session with condensed history
session = condensedSession(from: session)
}
func condensedSession(from previous: LanguageModelSession) -> LanguageModelSession {
let allEntries = previous.transcript.entries
var condensedEntries = [Transcript.Entry]()
// Always include first entry (instructions)
if let first = allEntries.first {
condensedEntries.append(first)
// Include last entry (most recent context)
if allEntries.count > 1, let last = allEntries.last {
condensedEntries.append(last)
}
}
let condensedTranscript = Transcript(entries: condensedEntries)
return LanguageModelSession(transcript: condensedTranscript)
}
Code from: WWDC 301:3:55
Why this works: the first entry preserves the instructions, so the model keeps its role; the last entry preserves the most recent exchange, so the model keeps immediate context. Everything in between is dropped.

For long conversations where recent context isn't enough:
func condensedSession(from previous: LanguageModelSession) async throws -> LanguageModelSession {
    let entries = previous.transcript.entries
    guard entries.count > 3 else {
        return LanguageModelSession(transcript: previous.transcript)
    }
    // Keep first (instructions) and last (recent)
    var condensedEntries = [entries.first!]
    // Summarize middle entries
    let middleEntries = Array(entries[1..<entries.count-1])
    let summaryPrompt = """
    Summarize this conversation in 2-3 sentences:
    \(middleEntries.map { $0.content }.joined(separator: "\n"))
    """
    // Use Foundation Models itself to summarize!
    let summarySession = LanguageModelSession()
    let summary = try await summarySession.respond(to: summaryPrompt)
    condensedEntries.append(Transcript.Entry(content: summary.content))
    condensedEntries.append(entries.last!)
    return LanguageModelSession(transcript: Transcript(entries: condensedEntries))
}
Note the signature is now async throws, since the summarization call awaits the model.
From WWDC 301: "You could summarize parts of transcript with Foundation Models itself."
1. Keep prompts concise:
// ❌ BAD
let prompt = """
I want you to generate a comprehensive detailed analysis of this article
with multiple sections including summary, key points, sentiment analysis,
main arguments, counter arguments, logical fallacies, and conclusions...
"""
// ✅ GOOD
let prompt = "Summarize this article's key points"
2. Use tools for data: Instead of putting entire dataset in prompt, use tools to fetch on-demand.
3. Break complex tasks into steps:
// ❌ BAD - One massive generation
let response = try await session.respond(
to: "Create 7-day itinerary with hotels, restaurants, activities..."
)
// ✅ GOOD - Multiple smaller generations
let overview = try await session.respond(to: "Create high-level 7-day plan")
for day in 1...7 {
let details = try await session.respond(to: "Detail activities for day \(day)")
}
From WWDC 286: "Each token in instructions and prompt adds latency. Longer outputs take longer."
Use Instruments (Foundation Models template) to profile latency, inspect token counts, and quantify improvements.

Basic overflow handling: 5-10 minutes
Condensing strategy: 15-20 minutes
Advanced summarization: 30-40 minutes
Use when: You need control over output randomness/determinism.
Model generates output one token at a time:
Default: Random sampling → Different output each time

Greedy sampling → Same output every time for the same prompt and model version:
let response = try await session.respond(
to: prompt,
options: GenerationOptions(sampling: .greedy)
)
Code from: WWDC 301:6:14
Use cases: reproducible output for tests, demos, and snapshot comparisons.
Caveat: Only holds for the same model version. OS updates may change output.
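A sketch of the reproducibility use case: two fresh sessions with greedy sampling should produce matching output on the same model version (the fixture prompt is hypothetical):
let options = GenerationOptions(sampling: .greedy)
let prompt = "Summarize: The quick brown fox jumps over the lazy dog."

let a = try await LanguageModelSession().respond(to: prompt, options: options)
let b = try await LanguageModelSession().respond(to: prompt, options: options)
assert(a.content == b.content) // holds until an OS update changes the model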
Low variance (conservative, focused):
let response = try await session.respond(
to: prompt,
options: GenerationOptions(temperature: 0.5)
)
High variance (creative, diverse):
let response = try await session.respond(
to: prompt,
options: GenerationOptions(temperature: 2.0)
)
Code from: WWDC 301:6:14
Temperature scale:
- 0.1-0.5: Very focused, predictable
- 1.0 (default): Balanced
- 1.5-2.0: Creative, varied

Example use cases:
✅ Greedy for: tests, demos, and anywhere identical output matters
✅ Low temperature for: summaries, extraction, factual phrasing
✅ High temperature for: stories, brainstorming, varied suggestions
Implementation: 2-3 minutes (one line change)
Context: You're implementing a new AI feature. PM suggests using ChatGPT API for "better results."
Pressure signals:
Rationalization traps:
Why this fails:
Privacy violation: User data sent to external server
Cost: Every API call costs money
Offline unavailable: Requires internet
Latency: Network round-trip adds 500-2000ms
When ChatGPT IS appropriate: features that genuinely require world knowledge or complex reasoning (see the response below).
Mandatory response:
"I understand ChatGPT delivers great results for certain tasks. However,
for this feature, Foundation Models is the right choice for three critical reasons:
1. **Privacy**: This feature processes [medical notes/financial data/personal content].
Users expect this data stays on-device. Sending to external API violates that trust
and may have compliance issues.
2. **Cost**: At scale, ChatGPT API calls cost $X per 1000 requests. Foundation Models
is free. For Y million users, that's $Z annually we can avoid.
3. **Offline capability**: Foundation Models works without internet. Users in airplane
mode or with poor signal still get full functionality.
**When to use ChatGPT**: If this feature required world knowledge or complex reasoning,
ChatGPT would be the right choice. But this is [summarization/extraction/classification],
which is exactly what Foundation Models is optimized for.
**Time estimate**: Foundation Models implementation: 15-20 minutes.
Privacy compliance review for ChatGPT: 2-4 weeks."
Time saved: Privacy compliance review vs correct implementation: 2-4 weeks vs 20 minutes
Context: Teammate suggests prompting for JSON, parsing with JSONDecoder. Claims it's "simple and familiar."
Pressure signals:
Rationalization traps:
Why this fails:
Hallucinated keys: Model outputs {firstName: "John"} when you expect {name: "John"}
JSONDecoder then throws keyNotFound.

Invalid JSON: Model might output:
Here's the person: {name: "John", age: 30}
No type safety: Manual string parsing, prone to errors
Real-world example:
// ❌ BAD - Will fail
let prompt = "Generate a person with name and age as JSON"
let response = try await session.respond(to: prompt)
// Model outputs: {"firstName": "John Smith", "years": 30}
// Your code expects: {"name": ..., "age": ...}
// CRASH: keyNotFound(name)
Debugging time: 2-4 hours finding edge cases, writing parsing hacks
Correct approach:
// ✅ GOOD - 15 minutes, guaranteed to work
@Generable
struct Person {
let name: String
let age: Int
}
let response = try await session.respond(
to: "Generate a person",
generating: Person.self
)
// response.content is type-safe Person, always valid
Mandatory response:
"I understand JSON parsing feels familiar, but for LLM output, @Generable is objectively
better for three technical reasons:
1. **Constrained decoding guarantees structure**: Model can ONLY generate valid Person
instances. Impossible to get wrong keys, invalid JSON, or missing fields.
2. **No parsing code needed**: Framework handles parsing automatically. Zero chance of
parsing bugs.
3. **Compile-time safety**: If we change Person struct, compiler catches all issues.
Manual JSON parsing = runtime crashes.
**Real cost**: Manual JSON approach will hit edge cases. Debugging 'keyNotFound' crashes
takes 2-4 hours. @Generable implementation takes 15 minutes and has zero parsing bugs.
**Analogy**: This is like choosing Swift over Objective-C for new code. Both work, but
Swift's type safety prevents entire categories of bugs."
Time saved: 4-8 hours debugging vs 15 minutes correct implementation
Context: Feature requires extracting name, date, amount, category from invoice. Teammate suggests one prompt: "Extract all information."
Pressure signals:
Rationalization traps:
Why this fails: a single giant prompt risks the 4096-token limit, degrades quality on every sub-task, and fails all-or-nothing.
Better approach: Break into tasks + use tools
// ❌ BAD - One massive prompt
let prompt = """
Extract from this invoice:
- Vendor name
- Invoice date
- Total amount
- Line items (description, quantity, price each)
- Payment terms
- Due date
- Tax amount
...
"""
// 4 seconds, poor quality, might exceed context
// ✅ GOOD - Structured extraction with focused prompts
@Generable
struct InvoiceBasics {
let vendor: String
let date: String
let amount: Double
}
let basics = try await session.respond(
to: "Extract vendor, date, and amount",
generating: InvoiceBasics.self
) // 0.5 seconds, high quality
@Generable
struct LineItem {
let description: String
let quantity: Int
let price: Double
}
let items = try await session.respond(
to: "Extract line items",
generating: [LineItem].self
) // 1 second, high quality
// Total: 1.5 seconds, better quality, graceful partial failures
Mandatory response:
"I understand the appeal of one simple API call. However, this specific task requires
a different approach:
1. **Context limits**: Invoice + complex extraction prompt will likely exceed 4096 token
limit. Multiple focused prompts stay well under limit.
2. **Better quality**: Model performs better with focused tasks. 'Extract vendor name'
gets 95%+ accuracy. 'Extract everything' gets 60-70%.
3. **Faster perceived performance**: Multiple prompts with streaming show progressive
results. Users see vendor name in 0.5s, not waiting 5s for everything.
4. **Graceful degradation**: If line items fail, we still have basics. All-or-nothing
approach means total failure.
**Implementation**: Breaking into 3-4 focused extractions takes 30 minutes. One big
prompt takes 2-3 hours debugging why it hits context limit and produces poor results."
Time saved: 2-3 hours debugging vs 30 minutes proper design
Problem: First generation takes 1-2 seconds just to load model.
Solution: Create session before user interaction.
class ViewModel: ObservableObject {
    private let session: LanguageModelSession

    init() {
        // Create and prewarm on init, not when the user taps the button
        session = LanguageModelSession(instructions: "...")
        session.prewarm()
    }

    func generate(prompt: String) async throws -> String {
        let response = try await session.respond(to: prompt)
        return response.content
    }
}
From WWDC 259: "Prewarming session before user interaction reduces initial latency."
Time saved: 1-2 seconds off first generation
Problem: @Generable schemas inserted into prompt, increases token count.
Solution: For subsequent requests with same schema, skip insertion.
let firstResponse = try await session.respond(
    to: "Generate first person",
    generating: Person.self
    // Schema inserted into the prompt automatically
)

// Subsequent requests with the SAME schema can skip re-inserting it
let secondResponse = try await session.respond(
    to: "Generate another person",
    generating: Person.self,
    includeSchemaInPrompt: false
)
From WWDC 259: "Setting includeSchemaInPrompt to false decreases token count and latency for subsequent requests."
When to use: Multi-turn with same @Generable type
Time saved: 10-20% latency reduction per request
Problem: User waits for entire generation.
Solution: Put important properties first, stream to show early.
// ✅ GOOD - Title shows immediately
@Generable
struct Article {
var title: String // Shows in 0.2s
var summary: String // Shows in 0.8s
var fullText: String // Shows in 2.5s
}
// ❌ BAD - Wait for everything
@Generable
struct Article {
var fullText: String // User waits 2.5s
var title: String
var summary: String
}
UX impact: Perceived latency drops from 2.5s to 0.2s
Use the Instruments app with the Foundation Models template to profile generation latency and spot optimization opportunities.
From WWDC 286: "New Instruments profiling template lets you observe areas of optimization and quantify improvements."
Access: Instruments → Create → Foundation Models template
Before shipping Foundation Models features:
- Availability checked (SystemLanguageModel.default.availability)
- Context overflow handled (exceededContextWindowSize)
- Guardrail violations handled (guardrailViolation)
- Unsupported languages handled (unsupportedLanguageOrLocale)
- Generation kept off the main thread (Task {} for async)

WWDC 2025 Sessions: 286, 301, and 259, cited throughout this skill.
Related Axiom Skills:
- foundation-models-diag — Systematic troubleshooting for context exceeded, guardrail violations, availability problems
- foundation-models-ref — Complete API reference with all code examples, dynamic schemas, tool calling details

Apple Documentation: https://developer.apple.com/documentation/foundationmodels
Last Updated: 2025-12-03 Version: 1.0.0 Target: iOS 26+, macOS 26+, iPadOS 26+, visionOS 26+