subject not detected, hand pose missing landmarks, low confidence observations, Vision performance, coordinate conversion, VisionKit errors, observation nil
Systematic troubleshooting for Vision framework issues: subjects not detected, missing landmarks, low confidence, performance problems, and coordinate mismatches.
Core Principle: When Vision doesn't work, the problem is usually:
Always check environment and confidence BEFORE debugging code.
Symptoms that indicate Vision-specific issues:
| Symptom | Likely Cause |
|---|---|
| Subject not detected at all | Edge of frame, poor lighting, very small subject |
| Hand landmarks intermittently nil | Hand near edge, parallel to camera, glove/occlusion |
| Body pose skipped frames | Person bent over, upside down, flowing clothing |
| UI freezes during processing | Running Vision on main thread |
| Overlays in wrong position | Coordinate conversion (lower-left vs top-left) |
| Crash on older devices | Using iOS 17+ APIs without @available check |
| Person segmentation misses people | >4 people in scene (instance mask limit) |
| Low FPS in camera feed | maximumHandCount too high, not dropping frames |
Before investigating code, run these diagnostics:
let request = VNGenerateForegroundInstanceMaskRequest() // Or hand/body pose
let handler = VNImageRequestHandler(cgImage: testImage)
do {
try handler.perform([request])
if let results = request.results {
print("✅ Request succeeded")
print("Result count: \(results.count)")
if let observation = results.first as? VNInstanceMaskObservation {
print("All instances: \(observation.allInstances)")
print("Instance count: \(observation.allInstances.count)")
}
} else {
print("⚠️ Request succeeded but no results")
}
} catch {
print("❌ Request failed: \(error)")
}
Expected output:
// For hand/body pose
if let observation = request.results?.first as? VNHumanHandPoseObservation {
let allPoints = try observation.recognizedPoints(.all)
for (key, point) in allPoints {
print("\(key): confidence \(point.confidence)")
if point.confidence < 0.3 {
print(" ⚠️ LOW CONFIDENCE - unreliable")
}
}
}
Expected output:
print("🧵 Thread: \(Thread.current)")
if Thread.isMainThread {
print("❌ Running on MAIN THREAD - will block UI!")
} else {
print("✅ Running on background thread")
}
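Expected output: "✅ Running on background thread". If the main thread is reported, move the request to a background queue such as DispatchQueue.global().
Vision not working as expected?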
│
├─ No results returned?
│ ├─ Check Step 1 output
│ │ ├─ "Request failed" → See Pattern 1a (API availability)
│ │ ├─ "No results" → See Pattern 1b (nothing detected)
│ │ └─ Results but count = 0 → See Pattern 1c (edge of frame)
│
├─ Landmarks have nil/low confidence?
│ ├─ Hand pose → See Pattern 2 (hand detection issues)
│ ├─ Body pose → See Pattern 3 (body detection issues)
│ └─ Face detection → See Pattern 4 (face detection issues)
│
├─ UI freezing/slow?
│ ├─ Check Step 3 (threading)
│ │ ├─ Main thread → See Pattern 5a (move to background)
│ │ └─ Background thread → See Pattern 5b (performance tuning)
│
├─ Overlays in wrong position?
│ └─ See Pattern 6 (coordinate conversion)
│
├─ Person segmentation missing people?
│ └─ See Pattern 7 (crowded scenes)
│
└─ VisionKit not working?
└─ See Pattern 8 (VisionKit specific)
Symptom: try handler.perform([request]) throws error
Common errors:
"VNGenerateForegroundInstanceMaskRequest is only available on iOS 17.0 or newer"
"VNDetectHumanBodyPose3DRequest is only available on iOS 17.0 or newer"
Root cause: Using iOS 17+ APIs on older deployment target
Fix:
if #available(iOS 17.0, *) {
let request = VNGenerateForegroundInstanceMaskRequest()
// ...
} else {
// Fallback for iOS 15-16 (VNGeneratePersonSegmentationRequest requires iOS 15+)
let request = VNGeneratePersonSegmentationRequest()
// ...
}
Prevention: Check API availability in vision-ref before implementing
Time to fix: 10 min
Symptom: request.results == nil or results.isEmpty
Diagnostic:
// 1. Save debug image to Photos
UIImageWriteToSavedPhotosAlbum(debugImage, nil, nil, nil)
// 2. Inspect visually
// - Is subject too small? (< 10% of image)
// - Is subject blurry?
// - Poor contrast with background?
Common causes:
Fix:
// Crop image to focus on region of interest
let croppedImage = cropImage(sourceImage, to: regionOfInterest)
let handler = VNImageRequestHandler(cgImage: croppedImage)
Time to fix: 30 min
Symptom: Subject detected intermittently as object moves across frame
Root cause: Partial occlusion when subject touches image edges
Diagnostic:
// Check if subject is near edges
if let observation = results.first as? VNInstanceMaskObservation {
let mask = try observation.generateScaledMaskForImage(
forInstances: observation.allInstances,
from: handler // The VNImageRequestHandler that performed the request
)
let bounds = calculateMaskBounds(mask)
if bounds.minX < 0.1 || bounds.maxX > 0.9 ||
bounds.minY < 0.1 || bounds.maxY > 0.9 {
print("⚠️ Subject too close to edge")
}
}
Fix:
// Add padding to capture area
let paddedRect = captureRect.insetBy(dx: -20, dy: -20)
// OR guide user with on-screen overlay
overlayView.addSubview(guideBox) // Visual boundary
Time to fix: 20 min
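The edge test in the diagnostic above can be pulled into a small pure function so it is easy to unit-test. This is a minimal sketch: `isNearImageEdge` and its `margin` parameter are hypothetical names, and it assumes you already have the subject's bounding box in normalized (0...1) coordinates, however you derive it from the mask.

```swift
import Foundation

/// Returns true when a normalized bounding box (0...1 on both axes)
/// comes within `margin` of any image edge.
/// Hypothetical helper; assumes a non-negative width and height.
func isNearImageEdge(_ bounds: CGRect, margin: CGFloat = 0.1) -> Bool {
    let minX = bounds.origin.x
    let minY = bounds.origin.y
    let maxX = bounds.origin.x + bounds.size.width
    let maxY = bounds.origin.y + bounds.size.height
    return minX < margin || minY < margin
        || maxX > 1 - margin || maxY > 1 - margin
}
```

A centered subject returns `false`; a subject touching the left edge returns `true`, which is the cue to show the guide overlay.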
Symptom: VNDetectHumanHandPoseRequest returns nil or low confidence landmarks
Diagnostic:
if let observation = request.results?.first as? VNHumanHandPoseObservation {
let thumbTip = try? observation.recognizedPoint(.thumbTip)
let wrist = try? observation.recognizedPoint(.wrist)
print("Thumb confidence: \(thumbTip?.confidence ?? 0)")
print("Wrist confidence: \(wrist?.confidence ?? 0)")
// Check hand orientation
if let thumb = thumbTip, let wristPoint = wrist {
let angle = atan2(
thumb.location.y - wristPoint.location.y,
thumb.location.x - wristPoint.location.x
)
let angleDegrees = angle * 180 / .pi // atan2 returns radians
print("Hand angle: \(angleDegrees) degrees")
if abs(angleDegrees) > 80 && abs(angleDegrees) < 100 {
print("⚠️ Hand parallel to camera (hard to detect)")
}
}
}
Common causes:
| Cause | Confidence Pattern | Fix |
|---|---|---|
| Hand near edge | Tips have low confidence | Adjust framing |
| Hand parallel to camera | All landmarks low | Prompt user to rotate hand |
| Gloves/occlusion | Fingers low, wrist high | Remove gloves or change lighting |
| Feet detected as hands | Unexpected hand detected | Add chirality check or ignore |
Fix for parallel hand:
// Detect and warn user
if avgConfidence < 0.4 {
showWarning("Rotate your hand toward the camera")
}
Time to fix: 45 min
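The `avgConfidence` value used in the fix above is not defined in the snippet. One possible way to compute it is sketched here; `averageConfidence` is a hypothetical helper name, and treating an observation with no readable points as confidence 0 is an assumption you may want to change.

```swift
import Foundation
#if canImport(Vision)
import Vision
#endif

/// Mean of a list of confidence values; an empty list counts as 0.
func averageConfidence(_ confidences: [Float]) -> Float {
    guard !confidences.isEmpty else { return 0 }
    return confidences.reduce(0, +) / Float(confidences.count)
}

#if canImport(Vision)
/// Convenience overload: averages the confidence of every hand landmark.
@available(iOS 14.0, macOS 11.0, tvOS 14.0, *)
func averageConfidence(of observation: VNHumanHandPoseObservation) -> Float {
    // recognizedPoints(.all) can throw; treat a failure as "no confidence".
    guard let points = try? observation.recognizedPoints(.all) else { return 0 }
    return averageConfidence(points.values.map { $0.confidence })
}
#endif
```

With this in place, `let avgConfidence = averageConfidence(of: observation)` would feed the `< 0.4` check shown above.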
Symptom: VNDetectHumanBodyPoseRequest skips frames or returns low confidence
Diagnostic:
if let observation = request.results?.first as? VNHumanBodyPoseObservation {
let nose = try? observation.recognizedPoint(.nose)
let root = try? observation.recognizedPoint(.root)
if let nosePoint = nose, let rootPoint = root {
let bodyAngle = atan2(
nosePoint.location.y - rootPoint.location.y,
nosePoint.location.x - rootPoint.location.x
)
let angleFromVertical = abs(bodyAngle - .pi / 2)
if angleFromVertical > .pi / 4 {
print("⚠️ Person bent over or upside down")
}
}
}
Common causes:
| Cause | Solution |
|---|---|
| Person bent over | Prompt user to stand upright |
| Upside down (handstand) | Use ARKit instead (better for dynamic poses) |
| Flowing clothing | Increase contrast or use tighter clothing |
| Multiple people overlapping | Use person instance segmentation |
Time to fix: 1 hour
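The body-angle check in the diagnostic above can be isolated from Vision for testing. The sketch below assumes Vision's normalized coordinates (origin at lower-left, y increasing upward), so an upright person has the nose above the root joint. `degreesFromVertical` is a hypothetical name, and it folds the result into 0...180 degrees so that upside-down poses report 180 rather than a larger raw difference.

```swift
import Foundation

/// Angle in degrees between the root-to-nose vector and "straight up".
/// 0 = upright, 90 = horizontal, 180 = upside down.
/// Assumes lower-left-origin coordinates (y grows upward).
func degreesFromVertical(nose: CGPoint, root: CGPoint) -> Double {
    let angle = atan2(Double(nose.y - root.y), Double(nose.x - root.x))
    var difference = abs(angle - Double.pi / 2)
    // Fold differences larger than 180 degrees back into 0...180.
    if difference > Double.pi {
        difference = 2 * Double.pi - difference
    }
    return difference * 180 / Double.pi
}
```

The 45-degree threshold used in the diagnostic then becomes `degreesFromVertical(nose: nosePoint.location, root: rootPoint.location) > 45`.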
Symptom: VNDetectFaceRectanglesRequest misses faces or returns wrong count
Diagnostic:
if let faces = request.results as? [VNFaceObservation] {
print("Detected \(faces.count) faces")
for face in faces {
print("Face bounds: \(face.boundingBox)")
print("Confidence: \(face.confidence)")
if face.boundingBox.width < 0.1 {
print("⚠️ Face too small")
}
}
}
Common causes:
Time to fix: 30 min
Symptom: App freezes when performing Vision request
Diagnostic (Step 3 above confirms main thread)
Fix:
// BEFORE (wrong)
let request = VNGenerateForegroundInstanceMaskRequest()
try handler.perform([request]) // Blocks UI
// AFTER (correct)
DispatchQueue.global(qos: .userInitiated).async {
let request = VNGenerateForegroundInstanceMaskRequest()
try? handler.perform([request])
DispatchQueue.main.async {
// Update UI
}
}
Time to fix: 15 min
Symptom: Already on background thread but still slow / dropping frames
Diagnostic:
let start = CFAbsoluteTimeGetCurrent()
try handler.perform([request])
let elapsed = CFAbsoluteTimeGetCurrent() - start
print("Request took \(elapsed * 1000)ms")
if elapsed > 0.2 { // 200ms = too slow for real-time
print("⚠️ Request too slow for real-time processing")
}
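If you time several requests, the measurement above can be wrapped in a reusable helper. This is only a sketch: `timed` is a hypothetical name, and it uses `ProcessInfo.processInfo.systemUptime` (a monotonic clock) instead of `CFAbsoluteTimeGetCurrent()`, which is a substitution rather than what the snippet above does.

```swift
import Foundation

/// Runs `work`, prints how long it took, and returns the result together
/// with the elapsed time in milliseconds. Errors from `work` are rethrown.
func timed<T>(_ label: String,
              _ work: () throws -> T) rethrows -> (value: T, milliseconds: Double) {
    let start = ProcessInfo.processInfo.systemUptime
    let value = try work()
    let milliseconds = (ProcessInfo.processInfo.systemUptime - start) * 1000
    print("\(label) took \(milliseconds) ms")
    return (value, milliseconds)
}
```

For a Vision request this could be called as `try timed("hand pose") { try handler.perform([request]) }`, then compared against the 200 ms budget.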
Common causes & fixes:
| Cause | Fix | Time Saved |
|---|---|---|
| maximumHandCount = 10 | Set to actual need (e.g., 2) | 50-70% |
| Processing every frame | Skip frames (process every 3rd) | 66% |
| Full-res images | Downscale to 1280x720 | 40-60% |
| Multiple requests per frame | Batch or alternate requests | 30-50% |
Fix for real-time camera:
// Skip frames
frameCount += 1
guard frameCount % 3 == 0 else { return }
// OR downscale
let scaledImage = resizeImage(sourceImage, to: CGSize(width: 1280, height: 720))
// OR set lower hand count
request.maximumHandCount = 2 // Only as many hands as you need (2 is also Vision's default)
Time to fix: 1 hour
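The frame-skipping fix above keeps a bare counter on the caller. A small value type keeps that state in one place; the sketch below is an illustration, and `FrameThrottle` and `shouldProcess()` are hypothetical names. Unlike the snippet above, it processes the first frame it sees and then every Nth frame after it, and it wraps the counter so it never overflows.

```swift
import Foundation

/// Lets through one frame out of every `stride` frames.
struct FrameThrottle {
    let stride: Int
    private var count = 0

    init(stride: Int) {
        // A stride below 1 would make no sense; clamp it to "process everything".
        self.stride = max(1, stride)
    }

    /// Call once per incoming frame; returns true when the frame should be processed.
    mutating func shouldProcess() -> Bool {
        let process = (count == 0)
        count = (count + 1) % stride
        return process
    }
}
```

In a capture callback this would look like `guard throttle.shouldProcess() else { return }`, with `throttle` stored as a `var` property on the object that receives frames.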
Symptom: UI overlays appear in wrong position
Diagnostic:
// Vision point (lower-left origin, normalized)
let visionPoint = recognizedPoint.location
print("Vision point: \(visionPoint)") // e.g., (0.5, 0.8)
// Convert to UIKit
let uiX = visionPoint.x * imageWidth
let uiY = (1 - visionPoint.y) * imageHeight // FLIP Y
print("UIKit point: (\(uiX), \(uiY))")
// Verify overlay
overlayView.center = CGPoint(x: uiX, y: uiY)
Common mistakes:
// ❌ WRONG (no Y flip)
let uiPoint = CGPoint(
x: visionPoint.x * width,
y: visionPoint.y * height
)
// ❌ WRONG (forgot to scale from normalized)
let uiPoint = CGPoint(
x: visionPoint.x,
y: 1 - visionPoint.y
)
// ✅ CORRECT
let uiPoint = CGPoint(
x: visionPoint.x * width,
y: (1 - visionPoint.y) * height
)
Time to fix: 20 min
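The correct conversion above can be captured once as a pair of helpers, which avoids repeating the Y flip at every call site. This is a minimal sketch with hypothetical function names. It assumes the overlay covers the image exactly (no aspect-fit letterboxing or cropping) and that rectangles have a non-negative size.

```swift
import Foundation

/// Converts a normalized Vision point (lower-left origin, 0...1)
/// to a top-left-origin point inside an area of `size`.
func topLeftPoint(fromVision point: CGPoint, in size: CGSize) -> CGPoint {
    CGPoint(x: point.x * size.width,
            y: (1 - point.y) * size.height) // flip Y, then scale
}

/// Same conversion for a normalized bounding box. The box's top edge in
/// Vision space (origin.y + height) becomes its origin.y in UIKit space.
func topLeftRect(fromVision rect: CGRect, in size: CGSize) -> CGRect {
    let visionTop = rect.origin.y + rect.size.height
    return CGRect(x: rect.origin.x * size.width,
                  y: (1 - visionTop) * size.height,
                  width: rect.size.width * size.width,
                  height: rect.size.height * size.height)
}
```

For example, the Vision point (0.5, 0.75) in a 100 x 200 area maps to (50, 50).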
Symptom: VNGeneratePersonInstanceMaskRequest misses people or combines them
Diagnostic:
// Count faces
let faceRequest = VNDetectFaceRectanglesRequest()
try handler.perform([faceRequest])
let faceCount = faceRequest.results?.count ?? 0
print("Detected \(faceCount) faces")
// Person instance segmentation
let personRequest = VNGeneratePersonInstanceMaskRequest()
try handler.perform([personRequest])
let personCount = (personRequest.results?.first as? VNInstanceMaskObservation)?.allInstances.count ?? 0
print("Detected \(personCount) people")
if faceCount > 4 && personCount <= 4 {
print("⚠️ Crowded scene - some people combined or missing")
}
Fix:
if faceCount > 4 {
// Fallback: Use single mask for all people
let singleMaskRequest = VNGeneratePersonSegmentationRequest()
try handler.perform([singleMaskRequest])
// OR guide user
showWarning("Please reduce number of people in frame (max 4)")
}
Time to fix: 30 min
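The fallback decision above can be expressed as a small pure function, which keeps the "more than 4 people" rule in one place. This is a sketch: the enum and function names are hypothetical, and using the face count as a proxy for the number of people is the same heuristic as the diagnostic, not a guarantee.

```swift
import Foundation

/// Which segmentation request to run for a scene.
enum PersonSegmentationStrategy: Equatable {
    case perPersonInstances // VNGeneratePersonInstanceMaskRequest
    case singleCombinedMask // VNGeneratePersonSegmentationRequest
}

/// Picks a strategy from the detected face count.
/// `instanceLimit` defaults to 4, the limit described in this guide.
func segmentationStrategy(faceCount: Int,
                          instanceLimit: Int = 4) -> PersonSegmentationStrategy {
    faceCount > instanceLimit ? .singleCombinedMask : .perPersonInstances
}
```

A scene with exactly 4 faces still uses per-person instances; only the fifth face triggers the combined mask.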
Symptom: ImageAnalysisInteraction not showing subject lifting UI
Diagnostic:
// 1. Check interaction types
print("Interaction types: \(interaction.preferredInteractionTypes)")
// 2. Check if analysis is set
print("Analysis: \(interaction.analysis != nil ? "set" : "nil")")
// 3. Check if view supports interaction
if let view = interaction.view {
print("View: \(view)")
} else {
print("❌ View not set")
}
Common causes:
| Symptom | Cause | Fix |
|---|---|---|
| No UI appears | analysis not set | Call analyzer.analyze() and set result |
| UI appears but no subject lifting | Wrong interaction type | Set .imageSubject or .automatic |
| Crash on interaction | View removed before interaction | Keep view in memory |
Fix:
// Ensure analysis is set
let analyzer = ImageAnalyzer()
let analysis = try await analyzer.analyze(image, configuration: config)
interaction.analysis = analysis // Required!
interaction.preferredInteractionTypes = .imageSubject
Time to fix: 20 min
Situation: App Store review rejected for "app freezes when tapping analyze button"
Triage (5 min):
Fix (15 min):
@IBAction func analyzeTapped(_ sender: UIButton) {
showLoadingIndicator()
DispatchQueue.global(qos: .userInitiated).async { [weak self] in
let request = VNGenerateForegroundInstanceMaskRequest()
// ... perform request
DispatchQueue.main.async {
self?.hideLoadingIndicator()
self?.updateUI(with: results)
}
}
}
Communicate to PM: "App Store rejection due to Vision processing on main thread. Fixed by moving to background queue (industry standard). Testing on iPhone 12 confirms fix. Safe to resubmit."
| Symptom | Likely Cause | First Check | Pattern | Est. Time |
|---|---|---|---|---|
| No results | Nothing detected | Step 1 output | 1b/1c | 30 min |
| Intermittent detection | Edge of frame | Subject position | 1c | 20 min |
| Hand missing landmarks | Low confidence | Step 2 (confidence) | 2 | 45 min |
| Body pose skipped | Person bent over | Body angle | 3 | 1 hour |
| UI freezes | Main thread | Step 3 (threading) | 5a | 15 min |
| Slow processing | Performance tuning | Request timing | 5b | 1 hour |
| Wrong overlay position | Coordinates | Print points | 6 | 20 min |
| Missing people (>4) | Crowded scene | Face count | 7 | 30 min |
| VisionKit no UI | Analysis not set | Interaction state | 8 | 20 min |
Related Axiom skills:
vision: Decision trees and implementation patterns
vision-ref: Complete API reference
WWDC debugging sessions:
Apple documentation: