axiom-vision-diag by charleswiltgen/axiom
npx skills add https://github.com/charleswiltgen/axiom --skill axiom-vision-diag

Systematic troubleshooting for Vision framework issues: subjects not detected, missing landmarks, low confidence, performance problems, coordinate mismatches, text recognition failures, barcode detection issues, and document scanning problems.
Core Principle: When Vision doesn't work, the problem is usually the environment or the input, not your code.
Always check environment and confidence BEFORE debugging code.
Symptoms that indicate Vision-specific issues:
| Symptom | Likely Cause |
|---|---|
| Subject not detected at all | Edge of frame, poor lighting, very small subject |
| Hand landmarks intermittently nil | Hand near edge, parallel to camera, glove/occlusion |
| Body pose skips frames | Person bent over, upside down, flowing clothing |
| UI freezes during processing | Running Vision on main thread |
| Overlays in wrong position | Coordinate conversion (lower-left vs. top-left origin) |
| Crash on older devices | Using iOS 17+ APIs without @available check |
| Person segmentation misses people | >4 people in scene (instance mask limit) |
| Low FPS in camera feed | maximumHandCount too high, not dropping frames |
| Text not recognized at all | Blurry image, stylized font, wrong recognition level |
| Text misread (wrong characters) | Language correction disabled, missing custom words |
| Barcode not detected | Wrong symbology, code too small, glare/reflection |
| DataScanner shows blank screen | Camera access denied, device not supported |
| Document edges not detected | Low contrast, non-rectangular, glare |
| Real-time scanning too slow | Processing every frame, region too large |
Before investigating code, run these diagnostics:
Step 1: Basic request check
let request = VNGenerateForegroundInstanceMaskRequest() // Or hand/body pose
let handler = VNImageRequestHandler(cgImage: testImage)
do {
    try handler.perform([request])
    if let results = request.results {
        print("✅ Request succeeded")
        print("Result count: \(results.count)")
        if let observation = results.first as? VNInstanceMaskObservation {
            print("All instances: \(observation.allInstances)")
            print("Instance count: \(observation.allInstances.count)")
        }
    } else {
        print("⚠️ Request succeeded but no results")
    }
} catch {
    print("❌ Request failed: \(error)")
}
Expected output:
Step 2: Landmark confidence
// For hand/body pose
if let observation = request.results?.first as? VNHumanHandPoseObservation {
    let allPoints = try observation.recognizedPoints(.all)
    for (key, point) in allPoints {
        print("\(key): confidence \(point.confidence)")
        if point.confidence < 0.3 {
            print("  ⚠️ LOW CONFIDENCE - unreliable")
        }
    }
}
Expected output:
Step 3: Threading
print("🧵 Thread: \(Thread.current)")
if Thread.isMainThread {
    print("❌ Running on MAIN THREAD - will block UI!")
} else {
    print("✅ Running on background thread")
}
Expected output:
DispatchQueue.global()

Vision not working as expected?
│
├─ No results returned?
│  ├─ Check Step 1 output
│  │  ├─ "Request failed" → See Pattern 1a (API availability)
│  │  ├─ "No results" → See Pattern 1b (nothing detected)
│  │  └─ Results but count = 0 → See Pattern 1c (edge of frame)
│
├─ Landmarks have nil/low confidence?
│  ├─ Hand pose → See Pattern 2 (hand detection issues)
│  ├─ Body pose → See Pattern 3 (body detection issues)
│  └─ Face detection → See Pattern 4 (face detection issues)
│
├─ UI freezing/slow?
│  ├─ Check Step 3 (threading)
│  │  ├─ Main thread → See Pattern 5a (move to background)
│  │  └─ Background thread → See Pattern 5b (performance tuning)
│
├─ Overlays in wrong position?
│  └─ See Pattern 6 (coordinate conversion)
│
├─ Person segmentation missing people?
│  └─ See Pattern 7 (crowded scenes)
│
├─ VisionKit not working?
│  └─ See Pattern 8 (VisionKit specific)
│
├─ Text recognition issues?
│  ├─ No text detected → See Pattern 9a (image quality)
│  ├─ Wrong characters → See Pattern 9b (language/correction)
│  └─ Too slow → See Pattern 9c (recognition level)
│
├─ Barcode detection issues?
│  ├─ Barcode not detected → See Pattern 10a (symbology/size)
│  └─ Wrong payload → See Pattern 10b (barcode quality)
│
├─ DataScannerViewController issues?
│  ├─ Blank screen → See Pattern 11a (availability check)
│  └─ Items not detected → See Pattern 11b (data types)
│
└─ Document scanning issues?
   ├─ Edges not detected → See Pattern 12a (contrast/shape)
   └─ Perspective wrong → See Pattern 12b (corner points)
Pattern 1a: API availability
Symptom: try handler.perform([request]) throws an error
Common errors:
"VNGenerateForegroundInstanceMaskRequest is only available on iOS 17.0 or newer"
"VNDetectHumanBodyPose3DRequest is only available on iOS 17.0 or newer"
Root cause: Using iOS 17+ APIs on an older deployment target
Fix:
if #available(iOS 17.0, *) {
    let request = VNGenerateForegroundInstanceMaskRequest()
    // ...
} else {
    // Fallback for iOS 14-16
    let request = VNGeneratePersonSegmentationRequest()
    // ...
}
Prevention: Check API availability in axiom-vision-ref before implementing
Time to fix: 10 min
Pattern 1b: Nothing detected
Symptom: request.results == nil or results.isEmpty
Diagnostic:
// 1. Save a debug image to Photos
UIImageWriteToSavedPhotosAlbum(debugImage, nil, nil, nil)
// 2. Inspect visually
// - Is the subject too small? (< 10% of the image)
// - Is the subject blurry?
// - Poor contrast with the background?
Common causes:
Fix:
// Crop the image to focus on the region of interest
let croppedImage = cropImage(sourceImage, to: regionOfInterest)
let handler = VNImageRequestHandler(cgImage: croppedImage)
Time to fix: 30 min
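Manual cropping works; Vision requests also inherit a built-in alternative from VNImageBasedRequest. A sketch, assuming a `sourceImage` CGImage; the rect values are illustrative:

```swift
import Vision

// Focus the request on the center of the frame instead of cropping pixels.
// regionOfInterest is normalized (0-1) with a lower-left origin; this rect
// covers the middle 60% of the image.
let request = VNGenerateForegroundInstanceMaskRequest()
request.regionOfInterest = CGRect(x: 0.2, y: 0.2, width: 0.6, height: 0.6)

let handler = VNImageRequestHandler(cgImage: sourceImage)
try handler.perform([request])
// Observation coordinates come back relative to the region of interest,
// so remap them before drawing overlays on the full image.
```

The pixel-cropping route avoids that remapping step, which is why the fix above uses it.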
Pattern 1c: Edge of frame
Symptom: Subject detected intermittently as it moves across the frame
Root cause: Partial occlusion when the subject touches the image edges
Diagnostic:
// Check whether the subject is near the edges
if let observation = results.first as? VNInstanceMaskObservation {
    let mask = try observation.createScaledMask(
        for: observation.allInstances,
        croppedToInstancesContent: true
    )
    let bounds = calculateMaskBounds(mask)
    if bounds.minX < 0.1 || bounds.maxX > 0.9 ||
       bounds.minY < 0.1 || bounds.maxY > 0.9 {
        print("⚠️ Subject too close to edge")
    }
}
Fix:
// Add padding to the capture area
let paddedRect = captureRect.insetBy(dx: -20, dy: -20)
// OR guide the user with an on-screen overlay
overlayView.addSubview(guideBox) // Visual boundary
Time to fix: 20 min
Pattern 2: Hand detection issues
Symptom: VNDetectHumanHandPoseRequest returns nil or low-confidence landmarks
Diagnostic:
if let observation = request.results?.first as? VNHumanHandPoseObservation {
    let thumbTip = try? observation.recognizedPoint(.thumbTip)
    let wrist = try? observation.recognizedPoint(.wrist)
    print("Thumb confidence: \(thumbTip?.confidence ?? 0)")
    print("Wrist confidence: \(wrist?.confidence ?? 0)")
    // Check hand orientation
    if let thumb = thumbTip, let wristPoint = wrist {
        let angle = atan2(
            thumb.location.y - wristPoint.location.y,
            thumb.location.x - wristPoint.location.x
        )
        let degrees = angle * 180 / .pi
        print("Hand angle: \(degrees) degrees")
        if abs(degrees) > 80 && abs(degrees) < 100 {
            print("⚠️ Hand parallel to camera (hard to detect)")
        }
    }
}
Common causes:
| Cause | Confidence Pattern | Fix |
|---|---|---|
| Hand near edge | Tips have low confidence | Adjust framing |
| Hand parallel to camera | All landmarks low | Prompt user to rotate hand |
| Gloves/occlusion | Fingers low, wrist high | Remove gloves or change lighting |
| Feet detected as hands | Unexpected hand detected | Add chirality check or ignore |
Fix for parallel hand:
// Detect and warn the user
if avgConfidence < 0.4 {
    showWarning("Rotate your hand toward the camera")
}
Time to fix: 45 min
Pattern 3: Body detection issues
Symptom: VNDetectHumanBodyPoseRequest skips frames or returns low confidence
Diagnostic:
if let observation = request.results?.first as? VNHumanBodyPoseObservation {
    let nose = try? observation.recognizedPoint(.nose)
    let root = try? observation.recognizedPoint(.root)
    if let nosePoint = nose, let rootPoint = root {
        let bodyAngle = atan2(
            nosePoint.location.y - rootPoint.location.y,
            nosePoint.location.x - rootPoint.location.x
        )
        let angleFromVertical = abs(bodyAngle - .pi / 2)
        if angleFromVertical > .pi / 4 {
            print("⚠️ Person bent over or upside down")
        }
    }
}
Common causes:
| Cause | Solution |
|---|---|
| Person bent over | Prompt the user to stand upright |
| Upside down (handstand) | Use ARKit instead (better for dynamic poses) |
| Flowing clothing | Increase contrast or use tighter clothing |
| Multiple people overlapping | Use person instance segmentation |
Time to fix: 1 hour
Pattern 4: Face detection issues
Symptom: VNDetectFaceRectanglesRequest misses faces or returns the wrong count
Diagnostic:
if let faces = request.results as? [VNFaceObservation] {
    print("Detected \(faces.count) faces")
    for face in faces {
        print("Face bounds: \(face.boundingBox)")
        print("Confidence: \(face.confidence)")
        if face.boundingBox.width < 0.1 {
            print("⚠️ Face too small")
        }
    }
}
Common causes:
Time to fix: 30 min
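Two levers that often help with missed faces are the request revision and tighter framing. A hedged sketch, not the skill's prescribed fix; revision 3 requires iOS 15+, and `croppedImage` is an assumed pre-cropped input:

```swift
import Vision

// Newer revisions handle rotated and partially occluded faces better.
let request = VNDetectFaceRectanglesRequest()
if #available(iOS 15.0, *) {
    request.revision = VNDetectFaceRectanglesRequestRevision3
}
// Faces under ~10% of the frame (the diagnostic's threshold above) are
// unreliable; cropping toward the subject before detection often recovers them.
let handler = VNImageRequestHandler(cgImage: croppedImage)
try handler.perform([request])
```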
Pattern 5a: Move to background
Symptom: App freezes when performing a Vision request
Diagnostic: Step 3 above confirms main thread
Fix:
// BEFORE (wrong)
let request = VNGenerateForegroundInstanceMaskRequest()
try handler.perform([request]) // Blocks UI
// AFTER (correct)
DispatchQueue.global(qos: .userInitiated).async {
    let request = VNGenerateForegroundInstanceMaskRequest()
    try? handler.perform([request])
    DispatchQueue.main.async {
        // Update UI
    }
}
Time to fix: 15 min
Pattern 5b: Performance tuning
Symptom: Already on a background thread but still slow / dropping frames
Diagnostic:
let start = CFAbsoluteTimeGetCurrent()
try handler.perform([request])
let elapsed = CFAbsoluteTimeGetCurrent() - start
print("Request took \(elapsed * 1000)ms")
if elapsed > 0.2 { // 200ms = too slow for real-time
    print("⚠️ Request too slow for real-time processing")
}
Common causes & fixes:
| Cause | Fix | Time Saved |
|---|---|---|
| maximumHandCount = 10 | Set to actual need (e.g., 2) | 50-70% |
| Processing every frame | Skip frames (process every 3rd) | 66% |
| Full-res images | Downscale to 1280x720 | 40-60% |
| Multiple requests per frame | Batch or alternate requests | 30-50% |
Fix for a real-time camera:
// Skip frames
frameCount += 1
guard frameCount % 3 == 0 else { return }
// OR downscale
let scaledImage = resizeImage(sourceImage, to: CGSize(width: 1280, height: 720))
// OR set a lower hand count
request.maximumHandCount = 2 // Instead of the default
Time to fix: 1 hour
Pattern 6: Coordinate conversion
Symptom: UI overlays appear in the wrong position
Diagnostic:
// Vision point (lower-left origin, normalized)
let visionPoint = recognizedPoint.location
print("Vision point: \(visionPoint)") // e.g., (0.5, 0.8)
// Convert to UIKit
let uiX = visionPoint.x * imageWidth
let uiY = (1 - visionPoint.y) * imageHeight // FLIP Y
print("UIKit point: (\(uiX), \(uiY))")
// Verify the overlay
overlayView.center = CGPoint(x: uiX, y: uiY)
Common mistakes:
// ❌ WRONG (no Y flip)
let uiPoint = CGPoint(
    x: visionPoint.x * width,
    y: visionPoint.y * height
)
// ❌ WRONG (forgot to scale from normalized)
let uiPoint = CGPoint(
    x: visionPoint.x,
    y: 1 - visionPoint.y
)
// ✅ CORRECT
let uiPoint = CGPoint(
    x: visionPoint.x * width,
    y: (1 - visionPoint.y) * height
)
Time to fix: 20 min
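The conversion above can be wrapped in a reusable helper; Vision ships VNImagePointForNormalizedPoint for the scaling half. A sketch (the helper name is ours):

```swift
import Vision
import UIKit

// Convert from Vision's normalized, lower-left-origin space to UIKit's
// pixel, top-left-origin space.
func uiKitPoint(fromVision point: CGPoint, imageSize: CGSize) -> CGPoint {
    // VNImagePointForNormalizedPoint scales 0-1 coordinates to pixels
    // but keeps the lower-left origin, so flip Y afterwards.
    let scaled = VNImagePointForNormalizedPoint(point,
                                                Int(imageSize.width),
                                                Int(imageSize.height))
    return CGPoint(x: scaled.x, y: imageSize.height - scaled.y)
}
```

Centralizing the flip in one helper keeps the "forgot to flip Y" mistake from recurring per call site.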
Pattern 7: Crowded scenes
Symptom: VNGeneratePersonInstanceMaskRequest misses people or combines them
Diagnostic:
// Count faces
let faceRequest = VNDetectFaceRectanglesRequest()
try handler.perform([faceRequest])
let faceCount = faceRequest.results?.count ?? 0
print("Detected \(faceCount) faces")
// Person instance segmentation
let personRequest = VNGeneratePersonInstanceMaskRequest()
try handler.perform([personRequest])
let personCount = (personRequest.results?.first as? VNInstanceMaskObservation)?.allInstances.count ?? 0
print("Detected \(personCount) people")
if faceCount > 4 && personCount <= 4 {
    print("⚠️ Crowded scene - some people combined or missing")
}
Fix:
if faceCount > 4 {
    // Fallback: use a single mask for all people
    let singleMaskRequest = VNGeneratePersonSegmentationRequest()
    try handler.perform([singleMaskRequest])
    // OR guide the user
    showWarning("Please reduce the number of people in frame (max 4)")
}
Time to fix: 30 min
Pattern 8: VisionKit specific
Symptom: ImageAnalysisInteraction not showing the subject-lifting UI
Diagnostic:
// 1. Check interaction types
print("Interaction types: \(interaction.preferredInteractionTypes)")
// 2. Check whether analysis is set
print("Analysis: \(interaction.analysis != nil ? "set" : "nil")")
// 3. Check whether a view backs the interaction
if let view = interaction.view {
    print("View: \(view)")
} else {
    print("❌ View not set")
}
Common causes:
| Symptom | Cause | Fix |
|---|---|---|
| No UI appears | analysis not set | Call analyzer.analyze() and set the result |
| UI appears but no subject lifting | Wrong interaction type | Set .imageSubject or .automatic |
| Crash on interaction | View removed before interaction | Keep the view in memory |
Fix:
// Ensure analysis is set
let analyzer = ImageAnalyzer()
let analysis = try await analyzer.analyze(image, configuration: config)
interaction.analysis = analysis // Required!
interaction.preferredInteractionTypes = .imageSubject
Time to fix: 20 min
Pattern 9a: Image quality
Symptom: VNRecognizeTextRequest returns no results or empty strings
Diagnostic:
let request = VNRecognizeTextRequest()
request.recognitionLevel = .accurate
try handler.perform([request])
if request.results?.isEmpty ?? true {
    print("❌ No text detected")
    // Check image quality
    print("Image size: \(image.size)")
    print("Minimum text height: \(request.minimumTextHeight)")
}
for obs in request.results as? [VNRecognizedTextObservation] ?? [] {
    let top = obs.topCandidates(3)
    for candidate in top {
        print("'\(candidate.string)' confidence: \(candidate.confidence)")
    }
}
Common causes:
| Cause | Symptom | Fix |
|---|---|---|
| Blurry image | No results | Improve lighting, stabilize camera |
| Text too small | No results | Lower minimumTextHeight or crop closer |
| Stylized font | Misread or no results | Try the .accurate recognition level |
| Low contrast | Partial results | Improve lighting, increase image contrast |
| Rotated text | No results with .fast | Use .accurate (handles rotation) |
Fix for small text:
// Lower the minimum text height (the default ignores very small text)
request.minimumTextHeight = 0.02 // 2% of image height
Time to fix: 30 min
Pattern 9b: Language/correction
Symptom: Text is detected but characters are wrong (e.g., "C001" → "COOL")
Diagnostic:
// Check all candidates, not just the first
for observation in results {
    let candidates = observation.topCandidates(5)
    for (i, candidate) in candidates.enumerated() {
        print("Candidate \(i): '\(candidate.string)' (\(candidate.confidence))")
    }
}
Common causes:
| Input Type | Problem | Fix |
|---|---|---|
| Serial numbers | Language correction "fixes" them | Disable usesLanguageCorrection |
| Technical codes | Misread as words | Add to customWords |
| Non-English | Wrong ML model | Set correct recognitionLanguages |
| House numbers | Stylized → misread | Check all candidates, not just the top |
Fix for codes/serial numbers:
let request = VNRecognizeTextRequest()
request.usesLanguageCorrection = false // Don't "fix" codes
// Post-process with domain knowledge
func correctSerialNumber(_ text: String) -> String {
    text.replacingOccurrences(of: "O", with: "0")
        .replacingOccurrences(of: "l", with: "1")
        .replacingOccurrences(of: "S", with: "5")
}
Time to fix: 30 min
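For prose that mixes in domain vocabulary (rather than pure codes), the opposite approach applies: keep correction on and extend its lexicon via customWords. A sketch; the word list is illustrative:

```swift
import Vision

// customWords is only consulted while language correction is enabled,
// so leave correction on when the text is mostly natural language.
let request = VNRecognizeTextRequest()
request.usesLanguageCorrection = true
request.customWords = ["AXIOM", "VisionKit", "SKU-1042"]
request.recognitionLanguages = ["en-US"]
```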
Pattern 9c: Recognition level
Symptom: Text recognition takes >500ms, real-time camera drops frames
Diagnostic:
let start = CFAbsoluteTimeGetCurrent()
try handler.perform([request])
let elapsed = CFAbsoluteTimeGetCurrent() - start
print("Recognition took \(elapsed * 1000)ms")
print("Recognition level: \(request.recognitionLevel == .fast ? "fast" : "accurate")")
print("Language correction: \(request.usesLanguageCorrection)")
Common causes & fixes:
| Cause | Fix | Speedup |
|---|---|---|
| Using .accurate for real-time | Switch to .fast | 3-5x |
| Language correction enabled | Disable for codes | 20-30% |
| Full image processing | Use regionOfInterest | 2-4x |
| Processing every frame | Skip frames | 50-70% |
Fix for real-time:
request.recognitionLevel = .fast
request.usesLanguageCorrection = false
request.regionOfInterest = CGRect(x: 0.1, y: 0.3, width: 0.8, height: 0.4)
// Skip frames
frameCount += 1
guard frameCount % 3 == 0 else { return }
Time to fix: 30 min
Pattern 10a: Symbology/size
Symptom: VNDetectBarcodesRequest returns no results
Diagnostic:
let request = VNDetectBarcodesRequest()
// Don't specify symbologies, so all types are detected
try handler.perform([request])
if let results = request.results as? [VNBarcodeObservation] {
    print("Found \(results.count) barcodes")
    for barcode in results {
        print("Type: \(barcode.symbology)")
        print("Payload: \(barcode.payloadStringValue ?? "nil")")
        print("Bounds: \(barcode.boundingBox)")
    }
} else {
    print("❌ No barcodes detected")
}
Common causes:
| Cause | Symptom | Fix |
|---|---|---|
| Wrong symbology | Not detected | Don't filter, or add the correct type |
| Barcode too small | Not detected | Move the camera closer, crop the image |
| Glare/reflection | Not detected | Change angle, improve lighting |
| Damaged barcode | Partial/no detection | Clean the barcode, improve the image |
| Using revision 1 | Only one code | Use revision 2+ for multiple codes |
Fix for small barcodes:
// Crop to the barcode region for better detection
let croppedHandler = VNImageRequestHandler(
    cgImage: croppedImage,
    options: [:]
)
Time to fix: 20 min
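Once the diagnostic dump identifies the symbology, the request can be narrowed and upgraded. A sketch; the symbology list is illustrative, and revision 2 requires iOS 15+:

```swift
import Vision

// Filtering to known symbologies is faster than detecting everything,
// and revision 2+ can return multiple codes per frame.
let request = VNDetectBarcodesRequest()
request.symbologies = [.qr, .ean13]
if #available(iOS 15.0, *) {
    request.revision = VNDetectBarcodesRequestRevision2
}
```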
Pattern 10b: Barcode quality
Symptom: Barcode detected but payloadStringValue is wrong or nil
Diagnostic:
if let barcode = results.first {
    print("String payload: \(barcode.payloadStringValue ?? "nil")")
    print("Raw payload: \(barcode.payloadData ?? Data())")
    print("Symbology: \(barcode.symbology)")
    print("Confidence: implicit (always 1.0 for barcodes)")
}
Common causes:
| Cause | Fix |
|---|---|
| Binary barcode (not a string) | Use payloadData instead |
| Damaged code | Re-scan or clean the barcode |
| Wrong symbology assumed | Check the actual symbology value |
Time to fix: 15 min
Pattern 11a: Availability check
Symptom: DataScannerViewController shows black/blank when presented
Diagnostic:
// Check support first
print("isSupported: \(DataScannerViewController.isSupported)")
print("isAvailable: \(DataScannerViewController.isAvailable)")
// Check camera permission
let status = AVCaptureDevice.authorizationStatus(for: .video)
print("Camera access: \(status.rawValue)")
Common causes:
| Symptom | Cause | Fix |
|---|---|---|
| isSupported = false | Device lacks camera/chip | Check before presenting |
| isAvailable = false | Parental controls or access denied | Request camera permission |
| Black screen | Camera in use by another app | Ensure exclusive access |
| Crash when presenting | Missing permission | Add a camera usage description |
Fix:
guard DataScannerViewController.isSupported else {
    showError("Scanning isn't supported on this device")
    return
}
guard DataScannerViewController.isAvailable else {
    // Request camera access
    AVCaptureDevice.requestAccess(for: .video) { granted in
        // Retry once access is granted
    }
    return
}
Time to fix: 15 min
Pattern 11b: Data types
Symptom: DataScanner shows the camera but doesn't recognize items
Diagnostic:
// Check the recognized data types
print("Data types: \(scanner.recognizedDataTypes)")
// Add a delegate to see what's happening
func dataScanner(_ scanner: DataScannerViewController,
                 didAdd items: [RecognizedItem],
                 allItems: [RecognizedItem]) {
    print("Added \(items.count) items, total: \(allItems.count)")
    for item in items {
        switch item {
        case .text(let text): print("Text: \(text.transcript)")
        case .barcode(let barcode): print("Barcode: \(barcode.payloadStringValue ?? "")")
        @unknown default: break
        }
    }
}
Common causes:
| Cause | Fix |
|---|---|
| Wrong data types | Add the correct .barcode(symbologies:) or .text() |
| Text content type filter | Remove the filter or use the correct type |
| Camera too close/far | Adjust distance |
| Poor lighting | Improve lighting |
Time to fix: 20 min
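Putting the table's advice together, a scanner configured with explicit data types might look like this. A sketch assuming iOS 16+ and a presenting view controller that adopts DataScannerViewControllerDelegate; the parameter values are illustrative:

```swift
import VisionKit

// QR codes plus arbitrary text; recognizedDataTypes is the first thing
// to check when nothing is detected.
let scanner = DataScannerViewController(
    recognizedDataTypes: [
        .barcode(symbologies: [.qr]),
        .text()
    ],
    qualityLevel: .balanced,
    recognizesMultipleItems: true,
    isHighlightingEnabled: true
)
scanner.delegate = self
present(scanner, animated: true) {
    try? scanner.startScanning() // Throws if scanning is unavailable
}
```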
Pattern 12a: Contrast/shape
Symptom: VNDetectDocumentSegmentationRequest returns no results
Diagnostic:
let request = VNDetectDocumentSegmentationRequest()
try handler.perform([request])
if let observation = request.results?.first {
    print("Document at: \(observation.boundingBox)")
    print("Corners: TL=\(observation.topLeft), TR=\(observation.topRight)")
} else {
    print("❌ No document detected")
}
Common causes:
| Cause | Fix |
|---|---|
| Low contrast | Use a contrasting background |
| Non-rectangular | The ML model expects rectangular documents |
| Glare/reflection | Change the lighting angle |
| Document fills the frame | Some background needs to be visible |
Fix: Use VNDocumentCameraViewController for a guided UX with live feedback.
Time to fix: 15 min
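A minimal sketch of the VNDocumentCameraViewController route recommended above; the class name ScanViewController is ours:

```swift
import UIKit
import VisionKit

final class ScanViewController: UIViewController, VNDocumentCameraViewControllerDelegate {
    func startScan() {
        let camera = VNDocumentCameraViewController()
        camera.delegate = self
        present(camera, animated: true)
    }

    func documentCameraViewController(_ controller: VNDocumentCameraViewController,
                                      didFinishWith scan: VNDocumentCameraScan) {
        // One UIImage per captured page, already perspective-corrected.
        let pages = (0..<scan.pageCount).map { scan.imageOfPage(at: $0) }
        controller.dismiss(animated: true)
        _ = pages // Hand off to your pipeline here
    }
}
```

Because the system UI guides framing and handles perspective correction, most of Pattern 12a's failure modes disappear.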
Pattern 12b: Corner points
Symptom: Document is extracted but distorted
Diagnostic:
// Verify corner order
print("Top left: \(observation.topLeft)")
print("Top right: \(observation.topRight)")
print("Bottom left: \(observation.bottomLeft)")
print("Bottom right: \(observation.bottomRight)")
// Check the corners are where you expect
// topLeft.y should be GREATER than bottomLeft.y (Vision uses a lower-left origin)
Common causes:
| Cause | Fix |
|---|---|
| Wrong corner order | Vision orders corners counterclockwise from top-left |
| Coordinate system | Convert normalized coordinates to pixels |
| Wrong filter parameters | Check the CIPerspectiveCorrection inputs |
Fix:
// Scale normalized coordinates to image coordinates
func scaled(_ point: CGPoint, to size: CGSize) -> CGPoint {
    CGPoint(x: point.x * size.width, y: point.y * size.height)
}
Time to fix: 20 min
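The scaled corners feed directly into Core Image's perspective correction. A sketch assuming `observation` comes from the document request, `ciImage` is the source CIImage, and `scaled(_:to:)` is the helper from the fix above:

```swift
import CoreImage

// CIPerspectiveCorrection takes pixel-space corners (lower-left origin,
// matching Core Image), so the scaled Vision corners map in directly.
let size = ciImage.extent.size
let filter = CIFilter(name: "CIPerspectiveCorrection")!
filter.setValue(ciImage, forKey: kCIInputImageKey)
filter.setValue(CIVector(cgPoint: scaled(observation.topLeft, to: size)), forKey: "inputTopLeft")
filter.setValue(CIVector(cgPoint: scaled(observation.topRight, to: size)), forKey: "inputTopRight")
filter.setValue(CIVector(cgPoint: scaled(observation.bottomLeft, to: size)), forKey: "inputBottomLeft")
filter.setValue(CIVector(cgPoint: scaled(observation.bottomRight, to: size)), forKey: "inputBottomRight")
let corrected = filter.outputImage // Deskewed document
```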
Situation: App Store rejection for "app freezes when tapping the analyze button"
Triage (5 min):
Fix (15 min):
@IBAction func analyzeTapped(_ sender: UIButton) {
    showLoadingIndicator()
    DispatchQueue.global(qos: .userInitiated).async { [weak self] in
        let request = VNGenerateForegroundInstanceMaskRequest()
        // ... perform request
        DispatchQueue.main.async {
            self?.hideLoadingIndicator()
            self?.updateUI(with: results)
        }
    }
}
PM communication: "App Store rejected us because Vision was processing on the main thread. Fixed by moving it to a background queue (industry standard). Verified the fix on an iPhone 12. Safe to resubmit."
Quick reference:
| Symptom | Likely Cause | Check First | Pattern | Est. Time |
|---|---|---|---|---|
| No results | Nothing detected | Step 1 output | 1b/1c | 30 min |
| Intermittent detection | Edge of frame | Subject position | 1c | 20 min |
| Hand landmarks missing | Low confidence | Step 2 (confidence) | 2 | 45 min |
| Body pose skips frames | Person bent over | Body angle | 3 | 1 hour |
| UI freezes | Main thread | Step 3 (threading) | 5a | 15 min |
| Slow processing | Performance tuning | Request timing | 5b | 1 hour |
| Overlays misplaced | Coordinates | Print the points | 6 | 20 min |
| Missing people (>4) | Crowded scene | Face count | 7 | 30 min |
| VisionKit no UI | Analysis not set | Interaction state | 8 | 20 min |
| Text not detected | Image quality | Result count | 9a | 30 min |
| Wrong characters | Language settings | Candidate list | 9b | 30 min |
| Text recognition slow | Recognition level | Timing | 9c | 30 min |
| Barcode not detected | Symbology/size | Result dump | 10a | 20 min |
| Wrong barcode payload | Damaged/binary | Payload data | 10b | 15 min |
| DataScanner blank | Availability | isSupported/isAvailable | 11a | 15 min |
| DataScanner no items | Data types | recognizedDataTypes | 11b | 20 min |
| Document edges not detected | Contrast/shape | Result check | 12a | 15 min |
| Perspective wrong | Corner order | Corner positions | 12b | 20 min |
WWDC: 2019-234, 2021-10041, 2022-10024, 2022-10025, 2025-272, 2023-10176, 2020-10653
Docs: /vision, /vision/vnrecognizetextrequest, /visionkit
Skills: axiom-vision, axiom-vision-ref
Systematic troubleshooting for Vision framework issues: subjects not detected, missing landmarks, low confidence, performance problems, coordinate mismatches, text recognition failures, barcode detection issues, and document scanning problems.
Core Principle : When Vision doesn't work, the problem is usually:
Always check environment and confidence BEFORE debugging code.
Symptoms that indicate Vision-specific issues:
| Symptom | Likely Cause |
|---|---|
| Subject not detected at all | Edge of frame, poor lighting, very small subject |
| Hand landmarks intermittently nil | Hand near edge, parallel to camera, glove/occlusion |
| Body pose skipped frames | Person bent over, upside down, flowing clothing |
| UI freezes during processing | Running Vision on main thread |
| Overlays in wrong position | Coordinate conversion (lower-left vs top-left) |
| Crash on older devices | Using iOS 17+ APIs without @available check |
| Person segmentation misses people | >4 people in scene (instance mask limit) |
| Low FPS in camera feed | maximumHandCount too high, not dropping frames |
| Text not recognized at all | Blurry image, stylized font, wrong recognition level |
| Text misread (wrong characters) | Language correction disabled, missing custom words |
| Barcode not detected | Wrong symbology, code too small, glare/reflection |
| DataScanner shows blank screen | Camera access denied, device not supported |
| Document edges not detected | Low contrast, non-rectangular, glare |
| Real-time scanning too slow | Processing every frame, region too large |
Before investigating code, run these diagnostics:
let request = VNGenerateForegroundInstanceMaskRequest() // Or hand/body pose
let handler = VNImageRequestHandler(cgImage: testImage)
do {
try handler.perform([request])
if let results = request.results {
print("✅ Request succeeded")
print("Result count: \(results.count)")
if let observation = results.first as? VNInstanceMaskObservation {
print("All instances: \(observation.allInstances)")
print("Instance count: \(observation.allInstances.count)")
}
} else {
print("⚠️ Request succeeded but no results")
}
} catch {
print("❌ Request failed: \(error)")
}
Expected output :
// For hand/body pose
if let observation = request.results?.first as? VNHumanHandPoseObservation {
let allPoints = try observation.recognizedPoints(.all)
for (key, point) in allPoints {
print("\(key): confidence \(point.confidence)")
if point.confidence < 0.3 {
print(" ⚠️ LOW CONFIDENCE - unreliable")
}
}
}
Expected output :
print("🧵 Thread: \(Thread.current)")
if Thread.isMainThread {
print("❌ Running on MAIN THREAD - will block UI!")
} else {
print("✅ Running on background thread")
}
Expected output :
DispatchQueue.global()Vision not working as expected?
│
├─ No results returned?
│ ├─ Check Step 1 output
│ │ ├─ "Request failed" → See Pattern 1a (API availability)
│ │ ├─ "No results" → See Pattern 1b (nothing detected)
│ │ └─ Results but count = 0 → See Pattern 1c (edge of frame)
│
├─ Landmarks have nil/low confidence?
│ ├─ Hand pose → See Pattern 2 (hand detection issues)
│ ├─ Body pose → See Pattern 3 (body detection issues)
│ └─ Face detection → See Pattern 4 (face detection issues)
│
├─ UI freezing/slow?
│ ├─ Check Step 3 (threading)
│ │ ├─ Main thread → See Pattern 5a (move to background)
│ │ └─ Background thread → See Pattern 5b (performance tuning)
│
├─ Overlays in wrong position?
│ └─ See Pattern 6 (coordinate conversion)
│
├─ Person segmentation missing people?
│ └─ See Pattern 7 (crowded scenes)
│
├─ VisionKit not working?
│ └─ See Pattern 8 (VisionKit specific)
│
├─ Text recognition issues?
│ ├─ No text detected → See Pattern 9a (image quality)
│ ├─ Wrong characters → See Pattern 9b (language/correction)
│ └─ Too slow → See Pattern 9c (recognition level)
│
├─ Barcode detection issues?
│ ├─ Barcode not detected → See Pattern 10a (symbology/size)
│ └─ Wrong payload → See Pattern 10b (barcode quality)
│
├─ DataScannerViewController issues?
│ ├─ Blank screen → See Pattern 11a (availability check)
│ └─ Items not detected → See Pattern 11b (data types)
│
└─ Document scanning issues?
├─ Edges not detected → See Pattern 12a (contrast/shape)
└─ Perspective wrong → See Pattern 12b (corner points)
Symptom : try handler.perform([request]) throws error
Common errors :
"VNGenerateForegroundInstanceMaskRequest is only available on iOS 17.0 or newer"
"VNDetectHumanBodyPose3DRequest is only available on iOS 17.0 or newer"
Root cause : Using iOS 17+ APIs on older deployment target
Fix :
if #available(iOS 17.0, *) {
let request = VNGenerateForegroundInstanceMaskRequest()
// ...
} else {
// Fallback for iOS 14-16
let request = VNGeneratePersonSegmentationRequest()
// ...
}
Prevention : Check API availability in axiom-vision-ref before implementing
Time to fix : 10 min
Symptom : request.results == nil or results.isEmpty
Diagnostic :
// 1. Save debug image to Photos
UIImageWriteToSavedPhotosAlbum(debugImage, nil, nil, nil)
// 2. Inspect visually
// - Is subject too small? (< 10% of image)
// - Is subject blurry?
// - Poor contrast with background?
Common causes :
Fix :
// Crop image to focus on region of interest
let croppedImage = cropImage(sourceImage, to: regionOfInterest)
let handler = VNImageRequestHandler(cgImage: croppedImage)
Time to fix : 30 min
Symptom : Subject detected intermittently as object moves across frame
Root cause : Partial occlusion when subject touches image edges
Diagnostic :
// Check if subject is near edges
if let observation = results.first as? VNInstanceMaskObservation {
let mask = try observation.createScaledMask(
for: observation.allInstances,
croppedToInstancesContent: true
)
let bounds = calculateMaskBounds(mask)
if bounds.minX < 0.1 || bounds.maxX > 0.9 ||
bounds.minY < 0.1 || bounds.maxY > 0.9 {
print("⚠️ Subject too close to edge")
}
}
Fix :
// Add padding to capture area
let paddedRect = captureRect.insetBy(dx: -20, dy: -20)
// OR guide user with on-screen overlay
overlayView.addSubview(guideBox) // Visual boundary
Time to fix : 20 min
Symptom : VNDetectHumanHandPoseRequest returns nil or low confidence landmarks
Diagnostic :
if let observation = request.results?.first as? VNHumanHandPoseObservation {
let thumbTip = try? observation.recognizedPoint(.thumbTip)
let wrist = try? observation.recognizedPoint(.wrist)
print("Thumb confidence: \(thumbTip?.confidence ?? 0)")
print("Wrist confidence: \(wrist?.confidence ?? 0)")
// Check hand orientation
if let thumb = thumbTip, let wristPoint = wrist {
let angle = atan2(
thumb.location.y - wristPoint.location.y,
thumb.location.x - wristPoint.location.x
)
print("Hand angle: \(angle * 180 / .pi) degrees")
if abs(angle) > 80 && abs(angle) < 100 {
print("⚠️ Hand parallel to camera (hard to detect)")
}
}
}
Common causes :
| Cause | Confidence Pattern | Fix |
|---|---|---|
| Hand near edge | Tips have low confidence | Adjust framing |
| Hand parallel to camera | All landmarks low | Prompt user to rotate hand |
| Gloves/occlusion | Fingers low, wrist high | Remove gloves or change lighting |
| Feet detected as hands | Unexpected hand detected | Add chirality check or ignore |
Fix for parallel hand :
// Detect and warn user
if avgConfidence < 0.4 {
showWarning("Rotate your hand toward the camera")
}
Time to fix : 45 min
Symptom : VNDetectHumanBodyPoseRequest skips frames or returns low confidence
Diagnostic :
if let observation = request.results?.first as? VNHumanBodyPoseObservation {
let nose = try? observation.recognizedPoint(.nose)
let root = try? observation.recognizedPoint(.root)
if let nosePoint = nose, let rootPoint = root {
let bodyAngle = atan2(
nosePoint.location.y - rootPoint.location.y,
nosePoint.location.x - rootPoint.location.x
)
let angleFromVertical = abs(bodyAngle - .pi / 2)
if angleFromVertical > .pi / 4 {
print("⚠️ Person bent over or upside down")
}
}
}
Common causes :
| Cause | Solution |
|---|---|
| Person bent over | Prompt user to stand upright |
| Upside down (handstand) | Use ARKit instead (better for dynamic poses) |
| Flowing clothing | Increase contrast or use tighter clothing |
| Multiple people overlapping | Use person instance segmentation |
Time to fix : 1 hour
Symptom : VNDetectFaceRectanglesRequest misses faces or returns wrong count
Diagnostic :
if let faces = request.results as? [VNFaceObservation] {
print("Detected \(faces.count) faces")
for face in faces {
print("Face bounds: \(face.boundingBox)")
print("Confidence: \(face.confidence)")
if face.boundingBox.width < 0.1 {
print("⚠️ Face too small")
}
}
}
Common causes :
Time to fix : 30 min
Symptom : App freezes when performing Vision request
Diagnostic (Step 3 above confirms main thread)
Fix :
// BEFORE (wrong)
let request = VNGenerateForegroundInstanceMaskRequest()
try handler.perform([request]) // Blocks UI
// AFTER (correct)
DispatchQueue.global(qos: .userInitiated).async {
let request = VNGenerateForegroundInstanceMaskRequest()
try? handler.perform([request])
DispatchQueue.main.async {
// Update UI
}
}
Time to fix : 15 min
Symptom : Already on background thread but still slow / dropping frames
Diagnostic :
let start = CFAbsoluteTimeGetCurrent()
try handler.perform([request])
let elapsed = CFAbsoluteTimeGetCurrent() - start
print("Request took \(elapsed * 1000)ms")
if elapsed > 0.2 { // 200ms = too slow for real-time
print("⚠️ Request too slow for real-time processing")
}
Common causes & fixes:
| Cause | Fix | Time Saved |
|---|---|---|
maximumHandCount = 10 | Set to actual need (e.g., 2) | 50-70% |
| Processing every frame | Skip frames (process every 3rd) | 66% |
| Full-res images | Downscale to 1280x720 | 40-60% |
| Multiple requests per frame | Batch or alternate requests | 30-50% |
Fix for real-time camera :
// Skip frames
frameCount += 1
guard frameCount % 3 == 0 else { return }
// OR downscale
let scaledImage = resizeImage(sourceImage, to: CGSize(width: 1280, height: 720))
// OR set lower hand count
request.maximumHandCount = 2 // Instead of default
Time to fix : 1 hour
Symptom : UI overlays appear in wrong position
Diagnostic :
// Vision point (lower-left origin, normalized)
let visionPoint = recognizedPoint.location
print("Vision point: \(visionPoint)") // e.g., (0.5, 0.8)
// Convert to UIKit
let uiX = visionPoint.x * imageWidth
let uiY = (1 - visionPoint.y) * imageHeight // FLIP Y
print("UIKit point: (\(uiX), \(uiY))")
// Verify overlay
overlayView.center = CGPoint(x: uiX, y: uiY)
Common mistakes :
// ❌ WRONG (no Y flip)
let uiPoint = CGPoint(
x: axiom-visionPoint.x * width,
y: axiom-visionPoint.y * height
)
// ❌ WRONG (forgot to scale from normalized)
let uiPoint = CGPoint(
x: axiom-visionPoint.x,
y: 1 - visionPoint.y
)
// ✅ CORRECT
let uiPoint = CGPoint(
x: axiom-visionPoint.x * width,
y: (1 - visionPoint.y) * height
)
Time to fix : 20 min
Symptom : VNGeneratePersonInstanceMaskRequest misses people or combines them
Diagnostic :
// Count faces
let faceRequest = VNDetectFaceRectanglesRequest()
try handler.perform([faceRequest])
let faceCount = faceRequest.results?.count ?? 0
print("Detected \(faceCount) faces")
// Person instance segmentation
let personRequest = VNGeneratePersonInstanceMaskRequest()
try handler.perform([personRequest])
let personCount = (personRequest.results?.first as? VNInstanceMaskObservation)?.allInstances.count ?? 0
print("Detected \(personCount) people")
if faceCount > 4 && personCount <= 4 {
print("⚠️ Crowded scene - some people combined or missing")
}
Fix :
if faceCount > 4 {
// Fallback: Use single mask for all people
let singleMaskRequest = VNGeneratePersonSegmentationRequest()
try handler.perform([singleMaskRequest])
// OR guide user
showWarning("Please reduce number of people in frame (max 4)")
}
Time to fix : 30 min
Symptom : ImageAnalysisInteraction not showing subject lifting UI
Diagnostic :
// 1. Check interaction types
print("Interaction types: \(interaction.preferredInteractionTypes)")
// 2. Check if analysis is set
print("Analysis: \(interaction.analysis != nil ? "set" : "nil")")
// 3. Check if view supports interaction
if let view = interaction.view {
print("View: \(view)")
} else {
print("❌ View not set")
}
Common causes :
| Symptom | Cause | Fix |
|---|---|---|
| No UI appears | analysis not set | Call analyzer.analyze() and set result |
| UI appears but no subject lifting | Wrong interaction type | Set .imageSubject or .automatic |
| Crash on interaction | View removed before interaction | Keep view in memory |
Fix :
// Ensure analysis is set
let analyzer = ImageAnalyzer()
let analysis = try await analyzer.analyze(image, configuration: config)
interaction.analysis = analysis // Required!
interaction.preferredInteractionTypes = .imageSubject
Time to fix : 20 min
Symptom : VNRecognizeTextRequest returns no results or empty strings
Diagnostic :
let request = VNRecognizeTextRequest()
request.recognitionLevel = .accurate
try handler.perform([request])
if request.results?.isEmpty ?? true {
print("❌ No text detected")
// Check image quality
print("Image size: \(image.size)")
print("Minimum text height: \(request.minimumTextHeight)")
}
for obs in request.results as? [VNRecognizedTextObservation] ?? [] {
let top = obs.topCandidates(3)
for candidate in top {
print("'\(candidate.string)' confidence: \(candidate.confidence)")
}
}
Common causes :
| Cause | Symptom | Fix |
|---|---|---|
| Blurry image | No results | Improve lighting, stabilize camera |
| Text too small | No results | Lower minimumTextHeight or crop closer |
| Stylized font | Misread or no results | Try .accurate recognition level |
| Low contrast | Partial results | Improve lighting, increase image contrast |
| Rotated text | No results with .fast | Use .accurate (handles rotation) |
Fix for small text :
// Lower minimum text height (default ignores very small text)
request.minimumTextHeight = 0.02 // 2% of image height
Time to fix : 30 min
Symptom : Text is detected but characters are wrong (e.g., "C001" → "COOL")
Diagnostic :
// Check all candidates, not just first
for observation in results {
let candidates = observation.topCandidates(5)
for (i, candidate) in candidates.enumerated() {
print("Candidate \(i): '\(candidate.string)' (\(candidate.confidence))")
}
}
Common causes :
| Input Type | Problem | Fix |
|---|---|---|
| Serial numbers | Language correction "fixes" them | Disable usesLanguageCorrection |
| Technical codes | Misread as words | Add to customWords |
| Non-English | Wrong ML model | Set correct recognitionLanguages |
| House numbers | Stylized → misread | Check all candidates, not just top |
Fix for codes/serial numbers :
let request = VNRecognizeTextRequest()
request.usesLanguageCorrection = false // Don't "fix" codes
// Post-process with domain knowledge
func correctSerialNumber(_ text: String) -> String {
text.replacingOccurrences(of: "O", with: "0")
.replacingOccurrences(of: "l", with: "1")
.replacingOccurrences(of: "S", with: "5")
}
Time to fix : 30 min
Symptom : Text recognition takes >500ms, real-time camera drops frames
Diagnostic :
let start = CFAbsoluteTimeGetCurrent()
try handler.perform([request])
let elapsed = CFAbsoluteTimeGetCurrent() - start
print("Recognition took \(elapsed * 1000)ms")
print("Recognition level: \(request.recognitionLevel == .fast ? "fast" : "accurate")")
print("Language correction: \(request.usesLanguageCorrection)")
Common causes & fixes:
| Cause | Fix | Speedup |
|---|---|---|
Using .accurate for real-time | Switch to .fast | 3-5x |
| Language correction enabled | Disable for codes | 20-30% |
| Full image processing | Use regionOfInterest | 2-4x |
| Processing every frame | Skip frames | 50-70% |
Fix for real-time :
request.recognitionLevel = .fast
request.usesLanguageCorrection = false
request.regionOfInterest = CGRect(x: 0.1, y: 0.3, width: 0.8, height: 0.4)
// Skip frames
frameCount += 1
guard frameCount % 3 == 0 else { return }
Time to fix : 30 min
Symptom : VNDetectBarcodesRequest returns no results
Diagnostic :
let request = VNDetectBarcodesRequest()
// Don't specify symbologies to detect all types
try handler.perform([request])
if let results = request.results as? [VNBarcodeObservation] {
print("Found \(results.count) barcodes")
for barcode in results {
print("Type: \(barcode.symbology)")
print("Payload: \(barcode.payloadStringValue ?? "nil")")
print("Bounds: \(barcode.boundingBox)")
}
} else {
print("❌ No barcodes detected")
}
Common causes:
| Cause | Symptom | Fix |
|---|---|---|
| Wrong symbology | Not detected | Don't filter, or add correct type |
| Barcode too small | Not detected | Move camera closer, crop image |
| Glare/reflection | Not detected | Change angle, improve lighting |
| Damaged barcode | Partial/no detection | Clean barcode, improve image |
| Using revision 1 | Only one code | Use revision 2+ for multiple |
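If you do filter, make sure the expected types are actually in the list, and use a newer request revision when you need several codes per frame. A sketch (revision 2 requires iOS 15, to the best of our knowledge):

```swift
let request = VNDetectBarcodesRequest()
request.symbologies = [.qr, .ean13, .code128]  // only the types you expect; leave unset to scan all
if #available(iOS 15.0, *) {
    request.revision = VNDetectBarcodesRequestRevision2  // can return multiple codes per image
}
```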
Fix for small barcodes:
// Crop to barcode region for better detection
let croppedHandler = VNImageRequestHandler(
cgImage: croppedImage,
options: [:]
)
Time to fix: 20 min
Symptom: Barcode detected but payloadStringValue is wrong or nil
Diagnostic:
if let barcode = results.first {
print("String payload: \(barcode.payloadStringValue ?? "nil")")
print("Raw payload: \(barcode.payloadData ?? Data())") // payloadData requires iOS 17+
print("Symbology: \(barcode.symbology)")
print("Confidence: Implicit (always 1.0 for barcodes)")
}
Common causes:
| Cause | Fix |
|---|---|
| Binary barcode (not string) | Use payloadData instead |
| Damaged code | Re-scan or clean barcode |
| Wrong symbology assumed | Check actual symbology value |
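A sketch of handling both string and binary payloads (`use` and `decode` are hypothetical handlers; `payloadData` requires iOS 17):

```swift
if let text = barcode.payloadStringValue {
    use(text)                                 // normal string payload
} else if #available(iOS 17.0, *),
          let data = barcode.payloadData {
    decode(data)                              // raw bytes for binary encodings
} else {
    // no recoverable payload: likely a damaged code — re-scan
}
```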
Time to fix: 15 min
Symptom: DataScannerViewController shows a black/blank screen when presented
Diagnostic:
// Check support first
print("isSupported: \(DataScannerViewController.isSupported)")
print("isAvailable: \(DataScannerViewController.isAvailable)")
// Check camera permission
let status = AVCaptureDevice.authorizationStatus(for: .video)
print("Camera access: \(status.rawValue)")
Common causes:
| Symptom | Cause | Fix |
|---|---|---|
| isSupported = false | Device lacks camera/chip | Check before presenting |
| isAvailable = false | Parental controls or access denied | Request camera permission |
| Black screen | Camera in use by another app | Ensure exclusive access |
| Crash on present | Missing entitlements | Add camera usage description |
Fix:
guard DataScannerViewController.isSupported else {
showError("Scanning not supported on this device")
return
}
guard DataScannerViewController.isAvailable else {
// Request camera access
AVCaptureDevice.requestAccess(for: .video) { granted in
// Retry after access granted
}
return
}
Time to fix: 15 min
Symptom: DataScanner shows the camera feed but doesn't recognize items
Diagnostic:
// Check recognized data types
print("Data types: \(scanner.recognizedDataTypes)")
// Add delegate to see what's happening
func dataScanner(_ scanner: DataScannerViewController,
didAdd items: [RecognizedItem],
allItems: [RecognizedItem]) {
print("Added \(items.count) items, total: \(allItems.count)")
for item in items {
switch item {
case .text(let text): print("Text: \(text.transcript)")
case .barcode(let barcode): print("Barcode: \(barcode.payloadStringValue ?? "")")
@unknown default: break
}
}
}
Common causes:
| Cause | Fix |
|---|---|
| Wrong data types | Add correct .barcode(symbologies:) or .text() |
| Text content type filter | Remove filter or use correct type |
| Camera too close/far | Adjust distance |
| Poor lighting | Improve lighting |
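A sketch of a scanner configured for both barcodes and unfiltered text (note that scanning must be started explicitly after presentation):

```swift
import VisionKit

let scanner = DataScannerViewController(
    recognizedDataTypes: [
        .barcode(symbologies: [.qr, .ean13]),  // the symbologies you actually expect
        .text()                                // no content-type filter
    ],
    qualityLevel: .balanced,
    recognizesMultipleItems: true
)
scanner.delegate = self
present(scanner, animated: true) {
    try? scanner.startScanning()               // does not start automatically
}
```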
Time to fix: 20 min
Symptom: VNDetectDocumentSegmentationRequest returns no results
Diagnostic:
let request = VNDetectDocumentSegmentationRequest()
try handler.perform([request])
if let observation = request.results?.first {
print("Document found at: \(observation.boundingBox)")
print("Corners: TL=\(observation.topLeft), TR=\(observation.topRight)")
} else {
print("❌ No document detected")
}
Common causes:
| Cause | Fix |
|---|---|
| Low contrast | Use contrasting background |
| Non-rectangular | ML expects rectangular documents |
| Glare/reflection | Change lighting angle |
| Document fills frame | Need some background visible |
Fix: Use VNDocumentCameraViewController for a guided capture experience with live edge feedback.
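A sketch of the guided flow; the delegate receives a VNDocumentCameraScan whose pages are already perspective-corrected:

```swift
import VisionKit

// Presentation, from a UIViewController that adopts the delegate protocol
let camera = VNDocumentCameraViewController()
camera.delegate = self   // VNDocumentCameraViewControllerDelegate
present(camera, animated: true)

// Delegate callback
func documentCameraViewController(_ controller: VNDocumentCameraViewController,
                                  didFinishWith scan: VNDocumentCameraScan) {
    for index in 0..<scan.pageCount {
        let page = scan.imageOfPage(at: index)   // UIImage, already cropped and corrected
        // ... OCR or store the page
    }
    controller.dismiss(animated: true)
}
```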
Time to fix: 15 min
Symptom: Document extracted, but the result is distorted
Diagnostic:
// Verify corner order
print("TopLeft: \(observation.topLeft)")
print("TopRight: \(observation.topRight)")
print("BottomLeft: \(observation.bottomLeft)")
print("BottomRight: \(observation.bottomRight)")
// Check if corners are in expected positions
// TopLeft should have larger Y than BottomLeft (Vision uses lower-left origin)
Common causes:
| Cause | Fix |
|---|---|
| Corner order wrong | Vision uses counterclockwise from top-left |
| Coordinate system | Convert normalized to pixel coordinates |
| Filter parameters wrong | Check CIPerspectiveCorrection parameters |
Fix:
// Scale normalized to image coordinates
func scaled(_ point: CGPoint, to size: CGSize) -> CGPoint {
CGPoint(x: point.x * size.width, y: point.y * size.height)
}
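The scaled corners can then be fed to Core Image's CIPerspectiveCorrection filter. Conveniently, Core Image shares Vision's lower-left origin, so no vertical flip is needed (`ciImage` is the input image):

```swift
import CoreImage

let size = ciImage.extent.size
let corrected = ciImage.applyingFilter("CIPerspectiveCorrection", parameters: [
    "inputTopLeft": CIVector(cgPoint: scaled(observation.topLeft, to: size)),
    "inputTopRight": CIVector(cgPoint: scaled(observation.topRight, to: size)),
    "inputBottomLeft": CIVector(cgPoint: scaled(observation.bottomLeft, to: size)),
    "inputBottomRight": CIVector(cgPoint: scaled(observation.bottomRight, to: size))
])
```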
Time to fix: 20 min
Situation: App Store review rejected for "app freezes when tapping analyze button"
Triage (5 min):
Fix (15 min):
@IBAction func analyzeTapped(_ sender: UIButton) {
showLoadingIndicator()
DispatchQueue.global(qos: .userInitiated).async { [weak self] in
let request = VNGenerateForegroundInstanceMaskRequest()
// ... create a VNImageRequestHandler and perform the request here
DispatchQueue.main.async {
self?.hideLoadingIndicator()
self?.updateUI(with: request.results)
}
}
}
Communicate to PM: "App Store rejection due to Vision processing on main thread. Fixed by moving to background queue (industry standard). Testing on iPhone 12 confirms fix. Safe to resubmit."
| Symptom | Likely Cause | First Check | Pattern | Est. Time |
|---|---|---|---|---|
| No results | Nothing detected | Step 1 output | 1b/1c | 30 min |
| Intermittent detection | Edge of frame | Subject position | 1c | 20 min |
| Hand missing landmarks | Low confidence | Step 2 (confidence) | 2 | 45 min |
| Body pose skipped | Person bent over | Body angle | 3 | 1 hour |
| UI freezes | Main thread | Step 3 (threading) | 5a | 15 min |
| Slow processing | Performance tuning | Request timing | 5b | 1 hour |
| Wrong overlay position | Coordinates | Print points | 6 | 20 min |
| Missing people (>4) | Crowded scene | Face count | 7 | 30 min |
| VisionKit no UI | Analysis not set | Interaction state | 8 | 20 min |
| Text not detected | Image quality | Results count | 9a | 30 min |
| Wrong characters | Language settings | Candidates list | 9b | 30 min |
| Text recognition slow | Recognition level | Timing | 9c | 30 min |
| Barcode not detected | Symbology/size | Results dump | 10a | 20 min |
| Wrong barcode payload | Damaged/binary | Payload data | 10b | 15 min |
| DataScanner blank | Availability | isSupported/isAvailable | 11a | 15 min |
| DataScanner no items | Data types | recognizedDataTypes | 11b | 20 min |
| Document edges missing | Contrast/shape | Results check | 12a | 15 min |
| Perspective wrong | Corner order | Corner positions | 12b | 20 min |
WWDC: 2019-234, 2021-10041, 2022-10024, 2022-10025, 2025-272, 2023-10176, 2020-10653
Docs: /vision, /vision/vnrecognizetextrequest, /visionkit
Skills: axiom-vision, axiom-vision-ref