vision-framework by dpearson2699/swift-ios-skills
npx skills add https://github.com/dpearson2699/swift-ios-skills --skill vision-framework
Detect text, faces, barcodes, objects, and body poses in images and video using on-device computer vision. Patterns target iOS 26+ with Swift 6.2, backward-compatible where noted.
See references/vision-requests.md for complete code patterns and references/visionkit-scanner.md for DataScannerViewController integration.
Vision has two distinct API layers. Prefer the modern API for new code.
| Aspect | Modern (iOS 18+) | Legacy |
|---|---|---|
| Pattern | let result = try await request.perform(on: image) | VNImageRequestHandler + completion handler |
| Request types | Swift types — structs and classes (RecognizeTextRequest, DetectFaceRectanglesRequest) | ObjC classes (VNRecognizeTextRequest, VNDetectFaceRectanglesRequest) |
| Concurrency | Native async/await | Completion handlers or synchronous perform |
| Observations | Typed return values | Cast results from [Any] |
| Availability | iOS 18+ / macOS 15+ | iOS 11+ |
The modern API uses the ImageProcessingRequest protocol. Each request type has a perform(on:orientation:) method that accepts CGImage, CIImage, CVPixelBuffer, CMSampleBuffer, Data, or URL. Most requests are structs; stateful requests for video tracking (e.g., TrackObjectRequest, TrackRectangleRequest, DetectTrajectoriesRequest) are final classes.
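Because every input type routes through the same perform(on:orientation:) overloads, one request value can analyze a file and a live frame alike. A minimal sketch (fileURL and pixelBuffer are assumed to be in scope):
import Vision
let faceRequest = DetectFaceRectanglesRequest()
let facesFromFile = try await faceRequest.perform(on: fileURL)       // URL overload
let facesFromFrame = try await faceRequest.perform(on: pixelBuffer)  // CVPixelBuffer overload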
All modern Vision requests follow the same pattern: create a request struct, call perform(on:), and handle the typed result.
import Vision
func recognizeText(in image: CGImage) async throws -> [String] {
var request = RecognizeTextRequest()
request.recognitionLevel = .accurate
request.recognitionLanguages = [Locale.Language(identifier: "en-US")]
let observations = try await request.perform(on: image)
return observations.compactMap { observation in
observation.topCandidates(1).first?.string
}
}
Use VNImageRequestHandler with completion-based requests when targeting deployment versions below iOS 18.
import Vision
func recognizeTextLegacy(in image: CGImage) throws -> [String] {
var recognized: [String] = []
let request = VNRecognizeTextRequest { request, error in
guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
recognized = observations.compactMap { $0.topCandidates(1).first?.string }
}
request.recognitionLevel = .accurate
let handler = VNImageRequestHandler(cgImage: image)
try handler.perform([request])
return recognized
}
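Configure recognition languages, language correction, and custom vocabulary on the request. Modern API: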
var request = RecognizeTextRequest()
request.recognitionLevel = .accurate // .fast for real-time
request.recognitionLanguages = [
Locale.Language(identifier: "en-US"),
Locale.Language(identifier: "fr-FR"),
]
request.usesLanguageCorrection = true
request.customWords = ["SwiftUI", "Xcode"] // domain-specific terms
let observations = try await request.perform(on: cgImage)
for observation in observations {
guard let candidate = observation.topCandidates(1).first else { continue }
let text = candidate.string
let confidence = candidate.confidence // 0.0 ... 1.0
let bounds = observation.boundingBox // normalized coordinates
}
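The legacy equivalent configures the same options with string language identifiers: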
let request = VNRecognizeTextRequest()
request.recognitionLevel = .accurate
request.recognitionLanguages = ["en-US", "fr-FR"]
request.usesLanguageCorrection = true
Key differences: Modern API uses Locale.Language for languages; legacy uses string identifiers. Both support .accurate (best quality) and .fast (real-time suitable) recognition levels.
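Per the pitfalls below, filter results by candidate confidence. A sketch over the modern observations above, using an arbitrary 0.5 cutoff:
let confidentLines = observations.compactMap { observation -> String? in
    guard let candidate = observation.topCandidates(1).first,
          candidate.confidence > 0.5 else { return nil }  // drop low-confidence lines
    return candidate.string
}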
Detect face rectangles, landmarks (eyes, nose, mouth), and capture quality.
// Modern API
let faceRequest = DetectFaceRectanglesRequest()
let faces = try await faceRequest.perform(on: cgImage)
for face in faces {
let boundingBox = face.boundingBox // normalized CGRect
let roll = face.roll // Measurement<UnitAngle>
let yaw = face.yaw // Measurement<UnitAngle>
}
// Landmarks (eyes, nose, mouth contours)
var landmarkRequest = DetectFaceLandmarksRequest()
let landmarkFaces = try await landmarkRequest.perform(on: cgImage)
for face in landmarkFaces {
let landmarks = face.landmarks
let leftEye = landmarks?.leftEye?.normalizedPoints
let nose = landmarks?.nose?.normalizedPoints
}
Vision uses a normalized coordinate system with origin at the bottom-left. Convert to UIKit (top-left origin) before display:
func convertToUIKit(_ rect: CGRect, imageHeight: CGFloat) -> CGRect {
CGRect(
x: rect.origin.x,
y: imageHeight - rect.origin.y - rect.height,
width: rect.width,
height: rect.height
)
}
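Vision also provides VNImageRectForNormalizedRect(_:_:_:) (see the pitfalls below) for the scale-up step; it maps a normalized rect into pixel coordinates but keeps the bottom-left origin, so the flip above is still needed for UIKit. A sketch, assuming a 1920x1080 source image and a legacy-style CGRect bounding box:
// Scale normalized (0...1) coordinates to pixel coordinates (still bottom-left origin)
let pixelRect = VNImageRectForNormalizedRect(observation.boundingBox, 1920, 1080)
// Then flip to UIKit's top-left origin with the helper above
let uikitRect = convertToUIKit(pixelRect, imageHeight: 1080)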
Detect 1D and 2D barcodes including QR codes.
var request = DetectBarcodesRequest()
request.symbologies = [.qr, .ean13, .code128, .pdf417]
let barcodes = try await request.perform(on: cgImage)
for barcode in barcodes {
let payload = barcode.payloadString // decoded content
let symbology = barcode.symbology // .qr, .ean13, etc.
let bounds = barcode.boundingBox // normalized rect
}
Common symbologies: .qr, .aztec, .pdf417, .dataMatrix, .ean8, .ean13, .code39, .code128, .upce, .itf14.
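When only QR codes matter, restrict the request to that single symbology (see the pitfalls below). A minimal sketch that treats each payload as a URL:
var request = DetectBarcodesRequest()
request.symbologies = [.qr]  // fewer symbologies: faster, fewer false positives
let barcodes = try await request.perform(on: cgImage)
let urls = barcodes.compactMap { $0.payloadString.flatMap(URL.init(string:)) }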
RecognizeDocumentsRequest provides structured document reading with layout understanding beyond basic OCR. Returns DocumentObservation objects with a nested Container structure for paragraphs, tables, lists, and barcodes.
var request = RecognizeDocumentsRequest()
let documents = try await request.perform(on: cgImage)
for observation in documents {
let container = observation.document
// Full text content
let fullText = container.text
// Structured access to paragraphs
for paragraph in container.paragraphs {
let paragraphText = paragraph.text
}
// Tables and lists
for table in container.tables { /* structured table data */ }
for list in container.lists { /* structured list data */ }
// Embedded barcodes detected within the document
for barcode in container.barcodes { /* barcode data */ }
// Document title if detected
if let title = container.title { print(title) }
}
For simpler document camera scanning, use VisionKit's VNDocumentCameraViewController, which provides a full-screen camera UI with auto-capture, perspective correction, and multi-page scanning.
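Presenting the document camera is not shown in the skill itself; a minimal sketch, assuming a UIViewController host that conforms to VNDocumentCameraViewControllerDelegate:
import VisionKit
let camera = VNDocumentCameraViewController()
camera.delegate = self
present(camera, animated: true)
// Delegate callback with the captured, perspective-corrected pages:
func documentCameraViewController(_ controller: VNDocumentCameraViewController,
                                  didFinishWith scan: VNDocumentCameraScan) {
    for pageIndex in 0..<scan.pageCount {
        let pageImage = scan.imageOfPage(at: pageIndex)  // UIImage, one per scanned page
        // Feed pageImage.cgImage into RecognizeDocumentsRequest above
    }
    controller.dismiss(animated: true)
}
GeneratePersonSegmentationRequest produces a pixel mask that separates people from the background. Modern API: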
var request = GeneratePersonSegmentationRequest()
request.qualityLevel = .accurate // .balanced, .fast
let mask = try await request.perform(on: cgImage)
// mask is a PersonSegmentationObservation with a pixelBuffer property
let maskBuffer = mask.pixelBuffer
// Apply mask using Core Image: CIFilter.blendWithMask()
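The legacy request additionally sets an explicit output pixel format: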
let request = VNGeneratePersonSegmentationRequest()
request.qualityLevel = .accurate // .balanced, .fast
request.outputPixelFormat = kCVPixelFormatType_OneComponent8
let handler = VNImageRequestHandler(cgImage: cgImage)
try handler.perform([request])
guard let mask = request.results?.first?.pixelBuffer else { return }
// Apply mask using Core Image: CIFilter.blendWithMask()
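The skill leaves the blend step to the references; a minimal Core Image sketch, assuming maskBuffer came from either request above and background is the replacement image:
import CoreImage
import CoreImage.CIFilterBuiltins

func composite(person image: CIImage, mask maskBuffer: CVPixelBuffer, over background: CIImage) -> CIImage? {
    var maskImage = CIImage(cvPixelBuffer: maskBuffer)
    // Segmentation masks are often lower resolution than the source; scale to match.
    maskImage = maskImage.transformed(by: CGAffineTransform(
        scaleX: image.extent.width / maskImage.extent.width,
        y: image.extent.height / maskImage.extent.height))
    let filter = CIFilter.blendWithMask()
    filter.inputImage = image            // foreground (the person)
    filter.backgroundImage = background  // replacement background
    filter.maskImage = maskImage         // white areas keep the foreground
    return filter.outputImage
}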
Quality levels:
.accurate -- best quality, slowest (~1s), full resolution
.balanced -- good quality, moderate speed (~100ms), 960x540
.fast -- lowest quality, fastest (~10ms), 256x144, suitable for real-time
Separate masks per person for individual effects.
// Modern API (iOS 18+)
let request = GeneratePersonInstanceMaskRequest()
let observation = try await request.perform(on: cgImage)
let indices = observation.allInstances
for index in indices {
let mask = try observation.generateMask(forInstances: IndexSet(integer: index))
// mask is a CVPixelBuffer with only this person visible
}
// Legacy API (iOS 17+)
let request = VNGeneratePersonInstanceMaskRequest()
let handler = VNImageRequestHandler(cgImage: cgImage)
try handler.perform([request])
guard let result = request.results?.first else { return }
let indices = result.allInstances
for index in indices {
let instanceMask = try result.generateMaskedImage(
ofInstances: IndexSet(integer: index),
from: handler,
croppedToInstancesExtent: false
)
}
See references/vision-requests.md for mask composition and Core Image filter integration patterns.
TrackObjectRequest is a stateful request that maintains tracking context across frames. It conforms to both the ImageProcessingRequest and StatefulRequest protocols.
// Initialize with a detected object's bounding box
let initialObservation = DetectedObjectObservation(boundingBox: detectedRect)
var request = TrackObjectRequest(observation: initialObservation)
request.trackingLevel = .accurate
// For each video frame:
let results = try await request.perform(on: pixelBuffer)
if let tracked = results.first {
let updatedBounds = tracked.boundingBox
let confidence = tracked.confidence
}
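The legacy equivalent uses VNSequenceRequestHandler (see the pitfalls below) and feeds each result back in as the next frame's input observation: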
let trackRequest = VNTrackObjectRequest(detectedObjectObservation: initialObservation)
trackRequest.trackingLevel = .accurate
let sequenceHandler = VNSequenceRequestHandler()
// For each frame:
try sequenceHandler.perform([trackRequest], on: pixelBuffer)
if let result = trackRequest.results?.first {
let updatedBounds = result.boundingBox
trackRequest.inputObservation = result
}
Vision provides additional requests covered in references/vision-requests.md:
| Request | Purpose |
|---|---|
ClassifyImageRequest | Classify scene content (outdoor, food, animal, etc.) |
GenerateAttentionBasedSaliencyImageRequest | Heat map of where viewers focus attention |
GenerateObjectnessBasedSaliencyImageRequest | Heat map of object-like regions |
GenerateForegroundInstanceMaskRequest | Foreground object segmentation (not person-specific) |
DetectRectanglesRequest | Detect rectangular shapes (documents, cards, screens) |
DetectHorizonRequest | Detect horizon angle for auto-leveling photos |
DetectHumanBodyPoseRequest | Detect body joints (shoulders, elbows, knees) |
DetectHumanBodyPose3DRequest | 3D human body pose estimation |
DetectHumanHandPoseRequest | Detect hand joints and finger positions |
DetectAnimalBodyPoseRequest | Detect animal body joint positions |
DetectFaceCaptureQualityRequest | Face capture quality scoring (0–1) for photo selection |
TrackRectangleRequest | Track rectangular objects across video frames |
TrackOpticalFlowRequest | Optical flow between video frames |
DetectTrajectoriesRequest | Detect object trajectories in video |
All modern request types above are iOS 18+ / macOS 15+.
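As one example from the table, ClassifyImageRequest follows the same perform pattern as the requests above. A sketch (the 0.5 threshold is an arbitrary choice):
let request = ClassifyImageRequest()
let observations = try await request.perform(on: cgImage)
let labels = observations
    .filter { $0.confidence > 0.5 }  // keep only confident classifications
    .map(\.identifier)               // e.g. "outdoor", "food"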
Run custom Core ML models through Vision for automatic image preprocessing (resizing, normalization, color space conversion).
// Modern API (iOS 18+)
let model = try MLModel(contentsOf: modelURL)
let container = try CoreMLModelContainer(model: model)  // container init can throw
let request = CoreMLRequest(model: container)
let results = try await request.perform(on: cgImage)
// Classification model
if let classification = results.first as? ClassificationObservation {
let label = classification.identifier
let confidence = classification.confidence
}
// Legacy API
let vnModel = try VNCoreMLModel(for: model)
let request = VNCoreMLRequest(model: vnModel) { request, error in
guard let results = request.results as? [VNClassificationObservation] else { return }
let topResult = results.first
}
let handler = VNImageRequestHandler(cgImage: cgImage)
try handler.perform([request])
For model conversion and optimization, see the coreml skill.
DataScannerViewController provides a full-screen live camera scanner for text and barcodes. See references/visionkit-scanner.md for complete patterns.
import VisionKit
// Check availability (requires A12+ chip and camera)
guard DataScannerViewController.isSupported,
DataScannerViewController.isAvailable else { return }
let scanner = DataScannerViewController(
recognizedDataTypes: [
.text(languages: ["en"]),
.barcode(symbologies: [.qr, .ean13])
],
qualityLevel: .balanced,
recognizesMultipleItems: true,
isHighFrameRateTrackingEnabled: true,
isHighlightingEnabled: true
)
scanner.delegate = self
present(scanner, animated: true) {
try? scanner.startScanning()
}
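Taps on highlighted items arrive through DataScannerViewControllerDelegate; a minimal sketch of one callback:
func dataScanner(_ dataScanner: DataScannerViewController, didTapOn item: RecognizedItem) {
    switch item {
    case .text(let text):
        print("Tapped text:", text.transcript)
    case .barcode(let barcode):
        print("Tapped barcode:", barcode.payloadStringValue ?? "<binary payload>")
    @unknown default:
        break
    }
}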
Wrap DataScannerViewController in UIViewControllerRepresentable. See references/visionkit-scanner.md for the full implementation.
DON'T: Use the legacy VNImageRequestHandler API for new iOS 18+ projects. DO: Use modern struct-based requests with perform(on:) and async/await. Why: Modern API provides type safety, better Swift concurrency support, and cleaner error handling.
DON'T: Forget to convert normalized coordinates before drawing bounding boxes. DO: Use VNImageRectForNormalizedRect(_:_:_:) or manual conversion from bottom-left origin to UIKit top-left origin. Why: Vision uses normalized coordinates (0...1) with bottom-left origin; UIKit uses points with top-left origin.
DON'T: Run Vision requests on the main thread. DO: Perform requests on a background thread or use async/await from a detached task. Why: Image analysis is CPU/GPU-intensive and blocks the UI if run on the main actor.
DON'T: Use .accurate recognition level for real-time camera feeds. DO: Use .fast for live video, .accurate for still images or offline processing. Why: Accurate recognition is too slow for 30fps video; fast recognition trades quality for speed.
DON'T: Ignore the confidence score on observations. DO: Filter results by confidence threshold (e.g., > 0.5) appropriate for your use case. Why: Low-confidence results are often incorrect and degrade user experience.
DON'T: Create a new VNImageRequestHandler for each frame when tracking objects. DO: Use VNSequenceRequestHandler for video frame sequences. Why: Sequence handler maintains temporal context for tracking; per-frame handlers lose state.
DON'T: Request all barcode symbologies when you only need QR codes. DO: Specify only the symbologies you need in the request. Why: Fewer symbologies means faster detection and fewer false positives.
DON'T: Assume DataScannerViewController is available on all devices. DO: Check both isSupported (hardware) and isAvailable (user permissions) before presenting. Why: Requires A12+ chip; isAvailable also checks camera access authorization.
Checklist:
Recognition level matches the workload (.fast for video, .accurate for stills)
DataScannerViewController availability checked before presentation
Camera usage description (NSCameraUsageDescription) in Info.plist for VisionKit
VNSequenceRequestHandler used for video frame tracking (not per-frame handler)
See references/vision-requests.md and references/visionkit-scanner.md for complete patterns.