axiom-vision-ref by charleswiltgen/axiom
npx skills add https://github.com/charleswiltgen/axiom --skill axiom-vision-ref
Comprehensive reference for Vision framework computer vision: subject segmentation, hand/body pose detection, person detection, face analysis, text recognition (OCR), barcode detection, and document scanning.
Related skills : See axiom-vision for decision trees and patterns, axiom-vision-diag for troubleshooting
Vision provides computer vision algorithms for still images and video:
Core workflow :
1. Create a request (e.g. VNDetectHumanHandPoseRequest())
2. Create a request handler (VNImageRequestHandler(cgImage: image))
3. Perform the request (try handler.perform([request]))
4. Access observations via request.results
Coordinate system : Lower-left origin, normalized (0.0-1.0) coordinates
Performance : Run on a background queue; these requests are resource-intensive and block the UI if run on the main thread
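A minimal sketch of the background-queue guidance above (the queue label, function name, and completion shape are illustrative, not part of the Vision API):

```swift
import Vision

let visionQueue = DispatchQueue(label: "vision.processing", qos: .userInitiated)

func detectHands(in image: CGImage,
                 completion: @escaping ([VNHumanHandPoseObservation]) -> Void) {
    visionQueue.async {
        let request = VNDetectHumanHandPoseRequest()
        let handler = VNImageRequestHandler(cgImage: image)
        do {
            try handler.perform([request])
            let observations = request.results ?? []
            // Hop back to the main thread before touching UI
            DispatchQueue.main.async { completion(observations) }
        } catch {
            DispatchQueue.main.async { completion([]) }
        }
    }
}
```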
Vision provides two request handlers for different scenarios.
Analyzes a single image. Initialize with the image, perform requests against it, then discard it.
let handler = VNImageRequestHandler(cgImage: image)
try handler.perform([request1, request2]) // Multiple requests, one image
Initialize with : CGImage, CIImage, CVPixelBuffer, Data, or URL
Rule : One handler per image. Reusing a handler with a different image is unsupported.
Analyzes a sequence of frames (video, camera feed). Initialize empty, pass each frame to perform(). Maintains inter-frame state for temporal smoothing.
let sequenceHandler = VNSequenceRequestHandler()
// In your camera/video frame callback:
func processFrame(_ pixelBuffer: CVPixelBuffer) throws {
try sequenceHandler.perform([request], on: pixelBuffer)
}
Rule : Create once, reuse across frames. The handler tracks state between calls.
| Use Case | Handler |
|---|---|
| Single photo or screenshot | VNImageRequestHandler |
| Video stream or camera frames | VNSequenceRequestHandler |
| Temporal smoothing (pose, segmentation) | VNSequenceRequestHandler |
| One-off analysis of a CVPixelBuffer | VNImageRequestHandler |
These requests use inter-frame state when run through VNSequenceRequestHandler:
- VNDetectHumanBodyPoseRequest — smoother joint tracking
- VNDetectHumanHandPoseRequest — smoother landmark tracking
- VNGeneratePersonSegmentationRequest — temporally consistent masks
- VNGeneratePersonInstanceMaskRequest — stable person identity across frames
- VNDetectDocumentSegmentationRequest — stable document edges
- VNStatefulRequest subclasses — designed for sequences
Creating a new VNImageRequestHandler per video frame discards temporal context. Pose landmarks jitter, segmentation masks flicker, and you lose the smoothing that sequence handling provides.
// Wrong — loses temporal context every frame
func processFrame(_ buffer: CVPixelBuffer) throws {
let handler = VNImageRequestHandler(cvPixelBuffer: buffer)
try handler.perform([poseRequest])
}
// Right — maintains inter-frame state
let sequenceHandler = VNSequenceRequestHandler()
func processFrame(_ buffer: CVPixelBuffer) throws {
try sequenceHandler.perform([poseRequest], on: buffer)
}
Availability : iOS 17+, macOS 14+, tvOS 17+, visionOS 1+
Generates class-agnostic instance masks of foreground objects (people, pets, buildings, food, shoes, etc.).
let request = VNGenerateForegroundInstanceMaskRequest()
let handler = VNImageRequestHandler(cgImage: image)
try handler.perform([request])
guard let observation = request.results?.first as? VNInstanceMaskObservation else {
return
}
allInstances : IndexSet containing all foreground instance indices (excludes background 0)
instanceMask : CVPixelBuffer with UInt8 labels (0 = background, 1+ = instance indices)
instanceAtPoint(_:) : Returns the instance index at a normalized point
let point = CGPoint(x: 0.5, y: 0.5) // Center of the image
let instance = observation.instanceAtPoint(point)
if instance == 0 {
print("Background tapped")
} else {
print("Instance \(instance) tapped")
}
createScaledMask(for:croppedToInstancesContent:)
Parameters:
- for: IndexSet of the instances to include
- croppedToInstancesContent:
  - false = output matches input resolution (for compositing)
  - true = tight crop around the selected instances
Returns: Single-channel floating-point CVPixelBuffer (a soft segmentation mask)
// All instances, full resolution
let mask = try observation.createScaledMask(
for: observation.allInstances,
croppedToInstancesContent: false
)
// Single instance, cropped
let instances = IndexSet(integer: 1)
let croppedMask = try observation.createScaledMask(
for: instances,
croppedToInstancesContent: true
)
Access the raw pixel buffer to map tap coordinates to instance labels:
let instanceMask = observation.instanceMask
CVPixelBufferLockBaseAddress(instanceMask, .readOnly)
defer { CVPixelBufferUnlockBaseAddress(instanceMask, .readOnly) }
let baseAddress = CVPixelBufferGetBaseAddress(instanceMask)
let width = CVPixelBufferGetWidth(instanceMask)
let bytesPerRow = CVPixelBufferGetBytesPerRow(instanceMask)
// Convert the normalized tap to pixel coordinates
let pixelPoint = VNImagePointForNormalizedPoint(
CGPoint(x: normalizedX, y: normalizedY),
width: imageWidth,
height: imageHeight
)
// Calculate the byte offset
let offset = Int(pixelPoint.y) * bytesPerRow + Int(pixelPoint.x)
// Read the instance label
let label = UnsafeRawPointer(baseAddress!).load(
fromByteOffset: offset,
as: UInt8.self
)
let instances = label == 0 ? observation.allInstances : IndexSet(integer: Int(label))
Availability : iOS 16+, iPadOS 16+
Adds system-like subject lifting UI to views:
let interaction = ImageAnalysisInteraction()
interaction.preferredInteractionTypes = .imageSubject // Or .automatic
imageView.addInteraction(interaction)
Interaction types :
- .automatic: Subject lifting + Live Text + data detectors
- .imageSubject: Subject lifting only (no interactive text)
Availability : macOS 13+
let overlayView = ImageAnalysisOverlayView()
overlayView.preferredInteractionTypes = .imageSubject
nsView.addSubview(overlayView)
let analyzer = ImageAnalyzer()
let configuration = ImageAnalyzer.Configuration([.text, .visualLookUp])
let analysis = try await analyzer.analyze(image, configuration: configuration)
subjects : [Subject] - All subjects in the image
highlightedSubjects : Set<Subject> - Currently highlighted subjects (user long-pressed)
subject(at:) : Async lookup of the subject at a normalized point (returns nil if none)
// Get all subjects
let subjects = analysis.subjects
// Look up the subject at a tap point
if let subject = try await analysis.subject(at: tapPoint) {
// Process the subject
}
// Change the highlight state
analysis.highlightedSubjects = Set([subjects[0], subjects[1]])
image : UIImage/NSImage - The extracted subject image with transparency
bounds : CGRect - The subject's bounds in image coordinates
// Single subject image
let subjectImage = subject.image
// Composite multiple subjects
let compositeImage = try await analysis.image(for: [subject1, subject2])
Out-of-process : VisionKit analysis runs out of process (a performance benefit; image size is limited)
Availability : iOS 15+, macOS 12+
Returns a single mask containing all people in the image:
let request = VNGeneratePersonSegmentationRequest()
// Configure the quality level if needed
try handler.perform([request])
guard let observation = request.results?.first as? VNPixelBufferObservation else {
return
}
let personMask = observation.pixelBuffer // CVPixelBuffer
Availability : iOS 17+, macOS 14+
Returns separate masks for up to 4 people:
let request = VNGeneratePersonInstanceMaskRequest()
try handler.perform([request])
guard let observation = request.results?.first as? VNInstanceMaskObservation else {
return
}
// Same InstanceMaskObservation API as foreground instance masks
let allPeople = observation.allInstances // Up to 4 people (1-4)
// Get the mask for person 1
let person1Mask = try observation.createScaledMask(
for: IndexSet(integer: 1),
croppedToInstancesContent: false
)
Limitations :
- Separates at most 4 people; use VNDetectFaceRectanglesRequest to count faces if you need to handle crowded scenes
Availability : iOS 14+, macOS 11+
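One way to apply that guidance, sketched under the assumption that falling back is acceptable when more than 4 people are present (the function name is illustrative):

```swift
import Vision

func personMasks(in image: CGImage) throws -> VNInstanceMaskObservation? {
    let faceRequest = VNDetectFaceRectanglesRequest()
    let maskRequest = VNGeneratePersonInstanceMaskRequest()
    let handler = VNImageRequestHandler(cgImage: image)
    try handler.perform([faceRequest, maskRequest])

    let faceCount = faceRequest.results?.count ?? 0
    guard faceCount <= 4 else {
        // More people than the request can separate; handle the crowd differently
        return nil
    }
    return maskRequest.results?.first
}
```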
Detects 21 hand landmarks per hand:
let request = VNDetectHumanHandPoseRequest()
request.maximumHandCount = 2 // Default: 2; increase if needed
let handler = VNImageRequestHandler(cgImage: image)
try handler.perform([request])
for observation in request.results as? [VNHumanHandPoseObservation] ?? [] {
// Process each hand
}
Performance note : maximumHandCount affects latency. Pose is computed only for up to the maximum number of hands. Set it to the lowest acceptable value.
Wrist : 1 landmark
Thumb (4 landmarks):
- .thumbTip
- .thumbIP (interphalangeal joint)
- .thumbMP (metacarpophalangeal joint)
- .thumbCMC (carpometacarpal joint)
Fingers (4 landmarks each):
- .indexTip, .middleTip, .ringTip, .littleTip
Access landmark groups:
| Group Key | Points |
|---|---|
| .all | All 21 landmarks |
| .thumb | 4 thumb joints |
| .indexFinger | 4 index finger joints |
| .middleFinger | 4 middle finger joints |
| .ringFinger | 4 ring finger joints |
| .littleFinger | 4 little finger joints |
// Get all points
let allPoints = try observation.recognizedPoints(.all)
// Get index finger points only
let indexPoints = try observation.recognizedPoints(.indexFinger)
// Get specific points
let thumbTip = try observation.recognizedPoint(.thumbTip)
let indexTip = try observation.recognizedPoint(.indexTip)
// Check confidence
guard thumbTip.confidence > 0.5 else { return }
// Access the location (normalized coordinates, lower-left origin)
let location = thumbTip.location // CGPoint
let thumbTip = try observation.recognizedPoint(.thumbTip)
let indexTip = try observation.recognizedPoint(.indexTip)
guard thumbTip.confidence > 0.5, indexTip.confidence > 0.5 else {
return
}
let distance = hypot(
thumbTip.location.x - indexTip.location.x,
thumbTip.location.y - indexTip.location.y
)
let isPinching = distance < 0.05 // Normalized threshold
let chirality = observation.chirality // .left, .right, or .unknown
Availability : iOS 14+, macOS 11+
Detects 18 body landmarks (2D normalized coordinates):
let request = VNDetectHumanBodyPoseRequest()
try handler.perform([request])
for observation in request.results as? [VNHumanBodyPoseObservation] ?? [] {
// Process each person
}
Face (5 landmarks):
- .nose, .leftEye, .rightEye, .leftEar, .rightEar
Arms (6 landmarks):
- .leftShoulder, .leftElbow, .leftWrist
- .rightShoulder, .rightElbow, .rightWrist
Torso (7 landmarks):
- .neck (between the shoulders)
- .leftShoulder, .rightShoulder (also in the arm groups)
- .leftHip, .rightHip
- .root (between the hips)
Legs (6 landmarks):
- .leftHip, .leftKnee, .leftAnkle
- .rightHip, .rightKnee, .rightAnkle
Note : Shoulders and hips appear in multiple groups
| Group Key | Points |
|---|---|
| .all | All 18 landmarks |
| .face | 5 face landmarks |
| .leftArm | shoulder, elbow, wrist |
| .rightArm | shoulder, elbow, wrist |
| .torso | neck, shoulders, hips, root |
| .leftLeg | hip, knee, ankle |
| .rightLeg | hip, knee, ankle |
// Get all body points
let allPoints = try observation.recognizedPoints(.all)
// Get the left arm only
let leftArmPoints = try observation.recognizedPoints(.leftArm)
// Get a specific joint
let leftWrist = try observation.recognizedPoint(.leftWrist)
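As a usage sketch, 2D landmarks can be combined into joint angles; here the left-elbow angle from shoulder, elbow, and wrist (the confidence threshold and function name are illustrative):

```swift
import Vision

func leftElbowAngle(from observation: VNHumanBodyPoseObservation) throws -> CGFloat? {
    let shoulder = try observation.recognizedPoint(.leftShoulder)
    let elbow = try observation.recognizedPoint(.leftElbow)
    let wrist = try observation.recognizedPoint(.leftWrist)
    guard [shoulder, elbow, wrist].allSatisfy({ $0.confidence > 0.3 }) else { return nil }

    // Angle at the elbow between the upper-arm and forearm vectors
    let v1 = CGVector(dx: shoulder.location.x - elbow.location.x,
                      dy: shoulder.location.y - elbow.location.y)
    let v2 = CGVector(dx: wrist.location.x - elbow.location.x,
                      dy: wrist.location.y - elbow.location.y)
    var degrees = abs(atan2(v2.dy, v2.dx) - atan2(v1.dy, v1.dx)) * 180 / .pi
    if degrees > 180 { degrees = 360 - degrees }  // Fold into 0-180
    return degrees
}
```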
Availability : iOS 17+, macOS 14+
Returns a 3D skeleton with 17 joints, in meters (real-world coordinates):
let request = VNDetectHumanBodyPose3DRequest()
try handler.perform([request])
guard let observation = request.results?.first as? VNHumanBodyPose3DObservation else {
return
}
// Get a 3D joint position
let leftWrist = try observation.recognizedPoint(.leftWrist)
let position = leftWrist.position // simd_float4x4 matrix
let localPosition = leftWrist.localPosition // Relative to the parent joint
3D body landmarks (17 points) : Similar to the 2D set, minus the ear landmarks (the 2D request provides 18)
bodyHeight : Estimated height in meters
heightEstimation : .measured or .reference
cameraOriginMatrix : simd_float4x4 camera position/orientation relative to the subject
pointInImage(_:) : Projects a 3D joint back to 2D image coordinates
let wrist2D = try observation.pointInImage(leftWrist)
VNPoint3D : Base class with a simd_float4x4 position matrix
VNRecognizedPoint3D : Adds an identifier (joint name)
VNHumanBodyRecognizedPoint3D : Adds localPosition and parentJoint
// Position relative to the skeleton root (center of the hips)
let modelPosition = leftWrist.position
// Position relative to the parent joint (the left elbow)
let relativePosition = leftWrist.localPosition
Vision accepts depth data alongside images:
// From AVDepthData
let handler = VNImageRequestHandler(
cvPixelBuffer: imageBuffer,
depthData: depthData,
orientation: orientation
)
// From a file (automatic depth extraction)
let handler = VNImageRequestHandler(url: imageURL) // Depth is fetched automatically
Depth formats : Disparity or depth (interchangeable via AVFoundation)
LiDAR : Use in live capture sessions for accurate scale/measurement
Availability : iOS 11+
Detects face bounding boxes:
let request = VNDetectFaceRectanglesRequest()
try handler.perform([request])
for observation in request.results as? [VNFaceObservation] ?? [] {
let faceBounds = observation.boundingBox // Normalized rect
}
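Since boundingBox uses Vision's lower-left-origin normalized space, drawing it in UIKit requires a y-axis flip. A common conversion sketch (the function name is illustrative; VNImageRectForNormalizedRect covers the pixel-space case):

```swift
import UIKit

// Convert a Vision normalized rect (lower-left origin) to
// UIKit view coordinates (upper-left origin)
func viewRect(for boundingBox: CGRect, in viewSize: CGSize) -> CGRect {
    CGRect(
        x: boundingBox.minX * viewSize.width,
        y: (1 - boundingBox.maxY) * viewSize.height,  // Flip the y-axis
        width: boundingBox.width * viewSize.width,
        height: boundingBox.height * viewSize.height
    )
}
```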
Availability : iOS 11+
Detects faces with detailed landmarks:
let request = VNDetectFaceLandmarksRequest()
try handler.perform([request])
for observation in request.results as? [VNFaceObservation] ?? [] {
if let landmarks = observation.landmarks {
let leftEye = landmarks.leftEye
let nose = landmarks.nose
let leftPupil = landmarks.leftPupil // Revision 2+
}
}
Revisions :
Availability : iOS 13+
Detects human bounding boxes (torso detection):
let request = VNDetectHumanRectanglesRequest()
try handler.perform([request])
for observation in request.results as? [VNHumanObservation] ?? [] {
let humanBounds = observation.boundingBox // Normalized rect
}
Use case : Faster than pose detection when you only need a location
Composite a subject onto a new background using a Vision mask:
// 1. Get the mask from Vision
let observation = request.results?.first as? VNInstanceMaskObservation
let visionMask = try observation.createScaledMask(
for: observation.allInstances,
croppedToInstancesContent: false
)
// 2. Convert to CIImage
let maskImage = CIImage(cvPixelBuffer: visionMask)
// 3. Apply the filter
let filter = CIFilter(name: "CIBlendWithMask")!
filter.setValue(sourceImage, forKey: kCIInputImageKey)
filter.setValue(maskImage, forKey: kCIInputMaskImageKey)
filter.setValue(newBackground, forKey: kCIInputBackgroundImageKey)
let output = filter.outputImage // Composited result
Parameters : inputImage (foreground source), inputMaskImage (the Vision mask), inputBackgroundImage (the new background)
HDR preservation : Core Image preserves the input's high dynamic range (Vision/VisionKit output is SDR)
Availability : iOS 13+, macOS 10.15+
Recognizes text in images with a configurable accuracy/speed trade-off.
let request = VNRecognizeTextRequest()
request.recognitionLevel = .accurate // Or .fast
request.recognitionLanguages = ["en-US", "de-DE"] // Order matters
request.usesLanguageCorrection = true
let handler = VNImageRequestHandler(cgImage: image)
try handler.perform([request])
for observation in request.results as? [VNRecognizedTextObservation] ?? [] {
// Get the top candidates
let candidates = observation.topCandidates(3)
let bestText = candidates.first?.string ?? ""
}
| Level | Performance | Accuracy | Best For |
|---|---|---|---|
| .fast | Real-time | Good | Camera feeds, large text, signs |
| .accurate | Slower | Excellent | Documents, receipts, handwriting |
Fast path : Character-by-character recognition (neural network → character detection)
Accurate path : Full-line ML recognition (neural network → line/word recognition)
| Property | Type | Description |
|---|---|---|
| recognitionLevel | VNRequestTextRecognitionLevel | .fast or .accurate |
| recognitionLanguages | [String] | BCP 47 language codes; order = priority |
| usesLanguageCorrection | Bool | Apply language-model correction |
| customWords | [String] | Domain-specific vocabulary |
| automaticallyDetectsLanguage | Bool | Auto-detect language (iOS 16+) |
| minimumTextHeight | Float | Minimum text height as a fraction of image height (0-1) |
| revision | Int | API revision (affects supported languages) |
// Check the languages supported by the current settings
let languages = try VNRecognizeTextRequest.supportedRecognitionLanguages(
for: .accurate,
revision: VNRecognizeTextRequestRevision3
)
Language correction : Improves accuracy but costs processing time. Disable it for codes/serial numbers.
Custom words : Add domain-specific vocabulary for better recognition (medical terms, product codes).
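Putting those two notes together, a configuration sketch (the sample vocabulary and threshold are illustrative; my understanding is that customWords applies during language correction, so the two settings go with different use cases):

```swift
import Vision

// Reading serial numbers: disable correction so codes aren't "fixed" into words
let serialRequest = VNRecognizeTextRequest()
serialRequest.recognitionLevel = .accurate
serialRequest.usesLanguageCorrection = false
serialRequest.minimumTextHeight = 0.05  // Ignore small background text

// Reading domain text: keep correction on and supplement the lexicon
let medicalRequest = VNRecognizeTextRequest()
medicalRequest.recognitionLevel = .accurate
medicalRequest.usesLanguageCorrection = true
medicalRequest.customWords = ["Amoxicillin", "Ibuprofen"]
```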
boundingBox : Normalized rectangle containing the recognized text
topCandidates(_:) : Returns [VNRecognizedText] sorted by confidence
| Property | Type | Description |
|---|---|---|
| string | String | The recognized text |
| confidence | VNConfidence | 0.0-1.0 |
| boundingBox(for:) | VNRectangleObservation? | Bounding box for a substring range |
// Get the bounding box for a substring
let text = candidate.string
if let range = text.range(of: "invoice") {
let box = try candidate.boundingBox(for: range)
}
Availability : iOS 11+, macOS 10.13+
Detects and decodes barcodes and QR codes.
let request = VNDetectBarcodesRequest()
request.symbologies = [.qr, .ean13, .code128] // Specific codes
let handler = VNImageRequestHandler(cgImage: image)
try handler.perform([request])
for barcode in request.results as? [VNBarcodeObservation] ?? [] {
let payload = barcode.payloadStringValue
let type = barcode.symbology
let bounds = barcode.boundingBox
}
1D barcodes :
- .codabar (iOS 15+)
- .code39, .code39Checksum, .code39FullASCII, .code39FullASCIIChecksum
- .code93, .code93i
- .code128
- .ean8, .ean13
- .gs1DataBar, .gs1DataBarExpanded, .gs1DataBarLimited (iOS 15+)
- .i2of5, .i2of5Checksum
- .itf14
- .upce
2D codes :
- .aztec
- .dataMatrix
- .microPDF417 (iOS 15+)
- .microQR (iOS 15+)
- .pdf417
- .qr
Performance : Specifying fewer symbologies = faster detection
| Revision | iOS | Capabilities |
|---|---|---|
| 1 | 11+ | Basic detection, one code at a time |
| 2 | 15+ | Codabar, GS1, MicroPDF, MicroQR, better ROI |
| 3 | 16+ | ML-based, multiple codes, better bounding boxes |
| Property | Type | Description |
|---|---|---|
| payloadStringValue | String? | Decoded content |
| symbology | VNBarcodeSymbology | Barcode type |
| boundingBox | CGRect | Normalized bounds |
| topLeft/topRight/bottomLeft/bottomRight | CGPoint | Corner points |
Availability : iOS 16+
Live camera-based scanner with built-in UI for text and barcodes.
// Hardware support
DataScannerViewController.isSupported
// Runtime availability (camera access, parental controls)
DataScannerViewController.isAvailable
import VisionKit
let dataTypes: Set<DataScannerViewController.RecognizedDataType> = [
.barcode(symbologies: [.qr, .ean13]),
.text(textContentType: .URL), // Or nil for all text
// .text(languages: ["ja"]) // Filter by language
]
let scanner = DataScannerViewController(
recognizedDataTypes: dataTypes,
qualityLevel: .balanced, // .fast, .balanced, .accurate
recognizesMultipleItems: true,
isHighFrameRateTrackingEnabled: true,
isPinchToZoomEnabled: true,
isGuidanceEnabled: true,
isHighlightingEnabled: true
)
scanner.delegate = self
present(scanner, animated: true) {
try? scanner.startScanning()
}
| Type | Description |
|---|---|
| .barcode(symbologies:) | Specific barcode types |
| .text() | All text |
| .text(languages:) | Text filtered by language |
| .text(textContentType:) | Text filtered by type (URL, phone, email) |
protocol DataScannerViewControllerDelegate {
func dataScanner(_ dataScanner: DataScannerViewController,
didTapOn item: RecognizedItem)
func dataScanner(_ dataScanner: DataScannerViewController,
didAdd addedItems: [RecognizedItem],
allItems: [RecognizedItem])
func dataScanner(_ dataScanner: DataScannerViewController,
didUpdate updatedItems: [RecognizedItem],
allItems: [RecognizedItem])
func dataScanner(_ dataScanner: DataScannerViewController,
didRemove removedItems: [RecognizedItem],
allItems: [RecognizedItem])
func dataScanner(_ dataScanner: DataScannerViewController,
becameUnavailableWithError error: DataScannerViewController.ScanningUnavailable)
}
enum RecognizedItem {
case text(RecognizedItem.Text)
case barcode(RecognizedItem.Barcode)
var id: UUID { get }
var bounds: RecognizedItem.Bounds { get }
}
// Text items
struct Text {
let transcript: String
}
// Barcode items
struct Barcode {
let payloadStringValue: String?
let observation: VNBarcodeObservation
}
// Alternative to the delegate
for await items in scanner.recognizedItems {
// Currently recognized items
}
// Add custom views over recognized items
scanner.overlayContainerView.addSubview(customHighlight)
// Capture a still photo
let photo = try await scanner.capturePhoto()
Availability : iOS 13+
Document scanning with automatic edge detection, perspective correction, and lighting adjustment.
import VisionKit
let camera = VNDocumentCameraViewController()
camera.delegate = self
present(camera, animated: true)
protocol VNDocumentCameraViewControllerDelegate {
func documentCameraViewController(_ controller: VNDocumentCameraViewController,
didFinishWith scan: VNDocumentCameraScan)
func documentCameraViewControllerDidCancel(_ controller: VNDocumentCameraViewController)
func documentCameraViewController(_ controller: VNDocumentCameraViewController,
didFailWithError error: Error)
}
| Property | Type | Description |
|---|---|---|
| pageCount | Int | Number of scanned pages |
| imageOfPage(at:) | UIImage | Page image at an index |
| title | String | User-editable title |
func documentCameraViewController(_ controller: VNDocumentCameraViewController,
didFinishWith scan: VNDocumentCameraScan) {
controller.dismiss(animated: true)
for i in 0..<scan.pageCount {
let pageImage = scan.imageOfPage(at: i)
// Process with VNRecognizeTextRequest
}
}
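The per-page processing hinted at in that comment might look like this sketch (the helper name is illustrative):

```swift
import UIKit
import Vision

func recognizeText(on pageImage: UIImage) throws -> [String] {
    guard let cgImage = pageImage.cgImage else { return [] }
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate  // Scanned documents favor accuracy
    let handler = VNImageRequestHandler(cgImage: cgImage)
    try handler.perform([request])
    // One string per text observation, best candidate only
    return request.results?.compactMap { $0.topCandidates(1).first?.string } ?? []
}
```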
Availability : iOS 15+, macOS 12+
Detects document boundaries, for custom camera UI or post-processing.
let request = VNDetectDocumentSegmentationRequest()
let handler = VNImageRequestHandler(ciImage: image)
try handler.perform([request])
guard let observation = request.results?.first as? VNRectangleObservation else {
return // No document found
}
// Get the corner points (normalized)
let corners = [
observation.topLeft,
observation.topRight,
observation.bottomLeft,
observation.bottomRight
]
vs VNDetectRectanglesRequest :
Availability : iOS 26+, macOS 26+
Structured document understanding with semantic parsing.
let request = RecognizeDocumentsRequest()
let observations = try await request.perform(on: imageData)
guard let document = observations.first?.document else {
return
}
DocumentObservation
└── document: DocumentObservation.Document
├── text: TextObservation
├── tables: [Container.Table]
├── lists: [Container.List]
└── barcodes: [Container.Barcode]
for table in document.tables {
for row in table.rows {
for cell in row {
let text = cell.content.text.transcript
let detectedData = cell.content.text.detectedData
}
}
}
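Building on the traversal above, a sketch that flattens a recognized table into CSV lines (the fully qualified table type and the separator handling are assumptions based on the hierarchy shown):

```swift
// Flatten a recognized table into one CSV line per row
func csvLines(from table: DocumentObservation.Container.Table) -> [String] {
    table.rows.map { row in
        row.map { cell in
            // Strip commas so cell text doesn't break the CSV columns
            cell.content.text.transcript.replacingOccurrences(of: ",", with: " ")
        }.joined(separator: ",")
    }
}
```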
for data in document.text.detectedData {
switch data.match.details {
case .emailAddress(let email):
let address = email.emailAddress
case .phoneNumber(let phone):
let number = phone.phoneNumber
case .link(let url):
let link = url
case .address(let address):
let components = address
case .date(let date):
let dateValue = date
default:
break
}
}
TextObservation
├── transcript: String
├── lines: [TextObservation.Line]
├── paragraphs: [TextObservation.Paragraph]
├── words: [TextObservation.Word]
└── detectedData: [DetectedDataObservation]
Visual Intelligence is a system-level feature (iOS 26+) that lets users point the camera at real-world objects and find matching content across apps. This is distinct from the Vision framework covered above (VNRequest-based image analysis). Vision analyzes images inside your app; Visual Intelligence lets the system call into your app when users search with the camera or a screenshot.
- IntentValueQuery — queries participating apps
- SemanticContentDescriptor — describes what the user is looking at
- AppEntity — the result type your app returns
import VisualIntelligence
import AppIntents
The core system-provided object describing what the user is looking at.
| Property | Type | Description |
|---|---|---|
| labels | [String] | Classification labels for the detected item |
| pixelBuffer | CVReadOnlyPixelBuffer? | Visual data for the detected item |
Use the labels for fast keyword matching against your content catalog. When labels aren't enough, use the pixel buffer for an image-similarity search.
The entry point through which Visual Intelligence talks to your app. Implement values(for:) to receive search requests and return matching entities.
struct LandmarkIntentValueQuery: IntentValueQuery {
@Dependency var modelData: ModelData
func values(for input: SemanticContentDescriptor) async throws -> [LandmarkEntity] {
if !input.labels.isEmpty {
return try await modelData.search(matching: input.labels)
}
guard let pixelBuffer = input.pixelBuffer else { return [] }
return try await modelData.search(matching: pixelBuffer) // Assumed image-similarity overload of search
}
}
Comprehensive reference for Vision framework computer vision: subject segmentation, hand/body pose detection, person detection, face analysis, text recognition (OCR), barcode detection, and document scanning.
Related skills : See axiom-vision for decision trees and patterns, axiom-vision-diag for troubleshooting
Vision provides computer vision algorithms for still images and video:
Core workflow :
VNDetectHumanHandPoseRequest())VNImageRequestHandler(cgImage: image))try handler.perform([request]))request.resultsCoordinate system : Lower-left origin, normalized (0.0-1.0) coordinates
Performance : Run on background queue - resource intensive, blocks UI if on main thread
Vision provides two request handlers for different scenarios.
Analyzes a single image. Initialize with the image, perform requests against it, discard.
let handler = VNImageRequestHandler(cgImage: image)
try handler.perform([request1, request2]) // Multiple requests, one image
Initialize with : CGImage, CIImage, CVPixelBuffer, Data, or URL
Rule : One handler per image. Reusing a handler with a different image is unsupported.
Analyzes a sequence of frames (video, camera feed). Initialize empty, pass each frame to perform(). Maintains inter-frame state for temporal smoothing.
let sequenceHandler = VNSequenceRequestHandler()
// In your camera/video frame callback:
func processFrame(_ pixelBuffer: CVPixelBuffer) throws {
try sequenceHandler.perform([request], on: pixelBuffer)
}
Rule : Create once, reuse across frames. The handler tracks state between calls.
| Use Case | Handler |
|---|---|
| Single photo or screenshot | VNImageRequestHandler |
| Video stream or camera frames | VNSequenceRequestHandler |
| Temporal smoothing (pose, segmentation) | VNSequenceRequestHandler |
| One-off analysis of a CVPixelBuffer | VNImageRequestHandler |
These requests use inter-frame state when run through VNSequenceRequestHandler:
VNDetectHumanBodyPoseRequest — Smoother joint trackingVNDetectHumanHandPoseRequest — Smoother landmark trackingVNGeneratePersonSegmentationRequest — Temporally consistent masksVNGeneratePersonInstanceMaskRequest — Stable person identity across framesVNDetectDocumentSegmentationRequest — Stable document edgesVNStatefulRequest subclass — Designed for sequencesCreating a new VNImageRequestHandler per video frame discards temporal context. Pose landmarks jitter, segmentation masks flicker, and you lose the smoothing that sequence handling provides.
// Wrong — loses temporal context every frame
func processFrame(_ buffer: CVPixelBuffer) throws {
let handler = VNImageRequestHandler(cvPixelBuffer: buffer)
try handler.perform([poseRequest])
}
// Right — maintains inter-frame state
let sequenceHandler = VNSequenceRequestHandler()
func processFrame(_ buffer: CVPixelBuffer) throws {
try sequenceHandler.perform([poseRequest], on: buffer)
}
Availability : iOS 17+, macOS 14+, tvOS 17+, visionOS 1+
Generates class-agnostic instance mask of foreground objects (people, pets, buildings, food, shoes, etc.)
let request = VNGenerateForegroundInstanceMaskRequest()
let handler = VNImageRequestHandler(cgImage: image)
try handler.perform([request])
guard let observation = request.results?.first as? VNInstanceMaskObservation else {
return
}
allInstances : IndexSet containing all foreground instance indices (excludes background 0)
instanceMask : CVPixelBuffer with UInt8 labels (0 = background, 1+ = instance indices)
instanceAtPoint(_:) : Returns instance index at normalized point
let point = CGPoint(x: 0.5, y: 0.5) // Center of image
let instance = observation.instanceAtPoint(point)
if instance == 0 {
print("Background tapped")
} else {
print("Instance \(instance) tapped")
}
createScaledMask(for:croppedToInstancesContent:)
Parameters:
for: IndexSet of instances to includecroppedToInstancesContent:
false = Output matches input resolution (for compositing)true = Tight crop around selected instancesReturns: Single-channel floating-point CVPixelBuffer (soft segmentation mask)
// All instances, full resolution
let mask = try observation.createScaledMask(
for: observation.allInstances,
croppedToInstancesContent: false
)
// Single instance, cropped
let instances = IndexSet(integer: 1)
let croppedMask = try observation.createScaledMask(
for: instances,
croppedToInstancesContent: true
)
Access raw pixel buffer to map tap coordinates to instance labels:
let instanceMask = observation.instanceMask
CVPixelBufferLockBaseAddress(instanceMask, .readOnly)
defer { CVPixelBufferUnlockBaseAddress(instanceMask, .readOnly) }
let baseAddress = CVPixelBufferGetBaseAddress(instanceMask)
let width = CVPixelBufferGetWidth(instanceMask)
let bytesPerRow = CVPixelBufferGetBytesPerRow(instanceMask)
// Convert normalized tap to pixel coordinates
let pixelPoint = VNImagePointForNormalizedPoint(
CGPoint(x: normalizedX, y: normalizedY),
width: imageWidth,
height: imageHeight
)
// Calculate byte offset
let offset = Int(pixelPoint.y) * bytesPerRow + Int(pixelPoint.x)
// Read instance label
let label = UnsafeRawPointer(baseAddress!).load(
fromByteOffset: offset,
as: UInt8.self
)
let instances = label == 0 ? observation.allInstances : IndexSet(integer: Int(label))
Availability : iOS 16+, iPadOS 16+
Adds system-like subject lifting UI to views:
let interaction = ImageAnalysisInteraction()
interaction.preferredInteractionTypes = .imageSubject // Or .automatic
imageView.addInteraction(interaction)
Interaction types :
.automatic: Subject lifting + Live Text + data detectors.imageSubject: Subject lifting only (no interactive text)Availability : macOS 13+
let overlayView = ImageAnalysisOverlayView()
overlayView.preferredInteractionTypes = .imageSubject
nsView.addSubview(overlayView)
let analyzer = ImageAnalyzer()
let configuration = ImageAnalyzer.Configuration([.text, .visualLookUp])
let analysis = try await analyzer.analyze(image, configuration: configuration)
subjects : [Subject] - All subjects in image
highlightedSubjects : Set<Subject> - Currently highlighted (user long-pressed)
subject(at:) : Async lookup of subject at normalized point (returns nil if none)
// Get all subjects
let subjects = analysis.subjects
// Look up subject at tap
if let subject = try await analysis.subject(at: tapPoint) {
// Process subject
}
// Change highlight state
analysis.highlightedSubjects = Set([subjects[0], subjects[1]])
image : UIImage/NSImage - Extracted subject with transparency
bounds : CGRect - Subject boundaries in image coordinates
// Single subject image
let subjectImage = subject.image
// Composite multiple subjects
let compositeImage = try await analysis.image(for: [subject1, subject2])
Out-of-process : VisionKit analysis happens out-of-process (performance benefit, image size limited)
Availability : iOS 15+, macOS 12+
Returns single mask containing all people in image:
let request = VNGeneratePersonSegmentationRequest()
// Configure quality level if needed
try handler.perform([request])
guard let observation = request.results?.first as? VNPixelBufferObservation else {
return
}
let personMask = observation.pixelBuffer // CVPixelBuffer
Availability : iOS 17+, macOS 14+
Returns separate masks for up to 4 people :
let request = VNGeneratePersonInstanceMaskRequest()
try handler.perform([request])
guard let observation = request.results?.first as? VNInstanceMaskObservation else {
return
}
// Same InstanceMaskObservation API as foreground instance masks
let allPeople = observation.allInstances // Up to 4 people (1-4)
// Get mask for person 1
let person1Mask = try observation.createScaledMask(
for: IndexSet(integer: 1),
croppedToInstancesContent: false
)
Limitations :
VNDetectFaceRectanglesRequest to count faces if you need to handle crowded scenesAvailability : iOS 14+, macOS 11+
Detects 21 hand landmarks per hand:
let request = VNDetectHumanHandPoseRequest()
request.maximumHandCount = 2 // Default: 2, increase if needed
let handler = VNImageRequestHandler(cgImage: image)
try handler.perform([request])
for observation in request.results as? [VNHumanHandPoseObservation] ?? [] {
// Process each hand
}
Performance note : maximumHandCount affects latency. Pose computed only for hands ≤ maximum. Set to lowest acceptable value.
Wrist : 1 landmark
Thumb (4 landmarks):
.thumbTip.thumbIP (interphalangeal joint).thumbMP (metacarpophalangeal joint).thumbCMC (carpometacarpal joint)Fingers (4 landmarks each):
.indexTip, .middleTip, .ringTip, .littleTip)Access landmark groups:
| Group Key | Points |
|---|---|
.all | All 21 landmarks |
.thumb | 4 thumb joints |
.indexFinger | 4 index finger joints |
.middleFinger | 4 middle finger joints |
.ringFinger | 4 ring finger joints |
.littleFinger | 4 little finger joints |
// Get all points
let allPoints = try observation.recognizedPoints(.all)
// Get index finger points only
let indexPoints = try observation.recognizedPoints(.indexFinger)
// Get specific point
let thumbTip = try observation.recognizedPoint(.thumbTip)
let indexTip = try observation.recognizedPoint(.indexTip)
// Check confidence
guard thumbTip.confidence > 0.5 else { return }
// Access location (normalized coordinates, lower-left origin)
let location = thumbTip.location // CGPoint
let thumbTip = try observation.recognizedPoint(.thumbTip)
let indexTip = try observation.recognizedPoint(.indexTip)
guard thumbTip.confidence > 0.5, indexTip.confidence > 0.5 else {
return
}
let distance = hypot(
thumbTip.location.x - indexTip.location.x,
thumbTip.location.y - indexTip.location.y
)
let isPinching = distance < 0.05 // Normalized threshold
let chirality = observation.chirality // .left or .right or .unknown
Availability : iOS 14+, macOS 11+
Detects 18 body landmarks (2D normalized coordinates):
let request = VNDetectHumanBodyPoseRequest()
try handler.perform([request])
for observation in request.results as? [VNHumanBodyPoseObservation] ?? [] {
// Process each person
}
Face (5 landmarks):
.nose, .leftEye, .rightEye, .leftEar, .rightEarArms (6 landmarks):
.leftShoulder, .leftElbow, .leftWrist.rightShoulder, .rightElbow, .rightWristTorso (7 landmarks):
.neck (between shoulders).leftShoulder, .rightShoulder (also in arm groups).leftHip, .rightHip.root (between hips)Legs (6 landmarks):
.leftHip, .leftKnee, .leftAnkle.rightHip, .rightKnee, .rightAnkleNote : Shoulders and hips appear in multiple groups
| Group Key | Points |
|---|---|
.all | All 18 landmarks |
.face | 5 face landmarks |
.leftArm | shoulder, elbow, wrist |
.rightArm | shoulder, elbow, wrist |
.torso | neck, shoulders, hips, root |
.leftLeg | hip, knee, ankle |
// Get all body points
let allPoints = try observation.recognizedPoints(.all)
// Get left arm only
let leftArmPoints = try observation.recognizedPoints(.leftArm)
// Get specific joint
let leftWrist = try observation.recognizedPoint(.leftWrist)
Availability : iOS 17+, macOS 14+
Returns 3D skeleton with 17 joints in meters (real-world coordinates):
let request = VNDetectHumanBodyPose3DRequest()
try handler.perform([request])
guard let observation = request.results?.first as? VNHumanBodyPose3DObservation else {
return
}
// Get 3D joint position
let leftWrist = try observation.recognizedPoint(.leftWrist)
let position = leftWrist.position // simd_float4x4 matrix
let localPosition = leftWrist.localPosition // Relative to parent joint
3D Body Landmarks (17 points): Same as 2D except no ears (15 vs 18 2D landmarks)
bodyHeight : Estimated height in meters
heightEstimation : .measured or .reference
cameraOriginMatrix : simd_float4x4 camera position/orientation relative to subject
pointInImage(_:) : Project 3D joint back to 2D image coordinates
let wrist2D = try observation.pointInImage(leftWrist)
VNPoint3D : Base class with simd_float4x4 position matrix
VNRecognizedPoint3D : Adds identifier (joint name)
VNHumanBodyRecognizedPoint3D : Adds localPosition and parentJoint
// Position relative to skeleton root (center of hip)
let modelPosition = leftWrist.position
// Position relative to parent joint (left elbow)
let relativePosition = leftWrist.localPosition
Vision accepts depth data alongside images:
// From AVDepthData
let handler = VNImageRequestHandler(
cvPixelBuffer: imageBuffer,
depthData: depthData,
orientation: orientation
)
// From file (automatic depth extraction)
let handler = VNImageRequestHandler(url: imageURL) // Depth auto-fetched
Depth formats : Disparity or Depth (interchangeable via AVFoundation)
LiDAR : Use in live capture sessions for accurate scale/measurement
Availability : iOS 11+
Detects face bounding boxes:
let request = VNDetectFaceRectanglesRequest()
try handler.perform([request])
for observation in request.results as? [VNFaceObservation] ?? [] {
let faceBounds = observation.boundingBox // Normalized rect
}
Availability : iOS 11+
Detects face with detailed landmarks:
let request = VNDetectFaceLandmarksRequest()
try handler.perform([request])
for observation in request.results as? [VNFaceObservation] ?? [] {
if let landmarks = observation.landmarks {
let leftEye = landmarks.leftEye
let nose = landmarks.nose
let leftPupil = landmarks.leftPupil // Revision 2+
}
}
Revisions :
Availability : iOS 13+
Detects human bounding boxes (torso detection):
let request = VNDetectHumanRectanglesRequest()
try handler.perform([request])
for observation in request.results as? [VNHumanObservation] ?? [] {
let humanBounds = observation.boundingBox // Normalized rect
}
Use case : Faster than pose detection when you only need location
Composite subject on new background using Vision mask:
// 1. Get mask from Vision
let observation = request.results?.first as? VNInstanceMaskObservation
let visionMask = try observation.createScaledMask(
for: observation.allInstances,
croppedToInstancesContent: false
)
// 2. Convert to CIImage
let maskImage = CIImage(cvPixelBuffer: visionMask)
// 3. Apply filter
let filter = CIFilter(name: "CIBlendWithMask")!
filter.setValue(sourceImage, forKey: kCIInputImageKey)
filter.setValue(maskImage, forKey: kCIInputMaskImageKey)
filter.setValue(newBackground, forKey: kCIInputBackgroundImageKey)
let output = filter.outputImage // Composited result
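The filter's `outputImage` is lazy; rendering it requires a CIContext. A minimal sketch (in production, reuse the context — creating one per frame is expensive):

```swift
let context = CIContext()
if let output, let cgImage = context.createCGImage(output, from: output.extent) {
    let result = UIImage(cgImage: cgImage) // composited image ready for display
}
```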
Parameters : kCIInputImageKey (foreground), kCIInputMaskImageKey, and kCIInputBackgroundImageKey, set as shown above
HDR preservation : Core Image preserves high dynamic range from the input (Vision/VisionKit masks are SDR)
Availability : iOS 13+, macOS 10.15+
Recognizes text in images with configurable accuracy/speed trade-off.
let request = VNRecognizeTextRequest()
request.recognitionLevel = .accurate // Or .fast
request.recognitionLanguages = ["en-US", "de-DE"] // Order matters
request.usesLanguageCorrection = true
let handler = VNImageRequestHandler(cgImage: image)
try handler.perform([request])
for observation in request.results as? [VNRecognizedTextObservation] ?? [] {
// Get top candidates
let candidates = observation.topCandidates(3)
let bestText = candidates.first?.string ?? ""
}
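Observations arrive unordered; to assemble a readable transcript, a common pattern is sorting by vertical position (remember Vision's y-axis points up, so a larger minY is higher on the page). A sketch:

```swift
let transcript = (request.results as? [VNRecognizedTextObservation] ?? [])
    .sorted { $0.boundingBox.minY > $1.boundingBox.minY } // top of page first
    .compactMap { $0.topCandidates(1).first?.string }
    .joined(separator: "\n")
```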
| Level | Performance | Accuracy | Best For |
|---|---|---|---|
.fast | Real-time | Good | Camera feed, large text, signs |
.accurate | Slower | Excellent | Documents, receipts, handwriting |
Fast path : Character-by-character recognition (Neural Network → Character Detection)
Accurate path : Full-line ML recognition (Neural Network → Line/Word Recognition)
| Property | Type | Description |
|---|---|---|
| recognitionLevel | VNRequestTextRecognitionLevel | .fast or .accurate |
| recognitionLanguages | [String] | BCP 47 language codes, order = priority |
| usesLanguageCorrection | Bool | Use language model for correction |
| customWords | [String] | Domain-specific vocabulary |
| automaticallyDetectsLanguage | Bool | Auto-detect language (iOS 16+) |
| minimumTextHeight | Float | Min text height as fraction of image (0-1) |
| revision | Int | API version (affects supported languages) |
// Check supported languages for current settings
let languages = try VNRecognizeTextRequest.supportedRecognitionLanguages(
for: .accurate,
revision: VNRecognizeTextRequestRevision3
)
Language correction : Improves accuracy but takes processing time. Disable for codes/serial numbers.
Custom words : Add domain-specific vocabulary for better recognition (medical terms, product codes).
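For example, when reading serial numbers these settings typically go together (the vocabulary below is hypothetical):

```swift
request.usesLanguageCorrection = false        // don't "correct" codes into dictionary words
request.customWords = ["AXM-100", "AXM-200"]  // hypothetical product codes
request.minimumTextHeight = 0.05              // skip tiny background text
```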
boundingBox : Normalized rect containing recognized text
topCandidates(_:) : Returns [VNRecognizedText] ordered by confidence
| Property | Type | Description |
|---|---|---|
string | String | Recognized text |
confidence | VNConfidence | 0.0-1.0 |
boundingBox(for:) | VNRectangleObservation? | Box for substring range |
// Get bounding box for substring
let text = candidate.string
if let range = text.range(of: "invoice") {
let box = try candidate.boundingBox(for: range)
}
Availability : iOS 11+, macOS 10.13+
Detects and decodes barcodes and QR codes.
let request = VNDetectBarcodesRequest()
request.symbologies = [.qr, .ean13, .code128] // Specific codes
let handler = VNImageRequestHandler(cgImage: image)
try handler.perform([request])
for barcode in request.results as? [VNBarcodeObservation] ?? [] {
let payload = barcode.payloadStringValue
let type = barcode.symbology
let bounds = barcode.boundingBox
}
1D Barcodes :
- .codabar (iOS 15+)
- .code39, .code39Checksum, .code39FullASCII, .code39FullASCIIChecksum
- .code93, .code93i
- .code128
- .ean8, .ean13
- .gs1DataBar, .gs1DataBarExpanded, .gs1DataBarLimited (iOS 15+)
- .i2of5, .i2of5Checksum
- .itf14
- .upce

2D Codes :
- .aztec
- .dataMatrix
- .microPDF417 (iOS 15+)
- .microQR (iOS 15+)
- .pdf417
- .qr

Performance : Specifying fewer symbologies = faster detection
| Revision | iOS | Features |
|---|---|---|
| 1 | 11+ | Basic detection, one code at a time |
| 2 | 15+ | Codabar, GS1, MicroPDF, MicroQR, better ROI |
| 3 | 16+ | ML-based, multiple codes, better bounding boxes |
| Property | Type | Description |
|---|---|---|
| payloadStringValue | String? | Decoded content |
| symbology | VNBarcodeSymbology | Barcode type |
| boundingBox | CGRect | Normalized bounds |
| topLeft/topRight/bottomLeft/bottomRight | CGPoint | Corner points |
Availability : iOS 16+
Camera-based live scanner with built-in UI for text and barcodes.
// Hardware support
DataScannerViewController.isSupported
// Runtime availability (camera access, parental controls)
DataScannerViewController.isAvailable
import VisionKit
let dataTypes: Set<DataScannerViewController.RecognizedDataType> = [
.barcode(symbologies: [.qr, .ean13]),
.text(textContentType: .URL), // Or nil for all text
// .text(languages: ["ja"]) // Filter by language
]
let scanner = DataScannerViewController(
recognizedDataTypes: dataTypes,
qualityLevel: .balanced, // .fast, .balanced, .accurate
recognizesMultipleItems: true,
isHighFrameRateTrackingEnabled: true,
isPinchToZoomEnabled: true,
isGuidanceEnabled: true,
isHighlightingEnabled: true
)
scanner.delegate = self
present(scanner, animated: true) {
try? scanner.startScanning()
}
| Type | Description |
|---|---|
.barcode(symbologies:) | Specific barcode types |
.text() | All text |
.text(languages:) | Text filtered by language |
.text(textContentType:) | Text filtered by type (URL, phone, email) |
protocol DataScannerViewControllerDelegate {
func dataScanner(_ dataScanner: DataScannerViewController,
didTapOn item: RecognizedItem)
func dataScanner(_ dataScanner: DataScannerViewController,
didAdd addedItems: [RecognizedItem],
allItems: [RecognizedItem])
func dataScanner(_ dataScanner: DataScannerViewController,
didUpdate updatedItems: [RecognizedItem],
allItems: [RecognizedItem])
func dataScanner(_ dataScanner: DataScannerViewController,
didRemove removedItems: [RecognizedItem],
allItems: [RecognizedItem])
func dataScanner(_ dataScanner: DataScannerViewController,
becameUnavailableWithError error: DataScannerViewController.ScanningUnavailable)
}
enum RecognizedItem {
case text(RecognizedItem.Text)
case barcode(RecognizedItem.Barcode)
var id: UUID { get }
var bounds: RecognizedItem.Bounds { get }
}
// Text item
struct Text {
let transcript: String
}
// Barcode item
struct Barcode {
let payloadStringValue: String?
let observation: VNBarcodeObservation
}
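Handling a tapped item typically switches over the two cases; a minimal delegate sketch:

```swift
func dataScanner(_ dataScanner: DataScannerViewController,
                 didTapOn item: RecognizedItem) {
    switch item {
    case .text(let text):
        print(text.transcript)                      // recognized text
    case .barcode(let barcode):
        print(barcode.payloadStringValue ?? "n/a")  // decoded payload
    @unknown default:
        break
    }
}
```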
// Alternative to delegate
for await items in scanner.recognizedItems {
// Current recognized items
}
// Add custom views over recognized items
scanner.overlayContainerView.addSubview(customHighlight)
// Capture still photo
let photo = try await scanner.capturePhoto()
Availability : iOS 13+
Document scanning with automatic edge detection, perspective correction, and lighting adjustment.
import VisionKit
let camera = VNDocumentCameraViewController()
camera.delegate = self
present(camera, animated: true)
protocol VNDocumentCameraViewControllerDelegate {
func documentCameraViewController(_ controller: VNDocumentCameraViewController,
didFinishWith scan: VNDocumentCameraScan)
func documentCameraViewControllerDidCancel(_ controller: VNDocumentCameraViewController)
func documentCameraViewController(_ controller: VNDocumentCameraViewController,
didFailWithError error: Error)
}
| Property | Type | Description |
|---|---|---|
pageCount | Int | Number of scanned pages |
imageOfPage(at:) | UIImage | Get page image at index |
title | String | User-editable title |
func documentCameraViewController(_ controller: VNDocumentCameraViewController,
didFinishWith scan: VNDocumentCameraScan) {
controller.dismiss(animated: true)
for i in 0..<scan.pageCount {
let pageImage = scan.imageOfPage(at: i)
// Process with VNRecognizeTextRequest
}
}
Availability : iOS 15+, macOS 12+
Detects document boundaries for custom camera UIs or post-processing.
let request = VNDetectDocumentSegmentationRequest()
let handler = VNImageRequestHandler(ciImage: image)
try handler.perform([request])
guard let observation = request.results?.first as? VNRectangleObservation else {
return // No document found
}
// Get corner points (normalized)
let corners = [
observation.topLeft,
observation.topRight,
observation.bottomLeft,
observation.bottomRight
]
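The detected corners feed directly into Core Image's CIPerspectiveCorrection filter to crop and de-skew the page. A sketch (corner points must be scaled from normalized to pixel coordinates first; `image` is the CIImage passed to the handler):

```swift
let size = image.extent.size
func scaled(_ p: CGPoint) -> CGPoint {
    CGPoint(x: p.x * size.width, y: p.y * size.height)
}
let corrected = image.applyingFilter("CIPerspectiveCorrection", parameters: [
    "inputTopLeft": CIVector(cgPoint: scaled(observation.topLeft)),
    "inputTopRight": CIVector(cgPoint: scaled(observation.topRight)),
    "inputBottomLeft": CIVector(cgPoint: scaled(observation.bottomLeft)),
    "inputBottomRight": CIVector(cgPoint: scaled(observation.bottomRight))
])
```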
vs VNDetectRectanglesRequest : VNDetectDocumentSegmentationRequest is machine-learning based and trained specifically on documents; VNDetectRectanglesRequest uses traditional edge detection for arbitrary rectangles.
Availability : iOS 26+, macOS 26+
Structured document understanding with semantic parsing.
let request = RecognizeDocumentsRequest()
let observations = try await request.perform(on: imageData)
guard let document = observations.first?.document else {
return
}
DocumentObservation
└── document: DocumentObservation.Document
├── text: TextObservation
├── tables: [Container.Table]
├── lists: [Container.List]
└── barcodes: [Container.Barcode]
for table in document.tables {
for row in table.rows {
for cell in row {
let text = cell.content.text.transcript
let detectedData = cell.content.text.detectedData
}
}
}
for data in document.text.detectedData {
switch data.match.details {
case .emailAddress(let email):
let address = email.emailAddress
case .phoneNumber(let phone):
let number = phone.phoneNumber
case .link(let url):
let link = url
case .address(let address):
let components = address
case .date(let date):
let dateValue = date
default:
break
}
}
TextObservation
├── transcript: String
├── lines: [TextObservation.Line]
├── paragraphs: [TextObservation.Paragraph]
├── words: [TextObservation.Word]
└── detectedData: [DetectedDataObservation]
Visual Intelligence is a system-level feature (iOS 26+) that lets users point their camera at real-world objects and find matching content across apps. This is distinct from the Vision framework (VNRequest-based image analysis) covered above. Vision analyzes images within your app; Visual Intelligence lets the system invoke your app when users search with the camera or screenshots.
- Implement an IntentValueQuery
- Receive a SemanticContentDescriptor with labels and/or pixel data
- Return AppEntity results

import VisualIntelligence
import AppIntents
The core object the system provides to describe what the user is looking at.
| Property | Type | Description |
|---|---|---|
labels | [String] | Classification labels for the detected item |
pixelBuffer | CVReadOnlyPixelBuffer? | Visual data of the detected item |
Use labels for fast keyword matching against your content catalog. Use the pixel buffer for image-similarity search when labels are insufficient.
The entry point for Visual Intelligence to communicate with your app. Implement values(for:) to receive search requests and return matching entities.
struct LandmarkIntentValueQuery: IntentValueQuery {
@Dependency var modelData: ModelData
func values(for input: SemanticContentDescriptor) async throws -> [LandmarkEntity] {
if !input.labels.isEmpty {
return try await modelData.search(matching: input.labels)
}
guard let pixelBuffer = input.pixelBuffer else { return [] }
return try await modelData.search(matching: pixelBuffer)
}
}
Use @UnionValue when your app can return different entity types from a single search.
@UnionValue
enum VisualSearchResult {
case landmark(LandmarkEntity)
case collection(CollectionEntity)
}
Visual Intelligence uses your entity's DisplayRepresentation to show results. Provide a title, subtitle, and image for each result.
struct LandmarkEntity: AppEntity {
var id: String
var name: String
var location: String
static var typeDisplayRepresentation: TypeDisplayRepresentation {
TypeDisplayRepresentation(
name: LocalizedStringResource("Landmark", table: "AppIntents"),
numericFormat: "\(placeholder: .int) landmarks"
)
}
var displayRepresentation: DisplayRepresentation {
DisplayRepresentation(
title: "\(name)",
subtitle: "\(location)",
image: .init(named: thumbnailImageName)
)
}
}
When a user taps a result, your app should open to the relevant content. Provide an appLinkURL on your entity.
var appLinkURL: URL? {
URL(string: "yourapp://landmark/\(id)")
}
For large result sets, provide a VisualIntelligenceSearchIntent that opens your app's full search UI.
struct ViewMoreLandmarksIntent: AppIntent, VisualIntelligenceSearchIntent {
static var title: LocalizedStringResource = "View More Landmarks"
@Parameter(title: "Semantic Content")
var semanticContent: SemanticContentDescriptor
func perform() async throws -> some IntentResult {
// Open your app's search view with the semantic content
return .result()
}
}
Use LocalizedStringResource for all user-facing text.

| API | Platform | Purpose |
|---|---|---|
VNGenerateForegroundInstanceMaskRequest | iOS 17+ | Class-agnostic subject instances |
VNGeneratePersonInstanceMaskRequest | iOS 17+ | Up to 4 people separately |
VNGeneratePersonSegmentationRequest | iOS 15+ | All people (single mask) |
ImageAnalysisInteraction (VisionKit) | iOS 16+ | UI for subject lifting |
| API | Platform | Landmarks | Coordinates |
|---|---|---|---|
VNDetectHumanHandPoseRequest | iOS 14+ | 21 per hand | 2D normalized |
VNDetectHumanBodyPoseRequest | iOS 14+ | 18 body joints | 2D normalized |
VNDetectHumanBodyPose3DRequest | iOS 17+ | 17 body joints | 3D meters |
| API | Platform | Purpose |
|---|---|---|
VNDetectFaceRectanglesRequest | iOS 11+ | Face bounding boxes |
VNDetectFaceLandmarksRequest | iOS 11+ | Face with detailed landmarks |
VNDetectHumanRectanglesRequest | iOS 13+ | Human torso bounding boxes |
| API | Platform | Purpose |
|---|---|---|
| VNRecognizeTextRequest | iOS 13+ | Text recognition (OCR) |
| VNDetectBarcodesRequest | iOS 11+ | Barcode/QR detection |
| DataScannerViewController | iOS 16+ | Live camera scanner (text + barcodes) |
| VNDocumentCameraViewController | iOS 13+ | Document scanning with perspective correction |
| VNDetectDocumentSegmentationRequest | iOS 15+ | Programmatic document edge detection |
| RecognizeDocumentsRequest | iOS 26+ | Structured document extraction |
| API | Platform | Purpose |
|---|---|---|
SemanticContentDescriptor | iOS 26+ | Describes what the user is looking at (labels + pixel buffer) |
IntentValueQuery | iOS 26+ | Entry point for receiving visual search requests |
VisualIntelligenceSearchIntent | iOS 26+ | "More results" deep link to your app |
| Observation | Returned By |
|---|---|
| VNInstanceMaskObservation | Foreground/person instance masks |
| VNPixelBufferObservation | Person segmentation (single mask) |
| VNHumanHandPoseObservation | Hand pose |
| VNHumanBodyPoseObservation | Body pose (2D) |
| VNHumanBodyPose3DObservation | Body pose (3D) |
| VNFaceObservation | Face detection/landmarks |
| VNHumanObservation | Human rectangles |
| VNRecognizedTextObservation | Text recognition |
| VNBarcodeObservation | Barcode detection |
| VNRectangleObservation | Document segmentation |
| DocumentObservation | Structured document (iOS 26+) |
WWDC : 2019-234, 2021-10041, 2022-10024, 2022-10025, 2025-272, 2023-10176, 2023-111241, 2023-10048, 2020-10653, 2020-10043, 2020-10099
Docs : /vision, /visionkit, /visualintelligence, /visualintelligence/semanticcontentdescriptor, /vision/vnrecognizetextrequest, /vision/vndetectbarcodesrequest
Skills : axiom-vision, axiom-vision-diag