senior-computer-vision by alirezarezvani/claude-skills
npx skills add https://github.com/alirezarezvani/claude-skills --skill senior-computer-vision
Production computer vision engineering skill for object detection, image segmentation, and visual AI system deployment.
# Generate training configuration for YOLO or Faster R-CNN
python scripts/vision_model_trainer.py models/ --task detection --arch yolov8
# Analyze model for optimization opportunities (quantization, pruning)
python scripts/inference_optimizer.py model.pt --target onnx --benchmark
# Build dataset pipeline with augmentations
python scripts/dataset_pipeline_builder.py images/ --format coco --augment
This skill provides guidance across the following technology stack:
| Category | Technologies |
|---|---|
| Frameworks | PyTorch, torchvision, timm |
| Detection | Ultralytics (YOLO), Detectron2, MMDetection |
| Segmentation | segment-anything, mmsegmentation |
| Optimization | ONNX, TensorRT, OpenVINO, torch.compile |
| Image Processing | OpenCV, Pillow, albumentations |
| Annotation | CVAT, Label Studio, Roboflow |
| Experiment Tracking | MLflow, Weights & Biases |
| Serving | Triton Inference Server, TorchServe |
Use this workflow when building an object detection system from scratch.
Analyze the detection task requirements:
Detection Requirements Analysis:
- Target objects: [list specific classes to detect]
- Real-time requirement: [yes/no, target FPS]
- Accuracy priority: [speed vs accuracy trade-off]
- Deployment target: [cloud GPU, edge device, mobile]
- Dataset size: [number of images, annotations per class]
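The checklist can be captured in code. A minimal sketch in which the class, field names, and selection rules are illustrative assumptions, not part of this skill's scripts:

```python
from dataclasses import dataclass

@dataclass
class DetectionRequirements:
    """Captures the requirements checklist above (field names are illustrative)."""
    classes: list
    target_fps: float = 0.0        # 0 means no real-time constraint
    deployment: str = "cloud-gpu"  # "cloud-gpu" | "edge" | "mobile"
    dataset_images: int = 0

def suggest_architecture(req):
    """Rule-of-thumb suggestion only; a heuristic sketch, not a definitive chooser."""
    if req.deployment in ("edge", "mobile"):
        return "YOLOv8n"           # lightweight, single-stage
    if req.target_fps >= 30:
        return "YOLOv8 / RT-DETR"  # real-time budget
    return "Faster R-CNN / DINO"   # accuracy-first
```

Writing the requirements down as a structured object keeps them testable as the project evolves.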
Choose architecture based on requirements:
| Requirement | Recommended Architecture | Why |
|---|---|---|
| Real-time (>30 FPS) | YOLOv8/v11, RT-DETR | Single-stage, optimized for speed |
| High accuracy | Faster R-CNN, DINO | Two-stage, better localization |
| Small objects | YOLO + SAHI, Faster R-CNN + FPN | Multi-scale detection |
| Edge deployment | YOLOv8n, MobileNetV3-SSD | Lightweight architectures |
| Transformer-based | DETR, DINO, RT-DETR | End-to-end, no NMS required |
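Single-stage CNN detectors depend on non-maximum suppression as a post-processing step, which is exactly what the DETR family avoids. A minimal greedy NMS, for illustration:

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) corner format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < iou_thresh]
    return keep
```

Production runtimes use vectorized or fused NMS kernels; the logic is the same.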
Convert annotations to the required format:
# COCO format (recommended)
python scripts/dataset_pipeline_builder.py data/images/ \
--annotations data/labels/ \
--format coco \
--split 0.8 0.1 0.1 \
--output data/coco/
# Verify dataset
python -c "from pycocotools.coco import COCO; coco = COCO('data/coco/train.json'); print(f'Images: {len(coco.imgs)}, Categories: {len(coco.cats)}')"
Generate training configuration:
# For Ultralytics YOLO
python scripts/vision_model_trainer.py data/coco/ \
--task detection \
--arch yolov8m \
--epochs 100 \
--batch 16 \
--imgsz 640 \
--output configs/
# For Detectron2
python scripts/vision_model_trainer.py data/coco/ \
--task detection \
--arch faster_rcnn_R_50_FPN \
--framework detectron2 \
--output configs/
# Ultralytics training
yolo detect train data=data.yaml model=yolov8m.pt epochs=100 imgsz=640
# Detectron2 training
python train_net.py --config-file configs/faster_rcnn.yaml --num-gpus 1
# Validate on test set
yolo detect val model=runs/detect/train/weights/best.pt data=data.yaml
Key metrics to analyze:
| Metric | Target | Description |
|---|---|---|
| mAP@50 | >0.7 | Mean Average Precision at IoU 0.5 |
| mAP@50:95 | >0.5 | COCO primary metric |
| Precision | >0.8 | Low false positives |
| Recall | >0.8 | Low missed detections |
| Inference time | <33ms | For 30 FPS real-time |
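How mAP is built up: for each class, detections are ranked by confidence, matched to ground truth at an IoU threshold, and precision is averaged over the true positives. A simplified sketch (real COCO evaluation also interpolates the curve and averages over IoU 0.5:0.95):

```python
def average_precision(matches, num_gt):
    """AP for one class from a confidence-ranked match list (1 = TP, 0 = FP).

    Mean of precision at each true-positive rank; num_gt is the total number
    of ground-truth objects for the class, so undetected objects hurt recall.
    """
    tp = fp = 0
    precisions = []
    for m in matches:
        tp += m
        fp += 1 - m
        if m:
            precisions.append(tp / (tp + fp))
    return sum(precisions) / num_gt if num_gt else 0.0
```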
Use this workflow when preparing a trained model for production deployment.
# Measure current model performance
python scripts/inference_optimizer.py model.pt \
--benchmark \
--input-size 640 640 \
--batch-sizes 1 4 8 16 \
--warmup 10 \
--iterations 100
Expected output:
Baseline Performance (PyTorch FP32):
- Batch 1: 45.2ms (22.1 FPS)
- Batch 4: 89.4ms (44.7 FPS)
- Batch 8: 165.3ms (48.4 FPS)
- Memory: 2.1 GB
- Parameters: 25.9M
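The benchmarking pattern (warm-up runs, then timed iterations) is easy to reproduce for any callable. A generic sketch, where the wrapped inference call in the usage line is an assumption:

```python
import time
import statistics

def benchmark(fn, warmup=10, iterations=100):
    """Latency harness mirroring the flags above: discard warm-up runs,
    then report mean latency (ms) and throughput (calls/s)."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    mean_s = statistics.mean(samples)
    return {"mean_ms": mean_s * 1e3, "fps": 1.0 / mean_s}

# usage (hypothetical ONNX Runtime call):
# benchmark(lambda: session.run(None, {"images": batch}))
```

Warm-up matters: the first iterations pay for CUDA context creation, JIT compilation, and cache population, and would skew the mean.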
| Deployment Target | Optimization Path |
|---|---|
| NVIDIA GPU (cloud) | PyTorch → ONNX → TensorRT FP16 |
| NVIDIA GPU (edge) | PyTorch → TensorRT INT8 |
| Intel CPU | PyTorch → ONNX → OpenVINO |
| Apple Silicon | PyTorch → CoreML |
| Generic CPU | PyTorch → ONNX Runtime |
| Mobile | PyTorch → TFLite or ONNX Mobile |
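The table can be mirrored as a simple lookup. A hypothetical helper, with stage names as labels rather than commands:

```python
# Labels only; each stage corresponds to an export/convert step shown below.
EXPORT_PATHS = {
    "nvidia-cloud": ("pytorch", "onnx", "tensorrt-fp16"),
    "nvidia-edge":  ("pytorch", "tensorrt-int8"),
    "intel-cpu":    ("pytorch", "onnx", "openvino"),
    "apple":        ("pytorch", "coreml"),
    "generic-cpu":  ("pytorch", "onnxruntime"),
    "mobile":       ("pytorch", "tflite"),
}

def export_path(target):
    """Render the conversion chain for a deployment target."""
    return " -> ".join(EXPORT_PATHS[target])
```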
# Export with dynamic batch size
python scripts/inference_optimizer.py model.pt \
--export onnx \
--input-size 640 640 \
--dynamic-batch \
--simplify \
--output model.onnx
# Verify ONNX model
python -c "import onnx; model = onnx.load('model.onnx'); onnx.checker.check_model(model); print('ONNX model valid')"
For INT8 quantization with calibration:
# Quantize using a held-out calibration dataset
python scripts/inference_optimizer.py model.onnx \
--quantize int8 \
--calibration-data data/calibration/ \
--calibration-samples 500 \
--output model_int8.onnx
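What calibration ultimately produces is a scale factor for rounding weights and activations into 8-bit range. A pure-Python sketch of symmetric per-tensor quantization:

```python
def int8_quantize(values):
    """Symmetric per-tensor INT8 quantization.

    Calibration picks the scale from observed activation ranges; here we use
    the max-abs value. Rounding error is bounded by scale/2 per element.
    """
    scale = max(abs(v) for v in values) / 127.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    dequantized = [q * scale for q in quantized]
    return quantized, dequantized, scale
```

Real toolchains refine this with per-channel scales, entropy-based calibration, and handling of outliers, which is why a representative calibration set matters.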
Quantization impact analysis:
| Precision | Size | Speed | Accuracy Drop |
|---|---|---|---|
| FP32 | 100% | 1x | 0% |
| FP16 | 50% | 1.5-2x | <0.5% |
| INT8 | 25% | 2-4x | 1-3% |
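The size column follows directly from bytes per parameter; using the 25.9M-parameter baseline from the benchmark above:

```python
def model_size_mb(num_params, bits_per_param):
    """Approximate weight-storage size; runtime memory also holds
    activations and workspace, so treat this as a floor."""
    return num_params * bits_per_param / 8 / 1e6

params = 25.9e6  # parameter count from the baseline benchmark above
for name, bits in (("FP32", 32), ("FP16", 16), ("INT8", 8)):
    print(f"{name}: {model_size_mb(params, bits):.1f} MB")
```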
# TensorRT (NVIDIA GPU)
trtexec --onnx=model.onnx --saveEngine=model.engine --fp16
# OpenVINO (Intel)
mo --input_model model.onnx --output_dir openvino/
# CoreML (Apple; note: recent coremltools releases convert from TorchScript/TF, so direct ONNX input may require the legacy onnx-coreml path)
python -c "import coremltools as ct; model = ct.convert('model.onnx'); model.save('model.mlpackage')"
python scripts/inference_optimizer.py model.engine \
--benchmark \
--runtime tensorrt \
--compare model.pt
Expected speedup:
Optimization Results:
- Original (PyTorch FP32): 45.2ms
- Optimized (TensorRT FP16): 12.8ms
- Speedup: 3.5x
- Accuracy change: -0.3% mAP
Use this workflow when preparing a computer vision dataset for training.
# Analyze image dataset
python scripts/dataset_pipeline_builder.py data/raw/ \
--analyze \
--output analysis/
Analysis report includes:
Dataset Analysis:
- Total images: 5,234
- Image sizes: 640x480 to 4096x3072 (variable)
- Formats: JPEG (4,891), PNG (343)
- Corrupted: 12 files
- Duplicates: 45 pairs
Annotation Analysis:
- Format detected: Pascal VOC XML
- Total annotations: 28,456
- Classes: 5 (car, person, bicycle, dog, cat)
- Distribution: car (12,340), person (8,234), bicycle (3,456), dog (2,890), cat (1,536)
- Empty images: 234
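A distribution like this is heavily imbalanced. Computing the imbalance ratio and inverse-frequency loss weights (one common mitigation, alongside oversampling rare classes) from the reported counts:

```python
counts = {"car": 12_340, "person": 8_234, "bicycle": 3_456, "dog": 2_890, "cat": 1_536}

total = sum(counts.values())  # 28,456, matching the report above
ratio = max(counts.values()) / min(counts.values())
# inverse-frequency weights: rare classes get larger loss weights
weights = {c: total / (len(counts) * n) for c, n in counts.items()}
print(f"imbalance ratio {ratio:.1f}:1")  # roughly 8:1, worth addressing before training
```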
# Remove corrupted and duplicate images
python scripts/dataset_pipeline_builder.py data/raw/ \
--clean \
--remove-corrupted \
--remove-duplicates \
--output data/cleaned/
# Convert VOC to COCO format
python scripts/dataset_pipeline_builder.py data/cleaned/ \
--annotations data/annotations/ \
--input-format voc \
--output-format coco \
--output data/coco/
Supported format conversions:
| From | To |
|---|---|
| Pascal VOC XML | COCO JSON |
| YOLO TXT | COCO JSON |
| COCO JSON | YOLO TXT |
| LabelMe JSON | COCO JSON |
| CVAT XML | COCO JSON |
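These conversions mostly reduce to bounding-box arithmetic: VOC stores corners, COCO stores top-left plus size, and YOLO stores a normalized center. A sketch of the box math:

```python
def voc_to_coco(xmin, ymin, xmax, ymax):
    """Pascal VOC corner box -> COCO [x, y, width, height]."""
    return [xmin, ymin, xmax - xmin, ymax - ymin]

def coco_to_yolo(x, y, w, h, img_w, img_h):
    """COCO top-left box -> YOLO normalized [cx, cy, w, h]."""
    return [(x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h]
```

The full converters also remap category IDs and handle VOC's occasional 1-based pixel indexing, which this sketch ignores.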
# Generate augmentation config
python scripts/dataset_pipeline_builder.py data/coco/ \
--augment \
--aug-config configs/augmentation.yaml \
--output data/augmented/
Recommended augmentations for detection:
# configs/augmentation.yaml
augmentations:
geometric:
- horizontal_flip: { p: 0.5 }
- vertical_flip: { p: 0.1 } # Only if orientation invariant
- rotate: { limit: 15, p: 0.3 }
- scale: { scale_limit: 0.2, p: 0.5 }
color:
- brightness_contrast: { brightness_limit: 0.2, contrast_limit: 0.2, p: 0.5 }
- hue_saturation: { hue_shift_limit: 20, sat_shift_limit: 30, p: 0.3 }
- blur: { blur_limit: 3, p: 0.1 }
advanced:
- mosaic: { p: 0.5 } # YOLO-style mosaic
- mixup: { p: 0.1 } # Image mixing
- cutout: { num_holes: 8, max_h_size: 32, max_w_size: 32, p: 0.3 }
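Geometric augmentations must transform the boxes along with the pixels; libraries such as albumentations do this automatically when configured with bbox parameters. The horizontal-flip case for a COCO box, as a sketch:

```python
def hflip_coco_bbox(bbox, img_w):
    """Horizontal flip of a COCO [x, y, w, h] box: only the x origin moves."""
    x, y, w, h = bbox
    return [img_w - x - w, y, w, h]
```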
python scripts/dataset_pipeline_builder.py data/augmented/ \
--split 0.8 0.1 0.1 \
--stratify \
--seed 42 \
--output data/final/
Split strategy guidelines:
| Dataset Size | Train | Val | Test |
|---|---|---|---|
| <1,000 images | 70% | 15% | 15% |
| 1,000-10,000 | 80% | 10% | 10% |
| >10,000 | 90% | 5% | 5% |
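The seeded split above can be sketched in a few lines; note this version shuffles globally, while true stratification (the --stratify flag) also balances classes across splits:

```python
import random

def split_dataset(items, fractions=(0.8, 0.1, 0.1), seed=42):
    """Deterministic train/val/test split: seeded shuffle, then slicing.

    The fixed seed makes the split reproducible across runs, which matters
    for comparing experiments against a stable test set.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(fractions[0] * len(items))
    n_val = int(fractions[1] * len(items))
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])
```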
# For Ultralytics YOLO
python scripts/dataset_pipeline_builder.py data/final/ \
--generate-config yolo \
--output data.yaml
# For Detectron2
python scripts/dataset_pipeline_builder.py data/final/ \
--generate-config detectron2 \
--output detectron2_config.py
| Architecture | Speed | Accuracy | Best For |
|---|---|---|---|
| YOLOv8n | 1.2ms | 37.3 mAP | Edge, mobile, real-time |
| YOLOv8s | 2.1ms | 44.9 mAP | Balanced speed/accuracy |
| YOLOv8m | 4.2ms | 50.2 mAP | General purpose |
| YOLOv8l | 6.8ms | 52.9 mAP | High accuracy |
| YOLOv8x | 10.1ms | 53.9 mAP | Maximum accuracy |
| RT-DETR-L | 5.3ms | 53.0 mAP | Transformer, no NMS |
| Faster R-CNN R50 | 46ms | 40.2 mAP | Two-stage, high quality |
| DINO-4scale | 85ms | 49.0 mAP | SOTA transformer |
| Architecture | Type | Speed | Best For |
|---|---|---|---|
| YOLOv8-seg | Instance | 4.5ms | Real-time instance seg |
| Mask R-CNN | Instance | 67ms | High-quality masks |
| SAM | Promptable | 50ms | Zero-shot segmentation |
| DeepLabV3+ | Semantic | 25ms | Scene parsing |
| SegFormer | Semantic | 15ms | Efficient semantic seg |
| Aspect | CNN (YOLO, R-CNN) | ViT (DETR, DINO) |
|---|---|---|
| Training data needed | 1K-10K images | 10K-100K+ images |
| Training time | Fast | Slow (needs more epochs) |
| Inference speed | Faster | Slower |
| Small objects | Good with FPN | Needs multi-scale |
| Global context | Limited | Excellent |
| Positional encoding | Implicit | Explicit |
→ See references/reference-docs-and-commands.md for details
| Metric | Real-time | High Accuracy | Edge |
|---|---|---|---|
| FPS | >30 | >10 | >15 |
| mAP@50 | >0.6 | >0.8 | >0.5 |
| Latency P99 | <50ms | <150ms | <100ms |
| GPU Memory | <4GB | <8GB | <2GB |
| Model Size | <50MB | <200MB | <20MB |
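These thresholds can be enforced as a release gate; a hypothetical checker built from the table:

```python
TARGETS = {  # thresholds from the table above
    "real-time":     {"min_fps": 30, "min_map50": 0.6, "max_p99_ms": 50,  "max_gpu_gb": 4, "max_model_mb": 50},
    "high-accuracy": {"min_fps": 10, "min_map50": 0.8, "max_p99_ms": 150, "max_gpu_gb": 8, "max_model_mb": 200},
    "edge":          {"min_fps": 15, "min_map50": 0.5, "max_p99_ms": 100, "max_gpu_gb": 2, "max_model_mb": 20},
}

def failed_checks(measured, profile):
    """Return the names of metrics that miss the profile's targets."""
    t = TARGETS[profile]
    failures = []
    if measured["fps"] < t["min_fps"]:
        failures.append("fps")
    if measured["map50"] < t["min_map50"]:
        failures.append("map50")
    if measured["p99_ms"] > t["max_p99_ms"]:
        failures.append("p99_ms")
    if measured["gpu_gb"] > t["max_gpu_gb"]:
        failures.append("gpu_gb")
    if measured["model_mb"] > t["max_model_mb"]:
        failures.append("model_mb")
    return failures
```

An empty failure list means the model clears the profile; anything else should block the deploy.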
Additional resources:
- references/computer_vision_architectures.md
- references/object_detection_optimization.md
- references/production_vision_systems.md
- scripts/ directory for automation tools