Photo Content Recognition & Curation Expert: AI Photo Analysis, Face Clustering, Duplicate Detection, and Aesthetic Scoring

photo-content-recognition-curation-expert by erichowens/some_claude_skills

146 weekly installs

85 GitHub Stars

GitHub

Install command

npx skills add https://github.com/erichowens/some_claude_skills --skill photo-content-recognition-curation-expert

AI/Machine Learning, Image Processing, Computer Vision


Photo Content Recognition & Curation Expert

Expert in photo content analysis and intelligent curation. Combines classical computer vision with modern deep learning for comprehensive photo analysis.

When to Use This Skill

✅ Use for:

Face recognition and clustering (identifying important people)
Animal/pet detection and clustering
Near-duplicate detection using perceptual hashing (DINOHash, pHash, dHash)
Burst photo selection (finding the best frame among 10-50 shots)
Screenshot vs photo classification
Meme/download filtering
NSFW content detection
Quick indexing for large photo libraries (10K+)
Aesthetic quality scoring (NIMA)

❌ NOT for:

GPS-based location clustering → event-detection-temporal-intelligence-expert
Color palette extraction → color-theory-palette-harmony-expert
Semantic image-text matching → clip-aware-embeddings
Video analysis or frame extraction

Quick Decision Tree

What do you need to recognize/filter?
│
├─ Duplicate photos? ─────────────────────────────── Perceptual Hashing
│   ├─ Exact duplicates? ──────────────────────────── dHash (fastest)
│   ├─ Brightness/contrast changes? ───────────────── pHash (DCT-based)
│   ├─ Heavy crops/compression? ───────────────────── DINOHash (2025 SOTA)
│   └─ Production system? ─────────────────────────── Hybrid (pHash → DINOHash)
│
├─ People in photos? ─────────────────────────────── Face Clustering
│   ├─ Known thresholds? ──────────────────────────── Apple-style Agglomerative
│   └─ Unknown data distribution? ─────────────────── HDBSCAN
│
├─ Pets/Animals? ─────────────────────────────────── Pet Recognition
│   ├─ Detection? ─────────────────────────────────── YOLOv8
│   └─ Individual clustering? ─────────────────────── CLIP + HDBSCAN
│
├─ Best from burst? ──────────────────────────────── Burst Selection
│   └─ Score: sharpness + face quality + aesthetics
│
└─ Filter junk? ──────────────────────────────────── Content Detection
    ├─ Screenshots? ───────────────────────────────── Multi-signal classifier
    └─ NSFW? ──────────────────────────────────────── Safety classifier

Core Concepts

1. Perceptual Hashing for Near-Duplicate Detection

Problem: Camera bursts, re-saved images, and minor edits create near-duplicates.

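Solution: Perceptual hashes map visually similar images to bit strings that differ in only a few positions, so similarity can be measured as Hamming distance (the number of differing bits). A cryptographic hash, by contrast, changes completely after any edit.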

Method Comparison:

Method	Speed	Robustness	Best For
dHash	Fastest	Low	Exact duplicates
pHash	Fast	Medium	Brightness/contrast changes
DINOHash	Slower	High	Heavy crops, compression
Hybrid	Medium	Very High	Production systems
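For reference, the DCT-based pHash row can be sketched in plain NumPy. This is a minimal illustration, not the skill's implementation. It assumes an 8-bit grayscale image given as a 2-D array, uses block averaging in place of a proper image resize, and thresholds the 8×8 low-frequency DCT block at its median (one common pHash variant). In practice the `imagehash` package provides `imagehash.phash` and `imagehash.dhash` for PIL images.

```python
import numpy as np


def area_resize(gray: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Downscale a 2-D array by averaging near-equal pixel blocks (simple area resampling)."""
    row_blocks = np.array_split(np.arange(gray.shape[0]), out_h)
    col_blocks = np.array_split(np.arange(gray.shape[1]), out_w)
    return np.array([[gray[np.ix_(r, c)].mean() for c in col_blocks] for r in row_blocks])


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m


def phash(gray: np.ndarray, hash_size: int = 8, highfreq_factor: int = 4) -> int:
    """DCT perceptual hash; 64 bits for the default hash_size=8."""
    n = hash_size * highfreq_factor  # work on a 32x32 thumbnail by default
    small = area_resize(gray.astype(np.float64), n, n)
    d = dct_matrix(n)
    coeffs = d @ small @ d.T  # separable 2-D DCT-II
    low = coeffs[:hash_size, :hash_size]  # keep only the lowest frequencies
    bits = (low > np.median(low)).flatten()
    return int("".join("1" if b else "0" for b in bits), 2)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")
```

A uniform brightness shift only changes the DC coefficient, and a contrast change scales every coefficient by the same factor, so both leave the comparisons against the median (and therefore the hash) essentially unchanged. That is why pHash tolerates brightness/contrast edits.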

Hybrid Pipeline (2025 Best Practice):

Stage 1: Fast pHash filtering (eliminates obvious non-duplicates)
Stage 2: DINOHash refinement (accurate detection)
Stage 3: Optional Siamese ViT verification
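A minimal sketch of the staged idea, assuming cheap 64-bit hashes (such as pHash) are already computed per photo. The expensive stage is passed in as a callable that stands in for DINOHash or Siamese-ViT verification; this sketch does not assume any particular API for those models. The all-pairs loop is O(n²) and only suits small sets. At 10K+ photos, an index over the cheap hashes (for example a BK-tree) avoids comparing every pair.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def hybrid_duplicates(
    cheap_hashes: Dict[str, int],
    verify: Callable[[str, str], bool],
    prefilter_bits: int = 10,
) -> List[Tuple[str, str]]:
    """Stage 1 drops pairs whose cheap hashes differ a lot; stage 2 confirms the rest."""
    confirmed = []
    for a, b in combinations(sorted(cheap_hashes), 2):
        # Stage 1: a loose Hamming cut, so true near-duplicates are rarely lost here.
        if hamming(cheap_hashes[a], cheap_hashes[b]) > prefilter_bits:
            continue
        # Stage 2: expensive check (hypothetical stand-in for DINOHash / ViT verification).
        if verify(a, b):
            confirmed.append((a, b))
    return confirmed
```

Using the aggressive threshold in stage 1 keeps recall high and leaves precision to the slower stage 2.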

Hamming Distance Thresholds:

Conservative: ≤5 bits different = duplicates
Aggressive: ≤10 bits different = duplicates
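These bit counts only make sense relative to the hash length. The sketch below assumes 64-bit hashes (an 8×8 dHash) and applies both thresholds. It is a minimal NumPy version that takes a grayscale image as a 2-D array and uses block averaging instead of a real resize.

```python
import numpy as np

CONSERVATIVE_BITS = 5   # <=5 differing bits: duplicate (fewer false positives)
AGGRESSIVE_BITS = 10    # <=10 differing bits: duplicate (catches more edits)


def area_resize(gray: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Downscale a 2-D array by averaging near-equal pixel blocks."""
    row_blocks = np.array_split(np.arange(gray.shape[0]), out_h)
    col_blocks = np.array_split(np.arange(gray.shape[1]), out_w)
    return np.array([[gray[np.ix_(r, c)].mean() for c in col_blocks] for r in row_blocks])


def dhash(gray: np.ndarray, hash_size: int = 8) -> int:
    """Difference hash: compare each cell with its right-hand neighbour (8x8 = 64 bits)."""
    small = area_resize(gray.astype(np.float64), hash_size, hash_size + 1)
    bits = (small[:, 1:] > small[:, :-1]).flatten()
    value = 0
    for bit in bits:
        value = (value << 1) | int(bit)
    return value


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def is_duplicate(h1: int, h2: int, max_bits: int = CONSERVATIVE_BITS) -> bool:
    return hamming(h1, h2) <= max_bits
```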

→ Deep dive : references/perceptual-hashing.md

2. Face Recognition & Clustering

Goal: Group photos by person without user labeling.

Apple Photos Strategy (2021-2025):

Extract face + upper body embeddings (FaceNet, 512-dim)
Two-pass agglomerative clustering
Conservative first pass (threshold=0.4, high precision)
HAC second pass (threshold=0.6, higher recall)
Incremental updates for new photos
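A minimal sketch of the two-pass idea with SciPy's hierarchical clustering. It is an illustration under assumptions, not Apple's pipeline. It takes precomputed face embeddings (for example 512-dim FaceNet vectors) as rows of a NumPy array and runs average-linkage HAC at cosine distance 0.4. It then clusters the pass-1 centroids at 0.6, so sub-clusters of the same person (say, with and without glasses) can merge. The upper-body embeddings and incremental updates are left out.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def hac_labels(x: np.ndarray, threshold: float) -> np.ndarray:
    """Average-linkage HAC on cosine distance, cut at `threshold`; returns 0-based labels."""
    if len(x) < 2:
        return np.zeros(len(x), dtype=int)
    z = linkage(x, method="average", metric="cosine")
    raw = fcluster(z, t=threshold, criterion="distance")
    return np.unique(raw, return_inverse=True)[1]  # relabel as 0..k-1


def two_pass_face_clustering(embeddings: np.ndarray, pass1: float = 0.4,
                             pass2: float = 0.6) -> np.ndarray:
    if len(embeddings) == 0:
        return np.zeros(0, dtype=int)
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Pass 1: conservative threshold gives small, high-precision clusters.
    first = hac_labels(emb, pass1)
    # Pass 2: cluster the pass-1 centroids with a looser threshold to raise recall.
    centroids = np.stack([emb[first == c].mean(axis=0) for c in range(first.max() + 1)])
    second = hac_labels(centroids, pass2)
    return second[first]  # map every face to its merged cluster
```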

HDBSCAN Alternative:

No distance threshold to tune (the main parameter is the minimum cluster size)
Robust to noise
Better for unknown data distributions

Parameters:

Setting	Agglomerative	HDBSCAN
Pass 1 threshold	0.4 (cosine)	-
Pass 2 threshold	0.6 (cosine)	-
Min cluster size	-	3 photos
Metric	cosine	cosine

→ Deep dive : references/face-clustering.md

3. Burst Photo Selection

Problem: Burst mode creates 10-50 nearly identical photos.

Multi-Criteria Scoring:

Criterion	Weight	Measurement
Sharpness	30%	Laplacian variance
Face Quality	35%	Eyes open, smiling, face sharpness
Aesthetics	20%	NIMA score
Position	10%	Middle frames bonus
Exposure	5%	Histogram clipping check
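A sketch of the weighted score, assuming frames are 8-bit grayscale NumPy arrays. Face-quality and aesthetic (for example NIMA) scores are assumed to be computed elsewhere and rescaled to [0, 1]. Those two inputs, the per-burst max-normalization of sharpness, and the 5% clipping limit are illustrative assumptions. Laplacian variance uses a 4-neighbour stencil in NumPy; `cv2.Laplacian(gray, cv2.CV_64F).var()` is the usual OpenCV counterpart, which differs only at the borders.

```python
from typing import List, Sequence, Tuple

import numpy as np

WEIGHTS = {"sharpness": 0.30, "face_quality": 0.35, "aesthetics": 0.20,
           "position": 0.10, "exposure": 0.05}


def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of a 4-neighbour Laplacian; higher means more edge detail (sharper)."""
    g = gray.astype(np.float64)
    lap = g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:] - 4.0 * g[1:-1, 1:-1]
    return float(lap.var())


def exposure_score(gray: np.ndarray, clip_limit: float = 0.05) -> float:
    """1.0 with no clipped pixels, 0.0 once `clip_limit` of pixels are pure black/white."""
    clipped = float(np.mean((gray <= 0) | (gray >= 255)))
    return max(0.0, 1.0 - clipped / clip_limit)


def position_score(index: int, count: int) -> float:
    """1.0 for the middle frame, falling linearly to 0.0 at the first/last frame."""
    if count < 2:
        return 1.0
    middle = (count - 1) / 2
    return 1.0 - abs(index - middle) / middle


def best_burst_frame(frames: Sequence[np.ndarray], face_quality: Sequence[float],
                     aesthetics: Sequence[float]) -> Tuple[int, List[float]]:
    sharp = np.array([laplacian_variance(f) for f in frames])
    sharp = sharp / sharp.max() if sharp.max() > 0 else sharp  # scale to [0, 1] in the burst
    scores = [
        WEIGHTS["sharpness"] * sharp[i]
        + WEIGHTS["face_quality"] * face_quality[i]
        + WEIGHTS["aesthetics"] * aesthetics[i]
        + WEIGHTS["position"] * position_score(i, len(frames))
        + WEIGHTS["exposure"] * exposure_score(f)
        for i, f in enumerate(frames)
    ]
    return int(np.argmax(scores)), [float(s) for s in scores]
```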

Burst Detection: Consecutive photos taken within 0.5 seconds of each other are treated as one burst.
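One way to implement the 0.5-second rule, assuming each photo comes with a capture time in seconds (for example parsed from the EXIF `DateTimeOriginal` and `SubSecTimeOriginal` tags). It chains consecutive shots, so a burst continues as long as each frame is within `max_gap` of the previous one. Single photos are not returned as bursts.

```python
from typing import List, Sequence, Tuple


def group_bursts(photos: Sequence[Tuple[str, float]], max_gap: float = 0.5) -> List[List[str]]:
    """Group (name, capture_time_seconds) pairs into bursts of two or more shots."""
    bursts: List[List[str]] = []
    current: List[Tuple[str, float]] = []
    for name, t in sorted(photos, key=lambda p: p[1]):
        if current and t - current[-1][1] > max_gap:
            if len(current) > 1:
                bursts.append([n for n, _ in current])
            current = []  # gap too large: start a new candidate burst
        current.append((name, t))
    if len(current) > 1:
        bursts.append([n for n, _ in current])
    return bursts
```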

→ Deep dive : references/content-detection.md

4. Screenshot Detection

Multi-Signal Approach:

Signal	Confidence	Description
UI elements	0.85	Status bars, buttons detected
Perfect rectangles	0.75	>5 UI buttons (90° angles)
High text	0.70	>25% text coverage (OCR)
No camera EXIF	0.60	Missing Make/Model/Lens
Device aspect	0.60	Exact phone screen ratio
Perfect sharpness	0.50	>2000 Laplacian variance

Decision: Confidence >0.6 = screenshot
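The table does not say how signals combine, so this sketch assumes a noisy-OR: confidence = 1 − Π(1 − cᵢ) over the signals that fired. Under that assumption a single 0.60 signal does not pass the strict `> 0.6` cut, while any two of the listed signals do. The detector outputs (UI elements, button count, OCR text coverage, Laplacian variance) are assumed to come from elsewhere. The set of screen resolutions is a small illustrative sample, not a device database.

```python
from dataclasses import dataclass, field
from typing import Dict

# Illustrative sample of portrait phone screen sizes in pixels (not exhaustive).
PHONE_SCREENS = {(1170, 2532), (1179, 2556), (1284, 2778), (1080, 2400)}


@dataclass
class ImageSignals:
    width: int
    height: int
    exif: Dict[str, str] = field(default_factory=dict)
    ui_elements: bool = False
    rectangle_buttons: int = 0
    text_coverage: float = 0.0       # fraction of the image covered by OCR text boxes
    laplacian_variance: float = 0.0


def screenshot_confidence(s: ImageSignals) -> float:
    fired = []
    if s.ui_elements:
        fired.append(0.85)
    if s.rectangle_buttons > 5:
        fired.append(0.75)
    if s.text_coverage > 0.25:
        fired.append(0.70)
    if not any(s.exif.get(tag) for tag in ("Make", "Model", "LensModel")):
        fired.append(0.60)
    if (min(s.width, s.height), max(s.width, s.height)) in PHONE_SCREENS:
        fired.append(0.60)
    if s.laplacian_variance > 2000:
        fired.append(0.50)
    miss = 1.0
    for c in fired:
        miss *= 1.0 - c  # noisy-OR: chance that every fired signal is a false alarm
    return 1.0 - miss


def is_screenshot(s: ImageSignals, threshold: float = 0.6) -> bool:
    return screenshot_confidence(s) > threshold
```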

→ Deep dive : references/content-detection.md

5. Quick Indexing Pipeline

Goal: Index 10K+ photos efficiently with caching.

Features Extracted:

Perceptual hashes (de-duplication)
Face embeddings (people clustering)
CLIP embeddings (semantic search)
Color palettes
Aesthetic scores

Performance (10K photos, M1 MacBook Pro):

Operation	Time
Perceptual hashing	2 min
CLIP embeddings	3 min (GPU)
Face detection	4 min
Color palettes	1 min
Aesthetic scoring	2 min (GPU)
Clustering + dedup	1 min
Total (first run)	~13 min
Incremental	< 1 min
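The incremental figure depends on caching. A minimal sketch using the standard-library `sqlite3` module: each file is fingerprinted by size and modification time, and features are recomputed only when that fingerprint changes. The `extract` callable is a hypothetical stand-in for the real hash/embedding/score extractors, and features are stored as JSON.

```python
import json
import os
import sqlite3
from typing import Callable, Dict, Iterable


def fingerprint(path: str) -> str:
    """Cheap change detector: file size plus modification time in nanoseconds."""
    st = os.stat(path)
    return f"{st.st_size}:{st.st_mtime_ns}"


def index_incrementally(paths: Iterable[str], extract: Callable[[str], Dict],
                        db_path: str) -> int:
    """Extract features for new or changed files only; returns how many were (re)computed."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS features "
                     "(path TEXT PRIMARY KEY, fp TEXT, data TEXT)")
        computed = 0
        for path in paths:
            fp = fingerprint(path)
            row = conn.execute("SELECT fp FROM features WHERE path = ?", (path,)).fetchone()
            if row is not None and row[0] == fp:
                continue  # unchanged since the last run: reuse cached features
            conn.execute("INSERT OR REPLACE INTO features VALUES (?, ?, ?)",
                         (path, fp, json.dumps(extract(path))))
            computed += 1
        conn.commit()
        return computed
    finally:
        conn.close()
```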

→ Deep dive : references/photo-indexing.md

Common Anti-Patterns

Anti-Pattern: Euclidean Distance for Face Embeddings

What it looks like:

distance = np.linalg.norm(embedding1 - embedding2)  # WRONG

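Why it's wrong: Face embeddings are compared by angle, and the thresholds in this skill (0.4, 0.6) are cosine distances. Euclidean distance on unnormalized embeddings is also driven by vector magnitude, and even on normalized embeddings it runs on a different scale, so those thresholds no longer apply.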

What to do instead:

from scipy.spatial.distance import cosine
distance = cosine(embedding1, embedding2)  # Correct
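A quick NumPy check of how the two metrics relate. For L2-normalized vectors, ‖a − b‖² = 2(1 − cos θ), so Euclidean distance ranks pairs the same way cosine distance does, but on a different scale: a cosine threshold of 0.4 corresponds to a Euclidean distance of √0.8 ≈ 0.89. Without normalization, magnitude differences inflate Euclidean distance even when directions agree. The helper names below are illustrative.

```python
import numpy as np


def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """1 - cos(angle); the same definition as scipy.spatial.distance.cosine."""
    return float(1.0 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def euclidean(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.linalg.norm(a - b))


def cosine_threshold_to_euclidean(t: float) -> float:
    """Euclidean distance between unit vectors whose cosine distance is t."""
    return float(np.sqrt(2.0 * t))
```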

Anti-Pattern: Fixed Clustering Thresholds

What it looks like: Using same distance threshold for all face clusters.

Why it's wrong: Different people have varying intra-class variance (twins vs. diverse ages).

What to do instead: Use HDBSCAN for automatic threshold discovery, or two-pass clustering with conservative + relaxed passes.

Anti-Pattern: Raw Pixel Comparison for Duplicates

What it looks like:

is_duplicate = np.allclose(img1, img2)  # WRONG

Why it's wrong: Re-saved JPEGs, crops, brightness changes create pixel differences.

What to do instead: Perceptual hashing (pHash or DINOHash) with Hamming distance.

Anti-Pattern: Sequential Face Detection

What it looks like: Processing faces one photo at a time without batching.

Why it's wrong: GPU underutilization, 10x slower than batched.

What to do instead: Batch process images (batch_size=32) with GPU acceleration.
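Batched detectors generally need equally sized inputs within one batch; facenet-pytorch's `MTCNN`, for instance, rejects a list of images with differing dimensions. The helper below is a sketch that buckets photos by pixel size and yields batches of at most 32 file names. Loading images and calling the detector are left out.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def size_bucketed_batches(photos: Sequence[Tuple[str, Tuple[int, int]]],
                          batch_size: int = 32) -> Iterator[List[str]]:
    """Yield lists of file names that share one (width, height), at most batch_size long."""
    buckets: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    for name, size in photos:
        buckets[size].append(name)
    for size in sorted(buckets):
        names = buckets[size]
        for start in range(0, len(names), batch_size):
            yield names[start:start + batch_size]
```

Each yielded batch can then be loaded, moved to the GPU, and passed to the detector in a single call instead of one call per photo.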

Anti-Pattern: No Confidence Filtering

What it looks like:

for face in all_detected_faces:
    cluster(face)  # No filtering

Why it's wrong: Low-confidence detections create noise clusters (hands, objects).

What to do instead: Filter by confidence (threshold 0.9 for faces).
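A minimal filter, assuming each detection is a dict holding the detector probability and a bounding box `(x1, y1, x2, y2)` in pixels. The 0.9 cut comes from the text. The optional `min_size` guard against tiny background faces is a hypothetical addition and is disabled by default.

```python
from typing import Dict, List, Sequence


def filter_faces(detections: Sequence[Dict], min_conf: float = 0.9,
                 min_size: float = 0.0) -> List[Dict]:
    """Keep confident detections and, optionally, only faces at least min_size px wide/tall."""
    kept = []
    for det in detections:
        x1, y1, x2, y2 = det["box"]
        if det["prob"] >= min_conf and min(x2 - x1, y2 - y1) >= min_size:
            kept.append(det)
    return kept
```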

Anti-Pattern: Forcing Every Photo into Clusters

What it looks like: Assigning noise points to nearest cluster.

Why it's wrong: People who appear only once (strangers, passers-by) shouldn't pollute person clusters.

What to do instead: HDBSCAN/DBSCAN naturally identifies noise (label=-1). Keep noise separate.
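Keeping noise separate can be as simple as never routing label -1 into a person cluster. A sketch that splits clustering output (for example from HDBSCAN) into per-person groups plus an unclustered list:

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def split_clusters(photo_ids: Sequence[str],
                   labels: Sequence[int]) -> Tuple[Dict[int, List[str]], List[str]]:
    """Return ({cluster_label: photo_ids}, unclustered_photo_ids)."""
    people: Dict[int, List[str]] = defaultdict(list)
    unclustered: List[str] = []
    for pid, label in zip(photo_ids, labels):
        if label == -1:
            unclustered.append(pid)  # noise: keep out of every person cluster
        else:
            people[label].append(pid)
    return dict(people), unclustered
```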

Quick Start

from photo_curation import PhotoCurationPipeline

pipeline = PhotoCurationPipeline()

# Index photo library
index = pipeline.index_library('/path/to/photos')

# De-duplicate
duplicates = index.find_duplicates()
print(f"Found {len(duplicates)} duplicate groups")

# Cluster faces
face_clusters = index.cluster_faces()
print(f"Found {len(face_clusters)} people")

# Select best from bursts
best_photos = pipeline.select_best_from_bursts(index)

# Filter screenshots
real_photos = pipeline.filter_screenshots(index)

# Curate for collage
collage_photos = pipeline.curate_for_collage(index, target_count=100)

Python Dependencies

torch transformers facenet-pytorch ultralytics hdbscan opencv-python scipy numpy scikit-learn pillow pytesseract

Integration Points

event-detection-temporal-intelligence-expert : Provides temporal event clustering for event-aware curation
color-theory-palette-harmony-expert : Extracts color palettes for visual diversity
collage-layout-expert : Receives curated photos for assembly
clip-aware-embeddings : Provides CLIP embeddings for semantic search and DeepDBSCAN

References

DINOHash (2025) : "Adversarially Fine-Tuned DINOv2 Features for Perceptual Hashing"
Apple Photos (2021) : "Recognizing People in Photos Through Private On-Device ML"
HDBSCAN : "Hierarchical Density-Based Spatial Clustering" (2013-2025)
Perceptual Hashing : dHash (Neal Krawetz), DCT-based pHash

Version : 2.0.0 Last Updated : November 2025


First Seen: Jan 24, 2026

Security Audits: Gen Agent Trust Hub: Warn, Socket: Pass, Snyk: Pass

Installed on: opencode (108), codex (107), gemini-cli (106), cursor (106), github-copilot (102), kimi-cli (95)
