npx skills add https://github.com/letta-ai/skills --skill video-processing
This skill provides guidance for video processing tasks involving frame-level analysis, event detection, and motion tracking using computer vision libraries like OpenCV. It emphasizes verification-first approaches and guards against common pitfalls in video analysis workflows.
Before writing detection algorithms, establish ground truth understanding of the video content:
# Essential first steps for any video analysis task
import cv2
cap = cv2.VideoCapture(video_path)
frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
fps = cap.get(cv2.CAP_PROP_FPS)
duration = frame_count / fps
print(f"Frames: {frame_count}, FPS: {fps}, Duration: {duration:.2f}s")
Critical: Extract frames at expected event locations to verify understanding:
def save_frame(video_path, frame_num, output_path):
    cap = cv2.VideoCapture(video_path)
    cap.set(cv2.CAP_PROP_POS_FRAMES, frame_num)
    ret, frame = cap.read()
    if ret:
        cv2.imwrite(output_path, frame)
    cap.release()
# Save frames at expected event times for visual inspection
save_frame("video.mp4", 50, "frame_050.png")
save_frame("video.mp4", 60, "frame_060.png")
When developing detection algorithms, verify each assumption against saved frames; before finalizing, sanity-check the results and test on multiple inputs.
Compare frames against a reference (the first frame or the previous frame) to detect motion:
# Background subtraction approach
first_frame = cv2.cvtColor(first_frame, cv2.COLOR_BGR2GRAY)
first_frame = cv2.GaussianBlur(first_frame, (21, 21), 0)
# For each subsequent frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (21, 21), 0)
diff = cv2.absdiff(first_frame, gray)
Pitfall: The first frame may not be a suitable reference if the scene changes or the camera moves.
Identify objects by finding contours in the thresholded image:
_, thresh = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
Pitfall: Threshold values (e.g., 25) and minimum contour areas are arbitrary without calibration.
For detecting events like jumps or gestures, track object position across frames:
positions = []  # (frame_num, x, y, area) tuples
for frame_num in range(frame_count):
    # ... detection code ...
    if detected:
        positions.append((frame_num, cx, cy, area))
Pitfall: Coordinate systems matter. In image coordinates, Y increases downward, so "higher in frame" means smaller Y values.
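For example, because Y grows downward, the apex of a jump is the tracked position with the smallest Y, not the largest. A quick sketch using the `positions` list built above (the sample values are made up for illustration):

```python
# (frame_num, x, y, area); y grows downward in image coordinates
positions = [(10, 50, 80, 600), (15, 52, 40, 610), (20, 51, 78, 590)]

# The jump apex is the SMALLEST y value
apex_frame, _, apex_y, _ = min(positions, key=lambda p: p[2])
print(apex_frame, apex_y)  # → 15 40
```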
Save frames at detected event times to verify correctness:
# After detecting takeoff at frame N
save_frame(video_path, detected_takeoff, "detected_takeoff.png")
save_frame(video_path, detected_takeoff - 5, "before_takeoff.png")
save_frame(video_path, detected_takeoff + 5, "after_takeoff.png")
Check if detected events make temporal sense:
duration_seconds = frame_count / fps
event_time = detected_frame / fps
# Example: A jump in a 4-second video shouldn't be detected in the last 0.5 seconds
if event_time > duration_seconds - 0.5:
    print("WARNING: Event detected very late in video - verify correctness")
Ensure events occur in logical order:
if detected_landing <= detected_takeoff:
    print("ERROR: Landing cannot occur before or at takeoff")
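The two checks above can be folded into one sanity-check helper; the function name and the 0.5-second margin are illustrative choices, not prescribed by the skill:

```python
def sanity_check_jump(takeoff, landing, frame_count, fps, margin_s=0.5):
    """Return a list of warnings for temporally implausible jump detections."""
    warnings = []
    duration = frame_count / fps
    if landing <= takeoff:
        warnings.append("landing cannot occur before or at takeoff")
    if landing / fps > duration - margin_s:
        warnings.append("event detected very late in video - verify correctness")
    return warnings

# A landing frame earlier than takeoff should be flagged
print(sanity_check_jump(takeoff=50, landing=40, frame_count=120, fps=30))
```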
Test on multiple inputs early to catch overfitting to single video characteristics.
Problem: Relying entirely on computed metrics without visual confirmation.
Solution: Always save and inspect frames at detected event locations.
Problem: When data shows unexpected patterns, inventing explanations that fit preconceptions rather than questioning assumptions.
Solution: When detection results seem wrong, investigate root causes rather than rationalizing unexpected behavior.
Problem: Using arbitrary thresholds (500 for contour area, 25 for binary threshold) without empirical basis.
Solution: Derive thresholds from actual video data or make them configurable with sensible defaults.
Problem: When detection fails for a range of frames, assuming this is expected behavior without investigation.
Solution: Investigate why detection fails - it may indicate algorithm flaws rather than expected behavior.
Problem: Misinterpreting Y coordinates (smaller Y = higher in frame in image coordinates).
Solution: Explicitly document coordinate system assumptions and verify with visual inspection.
Problem: Accepting detections that don't make temporal sense (e.g., an event detected in the last 0.8 seconds of a 4-second video).
Solution: Implement sanity checks on output timing.
Problem: Algorithm works on one video but fails on others.
Solution: Test on multiple videos early in development.
When outputting results (e.g., to TOML, JSON):
import numpy as np
# Convert numpy types to Python native types for serialization
result = {
    "takeoff_frame": int(takeoff_frame),  # Not np.int64
    "landing_frame": int(landing_frame),
}
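For nested results, a `default=` hook on `json.dumps` avoids converting every field by hand. This sketch relies on `np.generic` being the base class of numpy scalar types (the helper name `to_native` is illustrative):

```python
import json
import numpy as np

def to_native(obj):
    """json.dumps hook: convert numpy scalars and arrays to native Python types."""
    if isinstance(obj, np.generic):
        return obj.item()
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    raise TypeError(f"not JSON serializable: {type(obj)}")

result = {"takeoff_frame": np.int64(52), "peak_y": np.float64(31.5)}
print(json.dumps(result, default=to_native))  # → {"takeoff_frame": 52, "peak_y": 31.5}
```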
When detection results are incorrect, go back to the saved frames and question your assumptions rather than rationalizing the output.