npx skills add https://github.com/letta-ai/skills --skill video-processing
This skill provides guidance for video processing tasks involving frame-level analysis, event detection, and motion tracking using computer vision libraries like OpenCV. It emphasizes verification-first approaches and guards against common pitfalls in video analysis workflows.
Before writing detection algorithms, establish ground truth understanding of the video content:
# Essential first steps for any video analysis task
import cv2
cap = cv2.VideoCapture(video_path)
frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
fps = cap.get(cv2.CAP_PROP_FPS)
duration = frame_count / fps
print(f"Frames: {frame_count}, FPS: {fps}, Duration: {duration:.2f}s")
Critical: Extract frames at expected event locations to verify understanding:
def save_frame(video_path, frame_num, output_path):
    cap = cv2.VideoCapture(video_path)
    cap.set(cv2.CAP_PROP_POS_FRAMES, frame_num)
    ret, frame = cap.read()
    if ret:
        cv2.imwrite(output_path, frame)
    cap.release()
# Save frames at expected event times for visual inspection
save_frame("video.mp4", 50, "frame_050.png")
save_frame("video.mp4", 60, "frame_060.png")
When developing detection algorithms, verify each assumption against saved frames; before finalizing, sanity-check the results and test on multiple inputs.
Compare frames against a reference (the first frame or the previous frame) to detect motion:
# Background subtraction approach
first_frame = cv2.cvtColor(first_frame, cv2.COLOR_BGR2GRAY)
first_frame = cv2.GaussianBlur(first_frame, (21, 21), 0)
# For each subsequent frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (21, 21), 0)
diff = cv2.absdiff(first_frame, gray)
Pitfall: The first frame may not be a suitable reference if the scene changes or the camera moves.
Identify objects by finding contours in the thresholded image:
_, thresh = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
Pitfall: Threshold values (e.g., 25) and minimum contour areas are arbitrary without calibration.
For detecting events like jumps or gestures, track object position across frames:
positions = []  # (frame_num, x, y, area) tuples
for frame_num in range(frame_count):
    # ... detection code ...
    if detected:
        positions.append((frame_num, cx, cy, area))
Pitfall: Coordinate systems matter. In image coordinates, Y increases downward, so "higher in frame" means smaller Y values.
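For example, because Y grows downward, the apex of a jump is the tracked position with the smallest Y, not the largest. A quick sketch using the `positions` list built above (the sample values are made up for illustration):

```python
# (frame_num, x, y, area); y grows downward in image coordinates
positions = [(10, 50, 80, 600), (15, 52, 40, 610), (20, 51, 78, 590)]

# The jump apex is the SMALLEST y value
apex_frame, _, apex_y, _ = min(positions, key=lambda p: p[2])
print(apex_frame, apex_y)  # → 15 40
```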
Save frames at detected event times to verify correctness:
# After detecting takeoff at frame N
save_frame(video_path, detected_takeoff, "detected_takeoff.png")
save_frame(video_path, detected_takeoff - 5, "before_takeoff.png")
save_frame(video_path, detected_takeoff + 5, "after_takeoff.png")
Check if detected events make temporal sense:
duration_seconds = frame_count / fps
event_time = detected_frame / fps
# Example: A jump in a 4-second video shouldn't be detected in the last 0.5 seconds
if event_time > duration_seconds - 0.5:
    print("WARNING: Event detected very late in video - verify correctness")
Ensure events occur in logical order:
if detected_landing <= detected_takeoff:
    print("ERROR: Landing cannot occur before or at takeoff")
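The two checks above can be folded into one sanity-check helper; the function name and the 0.5-second margin are illustrative choices, not prescribed by the skill:

```python
def sanity_check_jump(takeoff, landing, frame_count, fps, margin_s=0.5):
    """Return a list of warnings for temporally implausible jump detections."""
    warnings = []
    duration = frame_count / fps
    if landing <= takeoff:
        warnings.append("landing cannot occur before or at takeoff")
    if landing / fps > duration - margin_s:
        warnings.append("event detected very late in video - verify correctness")
    return warnings

# A landing frame earlier than takeoff should be flagged
print(sanity_check_jump(takeoff=50, landing=40, frame_count=120, fps=30))
```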
Test on multiple inputs early to catch overfitting to single video characteristics.
Problem: Relying entirely on computed metrics without visual confirmation.
Solution: Always save and inspect frames at detected event locations.
Problem: When data shows unexpected patterns, inventing explanations that fit preconceptions rather than questioning assumptions.
Solution: When detection results seem wrong, investigate root causes rather than rationalizing unexpected behavior.
Problem: Using arbitrary thresholds (500 for contour area, 25 for binary threshold) without empirical basis.
Solution: Derive thresholds from actual video data or make them configurable with sensible defaults.
Problem: When detection fails for a range of frames, assuming this is expected behavior without investigation.
Solution: Investigate why detection fails - it may indicate algorithm flaws rather than expected behavior.
Problem: Misinterpreting Y coordinates (smaller Y = higher in frame in image coordinates).
Solution: Explicitly document coordinate system assumptions and verify with visual inspection.
Problem: Accepting detections that don't make temporal sense (e.g., an event detected in the last 0.8 seconds of a 4-second video).
Solution: Implement sanity checks on output timing.
Problem: Algorithm works on one video but fails on others.
Solution: Test on multiple videos early in development.
When outputting results (e.g., to TOML, JSON):
import numpy as np
# Convert numpy types to Python native types for serialization
result = {
    "takeoff_frame": int(takeoff_frame),  # Not np.int64
    "landing_frame": int(landing_frame),
}
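For nested results, a `default=` hook on `json.dumps` avoids converting every field by hand. This sketch relies on `np.generic` being the base class of numpy scalar types (the helper name `to_native` is illustrative):

```python
import json
import numpy as np

def to_native(obj):
    """json.dumps hook: convert numpy scalars and arrays to native Python types."""
    if isinstance(obj, np.generic):
        return obj.item()
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    raise TypeError(f"not JSON serializable: {type(obj)}")

result = {"takeoff_frame": np.int64(52), "peak_y": np.float64(31.5)}
print(json.dumps(result, default=to_native))  # → {"takeoff_frame": 52, "peak_y": 31.5}
```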
When detection results are incorrect, go back to the saved frames and question your assumptions rather than rationalizing the output.