ffmpeg-analyse-video by fabriqaai/ffmpeg-analyse-video-skill
npx skills add https://github.com/fabriqaai/ffmpeg-analyse-video-skill --skill ffmpeg-analyse-video
Extract frames from video files with ffmpeg. Delegate frame reading to sub-agents to preserve the main context window. Synthesise a structured timestamped summary from text-only sub-agent reports.
Problem: Reading dozens of images into the main conversation context consumes most of the context window, leaving little room for synthesis and follow-up.
Solution: A 3-phase pipeline:
Main Agent                          Sub-Agents (disposable context)
──────────                          ──────────────────────────────
1. ffprobe metadata           ───►
2. ffmpeg frame extraction    ───►
3. Split frames into batches  ──►   4. Read images (vision)
                                       Write text descriptions
                                       to batch_N_analysis.md
5. Read text files only       ◄───  (context discarded)
6. Synthesise final output
Images only ever exist inside sub-agent contexts. The main agent only reads lightweight text files. This cuts context usage by ~90%.
which ffmpeg && which ffprobe
If either is missing, show platform-specific install instructions and STOP:
macOS: brew install ffmpeg
Linux (Debian/Ubuntu): sudo apt install ffmpeg
Windows: choco install ffmpeg or winget install ffmpeg
# macOS/Linux
TMPDIR="/tmp/video-analysis-$(date +%s)"
mkdir -p "$TMPDIR"
# Windows (PowerShell)
# $TMPDIR = "$env:TEMP\video-analysis-$(Get-Date -UFormat %s)"
# New-Item -ItemType Directory -Path $TMPDIR
ffprobe -v quiet -print_format json -show_format -show_streams "VIDEO_PATH"
Extract and report: duration, resolution (width x height), fps, codec, file size, whether audio is present.
If no video stream is found, report "audio-only file" and STOP. If file size > 2GB, warn the user and suggest analysing a time range with -ss START -to END.
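
ffprobe's JSON output gives the frame rate as a fraction string (for example in a stream's r_frame_rate field) and the file size in bytes (format.size). The sketch below shows one way to turn those raw values into the reported fps and the 2 GB check. The helper names are hypothetical, and reading "2GB" as 2 × 1024³ bytes is an assumption.

```shell
# Sketch only: helper names are hypothetical, not part of the skill.

# Convert an ffprobe frame-rate string such as "30000/1001" to decimal fps.
fps_from_ratio() {
  echo "$1" | awk -F/ '{
    if ($2 == "" || $2 == 0) print $1      # no usable denominator: pass the numerator through
    else printf "%.2f\n", $1 / $2
  }'
}

# Succeed (exit status 0) when a size in bytes exceeds 2 GB.
# Assumption: "2GB" is read as 2 * 1024^3 bytes.
exceeds_size_limit() {
  [ "$1" -gt 2147483648 ]
}
```

For instance, `exceeds_size_limit "$SIZE_BYTES" && echo "Large file: consider -ss START -to END"` would print the warning only for oversized inputs, where SIZE_BYTES is a placeholder for the value read from the ffprobe output.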
Choose strategy based on duration:
| Duration | Strategy | Command |
|---|---|---|
| 0-60s | 1 frame every 2s | ffmpeg -hide_banner -y -i INPUT -vf "fps=1/2,scale='min(1280,iw)':-2" -q:v 5 DIR/frame_%04d.jpg |
| 1-10min | Scene detection (threshold 0.3) | ffmpeg -hide_banner -y -i INPUT -vf "select='gt(scene,0.3)',scale='min(1280,iw)':-2" -vsync vfr -q:v 5 DIR/scene_%04d.jpg |
| 10-30min | Keyframe extraction | ffmpeg -hide_banner -y -skip_frame nokey -i INPUT -vf "scale='min(1280,iw)':-2" -vsync vfr -q:v 5 DIR/key_%04d.jpg |
| 30min+ | Thumbnail filter | ffmpeg -hide_banner -y -i INPUT -vf "thumbnail=SEGMENT_FRAMES,scale='min(1280,iw)':-2" -vsync vfr -q:v 5 DIR/thumb_%04d.jpg |
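
The duration bands in the table can be expressed as a small selector. This is a minimal sketch: the function name and the returned labels are hypothetical, and because the bands share their boundary values (60 s, 10 min, 30 min), it assumes a boundary value falls into the shorter band.

```shell
# Sketch only: choose_strategy and its labels are hypothetical names.
# Prints fps, scene, keyframe, or thumbnail for a duration in seconds.
choose_strategy() {
  cs_secs=${1%.*}            # drop any fractional part, e.g. 125.400000 -> 125
  cs_secs=${cs_secs:-0}      # treat an empty whole part as 0
  if [ "$cs_secs" -le 60 ]; then
    echo "fps"               # 0-60s: 1 frame every 2s
  elif [ "$cs_secs" -le 600 ]; then
    echo "scene"             # 1-10min: scene detection
  elif [ "$cs_secs" -le 1800 ]; then
    echo "keyframe"          # 10-30min: keyframe extraction
  else
    echo "thumbnail"         # 30min+: thumbnail filter
  fi
}
```
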
For the thumbnail filter, calculate SEGMENT_FRAMES = total_frames / 60 to cap the output at roughly 60 frames.
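
As a worked example of that calculation, the sketch below estimates total_frames as duration × fps, which is an assumption (a frame count reported by ffprobe could be used instead when available). The function name is hypothetical.

```shell
# Sketch only: segment_frames is a hypothetical helper name.
# $1: duration in seconds, $2: frames per second (decimals allowed).
# Assumption: total_frames is approximated as duration * fps.
segment_frames() {
  awk -v d="$1" -v f="$2" 'BEGIN {
    n = int(d * f / 60)      # total_frames / 60, rounded down
    if (n < 1) n = 1         # the filter needs a batch of at least one frame
    print n
  }'
}
```

A one-hour video at 30 fps has about 108,000 frames, so `segment_frames 3600 30` yields 1800, and picking one frame per 1800-frame segment gives the intended 60 or so output frames.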
Fallbacks:
Time range analysis: When the user specifies a range, add -ss START -to END before -i.
Higher detail mode: If requested, double the fps rate and lower the scene threshold to 0.2.
After extraction, list all frame files and calculate each frame's timestamp from its sequence number and the extraction rate.
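
For the fixed-rate strategy, the timestamp follows directly from the sequence number. The sketch below assumes that frame N (1-based, as in frame_0001.jpg) sits at roughly (N − 1) × interval seconds and that the interval is a whole number of seconds. It does not cover scene-detection or keyframe output, where frames are not evenly spaced. The function name is hypothetical.

```shell
# Sketch only: frame_timestamp is a hypothetical helper name.
# $1: sequence number as it appears in the file name (e.g. 0009)
# $2: whole seconds between extracted frames (2 for fps=1/2)
frame_timestamp() {
  ft_n=${1#"${1%%[!0]*}"}    # strip leading zeros so 0009 is not parsed as octal
  ft_n=${ft_n:-0}
  ft_secs=$(( (ft_n - 1) * $2 ))
  printf '%02d:%02d\n' $(( ft_secs / 60 )) $(( ft_secs % 60 ))
}
```
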
This is the critical context-saving step. Do NOT read frame images in the main conversation. Instead, split frames into batches and delegate each batch to a sub-agent.
Split the extracted frame file list into batches of 8-10 frames each. For each batch, record:
TMPDIR/batch_N_analysis.md
For each batch, spawn a sub-agent with the prompt below. Launch all batches in parallel where the tool supports it; they are fully independent.
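
One possible way to do the split is to tag each frame path with a batch number. This is a sketch under assumptions: the function name is hypothetical, the default batch size of 10 is simply the upper end of the 8-10 range, and paths are assumed to contain no newlines.

```shell
# Sketch only: assign_batches is a hypothetical helper name.
# stdin: one frame path per line, already in playback order
# $1:    batch size (defaults to 10)
# stdout: "<batch number><TAB><frame path>" per line
assign_batches() {
  awk -v size="${1:-10}" '{ printf "%d\t%s\n", int((NR - 1) / size) + 1, $0 }'
}
```

Piping the sorted frame listing into `assign_batches 10` gives every frame its batch number, and the highest number in the first column is the total_batches value for the prompt.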
Use this prompt verbatim, substituting the placeholders:
You are analysing frames extracted from a video file.
VIDEO: {filename}
DURATION: {duration}
BATCH: {batch_number} of {total_batches}
Read each frame image listed below using the Read tool (or equivalent file reading tool that supports images). For each frame, write a structured description.
FRAMES:
{for each frame in batch}
- {absolute_path_to_frame} (timestamp: {MM:SS})
{end for}
For each frame, describe:
1. SCENE: What is visible (layout, UI elements, environment)
2. CONTENT: Text, code, labels, menus, or dialogue visible on screen
3. ACTION: What is happening or has changed since the likely previous frame
4. DETAILS: Any notable specifics (error messages, URLs, file names, button states)
After describing all frames, add a BATCH SUMMARY section with:
- Content type (one of: Screencast, Presentation, Tutorial, Footage, Animation)
- Key events in this batch's time range
- Any text/prompts/commands the user typed (quote exactly)
Write the complete analysis to: {TMPDIR}/batch_{N}_analysis.md
Format the output file as:
# Batch {N} Analysis ({start_timestamp} - {end_timestamp})
## Frame-by-Frame
### Frame {sequence} ({timestamp})
- **Scene**: ...
- **Content**: ...
- **Action**: ...
- **Details**: ...
(repeat for each frame)
## Batch Summary
- **Content Type**: ...
- **Key Events**: ...
- **Quoted Text/Prompts**: ...
Use whatever sub-agent, background task, or independent agent mechanism your tool provides. The requirements are simple — each sub-agent needs to:
Launch all batches in parallel if your tool supports it — they are fully independent with no shared state.
If your tool has no sub-agent mechanism, fall back to reading frames directly in the main context but limit to 20 frames maximum and warn the user about context usage.
After all sub-agents complete, read the text analysis files. These are lightweight markdown — no images enter the main context.
ls "$TMPDIR"/batch_*_analysis.md
Read each batch_N_analysis.md file in order. These contain only text descriptions — the context cost is minimal compared to reading the original images.
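
A plain ls lists names lexicographically, so batch_10 comes before batch_2 once there are ten or more batches. The sketch below shows one way to get numeric order. The function name is hypothetical, and it assumes the only occurrence of "batch_" in each path is in the file name.

```shell
# Sketch only: sort_batches is a hypothetical helper name.
# stdin: batch analysis file paths, one per line
# stdout: the same paths ordered by the number after "batch_"
sort_batches() {
  awk -F'batch_' '{
    n = $NF                          # e.g. "10_analysis.md"
    sub(/_analysis\.md$/, "", n)     # keep only the batch number
    print n "\t" $0
  }' | sort -n | cut -f2-
}
```
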
Using only the text from the batch analysis files, perform synthesis in the main context:
Format the output as:
# Video Analysis: [filename]
## Metadata
| Property | Value |
|----------|-------|
| Duration | M:SS |
| Resolution | WxH |
| FPS | N |
| Content Type | [detected] |
| Frames Analysed | N |
## Timeline
### [Segment Title] (M:SS - M:SS)
Description of what happens in this segment.
### [Segment Title] (M:SS - M:SS)
Description of what happens in this segment.
## Key Moments
1. **[M:SS] Title**: Description
2. **[M:SS] Title**: Description
3. **[M:SS] Title**: Description
## Summary
[2-5 sentence narrative paragraph summarising the entire video]
Remove the temp directory after output is complete:
# macOS/Linux
rm -rf "$TMPDIR"
# Windows (PowerShell)
# Remove-Item -Recurse -Force $TMPDIR
Skip cleanup if the user asks to keep frames.
Time range: "Analyse 2:00 to 5:00 of video.mp4" → use -ss 120 -to 300
Higher detail: "Analyse in high detail" → double frame rate, lower scene threshold to 0.2
Focus area: "Focus on the code shown" → prioritise text/code extraction in sub-agent prompts
Sprite sheet: For a visual overview, generate a contact sheet:
ffmpeg -hide_banner -y -i INPUT -vf "select='not(mod(n,EVERY_N))',scale='min(320,iw)':-2,tile=5xROWS" -frames:v 1 DIR/sprite.jpg
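
Two of the values used above can be derived mechanically: the seconds passed to -ss and -to, and an EVERY_N that fills the 5-column tile grid without overflowing it. The sketch below is illustrative only. Both function names are hypothetical, the number of rows is left to the caller, and the total frame count is assumed to be known (for example estimated as duration × fps).

```shell
# Sketch only: both helper names are hypothetical.

# Convert "M:SS" or "H:MM:SS" to whole seconds, e.g. 2:00 -> 120.
to_seconds() {
  echo "$1" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; print s }'
}

# $1: total frame count, $2: ROWS in the tile=5xROWS grid.
# Prints the smallest EVERY_N for which the selected frames fit in 5 * ROWS cells.
sprite_every_n() {
  se_cells=$(( 5 * $2 ))
  echo $(( ($1 + se_cells - 1) / se_cells ))   # ceiling division
}
```
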
Weekly Installs: 160
First Seen: Feb 15, 2026
Installed on: codex (156), gemini-cli (154), opencode (154), github-copilot (154), amp (153), kimi-cli (153)