FFmpeg Video Analysis Tool: Extract Frames Efficiently and Generate Structured Summaries, Saving About 90% of Context

ffmpeg-analyse-video by fabriqaai/ffmpeg-analyse-video-skill

160 weekly installs

Install command

npx skills add https://github.com/fabriqaai/ffmpeg-analyse-video-skill --skill ffmpeg-analyse-video

Automation, Audio Processing, Computer Vision

FFmpeg Video Analysis

Extract frames from video files with ffmpeg. Delegate frame reading to sub-agents to preserve the main context window. Synthesise a structured timestamped summary from text-only sub-agent reports.

Architecture: Context-Efficient Sub-Agent Pipeline

Problem: Reading dozens of images into the main conversation context consumes most of the context window, leaving little room for synthesis and follow-up.

Solution: A 3-phase pipeline:

Main Agent                          Sub-Agents (disposable context)
──────────                          ──────────────────────────────
1. ffprobe metadata        ───►
2. ffmpeg frame extraction ───►
3. Split frames into batches ──►   4. Read images (vision)
                                      Write text descriptions
                                      to batch_N_analysis.md
5. Read text files only    ◄───    (context discarded)
6. Synthesise final output

Images only ever exist inside sub-agent contexts. The main agent only reads lightweight text files. This cuts context usage by ~90%.

1. Prerequisites

which ffmpeg && which ffprobe

If either is missing, show platform-specific install instructions and STOP:

macOS: brew install ffmpeg

Ubuntu/Debian: sudo apt install ffmpeg

Windows: choco install ffmpeg or winget install ffmpeg
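
The check can also report which binary is missing. This is a minimal sketch assuming a POSIX shell; `require_tools` is an illustrative helper name, not part of the skill, and `command -v` is used here as a portable stand-in for `which`.

```shell
# Print each missing tool to stderr and return non-zero if any are absent.
# require_tools is a hypothetical helper name, not part of the skill.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Report the problem when either binary is unavailable.
if ! require_tools ffmpeg ffprobe; then
  echo "Install ffmpeg before continuing." >&2
fi
```

An agent following the skill would show the platform-specific install line and stop at this point rather than continue.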

2. Setup Temp Directory

# macOS/Linux
TMPDIR="/tmp/video-analysis-$(date +%s)"
mkdir -p "$TMPDIR"

# Windows (PowerShell)
# $TMPDIR = "$env:TEMP\video-analysis-$(Get-Date -UFormat %s)"
# New-Item -ItemType Directory -Path $TMPDIR
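
On macOS and Linux, `TMPDIR` is also the variable that `mktemp` and other programs consult for their own temporary files, so reassigning it can redirect unrelated temp files into the analysis directory. One possible alternative, sketched here with the hypothetical names `make_workdir` and `WORKDIR`, keeps `TMPDIR` untouched:

```shell
# Create a unique working directory without overwriting TMPDIR.
# make_workdir and WORKDIR are illustrative names, not part of the skill.
make_workdir() {
  mktemp -d "${TMPDIR:-/tmp}/video-analysis-XXXXXX"
}

WORKDIR="$(make_workdir)" || exit 1
echo "frames and batch reports go in: $WORKDIR"
```

If this variant is used, every later reference to TMPDIR in the steps would point at `WORKDIR` instead.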

3. Extract Video Metadata

ffprobe -v quiet -print_format json -show_format -show_streams "VIDEO_PATH"

Extract and report: duration, resolution (width x height), fps, codec, file size, whether audio is present.

If no video stream is found, report "audio-only file" and STOP. If file size > 2GB, warn the user and suggest analysing a time range with -ss START -to END.
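
ffprobe reports frame rate as a fraction such as 30000/1001, so reporting fps takes a small conversion. The sketch below assumes a POSIX shell with awk. `probe_video` and `fps_from_rate` are hypothetical helper names, and the `-show_entries` selection is only one possible way to pull a few fields instead of the full JSON.

```shell
# Print key=value lines for the first video stream and the container.
# probe_video is an illustrative helper, not part of the skill.
probe_video() {
  ffprobe -v error -select_streams v:0 \
    -show_entries stream=codec_name,width,height,r_frame_rate:format=duration,size \
    -of default=noprint_wrappers=1 "$1"
}

# Convert a rate such as "30000/1001" (or a plain "25") to decimal fps.
fps_from_rate() {
  echo "$1" | awk -F/ '{ printf "%.2f\n", ($2 > 0 ? $1 / $2 : $1) }'
}

fps_from_rate "30000/1001"   # prints 29.97
```

An empty result from `probe_video` for the stream fields would correspond to the "audio-only file" case described above.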

4. Extract Frames

Choose strategy based on duration:

| Duration | Strategy | Command |
|----------|----------|---------|
| 0-60s | 1 frame every 2s | `ffmpeg -hide_banner -y -i INPUT -vf "fps=1/2,scale='min(1280,iw)':-2" -q:v 5 DIR/frame_%04d.jpg` |
| 1-10min | Scene detection (threshold 0.3) | `ffmpeg -hide_banner -y -i INPUT -vf "select='gt(scene,0.3)',scale='min(1280,iw)':-2" -vsync vfr -q:v 5 DIR/scene_%04d.jpg` |
| 10-30min | Keyframe extraction | `ffmpeg -hide_banner -y -skip_frame nokey -i INPUT -vf "scale='min(1280,iw)':-2" -vsync vfr -q:v 5 DIR/key_%04d.jpg` |
| 30min+ | Thumbnail filter | `ffmpeg -hide_banner -y -i INPUT -vf "thumbnail=SEGMENT_FRAMES,scale='min(1280,iw)':-2" -vsync vfr -q:v 5 DIR/thumb_%04d.jpg` |

For the thumbnail filter, calculate SEGMENT_FRAMES = total_frames / 60 (rounded to a whole number, with total_frames approximately duration × fps) to cap output at ~60 frames.
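
The duration tiers and the SEGMENT_FRAMES calculation can be sketched as two small shell helpers. The names `choose_strategy` and `segment_frames` are illustrative, the treatment of exact boundaries (60 s, 10 min, 30 min fall into the lower tier) is an assumption, and total_frames is approximated as duration × fps.

```shell
# Map a duration in seconds to one of the four strategies in the table.
# Boundary values falling into the lower tier is an assumption.
choose_strategy() {
  secs="${1%.*}"      # drop any fractional part, e.g. 123.45 -> 123
  secs="${secs:-0}"
  if [ "$secs" -le 60 ]; then
    echo interval
  elif [ "$secs" -le 600 ]; then
    echo scene
  elif [ "$secs" -le 1800 ]; then
    echo keyframe
  else
    echo thumbnail
  fi
}

# SEGMENT_FRAMES = total_frames / 60, with total_frames taken as duration * fps.
segment_frames() {
  awk -v d="$1" -v f="$2" 'BEGIN { n = int(d * f / 60); if (n < 1) n = 1; print n }'
}

choose_strategy 3600      # prints thumbnail
segment_frames 3600 30    # prints 1800
```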

- Scene detection yields 0 frames → retry with interval at 1 frame/5s
- More than 100 frames extracted → subsample evenly to 80
- Frame extraction fails → try the next simpler strategy (scene → interval, keyframe → interval)
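
Even subsampling can be done on the sorted file list before any batching. This is a minimal sketch, assuming one frame path per line on stdin and a POSIX awk; `subsample` is a hypothetical helper name.

```shell
# Read one frame path per line on stdin. If there are more than LIMIT lines,
# print KEEP of them at evenly spaced positions; otherwise print them all.
subsample() {
  awk -v limit="$1" -v keep="$2" '
    { line[NR] = $0 }
    END {
      if (NR <= limit) {
        for (i = 1; i <= NR; i++) print line[i]
      } else {
        for (i = 0; i < keep; i++) print line[int(i * NR / keep) + 1]
      }
    }'
}

# Example: 200 made-up frame names reduced to 80 lines.
seq 1 200 | sed 's/^/frame_/' | subsample 100 80 | wc -l
```

The first frame is always kept, and the files that are not selected can simply be left on disk until cleanup.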

Time range analysis: when the user specifies a range, add -ss START -to END before -i.

Higher detail mode: if requested, double the fps rate and lower the scene threshold to 0.2.

After extraction, list all frame files and calculate each frame's timestamp from its sequence number and the extraction rate. If a time range was used, add START to each calculated timestamp, since frame numbering begins at START.
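
For the fixed-interval strategy the timestamp follows directly from the sequence number. The sketch below assumes frames are numbered from 1 and that the first frame sits at roughly 0 seconds; it does not cover scene-detection or keyframe output, where sequence numbers do not map linearly to time. `frame_timestamp` is an illustrative name.

```shell
# Timestamp (MM:SS) of frame number SEQ (1-based) when one frame is kept every
# INTERVAL seconds, optionally offset by START seconds for a time range.
frame_timestamp() {
  # Strip leading zeros so "0008" is not parsed as an octal number.
  seq_no="${1#"${1%%[!0]*}"}"
  seq_no="${seq_no:-0}"
  interval="$2"
  start="${3:-0}"
  total=$(( (seq_no - 1) * interval + start ))
  printf '%02d:%02d\n' $(( total / 60 )) $(( total % 60 ))
}

frame_timestamp 0004 2        # prints 00:06
frame_timestamp 0004 2 120    # prints 02:06
```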

5. Delegate Frame Analysis to Sub-Agents

This is the critical context-saving step. Do NOT read frame images in the main conversation. Instead, split frames into batches and delegate each batch to a sub-agent.

5a. Prepare Batch Manifest

Split the extracted frame file list into batches of 8-10 frames each. For each batch, record:

- Batch number (1, 2, 3, ...)
- Frame file paths (absolute)
- Frame timestamps (calculated from sequence number)
- Output file path: TMPDIR/batch_N_analysis.md
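
One way to build the manifest is to tag each frame path with its batch number. This is a sketch assuming one absolute path per line on stdin; `assign_batches` is a hypothetical helper, and the demo paths are made up.

```shell
# Read frame paths on stdin and print "batch_number<TAB>path",
# filling each batch with SIZE frames before starting the next.
assign_batches() {
  awk -v size="$1" '{ printf "%d\t%s\n", int((NR - 1) / size) + 1, $0 }'
}

# Example with 25 made-up frame paths and batches of 10.
seq 1 25 | sed 's|^|/tmp/video-analysis-demo/frame_|' | assign_batches 10 | tail -n 3
```

With this scheme the final batch may hold fewer frames than the others (25 frames give batches of 10, 10, and 5), which is left as-is here.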

5b. Spawn Sub-Agents

For each batch, spawn a sub-agent with the prompt below. Launch all batches in parallel where the tool supports it — they are fully independent.

Sub-Agent Prompt Template

Use this prompt verbatim, substituting the placeholders:

You are analysing frames extracted from a video file.

VIDEO: {filename}
DURATION: {duration}
BATCH: {batch_number} of {total_batches}

Read each frame image listed below using the Read tool (or equivalent file reading tool that supports images). For each frame, write a structured description.

FRAMES:
{for each frame in batch}
- {absolute_path_to_frame} (timestamp: {MM:SS})
{end for}

For each frame, describe:
1. SCENE: What is visible (layout, UI elements, environment)
2. CONTENT: Text, code, labels, menus, or dialogue visible on screen
3. ACTION: What is happening or has changed since the likely previous frame
4. DETAILS: Any notable specifics (error messages, URLs, file names, button states)

After describing all frames, add a BATCH SUMMARY section with:
- Content type (one of: Screencast, Presentation, Tutorial, Footage, Animation)
- Key events in this batch's time range
- Any text/prompts/commands the user typed (quote exactly)

Write the complete analysis to: {TMPDIR}/batch_{N}_analysis.md

Format the output file as:

# Batch {N} Analysis ({start_timestamp} - {end_timestamp})

## Frame-by-Frame

### Frame {sequence} ({timestamp})
- **Scene**: ...
- **Content**: ...
- **Action**: ...
- **Details**: ...

(repeat for each frame)

## Batch Summary
- **Content Type**: ...
- **Key Events**: ...
- **Quoted Text/Prompts**: ...

How to Spawn

Use whatever sub-agent, background task, or independent agent mechanism your tool provides. Each sub-agent only needs to:

- Read image files (the frame JPEGs)
- Write a text file (the batch analysis markdown)

Launch all batches in parallel if your tool supports it; they are fully independent with no shared state.

If your tool has no sub-agent mechanism, fall back to reading frames directly in the main context, but limit this to 20 frames maximum and warn the user about context usage.

5c. Collect Results

After all sub-agents complete, read the text analysis files. These are lightweight markdown, so no images enter the main context.

ls "$TMPDIR"/batch_*_analysis.md

Read each batch_N_analysis.md file in order. These contain only text descriptions, so the context cost is minimal compared to reading the original images.
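
A glob sorts names lexicographically, so with ten or more batches batch_10 is listed before batch_2. Iterating by number avoids that and also reveals reports that were never written. This is a sketch assuming the batch count is known from the manifest; `list_batch_reports` is an illustrative name.

```shell
# Print batch report paths in numeric order and flag any that are missing.
# list_batch_reports is a hypothetical helper, not part of the skill.
list_batch_reports() {
  dir="$1"
  total="$2"
  n=1
  while [ "$n" -le "$total" ]; do
    report="$dir/batch_${n}_analysis.md"
    if [ -f "$report" ]; then
      echo "$report"
    else
      echo "missing: $report" >&2
    fi
    n=$(( n + 1 ))
  done
}

# Usage (12 is a made-up batch count):
#   list_batch_reports "$TMPDIR" 12
```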

6. Synthesise Output

Using only the text from the batch analysis files, perform synthesis in the main context:

- Merge all frame descriptions into a single chronological timeline
- Group frames into natural segments (same scene, slide, or screen)
- Detect the dominant content type across all batches
- Identify 3-7 key moments
- Extract all quoted text, prompts, or commands the user typed
- Write a 2-5 sentence narrative summary
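
The dominant content type can be tallied mechanically from the batch summaries before writing the narrative. The sketch below assumes each report contains a line in the exact form `- **Content Type**: ...` from the prompt template; `dominant_content_type` is a hypothetical helper, and ties are broken arbitrarily.

```shell
# Tally the "Content Type" line of every batch report in DIR
# and print the most frequent value.
dominant_content_type() {
  grep -h -e '^- \*\*Content Type\*\*:' "$1"/batch_*_analysis.md |
    sed 's/^- \*\*Content Type\*\*: *//' |
    sort | uniq -c | sort -rn | head -n 1 |
    sed 's/^ *[0-9]* *//'
}

# Usage:
#   dominant_content_type "$TMPDIR"
```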

Format the output as:

# Video Analysis: [filename]

## Metadata
| Property | Value |
|----------|-------|
| Duration | M:SS |
| Resolution | WxH |
| FPS | N |
| Content Type | [detected] |
| Frames Analysed | N |

## Timeline
### [Segment Title] (M:SS - M:SS)
Description of what happens in this segment.

### [Segment Title] (M:SS - M:SS)
Description of what happens in this segment.

## Key Moments
1. **[M:SS] Title**: Description
2. **[M:SS] Title**: Description
3. **[M:SS] Title**: Description

## Summary
[2-5 sentence narrative paragraph summarising the entire video]

7. Cleanup

Remove the temp directory after output is complete:

# macOS/Linux
rm -rf "$TMPDIR"

# Windows (PowerShell)
# Remove-Item -Recurse -Force $TMPDIR

Skip cleanup if the user asks to keep frames.
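
Because `rm -rf` on an empty or wrong variable is destructive, a guarded variant may be worth considering. This is a sketch, assuming the working directory name contains `video-analysis-` as in the setup step; `cleanup_workdir` and its `yes` keep flag are illustrative.

```shell
# Remove the working directory unless the user asked to keep frames.
# Refuses empty or unexpected paths so a bad variable cannot reach rm -rf.
cleanup_workdir() {
  dir="$1"
  keep="${2:-no}"
  [ "$keep" = "yes" ] && return 0
  case "$dir" in
    */video-analysis-*) rm -rf -- "$dir" ;;
    *) echo "refusing to remove: '$dir'" >&2; return 1 ;;
  esac
}

# Usage:
#   cleanup_workdir "$TMPDIR"        # normal cleanup
#   cleanup_workdir "$TMPDIR" yes    # user asked to keep frames
```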

Advanced Options

- Time range: "Analyse 2:00 to 5:00 of video.mp4" → use -ss 120 -to 300
- Higher detail: "Analyse in high detail" → double frame rate, lower scene threshold to 0.2
- Focus area: "Focus on the code shown" → prioritise text/code extraction in sub-agent prompts
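
Turning a spoken range such as 2:00 to 5:00 into the seconds used in the example is a base-60 conversion. This is a minimal sketch assuming M:SS or H:MM:SS input and a POSIX awk; `to_seconds` is a hypothetical helper name.

```shell
# Convert M:SS or H:MM:SS to whole seconds.
to_seconds() {
  echo "$1" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; print s }'
}

to_seconds 2:00   # prints 120
to_seconds 5:00   # prints 300
```

The same value can serve as the START offset when frame timestamps are calculated for a time range.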

Sprite sheet: for a visual overview, generate a contact sheet:

ffmpeg -hide_banner -y -i INPUT -vf "select='not(mod(n,EVERY_N))',scale='min(320,iw)':-2,tile=5xROWS" -frames:v 1 DIR/sprite.jpg
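
The source does not say how to pick EVERY_N and ROWS. One plausible reading is that the select filter should pass roughly 5 × ROWS frames so that a single sheet covers the whole video. Under that assumption, and with total_frames taken as duration × fps, EVERY_N can be estimated as below; `sprite_every_n` is an illustrative name.

```shell
# Pick EVERY_N so that roughly 5 * ROWS frames are selected for the tile grid.
# The relation EVERY_N = total_frames / (5 * ROWS) is an assumption.
sprite_every_n() {
  awk -v total="$1" -v rows="$2" \
    'BEGIN { n = int(total / (5 * rows)); if (n < 1) n = 1; print n }'
}

sprite_every_n 9000 4   # prints 450
```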

Error Handling

- ffmpeg not found → install instructions per platform, STOP
- No video stream → report audio-only, STOP
- Scene detection yields 0 frames → fallback to interval
- Too many frames (>100) → subsample to 80
- Large files (>2GB) → warn, suggest time range
- Sub-agent fails or times out → read that batch's frames directly as fallback, warn about context usage
- Frame read failure in sub-agent → skip frame, note gap in batch analysis file
