npx skills add https://github.com/aj47/agent-skills --skill clipper
Analyzes video transcription files to identify the most interesting and clip-worthy segments with precise timestamps.
Key Feature: Narrative Completeness This skill uses a two-pass system to ensure clips tell complete stories, not just isolated moments. See NARRATIVE_TEMPLATES.md for story arc templates.
User just needs the transcription JSON and video file in their directory. When they ask you to analyze or find clips, you handle everything automatically:
User: "Find interesting clips from my video"
You will automatically:
- Detect the transcription file (out.json or *.json)
- Check if parsed.json exists; if not, run the parse script
- Analyze parsed.json for clip-worthy moments → segments.json
- Run narrative validation → validation_report.json

When user asks to find clips, follow this exact sequence:
A. Detect Files
- Check if parsed.json exists
- Find the transcription file (out.json first, then *.json)
- Note: segments.json and other analysis files are NOT transcription files

B. Parse (if needed)
- If parsed.json doesn't exist, run: python .claude/skills/clipper/scripts/parse_transcription.py <transcription> > parsed.json
- Verify parsed.json was created successfully

C. Pass 1: Signal Detection
- Analyze parsed.json in windows (100-200 sentences at a time)
- Write identified segments to segments.json

D. Pass 2: Narrative Validation (NEW)
- Generate validation_report.json with gaps and suggestions
- Add approved suggestions to segments.json

E. Extract (automatic)
- Detect the video file (*.mp4, *.mov, *.mkv)
- Run: python .claude/skills/clipper/scripts/extract_clips.py segments.json <transcription> <video> clips/

F. Cleanup (automatic)
- Run: python .claude/skills/clipper/scripts/cleanup_clips.py segments.json <transcription> <video> clips/
- Cleaned clips land in clips/cleaned/

When a user asks you to find clips, FIRST check if parsed.json exists. If not, automatically run:
python .claude/skills/clipper/scripts/parse_transcription.py <transcription_file> > parsed.json
How to detect the transcription file:
- Look for out.json in the current directory
- Otherwise, use any *.json file that contains a sentences array

Input format: JSON with a sentences array containing:
- text: Sentence content
- start: Start timestamp (seconds)
- end: End timestamp (seconds)
- words: Array of word-level timestamps

Output format: Simplified JSON array of sentences with timestamps
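To illustrate the parse step, here is a minimal sketch of a function producing the simplified output described above. This is a hypothetical reimplementation, not the actual parse_transcription.py; in particular, the index field is an assumption (segment entries reference sentence indices, so indexing at parse time is plausible).

```python
import json

def parse_transcription(raw: dict) -> list[dict]:
    """Simplify a transcription with a `sentences` array down to indexed
    sentences with start/end timestamps; word-level data is dropped."""
    return [
        {"index": i, "text": s["text"], "start": s["start"], "end": s["end"]}
        for i, s in enumerate(raw["sentences"])
    ]

if __name__ == "__main__":
    raw = {"sentences": [
        {"text": "Oh my god, that's incredible.", "start": 18.80, "end": 21.50,
         "words": [{"word": "Oh", "start": 18.80, "end": 18.95}]},
    ]}
    print(json.dumps(parse_transcription(raw), indent=2))
```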
YOU (Claude) will analyze the parsed transcription directly. Follow this process:
Create a file called segments.json with this structure:
{
"total_clips": 148,
"clips": [
{
"start_index": 3,
"end_index": 5,
"start_time": 18.80,
"end_time": 26.08,
"duration": 7.28,
"text": "Oh my god, that's incredible. Didn't know I could stream like that.",
"reason": "High-energy reaction to discovering new streaming capability - genuine surprise moment",
"category": "reaction",
"topic": "tiktok_streaming",
"subtopic": "vertical_discovery",
"keywords": ["tiktok", "streaming", "vertical", "discovery"],
"confidence": 0.92,
"suggested_title": "TikTok Vertical Streaming Discovery",
"key_quote": "Oh my god, that's incredible.",
"first_sentence": "Oh my god, that's incredible.",
"last_sentence": "Didn't know I could stream like that.",
"total_sentences": 3,
"context_expansion": {
"sentences_added_before": 0,
"sentences_added_after": 0,
"reason": "Segment was self-contained from initial identification"
},
"coherence_validated": true,
"narrative": {
"beat_type": "RESOLUTION",
"arc_id": "arc_003",
"arc_role": "Conclusion of discovery arc",
"linked_claims": [],
"is_orphan": false
}
},
{
"start_index": 152,
"end_index": 189,
"start_time": 375.0,
"end_time": 512.8,
"duration": 137.8,
"text": "Let me prove it. I'm going to build this with just a skill...",
"reason": "DEMO: Live demonstration proving skills can replace MCPs - core evidence for stream thesis",
"category": "teaching",
"topic": "skills_vs_mcps",
"subtopic": "demo_proof",
"keywords": ["skills", "mcp", "demo", "proof", "building"],
"confidence": 0.95,
"suggested_title": "Building a Skill to Replace MCPs - Live Demo",
"key_quote": "Let me prove it.",
"first_sentence": "Let me prove it.",
"last_sentence": "And there we go, it's working.",
"total_sentences": 38,
"narrative": {
"beat_type": "EVIDENCE",
"arc_id": "arc_001",
"arc_role": "Demo proving the thesis claim",
"linked_claims": ["clip_023"],
"is_orphan": false,
"demo_score_boost": 15
}
}
],
"story_arcs": [
{
"id": "arc_001",
"template": "ARGUMENT",
"title": "Skills Can Replace MCPs - Complete Proof",
"completeness_score": 1.0,
"status": "COMPLETE",
"beats": [
{
"type": "CLAIM",
"clip_index": 23,
"start_time": 342.5,
"end_time": 358.2,
"confidence": 0.95
},
{
"type": "EVIDENCE",
"clip_index": 45,
"start_time": 375.0,
"end_time": 512.8,
"confidence": 0.92
},
{
"type": "RESOLUTION",
"clip_index": 67,
"start_time": 525.0,
"end_time": 538.5,
"confidence": 0.94
}
],
"total_duration": 196.0,
"required_beats_found": 3,
"required_beats_total": 3,
"optional_beats_found": ["STAKES"]
}
],
"compilations": [
{
"id": "comp_auth",
"title": "Complete Authentication Implementation",
"topic": "authentication",
"subtopics": ["oauth_setup", "token_config", "testing"],
"segment_indices": [5, 12, 34, 67, 89],
"total_segments": 5,
"talking_duration": 312.5,
"time_span": 3600.0,
"created_automatically": true
}
],
"orphan_beats": [
{
"type": "CLAIM",
"clip_index": 78,
"text": "I think Cursor is overrated...",
"note": "Found CLAIM but no EVIDENCE or RESOLUTION detected within 5 minutes"
}
],
"metadata": {
"total_sentences_analyzed": 4400,
"total_duration": 24165.0,
"min_clip_length": 30,
"max_clip_length": 180,
"confidence_threshold": 0.6,
"individual_clips": 148,
"compilations_created": 12,
"story_arcs_detected": 8,
"story_arcs_complete": 6,
"orphan_beats": 3,
"demo_clips_detected": 12,
"narrative_validation_run": true
}
}
When analyzing the transcription, identify segments that match these criteria:
1. High-Energy Moments (category: "reaction")
2. Valuable Tips & Advice (category: "tip")
3. Teaching Moments (category: "teaching")
4. Questions & Engagement (category: "question")
5. Humor & Entertainment (category: "humor")
6. Stories & Narratives (category: "story")
7. Strong Opinions (category: "opinion")
EVERY segment MUST:
Validation:
After identifying an interesting moment at sentence N, expand for coherence:
Step 1: Check if first sentence needs context
Does the first sentence:
→ If YES: include sentence N-1, then re-check (maximum lookback: 3 sentences)
Step 2: Check if last sentence feels complete
Does the last sentence:
→ If YES: include sentence N+1, then re-check (maximum lookahead: 2 sentences)
Step 3: Verify duration
For EACH segment, verify all of these before including:
If any check fails, expand context or skip the segment.
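The three steps above can be sketched as a bounded expansion loop. The needs_context and feels_incomplete heuristics below are illustrative assumptions (leading referring words, missing terminal punctuation); the actual checks are left to judgment during analysis.

```python
MAX_LOOKBACK, MAX_LOOKAHEAD = 3, 2

def needs_context(sentence: str) -> bool:
    # Assumption: a sentence opening with a referring/connecting word lacks context.
    return sentence.split(maxsplit=1)[0].lower().rstrip(",") in {
        "so", "but", "because", "that", "it", "this", "and"}

def feels_incomplete(sentence: str) -> bool:
    # Assumption: no terminal punctuation means the thought continues.
    return not sentence.rstrip().endswith((".", "!", "?"))

def expand(sentences: list[dict], start: int, end: int):
    """Expand [start, end] sentence indices for coherence, within the limits."""
    for _ in range(MAX_LOOKBACK):
        if start > 0 and needs_context(sentences[start]["text"]):
            start -= 1
        else:
            break
    for _ in range(MAX_LOOKAHEAD):
        if end + 1 < len(sentences) and feels_incomplete(sentences[end]["text"]):
            end += 1
        else:
            break
    duration = sentences[end]["end"] - sentences[start]["start"]
    return start, end, duration
```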
For each identified segment, extract hierarchical topics and keywords:
1. Extract Keywords (Automatic, No Hard-Coding)
2. Determine Topic (Broad Category)
3. Determine Subtopic (Specific Aspect)
4. Example Topic Hierarchy
Segment text: "So I'm setting up OAuth for the authentication system using Google's API..."
Keywords: ["oauth", "authentication", "google", "api", "setup"]
Topic: "authentication"
Subtopic: "oauth_setup"
Segment text: "The Redis cache is throwing errors when I try to deploy to production..."
Keywords: ["redis", "cache", "errors", "deploy", "production"]
Topic: "redis"
Subtopic: "deployment_debugging"
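A hedged sketch of the keyword step: frequency-free extraction of distinctive words, with a small illustrative stopword list (an assumption for this example; the skill itself prescribes no hard-coded vocabulary, and topic/subtopic selection remains a judgment over the keywords).

```python
import re

# Assumption: a tiny stopword list for demonstration only.
STOPWORDS = {"the", "a", "an", "i", "im", "so", "for", "to", "when",
             "is", "it", "my", "using", "try", "up", "setting", "that"}

def extract_keywords(text: str, limit: int = 5) -> list[str]:
    """Return the first `limit` distinct non-stopword tokens, in order."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    seen, keywords = set(), []
    for w in words:
        if w not in STOPWORDS and w not in seen:
            seen.add(w)
            keywords.append(w)
    return keywords[:limit]
```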
Since the parsed transcription may be large, analyze it in steps with overlapping windows:
1. Get total count: Read the parsed.json metadata to see total sentences
2. Analyze in overlapping windows: Use 200-sentence windows with 50-sentence overlap
3. Track all segments: Keep a running list of all identified clips with topic/subtopic/keywords
4. Deduplicate segments: If the same segment appears in multiple windows, keep a single copy
5. Group by topic: Organize segments by their topic field
6. Create compilations automatically: For each topic with 2+ segments
7. Sort clips: Sort the final clips array by confidence score (highest first)
8. Write output: Create segments.json with clips, compilations, and metadata
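The windowing and deduplication steps can be sketched directly; window bounds follow the stated 200-sentence size with 50-sentence overlap.

```python
def windows(total: int, size: int = 200, overlap: int = 50):
    """Yield (start, end) sentence ranges: 0-200, 150-350, 300-500, ..."""
    step = size - overlap
    start = 0
    while start < total:
        yield (start, min(start + size, total))
        if start + size >= total:
            break
        start += step

def dedupe(clips: list[dict]) -> list[dict]:
    """Two windows can report the same segment; keep one per index range."""
    seen, unique = set(), []
    for c in clips:
        key = (c["start_index"], c["end_index"])
        if key not in seen:
            seen.add(key)
            unique.append(c)
    return unique
```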
When analyzing transcription, follow this process for EACH potential clip:
1. Identify Interesting Moment
2. Determine Initial Sentence Boundaries
3. Run Context Expansion Algorithm
4. Validate Coherence Checklist
5. Extract Metadata
6. Create Segment Entry
7. Final Validation
# User asks Claude:
"Find interesting clips from my video"
# Claude automatically:
# 1. Checks if parsed.json exists, creates it if needed
# 2. Reads parsed.json to understand structure (4400 sentences, 6.7 hours)
# 3. Analyzes in overlapping windows:
# - Window 1: Sentences 0-200
# - Window 2: Sentences 150-350
# - Window 3: Sentences 300-500
# - ... (continues to end)
# 4. For each segment: extracts keywords, determines topic/subtopic
# 5. Deduplicates segments found in multiple windows
# 6. Groups segments by topic
# 7. Auto-creates compilations for topics with 2+ segments
# 8. Writes segments.json with clips + compilations
# 9. Reports summary:
# "Found 148 clips across 12 topics. Created 12 compilations."
Analysis complete! Found 148 individual clips.
Topics detected:
• authentication (5 segments, 312s total) → Compilation created
• redis_setup (3 segments, 198s total) → Compilation created
• tiktok_api (4 segments, 256s total) → Compilation created
• mediapipe (6 segments, 445s total) → Compilation created
• frontend_debugging (2 segments, 128s total) → Compilation created
... (7 more topics)
Created 12 compilations automatically.
Output saved to segments.json
- 148 individual clips
- 12 topic compilations
Extracting clips...
After initial segment detection, run narrative validation to ensure complete story arcs.
Scan the transcript for story arc patterns:
ARGUMENT arcs (Claim → Evidence → Resolution):
TUTORIAL arcs (Goal → Steps → Result):
DISCOVERY arcs (Find → Explore → Verdict):
PROBLEM-SOLUTION arcs (Problem → Insight → Fix → Confirmation):
See BEAT_PATTERNS.md for detailed detection patterns.
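The real detection patterns live in BEAT_PATTERNS.md; the regexes below are simplified stand-ins to illustrate phrase-triggered beat detection, using phrases drawn from the examples in this document.

```python
import re

# Illustrative patterns only; BEAT_PATTERNS.md is the authoritative source.
BEAT_PATTERNS = {
    "CLAIM": re.compile(r"\b(i think|i believe|can replace|is better than)\b", re.I),
    "EVIDENCE": re.compile(r"\b(let me prove it|let me show you|watch this)\b", re.I),
    "RESOLUTION": re.compile(r"\b(there we go|it's working|so that proves)\b", re.I),
}

def detect_beats(sentence: str) -> list[str]:
    """Return the beat types whose trigger phrases appear in the sentence."""
    return [beat for beat, pat in BEAT_PATTERNS.items() if pat.search(sentence)]
```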
Apply these scoring boosts during analysis:
| Content Type | Score Boost |
|---|---|
| Complete demo (with execution/results) | +15 |
| Result/success moment | +8 |
| Discovery/insight moment | +6 |
| Verbal claim without demo | +4 |
| Context/setup | +2 |
This ensures demos are prioritized over verbal claims.
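The boost table maps directly to code. One assumption here: confidence is stored on a 0-1 scale while boosts are whole points, so the sketch treats boosts as points on a 0-100 scale and clamps the result.

```python
# Direct encoding of the boost table above; keys are descriptive labels.
SCORE_BOOSTS = {
    "complete_demo": 15,
    "result_moment": 8,
    "discovery_moment": 6,
    "verbal_claim": 4,
    "context_setup": 2,
}

def boosted_confidence(base: float, content_type: str) -> float:
    """Apply the boost (assumed to be 0-100-scale points) to a 0-1 confidence."""
    return min(1.0, base + SCORE_BOOSTS.get(content_type, 0) / 100)
```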
For each detected claim, check whether matching EVIDENCE and RESOLUTION beats were also captured as clips.

A gap exists when a required beat of a detected arc is missing from the clip list.

Gap priority scoring: rank each gap HIGH, MEDIUM, or LOW by how central the missing beat is to the arc (e.g., a missing demo for the stream's main thesis is HIGH).
Generate validation_report.json:
{
"validation_summary": {
"total_clips_pass1": 45,
"narrative_arcs_detected": 8,
"arcs_complete": 3,
"arcs_with_gaps": 5,
"gaps_detected": 12,
"high_priority_gaps": 3
},
"narrative_arcs": [
{
"id": "arc_001",
"type": "ARGUMENT",
"title": "Skills Can Replace MCPs",
"completeness_score": 0.33,
"status": "INCOMPLETE",
"beats": {
"CLAIM": {"found": true, "clip_id": "clip_023"},
"EVIDENCE": {"found": false, "expected_range": [1400, 2100]},
"RESOLUTION": {"found": true, "clip_id": "clip_045"}
},
"gaps": ["evidence_missing"]
}
],
"gaps": [
{
"id": "gap_001",
"priority": "HIGH",
"type": "missing_evidence",
"arc_id": "arc_001",
"timestamp_range": [1400.0, 2100.0],
"description": "Demo proving skills can replace MCPs",
"why_important": "Stream titled 'Skills vs MCPs' - the demo IS the content",
"suggested_clip": {
"start_time": 1400.0,
"end_time": 2100.0,
"suggested_title": "Building a Skill to Replace MCPs - Live Demo"
}
}
],
"orphan_claims": [
{
"clip_id": "clip_078",
"claim_text": "I think Cursor is overrated",
"note": "CLAIM found but no EVIDENCE detected within 5 minutes"
}
]
}
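The orphan-claim check in the report above can be sketched as follows, using the stated rule that a CLAIM with no EVIDENCE within 5 minutes is orphaned. The beat dicts mirror the shapes in the JSON examples; the function itself is an illustrative sketch, not a bundled script.

```python
EVIDENCE_WINDOW = 300.0  # 5 minutes, in seconds

def find_orphan_claims(beats: list[dict]) -> list[dict]:
    """Return CLAIM beats with no EVIDENCE beat starting within 5 minutes."""
    evidence_times = [b["start_time"] for b in beats if b["type"] == "EVIDENCE"]
    orphans = []
    for b in beats:
        if b["type"] != "CLAIM":
            continue
        if not any(abs(t - b["start_time"]) <= EVIDENCE_WINDOW for t in evidence_times):
            orphans.append(b)
    return orphans
```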
After validation, present gaps like this:
⚠️ NARRATIVE GAPS DETECTED
The following important content was NOT captured as clips:
1. [HIGH] "Skills vs MCPs Demo" (23:20 - 35:00)
Your thesis claim and conclusion were clipped, but the DEMO proving
the thesis was missed. This 12-minute segment shows the actual
skill being built.
→ Suggest: Add as clip "Building a Skill to Replace MCPs - Live Demo"
2. [MEDIUM] "Authentication Setup" (45:00 - 48:30)
Clip #12 shows the result ("It works!") but the setup showing
HOW it was configured is missing.
→ Suggest: Add as clip "OAuth Configuration Walkthrough"
Accept suggestions? [Y/n/select]
When user approves suggestions:
- Add each approved clip to segments.json

After analysis, review segments.json:
After creating segments.json, automatically detect the video file and run:
python .claude/skills/clipper/scripts/extract_clips.py segments.json <original_transcription> <video_file> clips/
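Conceptually, each segment in segments.json becomes one ffmpeg cut. The sketch below is a hedged illustration of that idea only; the real extract_clips.py also uses word-level timestamps, buffers, and length limits, and its exact ffmpeg invocation may differ.

```python
import subprocess

def extract_clip(video: str, start: float, end: float, out_path: str,
                 buffer: float = 0.1) -> list[str]:
    """Build an ffmpeg command cutting [start - buffer, end + buffer]."""
    cmd = [
        "ffmpeg", "-y",
        "-i", video,
        "-ss", f"{max(0.0, start - buffer):.3f}",  # padded clip start
        "-to", f"{end + buffer:.3f}",              # padded clip end
        out_path,
    ]
    # subprocess.run(cmd, check=True)  # uncomment to actually cut
    return cmd
```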
How to detect files:
- Transcription: the original file (e.g., out.json)
- Video: first *.mp4, *.mov, or *.mkv in the current directory

Features:
Arguments:
- segments.json - Output from Step 2 analysis
- out.json - Original transcription with word-level data
- video.mp4 - Source video file
- clips/ - Output directory (optional, defaults to "clips/")

How it works for individual clips:
How it works for compilations:
- Compilations are written as a single file, e.g. comp_auth_Complete_Authentication.mp4

Configuration (edit extract_clips.py to adjust):
- SAFETY_BUFFER = 0.1 - Buffer before/after words (seconds)
- MIN_CLIP_LENGTH = 60.0 - Minimum total clip length (1 minute)
- MAX_CLIP_LENGTH = 360.0 - Maximum total clip length (6 minutes)

Output:
- Individual clips: 001_Clip_Title.mp4, 002_Another_Clip.mp4, etc.
- Compilations: comp_auth_Complete_Authentication.mp4, comp_redis_Redis_Setup.mp4, etc.

After extracting clips, automatically clean them up by removing filler words and silences:
python .claude/skills/clipper/scripts/cleanup_clips.py segments.json <original_transcription> <video_file> clips/
This is an automatic post-processing phase:
Features:
How it works:
- Cleaned clips are written to the clips/cleaned/ directory

Configuration (edit cleanup_clips.py to adjust):
- SAFETY_BUFFER = 0.1 - Buffer before/after cuts (seconds)
- SILENCE_THRESHOLD = 0.4 - Min gap to consider silence (seconds)
- MIN_SEGMENT_LENGTH = 0.3 - Min segment length to keep (seconds)
- FILLER_WORDS - Set of filler words to remove

Output:
- Original clips remain in clips/
- Cleaned clips are written to clips/cleaned/

See the criteria in Step 2 above, or review SEGMENT_TYPES.md for detailed guidance on each category.
You can adjust these parameters when analyzing:
Note: The extraction script enforces 60s minimum and 360s maximum to ensure clips have sufficient context.
See EXAMPLES.md for sample analyses and outputs.
- ffmpeg must be installed (brew install ffmpeg on macOS or apt-get install ffmpeg on Linux)

"No segments found": The content may be low-energy or monotone. Try mentally lowering the confidence threshold to 0.5 or including shorter clips (down to 8 seconds).
"Too many segments": Raise the confidence threshold to 0.75+ or increase the minimum length to 15-20 seconds.
Large files: For very long videos (4+ hours), analyze in smaller windows and take breaks to avoid context limits.