baoyu-youtube-transcript by jimliu/baoyu-skills
npx skills add https://github.com/jimliu/baoyu-skills --skill baoyu-youtube-transcript
Downloads transcripts (subtitles/captions) from YouTube videos. Works with both manually created and auto-generated transcripts. No API key or browser required — uses YouTube's InnerTube API directly.
Fetches video metadata and cover image on first run, caches raw data for fast re-formatting.
Scripts in scripts/ subdirectory. {baseDir} = this SKILL.md's directory path. Resolve ${BUN_X} runtime: if bun installed → bun; if npx available → npx -y bun; else suggest installing bun. Replace {baseDir} and ${BUN_X} with actual values.
| Script | Purpose |
|---|---|
| scripts/main.ts | Transcript download CLI |
# Default: markdown with timestamps (English)
${BUN_X} {baseDir}/scripts/main.ts <youtube-url-or-id>
# Specify languages (priority order)
${BUN_X} {baseDir}/scripts/main.ts <url> --languages zh,en,ja
# Without timestamps
${BUN_X} {baseDir}/scripts/main.ts <url> --no-timestamps
# With chapter segmentation
${BUN_X} {baseDir}/scripts/main.ts <url> --chapters
# With speaker identification (requires AI post-processing)
${BUN_X} {baseDir}/scripts/main.ts <url> --speakers
# SRT subtitle file
${BUN_X} {baseDir}/scripts/main.ts <url> --format srt
# Translate transcript
${BUN_X} {baseDir}/scripts/main.ts <url> --translate zh-Hans
# List available transcripts
${BUN_X} {baseDir}/scripts/main.ts <url> --list
# Force re-fetch (ignore cache)
${BUN_X} {baseDir}/scripts/main.ts <url> --refresh
| Option | Description | Default |
|---|---|---|
| <url-or-id> | YouTube URL or video ID (multiple allowed) | Required |
| --languages <codes> | Language codes, comma-separated, in priority order | en |
| --format <fmt> | Output format: text, srt | text |
| --translate <code> | Translate to the specified language code | |
| --list | List available transcripts instead of fetching | |
| --timestamps | Include [HH:MM:SS → HH:MM:SS] timestamps per paragraph | on |
| --no-timestamps | Disable timestamps | |
| --chapters | Chapter segmentation from the video description | |
| --speakers | Raw transcript with metadata for speaker identification | |
| --exclude-generated | Skip auto-generated transcripts | |
| --exclude-manually-created | Skip manually created transcripts | |
| --refresh | Force re-fetch, ignore cached data | |
| -o, --output <path> | Save to a specific file path | auto-generated |
| --output-dir <dir> | Base output directory | youtube-transcript |
Accepts any of these as video input:
- https://www.youtube.com/watch?v=dQw4w9WgXcQ
- https://youtu.be/dQw4w9WgXcQ
- https://www.youtube.com/embed/dQw4w9WgXcQ
- https://www.youtube.com/shorts/dQw4w9WgXcQ
- dQw4w9WgXcQ (bare video ID)

| Format | Extension | Description |
|---|---|---|
| text | .md | Markdown with frontmatter (incl. description), title heading, summary, optional TOC/cover/timestamps/chapters/speakers |
| srt | .srt | SubRip subtitle format for video players |
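The accepted input forms above can all be reduced to a bare 11-character video ID. A minimal sketch of that normalization (a hypothetical helper for illustration, not the skill's actual code):

```typescript
// Hypothetical helper: extract an 11-character video ID from the
// accepted input forms (watch, youtu.be, embed, shorts, or bare ID).
function extractVideoId(input: string): string | null {
  // A bare 11-character ID passes through unchanged.
  if (/^[\w-]{11}$/.test(input)) return input;
  try {
    const url = new URL(input);
    // watch URLs carry the ID in the ?v= query parameter.
    const v = url.searchParams.get("v");
    if (v && /^[\w-]{11}$/.test(v)) return v;
    // youtu.be, /embed/, /shorts/ carry it as a full path segment.
    const m = url.pathname.match(/(?:^|\/)([\w-]{11})(?:\/|$)/);
    return m ? m[1] : null;
  } catch {
    return null; // neither a URL nor a bare ID
  }
}
```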
youtube-transcript/
├── .index.json # Video ID → directory path mapping (for cache lookup)
└── {channel-slug}/{title-full-slug}/
├── meta.json # Video metadata (title, channel, description, duration, chapters, etc.)
├── transcript-raw.json # Raw transcript snippets from YouTube API (cached)
├── transcript-sentences.json # Sentence-segmented transcript (split by punctuation, merged across snippets)
├── imgs/
│ └── cover.jpg # Video thumbnail
├── transcript.md # Markdown transcript (generated from sentences)
└── transcript.srt # SRT subtitle (generated from raw snippets, if --format srt)
- {channel-slug}: Channel name in kebab-case
- {title-full-slug}: Full video title in kebab-case

The --list mode outputs to stdout only (no file saved).
On first fetch, the script saves:
- meta.json — video metadata, chapters, cover image path, language info
- transcript-raw.json — raw transcript snippets from the YouTube API ({ text, start, duration }[])
- transcript-sentences.json — sentence-segmented transcript ({ text, start: "HH:mm:ss", end: "HH:mm:ss" }[]), split by sentence-ending punctuation (.?!…。?! etc.), with timestamps allocated proportionally by character length and CJK-aware text merging
- imgs/cover.jpg — video thumbnail

Subsequent runs for the same video use cached data (no network calls). Use --refresh to force a re-fetch. If a different language is requested, the cache is refreshed automatically.
SRT output (--format srt) is generated from transcript-raw.json. Text/markdown output uses transcript-sentences.json for natural sentence boundaries.
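Turning a raw snippet into an SRT cue is mostly a matter of formatting times as HH:MM:SS,mmm. A minimal sketch, assuming start and duration are in seconds (function names are hypothetical):

```typescript
// Hypothetical sketch: format seconds as an SRT timecode and
// emit one SRT cue from a raw snippet.
function srtTime(seconds: number): string {
  const ms = Math.round(seconds * 1000);
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  const frac = ms % 1000;
  const pad = (n: number, w = 2) => String(n).padStart(w, "0");
  // SRT uses a comma before the milliseconds field.
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(frac, 3)}`;
}

function srtCue(index: number, text: string, start: number, duration: number): string {
  return `${index}\n${srtTime(start)} --> ${srtTime(start + duration)}\n${text}\n`;
}
```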
When the user provides a YouTube URL and wants the transcript:
- Run --list first if the user hasn't specified a language, to show the available options
- The shell treats ? as a glob wildcard, so an unquoted YouTube URL causes "no matches found": quote the URL, e.g. 'https://www.youtube.com/watch?v=ID'
- Run --chapters --speakers for the richest output (chapters + speaker identification)
- In --speakers mode: after the script saves the raw file, follow the speaker identification workflow below to post-process with speaker labels
- When the user only wants a cover image or metadata, running the script with any option also caches meta.json and imgs/cover.jpg
When re-formatting the same video (e.g., first text then SRT), the cached data is reused — no re-fetch needed.
Chapter segmentation (--chapters): the script parses chapter timestamps from the video description (e.g., 0:00 Introduction), segments the transcript at chapter boundaries, groups snippets into readable paragraphs, and saves the result as .md with a Table of Contents. No further processing is needed.
If no chapter timestamps exist in the description, the transcript is output as grouped paragraphs without chapter headings.
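Chapter lines in a description typically look like "0:00 Introduction" or "1:02:30 Q&A". A parsing sketch under that assumed pattern (the skill's exact matching rules may differ):

```typescript
// Hypothetical sketch: parse "m:ss" / "h:mm:ss" chapter lines from
// a video description into { title, start } entries (start in seconds).
interface Chapter { title: string; start: number; }

function parseChapters(description: string): Chapter[] {
  const chapters: Chapter[] = [];
  for (const line of description.split("\n")) {
    // Optional hours group, then minutes:seconds, then the chapter title.
    const m = line.match(/^(?:(\d+):)?(\d{1,2}):(\d{2})\s+(.+)$/);
    if (!m) continue;
    const [, h, min, sec, title] = m;
    chapters.push({
      title: title.trim(),
      start: (h ? parseInt(h, 10) * 3600 : 0) + parseInt(min, 10) * 60 + parseInt(sec, 10),
    });
  }
  return chapters;
}
```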
Speaker identification (--speakers): speaker identification requires AI processing. The script outputs a raw .md file with the transcript and the metadata needed for post-processing.
After the script saves the raw file, spawn a sub-agent (use a cheaper model like Sonnet for cost efficiency) to process speaker identification:
1. Read the raw .md file
2. Apply the prompt template at {baseDir}/prompts/speaker-transcript.md
3. Format the transcript with **Speaker Name:** labels, paragraph grouping (2-4 sentences), and [HH:MM:SS → HH:MM:SS] timestamps
4. Overwrite the .md file with the processed transcript (keep the YAML frontmatter)

When --speakers is used, --chapters is implied: the processed output always includes chapter segmentation.
| Error | Meaning |
|---|---|
| Transcripts disabled | Video has no captions at all |
| No transcript found | Requested language not available |
| Video unavailable | Video deleted, private, or region-locked |
| IP blocked | Too many requests, try again later |
| Age restricted | Video requires login for age verification |
Weekly Installs: 1.0K
Repository: https://github.com/jimliu/baoyu-skills
GitHub Stars: 10.8K
First Seen: 2 days ago
Security Audits: Gen Agent Trust Hub: Pass · Socket: Pass · Snyk: Warn
Installed on: opencode (999), gemini-cli (998), codex (998), github-copilot (997), amp (997), kimi-cli (996)