Text-to-Podcast Generation Tool - Convert articles and documents into high-quality two-host conversational audio in one step

podcast-generation by bytedance/deer-flow

275 weekly installs

41,600 GitHub Stars

Install command

npx skills add https://github.com/bytedance/deer-flow --skill podcast-generation

AI/Machine Learning, Content Creation, Automation, Audio Processing

Podcast Generation Skill

Overview

This skill generates high-quality podcast audio from text content. The workflow has two parts: writing a structured JSON script (a conversational dialogue) and running text-to-speech synthesis on that script to produce the audio.

Core Capabilities

Convert any text content (articles, reports, documentation) into podcast scripts
Generate natural two-host conversational dialogue (male and female hosts)
Synthesize speech audio using text-to-speech
Mix audio chunks into a final podcast MP3 file
Support both English and Chinese content

Workflow

Step 1: Understand Requirements

When a user requests podcast generation, identify:

Source content: The text/article/report to convert into a podcast
Language: English or Chinese (based on content)
Output location: Where to save the generated podcast
You do not need to inspect the folders under /mnt/user-data

Step 2: Create Structured Script JSON

Generate a structured JSON script file in /mnt/user-data/workspace/ using the naming pattern {descriptive-name}-script.json

The JSON structure:

{
  "locale": "en",
  "lines": [
    {"speaker": "male", "paragraph": "dialogue text"},
    {"speaker": "female", "paragraph": "dialogue text"}
  ]
}

Step 3: Execute Generation

Call the Python script:

python /mnt/skills/public/podcast-generation/scripts/generate.py \
  --script-file /mnt/user-data/workspace/script-file.json \
  --output-file /mnt/user-data/outputs/generated-podcast.mp3 \
  --transcript-file /mnt/user-data/outputs/generated-podcast-transcript.md

Parameters:

--script-file: Absolute path to JSON script file (required)
--output-file: Absolute path to output MP3 file (required)
--transcript-file: Absolute path to output transcript markdown file (optional, but recommended)

Important:

Execute the script in one complete call. Do NOT split the workflow into separate steps.

The script handles all TTS API calls and audio generation internally.

Do NOT read the Python file, just call it with the parameters.

Always include --transcript-file to generate a readable transcript for the user.
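
The single-call rule above can be sketched in Python as follows. This is a minimal illustration, not part of the skill itself: the helper names build_generate_command and run_generation are hypothetical, and only the script path and the three flags come from the command shown above.

```python
import subprocess
from pathlib import PurePosixPath

# Path copied from the command shown in this document.
GENERATE_SCRIPT = "/mnt/skills/public/podcast-generation/scripts/generate.py"


def _require_absolute(label, path):
    # The documented parameters are absolute POSIX paths.
    if not PurePosixPath(path).is_absolute():
        raise ValueError(f"{label} must be an absolute path: {path}")


def build_generate_command(script_file, output_file, transcript_file=None):
    """Return the argv list for one complete generate.py invocation.

    Hypothetical helper: it only assembles the documented flags.
    """
    _require_absolute("script_file", script_file)
    _require_absolute("output_file", output_file)
    command = [
        "python", GENERATE_SCRIPT,
        "--script-file", str(script_file),
        "--output-file", str(output_file),
    ]
    if transcript_file is not None:
        _require_absolute("transcript_file", transcript_file)
        command += ["--transcript-file", str(transcript_file)]
    return command


def run_generation(script_file, output_file, transcript_file):
    """Run the whole pipeline in a single call; raises if the script fails."""
    command = build_generate_command(script_file, output_file, transcript_file)
    subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting issues, and check=True surfaces a non-zero exit status as an exception instead of failing silently.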

Script JSON Format

The script JSON file must follow this structure:

{
  "title": "The History of Artificial Intelligence",
  "locale": "en",
  "lines": [
    {"speaker": "male", "paragraph": "Hello Deer! Welcome back to another episode."},
    {"speaker": "female", "paragraph": "Hey everyone! Today we have an exciting topic to discuss."},
    {"speaker": "male", "paragraph": "That's right! We're going to talk about..."}
  ]
}

Fields:

title: Title of the podcast episode (optional, used as heading in transcript)
locale: Language code - "en" for English or "zh" for Chinese
lines: Array of dialogue lines
- speaker: Either "male" or "female"
- paragraph: The dialogue text for this speaker
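
As an illustration of the fields above, the following sketch assembles a script dictionary and writes it as JSON. The function names make_script and write_script are assumptions introduced for this example; only the field names and allowed values come from this document.

```python
import json
from pathlib import Path

ALLOWED_LOCALES = ("en", "zh")
ALLOWED_SPEAKERS = ("male", "female")


def make_script(lines, locale="en", title=None):
    """Build the script dict from (speaker, paragraph) pairs.

    Hypothetical helper; it mirrors the documented fields only.
    """
    if locale not in ALLOWED_LOCALES:
        raise ValueError(f"unsupported locale: {locale}")
    script = {}
    if title is not None:
        script["title"] = title  # optional, used as the transcript heading
    script["locale"] = locale
    script["lines"] = []
    for speaker, paragraph in lines:
        if speaker not in ALLOWED_SPEAKERS:
            raise ValueError(f"unsupported speaker: {speaker}")
        script["lines"].append({"speaker": speaker, "paragraph": paragraph})
    return script


def write_script(script, path):
    """Write the script as UTF-8 JSON, keeping Chinese text unescaped."""
    text = json.dumps(script, ensure_ascii=False, indent=2)
    Path(path).write_text(text, encoding="utf-8")
```

For example, make_script([("male", "Hello Deer! Welcome back."), ("female", "Hey everyone!")], title="Demo") returns a dictionary with the title, locale, and lines keys in that order.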

Script Writing Guidelines

When creating the script JSON, follow these guidelines:

Format Requirements

Only two hosts: male and female, alternating naturally
Target runtime: approximately 10 minutes of dialogue (around 40-60 lines)
Start with the male host saying a greeting that includes "Hello Deer"

Tone & Style

Natural, conversational dialogue - like two friends chatting
Use casual expressions and conversational transitions
Avoid overly formal language or academic tone
Include reactions, follow-up questions, and natural interjections

Content Guidelines

Frequent back-and-forth between hosts
Keep sentences short and easy to follow when spoken
Plain text only - no markdown formatting in the output
Translate technical concepts into accessible language
No mathematical formulas, code, or complex notation
Make content engaging and accessible for audio-only listeners
Exclude meta information like dates, author names, or document structure
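
Some of the guidelines above can be checked mechanically before running generation. The sketch below is one possible pre-flight check, not something the skill provides: check_script is a hypothetical name, the markdown pattern is a rough heuristic, and the 40-60 line bounds are the approximate target stated above rather than a hard limit.

```python
import re

ALLOWED_SPEAKERS = {"male", "female"}
# Rough heuristic for common markdown: bold, inline code, headings, links.
MARKDOWN_PATTERN = re.compile(
    r"(\*\*|__|`|^#{1,6}\s|\[[^\]]*\]\([^)]*\))", re.MULTILINE
)


def check_script(script, min_lines=40, max_lines=60):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if script.get("locale") not in ("en", "zh"):
        problems.append("locale must be 'en' or 'zh'")
    lines = script.get("lines", [])
    if not lines:
        problems.append("script has no dialogue lines")
        return problems
    first = lines[0]
    if first.get("speaker") != "male" or "Hello Deer" not in first.get("paragraph", ""):
        problems.append("first line must be the male host greeting with 'Hello Deer'")
    if not min_lines <= len(lines) <= max_lines:
        problems.append(f"expected about {min_lines}-{max_lines} lines, found {len(lines)}")
    for index, line in enumerate(lines):
        if line.get("speaker") not in ALLOWED_SPEAKERS:
            problems.append(f"line {index}: speaker must be 'male' or 'female'")
        if MARKDOWN_PATTERN.search(line.get("paragraph", "")):
            problems.append(f"line {index}: paragraph appears to contain markdown")
    return problems
```

Tone, pacing, and accessibility cannot be verified this way; the check only catches structural slips such as a missing greeting or stray formatting.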

Podcast Generation Example

User request: "Generate a podcast about the history of artificial intelligence"

Step 1: Create script file /mnt/user-data/workspace/ai-history-script.json:

{
  "title": "The History of Artificial Intelligence",
  "locale": "en",
  "lines": [
    {"speaker": "male", "paragraph": "Hello Deer! Welcome back to another fascinating episode. Today we're diving into something that's literally shaping our future - the history of artificial intelligence."},
    {"speaker": "female", "paragraph": "Oh, I love this topic! You know, AI feels so modern, but it actually has roots going back over seventy years."},
    {"speaker": "male", "paragraph": "Exactly! It all started back in the 1950s. The term artificial intelligence was actually coined by John McCarthy in 1956 at a famous conference at Dartmouth."},
    {"speaker": "female", "paragraph": "Wait, so they were already thinking about machines that could think back then? That's incredible!"},
    {"speaker": "male", "paragraph": "Right? The early pioneers were so optimistic. They thought we'd have human-level AI within a generation."},
    {"speaker": "female", "paragraph": "But things didn't quite work out that way, did they?"},
    {"speaker": "male", "paragraph": "No, not at all. The 1970s brought what's called the first AI winter..."}
  ]
}

Step 2: Execute generation:

python /mnt/skills/public/podcast-generation/scripts/generate.py \
  --script-file /mnt/user-data/workspace/ai-history-script.json \
  --output-file /mnt/user-data/outputs/ai-history-podcast.mp3 \
  --transcript-file /mnt/user-data/outputs/ai-history-transcript.md

This will generate:

ai-history-podcast.mp3: The audio podcast file
ai-history-transcript.md: A readable markdown transcript of the podcast
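
The three file names in this example share one descriptive name. A small helper can derive them together, as sketched below. The podcast_paths function is hypothetical; the -script.json suffix is the documented naming pattern, while the -podcast.mp3 and -transcript.md suffixes are assumptions taken from this one example.

```python
from pathlib import PurePosixPath

# Directories as used throughout this document.
WORKSPACE_DIR = PurePosixPath("/mnt/user-data/workspace")
OUTPUTS_DIR = PurePosixPath("/mnt/user-data/outputs")


def podcast_paths(name):
    """Return the script, audio, and transcript paths for a descriptive name."""
    return {
        "script": str(WORKSPACE_DIR / f"{name}-script.json"),
        "audio": str(OUTPUTS_DIR / f"{name}-podcast.mp3"),
        "transcript": str(OUTPUTS_DIR / f"{name}-transcript.md"),
    }
```

Calling podcast_paths("ai-history") yields the same three paths used in the example above.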

Specific Templates

Read the following template file only when it matches the user's request.

Tech Explainer - For converting technical documentation and tutorials

Output Format

The generated podcast follows the "Hello Deer" format:

Two hosts: one male, one female
Natural conversational dialogue
Starts with "Hello Deer" greeting
Target duration: approximately 10 minutes
Alternating speakers for engaging flow

Output Handling

After generation:

Podcasts and transcripts are saved in /mnt/user-data/outputs/
Share both the podcast MP3 and the transcript MD with the user using the present_files tool
Provide a brief description of the generation result (topic, duration, hosts)
Offer to regenerate if adjustments are needed

Requirements

The following environment variables must be set:

VOLCENGINE_TTS_APPID: Volcengine TTS application ID
VOLCENGINE_TTS_ACCESS_TOKEN: Volcengine TTS access token
VOLCENGINE_TTS_CLUSTER: Volcengine TTS cluster (optional, defaults to "volcano_tts")
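
A quick check that these variables are present can save a failed run. The sketch below is an assumption about how such a check might look; this document does not show how generate.py itself reads the variables, and load_tts_settings is a hypothetical name.

```python
import os

REQUIRED_VARS = ("VOLCENGINE_TTS_APPID", "VOLCENGINE_TTS_ACCESS_TOKEN")
DEFAULT_CLUSTER = "volcano_tts"  # documented default for the optional variable


def load_tts_settings(environ=None):
    """Collect the TTS settings, raising if a required variable is missing."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {
        "appid": environ["VOLCENGINE_TTS_APPID"],
        "access_token": environ["VOLCENGINE_TTS_ACCESS_TOKEN"],
        "cluster": environ.get("VOLCENGINE_TTS_CLUSTER") or DEFAULT_CLUSTER,
    }
```

Accepting an optional mapping instead of always reading os.environ keeps the check easy to exercise with test values.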

Notes

Always execute the full pipeline in one call - no need to test individual steps or worry about timeouts
The script JSON should match the content language (en or zh)
Technical content should be simplified for audio accessibility in the script
Complex notations (formulas, code) should be translated to plain language in the script
Long content may result in longer podcasts

Weekly Installs

140

Repository

bytedance/deer-flow

GitHub Stars

27.8K

First Seen

Feb 17, 2026

Security Audits

Gen Agent Trust Hub: Pass
Socket: Fail
Snyk: Pass

Installed on

gemini-cli: 137
github-copilot: 137
opencode: 137
kimi-cli: 136
amp: 136
cursor: 136