# Transcription Automation

A comprehensive skill for automating audio/video transcription and content processing, from claude-office-skills/skills.

Install:

```shell
npx skills add https://github.com/claude-office-skills/skills --skill 'Transcription Automation'
```
Transcription pipeline:

```
┌─────────────────┐
│   Audio/Video   │
│      Input      │
└────────┬────────┘
         ▼
┌─────────────────┐
│  Preprocessing  │
│  - Convert      │
│  - Enhance      │
│  - Split        │
└────────┬────────┘
         ▼
┌─────────────────┐
│  Transcription  │
│  - STT engine   │
│  - Diarization  │
└────────┬────────┘
         ▼
┌─────────────────┐
│ Post-processing │
│  - Formatting   │
│  - Timestamps   │
│  - Speakers     │
└────────┬────────┘
         ▼
┌─────────────────┐
│     Output      │
│  - Text/SRT/VTT │
│  - Summary      │
└─────────────────┘
```
```yaml
transcription_config:
  engine: whisper            # whisper, assembly_ai, deepgram
  audio_settings:
    sample_rate: 16000
    channels: mono
    format: wav
  transcription:
    language: auto           # or a specific language: en, zh, es
    model: large             # tiny, base, small, medium, large
    task: transcribe         # transcribe or translate
  features:
    speaker_diarization: true
    word_timestamps: true
    punctuation: true
    profanity_filter: false
  output:
    formats:
      - txt
      - srt
      - vtt
      - json
    include_confidence: true
    include_timestamps: true
```
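The `audio_settings` above (16 kHz mono WAV) are the input format Whisper works best with. A minimal sketch of the preprocessing step, assuming `ffmpeg` is installed; the function only builds the argument list, and the file names are hypothetical:

```python
def ffmpeg_preprocess_cmd(src, dst, sample_rate=16000, channels=1):
    """Build an ffmpeg command that converts `src` to a WAV file
    with the configured sample rate and channel count."""
    return [
        "ffmpeg", "-y",          # overwrite output without asking
        "-i", src,
        "-ar", str(sample_rate), # resample to 16 kHz
        "-ac", str(channels),    # downmix to mono
        dst,
    ]

# To actually run it:
# subprocess.run(ffmpeg_preprocess_cmd("meeting.mp4", "meeting.wav"), check=True)
print(ffmpeg_preprocess_cmd("meeting.mp4", "meeting.wav"))
```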
```yaml
meeting_transcript:
  metadata:
    title: "{{meeting_title}}"
    date: "{{date}}"
    duration: "{{duration}}"
    attendees: "{{speakers}}"
  output_template: |
    # {{title}}
    **Date:** {{date}}
    **Duration:** {{duration}}
    **Attendees:** {{attendees}}

    ## Summary
    {{ai_summary}}

    ## Key Points
    {{#each key_points}}
    - {{this}}
    {{/each}}

    ## Action Items
    {{#each action_items}}
    - [ ] {{task}} - @{{assignee}} - Due: {{due_date}}
    {{/each}}

    ## Full Transcript
    {{#each segments}}
    **[{{timestamp}}] {{speaker}}:** {{text}}
    {{/each}}
```
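The `output_template` uses Handlebars-style placeholders. As a sketch of how the `{{#each segments}}` block might be rendered by hand (no real Handlebars engine; the field names simply follow the template above, and the sample data is made up):

```python
def render_segments(segments):
    """Render the {{#each segments}} block of the meeting template."""
    lines = []
    for seg in segments:
        lines.append(f"**[{seg['timestamp']}] {seg['speaker']}:** {seg['text']}")
    return "\n".join(lines)

segments = [
    {"timestamp": "00:00:05", "speaker": "SPEAKER_1", "text": "Welcome, everyone."},
    {"timestamp": "00:00:12", "speaker": "SPEAKER_2", "text": "Thanks for having us."},
]
print(render_segments(segments))
```

In practice a proper template engine (e.g. a Mustache/Handlebars implementation) would render the whole template, not just this loop.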
```yaml
diarization_config:
  min_speakers: 2
  max_speakers: 10
  speaker_labels:
    - name: "Speaker 1"
      voice_sample: "sample_1.wav"   # optional
    - name: "Speaker 2"
      voice_sample: "sample_2.wav"
  output_format:
    speaker_prefix: true
    speaker_timestamps: true
  example_output: |
    [00:00:05] SPEAKER_1: Welcome, everyone, to today's meeting.
    [00:00:12] SPEAKER_2: Thanks for having us.
    [00:00:18] SPEAKER_1: Let's start with the agenda.
```
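The `output_format` flags above can be sketched as a small formatter; assuming each diarized segment carries a start time in seconds, a speaker label, and text:

```python
def format_diarized(segments, speaker_prefix=True, speaker_timestamps=True):
    """Format diarized segments as '[HH:MM:SS] SPEAKER: text' lines."""
    lines = []
    for seg in segments:
        parts = []
        if speaker_timestamps:
            h, rem = divmod(int(seg["start"]), 3600)
            m, s = divmod(rem, 60)
            parts.append(f"[{h:02d}:{m:02d}:{s:02d}]")
        if speaker_prefix:
            parts.append(f"{seg['speaker']}:")
        parts.append(seg["text"])
        lines.append(" ".join(parts))
    return "\n".join(lines)

print(format_diarized(
    [{"start": 5, "speaker": "SPEAKER_1", "text": "Welcome, everyone."}]
))
# [00:00:05] SPEAKER_1: Welcome, everyone.
```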
```yaml
subtitle_config:
  format: srt
  timing:
    max_duration: 7      # seconds per subtitle cue
    min_gap: 0.1         # gap between cues, in seconds
  chars_per_line: 42
  max_lines: 2
  style:
    case: sentence       # sentence, upper, lower
    numbers: words       # words, digits
  example_output: |
    1
    00:00:05,000 --> 00:00:08,500
    Welcome to today's presentation
    on transcription automation.

    2
    00:00:09,000 --> 00:00:12,000
    Let me start by explaining the basic concepts.
```
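The SRT example above follows a fixed cue shape: index, `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, then up to `max_lines` lines of at most `chars_per_line` characters. A minimal sketch of a cue builder under those constraints (overflow beyond `max_lines` is simply truncated here; a real generator would split into another cue):

```python
import textwrap

def srt_timestamp(seconds):
    """Format seconds as the SRT 'HH:MM:SS,mmm' timestamp."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def srt_cue(index, start, end, text, chars_per_line=42, max_lines=2):
    """Build one SRT cue, wrapping text to the configured line width."""
    lines = textwrap.wrap(text, width=chars_per_line)[:max_lines]
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n" + "\n".join(lines)

print(srt_cue(1, 5.0, 8.5,
              "Welcome to today's presentation on transcription automation."))
```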
```yaml
vtt_config:
  format: vtt
  features:
    cue_settings: true
    styling: true
  example_output: |
    WEBVTT

    00:00:05.000 --> 00:00:08.500 align:center
    Welcome to today's presentation
    on transcription automation.

    00:00:09.000 --> 00:00:12.000 align:center
    <v Speaker 1>Let me start by explaining the basic concepts.
```
```yaml
zoom_transcription:
  trigger:
    event: recording_completed
  workflow:
    - step: download_recording
      source: zoom_cloud
    - step: transcribe
      engine: whisper
      language: auto
    - step: diarize
      identify_speakers: true
    - step: generate_notes
      template: meeting_notes
      include_summary: true
      extract_action_items: true
    - step: distribute
      destinations:
        - notion_page
        - slack_channel
        - email_attendees
```
```yaml
youtube_subtitles:
  trigger:
    event: video_uploaded
  workflow:
    - step: download_audio
      source: youtube_video
    - step: transcribe
      engine: whisper
      task: transcribe
    - step: generate_subtitles
      formats: [srt, vtt]
    - step: translate
      target_languages: [es, zh, ja, de, fr]
    - step: upload_subtitles
      destination: youtube
      as_cc: true
```
```yaml
podcast_workflow:
  input:
    source: rss_feed
    format: audio/mp3
  processing:
    - transcribe:
        engine: whisper
        model: large
    - generate_chapters:
        detect_topics: true
        min_duration: 60   # seconds
    - create_show_notes:
        summarize: true
        extract_links: true
        highlight_quotes: true
    - create_searchable_index:
        full_text: true
        timestamps: true
  output:
    - transcript_txt
    - chapters_json
    - show_notes_md
    - search_index
```
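The `generate_chapters` step above combines topic detection with a `min_duration` floor. The topic-detection part would need an LLM or embedding model, but the duration constraint can be sketched on its own: greedily merge consecutive segments into chapters of at least `min_duration` seconds (here every candidate boundary is accepted; real topic detection would decide where they fall). The segment dicts are made up for illustration:

```python
def group_chapters(segments, min_duration=60):
    """Greedily merge consecutive segments into chapters lasting
    at least `min_duration` seconds each."""
    chapters, current = [], None
    for seg in segments:
        if current is None:
            current = {"start": seg["start"], "end": seg["end"], "texts": [seg["text"]]}
        else:
            current["end"] = seg["end"]
            current["texts"].append(seg["text"])
        if current["end"] - current["start"] >= min_duration:
            chapters.append(current)   # chapter is long enough, close it
            current = None
    if current:                        # flush a trailing short chapter
        chapters.append(current)
    return chapters

segments = [{"start": i * 30, "end": i * 30 + 30, "text": f"segment {i}"} for i in range(4)]
print(len(group_chapters(segments, min_duration=60)))  # 2
```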
```yaml
multilingual:
  auto_detect: true
  supported_languages:
    - code: en
      name: English
      model: large
    - code: zh
      name: Chinese
      model: large
    - code: es
      name: Spanish
      model: large
    - code: ja
      name: Japanese
      model: medium
  translation:
    enabled: true
    target: en
    preserve_original: true

code_switching:
  enabled: true
  primary_language: en
  secondary_languages: [zh, es]
  output: |
    [00:01:23] The next topic is about 人工智能,
    which has been muy importante in recent years.
  handling:
    detect_language_per_segment: true
    tag_language_switches: true
```
```yaml
post_processing:
  text_cleanup:
    - remove_filler_words: ["um", "uh", "like"]
    - fix_common_errors: true
    - normalize_numbers: true
  formatting:
    - add_punctuation: true
    - capitalize_sentences: true
    - paragraph_breaks: true
  speaker_attribution:
    - merge_short_segments: true
    - min_segment_duration: 1.0
  output_enhancement:
    - add_timestamps: true
    - highlight_keywords: true
    - generate_summary: true
```
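The `remove_filler_words` step can be sketched with a word-boundary regex. Note the caveat this illustrates: naive removal also strips legitimate uses of words like "like", which is why production cleanup usually checks context:

```python
import re

def remove_filler_words(text, fillers=("um", "uh", "like")):
    """Strip standalone filler words (case-insensitive, word boundaries),
    then tidy leftover spacing before punctuation."""
    pattern = r"\b(?:" + "|".join(map(re.escape, fillers)) + r")\b,?\s*"
    cleaned = re.sub(pattern, "", text, flags=re.IGNORECASE)
    return re.sub(r"\s+([.,!?])", r"\1", cleaned).strip()

print(remove_filler_words("Um, so we, uh, shipped the release."))
# so we, shipped the release.
```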
```
Transcription Quality Report
═══════════════════════════════════════
File: meeting_2024_01_15.mp3
Duration: 45:32
Engine: Whisper Large

Metrics:
  Word error rate (WER): 4.2%
  Character error rate: 2.8%
  Confidence score: 0.94

Speaker diarization:
  Speakers detected: 4
  Diarization accuracy: 91%

Processing time:
  Total: 8m 23s
  Real-time factor: 0.18x

Issues detected:
  • Low confidence at 12:34 (background noise)
  • Overlapping speech at 23:45
  • Unknown speaker at 34:12
```
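The WER figure in the report is the word-level Levenshtein (edit) distance between the reference and the hypothesis, divided by the reference length. A minimal dynamic-programming sketch (the sample sentences are made up):

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[-1][-1] / len(ref)

# One substitution ("the" -> "an") over 6 reference words
print(word_error_rate("let us start with the agenda",
                      "let us start with an agenda"))
```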
OpenAI Whisper API:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Transcribe the audio file
with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json",
        timestamp_granularities=["word", "segment"],
    )

# Print each segment with its start time
for segment in transcript.segments:
    print(f"[{segment.start:.2f}] {segment.text}")
```
AssemblyAI:

```python
import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"

# Request speaker labels, automatic chapters and entity detection
config = aai.TranscriptionConfig(
    speaker_labels=True,
    auto_chapters=True,
    entity_detection=True,
)

transcriber = aai.Transcriber()
transcript = transcriber.transcribe(
    "https://example.com/meeting.mp3",
    config=config,
)

# Utterances are speaker-attributed turns
for utterance in transcript.utterances:
    print(f"Speaker {utterance.speaker}: {utterance.text}")
```