# Transcription Automation

A comprehensive skill for automating audio/video transcription and content processing, from claude-office-skills/skills.

Install:

```shell
npx skills add https://github.com/claude-office-skills/skills --skill 'Transcription Automation'
```
Transcription pipeline:

```
┌─────────────────┐
│   Audio/Video   │
│      Input      │
└────────┬────────┘
         ▼
┌─────────────────┐
│  Preprocessing  │
│  - Convert      │
│  - Enhance      │
│  - Split        │
└────────┬────────┘
         ▼
┌─────────────────┐
│  Transcription  │
│  - STT engine   │
│  - Diarization  │
└────────┬────────┘
         ▼
┌─────────────────┐
│ Post-processing │
│  - Formatting   │
│  - Timestamps   │
│  - Speakers     │
└────────┬────────┘
         ▼
┌─────────────────┐
│     Output      │
│  - Text/SRT/VTT │
│  - Summary      │
└─────────────────┘
```
```yaml
transcription_config:
  engine: whisper            # whisper, assembly_ai, deepgram
  audio_settings:
    sample_rate: 16000
    channels: mono
    format: wav
  transcription:
    language: auto           # or a specific language: en, zh, es
    model: large             # tiny, base, small, medium, large
    task: transcribe         # transcribe or translate
  features:
    speaker_diarization: true
    word_timestamps: true
    punctuation: true
    profanity_filter: false
  output:
    formats:
      - txt
      - srt
      - vtt
      - json
    include_confidence: true
    include_timestamps: true
```
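The `audio_settings` above (16 kHz mono WAV) are the input format Whisper works best with. A minimal sketch of the preprocessing step, assuming `ffmpeg` is installed; the function only builds the argument list, and the file names are hypothetical:

```python
def ffmpeg_preprocess_cmd(src, dst, sample_rate=16000, channels=1):
    """Build an ffmpeg command that converts `src` to a WAV file
    with the configured sample rate and channel count."""
    return [
        "ffmpeg", "-y",          # overwrite output without asking
        "-i", src,
        "-ar", str(sample_rate), # resample to 16 kHz
        "-ac", str(channels),    # downmix to mono
        dst,
    ]

# To actually run it:
# subprocess.run(ffmpeg_preprocess_cmd("meeting.mp4", "meeting.wav"), check=True)
print(ffmpeg_preprocess_cmd("meeting.mp4", "meeting.wav"))
```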
```yaml
meeting_transcript:
  metadata:
    title: "{{meeting_title}}"
    date: "{{date}}"
    duration: "{{duration}}"
    attendees: "{{speakers}}"
  output_template: |
    # {{title}}
    **Date:** {{date}}
    **Duration:** {{duration}}
    **Attendees:** {{attendees}}

    ## Summary
    {{ai_summary}}

    ## Key Points
    {{#each key_points}}
    - {{this}}
    {{/each}}

    ## Action Items
    {{#each action_items}}
    - [ ] {{task}} - @{{assignee}} - Due: {{due_date}}
    {{/each}}

    ## Full Transcript
    {{#each segments}}
    **[{{timestamp}}] {{speaker}}:** {{text}}
    {{/each}}
```
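The `output_template` uses Handlebars-style placeholders. As a sketch of how the `{{#each segments}}` block might be rendered by hand (no real Handlebars engine; the field names simply follow the template above, and the sample data is made up):

```python
def render_segments(segments):
    """Render the {{#each segments}} block of the meeting template."""
    lines = []
    for seg in segments:
        lines.append(f"**[{seg['timestamp']}] {seg['speaker']}:** {seg['text']}")
    return "\n".join(lines)

segments = [
    {"timestamp": "00:00:05", "speaker": "SPEAKER_1", "text": "Welcome, everyone."},
    {"timestamp": "00:00:12", "speaker": "SPEAKER_2", "text": "Thanks for having us."},
]
print(render_segments(segments))
```

In practice a proper template engine (e.g. a Mustache/Handlebars implementation) would render the whole template, not just this loop.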
```yaml
diarization_config:
  min_speakers: 2
  max_speakers: 10
  speaker_labels:
    - name: "Speaker 1"
      voice_sample: "sample_1.wav"   # optional
    - name: "Speaker 2"
      voice_sample: "sample_2.wav"
  output_format:
    speaker_prefix: true
    speaker_timestamps: true
  example_output: |
    [00:00:05] SPEAKER_1: Welcome, everyone, to today's meeting.
    [00:00:12] SPEAKER_2: Thanks for having us.
    [00:00:18] SPEAKER_1: Let's start with the agenda.
```
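The `output_format` flags above can be sketched as a small formatter; assuming each diarized segment carries a start time in seconds, a speaker label, and text:

```python
def format_diarized(segments, speaker_prefix=True, speaker_timestamps=True):
    """Format diarized segments as '[HH:MM:SS] SPEAKER: text' lines."""
    lines = []
    for seg in segments:
        parts = []
        if speaker_timestamps:
            h, rem = divmod(int(seg["start"]), 3600)
            m, s = divmod(rem, 60)
            parts.append(f"[{h:02d}:{m:02d}:{s:02d}]")
        if speaker_prefix:
            parts.append(f"{seg['speaker']}:")
        parts.append(seg["text"])
        lines.append(" ".join(parts))
    return "\n".join(lines)

print(format_diarized(
    [{"start": 5, "speaker": "SPEAKER_1", "text": "Welcome, everyone."}]
))
# [00:00:05] SPEAKER_1: Welcome, everyone.
```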
```yaml
subtitle_config:
  format: srt
  timing:
    max_duration: 7      # seconds per subtitle cue
    min_gap: 0.1         # gap between cues, in seconds
  chars_per_line: 42
  max_lines: 2
  style:
    case: sentence       # sentence, upper, lower
    numbers: words       # words, digits
  example_output: |
    1
    00:00:05,000 --> 00:00:08,500
    Welcome to today's presentation
    on transcription automation.

    2
    00:00:09,000 --> 00:00:12,000
    Let me start by explaining the basic concepts.
```
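The SRT example above follows a fixed cue shape: index, `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, then up to `max_lines` lines of at most `chars_per_line` characters. A minimal sketch of a cue builder under those constraints (overflow beyond `max_lines` is simply truncated here; a real generator would split into another cue):

```python
import textwrap

def srt_timestamp(seconds):
    """Format seconds as the SRT 'HH:MM:SS,mmm' timestamp."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def srt_cue(index, start, end, text, chars_per_line=42, max_lines=2):
    """Build one SRT cue, wrapping text to the configured line width."""
    lines = textwrap.wrap(text, width=chars_per_line)[:max_lines]
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n" + "\n".join(lines)

print(srt_cue(1, 5.0, 8.5,
              "Welcome to today's presentation on transcription automation."))
```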
```yaml
vtt_config:
  format: vtt
  features:
    cue_settings: true
    styling: true
  example_output: |
    WEBVTT

    00:00:05.000 --> 00:00:08.500 align:center
    Welcome to today's presentation
    on transcription automation.

    00:00:09.000 --> 00:00:12.000 align:center
    <v Speaker 1>Let me start by explaining the basic concepts.
```
```yaml
zoom_transcription:
  trigger:
    event: recording_completed
  workflow:
    - step: download_recording
      source: zoom_cloud
    - step: transcribe
      engine: whisper
      language: auto
    - step: diarize
      identify_speakers: true
    - step: generate_notes
      template: meeting_notes
      include_summary: true
      extract_action_items: true
    - step: distribute
      destinations:
        - notion_page
        - slack_channel
        - email_attendees
```
```yaml
youtube_subtitles:
  trigger:
    event: video_uploaded
  workflow:
    - step: download_audio
      source: youtube_video
    - step: transcribe
      engine: whisper
      task: transcribe
    - step: generate_subtitles
      formats: [srt, vtt]
    - step: translate
      target_languages: [es, zh, ja, de, fr]
    - step: upload_subtitles
      destination: youtube
      as_cc: true
```
```yaml
podcast_workflow:
  input:
    source: rss_feed
    format: audio/mp3
  processing:
    - transcribe:
        engine: whisper
        model: large
    - generate_chapters:
        detect_topics: true
        min_duration: 60   # seconds
    - create_show_notes:
        summarize: true
        extract_links: true
        highlight_quotes: true
    - create_searchable_index:
        full_text: true
        timestamps: true
  output:
    - transcript_txt
    - chapters_json
    - show_notes_md
    - search_index
```
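The `generate_chapters` step above combines topic detection with a `min_duration` floor. The topic-detection part would need an LLM or embedding model, but the duration constraint can be sketched on its own: greedily merge consecutive segments into chapters of at least `min_duration` seconds (here every candidate boundary is accepted; real topic detection would decide where they fall). The segment dicts are made up for illustration:

```python
def group_chapters(segments, min_duration=60):
    """Greedily merge consecutive segments into chapters lasting
    at least `min_duration` seconds each."""
    chapters, current = [], None
    for seg in segments:
        if current is None:
            current = {"start": seg["start"], "end": seg["end"], "texts": [seg["text"]]}
        else:
            current["end"] = seg["end"]
            current["texts"].append(seg["text"])
        if current["end"] - current["start"] >= min_duration:
            chapters.append(current)   # chapter is long enough, close it
            current = None
    if current:                        # flush a trailing short chapter
        chapters.append(current)
    return chapters

segments = [{"start": i * 30, "end": i * 30 + 30, "text": f"segment {i}"} for i in range(4)]
print(len(group_chapters(segments, min_duration=60)))  # 2
```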
```yaml
multilingual:
  auto_detect: true
  supported_languages:
    - code: en
      name: English
      model: large
    - code: zh
      name: Chinese
      model: large
    - code: es
      name: Spanish
      model: large
    - code: ja
      name: Japanese
      model: medium
  translation:
    enabled: true
    target: en
    preserve_original: true

code_switching:
  enabled: true
  primary_language: en
  secondary_languages: [zh, es]
  output: |
    [00:01:23] The next topic is about 人工智能,
    which has been muy importante in recent years.
  handling:
    detect_language_per_segment: true
    tag_language_switches: true
```
```yaml
post_processing:
  text_cleanup:
    - remove_filler_words: ["um", "uh", "like"]
    - fix_common_errors: true
    - normalize_numbers: true
  formatting:
    - add_punctuation: true
    - capitalize_sentences: true
    - paragraph_breaks: true
  speaker_attribution:
    - merge_short_segments: true
    - min_segment_duration: 1.0
  output_enhancement:
    - add_timestamps: true
    - highlight_keywords: true
    - generate_summary: true
```
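The `remove_filler_words` step can be sketched with a word-boundary regex. Note the caveat this illustrates: naive removal also strips legitimate uses of words like "like", which is why production cleanup usually checks context:

```python
import re

def remove_filler_words(text, fillers=("um", "uh", "like")):
    """Strip standalone filler words (case-insensitive, word boundaries),
    then tidy leftover spacing before punctuation."""
    pattern = r"\b(?:" + "|".join(map(re.escape, fillers)) + r")\b,?\s*"
    cleaned = re.sub(pattern, "", text, flags=re.IGNORECASE)
    return re.sub(r"\s+([.,!?])", r"\1", cleaned).strip()

print(remove_filler_words("Um, so we, uh, shipped the release."))
# so we, shipped the release.
```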
```
Transcription Quality Report
═══════════════════════════════════════
File: meeting_2024_01_15.mp3
Duration: 45:32
Engine: Whisper Large

Metrics:
  Word error rate (WER): 4.2%
  Character error rate: 2.8%
  Confidence score: 0.94

Speaker diarization:
  Speakers detected: 4
  Diarization accuracy: 91%

Processing time:
  Total: 8m 23s
  Real-time factor: 0.18x

Issues detected:
  • Low confidence at 12:34 (background noise)
  • Overlapping speech at 23:45
  • Unknown speaker at 34:12
```
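The WER figure in the report is the word-level Levenshtein (edit) distance between the reference and the hypothesis, divided by the reference length. A minimal dynamic-programming sketch (the sample sentences are made up):

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[-1][-1] / len(ref)

# One substitution ("the" -> "an") over 6 reference words
print(word_error_rate("let us start with the agenda",
                      "let us start with an agenda"))
```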
OpenAI Whisper API:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Transcribe the audio file
with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json",
        timestamp_granularities=["word", "segment"],
    )

# Print each segment with its start time
for segment in transcript.segments:
    print(f"[{segment.start:.2f}] {segment.text}")
```
AssemblyAI:

```python
import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"

# Request speaker labels, automatic chapters and entity detection
config = aai.TranscriptionConfig(
    speaker_labels=True,
    auto_chapters=True,
    entity_detection=True,
)

transcriber = aai.Transcriber()
transcript = transcriber.transcribe(
    "https://example.com/meeting.mp3",
    config=config,
)

# Utterances are speaker-attributed turns
for utterance in transcript.utterances:
    print(f"Speaker {utterance.speaker}: {utterance.text}")
```