audio-voice-recovery by pproenca/dot-skills
```shell
npx skills add https://github.com/pproenca/dot-skills --skill audio-voice-recovery
```
Comprehensive audio forensics and voice recovery guide providing CSI-level capabilities for recovering voice from low-quality, low-volume, or damaged audio recordings. Contains 45 rules across 8 categories, prioritized by impact to guide audio enhancement, forensic analysis, and transcription workflows.
Reference these guidelines when enhancing low-quality recordings, performing forensic audio analysis, or transcribing difficult audio. The rule categories, ordered by priority:
| Priority | Category | Impact | Prefix | Rules |
|---|---|---|---|---|
| 1 | Signal Preservation & Analysis | CRITICAL | signal- | 5 |
| 2 | Noise Profiling & Estimation | CRITICAL | noise- | 5 |
| 3 | Spectral Processing | HIGH | spectral- | 6 |
| 4 | Voice Isolation & Enhancement | HIGH | voice- | 7 |
| 5 | Temporal Processing | MEDIUM-HIGH | temporal- | 5 |
| 6 | Transcription & Recognition | MEDIUM | transcribe- | 5 |
| 7 | Forensic Authentication | MEDIUM | forensic- | 5 |
| 8 | Tool Integration & Automation | LOW-MEDIUM | tool- | 7 |
- signal-preserve-original - Never modify original recordings
- signal-lossless-format - Use lossless formats for processing
- signal-sample-rate - Preserve native sample rate
- signal-bit-depth - Use maximum bit depth for processing
- signal-analyze-first - Analyze before processing
- noise-profile-silence - Extract noise profile from silent segments
- noise-identify-type - Identify noise type before reduction
- noise-adaptive-estimation - Use adaptive estimation for non-stationary noise
- noise-snr-assessment - Measure SNR before and after
- noise-avoid-overprocessing - Avoid over-processing and musical artifacts
- spectral-subtraction - Apply spectral subtraction for stationary noise
- spectral-wiener-filter - Use Wiener filter for optimal noise estimation
- spectral-notch-filter - Apply notch filters for tonal interference
- spectral-band-limiting - Apply frequency band limiting for speech
- spectral-equalization - Use forensic equalization to restore intelligibility
- spectral-declip - Repair clipped audio before other processing
- voice-rnnoise - Use RNNoise for real-time ML denoising
- voice-dialogue-isolate - Use source separation for complex backgrounds
- voice-formant-preserve - Preserve formants during pitch manipulation
- voice-dereverb - Apply dereverberation for room echo
- voice-enhance-speech - Use AI speech enhancement services for quick results
- voice-vad-segment - Use VAD for targeted processing
- voice-frequency-boost - Boost frequency regions for specific phonemes
- temporal-dynamic-range - Use dynamic range compression for level consistency
- temporal-noise-gate - Apply noise gate to silence non-speech segments
- temporal-time-stretch - Use time stretching for intelligibility
- temporal-transient-repair - Repair transient damage (clicks, pops, dropouts)
- temporal-silence-trim - Trim silence and normalize before export
- transcribe-whisper - Use Whisper for noise-robust transcription
- transcribe-multipass - Use multi-pass transcription for difficult audio
- transcribe-segment - Segment audio for targeted transcription
- transcribe-confidence - Track confidence scores for uncertain words
- transcribe-hallucination - Detect and filter ASR hallucinations
- forensic-enf-analysis - Use ENF analysis for timestamp verification
- forensic-metadata - Extract and verify audio metadata
- forensic-tampering - Detect audio tampering and splices
- forensic-chain-custody - Document chain of custody for evidence
- forensic-speaker-id - Extract speaker characteristics for identification
- tool-ffmpeg-essentials - Master essential FFmpeg audio commands
- tool-sox-commands - Use SoX for advanced audio manipulation
- tool-python-pipeline - Build Python audio processing pipelines
- tool-audacity-workflow - Use Audacity for visual analysis and manual editing
- tool-install-guide - Install audio forensic toolchain
- tool-batch-automation - Automate batch processing workflows
- tool-quality-assessment - Measure audio quality metrics

| Tool | Purpose | Install |
|---|---|---|
| FFmpeg | Format conversion, filtering | brew install ffmpeg |
| SoX | Noise profiling, effects | brew install sox |
| Whisper | Speech transcription | pip install openai-whisper |
| librosa | Python audio analysis | pip install librosa |
| noisereduce | ML noise reduction | pip install noisereduce |
| Audacity | Visual editing | brew install audacity |
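To make the spectral rules concrete, here is a minimal NumPy sketch of magnitude spectral subtraction (spectral-subtraction) using a noise profile taken from a silent segment (noise-profile-silence), with a spectral floor to limit musical-noise artifacts (noise-avoid-overprocessing). This is an illustration only, not the skill's implementation; for real work, use tools from the table above (e.g. noisereduce), which implement far more refined estimators.

```python
import numpy as np

def spectral_subtract(signal, noise_clip, frame=512, hop=256, floor=0.05):
    """Basic magnitude spectral subtraction with a spectral floor.

    `noise_clip` is a segment of the recording known to contain only noise
    (e.g. leading silence), per the noise-profile-silence rule.
    """
    window = np.hanning(frame)

    def stft(x):
        n = 1 + (len(x) - frame) // hop
        return np.array([np.fft.rfft(window * x[i * hop:i * hop + frame])
                         for i in range(n)])

    # Average noise magnitude spectrum across the noise-only frames
    noise_mag = np.abs(stft(noise_clip)).mean(axis=0)

    spec = stft(signal)
    mag, phase = np.abs(spec), np.angle(spec)
    # Subtract the noise estimate; floor to a fraction of the original
    # magnitude to reduce "musical noise" artifacts
    clean_mag = np.maximum(mag - noise_mag, floor * mag)

    # Overlap-add resynthesis with the original phase
    out = np.zeros(len(signal))
    for i, frame_spec in enumerate(clean_mag * np.exp(1j * phase)):
        out[i * hop:i * hop + frame] += window * np.fft.irfft(frame_spec, frame)
    return out
```

The floor parameter trades residual noise against artifacts: a higher floor keeps more noise but sounds more natural.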
Use the bundled scripts to generate objective baselines, create a workflow plan, and verify results.
- scripts/preflight_audio.py - Generate a forensic preflight report (JSON or Markdown).
- scripts/plan_from_preflight.py - Create a workflow plan template from the preflight report.
- scripts/compare_audio.py - Compare objective metrics between baseline and processed audio.

Example usage:
```shell
# 1) Analyze and capture baseline metrics
python3 skills/.experimental/audio-voice-recovery/scripts/preflight_audio.py evidence.wav --out preflight.json

# 2) Generate a workflow plan template
python3 skills/.experimental/audio-voice-recovery/scripts/plan_from_preflight.py --preflight preflight.json --out plan.md

# 3) Compare baseline vs processed metrics
python3 skills/.experimental/audio-voice-recovery/scripts/compare_audio.py \
  --before evidence.wav \
  --after enhanced.wav \
  --format md \
  --out comparison.md
```
Align preflight with SWGDE Best Practices for the Enhancement of Digital Audio (20-a-001) and SWGDE Best Practices for Forensic Audio (08-a-001). Establish an objective baseline state and plan the workflow so processing does not introduce clipping, artifacts, or false "done" confidence. Use scripts/preflight_audio.py to capture baseline metrics and preserve the report with the case file.
Capture and record before processing:
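The authoritative baseline comes from scripts/preflight_audio.py; purely as an illustration of the kind of objective metrics worth capturing before touching the audio, a minimal sketch follows. The field names are assumptions for this example, not the script's actual schema.

```python
import numpy as np

def baseline_metrics(samples: np.ndarray, sample_rate: int) -> dict:
    """Compute a few objective baseline metrics for a mono float signal in [-1, 1].

    Illustrative only; the real report fields come from scripts/preflight_audio.py.
    """
    eps = 1e-12  # avoid log10(0) on digital silence
    peak = float(np.max(np.abs(samples)))
    rms = float(np.sqrt(np.mean(samples ** 2)))
    return {
        "duration_s": len(samples) / sample_rate,
        "peak_dbfs": 20 * np.log10(peak + eps),
        "rms_dbfs": 20 * np.log10(rms + eps),
        # Fraction of samples at or near full scale: a rough clipping indicator
        "clipped_ratio": float(np.mean(np.abs(samples) >= 0.999)),
    }
```

Run it on the untouched original and store the result alongside the evidence checksum so later processing can be judged against it.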
Procedure:
Generate a plan draft with scripts/plan_from_preflight.py and complete it with case-specific decisions.

Failure-pattern guardrails:
```shell
# 1. Analyze original (run preflight and capture baseline metrics)
python3 skills/.experimental/audio-voice-recovery/scripts/preflight_audio.py evidence.wav --out preflight.json

# 2. Create working copy with checksum
cp evidence.wav working.wav
sha256sum evidence.wav > evidence.sha256

# 3. Apply enhancement
ffmpeg -i working.wav -af "\
highpass=f=80,\
adeclick=w=55:o=75,\
afftdn=nr=12:nf=-30:nt=w,\
equalizer=f=2500:t=q:w=1:g=3,\
loudnorm=I=-16:TP=-1.5:LRA=11\
" enhanced.wav

# 4. Transcribe
whisper enhanced.wav --model large-v3 --language en

# 5. Verify original unchanged
sha256sum -c evidence.sha256

# 6. Verify improvement (objective comparison + A/B listening)
python3 skills/.experimental/audio-voice-recovery/scripts/compare_audio.py \
  --before evidence.wav \
  --after enhanced.wav \
  --format md \
  --out comparison.md
```
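Step 6's objective comparison is handled by the bundled compare_audio.py. Purely as a sketch of what a level-based comparison involves, a simplistic RMS and peak delta between two aligned mono arrays could be computed as below; `level_delta` is a hypothetical helper for illustration, not part of the skill.

```python
import numpy as np

def level_delta(before: np.ndarray, after: np.ndarray) -> dict:
    """Report RMS and peak changes in dB between two aligned mono signals.

    A crude stand-in for the bundled compare_audio.py metrics; positive
    values mean the processed file is louder on that measure.
    """
    eps = 1e-12  # guard against log10(0) on silent input

    def rms_db(x):
        return 20 * np.log10(np.sqrt(np.mean(x ** 2)) + eps)

    def peak_db(x):
        return 20 * np.log10(np.max(np.abs(x)) + eps)

    return {
        "rms_delta_db": rms_db(after) - rms_db(before),
        "peak_delta_db": peak_db(after) - peak_db(before),
    }
```

Level deltas alone say nothing about intelligibility, which is why the workflow pairs the objective comparison with A/B listening.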
Read individual reference files for detailed explanations and code examples:
| File | Description |
|---|---|
| AGENTS.md | Complete compiled guide with all rules |
| references/_sections.md | Category definitions and ordering |
| assets/templates/_template.md | Template for new rules |
| metadata.json | Version and reference information |
Weekly Installs: 58
Repository: https://github.com/pproenca/dot-skills
GitHub Stars: 88
First Seen: Feb 5, 2026
Security Audits: Gen Agent Trust Hub: Pass; Socket: Pass; Snyk: Pass
Installed on: codex (54), gemini-cli (53), github-copilot (52), cursor (52), opencode (52), kimi-cli (51)