sound-engineer by erichowens/some_claude_skills
npx skills add https://github.com/erichowens/some_claude_skills --skill sound-engineer
Expert audio engineer for interactive media: games, VR/AR, and mobile apps. Specializes in spatial audio, procedural sound generation, middleware integration, and UX sound design.
✅ Use for:
❌ Do NOT use for:
| MCP | Purpose |
|---|---|
| ElevenLabs | text_to_sound_effects - Generate UI sounds, notifications, impacts |
| Firecrawl | Research Wwise/FMOD docs, DSP algorithms, platform guidelines |
| WebFetch | Fetch Apple/Android audio session documentation |
| Topic | Novice | Expert |
|---|---|---|
| Spatial audio | "Just pan left/right" | Uses HRTF convolution for true 3D; knows Ambisonics for VR head tracking |
| Footsteps | "Use 10-20 samples" | Procedural synthesis: infinite variation, tiny memory, parameter-driven |
| Middleware | "Just play sounds" | Uses RTPC for continuous params, Switches for materials, States for music |
| Adaptive music | "Crossfade tracks" | Horizontal re-orchestration (layers) + vertical remixing (stems) |
| UI sounds | "Any click sound works" | Designs for brand consistency, accessibility, haptic coordination |
| iOS audio | "AVAudioPlayer works" | Knows AVAudioSession categories, interruption handling, route changes |
| Distance rolloff | Linear attenuation | Inverse square with reference distance; logarithmic for realism |
| CPU budget | "Audio is cheap" | Knows 5-10% budget; HRTF convolution is expensive (2ms/source) |
What it looks like: 20 footstep samples × 6 surfaces × 3 intensities = 360 files (180 MB)
Why it's wrong: Memory bloat; repetition becomes audible after 20 minutes of play
What to do instead: Procedural synthesis: impact + texture layers, infinite variation from parameters
When samples are OK: Small games, very specific character sounds

What it looks like: Full HRTF convolution on 50 simultaneous sources
Why it's wrong: 50 × 2ms = 100ms of CPU time; destroys the frame budget
What to do instead: HRTF for the 3-5 important sources; Ambisonics for the ambient bed; simple panning for distant or unimportant sounds

What it looks like: App audio stops when the user gets a phone call and never resumes
Why it's wrong: iOS and Android require explicit session management
What to do instead: Implement AVAudioSession (iOS) or AudioFocus (Android); handle interruptions and route changes

What it looks like: PlaySound("footstep_concrete_01.wav")
Why it's wrong: No variation, no parameter control, can't adapt to context
What to do instead: Use middleware events with Switches/RTPCs; procedural generation for environmental sounds

What it looks like: Every button click at -3dB, the same volume as gameplay audio
Why it's wrong: UI sounds should be subtle and never fatiguing; loud clicks violate platform guidelines
What to do instead: UI sounds at -18 to -24dB; use short, high-frequency transients; respect system volume
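The spatialization anti-pattern above implies a triage step before any per-source processing. One way to sketch it, with hypothetical `Source`/`assign_tiers` names and an arbitrary importance-over-distance score:

```cpp
#include <algorithm>
#include <vector>

enum class SpatialTier { HRTF, Ambisonic, Pan };

struct Source { float distance; float importance; SpatialTier tier; };

// Hypothetical triage: give the few closest/most important sources full
// HRTF, route mid-range sources into the shared Ambisonic bed, and fall
// back to plain panning for everything far away.
void assign_tiers(std::vector<Source>& sources, size_t hrtf_slots = 4) {
    std::sort(sources.begin(), sources.end(),
              [](const Source& a, const Source& b) {
                  return a.importance / (1.0f + a.distance) >
                         b.importance / (1.0f + b.distance);
              });
    for (size_t i = 0; i < sources.size(); ++i) {
        if (i < hrtf_slots)
            sources[i].tier = SpatialTier::HRTF;
        else if (sources[i].distance < 30.0f)  // arbitrary cutoff
            sources[i].tier = SpatialTier::Ambisonic;
        else
            sources[i].tier = SpatialTier::Pan;
    }
}
```

The scoring function and the 30 m cutoff are placeholders; the point is that the expensive path gets a fixed number of slots, so CPU cost no longer scales with scene size.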
| Approach | CPU Cost | Quality | Use Case |
|---|---|---|---|
| Stereo panning | ~0.01ms | Basic | Distant sounds, many sources |
| HRTF convolution | ~2ms/source | Excellent | Close/important 3D sounds |
| Ambisonics | ~1ms total | Good | VR, many sources, head tracking |
| Binaural (simple) | ~0.1ms/source | Decent | Budget/mobile spatial |
HRTF: Convolves audio with measured ear impulse responses (512-1024 taps). Creates convincing 3D positioning, including elevation.
Ambisonics: Encodes the sound field as spherical harmonics (W, X, Y, Z for 1st order). Rotation-invariant and efficient for many sources.
// Key insight: encode once, rotate cheaply
AmbisonicSignal encode(float mono, Vec3 direction) {
return {
mono * 0.707f, // W (omnidirectional)
mono * direction.x, // X (front-back)
mono * direction.y, // Y (left-right)
mono * direction.z // Z (up-down)
};
}
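The "rotate cheaply" comment can be made concrete: first-order rotation about the vertical axis is a plain 2D rotation of the X/Y channels, applied once to the mixed field rather than per source. A sketch, assuming an `AmbisonicSignal` struct with the W/X/Y/Z layout used by `encode` above:

```cpp
#include <cmath>

struct AmbisonicSignal { float w, x, y, z; };  // assumed layout: W, X, Y, Z

// Rotate the whole sound field about the vertical axis by `yaw` radians
// (e.g. the negative of the listener's head yaw in VR head tracking).
// W and Z are untouched; X/Y get a standard 2D rotation. The cost is
// per-field, not per-source, since only four channels are touched.
AmbisonicSignal rotate_yaw(const AmbisonicSignal& s, float yaw) {
    float c = std::cos(yaw), sn = std::sin(yaw);
    return { s.w, c * s.x - sn * s.y, sn * s.x + c * s.y, s.z };
}
```

This is why Ambisonics suits head tracking: fifty encoded sources still cost one rotation.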
Why procedural beats samples:
Core synthesis:
// Surface resonance frequencies (expert knowledge)
float get_resonance(Surface s) {
switch(s) {
case Concrete: return 150.0f; // Low, dull
case Wood: return 250.0f; // Mid, warm
case Metal: return 500.0f; // High, ringing
case Gravel: return 300.0f; // Crunchy mid
default: return 200.0f;
}
}
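One way the resonance table can drive an actual step: a decaying sine at the surface frequency plus a short noise burst for the scuff, scaled by impact force. A simplified sketch; `synth_footstep` and its envelope constants are illustrative, not a production synthesizer:

```cpp
#include <cmath>
#include <cstdlib>
#include <vector>

// Minimal sketch: one 100 ms footstep = resonant "thump" (decaying sine
// at the surface frequency) plus a noise layer for texture. `force` in
// 0-1 scales both level and decay, so every step differs slightly.
std::vector<float> synth_footstep(float resonance_hz, float force,
                                  int sample_rate = 48000) {
    int n = sample_rate / 10;  // 100 ms
    std::vector<float> out(n);
    for (int i = 0; i < n; ++i) {
        float t = static_cast<float>(i) / sample_rate;
        float env = std::exp(-t * (30.0f + 20.0f * force));  // fast decay
        float thump = std::sin(2.0f * 3.14159265f * resonance_hz * t);
        float noise = (std::rand() / (float)RAND_MAX) * 2.0f - 1.0f;
        out[i] = env * (0.8f * thump + 0.2f * noise) * force;
    }
    return out;
}
```

A concrete step would call it as `synth_footstep(get_resonance(Concrete), impact_force)`; the per-step cost is a few thousand samples of math instead of hundreds of kilobytes of assets.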
Key abstractions:
Events: Trigger sounds (footstep, explosion, ambient loop)
RTPC: Continuous parameters (speed 0-100, health 0-1)
Switches: Discrete choices (surface type, weapon type)
States: Global context (music intensity, underwater)
// Material-aware footsteps via Wwise
void OnFootDown(FHitResult& hit) {
    FString surface = DetectSurface(hit.PhysMaterial);
    float speed = GetVelocity().Size();

    SetSwitch("Surface", surface, this);        // Concrete/Wood/Metal
    SetRTPCValue("Impact_Force", speed/600.0f); // 0-1 normalized
    PostEvent(FootstepEvent, this);
}
Principles for app sounds:
Sound types:
| Category | Examples | Duration | Character |
|---|---|---|---|
| Tap feedback | Button, toggle | 30-80ms | Soft, high-frequency click |
| Success | Save, send, complete | 150-300ms | Rising, positive tone |
| Error | Invalid, failed | 200-400ms | Descending, minor tone |
| Notification | Alert, reminder | 300-800ms | Distinctive, attention-getting |
| Transition | Screen change, modal | 100-250ms | Whoosh, subtle movement |
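As an illustration of the "tap feedback" row and the -18 to -24 dB guideline, a hypothetical click generator: a 50 ms, 2 kHz sine burst with a fast decay, pre-attenuated to roughly -20 dBFS. The constants are illustrative choices, not platform requirements:

```cpp
#include <cmath>
#include <vector>

// Hypothetical tap-feedback click: short, high-frequency, and quiet.
// gain = 0.1 linear is about -20 dBFS, so the click sits well under
// content audio; the envelope kills it within ~30 ms.
std::vector<float> synth_tap_click(int sample_rate = 48000) {
    const float freq_hz = 2000.0f, gain = 0.1f;
    int n = sample_rate / 20;  // 50 ms
    std::vector<float> out(n);
    for (int i = 0; i < n; ++i) {
        float t = static_cast<float>(i) / sample_rate;
        out[i] = gain * std::exp(-t * 120.0f)
                      * std::sin(2.0f * 3.14159265f * freq_hz * t);
    }
    return out;
}
```

Baking the attenuation into the asset (rather than relying on a mixer fader) makes it harder for a UI sound to ever ship at gameplay level.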
iOS AVAudioSession categories:
.ambient - Mixes with other audio, silenced by ringer
.playback - Interrupts other audio, ignores ringer
.playAndRecord - For voice apps
.soloAmbient - Default, silences other audio

Critical handlers:
Interruption (phone call)
Route change (headphones unplugged)
Secondary audio (Siri)
// Proper iOS audio session setup
func configureAudioSession() {
    let session = AVAudioSession.sharedInstance()
    try? session.setCategory(.playback, mode: .default, options: [.mixWithOthers])
    try? session.setActive(true)

    NotificationCenter.default.addObserver(
        self,
        selector: #selector(handleInterruption),
        name: AVAudioSession.interruptionNotification,
        object: nil
    )
}
| Operation | CPU Time | Notes |
|---|---|---|
| HRTF convolution (512-tap) | ~2ms/source | Use FFT overlap-add |
| Ambisonic encode | ~0.1ms/source | Very efficient |
| Ambisonic decode (binaural) | ~1ms total | Supports many sources |
| Procedural footstep | ~1-2ms | vs 500KB per sample |
| Wind synthesis | ~0.5ms/frame | Real-time streaming |
| Wwise event post | <0.1ms | Negligible |
| iOS audio callback | 5-10ms budget | At 48kHz/512 samples |
Budget guideline: Audio should use 5-10% of frame time.
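The budget guideline can be sanity-checked against the callback numbers in the table. A back-of-envelope sketch (the 75% headroom factor is an assumption; the ~2 ms/source HRTF cost comes from the table above):

```cpp
// A 512-sample callback at 48 kHz gives ~10.7 ms of real time. Reserving
// some headroom for mixing and decode, the ~2 ms/source HRTF figure caps
// you at a handful of full-HRTF voices, matching the 3-5 rule of thumb.
int max_hrtf_sources(int buffer_samples = 512, int sample_rate = 48000,
                     double headroom = 0.75, double hrtf_ms = 2.0) {
    double callback_ms = 1000.0 * buffer_samples / sample_rate;
    return static_cast<int>(callback_ms * headroom / hrtf_ms);
}
```

With the defaults this lands at around 4 sources; everything else has to go through the Ambisonic bed or simple panning.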
.ambient + mixWithOthers
.playback (interrupt music)
.playAndRecord
.playback

For detailed implementations: See /references/implementations.md
Remember: Great audio is invisible—players feel it, don't notice it. Focus on supporting the experience, not showing off. Procedural audio saves memory and eliminates repetition. Always respect CPU budgets and platform audio session requirements.
Weekly Installs
88
Repository
GitHub Stars
78
First Seen
Jan 22, 2026
Security Audits
Gen Agent Trust Hub: Pass
Socket: Pass
Snyk: Pass
Installed on
opencode: 77
codex: 76
gemini-cli: 76
claude-code: 68
github-copilot: 68
cursor: 68