sound-engineer by erichowens/some_claude_skills
npx skills add https://github.com/erichowens/some_claude_skills --skill sound-engineer
Expert audio engineer for interactive media: games, VR/AR, and mobile apps. Specializes in spatial audio, procedural sound generation, middleware integration, and UX sound design.
✅ Use for:
❌ Do NOT use for:
| MCP | Purpose |
|---|---|
| ElevenLabs | text_to_sound_effects - Generate UI sounds, notifications, impacts |
| Firecrawl | Research Wwise/FMOD docs, DSP algorithms, platform guidelines |
| WebFetch | Fetch Apple/Android audio session documentation |
| Topic | Novice | Expert |
|---|---|---|
| Spatial audio | "Just pan left/right" | Uses HRTF convolution for true 3D; knows Ambisonics for VR head tracking |
| Footsteps | "Use 10-20 samples" | Procedural synthesis: infinite variation, tiny memory, parameter-driven |
| Middleware | "Just play sounds" | Uses RTPC for continuous params, Switches for materials, States for music |
| Adaptive music | "Crossfade tracks" | Horizontal re-orchestration (layers) + vertical remixing (stems) |
| UI sounds | "Any click sound works" | Designs for brand consistency, accessibility, haptic coordination |
| iOS audio | "AVAudioPlayer works" | Knows AVAudioSession categories, interruption handling, route changes |
| Distance rolloff | Linear attenuation | Inverse square with reference distance; logarithmic for realism |
| CPU budget | "Audio is cheap" | Knows 5-10% budget; HRTF convolution is expensive (2ms/source) |
What it looks like: 20 footstep samples × 6 surfaces × 3 intensities = 360 files (180 MB)
Why it's wrong: Memory bloat; repetition becomes audible after 20 minutes of play
What to do instead: Procedural synthesis: impact + texture layers, infinite variation from parameters
When samples are OK: Small games, very specific character sounds

What it looks like: Full HRTF convolution on 50 simultaneous sources
Why it's wrong: 50 × 2ms = 100ms of CPU time; destroys the frame budget
What to do instead: HRTF for the 3-5 important sources; Ambisonics for the ambient bed; simple panning for distant or unimportant sounds

What it looks like: App audio stops when the user gets a phone call and never resumes
Why it's wrong: iOS and Android require explicit session management
What to do instead: Implement AVAudioSession (iOS) or AudioFocus (Android); handle interruptions and route changes

What it looks like: PlaySound("footstep_concrete_01.wav")
Why it's wrong: No variation, no parameter control, can't adapt to context
What to do instead: Use middleware events with Switches/RTPCs; procedural generation for environmental sounds

What it looks like: Every button click at -3dB, the same volume as gameplay audio
Why it's wrong: UI sounds should be subtle and never fatiguing; loud clicks violate platform guidelines
What to do instead: UI sounds at -18 to -24dB; use short, high-frequency transients; respect system volume
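The spatialization anti-pattern above implies a triage step before any per-source processing. One way to sketch it, with hypothetical `Source`/`assign_tiers` names and an arbitrary importance-over-distance score:

```cpp
#include <algorithm>
#include <vector>

enum class SpatialTier { HRTF, Ambisonic, Pan };

struct Source { float distance; float importance; SpatialTier tier; };

// Hypothetical triage: give the few closest/most important sources full
// HRTF, route mid-range sources into the shared Ambisonic bed, and fall
// back to plain panning for everything far away.
void assign_tiers(std::vector<Source>& sources, size_t hrtf_slots = 4) {
    std::sort(sources.begin(), sources.end(),
              [](const Source& a, const Source& b) {
                  return a.importance / (1.0f + a.distance) >
                         b.importance / (1.0f + b.distance);
              });
    for (size_t i = 0; i < sources.size(); ++i) {
        if (i < hrtf_slots)
            sources[i].tier = SpatialTier::HRTF;
        else if (sources[i].distance < 30.0f)  // arbitrary cutoff
            sources[i].tier = SpatialTier::Ambisonic;
        else
            sources[i].tier = SpatialTier::Pan;
    }
}
```

The scoring function and the 30 m cutoff are placeholders; the point is that the expensive path gets a fixed number of slots, so CPU cost no longer scales with scene size.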
| Approach | CPU Cost | Quality | Use Case |
|---|---|---|---|
| Stereo panning | ~0.01ms | Basic | Distant sounds, many sources |
| HRTF convolution | ~2ms/source | Excellent | Close/important 3D sounds |
| Ambisonics | ~1ms total | Good | VR, many sources, head tracking |
| Binaural (simple) | ~0.1ms/source | Decent | Budget/mobile spatial |
HRTF: Convolves audio with measured ear impulse responses (512-1024 taps). Creates convincing 3D positioning, including elevation.
Ambisonics: Encodes the sound field as spherical harmonics (W, X, Y, Z for 1st order). Rotation-invariant and efficient for many sources.
// Key insight: encode once, rotate cheaply
AmbisonicSignal encode(float mono, Vec3 direction) {
return {
mono * 0.707f, // W (omnidirectional)
mono * direction.x, // X (front-back)
mono * direction.y, // Y (left-right)
mono * direction.z // Z (up-down)
};
}
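The "rotate cheaply" comment can be made concrete: first-order rotation about the vertical axis is a plain 2D rotation of the X/Y channels, applied once to the mixed field rather than per source. A sketch, assuming an `AmbisonicSignal` struct with the W/X/Y/Z layout used by `encode` above:

```cpp
#include <cmath>

struct AmbisonicSignal { float w, x, y, z; };  // assumed layout: W, X, Y, Z

// Rotate the whole sound field about the vertical axis by `yaw` radians
// (e.g. the negative of the listener's head yaw in VR head tracking).
// W and Z are untouched; X/Y get a standard 2D rotation. The cost is
// per-field, not per-source, since only four channels are touched.
AmbisonicSignal rotate_yaw(const AmbisonicSignal& s, float yaw) {
    float c = std::cos(yaw), sn = std::sin(yaw);
    return { s.w, c * s.x - sn * s.y, sn * s.x + c * s.y, s.z };
}
```

This is why Ambisonics suits head tracking: fifty encoded sources still cost one rotation.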
Why procedural beats samples:
Core synthesis:
// Surface resonance frequencies (expert knowledge)
float get_resonance(Surface s) {
switch(s) {
case Concrete: return 150.0f; // Low, dull
case Wood: return 250.0f; // Mid, warm
case Metal: return 500.0f; // High, ringing
case Gravel: return 300.0f; // Crunchy mid
default: return 200.0f;
}
}
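One way the resonance table can drive an actual step: a decaying sine at the surface frequency plus a short noise burst for the scuff, scaled by impact force. A simplified sketch; `synth_footstep` and its envelope constants are illustrative, not a production synthesizer:

```cpp
#include <cmath>
#include <cstdlib>
#include <vector>

// Minimal sketch: one 100 ms footstep = resonant "thump" (decaying sine
// at the surface frequency) plus a noise layer for texture. `force` in
// 0-1 scales both level and decay, so every step differs slightly.
std::vector<float> synth_footstep(float resonance_hz, float force,
                                  int sample_rate = 48000) {
    int n = sample_rate / 10;  // 100 ms
    std::vector<float> out(n);
    for (int i = 0; i < n; ++i) {
        float t = static_cast<float>(i) / sample_rate;
        float env = std::exp(-t * (30.0f + 20.0f * force));  // fast decay
        float thump = std::sin(2.0f * 3.14159265f * resonance_hz * t);
        float noise = (std::rand() / (float)RAND_MAX) * 2.0f - 1.0f;
        out[i] = env * (0.8f * thump + 0.2f * noise) * force;
    }
    return out;
}
```

A concrete step would call it as `synth_footstep(get_resonance(Concrete), impact_force)`; the per-step cost is a few thousand samples of math instead of hundreds of kilobytes of assets.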
Key abstractions:
Events: Trigger sounds (footstep, explosion, ambient loop)
RTPC: Continuous parameters (speed 0-100, health 0-1)
Switches: Discrete choices (surface type, weapon type)
States: Global context (music intensity, underwater)
// Material-aware footsteps via Wwise
void OnFootDown(FHitResult& hit) {
    FString surface = DetectSurface(hit.PhysMaterial);
    float speed = GetVelocity().Size();

    SetSwitch("Surface", surface, this);        // Concrete/Wood/Metal
    SetRTPCValue("Impact_Force", speed/600.0f); // 0-1 normalized
    PostEvent(FootstepEvent, this);
}
Principles for app sounds:
Sound types:
| Category | Examples | Duration | Character |
|---|---|---|---|
| Tap feedback | Button, toggle | 30-80ms | Soft, high-frequency click |
| Success | Save, send, complete | 150-300ms | Rising, positive tone |
| Error | Invalid, failed | 200-400ms | Descending, minor tone |
| Notification | Alert, reminder | 300-800ms | Distinctive, attention-getting |
| Transition | Screen change, modal | 100-250ms | Whoosh, subtle movement |
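As an illustration of the "tap feedback" row and the -18 to -24 dB guideline, a hypothetical click generator: a 50 ms, 2 kHz sine burst with a fast decay, pre-attenuated to roughly -20 dBFS. The constants are illustrative choices, not platform requirements:

```cpp
#include <cmath>
#include <vector>

// Hypothetical tap-feedback click: short, high-frequency, and quiet.
// gain = 0.1 linear is about -20 dBFS, so the click sits well under
// content audio; the envelope kills it within ~30 ms.
std::vector<float> synth_tap_click(int sample_rate = 48000) {
    const float freq_hz = 2000.0f, gain = 0.1f;
    int n = sample_rate / 20;  // 50 ms
    std::vector<float> out(n);
    for (int i = 0; i < n; ++i) {
        float t = static_cast<float>(i) / sample_rate;
        out[i] = gain * std::exp(-t * 120.0f)
                      * std::sin(2.0f * 3.14159265f * freq_hz * t);
    }
    return out;
}
```

Baking the attenuation into the asset (rather than relying on a mixer fader) makes it harder for a UI sound to ever ship at gameplay level.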
iOS AVAudioSession categories:
.ambient - Mixes with other audio, silenced by ringer
.playback - Interrupts other audio, ignores ringer
.playAndRecord - For voice apps
.soloAmbient - Default, silences other audio

Critical handlers:
Interruption (phone call)
Route change (headphones unplugged)
Secondary audio (Siri)
// Proper iOS audio session setup
func configureAudioSession() {
    let session = AVAudioSession.sharedInstance()
    try? session.setCategory(.playback, mode: .default, options: [.mixWithOthers])
    try? session.setActive(true)

    NotificationCenter.default.addObserver(
        self,
        selector: #selector(handleInterruption),
        name: AVAudioSession.interruptionNotification,
        object: nil
    )
}
| Operation | CPU Time | Notes |
|---|---|---|
| HRTF convolution (512-tap) | ~2ms/source | Use FFT overlap-add |
| Ambisonic encode | ~0.1ms/source | Very efficient |
| Ambisonic decode (binaural) | ~1ms total | Supports many sources |
| Procedural footstep | ~1-2ms | vs 500KB per sample |
| Wind synthesis | ~0.5ms/frame | Real-time streaming |
| Wwise event post | <0.1ms | Negligible |
| iOS audio callback | 5-10ms budget | At 48kHz/512 samples |
Budget guideline: Audio should use 5-10% of frame time.
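The budget guideline can be sanity-checked against the callback numbers in the table. A back-of-envelope sketch (the 75% headroom factor is an assumption; the ~2 ms/source HRTF cost comes from the table above):

```cpp
// A 512-sample callback at 48 kHz gives ~10.7 ms of real time. Reserving
// some headroom for mixing and decode, the ~2 ms/source HRTF figure caps
// you at a handful of full-HRTF voices, matching the 3-5 rule of thumb.
int max_hrtf_sources(int buffer_samples = 512, int sample_rate = 48000,
                     double headroom = 0.75, double hrtf_ms = 2.0) {
    double callback_ms = 1000.0 * buffer_samples / sample_rate;
    return static_cast<int>(callback_ms * headroom / hrtf_ms);
}
```

With the defaults this lands at around 4 sources; everything else has to go through the Ambisonic bed or simple panning.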
.ambient + mixWithOthers
.playback (interrupt music)
.playAndRecord
.playback

For detailed implementations: See /references/implementations.md
Remember: Great audio is invisible—players feel it, don't notice it. Focus on supporting the experience, not showing off. Procedural audio saves memory and eliminates repetition. Always respect CPU budgets and platform audio session requirements.
Weekly Installs
88
Repository
GitHub Stars
78
First Seen
Jan 22, 2026
Security Audits
Gen Agent Trust Hub: Pass
Socket: Pass
Snyk: Pass
Installed on
opencode: 77
codex: 76
gemini-cli: 76
claude-code: 68
github-copilot: 68
cursor: 68