azure-ai-voicelive-py by sickn33/antigravity-awesome-skills
npx skills add https://github.com/sickn33/antigravity-awesome-skills --skill azure-ai-voicelive-py
Build real-time voice AI applications with bidirectional WebSocket communication.
pip install azure-ai-voicelive aiohttp azure-identity
AZURE_COGNITIVE_SERVICES_ENDPOINT=https://<region>.api.cognitive.microsoft.com
# For API key auth (not recommended for production)
AZURE_COGNITIVE_SERVICES_KEY=<api-key>
DefaultAzureCredential (preferred):

import os

from azure.ai.voicelive.aio import connect
from azure.identity.aio import DefaultAzureCredential

async with connect(
    endpoint=os.environ["AZURE_COGNITIVE_SERVICES_ENDPOINT"],
    credential=DefaultAzureCredential(),
    model="gpt-4o-realtime-preview",
    credential_scopes=["https://cognitiveservices.azure.com/.default"]
) as conn:
    ...
API Key:

import os

from azure.ai.voicelive.aio import connect
from azure.core.credentials import AzureKeyCredential

async with connect(
    endpoint=os.environ["AZURE_COGNITIVE_SERVICES_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_COGNITIVE_SERVICES_KEY"]),
    model="gpt-4o-realtime-preview"
) as conn:
    ...
import asyncio
import os

from azure.ai.voicelive.aio import connect
from azure.identity.aio import DefaultAzureCredential

async def main():
    async with connect(
        endpoint=os.environ["AZURE_COGNITIVE_SERVICES_ENDPOINT"],
        credential=DefaultAzureCredential(),
        model="gpt-4o-realtime-preview",
        credential_scopes=["https://cognitiveservices.azure.com/.default"]
    ) as conn:
        # Update session with instructions
        await conn.session.update(session={
            "instructions": "You are a helpful assistant.",
            "modalities": ["text", "audio"],
            "voice": "alloy"
        })
        # Listen for events
        async for event in conn:
            print(f"Event: {event.type}")
            if event.type == "response.audio_transcript.done":
                print(f"Transcript: {event.transcript}")
            elif event.type == "response.done":
                break

asyncio.run(main())
The VoiceLiveConnection exposes these resources:

| Resource | Purpose | Key Methods |
|---|---|---|
| conn.session | Session configuration | update(session=...) |
| conn.response | Model responses | create(), cancel() |
| conn.input_audio_buffer | Audio input | append(), commit(), clear() |
| conn.output_audio_buffer | Audio output | clear() |
| conn.conversation | Conversation state | item.create(), item.delete(), item.truncate() |
| conn.transcription_session | Transcription config | update(session=...) |
from azure.ai.voicelive.models import RequestSession, FunctionTool
await conn.session.update(session=RequestSession(
    instructions="You are a helpful voice assistant.",
    modalities=["text", "audio"],
    voice="alloy",  # or "echo", "shimmer", "sage", etc.
    input_audio_format="pcm16",
    output_audio_format="pcm16",
    turn_detection={
        "type": "server_vad",
        "threshold": 0.5,
        "prefix_padding_ms": 300,
        "silence_duration_ms": 500
    },
    tools=[
        FunctionTool(
            type="function",
            name="get_weather",
            description="Get current weather",
            parameters={
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                },
                "required": ["location"]
            }
        )
    ]
))
import base64
# Read audio chunk (16-bit PCM, 24kHz mono)
audio_chunk = await read_audio_from_microphone()
b64_audio = base64.b64encode(audio_chunk).decode()
await conn.input_audio_buffer.append(audio=b64_audio)
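The snippet above sends a single chunk. For a longer capture you would split the PCM buffer into fixed-size pieces and append each one; a minimal sketch (the 4800-byte default, 100 ms of 24 kHz 16-bit mono, is this sketch's choice — the service accepts arbitrary chunk boundaries):

```python
import base64

def pcm_to_b64_chunks(pcm: bytes, chunk_bytes: int = 4800):
    """Yield base64-encoded slices of a raw PCM buffer, chunk_bytes at a time."""
    for i in range(0, len(pcm), chunk_bytes):
        yield base64.b64encode(pcm[i:i + chunk_bytes]).decode()

# Usage inside the connection:
# for b64 in pcm_to_b64_chunks(buffer):
#     await conn.input_audio_buffer.append(audio=b64)
```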
async for event in conn:
    if event.type == "response.audio.delta":
        audio_bytes = base64.b64decode(event.delta)
        await play_audio(audio_bytes)
    elif event.type == "response.audio.done":
        print("Audio complete")
import base64
import json

async for event in conn:
    match event.type:
        # Session events
        case "session.created":
            print(f"Session: {event.session}")
        case "session.updated":
            print("Session updated")
        # Audio input events
        case "input_audio_buffer.speech_started":
            print(f"Speech started at {event.audio_start_ms}ms")
        case "input_audio_buffer.speech_stopped":
            print(f"Speech stopped at {event.audio_end_ms}ms")
        # Transcription events
        case "conversation.item.input_audio_transcription.completed":
            print(f"User said: {event.transcript}")
        case "conversation.item.input_audio_transcription.delta":
            print(f"Partial: {event.delta}")
        # Response events
        case "response.created":
            print(f"Response started: {event.response.id}")
        case "response.audio_transcript.delta":
            print(event.delta, end="", flush=True)
        case "response.audio.delta":
            audio = base64.b64decode(event.delta)
        case "response.done":
            print(f"Response complete: {event.response.status}")
        # Function calls
        case "response.function_call_arguments.done":
            result = handle_function(event.name, event.arguments)
            await conn.conversation.item.create(item={
                "type": "function_call_output",
                "call_id": event.call_id,
                "output": json.dumps(result)
            })
            await conn.response.create()
        # Errors
        case "error":
            print(f"Error: {event.error.message}")
await conn.session.update(session={"turn_detection": None})
# Manually control turns
await conn.input_audio_buffer.append(audio=b64_audio)
await conn.input_audio_buffer.commit() # End of user turn
await conn.response.create() # Trigger response
async for event in conn:
    if event.type == "input_audio_buffer.speech_started":
        # User interrupted - cancel current response
        await conn.response.cancel()
        await conn.output_audio_buffer.clear()
# Add system message
await conn.conversation.item.create(item={
    "type": "message",
    "role": "system",
    "content": [{"type": "input_text", "text": "Be concise."}]
})

# Add user message
await conn.conversation.item.create(item={
    "type": "message",
    "role": "user",
    "content": [{"type": "input_text", "text": "Hello!"}]
})
await conn.response.create()
| Voice | Description |
|---|---|
| alloy | Neutral, balanced |
| echo | Warm, conversational |
| shimmer | Clear, professional |
| sage | Calm, authoritative |
| coral | Friendly, upbeat |
| ash | Deep, measured |
| ballad | Expressive |
| verse | Storytelling |
Azure voices: Use AzureStandardVoice, AzureCustomVoice, or AzurePersonalVoice models.
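The OpenAI-style voices above are selected with a plain string; Azure voices go through the SDK's voice models instead. As a sketch only — the "azure-standard" type string, the "name" field, and the en-US-AvaNeural voice name are assumptions not confirmed by this page — the session payload might look like:

```python
# Hypothetical session payload selecting an Azure neural voice.
# Field names are assumptions based on the AzureStandardVoice model named above.
azure_voice_session = {
    "voice": {"type": "azure-standard", "name": "en-US-AvaNeural"},
    "modalities": ["text", "audio"],
}

# await conn.session.update(session=azure_voice_session)
```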
| Format | Sample Rate | Use Case |
|---|---|---|
| pcm16 | 24kHz | Default, high quality |
| pcm16-8000hz | 8kHz | Telephony |
| pcm16-16000hz | 16kHz | Voice assistants |
| g711_ulaw | 8kHz | Telephony (US) |
| g711_alaw | 8kHz | Telephony (EU) |
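These formats imply very different data rates, which matters when sizing audio chunks and playback buffers. A small helper derived from the table (PCM16 is 2 bytes per sample, G.711 is 1 byte per sample; mono assumed throughout):

```python
# Bytes per second for each format in the table above (mono audio).
BYTES_PER_SECOND = {
    "pcm16": 24000 * 2,         # 16-bit samples at 24 kHz
    "pcm16-8000hz": 8000 * 2,
    "pcm16-16000hz": 16000 * 2,
    "g711_ulaw": 8000,          # G.711 encodes one byte per sample at 8 kHz
    "g711_alaw": 8000,
}

def chunk_size(fmt: str, ms: int) -> int:
    """Bytes needed to hold `ms` milliseconds of audio in `fmt`."""
    return BYTES_PER_SECOND[fmt] * ms // 1000

print(chunk_size("pcm16", 100))  # 4800 bytes per 100 ms chunk
```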
# Server VAD (default)
{"type": "server_vad", "threshold": 0.5, "silence_duration_ms": 500}
# Azure Semantic VAD (smarter detection)
{"type": "azure_semantic_vad"}
{"type": "azure_semantic_vad_en"} # English optimized
{"type": "azure_semantic_vad_multilingual"}
from azure.ai.voicelive.aio import ConnectionError, ConnectionClosed
try:
    async with connect(...) as conn:
        async for event in conn:
            if event.type == "error":
                print(f"API Error: {event.error.code} - {event.error.message}")
except ConnectionClosed as e:
    print(f"Connection closed: {e.code} - {e.reason}")
except ConnectionError as e:
    print(f"Connection error: {e}")
Use this skill to execute the workflows and operations described in the overview.
Weekly Installs: 51
Repository: sickn33/antigravity-awesome-skills
GitHub Stars: 27.6K
First Seen: Feb 17, 2026
Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)
Installed on: opencode (50), codex (50), gemini-cli (49), github-copilot (49), amp (49), cline (49)