azure-openai-2025 by josiahsiegel/claude-plugin-marketplace
npx skills add https://github.com/josiahsiegel/claude-plugin-marketplace --skill azure-openai-2025包含 GPT-5、GPT-4.1、推理模型以及 Azure AI Foundry 集成的最新 2025 模型的 Azure OpenAI Service 完整知识库。
Azure OpenAI Service 通过 REST API 提供对 OpenAI 最强大模型的访问,并具备企业级安全性、合规性和区域可用性。
需要注册的模型:
gpt-5-pro: 最高能力,复杂推理gpt-5: 平衡性能与成本gpt-5-codex: 针对代码生成优化无需注册:
gpt-5-mini: 更快,更经济gpt-5-nano: 超快,适用于简单任务gpt-5-chat: 针对对话使用优化广告位招租
在这里展示您的产品或服务
触达数万 AI 开发者,精准高效
gpt-4.1: 100 万令牌上下文窗口gpt-4.1-mini: 具有 100 万上下文的高效版本gpt-4.1-nano: 最快的变体主要改进:
o4-mini : 轻量级推理模型
o3 : 高级推理模型
o1 : 原始推理模型
o1-mini : 高效推理
GPT-image-1 (2025-04-15)
Sora (2025-05-02)
gpt-4o-transcribe : 由 GPT-4o 驱动的语音转文本
gpt-4o-mini-transcribe : 更快、更经济的转录
# Create OpenAI account
az cognitiveservices account create \
--name myopenai \
--resource-group MyRG \
--kind OpenAI \
--sku S0 \
--location eastus \
--custom-domain myopenai \
--public-network-access Disabled \
--identity-type SystemAssigned
# Get endpoint and key
az cognitiveservices account show \
--name myopenai \
--resource-group MyRG \
--query "properties.endpoint" \
--output tsv
az cognitiveservices account keys list \
--name myopenai \
--resource-group MyRG \
--query "key1" \
--output tsv
# Deploy gpt-5
az cognitiveservices account deployment create \
--resource-group MyRG \
--name myopenai \
--deployment-name gpt-5 \
--model-name gpt-5 \
--model-version latest \
--model-format OpenAI \
--sku-name Standard \
--sku-capacity 100 \
--scale-type Standard
# Deploy gpt-5-pro (requires registration)
az cognitiveservices account deployment create \
--resource-group MyRG \
--name myopenai \
--deployment-name gpt-5-pro \
--model-name gpt-5-pro \
--model-version latest \
--model-format OpenAI \
--sku-name Standard \
--sku-capacity 50
# Deploy o3 reasoning model
az cognitiveservices account deployment create \
--resource-group MyRG \
--name myopenai \
--deployment-name o3-reasoning \
--model-name o3 \
--model-version latest \
--model-format OpenAI \
--sku-name Standard \
--sku-capacity 50
# Deploy o4-mini
az cognitiveservices account deployment create \
--resource-group MyRG \
--name myopenai \
--deployment-name o4-mini \
--model-name o4-mini \
--model-version latest \
--model-format OpenAI \
--sku-name Standard \
--sku-capacity 100
az cognitiveservices account deployment create \
--resource-group MyRG \
--name myopenai \
--deployment-name gpt-4-1 \
--model-name gpt-4.1 \
--model-version latest \
--model-format OpenAI \
--sku-name Standard \
--sku-capacity 100
az cognitiveservices account deployment create \
--resource-group MyRG \
--name myopenai \
--deployment-name image-gen \
--model-name gpt-image-1 \
--model-version 2025-04-15 \
--model-format OpenAI \
--sku-name Standard \
--sku-capacity 10
az cognitiveservices account deployment create \
--resource-group MyRG \
--name myopenai \
--deployment-name sora \
--model-name sora \
--model-version 2025-05-02 \
--model-format OpenAI \
--sku-name Standard \
--sku-capacity 5
from openai import AzureOpenAI
import os
# Initialize client
client = AzureOpenAI(
api_key=os.getenv("AZURE_OPENAI_API_KEY"),
api_version="2025-02-01-preview",
azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT")
)
# GPT-5 completion
response = client.chat.completions.create(
model="gpt-5", # deployment name
messages=[
{"role": "system", "content": "You are a helpful AI assistant."},
{"role": "user", "content": "Explain quantum computing in simple terms."}
],
max_tokens=1000,
temperature=0.7,
top_p=0.95
)
print(response.choices[0].message.content)
# o3 reasoning with chain-of-thought
response = client.chat.completions.create(
model="o3-reasoning",
messages=[
{"role": "system", "content": "You are an expert problem solver. Show your reasoning step-by-step."},
{"role": "user", "content": "If a train travels 120 km in 2 hours, then speeds up to travel 180 km in the next 2 hours, what is the average speed for the entire journey?"}
],
max_tokens=2000,
temperature=0.2 # Lower temperature for reasoning tasks
)
print(response.choices[0].message.content)
# Read a large document
with open('large_document.txt', 'r') as f:
document = f.read()
# GPT-4.1 can handle up to 1M tokens
response = client.chat.completions.create(
model="gpt-4-1",
messages=[
{"role": "system", "content": "You are a document analysis expert."},
{"role": "user", "content": f"Analyze this document and provide key insights:\n\n{document}"}
],
max_tokens=4000
)
print(response.choices[0].message.content)
# Generate image with DALL-E 3 successor
response = client.images.generate(
model="image-gen",
prompt="A futuristic city with flying cars and vertical gardens, cyberpunk style, highly detailed, 4K",
size="1024x1024",
quality="hd",
n=1
)
image_url = response.data[0].url
print(f"Generated image: {image_url}")
# Generate video with Sora
response = client.videos.generate(
model="sora",
prompt="A serene lakeside at sunset with birds flying overhead and gentle waves on the shore",
duration=10, # seconds
resolution="1080p",
fps=30
)
video_url = response.data[0].url
print(f"Generated video: {video_url}")
# Transcribe audio file
audio_file = open("meeting_recording.mp3", "rb")
response = client.audio.transcriptions.create(
model="gpt-4o-transcribe",
file=audio_file,
language="en",
response_format="verbose_json"
)
print(f"Transcription: {response.text}")
print(f"Duration: {response.duration}s")
# Speaker diarization
for segment in response.segments:
print(f"[{segment.start}s - {segment.end}s] {segment.text}")
from azure.ai.foundry import ModelRouter
# Initialize model router
router = ModelRouter(
endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
credential=os.getenv("AZURE_OPENAI_API_KEY")
)
# Automatically select optimal model
response = router.complete(
prompt="Analyze this complex scientific paper...",
optimization_goals=["quality", "cost"],
available_models=["gpt-5", "gpt-5-mini", "gpt-4-1"]
)
print(f"Selected model: {response.model_used}")
print(f"Response: {response.content}")
print(f"Cost: ${response.cost}")
优势:
from azure.search.documents import SearchClient
from azure.core.credentials import AzureKeyCredential
# Initialize search client
search_client = SearchClient(
endpoint=os.getenv("SEARCH_ENDPOINT"),
index_name="documents",
credential=AzureKeyCredential(os.getenv("SEARCH_KEY"))
)
# Agentic retrieval with Azure OpenAI
response = client.chat.completions.create(
model="gpt-5",
messages=[
{"role": "system", "content": "You have access to a document search system."},
{"role": "user", "content": "What are the company's revenue projections for Q3?"}
],
tools=[{
"type": "function",
"function": {
"name": "search_documents",
"description": "Search company documents",
"parameters": {
"type": "object",
"properties": {
"query": {"type": "string", "description": "Search query"}
},
"required": ["query"]
}
}
}],
tool_choice="auto"
)
# Process tool calls
if response.choices[0].message.tool_calls:
for tool_call in response.choices[0].message.tool_calls:
if tool_call.function.name == "search_documents":
query = json.loads(tool_call.function.arguments)["query"]
results = search_client.search(query)
# Feed results back to model for final answer
改进:
from azure.ai.foundry import FoundryObservability
# Enable observability
observability = FoundryObservability(
workspace_id=os.getenv("AI_FOUNDRY_WORKSPACE_ID"),
enable_tracing=True,
enable_metrics=True
)
# Monitor agent execution
with observability.trace_agent("customer_support_agent") as trace:
response = client.chat.completions.create(
model="gpt-5",
messages=messages
)
trace.log_tool_call("search_kb", {"query": "refund policy"})
trace.log_reasoning_step("Retrieved refund policy document")
trace.log_token_usage(response.usage.total_tokens)
# View in Azure AI Foundry portal:
# - End-to-end trace logs
# - Reasoning steps and tool calls
# - Performance metrics
# - Cost analysis
# List deployments with usage
az cognitiveservices account deployment list \
--resource-group MyRG \
--name myopenai \
--output table
# Check usage metrics
az monitor metrics list \
--resource $(az cognitiveservices account show -g MyRG -n myopenai --query id -o tsv) \
--metric "TokenTransaction" \
--start-time 2025-01-01T00:00:00Z \
--end-time 2025-01-31T23:59:59Z \
--interval PT1H \
--aggregation Total
# Scale up deployment capacity
az cognitiveservices account deployment update \
--resource-group MyRG \
--name myopenai \
--deployment-name gpt-5 \
--sku-capacity 200
# Scale down during off-peak
az cognitiveservices account deployment update \
--resource-group MyRG \
--name myopenai \
--deployment-name gpt-5 \
--sku-capacity 50
# Create private endpoint
az network private-endpoint create \
--name openai-private-endpoint \
--resource-group MyRG \
--vnet-name MyVNet \
--subnet PrivateEndpointSubnet \
--private-connection-resource-id $(az cognitiveservices account show -g MyRG -n myopenai --query id -o tsv) \
--group-id account \
--connection-name openai-connection
# Create private DNS zone
az network private-dns zone create \
--resource-group MyRG \
--name privatelink.openai.azure.com
# Link to VNet
az network private-dns link vnet create \
--resource-group MyRG \
--zone-name privatelink.openai.azure.com \
--name openai-dns-link \
--virtual-network MyVNet \
--registration-enabled false
# Create DNS zone group
az network private-endpoint dns-zone-group create \
--resource-group MyRG \
--endpoint-name openai-private-endpoint \
--name default \
--private-dns-zone privatelink.openai.azure.com \
--zone-name privatelink.openai.azure.com
# Enable system-assigned identity
az cognitiveservices account identity assign \
--name myopenai \
--resource-group MyRG
# Grant role to managed identity
PRINCIPAL_ID=$(az cognitiveservices account show -g MyRG -n myopenai --query identity.principalId -o tsv)
az role assignment create \
--assignee $PRINCIPAL_ID \
--role "Cognitive Services OpenAI User" \
--scope /subscriptions/<sub-id>/resourceGroups/MyRG
# Configure content filtering
az cognitiveservices account update \
--name myopenai \
--resource-group MyRG \
--set properties.customContentFilter='{
"hate": {"severity": "medium", "enabled": true},
"violence": {"severity": "medium", "enabled": true},
"sexual": {"severity": "medium", "enabled": true},
"selfHarm": {"severity": "high", "enabled": true}
}'
对以下情况使用 GPT-5-mini 或 GPT-5-nano:
对以下情况使用 GPT-5 或 GPT-4.1:
对以下情况使用推理模型 (o3, o4-mini):
# Use semantic cache to reduce duplicate requests
from azure.ai.cache import SemanticCache
cache = SemanticCache(
similarity_threshold=0.95,
ttl_seconds=3600
)
# Check cache before API call
cached_response = cache.get(user_query)
if cached_response:
return cached_response
response = client.chat.completions.create(
model="gpt-5",
messages=messages
)
cache.set(user_query, response)
import tiktoken
# Count tokens before API call
encoding = tiktoken.get_encoding("cl100k_base")
tokens = len(encoding.encode(prompt))
if tokens > 100000:
print(f"Warning: Prompt has {tokens} tokens, this will be expensive!")
# Use shorter max_tokens when appropriate
response = client.chat.completions.create(
model="gpt-5",
messages=messages,
max_tokens=500 # Limit output tokens
)
# Create budget alert
az consumption budget create \
--budget-name openai-monthly-budget \
--resource-group MyRG \
--amount 1000 \
--category Cost \
--time-grain Monthly \
--start-date 2025-01-01 \
--end-date 2025-12-31 \
--notifications '{
"actual_GreaterThan_80_Percent": {
"enabled": true,
"operator": "GreaterThan",
"threshold": 80,
"contactEmails": ["billing@example.com"]
}
}'
from opencensus.ext.azure.log_exporter import AzureLogHandler
import logging
# Configure logging
logger = logging.getLogger(__name__)
logger.addHandler(AzureLogHandler(
connection_string=os.getenv("APPLICATIONINSIGHTS_CONNECTION_STRING")
))
# Log API calls
logger.info("OpenAI API call", extra={
"custom_dimensions": {
"model": "gpt-5",
"tokens": response.usage.total_tokens,
"cost": calculate_cost(response.usage.total_tokens),
"latency_ms": response.response_ms
}
})
✓ 使用模型路由器 进行自动成本优化 ✓ 实现缓存 以减少重复请求 ✓ 监控令牌使用情况 并设置预算 ✓ 为生产工作负载使用专用终结点 ✓ 启用托管标识 而非 API 密钥 ✓ 配置内容过滤 以确保安全 ✓ 根据实际需求调整容量大小 ✓ 使用 Foundry 可观测性 进行监控 ✓ 实现指数退避的重试逻辑 ✓ 根据任务复杂性选择适当的模型
Azure OpenAI Service 与 GPT-5 及推理模型将企业级 AI 带入您的应用程序!
每周安装次数
68
代码仓库
GitHub 星标数
21
首次出现
2026年1月24日
安全审计
安装于
opencode54
gemini-cli54
codex51
claude-code50
cursor49
github-copilot46
Complete knowledge base for Azure OpenAI Service with latest 2025 models including GPT-5, GPT-4.1, reasoning models, and Azure AI Foundry integration.
Azure OpenAI Service provides REST API access to OpenAI's most powerful models with enterprise-grade security, compliance, and regional availability.
Registration Required Models:
gpt-5-pro: Highest capability, complex reasoninggpt-5: Balanced performance and costgpt-5-codex: Optimized for code generationNo Registration Required:
gpt-5-mini: Faster, more affordablegpt-5-nano: Ultra-fast for simple tasksgpt-5-chat: Optimized for conversational usegpt-4.1: 1 million token context windowgpt-4.1-mini: Efficient version with 1M contextgpt-4.1-nano: Fastest variantKey Improvements:
o4-mini : Lightweight reasoning model
o3 : Advanced reasoning model
o1 : Original reasoning model
o1-mini : Efficient reasoning
GPT-image-1 (2025-04-15)
Sora (2025-05-02)
gpt-4o-transcribe : Speech-to-text powered by GPT-4o
gpt-4o-mini-transcribe : Faster, more affordable transcription
# Create OpenAI account
az cognitiveservices account create \
--name myopenai \
--resource-group MyRG \
--kind OpenAI \
--sku S0 \
--location eastus \
--custom-domain myopenai \
--public-network-access Disabled \
--identity-type SystemAssigned
# Get endpoint and key
az cognitiveservices account show \
--name myopenai \
--resource-group MyRG \
--query "properties.endpoint" \
--output tsv
az cognitiveservices account keys list \
--name myopenai \
--resource-group MyRG \
--query "key1" \
--output tsv
# Deploy gpt-5
az cognitiveservices account deployment create \
--resource-group MyRG \
--name myopenai \
--deployment-name gpt-5 \
--model-name gpt-5 \
--model-version latest \
--model-format OpenAI \
--sku-name Standard \
--sku-capacity 100 \
--scale-type Standard
# Deploy gpt-5-pro (requires registration)
az cognitiveservices account deployment create \
--resource-group MyRG \
--name myopenai \
--deployment-name gpt-5-pro \
--model-name gpt-5-pro \
--model-version latest \
--model-format OpenAI \
--sku-name Standard \
--sku-capacity 50
# Deploy o3 reasoning model
az cognitiveservices account deployment create \
--resource-group MyRG \
--name myopenai \
--deployment-name o3-reasoning \
--model-name o3 \
--model-version latest \
--model-format OpenAI \
--sku-name Standard \
--sku-capacity 50
# Deploy o4-mini
az cognitiveservices account deployment create \
--resource-group MyRG \
--name myopenai \
--deployment-name o4-mini \
--model-name o4-mini \
--model-version latest \
--model-format OpenAI \
--sku-name Standard \
--sku-capacity 100
az cognitiveservices account deployment create \
--resource-group MyRG \
--name myopenai \
--deployment-name gpt-4-1 \
--model-name gpt-4.1 \
--model-version latest \
--model-format OpenAI \
--sku-name Standard \
--sku-capacity 100
az cognitiveservices account deployment create \
--resource-group MyRG \
--name myopenai \
--deployment-name image-gen \
--model-name gpt-image-1 \
--model-version 2025-04-15 \
--model-format OpenAI \
--sku-name Standard \
--sku-capacity 10
az cognitiveservices account deployment create \
--resource-group MyRG \
--name myopenai \
--deployment-name sora \
--model-name sora \
--model-version 2025-05-02 \
--model-format OpenAI \
--sku-name Standard \
--sku-capacity 5
from openai import AzureOpenAI
import os
# Initialize client
client = AzureOpenAI(
api_key=os.getenv("AZURE_OPENAI_API_KEY"),
api_version="2025-02-01-preview",
azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT")
)
# GPT-5 completion
response = client.chat.completions.create(
model="gpt-5", # deployment name
messages=[
{"role": "system", "content": "You are a helpful AI assistant."},
{"role": "user", "content": "Explain quantum computing in simple terms."}
],
max_tokens=1000,
temperature=0.7,
top_p=0.95
)
print(response.choices[0].message.content)
# o3 reasoning with chain-of-thought
response = client.chat.completions.create(
model="o3-reasoning",
messages=[
{"role": "system", "content": "You are an expert problem solver. Show your reasoning step-by-step."},
{"role": "user", "content": "If a train travels 120 km in 2 hours, then speeds up to travel 180 km in the next 2 hours, what is the average speed for the entire journey?"}
],
max_tokens=2000,
temperature=0.2 # Lower temperature for reasoning tasks
)
print(response.choices[0].message.content)
# Read a large document
with open('large_document.txt', 'r') as f:
document = f.read()
# GPT-4.1 can handle up to 1M tokens
response = client.chat.completions.create(
model="gpt-4-1",
messages=[
{"role": "system", "content": "You are a document analysis expert."},
{"role": "user", "content": f"Analyze this document and provide key insights:\n\n{document}"}
],
max_tokens=4000
)
print(response.choices[0].message.content)
# Generate image with DALL-E 3 successor
response = client.images.generate(
model="image-gen",
prompt="A futuristic city with flying cars and vertical gardens, cyberpunk style, highly detailed, 4K",
size="1024x1024",
quality="hd",
n=1
)
image_url = response.data[0].url
print(f"Generated image: {image_url}")
# Generate video with Sora
response = client.videos.generate(
model="sora",
prompt="A serene lakeside at sunset with birds flying overhead and gentle waves on the shore",
duration=10, # seconds
resolution="1080p",
fps=30
)
video_url = response.data[0].url
print(f"Generated video: {video_url}")
# Transcribe audio file
audio_file = open("meeting_recording.mp3", "rb")
response = client.audio.transcriptions.create(
model="gpt-4o-transcribe",
file=audio_file,
language="en",
response_format="verbose_json"
)
print(f"Transcription: {response.text}")
print(f"Duration: {response.duration}s")
# Speaker diarization
for segment in response.segments:
print(f"[{segment.start}s - {segment.end}s] {segment.text}")
from azure.ai.foundry import ModelRouter
# Initialize model router
router = ModelRouter(
endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
credential=os.getenv("AZURE_OPENAI_API_KEY")
)
# Automatically select optimal model
response = router.complete(
prompt="Analyze this complex scientific paper...",
optimization_goals=["quality", "cost"],
available_models=["gpt-5", "gpt-5-mini", "gpt-4-1"]
)
print(f"Selected model: {response.model_used}")
print(f"Response: {response.content}")
print(f"Cost: ${response.cost}")
Benefits:
from azure.search.documents import SearchClient
from azure.core.credentials import AzureKeyCredential
# Initialize search client
search_client = SearchClient(
endpoint=os.getenv("SEARCH_ENDPOINT"),
index_name="documents",
credential=AzureKeyCredential(os.getenv("SEARCH_KEY"))
)
# Agentic retrieval with Azure OpenAI
response = client.chat.completions.create(
model="gpt-5",
messages=[
{"role": "system", "content": "You have access to a document search system."},
{"role": "user", "content": "What are the company's revenue projections for Q3?"}
],
tools=[{
"type": "function",
"function": {
"name": "search_documents",
"description": "Search company documents",
"parameters": {
"type": "object",
"properties": {
"query": {"type": "string", "description": "Search query"}
},
"required": ["query"]
}
}
}],
tool_choice="auto"
)
# Process tool calls
if response.choices[0].message.tool_calls:
for tool_call in response.choices[0].message.tool_calls:
if tool_call.function.name == "search_documents":
query = json.loads(tool_call.function.arguments)["query"]
results = search_client.search(query)
# Feed results back to model for final answer
Improvements:
from azure.ai.foundry import FoundryObservability
# Enable observability
observability = FoundryObservability(
workspace_id=os.getenv("AI_FOUNDRY_WORKSPACE_ID"),
enable_tracing=True,
enable_metrics=True
)
# Monitor agent execution
with observability.trace_agent("customer_support_agent") as trace:
response = client.chat.completions.create(
model="gpt-5",
messages=messages
)
trace.log_tool_call("search_kb", {"query": "refund policy"})
trace.log_reasoning_step("Retrieved refund policy document")
trace.log_token_usage(response.usage.total_tokens)
# View in Azure AI Foundry portal:
# - End-to-end trace logs
# - Reasoning steps and tool calls
# - Performance metrics
# - Cost analysis
# List deployments with usage
az cognitiveservices account deployment list \
--resource-group MyRG \
--name myopenai \
--output table
# Check usage metrics
az monitor metrics list \
--resource $(az cognitiveservices account show -g MyRG -n myopenai --query id -o tsv) \
--metric "TokenTransaction" \
--start-time 2025-01-01T00:00:00Z \
--end-time 2025-01-31T23:59:59Z \
--interval PT1H \
--aggregation Total
# Scale up deployment capacity
az cognitiveservices account deployment update \
--resource-group MyRG \
--name myopenai \
--deployment-name gpt-5 \
--sku-capacity 200
# Scale down during off-peak
az cognitiveservices account deployment update \
--resource-group MyRG \
--name myopenai \
--deployment-name gpt-5 \
--sku-capacity 50
# Create private endpoint
az network private-endpoint create \
--name openai-private-endpoint \
--resource-group MyRG \
--vnet-name MyVNet \
--subnet PrivateEndpointSubnet \
--private-connection-resource-id $(az cognitiveservices account show -g MyRG -n myopenai --query id -o tsv) \
--group-id account \
--connection-name openai-connection
# Create private DNS zone
az network private-dns zone create \
--resource-group MyRG \
--name privatelink.openai.azure.com
# Link to VNet
az network private-dns link vnet create \
--resource-group MyRG \
--zone-name privatelink.openai.azure.com \
--name openai-dns-link \
--virtual-network MyVNet \
--registration-enabled false
# Create DNS zone group
az network private-endpoint dns-zone-group create \
--resource-group MyRG \
--endpoint-name openai-private-endpoint \
--name default \
--private-dns-zone privatelink.openai.azure.com \
--zone-name privatelink.openai.azure.com
# Enable system-assigned identity
az cognitiveservices account identity assign \
--name myopenai \
--resource-group MyRG
# Grant role to managed identity
PRINCIPAL_ID=$(az cognitiveservices account show -g MyRG -n myopenai --query identity.principalId -o tsv)
az role assignment create \
--assignee $PRINCIPAL_ID \
--role "Cognitive Services OpenAI User" \
--scope /subscriptions/<sub-id>/resourceGroups/MyRG
# Configure content filtering
az cognitiveservices account update \
--name myopenai \
--resource-group MyRG \
--set properties.customContentFilter='{
"hate": {"severity": "medium", "enabled": true},
"violence": {"severity": "medium", "enabled": true},
"sexual": {"severity": "medium", "enabled": true},
"selfHarm": {"severity": "high", "enabled": true}
}'
Use GPT-5-mini or GPT-5-nano for:
Use GPT-5 or GPT-4.1 for:
Use Reasoning Models (o3, o4-mini) for:
# Use semantic cache to reduce duplicate requests
from azure.ai.cache import SemanticCache
cache = SemanticCache(
similarity_threshold=0.95,
ttl_seconds=3600
)
# Check cache before API call
cached_response = cache.get(user_query)
if cached_response:
return cached_response
response = client.chat.completions.create(
model="gpt-5",
messages=messages
)
cache.set(user_query, response)
import tiktoken
# Count tokens before API call
encoding = tiktoken.get_encoding("cl100k_base")
tokens = len(encoding.encode(prompt))
if tokens > 100000:
print(f"Warning: Prompt has {tokens} tokens, this will be expensive!")
# Use shorter max_tokens when appropriate
response = client.chat.completions.create(
model="gpt-5",
messages=messages,
max_tokens=500 # Limit output tokens
)
# Create budget alert
az consumption budget create \
--budget-name openai-monthly-budget \
--resource-group MyRG \
--amount 1000 \
--category Cost \
--time-grain Monthly \
--start-date 2025-01-01 \
--end-date 2025-12-31 \
--notifications '{
"actual_GreaterThan_80_Percent": {
"enabled": true,
"operator": "GreaterThan",
"threshold": 80,
"contactEmails": ["billing@example.com"]
}
}'
from opencensus.ext.azure.log_exporter import AzureLogHandler
import logging
# Configure logging
logger = logging.getLogger(__name__)
logger.addHandler(AzureLogHandler(
connection_string=os.getenv("APPLICATIONINSIGHTS_CONNECTION_STRING")
))
# Log API calls
logger.info("OpenAI API call", extra={
"custom_dimensions": {
"model": "gpt-5",
"tokens": response.usage.total_tokens,
"cost": calculate_cost(response.usage.total_tokens),
"latency_ms": response.response_ms
}
})
✓ Use Model Router for automatic cost optimization ✓ Implement caching to reduce duplicate requests ✓ Monitor token usage and set budgets ✓ Use private endpoints for production workloads ✓ Enable managed identity instead of API keys ✓ Configure content filtering for safety ✓ Right-size capacity based on actual demand ✓ Use Foundry Observability for monitoring ✓ Implement retry logic with exponential backoff ✓ Choose appropriate models for task complexity
Azure OpenAI Service with GPT-5 and reasoning models brings enterprise-grade AI to your applications!
Weekly Installs
68
Repository
GitHub Stars
21
First Seen
Jan 24, 2026
Security Audits
Gen Agent Trust HubPassSocketPassSnykPass
Installed on
opencode54
gemini-cli54
codex51
claude-code50
cursor49
github-copilot46
Azure 配额管理指南:服务限制、容量验证与配额增加方法
122,900 周安装