Azure OpenAI 2025 Models Explained: GPT-5, GPT-4.1, Reasoning Models, Deployment and Integration Guide

azure-openai-2025 by josiahsiegel/claude-plugin-marketplace

68 weekly installs

21 GitHub Stars

Install command

npx skills add https://github.com/josiahsiegel/claude-plugin-marketplace --skill azure-openai-2025

AI/Machine Learning, Automation, Cloud Services


Azure OpenAI Service - 2025 Models and Features

Complete knowledge base for Azure OpenAI Service with latest 2025 models including GPT-5, GPT-4.1, reasoning models, and Azure AI Foundry integration.

Overview

Azure OpenAI Service provides REST API access to OpenAI's most powerful models with enterprise-grade security, compliance, and regional availability.

Latest Models (2025)

GPT-5 Series (GA August 2025)

Registration Required Models:

gpt-5-pro: Highest capability, complex reasoning
gpt-5: Balanced performance and cost
gpt-5-codex: Optimized for code generation

No Registration Required:

gpt-5-mini: Faster, more affordable
gpt-5-nano: Ultra-fast for simple tasks
gpt-5-chat: Optimized for conversational use

GPT-4.1 Series

gpt-4.1: 1 million token context window
gpt-4.1-mini: Efficient version with 1M context
gpt-4.1-nano: Fastest variant

Key Improvements:

1,000,000 token context (vs 128K in GPT-4 Turbo)
Better instruction following
Reduced hallucinations
Improved multilingual support

Reasoning Models

o4-mini : Lightweight reasoning model

Faster inference
Lower cost
Suitable for structured reasoning tasks

o3 : Advanced reasoning model

Complex problem solving
Mathematical reasoning
Scientific analysis

o1 : Original reasoning model

General-purpose reasoning
Step-by-step explanations

o1-mini : Efficient reasoning

Balanced cost and performance

Image Generation

GPT-image-1 (2025-04-15)

DALL-E 3 successor
Higher quality images
Better prompt understanding
Improved safety filters

Video Generation

Sora (2025-05-02)

Text-to-video generation
Realistic and imaginative scenes
Up to 60 seconds of video
Multiple camera angles and styles

Audio Models

gpt-4o-transcribe : Speech-to-text powered by GPT-4o

High accuracy transcription
Multiple languages
Speaker diarization

gpt-4o-mini-transcribe : Faster, more affordable transcription

Good accuracy
Lower latency
Cost-effective

Deploying Azure OpenAI

Create Azure OpenAI Resource

# Create OpenAI account
az cognitiveservices account create \
  --name myopenai \
  --resource-group MyRG \
  --kind OpenAI \
  --sku S0 \
  --location eastus \
  --custom-domain myopenai \
  --public-network-access Disabled \
  --assign-identity

# Get endpoint and key
az cognitiveservices account show \
  --name myopenai \
  --resource-group MyRG \
  --query "properties.endpoint" \
  --output tsv

az cognitiveservices account keys list \
  --name myopenai \
  --resource-group MyRG \
  --query "key1" \
  --output tsv

Deploy GPT-5 Model

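# Deploy gpt-5
# --model-version takes a dated version string rather than "latest";
# 2025-08-07 is the GPT-5 GA version. Confirm what your resource offers with:
#   az cognitiveservices account list-models -g MyRG -n myopenai
az cognitiveservices account deployment create \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name gpt-5 \
  --model-name gpt-5 \
  --model-version 2025-08-07 \
  --model-format OpenAI \
  --sku-name Standard \
  --sku-capacity 100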

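# Deploy gpt-5-pro (requires registration)
# Replace <model-version> with the dated version listed for your resource;
# "latest" is not a version string the service recognizes.
az cognitiveservices account deployment create \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name gpt-5-pro \
  --model-name gpt-5-pro \
  --model-version <model-version> \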
  --model-format OpenAI \
  --sku-name Standard \
  --sku-capacity 50

Deploy Reasoning Models

# Deploy o3 reasoning model (dated model version, not "latest")
az cognitiveservices account deployment create \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name o3-reasoning \
  --model-name o3 \
  --model-version 2025-04-16 \
  --model-format OpenAI \
  --sku-name Standard \
  --sku-capacity 50

# Deploy o4-mini (dated model version, not "latest")
az cognitiveservices account deployment create \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name o4-mini \
  --model-name o4-mini \
  --model-version 2025-04-16 \
  --model-format OpenAI \
  --sku-name Standard \
  --sku-capacity 100

Deploy GPT-4.1 with 1M Context

# Deploy gpt-4.1 (dated model version, not "latest")
az cognitiveservices account deployment create \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name gpt-4-1 \
  --model-name gpt-4.1 \
  --model-version 2025-04-14 \
  --model-format OpenAI \
  --sku-name Standard \
  --sku-capacity 100

Deploy Image Generation Model

az cognitiveservices account deployment create \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name image-gen \
  --model-name gpt-image-1 \
  --model-version 2025-04-15 \
  --model-format OpenAI \
  --sku-name Standard \
  --sku-capacity 10

Deploy Sora Video Generation

az cognitiveservices account deployment create \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name sora \
  --model-name sora \
  --model-version 2025-05-02 \
  --model-format OpenAI \
  --sku-name Standard \
  --sku-capacity 5

Using Azure OpenAI Models

Python SDK (GPT-5)

from openai import AzureOpenAI
import os

# Initialize client
client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2025-02-01-preview",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT")
)

# GPT-5 completion
# GPT-5 is a reasoning model: it takes max_completion_tokens instead of
# max_tokens and does not accept custom temperature or top_p values.
response = client.chat.completions.create(
    model="gpt-5",  # deployment name
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    max_completion_tokens=1000
)

print(response.choices[0].message.content)

Python SDK (o3 Reasoning Model)

# o3 reasoning request
# o-series models reason internally; they take max_completion_tokens
# (which also covers hidden reasoning tokens) and reject a custom temperature.
response = client.chat.completions.create(
    model="o3-reasoning",  # deployment name
    messages=[
        {"role": "system", "content": "You are an expert problem solver. Show your reasoning step-by-step."},
        {"role": "user", "content": "If a train travels 120 km in 2 hours, then speeds up to travel 180 km in the next 2 hours, what is the average speed for the entire journey?"}
    ],
    max_completion_tokens=2000
)

print(response.choices[0].message.content)

Python SDK (GPT-4.1 with 1M Context)

# Read a large document
with open('large_document.txt', 'r') as f:
    document = f.read()

# GPT-4.1 can handle up to 1M tokens
response = client.chat.completions.create(
    model="gpt-4-1",
    messages=[
        {"role": "system", "content": "You are a document analysis expert."},
        {"role": "user", "content": f"Analyze this document and provide key insights:\n\n{document}"}
    ],
    max_tokens=4000
)

print(response.choices[0].message.content)

Image Generation (GPT-image-1)

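import base64

# Generate image with gpt-image-1 (the DALL-E 3 successor)
response = client.images.generate(
    model="image-gen",  # deployment name
    prompt="A futuristic city with flying cars and vertical gardens, cyberpunk style, highly detailed, 4K",
    size="1024x1024",
    quality="high",  # gpt-image-1 accepts low, medium, high, or auto ("hd" is a DALL-E 3 value)
    n=1
)

# gpt-image-1 returns base64-encoded image data rather than a URL
image_bytes = base64.b64decode(response.data[0].b64_json)
with open("generated_image.png", "wb") as f:
    f.write(image_bytes)
print("Generated image: generated_image.png")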

Video Generation (Sora)

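# Generate video with Sora
# NOTE: illustrative only. client.videos.generate and the parameters below are
# not a confirmed part of the openai SDK; Sora on Azure is a preview feature
# built around asynchronous video generation jobs, so check the current docs.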
response = client.videos.generate(
    model="sora",
    prompt="A serene lakeside at sunset with birds flying overhead and gentle waves on the shore",
    duration=10,  # seconds
    resolution="1080p",
    fps=30
)

video_url = response.data[0].url
print(f"Generated video: {video_url}")

Audio Transcription

# Transcribe audio file
with open("meeting_recording.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",  # deployment name
        file=audio_file,
        language="en",
        response_format="json"  # the gpt-4o-transcribe models support json output only
    )

print(f"Transcription: {response.text}")

# Timestamped segments and duration come from verbose_json, which the
# gpt-4o-transcribe models do not return; a whisper deployment does.
# for segment in response.segments:
#     print(f"[{segment.start}s - {segment.end}s] {segment.text}")

Azure AI Foundry Integration

Model Router (Automatic Model Selection)

# Model router is deployed like any other model (model name "model-router")
# and called through the regular chat completions API.
# The deployment name "model-router" below is an assumption; use your own.
response = client.chat.completions.create(
    model="model-router",
    messages=[
        {"role": "user", "content": "Analyze this complex scientific paper..."}
    ]
)

# The response reports which underlying model handled the request
print(f"Selected model: {response.model}")
print(f"Response: {response.choices[0].message.content}")

Benefits:

Automatic model selection based on prompt complexity
Balance quality vs cost
Reduce costs by up to 40% while maintaining quality

Agentic Retrieval (Azure AI Search Integration)

import json
import os

from azure.search.documents import SearchClient
from azure.core.credentials import AzureKeyCredential

# Initialize search client
search_client = SearchClient(
    endpoint=os.getenv("SEARCH_ENDPOINT"),
    index_name="documents",
    credential=AzureKeyCredential(os.getenv("SEARCH_KEY"))
)

# Agentic retrieval with Azure OpenAI
response = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {"role": "system", "content": "You have access to a document search system."},
        {"role": "user", "content": "What are the company's revenue projections for Q3?"}
    ],
    tools=[{
        "type": "function",
        "function": {
            "name": "search_documents",
            "description": "Search company documents",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"}
                },
                "required": ["query"]
            }
        }
    }],
    tool_choice="auto"
)

# Process tool calls
if response.choices[0].message.tool_calls:
    for tool_call in response.choices[0].message.tool_calls:
        if tool_call.function.name == "search_documents":
            query = json.loads(tool_call.function.arguments)["query"]
            results = search_client.search(query)
            # Feed results back to model for final answer
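
The final step above (feeding results back to the model) is left as a comment. A minimal sketch of that step follows, assuming the tool calls have been converted to plain dictionaries (for example with `tool_call.model_dump()`) and that `search` is any callable returning JSON-serializable results. The helper name `build_tool_messages` is hypothetical and not part of any SDK.

```python
import json
from typing import Callable


def build_tool_messages(tool_calls: list, search: Callable[[str], list]) -> list:
    """Turn search_documents tool calls into 'tool' role messages."""
    tool_messages = []
    for call in tool_calls:
        # Ignore tools this handler does not know about
        if call["function"]["name"] != "search_documents":
            continue
        query = json.loads(call["function"]["arguments"])["query"]
        results = search(query)
        tool_messages.append({
            "role": "tool",
            "tool_call_id": call["id"],      # ties the result to the request
            "content": json.dumps(results),  # tool output must be a string
        })
    return tool_messages
```

To get the final answer, append the assistant message that contained the tool calls, then these tool messages, to the conversation and call `client.chat.completions.create` again with the extended message list.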

Improvements:

40% better on complex, multi-part questions
Automatic query decomposition
Relevance ranking
Citation generation

Foundry Observability (Preview)

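# NOTE: FoundryObservability is shown as an illustrative interface; the
# azure.ai.foundry package name and the methods below are not confirmed.
# Verify the current tracing setup in the Azure AI Foundry documentation.
from azure.ai.foundry import FoundryObservability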

# Enable observability
observability = FoundryObservability(
    workspace_id=os.getenv("AI_FOUNDRY_WORKSPACE_ID"),
    enable_tracing=True,
    enable_metrics=True
)

# Monitor agent execution
with observability.trace_agent("customer_support_agent") as trace:
    response = client.chat.completions.create(
        model="gpt-5",
        messages=messages
    )

    trace.log_tool_call("search_kb", {"query": "refund policy"})
    trace.log_reasoning_step("Retrieved refund policy document")
    trace.log_token_usage(response.usage.total_tokens)

# View in Azure AI Foundry portal:
# - End-to-end trace logs
# - Reasoning steps and tool calls
# - Performance metrics
# - Cost analysis

Capacity and Quota Management

Check Quota

# List deployments with usage
az cognitiveservices account deployment list \
  --resource-group MyRG \
  --name myopenai \
  --output table

# Check usage metrics
az monitor metrics list \
  --resource $(az cognitiveservices account show -g MyRG -n myopenai --query id -o tsv) \
  --metric "TokenTransaction" \
  --start-time 2025-01-01T00:00:00Z \
  --end-time 2025-01-31T23:59:59Z \
  --interval PT1H \
  --aggregation Total

Update Capacity

# Scale up deployment capacity
az cognitiveservices account deployment update \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name gpt-5 \
  --sku-capacity 200

# Scale down during off-peak
az cognitiveservices account deployment update \
  --resource-group MyRG \
  --name myopenai \
  --deployment-name gpt-5 \
  --sku-capacity 50
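
For Standard deployments, `--sku-capacity` is expressed in units of 1,000 tokens per minute (TPM). The sketch below shows one way to turn an observed peak load into a capacity value with some headroom. The function name and the 20% default headroom are assumptions for illustration, not Azure recommendations.

```python
def capacity_for_tpm(tokens_per_minute: int, headroom_percent: int = 20) -> int:
    """Return a --sku-capacity value (units of 1,000 TPM) covering the load plus headroom."""
    if tokens_per_minute < 0 or headroom_percent < 0:
        raise ValueError("inputs must be non-negative")
    scaled = tokens_per_minute * (100 + headroom_percent)
    # Integer ceiling division by 100 (percent) * 1,000 (TPM per capacity unit)
    return -(-scaled // 100_000)
```

With a measured peak of 150,000 TPM and the default headroom, `capacity_for_tpm(150_000)` gives the value to pass as `--sku-capacity`.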

Request Quota Increase

Navigate to Azure Portal → Azure OpenAI resource
Go to "Quotas" blade
Select model and region
Click "Request quota increase"
Provide justification and target capacity

Security and Networking

Private Endpoint

# Create private endpoint
az network private-endpoint create \
  --name openai-private-endpoint \
  --resource-group MyRG \
  --vnet-name MyVNet \
  --subnet PrivateEndpointSubnet \
  --private-connection-resource-id $(az cognitiveservices account show -g MyRG -n myopenai --query id -o tsv) \
  --group-id account \
  --connection-name openai-connection

# Create private DNS zone
az network private-dns zone create \
  --resource-group MyRG \
  --name privatelink.openai.azure.com

# Link to VNet
az network private-dns link vnet create \
  --resource-group MyRG \
  --zone-name privatelink.openai.azure.com \
  --name openai-dns-link \
  --virtual-network MyVNet \
  --registration-enabled false

# Create DNS zone group
az network private-endpoint dns-zone-group create \
  --resource-group MyRG \
  --endpoint-name openai-private-endpoint \
  --name default \
  --private-dns-zone privatelink.openai.azure.com \
  --zone-name privatelink.openai.azure.com

Managed Identity Access

# Enable system-assigned identity
az cognitiveservices account identity assign \
  --name myopenai \
  --resource-group MyRG

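# Grant role to managed identity
# PRINCIPAL_ID should be the identity of the workload that calls Azure OpenAI
# (for example an App Service or VM identity). The lookup below reads the
# OpenAI account's own identity; swap in your caller's principal ID as needed.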
PRINCIPAL_ID=$(az cognitiveservices account show -g MyRG -n myopenai --query identity.principalId -o tsv)

az role assignment create \
  --assignee $PRINCIPAL_ID \
  --role "Cognitive Services OpenAI User" \
  --scope /subscriptions/<sub-id>/resourceGroups/MyRG

Content Filtering

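# Configure content filtering
# NOTE: illustrative only; the customContentFilter property below is not
# confirmed. Content filters are normally defined as filter configurations
# in the Azure AI Foundry portal and attached to a deployment.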
az cognitiveservices account update \
  --name myopenai \
  --resource-group MyRG \
  --set properties.customContentFilter='{
    "hate": {"severity": "medium", "enabled": true},
    "violence": {"severity": "medium", "enabled": true},
    "sexual": {"severity": "medium", "enabled": true},
    "selfHarm": {"severity": "high", "enabled": true}
  }'

Cost Optimization

Model Selection Strategy

Use GPT-5-mini or GPT-5-nano for:

Simple questions
Classification tasks
Content moderation
Summarization

Use GPT-5 or GPT-4.1 for:

Complex reasoning
Long-form content generation
Document analysis
Code generation

Use Reasoning Models (o3, o4-mini) for:

Mathematical problems
Scientific analysis
Step-by-step reasoning
Logic puzzles
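
The selection strategy above can be written down as a simple lookup. This is a minimal sketch: the task category names are invented for illustration, the deployment names follow the examples earlier in this guide, and the split between mini and nano for the simple tasks is an assumption.

```python
# Hypothetical routing table: task categories mapped to deployment names.
# Adjust the names to match the deployments you actually created.
MODEL_BY_TASK = {
    "simple_question": "gpt-5-mini",
    "classification": "gpt-5-nano",
    "moderation": "gpt-5-nano",
    "summarization": "gpt-5-mini",
    "complex_reasoning": "gpt-5",
    "long_form": "gpt-5",
    "document_analysis": "gpt-4-1",
    "code_generation": "gpt-5",
    "math": "o3-reasoning",
    "science": "o3-reasoning",
    "step_by_step": "o4-mini",
    "logic_puzzle": "o4-mini",
}


def choose_deployment(task_type: str, default: str = "gpt-5-mini") -> str:
    """Return the deployment for a task category, falling back to a cheap default."""
    return MODEL_BY_TASK.get(task_type, default)
```

Falling back to a small model keeps unclassified traffic cheap; callers can escalate to a larger deployment when the answer is not good enough.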

Implement Caching

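# Use semantic cache to reduce duplicate requests
# NOTE: azure.ai.cache.SemanticCache is an illustrative interface, not a
# confirmed published package; substitute your own cache implementation.
from azure.ai.cache import SemanticCache

cache = SemanticCache(
    similarity_threshold=0.95,
    ttl_seconds=3600
)

def cached_completion(user_query, messages):
    # Check cache before API call
    cached_response = cache.get(user_query)
    if cached_response:
        return cached_response

    response = client.chat.completions.create(
        model="gpt-5",
        messages=messages
    )

    cache.set(user_query, response)
    return response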
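
A simpler variant of the same idea can be built with the standard library alone. The sketch below is an exact-match cache with a time-to-live, not a semantic one: it only hits when the model name and messages are identical. The class and function names are hypothetical, and `call_model` stands in for whatever function performs the real API request.

```python
import hashlib
import json
import time
from typing import Any, Callable, Optional


class ExactMatchCache:
    """In-memory cache keyed on a hash of the model name and messages."""

    def __init__(self, ttl_seconds: float = 3600,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._store = {}

    @staticmethod
    def make_key(model: str, messages: list) -> str:
        payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at > self.ttl_seconds:
            del self._store[key]  # drop expired entries lazily
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (self._clock(), value)


def cached_call(cache: ExactMatchCache, model: str, messages: list,
                call_model: Callable[[str, list], Any]) -> Any:
    """Return a cached result when available, otherwise call the model and store it."""
    key = cache.make_key(model, messages)
    hit = cache.get(key)
    if hit is not None:
        return hit
    result = call_model(model, messages)
    cache.set(key, result)
    return result
```

Exact matching is cheaper and more predictable than similarity matching but misses paraphrased questions; a semantic cache would compare embeddings against a similarity threshold instead.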

Token Management

import tiktoken

# Count tokens before API call
# o200k_base is the encoding for GPT-4o and newer models;
# cl100k_base applies to GPT-4 and GPT-3.5 Turbo.
encoding = tiktoken.get_encoding("o200k_base")
tokens = len(encoding.encode(prompt))

if tokens > 100000:
    print(f"Warning: Prompt has {tokens} tokens, this will be expensive!")

# Cap output length when appropriate
response = client.chat.completions.create(
    model="gpt-5",
    messages=messages,
    max_completion_tokens=500  # Limit output tokens (reasoning tokens count too)
)

Monitoring and Alerts

Set Up Cost Alerts

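# Create budget alert
# Check the accepted flags with `az consumption budget create --help` first:
# the command group is in preview, and notification rules may need to be set
# through the portal or the Budgets REST API rather than --notifications.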
az consumption budget create \
  --budget-name openai-monthly-budget \
  --resource-group MyRG \
  --amount 1000 \
  --category Cost \
  --time-grain Monthly \
  --start-date 2025-01-01 \
  --end-date 2025-12-31 \
  --notifications '{
    "actual_GreaterThan_80_Percent": {
      "enabled": true,
      "operator": "GreaterThan",
      "threshold": 80,
      "contactEmails": ["billing@example.com"]
    }
  }'

Application Insights Integration

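import logging
import os
import time

from opencensus.ext.azure.log_exporter import AzureLogHandler

# Configure logging (INFO level, otherwise logger.info calls are dropped)
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
logger.addHandler(AzureLogHandler(
    connection_string=os.getenv("APPLICATIONINSIGHTS_CONNECTION_STRING")
))

# Time the call ourselves; the response object has no latency field
start = time.perf_counter()
response = client.chat.completions.create(model="gpt-5", messages=messages)
latency_ms = (time.perf_counter() - start) * 1000

# Log API calls (calculate_cost is a helper you define; it is not part of any SDK)
logger.info("OpenAI API call", extra={
    "custom_dimensions": {
        "model": "gpt-5",
        "tokens": response.usage.total_tokens,
        "cost": calculate_cost(response.usage.total_tokens),
        "latency_ms": latency_ms
    }
})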
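
The cost figure logged above depends on a helper the snippet does not define. One possible shape for such a helper is sketched below. It takes prompt and completion tokens separately, since input and output tokens are typically priced at different rates. The function name and the per-million-token price parameters are assumptions; take actual prices from the Azure pricing page for your model and region.

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  input_price_per_million: float,
                  output_price_per_million: float) -> float:
    """Estimate the cost of one request from token counts and per-million prices."""
    if prompt_tokens < 0 or completion_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (prompt_tokens * input_price_per_million
             + completion_tokens * output_price_per_million)
    return total / 1_000_000
```

The token counts are available on the response as `response.usage.prompt_tokens` and `response.usage.completion_tokens`.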

Best Practices

✓ Use Model Router for automatic cost optimization
✓ Implement caching to reduce duplicate requests
✓ Monitor token usage and set budgets
✓ Use private endpoints for production workloads
✓ Enable managed identity instead of API keys
✓ Configure content filtering for safety
✓ Right-size capacity based on actual demand
✓ Use Foundry Observability for monitoring
✓ Implement retry logic with exponential backoff
✓ Choose appropriate models for task complexity
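
The retry practice listed above can be sketched with the standard library. This is a generic helper under stated assumptions, not an Azure-specific API: the function name, defaults, and 10% jitter are illustrative. With the openai SDK you would pass its error types (for example `openai.RateLimitError`) as `retry_on`; the client also has a built-in `max_retries` setting that may be enough on its own.

```python
import random
import time


def retry_with_backoff(func, max_attempts=5, base_delay=1.0, max_delay=30.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call func(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (1s, 2s, 4s, ...) up to max_delay
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Add up to 10% jitter so many clients do not retry in lockstep
            sleep(delay + random.uniform(0, delay * 0.1))
```

A call site would wrap the request in a zero-argument callable, such as `retry_with_backoff(lambda: client.chat.completions.create(model="gpt-5", messages=messages))`.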

Azure OpenAI Service with GPT-5 and reasoning models brings enterprise-grade AI to your applications!

Repository: josiahsiegel/cl…ketplace

First Seen: Jan 24, 2026

Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)

Installed on: opencode 54, gemini-cli 54, codex 51, claude-code 50, cursor 49, github-copilot 46
