
Azure AI Vision Python SDK Image Analysis Tutorial: OCR, Object Detection, and Image Captioning

azure-ai-vision-imageanalysis-py by sickn33/antigravity-awesome-skills

47 weekly installs

28,500 GitHub Stars

Install command

npx skills add https://github.com/sickn33/antigravity-awesome-skills --skill azure-ai-vision-imageanalysis-py


Azure AI Vision Image Analysis SDK for Python

Client library for Azure AI Vision 4.0 image analysis including captions, tags, objects, OCR, and more.

Installation

pip install azure-ai-vision-imageanalysis

Environment Variables

VISION_ENDPOINT=https://<resource>.cognitiveservices.azure.com
VISION_KEY=<your-api-key>  # If using API key
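
Because both variables are read at startup, it can help to fail early with a clear message when one is missing. The following is a minimal sketch using only the standard library; the helper name `load_vision_settings` and its checks are illustrative assumptions, not part of the SDK.

```python
import os
from typing import Mapping, Optional, Tuple


def load_vision_settings(env: Optional[Mapping[str, str]] = None) -> Tuple[str, Optional[str]]:
    """Return (endpoint, key); key is None when VISION_KEY is unset.

    Hypothetical helper for illustration, not part of the SDK.
    """
    env = os.environ if env is None else env
    endpoint = env.get("VISION_ENDPOINT", "").strip()
    if not endpoint:
        raise RuntimeError("VISION_ENDPOINT is not set")
    if not endpoint.startswith("https://"):
        raise RuntimeError("VISION_ENDPOINT should start with https://")
    # The key is optional: it is only needed for API key authentication.
    key = env.get("VISION_KEY") or None
    return endpoint, key
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to test; when the returned key is `None`, the Entra ID path is the remaining option.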

Authentication

API Key

import os
from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.core.credentials import AzureKeyCredential

endpoint = os.environ["VISION_ENDPOINT"]
key = os.environ["VISION_KEY"]

client = ImageAnalysisClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(key)
)

Entra ID (Recommended)

import os

from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.identity import DefaultAzureCredential

# DefaultAzureCredential comes from the separate azure-identity package.
client = ImageAnalysisClient(
    endpoint=os.environ["VISION_ENDPOINT"],
    credential=DefaultAzureCredential()
)

Analyze Image from URL

from azure.ai.vision.imageanalysis.models import VisualFeatures

image_url = "https://example.com/image.jpg"

result = client.analyze_from_url(
    image_url=image_url,
    visual_features=[
        VisualFeatures.CAPTION,
        VisualFeatures.TAGS,
        VisualFeatures.OBJECTS,
        VisualFeatures.READ,
        VisualFeatures.PEOPLE,
        VisualFeatures.SMART_CROPS,
        VisualFeatures.DENSE_CAPTIONS
    ],
    gender_neutral_caption=True,
    language="en"
)

Analyze Image from File

with open("image.jpg", "rb") as f:
    image_data = f.read()

result = client.analyze(
    image_data=image_data,
    visual_features=[VisualFeatures.CAPTION, VisualFeatures.TAGS]
)

Image Caption

result = client.analyze_from_url(
    image_url=image_url,
    visual_features=[VisualFeatures.CAPTION],
    gender_neutral_caption=True
)

if result.caption:
    print(f"Caption: {result.caption.text}")
    print(f"Confidence: {result.caption.confidence:.2f}")

Dense Captions (Multiple Regions)

result = client.analyze_from_url(
    image_url=image_url,
    visual_features=[VisualFeatures.DENSE_CAPTIONS]
)

if result.dense_captions:
    for caption in result.dense_captions.list:
        print(f"Caption: {caption.text}")
        print(f"  Confidence: {caption.confidence:.2f}")
        print(f"  Bounding box: {caption.bounding_box}")

Tags

result = client.analyze_from_url(
    image_url=image_url,
    visual_features=[VisualFeatures.TAGS]
)

if result.tags:
    for tag in result.tags.list:
        print(f"Tag: {tag.name} (confidence: {tag.confidence:.2f})")

Object Detection

result = client.analyze_from_url(
    image_url=image_url,
    visual_features=[VisualFeatures.OBJECTS]
)

if result.objects:
    for obj in result.objects.list:
        print(f"Object: {obj.tags[0].name}")
        print(f"  Confidence: {obj.tags[0].confidence:.2f}")
        box = obj.bounding_box
        print(f"  Bounding box: x={box.x}, y={box.y}, w={box.width}, h={box.height}")
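
Detections often include low-confidence entries, so a common follow-up step is to keep only those above a threshold. The sketch below shows one way to do that. The `Tag` and `Detection` dataclasses are stand-ins that mimic only the attributes used above (`tags[0].name`, `tags[0].confidence`), and `confident_names` is a hypothetical helper, not an SDK function.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tag:  # stand-in for the SDK's tag objects (name + confidence)
    name: str
    confidence: float


@dataclass
class Detection:  # stand-in for one entry of result.objects.list
    tags: List[Tag]


def confident_names(detections, min_confidence=0.5):
    """Return the top tag name of each detection at or above the threshold."""
    names = []
    for det in detections:
        if not det.tags:  # skip detections that carry no tag at all
            continue
        top = det.tags[0]
        if top.confidence >= min_confidence:
            names.append(top.name)
    return names


sample = [
    Detection([Tag("dog", 0.91)]),
    Detection([Tag("ball", 0.42)]),
    Detection([]),
]
print(confident_names(sample))  # ['dog']
```

The 0.5 default is an arbitrary illustration; a suitable threshold depends on the images and on how costly false positives are.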

OCR (Text Extraction)

result = client.analyze_from_url(
    image_url=image_url,
    visual_features=[VisualFeatures.READ]
)

if result.read:
    for block in result.read.blocks:
        for line in block.lines:
            print(f"Line: {line.text}")
            print(f"  Bounding polygon: {line.bounding_polygon}")
            
            # Word-level details
            for word in line.words:
                print(f"  Word: {word.text} (confidence: {word.confidence:.2f})")

People Detection

result = client.analyze_from_url(
    image_url=image_url,
    visual_features=[VisualFeatures.PEOPLE]
)

if result.people:
    for person in result.people.list:
        print("Person detected:")
        print(f"  Confidence: {person.confidence:.2f}")
        box = person.bounding_box
        print(f"  Bounding box: x={box.x}, y={box.y}, w={box.width}, h={box.height}")
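
The bounding boxes above are expressed as x, y, width, and height, while some imaging tools (for example Pillow's `Image.crop`) expect left, top, right, and bottom corners. A small conversion sketch follows; `box_to_corners` is a hypothetical helper, and the `SimpleNamespace` box is a stand-in for an SDK bounding box.

```python
from types import SimpleNamespace


def box_to_corners(box):
    """Convert an x/y/width/height box to (left, top, right, bottom)."""
    return (box.x, box.y, box.x + box.width, box.y + box.height)


fake_box = SimpleNamespace(x=10, y=20, width=100, height=50)
print(box_to_corners(fake_box))  # (10, 20, 110, 70)
```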

Smart Cropping

result = client.analyze_from_url(
    image_url=image_url,
    visual_features=[VisualFeatures.SMART_CROPS],
    smart_crops_aspect_ratios=[0.9, 1.33, 1.78]  # Portrait, 4:3, 16:9
)

if result.smart_crops:
    for crop in result.smart_crops.list:
        print(f"Aspect ratio: {crop.aspect_ratio}")
        box = crop.bounding_box
        print(f"  Crop region: x={box.x}, y={box.y}, w={box.width}, h={box.height}")
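
The values passed to `smart_crops_aspect_ratios` are width divided by height, which is how 16:9 becomes 1.78 in the example above. The sketch below derives such a ratio from thumbnail dimensions; `crop_aspect_ratio` is a hypothetical helper, and rounding to two decimals is a choice made here for readability.

```python
def crop_aspect_ratio(width: int, height: int) -> float:
    """Return width / height rounded to two decimals."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return round(width / height, 2)


print(crop_aspect_ratio(1920, 1080))  # 1.78
print(crop_aspect_ratio(4, 3))        # 1.33
```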

Async Client

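import asyncio
import os

from azure.ai.vision.imageanalysis.aio import ImageAnalysisClient
from azure.ai.vision.imageanalysis.models import VisualFeatures
from azure.identity.aio import DefaultAzureCredential

async def analyze_image(image_url: str) -> None:
    # Close both the credential and the client when done.
    async with DefaultAzureCredential() as credential:
        async with ImageAnalysisClient(
            endpoint=os.environ["VISION_ENDPOINT"],
            credential=credential
        ) as client:
            result = await client.analyze_from_url(
                image_url=image_url,
                visual_features=[VisualFeatures.CAPTION]
            )
            # The caption can be missing, so check before printing.
            if result.caption:
                print(result.caption.text)

asyncio.run(analyze_image("https://example.com/image.jpg"))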
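
For many images, the async client is usually paired with some limit on concurrent requests. The following sketch shows one possible pattern with `asyncio.Semaphore` and `asyncio.gather`. The names `analyze_many` and `fake_analyze` are hypothetical, the default limit of 4 is arbitrary, and the fake coroutine stands in for a real SDK call so the example runs offline.

```python
import asyncio


async def analyze_many(urls, analyze_one, max_concurrency=4):
    """Run analyze_one(url) for every URL with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(url):
        async with semaphore:
            return await analyze_one(url)

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(guarded(u) for u in urls))


async def fake_analyze(url):  # placeholder for a real SDK call
    await asyncio.sleep(0)
    return f"caption for {url}"


results = asyncio.run(analyze_many(["a.jpg", "b.jpg"], fake_analyze))
print(results)  # ['caption for a.jpg', 'caption for b.jpg']
```

In real use, `analyze_one` would wrap `client.analyze_from_url` on a single shared client rather than creating a client per image.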

Visual Features

Feature	Description
`CAPTION`	Single sentence describing the image
`DENSE_CAPTIONS`	Captions for multiple regions
`TAGS`	Content tags (objects, scenes, actions)
`OBJECTS`	Object detection with bounding boxes
`READ`	OCR text extraction
`PEOPLE`	People detection with bounding boxes
`SMART_CROPS`	Suggested crop regions for thumbnails

Error Handling

from azure.core.exceptions import HttpResponseError

try:
    result = client.analyze_from_url(
        image_url=image_url,
        visual_features=[VisualFeatures.CAPTION]
    )
except HttpResponseError as e:
    print(f"Status code: {e.status_code}")
    print(f"Reason: {e.reason}")
    # e.error can be None when the response carries no structured error body.
    if e.error is not None:
        print(f"Message: {e.error.message}")
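
The status code alone often indicates where to look first. The sketch below maps a few codes to troubleshooting hints based on general HTTP semantics; `describe_status` and its wording are assumptions made for illustration, not a list of errors documented for this service.

```python
def describe_status(status_code):
    """Map an HTTP status code to a short troubleshooting hint."""
    hints = {
        400: "Bad request: check the image URL, format, size, and parameters.",
        401: "Unauthorized: check the API key or credential.",
        403: "Forbidden: the identity may lack access to the resource.",
        429: "Too many requests: slow down and retry later.",
    }
    if status_code in hints:
        return hints[status_code]
    if status_code is not None and 500 <= status_code < 600:
        return "Server error: retrying later may help."
    return "Unexpected error: inspect the full exception."


print(describe_status(401))
```

Inside the `except` block above, this could be called as `describe_status(e.status_code)`.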

Image Requirements

Formats: JPEG, PNG, GIF, BMP, WEBP, ICO, TIFF, MPO
Max size: 20 MB
Dimensions: 50x50 to 16000x16000 pixels
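
These limits can be checked locally before an image is uploaded. The following sketch checks the file extension and size, and separately checks dimensions supplied by the caller. The helper names are hypothetical, the extension list is a guess at common suffixes for the formats above, and "20 MB" is assumed to mean 20 * 1024 * 1024 bytes.

```python
import os

MAX_BYTES = 20 * 1024 * 1024  # assumption: "20 MB" read as 20 MiB
ALLOWED_EXTENSIONS = {
    ".jpg", ".jpeg", ".png", ".gif", ".bmp",
    ".webp", ".ico", ".tif", ".tiff", ".mpo",
}


def check_image_file(path):
    """Return a list of problems found; an empty list means the file looks OK."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if os.path.getsize(path) > MAX_BYTES:
        problems.append("file is larger than 20 MB")
    return problems


def check_dimensions(width, height):
    """True when both sides fall within 50 to 16000 pixels."""
    return 50 <= width <= 16000 and 50 <= height <= 16000
```

An extension check is only a heuristic, since the service inspects the actual image content; the width and height would come from whatever imaging library is already in use.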

Best Practices

Select only needed features to optimize latency and cost
Use async client for high-throughput scenarios
Handle HttpResponseError for invalid images or auth issues
Enable gender_neutral_caption for inclusive descriptions
Specify language for localized captions
Use smart_crops_aspect_ratios matching your thumbnail requirements
Cache results when analyzing the same image multiple times
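
The caching advice can be sketched as a small in-memory cache keyed by a hash of the image bytes and the requested feature names. `AnalysisCache` and `fake_analyze` are hypothetical names; the fake function stands in for a real SDK call so the example runs offline, and feature names are passed as plain strings here.

```python
import hashlib


class AnalysisCache:
    """In-memory cache keyed by image content and requested feature names."""

    def __init__(self, analyze):
        self._analyze = analyze  # callable(image_data, features) -> result
        self._store = {}

    def get(self, image_data, features):
        digest = hashlib.sha256(image_data).hexdigest()
        key = (digest, tuple(sorted(features)))
        if key not in self._store:
            # Only call the underlying analyzer on a cache miss.
            self._store[key] = self._analyze(image_data, features)
        return self._store[key]


calls = []


def fake_analyze(image_data, features):  # placeholder for a real SDK call
    calls.append(features)
    return {"size": len(image_data)}


cache = AnalysisCache(fake_analyze)
cache.get(b"abc", ["CAPTION"])
cache.get(b"abc", ["CAPTION"])
print(len(calls))  # 1
```

Sorting the feature names makes the key independent of the order in which features are requested. A production cache would likely also need an eviction policy, which is omitted here.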

When to Use

Use this skill when you need to carry out the workflows or actions described in the overview.


First Seen

Feb 17, 2026

Security Audits

Gen Agent Trust Hub: Pass, Socket: Pass, Snyk: Warn

Installed on

codex: 45
opencode: 45
github-copilot: 44
amp: 44
cline: 44
kimi-cli: 44
