PaddleOCR Text Recognition Skill: Efficiently Extract Text from Images, PDFs, and Scans

paddleocr-text-recognition by aidenwu0209/paddleocr-skills

1,700 weekly installs

11 GitHub Stars

Install command

npx skills add https://github.com/aidenwu0209/paddleocr-skills --skill paddleocr-text-recognition

AI/Machine Learning, Data Analysis, Productivity

PaddleOCR Text Recognition Skill

When to Use This Skill

Invoke this skill in the following situations:

- Extract text from images (screenshots, photos, scans, charts)
- Read text from PDFs or document images
- Extract text from structured documents (invoices, receipts, forms)
- Extract text from URLs or local files pointing to images/PDFs

Do not use this skill in the following situations:

- Plain text files that can be read directly with the Read tool
- Code files or markdown documents
- Tasks that do not involve image-to-text conversion

How to Use This Skill

MANDATORY RESTRICTIONS - DO NOT VIOLATE

- ONLY use the PaddleOCR Text Recognition API - Execute the script `python scripts/ocr_caller.py`
- NEVER use Claude's built-in vision - Do NOT read images yourself
- NEVER offer alternatives - Do NOT suggest "I can try to read it" or similar
- IF the API fails - Display the error message and STOP immediately
- NO fallback methods - Do NOT attempt OCR any other way

If the script execution fails (API not configured, network error, etc.):

- Show the error message to the user
- Do NOT offer to help using your vision capabilities
- Do NOT ask "Would you like me to try reading it?"
- Simply stop and wait for the user to fix the configuration

Basic Workflow

1. Identify the input source:
- User provides URL: Use the `--file-url` parameter
- User provides local file path: Use the `--file-path` parameter
- User uploads image: Save it first, then use `--file-path`

2. Execute OCR:

     python scripts/ocr_caller.py --file-url "URL provided by user" --pretty

Or for local files:

     python scripts/ocr_caller.py --file-path "file path" --pretty

Save result to file (recommended):

     python scripts/ocr_caller.py --file-url "URL" --output result.json --pretty

3. Parse JSON response:

 * Check the `ok` field: `true` means success, `false` means error
 * Extract text: the `text` field contains all recognized text
 * Handle errors: If `ok` is `false`, display `error.message`

4. Present results to user:

 * Display extracted text in a readable format
 * If the text is empty, the image may contain no text
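Steps 1 and 2 of the workflow could be sketched as follows. This is a minimal illustration, not part of the skill itself: `build_ocr_command` is a hypothetical helper, it assumes `python` is on the PATH, and it uses only the `--file-url`, `--file-path`, `--output`, and `--pretty` flags shown above.

```python
from typing import List, Optional
from urllib.parse import urlparse


def build_ocr_command(source: str, output: Optional[str] = None) -> List[str]:
    """Build the argument list for scripts/ocr_caller.py (hypothetical helper)."""
    scheme = urlparse(source).scheme.lower()
    # Treat http(s) sources as URLs; anything else is assumed to be a local path.
    flag = "--file-url" if scheme in ("http", "https") else "--file-path"
    cmd = ["python", "scripts/ocr_caller.py", flag, source, "--pretty"]
    if output:
        # Optionally save the JSON result to a file, as recommended in step 2.
        cmd += ["--output", output]
    return cmd


if __name__ == "__main__":
    print(build_ocr_command("https://example.com/invoice.jpg"))
    print(build_ocr_command("./document.pdf", output="result.json"))
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)`. An uploaded image would first need to be saved to disk so that it has a local path.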

IMPORTANT: Complete Output Display

CRITICAL: Always display the COMPLETE recognized text to the user. Do NOT truncate or summarize the OCR results.

- The script returns the full JSON, with the complete text content in the `text` field
- You MUST display the entire `text` content to the user, no matter how long it is
- Do NOT use phrases like "Here's a summary" or "The text begins with..."
- Do NOT truncate with "..." unless the text truly exceeds reasonable display limits
- The user expects to see ALL the recognized text, not a preview or excerpt

Correct approach:

I've extracted the text from the image. Here's the complete content:

[Display the entire text here]

Incorrect approach:

I found some text in the image. Here's a preview:
"The quick brown fox..." (truncated)

Usage Examples

URL OCR:

python scripts/ocr_caller.py --file-url "https://example.com/invoice.jpg" --pretty

Local File OCR:

python scripts/ocr_caller.py --file-path "./document.pdf" --pretty

Understanding the Output

The script outputs JSON with the following structure:

{
  "ok": true,
  "text": "All recognized text here...",
  "result": { ... },
  "error": null
}

Key fields:

- `ok`: `true` for success, `false` for error
- `text`: Complete recognized text
- `result`: Raw API response (for debugging)
- `error`: Error details if `ok` is `false`
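Reading these fields from the script's output might look like the sketch below. `OcrError` and `extract_text` are hypothetical names. Because the workflow refers to `error.message`, the sketch assumes `error` is usually an object with a `message` key, and falls back to the raw value otherwise.

```python
import json


class OcrError(Exception):
    """Raised when the script reports ok == false (hypothetical name)."""


def extract_text(raw_json: str) -> str:
    """Return the full `text` field, or raise OcrError with the reported error."""
    payload = json.loads(raw_json)
    if not payload.get("ok"):
        error = payload.get("error")
        # Assumption: `error` is an object with a `message` key; otherwise use it as-is.
        message = error.get("message") if isinstance(error, dict) else error
        raise OcrError(str(message))
    # Return the text untruncated; an empty string means no text was detected.
    return payload.get("text") or ""
```

The function deliberately returns the whole string so that the caller can display it in full, as the output-display rule requires.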

First-Time Configuration

When the API is not configured:

The error will show:

CONFIG_ERROR: PADDLEOCR_OCR_API_URL not configured. Get your API at: https://paddleocr.com

Configuration workflow:

1. Show the exact error message to the user (including the URL)

2. Tell the user to provide credentials:

     Please visit the URL above to get your PADDLEOCR_OCR_API_URL and PADDLEOCR_ACCESS_TOKEN.
     Once you have them, send them to me and I'll configure it automatically.

3. When the user provides credentials (accept any format):
- PADDLEOCR_OCR_API_URL=https://xxx.paddleocr.com/ocr, PADDLEOCR_ACCESS_TOKEN=abc123...
- Here's my API: https://xxx and token: abc123
- Copy-pasted code format
- Any other reasonable format

4. Parse credentials from the user's message:
- Extract the PADDLEOCR_OCR_API_URL value (look for URLs with paddleocr.com or similar)
- Extract the PADDLEOCR_ACCESS_TOKEN value (long alphanumeric string, usually 40+ chars)

5. Configure automatically:

     python scripts/configure.py --api-url "PARSED_URL" --token "PARSED_TOKEN"
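One way to approach the credential-parsing step is a pair of regular expressions, sketched below. The patterns and the `parse_credentials` helper are illustrative assumptions, not the skill's actual logic. The sketch takes the first URL in the message as the API URL, prefers a value explicitly labelled as a token, and otherwise falls back to a run of 40 or more alphanumeric characters.

```python
import re
from typing import Optional, Tuple

# First URL in the message; stops at whitespace, commas, quotes, and brackets.
URL_RE = re.compile(r"https?://[^\s,;\"'<>]+")
# A labelled token, e.g. "PADDLEOCR_ACCESS_TOKEN=abc123" or "token: abc123".
LABELLED_TOKEN_RE = re.compile(r"token\s*[:=]\s*[\"']?([A-Za-z0-9_\-]+)", re.IGNORECASE)
# Fallback: a long alphanumeric run ("usually 40+ chars").
LONG_TOKEN_RE = re.compile(r"\b[A-Za-z0-9]{40,}\b")


def parse_credentials(message: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (api_url, token); either may be None if nothing plausible is found."""
    url_match = URL_RE.search(message)
    api_url = url_match.group(0).rstrip(".") if url_match else None

    token_match = LABELLED_TOKEN_RE.search(message)
    if token_match:
        token = token_match.group(1)
    else:
        # Blank out URLs first so a long path segment is not mistaken for a token.
        remainder = URL_RE.sub(" ", message)
        long_match = LONG_TOKEN_RE.search(remainder)
        token = long_match.group(0) if long_match else None
    return api_url, token
```

If either value comes back as `None`, asking the user to resend the credentials is safer than guessing. Otherwise the two values can be passed to `scripts/configure.py` through the `--api-url` and `--token` flags shown above.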

Error Handling

Authentication failed:

API_ERROR: Authentication failed (403). Check your token.

- The token is invalid; reconfigure with correct credentials

Quota exceeded:

API_ERROR: API rate limit exceeded (429)

- The daily API quota is exhausted; inform the user to wait or upgrade

No text detected:

- The `text` field is empty
- The image may be blank, corrupted, or contain no text
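The error cases in this document can be mapped to a next step with a small dispatcher, sketched below. The function name and the returned action labels are hypothetical. The sketch assumes error messages begin with the `CONFIG_ERROR` or `API_ERROR` prefixes and carry the status code in parentheses, as in the examples shown here.

```python
def next_action(error_message: str) -> str:
    """Map an error message to a suggested next step (labels are hypothetical)."""
    if error_message.startswith("CONFIG_ERROR"):
        # API not configured: show the message and ask for credentials.
        return "configure"
    if error_message.startswith("API_ERROR"):
        if "(403)" in error_message:
            # Invalid token: reconfigure with correct credentials.
            return "reconfigure"
        if "(429)" in error_message:
            # Quota exhausted: tell the user to wait or upgrade.
            return "wait_or_upgrade"
    # Anything else: display the error and stop, with no fallback OCR.
    return "stop"
```

Unrecognized errors map to `"stop"`, in line with the rule that a failure must be displayed without attempting any alternative OCR method.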

Tips for Better Results

If recognition quality is poor, suggest:

Check if the image is clear and contains text
Provide a higher resolution image if possible

Reference Documentation

For in-depth understanding of the OCR system, refer to:

references/output_schema.md - Output format specification
references/provider_api.md - Provider API contract

Note: Model version and capabilities are determined by your API endpoint (PADDLEOCR_OCR_API_URL).

Testing the Skill

To verify the skill is working properly:

python scripts/smoke_test.py

This tests configuration and API connectivity.

Weekly Installs

707

Repository

aidenwu0209/pad…r-skills

First Seen

Feb 9, 2026

Security Audits

Gen Agent Trust Hub: Pass
Socket: Fail
Snyk: Fail

Installed on

opencode: 696
codex: 694
gemini-cli: 694
kimi-cli: 693
amp: 693
github-copilot: 693

After running scripts/configure.py during first-time configuration:

If configuration succeeds:

- Inform the user: "Configuration complete! Running OCR now..."
- Retry the original OCR task

If configuration fails:

- Show the error
- Ask the user to verify the credentials