paddleocr-doc-parsing by aidenwu0209/paddleocr-skills
npx skills add https://github.com/aidenwu0209/paddleocr-skills --skill paddleocr-doc-parsing
✅ Use Document Parsing for:
❌ Use Text Recognition instead for:
⛔ MANDATORY RESTRICTIONS - DO NOT VIOLATE ⛔
Run `python scripts/vl_caller.py`. If the script execution fails (API not configured, network error, etc.), follow the configuration workflow below.

Execute document parsing:
python scripts/vl_caller.py --file-url "URL provided by user"
Or for local files:
python scripts/vl_caller.py --file-path "file path"
Optional: explicitly set the file type:
python scripts/vl_caller.py --file-url "URL provided by user" --file-type 0
* `--file-type 0`: PDF
* `--file-type 1`: image
* If omitted, the service can infer the file type from the input.
Save result to file (recommended):
python scripts/vl_caller.py --file-url "URL" --output result.json --pretty
* The script will display: `Result saved to: /absolute/path/to/result.json`
* This message appears on stderr; the JSON itself is saved to the file
* **Tell the user the file path** shown in the message
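For illustration, assembling these command-line options programmatically might look like this (a sketch; only the flags documented above are assumed, and `build_command` is a hypothetical helper name):

```python
def build_command(file_url=None, file_path=None, file_type=None,
                  output=None, pretty=True):
    """Assemble the vl_caller.py argument list from the documented flags."""
    cmd = ["python", "scripts/vl_caller.py"]
    if file_url:
        cmd += ["--file-url", file_url]
    if file_path:
        cmd += ["--file-path", file_path]
    if file_type is not None:
        cmd += ["--file-type", str(file_type)]  # 0 = PDF, 1 = image
    if output:
        cmd += ["--output", output]  # JSON goes here; the path is echoed on stderr
    if pretty:
        cmd.append("--pretty")
    return cmd
```

The resulting list can be handed to `subprocess.run` directly, which avoids shell-quoting issues with URLs and file paths.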
2. The script returns COMPLETE JSON with all document content:
* Headers, footers, page numbers
* Main text content
* Tables with structure
* Formulas (with LaTeX)
* Figures and charts
* Footnotes and references
* Seals and stamps
* Layout and reading order
Note: The actual content types that can be parsed depend on the model configured at your API endpoint (PADDLEOCR_DOC_PARSING_API_URL). The list above represents the maximum set of supported types.
CRITICAL: You must display the COMPLETE extracted content to the user based on their needs.
Display the complete `text` field. What this means:
Example - Correct:
User: "Extract all the text from this document"
Claude: I've parsed the complete document. Here's all the extracted text:
[Display the entire text field]
Document Statistics:
- Total regions: 25
- Text blocks: 15
- Tables: 3
- Formulas: 2
Quality: Excellent (confidence: 0.92)
Example - Incorrect ❌:
User: "Extract all the text"
Claude: "I found a document with multiple sections. Here's the beginning:
'Introduction...' (content truncated for brevity)"
The script returns a JSON envelope wrapping the raw API result:
{
"ok": true,
"text": "Full markdown/HTML text extracted from all pages",
"result": [
{
"prunedResult": { ... }, // layout element positions, content, confidence
"markdown": {
"text": "Full page content in markdown/HTML format",
"images": { ... }
}
}
],
"error": null
}
Key fields:
* `text` — extracted markdown text from all pages (use this for quick text display)
* `result` — raw API result array (one object per page)
* `result[n].prunedResult` — layout element positions, content, and confidence scores
* `result[n].markdown` — full page content in markdown/HTML format

| User Says | What to Extract | How |
|---|---|---|
| "Extract all text" | Everything | Use text field directly |
| "Get all tables" | Tables only | Look for <table> in the markdown text |
| "Show main content" | Main body text | Use text field, filter as needed |
| "Complete document" | Everything | Use text field |
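A minimal sketch of reading the envelope, assuming only the field names shown above (`read_envelope` and `page_texts` are hypothetical helper names, not part of the skill's scripts):

```python
import json

def read_envelope(raw: str) -> dict:
    """Parse the JSON envelope and fail fast if the script reported an error."""
    env = json.loads(raw)
    if not env.get("ok"):
        raise RuntimeError(env.get("error") or "document parsing failed")
    return env

def page_texts(env: dict) -> list:
    """Per-page markdown text from the raw result array (one entry per page)."""
    return [page["markdown"]["text"] for page in env.get("result", [])]
```

For "extract all text" requests, `env["text"]` alone is enough; `page_texts` is only needed when per-page structure matters.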
Example 1: Extract Main Content (default behavior)
python scripts/vl_caller.py \
--file-url "https://example.com/paper.pdf" \
--pretty
Then use the text field for main content display.
Example 2: Extract Tables Only
python scripts/vl_caller.py \
--file-path "./financial_report.pdf" \
--pretty
Then look for <table> content in the result to extract tables.
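A hypothetical helper (not part of the skill's scripts) for pulling `<table>` blocks out of the markdown/HTML text; it assumes tables are not nested:

```python
import re

def extract_tables(markdown_text):
    """Return every <table>...</table> block in the markdown/HTML text.

    re.DOTALL lets '.' cross newlines; the non-greedy '.*?' stops at the
    first closing tag so adjacent tables are not merged. Nested tables
    would need a real HTML parser instead.
    """
    return re.findall(r"<table\b.*?</table>",
                      markdown_text, flags=re.DOTALL | re.IGNORECASE)
```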
Example 3: Complete Document with Everything
python scripts/vl_caller.py \
--file-url "URL" \
--pretty
Then use the text field or iterate the full result.
When the API is not configured:
The error will show:
Configuration error: API not configured. Get your API at: https://paddleocr.com
Configuration workflow:
Show the exact error message to the user (including the URL)
Tell the user to provide credentials:
Please visit the URL above to get your PADDLEOCR_DOC_PARSING_API_URL and PADDLEOCR_ACCESS_TOKEN.
Once you have them, send them to me and I'll configure it automatically.
When the user provides credentials (accept any format), for example:

* `PADDLEOCR_DOC_PARSING_API_URL=https://xxx.paddleocr.com/layout-parsing, PADDLEOCR_ACCESS_TOKEN=abc123...`
* "Here's my API: https://xxx and token: abc123"

Parse credentials from the user's message:
Configure automatically:
python scripts/configure.py --api-url "PARSED_URL" --token "PARSED_TOKEN"
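"Accept any format" could be implemented along these lines (a sketch; the token character set and the `parse_credentials` helper name are assumptions):

```python
import re

def parse_credentials(message):
    """Pull the API URL and access token out of a free-form user message.

    Assumes the API URL is the first http(s) URL in the message and the
    token follows either 'PADDLEOCR_ACCESS_TOKEN=' or the word 'token'.
    """
    url_m = re.search(r"https?://[^\s,'\"]+", message)
    token_m = re.search(
        r"(?:PADDLEOCR_ACCESS_TOKEN\s*=\s*|token[:\s]+)([A-Za-z0-9_\-.]+)",
        message, flags=re.IGNORECASE)
    return (url_m.group(0) if url_m else None,
            token_m.group(1) if token_m else None)
```

The two parsed values map directly onto `--api-url` and `--token` in the `configure.py` call above.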
If configuration succeeds: confirm to the user and retry the original parsing request.

If configuration fails: show the script's exact error message to the user.

IMPORTANT: The error message format is STRICT and must be shown exactly as provided by the script. Do not modify or paraphrase it.
There is no file size limit for the API. For PDFs, the maximum is 100 pages per request.
Tips for large files:
For very large local files, prefer --file-url over --file-path to avoid base64 encoding overhead:
python scripts/vl_caller.py --file-url "https://your-server.com/large_file.pdf"
If you only need certain pages from a large PDF, extract them first:
# Using pypdfium2 (requires: pip install pypdfium2)
python -c "
import pypdfium2 as pdfium
doc = pdfium.PdfDocument('large.pdf')
# Extract pages 0-4 (the first 5 pages)
new_doc = pdfium.PdfDocument.new()
for i in range(min(5, len(doc))):
    new_doc.import_pages(doc, [i])
new_doc.save('pages_1_5.pdf')
"
# Then process the smaller file
python scripts/vl_caller.py --file-path "pages_1_5.pdf"
Authentication failed (403):
error: Authentication failed
→ The token is invalid; reconfigure with correct credentials
API quota exceeded (429):
error: API quota exceeded
→ The daily API quota is exhausted; tell the user to wait or upgrade
Unsupported format:
error: Unsupported file format
→ The file format is not supported; convert it to PDF, PNG, or JPG
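The three cases above can be dispatched with a small lookup (a sketch; the error strings are the ones shown above, and `remedy_for` is a hypothetical name):

```python
# Known error substrings mapped to the remedies described above.
REMEDIES = {
    "Authentication failed": "The token is invalid; reconfigure with correct credentials.",
    "API quota exceeded": "The daily quota is exhausted; wait or upgrade.",
    "Unsupported file format": "Convert the input to PDF, PNG, or JPG first.",
}

def remedy_for(error_message):
    """Return the suggested fix for a known error, else a generic hint."""
    for key, advice in REMEDIES.items():
        if key.lower() in error_message.lower():
            return advice
    return "Show the exact error to the user and re-check the configuration."
```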
For in-depth understanding of the PaddleOCR Document Parsing system, refer to:
* references/output_schema.md - Output format specification
* references/provider_api.md - Provider API contract

Note: Model version and capabilities are determined by your API endpoint (PADDLEOCR_DOC_PARSING_API_URL).
Load these reference documents into context when:
To verify the skill is working properly:
python scripts/smoke_test.py
This tests configuration and optionally API connectivity.
Weekly Installs: 50
GitHub Stars: 5
First Seen: Feb 9, 2026
Security Audits: Gen Agent Trust Hub: Pass; Socket: Fail; Snyk: Fail
Installed on: gemini-cli (48), opencode (48), codex (47), github-copilot (46), kimi-cli (45), amp (45)