npx skills add https://github.com/yejinlei/pdf-ocr-skill --skill pdf-ocr-skill
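PDF OCR Skill extracts text from scanned PDF files and image files. It supports two OCR engines: RapidOCR, a local engine that is the default and needs no API key, and the SiliconFlow API engine, which requires an API key.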
pip install pymupdf pillow requests python-dotenv
Install RapidOCR for local recognition capability:
pip install rapidocr_onnxruntime
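After installing, it can help to confirm that each package is importable before running any OCR. The sketch below is not part of the skill; `REQUIRED_MODULES` and `find_missing` are hypothetical names introduced here. It relies only on the fact that some packages import under a different name than they install under (pymupdf imports as `fitz`, pillow as `PIL`, python-dotenv as `dotenv`).

```python
import importlib.util

# Map each pip package name to the module name it is imported as.
REQUIRED_MODULES = {
    "pymupdf": "fitz",
    "pillow": "PIL",
    "requests": "requests",
    "python-dotenv": "dotenv",
    "rapidocr_onnxruntime": "rapidocr_onnxruntime",
}


def find_missing(modules=REQUIRED_MODULES):
    """Return the pip names whose import module cannot be located."""
    missing = []
    for pip_name, import_name in modules.items():
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(import_name) is None:
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing packages: pip install " + " ".join(missing))
    else:
        print("All OCR dependencies are importable.")
```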
Copy the .env.example file and rename it to .env:
# OCR engine selection
# - "rapid": Use RapidOCR local engine (default, no API key required)
# - "siliconflow": Use SiliconFlow API engine (API key required)
OCR_ENGINE=rapid
# If using SiliconFlow API engine, configure the following options:
SILICON_FLOW_API_KEY=your_api_key_here
SILICON_FLOW_OCR_MODEL=deepseek-ai/DeepSeek-OCR
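The settings above could be read and validated along these lines. This is a minimal sketch, not the skill's actual loading code: `load_ocr_config` is a hypothetical helper, it reads from a plain mapping (by default `os.environ`) rather than parsing the .env file itself, and treating the `your_api_key_here` placeholder as "not configured" is an assumption.

```python
import os

VALID_ENGINES = {"rapid", "siliconflow"}


def load_ocr_config(env=None):
    """Read the OCR settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    engine = env.get("OCR_ENGINE", "rapid").strip().lower()
    if engine not in VALID_ENGINES:
        raise ValueError(f"Unknown OCR_ENGINE: {engine!r}")

    config = {"engine": engine}
    if engine == "siliconflow":
        api_key = env.get("SILICON_FLOW_API_KEY", "").strip()
        # Assumption: the placeholder from .env.example means "not configured".
        if not api_key or api_key == "your_api_key_here":
            raise ValueError("SILICON_FLOW_API_KEY is required for siliconflow")
        config["api_key"] = api_key
        config["model"] = env.get(
            "SILICON_FLOW_OCR_MODEL", "deepseek-ai/DeepSeek-OCR"
        )
    return config
```

Failing early on a missing key gives a clearer message than a 401 response from the API later on.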
# Import OCR processor
from scripts.pdf_ocr_processor import PDFOCRProcessor
# Create processor instance (default uses RapidOCR)
processor = PDFOCRProcessor()
# Perform PDF OCR recognition
result = processor.ocr_pdf('path/to/your/scanned.pdf')
# Get recognition result
print(f"Recognition completed, total {result['page_count']} pages")
print(f"Engine used: {result['engine']}")
print(result['text'])
# Import OCR processor
from scripts.pdf_ocr_processor import PDFOCRProcessor
# Create processor instance, specify to use SiliconFlow API
processor = PDFOCRProcessor(engine="siliconflow")
# Perform PDF OCR recognition
result = processor.ocr_pdf('path/to/your/scanned.pdf')
# Get recognition result
print(f"Recognition completed, total {result['page_count']} pages")
print(result['text'])
# Import OCR processor
from scripts.pdf_ocr_processor import PDFOCRProcessor
# Create processor instance
processor = PDFOCRProcessor() # or PDFOCRProcessor(engine="siliconflow")
# Perform image OCR recognition
result = processor.ocr_image_file('path/to/your/image.jpg')
# Get recognition result
print(f"Recognition result: {result['text']}")
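Since PDFs and images go through different methods (`ocr_pdf` and `ocr_image_file`), a small dispatcher can pick the right one from the file extension. The sketch below assumes only those two method names from the examples above; `ocr_any` and the `IMAGE_SUFFIXES` list are hypothetical, and the set of image formats the skill really accepts may differ.

```python
from pathlib import Path

# Assumed set of image extensions; the skill's real list may differ.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}


def ocr_any(processor, path):
    """Route a file to ocr_pdf or ocr_image_file based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        return processor.ocr_pdf(path)
    if suffix in IMAGE_SUFFIXES:
        return processor.ocr_image_file(path)
    raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
```

It would be called as `ocr_any(PDFOCRProcessor(), 'path/to/file')` with either kind of input.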
# Use default RapidOCR engine
python pdf_ocr_processor.py your_document.pdf
# Use SiliconFlow API engine
python pdf_ocr_processor.py your_document.pdf siliconflow
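The two commands above take a PDF path and an optional engine name. How the script parses them is not shown here, so the following is only a sketch of an equivalent interface using `argparse`; `parse_cli` is a hypothetical function and the real script may handle its arguments differently.

```python
import argparse


def parse_cli(argv):
    """Parse '<pdf_path> [engine]' the way the commands above are written."""
    parser = argparse.ArgumentParser(prog="pdf_ocr_processor.py")
    parser.add_argument("pdf_path", help="path to the scanned PDF")
    parser.add_argument(
        "engine",
        nargs="?",  # optional positional argument
        default="rapid",
        choices=["rapid", "siliconflow"],
        help="OCR engine to use (default: rapid)",
    )
    return parser.parse_args(argv)
```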
import os
from scripts.pdf_ocr_processor import PDFOCRProcessor
# Create processor instance
processor = PDFOCRProcessor()
# Batch process all PDF files in directory
pdf_dir = "path/to/pdf/files"
output_dir = "path/to/output"
os.makedirs(output_dir, exist_ok=True)
for pdf_file in os.listdir(pdf_dir):
    if pdf_file.endswith('.pdf'):
        pdf_path = os.path.join(pdf_dir, pdf_file)
        output_path = os.path.join(output_dir, f"{os.path.splitext(pdf_file)[0]}.txt")
        print(f"Processing file: {pdf_file}")
        try:
            result = processor.ocr_pdf(pdf_path)
            # Save recognition result to text file
            with open(output_path, 'w', encoding='utf-8') as f:
                f.write("=== PDF OCR Recognition Result ===\n")
                f.write(f"File name: {pdf_file}\n")
                f.write(f"Pages: {result['page_count']}\n")
                f.write(f"Engine used: {result['engine']}\n\n")
                f.write(result['text'])
            print(f"Processing completed, result saved to: {output_path}")
        except Exception as e:
            print(f"Processing failed: {e}")
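The batch loop above matches only a lowercase `.pdf` suffix and would also pick up a directory whose name ends in `.pdf`. One possible variation, sketched here with `pathlib`, builds the list of input and output paths first; `plan_batch` is a hypothetical helper and does not call the OCR engine itself.

```python
from pathlib import Path


def plan_batch(pdf_dir, output_dir):
    """Pair every PDF in pdf_dir with the .txt path its text would go to."""
    pdf_dir, output_dir = Path(pdf_dir), Path(output_dir)
    jobs = []
    for pdf_path in sorted(pdf_dir.iterdir()):
        # Match .pdf case-insensitively and skip directories.
        if pdf_path.is_file() and pdf_path.suffix.lower() == ".pdf":
            jobs.append((pdf_path, output_dir / (pdf_path.stem + ".txt")))
    return jobs
```

Each `(pdf_path, output_path)` pair can then be passed through the same `try`/`except` body as in the loop above.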
from scripts.pdf_ocr_processor import PDFOCRProcessor
def process_with_best_engine(pdf_path):
    """Try RapidOCR first; fall back to the SiliconFlow API if the result looks poor."""
    # First use RapidOCR local engine
    rapid_processor = PDFOCRProcessor(engine="rapid")
    rapid_result = rapid_processor.ocr_pdf(pdf_path)
    # Simple evaluation of recognition quality (e.g., check recognized text length)
    text_length = len(rapid_result['text'])
    if text_length < 100:  # Very short text suggests poor recognition
        print("RapidOCR result may be poor, trying SiliconFlow API...")
        silicon_processor = PDFOCRProcessor(engine="siliconflow")
        silicon_result = silicon_processor.ocr_pdf(pdf_path)
        return silicon_result
    else:
        return rapid_result
# Usage example
result = process_with_best_engine('path/to/your/document.pdf')
print(f"Recognition completed, engine used: {result['engine']}")
print(result['text'])
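The fixed 100-character threshold above treats a one-page receipt and a fifty-page report the same way. A slightly finer check, sketched below, measures visible characters per page using the `text` and `page_count` fields of the result. `needs_fallback` is a hypothetical helper and the default of 50 characters per page is an arbitrary assumption that would need tuning on real documents.

```python
def needs_fallback(result, min_chars_per_page=50):
    """Return True when the OCR text looks too sparse for the page count."""
    pages = max(int(result.get("page_count", 1)), 1)
    # Ignore whitespace so blank lines do not count as recognized text.
    visible_chars = sum(1 for ch in result.get("text", "") if not ch.isspace())
    return visible_chars / pages < min_chars_per_page
```

In `process_with_best_engine`, the condition `text_length < 100` could be swapped for `needs_fallback(rapid_result)`.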
{
    "text": "Recognized full text content",
    "page_count": number_of_pages,  # Always 1 for image files
    "engine": "rapid" | "siliconflow"  # OCR engine used
}
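The result shape above can be written down as a type and checked at runtime. The sketch below is an illustration only: `OCRResult` and `validate_result` are hypothetical names, and the skill itself may return additional keys that this check simply ignores.

```python
from typing import Literal, TypedDict


class OCRResult(TypedDict):
    text: str
    page_count: int
    engine: Literal["rapid", "siliconflow"]


def validate_result(result):
    """Raise ValueError if result does not match the documented shape."""
    if not isinstance(result.get("text"), str):
        raise ValueError("'text' must be a string")
    page_count = result.get("page_count")
    # bool is a subclass of int, so exclude it explicitly.
    if (
        not isinstance(page_count, int)
        or isinstance(page_count, bool)
        or page_count < 0
    ):
        raise ValueError("'page_count' must be a non-negative integer")
    if result.get("engine") not in ("rapid", "siliconflow"):
        raise ValueError("'engine' must be 'rapid' or 'siliconflow'")
    return result
```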
RapidOCR Engine:
SiliconFlow API Engine:
Recognition accuracy may vary for complex scanned PDFs or images.
Use high-resolution scans or images for better recognition results.
When working with an assistant in an AI IDE, you can use prompts like the following to select an OCR engine:
Example 1: Using Local Engine
User: Help me process this scanned PDF, use local OCR engine for quick recognition
Assistant: Sure, I'll use the RapidOCR local engine for you. Please provide the PDF file path.
Example 2: Using Cloud Engine
User: This PDF contains handwritten text, need high-precision recognition, use SiliconFlow API
Assistant: Understood, I'll use the SiliconFlow API large model for you. Please provide the PDF file path and your API key (if not already configured).
Example 3: Automatic Selection
User: Help me recognize this PDF, choose the most suitable engine
Assistant: I'll use the RapidOCR local engine by default. If the results are not ideal, we can try the SiliconFlow API.
When the AI assistant receives these prompts, it will:
These prompts let you control OCR engine selection when working in an AI IDE, so you can pick the engine that suits each document.
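To make the three examples concrete, here is a toy rule that reproduces their outcomes: the cloud engine when the prompt names SiliconFlow or mentions handwriting or high precision, the local engine otherwise. This is purely illustrative. An AI assistant interprets prompts in its own way rather than by keyword matching, and `choose_engine` and `CLOUD_HINTS` are hypothetical names.

```python
# Assumed keywords that suggest the cloud engine; not taken from the skill.
CLOUD_HINTS = ("siliconflow", "handwritten", "high-precision")


def choose_engine(prompt):
    """Pick 'siliconflow' if the prompt contains a cloud hint, else 'rapid'."""
    text = prompt.lower()
    if any(hint in text for hint in CLOUD_HINTS):
        return "siliconflow"
    return "rapid"
```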
RapidOCR Initialization Failure
ModuleNotFoundError: No module named 'rapidocr_onnxruntime'
pip install rapidocr_onnxruntime
SiliconFlow API 401 Error
Unauthorized: 401 Client Error
Check that the API key is set correctly in the .env file
PDF to Image Conversion Failure
ImportError: No module named 'fitz'
pip install pymupdf
Empty Recognition Result
MIT License - See LICENSE.txt