npx skills add https://github.com/89jobrien/steve --skill 'PDF Processing'
Use pdfplumber to extract text from PDFs:
import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    # extract_text() can return None or an empty string for image-only pages
    text = pdf.pages[0].extract_text()
    print(text)
Extract tables from PDFs with automatic detection:
import pdfplumber

with pdfplumber.open("report.pdf") as pdf:
    page = pdf.pages[0]
    # Each table is a list of rows; each row is a list of cell values
    tables = page.extract_tables()
    for table in tables:
        for row in table:
            print(row)
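Each extracted table is a list of rows, and each row is a list of cell strings with `None` for empty cells. The sketch below turns such a table into dictionaries, assuming the first row holds the column headers, which is not guaranteed for every PDF. `table_to_records` is a hypothetical helper name, not part of pdfplumber, and the sample data is made up for illustration.

```python
def table_to_records(table):
    """Convert a list-of-rows table into a list of dicts keyed by the header row.

    Assumes the first row contains column headers; None cells become "".
    """
    if not table:
        return []
    header = [(cell or "").strip() for cell in table[0]]
    records = []
    for row in table[1:]:
        cells = [(cell or "").strip() for cell in row]
        records.append(dict(zip(header, cells)))
    return records


# Made-up sample shaped like one entry of page.extract_tables()
sample = [["Name", "Qty"], ["Widget", "3"], ["Gadget", None]]
print(table_to_records(sample))
```

With a real document, the same helper could be applied to each entry of `tables` in the loop above.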
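Process multi-page documents efficiently:
import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    full_text = ""
    for page in pdf.pages:
        # Fall back to "" so a page without extractable text does not raise TypeError
        full_text += (page.extract_text() or "") + "\n\n"
print(full_text)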
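For long documents, collecting the page texts in a list and joining them once avoids repeated string concatenation and makes it easy to skip empty pages. This is a minimal sketch: `collect_text` and `FakePage` are hypothetical names introduced here, and `FakePage` only stands in for a pdfplumber page so the example runs without a PDF file.

```python
def collect_text(pages, separator="\n\n"):
    """Join the text of every page, skipping pages with no extractable text."""
    parts = []
    for page in pages:
        text = page.extract_text()
        if text and text.strip():
            parts.append(text)
    return separator.join(parts)


class FakePage:
    """Stand-in for a pdfplumber page, used only to demonstrate the helper."""

    def __init__(self, text):
        self._text = text

    def extract_text(self):
        return self._text


demo_pages = [FakePage("Page one"), FakePage(None), FakePage("Page three")]
print(collect_text(demo_pages))
```

In real use, the helper would be called as `collect_text(pdf.pages)` inside the `with pdfplumber.open(...)` block.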
For PDF form filling, see FORMS.md for the complete guide including field analysis and validation.
Combine multiple PDF files:
from pypdf import PdfWriter

# PdfWriter.append() replaces the deprecated PdfMerger class, which pypdf 5.0 removed
writer = PdfWriter()
for pdf in ["file1.pdf", "file2.pdf", "file3.pdf"]:
    writer.append(pdf)
writer.write("merged.pdf")
writer.close()
Extract specific pages or ranges:
from pypdf import PdfReader, PdfWriter

reader = PdfReader("input.pdf")
writer = PdfWriter()
# Extract pages 2-5; reader.pages is zero-indexed, so these are indices 1-4
for page_num in range(1, 5):
    writer.add_page(reader.pages[page_num])
with open("output.pdf", "wb") as output:
    writer.write(output)
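The page range above mixes human page numbers (1-based, inclusive) with zero-based indices, which is an easy place for off-by-one mistakes. A small sketch of the conversion follows; `page_indices` is a hypothetical helper, not a pypdf function.

```python
def page_indices(first_page, last_page, total_pages):
    """Return zero-based indices for the 1-based, inclusive range first_page..last_page."""
    if first_page < 1 or last_page < first_page:
        raise ValueError("expected 1 <= first_page <= last_page")
    if last_page > total_pages:
        raise ValueError(f"document has only {total_pages} pages")
    return list(range(first_page - 1, last_page))


# Pages 2-5 of a 10-page document map to indices 1 through 4
print(page_indices(2, 5, 10))
```

With pypdf, `total_pages` would be `len(reader.pages)`, and each returned index can be passed to `reader.pages[...]`.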
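Extract and save text:
import pdfplumber

with pdfplumber.open("input.pdf") as pdf:
    # Substitute "" for pages with no extractable text so join() never sees None
    text = "\n\n".join(page.extract_text() or "" for page in pdf.pages)
with open("output.txt", "w", encoding="utf-8") as f:
    f.write(text)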
Extract tables to CSV:
import csv
import pdfplumber

with pdfplumber.open("tables.pdf") as pdf:
    tables = pdf.pages[0].extract_tables()
# newline="" lets the csv module control line endings
with open("output.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    for table in tables:
        writer.writerows(table)
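When a page holds more than one table, writing them back to back makes it hard to tell where one ends and the next begins. One possible approach, sketched here under the assumption that a blank row is an acceptable separator, also replaces `None` cells with empty strings explicitly. `write_tables` is a hypothetical helper, and the in-memory buffer and sample tables exist only for demonstration.

```python
import csv
import io


def write_tables(tables, stream):
    """Write every table to one CSV stream, separating tables with a blank row."""
    writer = csv.writer(stream)
    for index, table in enumerate(tables):
        if index > 0:
            writer.writerow([])  # blank row between consecutive tables
        for row in table:
            writer.writerow(["" if cell is None else cell for cell in row])


# Made-up tables shaped like the output of page.extract_tables()
demo_tables = [[["a", "b"], ["1", None]], [["c"]]]
buffer = io.StringIO()
write_tables(demo_tables, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `newline=""` as `stream`.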
Handle common PDF issues:
import pdfplumber

try:
    with pdfplumber.open("document.pdf") as pdf:
        if len(pdf.pages) == 0:
            print("PDF has no pages")
        else:
            text = pdf.pages[0].extract_text()
            # Scanned (image-only) pages yield no text without OCR
            if text is None or text.strip() == "":
                print("Page contains no extractable text (might be scanned)")
            else:
                print(text)
except Exception as e:
    print(f"Error processing PDF: {e}")
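The check above inspects only the first page. A document can mix text pages with scanned ones, so it may help to report every page that yields no text. The following is a sketch only: `pages_without_text` and `StubPage` are hypothetical names, and `StubPage` stands in for a pdfplumber page so the example runs without a PDF file.

```python
def pages_without_text(pages):
    """Return the 1-based numbers of pages whose extract_text() yields nothing."""
    missing = []
    for number, page in enumerate(pages, start=1):
        text = page.extract_text()
        if not text or not text.strip():
            missing.append(number)
    return missing


class StubPage:
    """Stand-in for a pdfplumber page, used only to demonstrate the helper."""

    def __init__(self, text):
        self._text = text

    def extract_text(self):
        return self._text


demo = [StubPage("Intro"), StubPage(""), StubPage(None)]
print(pages_without_text(demo))
```

Pages reported this way are candidates for OCR rather than proof of a scan, since a page can also be blank.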