web-scraper by guia-matthieu/clawfu-skills
npx skills add https://github.com/guia-matthieu/clawfu-skills --skill web-scraper
Extract structured data from websites using BeautifulSoup and requests - turn any webpage into usable data.
| Claude Does | You Decide |
|---|---|
| Structures analysis frameworks | Strategic priorities |
| Synthesizes market data | Competitive positioning |
| Identifies opportunities | Resource allocation |
| Creates strategic options | Final strategy selection |
| Suggests implementation approaches | Execution decisions |
pip install beautifulsoup4 requests pandas click lxml
python scripts/main.py scrape https://example.com --selector "h1,h2,p"
python scripts/main.py scrape https://example.com --selector ".product-price"
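The two `scrape` commands above apply a CSS selector to a fetched page. The following is a minimal sketch of how that step could look, assuming the page is fetched with requests and queried with BeautifulSoup's `select`. The names `extract_text` and `scrape_url` are illustrative and are not taken from `scripts/main.py`.

```python
import requests
from bs4 import BeautifulSoup


def extract_text(html: str, selector: str) -> list[str]:
    """Return the stripped text of every element matching a CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    # select() accepts comma-separated selector groups such as "h1,h2,p"
    return [el.get_text(strip=True) for el in soup.select(selector)]


def scrape_url(url: str, selector: str, timeout: float = 10.0) -> list[str]:
    """Hypothetical helper: fetch a page, then extract the matching elements."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail on 4xx/5xx instead of parsing an error page
    return extract_text(response.text, selector)


# Offline sample so the sketch can be tried without a network call
SAMPLE_HTML = """
<html><body>
  <h1>Store</h1>
  <p>Welcome</p>
  <span class="product-price">$10</span>
</body></html>
"""

headings_and_text = extract_text(SAMPLE_HTML, "h1,h2,p")
prices = extract_text(SAMPLE_HTML, ".product-price")
```

Keeping the parsing step separate from the HTTP call makes the selector logic easy to try on saved HTML before pointing it at a live site.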
python scripts/main.py links https://example.com
python scripts/main.py links https://example.com --internal-only
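As a rough illustration of what the `links` command and its `--internal-only` flag might involve, the sketch below resolves each `href` against the page URL and optionally keeps only links on the same host. The function `extract_links` and the rule "internal means same host" are assumptions, not behavior confirmed by the source.

```python
from urllib.parse import urljoin, urlparse

from bs4 import BeautifulSoup


def extract_links(html: str, base_url: str, internal_only: bool = False) -> list[str]:
    """Return absolute, de-duplicated http(s) links in document order."""
    soup = BeautifulSoup(html, "html.parser")
    base_host = urlparse(base_url).netloc
    links: list[str] = []
    for anchor in soup.select("a[href]"):
        absolute = urljoin(base_url, anchor["href"])  # resolve relative paths
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, tel:, javascript: and similar
        if internal_only and parts.netloc != base_host:
            continue  # assumption: "internal" means same host as the base URL
        if absolute not in links:
            links.append(absolute)
    return links


SAMPLE_HTML = """
<a href="/about">About</a>
<a href="https://example.com/blog">Blog</a>
<a href="https://other.org/">Partner</a>
<a href="mailto:hi@example.com">Mail</a>
<a href="/about">About again</a>
"""

all_links = extract_links(SAMPLE_HTML, "https://example.com")
internal_links = extract_links(SAMPLE_HTML, "https://example.com", internal_only=True)
```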
python scripts/main.py emails https://example.com
python scripts/main.py emails https://example.com --depth 2
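The source does not say how `--depth` is interpreted. The sketch below assumes that depth 1 means only the start page and depth 2 also visits same-host pages linked from it. The regular expression is deliberately simple, and `find_emails`, `crawl_emails`, and the injected `fetch` callable are hypothetical names used for illustration.

```python
import re
from collections import deque
from typing import Callable
from urllib.parse import urljoin, urlparse

from bs4 import BeautifulSoup

# Simple pattern for illustration; it does not cover every valid address form.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(text: str) -> set[str]:
    """Return the lowercase email addresses found in a string."""
    return {match.lower() for match in EMAIL_RE.findall(text)}


def crawl_emails(start_url: str, fetch: Callable[[str], str], depth: int = 1) -> set[str]:
    """Breadth-first crawl of same-host pages, collecting emails.

    Assumption: depth=1 visits only start_url; depth=2 also visits its links.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 1)])
    emails: set[str] = set()
    while queue:
        url, level = queue.popleft()
        html = fetch(url)
        emails |= find_emails(html)
        if level >= depth:
            continue  # do not follow links beyond the requested depth
        for anchor in BeautifulSoup(html, "html.parser").select("a[href]"):
            target = urljoin(url, anchor["href"])
            if urlparse(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append((target, level + 1))
    return emails


# Offline stand-in for a real fetcher such as `lambda u: requests.get(u).text`
PAGES = {
    "https://example.com/": '<a href="/contact">Contact</a> info@example.com',
    "https://example.com/contact": '<a href="mailto:Sales@example.com">Sales</a>',
}


def fake_fetch(url: str) -> str:
    return PAGES.get(url, "")


shallow = crawl_emails("https://example.com/", fake_fetch, depth=1)
deeper = crawl_emails("https://example.com/", fake_fetch, depth=2)
```

Passing the fetcher in as a parameter keeps the crawl logic testable offline and leaves room for rate limiting in a real fetcher.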
python scripts/main.py structured https://example.com/article --schema article
python scripts/main.py structured https://example.com/product --schema product
python scripts/main.py scrape https://competitor.com/pricing --selector ".price,.plan-name"
# Output:
# Extracted 6 elements
# 1. Starter - $29/mo
# 2. Pro - $99/mo
# 3. Enterprise - Contact us
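The output above lists three plans for six extracted elements, which presumably means each plan name is paired with its price. One way to build such pairs is sketched below, assuming every `.plan-name` element has a matching `.price` element in the same document order. The function `pair_plans` and the sample markup are hypothetical.

```python
from bs4 import BeautifulSoup


def pair_plans(html: str) -> list[str]:
    """Pair plan names with prices by position in the document."""
    soup = BeautifulSoup(html, "html.parser")
    names = [el.get_text(strip=True) for el in soup.select(".plan-name")]
    prices = [el.get_text(strip=True) for el in soup.select(".price")]
    # zip() stops at the shorter list, so an unmatched name or price is dropped
    return [f"{name} - {price}" for name, price in zip(names, prices)]


SAMPLE_HTML = """
<div class="plan"><h3 class="plan-name">Starter</h3><span class="price">$29/mo</span></div>
<div class="plan"><h3 class="plan-name">Pro</h3><span class="price">$99/mo</span></div>
<div class="plan"><h3 class="plan-name">Enterprise</h3><span class="price">Contact us</span></div>
"""

plans = pair_plans(SAMPLE_HTML)
for index, plan in enumerate(plans, start=1):
    print(f"{index}. {plan}")
```

Positional pairing is fragile when a plan lacks a price element. Selecting each plan container first and reading both fields inside it would be more robust.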
python scripts/main.py structured https://blog.example.com/post --schema article
# Output: article_data.json
# {
# "title": "How to Scale Your Startup",
# "author": "Jane Doe",
# "date": "2024-01-15",
# "content": "...",
# "word_count": 1523
# }
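The fields in the JSON above could be gathered along the following lines. This is only a sketch: the source does not document how the `article` schema locates each field, so the selectors used here (`h1`, `meta[name="author"]`, `time[datetime]`, `article p`) and the function `extract_article` are assumptions.

```python
import json

from bs4 import BeautifulSoup


def extract_article(html: str) -> dict:
    """Build a dict with the fields shown in the example output."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.select_one("h1")
    author = soup.select_one('meta[name="author"]')
    date = soup.select_one("time[datetime]")
    paragraphs = [p.get_text(" ", strip=True) for p in soup.select("article p")]
    content = "\n\n".join(paragraphs)
    return {
        "title": title.get_text(strip=True) if title else None,
        "author": author.get("content") if author else None,
        "date": date.get("datetime") if date else None,
        "content": content,
        "word_count": len(content.split()),  # whitespace-separated tokens
    }


SAMPLE_HTML = """
<html><head><meta name="author" content="Jane Doe"></head>
<body><article>
  <h1>How to Scale Your Startup</h1>
  <time datetime="2024-01-15">Jan 15, 2024</time>
  <p>Hire slowly.</p>
  <p>Measure what matters.</p>
</article></body></html>
"""

article = extract_article(SAMPLE_HTML)
article_json = json.dumps(article, indent=2)  # text that could be written to article_data.json
```

Missing elements yield `None` rather than raising, so a page that lacks an author or date still produces a usable record.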
| Selector | Description | Example |
|---|---|---|
| tag | Element type | h1, p, div |
| .class | Class name | .price, .title |
| #id | Element ID | #main-content |
| tag.class | Tag with class | div.product |
| tag[attr] | Has attribute | a[href] |
| parent > child | Direct child | ul > li |
| tag1, tag2 | Multiple | h1, h2, h3 |

Mode: centaur
category: automation
subcategory: data-extraction
dependencies: [beautifulsoup4, requests, pandas]
difficulty: intermediate
time_saved: 5+ hours/week

Weekly Installs: 103
Repository: guia-matthieu/clawfu-skills
GitHub Stars: 44
First Seen: Feb 13, 2026
Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Warn)
Installed on: gemini-cli (102), opencode (102), codex (101), github-copilot (100), cursor (100), kimi-cli (99)