firecrawl-scraping by casper-studios/casper-marketplace
npx skills add https://github.com/casper-studios/casper-marketplace --skill firecrawl-scraping
Scrape individual web pages and convert them to clean, LLM-ready markdown. Handles JavaScript rendering, anti-bot protection, and dynamic content.
What are you scraping?
│
├── Single page (article, blog, docs)
│   └── references/single-page.md
│       └── Script: scripts/firecrawl_scrape.py
│
└── Entire website (multiple pages, crawling)
    └── references/website-crawler.md
        └── (Use Apify Website Content Crawler for multi-page)
# Required in .env
FIRECRAWL_API_KEY=fc-your-api-key-here
Get your API key: https://firecrawl.dev/app/api-keys
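As a minimal sketch of how a script might pick up the key at startup, assuming a plain `KEY=value` `.env` file in the working directory: the helper names `parse_env` and `load_api_key` are hypothetical, and the actual `scripts/firecrawl_scrape.py` may load its configuration differently.

```python
import os
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_api_key(env_path: str = ".env") -> str:
    """Return FIRECRAWL_API_KEY from the environment, falling back to .env."""
    key = os.environ.get("FIRECRAWL_API_KEY")
    if not key:
        path = Path(env_path)
        if path.is_file():
            key = parse_env(path.read_text(encoding="utf-8")).get("FIRECRAWL_API_KEY")
    if not key:
        raise RuntimeError("FIRECRAWL_API_KEY is not set; add it to .env")
    return key
```

Failing early with a clear message when the key is missing tends to be easier to debug than an authentication error returned later by the API.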
python scripts/firecrawl_scrape.py "https://example.com/article"
python scripts/firecrawl_scrape.py "https://wsj.com/article" \
--proxy stealth \
--format markdown summary \
--timeout 60000
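The two invocations above can also be assembled programmatically. The following is a sketch that assumes only the flags shown in this document (`--proxy`, `--format`, `--timeout`); `build_scrape_command` is a hypothetical helper, not part of the skill.

```python
import shlex
import sys


def build_scrape_command(url, proxy=None, formats=None, timeout_ms=None,
                         script="scripts/firecrawl_scrape.py"):
    """Build the argv list for the scrape script from optional settings."""
    cmd = [sys.executable, script, url]
    if proxy is not None:
        cmd += ["--proxy", proxy]
    if formats:
        # --format takes one or more space-separated values, as in the example above.
        cmd += ["--format", *formats]
    if timeout_ms is not None:
        cmd += ["--timeout", str(timeout_ms)]
    return cmd


def describe(cmd):
    """Return a shell-quoted string for logging the command before running it."""
    return shlex.join(cmd)
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would run the script without going through a shell, which avoids quoting problems with URLs that contain `&` or `?`.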
| Mode | Use Case |
|---|---|
| `basic` | Standard sites, fastest |
| `stealth` | Anti-bot protection, premium content (WSJ, NYT) |
| `auto` | Let Firecrawl decide (recommended) |
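When explicit control over the proxy mode is wanted instead of `auto`, one option is to start with the fastest mode and escalate only when a request is blocked. This is a sketch under stated assumptions: `scrape` is a hypothetical callable that returns a status code and the page content, and treating a non-200 status or empty content as the signal to escalate is an illustrative policy, not documented Firecrawl behavior.

```python
from typing import Callable, Optional, Tuple

# Hypothetical signature: scrape(url, proxy_mode) -> (status_code, content)
ScrapeFn = Callable[[str, str], Tuple[int, str]]


def scrape_with_fallback(url: str, scrape: ScrapeFn,
                         modes=("basic", "stealth")) -> Tuple[Optional[str], str]:
    """Try each proxy mode in order; return (mode, content) for the first success."""
    for mode in modes:
        status, content = scrape(url, mode)
        if status == 200 and content.strip():
            return mode, content
    # Every mode failed or returned empty content.
    return None, ""
```

Trying `basic` first keeps the common case fast and reserves `stealth`, which may use additional credits, for pages that need it.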
- `markdown` - Clean markdown content (default)
- `html` - Raw HTML
- `summary` - AI-generated summary
- `screenshot` - Page screenshot
- `links` - All links on page

~1 credit per page. Stealth proxy may use additional credits.
- Store `FIRECRAWL_API_KEY` in the `.env` file (never commit it to git)
- … `.tmp/` directory

Symptoms: API returns "insufficient credits" or a quota exceeded error
Cause: Account credits depleted
Solution:
- Use `basic` proxy mode to conserve credits

Symptoms: Empty content or partial HTML returned
Cause: JavaScript-heavy page not fully loading
Solution:
- Enable JavaScript rendering with the `--js-render` flag
- Increase the timeout with `--timeout 60000` (60 seconds)
- Use `stealth` proxy mode for protected sites
- Wait for a specific element with a `--wait-for` selector

Symptoms: Script returns 403 status code
Cause: Site blocking automated access
Solution:
- Use `stealth` proxy mode

Symptoms: Scrape succeeds but markdown is empty or malformed
Cause: Dynamic content loaded after page load, or unusual page structure
Solution:
- Use `--wait-for` to wait for specific content
- Use `html` format to see raw content

Symptoms: Request times out before completion
Cause: Slow page load or large page content
Solution:
- Use `basic` proxy for faster response

Skills: firecrawl-scraping → parallel-research
Use case: Scrape competitor pages, then analyze content strategy
Flow:

Skills: firecrawl-scraping → content-generation
Use case: Create summary documents from web research
Flow:

Skills: firecrawl-scraping → attio-crm
Use case: Enrich company records with website data
Flow:
Weekly Installs: 122
Repository: casper-studios/casper-marketplace
GitHub Stars: 9
First Seen: Feb 24, 2026
Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Warn)
Installed on: github-copilot (121), codex (121), kimi-cli (121), gemini-cli (121), cursor (121), amp (121)