just-scrape by scrapegraphai/just-scrape
npx skills add https://github.com/scrapegraphai/just-scrape --skill just-scrape
AI-powered web scraping CLI by ScrapeGraph AI. Get an API key at dashboard.scrapegraphai.com.
Always install or run the @latest version to ensure you have the most recent features and fixes.
npm install -g just-scrape@latest # npm
pnpm add -g just-scrape@latest # pnpm
yarn global add just-scrape@latest # yarn
bun add -g just-scrape@latest # bun
npx just-scrape@latest --help # run without installing
bunx just-scrape@latest --help # run without installing (bun)
export SGAI_API_KEY="sgai-..."
API key resolution order: SGAI_API_KEY env var → .env file → ~/.scrapegraphai/config.json → interactive prompt (saves to config).
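The resolution order can be sketched as a small POSIX shell function. This is illustrative only: the CLI implements the lookup internally, and the `apiKey` field name inside `config.json` is an assumption, not a documented contract.

```shell
#!/bin/sh
# Illustrative sketch of the documented key resolution order;
# not the CLI's actual implementation.
resolve_sgai_key() {
  # 1. Environment variable wins.
  if [ -n "$SGAI_API_KEY" ]; then
    printf '%s\n' "$SGAI_API_KEY"
    return 0
  fi
  # 2. .env file in the current directory.
  if [ -f .env ]; then
    key=$(sed -n 's/^SGAI_API_KEY=//p' .env | head -n 1)
    if [ -n "$key" ]; then
      printf '%s\n' "$key"
      return 0
    fi
  fi
  # 3. Saved config; "apiKey" is an assumed field name, and jq is required.
  if [ -f "$HOME/.scrapegraphai/config.json" ]; then
    key=$(jq -r '.apiKey // empty' "$HOME/.scrapegraphai/config.json")
    if [ -n "$key" ]; then
      printf '%s\n' "$key"
      return 0
    fi
  fi
  # 4. The real CLI would fall through to an interactive prompt here.
  return 1
}
```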
| Need | Command |
|---|---|
| Extract structured data from a known URL | smart-scraper |
| Search the web and extract from results | search-scraper |
| Convert a page to clean markdown | markdownify |
| Crawl multiple pages from a site | crawl |
| Get raw HTML | scrape |
| Automate browser actions (login, click, fill) | agentic-scraper |
| Generate a JSON schema from description | generate-schema |
| Get all URLs from a sitemap | sitemap |
| Check credit balance | credits |
| Browse past requests | history |
| Validate API key | validate |
All commands support --json for machine-readable output (suppresses banner, spinners, prompts).
Scraping commands share these optional flags:
--stealth — bypass anti-bot detection (+4 credits)
--headers <json> — custom HTTP headers as JSON string
--schema <json> — enforce output JSON schema
Extract structured data from any URL using AI.
just-scrape smart-scraper <url> -p <prompt>
just-scrape smart-scraper <url> -p <prompt> --schema <json>
just-scrape smart-scraper <url> -p <prompt> --scrolls <n> # infinite scroll (0-100)
just-scrape smart-scraper <url> -p <prompt> --pages <n> # multi-page (1-100)
just-scrape smart-scraper <url> -p <prompt> --stealth # anti-bot (+4 credits)
just-scrape smart-scraper <url> -p <prompt> --cookies <json> --headers <json>
just-scrape smart-scraper <url> -p <prompt> --plain-text
# E-commerce extraction
just-scrape smart-scraper https://store.example.com/shoes -p "Extract all product names, prices, and ratings"
# Strict schema + scrolling
just-scrape smart-scraper https://news.example.com -p "Get headlines and dates" \
--schema '{"type":"object","properties":{"articles":{"type":"array","items":{"type":"object","properties":{"title":{"type":"string"},"date":{"type":"string"}}}}}}' \
--scrolls 5
# JS-heavy SPA behind anti-bot
just-scrape smart-scraper https://app.example.com/dashboard -p "Extract user stats" \
--stealth
Search the web and extract structured data from results.
just-scrape search-scraper <prompt>
just-scrape search-scraper <prompt> --num-results <n> # sources to scrape (3-20, default 3)
just-scrape search-scraper <prompt> --no-extraction # markdown only (2 credits vs 10)
just-scrape search-scraper <prompt> --schema <json>
just-scrape search-scraper <prompt> --stealth --headers <json>
# Research across sources
just-scrape search-scraper "Best Python web frameworks in 2025" --num-results 10
# Cheap markdown-only
just-scrape search-scraper "React vs Vue comparison" --no-extraction --num-results 5
# Structured output
just-scrape search-scraper "Top 5 cloud providers pricing" \
--schema '{"type":"object","properties":{"providers":{"type":"array","items":{"type":"object","properties":{"name":{"type":"string"},"free_tier":{"type":"string"}}}}}}'
Convert any webpage to clean markdown.
just-scrape markdownify <url>
just-scrape markdownify <url> --stealth # +4 credits
just-scrape markdownify <url> --headers <json>
just-scrape markdownify https://blog.example.com/my-article
just-scrape markdownify https://protected.example.com --stealth
just-scrape markdownify https://docs.example.com/api --json | jq -r '.result' > api-docs.md
Crawl multiple pages and extract data from each.
just-scrape crawl <url> -p <prompt>
just-scrape crawl <url> -p <prompt> --max-pages <n> # default 10
just-scrape crawl <url> -p <prompt> --depth <n> # default 1
just-scrape crawl <url> --no-extraction --max-pages <n> # markdown only (2 credits/page)
just-scrape crawl <url> -p <prompt> --schema <json>
just-scrape crawl <url> -p <prompt> --rules <json> # include_paths, same_domain
just-scrape crawl <url> -p <prompt> --no-sitemap
just-scrape crawl <url> -p <prompt> --stealth
# Crawl docs site
just-scrape crawl https://docs.example.com -p "Extract all code snippets" --max-pages 20 --depth 3
# Filter to blog pages only
just-scrape crawl https://example.com -p "Extract article titles" \
--rules '{"include_paths":["/blog/*"],"same_domain":true}' --max-pages 50
# Raw markdown, no AI extraction (cheaper)
just-scrape crawl https://example.com --no-extraction --max-pages 10
Get raw HTML content from a URL.
just-scrape scrape <url>
just-scrape scrape <url> --stealth # +4 credits
just-scrape scrape <url> --branding # extract logos/colors/fonts (+2 credits)
just-scrape scrape <url> --country-code <iso>
just-scrape scrape https://example.com
just-scrape scrape https://store.example.com --stealth --country-code DE
just-scrape scrape https://example.com --branding
Browser automation with AI — login, click, navigate, fill forms. Steps are comma-separated strings.
just-scrape agentic-scraper <url> -s <steps>
just-scrape agentic-scraper <url> -s <steps> --ai-extraction -p <prompt>
just-scrape agentic-scraper <url> -s <steps> --schema <json>
just-scrape agentic-scraper <url> -s <steps> --use-session # persist browser session
# Login + extract dashboard
just-scrape agentic-scraper https://app.example.com/login \
-s "Fill email with user@test.com,Fill password with secret,Click Sign In" \
--ai-extraction -p "Extract all dashboard metrics"
# Multi-step form
just-scrape agentic-scraper https://example.com/wizard \
-s "Click Next,Select Premium plan,Fill name with John,Click Submit"
# Persistent session across runs
just-scrape agentic-scraper https://app.example.com \
-s "Click Settings" --use-session
Generate a JSON schema from a natural language description.
just-scrape generate-schema <prompt>
just-scrape generate-schema <prompt> --existing-schema <json>
just-scrape generate-schema "E-commerce product with name, price, ratings, and reviews array"
# Refine an existing schema
just-scrape generate-schema "Add an availability field" \
--existing-schema '{"type":"object","properties":{"name":{"type":"string"},"price":{"type":"number"}}}'
Get all URLs from a website's sitemap.
just-scrape sitemap <url>
just-scrape sitemap https://example.com --json | jq -r '.urls[]'
Browse request history. Interactive by default (arrow keys to navigate, select to view details).
just-scrape history <service> # interactive browser
just-scrape history <service> <request-id> # specific request
just-scrape history <service> --page <n>
just-scrape history <service> --page-size <n> # max 100
just-scrape history <service> --json
Services: markdownify, smartscraper, searchscraper, scrape, crawl, agentic-scraper, sitemap
just-scrape history smartscraper
just-scrape history crawl --json --page-size 100 | jq '.requests[] | {id: .request_id, status}'
just-scrape credits
just-scrape credits --json | jq '.remaining_credits'
just-scrape validate
just-scrape generate-schema "Product with name, price, and reviews" --json | jq '.schema' > schema.json
just-scrape smart-scraper https://store.example.com -p "Extract products" --schema "$(cat schema.json)"
just-scrape sitemap https://example.com --json | jq -r '.urls[]' | while IFS= read -r url; do
just-scrape smart-scraper "$url" -p "Extract title" --json >> results.jsonl
done
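When looping over many URLs as above, a small retry wrapper can make batch runs more robust. This is an illustrative sketch, not part of the CLI: the retry count and delay are arbitrary, and it assumes `just-scrape` exits non-zero on failure.

```shell
#!/bin/sh
# Retry a command up to N times with a fixed delay between attempts.
# Usage: retry 3 2 just-scrape smart-scraper "$url" -p "Extract title" --json
retry() {
  attempts=$1; delay=$2; shift 2
  i=1
  while :; do
    "$@" && return 0                       # success: stop retrying
    [ "$i" -ge "$attempts" ] && return 1   # out of attempts: give up
    i=$((i + 1))
    sleep "$delay"
  done
}
```

With this in place, the loop body becomes `retry 3 2 just-scrape smart-scraper "$url" -p "Extract title" --json >> results.jsonl`, skipping to the next URL after three failures instead of aborting the whole run.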
# JS-heavy SPA behind Cloudflare
just-scrape smart-scraper https://protected.example.com -p "Extract data" --stealth
# With custom cookies/headers
just-scrape smart-scraper https://example.com -p "Extract data" \
--cookies '{"session":"abc123"}' --headers '{"Authorization":"Bearer token"}'
| Feature | Extra Credits |
|---|---|
| --stealth | +4 per request |
| --branding (scrape only) | +2 |
| search-scraper extraction | 10 per request |
| search-scraper --no-extraction | 2 per request |
| crawl --no-extraction | 2 per page |
SGAI_API_KEY=sgai-... # API key
JUST_SCRAPE_TIMEOUT_S=300 # Request timeout in seconds (default 120)
JUST_SCRAPE_DEBUG=1 # Debug logging to stderr
Weekly Installs: 171
GitHub Stars: 11
First Seen: Feb 13, 2026
Security Audits: Gen Agent Trust Hub: Pass · Socket: Pass · Snyk: Fail
Installed on: cursor (78), claude-code (71), codex (59), opencode (51), gemini-cli (35), github-copilot (34)