scrapeninja by vm0-ai/vm0-skills
npx skills add https://github.com/vm0-ai/vm0-skills --skill scrapeninja
High-performance web scraping API with Chrome TLS fingerprint, rotating proxies, smart retries, and optional JavaScript rendering.
Official docs: https://scrapeninja.net/docs/
Use this skill when you need to:
- Fast non-JS scraping of static pages (/scrape endpoint)
- Rendering JavaScript-heavy sites in a full browser (/scrape-js endpoint)

Set the environment variable:
# For RapidAPI
export SCRAPENINJA_TOKEN="your-rapidapi-key"
# For APIRoad (use X-Apiroad-Key header instead)
export SCRAPENINJA_TOKEN="your-apiroad-key"
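As a quick local sanity check (a sketch, not part of the API), you can verify the token is set before making any calls; the placeholder value is for illustration only:

```shell
# Guard against an empty token before calling the API.
# The placeholder value below is for illustration only.
export SCRAPENINJA_TOKEN="your-rapidapi-key"
if [ -z "$SCRAPENINJA_TOKEN" ]; then
  echo "SCRAPENINJA_TOKEN is not set" >&2
  exit 1
fi
echo "token is set"
```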
High-performance scraping with Chrome TLS fingerprint, no JavaScript:
Write to /tmp/scrapeninja_request.json:
{
"url": "https://example.com"
}
Then run:
curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: $(printenv SCRAPENINJA_TOKEN)" -d @/tmp/scrapeninja_request.json | jq '{status: .info.statusCode, url: .info.finalUrl, bodyLength: (.body | length)}'
With custom headers and retries:
Write to /tmp/scrapeninja_request.json:
{
"url": "https://example.com",
"headers": ["Accept-Language: en-US"],
"retryNum": 3,
"timeout": 15
}
Then run:
curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: $(printenv SCRAPENINJA_TOKEN)" -d @/tmp/scrapeninja_request.json
For JavaScript-heavy sites (React, Vue, etc.):
Write to /tmp/scrapeninja_request.json:
{
"url": "https://example.com",
"waitForSelector": "h1",
"timeout": 20
}
Then run:
curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape-js" --header "Content-Type: application/json" --header "X-RapidAPI-Key: $(printenv SCRAPENINJA_TOKEN)" -d @/tmp/scrapeninja_request.json | jq '{status: .info.statusCode, bodyLength: (.body | length)}'
With screenshot:
Write to /tmp/scrapeninja_request.json:
{
"url": "https://example.com",
"screenshot": true
}
Then run:
# Extract the screenshot field (.info.screenshot) from the response
curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape-js" --header "Content-Type: application/json" --header "X-RapidAPI-Key: $(printenv SCRAPENINJA_TOKEN)" -d @/tmp/scrapeninja_request.json | jq -r '.info.screenshot'
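If your plan returns the screenshot as a base64-encoded PNG (as the response format suggests), it can be decoded with jq and base64. The stub response below stands in for the real curl output:

```shell
# Decode a base64 screenshot field to a PNG file. The stub response
# stands in for real curl output; "aGVsbG8=" is base64 for "hello".
echo '{"info": {"screenshot": "aGVsbG8="}}' > /tmp/scrapeninja_response.json
jq -r '.info.screenshot' /tmp/scrapeninja_response.json | base64 -d > /tmp/screenshot.png
cat /tmp/screenshot.png
```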
Use proxies from specific regions:
Write to /tmp/scrapeninja_request.json:
{
"url": "https://example.com",
"geo": "eu"
}
Then run:
curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: $(printenv SCRAPENINJA_TOKEN)" -d @/tmp/scrapeninja_request.json | jq .info
Available geos: us, eu, br (Brazil), fr (France), de (Germany), 4g-eu
Retry on specific HTTP status codes or text patterns:
Write to /tmp/scrapeninja_request.json:
{
"url": "https://example.com",
"retryNum": 3,
"statusNotExpected": [403, 429, 503],
"textNotExpected": ["captcha", "Access Denied"]
}
Then run:
curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: $(printenv SCRAPENINJA_TOKEN)" -d @/tmp/scrapeninja_request.json
Extract structured JSON using Cheerio extractor functions:
Write to /tmp/scrapeninja_request.json:
{
"url": "https://news.ycombinator.com",
"extractor": "function(input, cheerio) { let $ = cheerio.load(input); return $(\".titleline > a\").slice(0,5).map((i,el) => ({title: $(el).text(), url: $(el).attr(\"href\")})).get(); }"
}
Then run:
curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: $(printenv SCRAPENINJA_TOKEN)" -d @/tmp/scrapeninja_request.json | jq '.extractor'
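Hand-escaping the quotes inside the extractor string is error-prone. As an alternative sketch, jq can embed it safely (same selector as the example above):

```shell
# Build the request JSON with jq so the extractor's quotes are escaped
# automatically instead of by hand.
EXTRACTOR='function(input, cheerio) { let $ = cheerio.load(input); return $(".titleline > a").slice(0,5).map((i,el) => ({title: $(el).text(), url: $(el).attr("href")})).get(); }'
jq -n --arg url "https://news.ycombinator.com" --arg fn "$EXTRACTOR" \
  '{url: $url, extractor: $fn}' > /tmp/scrapeninja_request.json
jq -r '.url' /tmp/scrapeninja_request.json
```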
Capture XHR/fetch responses:
Write to /tmp/scrapeninja_request.json:
{
"url": "https://example.com",
"catchAjaxHeadersUrlMask": "api/data"
}
Then run:
curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape-js" --header "Content-Type: application/json" --header "X-RapidAPI-Key: $(printenv SCRAPENINJA_TOKEN)" -d @/tmp/scrapeninja_request.json | jq '.info.catchedAjax'
Speed up JS rendering by blocking images and media:
Write to /tmp/scrapeninja_request.json:
{
"url": "https://example.com",
"blockImages": true,
"blockMedia": true
}
Then run:
curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape-js" --header "Content-Type: application/json" --header "X-RapidAPI-Key: $(printenv SCRAPENINJA_TOKEN)" -d @/tmp/scrapeninja_request.json
| Endpoint | Description |
|---|---|
| /scrape | Fast non-JS scraping with Chrome TLS fingerprint |
| /scrape-js | Full Chrome browser with JS rendering |
| /v2/scrape-js | Enhanced JS rendering for protected sites (APIRoad only) |
Common parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| url | string | required | URL to scrape |
| headers | string[] | - | Custom HTTP headers |
| retryNum | int | 1 | Number of retry attempts |
| geo | string | us | Proxy geo: us, eu, br, fr, de, 4g-eu |
| proxy | string | - | Custom proxy URL (overrides geo) |
| timeout | int | 10/16 | Timeout per attempt in seconds |
| textNotExpected | string[] | - | Text patterns that trigger retry |
| statusNotExpected | int[] | [403, 502] | HTTP status codes that trigger retry |
| extractor | string | - | Cheerio extractor function |

JS rendering parameters (/scrape-js, /v2/scrape-js):

| Parameter | Type | Default | Description |
|---|---|---|---|
| waitForSelector | string | - | CSS selector to wait for |
| postWaitTime | int | - | Extra wait time after load (1-12s) |
| screenshot | bool | true | Take page screenshot |
| blockImages | bool | false | Block image loading |
| blockMedia | bool | false | Block CSS/fonts loading |
| catchAjaxHeadersUrlMask | string | - | URL pattern to intercept AJAX |
| viewport | object | 1920x1080 | Custom viewport size |
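To illustrate combining several JS-rendering parameters, here is a sketch that builds a request body with a custom viewport and a short post-load wait. The `{width, height}` shape of the viewport object is an assumption; check the official docs before relying on it:

```shell
# Build a /scrape-js request body combining several JS-rendering knobs.
# The viewport object shape ({width, height}) is assumed, not confirmed.
jq -n '{url: "https://example.com",
        viewport: {width: 1280, height: 800},
        postWaitTime: 3,
        blockImages: true,
        screenshot: false}' > /tmp/scrapeninja_request.json
jq '.viewport.width' /tmp/scrapeninja_request.json
```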
{
"info": {
"statusCode": 200,
"finalUrl": "https://example.com",
"headers": ["content-type: text/html"],
"screenshot": "base64-encoded-png",
"catchedAjax": {
"url": "https://example.com/api/data",
"method": "GET",
"body": "...",
"status": 200
}
},
"body": "<html>...</html>",
"extractor": { "extracted": "data" }
}
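A saved response in this shape can be split apart with jq — for example, extracting the HTML body to a file and checking the status code. A stub response is shown; a real run would save the curl output instead:

```shell
# Split a saved ScrapeNinja response into its parts with jq.
cat > /tmp/scrapeninja_response.json <<'EOF'
{"info": {"statusCode": 200, "finalUrl": "https://example.com"},
 "body": "<html>ok</html>"}
EOF
jq -r '.body' /tmp/scrapeninja_response.json > /tmp/page.html
jq '.info.statusCode' /tmp/scrapeninja_response.json
```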
Best practices:
- Start with /scrape: use the fast non-JS endpoint first, and switch to /scrape-js only if needed
- Set retryNum to 2-3 for unreliable sites
- Use geo: eu for European sites and us for American sites
- For protected sites, use /v2/scrape-js via APIRoad
- Set screenshot: false to speed up JS rendering

Weekly Installs
98
Repository
GitHub Stars
52
First Seen
Jan 24, 2026
Security Audits
Gen Agent Trust Hub: Pass
Socket: Pass
Snyk: Warn
Installed on
opencode: 86
gemini-cli: 85
codex: 81
cursor: 80
github-copilot: 77
amp: 75