scrapeninja by vm0-ai/vm0-skills
npx skills add https://github.com/vm0-ai/vm0-skills --skill scrapeninja
High-performance web scraping API with Chrome TLS fingerprint, rotating proxies, smart retries, and optional JavaScript rendering.
Official docs: https://scrapeninja.net/docs/
Use this skill when you need to:
- Fast non-JS scraping of static pages (/scrape endpoint)
- Rendering JavaScript-heavy sites in a full browser (/scrape-js endpoint)

Set the environment variable:
# For RapidAPI
export SCRAPENINJA_TOKEN="your-rapidapi-key"
# For APIRoad (use X-Apiroad-Key header instead)
export SCRAPENINJA_TOKEN="your-apiroad-key"
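As a quick local sanity check (a sketch, not part of the API), you can verify the token is set before making any calls; the placeholder value is for illustration only:

```shell
# Guard against an empty token before calling the API.
# The placeholder value below is for illustration only.
export SCRAPENINJA_TOKEN="your-rapidapi-key"
if [ -z "$SCRAPENINJA_TOKEN" ]; then
  echo "SCRAPENINJA_TOKEN is not set" >&2
  exit 1
fi
echo "token is set"
```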
High-performance scraping with Chrome TLS fingerprint, no JavaScript:
Write to /tmp/scrapeninja_request.json:
{
"url": "https://example.com"
}
Then run:
curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: $(printenv SCRAPENINJA_TOKEN)" -d @/tmp/scrapeninja_request.json | jq '{status: .info.statusCode, url: .info.finalUrl, bodyLength: (.body | length)}'
With custom headers and retries:
Write to /tmp/scrapeninja_request.json:
{
"url": "https://example.com",
"headers": ["Accept-Language: en-US"],
"retryNum": 3,
"timeout": 15
}
Then run:
curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: $(printenv SCRAPENINJA_TOKEN)" -d @/tmp/scrapeninja_request.json
For JavaScript-heavy sites (React, Vue, etc.):
Write to /tmp/scrapeninja_request.json:
{
"url": "https://example.com",
"waitForSelector": "h1",
"timeout": 20
}
Then run:
curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape-js" --header "Content-Type: application/json" --header "X-RapidAPI-Key: $(printenv SCRAPENINJA_TOKEN)" -d @/tmp/scrapeninja_request.json | jq '{status: .info.statusCode, bodyLength: (.body | length)}'
With screenshot:
Write to /tmp/scrapeninja_request.json:
{
"url": "https://example.com",
"screenshot": true
}
Then run:
# Extract the screenshot field (.info.screenshot) from the response
curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape-js" --header "Content-Type: application/json" --header "X-RapidAPI-Key: $(printenv SCRAPENINJA_TOKEN)" -d @/tmp/scrapeninja_request.json | jq -r '.info.screenshot'
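If your plan returns the screenshot as a base64-encoded PNG (as the response format suggests), it can be decoded with jq and base64. The stub response below stands in for the real curl output:

```shell
# Decode a base64 screenshot field to a PNG file. The stub response
# stands in for real curl output; "aGVsbG8=" is base64 for "hello".
echo '{"info": {"screenshot": "aGVsbG8="}}' > /tmp/scrapeninja_response.json
jq -r '.info.screenshot' /tmp/scrapeninja_response.json | base64 -d > /tmp/screenshot.png
cat /tmp/screenshot.png
```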
Use proxies from specific regions:
Write to /tmp/scrapeninja_request.json:
{
"url": "https://example.com",
"geo": "eu"
}
Then run:
curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: $(printenv SCRAPENINJA_TOKEN)" -d @/tmp/scrapeninja_request.json | jq .info
Available geos: us, eu, br (Brazil), fr (France), de (Germany), 4g-eu
Retry on specific HTTP status codes or text patterns:
Write to /tmp/scrapeninja_request.json:
{
"url": "https://example.com",
"retryNum": 3,
"statusNotExpected": [403, 429, 503],
"textNotExpected": ["captcha", "Access Denied"]
}
Then run:
curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: $(printenv SCRAPENINJA_TOKEN)" -d @/tmp/scrapeninja_request.json
Extract structured JSON using Cheerio extractor functions:
Write to /tmp/scrapeninja_request.json:
{
"url": "https://news.ycombinator.com",
"extractor": "function(input, cheerio) { let $ = cheerio.load(input); return $(\".titleline > a\").slice(0,5).map((i,el) => ({title: $(el).text(), url: $(el).attr(\"href\")})).get(); }"
}
Then run:
curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: $(printenv SCRAPENINJA_TOKEN)" -d @/tmp/scrapeninja_request.json | jq '.extractor'
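Hand-escaping the quotes inside the extractor string is error-prone. As an alternative sketch, jq can embed it safely (same selector as the example above):

```shell
# Build the request JSON with jq so the extractor's quotes are escaped
# automatically instead of by hand.
EXTRACTOR='function(input, cheerio) { let $ = cheerio.load(input); return $(".titleline > a").slice(0,5).map((i,el) => ({title: $(el).text(), url: $(el).attr("href")})).get(); }'
jq -n --arg url "https://news.ycombinator.com" --arg fn "$EXTRACTOR" \
  '{url: $url, extractor: $fn}' > /tmp/scrapeninja_request.json
jq -r '.url' /tmp/scrapeninja_request.json
```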
Capture XHR/fetch responses:
Write to /tmp/scrapeninja_request.json:
{
"url": "https://example.com",
"catchAjaxHeadersUrlMask": "api/data"
}
Then run:
curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape-js" --header "Content-Type: application/json" --header "X-RapidAPI-Key: $(printenv SCRAPENINJA_TOKEN)" -d @/tmp/scrapeninja_request.json | jq '.info.catchedAjax'
Speed up JS rendering by blocking images and media:
Write to /tmp/scrapeninja_request.json:
{
"url": "https://example.com",
"blockImages": true,
"blockMedia": true
}
Then run:
curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape-js" --header "Content-Type: application/json" --header "X-RapidAPI-Key: $(printenv SCRAPENINJA_TOKEN)" -d @/tmp/scrapeninja_request.json
| Endpoint | Description |
|---|---|
| /scrape | Fast non-JS scraping with Chrome TLS fingerprint |
| /scrape-js | Full Chrome browser with JS rendering |
| /v2/scrape-js | Enhanced JS rendering for protected sites (APIRoad only) |
Common parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| url | string | required | URL to scrape |
| headers | string[] | - | Custom HTTP headers |
| retryNum | int | 1 | Number of retry attempts |
| geo | string | us | Proxy geo: us, eu, br, fr, de, 4g-eu |
| proxy | string | - | Custom proxy URL (overrides geo) |
| timeout | int | 10/16 | Timeout per attempt in seconds |
| textNotExpected | string[] | - | Text patterns that trigger retry |
| statusNotExpected | int[] | [403, 502] | HTTP status codes that trigger retry |
| extractor | string | - | Cheerio extractor function |

JS rendering parameters (/scrape-js, /v2/scrape-js):

| Parameter | Type | Default | Description |
|---|---|---|---|
| waitForSelector | string | - | CSS selector to wait for |
| postWaitTime | int | - | Extra wait time after load (1-12s) |
| screenshot | bool | true | Take page screenshot |
| blockImages | bool | false | Block image loading |
| blockMedia | bool | false | Block CSS/fonts loading |
| catchAjaxHeadersUrlMask | string | - | URL pattern to intercept AJAX |
| viewport | object | 1920x1080 | Custom viewport size |
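To illustrate combining several JS-rendering parameters, here is a sketch that builds a request body with a custom viewport and a short post-load wait. The `{width, height}` shape of the viewport object is an assumption; check the official docs before relying on it:

```shell
# Build a /scrape-js request body combining several JS-rendering knobs.
# The viewport object shape ({width, height}) is assumed, not confirmed.
jq -n '{url: "https://example.com",
        viewport: {width: 1280, height: 800},
        postWaitTime: 3,
        blockImages: true,
        screenshot: false}' > /tmp/scrapeninja_request.json
jq '.viewport.width' /tmp/scrapeninja_request.json
```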
{
"info": {
"statusCode": 200,
"finalUrl": "https://example.com",
"headers": ["content-type: text/html"],
"screenshot": "base64-encoded-png",
"catchedAjax": {
"url": "https://example.com/api/data",
"method": "GET",
"body": "...",
"status": 200
}
},
"body": "<html>...</html>",
"extractor": { "extracted": "data" }
}
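A saved response in this shape can be split apart with jq — for example, extracting the HTML body to a file and checking the status code. A stub response is shown; a real run would save the curl output instead:

```shell
# Split a saved ScrapeNinja response into its parts with jq.
cat > /tmp/scrapeninja_response.json <<'EOF'
{"info": {"statusCode": 200, "finalUrl": "https://example.com"},
 "body": "<html>ok</html>"}
EOF
jq -r '.body' /tmp/scrapeninja_response.json > /tmp/page.html
jq '.info.statusCode' /tmp/scrapeninja_response.json
```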
Best practices:
- Start with /scrape: use the fast non-JS endpoint first, and switch to /scrape-js only if needed
- Set retryNum to 2-3 for unreliable sites
- Use geo: eu for European sites and us for American sites
- For protected sites, use /v2/scrape-js via APIRoad
- Set screenshot: false to speed up JS rendering

Weekly Installs
98
Repository
GitHub Stars
52
First Seen
Jan 24, 2026
Security Audits
Gen Agent Trust Hub: Pass
Socket: Pass
Snyk: Warn
Installed on
opencode: 86
gemini-cli: 85
codex: 81
cursor: 80
github-copilot: 77
amp: 75