actionbook-scraper by actionbook/actionbook
Install:

```bash
npx skills add https://github.com/actionbook/actionbook --skill actionbook-scraper
```
Every generated script MUST pass BOTH checks:
| Check | What to Verify | Failure Example |
|---|---|---|
| Part 1: Script Runs | No errors, no timeouts | Selector not found |
| Part 2: Data Correct | Content matches expected | Extracted "Click to expand" instead of name |
```
┌─────────────────────────────────────────────────────┐
│ 1. Generate Script │
│ ↓ │
│ 2. Execute Script │
│ ↓ │
│ 3. Check Part 1: Script runs without errors? │
│ ↓ │
│ 4. Check Part 2: Data content is correct? │
│ - Not empty │
│ - Not placeholder text ("Loading...") │
│ - Not UI text ("Click to expand") │
│ - Fields mapped correctly │
│ ↓ │
│ ┌───┴───┐ │
│ BOTH Pass Either Fails │
│ │ │ │
│ │ ↓ │
│ │ Is it Actionbook data issue? │
│ │ │ │
│ │ ┌───┴───┐ │
│ │ Yes No │
│ │ │ │ │
│ │ ↓ ↓ │
│ │ Log to Fix script │
│ │ .actionbook-issues.log │
│ │ │ │ │
│ │ └───┬───┘ │
│ │ ↓ │
│ │ Retry (max 3x) │
│ ↓ │
│ Output Script │
└─────────────────────────────────────────────────────┘
```
`/actionbook-scraper:generate <url>`
DEFAULT = agent-browser script (bash commands):

```bash
agent-browser open "https://example.com"
agent-browser scroll down 2000
agent-browser get text ".selector"
agent-browser close
```

`/actionbook-scraper:generate <url> --standalone`
Output = Playwright JavaScript code
Every generated script must pass BOTH checks:
| Check | What to Verify | Failure Action |
|---|---|---|
| 1. Script Runs | No errors, no timeouts | Fix syntax/selector errors |
| 2. Data Correct | Content matches expected fields | Fix extraction logic |
Verify extracted data matches the expected structure:
Expected: Company name, description, website, year founded
Actual: "Click to expand", "Loading...", empty strings
→ FAIL: Data content incorrect, need to fix extraction logic
Data validation rules:
| Rule | Example Failure | Fix |
|---|---|---|
| Fields not empty | name: "" | Check selector targets correct element |
| No placeholder text | name: "Loading..." | Add wait for dynamic content |
| No UI text | name: "Click to expand" | Extract after expanding, not button text |
| Correct data type | year: "View Details" | Wrong selector, fix field mapping |
| Reasonable count | Expected ~100, got 3 | Add scroll/pagination handling |
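These rules can be sketched as a small validation helper. This is a minimal sketch, not the plugin's actual implementation: the placeholder list, field names, and tolerance threshold are illustrative assumptions.

```javascript
// Sketch of the validation rules above. The placeholder list and
// the tolerance threshold are illustrative assumptions.
const PLACEHOLDER_TEXT = ["Loading...", "Click to expand", "View Details"];

function validateRecord(record, requiredFields) {
  const errors = [];
  for (const field of requiredFields) {
    const value = record[field];
    if (value === undefined || value === "") {
      errors.push(`${field}: empty`); // rule: fields not empty
    } else if (PLACEHOLDER_TEXT.includes(value)) {
      // rules: no placeholder text, no UI text
      errors.push(`${field}: placeholder/UI text "${value}"`);
    }
  }
  return errors;
}

// Rule: reasonable count. Flag a scrape that returns far fewer
// items than expected.
function validateCount(items, expected, tolerance = 0.5) {
  return items.length >= expected * tolerance;
}
```

For the "Expected ~100, got 3" case, `validateCount(items, 100)` fails, signalling missing scroll/pagination handling.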
`/generate <url>` → OUTPUT: agent-browser bash commands
`/generate <url> --standalone` → OUTPUT: Playwright .js file
````
┌─────────────────────────────────────────────────────────────┐
│ /generate <url> │
│ │
│ 1. Search Actionbook → get selectors │
│ 2. Generate OUTPUT: │
│ │
│ WITHOUT --standalone │ WITH --standalone │
│ ───────────────────── │ ────────────────── │
│ agent-browser commands │ Playwright .js code │
│ │ │
│ ```bash │ ```javascript │
│ agent-browser open ... │ const { chromium } = ... │
│ agent-browser get ... │ await page.goto(...) │
│ agent-browser close │ ``` │
│ ``` │ │
└─────────────────────────────────────────────────────────────┘
````
| Operation | Primary Tool | Fallback | Notes |
|---|---|---|---|
| Find selectors for URL | search_actions | None | Search by domain/keywords |
| Get full selector details | get_action_by_id | None | Use action_id from search |
| List available sources | list_sources | search_sources | Browse all indexed sites |
| Generate agent-browser script | Agent (sonnet) | - | Default mode for /generate |
| Generate Playwright script | Agent (sonnet) | - | Use --standalone flag |
| Structure analysis | Agent (haiku) | - | Parse Actionbook response |
| Request new website | agent-browser | Manual | Submit to actionbook.dev (ONLY command that executes agent-browser) |
Every generated script MUST be verified by executing it.
| Step | Action |
|---|---|
| 1 | Generate script with Actionbook selectors |
| 2 | Execute script to verify it works |
| 3 | If failed: analyze error, fix script, go to step 2 |
| 4 | If success: output verified script + data preview |
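The loop above can be sketched as a small driver function. This is a hypothetical synchronous sketch: `generate`, `execute`, and `fix` stand in for the agent's real steps and are not plugin APIs.

```javascript
// Hypothetical sketch of the generate → execute → fix → retry loop.
function verifyScript(generate, execute, fix, maxRetries = 3) {
  let script = generate(); // step 1: generate with Actionbook selectors
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    const result = execute(script); // step 2: run it
    if (result.ok) {
      // step 4: output verified script + data preview
      return { script, data: result.data, attempts: attempt };
    }
    // step 3: analyze error, fix script, go back to step 2
    script = fix(script, result.error);
  }
  throw new Error(`Script failed verification after ${maxRetries} attempts`);
}
```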
For agent-browser scripts:

```bash
# Execute each command
agent-browser open "https://example.com"
agent-browser wait --load networkidle
agent-browser get text ".selector"
# Check if data is returned
# If error → fix and retry
agent-browser close
```
For Playwright scripts (--standalone):

```bash
# Write to temp file and execute
node /tmp/scraper.js
# Check if output file has data
# If error → fix and retry
agent-browser close
```

| Error | Example | Fix |
|---|---|---|
| Extracted button text | name: "Click to expand" | Extract content after expanding |
| Extracted placeholder | desc: "Loading..." | Add wait for dynamic content |
| Empty fields | name: "" | Fix selector |
| Wrong field mapping | year: "San Francisco" | Fix selector for each field |
| Too few items | Expected 100, got 3 | Add scroll/pagination |
If Actionbook selectors are wrong or outdated, record to the local file `.actionbook-issues.log`.

When to record:

Log format:

```
[YYYY-MM-DD HH:MM] URL: {url}
Action ID: {action_id}
Issue Type: {selector_error | outdated | missing}
Details: {description}
Selector: {selector}
Expected: {what it should select}
Actual: {what it actually selects or error}
---
```
When Actionbook provides multiple selectors, prefer in this order:
1. `data-testid` - Most stable, designed for automation
2. `aria-label` - Accessibility-based, semantic
3. `css` - Class-based selectors
4. `xpath` - Last resort, most fragile

| Command | Description | Agent |
|---|---|---|
| `/actionbook-scraper:analyze <url>` | Analyze page structure and show available selectors | structure-analyzer |
| `/actionbook-scraper:generate <url>` | Generate agent-browser scraper script | code-generator |
| `/actionbook-scraper:generate <url> --standalone` | Generate Playwright/Puppeteer script | code-generator |
| `/actionbook-scraper:list-sources` | List websites with Actionbook data | - |
| `/actionbook-scraper:request-website <url>` | Request new website to be indexed (uses agent-browser) | website-requester |
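Returning to selector priority: the preference order above can be sketched as a small helper. The input shape (a type → selector-string map) is an assumption about how the selector data might be held, not the plugin's actual API.

```javascript
// Sketch: pick the most stable selector per the priority order above.
// The input shape (type → selector string) is an assumption.
const SELECTOR_PRIORITY = ["data-testid", "aria-label", "css", "xpath"];

function pickSelector(selectors) {
  for (const type of SELECTOR_PRIORITY) {
    if (selectors[type]) return { type, selector: selectors[type] };
  }
  return null; // nothing usable
}
```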
1. User: /actionbook-scraper:analyze https://example.com/page
2. Extract domain from URL → "example.com"
3. search_actions("example page") → [action_ids]
4. For best match: get_action_by_id(action_id) → full selector data
5. Structure-analyzer agent formats and presents findings
User: /actionbook-scraper:generate https://example.com/page
Step 1: Search Actionbook
search_actions("example.com page") → action_ids
Step 2: Get selectors
get_action_by_id(best_match) → selectors
Step 3: Generate agent-browser script
```bash
agent-browser open "https://example.com/page"
agent-browser wait --load networkidle
agent-browser scroll down 2000
agent-browser get text ".item-container"
agent-browser close
```
Step 4: VERIFY script (REQUIRED)
Execute the commands and check if data is extracted
If failed → analyze error → fix script → retry (max 3x)
Step 5: Return verified script + data preview
**Example Output:**
````markdown
## Verified Scraper (agent-browser)
**Status**: ✅ Verified (extracted 50 items)
Run these commands to scrape:
```bash
agent-browser open "https://example.com/page"
agent-browser wait --load networkidle
agent-browser scroll down 2000
agent-browser get text ".item-container"
agent-browser close
```

### Data Preview

```json
[
{"name": "Item 1", "description": "..."},
{"name": "Item 2", "description": "..."},
// ... showing first 3 items
]
```
````
### Generate Command (--standalone: Playwright script)
```
User: /actionbook-scraper:generate https://example.com/page --standalone
Step 1: Search Actionbook for selectors
Step 2: Get full selector data
Step 3: Generate Playwright/Puppeteer script
Step 4: VERIFY script (REQUIRED)
Write to temp file → node /tmp/scraper.js → check output
If failed → analyze error → fix script → retry (max 3x)
Step 5: Return verified script + data preview
```
**Example Output:**
````markdown
## Verified Scraper (Playwright)
**Status**: ✅ Verified (extracted 50 items)
```javascript
const { chromium } = require('playwright');
// ... generated code with Actionbook selectors
```
Usage:
```bash
npm install playwright
node scraper.js
```
### Data Preview
```json
[
{"name": "Item 1", "description": "..."},
// ... first 3 items
]
```
````
1. User: /actionbook-scraper:request-website https://newsite.com/page
2. Launch website-requester agent (uses agent-browser)
3. Agent workflow:
a. agent-browser open "https://actionbook.dev/request-website"
b. agent-browser snapshot -i (discover form selectors)
c. agent-browser type <url-field> "https://newsite.com/page"
d. agent-browser type <email-field> (optional)
e. agent-browser type <usecase-field> (optional)
f. agent-browser click <submit-button>
g. agent-browser snapshot -i (verify submission)
h. agent-browser close
4. Output: Confirmation of submission
Actionbook returns selector data in this format:

```json
{
  "url": "https://example.com/page",
  "title": "Page Title",
  "content": "## Selector Reference\n\n| Element | CSS | XPath | Type |\n..."
}
```
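Since the selectors arrive as a markdown table inside `content`, a consumer has to parse them back out. A minimal sketch, assuming the column order shown above (`Element | CSS | XPath | Type`):

```javascript
// Sketch: parse the markdown "Selector Reference" table out of the
// `content` field. Column order is assumed from the example above.
function parseSelectorTable(content) {
  const rows = [];
  for (const line of content.split("\n")) {
    const cells = line.split("|").map(c => c.trim()).filter(c => c !== "");
    // Skip non-table lines, the header row, and the |---| separator
    if (cells.length < 4 || cells[0] === "Element" || /^-+$/.test(cells[0])) continue;
    const [element, css, xpath, type] = cells;
    rows.push({ element, css, xpath, type });
  }
  return rows;
}
```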
Card-based layouts:

```
Container: .card-list, .grid-container
Card item: .card, .list-item
Card name: .card__title, .card-name
Card description: .card__description
Expand button: .card__expand, button.expand
```
Detail extraction (dt/dd pattern):

```javascript
// Common pattern for key-value pairs: collect each dt label
// with its dd value into an object
const details = {};
container.querySelectorAll('.info-item').forEach(item => {
  const label = item.querySelector('dt')?.textContent.trim();
  const value = item.querySelector('dd')?.textContent.trim();
  if (label) details[label] = value;
});
```
Table layouts:

```
Table: table, .data-table
Header: thead th, .table-header
Row: tbody tr, .table-row
Cell: td, .table-cell
```
| Indicator | Page Type | Template |
|---|---|---|
| Scroll to load more | Dynamic/Infinite | playwright-js (with scroll) |
| Click to expand | Card-based | playwright-js (with click) |
| Pagination links | Paginated | playwright-js (with pagination) |
| Static content | Static | puppeteer or playwright |
| SPA framework detected | SPA | playwright-js (network idle) |
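The indicator-to-template mapping above can be sketched as a simple decision function. The boolean flag names are hypothetical outputs of page analysis, not plugin identifiers:

```javascript
// Sketch of the template decision table above.
// Flag names are hypothetical analysis results.
function chooseTemplate({ infiniteScroll, clickToExpand, pagination, isSpa } = {}) {
  if (infiniteScroll) return "playwright-js (with scroll)";
  if (clickToExpand) return "playwright-js (with click)";
  if (pagination) return "playwright-js (with pagination)";
  if (isSpa) return "playwright-js (network idle)";
  return "puppeteer or playwright"; // static content
}
```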
## Page Analysis: {url}
### Matched Action
- **Action ID**: {action_id}
- **Confidence**: HIGH | MEDIUM | LOW
### Available Selectors
| Element | Selector | Type | Methods |
|---------|----------|------|---------|
| {name} | {selector} | {type} | {methods} |
### Page Structure
- **Type**: {static|dynamic|spa}
- **Data Pattern**: {cards|table|list}
- **Lazy Loading**: {yes|no}
- **Expand/Collapse**: {yes|no}
### Recommendations
- Suggested template: {template}
- Special handling needed: {notes}
## Generated Scraper
**Target URL**: {url}
**Template**: {template}
**Expected Output**: {description}
### Dependencies

```bash
npm install playwright
```

{generated_code}

Usage:

```bash
node scraper.js
```

Results saved to {output_file}
## Templates Reference
| Template | Flag | Output | Run With |
|----------|------|--------|----------|
| **agent-browser** | (default) | CLI commands | `agent-browser` CLI |
| playwright-js | --standalone | .js file | `node scraper.js` |
| playwright-python | --standalone --template playwright-python | .py file | `python scraper.py` |
| puppeteer | --standalone --template puppeteer | .js file | `node scraper.js` |
## Error Handling
| Error | Cause | Solution |
|-------|-------|----------|
| No actions found | URL not indexed | Use `/actionbook-scraper:request-website` to request indexing |
| Selectors not working | Page updated | Report to Actionbook, try alternative selectors |
| Timeout | Slow page load | Increase timeout, add retry logic |
| Empty data | Dynamic content | Add scroll/wait handling |
| Form submission failed | Network/page issue | Retry or submit manually at actionbook.dev |
## agent-browser Usage
For the `request-website` command, the plugin uses **agent-browser CLI** to automate form submission.
### agent-browser Commands
```bash
# Open a URL
agent-browser open "https://actionbook.dev/request-website"
# Get page snapshot (discover selectors)
agent-browser snapshot -i
# Type into form field
agent-browser type "input[name='url']" "https://example.com"
# Click button
agent-browser click "button[type='submit']"
# Close browser (ALWAYS do this)
agent-browser close
```
If form selectors are unknown, use snapshot to discover them:
```bash
agent-browser open "https://actionbook.dev/request-website"
agent-browser snapshot -i  # Returns page structure with selectors
```
**Critical**: Always run `agent-browser close` at the end of any agent-browser session, even if errors occur.
### Example 1: Generate agent-browser Commands (Default)
/actionbook-scraper:generate https://firstround.com/companies
Output: agent-browser commands
```bash
agent-browser open "https://firstround.com/companies"
agent-browser scroll down 2000
agent-browser get text ".company-list-card-small"
agent-browser close
```
User runs these commands to scrape.
### Example 2: Generate Playwright Script
/actionbook-scraper:generate https://firstround.com/companies --standalone
Output: Playwright JavaScript code

```javascript
const { chromium } = require('playwright');
// ... full script
```

User runs: `node scraper.js`
### Example 3: Analyze Page Structure
/actionbook-scraper:analyze https://example.com/products
Output: Analysis showing:
- Available selectors
- Page structure
- Recommended approach
### Example 4: Request a New Website
/actionbook-scraper:request-website https://newsite.com/data
Action: Submits form to actionbook.dev (this command DOES execute agent-browser)
## Best Practices
1. **Always analyze before generating** - Understand the page structure first
2. **Check list-sources** - Verify the site is indexed before attempting
3. **Review generated code** - Verify selectors match expected elements
4. **Add appropriate delays** - Be respectful to target servers
5. **Handle edge cases** - Empty states, loading states, errors
6. **Test incrementally** - Run on small subset before full scrape
## Stats

- **Weekly Installs**: 125
- **Repository**: actionbook/actionbook
- **GitHub Stars**: 1.4K
- **First Seen**: Feb 4, 2026
- **Security Audits**: Gen Agent Trust Hub: Pass; Socket: Pass; Snyk: Warn
- **Installed on**: opencode (101), codex (101), gemini-cli (94), github-copilot (80), claude-code (75), amp (70)