agentation-self-driving by benjitaylor/agentation
npx skills add https://github.com/benjitaylor/agentation --skill agentation-self-driving通过 Agentation 工具栏自动为网页添加设计批注,实现自主评审——在可见的带界面浏览器中运行,让用户能够实时观察智能体工作,就像观看自动驾驶汽车导航一样。
浏览器必须可见。切勿以无头模式运行。用户将观察你扫描、悬停、点击和批注的过程。
预检:首先验证 agent-browser 是否可用:
command -v agent-browser >/dev/null || { echo "ERROR: agent-browser not found. Install the agent-browser skill first."; exit 1; }
启动:首先尝试直接打开。仅当打开命令因会话过期错误而失败时,才关闭现有会话——这可以避免终止他人正在使用的浏览器:
# 尝试打开。如果失败(会话过期),先关闭再重试。
agent-browser --headed open <url> 2>&1 || { agent-browser close 2>/dev/null; agent-browser --headed open <url>; }
然后验证 Agentation 工具栏是否存在并展开它:
# 1. 检查页面上是否存在工具栏(data-feedback-toolbar 是根标记)
agent-browser eval "document.querySelector('[data-feedback-toolbar]') ? 'toolbar found' : 'NOT FOUND'"
# 如果返回 "NOT FOUND":此页面未安装 Agentation——停止并告知用户
# 2. 仅在折叠时展开(已展开时点击会折叠它)
agent-browser eval "document.querySelector('[data-feedback-toolbar][class*=expanded]') ? 'already expanded' : (document.querySelector('[class*=toggleContent]')?.click(), 'expanding')"
# 3. 验证:截取快照并查找工具栏控件
agent-browser snapshot -i
# 如果已展开:你将看到“阻止页面交互”复选框、颜色按钮(紫色、蓝色等)
# 如果已折叠:你只会看到小的切换按钮——重试步骤 2
广告位招租
在这里展示您的产品或服务
触达数万 AI 开发者,精准高效
“阻止页面交互”必须勾选(默认:开启)。
eval 引用规则:在 eval 字符串中始终使用
[class*=toggleContent](属性值不加引号)。不要在 eval 中使用双感叹号,因为 bash 会将其视为历史扩展。也不要使用反斜杠转义的内引号,因为它们在不同 shell 中会不可预测地中断。
标准元素点击(click @ref)不会触发批注对话框。 Agentation 覆盖层在坐标级别拦截指针事件。使用基于坐标的鼠标事件——这也会使交互在浏览器中可见,因为光标会在页面上移动。
@ref兼容性:只有click、fill、type、hover、focus、check、select、drag支持@ref语法。命令scrollintoview、get box和eval不支持——它们期望 CSS 选择器。使用eval配合querySelector进行滚动和位置查找。
# 1. 截取交互式快照——识别目标元素并构建 CSS 选择器
agent-browser snapshot -i
# 示例:快照显示标题 "Point at bugs." [ref=e10]
# 推导 CSS 选择器:'h1',或更具体:'h1:first-of-type'
# 2. 通过 eval 将元素滚动到视图中(不要使用 scrollintoview @ref——那会出错)
agent-browser eval "document.querySelector('h1').scrollIntoView({block:'center'})"
# 3. 通过 eval 获取其边界框(不要使用 get box @ref——那也会出错)
agent-browser eval "((r) => r.x+','+r.y+','+r.width+','+r.height)(document.querySelector('h1').getBoundingClientRect())"
# 返回:"383,245,200,40"(解析为 x,y,width,height)
# 4. 将光标移动到元素中心,然后点击
# centerX = x + width/2, centerY = y + height/2
agent-browser mouse move <centerX> <centerY>
agent-browser mouse down left
agent-browser mouse up left
# 5. 获取批注对话框的引用——读取完整的快照输出
# 对话框引用出现在列表的底部,不要用 head/tail 截断
agent-browser snapshot -i
# 查找:文本框 "What should change?" 和 "Cancel" / "Add" 按钮
# 6. 输入评论——fill 和 click 确实支持 @ref
agent-browser fill @<textboxRef> "Your critique here"
# 7. 提交(填写文本后 Add 按钮会启用)
agent-browser click @<addRef>
如果点击后没有出现对话框,工具栏可能已折叠。重新展开(仅在折叠时)并重试:
agent-browser eval "document.querySelector('[data-feedback-toolbar][class*=expanded]') ? 'ok' : (document.querySelector('[class*=toggleContent]')?.click(), 'expanded')"
快照显示元素角色、名称和引用。将它们映射到 CSS 选择器:
| 快照行 | CSS 选择器 |
|---|---|
heading "Point at bugs." [ref=e10] | h1 或 h1:first-of-type |
button "npm install agentation Copy" [ref=e15] | button:has(code) 或通过 eval 按文本内容选择 |
link "Star on GitHub" [ref=e28] | a[href*=github] |
paragraph (long text...) [ref=e20] | 按区域定位:section:nth-of-type(2) p |
如有疑问,使用更宽泛的选择器并通过 eval 验证:
agent-browser eval "document.querySelector('h2').textContent"
从上到下处理页面。对于每个批注:
scrollIntoView)滚动到目标区域getBoundingClientRect)获取其边界框mouse move → mouse down → mouse up)fill @ref)并提交(click @ref)提交每个批注后,确认计数是否增加:
agent-browser eval "document.querySelectorAll('[data-annotation-marker]').length"
# 应返回预期计数(第一个后为 1,第二个后为 2,依此类推)
如果计数没有增加,提交静默失败——重新截取快照并检查对话框是否仍处于打开状态。
除非另有说明,每页目标为 5-8 个批注。
| 区域 | 关注点 |
|---|---|
| 首屏/折叠上方 | 标题层级、行动号召按钮位置、视觉分组 |
| 导航 | 标签样式、分类分组、视觉权重 |
| 演示/插图 | 清晰度、深度、动画可读性 |
| 内容区域 | 间距节奏、标注处理、排版层级 |
| 关键标语 | 是否有足够视觉强调的共鸣语句 |
| 行动号召按钮和页脚 | 转化权重、视觉分隔、最终行动 |
每个批注最多 2-3 句话:
差评:“这个部分需要改进” 好评:“这个项目列表读起来像文档,而不是展示。使用带图标的 3 列卡片网格——类似于 Stripe 的指南模式。创造视觉节奏和可扫描性。”
技能必须符号链接到 ~/.claude/skills/ 以便 Claude Code 发现它:
ln -s "$(pwd)/skills/agentation-self-driving" ~/.claude/skills/agentation-self-driving
安装后重启 Claude Code。使用 /agentation-self-driving 验证——如果加载了技能说明,则符号链接正常工作。
agent-browser close 2>/dev/null,然后重试 --headed open 命令/agentation 进行设置[class*=expanded]),重试agent-browser close 进行清理如果不注意这些,它们会静默中断工作流:
| 陷阱 | 发生情况 | 修复方法 |
|---|---|---|
scrollintoview @ref | 崩溃:“解析 css 选择器时出现不支持的标记 @ref” | 使用 eval "document.querySelector('sel').scrollIntoView({block:'center'})" |
get box @ref | 同样崩溃——get box 将引用解析为 CSS 选择器 | 使用 eval "((r)=>r.x+','+r.y+','+r.width+','+r.height)(document.querySelector('sel').getBoundingClientRect())" |
eval 中使用双感叹号 | Bash 在命令运行前将双感叹号扩展为历史替换 | 改用 expr !== null 或 expr ? true : false |
eval 中使用反斜杠转义引号 | 转义的内引号在不同 shell 中会中断 | 去掉引号:[class*=toggleContent] 适用于没有空格的简单值 |
| `snapshot -i | head -50` | 批注对话框引用(textbox "What should change?"、Add、Cancel)出现在快照的底部 |
click @ref 用于覆盖层元素 | 点击会穿透到真实的 DOM,绕过 Agentation 覆盖层 | 使用 mouse move → mouse down left → mouse up left 进行基于坐标的点击,覆盖层会拦截这些点击 |
--headed open 失败并显示“浏览器未启动” | 先前运行的过期会话阻止新启动 | 运行 agent-browser close 2>/dev/null,然后重试打开命令 |
经验法则:@ref 适用于交互命令(click、fill、type、hover)。对于其他所有命令(eval、get、scrollintoview),在 eval 中通过 querySelector 使用 CSS 选择器。
当 MCP 已连接(工具栏显示“MCP Connected”)时,批注会自动发送到任何监听的智能体。这使得:
agentation_watch_annotations,接收批注,编辑代码以解决每个问题用户观看会话 1 在浏览器中驱动页面,同时会话 2 在代码库中修复问题——实现完全自主的设计评审和实施。
每周安装量
1.3K
仓库
GitHub 星标数
2.8K
首次出现
2026年2月15日
安全审计
安装于
codex1.2K
opencode1.2K
gemini-cli1.2K
github-copilot1.2K
claude-code1.1K
cursor1.1K
Autonomously critique a web page by adding design annotations via the Agentation toolbar — in a visible headed browser so the user can watch the agent work in real time, like watching a self-driving car navigate.
The browser MUST be visible. Never run headless. The user watches you scan, hover, click, and annotate.
Preflight : Verify agent-browser is available before anything else:
command -v agent-browser >/dev/null || { echo "ERROR: agent-browser not found. Install the agent-browser skill first."; exit 1; }
Launch : Try opening directly first. Only close an existing session if the open command fails with a stale session error — this avoids killing a browser someone else is using:
# Try to open. If it fails (stale session), close first then retry.
agent-browser --headed open <url> 2>&1 || { agent-browser close 2>/dev/null; agent-browser --headed open <url>; }
Then verify the Agentation toolbar is present and expand it:
# 1. Check toolbar exists on the page (data-feedback-toolbar is the root marker)
agent-browser eval "document.querySelector('[data-feedback-toolbar]') ? 'toolbar found' : 'NOT FOUND'"
# If "NOT FOUND": Agentation is not installed on this page — stop and tell the user
# 2. Expand ONLY if collapsed (clicking when already expanded collapses it)
agent-browser eval "document.querySelector('[data-feedback-toolbar][class*=expanded]') ? 'already expanded' : (document.querySelector('[class*=toggleContent]')?.click(), 'expanding')"
# 3. Verify: take a snapshot and look for toolbar controls
agent-browser snapshot -i
# If expanded: you'll see "Block page interactions" checkbox, color buttons (Purple, Blue, etc.)
# If collapsed: you'll only see the small toggle button — retry step 2
"Block page interactions" must be checked (default: on).
eval quoting rule : Always use
[class*=toggleContent](no quotes around the attribute value) in eval strings. Do not use double-bang in eval because bash treats it as history expansion. Do not use backslash-escaped inner quotes either, as they break unpredictably across shells.
Standard element clicks (click @ref) do NOT trigger annotation dialogs. The Agentation overlay intercepts pointer events at the coordinate level. Use coordinate-based mouse events — this also makes the interaction visible in the browser as the cursor moves across the page.
@refcompatibility: Onlyclick,fill,type,hover,focus,check,select,dragsupport@refsyntax. The commandsscrollintoview,get box, and do NOT — they expect CSS selectors. Use with for scrolling and position lookup.
# 1. Take interactive snapshot — identify target element and build a CSS selector
agent-browser snapshot -i
# Example: snapshot shows heading "Point at bugs." [ref=e10]
# Derive a CSS selector: 'h1', or more specific: 'h1:first-of-type'
# 2. Scroll the element into view via eval (NOT scrollintoview @ref — that breaks)
agent-browser eval "document.querySelector('h1').scrollIntoView({block:'center'})"
# 3. Get its bounding box via eval (NOT get box @ref — that also breaks)
agent-browser eval "((r) => r.x+','+r.y+','+r.width+','+r.height)(document.querySelector('h1').getBoundingClientRect())"
# Returns: "383,245,200,40" (parse these as x,y,width,height)
# 4. Move cursor to element center, then click
# centerX = x + width/2, centerY = y + height/2
agent-browser mouse move <centerX> <centerY>
agent-browser mouse down left
agent-browser mouse up left
# 5. Get the annotation dialog refs — read the FULL snapshot output
# Dialog refs appear at the BOTTOM of the list, don't truncate with head/tail
agent-browser snapshot -i
# Look for: textbox "What should change?" and "Cancel" / "Add" buttons
# 6. Type critique — fill and click DO support @ref
agent-browser fill @<textboxRef> "Your critique here"
# 7. Submit (Add button enables after text is filled)
agent-browser click @<addRef>
If no dialog appears after clicking, the toolbar may have collapsed. Re-expand (only if collapsed) and retry:
agent-browser eval "document.querySelector('[data-feedback-toolbar][class*=expanded]') ? 'ok' : (document.querySelector('[class*=toggleContent]')?.click(), 'expanded')"
The snapshot shows element roles, names, and refs. Map them to CSS selectors:
| Snapshot line | CSS selector |
|---|---|
heading "Point at bugs." [ref=e10] | h1 or h1:first-of-type |
button "npm install agentation Copy" [ref=e15] | button:has(code) or by text content via eval |
link "Star on GitHub" [ref=e28] | a[href*=github] |
paragraph (long text...) [ref=e20] |
When in doubt, use a broader selector and verify with eval:
agent-browser eval "document.querySelector('h2').textContent"
Work top-to-bottom through the page. For each annotation:
scrollIntoView)getBoundingClientRect)mouse move → mouse down → mouse up)fill @ref) and submit (click @ref)After submitting each annotation, confirm the count increased:
agent-browser eval "document.querySelectorAll('[data-annotation-marker]').length"
# Should return the expected count (1 after first, 2 after second, etc.)
If the count didn't increase, the submission failed silently — re-snapshot and check if the dialog is still open.
Aim for 5-8 annotations per page unless told otherwise.
| Area | What to look for |
|---|---|
| Hero / above the fold | Headline hierarchy, CTA placement, visual grouping |
| Navigation | Label styling, category grouping, visual weight |
| Demo / illustrations | Clarity, depth, animation readability |
| Content sections | Spacing rhythm, callout treatments, typography hierarchy |
| Key taglines | Whether resonant lines get enough visual emphasis |
| CTAs and footer | Conversion weight, visual separation, final actions |
2-3 sentences max per annotation:
Bad: "This section needs work" Good: "This bullet list reads like docs, not a showcase. Use a 3-column card grid with icons — similar to Stripe's guidelines pattern. Creates visual rhythm and scannability."
The skill must be symlinked into ~/.claude/skills/ for Claude Code to discover it:
ln -s "$(pwd)/skills/agentation-self-driving" ~/.claude/skills/agentation-self-driving
Restart Claude Code after installing. Verify with /agentation-self-driving — if it loads the skill instructions, the symlink is working.
agent-browser close 2>/dev/null then retry the --headed open command/agentation to set it up first[class*=expanded] first), retryagent-browser close to clean up before starting a new sessionThese will silently break the workflow if you're not aware of them:
| Pitfall | What happens | Fix |
|---|---|---|
scrollintoview @ref | Crashes: "Unsupported token @ref while parsing css selector" | Use eval "document.querySelector('sel').scrollIntoView({block:'center'})" |
get box @ref | Same crash — get box parses refs as CSS selectors | Use eval "((r)=>r.x+','+r.y+','+r.width+','+r.height)(document.querySelector('sel').getBoundingClientRect())" |
eval with double-bang | Bash expands double-bang as history substitution before the command runs |
Rule of thumb : @ref works for interaction commands (click, fill, type, hover). For everything else (eval, get, scrollintoview), use CSS selectors via querySelector in an eval.
With MCP connected (toolbar shows "MCP Connected"), annotations auto-send to any listening agent. This enables:
agentation_watch_annotations in a loop, receives annotations, edits code to address each oneThe user watches Session 1 drive through the page in the browser while Session 2 fixes issues in the codebase — fully autonomous design review and implementation.
Weekly Installs
1.3K
Repository
GitHub Stars
2.8K
First Seen
Feb 15, 2026
Security Audits
Gen Agent Trust HubPassSocketFailSnykWarn
Installed on
codex1.2K
opencode1.2K
gemini-cli1.2K
github-copilot1.2K
claude-code1.1K
cursor1.1K
React 组合模式指南:Vercel 组件架构最佳实践,提升代码可维护性
102,200 周安装
AI智能体长期记忆系统 - 精英级架构,融合6种方法,永不丢失上下文
1,200 周安装
AI新闻播客制作技能:实时新闻转对话式播客脚本与音频生成
1,200 周安装
Word文档处理器:DOCX创建、编辑、分析与修订痕迹处理全指南 | 自动化办公解决方案
1,200 周安装
React Router 框架模式指南:全栈开发、文件路由、数据加载与渲染策略
1,200 周安装
Nano Banana AI 图像生成工具:使用 Gemini 3 Pro 生成与编辑高分辨率图像
1,200 周安装
SVG Logo Designer - AI 驱动的专业矢量标识设计工具,生成可缩放品牌标识
1,200 周安装
evalevalquerySelectorTarget by section: section:nth-of-type(2) p |
Use expr !== null or expr ? true : false instead |
eval with backslash-escaped quotes | Escaped inner quotes break across shells | Drop the quotes: [class*=toggleContent] works for simple values without spaces |
| `snapshot -i | head -50` | Annotation dialog refs (textbox "What should change?", Add, Cancel) appear at the BOTTOM of the snapshot |
click @ref on overlay elements | The click goes through to the real DOM, bypassing the Agentation overlay | Use mouse move → mouse down left → mouse up left for coordinate-based clicks that the overlay intercepts |
--headed open fails with "Browser not launched" | Stale sessions from previous runs block new launches | Run agent-browser close 2>/dev/null then retry the open command |