agent-browser by toolshell/skills
npx skills add https://github.com/toolshell/skills --skill agent-browser
Browser automation for AI agents via inference.sh. Uses Playwright under the hood with a simple @e ref system for element interaction.

Requires the inference.sh CLI (infsh). Get installation instructions:
npx skills add inference-sh/skills@agent-tools
infsh login
# Open a page and get interactive elements
infsh app run agent-browser --function open --input '{"url": "https://example.com"}' --session new
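Every browser automation follows the same pattern: open a page to get @e refs for elements, interact using those refs, re-snapshot after navigation, then close the session.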
# 1. Start session
RESULT=$(infsh app run agent-browser --function open --session new --input '{
"url": "https://example.com/login"
}')
SESSION_ID=$(echo "$RESULT" | jq -r '.session_id')
# Elements: @e1 [input] "Email", @e2 [input] "Password", @e3 [button] "Sign In"
# 2. Fill and submit
infsh app run agent-browser --function interact --session "$SESSION_ID" --input '{
"action": "fill", "ref": "@e1", "text": "user@example.com"
}'
infsh app run agent-browser --function interact --session "$SESSION_ID" --input '{
"action": "fill", "ref": "@e2", "text": "password123"
}'
infsh app run agent-browser --function interact --session "$SESSION_ID" --input '{
"action": "click", "ref": "@e3"
}'
# 3. Re-snapshot after navigation
infsh app run agent-browser --function snapshot --session "$SESSION_ID" --input '{}'
# 4. Close when done
infsh app run agent-browser --function close --session "$SESSION_ID" --input '{}'
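The four steps above repeat the same `infsh app run agent-browser` prefix on every call. A small wrapper could cut that repetition. The sketch below is one possible shape, not part of the skill: the function name `ab` and the `INFSH` override variable are hypothetical and introduced here only for illustration.

```shell
# Hypothetical helper (not part of agent-browser): wraps the repeated
# "infsh app run agent-browser" prefix used throughout this document.
# Usage: ab <function> <session> [json-input]
ab() {
  ab_fn="$1"
  ab_session="$2"
  ab_input="${3:-}"
  # Default to an empty JSON object, as the snapshot and close calls do.
  [ -n "$ab_input" ] || ab_input='{}'
  # INFSH is an assumed override hook so the CLI can be swapped out,
  # for example with a stub while testing a script offline.
  "${INFSH:-infsh}" app run agent-browser \
    --function "$ab_fn" --session "$ab_session" --input "$ab_input"
}
```

With this in place, step 3 would read `ab snapshot "$SESSION_ID"` and a click would read `ab interact "$SESSION_ID" '{"action": "click", "ref": "@e3"}'`.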
| Function | Description |
|---|---|
| open | Navigate to URL, configure browser (viewport, proxy, video recording) |
| snapshot | Re-fetch page state with @e refs after DOM changes |
| interact | Perform actions using @e refs (click, fill, drag, upload, etc.) |
| screenshot | Take page screenshot (viewport or full page) |
| execute | Run JavaScript code on the page |
| close | Close session, returns video if recording was enabled |
| Action | Description | Required Fields |
|---|---|---|
| click | Click element | ref |
| dblclick | Double-click element | ref |
| fill | Clear and type text | ref, text |
| type | Type text (no clear) | text |
| press | Press key (Enter, Tab, etc.) | text |
| select | Select dropdown option | ref, text |
| hover | Hover over element | ref |
| check | Check checkbox | ref |
| uncheck | Uncheck checkbox | ref |
| drag | Drag and drop | ref, target_ref |
| upload | Upload file(s) | ref, file_paths |
| scroll | Scroll page | direction (up/down/left/right), scroll_amount |
| back | Go back in history | - |
| wait | Wait milliseconds | wait_ms |
| goto | Navigate to URL | url |
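Each interact action expects the fields listed in the table above. A script could check a payload against that list before sending it. The function below is a hypothetical lookup (the name `required_fields` is not part of the skill) that simply restates the table as a `case` statement.

```shell
# Hypothetical helper: print the required input fields for an interact
# action, as listed in the action table. Returns non-zero for an action
# the table does not list.
required_fields() {
  case "$1" in
    click|dblclick|hover|check|uncheck) echo "ref" ;;
    fill|select) echo "ref text" ;;
    type|press)  echo "text" ;;
    drag)        echo "ref target_ref" ;;
    upload)      echo "ref file_paths" ;;
    scroll)      echo "direction scroll_amount" ;;
    wait)        echo "wait_ms" ;;
    goto)        echo "url" ;;
    back)        echo "" ;;  # no required fields
    *)           echo "unknown action: $1" >&2; return 1 ;;
  esac
}
```

For example, `required_fields drag` prints `ref target_ref`, matching the drag row of the table.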
Elements are returned with @e refs:
@e1 [a] "Home" href="/"
@e2 [input type="text"] placeholder="Search"
@e3 [button] "Submit"
@e4 [select] "Choose option"
@e5 [input type="checkbox"] name="agree"
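Because every element line starts with its ref, a script can pick out a ref by matching on the rest of the line. The sketch below assumes the snapshot text is available as plain lines in the format shown above; how the function output exposes that text is not specified here, and the helper name `find_ref` is hypothetical.

```shell
# Hypothetical helper: read snapshot lines on stdin and print the @e ref of
# the first element whose line contains the given literal text.
find_ref() {
  grep -F -- "$1" | head -n 1 | cut -d ' ' -f 1
}

# Sample lines copied from the listing above.
snapshot='@e1 [a] "Home" href="/"
@e2 [input type="text"] placeholder="Search"
@e3 [button] "Submit"'

printf '%s\n' "$snapshot" | find_ref '"Submit"'    # prints @e3
```

Matching on a label rather than hard-coding `@e3` keeps a script readable, but the ref it finds is only good until the next navigation.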
Important: Refs are invalidated after navigation. Always re-snapshot after any action that navigates or changes the DOM.
Record browser sessions for debugging or documentation:
# Start with recording enabled (optionally show cursor indicator)
SESSION=$(infsh app run agent-browser --function open --session new --input '{
"url": "https://example.com",
"record_video": true,
"show_cursor": true
}' | jq -r '.session_id')
# ... perform actions ...
# Close to get the video file
infsh app run agent-browser --function close --session "$SESSION" --input '{}'
# Returns: {"success": true, "video": <File>}
Show a visible cursor in screenshots and video (useful for demos):
infsh app run agent-browser --function open --session new --input '{
"url": "https://example.com",
"show_cursor": true,
"record_video": true
}'
The cursor appears as a red dot that follows mouse movements and shows click feedback.
Route traffic through a proxy server:
infsh app run agent-browser --function open --session new --input '{
"url": "https://example.com",
"proxy_url": "http://proxy.example.com:8080",
"proxy_username": "user",
"proxy_password": "pass"
}'
Upload files to file inputs:
infsh app run agent-browser --function interact --session "$SESSION" --input '{
"action": "upload",
"ref": "@e5",
"file_paths": ["/path/to/file.pdf"]
}'
Drag elements to targets:
infsh app run agent-browser --function interact --session "$SESSION" --input '{
"action": "drag",
"ref": "@e1",
"target_ref": "@e2"
}'
Run custom JavaScript:
infsh app run agent-browser --function execute --session "$SESSION" --input '{
"code": "document.querySelectorAll(\"h2\").length"
}'
# Returns: {"result": "5", "screenshot": <File>}
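Hand-escaping quotes inside the `code` string gets error-prone once the JavaScript grows past one expression. Since the examples here already rely on jq, one option is to let jq build the payload. This is a sketch of that approach, not something the skill prescribes; the variable names `js` and `payload` are illustrative.

```shell
# Build the --input payload for the execute function with jq, so the
# JavaScript source needs no manual backslash escaping.
js='document.querySelectorAll("h2").length'
payload=$(jq -cn --arg code "$js" '{code: $code}')
printf '%s\n' "$payload"
# prints {"code":"document.querySelectorAll(\"h2\").length"}

# The payload can then be passed as-is:
# infsh app run agent-browser --function execute --session "$SESSION" --input "$payload"
```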
| Reference | Description |
|---|---|
| references/commands.md | Full function reference with all options |
| references/snapshot-refs.md | Ref lifecycle, invalidation rules, troubleshooting |
| references/session-management.md | Session persistence, parallel sessions |
| references/authentication.md | Login flows, OAuth, 2FA handling |
| references/video-recording.md | Recording workflows for debugging |
| references/proxy-support.md | Proxy configuration, geo-testing |
| Template | Description |
|---|---|
| templates/form-automation.sh | Form filling with validation |
| templates/authenticated-session.sh | Login once, reuse session |
| templates/capture-workflow.sh | Content extraction with screenshots |
# Example: fill in and submit a contact form
SESSION=$(infsh app run agent-browser --function open --session new --input '{
"url": "https://example.com/contact"
}' | jq -r '.session_id')
# Get elements: @e1 [input] "Name", @e2 [input] "Email", @e3 [textarea], @e4 [button] "Send"
infsh app run agent-browser --function interact --session "$SESSION" --input '{"action": "fill", "ref": "@e1", "text": "John Doe"}'
infsh app run agent-browser --function interact --session "$SESSION" --input '{"action": "fill", "ref": "@e2", "text": "john@example.com"}'
infsh app run agent-browser --function interact --session "$SESSION" --input '{"action": "fill", "ref": "@e3", "text": "Hello!"}'
infsh app run agent-browser --function interact --session "$SESSION" --input '{"action": "click", "ref": "@e4"}'
# Re-snapshot to see the page state after submission
infsh app run agent-browser --function snapshot --session "$SESSION" --input '{}'
infsh app run agent-browser --function close --session "$SESSION" --input '{}'
# Example: run a search, then re-snapshot the results page
SESSION=$(infsh app run agent-browser --function open --session new --input '{
"url": "https://google.com"
}' | jq -r '.session_id')
infsh app run agent-browser --function interact --session "$SESSION" --input '{"action": "fill", "ref": "@e1", "text": "weather today"}'
infsh app run agent-browser --function interact --session "$SESSION" --input '{"action": "press", "text": "Enter"}'
infsh app run agent-browser --function interact --session "$SESSION" --input '{"action": "wait", "wait_ms": 2000}'
infsh app run agent-browser --function snapshot --session "$SESSION" --input '{}'
infsh app run agent-browser --function close --session "$SESSION" --input '{}'
# Example: full-page screenshot with video recording
SESSION=$(infsh app run agent-browser --function open --session new --input '{
"url": "https://example.com",
"record_video": true
}' | jq -r '.session_id')
# Take full page screenshot
infsh app run agent-browser --function screenshot --session "$SESSION" --input '{
"full_page": true
}'
# Close and get video
RESULT=$(infsh app run agent-browser --function close --session "$SESSION" --input '{}')
echo "$RESULT" | jq '.video'
Browser state persists within a session. Always use --session new on the first call and the returned session_id for subsequent calls.
# Web search (for research + browse)
npx skills add inference-sh/skills@web-search
# LLM models (analyze extracted content)
npx skills add inference-sh/skills@llm-models
Weekly Installs: 25.4K
GitHub Stars: 125
First Seen: 2 days ago
Installed on: claude-code 20.4K, gemini-cli 17.8K, codex 17.8K, opencode 17.8K, amp 17.8K, kimi-cli 17.8K