Chrome Bridge Automation: Midscene automation for the user's real browser, preserving login state

chrome-bridge-automation by web-infra-dev/midscene-skills

651 weekly installs

141 GitHub Stars

Install command

npx skills add https://github.com/web-infra-dev/midscene-skills --skill chrome-bridge-automation

AI/Machine Learning, Automated Testing



🇺🇸English

Chrome Bridge Automation

CRITICAL RULES — VIOLATIONS WILL BREAK THE WORKFLOW:

Never run midscene commands in the background. Each command must run synchronously so you can read its output (especially screenshots) before deciding the next action. Background execution breaks the screenshot-analyze-act loop.

Run only one midscene command at a time. Wait for the previous command to finish, read the screenshot, then decide the next action. Never chain multiple commands together.

Allow enough time for each command to complete. Midscene commands involve AI inference and screen interaction, which can take longer than typical shell commands. A typical command needs about 1 minute; complex act commands may need even longer.

Always report task results before finishing. After completing the automation task, you MUST proactively summarize the results to the user — including key data found, actions completed, screenshots taken, and any relevant findings. Never silently end after the last automation step; the user expects a complete response in a single interaction.

Automate the user's real Chrome browser via the Midscene Chrome Extension (Bridge mode), preserving cookies, sessions, and login state. You (the AI agent) act as the brain, deciding which actions to take based on screenshots.

What `act` Can Do

Inside a single act call in Chrome Bridge mode, Midscene can click, right-click, double-click, hover, type or clear text, press keys, scroll, drag, long-press, and continue through multi-step page flows in the user's real Chrome session based on what is currently visible. When touch input is enabled, it can also handle swipe- or pinch-style interactions on touch-oriented pages.

Command Format

CRITICAL — Every command MUST follow this EXACT format. Do NOT modify the command prefix.

npx @midscene/web@1 --bridge <subcommand> [args]

The --bridge flag is MANDATORY here: it activates Bridge mode to connect to the user's desktop Chrome browser.
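The fixed prefix and the rule that every command runs synchronously, one at a time, can be sketched as a small shell wrapper. This is only an illustration and not part of Midscene: the function name `bridge`, the variables `BRIDGE_TIMEOUT_SECS` and `BRIDGE_DRY_RUN`, and the 180-second default are all assumptions made for this example. `timeout` is the GNU coreutils utility and may not be installed on every system. The command prefix itself is left unchanged.

```shell
# Hypothetical wrapper (not part of Midscene): runs exactly one command in the
# foreground and blocks until it exits, so its output can be read first.
bridge() {
  # BRIDGE_DRY_RUN=1 (assumed name) prints the command instead of running it.
  if [ "${BRIDGE_DRY_RUN:-0}" = "1" ]; then
    printf '%s' 'npx @midscene/web@1 --bridge'
    printf ' %s' "$@"
    printf '\n'
    return 0
  fi
  # No trailing '&': the call is synchronous. The 180 s ceiling is an assumed
  # value, chosen to leave headroom above the roughly 1 minute that a typical
  # command needs.
  timeout "${BRIDGE_TIMEOUT_SECS:-180}" npx @midscene/web@1 --bridge "$@"
}
```

With this in place, a call such as `bridge take_screenshot` returns only after the command has finished or the time limit has been reached (GNU `timeout` exits with status 124 in the latter case).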

Prerequisites

The user has already prepared Chrome and the Midscene Extension. Do NOT check browser or extension status before connecting — just connect directly.

Midscene requires models with strong visual grounding capabilities. The following environment variables must be configured — either as system environment variables or in a .env file in the current working directory (Midscene loads .env automatically):

MIDSCENE_MODEL_API_KEY="your-api-key"
MIDSCENE_MODEL_NAME="model-name"
MIDSCENE_MODEL_BASE_URL="https://..."
MIDSCENE_MODEL_FAMILY="family-identifier"

Example: Gemini (Gemini-3-Flash)

MIDSCENE_MODEL_API_KEY="your-google-api-key"
MIDSCENE_MODEL_NAME="gemini-3-flash"
MIDSCENE_MODEL_BASE_URL="https://generativelanguage.googleapis.com/v1beta/openai/"
MIDSCENE_MODEL_FAMILY="gemini"

Example: Qwen 3.5

MIDSCENE_MODEL_API_KEY="your-aliyun-api-key"
MIDSCENE_MODEL_NAME="qwen3.5-plus"
MIDSCENE_MODEL_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
MIDSCENE_MODEL_FAMILY="qwen3.5"
MIDSCENE_MODEL_REASONING_ENABLED="false"
# If using OpenRouter, set:
# MIDSCENE_MODEL_API_KEY="your-openrouter-api-key"
# MIDSCENE_MODEL_NAME="qwen/qwen3.5-plus"
# MIDSCENE_MODEL_BASE_URL="https://openrouter.ai/api/v1"

Example: Doubao Seed 2.0 Lite

MIDSCENE_MODEL_API_KEY="your-doubao-api-key"
MIDSCENE_MODEL_NAME="doubao-seed-2-0-lite"
MIDSCENE_MODEL_BASE_URL="https://ark.cn-beijing.volces.com/api/v3"
MIDSCENE_MODEL_FAMILY="doubao-seed"

Commonly used models: Doubao Seed 2.0 Lite, Qwen 3.5, Zhipu GLM-4.6V, Gemini-3-Pro, Gemini-3-Flash.

If the model is not configured, ask the user to set it up. See Model Configuration for supported providers.
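Before asking the user to set anything up, it can help to know which variables are actually missing. The sketch below is one possible check, not a Midscene feature: the function name `midscene_missing_vars` is hypothetical, and the `.env` lookup only recognizes plain `NAME=value` lines, which is a simplification of however Midscene itself parses that file.

```shell
# Prints the name of every required variable that is neither set in the
# environment nor present in a .env file in the current directory.
# The four names come from the configuration list above.
midscene_missing_vars() {
  for name in MIDSCENE_MODEL_API_KEY MIDSCENE_MODEL_NAME \
              MIDSCENE_MODEL_BASE_URL MIDSCENE_MODEL_FAMILY; do
    # Indirect variable lookup that works in POSIX sh.
    eval "value=\${$name:-}"
    if [ -n "$value" ]; then
      continue
    fi
    # Fall back to .env; only a simple NAME=value line is recognized here.
    if [ -f .env ] && grep -q "^${name}=." .env; then
      continue
    fi
    printf '%s\n' "$name"
  done
}
```

An empty result means all four variables were found; otherwise the printed names can be passed on to the user as the list of settings still to be configured.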

Commands

Connect to a Web Page

npx @midscene/web@1 --bridge connect --url https://example.com

Take Screenshot

npx @midscene/web@1 --bridge take_screenshot

After taking a screenshot, read the saved image file to understand the current page state before deciding the next action.

Perform Action

Use act to interact with the page and get the result. It autonomously handles all UI interactions internally — clicking, typing, scrolling, hovering, waiting, and navigating — so you should give it complex, high-level tasks as a whole rather than breaking them into small steps. Describe what you want to do and the desired effect in natural language:

# specific instructions
npx @midscene/web@1 --bridge act --prompt "click the Login button and fill in the email field with 'user@example.com'"
npx @midscene/web@1 --bridge act --prompt "scroll down and click the Submit button"

# or target-driven instructions
npx @midscene/web@1 --bridge act --prompt "click the country dropdown and select Japan"

Disconnect

npx @midscene/web@1 --bridge disconnect

Workflow Pattern

Bridge mode connects to the user's real Chrome browser. Each CLI command establishes its own temporary connection, but the browser, tabs, and all state (cookies, login sessions) are always preserved regardless of whether you disconnect. This makes reconnecting lightweight and lossless.

Follow this pattern:

1. Connect to a URL to establish a session.
2. Take a screenshot to see the current state and make sure the page has loaded.
3. Execute the action: use act to perform the desired action or target-driven instruction.
4. Report results: summarize what was accomplished, present key findings and data extracted during the task, and list any generated files (screenshots, logs, etc.) with their paths.
5. Disconnect only when the user's overall task is fully complete. Do NOT disconnect if the user may have follow-up actions; keep the session available for continued interaction in subsequent conversation turns.

Best Practices

Always connect first: Navigate to the target URL with connect --url before any interaction.
Be specific about UI elements: Instead of "the button", say "the blue Submit button in the contact form".
Use natural language: Describe what you see on the page, not CSS selectors. Say "the red Buy Now button" instead of "#buy-btn".
Handle loading states: After navigation or actions that trigger page loads, take a screenshot to verify the page has loaded.
Disconnect only when fully done: Only disconnect when the user's overall task is completely finished and no follow-up actions are expected. In multi-turn conversations, skip the disconnect to allow continued browser interaction. Disconnecting is safe (it only closes the CLI-side bridge connection, not the browser or tabs), but reconnecting adds unnecessary overhead if the user wants to continue.
Never run in background: Every midscene command must run synchronously; background execution breaks the screenshot-analyze-act loop.
Batch related operations into a single act command: When performing consecutive operations within the same page, combine them into one prompt instead of splitting them into separate commands. For example, "fill in the email and password fields, then click the Login button" should be a single call, not three. This reduces round-trips, avoids unnecessary screenshot-analyze cycles, and is significantly faster.

Example — Dropdown selection:

npx @midscene/web@1 --bridge act --prompt "click the country dropdown and select Japan"
npx @midscene/web@1 --bridge take_screenshot

Example — Form interaction:

npx @midscene/web@1 --bridge act --prompt "fill in the email field with 'user@example.com' and the password field with 'pass123', then click the Log In button"
npx @midscene/web@1 --bridge take_screenshot
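The batching advice can be illustrated with a small helper that merges several same-page steps into one prompt string. This is a sketch only: `join_steps` is a hypothetical name, and the ", then " separator is simply one natural-language phrasing, not a syntax that Midscene requires.

```shell
# Hypothetical helper: joins its arguments into a single prompt, placing
# ", then " between consecutive steps.
join_steps() {
  prompt=""
  for step in "$@"; do
    if [ -z "$prompt" ]; then
      prompt="$step"
    else
      prompt="$prompt, then $step"
    fi
  done
  printf '%s\n' "$prompt"
}

# Possible use (one act call instead of three):
#   npx @midscene/web@1 --bridge act --prompt "$(join_steps \
#     "fill in the email field with 'user@example.com'" \
#     "fill in the password field with 'pass123'" \
#     "click the Log In button")"
```

Keeping the steps as separate arguments makes it easy to see what will be sent in one call, while still producing a single act command.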

Troubleshooting

Bridge Mode Connection Failures

Ask the user to check whether Chrome is open with the Midscene Extension installed and enabled.
The Midscene Extension can be installed from the Chrome Web Store: https://chromewebstore.google.com/detail/midscenejs/gbldofcpkknbggpkmbdaefngejllnief
Check that the 'bridge mode' indicator in the extension shows "Listening" status.
See the Bridge Mode documentation.

Timeouts

Web pages may take time to load. After connecting, take a screenshot to verify readiness before interacting.
For slow pages, wait briefly between steps.

Screenshots Not Displaying

The screenshot path is an absolute path to a local file. Use the Read tool to view it.
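If a screenshot does not display, a quick sanity check on the reported path can rule out the simplest causes. The function below is a minimal sketch under the assumption of POSIX-style paths. `screenshot_readable` is a hypothetical name, and the path must be copied from the command's output by hand, since the exact output format is not described here.

```shell
# Hypothetical helper: succeeds only if the argument is an absolute path to
# an existing, non-empty regular file.
screenshot_readable() {
  case "$1" in
    /*) ;;             # absolute POSIX-style path: keep checking
    *) return 1 ;;     # relative or empty path: reject
  esac
  [ -f "$1" ] && [ -s "$1" ]
}
```

A non-zero exit status suggests the path was copied incorrectly or the file was never written, in which case taking a new screenshot is the next thing to try.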

Weekly Installs: 651
Repository: web-infra-dev/m…e-skills
GitHub Stars: 141
First Seen: Mar 6, 2026
Security Audits: Gen Agent Trust Hub (Pass), Socket (Warn), Snyk (Fail)
Installed on: openclaw (396), codex (339), opencode (337), cursor (336), cline (333), gemini-cli (332)


Always report results after completion: After finishing the automation task, you MUST proactively present the results to the user without waiting for them to ask. This includes: (1) the answer to the user's original question or the outcome of the requested task, (2) key data extracted or observed during execution, (3) screenshots and other generated files with their paths, (4) a brief summary of steps taken. Do NOT silently finish after the last automation command; the user expects complete results in a single interaction.
