Claude for Safari - macOS Safari browser automation tool that controls a real browser session via AppleScript

claude-for-safari by sdlll/claude-for-safari

92 weekly installs

14 GitHub Stars

Install command

npx skills add https://github.com/sdlll/claude-for-safari --skill claude-for-safari

Automation, macOS, browser extension


Claude for Safari

Operate the user's real Safari browser on macOS via AppleScript (osascript) and screencapture. This provides full access to the user's actual browser session — including login state, cookies, and open tabs — without any extensions or additional software.

Prerequisites

Before first use, verify two settings are enabled. Run this check at the start of every session:

osascript -e 'tell application "Safari" to get name of front window' 2>&1

If this fails, instruct the user to enable:

System Settings > Privacy & Security > Automation — grant terminal app permission to control Safari
Safari > Settings > Advanced — enable "Show features for web developers", then Develop menu > Allow JavaScript from Apple Events
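
The single check above cannot tell which of the two settings is missing. A minimal sketch of a two-step preflight follows; check_safari_access is a hypothetical helper name, and it assumes that a failing do JavaScript call after a successful window query points to the JavaScript setting. A Safari instance with no open window will also fail the first step.

```shell
# Hypothetical helper (not part of the skill): reports which prerequisite
# appears to be missing. Prints "ok", "automation", or "javascript" and
# returns 0 only for "ok".
check_safari_access() {
  # Step 1: a plain Apple Event. Fails without Automation permission
  # (or when Safari has no open window).
  if ! osascript -e 'tell application "Safari" to get name of front window' >/dev/null 2>&1; then
    echo "automation"
    return 1
  fi
  # Step 2: a trivial script. Assumed to fail when
  # "Allow JavaScript from Apple Events" is switched off.
  if ! osascript -e 'tell application "Safari" to do JavaScript "1+1" in current tab of front window' >/dev/null 2>&1; then
    echo "javascript"
    return 1
  fi
  echo "ok"
}
```

Calling check_safari_access at session start and branching on its output lets the agent give the user the one instruction that applies.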

Core Capabilities

1. List All Open Tabs

osascript -e '
tell application "Safari"
  set output to ""
  repeat with w from 1 to (count of windows)
    repeat with t from 1 to (count of tabs of window w)
      set tabName to name of tab t of window w
      set tabURL to URL of tab t of window w
      set output to output & "W" & w & "T" & t & " | " & tabName & " | " & tabURL & linefeed
    end repeat
  end repeat
  return output
end tell'
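
The listing above prints one "W<window>T<tab> | name | URL" line per tab. As a rough sketch, a small filter can turn that listing into window and tab indices for later commands; find_tab is a hypothetical name, and it matches the pattern as a plain substring anywhere on the line (title or URL).

```shell
# find_tab PATTERN: reads "W<w>T<t> | name | url" lines on stdin and prints
# "<w> <t>" for the first line containing PATTERN. Returns 1 if none match.
find_tab() {
  awk -v pat="$1" '
    index($0, pat) > 0 && match($0, /^W[0-9]+T[0-9]+/) {
      id = substr($0, 2, RLENGTH - 1)   # e.g. "1T3"
      split(id, parts, "T")
      print parts[1], parts[2]
      found = 1
      exit
    }
    END { exit found ? 0 : 1 }
  '
}
```

The two numbers it prints can be substituted into the tab N of window M form, for example after piping the tab listing into find_tab github.com.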

2. Read Page Content

Read the full text content of the current tab:

osascript -e '
tell application "Safari"
  do JavaScript "document.body.innerText" in current tab of front window
end tell'

Read structured content (title, URL, meta description, headings):

osascript -e '
tell application "Safari"
  do JavaScript "JSON.stringify({
    title: document.title,
    url: location.href,
    description: document.querySelector(\"meta[name=description]\")?.content || \"\",
    h1: [...document.querySelectorAll(\"h1\")].map(e => e.textContent).join(\" | \"),
    h2: [...document.querySelectorAll(\"h2\")].map(e => e.textContent).join(\" | \")
  })" in current tab of front window
end tell'

Read a simplified DOM (similar to Chrome ACP's browser_read):

osascript -e '
tell application "Safari"
  do JavaScript "
    (function() {
      const walk = (node, depth) => {
        let result = \"\";
        for (const child of node.childNodes) {
          if (child.nodeType === 3) {
            const text = child.textContent.trim();
            if (text) result += text + \"\\n\";
          } else if (child.nodeType === 1) {
            const tag = child.tagName.toLowerCase();
            if ([\"script\",\"style\",\"noscript\",\"svg\"].includes(tag)) continue;
            const style = getComputedStyle(child);
            if (style.display === \"none\" || style.visibility === \"hidden\") continue;
            if ([\"h1\",\"h2\",\"h3\",\"h4\",\"h5\",\"h6\"].includes(tag))
              result += \"#\".repeat(parseInt(tag[1])) + \" \";
            if (tag === \"a\") result += \"[\";
            if (tag === \"img\") result += \"[Image: \" + (child.alt || \"\") + \"]\\n\";
            else if (tag === \"input\") result += \"[Input \" + child.type + \": \" + (child.value || child.placeholder || \"\") + \"]\\n\";
            else if (tag === \"button\") result += \"[Button: \" + child.textContent.trim() + \"]\\n\";
            else result += walk(child, depth + 1);
            if (tag === \"a\") result += \"](\" + child.href + \")\\n\";
            if ([\"p\",\"div\",\"li\",\"tr\",\"br\",\"h1\",\"h2\",\"h3\",\"h4\",\"h5\",\"h6\"].includes(tag))
              result += \"\\n\";
          }
        }
        return result;
      };
      return walk(document.body, 0).substring(0, 50000);
    })()
  " in current tab of front window
end tell'

3. Execute JavaScript

Run arbitrary JavaScript in the page context and get the return value:

osascript -e '
tell application "Safari"
  do JavaScript "YOUR_JS_CODE_HERE" in current tab of front window
end tell'

For multi-line scripts, use a heredoc:

osascript << 'APPLESCRIPT'
tell application "Safari"
  do JavaScript "
    (function() {
      // Multi-line JS here
      return 'result';
    })()
  " in current tab of front window
end tell
APPLESCRIPT
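
Embedding JavaScript inside an AppleScript string inside a shell string means escaping every double quote and backslash by hand. One way around this, sketched below, is to pass the script as an argument to the AppleScript run handler; safari_js is a hypothetical wrapper name, and the sketch assumes the JavaScript does not begin with a hyphen (which osascript might read as an option).

```shell
# safari_js JS: runs JS in the current tab of the front Safari window.
# The script text travels as an argument to the AppleScript run handler,
# so quotes and backslashes in JS need no extra AppleScript escaping.
safari_js() {
  osascript \
    -e 'on run argv' \
    -e 'tell application "Safari" to do JavaScript (item 1 of argv) in current tab of front window' \
    -e 'end run' \
    "$1"
}
```

With this wrapper, a call such as safari_js 'document.querySelector("h1").textContent' needs only ordinary shell quoting.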

4. Screenshot

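Two approaches are available. Detect which one to use at the start of the session, after the helper shown in the next subsection has been compiled:

# Succeeds only if the compiled helper exists and finds an on-screen Safari window.
# It may succeed even without Screen Recording permission, so inspect the first screenshot.
/tmp/safari_wid 2>/dev/null && echo "BACKGROUND_SCREENSHOT=true" || echo "BACKGROUND_SCREENSHOT=false"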

Background Screenshot (requires Screen Recording permission)

If the user has granted Screen Recording permission to the terminal app, use screencapture -l to capture Safari without activating it:

# Compile the helper once per session (if not already compiled)
if [ ! -f /tmp/safari_wid ]; then
cat > /tmp/safari_wid.swift << 'SWIFT'
import CoreGraphics
import Foundation
let options: CGWindowListOption = [.optionOnScreenOnly, .excludeDesktopElements]
guard let windowList = CGWindowListCopyWindowInfo(options, kCGNullWindowID) as? [[String: Any]] else { exit(1) }
for window in windowList {
    guard let owner = window[kCGWindowOwnerName as String] as? String,
          owner == "Safari",
          let layer = window[kCGWindowLayer as String] as? Int,
          layer == 0,
          let wid = window[kCGWindowNumber as String] as? Int else { continue }
    print(wid)
    exit(0)
}
exit(1)
SWIFT
swiftc /tmp/safari_wid.swift -o /tmp/safari_wid
fi

# Capture Safari window in background (no activation needed)
WID=$(/tmp/safari_wid)
screencapture -l "$WID" -o -x /tmp/safari_screenshot.png

To enable this, instruct the user: System Settings > Privacy & Security > Screen Recording — grant permission to the terminal app (Terminal / iTerm / Warp).

Foreground Screenshot (no extra permissions needed)

If Screen Recording is not granted, fall back to region-based capture. This briefly activates Safari (~0.5s), then switches back:

# Remember current frontmost app
FRONT_APP=$(osascript -e 'tell application "System Events" to get name of first process whose frontmost is true')

# Activate Safari and capture its window region
osascript -e 'tell application "Safari" to activate'
sleep 0.3
BOUNDS=$(osascript -e '
tell application "System Events"
  tell process "Safari"
    -- Safari may expose a thin toolbar as window 1; find the largest window
    set bestW to 0
    set bestBounds to ""
    repeat with i from 1 to (count of windows)
      set {x, y} to position of window i
      set {w, h} to size of window i
      if w * h > bestW then
        set bestW to w * h
        set bestBounds to (x as text) & "," & (y as text) & "," & (w as text) & "," & (h as text)
      end if
    end repeat
    return bestBounds
  end tell
end tell')
screencapture -x -R "$BOUNDS" /tmp/safari_screenshot.png

# Switch back to the previous app
osascript -e "tell application \"$FRONT_APP\" to activate"
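
Reading window position and size through System Events, as above, typically requires Accessibility permission for the terminal app. A possible alternative, sketched here under the assumption that Safari's own bounds property reports left, top, right, bottom in the same coordinate space that screencapture -R uses, is to query Safari directly and convert the result. bounds_to_region is a hypothetical helper name.

```shell
# bounds_to_region "L, T, R, B": converts AppleScript window bounds
# (left, top, right, bottom) into the "x,y,w,h" form used by screencapture -R.
bounds_to_region() {
  echo "$1" | awk -F', *' '{ printf "%d,%d,%d,%d\n", $1, $2, $3 - $1, $4 - $2 }'
}

# Usage on macOS (shown as comments; assumes Safari has an open window):
#   RAW=$(osascript -e 'tell application "Safari" to get bounds of front window')
#   screencapture -x -R "$(bounds_to_region "$RAW")" /tmp/safari_screenshot.png
```

Safari still has to be frontmost for the region capture to show its window, so the activate and switch-back steps above stay the same.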

After capturing with either method, read the screenshot to see what's on screen:

Use the Read tool on /tmp/safari_screenshot.png to view the captured image.

5. Navigate

Open a URL in the current tab:

osascript -e '
tell application "Safari"
  set URL of current tab of front window to "https://example.com"
end tell'

Open a URL in a new tab:

osascript -e '
tell application "Safari"
  tell front window
    set newTab to make new tab with properties {URL:"https://example.com"}
    set current tab to newTab
  end tell
end tell'

Open a URL in a new window:

osascript -e 'tell application "Safari" to make new document with properties {URL:"https://example.com"}'

6. Click Elements

Click using JavaScript (preferred — works with SPAs and reactive frameworks):

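osascript -e '
tell application "Safari"
  do JavaScript "
    (function() {
      // The IIFE keeps const declarations out of the page global scope,
      // so the snippet can be run repeatedly on the same page
      const el = document.querySelector(\"button.submit\");
      if (!el) return \"element not found\";
      el.dispatchEvent(new MouseEvent(\"click\", {bubbles: true, cancelable: true}));
      return \"clicked\";
    })()
  " in current tab of front window
end tell'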

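Important: Use dispatchEvent(new MouseEvent(..., {bubbles: true})) when a handler depends on specific event options. Native .click() also fires a bubbling click event and works with most React/Vue/Angular handlers, but neither approach fires mousedown, mouseup, or pointer events.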

7. Type and Fill Forms

Set input values via JavaScript:

osascript -e '
tell application "Safari"
  do JavaScript "
    (function() {
      const input = document.querySelector(\"input[name=search]\");
      if (!input) return \"input not found\";
      // Call the native value setter so framework-patched setters are bypassed
      const nativeSetter = Object.getOwnPropertyDescriptor(window.HTMLInputElement.prototype, \"value\").set;
      nativeSetter.call(input, \"search text\");
      input.dispatchEvent(new Event(\"input\", {bubbles: true}));
      input.dispatchEvent(new Event(\"change\", {bubbles: true}));
      return \"filled\";
    })()
  " in current tab of front window
end tell'

Important: For React-controlled inputs, use the native setter + dispatchEvent pattern shown above. Directly setting .value will not trigger React's state update.

Type via System Events (simulates real keyboard — useful when JS injection is blocked):

osascript -e '
tell application "Safari" to activate
delay 0.3
tell application "System Events"
  keystroke "hello world"
end tell'

Press special keys:

osascript -e '
tell application "System Events"
  key code 36  -- Enter/Return
  key code 48  -- Tab
  key code 51  -- Delete/Backspace
  keystroke "a" using command down  -- Cmd+A (select all)
  keystroke "c" using command down  -- Cmd+C (copy)
end tell'

8. Scroll

# Scroll down 500px
osascript -e 'tell application "Safari" to do JavaScript "window.scrollBy(0, 500)" in current tab of front window'

# Scroll to top
osascript -e 'tell application "Safari" to do JavaScript "window.scrollTo(0, 0)" in current tab of front window'

# Scroll to bottom
osascript -e 'tell application "Safari" to do JavaScript "window.scrollTo(0, document.body.scrollHeight)" in current tab of front window'

# Scroll element into view
osascript -e 'tell application "Safari" to do JavaScript "document.querySelector(\"#target\")?.scrollIntoView({behavior: \"smooth\"})" in current tab of front window'

9. Switch Tabs

# Switch to tab 2 in the front window
osascript -e 'tell application "Safari" to set current tab of front window to tab 2 of front window'

# Switch to a tab by URL match
osascript -e '
tell application "Safari"
  repeat with t from 1 to (count of tabs of front window)
    if URL of tab t of front window contains "github.com" then
      set current tab of front window to tab t of front window
      exit repeat
    end if
  end repeat
end tell'

10. Wait for Page Load

osascript -e '
tell application "Safari"
  -- Wait until page finishes loading (max 10 seconds)
  repeat 20 times
    set readyState to do JavaScript "document.readyState" in current tab of front window
    if readyState is "complete" then exit repeat
    delay 0.5
  end repeat
end tell'
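
The AppleScript loop above stops after 10 seconds without telling the caller whether the page ever finished loading. A minimal shell-side sketch that reports a timeout through its exit status might look like this; wait_for_ready and its two parameters are hypothetical names, with defaults chosen to match the 20 polls at 0.5 s used above.

```shell
# wait_for_ready [TRIES] [INTERVAL]: polls document.readyState in the current tab.
# Returns 0 once the page reports "complete", 1 if TRIES polls pass first.
wait_for_ready() {
  tries=${1:-20}
  interval=${2:-0.5}
  while [ "$tries" -gt 0 ]; do
    state=$(osascript -e 'tell application "Safari" to do JavaScript "document.readyState" in current tab of front window' 2>/dev/null)
    [ "$state" = "complete" ] && return 0
    tries=$((tries - 1))
    sleep "$interval"
  done
  return 1
}
```

A caller can then write wait_for_ready || echo "page did not finish loading" and decide whether to retry or take a screenshot anyway. Note that readyState only covers the initial document load; content that a single-page app fetches later is not reflected in it.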

Workflow: Browsing with Screenshot Feedback Loop

For tasks that require visual confirmation, use the screenshot loop:

Perform action (navigate, click, scroll, etc.)
Wait for page load if needed
Take screenshot (background or foreground) → Read the image to see result
Decide next action based on what is visible

Operating on Specific Tabs

To operate on a tab other than the current one, use tab N of window M syntax:

# Read content of tab 3 in window 1
osascript -e 'tell application "Safari" to do JavaScript "document.title" in tab 3 of window 1'

# Execute JS in a specific tab
osascript -e 'tell application "Safari" to do JavaScript "document.body.innerText.substring(0, 1000)" in tab 2 of front window'

Note: Background screenshots capture the entire Safari window (whichever tab is active). To screenshot a specific tab, first switch to it via AppleScript.

Limitations

macOS only — AppleScript and screencapture are macOS-specific
Cannot intercept network requests — only page content and JS execution
Cannot access cross-origin iframes — browser security applies
Private browsing windows — AppleScript cannot control private windows
System Events keystroke is "blind" — it types into whatever is focused; ensure Safari is frontmost before using
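
The last limitation can be partly guarded against in code. The sketch below checks which process is frontmost, using the same System Events query as the foreground screenshot section, and refuses to type otherwise. safari_is_frontmost and type_into_safari are hypothetical helper names, and the check cannot tell which element inside Safari has focus.

```shell
# safari_is_frontmost: succeeds only when Safari is the frontmost process.
safari_is_frontmost() {
  front=$(osascript -e 'tell application "System Events" to get name of first process whose frontmost is true' 2>/dev/null)
  [ "$front" = "Safari" ]
}

# type_into_safari TEXT: sends keystrokes only if Safari is frontmost.
type_into_safari() {
  if ! safari_is_frontmost; then
    echo "Safari is not frontmost; not typing" >&2
    return 1
  fi
  # TEXT is passed as a run-handler argument, so it needs no AppleScript escaping.
  osascript \
    -e 'on run argv' \
    -e 'tell application "System Events" to keystroke (item 1 of argv)' \
    -e 'end run' \
    "$1"
}
```

There is still a short window between the check and the keystroke in which focus can change, so this reduces the risk rather than removing it.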

Repository: sdlll/claude-for-safari

First Seen: Feb 28, 2026

Security Audits: Gen Agent Trust Hub (Warn), Socket (Pass), Snyk (Fail)

Installed on: codex 91, opencode 91, kimi-cli 89, gemini-cli 89, amp 89, cline 89
