claude-for-safari by sdlll/claude-for-safari
npx skills add https://github.com/sdlll/claude-for-safari --skill claude-for-safari
Operate the user's real Safari browser on macOS via AppleScript (osascript) and screencapture. This provides full access to the user's actual browser session — including login state, cookies, and open tabs — without any extensions or additional software.
Before first use, verify two settings are enabled. Run this check at the start of every session:
osascript -e 'tell application "Safari" to get name of front window' 2>&1
If this fails, instruct the user to enable the two required settings before continuing. Every do JavaScript call below also depends on Safari's "Allow JavaScript from Apple Events" developer setting.
List every open tab in every window:
osascript -e '
tell application "Safari"
    set output to ""
    repeat with w from 1 to (count of windows)
        repeat with t from 1 to (count of tabs of window w)
            set tabName to name of tab t of window w
            set tabURL to URL of tab t of window w
            set output to output & "W" & w & "T" & t & " | " & tabName & " | " & tabURL & linefeed
        end repeat
    end repeat
    return output
end tell'
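The listing script prints one line per tab in the form W<window>T<tab> | title | URL. As a minimal sketch, that output can be filtered in the shell to locate a tab by URL. find_tab is a hypothetical helper name, not part of the skill. It assumes URLs never contain the " | " separator, and it reads the URL from the last field because page titles may contain that separator.

```shell
# find_tab NEEDLE: read the tab listing on stdin and print the W<w>T<t>
# identifier of every tab whose URL contains NEEDLE.
# Hypothetical helper; assumes the "W1T2 | title | url" line format.
find_tab() {
  needle="$1" awk -F' [|] ' '
    # The URL is the last field; titles may themselves contain " | ".
    NF >= 3 && index($NF, ENVIRON["needle"]) > 0 { print $1 }
  '
}
```

Piping the listing into find_tab github.com would print identifiers such as W1T2, which map directly onto the tab N of window M syntax.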
Read the full text content of the current tab:
osascript -e '
tell application "Safari"
do JavaScript "document.body.innerText" in current tab of front window
end tell'
Read structured content (title, URL, meta description, headings):
osascript -e '
tell application "Safari"
do JavaScript "JSON.stringify({
title: document.title,
url: location.href,
description: document.querySelector(\"meta[name=description]\")?.content || \"\",
h1: [...document.querySelectorAll(\"h1\")].map(e => e.textContent).join(\" | \"),
h2: [...document.querySelectorAll(\"h2\")].map(e => e.textContent).join(\" | \")
})" in current tab of front window
end tell'
Read a simplified DOM (similar to Chrome ACP's browser_read):
osascript -e '
tell application "Safari"
    do JavaScript "
        (function() {
            const walk = (node, depth) => {
                let result = \"\";
                for (const child of node.childNodes) {
                    if (child.nodeType === 3) {
                        const text = child.textContent.trim();
                        if (text) result += text + \"\\n\";
                    } else if (child.nodeType === 1) {
                        const tag = child.tagName.toLowerCase();
                        if ([\"script\",\"style\",\"noscript\",\"svg\"].includes(tag)) continue;
                        const style = getComputedStyle(child);
                        if (style.display === \"none\" || style.visibility === \"hidden\") continue;
                        if ([\"h1\",\"h2\",\"h3\",\"h4\",\"h5\",\"h6\"].includes(tag))
                            result += \"#\".repeat(parseInt(tag[1])) + \" \";
                        if (tag === \"a\") result += \"[\";
                        if (tag === \"img\") result += \"[Image: \" + (child.alt || \"\") + \"]\\n\";
                        else if (tag === \"input\") result += \"[Input \" + child.type + \": \" + (child.value || child.placeholder || \"\") + \"]\\n\";
                        else if (tag === \"button\") result += \"[Button: \" + child.textContent.trim() + \"]\\n\";
                        else result += walk(child, depth + 1);
                        if (tag === \"a\") result += \"](\" + child.href + \")\\n\";
                        if ([\"p\",\"div\",\"li\",\"tr\",\"br\",\"h1\",\"h2\",\"h3\",\"h4\",\"h5\",\"h6\"].includes(tag))
                            result += \"\\n\";
                    }
                }
                return result;
            };
            return walk(document.body, 0).substring(0, 50000);
        })()
    " in current tab of front window
end tell'
Run arbitrary JavaScript in the page context and get the return value:
osascript -e '
tell application "Safari"
do JavaScript "YOUR_JS_CODE_HERE" in current tab of front window
end tell'
For multi-line scripts, use a heredoc:
osascript << 'APPLESCRIPT'
tell application "Safari"
do JavaScript "
(function() {
// Multi-line JS here
return 'result';
})()
" in current tab of front window
end tell
APPLESCRIPT
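Both forms above embed the JavaScript inside an AppleScript string literal, so every double quote and backslash in the JS has to be escaped. One way to sidestep that, sketched here under assumptions, is to hand the JS to the script as a command-line argument: osascript forwards trailing arguments to the script's run handler. safari_js is a hypothetical wrapper name, and the sketch assumes the JS does not begin with a dash, which osascript might read as an option.

```shell
# safari_js JS: evaluate JS in the current tab of the front Safari window
# and print the result. Hypothetical wrapper, not part of the skill.
safari_js() {
  # The JS arrives as item 1 of argv, so it needs no AppleScript escaping.
  osascript \
    -e 'on run argv' \
    -e 'tell application "Safari" to do JavaScript (item 1 of argv) in current tab of front window' \
    -e 'end run' \
    "$1"
}
```

With this wrapper a call such as safari_js 'document.querySelector("h1").textContent' only has to satisfy shell quoting.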
Two screenshot approaches are available. Auto-detect which to use at session start, once the /tmp/safari_wid helper shown below has been compiled (before that, this check always reports false):
# Test if Screen Recording permission is granted (background screenshot available)
/tmp/safari_wid >/dev/null 2>&1 && echo "BACKGROUND_SCREENSHOT=true" || echo "BACKGROUND_SCREENSHOT=false"
If the user has granted Screen Recording permission to the terminal app, use screencapture -l to capture Safari without activating it:
# Compile the helper once per session (if not already compiled)
if [ ! -f /tmp/safari_wid ]; then
cat > /tmp/safari_wid.swift << 'SWIFT'
import CoreGraphics
import Foundation

let options: CGWindowListOption = [.optionOnScreenOnly, .excludeDesktopElements]
guard let windowList = CGWindowListCopyWindowInfo(options, kCGNullWindowID) as? [[String: Any]] else { exit(1) }
for window in windowList {
    guard let owner = window[kCGWindowOwnerName as String] as? String,
          owner == "Safari",
          let layer = window[kCGWindowLayer as String] as? Int,
          layer == 0,
          let wid = window[kCGWindowNumber as String] as? Int else { continue }
    // Print the first normal-layer Safari window ID and stop
    print(wid)
    exit(0)
}
exit(1)
SWIFT
swiftc /tmp/safari_wid.swift -o /tmp/safari_wid
fi
# Capture Safari window in background (no activation needed)
WID=$(/tmp/safari_wid)
screencapture -l "$WID" -o -x /tmp/safari_screenshot.png
To enable this, instruct the user: System Settings > Privacy & Security > Screen Recording — grant permission to the terminal app (Terminal / iTerm / Warp).
If Screen Recording is not granted, fall back to region-based capture. This briefly activates Safari (~0.5s), then switches back:
# Remember current frontmost app
FRONT_APP=$(osascript -e 'tell application "System Events" to get name of first process whose frontmost is true')
# Activate Safari and capture its window region
osascript -e 'tell application "Safari" to activate'
sleep 0.3
BOUNDS=$(osascript -e '
tell application "System Events"
    tell process "Safari"
        -- Safari may expose a thin toolbar as window 1; find the largest window
        set bestW to 0
        set bestBounds to ""
        repeat with i from 1 to (count of windows)
            set {x, y} to position of window i
            set {w, h} to size of window i
            if w * h > bestW then
                set bestW to w * h
                set bestBounds to (x as text) & "," & (y as text) & "," & (w as text) & "," & (h as text)
            end if
        end repeat
        return bestBounds
    end tell
end tell')
screencapture -x -R "$BOUNDS" /tmp/safari_screenshot.png
# Switch back to the previous app
osascript -e "tell application \"$FRONT_APP\" to activate"
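The fallback passes "$BOUNDS" straight to screencapture -R, which expects x,y,w,h. If Safari has no window, the AppleScript returns an empty string and the capture fails. A minimal sketch of a guard follows; valid_bounds is a hypothetical helper name, and treating a zero width or height as invalid is an assumption.

```shell
# valid_bounds "x,y,w,h": succeed only for four comma-separated integers
# with positive width and height. Hypothetical helper, not part of the skill.
valid_bounds() {
  old_ifs=$IFS
  IFS=,
  set -- $1            # split the argument on commas
  IFS=$old_ifs
  [ "$#" -eq 4 ] || return 1
  for n in "$@"; do
    case "$n" in
      ''|-|*[!0-9-]*|?*-*) return 1 ;;   # reject anything but an integer
    esac
  done
  # x and y may be negative on multi-display setups; w and h may not.
  [ "$3" -gt 0 ] && [ "$4" -gt 0 ]
}
```

A capture step could then read: valid_bounds "$BOUNDS" && screencapture -x -R "$BOUNDS" /tmp/safari_screenshot.png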
After capturing with either method, read the screenshot to see what's on screen:
Use the Read tool on /tmp/safari_screenshot.png to view the captured image.
Open a URL in the current tab:
osascript -e '
tell application "Safari"
set URL of current tab of front window to "https://example.com"
end tell'
Open a URL in a new tab:
osascript -e '
tell application "Safari"
tell front window
set newTab to make new tab with properties {URL:"https://example.com"}
set current tab to newTab
end tell
end tell'
Open a URL in a new window:
osascript -e 'tell application "Safari" to make new document with properties {URL:"https://example.com"}'
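When the URL comes from a variable rather than a literal, splicing it into the AppleScript source invites quoting mistakes. The sketch below passes the URL as an argument to the script's run handler instead. safari_open is a hypothetical helper name, and restricting input to http and https URLs is an assumption added for safety, not something the skill requires.

```shell
# safari_open URL: load URL in the current tab of the front Safari window.
# Hypothetical helper; rejects anything that is not an http(s) URL.
safari_open() {
  case "$1" in
    http://*|https://*) ;;
    *) echo "refusing non-http(s) URL: $1" >&2; return 2 ;;
  esac
  osascript \
    -e 'on run argv' \
    -e 'tell application "Safari" to set URL of current tab of front window to (item 1 of argv)' \
    -e 'end run' \
    "$1"
}
```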
Click using JavaScript (preferred — works with SPAs and reactive frameworks):
osascript -e '
tell application "Safari"
    do JavaScript "
        (function() {
            // The function scope keeps el from colliding with an earlier run on the same page
            const el = document.querySelector(\"button.submit\");
            if (!el) return \"element not found\";
            el.dispatchEvent(new MouseEvent(\"click\", {bubbles: true, cancelable: true}));
            return \"clicked\";
        })()
    " in current tab of front window
end tell'
Important: Use dispatchEvent(new MouseEvent(..., {bubbles: true})) instead of .click() for React/Vue/Angular compatibility. Native .click() may bypass synthetic event handlers.
Set input values via JavaScript:
osascript -e '
tell application "Safari"
    do JavaScript "
        (function() {
            const input = document.querySelector(\"input[name=search]\");
            if (!input) return \"element not found\";
            // Call the prototype setter so framework-patched value setters are bypassed
            const nativeSetter = Object.getOwnPropertyDescriptor(window.HTMLInputElement.prototype, \"value\").set;
            nativeSetter.call(input, \"search text\");
            input.dispatchEvent(new Event(\"input\", {bubbles: true}));
            input.dispatchEvent(new Event(\"change\", {bubbles: true}));
            return \"value set\";
        })()
    " in current tab of front window
end tell'
Important: For React-controlled inputs, use the native setter + dispatchEvent pattern shown above. Directly setting .value will not trigger React's state update.
Type via System Events (simulates real keyboard — useful when JS injection is blocked):
osascript -e '
tell application "Safari" to activate
delay 0.3
tell application "System Events"
keystroke "hello world"
end tell'
Press special keys:
osascript -e '
tell application "System Events"
key code 36 -- Enter/Return
key code 48 -- Tab
key code 51 -- Delete/Backspace
keystroke "a" using command down -- Cmd+A (select all)
keystroke "c" using command down -- Cmd+C (copy)
end tell'
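Raw key codes are easy to mistype. As a small sketch, the codes listed above can be put behind names. key_code_for and press_key are hypothetical helper names, and only the three codes given in this document are mapped.

```shell
# key_code_for NAME: print the macOS virtual key code for a named key.
# Only the codes listed in this document are included.
key_code_for() {
  case "$1" in
    return|enter)     echo 36 ;;
    tab)              echo 48 ;;
    delete|backspace) echo 51 ;;
    *) echo "unknown key: $1" >&2; return 1 ;;
  esac
}

# press_key NAME: send the named key to whichever app is frontmost.
press_key() {
  code=$(key_code_for "$1") || return 1
  osascript -e "tell application \"System Events\" to key code $code"
}
```

Because System Events types into the frontmost app, Safari should be activated before calling press_key return.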
# Scroll down 500px
osascript -e 'tell application "Safari" to do JavaScript "window.scrollBy(0, 500)" in current tab of front window'
# Scroll to top
osascript -e 'tell application "Safari" to do JavaScript "window.scrollTo(0, 0)" in current tab of front window'
# Scroll to bottom
osascript -e 'tell application "Safari" to do JavaScript "window.scrollTo(0, document.body.scrollHeight)" in current tab of front window'
# Scroll element into view
osascript -e 'tell application "Safari" to do JavaScript "document.querySelector(\"#target\").scrollIntoView({behavior: \"smooth\"})" in current tab of front window'
# Switch to tab 2 in the front window
osascript -e 'tell application "Safari" to set current tab of front window to tab 2 of front window'
# Switch to a tab by URL match
osascript -e '
tell application "Safari"
repeat with t from 1 to (count of tabs of front window)
if URL of tab t of front window contains "github.com" then
set current tab of front window to tab t of front window
exit repeat
end if
end repeat
end tell'
Wait for the page to finish loading:
osascript -e '
tell application "Safari"
    -- Wait until page finishes loading (max 10 seconds)
    repeat 20 times
        set readyState to do JavaScript "document.readyState" in current tab of front window
        if readyState is "complete" then exit repeat
        delay 0.5
    end repeat
end tell'
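The AppleScript loop above continues silently once its 10 seconds are up, so the caller cannot tell a loaded page from a timeout. A shell-side variant, sketched here under assumptions, polls the same way but returns a non-zero status on timeout. wait_for_js is a hypothetical helper name; the 0.5 second interval and the default of 20 tries mirror the loop above.

```shell
# wait_for_js EXPR WANT [TRIES]: poll the JS expression EXPR in the current
# tab until its printed result equals WANT. Returns 1 if TRIES polls
# (default 20, about 10 seconds) pass without a match. Hypothetical helper.
wait_for_js() {
  expr="$1"; want="$2"; tries="${3:-20}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    got=$(osascript \
      -e 'on run argv' \
      -e 'tell application "Safari" to do JavaScript (item 1 of argv) in current tab of front window' \
      -e 'end run' \
      "$expr" 2>/dev/null)
    [ "$got" = "$want" ] && return 0
    i=$((i + 1))
    sleep 0.5
  done
  return 1
}
```

For example, wait_for_js 'document.readyState' complete || echo "page did not finish loading" reports the timeout instead of carrying on.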
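For tasks that require visual confirmation, use a screenshot loop: capture the Safari window, read the image, perform the next action, then capture again to verify the result.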
To operate on a tab other than the current one, use tab N of window M syntax:
# Read content of tab 3 in window 1
osascript -e 'tell application "Safari" to do JavaScript "document.title" in tab 3 of window 1'
# Execute JS in a specific tab
osascript -e 'tell application "Safari" to do JavaScript "document.body.innerText.substring(0, 1000)" in tab 2 of front window'
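The tab listing labels each tab as W<window>T<tab>, while AppleScript wants tab N of window M. A minimal sketch of the conversion follows; tab_specifier is a hypothetical helper name, and it accepts only the exact W<digits>T<digits> form.

```shell
# tab_specifier ID: turn an identifier such as W1T3 into the AppleScript
# phrase "tab 3 of window 1". Fails on anything that is not W<digits>T<digits>.
# Hypothetical helper, not part of the skill.
tab_specifier() {
  w=${1#W}; w=${w%%T*}     # digits between W and T
  t=${1##*T}               # digits after T
  case "$w$t" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ -n "$w" ] && [ -n "$t" ] || return 1
  [ "$1" = "W${w}T${t}" ] || return 1
  printf 'tab %s of window %s\n' "$t" "$w"
}
```

The result can be spliced into a command, for example: osascript -e "tell application \"Safari\" to do JavaScript \"document.title\" in $(tab_specifier W1T3)"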
Note: Background screenshots capture the entire Safari window (whichever tab is active). To screenshot a specific tab, first switch to it via AppleScript.
Weekly Installs: 92
Repository: sdlll/claude-for-safari
GitHub Stars: 14
First Seen: Feb 28, 2026
Security Audits: Gen Agent Trust Hub: Warn; Socket: Pass; Snyk: Fail
Installed on: codex (91), opencode (91), kimi-cli (89), gemini-cli (89), amp (89), cline (89)