automate-this by github/awesome-copilot
npx skills add https://github.com/github/awesome-copilot --skill automate-this
Analyze a screen recording of a manual process and build working automation for it.
The user records themselves doing something repetitive or tedious, hands you the video file, and you figure out what they're doing, why, and how to script it away.
Before analyzing any recording, verify the required tools are available. Run these checks silently and only surface problems:
command -v ffmpeg >/dev/null 2>&1 && ffmpeg -version 2>/dev/null | head -1 || echo "NO_FFMPEG"
command -v whisper >/dev/null 2>&1 || command -v whisper-cpp >/dev/null 2>&1 || echo "NO_WHISPER"
If ffmpeg is missing, offer to install it: brew install ffmpeg (macOS) or the equivalent for their OS. If no whisper binary is found, offer pip install openai-whisper or brew install whisper-cpp. If the user declines, proceed with visual analysis only.

Given a video file path (typically on ~/Desktop/), extract both visual frames and audio:
Extract frames at one frame every 2 seconds. This balances coverage with context window limits.
WORK_DIR=$(mktemp -d "${TMPDIR:-/tmp}/automate-this-XXXXXX")
chmod 700 "$WORK_DIR"
mkdir -p "$WORK_DIR/frames"
ffmpeg -y -i "<VIDEO_PATH>" -vf "fps=0.5" -q:v 2 -loglevel warning "$WORK_DIR/frames/frame_%04d.jpg"
ls "$WORK_DIR/frames/" | wc -l
Use $WORK_DIR for all subsequent temp file paths in the session. The per-run directory with mode 0700 ensures extracted frames are only readable by the current user.
If the recording is longer than 5 minutes (more than 150 frames), increase the interval to one frame every 4 seconds to stay within context limits. Tell the user you're sampling less frequently for longer recordings.
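The sampling decision above can be wrapped in a small helper. The function below is a sketch; it assumes the duration comes from ffprobe's plain-seconds output:

```shell
# choose_fps DURATION_SECONDS -> prints the ffmpeg fps value to use.
# Recordings up to 5 minutes sample at 0.5 fps (one frame per 2 s, at most
# 150 frames); longer ones drop to 0.25 fps to stay within context limits.
choose_fps() {
  local secs="${1%.*}"            # strip fractional seconds from ffprobe output
  if [ "${secs:-0}" -le 300 ]; then
    echo "0.5"
  else
    echo "0.25"
  fi
}

# Illustrative use:
#   DURATION=$(ffprobe -v error -show_entries format=duration -of csv=p=0 "<VIDEO_PATH>")
#   ffmpeg -y -i "<VIDEO_PATH>" -vf "fps=$(choose_fps "$DURATION")" -q:v 2 \
#     -loglevel warning "$WORK_DIR/frames/frame_%04d.jpg"
```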
Check if the video has an audio track:
ffprobe -i "<VIDEO_PATH>" -show_streams -select_streams a -loglevel error | head -5
If audio exists:
ffmpeg -y -i "<VIDEO_PATH>" -ac 1 -ar 16000 -loglevel warning "$WORK_DIR/audio.wav"
# Use whichever whisper binary is available
if command -v whisper >/dev/null 2>&1; then
whisper "$WORK_DIR/audio.wav" --model small --language en --output_format txt --output_dir "$WORK_DIR/"
cat "$WORK_DIR/audio.txt"
elif command -v whisper-cpp >/dev/null 2>&1; then
whisper-cpp -m "$(brew --prefix 2>/dev/null)/share/whisper-cpp/models/ggml-small.bin" -l en -f "$WORK_DIR/audio.wav" -otxt -of "$WORK_DIR/audio"
cat "$WORK_DIR/audio.txt"
else
echo "NO_WHISPER"
fi
If neither whisper binary is available and the recording has audio, inform the user they're missing narration context and ask if they want to install Whisper (pip install openai-whisper or brew install whisper-cpp) or proceed with visual-only analysis.
Analyze the extracted frames (and transcript, if available) to build a structured understanding of what the user did. Work through the frames sequentially and identify:
Present this reconstruction to the user as a numbered step list and ask them to confirm it's accurate before proposing automation. This is critical — a wrong understanding leads to useless automation.
Format:
Here's what I see you doing in this recording:
1. Open Chrome and navigate to [specific URL]
2. Log in with credentials
3. Click through to the reporting dashboard
4. Download a CSV export
5. Open the CSV in Excel
6. Filter rows where column B is "pending"
7. Copy those rows into a new spreadsheet
8. Email the new spreadsheet to [recipient]
You repeated steps 3-8 three times for different report types.
[If narration was present]: You mentioned that the export step is the slowest
part and that you do this every Monday morning.
Does this match what you were doing? Anything I got wrong or missed?
Do NOT proceed to Phase 3 until the user confirms the reconstruction is accurate.
Before proposing automation, understand what the user actually has to work with. Run these checks:
echo "=== OS ===" && uname -a
echo "=== Shell ===" && echo $SHELL
echo "=== Python ===" && { command -v python3 && python3 --version 2>&1; } || echo "not installed"
echo "=== Node ===" && { command -v node && node --version 2>&1; } || echo "not installed"
echo "=== Homebrew ===" && { command -v brew && echo "installed"; } || echo "not installed"
echo "=== Common Tools ===" && for cmd in curl jq playwright selenium osascript automator crontab; do command -v $cmd >/dev/null 2>&1 && echo "$cmd: yes" || echo "$cmd: no"; done
Use this to constrain proposals to tools the user already has. Never propose automation that requires installing five new things unless the simpler path genuinely doesn't work.
Based on the reconstructed process and the user's environment, propose automation at up to three tiers. Not every process needs three tiers — use judgment.
Tier 1 — Quick Win (under 5 minutes to set up) The smallest useful automation. A shell alias, a one-liner, a keyboard shortcut, an AppleScript snippet. Automates the single most painful step, not the whole process.
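For the example reconstruction, a Tier 1 quick win could be as small as one alias. The name and URL below are hypothetical:

```shell
# One command replaces "open Chrome, navigate, click through": jump straight
# to the dashboard. Uses macOS `open`; substitute xdg-open on Linux.
alias monday-report='open "https://example.com/reports/dashboard"'
```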
Tier 2 — Script (under 30 minutes to set up) A standalone script (bash, Python, or Node — whichever the user has) that automates the full process end-to-end. Handles common errors. Can be run manually when needed.
Tier 3 — Full Automation (under 2 hours to set up) The script from Tier 2, plus: scheduled execution (cron, launchd, or GitHub Actions), logging, error notifications, and any necessary integration scaffolding (API keys, auth tokens, etc.).
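As a sketch of the Tier 3 scheduling piece (script and log paths are hypothetical; on macOS prefer a launchd plist as noted below, but the cron line shows the shape):

```shell
# Run the Tier 2 script every Monday at 08:00, appending output to a log.
SCRIPT="$HOME/bin/weekly-report.sh"
LOG="$HOME/.local/state/weekly-report.log"
CRON_LINE="0 8 * * 1 $SCRIPT >> $LOG 2>&1"

# Append without clobbering existing entries (left commented in this sketch):
#   ( crontab -l 2>/dev/null; echo "$CRON_LINE" ) | crontab -
echo "$CRON_LINE"
```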
For each tier, provide:
## Tier [N]: [Name]
**What it automates:** [Which steps from the reconstruction]
**What stays manual:** [Which steps still need a human]
**Time savings:** [Estimated time saved per run, based on the recording length and repetition count]
**Prerequisites:** [Anything needed that isn't already installed — ideally nothing]
**How it works:**
[2-3 sentence plain-English explanation]
**The code:**
[Complete, working, commented code — not pseudocode]
**How to test it:**
[Exact steps to verify it works, starting with a dry run if possible]
**How to undo:**
[How to reverse any changes if something goes wrong]
Use these strategies based on which applications appear in the recording:
Browser-based workflows: curl or wget for simple HTTP requests with known endpoints.
Spreadsheet and data workflows: csvkit for quick command-line CSV manipulation without writing code.
Email workflows: osascript can control Mail.app to send emails with attachments; Python's smtplib for sending, imaplib for reading.
File management workflows: find + xargs for batch operations; fswatch or watchman for triggered-on-change automation.
Terminal/CLI workflows:
macOS-specific workflows: automator for file-based workflows; launchd plist files for scheduled tasks (prefer over cron on macOS).
Cross-application workflows (data moves between apps):
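The spreadsheet strategy applied to the example's "filter pending rows" step: csvkit would do it in one call (csvgrep -c 2 -m pending report.csv). The awk sketch below needs nothing installed, assuming a simple CSV with no quoted commas:

```shell
# Keep the header row plus every row whose second column equals "pending".
filter_pending() {
  awk -F, 'NR == 1 || $2 == "pending"' "$1"
}
```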
Apply these principles to every proposal:
Automate the bottleneck first. The narration and timing in the recording reveal which step is actually painful. A 30-second automation of the worst step beats a 2-hour automation of the whole process.
Match the user's skill level. If the recording shows someone comfortable in a terminal, propose shell scripts. If it shows someone navigating GUIs, propose something with a simple trigger (double-click a script, run a Shortcut, or type one command).
Estimate real time savings. Count the recording duration and multiply by how often they do it. "This recording is 4 minutes. You said you do this daily. That's 17 hours per year. Tier 1 cuts it to 30 seconds each time — you get 16 hours back."
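The arithmetic in that example checks out if "daily" means weekdays, roughly 260 runs a year:

```shell
MINUTES_PER_RUN=4
RUNS_PER_YEAR=260                                           # every weekday
HOURS_PER_YEAR=$(( MINUTES_PER_RUN * RUNS_PER_YEAR / 60 ))  # 1040 / 60
echo "$HOURS_PER_YEAR hours"                                # prints "17 hours"
```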
Handle the 80% case. The first version of the automation should cover the common path perfectly. Edge cases can be handled in Tier 3 or flagged for manual intervention.
Preserve human checkpoints. If the recording shows the user reviewing or approving something mid-process, keep that as a manual step. Don't automate judgment calls.
Propose dry runs. Every script should have a mode where it shows what it would do without doing it. --dry-run flags, preview output, or confirmation prompts before destructive actions.
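One way to sketch the dry-run principle in a shell script (the run wrapper and the example command are illustrative, not part of the skill):

```shell
DRY_RUN="${DRY_RUN:-1}"      # default to the safe mode; pass DRY_RUN=0 to apply

# run CMD ARGS... : print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "[dry-run] $*"
  else
    "$@"
  fi
}

run rm -f "$HOME/Downloads/report-old.csv"   # printed, not executed, by default
```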
Account for auth and secrets. If the process involves logging in or using credentials, never hardcode them. Use environment variables, keychain access (macOS security command), or prompt for them at runtime.
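A sketch of the secrets principle: try the macOS keychain first, fall back to an environment variable. The service name and the `<SERVICE>_PASSWORD` convention are assumptions:

```shell
# get_secret SERVICE : print the secret for SERVICE. Uses the macOS keychain
# when `security` is available; otherwise reads $<SERVICE>_PASSWORD.
get_secret() {
  security find-generic-password -s "$1" -a "$USER" -w 2>/dev/null && return
  eval "printf '%s\n' \"\$${1}_PASSWORD\""
}

# Illustrative use:  PORTAL_PW=$(get_secret REPORT_PORTAL)
```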
When the user picks a tier:
~/Desktop/ otherwise).

After analysis is complete (regardless of outcome), clean up extracted frames and audio:
rm -rf "$WORK_DIR"
Tell the user you're cleaning up temporary files so they know nothing is left behind.
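A more robust variant of the cleanup: register it as soon as the directory is created, so the frames and audio are removed even if the session aborts mid-analysis:

```shell
WORK_DIR=$(mktemp -d "${TMPDIR:-/tmp}/automate-this-XXXXXX")
chmod 700 "$WORK_DIR"
trap 'rm -rf "$WORK_DIR"' EXIT     # fires on normal exit and on most aborts
```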
Weekly Installs: 567
Repository
GitHub Stars: 26.7K
First Seen: Mar 9, 2026
Security Audits: Gen Agent Trust Hub: Pass, Socket: Pass, Snyk: Pass
Installed on:
gemini-cli: 509
codex: 508
opencode: 497
cursor: 494
github-copilot: 492
kimi-cli: 490
Consider failure modes. What happens if the website is down? If the file doesn't exist? If the format changes? Good proposals mention this and handle it.
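A sketch of that defensive posture for the example's download step; the URL, retry count, and CSV sanity check are all illustrative:

```shell
# fetch_report URL OUT : download with retries, then verify the result is
# non-empty and roughly CSV-shaped before letting the pipeline continue.
fetch_report() {
  local url="$1" out="$2" tries=3
  while [ "$tries" -gt 0 ]; do
    curl -fsS --max-time 30 -o "$out" "$url" && break
    tries=$((tries - 1))
    echo "download failed; $tries retries left" >&2
    sleep 1
  done
  [ -s "$out" ] || { echo "error: empty or missing download" >&2; return 1; }
  head -n 1 "$out" | grep -q ',' || { echo "error: does not look like CSV" >&2; return 1; }
}
```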