automate-this by github/awesome-copilot
npx skills add https://github.com/github/awesome-copilot --skill automate-this
Analyze a screen recording of a manual process and build working automation for it.
The user records themselves doing something repetitive or tedious, hands you the video file, and you figure out what they're doing, why, and how to script it away.
Before analyzing any recording, verify the required tools are available. Run these checks silently and only surface problems:
command -v ffmpeg >/dev/null 2>&1 && ffmpeg -version 2>/dev/null | head -1 || echo "NO_FFMPEG"
command -v whisper >/dev/null 2>&1 || command -v whisper-cpp >/dev/null 2>&1 || echo "NO_WHISPER"
If ffmpeg is missing, offer to install it: brew install ffmpeg (macOS) or the equivalent for their OS. If no whisper binary is found, offer pip install openai-whisper or brew install whisper-cpp. If the user declines, proceed with visual analysis only.

Given a video file path (typically on ~/Desktop/), extract both visual frames and audio:
Extract frames at one frame every 2 seconds. This balances coverage with context window limits.
WORK_DIR=$(mktemp -d "${TMPDIR:-/tmp}/automate-this-XXXXXX")
chmod 700 "$WORK_DIR"
mkdir -p "$WORK_DIR/frames"
ffmpeg -y -i "<VIDEO_PATH>" -vf "fps=0.5" -q:v 2 -loglevel warning "$WORK_DIR/frames/frame_%04d.jpg"
ls "$WORK_DIR/frames/" | wc -l
Use $WORK_DIR for all subsequent temp file paths in the session. The per-run directory with mode 0700 ensures extracted frames are only readable by the current user.
If the recording is longer than 5 minutes (more than 150 frames), increase the interval to one frame every 4 seconds to stay within context limits. Tell the user you're sampling less frequently for longer recordings.
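The sampling decision above can be wrapped in a small helper. The function below is a sketch; it assumes the duration comes from ffprobe's plain-seconds output:

```shell
# choose_fps DURATION_SECONDS -> prints the ffmpeg fps value to use.
# Recordings up to 5 minutes sample at 0.5 fps (one frame per 2 s, at most
# 150 frames); longer ones drop to 0.25 fps to stay within context limits.
choose_fps() {
  local secs="${1%.*}"            # strip fractional seconds from ffprobe output
  if [ "${secs:-0}" -le 300 ]; then
    echo "0.5"
  else
    echo "0.25"
  fi
}

# Illustrative use:
#   DURATION=$(ffprobe -v error -show_entries format=duration -of csv=p=0 "<VIDEO_PATH>")
#   ffmpeg -y -i "<VIDEO_PATH>" -vf "fps=$(choose_fps "$DURATION")" -q:v 2 \
#     -loglevel warning "$WORK_DIR/frames/frame_%04d.jpg"
```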
Check if the video has an audio track:
ffprobe -i "<VIDEO_PATH>" -show_streams -select_streams a -loglevel error | head -5
If audio exists:
ffmpeg -y -i "<VIDEO_PATH>" -ac 1 -ar 16000 -loglevel warning "$WORK_DIR/audio.wav"
# Use whichever whisper binary is available
if command -v whisper >/dev/null 2>&1; then
whisper "$WORK_DIR/audio.wav" --model small --language en --output_format txt --output_dir "$WORK_DIR/"
cat "$WORK_DIR/audio.txt"
elif command -v whisper-cpp >/dev/null 2>&1; then
whisper-cpp -m "$(brew --prefix 2>/dev/null)/share/whisper-cpp/models/ggml-small.bin" -l en -f "$WORK_DIR/audio.wav" -otxt -of "$WORK_DIR/audio"
cat "$WORK_DIR/audio.txt"
else
echo "NO_WHISPER"
fi
If neither whisper binary is available and the recording has audio, inform the user they're missing narration context and ask if they want to install Whisper (pip install openai-whisper or brew install whisper-cpp) or proceed with visual-only analysis.
Analyze the extracted frames (and transcript, if available) to build a structured understanding of what the user did. Work through the frames sequentially and identify:
Present this reconstruction to the user as a numbered step list and ask them to confirm it's accurate before proposing automation. This is critical — a wrong understanding leads to useless automation.
Format:
Here's what I see you doing in this recording:
1. Open Chrome and navigate to [specific URL]
2. Log in with credentials
3. Click through to the reporting dashboard
4. Download a CSV export
5. Open the CSV in Excel
6. Filter rows where column B is "pending"
7. Copy those rows into a new spreadsheet
8. Email the new spreadsheet to [recipient]
You repeated steps 3-8 three times for different report types.
[If narration was present]: You mentioned that the export step is the slowest
part and that you do this every Monday morning.
Does this match what you were doing? Anything I got wrong or missed?
Do NOT proceed to Phase 3 until the user confirms the reconstruction is accurate.
Before proposing automation, understand what the user actually has to work with. Run these checks:
echo "=== OS ===" && uname -a
echo "=== Shell ===" && echo $SHELL
echo "=== Python ===" && { command -v python3 && python3 --version 2>&1; } || echo "not installed"
echo "=== Node ===" && { command -v node && node --version 2>&1; } || echo "not installed"
echo "=== Homebrew ===" && { command -v brew && echo "installed"; } || echo "not installed"
echo "=== Common Tools ===" && for cmd in curl jq playwright selenium osascript automator crontab; do command -v $cmd >/dev/null 2>&1 && echo "$cmd: yes" || echo "$cmd: no"; done
Use this to constrain proposals to tools the user already has. Never propose automation that requires installing five new things unless the simpler path genuinely doesn't work.
Based on the reconstructed process and the user's environment, propose automation at up to three tiers. Not every process needs three tiers — use judgment.
Tier 1 — Quick Win (under 5 minutes to set up) The smallest useful automation. A shell alias, a one-liner, a keyboard shortcut, an AppleScript snippet. Automates the single most painful step, not the whole process.
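For the example reconstruction, a Tier 1 quick win could be as small as one alias. The name and URL below are hypothetical:

```shell
# One command replaces "open Chrome, navigate, click through": jump straight
# to the dashboard. Uses macOS `open`; substitute xdg-open on Linux.
alias monday-report='open "https://example.com/reports/dashboard"'
```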
Tier 2 — Script (under 30 minutes to set up) A standalone script (bash, Python, or Node — whichever the user has) that automates the full process end-to-end. Handles common errors. Can be run manually when needed.
Tier 3 — Full Automation (under 2 hours to set up) The script from Tier 2, plus: scheduled execution (cron, launchd, or GitHub Actions), logging, error notifications, and any necessary integration scaffolding (API keys, auth tokens, etc.).
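As a sketch of the Tier 3 scheduling piece (script and log paths are hypothetical; on macOS prefer a launchd plist as noted below, but the cron line shows the shape):

```shell
# Run the Tier 2 script every Monday at 08:00, appending output to a log.
SCRIPT="$HOME/bin/weekly-report.sh"
LOG="$HOME/.local/state/weekly-report.log"
CRON_LINE="0 8 * * 1 $SCRIPT >> $LOG 2>&1"

# Append without clobbering existing entries (left commented in this sketch):
#   ( crontab -l 2>/dev/null; echo "$CRON_LINE" ) | crontab -
echo "$CRON_LINE"
```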
For each tier, provide:
## Tier [N]: [Name]
**What it automates:** [Which steps from the reconstruction]
**What stays manual:** [Which steps still need a human]
**Time savings:** [Estimated time saved per run, based on the recording length and repetition count]
**Prerequisites:** [Anything needed that isn't already installed — ideally nothing]
**How it works:**
[2-3 sentence plain-English explanation]
**The code:**
[Complete, working, commented code — not pseudocode]
**How to test it:**
[Exact steps to verify it works, starting with a dry run if possible]
**How to undo:**
[How to reverse any changes if something goes wrong]
Use these strategies based on which applications appear in the recording:
Browser-based workflows: curl or wget for simple HTTP requests with known endpoints.
Spreadsheet and data workflows: csvkit for quick command-line CSV manipulation without writing code.
Email workflows: osascript can control Mail.app to send emails with attachments; Python's smtplib for sending, imaplib for reading.
File management workflows: find + xargs for batch operations; fswatch or watchman for triggered-on-change automation.
Terminal/CLI workflows:
macOS-specific workflows: automator for file-based workflows; launchd plist files for scheduled tasks (prefer over cron on macOS).
Cross-application workflows (data moves between apps):
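The spreadsheet strategy applied to the example's "filter pending rows" step: csvkit would do it in one call (csvgrep -c 2 -m pending report.csv). The awk sketch below needs nothing installed, assuming a simple CSV with no quoted commas:

```shell
# Keep the header row plus every row whose second column equals "pending".
filter_pending() {
  awk -F, 'NR == 1 || $2 == "pending"' "$1"
}
```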
Apply these principles to every proposal:
Automate the bottleneck first. The narration and timing in the recording reveal which step is actually painful. A 30-second automation of the worst step beats a 2-hour automation of the whole process.
Match the user's skill level. If the recording shows someone comfortable in a terminal, propose shell scripts. If it shows someone navigating GUIs, propose something with a simple trigger (double-click a script, run a Shortcut, or type one command).
Estimate real time savings. Count the recording duration and multiply by how often they do it. "This recording is 4 minutes. You said you do this daily. That's 17 hours per year. Tier 1 cuts it to 30 seconds each time — you get 16 hours back."
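The arithmetic in that example checks out if "daily" means weekdays, roughly 260 runs a year:

```shell
MINUTES_PER_RUN=4
RUNS_PER_YEAR=260                                           # every weekday
HOURS_PER_YEAR=$(( MINUTES_PER_RUN * RUNS_PER_YEAR / 60 ))  # 1040 / 60
echo "$HOURS_PER_YEAR hours"                                # prints "17 hours"
```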
Handle the 80% case. The first version of the automation should cover the common path perfectly. Edge cases can be handled in Tier 3 or flagged for manual intervention.
Preserve human checkpoints. If the recording shows the user reviewing or approving something mid-process, keep that as a manual step. Don't automate judgment calls.
Propose dry runs. Every script should have a mode where it shows what it would do without doing it. --dry-run flags, preview output, or confirmation prompts before destructive actions.
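One way to sketch the dry-run principle in a shell script (the run wrapper and the example command are illustrative, not part of the skill):

```shell
DRY_RUN="${DRY_RUN:-1}"      # default to the safe mode; pass DRY_RUN=0 to apply

# run CMD ARGS... : print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "[dry-run] $*"
  else
    "$@"
  fi
}

run rm -f "$HOME/Downloads/report-old.csv"   # printed, not executed, by default
```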
Account for auth and secrets. If the process involves logging in or using credentials, never hardcode them. Use environment variables, keychain access (macOS security command), or prompt for them at runtime.
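A sketch of the secrets principle: try the macOS keychain first, fall back to an environment variable. The service name and the `<SERVICE>_PASSWORD` convention are assumptions:

```shell
# get_secret SERVICE : print the secret for SERVICE. Uses the macOS keychain
# when `security` is available; otherwise reads $<SERVICE>_PASSWORD.
get_secret() {
  security find-generic-password -s "$1" -a "$USER" -w 2>/dev/null && return
  eval "printf '%s\n' \"\$${1}_PASSWORD\""
}

# Illustrative use:  PORTAL_PW=$(get_secret REPORT_PORTAL)
```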
When the user picks a tier:
~/Desktop/ otherwise).

After analysis is complete (regardless of outcome), clean up extracted frames and audio:
rm -rf "$WORK_DIR"
Tell the user you're cleaning up temporary files so they know nothing is left behind.
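A more robust variant of the cleanup: register it as soon as the directory is created, so the frames and audio are removed even if the session aborts mid-analysis:

```shell
WORK_DIR=$(mktemp -d "${TMPDIR:-/tmp}/automate-this-XXXXXX")
chmod 700 "$WORK_DIR"
trap 'rm -rf "$WORK_DIR"' EXIT     # fires on normal exit and on most aborts
```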
Weekly Installs: 567
Repository
GitHub Stars: 26.7K
First Seen: Mar 9, 2026
Security Audits: Gen Agent Trust Hub: Pass, Socket: Pass, Snyk: Pass
Installed on:
gemini-cli: 509
codex: 508
opencode: 497
cursor: 494
github-copilot: 492
kimi-cli: 490
Consider failure modes. What happens if the website is down? If the file doesn't exist? If the format changes? Good proposals mention this and handle it.
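A sketch of that defensive posture for the example's download step; the URL, retry count, and CSV sanity check are all illustrative:

```shell
# fetch_report URL OUT : download with retries, then verify the result is
# non-empty and roughly CSV-shaped before letting the pipeline continue.
fetch_report() {
  local url="$1" out="$2" tries=3
  while [ "$tries" -gt 0 ]; do
    curl -fsS --max-time 30 -o "$out" "$url" && break
    tries=$((tries - 1))
    echo "download failed; $tries retries left" >&2
    sleep 1
  done
  [ -s "$out" ] || { echo "error: empty or missing download" >&2; return 1; }
  head -n 1 "$out" | grep -q ',' || { echo "error: does not look like CSV" >&2; return 1; }
}
```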