baoyu-url-to-markdown by jimliu/baoyu-skills
npx skills add https://github.com/jimliu/baoyu-skills --skill baoyu-url-to-markdown通过 Chrome CDP 获取任意 URL,保存渲染后的 HTML 快照,并将其转换为干净的 Markdown。
重要提示:所有脚本都位于此技能的 scripts/ 子目录中。
代理执行说明:
{baseDir}{baseDir}/scripts/<脚本名称>.ts${BUN_X} 运行时:如果已安装 bun → 使用 bun;如果 npx 可用 → 使用 npx -y bun;否则建议安装 bun广告位招租
在这里展示您的产品或服务
触达数万 AI 开发者,精准高效
{baseDir} 和 ${BUN_X} 替换为实际值脚本参考:
| 脚本 | 用途 |
|---|---|
scripts/main.ts | URL 获取的 CLI 入口点 |
scripts/html-to-markdown.ts | Markdown 转换入口点及转换器选择 |
scripts/parsers/index.ts | 统一解析器入口:在通用转换器之前分发 URL 特定规则 |
scripts/parsers/types.ts | 所有规则文件共享的统一解析器接口 |
scripts/parsers/rules/*.ts | 每个 URL 规则一个文件,例如 X 状态和 X 文章 |
scripts/defuddle-converter.ts | 基于 Defuddle 的转换 |
scripts/legacy-converter.ts | Defuddle 之前的旧版提取和 Markdown 转换 |
scripts/markdown-conversion-shared.ts | 共享的元数据解析和 Markdown 文档辅助函数 |
检查 EXTEND.md 是否存在(优先级顺序):
# macOS, Linux, WSL, Git Bash
test -f .baoyu-skills/baoyu-url-to-markdown/EXTEND.md && echo "project"
test -f "${XDG_CONFIG_HOME:-$HOME/.config}/baoyu-skills/baoyu-url-to-markdown/EXTEND.md" && echo "xdg"
test -f "$HOME/.baoyu-skills/baoyu-url-to-markdown/EXTEND.md" && echo "user"
# PowerShell (Windows)
if (Test-Path .baoyu-skills/baoyu-url-to-markdown/EXTEND.md) { "project" }
$xdg = if ($env:XDG_CONFIG_HOME) { $env:XDG_CONFIG_HOME } else { "$HOME/.config" }
if (Test-Path "$xdg/baoyu-skills/baoyu-url-to-markdown/EXTEND.md") { "xdg" }
if (Test-Path "$HOME/.baoyu-skills/baoyu-url-to-markdown/EXTEND.md") { "user" }
┌────────────────────────────────────────────────────────┬───────────────────┐ │ 路径 │ 位置 │ ├────────────────────────────────────────────────────────┼───────────────────┤ │ .baoyu-skills/baoyu-url-to-markdown/EXTEND.md │ 项目目录 │ ├────────────────────────────────────────────────────────┼───────────────────┤ │ $HOME/.baoyu-skills/baoyu-url-to-markdown/EXTEND.md │ 用户主目录 │ └────────────────────────────────────────────────────────┴───────────────────┘
┌───────────┬───────────────────────────────────────────────────────────────────────────┐ │ 结果 │ 操作 │ ├───────────┼───────────────────────────────────────────────────────────────────────────┤ │ 找到 │ 读取、解析、应用设置 │ ├───────────┼───────────────────────────────────────────────────────────────────────────┤ │ 未找到 │ 必须 运行首次设置(见下文)——切勿静默创建默认值 │ └───────────┴───────────────────────────────────────────────────────────────────────────┘
EXTEND.md 支持:默认下载媒体 | 默认输出目录 | 默认捕获模式 | 超时设置
关键:当未找到 EXTEND.md 时,在创建 EXTEND.md 之前,你必须使用 AskUserQuestion 来询问用户的偏好设置。切勿在不询问的情况下使用默认值创建 EXTEND.md。这是一个阻塞式操作——在设置完成之前,请勿进行任何转换。
使用 AskUserQuestion 一次性提出所有问题:
问题 1 — 标题:"媒体",问题:"如何处理页面中的图片和视频?"
问题 2 — 标题:"输出",问题:"默认输出目录?"
问题 3 — 标题:"保存",问题:"将偏好设置保存在哪里?"
用户回答后,在所选位置创建 EXTEND.md,确认"偏好设置已保存到 [路径]",然后继续。
| 键 | 默认值 | 值 | 描述 |
|---|---|---|---|
download_media | ask | ask / 1 / 0 | ask = 每次提示,1 = 总是下载,0 = 从不 |
default_output_dir | 空 | 路径或空 | 默认输出目录(空 = ./url-to-markdown/) |
EXTEND.md → CLI 映射:
| EXTEND.md 键 | CLI 参数 | 备注 |
|---|---|---|
download_media: 1 | --download-media | |
default_output_dir: ./posts/ | --output-dir ./posts/ | 目录路径。不要传递给 -o(它期望的是文件路径) |
值优先级:
--download-media, -o, --output-dir)-captured.html 文件x.com / twitter.com 上的标题/正文/媒体提取archive.ph / 相关的存档镜像可以从 input[name=q] 恢复原始 URL,并优先使用 #CONTENT,然后再回退到页面正文defuddle.md/<url> 并仍然保存 Markdown# 自动模式(默认)- 页面加载时捕获
${BUN_X} {baseDir}/scripts/main.ts <url>
# 等待模式 - 等待用户信号后再捕获
${BUN_X} {baseDir}/scripts/main.ts <url> --wait
# 保存到特定文件
${BUN_X} {baseDir}/scripts/main.ts <url> -o output.md
# 保存到自定义输出目录(自动生成文件名)
${BUN_X} {baseDir}/scripts/main.ts <url> --output-dir ./posts/
# 将图片和视频下载到本地目录
${BUN_X} {baseDir}/scripts/main.ts <url> --download-media
| 选项 | 描述 |
|---|---|
<url> | 要获取的 URL |
-o <路径> | 输出文件路径 — 必须是文件路径,而非目录(默认:自动生成) |
--output-dir <目录> | 基础输出目录 — 自动生成 {dir}/{domain}/{slug}.md(默认:./url-to-markdown/) |
--wait | 捕获前等待用户信号 |
--timeout <毫秒> | 页面加载超时时间(默认:30000) |
--download-media | 将图片/视频资源下载到本地的 imgs/ 和 videos/,并将 Markdown 链接重写为本地相对路径 |
| 模式 | 行为 | 使用场景 |
|---|---|---|
| 自动(默认) | 网络空闲时捕获 | 公开页面,静态内容 |
等待 (--wait) | 用户就绪时发出信号 | 需要登录,懒加载,付费墙 |
等待模式工作流程:
--wait 运行 → 脚本输出"准备就绪后按 Enter"每次运行会并排保存两个文件:
url、title、description、author、published、可选的 coverImage 和 captured_at 的 YAML 前置元数据,后跟转换后的 Markdown 内容*-captured.html,包含从 Chrome 捕获的渲染页面 HTML当 Defuddle 或页面元数据提供语言提示时,Markdown 前置元数据也会包含 language。
HTML 快照在任何 Markdown 媒体本地化之前保存,因此它忠实保留了用于转换的页面 DOM 捕获。如果使用了托管的 defuddle.md API 回退,Markdown 仍会被保存,但该次运行不会生成本地的 -captured.html 快照。
默认:url-to-markdown/<domain>/<slug>.md 使用 --output-dir ./posts/ 时:./posts/<domain>/<slug>.md
HTML 快照路径使用相同的基础名称:
url-to-markdown/<domain>/<slug>-captured.html./posts/<domain>/<slug>-captured.html<slug>:来自页面标题或 URL 路径(短横线分隔,2-6 个单词)<slug>-YYYYMMDD-HHMMSS.md当启用 --download-media 时:
imgs/ 目录videos/ 目录转换顺序:
https://defuddle.md/<url> API 并直接保存其 Markdown 输出CLI 输出将显示:
Converter: parser:... 当 URL 特定解析器成功时Converter: defuddle 当 Defuddle 成功时Converter: legacy:... 加上 Fallback used: ... 当需要回退时Converter: defuddle-api 当本地浏览器捕获失败并使用托管 API 时基于 EXTEND.md 中的 download_media 设置:
| 设置 | 行为 |
|---|---|
1(总是) | 使用 --download-media 标志运行脚本 |
0(从不) | 不使用 --download-media 标志运行脚本 |
ask(默认) | 遵循下面的每次询问流程 |
--download-media 运行脚本 → Markdown 已保存https://)AskUserQuestion:
--download-media 运行脚本(用本地化链接覆盖 Markdown)| 变量 | 描述 |
|---|---|
URL_CHROME_PATH | 自定义 Chrome 可执行文件路径 |
URL_DATA_DIR | 自定义数据目录 |
URL_CHROME_PROFILE_DIR | 自定义 Chrome 配置文件目录 |
故障排除:未找到 Chrome → 设置 URL_CHROME_PATH。超时 → 增加 --timeout。复杂页面 → 尝试 --wait 模式。如果 Markdown 质量差,请检查保存的 -captured.html 并查看运行是否记录了旧版回退。
--wait,并在观看页面完全水合后捕获。https://defuddle.md/<url>。Shell 形式:curl https://defuddle.md/stephango.com通过 EXTEND.md 进行自定义配置。有关路径和支持的选项,请参阅偏好设置部分。
每周安装量
11.7K
仓库
GitHub 星标
11.3K
首次出现
2026 年 1 月 22 日
安全审计
安装于
opencode10.7K
gemini-cli10.4K
codex10.2K
cursor9.9K
github-copilot9.5K
amp9.2K
Fetches any URL via Chrome CDP, saves the rendered HTML snapshot, and converts it to clean markdown.
Important : All scripts are located in the scripts/ subdirectory of this skill.
Agent Execution Instructions :
{baseDir}{baseDir}/scripts/<script-name>.ts${BUN_X} runtime: if bun installed → bun; if npx available → npx -y bun; else suggest installing bun{baseDir} and ${BUN_X} in this document with actual valuesScript Reference :
| Script | Purpose |
|---|---|
scripts/main.ts | CLI entry point for URL fetching |
scripts/html-to-markdown.ts | Markdown conversion entry point and converter selection |
scripts/parsers/index.ts | Unified parser entry: dispatches URL-specific rules before generic converters |
scripts/parsers/types.ts | Unified parser interface shared by all rule files |
scripts/parsers/rules/*.ts | One file per URL rule, for example X status and X article |
scripts/defuddle-converter.ts |
Check EXTEND.md existence (priority order):
# macOS, Linux, WSL, Git Bash
test -f .baoyu-skills/baoyu-url-to-markdown/EXTEND.md && echo "project"
test -f "${XDG_CONFIG_HOME:-$HOME/.config}/baoyu-skills/baoyu-url-to-markdown/EXTEND.md" && echo "xdg"
test -f "$HOME/.baoyu-skills/baoyu-url-to-markdown/EXTEND.md" && echo "user"
# PowerShell (Windows)
if (Test-Path .baoyu-skills/baoyu-url-to-markdown/EXTEND.md) { "project" }
$xdg = if ($env:XDG_CONFIG_HOME) { $env:XDG_CONFIG_HOME } else { "$HOME/.config" }
if (Test-Path "$xdg/baoyu-skills/baoyu-url-to-markdown/EXTEND.md") { "xdg" }
if (Test-Path "$HOME/.baoyu-skills/baoyu-url-to-markdown/EXTEND.md") { "user" }
┌────────────────────────────────────────────────────────┬───────────────────┐ │ Path │ Location │ ├────────────────────────────────────────────────────────┼───────────────────┤ │ .baoyu-skills/baoyu-url-to-markdown/EXTEND.md │ Project directory │ ├────────────────────────────────────────────────────────┼───────────────────┤ │ $HOME/.baoyu-skills/baoyu-url-to-markdown/EXTEND.md │ User home │ └────────────────────────────────────────────────────────┴───────────────────┘
┌───────────┬───────────────────────────────────────────────────────────────────────────┐ │ Result │ Action │ ├───────────┼───────────────────────────────────────────────────────────────────────────┤ │ Found │ Read, parse, apply settings │ ├───────────┼───────────────────────────────────────────────────────────────────────────┤ │ Not found │ MUST run first-time setup (see below) — do NOT silently create defaults │ └───────────┴───────────────────────────────────────────────────────────────────────────┘
EXTEND.md Supports : Download media by default | Default output directory | Default capture mode | Timeout settings
CRITICAL : When EXTEND.md is not found, you MUST useAskUserQuestion to ask the user for their preferences before creating EXTEND.md. NEVER create EXTEND.md with defaults without asking. This is a BLOCKING operation — do NOT proceed with any conversion until setup is complete.
Use AskUserQuestion with ALL questions in ONE call:
Question 1 — header: "Media", question: "How to handle images and videos in pages?"
Question 2 — header: "Output", question: "Default output directory?"
Question 3 — header: "Save", question: "Where to save preferences?"
After user answers, create EXTEND.md at the chosen location, confirm "Preferences saved to [path]", then continue.
Full reference: references/config/first-time-setup.md
| Key | Default | Values | Description |
|---|---|---|---|
download_media | ask | ask / 1 / 0 | ask = prompt each time, 1 = always download, 0 = never |
default_output_dir |
EXTEND.md → CLI mapping :
| EXTEND.md key | CLI argument | Notes |
|---|---|---|
download_media: 1 | --download-media | |
default_output_dir: ./posts/ | --output-dir ./posts/ | Directory path. Do NOT pass to -o (which expects a file path) |
Value priority :
--download-media, -o, --output-dir)-captured.html filex.com / twitter.comarchive.ph / related archive mirrors can restore the original URL from input[name=q] and prefer #CONTENT before falling back to the page bodydefuddle.md/<url> and still save markdown# Auto mode (default) - capture when page loads
${BUN_X} {baseDir}/scripts/main.ts <url>
# Wait mode - wait for user signal before capture
${BUN_X} {baseDir}/scripts/main.ts <url> --wait
# Save to specific file
${BUN_X} {baseDir}/scripts/main.ts <url> -o output.md
# Save to a custom output directory (auto-generates filename)
${BUN_X} {baseDir}/scripts/main.ts <url> --output-dir ./posts/
# Download images and videos to local directories
${BUN_X} {baseDir}/scripts/main.ts <url> --download-media
| Option | Description |
|---|---|
<url> | URL to fetch |
-o <path> | Output file path — must be a file path, not directory (default: auto-generated) |
--output-dir <dir> | Base output directory — auto-generates {dir}/{domain}/{slug}.md (default: ./url-to-markdown/) |
--wait | Wait for user signal before capturing |
--timeout <ms> |
| Mode | Behavior | Use When |
|---|---|---|
| Auto (default) | Capture on network idle | Public pages, static content |
Wait (--wait) | User signals when ready | Login-required, lazy loading, paywalls |
Wait mode workflow :
--wait → script outputs "Press Enter when ready"Each run saves two files side by side:
url, title, description, author, published, optional coverImage, and captured_at, followed by converted markdown content*-captured.html, containing the rendered page HTML captured from ChromeWhen Defuddle or page metadata provides a language hint, the markdown front matter also includes language.
The HTML snapshot is saved before any markdown media localization, so it stays a faithful capture of the page DOM used for conversion. If the hosted defuddle.md API fallback is used, markdown is still saved, but there is no local -captured.html snapshot for that run.
Default: url-to-markdown/<domain>/<slug>.md With --output-dir ./posts/: ./posts/<domain>/<slug>.md
HTML snapshot path uses the same basename:
url-to-markdown/<domain>/<slug>-captured.html
./posts/<domain>/<slug>-captured.html
<slug>: From page title or URL path (kebab-case, 2-6 words)
Conflict resolution: Append timestamp <slug>-YYYYMMDD-HHMMSS.md
When --download-media is enabled:
imgs/ next to the markdown filevideos/ next to the markdown fileConversion order:
https://defuddle.md/<url> API and save its markdown output directlyCLI output will show:
Converter: parser:... when a URL-specific parser succeededConverter: defuddle when Defuddle succeedsConverter: legacy:... plus Fallback used: ... when fallback was neededConverter: defuddle-api when local browser capture failed and the hosted API was used insteadBased on download_media setting in EXTEND.md:
| Setting | Behavior |
|---|---|
1 (always) | Run script with --download-media flag |
0 (never) | Run script without --download-media flag |
ask (default) | Follow the ask-each-time flow below |
--download-media → markdown savedhttps:// in image/video links)AskUserQuestion:
--download-media (overwrites markdown with localized links)| Variable | Description |
|---|---|
URL_CHROME_PATH | Custom Chrome executable path |
URL_DATA_DIR | Custom data directory |
URL_CHROME_PROFILE_DIR | Custom Chrome profile directory |
Troubleshooting : Chrome not found → set URL_CHROME_PATH. Timeout → increase --timeout. Complex pages → try --wait mode. If markdown quality is poor, inspect the saved -captured.html and check whether the run logged a legacy fallback.
--wait and capture after the watch page is fully hydrated.https://defuddle.md/<url>. In shell form: curl https://defuddle.md/stephango.comCustom configurations via EXTEND.md. See Preferences section for paths and supported options.
Weekly Installs
11.7K
Repository
GitHub Stars
11.3K
First Seen
Jan 22, 2026
Security Audits
Gen Agent Trust HubPassSocketPassSnykWarn
Installed on
opencode10.7K
gemini-cli10.4K
codex10.2K
cursor9.9K
github-copilot9.5K
amp9.2K
React 组合模式指南:Vercel 组件架构最佳实践,提升代码可维护性
102,200 周安装
AI智能体长期记忆系统 - 精英级架构,融合6种方法,永不丢失上下文
1,200 周安装
AI新闻播客制作技能:实时新闻转对话式播客脚本与音频生成
1,200 周安装
Word文档处理器:DOCX创建、编辑、分析与修订痕迹处理全指南 | 自动化办公解决方案
1,200 周安装
React Router 框架模式指南:全栈开发、文件路由、数据加载与渲染策略
1,200 周安装
Nano Banana AI 图像生成工具:使用 Gemini 3 Pro 生成与编辑高分辨率图像
1,200 周安装
SVG Logo Designer - AI 驱动的专业矢量标识设计工具,生成可缩放品牌标识
1,200 周安装
| Defuddle-based conversion |
scripts/legacy-converter.ts | Pre-Defuddle legacy extraction and markdown conversion |
scripts/markdown-conversion-shared.ts | Shared metadata parsing and markdown document helpers |
| empty |
| path or empty |
Default output directory (empty = ./url-to-markdown/) |
| Page load timeout (default: 30000) |
--download-media | Download image/video assets to local imgs/ and videos/, and rewrite markdown links to local relative paths |