Defuddle Web Content Extraction Tool - Strip Ads and Sidebars in One Click, Output Clean Markdown

defuddle by joeseesun/defuddle-skill

916 weekly installs

84 GitHub Stars


Install command

npx skills add https://github.com/joeseesun/defuddle-skill --skill defuddle

Content creation, Development, Automation

Defuddle - Web Content Extraction

Extract the main article content from web pages, removing ads, sidebars, navigation, and other clutter. Output clean Markdown with metadata.

Prerequisites

Before first use, check whether defuddle is installed, and install it together with jsdom if it is not:

command -v defuddle >/dev/null 2>&1 || npm install -g defuddle jsdom

Default Workflow

When the user provides a URL, follow this workflow:

Step 1: Extract content as Markdown + JSON metadata

Always pass both the -m and -j flags to get Markdown content with full metadata:

defuddle parse "<url>" -m -j
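When Step 1 is scripted instead of typed into a shell, the same invocation can be wrapped in a small helper. This is a minimal sketch assuming Python 3.8+, the defuddle CLI on PATH, and that `-j` prints a single JSON object to stdout; `build_command` and `extract` are hypothetical names, not part of defuddle.

```python
import json
import subprocess
from typing import Any, Dict, List


def build_command(url: str) -> List[str]:
    """Argument list equivalent to: defuddle parse "<url>" -m -j"""
    return ["defuddle", "parse", url, "-m", "-j"]


def extract(url: str) -> Dict[str, Any]:
    """Run defuddle and decode its JSON output.

    Assumes defuddle is on PATH and prints one JSON object to stdout;
    raises subprocess.CalledProcessError on a non-zero exit status.
    """
    result = subprocess.run(
        build_command(url), capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)
```

Passing the arguments as a list rather than one shell string avoids quoting problems with URLs that contain `&` or `?`. Usage would look like `meta = extract("https://example.com/post")`, after which `meta["title"]` holds the article title.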

Step 2: Present a summary to the user

Show the user:

Title: from the JSON title field
Author: from the JSON author field
Source: from the JSON domain field
Word count: from the JSON wordCount field
A brief preview (the first 2-3 sentences)
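The Step 2 summary can be assembled from the parsed JSON. In this sketch the sentence splitter is a naive heuristic, not something defuddle provides (it skips Markdown headings and splits on `.`, `!`, `?` and their full-width CJK forms), and the `"Unknown"` fallbacks for missing fields are assumptions; `preview` and `summarize` are hypothetical helpers.

```python
import re
from typing import Any, Dict


def preview(markdown: str, max_sentences: int = 3) -> str:
    """Return roughly the first few sentences of the extracted content.

    Naive heuristic: drop blank lines and Markdown headings, join the rest,
    then split after ., !, ? (or 。！？). Abbreviations and decimals will
    be split incorrectly.
    """
    body = " ".join(
        line.strip()
        for line in markdown.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    )
    sentences = re.findall(r"[^.!?。！？]+[.!?。！？]?", body)
    return "".join(sentences[:max_sentences]).strip()


def summarize(meta: Dict[str, Any]) -> str:
    """Format the Step 2 summary from defuddle's JSON fields."""
    return "\n".join([
        f"Title: {meta.get('title') or 'Unknown'}",
        f"Author: {meta.get('author') or 'Unknown'}",
        f"Source: {meta.get('domain') or 'Unknown'}",
        f"Word count: {meta.get('wordCount', 'Unknown')}",
        f"Preview: {preview(meta.get('content') or '')}",
    ])
```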

Step 3: Ask where to save

If this is the first time using defuddle in this conversation, ask the user:

"Save to which directory? (e.g. ~/Documents, ~/Desktop, or a custom path)"

Remember the user's chosen directory for subsequent uses in the same conversation.

Step 4: Save as Markdown file

Write the file with YAML frontmatter followed by the full content:

---
title: {title}
author: {author}
source: {url}
date: {published or "Unknown"}
clipped: {today's date YYYY-MM-DD}
wordCount: {wordCount}
---

# {title}

{markdown content}
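Step 4's template can be rendered as below. This sketch quotes every frontmatter string with `json.dumps`, because a JSON string literal is also a valid YAML 1.2 double-quoted scalar, so titles containing `:` or `#` do not break the frontmatter; the original template leaves values unquoted. `render_note`, `yaml_str`, the `"Untitled"` title fallback and the `0` word-count fallback are assumptions for illustration.

```python
import datetime
import json
from typing import Any, Dict, Optional


def yaml_str(value: Any) -> str:
    """Quote a scalar for YAML; JSON string literals are valid YAML 1.2."""
    return json.dumps("" if value is None else str(value), ensure_ascii=False)


def render_note(
    meta: Dict[str, Any], url: str, today: Optional[datetime.date] = None
) -> str:
    """Build the Markdown file body: frontmatter, H1 title, then content."""
    today = today or datetime.date.today()
    title = meta.get("title") or "Untitled"
    lines = [
        "---",
        f"title: {yaml_str(title)}",
        f"author: {yaml_str(meta.get('author'))}",
        f"source: {yaml_str(url)}",
        f"date: {yaml_str(meta.get('published') or 'Unknown')}",
        f"clipped: {today.isoformat()}",  # YYYY-MM-DD
        f"wordCount: {meta.get('wordCount') or 0}",
        "---",
        "",
        f"# {title}",
        "",
        meta.get("content") or "",
    ]
    return "\n".join(lines) + "\n"
```

The returned string can then be written with `pathlib.Path.write_text(..., encoding="utf-8")` so that non-ASCII titles and content survive unchanged.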

File naming: use the article title as the filename, sanitized for the filesystem:

Replace special characters with spaces
Trim whitespace
Example: The Shape of the Essay Field.md
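The skill does not define which characters count as "special". The sketch below assumes the set Windows forbids in filenames (`<>:"/\|?*`) plus ASCII control characters, and it additionally collapses repeated spaces and drops trailing dots, which Windows also rejects; `note_filename` and its `"Untitled"` fallback are hypothetical. Non-ASCII titles, such as Chinese ones, pass through unchanged.

```python
import re

# Assumed definition of "special": characters invalid on Windows plus
# ASCII control characters.
_UNSAFE = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def note_filename(title: str, fallback: str = "Untitled") -> str:
    """Turn an article title into a filesystem-safe Markdown filename."""
    name = _UNSAFE.sub(" ", title)            # special characters -> spaces
    name = re.sub(r"\s+", " ", name).strip()  # collapse runs, trim whitespace
    name = name.rstrip(". ")                  # Windows rejects trailing dots
    return f"{name or fallback}.md"
```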

Step 5: Confirm to user

Tell the user the file path where it was saved.

CLI Reference

defuddle parse <source> [options]

Arguments:

<source> — URL (https://...) or local HTML file path

Options:

Flag	Description
`-m, --markdown`	Convert content to Markdown
`-j, --json`	Output as JSON with full metadata
`-o, --output <file>`	Write to file instead of stdout
`-p, --property <name>`	Extract single property (title, description, domain, author, published, wordCount, content)
`--debug`	Verbose logging

JSON Response Fields

When using -j, the response includes:

title — Article title
author — Author name
published — Publication date
description — Meta description
content — Extracted Markdown (when -m used)
domain — Source domain
favicon — Favicon URL
image — Featured image URL
site — Site name
wordCount — Word count
parseTime — Processing time (ms)
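For typed code, the JSON response can be modeled as a `TypedDict`. The value types are assumptions (strings for text fields, an integer for `wordCount`, milliseconds as a number for `parseTime`), `total=False` cautiously assumes any field may be missing for a given page, and `load_result` is a hypothetical helper that only checks the top-level shape.

```python
import json
from typing import TypedDict, cast


class DefuddleResult(TypedDict, total=False):
    """Shape of `defuddle parse <source> -j` output (value types assumed)."""
    title: str
    author: str
    published: str
    description: str
    content: str      # Markdown when -m is also passed
    domain: str
    favicon: str
    image: str
    site: str
    wordCount: int
    parseTime: float  # milliseconds


def load_result(raw: str) -> DefuddleResult:
    """Decode defuddle's JSON output; only the top-level type is checked."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object from defuddle -j")
    return cast(DefuddleResult, data)
```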

Notes

Requires Node.js and npm
jsdom is required as a peer dependency
Works best with article-style pages (blogs, news, documentation)
Not designed for SPAs or JavaScript-heavy pages (e.g., WeChat articles, which need browser rendering)

Weekly Installs

906


First Seen

Mar 4, 2026

Security Audits

Gen Agent Trust Hub: Warn

Socket: Pass

Snyk: Warn

Installed on

opencode: 869

cursor: 868

codex: 868

gemini-cli: 867

github-copilot: 867

kimi-cli: 866