content-parser by marswaveai/skills
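npx skills add https://github.com/marswaveai/skills --skill content-parser
Extract and normalize content from URLs across supported platforms. Returns structured data including content body, metadata, and references. Useful as a preprocessing step for content generation skills or as a standalone content extraction tool.
- shared/authentication.md for API key and headers
- shared/common-patterns.md for polling, errors, and interaction patterns
- shared/config-pattern.md to read the config before any interaction
- Output files: save to the current working directory, not ~/Downloads/ or .listenhub/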
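Follow shared/config-pattern.md § API Key Check. If the key is missing, stop immediately.
Follow shared/config-pattern.md Step 0 (Zero-Question Boot).
If the file doesn't exist, silently create it with defaults and proceed: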
mkdir -p ".listenhub/content-parser"
echo '{"autoDownload":true}' > ".listenhub/content-parser/config.json"
CONFIG_PATH=".listenhub/content-parser/config.json"
CONFIG=$(cat "$CONFIG_PATH")
Do NOT ask any setup questions. Proceed directly to the Interaction Flow.
If the file exists, read the config silently and proceed:
CONFIG_PATH=".listenhub/content-parser/config.json"
[ ! -f "$CONFIG_PATH" ] && CONFIG_PATH="$HOME/.listenhub/content-parser/config.json"
CONFIG=$(cat "$CONFIG_PATH")
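The lookup order above (project-local config first, then the copy under `$HOME`) can be sketched as a small function. This is a minimal illustration, not the skill's own code: `resolve_config_path` is a hypothetical helper name, and unlike the snippet above it also checks that the `$HOME` copy exists before returning it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first config file that exists, preferring the
# project-local copy over the one under $HOME. Returns 1 if neither exists.
resolve_config_path() {
  local rel=".listenhub/content-parser/config.json"
  local candidate
  for candidate in "$rel" "$HOME/$rel"; do
    if [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Usage: read the config only when a file was actually found.
if CONFIG_PATH=$(resolve_config_path); then
  CONFIG=$(cat "$CONFIG_PATH")
fi
```

Returning a non-zero status when no file is found lets the caller decide whether to create the defaults instead of failing later inside `cat`.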
Only run when the user explicitly asks to reconfigure. Display current settings:
当前配置 (content-parser):
自动下载:{是 / 否}
Then ask:
- autoDownload: true
- autoDownload: false
Save immediately:
NEW_CONFIG=$(echo "$CONFIG" | jq --argjson dl {true/false} '. + {"autoDownload": $dl}')
echo "$NEW_CONFIG" > "$CONFIG_PATH"
CONFIG=$(cat "$CONFIG_PATH")
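Free text input. Ask the user:
What URL would you like to extract content from?
Ask whether the user wants to configure extraction options:
Question: "Do you want to configure extraction options?"
Options:
- "No, use defaults": extract with default settings
- "Yes, configure options": set summarize, maxLength, or Twitter tweet count
If "Yes", ask follow-up questions:
Summarize:
Ready to extract content:
URL: {url}
Options: {summarize: true, maxLength: 5000, twitter.count: 50} / default
Proceed?
Wait for explicit confirmation before calling the API.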
1. Validate URL: must be HTTP(S). Normalize if needed (see references/supported-platforms.md).
2. Build request body:
{ "source": { "type": "url", "uri": "{url}" }, "options": { "summarize": true/false, "maxLength": 5000, "twitter": { "count": 50 } } }
Omit options if the user chose defaults.
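The validation and body-building steps above can be sketched in plain bash. This is only an illustration under stated assumptions: `is_http_url` and `build_body` are hypothetical helper names, and the check is deliberately stricter than "must be HTTP(S)" so that the URL can be embedded in JSON without escaping.

```shell
#!/usr/bin/env bash
# Hypothetical helper: accept only http:// or https:// URLs that contain no
# whitespace, double quotes, or backslashes.
is_http_url() {
  case "$1" in
    *[[:space:]]*|*\"*|*\\*) return 1 ;;
    http://?*|https://?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: build the request body. The second argument is an
# optional JSON object for "options"; when it is empty, "options" is omitted.
build_body() {
  local url=$1 options=${2:-}
  if [ -n "$options" ]; then
    printf '{"source":{"type":"url","uri":"%s"},"options":%s}\n' "$url" "$options"
  else
    printf '{"source":{"type":"url","uri":"%s"}}\n' "$url"
  fi
}

# Usage: validate first, then build the body with default settings.
url="https://en.wikipedia.org/wiki/Topology"
if is_http_url "$url"; then
  build_body "$url"
fi
```

Passing the options as a ready-made JSON object keeps the sketch free of extra tools; a real implementation may prefer `jq -n` to build and escape the body.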
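3. Submit (foreground): POST /v1/content/extract → extract taskId
4. Tell the user extraction is in progress
5. Poll (background): Run the following exact bash command with run_in_background: true and timeout: 300000. Note: the status field is .data.status (not processStatus), the interval is 5 seconds, and the values are processing/completed/failed: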
TASK_ID="<id-from-step-3>"
# Up to 60 attempts, 5 seconds apart
for i in $(seq 1 60); do
  RESULT=$(curl -sS "https://api.marswave.ai/openapi/v1/content/extract/$TASK_ID" \
    -H "Authorization: Bearer $LISTENHUB_API_KEY" \
    -H "X-Source: skills" 2>/dev/null)
  # Strip control characters before parsing; a missing status counts as still processing
  STATUS=$(echo "$RESULT" | tr -d '\000-\037\177' | jq -r '.data.status // "processing"')
  case "$STATUS" in
    completed) echo "$RESULT"; exit 0 ;;
    failed) echo "FAILED: $RESULT" >&2; exit 1 ;;
    *) sleep 5 ;;
  esac
done
echo "TIMEOUT" >&2; exit 2
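The control flow of the polling loop (bounded attempts, a fixed interval, and three distinct outcomes) can be isolated from the HTTP call so it can be exercised without network access. The sketch below is an assumption-laden illustration: `poll_status` is a hypothetical helper, and the status command passed to it stands in for the `curl` and `jq` pipeline above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll_status MAX_ATTEMPTS INTERVAL_SECONDS CMD [ARGS...]
# CMD must print one of: processing, completed, failed.
# Returns 0 on completed, 1 on failed, 2 when the attempts run out,
# mirroring the exit codes of the script above.
poll_status() {
  local max=$1 interval=$2 attempt=1 status
  shift 2
  while [ "$attempt" -le "$max" ]; do
    status=$("$@")
    case "$status" in
      completed) return 0 ;;
      failed) return 1 ;;
    esac
    sleep "$interval"
    attempt=$((attempt + 1))
  done
  return 2
}
```

With 60 attempts at a 5 second interval, the sleeps alone add up to 300 seconds, which is the same budget as the 300000 ms timeout named above.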
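6. When notified, download and present the result:
If autoDownload is true, generate a slug from the extracted title (falling back to the domain name if there is no title). Follow shared/config-pattern.md § Artifact Naming for slug generation and dedup.
* Write `{slug}.md` to the **current directory**: full extracted content in markdown
* Write `{slug}.json` to the **current directory**: full raw API response data
SLUG="{title-slug}" # e.g. "topology-wikipedia"
# Dedup: check if files exist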
BASE="$SLUG"; i=2
while [ -e "${SLUG}.md" ] || [ -e "${SLUG}.json" ]; do SLUG="${BASE}-${i}"; i=$((i+1)); done
echo "$CONTENT_MD" > "${SLUG}.md"
echo "$RESULT" > "${SLUG}.json"
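The snippet above starts from a ready-made slug. One way to derive it from the title, with the domain fallback described earlier, is sketched below. The exact rules live in shared/config-pattern.md § Artifact Naming, which is not reproduced here, so the lowercase-and-hyphen scheme and the helper names `slugify` and `slug_for` are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: lowercase the input, collapse every run of characters
# outside a-z0-9 into a single hyphen, and trim hyphens at both ends.
slugify() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//'
}

# Hypothetical helper: slug from the title, or from the URL's host when the
# title is empty or yields an empty slug.
slug_for() {
  local title=$1 url=$2 slug host
  slug=$(slugify "$title")
  if [ -z "$slug" ]; then
    host=${url#*://}   # drop the scheme
    host=${host%%/*}   # drop the path
    slug=$(slugify "$host")
  fi
  printf '%s\n' "$slug"
}

# Usage
SLUG=$(slug_for "Topology - Wikipedia" "https://en.wikipedia.org/wiki/Topology")
```

A title made up entirely of non-ASCII characters produces an empty slug under this scheme, so it also takes the domain fallback.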
Present:
内容提取完成!
来源:{url}
标题:{metadata.title}
长度:~{character count} 字符
消耗积分:{credits}
已保存到当前目录:
{slug}.md
{slug}.json
7. Show a preview of the extracted content (first ~500 chars)
/podcast, /tts)
Estimated time: 10-30 seconds depending on content size and platform.
- shared/api-content-extract.md
- references/supported-platforms.md
- shared/common-patterns.md § Async Polling
- shared/common-patterns.md § Error Handling
- shared/config-pattern.md
User: "Parse this article: https://en.wikipedia.org/wiki/Topology"
Agent workflow:
https://en.wikipedia.org/wiki/Topology
curl -sS -X POST "https://api.marswave.ai/openapi/v1/content/extract" \
-H "Authorization: Bearer $LISTENHUB_API_KEY" \
-H "Content-Type: application/json" \
-H "X-Source: skills" \
-d '{
"source": {
"type": "url",
"uri": "https://en.wikipedia.org/wiki/Topology"
}
}'
4. Poll until complete:
curl -sS "https://api.marswave.ai/openapi/v1/content/extract/69a7dac700cf95938f86d9bb" \
-H "Authorization: Bearer $LISTENHUB_API_KEY" \
-H "X-Source: skills"
5. Present a preview of the extracted content and offer next actions.
User: "Extract recent tweets from @elonmusk, get 50 tweets"
Agent workflow:
https://x.com/elonmusk
{"twitter": {"count": 50}}
curl -sS -X POST "https://api.marswave.ai/openapi/v1/content/extract" \
-H "Authorization: Bearer $LISTENHUB_API_KEY" \
-H "Content-Type: application/json" \
-H "X-Source: skills" \
-d '{
"source": {
"type": "url",
"uri": "https://x.com/elonmusk"
},
"options": {
"twitter": {
"count": 50
}
}
}'
4. Poll until complete, present results.
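In the example above the user supplies a handle (`@elonmusk`) and the request uses a profile URL. A minimal sketch of that mapping follows. The actual normalization rules are in references/supported-platforms.md and are not shown here, so `handle_to_url` and the allowed character set (letters, digits, underscore) are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn "@name" or "name" into an x.com profile URL.
# Returns 1 for an empty handle or one with characters outside A-Z, a-z, 0-9, _.
handle_to_url() {
  local handle=${1#@}
  case "$handle" in
    ""|*[!A-Za-z0-9_]*) return 1 ;;
  esac
  printf 'https://x.com/%s\n' "$handle"
}

# Usage
handle_to_url "@elonmusk"
```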