data-analysis by bytedance/deer-flow
npx skills add https://github.com/bytedance/deer-flow --skill data-analysis
This skill analyzes user-uploaded Excel/CSV files using DuckDB — an in-process analytical SQL engine. It supports schema inspection, SQL-based querying, statistical summaries, and result export, all through a single Python script.
When a user uploads data files and requests analysis, identify:

- The uploaded file paths under /mnt/user-data/uploads/
- The output folder under /mnt/user-data

First, inspect the uploaded file to understand its schema:
python /mnt/skills/public/data-analysis/scripts/analyze.py \
--files /mnt/user-data/uploads/data.xlsx \
--action inspect
This returns the schema of each table (sheet) in the file. Based on the schema, construct SQL queries to answer the user's questions.
python /mnt/skills/public/data-analysis/scripts/analyze.py \
--files /mnt/user-data/uploads/data.xlsx \
--action query \
--sql "SELECT category, COUNT(*) as count, AVG(amount) as avg_amount FROM Sheet1 GROUP BY category ORDER BY count DESC"
python /mnt/skills/public/data-analysis/scripts/analyze.py \
--files /mnt/user-data/uploads/data.xlsx \
--action summary \
--table Sheet1
This returns for each numeric column: count, mean, std, min, 25%, 50%, 75%, max, null_count. For string columns: count, unique, top value, frequency, null_count.
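The numeric statistics above can be reproduced with Python's standard library. A minimal sketch for one numeric column, assuming the summary uses sample standard deviation and inclusive quartiles (pandas-`describe`-style conventions; the column values here are invented for illustration):

```python
import statistics

def numeric_summary(values):
    """Compute count, mean, std, min, quartiles, max, and null_count
    for a list of values that may contain None."""
    non_null = [v for v in values if v is not None]
    # Inclusive quartiles: 25%, 50%, 75% cut points of the observed data
    q = statistics.quantiles(non_null, n=4, method="inclusive")
    return {
        "count": len(non_null),
        "mean": statistics.mean(non_null),
        "std": statistics.stdev(non_null),   # sample std (ddof=1)
        "min": min(non_null),
        "25%": q[0], "50%": q[1], "75%": q[2],
        "max": max(non_null),
        "null_count": len(values) - len(non_null),
    }

print(numeric_summary([10, 20, 30, 40, None]))
```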
python /mnt/skills/public/data-analysis/scripts/analyze.py \
--files /mnt/user-data/uploads/data.xlsx \
--action query \
--sql "SELECT * FROM Sheet1 WHERE amount > 1000" \
--output-file /mnt/user-data/outputs/filtered-results.csv
Supported output formats (auto-detected from extension):
- .csv — Comma-separated values
- .json — JSON array of records
- .md — Markdown table

| Parameter | Required | Description |
|---|---|---|
| --files | Yes | Space-separated paths to Excel/CSV files |
| --action | Yes | One of: inspect, query, summary |
| --sql | For query | SQL query to execute |
| --table | For summary | Table/sheet name to summarize |
| --output-file | No | Path to export results (CSV/JSON/MD) |
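Extension-based export like the one --output-file performs can be sketched as a small dispatcher. This is a hypothetical illustration of the behavior described above, not the script's actual code:

```python
import csv
import json
import os

def export_rows(rows, columns, output_file):
    """Write query results in a format chosen from the file extension,
    mirroring the .csv / .json / .md auto-detection described above."""
    ext = os.path.splitext(output_file)[1].lower()
    with open(output_file, "w", newline="", encoding="utf-8") as f:
        if ext == ".csv":
            writer = csv.writer(f)
            writer.writerow(columns)
            writer.writerows(rows)
        elif ext == ".json":
            # JSON array of records: one object per row
            json.dump([dict(zip(columns, r)) for r in rows], f, indent=2)
        elif ext == ".md":
            f.write("| " + " | ".join(columns) + " |\n")
            f.write("|" + "---|" * len(columns) + "\n")
            for r in rows:
                f.write("| " + " | ".join(str(v) for v in r) + " |\n")
        else:
            raise ValueError(f"Unsupported extension: {ext}")
```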
> [!NOTE]
> Do NOT read the Python file; just call it with the parameters.
Table naming:

- Excel sheets load as tables named after the sheet (Sheet1, Sales, Revenue)
- CSV files load as tables named after the file (data.csv → data)
- Quote table names that start with a digit, e.g. "2024_Sales"

-- Row count
SELECT COUNT(*) FROM Sheet1
-- Distinct values in a column
SELECT DISTINCT category FROM Sheet1
-- Value distribution
SELECT category, COUNT(*) as cnt FROM Sheet1 GROUP BY category ORDER BY cnt DESC
-- Date range
SELECT MIN(date_col), MAX(date_col) FROM Sheet1
-- Revenue by category and month
SELECT category, DATE_TRUNC('month', order_date) as month,
SUM(revenue) as total_revenue
FROM Sales
GROUP BY category, month
ORDER BY month, total_revenue DESC
-- Top 10 customers by spend
SELECT customer_name, SUM(amount) as total_spend
FROM Orders GROUP BY customer_name
ORDER BY total_spend DESC LIMIT 10
-- Join sales with customer info from different files
SELECT s.order_id, s.amount, c.customer_name, c.region
FROM sales s
JOIN customers c ON s.customer_id = c.id
WHERE s.amount > 500
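Cross-file joins work because each uploaded file loads as its own table in the same session; the join itself is plain SQL. A self-contained check of the query above using Python's bundled sqlite3 as a stand-in for DuckDB (sample rows are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Two tables standing in for two separate uploaded files
con.execute("CREATE TABLE sales (order_id INTEGER, amount REAL, customer_id INTEGER)")
con.execute("CREATE TABLE customers (id INTEGER, customer_name TEXT, region TEXT)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                [(1, 700, 10), (2, 300, 10), (3, 900, 20)])
con.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                [(10, "Acme", "West"), (20, "Globex", "East")])

rows = con.execute("""
    SELECT s.order_id, s.amount, c.customer_name, c.region
    FROM sales s
    JOIN customers c ON s.customer_id = c.id
    WHERE s.amount > 500
    ORDER BY s.order_id
""").fetchall()
# Only the two orders above 500 survive the filter
```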
-- Running total and rank
SELECT order_date, amount,
SUM(amount) OVER (ORDER BY order_date) as running_total,
RANK() OVER (ORDER BY amount DESC) as amount_rank
FROM Sales
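Window functions like these are standard SQL, so the running-total pattern can be verified against any engine that supports them. A small demonstration with Python's bundled sqlite3 (requires SQLite ≥ 3.25, which ships with modern Python builds; table contents are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Sales (order_date TEXT, amount REAL)")
con.executemany("INSERT INTO Sales VALUES (?, ?)",
                [("2024-01-01", 100), ("2024-01-02", 50), ("2024-01-03", 200)])

rows = con.execute("""
    SELECT order_date, amount,
           SUM(amount) OVER (ORDER BY order_date) AS running_total,
           RANK() OVER (ORDER BY amount DESC) AS amount_rank
    FROM Sales
""").fetchall()
# running_total accumulates in date order; amount_rank ranks by size
```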
-- Pivot: monthly revenue by category
SELECT category,
SUM(CASE WHEN MONTH(date) = 1 THEN revenue END) as Jan,
SUM(CASE WHEN MONTH(date) = 2 THEN revenue END) as Feb,
SUM(CASE WHEN MONTH(date) = 3 THEN revenue END) as Mar
FROM Sales
GROUP BY category
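The conditional-aggregation pivot above is portable SQL apart from MONTH(), which is DuckDB syntax. The same pattern can be exercised with Python's bundled sqlite3 by swapping MONTH(date) for strftime('%m', date); sample data is invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Sales (category TEXT, date TEXT, revenue REAL)")
con.executemany("INSERT INTO Sales VALUES (?, ?, ?)", [
    ("A", "2024-01-05", 10),
    ("A", "2024-02-10", 20),
    ("B", "2024-01-15", 5),
])

rows = con.execute("""
    SELECT category,
           SUM(CASE WHEN strftime('%m', date) = '01' THEN revenue END) AS Jan,
           SUM(CASE WHEN strftime('%m', date) = '02' THEN revenue END) AS Feb
    FROM Sales
    GROUP BY category
    ORDER BY category
""").fetchall()
# Categories with no sales in a month get NULL for that column
```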
User uploads sales_2024.xlsx (with sheets: Orders, Products, Customers) and asks: "Analyze my sales data — show top products by revenue and monthly trends."
python /mnt/skills/public/data-analysis/scripts/analyze.py \
--files /mnt/user-data/uploads/sales_2024.xlsx \
--action inspect
python /mnt/skills/public/data-analysis/scripts/analyze.py \
--files /mnt/user-data/uploads/sales_2024.xlsx \
--action query \
--sql "SELECT p.product_name, SUM(o.quantity * o.unit_price) as total_revenue, SUM(o.quantity) as total_units FROM Orders o JOIN Products p ON o.product_id = p.id GROUP BY p.product_name ORDER BY total_revenue DESC LIMIT 10"
python /mnt/skills/public/data-analysis/scripts/analyze.py \
--files /mnt/user-data/uploads/sales_2024.xlsx \
--action query \
--sql "SELECT DATE_TRUNC('month', order_date) as month, SUM(quantity * unit_price) as revenue FROM Orders GROUP BY month ORDER BY month" \
--output-file /mnt/user-data/outputs/monthly-trends.csv
python /mnt/skills/public/data-analysis/scripts/analyze.py \
--files /mnt/user-data/uploads/sales_2024.xlsx \
--action summary \
--table Orders
Present results to the user with clear explanations of findings, trends, and actionable insights.
User uploads orders.csv and customers.xlsx and asks: "Which region has the highest average order value?"
python /mnt/skills/public/data-analysis/scripts/analyze.py \
--files /mnt/user-data/uploads/orders.csv /mnt/user-data/uploads/customers.xlsx \
--action query \
--sql "SELECT c.region, AVG(o.amount) as avg_order_value, COUNT(*) as order_count FROM orders o JOIN Customers c ON o.customer_id = c.id GROUP BY c.region ORDER BY avg_order_value DESC"
After analysis:

- Share any exported files with the user via the present_files tool

The script automatically caches loaded data to avoid re-parsing files on every call:

- Loaded tables are stored in a persistent DuckDB database under /mnt/user-data/workspace/.data-analysis-cache/

This is especially useful when running multiple queries against the same data files (inspect → query → summary).
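One way such a cache can stay valid is by keying it on the input files' identity. The script's actual invalidation strategy is not documented here; this is a hypothetical sketch of a mtime/size-based cache key:

```python
import hashlib
import os

def cache_key(paths):
    """Derive a stable cache identity from file paths, sizes, and mtimes,
    so a cached database is reused only while the inputs are unchanged."""
    h = hashlib.sha256()
    for p in sorted(paths):  # order-independent across calls
        st = os.stat(p)
        h.update(f"{p}:{st.st_size}:{st.st_mtime_ns}".encode())
    return h.hexdigest()[:16]
```

Any edit to an input file changes its size or mtime, which changes the key and forces a fresh parse.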
Tips:

- DuckDB supports rich date/time functions (DATE_TRUNC, EXTRACT, etc.)
- Quote column names that contain spaces, e.g. "Column Name"