data-analysis by casper-studios/casper-marketplace
npx skills add https://github.com/casper-studios/casper-marketplace --skill data-analysis
A comprehensive data analysis and storytelling skill optimized for financial, SaaS, and RevOps contexts. This skill provides structured workflows for turning raw data into actionable insights with full transparency on analytical decisions, bias awareness, and progressive disclosure reporting.
Every analysis follows a 7-phase process:
1. SETUP → Initialize Marimo notebook (run init_marimo_notebook.py)
2. INGEST → Load data, document sources and assumptions
3. EXPLORE → EDA with logged decisions (why this viz, why this filter)
4. MODEL → If needed, with interpretable-first approach
5. INTERPRET → Apply bias checklist, hedge appropriately
6. WISHLIST → Document data gaps and proxies used
7. OUTPUT → Generate appropriate tier (slides/report/notebook)
Every analytical choice must be logged. This creates an audit trail and enables reproducibility.
| Decision Type | Example | Log Format |
|---|---|---|
| Data filtering | Removed 47 records with null revenue | FILTER: [reason] - [count] records affected |
| Metric choice | Used logo churn vs revenue churn | METRIC: [chosen] over [alternative] because [reason] |
| Visualization | Line chart for time series | VIZ: [type] because [reason] |
| Assumption | Assumed linear growth for projection | ASSUMPTION: [statement] - confidence: [H/M/L] |
| Proxy used | Used support tickets as NPS proxy | PROXY: [proxy] for [missing data] - quality: [S/M/W] |
# === DECISION LOG ===
# FILTER: Excluded trial accounts - 1,247 records removed
# METRIC: NRR over GRR because expansion is a significant factor
# ASSUMPTION: Q4 seasonality similar to prior year - confidence: M
# PROXY: Support ticket sentiment for NPS - quality: W
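The log formats in the table above can be generated rather than typed by hand. The following is a minimal sketch, not part of the skill itself: the `DecisionLog` class and its method names are hypothetical, and only the entry formats and the `# === DECISION LOG ===` header come from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class DecisionLog:
    """Hypothetical helper that collects entries in the documented formats."""

    entries: list = field(default_factory=list)

    def filter(self, reason: str, count: int) -> None:
        self.entries.append(f"FILTER: {reason} - {count:,} records affected")

    def metric(self, chosen: str, alternative: str, reason: str) -> None:
        self.entries.append(f"METRIC: {chosen} over {alternative} because {reason}")

    def viz(self, chart_type: str, reason: str) -> None:
        self.entries.append(f"VIZ: {chart_type} because {reason}")

    def assumption(self, statement: str, confidence: str) -> None:
        # Confidence codes follow the table: H, M, or L.
        if confidence not in {"H", "M", "L"}:
            raise ValueError("confidence must be H, M, or L")
        self.entries.append(f"ASSUMPTION: {statement} - confidence: {confidence}")

    def proxy(self, proxy: str, missing: str, quality: str) -> None:
        # Quality codes follow the table: S, M, or W.
        if quality not in {"S", "M", "W"}:
            raise ValueError("quality must be S, M, or W")
        self.entries.append(f"PROXY: {proxy} for {missing} - quality: {quality}")

    def render(self) -> str:
        """Return the log as a comment block ready to paste into a notebook cell."""
        lines = ["# === DECISION LOG ==="]
        lines += [f"# {entry}" for entry in self.entries]
        return "\n".join(lines)


log = DecisionLog()
log.filter("Excluded trial accounts", 1247)
log.metric("NRR", "GRR", "expansion is a significant factor")
log.assumption("Q4 seasonality similar to prior year", "M")
print(log.render())
```

Validating the confidence and quality codes at write time keeps the log consistent with the documented formats.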
Run the initialization script to create a new Marimo notebook with pre-built scaffolding:
python scripts/init_marimo_notebook.py <notebook_name>
This creates a .py file with:
When loading data:
# === DATA SOURCE ===
# Source: sales_data_2024.csv
# Loaded: 2024-01-15
# Records: 15,847 rows x 23 columns
# Note: Data through 2024-01-10, 5-day lag from source system
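A header like the one above can be built from the data at load time so the row and column counts never drift from what was actually loaded. This is a sketch under stated assumptions: `data_source_header` is a hypothetical helper, and it uses the standard-library `csv` module on in-memory text to stay self-contained; in a real notebook the counts would more likely come from a pandas DataFrame's shape.

```python
import csv
import io
from datetime import date


def data_source_header(source: str, csv_text: str, loaded: date, note: str = "") -> str:
    """Build a DATA SOURCE comment block from CSV text (first row is the header)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    n_cols = len(rows[0]) if rows else 0
    n_rows = max(len(rows) - 1, 0)  # exclude the header row
    lines = [
        "# === DATA SOURCE ===",
        f"# Source: {source}",
        f"# Loaded: {loaded.isoformat()}",
        f"# Records: {n_rows:,} rows x {n_cols} columns",
    ]
    if note:
        lines.append(f"# Note: {note}")
    return "\n".join(lines)


sample = "account_id,arr\nA-1,1200\nA-2,800\n"  # illustrative data only
print(data_source_header("sample.csv", sample, date(2024, 1, 15), "illustrative data"))
```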
Follow this EDA checklist:
Log every visualization choice and filtering decision.
Prioritize interpretability:
Always provide:
Before finalizing insights, run the bias checklist. See references/biases.md for full checklist.
Quick check:
Hedge appropriately:
⚠️ GATE: Before proceeding to output, you MUST run the data quality validation checklist.
This is not optional. Run through references/data-quality-validator.md before finalizing:
Critical Patterns Checklist:
Statistical Checks:
Logic Checks:
Methodology Note on Time Horizons: When assessing skill vs luck (e.g., sales rep performance, investment returns):
Do not proceed to Phase 6/7 until this checklist is complete.
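One way to make the gate hard to skip is to encode it as a check in the notebook. The sketch below is an illustration only: the function names are hypothetical, and the checklist keys reuse the three group names mentioned above rather than the individual items in references/data-quality-validator.md.

```python
# Phase names as listed in the 7-phase process.
PHASES = ["SETUP", "INGEST", "EXPLORE", "MODEL", "INTERPRET", "WISHLIST", "OUTPUT"]
# Phases 6 and 7 sit behind the validation gate.
GATED = {"WISHLIST", "OUTPUT"}


def open_items(checklist: dict) -> list:
    """Return the names of checklist groups that are not yet complete."""
    return [name for name, done in checklist.items() if not done]


def can_enter(phase: str, checklist: dict) -> bool:
    """Allow gated phases only when every checklist group is marked complete."""
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase}")
    if phase not in GATED:
        return True
    return bool(checklist) and not open_items(checklist)


checklist = {
    "Critical Patterns": True,
    "Statistical Checks": False,
    "Logic Checks": True,
}
print(can_enter("OUTPUT", checklist), open_items(checklist))
```

An empty checklist is treated as incomplete, so the gate cannot be passed by simply not running the validation.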
Document gaps and proxies. See references/data-wishlisting.md for patterns.
Format:
## Data Wishlist
| Missing Data | Proxy Used | Quality | Impact on Analysis |
|--------------|------------|---------|-------------------|
| Customer NPS | Support sentiment | Weak | Core finding, needs validation |
| True LTV | 12-month value | Moderate | Acceptable for segmentation |
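If gaps are recorded as structured data during the analysis, the wishlist table can be rendered at the end instead of maintained by hand. A minimal sketch, assuming a hypothetical `WishlistItem` record and `render_wishlist` helper; only the heading and column names come from the format above.

```python
from typing import NamedTuple


class WishlistItem(NamedTuple):
    missing: str   # the data you wanted but did not have
    proxy: str     # what was used instead
    quality: str   # e.g. Strong / Moderate / Weak
    impact: str    # how the gap affects the analysis


def render_wishlist(items: list) -> str:
    """Render wishlist items as a Markdown table in the documented format."""
    lines = [
        "## Data Wishlist",
        "| Missing Data | Proxy Used | Quality | Impact on Analysis |",
        "|---|---|---|---|",
    ]
    for item in items:
        lines.append(f"| {item.missing} | {item.proxy} | {item.quality} | {item.impact} |")
    return "\n".join(lines)


items = [
    WishlistItem("Customer NPS", "Support sentiment", "Weak", "Core finding, needs validation"),
    WishlistItem("True LTV", "12-month value", "Moderate", "Acceptable for segmentation"),
]
print(render_wishlist(items))
```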
Choose output tier based on audience and purpose:
| Tier | When to Use | Tool |
|---|---|---|
| Slides | Executive summary, board deck | generate_pptx_summary.py |
| Report | Detailed findings, stakeholder review | Markdown/PDF |
| Notebook | Full analysis, data team handoff | Marimo .py file |
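The tier choice can be written down as a small lookup so it is explicit and logged like any other decision. This is a sketch only: the audience labels and the fallback to the report tier are assumptions made for illustration, while the tier-to-tool mapping is taken from the table above.

```python
# Tier -> tool, as listed in the table above.
OUTPUT_TIERS = {
    "slides": "generate_pptx_summary.py",
    "report": "Markdown/PDF",
    "notebook": "Marimo .py file",
}

# Hypothetical audience labels; adjust to your own stakeholders.
AUDIENCE_TO_TIER = {
    "executive": "slides",
    "board": "slides",
    "stakeholder": "report",
    "data_team": "notebook",
}


def choose_output(audience: str) -> tuple:
    """Return (tier, tool) for an audience; unknown audiences fall back to a report."""
    tier = AUDIENCE_TO_TIER.get(audience, "report")  # fallback is an assumption
    return tier, OUTPUT_TIERS[tier]


print(choose_output("board"))
print(choose_output("data_team"))
```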
For messy data that needs cleaning before analysis:
python scripts/profile_data.py <csv_file> --output data_quality_report.md
This generates:
Reference references/data-cleaning.md for:
Reference references/datetime-handling.md for:
For interactive monitoring dashboards:
python scripts/init_dashboard.py <dashboard_name>
This creates a Marimo dashboard with:
Reference references/dashboard-patterns.md for:
Before presenting or accepting analytical claims:
Reference references/data-quality-validator.md for comprehensive checklists:
Statistical Sins:
Chart Crimes:
Logic Fallacies:
Sanity Checks:
For exporting analysis results to Excel with proper formulas and formatting:
Reference references/xlsx-patterns.md for:
After creating Excel files with formulas, always recalculate:
python scripts/recalc.py output.xlsx
This ensures:
For extracting data from PDFs or creating PDF reports:
Reference references/pdf-patterns.md for:
Load these as needed during analysis:
| Reference | When to Use |
|---|---|
| references/metrics.md | Calculating SaaS/RevOps metrics |
| references/biases.md | Interpretation phase, before finalizing insights |
| references/report-templates.md | Structuring output (pyramid vs consulting style) |
| references/visualization-guide.md | Choosing chart types, avoiding anti-patterns |
| references/data-wishlisting.md | Documenting gaps, rating proxy quality |
| references/data-cleaning.md | Data quality checks, cleaning patterns |
| references/datetime-handling.md | Timezone, parsing, fiscal calendars |
| references/dashboard-patterns.md | Marimo layouts, KPIs, interactivity |
| references/data-quality-validator.md | Data quality validation, detecting issues |
| references/xlsx-patterns.md | Excel output, financial model standards, formulas |
| references/pdf-patterns.md | PDF extraction, report creation, manipulation |
| Script | Purpose | Usage |
|---|---|---|
| scripts/init_marimo_notebook.py | Initialize analysis workspace | python scripts/init_marimo_notebook.py <name> |
| scripts/generate_pptx_summary.py | Create slide deck from findings | python scripts/generate_pptx_summary.py <config.json> |
| scripts/profile_data.py | Generate data quality report | python scripts/profile_data.py <csv_file> |
| scripts/init_dashboard.py | Scaffold interactive dashboard | python scripts/init_dashboard.py <name> |
| scripts/recalc.py | Recalculate Excel formulas | python scripts/recalc.py <xlsx_file> |
| Tool | Purpose | Why |
|---|---|---|
| Marimo | Notebook environment | Pure Python files, reactive, git-friendly |
| pandas | Data manipulation | Reliable LLM code generation, mature ecosystem |
| Matplotlib/Seaborn | Visualization | Publication-quality, static, well-supported |
| python-pptx | Slide generation | Programmatic PowerPoint creation |
| openpyxl | Excel files | Formulas, formatting, financial models |
| pypdf/pdfplumber | PDF handling | Extract text, tables; create reports |
| reportlab | PDF creation | Professional PDF reports |
Revenue analysis:
"Analyze our ARR trends by segment and identify drivers of growth/churn"
Pipeline analytics:
"Build a win rate analysis by deal size and sales rep"
Cohort analysis:
"Create a retention cohort analysis for customers acquired in 2023"
Forecasting:
"Project next quarter revenue based on current pipeline"
Board deck:
"Create an executive summary deck of our key SaaS metrics"
Data cleaning:
"Clean this messy CSV and profile the data quality"
Dashboard:
"Build a dashboard to monitor our key SaaS metrics"
Data validation:
"Validate these findings before I present them"
Excel output:
"Export this analysis to Excel with proper formulas and formatting"
PDF extraction:
"Extract the tables from this quarterly report PDF"
Financial model:
"Create a revenue projection model in Excel with scenario inputs"
Weekly Installs: 98
Repository: casper-studios/casper-marketplace
GitHub Stars: 9
First Seen: Feb 24, 2026
Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)
Installed on: cline (98), github-copilot (98), codex (98), kimi-cli (98), gemini-cli (98), cursor (98)