专业级数据分析技能：SaaS指标计算、麦肯锡风格图表与Streamlit仪表板开发

data-analysis by scientiacapital/skills

101 周安装量

6 GitHub Stars

GitHub

安装命令

npx skills add https://github.com/scientiacapital/skills --skill data-analysis

数据可视化数据分析 SaaS

🇨🇳中文介绍

<quick_start> 通用数据加载器：

df = load_data("file.csv")  # 支持 CSV、Excel、JSON、Parquet、PDF、PPTX

SaaS 指标：

metrics = calculate_saas_metrics(df)  # MRR、ARR、LTV、CAC、流失率
retention = cohort_retention_analysis(df)  # 留存矩阵

麦肯锡风格图表： 使用行动标题（例如"第四季度收入超出目标 23%"），而非描述性标题 </quick_start>

<success_criteria> 分析成功的标准：

数据已加载并完成清洗（删除空值、去重、类型转换）
指标计算正确（MRR、ARR、LTV:CAC、流失率、同期群留存率）
图表遵循麦肯锡原则：行动标题、数据墨水比 >80%、每张图表传达一个核心信息
使用高管配色（主色 #003366、正向色 #2E7D32、负向色 #C62828）
Streamlit 仪表板运行无错误
禁止使用 OpenAI：如需生成叙述性内容，请使用 Claude </success_criteria>

<core_content> 面向风投、私募股权和 C 级高管演示文稿的、专业级的数据分析，使用 pandas、polars、Plotly、Altair 和 Streamlit。

快速参考

任务	工具	输出
数据摄取	pandas、polars、pdfplumber、python-pptx

广告位招租

在这里展示您的产品或服务

触达数万 AI 开发者，精准高效

联系我们

通用数据加载器

import pandas as pd
import polars as pl
from pathlib import Path

def load_data(file_path: str) -> pd.DataFrame:
    """从任何常见格式加载数据。"""
    path = Path(file_path)
    suffix = path.suffix.lower()

    loaders = {
        '.csv': lambda p: pd.read_csv(p),
        '.xlsx': lambda p: pd.read_excel(p, engine='openpyxl'),
        '.xls': lambda p: pd.read_excel(p, engine='xlrd'),
        '.json': lambda p: pd.read_json(p),
        '.parquet': lambda p: pd.read_parquet(p),
        '.sql': lambda p: pd.read_sql(open(p).read(), conn),
        '.md': lambda p: parse_markdown_tables(p),
        '.pdf': lambda p: extract_pdf_tables(p),
        '.pptx': lambda p: extract_pptx_tables(p),
    }

    if suffix not in loaders:
        raise ValueError(f"不支持的格式：{suffix}")

    return loaders[suffix](path)

import pdfplumber

def extract_pdf_tables(pdf_path: str) -> pd.DataFrame:
    """使用 pdfplumber 从 PDF 中提取表格。"""
    all_tables = []

    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            tables = page.extract_tables()
            for table in tables:
                if table and len(table) > 1:
                    df = pd.DataFrame(table[1:], columns=table[0])
                    all_tables.append(df)

    return pd.concat(all_tables, ignore_index=True) if all_tables else pd.DataFrame()

PowerPoint 数据提取

from pptx import Presentation
from pptx.util import Inches

def extract_pptx_tables(pptx_path: str) -> list[pd.DataFrame]:
    """从 PowerPoint 中提取所有表格。"""
    prs = Presentation(pptx_path)
    tables = []

    for slide in prs.slides:
        for shape in slide.shapes:
            if shape.has_table:
                table = shape.table
                data = []
                for row in table.rows:
                    data.append([cell.text for cell in row.cells])
                df = pd.DataFrame(data[1:], columns=data[0])
                tables.append(df)

    return tables

Polars 性能优化（比 pandas 快 30 倍）

import polars as pl

# 针对大型数据集的惰性求值
df = (
    pl.scan_csv("large_file.csv")
    .filter(pl.col("revenue") > 0)
    .with_columns([
        (pl.col("revenue") / pl.col("customers")).alias("arpu"),
        pl.col("date").str.to_date().alias("date_parsed"),
    ])
    .group_by("segment")
    .agg([
        pl.col("revenue").sum().alias("total_revenue"),
        pl.col("customers").mean().alias("avg_customers"),
    ])
    .collect()
)

def prepare_for_analysis(df: pd.DataFrame) -> pd.DataFrame:
    """标准数据准备流程。"""
    return (df
        .dropna(subset=['key_column'])
        .drop_duplicates()
        .assign(
            date=lambda x: pd.to_datetime(x['date']),
            revenue=lambda x: pd.to_numeric(x['revenue'], errors='coerce'),
            month=lambda x: x['date'].dt.to_period('M'),
        )
        .sort_values('date')
        .reset_index(drop=True)
    )

def calculate_saas_metrics(df: pd.DataFrame) -> dict:
    """为投资者报告计算关键的 SaaS 指标。"""

    # MRR / ARR
    mrr = df.groupby('month')['mrr'].sum()
    arr = mrr.iloc[-1] * 12

    # 增长率
    mrr_growth = mrr.pct_change().iloc[-1]

    # 流失率
    churned = df[df['status'] == 'churned']['mrr'].sum()
    total_mrr = df['mrr'].sum()
    churn_rate = churned / total_mrr if total_mrr > 0 else 0

    # CAC 和 LTV
    total_sales_marketing = df['sales_cost'].sum() + df['marketing_cost'].sum()
    new_customers = df[df['is_new']]['customer_id'].nunique()
    cac = total_sales_marketing / new_customers if new_customers > 0 else 0

    avg_revenue_per_customer = df.groupby('customer_id')['mrr'].mean().mean()
    avg_lifespan_months = 1 / churn_rate if churn_rate > 0 else 36
    ltv = avg_revenue_per_customer * avg_lifespan_months

    ltv_cac_ratio = ltv / cac if cac > 0 else 0
    cac_payback_months = cac / avg_revenue_per_customer if avg_revenue_per_customer > 0 else 0

    return {
        'mrr': mrr.iloc[-1],
        'arr': arr,
        'mrr_growth': mrr_growth,
        'churn_rate': churn_rate,
        'cac': cac,
        'ltv': ltv,
        'ltv_cac_ratio': ltv_cac_ratio,
        'cac_payback_months': cac_payback_months,
    }

def cohort_retention_analysis(df: pd.DataFrame) -> pd.DataFrame:
    """为投资者报告构建同期群留存矩阵。"""

    # 分配同期群（首次购买月份）
    df['cohort'] = df.groupby('customer_id')['date'].transform('min').dt.to_period('M')
    df['period'] = df['date'].dt.to_period('M')
    df['cohort_age'] = (df['period'] - df['cohort']).apply(lambda x: x.n)

    # 构建留存矩阵
    cohort_data = df.groupby(['cohort', 'cohort_age']).agg({
        'customer_id': 'nunique',
        'revenue': 'sum'
    }).reset_index()

    # 数据透视以便可视化
    cohort_counts = cohort_data.pivot(
        index='cohort',
        columns='cohort_age',
        values='customer_id'
    )

    # 计算留存百分比
    cohort_sizes = cohort_counts.iloc[:, 0]
    retention = cohort_counts.divide(cohort_sizes, axis=0) * 100

    return retention

麦肯锡/BCG 图表原则

mckinsey_style:
  colors:
    primary: "#003366"      # 深蓝色
    accent: "#0066CC"       # 亮蓝色
    positive: "#2E7D32"     # 绿色
    negative: "#C62828"     # 红色
    neutral: "#757575"      # 灰色

  typography:
    title: "Georgia, serif"
    body: "Arial, sans-serif"
    size_title: 18
    size_body: 12

  principles:
    - "每张图表传达一个核心信息"
    - "使用行动标题（而非描述性标题）"
    - "数据墨水比 > 80%"
    - "移除图表垃圾"
    - "直接在图表上标注"

Plotly 高管级图表

import plotly.express as px
import plotly.graph_objects as go

EXEC_COLORS = {
    'primary': '#003366',
    'secondary': '#0066CC',
    'positive': '#2E7D32',
    'negative': '#C62828',
    'neutral': '#757575',
}

def exec_line_chart(df, x, y, title):
    """麦肯锡风格折线图。"""
    fig = px.line(df, x=x, y=y)

    fig.update_layout(
        title=dict(
            text=f"<b>{title}</b>",
            font=dict(size=18, family="Georgia"),
            x=0,
        ),
        font=dict(family="Arial", size=12),
        plot_bgcolor='white',
        xaxis=dict(showgrid=False, showline=True, linecolor='black'),
        yaxis=dict(showgrid=True, gridcolor='#E0E0E0', showline=True, linecolor='black'),
        margin=dict(l=60, r=40, t=60, b=40),
    )

    fig.update_traces(line=dict(color=EXEC_COLORS['primary'], width=3))

    return fig

def exec_waterfall(values, labels, title):
    """用于收入/成本分解的瀑布图。"""
    fig = go.Figure(go.Waterfall(
        orientation="v",
        measure=["relative"] * (len(values) - 1) + ["total"],
        x=labels,
        y=values,
        connector=dict(line=dict(color="rgb(63, 63, 63)")),
        increasing=dict(marker=dict(color=EXEC_COLORS['positive'])),
        decreasing=dict(marker=dict(color=EXEC_COLORS['negative'])),
        totals=dict(marker=dict(color=EXEC_COLORS['primary'])),
    ))

    fig.update_layout(
        title=dict(text=f"<b>{title}</b>", font=dict(size=18, family="Georgia")),
        font=dict(family="Arial", size=12),
        plot_bgcolor='white',
        showlegend=False,
    )

    return fig

def cohort_heatmap(retention_df, title="按同期群划分的客户留存率"):
    """出版物质量的同期群留存热力图。"""
    import plotly.figure_factory as ff

    fig = px.imshow(
        retention_df.values,
        labels=dict(x="获客后月数", y="同期群", color="留存率 %"),
        x=list(retention_df.columns),
        y=[str(c) for c in retention_df.index],
        color_continuous_scale='Blues',
        aspect='auto',
    )

    # 添加文本注释
    for i, row in enumerate(retention_df.values):
        for j, val in enumerate(row):
            if not pd.isna(val):
                fig.add_annotation(
                    x=j, y=i,
                    text=f"{val:.0f}%",
                    showarrow=False,
                    font=dict(color='white' if val > 50 else 'black', size=10)
                )

    fig.update_layout(
        title=dict(text=f"<b>{title}</b>", font=dict(size=18, family="Georgia")),
        font=dict(family="Arial", size=12),
    )

    return fig

Streamlit 仪表板模板

import streamlit as st
import pandas as pd
import plotly.express as px

st.set_page_config(page_title="高管仪表板", layout="wide")

# 用于高管风格的自定义 CSS
st.markdown("""
<style>
    .metric-card {
        background: linear-gradient(135deg, #003366, #0066CC);
        padding: 20px;
        border-radius: 10px;
        color: white;
    }
    .stMetric label { font-family: Georgia, serif; }
</style>
""", unsafe_allow_html=True)

# 页眉
st.title("高管仪表板")
st.markdown("---")

# KPI 行
col1, col2, col3, col4 = st.columns(4)

with col1:
    st.metric("MRR", f"${mrr:,.0f}", f"{mrr_growth:+.1%}")
with col2:
    st.metric("ARR", f"${arr:,.0f}", f"{arr_growth:+.1%}")
with col3:
    st.metric("LTV:CAC", f"{ltv_cac:.1f}x", delta_color="normal")
with col4:
    st.metric("流失率", f"{churn:.1%}", f"{churn_delta:+.1%}", delta_color="inverse")

# 图表行
st.markdown("## 收入趋势")
st.plotly_chart(exec_line_chart(df, 'month', 'revenue', 'MRR 增长超出目标'), use_container_width=True)

# 同期群分析
st.markdown("## 同期群留存")
st.plotly_chart(cohort_heatmap(retention_df), use_container_width=True)

投资者演示文稿模式

融资演讲稿指标序列

investor_metrics_flow:
  1_unit_economics:
    charts: ["CAC vs LTV 柱状图", "LTV:CAC 趋势线"]
    key_message: "3倍以上的 LTV:CAC 证明了高效增长"

  2_mrr_waterfall:
    charts: ["MRR 瀑布图（新增、扩张、流失、收缩）"]
    key_message: "净收入留存率 > 100%"

  3_cohort_retention:
    charts: ["同期群热力图", "收入留存曲线"]
    key_message: "强劲的留存率 = 复合价值"

  4_growth_efficiency:
    charts: ["魔力数字", "CAC 回收期"]
    key_message: "高效的增长引擎"

  5_projections:
    charts: ["包含情景的 ARR 预测"]
    key_message: "清晰的 ARR 达到 $X 的路径"

行动标题（麦肯锡风格）

## 差（描述性） → 好（行动性）

❌ "按季度划分的收入"
✅ "第四季度收入超出目标 23%"

❌ "客户获取成本"
✅ "CAC 下降 40%，同时保持质量"

❌ "同期群分析"
✅ "90 天留存率提升至 85%，高于之前的 72%"

❌ "市场规模"
✅ "TAM 达 42 亿美元，拥有清晰的 5 亿美元 SAM 路径"

# 加载并分析任何文件
df = load_data("data.csv")
metrics = calculate_saas_metrics(df)
retention = cohort_retention_analysis(df)

# 生成高管级图表
fig = exec_line_chart(df, 'month', 'mrr', 'MRR 增长加速')
fig.write_html("mrr_chart.html")
fig.write_image("mrr_chart.png", scale=2)

# 运行 Streamlit 仪表板
# streamlit run dashboard.py

可与以下技能配合使用 : revenue-ops-skill（指标）、pricing-strategy-skill（建模）
技术栈 : Python 3.11+、pandas、polars、plotly、altair、streamlit
项目 : coperniq-forge（ROI 计算器）、thetaroom（交易分析）
禁止使用 OpenAI : 如需生成叙述性内容，请使用 Claude

reference/chart-gallery.md - 20 多个带代码的图表模板
reference/saas-metrics.md - 完整的 SaaS KPI 定义
reference/streamlit-patterns.md - 生产级仪表板模式
reference/data-wrangling.md - 特定格式的提取指南

输出结果侧车文件

写入 ~/.claude/skill-analytics/last-outcome-data-analysis.json: {"ts":"[UTC ISO8601]","skill":"data-analysis","version":"1.0.0","variant":"default","status":"[success|partial|error]","runtime_ms":[ms],"metrics":{"datasets_analyzed":[n],"charts_generated":[n],"insights_extracted":[n]},"error":null,"session_id":"[YYYY-MM-DD]"}

2026 年 1 月 23 日

🇺🇸English

<quick_start> Universal data loader:

df = load_data("file.csv")  # Supports CSV, Excel, JSON, Parquet, PDF, PPTX

SaaS metrics:

metrics = calculate_saas_metrics(df)  # MRR, ARR, LTV, CAC, churn
retention = cohort_retention_analysis(df)  # Retention matrix

McKinsey-style charts: Action titles ("Q4 Revenue Exceeded Target by 23%"), not descriptive titles </quick_start>

<success_criteria> Analysis is successful when:

Data loaded and cleaned (dropna, dedup, type conversion)
Metrics calculated correctly (MRR, ARR, LTV:CAC, churn, cohort retention)
Charts follow McKinsey principles: action titles, data-ink ratio >80%, one message per chart
Executive colors used (#003366 primary, #2E7D32 positive, #C62828 negative)
Streamlit dashboard runs without errors
NO OPENAI: Use Claude for narrative generation if needed </success_criteria>

<core_content> Executive-grade data analysis for VC, PE, C-suite presentations using pandas, polars, Plotly, Altair, and Streamlit.

Quick Reference

Task	Tools	Output
Data ingestion	pandas, polars, pdfplumber, python-pptx	DataFrame
Wrangling	pandas/polars transforms	Clean dataset
Analysis	numpy, scipy, statsmodels	Insights
Visualization	Plotly, Altair, Seaborn	Charts
Dashboards	Streamlit, DuckDB	Interactive apps
Presentations	Plotly export, PDF generation	Investor-ready

Data Ingestion Patterns

Universal Data Loader

import pandas as pd
import polars as pl
from pathlib import Path

def load_data(file_path: str) -> pd.DataFrame:
    """Load data from any common format."""
    path = Path(file_path)
    suffix = path.suffix.lower()

    loaders = {
        '.csv': lambda p: pd.read_csv(p),
        '.xlsx': lambda p: pd.read_excel(p, engine='openpyxl'),
        '.xls': lambda p: pd.read_excel(p, engine='xlrd'),
        '.json': lambda p: pd.read_json(p),
        '.parquet': lambda p: pd.read_parquet(p),
        '.sql': lambda p: pd.read_sql(open(p).read(), conn),
        '.md': lambda p: parse_markdown_tables(p),
        '.pdf': lambda p: extract_pdf_tables(p),
        '.pptx': lambda p: extract_pptx_tables(p),
    }

    if suffix not in loaders:
        raise ValueError(f"Unsupported format: {suffix}")

    return loaders[suffix](path)

PDF Table Extraction

import pdfplumber

def extract_pdf_tables(pdf_path: str) -> pd.DataFrame:
    """Extract tables from PDF using pdfplumber."""
    all_tables = []

    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            tables = page.extract_tables()
            for table in tables:
                if table and len(table) > 1:
                    df = pd.DataFrame(table[1:], columns=table[0])
                    all_tables.append(df)

    return pd.concat(all_tables, ignore_index=True) if all_tables else pd.DataFrame()

PowerPoint Data Extraction

from pptx import Presentation
from pptx.util import Inches

def extract_pptx_tables(pptx_path: str) -> list[pd.DataFrame]:
    """Extract all tables from PowerPoint."""
    prs = Presentation(pptx_path)
    tables = []

    for slide in prs.slides:
        for shape in slide.shapes:
            if shape.has_table:
                table = shape.table
                data = []
                for row in table.rows:
                    data.append([cell.text for cell in row.cells])
                df = pd.DataFrame(data[1:], columns=data[0])
                tables.append(df)

    return tables

Data Wrangling Patterns

Polars for Performance (30x faster than pandas)

import polars as pl

# Lazy evaluation for large datasets
df = (
    pl.scan_csv("large_file.csv")
    .filter(pl.col("revenue") > 0)
    .with_columns([
        (pl.col("revenue") / pl.col("customers")).alias("arpu"),
        pl.col("date").str.to_date().alias("date_parsed"),
    ])
    .group_by("segment")
    .agg([
        pl.col("revenue").sum().alias("total_revenue"),
        pl.col("customers").mean().alias("avg_customers"),
    ])
    .collect()
)

Common Transformations

def prepare_for_analysis(df: pd.DataFrame) -> pd.DataFrame:
    """Standard data prep pipeline."""
    return (df
        .dropna(subset=['key_column'])
        .drop_duplicates()
        .assign(
            date=lambda x: pd.to_datetime(x['date']),
            revenue=lambda x: pd.to_numeric(x['revenue'], errors='coerce'),
            month=lambda x: x['date'].dt.to_period('M'),
        )
        .sort_values('date')
        .reset_index(drop=True)
    )

SaaS Metrics Calculations

Core Metrics

def calculate_saas_metrics(df: pd.DataFrame) -> dict:
    """Calculate key SaaS metrics for investor reporting."""

    # MRR / ARR
    mrr = df.groupby('month')['mrr'].sum()
    arr = mrr.iloc[-1] * 12

    # Growth rates
    mrr_growth = mrr.pct_change().iloc[-1]

    # Churn
    churned = df[df['status'] == 'churned']['mrr'].sum()
    total_mrr = df['mrr'].sum()
    churn_rate = churned / total_mrr if total_mrr > 0 else 0

    # CAC & LTV
    total_sales_marketing = df['sales_cost'].sum() + df['marketing_cost'].sum()
    new_customers = df[df['is_new']]['customer_id'].nunique()
    cac = total_sales_marketing / new_customers if new_customers > 0 else 0

    avg_revenue_per_customer = df.groupby('customer_id')['mrr'].mean().mean()
    avg_lifespan_months = 1 / churn_rate if churn_rate > 0 else 36
    ltv = avg_revenue_per_customer * avg_lifespan_months

    ltv_cac_ratio = ltv / cac if cac > 0 else 0
    cac_payback_months = cac / avg_revenue_per_customer if avg_revenue_per_customer > 0 else 0

    return {
        'mrr': mrr.iloc[-1],
        'arr': arr,
        'mrr_growth': mrr_growth,
        'churn_rate': churn_rate,
        'cac': cac,
        'ltv': ltv,
        'ltv_cac_ratio': ltv_cac_ratio,
        'cac_payback_months': cac_payback_months,
    }

Cohort Analysis

def cohort_retention_analysis(df: pd.DataFrame) -> pd.DataFrame:
    """Build cohort retention matrix for investor reporting."""

    # Assign cohort (first purchase month)
    df['cohort'] = df.groupby('customer_id')['date'].transform('min').dt.to_period('M')
    df['period'] = df['date'].dt.to_period('M')
    df['cohort_age'] = (df['period'] - df['cohort']).apply(lambda x: x.n)

    # Build retention matrix
    cohort_data = df.groupby(['cohort', 'cohort_age']).agg({
        'customer_id': 'nunique',
        'revenue': 'sum'
    }).reset_index()

    # Pivot for visualization
    cohort_counts = cohort_data.pivot(
        index='cohort',
        columns='cohort_age',
        values='customer_id'
    )

    # Calculate retention percentages
    cohort_sizes = cohort_counts.iloc[:, 0]
    retention = cohort_counts.divide(cohort_sizes, axis=0) * 100

    return retention

Executive Visualization

McKinsey/BCG Chart Principles

mckinsey_style:
  colors:
    primary: "#003366"      # Deep blue
    accent: "#0066CC"       # Bright blue
    positive: "#2E7D32"     # Green
    negative: "#C62828"     # Red
    neutral: "#757575"      # Gray

  typography:
    title: "Georgia, serif"
    body: "Arial, sans-serif"
    size_title: 18
    size_body: 12

  principles:
    - "One message per chart"
    - "Action title (not descriptive)"
    - "Data-ink ratio > 80%"
    - "Remove chartjunk"
    - "Label directly on chart"

Plotly Executive Charts

import plotly.express as px
import plotly.graph_objects as go

EXEC_COLORS = {
    'primary': '#003366',
    'secondary': '#0066CC',
    'positive': '#2E7D32',
    'negative': '#C62828',
    'neutral': '#757575',
}

def exec_line_chart(df, x, y, title):
    """McKinsey-style line chart."""
    fig = px.line(df, x=x, y=y)

    fig.update_layout(
        title=dict(
            text=f"<b>{title}</b>",
            font=dict(size=18, family="Georgia"),
            x=0,
        ),
        font=dict(family="Arial", size=12),
        plot_bgcolor='white',
        xaxis=dict(showgrid=False, showline=True, linecolor='black'),
        yaxis=dict(showgrid=True, gridcolor='#E0E0E0', showline=True, linecolor='black'),
        margin=dict(l=60, r=40, t=60, b=40),
    )

    fig.update_traces(line=dict(color=EXEC_COLORS['primary'], width=3))

    return fig

def exec_waterfall(values, labels, title):
    """Waterfall chart for revenue/cost breakdown."""
    fig = go.Figure(go.Waterfall(
        orientation="v",
        measure=["relative"] * (len(values) - 1) + ["total"],
        x=labels,
        y=values,
        connector=dict(line=dict(color="rgb(63, 63, 63)")),
        increasing=dict(marker=dict(color=EXEC_COLORS['positive'])),
        decreasing=dict(marker=dict(color=EXEC_COLORS['negative'])),
        totals=dict(marker=dict(color=EXEC_COLORS['primary'])),
    ))

    fig.update_layout(
        title=dict(text=f"<b>{title}</b>", font=dict(size=18, family="Georgia")),
        font=dict(family="Arial", size=12),
        plot_bgcolor='white',
        showlegend=False,
    )

    return fig

Cohort Heatmap

def cohort_heatmap(retention_df, title="Customer Retention by Cohort"):
    """Publication-quality cohort retention heatmap."""
    import plotly.figure_factory as ff

    fig = px.imshow(
        retention_df.values,
        labels=dict(x="Months Since Acquisition", y="Cohort", color="Retention %"),
        x=list(retention_df.columns),
        y=[str(c) for c in retention_df.index],
        color_continuous_scale='Blues',
        aspect='auto',
    )

    # Add text annotations
    for i, row in enumerate(retention_df.values):
        for j, val in enumerate(row):
            if not pd.isna(val):
                fig.add_annotation(
                    x=j, y=i,
                    text=f"{val:.0f}%",
                    showarrow=False,
                    font=dict(color='white' if val > 50 else 'black', size=10)
                )

    fig.update_layout(
        title=dict(text=f"<b>{title}</b>", font=dict(size=18, family="Georgia")),
        font=dict(family="Arial", size=12),
    )

    return fig

Streamlit Dashboard Template

import streamlit as st
import pandas as pd
import plotly.express as px

st.set_page_config(page_title="Executive Dashboard", layout="wide")

# Custom CSS for executive styling
st.markdown("""
<style>
    .metric-card {
        background: linear-gradient(135deg, #003366, #0066CC);
        padding: 20px;
        border-radius: 10px;
        color: white;
    }
    .stMetric label { font-family: Georgia, serif; }
</style>
""", unsafe_allow_html=True)

# Header
st.title("Executive Dashboard")
st.markdown("---")

# KPI Row
col1, col2, col3, col4 = st.columns(4)

with col1:
    st.metric("MRR", f"${mrr:,.0f}", f"{mrr_growth:+.1%}")
with col2:
    st.metric("ARR", f"${arr:,.0f}", f"{arr_growth:+.1%}")
with col3:
    st.metric("LTV:CAC", f"{ltv_cac:.1f}x", delta_color="normal")
with col4:
    st.metric("Churn", f"{churn:.1%}", f"{churn_delta:+.1%}", delta_color="inverse")

# Charts Row
st.markdown("## Revenue Trend")
st.plotly_chart(exec_line_chart(df, 'month', 'revenue', 'MRR Growth Exceeds Target'), use_container_width=True)

# Cohort Analysis
st.markdown("## Cohort Retention")
st.plotly_chart(cohort_heatmap(retention_df), use_container_width=True)

Investor Presentation Patterns

Pitch Deck Metrics Sequence

investor_metrics_flow:
  1_unit_economics:
    charts: ["CAC vs LTV bar", "LTV:CAC trend line"]
    key_message: "3x+ LTV:CAC proves efficient growth"

  2_mrr_waterfall:
    charts: ["MRR waterfall (new, expansion, churn, contraction)"]
    key_message: "Net revenue retention > 100%"

  3_cohort_retention:
    charts: ["Cohort heatmap", "Revenue retention curve"]
    key_message: "Strong retention = compounding value"

  4_growth_efficiency:
    charts: ["Magic Number", "CAC payback period"]
    key_message: "Efficient growth engine"

  5_projections:
    charts: ["ARR projection with scenarios"]
    key_message: "Clear path to $X ARR"

Action Titles (McKinsey Style)

## Bad (Descriptive) → Good (Action)

❌ "Revenue by Quarter"
✅ "Q4 Revenue Exceeded Target by 23%"

❌ "Customer Acquisition Cost"
✅ "CAC Decreased 40% While Maintaining Quality"

❌ "Cohort Analysis"
✅ "90-Day Retention Improved to 85%, Up From 72%"

❌ "Market Size"
✅ "TAM of $4.2B with Clear Path to $500M SAM"

Quick Commands

# Load and analyze any file
df = load_data("data.csv")
metrics = calculate_saas_metrics(df)
retention = cohort_retention_analysis(df)

# Generate executive charts
fig = exec_line_chart(df, 'month', 'mrr', 'MRR Growth Accelerating')
fig.write_html("mrr_chart.html")
fig.write_image("mrr_chart.png", scale=2)

# Run Streamlit dashboard
# streamlit run dashboard.py

Integration Notes

Pairs with : revenue-ops-skill (metrics), pricing-strategy-skill (modeling)
Stack : Python 3.11+, pandas, polars, plotly, altair, streamlit
Projects : coperniq-forge (ROI calculators), thetaroom (trading analysis)
NO OPENAI : Use Claude for narrative generation

Reference Files

reference/chart-gallery.md - 20+ chart templates with code
reference/saas-metrics.md - Complete SaaS KPI definitions
reference/streamlit-patterns.md - Production dashboard patterns
reference/data-wrangling.md - Format-specific extraction guides

Emit Outcome Sidecar

Write to ~/.claude/skill-analytics/last-outcome-data-analysis.json: {"ts":"[UTC ISO8601]","skill":"data-analysis","version":"1.0.0","variant":"default","status":"[success|partial|error]","runtime_ms":[ms],"metrics":{"datasets_analyzed":[n],"charts_generated":[n],"insights_extracted":[n]},"error":null,"session_id":"[YYYY-MM-DD]"}

Weekly Installs

101

Repository

scientiacapital/skills

GitHub Stars

First Seen

Jan 23, 2026

Security Audits

Gen Agent Trust HubWarn SocketPass SnykWarn

Installed on

gemini-cli97

opencode96

codex96

github-copilot92

cursor91

amp90

Excel财务建模规范与xlsx文件处理指南：专业格式、零错误公式与数据分析

45,000 周安装