Data Analyst Skill Guide: SQL Queries, Data Visualization, and Business Intelligence in Practice

data-analyst by borghei/claude-skills

239 weekly installs

70 GitHub Stars

Install command

npx skills add https://github.com/borghei/claude-skills --skill data-analyst

Tags: Data Visualization, Data Analysis, Business Intelligence




Data Analyst

Expert-level data analysis for business insights.

Core Competencies

SQL and database querying
Data visualization
Statistical analysis
Business intelligence
Data storytelling
Dashboard development
Reporting automation
Stakeholder communication

SQL Mastery

Query Patterns

Aggregation:

SELECT
    date_trunc('month', created_at) as month,
    COUNT(*) as total_orders,
    COUNT(DISTINCT customer_id) as unique_customers,
    SUM(amount) as total_revenue,
    AVG(amount) as avg_order_value
FROM orders
WHERE created_at >= '2024-01-01'
GROUP BY 1
ORDER BY 1;

Window Functions:

SELECT
    customer_id,
    order_date,
    amount,
    SUM(amount) OVER (PARTITION BY customer_id ORDER BY order_date) as running_total,
    ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) as order_number,
    LAG(amount) OVER (PARTITION BY customer_id ORDER BY order_date) as previous_order
FROM orders;
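
For analysts who do the same work in a notebook, the three window expressions above map fairly directly onto pandas group operations. The sketch below is a minimal illustration on a hypothetical five-row `orders` frame (the sample values are made up for the example); it assumes no two orders from the same customer share a date, since SQL's default window frame would sum such ties together while `cumsum` would not.

```python
import pandas as pd

# Hypothetical sample data shaped like the orders table in the query above.
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 1, 2],
    "order_date": pd.to_datetime(
        ["2024-01-05", "2024-02-10", "2024-01-20", "2024-03-01", "2024-02-15"]
    ),
    "amount": [100.0, 50.0, 80.0, 25.0, 40.0],
})

# Sort once so every per-customer calculation sees orders chronologically.
orders = orders.sort_values(["customer_id", "order_date"]).reset_index(drop=True)
by_customer = orders.groupby("customer_id")["amount"]

orders["running_total"] = by_customer.cumsum()        # SUM(amount) OVER (...)
orders["order_number"] = by_customer.cumcount() + 1   # ROW_NUMBER() OVER (...)
orders["previous_order"] = by_customer.shift(1)       # LAG(amount) OVER (...)

print(orders)
```

As in SQL, the first order of each customer has no previous order, so `previous_order` is missing (NaN) on those rows.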

CTEs for Clarity:

WITH monthly_metrics AS (
    SELECT
        date_trunc('month', created_at) as month,
        SUM(amount) as revenue
    FROM orders
    GROUP BY 1
),
growth_calc AS (
    SELECT
        month,
        revenue,
        LAG(revenue) OVER (ORDER BY month) as prev_revenue
    FROM monthly_metrics
)
SELECT
    month,
    revenue,
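    -- NULLIF avoids a division-by-zero error when the previous month had no revenue
    ROUND((revenue - prev_revenue) / NULLIF(prev_revenue, 0) * 100, 1) as growth_pct
FROM growth_calc
ORDER BY month;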
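
The two-step structure of the CTE query (aggregate by month, then compare each month with the previous one) can also be sketched in pandas. This is an illustrative equivalent under assumed sample data, not part of the skill itself; the `orders` frame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical orders spanning three months.
orders = pd.DataFrame({
    "created_at": pd.to_datetime(
        ["2024-01-03", "2024-01-20", "2024-02-11", "2024-03-05", "2024-03-28"]
    ),
    "amount": [100.0, 100.0, 250.0, 150.0, 150.0],
})

# Step 1 (monthly_metrics): total revenue per calendar month.
monthly = (
    orders
    .assign(month=orders["created_at"].dt.to_period("M"))
    .groupby("month")["amount"].sum()
    .rename("revenue")
    .to_frame()
)

# Step 2 (growth_calc): previous month's revenue, then percentage growth.
monthly["prev_revenue"] = monthly["revenue"].shift(1)
monthly["growth_pct"] = (
    (monthly["revenue"] - monthly["prev_revenue"]) / monthly["prev_revenue"] * 100
).round(1)

print(monthly)
```

The first month has no predecessor, so its growth is left missing rather than reported as zero.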

Cohort Analysis:

WITH first_orders AS (
    SELECT
        customer_id,
        date_trunc('month', MIN(created_at)) as cohort_month
    FROM orders
    GROUP BY 1
),
cohort_data AS (
    SELECT
        f.cohort_month,
        date_trunc('month', o.created_at) as order_month,
        COUNT(DISTINCT o.customer_id) as customers
    FROM orders o
    JOIN first_orders f ON o.customer_id = f.customer_id
    GROUP BY 1, 2
)
SELECT
    cohort_month,
    order_month,
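    -- AGE() reports years and months separately, so combine them;
    -- EXTRACT(MONTH ...) alone wraps back to 0 after 12 months
    EXTRACT(YEAR FROM AGE(order_month, cohort_month)) * 12
        + EXTRACT(MONTH FROM AGE(order_month, cohort_month)) as months_since_cohort,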
    customers
FROM cohort_data
ORDER BY 1, 2;
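
The cohort query returns one row per cohort and order month. A common next step is to pivot that output into a retention matrix, with one row per cohort and one column per month since first order. The sketch below shows one way to do this in pandas on hypothetical data; dividing by the month-0 column to get a retention share is an assumption about how the result will be read, not something the query prescribes.

```python
import pandas as pd

# Hypothetical orders for three customers.
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3, 3],
    "created_at": pd.to_datetime([
        "2024-01-10", "2024-02-12", "2024-01-15", "2024-03-02",
        "2024-02-05", "2024-03-20",
    ]),
})

orders["order_month"] = orders["created_at"].dt.to_period("M")
# Cohort = month of each customer's first order.
orders["cohort_month"] = (
    orders.groupby("customer_id")["created_at"].transform("min").dt.to_period("M")
)

# Whole months between the cohort month and the order month (years included).
orders["months_since_cohort"] = (
    (orders["order_month"].dt.year - orders["cohort_month"].dt.year) * 12
    + (orders["order_month"].dt.month - orders["cohort_month"].dt.month)
)

# Distinct active customers per cohort and month offset.
cohort_counts = orders.pivot_table(
    index="cohort_month",
    columns="months_since_cohort",
    values="customer_id",
    aggfunc="nunique",
    fill_value=0,
)

# Share of each cohort still ordering, relative to its size in month 0.
retention = cohort_counts.div(cohort_counts[0], axis=0).round(2)
print(retention)
```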

Query Optimization

Use EXPLAIN:

-- Note: EXPLAIN ANALYZE actually executes the statement to collect real timings
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 123;

Best Practices:

Index the columns used in WHERE and JOIN conditions
Avoid SELECT * in production; list only the columns you need
Use LIMIT for exploratory queries
Filter early, aggregate late
Use appropriate data types
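
To see the first practice in action without a database server, the sketch below uses Python's built-in `sqlite3` module. Note the substitution: this is SQLite's `EXPLAIN QUERY PLAN`, not the PostgreSQL `EXPLAIN ANALYZE` shown above, and the table contents and the index name `idx_orders_customer` are made up for the illustration. The plan text format varies between SQLite versions, so the example prints it rather than assuming exact wording.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

def query_plan(sql):
    """Return SQLite's plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)

# Named columns instead of SELECT *, filtered on customer_id.
query = "SELECT id, amount FROM orders WHERE customer_id = 42"

plan_before = query_plan(query)   # no index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(query)    # the filter can now use the index

print("before:", plan_before)
print("after: ", plan_after)
```

The same before-and-after comparison applies in PostgreSQL, where the plan would typically change from a sequential scan to an index scan once the table is large enough for the planner to prefer it.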

Data Visualization

Chart Selection Guide

Data Type	Best Chart	Alternative
Trend over time	Line chart	Area chart
Part of whole	Pie/Donut	Stacked bar
Comparison	Bar chart	Column chart
Distribution	Histogram	Box plot
Correlation	Scatter plot	Heatmap
Geographic	Map	Choropleth

Visualization Best Practices

Do:

Start Y-axis at zero (for bars)
Use consistent colors
Label axes clearly
Include context (benchmarks, targets)
Order categories meaningfully

Don't:

Use 3D charts
Use more than 5-7 colors
Truncate axes misleadingly
Clutter with gridlines
Use pie charts for many categories
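
One way to respect the last two points is to fold small categories into a single "Other" slice before plotting. The helper below is a hypothetical sketch (its name, the `max_slices=5` default, and the sample figures are assumptions chosen for the example); it only prepares the data and leaves the actual charting to whatever library is in use.

```python
import pandas as pd

def collapse_small_categories(values, max_slices=5, other_label="Other"):
    """Keep the largest (max_slices - 1) categories and sum the rest into one."""
    ordered = values.sort_values(ascending=False)
    if len(ordered) <= max_slices:
        return ordered
    top = ordered.iloc[: max_slices - 1]
    rest = pd.Series({other_label: ordered.iloc[max_slices - 1:].sum()})
    return pd.concat([top, rest])

# Hypothetical revenue by region: seven categories is too many for a pie chart.
revenue_by_region = pd.Series({
    "North": 400, "South": 250, "East": 150, "West": 90,
    "Central": 60, "Islands": 30, "Overseas": 20,
})

collapsed = collapse_small_categories(revenue_by_region, max_slices=5)
print(collapsed)
```

The total is preserved, so percentages computed from the collapsed series still add up to 100.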

Dashboard Layout

┌─────────────────────────────────────────────────────────────┐
│  EXECUTIVE SUMMARY                                          │
│  [KPI 1: $X]  [KPI 2: X%]  [KPI 3: X]  [KPI 4: X%]         │
├─────────────────────────────────────────────────────────────┤
│  TRENDS                          │  BREAKDOWN               │
│  [Line Chart - Primary Metric]   │  [Bar Chart - Segments]  │
│                                  │                          │
├──────────────────────────────────┼──────────────────────────┤
│  COMPARISON                      │  DETAIL TABLE            │
│  [Bar Chart - vs Target/LY]      │  [Top N with metrics]    │
│                                  │                          │
└──────────────────────────────────┴──────────────────────────┘

Statistical Analysis

Descriptive Statistics

import pandas as pd
import numpy as np

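def describe_data(df, column):
    """Summarize one numeric column of a DataFrame."""
    series = df[column]
    # Named `summary` so it does not shadow the `scipy.stats` module used for hypothesis tests
    summary = {
        'count': series.count(),        # non-null values only
        'mean': series.mean(),
        'median': series.median(),
        'std': series.std(),            # sample standard deviation (ddof=1)
        'min': series.min(),
        'max': series.max(),
        'q25': series.quantile(0.25),
        'q75': series.quantile(0.75),
        'skewness': series.skew(),
        'kurtosis': series.kurtosis()   # excess kurtosis (normal distribution = 0)
    }
    return summary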
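
The quartiles in this summary are often used to flag unusual values. The sketch below applies the conventional 1.5 × IQR rule (Tukey's fences); that rule and the `k=1.5` default are a common convention brought in for illustration, not something the skill specifies, and the sample values are hypothetical.

```python
import pandas as pd

def iqr_outliers(series, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q25 = series.quantile(0.25)
    q75 = series.quantile(0.75)
    iqr = q75 - q25
    lower, upper = q25 - k * iqr, q75 + k * iqr
    return series[(series < lower) | (series > upper)]

# Hypothetical order values with one clearly unusual order.
order_values = pd.Series([20, 22, 23, 25, 26, 28, 30, 250])
flagged = iqr_outliers(order_values)
print(flagged)
```

A flagged value is a prompt to investigate, not proof of an error; a large order may be perfectly legitimate.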

Hypothesis Testing

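import numpy as np
from scipy import stats

# T-test: Compare two groups
def compare_groups(group_a, group_b, alpha=0.05):
    # ttest_ind defaults to Student's t-test (equal variances assumed);
    # pass equal_var=False for Welch's t-test when variances may differ
    stat, p_value = stats.ttest_ind(group_a, group_b)

    result = {
        't_statistic': stat,
        'p_value': p_value,
        'significant': p_value < alpha,
        # Cohen's d using the average of the two sample variances
        # (pandas Series assumed: .std() is the sample standard deviation)
        'effect_size': (group_a.mean() - group_b.mean()) / np.sqrt(
            (group_a.std()**2 + group_b.std()**2) / 2
        )
    }
    return result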

# Chi-square: Test independence
def test_independence(observed, alpha=0.05):
    stat, p_value, dof, expected = stats.chi2_contingency(observed)

    return {
        'chi2': stat,
        'p_value': p_value,
        'degrees_of_freedom': dof,
        'significant': p_value < alpha
    }
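
When the two groups differ in size or spread, a variant of the t-test comparison may be more appropriate. The sketch below is one possible adaptation, not the skill's own method: it switches to Welch's t-test via `equal_var=False` and weights the pooled variance in Cohen's d by sample size. The function name `welch_compare` and the simulated control and variant samples are hypothetical.

```python
import numpy as np
from scipy import stats

def welch_compare(group_a, group_b, alpha=0.05):
    """Welch's t-test plus Cohen's d with a sample-size-weighted pooled variance."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)

    # equal_var=False selects Welch's t-test (no equal-variance assumption).
    result = stats.ttest_ind(a, b, equal_var=False)

    n_a, n_b = len(a), len(b)
    pooled_var = (
        (n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)
    ) / (n_a + n_b - 2)
    cohens_d = (a.mean() - b.mean()) / np.sqrt(pooled_var)

    return {
        "t_statistic": float(result.statistic),
        "p_value": float(result.pvalue),
        "significant": bool(result.pvalue < alpha),
        "cohens_d": float(cohens_d),
    }

# Simulated (made-up) samples with unequal sizes and spreads.
rng = np.random.default_rng(0)
control = rng.normal(loc=100, scale=10, size=200)
variant = rng.normal(loc=110, scale=20, size=80)
print(welch_compare(variant, control))
```

Statistical significance and effect size answer different questions, so it is worth reporting both: a tiny effect can be significant with enough data, and a large one can fail to reach significance in a small sample.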

Regression Analysis

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_error

def simple_regression(X, y):
    # Accept lists, pandas Series, or 1-D arrays; scikit-learn needs a 2-D feature matrix
    X = np.asarray(X).reshape(-1, 1)

    model = LinearRegression()
    model.fit(X, y)

    predictions = model.predict(X)

    return {
        'coefficient': model.coef_[0],
        'intercept': model.intercept_,
        'r_squared': r2_score(y, predictions),   # in-sample fit, not a held-out score
        'mae': mean_absolute_error(y, predictions)
    }
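
For a single predictor, the same four quantities can be cross-checked with NumPy alone, which is handy when scikit-learn is not installed. The sketch below uses `np.polyfit` for the least-squares line and computes R² and MAE by hand; the function name `least_squares_line` and the ad-spend figures are hypothetical, chosen so the relationship is exactly linear.

```python
import numpy as np

def least_squares_line(x, y):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    slope, intercept = np.polyfit(x, y, deg=1)
    predictions = slope * x + intercept

    ss_res = np.sum((y - predictions) ** 2)   # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)      # total variation
    return {
        "coefficient": float(slope),
        "intercept": float(intercept),
        "r_squared": float(1 - ss_res / ss_tot),
        "mae": float(np.mean(np.abs(y - predictions))),
    }

# Made-up data that follows revenue = 2 * spend + 5 exactly.
ad_spend = [10, 20, 30, 40, 50]
revenue = [25, 45, 65, 85, 105]
fit = least_squares_line(ad_spend, revenue)
print(fit)
```

Real data will not sit on a perfect line, so expect an R² below 1 and a non-zero MAE in practice.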

Business Analysis

Analysis Framework

# Analysis: [Topic]

## Business Question
[What are we trying to answer?]

## Hypothesis
[What do we expect to find?]

## Data Sources
- [Source 1]: [Description]
- [Source 2]: [Description]

## Methodology
1. [Step 1]
2. [Step 2]
3. [Step 3]

## Findings

### Finding 1: [Title]
[Description with supporting data]

### Finding 2: [Title]
[Description with supporting data]

## Recommendations
1. [Recommendation]: [Expected impact]
2. [Recommendation]: [Expected impact]

## Limitations
- [Limitation 1]
- [Limitation 2]

## Next Steps
- [Action item]

Key Business Metrics

Acquisition:

Customer Acquisition Cost (CAC)
Cost per Lead (CPL)
Conversion Rate

Engagement:

Daily/Monthly Active Users
Session Duration
Feature Adoption

Retention:

Churn Rate
Retention Rate
Net Revenue Retention

Revenue:

Monthly Recurring Revenue (MRR)
Average Revenue Per User (ARPU)
Lifetime Value (LTV)
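
Several of these metrics reduce to simple ratios. The sketch below uses common simplified definitions, which are assumptions rather than definitions given by the skill: exact formulas (for example, which costs count toward CAC, or whether LTV includes gross margin) vary by company. All input figures are invented for the example.

```python
def customer_acquisition_cost(sales_and_marketing_spend, new_customers):
    """CAC: spend divided by customers acquired in the same period."""
    return sales_and_marketing_spend / new_customers

def churn_rate(customers_at_start, customers_lost):
    """Share of customers at period start who left during the period."""
    return customers_lost / customers_at_start

def average_revenue_per_user(revenue, active_users):
    """ARPU for the period covered by `revenue`."""
    return revenue / active_users

def simple_lifetime_value(arpu_per_month, monthly_churn):
    """Rough LTV: monthly ARPU times expected lifetime (1 / churn) in months."""
    return arpu_per_month / monthly_churn

def net_revenue_retention(starting_mrr, expansion, contraction, churned):
    """NRR: recurring revenue kept from existing customers, including upsells."""
    return (starting_mrr + expansion - contraction - churned) / starting_mrr

# Hypothetical monthly figures.
cac = customer_acquisition_cost(50_000, 250)
churn = churn_rate(1_000, 40)
arpu = average_revenue_per_user(60_000, 1_000)
ltv = simple_lifetime_value(arpu, churn)
nrr = net_revenue_retention(60_000, expansion=5_000, contraction=1_000, churned=2_400)

print(f"CAC={cac:.0f}  churn={churn:.1%}  ARPU={arpu:.0f}  LTV={ltv:.0f}  NRR={nrr:.1%}")
```

Comparing LTV against CAC is a typical follow-up, since it shows whether a customer is expected to return more than it cost to acquire them.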

Data Storytelling

Presentation Structure

1. CONTEXT
   - Why does this matter?
   - What question are we answering?

2. KEY FINDING
   - Lead with the insight
   - Make it memorable

3. EVIDENCE
   - Show the data
   - Use effective visuals

4. IMPLICATIONS
   - What does this mean?
   - So what?

5. RECOMMENDATIONS
   - What should we do?
   - Clear next steps

Insight Template

## [Headline: Action-oriented finding]

**What:** [One-sentence description of the finding]

**So What:** [Why this matters to the business]

**Now What:** [Recommended action]

**Evidence:**
[Chart or data supporting the finding]

**Confidence:** [High/Medium/Low]
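
If insights are produced regularly, the template can be filled in programmatically so that every write-up has the same fields. The `Insight` dataclass below is a hypothetical sketch of that idea, not part of the skill; the example content is placeholder text, and the check on `confidence` simply enforces the three levels the template lists.

```python
from dataclasses import dataclass

@dataclass
class Insight:
    headline: str
    what: str
    so_what: str
    now_what: str
    evidence: str
    confidence: str  # "High", "Medium", or "Low"

    def to_markdown(self) -> str:
        """Render the insight using the section order of the template above."""
        if self.confidence not in {"High", "Medium", "Low"}:
            raise ValueError("confidence must be High, Medium, or Low")
        return "\n\n".join([
            f"## {self.headline}",
            f"**What:** {self.what}",
            f"**So What:** {self.so_what}",
            f"**Now What:** {self.now_what}",
            f"**Evidence:**\n{self.evidence}",
            f"**Confidence:** {self.confidence}",
        ])

# Placeholder content for illustration only.
example = Insight(
    headline="[Headline: Action-oriented finding]",
    what="[One-sentence description of the finding]",
    so_what="[Why this matters to the business]",
    now_what="[Recommended action]",
    evidence="[Chart or data supporting the finding]",
    confidence="Medium",
)
print(example.to_markdown())
```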

Reference Materials

references/sql_patterns.md - Advanced SQL queries
references/visualization.md - Chart selection guide
references/statistics.md - Statistical methods
references/storytelling.md - Presentation best practices

Scripts

# Data profiler
python scripts/data_profiler.py --table orders --output profile.html

# SQL query analyzer
python scripts/query_analyzer.py --query query.sql --explain

# Dashboard generator
python scripts/dashboard_gen.py --config dashboard.yaml

# Report automation
python scripts/report_gen.py --template monthly --output report.pdf

Weekly Installs: 141

Repository: borghei/claude-skills

First Seen: Jan 24, 2026

Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)

Installed on: opencode (111), gemini-cli (104), codex (98), cursor (96), github-copilot (95), amp (89)
