⚠️

重要前提

安装AI Skills的关键前提是：必须科学上网，且开启TUN模式，这一点至关重要，直接决定安装能否顺利完成，在此郑重提醒三遍：科学上网，科学上网，科学上网。查看完整安装教程 →

数据科学技能：SQL查询、Python数据分析与可视化、统计建模实战指南

data-science by htlin222/dotfiles

49 周安装量

75 GitHub Stars

GitHub

安装命令

npx skills add https://github.com/htlin222/dotfiles --skill data-science

数据可视化数据分析数据处理

🇨🇳中文介绍

数据科学

数据分析、SQL 和洞察生成。

使用场景

编写 SQL 查询
数据分析和探索
创建可视化图表
统计分析
ETL 和数据管道

SQL 模式

常用查询

-- 使用窗口函数进行聚合
SELECT
    user_id,
    order_date,
    amount,
    SUM(amount) OVER (PARTITION BY user_id ORDER BY order_date) as running_total,
    ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_date DESC) as recency_rank
FROM orders;

-- 使用 CTE 提高可读性
WITH monthly_stats AS (
    SELECT
        DATE_TRUNC('month', created_at) as month,
        COUNT(*) as total_orders,
        SUM(amount) as revenue
    FROM orders
    GROUP BY 1
),
growth AS (
    SELECT
        month,
        revenue,
        LAG(revenue) OVER (ORDER BY month) as prev_revenue,
        (revenue - LAG(revenue) OVER (ORDER BY month)) / NULLIF(LAG(revenue) OVER (ORDER BY month), 0) as growth_rate
    FROM monthly_stats
)
SELECT * FROM growth;

BigQuery 特性

-- 分区表查询
SELECT *
FROM `project.dataset.events`
WHERE DATE(_PARTITIONTIME) BETWEEN '2024-01-01' AND '2024-01-31';

-- 用于数组的 UNNEST
SELECT
    user_id,
    item
FROM `project.dataset.orders`,
UNNEST(items) as item;

-- 大数据集的近似计数
SELECT APPROX_COUNT_DISTINCT(user_id) as unique_users
FROM `project.dataset.events`;

广告位招租

在这里展示您的产品或服务

触达数万 AI 开发者，精准高效

联系我们

🇺🇸English

Data Science

Data analysis, SQL, and insights generation.

When to Use

Writing SQL queries
Data analysis and exploration
Creating visualizations
Statistical analysis
ETL and data pipelines

SQL Patterns

Common Queries

-- Aggregation with window functions
SELECT
    user_id,
    order_date,
    amount,
    SUM(amount) OVER (PARTITION BY user_id ORDER BY order_date) as running_total,
    ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_date DESC) as recency_rank
FROM orders;

-- CTEs for readability
WITH monthly_stats AS (
    SELECT
        DATE_TRUNC('month', created_at) as month,
        COUNT(*) as total_orders,
        SUM(amount) as revenue
    FROM orders
    GROUP BY 1
),
growth AS (
    SELECT
        month,
        revenue,
        LAG(revenue) OVER (ORDER BY month) as prev_revenue,
        (revenue - LAG(revenue) OVER (ORDER BY month)) / NULLIF(LAG(revenue) OVER (ORDER BY month), 0) as growth_rate
    FROM monthly_stats
)
SELECT * FROM growth;

BigQuery Specifics

-- Partitioned table query
SELECT *
FROM `project.dataset.events`
WHERE DATE(_PARTITIONTIME) BETWEEN '2024-01-01' AND '2024-01-31';

-- UNNEST for arrays
SELECT
    user_id,
    item
FROM `project.dataset.orders`,
UNNEST(items) as item;

-- Approximate counts for large data
SELECT APPROX_COUNT_DISTINCT(user_id) as unique_users
FROM `project.dataset.events`;

Python Analysis

import pandas as pd
import numpy as np

# Load and explore
df = pd.read_csv('data.csv')
df.info()
df.describe()

# Clean and transform
df['date'] = pd.to_datetime(df['date'])
df = df.dropna(subset=['required_field'])
df['category'] = df['category'].fillna('Unknown')

# Aggregate
summary = df.groupby('category').agg({
    'value': ['mean', 'sum', 'count'],
    'date': ['min', 'max']
}).round(2)

# Visualize
import matplotlib.pyplot as plt
df.groupby('date')['value'].sum().plot(figsize=(12, 6))
plt.title('Daily Values')
plt.savefig('chart.png', dpi=150, bbox_inches='tight')

Statistical Analysis

from scipy import stats

# Hypothesis testing
t_stat, p_value = stats.ttest_ind(group_a, group_b)

# Correlation
correlation = df['x'].corr(df['y'])

# Regression
from sklearn.linear_model import LinearRegression
model = LinearRegression().fit(X, y)
print(f"R² = {model.score(X, y):.3f}")

Output Format

## Analysis Summary

**Question:** [What we're trying to answer]
**Data Source:** [Tables/files used]
**Date Range:** [Time period]

### Key Findings

1. [Finding with supporting metric]
2. [Finding with supporting metric]

### Visualization

[Chart description or embedded image]

### Recommendations

- [Actionable insight]

Examples

Input: "Analyze user retention" Action: Query cohort data, calculate retention rates, visualize trends

Input: "Find top customers" Action: Write SQL for RFM analysis, segment users, summarize findings

Weekly Installs

Repository

htlin222/dotfiles

GitHub Stars

First Seen

Jan 22, 2026

Security Audits

Gen Agent Trust HubPass SocketPass SnykPass

Installed on

gemini-cli46

cursor44

opencode44

codex44

github-copilot43

kimi-cli41