Important prerequisite
Installing AI Skills requires a working proxy or VPN connection with TUN mode enabled. This directly determines whether the installation completes successfully.
npx skills add https://github.com/htlin222/dotfiles --skill data-science
Data analysis, SQL, and insights generation.
-- Aggregation with window functions
SELECT
user_id,
order_date,
amount,
SUM(amount) OVER (PARTITION BY user_id ORDER BY order_date) as running_total,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_date DESC) as recency_rank
FROM orders;
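For readers who work in pandas rather than SQL, the window-function query above can be approximated as follows. This is a minimal sketch, not part of the original skill: the sample `orders` frame and the helper name `add_window_columns` are hypothetical, and it assumes each user has at most one order per date (with tied dates, the SQL running total and `cumsum` can differ).

```python
import pandas as pd

# Hypothetical sample data standing in for the `orders` table.
orders = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "order_date": pd.to_datetime(
        ["2024-01-05", "2024-01-20", "2024-02-02", "2024-01-10", "2024-03-01"]
    ),
    "amount": [10.0, 25.0, 5.0, 40.0, 15.0],
})


def add_window_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Add running_total and recency_rank per user (hypothetical helper)."""
    out = df.sort_values(["user_id", "order_date"]).reset_index(drop=True)
    # SUM(amount) OVER (PARTITION BY user_id ORDER BY order_date)
    out["running_total"] = out.groupby("user_id")["amount"].cumsum()
    # ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_date DESC):
    # number the rows within each user after sorting newest first.
    out["recency_rank"] = (
        out.sort_values("order_date", ascending=False)
        .groupby("user_id")
        .cumcount()
        + 1
    )
    return out


result = add_window_columns(orders)
print(result)
```

A `recency_rank` of 1 marks each user's most recent order, so `result[result["recency_rank"] == 1]` returns the latest order per user.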
-- CTEs for readability
WITH monthly_stats AS (
SELECT
DATE_TRUNC('month', created_at) as month,
COUNT(*) as total_orders,
SUM(amount) as revenue
FROM orders
GROUP BY 1
),
growth AS (
SELECT
month,
revenue,
LAG(revenue) OVER (ORDER BY month) as prev_revenue,
(revenue - LAG(revenue) OVER (ORDER BY month)) / NULLIF(LAG(revenue) OVER (ORDER BY month), 0) as growth_rate
FROM monthly_stats
)
SELECT * FROM growth;
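The month-over-month growth calculation above can also be sketched in pandas. The sample data and the function name `monthly_growth` are assumptions made for illustration; `shift(1)` plays the role of `LAG`, and replacing zero with NaN mirrors `NULLIF(..., 0)`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the `orders` table.
orders = pd.DataFrame({
    "created_at": pd.to_datetime(
        ["2024-01-03", "2024-01-17", "2024-02-09", "2024-03-02", "2024-03-28"]
    ),
    "amount": [100.0, 50.0, 300.0, 200.0, 250.0],
})


def monthly_growth(df: pd.DataFrame) -> pd.DataFrame:
    """Aggregate revenue per month and compute the growth rate (hypothetical helper)."""
    monthly = (
        df.assign(month=df["created_at"].dt.to_period("M"))
        .groupby("month")
        .agg(total_orders=("amount", "size"), revenue=("amount", "sum"))
        .sort_index()
    )
    # LAG(revenue) OVER (ORDER BY month)
    monthly["prev_revenue"] = monthly["revenue"].shift(1)
    # NULLIF(prev_revenue, 0): a zero denominator becomes NaN instead of inf.
    denominator = monthly["prev_revenue"].replace(0, np.nan)
    monthly["growth_rate"] = (monthly["revenue"] - monthly["prev_revenue"]) / denominator
    return monthly


growth = monthly_growth(orders)
print(growth)
```

As in the SQL version, a month with no orders is simply absent, so the "previous" row is the previous month that has data, not necessarily the previous calendar month.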
-- Partitioned table query
SELECT *
FROM `project.dataset.events`
WHERE DATE(_PARTITIONTIME) BETWEEN '2024-01-01' AND '2024-01-31';
-- UNNEST for arrays
SELECT
user_id,
item
FROM `project.dataset.orders`,
UNNEST(items) as item;
-- Approximate counts for large data
SELECT APPROX_COUNT_DISTINCT(user_id) as unique_users
FROM `project.dataset.events`;
import pandas as pd
import numpy as np
# Load and explore
df = pd.read_csv('data.csv')
df.info()
df.describe()
# Clean and transform
df['date'] = pd.to_datetime(df['date'])
df = df.dropna(subset=['required_field'])
df['category'] = df['category'].fillna('Unknown')
# Aggregate
summary = df.groupby('category').agg({
'value': ['mean', 'sum', 'count'],
'date': ['min', 'max']
}).round(2)
# Visualize
import matplotlib.pyplot as plt
df.groupby('date')['value'].sum().plot(figsize=(12, 6))
plt.title('Daily Values')
plt.savefig('chart.png', dpi=150, bbox_inches='tight')
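from scipy import stats
# Hypothesis testing (assumes a hypothetical 'group' column with values 'A' and 'B')
group_a = df.loc[df['group'] == 'A', 'value']
group_b = df.loc[df['group'] == 'B', 'value']
t_stat, p_value = stats.ttest_ind(group_a, group_b)
# Correlation
correlation = df['x'].corr(df['y'])
# Regression (scikit-learn expects a 2-D feature matrix, hence df[['x']])
from sklearn.linear_model import LinearRegression
X, y = df[['x']], df['y']
model = LinearRegression().fit(X, y)
print(f"R² = {model.score(X, y):.3f}")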
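The statistical snippets above depend on data that is not shown. The sketch below runs the same three steps end to end on synthetic data so the outputs can be inspected. All values are made up for illustration, and `scipy.stats.linregress` is used here in place of scikit-learn's `LinearRegression` purely to keep the example to one dependency. Welch's t-test (`equal_var=False`) is likewise a choice made for this sketch, not something the original specifies.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)

# Synthetic groups with different means (illustrative values only).
group_a = rng.normal(loc=10.0, scale=2.0, size=200)
group_b = rng.normal(loc=12.0, scale=2.0, size=200)

# Hypothesis testing: Welch's two-sample t-test.
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)

# Synthetic linear relationship y = 3x + 1 plus noise.
x = rng.uniform(0.0, 10.0, size=200)
y = 3.0 * x + 1.0 + rng.normal(scale=0.5, size=200)

# Correlation: Pearson's r.
correlation, _ = stats.pearsonr(x, y)

# Regression: simple least-squares fit of y on x.
fit = stats.linregress(x, y)
r_squared = fit.rvalue ** 2

print(f"t = {t_stat:.2f}, p = {p_value:.3g}")
print(f"r = {correlation:.3f}")
print(f"slope = {fit.slope:.3f}, intercept = {fit.intercept:.3f}, R² = {r_squared:.3f}")
```

For a single predictor, R² equals the square of Pearson's r, which is a quick consistency check on the two results.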
## Analysis Summary
**Question:** [What we're trying to answer]
**Data Source:** [Tables/files used]
**Date Range:** [Time period]
### Key Findings
1. [Finding with supporting metric]
2. [Finding with supporting metric]
### Visualization
[Chart description or embedded image]
### Recommendations
- [Actionable insight]
Input: "Analyze user retention" Action: Query cohort data, calculate retention rates, visualize trends
Input: "Find top customers" Action: Write SQL for RFM analysis, segment users, summarize findings
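The two example requests above could be approached along the following lines in pandas. This is a rough sketch under stated assumptions: the `orders` frame, the function names `cohort_retention` and `rfm_scores`, the monthly cohort granularity, and the percentile-rank scoring are all choices made for illustration, not the skill's prescribed method.

```python
import pandas as pd

# Hypothetical order log; column names are assumptions.
orders = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3],
    "order_date": pd.to_datetime([
        "2024-01-05", "2024-02-10", "2024-03-15",
        "2024-01-20", "2024-03-01",
        "2024-02-11",
    ]),
    "amount": [20.0, 35.0, 10.0, 80.0, 40.0, 15.0],
})


def cohort_retention(df: pd.DataFrame) -> pd.DataFrame:
    """Share of each monthly cohort active N months after its first order."""
    d = df.copy()
    first = d.groupby("user_id")["order_date"].transform("min")
    d["cohort"] = first.dt.strftime("%Y-%m")
    # Whole calendar months elapsed since the user's first order.
    d["period"] = (
        (d["order_date"].dt.year - first.dt.year) * 12
        + (d["order_date"].dt.month - first.dt.month)
    )
    active = (
        d.groupby(["cohort", "period"])["user_id"].nunique().unstack(fill_value=0)
    )
    # Divide by the cohort size, i.e. users active in period 0.
    return active.div(active[0], axis=0)


def rfm_scores(df: pd.DataFrame, as_of: pd.Timestamp) -> pd.DataFrame:
    """Recency/frequency/monetary table with a simple percentile-rank score."""
    rfm = df.groupby("user_id").agg(
        last_order=("order_date", "max"),
        frequency=("order_date", "count"),
        monetary=("amount", "sum"),
    )
    rfm["recency_days"] = (as_of - rfm["last_order"]).dt.days
    # Higher is better for every component; recent buyers rank highest.
    r = rfm["recency_days"].rank(pct=True, ascending=False)
    f = rfm["frequency"].rank(pct=True)
    m = rfm["monetary"].rank(pct=True)
    rfm["rfm_score"] = (r + f + m) / 3
    return rfm.sort_values("rfm_score", ascending=False)


retention = cohort_retention(orders)
rfm = rfm_scores(orders, as_of=pd.Timestamp("2024-04-01"))
print(retention)
print(rfm)
```

In the retention table, a zero for a recent cohort may only mean that the period has not happened yet, so it is worth masking unobserved periods before plotting trends.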
Weekly Installs: 49
Repository: htlin222/dotfiles
GitHub Stars: 75
First Seen: Jan 22, 2026
Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)
Installed on: gemini-cli (46), cursor (44), opencode (44), codex (44), github-copilot (43), kimi-cli (41)