The Complete Guide to Cohort Analysis: User Retention, LTV, and Behavior Analysis in Python

Cohort Analysis by aj-geddes/useful-ai-prompts

Install command

npx skills add https://github.com/aj-geddes/useful-ai-prompts --skill 'Cohort Analysis'

Cohort Analysis

Overview

Cohort analysis tracks groups of users with shared characteristics over time, revealing patterns in retention, engagement, and lifetime value.

When to Use

Measuring user retention rates and identifying when users churn
Analyzing customer lifetime value (LTV) and payback periods
Comparing performance across different user acquisition channels or campaigns
Understanding how product changes affect different user groups over time
Tracking engagement patterns and identifying early warning signs of churn
Evaluating the long-term impact of onboarding improvements or feature releases

Core Concepts

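Cohort : Group of users sharing a characteristic (signup date, region, etc.)
Cohort Size : Number of users in the group at the start of tracking
Retention Rate : Percentage of the original cohort still active in a given period
Churn Rate : Percentage of the original cohort that has left (100% minus the retention rate)
Retention Curve : How a cohort's retention rate decays over time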
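
As a small worked illustration of the definitions above, the sketch below derives retention and churn from a series of active-user counts. The counts are made up for the example, and the names `active_users`, `retention_rate`, and `churn_rate` are illustrative only.

```python
import pandas as pd

# Hypothetical active-user counts for one cohort, indexed by cohort age in months.
active_users = pd.Series([200, 150, 120, 90], index=[1, 2, 3, 4], name="active_users")

cohort_size = active_users.iloc[0]                 # initial group size
retention_rate = active_users / cohort_size * 100  # % of the original cohort still active
churn_rate = 100 - retention_rate                  # cumulative % lost since signup

# The retention column, read top to bottom, is the cohort's retention curve.
curve = pd.DataFrame({"retention_pct": retention_rate, "churn_pct": churn_rate})
print(curve)
```

Churn here is cumulative, measured against the original cohort size rather than against the previous period.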

Cohort Types

Acquisition Date : Users grouped by signup period
Behavioral : Users grouped by actions taken
Revenue : Users grouped by purchase value
Geographic : Users grouped by location
Demographic : Users grouped by characteristics
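
For the acquisition-date type, real data usually arrives as a timestamped event log rather than pre-labelled months. A minimal sketch of one way to assign cohorts, assuming a hypothetical `events` table with `user_id` and `event_time` columns and treating a user's first event as their signup:

```python
import pandas as pd

# Hypothetical event log: one row per user activity event.
events = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "event_time": pd.to_datetime([
        "2023-01-05", "2023-02-10", "2023-01-20", "2023-03-02", "2023-02-14",
    ]),
})

# Calendar month in which each event happened.
events["activity_month"] = events["event_time"].dt.to_period("M")

# Acquisition cohort = calendar month of each user's first event.
first_seen = events.groupby("user_id")["event_time"].transform("min")
events["cohort_month"] = first_seen.dt.to_period("M")

# Cohort age in months, starting at 1 for the acquisition month.
events["cohort_age"] = (
    (events["activity_month"].dt.year - events["cohort_month"].dt.year) * 12
    + (events["activity_month"].dt.month - events["cohort_month"].dt.month)
    + 1
)
print(events)
```

The resulting `cohort_month` and `cohort_age` columns play the same role as the columns of the same name in the synthetic data used in the implementation.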

Implementation with Python

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Create sample user lifecycle data
np.random.seed(42)

# Generate user data
n_users = 5000
users = []

for user_id in range(n_users):
    signup_month = np.random.choice(range(1, 13))
    lifetime_months = np.random.poisson(6) + 1

    for month in range(1, lifetime_months + 1):
        users.append({
            'user_id': user_id,
            'signup_month': signup_month,
            'month': month,
            'active': 1,
        })

df = pd.DataFrame(users)

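# Add derived columns
df['cohort_month'] = df['signup_month']
df['cohort_age'] = df['month']  # Could be day, week, etc.
# Calendar month of each activity row: signup month plus elapsed cohort age.
# Both are 1-based, so subtract 1 from each to get a 0-based offset from Jan 2023.
months_from_start = (df['signup_month'] - 1) + (df['cohort_age'] - 1)
df['date'] = pd.to_datetime(pd.DataFrame({
    'year': 2023 + months_from_start // 12,
    'month': months_from_start % 12 + 1,
    'day': 1,
}))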

print("User Data Summary:")
print(df.head(10))

# 1. Cohort Table (Retention Matrix)
cohort_data = df.groupby(['cohort_month', 'cohort_age']).agg({
    'user_id': 'nunique'
}).reset_index()
cohort_data.columns = ['cohort_month', 'cohort_age', 'unique_users']

# Create pivot table
cohort_pivot = cohort_data.pivot(index='cohort_month', columns='cohort_age', values='unique_users')

print("\nCohort Sizes (Raw User Counts):")
print(cohort_pivot)

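# 2. Cohort Retention (as percentage of cohort size)
# Cohort age 1 is the signup month, so its count is the cohort size
# and retention at age 1 is 100% by construction.
cohort_size = cohort_pivot.iloc[:, 0]
retention_table = cohort_pivot.divide(cohort_size, axis=0) * 100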

print("\nCohort Retention Rate (%):")
print(retention_table.round(1))

# 3. Visualize Retention Matrix
fig, axes = plt.subplots(2, 1, figsize=(14, 8))

# Heatmap of raw counts
sns.heatmap(cohort_pivot, annot=True, fmt='g', cmap='YlOrRd', ax=axes[0],
            cbar_kws={'label': 'User Count'})
axes[0].set_title('Cohort Sizes - User Counts')
axes[0].set_xlabel('Cohort Age (Months)')
axes[0].set_ylabel('Cohort Month')

# Heatmap of retention rates
sns.heatmap(retention_table, annot=True, fmt='.0f', cmap='RdYlGn', vmin=0, vmax=100,
            ax=axes[1], cbar_kws={'label': 'Retention %'})
axes[1].set_title('Cohort Retention Rates (%)')
axes[1].set_xlabel('Cohort Age (Months)')
axes[1].set_ylabel('Cohort Month')

plt.tight_layout()
plt.show()

# 4. Retention Curve
fig, ax = plt.subplots(figsize=(12, 6))

# Plot retention curves for each cohort
for cohort_month in cohort_pivot.index[:8]:  # First 8 cohorts
    cohort_retention = retention_table.loc[cohort_month]
    ax.plot(cohort_retention.index, cohort_retention.values, marker='o', label=f'Cohort {cohort_month}')

ax.set_xlabel('Cohort Age (Months)')
ax.set_ylabel('Retention Rate (%)')
ax.set_title('Retention Curves by Cohort')
ax.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
ax.grid(True, alpha=0.3)
ax.set_ylim([0, 105])

plt.tight_layout()
plt.show()

# 5. Average Retention Curve
fig, ax = plt.subplots(figsize=(10, 6))

# Calculate average retention at each age
avg_retention = retention_table.mean()
ax.plot(avg_retention.index, avg_retention.values, marker='o', linewidth=2, markersize=8, color='navy')
ax.fill_between(avg_retention.index, avg_retention.values, alpha=0.3, color='navy')

# Add a ±1 standard deviation band across cohorts (a spread measure, not a formal confidence interval)
std_retention = retention_table.std()
ax.fill_between(std_retention.index,
                avg_retention - std_retention,
                avg_retention + std_retention,
                alpha=0.2, color='navy', label='±1 Std Dev')

ax.set_xlabel('Cohort Age (Months)')
ax.set_ylabel('Retention Rate (%)')
ax.set_title('Average Retention Curve with ±1 Std Dev Band')
ax.legend()
ax.grid(True, alpha=0.3)
ax.set_ylim([0, 105])

plt.tight_layout()
plt.show()

# 6. Churn Rate (cumulative: share of the original cohort lost by each age)
churn_rate = 100 - retention_table
print("\nChurn Rates (%):")
print(churn_rate.round(1).head())

# 7. Revenue Cohort Analysis
# Add revenue data
np.random.seed(42)
df['revenue'] = np.random.exponential(50, len(df))

# Revenue by cohort
revenue_data = df.groupby(['cohort_month', 'cohort_age']).agg({
    'revenue': 'sum',
    'user_id': 'nunique'
}).reset_index()
revenue_data['revenue_per_user'] = revenue_data['revenue'] / revenue_data['user_id']

revenue_pivot = revenue_data.pivot(index='cohort_month', columns='cohort_age', values='revenue')
rpu_pivot = revenue_data.pivot(index='cohort_month', columns='cohort_age', values='revenue_per_user')

# Visualize revenue
fig, axes = plt.subplots(2, 1, figsize=(14, 8))

sns.heatmap(revenue_pivot, annot=True, fmt='.0f', cmap='YlGnBu', ax=axes[0],
            cbar_kws={'label': 'Total Revenue ($)'})
axes[0].set_title('Total Revenue by Cohort')
axes[0].set_xlabel('Cohort Age (Months)')
axes[0].set_ylabel('Cohort Month')

sns.heatmap(rpu_pivot, annot=True, fmt='.2f', cmap='YlGnBu', ax=axes[1],
            cbar_kws={'label': 'Revenue per User ($)'})
axes[1].set_title('Revenue per User by Cohort')
axes[1].set_xlabel('Cohort Age (Months)')
axes[1].set_ylabel('Cohort Month')

plt.tight_layout()
plt.show()

# 8. Lifetime Value Calculation
df['month_since_signup'] = df['cohort_age']
ltv_data = df.groupby('user_id').agg({
    'revenue': 'sum',
    'cohort_month': 'first',
    'month_since_signup': 'max',
}).reset_index()
ltv_data.columns = ['user_id', 'lifetime_value', 'cohort_month', 'lifetime_months']

# Average LTV by cohort
ltv_by_cohort = ltv_data.groupby('cohort_month')['lifetime_value'].agg(['mean', 'median', 'std'])

print("\nLifetime Value by Cohort:")
print(ltv_by_cohort.round(2))

fig, ax = plt.subplots(figsize=(10, 6))
ltv_by_cohort['mean'].plot(kind='bar', ax=ax, color='skyblue', edgecolor='black')
ax.set_title('Average Lifetime Value by Cohort')
ax.set_xlabel('Cohort Month')
ax.set_ylabel('Lifetime Value ($)')
ax.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()

# 9. Cohort Composition Over Time
fig, ax = plt.subplots(figsize=(12, 6))

# Active users per month by cohort
active_by_month = df.groupby(['date', 'cohort_month']).size().reset_index(name='active_users')
pivot_active = active_by_month.pivot(index='date', columns='cohort_month', values='active_users')

pivot_active.plot(ax=ax, marker='o')
ax.set_title('Active Users Per Month by Cohort')
ax.set_xlabel('Month')
ax.set_ylabel('Active Users')
ax.legend(title='Cohort Month', bbox_to_anchor=(1.05, 1), loc='upper left')
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# 10. Cohort Summary Metrics
summary_metrics = pd.DataFrame({
    'Cohort Month': cohort_size.index,
    'Initial Size': cohort_size.values,
    'Month 1 Retention': retention_table.iloc[:, 0].values,
    'Month 3 Retention': retention_table.iloc[:, min(2, retention_table.shape[1]-1)].values,
    'Avg LTV': ltv_by_cohort['mean'].values,
})

print("\nCohort Summary Metrics:")
print(summary_metrics.round(2))

# 11. Visualization comparison
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Month 1 vs Month 3 retention
ax_plot = axes[0]
months = ['Month 1', 'Month 3']
month_1_ret = retention_table.iloc[:, 0].mean()
month_3_ret = retention_table.iloc[:, min(2, retention_table.shape[1]-1)].mean()
ax_plot.bar(months, [month_1_ret, month_3_ret], color=['#1f77b4', '#ff7f0e'], edgecolor='black')
ax_plot.set_ylabel('Retention Rate (%)')
ax_plot.set_title('Average Retention by Milestone')
ax_plot.set_ylim([0, 100])
for i, v in enumerate([month_1_ret, month_3_ret]):
    ax_plot.text(i, v + 2, f'{v:.1f}%', ha='center')

# Cohort size trend
axes[1].plot(cohort_size.index, cohort_size.values, marker='o', linewidth=2, markersize=8)
axes[1].set_xlabel('Cohort Month')
axes[1].set_ylabel('Cohort Size')
axes[1].set_title('Cohort Sizes Over Time')
axes[1].grid(True, alpha=0.3)

# LTV trend
axes[2].plot(ltv_by_cohort.index, ltv_by_cohort['mean'].values, marker='o', linewidth=2, markersize=8, color='green')
axes[2].set_xlabel('Cohort Month')
axes[2].set_ylabel('Average Lifetime Value ($)')
axes[2].set_title('LTV Trend by Cohort')
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\nCohort analysis complete!")

Key Metrics

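Retention Rate : % of the original cohort still active in a period
Churn Rate : % of the original cohort lost by that period
Day/Month 1 Retention : Early engagement signal
Lifetime Value : Total revenue per user over their lifetime
Payback Period : Time to recover customer acquisition cost (CAC)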
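
Payback period is the one metric above that the implementation does not compute. A minimal sketch, assuming a hypothetical CAC figure and a made-up series of average revenue per acquired user by month since signup:

```python
import numpy as np

# Hypothetical inputs: customer acquisition cost, and average revenue per
# acquired user in each month after signup (churned users count as zero).
cac = 120.0
monthly_revenue_per_user = np.array([40.0, 35.0, 30.0, 25.0, 20.0, 15.0])

# Running revenue per acquired user, i.e. LTV realised so far.
cumulative_revenue = np.cumsum(monthly_revenue_per_user)
recovered = cumulative_revenue >= cac

# First month in which cumulative revenue covers CAC; None if it never does.
payback_month = int(np.argmax(recovered)) + 1 if recovered.any() else None
print(cumulative_revenue, payback_month)
```

Dividing by acquired users rather than active users matters here: revenue per active user would overstate how quickly the acquisition spend is recovered.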

Insights to Look For

Early retention predictors
Differences between cohorts
Seasonal patterns
Engagement degradation
Revenue trends
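
One possible way to surface engagement degradation and differences between cohorts is to look at step retention: the share of the previous period's active users who stayed. The sketch below uses a made-up two-cohort retention table, and `step_retention` is an illustrative name rather than part of the implementation above.

```python
import pandas as pd

# Hypothetical cumulative retention (%) for two cohorts at cohort ages 1 to 3.
retention = pd.DataFrame(
    {1: [100.0, 100.0], 2: [80.0, 70.0], 3: [72.0, 49.0]},
    index=pd.Index(["2023-01", "2023-02"], name="cohort_month"),
)

# Step retention: active users at age t as a % of active users at age t-1.
# The first column is NaN because there is no earlier period to compare with.
step_retention = retention.div(retention.shift(axis=1)) * 100

# Gap between cohorts at each age; a widening gap flags a cohort that is
# losing users faster period over period.
gap = step_retention.loc["2023-01"] - step_retention.loc["2023-02"]
print(step_retention.round(1))
print(gap.round(1))
```

Cumulative retention always falls, so step retention can make it easier to see whether the remaining users are becoming more or less likely to stay.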

Deliverables

Cohort retention matrix
Retention curve visualization
Churn rate analysis
Lifetime value calculations
Revenue per cohort
Executive summary with insights
Actionable recommendations