scikit-survival by davila7/claude-code-templates
```
npx skills add https://github.com/davila7/claude-code-templates --skill scikit-survival
```
scikit-survival is a Python library for survival analysis built on top of scikit-learn. It provides specialized tools for time-to-event analysis, handling the unique challenge of censored data where some observations are only partially known.
Survival analysis aims to establish connections between covariates and the time of an event, accounting for censored records (particularly right-censored data from studies where participants don't experience events during observation periods).
Use this skill when working with time-to-event data in which some observations may be censored, for example clinical studies where not every participant experiences the event during follow-up.
scikit-survival provides multiple model families, each suited for different scenarios:
Use for: Standard survival analysis with interpretable coefficients

- CoxPHSurvivalAnalysis: Basic Cox model
- CoxnetSurvivalAnalysis: Penalized Cox with elastic net for high-dimensional data
- IPCRidge: Ridge regression for accelerated failure time models

See: references/cox-models.md for detailed guidance on Cox models, regularization, and interpretation
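For a concrete starting point, here is a minimal sketch of fitting the basic Cox model and reading its coefficients as hazard ratios; the `X_train_scaled` and `y_train` variables are assumed to come from the quick-start workflow later in this document.

```python
import numpy as np
from sksurv.linear_model import CoxPHSurvivalAnalysis

# Fit on standardized features and a structured survival array
cox = CoxPHSurvivalAnalysis()
cox.fit(X_train_scaled, y_train)

# exp(coef) is the hazard ratio per one-unit increase in each feature
hazard_ratios = np.exp(cox.coef_)
```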
Use for: High predictive performance with complex non-linear relationships

- RandomSurvivalForest: Robust, non-parametric ensemble method
- GradientBoostingSurvivalAnalysis: Tree-based boosting for maximum performance
- ComponentwiseGradientBoostingSurvivalAnalysis: Linear boosting with feature selection
- ExtraSurvivalTrees: Extremely randomized trees for additional regularization

See: references/ensemble-models.md for comprehensive guidance on ensemble methods, hyperparameter tuning, and when to use each model
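As an illustration of the ensemble API, the sketch below fits a Random Survival Forest and evaluates per-sample survival curves; the training arrays and the day-based time scale are assumptions carried over from the other examples here.

```python
from sksurv.ensemble import RandomSurvivalForest

rsf = RandomSurvivalForest(n_estimators=100, min_samples_leaf=10, random_state=0)
rsf.fit(X_train, y_train)

# predict_survival_function returns one step function per test sample
surv_fns = rsf.predict_survival_function(X_test)
prob_2y = [fn(730) for fn in surv_fns]  # survival probability at ~2 years, assuming time in days
```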
Use for: Medium-sized datasets with margin-based learning

- FastSurvivalSVM: Linear SVM optimized for speed
- FastKernelSurvivalSVM: Kernel SVM for non-linear relationships
- HingeLossSurvivalSVM: SVM with hinge loss
- ClinicalKernelTransform: Specialized kernel for clinical + molecular data

See: references/svm-models.md for detailed SVM guidance, kernel selection, and hyperparameter tuning
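A brief sketch of the kernel SVM variant, again assuming the scaled arrays from the quick-start workflow; the RBF kernel and `alpha=1.0` are illustrative choices, not recommended defaults.

```python
from sksurv.svm import FastKernelSurvivalSVM

# Kernel SVM for non-linear relationships; alpha controls regularization strength
kssvm = FastKernelSurvivalSVM(kernel="rbf", alpha=1.0, random_state=0)
kssvm.fit(X_train_scaled, y_train)
risk_scores = kssvm.predict(X_test_scaled)  # higher score = higher predicted risk
```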
```
Start
├─ High-dimensional data (p > n)?
│   ├─ Yes → CoxnetSurvivalAnalysis (elastic net)
│   └─ No → Continue
│
├─ Need interpretable coefficients?
│   ├─ Yes → CoxPHSurvivalAnalysis or ComponentwiseGradientBoostingSurvivalAnalysis
│   └─ No → Continue
│
├─ Complex non-linear relationships expected?
│   ├─ Yes
│   │   ├─ Large dataset (n > 1000) → GradientBoostingSurvivalAnalysis
│   │   ├─ Medium dataset → RandomSurvivalForest or FastKernelSurvivalSVM
│   │   └─ Small dataset → RandomSurvivalForest
│   └─ No → CoxPHSurvivalAnalysis or FastSurvivalSVM
│
└─ For maximum performance → Try multiple models and compare
```
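The tree can be condensed into a small helper function. This is a hypothetical sketch of the same heuristics; the function and its thresholds are illustrative, not part of scikit-survival:

```python
def suggest_model(n_samples: int, n_features: int,
                  need_interpretability: bool, expect_nonlinear: bool) -> str:
    """Mirror the decision tree above; returns a class name to try first."""
    if n_features > n_samples:
        return "CoxnetSurvivalAnalysis"
    if need_interpretability:
        return "CoxPHSurvivalAnalysis"
    if expect_nonlinear:
        return "GradientBoostingSurvivalAnalysis" if n_samples > 1000 else "RandomSurvivalForest"
    return "FastSurvivalSVM"
```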
Before modeling, properly prepare survival data:
```python
from sksurv.util import Surv

# From separate arrays
y = Surv.from_arrays(event=event_array, time=time_array)

# From a DataFrame with 'event' and 'time' columns
y = Surv.from_dataframe('event', 'time', df)
```
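The result is a NumPy structured array. A quick sanity check, sketched with made-up values:

```python
import numpy as np
from sksurv.util import Surv

y = Surv.from_arrays(event=np.array([True, False, True]),
                     time=np.array([5.0, 12.0, 9.0]))
print(y.dtype.names)          # ('event', 'time'): boolean indicator + follow-up time
assert np.all(y["time"] > 0)  # follow-up times should be positive
```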
See: references/data-handling.md for complete preprocessing workflows, data validation, and best practices
Proper evaluation is critical for survival models. Use appropriate metrics that account for censoring:
Primary metric for ranking/discrimination:
- Harrell's C-index: use for low censoring (<40%)
- Uno's C-index: use for moderate to high censoring (>40%); more robust
```python
from sksurv.metrics import concordance_index_censored, concordance_index_ipcw

# Harrell's C-index
c_harrell = concordance_index_censored(y_test['event'], y_test['time'], risk_scores)[0]

# Uno's C-index (recommended)
c_uno = concordance_index_ipcw(y_train, y_test, risk_scores)[0]
```
Evaluate discrimination at specific time points:
```python
from sksurv.metrics import cumulative_dynamic_auc

times = [365, 730, 1095]  # 1, 2, 3 years
auc, mean_auc = cumulative_dynamic_auc(y_train, y_test, risk_scores, times)
```
Assess both discrimination and calibration:
```python
from sksurv.metrics import integrated_brier_score

# survival_functions: predicted survival probabilities evaluated at `times`,
# shape (n_samples, n_times)
ibs = integrated_brier_score(y_train, y_test, survival_functions, times)
```
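`integrated_brier_score` expects survival probabilities on a fixed time grid. The sketch below shows one way to produce them, assuming a fitted model `rsf` that supports `predict_survival_function` (such as a Random Survival Forest) and grid times that lie within the test set's follow-up range:

```python
import numpy as np

times = np.array([365, 730, 1095])
surv_fns = rsf.predict_survival_function(X_test)
# Evaluate each step function on the grid -> shape (n_samples, n_times)
survival_functions = np.asarray([[fn(t) for t in times] for fn in surv_fns])
ibs = integrated_brier_score(y_train, y_test, survival_functions, times)
```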
See: references/evaluation-metrics.md for comprehensive evaluation guidance, metric selection, and using scorers with cross-validation
Handle situations with multiple mutually exclusive event types:
```python
from sksurv.nonparametric import cumulative_incidence_competing_risks

# Events must be integer-coded: 0 = censored, 1..k = competing event types
# Estimate cumulative incidence for each event type
time_points, cum_incidence = cumulative_incidence_competing_risks(y["event"], y["time"])
# cum_incidence stacks one curve per event type (plus the overall curve),
# each evaluated at time_points
```
Use competing risks analysis when subjects can experience one of several mutually exclusive event types, such as death from distinct causes.
See: references/competing-risks.md for detailed competing risks methods, cause-specific hazard models, and interpretation
Estimate survival functions without parametric assumptions:
```python
from sksurv.nonparametric import kaplan_meier_estimator
from sksurv.nonparametric import nelson_aalen_estimator

# Kaplan-Meier estimate of the survival function S(t)
time, survival_prob = kaplan_meier_estimator(y['event'], y['time'])

# Nelson-Aalen estimate of the cumulative hazard H(t)
time, cumulative_hazard = nelson_aalen_estimator(y['event'], y['time'])
```
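Both estimators return arrays that plot naturally as step functions. A minimal plotting sketch, assuming matplotlib is installed:

```python
import matplotlib.pyplot as plt

# Survival curves are right-continuous step functions
plt.step(time, survival_prob, where="post")
plt.xlabel("time")
plt.ylabel("estimated S(t)")
plt.ylim(0, 1)
plt.show()
```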
```python
from sksurv.datasets import load_breast_cancer
from sksurv.column import encode_categorical
from sksurv.linear_model import CoxPHSurvivalAnalysis
from sksurv.metrics import concordance_index_ipcw
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# 1. Load and prepare data (one-hot encode any categorical columns
#    so that scaling and the Cox model receive purely numeric input)
X, y = load_breast_cancer()
X = encode_categorical(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 2. Preprocess
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# 3. Fit model
estimator = CoxPHSurvivalAnalysis()
estimator.fit(X_train_scaled, y_train)

# 4. Predict risk scores (higher = higher risk)
risk_scores = estimator.predict(X_test_scaled)

# 5. Evaluate with Uno's C-index
c_index = concordance_index_ipcw(y_train, y_test, risk_scores)[0]
print(f"C-index: {c_index:.3f}")
```
```python
import numpy as np
from sksurv.linear_model import CoxnetSurvivalAnalysis
from sklearn.model_selection import GridSearchCV
from sksurv.metrics import as_concordance_index_ipcw_scorer

# 1. Use penalized Cox for feature selection
estimator = CoxnetSurvivalAnalysis(l1_ratio=0.9)  # Lasso-like

# 2. Tune regularization with cross-validation.
# as_concordance_index_ipcw_scorer wraps the estimator so that .score()
# reports Uno's C-index; grid keys therefore take the 'estimator__' prefix.
param_grid = {'estimator__alpha_min_ratio': [0.01, 0.001]}
cv = GridSearchCV(as_concordance_index_ipcw_scorer(estimator), param_grid, cv=5)
cv.fit(X, y)

# 3. Identify selected features; the wrapper stores the fitted model in
# .estimator_, and coef_ holds one column per alpha along the regularization
# path, so inspect the final (smallest) alpha
best_cox = cv.best_estimator_.estimator_
selected_features = np.where(best_cox.coef_[:, -1] != 0)[0]
```
```python
from sksurv.ensemble import GradientBoostingSurvivalAnalysis
from sklearn.model_selection import GridSearchCV
from sksurv.metrics import as_concordance_index_ipcw_scorer, concordance_index_ipcw

# 1. Define parameter grid (the 'estimator__' prefix targets the wrapped model)
param_grid = {
    'estimator__learning_rate': [0.01, 0.05, 0.1],
    'estimator__n_estimators': [100, 200, 300],
    'estimator__max_depth': [3, 5, 7]
}

# 2. Grid search, scoring with Uno's C-index via the scorer wrapper
gbs = GradientBoostingSurvivalAnalysis()
cv = GridSearchCV(as_concordance_index_ipcw_scorer(gbs), param_grid, cv=5, n_jobs=-1)
cv.fit(X_train, y_train)

# 3. Evaluate the best model on the held-out test set
best_model = cv.best_estimator_
risk_scores = best_model.predict(X_test)
c_index = concordance_index_ipcw(y_train, y_test, risk_scores)[0]
```
```python
from sksurv.linear_model import CoxPHSurvivalAnalysis
from sksurv.ensemble import RandomSurvivalForest, GradientBoostingSurvivalAnalysis
from sksurv.svm import FastSurvivalSVM
from sksurv.metrics import concordance_index_ipcw

# Define models
models = {
    'Cox': CoxPHSurvivalAnalysis(),
    'RSF': RandomSurvivalForest(n_estimators=100, random_state=42),
    'GBS': GradientBoostingSurvivalAnalysis(random_state=42),
    'SVM': FastSurvivalSVM(random_state=42)
}

# Evaluate each model with Uno's C-index
results = {}
for name, model in models.items():
    model.fit(X_train_scaled, y_train)
    risk_scores = model.predict(X_test_scaled)
    c_index = concordance_index_ipcw(y_train, y_test, risk_scores)[0]
    results[name] = c_index
    print(f"{name}: C-index = {c_index:.3f}")

# Select best model
best_model_name = max(results, key=results.get)
print(f"\nBest model: {best_model_name}")
```
scikit-survival fully integrates with scikit-learn's ecosystem:
```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score, GridSearchCV
from sksurv.linear_model import CoxPHSurvivalAnalysis
from sksurv.metrics import as_concordance_index_ipcw_scorer

# Use pipelines
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('model', CoxPHSurvivalAnalysis())
])

# Use cross-validation; wrap the pipeline so scoring uses Uno's C-index
scores = cross_val_score(as_concordance_index_ipcw_scorer(pipeline), X, y, cv=5)

# Use grid search (default scoring: the estimator's own concordance index)
param_grid = {'model__alpha': [0.1, 1.0, 10.0]}
cv = GridSearchCV(pipeline, param_grid, cv=5)
cv.fit(X, y)
```
This skill includes detailed reference files for specific topics:
- references/cox-models.md: Complete guide to Cox proportional hazards models, penalized Cox (CoxNet), IPCRidge, regularization strategies, and interpretation
- references/ensemble-models.md: Random Survival Forests, Gradient Boosting, hyperparameter tuning, feature importance, and model selection
- references/evaluation-metrics.md: Concordance index (Harrell's vs Uno's), time-dependent AUC, Brier score, comprehensive evaluation pipelines
- references/data-handling.md: Data loading, preprocessing workflows, handling missing data, feature encoding, validation checks
- references/svm-models.md: Survival Support Vector Machines, kernel selection, clinical kernel transform, hyperparameter tuning
- references/competing-risks.md: Competing risks analysis, cumulative incidence functions, cause-specific hazard models

Load these reference files when detailed information is needed for specific tasks.
Use sksurv.datasets for practice datasets (GBSG2, WHAS500, veterans lung cancer, etc.).

```python
# Models
from sksurv.linear_model import CoxPHSurvivalAnalysis, CoxnetSurvivalAnalysis, IPCRidge
from sksurv.ensemble import RandomSurvivalForest, GradientBoostingSurvivalAnalysis
from sksurv.svm import FastSurvivalSVM, FastKernelSurvivalSVM
from sksurv.tree import SurvivalTree

# Evaluation metrics
from sksurv.metrics import (
    concordance_index_censored,
    concordance_index_ipcw,
    cumulative_dynamic_auc,
    brier_score,
    integrated_brier_score,
    as_concordance_index_ipcw_scorer,
    as_integrated_brier_score_scorer
)

# Non-parametric estimation
from sksurv.nonparametric import (
    kaplan_meier_estimator,
    nelson_aalen_estimator,
    cumulative_incidence_competing_risks
)

# Data handling
from sksurv.util import Surv
from sksurv.preprocessing import OneHotEncoder
from sksurv.column import encode_categorical
from sksurv.datasets import load_gbsg2, load_breast_cancer, load_veterans_lung_cancer

# Kernels
from sksurv.kernels import ClinicalKernelTransform
```