npx skills add https://github.com/davila7/claude-code-templates --skill shap
SHAP is a unified approach to explain machine learning model outputs using Shapley values from cooperative game theory. This skill provides comprehensive guidance for:
SHAP works with all model types: tree-based models (XGBoost, LightGBM, CatBoost, Random Forest), deep learning models (TensorFlow, PyTorch, Keras), linear models, and black-box models.
Trigger this skill when users ask about:

Decision Tree:
- Tree-based model? (XGBoost, LightGBM, CatBoost, Random Forest, Gradient Boosting) → shap.TreeExplainer (fast, exact)
- Deep neural network? (TensorFlow, PyTorch, Keras, CNNs, RNNs, Transformers) → shap.DeepExplainer or shap.GradientExplainer
- Linear model? (Linear/Logistic Regression, GLMs) → shap.LinearExplainer (extremely fast)
- Any other model? (SVMs, custom functions, black-box models) → shap.KernelExplainer (model-agnostic but slower)
- Unsure? → shap.Explainer (automatically selects the best algorithm)

See references/explainers.md for detailed information on all explainer types.
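The decision tree above can be sketched as a small helper. Note that `recommend_explainer` is a hypothetical name for illustration, not part of the shap API, and the model-family coverage is not exhaustive:

```python
def recommend_explainer(model_family: str) -> str:
    """Map a model family to the recommended SHAP explainer class name."""
    tree_models = {"xgboost", "lightgbm", "catboost", "random_forest", "gradient_boosting"}
    deep_models = {"tensorflow", "pytorch", "keras"}
    linear_models = {"linear_regression", "logistic_regression", "glm"}

    family = model_family.lower()
    if family in tree_models:
        return "shap.TreeExplainer"    # fast, exact for tree ensembles
    if family in deep_models:
        return "shap.DeepExplainer"    # shap.GradientExplainer also applies
    if family in linear_models:
        return "shap.LinearExplainer"  # extremely fast
    return "shap.KernelExplainer"      # model-agnostic fallback


print(recommend_explainer("xgboost"))  # shap.TreeExplainer
print(recommend_explainer("svm"))      # shap.KernelExplainer
```

In practice, `shap.Explainer` performs this dispatch automatically; the helper only makes the decision criteria explicit.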
import shap
# Example with tree-based model (XGBoost)
import xgboost as xgb
# Train model
model = xgb.XGBClassifier().fit(X_train, y_train)
# Create explainer
explainer = shap.TreeExplainer(model)
# Compute SHAP values
shap_values = explainer(X_test)
# The shap_values object contains:
# - values: SHAP values (feature attributions)
# - base_values: Expected model output (baseline)
# - data: Original feature values
For Global Understanding (entire dataset):
# Beeswarm plot - shows feature importance with value distributions
shap.plots.beeswarm(shap_values, max_display=15)
# Bar plot - clean summary of feature importance
shap.plots.bar(shap_values)
For Individual Predictions:
# Waterfall plot - detailed breakdown of single prediction
shap.plots.waterfall(shap_values[0])
# Force plot - additive force visualization
shap.plots.force(shap_values[0])
For Feature Relationships:
# Scatter plot - feature-prediction relationship
shap.plots.scatter(shap_values[:, "Feature_Name"])
# Colored by another feature to show interactions
shap.plots.scatter(shap_values[:, "Age"], color=shap_values[:, "Education"])
See references/plots.md for a comprehensive guide to all plot types.
This skill supports several common workflows. Choose the workflow that matches the current task.
Goal: Understand what drives model predictions
Steps:
Example:
# Step 1-2: Setup
explainer = shap.TreeExplainer(model)
shap_values = explainer(X_test)
# Step 3: Global importance
shap.plots.beeswarm(shap_values)
# Step 4: Feature relationships
shap.plots.scatter(shap_values[:, "Most_Important_Feature"])
# Step 5: Individual explanation
shap.plots.waterfall(shap_values[0])
Goal: Identify and fix model issues
Steps:
See references/workflows.md for the detailed debugging workflow.
Goal: Use SHAP insights to improve features
Steps:
See references/workflows.md for the detailed feature engineering workflow.
Goal: Compare multiple models to select the best interpretable option
Steps:
See references/workflows.md for the detailed model comparison workflow.
Goal: Detect and analyze model bias across demographic groups
Steps:
See references/workflows.md for the detailed fairness analysis workflow.
Goal: Integrate SHAP explanations into production systems
Steps:
See references/workflows.md for the detailed production deployment workflow.
Definition: SHAP values quantify each feature's contribution to a prediction, measured as the deviation from the expected model output (baseline).
Properties:
Interpretation:
Example:
Baseline (expected value): 0.30
Feature contributions (SHAP values):
Age: +0.15
Income: +0.10
Education: -0.05
Final prediction: 0.30 + 0.15 + 0.10 - 0.05 = 0.50
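The additivity shown above can be verified directly: the baseline plus the sum of SHAP values must reproduce the prediction. A minimal arithmetic sketch of the same example:

```python
baseline = 0.30
contributions = {"Age": 0.15, "Income": 0.10, "Education": -0.05}

# SHAP's additivity property: prediction = baseline + sum of attributions
prediction = baseline + sum(contributions.values())
print(f"{prediction:.2f}")  # 0.50
```

The same check is useful as a sanity test on real `shap_values`: for any sample, `base_values + values.sum()` should match the model output being explained (up to floating-point error).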
Purpose: Represents a "typical" input to establish baseline expectations
Selection:
Impact: The baseline affects SHAP value magnitudes but not relative importance
Critical Consideration: Understand what your model outputs
Example: XGBoost classifiers explain margin output (log-odds) by default. To explain probabilities, use model_output="probability" in TreeExplainer.
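Because the default explanation is on the log-odds (margin) scale, mapping an explained margin back to a probability goes through the sigmoid. A sketch with illustrative numbers (the base value and SHAP total below are hypothetical, not from a fitted model):

```python
import math

def sigmoid(log_odds: float) -> float:
    """Convert a margin (log-odds) into a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

base_log_odds = -1.0  # hypothetical baseline on the margin scale
shap_total = 1.5      # hypothetical sum of one sample's SHAP values (log-odds units)

prob = sigmoid(base_log_odds + shap_total)
print(round(prob, 3))  # 0.622
```

Note the asymmetry: the sigmoid of the *total* margin gives the predicted probability, but individual log-odds SHAP values cannot each be passed through the sigmoid separately, which is why additivity holds on the margin scale, not the probability scale.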
import numpy as np

# 1. Setup
explainer = shap.TreeExplainer(model)
shap_values = explainer(X_test)
# 2. Global importance
shap.plots.beeswarm(shap_values)
shap.plots.bar(shap_values)
# 3. Top feature relationships
top_features = X_test.columns[np.abs(shap_values.values).mean(0).argsort()[-5:]]
for feature in top_features:
    shap.plots.scatter(shap_values[:, feature])
# 4. Example predictions
for i in range(5):
    shap.plots.waterfall(shap_values[i])
# Define cohorts
cohort1_mask = X_test['Group'] == 'A'
cohort2_mask = X_test['Group'] == 'B'
# Compare feature importance
shap.plots.bar({
"Group A": shap_values[cohort1_mask],
"Group B": shap_values[cohort2_mask]
})
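Under the hood, the cohort bar plot compares mean |SHAP| per group. The same numbers can be computed directly with NumPy; the array below is synthetic stand-in data, not output from a real explainer:

```python
import numpy as np

# Synthetic SHAP values: 6 samples x 2 features
shap_vals = np.array([
    [ 0.2, -0.1],
    [ 0.3,  0.0],
    [ 0.1, -0.2],
    [-0.4,  0.5],
    [-0.3,  0.6],
    [-0.5,  0.4],
])
group = np.array(["A", "A", "A", "B", "B", "B"])

# Mean absolute SHAP value per feature, within each cohort
for g in ("A", "B"):
    mean_abs = np.abs(shap_vals[group == g]).mean(axis=0)
    print(g, mean_abs)
```

Comparing these per-cohort importances numerically (not just visually) is useful when the analysis feeds into a fairness report or a regression test.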
# Find errors
errors = model.predict(X_test) != y_test
error_indices = np.where(errors)[0]
# Explain errors
for idx in error_indices[:5]:
    print(f"Sample {idx}:")
    shap.plots.waterfall(shap_values[idx])
# Investigate key features
shap.plots.scatter(shap_values[:, "Suspicious_Feature"])
Explainer Speed (fastest to slowest):
1. LinearExplainer - nearly instantaneous
2. TreeExplainer - very fast
3. DeepExplainer - fast for neural networks
4. GradientExplainer - fast for neural networks
5. KernelExplainer - slow (use only when necessary)
6. PermutationExplainer - very slow but accurate

For Large Datasets:
# Compute SHAP for subset
shap_values = explainer(X_test[:1000])
# Or use batching
batch_size = 100
all_shap_values = []
for i in range(0, len(X_test), batch_size):
    batch_shap = explainer(X_test[i:i+batch_size])
    all_shap_values.append(batch_shap)
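After batching, the per-batch results can be stitched back together. Each `shap.Explanation` holds its attributions in a `.values` array, so the raw arrays stack with NumPy; the arrays below are synthetic stand-ins for real batch outputs:

```python
import numpy as np

# Stand-ins for batch_shap.values from three batches (n_samples x n_features)
batch_values = [np.zeros((100, 5)), np.ones((100, 5)), np.full((37, 5), 2.0)]

# Row-stack into one (total_samples x n_features) attribution matrix
all_values = np.vstack(batch_values)
print(all_values.shape)  # (237, 5)
```

Stacking the raw arrays is enough for aggregate statistics (mean |SHAP|, cohort comparisons); for plotting, the per-batch `Explanation` objects can also be passed to plots individually.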
For Visualizations:
# Sample subset for plots
shap.plots.beeswarm(shap_values[:1000])
# Adjust transparency for dense plots
shap.plots.scatter(shap_values[:, "Feature"], alpha=0.3)
For Production:
# Cache explainer
import joblib
joblib.dump(explainer, 'explainer.pkl')
explainer = joblib.load('explainer.pkl')
# Pre-compute for batch predictions
# Only compute top N features for API responses
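The "top N features" pattern mentioned above reduces an API payload to the few attributions that matter. A sketch using an argsort over one sample's SHAP values; the feature names and numbers are illustrative:

```python
import numpy as np

def top_contributions(feature_names, shap_row, n=3):
    """Return the n features with the largest |SHAP| for one sample."""
    order = np.argsort(np.abs(shap_row))[::-1][:n]
    return [(feature_names[i], float(shap_row[i])) for i in order]

names = ["age", "income", "education", "tenure"]
row = np.array([0.15, -0.40, 0.05, 0.30])
print(top_contributions(names, row, n=2))  # [('income', -0.4), ('tenure', 0.3)]
```

Returning only the top contributions keeps responses small and stable while still communicating why the model decided as it did.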
Problem: Using KernelExplainer for tree models (slow and unnecessary)
Solution: Always use TreeExplainer for tree-based models

Problem: DeepExplainer/KernelExplainer with too few background samples
Solution: Use 100-1000 representative samples

Problem: Interpreting log-odds as probabilities
Solution: Check the model output type; understand whether values are probabilities, log-odds, or raw outputs

Problem: Matplotlib backend issues
Solution: Ensure the backend is set correctly; use plt.show() if needed

Problem: Default max_display=10 may be too many or too few
Solution: Adjust the max_display parameter or use feature clustering

Problem: Computing SHAP for very large datasets
Solution: Sample a subset, use batching, or ensure a specialized explainer is used (not KernelExplainer)
Use show=True (default) for inline plot display; pass show=False when logging figures programmatically, as in the MLflow example below.

import mlflow
import matplotlib.pyplot as plt
import numpy as np

with mlflow.start_run():
    # Train model
    model = train_model(X_train, y_train)
    # Compute SHAP
    explainer = shap.TreeExplainer(model)
    shap_values = explainer(X_test)
    # Log plots
    shap.plots.beeswarm(shap_values, show=False)
    mlflow.log_figure(plt.gcf(), "shap_beeswarm.png")
    plt.close()
    # Log feature importance metrics
    mean_abs_shap = np.abs(shap_values.values).mean(axis=0)
    for feature, importance in zip(X_test.columns, mean_abs_shap):
        mlflow.log_metric(f"shap_{feature}", importance)
import joblib

class ExplanationService:
    def __init__(self, model_path, explainer_path):
        self.model = joblib.load(model_path)
        self.explainer = joblib.load(explainer_path)

    def predict_with_explanation(self, X):
        prediction = self.model.predict(X)
        shap_values = self.explainer(X)
        return {
            'prediction': prediction[0],
            'base_value': shap_values.base_values[0],
            'feature_contributions': dict(zip(X.columns, shap_values.values[0]))
        }
This skill includes comprehensive reference documentation organized by topic:
Complete guide to all explainer classes:
TreeExplainer - Fast, exact explanations for tree-based models
DeepExplainer - Deep learning models (TensorFlow, PyTorch)
KernelExplainer - Model-agnostic (works with any model)
LinearExplainer - Fast explanations for linear models
GradientExplainer - Gradient-based explanations for neural networks
PermutationExplainer - Exact but slow; works with any model

Includes: Constructor parameters, methods, supported models, when to use, examples, performance considerations.
Comprehensive visualization guide:
Includes: Parameters, use cases, examples, best practices, plot selection guide.
Detailed workflows and best practices:
Includes: Step-by-step instructions, code examples, decision criteria, troubleshooting.
Theoretical foundations:
Includes: Mathematical foundations, proofs, comparisons, advanced topics.
When to load reference files:
- explainers.md - when the user needs detailed information about specific explainer types or parameters
- plots.md - when the user needs detailed visualization guidance or is exploring plot options
- workflows.md - when the user has complex multi-step tasks (debugging, fairness analysis, production deployment)
- theory.md - when the user asks about theoretical foundations, Shapley values, or mathematical details

Default approach (without loading references):

Loading references:
# To load reference files, use the Read tool with appropriate file path:
# /path/to/shap/references/explainers.md
# /path/to/shap/references/plots.md
# /path/to/shap/references/workflows.md
# /path/to/shap/references/theory.md
Choose the right explainer: Use specialized explainers (TreeExplainer, DeepExplainer, LinearExplainer) when possible; avoid KernelExplainer unless necessary
Start global, then go local: Begin with beeswarm/bar plots for overall understanding, then dive into waterfall/scatter plots for details
Use multiple visualizations: Different plots reveal different insights; combine global (beeswarm) + local (waterfall) + relationship (scatter) views
Select appropriate background data: Use 50-1000 representative samples from training data
Understand model output units: Know whether you are explaining probabilities, log-odds, or raw outputs
Validate with domain knowledge: SHAP shows model behavior; use domain expertise to interpret and validate
Optimize for performance: Sample subsets for visualization, batch for large datasets, cache explainers in production
Check for data leakage: Unexpectedly high feature importance may indicate data quality issues
Consider feature correlations: Use TreeExplainer's correlation-aware options or feature clustering for redundant features
Remember SHAP shows association, not causation: Use domain knowledge for causal interpretation
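Selecting representative background data (point 4 above) can be as simple as a random draw from the training set; the shap library also ships helpers for this (e.g. shap.sample, and shap.kmeans for KernelExplainer backgrounds). A NumPy sketch with synthetic training data:

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(10_000, 8))  # synthetic stand-in for real training data

# Draw 200 representative rows (without replacement) as the explainer background
idx = rng.choice(len(X_train), size=200, replace=False)
background = X_train[idx]
print(background.shape)  # (200, 8)
```

For skewed data, a stratified draw or k-means summarization usually gives a more faithful baseline than a uniform sample.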
# Basic installation
uv pip install shap
# With visualization dependencies
uv pip install shap matplotlib
# Latest version
uv pip install -U shap
Dependencies: numpy, pandas, scikit-learn, matplotlib, scipy
Optional: xgboost, lightgbm, tensorflow, torch (depending on model types)
This skill provides comprehensive coverage of SHAP for model interpretability across all use cases and model types.
Weekly Installs
141
Repository
GitHub Stars
22.6K
First Seen
Jan 20, 2026
Security Audits
Gen Agent Trust Hub: Pass · Socket: Pass · Snyk: Pass
Installed on
claude-code: 119
opencode: 109
gemini-cli: 104
cursor: 99
codex: 96
antigravity: 95