npx skills add https://github.com/davila7/claude-code-templates --skill shap
SHAP is a unified approach to explain machine learning model outputs using Shapley values from cooperative game theory. This skill provides comprehensive guidance for:
SHAP works with all model types: tree-based models (XGBoost, LightGBM, CatBoost, Random Forest), deep learning models (TensorFlow, PyTorch, Keras), linear models, and black-box models.
Trigger this skill when users ask about:

Decision Tree:
- Tree-based model? (XGBoost, LightGBM, CatBoost, Random Forest, Gradient Boosting) → shap.TreeExplainer (fast, exact)
- Deep neural network? (TensorFlow, PyTorch, Keras, CNNs, RNNs, Transformers) → shap.DeepExplainer or shap.GradientExplainer
- Linear model? (Linear/Logistic Regression, GLMs) → shap.LinearExplainer (extremely fast)
- Any other model? (SVMs, custom functions, black-box models) → shap.KernelExplainer (model-agnostic but slower)
- Unsure? → shap.Explainer (automatically selects the best algorithm)

See references/explainers.md for detailed information on all explainer types.
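The decision tree above can be sketched as a small helper. Note that `recommend_explainer` is a hypothetical name for illustration, not part of the shap API, and the model-family coverage is not exhaustive:

```python
def recommend_explainer(model_family: str) -> str:
    """Map a model family to the recommended SHAP explainer class name."""
    tree_models = {"xgboost", "lightgbm", "catboost", "random_forest", "gradient_boosting"}
    deep_models = {"tensorflow", "pytorch", "keras"}
    linear_models = {"linear_regression", "logistic_regression", "glm"}

    family = model_family.lower()
    if family in tree_models:
        return "shap.TreeExplainer"    # fast, exact for tree ensembles
    if family in deep_models:
        return "shap.DeepExplainer"    # shap.GradientExplainer also applies
    if family in linear_models:
        return "shap.LinearExplainer"  # extremely fast
    return "shap.KernelExplainer"      # model-agnostic fallback


print(recommend_explainer("xgboost"))  # shap.TreeExplainer
print(recommend_explainer("svm"))      # shap.KernelExplainer
```

In practice, `shap.Explainer` performs this dispatch automatically; the helper only makes the decision criteria explicit.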
import shap
# Example with tree-based model (XGBoost)
import xgboost as xgb
# Train model
model = xgb.XGBClassifier().fit(X_train, y_train)
# Create explainer
explainer = shap.TreeExplainer(model)
# Compute SHAP values
shap_values = explainer(X_test)
# The shap_values object contains:
# - values: SHAP values (feature attributions)
# - base_values: Expected model output (baseline)
# - data: Original feature values
For Global Understanding (entire dataset):
# Beeswarm plot - shows feature importance with value distributions
shap.plots.beeswarm(shap_values, max_display=15)
# Bar plot - clean summary of feature importance
shap.plots.bar(shap_values)
For Individual Predictions:
# Waterfall plot - detailed breakdown of single prediction
shap.plots.waterfall(shap_values[0])
# Force plot - additive force visualization
shap.plots.force(shap_values[0])
For Feature Relationships:
# Scatter plot - feature-prediction relationship
shap.plots.scatter(shap_values[:, "Feature_Name"])
# Colored by another feature to show interactions
shap.plots.scatter(shap_values[:, "Age"], color=shap_values[:, "Education"])
See references/plots.md for a comprehensive guide to all plot types.
This skill supports several common workflows. Choose the workflow that matches the current task.
Goal: Understand what drives model predictions
Steps:
Example:
# Step 1-2: Setup
explainer = shap.TreeExplainer(model)
shap_values = explainer(X_test)
# Step 3: Global importance
shap.plots.beeswarm(shap_values)
# Step 4: Feature relationships
shap.plots.scatter(shap_values[:, "Most_Important_Feature"])
# Step 5: Individual explanation
shap.plots.waterfall(shap_values[0])
Goal: Identify and fix model issues
Steps:
See references/workflows.md for the detailed debugging workflow.
Goal: Use SHAP insights to improve features
Steps:
See references/workflows.md for the detailed feature engineering workflow.
Goal: Compare multiple models to select the best interpretable option
Steps:
See references/workflows.md for the detailed model comparison workflow.
Goal: Detect and analyze model bias across demographic groups
Steps:
See references/workflows.md for the detailed fairness analysis workflow.
Goal: Integrate SHAP explanations into production systems
Steps:
See references/workflows.md for the detailed production deployment workflow.
Definition: SHAP values quantify each feature's contribution to a prediction, measured as the deviation from the expected model output (baseline).
Properties:
Interpretation:
Example:
Baseline (expected value): 0.30
Feature contributions (SHAP values):
Age: +0.15
Income: +0.10
Education: -0.05
Final prediction: 0.30 + 0.15 + 0.10 - 0.05 = 0.50
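The additivity shown above can be verified directly: the baseline plus the sum of SHAP values must reproduce the prediction. A minimal arithmetic sketch of the same example:

```python
baseline = 0.30
contributions = {"Age": 0.15, "Income": 0.10, "Education": -0.05}

# SHAP's additivity property: prediction = baseline + sum of attributions
prediction = baseline + sum(contributions.values())
print(f"{prediction:.2f}")  # 0.50
```

The same check is useful as a sanity test on real `shap_values`: for any sample, `base_values + values.sum()` should match the model output being explained (up to floating-point error).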
Purpose: Represents a "typical" input to establish baseline expectations
Selection:
Impact: The baseline affects SHAP value magnitudes but not relative importance
Critical Consideration: Understand what your model outputs
Example: XGBoost classifiers explain margin output (log-odds) by default. To explain probabilities, use model_output="probability" in TreeExplainer.
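Because the default explanation is on the log-odds (margin) scale, mapping an explained margin back to a probability goes through the sigmoid. A sketch with illustrative numbers (the base value and SHAP total below are hypothetical, not from a fitted model):

```python
import math

def sigmoid(log_odds: float) -> float:
    """Convert a margin (log-odds) into a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

base_log_odds = -1.0  # hypothetical baseline on the margin scale
shap_total = 1.5      # hypothetical sum of one sample's SHAP values (log-odds units)

prob = sigmoid(base_log_odds + shap_total)
print(round(prob, 3))  # 0.622
```

Note the asymmetry: the sigmoid of the *total* margin gives the predicted probability, but individual log-odds SHAP values cannot each be passed through the sigmoid separately, which is why additivity holds on the margin scale, not the probability scale.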
import numpy as np

# 1. Setup
explainer = shap.TreeExplainer(model)
shap_values = explainer(X_test)
# 2. Global importance
shap.plots.beeswarm(shap_values)
shap.plots.bar(shap_values)
# 3. Top feature relationships
top_features = X_test.columns[np.abs(shap_values.values).mean(0).argsort()[-5:]]
for feature in top_features:
    shap.plots.scatter(shap_values[:, feature])
# 4. Example predictions
for i in range(5):
    shap.plots.waterfall(shap_values[i])
# Define cohorts
cohort1_mask = X_test['Group'] == 'A'
cohort2_mask = X_test['Group'] == 'B'
# Compare feature importance
shap.plots.bar({
"Group A": shap_values[cohort1_mask],
"Group B": shap_values[cohort2_mask]
})
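Under the hood, the cohort bar plot compares mean |SHAP| per group. The same numbers can be computed directly with NumPy; the array below is synthetic stand-in data, not output from a real explainer:

```python
import numpy as np

# Synthetic SHAP values: 6 samples x 2 features
shap_vals = np.array([
    [ 0.2, -0.1],
    [ 0.3,  0.0],
    [ 0.1, -0.2],
    [-0.4,  0.5],
    [-0.3,  0.6],
    [-0.5,  0.4],
])
group = np.array(["A", "A", "A", "B", "B", "B"])

# Mean absolute SHAP value per feature, within each cohort
for g in ("A", "B"):
    mean_abs = np.abs(shap_vals[group == g]).mean(axis=0)
    print(g, mean_abs)
```

Comparing these per-cohort importances numerically (not just visually) is useful when the analysis feeds into a fairness report or a regression test.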
# Find errors
errors = model.predict(X_test) != y_test
error_indices = np.where(errors)[0]
# Explain errors
for idx in error_indices[:5]:
    print(f"Sample {idx}:")
    shap.plots.waterfall(shap_values[idx])
# Investigate key features
shap.plots.scatter(shap_values[:, "Suspicious_Feature"])
Explainer Speed (fastest to slowest):
1. LinearExplainer - nearly instantaneous
2. TreeExplainer - very fast
3. DeepExplainer - fast for neural networks
4. GradientExplainer - fast for neural networks
5. KernelExplainer - slow (use only when necessary)
6. PermutationExplainer - very slow but accurate

For Large Datasets:
# Compute SHAP for subset
shap_values = explainer(X_test[:1000])
# Or use batching
batch_size = 100
all_shap_values = []
for i in range(0, len(X_test), batch_size):
    batch_shap = explainer(X_test[i:i+batch_size])
    all_shap_values.append(batch_shap)
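After batching, the per-batch results can be stitched back together. Each `shap.Explanation` holds its attributions in a `.values` array, so the raw arrays stack with NumPy; the arrays below are synthetic stand-ins for real batch outputs:

```python
import numpy as np

# Stand-ins for batch_shap.values from three batches (n_samples x n_features)
batch_values = [np.zeros((100, 5)), np.ones((100, 5)), np.full((37, 5), 2.0)]

# Row-stack into one (total_samples x n_features) attribution matrix
all_values = np.vstack(batch_values)
print(all_values.shape)  # (237, 5)
```

Stacking the raw arrays is enough for aggregate statistics (mean |SHAP|, cohort comparisons); for plotting, the per-batch `Explanation` objects can also be passed to plots individually.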
For Visualizations:
# Sample subset for plots
shap.plots.beeswarm(shap_values[:1000])
# Adjust transparency for dense plots
shap.plots.scatter(shap_values[:, "Feature"], alpha=0.3)
For Production:
# Cache explainer
import joblib
joblib.dump(explainer, 'explainer.pkl')
explainer = joblib.load('explainer.pkl')
# Pre-compute for batch predictions
# Only compute top N features for API responses
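The "top N features" pattern mentioned above reduces an API payload to the few attributions that matter. A sketch using an argsort over one sample's SHAP values; the feature names and numbers are illustrative:

```python
import numpy as np

def top_contributions(feature_names, shap_row, n=3):
    """Return the n features with the largest |SHAP| for one sample."""
    order = np.argsort(np.abs(shap_row))[::-1][:n]
    return [(feature_names[i], float(shap_row[i])) for i in order]

names = ["age", "income", "education", "tenure"]
row = np.array([0.15, -0.40, 0.05, 0.30])
print(top_contributions(names, row, n=2))  # [('income', -0.4), ('tenure', 0.3)]
```

Returning only the top contributions keeps responses small and stable while still communicating why the model decided as it did.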
Problem: Using KernelExplainer for tree models (slow and unnecessary)
Solution: Always use TreeExplainer for tree-based models

Problem: DeepExplainer/KernelExplainer with too few background samples
Solution: Use 100-1000 representative samples

Problem: Interpreting log-odds as probabilities
Solution: Check the model output type; understand whether values are probabilities, log-odds, or raw outputs

Problem: Matplotlib backend issues
Solution: Ensure the backend is set correctly; use plt.show() if needed

Problem: Default max_display=10 may be too many or too few
Solution: Adjust the max_display parameter or use feature clustering

Problem: Computing SHAP for very large datasets
Solution: Sample a subset, use batching, or ensure a specialized explainer is used (not KernelExplainer)
Use show=True (default) for inline plot display; pass show=False when logging figures programmatically, as in the MLflow example below.

import mlflow
import matplotlib.pyplot as plt
import numpy as np

with mlflow.start_run():
    # Train model
    model = train_model(X_train, y_train)
    # Compute SHAP
    explainer = shap.TreeExplainer(model)
    shap_values = explainer(X_test)
    # Log plots
    shap.plots.beeswarm(shap_values, show=False)
    mlflow.log_figure(plt.gcf(), "shap_beeswarm.png")
    plt.close()
    # Log feature importance metrics
    mean_abs_shap = np.abs(shap_values.values).mean(axis=0)
    for feature, importance in zip(X_test.columns, mean_abs_shap):
        mlflow.log_metric(f"shap_{feature}", importance)
import joblib

class ExplanationService:
    def __init__(self, model_path, explainer_path):
        self.model = joblib.load(model_path)
        self.explainer = joblib.load(explainer_path)

    def predict_with_explanation(self, X):
        prediction = self.model.predict(X)
        shap_values = self.explainer(X)
        return {
            'prediction': prediction[0],
            'base_value': shap_values.base_values[0],
            'feature_contributions': dict(zip(X.columns, shap_values.values[0]))
        }
This skill includes comprehensive reference documentation organized by topic:
Complete guide to all explainer classes:
TreeExplainer - Fast, exact explanations for tree-based models
DeepExplainer - Deep learning models (TensorFlow, PyTorch)
KernelExplainer - Model-agnostic (works with any model)
LinearExplainer - Fast explanations for linear models
GradientExplainer - Gradient-based explanations for neural networks
PermutationExplainer - Exact but slow; works with any model

Includes: Constructor parameters, methods, supported models, when to use, examples, performance considerations.
Comprehensive visualization guide:
Includes: Parameters, use cases, examples, best practices, plot selection guide.
Detailed workflows and best practices:
Includes: Step-by-step instructions, code examples, decision criteria, troubleshooting.
Theoretical foundations:
Includes: Mathematical foundations, proofs, comparisons, advanced topics.
When to load reference files:
- explainers.md - when the user needs detailed information about specific explainer types or parameters
- plots.md - when the user needs detailed visualization guidance or is exploring plot options
- workflows.md - when the user has complex multi-step tasks (debugging, fairness analysis, production deployment)
- theory.md - when the user asks about theoretical foundations, Shapley values, or mathematical details

Default approach (without loading references):

Loading references:
# To load reference files, use the Read tool with appropriate file path:
# /path/to/shap/references/explainers.md
# /path/to/shap/references/plots.md
# /path/to/shap/references/workflows.md
# /path/to/shap/references/theory.md
Choose the right explainer: Use specialized explainers (TreeExplainer, DeepExplainer, LinearExplainer) when possible; avoid KernelExplainer unless necessary
Start global, then go local: Begin with beeswarm/bar plots for overall understanding, then dive into waterfall/scatter plots for details
Use multiple visualizations: Different plots reveal different insights; combine global (beeswarm) + local (waterfall) + relationship (scatter) views
Select appropriate background data: Use 50-1000 representative samples from training data
Understand model output units: Know whether you are explaining probabilities, log-odds, or raw outputs
Validate with domain knowledge: SHAP shows model behavior; use domain expertise to interpret and validate
Optimize for performance: Sample subsets for visualization, batch for large datasets, cache explainers in production
Check for data leakage: Unexpectedly high feature importance may indicate data quality issues
Consider feature correlations: Use TreeExplainer's correlation-aware options or feature clustering for redundant features
Remember SHAP shows association, not causation: Use domain knowledge for causal interpretation
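Selecting representative background data (point 4 above) can be as simple as a random draw from the training set; the shap library also ships helpers for this (e.g. shap.sample, and shap.kmeans for KernelExplainer backgrounds). A NumPy sketch with synthetic training data:

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(10_000, 8))  # synthetic stand-in for real training data

# Draw 200 representative rows (without replacement) as the explainer background
idx = rng.choice(len(X_train), size=200, replace=False)
background = X_train[idx]
print(background.shape)  # (200, 8)
```

For skewed data, a stratified draw or k-means summarization usually gives a more faithful baseline than a uniform sample.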
# Basic installation
uv pip install shap
# With visualization dependencies
uv pip install shap matplotlib
# Latest version
uv pip install -U shap
Dependencies: numpy, pandas, scikit-learn, matplotlib, scipy
Optional: xgboost, lightgbm, tensorflow, torch (depending on model types)
This skill provides comprehensive coverage of SHAP for model interpretability across all use cases and model types.
Weekly Installs
141
Repository
GitHub Stars
22.6K
First Seen
Jan 20, 2026
Security Audits
Gen Agent Trust Hub: Pass · Socket: Pass · Snyk: Pass
Installed on
claude-code: 119
opencode: 109
gemini-cli: 104
cursor: 99
codex: 96
antigravity: 95