pymc-bayesian-modeling by davila7/claude-code-templates
npx skills add https://github.com/davila7/claude-code-templates --skill pymc-bayesian-modeling
PyMC is a Python library for Bayesian modeling and probabilistic programming. Build, fit, validate, and compare Bayesian models using PyMC's modern API (version 5.x+), including hierarchical models, MCMC sampling (NUTS), variational inference, and model comparison (LOO, WAIC).
This skill should be used when:
Follow this workflow for building and validating Bayesian models:
import pymc as pm
import arviz as az
import numpy as np
# Load and prepare data
X = ... # Predictors
y = ... # Outcomes
# Standardize predictors for better sampling
X_mean = X.mean(axis=0)
X_std = X.std(axis=0)
X_scaled = (X - X_mean) / X_std
Key practices:
Use named dimensions with coords for clarity:

coords = {
    'predictors': ['var1', 'var2', 'var3'],
    'obs_id': np.arange(len(y))
}
with pm.Model(coords=coords) as model:
    # Priors
    alpha = pm.Normal('alpha', mu=0, sigma=1)
    beta = pm.Normal('beta', mu=0, sigma=1, dims='predictors')
    sigma = pm.HalfNormal('sigma', sigma=1)
    # Linear predictor
    mu = alpha + pm.math.dot(X_scaled, beta)
    # Likelihood
    y_obs = pm.Normal('y_obs', mu=mu, sigma=sigma, observed=y, dims='obs_id')
Key practices:
- Use HalfNormal or Exponential priors for scale parameters
- Use named dimensions (dims) instead of shape when possible
- Use pm.Data() for values that will be updated for predictions

Always validate priors before fitting:
with model:
    prior_pred = pm.sample_prior_predictive(draws=1000, random_seed=42)
    # Visualize
    az.plot_ppc(prior_pred, group='prior')
Check: prior predictive draws should fall in a plausible range for the outcome; if they are wildly unrealistic, tighten the priors.
with model:
    # Optional: Quick exploration with ADVI
    # approx = pm.fit(n=20000)

    # Full MCMC inference
    idata = pm.sample(
        draws=2000,
        tune=1000,
        chains=4,
        target_accept=0.9,
        random_seed=42,
        idata_kwargs={'log_likelihood': True}  # For model comparison
    )
Key parameters:
- draws=2000: Number of samples per chain
- tune=1000: Warmup samples (discarded)
- chains=4: Run 4 chains for convergence checking
- target_accept=0.9: Higher for difficult posteriors (0.95-0.99)
- log_likelihood=True: Needed for model comparison

Use the diagnostic script:
from scripts.model_diagnostics import check_diagnostics
results = check_diagnostics(idata, var_names=['alpha', 'beta', 'sigma'])
Check: R-hat < 1.01, effective sample size (ESS) > 400, and zero divergences.
If issues arise:
- Increase target_accept=0.95 and use a non-centered parameterization

Validate model fit:
with model:
    pm.sample_posterior_predictive(idata, extend_inferencedata=True, random_seed=42)

# Visualize
az.plot_ppc(idata)
Check: posterior predictive draws should reproduce the distribution of the observed data (location, spread, and shape).
# Summary statistics
print(az.summary(idata, var_names=['alpha', 'beta', 'sigma']))
# Posterior distributions
az.plot_posterior(idata, var_names=['alpha', 'beta', 'sigma'])
# Coefficient estimates
az.plot_forest(idata, var_names=['beta'], combined=True)
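Because the predictors were standardized, beta is on the standardized scale. A numpy sketch of converting back to per-unit effects on the original scale (all values below are hypothetical):

```python
import numpy as np

# Hypothetical posterior means for two standardized coefficients
beta_scaled = np.array([0.5, -0.2])
X_std = np.array([2.0, 10.0])   # standard deviations used when scaling X

# Effect of a one-unit change of the predictor on its original scale
beta_orig = beta_scaled / X_std
```

The intercept shifts as well: alpha_orig = alpha - np.sum(beta_orig * X_mean).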
X_new = ... # New predictor values
X_new_scaled = (X_new - X_mean) / X_std
with model:
    # Note: this requires 'X_scaled' to have been created with pm.Data() in the model
    pm.set_data({'X_scaled': X_new_scaled})
    post_pred = pm.sample_posterior_predictive(
        idata.posterior,
        var_names=['y_obs'],
        random_seed=42
    )
# Extract prediction intervals
y_pred_mean = post_pred.posterior_predictive['y_obs'].mean(dim=['chain', 'draw'])
y_pred_hdi = az.hdi(post_pred.posterior_predictive, var_names=['y_obs'])
For continuous outcomes with linear relationships:
with pm.Model() as linear_model:
    alpha = pm.Normal('alpha', mu=0, sigma=10)
    beta = pm.Normal('beta', mu=0, sigma=10, shape=n_predictors)
    sigma = pm.HalfNormal('sigma', sigma=1)
    mu = alpha + pm.math.dot(X, beta)
    y = pm.Normal('y', mu=mu, sigma=sigma, observed=y_obs)
Use template: assets/linear_regression_template.py
For binary outcomes:
with pm.Model() as logistic_model:
    alpha = pm.Normal('alpha', mu=0, sigma=10)
    beta = pm.Normal('beta', mu=0, sigma=10, shape=n_predictors)
    logit_p = alpha + pm.math.dot(X, beta)
    y = pm.Bernoulli('y', logit_p=logit_p, observed=y_obs)
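Posterior draws of logit_p can be mapped back to probabilities with the inverse-logit; a numpy sketch (the draws below are hypothetical):

```python
import numpy as np

def inv_logit(x):
    """Map log-odds to probabilities."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical posterior draws of logit_p for a single observation
logit_draws = np.array([-0.2, 0.0, 0.4])
p_draws = inv_logit(logit_draws)   # each draw becomes a probability in (0, 1)
```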
For grouped data (use non-centered parameterization):
with pm.Model(coords={'groups': group_names}) as hierarchical_model:
    # Hyperpriors
    mu_alpha = pm.Normal('mu_alpha', mu=0, sigma=10)
    sigma_alpha = pm.HalfNormal('sigma_alpha', sigma=1)
    # Group-level (non-centered)
    alpha_offset = pm.Normal('alpha_offset', mu=0, sigma=1, dims='groups')
    alpha = pm.Deterministic('alpha', mu_alpha + sigma_alpha * alpha_offset, dims='groups')
    # Observation-level
    mu = alpha[group_idx]
    sigma = pm.HalfNormal('sigma', sigma=1)
    y = pm.Normal('y', mu=mu, sigma=sigma, observed=y_obs)
Use template: assets/hierarchical_model_template.py
Critical: Always use non-centered parameterization for hierarchical models to avoid divergences.
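The group_names and group_idx used above can be built from a raw label column; a numpy sketch with hypothetical labels:

```python
import numpy as np

labels = np.array(['a', 'b', 'a', 'c', 'b'])  # hypothetical group labels, one per row
group_names, group_idx = np.unique(labels, return_inverse=True)
# group_names: array(['a', 'b', 'c']); group_idx: array([0, 1, 0, 2, 1])
```

With pandas data, `pd.factorize(df['group'])` gives the same index/names pair.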
For count data:
with pm.Model() as poisson_model:
    alpha = pm.Normal('alpha', mu=0, sigma=10)
    beta = pm.Normal('beta', mu=0, sigma=10, shape=n_predictors)
    log_lambda = alpha + pm.math.dot(X, beta)
    y = pm.Poisson('y', mu=pm.math.exp(log_lambda), observed=y_obs)
For overdispersed counts, use NegativeBinomial instead.
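A quick check for overdispersion before choosing between the two: for Poisson-distributed counts the variance roughly equals the mean (the counts below are hypothetical):

```python
import numpy as np

y_counts = np.array([0, 1, 0, 5, 12, 3, 0, 8, 20, 2])  # hypothetical count data
dispersion = y_counts.var(ddof=1) / y_counts.mean()
# dispersion near 1 -> Poisson is plausible; dispersion >> 1 -> prefer NegativeBinomial
```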
For autoregressive processes:
with pm.Model() as ar_model:
    sigma = pm.HalfNormal('sigma', sigma=1)
    rho = pm.Normal('rho', mu=0, sigma=0.5, shape=ar_order)
    init_dist = pm.Normal.dist(mu=0, sigma=sigma)
    y = pm.AR('y', rho=rho, sigma=sigma, init_dist=init_dist, observed=y_obs)
Use LOO or WAIC for model comparison:
from scripts.model_comparison import compare_models, check_loo_reliability
# Fit models with log_likelihood
models = {
    'Model1': idata1,
    'Model2': idata2,
    'Model3': idata3
}
# Compare using LOO
comparison = compare_models(models, ic='loo')
# Check reliability
check_loo_reliability(models)
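If the bundled comparison script is unavailable, ArviZ can run the same comparison directly, provided each model was sampled with log_likelihood=True. The sketch below builds toy InferenceData objects with synthetic log-likelihood values just so the call is self-contained; in practice pass real fitted idatas:

```python
import numpy as np
import arviz as az

rng = np.random.default_rng(0)

def toy_idata(noise):
    """Toy InferenceData with a log_likelihood group (synthetic numbers)."""
    return az.from_dict(
        posterior={'alpha': rng.normal(size=(2, 200))},
        log_likelihood={'y': rng.normal(loc=-1.0, scale=noise, size=(2, 200, 30))},
    )

# In real use: az.compare({'Model1': idata1, 'Model2': idata2}, ic='loo')
comparison = az.compare({'Model1': toy_idata(0.1), 'Model2': toy_idata(0.1)}, ic='loo')
print(comparison)
```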
Interpretation: models are ranked by expected log predictive density (elpd); a difference in elpd smaller than roughly twice its standard error is not meaningful.
Check Pareto-k values: k < 0.5 is good, 0.5-0.7 is acceptable, and k > 0.7 means the LOO estimate for that observation is unreliable.
When models are similar, average predictions:
from scripts.model_comparison import model_averaging
averaged_pred, weights = model_averaging(models, var_name='y_obs')
Scale parameters (σ, τ):
- pm.HalfNormal('sigma', sigma=1) - Default choice
- pm.Exponential('sigma', lam=1) - Alternative
- pm.Gamma('sigma', alpha=2, beta=1) - More informative

Unbounded parameters:
- pm.Normal('theta', mu=0, sigma=1) - For standardized data
- pm.StudentT('theta', nu=3, mu=0, sigma=1) - Robust to outliers

Positive parameters:
- pm.LogNormal('theta', mu=0, sigma=1)
- pm.Gamma('theta', alpha=2, beta=1)

Probabilities:
- pm.Beta('p', alpha=2, beta=2) - Weakly informative
- pm.Uniform('p', lower=0, upper=1) - Non-informative (use sparingly)

Correlation matrices:
- pm.LKJCorr('corr', n=n_vars, eta=2) - eta=1 is uniform, eta>1 prefers the identity matrix

Continuous outcomes:
- pm.Normal('y', mu=mu, sigma=sigma) - Default for continuous data
- pm.StudentT('y', nu=nu, mu=mu, sigma=sigma) - Robust to outliers

Count data:
- pm.Poisson('y', mu=lam) - Equidispersed counts (lambda is a Python keyword, so name the rate variable lam)
- pm.NegativeBinomial('y', mu=mu, alpha=alpha) - Overdispersed counts
- pm.ZeroInflatedPoisson('y', psi=psi, mu=mu) - Excess zeros

Binary outcomes:
- pm.Bernoulli('y', p=p) or pm.Bernoulli('y', logit_p=logit_p)

Categorical outcomes:
- pm.Categorical('y', p=probs)

See: references/distributions.md for a comprehensive distribution reference
Default and recommended for most models:
idata = pm.sample(
    draws=2000,
    tune=1000,
    chains=4,
    target_accept=0.9,
    random_seed=42
)
Adjust when needed:
- target_accept=0.95 or higher for difficult posteriors
- pm.Metropolis() for discrete variables

Fast approximation for exploration or initialization:
with model:
    approx = pm.fit(n=20000, method='advi')
    # Use for initialization
    start = approx.sample(return_inferencedata=False)[0]
    idata = pm.sample(initvals=start)
Trade-offs: ADVI is much faster than MCMC but only approximates the posterior and typically underestimates uncertainty; use NUTS for final inference.
See: references/sampling_inference.md for detailed sampling guide
from scripts.model_diagnostics import create_diagnostic_report
create_diagnostic_report(
    idata,
    var_names=['alpha', 'beta', 'sigma'],
    output_dir='diagnostics/'
)
Creates:
from scripts.model_diagnostics import check_diagnostics
results = check_diagnostics(idata)
Checks R-hat, ESS, divergences, and tree depth.
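For intuition, R-hat compares between-chain and within-chain variance. A minimal numpy sketch of the classic (non-split, non-rank-normalized) statistic — ArviZ computes a more robust version:

```python
import numpy as np

def rhat(chains):
    """Potential scale reduction for draws of shape (n_chains, n_draws).
    Simplified: no chain splitting or rank normalization."""
    n = chains.shape[1]
    w = chains.var(axis=1, ddof=1).mean()        # within-chain variance
    b = n * chains.mean(axis=1).var(ddof=1)      # between-chain variance
    var_hat = (n - 1) / n * w + b / n            # pooled posterior variance estimate
    return float(np.sqrt(var_hat / w))

rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 1000))               # well-mixed chains -> rhat near 1
stuck = mixed + np.arange(4)[:, None]            # chains exploring different regions
```

Chains stuck at different levels inflate the between-chain term, pushing rhat well above 1.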
Symptom: idata.sample_stats.diverging.sum() > 0
Solutions:
- Increase target_accept=0.95 or 0.99

Symptom: ESS < 400
Solutions:
- Increase draws=5000

Symptom: R-hat > 1.01
Solutions:
- Increase tune=2000, draws=5000

Symptom: sampling is slow
Solutions:
- Use more cores and chains, e.g. cores=8, chains=8

Key practices:
- Use named dimensions (dims) for clarity
- Use target_accept=0.9 as a baseline (higher if needed)
- Set log_likelihood=True for model comparison

This skill includes:
References (references/):
- distributions.md: Comprehensive catalog of PyMC distributions organized by category (continuous, discrete, multivariate, mixture, time series). Use when selecting priors or likelihoods.
- sampling_inference.md: Detailed guide to sampling algorithms (NUTS, Metropolis, SMC), variational inference (ADVI, SVGD), and handling sampling issues. Use when encountering convergence problems or choosing inference methods.
- workflows.md: Complete workflow examples and code patterns for common model types, data preparation, prior selection, and model validation. Use as a cookbook for standard Bayesian analyses.

Scripts (scripts/):
- model_diagnostics.py: Automated diagnostic checking and report generation. Functions: check_diagnostics() for quick checks, create_diagnostic_report() for comprehensive analysis with plots.
- model_comparison.py: Model comparison utilities using LOO/WAIC. Functions: compare_models(), check_loo_reliability(), model_averaging().

Templates (assets/):
- linear_regression_template.py: Complete template for Bayesian linear regression with full workflow (data prep, prior checks, fitting, diagnostics, predictions).
- hierarchical_model_template.py: Complete template for hierarchical/multilevel models with non-centered parameterization and group-level analysis.
with pm.Model(coords={'var': names}) as model:
    # Priors
    param = pm.Normal('param', mu=0, sigma=1, dims='var')
    # Likelihood
    y = pm.Normal('y', mu=..., sigma=..., observed=data)
    idata = pm.sample(draws=2000, tune=1000, chains=4, target_accept=0.9)
from scripts.model_diagnostics import check_diagnostics
check_diagnostics(idata)
from scripts.model_comparison import compare_models
compare_models({'m1': idata1, 'm2': idata2}, ic='loo')
with model:
    pm.set_data({'X': X_new})
    pred = pm.sample_posterior_predictive(idata.posterior)
- pm.model_to_graphviz(model) to visualize model structure
- idata.to_netcdf('results.nc') to save results
- az.from_netcdf('results.nc') to load them

Weekly Installs: 143
GitHub Stars: 22.6K
First Seen: Jan 21, 2026
Security Audits: Gen Agent Trust Hub: Pass, Socket: Pass, Snyk: Pass
Installed on: opencode (117), claude-code (116), gemini-cli (107), cursor (105), codex (95), antigravity (92)