pyhealth by davila7/claude-code-templates
npx skills add https://github.com/davila7/claude-code-templates --skill pyhealth
PyHealth is a comprehensive Python library for healthcare AI that provides specialized tools, models, and datasets for clinical machine learning. Use this skill when developing healthcare prediction models, processing clinical data, working with medical coding systems, or deploying AI solutions in healthcare settings.
Invoke this skill when:
PyHealth operates through a modular 5-stage pipeline optimized for healthcare AI:
Performance: 3x faster than pandas for healthcare data processing
from pyhealth.datasets import MIMIC4Dataset
from pyhealth.tasks import mortality_prediction_mimic4_fn
from pyhealth.datasets import split_by_patient, get_dataloader
from pyhealth.models import Transformer
from pyhealth.trainer import Trainer
# 1. Load dataset and set task
dataset = MIMIC4Dataset(root="/path/to/data")
sample_dataset = dataset.set_task(mortality_prediction_mimic4_fn)
# 2. Split data
train, val, test = split_by_patient(sample_dataset, [0.7, 0.1, 0.2])
# 3. Create data loaders
train_loader = get_dataloader(train, batch_size=64, shuffle=True)
val_loader = get_dataloader(val, batch_size=64, shuffle=False)
test_loader = get_dataloader(test, batch_size=64, shuffle=False)
# 4. Initialize and train model
model = Transformer(
dataset=sample_dataset,
feature_keys=["diagnoses", "medications"],
mode="binary",
embedding_dim=128
)
trainer = Trainer(model=model, device="cuda")
trainer.train(
train_dataloader=train_loader,
val_dataloader=val_loader,
epochs=50,
monitor="pr_auc_score"
)
# 5. Evaluate
results = trainer.evaluate(test_loader)
This skill includes comprehensive reference documentation organized by functionality. Read specific reference files as needed:
File: references/datasets.md
Read when:
Key Topics:
File: references/medical_coding.md
Read when:
Key Topics:
File: references/tasks.md
Read when:
Key Topics:
File: references/models.md
Read when:
Key Topics:
File: references/preprocessing.md
Read when:
Key Topics:
File: references/training_evaluation.md
Read when:
Key Topics:
uv pip install pyhealth
Requirements:
Objective: Predict patient mortality in the intensive care unit
Approach: references/datasets.md, references/tasks.md, references/models.md, references/training_evaluation.md
Objective: Recommend medications while avoiding drug-drug interactions
Approach: references/datasets.md, references/tasks.md, references/models.md, references/medical_coding.md, references/training_evaluation.md
Objective: Identify patients at risk of 30-day readmission
Approach: references/datasets.md, references/tasks.md, references/preprocessing.md, references/models.md, references/training_evaluation.md
Objective: Classify sleep stages from EEG signals
Approach: references/datasets.md, references/tasks.md, references/preprocessing.md, references/models.md, references/training_evaluation.md
Objective: Standardize diagnoses across different coding systems
Approach: references/medical_coding.md for comprehensive guidance
Objective: Automatically assign ICD codes from clinical notes
Approach: references/datasets.md, references/tasks.md, references/preprocessing.md, references/models.md, references/training_evaluation.md
Always split by patient: Prevent data leakage by ensuring no patient appears in multiple splits
from pyhealth.datasets import split_by_patient
train, val, test = split_by_patient(dataset, [0.7, 0.1, 0.2])
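To see why this matters, the idea behind a patient-level split can be sketched in plain Python. The record layout and helper below are illustrative only, not PyHealth's internals:

```python
import random

def split_by_patient_ids(samples, ratios, seed=42):
    """Split at the patient level so that all of a patient's visits
    land in exactly one of train/val/test (no leakage across splits)."""
    patients = sorted({s["patient_id"] for s in samples})
    random.Random(seed).shuffle(patients)
    n_train = int(ratios[0] * len(patients))
    n_val = int(ratios[1] * len(patients))
    groups = [set(patients[:n_train]),
              set(patients[n_train:n_train + n_val]),
              set(patients[n_train + n_val:])]
    # Each sample follows its patient into that patient's split
    return [[s for s in samples if s["patient_id"] in g] for g in groups]

# Hypothetical samples: every patient contributes two visits
samples = [{"patient_id": f"p{i}", "visit": v} for i in range(10) for v in (0, 1)]
train, val, test = split_by_patient_ids(samples, [0.7, 0.1, 0.2])
```

A visit-level random split would scatter a patient's two visits across train and test; here both visits always travel together, which is what prevents the model from memorizing a patient it will later be evaluated on.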
Check dataset statistics: Understand your data before modeling
print(dataset.stats()) # Patients, visits, events, code distributions
Use appropriate preprocessing: Match processors to data types (see references/preprocessing.md)
Start with baselines: Establish baseline performance with simple models
Choose task-appropriate models:
Monitor validation metrics: Use metrics appropriate to the task and handle class imbalance
Calibrate predictions: Ensure probabilities are reliable (see references/training_evaluation.md)
Assess fairness: Evaluate across demographic groups to detect bias
Quantify uncertainty: Provide confidence estimates for predictions
Interpret predictions: Use attention weights, SHAP, or ChEFER for clinical trust
Validate thoroughly: Use held-out test sets from different time periods or sites
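The class-imbalance point is easy to demonstrate with stdlib arithmetic: at a 5% mortality rate (toy numbers, not a PyHealth API), a degenerate model that always predicts the negative class looks strong on accuracy while being clinically useless — which is why the pipelines above monitor pr_auc_score rather than accuracy:

```python
def accuracy_recall(y_true, y_pred):
    """Plain-Python accuracy and recall for binary labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    positives = [i for i, t in enumerate(y_true) if t == 1]
    hits = sum(y_pred[i] == 1 for i in positives)
    return correct / len(y_true), hits / len(positives)

# 5 positive cases out of 100; the model never predicts mortality
y_true = [1] * 5 + [0] * 95
always_negative = [0] * 100
acc, rec = accuracy_recall(y_true, always_negative)
print(acc, rec)  # 0.95 accuracy, 0.0 recall
```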
ImportError for dataset:
Out of memory: reduce the batch size or sequence length (max_seq_length)
Poor performance:
Slow training: run on GPU (device="cuda")
# Complete mortality prediction pipeline
from pyhealth.datasets import MIMIC4Dataset
from pyhealth.tasks import mortality_prediction_mimic4_fn
from pyhealth.datasets import split_by_patient, get_dataloader
from pyhealth.models import RETAIN
from pyhealth.trainer import Trainer
# 1. Load dataset
print("Loading MIMIC-IV dataset...")
dataset = MIMIC4Dataset(root="/data/mimic4")
print(dataset.stats())
# 2. Define task
print("Setting mortality prediction task...")
sample_dataset = dataset.set_task(mortality_prediction_mimic4_fn)
print(f"Generated {len(sample_dataset)} samples")
# 3. Split data (by patient to prevent leakage)
print("Splitting data...")
train_ds, val_ds, test_ds = split_by_patient(
sample_dataset, ratios=[0.7, 0.1, 0.2], seed=42
)
# 4. Create data loaders
train_loader = get_dataloader(train_ds, batch_size=64, shuffle=True)
val_loader = get_dataloader(val_ds, batch_size=64)
test_loader = get_dataloader(test_ds, batch_size=64)
# 5. Initialize interpretable model
print("Initializing RETAIN model...")
model = RETAIN(
dataset=sample_dataset,
feature_keys=["diagnoses", "procedures", "medications"],
mode="binary",
embedding_dim=128,
hidden_dim=128
)
# 6. Train model
print("Training model...")
trainer = Trainer(model=model, device="cuda")
trainer.train(
train_dataloader=train_loader,
val_dataloader=val_loader,
epochs=50,
optimizer="Adam",
learning_rate=1e-3,
weight_decay=1e-5,
monitor="pr_auc_score", # Use AUPRC for imbalanced data
monitor_criterion="max",
save_path="./checkpoints/mortality_retain"
)
# 7. Evaluate on test set
print("Evaluating on test set...")
test_results = trainer.evaluate(
test_loader,
metrics=["accuracy", "precision", "recall", "f1_score",
"roc_auc_score", "pr_auc_score"]
)
print("\nTest Results:")
for metric, value in test_results.items():
print(f" {metric}: {value:.4f}")
# 8. Get predictions with attention for interpretation
predictions = trainer.inference(
test_loader,
additional_outputs=["visit_attention", "feature_attention"],
return_patient_ids=True
)
# 9. Analyze a high-risk patient
high_risk_idx = predictions["y_pred"].argmax()
patient_id = predictions["patient_ids"][high_risk_idx]
visit_attn = predictions["visit_attention"][high_risk_idx]
feature_attn = predictions["feature_attention"][high_risk_idx]
print(f"\nHigh-risk patient: {patient_id}")
print(f"Risk score: {predictions['y_pred'][high_risk_idx]:.3f}")
print(f"Most influential visit: {visit_attn.argmax()}")
print(f"Most important features: {feature_attn[visit_attn.argmax()].argsort()[-5:]}")
# 10. Save model for deployment
trainer.save("./models/mortality_retain_final.pt")
print("\nModel saved successfully!")
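Step 9's `argsort()[-5:]` idiom assumes the attention outputs from `trainer.inference` are numpy arrays; the same top-k selection in plain Python, with made-up attention weights, looks like this:

```python
def top_k_indices(scores, k):
    """Indices of the k largest scores, in ascending score order
    (mirrors numpy's argsort()[-k:])."""
    return sorted(range(len(scores)), key=lambda i: scores[i])[-k:]

feature_attn = [0.05, 0.40, 0.10, 0.30, 0.15]  # hypothetical weights
print(top_k_indices(feature_attn, 2))  # [3, 1]
```

Index 1 (weight 0.40) and index 3 (weight 0.30) are the two most influential features, so they come back last in the ascending order, just as with the numpy idiom in the pipeline.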
For detailed information on each component, refer to the comprehensive reference files in the references/ directory:
Total comprehensive documentation: ~28,000 words across modular reference files.
Weekly Installs
143
Repository
GitHub Stars
22.6K
First Seen
Jan 21, 2026
Security Audits
Gen Agent Trust Hub: Pass, Socket: Pass, Snyk: Pass
Installed on
claude-code: 118
opencode: 112
gemini-cli: 105
cursor: 103
antigravity: 97
codex: 94