Scikit-learn Best Practices Guide: Machine Learning Workflows, Model Evaluation, and Code Conventions

scikit-learn-best-practices by mindrally/skills

99 weekly installs

43 GitHub Stars

Install command

npx skills add https://github.com/mindrally/skills --skill scikit-learn-best-practices

Tags: AI/Machine Learning, Python, Web Frameworks, Code Conventions

Scikit-learn Best Practices

Expert guidelines for scikit-learn development, focusing on machine learning workflows, model development, evaluation, and best practices.

Code Style and Structure

Write concise, technical responses with accurate Python examples
Prioritize reproducibility in machine learning workflows
Use functional programming for data pipelines
Use object-oriented programming for custom estimators
Prefer vectorized operations over explicit loops
Follow PEP 8 style guidelines
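
The points about object-oriented custom estimators and vectorized operations can be illustrated with a minimal sketch. The `PercentileClipper` class below is hypothetical (it is not part of scikit-learn); it assumes a transformer that learns per-feature percentile bounds in `fit` and clips to them with a single vectorized NumPy call.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class PercentileClipper(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: clip each feature to percentiles learned in fit()."""

    def __init__(self, lower=1.0, upper=99.0):
        # Store constructor arguments unchanged so get_params()/clone() work.
        self.lower = lower
        self.upper = upper

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Learned attributes end with an underscore by scikit-learn convention.
        self.low_ = np.percentile(X, self.lower, axis=0)
        self.high_ = np.percentile(X, self.upper, axis=0)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        # Vectorized clipping instead of looping over rows.
        return np.clip(X, self.low_, self.high_)


X_demo = np.array([[0.0], [1.0], [2.0], [3.0], [100.0]])
clipper = PercentileClipper(lower=0.0, upper=75.0)
clipped = clipper.fit_transform(X_demo)
print(clipped.ravel())
print(clipper.get_params())
```

Inheriting from `BaseEstimator` and `TransformerMixin` is what lets such a class be used as a step inside a `Pipeline`.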

Machine Learning Workflow

Data Preparation

Always split data into train/validation/test sets before fitting any preprocessing
Use train_test_split() with random_state for reproducibility
Stratify splits for imbalanced classification: stratify=y
Keep test set completely separate until final evaluation
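
A minimal sketch of the splitting advice above, assuming synthetic data from `make_classification` and a 60/20/20 split (both are illustrative choices, not prescribed by the guide):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (roughly 90% / 10%), standing in for a real dataset.
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.9, 0.1], random_state=42
)

# Carve off the test set first, then split the remainder into train/validation.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, stratify=y_trainval, random_state=42
)

# Stratification keeps the minority-class share close to equal in every split.
for name, labels in [("train", y_train), ("val", y_val), ("test", y_test)]:
    print(name, len(labels), round(labels.mean(), 3))
```

The test split is created once and left untouched until the final evaluation; all fitting and tuning use the train and validation parts.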

Feature Engineering

Scale features appropriately for distance-based algorithms
Use StandardScaler for normally distributed features
Use MinMaxScaler for bounded features
Use RobustScaler for data with outliers
Encode categorical features with OneHotEncoder or OrdinalEncoder; reserve LabelEncoder for target labels
Handle missing values: SimpleImputer, KNNImputer
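
The following sketch shows imputation, scaling, and encoding on toy arrays. The data values are made up for illustration; the point is that each transformer is fitted on training data and only applied to new data.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, RobustScaler, StandardScaler

# One numeric column with a missing value and an extreme outlier (toy data).
X_train_num = np.array([[1.0], [2.0], [np.nan], [4.0], [100.0]])
X_test_num = np.array([[3.0], [np.nan]])

# Fit on training data only, then reuse the learned statistics on new data.
imputer = SimpleImputer(strategy="median")
X_train_imp = imputer.fit_transform(X_train_num)
X_test_imp = imputer.transform(X_test_num)

# StandardScaler uses mean/std (pulled by the outlier); RobustScaler uses median/IQR.
standard = StandardScaler().fit(X_train_imp)
robust = RobustScaler().fit(X_train_imp)
standard_scaled = standard.transform(X_test_imp).ravel()
robust_scaled = robust.transform(X_test_imp).ravel()
print(standard_scaled)
print(robust_scaled)

# One-hot encode a categorical feature; unseen categories become all zeros.
colors_train = np.array([["red"], ["green"], ["blue"]])
encoder = OneHotEncoder(handle_unknown="ignore").fit(colors_train)
encoded = encoder.transform(np.array([["green"], ["purple"]])).toarray()
print(encoded)
```

In this toy case the single outlier shifts the `StandardScaler` output noticeably, while the `RobustScaler` output for the median value stays at zero.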

Pipelines

Always use Pipeline to chain preprocessing and modeling. A pipeline:
- Prevents data leakage by fitting transformers only on training data
- Makes code cleaner and more reproducible
- Enables easy deployment and serialization

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

# Each step is a (name, estimator) pair; the final step is the model.
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', RandomForestClassifier(random_state=42)),
])
```
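
As a usage sketch for such a pipeline (assuming synthetic data and a smaller forest for speed), fitting it on the training rows fits the scaler on those rows only, which is how leakage is avoided:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_holdout, y_train, y_holdout = train_test_split(
    X, y, stratify=y, random_state=0
)

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", RandomForestClassifier(n_estimators=50, random_state=42)),
])

# fit() fits the scaler and then the classifier on the training rows only.
pipeline.fit(X_train, y_train)

# The scaler's learned mean matches the training data, not the full dataset.
scaler_mean = pipeline.named_steps["scaler"].mean_
print(np.allclose(scaler_mean, X_train.mean(axis=0)))

# score() reuses the already-fitted scaler to transform the held-out rows.
holdout_accuracy = pipeline.score(X_holdout, y_holdout)
print(round(holdout_accuracy, 3))
```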

Column Transformers

Use ColumnTransformer for different preprocessing per feature type
Combine numeric and categorical preprocessing in single pipeline
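
A minimal sketch of per-type preprocessing in one pipeline. It assumes pandas is installed, and the column names and values (`age`, `income`, `city`) are hypothetical toy data:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: two numeric columns and one categorical column.
df = pd.DataFrame({
    "age": [25, 32, 47, None, 51, 38, 29, 44],
    "income": [30.0, 52.0, 88.0, 61.0, None, 45.0, 39.0, 72.0],
    "city": ["A", "B", "A", "C", "B", "A", "C", "B"],
})
y = [0, 0, 1, 1, 1, 0, 0, 1]

numeric_features = ["age", "income"]
categorical_features = ["city"]

# Numeric columns: impute missing values, then scale.
numeric_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
])

# Route each group of columns to its own preprocessing.
preprocessor = ColumnTransformer([
    ("num", numeric_pipeline, numeric_features),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_features),
])

model = Pipeline([
    ("preprocess", preprocessor),
    ("classifier", LogisticRegression(max_iter=1000)),
])
model.fit(df, y)

# Two scaled numeric columns plus three one-hot columns for the cities.
transformed = model.named_steps["preprocess"].transform(df)
print(transformed.shape)
```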

Model Selection and Tuning

Cross-Validation

Use cross-validation for reliable performance estimates
cross_val_score() for quick evaluation
cross_validate() for multiple metrics
Use appropriate CV strategy:
- KFold for regression
- StratifiedKFold for classification
- TimeSeriesSplit for temporal data
- GroupKFold for grouped data
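
The cross-validation helpers above can be sketched as follows, assuming synthetic data and a logistic-regression pipeline chosen only for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    StratifiedKFold,
    TimeSeriesSplit,
    cross_val_score,
    cross_validate,
)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Stratified folds preserve class proportions in each fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Quick evaluation with a single metric.
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# Several metrics in one pass; results are keyed as "test_<metric>".
metrics = ["accuracy", "f1", "roc_auc"]
results = cross_validate(model, X, y, cv=cv, scoring=metrics)
for metric in metrics:
    print(metric, round(results[f"test_{metric}"].mean(), 3))

# For temporal data, TimeSeriesSplit always trains on earlier rows than it tests on.
tscv = TimeSeriesSplit(n_splits=3)
ordered = all(train.max() < test.min() for train, test in tscv.split(X))
print(ordered)
```

Because the scaler sits inside the pipeline, it is refitted on the training part of every fold.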

Hyperparameter Tuning

Use GridSearchCV for exhaustive search
Use RandomizedSearchCV for large parameter spaces
Always tune on training/validation data, never test data
Set n_jobs=-1 for parallel processing
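
A sketch of both search strategies, assuming a small random forest and deliberately tiny parameter spaces so it runs quickly; the grids shown are illustrative, not recommended defaults:

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split

X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# The test split is set aside and never passed to the search objects.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Exhaustive search: every combination in the grid is cross-validated.
grid = GridSearchCV(
    RandomForestClassifier(n_estimators=30, random_state=42),
    param_grid={"max_depth": [3, 5, None], "min_samples_leaf": [1, 5]},
    cv=3,
    scoring="f1",
    n_jobs=-1,
)
grid.fit(X_train, y_train)
print(grid.best_params_, round(grid.best_score_, 3))

# Randomized search: sample a fixed number of candidates from distributions.
random_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions={
        "n_estimators": randint(20, 60),
        "max_depth": randint(2, 10),
    },
    n_iter=5,
    cv=3,
    scoring="f1",
    random_state=42,
    n_jobs=-1,
)
random_search.fit(X_train, y_train)
print(random_search.best_params_, round(random_search.best_score_, 3))
```

`best_score_` is a cross-validated score on the training data, so it serves as the validation estimate while the test split stays untouched.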

Model Evaluation

Classification Metrics

Use appropriate metrics for your problem:
- accuracy_score for balanced classes
- precision_score, recall_score, f1_score for imbalanced
- roc_auc_score for ranking ability
Use classification_report() for comprehensive overview
Examine confusion_matrix() for error analysis
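
The metrics above can be compared on a small hand-made example. The labels and scores below are invented for illustration, and the 0.5 cut-off is an assumption:

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Toy ground truth and predicted probabilities for the positive class.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_score = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.35, 0.8, 0.9, 0.65]

# Hard predictions at a 0.5 threshold.
y_pred = [int(score >= 0.5) for score in y_score]

accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)

# roc_auc_score takes scores, not hard predictions, because it measures ranking.
auc = roc_auc_score(y_true, y_score)

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
matrix = confusion_matrix(y_true, y_pred)

print(accuracy, precision, recall, round(f1, 3), round(auc, 3))
print(matrix)
print(classification_report(y_true, y_pred))
```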

Regression Metrics

mean_squared_error (MSE) for general use
mean_absolute_error (MAE) for interpretability
r2_score for the proportion of variance explained (coefficient of determination)
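
A short sketch computing the three regression metrics on invented values. RMSE is derived with `np.sqrt` here, which is an illustrative choice that avoids depending on a particular scikit-learn version:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Toy targets and predictions.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 3.0, 8.0]

mse = mean_squared_error(y_true, y_pred)   # penalizes large errors more heavily
rmse = float(np.sqrt(mse))                 # same units as the target
mae = mean_absolute_error(y_true, y_pred)  # average absolute error, easy to explain
r2 = r2_score(y_true, y_pred)              # 1.0 is perfect, 0.0 matches predicting the mean

print(mse, round(rmse, 3), mae, round(r2, 3))
```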

Evaluation Best Practices

Report confidence intervals, not just point estimates
Use multiple metrics to understand model behavior
Compare against meaningful baselines
Evaluate on held-out test set only once, at the end
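
One possible way to report a baseline and a confidence interval is sketched below. It assumes a majority-class `DummyClassifier` as the baseline and a percentile bootstrap over the held-out predictions; other interval methods are equally valid.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5,
    n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Baseline that always predicts the most frequent training class.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

y_pred = model.predict(X_test)
baseline_acc = accuracy_score(y_test, baseline.predict(X_test))
model_acc = accuracy_score(y_test, y_pred)

# Percentile bootstrap: resample test rows with replacement, recompute accuracy.
rng = np.random.default_rng(42)
n = len(y_test)
boot_scores = []
for _ in range(1000):
    idx = rng.integers(0, n, size=n)
    boot_scores.append(np.mean(y_test[idx] == y_pred[idx]))
ci_low, ci_high = np.percentile(boot_scores, [2.5, 97.5])

print(f"baseline accuracy: {baseline_acc:.3f}")
print(f"model accuracy: {model_acc:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f})")
```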

Handling Imbalanced Data

Use stratified splitting and cross-validation
Consider class weights: class_weight='balanced'
Use appropriate metrics (F1 or AUC-PR, e.g. average_precision_score) rather than accuracy
Adjust decision threshold based on business needs
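
A sketch of class weighting and threshold adjustment, assuming a synthetic 95/5 class split and three candidate thresholds picked only for illustration. The thresholds are compared on a validation split, not on the test set.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# class_weight="balanced" reweights samples inversely to class frequency.
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X_train, y_train)

# Probability of the positive (minority) class on the validation split.
proba = model.predict_proba(X_val)[:, 1]
auc_pr = average_precision_score(y_val, proba)
print("average precision:", round(auc_pr, 3))

# Lower thresholds trade precision for recall; pick one that fits the use case.
recalls = {}
for threshold in (0.3, 0.5, 0.7):
    y_pred = (proba >= threshold).astype(int)
    precision = precision_score(y_val, y_pred, zero_division=0)
    recalls[threshold] = recall_score(y_val, y_pred)
    print(threshold, round(precision, 3), round(recalls[threshold], 3))
```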

Feature Selection

Use SelectKBest with statistical tests
Use RFE (Recursive Feature Elimination)
Use model-based selection: SelectFromModel
Examine feature importances from tree-based models
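
The three selection approaches can be sketched side by side. The synthetic dataset, the choice of `f_classif` as the statistical test, and the target of five features are all assumptions made for this example:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, n_redundant=0, random_state=0
)

# Univariate selection: keep the 5 features with the highest ANOVA F-scores.
kbest = SelectKBest(score_func=f_classif, k=5).fit(X, y)

# Recursive feature elimination: repeatedly drop the weakest feature of a model.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5).fit(X, y)

# Model-based selection: keep features whose importance is at least the median.
forest = RandomForestClassifier(n_estimators=100, random_state=42)
sfm = SelectFromModel(forest, threshold="median").fit(X, y)

print("SelectKBest:", kbest.get_support(indices=True))
print("RFE:", rfe.get_support(indices=True))
print("SelectFromModel:", sfm.get_support(indices=True))

# Importances of the fitted forest sum to 1.
importances = sfm.estimator_.feature_importances_
print(importances.round(3))
```

In a real workflow the selector would normally be a pipeline step so that it is fitted only on training folds.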

Model Persistence

Use joblib for saving and loading models
Save entire pipelines, not just models
Version control model artifacts
Document model metadata
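
A minimal persistence sketch, assuming a temporary directory and hypothetical file names (`model-v1.joblib`, `model-v1.json`). The metadata fields are examples of what might be recorded, not a required schema.

```python
import json
import tempfile
from pathlib import Path

import joblib
import sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipeline.fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "model-v1.joblib"  # hypothetical file name
    meta_path = Path(tmp_dir) / "model-v1.json"     # hypothetical file name

    # Save the whole pipeline so preprocessing travels with the model.
    joblib.dump(pipeline, model_path)

    # Record metadata next to the artifact, including the library version.
    metadata = {
        "sklearn_version": sklearn.__version__,
        "n_features": X.shape[1],
        "training_rows": X.shape[0],
    }
    meta_path.write_text(json.dumps(metadata, indent=2))

    restored = joblib.load(model_path)
    restored_meta = json.loads(meta_path.read_text())

# The restored pipeline reproduces the original predictions.
same_predictions = bool((restored.predict(X) == pipeline.predict(X)).all())
print(same_predictions, restored_meta)
```

Recording the scikit-learn version matters because saved models are not guaranteed to load correctly under a different version.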

Performance Optimization

Use n_jobs=-1 for parallel processing where available
Consider warm_start=True for iterative training
Use sparse matrices for high-dimensional sparse data
Consider incremental learning with partial_fit() for large data
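
Incremental learning with `partial_fit()` can be sketched as below. The example assumes `SGDClassifier` as the estimator and slices an in-memory array into batches to stand in for data that would really be streamed from disk.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(
    n_samples=5000, n_features=20, n_informative=5,
    n_clusters_per_class=1, random_state=0,
)

# All class labels must be declared up front, since a batch may miss some.
classes = np.unique(y)

model = SGDClassifier(random_state=42)
batch_size = 500

# Feed the data one batch at a time; each call updates the existing weights.
for start in range(0, len(X), batch_size):
    batch = slice(start, start + batch_size)
    model.partial_fit(X[batch], y[batch], classes=classes)

training_accuracy = model.score(X, y)
print(round(training_accuracy, 3))
```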

Key Conventions

Import from submodules: from sklearn.ensemble import RandomForestClassifier
Set random_state for reproducibility
Use pipelines to prevent data leakage
Document model choices and hyperparameters

Repository: mindrally/skills
First Seen: Jan 25, 2026
Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)
Installed on: gemini-cli (84), opencode (82), codex (78), cursor (78), github-copilot (74), claude-code (71)