ml-pipeline by jeffallan/claude-skills
npx skills add https://github.com/jeffallan/claude-skills --skill ml-pipeline
Senior ML pipeline engineer specializing in production-grade machine learning infrastructure, orchestration systems, and automated training workflows.
Load detailed guidance based on context:
| Topic | Reference | Load When |
|---|---|---|
| Feature Engineering | references/feature-engineering.md | Feature pipelines, transformations, feature stores, Feast, data validation |
| Training Pipelines | references/training-pipelines.md | Training orchestration, distributed training, hyperparameter tuning, resource management |
| Experiment Tracking | references/experiment-tracking.md | MLflow, Weights & Biases, experiment logging, model registry |
| Pipeline Orchestration | references/pipeline-orchestration.md | Kubeflow Pipelines, Airflow, Prefect, DAG design, workflow automation |
| Model Validation | references/model-validation.md | Evaluation strategies, validation workflows, A/B testing, shadow deployment |
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score
import numpy as np

# Pin random state for reproducibility
SEED = 42
np.random.seed(SEED)

# Synthetic data so the snippet is self-contained
X, y = make_classification(n_samples=1_000, random_state=SEED)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=SEED)

mlflow.set_experiment("my-classifier-experiment")

with mlflow.start_run():
    # Log all hyperparameters — never hardcode silently
    params = {"n_estimators": 100, "max_depth": 5, "random_state": SEED}
    mlflow.log_params(params)

    model = RandomForestClassifier(**params)
    model.fit(X_train, y_train)
    preds = model.predict(X_test)

    # Log metrics
    mlflow.log_metric("accuracy", accuracy_score(y_test, preds))
    mlflow.log_metric("f1", f1_score(y_test, preds, average="weighted"))

    # Log and register the model artifact
    mlflow.sklearn.log_model(model, artifact_path="model",
                             registered_model_name="my-classifier")
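The `average="weighted"` argument above averages per-class F1 scores weighted by each class's support, which matters when labels are imbalanced. A pure-Python sketch of that computation (illustrative only, not sklearn's implementation; classes are taken from `y_true` for brevity):

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Per-class F1 weighted by class support (sketch of average='weighted')."""
    support = Counter(y_true)
    total = 0.0
    for c in set(y_true):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        total += f1 * support[c] / len(y_true)
    return total
```

A class that dominates the dataset therefore dominates the reported score, which is usually what you want for a single headline metric.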
from kfp.v2 import dsl
from kfp.v2.dsl import component, Input, Output, Dataset, Model, Metrics

@component(base_image="python:3.10",
           packages_to_install=["scikit-learn", "pandas"])
def train_model(
    train_data: Input[Dataset],
    model_output: Output[Model],
    metrics_output: Output[Metrics],
    n_estimators: int = 100,
    max_depth: int = 5,
):
    import pickle

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    df = pd.read_csv(train_data.path)
    X, y = df.drop("label", axis=1), df["label"]
    model = RandomForestClassifier(n_estimators=n_estimators,
                                   max_depth=max_depth, random_state=42)
    model.fit(X, y)
    with open(model_output.path, "wb") as f:
        pickle.dump(model, f)
    metrics_output.log_metric("train_samples", len(df))

@dsl.pipeline(name="training-pipeline")
def training_pipeline(data_path: str, n_estimators: int = 100):
    # The component requires a Dataset input; wrap the raw URI with an importer
    raw_data = dsl.importer(artifact_uri=data_path,
                            artifact_class=Dataset, reimport=False)
    train_step = train_model(train_data=raw_data.output,
                             n_estimators=n_estimators)
    # Chain additional steps (validate, register, deploy) here
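The pipeline function above only declares steps; orchestrators like Kubeflow and Airflow derive the execution order from the data dependencies between them. The idea can be sketched with the standard library's topological sorter (step names here are hypothetical, not from the pipeline above):

```python
from graphlib import TopologicalSorter

# Hypothetical DAG: each step maps to the set of steps it depends on
dag = {
    "validate_data": {"ingest"},
    "train_model": {"validate_data"},
    "evaluate": {"train_model"},
    "register_model": {"evaluate"},
}

# static_order() yields a dependency-respecting execution order;
# steps with no mutual dependency could run in parallel
order = list(TopologicalSorter(dag).static_order())
# → ['ingest', 'validate_data', 'train_model', 'evaluate', 'register_model']
```

Real orchestrators add retries, caching, and resource scheduling on top, but the ordering contract is the same.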
import great_expectations as ge

def validate_training_data(df):
    """Run schema and distribution checks. Raise on failure — never skip."""
    gdf = ge.from_pandas(df)
    checks = [
        gdf.expect_column_values_to_not_be_null("label"),
        gdf.expect_column_values_to_be_between("feature_1", 0, 1),
    ]
    failed = [c for c in checks if not c["success"]]
    if failed:
        raise ValueError(f"Data validation failed: {failed}")
    return df  # safe to proceed to training
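When pulling in Great Expectations is too heavy for a small pipeline, the same fail-fast checks can be approximated in plain Python. This is a hypothetical helper operating on a column-oriented dict rather than a DataFrame, not a GE replacement:

```python
def basic_validation(table, label_col="label", bounded=None):
    """Fail-fast checks mirroring the expectations above.

    `table` is a column-oriented dict, e.g. {"label": [...], "feature_1": [...]}.
    `bounded` maps column name -> (low, high) inclusive range.
    """
    errors = []
    if any(v is None for v in table[label_col]):
        errors.append(f"{label_col} contains nulls")
    for col, (lo, hi) in (bounded or {}).items():
        if any(not (lo <= v <= hi) for v in table[col]):
            errors.append(f"{col} outside [{lo}, {hi}]")
    if errors:
        raise ValueError("Data validation failed: " + "; ".join(errors))
    return table
```

As with the GE version, the point is to raise before training starts, never to log a warning and continue.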
Always:
Never:
When implementing a pipeline, provide:
Technologies: MLflow, Kubeflow Pipelines, Apache Airflow, Prefect, Feast, Weights & Biases, Neptune, DVC, Great Expectations, Ray, Horovod, Kubernetes, Docker, S3/GCS/Azure Blob, model registry patterns, feature store architecture, distributed training, hyperparameter optimization
Weekly Installs: 741
Repository: jeffallan/claude-skills
GitHub Stars: 7.3K
First Seen: Jan 21, 2026
Security Audits: Gen Agent Trust Hub: Warn · Socket: Pass · Snyk: Pass
Installed on:
- opencode: 612
- claude-code: 597
- gemini-cli: 596
- codex: 581
- cursor: 551
- github-copilot: 549