Important prerequisite
Installing AI Skills requires a working network proxy with TUN mode enabled; this directly determines whether installation completes successfully. See the full installation guide.
model-deployment by secondsky/claude-skills
npx skills add https://github.com/secondsky/claude-skills --skill model-deployment
Deploy trained models to production with proper serving and monitoring.
| Method | Use Case | Latency |
|---|---|---|
| REST API | Web services | Medium |
| Batch | Large-scale processing | N/A |
| Streaming | Real-time | Low |
| Edge | On-device | Very low |
```python
from fastapi import FastAPI
from pydantic import BaseModel
import joblib
import numpy as np

app = FastAPI()
model = joblib.load('model.pkl')

class PredictionRequest(BaseModel):
    features: list[float]

class PredictionResponse(BaseModel):
    prediction: float
    probability: float

@app.get('/health')
def health():
    return {'status': 'healthy'}

@app.post('/predict', response_model=PredictionResponse)
def predict(request: PredictionRequest):
    features = np.array(request.features).reshape(1, -1)
    prediction = model.predict(features)[0]
    probability = model.predict_proba(features)[0].max()
    # Cast NumPy scalars to plain floats for Pydantic serialization
    return PredictionResponse(prediction=float(prediction), probability=float(probability))
```
```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY model.pkl .
COPY app.py .
EXPOSE 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```
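The Dockerfile installs from a requirements.txt that is not shown anywhere in the skill; a plausible minimal version, assuming a scikit-learn model served with FastAPI (pin exact versions in practice):

```text
fastapi
uvicorn[standard]
pydantic
numpy
joblib
scikit-learn  # or whichever framework produced model.pkl
```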
```python
from datetime import datetime

class ModelMonitor:
    def __init__(self):
        self.predictions = []
        self.latencies = []

    def log_prediction(self, input_data, prediction, latency):
        self.predictions.append({
            'input': input_data,
            'prediction': prediction,
            'latency': latency,
            'timestamp': datetime.now()
        })

    def detect_drift(self, reference_distribution):
        # Compare current predictions to reference
        pass
```
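The `detect_drift` method above is left as a stub. One common approach (the referenced monitoring guide mentions KS-test drift detection) is a two-sample Kolmogorov-Smirnov test comparing recent predictions against a reference sample. A minimal sketch, assuming SciPy is available and a 0.05 significance level:

```python
import numpy as np
from scipy import stats

def detect_drift(recent, reference, alpha=0.05):
    """Two-sample KS test: True if the recent prediction distribution
    differs significantly from the reference sample."""
    statistic, p_value = stats.ks_2samp(recent, reference)
    return p_value < alpha

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=1000)  # e.g. training-time predictions
shifted = rng.normal(1.5, 1.0, size=500)     # mean shift -> drift

print(detect_drift(reference, reference))  # False: identical samples, p-value = 1.0
print(detect_drift(shifted, reference))    # True: large distribution shift
```

The threshold and window size are operational choices; in production you would run this over a sliding window of `self.predictions` rather than the full history.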
```python
# 1. Save the trained model
import joblib
joblib.dump(model, 'model.pkl')
```

```shell
# 2. Create FastAPI app (see references/fastapi-production-server.md)
#    app.py with /predict and /health endpoints

# 3. Create Dockerfile
cat > Dockerfile << 'EOF'
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py model.pkl ./
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
EOF

# 4. Build and test locally
docker build -t model-api:v1.0.0 .
docker run -p 8000:8000 model-api:v1.0.0

# 5. Push to registry
docker tag model-api:v1.0.0 registry.example.com/model-api:v1.0.0
docker push registry.example.com/model-api:v1.0.0

# 6. Deploy to Kubernetes
kubectl apply -f deployment.yaml
kubectl rollout status deployment/model-api
```
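Step 6 applies a deployment.yaml that the skill never shows in full; a minimal sketch combining the probe and resource settings this document recommends (the name, image tag, and replica count are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: model-api
  template:
    metadata:
      labels:
        app: model-api
    spec:
      containers:
        - name: model-api
          image: registry.example.com/model-api:v1.0.0
          ports:
            - containerPort: 8000
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 30
          readinessProbe:
            httpGet:
              path: /ready
              port: 8000
            initialDelaySeconds: 5
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
```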
Problem: Load balancer sends traffic to unhealthy pods, causing 503 errors.
Solution: Implement both liveness and readiness probes:

```python
# app.py (HTTPException comes from fastapi; model_store is whatever
# object holds the loaded model in your app)
@app.get("/health")  # Liveness: is the service alive?
async def health():
    return {"status": "healthy"}

@app.get("/ready")  # Readiness: can it handle traffic?
async def ready():
    try:
        _ = model_store.model  # Verify the model is loaded
        return {"status": "ready"}
    except Exception:
        raise HTTPException(503, "Not ready")
```

```yaml
# deployment.yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 30
readinessProbe:
  httpGet:
    path: /ready
    port: 8000
  initialDelaySeconds: 5
```
Problem: FileNotFoundError: model.pkl when the container starts.
Solution: Verify the model file is copied in the Dockerfile and the paths match:

```dockerfile
# ❌ Wrong: model lands in a different directory
COPY model.pkl /app/models/   # but the code expects /app/model.pkl

# ✅ Correct: consistent paths
COPY model.pkl /models/model.pkl
ENV MODEL_PATH=/models/model.pkl
```

```python
# In Python, read the same path from the environment:
import os
model_path = os.getenv("MODEL_PATH", "/models/model.pkl")
```
Problem: Invalid inputs crash the API with unhandled exceptions.
Solution: Use Pydantic for automatic validation:

```python
from typing import List

import numpy as np
from pydantic import BaseModel, Field, validator  # Pydantic v1 style; v2 uses field_validator

class PredictionRequest(BaseModel):
    features: List[float] = Field(..., min_items=1, max_items=100)

    @validator('features')
    def validate_finite(cls, v):
        if not all(np.isfinite(val) for val in v):
            raise ValueError("All features must be finite")
        return v

# FastAPI auto-validates and returns 422 for invalid requests
@app.post("/predict")
async def predict(request: PredictionRequest):
    # The request is guaranteed valid here
    pass
```
Problem: Model performance degrades over time, and no one notices until users complain.
Solution: Implement drift detection (see references/model-monitoring-drift.md):

```python
monitor = ModelMonitor(reference_data=training_data, drift_threshold=0.1)

@app.post("/predict")
async def predict(request: PredictionRequest):
    prediction = model.predict(features)
    monitor.log_prediction(features, prediction, latency)
    # Alert if drift is detected
    if monitor.should_retrain():
        alert_manager.send_alert("Model drift detected - retrain recommended")
    return prediction
```
Problem: Pods are killed by the Kubernetes OOMKiller and the service goes down.
Solution: Set memory/CPU requests and limits:

```yaml
resources:
  requests:
    memory: "512Mi"   # guaranteed
    cpu: "500m"
  limits:
    memory: "1Gi"     # maximum allowed
    cpu: "1000m"
```

```shell
# Monitor actual usage:
kubectl top pods
```
Problem: A new model version has bugs and there is no way to revert quickly.
Solution: Tag images with versions and keep the previous deployment:

```shell
# Deploy with a version tag
kubectl set image deployment/model-api model-api=registry/model-api:v1.2.0

# If there are issues, roll back to the previous version
kubectl rollout undo deployment/model-api

# Or pin a specific version
kubectl set image deployment/model-api model-api=registry/model-api:v1.1.0
```
Problem: Processing 10,000 predictions one by one takes hours.
Solution: Implement a batch endpoint:

```python
@app.post("/predict/batch")
async def predict_batch(request: BatchPredictionRequest):
    # Process everything at once (vectorized)
    features = np.array(request.instances)
    predictions = model.predict(features)  # much faster than per-row calls
    return {"predictions": predictions.tolist()}
```
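`BatchPredictionRequest` is referenced above but never defined. A minimal sketch of the request model, with a stand-in model (which simply sums each row) to show the vectorized call end to end:

```python
from typing import List

import numpy as np
from pydantic import BaseModel

class BatchPredictionRequest(BaseModel):
    # Each instance is one feature vector; widths must match the model's input
    instances: List[List[float]]

class DummyModel:
    """Stand-in for the real model: predicts the sum of each row."""
    def predict(self, X):
        return X.sum(axis=1)

model = DummyModel()
request = BatchPredictionRequest(instances=[[1.0, 2.0], [3.0, 4.0]])
features = np.array(request.instances)   # shape (2, 2)
predictions = model.predict(features)    # one vectorized call, not a Python loop
print(predictions.tolist())              # [3.0, 7.0]
```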
Problem: Deploying a model that fails basic tests breaks production.
Solution: Validate in the CI pipeline (see references/cicd-ml-models.md):

```yaml
# .github/workflows/deploy.yml
- name: Validate model performance
  run: |
    python scripts/validate_model.py \
      --model model.pkl \
      --test-data test.csv \
      --min-accuracy 0.85   # fail the step if below threshold
```
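`scripts/validate_model.py` is referenced but not shown; a minimal sketch of its core check — score held-out data and report pass/fail so CI can exit nonzero below the threshold. The CLI wrapper (argparse for `--model`, `--test-data`, `--min-accuracy`) is omitted, and the stand-in model is hypothetical:

```python
import numpy as np

def validate(model, X, y, min_accuracy=0.85):
    """Return (passed, accuracy); CI should exit nonzero when passed is False."""
    accuracy = float(np.mean(model.predict(X) == y))
    return accuracy >= min_accuracy, accuracy

# Demonstration with a stand-in model that always predicts class 0
class MajorityModel:
    def predict(self, X):
        return np.zeros(len(X), dtype=int)

X = np.zeros((10, 3))
y = np.array([0] * 9 + [1])   # the majority model gets 9/10 right
passed, acc = validate(MajorityModel(), X, y, min_accuracy=0.85)
print(passed, acc)            # True 0.9
```

In the real script, `model` would come from `joblib.load(args.model)` and `X, y` from the CSV named by `--test-data`.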
Load reference files for detailed implementations:
FastAPI Production Server: Load references/fastapi-production-server.md for a complete production-ready FastAPI implementation with error handling, validation (Pydantic models), logging, health/readiness probes, batch predictions, model versioning, middleware, exception handlers, and performance optimizations (caching, async)
Model Monitoring & Drift: Load references/model-monitoring-drift.md for the ModelMonitor implementation with KS-test drift detection, Jensen-Shannon divergence, Prometheus metrics integration, alert configuration (Slack, email), a continuous monitoring service, and dashboard endpoints
Containerization & Deployment: Load references/containerization-deployment.md for multi-stage Dockerfiles, model versioning in containers, Docker Compose setup, A/B testing with Nginx, Kubernetes deployments (rolling update, blue-green, canary), GitHub Actions CI/CD, and deployment checklists
CI/CD for ML Models: Load references/cicd-ml-models.md for a complete GitHub Actions pipeline with model validation, data validation, automated testing, security scanning, performance benchmarks, automated rollback, and deployment strategies
Weekly Installs: 67
GitHub Stars: 93
First Seen: Jan 25, 2026
Security Audits: Gen Agent Trust Hub: Warn; Socket: Pass; Snyk: Pass
Installed on: claude-code (62), gemini-cli (54), codex (53), cursor (53), opencode (53), github-copilot (51)