Important prerequisite
Installing AI Skills requires a working network proxy with TUN mode enabled; this directly determines whether installation completes successfully. See the full installation guide.
model-deployment by secondsky/claude-skills
npx skills add https://github.com/secondsky/claude-skills --skill model-deployment
Deploy trained models to production with proper serving and monitoring.
| Method | Use Case | Latency |
|---|---|---|
| REST API | Web services | Medium |
| Batch | Large-scale processing | N/A |
| Streaming | Real-time | Low |
| Edge | On-device | Very low |
```python
from fastapi import FastAPI
from pydantic import BaseModel
import joblib
import numpy as np

app = FastAPI()
model = joblib.load('model.pkl')

class PredictionRequest(BaseModel):
    features: list[float]

class PredictionResponse(BaseModel):
    prediction: float
    probability: float

@app.get('/health')
def health():
    return {'status': 'healthy'}

@app.post('/predict', response_model=PredictionResponse)
def predict(request: PredictionRequest):
    features = np.array(request.features).reshape(1, -1)
    prediction = model.predict(features)[0]
    probability = model.predict_proba(features)[0].max()
    # Cast NumPy scalars to plain floats for Pydantic serialization
    return PredictionResponse(prediction=float(prediction), probability=float(probability))
```
```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY model.pkl .
COPY app.py .
EXPOSE 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```
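The Dockerfile installs from a requirements.txt that is not shown anywhere in the skill; a plausible minimal version, assuming a scikit-learn model served with FastAPI (pin exact versions in practice):

```text
fastapi
uvicorn[standard]
pydantic
numpy
joblib
scikit-learn  # or whichever framework produced model.pkl
```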
```python
from datetime import datetime

class ModelMonitor:
    def __init__(self):
        self.predictions = []
        self.latencies = []

    def log_prediction(self, input_data, prediction, latency):
        self.predictions.append({
            'input': input_data,
            'prediction': prediction,
            'latency': latency,
            'timestamp': datetime.now()
        })

    def detect_drift(self, reference_distribution):
        # Compare current predictions to reference
        pass
```
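The `detect_drift` method above is left as a stub. One common approach (the referenced monitoring guide mentions KS-test drift detection) is a two-sample Kolmogorov-Smirnov test comparing recent predictions against a reference sample. A minimal sketch, assuming SciPy is available and a 0.05 significance level:

```python
import numpy as np
from scipy import stats

def detect_drift(recent, reference, alpha=0.05):
    """Two-sample KS test: True if the recent prediction distribution
    differs significantly from the reference sample."""
    statistic, p_value = stats.ks_2samp(recent, reference)
    return p_value < alpha

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=1000)  # e.g. training-time predictions
shifted = rng.normal(1.5, 1.0, size=500)     # mean shift -> drift

print(detect_drift(reference, reference))  # False: identical samples, p-value = 1.0
print(detect_drift(shifted, reference))    # True: large distribution shift
```

The threshold and window size are operational choices; in production you would run this over a sliding window of `self.predictions` rather than the full history.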
```python
# 1. Save the trained model
import joblib
joblib.dump(model, 'model.pkl')
```

```shell
# 2. Create FastAPI app (see references/fastapi-production-server.md)
#    app.py with /predict and /health endpoints

# 3. Create Dockerfile
cat > Dockerfile << 'EOF'
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py model.pkl ./
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
EOF

# 4. Build and test locally
docker build -t model-api:v1.0.0 .
docker run -p 8000:8000 model-api:v1.0.0

# 5. Push to registry
docker tag model-api:v1.0.0 registry.example.com/model-api:v1.0.0
docker push registry.example.com/model-api:v1.0.0

# 6. Deploy to Kubernetes
kubectl apply -f deployment.yaml
kubectl rollout status deployment/model-api
```
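Step 6 applies a deployment.yaml that the skill never shows in full; a minimal sketch combining the probe and resource settings this document recommends (the name, image tag, and replica count are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: model-api
  template:
    metadata:
      labels:
        app: model-api
    spec:
      containers:
        - name: model-api
          image: registry.example.com/model-api:v1.0.0
          ports:
            - containerPort: 8000
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 30
          readinessProbe:
            httpGet:
              path: /ready
              port: 8000
            initialDelaySeconds: 5
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
```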
Problem: Load balancer sends traffic to unhealthy pods, causing 503 errors.
Solution: Implement both liveness and readiness probes:

```python
# app.py (HTTPException comes from fastapi; model_store is whatever
# object holds the loaded model in your app)
@app.get("/health")  # Liveness: is the service alive?
async def health():
    return {"status": "healthy"}

@app.get("/ready")  # Readiness: can it handle traffic?
async def ready():
    try:
        _ = model_store.model  # Verify the model is loaded
        return {"status": "ready"}
    except Exception:
        raise HTTPException(503, "Not ready")
```

```yaml
# deployment.yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 30
readinessProbe:
  httpGet:
    path: /ready
    port: 8000
  initialDelaySeconds: 5
```
Problem: FileNotFoundError: model.pkl when the container starts.
Solution: Verify the model file is copied in the Dockerfile and the paths match:

```dockerfile
# ❌ Wrong: model lands in a different directory
COPY model.pkl /app/models/   # but the code expects /app/model.pkl

# ✅ Correct: consistent paths
COPY model.pkl /models/model.pkl
ENV MODEL_PATH=/models/model.pkl
```

```python
# In Python, read the same path from the environment:
import os
model_path = os.getenv("MODEL_PATH", "/models/model.pkl")
```
Problem: Invalid inputs crash the API with unhandled exceptions.
Solution: Use Pydantic for automatic validation:

```python
from typing import List

import numpy as np
from pydantic import BaseModel, Field, validator  # Pydantic v1 style; v2 uses field_validator

class PredictionRequest(BaseModel):
    features: List[float] = Field(..., min_items=1, max_items=100)

    @validator('features')
    def validate_finite(cls, v):
        if not all(np.isfinite(val) for val in v):
            raise ValueError("All features must be finite")
        return v

# FastAPI auto-validates and returns 422 for invalid requests
@app.post("/predict")
async def predict(request: PredictionRequest):
    # The request is guaranteed valid here
    pass
```
Problem: Model performance degrades over time, and no one notices until users complain.
Solution: Implement drift detection (see references/model-monitoring-drift.md):

```python
monitor = ModelMonitor(reference_data=training_data, drift_threshold=0.1)

@app.post("/predict")
async def predict(request: PredictionRequest):
    prediction = model.predict(features)
    monitor.log_prediction(features, prediction, latency)
    # Alert if drift is detected
    if monitor.should_retrain():
        alert_manager.send_alert("Model drift detected - retrain recommended")
    return prediction
```
Problem: Pods are killed by the Kubernetes OOMKiller and the service goes down.
Solution: Set memory/CPU requests and limits:

```yaml
resources:
  requests:
    memory: "512Mi"   # guaranteed
    cpu: "500m"
  limits:
    memory: "1Gi"     # maximum allowed
    cpu: "1000m"
```

```shell
# Monitor actual usage:
kubectl top pods
```
Problem: A new model version has bugs and there is no way to revert quickly.
Solution: Tag images with versions and keep the previous deployment:

```shell
# Deploy with a version tag
kubectl set image deployment/model-api model-api=registry/model-api:v1.2.0

# If there are issues, roll back to the previous version
kubectl rollout undo deployment/model-api

# Or pin a specific version
kubectl set image deployment/model-api model-api=registry/model-api:v1.1.0
```
Problem: Processing 10,000 predictions one by one takes hours.
Solution: Implement a batch endpoint:

```python
@app.post("/predict/batch")
async def predict_batch(request: BatchPredictionRequest):
    # Process everything at once (vectorized)
    features = np.array(request.instances)
    predictions = model.predict(features)  # much faster than per-row calls
    return {"predictions": predictions.tolist()}
```
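`BatchPredictionRequest` is referenced above but never defined. A minimal sketch of the request model, with a stand-in model (which simply sums each row) to show the vectorized call end to end:

```python
from typing import List

import numpy as np
from pydantic import BaseModel

class BatchPredictionRequest(BaseModel):
    # Each instance is one feature vector; widths must match the model's input
    instances: List[List[float]]

class DummyModel:
    """Stand-in for the real model: predicts the sum of each row."""
    def predict(self, X):
        return X.sum(axis=1)

model = DummyModel()
request = BatchPredictionRequest(instances=[[1.0, 2.0], [3.0, 4.0]])
features = np.array(request.instances)   # shape (2, 2)
predictions = model.predict(features)    # one vectorized call, not a Python loop
print(predictions.tolist())              # [3.0, 7.0]
```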
Problem: Deploying a model that fails basic tests breaks production.
Solution: Validate in the CI pipeline (see references/cicd-ml-models.md):

```yaml
# .github/workflows/deploy.yml
- name: Validate model performance
  run: |
    python scripts/validate_model.py \
      --model model.pkl \
      --test-data test.csv \
      --min-accuracy 0.85   # fail the step if below threshold
```
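`scripts/validate_model.py` is referenced but not shown; a minimal sketch of its core check — score held-out data and report pass/fail so CI can exit nonzero below the threshold. The CLI wrapper (argparse for `--model`, `--test-data`, `--min-accuracy`) is omitted, and the stand-in model is hypothetical:

```python
import numpy as np

def validate(model, X, y, min_accuracy=0.85):
    """Return (passed, accuracy); CI should exit nonzero when passed is False."""
    accuracy = float(np.mean(model.predict(X) == y))
    return accuracy >= min_accuracy, accuracy

# Demonstration with a stand-in model that always predicts class 0
class MajorityModel:
    def predict(self, X):
        return np.zeros(len(X), dtype=int)

X = np.zeros((10, 3))
y = np.array([0] * 9 + [1])   # the majority model gets 9/10 right
passed, acc = validate(MajorityModel(), X, y, min_accuracy=0.85)
print(passed, acc)            # True 0.9
```

In the real script, `model` would come from `joblib.load(args.model)` and `X, y` from the CSV named by `--test-data`.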
Load reference files for detailed implementations:
FastAPI Production Server: Load references/fastapi-production-server.md for a complete production-ready FastAPI implementation with error handling, validation (Pydantic models), logging, health/readiness probes, batch predictions, model versioning, middleware, exception handlers, and performance optimizations (caching, async)
Model Monitoring & Drift: Load references/model-monitoring-drift.md for the ModelMonitor implementation with KS-test drift detection, Jensen-Shannon divergence, Prometheus metrics integration, alert configuration (Slack, email), a continuous monitoring service, and dashboard endpoints
Containerization & Deployment: Load references/containerization-deployment.md for multi-stage Dockerfiles, model versioning in containers, Docker Compose setup, A/B testing with Nginx, Kubernetes deployments (rolling update, blue-green, canary), GitHub Actions CI/CD, and deployment checklists
CI/CD for ML Models: Load references/cicd-ml-models.md for a complete GitHub Actions pipeline with model validation, data validation, automated testing, security scanning, performance benchmarks, automated rollback, and deployment strategies
Weekly Installs: 67
GitHub Stars: 93
First Seen: Jan 25, 2026
Security Audits: Gen Agent Trust Hub: Warn; Socket: Pass; Snyk: Pass
Installed on: claude-code (62), gemini-cli (54), codex (53), cursor (53), opencode (53), github-copilot (51)