senior-ml-engineer by alirezarezvani/claude-skills
```shell
npx skills add https://github.com/alirezarezvani/claude-skills --skill senior-ml-engineer
```
Production ML engineering patterns for model deployment, MLOps infrastructure, and LLM integration.
Deploy a trained model to production with monitoring:
```dockerfile
FROM python:3.11-slim

# curl is not included in the slim base image, but the HEALTHCHECK below needs it
RUN apt-get update && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY model/ /app/model/
COPY src/ /app/src/

HEALTHCHECK --interval=30s --timeout=5s \
    CMD curl -f http://localhost:8080/health || exit 1
EXPOSE 8080
```
| Option | Latency | Throughput | Use Case |
|---|---|---|---|
| FastAPI + Uvicorn | Low | Medium | REST APIs, small models |
| Triton Inference Server | Very Low | Very High | GPU inference, batching |
| TensorFlow Serving | Low | High | TensorFlow models |
| TorchServe | Low | High | PyTorch models |
| Ray Serve | Medium | High | Complex pipelines, multi-model |
Establish automated training and deployment:
```python
from datetime import timedelta

from feast import Entity, Feature, FeatureView, FileSource, ValueType

# Entity: the join key used to look up features at serving time
user = Entity(name="user_id", value_type=ValueType.INT64)

user_features = FeatureView(
    name="user_features",
    entities=["user_id"],
    ttl=timedelta(days=1),  # features older than a day are considered stale
    features=[
        Feature(name="purchase_count_30d", dtype=ValueType.INT64),
        Feature(name="avg_order_value", dtype=ValueType.FLOAT),
    ],
    online=True,  # materialize to the online store for low-latency serving
    source=FileSource(path="data/user_features.parquet"),
)
```
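A FeatureView alone is not deployable; Feast also reads a `feature_store.yaml` describing the registry and online store. A minimal sketch, assuming a local registry and a local Redis instance (the project name and connection string are illustrative, not from the source):

```yaml
project: ml_platform        # hypothetical project name
registry: data/registry.db  # local file-based registry
provider: local
online_store:
  type: redis
  connection_string: "localhost:6379"  # assumed local Redis for low-latency lookups
```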
| Trigger | Detection | Action |
|---|---|---|
| Scheduled | Cron (weekly/monthly) | Full retrain |
| Performance drop | Accuracy < threshold | Immediate retrain |
| Data drift | PSI > 0.2 | Evaluate, then retrain |
| New data volume | X new samples | Incremental update |
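The PSI threshold in the drift trigger above can be computed by bucketing the reference and current distributions and comparing bin frequencies. A stdlib-only sketch (function name and binning scheme are our own; equal-width bins are one common choice):

```python
import math

def population_stability_index(expected, actual, bins=10):
    """PSI between two samples over equal-width bins.

    Rule of thumb: < 0.1 stable; 0.1-0.2 moderate shift;
    > 0.2 significant drift (the retraining trigger above).
    """
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # guard against all-equal values
    eps = 1e-6  # avoid log(0) for empty bins

    def bin_fractions(values):
        counts = [0] * bins
        for v in values:
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]

    e, a = bin_fractions(expected), bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

In production you would typically fix the bin edges from the training data and reuse them, so that week-over-week PSI values are comparable.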
Integrate LLM APIs into production applications:
```python
from abc import ABC, abstractmethod

from tenacity import retry, stop_after_attempt, wait_exponential

class LLMProvider(ABC):
    """Common interface so providers can be swapped without touching call sites."""

    @abstractmethod
    def complete(self, prompt: str, **kwargs) -> str:
        ...

@retry(stop=stop_after_attempt(3), wait=wait_exponential(min=1, max=10))
def call_llm_with_retry(provider: LLMProvider, prompt: str) -> str:
    return provider.complete(prompt)
```
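To make the retry behavior concrete without a live API, here is a stdlib-only sketch of what the tenacity decorator does, exercised against a hypothetical flaky provider (both the error type and the provider are illustrative, not part of any real SDK):

```python
import time

class TransientAPIError(Exception):
    """Stands in for a rate-limit or timeout error from an LLM API."""

def call_with_retry(fn, attempts=3, base=1.0, cap=10.0, sleep=time.sleep):
    """Hand-rolled equivalent of the tenacity decorator above:
    up to `attempts` tries, exponential backoff capped at `cap` seconds."""
    for attempt in range(attempts):
        try:
            return fn()
        except TransientAPIError:
            if attempt == attempts - 1:
                raise  # out of retries; surface the error to the caller
            sleep(min(base * 2 ** attempt, cap))

class FlakyProvider:
    """Hypothetical provider that fails twice, then succeeds."""
    def __init__(self):
        self.calls = 0

    def complete(self, prompt: str) -> str:
        self.calls += 1
        if self.calls < 3:
            raise TransientAPIError("rate limited")
        return f"answer to: {prompt}"
```

Injecting `sleep` keeps the function testable; in production you would leave the default and likely add jitter to avoid thundering-herd retries.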
| Provider | Input Cost | Output Cost |
|---|---|---|
| GPT-4 | $0.03/1K | $0.06/1K |
| GPT-3.5 | $0.0005/1K | $0.0015/1K |
| Claude 3 Opus | $0.015/1K | $0.075/1K |
| Claude 3 Haiku | $0.00025/1K | $0.00125/1K |
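The table above translates directly into a per-request cost estimator; a small sketch using those published per-1K-token prices (model keys are our own naming):

```python
# Prices per 1K tokens in USD, taken from the table above: (input, output).
PRICES = {
    "gpt-4": (0.03, 0.06),
    "gpt-3.5": (0.0005, 0.0015),
    "claude-3-opus": (0.015, 0.075),
    "claude-3-haiku": (0.00025, 0.00125),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Rough cost in USD for one request against the models listed above."""
    in_price, out_price = PRICES[model]
    return (input_tokens / 1000) * in_price + (output_tokens / 1000) * out_price
```

Note the asymmetry: output tokens cost 2-5x input tokens, so prompt strategies that shorten completions usually save more than trimming prompts.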
Build a retrieval-augmented generation (RAG) pipeline:
| Database | Hosting | Scale | Latency | Best For |
|---|---|---|---|---|
| Pinecone | Managed | High | Low | Production, managed |
| Qdrant | Both | High | Very Low | Performance-critical |
| Weaviate | Both | High | Low | Hybrid search |
| Chroma | Self-hosted | Medium | Low | Prototyping |
| pgvector | Self-hosted | Medium | Medium | Existing Postgres |
| Strategy | Chunk Size | Overlap | Best For |
|---|---|---|---|
| Fixed | 500-1000 tokens | 50-100 | General text |
| Sentence | 3-5 sentences | 1 sentence | Structured text |
| Semantic | Variable | Based on meaning | Research papers |
| Recursive | Hierarchical | Parent-child | Long documents |
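The fixed-size strategy from the first row is the simplest to implement: slide a window of `size` tokens, overlapping the previous chunk so context is not cut mid-thought. A sketch (function name and defaults are ours, matching the table's 500-token / 50-token row):

```python
def chunk_tokens(tokens, size=500, overlap=50):
    """Fixed-size chunking: windows of `size` tokens, each sharing
    `overlap` tokens with the previous window."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # last window reached the end of the document
    return chunks
```

In practice `tokens` comes from the same tokenizer as your embedding model, so chunk sizes line up with the model's context limits.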
Monitor production models for drift and degradation:
```python
from scipy.stats import ks_2samp

def detect_drift(reference, current, threshold=0.05):
    """Two-sample Kolmogorov-Smirnov test: a small p-value means the
    current feature distribution differs from the training reference."""
    statistic, p_value = ks_2samp(reference, current)
    return {
        "drift_detected": p_value < threshold,
        "ks_statistic": statistic,
        "p_value": p_value,
    }
```
| Metric | Warning | Critical |
|---|---|---|
| p95 latency | > 100ms | > 200ms |
| Error rate | > 0.1% | > 1% |
| PSI (drift) | > 0.1 | > 0.2 |
| Accuracy drop | > 2% | > 5% |
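The alert thresholds above map naturally onto a small classifier that monitoring code can call per metric; a sketch (metric keys and function name are our own):

```python
# (warning, critical) thresholds per metric, taken from the table above.
THRESHOLDS = {
    "p95_latency_ms": (100, 200),
    "error_rate": (0.001, 0.01),      # 0.1% / 1%
    "psi": (0.1, 0.2),
    "accuracy_drop": (0.02, 0.05),    # 2% / 5%
}

def alert_level(metric: str, value: float) -> str:
    """Classify a metric reading as 'ok', 'warning', or 'critical'."""
    warn, crit = THRESHOLDS[metric]
    if value > crit:
        return "critical"
    if value > warn:
        return "warning"
    return "ok"
```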
Reference files:
- references/mlops_production_patterns.md
- references/llm_integration_guide.md
- references/rag_system_architecture.md
```shell
python scripts/model_deployment_pipeline.py --model model.pkl --target staging
```
Generates deployment artifacts: Dockerfile, Kubernetes manifests, health checks.

```shell
python scripts/rag_system_builder.py --config rag_config.yaml --analyze
```
Scaffolds a RAG pipeline with vector store integration and retrieval logic.

```shell
python scripts/ml_monitoring_suite.py --config monitoring.yaml --deploy
```
Sets up drift detection, alerting, and performance dashboards.
| Category | Tools |
|---|---|
| ML Frameworks | PyTorch, TensorFlow, Scikit-learn, XGBoost |
| LLM Frameworks | LangChain, LlamaIndex, DSPy |
| MLOps | MLflow, Weights & Biases, Kubeflow |
| Data | Spark, Airflow, dbt, Kafka |
| Deployment | Docker, Kubernetes, Triton |
| Databases | PostgreSQL, BigQuery, Pinecone, Redis |
- Weekly installs: 182
- GitHub stars: 4.3K
- First seen: Jan 20, 2026
- Security audits: Gen Agent Trust Hub: Pass; Socket: Pass; Snyk: Pass
- Installed on: claude-code (160), opencode (138), gemini-cli (135), codex (127), cursor (116), github-copilot (112)