monitoring-observability by yonatangross/orchestkit
npx skills add https://github.com/yonatangross/orchestkit --skill monitoring-observability
Comprehensive patterns for infrastructure monitoring, LLM observability, and quality drift detection. Each category has individual rule files in rules/ loaded on-demand.
| Category | Rules | Impact | When to Use |
|---|---|---|---|
| Infrastructure Monitoring | 3 | CRITICAL | Prometheus metrics, Grafana dashboards, alerting rules |
| LLM Observability | 3 | HIGH | Langfuse tracing, cost tracking, evaluation scoring |
| Drift Detection | 3 | HIGH | Statistical drift, quality regression, drift alerting |
| Silent Failures | 3 | HIGH | Tool skipping, quality degradation, loop/token spike alerting |
Total: 12 rules across 4 categories

    # Prometheus metrics with RED method
    from prometheus_client import Counter, Histogram

    http_requests = Counter('http_requests_total', 'Total requests', ['method', 'endpoint', 'status'])
    http_duration = Histogram('http_request_duration_seconds', 'Request latency',
                              buckets=[0.01, 0.05, 0.1, 0.5, 1, 2, 5])

    # Langfuse v4 LLM tracing: semantic as_type + inline scoring
    from langfuse import observe, get_client

    @observe(as_type="generation", name="analyze_content")
    async def analyze_content(content: str):
        # Attach user, session, and tags to the current trace
        get_client().update_current_trace(
            user_id="user_123", session_id="session_abc",
            tags=["production", "orchestkit"],
        )
        result = await llm.generate(content)  # llm: your model client, not defined in this snippet
        # Score the current span inline
        get_client().score_current_span(name="response_quality", value=0.85)
        return result

    # PSI drift detection
    # calculate_psi and alert are helpers that are not defined in this snippet
    import numpy as np

    psi_score = calculate_psi(baseline_scores, current_scores)
    if psi_score >= 0.25:
        alert("Significant quality drift detected!")

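The PSI snippet above calls `calculate_psi` without defining it. The following is a minimal stdlib sketch of one common Population Stability Index formulation, not the implementation shipped in the skill's rule files. It assumes scores lie in [0, 1]; the ten equal-width bins and the `eps` smoothing for empty bins are illustrative choices.

```python
import math
from typing import Sequence


def calculate_psi(baseline: Sequence[float], current: Sequence[float],
                  bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between two samples of scores in [0, 1]."""
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")

    def proportions(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            # Clamp into [0, bins - 1] so 1.0 and out-of-range values still land in a bin.
            idx = max(0, min(int(v * bins), bins - 1))
            counts[idx] += 1
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    expected = proportions(baseline)
    actual = proportions(current)
    # PSI = sum over bins of (actual - expected) * ln(actual / expected)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))
```

With this definition, identical distributions give a PSI of zero, and the `0.25` cutoff used above corresponds to the commonly cited "significant shift" band.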
Prometheus metrics, Grafana dashboards, and alerting for application health.
| Rule | File | Key Pattern |
|---|---|---|
| Prometheus Metrics | rules/monitoring-prometheus.md | RED method, counters, histograms, cardinality |
| Grafana Dashboards | rules/monitoring-grafana.md | Golden Signals, SLO/SLI, health checks |
| Alerting Rules | rules/monitoring-alerting.md | Severity levels, grouping, escalation, fatigue prevention |
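To make the RED method concrete, here is a small stdlib sketch that derives Rate, Errors, and Duration from one window of request records. The `RequestRecord` type, the "5xx counts as an error" convention, and the nearest-rank p95 are assumptions made for illustration; in a real deployment these values would normally be computed by queries over the Prometheus counter and histogram rather than in application code.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RequestRecord:  # hypothetical record type, for illustration only
    status: int
    duration_s: float


def red_summary(records: Sequence[RequestRecord], window_s: float) -> dict:
    """Return request rate (req/s), error ratio, and p95 duration for one window."""
    if not records or window_s <= 0:
        return {"rate": 0.0, "error_ratio": 0.0, "p95_duration_s": 0.0}
    durations = sorted(r.duration_s for r in records)
    # Nearest-rank percentile: smallest sample with at least 95% of samples at or below it.
    rank = max(1, math.ceil(95 * len(durations) / 100))
    errors = sum(1 for r in records if r.status >= 500)
    return {
        "rate": len(records) / window_s,
        "error_ratio": errors / len(records),
        "p95_duration_s": durations[rank - 1],
    }
```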
Langfuse-based tracing, cost tracking, and evaluation for LLM applications.
| Rule | File | Key Pattern |
|---|---|---|
| Langfuse Traces | rules/llm-langfuse-traces.md | @observe decorator, OTEL spans, agent graphs |
| Cost Tracking | rules/llm-cost-tracking.md | Token usage, spend alerts, Metrics API v2 |
| Eval Scoring | rules/llm-eval-scoring.md | Custom scores, evaluator tracing, quality monitoring |
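The arithmetic behind token-based cost tracking and a spend alert can be sketched as follows. This is independent of any Langfuse API: the `Pricing` type, the per-million-token prices, and the daily budget are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Pricing:  # hypothetical price sheet, USD per one million tokens
    input_per_1m: float
    output_per_1m: float


def call_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Cost in USD of one LLM call given its token usage."""
    return (input_tokens * pricing.input_per_1m
            + output_tokens * pricing.output_per_1m) / 1_000_000


def over_budget(costs: Iterable[float], daily_budget_usd: float) -> bool:
    """True when the summed cost of today's calls exceeds the budget."""
    return sum(costs) > daily_budget_usd
```

A spend alert then reduces to calling `over_budget` on the costs accumulated for the current day and notifying when it returns `True`.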
Statistical and quality drift detection for production LLM systems.
| Rule | File | Key Pattern |
|---|---|---|
| Statistical Drift | rules/drift-statistical.md | PSI, KS test, KL divergence, EWMA |
| Quality Drift | rules/drift-quality.md | Score regression, baseline comparison, canary prompts |
| Drift Alerting | rules/drift-alerting.md | Dynamic thresholds, correlation, anti-patterns |
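One way to read "dynamic thresholds" is to alert when the current drift score exceeds a high percentile of its own recent history instead of a fixed cutoff. The sketch below assumes a nearest-rank 95th percentile and a `floor` that keeps a very quiet history from producing an overly sensitive threshold; both function names and the defaults are illustrative, not taken from `rules/drift-alerting.md`.

```python
import math
from typing import Sequence


def dynamic_threshold(history: Sequence[float], percentile: float = 95.0,
                      floor: float = 0.0) -> float:
    """Nearest-rank percentile of past scores, never lower than `floor`."""
    if not history:
        return floor
    ordered = sorted(history)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return max(ordered[rank - 1], floor)


def is_drift(current: float, history: Sequence[float],
             percentile: float = 95.0, floor: float = 0.1) -> bool:
    """True when the current score exceeds the history-derived threshold."""
    return current > dynamic_threshold(history, percentile, floor)
```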
Detection and alerting for silent failures in LLM agents.
| Rule | File | Key Pattern |
|---|---|---|
| Tool Skipping | rules/silent-tool-skipping.md | Expected vs actual tool calls, Langfuse traces |
| Quality Degradation | rules/silent-degraded-quality.md | Heuristics + LLM-as-judge, z-score baselines |
| Silent Alerting | rules/silent-alerting.md | Loop detection, token spikes, escalation workflow |
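The three silent-failure checks named in the table can each be sketched in a few lines. These are simplified stand-ins under stated assumptions: tool calls are plain name strings already extracted from a trace, a "loop" means more than `max_repeats` consecutive identical calls, and a token spike is a z-score above 3 against recent history. The function names and thresholds are hypothetical.

```python
import statistics
from typing import Iterable, Sequence, Set


def skipped_tools(expected: Iterable[str], actual: Iterable[str]) -> Set[str]:
    """Tools the agent was expected to call but never did."""
    return set(expected) - set(actual)


def is_looping(tool_calls: Sequence[str], max_repeats: int = 3) -> bool:
    """True when the same tool is called more than `max_repeats` times in a row."""
    run = 1
    for prev, cur in zip(tool_calls, tool_calls[1:]):
        run = run + 1 if cur == prev else 1
        if run > max_repeats:
            return True
    return False


def token_spike(history: Sequence[int], current: int,
                z_threshold: float = 3.0) -> bool:
    """True when `current` token usage is a z-score outlier versus `history`."""
    if len(history) < 2:
        return False  # not enough data for a baseline
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return current > mean
    return (current - mean) / stdev > z_threshold
```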
| Decision | Recommendation | Rationale |
|---|---|---|
| Metric methodology | RED method (Rate, Errors, Duration) | Industry standard, covers essential service health |
| Log format | Structured JSON | Machine-parseable, supports log aggregation |
| Tracing | OpenTelemetry | Vendor-neutral, auto-instrumentation, broad ecosystem |
| LLM observability | Langfuse (not LangSmith) | Open-source, self-hostable, built-in prompt management |
| LLM tracing API | @observe(as_type=...) + score_current_span() | v4: semantic types, inline scoring, span filtering |
| Langfuse APIs | Observations API v2 + Metrics API v2 | v4 (Mar 2026): faster querying, aggregations at scale |
| Drift method | PSI for production, KS for small samples | PSI is stable for large datasets, KS is more sensitive |
| Threshold strategy | Dynamic (95th percentile) over static | Reduces alert fatigue, context-aware |
| Alert severity | 4 levels (Critical, High, Medium, Low) | Clear escalation paths, appropriate response times |

| Resource | Description |
|---|---|
| ${CLAUDE_SKILL_DIR}/references/ | Logging, metrics, tracing, Langfuse, drift analysis guides |
| ${CLAUDE_SKILL_DIR}/checklists/ | Implementation checklists for monitoring and Langfuse setup |
| ${CLAUDE_SKILL_DIR}/examples/ | Real-world monitoring dashboard and trace examples |
| ${CLAUDE_SKILL_DIR}/scripts/ | Templates: Prometheus, OpenTelemetry, health checks, Langfuse |

- defense-in-depth - Layer 8 observability as part of security architecture
- devops-deployment - Observability integration with CI/CD and Kubernetes
- resilience-patterns - Monitoring circuit breakers and failure scenarios
- llm-evaluation - Evaluation patterns that integrate with Langfuse scoring
- caching - Caching strategies that reduce costs tracked by Langfuse

| Listing detail | Value |
|---|---|
| Weekly installs | 110 |
| Repository | yonatangross/orchestkit |
| GitHub stars | 132 |
| First seen | Feb 14, 2026 |
| Security audits | Gen Agent Trust Hub: Pass, Socket: Pass, Snyk: Warn |
| Installed on | gemini-cli (105), codex (105), github-copilot (105), opencode (105), cursor (103), kimi-cli (99) |