Monitoring and Observability: A Comprehensive Guide to Infrastructure, LLM Applications, and Quality Drift Detection

monitoring-observability by yonatangross/orchestkit

110 weekly installs

132 GitHub Stars


Installation command

npx skills add https://github.com/yonatangross/orchestkit --skill monitoring-observability

AI/ML observability and monitoring


Monitoring & Observability

Comprehensive patterns for infrastructure monitoring, LLM observability, quality drift detection, and silent-failure detection in LLM agents. Each category has its own rule files under `rules/`, loaded on demand.

Quick Reference

Category	Rules	Impact	When to Use
Infrastructure Monitoring	3	CRITICAL	Prometheus metrics, Grafana dashboards, alerting rules
LLM Observability	3	HIGH	Langfuse tracing, cost tracking, evaluation scoring
Drift Detection	3	HIGH	Statistical drift, quality regression, drift alerting
Silent Failures	3	HIGH	Tool skipping, quality degradation, loop/token spike alerting

Total: 12 rules across 4 categories

Quick Start

```python
# Prometheus metrics with RED method (Rate, Errors, Duration)
from prometheus_client import Counter, Histogram

http_requests = Counter('http_requests_total', 'Total requests', ['method', 'endpoint', 'status'])
http_duration = Histogram('http_request_duration_seconds', 'Request latency',
    buckets=[0.01, 0.05, 0.1, 0.5, 1, 2, 5])
```

```python
# Langfuse v4 LLM tracing: semantic as_type + inline scoring
from langfuse import observe, get_client

@observe(as_type="generation", name="analyze_content")
async def analyze_content(content: str):
    get_client().update_current_trace(
        user_id="user_123", session_id="session_abc",
        tags=["production", "orchestkit"],
    )
    result = await llm.generate(content)  # `llm`: placeholder for your async model client
    get_client().score_current_span(name="response_quality", value=0.85)
    return result
```

```python
# PSI drift detection
# calculate_psi(), alert(), baseline_scores and current_scores are placeholders: a PSI helper,
# your alerting hook, and score samples from a baseline window and the current window.
psi_score = calculate_psi(baseline_scores, current_scores)
if psi_score >= 0.25:  # common rule of thumb: PSI >= 0.25 signals a significant shift
    alert("Significant quality drift detected!")
```

Infrastructure Monitoring

Prometheus metrics, Grafana dashboards, and alerting for application health.

Rule	File	Key Pattern
Prometheus Metrics	`rules/monitoring-prometheus.md`	RED method, counters, histograms, cardinality
Grafana Dashboards	`rules/monitoring-grafana.md`	Golden Signals, SLO/SLI, health checks
Alerting Rules	`rules/monitoring-alerting.md`	Severity levels, grouping, escalation, fatigue prevention
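The RED method can be wired up with the same `prometheus_client` calls used in the Quick Start. This is a minimal sketch, assuming a synchronous handler: `instrumented` is a hypothetical helper, the `method`/`endpoint` labels on the histogram are an added choice the Quick Start does not make, and a private `CollectorRegistry` keeps the sketch from touching the global default registry.

```python
import time

from prometheus_client import CollectorRegistry, Counter, Histogram

registry = CollectorRegistry()  # isolated registry for this sketch

REQUESTS = Counter('http_requests_total', 'Total requests',
                   ['method', 'endpoint', 'status'], registry=registry)
DURATION = Histogram('http_request_duration_seconds', 'Request latency',
                     ['method', 'endpoint'],
                     buckets=[0.01, 0.05, 0.1, 0.5, 1, 2, 5], registry=registry)


def instrumented(method, endpoint, handler):
    """Run handler() and record Rate (count), Errors (status label) and Duration."""
    start = time.perf_counter()
    status = '500'  # assume failure until the handler returns normally
    try:
        result = handler()
        status = '200'
        return result
    finally:
        DURATION.labels(method=method, endpoint=endpoint).observe(time.perf_counter() - start)
        REQUESTS.labels(method=method, endpoint=endpoint, status=status).inc()
```

In PromQL, Rate is `rate(http_requests_total[5m])`, Errors is the same rate filtered with `status=~"5.."`, and a p95 Duration is `histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))`. Pass route templates such as `/users/{id}` as `endpoint` rather than raw paths to keep label cardinality bounded.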

LLM Observability

Langfuse-based tracing, cost tracking, and evaluation for LLM applications.

Rule	File	Key Pattern
Langfuse Traces	`rules/llm-langfuse-traces.md`	@observe decorator, OTEL spans, agent graphs
Cost Tracking	`rules/llm-cost-tracking.md`	Token usage, spend alerts, Metrics API v2
Eval Scoring	`rules/llm-eval-scoring.md`	Custom scores, evaluator tracing, quality monitoring
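Langfuse records token usage per generation; the arithmetic behind a per-model spend breakdown and a budget alert is simple enough to sketch without any SDK. Assumptions: the model names and per-million-token prices below are placeholders, not real rates, and `Usage`, `cost_usd`, `spend_by_model` and `over_budget` are hypothetical helpers rather than Langfuse APIs.

```python
from collections import defaultdict
from dataclasses import dataclass

# Placeholder prices in USD per 1M tokens: NOT real rates, substitute your provider's.
PRICES_PER_1M = {
    "model-a": {"input": 3.00, "output": 15.00},
    "model-b": {"input": 0.25, "output": 1.25},
}


@dataclass
class Usage:
    model: str
    input_tokens: int
    output_tokens: int


def cost_usd(u: Usage) -> float:
    """Cost of one LLM call from its token usage."""
    p = PRICES_PER_1M[u.model]
    return (u.input_tokens * p["input"] + u.output_tokens * p["output"]) / 1_000_000


def spend_by_model(usages) -> dict:
    """Total spend per model, e.g. for a cost dashboard."""
    totals = defaultdict(float)
    for u in usages:
        totals[u.model] += cost_usd(u)
    return dict(totals)


def over_budget(usages, budget_usd: float) -> bool:
    """Spend-alert condition: total cost for the window exceeds the budget."""
    return sum(cost_usd(u) for u in usages) > budget_usd
```

Output tokens are usually priced differently from input tokens, which is why the sketch keeps them separate instead of summing a single token count.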

Drift Detection

Statistical and quality drift detection for production LLM systems.

Rule	File	Key Pattern
Statistical Drift	`rules/drift-statistical.md`	PSI, KS test, KL divergence, EWMA
Quality Drift	`rules/drift-quality.md`	Score regression, baseline comparison, canary prompts
Drift Alerting	`rules/drift-alerting.md`	Dynamic thresholds, correlation, anti-patterns
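PSI and the two-sample Kolmogorov-Smirnov (KS) statistic both compare a baseline window of scores against a current window. This is a minimal NumPy sketch, assuming 1-D numeric scores. Binning PSI on baseline deciles and flooring empty bins at `eps` are implementation choices, not something the rule files are known to prescribe. `calculate_psi` reuses the placeholder name from the Quick Start; `ks_statistic` is a hypothetical helper.

```python
import numpy as np


def calculate_psi(baseline, current, bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index of `current` relative to `baseline`.

    Cut points are baseline quantiles, so each baseline bin holds roughly equal mass;
    values outside the baseline range fall into the first or last bin.
    """
    baseline = np.asarray(baseline, dtype=float)
    current = np.asarray(current, dtype=float)
    cuts = np.unique(np.quantile(baseline, np.linspace(0, 1, bins + 1)[1:-1]))
    n_bins = cuts.size + 1
    base_idx = np.searchsorted(cuts, baseline, side="right")
    curr_idx = np.searchsorted(cuts, current, side="right")
    base_frac = np.bincount(base_idx, minlength=n_bins) / baseline.size
    curr_frac = np.bincount(curr_idx, minlength=n_bins) / current.size
    # Floor empty bins so the log term stays finite.
    base_frac = np.clip(base_frac, eps, None)
    curr_frac = np.clip(curr_frac, eps, None)
    return float(np.sum((curr_frac - base_frac) * np.log(curr_frac / base_frac)))


def ks_statistic(a, b) -> float:
    """Two-sample KS statistic D = max |F_a(x) - F_b(x)| over the pooled sample."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))
```

A common rule of thumb reads PSI below 0.1 as stable, 0.1 to 0.25 as a moderate shift, and 0.25 or more as significant, which matches the 0.25 alert in the Quick Start. `ks_statistic` returns only D; `scipy.stats.ks_2samp` also returns a p-value when SciPy is available.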

Silent Failures

Detection and alerting for silent failures in LLM agents.

Rule	File	Key Pattern
Tool Skipping	`rules/silent-tool-skipping.md`	Expected vs actual tool calls, Langfuse traces
Quality Degradation	`rules/silent-degraded-quality.md`	Heuristics + LLM-as-judge, z-score baselines
Silent Alerting	`rules/silent-alerting.md`	Loop detection, token spikes, escalation workflow
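The tool-skipping and loop/token-spike rules reduce to small checks over data pulled from a trace. This is a minimal stdlib sketch, assuming you have already extracted an ordered list of tool names and per-run token totals (for example from Langfuse observations). The function names, the consecutive-repeat loop heuristic, and the z-score cut-off of 3 are illustrative assumptions, not the rule files' exact logic.

```python
import statistics


def skipped_tools(expected, actual_calls):
    """Tools the agent was expected to call but never called (sorted)."""
    return sorted(set(expected) - set(actual_calls))


def detect_loop(actual_calls, max_repeats=3):
    """Return the first tool called more than max_repeats times in a row, else None."""
    run_tool, run_len = None, 0
    for name in actual_calls:
        run_len = run_len + 1 if name == run_tool else 1
        run_tool = name
        if run_len > max_repeats:
            return name
    return None


def token_spike(history, current, z_threshold=3.0):
    """True if `current` is more than z_threshold sample std devs above the history mean."""
    if len(history) < 2:
        return False  # stdev needs at least two baseline points
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return current > mean  # flat baseline: any increase counts as a spike
    return (current - mean) / stdev > z_threshold
```

None of these failures raise an exception, which is what makes them silent; each check turns a trace that "succeeded" into an explicit signal that an alerting workflow can escalate.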

Key Decisions

Decision	Recommendation	Rationale
Metric methodology	RED method (Rate, Errors, Duration)	Industry standard, covers essential service health
Log format	Structured JSON	Machine-parseable, supports log aggregation
Tracing	OpenTelemetry	Vendor-neutral, auto-instrumentation, broad ecosystem
LLM observability	Langfuse (not LangSmith)	Open-source, self-hostable, built-in prompt management
LLM tracing API	`@observe(as_type=...)` + `score_current_span()`	v4: semantic types, inline scoring, span filtering
Langfuse APIs	Observations API v2 + Metrics API v2	v4 (Mar 2026): faster querying, aggregations at scale
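Drift method	PSI for production, KS for small samples	PSI is stable on large datasets; KS is more sensitive
Threshold strategy	Dynamic (95th percentile) over static	Reduces alert fatigue, context-aware
Alert severity	4 levels (critical, high, medium, low)	Clear escalation paths, appropriate response times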
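A dynamic threshold compares each new value against the 95th percentile of a rolling window of recent values instead of a fixed limit, and the size of the overshoot can then be mapped onto four severity levels (critical, high, medium, low). This is a minimal stdlib sketch for positive-valued metrics such as latency or token counts. The window size, minimum history, and ratio cut-offs per severity are hypothetical values to tune, and `DynamicThreshold` is not an API from any library.

```python
import statistics
from collections import deque

SEVERITIES = ("critical", "high", "medium", "low")


class DynamicThreshold:
    """Flag values above the 95th percentile of a rolling window of recent values."""

    def __init__(self, window: int = 500, min_history: int = 20):
        self.history = deque(maxlen=window)  # most recent `window` observations
        self.min_history = min_history

    def threshold(self):
        """Current p95 of the history, or None while there is too little data."""
        if len(self.history) < max(self.min_history, 2):
            return None
        # n=20 yields 19 cut points; the last one (index 18) is the 95th percentile.
        return statistics.quantiles(self.history, n=20)[18]

    def check(self, value: float):
        """Return a severity string for a breach, or None. Assumes positive-valued metrics."""
        limit = self.threshold()  # computed before `value` joins the history
        self.history.append(value)
        if limit is None or value <= limit:
            return None
        ratio = value / limit if limit > 0 else float("inf")
        # Hypothetical cut-offs mapping the overshoot onto the four severity levels.
        if ratio >= 2.0:
            return SEVERITIES[0]
        if ratio >= 1.5:
            return SEVERITIES[1]
        if ratio >= 1.2:
            return SEVERITIES[2]
        return SEVERITIES[3]
```

Because breaches are appended to the history too, a sustained shift gradually raises the threshold and the alerts stop on their own; to keep alerting on a persistent regression, skip appending values that fired an alert.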

Detailed Documentation

Resource	Description
`${CLAUDE_SKILL_DIR}/references/`	Logging, metrics, tracing, Langfuse, drift analysis guides
`${CLAUDE_SKILL_DIR}/checklists/`	Implementation checklists for monitoring and Langfuse setup
`${CLAUDE_SKILL_DIR}/examples/`	Real-world monitoring dashboard and trace examples
`${CLAUDE_SKILL_DIR}/scripts/`	Templates: Prometheus, OpenTelemetry, health checks, Langfuse

Related Skills

defense-in-depth - Layer 8 observability as part of security architecture
devops-deployment - Observability integration with CI/CD and Kubernetes
resilience-patterns - Monitoring circuit breakers and failure scenarios
llm-evaluation - Evaluation patterns that integrate with Langfuse scoring
caching - Caching strategies that reduce costs tracked by Langfuse

First Seen

Feb 14, 2026

Security Audits

Gen Agent Trust Hub: Pass; Socket: Pass; Snyk: Warn

Installed on

gemini-cli: 105

codex: 105

github-copilot: 105

opencode: 105

cursor: 103

kimi-cli: 99
