qa-observability by vasilyu1983/ai-agents-public
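npx skills add https://github.com/vasilyu1983/ai-agents-public --skill qa-observability
Use telemetry (logs, metrics, traces, profiles) as a QA signal and a debugging substrate.
Core references (see data/sources.json): OpenTelemetry, W3C Trace Context, and SLO practices (Google SRE).
If key context is missing, ask for: critical user journeys, service/dependency inventory, environments (local/staging/prod), current telemetry stack, and current SLO/SLA commitments (if any).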
Verify that traceparent (and your request ID) flow across boundaries end-to-end.
Use the templates in assets/checklists/template-observability-readiness-checklist.md and assets/monitoring/slo/*.
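One way to check the propagation requirement above is to parse the traceparent header on both sides of a hop and compare trace IDs. The sketch below assumes the W3C Trace Context version 00 layout (version-traceid-parentid-flags) and rejects any other version for simplicity. The helper names parse_traceparent, child_traceparent, and propagation_intact are hypothetical; they are not part of this skill or of OpenTelemetry, whose SDK propagators would normally handle this.

```python
import re
import secrets
from typing import NamedTuple, Optional

# W3C Trace Context, version 00: version-traceid-parentid-flags, lowercase hex.
_TRACEPARENT_RE = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    flags: str


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Return the parsed header, or None if it is malformed."""
    match = _TRACEPARENT_RE.fullmatch(header.strip())
    if match is None:
        return None
    parsed = TraceParent(*match.groups())
    # Simplification: this sketch only accepts version 00.
    if parsed.version != "00":
        return None
    # All-zero trace IDs and parent IDs are invalid per the spec.
    if set(parsed.trace_id) == {"0"} or set(parsed.parent_id) == {"0"}:
        return None
    return parsed


def child_traceparent(incoming: TraceParent) -> str:
    """Build the header for an outgoing call: same trace ID, new span ID."""
    new_span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    return f"00-{incoming.trace_id}-{new_span_id}-{incoming.flags}"


def propagation_intact(inbound_header: str, outbound_header: str) -> bool:
    """Hypothetical QA check: both headers are valid and share one trace ID."""
    inbound = parse_traceparent(inbound_header)
    outbound = parse_traceparent(outbound_header)
    return (
        inbound is not None
        and outbound is not None
        and inbound.trace_id == outbound.trace_id
    )
```

In an integration test, a check like propagation_intact could be run against the header a service receives and the header it sends downstream. A trace ID that changes between the two usually means a hop dropped or regenerated the context.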
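| Task | Recommended default | Notes |
|---|---|---|
| Tracing | OpenTelemetry + Jaeger/Tempo | Prefer OTLP exporters via Collector when possible |
| Metrics | Prometheus + Grafana | Use histograms for latency; watch cardinality |
| Logging | Structured JSON + correlation IDs | Never log secrets/PII; redact aggressively |
| Reliability gates | SLOs + error budgets + burn-rate alerts | Gate releases on sustained burn/regressions |
| Performance | Profiling + load tests + budgets | Add continuous profiling for intermittent issues |
| Zero-code visibility | eBPF (OpenTelemetry zero-code) + continuous profiling (Parca/Pyroscope) | Use when code changes are not feasible |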
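The logging row in the table above (structured JSON, correlation IDs, aggressive redaction) could look roughly like the following with only the Python standard library. This is a minimal sketch, not the skill's own logging template: the names correlation_id, SENSITIVE_KEYS, EMAIL_RE, and the "fields" convention for extra data are all assumptions made for illustration, and a real redaction list would need to cover far more than passwords, tokens, and e-mail addresses.

```python
import json
import logging
import re
from contextvars import ContextVar

# Hypothetical names, chosen for this sketch only.
correlation_id = ContextVar("correlation_id", default="-")
SENSITIVE_KEYS = {"password", "token", "authorization", "api_key"}
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact(fields: dict) -> dict:
    """Mask values of sensitive keys and e-mail addresses inside string values."""
    clean = {}
    for key, value in fields.items():
        if key.lower() in SENSITIVE_KEYS:
            clean[key] = "[REDACTED]"
        elif isinstance(value, str):
            clean[key] = EMAIL_RE.sub("[REDACTED_EMAIL]", value)
        else:
            clean[key] = value
    return clean


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per record, tagged with the current correlation ID."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": EMAIL_RE.sub("[REDACTED_EMAIL]", record.getMessage()),
            "correlation_id": correlation_id.get(),
            # Data passed via extra={"fields": {...}} is redacted before output.
            "fields": redact(getattr(record, "fields", {})),
        }
        return json.dumps(payload, sort_keys=True)


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    log = logging.getLogger("checkout")
    log.addHandler(handler)
    log.setLevel(logging.INFO)

    correlation_id.set("req-123")  # normally set once per incoming request
    log.info("payment accepted", extra={"fields": {"user": "a@example.com", "token": "abc"}})
```

Keeping the correlation ID in a ContextVar means each request handler sets it once and every log line emitted during that request picks it up without being passed around explicitly.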
Open these guides when needed:
| If the user needs... | Read | Also use |
|---|---|---|
| A minimal, production-ready baseline | references/core-observability-patterns.md | assets/checklists/template-observability-readiness-checklist.md |
| Node/Python instrumentation setup | references/opentelemetry-best-practices.md | assets/opentelemetry/nodejs/opentelemetry-nodejs-setup.md, assets/opentelemetry/python/opentelemetry-python-setup.md |
| Working trace propagation across services | references/distributed-tracing-patterns.md | assets/checklists/template-observability-readiness-checklist.md |
| SLOs, burn-rate alerts, and release gates | references/slo-design-guide.md | assets/monitoring/slo/slo-definition.yaml, assets/monitoring/slo/prometheus-alert-rules.yaml |
| Profiling/load testing with evidence | references/performance-profiling-guide.md | assets/load-testing/load-testing-k6.js, assets/load-testing/template-load-test-artillery.yaml |
| A maturity model and roadmap | references/observability-maturity-model.md | assets/checklists/template-observability-readiness-checklist.md |
| What to avoid and how to fix it | references/anti-patterns-best-practices.md | assets/checklists/template-observability-readiness-checklist.md |
| Alert design and fatigue reduction | references/alerting-strategies.md | assets/monitoring/slo/prometheus-alert-rules.yaml |
| Dashboard hierarchy and layout | references/dashboard-design-patterns.md | assets/monitoring/grafana/template-grafana-dashboard-observability.json |
| Structured logging and cost control | references/log-aggregation-patterns.md | assets/observability/template-logging-setup.md |
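The SLO row above mentions burn-rate alerts and release gates. As a worked illustration of the arithmetic behind them: burn rate is the observed error ratio divided by the error budget (1 minus the SLO target). The sketch below assumes a 30-day SLO period and the commonly cited 14.4x paging threshold checked over a long and a short window, as described in the Google SRE Workbook. These values and the function names are assumptions for illustration and are not taken from this skill's slo-definition.yaml or prometheus-alert-rules.yaml.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How many times faster than 'exactly on budget' errors are being spent."""
    error_budget = 1.0 - slo_target
    if error_budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / error_budget


def budget_consumed(rate: float, window_hours: float, period_days: int = 30) -> float:
    """Fraction of the whole period's error budget spent during the window."""
    return rate * window_hours / (period_days * 24)


def should_page(
    long_window_error_ratio: float,
    short_window_error_ratio: float,
    slo_target: float,
    threshold: float = 14.4,  # assumed paging threshold (1h long / 5m short window)
) -> bool:
    """Page only if both windows burn fast: sustained and still happening."""
    return (
        burn_rate(long_window_error_ratio, slo_target) >= threshold
        and burn_rate(short_window_error_ratio, slo_target) >= threshold
    )


def release_allowed(long_window_error_ratio: float, slo_target: float, max_rate: float = 1.0) -> bool:
    """Hypothetical release gate: block deploys while the budget burns too fast."""
    return burn_rate(long_window_error_ratio, slo_target) <= max_rate
```

For a 99.9% SLO, an error ratio of 2% over both windows gives a burn rate of about 20, so should_page returns True. If the short window has already recovered to 0.1%, it returns False, which is the point of requiring both windows.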
Implementation guides (deep dives):
- references/core-observability-patterns.md
- references/opentelemetry-best-practices.md
- references/distributed-tracing-patterns.md
- references/slo-design-guide.md
- references/performance-profiling-guide.md
- references/observability-maturity-model.md
- references/anti-patterns-best-practices.md
- references/alerting-strategies.md
- references/dashboard-design-patterns.md
- references/log-aggregation-patterns.md
Templates (copy/paste):
- assets/checklists/template-observability-readiness-checklist.md
- assets/opentelemetry/nodejs/opentelemetry-nodejs-setup.md
- assets/opentelemetry/python/opentelemetry-python-setup.md
- assets/monitoring/slo/slo-definition.yaml
- assets/monitoring/slo/prometheus-alert-rules.yaml
- assets/monitoring/grafana/grafana-dashboard-slo.json
- assets/monitoring/grafana/template-grafana-dashboard-observability.json
- assets/load-testing/load-testing-k6.js
- assets/load-testing/template-load-test-artillery.yaml
- assets/performance/frontend/template-lighthouse-ci.json
- assets/performance/backend/template-nodejs-profiling-config.js
Curated sources:
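- data/sources.json
Related skills:
- ../ops-devops-platform/SKILL.md
- ../data-sql-optimization/SKILL.md
- ../qa-debugging/SKILL.md
- ../qa-testing-strategy/SKILL.md
- ../qa-resilience/SKILL.md
- ../software-architecture-design/SKILL.md
Start from data/sources.json and validate time-sensitive claims with current docs/releases if the environment allows it.
Weekly installs: 76
Repository: vasilyu1983/ai-agents-public
GitHub stars: 49
First seen: Jan 23, 2026
Security audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)
Installed on: opencode (59), cursor (58), codex (58), gemini-cli (56), github-copilot (54), claude-code (51)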