npx skills add https://github.com/axiomhq/skills --skill controlling-costs
Dashboards, monitors, and waste identification for Axiom usage optimization.
Load required skills:
skill: axiom-sre
skill: building-dashboards
Building-dashboards provides: dashboard-list, dashboard-get, dashboard-create, dashboard-update, dashboard-delete
Find the audit dataset. Try axiom-audit first:
```apl
['axiom-audit']
| where _time > ago(1h)
| summarize count() by action
| where action in ('usageCalculated', 'runAPLQueryCost')
```
If axiom-audit is not found, try the alternatives axiom-audit-logs-view and audit-logs. No usageCalculated events → wrong dataset; ask the user. Then verify axiom-history access (required for Phase 4):
```apl
['axiom-history'] | where _time > ago(1h) | take 1
```
If not found, Phase 4 optimization will not work.
Confirm with user:
Replace <deployment> and <audit-dataset> in all commands below.
Tips:
- -h for full usage
- Avoid head or tail — causes SIGPIPE errors
- jq for JSON parsing
- axiom-query for ad-hoc APL, not direct CLI

| User request | Run these phases |
|---|---|
| "reduce costs" / "find waste" | 0 → 1 → 4 |
| "set up cost control" | 0 → 1 → 2 → 3 |
| "deploy dashboard" | 0 → 2 |
| "create monitors" | 0 → 3 |
| "check for drift" | 0 only |
```shell
# Existing dashboard?
dashboard-list <deployment> | grep -i cost

# Existing monitors?
axiom-api <deployment> GET "/v2/monitors" | jq -r '.[] | select(.name | startswith("Cost Control:")) | "\(.id)\t\(.name)"'
```
If found, fetch with dashboard-get and compare to templates/dashboard.json for drift.
scripts/baseline-stats -d <deployment> -a <audit-dataset>
Captures daily ingest stats and produces the Analysis Queue (needed for Phase 4).
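To illustrate the kind of aggregation baseline-stats performs, here is a minimal sketch of rolling hourly usageCalculated audit events up into daily ingest per dataset. The event shape follows the audit fields documented below; the function name and sample data are hypothetical, not the script's actual implementation.

```python
from collections import defaultdict

def daily_ingest_gb(events):
    """Sum hourly usageCalculated events into GB ingested per (dataset, day)."""
    totals = defaultdict(float)
    for e in events:
        if e["action"] != "usageCalculated":
            continue  # skip query-cost events
        day = e["_time"][:10]  # ISO timestamp prefix, e.g. "2026-01-24"
        totals[(e["properties"]["dataset"], day)] += (
            e["properties"]["hourly_ingest_bytes"] / 1e9
        )
    return dict(totals)

sample = [
    {"_time": "2026-01-24T00:00:00Z", "action": "usageCalculated",
     "properties": {"dataset": "k8s-logs", "hourly_ingest_bytes": 2_000_000_000}},
    {"_time": "2026-01-24T01:00:00Z", "action": "usageCalculated",
     "properties": {"dataset": "k8s-logs", "hourly_ingest_bytes": 3_000_000_000}},
    {"_time": "2026-01-24T00:30:00Z", "action": "runAPLQueryCost",
     "properties": {"dataset": "k8s-logs", "hourly_ingest_bytes": 0}},
]
print(daily_ingest_gb(sample))  # {('k8s-logs', '2026-01-24'): 5.0}
```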
scripts/deploy-dashboard -d <deployment> -a <audit-dataset>
Creates dashboard with: ingest trends, burn rate, projections, waste candidates, top users. See reference/dashboard-panels.md for details.
Contract info is required. You must have the contract limit from preflight step 4.
scripts/list-notifiers -d <deployment>
Present the list to the user and ask which notifier they want for cost alerts. If they don't want notifications, proceed without -n.
scripts/create-monitors -d <deployment> -a <audit-dataset> -c <contract_tb> [-n <notifier_id>]
Creates 3 monitors:
The spike monitors use notifyByGroup: true so each dataset triggers a separate alert.
See reference/monitor-strategy.md for threshold derivation.
Run scripts/baseline-stats if not already done. It outputs a prioritized list:
| Priority | Meaning |
|---|---|
| P0⛔ | Top 3 by ingest OR >10% of total — MANDATORY |
| P1 | Never queried — strong drop candidate |
| P2 | Rarely queried (Work/GB < 100) — likely waste |
Work/GB = query cost (GB·ms) / ingest (GB). Lower = less value from data.
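The priority rules above can be sketched as a small classifier. This is an illustration of the logic, not the baseline-stats implementation; the dataset dict shape is a hypothetical assumption.

```python
def work_per_gb(query_cost_gbms, ingest_gb):
    # Work/GB = query cost (GB*ms) / ingest (GB); lower = less value from data
    return query_cost_gbms / ingest_gb if ingest_gb else 0.0

def priority(dataset, total_ingest_gb, top3_names):
    """Assign P0/P1/P2 per the table above.

    dataset: {"name": str, "ingest_gb": float, "query_cost_gbms": float}
    """
    share = dataset["ingest_gb"] / total_ingest_gb
    if dataset["name"] in top3_names or share > 0.10:
        return "P0"  # top 3 by ingest OR >10% of total — mandatory
    w = work_per_gb(dataset["query_cost_gbms"], dataset["ingest_gb"])
    if w == 0:
        return "P1"  # never queried — strong drop candidate
    if w < 100:
        return "P2"  # rarely queried — likely waste
    return None      # not flagged

ds = {"name": "debug-logs", "ingest_gb": 50, "query_cost_gbms": 0}
print(priority(ds, total_ingest_gb=10_000, top3_names={"k8s-logs"}))  # P1
```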
Work top-to-bottom. For each dataset:
Step 1: Column analysis
scripts/analyze-query-coverage -d <deployment> -D <dataset> -a <audit-dataset>
If 0 queries → recommend DROP, move to next.
Step 2: Field value analysis
Pick a field from the suggested list (usually app, service, or kubernetes.labels.app):
scripts/analyze-query-coverage -d <deployment> -D <dataset> -a <audit-dataset> -f <field>
Note values with high volume but never queried (⚠️ markers).
Step 3: Handle empty values
If (empty) has >5% volume, you MUST drill down with an alternative field (e.g., kubernetes.namespace_name).
Step 4: Record recommendation
For each dataset, note: name, ingest volume, Work/GB, top unqueried values, action (DROP/SAMPLE/KEEP), estimated savings.
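Steps 2–3 above amount to two checks: flag field values with volume but zero queries, and trigger a drill-down when the (empty) bucket exceeds 5%. A minimal sketch, assuming a hypothetical {value: (volume_gb, query_count)} shape for the analyze-query-coverage output:

```python
def flag_waste(values, total_gb, empty_threshold=0.05):
    """Return (unqueried high-volume values, whether (empty) needs drill-down)."""
    # Values with volume but zero queries — the script's warning markers
    flagged = [v for v, (gb, queries) in values.items()
               if queries == 0 and gb > 0]
    flagged.sort(key=lambda v: values[v][0], reverse=True)  # biggest first
    # Step 3: (empty) above 5% of total volume forces an alternative-field pass
    empty_gb = values.get("(empty)", (0, 0))[0]
    needs_drilldown = empty_gb / total_gb > empty_threshold
    return flagged, needs_drilldown

vals = {"checkout": (120.0, 0), "api": (300.0, 5000), "(empty)": (80.0, 0)}
flagged, drill = flag_waste(vals, total_gb=1000.0)
print(flagged, drill)  # ['checkout', '(empty)'] True
```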
Done when all P0⛔ and P1 datasets are analyzed; then compile the report using reference/analysis-report-template.md.
```shell
# Delete monitors
axiom-api <deployment> GET "/v2/monitors" | jq -r '.[] | select(.name | startswith("Cost Control:")) | "\(.id)\t\(.name)"'
axiom-api <deployment> DELETE "/v2/monitors/<id>"

# Delete dashboard
dashboard-list <deployment> | grep -i cost
dashboard-delete <deployment> <id>
```
Note: Running create-monitors twice creates duplicates. Delete existing monitors first if re-deploying.
| Field | Description |
|---|---|
| action | usageCalculated or runAPLQueryCost |
| properties.hourly_ingest_bytes | Hourly ingest in bytes |
| properties.hourly_billable_query_gbms | Hourly query cost |
| properties.dataset | Dataset name |
| resource.id | Org ID |
| actor.email | User email |
| Dataset type | Primary field | Alternatives |
|---|---|---|
| Kubernetes logs | kubernetes.labels.app | kubernetes.namespace_name, kubernetes.container_name |
| Application logs | app or service | level, logger, component |
| Infrastructure | host | region, instance |
| Traces | service.name | span.kind, http.route |
| Contract | TB/day | GB/month |
|---|---|---|
| 5 PB/month | 167 | 5,000,000 |
| 10 PB/month | 333 | 10,000,000 |
| 15 PB/month | 500 | 15,000,000 |
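The table rows above follow from a straightforward conversion, assuming decimal units (1 PB = 1,000 TB = 1,000,000 GB) and a 30-day month. A sketch for checking other contract sizes:

```python
def contract_limits(pb_per_month, days=30):
    """Convert a PB/month contract into (TB/day, GB/month).

    Decimal units and a 30-day month — matching the table above.
    """
    gb_per_month = pb_per_month * 1_000_000
    tb_per_day = round(pb_per_month * 1_000 / days)
    return tb_per_day, gb_per_month

for pb in (5, 10, 15):
    tb_day, gb_month = contract_limits(pb)
    print(f"{pb} PB/month = {tb_day} TB/day = {gb_month:,} GB/month")
```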
| Signal | Action |
|---|---|
| Work/GB = 0 | Drop or stop ingesting |
| High-volume unqueried values | Sample or reduce log level |
| Empty values from system namespaces | Filter at ingest or accept |
| WoW spike | Check recent deploys |