Kibana Alerting Rules API Guide: Create and Manage Monitoring Alerts and Automated Actions

kibana-alerting-rules by elastic/agent-skills

Install command

npx skills add https://github.com/elastic/agent-skills --skill kibana-alerting-rules

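Kibana Alerting Rules

Core Concepts

A rule has three parts: conditions (what to detect), a schedule (how often to check), and actions (what to run when the conditions are met). When the conditions are met, the rule creates alerts, and those alerts trigger actions through connectors.

Authentication

All alerting API calls require either API key authentication or basic authentication. Every mutating request must also include the kbn-xsrf header.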

kbn-xsrf: true

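Required Privileges

all privileges for the relevant Kibana feature (for example, Stack Rules, Observability, or Security)
read privileges for Actions and Connectors (needed to attach actions to rules)

API Reference

Base path: <kibana_url>/api/alerting (for non-default spaces, /s/<space_id>/api/alerting).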

Operation	Method	Endpoint
Create rule	POST	`/api/alerting/rule/{id}`
Update rule	PUT	`/api/alerting/rule/{id}`
Get rule	GET	`/api/alerting/rule/{id}`
Delete rule	DELETE	`/api/alerting/rule/{id}`
Find rules	GET	`/api/alerting/rules/_find`
List rule types	GET	`/api/alerting/rule_types`
Enable rule	POST	`/api/alerting/rule/{id}/_enable`
Disable rule	POST	`/api/alerting/rule/{id}/_disable`
Mute all alerts	POST	`/api/alerting/rule/{id}/_mute_all`
Unmute all alerts	POST	`/api/alerting/rule/{id}/_unmute_all`
Mute a specific alert	POST	`/api/alerting/rule/{rule_id}/alert/{alert_id}/_mute`
Unmute a specific alert	POST	`/api/alerting/rule/{rule_id}/alert/{alert_id}/_unmute`
Update API key	POST	`/api/alerting/rule/{id}/_update_api_key`
Create snooze schedule	POST	`/api/alerting/rule/{id}/snooze_schedule`
Delete snooze schedule	DELETE	`/api/alerting/rule/{ruleId}/snooze_schedule/{scheduleId}`
Health check	GET	`/api/alerting/_health`

Creating a Rule

Required Fields

Field	Type	Description
`name`	string	Display name (does not need to be unique)
`rule_type_id`	string	The rule type (e.g., `.es-query`, `.index-threshold`)
`consumer`	string	Owning app: `alerts`, `apm`, `discover`, `infrastructure`, `logs`, `metrics`, `ml`, `monitoring`, `securitySolution`, `siem`, `stackAlerts`, `uptime`
`params`	object	Rule-type-specific parameters
`schedule`	object	Check interval, e.g., `{"interval": "5m"}`

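Optional Fields

Field	Type	Description
`actions`	array	Actions to run when conditions are met (each references a connector)
`tags`	array	Tags for organizing rules
`enabled`	boolean	Whether the rule runs immediately (default: true)
`notify_when`	string	`onActionGroupChange`, `onActiveAlert`, or `onThrottleInterval` (prefer setting this per action instead)
`alert_delay`	object	Alert only after N consecutive matches, e.g., `{"active": 3}`
`flapping`	object/null	Override the flapping detection settings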

Example: Create an Elasticsearch Query Rule

curl -X POST "https://my-kibana:5601/api/alerting/rule/my-rule-id" \
  -H "kbn-xsrf: true" \
  -H "Content-Type: application/json" \
  -H "Authorization: ApiKey <your-api-key>" \
  -d '{
    "name": "High error rate",
    "rule_type_id": ".es-query",
    "consumer": "stackAlerts",
    "schedule": { "interval": "5m" },
    "params": {
      "index": ["logs-*"],
      "timeField": "@timestamp",
      "esQuery": "{\"query\":{\"match\":{\"log.level\":\"error\"}}}",
      "threshold": [100],
      "thresholdComparator": ">",
      "timeWindowSize": 5,
      "timeWindowUnit": "m",
      "size": 100
    },
    "actions": [
      {
        "id": "my-slack-connector-id",
        "group": "query matched",
        "params": {
          "message": "Alert: {{rule.name}} - {{context.hits}} hits detected"
        },
        "frequency": {
          "summary": false,
          "notify_when": "onActionGroupChange"
        }
      }
    ],
    "tags": ["production", "errors"]
  }'

The same structure applies to other rule types: set the appropriate rule_type_id (e.g., .index-threshold, .es-query) and provide the matching params object. Use GET /api/alerting/rule_types to discover params schemas.

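Updating a Rule

PUT /api/alerting/rule/{id}: send the complete rule body. rule_type_id and consumer are immutable after creation. The call returns 409 Conflict if another user updated the rule concurrently; re-fetch the rule and retry.

Finding Rules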

curl -X GET "https://my-kibana:5601/api/alerting/rules/_find?per_page=20&page=1&search=cpu&sort_field=name&sort_order=asc" \
  -H "Authorization: ApiKey <your-api-key>"

Query parameters: per_page, page, search, default_search_operator, search_fields, sort_field, sort_order, has_reference, fields, filter, filter_consumers.

Use the filter parameter with KQL syntax for advanced queries:

filter=alert.attributes.tags:"production"

Lifecycle Operations

# Enable
curl -X POST ".../api/alerting/rule/{id}/_enable" -H "kbn-xsrf: true"

# Disable
curl -X POST ".../api/alerting/rule/{id}/_disable" -H "kbn-xsrf: true"

# Mute all alerts
curl -X POST ".../api/alerting/rule/{id}/_mute_all" -H "kbn-xsrf: true"

# Mute a specific alert
curl -X POST ".../api/alerting/rule/{rule_id}/alert/{alert_id}/_mute" -H "kbn-xsrf: true"

# Delete
curl -X DELETE ".../api/alerting/rule/{id}" -H "kbn-xsrf: true"

Terraform Provider

Use the elasticstack provider resource elasticstack_kibana_alerting_rule.

terraform {
  required_providers {
    elasticstack = {
      source  = "elastic/elasticstack"
    }
  }
}

provider "elasticstack" {
  kibana {
    endpoints = ["https://my-kibana:5601"]
    api_key   = var.kibana_api_key
  }
}

resource "elasticstack_kibana_alerting_rule" "cpu_alert" {
  name         = "CPU usage critical"
  consumer     = "stackAlerts"
  rule_type_id = ".index-threshold"
  interval     = "1m"
  enabled      = true

  params = jsonencode({
    index              = ["metrics-*"]
    timeField          = "@timestamp"
    aggType            = "avg"
    aggField           = "system.cpu.total.pct"
    groupBy            = "top"
    termField          = "host.name"
    termSize           = 10
    threshold          = [0.9]
    thresholdComparator = ">"
    timeWindowSize     = 5
    timeWindowUnit     = "m"
  })

  tags = ["infrastructure", "production"]
}

Key Terraform notes:

params must be passed as a JSON-encoded string via jsonencode()
Use the elasticstack_kibana_action_connector data source or resource to reference connector IDs in actions
Import existing rules: terraform import elasticstack_kibana_alerting_rule.my_rule <space_id>/<rule_id> (use default for the default space)

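Triggering Kibana Workflows from Rules

Preview feature: available from Elastic Stack 9.3 and in Elastic Cloud Serverless. APIs may change.

Attach a workflow as a rule action by using the workflow ID as the connector ID. Set params: {}; the alert context is passed automatically through the event object inside the workflow.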

curl -X PUT "https://my-kibana:5601/api/alerting/rule/my-rule-id" \
  -H "kbn-xsrf: true" \
  -H "Content-Type: application/json" \
  -H "Authorization: ApiKey <your-api-key>" \
  -d '{
    "name": "High error rate",
    "schedule": { "interval": "5m" },
    "params": { ... },
    "actions": [
      {
        "id": "<workflow-id>",
        "group": "query matched",
        "params": {},
        "frequency": { "summary": false, "notify_when": "onActionGroupChange" }
      }
    ]
  }'

In the UI: Stack Management > Rules > Actions > Workflows. Only workflows with enabled: true appear in the picker.

For workflow YAML structure, {{ event }} context fields, step types, and patterns, refer to the kibana-connectors skill if available.

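Connectors and Actions in Rules

Each action references a connector by ID and carries an action group, action params (using Mustache templates), and a per-action frequency object. Key fields:

group: which trigger state fires this action (e.g., "query matched", "Recovered"). Discover valid groups via GET /api/alerting/rule_types.
frequency.summary: true for a digest of all alerts; false for one notification per alert.
frequency.notify_when: onActionGroupChange | onActiveAlert | onThrottleInterval.
frequency.throttle: minimum repeat interval (e.g., "10m"); only applies with onThrottleInterval.

For a full reference on action structure, Mustache variables ({{rule.name}}, {{context.*}}, {{alerts.new.count}}), Mustache lambdas (EvalMath, FormatDate, ParseHjson), recovery actions, and multi-channel patterns, refer to the kibana-connectors skill if available.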

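Best Practices

Set action frequency per action, not per rule. The rule-level notify_when field is deprecated in favor of per-action frequency objects. If you set it at the rule level and later edit the rule in the Kibana UI, it is automatically converted to action-level values.
Use alert summaries to reduce notification noise. Instead of sending one notification per alert, configure actions to send periodic summaries at a custom interval. Use "summary": true and set a throttle interval. This is especially valuable for rules that monitor many hosts or documents.
Choose the right action frequency for each channel. Use onActionGroupChange for paging and ticketing systems (fire once, resolve once). Use onActiveAlert for audit logging to an Index connector. Use onThrottleInterval with a throttle such as "30m" for dashboards or lower-priority notifications.
Always add a recovery action. Rules without a recovery action leave incidents open in PagerDuty, Jira, and ServiceNow indefinitely. Use the connector's native close/resolve event action (e.g., eventAction: "resolve" for PagerDuty) in the Recovered action group.
Set a reasonable check interval. The minimum recommended interval is 1m. Very short intervals across many rules clog Task Manager throughput and increase schedule drift. The server setting xpack.alerting.rules.minimumScheduleInterval.value enforces this limit.
Use alert_delay to suppress transient spikes. Setting {"active": 3} means the alert fires only after 3 consecutive runs match the condition, which filters out brief anomalies.
Enable flapping detection. Alerts that switch rapidly between active and recovered are marked as "flapping" and their notifications are suppressed. This is on by default but can be tuned per rule with the flapping object.
Use server.publicBaseUrl for deep links. Set server.publicBaseUrl in kibana.yml so that the {{rule.url}} and {{kibanaBaseUrl}} variables resolve to valid URLs in notifications.
Tag rules consistently. Use tags such as production, staging, and team-platform for filtering and organization in the Find API and the UI.
Use Kibana Spaces to isolate rules by team or environment. For non-default spaces, prefix API paths with /s/<space_id>/. Connectors are also space-scoped, so create matching connectors in each space.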

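Common Pitfalls

Missing kbn-xsrf header. All POST, PUT, and DELETE requests require kbn-xsrf: true or any truthy value. Omitting it returns a 400 error.
Wrong consumer value. Using an invalid consumer (e.g., observability instead of infrastructure) causes a 400 error. Check the rule type's supported consumers via GET /api/alerting/rule_types.
Immutable fields on update. You cannot change rule_type_id or consumer with PUT. You must delete and recreate the rule.
Rule-level notify_when and throttle are deprecated. Setting these at the rule level still works but conflicts with action-level frequency settings. Always use frequency inside each action object.
Rule ID conflicts. A POST to /api/alerting/rule/{id} with an existing ID returns 409. Either omit the ID so that one is generated, or check for existence first.
API key ownership. Rules run using the API key of the user who created or last updated them. If that user's permissions change or the user is deleted, the rule may fail silently. Use _update_api_key to re-associate the rule.
Too many actions per rule. Rules that generate thousands of alerts with multiple actions can clog Task Manager. The server setting xpack.alerting.rules.run.actions.max (default varies) limits the number of actions per run. Design rules to use alert summaries or to limit term sizes.
Long-running rules. Rules that run expensive queries are cancelled after xpack.alerting.rules.run.timeout (default 5m). When a run is cancelled, all alerts and actions from that run are discarded. Optimize the query or increase the timeout for specific rule types.
Concurrent update conflicts. PUT returns 409 if another user modified the rule since you last read it. Always GET the latest version before updating.
Import/export loses secrets. Rules exported via Saved Objects are disabled on import. Connectors lose their secrets and must be reconfigured.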

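Examples

Create a threshold alert: "Alert me when CPU exceeds 90% on any host for 5 minutes." Use rule_type_id: ".index-threshold", aggField: "system.cpu.total.pct", threshold: [0.9], and timeWindowSize: 5. Attach a PagerDuty action on "threshold met" and a matching Recovered action to auto-close incidents.

Find rules by tag: "Show all production alerting rules." Use GET /api/alerting/rules/_find with filter=alert.attributes.tags:"production" and sort_field=name to page through results.

Pause a rule temporarily: "Disable rule abc123 until next Monday." Use POST /api/alerting/rule/abc123/_disable. Re-enable with _enable when ready; the rule retains all configuration while disabled.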

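Guidelines

Include kbn-xsrf: true on every POST, PUT, and DELETE request; omitting it returns 400.
Set frequency inside each action object; rule-level notify_when and throttle are deprecated.
rule_type_id and consumer are immutable after creation; delete and recreate the rule to change them.
Prefix paths with /s/<space_id>/api/alerting/ for non-default Kibana Spaces.
Always pair an active action with a Recovered action to auto-close PagerDuty, Jira, and ServiceNow incidents.
Run GET /api/alerting/rule_types first to discover valid consumer values and action group names.
Use alert_delay to suppress transient spikes; use the flapping object to reduce noise from unstable conditions.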

🇺🇸English

Kibana Alerting Rules

Core Concepts

A rule has three parts: conditions (what to detect), schedule (how often to check), and actions (what happens when conditions are met). When conditions are met, the rule creates alerts, which trigger actions via connectors.

Authentication

All alerting API calls require either API key auth or Basic auth. Every mutating request must include the kbn-xsrf header.

kbn-xsrf: true
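
The header rules above can be sketched in a few lines of Python. This is only an illustration: the helper name `build_headers`, its `has_body` parameter, and the choice of which methods count as mutating are assumptions made for the example, not part of any Kibana client library. Only the `Authorization: ApiKey` scheme and the `kbn-xsrf` header come from this guide.

```python
# Hypothetical helper for illustration; not part of any Kibana client library.
MUTATING_METHODS = {"POST", "PUT", "DELETE"}


def build_headers(api_key: str, method: str, has_body: bool = False) -> dict:
    """Build HTTP headers for a Kibana alerting API request."""
    headers = {"Authorization": f"ApiKey {api_key}"}
    if method.upper() in MUTATING_METHODS:
        # Mutating requests need the kbn-xsrf header.
        headers["kbn-xsrf"] = "true"
    if has_body:
        # Only requests that carry a JSON body need a content type.
        headers["Content-Type"] = "application/json"
    return headers
```

A read-only call such as `build_headers(key, "GET")` yields just the `Authorization` header, while a create call would use `build_headers(key, "POST", has_body=True)`.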

Required Privileges

all privileges for the appropriate Kibana feature (e.g., Stack Rules, Observability, Security)
read privileges for Actions and Connectors (to attach actions to rules)

API Reference

Base path: <kibana_url>/api/alerting (or /s/<space_id>/api/alerting for non-default spaces).

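Operation	Method	Endpoint
Create rule	POST	`/api/alerting/rule/{id}`
Update rule	PUT	`/api/alerting/rule/{id}`
Get rule	GET	`/api/alerting/rule/{id}`
Delete rule	DELETE	`/api/alerting/rule/{id}`
Find rules	GET	`/api/alerting/rules/_find`
List rule types	GET	`/api/alerting/rule_types`
Enable rule	POST	`/api/alerting/rule/{id}/_enable`
Disable rule	POST	`/api/alerting/rule/{id}/_disable`
Mute all alerts	POST	`/api/alerting/rule/{id}/_mute_all`
Unmute all alerts	POST	`/api/alerting/rule/{id}/_unmute_all`
Mute a specific alert	POST	`/api/alerting/rule/{rule_id}/alert/{alert_id}/_mute`
Unmute a specific alert	POST	`/api/alerting/rule/{rule_id}/alert/{alert_id}/_unmute`
Update API key	POST	`/api/alerting/rule/{id}/_update_api_key`
Create snooze schedule	POST	`/api/alerting/rule/{id}/snooze_schedule`
Delete snooze schedule	DELETE	`/api/alerting/rule/{ruleId}/snooze_schedule/{scheduleId}`
Health check	GET	`/api/alerting/_health`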
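
As a small sketch of how the base path and the space prefix combine, the Python functions below assemble rule URLs. The function names are hypothetical, and treating the space ID `default` as "no prefix" is an assumption of this example. Leaving out the rule ID produces the bare `/rule` path, which matches the option of omitting the ID on create so that one is generated.

```python
from typing import Optional
from urllib.parse import quote


def alerting_base(kibana_url: str, space_id: Optional[str] = None) -> str:
    """Return the alerting API base path, with /s/<space_id> for non-default spaces."""
    base = kibana_url.rstrip("/")
    if space_id and space_id != "default":  # assumption: "default" needs no prefix
        base += f"/s/{quote(space_id, safe='')}"
    return f"{base}/api/alerting"


def rule_url(kibana_url: str, rule_id: Optional[str] = None,
             space_id: Optional[str] = None) -> str:
    """Return the URL for one rule, or the create URL when rule_id is None."""
    url = f"{alerting_base(kibana_url, space_id)}/rule"
    if rule_id is not None:
        url += f"/{quote(rule_id, safe='')}"
    return url
```

For example, `rule_url("https://my-kibana:5601", "my-rule-id", "team-a")` gives `https://my-kibana:5601/s/team-a/api/alerting/rule/my-rule-id`.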

Creating a Rule

Required Fields

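Field	Type	Description
`name`	string	Display name (does not need to be unique)
`rule_type_id`	string	The rule type (e.g., `.es-query`, `.index-threshold`)
`consumer`	string	Owning app: `alerts`, `apm`, `discover`, `infrastructure`, `logs`, `metrics`, `ml`, `monitoring`, `securitySolution`, `siem`, `stackAlerts`, `uptime`
`params`	object	Rule-type-specific parameters
`schedule`	object	Check interval, e.g., `{"interval": "5m"}`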

Optional Fields

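Field	Type	Description
`actions`	array	Actions to run when conditions are met (each references a connector)
`tags`	array	Tags for organizing rules
`enabled`	boolean	Whether the rule runs immediately (default: true)
`notify_when`	string	`onActionGroupChange`, `onActiveAlert`, or `onThrottleInterval` (prefer setting per-action instead)
`alert_delay`	object	Alert only after N consecutive matches, e.g., `{"active": 3}`
`flapping`	object/null	Override the flapping detection settings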

Example: Create an Elasticsearch Query Rule

curl -X POST "https://my-kibana:5601/api/alerting/rule/my-rule-id" \
  -H "kbn-xsrf: true" \
  -H "Content-Type: application/json" \
  -H "Authorization: ApiKey <your-api-key>" \
  -d '{
    "name": "High error rate",
    "rule_type_id": ".es-query",
    "consumer": "stackAlerts",
    "schedule": { "interval": "5m" },
    "params": {
      "index": ["logs-*"],
      "timeField": "@timestamp",
      "esQuery": "{\"query\":{\"match\":{\"log.level\":\"error\"}}}",
      "threshold": [100],
      "thresholdComparator": ">",
      "timeWindowSize": 5,
      "timeWindowUnit": "m",
      "size": 100
    },
    "actions": [
      {
        "id": "my-slack-connector-id",
        "group": "query matched",
        "params": {
          "message": "Alert: {{rule.name}} - {{context.hits}} hits detected"
        },
        "frequency": {
          "summary": false,
          "notify_when": "onActionGroupChange"
        }
      }
    ],
    "tags": ["production", "errors"]
  }'

The same structure applies to other rule types — set the appropriate rule_type_id (e.g., .index-threshold, .es-query) and provide the matching params object. Use GET /api/alerting/rule_types to discover params schemas.
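
One detail of the curl example is easy to miss: `esQuery` is a JSON document serialized into a string, which is why its quotes are escaped inside the request body. The Python sketch below rebuilds the same payload so that the double encoding is explicit. The function name and its parameters are hypothetical, and the fixed values are copied from the example above rather than being recommended defaults.

```python
import json


def build_es_query_rule(name, index, query, threshold, interval="5m"):
    """Build a create-rule body for the .es-query rule type (illustrative only)."""
    return {
        "name": name,
        "rule_type_id": ".es-query",
        "consumer": "stackAlerts",
        "schedule": {"interval": interval},
        "params": {
            "index": index,
            "timeField": "@timestamp",
            # esQuery is a string holding JSON, so the query is serialized here
            # and then serialized again as part of the whole request body.
            "esQuery": json.dumps(query, separators=(",", ":")),
            "threshold": [threshold],
            "thresholdComparator": ">",
            "timeWindowSize": 5,
            "timeWindowUnit": "m",
            "size": 100,
        },
    }


rule = build_es_query_rule(
    "High error rate",
    ["logs-*"],
    {"query": {"match": {"log.level": "error"}}},
    100,
)
body = json.dumps(rule)  # the string to send as the POST body
```

Building the query as a dictionary and letting `json.dumps` handle the escaping avoids hand-written backslashes, which are a common source of malformed requests.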

Updating a Rule

PUT /api/alerting/rule/{id} — send the complete rule body. rule_type_id and consumer are immutable after creation. Returns 409 Conflict if another user updated the rule concurrently; re-fetch and retry.
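
The re-fetch-and-retry advice can be sketched as a small loop. In this Python sketch the HTTP calls are injected as callables so that the logic stays independent of any client library. The names `update_rule_with_retry`, `get_rule`, `put_rule`, and `mutate` are hypothetical. The `UPDATABLE_FIELDS` tuple is an assumption about which top-level fields the PUT body accepts; it leaves out rule_type_id and consumer because they are immutable, and it should be checked against the API documentation for your Kibana version.

```python
from typing import Callable

# Assumption: top-level fields accepted in the PUT body.
UPDATABLE_FIELDS = ("name", "tags", "schedule", "params", "actions")


def update_rule_with_retry(
    get_rule: Callable[[], dict],
    put_rule: Callable[[dict], int],
    mutate: Callable[[dict], None],
    max_attempts: int = 3,
) -> int:
    """Fetch, modify, and PUT a rule; on 409 re-fetch and try again.

    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        current = get_rule()  # always start from the latest version
        body = {k: current[k] for k in UPDATABLE_FIELDS if k in current}
        mutate(body)  # apply the caller's change to the fresh copy
        status = put_rule(body)
        if status == 409:
            continue  # someone else updated the rule; re-fetch and retry
        if status >= 400:
            raise RuntimeError(f"update failed with HTTP {status}")
        return attempt
    raise RuntimeError(f"still conflicting after {max_attempts} attempts")
```

Applying the change inside the loop, rather than once up front, matters: each retry re-applies it to the newest copy, so the other user's concurrent edit is not overwritten.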

Finding Rules

curl -X GET "https://my-kibana:5601/api/alerting/rules/_find?per_page=20&page=1&search=cpu&sort_field=name&sort_order=asc" \
  -H "Authorization: ApiKey <your-api-key>"

Query parameters: per_page, page, search, default_search_operator, search_fields, sort_field, sort_order, has_reference, fields, filter, filter_consumers.

Use the filter parameter with KQL syntax for advanced queries:

filter=alert.attributes.tags:"production"
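
Because the KQL filter contains quotes and a colon, it has to be percent-encoded when placed in a URL. The following Python sketch builds a find URL with the standard library. The function name and its defaults are hypothetical; the query parameter names are the ones listed above.

```python
from urllib.parse import urlencode


def find_rules_url(kibana_url, *, tag=None, search=None, page=1, per_page=20):
    """Build a _find URL, optionally filtering by tag with a KQL filter."""
    query = {
        "page": page,
        "per_page": per_page,
        "sort_field": "name",
        "sort_order": "asc",
    }
    if search:
        query["search"] = search
    if tag:
        # urlencode percent-encodes the quotes and colon of the KQL filter.
        query["filter"] = f'alert.attributes.tags:"{tag}"'
    return f"{kibana_url.rstrip('/')}/api/alerting/rules/_find?{urlencode(query)}"


url = find_rules_url("https://my-kibana:5601", tag="production")
```

This sketch does not escape quote characters inside the tag itself, so it is only suitable for simple tag values.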

Lifecycle Operations

# Enable
curl -X POST ".../api/alerting/rule/{id}/_enable" -H "kbn-xsrf: true"

# Disable
curl -X POST ".../api/alerting/rule/{id}/_disable" -H "kbn-xsrf: true"

# Mute all alerts
curl -X POST ".../api/alerting/rule/{id}/_mute_all" -H "kbn-xsrf: true"

# Mute specific alert
curl -X POST ".../api/alerting/rule/{rule_id}/alert/{alert_id}/_mute" -H "kbn-xsrf: true"

# Delete
curl -X DELETE ".../api/alerting/rule/{id}" -H "kbn-xsrf: true"

Terraform Provider

Use the elasticstack provider resource elasticstack_kibana_alerting_rule.

terraform {
  required_providers {
    elasticstack = {
      source  = "elastic/elasticstack"
    }
  }
}

provider "elasticstack" {
  kibana {
    endpoints = ["https://my-kibana:5601"]
    api_key   = var.kibana_api_key
  }
}

resource "elasticstack_kibana_alerting_rule" "cpu_alert" {
  name         = "CPU usage critical"
  consumer     = "stackAlerts"
  rule_type_id = ".index-threshold"
  interval     = "1m"
  enabled      = true

  params = jsonencode({
    index              = ["metrics-*"]
    timeField          = "@timestamp"
    aggType            = "avg"
    aggField           = "system.cpu.total.pct"
    groupBy            = "top"
    termField          = "host.name"
    termSize           = 10
    threshold          = [0.9]
    thresholdComparator = ">"
    timeWindowSize     = 5
    timeWindowUnit     = "m"
  })

  tags = ["infrastructure", "production"]
}

Key Terraform notes:

params must be passed as a JSON-encoded string via jsonencode()
Use elasticstack_kibana_action_connector data source or resource to reference connector IDs in actions
Import existing rules: terraform import elasticstack_kibana_alerting_rule.my_rule <space_id>/<rule_id> (use default for the default space)

Triggering Kibana Workflows from Rules

Preview feature — available from Elastic Stack 9.3 and Elastic Cloud Serverless. APIs may change.

Attach a workflow as a rule action using the workflow ID as the connector ID. Set params: {} — alert context flows automatically through the event object inside the workflow.

curl -X PUT "https://my-kibana:5601/api/alerting/rule/my-rule-id" \
  -H "kbn-xsrf: true" \
  -H "Content-Type: application/json" \
  -H "Authorization: ApiKey <your-api-key>" \
  -d '{
    "name": "High error rate",
    "schedule": { "interval": "5m" },
    "params": { ... },
    "actions": [
      {
        "id": "<workflow-id>",
        "group": "query matched",
        "params": {},
        "frequency": { "summary": false, "notify_when": "onActionGroupChange" }
      }
    ]
  }'

In the UI: Stack Management > Rules > Actions > Workflows. Only enabled: true workflows appear in the picker.

For workflow YAML structure, {{ event }} context fields, step types, and patterns, refer to the kibana-connectors skill if available.

Connectors and Actions in Rules

Each action references a connector by ID, an action group, action params (using Mustache templates), and a per-action frequency object. Key fields:

group: which trigger state fires this action (e.g., "query matched", "Recovered"). Discover valid groups via GET /api/alerting/rule_types.
frequency.summary: true for a digest of all alerts; false for per-alert.
frequency.notify_when: onActionGroupChange | onActiveAlert | onThrottleInterval.
frequency.throttle: minimum repeat interval (e.g., "10m"); only applies with onThrottleInterval.

For full reference on action structure, Mustache variables ({{rule.name}}, {{context.*}}, {{alerts.new.count}}), Mustache lambdas (EvalMath, FormatDate, ParseHjson), recovery actions, and multi-channel patterns, refer to the kibana-connectors skill if available.
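
To make the shape of an action object concrete, here is a Python sketch that assembles one and checks the frequency fields against each other. The function name `make_action` is hypothetical. The rule that a throttle must accompany onThrottleInterval and is rejected otherwise is a deliberately strict assumption of this sketch, based on this guide's note that the throttle only applies with onThrottleInterval; the API itself may be more lenient.

```python
NOTIFY_WHEN = {"onActionGroupChange", "onActiveAlert", "onThrottleInterval"}


def make_action(connector_id, group, params, notify_when,
                summary=False, throttle=None):
    """Build one rule action with a per-action frequency object."""
    if notify_when not in NOTIFY_WHEN:
        raise ValueError(f"unknown notify_when: {notify_when!r}")
    # Assumption of this sketch: throttle and onThrottleInterval go together.
    if (notify_when == "onThrottleInterval") != (throttle is not None):
        raise ValueError("throttle must be set if and only if "
                         "notify_when is onThrottleInterval")
    frequency = {"summary": summary, "notify_when": notify_when}
    if throttle is not None:
        frequency["throttle"] = throttle
    return {
        "id": connector_id,
        "group": group,
        "params": params,
        "frequency": frequency,
    }
```

A periodic digest, for instance, would be `make_action("my-slack-connector-id", "query matched", {"message": "..."}, "onThrottleInterval", summary=True, throttle="30m")`.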

Best Practices

Set action frequency per action, not per rule. The notify_when field at the rule level is deprecated in favor of per-action frequency objects. If you set it at the rule level and later edit the rule in the Kibana UI, it is automatically converted to action-level values.
Use alert summaries to reduce notification noise. Instead of sending one notification per alert, configure actions to send periodic summaries at a custom interval. Use "summary": true and set a throttle interval. This is especially valuable for rules that monitor many hosts or documents.
Choose the right action frequency for each channel. Use onActionGroupChange for paging/ticketing systems (fire once, resolve once). Use onActiveAlert for audit logging to an Index connector. Use onThrottleInterval with a throttle like "30m" for dashboards or lower-priority notifications.

Common Pitfalls

Missing kbn-xsrf header. All POST, PUT, DELETE requests require kbn-xsrf: true or any truthy value. Omitting it returns a 400 error.
Wrong consumer value. Using an invalid consumer (e.g., observability instead of infrastructure) causes a 400 error. Check the rule type's supported consumers via GET /api/alerting/rule_types.
Immutable fields on update. You cannot change rule_type_id or consumer with PUT. You must delete and recreate the rule.
Rule-level notify_when and throttle are deprecated. Setting these at the rule level still works but conflicts with action-level frequency settings. Always use frequency inside each action object.

Examples

Create a threshold alert: "Alert me when CPU exceeds 90% on any host for 5 minutes." Use rule_type_id: ".index-threshold", aggField: "system.cpu.total.pct", threshold: [0.9], and timeWindowSize: 5. Attach a PagerDuty action on "threshold met" and a matching Recovered action to auto-close incidents.

Find rules by tag: "Show all production alerting rules." GET /api/alerting/rules/_find with filter=alert.attributes.tags:"production" and sort_field=name to page through results.

Pause a rule temporarily: "Disable rule abc123 until next Monday." POST /api/alerting/rule/abc123/_disable. Re-enable with _enable when ready; the rule retains all configuration while disabled.
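
The "Find rules by tag" example mentions paging through results. A minimal Python sketch of that loop follows. It assumes that each _find response is a JSON object with a `data` list and a `total` count, and it takes the page fetcher as a hypothetical callable `fetch_page(page, per_page)` so that no particular HTTP client is implied.

```python
from typing import Callable, Iterator


def iter_rules(fetch_page: Callable[[int, int], dict],
               per_page: int = 20) -> Iterator[dict]:
    """Yield every rule by requesting pages until `total` has been reached."""
    page, seen = 1, 0
    while True:
        result = fetch_page(page, per_page)
        rules = result["data"]
        for rule in rules:
            yield rule
        seen += len(rules)
        # Stop on an empty page as well, in case `total` changes mid-scan.
        if not rules or seen >= result["total"]:
            return
        page += 1
```

Stopping on an empty page in addition to the `total` check keeps the loop from spinning if rules are deleted while the scan is in progress.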

Guidelines

Include kbn-xsrf: true on every POST, PUT, and DELETE; omitting it returns 400.
Set frequency inside each action object — rule-level notify_when and throttle are deprecated.
rule_type_id and consumer are immutable after creation; delete and recreate the rule to change them.
Prefix paths with /s/<space_id>/api/alerting/ for non-default Kibana Spaces.
Always pair an active action with a Recovered action to auto-close PagerDuty, Jira, and ServiceNow incidents.
Run GET /api/alerting/rule_types first to discover valid consumer values and action group names.

Best Practices (continued)

Always add a recovery action. Rules without a recovery action leave incidents open in PagerDuty, Jira, and ServiceNow indefinitely. Use the connector's native close/resolve event action (e.g., eventAction: "resolve" for PagerDuty) in the Recovered action group.

Set a reasonable check interval. The minimum recommended interval is 1m. Very short intervals across many rules clog Task Manager throughput and increase schedule drift. The server setting xpack.alerting.rules.minimumScheduleInterval.value enforces this.

Use alert_delay to suppress transient spikes. Setting {"active": 3} means the alert only fires after 3 consecutive runs match the condition, filtering out brief anomalies.

Enable flapping detection. Alerts that rapidly switch between active and recovered are marked as "flapping" and notifications are suppressed. This is on by default but can be tuned per-rule with the flapping object.

Use server.publicBaseUrl for deep links. Set server.publicBaseUrl in kibana.yml so that {{rule.url}} and {{kibanaBaseUrl}} variables resolve to valid URLs in notifications.

Tag rules consistently. Use tags like production, staging, team-platform for filtering and organization in the Find API and UI.

Use Kibana Spaces to isolate rules by team or environment. Prefix API paths with /s/<space_id>/ for non-default spaces. Connectors are also space-scoped, so create matching connectors in each space.
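
The recovery-action advice can be illustrated by building the trigger and resolve actions as a pair. In this Python sketch only eventAction: "resolve" comes from this guide. The other PagerDuty params (`dedupKey`, `summary`, `severity`) and the `{{rule.id}}:{{alert.id}}` key are assumptions about the PagerDuty connector, and the function name is hypothetical. The group names are passed in by the caller and should be confirmed via GET /api/alerting/rule_types.

```python
def pagerduty_action_pair(connector_id, active_group, recovered_group):
    """Return a trigger action and a matching resolve action (illustrative)."""
    # Assumption: sharing one dedup key lets the resolve event close
    # the incident that the trigger event opened.
    dedup_key = "{{rule.id}}:{{alert.id}}"
    frequency = {"summary": False, "notify_when": "onActionGroupChange"}
    trigger = {
        "id": connector_id,
        "group": active_group,
        "params": {
            "eventAction": "trigger",
            "dedupKey": dedup_key,
            "summary": "{{rule.name}} is active",
            "severity": "critical",
        },
        "frequency": dict(frequency),
    }
    resolve = {
        "id": connector_id,
        "group": recovered_group,
        "params": {"eventAction": "resolve", "dedupKey": dedup_key},
        "frequency": dict(frequency),
    }
    return [trigger, resolve]


# Group names as written in this guide; verify the exact IDs for your rule type.
actions = pagerduty_action_pair("my-pagerduty-connector-id", "threshold met", "Recovered")
```

Generating both actions from one function makes it harder to add a paging action and forget its recovery counterpart.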

Common Pitfalls (continued)

Rule ID conflicts. POST to /api/alerting/rule/{id} with an existing ID returns 409. Either omit the ID to auto-generate, or check existence first.

API key ownership. Rules run using the API key of the user who created or last updated them. If that user's permissions change or the user is deleted, the rule may fail silently. Use _update_api_key to re-associate.

Too many actions per rule. Rules generating thousands of alerts with multiple actions can clog Task Manager. The server setting xpack.alerting.rules.run.actions.max (default varies) limits actions per run. Design rules to use alert summaries or limit term sizes.

Long-running rules. Rules that run expensive queries are cancelled after xpack.alerting.rules.run.timeout (default 5m). When cancelled, all alerts and actions from that run are discarded. Optimize queries or increase the timeout for specific rule types.

Concurrent update conflicts. PUT returns 409 if the rule was modified by another user since you last read it. Always GET the latest version before updating.

Import/export loses secrets. Rules exported via Saved Objects are disabled on import. Connectors lose their secrets and must be re-configured.

Guidelines (continued)

Use alert_delay to suppress transient spikes; use the flapping object to reduce noise from unstable conditions.
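
The effect of alert_delay can be modeled with a consecutive-match counter. The Python sketch below is a simplified model of the behavior described in this guide, not Kibana's implementation: it ignores recovery handling and flapping, and the function name is hypothetical.

```python
from typing import Iterable, List


def runs_that_alert(matches: Iterable[bool], active: int) -> List[bool]:
    """For each rule run, report whether the alert fires under alert_delay."""
    result, streak = [], 0
    for matched in matches:
        # A run that does not match resets the consecutive-match streak.
        streak = streak + 1 if matched else 0
        result.append(streak >= active)
    return result


# Two matches, a miss, then four matches, with {"active": 3}:
history = runs_that_alert([True, True, False, True, True, True, True], active=3)
# history == [False, False, False, False, False, True, True]
```

The two-run spike at the start never alerts, and the sustained condition alerts from its third consecutive run onward.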
