Elastic Security Alert Triage Skill: Automated Security Alert Investigation, Classification, and Case Management

security-alert-triage by elastic/agent-skills

127 weekly installs

89 GitHub Stars

Install command

npx skills add https://github.com/elastic/agent-skills --skill security-alert-triage

Automation, Monitoring, Security

Alert Triage

Analyze Elastic Security alerts one at a time: gather context, classify, create a case, and acknowledge. This skill depends on the case-management skill for case creation.

Prerequisites

Before first use, install dependencies from the skills/security directory:

cd skills/security && npm install

Set the required environment variables (or add them to a .env file in the workspace root):

export ELASTICSEARCH_URL="https://your-cluster.es.cloud.example.com:443"
export ELASTICSEARCH_API_KEY="your-api-key"
export KIBANA_URL="https://your-cluster.kb.cloud.example.com:443"
export KIBANA_API_KEY="your-kibana-api-key"

Quick start

Run all commands from the workspace root. Always follow the sequence fetch → investigate → document → acknowledge. Call the tools directly; do not read the skill file or explore the workspace first.

node skills/security/alert-triage/scripts/fetch-next-alert.js
node skills/security/case-management/scripts/case-manager.js find --tags "agent_id:<id>"
node skills/security/alert-triage/scripts/run-query.js --query-file query.esql --type esql
node skills/security/case-management/scripts/case-manager.js create --title "..." --description "..." --tags "classification:..." "agent_id:<id>" --severity <level> --yes
node skills/security/case-management/scripts/case-manager.js attach-alert --case-id <id> --alert-id <id> --alert-index <index> --rule-id <uuid> --rule-name "<name>" --yes
node skills/security/alert-triage/scripts/acknowledge-alert.js --related --agent <id> --timestamp <ts> --window 60 --yes

Common multi-step workflows

Task	Tools to call (in order)
End-to-end triage	`fetch_next_alert` → `run_query` (context) → `case_manager` create (case) → `acknowledge_alert`
Gather context	`run_query` (process tree, network, related alerts)
Create case after classification	`case_manager` create → `case_manager` attach-alert
Acknowledge after triage	`acknowledge_alert` (use related mode for batches)

Always complete the full workflow: fetch → investigate → document → acknowledge. Do not stop after gathering context — create or update a case with findings before acknowledging.

Critical execution rules:

Start executing tools immediately — do not read SKILL.md, browse the workspace, or list files first.
For ES|QL queries, write the query to a temporary .esql file then pass it via --query-file. Do not use edit_file — use a single shell call with echo "..." > query.esql && node ... --query-file query.esql.
Keep context gathering focused: run 2-4 targeted queries (process tree, network, related alerts), not 10+.
Report only what tools return. Copy identifiers verbatim — do not paraphrase IDs, timestamps, or hostnames.

Critical principles

Do NOT classify prematurely. Gather ALL context before deciding benign/unknown/malicious.
Most alerts are false positives, even if they look alarming. Rule names like "Malicious Behavior" or severity "critical" are NOT evidence.
"Unknown" is acceptable and often correct when evidence is insufficient.
MALICIOUS requires strong corroborating evidence: persistence + C2, credential theft, lateral movement. Suspicious API calls alone are not enough.
Report tool output verbatim. Copy IDs, hostnames, timestamps, and counts exactly as returned by tools. Do not round numbers, abbreviate IDs, or paraphrase error messages.

Workflow

When triaging multiple alerts, group first, then triage each group:

- [ ] Step 0: Group alerts by agent/host and time window
- [ ] Step 1: Check existing cases
- [ ] Step 2: Gather full context (DO NOT SKIP)
- [ ] Step 3: Create or update case (only AFTER context gathered)
- [ ] Step 4: Acknowledge alert and all related alerts
- [ ] Step 5: Fetch next alert group and repeat

Step 0: Group alerts before triaging

When the user asks about multiple open alerts, group them first to avoid redundant investigation: query open alerts, group by agent.id, sub-group by time window (~5 min = likely one incident), triage each group as a single unit.

Use ES|QL for an overview (write to file first for PowerShell):

FROM .alerts-security.alerts-*
| WHERE kibana.alert.workflow_status == "open" AND @timestamp >= "<start>"
| STATS alert_count=COUNT(*), rules=VALUES(kibana.alert.rule.name) BY agent.id
| SORT alert_count DESC

For full query templates, see references/classification-guide.md.
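The grouping rule above (by agent.id, then by a window of about 5 minutes) can also be applied client-side once the open alerts are in hand. The following is a minimal sketch, not part of the skill: the flat `{ id, agentId, timestamp }` alert shape and the `groupAlerts` helper are assumptions made for illustration, and the choice to anchor each window on the first alert of a sub-group is one reasonable reading of "time window".

```javascript
// Sketch: group alerts by agent, then split each agent's alerts into
// sub-groups whenever an alert falls outside the window opened by the
// first alert of the current sub-group.
// The { id, agentId, timestamp } shape is hypothetical; it is not the
// output format of fetch-next-alert.js or run-query.js.
function groupAlerts(alerts, windowMs = 5 * 60 * 1000) {
  const byAgent = new Map();
  for (const alert of alerts) {
    if (!byAgent.has(alert.agentId)) byAgent.set(alert.agentId, []);
    byAgent.get(alert.agentId).push(alert);
  }

  const groups = [];
  for (const [agentId, agentAlerts] of byAgent) {
    // Oldest first, so each sub-group is anchored on its earliest alert.
    const sorted = [...agentAlerts].sort(
      (a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp)
    );
    let current = null;
    for (const alert of sorted) {
      const t = Date.parse(alert.timestamp);
      if (!current || t - current.startMs > windowMs) {
        current = { agentId, startMs: t, alerts: [] };
        groups.push(current);
      }
      current.alerts.push(alert);
    }
  }
  return groups;
}

// Sample data (made up for the example).
const sampleAlerts = [
  { id: "a1", agentId: "agent-1", timestamp: "2024-03-01T12:00:00Z" },
  { id: "a2", agentId: "agent-1", timestamp: "2024-03-01T12:03:00Z" },
  { id: "a3", agentId: "agent-1", timestamp: "2024-03-01T14:00:00Z" },
  { id: "a4", agentId: "agent-2", timestamp: "2024-03-01T12:01:00Z" },
];
const alertGroups = groupAlerts(sampleAlerts);
for (const group of alertGroups) {
  console.log(group.agentId, group.alerts.map((a) => a.id).join(","));
}
```

With the sample data, a1 and a2 land in one group, while a3 (two hours later) and a4 (a different agent) each form their own, so three units would be triaged instead of four alerts.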

Step 1: Check existing cases

Before creating a new case, check if this alert belongs to an existing one. Use the case-management skill:

node skills/security/case-management/scripts/case-manager.js find --tags "agent_id:<agent_id>"
node skills/security/case-management/scripts/case-manager.js cases-for-alert --alert-id <alert_id>

Look for cases with the same agent ID, user, or related detection rule within a similar time window.

Note: find --search may return 500 errors on Serverless. Use find --tags or list instead.

Step 2: Gather context

This is the most important step. Do not skip or shortcut it. Complete ALL substeps before forming any classification opinion.

Time range warning: Alerts may be days or weeks old. NEVER use relative time like NOW() - 1 HOUR. Extract the alert's @timestamp and build queries around that time with a window of ±1 hour.
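As an illustration of anchoring on the alert time rather than on "now", the sketch below turns an alert's @timestamp into absolute start and end bounds. The `absoluteWindow` helper is hypothetical and not part of the skill's scripts; the one-hour defaults mirror the guidance above.

```javascript
// Sketch: derive absolute ISO-8601 bounds around an alert's @timestamp.
// absoluteWindow is a hypothetical helper, not one of the skill's scripts.
const ONE_HOUR_MS = 60 * 60 * 1000;

function absoluteWindow(alertTimestamp, beforeMs = ONE_HOUR_MS, afterMs = ONE_HOUR_MS) {
  const center = Date.parse(alertTimestamp);
  if (Number.isNaN(center)) {
    throw new Error(`Unparseable timestamp: ${alertTimestamp}`);
  }
  return {
    start: new Date(center - beforeMs).toISOString(),
    end: new Date(center + afterMs).toISOString(),
  };
}

const triageWindow = absoluteWindow("2024-03-01T12:00:00.000Z");
console.log(triageWindow.start, triageWindow.end);
// prints: 2024-03-01T11:00:00.000Z 2024-03-01T13:00:00.000Z
```

The resulting strings can be substituted for the `<start>` style placeholders in the query templates, so the query returns the same rows no matter when the triage runs.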

Substeps: (2a) Related alerts on the same agent/user; (2b) Rule frequency across the environment (high frequency = false-positive-prone); (2c) Entity context: process tree, network, registry, files; (2d) Behavior investigation: persistence, C2, lateral movement, credential access.

Example — process tree (use ES|QL with KEEP; avoid --full which produces 10K+ lines):

FROM logs-endpoint.events.process-*
| WHERE agent.id == "<agent_id>" AND @timestamp >= "<alert_time - 5min>" AND @timestamp <= "<alert_time + 10min>"
  AND process.parent.name IS NOT NULL
  AND process.name NOT IN ("svchost.exe", "conhost.exe", "agentbeat.exe")
| KEEP @timestamp, process.name, process.command_line, process.pid, process.parent.name, process.parent.pid
| SORT @timestamp | LIMIT 80
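To show how the placeholders in the process-tree template might be filled in, here is a small sketch that renders the query for a given agent and alert time. The `processTreeQuery` and `esqlString` helpers are hypothetical, and the escaping of backslashes and double quotes is an assumption about what a safe ES|QL string literal needs; the query text itself is the template above.

```javascript
// Sketch: render the process-tree template with concrete values.
// Both helpers are hypothetical; they are not part of the skill.

// Quote a value as an ES|QL string literal (assumes backslash and
// double-quote are the characters that need escaping).
function esqlString(value) {
  return '"' + String(value).replace(/\\/g, "\\\\").replace(/"/g, '\\"') + '"';
}

function processTreeQuery(agentId, alertTimestamp) {
  const t = Date.parse(alertTimestamp);
  if (Number.isNaN(t)) {
    throw new Error(`Unparseable timestamp: ${alertTimestamp}`);
  }
  // Same offsets as the template: 5 minutes before, 10 minutes after.
  const start = new Date(t - 5 * 60 * 1000).toISOString();
  const end = new Date(t + 10 * 60 * 1000).toISOString();
  return [
    "FROM logs-endpoint.events.process-*",
    `| WHERE agent.id == ${esqlString(agentId)} AND @timestamp >= "${start}" AND @timestamp <= "${end}"`,
    "  AND process.parent.name IS NOT NULL",
    '  AND process.name NOT IN ("svchost.exe", "conhost.exe", "agentbeat.exe")',
    "| KEEP @timestamp, process.name, process.command_line, process.pid, process.parent.name, process.parent.pid",
    "| SORT @timestamp | LIMIT 80",
  ].join("\n");
}

const treeQuery = processTreeQuery("agent-1", "2024-03-01T12:00:00Z");
console.log(treeQuery);
```

The rendered text would then be written to a .esql file and passed with --query-file, as the execution rules require.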

Data type	Index pattern
Alerts	`.alerts-security.alerts-*`
Processes	`logs-endpoint.events.process-*`
Network	`logs-endpoint.events.network-*`
Logs	`logs-*`

For full query templates and classification criteria, see references/classification-guide.md.

Step 3: Create or update case

After gathering context, create a case and attach the alert(s). When attaching, pass --rule-id and --rule-name; both are required, and the request fails with a 400 error without them:

node skills/security/case-management/scripts/case-manager.js create \
  --title "<concise summary>" \
  --description "<findings, IOCs, attack chain, MITRE techniques>" \
  --tags "classification:<benign|unknown|malicious>" "confidence:<0-100>" "mitre:<technique>" "agent_id:<id>" \
  --severity <low|medium|high|critical>

node skills/security/case-management/scripts/case-manager.js attach-alert \
  --case-id <case_id> --alert-id <alert_id> --alert-index <index> \
  --rule-id <rule_uuid> --rule-name "<rule name>"

# Multiple alerts: attach-alerts --alert-ids <id1> <id2>
# Add notes: add-comment --case-id <id> --comment "Findings..."

Case description: Summary (1-2 sentences); Attack chain; IOCs (hashes, IPs, paths); MITRE techniques; Behavioral findings; Response context (remediation, credentials at risk).
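The tag convention used in the create command (classification, confidence, mitre, agent_id) is easy to get subtly wrong, so a small validating helper may be useful. This is a sketch only: `buildCaseTags` is a hypothetical function, the validation rules are inferred from the placeholders in the command above, and the sample technique and agent values are made up.

```javascript
// Sketch: build the --tags values for case-manager.js create.
// buildCaseTags is hypothetical; the allowed values come from the
// placeholders shown in the create command.
const CLASSIFICATIONS = ["benign", "unknown", "malicious"];

function buildCaseTags({ classification, confidence, mitre, agentId }) {
  if (!CLASSIFICATIONS.includes(classification)) {
    throw new Error(`classification must be one of ${CLASSIFICATIONS.join(", ")}`);
  }
  if (!Number.isInteger(confidence) || confidence < 0 || confidence > 100) {
    throw new Error("confidence must be an integer from 0 to 100");
  }
  if (!agentId) {
    throw new Error("agentId is required");
  }
  const tags = [`classification:${classification}`, `confidence:${confidence}`];
  if (mitre) tags.push(`mitre:${mitre}`); // left out when no technique was identified
  tags.push(`agent_id:${agentId}`);
  return tags;
}

const caseTags = buildCaseTags({
  classification: "unknown",
  confidence: 40,
  mitre: "T1059.001",
  agentId: "agent-1",
});
console.log(caseTags.join(" "));
// prints: classification:unknown confidence:40 mitre:T1059.001 agent_id:agent-1
```

Keeping the agent_id tag in a fixed format matters because Step 1 finds existing cases with find --tags "agent_id:<agent_id>".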

Step 4: Acknowledge alerts

Acknowledge ALL related alerts together. Use --dry-run first to confirm scope, then run without it:

# By host name — preferred when triaging a host
node skills/security/alert-triage/scripts/acknowledge-alert.js --query --host <hostname> --dry-run
node skills/security/alert-triage/scripts/acknowledge-alert.js --query --host <hostname> --yes

# By agent ID — preferred when agent.id is known
node skills/security/alert-triage/scripts/acknowledge-alert.js --related --agent <id> --timestamp <ts> --window 60 --dry-run
node skills/security/alert-triage/scripts/acknowledge-alert.js --related --agent <id> --timestamp <ts> --window 60 --yes

Increase --window for longer attack chains (e.g., 300 for 5 minutes). Report the exact count of acknowledged alerts from the tool output. Pass --yes to skip the confirmation prompt (required when called by an agent).
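The dry-run-then-commit pattern can be sketched as a helper that builds both argument lists from the same inputs, so the preview and the real run are guaranteed to cover the same scope. The `relatedAckArgs` function is hypothetical; it only assembles the flags shown above and does not execute anything.

```javascript
// Sketch: build argument lists for acknowledge-alert.js in related mode.
// relatedAckArgs is a hypothetical helper; the flags are the ones
// documented above, and --window is in seconds (300 for 5 minutes).
const ACK_SCRIPT = "skills/security/alert-triage/scripts/acknowledge-alert.js";

function relatedAckArgs({ agentId, timestamp, windowSeconds = 60, dryRun = false }) {
  const args = [
    ACK_SCRIPT,
    "--related",
    "--agent", agentId,
    "--timestamp", timestamp,
    "--window", String(windowSeconds),
  ];
  // A dry run needs no confirmation; the real run passes --yes to skip the prompt.
  args.push(dryRun ? "--dry-run" : "--yes");
  return args;
}

const ackScope = { agentId: "agent-1", timestamp: "2024-03-01T12:00:00.000Z", windowSeconds: 300 };
const previewArgs = relatedAckArgs({ ...ackScope, dryRun: true });
const commitArgs = relatedAckArgs(ackScope);
console.log("node " + previewArgs.join(" "));
console.log("node " + commitArgs.join(" "));
```

Either list can be handed to `child_process.spawnSync("node", args)`; the second call should only be made after the dry-run output shows the expected scope.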

Step 5: Repeat

node skills/security/alert-triage/scripts/fetch-next-alert.js

Tool reference

fetch-next-alert.js

Fetches the oldest unacknowledged Elastic Security alert.

node skills/security/alert-triage/scripts/fetch-next-alert.js [--days <n>] [--json] [--full] [--verbose]

run-query.js

Runs KQL or ES|QL queries against Elasticsearch.

PowerShell warning: ES|QL queries contain pipe characters (|), which PowerShell interprets as shell pipes. ALWAYS use --query-file for ES|QL:

# Write query to file, then run
node skills/security/alert-triage/scripts/run-query.js --query-file query.esql --type esql
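When the caller is itself a Node.js program rather than a shell, the same idea can be sketched as follows: write the query to a file, then build an argument array for run-query.js. The `writeQueryFile` and `runQueryArgs` helpers are hypothetical, the temp-directory location is an arbitrary choice, and the start date in the sample query is made up.

```javascript
// Sketch: write an ES|QL query to a file and build the run-query.js
// arguments. Both helpers are hypothetical, not part of the skill.
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

function writeQueryFile(query, dir = os.tmpdir()) {
  const file = path.join(dir, "query.esql");
  fs.writeFileSync(file, query.trim() + "\n", "utf8");
  return file;
}

function runQueryArgs(queryFile) {
  return [
    "skills/security/alert-triage/scripts/run-query.js",
    "--query-file", queryFile,
    "--type", "esql",
  ];
}

// Overview query from Step 0 with a sample start time filled in.
const overviewQuery = `
FROM .alerts-security.alerts-*
| WHERE kibana.alert.workflow_status == "open" AND @timestamp >= "2024-03-01T00:00:00.000Z"
| STATS alert_count=COUNT(*), rules=VALUES(kibana.alert.rule.name) BY agent.id
| SORT alert_count DESC
`;

const esqlFile = writeQueryFile(overviewQuery);
console.log(["node", ...runQueryArgs(esqlFile)].join(" "));
```

Because the query text never appears on a command line, its pipe characters are never seen by a shell, which is the point of the --query-file rule.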

KQL queries without pipes can be passed directly:

node skills/security/alert-triage/scripts/run-query.js "agent.id:<id>" --index "logs-*" --days 7

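Arg	Description
`query`	KQL query (positional)
`--query-file`, `-q`	Read query from file (required for ES|QL on PowerShell)
`--type`, `-t`	`kql` or `esql` (default: kql)
`--index`, `-i`	Index pattern (default: `logs-*`)
`--size`, `-s`	Maximum number of results (default: 100)
`--days`, `-d`	Limit to the last N days
`--json`	Raw JSON output
`--full`	Full document source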

acknowledge-alert.js

Acknowledges alerts by updating workflow_status to acknowledged.

Mode	Command
Single	`node skills/security/alert-triage/scripts/acknowledge-alert.js <alert_id> --index <index> --yes`
Related	`node skills/security/alert-triage/scripts/acknowledge-alert.js --related --agent <id> --timestamp <ts> [--window 60] --yes`
By host	`node skills/security/alert-triage/scripts/acknowledge-alert.js --query --host <hostname> [--time-start <ts>] [--time-end <ts>] --yes`
Query	`node skills/security/alert-triage/scripts/acknowledge-alert.js --query --agent <id> [--time-start <ts>] [--time-end <ts>] --yes`
Dry run	Add `--dry-run` to any mode (no confirmation needed)
Confirm	All write modes prompt for confirmation; pass `--yes` to skip

Examples

"Fetch the next unacknowledged alert and triage it"
"Investigate alert ID abc-123 — gather context, classify, and create a case if malicious"
"Process the top 5 critical alerts from the last 24 hours"

Guidelines

Report only tool output — do not invent IDs, hostnames, IPs, or details not present in the tool response.
Preserve identifiers from the request — use exact values the user provides in tool calls and responses.
Confirm actions concisely using the tool's return data.
Distinguish facts from inference — label conclusions beyond tool output as your assessment.

Production use

All write operations (acknowledge-alert.js) prompt for confirmation. Pass --yes or -y to skip when called by an agent.
Use --dry-run before bulk acknowledgments to preview scope without modifying data.
The acknowledge script uses the Kibana Detection Engine API, which is compatible with both self-managed and Serverless deployments.
Verify that the environment variables point to the intended cluster before running any script. Acknowledgments cannot be undone.

Environment variables

Variable	Required	Description
`ELASTICSEARCH_URL`	Yes	Elasticsearch URL
`ELASTICSEARCH_API_KEY`	Yes	Elasticsearch API key
`KIBANA_URL`	Yes	Kibana URL (for case management)
`KIBANA_API_KEY`	Yes	Kibana API key (for case management)
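Since all four variables are required, a quick preflight check before running any script can catch a missing or empty value early. This is a minimal sketch; `missingEnv` is a hypothetical helper and the sample values reuse the placeholder URLs from the prerequisites.

```javascript
// Sketch: report which required environment variables are unset or empty.
// missingEnv is a hypothetical helper, not part of the skill.
const REQUIRED_ENV = [
  "ELASTICSEARCH_URL",
  "ELASTICSEARCH_API_KEY",
  "KIBANA_URL",
  "KIBANA_API_KEY",
];

function missingEnv(env = process.env) {
  return REQUIRED_ENV.filter((name) => !env[name] || env[name].trim() === "");
}

const missingVars = missingEnv({
  ELASTICSEARCH_URL: "https://your-cluster.es.cloud.example.com:443",
  KIBANA_URL: "https://your-cluster.kb.cloud.example.com:443",
});
console.log(missingVars.join(", "));
// prints: ELASTICSEARCH_API_KEY, KIBANA_API_KEY
```

Calling `missingEnv()` with no argument checks the real process environment. It only confirms that values are present, so checking that the URLs point to the intended cluster remains a manual step.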

Weekly Installs: 127

Repository: elastic/agent-skills

GitHub Stars: 89

First Seen: 11 days ago

Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)

Installed on: cursor (115), opencode (108), gemini-cli (108), github-copilot (108), codex (108), amp (108)
