Semgrep Security Scanning Guide: A Tutorial on Automated Code Auditing and Vulnerability Detection

semgrep by trailofbits/skills

1,400 weekly installs

3,900 GitHub Stars

Install command

npx skills add https://github.com/trailofbits/skills --skill semgrep

Development, Testing, Security




Semgrep Security Scan

Run a Semgrep scan with automatic language detection, parallel execution via Task subagents, and merged SARIF output.

Essential Principles

Always use --metrics=off: Semgrep sends telemetry by default; --config auto also phones home. Every semgrep command must include --metrics=off to prevent data leakage during security audits.
User must approve the scan plan (Step 3 is a hard gate): The original "scan this codebase" request is NOT approval. Present exact rulesets, target, engine, and mode; wait for explicit "yes"/"proceed" before spawning scanners.
Third-party rulesets are required, not optional: Trail of Bits, 0xdea, and Decurity rules catch vulnerabilities absent from the official registry. Include them whenever the detected language matches.
Spawn all scan Tasks in a single message: Parallel execution is the core performance advantage. Never spawn Tasks sequentially; always emit all Task tool calls in one response.
Always check for Semgrep Pro before scanning: Pro enables cross-file taint tracking and catches ~250% more true positives. Skipping the check means silently missing critical inter-file vulnerabilities.
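
One way to make the first principle hard to forget is to route every invocation through a small wrapper. The sketch below is illustrative only: semgrep_safe and the SEMGREP_BIN override are hypothetical names, not part of the skill, and the only Semgrep flag it relies on is the --metrics=off option named above.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the skill): runs semgrep with
# --metrics=off appended, and refuses any other --metrics setting.
# SEMGREP_BIN is an assumed override, handy for dry runs (SEMGREP_BIN=echo).
semgrep_safe() {
  local arg has_off=0
  for arg in "$@"; do
    case "$arg" in
      --metrics=off) has_off=1 ;;
      --metrics|--metrics=*)
        echo "refusing to run: '$arg' could enable telemetry" >&2
        return 2 ;;
    esac
  done
  if [ "$has_off" -eq 1 ]; then
    "${SEMGREP_BIN:-semgrep}" "$@"
  else
    # The flag is appended after the caller's arguments so that a leading
    # subcommand such as "scan" stays in first position.
    "${SEMGREP_BIN:-semgrep}" "$@" --metrics=off
  fi
}
```

A call such as semgrep_safe scan --config p/python /abs/path/to/repo would then always carry the flag, while semgrep_safe scan --metrics=on ... fails before Semgrep starts.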

When to Use

Security audit of a codebase
Finding vulnerabilities before code review
Scanning for known bug patterns
First-pass static analysis

When NOT to Use

Binary analysis → Use binary analysis tools
Already have Semgrep CI configured → Use existing pipeline
Need cross-file analysis but no Pro license → Consider CodeQL as alternative
Creating custom Semgrep rules → Use semgrep-rule-creator skill
Porting existing rules to other languages → Use semgrep-rule-variant-creator skill

Output Directory

All scan results, SARIF files, and temporary data are stored in a single output directory.

If the user specifies an output directory in their prompt, use it as OUTPUT_DIR.
If not specified, default to ./static_analysis_semgrep_1. If that already exists, increment to _2, _3, etc.

In both cases, always create the directory with mkdir -p before writing any files.

# Resolve output directory
if [ -n "$USER_SPECIFIED_DIR" ]; then
  OUTPUT_DIR="$USER_SPECIFIED_DIR"
else
  BASE="static_analysis_semgrep"
  N=1
  while [ -e "${BASE}_${N}" ]; do
    N=$((N + 1))
  done
  OUTPUT_DIR="${BASE}_${N}"
fi
mkdir -p "$OUTPUT_DIR/raw" "$OUTPUT_DIR/results"

The output directory is resolved once at the start of Step 1 and used throughout all subsequent steps.

$OUTPUT_DIR/
├── rulesets.txt                 # Approved rulesets (logged after Step 3)
├── raw/                         # Per-scan raw output (unfiltered)
│   ├── python-python.json
│   ├── python-python.sarif
│   ├── python-django.json
│   ├── python-django.sarif
│   └── ...
└── results/                     # Final merged output
    └── results.sarif

Prerequisites

Required: Semgrep CLI (semgrep --version). If not installed, see Semgrep installation docs.

Optional: Semgrep Pro — enables cross-file taint tracking, inter-procedural analysis, and additional languages (Apex, C#, Elixir). Check with:

semgrep --pro --validate --config p/default --metrics=off 2>/dev/null && echo "Pro available" || echo "OSS only"

Limitations: OSS mode cannot track data flow across files. Pro mode uses -j 1 for cross-file analysis (slower per ruleset, but parallel rulesets compensate).
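
The availability check can be folded into a helper that later steps reuse when building scan commands. This is a minimal sketch under stated assumptions: detect_engine_flag and SEMGREP_BIN are hypothetical names, and the check is the one shown above with --metrics=off added, in line with the first principle.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints "--pro" when the Pro validation check succeeds
# and prints nothing otherwise, so callers can splice the result into a
# scan command line. SEMGREP_BIN is an assumed override for testing.
detect_engine_flag() {
  if "${SEMGREP_BIN:-semgrep}" --pro --validate --config p/default \
       --metrics=off >/dev/null 2>&1; then
    echo "--pro"
  fi
}

# Example use (commented out so that sourcing this file has no side effects):
#   ENGINE_FLAG=$(detect_engine_flag)
#   echo "Engine: ${ENGINE_FLAG:-OSS only}"
```

Resolving the engine once and passing the result to every scanner keeps the plan presented in Step 3 consistent with what actually runs in Step 4.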

Scan Modes

Select mode in Step 2 of the workflow. Mode affects both scanner flags and post-processing.

Mode	Coverage	Findings Reported
Run all	All rulesets, all severity levels	Everything
Important only	All rulesets, pre- and post-filtered	Security vulns only, medium-high confidence/impact

Important only applies two filter layers:

Pre-filter: --severity MEDIUM --severity HIGH --severity CRITICAL (CLI flag)
Post-filter: JSON metadata; keeps only category=security, confidence∈{MEDIUM,HIGH}, impact∈{MEDIUM,HIGH}

See scan-modes.md for metadata criteria and jq filter commands.
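
The two filter layers might look roughly like this in shell. Treat it as a sketch, not the contents of scan-modes.md: the function names and the mode keywords all/important are hypothetical, and the jq expression assumes that each finding in Semgrep's JSON output carries its rule metadata under results[].extra.metadata.

```bash
#!/usr/bin/env bash
# Pre-filter: severity flags for the chosen mode. The flag values are the
# ones listed above; the mode keywords "all" and "important" are assumptions.
severity_flags() {
  case "$1" in
    all) ;;  # "Run all": no severity restriction
    important) printf '%s\n' "--severity MEDIUM --severity HIGH --severity CRITICAL" ;;
    *) echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Post-filter: keep security findings with MEDIUM/HIGH confidence and impact.
# Assumes the metadata layout results[].extra.metadata.{category,confidence,impact}.
POST_FILTER='
  def keep($v): $v == "MEDIUM" or $v == "HIGH";
  .results |= map(select(
    .extra.metadata.category == "security"
    and keep(.extra.metadata.confidence)
    and keep(.extra.metadata.impact)))'

# usage: post_filter raw.json > filtered.json   (requires jq)
post_filter() {
  jq "$POST_FILTER" "$1"
}
```

Findings whose metadata lacks any of the three fields are dropped by this filter, which errs on the side of a shorter report; the unfiltered output stays available in raw/.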

Orchestration Architecture

┌──────────────────────────────────────────────────────────────────┐
│ MAIN AGENT (this skill)                                          │
│ Step 1: Detect languages + check Pro availability                │
│ Step 2: Select scan mode + rulesets (ref: rulesets.md)           │
│ Step 3: Present plan + rulesets, get approval [⛔ HARD GATE]     │
│ Step 4: Spawn parallel scan Tasks (approved rulesets + mode)     │
│ Step 5: Merge results and report                                 │
└──────────────────────────────────────────────────────────────────┘
         │ Step 4
         ▼
┌─────────────────┐
│ Scan Tasks      │
│ (parallel)      │
├─────────────────┤
│ Python scanner  │
│ JS/TS scanner   │
│ Go scanner      │
│ Docker scanner  │
└─────────────────┘

Workflow

Follow the detailed workflow in scan-workflow.md. Summary:

Step	Action	Gate	Key Reference
1	Resolve output dir, detect languages + Pro availability	—	Use Glob, not Bash
2	Select scan mode + rulesets	—	rulesets.md
3	Present plan, get explicit approval	⛔ HARD	AskUserQuestion
4	Spawn parallel scan Tasks	—	scanner-task-prompt.md
5	Merge results and report	—	Merge script (below)

Task enforcement: On invocation, create 5 tasks with blockedBy dependencies (each step is blocked by the one before it). Step 3 is a HARD GATE: mark it complete ONLY after the user explicitly approves.

Merge command (Step 5):

uv run {baseDir}/scripts/merge_sarif.py "$OUTPUT_DIR/raw" "$OUTPUT_DIR/results/results.sarif"
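
Since the success criteria require results.sarif to exist and be valid JSON, a quick check after the merge can catch a failed or truncated write. The helper below is a hypothetical sketch; it assumes python3 is on the PATH and uses the standard-library json.tool module purely as a parser.

```bash
#!/usr/bin/env bash
# Hypothetical post-merge check: succeeds only if the file exists, is
# non-empty, and parses as JSON. It does not validate the SARIF schema.
check_sarif() {
  local file=$1
  if [ ! -s "$file" ]; then
    echo "missing or empty: $file" >&2
    return 1
  fi
  if ! python3 -m json.tool "$file" >/dev/null 2>&1; then
    echo "not valid JSON: $file" >&2
    return 1
  fi
}

# Example use (commented out so that sourcing this file has no side effects):
#   check_sarif "$OUTPUT_DIR/results/results.sarif" && echo "merge OK"
```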

Agents

Agent	Tools	Purpose
`static-analysis:semgrep-scanner`	Bash	Executes parallel semgrep scans for a language category

Use subagent_type: static-analysis:semgrep-scanner in Step 4 when spawning Task subagents.
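
Inside one scanner, running the rulesets of a language category concurrently could look roughly like the following. This is a shell analogue for illustration, not the skill's scanner-task-prompt.md: the function names and SEMGREP_BIN are hypothetical, the raw file naming is inferred from the output tree shown earlier, and only SARIF output is written here.

```bash
#!/usr/bin/env bash
# Derive the raw output stem from a language and a registry ruleset,
# e.g. ("python", "p/django") -> "python-django". Naming is inferred from
# the output tree; third-party rulesets may need a different scheme.
raw_stem() {
  local lang=$1 ruleset=$2
  printf '%s-%s\n' "$lang" "${ruleset##*/}"
}

# Run one ruleset against a target (preferably an absolute path) and write
# SARIF into $OUTPUT_DIR/raw/. SEMGREP_BIN is an assumed override for testing.
scan_ruleset() {
  local lang=$1 ruleset=$2 target=$3
  "${SEMGREP_BIN:-semgrep}" scan --metrics=off --config "$ruleset" \
    --sarif --output "$OUTPUT_DIR/raw/$(raw_stem "$lang" "$ruleset").sarif" \
    "$target"
}

# Launch every ruleset of one language category in the background, then
# wait for all of them; returns non-zero if any scan failed.
scan_category() {
  local lang=$1 target=$2
  shift 2
  local ruleset pid status=0
  local pids=()
  for ruleset in "$@"; do
    scan_ruleset "$lang" "$ruleset" "$target" &
    pids+=("$!")
  done
  for pid in "${pids[@]}"; do
    wait "$pid" || status=1
  done
  return "$status"
}
```

For example, scan_category python /abs/path/to/repo p/python p/django would start both scans at once and leave python-python.sarif and python-django.sarif in raw/.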

Rationalizations to Reject

Shortcut	Why It's Wrong
"User asked for scan, that's approval"	Original request ≠ plan approval. Present plan, use AskUserQuestion, await explicit "yes"
"Step 3 task is blocking, just mark complete"	Lying about task status defeats enforcement. Only mark complete after real approval
"I already know what they want"	Assumptions cause scanning wrong directories/rulesets. Present plan for verification
"Just use default rulesets"	User must see and approve exact rulesets before scan
"Add extra rulesets without asking"	Modifying approved list without consent breaks trust
"Third-party rulesets are optional"	Trail of Bits, 0xdea, Decurity catch vulnerabilities not in official registry — REQUIRED
"Use --config auto"	Sends metrics; less control over rulesets
"One Task at a time"	Defeats parallelism; spawn all Tasks together
"Pro is too slow, skip --pro"	Cross-file analysis catches ~250% more true positives; worth the time
"Semgrep handles GitHub URLs natively"	URL handling fails on repos with non-standard YAML; always clone first
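"Cleanup is optional"	Cloned repos pollute the user's workspace and accumulate across runs
"Use . or relative paths as target"	Subagents need absolute paths to avoid ambiguity
"Let the user pick an output dir later"	Output directory must be resolved in Step 1, before any files are created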
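
The "always clone first" and cleanup rules can be paired in two small helpers. This is a hedged sketch: clone_target and cleanup_repos are hypothetical names, and the sketch places checkouts in a repos/ subdirectory under $OUTPUT_DIR so that a single rm removes them.

```bash
#!/usr/bin/env bash
# Hypothetical helper: shallow-clone a remote repository into
# $OUTPUT_DIR/repos/<name> and print the local path to scan.
clone_target() {
  local url=$1 name
  name=$(basename "${url%.git}")
  mkdir -p "$OUTPUT_DIR/repos" || return 1
  # git's own messages go to stderr so stdout carries only the path.
  git clone --quiet --depth 1 "$url" "$OUTPUT_DIR/repos/$name" >&2 || return 1
  printf '%s\n' "$OUTPUT_DIR/repos/$name"
}

# Hypothetical helper: remove all cloned repositories once scanning is done.
cleanup_repos() {
  # Guard against an unset OUTPUT_DIR so this never expands to "rm -rf /repos".
  [ -n "${OUTPUT_DIR:-}" ] || return 1
  rm -rf -- "$OUTPUT_DIR/repos"
}
```

Calling cleanup_repos after the merge step, or from a trap on EXIT, keeps clones from accumulating across runs while leaving raw/ and results/ untouched.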

Reference Index

File	Content
rulesets.md	Complete ruleset catalog and selection algorithm
scan-modes.md	Pre/post-filter criteria and jq commands
scanner-task-prompt.md	Template for spawning scanner subagents

Workflow	Purpose
scan-workflow.md	Complete 5-step scan execution process

Success Criteria

Output directory resolved (user-specified or auto-incremented default)
All generated files stored inside $OUTPUT_DIR
Languages detected with file counts; Pro status checked
Scan mode selected by user (run all / important only)
Rulesets include third-party rules for all detected languages
User explicitly approved the scan plan (Step 3 gate passed)
All scan Tasks spawned in a single message and completed
Every semgrep command used --metrics=off
Approved rulesets logged to $OUTPUT_DIR/rulesets.txt
Raw per-scan outputs stored in $OUTPUT_DIR/raw/
results.sarif exists in $OUTPUT_DIR/results/ and is valid JSON
Important-only mode: post-filter applied before merge; unfiltered results preserved in raw/
Results summary reported with severity and category breakdown
Cloned repos (if any) cleaned up from $OUTPUT_DIR/repos/


First Seen

Jan 19, 2026

Security Audits

Gen Agent Trust Hub: Pass, Socket: Warn, Snyk: Warn

Installed on

claude-code: 1.2K

codex: 1.1K

opencode: 1.1K

gemini-cli: 1.1K

cursor: 1.0K

github-copilot: 1.0K