Semgrep Rule Creation Guide: Writing Production-Grade Security Detection Rules with Testing and Validation

semgrep-rule-creator by trailofbits/skills

1,100 weekly installs

3,900 GitHub Stars

Install command

npx skills add https://github.com/trailofbits/skills --skill semgrep-rule-creator

Development, Testing, Security


Semgrep Rule Creator

Create production-quality Semgrep rules with proper testing and validation.

When to Use

Ideal scenarios:

Writing Semgrep rules for specific bug patterns
Writing rules to detect security vulnerabilities in your codebase
Writing taint mode rules for data flow vulnerabilities
Writing rules to enforce coding standards

When NOT to Use

Do NOT use this skill for:

Running existing Semgrep rulesets
General static analysis without custom rules (use static-analysis skill)

Rationalizations to Reject

When writing Semgrep rules, reject these common shortcuts:

"The pattern looks complete" → Still run semgrep --test --config <rule-id>.yaml <rule-id>.<ext> to verify. Untested rules have hidden false positives/negatives.
"It matches the vulnerable case" → Matching vulnerabilities is half the job. Verify safe cases don't match (false positives break trust).
"Taint mode is overkill for this" → If data flows from user input to a dangerous sink, taint mode gives better precision than pattern matching.
"One test is enough" → Include edge cases: different coding styles, sanitized inputs, safe alternatives, and boundary conditions.
"I'll optimize the patterns first" → Write correct patterns first, optimize after all tests pass. Premature optimization causes regressions.
"The AST dump is too complex" → The AST reveals exactly how Semgrep sees code. Skipping it leads to patterns that miss syntactic variations.

Anti-Patterns

Too broad - matches everything, useless for detection:

# BAD: Matches any function call
pattern: $FUNC(...)

# GOOD: Specific dangerous function
pattern: eval(...)

Missing safe cases in tests - leads to undetected false positives:

# BAD: Only tests vulnerable case
# ruleid: my-rule
dangerous(user_input)

# GOOD: Include safe cases to verify no false positives
# ruleid: my-rule
dangerous(user_input)

# ok: my-rule
dangerous(sanitize(user_input))

# ok: my-rule
dangerous("hardcoded_safe_value")
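The "missing safe cases" problem can be caught mechanically before running Semgrep. The following is a minimal sketch, not part of the skill itself: the helper names `annotation_counts` and `has_safe_and_vulnerable_cases` are hypothetical, and the regular expression only assumes `#` or `//` comment styles.

```python
import re

# Matches test annotations such as "# ruleid: my-rule" or "// ok: my-rule".
# Assumption: annotations live in "#" or "//" comments; other comment styles are ignored.
ANNOTATION = re.compile(r"(?:#|//)\s*(ruleid|ok):\s*([\w.-]+)")


def annotation_counts(test_source: str, rule_id: str) -> dict:
    """Count ruleid/ok annotations for one rule in the text of a test file."""
    counts = {"ruleid": 0, "ok": 0}
    for kind, found_id in ANNOTATION.findall(test_source):
        if found_id == rule_id:
            counts[kind] += 1
    return counts


def has_safe_and_vulnerable_cases(test_source: str, rule_id: str) -> bool:
    """True only when the test file has at least one ruleid and one ok case."""
    counts = annotation_counts(test_source, rule_id)
    return counts["ruleid"] > 0 and counts["ok"] > 0
```

Run against the BAD example above, `has_safe_and_vulnerable_cases` returns `False`; run against the GOOD example, it returns `True`. This only counts annotations and does not replace `semgrep --test`.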

Overly specific patterns - misses variations:

# BAD: Only matches exact format
pattern: os.system("rm " + $VAR)

# GOOD: Matches all os.system calls with taint tracking
mode: taint
pattern-sources:
  - pattern: input(...)
pattern-sinks:
  - pattern: os.system(...)

Strictness Level

This workflow is strict - do not skip steps:

Read documentation first: See Documentation before writing Semgrep rules
Test-first is mandatory: Never write a rule without tests
100% test pass is required: "Most tests pass" is not acceptable
Optimization comes last: Only simplify patterns after all tests pass
Avoid generic patterns: Rules must be specific, not match broad patterns
Prioritize taint mode: For data flow vulnerabilities
One YAML file - one Semgrep rule: Each YAML file must contain only one Semgrep rule; don't combine multiple rules in a single file
No generic rules: When writing Semgrep rules for a specific language, avoid generic pattern matching (languages: generic)
Forbidden todook and todoruleid test annotations: todoruleid: <rule-id> and todook: <rule-id> annotations, which mark future rule improvements, are forbidden in test files
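The ban on todoruleid and todook annotations is easy to enforce with a small pre-check. This is a hedged sketch under the assumption that the annotations appear literally as `todoruleid:` or `todook:` in the test file; the function name `find_forbidden_annotations` is hypothetical.

```python
import re

# Assumption: forbidden annotations appear literally as "todoruleid:" or "todook:".
FORBIDDEN = re.compile(r"\b(todoruleid|todook):")


def find_forbidden_annotations(test_source: str) -> list:
    """Return (line_number, annotation) pairs for each forbidden annotation found."""
    hits = []
    for number, line in enumerate(test_source.splitlines(), start=1):
        match = FORBIDDEN.search(line)
        if match:
            hits.append((number, match.group(1)))
    return hits
```

An empty result means the test file uses only ruleid and ok annotations, at least as far as this textual check can tell.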

Overview

This skill guides creation of Semgrep rules that detect security vulnerabilities and code patterns. Rules are created iteratively: analyze the problem, write tests first, analyze AST structure, write the rule, iterate until all tests pass, optimize the rule.

Approach selection:

Taint mode (prioritize): Data flow issues where untrusted input reaches dangerous sinks
Pattern matching: Simple syntactic patterns without data flow requirements

Why prioritize taint mode? Pattern matching finds syntax but misses context. A pattern eval($X) matches both eval(user_input) (vulnerable) and eval("safe_literal") (safe). Taint mode tracks data flow, so it only alerts when untrusted data actually reaches the sink—dramatically reducing false positives for injection vulnerabilities.

Iterating between approaches: It's okay to experiment. If you start with taint mode and it's not working well (e.g., taint doesn't propagate as expected, too many false positives/negatives), switch to pattern matching. Conversely, if pattern matching produces too many false positives on safe cases, try taint mode instead. The goal is a working rule—not rigid adherence to one approach.

Output structure - exactly 2 files in a directory named after the rule-id:

<rule-id>/
├── <rule-id>.yaml     # Semgrep rule
└── <rule-id>.<ext>    # Test file with ruleid/ok annotations
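The two-file layout can be verified with a short script. This is a sketch under assumptions: the function name `check_rule_directory` is hypothetical, and the test file is taken to be any file whose stem equals the rule ID and whose extension is not `.yaml`.

```python
from pathlib import Path


def check_rule_directory(rule_dir: Path) -> list:
    """Return a list of layout problems; an empty list means the layout matches."""
    rule_id = rule_dir.name  # the directory is named after the rule ID
    files = sorted(p for p in rule_dir.iterdir() if p.is_file())
    problems = []
    if len(files) != 2:
        problems.append(f"expected exactly 2 files, found {len(files)}")
    if not (rule_dir / f"{rule_id}.yaml").is_file():
        problems.append(f"missing {rule_id}.yaml")
    # Assumption: the test file shares the rule ID as its stem, e.g. insecure-eval.py.
    tests = [p for p in files if p.stem == rule_id and p.suffix != ".yaml"]
    if len(tests) != 1:
        problems.append(f"expected one test file named {rule_id}.<ext>")
    return problems
```

Calling `check_rule_directory(Path("insecure-eval"))` on a directory holding `insecure-eval.yaml` and `insecure-eval.py` would return an empty list.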

Quick Start

rules:
  - id: insecure-eval
    languages: [python]
    severity: HIGH
    message: User input passed to eval() allows code execution
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
    pattern-sinks:
      - pattern: eval(...)

Test file (insecure-eval.py):

# ruleid: insecure-eval
eval(request.args.get('code'))

# ok: insecure-eval
eval("print('safe')")

Run tests (from rule directory): semgrep --test --config <rule-id>.yaml <rule-id>.<ext>
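When the test run is scripted, the command above can be assembled from the rule directory name. The sketch below is illustrative only: `semgrep_test_command` and `run_rule_tests` are hypothetical helpers, `semgrep` is assumed to be on the PATH, and a non-zero exit code is assumed to signal a failed run, which is worth confirming for your Semgrep version.

```python
import subprocess
from pathlib import Path


def semgrep_test_command(rule_dir: Path, test_ext: str) -> list:
    """Build the argument list for the test command, using file names relative to rule_dir."""
    rule_id = rule_dir.name
    return ["semgrep", "--test", "--config", f"{rule_id}.yaml", f"{rule_id}.{test_ext}"]


def run_rule_tests(rule_dir: Path, test_ext: str) -> bool:
    """Run the command from inside the rule directory.

    Assumption: a zero exit code means every test annotation passed.
    """
    completed = subprocess.run(semgrep_test_command(rule_dir, test_ext), cwd=rule_dir)
    return completed.returncode == 0
```

For the Quick Start rule, `run_rule_tests(Path("insecure-eval"), "py")` would run `semgrep --test --config insecure-eval.yaml insecure-eval.py` inside the `insecure-eval` directory.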

Quick Reference

For commands, pattern operators, and taint mode syntax, see quick-reference.md.
For detailed workflow and examples, you MUST see workflow.md.

Workflow

Copy this checklist and track progress:

Semgrep Rule Progress:
- [ ] Step 1: Analyze the Problem
- [ ] Step 2: Write Tests First
- [ ] Step 3: Analyze AST structure
- [ ] Step 4: Write the rule
- [ ] Step 5: Iterate until all tests pass (semgrep --test)
- [ ] Step 6: Optimize the rule (remove redundancies, re-test)
- [ ] Step 7: Final Run

Documentation

REQUIRED: Before writing any rule, use WebFetch to read all 7 of these Semgrep documentation links:

Weekly Installs: 1.1K
Repository: trailofbits/skills
GitHub Stars: 3.9K
First Seen: Jan 19, 2026
Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Warn)
Installed on: claude-code (995), opencode (954), gemini-cli (933), codex (927), cursor (901), github-copilot (871)
