Semgrep Static Code Analysis: Guide to Fast Security Scanning and Custom Rule Creation

semgrep by semgrep/skills

292 weekly installs

163 GitHub Stars

Install command

```bash
npx skills add https://github.com/semgrep/skills --skill semgrep
```

Tags: Automation, Code Quality, Security


Semgrep Static Analysis

Fast, pattern-based static analysis for security scanning and custom rule creation.

MCP Tools Available

If Semgrep MCP tools are available in your environment, prefer them for scanning:

semgrep_scan — Scan code files for security vulnerabilities using built-in rulesets. Pass absolute file paths and an optional config (e.g., p/security-audit, auto).
semgrep_scan_with_custom_rule — Scan code with a custom YAML rule you've written. Pass code content inline along with the rule.
semgrep_findings — Fetch existing findings from the Semgrep AppSec Platform for a repository.
semgrep_rule_schema — Get the full schema for writing Semgrep rules.
get_supported_languages — List all languages Semgrep supports.

When MCP tools aren't available, fall back to the CLI commands below.

When to Use Semgrep

Ideal scenarios:

Quick security scans (minutes, not hours)
Pattern-based bug and vulnerability detection
Enforcing coding standards and best practices
Finding known vulnerability patterns (OWASP, CWE)
Creating custom detection rules for your codebase
Data flow analysis with taint mode

Installation (CLI)

```bash
# pip (recommended)
python3 -m pip install semgrep

# Homebrew
brew install semgrep

# Docker
docker run --rm -v "${PWD}:/src" semgrep/semgrep semgrep --config auto /src
```

Part 1: Running Scans

Quick Scan

```bash
semgrep --config auto .                    # Auto-detect rules
```

Using Rulesets

```bash
semgrep --config p/<RULESET> .             # Single ruleset
semgrep --config p/security-audit --config p/trailofbits .  # Multiple rulesets
```

| Ruleset | Description |
|---|---|
| `p/default` | General security and code quality |
| `p/security-audit` | Comprehensive security rules |
| `p/owasp-top-ten` | OWASP Top 10 vulnerabilities |
| `p/cwe-top-25` | CWE Top 25 vulnerabilities |
| `p/trailofbits` | Trail of Bits security rules |
| `p/python` | Python-specific |
| `p/javascript` | JavaScript-specific |
| `p/golang` | Go-specific |

Output Formats

```bash
semgrep --config p/security-audit --sarif -o results.sarif .   # SARIF
semgrep --config p/security-audit --json -o results.json .     # JSON
```
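The JSON report can be post-processed in Python. The sketch below assumes the report's top-level `results` list, where each finding carries `check_id`, `path`, `start.line`, and `extra.severity` / `extra.message`. The helper names are illustrative, not part of Semgrep.

```python
import json
from collections import Counter
from pathlib import Path


def load_report(path: str) -> dict:
    """Read a report produced by `semgrep --json -o results.json`."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def count_by_severity(report: dict) -> Counter:
    """Count findings per severity value (e.g. ERROR, WARNING, INFO)."""
    return Counter(
        item.get("extra", {}).get("severity", "UNKNOWN")
        for item in report.get("results", [])
    )


def one_line_per_finding(report: dict) -> list[str]:
    """Format each finding as 'path:line rule-id: message'."""
    return [
        f'{item["path"]}:{item["start"]["line"]} {item["check_id"]}: {item["extra"]["message"]}'
        for item in report.get("results", [])
    ]
```

Typical use is `count_by_severity(load_report("results.json"))` after the JSON scan above.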

Scan Specific Paths

```bash
semgrep --config p/python app.py           # Single file
semgrep --config p/javascript src/         # Directory
semgrep --config auto --include='**/test/**' .  # Scan only paths matching **/test/**
```
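When you drive the CLI from Python, building the argument list explicitly sidesteps shell quoting of globs such as `**/test/**`. In this sketch, `build_semgrep_cmd` and `run_semgrep` are hypothetical helpers, and they use only the flags shown above. Exit-code handling is left to the caller.

```python
import shutil
import subprocess
from typing import Iterable


def build_semgrep_cmd(
    configs: Iterable[str],
    targets: Iterable[str],
    include: Iterable[str] = (),
) -> list[str]:
    """Assemble a semgrep argv list with one --config flag per ruleset."""
    cmd = ["semgrep"]
    for config in configs:
        cmd += ["--config", config]
    for pattern in include:
        cmd += ["--include", pattern]
    return cmd + list(targets)


def run_semgrep(cmd: list[str]) -> subprocess.CompletedProcess:
    """Run without a shell, so glob arguments reach semgrep verbatim."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} is not on PATH")
    return subprocess.run(cmd, capture_output=True, text=True, check=False)
```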

Configuration

.semgrepignore

Paths matching these gitignore-style patterns are skipped during scans:

```text
tests/fixtures/
**/testdata/
generated/
vendor/
node_modules/
```

Suppress False Positives

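A `# nosemgrep: <rule-id>` comment suppresses that rule on its line; a bare `# nosemgrep` suppresses every rule on the line:

```python
password = get_from_vault()  # nosemgrep: hardcoded-password
dangerous_but_safe()  # nosemgrep
```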
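Blanket `# nosemgrep` comments are easy to forget, so it can help to audit them periodically. The sketch below is a rough, text-based scan that assumes `#` or `//` comments and comma-separated rule IDs after the colon. `find_suppressions` is a hypothetical helper. Because it does not parse code, it also matches the same text inside string literals.

```python
import re

# '# nosemgrep' or '// nosemgrep', optionally followed by ': id1, id2'.
NOSEMGREP = re.compile(
    r"(?:#|//)\s*nosemgrep(?::\s*(?P<ids>[\w.-]+(?:\s*,\s*[\w.-]+)*))?"
)


def find_suppressions(source: str) -> list[tuple[int, list[str]]]:
    """Return (line number, suppressed rule IDs); an empty list means all rules."""
    found = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = NOSEMGREP.search(line)
        if match:
            ids = match.group("ids")
            found.append((lineno, [i.strip() for i in ids.split(",")] if ids else []))
    return found
```

Entries with an empty ID list are the blanket suppressions worth reviewing first.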

Part 2: Creating Custom Rules

When to Create Custom Rules

Detecting project-specific vulnerability patterns
Enforcing internal coding standards
Building security checks for custom frameworks
Creating taint-mode rules for data flow analysis

Approach Selection

| Approach | Use when |
|---|---|
| Taint mode | Data flows from an untrusted source to a dangerous sink (injection vulnerabilities) |
| Pattern matching | Syntactic patterns without data-flow requirements (deprecated APIs, hardcoded values) |

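Prioritize taint mode for injection vulnerabilities. Pattern matching sees only syntax: `eval($X)` flags `eval(user_input)` (vulnerable) and `eval(trusted_value)` (safe) alike. Even with a `pattern-not` that excludes literals such as `eval("safe_literal")`, a pattern still cannot tell a tainted variable from a trusted one. Taint mode tracks where the value came from.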
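The toy intraprocedural taint tracker below, built on Python's `ast` module, shows why data flow matters. It is a teaching sketch, not how Semgrep's engine works. The source (`input`), sinks (`eval`, `exec`), and sanitizer (`int`) are hypothetical choices, and it follows only simple `name = expr` assignments in straight-line code.

```python
import ast
from typing import Optional

SOURCES = {"input"}       # calls whose result is untrusted (hypothetical)
SINKS = {"eval", "exec"}  # calls that must not receive untrusted data
SANITIZERS = {"int"}      # calls whose result is treated as safe (hypothetical)


def call_name(node: ast.AST) -> Optional[str]:
    """Name of a plain call like f(...), else None."""
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        return node.func.id
    return None


def is_tainted(expr: ast.expr, tainted: set) -> bool:
    """Conservatively decide whether an expression may carry untrusted data."""
    name = call_name(expr)
    if name in SANITIZERS:
        return False
    if name in SOURCES:
        return True
    if isinstance(expr, ast.Name):
        return expr.id in tainted
    # Any other expression is tainted if one of its sub-expressions is.
    children = [c.value if isinstance(c, ast.keyword) else c
                for c in ast.iter_child_nodes(expr)]
    return any(is_tainted(c, tainted) for c in children if isinstance(c, ast.expr))


def find_flows(source: str) -> list:
    """Line numbers where tainted data reaches a sink (straight-line code only)."""
    tainted: set = set()
    findings = []
    for stmt in ast.parse(source).body:
        for node in ast.walk(stmt):
            if call_name(node) in SINKS and any(is_tainted(a, tainted) for a in node.args):
                findings.append(node.lineno)
        if isinstance(stmt, ast.Assign):
            names = {t.id for t in stmt.targets if isinstance(t, ast.Name)}
            if is_tainted(stmt.value, tainted):
                tainted |= names
            else:
                tainted -= names  # reassigning clean data clears the taint
    return findings
```

For `user_input = input()` followed by `eval(user_input)`, `eval("1 + 1")`, `safe = int(user_input)`, and `eval(safe)`, only the first `eval` is reported. A syntactic `eval($X)` pattern would match all three.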

Quick Start: Pattern Matching

```yaml
rules:
  - id: hardcoded-password
    languages: [python]
    message: "Hardcoded password detected: $PASSWORD"
    severity: ERROR
    pattern: password = "$PASSWORD"
```

Quick Start: Taint Mode

```yaml
rules:
  - id: command-injection
    languages: [python]
    message: User input flows to command execution
    severity: ERROR
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
      - pattern: request.form[...]
    pattern-sinks:
      - pattern: os.system(...)
      - pattern: subprocess.call($CMD, shell=True, ...)
    pattern-sanitizers:
      - pattern: shlex.quote(...)
```
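Both quick-start rules share the same top-level shape. Assuming a rule has already been loaded into a dict (for example with a YAML parser), you could pre-check that shape as in the sketch below. `precheck_rule` is a hypothetical helper that knows only the classic keys used in this guide. `semgrep --validate --config rule.yaml` remains the authoritative check.

```python
REQUIRED_KEYS = ("id", "message", "severity", "languages")
SEARCH_KEYS = ("pattern", "patterns", "pattern-either", "pattern-regex")


def precheck_rule(rule: dict) -> list:
    """Return human-readable problems; an empty list means the basic shape looks right."""
    problems = [f"missing '{key}'" for key in REQUIRED_KEYS if key not in rule]
    if rule.get("mode") == "taint":
        # Taint rules need at least one source and one sink.
        for key in ("pattern-sources", "pattern-sinks"):
            if not rule.get(key):
                problems.append(f"taint rule needs a non-empty '{key}'")
    elif not any(key in rule for key in SEARCH_KEYS):
        problems.append("search rule needs one of: " + ", ".join(SEARCH_KEYS))
    return problems
```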

Pattern Syntax Quick Reference

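| Syntax | Description | Example |
|---|---|---|
| `...` | Match anything | `func(...)` |
| `$VAR` | Capture metavariable | `$FUNC($INPUT)` |
| `<... ...>` | Deep expression match | `<... user_input ...>` |

| Operator | Description |
|---|---|
| `pattern` | Match a single pattern |
| `patterns` | All patterns must match (AND) |
| `pattern-either` | Any pattern may match (OR) |
| `pattern-not` | Exclude matches |
| `pattern-inside` | Match only inside a context |
| `pattern-not-inside` | Match only outside a context |
| `metavariable-regex` | Filter a captured value with a regex |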
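One way to reason about how the operators compose is to treat each pattern as producing a set of matched code ranges. The sketch below is a simplified mental model, not Semgrep's implementation. Ranges are reduced to `(start_line, end_line)` pairs, `patterns` takes a single positive match set, and all function names are hypothetical.

```python
Range = tuple[int, int]  # (start_line, end_line), inclusive; a simplification


def contains(outer: Range, inner: Range) -> bool:
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def pattern_either(*alternatives: set[Range]) -> set[Range]:
    """OR: keep a range if any alternative matched it."""
    return set().union(*alternatives)


def patterns(positive: set[Range], *, not_=(), inside=(), not_inside=()) -> set[Range]:
    """AND: start from the positive matches, then apply each filter in turn."""
    result = set(positive)
    for excluded in not_:        # pattern-not: drop identical ranges
        result -= excluded
    for ctx in inside:           # pattern-inside: keep ranges within some context
        result = {r for r in result if any(contains(c, r) for c in ctx)}
    for ctx in not_inside:       # pattern-not-inside: drop ranges within any context
        result = {r for r in result if not any(contains(c, r) for c in ctx)}
    return result
```

For example, with `eval(...)` matches on lines 2, 5, and 9 and a function body spanning lines 1 to 6, `pattern-inside` that body keeps lines 2 and 5, and `pattern-not-inside` keeps only line 9.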

Testing Rules

Test-first is mandatory. Create test files with annotations:

```python
# test_rule.py
def test_vulnerable():
    user_input = request.args.get("id")
    # ruleid: my-rule-id
    cursor.execute("SELECT * FROM users WHERE id = " + user_input)

def test_safe():
    user_input = request.args.get("id")
    # ok: my-rule-id
    cursor.execute("SELECT * FROM users WHERE id = ?", (user_input,))
```

Run tests:

```bash
semgrep --test --config rule.yaml test-file
```

Command Reference

| Task | Command |
|---|---|
| Run tests | `semgrep --test --config rule.yaml test-file` |
| Validate YAML | `semgrep --validate --config rule.yaml` |
| Dump AST | `semgrep --dump-ast -l <lang> <file>` |
| Debug taint flow | `semgrep --dataflow-traces -f rule.yaml file` |

Rule Creation Workflow

Analyze the problem - Understand the bug pattern, determine taint vs pattern approach
Create test cases first - Write ruleid: and ok: annotations before the rule
Analyze AST - Run semgrep --dump-ast to understand code structure
Write the rule - Start simple, iterate
Test until 100% pass - No "missed lines" or "incorrect lines"
Optimize patterns - Remove redundancies only after tests pass

Output structure:

```text
<rule-id>/
├── <rule-id>.yaml     # Semgrep rule
└── <rule-id>.<ext>    # Test file
```
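If you want the placeholder files for this layout created up front, a small scaffolding script can do it. `scaffold_rule`, its template text, and the `WARNING` placeholder severity are hypothetical choices. The TODOs are meant to be replaced while following the test-first steps above.

```python
from pathlib import Path

RULE_TEMPLATE = """\
rules:
  - id: {rule_id}
    languages: [{language}]
    message: TODO describe the problem
    severity: WARNING
    pattern: TODO
"""


def scaffold_rule(root: Path, rule_id: str, language: str, ext: str,
                  comment: str = "#") -> Path:
    """Create <root>/<rule-id>/ with a placeholder rule and a stub test file."""
    rule_dir = root / rule_id
    rule_dir.mkdir(parents=True, exist_ok=False)  # fail rather than overwrite a rule
    (rule_dir / f"{rule_id}.yaml").write_text(
        RULE_TEMPLATE.format(rule_id=rule_id, language=language), encoding="utf-8")
    # `comment` lets non-Python test files use their own comment syntax (e.g. "//").
    (rule_dir / f"{rule_id}.{ext}").write_text(
        f"{comment} TODO: add ruleid and ok annotated cases for {rule_id}\n",
        encoding="utf-8")
    return rule_dir
```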

Detailed References

Official Semgrep Documentation:

Rule Syntax - Complete YAML structure, operators, and options
Rule Schema - Full JSON schema specification

Local References:

Workflow Guide - Complete step-by-step rule creation process
Quick Reference - Pattern operators and taint components

Anti-Patterns to Avoid

Too broad:

```yaml
# BAD: Matches any function call
pattern: $FUNC(...)

# GOOD: Specific dangerous function
pattern: eval(...)
```

Missing safe cases:

```python
# BAD: Only tests vulnerable case
# ruleid: my-rule
dangerous(user_input)

# GOOD: Include safe cases
# ruleid: my-rule
dangerous(user_input)

# ok: my-rule
dangerous(sanitize(user_input))
```

Rationalizations to Reject

| Shortcut | Why it's wrong |
|---|---|
| "Semgrep found nothing, so the code is clean" | No findings only means no rule matched. Semgrep is pattern-based, and Semgrep Community Edition's taint analysis stays within a single function, so flows across functions or files go unseen |
| "The pattern looks complete" | Untested rules hide false positives and false negatives |
| "It matches the vulnerable case" | Matching vulnerabilities is only half the job; verify that safe cases don't match |
| "Taint mode is overkill" | For injection vulnerabilities, taint mode gives better precision |
| "One test case is enough" | Include edge cases: different coding styles, sanitized inputs, safe alternatives |

CI/CD Integration

GitHub Actions

```yaml
name: Semgrep

on:
  push:
    branches: [main]
  pull_request:
  schedule:
    - cron: '0 0 1 * *'  # monthly, 00:00 UTC on the 1st

jobs:
  semgrep:
    runs-on: ubuntu-latest
    container:
      image: semgrep/semgrep

    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history, so the PR base commit is available

      - name: Run Semgrep
        run: |
          if [ "${{ github.event_name }}" = "pull_request" ]; then
            semgrep ci --baseline-commit ${{ github.event.pull_request.base.sha }}
          else
            semgrep ci
          fi
        env:
          SEMGREP_RULES: >-
            p/security-audit
            p/owasp-top-ten
            p/trailofbits
```

Resources

Rule Writing:

Rule Syntax: https://semgrep.dev/docs/writing-rules/rule-syntax
Pattern Syntax: https://semgrep.dev/docs/writing-rules/pattern-syntax
Rule Schema: https://github.com/semgrep/semgrep-interfaces/blob/main/rule_schema_v1.yaml

General:

Registry: https://semgrep.dev/explore
Playground: https://semgrep.dev/playground
Docs: https://semgrep.dev/docs/
Trail of Bits Rules: https://github.com/trailofbits/semgrep-rules


First Seen

Jan 20, 2026

Security Audits

Gen Agent Trust Hub: Pass; Socket: Pass; Snyk: Warn

Installed on

gemini-cli: 263

codex: 260

opencode: 256

github-copilot: 256

amp: 238

kimi-cli: 237
