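CodeQL Code Security Analysis Tool - Multi-Language Static Code Scanning and Vulnerability Detection

codeql by trailofbits/skills

1,300 weekly installs

3,900 GitHub Stars

GitHub

Install command

npx skills add https://github.com/trailofbits/skills --skill codeql

Development, Testing, Security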

CodeQL Analysis

Supported languages: Python, JavaScript/TypeScript, Go, Java/Kotlin, C/C++, C#, Ruby, Swift.

Skill resources: Reference files and templates are located at {baseDir}/references/ and {baseDir}/workflows/.

Essential Principles

Database quality is non-negotiable. A database that builds is not automatically good. Always run quality assessment (file counts, baseline LoC, extractor errors) and compare against expected source files. A cached build produces zero useful extraction.
Data extensions catch what CodeQL misses. Even projects using standard frameworks (Django, Spring, Express) have custom wrappers around database calls, request parsing, or shell execution. Skipping the create-data-extensions workflow means missing vulnerabilities in project-specific code paths.
Explicit suite references prevent silent query dropping. Never pass pack names directly to codeql database analyze — each pack's defaultSuiteFile applies hidden filters that can produce zero results. Always generate a custom .qls suite file.
Zero findings need investigation, not celebration. Zero results can indicate poor database quality, missing models, wrong query packs, or silent suite filtering. Investigate before reporting a clean result.
macOS Apple Silicon requires workarounds for compiled languages. Exit code 137 indicates an arm64e/arm64 mismatch, not a build failure. Try Homebrew arm64 tools or Rosetta before falling back to build-mode=none.
Follow workflows step by step. Once a workflow is selected, execute it step by step without skipping phases. Each phase gates the next — skipping quality assessment or data extensions leads to incomplete analysis.
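
The explicit-suite principle can be sketched as follows. This is a minimal illustration rather than the skill's own template (those live in references/important-only-suite.md and references/run-all-suite.md). The pack name codeql/python-queries, the file name custom.qls, the filter values, and the temporary-directory fallback for OUTPUT_DIR are all assumptions chosen for the example.

```shell
# Sketch: write an explicit query suite instead of passing a pack name.
# Assumptions: the custom.qls file name, the codeql/python-queries pack,
# the include filter, and the mktemp fallback when OUTPUT_DIR is unset.
OUTPUT_DIR="${OUTPUT_DIR:-$(mktemp -d)}"
mkdir -p "$OUTPUT_DIR/raw"
SUITE="$OUTPUT_DIR/raw/custom.qls"

# A suite file is a YAML list of instructions: where to load queries from,
# then which of them to keep.
cat > "$SUITE" <<'EOF'
- description: Explicit security suite (illustrative)
- queries: .
  from: codeql/python-queries
- include:
    kind:
      - problem
      - path-problem
    tags contain:
      - security
EOF

echo "Wrote suite: $SUITE"
# The suite file, not the pack name, would then be passed to the analyzer:
#   codeql database analyze "$DB_NAME" "$SUITE" \
#     --format=sarif-latest --output="$OUTPUT_DIR/raw/results.sarif"
```

Because the selection rules are spelled out in the file, it is possible to see exactly which queries are in scope, which is the point of avoiding a pack's defaultSuiteFile.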

Output Directory

All generated files (database, build logs, diagnostics, extensions, results) are stored in a single output directory.

If the user specifies an output directory in their prompt, use it as OUTPUT_DIR.
If not specified, default to ./static_analysis_codeql_1. If that already exists, increment to _2, _3, etc.

In both cases, always create the directory with mkdir -p before writing any files.

# Resolve output directory
if [ -n "$USER_SPECIFIED_DIR" ]; then
  OUTPUT_DIR="$USER_SPECIFIED_DIR"
else
  BASE="static_analysis_codeql"
  N=1
  while [ -e "${BASE}_${N}" ]; do
    N=$((N + 1))
  done
  OUTPUT_DIR="${BASE}_${N}"
fi
mkdir -p "$OUTPUT_DIR"

The output directory is resolved once at the start before any workflow executes. All workflows receive $OUTPUT_DIR and store their artifacts there:

$OUTPUT_DIR/
├── rulesets.txt                 # Selected query packs (logged after Step 3)
├── codeql.db/                   # CodeQL database (dir containing codeql-database.yml)
├── build.log                    # Build log
├── codeql-config.yml            # Exclusion config (interpreted languages)
├── diagnostics/                 # Diagnostic queries and CSVs
├── extensions/                  # Data extension YAMLs
├── raw/                         # Unfiltered analysis output
│   ├── results.sarif
│   └── <mode>.qls
└── results/                     # Final results (filtered for important-only, copied for run-all)
    └── results.sarif
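
One way to check a finished run against the layout above is a small helper that reports missing artifacts. This is a sketch under assumptions: the function name check_artifacts is hypothetical, and the list of required files is limited to the ones the layout names unconditionally.

```shell
# Sketch: report which expected artifacts are missing from an output directory.
# check_artifacts is a hypothetical helper; paths mirror the layout above.
# The body runs in a subshell ( ... ) so its variables stay local.
check_artifacts() (
  out_dir="$1"
  missing=0
  for rel in rulesets.txt build.log raw/results.sarif results/results.sarif; do
    if [ ! -e "$out_dir/$rel" ]; then
      echo "MISSING: $out_dir/$rel"
      missing=$((missing + 1))
    fi
  done
  # Succeed only when nothing is missing.
  [ "$missing" -eq 0 ]
)
```

Usage: `check_artifacts "$OUTPUT_DIR" || echo "run is incomplete"`.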

Database Discovery

A CodeQL database is identified by the presence of a codeql-database.yml marker file inside its directory. When searching for existing databases, always collect all matches — there may be multiple databases from previous runs or for different languages.

Discovery command:

# Find ALL CodeQL databases (top-level and one subdirectory deep)
find . -maxdepth 3 -name "codeql-database.yml" -not -path "*/\.*" 2>/dev/null \
  | while read -r yml; do dirname "$yml"; done

Inside $OUTPUT_DIR: find "$OUTPUT_DIR" -maxdepth 2 -name "codeql-database.yml"
Project-wide (for auto-detection): find . -maxdepth 3 -name "codeql-database.yml" — covers databases at the project top level (./db-name/) and one subdirectory deep (./subdir/db-name/). Does not search deeper.

Never assume a database is named codeql.db — discover it by its marker file.

When multiple databases are found:

For each discovered database, collect metadata to help the user choose:

# For each database, extract language and creation time.
# FOUND_DBS is a bash array, so iterate over all of its elements
# ("$FOUND_DBS" alone would expand to the first element only).
for db in "${FOUND_DBS[@]}"; do
  CODEQL_LANG=$(codeql resolve database --format=json -- "$db" 2>/dev/null | jq -r '.languages[0]')
  CREATED=$(grep '^creationMetadata:' -A5 "$db/codeql-database.yml" 2>/dev/null | grep 'creationTime' | awk '{print $2}')
  echo "$db - language: $CODEQL_LANG, created: $CREATED"
done

Then use AskUserQuestion to let the user select which database to use, or to build a new one. Skip AskUserQuestion if the user's prompt explicitly stated which database to use or asked to build a new one.

Quick Start

For the common case ("scan this codebase for vulnerabilities"):

# 1. Verify CodeQL is installed
if ! command -v codeql >/dev/null 2>&1; then
  echo "NOT INSTALLED: codeql binary not found on PATH"
else
  codeql --version || echo "ERROR: codeql found but --version failed (check installation)"
fi

# 2. Resolve output directory
BASE="static_analysis_codeql"; N=1
while [ -e "${BASE}_${N}" ]; do N=$((N + 1)); done
OUTPUT_DIR="${BASE}_${N}"; mkdir -p "$OUTPUT_DIR"

Then execute the full pipeline: build database → create data extensions → run analysis using the workflows below.

When to Use

Scanning a codebase for security vulnerabilities with deep data flow analysis
Building a CodeQL database from source code (with build capability for compiled languages)
Finding complex vulnerabilities that require interprocedural taint tracking or AST/CFG analysis
Performing comprehensive security audits with multiple query packs

When NOT to Use

Writing custom queries - Use a dedicated query development skill
CI/CD integration - Use GitHub Actions documentation directly
Quick pattern searches - Use Semgrep or grep for speed
No build capability for compiled languages - Consider Semgrep instead
Single-file or lightweight analysis - Semgrep is faster for simple pattern matching

Rationalizations to Reject

These shortcuts lead to missed findings. Do not accept them:

"security-extended is enough" - It is the baseline. Always check if Trail of Bits packs and Community Packs are available for the language. They catch categories security-extended misses entirely.
"The database built, so it's good" - A database that builds does not mean it extracted well. Always run quality assessment and check file counts against expected source files.
"Data extensions aren't needed for standard frameworks" - Even Django/Spring apps have custom wrappers that CodeQL does not model. Skipping extensions means missing vulnerabilities.
"build-mode=none is fine for compiled languages" - It produces severely incomplete analysis. Only use as an absolute last resort. On macOS, try the arm64 toolchain workaround or Rosetta first.
"The build fails on macOS, just use build-mode=none" - Exit code 137 is caused by arm64e/arm64 mismatch, not a fundamental build failure. See macos-arm64e-workaround.md.
"No findings means the code is secure" - Zero findings can indicate poor database quality, missing models, or wrong query packs. Investigate before reporting clean results.
"I'll just run the default suite" / "I'll just pass the pack names directly" - Each pack's defaultSuiteFile applies hidden filters and can produce zero results. Always use an explicit suite reference.

Workflow Selection

This skill has three workflows. Once a workflow is selected, execute it step by step without skipping phases.

Workflow	Purpose
build-database	Create CodeQL database using build methods in sequence
create-data-extensions	Detect or generate data extension models for project APIs
run-analysis	Select rulesets, execute queries, process results

Auto-Detection Logic

If the user explicitly specifies what to do (e.g., "build a database", "run analysis on ./my-db"), execute that workflow directly. Do NOT call AskUserQuestion for database selection if the user's prompt already makes their intent clear, e.g., "build a new database", "analyze the codeql database in static_analysis_codeql_2", or "run a full scan from scratch".

Default pipeline for "test", "scan", "analyze", or similar: Discover existing databases first, then decide.

# Find ALL CodeQL databases by looking for codeql-database.yml marker file
# Search top-level dirs and one subdirectory deep
FOUND_DBS=()
while IFS= read -r yml; do
  db_dir=$(dirname "$yml")
  codeql resolve database -- "$db_dir" >/dev/null 2>&1 && FOUND_DBS+=("$db_dir")
done < <(find . -maxdepth 3 -name "codeql-database.yml" -not -path "*/\.*" 2>/dev/null)

echo "Found ${#FOUND_DBS[@]} existing database(s)"

Condition	Action
No databases found	Resolve new `$OUTPUT_DIR`, execute build → extensions → analysis (full pipeline)
One database found	Use `AskUserQuestion`: reuse it or build new?
Multiple databases found	Use `AskUserQuestion`: list all with metadata, let user pick one or build new
User explicitly stated intent	Skip `AskUserQuestion`, act on their instructions directly
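
The decision table above can be read as a small dispatch function. The sketch below is illustrative only: the function name choose_action and the action labels it prints are hypothetical, and the caller is assumed to have already determined whether the user stated their intent.

```shell
# Sketch of the decision table. Function name and action labels are hypothetical.
# $1 = number of databases found
# $2 = "yes" if the user explicitly stated intent, anything else otherwise
choose_action() {
  # Explicit intent always wins, regardless of how many databases exist.
  if [ "$2" = "yes" ]; then
    echo "follow-user-instructions"
    return 0
  fi
  case "$1" in
    0) echo "full-pipeline" ;;        # build -> extensions -> analysis
    1) echo "ask-reuse-or-build" ;;   # AskUserQuestion: reuse or build new
    *) echo "ask-pick-or-build" ;;    # AskUserQuestion: list all, pick or build
  esac
}

choose_action 0 no     # prints: full-pipeline
choose_action 2 no     # prints: ask-pick-or-build
choose_action 2 yes    # prints: follow-user-instructions
```

With the discovery loop above, the first argument would be `${#FOUND_DBS[@]}`.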

Database Selection Prompt

When existing databases are found and the user did not explicitly specify which to use, present via AskUserQuestion:

header: "Existing CodeQL Databases"
question: "I found existing CodeQL database(s). What would you like to do?"
options:
  - label: "<db_path_1> (language: python, created: 2026-02-24)"
    description: "Reuse this database"
  - label: "<db_path_2> (language: cpp, created: 2026-02-23)"
    description: "Reuse this database"
  - label: "Build a new database"
    description: "Create a fresh database in a new output directory"

After selection:

If user picks an existing database: Set $OUTPUT_DIR to its parent directory (or the directory containing it), set $DB_NAME to the selected path, then proceed to extensions → analysis.
If user picks "Build new": Resolve a new $OUTPUT_DIR, execute build → extensions → analysis.
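
For the "existing database" branch, deriving the two variables might look like the sketch below. SELECTED_DB is a hypothetical variable standing in for the user's choice, and the example path is made up.

```shell
# Sketch: derive OUTPUT_DIR and DB_NAME from a selected database path.
# SELECTED_DB is a hypothetical variable; the default path is only an example.
SELECTED_DB="${SELECTED_DB:-./static_analysis_codeql_2/python-db}"

DB_NAME="${SELECTED_DB%/}"          # strip one trailing slash, if present
OUTPUT_DIR=$(dirname "$DB_NAME")    # parent directory of the database

echo "OUTPUT_DIR=$OUTPUT_DIR"
echo "DB_NAME=$DB_NAME"
```

Note that for a database sitting directly at the project top level (./db-name), the parent directory is ".", so generated files would land in the working directory. How the skill treats that case is not stated here.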

General Decision Prompt

If the user's intent is ambiguous (neither database selection nor workflow is clear), ask:

I can help with CodeQL analysis. What would you like to do?

1. **Full scan (Recommended)** - Build database, create extensions, then run analysis
2. **Build database** - Create a new CodeQL database from this codebase
3. **Create data extensions** - Generate custom source/sink models for project APIs
4. **Run analysis** - Run security queries on existing database

[If databases found: "I found N existing database(s): <list paths with language>"]
[Show output directory: "Output will be stored in <OUTPUT_DIR>"]

Reference Index

File	Content
Workflows
workflows/build-database.md	Database creation with build method sequence
workflows/create-data-extensions.md	Data extension generation pipeline
workflows/run-analysis.md	Query execution and result processing
References
references/macos-arm64e-workaround.md	Apple Silicon build tracing workarounds
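references/build-fixes.md	Build failure fix catalog
references/quality-assessment.md	Database quality metrics and improvements
references/extension-yaml-format.md	Data extension YAML column definitions and examples
references/sarif-processing.md	jq commands for SARIF output processing
references/diagnostic-query-templates.md	QL queries for source/sink enumeration
references/important-only-suite.md	Important-only suite template and generation
references/run-all-suite.md	Run-all suite template
references/ruleset-catalog.md	Available query packs by language
references/threat-models.md	Threat model configuration
references/language-details.md	Language-specific build and extraction details
references/performance-tuning.md	Memory, threading, and timeout configuration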

Success Criteria

A complete CodeQL analysis run should satisfy:

Output directory resolved (user-specified or auto-incremented default)
All generated files stored inside $OUTPUT_DIR
Database built (discovered via codeql-database.yml marker) with quality assessment passed (baseline LoC > 0, errors < 5%)
Data extensions evaluated — either created in $OUTPUT_DIR/extensions/ or explicitly skipped with justification
Analysis run with explicit suite reference (not default pack suite)
All installed query packs (official + Trail of Bits + Community) used or explicitly excluded
Selected query packs logged to $OUTPUT_DIR/rulesets.txt
Unfiltered results preserved in $OUTPUT_DIR/raw/results.sarif
Final results in $OUTPUT_DIR/results/results.sarif (filtered for important-only, copied for run-all)
Zero-finding results investigated (database quality, model coverage, suite selection)
Build log preserved at $OUTPUT_DIR/build.log with all commands, fixes, and quality assessments
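
The quality thresholds in the list above (baseline LoC > 0, errors < 5%) can be expressed as a simple gate. This is a sketch under assumptions: the function name quality_gate is hypothetical, the inputs are taken as already-computed numbers, and the error rate is assumed to mean files with extractor errors divided by total extracted files, which the criteria do not spell out.

```shell
# Sketch of the quality gate: baseline LoC > 0 and error rate < 5%.
# Assumption: error rate = files with extractor errors / total extracted files.
# $1 = baseline lines of code, $2 = total extracted files, $3 = files with errors
quality_gate() (
  baseline_loc="$1"; total_files="$2"; error_files="$3"
  [ "$baseline_loc" -gt 0 ] || { echo "FAIL: baseline LoC is 0"; exit 1; }
  [ "$total_files" -gt 0 ] || { echo "FAIL: no files extracted"; exit 1; }
  # Integer arithmetic: errors/total < 5/100  <=>  errors*100 < total*5
  if [ $((error_files * 100)) -lt $((total_files * 5)) ]; then
    echo "PASS"
  else
    echo "FAIL: extractor error rate is 5% or higher"
    exit 1
  fi
)

quality_gate 12000 400 3  || true   # prints: PASS
quality_gate 12000 400 20 || true   # prints: FAIL: extractor error rate is 5% or higher
```

A failing gate is a reason to revisit the build (see references/quality-assessment.md) rather than to proceed to analysis.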

First Seen

Jan 19, 2026

Security Audits

Gen Agent Trust Hub: Pass
Socket: Pass
Snyk: Warn

Installed on

claude-code: 1.2K
codex: 1.1K
opencode: 1.1K
gemini-cli: 1.1K
cursor: 1.0K
github-copilot: 985

Rationalizations to Reject (continued)

"I'll put files in the current directory" - All generated files must go in $OUTPUT_DIR. Scattering files in the working directory makes cleanup impossible and risks overwriting previous runs.

"Just use the first database I find" - Multiple databases may exist for different languages or from previous runs. When more than one is found, present all options to the user. Only skip the prompt when the user already specified which database to use.

"The user said 'scan', that means they want me to pick a database" - "Scan" is not database selection. If multiple databases exist and the user didn't name one, ask.
