Fuzzing Dictionary Guide: Improving Code Coverage and Vulnerability Discovery

fuzzing-dictionary by trailofbits/skills

1,100 weekly installs

3,900 GitHub Stars

Install command

npx skills add https://github.com/trailofbits/skills --skill fuzzing-dictionary

Development, Testing, Security


Fuzzing Dictionary

A fuzzing dictionary provides domain-specific tokens to guide the fuzzer toward interesting inputs. Instead of purely random mutations, the fuzzer incorporates known keywords, magic numbers, protocol commands, and format-specific strings that are more likely to reach deeper code paths in parsers, protocol handlers, and file format processors.

Overview

Dictionaries are text files containing quoted strings that represent meaningful tokens for your target. They help fuzzers bypass early validation checks and explore code paths that would be difficult to reach through blind mutation alone.

Key Concepts

Concept	Description
Dictionary Entry	A quoted string (e.g., `"keyword"`) or key-value pair (e.g., `kw="value"`)
Hex Escapes	Byte sequences like `"\xF7\xF8"` for non-printable characters
Token Injection	Fuzzer inserts dictionary entries into generated inputs
Cross-Fuzzer Format	Dictionary files work with libFuzzer, AFL++, and cargo-fuzz

When to Apply

Apply this technique when:

Fuzzing parsers (JSON, XML, config files)
Fuzzing protocol implementations (HTTP, DNS, custom protocols)
Fuzzing file format handlers (PNG, PDF, media codecs)
Coverage plateaus early without reaching deeper logic
Target code checks for specific keywords or magic values

Skip this technique when:

Fuzzing pure algorithms without format expectations
Target has no keyword-based parsing
Corpus already achieves high coverage

Quick Reference

Task	Command/Pattern
Use with libFuzzer	`./fuzz -dict=./dictionary.dict ...`
Use with AFL++	`afl-fuzz -x ./dictionary.dict ...`
Use with cargo-fuzz	`cargo fuzz run fuzz_target -- -dict=./dictionary.dict`
Extract from header	`grep -o '"[^"]*"' header.h > header.dict`
Generate from binary	`strings ./binary | sed 's/^/"&/; s/$/&"/' > strings.dict`

Step-by-Step

Step 1: Create Dictionary File

Create a text file with one quoted string per line. Use # comments to document each group of entries.

Example dictionary format:

# Lines starting with '#' and empty lines are ignored.

# Adds "blah" (w/o quotes) to the dictionary.
kw1="blah"
# Use \\ for backslash and \" for quotes.
kw2="\"ac\\dc\""
# Use \xAB for hex values
kw3="\xF7\xF8"
# the name of the keyword followed by '=' may be omitted:
"foo\x0Abar"
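
The format above can be checked mechanically before a fuzzing run. The following is a minimal sketch, assuming the three escapes shown in the example (`\\`, `\"`, `\xAB`) are the only valid ones. The names `parse_line` and `decode_value` are illustrative and not part of any fuzzer's API.

```python
import re

# Optional keyword name followed by '=', then a quoted value up to the last quote.
_ENTRY = re.compile(r'^(?:(?P<name>[^"=\s]+)\s*=\s*)?"(?P<body>.*)"$')


def decode_value(body):
    """Turn the text between the outer quotes into the raw token bytes."""
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == '"':
            raise ValueError("unescaped quote at offset %d" % i)
        if ch != "\\":
            out += ch.encode("utf-8")
            i += 1
            continue
        nxt = body[i + 1:i + 2]
        hex_digits = body[i + 2:i + 4]
        if nxt in ("\\", '"'):
            out += nxt.encode("ascii")
            i += 2
        elif nxt == "x" and re.fullmatch(r"[0-9A-Fa-f]{2}", hex_digits):
            out.append(int(hex_digits, 16))
            i += 4
        else:
            raise ValueError("invalid escape at offset %d" % i)
    return bytes(out)


def parse_line(line):
    """Return None for comments and blank lines, else (name or None, token bytes)."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    match = _ENTRY.match(stripped)
    if match is None:
        raise ValueError("not a dictionary entry: %r" % line)
    return match.group("name"), decode_value(match.group("body"))
```

Running `parse_line` over every line of a dictionary file gives a quick syntax check: an entry such as `bad="\r\n"` raises `ValueError`, because `\r` and `\n` are not among the escapes listed above.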

Step 2: Generate Dictionary Content

Choose a generation method based on what's available:

From LLM: Prompt ChatGPT or Claude with:

A dictionary can be used to guide the fuzzer. Write me a dictionary file for fuzzing a <PNG parser>. Each line should be a quoted string or key-value pair like kw="value". Include magic bytes, chunk types, and common header values. Use hex escapes like "\xF7\xF8" for binary values.

From header files:

grep -o '"[^"]*"' header.h > header.dict

From man pages (for CLI tools):

man curl | grep -oP '^\s*(--|-)\K\S+' | sed 's/[,.]$//' | sed 's/^/"&/; s/$/&"/' | sort -u > man.dict

From binary strings:

strings ./binary | sed 's/^/"&/; s/$/&"/' > strings.dict
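
Strings pulled from a binary can contain quotes, backslashes, or non-printable bytes, all of which need escaping before they are valid dictionary entries. The sketch below shows one way to do that, assuming the escape rules from Step 1. The helper name `to_entry` is hypothetical.

```python
def to_entry(token, name=None):
    """Render raw bytes as a dictionary line, escaping as described in Step 1."""
    parts = []
    for byte in token:
        ch = chr(byte)
        if ch == "\\":
            parts.append("\\\\")           # backslash -> \\
        elif ch == '"':
            parts.append('\\"')            # quote -> \"
        elif 0x20 <= byte <= 0x7E:
            parts.append(ch)               # printable ASCII stays as-is
        else:
            parts.append("\\x%02X" % byte)  # everything else -> \xAB
    quoted = '"' + "".join(parts) + '"'
    return "%s=%s" % (name, quoted) if name else quoted
```

For example, `to_entry(b"\x89PNG\r\n\x1a\n", "png_magic")` yields `png_magic="\x89PNG\x0D\x0A\x1A\x0A"`, with the carriage return and line feed written as hex escapes.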

Step 3: Pass Dictionary to Fuzzer

Use the appropriate flag for your fuzzer (see Quick Reference above).

Common Patterns

Pattern: Protocol Keywords

Use Case: Fuzzing HTTP or custom protocol handlers

Dictionary content:

# HTTP methods
"GET"
"POST"
"PUT"
"DELETE"
"HEAD"

# Headers
"Content-Type"
"Authorization"
"Host"

# Protocol markers
"HTTP/1.1"
"HTTP/2.0"

Pattern: Magic Bytes and File Format Headers

Use Case: Fuzzing image parsers, media decoders, archive handlers

Dictionary content:

# PNG magic bytes and chunks (CR/LF written as \x0D/\x0A: only \\, \" and \xAB escapes are valid)
png_magic="\x89PNG\x0D\x0A\x1A\x0A"
ihdr="IHDR"
plte="PLTE"
idat="IDAT"
iend="IEND"

# JPEG markers
jpeg_soi="\xFF\xD8"
jpeg_eoi="\xFF\xD9"

Pattern: Configuration File Keywords

Use Case: Fuzzing config file parsers (YAML, TOML, INI)

Dictionary content:

# Common config keywords
"true"
"false"
"null"
"version"
"enabled"
"disabled"

# Section headers
"[general]"
"[network]"
"[security]"

Advanced Usage

Tips and Tricks

Tip	Why It Helps
Combine multiple generation methods	LLM-generated keywords plus strings extracted from the binary cover a broader surface
Include boundary values	`"0"`, `"-1"`, `"2147483647"` trigger edge cases
Add format delimiters	`:`, `=`, `{`, `}` help the fuzzer construct valid structures
Keep dictionaries focused	50-200 entries perform better than thousands
Test dictionary effectiveness	Run with and without the dictionary and compare coverage
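
Boundary-value entries like those in the table can be generated rather than typed by hand. This is a short sketch: the bit widths and the helper name `boundary_entries` are assumptions for illustration, and which values matter depends on the integer types the target actually parses.

```python
def boundary_entries(bit_widths=(8, 16, 32, 64)):
    """Return quoted decimal entries around common integer limits."""
    values = {0, 1, -1}
    for bits in bit_widths:
        values.update({
            2 ** (bits - 1) - 1,   # signed maximum, e.g. 2147483647 for 32 bits
            -(2 ** (bits - 1)),    # signed minimum
            2 ** bits - 1,         # unsigned maximum
            2 ** bits,             # one past the unsigned maximum
        })
    return ['"%d"' % value for value in sorted(values)]


if __name__ == "__main__":
    print("# integer boundary values")
    print("\n".join(boundary_entries()))
```

With the default widths this produces 19 entries, which leaves room for format-specific tokens within a 50-200 entry budget.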

Auto-Generated Dictionaries (AFL++)

When the target is built with the afl-clang-lto compiler, AFL++ automatically extracts dictionary entries from string comparisons in the binary. This happens at compile time via the AUTODICTIONARY feature.

Enable auto-dictionary:

# AFL++ documents AFL_LLVM_DICT2FILE as an absolute path
export AFL_LLVM_DICT2FILE="$PWD/auto.dict"
afl-clang-lto++ target.cc -o target
# Dictionary saved to auto.dict
afl-fuzz -x auto.dict -i in -o out -- ./target

Combining Multiple Dictionaries

Some fuzzers support multiple dictionary files:

# AFL++ with multiple dictionaries
afl-fuzz -x keywords.dict -x formats.dict -i in -o out -- ./target
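
For a fuzzer invoked with a single dictionary flag, as in the libFuzzer commands in this guide, several files can be merged into one first. The following is a minimal sketch. The function name `merge_dictionaries` and the file names in the usage note are hypothetical, and entries are treated as duplicates when their quoted values match, regardless of keyword name.

```python
from pathlib import Path


def merge_dictionaries(paths, out_path):
    """Concatenate dictionary files, dropping comments, blanks, and repeated values."""
    seen = set()
    merged = []
    for path in paths:
        merged.append("# from %s" % Path(path).name)
        for line in Path(path).read_text(encoding="utf-8").splitlines():
            entry = line.strip()
            if not entry or entry.startswith("#"):
                continue
            # Compare on the quoted value so kw="GET" and "GET" count as one token.
            start = entry.find('"')
            value = entry[start:] if start >= 0 else entry
            if value in seen:
                continue
            seen.add(value)
            merged.append(entry)
    Path(out_path).write_text("\n".join(merged) + "\n", encoding="utf-8")
    return len(seen)
```

A call such as `merge_dictionaries(["keywords.dict", "formats.dict"], "merged.dict")` returns the number of unique entries written, which is also a quick check against the 50-200 entry guideline.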

Anti-Patterns

Anti-Pattern	Problem	Correct Approach
Including full sentences	Fuzzer needs atomic tokens, not prose	Break into individual keywords
Duplicating entries	Wastes mutation budget	Use `sort -u` to deduplicate
Over-sized dictionaries	Slows fuzzer, dilutes useful tokens	Keep focused: 50-200 most relevant entries
Missing hex escapes	Non-printable bytes become mangled	Use `\xXX` for binary values
No comments	Hard to maintain and audit	Document sections with `#` comments
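
Several of the anti-patterns in this table can be caught with a small lint pass. This is a sketch under stated assumptions: the helper name `lint_dictionary`, the "more than three words" heuristic for sentences, and the default limit of 200 entries are illustrative choices, not rules enforced by any fuzzer.

```python
def lint_dictionary(lines, max_entries=200):
    """Return a list of warnings for duplicate, sentence-like, or unescaped entries."""
    warnings = []
    first_seen = {}
    count = 0
    for number, line in enumerate(lines, start=1):
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue
        count += 1
        start = entry.find('"')
        value = entry[start:] if start >= 0 else entry
        if value in first_seen:
            warnings.append("line %d: duplicate of line %d" % (number, first_seen[value]))
        else:
            first_seen[value] = number
        if len(value.split()) > 3:
            warnings.append("line %d: reads like a sentence, split it into tokens" % number)
        if any(ord(ch) < 0x20 or ord(ch) > 0x7E for ch in entry):
            warnings.append("line %d: raw non-printable or non-ASCII character, use \\xXX" % number)
    if count > max_entries:
        warnings.append("%d entries, more than the suggested %d" % (count, max_entries))
    return warnings
```

Passing it the result of `open("dictionary.dict").read().splitlines()` gives a list that is empty when none of these checks fire.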

Tool-Specific Guidance

libFuzzer

clang++ -fsanitize=fuzzer,address harness.cc -o fuzz
./fuzz -dict=./dictionary.dict corpus/

Integration tips:

Dictionary tokens are inserted/replaced during mutations
Combine with -max_len to control input size
Use -print_final_stats=1 to see dictionary effectiveness metrics
Dictionary entries longer than -max_len are ignored

AFL++

afl-fuzz -x ./dictionary.dict -i input/ -o output/ -- ./target @@

Integration tips:

AFL++ supports multiple -x flags for multiple dictionaries
Use AFL_LLVM_DICT2FILE with afl-clang-lto for auto-generated dictionaries
Dictionary effectiveness is shown in the fuzzer stats UI
Tokens are used during deterministic and havoc stages

cargo-fuzz (Rust)

cargo fuzz run fuzz_target -- -dict=./dictionary.dict

Integration tips:

cargo-fuzz uses libFuzzer backend, so all libFuzzer dict flags work
Place dictionary file in fuzz/ directory alongside harness
Reference from harness directory: cargo fuzz run target -- -dict=../dictionary.dict

go-fuzz (Go)

go-fuzz does not have built-in dictionary support, but you can manually seed the corpus with dictionary entries:

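# Convert dictionary entries to raw corpus files (one token per file)
mkdir -p corpus
grep -o '"[^"]*"' dict.txt | sed 's/^"//; s/"$//' | while IFS= read -r token; do
    # Name each file after the MD5 of its token; \xNN escapes are written as-is, not decoded
    printf '%s' "$token" > "corpus/$(printf '%s' "$token" | md5sum | cut -d' ' -f1)"
done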

go-fuzz -bin=./target-fuzz.zip -workdir=.
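
A variant of the seeding step that also decodes `\xNN`, `\\`, and `\"` escapes, so binary tokens reach the corpus as raw bytes. This is a sketch, not part of go-fuzz: the function names are hypothetical, the decoder is deliberately lenient (an unknown escape keeps the escaped character), and entries are assumed to be ASCII apart from their hex escapes.

```python
import hashlib
import re
from pathlib import Path

# Quoted value at the end of a line, allowing backslash-escaped characters inside.
_TOKEN = re.compile(r'"((?:[^"\\]|\\.)*)"\s*$')
_ESCAPE = re.compile(r'\\(x[0-9A-Fa-f]{2}|.)')


def decode(body):
    """Expand \\xNN to a byte and drop the backslash from other escapes."""
    def expand(match):
        escape = match.group(1)
        return chr(int(escape[1:], 16)) if len(escape) == 3 else escape
    return _ESCAPE.sub(expand, body).encode("latin-1")


def seed_corpus(dict_path, corpus_dir):
    """Write each dictionary token to its own corpus file; return the file count."""
    corpus = Path(corpus_dir)
    corpus.mkdir(parents=True, exist_ok=True)
    names = set()
    for line in Path(dict_path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _TOKEN.search(line)
        if match is None:
            continue  # not a quoted entry; skipped rather than reported
        data = decode(match.group(1))
        name = hashlib.sha1(data).hexdigest()  # content hash keeps file names unique
        (corpus / name).write_bytes(data)
        names.add(name)
    return len(names)
```

Calling `seed_corpus("dict.txt", "corpus")` before the `go-fuzz` command above fills the `corpus` directory under the working directory with one file per unique token.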

Troubleshooting

Issue	Cause	Solution
Dictionary file not loaded	Wrong path or format error	Check fuzzer output for dict parsing errors; verify file format
No coverage improvement	Dictionary tokens not relevant	Analyze target code for actual keywords; try different generation method
Syntax errors in dict file	Unescaped quotes or invalid escapes	Use `\\` for backslash, `\"` for quotes; validate with test run
Fuzzer ignores long entries	Entries exceed `-max_len`	Keep entries under max input length, or increase `-max_len`
Too many entries slow fuzzer	Dictionary too large	Trim to 50-200 most relevant entries

Related Skills

Tools That Use This Technique

Skill	How It Applies
libfuzzer	Native dictionary support via `-dict=` flag
aflpp	Native dictionary support via `-x` flag; auto-generation with AUTODICTIONARY
cargo-fuzz	Uses libFuzzer backend, inherits `-dict=` support

Related Techniques

Skill	Relationship
fuzzing-corpus	Dictionaries complement corpus: corpus provides structure, dictionary provides keywords
coverage-analysis	Use coverage data to validate dictionary effectiveness
harness-writing	Harness structure determines which dictionary tokens are useful

Resources

Key External Resources

AFL++ Dictionaries: Pre-built dictionaries for common formats (HTML, XML, JSON, SQL, etc.). Good starting point for format-specific fuzzing.

libFuzzer Dictionary Documentation: Official libFuzzer documentation on dictionary format and usage. Explains token insertion strategy and performance implications.

Additional Examples

OSS-Fuzz Dictionaries: Real-world dictionaries from Google's continuous fuzzing service. Search project directories for *.dict files to see production examples.

Weekly Installs: 1.1K

Repository: trailofbits/skills

GitHub Stars: 3.9K

First Seen: Jan 19, 2026

Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)

Installed on: claude-code (976), opencode (932), gemini-cli (913), codex (905), cursor (881), github-copilot (852)
