fuzzing-dictionary by trailofbits/skills
npx skills add https://github.com/trailofbits/skills --skill fuzzing-dictionary
A fuzzing dictionary provides domain-specific tokens to guide the fuzzer toward interesting inputs. Instead of purely random mutations, the fuzzer incorporates known keywords, magic numbers, protocol commands, and format-specific strings that are more likely to reach deeper code paths in parsers, protocol handlers, and file format processors.
Dictionaries are text files containing quoted strings that represent meaningful tokens for your target. They help fuzzers bypass early validation checks and explore code paths that would be difficult to reach through blind mutation alone.
| Concept | Description |
|---|---|
| Dictionary Entry | A quoted string (e.g., "keyword") or key-value pair (e.g., kw="value") |
| Hex Escapes | Byte sequences like "\xF7\xF8" for non-printable characters |
| Token Injection | Fuzzer inserts dictionary entries into generated inputs |
| Cross-Fuzzer Format | Dictionary files work with libFuzzer, AFL++, and cargo-fuzz |
Apply this technique when:
Skip this technique when:
| Task | Command/Pattern |
|---|---|
| Use with libFuzzer | ./fuzz -dict=./dictionary.dict ... |
| Use with AFL++ | afl-fuzz -x ./dictionary.dict ... |
| Use with cargo-fuzz | cargo fuzz run fuzz_target -- -dict=./dictionary.dict |
| Extract from header | grep -o '".*"' header.h > header.dict |
| Generate from binary | `strings ./binary \| sed 's/^/"&/; s/$/&"/' > strings.dict` |
Create a text file with quoted strings on each line. Use comments (#) for documentation.
Example dictionary format:
# Lines starting with '#' and empty lines are ignored.
# Adds "blah" (w/o quotes) to the dictionary.
kw1="blah"
# Use \\ for backslash and \" for quotes.
kw2="\"ac\\dc\""
# Use \xAB for hex values
kw3="\xF7\xF8"
# the name of the keyword followed by '=' may be omitted:
"foo\x0Abar"
Choose a generation method based on what's available:
From LLM: Prompt ChatGPT or Claude with:
A dictionary can be used to guide the fuzzer. Write me a dictionary file for fuzzing a <PNG parser>. Each line should be a quoted string or key-value pair like kw="value". Include magic bytes, chunk types, and common header values. Use hex escapes like "\xF7\xF8" for binary values.
From header files:
grep -o '".*"' header.h > header.dict
From man pages (for CLI tools):
man curl | grep -oP '^\s*(--|-)\K\S+' | sed 's/[,.]$//' | sed 's/^/"&/; s/$/&"/' | sort -u > man.dict
From binary strings:
strings ./binary | sed 's/^/"&/; s/$/&"/' > strings.dict
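The sed pipelines above wrap each line in quotes but do not escape quotes, backslashes, or non-printable bytes that may already be inside the extracted string. The following is a minimal Python sketch of one way to emit safely escaped entries; `to_dict_entry` and `build_dictionary` are hypothetical helper names, not part of any fuzzer's tooling.

```python
def to_dict_entry(token: bytes) -> str:
    """Render raw bytes as one quoted dictionary line."""
    out = []
    for byte in token:
        ch = chr(byte)
        if ch in ('"', "\\"):
            out.append("\\" + ch)         # escape as \" or \\
        elif 0x20 <= byte < 0x7F:
            out.append(ch)                # printable ASCII stays as-is
        else:
            out.append(f"\\x{byte:02X}")  # everything else becomes \xNN
    return '"' + "".join(out) + '"'


def build_dictionary(tokens):
    """Drop empty tokens, deduplicate, and return dictionary file text."""
    lines = sorted({to_dict_entry(t) for t in tokens if t})
    return "".join(line + "\n" for line in lines)
```

Feeding it the output of `strings` (read as bytes, one token per line) should give a file that follows the escaping rules shown in the format example.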
Use the appropriate flag for your fuzzer (see Quick Reference above).
Use Case: Fuzzing HTTP or custom protocol handlers
Dictionary content:
# HTTP methods
"GET"
"POST"
"PUT"
"DELETE"
"HEAD"
# Headers
"Content-Type"
"Authorization"
"Host"
# Protocol markers
"HTTP/1.1"
"HTTP/2.0"
Use Case: Fuzzing image parsers, media decoders, archive handlers
Dictionary content:
# PNG magic bytes and chunks
png_magic="\x89PNG\x0D\x0A\x1A\x0A"
ihdr="IHDR"
plte="PLTE"
idat="IDAT"
iend="IEND"
# JPEG markers
jpeg_soi="\xFF\xD8"
jpeg_eoi="\xFF\xD9"
Use Case: Fuzzing config file parsers (YAML, TOML, INI)
Dictionary content:
# Common config keywords
"true"
"false"
"null"
"version"
"enabled"
"disabled"
# Section headers
"[general]"
"[network]"
"[security]"
| Tip | Why It Helps |
|---|---|
| Combine multiple generation methods | LLM-generated keywords plus strings from the binary cover a broader surface |
| Include boundary values | "0", "-1", "2147483647" trigger edge cases |
| Add format delimiters | :, =, {, } help the fuzzer construct valid structures |
| Keep dictionaries focused | 50-200 entries perform better than thousands |
| Test dictionary effectiveness | Run with and without the dictionary and compare coverage |
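Comparing coverage with and without a dictionary can be partly automated. The sketch below assumes libFuzzer-style status lines of the form `#123 NEW cov: 45 ft: ...` captured from two runs of equal duration; the exact log layout is an assumption and may vary between versions, and `final_coverage` and `coverage_gain` are hypothetical helper names.

```python
import re

# Matches status lines such as "#500  NEW    cov: 40 ft: 55 ..."
_COV = re.compile(r"^#\d+\s+\w+\s+cov: (\d+)", re.MULTILINE)


def final_coverage(log_text):
    """Return the last reported cov value, or None if no status line is found."""
    matches = _COV.findall(log_text)
    return int(matches[-1]) if matches else None


def coverage_gain(baseline_log, dict_log):
    """Difference in final coverage between a dictionary run and a baseline run."""
    base = final_coverage(baseline_log)
    with_dict = final_coverage(dict_log)
    if base is None or with_dict is None:
        raise ValueError("no coverage lines found in one of the logs")
    return with_dict - base
```

A positive gain over several repeated runs suggests the dictionary helps; a single run is noisy, so treat one comparison as a hint rather than a result.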
When you compile with afl-clang-lto, AFL++ automatically extracts dictionary entries from string comparisons in the target. This happens at compile time via the AUTODICTIONARY feature.
Enable auto-dictionary:
export AFL_LLVM_DICT2FILE="$(pwd)/auto.dict"  # AFL++ documents this as an absolute path
afl-clang-lto++ target.cc -o target
# Dictionary saved to auto.dict
afl-fuzz -x auto.dict -i in -o out -- ./target
Some fuzzers support multiple dictionary files:
# AFL++ with multiple dictionaries
afl-fuzz -x keywords.dict -x formats.dict -i in -o out -- ./target
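When a fuzzer accepts only a single dictionary file, several dictionaries can be merged into one beforehand. This is a small sketch of that idea, assuming each input is the text of a dictionary in the format shown earlier; `merge_dictionaries` is a hypothetical helper, and comments are dropped for simplicity.

```python
def merge_dictionaries(*texts):
    """Merge dictionary texts, keeping the first occurrence of each token."""
    seen = set()
    merged = []
    for text in texts:
        for raw in text.splitlines():
            line = raw.strip()
            if not line or line.startswith("#"):
                continue  # skip comments and blank lines
            # Compare on the quoted value so kw="GET" and "GET" count as one token.
            value = line[line.index('"'):] if '"' in line else line
            if value not in seen:
                seen.add(value)
                merged.append(line)
    return "".join(line + "\n" for line in merged)
```

Deduplicating on the quoted value rather than the whole line avoids keeping the same token twice under different keyword names.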
| Anti-Pattern | Problem | Correct Approach |
|---|---|---|
| Including full sentences | Fuzzer needs atomic tokens, not prose | Break into individual keywords |
| Duplicating entries | Wastes mutation budget | Use sort -u to deduplicate |
| Over-sized dictionaries | Slows fuzzer, dilutes useful tokens | Keep focused: 50-200 most relevant entries |
| Missing hex escapes | Non-printable bytes become mangled | Use \xXX for binary values |
| No comments | Hard to maintain and audit | Document sections with # comments |
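Most of the anti-patterns in this table can be flagged mechanically before a fuzzing run. The following is a rough sketch of such a check; `lint_dictionary` is a hypothetical helper, and its thresholds (200 entries, three or more spaces as a sign of prose) are illustrative assumptions taken loosely from the guidance above.

```python
def lint_dictionary(text, max_entries=200):
    """Return a list of warnings for common dictionary anti-patterns."""
    lines = [l.strip() for l in text.splitlines() if l.strip()]
    entries = [l for l in lines if not l.startswith("#")]
    warnings = []
    if len(entries) > max_entries:
        warnings.append(f"{len(entries)} entries; consider pruning to {max_entries}")
    seen = set()
    for entry in entries:
        if entry in seen:
            warnings.append(f"duplicate entry: {entry}")
        seen.add(entry)
        if entry.count(" ") >= 3:
            warnings.append(f"looks like prose, split into tokens: {entry}")
        if any(ord(c) < 0x20 or ord(c) > 0x7E for c in entry):
            warnings.append(f"raw non-printable character, use \\xNN: {entry}")
    if entries and len(entries) == len(lines):
        warnings.append("no # comments documenting the sections")
    return warnings
```

An empty list means none of these heuristics fired; it does not mean the dictionary is relevant to the target, which still has to be checked against coverage.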
clang++ -fsanitize=fuzzer,address harness.cc -o fuzz
./fuzz -dict=./dictionary.dict corpus/
Integration tips:
- Combine with -max_len to control input size
- Use -print_final_stats=1 to see dictionary effectiveness metrics
- Dictionary entries longer than -max_len are ignored
afl-fuzz -x ./dictionary.dict -i input/ -o output/ -- ./target @@
Integration tips:
- Use multiple -x flags for multiple dictionaries
- Use AFL_LLVM_DICT2FILE with afl-clang-lto for auto-generated dictionaries
cargo fuzz run fuzz_target -- -dict=./dictionary.dict
Integration tips:
- Keep the dictionary in the fuzz/ directory alongside the harness
- Or reference it by relative path: cargo fuzz run target -- -dict=../dictionary.dict
go-fuzz does not have built-in dictionary support, but you can manually seed the corpus with dictionary entries:
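# Convert dictionary entries to raw corpus files, one file per entry (bash)
mkdir -p corpus
grep -o '".*"' dict.txt | while IFS= read -r line; do
  entry=${line#\"}; entry=${entry%\"}   # strip the surrounding quotes
  # printf %b decodes \xNN and \\ escapes; seeds are written as raw bytes
  printf '%b' "$entry" > "corpus/$(printf '%s' "$line" | md5sum | cut -d' ' -f1)"
done
go-fuzz -bin=./target-fuzz.zip -workdir=.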
| Issue | Cause | Solution |
|---|---|---|
| Dictionary file not loaded | Wrong path or format error | Check fuzzer output for dict parsing errors; verify file format |
| No coverage improvement | Dictionary tokens not relevant | Analyze target code for actual keywords; try different generation method |
| Syntax errors in dict file | Unescaped quotes or invalid escapes | Use \\ for backslash, \" for quotes; validate with test run |
| Fuzzer ignores long entries | Entries exceed -max_len | Keep entries under max input length, or increase -max_len |
| Too many entries slow fuzzer | Dictionary too large | Prune to 50-200 most relevant entries |
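Syntax errors are easier to catch before the fuzzer starts. The sketch below checks one line at a time against the format described earlier, accepting only the `\\`, `\"`, and `\xNN` escapes; it is an approximation written for illustration, not the parser any fuzzer actually uses, and `parse_entry` is a hypothetical name.

```python
import re

# Optional keyword name and "=", then a double-quoted value.
_LINE = re.compile(r'^(?:(\w+)\s*=\s*)?"(.*)"$')
# The only escapes this sketch accepts: \\  \"  \xNN
_ESCAPE = re.compile(r'\\(\\|"|x[0-9A-Fa-f]{2})')


def parse_entry(line):
    """Return (name, token_bytes), or None for comments and blank lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    match = _LINE.match(stripped)
    if match is None:
        raise ValueError(f"not a dictionary entry: {line!r}")
    name, body = match.groups()
    # Anything left after removing valid escapes must not contain \ or ".
    leftover = _ESCAPE.sub("", body)
    if "\\" in leftover or '"' in leftover:
        raise ValueError(f"invalid escape or unescaped quote: {line!r}")

    def decode(m):
        esc = m.group(1)
        return chr(int(esc[1:], 16)) if esc.startswith("x") else esc

    # latin-1 maps code points 0-255 directly to single bytes.
    return name, _ESCAPE.sub(decode, body).encode("latin-1")
```

Running every line of a dictionary through `parse_entry` and reporting the first `ValueError` gives a quick pre-flight check; an entry such as `"a\nb"` is rejected because `\n` is not one of the accepted escapes.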
| Skill | How It Applies |
|---|---|
| libfuzzer | Native dictionary support via -dict= flag |
| aflpp | Native dictionary support via -x flag; auto-generation with AUTODICTIONARY |
| cargo-fuzz | Uses libFuzzer backend, inherits -dict= support |
| Skill | Relationship |
|---|---|
| fuzzing-corpus | Dictionaries complement corpus: corpus provides structure, dictionary provides keywords |
| coverage-analysis | Use coverage data to validate dictionary effectiveness |
| harness-writing | Harness structure determines which dictionary tokens are useful |
AFL++ Dictionaries: Pre-built dictionaries for common formats (HTML, XML, JSON, SQL, etc.). Good starting point for format-specific fuzzing.
libFuzzer Dictionary Documentation: Official libFuzzer documentation on dictionary format and usage. Explains token insertion strategy and performance implications.
OSS-Fuzz Dictionaries: Real-world dictionaries from Google's continuous fuzzing service. Search project directories for *.dict files to see production examples.
First Seen
Jan 19, 2026