Fuzzing Harness Writing Guide: Improving Code Coverage and Finding Critical Bugs

harness-writing by trailofbits/skills

Install command

npx skills add https://github.com/trailofbits/skills --skill harness-writing


Writing Fuzzing Harnesses

A fuzzing harness is the entrypoint function that receives random data from the fuzzer and routes it to your system under test (SUT). The quality of your harness directly determines which code paths get exercised and whether critical bugs are found. A poorly written harness can miss entire subsystems or produce non-reproducible crashes.

Overview

The harness is the bridge between the fuzzer's random byte generation and your application's API. It must parse raw bytes into meaningful inputs, call target functions, and handle edge cases gracefully. The most important part of any fuzzing setup is the harness—if written poorly, critical parts of your application may not be covered.

Key Concepts

Concept	Description
Harness	Function that receives fuzzer input and calls target code under test
SUT	System Under Test—the code being fuzzed
Entry point	Function signature required by the fuzzer (e.g., `LLVMFuzzerTestOneInput`)
FuzzedDataProvider	Helper class for structured extraction of typed data from raw bytes
Determinism	Property that ensures same input always produces same behavior
Interleaved fuzzing	Single harness that exercises multiple operations based on input

When to Apply

Apply this technique when:

Creating a new fuzz target for the first time
Fuzz campaign has low code coverage or isn't finding bugs
Crashes found during fuzzing are not reproducible
Target API requires complex or structured inputs
Multiple related functions should be tested together

Skip this technique when:

Using existing well-tested harnesses from your project
Tool provides automatic harness generation that meets your needs
Target already has comprehensive fuzzing infrastructure

Quick Reference

Task	Pattern
Minimal C++ harness	`extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size)`
Minimal Rust harness	`fuzz_target!(|data: &[u8]| { ... });`
Size validation	`if (size < MIN_SIZE) return 0;`
Cast to integers	`uint32_t val = *(uint32_t*)(data);`
Use FuzzedDataProvider	`FuzzedDataProvider fuzzed_data(data, size);`
Extract typed data (C++)	`auto val = fuzzed_data.ConsumeIntegral<uint32_t>();`
Extract string (C++)	`auto str = fuzzed_data.ConsumeBytesWithTerminator<char>(32, 0xFF);`

Step-by-Step

Step 1: Identify Entry Points

Find functions in your codebase that:

Accept external input (parsers, validators, protocol handlers)
Parse complex data formats (JSON, XML, binary protocols)
Perform security-critical operations (authentication, cryptography)
Have high cyclomatic complexity or many branches

Good targets are typically:

Protocol parsers
File format parsers
Serialization/deserialization functions
Input validation routines

Step 2: Write Minimal Harness

Start with the simplest possible harness that calls your target function:

C/C++:

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    target_function(data, size);
    return 0;
}

Rust:

#![no_main]
use libfuzzer_sys::fuzz_target;

fuzz_target!(|data: &[u8]| {
    target_function(data);
});

Step 3: Add Input Validation

Reject inputs that are too small or too large to be meaningful:

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    // Reject inputs outside the size range that is meaningful for the target
    if (size < MIN_INPUT_SIZE || size > MAX_INPUT_SIZE) {
        return 0;
    }
    target_function(data, size);
    return 0;
}

Rationale: The fuzzer generates random inputs of all sizes. Your harness must handle empty, tiny, huge, or malformed inputs without causing unexpected issues in the harness itself (crashes in the SUT are fine—that's what we're looking for).

Step 4: Structure the Input

For APIs that require typed data (integers, strings, etc.), use casting or helpers like FuzzedDataProvider:

Simple casting:

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    if (size != 2 * sizeof(uint32_t)) {
        return 0;
    }

    uint32_t numerator = *(uint32_t*)(data);
    uint32_t denominator = *(uint32_t*)(data + sizeof(uint32_t));

    divide(numerator, denominator);
    return 0;
}

Using FuzzedDataProvider:

#include "FuzzedDataProvider.h"

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    FuzzedDataProvider fuzzed_data(data, size);

    size_t allocation_size = fuzzed_data.ConsumeIntegral<size_t>();
    std::vector<char> str1 = fuzzed_data.ConsumeBytesWithTerminator<char>(32, 0xFF);
    std::vector<char> str2 = fuzzed_data.ConsumeBytesWithTerminator<char>(32, 0xFF);

    concat(&str1[0], str1.size(), &str2[0], str2.size(), allocation_size);
    return 0;
}

Step 5: Test and Iterate

Run the fuzzer and monitor:

Code coverage (are all interesting paths reached?)
Executions per second (is it fast enough?)
Crash reproducibility (can you reproduce crashes with saved inputs?)

Iterate on the harness to improve these metrics.
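
Crash reproducibility can be checked outside a fuzzing campaign by feeding a saved input file straight back into the harness. The sketch below is one way to do that, assuming a hypothetical helper named `replay_input` and a stub `target_function`; neither name comes from libFuzzer. A libFuzzer binary can also be given the saved file as a command-line argument to run that single input.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Stub standing in for the real SUT (hypothetical).
static void target_function(const uint8_t *data, size_t size) {
    (void)data;
    (void)size;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    target_function(data, size);
    return 0;
}

// Hypothetical helper: read a saved input from disk and run it once.
// Returns false if the file cannot be opened.
bool replay_input(const std::string &path) {
    std::ifstream file(path, std::ios::binary);
    if (!file) {
        return false;
    }
    std::vector<uint8_t> bytes((std::istreambuf_iterator<char>(file)),
                               std::istreambuf_iterator<char>());
    LLVMFuzzerTestOneInput(bytes.data(), bytes.size());
    return true;
}
```

If the same file crashes on every replay, the bug depends only on the input. If it crashes only sometimes, the harness or SUT probably carries state or randomness between runs.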

Common Patterns

Pattern: Beyond Byte Arrays—Casting to Integers

Use Case: When target expects primitive types like integers or floats

Implementation:

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    // Ensure exactly 2 4-byte numbers
    if (size != 2 * sizeof(uint32_t)) {
        return 0;
    }

    // Split input into two integers
    uint32_t numerator = *(uint32_t*)(data);
    uint32_t denominator = *(uint32_t*)(data + sizeof(uint32_t));

    divide(numerator, denominator);
    return 0;
}

Rust equivalent:

fuzz_target!(|data: &[u8]| {
    if data.len() != 2 * std::mem::size_of::<i32>() {
        return;
    }

    let numerator = i32::from_ne_bytes([data[0], data[1], data[2], data[3]]);
    let denominator = i32::from_ne_bytes([data[4], data[5], data[6], data[7]]);

    divide(numerator, denominator);
});

Why it works: Any 8-byte input is valid. The fuzzer learns that inputs must be exactly 8 bytes, and every bit flip produces a new, potentially interesting input.
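
The pointer casts in this pattern assume the input buffer is suitably aligned and that reading it through a `uint32_t*` is acceptable to the compiler. A variant that avoids both assumptions copies the bytes with `memcpy`. The sketch below assumes a hypothetical helper `read_u32` and a stub `divide` that guards against zero so the example runs on its own; a real SUT would not be stubbed.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stub standing in for the SUT's divide() (hypothetical).
static uint32_t divide(uint32_t numerator, uint32_t denominator) {
    return denominator == 0 ? 0 : numerator / denominator;
}

// Hypothetical helper: copy four bytes into a uint32_t in native byte order.
uint32_t read_u32(const uint8_t *bytes) {
    uint32_t value;
    std::memcpy(&value, bytes, sizeof(value));
    return value;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    // Require exactly two 4-byte integers
    if (size != 2 * sizeof(uint32_t)) {
        return 0;
    }

    uint32_t numerator = read_u32(data);
    uint32_t denominator = read_u32(data + sizeof(uint32_t));

    divide(numerator, denominator);
    return 0;
}
```

Compilers typically turn the fixed-size `memcpy` into a plain load, so this form should cost nothing at run time.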

Pattern: FuzzedDataProvider for Complex Inputs

Use Case: When target requires multiple strings, integers, or variable-length data

Implementation:

#include "FuzzedDataProvider.h"

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    FuzzedDataProvider fuzzed_data(data, size);

    // Extract different types of data
    size_t allocation_size = fuzzed_data.ConsumeIntegral<size_t>();

    // Consume variable-length strings with terminator
    std::vector<char> str1 = fuzzed_data.ConsumeBytesWithTerminator<char>(32, 0xFF);
    std::vector<char> str2 = fuzzed_data.ConsumeBytesWithTerminator<char>(32, 0xFF);

    char* result = concat(&str1[0], str1.size(), &str2[0], str2.size(), allocation_size);
    if (result != NULL) {
        free(result);
    }

    return 0;
}

Why it helps: FuzzedDataProvider handles the complexity of extracting structured data from a byte stream. It's particularly useful for APIs that need multiple parameters of different types.
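
To make the idea concrete, here is a deliberately simplified stand-in for what such a helper does: it hands out typed values from the front of the buffer and never reads past the end. `ByteReader`, `consume_u32`, and `consume_string` are hypothetical names for illustration only; this is not the real FuzzedDataProvider API and it omits most of its features.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical, simplified reader; not the real FuzzedDataProvider.
class ByteReader {
public:
    ByteReader(const uint8_t *data, size_t size) : data_(data), size_(size) {}

    // Read a uint32_t; bytes missing at the end of the input count as zero.
    uint32_t consume_u32() {
        uint8_t raw[sizeof(uint32_t)] = {0};
        size_t n = std::min(sizeof(raw), size_);
        if (n > 0) {
            std::memcpy(raw, data_, n);
        }
        advance(n);
        uint32_t value;
        std::memcpy(&value, raw, sizeof(value));
        return value;
    }

    // Read up to max_len bytes as a string; shorter if the input runs out.
    std::string consume_string(size_t max_len) {
        size_t n = std::min(max_len, size_);
        if (n == 0) {
            return std::string();
        }
        std::string out(reinterpret_cast<const char *>(data_), n);
        advance(n);
        return out;
    }

    size_t remaining() const { return size_; }

private:
    void advance(size_t n) {
        data_ += n;
        size_ -= n;
    }

    const uint8_t *data_;
    size_t size_;
};
```

Because every read is clamped to the remaining length, a harness built on a reader like this needs no separate size check before extracting each argument.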

Pattern: Interleaved Fuzzing

Use Case: When multiple related operations should be tested in a single harness

Implementation:

#include <cstddef>
#include <cstdint>
#include <cstring>

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    if (size < 1 + 2 * sizeof(int32_t)) {
        return 0;
    }

    // First byte selects operation
    uint8_t mode = data[0];

    // Next bytes are operands; memcpy avoids an unaligned read at data + 1
    int32_t numbers[2];
    std::memcpy(numbers, data + 1, 2 * sizeof(int32_t));

    int32_t result = 0;
    switch (mode % 4) {
        case 0:
            result = add(numbers[0], numbers[1]);
            break;
        case 1:
            result = subtract(numbers[0], numbers[1]);
            break;
        case 2:
            result = multiply(numbers[0], numbers[1]);
            break;
        case 3:
            result = divide(numbers[0], numbers[1]);
            break;
    }

    // Prevent the compiler from optimizing away the calls, without logging
    volatile int32_t sink = result;
    (void)sink;
    return 0;
}

Advantages:

Faster to write one harness than multiple individual harnesses
Single shared corpus means interesting inputs for one operation may be interesting for others
Can discover bugs in interactions between operations

When to use:

Operations share similar input types
Operations are logically related (e.g., arithmetic operations, CRUD operations)
Single corpus makes sense across all operations
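
When the goal is to find bugs in interactions between operations, one input can encode a whole sequence of calls against the same object rather than a single call. The following is a minimal sketch of that idea, assuming a hypothetical stateful SUT (`BoundedStack`) and a hypothetical helper `apply_ops`; the mapping of bytes to operations is an arbitrary choice for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stateful SUT: an integer stack with a fixed capacity.
class BoundedStack {
public:
    bool push(int value) {
        if (items_.size() >= 16) {
            return false;  // full
        }
        items_.push_back(value);
        return true;
    }
    bool pop() {
        if (items_.empty()) {
            return false;
        }
        items_.pop_back();
        return true;
    }
    void clear() { items_.clear(); }
    size_t size() const { return items_.size(); }

private:
    std::vector<int> items_;
};

// Each input byte selects one operation, so the byte string is a script
// of calls executed in order against the same stack.
void apply_ops(BoundedStack &stack, const uint8_t *data, size_t size) {
    for (size_t i = 0; i < size; ++i) {
        switch (data[i] % 3) {
            case 0: stack.push(data[i]); break;
            case 1: stack.pop(); break;
            case 2: stack.clear(); break;
        }
    }
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    BoundedStack stack;  // fresh object each iteration keeps runs reproducible
    apply_ops(stack, data, size);
    return 0;
}
```

Creating the object inside the harness means the saved input alone determines the sequence, so a crash found this way can be replayed from the file.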

Pattern: Structure-Aware Fuzzing with Arbitrary (Rust)

Use Case: When fuzzing Rust code that uses custom structs

Implementation:

use arbitrary::Arbitrary;
use std::process;

#[derive(Debug, Arbitrary)]
pub struct Name {
    data: String
}

impl Name {
    pub fn check_buf(&self) {
        let data = self.data.as_bytes();
        if data.len() > 0 && data[0] == b'a' {
            if data.len() > 1 && data[1] == b'b' {
                if data.len() > 2 && data[2] == b'c' {
                    process::abort();
                }
            }
        }
    }
}

Harness with arbitrary:

#![no_main]
use libfuzzer_sys::fuzz_target;

fuzz_target!(|data: your_project::Name| {
    data.check_buf();
});

Add to Cargo.toml:

[dependencies]
arbitrary = { version = "1", features = ["derive"] }

Why it helps: The arbitrary crate automatically handles deserialization of raw bytes into your Rust structs, reducing boilerplate and ensuring valid struct construction.

Limitation: The arbitrary crate doesn't offer reverse serialization, so you can't manually construct byte arrays that map to specific structs. This works best when starting from an empty corpus (fine for libFuzzer, problematic for AFL++).

Advanced Usage

Tips and Tricks

Tip	Why It Helps
Start with parsers	High bug density, clear entry points, easy to harness
Mock I/O operations	Prevents hangs from blocking I/O, enables determinism
Use FuzzedDataProvider	Simplifies extraction of structured data from raw bytes
Reset global state	Ensures each iteration is independent and reproducible
Free resources in harness	Prevents memory exhaustion during long campaigns
Avoid logging in harness	Logging is slow; fuzzing needs hundreds to thousands of executions per second
Test harness manually first	Run harness with known inputs before starting campaign
Check coverage early	Ensure harness reaches expected code paths
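
For the "free resources in harness" tip, one option in C++ is to hold C-style allocations in a smart pointer so they are released on every return path. The sketch below assumes a hypothetical SUT function `duplicate_buffer` that returns a `malloc`-allocated buffer; `FreeDeleter` is likewise an illustrative name.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <memory>

// Hypothetical C-style SUT function: returns a malloc'd copy or NULL.
static char *duplicate_buffer(const uint8_t *data, size_t size) {
    char *copy = static_cast<char *>(std::malloc(size + 1));
    if (copy == nullptr) {
        return nullptr;
    }
    if (size > 0) {
        std::memcpy(copy, data, size);
    }
    copy[size] = '\0';
    return copy;
}

// Deleter that releases memory obtained from malloc.
struct FreeDeleter {
    void operator()(void *ptr) const { std::free(ptr); }
};

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    // unique_ptr frees the buffer when it goes out of scope.
    std::unique_ptr<char, FreeDeleter> copy(duplicate_buffer(data, size));
    if (!copy) {
        return 0;
    }
    if (copy.get()[0] == '\0') {
        return 0;  // early return still frees the buffer
    }
    return 0;
}
```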

Structure-Aware Fuzzing with Protocol Buffers

For highly structured input formats, consider using Protocol Buffers as an intermediate format with custom mutators:

// Define your input format in .proto file
// Use libprotobuf-mutator to generate valid mutations
// This ensures fuzzer mutates message contents, not the protobuf encoding itself

This approach requires more setup but prevents the fuzzer from wasting time on unparseable inputs. See structure-aware fuzzing documentation for details.

Handling Non-Determinism

Problem: Random values or timing dependencies cause non-reproducible crashes.

Solutions:

Replace rand() with deterministic PRNG seeded from fuzzer input:

uint32_t seed = fuzzed_data.ConsumeIntegral<uint32_t>();
srand(seed);

Mock system calls that return time, PIDs, or random data
Avoid reading from /dev/random or /dev/urandom
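
Where the SUT can be changed to accept a generator, the same idea works without the process-wide `rand()` state. This is a sketch under that assumption: `seed_from_input` and `pick_bucket` are hypothetical names, and `std::mt19937` is used because its output sequence for a given seed is fixed by the C++ standard.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <random>

// Hypothetical helper: derive a seed from the first bytes of the input.
// Inputs shorter than four bytes leave the remaining seed bytes at zero.
uint32_t seed_from_input(const uint8_t *data, size_t size) {
    uint32_t seed = 0;
    size_t n = size < sizeof(seed) ? size : sizeof(seed);
    if (n > 0) {
        std::memcpy(&seed, data, n);
    }
    return seed;
}

// Hypothetical SUT routine that needs randomness. It takes the generator
// as a parameter instead of calling rand() or reading the clock.
uint32_t pick_bucket(std::mt19937 &rng) {
    return static_cast<uint32_t>(rng() % 8);
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    std::mt19937 rng(seed_from_input(data, size));
    pick_bucket(rng);
    return 0;
}
```

Because the seed is part of the saved input, replaying a crash file reproduces the same random choices.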

Resetting Global State

If your SUT uses global state (singletons, static variables), reset it between iterations:

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    // Reset global state before each iteration
    global_reset();

    target_function(data, size);

    // Clean up resources
    global_cleanup();
    return 0;
}

Rationale: Global state can cause crashes after N iterations rather than on a specific input, making bugs non-reproducible.
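
The effect is easy to demonstrate with a toy SUT. In the sketch below, a hypothetical `target_function` fails on its third call since the last reset, regardless of input; `g_calls`, `global_reset`, and `run_iteration` are illustrative names only.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical SUT with hidden global state.
static int g_calls = 0;

static void global_reset() { g_calls = 0; }

// Returns false on the third call since the last reset, whatever the input.
static bool target_function(const uint8_t *data, size_t size) {
    (void)data;
    (void)size;
    ++g_calls;
    return g_calls < 3;
}

// Run one iteration, optionally resetting the global state first.
bool run_iteration(const uint8_t *data, size_t size, bool reset_first) {
    if (reset_first) {
        global_reset();
    }
    return target_function(data, size);
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    // With the reset, every iteration starts from the same state.
    run_iteration(data, size, /*reset_first=*/true);
    return 0;
}
```

Without the reset, the failure would be attributed to whichever input happened to arrive third, and replaying that input alone would not reproduce it.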

Practical Harness Rules

Follow these rules to ensure effective fuzzing harnesses:

Rule	Rationale
Handle all input sizes	The fuzzer generates empty, tiny, and huge inputs; the harness must handle them gracefully
Never call `exit()`	Calling `exit()` stops the fuzzer process. Use `abort()` in the SUT if needed
Join all threads	Each iteration must run to completion before the next iteration starts
Be fast	Aim for hundreds to thousands of executions per second. Avoid logging, high complexity, and excess memory use
Maintain determinism	The same input must always produce the same behavior for reproducibility
Avoid global state	Global state reduces reproducibility; reset it between iterations if unavoidable
Use narrow targets	Do not fuzz PNG and TCP in the same harness; different formats need separate targets
Free resources	Prevents memory leaks from exhausting resources during long campaigns

Note: These guidelines apply not just to harness code, but to the entire SUT. If the SUT violates these rules, consider patching it (see the fuzzing obstacles technique).

Anti-Patterns

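Anti-Pattern	Problem	Correct Approach
Global state without reset	Non-deterministic crashes	Reset all globals at start of harness
Blocking I/O or network calls	Hangs fuzzer, wastes time	Mock I/O, use in-memory buffers
Memory leaks in harness	Resource exhaustion kills campaign	Free all allocations before returning
Calling `exit()` in SUT	Stops entire fuzzing process	Use `abort()` or return error codes
Heavy logging in harness	Reduces exec/sec by orders of magnitude	Disable logging during fuzzing
Too many operations per iteration	Slows down the fuzzer	Keep iterations fast and focused
Mixing unrelated input formats	Corpus entries are not useful across formats	Use separate harnesses for different formats
Not validating input size	Harness crashes on edge cases	Check `size` before accessing `data`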

Tool-Specific Guidance

libFuzzer

Harness signature:

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    // Your code here
    return 0;  // -1 tells libFuzzer not to add the input to the corpus; other non-zero values are reserved
}

Compilation:

clang++ -fsanitize=fuzzer,address -g harness.cc -o fuzz_target

Integration tips:

Use FuzzedDataProvider.h for structured input extraction
Compile with -fsanitize=fuzzer to link the fuzzing runtime
Add sanitizers (-fsanitize=address,undefined) to detect more bugs
Use -g for better stack traces when crashes occur
libFuzzer can start with empty corpus—no seed inputs required

Running:

./fuzz_target corpus_dir/

AFL++

AFL++ supports multiple harness styles. For best performance, use persistent mode:

Persistent mode harness:

#include <unistd.h>

int main(int argc, char **argv) {
    #ifdef __AFL_HAVE_MANUAL_CONTROL
        __AFL_INIT();
    #endif

    unsigned char buf[MAX_SIZE];

    while (__AFL_LOOP(10000)) {
        // Read input from stdin
        ssize_t len = read(0, buf, sizeof(buf));
        if (len <= 0) break;

        // Call target function
        target_function(buf, len);
    }

    return 0;
}

Compilation:

afl-clang-fast++ -g harness.cc -o fuzz_target

Integration tips:

Use persistent mode (__AFL_LOOP) for 10-100x speedup
Consider deferred initialization (__AFL_INIT()) to skip setup overhead
AFL++ requires at least one seed input in the corpus directory
Use AFL_USE_ASAN=1 or AFL_USE_UBSAN=1 for sanitizer builds

Running:

afl-fuzz -i seeds/ -o findings/ -- ./fuzz_target

cargo-fuzz (Rust)

Harness signature:

#![no_main]
use libfuzzer_sys::fuzz_target;

fuzz_target!(|data: &[u8]| {
    // Your code here
});

With structured input (arbitrary crate):

#![no_main]
use libfuzzer_sys::fuzz_target;

fuzz_target!(|data: YourStruct| {
    data.check();
});

Creating harness:

cargo fuzz init
cargo fuzz add my_target

Integration tips:

Use arbitrary crate for automatic struct deserialization
cargo-fuzz wraps libFuzzer, so all libFuzzer features work
Compile with sanitizers automatically via cargo-fuzz
Harnesses go in fuzz/fuzz_targets/ directory

Running:

cargo +nightly fuzz run my_target

go-fuzz

Harness signature:

// +build gofuzz

package mypackage

func Fuzz(data []byte) int {
    // Call target function
    target(data)

    // Return codes:
    // -1 if the input must not be added to the corpus
    //  0 for ordinary inputs
    //  1 if the fuzzer should raise the priority of this input
    return 0
}

Building:

go-fuzz-build

Integration tips:

Return 1 to raise an input's priority (for example, input that parsed successfully); new coverage is detected automatically
Return -1 to keep an input out of the corpus even if it adds coverage
go-fuzz handles persistence automatically

Running:

go-fuzz -bin=./mypackage-fuzz.zip -workdir=fuzz

Troubleshooting

Issue	Cause	Solution
Low executions/sec	Harness is too slow (logging, I/O, complexity)	Profile harness, remove bottlenecks, mock I/O
No crashes found	Coverage not reaching buggy code	Check coverage, improve harness to reach more paths
Non-reproducible crashes	Non-determinism or global state	Remove randomness, reset globals between iterations
Fuzzer exits immediately	Harness calls `exit()`	Replace `exit()` with `abort()` or return error
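Out of memory errors	Memory leaks in harness or SUT	Free allocations, use leak sanitizer to find leaks
Crashes on empty input	Harness doesn't validate size	Add `if (size < MIN_SIZE) return 0;`
Corpus not growing	Inputs too constrained or format too strict	Use FuzzedDataProvider or structure-aware fuzzing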

Related Skills

Tools That Use This Technique

Skill	How It Applies
libfuzzer	Uses `LLVMFuzzerTestOneInput` harness signature with FuzzedDataProvider
aflpp	Supports persistent mode harnesses with `__AFL_LOOP` for performance
cargo-fuzz	Uses Rust-specific `fuzz_target!` macro with arbitrary crate integration
atheris	Python harness takes bytes, calls Python functions
ossfuzz	Requires harnesses in specific directory structure for cloud fuzzing

Related Techniques

Skill	Relationship
coverage-analysis	Measure harness effectiveness—are you reaching target code?
address-sanitizer	Detects bugs found by harness (buffer overflows, use-after-free)
fuzzing-dictionary	Provide tokens to help fuzzer pass format checks in harness
fuzzing-obstacles	Patch SUT when it violates harness rules (exit, non-determinism)

Resources

Key External Resources

Split Inputs in libFuzzer - Google Fuzzing Docs: Explains techniques for handling multiple input parameters in a single fuzzing harness, including use of magic separators and FuzzedDataProvider.

Structure-Aware Fuzzing with Protocol Buffers: Advanced technique using protobuf as an intermediate format with custom mutators to ensure the fuzzer mutates message contents rather than the format encoding.

libFuzzer Documentation: Official LLVM documentation covering harness requirements, best practices, and advanced features.

cargo-fuzz Book: Comprehensive guide to writing Rust fuzzing harnesses with cargo-fuzz and the arbitrary crate.

Video Resources

Effective File Format Fuzzing - Conference talk on writing harnesses for file format parsers
Modern Fuzzing of C/C++ Projects - Tutorial covering harness design patterns

Weekly Installs: 1.1K
Repository: trailofbits/skills
GitHub Stars: 3.9K
First Seen: Jan 19, 2026
