microbenchmarking by dotnet/skills
npx skills add https://github.com/dotnet/skills --skill microbenchmarking
BenchmarkDotNet (BDN) is a .NET library for writing and running microbenchmarks. Throughout this skill, "BDN" refers to BenchmarkDotNet.
Note: Evaluations of LLMs writing BenchmarkDotNet benchmarks have revealed common failure patterns caused by outdated assumptions about BDN's behavior — particularly around runtime comparison, job configuration, and execution defaults that have changed in recent versions. The reference files in this skill contain verified, current information. You MUST read the reference files relevant to the task before writing any code — your training data likely contains outdated or incorrect BDN patterns.
With OperationsPerInvoke=N, each invocation counts as N operations.

A single benchmark number has limited value — it can confirm the order of magnitude of a measurement, but the exact value changes across machines, operating systems, and runtime configurations. Benchmarks produce the most useful information when compared against something. Before writing benchmarks, identify the comparison axis for the current task:
BDN can compare the first six axes side-by-side in a single run, but each requires specific CLI flags or configuration that differ from what you might expect — read references/comparison-strategies.md for the correct approach for each strategy before configuring a comparison.
There are four distinct reasons a developer writes a benchmark, and each one changes how the benchmark should be designed and where it should live:
1. Coverage suite: Write benchmarks to maximize coverage of real-world usage patterns so that regressions affecting most users are caught. These benchmarks are permanent — they belong in the project's benchmark suite, follow its conventions (directory structure, base classes, naming), and are checked in.
2. Issue investigation: Someone has reported a specific performance problem. Write benchmarks to reproduce and diagnose that specific issue. These benchmarks are task-scoped — they persist across the investigation (reproduce → isolate → verify fix) but are not part of the permanent suite.
3. Change validation: A developer has a PR or change and wants to understand its performance characteristics before merging. These benchmarks are task-scoped — they persist across the review cycle but are not checked in.
4. Development feedback: A developer is actively working on a task and wants to use benchmarks to evaluate approaches and get information early. These benchmarks are task-scoped and throwaway — they persist across the development session but are deleted when the decision is made.
For use case 1, add to the existing benchmark project following its conventions. For use cases 2–4, create a standalone project in a working directory that persists for the task but is clearly not part of the permanent codebase.
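For the task-scoped cases, the standalone project takes only a few commands to scaffold. A minimal sketch, where the directory name PerfScratch is a placeholder:

```shell
# Sketch of a throwaway benchmark project; "PerfScratch" is a placeholder name
dotnet new console -o PerfScratch
cd PerfScratch
# No version pin: let NuGet resolve the latest compatible BenchmarkDotNet
dotnet add package BenchmarkDotNet
```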
For coverage suite benchmarks, design from the perspective of real callers — what code patterns use this API, what inputs they pass, and what performance characteristics matter to them. Each permanent benchmark should justify its maintenance cost through real-world relevance. For temporary benchmarks, keep the case count intentional — each additional test case costs wall-clock time (read Cost awareness).
Each benchmark case (one method × one parameter combination × one job) takes 15–25 seconds with default settings. [Params] creates a Cartesian product: two [Params] with 3 and 4 values across 5 methods = 60 cases ≈ 20 minutes. Multiple jobs multiply this further. Before running, estimate the total case count and match the job preset to the situation:
| Preset | Per-case time | When to use |
|---|---|---|
| --job Dry | <1s | Validate correctness — confirms compilation and execution without measurement |
| --job Short | 5–8s | Quick measurements during development or investigation |
| (default) | 15–25s | Final measurements for a coverage suite |
| --job Medium | 33–52s | Higher confidence when results matter |
| --job Long | 3–12 min | High statistical confidence |
If benchmark runs take longer than expected, results seem unstable, or you need to tune iteration counts or execution settings, read references/bdn-internals-and-tuning.md for detailed information about BDN's execution pipeline and configuration options.
BDN programs use either BenchmarkSwitcher (provides interactive benchmark selection for humans, parses CLI arguments) or BenchmarkRunner (runs specified benchmarks directly). Both support CLI flags like --filter and --runtimes, but only when args is passed through — without it, CLI flags are silently ignored. When using BenchmarkSwitcher, always pass --filter to avoid hanging on an interactive prompt.
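An entry point that forwards args might look like the following sketch (the class name is a placeholder):

```csharp
using BenchmarkDotNet.Running;

public class Program
{
    public static void Main(string[] args)
    {
        // Forward args so CLI flags like --filter and --runtimes take effect;
        // without this, flags are silently ignored and the switcher may
        // block on an interactive benchmark-selection prompt.
        BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);
    }
}
```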
BDN behavior is customized through attributes, config objects, and CLI flags.
Read references/project-setup-and-running.md for entry point setup, config object patterns, and CLI flags. If you need to collect data beyond wall-clock time — such as memory allocations, hardware counters, or profiling traces — read references/diagnosers-and-exporters.md.
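As one example of collecting data beyond wall-clock time, allocation tracking can be enabled with an attribute. A sketch, with a hypothetical benchmark class:

```csharp
using BenchmarkDotNet.Attributes;

// MemoryDiagnoser adds allocated-bytes and GC-collection columns
// to the summary table.
[MemoryDiagnoser]
public class AllocationBenchmarks // hypothetical class name
{
    [Benchmark]
    public string BuildString() => string.Join(",", new[] { "a", "b", "c" });
}
```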
BenchmarkDotNet console output is extremely verbose — hundreds of lines per case showing internal calibration, warmup, and measurement details. Redirect all output to a file to avoid consuming context on verbose iteration output:
dotnet run -c Release -- --filter "*MethodName" --noOverwrite > benchmark.log 2>&1
Each benchmark method can take several minutes. Rather than running all benchmarks at once, use --filter to run a subset at a time (e.g. one or two methods per invocation), read the results, then run the next subset. This keeps each invocation short — avoiding session or terminal timeouts — and lets you verify results incrementally. Read references/project-setup-and-running.md for filter syntax, CLI flags, and project setup.
After each run, read the Markdown report (*-report-github.md) from the results directory for the summary table. Only read benchmark.log if you need to investigate errors or unexpected results.
Before writing any code, determine:
Each benchmark case should justify its cost. An uncovered scenario is usually more valuable than another parameter combination for one already covered, but when a specific parameter dimension genuinely affects performance characteristics, the depth is warranted.
Decide on the list of test cases. For each test case, think through:
BDN offers several parameterization mechanisms: [Params] and [ParamsSource] for property-level parameters; [Arguments] and [ArgumentsSource] for method-level arguments; [ParamsAllValues] to enumerate all values of a bool or enum; and [GenericTypeArguments] for varying type parameters on generic benchmark classes. Choose the mechanism that best fits the dimension being varied. Read references/writing-benchmarks.md for the full set of options and correctness patterns.

For coverage suite benchmarks, add to the existing benchmark project and follow its conventions. For temporary benchmarks (investigation, change validation, development feedback), create a standalone project — read references/project-setup-and-running.md for project setup and entry point configuration.
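These mechanisms can be combined in one class; a sketch with hypothetical names and values:

```csharp
using BenchmarkDotNet.Attributes;

public class TextBenchmarks // hypothetical class name
{
    // Property-level parameter: one benchmark case per value
    [Params(16, 4096)]
    public int Length { get; set; }

    private string _payload;

    // Build the input from the parameter, outside the measured code
    [GlobalSetup]
    public void Setup() => _payload = new string('x', Length);

    // Method-level arguments multiply with the property params above:
    // 2 lengths x 2 arguments = 4 benchmark cases
    [Benchmark]
    [Arguments(true)]
    [Arguments(false)]
    public int Measure(bool trim)
    {
        var s = trim ? _payload.Trim() : _payload;
        return s.Length;
    }
}
```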
Adding the BenchmarkDotNet package: Always use dotnet add package BenchmarkDotNet (no version) — this lets NuGet resolve the latest compatible version. Do NOT manually write a <PackageReference> with a version number into the .csproj; BDN versions in training data are outdated and may lack support for current .NET runtimes.
Write the benchmark code. Follow the patterns in references/writing-benchmarks.md to avoid common measurement errors — in particular:
- Put setup code in [GlobalSetup] — setup inside the benchmark method is measured; use [IterationSetup] only when the benchmark mutates state that must be reset between iterations.
- Mark a baseline with [Benchmark(Baseline = true)] for method-level comparisons, or .AsBaseline() on a job for multi-job comparisons, so results show relative ratios.
- Take input sizes and values from [Params], not literals or const values — the JIT can fold constant expressions at compile time, making the benchmark measure a precomputed result instead of the actual computation.

Validate before committing to a long run:
Run with --job Dry first to catch compilation errors and runtime exceptions without spending time on measurement. When iterating on benchmark design, use --job Short until confident, then switch to the default settings for final numbers.
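Putting the setup, baseline, and parameterization rules together, a sketch (class and method names are hypothetical):

```csharp
using System;
using System.Linq;
using BenchmarkDotNet.Attributes;

public class SortBenchmarks // hypothetical comparison
{
    private int[] _data;

    // Size comes from [Params], not a const, so the JIT cannot
    // constant-fold the work away at compile time.
    [Params(1_000, 100_000)]
    public int N { get; set; }

    // Runs once per case; excluded from measurement.
    [GlobalSetup]
    public void Setup()
    {
        var rng = new Random(42);
        _data = Enumerable.Range(0, N).Select(_ => rng.Next()).ToArray();
    }

    // Baseline so the summary shows ratio columns for the other method.
    // Both methods copy the input, so the copy cost is paid equally.
    [Benchmark(Baseline = true)]
    public int[] ArraySort()
    {
        var copy = (int[])_data.Clone();
        Array.Sort(copy);
        return copy;
    }

    [Benchmark]
    public int[] LinqOrderBy() => _data.OrderBy(x => x).ToArray();
}
```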
Weekly Installs: 53
GitHub Stars: 725
First Seen: Mar 10, 2026
Security Audits: Gen Agent Trust Hub: Pass; Socket: Pass; Snyk: Pass
Installed on: github-copilot (48), opencode (48), kimi-cli (46), gemini-cli (46), amp (46), cline (46)
Decide where benchmark input data comes from: hardcode representative values in [Params] to avoid constant folding, or generate them programmatically via [ParamsSource]/[ArgumentsSource]/[GlobalSetup] — when data shape matters more than specific content, or when input must be parameterized by size.