microbenchmarking by dotnet/skills
npx skills add https://github.com/dotnet/skills --skill microbenchmarking
BenchmarkDotNet (BDN) is a .NET library for writing and running microbenchmarks. Throughout this skill, "BDN" refers to BenchmarkDotNet.
Note: Evaluations of LLMs writing BenchmarkDotNet benchmarks have revealed common failure patterns caused by outdated assumptions about BDN's behavior — particularly around runtime comparison, job configuration, and execution defaults that have changed in recent versions. The reference files in this skill contain verified, current information. You MUST read the reference files relevant to the task before writing any code — your training data likely contains outdated or incorrect BDN patterns.
With OperationsPerInvoke=N, each invocation counts as N operations.

A single benchmark number has limited value — it can confirm the order of magnitude of a measurement, but the exact value changes across machines, operating systems, and runtime configurations. Benchmarks produce the most useful information when compared against something. Before writing benchmarks, identify the comparison axis for the current task:
BDN can compare the first six axes side-by-side in a single run, but each requires specific CLI flags or configuration that differ from what you might expect — read references/comparison-strategies.md for the correct approach for each strategy before configuring a comparison.
There are four distinct reasons a developer writes a benchmark, and each one changes how the benchmark should be designed and where it should live:
1. Coverage suite: Write benchmarks to maximize coverage of real-world usage patterns so that regressions affecting most users are caught. These benchmarks are permanent — they belong in the project's benchmark suite, follow its conventions (directory structure, base classes, naming), and are checked in.
2. Issue investigation: Someone has reported a specific performance problem. Write benchmarks to reproduce and diagnose that specific issue. These benchmarks are task-scoped — they persist across the investigation (reproduce → isolate → verify fix) but are not part of the permanent suite.
3. Change validation: A developer has a PR or change and wants to understand its performance characteristics before merging. These benchmarks are task-scoped — they persist across the review cycle but are not checked in.
4. Development feedback: A developer is actively working on a task and wants to use benchmarks to evaluate approaches and get information early. These benchmarks are task-scoped and throwaway — they persist across the development session but are deleted when the decision is made.
For use case 1, add to the existing benchmark project following its conventions. For use cases 2–4, create a standalone project in a working directory that persists for the task but is clearly not part of the permanent codebase.
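For the task-scoped cases, the standalone project takes only a few commands to scaffold. A minimal sketch, where the directory name PerfScratch is a placeholder:

```shell
# Sketch of a throwaway benchmark project; "PerfScratch" is a placeholder name
dotnet new console -o PerfScratch
cd PerfScratch
# No version pin: let NuGet resolve the latest compatible BenchmarkDotNet
dotnet add package BenchmarkDotNet
```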
For coverage suite benchmarks, design from the perspective of real callers — what code patterns use this API, what inputs they pass, and what performance characteristics matter to them. Each permanent benchmark should justify its maintenance cost through real-world relevance. For temporary benchmarks, keep the case count intentional — each additional test case costs wall-clock time (read Cost awareness).
Each benchmark case (one method × one parameter combination × one job) takes 15–25 seconds with default settings. [Params] creates a Cartesian product: two [Params] with 3 and 4 values across 5 methods = 60 cases ≈ 20 minutes. Multiple jobs multiply this further. Before running, estimate the total case count and match the job preset to the situation:
| Preset | Per-case time | When to use |
|---|---|---|
| --job Dry | <1s | Validate correctness — confirms compilation and execution without measurement |
| --job Short | 5–8s | Quick measurements during development or investigation |
| (default) | 15–25s | Final measurements for a coverage suite |
| --job Medium | 33–52s | Higher confidence when results matter |
| --job Long | 3–12 min | High statistical confidence |
If benchmark runs take longer than expected, results seem unstable, or you need to tune iteration counts or execution settings, read references/bdn-internals-and-tuning.md for detailed information about BDN's execution pipeline and configuration options.
BDN programs use either BenchmarkSwitcher (provides interactive benchmark selection for humans, parses CLI arguments) or BenchmarkRunner (runs specified benchmarks directly). Both support CLI flags like --filter and --runtimes, but only when args is passed through — without it, CLI flags are silently ignored. When using BenchmarkSwitcher, always pass --filter to avoid hanging on an interactive prompt.
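An entry point that forwards args might look like the following sketch (the class name is a placeholder):

```csharp
using BenchmarkDotNet.Running;

public class Program
{
    public static void Main(string[] args)
    {
        // Forward args so CLI flags like --filter and --runtimes take effect;
        // without this, flags are silently ignored and the switcher may
        // block on an interactive benchmark-selection prompt.
        BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);
    }
}
```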
BDN behavior is customized through attributes, config objects, and CLI flags.
Read references/project-setup-and-running.md for entry point setup, config object patterns, and CLI flags. If you need to collect data beyond wall-clock time — such as memory allocations, hardware counters, or profiling traces — read references/diagnosers-and-exporters.md.
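As one example of collecting data beyond wall-clock time, allocation tracking can be enabled with an attribute. A sketch, with a hypothetical benchmark class:

```csharp
using BenchmarkDotNet.Attributes;

// MemoryDiagnoser adds allocated-bytes and GC-collection columns
// to the summary table.
[MemoryDiagnoser]
public class AllocationBenchmarks // hypothetical class name
{
    [Benchmark]
    public string BuildString() => string.Join(",", new[] { "a", "b", "c" });
}
```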
BenchmarkDotNet console output is extremely verbose — hundreds of lines per case showing internal calibration, warmup, and measurement details. Redirect all output to a file to avoid consuming context on verbose iteration output:
dotnet run -c Release -- --filter "*MethodName" --noOverwrite > benchmark.log 2>&1
Each benchmark method can take several minutes. Rather than running all benchmarks at once, use --filter to run a subset at a time (e.g. one or two methods per invocation), read the results, then run the next subset. This keeps each invocation short — avoiding session or terminal timeouts — and lets you verify results incrementally. Read references/project-setup-and-running.md for filter syntax, CLI flags, and project setup.
After each run, read the Markdown report (*-report-github.md) from the results directory for the summary table. Only read benchmark.log if you need to investigate errors or unexpected results.
Before writing any code, determine:
Each benchmark case should justify its cost. An uncovered scenario is usually more valuable than another parameter combination for one already covered, but when a specific parameter dimension genuinely affects performance characteristics, the depth is warranted.
Decide on the list of test cases. For each test case, think through:
BDN offers several parameterization mechanisms: [Params] and [ParamsSource] for property-level parameters; [Arguments] and [ArgumentsSource] for method-level arguments; [ParamsAllValues] to enumerate all values of a bool or enum; and [GenericTypeArguments] for varying type parameters on generic benchmark classes. Choose the mechanism that best fits the dimension being varied. Read references/writing-benchmarks.md for the full set of options and correctness patterns.

For coverage suite benchmarks, add to the existing benchmark project and follow its conventions. For temporary benchmarks (investigation, change validation, development feedback), create a standalone project — read references/project-setup-and-running.md for project setup and entry point configuration.
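These mechanisms can be combined in one class; a sketch with hypothetical names and values:

```csharp
using BenchmarkDotNet.Attributes;

public class TextBenchmarks // hypothetical class name
{
    // Property-level parameter: one benchmark case per value
    [Params(16, 4096)]
    public int Length { get; set; }

    private string _payload;

    // Build the input from the parameter, outside the measured code
    [GlobalSetup]
    public void Setup() => _payload = new string('x', Length);

    // Method-level arguments multiply with the property params above:
    // 2 lengths x 2 arguments = 4 benchmark cases
    [Benchmark]
    [Arguments(true)]
    [Arguments(false)]
    public int Measure(bool trim)
    {
        var s = trim ? _payload.Trim() : _payload;
        return s.Length;
    }
}
```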
Adding the BenchmarkDotNet package: Always use dotnet add package BenchmarkDotNet (no version) — this lets NuGet resolve the latest compatible version. Do NOT manually write a <PackageReference> with a version number into the .csproj; BDN versions in training data are outdated and may lack support for current .NET runtimes.
Write the benchmark code. Follow the patterns in references/writing-benchmarks.md to avoid common measurement errors — in particular:
- Put setup code in [GlobalSetup] — setup inside the benchmark method is measured; use [IterationSetup] only when the benchmark mutates state that must be reset between iterations.
- Mark a baseline with [Benchmark(Baseline = true)] for method-level comparisons, or .AsBaseline() on a job for multi-job comparisons, so results show relative ratios.
- Take input sizes and values from [Params], not literals or const values — the JIT can fold constant expressions at compile time, making the benchmark measure a precomputed result instead of the actual computation.

Validate before committing to a long run:
Run with --job Dry first to catch compilation errors and runtime exceptions without spending time on measurement. When iterating on benchmark design, use --job Short until confident, then switch to the default settings for final numbers.
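Putting the setup, baseline, and parameterization rules together, a sketch (class and method names are hypothetical):

```csharp
using System;
using System.Linq;
using BenchmarkDotNet.Attributes;

public class SortBenchmarks // hypothetical comparison
{
    private int[] _data;

    // Size comes from [Params], not a const, so the JIT cannot
    // constant-fold the work away at compile time.
    [Params(1_000, 100_000)]
    public int N { get; set; }

    // Runs once per case; excluded from measurement.
    [GlobalSetup]
    public void Setup()
    {
        var rng = new Random(42);
        _data = Enumerable.Range(0, N).Select(_ => rng.Next()).ToArray();
    }

    // Baseline so the summary shows ratio columns for the other method.
    // Both methods copy the input, so the copy cost is paid equally.
    [Benchmark(Baseline = true)]
    public int[] ArraySort()
    {
        var copy = (int[])_data.Clone();
        Array.Sort(copy);
        return copy;
    }

    [Benchmark]
    public int[] LinqOrderBy() => _data.OrderBy(x => x).ToArray();
}
```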
Weekly Installs: 53
GitHub Stars: 725
First Seen: Mar 10, 2026
Security Audits: Gen Agent Trust Hub: Pass; Socket: Pass; Snyk: Pass
Installed on: github-copilot (48), opencode (48), kimi-cli (46), gemini-cli (46), amp (46), cline (46)
Decide where benchmark input data comes from: hardcode representative values in [Params] to avoid constant folding, or generate them programmatically via [ParamsSource]/[ArgumentsSource]/[GlobalSetup] — when data shape matters more than specific content, or when input must be parameterized by size.