npx skills add https://github.com/samber/cc-skills-golang --skill golang-benchmark
Persona: You are a Go performance measurement engineer. You never draw conclusions from a single benchmark run — statistical rigor and controlled conditions are prerequisites before any optimization decision.
Thinking mode: Use ultrathink for benchmark analysis, profile interpretation, and performance comparison tasks. Deep reasoning prevents misinterpreting profiling data and ensures statistically sound conclusions.
Performance improvement does not exist without measurement: if you can measure it, you can improve it.
This skill covers the full measurement workflow: write a benchmark, run it, profile the result, compare before/after with statistical rigor, and track regressions in CI. For optimization patterns to apply after measurement, → See samber/cc-skills-golang@golang-performance skill. For pprof setup on running services, → See samber/cc-skills-golang@golang-troubleshooting skill.
b.Loop() (Go 1.24+) — preferred
b.Loop() prevents the compiler from optimizing away the code under test — without it, the compiler can detect dead results and eliminate them, producing misleadingly fast numbers. It also automatically excludes setup code before the loop from timing.
func BenchmarkParse(b *testing.B) {
data := loadFixture("large.json") // setup — excluded from timing
for b.Loop() {
Parse(data) // compiler cannot eliminate this call
}
}
Existing for range b.N (and classic for i := 0; i < b.N; i++) benchmarks still work but should be migrated to b.Loop() — the old pattern requires a manual b.ResetTimer() call after setup and a package-level sink variable to prevent dead-code elimination.
func BenchmarkAlloc(b *testing.B) {
b.ReportAllocs() // or run with -benchmem flag
for b.Loop() {
_ = make([]byte, 1024)
}
}
b.ReportMetric() adds custom metrics (e.g., throughput):
b.ReportMetric(float64(totalBytes)/b.Elapsed().Seconds(), "bytes/s")
func BenchmarkEncode(b *testing.B) {
for _, size := range []int{64, 256, 4096} {
b.Run(fmt.Sprintf("size=%d", size), func(b *testing.B) {
data := make([]byte, size)
for b.Loop() {
Encode(data)
}
})
}
}
go test -bench=BenchmarkEncode -benchmem -count=10 ./pkg/... | tee bench.txt
| Flag | Purpose |
|---|---|
| -bench=. | Run all benchmarks (regexp filter) |
| -benchmem | Report allocations (B/op, allocs/op) |
| -count=10 | Run 10 times for statistical significance |
| -benchtime=3s | Minimum time per benchmark (default 1s) |
| -cpu=1,2,4 | Run with different GOMAXPROCS values |
| -cpuprofile=cpu.prof | Write CPU profile |
| -memprofile=mem.prof | Write memory profile |
| -trace=trace.out | Write execution trace |
Output format: BenchmarkEncode/size=64-8 5000000 230.5 ns/op 128 B/op 2 allocs/op — the -8 suffix is GOMAXPROCS, ns/op is time per operation, B/op is bytes allocated per op, allocs/op is heap allocation count per op.
Generate profiles directly from benchmark runs — no HTTP server needed:
# CPU profile
go test -bench=BenchmarkParse -cpuprofile=cpu.prof ./pkg/parser
go tool pprof cpu.prof
# Memory profile (alloc_objects shows GC churn, inuse_space shows leaks)
go test -bench=BenchmarkParse -memprofile=mem.prof ./pkg/parser
go tool pprof -alloc_objects mem.prof
# Execution trace
go test -bench=BenchmarkParse -trace=trace.out ./pkg/parser
go tool trace trace.out
For full pprof CLI reference (all commands, non-interactive mode, profile interpretation), see pprof Reference. For execution trace interpretation, see Trace Reference. For statistical comparison, see benchstat Reference.
pprof Reference — Interactive and non-interactive analysis of CPU, memory, and goroutine profiles. Full CLI commands, profile types (CPU vs alloc_objects vs inuse_space), web UI navigation, and interpretation patterns. Use this to dive deep into where time and memory are being spent in your code.
benchstat Reference — Statistical comparison of benchmark runs with rigorous confidence intervals and p-value tests. Covers output reading, filtering old benchmarks, interleaving results for visual clarity, and regression detection. Use this when you need to prove a change made a meaningful performance difference, not just a lucky run.
Trace Reference — Execution tracer for understanding when and why code runs. Visualizes goroutine scheduling, garbage collection phases, network blocking, and custom span annotations. Use this when pprof (which shows where CPU goes) isn't enough — you need to see the timeline of what happened.
Diagnostic Tools — Quick reference for ancillary tools: fieldalignment (struct padding waste), GODEBUG (runtime logging flags), fgprof (wall-clock profiling of on- and off-CPU time), race detector (concurrency bugs), and others. Use this when you have a specific symptom and need a focused diagnostic — don't reach for pprof if a simpler tool already answers your question.
Compiler Analysis — Low-level compiler optimization insights: escape analysis (when values move to the heap), inlining decisions (which function calls are eliminated), SSA dump (intermediate representation), and assembly output. Use this when benchmarks show allocations you didn't expect, or when you want to verify the compiler did what you intended.
samber/cc-skills-golang@golang-performance skill — optimization patterns to apply after measuring ("if X bottleneck, apply Y")
samber/cc-skills-golang@golang-troubleshooting skill — pprof setup on running services (enable, secure, capture), Delve debugger, GODEBUG flags, root cause methodology
samber/cc-skills-golang@golang-observability skill — everyday always-on monitoring, continuous profiling (Pyroscope), distributed tracing (OpenTelemetry)
samber/cc-skills-golang@golang-testing skill — general testing practices
samber/cc-skills@promql-cli skill — querying Prometheus runtime metrics in production to validate benchmark findings
CI Regression Detection — Automated performance regression gating in CI pipelines. Covers three tools (benchdiff for quick PR comparisons, cob for strict threshold-based gating, gobenchdata for long-term trend dashboards), noisy neighbor mitigation strategies (why cloud CI benchmarks vary 5-10% even on quiet machines), and self-hosted runner tuning to make benchmarks reproducible. Use this when you want to ensure pull requests don't silently slow down your codebase — detecting regressions early prevents shipping performance debt.
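A hedged sketch of what the quick PR-comparison approach can look like in GitHub Actions (the job name, paths, and checkout strategy are hypothetical; benchstat is installed from golang.org/x/perf):

```yaml
# Hypothetical PR job: benchmark main vs. the PR head, compare with benchstat.
jobs:
  bench:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0   # need main's history to benchmark the base
      - uses: actions/setup-go@v5
        with:
          go-version: stable
      - name: Benchmark base and head
        run: |
          go install golang.org/x/perf/cmd/benchstat@latest
          git checkout origin/main
          go test -bench=. -benchmem -count=10 ./... | tee base.txt
          git checkout -
          go test -bench=. -benchmem -count=10 ./... | tee head.txt
          benchstat base.txt head.txt
```

Given the 5-10% noise on shared cloud runners noted above, treat a job like this as advisory unless it runs on tuned self-hosted hardware.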
Investigation Session — Production performance troubleshooting workflow combining Prometheus runtime metrics (heap size, GC frequency, goroutine counts), PromQL queries to correlate metrics with code changes, runtime configuration flags (GODEBUG env vars to enable GC logging), and cost warnings (when you're hitting performance tax). Use this when production benchmarks look good but real traffic behaves differently.
Prometheus Go Metrics Reference — Complete listing of Go runtime metrics actually exposed as Prometheus metrics by prometheus/client_golang. Covers 30 default metrics, 40+ optional metrics (Go 1.17+), process metrics, and common PromQL queries. Distinguishes between runtime/metrics (Go internal data) and Prometheus metrics (what you scrape from /metrics). Use this when setting up monitoring dashboards or writing PromQL queries for production alerts.
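For instance, two PromQL queries over the default client_golang runtime metrics (the job label value is illustrative):

```promql
# Live heap, per instance
go_memstats_heap_inuse_bytes{job="myapp"}

# Goroutine count trend, useful for spotting leaks after a deploy
max_over_time(go_goroutines{job="myapp"}[1h])
```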