linux-perf by mohitmishra786/low-level-dev-skills
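npx skills add https://github.com/mohitmishra786/low-level-dev-skills --skill linux-perf
Guide agents through perf for CPU profiling: sampling, hardware counter measurement, hotspot identification, and integration with flamegraph generation.
Example triggers: "...perf to find hotspots?" and "[unknown] or [kernel] frames"
# Install
sudo apt install linux-perf # Debian (version-matched to the running kernel)
sudo apt install linux-tools-$(uname -r) # Ubuntu
sudo dnf install perf # Fedora/RHEL
# Check permissions
# Non-root users need perf_event_paranoid <= 1 to profile kernel code
cat /proc/sys/kernel/perf_event_paranoid
# 2 = user space only (kernel default since 4.6), 1 = user + kernel, 0 = also system-wide (-a), -1 = no restrictions
# Temporarily lower (until reboot)
sudo sysctl -w kernel.perf_event_paranoid=1
# Persistent
echo 'kernel.perf_event_paranoid=1' | sudo tee /etc/sysctl.d/99-perf.conf
sudo sysctl -p /etc/sysctl.d/99-perf.conf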
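A small bash function can decode the levels listed above and show what profiling the current setting permits without root. This is a minimal sketch. `perf_paranoid_summary` is a hypothetical helper name. Values above 2 come from distribution patches rather than upstream kernel semantics, so the function only reports them as "likely blocked".

```bash
# Hypothetical helper: describe what the current perf_event_paranoid level allows.
# The file path is a parameter only so the function can be pointed at a test file.
perf_paranoid_summary() {
  local file="${1:-/proc/sys/kernel/perf_event_paranoid}" level
  level=$(cat "$file") || return 1
  case "$level" in
    -1) echo "level $level: no restrictions" ;;
    0)  echo "level $level: user + kernel + system-wide events (raw tracepoints still restricted)" ;;
    1)  echo "level $level: user + kernel profiling" ;;
    2)  echo "level $level: user-space profiling only" ;;
    *)  echo "level $level: unprivileged profiling likely blocked (distro-specific)" ;;
  esac
}

# Example: perf_paranoid_summary
```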
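Compile the target with debug symbols so perf can produce useful frame data:
gcc -g -O2 -fno-omit-frame-pointer -o prog main.c
# -fno-omit-frame-pointer: required for frame-pointer-based unwinding (perf record -g)
# Alternative: keep the DWARF CFI emitted with -g and record with --call-graph=dwarf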
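You can confirm that a build kept its debug information by checking for the `.debug_info` section with binutils' `readelf`. Source annotation and symbol names depend on that section. This is a minimal sketch that assumes GNU binutils is installed. `has_debug_info` is a hypothetical name, and the check says nothing about whether frame pointers were kept.

```bash
# Hypothetical helper: succeed if an ELF file carries DWARF debug info.
# Matches .debug_info, or .zdebug_info for GNU-style compressed debug sections.
# It does NOT tell you whether -fno-omit-frame-pointer was used.
has_debug_info() {
  readelf -S -W -- "$1" 2>/dev/null | grep -Eq '\.z?debug_info'
}

# Example:
# gcc -g -O2 -fno-omit-frame-pointer -o prog main.c
# has_debug_info prog && echo "prog: debug info present"
```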
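# Basic hardware counters
perf stat ./prog
# Specific events
perf stat -e cache-misses,cache-references,instructions,cycles,branch-misses ./prog
# Repeat 5 runs; perf reports the mean and run-to-run variation (useful for wall-clock comparisons)
perf stat -r 5 ./prog
# Attach to an existing process for 10 seconds
perf stat -p 12345 sleep 10
Interpret perf stat output: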
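The usual derived metrics are ratios of these counters: instructions per cycle (IPC), cache-miss rate and branch-miss rate. perf stat already prints some of them in its default output (for example "insn per cycle"). The sketch below makes them scriptable from CSV output. It is a minimal sketch that assumes the `perf stat -x,` field order value,unit,event,... . `perf_stat_ratios` is a hypothetical name.

```bash
# Hypothetical helper: derive common ratios from `perf stat -x,` (CSV) output.
# Skips "#" comment lines, blank lines, and rows whose value is
# "<not counted>" or "<not supported>".
perf_stat_ratios() {
  awk -F, '
    /^#/ || NF < 3 { next }
    $1 ~ /^[0-9.]+$/ {
      name = $3
      sub(/:.*/, "", name)          # drop modifiers such as :u or :k
      v[name] = $1 + 0
    }
    END {
      if (v["cycles"] > 0)
        printf "IPC: %.2f\n", v["instructions"] / v["cycles"]
      if (v["cache-references"] > 0)
        printf "cache-miss rate: %.2f%%\n", 100 * v["cache-misses"] / v["cache-references"]
      if (v["branch-instructions"] > 0)
        printf "branch-miss rate: %.2f%%\n", 100 * v["branch-misses"] / v["branch-instructions"]
    }' "$@"
}

# Example (perf stat writes its report to stderr, so -o saves it to a file):
# perf stat -x, -o stat.csv -e cycles,instructions,cache-references,cache-misses,branch-instructions,branch-misses ./prog
# perf_stat_ratios stat.csv
```

A ratio is only printed when its denominator was counted. Comparing the same ratio across two builds of the program is usually more informative than its absolute value.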
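# Default: sample the cycles event (cpu-clock if hardware counters are unavailable) at 4000 Hz
perf record -g ./prog
# Specify the sampling frequency
perf record -F 999 -g ./prog
# Sample a specific event
perf record -e cache-misses -g ./prog
# Attach to a running process for 30 seconds
perf record -F 999 -g -p 12345 sleep 30
# Off-CPU analysis (time spent waiting): trace scheduler switches system-wide (needs root)
perf record -e sched:sched_switch -ag sleep 10
# DWARF call graphs (for binaries built without frame pointers)
perf record -F 999 --call-graph=dwarf ./prog
# Save to a named file
perf record -o myapp.perf.data -g ./prog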
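The two unwinding choices above can be wrapped in one function so a script can switch between them. This is a minimal sketch. `record_profile` and its `fp|dwarf` mode argument are hypothetical. `-F`, `-o`, `--call-graph=fp` and `--call-graph=dwarf` are standard `perf record` options. It samples at 999 Hz like the examples above; an off-round rate is commonly chosen to avoid sampling in lockstep with periodic activity.

```bash
# Hypothetical wrapper: record a call-graph profile with frame-pointer or
# DWARF unwinding into a named perf.data file.
# Usage: record_profile fp|dwarf OUTPUT [--] COMMAND [ARGS...]
record_profile() {
  local mode="$1" out="$2"
  shift 2
  [ "$1" = "--" ] && shift
  case "$mode" in
    fp)    perf record -F 999 --call-graph=fp -o "$out" -- "$@" ;;
    dwarf) perf record -F 999 --call-graph=dwarf -o "$out" -- "$@" ;;
    *)     echo "record_profile: mode must be fp or dwarf" >&2; return 2 ;;
  esac
}

# Example: record_profile dwarf myapp.perf.data -- ./prog
```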
perf report # reads ./perf.data
perf report -i myapp.perf.data
perf report --no-children # self time only (not cumulative)
perf report --sort comm,dso,sym # sort by command, DSO, symbol
perf report --stdio # non-interactive text output
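For scripts or CI logs, the non-interactive options above can be combined to list the hottest functions by self time. This is a minimal sketch. `top_symbols` is a hypothetical name, and the function assumes the `--stdio` layout in which header lines start with `#`. `-g none` suppresses call-chain output, so each symbol gets a single line.

```bash
# Hypothetical helper: print the N hottest symbols (self time) from a perf.data file.
# Assumes perf report's --stdio layout, where header/comment lines start with "#".
top_symbols() {
  local data="${1:-perf.data}" n="${2:-10}"
  perf report -i "$data" --no-children --sort sym --stdio -g none |
    awk -v limit="$n" '!/^#/ && NF { print; if (++count >= limit) exit }'
}

# Example: top_symbols myapp.perf.data 5
```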
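Navigation in the TUI:
Enter: expand a symbol
a: annotate (show assembly with hit counts)
s: show source alongside the assembly in the annotate view (needs debug info)
d: filter by DSO (library)
t: filter by thread
?: help
# Show assembly with hit percentages
perf annotate sym_name
# From the report: press 'a' on a symbol
# Or directly:
perf annotate -i perf.data --symbol=hot_function --stdio
A high hit count on a mov or vmovdqa suggests a cache miss at that load. Because of sampling skid, the cost can also land on the instruction just after the load.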
# Live view, like 'top' but for functions
sudo perf top -g
# Filter by process
sudo perf top -p 12345
# Generate perf script output (from a perf.data recorded with -g)
perf script > out.perf
# Use Brendan Gregg's FlameGraph tools
git clone https://github.com/brendangregg/FlameGraph
./FlameGraph/stackcollapse-perf.pl out.perf > out.folded
./FlameGraph/flamegraph.pl out.folded > flamegraph.svg
# Open flamegraph.svg in a browser
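The same flame graph steps can also run as one pipeline, without the intermediate `out.perf` and `out.folded` files. This is a minimal sketch. `flamegraph_from_cmd` and the `FLAMEGRAPH_DIR` variable are hypothetical names, and the function assumes the FlameGraph repository has already been cloned. It keeps perf.data in a temporary directory that is removed afterwards.

```bash
# Hypothetical wrapper: record COMMAND, then fold and render the stacks in one pipeline.
# FLAMEGRAPH_DIR (assumed name) points at a clone of brendangregg/FlameGraph.
# Usage: flamegraph_from_cmd OUTPUT.svg COMMAND [ARGS...]
flamegraph_from_cmd() {
  local svg="$1" fg="${FLAMEGRAPH_DIR:-./FlameGraph}" dir status
  shift
  dir=$(mktemp -d) || return 1
  if perf record -F 999 -g -o "$dir/perf.data" -- "$@"; then
    perf script -i "$dir/perf.data" |
      "$fg/stackcollapse-perf.pl" |
      "$fg/flamegraph.pl" > "$svg"
    status=$?
  else
    status=1
  fi
  rm -rf "$dir"
  return "$status"
}

# Example: flamegraph_from_cmd flamegraph.svg ./prog
```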
See skills/profilers/flamegraphs for reading flamegraphs and interpreting results.
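| Problem | Cause | Fix |
|---|---|---|
| Permission denied | perf_event_paranoid too high | Lower the paranoid level or run with sudo |
| [unknown] frames | Missing frame pointers or debug info | Recompile with -fno-omit-frame-pointer or use --call-graph=dwarf |
| [kernel] everywhere | Kernel symbols not visible | Use sudo perf record; install linux-image-$(uname -r)-dbgsym |
| No kallsyms | Kernel symbols unavailable | `echo 0 \| …` |
| Empty report for a short program | Program exits too fast | Use -F 9999 or profile a longer-running workload |
| DWARF unwinding slow | Large DWARF stack copies | Limit the stack dump size with --call-graph dwarf,512 |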
# List all available events
perf list
# Common hardware events
cycles
instructions
cache-references
cache-misses
branch-instructions
branch-misses
stalled-cycles-frontend
stalled-cycles-backend
# Software events
context-switches
cpu-migrations
page-faults
# Tracepoints (require root)
sched:sched_switch
syscalls:sys_enter_read
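Not every event in this list is implemented on every CPU; for example, the stalled-cycles events may be unavailable on some processors. The sketch below checks an event before you rely on it. `event_supported` is a hypothetical name. The function counts the event around `true` and treats `<not supported>` or `<not counted>` in perf's CSV output as unusable.

```bash
# Hypothetical helper: succeed if perf can count the given event on this machine.
# perf stat writes its counts to stderr, hence the 2>&1.
event_supported() {
  local out
  out=$(perf stat -x, -e "$1" -- true 2>&1) || return 1
  case "$out" in
    *"<not supported>"*|*"<not counted>"*) return 1 ;;
  esac
  return 0
}

# Example: check the hardware events listed above.
# for ev in cycles instructions stalled-cycles-frontend stalled-cycles-backend; do
#   if event_supported "$ev"; then echo "$ev: ok"; else echo "$ev: unavailable"; fi
# done
```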
For a counter reference and interpretation guide, see references/events.md.
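Related skills:
skills/profilers/flamegraphs for SVG flamegraph generation and reading
skills/profilers/valgrind for cache simulation and memory profiling
skills/compilers/gcc or skills/compilers/clang for PGO from perf data (AutoFDO)
Weekly Installs: 58
GitHub Stars: 34
First Seen: Feb 20, 2026
Security Audits: Gen Agent Trust Hub (Fail), Socket (Pass), Snyk (Warn)
Installed on: gemini-cli 56, github-copilot 56, codex 56, opencode 56, cursor 56, amp 55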