performance-profiler by borghei/claude-skills
npx skills add https://github.com/borghei/claude-skills --skill performance-profiler
Tier: POWERFUL Category: Engineering / Performance Maintainer: Claude Skills Team
Systematic performance profiling for Node.js, Python, and Go applications. Identifies CPU bottlenecks with flamegraphs, detects memory leaks with heap snapshots, analyzes bundle sizes, optimizes database queries, detects N+1 patterns, and runs load tests with k6 and Artillery. Enforces a measure-first methodology: establish a baseline, identify the bottleneck, fix it, and verify the improvement.
performance profiling, flamegraph, memory leak, bundle analysis, N+1 queries, load testing, k6, latency, P99, CPU profiling, heap snapshot, database optimization
WRONG: "I think the N+1 query is slow, let me fix it"
RIGHT: Profile → Confirm bottleneck → Fix → Measure again → Verify improvement
Every optimization must include:
1. Baseline metrics (before)
2. Profiler evidence (where it is actually slow)
3. The fix
4. Post-fix metrics (after)
5. Delta calculation (improvement %)
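The delta in step 5 is simple arithmetic, but it is worth computing it the same way every time so "before" and "after" numbers stay comparable. A minimal sketch (the function name is illustrative):

```javascript
// Compute the percentage change between a baseline metric and its
// post-fix value. For latency-style metrics (lower is better) a
// negative delta is an improvement; for throughput, a positive one is.
function improvementDelta(before, after) {
  if (before === 0) throw new Error('baseline must be non-zero');
  return ((after - before) / before) * 100;
}

// Example: P95 latency dropped from 180ms to 90ms
console.log(improvementDelta(180, 90).toFixed(0) + '%'); // prints "-50%"
```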
# Install
npm install -g clinic
# Generate flamegraph (starts server, applies load, generates HTML report)
clinic flame -- node server.js
# With specific load profile
clinic flame --autocannon [ /api/endpoint -c 10 -d 30 ] -- node server.js
# Analyze specific scenario
clinic flame --on-port 'autocannon -c 50 -d 60 http://localhost:$PORT/api/heavy-endpoint' -- node server.js
# Start Node with inspector
node --inspect server.js
# Or profile on demand
node --cpu-prof --cpu-prof-dir=./profiles server.js
# Load the .cpuprofile file in Chrome DevTools > Performance
// Programmatic profiling of a specific function
const { Session } = require('inspector');
const session = new Session();
session.connect();
session.post('Profiler.enable', () => {
  session.post('Profiler.start', () => {
    // Run the code you want to profile
    runHeavyOperation();
    session.post('Profiler.stop', (err, { profile } = {}) => {
      if (err) throw err;
      require('fs').writeFileSync('profile.cpuprofile', JSON.stringify(profile));
    });
  });
});
// Take heap snapshots programmatically
const v8 = require('v8');

function takeHeapSnapshot(label) {
  const snapshotPath = `heap-${label}-${Date.now()}.heapsnapshot`;
  // v8.writeHeapSnapshot returns the filename it wrote to
  v8.writeHeapSnapshot(snapshotPath);
  console.log(`Heap snapshot written to: ${snapshotPath}`);
  return snapshotPath;
}
// Leak detection pattern: compare two snapshots
// 1. Take snapshot at startup
takeHeapSnapshot('baseline');
// 2. Run operations that you suspect leak
// ... process 1000 requests ...
// 3. Force GC and take another snapshot
if (global.gc) global.gc(); // requires --expose-gc flag
takeHeapSnapshot('after-load');
// Load both .heapsnapshot files in Chrome DevTools > Memory
// Use "Comparison" view to find objects that grew
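Before reaching for snapshot diffing, a cheap in-process signal can flag heap growth across repeated runs of a suspect operation. A rough sketch, assuming the operation is synchronous; `heapGrowthAfter` and the example operations are illustrative, and absolute numbers are noisy without `--expose-gc`:

```javascript
// Rough in-process leak signal: run the suspect operation many times
// and compare heap usage before and after. Not a substitute for
// snapshot diffing, but cheap enough to run in CI.
function heapGrowthAfter(operation, iterations = 1000) {
  if (global.gc) global.gc(); // requires node --expose-gc
  const before = process.memoryUsage().heapUsed;
  for (let i = 0; i < iterations; i++) operation();
  if (global.gc) global.gc();
  return process.memoryUsage().heapUsed - before;
}

// Example: appending to a module-level array retains every allocation,
// so heap growth persists even after GC.
const retained = [];
const leaky = () => retained.push(new Array(1000).fill(0));

console.log('leaky growth (bytes):', heapGrowthAfter(leaky));
```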
# Install memray (a native-level memory profiler; the stdlib tracemalloc example follows below)
pip install memray
# Profile a script
memray run -o memray-output.bin my_script.py
memray flamegraph memray-output.bin -o flamegraph.html
# Profile a specific function
python -c "
import tracemalloc
tracemalloc.start()
# Run your code
from my_module import heavy_function
heavy_function()
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')
print('Top 10 memory allocations:')
for stat in top_stats[:10]:
    print(stat)
"
-- Step 1: Get the actual execution plan (not just estimated)
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT t.*, p.name as project_name
FROM tasks t
JOIN projects p ON p.id = t.project_id
WHERE p.workspace_id = 'ws_abc123'
AND t.status = 'in_progress'
AND t.deleted_at IS NULL
ORDER BY t.updated_at DESC
LIMIT 20;
-- What to look for in the output:
-- Seq Scan on tasks → MISSING INDEX (should be Index Scan)
-- Rows Removed by Filter: 99000 → INDEX NOT SELECTIVE ENOUGH
-- Sort Method: external merge → NOT ENOUGH work_mem
-- Nested Loop with inner Seq Scan → MISSING INDEX ON JOIN COLUMN
-- Actual rows=1000 vs estimated rows=1 → STALE STATISTICS (run ANALYZE)
// PROBLEM: N+1 query pattern
async function getProjectsWithTasks(workspaceId: string) {
const projects = await db.query.projects.findMany({
where: eq(projects.workspaceId, workspaceId),
});
// This executes N additional queries (one per project)
for (const project of projects) {
project.tasks = await db.query.tasks.findMany({
where: eq(tasks.projectId, project.id),
});
}
return projects;
}
// Total queries: 1 + N (where N = number of projects)
// FIX: Single query with JOIN or relation loading
async function getProjectsWithTasks(workspaceId: string) {
return db.query.projects.findMany({
where: eq(projects.workspaceId, workspaceId),
with: {
tasks: true, // Drizzle generates a single JOIN or subquery
},
});
}
// Total queries: 1-2 (depending on ORM strategy)
// Log query count per request (add to middleware)
// Node.js with Drizzle (illustrative monkey-patch):
let queryCount = 0;
const originalQuery = db.execute;
db.execute = (...args) => { queryCount++; return originalQuery.apply(db, args); };
// After request completes:
if (queryCount > 10) {
console.warn(`N+1 ALERT: ${req.method} ${req.path} executed ${queryCount} queries`);
}
# Install
pnpm add -D @next/bundle-analyzer
# next.config.js
const withBundleAnalyzer = require('@next/bundle-analyzer')({
enabled: process.env.ANALYZE === 'true',
});
module.exports = withBundleAnalyzer(nextConfig);
# Run analysis
ANALYZE=true pnpm build
# Opens browser with interactive treemap
# Check what you're shipping
npx source-map-explorer .next/static/chunks/*.js
# Size of individual imports
npx import-cost # VS Code extension for inline size
# Find heavy dependencies
npx depcheck --json | jq '.dependencies'
npx bundlephobia-cli <package-name>
| Before | After | Savings |
|---|---|---|
| `import _ from 'lodash'` | `import groupBy from 'lodash/groupBy'` | ~70KB |
| `import moment from 'moment'` | `import { format } from 'date-fns'` | ~60KB |
| `import { icons } from 'lucide-react'` | `import { Search } from 'lucide-react'` | ~50KB |
| Static import of heavy component | `dynamic(() => import('./HeavyChart'))` | Deferred loading |
| All routes in one chunk | Code splitting per route (automatic in Next.js) | Per-route |
// load-test.k6.js
import http from 'k6/http'
import { check, sleep } from 'k6'
import { Trend, Rate } from 'k6/metrics'
const apiLatency = new Trend('api_latency')
const errorRate = new Rate('errors')
export const options = {
stages: [
{ duration: '1m', target: 20 }, // ramp up
{ duration: '3m', target: 100 }, // sustain
{ duration: '1m', target: 0 }, // ramp down
],
thresholds: {
http_req_duration: ['p(95)<200', 'p(99)<500'],
errors: ['rate<0.01'],
api_latency: ['p(95)<150'],
},
}
export default function () {
const res = http.get(`${__ENV.BASE_URL}/api/v1/projects?limit=20`, {
headers: { Authorization: `Bearer ${__ENV.TOKEN}` },
})
apiLatency.add(res.timings.duration)
check(res, {
'status 200': (r) => r.status === 200,
'body has data': (r) => JSON.parse(r.body).data !== undefined,
}) || errorRate.add(1)
sleep(1)
}
# Run locally
k6 run load-test.k6.js -e BASE_URL=http://localhost:3000 -e TOKEN=$TOKEN
# Run with cloud reporting
k6 cloud load-test.k6.js
## Performance Optimization: [What You Fixed]
**Date:** YYYY-MM-DD
**Ticket:** PROJ-123
### Problem
[1-2 sentences: what was slow, how it was observed]
### Root Cause
[What the profiler revealed — include flamegraph link or screenshot]
### Baseline (Before)
| Metric | Value |
|--------|-------|
| P50 latency | XXms |
| P95 latency | XXms |
| P99 latency | XXms |
| Throughput (RPS) | XX |
| DB queries/request | XX |
| Bundle size | XXkB |
### Fix Applied
[Brief description + link to PR]
### After
| Metric | Before | After | Delta |
|--------|--------|-------|-------|
| P50 | XXms | XXms | -XX% |
| P95 | XXms | XXms | -XX% |
| P99 | XXms | XXms | -XX% |
| RPS | XX | XX | +XX% |
| DB queries/req | XX | XX | -XX% |
### Verification
[Link to k6 output, CI run, or monitoring dashboard]
DATABASE
[ ] Missing indexes on WHERE/ORDER BY columns
[ ] N+1 queries (check query count per request)
[ ] SELECT * when only 2-3 columns needed
[ ] No LIMIT on unbounded queries
[ ] Missing connection pool (new connection per request)
[ ] Stale statistics (run ANALYZE on busy tables)
NODE.JS
[ ] Sync I/O (fs.readFileSync) in request handlers
[ ] JSON.parse/stringify of large objects in hot loops
[ ] Missing response compression (gzip/brotli)
[ ] Dependencies loaded inside request handlers (move to module level)
[ ] Sequential awaits that could be Promise.all
BUNDLE
[ ] Full lodash/moment import instead of specific functions
[ ] Static imports of heavy components (use dynamic import)
[ ] Images not optimized / not using next/image
[ ] No code splitting on routes
API
[ ] No pagination on list endpoints
[ ] No Cache-Control headers on stable responses
[ ] Serial fetches that could run in parallel
[ ] Fetching related data in loops instead of JOINs
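The "sequential awaits" and "serial fetches" items in the checklist describe the same fix; a minimal sketch, assuming the two calls are independent (the fetcher names are illustrative):

```javascript
// Sequential: total time is the SUM of the two calls.
async function loadDashboardSlow(fetchUser, fetchProjects) {
  const user = await fetchUser();
  const projects = await fetchProjects();
  return { user, projects };
}

// Parallel: total time is the SLOWER of the two calls.
async function loadDashboardFast(fetchUser, fetchProjects) {
  const [user, projects] = await Promise.all([fetchUser(), fetchProjects()]);
  return { user, projects };
}
```

Only apply this when neither call depends on the other's result; `Promise.all` also rejects as soon as either call fails, so error handling stays in one place.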
Use p(95) < 200ms as a CI gate with k6.