
Performance Testing Guide: SLO Definitions, Test Types, Tools, and Common Bottleneck Solutions

performance-testing by proffesor-for-testing/agentic-qe

61 weekly installs

304 GitHub Stars


Install command

npx skills add https://github.com/proffesor-for-testing/agentic-qe --skill performance-testing

DevOps · Performance Optimization · Testing

January 24, 2026


Performance Testing

<default_to_action> When testing performance or planning load tests:

DEFINE SLOs: p95 response time, throughput, error rate targets
IDENTIFY critical paths: revenue flows, high-traffic pages, key APIs
CREATE realistic scenarios: user journeys, think time, varied data
EXECUTE with monitoring: CPU, memory, DB queries, network
ANALYZE bottlenecks and fix before production

Quick Test Type Selection:

Expected load validation → Load testing
Find breaking point → Stress testing
Sudden traffic spike → Spike testing
Memory leaks, resource exhaustion → Endurance/soak testing
Horizontal/vertical scaling → Scalability testing

Critical Success Factors:

Performance is a feature, not an afterthought
Test early and often, not just before release
Focus on user-impacting bottlenecks </default_to_action>

Quick Reference Card

When to Use

Before major releases
After infrastructure changes
Before scaling events (Black Friday)
When setting SLAs/SLOs

Test Types

Type	Purpose	When
Load	Expected traffic	Every release
Stress	Beyond capacity	Quarterly
Spike	Sudden surge	Before events
Endurance	Memory leaks	After code changes
Scalability	Scaling validation	Infrastructure changes
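k6's `stages` option ramps virtual users (VUs) linearly from one target to the next. To make the load shape of each test type concrete, the plain Node.js sketch below (not k6 code) interpolates a stage list in the same linear way. The `PROFILES` values are hypothetical placeholders rather than recommendations, and the starting VU count is an assumed parameter.

```javascript
// Hypothetical stage profiles (durations in seconds) for the test types above.
// Values are illustrative placeholders, not recommendations.
const PROFILES = {
  load: [
    { duration: 60, target: 100 },   // ramp up
    { duration: 600, target: 100 },  // hold expected load
    { duration: 60, target: 0 },     // ramp down
  ],
  stress: [
    { duration: 120, target: 100 },
    { duration: 120, target: 200 },
    { duration: 120, target: 300 },  // keep stepping past expected capacity
    { duration: 60, target: 0 },
  ],
  spike: [
    { duration: 60, target: 50 },
    { duration: 10, target: 500 },   // sudden 10x surge
    { duration: 60, target: 500 },
    { duration: 10, target: 50 },
  ],
  endurance: [
    { duration: 300, target: 100 },
    { duration: 24 * 3600, target: 100 }, // hold for 24 hours
    { duration: 300, target: 0 },
  ],
};

// Target VU count at time t (seconds): linear interpolation within the active stage,
// starting from startVus (an assumed initial value).
function vusAt(stages, t, startVus = 0) {
  let from = startVus;
  let elapsed = 0;
  for (const { duration, target } of stages) {
    if (t <= elapsed + duration) {
      const fraction = duration === 0 ? 1 : (t - elapsed) / duration;
      return Math.round(from + (target - from) * fraction);
    }
    from = target;
    elapsed += duration;
  }
  return from; // after the last stage, stay at the final target
}
```

For example, `vusAt(PROFILES.spike, 65)` returns 275: halfway through the 10-second jump from 50 to 500 VUs, whereas the load profile climbs gradually over a full minute.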

Key Metrics

Metric	Target	Why
p95 response time	< 200ms	User experience
Throughput	10k req/min	Capacity
Error rate	< 0.1%	Reliability
CPU	< 70%	Headroom
Memory	< 80%	Stability
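The targets above mix units (milliseconds, requests per minute, percentages), while the results examples report throughput in requests per second. A minimal sketch that checks one run against the table, assuming a flat metrics object with the hypothetical field names shown:

```javascript
// Targets from the Key Metrics table. Throughput is a floor; the rest are ceilings.
const TARGETS = {
  p95Ms: 200,               // p95 response time < 200 ms
  minThroughputRpm: 10000,  // at least 10k requests/minute
  errorRatePct: 0.1,        // error rate < 0.1 %
  cpuPct: 70,               // CPU < 70 %
  memoryPct: 80,            // memory < 80 %
};

// `run` uses hypothetical field names; throughput is given in requests/second.
function checkRun(run) {
  const throughputRpm = run.throughputRps * 60; // convert req/s to req/min
  const results = {
    p95: run.p95Ms < TARGETS.p95Ms,
    throughput: throughputRpm >= TARGETS.minThroughputRpm,
    errorRate: run.errorRatePct < TARGETS.errorRatePct,
    cpu: run.cpuPct < TARGETS.cpuPct,
    memory: run.memoryPct < TARGETS.memoryPct,
  };
  const failed = Object.keys(results).filter((name) => !results[name]);
  return { passed: failed.length === 0, failed };
}
```

With the "Good Results" figures (p95 180 ms, 5,000 req/s, 0.05% errors, 65% CPU, 70% memory) the check passes. With the "Problems" figures it flags p95, error rate, CPU, and memory; 500 req/s is still 30,000 req/min, so the throughput ❌ in that example has to be judged against a higher expectation (such as the 5,000 req/s good run) rather than the 10k req/min floor.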

Tools

k6: Modern, JS-based, CI/CD-friendly
JMeter: Enterprise-grade, feature-rich
Artillery: Simple YAML configs
Gatling: Scala, great reporting

Agent Coordination

qe-performance-tester: Load test orchestration
qe-quality-analyzer: Results analysis
qe-production-intelligence: Production comparison

Defining SLOs

Bad: "The system should be fast"
Good: "p95 response time < 200ms under 1,000 concurrent users"

export const options = {
  thresholds: {
    http_req_duration: ['p(95)<200'],  // 95% < 200ms
    http_req_failed: ['rate<0.01'],     // < 1% failures
  },
};
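k6 evaluates these threshold expressions itself. To make their meaning concrete, here is a small Node.js sketch (not k6's implementation) that evaluates the same expression forms against raw samples. It uses the nearest-rank percentile, which may differ slightly from how k6 computes percentiles, and the `evaluateThreshold` name is hypothetical.

```javascript
// Nearest-rank percentile on an already sorted array.
function nearestRank(sorted, p) {
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

// Minimal evaluator for k6-style expressions such as "p(95)<200", "avg<200", "rate<0.01".
// For "rate", samples are booleans (true = failed request).
function evaluateThreshold(expr, samples) {
  const m = /^(p\((\d+(?:\.\d+)?)\)|rate|avg|max)\s*(<=|<)\s*(\d+(?:\.\d+)?)$/.exec(expr);
  if (!m) throw new Error(`unsupported threshold: ${expr}`);
  const [, agg, pct, op, limitStr] = m;
  const limit = Number(limitStr);
  let value;
  if (agg === "rate") {
    value = samples.filter(Boolean).length / samples.length;
  } else if (agg === "avg") {
    value = samples.reduce((sum, x) => sum + x, 0) / samples.length;
  } else if (agg === "max") {
    value = Math.max(...samples);
  } else {
    value = nearestRank([...samples].sort((a, b) => a - b), Number(pct));
  }
  return { value, passed: op === "<" ? value < limit : value <= limit };
}
```

With 90 requests at 100 ms and 10 at 1,000 ms, `avg<200` passes (average 190 ms) while `p(95)<200` fails (p95 is 1,000 ms), which is why SLOs are stated as percentiles rather than averages.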

Realistic Scenarios

Bad: Every user hits the homepage repeatedly
Good: Model actual user behavior

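import { sleep } from 'k6';
import { randomIntBetween } from 'https://jslib.k6.io/k6-utils/1.2.0/index.js';

// Realistic distribution
// 40% browse, 30% search, 20% details, 10% checkout
// browse(), search(), viewProduct() and checkout() are scenario helpers you define
export default function () {
  const action = Math.random();
  if (action < 0.4) browse();
  else if (action < 0.7) search();
  else if (action < 0.9) viewProduct();
  else checkout();

  sleep(randomIntBetween(1, 5)); // Think time: 1-5 seconds
}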
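The if/else chain above hard-codes cumulative cut-offs. A table-driven variant keeps the weights in one place; this plain JavaScript sketch uses hypothetical `pickAction` and `thinkTimeSeconds` helpers with an injectable random source so the selection can be tested deterministically.

```javascript
// Weights from the comment above: 40% browse, 30% search, 20% details, 10% checkout.
const ACTION_WEIGHTS = [
  ["browse", 40],
  ["search", 30],
  ["viewProduct", 20],
  ["checkout", 10],
];

// Pick an action name by walking cumulative weights; `rng` returns a number in [0, 1).
function pickAction(weights, rng = Math.random) {
  const total = weights.reduce((sum, [, w]) => sum + w, 0);
  let point = rng() * total;
  for (const [name, w] of weights) {
    if (point < w) return name;
    point -= w;
  }
  return weights[weights.length - 1][0]; // guard against floating-point edge cases
}

// Think time in whole seconds, inclusive of both bounds.
function thinkTimeSeconds(min, max, rng = Math.random) {
  return Math.floor(rng() * (max - min + 1)) + min;
}
```

Changing the traffic mix then means editing one table instead of re-deriving the 0.4 / 0.7 / 0.9 boundaries.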

Common Bottlenecks

Database

Symptoms: Slow queries under load, connection pool exhaustion
Fixes: Add indexes, eliminate N+1 queries, increase pool size, use read replicas
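One rough way to sanity-check "increase pool size" is Little's law (L = λW): the average number of connections in use equals the query arrival rate times the average time each query holds a connection. A minimal sketch; the 1.5 headroom factor is an arbitrary assumption, and the helper names are hypothetical.

```javascript
// Little's law: average busy connections = queries per second * seconds each query holds a connection.
function connectionsInUse(queriesPerSecond, avgQuerySeconds) {
  return queriesPerSecond * avgQuerySeconds;
}

// Suggested pool size with an assumed headroom factor for bursts.
function suggestedPoolSize(queriesPerSecond, avgQuerySeconds, headroom = 1.5) {
  return Math.ceil(connectionsInUse(queriesPerSecond, avgQuerySeconds) * headroom);
}
```

At 40 queries/s with each query holding a connection for 250 ms, about 10 connections are busy on average, so `suggestedPoolSize(40, 0.25)` returns 15. Waits grow sharply near saturation, so treat the result as a starting point to verify under load, not a final setting.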

N+1 Queries

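// Sequelize-style ORM; assumes an Order.belongsTo(Customer) association
// BAD: 100 orders = 101 queries (1 for the orders + 1 per order)
const orders = await Order.findAll();
for (const order of orders) {
  const customer = await Customer.findByPk(order.customerId);
}

// GOOD: 1 query (orders and customers loaded together via eager loading)
const ordersWithCustomers = await Order.findAll({ include: [Customer] });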
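The query counts in the comments above can be checked with a self-contained sketch. `FakeDb` is a hypothetical in-memory stand-in that only counts queries (no real ORM is involved), and `loadBatched` shows a batched `IN (...)`-style lookup, a common alternative to eager loading that costs 2 queries instead of 1.

```javascript
// Hypothetical in-memory "database" that counts issued queries.
class FakeDb {
  constructor(orders, customers) {
    this.orders = orders;
    this.customers = new Map(customers.map((c) => [c.id, c]));
    this.queryCount = 0;
  }
  async findAllOrders() {
    this.queryCount++;
    return [...this.orders];
  }
  async findCustomerById(id) {
    this.queryCount++;
    return this.customers.get(id);
  }
  // Batched lookup, analogous to SELECT ... WHERE id IN (...)
  async findCustomersByIds(ids) {
    this.queryCount++;
    return ids.map((id) => this.customers.get(id)).filter(Boolean);
  }
}

// N+1 pattern: one query for the orders, then one per order.
async function loadNPlusOne(db) {
  const orders = await db.findAllOrders();
  const result = [];
  for (const order of orders) {
    result.push({ ...order, customer: await db.findCustomerById(order.customerId) });
  }
  return result;
}

// Batched pattern: one query for the orders, one for all distinct customers.
async function loadBatched(db) {
  const orders = await db.findAllOrders();
  const ids = [...new Set(orders.map((o) => o.customerId))];
  const byId = new Map((await db.findCustomersByIds(ids)).map((c) => [c.id, c]));
  return orders.map((o) => ({ ...o, customer: byId.get(o.customerId) }));
}
```

For 100 orders, `loadNPlusOne` issues 101 queries and `loadBatched` issues 2, regardless of how many orders there are.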

Synchronous Processing

Problem: Blocking operations in the request path (e.g., sending an email during checkout)
Fix: Use message queues, process asynchronously, and return immediately
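A minimal in-process sketch of the "return immediately, process later" fix. `JobQueue`, `sendEmail`, and `checkout` here are hypothetical; in production the queue would be a durable message broker and the worker a separate process, so queued jobs survive restarts.

```javascript
// Hypothetical in-memory job queue standing in for a real message broker.
class JobQueue {
  constructor() {
    this.jobs = [];
  }
  enqueue(job) {
    this.jobs.push(job);
  }
  // Worker side: process jobs one at a time until the queue is empty.
  async drain(handler) {
    while (this.jobs.length > 0) {
      await handler(this.jobs.shift());
    }
  }
}

const sentEmails = [];

// Stub for a slow email call (simulated with a 50 ms delay).
async function sendEmail(to, subject) {
  await new Promise((resolve) => setTimeout(resolve, 50));
  sentEmails.push({ to, subject });
}

// Checkout handler: enqueue the confirmation email and return without waiting for it.
function checkout(queue, order) {
  queue.enqueue({ to: order.email, orderId: order.id });
  return { orderId: order.id, emailQueued: true };
}
```

The checkout response no longer includes the email latency; a background worker (here, `queue.drain(...)`) sends it afterwards.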

Memory Leaks

Detection: Endurance testing, memory profiling
Common causes: Event listeners that are never removed, caches without an eviction policy
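For the "caches without an eviction policy" cause, a bounded LRU cache is one standard fix. A minimal sketch relying on `Map` preserving insertion order; the `LruCache` name and capacity are illustrative.

```javascript
// Bounded LRU cache: Map iterates in insertion order, so the first key is the least recently used.
class LruCache {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.map = new Map();
  }
  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    this.map.delete(key); // re-insert to mark as most recently used
    this.map.set(key, value);
    return value;
  }
  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxEntries) {
      const oldestKey = this.map.keys().next().value;
      this.map.delete(oldestKey); // evict the least recently used entry
    }
  }
  get size() {
    return this.map.size;
  }
}
```

Memory stays bounded no matter how many distinct keys arrive during a soak test. For the event-listener cause, the equivalent discipline is to pair every `emitter.on(...)` with `emitter.off(...)` (or use `once`) when the owning object is torn down.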

External Dependencies

Solutions: Aggressive timeouts, circuit breakers, caching, graceful degradation
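A circuit breaker stops calling a failing dependency for a cool-down period instead of letting every request wait on timeouts. A minimal synchronous state-machine sketch; the thresholds are assumptions, and the clock is passed in so the logic can be tested without real time.

```javascript
// Minimal circuit breaker:
// CLOSED -> OPEN after `failureThreshold` consecutive failures,
// OPEN -> HALF_OPEN once `cooldownMs` has elapsed,
// HALF_OPEN -> CLOSED on success, or back to OPEN on failure.
// Simplification: HALF_OPEN does not limit how many trial requests get through.
class CircuitBreaker {
  constructor({ failureThreshold = 5, cooldownMs = 30000 } = {}) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.state = "CLOSED";
    this.failures = 0;
    this.openedAt = 0;
  }
  // Returns true if a request may be sent to the dependency at time `now` (ms).
  allowRequest(now) {
    if (this.state === "OPEN" && now - this.openedAt >= this.cooldownMs) {
      this.state = "HALF_OPEN"; // let a trial request through
    }
    return this.state !== "OPEN";
  }
  recordSuccess() {
    this.state = "CLOSED";
    this.failures = 0;
  }
  // Call on errors and on timeouts alike.
  recordFailure(now) {
    this.failures++;
    if (this.state === "HALF_OPEN" || this.failures >= this.failureThreshold) {
      this.state = "OPEN";
      this.openedAt = now;
    }
  }
}
```

While the breaker is open, callers can serve a cached or degraded response immediately, which combines the circuit-breaker, caching, and graceful-degradation items above.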

k6 CI/CD Example

// performance-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '1m', target: 50 },   // Ramp up
    { duration: '3m', target: 50 },   // Steady
    { duration: '1m', target: 0 },    // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<200'],
    http_req_failed: ['rate<0.01'],
  },
};

export default function () {
  const res = http.get('https://api.example.com/products');
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time < 200ms': (r) => r.timings.duration < 200,
  });
  sleep(1);
}



# GitHub Actions
- name: Run k6 test
  uses: grafana/k6-action@v0.3.0
  with:
    filename: performance-test.js

Analyzing Results

Good Results

Load: 1,000 users | p95: 180ms | Throughput: 5,000 req/s
Error rate: 0.05% | CPU: 65% | Memory: 70%

Problems

Load: 1,000 users | p95: 3,500ms ❌ | Throughput: 500 req/s ❌
Error rate: 5% ❌ | CPU: 95% ❌ | Memory: 90% ❌

Root Cause Analysis

Correlate metrics: When response time spikes, what changes?
Check logs: Errors, warnings, slow queries
Profile code: Where is time spent?
Monitor resources: CPU, memory, disk
Trace requests: End-to-end flow
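For the first step, one quick way to ask "what changes when response time spikes" is to correlate the p95 series with each resource series sampled over the same intervals. A minimal Pearson correlation sketch; the helper names and sample series are made up for illustration, and a high correlation only points at candidates to investigate, it does not prove causation.

```javascript
// Pearson correlation coefficient between two equally long numeric series.
// Returns NaN if either series is constant.
function pearson(xs, ys) {
  if (xs.length !== ys.length || xs.length < 2) throw new Error("need two equal-length series");
  const n = xs.length;
  const meanX = xs.reduce((s, x) => s + x, 0) / n;
  const meanY = ys.reduce((s, y) => s + y, 0) / n;
  let cov = 0, varX = 0, varY = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    cov += dx * dy;
    varX += dx * dx;
    varY += dy * dy;
  }
  return cov / Math.sqrt(varX * varY);
}

// Rank candidate metrics by how strongly they move with response time.
function rankCorrelations(responseTimes, metrics) {
  return Object.entries(metrics)
    .map(([name, series]) => ({ name, r: pearson(responseTimes, series) }))
    .sort((a, b) => Math.abs(b.r) - Math.abs(a.r));
}
```

With p95 samples of 100, 120, 200, 400, 800 ms, a CPU series of 40, 45, 60, 80, 95% ranks well above a flat disk series, suggesting CPU as the first place to profile.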

Anti-Patterns

❌ Anti-Pattern	✅ Better
Testing too late	Test early and often
Unrealistic scenarios	Model real user behavior
0 to 1000 users instantly	Ramp up gradually
No monitoring during tests	Monitor everything
No baseline	Establish and track trends
One-time testing	Continuous performance testing

Agent-Assisted Performance Testing

// Comprehensive load test (its result is passed to the bottleneck analysis)
const perfTest = await Task("Load Test", {
  target: 'https://api.example.com',
  scenarios: {
    checkout: { vus: 100, duration: '5m' },
    search: { vus: 200, duration: '5m' },
    browse: { vus: 500, duration: '5m' }
  },
  thresholds: {
    'http_req_duration': ['p(95)<200'],
    'http_req_failed': ['rate<0.01']
  }
}, "qe-performance-tester");

// Bottleneck analysis
await Task("Analyze Bottlenecks", {
  testResults: perfTest,
  metrics: ['cpu', 'memory', 'db_queries', 'network']
}, "qe-performance-tester");

// CI integration
await Task("CI Performance Gate", {
  mode: 'smoke',
  duration: '1m',
  vus: 10,
  failOn: { 'p95_response_time': 300, 'error_rate': 0.01 }
}, "qe-performance-tester");

Agent Coordination Hints

Memory Namespace

aqe/performance/
├── results/*       - Test execution results
├── baselines/*     - Performance baselines
├── bottlenecks/*   - Identified bottlenecks
└── trends/*        - Historical trends

Fleet Coordination

const perfFleet = await FleetManager.coordinate({
  strategy: 'performance-testing',
  agents: [
    'qe-performance-tester',
    'qe-quality-analyzer',
    'qe-production-intelligence',
    'qe-deployment-readiness'
  ],
  topology: 'sequential'
});

Pre-Production Checklist

Load test passed (expected traffic)
Stress test passed (2-3x expected)
Spike test passed (sudden surge)
Endurance test passed (24+ hours)
Database indexes in place
Caching configured
Monitoring and alerting set up
Performance baseline established

Related Skills

agentic-quality-engineering - Agent coordination
api-testing-patterns - API performance
chaos-engineering-resilience - Resilience testing

Remember

Performance is a feature: Test it like functionality.
Test continuously: Not just before launch.
Monitor production: Synthetic + real user monitoring.
Fix what matters: Focus on user-impacting bottlenecks.
Trend over time: Catch degradation early.

With Agents: Agents automate load testing, analyze bottlenecks, and compare with production. Use agents to maintain performance at scale.

Run History

After each performance test run, append results to run-history.json in this skill directory:

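node -e "
const fs = require('fs');
const file = '.claude/skills/performance-testing/run-history.json';
// Arguments: p95 (ms), throughput (req/s), error rate (%); replace P95 RPS ERR below with measured values
const [p95, rps, err] = process.argv.slice(1).map(Number);
const h = fs.existsSync(file) ? JSON.parse(fs.readFileSync(file, 'utf8')) : { runs: [] };
h.runs.push({date: new Date().toISOString().split('T')[0], scenario: 'load', p95_ms: p95, throughput_rps: rps, error_rate_pct: err});
fs.writeFileSync(file, JSON.stringify(h, null, 2));
" P95 RPS ERR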

Read run-history.json before each run — compare with baselines. Alert if p95 increases >20% from baseline.
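A sketch of that comparison, assuming the `run-history.json` shape written by the command above (`{ runs: [{ p95_ms, ... }] }`) and not yet containing the latest run. Taking the median p95 of the last few runs as the baseline is an assumption (the window of 5 is arbitrary); a median damps the CI-runner noise mentioned under Gotchas.

```javascript
// Median of a non-empty numeric array.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Compare the latest p95 against a baseline built from up to `window` previous runs.
function checkP95Regression(history, latestP95, { window = 5, maxIncreasePct = 20 } = {}) {
  const previous = history.runs.slice(-window).map((r) => r.p95_ms);
  if (previous.length === 0) return { baseline: null, increasePct: null, alert: false };
  const baseline = median(previous);
  const increasePct = ((latestP95 - baseline) / baseline) * 100;
  return { baseline, increasePct, alert: increasePct > maxIncreasePct };
}
```

With previous p95 values of 180, 190, and 200 ms, the baseline is 190 ms: a new p95 of 240 ms (about 26% higher) raises an alert, while 220 ms (about 16% higher) does not.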

Gotchas

Agent-generated k6 scripts often hardcode base URLs; read them from environment variables instead (k6 exposes them via `__ENV`) for portability.
A load generator running in a container can hit its own resource limits before the application reaches its limits; give the load-generator container about 2x the resources of the target.
Agents often forget think time between requests; without it, load is unrealistically bursty.
P95 vs. P99 matters: agents default to averages, which hide tail-latency problems.
Baseline comparisons require a consistent environment; CI runner variance alone can cause 20%+ noise.

Repository: proffesor-for-t…entic-qe
GitHub Stars: 281
First Seen: Jan 24, 2026
Security Audits: Gen Agent Trust Hub: Pass | Socket: Pass | Snyk: Pass
Installed on: opencode (54), gemini-cli (54), codex (54), github-copilot (53), cursor (53), amp (51)
