multi-agent-performance-profiling by terrylica/cc-skills

npx skills add https://github.com/terrylica/cc-skills --skill multi-agent-performance-profiling
Prescriptive workflow for spawning parallel profiling agents to comprehensively identify performance bottlenecks across multiple system layers. Successfully discovered that QuestDB ingests at 1.1M rows/sec (11x faster than target), proving database was NOT the bottleneck - CloudFront download was 90% of pipeline time.
Use this skill when:
Key outcomes:
Agent 1: Profiling (Instrumentation)
Agent 2: Database Configuration Analysis
Agent 3: Client Library Analysis
Agent 4: Batch Size Analysis
Agent 5: Integration & Synthesis
Parallel Execution (all 5 agents run simultaneously):
Agent 1 (Profiling) → [PARALLEL]
Agent 2 (DB Config) → [PARALLEL]
Agent 3 (Client Library) → [PARALLEL]
Agent 4 (Batch Size) → [PARALLEL]
Agent 5 (Integration) → [PARALLEL - reads tmp/ outputs from others]
Key Principle: No dependencies between investigation agents (1-4). The integration agent synthesizes their findings.
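The fan-out/fan-in shape of this principle can be sketched with ordinary Python concurrency. This is an illustrative analogy only; the `investigate_*` functions are hypothetical stand-ins for the four independent investigation agents, and `integrate` for the synthesis step:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for Agents 1-4; each returns its "report".
def investigate_profiling():
    return {"agent": 1, "finding": "download dominates"}

def investigate_db_config():
    return {"agent": 2, "finding": "defaults adequate"}

def investigate_client_library():
    return {"agent": 3, "finding": "use batch API"}

def investigate_batch_size():
    return {"agent": 4, "finding": "increase batch"}

def integrate(reports):
    # Synthesis runs only after all independent reports exist.
    return sorted(reports, key=lambda r: r["agent"])

def run_parallel_investigation():
    tasks = [investigate_profiling, investigate_db_config,
             investigate_client_library, investigate_batch_size]
    # Agents 1-4 have no dependencies, so all are submitted at once.
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = [pool.submit(t) for t in tasks]
        reports = [f.result() for f in futures]
    return integrate(reports)
```

The key design property mirrored here: no investigation waits on another, and synthesis is the only ordered step.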
Dynamic Todo Management:
Each agent produces a markdown analysis report in its own tmp/ subdirectory (Agent 1 additionally writes the profiling script, profile_pipeline.py).
Example Profiling Code:
```python
import time

def profile_pipeline():
    """Time each pipeline phase. download_from_cdn, extract_zip,
    parse_csv, ingest_to_db, and url are pipeline-specific stubs."""
    results = {}

    # Phase 1: Download
    start = time.perf_counter()
    data = download_from_cdn(url)
    results["download"] = time.perf_counter() - start

    # Phase 2: Extract
    start = time.perf_counter()
    csv_data = extract_zip(data)
    results["extract"] = time.perf_counter() - start

    # Phase 3: Parse
    start = time.perf_counter()
    df = parse_csv(csv_data)
    results["parse"] = time.perf_counter() - start

    # Phase 4: Ingest
    start = time.perf_counter()
    ingest_to_db(df)
    results["ingest"] = time.perf_counter() - start

    # Analysis: report each phase's share of total wall time
    total = sum(results.values())
    for phase, duration in results.items():
        pct = (duration / total) * 100
        print(f"{phase}: {duration:.3f}s ({pct:.1f}%)")
    return results
```
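The timing instrumentation above can be paired with memory instrumentation via the standard-library tracemalloc module (the troubleshooting table later flags its absence as a common gap). A minimal sketch, where `parse_csv_stub` is a hypothetical phase function used only for illustration:

```python
import time
import tracemalloc

def parse_csv_stub():
    # Hypothetical phase function standing in for a real pipeline stage.
    return [line.split(",") for line in ("a,b,c\n" * 1000).splitlines()]

def profile_phase(name, fn):
    """Time a phase and record its peak memory via tracemalloc."""
    tracemalloc.start()
    start = time.perf_counter()
    result = fn()
    duration = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    print(f"{name}: {duration:.3f}s, peak memory {peak / 1024:.1f} KiB")
    return result, duration, peak
```

tracemalloc reports Python-level allocations only; native allocations inside C extensions (e.g. a database client) will not appear in its numbers.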
Priority Levels: P0 (highest impact), P1 (high impact), P2 (quick win).
Impact Reporting Format:
```markdown
### Recommendation: [Optimization Name] (P0/P1/P2) - [IMPACT LEVEL]
**Impact**: 🔴/🟠/🟡 **Nx improvement**
**Effort**: High/Medium/Low (N days)
**Expected Improvement**: CurrentK → TargetK rows/sec
**Rationale**:
- [Why this matters]
- [Supporting evidence from profiling]
- [Comparison to alternatives]
**Implementation**:
[Code snippet or architecture description]
```
Integration Agent Responsibilities:
Consensus Criteria:
Input: Performance metric below SLO
Output: Problem statement with baseline metrics
Example Problem Statement:
```
Performance Issue: BTCUSDT 1m ingestion at 47K rows/sec
Target SLO: >100K rows/sec
Gap: 53% below target
Pipeline: CloudFront download → ZIP extract → CSV parse → QuestDB ILP ingest
```
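The 53% figure in a statement like this is just the shortfall relative to the target; keeping the arithmetic explicit avoids mixing up "below target" with "of target":

```python
# Gap relative to the SLO target: (target - current) / target
current, target = 47_000, 100_000
gap_pct = (target - current) / target * 100
print(f"Gap: {gap_pct:.0f}% below target")  # → Gap: 53% below target
```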
Directory Structure:
```
tmp/perf-optimization/
  profiling/                     # Agent 1
    profile_pipeline.py
    PROFILING_REPORT.md
  questdb-config/                # Agent 2
    CONFIG_ANALYSIS.md
  python-client/                 # Agent 3
    CLIENT_ANALYSIS.md
  batch-size/                    # Agent 4
    BATCH_ANALYSIS.md
  MASTER_INTEGRATION_REPORT.md   # Agent 5
```
Agent Assignment:
IMPORTANT: Use a single message with multiple Task tool calls to get true parallelism.
Example:
I'm going to spawn 5 parallel investigation agents:
[Uses Task tool 5 times in a single message]
- Agent 1: Profiling
- Agent 2: QuestDB Config
- Agent 3: Python Client
- Agent 4: Batch Size
- Agent 5: Integration (depends on others completing)
Execution:
```
# All agents run simultaneously (user observes 5 parallel tool calls)
# Each agent writes to its own tmp/ subdirectory
# Integration agent polls for completed reports
```
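The integration agent's polling step could look roughly like the sketch below, using the report paths from the directory structure; the timeout and interval values are illustrative assumptions, not part of the skill:

```python
import time
from pathlib import Path

# Report paths taken from the skill's directory structure.
EXPECTED_REPORTS = [
    "profiling/PROFILING_REPORT.md",
    "questdb-config/CONFIG_ANALYSIS.md",
    "python-client/CLIENT_ANALYSIS.md",
    "batch-size/BATCH_ANALYSIS.md",
]

def wait_for_reports(base="tmp/perf-optimization", timeout=600, interval=5):
    """Poll until all four investigation reports exist, or raise on timeout."""
    base = Path(base)
    deadline = time.monotonic() + timeout
    missing = list(EXPECTED_REPORTS)
    while time.monotonic() < deadline:
        missing = [r for r in EXPECTED_REPORTS if not (base / r).exists()]
        if not missing:
            return True
        time.sleep(interval)
    raise TimeoutError(f"Reports not written: {missing}")
```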
Progress Tracking: the integration agent polls the tmp/ directory for report files.
Completion Criteria:
Report Structure:
```markdown
# Master Performance Optimization Integration Report

## Executive Summary
- Critical discovery (what is/isn't the bottleneck)
- Key findings from each agent (1-sentence summary)

## Top 3 Recommendations (Consensus)
1. [P0 Optimization] - HIGHEST IMPACT
2. [P1 Optimization] - HIGH IMPACT
3. [P2 Optimization] - QUICK WIN

## Agent Investigation Summary
### Agent 1: Profiling
### Agent 2: Database Config
### Agent 3: Client Library
### Agent 4: Batch Size

## Implementation Roadmap
### Phase 1: P0 Optimizations (Week 1)
### Phase 2: P1 Optimizations (Week 2)
### Phase 3: P2 Quick Wins (As time permits)
```
For each recommendation:
Example Implementation:
```shell
# Before optimization
uv run python tmp/perf-optimization/profiling/profile_pipeline.py
# Output: 47K rows/sec, download=857ms (90%)

# Implement P0 recommendation (concurrent downloads)
# [Make code changes]

# After optimization
uv run python tmp/perf-optimization/profiling/profile_pipeline.py
# Output: 450K rows/sec, download=90ms per symbol * 10 concurrent
```
Context: Pipeline achieving 47K rows/sec against a 100K rows/sec target (53% below SLO)
Assumptions Before Investigation:
Findings After 5-Agent Investigation:
Top 3 Recommendations:
Impact: Discovered the database ingests at 1.1M rows/sec (11x faster than target), proving the database was never the bottleneck
Outcome: Avoided wasting 2-3 weeks optimizing the database when download was the real bottleneck
❌ Bad: Profile database only, assume it's the bottleneck
✅ Good: Profile entire pipeline (download → extract → parse → ingest)

❌ Bad: Run Agent 1, wait, then run Agent 2, wait, etc.
✅ Good: Spawn all 5 agents in parallel using a single message with multiple Task calls

❌ Bad: "Let's optimize the database config first" (assumption-driven)
✅ Good: Profile first, discover database is only 4% of time, optimize download instead

❌ Bad: Only implement P0 (highest impact, highest effort)
✅ Good: Implement P2 quick wins (1.3x for 4-8 hours effort) while planning P0

❌ Bad: Implement optimization, assume it worked
✅ Good: Re-run profiling script, verify expected improvement achieved
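Verifying "expected improvement achieved" can be made mechanical by comparing before/after throughput against the Nx claim from the recommendation. A sketch with a hypothetical helper (the tolerance value is an assumption):

```python
def verify_improvement(before_rows_sec, after_rows_sec, expected_factor,
                       tolerance=0.2):
    """Check a measured speedup against the claimed Nx improvement,
    allowing a tolerance band for measurement noise."""
    actual = after_rows_sec / before_rows_sec
    ok = actual >= expected_factor * (1 - tolerance)
    print(f"Measured {actual:.1f}x vs expected {expected_factor}x: "
          f"{'PASS' if ok else 'FAIL'}")
    return ok
```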
Not applicable - profiling scripts are project-specific (stored in tmp/perf-optimization/)
profiling_template.py - Template for phase-boundary instrumentation
integration_report_template.md - Template for master integration report
impact_quantification_guide.md - How to assess P0/P1/P2 priorities
Not applicable - profiling artifacts are project-specific
| Issue | Cause | Solution |
|---|---|---|
| Agents running sequentially | Using separate messages | Spawn all agents in single message with multi-Task |
| Integration report empty | Agent reports not written | Wait for all 4 investigation agents to complete |
| Wrong bottleneck identified | Single-layer profiling | Profile entire pipeline, not just assumed layer |
| Profiling results vary | No warmup runs | Run 3-5 warmup iterations before measuring |
| Memory not profiled | Missing tracemalloc | Add tracemalloc instrumentation to profiling script |
| P0/P1 priority unclear | No impact quantification | Include expected Nx improvement for each finding |
| Consensus missing | Agents not compared | Integration agent must synthesize all 4 reports |
| Re-profile shows no change | Caching effects | Clear caches, restart services before re-profiling |
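For the "no warmup runs" row above, a measurement harness that discards warmup iterations and reports the median is a small amount of code; a minimal sketch (warmup and run counts follow the table's 3-5 suggestion but are otherwise arbitrary):

```python
import statistics
import time

def measure(fn, warmup=3, runs=5):
    """Discard warmup iterations (cache/JIT effects), then return
    the median wall time of the measured runs."""
    for _ in range(warmup):
        fn()  # warmup timings are intentionally thrown away
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)
```

Median rather than mean keeps a single outlier run (GC pause, background load) from skewing the result.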
Weekly Installs: 62
Repository: terrylica/cc-skills
GitHub Stars: 26
First Seen: Jan 24, 2026
Security Audits: Gen Agent Trust Hub: Pass · Socket: Pass · Snyk: Warn
Installed on:
opencode: 59
gemini-cli: 58
claude-code: 57
codex: 57
github-copilot: 56
cursor: 56