agent-orchestration-improve-agent by sickn33/antigravity-awesome-skills
npx skills add https://github.com/sickn33/antigravity-awesome-skills --skill agent-orchestration-improve-agent
Systematic improvement of existing agents through performance analysis, prompt engineering, and continuous iteration.
[Extended thinking: Agent optimization requires a data-driven approach combining performance metrics, user feedback analysis, and advanced prompt engineering techniques. Success depends on systematic evaluation, targeted improvements, and rigorous testing with rollback capabilities for production safety.]
Comprehensive analysis of agent performance using context-manager for historical data collection.
Use: context-manager
Command: analyze-agent-performance $ARGUMENTS --days 30
Collect metrics including:
Identify recurring patterns in user interactions:
Categorize failures by root cause:
Generate quantitative baseline metrics:
Performance Baseline:
- Task Success Rate: [X%]
- Average Corrections per Task: [Y]
- Tool Call Efficiency: [Z%]
- User Satisfaction Score: [1-10]
- Average Response Latency: [Xms]
- Token Efficiency Ratio: [X:Y]
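As a rough illustration, the baseline above can be derived from interaction logs. This sketch assumes hypothetical record fields (`success`, `corrections`, `tokens_in`, `tokens_out`, `latency_ms`); the skill itself does not prescribe a log schema.

```python
# Sketch: derive baseline metrics from a list of interaction log records.
# The record fields used here are assumptions for illustration only.

def compute_baseline(records):
    n = len(records)
    if n == 0:
        raise ValueError("no records to analyze")
    successes = sum(1 for r in records if r["success"])
    total_in = sum(r["tokens_in"] for r in records)
    total_out = sum(r["tokens_out"] for r in records)
    return {
        "task_success_rate": round(100 * successes / n, 1),           # [X%]
        "avg_corrections_per_task": round(
            sum(r["corrections"] for r in records) / n, 2),           # [Y]
        "avg_latency_ms": round(
            sum(r["latency_ms"] for r in records) / n, 1),            # [Xms]
        "token_efficiency_ratio": f"{total_in}:{total_out}",          # [X:Y]
    }
```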
Apply advanced prompt optimization techniques using prompt-engineer agent.
Implement structured reasoning patterns:
Use: prompt-engineer
Technique: chain-of-thought-optimization
Curate high-quality examples from successful interactions:
Example structure:
Good Example:
Input: [User request]
Reasoning: [Step-by-step thought process]
Output: [Successful response]
Why this works: [Key success factors]
Bad Example:
Input: [Similar request]
Output: [Failed response]
Why this fails: [Specific issues]
Correct approach: [Fixed version]
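The example structure above can be rendered into a few-shot prompt section mechanically. A minimal sketch, assuming a simple dict representation of each curated example:

```python
# Sketch: render curated good/bad examples into a few-shot prompt section
# following the structure above. Field names are illustrative assumptions.

def render_example(ex):
    if ex["kind"] == "good":
        return (
            f"Good Example:\n"
            f"Input: {ex['input']}\n"
            f"Reasoning: {ex['reasoning']}\n"
            f"Output: {ex['output']}\n"
            f"Why this works: {ex['why']}"
        )
    return (
        f"Bad Example:\n"
        f"Input: {ex['input']}\n"
        f"Output: {ex['output']}\n"
        f"Why this fails: {ex['why']}\n"
        f"Correct approach: {ex['fix']}"
    )

def build_few_shot_block(examples):
    # Join rendered examples with blank lines for the prompt.
    return "\n\n".join(render_example(ex) for ex in examples)
```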
Strengthen agent identity and capabilities:
Implement self-correction mechanisms:
Constitutional Principles:
1. Verify factual accuracy before responding
2. Self-check for potential biases or harmful content
3. Validate output format matches requirements
4. Ensure response completeness
5. Maintain consistency with previous responses
Add critique-and-revise loops:
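A minimal sketch of such a loop, combining the constitutional principles above with iterative revision; the check predicates here are illustrative placeholders for real validators or a critique model:

```python
# Sketch: apply a constitutional checklist before emitting a response,
# then revise until clean. Checks cover principles 3 and 4 above;
# real systems would add factuality, bias, and consistency checks.

def check_format(response, spec):
    # Principle 3: output format matches requirements (here: required keys).
    return all(key in response for key in spec["required_keys"])

def check_completeness(response, spec):
    # Principle 4: response completeness (here: a minimum-length heuristic).
    return len(response.get("answer", "")) >= spec.get("min_length", 1)

def self_check(response, spec):
    checks = {"format": check_format, "completeness": check_completeness}
    return [name for name, fn in checks.items() if not fn(response, spec)]

def critique_and_revise(response, spec, revise_fn, max_rounds=3):
    # Re-run the checklist, asking revise_fn to fix violations each round.
    for _ in range(max_rounds):
        violations = self_check(response, spec)
        if not violations:
            break
        response = revise_fn(response, violations)
    return response
```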
Optimize response structure:
Comprehensive testing framework with A/B comparison.
Create representative test scenarios:
Test Categories:
1. Golden path scenarios (common successful cases)
2. Previously failed tasks (regression testing)
3. Edge cases and corner scenarios
4. Stress tests (complex, multi-step tasks)
5. Adversarial inputs (potential breaking points)
6. Cross-domain tasks (combining capabilities)
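One way to keep these six categories covered is a small scenario registry; a sketch, with category identifiers and scenario fields chosen here for illustration:

```python
# Sketch: a minimal test-scenario registry keyed by the categories above.
from collections import defaultdict

CATEGORIES = [
    "golden_path", "regression", "edge_case",
    "stress", "adversarial", "cross_domain",
]

class ScenarioRegistry:
    def __init__(self):
        self._by_category = defaultdict(list)

    def add(self, category, prompt, expected):
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        self._by_category[category].append(
            {"prompt": prompt, "expected": expected})

    def coverage_gaps(self):
        # Categories that still have no scenarios.
        return [c for c in CATEGORIES if not self._by_category[c]]
```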
Compare original vs improved agent:
Use: parallel-test-runner
Config:
- Agent A: Original version
- Agent B: Improved version
- Test set: 100 representative tasks
- Metrics: Success rate, speed, token usage
- Evaluation: Blind human review + automated scoring
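The comparison above can be sketched as a small harness that runs both versions over the same test set; `score_fn` stands in for the blind-review plus automated-scoring step and is an assumption of this sketch:

```python
# Sketch: run the same test set against two agent versions and tally the
# metrics named above. `agent` is any callable returning (output, tokens).
import time

def run_ab_test(agent_a, agent_b, test_set, score_fn):
    results = {}
    for name, agent in (("A", agent_a), ("B", agent_b)):
        successes, tokens, elapsed = 0, 0, 0.0
        for case in test_set:
            start = time.perf_counter()
            output, used_tokens = agent(case["prompt"])
            elapsed += time.perf_counter() - start
            tokens += used_tokens
            successes += score_fn(output, case["expected"])
        n = len(test_set)
        results[name] = {
            "success_rate": successes / n,
            "avg_latency_s": elapsed / n,
            "total_tokens": tokens,
        }
    return results
```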
Statistical significance testing:
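For the success-rate metric, a two-proportion z-test is one standard choice; a stdlib-only sketch (the usual 0.05 threshold is conventional, not mandated by the skill):

```python
# Sketch: two-proportion z-test for the success-rate difference between
# the original (A) and improved (B) agent.
from math import sqrt, erf

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value
```

A jump from 50/100 to 75/100 successes is clearly significant; 50/100 to 52/100 is not.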
Comprehensive scoring framework:
Task-Level Metrics:
Quality Metrics:
Performance Metrics:
Structured human review process:
Safe rollout with monitoring and rollback capabilities.
Systematic versioning strategy:
Version Format: agent-name-v[MAJOR].[MINOR].[PATCH]
Example: customer-support-v2.3.1
MAJOR: Significant capability changes
MINOR: Prompt improvements, new examples
PATCH: Bug fixes, minor adjustments
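A small helper for the version format above; a sketch, assuming versions are stored as plain strings:

```python
# Sketch: parse and bump versions in the agent-name-v[MAJOR].[MINOR].[PATCH]
# format described above.
import re

VERSION_RE = re.compile(
    r"^(?P<name>.+)-v(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)$")

def bump(version, level):
    m = VERSION_RE.match(version)
    if not m:
        raise ValueError(f"not in agent-name-vX.Y.Z form: {version}")
    major, minor, patch = (int(m[g]) for g in ("major", "minor", "patch"))
    if level == "major":        # significant capability changes
        major, minor, patch = major + 1, 0, 0
    elif level == "minor":      # prompt improvements, new examples
        minor, patch = minor + 1, 0
    elif level == "patch":      # bug fixes, minor adjustments
        patch += 1
    else:
        raise ValueError(f"unknown bump level: {level}")
    return f"{m['name']}-v{major}.{minor}.{patch}"
```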
Maintain version history:
Progressive deployment strategy:
Quick recovery mechanism:
Rollback Triggers:
- Success rate drops >10% from baseline
- Critical errors increase >5%
- User complaints spike
- Cost per task increases >20%
- Safety violations detected
Rollback Process:
1. Detect issue via monitoring
2. Alert team immediately
3. Switch to previous stable version
4. Analyze root cause
5. Fix and re-test before retry
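The detection step above can be sketched as a trigger check over live metrics; the thresholds mirror the trigger list, while the metric field names are assumptions of this sketch (the user-complaint trigger is omitted since it needs an external signal):

```python
# Sketch: evaluate the rollback triggers above against live metrics,
# relative to the recorded baseline. Rates are fractions in [0, 1].

def rollback_triggers(baseline, live):
    triggers = []
    if live["success_rate"] < baseline["success_rate"] - 0.10:
        triggers.append("success rate drops >10% from baseline")
    if live["critical_error_rate"] > baseline["critical_error_rate"] + 0.05:
        triggers.append("critical errors increase >5%")
    if live["cost_per_task"] > baseline["cost_per_task"] * 1.20:
        triggers.append("cost per task increases >20%")
    if live.get("safety_violations", 0) > 0:
        triggers.append("safety violations detected")
    return triggers

def should_rollback(baseline, live):
    return bool(rollback_triggers(baseline, live))
```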
Real-time performance tracking:
Agent improvement is successful when:
After 30 days of production use:
Establish regular improvement cadence:
Remember: Agent optimization is an iterative process. Each cycle builds upon previous learnings, gradually improving performance while maintaining stability and safety.
Weekly Installs: 211
GitHub Stars: 27.1K
First Seen: Jan 28, 2026
Security Audits: Gen Agent Trust Hub: Pass, Socket: Pass, Snyk: Pass
Installed on: opencode (199), gemini-cli (193), codex (191), github-copilot (189), cursor (178), amp (176)