Self-Improving Agent Architecture: Engineering Patterns for Autonomous Learning and Performance Improvement in AI Agents

self-improving-agent by borghei/claude-skills

Install command

npx skills add https://github.com/borghei/claude-skills --skill self-improving-agent

Self-Improving Agent - Autonomous Learning Patterns

Tier: POWERFUL
Category: Engineering
Tags: self-improvement, AI agents, feedback loops, auto-memory, meta-learning, performance tracking

Overview

Self-Improving Agent provides architectural patterns for AI agents that get better with use. Most agents are stateless: they repeat the same mistakes because they have no mechanism for learning from their own execution. This skill addresses that gap with concrete patterns for feedback capture, memory curation, skill extraction, and regression detection.

The key insight: auto-memory captures everything, but curation is what turns noise into knowledge.

Core Architecture

The Improvement Loop

┌──────────────────────────────────────────────────────────┐
│                   SELF-IMPROVEMENT CYCLE                  │
│                                                          │
│  ┌─────────┐    ┌──────────┐    ┌─────────────┐        │
│  │ Execute  │───▶│ Evaluate │───▶│ Extract     │        │
│  │ Task     │    │ Outcome  │    │ Learnings   │        │
│  └─────────┘    └──────────┘    └─────────────┘        │
│       ▲                               │                  │
│       │                               ▼                  │
│  ┌─────────┐    ┌──────────┐    ┌─────────────┐        │
│  │ Apply   │◀───│ Promote  │◀───│ Validate    │        │
│  │ Rules   │    │ to Rules │    │ Learnings   │        │
│  └─────────┘    └──────────┘    └─────────────┘        │
│                                                          │
└──────────────────────────────────────────────────────────┘

Improvement Maturity Levels

Level	Name	Mechanism	Example
0	Stateless	No memory between sessions	Default agent behavior
1	Recording	Captures observations, no action	Auto-memory logging
2	Curating	Organizes and deduplicates observations	Memory review + cleanup
3	Promoting	Graduates patterns to enforced rules	MEMORY.md entries become CLAUDE.md rules
4	Extracting	Creates reusable skills from proven patterns	Recurring solutions become skill packages
5	Meta-Learning	Adapts learning strategy itself	Adjusts what to capture based on what proved useful

Most agents operate at Level 0-1. This skill provides the machinery for Levels 2-5.

Core Capabilities

1. Memory Curation System

The Memory Stack

┌─────────────────────────────────────────────────┐
│  CLAUDE.md / .claude/rules/                      │
│  Highest authority. Enforced every session.       │
│  Capacity: Unlimited. Load: Full file.           │
├─────────────────────────────────────────────────┤
│  MEMORY.md (auto-memory)                         │
│  Project learnings. Auto-captured by Claude.     │
│  Capacity: First 200 lines loaded. Overflow to   │
│  topic files.                                    │
├─────────────────────────────────────────────────┤
│  Session Context                                  │
│  Current conversation. Ephemeral.                │
│  Capacity: Context window.                       │
└─────────────────────────────────────────────────┘

Memory Review Protocol

Run periodically (weekly or after every 10 sessions):

Step 1: Read MEMORY.md and all topic files
Step 2: Classify each entry

  Categories:
  - PROMOTE: Pattern proven 3+ times, should be a rule
  - CONSOLIDATE: Multiple entries saying the same thing
  - STALE: References deleted files, old patterns, resolved issues
  - KEEP: Still relevant, not yet proven enough to promote
  - EXTRACT: Recurring solution that should be a reusable skill

Step 3: Execute actions
  - PROMOTE entries → move to CLAUDE.md or .claude/rules/
  - CONSOLIDATE entries → merge into single clear entry
  - STALE entries → delete
  - EXTRACT entries → create skill package (see Skill Extraction)

Step 4: Verify MEMORY.md is under 200 lines
  - If over 200: move topic-specific entries to topic files
  - Topic files: ~/.claude/projects/<path>/memory/<topic>.md
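
The review steps above can be sketched roughly as follows. This is a minimal illustration, not the skill's own implementation: the `Entry` fields, the precedence order inside `classify`, and duplicate detection by normalized text are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass

LINE_LIMIT = 200  # MEMORY.md lines loaded per session, per the memory stack


@dataclass
class Entry:
    text: str
    recurrences: int = 1    # how many times the pattern has been seen
    stale: bool = False     # refers to deleted files or resolved issues
    reusable: bool = False  # recurring solution judged worth packaging


def classify(entry: Entry, duplicates: int) -> str:
    """Pick one review category (the precedence order is an assumption)."""
    if entry.stale:
        return "STALE"
    if duplicates > 1:
        return "CONSOLIDATE"
    if entry.reusable:
        return "EXTRACT"
    if entry.recurrences >= 3:
        return "PROMOTE"
    return "KEEP"


def review(entries: list[Entry]) -> dict[str, list[Entry]]:
    """Bucket every entry; duplicates are matched on normalized text only."""
    counts = Counter(e.text.strip().lower() for e in entries)
    buckets: dict[str, list[Entry]] = {
        c: [] for c in ("PROMOTE", "CONSOLIDATE", "STALE", "KEEP", "EXTRACT")
    }
    for e in entries:
        buckets[classify(e, counts[e.text.strip().lower()])].append(e)
    return buckets


def over_limit(memory_text: str) -> bool:
    """True when MEMORY.md content exceeds the 200-line budget (Step 4)."""
    return len(memory_text.splitlines()) > LINE_LIMIT
```

In practice the classification would be done by the agent or a reviewer reading each entry; the booleans here simply stand in for that judgment.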

Promotion Criteria

An entry is ready for promotion when:

Criterion	Threshold	Why
Recurrence	Seen in 3+ sessions	Not a one-off
Consistency	Same solution every time	Not context-dependent
Impact	Prevented errors or saved significant time	Worth enforcing
Stability	Underlying code/system unchanged	Won't immediately become stale
Clarity	Can be stated in 1-2 sentences	Rules must be unambiguous
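
As a rough sketch, the five criteria can be expressed as a checklist that reports which ones a candidate still fails. The `Candidate` fields and the sentence-counting heuristic are hypothetical; judging impact and stability would normally need a human or the agent itself.

```python
import re
from dataclasses import dataclass


@dataclass
class Candidate:
    sessions_seen: int
    same_solution_each_time: bool
    prevented_errors_or_saved_time: bool
    underlying_code_unchanged: bool
    rule_text: str


def sentence_count(text: str) -> int:
    """Crude heuristic: split on ., ! and ? and count the non-empty parts."""
    return len([s for s in re.split(r"[.!?]+", text) if s.strip()])


def unmet_criteria(c: Candidate) -> list[str]:
    """Return the names of the promotion criteria the candidate fails."""
    checks = {
        "recurrence": c.sessions_seen >= 3,
        "consistency": c.same_solution_each_time,
        "impact": c.prevented_errors_or_saved_time,
        "stability": c.underlying_code_unchanged,
        "clarity": 1 <= sentence_count(c.rule_text) <= 2,
    }
    return [name for name, ok in checks.items() if not ok]
```

An empty list means the entry is ready for promotion; anything else says what is still missing.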

Promotion Targets

Pattern Type	Promote To	Example
Coding convention	`.claude/rules/<area>.md`	"Always use `type` not `interface` for object shapes"
Project architecture	`CLAUDE.md`	"All API routes go through middleware chain"
Tool preference	`CLAUDE.md`	"Use pnpm, not npm"
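Debugging pattern	`.claude/rules/debugging.md`	"When tests fail, check environment variables first"
File-scoped rule	`.claude/rules/<scope>.md` with `paths:`	"In migrations/, always add a down migration"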

2. Feedback Loop Design

Outcome Classification

Every agent task produces an outcome. Classify it:

SUCCESS         - Task completed, user accepted result
PARTIAL         - Task completed but required corrections
FAILURE         - Task failed, user had to redo
REJECTION       - User explicitly rejected approach
TIMEOUT         - Task exceeded time/token budget
ERROR           - Technical error (tool failure, API error)

Signal Extraction from Outcomes

Outcome	Signal	Memory Action
SUCCESS (first try)	Approach works well	Reinforce (increment confidence)
SUCCESS (after correction)	Initial approach had gap	Log the correction pattern
PARTIAL (user edited result)	Output format or content gap	Log what user changed
FAILURE	Approach fundamentally wrong	Log anti-pattern with context
REJECTION	Misunderstood requirements	Log clarification pattern
Repeated ERROR	Tool or environment issue	Log workaround or fix
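
One way to encode the classification and the signal table is an enum plus a lookup function. This is a sketch under assumptions: the `corrected` and `repeated` flags are hypothetical, and returning `None` for TIMEOUT and for a single ERROR reflects only that the table lists no memory action for them.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    SUCCESS = "success"
    PARTIAL = "partial"
    FAILURE = "failure"
    REJECTION = "rejection"
    TIMEOUT = "timeout"
    ERROR = "error"


def memory_action(
    outcome: Outcome, corrected: bool = False, repeated: bool = False
) -> Optional[str]:
    """Map an outcome to the memory action from the signal table."""
    if outcome is Outcome.SUCCESS:
        return "log the correction pattern" if corrected else "reinforce"
    if outcome is Outcome.ERROR:
        # Only a repeated error has a listed action.
        return "log workaround or fix" if repeated else None
    return {
        Outcome.PARTIAL: "log what user changed",
        Outcome.FAILURE: "log anti-pattern with context",
        Outcome.REJECTION: "log clarification pattern",
    }.get(outcome)  # TIMEOUT falls through to None
```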

Feedback Capture Template

## Learning: [Short description]

**Context:** [What task was being performed]
**What happened:** [Outcome description]
**Root cause:** [Why the outcome occurred]
**Correct approach:** [What should have been done]
**Confidence:** [High/Medium/Low]
**Recurrence:** [First time / Seen N times]
**Action:** [KEEP / PROMOTE / EXTRACT]
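
If entries are written by a script rather than by hand, the template can be filled from a small record type. The `Learning` class and its field names are hypothetical; only the rendered layout comes from the template above.

```python
from dataclasses import dataclass


@dataclass
class Learning:
    title: str
    context: str
    what_happened: str
    root_cause: str
    correct_approach: str
    confidence: str = "Low"   # High / Medium / Low
    seen: int = 1             # number of times the pattern has occurred
    action: str = "KEEP"      # KEEP / PROMOTE / EXTRACT

    def to_markdown(self) -> str:
        """Render the entry in the feedback capture template layout."""
        recurrence = "First time" if self.seen == 1 else f"Seen {self.seen} times"
        return "\n".join([
            f"## Learning: {self.title}",
            "",
            f"**Context:** {self.context}",
            f"**What happened:** {self.what_happened}",
            f"**Root cause:** {self.root_cause}",
            f"**Correct approach:** {self.correct_approach}",
            f"**Confidence:** {self.confidence}",
            f"**Recurrence:** {recurrence}",
            f"**Action:** {self.action}",
        ])
```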

3. Performance Regression Detection

Metrics to Track

Metric	Measurement	Regression Signal
First-attempt success rate	Tasks accepted without correction	Dropping below 70%
Correction count per task	User edits after agent output	Rising above 2 per task
Tool error rate	Failed tool calls / total calls	Rising above 5%
Context relevance	Retrieved context actually used	Dropping below 60%
Task completion time	Turns to complete task	Rising trend over 5 sessions
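
The thresholds in the table can be checked mechanically once per-session metrics are recorded. The sketch below assumes a hypothetical `SessionMetrics` record with rates stored as fractions, and it reads "rising trend over 5 sessions" as strictly increasing turns across the last five sessions, which is one interpretation among several.

```python
from dataclasses import dataclass


@dataclass
class SessionMetrics:
    first_attempt_success_rate: float  # fraction, 0.0 to 1.0
    corrections_per_task: float
    tool_error_rate: float             # failed tool calls / total calls
    context_relevance: float           # fraction of retrieved context used
    turns_per_task: float


def regression_signals(history: list[SessionMetrics]) -> list[str]:
    """Return the names of metrics whose regression signal fires."""
    latest = history[-1]
    signals = []
    if latest.first_attempt_success_rate < 0.70:
        signals.append("first_attempt_success_rate")
    if latest.corrections_per_task > 2:
        signals.append("corrections_per_task")
    if latest.tool_error_rate > 0.05:
        signals.append("tool_error_rate")
    if latest.context_relevance < 0.60:
        signals.append("context_relevance")
    turns = [m.turns_per_task for m in history[-5:]]
    # Assumption: "rising trend" means every session took more turns than the last.
    if len(turns) == 5 and all(a < b for a, b in zip(turns, turns[1:])):
        signals.append("task_completion_time")
    return signals
```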

Regression Response Protocol

1. DETECT: Metric crosses threshold
2. DIAGNOSE: Compare recent sessions vs baseline
   - What changed? (New code? New patterns? New tools?)
   - Which task types are affected?
   - Is it a memory issue or a capability issue?
3. RESPOND:
   - Memory issue → Review and curate MEMORY.md
   - Stale rules → Update CLAUDE.md
   - New code patterns → Add rules for new patterns
   - Capability gap → Extract as skill request
4. VERIFY: Track metric for next 3 sessions

4. Skill Extraction

When a solution pattern is proven and reusable, extract it into a standalone skill.

Extraction Criteria

A pattern is ready for extraction when:
- Used successfully 5+ times across different contexts
- Solution is generalizable (not project-specific)
- Takes more than trivial effort to recreate from scratch
- Would benefit other projects/users

Extraction Process

Step 1: Document the pattern
  - What problem does it solve?
  - What's the step-by-step approach?
  - What are the inputs and outputs?
  - What are the edge cases?

Step 2: Generalize
  - Remove project-specific details
  - Identify configurable parameters
  - Add handling for common variations

Step 3: Package as skill
  - Create SKILL.md with frontmatter
  - Add references/ for knowledge bases
  - Add scripts/ if automatable
  - Add assets/ for templates

Step 4: Validate
  - Test on a different project
  - Have another person/agent use it
  - Iterate on unclear instructions
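
Step 3 can be scripted. The helper below is a hypothetical scaffold, not part of the skill: it assumes a frontmatter with `name` and `description` fields and creates `scripts/` only when the pattern is automatable.

```python
from pathlib import Path


def scaffold_skill(
    root: Path, name: str, description: str, automatable: bool = False
) -> Path:
    """Create an empty skill package layout and return its directory."""
    skill_dir = root / name
    (skill_dir / "references").mkdir(parents=True, exist_ok=True)  # knowledge bases
    (skill_dir / "assets").mkdir(exist_ok=True)                    # templates
    if automatable:
        (skill_dir / "scripts").mkdir(exist_ok=True)
    # Assumed minimal frontmatter; extend it to match your skill format.
    frontmatter = f"---\nname: {name}\ndescription: {description}\n---\n\n"
    body = f"# {name}\n\nDocument the problem, steps, inputs, outputs, and edge cases here.\n"
    (skill_dir / "SKILL.md").write_text(frontmatter + body, encoding="utf-8")
    return skill_dir
```

Calling `scaffold_skill(Path("skills"), "retry-helper", "Retries flaky calls")` would leave a `skills/retry-helper/` directory ready for the documentation written in Steps 1 and 2.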

5. Meta-Learning Patterns

Adaptive Capture Strategy

Not all observations are equally valuable. Adjust what gets captured based on what proved useful:

Initial strategy: Capture everything
After 10 sessions: Analyze which captured items led to promotions
After 20 sessions: Adjust capture to focus on high-value categories

High-value categories (typically):
  - Error resolutions (80% promotion rate)
  - User corrections (70% promotion rate)
  - Tool preferences (60% promotion rate)

Low-value categories (typically):
  - File structure observations (10% promotion rate)
  - One-off workarounds (5% promotion rate)
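
The analysis after 10 and 20 sessions amounts to computing a promotion rate per capture category and keeping the categories that clear some bar. A minimal sketch, where the `(category, was_promoted)` input shape and the 0.5 cutoff are assumptions:

```python
from collections import defaultdict


def promotion_rates(items: list[tuple[str, bool]]) -> dict[str, float]:
    """Compute promoted / captured for each category."""
    totals: dict[str, int] = defaultdict(int)
    promoted: dict[str, int] = defaultdict(int)
    for category, was_promoted in items:
        totals[category] += 1
        promoted[category] += int(was_promoted)
    return {c: promoted[c] / totals[c] for c in totals}


def categories_to_capture(rates: dict[str, float], min_rate: float = 0.5) -> set[str]:
    """Keep categories whose promotion rate meets the (hypothetical) cutoff."""
    return {c for c, rate in rates.items() if rate >= min_rate}
```

The percentages quoted above are described as typical; computing the rates from your own history avoids relying on them.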

Anti-Pattern Detection

Beyond capturing what works, actively detect what fails:

Anti-Pattern	Detection Signal	Response
Repeated wrong import path	Same correction 3+ times	Add to CLAUDE.md as rule
Wrong test framework used	User always changes test approach	Add testing rules
Incorrect API usage	Same API error pattern	Add API usage notes
Style guide violations	User reformats same patterns	Add style rules
Wrong branch workflow	User corrects git operations	Add git workflow rules
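
The "same correction 3+ times" signal can be approximated by counting normalized correction descriptions. This sketch assumes corrections are logged as short strings and treats two corrections as the same only when their normalized text matches exactly.

```python
from collections import Counter


def repeated_corrections(corrections: list[str], threshold: int = 3) -> list[str]:
    """Return corrections seen at least `threshold` times, sorted by text."""
    counts = Counter(c.strip().lower() for c in corrections)
    return sorted(c for c, n in counts.items() if n >= threshold)
```

Each returned item is a candidate for a rule in CLAUDE.md, subject to the promotion criteria.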

6. Continuous Calibration

Confidence Scoring

Every piece of learned knowledge carries a confidence score:

Confidence = base_score * recency_factor * consistency_factor

base_score:
  - User explicitly stated: 1.0
  - Observed from successful outcome: 0.8
  - Inferred from pattern: 0.6
  - Guessed from context: 0.3

recency_factor:
  - Last 7 days: 1.0
  - 7-30 days: 0.9
  - 30-90 days: 0.7
  - 90+ days: 0.5

consistency_factor:
  - Never contradicted: 1.0
  - Contradicted once, reaffirmed: 0.9
  - Contradicted, not reaffirmed: 0.5
  - Actively contradicted: 0.0 (delete)
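
The scoring formula translates directly into code. The dictionary keys are hypothetical labels for the rows above, and because the listed ranges share their endpoints (7, 30, 90 days), the sketch assigns each boundary day to the more recent bucket.

```python
BASE_SCORE = {
    "user_stated": 1.0,
    "observed_success": 0.8,
    "inferred": 0.6,
    "guessed": 0.3,
}

CONSISTENCY_FACTOR = {
    "never_contradicted": 1.0,
    "contradicted_reaffirmed": 0.9,
    "contradicted_unresolved": 0.5,
    "actively_contradicted": 0.0,  # the entry should be deleted
}


def recency_factor(age_days: int) -> float:
    """Boundary days (7, 30, 90) go to the more recent bucket by assumption."""
    if age_days <= 7:
        return 1.0
    if age_days <= 30:
        return 0.9
    if age_days <= 90:
        return 0.7
    return 0.5


def confidence(source: str, age_days: int, consistency: str) -> float:
    """Confidence = base_score * recency_factor * consistency_factor."""
    return BASE_SCORE[source] * recency_factor(age_days) * CONSISTENCY_FACTOR[consistency]
```

For example, a learning observed from a successful outcome 10 days ago that was contradicted once and then reaffirmed scores 0.8 × 0.9 × 0.9 = 0.648.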

Belief Revision

When new information contradicts existing knowledge:

1. Compare confidence scores
2. If new info higher confidence → update knowledge
3. If roughly equal → flag for user confirmation
4. If new info lower confidence → keep existing, note conflict
5. Always log the conflict for review
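
A compact sketch of the revision rule follows. The `tolerance` that defines "roughly equal" is an assumption (0.1 here), as are the decision labels and the tuple format of the conflict log.

```python
def revise(
    existing_conf: float,
    new_conf: float,
    conflict_log: list[tuple[float, float, str]],
    tolerance: float = 0.1,
) -> str:
    """Decide what to do with contradictory information and log the conflict."""
    if abs(new_conf - existing_conf) <= tolerance:
        decision = "ask_user"        # roughly equal: flag for confirmation
    elif new_conf > existing_conf:
        decision = "update"          # new information wins
    else:
        decision = "keep_existing"   # existing knowledge wins
    conflict_log.append((existing_conf, new_conf, decision))  # always logged
    return decision
```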

Workflows

Workflow 1: Weekly Memory Health Check

1. Read all memory files (MEMORY.md + topic files)
2. Count total entries and lines
3. For each entry, classify: PROMOTE / CONSOLIDATE / STALE / KEEP / EXTRACT
4. Execute promotions (with user confirmation)
5. Execute consolidations
6. Delete stale entries
7. Verify under 200-line limit
8. Report: entries promoted, consolidated, deleted, remaining

Workflow 2: Post-Session Learning Capture

1. Review session outcomes (successes, corrections, failures)
2. For each correction: log what was wrong and what was right
3. For each failure: log root cause and correct approach
4. Check existing memory for related entries
5. If related entry exists: increment recurrence count
6. If new: add entry with context
7. If recurrence threshold met: flag for promotion
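
Steps 4 to 7 behave like an upsert with a counter. In the sketch below, memory is a plain dictionary keyed by a hypothetical normalized description, related entries are found by exact key match, and the threshold of 3 is taken from the promotion criteria.

```python
PROMOTION_THRESHOLD = 3  # "Seen in 3+ sessions" from the promotion criteria


def record_learning(memory: dict[str, dict], key: str, context: str) -> dict:
    """Add a new learning or bump the recurrence count of an existing one."""
    entry = memory.get(key)
    if entry is None:
        # Step 6: new learning, stored with its context.
        entry = memory[key] = {"context": context, "recurrence": 1, "flag": "KEEP"}
    else:
        # Step 5: related entry exists, so increment its recurrence count.
        entry["recurrence"] += 1
    if entry["recurrence"] >= PROMOTION_THRESHOLD:
        entry["flag"] = "PROMOTE"  # Step 7: flag for promotion
    return entry
```

A real agent would likely match related entries by meaning rather than by exact key; that part is left out here.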

Workflow 3: Regression Investigation

1. Identify the degraded metric
2. Pull last 5 sessions' outcomes for that task type
3. Compare against baseline (first 5 sessions)
4. Identify what changed: memory, code, rules, environment
5. Propose fix: update rule, add rule, retrain pattern
6. Apply fix
7. Monitor next 3 sessions
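
Steps 2 and 3 reduce to comparing two windows of the same metric. A minimal sketch, assuming the metric is stored as one value per session in chronological order and that the two windows should not overlap:

```python
from statistics import mean


def change_from_baseline(values: list[float], window: int = 5) -> float:
    """Return mean(last `window` sessions) minus mean(first `window` sessions)."""
    if len(values) < 2 * window:
        raise ValueError("need at least two non-overlapping windows of sessions")
    baseline = mean(values[:window])
    recent = mean(values[-window:])
    return recent - baseline
```

A negative result for first-attempt success rate, or a positive one for corrections per task, points to the degradation that Step 4 then has to explain.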

Common Pitfalls

Pitfall	Why It Happens	Fix
Memory bloat	Auto-capture without curation	Weekly review, enforce 200-line limit
Stale rules	Code changes, rules don't update	Timestamp rules, periodic re-verification
Over-promotion	Promoting one-off patterns as rules	Require 3+ recurrences before promotion
Silent regression	No metrics tracking	Implement outcome classification
Cargo cult rules	Copying rules without understanding	Each rule must have a "why" annotation
Contradiction spirals	New rules conflict with old rules	Belief revision protocol

Integration Points

Skill	Integration
context-engine	Context Engine manages what the agent sees; Self-Improving Agent manages what the agent remembers
agent-designer	Agent Designer defines agent architecture; Self-Improving Agent adds the learning layer
prompt-engineer-toolkit	Prompts that degrade over time are a regression; track and test them
observability-designer	Monitor agent performance metrics alongside system metrics

References

references/feedback-loop-patterns.md - Detailed feedback capture and analysis patterns
references/memory-curation-guide.md - Step-by-step memory review and promotion procedures
references/meta-learning-architectures.md - Advanced patterns for agents that learn how to learn
