evolve：目标驱动的自主代码改进与质量提升循环工具

evolve by boshu2/agentops

280 周安装量

220 GitHub Stars

GitHub

安装命令

npx skills add https://github.com/boshu2/agentops --skill evolve

自动化代码质量开发运维

🇨🇳中文介绍

/evolve — 目标驱动的复合循环

测量问题所在。修复最严重的问题。再次测量。持续复合。

在 /rpi 上持续运行的自主循环。工作选择顺序：

已收集的 .agents/rpi/next-work.jsonl 工作（最新、具体的后续工作）
开放的、就绪的 beads 工作 (bd ready)
失败的目标和指令差距 (ao goals measure)
测试改进（缺失/覆盖率不足的测试、缺失的回归测试）
验证强化和漏洞排查（门控、审计、漏洞扫描）
复杂性 / TODO / FIXME / 代码漂移 / 死代码 / 过时文档 / 过时研究挖掘
具体功能建议（当没有更紧急的工作时，根据仓库目的推导得出）

休眠是最后的手段。 当前队列为空意味着“运行生成器层”，而不是“停止”。只有在队列层和生成器层在连续多次遍历中都为空后，才进入休眠状态。

/evolve                      # 运行直到触发终止开关、达到最大循环次数或真正休眠
/evolve --max-cycles=5       # 最多运行 5 个循环
/evolve --dry-run            # 显示将要处理的工作，但不执行
/evolve --beads-only         # 跳过目标测量，仅处理 beads 待办事项
/evolve --quality            # 质量优先模式：优先处理事后分析发现的问题
/evolve --quality --max-cycles=10  # 质量模式并限制循环次数
/evolve --athena             # 在第一个循环前运行挖掘 → 碎片整理预热
/evolve --athena --max-cycles=5  # 预热知识库，然后运行 5 个循环
/evolve --test-first         # 默认的严格质量 `/rpi` 执行路径
/evolve --no-test-first      # 显式选择退出测试优先模式

广告位招租

在这里展示您的产品或服务

触达数万 AI 开发者，精准高效

联系我们

标志	默认值	描述
`--max-cycles=N`	无限制	在完成 `N` 个循环后停止
`--dry-run`	关闭	显示计划的循环操作但不执行
`--beads-only`	关闭	跳过目标测量，仅运行待办事项选择
`--skip-baseline`	关闭	跳过首次运行的基线快照
`--quality`	关闭	优先处理已收集的事后分析发现的问题
`--athena`	关闭	在循环 1 之前运行 `ao mine` + `ao defrag` 预热
`--test-first`	开启	将严格质量默认值传递给 `/rpi`
`--no-test-first`	关闭	显式禁用传递给 `/rpi` 的测试优先模式

步骤 0.2: Athena 预热（仅限 `--athena`）

如果未传递 --athena 或处于 --dry-run 模式，则跳过。

在第一个 evolve 循环之前，运行 Athena 循环的机械部分以发现新的信号：

mkdir -p .agents/mine .agents/defrag
echo "Athena warmup: mining signal..."
ao mine --since 26h --quiet 2>/dev/null || echo "(ao mine unavailable — skipping)"

echo "Athena warmup: defrag sweep..."
ao defrag --prune --dedup --quiet 2>/dev/null || echo "(ao defrag unavailable — skipping)"

然后读取 .agents/mine/latest.json 和 .agents/defrag/latest.json，并记录（每项 1-2 句话）：

任何看起来与当前目标相关的孤立研究文件
任何可能是失败目标根本原因的代码热点（最近编辑过的高圈复杂度函数）
任何被碎片整理合并的重复学习内容 — 关于已清理内容的上下文

这些记录将在整个 evolve 会话期间为工作选择提供信息。将它们存储在会话变量（内存中），而不是文件中。

步骤 0.5: 基线（仅首次运行）

如果 --skip-baseline 或 --beads-only 或基线已存在，则跳过。

if [ ! -f .agents/evolve/fitness-0-baseline.json ]; then
  bash scripts/evolve-capture-baseline.sh \
    --label "era-$(date -u +%Y%m%dT%H%M%SZ)" \
    --timeout 60
fi

步骤 1: 终止开关检查

在每个循环的顶部运行：

CYCLE_START_SHA=$(git rev-parse HEAD)
[ -f ~/.config/evolve/KILL ] && echo "KILL: $(cat ~/.config/evolve/KILL)" && exit 0
[ -f .agents/evolve/STOP ] && echo "STOP: $(cat .agents/evolve/STOP 2>/dev/null)" && exit 0

步骤 2: 测量适应度

如果 --beads-only，则跳过。

bash scripts/evolve-measure-fitness.sh \
  --output .agents/evolve/fitness-latest.json \
  --timeout 60 \
  --total-timeout 75

不要写入每个循环的 fitness-{N}-pre.json 文件。滚动文件足以用于工作选择和回归检测。

这通过临时文件加上 JSON 验证，将适应度快照原子性地写入 .agents/evolve/。适应度测量需要 AgentOps CLI，因为包装器会调用 ao goals measure。如果测量超过整个命令的时间限制或返回无效 JSON，包装器将失败且不会覆盖之前的滚动快照。

步骤 3: 选择工作

选择是一个阶梯过程，而非一次性检查。每次有效循环后，返回此步骤的顶部，并在考虑休眠之前重新读取队列。

步骤 3.1: 优先处理已收集的工作

读取 .agents/rpi/next-work.jsonl 并为此仓库选择最高价值的未消耗项。优先顺序：

精确的仓库匹配优先于 *，然后是遗留的无作用域条目
已收集的具体实现工作优先于流程工作
高严重性优先于低严重性

当 evolve 选择一个队列项时，首先认领它：

设置 claim_status: "in_progress"
设置 claimed_by: "evolve:cycle-N"
设置 claimed_at: "<timestamp>"
保持 consumed: false，直到 /rpi 循环和回归门控都成功

如果循环失败、出现回归或在成功前被中断，则释放认领，使该项可用于下一个循环。

步骤 3.2: 开放的、就绪的 beads

如果没有就绪的已收集项，检查 bd ready。选择优先级最高的未阻塞问题。

步骤 3.3: 失败的目标和指令差距（如果 --beads-only 则跳过）

首先评估指令，然后是目标：

来自 ao goals measure --directives 的最高优先级指令差距
权重最高的失败目标（跳过被隔离的振荡目标）
权重较低的失败目标

即使所有队列工作都为空，此步骤也存在。目标是第三个来源，而非停止条件。

DIRECTIVES=$(ao goals measure --directives 2>/dev/null)
FAILING=$(jq -r '.goals[] | select(.result=="fail") | .id' .agents/evolve/fitness-latest.json | head -1)

振荡检查： 在处理失败目标之前，检查它是否已振荡（在 cycle-history.jsonl 中改进→失败的转换次数 ≥ 3）。如果是，则隔离它并尝试下一个失败目标。参见 references/oscillation.md。

# 计算此目标的改进→失败转换次数
OSC_COUNT=$(jq -r "select(.target==\"$FAILING\") | .result" .agents/evolve/cycle-history.jsonl \
  | awk 'prev=="improved" && $0=="fail" {count++} {prev=$0} END {print count+0}')
if [ "$OSC_COUNT" -ge 3 ]; then
  QUARANTINED_GOALS[$FAILING]=true
  echo "{\"cycle\":${CYCLE},\"target\":\"${FAILING}\",\"result\":\"quarantined\",\"oscillations\":${OSC_COUNT},\"timestamp\":\"$(date -Iseconds)\"}" >> .agents/evolve/cycle-history.jsonl
fi

步骤 3.4: 测试改进

当队列和目标都为空时，生成具体的测试工作，而不是空闲：

查找测试覆盖率不足或缺失的包/文件
查找最近 bug 修复路径周围缺失的回归测试
识别不稳定或缺失的无头/运行时冒烟测试

将任何真实发现转化为持久化工作：

当工作需要跟踪的待办事项所有权时，添加一个 bead；或者
当工作应直接流回 /rpi 时，在共享的 next-work 契约下追加一个队列项

步骤 3.5: 验证强化和漏洞排查

如果测试改进生成没有返回任何结果，运行漏洞排查和验证扫描：

缺失的验证门控
薄弱的 lint/契约覆盖率
对风险区域进行漏洞排查式审计
文档、契约和运行时实际情况之间的过时假设

再次强调：将发现转化为 beads 或队列项，然后立即选择优先级最高的结果并继续。

步骤 3.6: 漂移 / 热点 / 死代码挖掘

如果先前的生成器为空，则挖掘：

复杂性热点
过时的 TODO/FIXME 标记
死代码
过时文档
过时研究
生成工件与事实来源文件之间的漂移

不要在此停止。将发现规范化为跟踪工作并继续。

步骤 3.7: 功能建议

如果所有具体的修复层都为空，则根据仓库目的提出一个或多个具体功能想法，将它们写成持久化工作，并继续：

当功能需要评审/待办事项处理时，创建一个 bead
或者当功能已准备好进行下一个 /rpi 循环时，追加一个带有 source: "feature-suggestion" 的队列项

质量模式 (--quality) — 反向级联（发现优先于指令）：

步骤 3.0q: 未消耗的高严重性事后分析发现：

HIGH=$(jq -r 'select(.consumed==false) | .items[] | select(.severity=="high") | .title' \
  .agents/rpi/next-work.jsonl 2>/dev/null | head -1)

步骤 3.1q: 未消耗的中等严重性发现。

步骤 3.2q: 开放的、就绪的 beads。

步骤 3.3q: 紧急门控（权重 >= 5）和最高优先级指令差距。

步骤 3.4q: 测试改进。

步骤 3.5q: 验证强化 / 漏洞排查 / 漂移挖掘。

步骤 3.6q: 功能建议。

这仅在阶梯的顶部反转了标准级联：发现先于目标和指令。它不会跳过生成器层。

当 evolve 选择一个发现时，首先在 next-work.jsonl 中认领它：

设置 claim_status: "in_progress"、claimed_by: "evolve-quality:cycle-N"、claimed_at: "<timestamp>"
仅在 /rpi 循环和回归门控成功后设置 consumed: true
如果 /rpi 循环失败（回归），则清除认领并保持 consumed: false

有关评分和完整详细信息，请参见 references/quality-mode.md。

什么都没找到？ 硬性门控 — 仅在生成器层也为空时才考虑休眠：

# 计算 cycle-history.jsonl 中尾部空闲/未更改条目的数量（可移植，不使用 tac）
IDLE_STREAK=$(awk '/"result"\s*:\s*"(idle|unchanged)"/{streak++; next} {streak=0} END{print streak+0}' \
  .agents/evolve/cycle-history.jsonl 2>/dev/null)

if [ "$GENERATOR_EMPTY_STREAK" -ge 2 ] && [ "$IDLE_STREAK" -ge 2 ]; then
  # 队列层为空 AND 生产者层连续第 3 次遍历为空 — 停止
  echo "Stagnation reached after repeated empty queue + generator passes. Dormancy is the last-resort outcome."
  # go to Teardown — 不要记录另一个空闲条目
fi

如果队列层为空但生成器遍历尚未连续 3 次耗尽，则将新的生成器连续空转次数持久化到 session-state.json 中，并循环回步骤 1。循环前的空队列本身不是停止原因。

仅当没有工作源返回可操作的工作且每个生成器层也都为空时，循环才被视为空闲。针对振荡目标并跳过它的循环，只有在剩余阶梯耗尽后才计为空闲。

如果 --dry-run：报告将要处理的工作，然后进入拆卸步骤。

主要引擎：对任何实现质量的工作使用 /rpi。仅当 bead 已包含可执行范围且跳过发现显然是更好路径时，才允许使用 /implement 和 /crank。

对于已收集的项、失败的目标、指令差距、测试改进、验证强化任务、漏洞排查结果、漂移发现或功能建议：

Invoke /rpi "{normalized work title}" --auto --max-cycles=1

对于 beads 问题：

Prefer: /rpi "Land {issue_id}: {title}" --auto --max-cycles=1
Fallback: /implement {issue_id}

或者对于包含子任务的史诗：Invoke /crank {epic_id}。

如果步骤 3 创建了持久化工作而不是立即执行它，则重新进入步骤 3，让新创建的队列/bead 项通过正常选择顺序胜出。

步骤 5: 回归门控

执行后，验证没有破坏任何东西：

# 检测并运行项目构建+测试
if [ -f Makefile ]; then make test
elif [ -f package.json ]; then npm test
elif [ -f go.mod ]; then go build ./... && go vet ./... && go test ./... -count=1 -timeout 120s
elif [ -f Cargo.toml ]; then cargo build && cargo test
elif [ -f pyproject.toml ] || [ -f setup.py ]; then python -m pytest
else echo "No recognized build system found"; fi

# 跨领域约束检查（捕获布线回归）
if [ -f scripts/check-wiring-closure.sh ]; then
  bash scripts/check-wiring-closure.sh
else
  echo "WARNING: scripts/check-wiring-closure.sh not found — skipping wiring check"
fi

如果不是 --beads-only，则重新测量以生成循环后快照：

bash scripts/evolve-measure-fitness.sh \
  --output .agents/evolve/fitness-latest-post.json \
  --timeout 60 \
  --total-timeout 75 \
  --goal "$GOAL_ID"

# 提取循环历史记录条目的目标计数
PASSING=$(jq '[.goals[] | select(.result=="pass")] | length' .agents/evolve/fitness-latest-post.json 2>/dev/null || echo 0)
TOTAL=$(jq '.goals | length' .agents/evolve/fitness-latest-post.json 2>/dev/null || echo 0)

如果检测到回归（先前通过的目标现在失败）：

git revert HEAD --no-edit  # 单次提交
# 或者对于多次提交：
git revert --no-commit ${CYCLE_START_SHA}..HEAD && git commit -m "revert: evolve cycle ${CYCLE} regression"

将结果设置为 "regressed"。

回归门控后的队列最终确定：

成功： 最终确定任何已认领的队列项，设置 consumed: true、consumed_by 和 consumed_at；清除临时认领字段
失败/回归： 清除 claim_status、claimed_by 和 claimed_at；保持 consumed: false；在 session-state.json 中记录释放操作

在循环的 /post-mortem 完成后，在选择下一个项之前立即重新读取 .agents/rpi/next-work.jsonl。切勿假设循环前的队列状态。

步骤 6: 记录循环 + 提交

两条路径：有效循环会被提交，空闲循环仅本地记录。

有效循环（结果为 improved、regressed 或 harvested）：

# 质量模式：在写入 JSONL 条目之前计算质量分数
QUALITY_SCORE_ARGS=()
if [ "$QUALITY_MODE" = "true" ]; then
  REMAINING_HIGH=$(jq -r 'select(.consumed==false) | .items[] | select(.severity=="high")' \
    .agents/rpi/next-work.jsonl 2>/dev/null | wc -l | tr -d ' ')
  REMAINING_MEDIUM=$(jq -r 'select(.consumed==false) | .items[] | select(.severity=="medium")' \
    .agents/rpi/next-work.jsonl 2>/dev/null | wc -l | tr -d ' ')
  QUALITY_SCORE=$((100 - (REMAINING_HIGH * 10) - (REMAINING_MEDIUM * 3)))
  [ "$QUALITY_SCORE" -lt 0 ] && QUALITY_SCORE=0
  QUALITY_SCORE_ARGS=(--quality-score "$QUALITY_SCORE")
fi

ENTRY_JSON="$(
  bash scripts/evolve-log-cycle.sh \
    --cycle "$CYCLE" \
    --target "$TARGET" \
    --result "$OUTCOME" \
    --canonical-sha "$(git rev-parse --short HEAD)" \
    --cycle-start-sha "$CYCLE_START_SHA" \
    --goals-passing "$PASSING" \
    --goals-total "$TOTAL" \
    "${QUALITY_SCORE_ARGS[@]}"
)"
OUTCOME="$(printf '%s\n' "$ENTRY_JSON" | jq -r '.result')"
REAL_CHANGES=$(git diff --name-only "${CYCLE_START_SHA}..HEAD" -- ':!.agents/**' ':!GOALS.yaml' ':!GOALS.md' \
  2>/dev/null | wc -l | tr -d ' ')

# 遥测
bash scripts/log-telemetry.sh evolve cycle-complete cycle=${CYCLE} goal=${TARGET} outcome=${OUTCOME} 2>/dev/null || true

if [ "$OUTCOME" = "unchanged" ]; then
  # 无变化循环：保持仅本地记录，以便历史记录保持诚实且停滞逻辑能够看到它。
  :
elif [ "$REAL_CHANGES" -gt 0 ]; then
  # 完整提交：真实代码被更改
  git add .agents/evolve/cycle-history.jsonl
  git commit -m "evolve: cycle ${CYCLE} -- ${TARGET} ${OUTCOME}"
else
  # 有效循环，但非代理仓库的增量已由子技能提交：
  # 暂存分类账，但不创建独立的后续提交。
  git add .agents/evolve/cycle-history.jsonl
fi

PRODUCTIVE_THIS_SESSION=$((PRODUCTIVE_THIS_SESSION + 1))

空闲循环（即使在生成器层之后也什么都没找到）：

bash scripts/evolve-log-cycle.sh \
  --cycle "$CYCLE" \
  --target "idle" \
  --result "unchanged" >/dev/null
# 不执行 git add、git commit，不写入适应度快照

步骤 7: 循环或停止

while true; do
  # Step 1 .. Step 6
  # 如果触发终止开关、达到最大循环次数或真正的安全断路器，则停止
  # 否则递增循环编号并重新进入选择
  CYCLE=$((CYCLE + 1))
done

仅在有效工作累积时推送：

if [ $((PRODUCTIVE_THIS_SESSION % 5)) -eq 0 ] && [ "$PRODUCTIVE_THIS_SESSION" -gt 0 ]; then
  git push
fi

提交任何已暂存但未提交的 cycle-history.jsonl（来自仅包含工件的循环）：

if git diff --cached --name-only | grep -q cycle-history.jsonl; then

  git commit -m "evolve: session teardown -- artifact-only cycles logged"
fi

2. 运行 /post-mortem "evolve session: ${CYCLE} cycles" 以收集学习内容。 3. 仅当存在未推送的提交时才推送：

UNPUSHED=$(git log origin/main..HEAD --oneline 2>/dev/null | wc -l)
[ "$UNPUSHED" -gt 0 ] && git push

## /evolve 完成
循环次数: N | 有效: X | 回归: Y (已回滚) | 空闲: Z
停止原因: stagnation | circuit-breaker | max-cycles | kill-switch

在质量模式下，报告包含额外字段：

## /evolve 完成 (质量模式)
循环次数: N | 已解决的发现: X | 已修复的目标: Y | 空闲: Z
质量分数: 开始 → 结束 (变化量)
剩余未消耗项: H 个高严重性, M 个中等严重性
停止原因: stagnation | circuit-breaker | max-cycles | kill-switch

用户说： /evolve --max-cycles=5 会发生什么： Evolve 在每次 /rpi 循环后重新进入完整的选择阶梯，并在空队列上运行生产者层而不是空闲。

用户说： /evolve --beads-only 会发生什么： Evolve 跳过目标测量，处理 bd ready 待办事项。

用户说： /evolve --dry-run 会发生什么： Evolve 显示将要处理的工作但不执行。

用户说： /evolve --athena 会发生什么： Evolve 在会话开始时运行 ao mine + ao defrag，以在第一个 evolve 循环之前发现新的信号（孤立研究、代码热点、振荡目标）。在长时间自主运行之前或开发活动爆发之后使用。

用户说： /evolve 会发生什么： 参见 references/examples.md，了解一个经过实际操作的夜间流程，该流程依次处理 beads -> 已收集工作 -> 目标 -> 测试 -> 漏洞排查 -> 功能建议，然后才考虑休眠。

详细演练请参见 references/examples.md。

问题	解决方案
循环立即退出	移除 `~/.config/evolve/KILL` 或 `.agents/evolve/STOP`
重复空遍历后停滞	队列层和生产者层在多次遍历中都为空 — 休眠是后备结果
`ao goals measure` 挂起	使用 `--timeout 30` 标志或 `--beads-only` 跳过
回归门控回滚	检查回滚的更改，缩小范围，重新运行；已认领的队列项必须释放回可用状态

高级故障排除请参见 references/cycle-history.md。

references/cycle-history.md — JSONL 格式、恢复协议、终止开关
references/compounding.md — 知识飞轮和工作收集
references/goals-schema.md — GOALS.yaml 格式和持续指标
references/parallel-execution.md — 并行 /swarm 架构
references/teardown.md — 轨迹计算和会话摘要
references/examples.md — 详细使用示例
references/artifacts.md — 生成文件注册表
references/oscillation.md — 振荡检测和隔离
references/quality-mode.md — 质量优先模式：评分、优先级级联、工件

skills/rpi/SKILL.md — 完整生命周期编排器（每个循环调用）
skills/crank/SKILL.md — 史诗执行（针对 beads 史诗调用）
GOALS.yaml — 此仓库的适应度目标

🇺🇸English

/evolve — Goal-Driven Compounding Loop

Measure what's wrong. Fix the worst thing. Measure again. Compound.

Always-on autonomous loop over /rpi. Work selection order:

Harvested.agents/rpi/next-work.jsonl work (freshest concrete follow-up)
Open ready beads work (bd ready)
Failing goals and directive gaps (ao goals measure)
Testing improvements (missing/thin coverage, missing regression tests)
Validation tightening and bug-hunt passes (gates, audits, bug sweeps)
Complexity / TODO / FIXME / drift / dead code / stale docs / stale research mining
Concrete feature suggestions derived from repo purpose when no sharper work exists

Dormancy is last resort. Empty current queues mean "run the generator layers", not "stop". Only go dormant after the queue layers and generator layers come up empty across multiple consecutive passes.

/evolve                      # Run until kill switch, max-cycles, or real dormancy
/evolve --max-cycles=5       # Cap at 5 cycles
/evolve --dry-run            # Show what would be worked on, don't execute
/evolve --beads-only         # Skip goals measurement, work beads backlog only
/evolve --quality            # Quality-first mode: prioritize post-mortem findings
/evolve --quality --max-cycles=10  # Quality mode with cycle cap
/evolve --athena             # Mine → Defrag warmup before first cycle
/evolve --athena --max-cycles=5  # Warm knowledge base then run 5 cycles
/evolve --test-first         # Default strict-quality /rpi execution path
/evolve --no-test-first      # Explicit opt-out from test-first mode

Flags

Flag	Default	Description
`--max-cycles=N`	unlimited	Stop after `N` completed cycles
`--dry-run`	off	Show planned cycle actions without executing
`--beads-only`	off	Skip goal measurement and run backlog-only selection
`--skip-baseline`	off	Skip first-run baseline snapshot
`--quality`

Execution Steps

YOU MUST EXECUTE THIS WORKFLOW. Do not just describe it.

Step 0: Setup

mkdir -p .agents/evolve
ao lookup --query "autonomous improvement cycle" --limit 5 2>/dev/null || true

Before cycle recovery, load the repo execution profile contract when it exists. The repo execution profile is the source for repo policy; the user prompt should mostly supply mission/objective, not restate startup reads, validation bundle, tracker wrapper rules, or definition_of_done.

Locate docs/contracts/repo-execution-profile.md and docs/contracts/repo-execution-profile.schema.json.
Read the ordered startup_reads and bootstrap from those repo paths before selecting work.
Cache repo validation_commands, tracker_commands, and definition_of_done into session state.
If the repo execution profile is present but missing required fields, stop or downgrade with an explicit warning before cycle 1. Do not silently invent repo policy.

Recover cycle number, queue/generator streaks, and the last claimed work item from disk (survives context compaction):

if [ -f .agents/evolve/cycle-history.jsonl ]; then
  CYCLE=$(( $(tail -1 .agents/evolve/cycle-history.jsonl | jq -r '.cycle // 0') + 1 ))
else
  CYCLE=1
fi
SESSION_START_SHA=$(git rev-parse HEAD)

# Recover idle streak from disk (not in-memory — survives compaction)
# Portable: forward-scanning awk counts trailing idle run without tac (unavailable on stock macOS)
IDLE_STREAK=$(awk '/"result"\s*:\s*"(idle|unchanged)"/{streak++; next} {streak=0} END{print streak+0}' \
  .agents/evolve/cycle-history.jsonl 2>/dev/null)

PRODUCTIVE_THIS_SESSION=0

# Recover generator state and queue claim state
if [ -f .agents/evolve/session-state.json ]; then
  GENERATOR_EMPTY_STREAK=$(jq -r '.generator_empty_streak // 0' .agents/evolve/session-state.json 2>/dev/null || echo 0)
  LAST_SELECTED_SOURCE=$(jq -r '.last_selected_source // empty' .agents/evolve/session-state.json 2>/dev/null || true)
  CLAIMED_WORK_REF=$(jq -r '.claimed_work.ref // empty' .agents/evolve/session-state.json 2>/dev/null || true)
else
  GENERATOR_EMPTY_STREAK=0
  LAST_SELECTED_SOURCE=""
  CLAIMED_WORK_REF=""
fi

# Circuit breaker: stop if last productive cycle was >60 minutes ago
LAST_PRODUCTIVE_TS=$(grep -v '"idle"\|"unchanged"' .agents/evolve/cycle-history.jsonl 2>/dev/null \
  | tail -1 | jq -r '.timestamp // empty')
if [ -n "$LAST_PRODUCTIVE_TS" ]; then
  NOW_EPOCH=$(date +%s)
  LAST_EPOCH=$(date -j -f "%Y-%m-%dT%H:%M:%S%z" "$LAST_PRODUCTIVE_TS" +%s 2>/dev/null \
    || date -d "$LAST_PRODUCTIVE_TS" +%s 2>/dev/null || echo 0)
  if [ "$LAST_EPOCH" -gt 1000000000 ] && [ $((NOW_EPOCH - LAST_EPOCH)) -ge 3600 ]; then
    echo "CIRCUIT BREAKER: No productive work in 60+ minutes. Stopping."
    # go to Teardown
  fi
fi

# Track oscillating goals (improved→fail→improved→fail) to avoid burning cycles
declare -A QUARANTINED_GOALS  # goal_id → true if oscillation count >= 3

# Pre-populate quarantine list from cycle history (lightweight local scan)
if [ -f .agents/evolve/cycle-history.jsonl ]; then
  while IFS= read -r goal; do
    QUARANTINED_GOALS[$goal]=true
    echo "Quarantined oscillating goal: $goal"
  done < <(
    jq -r '.target' .agents/evolve/cycle-history.jsonl 2>/dev/null \
    | awk '{
        if (prev != "" && prev != $0) transitions[$0]++
        prev = $0
      }
      END {
        for (g in transitions) if (transitions[g] >= 3) print g
      }'
  )
fi

Parse flags: --max-cycles=N (default unlimited), --dry-run, --beads-only, --skip-baseline, --quality, --athena.

Track cycle-level execution state:

evolve_state = {
  cycle: <current cycle number>,
  mode: <standard|quality|beads-only>,
  test_first: <true by default; false only when --no-test-first>,
  repo_profile_path: <docs/contracts/repo-execution-profile.md or null>,
  startup_reads: <ordered repo bootstrap paths>,
  validation_commands: <ordered repo validation bundle>,
  tracker_commands: <repo tracker shell wrappers>,
  definition_of_done: <repo stop predicates>,
  generator_empty_streak: <consecutive passes where all generator layers returned nothing>,
  last_selected_source: <harvested|beads|goal|directive|testing|validation|bug-hunt|drift|feature>,
  claimed_work: <null or queue reference being worked>,
  queue_refresh_count: <incremented after every /rpi cycle>
}

Persist evolve_state to .agents/evolve/session-state.json at each cycle boundary, after queue claims, after queue release/finalize, and during teardown. cycle-history.jsonl remains the canonical cycle ledger; session-state.json carries resume-only state that has not yet earned a committed cycle entry.

Step 0.2: Athena Warmup (--athena only)

Skip if --athena was not passed or if --dry-run.

Run the mechanical half of the Athena cycle to surface fresh signal before the first evolve cycle:

mkdir -p .agents/mine .agents/defrag
echo "Athena warmup: mining signal..."
ao mine --since 26h --quiet 2>/dev/null || echo "(ao mine unavailable — skipping)"

echo "Athena warmup: defrag sweep..."
ao defrag --prune --dedup --quiet 2>/dev/null || echo "(ao defrag unavailable — skipping)"

Then read .agents/mine/latest.json and .agents/defrag/latest.json and note (in 1-2 sentences each):

Any orphaned research files that look relevant to current goals
Any code hotspots (high-CC functions with recent edits) that may be the root cause of failing goals
Any duplicate learnings merged by defrag — context on what's been cleaned up

These notes inform work selection throughout the evolve session. Store them in a session variable (in-memory), not a file.

Step 0.5: Baseline (first run only)

Skip if --skip-baseline or --beads-only or baseline already exists.

if [ ! -f .agents/evolve/fitness-0-baseline.json ]; then
  bash scripts/evolve-capture-baseline.sh \
    --label "era-$(date -u +%Y%m%dT%H%M%SZ)" \
    --timeout 60
fi

Step 1: Kill Switch Check

Run at the TOP of every cycle:

CYCLE_START_SHA=$(git rev-parse HEAD)
[ -f ~/.config/evolve/KILL ] && echo "KILL: $(cat ~/.config/evolve/KILL)" && exit 0
[ -f .agents/evolve/STOP ] && echo "STOP: $(cat .agents/evolve/STOP 2>/dev/null)" && exit 0

Step 2: Measure Fitness

Skip if --beads-only.

bash scripts/evolve-measure-fitness.sh \
  --output .agents/evolve/fitness-latest.json \
  --timeout 60 \
  --total-timeout 75

Do NOT write per-cyclefitness-{N}-pre.json files. The rolling file is sufficient for work selection and regression detection.

This writes a fitness snapshot to .agents/evolve/ atomically via a temp file plus JSON validation. The AgentOps CLI is required for fitness measurement because the wrapper shells out to ao goals measure. If measurement exceeds the whole-command bound or returns invalid JSON, the wrapper fails without clobbering the previous rolling snapshot.

Step 3: Select Work

Selection is a ladder, not a one-shot check. After every productive cycle, return to the TOP of this step and re-read the queue before considering dormancy.

Step 3.1: Harvested work first

Read .agents/rpi/next-work.jsonl and pick the highest-value unconsumed item for this repo. Prefer:

exact repo match before *, then legacy unscoped entries
already-harvested concrete implementation work before process work
higher severity before lower severity

When evolve picks a queue item, claim it first :

set claim_status: "in_progress"
set claimed_by: "evolve:cycle-N"
set claimed_at: "<timestamp>"
keep consumed: false until the /rpi cycle and regression gate both succeed

If the cycle fails, regresses, or is interrupted before success, release the claim and leave the item available for the next cycle.

Step 3.2: Open ready beads

If no harvested item is ready, check bd ready. Pick the highest-priority unblocked issue.

Step 3.3: Failing goals and directive gaps (skip if --beads-only)

First assess directives, then goals:

top-priority directive gap from ao goals measure --directives
highest-weight failing goals (skip quarantined oscillators)
lower-weight failing goals

This step exists even when all queued work is empty. Goals are the third source, not the stop condition.

DIRECTIVES=$(ao goals measure --directives 2>/dev/null)
FAILING=$(jq -r '.goals[] | select(.result=="fail") | .id' .agents/evolve/fitness-latest.json | head -1)

Oscillation check: Before working a failing goal, check if it has oscillated (improved→fail transitions ≥ 3 times in cycle-history.jsonl). If so, quarantine it and try the next failing goal. See references/oscillation.md.

# Count improved→fail transitions for this goal
OSC_COUNT=$(jq -r "select(.target==\"$FAILING\") | .result" .agents/evolve/cycle-history.jsonl \
  | awk 'prev=="improved" && $0=="fail" {count++} {prev=$0} END {print count+0}')
if [ "$OSC_COUNT" -ge 3 ]; then
  QUARANTINED_GOALS[$FAILING]=true
  echo "{\"cycle\":${CYCLE},\"target\":\"${FAILING}\",\"result\":\"quarantined\",\"oscillations\":${OSC_COUNT},\"timestamp\":\"$(date -Iseconds)\"}" >> .agents/evolve/cycle-history.jsonl
fi

Step 3.4: Testing improvements

When queues and goals are empty, generate concrete testing work instead of idling:

find packages/files with thin or missing tests
look for missing regression tests around recent bug-fix paths
identify flaky or absent headless/runtime smokes

Convert any real finding into durable work:

add a bead when the work needs tracked backlog ownership, or
append a queue item under the shared next-work contract when it should flow directly back into /rpi

Step 3.5: Validation tightening and bug-hunt passes

If testing improvement generation returns nothing, run bug-hunt and validation sweeps:

missing validation gates
weak lint/contract coverage
bug-hunt style audits for risky areas
stale assumptions between docs, contracts, and runtime truth

Again: convert findings into beads or queue items, then immediately select the highest-priority result and continue.

Step 3.6: Drift / hotspot / dead-code mining

If the prior generators are empty, mine for:

complexity hotspots
stale TODO/FIXME markers
dead code
stale docs
stale research
drift between generated artifacts and source-of-truth files

Do not stop here. Normalize findings into tracked work and continue.

Step 3.7: Feature suggestions

If all concrete remediation layers are empty, propose one or more specific feature ideas grounded in the repo purpose, write them as durable work, and continue:

create a bead when the feature needs review/backlog treatment
or append a queue item with source: "feature-suggestion" when it is ready for the next /rpi cycle

Quality mode (--quality) — inverted cascade (findings before directives):

Step 3.0q: Unconsumed high-severity post-mortem findings:

HIGH=$(jq -r 'select(.consumed==false) | .items[] | select(.severity=="high") | .title' \
  .agents/rpi/next-work.jsonl 2>/dev/null | head -1)

Step 3.1q: Unconsumed medium-severity findings.

Step 3.2q: Open ready beads.

Step 3.3q: Emergency gates (weight >= 5) and top directive gaps.

Step 3.4q: Testing improvements.

Step 3.5q: Validation tightening / bug-hunt / drift mining.

Step 3.6q: Feature suggestions.

This inverts the standard cascade only at the top of the ladder: findings BEFORE goals and directives. It does NOT skip the generator layers.

When evolve picks a finding, claim it first in next-work.jsonl:

Set claim_status: "in_progress", claimed_by: "evolve-quality:cycle-N", claimed_at: "<timestamp>"
Set consumed: true only after the /rpi cycle and regression gate succeed
If the /rpi cycle fails (regression), clear the claim and leave consumed: false

See references/quality-mode.md for scoring and full details.

Nothing found? HARD GATE — only consider dormancy after the generator layers also came up empty:

# Count trailing idle/unchanged entries in cycle-history.jsonl (portable, no tac)
IDLE_STREAK=$(awk '/"result"\s*:\s*"(idle|unchanged)"/{streak++; next} {streak=0} END{print streak+0}' \
  .agents/evolve/cycle-history.jsonl 2>/dev/null)

if [ "$GENERATOR_EMPTY_STREAK" -ge 2 ] && [ "$IDLE_STREAK" -ge 2 ]; then
  # Queue layers are empty AND producer layers were empty for the 3rd consecutive pass — STOP
  echo "Stagnation reached after repeated empty queue + generator passes. Dormancy is the last-resort outcome."
  # go to Teardown — do NOT log another idle entry
fi

If the queue layers were empty but a generator pass has not been exhausted 3 times yet, persist the new generator streak in session-state.json and loop back to Step 1. Empty pre-cycle queues are not a stop reason by themselves.

A cycle is idle only if NO work source returned actionable work and every generator layer also came up empty. A cycle that targeted an oscillating goal and skipped it counts as idle only after the remaining ladder was exhausted.

If --dry-run: report what would be worked on and go to Teardown.

Step 4: Execute

Primary engine: use /rpi for any implementation-quality work. /implement and /crank are allowed only when a bead already contains execution-ready scope and skipping discovery is clearly the better path.

For a harvested item, failing goal, directive gap, testing improvement, validation tightening task, bug-hunt result, drift finding, or feature suggestion :

Invoke /rpi "{normalized work title}" --auto --max-cycles=1

For a beads issue :

Prefer: /rpi "Land {issue_id}: {title}" --auto --max-cycles=1
Fallback: /implement {issue_id}

Or for an epic with children: Invoke /crank {epic_id}.

If Step 3 created durable work instead of executing it immediately, re-enter Step 3 and let the newly-created queue/bead item win through the normal selection order.

Step 5: Regression Gate

After execution, verify nothing broke:

# Detect and run project build+test
if [ -f Makefile ]; then make test
elif [ -f package.json ]; then npm test
elif [ -f go.mod ]; then go build ./... && go vet ./... && go test ./... -count=1 -timeout 120s
elif [ -f Cargo.toml ]; then cargo build && cargo test
elif [ -f pyproject.toml ] || [ -f setup.py ]; then python -m pytest
else echo "No recognized build system found"; fi

# Cross-cutting constraint check (catches wiring regressions)
if [ -f scripts/check-wiring-closure.sh ]; then
  bash scripts/check-wiring-closure.sh
else
  echo "WARNING: scripts/check-wiring-closure.sh not found — skipping wiring check"
fi

If not --beads-only, also re-measure to produce a post-cycle snapshot:

bash scripts/evolve-measure-fitness.sh \
  --output .agents/evolve/fitness-latest-post.json \
  --timeout 60 \
  --total-timeout 75 \
  --goal "$GOAL_ID"

# Extract goal counts for cycle history entry
PASSING=$(jq '[.goals[] | select(.result=="pass")] | length' .agents/evolve/fitness-latest-post.json 2>/dev/null || echo 0)
TOTAL=$(jq '.goals | length' .agents/evolve/fitness-latest-post.json 2>/dev/null || echo 0)

If regression detected (previously-passing goal now fails):

git revert HEAD --no-edit  # single commit
# or for multiple commits:
git revert --no-commit ${CYCLE_START_SHA}..HEAD && git commit -m "revert: evolve cycle ${CYCLE} regression"

Set outcome to "regressed".

Queue finalization after the regression gate:

success: finalize any claimed queue item with consumed: true, consumed_by, and consumed_at; clear transient claim fields
failure/regression: clear claim_status, claimed_by, and claimed_at; keep consumed: false; record the release in session-state.json

After the cycle's /post-mortem finishes, immediately re-read .agents/rpi/next-work.jsonl before selecting the next item. Never assume the queue state from before the cycle.

Step 6: Log Cycle + Commit

Two paths: productive cycles get committed, idle cycles are local-only.

PRODUCTIVE cycles (result is improved, regressed, or harvested):

# Quality mode: compute quality_score BEFORE writing the JSONL entry
QUALITY_SCORE_ARGS=()
if [ "$QUALITY_MODE" = "true" ]; then
  REMAINING_HIGH=$(jq -r 'select(.consumed==false) | .items[] | select(.severity=="high")' \
    .agents/rpi/next-work.jsonl 2>/dev/null | wc -l | tr -d ' ')
  REMAINING_MEDIUM=$(jq -r 'select(.consumed==false) | .items[] | select(.severity=="medium")' \
    .agents/rpi/next-work.jsonl 2>/dev/null | wc -l | tr -d ' ')
  QUALITY_SCORE=$((100 - (REMAINING_HIGH * 10) - (REMAINING_MEDIUM * 3)))
  [ "$QUALITY_SCORE" -lt 0 ] && QUALITY_SCORE=0
  QUALITY_SCORE_ARGS=(--quality-score "$QUALITY_SCORE")
fi

ENTRY_JSON="$(
  bash scripts/evolve-log-cycle.sh \
    --cycle "$CYCLE" \
    --target "$TARGET" \
    --result "$OUTCOME" \
    --canonical-sha "$(git rev-parse --short HEAD)" \
    --cycle-start-sha "$CYCLE_START_SHA" \
    --goals-passing "$PASSING" \
    --goals-total "$TOTAL" \
    "${QUALITY_SCORE_ARGS[@]}"
)"
OUTCOME="$(printf '%s\n' "$ENTRY_JSON" | jq -r '.result')"
REAL_CHANGES=$(git diff --name-only "${CYCLE_START_SHA}..HEAD" -- ':!.agents/**' ':!GOALS.yaml' ':!GOALS.md' \
  2>/dev/null | wc -l | tr -d ' ')

# Telemetry
bash scripts/log-telemetry.sh evolve cycle-complete cycle=${CYCLE} goal=${TARGET} outcome=${OUTCOME} 2>/dev/null || true

if [ "$OUTCOME" = "unchanged" ]; then
  # No-delta cycle: leave local-only so history stays honest and stagnation logic can see it.
  :
elif [ "$REAL_CHANGES" -gt 0 ]; then
  # Full commit: real code was changed
  git add .agents/evolve/cycle-history.jsonl
  git commit -m "evolve: cycle ${CYCLE} -- ${TARGET} ${OUTCOME}"
else
  # Productive cycle with non-agent repo delta already committed by a sub-skill:
  # stage the ledger but do not create a standalone follow-up commit.
  git add .agents/evolve/cycle-history.jsonl
fi

PRODUCTIVE_THIS_SESSION=$((PRODUCTIVE_THIS_SESSION + 1))

IDLE cycles (nothing found even after generator layers):

bash scripts/evolve-log-cycle.sh \
  --cycle "$CYCLE" \
  --target "idle" \
  --result "unchanged" >/dev/null
# No git add, no git commit, no fitness snapshot write

Step 7: Loop or Stop

while true; do
  # Step 1 .. Step 6
  # Stop if kill switch, max-cycles, or a real safety breaker triggers
  # Otherwise increment cycle and re-enter selection
  CYCLE=$((CYCLE + 1))
done

Push only when productive work has accumulated:

if [ $((PRODUCTIVE_THIS_SESSION % 5)) -eq 0 ] && [ "$PRODUCTIVE_THIS_SESSION" -gt 0 ]; then
  git push
fi

Teardown

Commit any staged but uncommitted cycle-history.jsonl (from artifact-only cycles):

if git diff --cached --name-only | grep -q cycle-history.jsonl; then

  git commit -m "evolve: session teardown -- artifact-only cycles logged"
fi

2. Run /post-mortem "evolve session: ${CYCLE} cycles" to harvest learnings. 3. Push only if unpushed commits exist:

UNPUSHED=$(git log origin/main..HEAD --oneline 2>/dev/null | wc -l)
[ "$UNPUSHED" -gt 0 ] && git push

4. Report summary:

## /evolve Complete
Cycles: N | Productive: X | Regressed: Y (reverted) | Idle: Z
Stop reason: stagnation | circuit-breaker | max-cycles | kill-switch

In quality mode, the report includes additional fields:

## /evolve Complete (quality mode)
Cycles: N | Findings resolved: X | Goals fixed: Y | Idle: Z
Quality score: start → end (delta)
Remaining unconsumed: H high, M medium
Stop reason: stagnation | circuit-breaker | max-cycles | kill-switch

Examples

User says: /evolve --max-cycles=5 What happens: Evolve re-enters the full selection ladder after every /rpi cycle and runs producer layers instead of idling on empty queues.

User says: /evolve --beads-only What happens: Evolve skips goals measurement and works through bd ready backlog.

User says: /evolve --dry-run What happens: Evolve shows what would be worked on without executing.

User says: /evolve --athena What happens: Evolve runs ao mine + ao defrag at session start to surface fresh signal (orphaned research, code hotspots, oscillating goals) before the first evolve cycle. Use before a long autonomous run or after a burst of development activity.

User says: /evolve What happens: See references/examples.md for a worked overnight flow that moves through beads -> harvested work -> goals -> testing -> bug hunt -> feature suggestion before dormancy is considered.

See references/examples.md for detailed walkthroughs.

Troubleshooting

Problem	Solution
Loop exits immediately	Remove `~/.config/evolve/KILL` or `.agents/evolve/STOP`
Stagnation after repeated empty passes	Queue layers and producer layers were empty across multiple passes — dormancy is the fallback outcome
`ao goals measure` hangs	Use `--timeout 30` flag or `--beads-only` to skip
Regression gate reverts	Review reverted changes, narrow scope, re-run; claimed queue items must be released back to available state

See references/cycle-history.md for advanced troubleshooting.

References

references/cycle-history.md — JSONL format, recovery protocol, kill switch
references/compounding.md — Knowledge flywheel and work harvesting
references/goals-schema.md — GOALS.yaml format and continuous metrics
references/parallel-execution.md — Parallel /swarm architecture
references/teardown.md — Trajectory computation and session summary
references/examples.md — Detailed usage examples
references/artifacts.md — Generated files registry
references/oscillation.md — Oscillation detection and quarantine
references/quality-mode.md — Quality-first mode: scoring, priority cascade, artifacts

Reference Documents

Weekly Installs

220

Repository

boshu2/agentops

GitHub Stars

197

First Seen

Feb 12, 2026

Security Audits

Gen Agent Trust HubPass SocketPass SnykPass

Installed on

opencode217

codex214

gemini-cli213

github-copilot211

cursor210

kimi-cli207