npx skills add https://github.com/boshu2/agentops --skill evolve
Measure what's wrong. Fix the worst thing. Measure again. Compound.
Always-on autonomous loop over /rpi. Work selection order:
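1. Harvested .agents/rpi/next-work.jsonl work (freshest concrete follow-up)
2. Open ready beads (bd ready)
3. Failing goals and directive gaps (ao goals measure)
4. Testing improvements
5. Validation tightening and bug-hunt passes
6. Drift / hotspot / dead-code mining
7. Feature suggestions

Dormancy is last resort. Empty current queues mean "run the generator layers", not "stop". Only go dormant after the queue layers and generator layers come up empty across multiple consecutive passes.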
/evolve # Run until kill switch, max-cycles, or real dormancy
/evolve --max-cycles=5 # Cap at 5 cycles
/evolve --dry-run # Show what would be worked on, don't execute
/evolve --beads-only # Skip goals measurement, work beads backlog only
/evolve --quality # Quality-first mode: prioritize post-mortem findings
/evolve --quality --max-cycles=10 # Quality mode with cycle cap
/evolve --athena # Mine → Defrag warmup before first cycle
/evolve --athena --max-cycles=5 # Warm knowledge base then run 5 cycles
/evolve --test-first # Default strict-quality /rpi execution path
/evolve --no-test-first # Explicit opt-out from test-first mode
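| Flag | Default | Description |
|---|---|---|
| --max-cycles=N | unlimited | Stop after N completed cycles |
| --dry-run | off | Show planned cycle actions without executing |
| --beads-only | off | Skip goal measurement and run backlog-only selection |
| --skip-baseline | off | Skip first-run baseline snapshot |
| --quality | off | Prioritize harvested post-mortem findings |
| --athena | off | Run ao mine + ao defrag warmup before cycle 1 |
| --test-first | on | Pass strict-quality defaults through to /rpi |
| --no-test-first | off | Explicitly disable test-first passthrough to /rpi |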
YOU MUST EXECUTE THIS WORKFLOW. Do not just describe it.
mkdir -p .agents/evolve
ao lookup --query "autonomous improvement cycle" --limit 5 2>/dev/null || true
Before cycle recovery, load the repo execution profile contract when it exists. The repo execution profile is the source for repo policy; the user prompt should mostly supply mission/objective, not restate startup reads, validation bundle, tracker wrapper rules, or definition_of_done.
1. Read docs/contracts/repo-execution-profile.md and docs/contracts/repo-execution-profile.schema.json.
2. Load startup_reads and bootstrap from those repo paths before selecting work.
3. Cache validation_commands, tracker_commands, and definition_of_done into session state.

Recover cycle number, queue/generator streaks, and the last claimed work item from disk (survives context compaction):
if [ -f .agents/evolve/cycle-history.jsonl ]; then
CYCLE=$(( $(tail -1 .agents/evolve/cycle-history.jsonl | jq -r '.cycle // 0') + 1 ))
else
CYCLE=1
fi
SESSION_START_SHA=$(git rev-parse HEAD)
# Recover idle streak from disk (not in-memory — survives compaction)
# Portable: forward-scanning awk counts trailing idle run without tac (unavailable on stock macOS).
# Uses [[:space:]] rather than \s, which is a GNU awk extension.
IDLE_STREAK=$(awk '/"result"[[:space:]]*:[[:space:]]*"(idle|unchanged)"/{streak++; next} {streak=0} END{print streak+0}' \
.agents/evolve/cycle-history.jsonl 2>/dev/null)
PRODUCTIVE_THIS_SESSION=0
# Recover generator state and queue claim state
if [ -f .agents/evolve/session-state.json ]; then
GENERATOR_EMPTY_STREAK=$(jq -r '.generator_empty_streak // 0' .agents/evolve/session-state.json 2>/dev/null || echo 0)
LAST_SELECTED_SOURCE=$(jq -r '.last_selected_source // empty' .agents/evolve/session-state.json 2>/dev/null || true)
CLAIMED_WORK_REF=$(jq -r '.claimed_work.ref // empty' .agents/evolve/session-state.json 2>/dev/null || true)
else
GENERATOR_EMPTY_STREAK=0
LAST_SELECTED_SOURCE=""
CLAIMED_WORK_REF=""
fi
# Circuit breaker: stop if last productive cycle was >60 minutes ago
LAST_PRODUCTIVE_TS=$(grep -v '"idle"\|"unchanged"' .agents/evolve/cycle-history.jsonl 2>/dev/null \
| tail -1 | jq -r '.timestamp // empty')
if [ -n "$LAST_PRODUCTIVE_TS" ]; then
NOW_EPOCH=$(date +%s)
LAST_EPOCH=$(date -j -f "%Y-%m-%dT%H:%M:%S%z" "$LAST_PRODUCTIVE_TS" +%s 2>/dev/null \
|| date -d "$LAST_PRODUCTIVE_TS" +%s 2>/dev/null || echo 0)
if [ "$LAST_EPOCH" -gt 1000000000 ] && [ $((NOW_EPOCH - LAST_EPOCH)) -ge 3600 ]; then
echo "CIRCUIT BREAKER: No productive work in 60+ minutes. Stopping."
# go to Teardown
fi
fi
# Track oscillating goals (improved→fail→improved→fail) to avoid burning cycles
# Note: associative arrays need bash 4+; stock macOS ships bash 3.2
declare -A QUARANTINED_GOALS # goal_id → true if oscillation count >= 3
# Pre-populate quarantine list from cycle history (lightweight local scan)
if [ -f .agents/evolve/cycle-history.jsonl ]; then
while IFS= read -r goal; do
QUARANTINED_GOALS[$goal]=true
echo "Quarantined oscillating goal: $goal"
done < <(
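jq -r '[.target, .result] | @tsv' .agents/evolve/cycle-history.jsonl 2>/dev/null \
| awk -F'\t' '{
# Count improved→fail transitions per target, matching the oscillation check used in work selection
if (last[$1] == "improved" && $2 == "fail") osc[$1]++
last[$1] = $2
}
END {
for (g in osc) if (osc[g] >= 3) print g
}'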
)
fi
Parse flags: --max-cycles=N (default unlimited), --dry-run, --beads-only, --skip-baseline, --quality, --athena, --test-first (default on), --no-test-first.
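A minimal sketch of how the flags listed above could be parsed in bash. The function name `parse_evolve_flags` and most variable names are illustrative assumptions, not part of the skill contract; only `QUALITY_MODE` is a name the later logging snippet actually reads.

```shell
#!/usr/bin/env bash
# Hypothetical helper: parse /evolve flags into shell variables.
parse_evolve_flags() {
  MAX_CYCLES=""          # empty means unlimited
  DRY_RUN=false
  BEADS_ONLY=false
  SKIP_BASELINE=false
  QUALITY_MODE=false
  ATHENA=false
  TEST_FIRST=true        # on by default; only --no-test-first turns it off
  local arg
  for arg in "$@"; do
    case "$arg" in
      --max-cycles=*)  MAX_CYCLES="${arg#--max-cycles=}" ;;
      --dry-run)       DRY_RUN=true ;;
      --beads-only)    BEADS_ONLY=true ;;
      --skip-baseline) SKIP_BASELINE=true ;;
      --quality)       QUALITY_MODE=true ;;
      --athena)        ATHENA=true ;;
      --test-first)    TEST_FIRST=true ;;
      --no-test-first) TEST_FIRST=false ;;
      *) echo "unknown flag: $arg" >&2; return 2 ;;
    esac
  done
  # Reject a non-numeric cycle cap before the loop starts.
  if [ -n "$MAX_CYCLES" ]; then
    case "$MAX_CYCLES" in
      *[!0-9]*) echo "invalid --max-cycles value: $MAX_CYCLES" >&2; return 2 ;;
    esac
  fi
  return 0
}
```

Calling `parse_evolve_flags --quality --max-cycles=10` would leave `QUALITY_MODE=true` and `MAX_CYCLES=10`, with every other option at its default.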
Track cycle-level execution state:
evolve_state = {
cycle: <current cycle number>,
mode: <standard|quality|beads-only>,
test_first: <true by default; false only when --no-test-first>,
repo_profile_path: <docs/contracts/repo-execution-profile.md or null>,
startup_reads: <ordered repo bootstrap paths>,
validation_commands: <ordered repo validation bundle>,
tracker_commands: <repo tracker shell wrappers>,
definition_of_done: <repo stop predicates>,
generator_empty_streak: <consecutive passes where all generator layers returned nothing>,
last_selected_source: <harvested|beads|goal|directive|testing|validation|bug-hunt|drift|feature>,
claimed_work: <null or queue reference being worked>,
queue_refresh_count: <incremented after every /rpi cycle>
}
Persist evolve_state to .agents/evolve/session-state.json at each cycle boundary, after queue claims, after queue release/finalize, and during teardown. cycle-history.jsonl remains the canonical cycle ledger; session-state.json carries resume-only state that has not yet earned a committed cycle entry.
Skip if --athena was not passed or if --dry-run.
Run the mechanical half of the Athena cycle to surface fresh signal before the first evolve cycle:
mkdir -p .agents/mine .agents/defrag
echo "Athena warmup: mining signal..."
ao mine --since 26h --quiet 2>/dev/null || echo "(ao mine unavailable — skipping)"
echo "Athena warmup: defrag sweep..."
ao defrag --prune --dedup --quiet 2>/dev/null || echo "(ao defrag unavailable — skipping)"
Then read .agents/mine/latest.json and .agents/defrag/latest.json and note (in 1-2 sentences each):
These notes inform work selection throughout the evolve session. Store them in a session variable (in-memory), not a file.
Skip if --skip-baseline or --beads-only or baseline already exists.
if [ ! -f .agents/evolve/fitness-0-baseline.json ]; then
bash scripts/evolve-capture-baseline.sh \
--label "era-$(date -u +%Y%m%dT%H%M%SZ)" \
--timeout 60
fi
Run at the TOP of every cycle:
CYCLE_START_SHA=$(git rev-parse HEAD)
[ -f ~/.config/evolve/KILL ] && echo "KILL: $(cat ~/.config/evolve/KILL)" && exit 0
[ -f .agents/evolve/STOP ] && echo "STOP: $(cat .agents/evolve/STOP 2>/dev/null)" && exit 0
Skip if --beads-only.
bash scripts/evolve-measure-fitness.sh \
--output .agents/evolve/fitness-latest.json \
--timeout 60 \
--total-timeout 75
Do NOT write per-cycle fitness-{N}-pre.json files. The rolling file is sufficient for work selection and regression detection.
This writes a fitness snapshot to .agents/evolve/ atomically via a temp file plus JSON validation. The AgentOps CLI is required for fitness measurement because the wrapper shells out to ao goals measure. If measurement exceeds the whole-command bound or returns invalid JSON, the wrapper fails without clobbering the previous rolling snapshot.
Selection is a ladder, not a one-shot check. After every productive cycle, return to the TOP of this step and re-read the queue before considering dormancy.
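The ladder idea can be sketched as a loop that tries each layer in priority order and stops at the first one that yields work. This is only an illustration: `select_work` and the layer function names are hypothetical, and each layer is assumed to print a single work title (or nothing) to stdout.

```shell
#!/usr/bin/env bash
# Hypothetical ladder walker. Each argument is the name of a layer function
# that prints one work title on stdout, or prints nothing when it is empty.
select_work() {
  local layer item
  for layer in "$@"; do
    item="$("$layer")"
    if [ -n "$item" ]; then
      # Report which layer produced the work, then the work itself.
      printf '%s\t%s\n' "$layer" "$item"
      return 0
    fi
  done
  return 1   # every layer was empty: this pass is a candidate idle pass
}
```

A caller would pass the layers in ladder order, for example `select_work layer_harvested layer_beads layer_goals layer_testing`, and re-run the same call after every productive cycle so fresh queue items can win.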
Step 3.1: Harvested work first
Read .agents/rpi/next-work.jsonl and pick the highest-value unconsumed item for this repo. Prefer:
*, then legacy unscoped entries

When evolve picks a queue item, claim it first:
- Set claim_status: "in_progress"
- Set claimed_by: "evolve:cycle-N"
- Set claimed_at: "<timestamp>"
- Keep consumed: false until the /rpi cycle and regression gate both succeed

If the cycle fails, regresses, or is interrupted before success, release the claim and leave the item available for the next cycle.
Step 3.2: Open ready beads
If no harvested item is ready, check bd ready. Pick the highest-priority unblocked issue.
Step 3.3: Failing goals and directive gaps (skip if --beads-only)
First assess directives, then goals:
- Top-priority directive gap from ao goals measure --directives

This step exists even when all queued work is empty. Goals are the third source, not the stop condition.
DIRECTIVES=$(ao goals measure --directives 2>/dev/null)
FAILING=$(jq -r '.goals[] | select(.result=="fail") | .id' .agents/evolve/fitness-latest.json | head -1)
Oscillation check: Before working a failing goal, check if it has oscillated (improved→fail transitions ≥ 3 times in cycle-history.jsonl). If so, quarantine it and try the next failing goal. See references/oscillation.md.
# Count improved→fail transitions for this goal
OSC_COUNT=$(jq -r --arg goal "$FAILING" 'select(.target==$goal) | .result' .agents/evolve/cycle-history.jsonl \
| awk 'prev=="improved" && $0=="fail" {count++} {prev=$0} END {print count+0}')
if [ "$OSC_COUNT" -ge 3 ]; then
QUARANTINED_GOALS[$FAILING]=true
echo "{\"cycle\":${CYCLE},\"target\":\"${FAILING}\",\"result\":\"quarantined\",\"oscillations\":${OSC_COUNT},\"timestamp\":\"$(date -Iseconds)\"}" >> .agents/evolve/cycle-history.jsonl
fi
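The transition count used by the oscillation check can be isolated into a small filter, which makes the threshold easy to try on sample data. A sketch, assuming one result per line with the oldest entry first; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Reads one cycle result per line on stdin (oldest first) and prints the
# number of improved -> fail transitions.
count_oscillations() {
  awk 'prev == "improved" && $0 == "fail" { count++ }
       { prev = $0 }
       END { print count + 0 }'
}

# Hypothetical wrapper: succeeds when the count reaches the threshold
# (default 3, the value the oscillation check above uses).
is_oscillating() {
  local count
  count="$(count_oscillations)"
  [ "$count" -ge "${1:-3}" ]
}
```

For example, `printf 'improved\nfail\nimproved\nfail\n' | count_oscillations` prints 2, which is still below the quarantine threshold.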
Step 3.4: Testing improvements
When queues and goals are empty, generate concrete testing work instead of idling:
Convert any real finding into durable work:
- Append a queue item under the shared next-work contract when the work is ready for /rpi

Step 3.5: Validation tightening and bug-hunt passes
If testing improvement generation returns nothing, run bug-hunt and validation sweeps:
Again: convert findings into beads or queue items, then immediately select the highest-priority result and continue.
Step 3.6: Drift / hotspot / dead-code mining
If the prior generators are empty, mine for:
Do not stop here. Normalize findings into tracked work and continue.
Step 3.7: Feature suggestions
If all concrete remediation layers are empty, propose one or more specific feature ideas grounded in the repo purpose, write them as durable work, and continue:
- Append a queue item with source: "feature-suggestion" when it is ready for the next /rpi cycle

Quality mode (--quality), inverted cascade (findings before directives):
Step 3.0q: Unconsumed high-severity post-mortem findings:
HIGH=$(jq -r 'select(.consumed==false) | .items[] | select(.severity=="high") | .title' \
.agents/rpi/next-work.jsonl 2>/dev/null | head -1)
Step 3.1q: Unconsumed medium-severity findings.
Step 3.2q: Open ready beads.
Step 3.3q: Emergency gates (weight >= 5) and top directive gaps.
Step 3.4q: Testing improvements.
Step 3.5q: Validation tightening / bug-hunt / drift mining.
Step 3.6q: Feature suggestions.
This inverts the standard cascade only at the top of the ladder: findings BEFORE goals and directives. It does NOT skip the generator layers.
When evolve picks a finding, claim it first in next-work.jsonl:
- Set claim_status: "in_progress", claimed_by: "evolve-quality:cycle-N", claimed_at: "<timestamp>"
- Set consumed: true only after the /rpi cycle and regression gate succeed
- If the /rpi cycle fails (regression), clear the claim and keep consumed: false

See references/quality-mode.md for scoring and full details.
Nothing found? HARD GATE — only consider dormancy after the generator layers also came up empty:
# Count trailing idle/unchanged entries in cycle-history.jsonl (portable, no tac)
IDLE_STREAK=$(awk '/"result"[[:space:]]*:[[:space:]]*"(idle|unchanged)"/{streak++; next} {streak=0} END{print streak+0}' \
.agents/evolve/cycle-history.jsonl 2>/dev/null)
if [ "$GENERATOR_EMPTY_STREAK" -ge 2 ] && [ "$IDLE_STREAK" -ge 2 ]; then
# Queue layers are empty AND producer layers were empty for the 3rd consecutive pass — STOP
echo "Stagnation reached after repeated empty queue + generator passes. Dormancy is the last-resort outcome."
# go to Teardown — do NOT log another idle entry
fi
If the queue layers were empty but a generator pass has not been exhausted 3 times yet, persist the new generator streak in session-state.json and loop back to Step 1. Empty pre-cycle queues are not a stop reason by themselves.
A cycle is idle only if NO work source returned actionable work and every generator layer also came up empty. A cycle that targeted an oscillating goal and skipped it counts as idle only after the remaining ladder was exhausted.
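To make the dormancy gate concrete, here is a sketch that wraps the trailing-idle count and the two-streak check in functions. The function names are hypothetical, and the thresholds simply mirror the values in the gate above.

```shell
#!/usr/bin/env bash
# Prints the number of consecutive idle/unchanged entries at the END of a
# JSONL history file; prints 0 when the file is missing.
trailing_idle_streak() {
  local history="$1"
  [ -f "$history" ] || { echo 0; return 0; }
  awk '
    /"result"[[:space:]]*:[[:space:]]*"(idle|unchanged)"/ { streak++; next }
    { streak = 0 }            # any productive entry resets the run
    END { print streak + 0 }
  ' "$history"
}

# Succeeds only when BOTH streaks have reached 2, i.e. the queue layers and
# the generator layers have been empty on repeated passes.
should_go_dormant() {
  local generator_empty_streak="$1" idle_streak="$2"
  [ "$generator_empty_streak" -ge 2 ] && [ "$idle_streak" -ge 2 ]
}
```

A caller might write `should_go_dormant "$GENERATOR_EMPTY_STREAK" "$(trailing_idle_streak .agents/evolve/cycle-history.jsonl)"` and go to Teardown only when it succeeds.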
If --dry-run: report what would be worked on and go to Teardown.
Primary engine: use /rpi for any implementation-quality work. /implement and /crank are allowed only when a bead already contains execution-ready scope and skipping discovery is clearly the better path.
For a harvested item, failing goal, directive gap, testing improvement, validation tightening task, bug-hunt result, drift finding, or feature suggestion:
Invoke /rpi "{normalized work title}" --auto --max-cycles=1
For a beads issue:
Prefer: /rpi "Land {issue_id}: {title}" --auto --max-cycles=1
Fallback: /implement {issue_id}
Or for an epic with children: Invoke /crank {epic_id}.
If Step 3 created durable work instead of executing it immediately, re-enter Step 3 and let the newly-created queue/bead item win through the normal selection order.
After execution, verify nothing broke:
# Detect and run project build+test
if [ -f Makefile ]; then make test
elif [ -f package.json ]; then npm test
elif [ -f go.mod ]; then go build ./... && go vet ./... && go test ./... -count=1 -timeout 120s
elif [ -f Cargo.toml ]; then cargo build && cargo test
elif [ -f pyproject.toml ] || [ -f setup.py ]; then python -m pytest
else echo "No recognized build system found"; fi
# Cross-cutting constraint check (catches wiring regressions)
if [ -f scripts/check-wiring-closure.sh ]; then
bash scripts/check-wiring-closure.sh
else
echo "WARNING: scripts/check-wiring-closure.sh not found — skipping wiring check"
fi
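One possible refinement of the detection chain above is to separate "which command applies" from "run it", so the gate can tell a failing test suite apart from a repo with no recognized build system. A sketch under that assumption; `detect_test_command` is a hypothetical name and the commands are copied from the chain above.

```shell
#!/usr/bin/env bash
# Prints the build+test command for the project in the given directory
# (default: current directory). Returns 1 when no build system is recognized.
detect_test_command() {
  local dir="${1:-.}"
  if   [ -f "$dir/Makefile" ];     then echo "make test"
  elif [ -f "$dir/package.json" ]; then echo "npm test"
  elif [ -f "$dir/go.mod" ];       then echo "go build ./... && go vet ./... && go test ./... -count=1 -timeout 120s"
  elif [ -f "$dir/Cargo.toml" ];   then echo "cargo build && cargo test"
  elif [ -f "$dir/pyproject.toml" ] || [ -f "$dir/setup.py" ]; then echo "python -m pytest"
  else
    return 1
  fi
}
```

Usage could look like `cmd="$(detect_test_command .)" && bash -c "$cmd"`, with the caller deciding how to treat the no-build-system case. The precedence is the same as in the original chain, so a repo with both a Makefile and a package.json still runs `make test`.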
If not --beads-only, also re-measure to produce a post-cycle snapshot:
bash scripts/evolve-measure-fitness.sh \
--output .agents/evolve/fitness-latest-post.json \
--timeout 60 \
--total-timeout 75 \
--goal "$GOAL_ID"
# Extract goal counts for cycle history entry
PASSING=$(jq '[.goals[] | select(.result=="pass")] | length' .agents/evolve/fitness-latest-post.json 2>/dev/null || echo 0)
TOTAL=$(jq '.goals | length' .agents/evolve/fitness-latest-post.json 2>/dev/null || echo 0)
If regression detected (previously-passing goal now fails):
git revert HEAD --no-edit # single commit
# or for multiple commits:
git revert --no-commit ${CYCLE_START_SHA}..HEAD && git commit -m "revert: evolve cycle ${CYCLE} regression"
Set outcome to "regressed".
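The regression condition (a previously-passing goal now fails) amounts to a set difference between the goals that passed before the cycle and those that pass after it. A minimal sketch, assuming the passing goal IDs from each snapshot have already been extracted to plain text files, one ID per line; the function name and the file layout are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Prints goal IDs that appear in the pre-cycle passing list but not in the
# post-cycle passing list. Empty output means no regression.
regressed_goals() {
  local pre="$1" post="$2" pre_sorted post_sorted
  pre_sorted="$(mktemp)"
  post_sorted="$(mktemp)"
  # comm requires sorted input; -u also drops accidental duplicates.
  sort -u "$pre"  > "$pre_sorted"
  sort -u "$post" > "$post_sorted"
  # -23 suppresses lines unique to the second file and lines common to both,
  # leaving only IDs that passed before and no longer pass.
  comm -23 "$pre_sorted" "$post_sorted"
  rm -f "$pre_sorted" "$post_sorted"
}
```

If the function prints anything, the cycle would be reverted and its outcome set to "regressed"; goals that newly pass do not appear in the output.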
Queue finalization after the regression gate:
- On success: set consumed: true, consumed_by, and consumed_at; clear transient claim fields
- On failure or regression: clear claim_status, claimed_by, and claimed_at; keep consumed: false; record the release in session-state.json

After the cycle's /post-mortem finishes, immediately re-read .agents/rpi/next-work.jsonl before selecting the next item. Never assume the queue state from before the cycle.
Two paths: productive cycles get committed, idle cycles are local-only.
PRODUCTIVE cycles (result is improved, regressed, or harvested):
# Quality mode: compute quality_score BEFORE writing the JSONL entry
QUALITY_SCORE_ARGS=()
if [ "$QUALITY_MODE" = "true" ]; then
# jq -c emits one compact object per line, so wc -l counts items rather than pretty-printed lines
REMAINING_HIGH=$(jq -c 'select(.consumed==false) | .items[] | select(.severity=="high")' \
.agents/rpi/next-work.jsonl 2>/dev/null | wc -l | tr -d ' ')
REMAINING_MEDIUM=$(jq -c 'select(.consumed==false) | .items[] | select(.severity=="medium")' \
.agents/rpi/next-work.jsonl 2>/dev/null | wc -l | tr -d ' ')
QUALITY_SCORE=$((100 - (REMAINING_HIGH * 10) - (REMAINING_MEDIUM * 3)))
[ "$QUALITY_SCORE" -lt 0 ] && QUALITY_SCORE=0
QUALITY_SCORE_ARGS=(--quality-score "$QUALITY_SCORE")
fi
ENTRY_JSON="$(
bash scripts/evolve-log-cycle.sh \
--cycle "$CYCLE" \
--target "$TARGET" \
--result "$OUTCOME" \
--canonical-sha "$(git rev-parse --short HEAD)" \
--cycle-start-sha "$CYCLE_START_SHA" \
--goals-passing "$PASSING" \
--goals-total "$TOTAL" \
"${QUALITY_SCORE_ARGS[@]}"
)"
OUTCOME="$(printf '%s\n' "$ENTRY_JSON" | jq -r '.result')"
REAL_CHANGES=$(git diff --name-only "${CYCLE_START_SHA}..HEAD" -- ':!.agents/**' ':!GOALS.yaml' ':!GOALS.md' \
2>/dev/null | wc -l | tr -d ' ')
# Telemetry
bash scripts/log-telemetry.sh evolve cycle-complete cycle=${CYCLE} goal=${TARGET} outcome=${OUTCOME} 2>/dev/null || true
if [ "$OUTCOME" = "unchanged" ]; then
# No-delta cycle: leave local-only so history stays honest and stagnation logic can see it.
:
elif [ "$REAL_CHANGES" -gt 0 ]; then
# Full commit: real code was changed
git add .agents/evolve/cycle-history.jsonl
git commit -m "evolve: cycle ${CYCLE} -- ${TARGET} ${OUTCOME}"
else
# Productive cycle with non-agent repo delta already committed by a sub-skill:
# stage the ledger but do not create a standalone follow-up commit.
git add .agents/evolve/cycle-history.jsonl
fi
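# Count the cycle as productive only if the logger did not downgrade it to "unchanged"
if [ "$OUTCOME" != "unchanged" ]; then
  PRODUCTIVE_THIS_SESSION=$((PRODUCTIVE_THIS_SESSION + 1))
fi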
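The quality-score arithmetic in the productive-cycle snippet can be checked in isolation. The sketch below uses the same weights (10 per remaining high-severity finding, 3 per medium) and the same floor at zero; the function name `quality_score` is hypothetical.

```shell
#!/usr/bin/env bash
# quality_score HIGH MEDIUM
# Starts at 100, subtracts 10 per remaining high-severity finding and 3 per
# remaining medium-severity finding, and never goes below 0.
quality_score() {
  local high="$1" medium="$2" score
  score=$((100 - high * 10 - medium * 3))
  if [ "$score" -lt 0 ]; then
    score=0
  fi
  echo "$score"
}
```

With 2 high and 5 medium findings remaining, `quality_score 2 5` prints 65; with 12 high findings the raw value would be negative, so it prints 0.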
IDLE cycles (nothing found even after generator layers):
bash scripts/evolve-log-cycle.sh \
--cycle "$CYCLE" \
--target "idle" \
--result "unchanged" >/dev/null
# No git add, no git commit, no fitness snapshot write
while true; do
# Step 1 .. Step 6
# Stop if kill switch, max-cycles, or a real safety breaker triggers
# Otherwise increment cycle and re-enter selection
CYCLE=$((CYCLE + 1))
done
Push only when productive work has accumulated:
if [ $((PRODUCTIVE_THIS_SESSION % 5)) -eq 0 ] && [ "$PRODUCTIVE_THIS_SESSION" -gt 0 ]; then
git push
fi
1. Commit any staged cycle-history.jsonl entries left by artifact-only cycles:
if git diff --cached --name-only | grep -q cycle-history.jsonl; then
git commit -m "evolve: session teardown -- artifact-only cycles logged"
fi
2. Run /post-mortem "evolve session: ${CYCLE} cycles" to harvest learnings.
3. Push only if unpushed commits exist:
UNPUSHED=$(git log origin/main..HEAD --oneline 2>/dev/null | wc -l)
[ "$UNPUSHED" -gt 0 ] && git push
4. Report summary:
## /evolve Complete
Cycles: N | Productive: X | Regressed: Y (reverted) | Idle: Z
Stop reason: stagnation | circuit-breaker | max-cycles | kill-switch
In quality mode, the report includes additional fields:
## /evolve Complete (quality mode)
Cycles: N | Findings resolved: X | Goals fixed: Y | Idle: Z
Quality score: start → end (delta)
Remaining unconsumed: H high, M medium
Stop reason: stagnation | circuit-breaker | max-cycles | kill-switch
User says: /evolve --max-cycles=5
What happens: Evolve re-enters the full selection ladder after every /rpi cycle and runs producer layers instead of idling on empty queues.

User says: /evolve --beads-only
What happens: Evolve skips goals measurement and works through the bd ready backlog.

User says: /evolve --dry-run
What happens: Evolve shows what would be worked on without executing.

User says: /evolve --athena
What happens: Evolve runs ao mine + ao defrag at session start to surface fresh signal (orphaned research, code hotspots, oscillating goals) before the first evolve cycle. Use before a long autonomous run or after a burst of development activity.

User says: /evolve
What happens: See references/examples.md for a worked overnight flow that moves through beads -> harvested work -> goals -> testing -> bug hunt -> feature suggestion before dormancy is considered.
See references/examples.md for detailed walkthroughs.
| Problem | Solution |
|---|---|
| Loop exits immediately | Remove ~/.config/evolve/KILL or .agents/evolve/STOP |
| Stagnation after repeated empty passes | Queue layers and producer layers were empty across multiple passes — dormancy is the fallback outcome |
| ao goals measure hangs | Use --timeout 30 flag or --beads-only to skip |
| Regression gate reverts | Review reverted changes, narrow scope, re-run; claimed queue items must be released back to available state |
See references/cycle-history.md for advanced troubleshooting.
- references/cycle-history.md: JSONL format, recovery protocol, kill switch
- references/compounding.md: Knowledge flywheel and work harvesting
- references/goals-schema.md: GOALS.yaml format and continuous metrics
- references/parallel-execution.md: Parallel /swarm architecture
- references/teardown.md: Trajectory computation and session summary
- references/examples.md: Detailed usage examples
- references/artifacts.md: Generated files registry
- references/oscillation.md: Oscillation detection and quarantine
- references/quality-mode.md: Quality-first mode: scoring, priority cascade, artifacts
- skills/rpi/SKILL.md: Full lifecycle orchestrator (called per cycle)
- skills/crank/SKILL.md: Epic execution (called for beads epics)
- GOALS.yaml: Fitness goals for this repo