codex-autoresearch-loop by aradotso/trending-skills
npx skills add https://github.com/aradotso/trending-skills --skill codex-autoresearch-loop
Skill by ara.so — Daily 2026 Skills collection.
Codex Autoresearch is a Codex skill that runs an autonomous modify→verify→keep/revert loop on your codebase. You describe a measurable goal in one sentence; Codex confirms the plan, then iterates unattended — every improvement stacks in git, every failure reverts automatically — until interrupted or a cap is reached. Inspired by Karpathy's autoresearch concept, generalized beyond ML training to any software metric.
Option A — manual copy into your project:
git clone https://github.com/leo-lilinxiao/codex-autoresearch.git
cp -r codex-autoresearch your-project/.agents/skills/codex-autoresearch
Option B — Codex skill installer:
$skill-installer install https://github.com/leo-lilinxiao/codex-autoresearch
The skill lives at .agents/skills/codex-autoresearch/ inside your project. No config file is required before first use.
Open Codex in your project directory and prefix your goal with $codex-autoresearch:
$codex-autoresearch
I want to get rid of all `any` types in my TypeScript code
Codex will report what it found, propose a plan, and wait for you to reply go (or correct anything). You never write config. Codex infers everything.
Before the loop starts Codex always shows what it found and asks you to confirm. Example exchange:
Codex: I found 47 `any` occurrences across src/**/*.ts.
Confirmed:
- Target: eliminate `any` types in src/**/*.ts
- Metric: `any` count (current: 47), direction: lower
- Verify: grep + tsc --noEmit as guard
Need to confirm:
- Run until all gone, or cap at N iterations?
Reply "go" to start, or tell me what to change.
You: Go, run overnight.
Codex: Starting — baseline: 47. Iterating until interrupted.
Up to five confirmation rounds are possible. After that, Codex proceeds.
PHASE 0: Probe environment (CPU/GPU/RAM/toolchains), check for session resume
PHASE 1: Read context + lessons file from prior run (if any)
LOOP (forever or N times):
1. Review current state, git history, results log, lessons
2. Pick ONE hypothesis (apply perspectives, filter by environment)
-- or N hypotheses if parallel mode is active
3. Make ONE atomic change
4. git commit (before verification)
5. Run verify command → did the target metric improve?
Run guard command → did anything else break?
6. Improved → keep (extract lesson)
Worse → approved rollback strategy (git revert)
Crashed → fix or skip
7. Log the result to results log
8. Health check (disk, git, verify health)
9. If 3+ discards → REFINE; 5+ → PIVOT; 2 PIVOTs → web search
10. Repeat. Never stop. Never ask.
The loop runs unbounded unless you say Iterations: N during confirmation.
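The keep/revert core of the loop above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual implementation: `run_loop`, `LoopResult`, and the injected `apply_change` and `revert` callables are hypothetical names, and the demo uses an in-memory counter in place of a real git repository.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class LoopResult:
    best: float                      # best metric value seen so far
    kept: int = 0                    # iterations whose change was kept
    reverted: int = 0                # iterations whose change was rolled back
    log: List[str] = field(default_factory=list)


def is_better(new: float, old: float, direction: str) -> bool:
    """True when `new` improves on `old` for the given direction."""
    return new < old if direction == "lower" else new > old


def run_loop(
    apply_change: Callable[[int], None],  # hypothetical: one atomic change + commit
    verify: Callable[[], float],          # returns the current metric value
    guard: Callable[[], bool],            # True when nothing else broke
    revert: Callable[[], None],           # hypothetical: undo the last commit
    direction: str = "lower",
    iterations: Optional[int] = None,     # None means unbounded
) -> LoopResult:
    result = LoopResult(best=verify())    # establish the baseline first
    i = 0
    while iterations is None or i < iterations:
        i += 1
        apply_change(i)                   # commit happens before verification
        value = verify()
        if is_better(value, result.best, direction) and guard():
            result.best = value
            result.kept += 1
            result.log.append(f"{i}: KEPT {value}")
        else:
            revert()
            result.reverted += 1
            result.log.append(f"{i}: REVERTED {value}")
    return result


# Demo with an in-memory stand-in for the repo (no git involved).
state = {"count": 47, "history": []}
deltas = [-2, 3, -1]  # effect of each hypothetical change on the metric


def apply_change(i: int) -> None:
    state["history"].append(state["count"])  # plays the role of "git commit"
    state["count"] += deltas[i - 1]


def revert() -> None:
    state["count"] = state["history"].pop()  # plays the role of "git revert"


demo = run_loop(apply_change, lambda: state["count"], lambda: True, revert,
                direction="lower", iterations=3)
print(demo.log)
```

Passing the change, verify, guard, and revert steps in as callables keeps the decision logic testable without touching a real repository.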
Two commands serve distinct purposes:
| Gate | Purpose | Failure means |
|---|---|---|
| Verify | Did the target metric improve? | Change discarded, reverted |
| Guard | Did anything else break? | Change reworked (up to 2 attempts), then reverted |
Guard files are never modified by the loop.
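One way to read the two gates is as a small decision function. The sketch below is an assumption about how the table could be applied, not the skill's code: `apply_gates` and the `rework` callable are hypothetical, and a real loop would presumably re-run verify after each rework as well.

```python
from typing import Callable


def apply_gates(
    metric_improved: bool,
    guard_passes: Callable[[], bool],
    rework: Callable[[], None],      # hypothetical: adjust the change and retry
    max_rework: int = 2,             # "up to 2 attempts" from the table
) -> str:
    """Return "keep" or "revert" for one candidate change."""
    if not metric_improved:
        return "revert"              # verify gate failed: discard immediately
    if guard_passes():
        return "keep"                # both gates passed
    for _ in range(max_rework):
        rework()
        if guard_passes():
            return "keep"            # a rework fixed the collateral breakage
    return "revert"                  # guard still failing after the allowed reworks


# Guard fails twice, then passes on the second rework.
outcomes = iter([False, False, True])
decision = apply_gates(True, lambda: next(outcomes), lambda: None)
print(decision)
```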
Example verify + guard pair for a Python coverage run:
Verify: pytest --cov=src --cov-report=term 2>&1 | grep TOTAL | awk '{print $NF}' | tr -d '%'
Guard: python -m mypy src --ignore-missing-imports
Example for TypeScript type cleanup:
Verify: grep -rwo "any" src --include="*.ts" | wc -l
Guard: npx tsc --noEmit
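A verify command is only useful if its output can be read as one number. The helper below is a hedged sketch of such a check; `read_metric` is a hypothetical name, and tolerating a trailing percent sign is an assumption made for convenience.

```python
import subprocess
import sys
from typing import List


def read_metric(command: List[str]) -> float:
    """Run a verify command and parse its stdout as a single number."""
    proc = subprocess.run(command, capture_output=True, text=True)
    lines = [ln.strip() for ln in proc.stdout.splitlines() if ln.strip()]
    if len(lines) != 1:
        raise ValueError(f"expected one line of output, got {len(lines)}")
    # float() raises ValueError when the line is not numeric.
    return float(lines[0].rstrip("%"))


# Stand-in verify command: the current interpreter printing a fixed count.
baseline = read_metric([sys.executable, "-c", "print(47)"])
print(baseline)
```

Running a candidate verify command through a check like this before starting a long run catches noisy or non-numeric output early.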
Codex maps your sentence to one of seven modes automatically, so you do not have to pick a mode explicitly.
loop: iterate toward a measurable target (default)
$codex-autoresearch
Improve test coverage in src/ to at least 80%
$codex-autoresearch
Reduce bundle size — it's currently 2.3 MB, get it under 1 MB
plan: turn a vague goal into a validated loop config
$codex-autoresearch
I want to make our API faster but I don't know where to start
Codex will interview you (p95 latency vs throughput? which endpoint?) and produce a ready-to-run loop config.
fix: repair errors until the count reaches zero
$codex-autoresearch
pytest is failing, 12 tests broken after the refactor — fix them all
debug: evidence-driven root-cause hunting
$codex-autoresearch
Our API returns 503 randomly under load, no idea why
Each iteration tests one falsifiable hypothesis. Codex presents evidence, not guesses.
security: read-only STRIDE + OWASP audit
$codex-autoresearch
Is this code secure?
ship: readiness verification and release gating
$codex-autoresearch
Ship it
exec: one-shot execution with no loop
$codex-autoresearch
Run the benchmark suite and summarize results
You can override defaults inline during the confirmation step — no file edits needed:
| Phrase | Effect |
|---|---|
| Iterations: 20 | Cap the loop at 20 iterations |
| Parallel: 3 | Test 3 hypotheses concurrently per round |
| Guard: npm test | Override the inferred guard command |
| Verify: <command> | Override the inferred verify command |
| Scope: src/api/ | Restrict changes to a subdirectory |
Example during confirmation:
You: Go. Iterations: 30, Guard: npm test, Scope: src/api/
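To make the override syntax concrete, here is a rough sketch of how such a reply could be split into key/value pairs. Codex interprets the sentence itself, so this parser is purely illustrative; `parse_overrides` and the exact key list are assumptions drawn from the phrases used in this document.

```python
import re
from typing import Dict

# Keys that appear as "Key: value" phrases in this document.
KEYS = "Iterations|Parallel|Guard|Verify|Scope|Direction|Target"

# A value runs until the next ", Key:" or the end of the line, so commands
# that contain commas or spaces (e.g. "npm test") stay intact.
OVERRIDE = re.compile(
    rf"\b({KEYS}):\s*(.+?)(?=,\s*(?:{KEYS}):|$)",
    re.MULTILINE,
)


def parse_overrides(reply: str) -> Dict[str, str]:
    """Extract inline overrides from a confirmation reply."""
    return {key: value.strip() for key, value in OVERRIDE.findall(reply)}


overrides = parse_overrides("Go. Iterations: 30, Guard: npm test, Scope: src/api/")
print(overrides)
```

Looser phrasings without a colon, such as "Go, parallel 4", are not handled by this sketch.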
At the end of each iteration Codex writes a structured lesson to .agents/skills/codex-autoresearch/lessons.md:
Iteration 7 — KEPT
Hypothesis: replace explicit `any` with inferred generic in src/utils/mapper.ts
Change: added <T extends Record<string, unknown>> to mapKeys()
Result: any count 31 → 29
Lesson: Generic constraints on utility functions eliminate clusters of `any` downstream.
On session resume Codex reads this file first. Each new run benefits from prior runs.
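If you want to inspect accumulated lessons yourself, the entry layout shown above is simple enough to parse. The sketch below assumes every entry follows that layout exactly (a header line, then `Key: value` lines); the real file may differ, and `parse_lessons` is a hypothetical helper.

```python
import re
from typing import Dict, List

# Header such as "Iteration 7 <dash> KEPT"; the separator is matched loosely.
HEADER = re.compile(r"^Iteration (\d+)\s+\S+\s+(\w+)$")


def parse_lessons(text: str) -> List[Dict[str, str]]:
    """Split lessons text into one dict per iteration."""
    lessons: List[Dict[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            lessons.append({"iteration": header.group(1),
                            "outcome": header.group(2)})
        elif lessons and ": " in line:
            key, value = line.split(": ", 1)   # split on the first colon only
            lessons[-1][key.lower()] = value
    return lessons


SAMPLE = (
    "Iteration 7 \u2014 KEPT\n"
    "Hypothesis: replace explicit `any` with inferred generic in src/utils/mapper.ts\n"
    "Change: added <T extends Record<string, unknown>> to mapKeys()\n"
    "Result: any count 31 \u2192 29\n"
    "Lesson: Generic constraints on utility functions eliminate clusters of `any` downstream.\n"
)
parsed = parse_lessons(SAMPLE)
print(parsed[0]["outcome"], parsed[0]["result"])
```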
To resume an interrupted run:
$codex-autoresearch
Resume
Codex re-reads the lessons file, checks git state, re-establishes the baseline, and continues.
Request parallel mode during confirmation or at any time:
You: Go, parallel 4
Codex runs four hypotheses concurrently, keeps the best result, discards the rest. Useful when hypothesis space is large.
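The "keep the best, discard the rest" step can be pictured as a selection over the metric values each hypothesis produced. This is a sketch under the assumption that a candidate is only kept when it also beats the current baseline; `pick_best` and the hypothesis labels are made up for illustration.

```python
from typing import Dict, Optional


def pick_best(results: Dict[str, float], baseline: float,
              direction: str = "lower") -> Optional[str]:
    """Return the name of the winning hypothesis, or None if none improved."""
    if not results:
        return None
    choose = min if direction == "lower" else max
    name = choose(results, key=lambda k: results[k])
    value = results[name]
    improved = value < baseline if direction == "lower" else value > baseline
    return name if improved else None


# Four hypotheses measured against a baseline `any` count of 29.
winner = pick_best({"h1": 30, "h2": 28, "h3": 31, "h4": 29}, baseline=29)
print(winner)
```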
If the loop stalls, escalation happens automatically:
| Consecutive discards | Action |
|---|---|
| 3 | REFINE — narrow hypothesis, try smaller atomic changes |
| 5 | PIVOT — change strategy entirely |
| 2 PIVOTs | Web search — Codex fetches external references to unstick itself |
You are never asked for permission during escalation. The loop continues.
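The escalation table maps directly onto a small lookup. The function below is a sketch of that mapping only: it reads the thresholds as "at least" (matching the 3+ and 5+ wording in the loop outline) and assumes the caller tracks how many PIVOTs have already happened. `escalation_action` is a hypothetical name, and when counters reset is not specified by the source.

```python
from typing import Optional


def escalation_action(consecutive_discards: int, pivots: int) -> Optional[str]:
    """Pick the escalation step for the current stall, if any."""
    if pivots >= 2:
        return "WEB_SEARCH"   # two PIVOTs already tried without progress
    if consecutive_discards >= 5:
        return "PIVOT"        # change strategy entirely
    if consecutive_discards >= 3:
        return "REFINE"       # narrow the hypothesis, smaller atomic changes
    return None               # not stalled yet


for discards in range(6):
    print(discards, escalation_action(discards, pivots=0))
```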
any elimination (Python verify script)
If you want a custom verify script instead of a one-liner:
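# scripts/count_any.py
import subprocess
import sys

# Search TypeScript sources for the whole word `any`.
result = subprocess.run(
    ["grep", "-r", "--include=*.ts", r"\bany\b", "src/"],
    capture_output=True, text=True
)
# grep prints one line per matching line, so this counts lines, not occurrences.
count = len(result.stdout.strip().splitlines())
print(count)
sys.exit(0)  # always exit 0; the number is what matters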
Tell Codex during confirmation:
Verify: python scripts/count_any.py
Guard: npx tsc --noEmit
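# scripts/coverage_pct.py
import re
import subprocess
import sys

# Use run() rather than check_output(): pytest exits non-zero when tests fail,
# and the coverage total is still worth reporting in that case.
proc = subprocess.run(
    ["pytest", "--cov=src", "--cov-report=term", "-q"],
    stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
)
# The last percentage on the TOTAL line is the overall coverage.
match = re.search(r"^TOTAL\s.*?(\d+)%\s*$", proc.stdout, re.MULTILINE)
print(int(match.group(1)) if match else 0)
sys.exit(0)  # always exit 0; the number is what matters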
$codex-autoresearch
Improve test coverage — target 85%
Verify: python scripts/coverage_pct.py
Guard: python -m mypy src
Direction: higher
Target: 85
Iterations: 50
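#!/usr/bin/env bash
# scripts/bundle_size.sh
# Stop if the build fails, so a stale bundle is never measured.
set -e
# Keep build output off stdout; only the size in KB should be printed.
npm run build --silent >/dev/null 2>&1
du -k dist/bundle.js | awk '{print $1}'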
$codex-autoresearch
Reduce our JS bundle size, currently ~2300 KB, target under 900 KB
Verify: bash scripts/bundle_size.sh
Guard: npm test
Direction: lower
Target: 900
#!/usr/bin/env bash
# scripts/lint_count.sh
# Print the total number of ESLint messages (errors and warnings) across all files.
npx eslint src/ --format json 2>/dev/null \
  | python3 -c "import sys,json; d=json.load(sys.stdin); print(sum(len(f['messages']) for f in d))"
$codex-autoresearch
Get our ESLint warning count to zero
Verify: bash scripts/lint_count.sh
Direction: lower
Target: 0
For overnight or long runs, ensure Codex CLI approval settings do not interrupt git commit or git revert commands. The simplest option is to run in a disposable or sandboxed repo clone:
git clone . /tmp/autoresearch-sandbox
cd /tmp/autoresearch-sandbox
# launch Codex here with full permissions
Results accumulate in git history. Pull the winning commits back to your main repo when done:
# in your main repo
git fetch /tmp/autoresearch-sandbox main
git cherry-pick <winning-commit-sha>
| File | Contents |
|---|---|
| .agents/skills/codex-autoresearch/lessons.md | Structured lessons from every iteration |
| .agents/skills/codex-autoresearch/results.log | Full per-iteration log (metric value, kept/reverted, elapsed) |
| .agents/skills/codex-autoresearch/session.json | Current session state for resume |
These files persist across Codex sessions. Delete them to start fresh.
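Starting fresh amounts to removing the three files listed above. A small helper can do that and report what it removed. This is a convenience sketch, not part of the skill; `reset_state` is a hypothetical name and the paths are taken from the table.

```python
from pathlib import Path
from typing import List

STATE_FILES = ("lessons.md", "results.log", "session.json")


def reset_state(project_root: str) -> List[str]:
    """Delete the skill's state files under project_root; return removed names."""
    skill_dir = Path(project_root) / ".agents" / "skills" / "codex-autoresearch"
    removed: List[str] = []
    for name in STATE_FILES:
        path = skill_dir / name
        if path.exists():
            path.unlink()
            removed.append(name)
    return removed
```

Calling `reset_state(".")` from the project root removes whichever of the three files exist and leaves the rest of the skill directory untouched.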
Loop reverts every change:
- Check that the verify command prints a number: bash -c "<your verify command>" should print a single number.
- Make sure the metric direction is right: say Direction: lower or Direction: higher during setup.
Guard fires on unrelated files:
- Narrow the scope with Scope: src/specific-module/, or say Do not touch tests/ during confirmation.
Session resume picks up wrong baseline:
- Delete session.json to force a fresh baseline: rm .agents/skills/codex-autoresearch/session.json
Parallel mode produces merge conflicts:
- Lower the parallelism, for example Parallel: 2
Codex asks questions mid-loop:
- Pre-empt this with Guard: <command> || true if guard failures should be non-fatal, or by giving Codex fuller sandbox permissions so it can run git commands freely.
Loop hits PIVOT but makes no progress:
- Give Codex a hint, for example Hint: try tree-shaking unused imports first
- Use plan mode first to produce a richer hypothesis list before switching to loop.
# Start a loop
$codex-autoresearch
<your goal in one sentence>
# Resume interrupted run
$codex-autoresearch
Resume
# Bounded run
$codex-autoresearch
<goal> — Iterations: 25
# Parallel hypotheses
$codex-autoresearch
<goal> — Parallel: 4
# Force a mode
$codex-autoresearch fix
pytest has 8 failures, repair them
# Read-only audit
$codex-autoresearch security
Audit src/api/ for injection vulnerabilities