codex-review by hyperb1iss/hyperskills
npx skills add https://github.com/hyperb1iss/hyperskills --skill codex-review

Cross-model validation using the codex binary directly. Claude writes code, Codex reviews it: different architecture, different training distribution, no self-approval bias.
Core insight: Single-model self-review is systematically biased. Cross-model review catches different bug classes because the reviewer has fundamentally different blind spots than the author.
Prerequisite: The codex CLI must be installed and authenticated. Verify with codex --help. Configure defaults in ~/.codex/config.toml:
model = "gpt-5.4"
review_model = "gpt-5.4"
# Note: review_model overrides model for codex review specifically
model_reasoning_effort = "high"
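The prerequisite check above can be scripted. A minimal preflight sketch (the `check_cli` helper is illustrative, not part of the skill; for this skill you would check `codex`):

```shell
# Preflight sketch: verify a CLI exists before scripting around it.
check_cli() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1 (install and authenticate it first)" >&2
    return 1
  fi
}

# Example: check_cli codex
```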
| Mode | Command | Best For |
|---|---|---|
| codex review | Structured diff review with prioritized findings | Pre-PR reviews, commit reviews, WIP checks |
| codex exec | Freeform non-interactive deep-dive with full prompt control | Security audits, architecture review, focused investigation |
Key flags:

| Flag | Applies To | Purpose |
|---|---|---|
| -c model="gpt-5.4" | both | Model selection (review has no -m flag) |
| -m, --model | exec only | Model selection shorthand |
| -c model_reasoning_effort="xhigh" | both | Reasoning depth: low / medium / high / xhigh |
| --base <BRANCH> | review only | Diff against the base branch |
| --commit <SHA> | review only | Review a specific commit |
| --uncommitted | review only | Review working-tree changes |
The standard review before opening a PR. Use for any non-trivial change.
Step 1: Structured review (catches correctness and general issues):
Run via Bash:
codex review --base main -c model="gpt-5.4"
Step 2: Security deep-dive (if the code touches auth, input handling, or APIs):
Run via Bash:
codex exec -m gpt-5.4 \
-c model_reasoning_effort="xhigh" \
"<security prompt from references/prompts.md>"
Step 3: Fix the findings, then re-review:
Run via Bash:
codex review --base main -c model="gpt-5.4"
Quick check after each meaningful commit.
codex review --commit <SHA> -c model="gpt-5.4"
Review uncommitted work mid-development. Catches issues before they get baked in.
codex review --uncommitted -c model="gpt-5.4"
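One way to automate the WIP check is a git pre-commit hook. A sketch (the `precommit_review` function and the parameterized binary name are ours; parameterizing keeps the hook a harmless no-op on machines without codex):

```shell
# Sketch of a .git/hooks/pre-commit body.
precommit_review() {
  bin="${1:-codex}"
  if ! command -v "$bin" >/dev/null 2>&1; then
    echo "$bin not installed; skipping review"
    return 0
  fi
  "$bin" review --uncommitted -c model="gpt-5.4"
}

# In the hook itself: precommit_review codex
```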
Surgical deep-dive on a specific concern (error handling, concurrency, data flow).
codex exec -m gpt-5.4 \
-c model_reasoning_effort="xhigh" \
"Analyze [specific concern] in the changes between main and HEAD.
For each issue found: cite file and line, explain the risk,
suggest a concrete fix. Confidence threshold: only flag issues you are >=70% confident about."
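The focused prompt can be generated from a one-word concern so repeated investigations stay consistent. A sketch (the `focused_prompt` helper is ours, not part of the skill):

```shell
# Sketch: build the focused-investigation prompt from a concern string.
focused_prompt() {
  printf 'Analyze %s in the changes between main and HEAD.\n' "$1"
  printf 'For each issue found: cite file and line, explain the risk,\n'
  printf 'suggest a concrete fix. Confidence threshold: only flag issues\n'
  printf 'you are >=70%% confident about.\n'
}

# Usage (requires the codex CLI):
#   codex exec -m gpt-5.4 -c model_reasoning_effort="xhigh" \
#     "$(focused_prompt 'error handling')"
```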
Iterative quality enforcement: implement, review, fix, repeat. Max 3 iterations.
Iteration 1:
Claude -> implement feature
Bash: codex review --base main -c model="gpt-5.4" -> findings
Claude -> fix critical/high findings
Iteration 2:
Bash: codex review --base main -c model="gpt-5.4" -> verify fixes + catch remaining issues
Claude -> fix remaining issues
Iteration 3 (final):
Bash: codex review --base main -c model="gpt-5.4" -> clean bill of health
(or accept known trade-offs and document them)
STOP after 3 iterations. Diminishing returns beyond this point.
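The loop above can be sketched as a shell function with a hard iteration cap. The review command is passed in as arguments so the sketch stays runnable on its own; in practice it would be `codex review --base main -c model="gpt-5.4"`:

```shell
# Review-loop sketch, capped at 3 iterations.
# "$@" stands in for the real review command.
review_loop() {
  max=3
  i=1
  while [ "$i" -le "$max" ]; do
    echo "iteration $i"
    if "$@"; then
      echo "clean bill of health"
      return 0
    fi
    # fix critical/high findings here before the next pass
    i=$((i + 1))
  done
  echo "stopping after $max passes; document remaining trade-offs"
  return 1
}

# Example: review_loop codex review --base main -c model="gpt-5.4"
```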
For thorough reviews, run multiple focused passes instead of one vague pass. Each pass gets a specific persona and concern domain.

| Pass | Focus | Mode | Reasoning |
|---|---|---|---|
| Correctness | Bugs, logic, edge cases, race conditions | codex review | default |
| Security | OWASP Top 10, injection, auth, secrets | codex exec with security prompt | xhigh |
| Architecture | Coupling, abstractions, API consistency | codex exec with architecture prompt | xhigh |
| Performance | O(n^2), N+1 queries, memory leaks | codex exec with performance prompt | high |

Run the passes sequentially. Fix critical findings between passes to avoid compounding noise.
When to use multi-pass vs. single-pass:

| Change Size | Strategy |
|---|---|
| < 50 lines, single concern | Single codex review |
| 50-300 lines, feature work | codex review + security pass |
| 300+ lines or an architecture change | Full 4-pass review |
| Security-sensitive (auth, payments, crypto) | Always include the security pass |
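The pass-to-depth mapping in the table can be encoded directly so scripted pipelines stay consistent with it. A sketch (the `effort_for` function name is ours):

```shell
# Sketch: map each review pass to its reasoning depth, per the table above.
effort_for() {
  case "$1" in
    correctness)  echo "default" ;;
    security)     echo "xhigh" ;;
    architecture) echo "xhigh" ;;
    performance)  echo "high" ;;
    *) echo "unknown pass: $1" >&2; return 1 ;;
  esac
}
```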
digraph review_decision {
rankdir=TB;
node [shape=diamond];
"What stage?" -> "Pre-commit" [label="writing code"];
"What stage?" -> "Pre-PR" [label="ready to submit"];
"What stage?" -> "Post-commit" [label="just committed"];
"What stage?" -> "Investigating" [label="specific concern"];
node [shape=box];
"Pre-commit" -> "Pattern 3: WIP Check";
"Pre-PR" -> "How big?";
"Post-commit" -> "Pattern 2: Commit Review";
"Investigating" -> "Pattern 4: Focused Investigation";
"How big?" [shape=diamond];
"How big?" -> "Pattern 1: Pre-PR Review" [label="< 300 lines"];
"How big?" -> "Full Multi-Pass" [label=">= 300 lines"];
}
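The size thresholds in the decision graph can be expressed as a tiny selector (sketch; thresholds from the strategy table, ignoring the "single concern" qualifier, and the function name is ours):

```shell
# Sketch: choose a review strategy from the number of changed lines.
strategy_for() {
  if [ "$1" -lt 50 ]; then
    echo "single codex review"
  elif [ "$1" -lt 300 ]; then
    echo "codex review + security pass"
  else
    echo "full multi-pass"
  fi
}

# The changed-line count could come from, e.g.:
#   git diff --shortstat main...HEAD
```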
Ready-to-use prompt templates are in references/prompts.md.

| Anti-Pattern | Why It Fails | Fix |
|---|---|---|
| "Review this code" | Too vague: produces surface-level bikeshedding | Use specific domain prompts with a persona |
| A single pass for everything | Context dilution: every dimension gets shallow treatment | Multi-pass, one concern per pass |
| Self-review (Claude reviews Claude's code) | Systematic bias: models approve their own patterns | Cross-model: Claude writes, Codex reviews |
| No confidence threshold | Noise floods the signal: 0.3-confidence findings waste time | Only act on findings at >= 0.7 confidence |
| Style comments in review | Without explicit skip directives, LLMs default to bikeshedding | "Skip: formatting, naming, minor docs" |
| More than 3 review iterations | Diminishing returns, increasing noise, overbaking | Stop at 3. Accept trade-offs. |
| Review without project context | Generic advice disconnected from codebase conventions | Codex reads CLAUDE.md/AGENTS.md automatically |
| Using an MCP wrapper | Unnecessary indirection over a CLI binary | Call codex directly via Bash |

For ready-to-use prompt templates, see references/prompts.md.
Weekly Installs: 129
Repository: hyperb1iss/hyperskills
GitHub Stars: 2
First Seen: Feb 19, 2026
Security Audits: Gen Agent Trust Hub: Pass, Socket: Pass, Snyk: Pass
Installed on: claude-code (126), opencode (8), gemini-cli (8), github-copilot (8), amp (8), codex (8)