codex-review by hyperb1iss/hyperskills
npx skills add https://github.com/hyperb1iss/hyperskills --skill codex-review

Cross-model validation using the codex binary directly. Claude writes code, Codex reviews it: different architecture, different training distribution, no self-approval bias.
Core insight: Single-model self-review is systematically biased. Cross-model review catches different bug classes because the reviewer has fundamentally different blind spots than the author.
Prerequisite: The codex CLI must be installed and authenticated. Verify with codex --help. Configure defaults in ~/.codex/config.toml:
model = "gpt-5.4"
review_model = "gpt-5.4"
# Note: review_model overrides model for codex review specifically
model_reasoning_effort = "high"
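The prerequisite check above can be scripted. A minimal preflight sketch (the `check_cli` helper is illustrative, not part of the skill; for this skill you would check `codex`):

```shell
# Preflight sketch: verify a CLI exists before scripting around it.
check_cli() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1 (install and authenticate it first)" >&2
    return 1
  fi
}

# Example: check_cli codex
```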
| Mode | Command | Best For |
|---|---|---|
| codex review | Structured diff review with prioritized findings | Pre-PR reviews, commit reviews, WIP checks |
| codex exec | Freeform non-interactive deep-dive with full prompt control | Security audits, architecture review, focused investigation |
Key flags:

| Flag | Applies To | Purpose |
|---|---|---|
| -c model="gpt-5.4" | both | Model selection (review has no -m flag) |
| -m, --model | exec only | Model selection shorthand |
| -c model_reasoning_effort="xhigh" | both | Reasoning depth: low / medium / high / xhigh |
| --base <BRANCH> | review only | Diff against the base branch |
| --commit <SHA> | review only | Review a specific commit |
| --uncommitted | review only | Review working-tree changes |
The standard review before opening a PR. Use for any non-trivial change.
Step 1: Structured review (catches correctness and general issues):
Run via Bash:
codex review --base main -c model="gpt-5.4"
Step 2: Security deep-dive (if the code touches auth, input handling, or APIs):
Run via Bash:
codex exec -m gpt-5.4 \
-c model_reasoning_effort="xhigh" \
"<security prompt from references/prompts.md>"
Step 3: Fix the findings, then re-review:
Run via Bash:
codex review --base main -c model="gpt-5.4"
Quick check after each meaningful commit.
codex review --commit <SHA> -c model="gpt-5.4"
Review uncommitted work mid-development. Catches issues before they get baked in.
codex review --uncommitted -c model="gpt-5.4"
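One way to automate the WIP check is a git pre-commit hook. A sketch (the `precommit_review` function and the parameterized binary name are ours; parameterizing keeps the hook a harmless no-op on machines without codex):

```shell
# Sketch of a .git/hooks/pre-commit body.
precommit_review() {
  bin="${1:-codex}"
  if ! command -v "$bin" >/dev/null 2>&1; then
    echo "$bin not installed; skipping review"
    return 0
  fi
  "$bin" review --uncommitted -c model="gpt-5.4"
}

# In the hook itself: precommit_review codex
```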
Surgical deep-dive on a specific concern (error handling, concurrency, data flow).
codex exec -m gpt-5.4 \
-c model_reasoning_effort="xhigh" \
"Analyze [specific concern] in the changes between main and HEAD.
For each issue found: cite file and line, explain the risk,
suggest a concrete fix. Confidence threshold: only flag issues you are >=70% confident about."
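The focused prompt can be generated from a one-word concern so repeated investigations stay consistent. A sketch (the `focused_prompt` helper is ours, not part of the skill):

```shell
# Sketch: build the focused-investigation prompt from a concern string.
focused_prompt() {
  printf 'Analyze %s in the changes between main and HEAD.\n' "$1"
  printf 'For each issue found: cite file and line, explain the risk,\n'
  printf 'suggest a concrete fix. Confidence threshold: only flag issues\n'
  printf 'you are >=70%% confident about.\n'
}

# Usage (requires the codex CLI):
#   codex exec -m gpt-5.4 -c model_reasoning_effort="xhigh" \
#     "$(focused_prompt 'error handling')"
```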
Iterative quality enforcement: implement, review, fix, repeat. Max 3 iterations.
Iteration 1:
Claude -> implement feature
Bash: codex review --base main -c model="gpt-5.4" -> findings
Claude -> fix critical/high findings
Iteration 2:
Bash: codex review --base main -c model="gpt-5.4" -> verify fixes + catch remaining issues
Claude -> fix remaining issues
Iteration 3 (final):
Bash: codex review --base main -c model="gpt-5.4" -> clean bill of health
(or accept known trade-offs and document them)
STOP after 3 iterations. Diminishing returns beyond this point.
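The loop above can be sketched as a shell function with a hard iteration cap. The review command is passed in as arguments so the sketch stays runnable on its own; in practice it would be `codex review --base main -c model="gpt-5.4"`:

```shell
# Review-loop sketch, capped at 3 iterations.
# "$@" stands in for the real review command.
review_loop() {
  max=3
  i=1
  while [ "$i" -le "$max" ]; do
    echo "iteration $i"
    if "$@"; then
      echo "clean bill of health"
      return 0
    fi
    # fix critical/high findings here before the next pass
    i=$((i + 1))
  done
  echo "stopping after $max passes; document remaining trade-offs"
  return 1
}

# Example: review_loop codex review --base main -c model="gpt-5.4"
```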
For thorough reviews, run multiple focused passes instead of one vague pass. Each pass gets a specific persona and concern domain.

| Pass | Focus | Mode | Reasoning |
|---|---|---|---|
| Correctness | Bugs, logic, edge cases, race conditions | codex review | default |
| Security | OWASP Top 10, injection, auth, secrets | codex exec with security prompt | xhigh |
| Architecture | Coupling, abstractions, API consistency | codex exec with architecture prompt | xhigh |
| Performance | O(n^2), N+1 queries, memory leaks | codex exec with performance prompt | high |

Run the passes sequentially. Fix critical findings between passes to avoid compounding noise.
When to use multi-pass vs. single-pass:

| Change Size | Strategy |
|---|---|
| < 50 lines, single concern | Single codex review |
| 50-300 lines, feature work | codex review + security pass |
| 300+ lines or an architecture change | Full 4-pass review |
| Security-sensitive (auth, payments, crypto) | Always include the security pass |
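The pass-to-depth mapping in the table can be encoded directly so scripted pipelines stay consistent with it. A sketch (the `effort_for` function name is ours):

```shell
# Sketch: map each review pass to its reasoning depth, per the table above.
effort_for() {
  case "$1" in
    correctness)  echo "default" ;;
    security)     echo "xhigh" ;;
    architecture) echo "xhigh" ;;
    performance)  echo "high" ;;
    *) echo "unknown pass: $1" >&2; return 1 ;;
  esac
}
```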
digraph review_decision {
rankdir=TB;
node [shape=diamond];
"What stage?" -> "Pre-commit" [label="writing code"];
"What stage?" -> "Pre-PR" [label="ready to submit"];
"What stage?" -> "Post-commit" [label="just committed"];
"What stage?" -> "Investigating" [label="specific concern"];
node [shape=box];
"Pre-commit" -> "Pattern 3: WIP Check";
"Pre-PR" -> "How big?";
"Post-commit" -> "Pattern 2: Commit Review";
"Investigating" -> "Pattern 4: Focused Investigation";
"How big?" [shape=diamond];
"How big?" -> "Pattern 1: Pre-PR Review" [label="< 300 lines"];
"How big?" -> "Full Multi-Pass" [label=">= 300 lines"];
}
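The size thresholds in the decision graph can be expressed as a tiny selector (sketch; thresholds from the strategy table, ignoring the "single concern" qualifier, and the function name is ours):

```shell
# Sketch: choose a review strategy from the number of changed lines.
strategy_for() {
  if [ "$1" -lt 50 ]; then
    echo "single codex review"
  elif [ "$1" -lt 300 ]; then
    echo "codex review + security pass"
  else
    echo "full multi-pass"
  fi
}

# The changed-line count could come from, e.g.:
#   git diff --shortstat main...HEAD
```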
Ready-to-use prompt templates are in references/prompts.md.

| Anti-Pattern | Why It Fails | Fix |
|---|---|---|
| "Review this code" | Too vague: produces surface-level bikeshedding | Use specific domain prompts with a persona |
| A single pass for everything | Context dilution: every dimension gets shallow treatment | Multi-pass, one concern per pass |
| Self-review (Claude reviews Claude's code) | Systematic bias: models approve their own patterns | Cross-model: Claude writes, Codex reviews |
| No confidence threshold | Noise floods the signal: 0.3-confidence findings waste time | Only act on findings at >= 0.7 confidence |
| Style comments in review | Without explicit skip directives, LLMs default to bikeshedding | "Skip: formatting, naming, minor docs" |
| More than 3 review iterations | Diminishing returns, increasing noise, overbaking | Stop at 3. Accept trade-offs. |
| Review without project context | Generic advice disconnected from codebase conventions | Codex reads CLAUDE.md/AGENTS.md automatically |
| Using an MCP wrapper | Unnecessary indirection over a CLI binary | Call codex directly via Bash |

For ready-to-use prompt templates, see references/prompts.md.
Weekly Installs: 129
Repository: hyperb1iss/hyperskills
GitHub Stars: 2
First Seen: Feb 19, 2026
Security Audits: Gen Agent Trust Hub: Pass, Socket: Pass, Snyk: Pass
Installed on: claude-code (126), opencode (8), gemini-cli (8), github-copilot (8), amp (8), codex (8)