Adversarial Code Review Workflow: Hunter/Skeptic/Referee Eliminates Bias and Improves Bug Detection Accuracy

hunter-skeptic-referee by b-open-io/prompts

1 weekly install

7 GitHub Stars

Install command

npx skills add https://github.com/b-open-io/prompts --skill hunter-skeptic-referee

Hunter / Skeptic / Referee

An adversarial code review workflow designed by danpeguine (@danpeguine). Three agents run in isolated contexts, so no agent sees what any other agent "wants" to hear. The isolation is meant to remove sycophantic confirmation bias and produce bug reports that hold up as ground truth.

Why Isolated Contexts

When a single agent both finds bugs and evaluates them, it anchors on its own earlier judgments. By resetting context between phases and giving each agent only what it needs, every verdict is genuinely independent. The Skeptic cannot see the Hunter's enthusiasm. The Referee cannot see the Skeptic's skepticism.

The Three Agents

Phase	Agent	Role
1. Hunter	Nyx (code-auditor)	Find every possible bug. Maximize recall. False positives OK.
2. Skeptic	Kayle (architecture-reviewer)	Challenge every finding. Disprove false positives. 2x penalty for wrong dismissals.
3. Referee	Iris (tester)	Final arbiter. Weigh both sides. Produce ground truth.
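
The table above can be captured as a small lookup so that an orchestrating script dispatches the phases in a fixed order. This is a minimal Python sketch: the `Phase` dataclass, `PHASES`, and `phase_for` are hypothetical names introduced for illustration, while the `subagent_type` strings are the ones used in the prompts in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    """One row of the agent table (hypothetical container, not part of the skill)."""
    order: int
    role: str           # Hunter, Skeptic, or Referee
    agent: str          # persona name from the table
    subagent_type: str  # identifier passed to Agent(subagent_type=...)


PHASES = (
    Phase(1, "Hunter", "Nyx", "bopen-tools:code-auditor"),
    Phase(2, "Skeptic", "Kayle", "bopen-tools:architecture-reviewer"),
    Phase(3, "Referee", "Iris", "bopen-tools:tester"),
)


def phase_for(role: str) -> Phase:
    """Look up a phase by role name, case-insensitively."""
    for phase in PHASES:
        if phase.role.lower() == role.lower():
            return phase
    raise KeyError(f"unknown role: {role}")
```

Keeping the order in data rather than in control flow makes it harder to run the Referee before the Skeptic by accident.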

How to Run

Step 1 — Spawn the Hunter (Nyx)

Dispatch the code-auditor agent with the full codebase in context. Tell Nyx to operate in Hunter mode.

Agent(subagent_type="bopen-tools:code-auditor", prompt="
HUNTER MODE: You are the Hunter in a three-phase adversarial review.

Analyze the following codebase thoroughly and identify ALL potential bugs, issues, and anomalies.

Scoring:
- +1: Minor (edge cases, cosmetic)
- +5: Significant (functional issues, data inconsistencies)
- +10: Critical (security vulnerabilities, data loss, crashes)

Maximize your score. Be aggressive. Report anything that COULD be a bug. Missing a real bug is worse than a false positive.

For each bug:
1. File and line number
2. Description of the issue
3. Why it's a bug or failure mode
4. Severity score (+1/+5/+10)

End with your total score.

Codebase to audit: [SPECIFY FILES/DIRS]
")

Collect the numbered bug list.
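
Once the list is collected, the next phase needs only the code each finding points at. The sketch below shows one way to cut those snippets, assuming each finding has already been reduced to a file name and a 1-based line number. The `Finding` shape, the `snippet_for` helper, and the default of three context lines are all assumptions; the skill does not define a schema for the Hunter's output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One Hunter finding (hypothetical shape; the skill defines no schema)."""
    number: int
    file: str
    line: int        # 1-based line number reported by the Hunter
    description: str
    severity: int    # 1, 5, or 10


def snippet_for(finding: Finding, source: str, context: int = 3) -> str:
    """Return the reported line plus `context` lines on each side, numbered."""
    lines = source.splitlines()
    if not 1 <= finding.line <= len(lines):
        raise ValueError(f"line {finding.line} is outside {finding.file}")
    start = max(1, finding.line - context)
    end = min(len(lines), finding.line + context)
    return "\n".join(f"{n}: {lines[n - 1]}" for n in range(start, end + 1))
```

A small context window keeps the Skeptic's input narrow. If a finding depends on code far from the reported line, the window would need to be widened for that finding rather than handing over the whole file.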

Step 2 — Spawn the Skeptic (Kayle)

Dispatch the architecture-reviewer agent with ONLY the Hunter's bug list and the specific code snippets referenced. Do NOT give the full codebase. Tell Kayle to operate in Skeptic mode.

Agent(subagent_type="bopen-tools:architecture-reviewer", prompt="
SKEPTIC MODE: You are the Skeptic in a three-phase adversarial review.

A previous reviewer identified the following potential bugs. Your job is to DISPROVE as many as possible.

Scoring:
- Disprove a bug: +[bug's original score] points
- Wrongly dismiss a real bug: -2x [bug's original score] points

For each bug:
1. Analyze the reported issue
2. Attempt to disprove it (explain why it's NOT a bug)
3. Confidence level (%)
4. Decision: CONFIRMED or DISMISSED
5. Points gained/risked

End with: total confirmed, total dismissed, your final score.

Bug report to challenge:
[PASTE HUNTER'S OUTPUT]

Relevant code snippets:
[PASTE ONLY THE CODE REFERENCED IN EACH FINDING]
")
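
The Skeptic's scoring rule implies a break-even point. If a finding worth s points is real with probability p, dismissing it has expected value (1 - p)·s - p·2s = s(1 - 3p), which is positive only when p < 1/3. The Python sketch below works through that arithmetic. It assumes that a CONFIRMED decision neither earns nor risks points, which the prompt does not state, and the `is_real` labels are hypothetical inputs that would only exist in a seeded test.

```python
def expected_dismiss_value(score: int, p_real: float) -> float:
    """Expected points for dismissing a finding that is real with probability p_real."""
    if not 0.0 <= p_real <= 1.0:
        raise ValueError("p_real must be between 0 and 1")
    gain = (1.0 - p_real) * score   # the finding was a false positive
    loss = p_real * 2 * score       # the finding was real: 2x penalty
    return gain - loss


def skeptic_score(decisions: dict[int, str], scores: dict[int, int],
                  is_real: dict[int, bool]) -> int:
    """Total Skeptic score given decisions and (hypothetical) true labels."""
    total = 0
    for bug_id, decision in decisions.items():
        if decision != "DISMISSED":
            continue  # assumption: CONFIRMED earns and risks nothing
        total += -2 * scores[bug_id] if is_real[bug_id] else scores[bug_id]
    return total
```

Under these assumptions the asymmetric penalty pushes the Skeptic to dismiss only findings it believes are real less than about a third of the time, which is the counterweight to the Hunter's recall-first incentive.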

Step 3 — Spawn the Referee (Iris)

Dispatch the tester agent with the Hunter's findings AND the Skeptic's verdicts. Nothing else. Tell Iris to operate in Referee mode.

Agent(subagent_type="bopen-tools:tester", prompt="
REFEREE MODE: You are the Referee in a three-phase adversarial review.

You have: (1) Bug findings from the Hunter, (2) Challenges from the Skeptic.

IMPORTANT: I have the verified ground truth for each bug. You will be scored:
- +1: Correct judgment
- -1: Incorrect judgment

For each bug:
- Hunter's claim (summary)
- Skeptic's counter (summary)
- Your analysis
- VERDICT: REAL BUG or NOT A BUG
- Confidence: High / Medium / Low

Final summary:
- Total confirmed as real
- Total dismissed
- Ranked list of confirmed bugs by severity

Be precise. You are being scored against ground truth.

Hunter's findings:
[PASTE HUNTER'S OUTPUT]

Skeptic's verdicts:
[PASTE SKEPTIC'S OUTPUT]
")

Step 4 — Present the Report

The Referee's output is the authoritative bug report. Present it to the user with confirmed bugs ranked by severity.
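
Ranking the confirmed bugs can be sketched in Python as below, assuming the Referee's verdicts have already been parsed into records. The `Verdict` shape and the `ranked_confirmed` helper are hypothetical, and breaking severity ties by confidence and then by bug number is a choice made here, not something the skill specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    """One Referee verdict (hypothetical shape for illustration)."""
    bug_id: int
    summary: str
    severity: int     # Hunter's score: 1, 5, or 10
    verdict: str      # "REAL BUG" or "NOT A BUG"
    confidence: str   # "High", "Medium", or "Low"


_CONFIDENCE_ORDER = {"High": 0, "Medium": 1, "Low": 2}


def ranked_confirmed(verdicts: list[Verdict]) -> list[Verdict]:
    """Keep REAL BUG verdicts, highest severity first, then highest confidence."""
    confirmed = [v for v in verdicts if v.verdict == "REAL BUG"]
    return sorted(
        confirmed,
        key=lambda v: (-v.severity, _CONFIDENCE_ORDER[v.confidence], v.bug_id),
    )
```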

Context Boundary Rules

Phase	Gets access to
Hunter (Nyx)	Full codebase
Skeptic (Kayle)	Bug list + referenced code snippets only
Referee (Iris)	Hunter findings + Skeptic verdicts only

Violating these boundaries reintroduces the sycophancy problem. If the Skeptic sees the Hunter's confidence, it anchors on it. If the Referee sees either agent's emotional register, it drifts toward consensus rather than truth.
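
One way to keep these boundaries from eroding is to check every phase's inputs against an allow-list before a prompt is assembled. The following Python sketch does that. The input keys (`codebase`, `bug_list`, `code_snippets`, `hunter_findings`, `skeptic_verdicts`) and the `build_context` function are hypothetical names chosen to mirror the table above.

```python
ALLOWED_INPUTS = {
    "hunter": {"codebase"},
    "skeptic": {"bug_list", "code_snippets"},
    "referee": {"hunter_findings", "skeptic_verdicts"},
}


def build_context(phase: str, inputs: dict[str, str]) -> dict[str, str]:
    """Return the inputs for a phase, rejecting anything outside its boundary."""
    allowed = ALLOWED_INPUTS[phase]
    extra = set(inputs) - allowed
    if extra:
        raise ValueError(f"{phase} must not receive: {sorted(extra)}")
    missing = allowed - set(inputs)
    if missing:
        raise ValueError(f"{phase} is missing: {sorted(missing)}")
    return dict(inputs)
```

Failing loudly on an extra key is deliberate in this sketch: silently dropping the full codebase from a Skeptic call would hide the fact that the caller tried to pass it.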

Severity Reference

Score	Meaning
+1	Minor — unlikely edge case, low impact
+5	Significant — affects correctness under reachable conditions
+10	Critical — security vulnerability, data loss, or system corruption
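
If the scores are handled in code, the three levels map naturally onto an integer enum, so that any other value is rejected instead of being summed silently. A short Python sketch, where `Severity` and `parse_severity` are hypothetical names:

```python
from enum import IntEnum


class Severity(IntEnum):
    """The three score levels from the severity table."""
    MINOR = 1
    SIGNIFICANT = 5
    CRITICAL = 10


def parse_severity(text: str) -> Severity:
    """Parse a score such as '+5' or '10'; raise ValueError for other values."""
    return Severity(int(text.strip().lstrip("+")))
```

Because `IntEnum` members behave as integers, a Hunter total can be computed with `sum(...)` over parsed severities.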

When to Use

- Pre-release security audits
- Reviewing unfamiliar or legacy codebases
- High-stakes modules (auth, payments, data integrity)
- Pull requests with broad scope or architectural changes

For quick informal reviews, just use Nyx directly in normal mode.

Repository: b-open-io/prompts

First Seen: 1 day ago

Security Audits: Gen Agent Trust Hub: Pass, Socket: Pass, Snyk: Fail

Installed on: windsurf (1), amp (1), cline (1), opencode (1), cursor (1), kimi-cli (1)