hunter-skeptic-referee by b-open-io/prompts
npx skills add https://github.com/b-open-io/prompts --skill hunter-skeptic-referee
An adversarial code review workflow designed by danpeguine (@danpeguine). Three agents run in isolated contexts, so no agent sees what any other agent "wants" to hear. The isolation is meant to remove sycophantic confirmation bias and to produce bug reports that can be treated as ground truth.
When a single agent both finds bugs and evaluates them, it anchors on its own earlier judgments. By resetting context between phases and giving each agent only what it needs, every verdict is genuinely independent. The Skeptic cannot see the Hunter's enthusiasm. The Referee cannot see the Skeptic's skepticism.
| Phase | Agent | Role |
|---|---|---|
| 1. Hunter | Nyx (code-auditor) | Find every possible bug. Maximize recall. False positives OK. |
| 2. Skeptic | Kayle (architecture-reviewer) | Challenge every finding. Disprove false positives. 2x penalty for wrong dismissals. |
| 3. Referee | Iris (tester) | Final arbiter. Weigh both sides. Produce ground truth. |
Dispatch the code-auditor agent with the full codebase in context. Tell Nyx to operate in Hunter mode.
Agent(subagent_type="bopen-tools:code-auditor", prompt="
HUNTER MODE: You are the Hunter in a three-phase adversarial review.
Analyze the following codebase thoroughly and identify ALL potential bugs, issues, and anomalies.
Scoring:
- +1: Minor (edge cases, cosmetic)
- +5: Significant (functional issues, data inconsistencies)
- +10: Critical (security vulnerabilities, data loss, crashes)
Maximize your score. Be aggressive. Report anything that COULD be a bug. Missing a real bug is worse than a false positive.
For each bug:
1. File and line number
2. Description of the issue
3. Why it's a bug or failure mode
4. Severity score (+1/+5/+10)
End with your total score.
Codebase to audit: [SPECIFY FILES/DIRS]
")
Collect the numbered bug list.
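One way to hold the collected list is a small data model. The sketch below is illustrative only: `Finding`, `hunter_total`, and the sample file paths are hypothetical names, not part of the skill, and the Hunter's real output is free text that would need to be transcribed into this shape.

```python
from dataclasses import dataclass

# Severity points the Hunter prompt allows: +1 minor, +5 significant, +10 critical.
ALLOWED_SCORES = {1, 5, 10}


@dataclass(frozen=True)
class Finding:
    """One numbered entry from the Hunter's list (hypothetical structure)."""
    number: int
    location: str     # file and line number
    description: str  # what the issue is
    rationale: str    # why it is a bug or failure mode
    score: int        # 1, 5, or 10

    def __post_init__(self) -> None:
        if self.score not in ALLOWED_SCORES:
            raise ValueError(
                f"score must be one of {sorted(ALLOWED_SCORES)}, got {self.score}"
            )


def hunter_total(findings: list[Finding]) -> int:
    """Sum of severity scores, as requested by 'End with your total score'."""
    return sum(f.score for f in findings)


# Made-up sample findings, for illustration only.
findings = [
    Finding(1, "src/auth.py:88", "Token compared with ==", "Timing side channel", 10),
    Finding(2, "src/cart.py:17", "Off-by-one in pagination", "Last item dropped", 5),
    Finding(3, "src/ui.py:203", "Label truncated", "Cosmetic only", 1),
]
print(hunter_total(findings))  # 16
```

Keeping the findings as structured records makes it easier to pass only the list, and not the Hunter's surrounding commentary, to the next phase.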
Dispatch the architecture-reviewer agent with ONLY the Hunter's bug list and the specific code snippets referenced. Do NOT give the full codebase. Tell Kayle to operate in Skeptic mode.
Agent(subagent_type="bopen-tools:architecture-reviewer", prompt="
SKEPTIC MODE: You are the Skeptic in a three-phase adversarial review.
A previous reviewer identified the following potential bugs. Your job is to DISPROVE as many as possible.
Scoring:
- Disprove a bug: +[bug's original score] points
- Wrongly dismiss a real bug: -2x [bug's original score] points
For each bug:
1. Analyze the reported issue
2. Attempt to disprove it (explain why it's NOT a bug)
3. Confidence level (%)
4. Decision: CONFIRMED or DISMISSED
5. Points gained/risked
End with: total confirmed, total dismissed, your final score.
Bug report to challenge:
[PASTE HUNTER'S OUTPUT]
Relevant code snippets:
[PASTE ONLY THE CODE REFERENCED IN EACH FINDING]
")
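The asymmetric scoring above implies a break-even confidence for the Skeptic. As a rough sketch, assuming that confirming a finding scores zero (the prompt does not say) and that the Skeptic holds a probability `p_not_bug` that the finding is a false positive, dismissing only pays off when that probability exceeds 2/3. The function names here are hypothetical.

```python
def skeptic_expected_gain(score: int, p_not_bug: float) -> float:
    """Expected points from dismissing a finding worth `score`.

    A correct dismissal earns +score; a wrong one costs 2 * score.
    """
    if not 0.0 <= p_not_bug <= 1.0:
        raise ValueError("p_not_bug must be between 0 and 1")
    return p_not_bug * score - (1.0 - p_not_bug) * 2 * score


# Solving p*s - (1 - p)*2s = 0 for p gives p = 2/3, independent of s.
BREAK_EVEN = 2 / 3


def should_dismiss(p_not_bug: float) -> bool:
    """Dismiss only when the expected gain is positive (assumes CONFIRMED scores 0)."""
    return p_not_bug > BREAK_EVEN


print(should_dismiss(0.6))  # False
print(should_dismiss(0.8))  # True
```

The threshold does not depend on severity, but the size of the potential loss does: a wrong dismissal of a +10 finding costs 20 points.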
Dispatch the tester agent with the Hunter's findings AND the Skeptic's verdicts. Nothing else. Tell Iris to operate in Referee mode.
Agent(subagent_type="bopen-tools:tester", prompt="
REFEREE MODE: You are the Referee in a three-phase adversarial review.
You have: (1) Bug findings from the Hunter, (2) Challenges from the Skeptic.
IMPORTANT: I have the verified ground truth for each bug. You will be scored:
- +1: Correct judgment
- -1: Incorrect judgment
For each bug:
- Hunter's claim (summary)
- Skeptic's counter (summary)
- Your analysis
- VERDICT: REAL BUG or NOT A BUG
- Confidence: High / Medium / Low
Final summary:
- Total confirmed as real
- Total dismissed
- Ranked list of confirmed bugs by severity
Be precise. You are being scored against ground truth.
Hunter's findings:
[PASTE HUNTER'S OUTPUT]
Skeptic's verdicts:
[PASTE SKEPTIC'S OUTPUT]
")
The Referee's output is the authoritative bug report. Present it to the user with confirmed bugs ranked by severity.
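The final presentation step can be sketched as a filter and sort over the Referee's verdicts. `Verdict` and `final_report` are hypothetical names, and breaking ties by the Referee's confidence is an assumption of this sketch; the skill only asks for ranking by severity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    """One Referee decision (hypothetical structure)."""
    number: int
    summary: str
    severity: int    # the Hunter's original score: 1, 5, or 10
    real_bug: bool   # True for REAL BUG, False for NOT A BUG
    confidence: str  # "High", "Medium", or "Low"


# Assumed tie-break: higher confidence first, then original numbering.
_CONFIDENCE_ORDER = {"High": 0, "Medium": 1, "Low": 2}


def final_report(verdicts: list[Verdict]) -> dict:
    """Count confirmed and dismissed findings and rank the confirmed ones."""
    confirmed = [v for v in verdicts if v.real_bug]
    ranked = sorted(
        confirmed,
        key=lambda v: (-v.severity, _CONFIDENCE_ORDER[v.confidence], v.number),
    )
    return {
        "confirmed": len(confirmed),
        "dismissed": len(verdicts) - len(confirmed),
        "ranked": ranked,
    }


# Made-up sample verdicts, for illustration only.
verdicts = [
    Verdict(1, "Off-by-one in pagination", 5, True, "High"),
    Verdict(2, "Unvalidated redirect", 10, True, "Medium"),
    Verdict(3, "Label truncated", 1, False, "High"),
    Verdict(4, "Token compared with ==", 10, True, "High"),
]
report = final_report(verdicts)
print([v.number for v in report["ranked"]])  # [4, 2, 1]
```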
| Phase | Gets access to |
|---|---|
| Hunter (Nyx) | Full codebase |
| Skeptic (Kayle) | Bug list + referenced code snippets only |
| Referee (Iris) | Hunter findings + Skeptic verdicts only |
Violating these boundaries reintroduces the sycophancy problem. If the Skeptic sees the Hunter's confidence, it anchors on it. If the Referee sees either agent's emotional register, it drifts toward consensus rather than truth.
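If the phases are dispatched from a script, the boundaries in the table can be enforced with an allow-list. This is a minimal sketch under that assumption: `ALLOWED_INPUTS`, `build_phase_input`, and the artifact key names are hypothetical and are not defined by the skill.

```python
# Which artifacts each phase may see, mirroring the table above.
ALLOWED_INPUTS = {
    "hunter": {"codebase"},
    "skeptic": {"hunter_findings", "referenced_snippets"},
    "referee": {"hunter_findings", "skeptic_verdicts"},
}


def build_phase_input(phase: str, artifacts: dict[str, str]) -> dict[str, str]:
    """Return only the artifacts the given phase is allowed to receive."""
    allowed = ALLOWED_INPUTS[phase]
    missing = allowed - set(artifacts)
    if missing:
        raise KeyError(f"{phase} is missing required inputs: {sorted(missing)}")
    # Anything not on the allow-list is dropped, even if it is available.
    return {name: artifacts[name] for name in sorted(allowed)}


# Placeholder artifact values, for illustration only.
artifacts = {
    "codebase": "<full source tree>",
    "hunter_findings": "<numbered bug list>",
    "referenced_snippets": "<code cited by each finding>",
    "skeptic_verdicts": "<CONFIRMED / DISMISSED decisions>",
}
skeptic_input = build_phase_input("skeptic", artifacts)
print(sorted(skeptic_input))  # ['hunter_findings', 'referenced_snippets']
```

An allow-list fails closed: an artifact added later is withheld from every phase until someone explicitly grants access to it.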
| Score | Meaning |
|---|---|
| +1 | Minor — unlikely edge case, low impact |
| +5 | Significant — affects correctness under reachable conditions |
| +10 | Critical — security vulnerability, data loss, or system corruption |
For quick informal reviews, just use Nyx directly in normal mode.