sadd:judge by neolabhq/context-engineering-kit
npx skills add https://github.com/neolabhq/context-engineering-kit --skill sadd:judge
Before launching the evaluation pipeline, identify what needs evaluation:
Identify the work to evaluate:
- Check the conversation history for completed work
- If arguments were provided: use them to focus on specific aspects
- If unclear: ask the user "What work should I evaluate? (code changes, analysis, documentation, etc.)"

Extract evaluation context:
- The original task or request that prompted the work
- The actual outputs/results produced
- Files created or modified, with brief descriptions
- Any constraints, requirements, or acceptance criteria mentioned
- The artifact type (code, documentation, configuration, etc.)

Present the scope to the user:
Evaluation Scope:
- Original request: [summary]
- Work produced: [description]
- Files involved: [list]
- Artifact type: [code | documentation | configuration | etc.]
- Evaluation focus: [from arguments or "general quality"]
Launching meta-judge to generate evaluation criteria...
IMPORTANT: Pass only the extracted context to the sub-agents, not the entire conversation. This prevents context pollution and enables focused assessment.
Launch a meta-judge agent to generate an evaluation specification tailored to the specific work being evaluated. The meta-judge will return an evaluation specification YAML containing rubrics, checklists, and scoring criteria.
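The actual structure of the returned specification is defined by the meta-judge agent itself, not by this skill. Purely as a hypothetical illustration, assuming invented criterion names, weights, and field names, a spec containing rubrics, checklists, and scoring criteria might look like:

```yaml
# Hypothetical illustration only - the real structure is whatever sadd:meta-judge emits.
# All criterion names, weights, and field names below are invented.
rubric:
  - criterion: correctness
    weight: 0.4
    description: The work does what the original request asked for
  - criterion: clarity
    weight: 0.3
    description: Output is readable and well structured
  - criterion: robustness
    weight: 0.3
    description: Edge cases and failure paths are handled
checklist:
  - Every acceptance criterion from the request is addressed
  - No unrelated files were modified
scoring:
  scale: "1.00-5.00"
  method: weighted average of rubric criterion scores
```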
Meta-Judge Prompt:
## Task
Generate an evaluation specification YAML for the following evaluation task. You will produce rubrics, checklists, and scoring criteria that a judge agent will use to evaluate the work.
CLAUDE_PLUGIN_ROOT=`${CLAUDE_PLUGIN_ROOT}`
## User Prompt
{Original task or request that prompted the work}
## Context
{Any relevant context about the work being evaluated}
{Evaluation focus from arguments, or "General quality assessment"}
## Artifact Type
{code | documentation | configuration | etc.}
## Instructions
Return only the final evaluation specification YAML in your response.
Dispatch:
Use Task tool:
- description: "Meta-judge: Generate evaluation criteria for {brief work summary}"
- prompt: {meta-judge prompt}
- model: opus
- subagent_type: "sadd:meta-judge"
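As a concrete sketch of the dispatch above (the work summary and prompt body here are placeholder values invented for illustration, not part of the skill), a filled-in Task tool call might look like:

```yaml
# Placeholder example of the Task tool fields - the description and prompt
# contents are invented; only the field names mirror the dispatch above.
description: "Meta-judge: Generate evaluation criteria for auth-module refactor"
prompt: |
  ## Task
  Generate an evaluation specification yaml for the following evaluation task. ...
  ## User Prompt
  Refactor the authentication module to use token-based sessions.
  ## Artifact Type
  code
model: opus
subagent_type: "sadd:meta-judge"
```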
Wait for the meta-judge to complete before proceeding to Phase 3.
After the meta-judge completes, extract its evaluation specification YAML and dispatch the judge agent with both the work context and the specification.
CRITICAL: Provide the judge with the EXACT meta-judge evaluation specification YAML. Do not skip, add, modify, shorten, or summarize any text in it!
Judge Agent Prompt:
You are an Expert Judge evaluating the quality of work against an evaluation specification produced by the meta-judge.
CLAUDE_PLUGIN_ROOT=`${CLAUDE_PLUGIN_ROOT}`
## Work Under Evaluation
[ORIGINAL TASK]
{paste the original request/task}
[/ORIGINAL TASK]
[WORK OUTPUT]
{summary of what was created/modified}
[/WORK OUTPUT]
[FILES INVOLVED]
{list of files with brief descriptions}
[/FILES INVOLVED]
## Evaluation Specification
```yaml
{meta-judge's evaluation specification YAML}
```
Follow your full judge process as defined in your agent instructions!
CRITICAL: You must reply with this exact structured evaluation report format in YAML at the START of your response!
CRITICAL: NEVER provide the score threshold to the judge in any format. The judge MUST NOT know what the score threshold is, so that its evaluation is not biased!!!
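The binding report format lives in the sadd:judge agent definition, not in this document. Purely as a hypothetical sketch, with every field name below invented for illustration, a structured evaluation report could open like this:

```yaml
# Hypothetical sketch - field names are invented; the agent's own
# instructions define the real report format.
evaluation:
  overall_score: 4.2
  criteria:
    - name: correctness
      score: 4.5
      evidence: "All acceptance criteria are met"
    - name: clarity
      score: 4.0
      evidence: "Naming is consistent; one module lacks comments"
  summary: "Solid quality, minor improvements optional"
```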
**Dispatch:**
Use Task tool:
- description: "Judge: Evaluate {brief work summary}"
- prompt: {judge prompt with exact meta-judge specification YAML}
- model: opus
- subagent_type: "sadd:judge"
After receiving the judge's evaluation:
- Validate the evaluation:
- If validation fails:
- Present results to the user:
| Score Range | Verdict | Interpretation | Recommendation |
|---|---|---|---|
| 4.50 - 5.00 | EXCELLENT | Exceptional quality, exceeds expectations | Ready as-is |
| 4.00 - 4.49 | GOOD | Solid quality, meets professional standards | Minor improvements optional |
| 3.50 - 3.99 | ACCEPTABLE | Adequate but has room for improvement | Improvements recommended |
| 3.00 - 3.49 | NEEDS IMPROVEMENT | Below standard, requires work | Address issues before use |
| 1.00 - 2.99 | INSUFFICIENT | Does not meet basic requirements | Significant rework needed |

Weekly Installs: 228
Repository: https://github.com/neolabhq/context-engineering-kit
GitHub Stars: 699
First Seen: Feb 19, 2026
Installed on: opencode (223), codex (222), github-copilot (221), gemini-cli (220), kimi-cli (218), cursor (218)