Skill Judge: AI Agent Skill Evaluation Tool, Distilled from Official Specifications and 17+ Examples, Optimizing Knowledge Delta and Thinking Patterns | SkillsMD

skill-judge by davila7/claude-code-templates

163 weekly installs

23,400 GitHub Stars

Install command

npx skills add https://github.com/davila7/claude-code-templates --skill skill-judge

AI/Machine Learning, Code Standards, Prompt Engineering

Skill Judge

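Evaluate Agent Skills against the official specifications and the patterns distilled from 17+ official examples.

Core Philosophy

What Is a Skill?

A Skill is not a tutorial. A Skill is a knowledge externalization mechanism.

Traditionally, an AI model's knowledge is locked in its parameters. To teach it a new capability: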

Traditional: Collect data → GPU cluster → Train → Deploy new version
Cost: $10,000 - $1,000,000+
Timeline: Weeks to months

Skills change this:

Skill: Edit SKILL.md → Save → Takes effect on next invocation
Cost: $0
Timeline: Instant

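This is a paradigm shift from "training AI" to "educating AI", much like a hot-swappable LoRA adapter that requires no training. You edit a Markdown file in natural language, and the model's behavior changes.

Core Formula

Good Skill = Expert-only Knowledge − What Claude Already Knows

A Skill's value is measured by its knowledge delta: the gap between the knowledge it provides and what the model already knows.

Expert-only knowledge: decision trees, trade-offs, edge cases, anti-patterns, and domain-specific thinking frameworks, all of which take years of experience to accumulate
What Claude already knows: basic concepts, standard library usage, common programming patterns, general best practices

When a Skill explains "what a PDF is" or "how to write a for loop", it is compressing knowledge Claude already has. That wastes tokens: the context window is a shared resource, used by the system prompt, the conversation history, other Skills, and the user's request.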

Tools vs. Skills

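Concept	Essence	Function	Examples
Tool	What the model can do	Executes actions	bash, read_file, write_file, WebSearch
Skill	What the model knows how to do	Guides decisions	PDF processing, MCP building, frontend design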

General Agent + Excellent Skill = Domain Expert Agent

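Three Types of Knowledge in a Skill

Type	Definition	Handling
Expert	Claude genuinely does not know this	Must keep: this is the Skill's value
Activation	Claude knows this but may not think of it	Keep if brief: serves as a reminder
Redundant	Claude definitely knows this	Should delete: wastes tokens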
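
The three types can be turned into a rough ratio once a reviewer has labeled each section of a Skill by hand. The sketch below is only an illustration: the single-letter labels and the `knowledge_ratio` helper are assumptions, not part of the skill-judge Skill itself.

```python
from collections import Counter

# Hypothetical labels for the three knowledge types described above.
EXPERT, ACTIVATION, REDUNDANT = "E", "A", "R"


def knowledge_ratio(labels):
    """Return the Expert, Activation, and Redundant shares as whole percentages."""
    counts = Counter(labels)
    total = sum(counts[k] for k in (EXPERT, ACTIVATION, REDUNDANT))
    if total == 0:
        return (0, 0, 0)
    return tuple(
        round(100 * counts[k] / total) for k in (EXPERT, ACTIVATION, REDUNDANT)
    )


# Example: a reviewer labeled ten sections of a Skill by hand.
labels = ["E", "E", "E", "E", "E", "E", "A", "A", "R", "R"]
print("E:A:R = %d:%d:%d" % knowledge_ratio(labels))
```

The labeling itself still requires judgment; the code only does the arithmetic.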

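Evaluation Dimensions (120 Points Total)

D1: Knowledge Delta (20 points), the core dimension

Score	Criteria
0-5	Explains basics Claude already knows (what X is, how to write code, standard library tutorials)
6-10	Mixed: some expert knowledge diluted by obvious content
11-15	Mostly expert knowledge with minimal redundancy
16-20	Pure knowledge delta: every paragraph earns its tokens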

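D2: Mindset + Appropriate Procedures (15 points)

Type	Example	Value
Thinking pattern	"Before designing, ask: what makes this memorable?"	High: shapes decisions
Domain-specific procedure	"OOXML workflow: unpack → edit XML → validate → pack"	High: Claude may not know this
Generic procedure	"Step 1: open the file, Step 2: edit, Step 3: save"	Low: Claude already knows this

Score	Criteria
0-3	Only generic procedures Claude already knows
4-7	Has domain-specific procedures but lacks thinking frameworks
8-11	Good balance: thinking patterns + domain-specific workflows
12-15	Expert-level: shapes thinking and provides procedures Claude would not know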

Before [action], ask yourself:
- **Purpose**: What problem does this solve? Who uses it?
- **Constraints**: What are the hidden requirements?
- **Differentiation**: What makes this solution memorable?

### Redlining Workflow (Claude wouldn't know this sequence)
1. Convert to markdown: `pandoc --track-changes=all`
2. Map text to XML: grep for text in document.xml
3. Implement changes in batches of 3-10
4. Pack and verify: check ALL changes were applied

Step 1: Open the file
Step 2: Find the section
Step 3: Make the change
Step 4: Save and test

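D3: Anti-Pattern Quality (15 points)

Score	Criteria
0-3	No anti-patterns mentioned
4-7	Generic warnings ("avoid errors", "be careful", "consider edge cases")
8-11	A specific NEVER list with some reasoning
12-15	Expert-level anti-patterns with the reasons explained: things only experience teaches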

NEVER use generic AI-generated aesthetics like:
- Overused font families (Inter, Roboto, Arial)
- Cliched color schemes (particularly purple gradients on white backgrounds)
- Predictable layouts and component patterns
- Default border-radius on everything

Avoid making mistakes.
Be careful with edge cases.
Don't write bad code.
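
A first pass for generic warnings like the ones above can be automated with a few regular expressions. This is a minimal sketch: the phrase list is taken from the examples in this section and is not exhaustive, and `find_vague_warnings` is a hypothetical helper name.

```python
import re

# Illustrative phrase list based on the examples above; not exhaustive.
VAGUE_PATTERNS = [
    r"\bbe careful\b",
    r"\bavoid (making )?(mistakes|errors)\b",
    r"\bconsider edge cases\b",
    r"\bdon't write bad code\b",
]


def find_vague_warnings(text):
    """Return the lines of text that contain a generic warning phrase."""
    hits = []
    for line in text.splitlines():
        if any(re.search(p, line, re.IGNORECASE) for p in VAGUE_PATTERNS):
            hits.append(line.strip())
    return hits


sample = (
    "Avoid making mistakes.\n"
    "Be careful with edge cases.\n"
    "NEVER use Inter as the default font.\n"
)
for hit in find_vague_warnings(sample):
    print("vague:", hit)
```

A match only flags a line for review; whether a warning carries a specific, non-obvious reason still has to be judged by reading it.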

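D4: Specification Compliance, Especially the Description (15 points)

Score	Criteria
0-5	Frontmatter missing or format invalid
6-10	Has frontmatter, but the description is vague or incomplete
11-13	Valid frontmatter; the description covers WHAT but is weak on WHEN
14-15	Perfect: a comprehensive description with WHAT, WHEN, and trigger keywords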

┌─────────────────────────────────────────────────────────────────────┐
│  SKILL ACTIVATION FLOW                                              │
│                                                                     │
│  User Request → Agent sees ALL skill descriptions → Decides which  │
│                 (only descriptions, not bodies!)     to activate    │
│                                                                     │
│  If description doesn't match → Skill NEVER gets loaded            │
│  If description is vague → Skill might not trigger when it should  │
│  If description lacks keywords → Skill is invisible to the Agent   │
└─────────────────────────────────────────────────────────────────────┘

description: "Comprehensive document creation, editing, and analysis with support
for tracked changes, comments, formatting preservation, and text extraction.
When Claude needs to work with professional documents (.docx files) for:
(1) Creating new documents, (2) Modifying or editing content,
(3) Working with tracked changes, (4) Adding comments, or any other document tasks"

description: "Handles document-related functions"

description: "A helpful skill for various tasks"
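
The contrast between the descriptions above can be approximated with a few mechanical checks. The sketch below is a heuristic under stated assumptions: the 8-word minimum, the use of the word "when" as a WHEN signal, and the caller-supplied keyword list are all illustrative choices, and `check_frontmatter` is a hypothetical helper.

```python
import re


def check_frontmatter(name, description, keywords):
    """Return a list of problems found in a Skill's name and description."""
    problems = []
    if not name or name != name.lower() or len(name) > 64:
        problems.append("name should be lowercase and at most 64 characters")
    # Assumption: fewer than 8 words is too short to say WHAT the Skill does.
    if len(description.split()) < 8:
        problems.append("description is too short to explain WHAT")
    # Assumption: the word "when" signals a WHEN clause.
    if not re.search(r"\bwhen\b", description, re.IGNORECASE):
        problems.append("description does not say WHEN to use the Skill")
    lowered = description.lower()
    if not any(k.lower() in lowered for k in keywords):
        problems.append("description contains none of the expected keywords")
    return problems


good = (
    "Create, edit, and analyze .docx files. Use when working with "
    "Word documents, tracked changes, or professional document formatting."
)
bad = "A helpful skill for various tasks"
print(check_frontmatter("docx", good, ["docx", "tracked changes"]))
print(check_frontmatter("docx", bad, ["docx", "tracked changes"]))
```

Passing these checks does not make a description good; failing them is a cheap signal that the Skill may never be activated.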

D5: Progressive Disclosure (15 points)

Layer 1: Metadata (always in memory)
         Only name + description
         ~100 tokens per skill

Layer 2: SKILL.md Body (loaded after triggering)
         Detailed guidelines, code examples, decision trees
         Ideal: < 500 lines

Layer 3: Resources (loaded on demand)
         scripts/, references/, assets/
         No limit

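Score	Criteria
0-5	Everything dumped into SKILL.md (>500 lines, no structure)
6-10	Has references, but it is unclear when to load them
11-13	Good layering, with MANDATORY loading triggers present
14-15	Perfect: decision trees + explicit triggers + "Do NOT load" guidance

Trigger quality	Characteristics
Poor	References listed at the end, with no loading guidance
Fair	Some triggers, but not embedded in the workflow
Good	MANDATORY loading triggers inside workflow steps
Excellent	Scenario detection + conditional triggers + "Do NOT load"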

Loading too little ◄─────────────────────────────────► Loading too much
- References sit unused                    - Wastes context space
- Agent doesn't know when to load          - Irrelevant info dilutes key content
- Knowledge is there but never accessed    - Unnecessary token overhead

### Creating New Document

**MANDATORY - READ ENTIRE FILE**: Before proceeding, you MUST read
[`docx-js.md`](docx-js.md) (~500 lines) completely from start to finish.
**NEVER set any range limits when reading this file.**

**Do NOT load** `ooxml.md` or `redlining.md` for this task.

## References
- docx-js.md - for creating documents
- ooxml.md - for editing
- redlining.md - for tracking changes
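
The difference between an embedded trigger and a passive listing can be checked roughly by looking at where each reference file is mentioned. This sketch assumes that anything after a `## References` heading is a passive listing; `classify_references` is a hypothetical helper, and it does not distinguish a "Do NOT load" mention from a real trigger.

```python
def classify_references(skill_md_text, reference_files):
    """Split reference files into triggered, listed-only, and orphaned."""
    # Assumption: text after a "## References" heading is a passive listing.
    body, _, listing = skill_md_text.partition("## References")
    result = {"triggered": [], "listed_only": [], "orphaned": []}
    for name in reference_files:
        if name in body:
            result["triggered"].append(name)
        elif name in listing:
            result["listed_only"].append(name)
        else:
            result["orphaned"].append(name)
    return result


skill_text = (
    "### Creating New Document\n"
    "**MANDATORY - READ ENTIRE FILE**: you MUST read docx-js.md completely.\n"
    "\n"
    "## References\n"
    "- docx-js.md - for creating documents\n"
    "- ooxml.md - for editing\n"
)
print(classify_references(skill_text, ["docx-js.md", "ooxml.md", "redlining.md"]))
```

Files in `listed_only` or `orphaned` are the ones the Agent is unlikely to load at the right moment.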

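D6: Freedom Calibration (15 points)

Score	Criteria
0-5	Severe mismatch (rigid scripts for creative tasks, vague guidance for fragile operations)
6-10	Partially appropriate, with some mismatches
11-13	Well calibrated for most scenarios
14-15	Perfectly calibrated freedom throughout

Task type	Should have	Why	Example Skills
Creative/design	High freedom	Multiple valid approaches; differentiation is the value	frontend-design
Code review	Medium freedom	Principles exist, but judgment is required	code-review
File format operations	Low freedom	One wrong byte corrupts the file; consistency is critical	docx, xlsx, pdf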

Commit to a BOLD aesthetic direction. Pick an extreme: brutally minimal,
maximalist chaos, retro-futuristic, organic natural...

Review priority:
1. Security vulnerabilities (must fix)
2. Logic errors (must fix)
3. Performance issues (should fix)
4. Maintainability (optional)

**MANDATORY**: Use exact script in `scripts/create-doc.py`
Parameters: --title "X" --author "Y"
Do NOT modify the script.

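D7: Pattern Recognition (10 points)

Pattern	~Lines	Key characteristics	Example	When to use
Mindset	~50	Thinking > technique, strong NEVER list, high freedom	frontend-design	Creative tasks requiring taste
Navigation	~30	Minimal SKILL.md that routes to sub-files	internal-comms	Multiple distinct scenarios
Philosophy	~150	Two steps: philosophy → expression, with emphasis on craft	canvas-design	Art/creation requiring originality
Process	~200	Phased workflow, checkpoints, medium freedom	mcp-builder	Complex multi-step projects
Tool	~300	Decision trees, code examples, low freedom	docx, pdf, xlsx	Precise operations on specific formats

Score	Criteria
0-3	No recognizable pattern; chaotic structure
4-6	Partially follows a pattern, with significant deviations
7-8	Clear pattern with minor deviations
9-10	Masterful application of the appropriate pattern

Your task's characteristics	Recommended pattern
Requires taste and creativity	Mindset (~50 lines)
Requires originality and craft quality	Philosophy (~150 lines)
Has multiple distinct sub-scenarios	Navigation (~30 lines)
Complex multi-step project	Process (~200 lines)
Precise operations on a specific format	Tool (~300 lines)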

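D8: Practical Usability (15 points)

Score	Criteria
0-5	Guidance is confusing, incomplete, contradictory, or untested
6-10	Usable, but with noticeable gaps
11-13	Clear guidance for common cases
14-15	Comprehensive coverage, including edge cases and error handling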

| Task | Primary Tool | Fallback | When to Use Fallback |
|------|-------------|----------|----------------------|
| Read text | pdftotext | PyMuPDF | Need layout info |
| Extract tables | camelot-py | tabula-py | camelot fails |

**Common issues**:
- Scanned PDF: pdftotext returns blank → Use OCR first
- Encrypted PDF: Permission error → Use PyMuPDF with password

Use appropriate tools for PDF processing.
Handle errors properly.
Consider edge cases.

[ ] Check frontmatter validity
[ ] Count total lines in SKILL.md
[ ] List all reference files and their sizes
[ ] Identify which pattern the Skill follows
[ ] Check for loading triggers (if references exist)
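
Most of the checklist above can be gathered mechanically before any scoring starts. The sketch below assumes a Skill is a directory containing `SKILL.md` plus optional sub-folders, and that frontmatter is a block delimited by `---` lines at the top of the file; `structure_summary` is a hypothetical helper, and it does not validate the YAML itself.

```python
from pathlib import Path


def structure_summary(skill_dir):
    """Collect basic structural facts about a Skill directory."""
    root = Path(skill_dir)
    lines = (root / "SKILL.md").read_text(encoding="utf-8").splitlines()
    # Assumption: frontmatter opens on line 1 with '---' and closes with '---'.
    has_frontmatter = (
        bool(lines)
        and lines[0].strip() == "---"
        and any(line.strip() == "---" for line in lines[1:])
    )
    # Every file other than SKILL.md is treated as a reference; sizes in bytes.
    references = {
        p.relative_to(root).as_posix(): p.stat().st_size
        for p in sorted(root.rglob("*"))
        if p.is_file() and p.name != "SKILL.md"
    }
    return {
        "has_frontmatter": has_frontmatter,
        "line_count": len(lines),
        "over_500_lines": len(lines) > 500,
        "references": references,
    }


# Usage (the path is a placeholder): structure_summary("path/to/skill")
```

Identifying the pattern and checking loading triggers remain manual steps.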

Total = D1 + D2 + D3 + D4 + D5 + D6 + D7 + D8
Max = 120 points

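Grade	Percentage	Meaning
A	90%+ (108+)	Excellent: a production-ready, expert-level Skill
B	80-89% (96-107)	Good: needs minor improvements
C	70-79% (84-95)	Adequate: has a clear path to improvement
D	60-69% (72-83)	Below average: has significant problems
F	<60% (<72)	Poor: needs a fundamental redesign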
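
The total and the grade thresholds above translate directly into a few lines of code. This is a minimal sketch: the per-dimension maxima come from the dimension list in this document, while the `grade` function and its error handling are illustrative assumptions.

```python
# Maximum points per dimension, as listed in this document (sum = 120).
MAX_SCORES = {
    "D1": 20, "D2": 15, "D3": 15, "D4": 15,
    "D5": 15, "D6": 15, "D7": 10, "D8": 15,
}


def grade(scores):
    """Return (total, percent, letter) for a dict of dimension scores."""
    if set(scores) != set(MAX_SCORES):
        raise ValueError("scores must cover exactly D1..D8")
    for dim, value in scores.items():
        if not 0 <= value <= MAX_SCORES[dim]:
            raise ValueError(f"{dim} must be between 0 and {MAX_SCORES[dim]}")
    total = sum(scores.values())
    percent = 100 * total / sum(MAX_SCORES.values())
    for letter, threshold in (("A", 90), ("B", 80), ("C", 70), ("D", 60)):
        if percent >= threshold:
            return total, percent, letter
    return total, percent, "F"


example = {"D1": 16, "D2": 12, "D3": 12, "D4": 12,
           "D5": 12, "D6": 12, "D7": 8, "D8": 12}
print(grade(example))
```

The thresholds are compared against the percentage, which matches the point ranges in the table (for example, 107 points is 89.2% and therefore a B).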

# Skill Evaluation Report: [Skill Name]

## Summary
- **Total Score**: X/120 (X%)
- **Grade**: [A/B/C/D/F]
- **Pattern**: [Mindset/Navigation/Philosophy/Process/Tool]
- **Knowledge Ratio**: E:A:R = X:Y:Z
- **Verdict**: [One sentence assessment]

## Dimension Scores

| Dimension | Score | Max | Notes |
|-----------|-------|-----|-------|
| D1: Knowledge Delta | X | 20 | |
| D2: Mindset vs Mechanics | X | 15 | |
| D3: Anti-Pattern Quality | X | 15 | |
| D4: Specification Compliance | X | 15 | |
| D5: Progressive Disclosure | X | 15 | |
| D6: Freedom Calibration | X | 15 | |
| D7: Pattern Recognition | X | 10 | |
| D8: Practical Usability | X | 15 | |

## Critical Issues
[List must-fix problems that significantly impact the Skill's effectiveness]

## Top 3 Improvements
1. [Highest impact improvement with specific guidance]
2. [Second priority improvement]
3. [Third priority improvement]

## Detailed Analysis
[For each dimension scoring below 80%, provide:
- What's missing or problematic
- Specific examples from the Skill
- Concrete suggestions for improvement]
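
The Dimension Scores table in the template, together with the 80% rule for the Detailed Analysis section, can be filled in programmatically. The sketch below is one possible rendering: the dimension names and maxima are copied from the template, while `render_dimension_table` and the wording of the note are assumptions.

```python
# Dimension names and maxima copied from the report template above.
DIMENSIONS = [
    ("D1: Knowledge Delta", 20),
    ("D2: Mindset vs Mechanics", 15),
    ("D3: Anti-Pattern Quality", 15),
    ("D4: Specification Compliance", 15),
    ("D5: Progressive Disclosure", 15),
    ("D6: Freedom Calibration", 15),
    ("D7: Pattern Recognition", 10),
    ("D8: Practical Usability", 15),
]


def render_dimension_table(scores):
    """Render the score table; flag dimensions scoring below 80% of their max."""
    if len(scores) != len(DIMENSIONS):
        raise ValueError("expected one score per dimension")
    rows = [
        "| Dimension | Score | Max | Notes |",
        "|-----------|-------|-----|-------|",
    ]
    for (name, max_score), score in zip(DIMENSIONS, scores):
        # Integer comparison avoids floating-point edge cases at exactly 80%.
        note = "needs detailed analysis" if score * 100 < 80 * max_score else ""
        rows.append(f"| {name} | {score} | {max_score} | {note} |")
    return "\n".join(rows)


print(render_dimension_table([10, 12, 12, 12, 12, 12, 8, 12]))
```

In this example only D1 is flagged, since 10 out of 20 is below 80% while every other score sits exactly at the threshold.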

Common Failure Patterns

Pattern 1: The Tutorial

Symptom: Explains what PDF is, how Python works, basic library usage
Root cause: Author assumes Skill should "teach" the model
Fix: Claude already knows this. Delete all basic explanations.
     Focus on expert decisions, trade-offs, and anti-patterns.

Pattern 2: The Dump

Symptom: SKILL.md is 800+ lines with everything included
Root cause: No progressive disclosure design
Fix: Core routing and decision trees in SKILL.md (<300 lines ideal)
     Detailed content in references/, loaded on-demand

Pattern 3: The Orphaned References

Symptom: References directory exists but files are never loaded
Root cause: No explicit loading triggers
Fix: Add "MANDATORY - READ ENTIRE FILE" at workflow decision points
     Add "Do NOT Load" to prevent over-loading

Pattern 4: The Checkbox Procedure

Symptom: Step 1, Step 2, Step 3... mechanical procedures
Root cause: Author thinks in procedures, not thinking frameworks
Fix: Transform into "Before doing X, ask yourself..."
     Focus on decision principles, not operation sequences

Pattern 5: The Vague Warning

Symptom: "Be careful", "avoid errors", "consider edge cases"
Root cause: Author knows things can go wrong but hasn't articulated specifics
Fix: Specific NEVER list with concrete examples and non-obvious reasons
     "NEVER use X because [specific problem that takes experience to learn]"

Pattern 6: The Invisible Skill

Symptom: Great content but skill rarely gets activated
Root cause: Description is vague, missing keywords, or lacks trigger scenarios
Fix: Description must answer WHAT, WHEN, and include KEYWORDS
     "Use when..." + specific scenarios + searchable terms

Example fix:
BAD:  "Helps with document tasks"
GOOD: "Create, edit, and analyze .docx files. Use when working with
       Word documents, tracked changes, or professional document formatting."

Pattern 7: The Wrong Location

Symptom: "When to use this Skill" section in body, not in description
Root cause: Misunderstanding of three-layer loading
Fix: Move all triggering information to description field
     Body is only loaded AFTER triggering decision is made

Pattern 8: The Over-Engineered Skill

Symptom: README.md, CHANGELOG.md, INSTALLATION_GUIDE.md, CONTRIBUTING.md
Root cause: Treating Skill like a software project
Fix: Delete all auxiliary files. Only include what Agent needs for the task.
     No documentation about the Skill itself.

Pattern 9: The Freedom Mismatch

Symptom: Rigid scripts for creative tasks, vague guidance for fragile operations
Root cause: Not considering task fragility
Fix: High freedom for creative (principles, not steps)
     Low freedom for fragile (exact scripts, no parameters)

┌─────────────────────────────────────────────────────────────────────────┐
│  SKILL EVALUATION QUICK CHECK                                           │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  KNOWLEDGE DELTA (most important):                                      │
│    [ ] No "What is X" explanations for basic concepts                   │
│    [ ] No step-by-step tutorials for standard operations                │
│    [ ] Has decision trees for non-obvious choices                       │
│    [ ] Has trade-offs only experts would know                           │
│    [ ] Has edge cases from real-world experience                        │
│                                                                         │
│  MINDSET + PROCEDURES:                                                  │
│    [ ] Transfers thinking patterns (how to think about problems)        │
│    [ ] Has "Before doing X, ask yourself..." frameworks                 │
│    [ ] Includes domain-specific procedures Claude wouldn't know         │
│    [ ] Distinguishes valuable procedures from generic ones              │
│                                                                         │
│  ANTI-PATTERNS:                                                         │
│    [ ] Has explicit NEVER list                                          │
│    [ ] Anti-patterns are specific, not vague                            │
│    [ ] Includes WHY (non-obvious reasons)                               │
│                                                                         │
│  SPECIFICATION (description is critical!):                              │
│    [ ] Valid YAML frontmatter                                           │
│    [ ] name: lowercase, ≤64 chars                                       │
│    [ ] description answers: WHAT does it do?                            │
│    [ ] description answers: WHEN should it be used?                     │
│    [ ] description contains trigger KEYWORDS                            │
│    [ ] description is specific enough for Agent to know when to use     │
│    [ ] Includes scenarios where this skill MUST be used                 │
│        (not just "can be used")                                         │
│                                                                         │
│  STRUCTURE:                                                             │
│    [ ] SKILL.md < 500 lines (ideal < 300)                               │
│    [ ] Heavy content in references/                                     │
│    [ ] Loading triggers embedded in workflow                            │
│    [ ] Has "Do NOT Load" for preventing over-loading                    │
│                                                                         │
│  FREEDOM:                                                               │
│    [ ] Creative tasks → High freedom (principles)                       │
│    [ ] Fragile operations → Low freedom (exact scripts)                 │
│                                                                         │
│  USABILITY:                                                             │
│    [ ] Decision trees for multi-path scenarios                          │
│    [ ] Working code examples                                            │
│    [ ] Error handling and fallbacks                                     │
│    [ ] Edge cases covered                                               │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

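Skill Judge

Core Philosophy

What Is a Skill?

Core Formula

Tools vs. Skills

Three Types of Knowledge in a Skill

Evaluation Dimensions (120 Points Total)

D1: Knowledge Delta (20 points), the core dimension

D2: Mindset + Appropriate Procedures (15 points)

D3: Anti-Pattern Quality (15 points)

D4: Specification Compliance, Especially the Description (15 points)

D5: Progressive Disclosure (15 points)

D6: Freedom Calibration (15 points)

D7: Pattern Recognition (10 points)

D8: Practical Usability (15 points)

What NEVER to Do When Evaluating

Evaluation Protocol

Step 1: First Pass, Knowledge Delta Scan

Step 2: Structure Analysis

Step 3: Score Each Dimension

Step 4: Calculate Total Score and Grade

Step 5: Generate Report

Common Failure Patterns

Pattern 1: The Tutorial

Pattern 2: The Dump

Pattern 3: The Orphaned References

Pattern 4: The Checkbox Procedure

Pattern 5: The Vague Warning

Pattern 6: The Invisible Skill

Pattern 7: The Wrong Location

Pattern 8: The Over-Engineered Skill

Pattern 9: The Freedom Mismatch

Quick Reference Checklist

The Meta-Question

Self-Evaluation Note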