npx skills add https://github.com/0x2e/superpowers --skill writing-skills写作技能是应用于流程文档的测试驱动开发。
个人技能存放在特定代理的目录中(Claude Code 使用 ~/.claude/skills,Codex 使用 ~/.agents/skills/)
你编写测试用例(包含子代理的压力场景),观察它们失败(基线行为),编写技能(文档),观察测试通过(代理遵守),然后重构(堵住漏洞)。
核心原则: 如果你没有观察过代理在没有该技能时的失败情况,你就不知道这个技能是否教授了正确的东西。
必备背景: 在使用此技能之前,你必须理解 superpowers:test-driven-development。该技能定义了基本的 RED-GREEN-REFACTOR 循环。本技能将 TDD 应用于文档编写。
官方指导: 关于 Anthropic 官方的技能编写最佳实践,请参阅 anthropic-best-practices.md。本文档提供了额外的模式和指南,是对本技能中 TDD 重点方法的补充。
技能 是针对已验证技术、模式或工具的参考指南。技能帮助未来的 Claude 实例发现并应用有效的方法。
技能是: 可重用的技术、模式、工具、参考指南
技能不是: 关于你如何一次性解决问题的叙述
| TDD 概念 | 技能创建 |
|---|---|
| 测试用例 | 包含子代理的压力场景 |
| 生产代码 | 技能文档 (SKILL.md) |
| 测试失败 (RED) | 代理在没有技能时违反规则(基线) |
| 测试通过 (GREEN) | 代理在有技能时遵守规则 |
| 重构 | 在保持遵守的同时堵住漏洞 |
| 先写测试 | 在编写技能之前运行基线场景 |
| 观察失败 | 记录代理使用的确切合理化理由 |
| 最小化代码 | 编写针对这些特定违规行为的技能 |
| 观察通过 | 验证代理现在是否遵守 |
| 重构循环 | 发现新的合理化理由 → 堵住 → 重新验证 |
整个技能创建过程遵循 RED-GREEN-REFACTOR。
在以下情况创建:
不要为以下情况创建:
具有具体步骤的方法(condition-based-waiting, root-cause-tracing)
思考问题的方式(flatten-with-flags, test-invariants)
API 文档、语法指南、工具文档(office docs)
skills/
skill-name/
SKILL.md # 主要参考(必需)
supporting-file.* # 仅在需要时
扁平命名空间 - 所有技能在一个可搜索的命名空间中
为以下情况使用单独文件:
保持内联:
Frontmatter (YAML):
name 和 descriptionname: 仅使用字母、数字和连字符(无括号、特殊字符)description: 第三人称,仅描述何时使用(而非其作用)
---
name: Skill-Name-With-Hyphens
description: Use when [specific triggering conditions and symptoms]
---
# 技能名称
## 概述
这是什么?用 1-2 句话说明核心原则。
## 何时使用
[如果决策不直观,使用小型内联流程图]
包含症状和使用场景的要点列表
何时不使用
## 核心模式(针对技术/模式)
前后代码对比
## 快速参考
用于扫描常见操作的表格或要点
## 实现
简单模式的内联代码
大量参考或可重用工具的文件链接
## 常见错误
出错的地方 + 修复方法
## 实际影响(可选)
具体结果
对发现至关重要: 未来的 Claude 需要找到你的技能
目的: Claude 读取描述来决定为给定任务加载哪些技能。让它回答:"我现在应该阅读这个技能吗?"
格式: 以 "Use when..." 开头,专注于触发条件
关键:描述 = 何时使用,而非技能的作用
描述应仅描述触发条件。切勿在描述中总结技能的过程或工作流。
为什么这很重要: 测试发现,当描述总结了技能的工作流时,Claude 可能会遵循描述而不是阅读完整的技能内容。一个写着 "code review between tasks" 的描述导致 Claude 只进行一次审查,即使技能的流程图清楚地显示了两次审查(规范符合性审查,然后是代码质量审查)。
当描述改为仅 "Use when executing implementation plans with independent tasks in the current session"(无工作流总结)时,Claude 正确地阅读了流程图并遵循了两阶段审查过程。
陷阱: 总结工作流的描述创建了一个 Claude 会采用的捷径。技能主体变成了 Claude 会跳过的文档。
# ❌ 错误:总结工作流 - Claude 可能遵循这个而不是阅读技能
description: Use when executing plans - dispatches subagent per task with code review between tasks
# ❌ 错误:过程细节过多
description: Use for TDD - write test first, watch it fail, write minimal code, refactor
# ✅ 良好:仅触发条件,无工作流总结
description: Use when executing implementation plans with independent tasks in the current session
# ✅ 良好:仅触发条件
description: Use when implementing any feature or bugfix, before writing implementation code
内容:
# ❌ 错误:太抽象、模糊,未包含何时使用
description: For async testing
# ❌ 错误:第一人称
description: I can help you with async tests when they're flaky
# ❌ 错误:提及技术但技能并非特定于它
description: Use when tests use setTimeout/sleep and are flaky
# ✅ 良好:以 "Use when" 开头,描述问题,无工作流
description: Use when tests have race conditions, timing dependencies, or pass/fail inconsistently
# ✅ 良好:技术特定技能,具有明确的触发因素
description: Use when using React Router and handling authentication redirects
使用 Claude 会搜索的词语:
使用主动语态,动词优先:
creating-skills 而非 skill-creationcondition-based-waiting 而非 async-test-helpers问题: 入门和频繁引用的技能会加载到每一次对话中。每一个令牌都很重要。
目标字数:
技巧:
将细节移至工具帮助:
# ❌ 错误:在 SKILL.md 中记录所有标志
search-conversations supports --text, --both, --after DATE, --before DATE, --limit N
# ✅ 良好:参考 --help
search-conversations supports multiple modes and filters. Run --help for details.
使用交叉引用:
# ❌ 错误:重复工作流细节
When searching, dispatch subagent with template...
[20 lines of repeated instructions]
# ✅ 良好:引用其他技能
Always use subagents (50-100x context savings). REQUIRED: Use [other-skill-name] for workflow.
压缩示例:
# ❌ 错误:冗长的示例 (42 词)
your human partner: "How did we handle authentication errors in React Router before?"
You: I'll search past conversations for React Router authentication patterns.
[Dispatch subagent with search query: "React Router authentication error handling 401"]
# ✅ 良好:最小化示例 (20 词)
Partner: "How did we handle auth errors in React Router?"
You: Searching...
[Dispatch subagent → synthesis]
消除冗余:
验证:
wc -w skills/path/SKILL.md
# getting-started workflows: aim for <150 each
# Other frequently-loaded: aim for <200 total
按你所做的或核心见解命名:
condition-based-waiting > async-test-helpersusing-skills 而非 skill-usageflatten-with-flags > data-structure-refactoringroot-cause-tracing > debugging-techniques动名词 (-ing) 适用于过程:
creating-skills, testing-skills, debugging-with-logs在编写引用其他技能的文档时:
仅使用技能名称,并带有明确的要求标记:
**REQUIRED SUB-SKILL:** Use superpowers:test-driven-development**REQUIRED BACKGROUND:** You MUST understand superpowers:systematic-debuggingSee skills/testing/test-driven-development(不清楚是否必需)@skills/testing/test-driven-development/SKILL.md(强制加载,消耗上下文)为什么不用 @ 链接: @ 语法会立即强制加载文件,在你需要它们之前就消耗了 200k+ 的上下文。
digraph when_flowchart {
"Need to show information?" [shape=diamond];
"Decision where I might go wrong?" [shape=diamond];
"Use markdown" [shape=box];
"Small inline flowchart" [shape=box];
"Need to show information?" -> "Decision where I might go wrong?" [label="yes"];
"Decision where I might go wrong?" -> "Small inline flowchart" [label="yes"];
"Decision where I might go wrong?" -> "Use markdown" [label="no"];
}
仅在以下情况使用流程图:
切勿在以下情况使用流程图:
关于 graphviz 样式规则,请参阅 @graphviz-conventions.dot。
为你的合作伙伴可视化: 使用此目录中的 render-graphs.js 将技能的流程图渲染为 SVG:
./render-graphs.js ../some-skill # 每个图表单独渲染
./render-graphs.js ../some-skill --combine # 所有图表在一个 SVG 中
一个优秀的示例胜过许多平庸的示例
选择最相关的语言:
良好的示例:
不要:
你擅长移植——一个优秀的示例就足够了。
defense-in-depth/
SKILL.md # 所有内容内联
何时:所有内容都合适,不需要大量参考
condition-based-waiting/
SKILL.md # 概述 + 模式
example.ts # 可供适配的工作助手
何时:工具是可重用代码,而不仅仅是叙述
pptx/
SKILL.md # 概述 + 工作流
pptxgenjs.md # 600 行 API 参考
ooxml.md # 500 行 XML 结构
scripts/ # 可执行工具
何时:参考材料太大,无法内联
NO SKILL WITHOUT A FAILING TEST FIRST
这适用于新技能和对现有技能的编辑。
在测试之前编写技能?删除它。重新开始。未经测试就编辑技能?同样的违规。
没有例外:
必备背景: superpowers:test-driven-development 技能解释了为什么这很重要。同样的原则适用于文档。
不同类型的技能需要不同的测试方法:
示例: TDD, verification-before-completion, designing-before-coding
测试方法:
成功标准: 代理在最大压力下遵循规则
示例: condition-based-waiting, root-cause-tracing, defensive-programming
测试方法:
成功标准: 代理成功地将技术应用于新场景
示例: reducing-complexity, information-hiding concepts
测试方法:
成功标准: 代理正确识别何时/如何应用模式
示例: API 文档、命令参考、库指南
测试方法:
成功标准: 代理找到并正确应用参考信息
| 借口 | 现实 |
|---|---|
| "技能显然很清晰" | 对你清晰 ≠ 对其他代理清晰。测试它。 |
| "它只是一个参考" | 参考可能有空白、不清楚的部分。测试检索。 |
| "测试过度了" | 未经测试的技能有问题。总是如此。15 分钟测试节省数小时。 |
| "如果出现问题我会测试" | 问题 = 代理无法使用技能。在部署前测试。 |
| "测试太繁琐" | 测试比在生产中调试坏技能更不繁琐。 |
| "我确信它很好" | 过度自信保证有问题。无论如何都要测试。 |
| "学术审查就够了" | 阅读 ≠ 使用。测试应用场景。 |
| "没时间测试" | 部署未经测试的技能会浪费更多时间在以后修复它。 |
所有这些都意味着:在部署前测试。没有例外。
强制执行纪律的技能(如 TDD)需要抵制合理化理由。代理很聪明,在压力下会找到漏洞。
心理学说明: 理解为什么说服技巧有效,有助于你系统地应用它们。关于权威、承诺、稀缺性、社会认同和统一性原则的研究基础,请参阅 persuasion-principles.md (Cialdini, 2021; Meincke et al., 2025)。
不要仅仅陈述规则——禁止特定的变通方法:
没有例外:
</Good>
### 处理"精神与字面"的争论
尽早添加基本原则:
```markdown
**违反规则的字面意思就是违反规则的精神。**
这切断了整个"我遵循精神"的合理化理由类别。
从基线测试中捕获合理化理由(参见下面的测试部分)。代理提出的每一个借口都放入表中:
| Excuse | Reality |
|--------|---------|
| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
| "I'll test after" | Tests passing immediately prove nothing. |
| "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
让代理在合理化时容易进行自我检查:
## 危险信号 - 停止并重新开始
- 先写代码后测试
- "我已经手动测试过了"
- "事后测试能达到相同目的"
- "这是关于精神而非仪式"
- "这不一样,因为..."
**所有这些都意味着:删除代码。用 TDD 重新开始。**
添加到描述中:当你即将违反规则时的症状:
description: use when implementing any feature or bugfix, before writing implementation code
遵循 TDD 循环:
在没有技能的情况下运行包含子代理的压力场景。记录确切行为:
这就是"观察测试失败"——你必须在编写技能之前看到代理自然的行为。
编写针对这些特定合理化理由的技能。不要为假设的情况添加额外内容。
使用技能运行相同的场景。代理现在应该遵守。
代理找到了新的合理化理由?添加明确的应对措施。重新测试直到无懈可击。
测试方法: 关于完整的测试方法,请参阅 @testing-skills-with-subagents.md:
"In session 2025-10-03, we found empty projectDir caused..." 为什么不好: 太具体,不可重用
example-js.js, example-py.py, example-go.go 为什么不好: 质量平庸,维护负担重
step1 [label="import fs"];
step2 [label="read file"];
为什么不好: 无法复制粘贴,难以阅读
helper1, helper2, step3, pattern4 为什么不好: 标签应具有语义含义
编写任何技能后,你必须停止并完成部署过程。
不要:
下面的部署清单对每个技能都是强制性的。
部署未经测试的技能 = 部署未经测试的代码。这违反了质量标准。
重要:使用 TodoWrite 为下面的每个清单项创建待办事项。
RED 阶段 - 编写失败的测试:
GREEN 阶段 - 编写最小化技能:
REFACTOR 阶段 - 堵住漏洞:
质量检查:
部署:
未来 Claude 如何找到你的技能:
为此流程优化——尽早并经常放置可搜索的术语。
创建技能就是流程文档的 TDD。
同样的铁律:没有失败的测试就没有技能。同样的循环:RED(基线)→ GREEN(编写技能)→ REFACTOR(堵住漏洞)。同样的好处:更好的质量,更少的意外,无懈可击的结果。
如果你对代码遵循 TDD,对技能也要遵循。这是应用于文档的相同纪律。
每周安装
2
仓库
首次出现
2 天前
安全审计
Gen Agent Trust HubWarnSocketPassSnykPass
安装于
amp2
cline2
opencode2
cursor2
kimi-cli2
codex2
Writing skills IS Test-Driven Development applied to process documentation.
Personal skills live in agent-specific directories (~/.claude/skills for Claude Code, ~/.agents/skills/ for Codex)
You write test cases (pressure scenarios with subagents), watch them fail (baseline behavior), write the skill (documentation), watch tests pass (agents comply), and refactor (close loopholes).
Core principle: If you didn't watch an agent fail without the skill, you don't know if the skill teaches the right thing.
REQUIRED BACKGROUND: You MUST understand superpowers:test-driven-development before using this skill. That skill defines the fundamental RED-GREEN-REFACTOR cycle. This skill adapts TDD to documentation.
Official guidance: For Anthropic's official skill authoring best practices, see anthropic-best-practices.md. This document provides additional patterns and guidelines that complement the TDD-focused approach in this skill.
A skill is a reference guide for proven techniques, patterns, or tools. Skills help future Claude instances find and apply effective approaches.
Skills are: Reusable techniques, patterns, tools, reference guides
Skills are NOT: Narratives about how you solved a problem once
广告位招租
在这里展示您的产品或服务
触达数万 AI 开发者,精准高效
| TDD Concept | Skill Creation |
|---|---|
| Test case | Pressure scenario with subagent |
| Production code | Skill document (SKILL.md) |
| Test fails (RED) | Agent violates rule without skill (baseline) |
| Test passes (GREEN) | Agent complies with skill present |
| Refactor | Close loopholes while maintaining compliance |
| Write test first | Run baseline scenario BEFORE writing skill |
| Watch it fail | Document exact rationalizations agent uses |
| Minimal code | Write skill addressing those specific violations |
| Watch it pass | Verify agent now complies |
| Refactor cycle | Find new rationalizations → plug → re-verify |
The entire skill creation process follows RED-GREEN-REFACTOR.
Create when:
Don't create for:
Concrete method with steps to follow (condition-based-waiting, root-cause-tracing)
Way of thinking about problems (flatten-with-flags, test-invariants)
API docs, syntax guides, tool documentation (office docs)
skills/
skill-name/
SKILL.md # Main reference (required)
supporting-file.* # Only if needed
Flat namespace - all skills in one searchable namespace
Separate files for:
Keep inline:
Frontmatter (YAML):
Only two fields supported: name and description
Max 1024 characters total
name: Use letters, numbers, and hyphens only (no parentheses, special chars)
description: Third-person, describes ONLY when to use (NOT what it does)
What is this? Core principle in 1-2 sentences.
[Small inline flowchart IF decision non-obvious]
Bullet list with SYMPTOMS and use cases When NOT to use
Before/after code comparison
Table or bullets for scanning common operations
Inline code for simple patterns Link to file for heavy reference or reusable tools
What goes wrong + fixes
Concrete results
Critical for discovery: Future Claude needs to FIND your skill
Purpose: Claude reads description to decide which skills to load for a given task. Make it answer: "Should I read this skill right now?"
Format: Start with "Use when..." to focus on triggering conditions
CRITICAL: Description = When to Use, NOT What the Skill Does
The description should ONLY describe triggering conditions. Do NOT summarize the skill's process or workflow in the description.
Why this matters: Testing revealed that when a description summarizes the skill's workflow, Claude may follow the description instead of reading the full skill content. A description saying "code review between tasks" caused Claude to do ONE review, even though the skill's flowchart clearly showed TWO reviews (spec compliance then code quality).
When the description was changed to just "Use when executing implementation plans with independent tasks" (no workflow summary), Claude correctly read the flowchart and followed the two-stage review process.
The trap: Descriptions that summarize workflow create a shortcut Claude will take. The skill body becomes documentation Claude skips.
# ❌ BAD: Summarizes workflow - Claude may follow this instead of reading skill
description: Use when executing plans - dispatches subagent per task with code review between tasks
# ❌ BAD: Too much process detail
description: Use for TDD - write test first, watch it fail, write minimal code, refactor
# ✅ GOOD: Just triggering conditions, no workflow summary
description: Use when executing implementation plans with independent tasks in the current session
# ✅ GOOD: Triggering conditions only
description: Use when implementing any feature or bugfix, before writing implementation code
Content:
Use concrete triggers, symptoms, and situations that signal this skill applies
Describe the problem (race conditions, inconsistent behavior) not language-specific symptoms (setTimeout, sleep)
Keep triggers technology-agnostic unless the skill itself is technology-specific
If skill is technology-specific, make that explicit in the trigger
Write in third person (injected into system prompt)
NEVER summarize the skill's process or workflow
description: For async testing
description: I can help you with async tests when they're flaky
description: Use when tests use setTimeout/sleep and are flaky
description: Use when tests have race conditions, timing dependencies, or pass/fail inconsistently
description: Use when using React Router and handling authentication redirects
Use words Claude would search for:
Use active voice, verb-first:
creating-skills not skill-creationcondition-based-waiting not async-test-helpersProblem: getting-started and frequently-referenced skills load into EVERY conversation. Every token counts.
Target word counts:
Techniques:
Move details to tool help:
# ❌ BAD: Document all flags in SKILL.md
search-conversations supports --text, --both, --after DATE, --before DATE, --limit N
# ✅ GOOD: Reference --help
search-conversations supports multiple modes and filters. Run --help for details.
Use cross-references:
# ❌ BAD: Repeat workflow details
When searching, dispatch subagent with template...
[20 lines of repeated instructions]
# ✅ GOOD: Reference other skill
Always use subagents (50-100x context savings). REQUIRED: Use [other-skill-name] for workflow.
Compress examples:
# ❌ BAD: Verbose example (42 words)
your human partner: "How did we handle authentication errors in React Router before?"
You: I'll search past conversations for React Router authentication patterns.
[Dispatch subagent with search query: "React Router authentication error handling 401"]
# ✅ GOOD: Minimal example (20 words)
Partner: "How did we handle auth errors in React Router?"
You: Searching...
[Dispatch subagent → synthesis]
Eliminate redundancy:
Verification:
wc -w skills/path/SKILL.md
# getting-started workflows: aim for <150 each
# Other frequently-loaded: aim for <200 total
Name by what you DO or core insight:
condition-based-waiting > async-test-helpersusing-skills not skill-usageflatten-with-flags > data-structure-refactoringroot-cause-tracing > debugging-techniquesGerunds (-ing) work well for processes:
creating-skills, testing-skills, debugging-with-logsWhen writing documentation that references other skills:
Use skill name only, with explicit requirement markers:
**REQUIRED SUB-SKILL:** Use superpowers:test-driven-development**REQUIRED BACKGROUND:** You MUST understand superpowers:systematic-debuggingSee skills/testing/test-driven-development (unclear if required)@skills/testing/test-driven-development/SKILL.md (force-loads, burns context)Why no @ links: @ syntax force-loads files immediately, consuming 200k+ context before you need them.
digraph when_flowchart {
"Need to show information?" [shape=diamond];
"Decision where I might go wrong?" [shape=diamond];
"Use markdown" [shape=box];
"Small inline flowchart" [shape=box];
"Need to show information?" -> "Decision where I might go wrong?" [label="yes"];
"Decision where I might go wrong?" -> "Small inline flowchart" [label="yes"];
"Decision where I might go wrong?" -> "Use markdown" [label="no"];
}
Use flowcharts ONLY for:
Never use flowcharts for:
See @graphviz-conventions.dot for graphviz style rules.
Visualizing for your human partner: Use render-graphs.js in this directory to render a skill's flowcharts to SVG:
./render-graphs.js ../some-skill # Each diagram separately
./render-graphs.js ../some-skill --combine # All diagrams in one SVG
One excellent example beats many mediocre ones
Choose most relevant language:
Good example:
Don't:
You're good at porting - one great example is enough.
defense-in-depth/
SKILL.md # Everything inline
When: All content fits, no heavy reference needed
condition-based-waiting/
SKILL.md # Overview + patterns
example.ts # Working helpers to adapt
When: Tool is reusable code, not just narrative
pptx/
SKILL.md # Overview + workflows
pptxgenjs.md # 600 lines API reference
ooxml.md # 500 lines XML structure
scripts/ # Executable tools
When: Reference material too large for inline
NO SKILL WITHOUT A FAILING TEST FIRST
This applies to NEW skills AND EDITS to existing skills.
Write skill before testing? Delete it. Start over. Edit skill without testing? Same violation.
No exceptions:
REQUIRED BACKGROUND: The superpowers:test-driven-development skill explains why this matters. Same principles apply to documentation.
Different skill types need different test approaches:
Examples: TDD, verification-before-completion, designing-before-coding
Test with:
Success criteria: Agent follows rule under maximum pressure
Examples: condition-based-waiting, root-cause-tracing, defensive-programming
Test with:
Success criteria: Agent successfully applies technique to new scenario
Examples: reducing-complexity, information-hiding concepts
Test with:
Success criteria: Agent correctly identifies when/how to apply pattern
Examples: API documentation, command references, library guides
Test with:
Success criteria: Agent finds and correctly applies reference information
| Excuse | Reality |
|---|---|
| "Skill is obviously clear" | Clear to you ≠ clear to other agents. Test it. |
| "It's just a reference" | References can have gaps, unclear sections. Test retrieval. |
| "Testing is overkill" | Untested skills have issues. Always. 15 min testing saves hours. |
| "I'll test if problems emerge" | Problems = agents can't use skill. Test BEFORE deploying. |
| "Too tedious to test" | Testing is less tedious than debugging bad skill in production. |
| "I'm confident it's good" | Overconfidence guarantees issues. Test anyway. |
| "Academic review is enough" | Reading ≠ using. Test application scenarios. |
| "No time to test" | Deploying untested skill wastes more time fixing it later. |
All of these mean: Test before deploying. No exceptions.
Skills that enforce discipline (like TDD) need to resist rationalization. Agents are smart and will find loopholes when under pressure.
Psychology note: Understanding WHY persuasion techniques work helps you apply them systematically. See persuasion-principles.md for research foundation (Cialdini, 2021; Meincke et al., 2025) on authority, commitment, scarcity, social proof, and unity principles.
Don't just state the rule - forbid specific workarounds:
No exceptions:
Don't keep it as "reference"
Don't "adapt" it while writing tests
Don't look at it
Delete means delete
</Good>Add foundational principle early:
**Violating the letter of the rules is violating the spirit of the rules.**
This cuts off entire class of "I'm following the spirit" rationalizations.
Capture rationalizations from baseline testing (see Testing section below). Every excuse agents make goes in the table:
| Excuse | Reality |
|--------|---------|
| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
| "I'll test after" | Tests passing immediately prove nothing. |
| "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
Make it easy for agents to self-check when rationalizing:
## Red Flags - STOP and Start Over
- Code before test
- "I already manually tested it"
- "Tests after achieve the same purpose"
- "It's about spirit not ritual"
- "This is different because..."
**All of these mean: Delete code. Start over with TDD.**
Add to description: symptoms of when you're ABOUT to violate the rule:
description: use when implementing any feature or bugfix, before writing implementation code
Follow the TDD cycle:
Run pressure scenario with subagent WITHOUT the skill. Document exact behavior:
This is "watch the test fail" - you must see what agents naturally do before writing the skill.
Write skill that addresses those specific rationalizations. Don't add extra content for hypothetical cases.
Run same scenarios WITH skill. Agent should now comply.
Agent found new rationalization? Add explicit counter. Re-test until bulletproof.
Testing methodology: See @testing-skills-with-subagents.md for the complete testing methodology:
"In session 2025-10-03, we found empty projectDir caused..." Why bad: Too specific, not reusable
example-js.js, example-py.py, example-go.go Why bad: Mediocre quality, maintenance burden
step1 [label="import fs"];
step2 [label="read file"];
Why bad: Can't copy-paste, hard to read
helper1, helper2, step3, pattern4 Why bad: Labels should have semantic meaning
After writing ANY skill, you MUST STOP and complete the deployment process.
Do NOT:
The deployment checklist below is MANDATORY for EACH skill.
Deploying untested skills = deploying untested code. It's a violation of quality standards.
IMPORTANT: Use TodoWrite to create todos for EACH checklist item below.
RED Phase - Write Failing Test:
GREEN Phase - Write Minimal Skill:
REFACTOR Phase - Close Loopholes:
Quality Checks:
Deployment:
How future Claude finds your skill:
Optimize for this flow - put searchable terms early and often.
Creating skills IS TDD for process documentation.
Same Iron Law: No skill without failing test first. Same cycle: RED (baseline) → GREEN (write skill) → REFACTOR (close loopholes). Same benefits: Better quality, fewer surprises, bulletproof results.
If you follow TDD for code, follow it for skills. It's the same discipline applied to documentation.
Weekly Installs
2
Repository
First Seen
2 days ago
Security Audits
Installed on
amp2
cline2
opencode2
cursor2
kimi-cli2
codex2
React Router 框架模式指南:全栈开发、文件路由、数据加载与渲染策略
1,200 周安装