writing-skills by obra/superpowers
npx skills add https://github.com/obra/superpowers --skill writing-skills
Writing skills IS Test-Driven Development applied to process documentation.
Personal skills live in agent-specific directories (~/.claude/skills for Claude Code, ~/.agents/skills/ for Codex)
You write test cases (pressure scenarios with subagents), watch them fail (baseline behavior), write the skill (documentation), watch tests pass (agents comply), and refactor (close loopholes).
Core principle: If you didn't watch an agent fail without the skill, you don't know if the skill teaches the right thing.
REQUIRED BACKGROUND: You MUST understand superpowers:test-driven-development before using this skill. That skill defines the fundamental RED-GREEN-REFACTOR cycle. This skill adapts TDD to documentation.
Official guidance: For Anthropic's official skill authoring best practices, see anthropic-best-practices.md. This document provides additional patterns and guidelines that complement the TDD-focused approach in this skill.
A skill is a reference guide for proven techniques, patterns, or tools. Skills help future Claude instances find and apply effective approaches.
Skills are: Reusable techniques, patterns, tools, reference guides
Skills are NOT: Narratives about how you solved a problem once
| TDD Concept | Skill Creation |
|---|---|
| Test case | Pressure scenario with subagent |
| Production code | Skill document (SKILL.md) |
| Test fails (RED) | Agent violates rule without skill (baseline) |
| Test passes (GREEN) | Agent complies with skill present |
| Refactor | Close loopholes while maintaining compliance |
| Write test first | Run baseline scenario BEFORE writing skill |
| Watch it fail | Document exact rationalizations agent uses |
| Minimal code | Write skill addressing those specific violations |
| Watch it pass | Verify agent now complies |
| Refactor cycle | Find new rationalizations → plug → re-verify |
The entire skill creation process follows RED-GREEN-REFACTOR.
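The loop in the table above can be sketched in code. This is a toy illustration, not a real harness: `run_scenario` and `write_counter` are hypothetical callables standing in for "run a pressure scenario against a subagent" and "write skill text countering the observed rationalizations":

```python
def tdd_for_skills(run_scenario, write_counter, max_rounds=5):
    """RED-GREEN-REFACTOR for a skill document (illustrative sketch).

    run_scenario(skill_text_or_None) -> list of rationalizations observed.
    write_counter(rationalizations) -> skill text addressing them.
    """
    # RED: baseline without the skill must fail, or there is nothing to teach.
    baseline = run_scenario(None)
    if not baseline:
        raise ValueError("Agent already complies - no failing test, no skill")

    skill = write_counter(baseline)  # GREEN: address observed violations only
    for _ in range(max_rounds):      # REFACTOR: plug loopholes until bulletproof
        leftover = run_scenario(skill)
        if not leftover:
            return skill             # agent complies; skill is done
        skill = write_counter(leftover)
    raise RuntimeError("Still leaking rationalizations after max_rounds")
```

The point of the sketch: the baseline run comes first, and a compliant baseline means you should not write the skill at all.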
Create when:
Don't create for:
Skill types:

- Technique: concrete method with steps to follow (condition-based-waiting, root-cause-tracing)
- Pattern: way of thinking about problems (flatten-with-flags, test-invariants)
- Reference: API docs, syntax guides, tool documentation (office docs)
```
skills/
  skill-name/
    SKILL.md            # Main reference (required)
    supporting-file.*   # Only if needed
```
Flat namespace - all skills in one searchable namespace
Separate files for:
Keep inline:
Frontmatter (YAML):
- Only two fields supported: name and description
- Max 1024 characters total
- name: use letters, numbers, and hyphens only (no parentheses, special chars)
- description: third person, describes ONLY when to use (NOT what it does)
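These frontmatter constraints are mechanical enough to lint before deploying. A minimal sketch, assuming name and description have already been parsed out of the YAML; the regex and messages here are illustrative, the rules are the ones above:

```python
import re

def lint_frontmatter(name: str, description: str) -> list[str]:
    """Check skill frontmatter against the rules in this section (sketch)."""
    problems = []
    # name: letters, numbers, and hyphens only
    if not re.fullmatch(r"[A-Za-z0-9-]+", name):
        problems.append("name: use letters, numbers, and hyphens only")
    # description: triggering conditions, phrased "Use when..."
    if not description.startswith("Use when"):
        problems.append("description: start with 'Use when...'")
    # description: third person (it is injected into the system prompt)
    if description.startswith("I ") or " I " in description:
        problems.append("description: write in third person")
    # both fields share the 1024-character budget
    if len(name) + len(description) > 1024:
        problems.append("frontmatter: max 1024 characters total")
    return problems
```

Run it in CI or a pre-commit hook so a bad description never ships.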
SKILL.md structure:

- What is this? Core principle in 1-2 sentences.
- [Small inline flowchart IF decision non-obvious]
- When to use: bullet list with SYMPTOMS and use cases; when NOT to use
- Before/after code comparison
- Table or bullets for scanning common operations
- Inline code for simple patterns; link to a file for heavy reference or reusable tools
- What goes wrong + fixes
- Concrete results
Critical for discovery: Future Claude needs to FIND your skill
Purpose: Claude reads description to decide which skills to load for a given task. Make it answer: "Should I read this skill right now?"
Format: Start with "Use when..." to focus on triggering conditions
CRITICAL: Description = When to Use, NOT What the Skill Does
The description should ONLY describe triggering conditions. Do NOT summarize the skill's process or workflow in the description.
Why this matters: Testing revealed that when a description summarizes the skill's workflow, Claude may follow the description instead of reading the full skill content. A description saying "code review between tasks" caused Claude to do ONE review, even though the skill's flowchart clearly showed TWO reviews (spec compliance then code quality).
When the description was changed to just "Use when executing implementation plans with independent tasks" (no workflow summary), Claude correctly read the flowchart and followed the two-stage review process.
The trap: Descriptions that summarize workflow create a shortcut Claude will take. The skill body becomes documentation Claude skips.
```yaml
# ❌ BAD: Summarizes workflow - Claude may follow this instead of reading skill
description: Use when executing plans - dispatches subagent per task with code review between tasks

# ❌ BAD: Too much process detail
description: Use for TDD - write test first, watch it fail, write minimal code, refactor

# ✅ GOOD: Just triggering conditions, no workflow summary
description: Use when executing implementation plans with independent tasks in the current session

# ✅ GOOD: Triggering conditions only
description: Use when implementing any feature or bugfix, before writing implementation code
```
Content:
- Use concrete triggers, symptoms, and situations that signal this skill applies
- Describe the problem (race conditions, inconsistent behavior), not language-specific symptoms (setTimeout, sleep)
- Keep triggers technology-agnostic unless the skill itself is technology-specific
- If the skill is technology-specific, make that explicit in the trigger
- Write in third person (it is injected into the system prompt)
- NEVER summarize the skill's process or workflow
- ❌ `description: For async testing` (too vague)
- ❌ `description: I can help you with async tests when they're flaky` (first person)
- ✅ `description: Use when tests use setTimeout/sleep and are flaky`
- ✅ `description: Use when tests have race conditions, timing dependencies, or pass/fail inconsistently`
- ✅ `description: Use when using React Router and handling authentication redirects`
Use words Claude would search for:
Use active voice, verb-first:
- creating-skills, not skill-creation
- condition-based-waiting, not async-test-helpers

Problem: getting-started and frequently-referenced skills load into EVERY conversation. Every token counts.
Target word counts:
Techniques:
Move details to tool help:
```
# ❌ BAD: Document all flags in SKILL.md
search-conversations supports --text, --both, --after DATE, --before DATE, --limit N

# ✅ GOOD: Reference --help
search-conversations supports multiple modes and filters. Run --help for details.
```
Use cross-references:
```
# ❌ BAD: Repeat workflow details
When searching, dispatch subagent with template...
[20 lines of repeated instructions]

# ✅ GOOD: Reference other skill
Always use subagents (50-100x context savings). REQUIRED: Use [other-skill-name] for workflow.
```
Compress examples:
```
# ❌ BAD: Verbose example (42 words)
Your human partner: "How did we handle authentication errors in React Router before?"
You: I'll search past conversations for React Router authentication patterns.
[Dispatch subagent with search query: "React Router authentication error handling 401"]

# ✅ GOOD: Minimal example (20 words)
Partner: "How did we handle auth errors in React Router?"
You: Searching...
[Dispatch subagent → synthesis]
```
Eliminate redundancy:
Verification:
```shell
wc -w skills/path/SKILL.md
# getting-started workflows: aim for <150 each
# Other frequently-loaded: aim for <200 total
```
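If you want the same check in a pre-commit hook, the `wc -w` semantics are easy to mirror. A small sketch; the 200-word default is the frequently-loaded budget from the comments above:

```python
def word_count(text: str) -> int:
    """Count words the way `wc -w` does: whitespace-separated tokens."""
    return len(text.split())

def over_budget(skill_md: str, budget: int = 200) -> bool:
    """True if a frequently-loaded SKILL.md exceeds its word budget."""
    return word_count(skill_md) > budget
```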
Name by what you DO or core insight:
- condition-based-waiting > async-test-helpers
- using-skills, not skill-usage
- flatten-with-flags > data-structure-refactoring
- root-cause-tracing > debugging-techniques

Gerunds (-ing) work well for processes: creating-skills, testing-skills, debugging-with-logs.

When writing documentation that references other skills:
Use skill name only, with explicit requirement markers:
✅ Good:

- **REQUIRED SUB-SKILL:** Use superpowers:test-driven-development
- **REQUIRED BACKGROUND:** You MUST understand superpowers:systematic-debugging

❌ Bad:

- See skills/testing/test-driven-development (unclear if required)
- @skills/testing/test-driven-development/SKILL.md (force-loads, burns context)

Why no @ links: @ syntax force-loads files immediately, consuming 200k+ context before you need them.
```dot
digraph when_flowchart {
    "Need to show information?" [shape=diamond];
    "Decision where I might go wrong?" [shape=diamond];
    "Use markdown" [shape=box];
    "Small inline flowchart" [shape=box];

    "Need to show information?" -> "Decision where I might go wrong?" [label="yes"];
    "Decision where I might go wrong?" -> "Small inline flowchart" [label="yes"];
    "Decision where I might go wrong?" -> "Use markdown" [label="no"];
}
```
Use flowcharts ONLY for:
Never use flowcharts for:
See @graphviz-conventions.dot for graphviz style rules.
Visualizing for your human partner: Use render-graphs.js in this directory to render a skill's flowcharts to SVG:
```shell
./render-graphs.js ../some-skill            # Each diagram separately
./render-graphs.js ../some-skill --combine  # All diagrams in one SVG
```
One excellent example beats many mediocre ones
Choose most relevant language:
Good example:
Don't:
You're good at porting - one great example is enough.
```
defense-in-depth/
  SKILL.md        # Everything inline
```

When: All content fits, no heavy reference needed

```
condition-based-waiting/
  SKILL.md        # Overview + patterns
  example.ts      # Working helpers to adapt
```

When: Tool is reusable code, not just narrative

```
pptx/
  SKILL.md        # Overview + workflows
  pptxgenjs.md    # 600-line API reference
  ooxml.md        # 500-line XML structure
  scripts/        # Executable tools
```

When: Reference material too large for inline
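Putting the layout rules together, scaffolding the lightest self-contained case might look like the following sketch; the skill name, description, and headings are placeholders, not part of any real skill:

```shell
# Sketch: scaffold the lightest "self-contained" layout.
# The skill name and description below are placeholders.
skill="condition-based-waiting"
mkdir -p "skills/$skill"
cat > "skills/$skill/SKILL.md" <<'EOF'
---
name: condition-based-waiting
description: Use when tests have race conditions, timing dependencies, or pass/fail inconsistently
---

# Condition-Based Waiting

Core principle in 1-2 sentences goes here.
EOF
```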
NO SKILL WITHOUT A FAILING TEST FIRST
This applies to NEW skills AND EDITS to existing skills.
Write skill before testing? Delete it. Start over. Edit skill without testing? Same violation.
No exceptions:
REQUIRED BACKGROUND: The superpowers:test-driven-development skill explains why this matters. Same principles apply to documentation.
Different skill types need different test approaches:
Discipline-enforcing skills (examples: TDD, verification-before-completion, designing-before-coding):

Test with:

Success criteria: Agent follows rule under maximum pressure

Technique skills (examples: condition-based-waiting, root-cause-tracing, defensive-programming):

Test with:

Success criteria: Agent successfully applies technique to new scenario

Pattern skills (examples: reducing-complexity, information-hiding concepts):

Test with:

Success criteria: Agent correctly identifies when/how to apply pattern

Reference skills (examples: API documentation, command references, library guides):

Test with:

Success criteria: Agent finds and correctly applies reference information
| Excuse | Reality |
|---|---|
| "Skill is obviously clear" | Clear to you ≠ clear to other agents. Test it. |
| "It's just a reference" | References can have gaps, unclear sections. Test retrieval. |
| "Testing is overkill" | Untested skills have issues. Always. 15 min testing saves hours. |
| "I'll test if problems emerge" | Problems = agents can't use skill. Test BEFORE deploying. |
| "Too tedious to test" | Testing is less tedious than debugging bad skill in production. |
| "I'm confident it's good" | Overconfidence guarantees issues. Test anyway. |
| "Academic review is enough" | Reading ≠ using. Test application scenarios. |
| "No time to test" | Deploying untested skill wastes more time fixing it later. |
All of these mean: Test before deploying. No exceptions.
Skills that enforce discipline (like TDD) need to resist rationalization. Agents are smart and will find loopholes when under pressure.
Psychology note: Understanding WHY persuasion techniques work helps you apply them systematically. See persuasion-principles.md for research foundation (Cialdini, 2021; Meincke et al., 2025) on authority, commitment, scarcity, social proof, and unity principles.
Don't just state the rule - forbid specific workarounds:
No exceptions:

- Don't keep it as "reference"
- Don't "adapt" it while writing tests
- Don't look at it
- Delete means delete
Add foundational principle early:
**Violating the letter of the rules is violating the spirit of the rules.**
This cuts off entire class of "I'm following the spirit" rationalizations.
Capture rationalizations from baseline testing (see Testing section below). Every excuse agents make goes in the table:
| Excuse | Reality |
|--------|---------|
| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
| "I'll test after" | Tests passing immediately prove nothing. |
| "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
Make it easy for agents to self-check when rationalizing:
```markdown
## Red Flags - STOP and Start Over

- Code before test
- "I already manually tested it"
- "Tests after achieve the same purpose"
- "It's about spirit not ritual"
- "This is different because..."

**All of these mean: Delete code. Start over with TDD.**
```
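Because the red flags are literal phrases, the self-check can even be automated crudely. A toy sketch using naive substring matching; the phrase list is lifted from the red-flags example above:

```python
# Phrases lifted (lowercased) from the red-flags list above.
RED_FLAGS = [
    "already manually tested",
    "tests after achieve the same purpose",
    "spirit not ritual",
    "this is different because",
]

def find_red_flags(transcript: str) -> list[str]:
    """Return the red-flag rationalizations present in an agent transcript.
    Naive substring matching - a sketch, not a real compliance checker."""
    lowered = transcript.lower()
    return [flag for flag in RED_FLAGS if flag in lowered]
```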
Add to description: symptoms of when you're ABOUT to violate the rule:
description: use when implementing any feature or bugfix, before writing implementation code
Follow the TDD cycle:

1. RED: Run pressure scenarios with subagents WITHOUT the skill. Document exact behavior. This is "watch the test fail" - you must see what agents naturally do before writing the skill.
2. GREEN: Write a skill that addresses those specific rationalizations. Don't add extra content for hypothetical cases.
3. Verify: Run the same scenarios WITH the skill. The agent should now comply.
4. REFACTOR: Agent found a new rationalization? Add an explicit counter. Re-test until bulletproof.
Testing methodology: See @testing-skills-with-subagents.md for the complete methodology.
"In session 2025-10-03, we found empty projectDir caused..." Why bad: Too specific, not reusable
example-js.js, example-py.py, example-go.go Why bad: Mediocre quality, maintenance burden
step1 [label="import fs"];
step2 [label="read file"];
Why bad: Can't copy-paste, hard to read
helper1, helper2, step3, pattern4 Why bad: Labels should have semantic meaning
After writing ANY skill, you MUST STOP and complete the deployment process.
Do NOT:
The deployment checklist below is MANDATORY for EACH skill.
Deploying untested skills = deploying untested code. It's a violation of quality standards.
IMPORTANT: Use TodoWrite to create todos for EACH checklist item below.
RED Phase - Write Failing Test:
GREEN Phase - Write Minimal Skill:
REFACTOR Phase - Close Loopholes:
Quality Checks:
Deployment:
How future Claude finds your skill:
Optimize for this flow - put searchable terms early and often.
Creating skills IS TDD for process documentation.
Same Iron Law: No skill without failing test first. Same cycle: RED (baseline) → GREEN (write skill) → REFACTOR (close loopholes). Same benefits: Better quality, fewer surprises, bulletproof results.
If you follow TDD for code, follow it for skills. It's the same discipline applied to documentation.
Weekly Installs: 23.4K
Repository: github.com/obra/superpowers
GitHub Stars: 107.7K
First Seen: Jan 19, 2026
Security Audits: Gen Agent Trust Hub: Pass, Socket: Pass, Snyk: Pass
Installed on: opencode 19.9K, codex 19.3K, gemini-cli 19.2K, github-copilot 18.1K, cursor 17.6K, amp 16.9K