writing-skills by obra/superpowers
npx skills add https://github.com/obra/superpowers --skill writing-skills
Writing skills IS Test-Driven Development applied to process documentation.
Personal skills live in agent-specific directories (~/.claude/skills for Claude Code, ~/.agents/skills/ for Codex)
You write test cases (pressure scenarios with subagents), watch them fail (baseline behavior), write the skill (documentation), watch tests pass (agents comply), and refactor (close loopholes).
Core principle: If you didn't watch an agent fail without the skill, you don't know if the skill teaches the right thing.
REQUIRED BACKGROUND: You MUST understand superpowers:test-driven-development before using this skill. That skill defines the fundamental RED-GREEN-REFACTOR cycle. This skill adapts TDD to documentation.
Official guidance: For Anthropic's official skill authoring best practices, see anthropic-best-practices.md. This document provides additional patterns and guidelines that complement the TDD-focused approach in this skill.
A skill is a reference guide for proven techniques, patterns, or tools. Skills help future Claude instances find and apply effective approaches.
Skills are: Reusable techniques, patterns, tools, reference guides
Skills are NOT: Narratives about how you solved a problem once
| TDD Concept | Skill Creation |
|---|---|
| Test case | Pressure scenario with subagent |
| Production code | Skill document (SKILL.md) |
| Test fails (RED) | Agent violates rule without skill (baseline) |
| Test passes (GREEN) | Agent complies with skill present |
| Refactor | Close loopholes while maintaining compliance |
| Write test first | Run baseline scenario BEFORE writing skill |
| Watch it fail | Document exact rationalizations agent uses |
| Minimal code | Write skill addressing those specific violations |
| Watch it pass | Verify agent now complies |
| Refactor cycle | Find new rationalizations → plug → re-verify |
The entire skill creation process follows RED-GREEN-REFACTOR.
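The loop in the table above can be sketched in code. This is a toy illustration, not a real harness: `run_scenario` and `write_counter` are hypothetical callables standing in for "run a pressure scenario against a subagent" and "write skill text countering the observed rationalizations":

```python
def tdd_for_skills(run_scenario, write_counter, max_rounds=5):
    """RED-GREEN-REFACTOR for a skill document (illustrative sketch).

    run_scenario(skill_text_or_None) -> list of rationalizations observed.
    write_counter(rationalizations) -> skill text addressing them.
    """
    # RED: baseline without the skill must fail, or there is nothing to teach.
    baseline = run_scenario(None)
    if not baseline:
        raise ValueError("Agent already complies - no failing test, no skill")

    skill = write_counter(baseline)  # GREEN: address observed violations only
    for _ in range(max_rounds):      # REFACTOR: plug loopholes until bulletproof
        leftover = run_scenario(skill)
        if not leftover:
            return skill             # agent complies; skill is done
        skill = write_counter(leftover)
    raise RuntimeError("Still leaking rationalizations after max_rounds")
```

The point of the sketch: the baseline run comes first, and a compliant baseline means you should not write the skill at all.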
Create when:
Don't create for:
Skill types:

- Technique: concrete method with steps to follow (condition-based-waiting, root-cause-tracing)
- Pattern: way of thinking about problems (flatten-with-flags, test-invariants)
- Reference: API docs, syntax guides, tool documentation (office docs)
```
skills/
  skill-name/
    SKILL.md            # Main reference (required)
    supporting-file.*   # Only if needed
```
Flat namespace - all skills in one searchable namespace
Separate files for:
Keep inline:
Frontmatter (YAML):
- Only two fields supported: name and description
- Max 1024 characters total
- name: use letters, numbers, and hyphens only (no parentheses, special chars)
- description: third person, describes ONLY when to use (NOT what it does)
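These frontmatter constraints are mechanical enough to lint before deploying. A minimal sketch, assuming name and description have already been parsed out of the YAML; the regex and messages here are illustrative, the rules are the ones above:

```python
import re

def lint_frontmatter(name: str, description: str) -> list[str]:
    """Check skill frontmatter against the rules in this section (sketch)."""
    problems = []
    # name: letters, numbers, and hyphens only
    if not re.fullmatch(r"[A-Za-z0-9-]+", name):
        problems.append("name: use letters, numbers, and hyphens only")
    # description: triggering conditions, phrased "Use when..."
    if not description.startswith("Use when"):
        problems.append("description: start with 'Use when...'")
    # description: third person (it is injected into the system prompt)
    if description.startswith("I ") or " I " in description:
        problems.append("description: write in third person")
    # both fields share the 1024-character budget
    if len(name) + len(description) > 1024:
        problems.append("frontmatter: max 1024 characters total")
    return problems
```

Run it in CI or a pre-commit hook so a bad description never ships.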
SKILL.md structure:

- What is this? Core principle in 1-2 sentences.
- [Small inline flowchart IF decision non-obvious]
- When to use: bullet list with SYMPTOMS and use cases; when NOT to use
- Before/after code comparison
- Table or bullets for scanning common operations
- Inline code for simple patterns; link to a file for heavy reference or reusable tools
- What goes wrong + fixes
- Concrete results
Critical for discovery: Future Claude needs to FIND your skill
Purpose: Claude reads description to decide which skills to load for a given task. Make it answer: "Should I read this skill right now?"
Format: Start with "Use when..." to focus on triggering conditions
CRITICAL: Description = When to Use, NOT What the Skill Does
The description should ONLY describe triggering conditions. Do NOT summarize the skill's process or workflow in the description.
Why this matters: Testing revealed that when a description summarizes the skill's workflow, Claude may follow the description instead of reading the full skill content. A description saying "code review between tasks" caused Claude to do ONE review, even though the skill's flowchart clearly showed TWO reviews (spec compliance then code quality).
When the description was changed to just "Use when executing implementation plans with independent tasks" (no workflow summary), Claude correctly read the flowchart and followed the two-stage review process.
The trap: Descriptions that summarize workflow create a shortcut Claude will take. The skill body becomes documentation Claude skips.
```yaml
# ❌ BAD: Summarizes workflow - Claude may follow this instead of reading skill
description: Use when executing plans - dispatches subagent per task with code review between tasks

# ❌ BAD: Too much process detail
description: Use for TDD - write test first, watch it fail, write minimal code, refactor

# ✅ GOOD: Just triggering conditions, no workflow summary
description: Use when executing implementation plans with independent tasks in the current session

# ✅ GOOD: Triggering conditions only
description: Use when implementing any feature or bugfix, before writing implementation code
```
Content:
- Use concrete triggers, symptoms, and situations that signal this skill applies
- Describe the problem (race conditions, inconsistent behavior), not language-specific symptoms (setTimeout, sleep)
- Keep triggers technology-agnostic unless the skill itself is technology-specific
- If the skill is technology-specific, make that explicit in the trigger
- Write in third person (it is injected into the system prompt)
- NEVER summarize the skill's process or workflow
- ❌ `description: For async testing` (too vague)
- ❌ `description: I can help you with async tests when they're flaky` (first person)
- ✅ `description: Use when tests use setTimeout/sleep and are flaky`
- ✅ `description: Use when tests have race conditions, timing dependencies, or pass/fail inconsistently`
- ✅ `description: Use when using React Router and handling authentication redirects`
Use words Claude would search for:
Use active voice, verb-first:
- creating-skills, not skill-creation
- condition-based-waiting, not async-test-helpers

Problem: getting-started and frequently-referenced skills load into EVERY conversation. Every token counts.
Target word counts:
Techniques:
Move details to tool help:
```
# ❌ BAD: Document all flags in SKILL.md
search-conversations supports --text, --both, --after DATE, --before DATE, --limit N

# ✅ GOOD: Reference --help
search-conversations supports multiple modes and filters. Run --help for details.
```
Use cross-references:
```
# ❌ BAD: Repeat workflow details
When searching, dispatch subagent with template...
[20 lines of repeated instructions]

# ✅ GOOD: Reference other skill
Always use subagents (50-100x context savings). REQUIRED: Use [other-skill-name] for workflow.
```
Compress examples:
```
# ❌ BAD: Verbose example (42 words)
Your human partner: "How did we handle authentication errors in React Router before?"
You: I'll search past conversations for React Router authentication patterns.
[Dispatch subagent with search query: "React Router authentication error handling 401"]

# ✅ GOOD: Minimal example (20 words)
Partner: "How did we handle auth errors in React Router?"
You: Searching...
[Dispatch subagent → synthesis]
```
Eliminate redundancy:
Verification:
```shell
wc -w skills/path/SKILL.md
# getting-started workflows: aim for <150 each
# Other frequently-loaded: aim for <200 total
```
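If you want the same check in a pre-commit hook, the `wc -w` semantics are easy to mirror. A small sketch; the 200-word default is the frequently-loaded budget from the comments above:

```python
def word_count(text: str) -> int:
    """Count words the way `wc -w` does: whitespace-separated tokens."""
    return len(text.split())

def over_budget(skill_md: str, budget: int = 200) -> bool:
    """True if a frequently-loaded SKILL.md exceeds its word budget."""
    return word_count(skill_md) > budget
```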
Name by what you DO or core insight:
- condition-based-waiting > async-test-helpers
- using-skills, not skill-usage
- flatten-with-flags > data-structure-refactoring
- root-cause-tracing > debugging-techniques

Gerunds (-ing) work well for processes: creating-skills, testing-skills, debugging-with-logs.

When writing documentation that references other skills:
Use skill name only, with explicit requirement markers:
✅ Good:

- **REQUIRED SUB-SKILL:** Use superpowers:test-driven-development
- **REQUIRED BACKGROUND:** You MUST understand superpowers:systematic-debugging

❌ Bad:

- See skills/testing/test-driven-development (unclear if required)
- @skills/testing/test-driven-development/SKILL.md (force-loads, burns context)

Why no @ links: @ syntax force-loads files immediately, consuming 200k+ context before you need them.
```dot
digraph when_flowchart {
    "Need to show information?" [shape=diamond];
    "Decision where I might go wrong?" [shape=diamond];
    "Use markdown" [shape=box];
    "Small inline flowchart" [shape=box];

    "Need to show information?" -> "Decision where I might go wrong?" [label="yes"];
    "Decision where I might go wrong?" -> "Small inline flowchart" [label="yes"];
    "Decision where I might go wrong?" -> "Use markdown" [label="no"];
}
```
Use flowcharts ONLY for:
Never use flowcharts for:
See @graphviz-conventions.dot for graphviz style rules.
Visualizing for your human partner: Use render-graphs.js in this directory to render a skill's flowcharts to SVG:
```shell
./render-graphs.js ../some-skill            # Each diagram separately
./render-graphs.js ../some-skill --combine  # All diagrams in one SVG
```
One excellent example beats many mediocre ones
Choose most relevant language:
Good example:
Don't:
You're good at porting - one great example is enough.
```
defense-in-depth/
  SKILL.md        # Everything inline
```

When: All content fits, no heavy reference needed

```
condition-based-waiting/
  SKILL.md        # Overview + patterns
  example.ts      # Working helpers to adapt
```

When: Tool is reusable code, not just narrative

```
pptx/
  SKILL.md        # Overview + workflows
  pptxgenjs.md    # 600-line API reference
  ooxml.md        # 500-line XML structure
  scripts/        # Executable tools
```

When: Reference material too large for inline
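Putting the layout rules together, scaffolding the lightest self-contained case might look like the following sketch; the skill name, description, and headings are placeholders, not part of any real skill:

```shell
# Sketch: scaffold the lightest "self-contained" layout.
# The skill name and description below are placeholders.
skill="condition-based-waiting"
mkdir -p "skills/$skill"
cat > "skills/$skill/SKILL.md" <<'EOF'
---
name: condition-based-waiting
description: Use when tests have race conditions, timing dependencies, or pass/fail inconsistently
---

# Condition-Based Waiting

Core principle in 1-2 sentences goes here.
EOF
```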
NO SKILL WITHOUT A FAILING TEST FIRST
This applies to NEW skills AND EDITS to existing skills.
Write skill before testing? Delete it. Start over. Edit skill without testing? Same violation.
No exceptions:
REQUIRED BACKGROUND: The superpowers:test-driven-development skill explains why this matters. Same principles apply to documentation.
Different skill types need different test approaches:
Discipline-enforcing skills (examples: TDD, verification-before-completion, designing-before-coding):

Test with:

Success criteria: Agent follows rule under maximum pressure

Technique skills (examples: condition-based-waiting, root-cause-tracing, defensive-programming):

Test with:

Success criteria: Agent successfully applies technique to new scenario

Pattern skills (examples: reducing-complexity, information-hiding concepts):

Test with:

Success criteria: Agent correctly identifies when/how to apply pattern

Reference skills (examples: API documentation, command references, library guides):

Test with:

Success criteria: Agent finds and correctly applies reference information
| Excuse | Reality |
|---|---|
| "Skill is obviously clear" | Clear to you ≠ clear to other agents. Test it. |
| "It's just a reference" | References can have gaps, unclear sections. Test retrieval. |
| "Testing is overkill" | Untested skills have issues. Always. 15 min testing saves hours. |
| "I'll test if problems emerge" | Problems = agents can't use skill. Test BEFORE deploying. |
| "Too tedious to test" | Testing is less tedious than debugging bad skill in production. |
| "I'm confident it's good" | Overconfidence guarantees issues. Test anyway. |
| "Academic review is enough" | Reading ≠ using. Test application scenarios. |
| "No time to test" | Deploying untested skill wastes more time fixing it later. |
All of these mean: Test before deploying. No exceptions.
Skills that enforce discipline (like TDD) need to resist rationalization. Agents are smart and will find loopholes when under pressure.
Psychology note: Understanding WHY persuasion techniques work helps you apply them systematically. See persuasion-principles.md for research foundation (Cialdini, 2021; Meincke et al., 2025) on authority, commitment, scarcity, social proof, and unity principles.
Don't just state the rule - forbid specific workarounds:
No exceptions:

- Don't keep it as "reference"
- Don't "adapt" it while writing tests
- Don't look at it
- Delete means delete
Add foundational principle early:
**Violating the letter of the rules is violating the spirit of the rules.**
This cuts off entire class of "I'm following the spirit" rationalizations.
Capture rationalizations from baseline testing (see Testing section below). Every excuse agents make goes in the table:
| Excuse | Reality |
|--------|---------|
| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
| "I'll test after" | Tests passing immediately prove nothing. |
| "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
Make it easy for agents to self-check when rationalizing:
```markdown
## Red Flags - STOP and Start Over

- Code before test
- "I already manually tested it"
- "Tests after achieve the same purpose"
- "It's about spirit not ritual"
- "This is different because..."

**All of these mean: Delete code. Start over with TDD.**
```
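Because the red flags are literal phrases, the self-check can even be automated crudely. A toy sketch using naive substring matching; the phrase list is lifted from the red-flags example above:

```python
# Phrases lifted (lowercased) from the red-flags list above.
RED_FLAGS = [
    "already manually tested",
    "tests after achieve the same purpose",
    "spirit not ritual",
    "this is different because",
]

def find_red_flags(transcript: str) -> list[str]:
    """Return the red-flag rationalizations present in an agent transcript.
    Naive substring matching - a sketch, not a real compliance checker."""
    lowered = transcript.lower()
    return [flag for flag in RED_FLAGS if flag in lowered]
```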
Add to description: symptoms of when you're ABOUT to violate the rule:
description: use when implementing any feature or bugfix, before writing implementation code
Follow the TDD cycle:

1. RED: Run pressure scenarios with subagents WITHOUT the skill. Document exact behavior. This is "watch the test fail" - you must see what agents naturally do before writing the skill.
2. GREEN: Write a skill that addresses those specific rationalizations. Don't add extra content for hypothetical cases.
3. Verify: Run the same scenarios WITH the skill. The agent should now comply.
4. REFACTOR: Agent found a new rationalization? Add an explicit counter. Re-test until bulletproof.
Testing methodology: See @testing-skills-with-subagents.md for the complete methodology.
"In session 2025-10-03, we found empty projectDir caused..." Why bad: Too specific, not reusable
example-js.js, example-py.py, example-go.go Why bad: Mediocre quality, maintenance burden
step1 [label="import fs"];
step2 [label="read file"];
Why bad: Can't copy-paste, hard to read
helper1, helper2, step3, pattern4 Why bad: Labels should have semantic meaning
After writing ANY skill, you MUST STOP and complete the deployment process.
Do NOT:
The deployment checklist below is MANDATORY for EACH skill.
Deploying untested skills = deploying untested code. It's a violation of quality standards.
IMPORTANT: Use TodoWrite to create todos for EACH checklist item below.
RED Phase - Write Failing Test:
GREEN Phase - Write Minimal Skill:
REFACTOR Phase - Close Loopholes:
Quality Checks:
Deployment:
How future Claude finds your skill:
Optimize for this flow - put searchable terms early and often.
Creating skills IS TDD for process documentation.
Same Iron Law: No skill without failing test first. Same cycle: RED (baseline) → GREEN (write skill) → REFACTOR (close loopholes). Same benefits: Better quality, fewer surprises, bulletproof results.
If you follow TDD for code, follow it for skills. It's the same discipline applied to documentation.
Weekly Installs: 23.4K
Repository: github.com/obra/superpowers
GitHub Stars: 107.7K
First Seen: Jan 19, 2026
Security Audits: Gen Agent Trust Hub: Pass, Socket: Pass, Snyk: Pass
Installed on: opencode 19.9K, codex 19.3K, gemini-cli 19.2K, github-copilot 18.1K, cursor 17.6K, amp 16.9K