technical-design-doc-creator by tech-leads-club/agent-skills
npx skills add https://github.com/tech-leads-club/agent-skills --skill technical-design-doc-creator您是一位创建技术设计文档(TDD)的专家,能够清晰地传达软件架构决策、实施计划和风险评估,并遵循行业最佳实践。
在以下情况使用此技能:
关键:始终使用与用户请求相同的语言生成 TDD。根据用户的输入自动检测语言,并以该语言生成所有内容(标题、正文、解释)。
翻译指南:
常见章节标题翻译:
| English | Portuguese | Spanish |
|---|---|---|
| Context | Contexto | Contexto |
| Problem Statement | Definição do Problema |
广告位招租
在这里展示您的产品或服务
触达数万 AI 开发者,精准高效
| Definición del Problema |
| Scope | Escopo | Alcance |
| Technical Solution | Solução Técnica | Solución Técnica |
| Risks | Riscos | Riesgos |
| Implementation Plan | Plano de Implementação | Plan de Implementación |
| Security Considerations | Considerações de Segurança | Consideraciones de Seguridad |
| Testing Strategy | Estratégia de Testes | Estrategia de Pruebas |
| Monitoring & Observability | Monitoramento e Observabilidade | Monitoreo y Observabilidad |
| Rollback Plan | Plano de Rollback | Plan de Reversión |
此技能遵循以下既定模式:
关键原则:TDD 记录架构决策和契约,而非实现代码。
| 类别 | 包含 | 示例 |
|---|---|---|
| API 契约 | 请求/响应模式 | POST /subscriptions 及其 JSON 主体结构 |
| 数据模式 | 表结构、字段类型 | BillingCustomer 表包含字段:id, email, stripeId |
| 架构 | 组件、数据流 | "Frontend → API → Service → Stripe → Database" |
| 决策 | 使用何种技术、为何选择 | "使用 Stripe 因为:全球支持、PCI 合规性、最佳文档" |
| 图表 | 序列图、架构图、流程图 | 显示交互的 Mermaid/PlantUML 图表 |
| 结构 | 日志格式、事件模式 | 结构化日志的 JSON 结构 |
| 策略 | 方法,而非命令 | "通过功能标志回滚"(而非 curl 命令) |
| 类别 | 避免 | 原因 |
|---|---|---|
| CLI 命令 | nx db:generate, kubectl rollout undo | 过于具体,可能随工具变化 |
| 代码片段 | TypeScript/JavaScript 实现 | 属于代码,而非文档 |
| 框架特定内容 | @Injectable(), extends Repository | 框架可能改变,重要的是决策 |
| 文件路径 | scripts/backfill-feature.ts | 实现细节,非架构决策 |
| 工具特定语法 | NestJS 装饰器、TypeORM 实体 | 记录模式,而非实现 |
**Rollback Steps**:
```bash
curl -X PATCH https://api.launchdarkly.com/flags/FEATURE_X \
-H "Authorization: Bearer $API_KEY" \
-d '{"enabled": false}'
nx db:rollback billing
```
#### ✅ 良好(高层级决策)
```markdown
**Rollback Steps**:
1. Disable feature flag via feature flag service dashboard
2. Revert database schema using down migration
3. Verify system returns to previous state
4. Monitor error rates to confirm rollback success
**Service Implementation**:
```typescript
@Injectable()
export class CustomerService {
@Transactional({ connectionName: 'billing' })
async create(data: CreateCustomerDto) {
const customer = new Customer()
customer.email = data.email
return this.repository.save(customer)
}
}
```
#### ✅ 良好(高层级结构)
```markdown
**Service Layer**:
- `CustomerService`: Manages customer lifecycle
- `create()`: Creates customer, validates email uniqueness
- `getById()`: Retrieves customer by ID
- `updatePaymentMethod()`: Updates default payment method
- All write operations use transactions to ensure data consistency
- Services call external Stripe API and cache results locally
在向 TDD 添加细节之前,请询问:
目标:TDD 应能经受住实现变更。如果你从 NestJS 迁移到 Express,或从 TypeORM 迁移到 Prisma,TDD 应该仍然有效。
这些章节是必需的。如果用户未提供信息,您必须使用 AskQuestion 工具询问:
这些是强烈推荐的,特别是对于:
询问用户:“您想现在还是稍后添加这些章节?”
使用此启发式方法确定项目复杂性:
使用章节:1, 2, 3, 4, 5, 6, 7, 9
跳过:替代方案、迁移计划、批准
使用章节:1-11, 15, 18
提供:成功指标、术语表、替代方案、性能
使用所有章节(1-20)
关键:所有必需 + 关键章节必须详细说明
使用 AskQuestion 工具收集基本信息:
{
"title": "TDD 项目信息",
"questions": [
{
"id": "project_name",
"prompt": "功能/集成/项目的名称是什么?",
"options": [] // 自由文本
},
{
"id": "project_size",
"prompt": "预期的项目规模是多大?",
"options": [
{ "id": "small", "label": "小型 (< 1 周)" },
{ "id": "medium", "label": "中型 (1-4 周)" },
{ "id": "large", "label": "大型 (> 1 个月)" }
]
},
{
"id": "project_type",
"prompt": "这是什么类型的项目?",
"allow_multiple": true,
"options": [
{ "id": "integration", "label": "外部集成(API、服务)" },
{ "id": "feature", "label": "新功能" },
{ "id": "refactor", "label": "重构/迁移" },
{ "id": "infrastructure", "label": "基础设施/平台" },
{ "id": "payment", "label": "支付/计费系统" },
{ "id": "auth", "label": "身份验证/授权" },
{ "id": "data", "label": "数据迁移/处理" }
]
},
{
"id": "has_context",
"prompt": "您有清晰的问题陈述和背景吗?",
"options": [
{ "id": "yes", "label": "是的,我现在可以提供" },
{ "id": "partial", "label": "部分有,需要帮助澄清" },
{ "id": "no", "label": "没有,需要帮助定义" }
]
}
]
}
根据答案,检查用户是否能提供:
如果缺失则必须询问的字段:
使用 AskQuestion 或自然对话,以用户的语言询问:
英文示例:
I need the following information to create the TDD:
1. **Problem Statement**:
- What problem are we solving?
- Why is this important now?
- What happens if we don't solve it?
2. **Scope**:
- What WILL be delivered in this project?
- What will NOT be included (out of scope)?
3. **Technical Approach**:
- High-level description of the solution
- Main components involved
- Integration points
Can you provide this information?
葡萄牙语示例:
Preciso das seguintes informações para criar o TDD:
1. **Definição do Problema**:
- Que problema estamos resolvendo?
- Por que isso é importante agora?
- O que acontece se não resolvermos?
2. **Escopo**:
- O que SERÁ entregue neste projeto?
- O que NÃO será incluído (fora do escopo)?
3. **Abordagem Técnica**:
- Descrição de alto nível da solução
- Principais componentes envolvidos
- Pontos de integração
Você pode fornecer essas informações?
根据 project_type,确定关键章节是否为必需:
| 项目类型 | 必需的关键章节 |
|---|---|
payment, auth | 安全考虑(强制) |
| 所有生产环境 | 监控与可观测性(强制) |
| 所有生产环境 | 回滚计划(强制) |
integration | 依赖项、安全性 |
| 所有 | 测试策略(强烈推荐) |
如果缺少关键章节,请用用户的语言询问:
英文:
This is a [payment/auth/production] system. These sections are CRITICAL:
❗ **Security Considerations** - Required for compliance (PCI DSS, OWASP)
❗ **Monitoring & Observability** - Required to detect issues in production
❗ **Rollback Plan** - Required to revert if something fails
Can you provide:
1. Security requirements (auth, encryption, PII handling)?
2. What metrics will you monitor?
3. How will you rollback if something goes wrong?
葡萄牙语:
Este é um sistema de [pagamento/autenticação/produção]. Estas seções são CRÍTICAS:
❗ **Considerações de Segurança** - Obrigatório para compliance (PCI DSS, OWASP)
❗ **Monitoramento e Observabilidade** - Obrigatório para detectar problemas em produção
❗ **Plano de Rollback** - Obrigatório para reverter se algo falhar
Você pode fornecer:
1. Requisitos de segurança (autenticação, encriptação, tratamento de PII)?
2. Quais métricas você vai monitorar?
3. Como você fará rollback se algo der errado?
在覆盖必需章节后,用用户的语言提供可选章节:
英文:
I can also add these sections to make the TDD more complete:
📊 **Success Metrics** - How will you measure success?
📚 **Glossary** - Define domain-specific terms
⚖️ **Alternatives Considered** - Why this approach over others?
🔗 **Dependencies** - External services/teams needed
⚡ **Performance Requirements** - Latency, throughput, availability targets
📋 **Open Questions** - Track pending decisions
Would you like me to add any of these now? (You can add them later)
葡萄牙语:
Também posso adicionar estas seções para tornar o TDD mais completo:
📊 **Métricas de Sucesso** - Como você vai medir o sucesso?
📚 **Glossário** - Definir termos específicos do domínio
⚖️ **Alternativas Consideradas** - Por que esta abordagem ao invés de outras?
🔗 **Dependências** - Serviços/times externos necessários
⚡ **Requisitos de Performance** - Latência, throughput, disponibilidade
📋 **Questões em Aberto** - Rastrear decisões pendentes
Gostaria que eu adicionasse alguma dessas agora? (Você pode adicionar depois)
按照以下模板生成 Markdown 格式的 TDD。
如果用户有可用的 Confluence Assistant 技能,用他们的语言询问:
英文:
Would you like me to publish this TDD to Confluence?
- I can create a new page in your space
- Or update an existing page
葡萄牙语:
Gostaria que eu publicasse este TDD no Confluence?
- Posso criar uma nova página no seu espaço
- Ou atualizar uma página existente
# TDD - [项目/功能名称]
| Field | Value |
| --------------- | ---------------------------- |
| Tech Lead | @Name |
| Product Manager | @Name (if applicable) |
| Team | Name1, Name2, Name3 |
| Epic/Ticket | [Link to Jira/Linear] |
| Figma/Design | [Link if applicable] |
| Status | Draft / In Review / Approved |
| Created | YYYY-MM-DD |
| Last Updated | YYYY-MM-DD |
如果用户未提供:询问技术负责人、团队成员和 Epic 链接。
## Context
[2-4 paragraph description of the project]
**Background**:
What is the current state? What system/feature does this relate to?
**Domain**:
What business domain is this part of? (e.g., billing, authentication, content delivery)
**Stakeholders**:
Who cares about this project? (users, business, compliance, etc.)
如果不明确:询问“您能描述当前情况以及这与哪个业务领域相关吗?”
## Problem Statement & Motivation
### Problems We're Solving
- **Problem 1**: [Specific pain point with impact]
- Impact: [quantify if possible - time wasted, cost, user friction]
- **Problem 2**: [Another pain point]
- Impact: [quantify if possible]
### Why Now?
- [Business driver - market expansion, competitor pressure, regulatory requirement]
- [Technical driver - technical debt, scalability limits]
- [User driver - customer feedback, usage patterns]
### Impact of NOT Solving
- **Business**: [revenue loss, competitive disadvantage]
- **Technical**: [technical debt accumulation, system degradation]
- **Users**: [poor experience, churn risk]
如果用户说“为了与 X 集成”:询问“此集成将解决哪些具体问题?为什么现在很重要?如果不做会怎样?”
## Scope
### ✅ In Scope (V1 - MVP)
Explicit list of what WILL be delivered:
- Feature/capability 1
- Feature/capability 2
- Feature/capability 3
- Integration point A
- Data migration for X
### ❌ Out of Scope (V1)
Explicit list of what will NOT be included in this phase:
- Feature X (deferred to V2)
- Integration Y (not needed for MVP)
- Advanced analytics (future enhancement)
- Multi-region support (V2)
### 🔮 Future Considerations (V2+)
What might come later:
- Feature A (user demand dependent)
- Feature B (after V1 validation)
如果用户未定义:询问“V1 必须具备哪些功能?哪些可以等到后续版本?”
## Technical Solution
### Architecture Overview
[High-level description of the solution]
**Key Components**:
- Component A: [responsibility]
- Component B: [responsibility]
- Component C: [responsibility]
**Architecture Diagram**:
[Include Mermaid diagram, PlantUML, or link to diagram]
```mermaid
graph LR
A[Frontend] -->|HTTP| B[API Gateway]
B -->|GraphQL| C[Backend Service]
C -->|REST| D[External API]
C -->|Write| E[(Database)]
```
| 端点 | 方法 | 描述 | 请求 | 响应 |
|---|---|---|---|---|
/api/v1/resource | POST | 创建资源 | CreateDto | ResourceDto |
/api/v1/resource/:id | GET | 按 ID 获取 | - | ResourceDto |
/api/v1/resource/:id | DELETE | 删除资源 | - | 204 No Content |
示例请求/响应:
// POST /api/v1/resource
{
"name": "Example",
"type": "standard"
}
// Response 201 Created
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"name": "Example",
"type": "standard",
"status": "active",
"createdAt": "2026-02-04T10:00:00Z"
}
新表:
{ModuleName}{EntityName} - [描述]
模式变更(如果修改现有表):
ExistingTable 添加列 newField
迁移策略:
数据回填(如果需要):
受影响的记录:估计数量
处理时间:估计数据迁移的持续时间
验证:回填后如何验证数据完整性
如果用户提供模糊描述:询问“主要组件是什么?数据如何在系统中流动?将创建/修改哪些 API?”
## Risks
| Risk | Impact | Probability | Mitigation |
|------|--------|-------------|------------|
| External API downtime | High | Medium | Implement circuit breaker, cache responses, fallback to degraded mode |
| Data migration failure | High | Low | Test on staging copy, run dry-run first, have rollback script ready |
| Performance degradation | Medium | Medium | Load test before deployment, implement caching, monitor latency |
| Security vulnerability | High | Low | Security review, penetration testing, follow OWASP guidelines |
| Scope creep | Medium | High | Strict scope definition, change request process, regular stakeholder alignment |
**Risk Scoring**:
- **Impact**: High (system down, data loss) / Medium (degraded UX) / Low (minor inconvenience)
- **Probability**: High (>50%) / Medium (20-50%) / Low (<20%)
如果用户提供少于 3 个风险:询问“可能会出什么问题?考虑:外部依赖项、数据完整性、性能、安全性、范围变更。”
## Implementation Plan
| Phase | Task | Description | Owner | Status | Estimate |
| --------------------- | ----------------- | -------------------------------------- | ------- | ------ | -------- |
| **Phase 1 - Setup** | Setup credentials | Obtain API keys, configure environment | @Dev1 | TODO | 1d |
| | Database setup | Create schema, configure datasource | @Dev1 | TODO | 1d |
| **Phase 2 - Core** | Entities & repos | Create TypeORM entities, repositories | @Dev2 | TODO | 3d |
| | Services | Implement business logic services | @Dev2 | TODO | 4d |
| **Phase 3 - APIs** | REST endpoints | Create controllers, DTOs | @Dev3 | TODO | 3d |
| | Integration | Integrate with external API | @Dev1 | TODO | 3d |
| **Phase 4 - Testing** | Unit tests | Test services and repositories | @Team | TODO | 2d |
| | E2E tests | Test full flow | @Team | TODO | 3d |
| **Phase 5 - Deploy** | Staging deploy | Deploy to staging, smoke test | @DevOps | TODO | 1d |
| | Production deploy | Phased rollout to production | @DevOps | TODO | 1d |
**Total Estimate**: ~20 days (4 weeks)
**Dependencies**:
- Must complete Phase N before Phase N+1
- External API access required before Phase 3
- Security review required before Phase 5
如果用户提供模糊计划:询问“您能将其分解为具有具体任务的阶段吗?谁将负责每个部分?预计时间线是怎样的?”
## Security Considerations
### Authentication & Authorization
- **Authentication**: How users prove identity
- Example: JWT tokens, OAuth 2.0, session-based
- **Authorization**: What authenticated users can access
- Example: Role-based (RBAC), Attribute-based (ABAC)
- Ensure users can only access their own resources
### Data Protection
**Encryption**:
- **At Rest**: Database encryption enabled (AES-256)
- **In Transit**: TLS 1.3 for all API communication
- **Secrets**: Store API keys in environment variables / secret manager (AWS Secrets Manager, HashiCorp Vault)
**PII Handling**:
- What PII is collected: [email, name, payment info]
- Legal basis: [consent, contract, legitimate interest]
- Retention: [how long data is kept]
- Deletion: [GDPR right to be forgotten implementation]
### Compliance Requirements
| Regulation | Requirement | Implementation |
| ----------- | ---------------------------------- | ------------------------------------------------- |
| **GDPR** | Data protection, right to deletion | Implement data export/deletion endpoints |
| **PCI DSS** | No storage of card data | Use Stripe tokenization, never store CVV/full PAN |
| **LGPD** | Brazil data protection | Same as GDPR compliance |
### Security Best Practices
- ✅ Input validation on all endpoints
- ✅ SQL injection prevention (parameterized queries)
- ✅ XSS prevention (sanitize user input, CSP headers)
- ✅ CSRF protection (tokens for state-changing operations)
- ✅ Rate limiting (e.g., 10 req/min per user, 100 req/min per IP)
- ✅ Audit logging (log all sensitive operations)
### Secrets Management
**API Keys**:
- Storage: Environment variables or secret management service
- Rotation: Define rotation policy (e.g., every 90 days)
- Access: Backend services only, never exposed to frontend
- Examples: Stripe keys, database credentials, API tokens
**Webhook Signatures**:
- Validate webhook signatures from external services
- Reject requests without valid signature headers
- Log invalid signature attempts for security monitoring
如果缺失且项目涉及支付/认证:询问“这是一个[支付/认证]系统。我需要安全细节:您将如何处理身份验证?将使用什么加密方式?收集哪些 PII?有任何合规要求(GDPR、PCI DSS)吗?”
## Testing Strategy
| Test Type | Scope | Coverage Target | Approach |
| --------------------- | ------------------------ | ------------------------ | -------------------- |
| **Unit Tests** | Services, repositories | > 80% | Jest with mocks |
| **Integration Tests** | API endpoints + database | Critical paths | Supertest + test DB |
| **E2E Tests** | Full user flows | Happy path + error cases | Playwright |
| **Contract Tests** | External API integration | API contract validation | Pact or manual mocks |
| **Load Tests** | Performance under load | Baseline performance | k6 or Artillery |
### Test Scenarios
**Unit Tests**:
- ✅ Service business logic (create, update, delete)
- ✅ Repository query methods
- ✅ Error handling (throw correct exceptions)
- ✅ Edge cases (null inputs, invalid data)
**Integration Tests**:
- ✅ POST `/api/v1/resource` → creates in DB
- ✅ GET `/api/v1/resource/:id` → returns correct data
- ✅ DELETE `/api/v1/resource/:id` → removes from DB
- ✅ Invalid input → returns 400 Bad Request
- ✅ Unauthorized access → returns 401/403
**E2E Tests**:
- ✅ User creates resource → success flow
- ✅ User tries to access another user's resource → denied
- ✅ External API fails → graceful degradation
- ✅ Database connection lost → proper error handling
**Load Tests**:
- Target: 100 req/s sustained, 500 req/s peak
- Monitor: Latency (p50, p95, p99), error rate, throughput
- Pass criteria: p95 < 500ms, error rate < 1%
### Test Data Management
- Use factories for test data (e.g., `@faker-js/faker`)
- Seed test database with realistic data
- Clean up test data after each test
- Use separate test database (never use production)
如果缺失:询问“您将如何测试这个?需要哪些测试类型(单元、集成、端到端)?关键的测试场景是什么?”
## Monitoring & Observability
### Metrics to Track
| Metric | Type | Alert Threshold | Dashboard |
| ------------------------- | ---------- | ----------------- | ------------------ |
| `api.latency` | Latency | p95 > 1s for 5min | DataDog / Grafana |
| `api.error_rate` | Error rate | > 1% for 5min | DataDog / Grafana |
| `external_api.latency` | Latency | p95 > 2s for 5min | DataDog |
| `external_api.errors` | Counter | > 5 in 1min | PagerDuty |
| `database.query_time` | Duration | p95 > 100ms | DataDog |
| `webhook.processing_time` | Duration | > 5s | Internal Dashboard |
### Structured Logging
**Log Format** (JSON):
```json
{
"level": "info",
"timestamp": "2026-02-04T10:00:00Z",
"message": "Resource created",
"context": {
"userId": "user-123",
"resourceId": "res-456",
"action": "create",
"duration_ms": 45
}
}
```
应记录的内容:
不应记录的内容:
| 告警 | 严重性 | 渠道 | 待命操作 |
|---|---|---|---|
| 错误率 > 5% | P1(严重) | PagerDuty | 立即调查,必要时回滚 |
| 外部 API 宕机 | P1(严重) | PagerDuty | 启用降级模式,通知利益相关者 |
| 延迟 > 2s (p95) | P2(高) | Slack #engineering | 调查性能下降 |
| Webhook 失败 > 20 | P2(高) | Slack #engineering | 检查 webhook 端点、Stripe 状态 |
| 数据库连接池耗尽 | P1(严重) | PagerDuty | 扩大连接数或调查泄漏 |
运维仪表板:
业务仪表板:
创建的资源(每日计数)
活跃用户
转化指标(如适用)
如果生产系统缺失:询问“您将如何在生产环境中监控这个?哪些指标重要?您需要哪些告警?”
## Rollback Plan
### Deployment Strategy
- **Feature Flag**: `FEATURE_X_ENABLED` (LaunchDarkly / custom)
- **Phased Rollout**:
- Phase 1: 5% of traffic (1 day)
- Phase 2: 25% of traffic (1 day)
- Phase 3: 50% of traffic (1 day)
- Phase 4: 100% of traffic
- **Canary Deployment**: Deploy to 1 instance first, monitor for 1h before full rollout
### Rollback Triggers
| Trigger | Action |
|---------|--------|
| Error rate > 5% for 5 minutes | **Immediate rollback** - disable feature flag |
| Latency > 3s (p95) for 10 minutes | **Investigate** - rollback if no quick fix |
| External API integration failing > 50% | **Rollback** - revert to previous version |
| Database migration fails | **STOP** - do not proceed, investigate |
| Customer reports of data loss | **Immediate rollback** + incident response |
### Rollback Steps
**1. Immediate Rollback (< 5 minutes)**:
- **Feature Flag**: Disable via feature flag dashboard (instant)
- **Deployment**: Revert to previous version via deployment tool (2-3 minutes)
**2. Database Rollback** (if schema changed):
- Run down migration using migration tool
- Verify schema integrity
- Confirm data consistency
**3. Communication**:
- Notify #engineering Slack channel
- Update status page (if customer-facing)
- Create incident ticket
- Schedule post-mortem within 24h
### Post-Rollback
- **Root Cause Analysis**: Within 24 hours
- **Fix**: Implement fix in development environment
- **Re-test**: Full test suite + additional tests for root cause
- **Re-deploy**: Following same phased rollout strategy
### Database Rollback Considerations
- **Migrations**: Always create reversible migrations (down migration)
- **Data Backfill**: If data was modified, have script to restore previous state
- **Backup**: Take database snapshot before running migrations
- **Testing**: Test rollback procedure on staging before production
如果生产环境缺失:询问“如果部署出错怎么办?您将如何回滚?回滚的触发条件是什么?”
## Success Metrics
| Metric | Baseline | Target | Measurement |
| ----------------------- | ------------- | ------- | ------------------ |
| API latency (p95) | N/A (new API) | < 200ms | DataDog APM |
| Error rate | N/A | < 0.1% | Sentry / logs |
| Conversion rate | N/A | > 70% | Analytics |
| User satisfaction | N/A | NPS > 8 | User survey |
| Time to complete action | N/A | < 30s | Frontend analytics |
**Business Metrics**:
- Increase in [metric] by [X%]
- Reduction in [cost/time] by [Y%]
- User adoption: [Z%] of users using new feature within 30 days
**Technical Metrics**:
- Zero production incidents in first 30 days
- Test coverage > 80%
- Documentation completeness: 100% of public APIs documented
## Glossary
| Term | Description |
| ------------------- | --------------------------------------------------------------------- |
| **Customer** | A user who has
You are an expert in creating Technical Design Documents (TDDs) that clearly communicate software architecture decisions, implementation plans, and risk assessments following industry best practices.
Use this skill when:
CRITICAL : Always generate the TDD in the same language as the user's request. Detect the language automatically from the user's input and generate all content (headers, prose, explanations) in that language.
Translation Guidelines :
Common Section Header Translations :
| English | Portuguese | Spanish |
|---|---|---|
| Context | Contexto | Contexto |
| Problem Statement | Definição do Problema | Definición del Problema |
| Scope | Escopo | Alcance |
| Technical Solution | Solução Técnica | Solución Técnica |
| Risks | Riscos | Riesgos |
| Implementation Plan | Plano de Implementação | Plan de Implementación |
| Security Considerations | Considerações de Segurança | Consideraciones de Seguridad |
| Testing Strategy | Estratégia de Testes | Estrategia de Pruebas |
| Monitoring & Observability | Monitoramento e Observabilidade | Monitoreo y Observabilidad |
| Rollback Plan | Plano de Rollback | Plan de Reversión |
This skill follows established patterns from:
CRITICAL PRINCIPLE : TDDs document architectural decisions and contracts , NOT implementation code.
| Category | Include | Example |
|---|---|---|
| API Contracts | Request/Response schemas | POST /subscriptions with JSON body structure |
| Data Schemas | Table structures, field types | BillingCustomer table with fields: id, email, stripeId |
| Architecture | Components, data flow | "Frontend → API → Service → Stripe → Database" |
| Decisions | What technology, why chosen | "Use Stripe because: global support, PCI compliance, best docs" |
| Diagrams | Sequence, architecture, flow | Mermaid/PlantUML diagrams showing interactions |
| Structures |
| Category | Avoid | Why |
|---|---|---|
| CLI Commands | nx db:generate, kubectl rollout undo | Too specific, may change with tooling |
| Code Snippets | TypeScript/JavaScript implementation | Belongs in code, not docs |
| Framework Specifics | @Injectable(), extends Repository | Framework may change, decision is what matters |
| File Paths | scripts/backfill-feature.ts |
**Rollback Steps**:
```bash
curl -X PATCH https://api.launchdarkly.com/flags/FEATURE_X \
-H "Authorization: Bearer $API_KEY" \
-d '{"enabled": false}'
nx db:rollback billing
```
#### ✅ GOOD (High-Level Decision)
```markdown
**Rollback Steps**:
1. Disable feature flag via feature flag service dashboard
2. Revert database schema using down migration
3. Verify system returns to previous state
4. Monitor error rates to confirm rollback success
**Service Implementation**:
```typescript
@Injectable()
export class CustomerService {
@Transactional({ connectionName: 'billing' })
async create(data: CreateCustomerDto) {
const customer = new Customer()
customer.email = data.email
return this.repository.save(customer)
}
}
```
#### ✅ GOOD (High-Level Structure)
```markdown
**Service Layer**:
- `CustomerService`: Manages customer lifecycle
- `create()`: Creates customer, validates email uniqueness
- `getById()`: Retrieves customer by ID
- `updatePaymentMethod()`: Updates default payment method
- All write operations use transactions to ensure data consistency
- Services call external Stripe API and cache results locally
Before adding detail to TDD, ask:
"If we change frameworks, does this detail still apply?"
"Can someone implement this differently and still meet the requirement?"
Goal : TDD should survive implementation changes. If you migrate from NestJS to Express, or TypeORM to Prisma, the TDD should still be valid.
These sections are required. If the user doesn't provide information, you must ask using AskQuestion tool:
These are highly recommended especially for:
Ask user: "Would you like to add these sections now or later?"
Use this heuristic to determine project complexity:
Use sections : 1, 2, 3, 4, 5, 6, 7, 9
Skip : Alternatives, Migration Plan, Approval
Use sections : 1-11, 15, 18
Offer : Success Metrics, Glossary, Alternatives, Performance
Use all sections (1-20)
Critical : All mandatory + critical sections must be detailed
Use AskQuestion tool to collect basic information:
{
"title": "TDD Project Information",
"questions": [
{
"id": "project_name",
"prompt": "What is the name of the feature/integration/project?",
"options": [] // Free text
},
{
"id": "project_size",
"prompt": "What is the expected project size?",
"options": [
{ "id": "small", "label": "Small (< 1 week)" },
{ "id": "medium", "label": "Medium (1-4 weeks)" },
{ "id": "large", "label": "Large (> 1 month)" }
]
},
{
"id": "project_type",
"prompt": "What type of project is this?",
"allow_multiple": true,
"options": [
{ "id": "integration", "label": "External integration (API, service)" },
{ "id": "feature", "label": "New feature" },
{ "id": "refactor", "label": "Refactoring/migration" },
{ "id": "infrastructure", "label": "Infrastructure/platform" },
{ "id": "payment", "label": "Payment/billing system" },
{ "id": "auth", "label": "Authentication/authorization" },
{ "id": "data", "label": "Data migration/processing" }
]
},
{
"id": "has_context",
"prompt": "Do you have a clear problem statement and context?",
"options": [
{ "id": "yes", "label": "Yes, I can provide it now" },
{ "id": "partial", "label": "Partially, need help clarifying" },
{ "id": "no", "label": "No, need help defining it" }
]
}
]
}
Based on answers, check if user can provide:
MANDATORY fields to ask if missing :
Ask using AskQuestion or natural conversation IN THE USER'S LANGUAGE :
English Example :
I need the following information to create the TDD:
1. **Problem Statement**:
- What problem are we solving?
- Why is this important now?
- What happens if we don't solve it?
2. **Scope**:
- What WILL be delivered in this project?
- What will NOT be included (out of scope)?
3. **Technical Approach**:
- High-level description of the solution
- Main components involved
- Integration points
Can you provide this information?
Portuguese Example :
Preciso das seguintes informações para criar o TDD:
1. **Definição do Problema**:
- Que problema estamos resolvendo?
- Por que isso é importante agora?
- O que acontece se não resolvermos?
2. **Escopo**:
- O que SERÁ entregue neste projeto?
- O que NÃO será incluído (fora do escopo)?
3. **Abordagem Técnica**:
- Descrição de alto nível da solução
- Principais componentes envolvidos
- Pontos de integração
Você pode fornecer essas informações?
Based on project_type, determine if critical sections are mandatory:
| Project Type | Critical Sections Required |
|---|---|
payment, auth | Security Considerations (MANDATORY) |
| All production | Monitoring & Observability (MANDATORY) |
| All production | Rollback Plan (MANDATORY) |
integration | Dependencies , Security |
| All | Testing Strategy (highly recommended) |
If critical sections are missing, ASK IN THE USER'S LANGUAGE :
English :
This is a [payment/auth/production] system. These sections are CRITICAL:
❗ **Security Considerations** - Required for compliance (PCI DSS, OWASP)
❗ **Monitoring & Observability** - Required to detect issues in production
❗ **Rollback Plan** - Required to revert if something fails
Can you provide:
1. Security requirements (auth, encryption, PII handling)?
2. What metrics will you monitor?
3. How will you rollback if something goes wrong?
Portuguese :
Este é um sistema de [pagamento/autenticação/produção]. Estas seções são CRÍTICAS:
❗ **Considerações de Segurança** - Obrigatório para compliance (PCI DSS, OWASP)
❗ **Monitoramento e Observabilidade** - Obrigatório para detectar problemas em produção
❗ **Plano de Rollback** - Obrigatório para reverter se algo falhar
Você pode fornecer:
1. Requisitos de segurança (autenticação, encriptação, tratamento de PII)?
2. Quais métricas você vai monitorar?
3. Como você fará rollback se algo der errado?
After mandatory sections are covered, offer optional sections IN THE USER'S LANGUAGE :
English :
I can also add these sections to make the TDD more complete:
📊 **Success Metrics** - How will you measure success?
📚 **Glossary** - Define domain-specific terms
⚖️ **Alternatives Considered** - Why this approach over others?
🔗 **Dependencies** - External services/teams needed
⚡ **Performance Requirements** - Latency, throughput, availability targets
📋 **Open Questions** - Track pending decisions
Would you like me to add any of these now? (You can add them later)
Portuguese :
Também posso adicionar estas seções para tornar o TDD mais completo:
📊 **Métricas de Sucesso** - Como você vai medir o sucesso?
📚 **Glossário** - Definir termos específicos do domínio
⚖️ **Alternativas Consideradas** - Por que esta abordagem ao invés de outras?
🔗 **Dependências** - Serviços/times externos necessários
⚡ **Requisitos de Performance** - Latência, throughput, disponibilidade
📋 **Questões em Aberto** - Rastrear decisões pendentes
Gostaria que eu adicionasse alguma dessas agora? (Você pode adicionar depois)
Generate the TDD in Markdown format following the templates below.
If user has Confluence Assistant skill available, ask in their language :
English :
Would you like me to publish this TDD to Confluence?
- I can create a new page in your space
- Or update an existing page
Portuguese :
Gostaria que eu publicasse este TDD no Confluence?
- Posso criar uma nova página no seu espaço
- Ou atualizar uma página existente
# TDD - [Project/Feature Name]
| Field | Value |
| --------------- | ---------------------------- |
| Tech Lead | @Name |
| Product Manager | @Name (if applicable) |
| Team | Name1, Name2, Name3 |
| Epic/Ticket | [Link to Jira/Linear] |
| Figma/Design | [Link if applicable] |
| Status | Draft / In Review / Approved |
| Created | YYYY-MM-DD |
| Last Updated | YYYY-MM-DD |
If user doesn't provide : Ask for Tech Lead, Team members, and Epic link.
## Context
[2-4 paragraph description of the project]
**Background**:
What is the current state? What system/feature does this relate to?
**Domain**:
What business domain is this part of? (e.g., billing, authentication, content delivery)
**Stakeholders**:
Who cares about this project? (users, business, compliance, etc.)
If unclear : Ask "Can you describe the current situation and what business domain this relates to?"
## Problem Statement & Motivation
### Problems We're Solving
- **Problem 1**: [Specific pain point with impact]
- Impact: [quantify if possible - time wasted, cost, user friction]
- **Problem 2**: [Another pain point]
- Impact: [quantify if possible]
### Why Now?
- [Business driver - market expansion, competitor pressure, regulatory requirement]
- [Technical driver - technical debt, scalability limits]
- [User driver - customer feedback, usage patterns]
### Impact of NOT Solving
- **Business**: [revenue loss, competitive disadvantage]
- **Technical**: [technical debt accumulation, system degradation]
- **Users**: [poor experience, churn risk]
If user says "to integrate with X" : Ask "What specific problems will this integration solve? Why is it important now? What happens if we don't do it?"
## Scope
### ✅ In Scope (V1 - MVP)
Explicit list of what WILL be delivered:
- Feature/capability 1
- Feature/capability 2
- Feature/capability 3
- Integration point A
- Data migration for X
### ❌ Out of Scope (V1)
Explicit list of what will NOT be included in this phase:
- Feature X (deferred to V2)
- Integration Y (not needed for MVP)
- Advanced analytics (future enhancement)
- Multi-region support (V2)
### 🔮 Future Considerations (V2+)
What might come later:
- Feature A (user demand dependent)
- Feature B (after V1 validation)
If user doesn't define : Ask "What are the must-haves for V1? What can wait for later versions?"
## Technical Solution
### Architecture Overview
[High-level description of the solution]
**Key Components**:
- Component A: [responsibility]
- Component B: [responsibility]
- Component C: [responsibility]
**Architecture Diagram**:
[Include Mermaid diagram, PlantUML, or link to diagram]
```mermaid
graph LR
A[Frontend] -->|HTTP| B[API Gateway]
B -->|GraphQL| C[Backend Service]
C -->|REST| D[External API]
C -->|Write| E[(Database)]
```
| Endpoint | Method | Description | Request | Response |
|---|---|---|---|---|
/api/v1/resource | POST | Creates resource | CreateDto | ResourceDto |
/api/v1/resource/:id | GET | Get by ID | - | ResourceDto |
/api/v1/resource/:id |
Example Request/Response :
// POST /api/v1/resource
{
"name": "Example",
"type": "standard"
}
// Response 201 Created
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"name": "Example",
"type": "standard",
"status": "active",
"createdAt": "2026-02-04T10:00:00Z"
}
New Tables :
{ModuleName}{EntityName} - [description]
Schema Changes (if modifying existing):
newField to ExistingTable
Migration Strategy :
Data Backfill (if needed):
Affected records: Estimate quantity
Processing time: Estimate duration for data migration
Validation: How to verify data integrity after backfill
If user provides vague description: Ask "What are the main components? How does data flow through the system? What APIs will be created/modified?"
## Risks
| Risk | Impact | Probability | Mitigation |
|------|--------|-------------|------------|
| External API downtime | High | Medium | Implement circuit breaker, cache responses, fallback to degraded mode |
| Data migration failure | High | Low | Test on staging copy, run dry-run first, have rollback script ready |
| Performance degradation | Medium | Medium | Load test before deployment, implement caching, monitor latency |
| Security vulnerability | High | Low | Security review, penetration testing, follow OWASP guidelines |
| Scope creep | Medium | High | Strict scope definition, change request process, regular stakeholder alignment |
**Risk Scoring**:
- **Impact**: High (system down, data loss) / Medium (degraded UX) / Low (minor inconvenience)
- **Probability**: High (>50%) / Medium (20-50%) / Low (<20%)
If user provides < 3 risks: Ask "What could go wrong? Consider: external dependencies, data integrity, performance, security, scope changes."
## Implementation Plan
| Phase | Task | Description | Owner | Status | Estimate |
| --------------------- | ----------------- | -------------------------------------- | ------- | ------ | -------- |
| **Phase 1 - Setup** | Setup credentials | Obtain API keys, configure environment | @Dev1 | TODO | 1d |
| | Database setup | Create schema, configure datasource | @Dev1 | TODO | 1d |
| **Phase 2 - Core** | Entities & repos | Create TypeORM entities, repositories | @Dev2 | TODO | 3d |
| | Services | Implement business logic services | @Dev2 | TODO | 4d |
| **Phase 3 - APIs** | REST endpoints | Create controllers, DTOs | @Dev3 | TODO | 3d |
| | Integration | Integrate with external API | @Dev1 | TODO | 3d |
| **Phase 4 - Testing** | Unit tests | Test services and repositories | @Team | TODO | 2d |
| | E2E tests | Test full flow | @Team | TODO | 3d |
| **Phase 5 - Deploy** | Staging deploy | Deploy to staging, smoke test | @DevOps | TODO | 1d |
| | Production deploy | Phased rollout to production | @DevOps | TODO | 1d |
**Total Estimate**: ~20 days (4 weeks)
**Dependencies**:
- Must complete Phase N before Phase N+1
- External API access required before Phase 3
- Security review required before Phase 5
If user provides vague plan : Ask "Can you break this down into phases with specific tasks? Who will work on each part? What's the estimated timeline?"
## Security Considerations
### Authentication & Authorization
- **Authentication**: How users prove identity
- Example: JWT tokens, OAuth 2.0, session-based
- **Authorization**: What authenticated users can access
- Example: Role-based (RBAC), Attribute-based (ABAC)
- Ensure users can only access their own resources
### Data Protection
**Encryption**:
- **At Rest**: Database encryption enabled (AES-256)
- **In Transit**: TLS 1.3 for all API communication
- **Secrets**: Store API keys in environment variables / secret manager (AWS Secrets Manager, HashiCorp Vault)
**PII Handling**:
- What PII is collected: [email, name, payment info]
- Legal basis: [consent, contract, legitimate interest]
- Retention: [how long data is kept]
- Deletion: [GDPR right to be forgotten implementation]
### Compliance Requirements
| Regulation | Requirement | Implementation |
| ----------- | ---------------------------------- | ------------------------------------------------- |
| **GDPR** | Data protection, right to deletion | Implement data export/deletion endpoints |
| **PCI DSS** | No storage of card data | Use Stripe tokenization, never store CVV/full PAN |
| **LGPD** | Brazil data protection | Same as GDPR compliance |
### Security Best Practices
- ✅ Input validation on all endpoints
- ✅ SQL injection prevention (parameterized queries)
- ✅ XSS prevention (sanitize user input, CSP headers)
- ✅ CSRF protection (tokens for state-changing operations)
- ✅ Rate limiting (e.g., 10 req/min per user, 100 req/min per IP)
- ✅ Audit logging (log all sensitive operations)
### Secrets Management
**API Keys**:
- Storage: Environment variables or secret management service
- Rotation: Define rotation policy (e.g., every 90 days)
- Access: Backend services only, never exposed to frontend
- Examples: Stripe keys, database credentials, API tokens
**Webhook Signatures**:
- Validate webhook signatures from external services
- Reject requests without valid signature headers
- Log invalid signature attempts for security monitoring
If missing and project involves payments/auth : Ask "This is a [payment/auth] system. I need security details: How will you handle authentication? What encryption will be used? What PII is collected? Any compliance requirements (GDPR, PCI DSS)?"
## Testing Strategy
| Test Type | Scope | Coverage Target | Approach |
| --------------------- | ------------------------ | ------------------------ | -------------------- |
| **Unit Tests** | Services, repositories | > 80% | Jest with mocks |
| **Integration Tests** | API endpoints + database | Critical paths | Supertest + test DB |
| **E2E Tests** | Full user flows | Happy path + error cases | Playwright |
| **Contract Tests** | External API integration | API contract validation | Pact or manual mocks |
| **Load Tests** | Performance under load | Baseline performance | k6 or Artillery |
### Test Scenarios
**Unit Tests**:
- ✅ Service business logic (create, update, delete)
- ✅ Repository query methods
- ✅ Error handling (throw correct exceptions)
- ✅ Edge cases (null inputs, invalid data)
**Integration Tests**:
- ✅ POST `/api/v1/resource` → creates in DB
- ✅ GET `/api/v1/resource/:id` → returns correct data
- ✅ DELETE `/api/v1/resource/:id` → removes from DB
- ✅ Invalid input → returns 400 Bad Request
- ✅ Unauthorized access → returns 401/403
**E2E Tests**:
- ✅ User creates resource → success flow
- ✅ User tries to access another user's resource → denied
- ✅ External API fails → graceful degradation
- ✅ Database connection lost → proper error handling
**Load Tests**:
- Target: 100 req/s sustained, 500 req/s peak
- Monitor: Latency (p50, p95, p99), error rate, throughput
- Pass criteria: p95 < 500ms, error rate < 1%
### Test Data Management
- Use factories for test data (e.g., `@faker-js/faker`)
- Seed test database with realistic data
- Clean up test data after each test
- Use separate test database (never use production)
If missing : Ask "How will you test this? What test types are needed (unit, integration, e2e)? What are critical test scenarios?"
## Monitoring & Observability
### Metrics to Track
| Metric | Type | Alert Threshold | Dashboard |
| ------------------------- | ---------- | ----------------- | ------------------ |
| `api.latency` | Latency | p95 > 1s for 5min | DataDog / Grafana |
| `api.error_rate` | Error rate | > 1% for 5min | DataDog / Grafana |
| `external_api.latency` | Latency | p95 > 2s for 5min | DataDog |
| `external_api.errors` | Counter | > 5 in 1min | PagerDuty |
| `database.query_time` | Duration | p95 > 100ms | DataDog |
| `webhook.processing_time` | Duration | > 5s | Internal Dashboard |
### Structured Logging
**Log Format** (JSON):
```json
{
"level": "info",
"timestamp": "2026-02-04T10:00:00Z",
"message": "Resource created",
"context": {
"userId": "user-123",
"resourceId": "res-456",
"action": "create",
"duration_ms": 45
}
}
```
What to Log :
What NOT to Log :
| Alert | Severity | Channel | On-Call Action |
|---|---|---|---|
| Error rate > 5% | P1 (Critical) | PagerDuty | Immediate investigation, rollback if needed |
| External API down | P1 (Critical) | PagerDuty | Enable fallback mode, notify stakeholders |
| Latency > 2s (p95) | P2 (High) | Slack #engineering | Investigate performance degradation |
| Webhook failures > 20 | P2 (High) | Slack #engineering | Check webhook endpoint, Stripe status |
| Database connection pool exhausted | P1 (Critical) | PagerDuty | Scale up connections or investigate leak |
Operational Dashboard :
Business Dashboard :
Resources created (count per day)
Active users
Conversion metrics (if applicable)
If missing for production system: Ask "How will you monitor this in production? What metrics matter? What alerts do you need?"
## Rollback Plan
### Deployment Strategy
- **Feature Flag**: `FEATURE_X_ENABLED` (LaunchDarkly / custom)
- **Phased Rollout**:
- Phase 1: 5% of traffic (1 day)
- Phase 2: 25% of traffic (1 day)
- Phase 3: 50% of traffic (1 day)
- Phase 4: 100% of traffic
- **Canary Deployment**: Deploy to 1 instance first, monitor for 1h before full rollout
### Rollback Triggers
| Trigger | Action |
|---------|--------|
| Error rate > 5% for 5 minutes | **Immediate rollback** - disable feature flag |
| Latency > 3s (p95) for 10 minutes | **Investigate** - rollback if no quick fix |
| External API integration failing > 50% | **Rollback** - revert to previous version |
| Database migration fails | **STOP** - do not proceed, investigate |
| Customer reports of data loss | **Immediate rollback** + incident response |
### Rollback Steps
**1. Immediate Rollback (< 5 minutes)**:
- **Feature Flag**: Disable via feature flag dashboard (instant)
- **Deployment**: Revert to previous version via deployment tool (2-3 minutes)
**2. Database Rollback** (if schema changed):
- Run down migration using migration tool
- Verify schema integrity
- Confirm data consistency
**3. Communication**:
- Notify #engineering Slack channel
- Update status page (if customer-facing)
- Create incident ticket
- Schedule post-mortem within 24h
### Post-Rollback
- **Root Cause Analysis**: Within 24 hours
- **Fix**: Implement fix in development environment
- **Re-test**: Full test suite + additional tests for root cause
- **Re-deploy**: Following same phased rollout strategy
### Database Rollback Considerations
- **Migrations**: Always create reversible migrations (down migration)
- **Data Backfill**: If data was modified, have script to restore previous state
- **Backup**: Take database snapshot before running migrations
- **Testing**: Test rollback procedure on staging before production
If missing for production : Ask "What happens if the deploy goes wrong? How will you rollback? What are the triggers for rollback?"
## Success Metrics
| Metric | Baseline | Target | Measurement |
| ----------------------- | ------------- | ------- | ------------------ |
| API latency (p95) | N/A (new API) | < 200ms | DataDog APM |
| Error rate | N/A | < 0.1% | Sentry / logs |
| Conversion rate | N/A | > 70% | Analytics |
| User satisfaction | N/A | NPS > 8 | User survey |
| Time to complete action | N/A | < 30s | Frontend analytics |
**Business Metrics**:
- Increase in [metric] by [X%]
- Reduction in [cost/time] by [Y%]
- User adoption: [Z%] of users using new feature within 30 days
**Technical Metrics**:
- Zero production incidents in first 30 days
- Test coverage > 80%
- Documentation completeness: 100% of public APIs documented
## Glossary
| Term | Description |
| ------------------- | --------------------------------------------------------------------- |
| **Customer** | A user who has an active subscription or has made a purchase |
| **Subscription** | Recurring payment arrangement with defined interval (monthly, annual) |
| **Trial** | Free period for users to test service before payment required |
| **Webhook** | HTTP callback from external service to notify of events |
| **Idempotency** | Operation can be applied multiple times with same result |
| **Circuit Breaker** | Pattern to prevent cascading failures when external service is down |
**Acronyms**:
- **API**: Application Programming Interface
- **SLA**: Service Level Agreement
- **PII**: Personally Identifiable Information
- **GDPR**: General Data Protection Regulation
- **PCI DSS**: Payment Card Industry Data Security Standard
## Alternatives Considered
| Option | Pros | Cons | Why Not Chosen |
| --------------------- | -------------------------------------------------------- | --------------------------------------------------------------------------- | ------------------------------------------------- |
| **Option A** (Chosen) | + Best documentation<br>+ Global support<br>+ Mature SDK | - Cost: 2.9% + $0.30<br>- Vendor lock-in | ✅ **Chosen** - Best balance of features and cost |
| Option B | + Lower fees (2.5%)<br>+ Brand recognition | - Poor developer experience<br>- Limited international support | Developer experience inferior, harder to maintain |
| Option C | + Full control<br>+ No transaction fees | - High maintenance cost<br>- Compliance burden (PCI DSS)<br>- Security risk | Too risky and expensive to maintain in-house |
| Option D | + Cheapest option | - No support<br>- Limited features<br>- Unknown reliability | Too risky for production payment processing |
**Decision Criteria**:
1. Developer experience and documentation quality (weight: 40%)
2. Total cost of ownership (weight: 30%)
3. International support and compliance (weight: 20%)
4. Reliability and uptime (weight: 10%)
**Why Option A Won**:
- Scored highest on developer experience (critical for fast iteration)
- Industry-standard for startups (easier to hire developers with experience)
- Built-in compliance (PCI DSS, SCA, 3D Secure) reduces risk
## Dependencies
| Dependency | Type | Owner | Status | Risk |
| --------------------- | -------------- | ----------- | ---------------- | ------------------- |
| Stripe API | External | Stripe Inc. | Production-ready | Low (99.99% uptime) |
| Identity Module | Internal | Team Auth | Production-ready | Low |
| Database (PostgreSQL) | Infrastructure | DevOps | Ready | Low |
| Redis (caching) | Infrastructure | DevOps | Needs setup | Medium |
| Feature flag service | Internal | Platform | Ready | Low |
**Approval Requirements**:
- [ ] Security team review (for payment/auth projects)
- [ ] Compliance sign-off (for PII/payment data)
- [ ] Ops team ready for monitoring setup
- [ ] Product sign-off on scope
**Blockers**:
- Waiting for Stripe production keys (ETA: 2026-02-10)
- Need Redis setup in staging (ETA: 2026-02-08)
## Performance Requirements
| Metric | Requirement | Measurement Method |
| ------------------- | ----------------------------- | ------------------ |
| API Latency (p50) | < 100ms | DataDog APM |
| API Latency (p95) | < 500ms | DataDog APM |
| API Latency (p99) | < 1s | DataDog APM |
| Throughput | 1000 req/s sustained | Load testing (k6) |
| Availability | 99.9% (< 8.76h downtime/year) | Uptime monitoring |
| Database query time | < 50ms (p95) | Slow query log |
**Load Testing Plan**:
- Baseline: 100 req/s for 10 minutes
- Peak: 500 req/s for 5 minutes
- Spike: 1000 req/s for 1 minute
**Scalability**:
- Horizontal scaling: Add more instances (Kubernetes autoscaling)
- Database: Read replicas if needed (after 10k req/s)
- Caching: Redis for frequently accessed data (> 100 req/s per resource)
## Migration Plan
### Migration Strategy
**Type**: [Blue-Green / Rolling / Big Bang / Phased]
**Phases**:
| Phase | Description | Users Affected | Duration | Rollback |
| ----------------- | -------------------------------------- | -------------- | -------- | ------------------- |
| 1. Preparation | Set up new system, run in parallel | 0% | 1 week | N/A |
| 2. Shadow Mode | New system processes but doesn't serve | 0% | 1 week | Instant |
| 3. Pilot | 5% of users on new system | 5% | 1 week | < 5min |
| 4. Ramp Up | 50% of users on new system | 50% | 1 week | < 5min |
| 5. Full Migration | 100% of users on new system | 100% | 1 day | < 5min |
| 6. Decommission | Turn off old system | 0% | 1 week | Restore from backup |
### Data Migration
**Source**: Old system database
**Destination**: New system database
**Volume**: [X million records]
**Method**: [ETL script / database replication / API sync]
**Steps**:
1. Export data from old system (script: `scripts/export-old-data.ts`)
2. Transform data to new schema (script: `scripts/transform-data.ts`)
3. Validate data integrity (checksums, row counts)
4. Load into new system (script: `scripts/load-new-data.ts`)
5. Verify: Run parallel reads, compare results
**Timeline**:
- Dry run on staging: 2026-02-10
- Production migration window: 2026-02-15 02:00-06:00 UTC (low traffic)
### Backward Compatibility
- Old API endpoints will remain active for 90 days
- Deprecation warnings added to responses
- Client libraries updated with migration guide
## Open Questions
| # | Question | Context | Owner | Status | Decision Date |
| --- | ------------------------------------------------------ | ---------------------------------------------- | --------- | ---------------- | ------------- |
| 1 | How to handle trial expiration without payment method? | User loses access immediately or grace period? | @Product | 🟡 In Discussion | TBD |
| 2 | Allow multiple trials for same email? | Prevent abuse vs. legitimate use cases | @TechLead | 🔴 Open | TBD |
| 3 | SLA for webhook processing? | Stripe retries for 72h, what's our target? | @Backend | 🔴 Open | TBD |
| 4 | Support for promo codes in V1? | Marketing requested, is it in scope? | @Product | ✅ Resolved: V2 | 2026-02-01 |
| 5 | Fallback if Identity Module fails? | Can we create subscription without user data? | @TechLead | 🔴 Open | TBD |
**Status Legend**:
- 🔴 Open - needs decision
- 🟡 In Discussion - actively being discussed
- ✅ Resolved - decision made
## Roadmap / Timeline
| Phase | Deliverables | Duration | Target Date | Status |
| ------------------------ | --------------------------------------------------------------------------------- | -------- | ----------- | -------------- |
| **Phase 0: Setup** | - Stripe credentials<br>- Staging environment<br>- SDK installed | 2 days | 2026-02-05 | ✅ Complete |
| **Phase 1: Persistence** | - Entities created<br>- Repositories implemented<br>- Migrations generated | 3 days | 2026-02-08 | 🟡 In Progress |
| **Phase 2: Services** | - CustomerService<br>- SubscriptionService<br>- Identity integration | 5 days | 2026-02-15 | ⏳ Pending |
| **Phase 3: APIs** | - POST /subscriptions<br>- DELETE /subscriptions/:id<br>- GET /subscriptions | 3 days | 2026-02-18 | ⏳ Pending |
| **Phase 4: Webhooks** | - Webhook endpoint<br>- Signature validation<br>- Event handlers | 4 days | 2026-02-22 | ⏳ Pending |
| **Phase 5: Testing** | - Unit tests (80% coverage)<br>- Integration tests<br>- E2E tests | 5 days | 2026-02-27 | ⏳ Pending |
| **Phase 6: Deploy** | - Documentation<br>- Monitoring setup<br>- Staging deploy<br>- Production rollout | 3 days | 2026-03-02 | ⏳ Pending |
**Total Duration**: ~25 days (5 weeks)
**Milestones**:
- 🎯 M1: MVP ready for staging (2026-02-22)
- 🎯 M2: Production deployment (2026-03-02)
- 🎯 M3: 100% rollout complete (2026-03-09)
**Critical Path**:
Phase 0 → Phase 1 → Phase 2 → Phase 3 → Phase 4 → Phase 5 → Phase 6
## Approval & Sign-off
| Role | Name | Status | Date | Comments |
| ------------------------ | ----- | -------------- | ---------- | --------------------------------- |
| Tech Lead | @Name | ✅ Approved | 2026-02-04 | LGTM, proceed with implementation |
| Staff/Principal Engineer | @Name | ⏳ Pending | - | Requested security review first |
| Product Manager | @Name | ✅ Approved | 2026-02-03 | Scope aligned with roadmap |
| Engineering Manager | @Name | ⏳ Pending | - | - |
| Security Team | @Name | 🔴 Not Started | - | Required for payment integration |
| Compliance/Legal | @Name | N/A | - | Not required for this project |
**Approval Criteria**:
- ✅ All mandatory sections complete
- ✅ Security review passed (if applicable)
- ✅ Risks identified and mitigated
- ✅ Timeline realistic and agreed upon
- ⏳ Test strategy approved by QA
- ⏳ Monitoring plan reviewed by SRE
**Next Steps After Approval**:
1. Create Epic in Jira (link in metadata)
2. Break down into User Stories
3. Begin Phase 1 implementation
4. Schedule kickoff meeting with team
Before finalizing TDD, ensure:
If Payment/Auth project :
If Production system :
All projects :
✅ TDD Created: "[Project Name]"
**Sections Included**:
✅ Mandatory (7/7): All present
✅ Critical (3/4): Security, Testing, Monitoring
⚠️ Missing: Rollback Plan (recommended for production)
**Suggested Next Steps**:
- Add Rollback Plan section (critical for production)
- Review Security section with InfoSec team
- Create Epic in Jira and link in metadata
- Schedule TDD review meeting with stakeholders
Would you like me to:
1. Add the missing Rollback Plan section?
2. Publish this TDD to Confluence?
3. Create a Jira Epic for this project?
If user wants to publish to Confluence:
I'll publish this TDD to Confluence.
Which space should I use?
- Personal space (~557058...)
- Team space (provide space key)
Should I:
- Create a new page
- Update existing page (provide page ID or URL)
Then use Confluence Assistant skill to publish.
BAD :
We need to integrate with Stripe.
GOOD :
### Problems We're Solving
- **Manual payment processing takes 2 hours/day**: Currently processing payments manually, costing $500/month in labor
- **Cannot expand internationally**: Current payment processor only supports USD
- **High cart abandonment (45%)**: Poor checkout UX causing revenue loss of $10k/month
BAD :
Build payment integration with all features.
GOOD :
### ✅ In Scope (V1)
- Trial subscriptions (14 days)
- Single payment method per user
- USD only
- Cancel subscription
### ❌ Out of Scope (V1)
- Multiple payment methods
- Multi-currency
- Promo codes
- Usage-based billing
BAD :
No security section for payment integration.
GOOD :
### Security Considerations (MANDATORY)
**PCI DSS Compliance**:
- Never store full card numbers (use Stripe tokens)
- Never log CVV or full PAN
- Use Stripe Elements for card input (PCI SAQ A)
**Secrets Management**:
- Store `STRIPE_SECRET_KEY` in environment variables
- Rotate keys every 90 days
- Never commit keys to git
BAD :
We'll deploy and hope it works.
GOOD :
### Rollback Plan
**Triggers**:
- Error rate > 5% → immediate rollback
- Payment processing failures > 10% → immediate rollback
**Steps**:
1. Disable feature flag `STRIPE_INTEGRATION_ENABLED`
2. Verify old payment processor is active
3. Notify #engineering and #product
4. Schedule post-mortem within 24h
Weekly Installs
135
Repository
GitHub Stars
1.8K
First Seen
Feb 7, 2026
Security Audits
Gen Agent Trust HubPassSocketWarnSnykPass
Installed on
codex133
gemini-cli132
opencode132
github-copilot131
kimi-cli127
cursor127
React 组合模式指南:Vercel 组件架构最佳实践,提升代码可维护性
118,000 周安装
Flask 生产级开发指南:应用工厂、蓝图、SQLAlchemy 与常见问题预防
425 周安装
Firebase Firestore 数据库使用指南:快速集成、CRUD操作与安全配置
426 周安装
Playwright CLI 浏览器自动化工具 - 微软出品,命令行网页交互与测试
426 周安装
图标设计指南:概念映射、库选择与最佳实践 | Lucide/Heroicons/Phosphor 对比
427 周安装
代码解释与分析工具 - 可视化分解复杂代码,适合各级开发者学习
434 周安装
Google Workspace API 开发指南:Gmail、Calendar、Drive、Sheets 等完整集成教程
428 周安装
| Log format, event schemas |
| JSON structure for structured logging |
| Strategies | Approach, not commands | "Rollback via feature flag" (not the curl command) |
| Implementation detail, not architectural decision |
| Tool-Specific Syntax | NestJS decorators, TypeORM entities | Document pattern, not implementation |
| DELETE |
| Delete resource |
| - |
204 No Content |