Claude Testing Patterns: Agent-Based Live Integration Testing with YAML Specs

testing-patterns by jezweb/claude-skills

158 weekly installs

650 GitHub Stars


Install command

npx skills add https://github.com/jezweb/claude-skills --skill testing-patterns

Development / Automated testing


Testing Patterns

A pragmatic approach to testing that emphasises:

Live testing over mocks
Agent execution to preserve context
YAML specs as documentation and tests
Persistent results committed to git

Philosophy

This is not traditional TDD. Instead:

Test in production/staging with good logging
Use agents to run tests (keeps main context clean)
Define tests declaratively in YAML (human-readable, version-controlled)
Focus on integration (real servers, real data)

Why Agent-Based Testing?

Running 50 tests in the main conversation would consume your entire context window. By delegating to a sub-agent:

Main context stays clean for development
Agent can run many tests without context pressure
Results come back as a summary
Failed tests get detailed investigation

Commands

Command	Purpose
`/create-tests`	Discover project, generate test specs + testing agent
`/run-tests`	Execute tests via agent(s), report results
`/coverage`	Generate coverage report and identify uncovered code paths

Quick workflow:

/create-tests        → Generates tests/specs/*.yaml + .claude/agents/test-runner.md
/run-tests           → Spawns agent, runs all tests, saves results
/run-tests api       → Run only specs matching "api"
/run-tests --failed  → Re-run only failed tests
/coverage            → Run tests with coverage, analyse gaps
/coverage --threshold 80  → Fail if below 80%
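These commands are prompts that Claude interprets, not a fixed CLI, so the following is only a hypothetical Python sketch of how the options could be read: `/run-tests api` keeps spec files whose path contains the pattern, `/run-tests --failed` collects the names marked FAILED in the newest results file, and `--threshold` is a plain comparison. The helper names (`select_specs`, `failed_tests`, `coverage_gate`) and the substring-matching rule are assumptions, not part of the skill.

```python
# Hypothetical helpers for the /run-tests and /coverage options; the real
# commands are interpreted by Claude, not by these functions.
from __future__ import annotations

import re
from pathlib import Path

# Matches result lines such as "- test_edge_case - FAILED"
FAILED_LINE = re.compile(r"^- (\S+) - FAILED\b", re.MULTILINE)


def select_specs(spec_dir: Path, pattern: str | None = None) -> list[Path]:
    """`/run-tests [pattern]`: spec files whose relative path contains pattern."""
    specs = sorted(spec_dir.rglob("*.yaml"))
    if pattern is None:
        return specs
    return [p for p in specs if pattern in p.relative_to(spec_dir).as_posix()]


def failed_tests(results_dir: Path) -> list[str]:
    """`/run-tests --failed`: names marked FAILED in the newest results file."""
    # YYYY-MM-DD-HHMMSS.md names sort chronologically as plain strings.
    reports = sorted(results_dir.glob("*.md"))
    if not reports:
        return []
    return FAILED_LINE.findall(reports[-1].read_text(encoding="utf-8"))


def coverage_gate(percent: float, threshold: float) -> bool:
    """`/coverage --threshold N`: True when coverage meets the threshold."""
    return percent >= threshold
```

Matching on the path relative to the spec directory means a pattern like `api` selects both `tests/api/*.yaml` and a file such as `tests/api-auth.yaml`.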

Getting Started in a New Project

This skill provides the pattern and format. Claude designs the actual tests based on your project context.

What happens when you ask "Create tests for this project":

1. Discovery - Claude examines the project:
   - What MCP servers are configured?
   - What APIs or tools exist?
   - What does the code do?
2. Test Design - Claude creates project-specific tests:
   - Test cases for the actual tools/endpoints
   - Expected values based on real behavior
   - Edge cases relevant to this domain
3. Structure - Using patterns from this skill:
   - YAML specs in tests/ directory
   - Optional testing agent in .claude/agents/
   - Results saved to tests/results/

Example:

You: "Create tests for this MCP server"

Claude: [Discovers this is a Google Calendar MCP]
        [Sees tools: calendar_events, calendar_create, calendar_delete]
        [Designs test cases:]

        tests/calendar-events.yaml:
        - list_upcoming_events (expect: array, count_gte 0)
        - search_by_keyword (expect: contains search term)
        - invalid_date_range (expect: error status)

        tests/calendar-mutations.yaml:
        - create_event (expect: success, returns event_id)
        - delete_nonexistent (expect: error, contains "not found")

The skill teaches Claude:

How to structure YAML test specs
What validation rules are available
How to create testing agents
When to use parallel execution

Your project provides:

What to actually test
Expected values and behaviors
Domain-specific edge cases

YAML Test Spec Format

name: Feature Tests
description: What these tests validate

# Optional: defaults applied to all tests
defaults:
  tool: my_tool_name
  timeout: 5000

tests:
  - name: test_case_name
    description: Human-readable purpose
    tool: tool_name  # Override default if needed
    params:
      action: search
      query: "test input"
    expect:
      contains: "expected substring"
      not_contains: "should not appear"
      status: success
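A minimal sketch of how `defaults` could combine with each test, assuming the YAML has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`) and that a key set on a test replaces the default outright (a shallow merge). `resolve_tests` is a hypothetical helper, not part of the skill.

```python
# Hypothetical helper: applies a spec's `defaults` to each test (shallow merge).
from __future__ import annotations

from typing import Any


def resolve_tests(spec: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the spec's tests with `defaults` filled in.

    Assumes `spec` is the already-parsed YAML document. A key set on a test
    (e.g. `tool`) replaces the default value outright.
    """
    defaults = spec.get("defaults") or {}
    resolved = []
    for test in spec.get("tests") or []:
        if "name" not in test:
            raise ValueError(f"test without a name: {test!r}")
        resolved.append({**defaults, **test})  # test keys win over defaults
    return resolved
```

Under this reading, `test_case_name` in the sample spec keeps `timeout: 5000` from `defaults` but calls `tool_name` instead of `my_tool_name`.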

Validation Rules

Rule	Description	Example
`contains`	Response contains string	`contains: "from:john"`
`not_contains`	Response doesn't contain	`not_contains: "error"`
`matches`	Regex pattern match	`matches: "after:\\d{4}"`
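`json_path`	Check value at JSON path	`json_path: "$.results[0].name"`
`equals`	Exact value match	`equals: "success"`
`status`	Check success/error	`status: success`
`count_gte`	Array length >= N	`count_gte: 1`
`count_eq`	Array length == N	`count_eq: 5`
`type`	Value type check	`type: array`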

See references/validation-rules.md for complete documentation.
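The rules above can be approximated in a few lines. This sketch is an illustration under stated assumptions, not the behaviour documented in references/validation-rules.md: string rules run against the response text (JSON-encoded if the response is structured), `status` relies on a caller-supplied `is_error` flag, and `json_path` supports only `$`, `.key` and `[index]`, selecting the value that `equals`, `type`, `count_gte` and `count_eq` then inspect. The type names (`string`, `number`, `boolean`, `array`, `object`, `null`) and the functions `validate` and `resolve_json_path` are hypothetical.

```python
# Hypothetical validator for the rules table. Assumptions: string rules see
# the response as text, `status` comes from an is_error flag, and json_path
# is a tiny $ / .key / [index] subset of JSONPath.
from __future__ import annotations

import json
import re
from typing import Any

_PATH_TOKEN = re.compile(r"\.([A-Za-z_][\w-]*)|\[(\d+)\]")

_TYPES = {
    "string": lambda v: isinstance(v, str),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "array": lambda v: isinstance(v, list),
    "object": lambda v: isinstance(v, dict),
    "null": lambda v: v is None,
}


def resolve_json_path(data: Any, path: str) -> Any:
    """Follow `$.a.b[0]`-style paths; raises on missing keys or bad syntax."""
    if not path.startswith("$"):
        raise ValueError(f"path must start with $: {path!r}")
    pos, value = 1, data
    while pos < len(path):
        m = _PATH_TOKEN.match(path, pos)
        if m is None:
            raise ValueError(f"unsupported path syntax: {path[pos:]!r}")
        key, index = m.group(1), m.group(2)
        value = value[key] if key is not None else value[int(index)]
        pos = m.end()
    return value


def validate(response: Any, expect: dict[str, Any], is_error: bool = False) -> list[str]:
    """Return failure messages; an empty list means the test passed."""
    text = response if isinstance(response, str) else json.dumps(response)
    failures: list[str] = []
    if "contains" in expect and str(expect["contains"]) not in text:
        failures.append(f"expected to contain {expect['contains']!r}")
    if "not_contains" in expect and str(expect["not_contains"]) in text:
        failures.append(f"expected not to contain {expect['not_contains']!r}")
    if "matches" in expect and re.search(expect["matches"], text) is None:
        failures.append(f"no match for pattern {expect['matches']!r}")
    if "status" in expect:
        actual = "error" if is_error else "success"
        if actual != expect["status"]:
            failures.append(f"expected status {expect['status']}, got {actual}")

    # json_path narrows the value that equals/type/count rules inspect.
    target = response
    if "json_path" in expect:
        try:
            data = json.loads(response) if isinstance(response, str) else response
            target = resolve_json_path(data, expect["json_path"])
        except (ValueError, KeyError, IndexError, TypeError) as exc:
            failures.append(f"json_path {expect['json_path']} not found ({exc!r})")
            return failures

    if "equals" in expect and target != expect["equals"]:
        failures.append(f"expected {expect['equals']!r}, got {target!r}")
    if "type" in expect:
        check = _TYPES.get(expect["type"])
        if check is None or not check(target):
            failures.append(f"expected type {expect['type']}")
    if "count_gte" in expect and not (isinstance(target, list) and len(target) >= expect["count_gte"]):
        failures.append(f"expected at least {expect['count_gte']} items")
    if "count_eq" in expect and not (isinstance(target, list) and len(target) == expect["count_eq"]):
        failures.append(f"expected exactly {expect['count_eq']} items")
    return failures
```

Returning a list of messages rather than a boolean lets the agent copy the exact reason into the "Failed Test Details" section of the results file.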

Creating a Testing Agent

Testing agents inherit MCP tools from the session. Create an agent that:

Reads YAML test specs
Executes tool calls with params
Validates responses against expectations
Reports results

Agent Template

CRITICAL: Do NOT specify a tools field if you need MCP access. When you specify ANY tools, it becomes an allowlist and "*" is interpreted literally (not as a wildcard). Omit tools entirely to inherit ALL tools from the parent session.

---
name: my-tester
description: |
  Tests [domain] functionality. Reads YAML test specs and validates responses.
  Use when: testing after changes, running regression tests.
# tools field OMITTED - inherits ALL tools from parent (including MCP)
model: sonnet
---

# [Domain] Tester

## How It Works

1. Find test specs: `tests/*.yaml`
2. Parse and execute each test
3. Validate responses
4. Report pass/fail summary

## Test Spec Location

tests/
├── feature-a.yaml
├── feature-b.yaml
└── results/
    └── YYYY-MM-DD-HHMMSS.md

## Execution

For each test:
1. Call tool with params
2. Capture response
3. Apply validation rules
4. Record PASS/FAIL

## Reporting

Save results to `tests/results/YYYY-MM-DD-HHMMSS.md`

See templates/test-agent.md for complete template.
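The template's Execution steps map onto a short loop. In this hypothetical sketch `call_tool` stands in for whatever tool invocation the agent actually performs (it is not a real API) and returns `(response, is_error)`; `validate` is any function that returns a list of failure messages. An exception raised by the tool is recorded as a failure so that one broken test does not abort the whole run.

```python
# Hypothetical runner loop for the agent's Execution steps. `call_tool` is a
# stand-in for the agent's real tool invocation, not an actual API.
from __future__ import annotations

import time
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class CaseResult:
    name: str
    passed: bool
    seconds: float
    failures: list[str] = field(default_factory=list)
    params: dict[str, Any] = field(default_factory=dict)


def run_tests(
    tests: list[dict[str, Any]],
    call_tool: Callable[[str, dict[str, Any]], tuple[Any, bool]],
    validate: Callable[[Any, dict[str, Any], bool], list[str]],
) -> list[CaseResult]:
    results = []
    for test in tests:
        params = test.get("params") or {}
        start = time.perf_counter()
        try:
            # 1-2. Call the tool with params and capture the response
            response, is_error = call_tool(test["tool"], params)
            # 3. Apply validation rules
            failures = validate(response, test.get("expect") or {}, is_error)
        except Exception as exc:  # a crashing call is a FAIL, not a runner crash
            failures = [f"tool call raised {exc!r}"]
        elapsed = time.perf_counter() - start
        # 4. Record PASS/FAIL
        results.append(CaseResult(test["name"], not failures, elapsed, failures, params))
    return results
```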

Results Format

Test results are saved as markdown for git history:

# Test Results: feature-name
**Date**: 2026-02-02 14:30
**Commit**: abc1234
**Summary**: 8/9 passed (89%)

## Results

- test_basic_search - PASSED (0.3s)
- test_with_filter - PASSED (0.4s)
- test_edge_case - FAILED

## Failed Test Details

### test_edge_case
- **Expected**: Contains "expected value"
- **Actual**: Response was empty
- **Params**: `{ action: search, query: "" }`

Save to: tests/results/YYYY-MM-DD-HHMMSS.md
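A sketch of producing that file, assuming each result is a dict with `name`, `passed`, `seconds` and, for failures, `expected`, `actual` and `params`. The percentage is rounded to a whole number, which reproduces the "8/9 passed (89%)" summary in the sample. `render_report` and `save_report` are hypothetical helpers.

```python
# Hypothetical report writer for the markdown results format shown above.
from __future__ import annotations

from datetime import datetime
from pathlib import Path
from typing import Any


def render_report(feature: str, commit: str, when: datetime,
                  results: list[dict[str, Any]]) -> str:
    passed = sum(1 for r in results if r["passed"])
    total = len(results)
    pct = round(100 * passed / total) if total else 0
    lines = [
        f"# Test Results: {feature}",
        f"**Date**: {when:%Y-%m-%d %H:%M}",
        f"**Commit**: {commit}",
        f"**Summary**: {passed}/{total} passed ({pct}%)",
        "",
        "## Results",
        "",
    ]
    for r in results:
        status = f"PASSED ({r['seconds']:.1f}s)" if r["passed"] else "FAILED"
        lines.append(f"- {r['name']} - {status}")
    failed = [r for r in results if not r["passed"]]
    if failed:
        lines += ["", "## Failed Test Details"]
        for r in failed:
            lines += [
                "",
                f"### {r['name']}",
                f"- **Expected**: {r['expected']}",
                f"- **Actual**: {r['actual']}",
                f"- **Params**: `{r['params']}`",
            ]
    return "\n".join(lines) + "\n"


def save_report(results_dir: Path, report: str, when: datetime) -> Path:
    """Write to tests/results/YYYY-MM-DD-HHMMSS.md (or any results_dir)."""
    results_dir.mkdir(parents=True, exist_ok=True)
    path = results_dir / f"{when:%Y-%m-%d-%H%M%S}.md"
    path.write_text(report, encoding="utf-8")
    return path
```

The timestamped filename keeps every run as a separate file, so committing `tests/results/` builds up the history without overwriting earlier runs.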

Workflow

1. Create Test Specs

# tests/search.yaml
name: Search Tests
defaults:
  tool: my_search_tool

tests:
  - name: basic_search
    params: { query: "hello" }
    expect: { status: success, count_gte: 0 }

  - name: filtered_search
    params: { query: "hello", filter: "recent" }
    expect: { contains: "results" }

2. Create Testing Agent

Copy templates/test-agent.md and customise for your domain.

3. Run Tests

"Run the search tests"
"Test the API after my changes"
"Run regression tests for gmail-mcp"

4. Review Results

Results saved to tests/results/. Commit them for history:

git add tests/results/
git commit -m "Test results: 8/9 passed"

Parallel Test Execution

Run multiple test agents simultaneously to speed up large test suites:

"Run these test suites in parallel:
- Agent 1: tests/auth/*.yaml
- Agent 2: tests/search/*.yaml
- Agent 3: tests/api/*.yaml"

Each agent:

Has its own context (won't bloat main conversation)
Can run 10-50 tests independently
Returns a summary when done
Inherits MCP tools from parent session

Why parallel agents?

50 tests in main context = context exhaustion
50 tests across 5 agents = clean context + faster execution
Each agent reports pass/fail summary, not every test detail

Batching strategy:

Group tests by feature area or MCP server
10-20 tests per agent is ideal
Too few = overhead of spawning not worth it
Too many = agent context fills up
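The batching strategy can be expressed as a small planning step before spawning agents. This hypothetical sketch assumes you already know how many tests each spec file contains, treats a spec's directory as its feature area, and packs files into batches of at most 20 tests (the upper end of the 10-20 guideline). It never splits a file across agents and does not merge small areas together; both are simplifying assumptions.

```python
# Hypothetical batch planner: one list of spec files per test agent.
from __future__ import annotations

from collections import defaultdict


def plan_batches(test_counts: dict[str, int], max_per_agent: int = 20) -> list[list[str]]:
    """Group spec files by directory (feature area), then split each area
    into batches holding at most `max_per_agent` tests."""
    areas: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for path, count in sorted(test_counts.items()):
        area = path.rsplit("/", 1)[0] if "/" in path else ""
        areas[area].append((path, count))

    batches: list[list[str]] = []
    for area in sorted(areas):
        current: list[str] = []
        size = 0
        for path, count in areas[area]:
            # Start a new batch when adding this file would exceed the limit.
            if current and size + count > max_per_agent:
                batches.append(current)
                current, size = [], 0
            current.append(path)
            size += count
        if current:
            batches.append(current)
    return batches
```

Each returned batch becomes one agent's assignment, in the same spirit as the "Agent 1: tests/auth/*.yaml" prompt above.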

MCP Testing

For MCP servers, the testing agent inherits configured MCPs:

# Configure MCP first
claude mcp add --transport http gmail https://gmail.mcp.example.com/mcp

# Then test
"Run tests for gmail MCP"

Example MCP test spec:

name: Gmail Search Tests
defaults:
  tool: gmail_messages

tests:
  - name: search_from_person
    params: { action: search, searchQuery: "from John" }
    expect: { contains: "from:john" }

  - name: search_with_date
    params: { action: search, searchQuery: "emails from January 2026" }
    expect: { matches: "after:2026" }

API Testing

For REST APIs, use the Bash tool:

name: API Tests
defaults:
  timeout: 5000

tests:
  - name: health_check
    command: curl -s https://api.example.com/health
    expect: { contains: "ok" }

  - name: get_user
    command: curl -s https://api.example.com/users/1
    expect:
      json_path: "$.name"
      type: string
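When a spec uses `command` instead of `tool`, the agent runs it through the shell. A hypothetical Python equivalent follows; it treats `timeout` as milliseconds (an assumption, since the spec does not state the unit) and reports a non-zero exit code as an error. Note that `curl -s` exits 0 even for HTTP error responses unless `--fail` is added, so the `contains` and `json_path` checks do most of the work in these specs.

```python
# Hypothetical stand-in for running a spec's `command` via the Bash tool.
from __future__ import annotations

import subprocess


def run_command(command: str, timeout_ms: int = 5000) -> tuple[str, bool]:
    """Run `command` in a shell; return (stdout, is_error).

    Assumes the spec's `timeout` is in milliseconds.
    """
    try:
        proc = subprocess.run(
            command,
            shell=True,           # specs contain full shell command lines
            capture_output=True,  # stdout becomes the response to validate
            text=True,
            timeout=timeout_ms / 1000,
        )
    except subprocess.TimeoutExpired:
        return f"timed out after {timeout_ms} ms", True
    return proc.stdout, proc.returncode != 0
```

The returned pair has the same `(response, is_error)` shape a tool call would produce, so the same validation step can handle both kinds of test.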

Browser Testing

For browser automation, use Playwright tools:

name: UI Tests

tests:
  - name: login_page_loads
    steps:
      - navigate: https://app.example.com/login
      - snapshot: true
    expect: { contains: "Sign In" }

  - name: form_submission
    steps:
      - navigate: https://app.example.com/form
      - type: { ref: "#email", text: "test@example.com" }
      - click: { ref: "button[type=submit]" }
    expect: { contains: "Success" }
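The `steps` list reads like a small script. Assuming each step is a single-key mapping and that `ref` values are CSS selectors, a replay loop could dispatch to a Playwright-style page object using the sync API methods `goto`, `fill`, `click` and `content` (with Playwright for Python, `page` would come from `browser.new_page()`). Here `snapshot` is treated as a no-op marker and `expect` is checked against the final page content; both are assumptions about a format the skill leaves to the agent, and `run_steps` is a hypothetical helper.

```python
# Hypothetical replay of browser `steps` against a Playwright-style sync page
# (goto/fill/click/content). Assumes one key per step and CSS-selector refs.
from __future__ import annotations

from typing import Any


def run_steps(page: Any, steps: list[dict[str, Any]]) -> str:
    """Replay steps in order and return the final page content for `expect`."""
    for step in steps:
        if len(step) != 1:
            raise ValueError(f"each step needs exactly one action: {step!r}")
        action, arg = next(iter(step.items()))
        if action == "navigate":
            page.goto(arg)
        elif action == "type":
            page.fill(arg["ref"], arg["text"])
        elif action == "click":
            page.click(arg["ref"])
        elif action == "snapshot":
            pass  # assumption: content is read once, after the last step
        else:
            raise ValueError(f"unknown step: {action!r}")
    return page.content()
```

Because the page is duck-typed, the loop can be exercised with a fake page object, as the checks below do, without launching a browser.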

Tips

Start with smoke tests: Basic connectivity and auth
Test edge cases: Empty results, errors, special characters
Use descriptive names: search_with_date_filter not test1
Group related tests: One file per feature area
Add after bugs: Every fixed bug gets a regression test
Commit results: Create history of test runs

What This Is NOT

Not a Jest/Vitest replacement (use those for unit tests)
Not enforcing TDD (use what works for you)
Not a test runner library (the agent IS the runner)
Not about mocking (we test real systems)

When to Use

Scenario	Use This	Use Traditional Testing
MCP server validation	Yes	No
API integration	Yes	Complement with unit tests
Browser workflows	Yes	Complement with component tests
Unit testing	No	Yes (Jest/Vitest)
Component testing	No	Yes (Testing Library)
Type checking	No	Yes (TypeScript)

Related Resources

templates/test-spec.yaml - Generic test spec template
templates/test-agent.md - Testing agent template
references/validation-rules.md - Complete validation rule reference


First Seen: Feb 2, 2026

Security Audits: Gen Agent Trust Hub: Fail; Socket: Warn; Snyk: Warn

Installed on: claude-code (128), opencode (108), gemini-cli (100), replit (97), codex (94), cursor (87)
