e2e-tests-studio by mastra-ai/mastra
npx skills add https://github.com/mastra-ai/mastra --skill e2e-tests-studio
CRITICAL: Tests must verify that product features WORK correctly, not just that UI elements render.
Requires Playwright MCP server. If the browser_navigate tool is unavailable, instruct the user to add it:
claude mcp add playwright -- npx @playwright/mcp@latest
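If you prefer to register the server by hand instead of via the CLI, the equivalent project-level entry follows the standard `.mcp.json` shape used by Claude Code (sketch only — verify against your client's documentation):

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}
```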
Before writing ANY test, answer these questions:
Document these answers as comments in your test file.
pnpm build:cli
cd packages/playground/e2e/kitchen-sink && pnpm dev
Verify the server is running at http://localhost:4111
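For tests to reach that URL with relative `page.goto()` calls, the Playwright config can point `baseURL` at the dev server. A minimal sketch using standard Playwright Test options — the `webServer` command is assumed from the steps above, and the repo's actual config in `packages/playground` may differ:

```typescript
// playwright.config.ts — illustrative only; check the repo's real config
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    // Relative page.goto('/agents') calls resolve against this base URL
    baseURL: 'http://localhost:4111',
  },
  webServer: {
    // Start the kitchen-sink app if it is not already running
    command: 'cd e2e/kitchen-sink && pnpm dev',
    url: 'http://localhost:4111',
    reuseExistingServer: true,
  },
});
```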
| Feature Category | What to Test | Example Assertion |
|---|---|---|
| Agent Configuration | Config changes affect agent behavior | Send message → verify response uses selected model |
| LLM Provider Selection | Selected provider is used in requests | Intercept API call → verify provider in request payload |
| Tool Execution | Tool runs with correct params & returns result | Execute tool → verify output matches expected transformation |
| Workflow Execution | Steps execute in order, data flows between steps | Run workflow → verify each step's output feeds next step |
| Chat/Streaming | Messages persist, context maintained across turns | Multi-turn conversation → verify context awareness |
| MCP Server Tools | Server tools are callable and return data | Call MCP tool → verify response structure and content |
| Memory/Persistence | Data survives page reload | Create item → reload → verify item exists |
| Error Handling | Errors surface correctly to user | Trigger error condition → verify error message + recovery |
import { test, expect } from '@playwright/test';
import { resetStorage } from '../__utils__/reset-storage';
import { selectFixture } from '../__utils__/select-fixture';
import { nanoid } from 'nanoid';

/**
 * FEATURE: [Name of feature]
 * USER STORY: As a user, I want to [action] so that [outcome]
 * BEHAVIOR UNDER TEST: [Specific behavior being validated]
 */
test.describe('[Feature Name] - Behavior Tests', () => {
  // Playwright's built-in `page` fixture already gives each test a fresh
  // browser context, so no manual newContext() bookkeeping is needed.
  test.afterEach(async ({ page }) => {
    await resetStorage(page);
  });

  test('should [verb describing behavior] when [trigger condition]', async ({ page }) => {
    // ARRANGE: Set up preconditions
    // - Navigate to the feature
    // - Configure any required state
    // ACT: Perform the user action that triggers the behavior
    // ASSERT: Verify the OUTCOME, not the UI state
    // - Check data persistence
    // - Verify downstream effects
    // - Confirm API calls made correctly
  });
});
test('selecting LLM provider should use that provider for agent responses', async ({ page }) => {
  // ARRANGE
  await page.goto('/agents/my-agent/chat');

  // Intercept the chat API call to capture which provider is sent
  let capturedProvider: string | null = null;
  await page.route('**/api/chat', async route => {
    const body = JSON.parse(route.request().postData() || '{}');
    capturedProvider = body.provider;
    await route.continue();
  });

  // ACT: Select a different provider
  await page.getByTestId('provider-selector').click();
  await page.getByRole('option', { name: 'OpenAI' }).click();

  // Send a message to trigger the agent
  await page.getByTestId('chat-input').fill('Hello');
  await page.getByTestId('send-button').click();

  // ASSERT: Verify the selected provider was used
  await expect.poll(() => capturedProvider).toBe('openai');
});
test('created agent should persist after page reload', async ({ page }) => {
  // ARRANGE
  await page.goto('/agents');
  const agentName = `Test Agent ${nanoid()}`;

  // ACT: Create new agent
  await page.getByTestId('create-agent-button').click();
  await page.getByTestId('agent-name-input').fill(agentName);
  await page.getByTestId('save-agent-button').click();

  // Wait for creation to complete
  await expect(page.getByText(agentName)).toBeVisible();

  // ASSERT: Verify persistence
  await page.reload();
  await expect(page.getByText(agentName)).toBeVisible({ timeout: 10000 });
});
test('weather tool should return formatted weather data', async ({ page }) => {
  // ARRANGE
  await selectFixture(page, 'weather-success');
  await page.goto('/tools/weather-tool');

  // ACT: Execute tool with parameters
  await page.getByTestId('param-city').fill('San Francisco');
  await page.getByTestId('execute-tool-button').click();

  // ASSERT: Verify OUTPUT content, not just that output appears
  const output = page.getByTestId('tool-output');
  await expect(output).toContainText('temperature');
  await expect(output).toContainText('San Francisco');

  // Verify structured data if applicable
  const outputText = await output.textContent();
  const outputData = JSON.parse(outputText || '{}');
  expect(outputData).toHaveProperty('temperature');
  expect(outputData).toHaveProperty('conditions');
});
test('workflow should pass data between steps correctly', async ({ page }) => {
  // ARRANGE
  await selectFixture(page, 'workflow-multi-step');
  const sessionId = nanoid();
  await page.goto(`/workflows/data-pipeline?session=${sessionId}`);

  // ACT: Trigger workflow execution
  await page.getByTestId('workflow-input').fill('test input data');
  await page.getByTestId('run-workflow-button').click();

  // ASSERT: Verify each step received correct input from the previous step
  // Wait for completion
  await expect(page.getByTestId('workflow-status')).toHaveText('completed', { timeout: 30000 });

  // Check step outputs show the data transformation chain
  const step1Output = await page.getByTestId('step-1-output').textContent();
  const step2Output = await page.getByTestId('step-2-output').textContent();

  // Verify step 2 received step 1's output as input
  expect(step1Output).toBeTruthy();
  expect(step2Output).toContain(step1Output!);
});
test('chat should maintain conversation context across messages', async ({ page }) => {
  // ARRANGE
  await selectFixture(page, 'contextual-chat');
  const chatId = nanoid();
  await page.goto(`/agents/assistant/chat/${chatId}`);

  // ACT: Multi-turn conversation
  await page.getByTestId('chat-input').fill('My name is Alice');
  await page.getByTestId('send-button').click();
  await expect(page.getByTestId('assistant-message').last()).toBeVisible({ timeout: 20000 });

  await page.getByTestId('chat-input').fill('What is my name?');
  await page.getByTestId('send-button').click();

  // ASSERT: Verify context was maintained
  const response = page.getByTestId('assistant-message').last();
  await expect(response).toContainText('Alice', { timeout: 20000 });
});
test('should show actionable error and allow retry when API fails', async ({ page }) => {
  // ARRANGE: Set up failure fixture
  await selectFixture(page, 'api-failure');
  await page.goto('/tools/flaky-tool');

  // ACT: Trigger the error
  await page.getByTestId('execute-tool-button').click();

  // ASSERT: Error is shown with a recovery option
  await expect(page.getByTestId('error-message')).toContainText('failed');
  await expect(page.getByTestId('retry-button')).toBeVisible();

  // Switch to success fixture and retry
  await selectFixture(page, 'api-success');
  await page.getByTestId('retry-button').click();

  // Verify recovery worked
  await expect(page.getByTestId('tool-output')).toBeVisible({ timeout: 10000 });
  await expect(page.getByTestId('error-message')).not.toBeVisible();
});
When a test file already exists:
BEFORE (UI-focused):
test('dropdown opens when clicked', async ({ page }) => {
  await page.getByTestId('model-dropdown').click();
  await expect(page.getByRole('listbox')).toBeVisible();
});
AFTER (Behavior-focused):
test('selecting model from dropdown updates agent configuration', async ({ page }) => {
  // Open dropdown and select model
  await page.getByTestId('model-dropdown').click();
  await page.getByRole('option', { name: 'GPT-4' }).click();

  // Verify the selection persists and affects behavior
  await page.reload();
  await expect(page.getByTestId('model-dropdown')).toHaveText('GPT-4');

  // Optionally: verify the model is used in actual requests
  // (via request interception or checking response metadata)
});
Fixtures should represent realistic scenarios, not just mock data. Name fixture files using the pattern:
<feature>-<scenario>.fixture.ts
Examples:
- agent-with-tools.fixture.ts
- chat-multi-turn-context.fixture.ts
- workflow-parallel-execution.fixture.ts
- tool-validation-error.fixture.ts
- mcp-server-timeout.fixture.ts
Each fixture must define:
// fixtures/agent-provider-switch.fixture.ts
export const agentProviderSwitch = {
  name: 'agent-provider-switch',
  description: 'Tests that switching LLM providers changes agent behavior',

  // Mock responses for different providers
  responses: {
    openai: { content: 'Response from OpenAI', model: 'gpt-4' },
    anthropic: { content: 'Response from Anthropic', model: 'claude-3' },
  },

  expectedBehavior: {
    // When provider is switched, subsequent messages use new provider
    providerSwitchAffectsNextMessage: true,
    // Provider selection persists across page reload
    providerPersistsOnReload: true,
  },
};
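A fixture's `responses` map is typically wired into request interception. The sketch below shows the pure lookup half of that wiring — `resolveMock` is a hypothetical helper, not part of the Mastra codebase:

```typescript
type ProviderResponses = Record<string, { content: string; model: string }>;

// Same shape as the fixture's `responses` field above
const responses: ProviderResponses = {
  openai: { content: 'Response from OpenAI', model: 'gpt-4' },
  anthropic: { content: 'Response from Anthropic', model: 'claude-3' },
};

// Given an intercepted request body, pick the mocked response to fulfill with.
// Returns null when the provider has no mock, so the test can fail loudly.
function resolveMock(
  mocks: ProviderResponses,
  requestBody: string | null,
): { content: string; model: string } | null {
  const { provider } = JSON.parse(requestBody || '{}');
  return (provider && mocks[provider]) || null;
}

console.log(resolveMock(responses, '{"provider":"anthropic"}'));
// → { content: 'Response from Anthropic', model: 'claude-3' }
```

In a test this pairs with Playwright's `route.fulfill({ json: ... })`, e.g. `page.route('**/api/chat', route => route.fulfill({ json: resolveMock(responses, route.request().postData()) }))` — the `**/api/chat` endpoint here is illustrative.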
cd packages/playground && pnpm test:e2e
Before considering tests complete, verify:
- page.reload() is used to verify persistence where applicable

| Step | Command/Action |
|---|---|
| Build | pnpm build:cli |
| Start | cd packages/playground/e2e/kitchen-sink && pnpm dev |
| App URL | http://localhost:4111 |
| Routes | @packages/playground/src/App.tsx |
| Run tests | cd packages/playground && pnpm test:e2e |
| Test dir | packages/playground/e2e/tests/ |
| Fixtures | packages/playground/e2e/kitchen-sink/fixtures/ |
| ❌ Don't | ✅ Do Instead |
|---|---|
| Test that modal opens | Test that modal action completes and persists |
| Test that button is clickable | Test that clicking button produces expected result |
| Test loading spinner appears | Test that loaded data is correct |
| Test form validation message shows | Test that invalid form cannot submit AND valid form succeeds |
| Test dropdown has options | Test that selecting option changes system behavior |
| Test sidebar navigation works | Test that navigated page has correct data/functionality |
| Assert element is visible | Assert element contains expected data/state |