
Frontend Testing Best Practices and Playwright Testing Guide - Improving Test Confidence and Reliability

playwright-testing by chongdashu/phaserjs-oakwoods

62 weekly installs

65 GitHub Stars


Install command

npx skills add https://github.com/chongdashu/phaserjs-oakwoods --skill playwright-testing


Frontend Testing

Unlock reliable confidence fast: enable safe refactors by choosing the right test layer, making the app observable, and eliminating nondeterminism so failures are actionable.

Philosophy: Confidence Per Minute

Frontend tests fail for two reasons: the product is broken, or the test is lying. Your job is to maximize signal and minimize "test is lying".

Before writing a test, ask:

What user risk am I covering (money, progression, auth, data loss, crashes)?
What's the narrowest layer that catches this bug class (pure logic vs UI vs full browser)?
What nondeterminism exists (time, RNG, async loading, network, animations, fonts, GPU)?
What "ready" signal can I wait on besides setTimeout?
What should a failure print/screenshot so it's diagnosable in CI?

Core principles:

Test the contract, not the implementation: assert stable user-meaningful outcomes and public seams.
Prefer determinism over retries: make time/RNG/network controllable; remove flake at the source.
Observe like a debugger: console errors, network failures, screenshots, and state dumps on failure.
One critical flow first: a reliable smoke test beats 50 flaky tests.

Test Layer Decision Tree

Pick the cheapest layer that provides needed confidence:

Layer	Speed	Use For
Unit	Fastest	Pure functions, reducers, validators, math, pathfinding, deterministic simulation
Component	Medium	UI behavior with mocked IO (React Testing Library, Vue Testing Library)
E2E	Slowest	Critical user flows across routing, storage, real bundling/runtime
Visual	Specialized	Layout/pixel regressions; for canvas/WebGL, only after locking determinism

Quick Start: First Smoke Test

1. Define 1 critical flow: "page loads → user can start → one key action works"
2. Add a test seam to the app (see Recommended Test Seams)
3. Choose runner: Playwright MCP for E2E, unit tests for logic
4. Fail loudly: treat console errors and failed requests as test failures
5. Stabilize: seed RNG, freeze time, fix viewport, disable animations
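
The "seed RNG, freeze time" step can be sketched as follows. This is a minimal illustration, not the skill's own code: `createSeededRng` and `createFakeClock` are hypothetical helper names, and the generator is the widely used mulberry32 algorithm, swapped in here as one possible seedable replacement for Math.random.

```javascript
// Hypothetical helper: a seedable PRNG (mulberry32) returning floats in [0, 1).
function createSeededRng(seed) {
  let a = seed >>> 0;
  return function next() {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical helper: a clock that only moves when the test advances it.
function createFakeClock(startMs = 0) {
  let now = startMs;
  return {
    now: () => now,
    advance: (ms) => {
      now += ms;
      return now;
    },
  };
}

const rngA = createSeededRng(42);
const rngB = createSeededRng(42);
const rollsA = [rngA(), rngA(), rngA()];
const rollsB = [rngB(), rngB(), rngB()];
// Same seed, same sequence: assertions on game outcomes become repeatable.
console.log(rollsA.every((value, i) => value === rollsB[i]));
```

In test mode the game would read random numbers and timestamps from helpers like these instead of Math.random and Date.now, so two runs with the same seed see the same world.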

Concrete MCP Workflow: Testing a Game

Step-by-step sequence for testing a Phaser/canvas game:

1. mcp__playwright__browser_navigate
   → http://localhost:3000?test=1&seed=42

2. mcp__playwright__browser_evaluate
   → () => new Promise(r => { const c = () => window.__TEST__?.ready ? r(true) : setTimeout(c, 100); c(); })
   (Wait for game ready)

3. mcp__playwright__browser_console_messages
   → level: "error"
   (Fail if any errors)

4. mcp__playwright__browser_snapshot
   → Get UI state and refs

5. mcp__playwright__browser_click
   → element: "Start Button", ref: [from snapshot]

6. mcp__playwright__browser_evaluate
   → () => window.__TEST__.state()
   (Assert game state is correct)

7. mcp__playwright__browser_press_key
   → key: "ArrowRight" (or WASD for movement)

8. mcp__playwright__browser_evaluate
   → () => window.__TEST__.state().player.x
   (Verify movement happened)

9. mcp__playwright__browser_take_screenshot
   → filename: "gameplay-state.png"
   (Visual evidence after deterministic setup)
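
Step 1 relies on the app reading `test` and `seed` from the query string. One way the app side might do that is sketched below. `parseTestFlags` is a hypothetical helper, and treating `test=1` as the only value that enables test mode is an assumption.

```javascript
// Hypothetical helper: extract the test flags from a page URL.
function parseTestFlags(href) {
  const params = new URL(href).searchParams;
  const seedText = params.get("seed");
  const seed = seedText === null ? null : Number.parseInt(seedText, 10);
  return {
    testMode: params.get("test") === "1", // assumption: only "1" enables test mode
    seed: Number.isNaN(seed) ? null : seed, // a non-numeric seed is ignored
  };
}

console.log(parseTestFlags("http://localhost:3000?test=1&seed=42"));
// { testMode: true, seed: 42 }
```

In the browser the call would be `parseTestFlags(window.location.href)`, with the result used to seed the RNG before the first frame.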

Recommended Test Seams

Add to the app for testability (read-only, stable, minimal):

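// Assumes `player`, `gameState`, and `entities` are the game's own in-scope objects.
window.__TEST__ = {
  ready: false,           // set to true after the first interactive frame
  seed: null,             // current RNG seed
  sceneKey: null,         // current scene/route
  state: () => ({         // JSON-serializable snapshot
    // Arrow functions have no own `this`, so read the seam object explicitly.
    scene: window.__TEST__.sceneKey,
    player: { x: player.x, y: player.y, hp: player.hp },
    score: gameState.score,
    entities: entities.map(e => ({ id: e.id, type: e.type, x: e.x, y: e.y }))
  }),
  commands: {             // optional mutation commands (stubs to be filled in)
    reset: () => {},
    seed: (n) => {},
    skipIntro: () => {}
  }
};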

Rule: Expose IDs + essential fields, not raw Phaser/engine objects.
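
A small sketch of why the rule matters: engine objects typically hold references back to their scene, so they often cannot be serialized at all, while a plain snapshot can. `toEntitySnapshot` is a hypothetical helper and `fakeSprite` is a stand-in object, not a real Phaser sprite.

```javascript
// Hypothetical helper: copy only the ID and essential fields.
function toEntitySnapshot(entity) {
  return { id: entity.id, type: entity.type, x: entity.x, y: entity.y };
}

// Stand-in for an engine object: it points at a scene that points back at it.
const fakeSprite = { id: 7, type: "slime", x: 120, y: 48, scene: null };
fakeSprite.scene = { children: [fakeSprite] }; // circular reference

const snapshot = toEntitySnapshot(fakeSprite);
console.log(JSON.stringify(snapshot)); // {"id":7,"type":"slime","x":120,"y":48}
// JSON.stringify(fakeSprite) would throw a TypeError because of the cycle.
```

Returning plain data also keeps the seam stable when the engine's internal object layout changes.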

Anti-Patterns to Avoid

❌ Testing the wrong layer: E2E tests for pure logic
Why tempting: "Let's just test everything through the browser"
Better: Unit tests for logic; reserve E2E for integration contracts

❌ Testing implementation details: Asserting DOM structure/classnames
Why tempting: Easy to assert what you can see in DevTools
Better: Assert user-meaningful outputs (text, score, HP changes)

❌ Sleep-driven tests: wait 2s then click
Why tempting: Simple and "works on my machine"
Better: Wait on explicit readiness (DOM marker, window.__TEST__.ready)

❌ Uncontrolled randomness: RNG/time in assertions
Why tempting: "The game uses random, so the test should too"
Better: Seed RNG (?seed=42), freeze time, assert stable invariants

❌ Pixel snapshots without determinism: Canvas screenshots that flake
Why tempting: "I'll catch visual bugs automatically"
Better: Deterministic mode first; then screenshot at known stable frames

❌ Retries as a strategy: "Just bump retries to 3"
Why tempting: Quick fix that makes CI green
Better: Fix the flake source; retries hide real problems

Debugging Failed Tests

When a test fails, gather evidence in this order:

1. Console errors: mcp__playwright__browser_console_messages({ level: "error" })
2. Network failures: mcp__playwright__browser_network_requests() → check for non-2xx
3. Screenshot: mcp__playwright__browser_take_screenshot() → visual state at failure
4. App state: mcp__playwright__browser_evaluate({ function: "() => window.__TEST__.state()" })
5. Classify the flake (see references/flake-reduction.md):
   - Readiness? → add explicit wait
   - Timing? → control animation/physics
   - Environment? → lock viewport/DPR
   - Data? → isolate test data
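
The first two evidence steps can be folded into one pass/fail summary, sketched below. The input shapes (`{ level, text }` for console messages and `{ url, status }` for requests) are assumptions made for illustration, since the exact output format of the MCP tools is not shown here. `collectFailureEvidence` is a hypothetical helper.

```javascript
// Hypothetical helper: turn collected console and network records into a verdict.
function collectFailureEvidence(consoleMessages, requests) {
  const consoleErrors = consoleMessages
    .filter((message) => message.level === "error")
    .map((message) => message.text);
  const failedRequests = requests
    .filter((request) => request.status < 200 || request.status >= 300) // non-2xx
    .map((request) => `${request.status} ${request.url}`);
  return {
    ok: consoleErrors.length === 0 && failedRequests.length === 0,
    consoleErrors,
    failedRequests,
  };
}

const evidence = collectFailureEvidence(
  [
    { level: "log", text: "boot" },
    { level: "error", text: "Texture missing: hero" },
  ],
  [
    { url: "/assets/hero.png", status: 404 },
    { url: "/api/save", status: 200 },
  ]
);
console.log(evidence.ok); // false
console.log(evidence.failedRequests); // [ '404 /assets/hero.png' ]
```

Printing the whole summary on failure keeps the CI log self-explanatory without rerunning the test locally.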

Graduation Criteria: When Is Testing "Enough"?

Minimum viable test suite:

1 smoke test that proves the app loads and the primary action works
Test seam exists (window.__TEST__ with ready flag and state)
Deterministic mode for canvas/games (?test=1 enables seeding)
Console errors fail tests (no silent failures)
CI runs tests on every push

Level up when:

Critical paths (auth, payment, save/load) have dedicated E2E tests
Unit tests cover complex logic (pathfinding, damage calc, state machines)
Visual regression on key screens (menu, HUD) with locked determinism

Visual Regression with imgdiff.py

For pixel comparison of screenshots:

# Compare baseline to current
python scripts/imgdiff.py baseline.png current.png --out diff.png

# Allow small tolerance (anti-aliasing differences)
python scripts/imgdiff.py baseline.png current.png --max-rms 2.0

Exit codes: 0 = identical, 1 = different, 2 = error
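
To make the tolerance concrete, here is a sketch of a root-mean-square comparison over two equally sized arrays of channel values, mapped onto the exit codes above. How scripts/imgdiff.py actually computes its RMS or applies --max-rms is not shown in this document, so this is an assumption about the general technique. `rmsDifference` and `exitCodeFor` are hypothetical names.

```javascript
// Hypothetical helper: RMS of per-value differences between two pixel buffers.
function rmsDifference(baseline, current) {
  if (baseline.length !== current.length) {
    throw new RangeError("images must have the same number of channel values");
  }
  if (baseline.length === 0) return 0;
  let sumOfSquares = 0;
  for (let i = 0; i < baseline.length; i++) {
    const delta = baseline[i] - current[i];
    sumOfSquares += delta * delta;
  }
  return Math.sqrt(sumOfSquares / baseline.length);
}

// Map the comparison onto the documented exit codes: 0 same, 1 different, 2 error.
function exitCodeFor(baseline, current, maxRms = 0) {
  try {
    return rmsDifference(baseline, current) <= maxRms ? 0 : 1;
  } catch {
    return 2;
  }
}

const baselinePixels = [10, 10, 10, 10];
const currentPixels = [10, 12, 10, 8];
console.log(exitCodeFor(baselinePixels, currentPixels)); // 1 (strict comparison)
console.log(exitCodeFor(baselinePixels, currentPixels, 2.0)); // 0 (within tolerance)
console.log(exitCodeFor(baselinePixels, [10, 12])); // 2 (size mismatch)
```

In this example the squared differences sum to 8 over 4 values, giving an RMS of about 1.41, which passes a tolerance of 2.0 and fails a strict comparison.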

UI Slicing Regressions (Nine-Slice / Ribbons / Bars)

Canvas UI issues (panel seams, segmented ribbons, invisible HUD fills) are best caught with a dedicated UI harness instead of the full gameplay flow.

Build a simple test.html/scene that loads only the UI assets.
Render raw slices next to assembled panels (multi-size), and include ribbon/bars with both “raw crop + scale” and “stitched multi-slice” views.
Expose window.__TEST__ with .commands.showTest(n) so Playwright can toggle each mode deterministically.
Capture targeted screenshots (panels, ribbons, bars) and diff them in CI.

See references/phaser-canvas-testing.md for the deterministic setup + screenshot workflow.
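
A harness seam along the lines described above might look like the sketch below. `createUiHarness`, the mode names, and the `render` callback are all hypothetical; a real harness would draw panels, ribbons, or bars inside `render` and assign the returned object to window.__TEST__.

```javascript
// Hypothetical helper: a seam whose showTest(n) switches between fixed UI views.
function createUiHarness(modes, render) {
  const harness = {
    ready: false,
    current: null,
    commands: {
      showTest(n) {
        if (!Number.isInteger(n) || n < 0 || n >= modes.length) {
          throw new RangeError(`unknown test mode: ${n}`);
        }
        harness.ready = false; // not ready while the view is being swapped
        render(modes[n]);
        harness.current = modes[n];
        harness.ready = true; // safe to take a screenshot now
        return harness.current;
      },
    },
  };
  return harness;
}

const drawnModes = [];
const uiHarness = createUiHarness(["panels", "ribbons", "bars"], (mode) => {
  drawnModes.push(mode); // stand-in for real drawing code
});
uiHarness.commands.showTest(1);
console.log(uiHarness.current, uiHarness.ready); // ribbons true
```

Because each mode is selected by index rather than by playing the game, every screenshot starts from the same known view.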

Variation Guidance

Adapt approach based on context:

DOM app: Standard Playwright selectors, wait for text/elements
Canvas game: Test seams mandatory, wait via window.__TEST__.ready
Hybrid: DOM for menus, test seams for gameplay
CI-only GPU: May need software rendering flags or skip visual tests
UI slicing regressions: For nine-slice/ribbon/bar artifacts, prefer a small UI harness scene/page with deterministic modes and targeted screenshots (references/phaser-canvas-testing.md).

Bundled Resources

Read these when needed:

references/playwright-mcp-cheatsheet.md: Detailed MCP tool patterns
references/phaser-canvas-testing.md: Deterministic mode for Phaser games
references/flake-reduction.md: Flake classification and fixes

Remember

You can make almost any frontend (including canvas/WebGL games) testable by adding a tiny, stable seam for readiness + state. One reliable smoke test is the foundation. Aim for tests that are boring to maintain: deterministic, explicit about readiness, and rich in failure evidence. The goal is confidence, not coverage numbers.


First Seen: Jan 24, 2026
