npx skills add https://github.com/jezweb/claude-skills --skill ux-audit
Dogfood web apps by browsing them as a real user would — with their goals, their patience, and their context. Goes beyond "does it work?" to "is it good?" by tracking emotional friction (trust, anxiety, confusion), counting click efficiency, testing resilience, and asking the ultimate question: "would I come back?" Uses Chrome MCP (for authenticated apps with your session) or Playwright for browser automation. Produces structured audit reports with findings ranked by impact.
Before starting any mode, detect available browser tools:
Chrome MCP (`mcp__claude-in-chrome__*`) — preferred for authenticated apps. Uses the user's logged-in Chrome session, so OAuth/cookies just work.
Playwright (`mcp__plugin_playwright_playwright__*`) — for public apps or parallel sessions.
If none are available, inform the user and suggest installing Chrome MCP or Playwright.
See references/browser-tools.md for tool-specific commands.
If the user didn't provide a URL, find one automatically. Prefer the deployed/live version — that's what real users see.
Check wrangler.jsonc for custom domains or routes:
grep -E '"pattern"|"custom_domain"' wrangler.jsonc 2>/dev/null
If found, use the production URL (e.g. https://app.example.com).
Check for deployed URL in CLAUDE.md, README, or package.json homepage field.
Fall back to local dev server — check if one is already running:
lsof -i :5173 -i :3000 -i :8787 -t 2>/dev/null
If running, use http://localhost:{port}.
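The fallback order above can be sketched as a small helper (the function and argument names are illustrative, not part of the skill):

```javascript
// Resolve the audit URL using the discovery order described above.
// Each candidate comes from one discovery step; the first truthy one wins.
function pickAuditUrl({ productionUrl, documentedUrl, localPort }) {
  if (productionUrl) return productionUrl;   // from wrangler.jsonc routes/custom domains
  if (documentedUrl) return documentedUrl;   // from CLAUDE.md, README, or package.json homepage
  if (localPort) return `http://localhost:${localPort}`; // from the lsof probe
  return null; // nothing found: ask the user for a URL
}
```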
Why live over local: The live site has real data, real auth, real network latency, real CDN behaviour, and real CORS/CSP policies. Testing locally misses deployment-specific issues (missing env vars, broken asset paths, CORS errors, slow API responses). The UX audit should test what the user actually experiences.
When local is better: The user explicitly says "test localhost", or the feature isn't deployed yet.
Control how thorough the audit is. Pass as an argument: /ux-audit quick, /ux-audit thorough, or default to standard.
| Depth | Duration | Autonomy | What it covers |
|---|---|---|---|
| quick | 5-10 min | Interactive | One user flow, happy path only. Spot check after a change. |
| standard | 20-40 min | Semi-autonomous | Full walkthrough + QA sweep of main pages. Default. |
| thorough | 1-3 hours | Fully autonomous | Multiple personas, all pages, all modes combined. Overnight mode. |
| exhaustive | 4-8+ hours | Fully autonomous | Every interactive element on every page. Every button clicked, every dialog opened, every form filled, every state triggered. Leave nothing untested. |
The exhaustive mode goes beyond thorough. Thorough tests workflows and pages. Exhaustive tests every single interactive element in the application.
For each page discovered:
Progress tracking: This mode generates a LOT of findings. Write findings to the report incrementally — don't hold everything in memory. Update docs/ux-audit-exhaustive-YYYY-MM-DD.md after each page is complete.
Element inventory format (per page):
/clients — 47 interactive elements
[x] "Add Client" button — opens modal ✓, form submits ✓, validation ✓
[x] Search input — filters correctly ✓, clear button works ✓, empty search ✓
[x] Sort dropdown — all 4 options work ✓, persists on navigation ✗ (BUG)
[x] Client row click — navigates to detail ✓
[x] Star button — toggles ✓, persists on refresh ✓
[ ] Pagination — next ✓, prev ✓, page numbers ✓, items per page ✗ (not tested - no data)
...
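A sketch of how such inventory lines could be generated, assuming a simple element-descriptor shape (a hypothetical helper, not part of the skill):

```javascript
// Turn one tested element into an inventory checklist line
// in the format shown above.
function inventoryLine({ name, checks, tested = true }) {
  const box = tested ? '[x]' : '[ ]';
  const results = checks
    .map(c => `${c.label} ${c.pass ? '✓' : '✗' + (c.note ? ` (${c.note})` : '')}`)
    .join(', ');
  return `${box} ${name} — ${results}`;
}
```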
The thorough mode is designed to run unattended. Kick it off at end of day, review the report in the morning. The user should NOT need to find issues themselves — this mode catches everything.
Mindset: Don't mechanically run through a checklist. Think about the real person who will use this app every day. What does their workday look like? How will they move through the system? Will they understand what they're looking at? Will the app teach them how to use it through its design, or will they be guessing? Read references/workflow-comprehension.md before starting.
On each page, inject JavaScript via the browser tool to programmatically detect layout issues:
// Detect elements overflowing their parent
document.querySelectorAll('*').forEach(el => {
const r = el.getBoundingClientRect();
const p = el.parentElement?.getBoundingClientRect();
if (p && (r.left < p.left - 1 || r.right > p.right + 1)) {
console.warn('OVERFLOW:', el.tagName, el.className, 'extends beyond parent');
}
});
// Detect text clipped by containers
document.querySelectorAll('h1,h2,h3,h4,p,span,a,button,label').forEach(el => {
if (el.scrollWidth > el.clientWidth + 2 || el.scrollHeight > el.clientHeight + 2) {
console.warn('CLIPPED:', el.tagName, el.textContent?.slice(0,50));
}
});
// Detect elements with zero or negative visibility
document.querySelectorAll('*').forEach(el => {
const s = getComputedStyle(el);
const r = el.getBoundingClientRect();
if (r.width > 0 && r.height > 0 && r.left + r.width < 0) {
console.warn('OFF-SCREEN LEFT:', el.tagName, el.className);
}
});
// Detect low contrast text (rough check)
document.querySelectorAll('h1,h2,h3,p,span,a,li,td,th,label,button').forEach(el => {
const s = getComputedStyle(el);
if (s.color === s.backgroundColor || s.opacity === '0') {
console.warn('INVISIBLE TEXT:', el.tagName, el.textContent?.slice(0,30));
}
});
Read console output after injection. Each warning is a potential finding to screenshot and investigate.
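One way to run those snippets and harvest their warnings, assuming Playwright's `page.on('console')` and `page.evaluate` APIs (a sketch; `collectLayoutWarnings` is a hypothetical name):

```javascript
// Prefixes emitted by the detection snippets above.
const LAYOUT_PREFIXES = ['OVERFLOW:', 'CLIPPED:', 'OFF-SCREEN LEFT:', 'INVISIBLE TEXT:'];

function isLayoutWarning(text) {
  return LAYOUT_PREFIXES.some(p => text.startsWith(p));
}

// detectionScript is the detection JS shown above, passed in as a string.
async function collectLayoutWarnings(page, detectionScript) {
  const warnings = [];
  const onConsole = msg => {
    if (msg.type() === 'warning' && isLayoutWarning(msg.text())) warnings.push(msg.text());
  };
  page.on('console', onConsole);
  await page.evaluate(detectionScript);
  page.off('console', onConsole);
  return warnings;
}
```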
For each page, resize the viewport through standard breakpoints and screenshot:
| Width | What it represents | Check for |
|---|---|---|
| 1280px | Desktop (standard) | Baseline layout, sidebar + content |
| 1024px | Small desktop / tablet landscape | Nav collapse point, grid reflow |
| 768px | Tablet portrait | Sidebar behaviour, stacked layout |
| 375px | Mobile | Everything stacked, touch targets, no horizontal scroll |
If the layout changes between breakpoints (sidebar collapses, grid reduces columns), screenshot the transition point too.
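The breakpoint sweep could be scripted roughly like this with Playwright's `setViewportSize` and `screenshot` (the viewport height and file-naming scheme are assumptions):

```javascript
// Standard breakpoints from the table above.
const BREAKPOINTS = [
  { width: 1280, label: 'desktop' },
  { width: 1024, label: 'small-desktop' },
  { width: 768,  label: 'tablet' },
  { width: 375,  label: 'mobile' },
];

// Resize through each breakpoint and capture a full-page screenshot.
async function screenshotBreakpoints(page, slug) {
  for (const { width, label } of BREAKPOINTS) {
    await page.setViewportSize({ width, height: 900 }); // height is arbitrary; the page scrolls
    await page.screenshot({
      path: `.jez/screenshots/ux-audit/${slug}-${width}-${label}.png`,
      fullPage: true,
    });
  }
}
```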
On each page, read the browser console for:
Critical: Visual browsing misses API failures that the UI hides. Data-fetching libraries (TanStack Query, SWR) swallow HTTP errors and show empty/loading states instead of error messages. A component showing "No results found" might actually be getting a 403 — but the UI looks normal.
Monitor network responses throughout the entire audit session. If using Playwright, attach a response listener before browsing starts:
// Collect all non-2xx responses for the whole session (Playwright's page.on('response')).
const networkErrors = [];
page.on('response', res => {
  if (res.status() < 200 || res.status() >= 300) {
    networkErrors.push({ url: res.url(), status: res.status(), method: res.request().method(), page: page.url() });
  }
});
If using Chrome MCP, use read_network_requests to check for failed API calls after each page visit.
What to collect: URL, HTTP status, method, and the page you were on when it happened.
Severity mapping:
| Status | Severity | What it usually means |
|---|---|---|
| 500+ | Critical | Server error — something is broken |
| 403 | High | Permission error OR route collision (static route shadowed by /:param) |
| 404 | Medium | Missing endpoint — may be a renamed/removed API route |
| 401 | Low | Expected for unauthenticated probes, but flag if it happens on authenticated pages |
| CORS error | High | API endpoint missing CORS headers — feature broken in production |
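The mapping above, expressed as a helper (hypothetical; non-2xx codes outside the table are left unmapped):

```javascript
// Map a failed response to the severity table above.
function severityFor(status, isCorsError = false) {
  if (isCorsError) return 'High';        // missing CORS headers: broken in production
  if (status >= 500) return 'Critical';  // server error
  if (status === 403) return 'High';     // permission error or route collision
  if (status === 404) return 'Medium';   // missing endpoint
  if (status === 401) return 'Low';      // escalate manually if seen on authenticated pages
  return null;                           // other non-2xx codes: judge case by case
}
```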
What this catches that visual browsing misses:
- Route collisions (`GET /api/boards/users` shadowed by `GET /api/boards/:boardId`)

Report format: Group by status code in a "Network Errors" section:
## Network Errors (detected during browsing)
### 403 Forbidden (2 endpoints)
- `GET /api/boards/users` on /app/boards/123 — likely route collision with /:boardId
- `POST /api/settings/theme` on /app/settings — permission check failing
### 500 Internal Server Error (1 endpoint)
- `GET /api/reports/summary` on /app/dashboard — server error
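A sketch of the grouping step that produces that section (hypothetical helper; the status-name map covers only the codes in the severity table):

```javascript
const STATUS_NAMES = { 403: 'Forbidden', 404: 'Not Found', 500: 'Internal Server Error' };

// Group collected errors by status code into the markdown section shown above.
function networkErrorsSection(errors) {
  const byStatus = new Map();
  for (const e of errors) {
    if (!byStatus.has(e.status)) byStatus.set(e.status, []);
    byStatus.get(e.status).push(e);
  }
  const lines = ['## Network Errors (detected during browsing)'];
  for (const [status, group] of [...byStatus.entries()].sort((a, b) => a[0] - b[0])) {
    const name = STATUS_NAMES[status] ? ` ${STATUS_NAMES[status]}` : '';
    lines.push(`### ${status}${name} (${group.length} endpoint${group.length === 1 ? '' : 's'})`);
    for (const e of group) lines.push(`- \`${e.method} ${e.url}\` on ${e.page} — ${e.note}`);
  }
  return lines.join('\n');
}
```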
| Action | quick | standard | thorough |
|---|---|---|---|
| Navigate pages | Just do it | Just do it | Just do it |
| Take screenshots | Key moments | Friction points | Every page + every issue |
| Fill forms with test data | Ask first | Ask first | Just do it (obviously fake data) |
| Click delete/destructive | Ask first | Ask first | Ask first (only exception) |
| Submit forms | Ask first | Brief confirmation | Just do it (test data only) |
| Write report file | Just do it | Brief confirmation | Just do it |
When: "ux walkthrough", "walk through the app", "is the app intuitive?", "ux audit", "dogfood this"
This is the highest-value mode. You are dogfooding the app — using it as a real user would, with their goals, their constraints, and their patience level. Not a mechanical checklist pass, but genuinely trying to get a job done.
Ask the user two questions:
If the user doesn't specify a persona, adopt a reasonable default: a non-technical person who is time-poor, mildly distracted, and using this app for the first time today.
Navigate to the app's entry point. From here, attempt the task with no prior knowledge of the UI. Adopt the persona's mindset:
At each screen, evaluate against the walkthrough checklist (see references/walkthrough-checklist.md). Key questions to hold in mind:
- Layout: Is all text fully visible? Nothing clipped or overlapping? Spacing consistent?
- Comprehension: Do I understand what this page is for and what I can do here? Do the labels make sense to a non-developer?
- Wayfinding: Do I know where I am in the app? Can I get back to where I came from? Does the nav show my location?
- Flow: Does this page connect naturally to the last one? Is the next step obvious without thinking?
- Trust: Do I feel confident this will do what I expect? Am I afraid I'll break something?
- Efficiency: How many clicks/steps is this taking? Is there a shorter path?
- Recovery: If I make a mistake right now, can I get back?
Track the effort required to complete the task:
After completing the main task, test what happens when things go wrong:
After completing (or failing) the task, reflect as the persona:
Take screenshots at friction points. Compile findings into a UX audit report. Write the report to docs/ux-audit-YYYY-MM-DD.md using the template from references/report-template.md.
Severity levels:
When: "qa test", "test all pages", "check everything works", "qa sweep"
Systematic mechanical testing of all pages and features.
Discover pages: Read the app's router config, sitemap, or manually navigate the sidebar/menu to find all routes.
Create a task list of areas to test (group by feature area)
For each page/feature:
- Page renders without errors
- Data displays correctly (tables, lists, details)
- Forms submit successfully (create)
- Records can be edited (update)
- Delete works with confirmation
- Validation triggers on bad input
- Empty states display correctly
- Error states are handled
Cross-cutting concerns:
- Dark mode: all elements visible, no contrast issues
- Mobile viewport (375px): layout not broken, touch targets large enough
- Search and filters: return correct results
- Notifications: appear and can be dismissed
Produce a QA sweep summary table:
| Page | Status | Issues |
|---|---|---|
| /patients | Pass | — |
| /patients/new | Fail | Form validation missing on email |
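The route-discovery step could be approximated for React Router-style configs with a rough regex (a sketch; real configs vary, so manually navigating the sidebar/menu remains the fallback):

```javascript
// Pull path strings out of router source text.
// Matches path: '/x', path="/x", and path: `/x`.
function extractRoutes(routerSource) {
  const paths = new Set();
  const re = /path\s*[:=]\s*["'`]([^"'`]+)["'`]/g;
  let m;
  while ((m = re.exec(routerSource)) !== null) paths.add(m[1]);
  return [...paths];
}
```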
Results go to docs/qa-sweep-YYYY-MM-DD.md.

When: "check [feature]", "test [page]", "verify [component] works"
Focused testing of a specific area.
| Scenario | Mode | Depth |
|---|---|---|
| Just changed a page, quick sanity check | Targeted Check | quick |
| After building a feature, before showing users | UX Walkthrough | standard |
| Before a release, verify nothing is broken | QA Sweep | standard |
| Quick check on a specific page after changes | Targeted Check | quick |
| Periodic UX health check | UX Walkthrough | standard |
| Client demo prep | QA Sweep + UX Walkthrough | standard |
| End-of-day comprehensive test, review in morning | All modes combined | thorough |
| Pre-launch full audit with evidence | All modes combined | thorough |
| Test literally everything before a client demo | Every element on every page | exhaustive |
| Weekend-long complete app certification | Every element, state, viewport, mode | exhaustive |
Skip this skill for : API-only services, CLI tools, unit/integration tests (use test frameworks), performance testing.
Default rules (standard depth). See "Autonomy by Depth" table above for quick/thorough overrides.
| When | Read |
|---|---|
| Before starting thorough mode — understand the user's world | references/workflow-comprehension.md |
| Evaluating each screen during walkthrough | references/walkthrough-checklist.md |
| Running the six scenario tests | references/scenario-tests.md |
| Writing the audit report | references/report-template.md |
| Browser tool commands and selection | references/browser-tools.md |
Weekly Installs: 653
GitHub Stars: 643
First Seen: Feb 18, 2026
Security Audits
Gen Agent Trust Hub: Pass · Socket: Pass · Snyk: Warn
Installed on
opencode: 601
gemini-cli: 596
codex: 596
github-copilot: 594
cursor: 584
kimi-cli: 579
Thorough mode output: screenshots in .jez/screenshots/ux-audit/ (numbered chronologically) and a report at docs/ux-audit-thorough-YYYY-MM-DD.md with issue counts by severity.