Important prerequisite
Installing AI Skills requires unrestricted network access to GitHub. If you are behind a restrictive firewall, connect through a proxy with TUN mode enabled before installing; without it, the installation will not complete.
software-ux-research by vasilyu1983/ai-agents-public
npx skills add https://github.com/vasilyu1983/ai-agents-public --skill software-ux-research
Use this skill to identify problems/opportunities and de-risk decisions. Use software-ui-ux-design to implement UI patterns, component changes, and design system updates.
If inputs are missing, ask for:
Default outputs (pick what the user asked for):
Every research output — plans, protocols, evaluations, reports — must include these sections. They represent the skill's core value beyond standard UX knowledge: governance, confidence calibration, and ethical research practice.
Method Justification: Name the chosen method AND explain why alternatives were rejected. Do not just describe the method; explain why it was selected over at least 2 alternatives given the specific context (stage, timeline, sample, question type).
Confidence & Triangulation Assessment: Tag every recommendation or finding with a confidence level:
| Confidence | Evidence requirement | Use for |
|---|---|---|
| High | Multiple methods or sources agree | High-impact decisions |
| Medium | Strong signal from one method + supporting indicators | Prioritization |
| Low | Single source / small sample | Exploratory hypotheses only |
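As a minimal sketch of how an agent might apply this rubric mechanically (the threshold and field names here are assumptions, not part of the skill):

```python
from dataclasses import dataclass, field

@dataclass
class Finding:
    """A research finding to be tagged per the rubric above."""
    statement: str
    methods: list[str] = field(default_factory=list)  # methods/sources that support it
    sample_size: int = 0

def confidence(f: Finding, small_sample: int = 5) -> str:
    """Map evidence to High/Medium/Low following the table."""
    if len(f.methods) >= 2:          # multiple methods or sources agree
        return "High"
    if len(f.methods) == 1 and f.sample_size > small_sample:
        return "Medium"              # strong single-method signal
    return "Low"                     # single source / small sample

# Example: a moderated test and session replay both surface the same issue
print(confidence(Finding("Users miss the export button",
                         ["usability test", "session replay"], sample_size=8)))  # High
```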
Consent & Data Handling: Include a PII/consent section in every plan or protocol. Research that involves participants requires explicit attention to:
Decision Framework: For evaluations and analysis outputs, provide a structured decision table with options, confidence levels, timelines, and risks — not just a single recommendation.
Pre-Decision Checklist: For experiment evaluations (A/B tests, etc.), include a verification checklist of confounds and data quality checks to complete before any ship/kill decision.
What do you need?
├─ WHY / needs / context → interviews, contextual inquiry, diary
├─ HOW / usability → moderated usability test, cognitive walkthrough, heuristic eval
├─ WHAT / scale → analytics/logs + targeted qual follow-ups
└─ WHICH / causal → experiments (if feasible) or preference tests
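The tree above reads as a routing table; a throwaway sketch (method names taken verbatim from the tree):

```python
# Routing sketch of the decision tree above; labels are the tree's own.
METHOD_TREE = {
    "WHY":   ["interviews", "contextual inquiry", "diary study"],
    "HOW":   ["moderated usability test", "cognitive walkthrough", "heuristic eval"],
    "WHAT":  ["analytics/logs", "targeted qual follow-ups"],
    "WHICH": ["experiments (if feasible)", "preference tests"],
}

def suggest_methods(question_type: str) -> list[str]:
    """Return candidate methods for a WHY/HOW/WHAT/WHICH question."""
    return METHOD_TREE.get(question_type.upper(), [])

print(suggest_methods("why"))  # ['interviews', 'contextual inquiry', 'diary study']
```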
When selecting a method, always justify the choice by explaining why 2+ alternatives were rejected given the user's specific context. This is a key differentiator — generic "we'll do interviews" without justification is insufficient.
| Stage | Decisions | Primary Methods | Secondary Methods | Output |
|---|---|---|---|---|
| Discovery | What to build and for whom | Interviews, field/diary, journey mapping | Competitive analysis, feedback mining | Opportunity brief + JTBD + Forces of Progress |
| Concept/MVP | Does the concept work? | Concept test, prototype usability | First-click/tree test | MVP scope + onboarding plan |
| Launch | Is it usable + accessible? | Usability testing, accessibility review | Heuristic eval, session replay | Launch blockers + fixes |
| Growth | What drives adoption/value? | Segmented analytics + qual follow-ups | Churn interviews, surveys | Retention drivers + friction |
| Maturity | What to optimize/deprecate? | Experiments, longitudinal tracking | Unmoderated tests | Incremental roadmap |
Discovery research should produce more than job statements. Include:
| Indicator | Example | Research Implication |
|---|---|---|
| Multi-step workflows | Draft → approve → publish | Task analysis + state mapping |
| Multi-role permissions | Admin vs editor vs viewer | Test each role + transitions |
| Data dependencies | Requires integrations/sync | Error-path + recovery testing |
| High stakes | Finance, healthcare | Safety checks + confirmations |
| Expert users | Dev tools, analytics | Recruit real experts (not proxies) |
Minimum required fields:
Use a lightweight score to avoid backlog paralysis:
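The scoring fields themselves were not preserved on this page; as one hedged illustration (a RICE-style reach × impact × confidence ÷ effort score, which is our assumption and not the skill's prescribed formula), triage can stay this fast:

```python
def backlog_score(reach: int, impact: int, confidence: float, effort_days: float) -> float:
    """RICE-style triage score: higher = research this question sooner.

    reach: users affected per quarter; impact: 1-3; confidence: 0-1;
    effort_days: person-days to answer. Field choices are illustrative.
    """
    return (reach * impact * confidence) / max(effort_days, 0.5)

# A broad, high-impact question that is cheap to answer outranks a narrow one
print(backlog_score(reach=400, impact=3, confidence=0.8, effort_days=2))  # 480.0
print(backlog_score(reach=50,  impact=2, confidence=0.5, effort_days=5))  # 10.0
```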
Follow applicable privacy laws; GDPR is a primary reference for EU processing: https://eur-lex.europa.eu/eli/reg/2016/679/oj
PII handling checklist:
Research democratization is a recurring 2026 trend: non-researchers increasingly conduct research. Enable carefully with guardrails.
| Approach | Guardrails | Risk Level |
|---|---|---|
| Templated usability tests | Script + task templates provided | Low |
| Customer interviews by PMs | Training + review required | Medium |
| Survey design by anyone | Central review + standard questions | Medium |
| Unsupervised research | Not recommended | High |
Guardrails for non-researchers:
Quick checklist for research involving users with low digital literacy or low tech confidence. Full guidance in references/non-technical-user-research.md.
| Research Activity | Proxy Metric | Calculation |
|---|---|---|
| Usability testing finding | Prevented dev rework | Hours saved × $150/hr |
| Discovery interview | Prevented build-wrong-thing | Sprint cost × risk reduction % |
| A/B test conclusive result | Improved conversion | (ΔConversion × Traffic × LTV) - Test cost |
| Heuristic evaluation | Early defect detection | Defects found × Cost-to-fix-later |
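The table's calculations translate directly into code. A small sketch using the formulas above (the $150/hr rate is the table's own example figure):

```python
def ab_test_roi(delta_conversion: float, traffic: int, ltv: float, test_cost: float) -> float:
    """(ΔConversion × Traffic × LTV) − Test cost, per the table above."""
    return delta_conversion * traffic * ltv - test_cost

def rework_prevented(hours_saved: float, hourly_rate: float = 150.0) -> float:
    """Usability-finding value: hours of dev rework saved × $150/hr."""
    return hours_saved * hourly_rate

# +0.5pp conversion on 20k users with $120 LTV against an $8k test cost
print(ab_test_roi(0.005, 20_000, 120.0, 8_000.0))  # 4000.0
print(rework_prevented(12))                        # 1800.0
```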
Rules of thumb:
| Situation | Why it fails | Better method |
|---|---|---|
| Low power/traffic | Inconclusive results | Usability tests + trends |
| Many variables change | Attribution impossible | Prototype tests → staged rollout |
| Need “why” | Experiments don’t explain | Interviews + observation |
| Ethical constraints | Harmful denial | Phased rollout + holdouts |
| Long-term effects | Short tests miss delayed impact | Longitudinal + retention analysis |
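The "low power/traffic" row is cheap to check before committing to a test. A back-of-envelope sample-size sketch using the standard two-proportion z-test approximation (a sanity check we are adding, not something the skill prescribes):

```python
from statistics import NormalDist

def n_per_arm(p1: float, p2: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users needed per variant for a two-proportion z-test."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96
    z_b = NormalDist().inv_cdf(power)          # ~0.84
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return int((z_a + z_b) ** 2 * variance / (p1 - p2) ** 2) + 1

# Detecting a 4% -> 5% conversion lift needs ~6,700 users per arm; at 1k
# visitors/week that is 3+ months of runtime, so usability tests win.
print(n_per_arm(0.04, 0.05))  # 6743
```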
Always check for these in experiment evaluations. List each relevant confound with its risk level and how to verify — do not just name them:
Use only when researching automation/AI-powered features. Skip for traditional software UX.
2026 benchmark: Trend reports consistently highlight AI-assisted analysis. Use AI for speed while keeping humans responsible for strategy and interpretation. Example reference: https://www.lyssna.com/blog/ux-research-trends/
| Dimension | Question | Methods |
|---|---|---|
| Mental model | What do users think the system can/can’t do? | Interviews, concept tests |
| Trust calibration | When do users over/under-rely? | Scenario tests, log review |
| Explanation usefulness | Does “why” help decisions? | A/B explanation variants, interviews |
| Failure recovery | Do users recover and finish tasks? | Failure-path usability tests |
| Failure type | Typical impact | What to measure |
|---|---|---|
| Wrong output | Rework, lost trust | Verification + override rate |
| Missing output | Manual fallback | Fallback completion rate |
| Unclear output | Confusion | Clarification requests |
| Non-recoverable failure | Blocked flow | Time-to-recovery, support contact |
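A hedged sketch of how one of these metrics (fallback completion rate) might be computed from session logs; the event schema is invented for illustration:

```python
def fallback_completion_rate(sessions: list[dict]) -> float:
    """Of sessions where the AI produced no output, what share still
    finished the task via a manual fallback? (schema is illustrative)"""
    failed = [s for s in sessions if s["ai_output"] == "missing"]
    if not failed:
        return 0.0
    return sum(s["completed_manually"] for s in failed) / len(failed)

log = [
    {"ai_output": "missing", "completed_manually": True},
    {"ai_output": "missing", "completed_manually": False},
    {"ai_output": "ok",      "completed_manually": False},
]
print(fallback_completion_rate(log))  # 0.5
```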
Trend reports frequently mention synthetic/AI participants. Use with clear boundaries. Example reference: https://www.lyssna.com/blog/ux-research-trends/
| Use Case | Appropriate? | Why |
|---|---|---|
| Early concept brainstorming | Supplement only | Generate edge cases, not validation |
| Scenario/edge case expansion | Yes | Broaden coverage before real testing |
| Moderator training/practice | Yes | Practice without participant burden |
| Hypothesis generation | Yes | Explore directions to test with real users |
| Validation/go-no-go decisions | Never | Cannot substitute lived experience |
| Usability findings as evidence | Never | Real behavior required |
| Quotes in reports | Never | Fabricated quotes damage credibility |
Critical rule: Synthetic outputs are hypotheses, not evidence. Always validate with real users before shipping.
Core Research Methods:
Demographic & Quantitative Research:
Competitive UX Analysis & Flow Patterns:
Research Operations & Methods:
Feedback Collection & Analysis:
Evaluative Iteration:
Data & Sources:
IMPORTANT: When designing UX flows for a specific domain, you MUST use WebSearch to find and suggest best-practice patterns from industry leaders.
| Domain | Industry Leaders to Check | Key Flows |
|---|---|---|
| Fintech/Banking | Wise, Revolut, Monzo, N26, Chime, Mercury | Onboarding/KYC, money transfer, card management, spend analytics |
| E-commerce | Shopify, Amazon, Stripe Checkout | Checkout, cart, product pages, returns |
| SaaS/B2B | Linear, Notion, Figma, Slack, Airtable | Onboarding, settings, collaboration, permissions |
| Developer Tools | Stripe, Vercel, GitHub, Supabase | Docs, API explorer, dashboard, CLI |
| Consumer Apps | Spotify, Airbnb, Uber, Instagram | Discovery, booking, feed, social |
| Healthcare | Oscar, One Medical, Calm, Headspace | Appointment booking, records, compliance flows |
| EdTech | Duolingo, Coursera, Khan Academy | Onboarding, progress, gamification |
When user specifies a domain, execute:
"[domain] UX best practices 2026""[leader company] [flow type] UX""[leader company] app review UX" site:mobbin.com OR site:pageflows.com"[domain] onboarding flow examples"After searching, provide:
DOMAIN: Fintech (Money Transfer)
BENCHMARKED: Wise, Revolut
WISE PATTERNS:
- Upfront fee transparency (shows exact fee before recipient input)
- Mid-transfer rate lock (shows countdown timer)
- Delivery time estimate per payment method
- Recipient validation (bank account check before send)
REVOLUT PATTERNS:
- Instant send to Revolut users (P2P first)
- Currency conversion preview with rate comparison
- Scheduled/recurring transfers prominent
APPLY TO YOUR FLOW:
1. Add fee transparency at step 1 (not step 3)
2. Show delivery estimate per payment rail
3. Consider rate lock feature for FX transfers
DIFFERENTIATION OPPORTUNITY:
- Neither shows historical rate chart—add "is now a good time?" context
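If an agent automates this step, the query templates above expand mechanically. A throwaway sketch (function and parameter names are ours, not the skill's):

```python
def benchmark_queries(domain: str, leaders: list[str], flow: str) -> list[str]:
    """Expand the search templates above for one domain and flow."""
    queries = [
        f"{domain} UX best practices 2026",
        f"{domain} onboarding flow examples",
    ]
    for leader in leaders:
        queries.append(f"{leader} {flow} UX")
        queries.append(f'"{leader}" app review UX site:mobbin.com OR site:pageflows.com')
    return queries

for q in benchmark_queries("fintech", ["Wise", "Revolut"], "money transfer"):
    print(q)
```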
IMPORTANT: When users ask recommendation questions about UX research, you MUST use WebSearch to check current trends before answering.
"UX research trends 2026""UX research tools best practices 2026""[Maze/Hotjar/UserTesting] comparison 2026""AI in UX research 2026"After searching, provide:
For prototype-parity polishing (fast iteration when product is "almost ideal"), see references/evaluative-research-loop.md. Covers: two-surface audit, drift classification (layout/density/control/content/state), friction-based prioritization, banner/loading guardrails, localization-readiness checks, and fast iteration cadence.
Weekly Installs: 71
Repository
GitHub Stars: 48
First Seen: Jan 23, 2026
Security Audits: Gen Agent Trust Hub (Pass) · Socket (Pass) · Snyk (Warn)
Installed on
codex: 54
opencode: 54
gemini-cli: 53
cursor: 53
github-copilot: 49
amp: 44