pol-probe-advisor by deanpeters/product-manager-skills
npx skills add https://github.com/deanpeters/product-manager-skills --skill pol-probe-advisor
Guide product managers through selecting the right Proof of Life (PoL) probe type (of 5 flavors) based on their hypothesis, risk, and available resources. Use this when you need to eliminate a specific risk or test a narrow hypothesis, but aren't sure which validation method to use. This interactive skill ensures you match the cheapest prototype to the harshest truth—not the prototype you're most comfortable building.
This is not a tool for deciding if you should validate (you should). It's a decision framework for choosing how to validate most effectively.
Common failure mode: PMs choose validation methods based on tooling comfort ("I know Figma, so I'll design a prototype") rather than learning goal. Result: validate the wrong thing, miss the actual risk.
Solution: Work backwards from the hypothesis. Ask: "What specific risk am I eliminating? What's the cheapest path to harsh truth?"
| Type | Core Question | Best For | Timeline |
|---|---|---|---|
| Feasibility Check | "Can we build this?" | Technical unknowns, API dependencies, data integrity | 1-2 days |
| Task-Focused Test | "Can users complete this job without friction?" | Critical UI moments, field labels, decision points | 2-5 days |
| Narrative Prototype | "Does this workflow earn stakeholder buy-in?" | Storytelling, explaining complex flows, alignment | 1-3 days |
| Synthetic Data Simulation | "Can we model this without production risk?" | Edge cases, unknown-unknowns, statistical modeling | 2-4 days |
| Vibe-Coded PoL Probe | "Will this solution survive real user contact?" | Workflow/UX validation with real interactions | 2-3 days |
Golden Rule: "Use the cheapest prototype that tells the harshest truth."
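The selection table above is essentially a lookup from core question to probe type and timeline. A minimal sketch of that lookup (the function and its return strings are illustrative, not part of the skill itself):

```python
# Sketch of the probe-selection table: core question -> (probe type, timeline in days).
# Illustrative only; the skill drives this interactively, not via a literal dict.
PROBE_TABLE = {
    "Can we build this?": ("Feasibility Check", (1, 2)),
    "Can users complete this job without friction?": ("Task-Focused Test", (2, 5)),
    "Does this workflow earn stakeholder buy-in?": ("Narrative Prototype", (1, 3)),
    "Can we model this without production risk?": ("Synthetic Data Simulation", (2, 4)),
    "Will this solution survive real user contact?": ("Vibe-Coded PoL Probe", (2, 3)),
}

def recommend(core_question: str) -> str:
    """Return the probe matching a core question, or a prompt to narrow the hypothesis."""
    probe, (lo, hi) = PROBE_TABLE.get(core_question, (None, (0, 0)))
    if probe is None:
        return "Hypothesis too broad - narrow it to one falsifiable question."
    return f"{probe} ({lo}-{hi} days)"

print(recommend("Can we build this?"))  # Feasibility Check (1-2 days)
```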
✅ Use this when:
❌ Don't use this when:
problem-statement.md or problem-framing-canvas.md first)
Use workshop-facilitation as the default interaction protocol for this skill.
It defines:
Other (specify) when useful)
This file defines the domain-specific assessment content. If there is a conflict, follow this file's domain logic.
This interactive skill uses adaptive questioning to recommend the right PoL probe type based on your context.
Agent asks:
Let's figure out which PoL probe type is right for your validation needs. First, I need some context:
1. What hypothesis are you testing? (Describe in one sentence, or use "If [we do X] for [persona], then [outcome]" format)
2. What specific risk are you trying to eliminate? Examples:
3. What's your timeline?
4. What resources do you have available? Examples:
Agent synthesizes user input and asks:
Based on your hypothesis and risk, which of these core questions are you really trying to answer?
Offer 5 options (aligned to probe types):
User response: [Select one number, or describe if none fit]
Based on user selection, agent recommends the matching probe type:
→ Recommended Probe: Feasibility Check
What it is: A 1-2 day spike-and-delete test to surface technical risk. Not meant to impress anyone—meant to reveal blockers fast.
Methods:
Timeline: 1-2 days
Tools:
Success Criteria Example:
Disposal Plan: Delete all spike code after documenting findings.
Next Step: Would you like me to generate a pol-probe artifact documenting this feasibility check?
→ Recommended Probe: Task-Focused Test
What it is: Validate critical moments—field labels, decision points, navigation, drop-off zones—using specialized testing tools. Focus on observable task completion, not opinions.
Methods:
Timeline: 2-5 days
Tools:
Success Criteria Example:
Disposal Plan: Archive session recordings, document learnings, delete test prototype.
Next Step: Would you like me to generate a pol-probe artifact documenting this task-focused test?
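Drop-off zones from a task-focused test reduce to a funnel analysis: count participants reaching each step and find the largest relative loss. A minimal sketch (step names and counts are invented for illustration):

```python
# Hedged sketch: locate the worst drop-off step in a task-focused test funnel.
# Step names and participant counts are hypothetical sample data.
funnel = [("open form", 20), ("fill fields", 17), ("review", 16), ("submit", 9)]

# Relative drop at each transition: 1 - (reached next step / reached this step).
drops = [
    (funnel[i + 1][0], 1 - funnel[i + 1][1] / funnel[i][1])
    for i in range(len(funnel) - 1)
]
worst_step, worst_drop = max(drops, key=lambda d: d[1])
print(f"worst drop-off: {worst_step} ({worst_drop:.0%})")  # worst drop-off: submit (44%)
```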
→ Recommended Probe: Narrative Prototype
What it is: Tell the story, don't test the interface. Use video walkthroughs or slideware storyboards to explain workflows and measure interest. This is "tell vs. test"—you're validating the narrative, not the UI.
Methods:
storyboard.md component skill)
Timeline: 1-3 days
Tools:
Success Criteria Example:
Disposal Plan: Archive video, document feedback, delete supporting files.
Next Step: Would you like me to generate a pol-probe artifact documenting this narrative prototype?
→ Recommended Probe: Synthetic Data Simulation
What it is: Use simulated users, synthetic data, or prompt logic testing to explore edge cases and unknown-unknowns without touching production. Think "wind tunnel testing, cheaper than postmortem."
Methods:
Timeline: 2-4 days
Tools:
Success Criteria Example:
Disposal Plan: Delete synthetic data, archive findings, document edge cases.
Next Step: Would you like me to generate a pol-probe artifact documenting this synthetic data simulation?
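In practice, a synthetic data simulation generates records that deliberately include edge cases, then runs them through the logic under test without touching production. A minimal sketch (field names and the validation rule are invented for illustration):

```python
import random

# Hedged sketch: generate synthetic sign-up records, seed in edge cases,
# and run them through a validation rule under test. The fields and the
# rule are hypothetical; no production data is involved.
random.seed(7)

def make_record(edge_case: bool) -> dict:
    name = "" if edge_case else f"user{random.randint(1, 999)}"
    age = random.choice([-1, 0, 250]) if edge_case else random.randint(18, 90)
    return {"name": name, "age": age}

def rule_under_test(rec: dict) -> bool:
    # Naive rule being probed: does it reject malformed records?
    return bool(rec["name"]) and 0 < rec["age"] < 130

records = [make_record(edge_case=(i % 5 == 0)) for i in range(100)]
failures = sum(1 for r in records if not rule_under_test(r))
print(f"{failures}/100 synthetic records rejected")  # 20/100 synthetic records rejected
```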
→ Recommended Probe: Vibe-Coded PoL Probe
What it is: A Frankensoft stack (ChatGPT Canvas + Replit + Airtable) that creates just enough illusion for users to interact with a semi-functional workflow. Not production-grade—just enough to catch UX/workflow signals in 48 hours.
⚠️ Warning: This is the riskiest probe type. It looks real enough to confuse momentum with maturity. Use only when you need real user contact and other methods won't suffice.
Methods:
Timeline: 2-3 days
Stack Example:
Success Criteria Example:
Disposal Plan: Delete all code after user sessions, archive Loom recordings, document learnings.
Next Step: Would you like me to generate a pol-probe artifact documenting this vibe-coded probe?
Agent offers:
I recommend using [selected probe type] for your hypothesis. Would you like me to:
pol-probe artifact (using the pol-probe.md template) with your hypothesis, success criteria, timeline, and disposal plan?
User selects an option, and the agent executes accordingly.
If user is uncertain or hypothesis is too broad:
Agent asks:
Your hypothesis seems broad or tests multiple things at once. PoL probes work best when they're narrow and falsifiable. Let me ask a few clarifying questions:
Clarifying Questions:
Agent then re-recommends probe type based on refined hypothesis.
User Context:
Agent Reasoning:
Recommendation: Task-Focused Test using UsabilityHub
User Context:
Agent Reasoning:
Recommendation: Feasibility Check using prompt engineering spike
Outcome: Error rate was 18%. Decided NOT to build the feature. Saved 6 weeks of development.
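An error rate like the 18% in this example comes from scoring spike outputs against a small labeled sample and comparing against a pre-agreed threshold. A minimal sketch (the sample labels, the simulated misclassifications, and the 10% threshold are all invented for illustration):

```python
# Hedged sketch: score a prompt-engineering spike against a labeled sample
# and make a go/no-go call on an error-rate threshold. Data and threshold
# are hypothetical.
expected = ["refund", "billing", "shipping", "refund", "other"] * 10  # 50 labeled cases
observed = list(expected)
for i in range(9):  # simulate the spike misclassifying 9 of 50 cases
    observed[i * 5] = "other" if expected[i * 5] != "other" else "billing"

errors = sum(1 for e, o in zip(expected, observed) if e != o)
error_rate = errors / len(expected)
print(f"error rate: {error_rate:.0%}")               # error rate: 18%
print("build" if error_rate <= 0.10 else "do not build")  # do not build
```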
User Context:
Agent Reasoning:
Recommendation: Narrative Prototype using Loom walkthrough
Why NOT Vibe-Coded Probe: You don't need users to interact with security settings—you need them to understand the concept. Narrative is cheaper and faster.
User Context:
Agent Response:
⚠️ This is prototype theater, not a PoL probe.
You're testing internal politics, not user behavior or technical feasibility. PoL probes eliminate product risk, not organizational risk.
Recommendation: Skip the prototype entirely. Instead:
If executives need a demo, build it after you've validated the hypothesis with real users.
Failure Mode: "I know Figma, so I'll design a UI prototype" (even if design isn't the risk).
Consequence: Validate the wrong thing; miss the actual risk.
Fix: Answer the core question first, then pick the method. If you need a Feasibility Check but only know design tools, pair with an engineer for 1 day.
Failure Mode: "Let's just build it and see what happens."
Consequence: 2 weeks of development before learning you tested the wrong hypothesis.
Fix: Ask: "What's the cheapest prototype that tells the harshest truth?" Usually it's NOT code.
Failure Mode: Vibe-Coded probe "looks real," so team treats it like production code.
Consequence: Scope creep, technical debt, resistance to disposal.
Fix: Set a disposal date before building. Vibe-Coded probes are Frankensoft by design—celebrate the jank, delete after learning.
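One way to make the disposal date hard to ignore is to bake it into the probe itself, so the code refuses to run past its expiry. A minimal sketch (the date and function are hypothetical, not part of the skill):

```python
from datetime import date

# Hedged sketch: a vibe-coded probe carries its own disposal date so the
# team can't quietly promote it to production. The date is invented.
DISPOSAL_DATE = date(2026, 3, 1)

def assert_still_disposable(today: date) -> None:
    """Raise if the probe has outlived its agreed disposal date."""
    if today > DISPOSAL_DATE:
        raise RuntimeError("Probe past disposal date: delete it, keep only the learnings.")

assert_still_disposable(date(2026, 2, 20))  # fine before the disposal date
```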
Failure Mode: "Let's test the workflow, the pricing, and the UI in one probe."
Consequence: Ambiguous results—you won't know which variable caused failure.
Fix: One probe, one hypothesis. If you have 3 hypotheses, run 3 probes.
Failure Mode: "We'll know it when we see it."
Consequence: No harsh truth—just opinions and vanity metrics.
Fix: Write success criteria before building. Define "pass," "fail," and "learn" thresholds.
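Written down, "pass," "fail," and "learn" thresholds are just explicit numbers agreed before the probe runs. A minimal sketch (the metric and threshold values are invented for illustration):

```python
# Hedged sketch: pre-registered success criteria for a probe, as explicit
# task-completion-rate thresholds. Values are hypothetical.
CRITERIA = {"pass": 0.80, "learn": 0.60}

def verdict(completion_rate: float) -> str:
    if completion_rate >= CRITERIA["pass"]:
        return "pass"
    if completion_rate >= CRITERIA["learn"]:
        return "learn"  # a signal worth iterating on, not a green light
    return "fail"

print(verdict(0.85), verdict(0.65), verdict(0.40))  # pass learn fail
```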
Weekly Installs
212
Repository
GitHub Stars
1.5K
First Seen
Feb 12, 2026
Security Audits
Gen Agent Trust Hub: Pass · Socket: Fail · Snyk: Warn
Installed on
codex: 187
opencode: 185
gemini-cli: 181
github-copilot: 180
cursor: 178
kimi-cli: 177