npx skills add https://github.com/jwilger/agent-skills --skill ticket-triage
Evaluate a single ticket against six readiness criteria and produce a clear verdict with actionable remediation guidance.
The user might provide the ticket in several ways. Adapt accordingly.
If the ticket content is ambiguous or incomplete (e.g., just a title with no description), note that in your assessment but still evaluate what's there.
Before diving into the criteria, understand the right mindset for this evaluation. The question you are answering is: "Could a developer pick up this ticket and build the right thing without needing to ask clarifying questions?"
You are not looking for perfection. You are looking for sufficiency. A ticket with slightly informal language that nonetheless communicates the expected behavior clearly is FINE. Only fail a criterion when the gap would genuinely cause a developer to build the wrong thing, miss an important behavior, or be unable to verify their work.
When in doubt, pass the criterion and note an optional improvement. Reserve failures for genuine problems that would block development or lead to incorrect implementations.
A ticket is Ready for Development only if it passes ALL six criteria. Failing even one makes it not ready.
The ACs must communicate what the feature does concretely enough that a developer knows what to build. They should describe behaviors, not just restate the feature name.
**Passes**: "Dragging changes order in the UI" -- a developer knows what to implement: a drag interaction that visually reorders items
**Passes**: "When a user submits a task with a title shorter than 3 characters, a validation error appears below the title field" -- very precise and detailed
**Fails**: "User can create a task" -- this is a feature summary, not an AC. It says nothing about what creating a task involves, what fields are shown, or what happens after creation
**Fails**: "Validation errors are displayed on the form" -- which errors? For which fields? Where on the form?
The bar is "would a developer know what to build?" not "is every edge case documented?" ACs that clearly communicate the expected behavior pass even if they could be more detailed.
The ticket is a single, deliverable unit of work. Not an epic disguised as a story, nor an artificial slice that separates tightly coupled concerns (like creating a model in one ticket and adding its validations in another).
**Passes**: "Add drag-and-drop reordering to the task list" -- single feature, clear scope
**Fails**: "Build import and export functionality" -- two distinct features with different complexity
Red flags: 3+ distinct capabilities listed, "and" in the title joining unrelated concerns, question marks in the description suggesting the scope isn't decided.
Each AC must be testable -- a QA engineer could determine pass or fail from the AC text. The key question is whether the AC contains subjective language that makes the pass/fail determination a matter of opinion.
**Passes**: "Order persists after page refresh" -- clear test: reorder, refresh, check
**Passes**: "A user only sees their own tasks" -- clear test: log in as user A, verify user B's tasks are not visible
**Fails**: "Query is reasonably performant with large data sets" -- "reasonably" is subjective, "large" is undefined
**Fails**: "Users perceive the app as more powerful" -- entirely subjective, no way to test
Subjective red-flag words that typically cause failures: "appropriately", "reasonably", "correctly", "clearly", "feels", "perceived", "intuitive", "meaningful". However, context matters -- "User can toggle theme" is fine even though "toggle" is slightly informal, because the behavior is obvious.
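As a rough illustration (not part of the skill itself), the red-flag screen above can be sketched as a simple word check. The word list comes from this section; the function name and shape are invented for the example, and a hit should prompt human judgment rather than an automatic failure:

```python
import re

# Subjective words flagged in this section. Context still matters --
# a match is a prompt to look closer, not an automatic Fail.
RED_FLAGS = {"appropriately", "reasonably", "correctly", "clearly",
             "feels", "perceived", "intuitive", "meaningful"}

def subjective_words(ac_text: str) -> set[str]:
    """Return any red-flag words found in an acceptance criterion."""
    words = set(re.findall(r"[a-z]+", ac_text.lower()))
    return words & RED_FLAGS
```

For example, `subjective_words("Query is reasonably performant with large data sets")` flags `"reasonably"`, while "Order persists after page refresh" comes back clean.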
The ticket as a whole must be verifiable by a user interacting with the application. Evaluate this at the ticket level, not per-AC.
**Passes**: A ticket where the core behavior is user-facing, even if one AC mentions an implementation detail. For example, if a ticket has 3 ACs and two describe UI behavior while one says "foreign key relationship exists", the ticket passes -- it IS user-verifiable. The implementation AC is a style issue to note as an optional improvement, not a criterion failure. If even ONE AC describes a user-observable behavior, the ticket passes this criterion.
**Fails**: A ticket where ALL ACs describe implementation details ("foreign key exists", "data is stored in the database", "migrations run successfully") with no user-observable behavior at all.
The question is: "Can a user verify this ticket is done by using the app?" If yes, it passes. Period.
The ticket delivers value that a user of the application can experience. Pure infrastructure, tooling, or devops tickets should be typed as "Chores" or "Tasks", not "Stories."
**Passes**: "User can see a task list page after the app starts" -- infrastructure + user value combined
**Fails**: "Docker Compose starts the app and database is reachable from Rails" -- developer/infra value only
Infrastructure work is necessary and valid. The criterion is about typing and framing, not dismissing the work. If a ticket is purely infrastructure, recommend reclassifying it as a Chore or merging it with a user-facing ticket so they ship together.
This criterion applies when a ticket introduces a new data model (a new database table / entity) or adds user-facing fields to an existing model. When it applies, every user-facing field must have explicit validation rules: type, required/optional, min/max length or value, format, allowed values, default value.
**Passes**: A ticket that includes something like:
| Field | Type | Required | Constraints | Default |
|---|---|---|---|---|
| title | string | Yes | Min 3 chars, max 255 chars | -- |
| status | enum | Yes | pending, in_progress, completed, archived | pending |
**Fails**: "Each task has: Title (string), Description (text), Status, Due date" -- field names and rough types but no validation rules
**N/A -- mark as Pass**: When the ticket does NOT introduce a new data model. Specifically:
- Adding a foreign key to an existing model (e.g., `user_id` to tasks) is a relationship change, not a new data model. Pass.
- Using a field introduced in an earlier ticket (e.g., reordering tasks by an existing `position` field) is not introducing new fields. Pass.

The criterion only triggers when the ticket says something like "create a Comment model" or "each task should have: [list of new fields]" -- i.e., the ticket is defining what data gets stored and the developer needs to know the validation rules to build the forms and model correctly.
Watch for: tickets that mention data fields in the description but have no validation section, enum fields that list values without specifying the default, and fields whose validation rules are defined in a different ticket (this means the ticket isn't self-contained).
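The value of an explicit field table is that it translates directly into code. A minimal sketch, assuming the example Task fields from the Passes table above (the function name and error strings are illustrative, not prescribed by the skill):

```python
# Allowed enum values and constraints taken from the example field table:
# title: string, required, min 3 / max 255 chars
# status: enum, required, default "pending"
ALLOWED_STATUSES = {"pending", "in_progress", "completed", "archived"}

def validate_task(task: dict) -> list[str]:
    """Validate a task dict against the example field table; [] means valid."""
    errors = []

    title = task.get("title")
    if not isinstance(title, str) or not title:        # required
        errors.append("Title is required")
    elif not (3 <= len(title) <= 255):                 # min 3, max 255 chars
        errors.append("Title must be between 3 and 255 characters")

    status = task.get("status", "pending")             # default: pending
    if status not in ALLOWED_STATUSES:                 # allowed enum values
        errors.append("Status must be one of: " + ", ".join(sorted(ALLOWED_STATUSES)))

    return errors
```

A developer handed only the Fails-style field list ("Title (string), Status, ...") could not write these checks without asking clarifying questions, which is exactly what this criterion guards against.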
For each ticket:
Present the assessment in a clear, scannable format:
## Ticket Triage: [Ticket Title]
**Verdict: [READY FOR DEVELOPMENT / NEARLY READY / NOT READY]**
### Criteria Assessment
| # | Criterion | Result | Notes |
|---|-----------|--------|-------|
| 1 | Specific Acceptance Criteria | Pass/Fail | [brief reason] |
| 2 | Appropriately Sliced | Pass/Fail | [brief reason] |
| 3 | Verifiable and Specific ACs | Pass/Fail | [brief reason] |
| 4 | User-Verifiable Through UI | Pass/Fail | [brief reason] |
| 5 | Not Infrastructure-Only | Pass/Fail | [brief reason] |
| 6 | Validation Criteria for Data Models | Pass/Fail/N/A | [brief reason] |
Then, based on the verdict:
**If READY**: Brief confirmation of why the ticket is good to go. Note any optional improvements (clearly marked as non-blocking).
**If NEARLY READY**: List the 1-2 specific, small changes needed. Provide the exact rewritten AC or addition so the team can copy-paste it.
**If NOT READY**: Include all three of these sections:
**Gaps** -- Bulleted list of every specific gap, grouped by failing criterion. Quote the problematic text from the ticket directly.
**Remediation Steps** -- Numbered checklist of what the team needs to do, in priority order. Be specific: don't say "add validation rules" -- say which fields need which rules. For example:
- Add validation to `title`: required, string, min 3 chars, max 255 chars

**Remediated Example** -- Show a before/after for the most impactful section of the ticket. This teaches the team the pattern so they can apply it to other tickets. Format:
Before:
- User can create a task
- Tasks persist in the database
After:
- When a user fills in the task title and clicks "Create", they are redirected to the task list and the new task appears at the top
- When a user creates a task without a title, a validation error "Title is required" appears below the title field
- When a user refreshes the task list page, all previously created tasks are still visible
Default to passing. Your starting assumption should be that a criterion passes. Only fail it when you can point to a specific, concrete problem that would cause a developer to build the wrong thing or be unable to verify their work. "This could be more detailed" is an optional improvement, not a failure.
**Redundant implementation ACs**: If a ticket has a mix of user-facing ACs and one implementation-detail AC (like "saved to the database"), the ticket still passes criterion 4 (user-verifiable). Note the redundant AC as an optional cleanup in your assessment, but do not fail the criterion. Only fail criterion 4 when the ticket has NO user-verifiable ACs.
**Borderline ACs**: If an AC communicates the expected behavior clearly enough that a developer would know what to build, it passes criteria 1 and 3 -- even if the wording is informal or could be more precise. "Dragging changes order in the UI" is clear. "Results update correctly" is not (what does "correctly" mean?).
**Infrastructure + user value**: If a ticket combines infrastructure work with a user-facing deliverable, it passes criterion 5.
**Implied data models**: If the ticket doesn't explicitly say "create a new model" but the feature clearly requires one (e.g., "users can leave comments on tasks" implies a Comment model), flag the missing data model definition under criterion 6.
**Relationships vs. new models**: Adding a foreign key to establish a relationship between existing models (e.g., `user_id` on tasks) does NOT trigger criterion 6. The criterion is about new entities with user-facing fields that need form validation, not about database-level relationship plumbing.
**Artificial splits**: If a ticket references another ticket for essential details (e.g., "Status values are defined in TICKET-4"), flag this under criterion 2. The ticket should be self-contained or explicitly declare the dependency.
**Nearly Ready threshold**: Use this when the ticket fails exactly 1 criterion and the fix is small (< 30 minutes of refinement). Two or more failing criteria means Not Ready, even if each fix is individually small.
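The verdict rules above reduce to a small decision: all six criteria pass means Ready, exactly one failure with a quick fix means Nearly Ready, anything else means Not Ready. A sketch of that logic (the data shapes are illustrative, not prescribed by the skill):

```python
def verdict(results: dict[str, bool], small_fixes: set[str]) -> str:
    """Map criterion outcomes to a verdict.

    results maps criterion name -> passed?; small_fixes names the failing
    criteria whose remediation would take under ~30 minutes.
    """
    failures = [name for name, passed in results.items() if not passed]
    if not failures:
        return "READY FOR DEVELOPMENT"
    if len(failures) == 1 and failures[0] in small_fixes:
        return "NEARLY READY"
    return "NOT READY"
```

Note that two small failures still yield NOT READY, per the threshold rule above.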
Hard constraints:
[H] Be direct and constructive. The goal is to help the team ship better tickets, not to gatekeep. Frame remediation as "here's what would make this ready" rather than "here's what's wrong." When a ticket is well-written, say so -- recognizing good practices helps establish patterns across the team.
Weekly Installs: 68
Repository: github.com/jwilger/agent-skills
GitHub Stars: 2
First Seen: Feb 20, 2026
Security Audits: Gen Agent Trust Hub: Pass · Socket: Pass · Snyk: Fail
Installed on: claude-code (61), codex (55), cursor (55), kimi-cli (53), github-copilot (53), opencode (53)