Mutation Testing Guide: A 100% Kill Rate to Ensure Code Quality, Supporting Rust/TypeScript/Python/Elixir

mutation-testing by jwilger/agent-skills

83 weekly installs

2 GitHub Stars

Install command:

npx skills add https://github.com/jwilger/agent-skills --skill mutation-testing

Automated code quality testing


Mutation Testing

Value: Feedback -- mutation testing closes the verification loop by proving that tests actually detect the bugs they claim to prevent. Without it, passing tests may provide false confidence.

Purpose

Teaches the agent to run mutation testing as a quality gate before PR creation. Mutation testing makes small changes (mutations) to production code and checks whether tests catch them. Surviving mutants reveal gaps where bugs could hide undetected. The required mutation kill rate is 100%.
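To make "small changes to production code" concrete, here is a toy mutation run in plain Python using only the standard library. It is an illustration and does not describe how cargo-mutants, Stryker, mutmut, or Muzak work internally. It rewrites one operator at a time with `ast`, re-executes the mutated module, and runs a test function against each mutant. The functions `add` and `is_positive` and both test suites are hypothetical.

```python
from __future__ import annotations

import ast

# Hypothetical production code to mutate.
SOURCE = """
def add(a, b):
    return a + b

def is_positive(x):
    return x > 0
"""

# Operator swaps to try; every matching operator in SOURCE yields one mutant.
SWAPS = {ast.Add: ast.Sub, ast.Gt: ast.GtE}


class MutateNth(ast.NodeTransformer):
    """Swap only the n-th swappable operator and leave every other one intact."""

    def __init__(self, target: int):
        self.target = target
        self.seen = 0
        self.description = ""

    def _maybe_swap(self, op: ast.AST) -> ast.AST:
        replacement = SWAPS.get(type(op))
        if replacement is None:
            return op
        index, self.seen = self.seen, self.seen + 1
        if index != self.target:
            return op
        self.description = f"{type(op).__name__} -> {replacement.__name__}"
        return replacement()

    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
        self.generic_visit(node)
        node.op = self._maybe_swap(node.op)
        return node

    def visit_Compare(self, node: ast.Compare) -> ast.AST:
        self.generic_visit(node)
        node.ops = [self._maybe_swap(op) for op in node.ops]
        return node


def count_sites(tree: ast.AST) -> int:
    """Count the operators that SWAPS knows how to mutate."""
    sites = 0
    for node in ast.walk(tree):
        if isinstance(node, ast.BinOp) and type(node.op) in SWAPS:
            sites += 1
        elif isinstance(node, ast.Compare):
            sites += sum(type(op) in SWAPS for op in node.ops)
    return sites


def mutation_run(tests) -> tuple[list[str], list[str]]:
    """Run `tests` against every mutant and return (killed, survived) descriptions."""
    killed, survived = [], []
    for target in range(count_sites(ast.parse(SOURCE))):
        mutator = MutateNth(target)
        tree = ast.fix_missing_locations(mutator.visit(ast.parse(SOURCE)))
        namespace: dict = {}
        exec(compile(tree, "<mutant>", "exec"), namespace)
        try:
            tests(namespace)
        except AssertionError:
            killed.append(mutator.description)  # a test noticed the change
        else:
            survived.append(mutator.description)  # the change went unnoticed
    return killed, survived


def weak_tests(ns: dict) -> None:
    assert ns["add"](0, 0) == 0  # 0 + 0 == 0 - 0, so + and - look identical
    assert ns["is_positive"](5)  # never probes the boundary at 0


def strong_tests(ns: dict) -> None:
    assert ns["add"](2, 3) == 5
    assert ns["is_positive"](1)
    assert not ns["is_positive"](0)  # exact boundary: kills > -> >=


print(mutation_run(weak_tests))    # ([], ['Add -> Sub', 'Gt -> GtE'])
print(mutation_run(strong_tests))  # (['Add -> Sub', 'Gt -> GtE'], [])
```

Both suites pass against the unmutated code, but only the second one kills every mutant. That gap is what a surviving mutant exposes.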

Practices

Detect and Run the Right Tool

Detect the project type and run the appropriate mutation testing tool.

Check for project markers:
- Cargo.toml -> Rust -> cargo mutants
- package.json -> TypeScript/JavaScript -> npx stryker run
- pyproject.toml or setup.py -> Python -> mutmut run
- mix.exs -> Elixir -> mix muzak

Verify the tool is installed. If not, provide installation instructions:
- Rust: cargo install cargo-mutants
- TypeScript: npm install --save-dev @stryker-mutator/core
- Python: pip install mutmut
- Elixir: add {:muzak, "~> 1.0", only: :test} to deps

Run mutation testing against the relevant scope. Prefer scoping to changed files or packages rather than the entire codebase when possible:

```
# Rust (scoped to package)
cargo mutants --package <package> --jobs 4

# TypeScript
npx stryker run

# Python (scoped to source)
mutmut run --paths-to-mutate=src/
mutmut results

# Elixir
mix muzak
```
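The marker checklist and install hints above can be sketched as a small Python helper. This is an illustration under assumptions, not part of the skill. The function names are hypothetical, and the first matching marker wins, so a repository with both `Cargo.toml` and `package.json` is treated as Rust. The "is it installed?" checks are heuristic guesses: a `cargo-mutants` binary on `PATH`, a local `node_modules/.bin/stryker`, a `mutmut` executable, and `:muzak` mentioned in `mix.exs`.

```python
from __future__ import annotations

import shutil
from pathlib import Path

# (marker file, language, mutation command), in the order the checklist lists them.
MARKERS = [
    ("Cargo.toml", "Rust", ["cargo", "mutants"]),
    ("package.json", "TypeScript/JavaScript", ["npx", "stryker", "run"]),
    ("pyproject.toml", "Python", ["mutmut", "run"]),
    ("setup.py", "Python", ["mutmut", "run"]),
    ("mix.exs", "Elixir", ["mix", "muzak"]),
]

# Installation instructions copied from the checklist.
INSTALL_HINTS = {
    "Rust": "cargo install cargo-mutants",
    "TypeScript/JavaScript": "npm install --save-dev @stryker-mutator/core",
    "Python": "pip install mutmut",
    "Elixir": 'add {:muzak, "~> 1.0", only: :test} to deps',
}


def detect(project_dir: str | Path) -> tuple[str, list[str]] | None:
    """Return (language, command) for the first marker found, or None."""
    root = Path(project_dir)
    for marker, language, command in MARKERS:
        if (root / marker).is_file():
            return language, list(command)
    return None


def is_installed(project_dir: str | Path, language: str) -> bool:
    """Heuristic check that the mutation tool is available (an assumption)."""
    root = Path(project_dir)
    if language == "Rust":
        return shutil.which("cargo-mutants") is not None
    if language == "TypeScript/JavaScript":
        return (root / "node_modules" / ".bin" / "stryker").exists()
    if language == "Python":
        return shutil.which("mutmut") is not None
    if language == "Elixir":
        return ":muzak" in (root / "mix.exs").read_text(encoding="utf-8")
    return False


def plan(project_dir: str | Path) -> str:
    """Describe the next step for this project: install the tool or run it."""
    found = detect(project_dir)
    if found is None:
        return "No supported project marker found."
    language, command = found
    if not is_installed(project_dir, language):
        return f"{language}: install first with `{INSTALL_HINTS[language]}`"
    return f"{language}: run `{' '.join(command)}`"
```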

Parse and Report Results

Extract from the mutation tool output:

- Total mutants generated
- Mutants killed (tests detected the change)
- Mutants survived (tests did NOT detect the change)
- Timed-out mutants
- Mutation score percentage
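Once those counts have been extracted from the tool's report, computing the score is simple arithmetic. The sketch below assumes the counts are already available, since reading each tool's report format is out of scope. It also assumes that mutants which do not compile are excluded from the total. Timeouts are folded into the detected count, following the rule in the Do list to treat timeouts as killed. `MutationSummary` is a hypothetical name.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MutationSummary:
    killed: int        # tests failed on the mutant
    survived: int      # tests still passed on the mutant
    timed_out: int = 0

    @property
    def total(self) -> int:
        return self.killed + self.survived + self.timed_out

    @property
    def detected(self) -> int:
        # Timeouts count as killed: the mutation broke something.
        return self.killed + self.timed_out

    @property
    def score(self) -> float:
        # Percentage of mutants detected, rounded to one decimal place.
        # Assumption: with no mutants at all, nothing is left alive.
        if self.total == 0:
            return 100.0
        return round(100 * self.detected / self.total, 1)

    def report(self) -> str:
        return (
            f"total={self.total} killed={self.killed} survived={self.survived} "
            f"timed_out={self.timed_out} score={self.score}%"
        )


print(MutationSummary(killed=40, survived=2).report())
# total=42 killed=40 survived=2 timed_out=0 score=95.2%
```

A rounded score can display 100.0 while a survivor remains (2,499 of 2,500 gives 99.96, which rounds to 100.0). For that reason, a gate is safer when it checks `survived == 0` rather than the rounded percentage.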

Analyze Surviving Mutants

For each surviving mutant, report three things:

- Location: File and line number
- Mutation: What was changed (e.g., "replaced + with -")
- Meaning: What class of bug this lets through

Common mutation types and what survival indicates:

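- Arithmetic (+ -> -, * -> /): Calculations not verified
- Comparison (> -> >=, == -> !=): Boundary conditions untested
- Boolean (&& -> ||, ! removed): Logic branches not covered
- Return value (true -> false, Ok -> Err): Return paths not checked
- Statement removal (line deleted): Side effects not asserted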
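The three-part report and the mutation types above can be combined into a small classifier. The sketch assumes survivor lines shaped like the examples in this document, such as `src/money.rs:45 -- replaced + with - in Money::add()`, optionally with backticks around the tokens. Real tools word their output differently, so the regular expression and the `describe` and `classify` names are hypothetical. A deleted statement is modeled as `classify(None)`.

```python
from __future__ import annotations

import re
from typing import Optional

# (mutation type, tokens it covers, what a survivor of that type suggests)
CATEGORIES = [
    ("arithmetic", {"+", "-", "*", "/"}, "Calculations not verified"),
    ("comparison", {">", ">=", "<", "<=", "==", "!="}, "Boundary conditions untested"),
    ("boolean", {"&&", "||", "!"}, "Logic branches not covered"),
    ("return value", {"true", "false", "Ok", "Err"}, "Return paths not checked"),
]

# Hypothetical line format, e.g. "src/money.rs:45 -- replaced `+` with `-` in Money::add()"
SURVIVOR_LINE = re.compile(
    r"^(?P<file>[^:\s]+):(?P<line>\d+)\s*(?:--|:)\s*"
    r"replaced `?(?P<old>[^`\s]+)`? with `?(?P<new>[^`\s]+)`?"
)


def classify(original: Optional[str]) -> tuple[str, str]:
    """Map the mutated token to (type, meaning); None means a whole statement was deleted."""
    if original is None:
        return "statement removal", "Side effects not asserted"
    for name, tokens, meaning in CATEGORIES:
        if original in tokens:
            return name, meaning
    return "other", "No standard interpretation; inspect by hand"


def describe(survivor_line: str) -> dict:
    """Turn one survivor line into Location / Mutation / Meaning."""
    match = SURVIVOR_LINE.match(survivor_line)
    if match is None:
        raise ValueError(f"unrecognized survivor line: {survivor_line!r}")
    kind, meaning = classify(match["old"])
    return {
        "location": f"{match['file']}:{match['line']}",
        "mutation": f"replaced {match['old']} with {match['new']}",
        "type": kind,
        "meaning": meaning,
    }
```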

Scenario Coverage Check

Before recommending any test for a surviving mutant, check scenario coverage:

For each surviving mutant:

Step 1 — Scenario check: Does any acceptance scenario or domain scenario (from the slice's acceptance_scenarios or domain_scenarios arrays, or the plan's Confirmed Scenarios sections) require the behavior being mutated?

YES → A scenario exists but its test is missing. Proceed to Recommend Missing Tests below.
NO → Flag for human decision:
```
Surviving mutant at [file]:[line] has no GWT scenario requiring this behavior.
```
Options: (a) Delete the code: this behavior may not be needed. (b) Add the missing acceptance or domain scenario: the spec is incomplete. In both cases the 100% kill rate still applies; the flag only determines how the survivor is resolved.

Do NOT proceed to test recommendations for uncovered mutants. Writing a test without a scenario games the metric without testing real behavior.
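The Step 1 decision can be sketched as follows. In practice, deciding which scenario covers which code is a judgment call. Purely as an assumption, the sketch models each scenario as the set of function names it requires. `Scenario`, `Survivor`, and `triage` are hypothetical names, and the flag text mirrors the message above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Scenario:
    name: str
    # Assumption: the code units whose behavior this GWT scenario requires.
    requires: frozenset[str] = field(default_factory=frozenset)


@dataclass(frozen=True)
class Survivor:
    file: str
    line: int
    function: str
    description: str


def triage(survivor: Survivor, scenarios: list[Scenario]) -> tuple[str, str]:
    """Route a survivor either to a test recommendation or to a human decision."""
    covering = [s.name for s in scenarios if survivor.function in s.requires]
    if covering:
        # A scenario exists but its test is missing: recommend a targeted test.
        return "recommend_test", ", ".join(covering)
    # No scenario requires this behavior: flag it instead of writing a metric-gaming test.
    return (
        "human_decision",
        f"Surviving mutant at {survivor.file}:{survivor.line} "
        "has no GWT scenario requiring this behavior.",
    )
```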

Recommend Missing Tests

For each surviving mutant with a covering scenario, suggest a specific test:

```
Surviving: src/money.rs:45 -- replaced `+` with `-` in Money::add()
Recommend: Test that adding Money(50) + Money(30) equals Money(80),
           not Money(20). The current tests do not assert the sum value.

Surviving: src/account.rs:78 -- replaced `>` with `>=` in check_balance()
Recommend: Test the exact boundary -- check_balance with exactly zero
           balance. Current tests only check positive and negative.
```

Structured Output

After mutation testing completes, produce a MUTATION_RESULT evidence packet:

```json
{
  "tool": "cargo-mutants",
  "scope": ["src/money.rs", "src/account.rs"],
  "total_mutants": 42,
  "killed": 40,
  "survived": 2,
  "score": 95.2,
  "survivors": [
    {"file": "src/money.rs", "line": 45, "mutation_type": "arithmetic", "description": "replaced + with -"},
    {"file": "src/account.rs", "line": 78, "mutation_type": "comparison", "description": "replaced > with >="}
  ],
  "verdict": "FAIL"
}
```

- Verdict: PASS if score is 100% on changed files, FAIL otherwise
- When running in pipeline mode, store to .factory/audit-trail/slices/<slice-id>/mutation.json
- When running standalone, the output is informational only -- display it and proceed to the quality gate
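Here is one way to assemble and store the packet, under stated assumptions. `build_packet` and `store_packet` are hypothetical helpers. Timeouts are folded into `killed` because they are treated as killed and the packet format shown has no separate timeout field. The verdict comes from the survivor count rather than the rounded score, so a score displayed as 100.0 can never hide a survivor. Writing with `write_text` replaces any earlier `mutation.json` for the slice, which matches the overwrite rule for repeated pipeline runs.

```python
from __future__ import annotations

import json
from pathlib import Path


def build_packet(tool: str, scope: list[str], killed: int,
                 survivors: list[dict], timed_out: int = 0) -> dict:
    """Assemble a MUTATION_RESULT packet in the shape shown above."""
    detected = killed + timed_out  # timeouts count as killed
    total = detected + len(survivors)
    score = round(100 * detected / total, 1) if total else 100.0
    return {
        "tool": tool,
        "scope": list(scope),
        "total_mutants": total,
        "killed": detected,
        "survived": len(survivors),
        "score": score,
        "survivors": list(survivors),
        "verdict": "PASS" if not survivors else "FAIL",
    }


def store_packet(packet: dict, slice_id: str, root: str | Path = ".") -> Path:
    """Pipeline mode: write (or overwrite) the slice's mutation.json."""
    path = Path(root) / ".factory" / "audit-trail" / "slices" / slice_id / "mutation.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(packet, indent=2) + "\n", encoding="utf-8")
    return path


survivors = [
    {"file": "src/money.rs", "line": 45, "mutation_type": "arithmetic", "description": "replaced + with -"},
    {"file": "src/account.rs", "line": 78, "mutation_type": "comparison", "description": "replaced > with >="},
]
packet = build_packet("cargo-mutants", ["src/money.rs", "src/account.rs"], killed=40, survivors=survivors)
print(packet["score"], packet["verdict"])  # 95.2 FAIL
```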

Enforce the Quality Gate

The required mutation kill rate is 100%. All mutants must be killed.

If score is 100%: Report success, proceed to PR creation
If score is below 100%: List all survivors with recommendations. Block PR creation with a clear warning. The user may override, but the default is to fix first.
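The gate and its override rule can be sketched as a single function. Two hypothetical assumptions apply: the packet has the shape shown above, and an override is a mapping from `file:line` to the documented reason for that survivor. The function can only check that every survivor has a non-empty reason. Whether a reason is acceptable (compare the examples in the Constraints section, "This mutant tests logging output" versus "I want to ship") remains a human judgment.

```python
from __future__ import annotations

from typing import Optional


def enforce_gate(packet: dict, override: Optional[dict[str, str]] = None) -> tuple[bool, str]:
    """Return (may_create_pr, message) for a MUTATION_RESULT packet."""
    survivors = packet["survivors"]
    if not survivors:
        return True, "All mutants killed (100%). Proceeding to PR creation."

    keys = [f"{s['file']}:{s['line']}" for s in survivors]
    if override is None:
        # Default: block, and list every survivor so it can be fixed first.
        listing = "\n".join(f"  {k} {s['description']}" for k, s in zip(keys, survivors))
        return False, f"WARNING: PR creation blocked, {len(survivors)} surviving mutant(s):\n{listing}"

    # Override path: every survivor needs its own documented reason.
    undocumented = [k for k in keys if not override.get(k, "").strip()]
    if undocumented:
        return False, "Override rejected; no documented reason for: " + ", ".join(undocumented)
    return True, f"Proceeding below 100% under a documented override ({len(keys)} survivor(s))."
```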

Do:

- Scope mutation runs to changed code when possible
- Report survivors with actionable fix recommendations
- Re-run after fixes to confirm all mutants are now killed
- Treat timeouts as killed (the mutation broke something)

Do not:

- Skip mutation testing before PR creation
- Accept surviving mutants without reporting them
- Run mutations on the entire codebase when only a module changed
- Recommend tests for data validation that belongs in domain types
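Scoping to changed code can be sketched by turning a list of changed files into a narrower command. This rests on several assumptions:

- The changed-file list comes from `git diff --name-only origin/main...HEAD`; the base branch name is hypothetical.
- Filtering uses cargo-mutants' repeatable `--file` option and Stryker's `--mutate` option, which takes comma-separated globs.
- `--paths-to-mutate` follows the mutmut command line shown earlier and may not apply to newer mutmut releases.
- No file-level filter is assumed for Muzak, so Elixir falls back to the unscoped command.

```python
from __future__ import annotations

import subprocess
from typing import Optional

SUFFIXES = {
    "Rust": (".rs",),
    "TypeScript/JavaScript": (".ts", ".tsx", ".js", ".jsx"),
    "Python": (".py",),
}


def changed_files(base: str = "origin/main") -> list[str]:
    """Files changed on this branch relative to `base` (hypothetical default)."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def scoped_command(language: str, files: list[str]) -> Optional[list[str]]:
    """Build a mutation command limited to changed source files, or None if none changed."""
    if language == "Elixir":
        return ["mix", "muzak"]  # no file filter assumed; runs unscoped
    sources = [f for f in files if f.endswith(SUFFIXES[language])]
    if not sources:
        return None  # no relevant source changed; nothing to mutate
    if language == "Rust":
        command = ["cargo", "mutants"]
        for path in sources:
            command += ["--file", path]
        return command
    if language == "TypeScript/JavaScript":
        return ["npx", "stryker", "run", "--mutate", ",".join(sources)]
    return ["mutmut", "run", "--paths-to-mutate=" + ",".join(sources)]
```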

Pipeline Mode

When invoked by the pipeline orchestrator:

- A FAIL verdict routes automatically back to the tdd skill with the survivor list attached. The pipeline handles this rework routing -- mutation-testing just reports results.
- Survivor details in the MUTATION_RESULT packet must be specific enough (file, line, mutation type, description) for the TDD pair to write targeted tests without re-running the mutation tool to understand what failed.
- The pipeline may invoke mutation-testing multiple times per slice; each run overwrites the previous mutation.json for that slice.
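Before a FAIL is handed back for rework, a pipeline step could check that the packet carries enough detail. A minimal sketch; `survivor_problems` is a hypothetical helper. It checks the four fields named above and confirms that the `survivors` list matches the `survived` count. A missing or unaccounted-for survivor would force the TDD pair to re-run the tool.

```python
from __future__ import annotations

REQUIRED_FIELDS = ("file", "line", "mutation_type", "description")


def survivor_problems(packet: dict) -> list[str]:
    """List reasons the packet is not specific enough for targeted rework."""
    problems = []
    survivors = packet.get("survivors", [])
    if len(survivors) != packet.get("survived"):
        problems.append(
            f"survived={packet.get('survived')} but len(survivors)={len(survivors)}"
        )
    for index, survivor in enumerate(survivors):
        missing = [k for k in REQUIRED_FIELDS if survivor.get(k) in (None, "")]
        if missing:
            problems.append(f"survivor {index} is missing: {', '.join(missing)}")
    return problems
```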

Enforcement Note

- Pipeline mode: Gating. The 100% kill rate is a gate -- failing it blocks merge.
- Standalone mode: Advisory. The agent reports results but cannot prevent the user from overriding.

Hard constraints:

100% kill rate: [H] in pipeline mode, [RP] in standalone (block PR, user can override with documented reason)

See CONSTRAINT-RESOLUTION.md in the template directory for override documentation requirements.

Constraints

- 100% kill rate: 100% means 100%. Not "close enough." Not "98% with justification." The only path below 100% is an explicit user override, which MUST include a documented reason explaining why each surviving mutant is acceptable. "I want to ship" is not a reason. "This mutant tests logging output which is not a business rule" is a reason.
- "No test without scenario": Writing a test solely to kill a mutant without a corresponding scenario games the metric. The test proves you can kill the mutant, not that the behavior matters. If no scenario covers the behavior, the correct response is to flag the gap for the team -- the missing scenario might reveal a missing requirement.

Verification

After completing mutation testing, verify:

- Mutation testing tool was run against the relevant scope
- All surviving mutants are listed with file, line, and mutation type
- Each survivor checked for scenario coverage before test recommendations
- Each survivor with a covering scenario has a specific test recommendation
- Each survivor without a covering scenario is flagged for human decision
- Mutation score is 100% (or user explicitly chose to override)
- If fixes were made, mutation testing was re-run to confirm

If any criterion is not met, revisit the relevant practice before proceeding.

Dependencies

This skill works standalone but is most valuable as a pre-PR quality gate. It integrates with:

- tdd: TDD produces the tests that mutation testing validates; surviving mutants indicate the TDD cycle missed a case
- code-review: Mutation results inform code review -- reviewers can check that new code has no surviving mutants

Missing a dependency? Install with:

npx skills add jwilger/agent-skills --skill tdd

Repository: jwilger/agent-skills

First Seen: Feb 12, 2026

Security Audits: Gen Agent Trust Hub: Pass; Socket: Pass; Snyk: Pass

Installed on:
- claude-code: 74
- cursor: 69
- codex: 69
- github-copilot: 67
- amp: 67
- kimi-cli: 67

