finding-duplicate-functions by obra/superpowers-lab
npx skills add https://github.com/obra/superpowers-lab --skill finding-duplicate-functions
LLM-generated codebases accumulate semantic duplicates: functions that serve the same purpose but were implemented independently. Classical copy-paste detectors (jscpd) find syntactic duplicates but miss "same intent, different implementation."
This skill uses a two-phase approach: classical extraction followed by LLM-powered intent clustering.
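To illustrate the gap this skill targets, consider the following pair (names and code are illustrative, not from any real codebase): a token-based detector like jscpd sees almost no overlap between them, yet both implement the same intent.

```typescript
// Implementation A: regex check for "does this look like an email address?"
function isValidEmail(s: string): boolean {
  return /^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(s);
}

// Implementation B: manual parsing of the same intent, written independently
// elsewhere in the codebase. Syntactically unrelated; semantically a duplicate.
function checkEmailAddress(input: string): boolean {
  const at = input.indexOf("@");
  if (at <= 0 || at !== input.lastIndexOf("@")) return false;
  const domain = input.slice(at + 1);
  return domain.includes(".") && !/\s/.test(input);
}

console.log(isValidEmail("a@b.co"), checkEmailAddress("a@b.co"));           // true true
console.log(isValidEmail("not-an-email"), checkEmailAddress("not-an-email")); // false false
```

Finding this pair requires comparing what the functions *mean*, which is why the detection stage uses an LLM rather than token matching.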
| Phase | Tool | Model | Output |
|---|---|---|---|
| 1. Extract | scripts/extract-functions.sh | - | catalog.json |
| 2. Categorize | scripts/categorize-prompt.md | haiku | categorized.json |
| 3. Split | scripts/prepare-category-analysis.sh | - | categories/*.json |
| 4. Detect | scripts/find-duplicates-prompt.md | opus | duplicates/*.json |
| 5. Report | scripts/generate-report.sh | - | report.md |
```dot
digraph duplicate_detection {
  rankdir=TB;
  node [shape=box];

  extract [label="1. Extract function catalog\n./scripts/extract-functions.sh"];
  categorize [label="2. Categorize by domain\n(haiku subagent)"];
  split [label="3. Split into categories\n./scripts/prepare-category-analysis.sh"];
  detect [label="4. Find duplicates per category\n(opus subagent per category)"];
  report [label="5. Generate report\n./scripts/generate-report.sh"];
  review [label="6. Human review & consolidate"];

  extract -> categorize -> split -> detect -> report -> review;
}
```
```sh
./scripts/extract-functions.sh src/ -o catalog.json
```

Options:

- `-o FILE`: Output file (default: stdout)
- `-c N`: Lines of context to capture (default: 15)
- `-t GLOB`: File types (default: `*.ts,*.tsx,*.js,*.jsx`)
- `--include-tests`: Include test files (excluded by default)

Test files (`*.test.*`, `*.spec.*`, `__tests__/**`) are excluded by default since test utilities are less likely to be consolidation candidates.
Dispatch a haiku subagent using the prompt in scripts/categorize-prompt.md.
Insert the contents of catalog.json where indicated in the prompt template. Save output as categorized.json.
```sh
./scripts/prepare-category-analysis.sh categorized.json ./categories
```

Creates one JSON file per category. Only categories with 3+ functions are worth analyzing.
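The "3+ functions" threshold can be sketched as a simple count-and-filter. The `categorized.json` shape below is an assumption for illustration only; the actual script defines its own format.

```typescript
// Hypothetical shape of one categorized entry; the real script's schema may differ.
type CategorizedFn = { name: string; file: string; category: string };

// Return the categories that have at least `min` functions and are
// therefore worth sending to the duplicate-detection stage.
function categoriesWorthAnalyzing(fns: CategorizedFn[], min = 3): string[] {
  const counts = new Map<string, number>();
  for (const fn of fns) {
    counts.set(fn.category, (counts.get(fn.category) ?? 0) + 1);
  }
  return [...counts.entries()]
    .filter(([, n]) => n >= min)
    .map(([category]) => category);
}

const sample: CategorizedFn[] = [
  { name: "slugify", file: "src/a.ts", category: "string-formatting" },
  { name: "toKebab", file: "src/b.ts", category: "string-formatting" },
  { name: "titleCase", file: "src/c.ts", category: "string-formatting" },
  { name: "parseDate", file: "src/d.ts", category: "date-handling" },
];
console.log(categoriesWorthAnalyzing(sample)); // → ["string-formatting"]
```

Categories with only one or two functions rarely contain a consolidation-worthy pair, so filtering them out saves subagent calls.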
For each category file in ./categories/, dispatch an opus subagent using the prompt in scripts/find-duplicates-prompt.md.
Save each output as ./duplicates/{category}.json.
```sh
./scripts/generate-report.sh ./duplicates ./duplicates-report.md
```

Produces a prioritized Markdown report grouped by confidence level.
Review the report, consolidating the HIGH-confidence duplicates first.
Focus extraction on these areas first; they accumulate duplicates fastest:
| Zone | Common Duplicates |
|---|---|
| utils/, helpers/, lib/ | General utilities reimplemented |
| Validation code | Same checks written multiple ways |
| Error formatting | Error-to-string conversions |
| Path manipulation | Joining, resolving, normalizing paths |
| String formatting | Case conversion, truncation, escaping |
| Date formatting | Same formats implemented repeatedly |
| API response shaping | Similar transformations for different endpoints |
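Error formatting, from the table above, is a typical hot zone. A pair like the following (names illustrative) is what a per-category opus pass surfaces:

```typescript
// Two error-to-string conversions doing the same job in different styles,
// the kind of near-duplicate that lands in an "error formatting" category.
function errorMessage(err: unknown): string {
  return err instanceof Error ? err.message : String(err);
}

function formatError(e: unknown): string {
  if (e instanceof Error) return e.message;
  if (typeof e === "string") return e;
  return JSON.stringify(e);
}

console.log(errorMessage(new Error("boom"))); // "boom"
console.log(formatError(new Error("boom")));  // "boom"
```

Note the two diverge on non-Error, non-string inputs (`String(e)` vs `JSON.stringify(e)`), which is exactly the subtlety a human reviewer must check before consolidating.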
- **Extracting too much:** Focus on exported functions and public methods. Internal helpers are less likely to be duplicated across files.
- **Skipping the categorization step:** Going straight to duplicate detection on the full catalog produces noise. Categories focus the comparison.
- **Using haiku for duplicate detection:** Haiku is cost-effective for categorization but misses subtle semantic duplicates. Use Opus for the actual duplicate analysis.
- **Consolidating without tests:** Before deleting duplicates, ensure the survivor has tests covering all use cases of the deleted functions.
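A minimal sketch of that last pitfall, with all names hypothetical: before deleting a duplicate, port its edge-case expectations into the survivor's tests so nothing silently regresses.

```typescript
// Hypothetical consolidation: truncate() survives, a duplicate shorten() is
// deleted. The survivor's tests must cover BOTH former implementations'
// behavior, including shorten()'s case of returning short strings unchanged.
function truncate(s: string, maxLen: number, suffix = "…"): string {
  if (s.length <= maxLen) return s;
  return s.slice(0, Math.max(0, maxLen - suffix.length)) + suffix;
}

// Tests preserved from both former call sites:
console.assert(truncate("hello world", 8) === "hello w…"); // former truncate() case
console.assert(truncate("short", 10) === "short");         // former shorten() case
```

Only once both assertions pass against the survivor is it safe to delete the duplicate and repoint its callers.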
- Weekly Installs: 199
- Repository: github.com/obra/superpowers-lab
- GitHub Stars: 238
- First Seen: Jan 21, 2026
- Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)
- Installed on: opencode (167), codex (157), gemini-cli (153), github-copilot (149), claude-code (131), cursor (130)