idea-tournament by evoscientist/evoskills
npx skills add https://github.com/evoscientist/evoskills --skill idea-tournament
A structured framework for generating diverse research ideas through tree-based expansion, then selecting the strongest candidate via Elo-rated pairwise tournaments across four quality dimensions.
Use when you have a research direction from research-ideation and need concrete, ranked ideas.

The gap between "I have a research direction" and "I have a concrete proposal" is where most researchers stall. They either commit to their first idea (missing better alternatives) or endlessly brainstorm without converging (analysis paralysis).
The tournament solves both problems. Phase 1 forces breadth — you generate up to N_I=21 candidates (the paper's maximum) by systematically varying technique, domain, and formulation. Phase 2 forces convergence — pairwise Elo comparisons identify the strongest idea without requiring you to hold all candidates in your head simultaneously.
Before starting:
Read M_I at /memory/ideation-memory.md.

Expand a seed idea into a tree of candidates by varying one axis per level. The tree structure ensures diversity — each branch explores a fundamentally different variation rather than minor tweaks of the same concept.
| Level | Axis | What Varies | Example |
|---|---|---|---|
| 0 | Seed | Starting research direction | "Efficient LLM inference" |
| 1 | Technique | The core technical approach | Pruning, quantization, distillation |
| 2 | Domain | The application context | Edge devices, multi-modal, long-context |
| 3 | Formulation | The problem framing | Latency-constrained, memory-constrained, accuracy-preserving |
Level 0 — Seed (1 node): Start with the research direction from research-ideation. This is your root node.
Level 1 — Technique variants (3 nodes): Generate 3 fundamentally different technical approaches to the seed direction. These should be distinct paradigms, not variations of the same technique. Reflect carefully to verify each is genuinely different.
Level 2 — Domain adaptations (6-9 nodes): For each Level 1 node, generate 2-3 domain-specific adaptations. How does this technique apply differently in different contexts? What domain-specific constraints create new challenges?
Level 3 — Formulation variants (up to N_I=21 total leaves): For each Level 2 node, refine into 1-3 specific problem formulations. A formulation pins down the exact problem statement — the inputs, outputs, constraints, and evaluation criteria. The paper sets N_I=21 as the maximum number of candidate ideas. If the tree produces fewer than 15 leaves, expand Level 2 or Level 3 further. If more than 21, prune to stay within the N_I limit.
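The level structure above can be sketched as a nested expansion. This is illustrative only: the `Node` class, the `leaves` helper, and the sample axis values are assumptions, not part of the skill; the point is that 3 techniques × 3 domains × 2 formulations lands inside the 15-to-21 leaf range.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    level: int              # 0=seed, 1=technique, 2=domain, 3=formulation
    children: list = field(default_factory=list)

def leaves(node):
    """Collect the Level-3 leaf formulations of the tree."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]

# Illustrative expansion: 1 seed -> 3 techniques -> 3 domains -> 2 formulations.
seed = Node("Efficient LLM inference", 0)
for tech in ["pruning", "quantization", "distillation"]:            # Level 1
    t = Node(tech, 1)
    seed.children.append(t)
    for domain in ["edge devices", "multi-modal", "long-context"]:  # Level 2
        d = Node(f"{tech} / {domain}", 2)
        t.children.append(d)
        for form in ["latency-constrained", "memory-constrained"]:  # Level 3
            d.children.append(Node(f"{tech} / {domain} / {form}", 3))

candidates = leaves(seed)
assert 15 <= len(candidates) <= 21  # expand or prune to stay within N_I=21
```

With these branching factors the tree yields 18 leaves; adjusting the Level 2 or Level 3 fan-out is how you hit the 15-21 window.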
For each new node:
After expanding each level, prune clearly infeasible branches. A branch is "clearly infeasible" if:
- it appears in evo-memory's unsuccessful directions as a fundamental failure (not an implementation failure)

Important: Pruning removes only the obviously unworkable. Do NOT prune ideas that are risky, unconventional, or outside your current expertise — these are exactly the ideas tournaments are designed to evaluate fairly.
Save the complete tree to /idea-tree.md.
See references/tree-search-protocol.md for detailed expansion rules and diversity metrics.
Rank all leaf candidates through pairwise comparisons on four quality dimensions. Swiss-system pairing keeps the number of comparisons manageable while still producing reliable rankings.
| Dimension | Weight | What It Measures |
|---|---|---|
| Novelty | 25% | How different is this from existing published work? |
| Feasibility | 25% | Can this be implemented and validated within reasonable time and resources? |
| Relevance | 25% | Does this address an important, open problem in the field? |
| Clarity | 25% | Is the idea well-defined enough to start working on immediately? |
All dimensions are weighted equally. Researchers tend to overweight novelty and underweight feasibility — equal weights correct this bias.
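Under the equal-weight rubric, a match verdict could be computed like this. A sketch only: the 1-10 scale, the draw margin, and the function names are assumptions; the skill's actual rubric is in elo-ranking-guide.md.

```python
WEIGHTS = {"novelty": 0.25, "feasibility": 0.25, "relevance": 0.25, "clarity": 0.25}

def weighted_score(scores):
    """Combine per-dimension rubric scores (assumed 1-10) with equal weights."""
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)

def match_outcome(scores_a, scores_b, margin=0.25):
    """Return S for candidate A: 1.0 = win, 0.5 = draw, 0.0 = loss."""
    diff = weighted_score(scores_a) - weighted_score(scores_b)
    if diff > margin:
        return 1.0
    if diff < -margin:
        return 0.0
    return 0.5

a = {"novelty": 8, "feasibility": 5, "relevance": 7, "clarity": 6}   # score 6.5
b = {"novelty": 6, "feasibility": 7, "relevance": 7, "clarity": 7}   # score 6.75
# match_outcome(a, b) -> 0.5 (within the draw margin)
```

Note how the novelty-heavy candidate does not automatically win: equal weights let feasibility and clarity pull the verdict back, which is the bias correction described above.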
Starting Elo: 1500 for all candidates.
K-factor: 32 (standard for new players; large enough that a few matches significantly move ratings).
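These numbers plug into the standard Elo update, sketched here (helper names are illustrative):

```python
def expected(r_a, r_b):
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(r_a, r_b, s_a, k=32):
    """Return updated ratings after one match; s_a is 1, 0.5, or 0 for A."""
    e_a = expected(r_a, r_b)
    return r_a + k * (s_a - e_a), r_b + k * ((1 - s_a) - (1 - e_a))

# Both candidates start at 1500, so a first-round win moves the winner up k/2 = 16.
ra, rb = update(1500, 1500, s_a=1.0)
# ra == 1516.0, rb == 1484.0
```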
Swiss-system pairing (4-5 rounds):
Per-match process:
Save rankings to /idea-rankings.md.
See references/elo-ranking-guide.md for the detailed rubric and convergence criteria.
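One minimal sketch of a Swiss round, assuming the common rule of pairing adjacent candidates by current rating while avoiding rematches (the skill's exact pairing rules live in references/elo-ranking-guide.md):

```python
def swiss_round(ratings, played):
    """Pair adjacent candidates by rating, skipping repeat match-ups."""
    order = sorted(ratings, key=ratings.get, reverse=True)
    pairs, used = [], set()
    for a in order:
        if a in used:
            continue
        for b in order:
            if b is not a and b not in used and frozenset((a, b)) not in played:
                pairs.append((a, b))
                used.update((a, b))
                played.add(frozenset((a, b)))
                break
    return pairs

ratings = {"idea-1": 1532, "idea-2": 1516, "idea-3": 1500, "idea-4": 1484}
pairs = swiss_round(ratings, played=set())
# pairs == [("idea-1", "idea-2"), ("idea-3", "idea-4")]
```

Because similarly-rated candidates meet each round, 4-5 rounds are usually enough to separate the top of the field without playing all-pairs.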
Synthesize the top-3 ranked ideas into a "promising directions" summary. This serves two purposes: it preserves optionality (the best idea may combine elements from multiple candidates), and it feeds into evo-memory for future cycles.
For each of the top-3 ideas:
- Check evo-memory — has this direction been explored before? What was learned?

Then synthesize across the top-3:
Save to /direction-summary.md.
After completion, trigger evo-memory IDE (Idea Direction Evolution) to update Ideation Memory with the promising directions identified.
Extend the tournament winner (rank #1) into a full research proposal with enough detail to begin implementation.
The paper defines proposal P as containing 5 sections: background, related work, method, experiment plan, and expected results. We extend this with a 6th practical section (risks and mitigations).
1. Background: Define the exact problem — inputs, outputs, constraints, and why existing solutions are insufficient. Be specific: "LLM inference on edge devices with <2GB memory while maintaining >90% of full-model accuracy" is a background statement; "make LLMs faster" is not. Include context and motivation.
2. Related Work: Position the idea within the existing literature. What has been tried? What are the gaps? This should draw on the literature L retrieved during Phase 1 tree generation.
3. Proposed Method: Describe the technical approach at a level of detail sufficient for implementation. Include the key insight that differentiates this from prior work. State assumptions explicitly. List 3 testable contributions.
4. Experiment Plan: Datasets, baselines, metrics, and ablation design. This should align with what experiment-pipeline Stage 4 will need. Include both quantitative metrics and qualitative evaluation where appropriate.
5. Expected Results: Quantitative targets (e.g., "15-20% latency reduction with <2% accuracy loss") and qualitative expectations. Being specific about expected results forces you to think about whether the idea is realistic.
6. Risks and Mitigations (practical extension): Technical risks that could prevent success, and fallback plans for each. A proposal without risks is either dishonest or insufficiently analyzed. This section is not in the paper but is valuable for practical research planning.
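Before saving, a quick structural check can catch a missing section. The heading strings below are assumptions based on the 5+1 list above, not a defined API of the skill:

```python
REQUIRED_SECTIONS = [
    "Background", "Related Work", "Proposed Method",
    "Experiment Plan", "Expected Results", "Risks and Mitigations",
]

def missing_sections(proposal_text):
    """Return the required section names absent from a proposal draft."""
    return [s for s in REQUIRED_SECTIONS if s not in proposal_text]

draft = "## Background\n...\n## Related Work\n...\n## Proposed Method\n..."
# missing_sections(draft) -> ["Experiment Plan", "Expected Results", "Risks and Mitigations"]
```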
Save to /research-proposal.md.
See references/proposal-extension.md for detailed guidance on each section.
Prioritize these rules during idea generation and ranking:
Quantity before quality: Generate many candidates before evaluating any. Premature filtering kills diversity. You can't know which idea is strongest until you've seen the alternatives — and the best ideas often emerge from unexpected branches of the tree.
Vary one axis per level: Changing multiple axes simultaneously produces ideas that are different but not meaningfully diverse. Each level of the tree should explore ONE dimension of variation, so you understand exactly what makes each branch unique.
Feasibility is not optional: Brilliant but infeasible ideas waste entire research cycles. A novel idea that can't be validated within your constraints is not a contribution — it's a thought experiment. Weight feasibility equally with novelty.
The tournament finds surprises: Structured pairwise comparison often reveals that your initial favorite isn't actually the strongest idea. Trust the rankings over your gut feeling. If the results surprise you, that means the tournament is working — it's surfacing information you wouldn't have found through intuition alone.
Pruning is not selecting: Prune only clearly infeasible branches. The tournament handles quality ranking. If you aggressively prune before the tournament, you're substituting your initial intuition for systematic comparison — exactly the bias the tournament is designed to correct.
Top-3, not top-1: Summarizing the top 3 directions (not just the winner) preserves optionality. The best final approach may combine elements from multiple top candidates. Committing to exactly one idea too early discards valuable signal.
When the tournament is complete and the proposal is written, pass these artifacts to paper-planning:
| Artifact | Source Phase | Used By |
|---|---|---|
| Research proposal (5+1 sections) | Phase 4 | Story design, experiment planning |
| Idea tree (full structure) | Phase 1 | Related work positioning |
| Elo rankings with scores | Phase 2 | Justification for chosen direction |
| Direction summary (top-3) | Phase 3 | Fallback directions if primary fails |
| Tournament scorecards | Phase 2 | Understanding idea strengths/weaknesses |
Also pass results to evo-memory for evolution updates:
Refer to the evo-memory skill to read Ideation Memory: → Read M_I at /memory/ideation-memory.md
Refer to the evo-memory skill and trigger IDE: → Run IDE protocol with /direction-summary.md
Refer to the paper-planning skill: → Pass /research-proposal.md
| Topic | Reference File | When to Use |
|---|---|---|
| Tree expansion rules and diversity | tree-search-protocol.md | Generating diverse idea candidates |
| Elo formula, rubric, and pairing | elo-ranking-guide.md | Running the tournament |
| Proposal section guidance | proposal-extension.md | Writing the research proposal |
| Idea candidate template | idea-candidate-template.md | Describing individual ideas |
| Ranking scorecard template | ranking-scorecard-template.md | Recording pairwise comparisons |
| Direction summary template | direction-summary-template.md | Synthesizing top-3 directions |

Weekly Installs: 69
Repository: https://github.com/evoscientist/evoskills
GitHub Stars: 105
First Seen: 10 days ago
Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Warn)
Installed on: gemini-cli (69), codex (69), kimi-cli (69), github-copilot (69), cursor (69), opencode (69)