idea-tournament by evoscientist/evoskills
npx skills add https://github.com/evoscientist/evoskills --skill idea-tournament
A structured framework for generating diverse research ideas through tree-based expansion, then selecting the strongest candidate via Elo-rated pairwise tournaments across four quality dimensions.
Use when you have a research direction from research-ideation and need concrete, ranked ideas.

The gap between "I have a research direction" and "I have a concrete proposal" is where most researchers stall. They either commit to their first idea (missing better alternatives) or endlessly brainstorm without converging (analysis paralysis).
The tournament solves both problems. Phase 1 forces breadth — you generate up to N_I=21 candidates (the paper's maximum) by systematically varying technique, domain, and formulation. Phase 2 forces convergence — pairwise Elo comparisons identify the strongest idea without requiring you to hold all candidates in your head simultaneously.
Before starting:
Read M_I at /memory/ideation-memory.md.

Expand a seed idea into a tree of candidates by varying one axis per level. The tree structure ensures diversity — each branch explores a fundamentally different variation rather than minor tweaks of the same concept.
| Level | Axis | What Varies | Example |
|---|---|---|---|
| 0 | Seed | Starting research direction | "Efficient LLM inference" |
| 1 | Technique | The core technical approach | Pruning, quantization, distillation |
| 2 | Domain | The application context | Edge devices, multi-modal, long-context |
| 3 | Formulation | The problem framing | Latency-constrained, memory-constrained, accuracy-preserving |
Level 0 — Seed (1 node): Start with the research direction from research-ideation. This is your root node.
Level 1 — Technique variants (3 nodes): Generate 3 fundamentally different technical approaches to the seed direction. These should be distinct paradigms, not variations of the same technique. Reflect carefully to verify each is genuinely different.
Level 2 — Domain adaptations (6-9 nodes): For each Level 1 node, generate 2-3 domain-specific adaptations. How does this technique apply differently in different contexts? What domain-specific constraints create new challenges?
Level 3 — Formulation variants (up to N_I=21 total leaves): For each Level 2 node, refine into 1-3 specific problem formulations. A formulation pins down the exact problem statement — the inputs, outputs, constraints, and evaluation criteria. The paper sets N_I=21 as the maximum number of candidate ideas. If the tree produces fewer than 15 leaves, expand Level 2 or Level 3 further. If more than 21, prune to stay within the N_I limit.
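The level structure above can be sketched as a nested expansion. This is illustrative only: the `Node` class, the `leaves` helper, and the sample axis values are assumptions, not part of the skill; the point is that 3 techniques × 3 domains × 2 formulations lands inside the 15-to-21 leaf range.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    level: int              # 0=seed, 1=technique, 2=domain, 3=formulation
    children: list = field(default_factory=list)

def leaves(node):
    """Collect the Level-3 leaf formulations of the tree."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]

# Illustrative expansion: 1 seed -> 3 techniques -> 3 domains -> 2 formulations.
seed = Node("Efficient LLM inference", 0)
for tech in ["pruning", "quantization", "distillation"]:            # Level 1
    t = Node(tech, 1)
    seed.children.append(t)
    for domain in ["edge devices", "multi-modal", "long-context"]:  # Level 2
        d = Node(f"{tech} / {domain}", 2)
        t.children.append(d)
        for form in ["latency-constrained", "memory-constrained"]:  # Level 3
            d.children.append(Node(f"{tech} / {domain} / {form}", 3))

candidates = leaves(seed)
assert 15 <= len(candidates) <= 21  # expand or prune to stay within N_I=21
```

With these branching factors the tree yields 18 leaves; adjusting the Level 2 or Level 3 fan-out is how you hit the 15-21 window.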
For each new node:
After expanding each level, prune clearly infeasible branches. A branch is "clearly infeasible" if:
- it appears in evo-memory's unsuccessful directions as a fundamental failure (not an implementation failure)

Important: Pruning removes only the obviously unworkable. Do NOT prune ideas that are risky, unconventional, or outside your current expertise — these are exactly the ideas tournaments are designed to evaluate fairly.
Save the complete tree to /idea-tree.md.
See references/tree-search-protocol.md for detailed expansion rules and diversity metrics.
Rank all leaf candidates through pairwise comparisons on four quality dimensions. Swiss-system pairing keeps the number of comparisons manageable while still producing reliable rankings.
| Dimension | Weight | What It Measures |
|---|---|---|
| Novelty | 25% | How different is this from existing published work? |
| Feasibility | 25% | Can this be implemented and validated within reasonable time and resources? |
| Relevance | 25% | Does this address an important, open problem in the field? |
| Clarity | 25% | Is the idea well-defined enough to start working on immediately? |
All dimensions are weighted equally. Researchers tend to overweight novelty and underweight feasibility — equal weights correct this bias.
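Under the equal-weight rubric, a match verdict could be computed like this. A sketch only: the 1-10 scale, the draw margin, and the function names are assumptions; the skill's actual rubric is in elo-ranking-guide.md.

```python
WEIGHTS = {"novelty": 0.25, "feasibility": 0.25, "relevance": 0.25, "clarity": 0.25}

def weighted_score(scores):
    """Combine per-dimension rubric scores (assumed 1-10) with equal weights."""
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)

def match_outcome(scores_a, scores_b, margin=0.25):
    """Return S for candidate A: 1.0 = win, 0.5 = draw, 0.0 = loss."""
    diff = weighted_score(scores_a) - weighted_score(scores_b)
    if diff > margin:
        return 1.0
    if diff < -margin:
        return 0.0
    return 0.5

a = {"novelty": 8, "feasibility": 5, "relevance": 7, "clarity": 6}   # score 6.5
b = {"novelty": 6, "feasibility": 7, "relevance": 7, "clarity": 7}   # score 6.75
# match_outcome(a, b) -> 0.5 (within the draw margin)
```

Note how the novelty-heavy candidate does not automatically win: equal weights let feasibility and clarity pull the verdict back, which is the bias correction described above.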
Starting Elo: 1500 for all candidates.
K-factor: 32 (standard for new players; large enough that a few matches significantly move ratings).
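These numbers plug into the standard Elo update, sketched here (helper names are illustrative):

```python
def expected(r_a, r_b):
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(r_a, r_b, s_a, k=32):
    """Return updated ratings after one match; s_a is 1, 0.5, or 0 for A."""
    e_a = expected(r_a, r_b)
    return r_a + k * (s_a - e_a), r_b + k * ((1 - s_a) - (1 - e_a))

# Both candidates start at 1500, so a first-round win moves the winner up k/2 = 16.
ra, rb = update(1500, 1500, s_a=1.0)
# ra == 1516.0, rb == 1484.0
```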
Swiss-system pairing (4-5 rounds):
Per-match process:
Save rankings to /idea-rankings.md.
See references/elo-ranking-guide.md for the detailed rubric and convergence criteria.
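One minimal sketch of a Swiss round, assuming the common rule of pairing adjacent candidates by current rating while avoiding rematches (the skill's exact pairing rules live in references/elo-ranking-guide.md):

```python
def swiss_round(ratings, played):
    """Pair adjacent candidates by rating, skipping repeat match-ups."""
    order = sorted(ratings, key=ratings.get, reverse=True)
    pairs, used = [], set()
    for a in order:
        if a in used:
            continue
        for b in order:
            if b is not a and b not in used and frozenset((a, b)) not in played:
                pairs.append((a, b))
                used.update((a, b))
                played.add(frozenset((a, b)))
                break
    return pairs

ratings = {"idea-1": 1532, "idea-2": 1516, "idea-3": 1500, "idea-4": 1484}
pairs = swiss_round(ratings, played=set())
# pairs == [("idea-1", "idea-2"), ("idea-3", "idea-4")]
```

Because similarly-rated candidates meet each round, 4-5 rounds are usually enough to separate the top of the field without playing all-pairs.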
Synthesize the top-3 ranked ideas into a "promising directions" summary. This serves two purposes: it preserves optionality (the best idea may combine elements from multiple candidates), and it feeds into evo-memory for future cycles.
For each of the top-3 ideas:
- Check evo-memory — has this direction been explored before? What was learned?

Then synthesize across the top-3:
Save to /direction-summary.md.
After completion, trigger evo-memory IDE (Idea Direction Evolution) to update Ideation Memory with the promising directions identified.
Extend the tournament winner (rank #1) into a full research proposal with enough detail to begin implementation.
The paper defines proposal P as containing 5 sections: background, related work, method, experiment plan, and expected results. We extend this with a 6th practical section (risks and mitigations).
1. Background: Define the exact problem — inputs, outputs, constraints, and why existing solutions are insufficient. Be specific: "LLM inference on edge devices with <2GB memory while maintaining >90% of full-model accuracy" is a background statement; "make LLMs faster" is not. Include context and motivation.
2. Related Work: Position the idea within the existing literature. What has been tried? What are the gaps? This should draw on the literature L retrieved during Phase 1 tree generation.
3. Proposed Method: Describe the technical approach at a level of detail sufficient for implementation. Include the key insight that differentiates this from prior work. State assumptions explicitly. List 3 testable contributions.
4. Experiment Plan: Datasets, baselines, metrics, and ablation design. This should align with what experiment-pipeline Stage 4 will need. Include both quantitative metrics and qualitative evaluation where appropriate.
5. Expected Results: Quantitative targets (e.g., "15-20% latency reduction with <2% accuracy loss") and qualitative expectations. Being specific about expected results forces you to think about whether the idea is realistic.
6. Risks and Mitigations (practical extension): Technical risks that could prevent success, and fallback plans for each. A proposal without risks is either dishonest or insufficiently analyzed. This section is not in the paper but is valuable for practical research planning.
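Before saving, a quick structural check can catch a missing section. The heading strings below are assumptions based on the 5+1 list above, not a defined API of the skill:

```python
REQUIRED_SECTIONS = [
    "Background", "Related Work", "Proposed Method",
    "Experiment Plan", "Expected Results", "Risks and Mitigations",
]

def missing_sections(proposal_text):
    """Return the required section names absent from a proposal draft."""
    return [s for s in REQUIRED_SECTIONS if s not in proposal_text]

draft = "## Background\n...\n## Related Work\n...\n## Proposed Method\n..."
# missing_sections(draft) -> ["Experiment Plan", "Expected Results", "Risks and Mitigations"]
```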
Save to /research-proposal.md.
See references/proposal-extension.md for detailed guidance on each section.
Prioritize these rules during idea generation and ranking:
Quantity before quality: Generate many candidates before evaluating any. Premature filtering kills diversity. You can't know which idea is strongest until you've seen the alternatives — and the best ideas often emerge from unexpected branches of the tree.
Vary one axis per level: Changing multiple axes simultaneously produces ideas that are different but not meaningfully diverse. Each level of the tree should explore ONE dimension of variation, so you understand exactly what makes each branch unique.
Feasibility is not optional: Brilliant but infeasible ideas waste entire research cycles. A novel idea that can't be validated within your constraints is not a contribution — it's a thought experiment. Weight feasibility equally with novelty.
The tournament finds surprises: Structured pairwise comparison often reveals that your initial favorite isn't actually the strongest idea. Trust the rankings over your gut feeling. If the results surprise you, that means the tournament is working — it's surfacing information you wouldn't have found through intuition alone.
Pruning is not selecting: Prune only clearly infeasible branches. The tournament handles quality ranking. If you aggressively prune before the tournament, you're substituting your initial intuition for systematic comparison — exactly the bias the tournament is designed to correct.
Top-3, not top-1: Summarizing the top 3 directions (not just the winner) preserves optionality. The best final approach may combine elements from multiple top candidates. Committing to exactly one idea too early discards valuable signal.
When the tournament is complete and the proposal is written, pass these artifacts to paper-planning:
| Artifact | Source Phase | Used By |
|---|---|---|
| Research proposal (5+1 sections) | Phase 4 | Story design, experiment planning |
| Idea tree (full structure) | Phase 1 | Related work positioning |
| Elo rankings with scores | Phase 2 | Justification for chosen direction |
| Direction summary (top-3) | Phase 3 | Fallback directions if primary fails |
| Tournament scorecards | Phase 2 | Understanding idea strengths/weaknesses |
Also pass results to evo-memory for evolution updates:
Refer to the evo-memory skill to read Ideation Memory: → Read M_I at /memory/ideation-memory.md
Refer to the evo-memory skill and trigger IDE: → Run IDE protocol with /direction-summary.md
Refer to the paper-planning skill: → Pass /research-proposal.md
| Topic | Reference File | When to Use |
|---|---|---|
| Tree expansion rules and diversity | tree-search-protocol.md | Generating diverse idea candidates |
| Elo formula, rubric, and pairing | elo-ranking-guide.md | Running the tournament |
| Proposal section guidance | proposal-extension.md | Writing the research proposal |
| Idea candidate template | idea-candidate-template.md | Describing individual ideas |
| Ranking scorecard template | ranking-scorecard-template.md | Recording pairwise comparisons |
| Direction summary template | direction-summary-template.md | Synthesizing top-3 directions |

Weekly Installs: 69
Repository: https://github.com/evoscientist/evoskills
GitHub Stars: 105
First Seen: 10 days ago
Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Warn)
Installed on: gemini-cli (69), codex (69), kimi-cli (69), github-copilot (69), cursor (69), opencode (69)