hypogenic by davila7/claude-code-templates
npx skills add https://github.com/davila7/claude-code-templates --skill hypogenic
Hypogenic provides automated hypothesis generation and testing using large language models to accelerate scientific discovery. The framework supports three approaches: HypoGeniC (data-driven hypothesis generation), HypoRefine (synergistic literature and data integration), and Union methods (mechanistic combination of literature and data-driven hypotheses).
Get started with Hypogenic in minutes:
# Install the package
uv pip install hypogenic
# Clone example datasets
git clone https://github.com/ChicagoHAI/HypoGeniC-datasets.git ./data
# Run basic hypothesis generation
hypogenic_generation --config ./data/your_task/config.yaml --method hypogenic --num_hypotheses 20
# Run inference on generated hypotheses
hypogenic_inference --config ./data/your_task/config.yaml --hypotheses output/hypotheses.json
Or use the Python API:
from hypogenic import BaseTask
# Create task with your configuration
task = BaseTask(config_path="./data/your_task/config.yaml")
# Generate hypotheses
task.generate_hypotheses(method="hypogenic", num_hypotheses=20)
# Run inference
results = task.inference(hypothesis_bank="./output/hypotheses.json")
Use this skill when working on:
- Automated Hypothesis Generation
- Literature Integration
- Performance Optimization
- Flexible Configuration
- Proven Results
Generate hypotheses solely from observational data through iterative refinement.
Process:
Best for: Exploratory research without existing literature, pattern discovery in novel datasets
Synergistically combine existing literature with empirical data through an agentic framework.
Process:
Best for: Research with established theoretical foundations, validating or extending existing theories
Mechanistically combine literature-only hypotheses with framework outputs.
Variants:
Best for: Comprehensive hypothesis coverage, eliminating redundancy while maintaining diverse perspectives
Install via pip:
uv pip install hypogenic
Optional dependencies:
Clone example datasets:
# For HypoGeniC examples
git clone https://github.com/ChicagoHAI/HypoGeniC-datasets.git ./data
# For HypoRefine/Union examples
git clone https://github.com/ChicagoHAI/Hypothesis-agent-datasets.git ./data
Datasets must follow HuggingFace datasets format with specific naming conventions:
Required files:
- <TASK>_train.json: Training data
- <TASK>_val.json: Validation data
- <TASK>_test.json: Test data

Required keys in JSON:
- text_features_1 through text_features_n: lists of strings containing feature values
- label: a list of strings containing ground-truth labels

Example (headline click prediction):
{
  "headline_1": [
    "What Up, Comet? You Just Got *PROBED*",
    "Scientists Made a Breakthrough in Quantum Computing"
  ],
  "headline_2": [
    "Scientists Everywhere Were Holding Their Breath Today. Here's Why.",
    "New Quantum Computer Achieves Milestone"
  ],
  "label": [
    "Headline 2 has more clicks than Headline 1",
    "Headline 1 has more clicks than Headline 2"
  ]
}
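A dataset in this shape can be sanity-checked before running generation. The helper below is illustrative (not part of hypogenic) and assumes only the parallel-list format described above; the file name is made up:

```python
import json

def validate_dataset(path: str) -> None:
    """Check that every key maps to a list of strings and all lists share one length."""
    with open(path) as f:
        data = json.load(f)
    assert "label" in data, "dataset must contain a 'label' key"
    lengths = {key: len(values) for key, values in data.items()}
    assert len(set(lengths.values())) == 1, f"list lengths differ: {lengths}"
    for key, values in data.items():
        assert all(isinstance(v, str) for v in values), f"'{key}' must hold strings"

# Example with a one-row version of the headline data written to disk
sample = {
    "headline_1": ["What Up, Comet? You Just Got *PROBED*"],
    "headline_2": ["Scientists Everywhere Were Holding Their Breath Today."],
    "label": ["Headline 2 has more clicks than Headline 1"],
}
with open("headline_test.json", "w") as f:
    json.dump(sample, f)
validate_dataset("headline_test.json")  # raises AssertionError on malformed files
```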
Important notes:
- Labels must match the extract_label() function output format
- Text feature keys may use descriptive names (review_text, post_content, etc.)

Each task requires a config.yaml file specifying:
Required elements:
Template capabilities:
${text_features_1}, ${num_hypotheses})Configuration structure:
task_name: your_task_name
train_data_path: ./your_task_train.json
val_data_path: ./your_task_val.json
test_data_path: ./your_task_test.json
prompt_templates:
  # Extra keys for reusable prompt components
  observations: |
    Feature 1: ${text_features_1}
    Feature 2: ${text_features_2}
    Observation: ${label}
  # Required templates
  batched_generation:
    system: "Your system prompt here"
    user: "Your user prompt with ${num_hypotheses} placeholder"
  inference:
    system: "Your inference system prompt"
    user: "Your inference user prompt"
  # Optional templates for advanced features
  few_shot_baseline: {...}
  is_relevant: {...}
  adaptive_inference: {...}
  adaptive_selection: {...}
Refer to references/config_template.yaml for a complete example configuration.
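The `${...}` placeholders follow the syntax of Python's `string.Template`, so filling a user prompt can be sketched as below. The template text and values are illustrative, not hypogenic's internal code:

```python
from string import Template

# A toy user prompt in the same placeholder style as config.yaml templates
user_template = Template(
    "Please propose ${num_hypotheses} hypotheses.\n"
    "Feature 1: ${text_features_1}\n"
    "Observation: ${label}"
)

# substitute() raises KeyError if any placeholder is left unfilled
prompt = user_template.substitute(
    num_hypotheses=3,
    text_features_1="What Up, Comet? You Just Got *PROBED*",
    label="Headline 1 has more clicks than Headline 2",
)
print(prompt)
```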
To use literature-based hypothesis generation, you must preprocess PDF papers:
Step 1: Setup GROBID (first time only)
bash ./modules/setup_grobid.sh
Step 2: Add PDF files Place research papers in literature/YOUR_TASK_NAME/raw/
Step 3: Process PDFs
# Start GROBID service
bash ./modules/run_grobid.sh
# Process PDFs for your task
cd examples
python pdf_preprocess.py --task_name YOUR_TASK_NAME
This converts the PDFs to a structured format for hypothesis extraction. Automated literature search will be supported in future releases.
hypogenic_generation --help
Key parameters:
hypogenic_inference --help
Key parameters:
For programmatic control and custom workflows, use Hypogenic directly in your Python code:
from hypogenic import BaseTask
# Clone example datasets first
# git clone https://github.com/ChicagoHAI/HypoGeniC-datasets.git ./data
# Load your task with custom extract_label function
task = BaseTask(
    config_path="./data/your_task/config.yaml",
    extract_label=lambda text: extract_your_label(text)
)
# Generate hypotheses
task.generate_hypotheses(
    method="hypogenic",
    num_hypotheses=20,
    output_path="./output/hypotheses.json"
)
# Run inference
results = task.inference(
    hypothesis_bank="./output/hypotheses.json",
    test_data="./data/your_task/your_task_test.json"
)
# For literature-integrated approaches
# git clone https://github.com/ChicagoHAI/Hypothesis-agent-datasets.git ./data
# Generate with HypoRefine
task.generate_hypotheses(
    method="hyporefine",
    num_hypotheses=15,
    literature_path="./literature/your_task/",
    output_path="./output/"
)
# This generates 3 hypothesis banks:
# - HypoRefine (integrated approach)
# - Literature-only hypotheses
# - Literature∪HypoRefine (union)
from examples.multi_hyp_inference import run_multi_hypothesis_inference
# Test multiple hypotheses simultaneously
results = run_multi_hypothesis_inference(
    config_path="./data/your_task/config.yaml",
    hypothesis_bank="./output/hypotheses.json",
    test_data="./data/your_task/your_task_test.json"
)
The extract_label() function is critical for parsing LLM outputs. Implement it based on your task:
def extract_label(llm_output: str) -> str:
    r"""Extract predicted label from LLM inference text.

    Default behavior: searches for the 'final answer:\s+(.*)' pattern.
    Customize for your domain-specific output format.
    """
    import re
    match = re.search(r'final answer:\s+(.*)', llm_output, re.IGNORECASE)
    if match:
        return match.group(1).strip()
    return llm_output.strip()
Important: Extracted labels must match the format of label values in your dataset for correct accuracy calculation.
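To see why format matching matters, here is a small, self-contained check of the default extraction pattern together with an exact-match accuracy computation; the sample outputs and labels are made up:

```python
import re

def extract_label(llm_output: str) -> str:
    """Default extraction: take everything after 'final answer:' (case-insensitive)."""
    match = re.search(r'final answer:\s+(.*)', llm_output, re.IGNORECASE)
    return match.group(1).strip() if match else llm_output.strip()

predictions = [
    extract_label("Reasoning... Final answer: Headline 1 has more clicks than Headline 2"),
    extract_label("Final answer: Headline 2 has more clicks than Headline 1"),
]
truths = [
    "Headline 1 has more clicks than Headline 2",
    "Headline 1 has more clicks than Headline 2",
]
# Accuracy is computed by exact string comparison, which is why the extracted
# label must match the dataset's 'label' format character for character.
accuracy = sum(p == t for p, t in zip(predictions, truths)) / len(truths)
print(accuracy)  # 0.5: one exact match out of two
```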
Scenario: Detecting AI-generated content without prior theoretical framework
Steps:
Prepare dataset with text samples and labels (human vs. AI-generated)
Create config.yaml with appropriate prompt templates
Run hypothesis generation:
hypogenic_generation --config config.yaml --method hypogenic --num_hypotheses 20
Run inference on test set:
hypogenic_inference --config config.yaml --hypotheses output/hypotheses.json --test_data data/test.json
Analyze results for patterns like formality, grammatical precision, and tone differences
Scenario: Deception detection in hotel reviews building on existing research
Steps:
Collect 10 relevant papers on linguistic deception cues
Prepare dataset with genuine and fraudulent reviews
Configure config.yaml with literature processing and data generation templates
Run HypoRefine:
hypogenic_generation --config config.yaml --method hyporefine --papers papers/ --num_hypotheses 15
Test hypotheses examining pronoun frequency, detail specificity, and other linguistic patterns
Compare literature-based and data-driven hypothesis performance
Scenario: Mental stress detection maximizing hypothesis diversity
Steps:
Generate literature hypotheses from mental health research papers
Generate data-driven hypotheses from social media posts
Run Union method to combine and deduplicate:
hypogenic_generation --config config.yaml --method union --literature_hypotheses lit_hyp.json
Inference captures both theoretical constructs (posting behavior changes) and data patterns (emotional language shifts)
Caching: Enable Redis caching to reduce API costs and computation time for repeated LLM calls
Parallel Processing: Leverage multiple workers for large-scale hypothesis generation and testing
Adaptive Refinement: Use challenging examples to iteratively improve hypothesis quality
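Hypogenic supports Redis caching for this, but the underlying idea can be sketched with a plain in-process dictionary keyed by a hash of the model name and prompt; `cached_llm_call` and `fake_llm` below are illustrative names, not hypogenic APIs:

```python
import hashlib
import json

_cache: dict[str, str] = {}  # stand-in for Redis: same idea, in process

def cached_llm_call(prompt: str, model: str, call_fn) -> str:
    """Return a cached response when the same (model, prompt) pair was seen before."""
    key = hashlib.sha256(json.dumps([model, prompt]).encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_fn(prompt)  # only pay for the API call on a cache miss
    return _cache[key]

calls = []
def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"response to: {prompt}"

cached_llm_call("Generate 5 hypotheses", "gpt-4o", fake_llm)
cached_llm_call("Generate 5 hypotheses", "gpt-4o", fake_llm)
print(len(calls))  # 1: the second call is served from the cache
```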
Research using hypogenic has demonstrated:
Issue: Generated hypotheses are too generic Solution: Refine prompt templates in config.yaml to request more specific, testable hypotheses
Issue: Poor inference performance Solution: Ensure dataset has sufficient training examples, adjust hypothesis generation parameters, or increase number of hypotheses
Issue: Label extraction failures Solution: Implement custom extract_label() function for domain-specific output parsing
Issue: GROBID PDF processing fails Solution: Ensure GROBID service is running (bash ./modules/run_grobid.sh) and PDFs are valid research papers
To add a new task or dataset to Hypogenic:
Create three JSON files following the required format:
- your_task_train.json
- your_task_val.json
- your_task_test.json

Each file must have keys for text features (text_features_1, etc.) and label.
Define your task configuration with:
- Prompt templates with variable placeholders (e.g., ${text_features_1}, ${num_hypotheses})

Create a custom label extraction function that parses LLM outputs for your domain:
from hypogenic import BaseTask
def extract_my_label(llm_output: str) -> str:
    """Custom label extraction for your task.

    Must return labels in the same format as the dataset 'label' field.
    """
    # Example: extract from a specific format
    if "Final prediction:" in llm_output:
        return llm_output.split("Final prediction:")[-1].strip()
    # Fall back to the default pattern
    import re
    match = re.search(r'final answer:\s+(.*)', llm_output, re.IGNORECASE)
    return match.group(1).strip() if match else llm_output.strip()

# Use your custom task
task = BaseTask(
    config_path="./your_task/config.yaml",
    extract_label=extract_my_label
)
For HypoRefine/Union methods:
- Place research-paper PDFs in the literature/your_task_name/raw/ directory
- Process them with pdf_preprocess.py

Run hypothesis generation and inference using the CLI or Python API:
# CLI approach
hypogenic_generation --config your_task/config.yaml --method hypogenic --num_hypotheses 20
hypogenic_inference --config your_task/config.yaml --hypotheses output/hypotheses.json
# Or use Python API (see Python API Usage section)
Understanding the repository layout:
hypothesis-generation/
├── hypogenic/ # Core package code
├── hypogenic_cmd/ # CLI entry points
├── hypothesis_agent/ # HypoRefine agent framework
├── literature/ # Literature processing utilities
├── modules/ # GROBID and preprocessing modules
├── examples/ # Example scripts
│ ├── generation.py # Basic HypoGeniC generation
│ ├── union_generation.py # HypoRefine/Union generation
│ ├── inference.py # Single hypothesis inference
│ ├── multi_hyp_inference.py # Multiple hypothesis inference
│ └── pdf_preprocess.py # Literature PDF processing
├── data/ # Example datasets (clone separately)
├── tests/ # Unit tests
└── IO_prompting/ # Prompt templates and experiments
Key directories:
Liu, H., Huang, S., Hu, J., Zhou, Y., & Tan, C. (2025). HypoBench: Towards Systematic and Principled Benchmarking for Hypothesis Generation. arXiv preprint arXiv:2504.11524.
BibTeX:
@misc{liu2025hypobenchsystematicprincipledbenchmarking,
title={HypoBench: Towards Systematic and Principled Benchmarking for Hypothesis Generation},
author={Haokun Liu and Sicong Huang and Jingyu Hu and Yangqiaoyu Zhou and Chenhao Tan},
year={2025},
eprint={2504.11524},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2504.11524},
}
Liu, H., Zhou, Y., Li, M., Yuan, C., & Tan, C. (2024). Literature Meets Data: A Synergistic Approach to Hypothesis Generation. arXiv preprint arXiv:2410.17309.
BibTeX:
@misc{liu2024literaturemeetsdatasynergistic,
title={Literature Meets Data: A Synergistic Approach to Hypothesis Generation},
author={Haokun Liu and Yangqiaoyu Zhou and Mingxuan Li and Chenfei Yuan and Chenhao Tan},
year={2024},
eprint={2410.17309},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2410.17309},
}
Zhou, Y., Liu, H., Srivastava, T., Mei, H., & Tan, C. (2024). Hypothesis Generation with Large Language Models. In Proceedings of EMNLP Workshop of NLP for Science.
BibTeX:
@inproceedings{zhou2024hypothesisgenerationlargelanguage,
title={Hypothesis Generation with Large Language Models},
author={Yangqiaoyu Zhou and Haokun Liu and Tejes Srivastava and Hongyuan Mei and Chenhao Tan},
booktitle = {Proceedings of EMNLP Workshop of NLP for Science},
year={2024},
url={https://aclanthology.org/2024.nlp4science-1.10/},
}
Clone these repositories for ready-to-use examples:
# HypoGeniC examples (data-driven only)
git clone https://github.com/ChicagoHAI/HypoGeniC-datasets.git ./data
# HypoRefine/Union examples (literature + data)
git clone https://github.com/ChicagoHAI/Hypothesis-agent-datasets.git ./data
For contributions or questions, visit the GitHub repository and check the issues page.
config_template.yaml - Complete example configuration file with all required prompt templates and parameters. This includes:
Scripts directory is available for:
Assets directory is available for: