aris-autonomous-ml-research by aradotso/trending-skills
npx skills add https://github.com/aradotso/trending-skills --skill aris-autonomous-ml-research
Skill by ara.so — Daily 2026 Skills collection
ARIS (Auto-Research-In-Sleep) turns Claude Code into an autonomous ML research engine. It chains idea discovery → cross-model review loops → paper writing → compiled PDF into hands-off overnight pipelines. Claude Code drives execution while an external model (Codex/GPT-5.4, GLM, DeepSeek, Kimi, etc.) acts as adversarial reviewer — breaking self-play blind spots that single-model review cannot escape.
| Workflow | Trigger | What Runs |
|---|---|---|
| Idea Discovery | /idea-discovery | Literature survey → 8–12 ideas → novelty check → pilot GPU runs → ranked report |
| Auto Review Loop | /auto-review-loop | 4-round review/fix cycle, score tracked per round (e.g. 5/10 → 7.5/10) |
| Paper Writing | /paper-writing | Narrative → outline → figures → LaTeX → PDF → 2-round auto-improvement |
| Full Pipeline | /research-pipeline | Chains all three end-to-end from a single prompt |
# 1. Clone and install skills
git clone https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep.git
cp -r Auto-claude-code-research-in-sleep/skills/* ~/.claude/skills/
# 2. Install Codex MCP (cross-model reviewer)
npm install -g @openai/codex
codex setup # set model to gpt-5.4 when prompted
claude mcp add codex -s user -- codex mcp-server
# 3. Verify MCP is connected
claude mcp list # should show "codex" in the list
The reviewer model is read from ~/.codex/config.toml, not from skill files. Edit directly if needed:
# ~/.codex/config.toml
model = "gpt-5.4" # recommended — most rigorous reviewer
# model = "gpt-5.3-codex"
# model = "gpt-5.2-codex"
# model = "o3"
/idea-discovery "factorized gap in discrete diffusion language models"
Be specific — "NLP" produces weak ideas; "factorized gap in discrete diffusion LMs" targets a real research gap.
What runs: literature survey → 8–12 candidate ideas → novelty check → pilot GPU runs → ranked idea_discovery_report.md.
/auto-review-loop
Run from a directory containing your paper draft or experiment results.
What runs: 4 review/fix rounds, each yielding a structured critique with a /10 score; per-round scores are plotted to docs/auto_review_score_curve.png.
/paper-writing "NARRATIVE_REPORT.md"
Point at a narrative markdown file describing your findings.
What runs: narrative → outline → figures → LaTeX → pdflatex compilation, with 2 rounds of auto-improvement.
/research-pipeline "your research direction"
Chains Workflows 1 → 2 → 3 from a single prompt. Wake up to a scored, compiled paper.
Append — key: value to any command:
/research-pipeline "topic" — AUTO_PROCEED: false
/research-pipeline "topic" — human checkpoint: true
/research-pipeline "topic" — arxiv download: true
/research-pipeline "topic" — AUTO_PROCEED: false, human checkpoint: true
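The override tail is plain `key: value` pairs separated by commas. A minimal sketch of how such a string can be split apart (illustrative only; the skill's actual parser is not shown in the source):

```python
def parse_overrides(command: str) -> dict[str, str]:
    """Split the '— k: v, k2: v2' tail of a command into a dict.
    Illustrative only; the skill's real parser may differ."""
    if "—" not in command:
        return {}
    _, _, tail = command.partition("—")
    overrides = {}
    for pair in tail.split(","):
        key, _, value = pair.partition(":")
        if value.strip():
            overrides[key.strip()] = value.strip()
    return overrides

print(parse_overrides('/research-pipeline "topic" — AUTO_PROCEED: false, human checkpoint: true'))
# {'AUTO_PROCEED': 'false', 'human checkpoint': 'true'}
```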
| Parameter | Default | Effect |
|---|---|---|
| AUTO_PROCEED | true | false = pause at idea selection gate before committing GPU time |
| human checkpoint | false | true = pause after each review round for manual feedback |
| arxiv download | false | true = download full PDFs during literature survey (vs metadata only) |
| DBLP_BIBTEX | true | false = use LLM-generated BibTeX (not recommended — hallucination risk) |
No Claude or OpenAI API required — swap any OpenAI-compatible endpoint via the llm-chat MCP server:
# Install the bundled llm-chat MCP server
cd Auto-claude-code-research-in-sleep/mcp-servers/llm-chat
pip install -r requirements.txt
# Configure your provider
export LLM_CHAT_BASE_URL="https://open.bigmodel.cn/api/paas/v4" # GLM-4
export LLM_CHAT_API_KEY="your-key"
export LLM_CHAT_MODEL="glm-4-plus"
# Add to Claude Code
claude mcp add llm-chat -s user -- python server.py
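Under the hood the llm-chat server presumably speaks the standard OpenAI-compatible chat API. A hedged sketch of the request it would assemble from those environment variables (the /chat/completions path and body shape follow the OpenAI convention; not verified against the bundled server):

```python
import os

def chat_request(prompt: str) -> tuple[str, dict]:
    """Build the URL and JSON body for an OpenAI-compatible
    /chat/completions call from the LLM_CHAT_* environment variables."""
    base = os.environ["LLM_CHAT_BASE_URL"].rstrip("/")
    body = {
        "model": os.environ["LLM_CHAT_MODEL"],
        "messages": [{"role": "user", "content": prompt}],
    }
    return f"{base}/chat/completions", body
```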
Tested reviewer models:
| Provider | Model | Notes |
|---|---|---|
| OpenAI | gpt-5.4 | Recommended — most rigorous |
| Zhipu AI | glm-4-plus | Strong Chinese-language papers |
| MiniMax | abab6.5s-chat | Fast, cost-effective |
| Moonshot | moonshot-v1-128k | Kimi — long-context papers |
| DeepSeek | deepseek-chat | Code-heavy experiments |
| 01.ai | yi-large | LongCat — long context |
BibTeX is fetched from real databases by default — no manual flag needed:
# skills/paper-writing/citation_fetcher.py pattern used internally
import requests
def fetch_bibtex_dblp(title: str) -> str | None:
    """Fetch real BibTeX from DBLP by paper title."""
    resp = requests.get(
        "https://dblp.org/search/publ/api",
        params={"q": title, "format": "json", "h": 1},
        timeout=30,
    )
    hits = resp.json().get("result", {}).get("hits", {}).get("hit", [])
    if not hits:
        return None
    key = hits[0]["info"].get("key", "")
    bib_resp = requests.get(f"https://dblp.org/rec/{key}.bib", timeout=30)
    return bib_resp.text if bib_resp.ok else None

def fetch_bibtex_crossref(doi: str) -> str | None:
    """Fallback: fetch BibTeX from CrossRef by DOI."""
    resp = requests.get(
        f"https://api.crossref.org/works/{doi}/transform/application/x-bibtex",
        timeout=30,
    )
    return resp.text if resp.ok else None
Disable with — DBLP_BIBTEX: false if working fully offline.
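The two fetchers naturally compose as a fallback chain (DBLP first, CrossRef second). A sketch of that pattern with stub fetchers standing in for the real network calls; the skill's actual orchestration may differ:

```python
from typing import Callable, Optional

def first_successful(fetchers: list[Callable[[], Optional[str]]]) -> Optional[str]:
    """Try each fetcher in order and return the first non-None result."""
    for fetch in fetchers:
        result = fetch()
        if result is not None:
            return result
    return None

# Stubs standing in for fetch_bibtex_dblp / fetch_bibtex_crossref:
dblp_miss = lambda: None              # DBLP has no hit
crossref_hit = lambda: "@article{stub}"  # CrossRef resolves the DOI
print(first_successful([dblp_miss, crossref_hit]))  # @article{stub}
```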
# Install Zotero Better BibTeX plugin, then:
export ZOTERO_API_KEY="your-zotero-web-api-key"
export ZOTERO_LIBRARY_ID="your-library-id"
export ZOTERO_LIBRARY_TYPE="user" # or "group"
Literature search will query your Zotero library before hitting arXiv.
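The Zotero Web API (v3) exposes library items at /users/&lt;id&gt;/items (or /groups/&lt;id&gt;/items) with a q full-text search parameter and a Zotero-API-Key header. A sketch of the request the literature step could build from those variables; the skill's exact query logic is an assumption:

```python
import os

def zotero_search_request(query: str) -> tuple[str, dict, dict]:
    """Build a Zotero Web API item search from the ZOTERO_* variables.
    Returns (url, params, headers) ready for requests.get."""
    lib = os.environ.get("ZOTERO_LIBRARY_TYPE", "user") + "s"  # "users" or "groups"
    url = f"https://api.zotero.org/{lib}/{os.environ['ZOTERO_LIBRARY_ID']}/items"
    params = {"q": query, "format": "json", "limit": 10}
    headers = {"Zotero-API-Key": os.environ["ZOTERO_API_KEY"]}
    return url, params, headers
```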
export OBSIDIAN_VAULT_PATH="/path/to/your/vault"
The skill will search the vault's markdown notes for related work before making external queries.
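A minimal stand-in for that vault lookup, assuming a plain case-insensitive full-text scan over *.md files (the skill may use something smarter, such as link-graph or embedding search):

```python
import os
from pathlib import Path

def search_vault(term: str) -> list[Path]:
    """Naive case-insensitive scan of the vault's markdown notes."""
    vault = Path(os.environ["OBSIDIAN_VAULT_PATH"])
    return sorted(
        p for p in vault.rglob("*.md")
        if term.lower() in p.read_text(errors="ignore").lower()
    )
```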
export FEISHU_WEBHOOK_URL="https://open.feishu.cn/open-apis/bot/v2/hook/your-token"
export FEISHU_MODE="push" # off | push | interactive
| Mode | Behaviour |
|---|---|
| off | No notifications |
| push | One-way alerts: review scores, experiment completions, checkpoints |
| interactive | Mobile approval buttons at AUTO_PROCEED: false gates |
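In push mode each notification is a single webhook POST. A sketch of the text-message body Feishu custom bots accept (the payload shape follows Feishu's bot webhook convention; the skill's exact message format is an assumption):

```python
import json

def feishu_text_payload(message: str) -> str:
    """JSON body for a Feishu custom-bot webhook text message."""
    return json.dumps({"msg_type": "text", "content": {"text": message}})

body = feishu_text_payload("ARIS: review round 2 scored 6.5/10")
# Send with: requests.post(os.environ["FEISHU_WEBHOOK_URL"], data=body,
#                          headers={"Content-Type": "application/json"})
```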
your-project/
├── idea_discovery_report.md # ranked ideas with novelty scores
├── NARRATIVE_REPORT.md # auto-generated findings narrative
├── paper/
│ ├── main.tex # assembled LaTeX
│ ├── main.pdf # compiled output
│ ├── figures/ # auto-generated plots
│ └── references.bib # real BibTeX from DBLP/CrossRef
├── experiments/
│ ├── pilot_runs/ # idea-discovery GPU pilots
│ └── review_round_*/ # per-round experiment results
└── docs/
└── auto_review_score_curve.png
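The layout above can double as a post-run sanity check. A small helper (hypothetical; not part of the skill) that reports which expected outputs are missing, e.g. at the end of a nightly job:

```python
from pathlib import Path

# Key artifacts from the documented output layout:
EXPECTED = [
    "idea_discovery_report.md",
    "NARRATIVE_REPORT.md",
    "paper/main.pdf",
    "docs/auto_review_score_curve.png",
]

def missing_artifacts(project_dir: str) -> list[str]:
    """Return the expected ARIS outputs absent from project_dir."""
    root = Path(project_dir)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```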
Trigger ARIS workflows programmatically from a Python script (e.g. a cron job or CI step):
import subprocess
import json
from pathlib import Path
def run_aris_pipeline(
    research_direction: str,
    output_dir: str = ".",
    auto_proceed: bool = True,
    human_checkpoint: bool = False,
    arxiv_download: bool = False,
) -> dict:
    """
    Launch the ARIS full pipeline via the Claude Code CLI.
    Returns the parsed score progression from the review curve JSON.
    """
    overrides = ", ".join([
        f"AUTO_PROCEED: {str(auto_proceed).lower()}",
        f"human checkpoint: {str(human_checkpoint).lower()}",
        f"arxiv download: {str(arxiv_download).lower()}",
    ])
    command = f'/research-pipeline "{research_direction}" — {overrides}'
    result = subprocess.run(
        ["claude", "--print", command],
        cwd=output_dir,
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        raise RuntimeError(f"ARIS pipeline failed:\n{result.stderr}")
    # Parse score progression if available
    score_json = Path(output_dir) / "docs" / "review_scores.json"
    if score_json.exists():
        return json.loads(score_json.read_text())
    return {"stdout": result.stdout}

# Example: nightly research job
if __name__ == "__main__":
    scores = run_aris_pipeline(
        research_direction="token-level uncertainty calibration in autoregressive LMs",
        output_dir="./nightly_research",
        auto_proceed=True,
        human_checkpoint=False,
    )
    print(f"Final review score: {scores.get('rounds', [{}])[-1].get('score')}/10")
ARIS ships 20 composable sub-skills. Chain them manually for custom workflows:
# Literature only
/literature-survey "topic"
# Brainstorm without pilot experiments
/idea-brainstorm "topic" — pilot experiments: false
# Single review round (no loop)
/single-review "path/to/draft.md"
# Proof-writing (community skill)
/proof-writer "theorem statement"
# Write paper from existing narrative, skip review
/paper-writing "NARRATIVE.md" — auto-review: false
Codex MCP not found
claude mcp list # verify "codex" appears
codex setup # re-run setup if missing
claude mcp remove codex && \
claude mcp add codex -s user -- codex mcp-server # re-add
Skills not loading in Claude Code
ls ~/.claude/skills/ # verify files copied
# Each skill must be a directory with SKILL.md inside
ls ~/.claude/skills/auto-review-loop/SKILL.md
pdflatex not found during paper writing
# macOS
brew install --cask mactex-no-gui
# Ubuntu/Debian
sudo apt install texlive-full
# Then retry — skill auto-detects pdflatex on PATH
Reviewer returns empty critique
Check ~/.codex/config.toml — ensure model is set and your API key is valid:
codex "say hello" # quick smoke test outside Claude Code
GLM/DeepSeek reviewer not triggering
Verify the llm-chat MCP server is listed:
claude mcp list # should show "llm-chat"
echo $LLM_CHAT_BASE_URL # must be set in the shell that launches claude
Score not improving after 4 rounds
Re-run with — human checkpoint: true and inspect each round's critique file in experiments/review_round_*/.
| Skill | Description |
|---|---|
| proof-writer | Rigorous theorem proof drafting with anti-hallucination citations |
Add your own skill: create skills/your-skill-name/SKILL.md and open a PR.
Claude Code (executor) Codex / external LLM (reviewer)
───────────────────── ───────────────────────────────
Fast, fluid code execution ←→ Deliberate, rigorous critique
Broad context retention Adversarial probing of blind spots
Narrative generation Structural weakness detection
Single-model self-review falls into local minima — the same pattern-matching that generated the work also evaluates it. Cross-model review is adversarial: the reviewer actively probes weaknesses the executor didn't anticipate. The 1→2 model jump produces the largest quality gain; adding more reviewers yields diminishing returns.
Weekly Installs: 355
GitHub Stars: 10
First Seen: 8 days ago
Security Audits: Gen Agent Trust Hub: Fail · Socket: Pass · Snyk: Warn
Installed on: gemini-cli (351), github-copilot (351), codex (351), amp (351), cline (351), kimi-cli (351)