aris-autonomous-ml-research by aradotso/trending-skills
npx skills add https://github.com/aradotso/trending-skills --skill aris-autonomous-ml-research
Skill by ara.so — Daily 2026 Skills collection
ARIS (Auto-Research-In-Sleep) turns Claude Code into an autonomous ML research engine. It chains idea discovery → cross-model review loops → paper writing → compiled PDF into hands-off overnight pipelines. Claude Code drives execution while an external model (Codex/GPT-5.4, GLM, DeepSeek, Kimi, etc.) acts as adversarial reviewer — breaking self-play blind spots that single-model review cannot escape.
| Workflow | Trigger | What Runs |
|---|---|---|
| Idea Discovery | /idea-discovery | Literature survey → 8–12 ideas → novelty check → pilot GPU runs → ranked report |
| Auto Review Loop | /auto-review-loop | 4-round review/fix cycle, score tracked per round (e.g. 5/10 → 7.5/10) |
| Paper Writing | /paper-writing | Narrative → outline → figures → LaTeX → PDF → 2-round auto-improvement |
| Full Pipeline | /research-pipeline | Chains all three end-to-end from a single prompt |
# 1. Clone and install skills
git clone https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep.git
cp -r Auto-claude-code-research-in-sleep/skills/* ~/.claude/skills/
# 2. Install Codex MCP (cross-model reviewer)
npm install -g @openai/codex
codex setup # set model to gpt-5.4 when prompted
claude mcp add codex -s user -- codex mcp-server
# 3. Verify MCP is connected
claude mcp list # should show "codex" in the list
The reviewer model is read from ~/.codex/config.toml, not from skill files. Edit directly if needed:
# ~/.codex/config.toml
model = "gpt-5.4" # recommended — most rigorous reviewer
# model = "gpt-5.3-codex"
# model = "gpt-5.2-codex"
# model = "o3"
/idea-discovery "factorized gap in discrete diffusion language models"
Be specific — "NLP" produces weak ideas; "factorized gap in discrete diffusion LMs" targets a real research gap.
What runs: literature survey → 8–12 candidate ideas → novelty check → pilot GPU runs → ranked idea_discovery_report.md.
/auto-review-loop
Run from a directory containing your paper draft or experiment results.
What runs: 4 review/fix rounds, each yielding a structured critique with a /10 score; per-round scores are plotted to docs/auto_review_score_curve.png.
/paper-writing "NARRATIVE_REPORT.md"
Point at a narrative markdown file describing your findings.
What runs: narrative → outline → figures → LaTeX → pdflatex compilation, with 2 rounds of auto-improvement.
/research-pipeline "your research direction"
Chains Workflows 1 → 2 → 3 from a single prompt. Wake up to a scored, compiled paper.
Append — key: value to any command:
/research-pipeline "topic" — AUTO_PROCEED: false
/research-pipeline "topic" — human checkpoint: true
/research-pipeline "topic" — arxiv download: true
/research-pipeline "topic" — AUTO_PROCEED: false, human checkpoint: true
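The override tail is plain `key: value` pairs separated by commas. A minimal sketch of how such a string can be split apart (illustrative only; the skill's actual parser is not shown in the source):

```python
def parse_overrides(command: str) -> dict[str, str]:
    """Split the '— k: v, k2: v2' tail of a command into a dict.
    Illustrative only; the skill's real parser may differ."""
    if "—" not in command:
        return {}
    _, _, tail = command.partition("—")
    overrides = {}
    for pair in tail.split(","):
        key, _, value = pair.partition(":")
        if value.strip():
            overrides[key.strip()] = value.strip()
    return overrides

print(parse_overrides('/research-pipeline "topic" — AUTO_PROCEED: false, human checkpoint: true'))
# {'AUTO_PROCEED': 'false', 'human checkpoint': 'true'}
```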
| Parameter | Default | Effect |
|---|---|---|
| AUTO_PROCEED | true | false = pause at idea selection gate before committing GPU time |
| human checkpoint | false | true = pause after each review round for manual feedback |
| arxiv download | false | true = download full PDFs during literature survey (vs metadata only) |
| DBLP_BIBTEX | true | false = use LLM-generated BibTeX (not recommended — hallucination risk) |
No Claude or OpenAI API required — swap any OpenAI-compatible endpoint via the llm-chat MCP server:
# Install the bundled llm-chat MCP server
cd Auto-claude-code-research-in-sleep/mcp-servers/llm-chat
pip install -r requirements.txt
# Configure your provider
export LLM_CHAT_BASE_URL="https://open.bigmodel.cn/api/paas/v4" # GLM-4
export LLM_CHAT_API_KEY="your-key"
export LLM_CHAT_MODEL="glm-4-plus"
# Add to Claude Code
claude mcp add llm-chat -s user -- python server.py
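Under the hood the llm-chat server presumably speaks the standard OpenAI-compatible chat API. A hedged sketch of the request it would assemble from those environment variables (the /chat/completions path and body shape follow the OpenAI convention; not verified against the bundled server):

```python
import os

def chat_request(prompt: str) -> tuple[str, dict]:
    """Build the URL and JSON body for an OpenAI-compatible
    /chat/completions call from the LLM_CHAT_* environment variables."""
    base = os.environ["LLM_CHAT_BASE_URL"].rstrip("/")
    body = {
        "model": os.environ["LLM_CHAT_MODEL"],
        "messages": [{"role": "user", "content": prompt}],
    }
    return f"{base}/chat/completions", body
```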
Tested reviewer models:
| Provider | Model | Notes |
|---|---|---|
| OpenAI | gpt-5.4 | Recommended — most rigorous |
| Zhipu AI | glm-4-plus | Strong Chinese-language papers |
| MiniMax | abab6.5s-chat | Fast, cost-effective |
| Moonshot | moonshot-v1-128k | Kimi — long-context papers |
| DeepSeek | deepseek-chat | Code-heavy experiments |
| 01.ai | yi-large | LongCat — long context |
BibTeX is fetched from real databases by default — no manual flag needed:
# skills/paper-writing/citation_fetcher.py pattern used internally
import requests
def fetch_bibtex_dblp(title: str) -> str | None:
    """Fetch real BibTeX from DBLP by paper title."""
    resp = requests.get(
        "https://dblp.org/search/publ/api",
        params={"q": title, "format": "json", "h": 1},
        timeout=30,
    )
    hits = resp.json().get("result", {}).get("hits", {}).get("hit", [])
    if not hits:
        return None
    key = hits[0]["info"].get("key", "")
    bib_resp = requests.get(f"https://dblp.org/rec/{key}.bib", timeout=30)
    return bib_resp.text if bib_resp.ok else None

def fetch_bibtex_crossref(doi: str) -> str | None:
    """Fallback: fetch BibTeX from CrossRef by DOI."""
    resp = requests.get(
        f"https://api.crossref.org/works/{doi}/transform/application/x-bibtex",
        timeout=30,
    )
    return resp.text if resp.ok else None
Disable with — DBLP_BIBTEX: false if working fully offline.
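The two fetchers naturally compose as a fallback chain (DBLP first, CrossRef second). A sketch of that pattern with stub fetchers standing in for the real network calls; the skill's actual orchestration may differ:

```python
from typing import Callable, Optional

def first_successful(fetchers: list[Callable[[], Optional[str]]]) -> Optional[str]:
    """Try each fetcher in order and return the first non-None result."""
    for fetch in fetchers:
        result = fetch()
        if result is not None:
            return result
    return None

# Stubs standing in for fetch_bibtex_dblp / fetch_bibtex_crossref:
dblp_miss = lambda: None              # DBLP has no hit
crossref_hit = lambda: "@article{stub}"  # CrossRef resolves the DOI
print(first_successful([dblp_miss, crossref_hit]))  # @article{stub}
```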
# Install Zotero Better BibTeX plugin, then:
export ZOTERO_API_KEY="your-zotero-web-api-key"
export ZOTERO_LIBRARY_ID="your-library-id"
export ZOTERO_LIBRARY_TYPE="user" # or "group"
Literature search will query your Zotero library before hitting arXiv.
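The Zotero Web API (v3) exposes library items at /users/&lt;id&gt;/items (or /groups/&lt;id&gt;/items) with a q full-text search parameter and a Zotero-API-Key header. A sketch of the request the literature step could build from those variables; the skill's exact query logic is an assumption:

```python
import os

def zotero_search_request(query: str) -> tuple[str, dict, dict]:
    """Build a Zotero Web API item search from the ZOTERO_* variables.
    Returns (url, params, headers) ready for requests.get."""
    lib = os.environ.get("ZOTERO_LIBRARY_TYPE", "user") + "s"  # "users" or "groups"
    url = f"https://api.zotero.org/{lib}/{os.environ['ZOTERO_LIBRARY_ID']}/items"
    params = {"q": query, "format": "json", "limit": 10}
    headers = {"Zotero-API-Key": os.environ["ZOTERO_API_KEY"]}
    return url, params, headers
```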
export OBSIDIAN_VAULT_PATH="/path/to/your/vault"
The skill will search the vault's markdown notes for related work before making external queries.
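A minimal stand-in for that vault lookup, assuming a plain case-insensitive full-text scan over *.md files (the skill may use something smarter, such as link-graph or embedding search):

```python
import os
from pathlib import Path

def search_vault(term: str) -> list[Path]:
    """Naive case-insensitive scan of the vault's markdown notes."""
    vault = Path(os.environ["OBSIDIAN_VAULT_PATH"])
    return sorted(
        p for p in vault.rglob("*.md")
        if term.lower() in p.read_text(errors="ignore").lower()
    )
```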
export FEISHU_WEBHOOK_URL="https://open.feishu.cn/open-apis/bot/v2/hook/your-token"
export FEISHU_MODE="push" # off | push | interactive
| Mode | Behaviour |
|---|---|
| off | No notifications |
| push | One-way alerts: review scores, experiment completions, checkpoints |
| interactive | Mobile approval buttons at AUTO_PROCEED: false gates |
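In push mode each notification is a single webhook POST. A sketch of the text-message body Feishu custom bots accept (the payload shape follows Feishu's bot webhook convention; the skill's exact message format is an assumption):

```python
import json

def feishu_text_payload(message: str) -> str:
    """JSON body for a Feishu custom-bot webhook text message."""
    return json.dumps({"msg_type": "text", "content": {"text": message}})

body = feishu_text_payload("ARIS: review round 2 scored 6.5/10")
# Send with: requests.post(os.environ["FEISHU_WEBHOOK_URL"], data=body,
#                          headers={"Content-Type": "application/json"})
```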
your-project/
├── idea_discovery_report.md # ranked ideas with novelty scores
├── NARRATIVE_REPORT.md # auto-generated findings narrative
├── paper/
│ ├── main.tex # assembled LaTeX
│ ├── main.pdf # compiled output
│ ├── figures/ # auto-generated plots
│ └── references.bib # real BibTeX from DBLP/CrossRef
├── experiments/
│ ├── pilot_runs/ # idea-discovery GPU pilots
│ └── review_round_*/ # per-round experiment results
└── docs/
└── auto_review_score_curve.png
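The layout above can double as a post-run sanity check. A small helper (hypothetical; not part of the skill) that reports which expected outputs are missing, e.g. at the end of a nightly job:

```python
from pathlib import Path

# Key artifacts from the documented output layout:
EXPECTED = [
    "idea_discovery_report.md",
    "NARRATIVE_REPORT.md",
    "paper/main.pdf",
    "docs/auto_review_score_curve.png",
]

def missing_artifacts(project_dir: str) -> list[str]:
    """Return the expected ARIS outputs absent from project_dir."""
    root = Path(project_dir)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```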
Trigger ARIS workflows programmatically from a Python script (e.g. a cron job or CI step):
import subprocess
import json
from pathlib import Path
def run_aris_pipeline(
    research_direction: str,
    output_dir: str = ".",
    auto_proceed: bool = True,
    human_checkpoint: bool = False,
    arxiv_download: bool = False,
) -> dict:
    """
    Launch the ARIS full pipeline via the Claude Code CLI.
    Returns the parsed score progression from the review curve JSON.
    """
    overrides = ", ".join([
        f"AUTO_PROCEED: {str(auto_proceed).lower()}",
        f"human checkpoint: {str(human_checkpoint).lower()}",
        f"arxiv download: {str(arxiv_download).lower()}",
    ])
    command = f'/research-pipeline "{research_direction}" — {overrides}'
    result = subprocess.run(
        ["claude", "--print", command],
        cwd=output_dir,
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        raise RuntimeError(f"ARIS pipeline failed:\n{result.stderr}")
    # Parse score progression if available
    score_json = Path(output_dir) / "docs" / "review_scores.json"
    if score_json.exists():
        return json.loads(score_json.read_text())
    return {"stdout": result.stdout}

# Example: nightly research job
if __name__ == "__main__":
    scores = run_aris_pipeline(
        research_direction="token-level uncertainty calibration in autoregressive LMs",
        output_dir="./nightly_research",
        auto_proceed=True,
        human_checkpoint=False,
    )
    print(f"Final review score: {scores.get('rounds', [{}])[-1].get('score')}/10")
ARIS ships 20 composable sub-skills. Chain them manually for custom workflows:
# Literature only
/literature-survey "topic"
# Brainstorm without pilot experiments
/idea-brainstorm "topic" — pilot experiments: false
# Single review round (no loop)
/single-review "path/to/draft.md"
# Proof-writing (community skill)
/proof-writer "theorem statement"
# Write paper from existing narrative, skip review
/paper-writing "NARRATIVE.md" — auto-review: false
Codex MCP not found
claude mcp list # verify "codex" appears
codex setup # re-run setup if missing
claude mcp remove codex && \
claude mcp add codex -s user -- codex mcp-server # re-add
Skills not loading in Claude Code
ls ~/.claude/skills/ # verify files copied
# Each skill must be a directory with SKILL.md inside
ls ~/.claude/skills/auto-review-loop/SKILL.md
pdflatex not found during paper writing
# macOS
brew install --cask mactex-no-gui
# Ubuntu/Debian
sudo apt install texlive-full
# Then retry — skill auto-detects pdflatex on PATH
Reviewer returns empty critique
Check ~/.codex/config.toml — ensure model is set and your API key is valid:
codex "say hello" # quick smoke test outside Claude Code
GLM/DeepSeek reviewer not triggering
Verify the llm-chat MCP server is listed:
claude mcp list # should show "llm-chat"
echo $LLM_CHAT_BASE_URL # must be set in the shell that launches claude
Score not improving after 4 rounds
Re-run with — human checkpoint: true and inspect each round's critique file in experiments/review_round_*/.
| Skill | Description |
|---|---|
| proof-writer | Rigorous theorem proof drafting with anti-hallucination citations |
Add your own skill: create skills/your-skill-name/SKILL.md and open a PR.
Claude Code (executor) Codex / external LLM (reviewer)
───────────────────── ───────────────────────────────
Fast, fluid code execution ←→ Deliberate, rigorous critique
Broad context retention Adversarial probing of blind spots
Narrative generation Structural weakness detection
Single-model self-review falls into local minima — the same pattern-matching that generated the work also evaluates it. Cross-model review is adversarial: the reviewer actively probes weaknesses the executor didn't anticipate. The 1→2 model jump produces the largest quality gain; adding more reviewers yields diminishing returns.
Weekly Installs: 355
GitHub Stars: 10
First Seen: 8 days ago
Security Audits: Gen Agent Trust Hub: Fail · Socket: Pass · Snyk: Warn
Installed on: gemini-cli (351), github-copilot (351), codex (351), amp (351), cline (351), kimi-cli (351)