arxiv-mcp by oimiragieo/agent-studio
npx skills add https://github.com/oimiragieo/agent-studio --skill arxiv-mcp

Mode: Cognitive/Prompt-Driven. No standalone utility script; use via agent context.
This skill uses existing tools to access arXiv:
Works immediately - no MCP server, no restart needed.
arxiv-mcp returns academic papers. To prevent memory exhaustion:
Why the limit?
The arXiv API is publicly accessible at http://export.arxiv.org/api/query.
// ✓ GOOD: Limit results to 20
WebFetch({
url: 'http://export.arxiv.org/api/query?search_query=all:transformer+attention&max_results=20&sortBy=relevance',
prompt: 'Extract paper titles, authors, abstracts, arXiv IDs, and PDF links from these results',
});
// ✓ GOOD: Use specific filters to reduce the result set
WebFetch({
url: 'http://export.arxiv.org/api/query?search_query=all:transformer+attention+2025&max_results=20&sortBy=submittedDate',
prompt: 'Extract recent papers on transformer attention',
});
// ✗ BAD: Old behavior - unlimited or >20 results
WebFetch({
url: 'http://export.arxiv.org/api/query?search_query=all:neural+networks',
// Too broad - will return hundreds of results
});
// ✗ BAD: Exceeds memory limit
WebFetch({
url: 'http://export.arxiv.org/api/query?search_query=all:deep+learning&max_results=100',
// Over the limit - memory risk
});
WebFetch({
url: 'http://export.arxiv.org/api/query?search_query=all:transformer+attention&max_results=20&sortBy=relevance',
prompt: 'Extract paper titles, authors, abstracts, arXiv IDs, and PDF links from these results',
});
WebFetch({
url: 'http://export.arxiv.org/api/query?search_query=au:LeCun&max_results=10&sortBy=submittedDate',
prompt: 'Extract paper titles, authors, abstracts, and arXiv IDs',
});
WebFetch({
url: 'http://export.arxiv.org/api/query?search_query=cat:cs.LG&max_results=15&sortBy=submittedDate',
prompt: 'Extract paper titles, authors, abstracts, categories, and arXiv IDs',
});
WebFetch({
url: 'http://export.arxiv.org/api/query?id_list=2301.07041',
prompt: 'Extract full details: title, all authors, abstract, categories, published date, PDF link',
});
| Parameter | Description | Example |
|---|---|---|
| search_query | Search terms with field prefixes | all:transformer, au:LeCun, ti:attention |
| id_list | Comma-separated arXiv IDs | 2301.07041,2302.13971 |
| max_results | Number of results (default 10, max 100) | max_results=20 |
| start | Offset for pagination | start=10 |
| sortBy | Sort order: relevance, lastUpdatedDate, submittedDate | sortBy=submittedDate |
| sortOrder | ascending or descending | sortOrder=descending |
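As a sketch of how these parameters compose, the following Python helper (hypothetical, not part of the skill) builds a query URL with `urllib.parse.urlencode` and clamps `max_results` to the 20-result hard limit described above. Note that `urlencode` percent-encodes the `:` in field prefixes, which the API accepts:

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"

def build_query_url(search_query, max_results=20, start=0,
                    sort_by="relevance", sort_order="descending"):
    # Clamp to 20: this is the skill's hard limit, not an arXiv API
    # restriction (the API itself defaults to 10 and allows up to 100).
    params = {
        "search_query": search_query,
        "max_results": min(max_results, 20),
        "start": start,
        "sortBy": sort_by,
        "sortOrder": sort_order,
    }
    return f"{ARXIV_API}?{urlencode(params)}"

url = build_query_url("cat:cs.LG", max_results=100, sort_by="submittedDate")
# max_results is clamped to 20 even though 100 was requested
```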
| Prefix | Field | Example |
|---|---|---|
| all: | All fields | all:machine+learning |
| ti: | Title | ti:transformer |
| au: | Author | au:Vaswani |
| abs: | Abstract | abs:attention+mechanism |
| cat: | Category | cat:cs.LG |
| co: | Comment | co:accepted |
Combine terms with AND, OR, ANDNOT:
search_query=ti:transformer+AND+abs:attention
search_query=au:LeCun+OR+au:Bengio
search_query=cat:cs.LG+ANDNOT+ti:survey
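In code, these compound queries are just `+`-joined strings; a minimal Python sketch (hypothetical helper):

```python
def combine(op, *terms):
    # Join field-prefixed terms with an arXiv boolean operator
    # (AND, OR, or ANDNOT), using '+' as the URL-safe separator.
    return f"+{op}+".join(terms)

q1 = combine("AND", "ti:transformer", "abs:attention")
q2 = combine("ANDNOT", "cat:cs.LG", "ti:survey")
```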
pnpm search:code or the ripgrep skill on the codebase (Grep/Glob as fallback).
arxiv-mcp is best for:
Use Exa for more natural-language queries with arXiv filtering:
mcp__Exa__web_search_exa({
query: 'site:arxiv.org transformer architecture attention mechanism deep learning',
numResults: 10,
});
mcp__Exa__web_search_exa({
query: 'site:arxiv.org large language model scaling laws 2024',
numResults: 15,
});
mcp__Exa__web_search_exa({
query: 'site:arxiv.org author:"Yann LeCun" deep learning',
numResults: 10,
});
| Category | Field |
|---|---|
| cs.AI | Artificial Intelligence |
| cs.LG | Machine Learning |
| cs.CL | Computation and Language (NLP) |
| cs.CV | Computer Vision |
| cs.SE | Software Engineering |
| cs.CR | Cryptography and Security |
| stat.ML | Machine Learning (Statistics) |
| math.* | Mathematics (all subcategories) |
| physics.* | Physics (all subcategories) |
| q-bio.* | Quantitative Biology |
| econ.* | Economics |
// Start with a broad Exa search for semantic matching
mcp__Exa__web_search_exa({
query: 'site:arxiv.org transformer attention mechanism neural networks',
numResults: 10,
});
// Get details for interesting papers by ID
WebFetch({
url: 'http://export.arxiv.org/api/query?id_list=2301.07041,2302.13971',
prompt: 'Extract full metadata for each paper: title, authors, abstract, categories, PDF URL',
});
// Search by the category of an interesting paper
WebFetch({
url: 'http://export.arxiv.org/api/query?search_query=cat:cs.LG+AND+ti:attention&max_results=10&sortBy=submittedDate',
prompt: 'Find related papers, extract titles and abstracts',
});
// Latest papers in the field
WebFetch({
url: 'http://export.arxiv.org/api/query?search_query=cat:cs.LG&max_results=20&sortBy=submittedDate&sortOrder=descending',
prompt: 'Extract the 20 most recent machine learning papers',
});
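The query endpoint returns an Atom XML feed; one way to pull titles, authors, and IDs out of it is with Python's standard library. This is a sketch against a trimmed sample feed (the structure is assumed from the Atom format; real responses carry more fields such as categories and PDF links):

```python
import xml.etree.ElementTree as ET

ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Trimmed sample of the Atom feed the query endpoint returns.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
<entry>
<id>http://arxiv.org/abs/1706.03762v7</id>
<title>Attention Is All You Need</title>
<summary>The dominant sequence transduction models are based on...</summary>
<author><name>Ashish Vaswani</name></author>
</entry>
</feed>"""

def parse_entries(xml_text):
    # Extract id, title, abstract, and author names from each <entry>.
    root = ET.fromstring(xml_text)
    papers = []
    for entry in root.findall("atom:entry", ATOM):
        papers.append({
            "id": entry.findtext("atom:id", namespaces=ATOM),
            "title": entry.findtext("atom:title", namespaces=ATOM),
            "abstract": entry.findtext("atom:summary", namespaces=ATOM),
            "authors": [a.findtext("atom:name", namespaces=ATOM)
                        for a in entry.findall("atom:author", ATOM)],
        })
    return papers

papers = parse_entries(SAMPLE_FEED)
```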
</execution_process>
<best_practices>
sortBy=submittedDate&sortOrder=descending</best_practices>
Example 1: Search by title and abstract:
WebFetch({
url: 'http://export.arxiv.org/api/query?search_query=ti:transformer+AND+abs:attention&max_results=10&sortBy=relevance',
prompt: 'Extract paper titles, authors, abstracts, and arXiv IDs',
});
Example 2: Find papers by a researcher:
WebFetch({
url: 'http://export.arxiv.org/api/query?search_query=au:Vaswani&max_results=15',
prompt: 'List all papers by this author with titles and dates',
});
Example 3: Get recent ML papers:
WebFetch({
url: 'http://export.arxiv.org/api/query?search_query=cat:cs.LG&max_results=20&sortBy=submittedDate&sortOrder=descending',
prompt: 'Extract the 20 most recent machine learning papers with titles and abstracts',
});
Example 4: Semantic search with Exa:
mcp__Exa__web_search_exa({
query: 'site:arxiv.org multimodal large language models vision 2024',
numResults: 10,
});
Example 5: Get details for a specific paper:
WebFetch({
url: 'http://export.arxiv.org/api/query?id_list=1706.03762',
prompt: "Extract complete details for the 'Attention Is All You Need' paper",
});
</usage_example>
This skill is automatically assigned to:
search_query=neural+networks returns thousands of results; always scope the query with a ti:, au:, cat:, or abs: prefix.

| Anti-Pattern | Why It Fails | Correct Approach |
|---|---|---|
| Using max_results=100 or no limit | Context explosion; 100 papers × 300 bytes = 30KB+ of metadata | Always set max_results=20 (hard limit) |
| Fetching full paper PDFs | A single paper can be 100KB+; kills the context budget | Extract abstract + metadata only via the API |
| Broad query without a field prefix | Returns irrelevant results across all fields | Use a ti:, au:, cat:, or abs: prefix |
| Using only WebFetch for discovery | Misses semantically related papers that don't match exact terms | Use Exa for semantic discovery first |
| Citing paper titles instead of arXiv IDs | Titles can be ambiguous or duplicated | Always include the arXiv ID (e.g., 1706.03762) |
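Since the table insists on citing arXiv IDs rather than titles, a quick sanity check can help. Post-2007 ("new-style") arXiv IDs follow the YYMM.NNNNN pattern: four digits, a dot, four or five digits, and an optional version suffix. A small sketch (hypothetical helper; old-style IDs like cs/0112017 are deliberately not covered):

```python
import re

# New-style (post-2007) arXiv identifier: YYMM.NNNNN with optional vN.
NEW_STYLE_ID = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")

def is_arxiv_id(s):
    # True for strings like "1706.03762" or "2301.07041v2".
    return bool(NEW_STYLE_ID.match(s))
```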
Before starting:
cat .claude/context/memory/learnings.md
After completing, update:
.claude/context/memory/learnings.md
.claude/context/memory/issues.md
.claude/context/memory/decisions.md
ASSUME INTERRUPTION: Your context may reset. If it's not in memory, it didn't happen.
Weekly Installs
82
Repository
GitHub Stars
19
First Seen
Jan 29, 2026
Security Audits
Gen Agent Trust Hub: Pass, Socket: Pass, Snyk: Fail
Installed on
github-copilot (81)
gemini-cli (80)
cursor (80)
kimi-cli (79)
amp (79)
codex (79)