tooluniverse-expression-data-retrieval by mims-harvard/tooluniverse
npx skills add https://github.com/mims-harvard/tooluniverse --skill tooluniverse-expression-data-retrieval
Retrieve gene expression experiments and multi-omics datasets with proper disambiguation and quality assessment.
IMPORTANT: Always use English terms in tool calls (gene names, tissue names, condition descriptions), even if the user writes in another language. Fall back to the original-language terms only if the English search returns no results. Respond in the user's language.
Phase 0: Clarify Query (if ambiguous)
↓
Phase 1: Disambiguate Gene/Condition
↓
Phase 2: Search & Retrieve (Internal)
↓
Phase 3: Report Dataset Profile
Ask the user ONLY if:
Skip clarification for:
If searching by gene, first resolve official identifiers:
from tooluniverse import ToolUniverse
tu = ToolUniverse()
tu.load_tools()
# For gene-focused searches, resolve official symbol first
# This helps construct better search queries
# Example: "p53" → "TP53" (official HGNC symbol)
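The alias-resolution step can be sketched as follows. This is a minimal illustration, assuming a small hand-written alias table: `ALIASES`, `resolve_symbol`, and `build_keywords` are hypothetical names that are not part of ToolUniverse, and a real workflow would look the symbol up in an authoritative resource such as HGNC instead of a hard-coded dictionary.

```python
# Hypothetical alias table for illustration only; not part of ToolUniverse.
ALIASES = {
    "p53": "TP53",
    "her2": "ERBB2",
}


def resolve_symbol(user_term: str) -> str:
    """Return the official symbol for a known alias, else the term unchanged."""
    term = user_term.strip()
    return ALIASES.get(term.lower(), term)


def build_keywords(user_term: str) -> str:
    """Search on the official symbol, keeping the user's alias as a second keyword."""
    alias = user_term.strip()
    official = resolve_symbol(alias)
    if alias.upper() == official.upper():
        return official
    return f"{official} {alias}"


print(build_keywords("p53"))
print(build_keywords("TP53"))
```

Keeping the alias next to the official symbol matches the "include aliases" pattern used in the TP53 example later in this document.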
Gene Disambiguation Checklist:
| User Query Type | Search Strategy |
|---|---|
| Specific accession | Direct retrieval |
| Gene + condition | "[gene] [condition]" + species filter |
| Disease only | "[disease]" + species filter |
| Technology-specific | Add platform keywords (RNA-seq, microarray) |
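One way to read the strategy table is as a small dispatch function. The sketch below is an assumption-laden illustration: `plan_search` and the returned dictionary layout are hypothetical, and the accession pattern covers only the prefixes named in this document (E-MTAB, E-GEOD, S-BSST).

```python
import re

# Prefixes taken from the accession formats mentioned in this document.
ACCESSION_RE = re.compile(r"^(E-MTAB-\d+|E-GEOD-\d+|S-BSST-?\d+)$", re.IGNORECASE)
PLATFORM_WORDS = ("rna-seq", "microarray")


def plan_search(query, species=None):
    """Map a user query onto one of the strategies in the table above."""
    text = query.strip()
    if ACCESSION_RE.match(text):
        # Specific accession: skip searching and retrieve directly.
        return {"strategy": "direct", "accession": text.upper()}
    plan = {"strategy": "keyword", "keywords": text}
    if species:
        plan["species"] = species  # species filter for gene/disease queries
    # Flag queries that already name a technology platform.
    plan["platform_specific"] = any(w in text.lower() for w in PLATFORM_WORDS)
    return plan


print(plan_search("E-MTAB-5214"))
print(plan_search("breast cancer RNA-seq", species="Homo sapiens"))
```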
Search silently. Do NOT narrate the process.
# ArrayExpress search
result = tu.tools.arrayexpress_search_experiments(
    keywords="[gene/disease] [condition]",
    species="[species]",
    limit=20
)
# BioStudies for multi-omics
biostudies_result = tu.tools.biostudies_search_studies(
    query="[keywords]",
    limit=10
)
For top results, retrieve full metadata:
# `accession` is an ArrayExpress accession taken from the search results
# Get details for each relevant experiment
details = tu.tools.arrayexpress_get_experiment_details(
    accession=accession
)
# Get sample information
samples = tu.tools.arrayexpress_get_experiment_samples(
    accession=accession
)
# Get available files
files = tu.tools.arrayexpress_get_experiment_files(
    accession=accession
)
# `study_accession` is a BioStudies accession taken from the search results
# Multi-omics study details
study_details = tu.tools.biostudies_get_study_details(
    accession=study_accession
)
# Study structure
sections = tu.tools.biostudies_get_study_sections(
    accession=study_accession
)
# Available files (separate name so the ArrayExpress file list is not overwritten)
study_files = tu.tools.biostudies_get_study_files(
    accession=study_accession
)
| Primary | Fallback | Notes |
|---|---|---|
| ArrayExpress search | BioStudies search | Use when ArrayExpress returns no results |
| arrayexpress_get_experiment_details | biostudies_get_study_details | E-GEOD may have BioStudies mirror |
| arrayexpress_get_experiment_files | Note "Files unavailable" | Some studies restrict downloads |
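The fallback chain in the table above can be sketched without touching the real API. In this illustration `search_with_fallback` is a hypothetical helper, and the two stand-in functions replace the actual tool calls so the example runs offline; a real caller would pass closures that invoke the ArrayExpress and BioStudies tools with their own parameter names (`keywords` versus `query`).

```python
def search_with_fallback(primary, fallback):
    """Call `primary()`; if it returns nothing, call `fallback()` instead.

    Both arguments are zero-argument callables. Returns (source, results).
    """
    results = primary()
    if results:
        return "primary", results
    return "fallback", fallback()


# Stand-ins for the real tool calls (assumptions, not the ToolUniverse API).
def fake_arrayexpress_search():
    return []  # simulate an empty ArrayExpress result


def fake_biostudies_search():
    return [{"accession": "S-BSST-XXXXX", "title": "placeholder study"}]


source, hits = search_with_fallback(fake_arrayexpress_search, fake_biostudies_search)
print(source, len(hits))
```

Recording which source produced the results makes it straightforward to state in the report that a fallback was used.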
Present as a Dataset Search Report. Hide search process.
# Expression Data: [Query Topic]
**Search Summary**
- Query: [gene/disease] in [species]
- Databases: ArrayExpress, BioStudies
- Results: [N] relevant experiments found
**Data Quality Overview**: [assessment based on criteria below]
---
## Top Experiments
### 1. [E-MTAB-XXXX]: [Title]
| Attribute | Value |
|-----------|-------|
| **Accession** | [accession with link] |
| **Organism** | [species] |
| **Experiment Type** | RNA-seq / Microarray |
| **Platform** | [specific platform] |
| **Samples** | [N] samples |
| **Release Date** | [date] |
**Description**: [Brief description from metadata]
**Experimental Design**:
- Conditions: [treatment vs control, etc.]
- Replicates: [N biological, M technical]
- Tissue/Cell type: [if specified]
**Sample Groups**:
| Group | Samples | Description |
|-------|---------|-------------|
| Control | [N] | [description] |
| Treatment | [N] | [description] |
**Data Files Available**:
| File | Type | Size |
|------|------|------|
| [filename] | Processed data | [size] |
| [filename] | Raw data | [size] |
| [filename] | Sample metadata | [size] |
**Quality Assessment**: ●●● High / ●●○ Medium / ●○○ Low
- Sample size: [adequate/limited]
- Replication: [yes/no]
- Metadata completeness: [complete/partial]
---
### 2. [E-GEOD-XXXXX]: [Title]
[Same structure as above]
---
## Multi-Omics Studies (from BioStudies)
### [S-BSST-XXXXX]: [Title]
| Attribute | Value |
|-----------|-------|
| **Accession** | [accession] |
| **Study Type** | [proteomics/metabolomics/integrated] |
| **Organism** | [species] |
| **Samples** | [N] |
**Data Types Included**:
- [ ] Transcriptomics
- [ ] Proteomics
- [ ] Metabolomics
- [ ] Other: [specify]
---
## Summary Table
| Accession | Type | Samples | Platform | Quality |
|-----------|------|---------|----------|---------|
| [E-MTAB-X] | RNA-seq | [N] | Illumina | ●●● |
| [E-GEOD-X] | Microarray | [N] | Affymetrix | ●●○ |
---
## Recommendations
**For [specific analysis type]**:
- Best experiment: [accession] - [reason]
- Alternative: [accession] - [reason]
**Data Integration Notes**:
- Platform compatibility: [notes on combining datasets]
- Batch considerations: [if applicable]
---
## Data Access
### Direct Download Links
- [E-MTAB-XXXX processed data](link)
- [E-MTAB-XXXX raw data](link)
### Database Links
- ArrayExpress: https://www.ebi.ac.uk/arrayexpress/experiments/[accession]
- BioStudies: https://www.ebi.ac.uk/biostudies/studies/[accession]
Retrieved: [date]
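The summary table in the template can be generated from collected experiment records. This is a sketch only: `summary_table` and the record keys (`accession`, `type`, `samples`, `platform`, `quality`) are hypothetical, since the structure of the real tool responses is not specified in this document.

```python
def summary_table(experiments):
    """Render experiment records as the markdown summary table from the template."""
    header = "| Accession | Type | Samples | Platform | Quality |"
    rule = "|-----------|------|---------|----------|---------|"
    rows = [
        f"| {e['accession']} | {e['type']} | {e['samples']} "
        f"| {e['platform']} | {e['quality']} |"
        for e in experiments
    ]
    return "\n".join([header, rule, *rows])


# Placeholder records; accessions and counts are not real data.
records = [
    {"accession": "E-MTAB-XXXX", "type": "RNA-seq", "samples": 12,
     "platform": "Illumina", "quality": "●●●"},
    {"accession": "E-GEOD-XXXXX", "type": "Microarray", "samples": 6,
     "platform": "Affymetrix", "quality": "●●○"},
]
print(summary_table(records))
```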
Assessment criteria for expression experiments:
| Tier | Symbol | Criteria |
|---|---|---|
| High Quality | ●●● | ≥3 bio replicates, complete metadata, processed data available |
| Medium Quality | ●●○ | 2-3 replicates OR some metadata gaps, data accessible |
| Low Quality | ●○○ | No replicates, sparse metadata, or data access issues |
| Use with Caution | ○○○ | Single sample, no replication, outdated platform |
Include assessment rationale:
**Quality**: ●●● High
- ✓ 4 biological replicates per condition
- ✓ Complete sample annotations
- ✓ Processed and raw data available
- ✓ Recent RNA-seq platform
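The tier table leaves some overlap between tiers, so any implementation has to pick one reading. The sketch below encodes one such reading as an assumption: a single-sample study maps to "Use with Caution", inaccessible data or fewer than two replicates maps to "Low Quality", and the top tier requires all three High Quality criteria. `quality_tier` and its parameters are hypothetical names.

```python
def quality_tier(bio_replicates, metadata_complete, processed_data,
                 data_accessible=True, total_samples=None):
    """Return (symbol, label) under one interpretation of the tier table."""
    if total_samples == 1:
        return "○○○", "Use with Caution"
    if not data_accessible or bio_replicates < 2:
        return "●○○", "Low Quality"
    if bio_replicates >= 3 and metadata_complete and processed_data:
        return "●●●", "High Quality"
    # Two or more replicates with accessible data, but some criterion missing.
    return "●●○", "Medium Quality"


symbol, label = quality_tier(bio_replicates=4, metadata_complete=True,
                             processed_data=True)
print(f"**Quality**: {symbol} {label}")
```

The outdated-platform criterion is left out because it needs a judgment the metadata alone does not provide.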
Every dataset report MUST include:
User: "Find breast cancer RNA-seq data"
result = tu.tools.arrayexpress_search_experiments(
    keywords="breast cancer RNA-seq",
    species="Homo sapiens",
    limit=20
)
→ Report top experiments with quality assessment
User: "Find TP53 expression experiments in mouse"
result = tu.tools.arrayexpress_search_experiments(
    keywords="TP53 p53",  # Include aliases
    species="Mus musculus",
    limit=15
)
→ Report experiments studying this gene
User: "Get details for E-MTAB-5214" → Single experiment profile with all details and files
User: "Find proteomics and transcriptomics studies for liver disease" → Search both ArrayExpress and BioStudies, note integration potential
| Error | Response |
|---|---|
| "No experiments found" | Broaden keywords, remove species filter, try synonyms |
| "Accession not found" | Verify format (E-MTAB-*, E-GEOD-*, S-BSST*), check if withdrawn |
| "Files not available" | Note in report: "Data files restricted by submitter" |
| "API timeout" | Retry once, then note: "(metadata retrieval incomplete)" |
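The "retry once, then note" rule for timeouts can be sketched as a small wrapper. This assumes the underlying call signals a timeout by raising Python's built-in `TimeoutError`; the real tools may report timeouts differently (for example, through an error field in the response), in which case the check would need adapting. `call_with_retry` is a hypothetical helper.

```python
def call_with_retry(fn, retries=1):
    """Call fn(); on TimeoutError, retry up to `retries` more times.

    Returns (result, note). If every attempt times out, result is None and
    note carries the wording suggested in the error table.
    """
    for _ in range(retries + 1):
        try:
            return fn(), ""
        except TimeoutError:
            continue
    return None, "(metadata retrieval incomplete)"


# Simulated call that times out once, then succeeds.
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise TimeoutError("simulated timeout")
    return {"accession": "E-MTAB-XXXX"}


print(call_with_retry(flaky_call))
```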
ArrayExpress (Gene Expression)
| Tool | Purpose |
|---|---|
| arrayexpress_search_experiments | Keyword/species search |
| arrayexpress_get_experiment_details | Full metadata |
| arrayexpress_get_experiment_files | Download links |
| arrayexpress_get_experiment_samples | Sample annotations |
BioStudies (Multi-Omics)
| Tool | Purpose |
|---|---|
| biostudies_search_studies | Multi-omics search |
| biostudies_get_study_details | Study metadata |
| biostudies_get_study_files | Data files |
| biostudies_get_study_sections | Study structure |
ArrayExpress
| Parameter | Description | Example |
|---|---|---|
| keywords | Free text search | "breast cancer RNA-seq" |
| species | Scientific name | "Homo sapiens" |
| array | Platform filter | "Illumina" |
| limit | Max results | 20 |
BioStudies
| Parameter | Description | Example |
|---|---|---|
| query | Free text | "proteomics liver" |
| limit | Max results | 10 |