imaging-data-commons by k-dense-ai/claude-scientific-skills
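npx skills add https://github.com/k-dense-ai/claude-scientific-skills --skill imaging-data-commons
Use the idc-index Python package to query and download public cancer imaging data from the National Cancer Institute Imaging Data Commons (IDC). No authentication is required for data access.
Current IDC data version: v23 (always verify with IDCClient().get_idc_version())
Primary tool: idc-index (GitHub)
Critical: check the package version and upgrade if needed (run this first):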
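import re
import subprocess
import sys
import idc_index
REQUIRED_VERSION = "0.11.10"  # Must match metadata.idc-index in this file
def version_tuple(version):
    # Compare numerically: as plain strings, "0.9.0" would sort above "0.11.10"
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])
installed = idc_index.__version__
if version_tuple(installed) < version_tuple(REQUIRED_VERSION):
    print(f"Upgrading idc-index from {installed} to {REQUIRED_VERSION}...")
    # Use the running interpreter's pip so the upgrade lands in the same environment
    subprocess.run([sys.executable, "-m", "pip", "install", "--upgrade", "--break-system-packages", "idc-index"], check=True)
    print("Upgrade complete. Restart Python to use new version.")
else:
    print(f"idc-index {installed} meets requirement ({REQUIRED_VERSION})")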
Verify the IDC data version and check the current data scale:
from idc_index import IDCClient
client = IDCClient()
# Verify IDC data version (should be "v23")
print(f"IDC data version: {client.get_idc_version()}")
# Get collection count and total series
stats = client.sql_query("""
SELECT
COUNT(DISTINCT collection_id) as collections,
COUNT(DISTINCT analysis_result_id) as analysis_results,
COUNT(DISTINCT PatientID) as patients,
COUNT(DISTINCT StudyInstanceUID) as studies,
COUNT(DISTINCT SeriesInstanceUID) as series,
SUM(instanceCount) as instances,
SUM(series_size_MB)/1000000 as size_TB
FROM index
""")
print(stats)
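Core workflow:
- client.sql_query()
- client.download_from_selection()
- client.get_viewer_URL(seriesInstanceUID=...)
Core sections (inline):
Reference guides (load on demand):
| Guide | When to load |
|---|---|
| index_tables_guide.md | Complex JOINs, schema discovery, DataFrame access |
| use_cases.md | End-to-end workflow examples (training datasets, batch downloads) |
| sql_patterns.md | Quick SQL patterns for filter discovery, annotations, size estimation |
| clinical_data_guide.md | Clinical/tabular data, imaging+clinical joins, value mapping |
| cloud_storage_guide.md | Direct S3/GCS access, versioning, UUID mapping |
| dicomweb_guide.md | DICOMweb endpoints, PACS integration |
| digital_pathology_guide.md | Slide microscopy (SM), annotations (ANN), pathology workflows |
| bigquery_guide.md | Full DICOM metadata, private elements (requires GCP) |
| cli_guide.md | Command-line tools (idc download, manifest files) |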
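IDC adds two grouping levels above the standard DICOM hierarchy (Patient → Study → Series → Instance): collections (collection_id, e.g., tcga_luad, nlst) and analysis results (analysis_result_id).
A patient belongs to exactly one collection. Use collection_id to find original imaging data, which may include annotations deposited along with the images; use analysis_result_id to find AI-generated or expert annotations.
Key identifiers for queries:
| Identifier | Scope | Use for |
|---|---|---|
| collection_id | Dataset grouping | Filtering by project/study |
| PatientID | Patient | Grouping images by patient |
| StudyInstanceUID | DICOM study | Grouping of related series, visualization |
| SeriesInstanceUID | DICOM series | Selecting an individual series, visualization |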
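The idc-index package provides multiple metadata index tables, accessible via SQL or as pandas DataFrames.
Complete index table documentation: Use https://idc-index.readthedocs.io/en/latest/indices_reference.html for a quick check of available tables and columns without executing any code.
Important: Use client.indices_overview to get current table descriptions and column schemas. This is the authoritative source for available columns and their types; always query it when writing SQL or exploring data structure.
| Table | Row granularity | Loaded | Description |
|---|---|---|---|
| index | 1 row = 1 DICOM series | Auto | Primary metadata for all current IDC data |
| prior_versions_index | 1 row = 1 DICOM series | Auto | Series from previous IDC releases; for downloading deprecated data |
| collections_index | 1 row = 1 collection | fetch_index() | Collection-level metadata and descriptions |
| analysis_results_index | 1 row = 1 analysis result collection | fetch_index() | Metadata about derived datasets (annotations, segmentations) |
| clinical_index | 1 row = 1 clinical data column | fetch_index() | Dictionary mapping clinical table columns to collections |
| sm_index | 1 row = 1 slide microscopy series | fetch_index() | Slide microscopy (pathology) series metadata |
| sm_instance_index | 1 row = 1 slide microscopy instance | fetch_index() | Instance-level (SOPInstanceUID) metadata for slide microscopy |
| seg_index | 1 row = 1 DICOM Segmentation series | fetch_index() | Segmentation metadata: algorithm, segment count, reference to source image series |
| ann_index | 1 row = 1 DICOM ANN series | fetch_index() | Microscopy Bulk Simple Annotations series metadata; references the annotated image series |
| ann_group_index | 1 row = 1 annotation group | fetch_index() | Detailed annotation group metadata: graphic type, annotation count, property codes, algorithm |
| contrast_index | 1 row = 1 series with contrast information | fetch_index() | Contrast agent metadata: agent name, ingredient, administration route (CT, MR, PT, XA, RF) |
Auto = loaded automatically when IDCClient() is instantiated
fetch_index() = requires client.fetch_index("table_name") to load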
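The loading rule in the table can be captured in a small lookup. The sketch below is a hypothetical helper, not part of idc-index: the table names are copied from the table above, while `needs_fetch` and `tables_to_fetch` are assumed names introduced only for illustration.

```python
# Hypothetical helper: table names come from the table above,
# the functions themselves are not part of idc-index.
AUTO_LOADED = {"index", "prior_versions_index"}
FETCH_REQUIRED = {
    "collections_index", "analysis_results_index", "clinical_index",
    "sm_index", "sm_instance_index", "seg_index",
    "ann_index", "ann_group_index", "contrast_index",
}


def needs_fetch(table_name: str) -> bool:
    """Return True when the table must be loaded with client.fetch_index() first."""
    if table_name in AUTO_LOADED:
        return False
    if table_name in FETCH_REQUIRED:
        return True
    raise ValueError(f"Unknown index table: {table_name}")


def tables_to_fetch(tables):
    """Keep only the tables that still need an explicit fetch, preserving order."""
    return [t for t in tables if needs_fetch(t)]


print(tables_to_fetch(["index", "seg_index", "collections_index"]))
```

A query that joins several tables could pass its table list through `tables_to_fetch` and call `client.fetch_index()` on each result before running the SQL.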
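Key columns are not explicitly labeled; the following subset can be used in joins.
| Join column | Tables | Use case |
|---|---|---|
| collection_id | index, prior_versions_index, collections_index, clinical_index | Link series to collection metadata or clinical data |
| SeriesInstanceUID | index, prior_versions_index, sm_index, sm_instance_index | Link series across tables; connect to slide microscopy details |
| StudyInstanceUID | index, prior_versions_index | Link studies across current and historical data |
| PatientID | index, prior_versions_index | Link patients across current and historical data |
| analysis_result_id | index, analysis_results_index | Link series to analysis result metadata (annotations, segmentations) |
| source_DOI | index, analysis_results_index | Link by publication DOI |
| crdc_series_uuid | index, prior_versions_index | Link by CRDC unique identifier |
| Modality | index, prior_versions_index | Filter by imaging modality |
| SeriesInstanceUID | index, seg_index, ann_index, ann_group_index, contrast_index | Link segmentation/annotation/contrast series to their index metadata |
| segmented_SeriesInstanceUID | seg_index → index | Link a segmentation to its source image series (join seg_index.segmented_SeriesInstanceUID = index.SeriesInstanceUID) |
| referenced_SeriesInstanceUID | ann_index → index | Link an annotation to its source image series (join ann_index.referenced_SeriesInstanceUID = index.SeriesInstanceUID) |
Note: Subjects, Updated, and Description appear in multiple tables but have different meanings (counts vs identifiers, different update contexts).
For detailed join examples, schema discovery patterns, key columns reference, and DataFrame access, see references/index_tables_guide.md.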
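To make the segmentation-to-source join concrete, here is a minimal sketch that runs the join from the table against toy data in an in-memory SQLite database. It does not use idc-index at all: the two tables only mimic the column names listed above, every UID and value is made up, and `"index"` is quoted because INDEX is a keyword in SQLite (the queries in this document write it unquoted).

```python
import sqlite3

# Toy stand-ins for the real tables; every UID and value below is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "index" (SeriesInstanceUID TEXT, collection_id TEXT, Modality TEXT);
CREATE TABLE seg_index (SeriesInstanceUID TEXT, segmented_SeriesInstanceUID TEXT);
INSERT INTO "index" VALUES ('1.2.3', 'demo_collection', 'CT');
INSERT INTO "index" VALUES ('1.2.4', 'demo_collection', 'SEG');
INSERT INTO seg_index VALUES ('1.2.4', '1.2.3');
""")

# Join each segmentation series to the image series it was derived from.
rows = conn.execute("""
SELECT s.SeriesInstanceUID AS seg_series,
       src.SeriesInstanceUID AS source_series,
       src.Modality AS source_modality
FROM seg_index s
JOIN "index" src ON s.segmented_SeriesInstanceUID = src.SeriesInstanceUID
""").fetchall()
print(rows)
conn.close()
```

The same SELECT shape should carry over to `client.sql_query()` once `seg_index` has been fetched, with the real tables in place of the toy ones.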
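# Fetch clinical index (also downloads clinical data tables)
client.fetch_index("clinical_index")
# Query clinical index to find available tables and their columns
tables = client.sql_query("SELECT DISTINCT table_name, column_label FROM clinical_index")
# Load a specific clinical table as DataFrame
clinical_df = client.get_clinical_table("table_name")
See references/clinical_data_guide.md for detailed workflows, including value mapping patterns and joining clinical data with imaging.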
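| Method | Auth required | Best for |
|---|---|---|
| idc-index | No | Key queries and downloads (recommended) |
| IDC Portal | No | Interactive exploration, manual selection, browser-based download |
| BigQuery | Yes (GCP account) | Complex queries, full DICOM metadata |
| DICOMweb proxy | No | Tool integration via DICOMweb API |
| Cloud storage (S3/GCS) | No | Direct file access, bulk downloads, custom pipelines |
Cloud storage organization
IDC maintains all DICOM files in public cloud storage buckets mirrored between AWS S3 and Google Cloud Storage. Files are organized by CRDC UUIDs (not DICOM UIDs) to support versioning.
| Bucket (AWS / GCS) | License | Content |
|---|---|---|
| idc-open-data / idc-open-data | No commercial restriction | >90% of IDC data |
| idc-open-data-two / idc-open-idc1 | No commercial restriction | Collections with potential head scans |
| idc-open-data-cr / idc-open-cr | Commercial use restricted (CC BY-NC) | ~4% of data |
Files are stored as <crdc_series_uuid>/<crdc_instance_uuid>.dcm. Access is free (no egress fees) and anonymous via AWS CLI, gsutil, or s5cmd. Use the series_aws_url column from the index for S3 URLs; GCS uses the same path structure.
See references/cloud_storage_guide.md for bucket details, access commands, UUID mapping, and versioning.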
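The path layout described above can be sketched as a small URL builder. This is an illustrative helper rather than an idc-index function: the bucket pairs are copied from the table, `instance_urls` is an assumed name, and pairing the two example UUIDs with this bucket is for demonstration only.

```python
# AWS -> GCS bucket names, copied from the bucket table above.
GCS_BUCKET_FOR_AWS = {
    "idc-open-data": "idc-open-data",
    "idc-open-data-two": "idc-open-idc1",
    "idc-open-data-cr": "idc-open-cr",
}


def instance_urls(aws_bucket, crdc_series_uuid, crdc_instance_uuid):
    """Build the S3 and GCS object URLs for one DICOM instance (hypothetical helper)."""
    key = f"{crdc_series_uuid}/{crdc_instance_uuid}.dcm"
    gcs_bucket = GCS_BUCKET_FOR_AWS[aws_bucket]
    return f"s3://{aws_bucket}/{key}", f"gs://{gcs_bucket}/{key}"


# UUIDs reused from examples elsewhere in this document; the pairing is illustrative.
s3_url, gcs_url = instance_urls(
    "idc-open-data-cr",
    "cb09464a-c5cc-4428-9339-d7fa87cfe837",
    "0d73f84e-70ae-4eeb-96a0-1c613b5d9229",
)
print(s3_url)
print(gcs_url)
```

In practice the `series_aws_url` column already provides the S3 prefix, so a builder like this is mainly useful when switching between the AWS and GCS mirrors.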
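DICOMweb access
IDC data is available via a DICOMweb interface (Google Cloud Healthcare API implementation) for integration with PACS systems and DICOMweb-compatible tools.
| Endpoint | Auth | Use case |
|---|---|---|
| Public proxy | No | Testing, moderate queries, daily quota |
| Google Healthcare | Yes (GCP) | Production use, higher quotas |
See references/dicomweb_guide.md for endpoint URLs, code examples, supported operations, and implementation details.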
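Required (for basic access):
pip install --upgrade idc-index
Important: A new IDC data release always triggers a new version of idc-index. Always use the --upgrade flag when installing, unless an older version is needed for reproducibility.
Important: IDC data version v23 is current. Always verify your version:
print(client.get_idc_version())  # Should return "v23"
If you see an older version, upgrade with: pip install --upgrade idc-index
Tested with: idc-index 0.11.10 (IDC data version v23)
Optional (for data analysis):
pip install pandas numpy pydicom
Discover what imaging collections and data are available in IDC:
from idc_index import IDCClient
client = IDCClient()
# Get summary statistics from the primary index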
query = """
SELECT
collection_id,
COUNT(DISTINCT PatientID) as patients,
COUNT(DISTINCT SeriesInstanceUID) as series,
SUM(series_size_MB) as size_mb
FROM index
GROUP BY collection_id
ORDER BY patients DESC
"""
collections_summary = client.sql_query(query)
# For richer collection metadata, use collections_index
client.fetch_index("collections_index")
collections_info = client.sql_query("""
SELECT collection_id, CancerTypes, TumorLocations, Species, Subjects, SupportingData
FROM collections_index
""")
# For analysis results (annotations, segmentations), use analysis_results_index
client.fetch_index("analysis_results_index")
analysis_info = client.sql_query("""
SELECT analysis_result_id, analysis_result_title, Subjects, Collections, Modalities
FROM analysis_results_index
""")
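collections_index provides curated metadata per collection (cancer types, tumor locations, species, subject counts, and supporting data types) without needing to aggregate from the primary index.
analysis_results_index lists derived datasets (AI segmentations, expert annotations, radiomics features) with their source collections and modalities.
Query the IDC mini-index using SQL to find specific datasets.
First, explore the available values for filter columns:
from idc_index import IDCClient
client = IDCClient()
# Check what Modality values exist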
modalities = client.sql_query("""
SELECT DISTINCT Modality, COUNT(*) as series_count
FROM index
GROUP BY Modality
ORDER BY series_count DESC
""")
print(modalities)
# Check what BodyPartExamined values exist for the MR modality
body_parts = client.sql_query("""
SELECT DISTINCT BodyPartExamined, COUNT(*) as series_count
FROM index
WHERE Modality = 'MR' AND BodyPartExamined IS NOT NULL
GROUP BY BodyPartExamined
ORDER BY series_count DESC
LIMIT 20
""")
print(body_parts)
Then query with the validated filter values:
# Find breast MRI scans (use actual values from the exploration above)
results = client.sql_query("""
SELECT
collection_id,
PatientID,
SeriesInstanceUID,
Modality,
SeriesDescription,
license_short_name
FROM index
WHERE Modality = 'MR'
AND BodyPartExamined = 'BREAST'
LIMIT 20
""")
# Access results as a pandas DataFrame
for idx, row in results.iterrows():
    print(f"Patient: {row['PatientID']}, Series: {row['SeriesInstanceUID']}")
To filter by cancer type, join with collections_index:
client.fetch_index("collections_index")
results = client.sql_query("""
SELECT i.collection_id, i.PatientID, i.SeriesInstanceUID, i.Modality
FROM index i
JOIN collections_index c ON i.collection_id = c.collection_id
WHERE c.CancerTypes LIKE '%Breast%'
AND i.Modality = 'MR'
LIMIT 20
""")
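Available metadata fields (use client.indices_overview for the complete list):
Note: Cancer type is in collections_index.CancerTypes, not in the primary index table.
Download imaging data efficiently from IDC's cloud storage:
Download an entire collection:
from idc_index import IDCClient
client = IDCClient()
# Download a small collection (RIDER Pilot, ~1 GB)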
client.download_from_selection(
collection_id="rider_pilot",
downloadDir="./data/rider"
)
Download specific series:
# First, query for series UIDs
series_df = client.sql_query("""
SELECT SeriesInstanceUID
FROM index
WHERE Modality = 'CT'
AND BodyPartExamined = 'CHEST'
AND collection_id = 'nlst'
LIMIT 5
""")
# Download only those series
client.download_from_selection(
seriesInstanceUID=list(series_df['SeriesInstanceUID'].values),
downloadDir="./data/lung_ct"
)
Custom directory structure:
Default dirTemplate: %collection_id/%PatientID/%StudyInstanceUID/%Modality_%SeriesInstanceUID
# Simplified hierarchy (omit the StudyInstanceUID level)
client.download_from_selection(
collection_id="tcga_luad",
downloadDir="./data",
dirTemplate="%collection_id/%PatientID/%Modality"
)
# Results in: ./data/tcga_luad/TCGA-05-4244/CT/
# Flat structure (all files in one directory)
client.download_from_selection(
seriesInstanceUID=list(series_df['SeriesInstanceUID'].values),
downloadDir="./data/flat",
dirTemplate=""
)
# Results in: ./data/flat/*.dcm
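To see how a dirTemplate turns into a directory path, the sketch below re-implements the placeholder substitution in plain Python. It is an illustration only: idc-index performs the real expansion internally, `expand_dir_template` is a hypothetical name, the field list is limited to the placeholders that appear in this document, and the UIDs in the example are made up.

```python
# Hypothetical re-implementation for illustration; idc-index performs the real expansion.
# Field list is an assumption: only the placeholders shown in this document.
TEMPLATE_FIELDS = [
    "collection_id", "PatientID", "StudyInstanceUID", "SeriesInstanceUID", "Modality",
]


def expand_dir_template(template, metadata):
    """Replace each %field placeholder with the matching metadata value."""
    result = template
    # Longest names first so a short field name can never clobber a longer one.
    for field in sorted(TEMPLATE_FIELDS, key=len, reverse=True):
        placeholder = f"%{field}"
        if placeholder in result:
            result = result.replace(placeholder, str(metadata[field]))
    return result


# Made-up series metadata (the UIDs are placeholders, not real IDC identifiers).
series = {
    "collection_id": "tcga_luad",
    "PatientID": "TCGA-05-4244",
    "StudyInstanceUID": "1.2.840.1",
    "SeriesInstanceUID": "1.2.840.1.5",
    "Modality": "CT",
}
print(expand_dir_template("%collection_id/%PatientID/%Modality", series))
print(expand_dir_template(
    "%collection_id/%PatientID/%StudyInstanceUID/%Modality_%SeriesInstanceUID", series
))
print(repr(expand_dir_template("", series)))
```

Previewing paths this way can help check a template before starting a large download.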
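Downloaded file names:
Individual DICOM files are named using their CRDC instance UUID: <crdc_instance_uuid>.dcm (e.g., 0d73f84e-70ae-4eeb-96a0-1c613b5d9229.dcm). This UUID-based naming mirrors the cloud storage layout (s3://idc-open-data/<crdc_series_uuid>/<crdc_instance_uuid>.dcm).
To identify files, use the crdc_instance_uuid column in queries or read DICOM metadata (SOPInstanceUID) from the files.
The idc download command provides command-line access to the download functionality without writing Python code. It is available after installing idc-index.
It auto-detects the input type: a manifest file path or identifiers (collection_id, PatientID, StudyInstanceUID, SeriesInstanceUID, crdc_series_uuid).
# Download an entire collection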
idc download rider_pilot --download-dir ./data
# Download a specific series by UID
idc download "1.3.6.1.4.1.9328.50.1.69736" --download-dir ./data
# Download multiple items (comma-separated)
idc download "tcga_luad,tcga_lusc" --download-dir ./data
# Download from a manifest file (auto-detected)
idc download manifest.txt --download-dir ./data
Options:
| Option | Description |
|---|---|
| --download-dir | Output directory (default: current directory) |
| --dir-template | Directory hierarchy template (default: %collection_id/%PatientID/%StudyInstanceUID/%Modality_%SeriesInstanceUID) |
| --log-level | Verbosity: debug, info, warning, error, critical |
Manifest files:
Manifest files contain S3 URLs, one per line.
Format (one S3 URL per line):
s3://idc-open-data/cb09464a-c5cc-4428-9339-d7fa87cfe837/*
s3://idc-open-data/88f3990d-bdef-49cd-9b2b-4787767240f2/*
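Before handing a manifest to `idc download`, it can be useful to check that every line has the expected shape. The following is a minimal sketch under the assumption that each line looks like the two examples above (`s3://<bucket>/<series uuid>/*`); the regular expression and the `parse_manifest` helper are hypothetical, not part of idc-index.

```python
import re

# Pattern inferred from the two example lines above: s3://<bucket>/<series uuid>/*
MANIFEST_LINE = re.compile(
    r"^s3://(?P<bucket>[a-z0-9-]+)/(?P<series_uuid>[0-9a-f-]{36})/\*$"
)


def parse_manifest(text):
    """Return (bucket, crdc_series_uuid) pairs; raise on lines that do not match."""
    entries = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        match = MANIFEST_LINE.match(line)
        if match is None:
            raise ValueError(f"Line {line_number} is not a series URL: {line!r}")
        entries.append((match["bucket"], match["series_uuid"]))
    return entries


manifest = """s3://idc-open-data/cb09464a-c5cc-4428-9339-d7fa87cfe837/*
s3://idc-open-data/88f3990d-bdef-49cd-9b2b-4787767240f2/*
"""
print(parse_manifest(manifest))
```

Failing fast on a malformed line is a deliberate choice here: a typo in a manifest is easier to fix before a long download starts than after.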
Example: generate a manifest from a Python query:
from idc_index import IDCClient
client = IDCClient()
# Query for series URLs
results = client.sql_query("""
SELECT series_aws_url
FROM index
WHERE collection_id = 'rider_pilot' AND Modality = 'CT'
""")
# Save as a manifest file
with open('ct_manifest.txt', 'w') as f:
    for url in results['series_aws_url']:
        f.write(url + '\n')
Then download:
idc download ct_manifest.txt --download-dir ./ct_data
View DICOM data in the browser without downloading:
from idc_index import IDCClient
import webbrowser
client = IDCClient()
# First query to get valid UIDs
results = client.sql_query("""
SELECT SeriesInstanceUID, StudyInstanceUID
FROM index
WHERE collection_id = 'rider_pilot' AND Modality = 'CT'
LIMIT 1
""")
# View a single series
viewer_url = client.get_viewer_URL(seriesInstanceUID=results.iloc[0]['SeriesInstanceUID'])
webbrowser.open(viewer_url)
# View all series in a study (useful for multi-series exams such as MRI protocols)
viewer_url = client.get_viewer_URL(studyInstanceUID=results.iloc[0]['StudyInstanceUID'])
webbrowser.open(viewer_url)
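The method automatically selects OHIF v3 for radiology or SLIM for slide microscopy. Viewing by study is useful when a DICOM study contains multiple series (e.g., T1, T2, and DWI sequences from a single MRI session).
Check data licensing before use (critical for commercial applications):
from idc_index import IDCClient
client = IDCClient()
# Check licenses for all collections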
query = """
SELECT DISTINCT
collection_id,
license_short_name,
COUNT(DISTINCT SeriesInstanceUID) as series_count
FROM index
GROUP BY collection_id, license_short_name
ORDER BY collection_id
"""
licenses = client.sql_query(query)
print(licenses)
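License types in IDC include CC BY and CC BY-NC (commercial use restricted).
Important: Always check the license before using IDC data in publications or commercial applications. Each DICOM file is tagged with its specific license in metadata.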
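As a rough illustration of acting on the license column, the sketch below filters rows shaped like the license query result. The rule it applies (treat any NonCommercial "NC" variant as restricted) is a simplifying assumption, `allows_commercial_use` is a hypothetical helper, and the rows, including the exact string "CC BY-NC 3.0", are made up for the example.

```python
def allows_commercial_use(license_short_name: str) -> bool:
    """Heuristic: treat any Creative Commons NonCommercial ("NC") variant as restricted."""
    tokens = license_short_name.upper().replace("-", " ").split()
    return "NC" not in tokens


# Rows shaped like the license query result; collection names and counts are made up.
rows = [
    {"collection_id": "demo_a", "license_short_name": "CC BY 4.0", "series_count": 120},
    {"collection_id": "demo_b", "license_short_name": "CC BY-NC 3.0", "series_count": 45},
]
commercial_ok = [
    r["collection_id"] for r in rows if allows_commercial_use(r["license_short_name"])
]
print(commercial_ok)
```

A filter like this only narrows the candidate list; the license text attached to each collection remains the thing to read before commercial use.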
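The source_DOI column contains DOIs linking to publications that describe how the data was generated. To satisfy attribution requirements, use citations_from_selection() to generate properly formatted citations:
from idc_index import IDCClient
client = IDCClient()
# Get citations for a collection (APA format by default)
citations = client.citations_from_selection(collection_id="rider_pilot")
for citation in citations:
    print(citation)
# Get citations for specific series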
results = client.sql_query("""
SELECT SeriesInstanceUID FROM index
WHERE collection_id = 'tcga_luad' LIMIT 5
""")
citations = client.citations_from_selection(
seriesInstanceUID=list(results['SeriesInstanceUID'].values)
)
# Alternative format: BibTeX (for LaTeX documents)
bibtex_citations = client.citations_from_selection(
collection_id="tcga_luad",
citation_format=IDCClient.CITATION_FORMAT_BIBTEX
)
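Parameters:
- collection_id: Filter by collection
- patientId: Filter by patient ID
- studyInstanceUID: Filter by study UID
- seriesInstanceUID: Filter by series UID
- citation_format: Use the IDCClient.CITATION_FORMAT_* constants:
  - CITATION_FORMAT_APA (default): APA style
  - CITATION_FORMAT_BIBTEX: BibTeX for LaTeX
  - CITATION_FORMAT_JSON: CSL JSON
  - CITATION_FORMAT_TURTLE: RDF Turtle
Best practice: When publishing results that use IDC data, include the generated citations to properly attribute the data sources and satisfy license requirements.
Process large datasets efficiently with filtering: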
from idc_index import IDCClient
import pandas as pd
client = IDCClient()
# Find chest CT scans from GE scanners
query = """
SELECT
SeriesInstanceUID,
PatientID,
collection_id,
ManufacturerModelName
FROM index
WHERE Modality = 'CT'
AND BodyPartExamined = 'CHEST'
AND Manufacturer = 'GE MEDICAL SYSTEMS'
AND license_short_name = 'CC BY 4.0'
LIMIT 100
"""
results = client.sql_query(query)
# Save the manifest for later
results.to_csv('lung_ct_manifest.csv', index=False)
# Download in batches to avoid timeouts
batch_size = 10
for i in range(0, len(results), batch_size):
    batch = results.iloc[i:i+batch_size]
    client.download_from_selection(
        seriesInstanceUID=list(batch['SeriesInstanceUID'].values),
        downloadDir=f"./data/batch_{i//batch_size}"
    )
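The batching arithmetic in the loop above can be isolated into a small generator, which makes it easy to check without touching the network. This is a generic sketch: `chunked` is a hypothetical helper and the UIDs are placeholders standing in for `results['SeriesInstanceUID']`.

```python
def chunked(items, batch_size):
    """Yield (batch_number, slice) pairs of at most batch_size items each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield start // batch_size, items[start:start + batch_size]


# Placeholder UIDs; in practice this list would come from results['SeriesInstanceUID'].
series_uids = [f"1.2.3.{n}" for n in range(25)]
for batch_number, batch in chunked(series_uids, 10):
    print(f"batch_{batch_number}: {len(batch)} series")
```

Each yielded batch could then be passed to `client.download_from_selection(seriesInstanceUID=batch, ...)` with a per-batch download directory, as in the loop above.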
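For queries that need full DICOM metadata, complex JOINs, clinical data tables, or private DICOM elements, use Google BigQuery. This requires a GCP account with billing enabled.
Quick reference:
- bigquery-public-data.idc_current.*
- dicom_all (combined metadata)
- dicom_metadata (all DICOM tags)
- OtherElements column (vendor-specific tags such as diffusion b-values)
See references/bigquery_guide.md for setup, table schemas, query patterns, private element access, and cost optimization.
Before using BigQuery, always check whether a specialized index table already contains the metadata you need:
- Use client.indices_overview or the idc-index indices reference to discover all available tables and their columns
- Fetch the relevant index with client.fetch_index("table_name")
- Query locally with client.sql_query() (free, no GCP account needed)
Common specialized indices: seg_index (segmentations), ann_index / ann_group_index (microscopy annotations), sm_index (slide microscopy), collections_index (collection metadata). Use BigQuery only when you need private DICOM elements or attributes that are not in any index.
| Task | Tool | Reference |
|---|---|---|
| Programmatic query and download | idc-index | This document |
| Interactive exploration | IDC Portal | https://portal.imaging.datacommons.cancer.gov/ |
| Complex metadata queries | BigQuery | references/bigquery_guide.md |
| 3D visualization and analysis | SlicerIDCBrowser | https://github.com/ImagingDataCommons/SlicerIDCBrowser |
Default choice: Use idc-index for most tasks (no authentication, easy-to-use API, batch downloads).
Integrate IDC data into imaging analysis workflows:
Read downloaded DICOM files: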
import pydicom
import os
# Read DICOM files from a downloaded series
series_dir = "./data/rider/rider_pilot/RIDER-1007893286/CT_1.3.6.1..."
dicom_files = [os.path.join(series_dir, f) for f in os.listdir(series_dir)
if f.endswith('.dcm')]
# Load the first image
ds = pydicom.dcmread(dicom_files[0])
print(f"Patient ID: {ds.PatientID}")
print(f"Modality: {ds.Modality}")
print(f"Image shape: {ds.pixel_array.shape}")
Build a 3D volume from a CT series:
import pydicom
import numpy as np
from pathlib import Path
def load_ct_series(series_path):
    """Load a CT series as a 3D numpy array."""
    files = sorted(Path(series_path).glob('*.dcm'))
    slices = [pydicom.dcmread(str(f)) for f in files]
    # Sort by slice position (z component of ImagePositionPatient)
    slices.sort(key=lambda x: float(x.ImagePositionPatient[2]))
    # Stack into a 3D array
    volume = np.stack([s.pixel_array for s in slices])
    return volume, slices[0]  # Return the volume and the first slice for metadata
volume, metadata = load_ct_series("./data/lung_ct/series_dir")
print(f"Volume shape: {volume.shape}")  # (z, y, x)
Integrate with SimpleITK:
import SimpleITK as sitk
from pathlib import Path
# Read the DICOM series
series_path = "./data/ct_series"
reader = sitk.ImageSeriesReader()
dicom_names = reader.GetGDCMSeriesFileNames(series_path)
reader.SetFileNames(dicom_names)
image = reader.Execute()
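# Apply processing (CurvatureFlow expects a floating-point image, so cast first)
smoothed = sitk.CurvatureFlow(image1=sitk.Cast(image, sitk.sitkFloat32), timeStep=0.125, numberOfIterations=5)
# Save as NIfTI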
sitk.WriteImage(smoothed, "processed_volume.nii.gz")
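See references/use_cases.md for complete end-to-end workflow examples.
- Run client.get_idc_version() to confirm you are using the expected data version (currently v23); if you see an older version, upgrade with pip install --upgrade idc-index
- Check the license_short_name field and comply with the license terms (CC BY vs CC BY-NC)
- Use citations_from_selection() to obtain properly formatted citations from source_DOI values; include them in publications
- Use a LIMIT clause to avoid long downloads and to understand the data structure
- Organize downloads with a dirTemplate such as %collection_id/%PatientID/%Modality
Problem: ModuleNotFoundError: No module named 'idc_index'
- Install with pip install --upgrade idc-index
Problem: Download fails with a connection timeout
- Use dirTemplate to organize downloads by batch
Problem: BigQuery quota exceeded or billing errors
- See references/bigquery_guide.md for cost optimization tips
Problem: Series UID not found or no data returned
- Test the query with LIMIT 5
Problem: Downloaded DICOM files cannot be opened
- Try pydicom.dcmread(file, force=True)
See references/sql_patterns.md for quick-reference SQL patterns.
See also references/digital_pathology_guide.md for details on segmentations and annotations.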
Use the idc-index Python package to query and download public cancer imaging data from the National Cancer Institute Imaging Data Commons (IDC). No authentication required for data access.
Current IDC Data Version: v23 (always verify with IDCClient().get_idc_version())
Primary tool: idc-index (GitHub)
CRITICAL - Check package version and upgrade if needed (run this FIRST):
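import re
import subprocess
import sys
import idc_index
REQUIRED_VERSION = "0.11.10"  # Must match metadata.idc-index in this file
def version_tuple(version):
    # Compare numerically: as plain strings, "0.9.0" would sort above "0.11.10"
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])
installed = idc_index.__version__
if version_tuple(installed) < version_tuple(REQUIRED_VERSION):
    print(f"Upgrading idc-index from {installed} to {REQUIRED_VERSION}...")
    # Use the running interpreter's pip so the upgrade lands in the same environment
    subprocess.run([sys.executable, "-m", "pip", "install", "--upgrade", "--break-system-packages", "idc-index"], check=True)
    print("Upgrade complete. Restart Python to use new version.")
else:
    print(f"idc-index {installed} meets requirement ({REQUIRED_VERSION})")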
Verify IDC data version and check current data scale:
from idc_index import IDCClient
client = IDCClient()
# Verify IDC data version (should be "v23")
print(f"IDC data version: {client.get_idc_version()}")
# Get collection count and total series
stats = client.sql_query("""
SELECT
COUNT(DISTINCT collection_id) as collections,
COUNT(DISTINCT analysis_result_id) as analysis_results,
COUNT(DISTINCT PatientID) as patients,
COUNT(DISTINCT StudyInstanceUID) as studies,
COUNT(DISTINCT SeriesInstanceUID) as series,
SUM(instanceCount) as instances,
SUM(series_size_MB)/1000000 as size_TB
FROM index
""")
print(stats)
Core workflow:
- client.sql_query()
- client.download_from_selection()
- client.get_viewer_URL(seriesInstanceUID=...)
Core Sections (inline):
Reference Guides (load on demand):
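| Guide | When to Load |
|---|---|
| index_tables_guide.md | Complex JOINs, schema discovery, DataFrame access |
| use_cases.md | End-to-end workflow examples (training datasets, batch downloads) |
| sql_patterns.md | Quick SQL patterns for filter discovery, annotations, size estimation |
| clinical_data_guide.md | Clinical/tabular data, imaging+clinical joins, value mapping |
| cloud_storage_guide.md | Direct S3/GCS access, versioning, UUID mapping |
| dicomweb_guide.md | DICOMweb endpoints, PACS integration |
| digital_pathology_guide.md | Slide microscopy (SM), annotations (ANN), pathology workflows |
| bigquery_guide.md | Full DICOM metadata, private elements (requires GCP) |
| cli_guide.md | Command-line tools (idc download, manifest files) |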
IDC adds two grouping levels above the standard DICOM hierarchy (Patient → Study → Series → Instance): collections (collection_id, e.g., tcga_luad, nlst) and analysis results (analysis_result_id).
A patient belongs to exactly one collection. Use collection_id to find original imaging data, which may include annotations deposited along with the images; use analysis_result_id to find AI-generated or expert annotations.
Key identifiers for queries:
| Identifier | Scope | Use for |
|---|---|---|
| collection_id | Dataset grouping | Filtering by project/study |
| PatientID | Patient | Grouping images by patient |
| StudyInstanceUID | DICOM study | Grouping of related series, visualization |
| SeriesInstanceUID | DICOM series | Selecting an individual series, visualization |
The idc-index package provides multiple metadata index tables, accessible via SQL or as pandas DataFrames.
Complete index table documentation: Use https://idc-index.readthedocs.io/en/latest/indices_reference.html for a quick check of available tables and columns without executing any code.
Important: Use client.indices_overview to get current table descriptions and column schemas. This is the authoritative source for available columns and their types — always query it when writing SQL or exploring data structure.
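| Table | Row Granularity | Loaded | Description |
|---|---|---|---|
| index | 1 row = 1 DICOM series | Auto | Primary metadata for all current IDC data |
| prior_versions_index | 1 row = 1 DICOM series | Auto | Series from previous IDC releases; for downloading deprecated data |
| collections_index | 1 row = 1 collection | fetch_index() | Collection-level metadata and descriptions |
| analysis_results_index | 1 row = 1 analysis result collection | fetch_index() | Metadata about derived datasets (annotations, segmentations) |
| clinical_index | 1 row = 1 clinical data column | fetch_index() | Dictionary mapping clinical table columns to collections |
| sm_index | 1 row = 1 slide microscopy series | fetch_index() | Slide microscopy (pathology) series metadata |
| sm_instance_index | 1 row = 1 slide microscopy instance | fetch_index() | Instance-level (SOPInstanceUID) metadata for slide microscopy |
| seg_index | 1 row = 1 DICOM Segmentation series | fetch_index() | Segmentation metadata: algorithm, segment count, reference to source image series |
| ann_index | 1 row = 1 DICOM ANN series | fetch_index() | Microscopy Bulk Simple Annotations series metadata; references the annotated image series |
| ann_group_index | 1 row = 1 annotation group | fetch_index() | Detailed annotation group metadata: graphic type, annotation count, property codes, algorithm |
| contrast_index | 1 row = 1 series with contrast information | fetch_index() | Contrast agent metadata: agent name, ingredient, administration route (CT, MR, PT, XA, RF) |
Auto = loaded automatically when IDCClient() is instantiated
fetch_index() = requires client.fetch_index("table_name") to load
Key columns are not explicitly labeled; the following subset can be used in joins.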
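| Join Column | Tables | Use Case |
|---|---|---|
| collection_id | index, prior_versions_index, collections_index, clinical_index | Link series to collection metadata or clinical data |
| SeriesInstanceUID | index, prior_versions_index, sm_index, sm_instance_index | Link series across tables; connect to slide microscopy details |
| StudyInstanceUID | index, prior_versions_index | Link studies across current and historical data |
| PatientID | index, prior_versions_index | Link patients across current and historical data |
| analysis_result_id | index, analysis_results_index | Link series to analysis result metadata (annotations, segmentations) |
| source_DOI | index, analysis_results_index | Link by publication DOI |
| crdc_series_uuid | index, prior_versions_index | Link by CRDC unique identifier |
| Modality | index, prior_versions_index | Filter by imaging modality |
| SeriesInstanceUID | index, seg_index, ann_index, ann_group_index, contrast_index | Link segmentation/annotation/contrast series to their index metadata |
| segmented_SeriesInstanceUID | seg_index → index | Link a segmentation to its source image series (join seg_index.segmented_SeriesInstanceUID = index.SeriesInstanceUID) |
| referenced_SeriesInstanceUID | ann_index → index | Link an annotation to its source image series (join ann_index.referenced_SeriesInstanceUID = index.SeriesInstanceUID) |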
Note: Subjects, Updated, and Description appear in multiple tables but have different meanings (counts vs identifiers, different update contexts).
For detailed join examples, schema discovery patterns, key columns reference, and DataFrame access, see references/index_tables_guide.md.
# Fetch clinical index (also downloads clinical data tables)
client.fetch_index("clinical_index")
# Query clinical index to find available tables and their columns
tables = client.sql_query("SELECT DISTINCT table_name, column_label FROM clinical_index")
# Load a specific clinical table as DataFrame
clinical_df = client.get_clinical_table("table_name")
See references/clinical_data_guide.md for detailed workflows including value mapping patterns and joining clinical data with imaging.
| Method | Auth Required | Best For |
|---|---|---|
| idc-index | No | Key queries and downloads (recommended) |
| IDC Portal | No | Interactive exploration, manual selection, browser-based download |
| BigQuery | Yes (GCP account) | Complex queries, full DICOM metadata |
| DICOMweb proxy | No | Tool integration via DICOMweb API |
| Cloud storage (S3/GCS) | No | Direct file access, bulk downloads, custom pipelines |
Cloud storage organization
IDC maintains all DICOM files in public cloud storage buckets mirrored between AWS S3 and Google Cloud Storage. Files are organized by CRDC UUIDs (not DICOM UIDs) to support versioning.
| Bucket (AWS / GCS) | License | Content |
|---|---|---|
| idc-open-data / idc-open-data | No commercial restriction | >90% of IDC data |
| idc-open-data-two / idc-open-idc1 | No commercial restriction | Collections with potential head scans |
| idc-open-data-cr / idc-open-cr | Commercial use restricted (CC BY-NC) | ~4% of data |
Files are stored as <crdc_series_uuid>/<crdc_instance_uuid>.dcm. Access is free (no egress fees) via AWS CLI, gsutil, or s5cmd with anonymous access. Use series_aws_url column from the index for S3 URLs; GCS uses the same path structure.
See references/cloud_storage_guide.md for bucket details, access commands, UUID mapping, and versioning.
DICOMweb access
IDC data is available via DICOMweb interface (Google Cloud Healthcare API implementation) for integration with PACS systems and DICOMweb-compatible tools.
| Endpoint | Auth | Use Case |
|---|---|---|
| Public proxy | No | Testing, moderate queries, daily quota |
| Google Healthcare | Yes (GCP) | Production use, higher quotas |
See references/dicomweb_guide.md for endpoint URLs, code examples, supported operations, and implementation details.
Required (for basic access):
pip install --upgrade idc-index
Important: A new IDC data release always triggers a new version of idc-index. Always use the --upgrade flag when installing, unless an older version is needed for reproducibility.
IMPORTANT: IDC data version v23 is current. Always verify your version:
print(client.get_idc_version()) # Should return "v23"
If you see an older version, upgrade with: pip install --upgrade idc-index
Tested with: idc-index 0.11.10 (IDC data version v23)
Optional (for data analysis):
pip install pandas numpy pydicom
Discover what imaging collections and data are available in IDC:
from idc_index import IDCClient
client = IDCClient()
# Get summary statistics from primary index
query = """
SELECT
collection_id,
COUNT(DISTINCT PatientID) as patients,
COUNT(DISTINCT SeriesInstanceUID) as series,
SUM(series_size_MB) as size_mb
FROM index
GROUP BY collection_id
ORDER BY patients DESC
"""
collections_summary = client.sql_query(query)
# For richer collection metadata, use collections_index
client.fetch_index("collections_index")
collections_info = client.sql_query("""
SELECT collection_id, CancerTypes, TumorLocations, Species, Subjects, SupportingData
FROM collections_index
""")
# For analysis results (annotations, segmentations), use analysis_results_index
client.fetch_index("analysis_results_index")
analysis_info = client.sql_query("""
SELECT analysis_result_id, analysis_result_title, Subjects, Collections, Modalities
FROM analysis_results_index
""")
collections_index provides curated metadata per collection: cancer types, tumor locations, species, subject counts, and supporting data types — without needing to aggregate from the primary index.
analysis_results_index lists derived datasets (AI segmentations, expert annotations, radiomics features) with their source collections and modalities.
Query the IDC mini-index using SQL to find specific datasets.
First, explore available values for filter columns:
from idc_index import IDCClient
client = IDCClient()
# Check what Modality values exist
modalities = client.sql_query("""
SELECT DISTINCT Modality, COUNT(*) as series_count
FROM index
GROUP BY Modality
ORDER BY series_count DESC
""")
print(modalities)
# Check what BodyPartExamined values exist for MR modality
body_parts = client.sql_query("""
SELECT DISTINCT BodyPartExamined, COUNT(*) as series_count
FROM index
WHERE Modality = 'MR' AND BodyPartExamined IS NOT NULL
GROUP BY BodyPartExamined
ORDER BY series_count DESC
LIMIT 20
""")
print(body_parts)
Then query with validated filter values:
# Find breast MRI scans (use actual values from exploration above)
results = client.sql_query("""
SELECT
collection_id,
PatientID,
SeriesInstanceUID,
Modality,
SeriesDescription,
license_short_name
FROM index
WHERE Modality = 'MR'
AND BodyPartExamined = 'BREAST'
LIMIT 20
""")
# Access results as pandas DataFrame
for idx, row in results.iterrows():
    print(f"Patient: {row['PatientID']}, Series: {row['SeriesInstanceUID']}")
To filter by cancer type, join with collections_index:
client.fetch_index("collections_index")
results = client.sql_query("""
SELECT i.collection_id, i.PatientID, i.SeriesInstanceUID, i.Modality
FROM index i
JOIN collections_index c ON i.collection_id = c.collection_id
WHERE c.CancerTypes LIKE '%Breast%'
AND i.Modality = 'MR'
LIMIT 20
""")
Available metadata fields (use client.indices_overview for complete list):
Note: Cancer type is in collections_index.CancerTypes, not in the primary index table.
Download imaging data efficiently from IDC's cloud storage:
Download entire collection:
from idc_index import IDCClient
client = IDCClient()
# Download small collection (RIDER Pilot ~1GB)
client.download_from_selection(
collection_id="rider_pilot",
downloadDir="./data/rider"
)
Download specific series:
# First, query for series UIDs
series_df = client.sql_query("""
SELECT SeriesInstanceUID
FROM index
WHERE Modality = 'CT'
AND BodyPartExamined = 'CHEST'
AND collection_id = 'nlst'
LIMIT 5
""")
# Download only those series
client.download_from_selection(
seriesInstanceUID=list(series_df['SeriesInstanceUID'].values),
downloadDir="./data/lung_ct"
)
Custom directory structure:
Default dirTemplate: %collection_id/%PatientID/%StudyInstanceUID/%Modality_%SeriesInstanceUID
# Simplified hierarchy (omit StudyInstanceUID level)
client.download_from_selection(
collection_id="tcga_luad",
downloadDir="./data",
dirTemplate="%collection_id/%PatientID/%Modality"
)
# Results in: ./data/tcga_luad/TCGA-05-4244/CT/
# Flat structure (all files in one directory)
client.download_from_selection(
seriesInstanceUID=list(series_df['SeriesInstanceUID'].values),
downloadDir="./data/flat",
dirTemplate=""
)
# Results in: ./data/flat/*.dcm
Downloaded file names:
Individual DICOM files are named using their CRDC instance UUID: <crdc_instance_uuid>.dcm (e.g., 0d73f84e-70ae-4eeb-96a0-1c613b5d9229.dcm). This UUID-based naming mirrors the cloud storage layout (s3://idc-open-data/<crdc_series_uuid>/<crdc_instance_uuid>.dcm).
To identify files, use the crdc_instance_uuid column in queries or read DICOM metadata (SOPInstanceUID) from the files.
The idc download command provides command-line access to download functionality without writing Python code. Available after installing idc-index.
Auto-detects input type: manifest file path, or identifiers (collection_id, PatientID, StudyInstanceUID, SeriesInstanceUID, crdc_series_uuid).
# Download entire collection
idc download rider_pilot --download-dir ./data
# Download specific series by UID
idc download "1.3.6.1.4.1.9328.50.1.69736" --download-dir ./data
# Download multiple items (comma-separated)
idc download "tcga_luad,tcga_lusc" --download-dir ./data
# Download from manifest file (auto-detected)
idc download manifest.txt --download-dir ./data
Options:
| Option | Description |
|---|---|
| --download-dir | Output directory (default: current directory) |
| --dir-template | Directory hierarchy template (default: %collection_id/%PatientID/%StudyInstanceUID/%Modality_%SeriesInstanceUID) |
| --log-level | Verbosity: debug, info, warning, error, critical |
Manifest files:
Manifest files contain S3 URLs, one per line.
Format (one S3 URL per line):
s3://idc-open-data/cb09464a-c5cc-4428-9339-d7fa87cfe837/*
s3://idc-open-data/88f3990d-bdef-49cd-9b2b-4787767240f2/*
Example: Generate manifest from Python query:
from idc_index import IDCClient
client = IDCClient()
# Query for series URLs
results = client.sql_query("""
SELECT series_aws_url
FROM index
WHERE collection_id = 'rider_pilot' AND Modality = 'CT'
""")
# Save as manifest file
with open('ct_manifest.txt', 'w') as f:
    for url in results['series_aws_url']:
        f.write(url + '\n')
Then download:
idc download ct_manifest.txt --download-dir ./ct_data
View DICOM data in browser without downloading:
from idc_index import IDCClient
import webbrowser
client = IDCClient()
# First query to get valid UIDs
results = client.sql_query("""
SELECT SeriesInstanceUID, StudyInstanceUID
FROM index
WHERE collection_id = 'rider_pilot' AND Modality = 'CT'
LIMIT 1
""")
# View single series
viewer_url = client.get_viewer_URL(seriesInstanceUID=results.iloc[0]['SeriesInstanceUID'])
webbrowser.open(viewer_url)
# View all series in a study (useful for multi-series exams like MRI protocols)
viewer_url = client.get_viewer_URL(studyInstanceUID=results.iloc[0]['StudyInstanceUID'])
webbrowser.open(viewer_url)
The method automatically selects OHIF v3 for radiology or SLIM for slide microscopy. Viewing by study is useful when a DICOM Study contains multiple Series (e.g., T1, T2, DWI sequences from a single MRI session).
Check data licensing before use (critical for commercial applications):
from idc_index import IDCClient
client = IDCClient()
# Check licenses for all collections
query = """
SELECT DISTINCT
collection_id,
license_short_name,
COUNT(DISTINCT SeriesInstanceUID) as series_count
FROM index
GROUP BY collection_id, license_short_name
ORDER BY collection_id
"""
licenses = client.sql_query(query)
print(licenses)
License types in IDC include CC BY and CC BY-NC (commercial use restricted).
Important: Always check the license before using IDC data in publications or commercial applications. Each DICOM file is tagged with its specific license in metadata.
The source_DOI column contains DOIs linking to publications describing how the data was generated. To satisfy attribution requirements, use citations_from_selection() to generate properly formatted citations:
from idc_index import IDCClient
client = IDCClient()
# Get citations for a collection (APA format by default)
citations = client.citations_from_selection(collection_id="rider_pilot")
for citation in citations:
    print(citation)
# Get citations for specific series
results = client.sql_query("""
SELECT SeriesInstanceUID FROM index
WHERE collection_id = 'tcga_luad' LIMIT 5
""")
citations = client.citations_from_selection(
    seriesInstanceUID=list(results['SeriesInstanceUID'].values)
)
# Alternative format: BibTeX (for LaTeX documents)
bibtex_citations = client.citations_from_selection(
    collection_id="tcga_luad",
    citation_format=IDCClient.CITATION_FORMAT_BIBTEX
)
Parameters:
- collection_id: Filter by collection(s)
- patientId: Filter by patient ID(s)
- studyInstanceUID: Filter by study UID(s)
- seriesInstanceUID: Filter by series UID(s)
- citation_format: Use IDCClient.CITATION_FORMAT_* constants:
  - CITATION_FORMAT_APA (default) - APA style
  - CITATION_FORMAT_BIBTEX - BibTeX for LaTeX
  - CITATION_FORMAT_JSON - CSL JSON

Best practice: When publishing results using IDC data, include the generated citations to properly attribute the data sources and satisfy license requirements.
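When a project draws on several collections or series selections, the same source can be cited more than once. The sketch below merges citation lists and drops duplicates. It assumes `citations_from_selection()` returns a list of strings, as the loop in the example above suggests. The helper `merge_citations` is hypothetical, and the strings are placeholders, not real citations.

```python
def merge_citations(*citation_lists):
    """Combine several citation lists, dropping duplicates while keeping first-seen order."""
    seen = set()
    merged = []
    for citations in citation_lists:
        for citation in citations:
            key = " ".join(citation.split())  # normalize whitespace before comparing
            if key not in seen:
                seen.add(key)
                merged.append(citation)
    return merged

# Placeholder strings standing in for real citations_from_selection() output
first = ["Citation A", "Citation B"]
second = ["Citation B", "Citation C"]
all_citations = merge_citations(first, second)
print(all_citations)
```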
Process large datasets efficiently with filtering:
from idc_index import IDCClient
import pandas as pd
client = IDCClient()
# Find chest CT scans from GE scanners
query = """
SELECT
SeriesInstanceUID,
PatientID,
collection_id,
ManufacturerModelName
FROM index
WHERE Modality = 'CT'
AND BodyPartExamined = 'CHEST'
AND Manufacturer = 'GE MEDICAL SYSTEMS'
AND license_short_name = 'CC BY 4.0'
LIMIT 100
"""
results = client.sql_query(query)
# Save manifest for later
results.to_csv('lung_ct_manifest.csv', index=False)
# Download in batches to avoid timeout
batch_size = 10
for i in range(0, len(results), batch_size):
    batch = results.iloc[i:i+batch_size]
    client.download_from_selection(
        seriesInstanceUID=list(batch['SeriesInstanceUID'].values),
        downloadDir=f"./data/batch_{i//batch_size}"
    )
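The batch loop above stops at the first failed batch. One possible variation, sketched below, retries each batch a few times and reports which batches never succeeded. The download call is injected as `download_fn`, so the retry logic can be exercised without network access. In practice `download_fn` would be a small wrapper around `client.download_from_selection(...)`. The names `chunked`, `download_in_batches`, and `fake_download` are hypothetical.

```python
import time

def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def download_in_batches(series_uids, download_fn, batch_size=10,
                        max_attempts=3, wait_seconds=0.0):
    """Call download_fn(batch, batch_index) for each batch, retrying on failure.

    Returns the indices of batches that still failed after max_attempts.
    """
    failed = []
    for index, batch in enumerate(chunked(series_uids, batch_size)):
        for attempt in range(1, max_attempts + 1):
            try:
                download_fn(batch, index)
                break
            except Exception as exc:  # broad on purpose: the failure type is an unknown here
                print(f"batch {index} attempt {attempt} failed: {exc}")
                if attempt == max_attempts:
                    failed.append(index)
                else:
                    time.sleep(wait_seconds)
    return failed

# Stand-in for a real download wrapper; it fails once on batch 1, then succeeds
calls = []
def fake_download(batch, index):
    calls.append((index, len(batch)))
    if index == 1 and calls.count((index, len(batch))) == 1:
        raise ConnectionError("simulated timeout")

failed = download_in_batches([f"uid-{n}" for n in range(25)], fake_download)
print("failed batches:", failed)
```

Keeping the list of failed batch indices makes it possible to rerun only those batches later instead of restarting the whole download.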
For queries requiring full DICOM metadata, complex JOINs, clinical data tables, or private DICOM elements, use Google BigQuery. Requires GCP account with billing enabled.
Quick reference:
- bigquery-public-data.idc_current.*
- dicom_all (combined metadata)
- dicom_metadata (all DICOM tags)
- OtherElements column (vendor-specific tags like diffusion b-values)

See references/bigquery_guide.md for setup, table schemas, query patterns, private element access, and cost optimization.
Before using BigQuery, always check whether a specialized index table already has the metadata you need:
1. Use client.indices_overview or the idc-index indices reference to discover all available tables and their columns
2. Fetch the relevant index with client.fetch_index("table_name")
3. Query it locally with client.sql_query() (free, no GCP account needed)

Common specialized indices: seg_index (segmentations), ann_index / ann_group_index (microscopy annotations), sm_index (slide microscopy), collections_index (collection metadata). Only use BigQuery if you need private DICOM elements or attributes not in any index.
| Task | Tool | Reference |
|---|---|---|
| Programmatic queries & downloads | idc-index | This document |
| Interactive exploration | IDC Portal | https://portal.imaging.datacommons.cancer.gov/ |
| Complex metadata queries | BigQuery | references/bigquery_guide.md |
| 3D visualization & analysis | SlicerIDCBrowser | https://github.com/ImagingDataCommons/SlicerIDCBrowser |
Default choice: Use idc-index for most tasks (no auth, easy API, batch downloads).
Integrate IDC data into imaging analysis workflows:
Read downloaded DICOM files:
import pydicom
import os
# Read DICOM files from downloaded series
series_dir = "./data/rider/rider_pilot/RIDER-1007893286/CT_1.3.6.1..."
dicom_files = [os.path.join(series_dir, f) for f in os.listdir(series_dir)
if f.endswith('.dcm')]
# Load first image
ds = pydicom.dcmread(dicom_files[0])
print(f"Patient ID: {ds.PatientID}")
print(f"Modality: {ds.Modality}")
print(f"Image shape: {ds.pixel_array.shape}")
Build 3D volume from CT series:
import pydicom
import numpy as np
from pathlib import Path
def load_ct_series(series_path):
    """Load CT series as 3D numpy array"""
    files = sorted(Path(series_path).glob('*.dcm'))
    slices = [pydicom.dcmread(str(f)) for f in files]
    # Sort by slice location
    slices.sort(key=lambda x: float(x.ImagePositionPatient[2]))
    # Stack into 3D array
    volume = np.stack([s.pixel_array for s in slices])
    return volume, slices[0]  # Return volume and first slice for metadata
volume, metadata = load_ct_series("./data/lung_ct/series_dir")
print(f"Volume shape: {volume.shape}") # (z, y, x)
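Sorting by the z-component of `ImagePositionPatient`, as in the function above, yields a usable volume only when the slices are evenly spaced along that axis. The sketch below checks a list of z-coordinates for irregular gaps, such as a missing slice. It is a minimal standard-library check that assumes axial slices. The helper name and the sample coordinates are hypothetical.

```python
def slice_spacing_report(z_positions, tolerance=1e-3):
    """Return (typical_spacing, gaps) for slice z-coordinates in millimetres.

    typical_spacing is the upper median of the spacings between sorted neighbours.
    gaps lists (index, spacing) pairs whose spacing deviates from it by more
    than `tolerance`.
    """
    zs = sorted(float(z) for z in z_positions)
    if len(zs) < 2:
        return None, []
    spacings = [b - a for a, b in zip(zs, zs[1:])]
    typical = sorted(spacings)[len(spacings) // 2]
    gaps = [(i, s) for i, s in enumerate(spacings) if abs(s - typical) > tolerance]
    return typical, gaps

# Hypothetical z-coordinates, e.g. [float(s.ImagePositionPatient[2]) for s in slices]
typical, gaps = slice_spacing_report([0.0, 2.5, 5.0, 10.0, 12.5])
print(typical, gaps)
```

A non-empty `gaps` list suggests the series is incomplete or irregularly sampled, which is worth resolving before treating the stacked array as a uniform volume.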
Integrate with SimpleITK:
import SimpleITK as sitk
from pathlib import Path
# Read DICOM series
series_path = "./data/ct_series"
reader = sitk.ImageSeriesReader()
dicom_names = reader.GetGDCMSeriesFileNames(series_path)
reader.SetFileNames(dicom_names)
image = reader.Execute()
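# Apply processing (CurvatureFlow expects a floating-point image, so cast first)
image = sitk.Cast(image, sitk.sitkFloat32)
smoothed = sitk.CurvatureFlow(image1=image, timeStep=0.125, numberOfIterations=5)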
# Save as NIfTI
sitk.WriteImage(smoothed, "processed_volume.nii.gz")
See references/use_cases.md for complete end-to-end workflow examples.

- Call client.get_idc_version() at the start of a session to confirm you're using the expected data version (currently v23). If the version is older, recommend pip install --upgrade idc-index
- Check the license_short_name field and respect licensing terms (CC BY vs CC BY-NC)
- Use citations_from_selection() to get properly formatted citations from source_DOI values; include these in publications
- Add a LIMIT clause when exploring to avoid long downloads and to understand the data structure
- Organize downloads with a directory template such as %collection_id/%PatientID/%Modality

Issue: ModuleNotFoundError: No module named 'idc_index'
- Install or upgrade the package: pip install --upgrade idc-index

Issue: Download fails with connection timeout
- Use dirTemplate to organize downloads by batch

Issue: BigQuery quota exceeded or billing errors
- See references/bigquery_guide.md for cost optimization tips

Issue: Series UID not found or no data returned
- Use LIMIT 5 to test the query first

Issue: Downloaded DICOM files won't open
- Try pydicom.dcmread(file, force=True)

See references/sql_patterns.md for quick-reference SQL patterns.
For segmentation and annotation details, also see references/digital_pathology_guide.md.
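For the "Downloaded DICOM files won't open" issue above, a quick diagnostic is to check whether a file starts with the standard DICOM Part 10 header: a 128-byte preamble followed by the four bytes `DICM`. Files without that prefix are the case `force=True` is meant for. The sketch below uses only the standard library and writes two synthetic files for demonstration. The helper name `has_dicom_preamble` is hypothetical.

```python
import os
import tempfile

def has_dicom_preamble(path):
    """Return True if the file has a 128-byte preamble followed by b'DICM'."""
    with open(path, "rb") as handle:
        header = handle.read(132)
    return len(header) == 132 and header[128:132] == b"DICM"

# Self-contained demonstration with two synthetic files (not real DICOM data)
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.dcm")
    bad = os.path.join(tmp, "bad.dcm")
    with open(good, "wb") as handle:
        handle.write(b"\x00" * 128 + b"DICM")
    with open(bad, "wb") as handle:
        handle.write(b"not a dicom file")
    results = {"good": has_dicom_preamble(good), "bad": has_dicom_preamble(bad)}
print(results)
```

A `False` result on a downloaded file may also indicate a truncated or interrupted download, in which case fetching the series again is usually the better fix.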
The following skills complement IDC workflows for downstream analysis and visualization:
See references/digital_pathology_guide.md for DICOM-compatible tools (highdicom, wsidicom, TIA-Toolbox, Slim viewer).
Always use client.indices_overview for current column schemas. This ensures accuracy with the installed idc-index version:
# Get all column names and types for any table
schema = client.indices_overview["index"]["schema"]
columns = [(c['name'], c['type'], c.get('description', '')) for c in schema['columns']]
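Once the schema is loaded as above, a small search helper can make column discovery quicker. The sketch below assumes the schema is a dict with a `columns` list whose entries have `name`, `type`, and an optional `description`, matching the access pattern in the snippet above. The helper `find_columns` and the hand-written example schema (including its type strings) are illustrative only.

```python
def find_columns(schema, keyword):
    """Return (name, type) pairs for columns whose name or description mentions keyword."""
    needle = keyword.lower()
    matches = []
    for column in schema["columns"]:
        text = f"{column['name']} {column.get('description', '')}".lower()
        if needle in text:
            matches.append((column["name"], column["type"]))
    return matches

# Hand-written stand-in for client.indices_overview["index"]["schema"]; types are illustrative
example_schema = {
    "columns": [
        {"name": "collection_id", "type": "STRING", "description": "Collection identifier"},
        {"name": "Modality", "type": "STRING"},
        {"name": "series_size_MB", "type": "FLOAT", "description": "Series size in megabytes"},
    ]
}
print(find_columns(example_schema, "size"))
```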
See the Quick Navigation section at the top for the full list of reference guides with decision triggers.
This skill version is available in skill metadata. To check for updates:
| Guide | When to Load |
|---|---|
| dicomweb_guide.md | DICOMweb endpoints, PACS integration |
| digital_pathology_guide.md | Slide microscopy (SM), annotations (ANN), pathology workflows |
| bigquery_guide.md | Full DICOM metadata, private elements (requires GCP) |
| cli_guide.md | Command-line tools (idc download, manifest files) |
| Metadata about derived datasets (annotations, segmentations) |
| Table | Row granularity | Loading | Description |
|---|---|---|---|
| clinical_index | 1 row = 1 clinical data column | fetch_index() | Dictionary mapping clinical table columns to collections |
| sm_index | 1 row = 1 slide microscopy series | fetch_index() | Slide Microscopy (pathology) series metadata |
| sm_instance_index | 1 row = 1 slide microscopy instance | fetch_index() | Instance-level (SOPInstanceUID) metadata for slide microscopy |
| seg_index | 1 row = 1 DICOM Segmentation series | fetch_index() | Segmentation metadata: algorithm, segment count, reference to source image series |
| ann_index | 1 row = 1 DICOM ANN series | fetch_index() | Microscopy Bulk Simple Annotations series metadata; references annotated image series |
| ann_group_index | 1 row = 1 annotation group | fetch_index() | Detailed annotation group metadata: graphic type, annotation count, property codes, algorithm |
| contrast_index | 1 row = 1 series with contrast info | fetch_index() | Contrast agent metadata: agent name, ingredient, administration route (CT, MR, PT, XA, RF) |
| index, analysis_results_index |
| Link series to analysis result metadata (annotations, segmentations) |
| Join column | Tables | Use case |
|---|---|---|
| source_DOI | index, analysis_results_index | Link by publication DOI |
| crdc_series_uuid | index, prior_versions_index | Link by CRDC unique identifier |
| Modality | index, prior_versions_index | Filter by imaging modality |
| SeriesInstanceUID | index, seg_index, ann_index, ann_group_index, contrast_index | Link segmentation/annotation/contrast series to its index metadata |
| segmented_SeriesInstanceUID | seg_index → index | Link segmentation to its source image series (join seg_index.segmented_SeriesInstanceUID = index.SeriesInstanceUID) |
| referenced_SeriesInstanceUID | ann_index → index | Link annotation to its source image series (join ann_index.referenced_SeriesInstanceUID = index.SeriesInstanceUID) |
CITATION_FORMAT_TURTLE - RDF Turtle