Immune Repertoire Analysis Skill: TCR/BCR Sequencing Data Analysis, Clonal Expansion Detection, and Epitope Prediction

tooluniverse-immune-repertoire-analysis by mims-harvard/tooluniverse

136 weekly installs

1,200 GitHub Stars

Install command

npx skills add https://github.com/mims-harvard/tooluniverse --skill tooluniverse-immune-repertoire-analysis

Data analysis, research tools, bioinformatics

ANALYSIS_DETAILS.md - Detailed code snippets for all 8 phases
USE_CASES.md - Complete use cases (immunotherapy, vaccines, autoimmunity, single-cell integration) and best practices

February 19, 2026

ToolUniverse Immune Repertoire Analysis

Comprehensive skill for analyzing T-cell receptor (TCR) and B-cell receptor (BCR) repertoire sequencing data to characterize adaptive immune responses, clonal expansion, and antigen specificity.

Overview

Adaptive immune receptor repertoire sequencing (AIRR-seq) enables comprehensive profiling of T-cell and B-cell populations through high-throughput sequencing of TCR and BCR variable regions. This skill provides an 8-phase workflow for:

Clonotype identification and tracking
Diversity and clonality assessment
V(D)J gene usage analysis
CDR3 sequence characterization
Clonal expansion and convergence detection
Epitope specificity prediction
Integration with single-cell phenotyping
Longitudinal repertoire tracking

Core Workflow

Phase 1: Data Import & Clonotype Definition

Load AIRR-seq data from common formats (MiXCR, ImmunoSEQ, AIRR standard, 10x Genomics VDJ). Standardize columns to: cloneId, count, frequency, cdr3aa, cdr3nt, v_gene, j_gene, chain. Define clonotypes using one of three methods:

cdr3aa: CDR3 amino acid sequence only
cdr3nt: CDR3 nucleotide sequence
vj_cdr3: V gene + J gene + CDR3aa (most common, recommended)

Aggregate by clonotype, sort by count, assign ranks.
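
The aggregation step above can be sketched in plain Python. This is a minimal illustration, not the skill's own implementation: the function name `aggregate_clonotypes`, the `KEY_FIELDS` table, and the toy rows are assumptions made for the example; only the column names and the three method names come from the text.

```python
from collections import defaultdict

# Fields that make up the clonotype key for each definition method.
KEY_FIELDS = {
    "cdr3aa": ("cdr3aa",),
    "cdr3nt": ("cdr3nt",),
    "vj_cdr3": ("v_gene", "j_gene", "cdr3aa"),
}

def aggregate_clonotypes(rows, method="vj_cdr3"):
    """Aggregate rows by clonotype key, sort by count, assign 1-based ranks."""
    fields = KEY_FIELDS[method]
    counts = defaultdict(int)
    for row in rows:
        counts[tuple(row[f] for f in fields)] += row["count"]
    total = sum(counts.values())
    # Descending count; ties are broken by key so the order is deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [
        {"clonotype": "_".join(key), "count": n, "frequency": n / total, "rank": rank}
        for rank, (key, n) in enumerate(ordered, start=1)
    ]

# Toy rows (made-up values) that already use the standardized column names.
rows = [
    {"v_gene": "TRBV7-9", "j_gene": "TRBJ2-1", "cdr3aa": "CASSLF",
     "cdr3nt": "TGTGCCAGCAGCTTATTT", "count": 30},
    {"v_gene": "TRBV7-9", "j_gene": "TRBJ2-1", "cdr3aa": "CASSLF",
     "cdr3nt": "TGTGCCAGCAGCCTGTTC", "count": 10},
    {"v_gene": "TRBV5-1", "j_gene": "TRBJ1-2", "cdr3aa": "CASSLF",
     "cdr3nt": "TGTGCCAGCAGCTTATTT", "count": 5},
    {"v_gene": "TRBV20-1", "j_gene": "TRBJ2-7", "cdr3aa": "CSARDF",
     "cdr3nt": "TGCAGTGCTAGAGATTTC", "count": 55},
]

clonotypes = aggregate_clonotypes(rows, method="vj_cdr3")
for c in clonotypes:
    print(c["rank"], c["clonotype"], c["count"], round(c["frequency"], 2))
```

With `vj_cdr3`, the first two rows merge into one clonotype because they share V gene, J gene, and CDR3aa, while the third row stays separate despite having the same CDR3aa.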

Phase 2: Diversity & Clonality Analysis

Calculate diversity metrics for the repertoire:

Shannon entropy: Overall diversity (higher = more diverse)
Simpson index: Probability that two randomly drawn sequences belong to the same clonotype
Inverse Simpson: Effective number of clonotypes
Gini coefficient: Inequality of the clonotype frequency distribution
Clonality: 1 - Pielou's evenness (higher = more clonal)
Richness: Number of unique clonotypes
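
As a rough sketch of how these metrics relate to each other, the following computes all six from a list of clonotype counts. The function name `diversity_metrics` is hypothetical, natural logarithms are assumed for Shannon entropy, and a single-clonotype repertoire is assigned evenness 0 (clonality 1) by convention; the skill's own code may differ on any of these points.

```python
import math

def diversity_metrics(counts):
    """Compute common diversity metrics from clonotype counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    freqs = sorted(c / total for c in counts)  # ascending, as needed for Gini
    n = len(freqs)

    shannon = sum(-p * math.log(p) for p in freqs)   # natural log (assumption)
    simpson = sum(p * p for p in freqs)              # P(two draws are the same clonotype)
    # Pielou's evenness is undefined for one clonotype; treat it as 0 here.
    evenness = shannon / math.log(n) if n > 1 else 0.0
    # Gini coefficient of the sorted frequencies (which sum to 1).
    gini = sum((2 * i - n - 1) * p for i, p in enumerate(freqs, start=1)) / n

    return {
        "richness": n,
        "shannon_entropy": shannon,
        "simpson": simpson,
        "inverse_simpson": 1.0 / simpson,
        "gini": gini,
        "clonality": 1.0 - evenness,
    }

even = diversity_metrics([10, 10, 10, 10])   # perfectly even toy repertoire
skewed = diversity_metrics([97, 1, 1, 1])    # one dominant clonotype
print(even)
print(skewed)
```

The even repertoire gives an inverse Simpson value equal to its richness and a clonality near zero, while the skewed one pushes both Gini and clonality toward 1.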

Generate rarefaction curves to assess whether sequencing depth is sufficient.
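
One way to build such a curve without random subsampling is the analytical (hypergeometric) expectation of richness at a given depth. The sketch below assumes that approach; `expected_richness` and `rarefaction_curve` are hypothetical names, and the exact binomials from `math.comb` are only practical for small to moderate totals.

```python
from math import comb

def expected_richness(counts, depth):
    """Expected number of clonotypes seen when drawing `depth` reads
    without replacement from a repertoire with the given counts."""
    total = sum(counts)
    if not 0 <= depth <= total:
        raise ValueError("depth must be between 0 and the total read count")
    denom = comb(total, depth)
    # comb(total - c, depth) / denom is the chance a clonotype is missed entirely.
    return sum(1 - comb(total - c, depth) / denom for c in counts)

def rarefaction_curve(counts, depths):
    """Return (depth, expected richness) pairs for the requested depths."""
    return [(d, expected_richness(counts, d)) for d in depths]

counts = [5, 3, 1, 1]  # toy repertoire with 10 reads
curve = rarefaction_curve(counts, [0, 2, 5, 10])
for depth, richness in curve:
    print(depth, round(richness, 3))
```

A curve that is still climbing steeply at the full depth suggests that more sequencing would reveal additional clonotypes; a plateau suggests depth is adequate.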

Phase 3: V(D)J Gene Usage Analysis

Analyze V and J gene usage patterns weighted by clonotype count:

V gene family usage frequencies
J gene family usage frequencies
V-J pairing frequencies
Statistical testing for biased usage (chi-square test vs. uniform expectation)
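
A minimal sketch of count-weighted usage and the chi-square statistic against a uniform expectation follows. The helper names, the rule for deriving a gene family from a gene name (dropping the allele and the part after the hyphen), and the toy rows are assumptions; only the statistic and degrees of freedom are computed here, so a p-value would still need a chi-square distribution function from a statistics library.

```python
from collections import Counter

def gene_family(gene):
    """Assumed convention: 'TRBV7-9*01' -> 'TRBV7'."""
    return gene.split("*")[0].split("-")[0]

def weighted_counts(rows, key_fn):
    """Sum clonotype counts per key (gene, family, or V-J pair)."""
    totals = Counter()
    for row in rows:
        totals[key_fn(row)] += row["count"]
    return totals

def usage_frequencies(totals):
    """Normalize weighted counts to fractions that sum to 1."""
    grand_total = sum(totals.values())
    return {key: n / grand_total for key, n in totals.items()}

def chi_square_vs_uniform(totals):
    """Chi-square statistic and degrees of freedom vs. equal usage
    of every observed category."""
    expected = sum(totals.values()) / len(totals)
    stat = sum((obs - expected) ** 2 / expected for obs in totals.values())
    return stat, len(totals) - 1

# Toy rows (made-up values).
rows = [
    {"v_gene": "TRBV7-9", "j_gene": "TRBJ2-1", "count": 40},
    {"v_gene": "TRBV7-2", "j_gene": "TRBJ2-1", "count": 20},
    {"v_gene": "TRBV20-1", "j_gene": "TRBJ2-7", "count": 40},
]

v_family_counts = weighted_counts(rows, lambda r: gene_family(r["v_gene"]))
v_usage = usage_frequencies(v_family_counts)
pair_usage = usage_frequencies(weighted_counts(rows, lambda r: (r["v_gene"], r["j_gene"])))
stat, dof = chi_square_vs_uniform(v_family_counts)
print(v_usage, stat, dof)
```

Note that the uniform expectation here covers only the categories observed in the sample; genes with zero usage would need to be added explicitly to be counted.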

Phase 4: CDR3 Sequence Analysis

Characterize CDR3 sequences:

Length distribution: Typical TCR CDR3 = 12-18 aa; BCR CDR3 = 10-20 aa
Amino acid composition: Weighted by clonotype frequency
Flag unusual length distributions (may indicate PCR bias)
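
The three checks above could look roughly like this. All function names are hypothetical, the default range uses the TCR figures quoted above, and the rows are made-up toy data; weighting by count is equivalent to weighting by frequency once the results are normalized.

```python
from collections import Counter

def length_distribution(rows):
    """Count-weighted fraction of the repertoire at each CDR3 length."""
    weights = Counter()
    for row in rows:
        weights[len(row["cdr3aa"])] += row["count"]
    total = sum(weights.values())
    return {length: w / total for length, w in sorted(weights.items())}

def aa_composition(rows):
    """Count-weighted fraction of each amino acid across all CDR3 residues."""
    weights = Counter()
    for row in rows:
        for residue in row["cdr3aa"]:
            weights[residue] += row["count"]
    total = sum(weights.values())
    return {aa: w / total for aa, w in weights.items()}

def fraction_outside(rows, low=12, high=18):
    """Fraction of the repertoire whose CDR3 length falls outside [low, high]."""
    total = sum(r["count"] for r in rows)
    outside = sum(r["count"] for r in rows if not low <= len(r["cdr3aa"]) <= high)
    return outside / total

# Toy rows (made-up values).
rows = [
    {"cdr3aa": "CASSLGQGNEQFF", "count": 90},  # 13 aa
    {"cdr3aa": "CASSF", "count": 10},          # 5 aa, unusually short
]

lengths = length_distribution(rows)
composition = aa_composition(rows)
print(lengths)
print("outside typical TCR range:", fraction_outside(rows))
```

What fraction counts as "unusual" is a judgment call; a cutoff would have to be chosen per dataset.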

Phase 5: Clonal Expansion Detection

Identify expanded clonotypes above a frequency threshold (default: 95th percentile). Track clonotypes longitudinally across multiple timepoints to measure persistence, mean/max frequency, and fold changes.
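
A small sketch of both steps, under stated assumptions: the percentile is taken with the nearest-rank method, "above" is read as strictly greater than the threshold, and fold change is last versus first timepoint with a small pseudocount so that absent clonotypes do not divide by zero. Function names and data are hypothetical.

```python
import math

def expansion_threshold(freqs, percentile=95.0):
    """Nearest-rank percentile of the clonotype frequencies."""
    ordered = sorted(freqs)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]

def detect_expanded(freq_by_clone, percentile=95.0):
    """Clonotypes whose frequency is strictly above the percentile threshold."""
    threshold = expansion_threshold(freq_by_clone.values(), percentile)
    return {c: f for c, f in freq_by_clone.items() if f > threshold}

def track_clonotype(clone, timepoints, pseudocount=1e-6):
    """Summarize one clonotype across an ordered list of
    {clonotype: frequency} dicts, one per timepoint."""
    series = [tp.get(clone, 0.0) for tp in timepoints]
    return {
        "persistence": sum(f > 0 for f in series),  # timepoints where detected
        "mean_frequency": sum(series) / len(series),
        "max_frequency": max(series),
        "fold_change": (series[-1] + pseudocount) / (series[0] + pseudocount),
    }

# Toy repertoire: 19 small clonotypes and one dominant clonotype.
freq_by_clone = {f"clone{i}": 0.02 for i in range(19)}
freq_by_clone["big"] = 0.62
expanded = detect_expanded(freq_by_clone)

timepoints = [{"big": 0.10}, {"big": 0.30}, freq_by_clone]
summary = track_clonotype("big", timepoints)
print(sorted(expanded), summary)
```

Because a percentile threshold is relative, it always flags roughly the top 5% of clonotypes, even in a repertoire with no real expansion; an absolute frequency cutoff may be a useful complement.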

Phase 6: Convergence & Public Clonotypes

Convergent recombination: The same CDR3 amino acid sequence encoded by different nucleotide sequences (evidence of antigen-driven selection)
Public clonotypes: Clonotypes shared across multiple samples/individuals (may indicate common antigen responses)
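
Both definitions reduce to grouping by CDR3 amino acid sequence, as in this sketch. The function names, the `min_samples` parameter, and the toy samples are illustrative assumptions; matching on CDR3aa alone (rather than V + J + CDR3aa) is also a choice made for the example.

```python
from collections import defaultdict

def convergent_cdr3(rows):
    """CDR3aa sequences encoded by more than one distinct nucleotide sequence."""
    encodings = defaultdict(set)
    for row in rows:
        encodings[row["cdr3aa"]].add(row["cdr3nt"])
    return {aa: sorted(nts) for aa, nts in encodings.items() if len(nts) > 1}

def public_clonotypes(samples, min_samples=2):
    """CDR3aa sequences found in at least `min_samples` samples.
    `samples` maps a sample name to its list of rows."""
    seen_in = defaultdict(set)
    for name, rows in samples.items():
        for row in rows:
            seen_in[row["cdr3aa"]].add(name)
    return {aa: sorted(names) for aa, names in seen_in.items()
            if len(names) >= min_samples}

# Toy samples (made-up values); both nucleotide strings translate to CASSLF.
samples = {
    "donor1": [
        {"cdr3aa": "CASSLF", "cdr3nt": "TGTGCCAGCAGCTTATTT"},
        {"cdr3aa": "CASSLF", "cdr3nt": "TGTGCCAGCAGCCTGTTC"},
        {"cdr3aa": "CSARDF", "cdr3nt": "TGCAGTGCTAGAGATTTC"},
    ],
    "donor2": [
        {"cdr3aa": "CASSLF", "cdr3nt": "TGTGCCAGCAGCTTATTT"},
    ],
}

convergent = convergent_cdr3(samples["donor1"])
public = public_clonotypes(samples)
print(convergent)
print(public)
```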

Phase 7: Epitope Prediction & Specificity

Query epitope databases for known TCR-epitope associations:

IEDB (iedb_search_tcell_assays): Search T-cell assay records by sequence or MHC class; use iedb_search_epitopes with sequence_contains for motif search
BVBRC (BVBRC_search_epitopes): Best for organism-based epitope discovery (e.g., taxon_id="2697049" for SARS-CoV-2); returns epitope sequences with T-cell/B-cell assay counts
VDJdb (manual): https://vdjdb.cdr3.net/search
PubMed literature (PubMed_search_articles): Search for CDR3 + epitope/antigen/specificity
IEDB detail tools: iedb_get_epitope_antigens (link epitope→antigen), iedb_get_epitope_mhc (MHC restriction)
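
The two database lookups above might be packaged as query payloads like this. The tool names and the `sequence_contains` and `taxon_id` parameters come from the list above; the `{"name": ..., "arguments": ...}` payload shape and the `tu.run(...)` call are assumptions about the ToolUniverse client and should be checked against its documentation. The peptide string is just an example motif.

```python
def build_epitope_queries(epitope_motif, taxon_id):
    """Build query payloads for a motif search and an organism-based search."""
    return [
        {"name": "iedb_search_epitopes",
         "arguments": {"sequence_contains": epitope_motif}},
        {"name": "BVBRC_search_epitopes",
         "arguments": {"taxon_id": taxon_id}},
    ]

def run_queries(tu, queries):
    """Run each payload through a ToolUniverse-like client.
    Assumes the client exposes run(payload); not verified here."""
    return {q["name"]: tu.run(q) for q in queries}

# Example motif string and the SARS-CoV-2 taxon ID quoted above.
queries = build_epitope_queries("YLQPRTFLL", "2697049")
for q in queries:
    print(q)

# Hypothetical usage (requires the tooluniverse package and network access):
# from tooluniverse import ToolUniverse
# tu = ToolUniverse()
# tu.load_tools()
# results = run_queries(tu, queries)
```

Keeping payload construction separate from execution makes it easy to inspect or log the queries before any network call is made.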

Phase 8: Integration with Single-Cell Data

Link TCR/BCR clonotypes to cell phenotypes from paired single-cell RNA-seq:

Map clonotypes to cell barcodes
Identify expanded clonotype phenotypes on UMAP
Analyze clonotype-cluster associations (cross-tabulation)
Find cluster-specific clonotypes (>80% of a clonotype's cells in one cluster)
Differential gene expression: expanded vs. non-expanded cells
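
The cross-tabulation and the 80% rule can be sketched without any single-cell library. Function names, the `min_cells` guard (so that single-cell clonotypes are not trivially "specific"), and the toy barcodes and cluster labels are assumptions made for the example.

```python
from collections import Counter, defaultdict

def clonotype_cluster_table(cells):
    """Cross-tabulate cells: clonotype -> Counter of cluster labels."""
    table = defaultdict(Counter)
    for cell in cells:
        table[cell["clonotype"]][cell["cluster"]] += 1
    return table

def cluster_specific(table, min_fraction=0.8, min_cells=2):
    """Clonotypes with more than `min_fraction` of their cells in one cluster."""
    result = {}
    for clone, by_cluster in table.items():
        total = sum(by_cluster.values())
        cluster, n = by_cluster.most_common(1)[0]
        if total >= min_cells and n / total > min_fraction:
            result[clone] = (cluster, n / total)
    return result

# Toy cells (made-up barcodes and labels), one dict per cell barcode.
cells = (
    [{"barcode": f"A{i}", "clonotype": "cloneA", "cluster": "CD8_effector"} for i in range(9)]
    + [{"barcode": "A9", "clonotype": "cloneA", "cluster": "CD8_naive"}]
    + [{"barcode": f"B{i}", "clonotype": "cloneB", "cluster": "CD8_effector"} for i in range(2)]
    + [{"barcode": f"B{i}", "clonotype": "cloneB", "cluster": "CD4_memory"} for i in range(2, 4)]
    + [{"barcode": "C0", "clonotype": "cloneC", "cluster": "CD8_naive"}]
)

table = clonotype_cluster_table(cells)
specific = cluster_specific(table)
print(dict(table["cloneA"]), specific)
```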

ToolUniverse Tool Integration

Key Tools Used:

iedb_search_tcell_assays - T-cell assay records (sequence, MHC class filters)
iedb_search_bcell - B-cell assay records
iedb_search_epitopes - Epitope motif search via sequence_contains
BVBRC_search_epitopes - Organism-based epitope discovery (best for pathogen-specific queries)
NCBI_SRA_search_runs - Find public TCR/BCR-seq datasets (use strategy="AMPLICON")
ImmPort_search_studies - NIAID immunology studies (vaccine trials, flow cytometry)
PubMed_search_articles - Literature on TCR/BCR specificity
UniProt_get_entry_by_accession - Antigen protein information

Integration with Other Skills:

tooluniverse-single-cell - Single-cell transcriptomics
tooluniverse-rnaseq-deseq2 - Bulk RNA-seq analysis
tooluniverse-variant-analysis - Somatic hypermutation analysis (BCR)

Quick Start

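from tooluniverse import ToolUniverse

# Note: load_airr_data, define_clonotypes, and the other helpers used below are
# not imported here; this sketch assumes they are defined as in ANALYSIS_DETAILS.md.

# 1. Load data
tcr_data = load_airr_data("clonotypes.txt", format='mixcr')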

# 2. Define clonotypes
clonotypes = define_clonotypes(tcr_data, method='vj_cdr3')

# 3. Calculate diversity
diversity = calculate_diversity(clonotypes['count'])
print(f"Shannon entropy: {diversity['shannon_entropy']:.2f}")

# 4. Detect expanded clones
expansion = detect_expanded_clones(clonotypes)
print(f"Expanded clonotypes: {expansion['n_expanded']}")

# 5. Analyze V(D)J usage
vdj_usage = analyze_vdj_usage(tcr_data)

# 6. Query epitope databases
top_clones = expansion['expanded_clonotypes']['clonotype'].head(10)
epitopes = query_epitope_database(top_clones)
