ETE Toolkit: Python phylogenetic analysis library for tree manipulation, gene tree analysis, and NCBI taxonomy integration

etetoolkit by davila7/claude-code-templates

173 weekly installs

24,200 GitHub Stars


Install command

npx skills add https://github.com/davila7/claude-code-templates --skill etetoolkit

Python, Web Framework, Research Tools, Bioinformatics


ETE Toolkit Skill

Overview

ETE (Environment for Tree Exploration) is a toolkit for phylogenetic and hierarchical tree analysis. Manipulate trees, analyze evolutionary events, visualize results, and integrate with biological databases for phylogenomic research and clustering analysis.

Core Capabilities

1. Tree Manipulation and Analysis

Load, manipulate, and analyze hierarchical tree structures with support for:

Tree I/O: Read and write Newick, NHX, PhyloXML, and NeXML formats
Tree traversal: Navigate trees using preorder, postorder, or levelorder strategies
Topology modification: Prune, root, collapse nodes, resolve polytomies
Distance calculations: Compute branch lengths and topological distances between nodes
Tree comparison: Calculate Robinson-Foulds distances and identify topological differences
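ETE's `traverse()` method takes a `strategy` argument (`"preorder"`, `"postorder"` or `"levelorder"`). To show how the three orders differ, here is a small pure-Python sketch on a toy tree stored as nested tuples (an assumption made for illustration; ETE nodes are objects, not tuples):

```python
from collections import deque

# Toy tree as nested tuples: a tuple is an internal node, a string is a leaf.
# This is an illustration only, not ETE's node model.
TREE = (("A", "B"), ("C", ("D", "E")))

def children(node):
    return list(node) if isinstance(node, tuple) else []

def leaves(node):
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]

def label(node):
    # Name internal nodes after their sorted leaves, e.g. ("A", "B") -> "AB"
    return node if isinstance(node, str) else "".join(sorted(leaves(node)))

def preorder(node):
    # Parent before children (top-down)
    yield node
    for child in children(node):
        yield from preorder(child)

def postorder(node):
    # Children before parent (bottom-up)
    for child in children(node):
        yield from postorder(child)
    yield node

def levelorder(node):
    # Breadth-first: every node at depth d comes before any node at depth d + 1
    queue = deque([node])
    while queue:
        current = queue.popleft()
        yield current
        queue.extend(children(current))

print([label(n) for n in preorder(TREE)])
print([label(n) for n in postorder(TREE)])
print([label(n) for n in levelorder(TREE)])
```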

Common patterns:

from ete3 import Tree

# Load tree from file
tree = Tree("tree.nw", format=1)

# Basic statistics
print(f"Leaves: {len(tree)}")
print(f"Total nodes: {len(list(tree.traverse()))}")

# Prune to taxa of interest
taxa_to_keep = ["species1", "species2", "species3"]
tree.prune(taxa_to_keep, preserve_branch_length=True)

# Midpoint root
midpoint = tree.get_midpoint_outgroup()
tree.set_outgroup(midpoint)

# Save modified tree
tree.write(outfile="rooted_tree.nw")
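Midpoint rooting, as done above with `get_midpoint_outgroup()`, places the root halfway along the longest path between two leaves. A pure-Python sketch (not ETE code) of the two distance notions involved: patristic distance, the sum of branch lengths between two nodes, and topological distance, the number of edges. The toy tree and its child-to-parent representation are assumptions for illustration:

```python
from itertools import combinations

# Toy rooted tree ((A:1,B:2)X:1,(C:3,D:1)Y:2)R; stored as child -> (parent, branch length).
# Hypothetical example data for illustration only.
PARENT = {
    "A": ("X", 1.0), "B": ("X", 2.0),
    "C": ("Y", 3.0), "D": ("Y", 1.0),
    "X": ("R", 1.0), "Y": ("R", 2.0),
}
LEAVES = ["A", "B", "C", "D"]

def ancestors(node):
    # Map the node and each of its ancestors -> (summed branch length, number of edges)
    result = {node: (0.0, 0)}
    total, steps = 0.0, 0
    while node in PARENT:
        node, dist = PARENT[node]
        total += dist
        steps += 1
        result[node] = (total, steps)
    return result

def distances(a, b):
    # Return (patristic distance, topological distance) between two nodes
    anc_a, anc_b = ancestors(a), ancestors(b)
    # The last common ancestor is the shared ancestor closest to both nodes
    lca = min(set(anc_a) & set(anc_b), key=lambda n: anc_a[n][1] + anc_b[n][1])
    return anc_a[lca][0] + anc_b[lca][0], anc_a[lca][1] + anc_b[lca][1]

pairs = {(a, b): distances(a, b) for a, b in combinations(LEAVES, 2)}
farthest = max(pairs, key=lambda pair: pairs[pair][0])
# A midpoint root lies halfway along the longest leaf-to-leaf path
print(farthest, pairs[farthest][0] / 2)  # ('B', 'C') 4.0
```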

Use scripts/tree_operations.py for command-line tree manipulation:

# Display tree statistics
python scripts/tree_operations.py stats tree.nw

# Convert format
python scripts/tree_operations.py convert tree.nw output.nw --in-format 0 --out-format 1

# Reroot tree
python scripts/tree_operations.py reroot tree.nw rooted.nw --midpoint

# Prune to specific taxa
python scripts/tree_operations.py prune tree.nw pruned.nw --keep-taxa "sp1,sp2,sp3"

# Show ASCII visualization
python scripts/tree_operations.py ascii tree.nw

2. Phylogenetic Analysis

Analyze gene trees with evolutionary event detection:

Sequence alignment integration: Link trees to multiple sequence alignments (FASTA, Phylip)
Species naming: Automatic or custom species extraction from gene names
Evolutionary events: Detect duplication and speciation events using Species Overlap or tree reconciliation
Orthology detection: Identify orthologs and paralogs based on evolutionary events
Gene family analysis: Split trees by duplications, collapse lineage-specific expansions
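The species overlap algorithm labels an internal node of a gene tree as a duplication when its child lineages share at least one species, and as a speciation otherwise. A minimal pure-Python sketch of the strict form of that rule, assuming a nested-tuple tree and `SPECIES_gene` leaf names with made-up species codes. ETE's own implementation works on `PhyloTree` nodes and can also apply a species overlap score threshold, which this sketch leaves out:

```python
# Toy gene tree as nested tuples; leaves are "SPECIES_gene" names (assumed convention).
GENE_TREE = ((("Hsa_1", "Ptr_1"), ("Hsa_2", "Ptr_2")), "Mmu_1")

def species_of(gene):
    # Same rule as the get_species() function in the workflow that follows
    return gene.split("_")[0]

def label_events(node, events):
    """Return (species set, leaf names) for `node` and record an event type
    ("D" or "S") in `events` for every internal node, keyed by its sorted leaves."""
    if isinstance(node, str):
        return {species_of(node)}, [node]
    results = [label_events(child, events) for child in node]
    species_sets = [species for species, _ in results]
    leaf_names = sorted(name for _, names in results for name in names)
    # Strict species overlap: a species present in two child lineages implies a duplication
    overlap = any(
        species_sets[i] & species_sets[j]
        for i in range(len(species_sets))
        for j in range(i + 1, len(species_sets))
    )
    events[tuple(leaf_names)] = "D" if overlap else "S"
    return set().union(*species_sets), leaf_names

events = {}
label_events(GENE_TREE, events)
for leaf_names, etype in events.items():
    print(etype, leaf_names)
```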

Workflow for gene tree analysis:

from ete3 import PhyloTree

# Load gene tree with alignment
tree = PhyloTree("gene_tree.nw", alignment="alignment.fasta")

# Set species naming function
def get_species(gene_name):
    return gene_name.split("_")[0]

tree.set_species_naming_function(get_species)

# Detect evolutionary events
events = tree.get_descendant_evol_events()

# Analyze events
for node in tree.traverse():
    if hasattr(node, "evoltype"):
        if node.evoltype == "D":
            print(f"Duplication at {node.name}")
        elif node.evoltype == "S":
            print(f"Speciation at {node.name}")

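# Split the tree at duplications into speciation trees (candidate ortholog groups).
# get_speciation_trees() returns a tuple: (number of trees, number of duplications, iterator)
ntrees, ndups, sptrees = tree.get_speciation_trees()
for i, ortho_tree in enumerate(sptrees):
    ortho_tree.write(outfile=f"ortholog_group_{i}.nw")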

Finding orthologs and paralogs:

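# Find orthologs and paralogs of a query gene. EvolEvent.in_seqs and
# out_seqs hold leaf names (strings), so compare by name, not by node.
query_name = "species1_gene1"

orthologs = []
paralogs = []

for event in events:
    # The query can fall on either side of the split
    if query_name in event.in_seqs:
        partners = event.out_seqs
    elif query_name in event.out_seqs:
        partners = event.in_seqs
    else:
        continue
    if event.etype == "S":
        orthologs.extend(partners)
    elif event.etype == "D":
        paralogs.extend(partners)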
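The loop above applies the standard orthology definition: two genes are orthologs when the node where their lineages split is a speciation event, and paralogs when it is a duplication. The same rule for a single pair of genes, sketched in pure Python with a strict species-overlap test at the last common ancestor (the toy tree and species codes are assumptions for illustration, not ETE code):

```python
# Toy gene tree (nested tuples) with assumed "SPECIES_gene" leaf names
GENE_TREE = ((("Hsa_1", "Ptr_1"), ("Hsa_2", "Ptr_2")), "Mmu_1")

def leaves(node):
    return [node] if isinstance(node, str) else [n for child in node for n in leaves(child)]

def species(node):
    return {name.split("_")[0] for name in leaves(node)}

def lca(node, a, b):
    # Descend while one child still contains both genes (a and b must differ)
    while True:
        below = [child for child in node if a in leaves(child) and b in leaves(child)]
        if not below:
            return node
        node = below[0]

def relationship(tree, a, b):
    # Orthologs if the lineages split at a speciation, paralogs if at a duplication
    sets = [species(child) for child in lca(tree, a, b)]
    duplicated = any(sets[i] & sets[j] for i in range(len(sets)) for j in range(i + 1, len(sets)))
    return "paralogs" if duplicated else "orthologs"

print(relationship(GENE_TREE, "Hsa_1", "Ptr_1"))  # orthologs
print(relationship(GENE_TREE, "Hsa_1", "Ptr_2"))  # paralogs
print(relationship(GENE_TREE, "Hsa_1", "Mmu_1"))  # orthologs
```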

3. NCBI Taxonomy Integration

Integrate taxonomic information from NCBI Taxonomy database:

Database access: Automatic download and local caching of NCBI taxonomy (~300MB)
Taxid/name translation: Convert between taxonomic IDs and scientific names
Lineage retrieval: Get complete evolutionary lineages
Taxonomy trees: Build species trees connecting specified taxa
Tree annotation: Automatically annotate trees with taxonomic information
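`get_lineage()` returns the taxids on the path from the root of the NCBI taxonomy down to a taxon, and `get_topology()` connects several taxa into one tree. The core idea can be sketched in pure Python: the last common ancestor of a set of taxa is the deepest taxid shared by all of their lineages, and the connecting tree is the union of the lineage paths. The lineages below are invented for illustration, and this is not how `NCBITaxa` is implemented internally:

```python
# Hypothetical lineages (root -> taxon) with made-up taxids, not real NCBI data
LINEAGES = {
    "species_a": [1, 10, 20, 30, 101],
    "species_b": [1, 10, 20, 30, 102],
    "species_c": [1, 10, 40, 103],
}

def common_ancestor(lineages):
    # Walk the lineages in parallel; the last position where all agree is the LCA
    lca = None
    for column in zip(*lineages):
        if len(set(column)) != 1:
            break
        lca = column[0]
    return lca

def topology_edges(lineages):
    # Union of the parent -> child links along every lineage
    edges = set()
    for lineage in lineages:
        edges.update(zip(lineage, lineage[1:]))
    return edges

print(common_ancestor(LINEAGES.values()))  # 10
print(common_ancestor([LINEAGES["species_a"], LINEAGES["species_b"]]))  # 30
print(sorted(topology_edges(LINEAGES.values())))
```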

Building taxonomy-based trees:

from ete3 import NCBITaxa

ncbi = NCBITaxa()

# Build tree from species names
species = ["Homo sapiens", "Pan troglodytes", "Mus musculus"]
name2taxid = ncbi.get_name_translator(species)
taxids = [name2taxid[sp][0] for sp in species]

# Get minimal tree connecting taxa
tree = ncbi.get_topology(taxids)

# Print the taxonomy annotations that get_topology() attaches to each node
for node in tree.traverse():
    if hasattr(node, "sci_name"):
        print(f"{node.sci_name} - Rank: {node.rank} - TaxID: {node.taxid}")

Annotating existing trees:

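from ete3 import NCBITaxa, Tree

ncbi = NCBITaxa()
tree = Tree("tree.nw")  # an existing tree whose leaf names encode species

def extract_species_from_name(leaf_name):
    # Hypothetical helper: assumes leaf names like "Homo_sapiens_gene1"
    genus, epithet = leaf_name.split("_")[:2]
    return f"{genus} {epithet}"

# Get taxonomy info for tree leaves
for leaf in tree:
    species = extract_species_from_name(leaf.name)
    name2taxid = ncbi.get_name_translator([species])
    if species not in name2taxid:
        continue  # name not found in NCBI Taxonomy
    taxid = name2taxid[species][0]

    # Get lineage (taxids from the root down to this taxon) plus ranks and names
    lineage = ncbi.get_lineage(taxid)
    ranks = ncbi.get_rank(lineage)
    names = ncbi.get_taxid_translator(lineage)

    # Add to node
    leaf.add_feature("taxid", taxid)
    leaf.add_feature("rank", ranks[taxid])
    leaf.add_feature("lineage", [names[t] for t in lineage])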

4. Tree Visualization

Create publication-quality tree visualizations:

Output formats: PNG (raster), PDF, and SVG (vector) for publications
Layout modes: Rectangular and circular tree layouts
Interactive GUI: Explore trees interactively with zoom, pan, and search
Custom styling: NodeStyle for node appearance (colors, shapes, sizes)
Faces: Add graphical elements (text, images, charts, heatmaps) to nodes
Layout functions: Dynamic styling based on node properties

Basic visualization workflow:

from ete3 import Tree, TreeStyle, NodeStyle

tree = Tree("tree.nw")

# Configure tree style
ts = TreeStyle()
ts.show_leaf_name = True
ts.show_branch_support = True
ts.scale = 50  # pixels per branch length unit

# Style nodes
for node in tree.traverse():
    nstyle = NodeStyle()

    if node.is_leaf():
        nstyle["fgcolor"] = "blue"
        nstyle["size"] = 8
    else:
        # Color by support
        if node.support > 0.9:
            nstyle["fgcolor"] = "darkgreen"
        else:
            nstyle["fgcolor"] = "red"
        nstyle["size"] = 5

    node.set_style(nstyle)

# Render to file (pass the same TreeStyle to every render call)
tree.render("tree.pdf", tree_style=ts)
tree.render("tree.png", w=800, h=600, units="px", dpi=300, tree_style=ts)

Use scripts/quick_visualize.py for rapid visualization:

# Basic visualization
python scripts/quick_visualize.py tree.nw output.pdf

# Circular layout with custom styling
python scripts/quick_visualize.py tree.nw output.pdf --mode c --color-by-support

# High-resolution PNG
python scripts/quick_visualize.py tree.nw output.png --width 1200 --height 800 --units px --dpi 300

# Custom title and styling
python scripts/quick_visualize.py tree.nw output.pdf --title "Species Phylogeny" --show-support

Advanced visualization with faces:

from ete3 import Tree, TreeStyle, TextFace, CircleFace

tree = Tree("tree.nw")

# Add features to nodes
for leaf in tree:
    leaf.add_feature("habitat", "marine" if "fish" in leaf.name else "land")

# Layout function
def layout(node):
    if node.is_leaf():
        # Add colored circle
        color = "blue" if node.habitat == "marine" else "green"
        circle = CircleFace(radius=5, color=color)
        node.add_face(circle, column=0, position="aligned")

        # Add label
        label = TextFace(node.name, fsize=10)
        node.add_face(label, column=1, position="aligned")

ts = TreeStyle()
ts.layout_fn = layout
ts.show_leaf_name = False

tree.render("annotated_tree.pdf", tree_style=ts)

5. Clustering Analysis

Analyze hierarchical clustering results with data integration:

ClusterTree: Specialized class for clustering dendrograms
Data matrix linking: Connect tree leaves to numerical profiles
Cluster metrics: Silhouette coefficient, Dunn index, inter/intra-cluster distances
Validation: Test cluster quality with different distance metrics
Heatmap visualization: Display data matrices alongside trees
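The silhouette of a single item is s(i) = (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b its mean distance to the members of the nearest other cluster; values near 1 indicate a good fit and negative values a likely misassignment. A pure-Python sketch of this textbook definition with Euclidean distance follows. ETE's `get_silhouette()` computes its own variant with configurable distance functions, so its numbers need not match this sketch:

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def silhouette(i, cluster, other_clusters):
    """Standard silhouette of cluster[i]: (b - a) / max(a, b)."""
    point = cluster[i]
    mates = cluster[:i] + cluster[i + 1:]
    if not mates:
        return 0.0  # common convention for singleton clusters
    # a: mean distance to the other members of its own cluster
    a = sum(euclidean(point, q) for q in mates) / len(mates)
    # b: mean distance to the members of the nearest other cluster
    b = min(sum(euclidean(point, q) for q in c) / len(c) for c in other_clusters)
    return (b - a) / max(a, b) if max(a, b) > 0 else 0.0

# Toy 2-D profiles forming two well-separated clusters
cluster1 = [(0.0, 0.0), (0.0, 2.0)]
cluster2 = [(10.0, 0.0), (10.0, 2.0)]
print(round(silhouette(0, cluster1, [cluster2]), 3))  # 0.802
```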

Clustering workflow:

from ete3 import ClusterTree

# Load tree with data matrix
matrix = """#Names\tSample1\tSample2\tSample3
Gene1\t1.5\t2.3\t0.8
Gene2\t0.9\t1.1\t1.8
Gene3\t2.1\t2.5\t0.5"""

tree = ClusterTree("((Gene1,Gene2),Gene3);", text_array=matrix)

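# Evaluate cluster quality on non-root internal nodes
# (the root has no sister cluster to compare against)
for node in tree.traverse():
    if not node.is_leaf() and not node.is_root():
        # get_silhouette() returns (silhouette, intracluster, intercluster)
        silhouette, intra, inter = node.get_silhouette()
        print(f"Cluster: {','.join(node.get_leaf_names())}")
        print(f"  Silhouette: {silhouette:.3f}")
        print(f"  Intra/inter-cluster distance: {intra:.3f} / {inter:.3f}")

# Dunn index of the partition formed by the root's children
# (get_dunn() takes the list of clusters to evaluate)
dunn = tree.get_dunn(tree.children)
print(f"Dunn index: {dunn:.3f}")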

# Visualize with heatmap
tree.show("heatmap")
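The Dunn index divides the smallest distance between items of different clusters by the largest cluster diameter, so higher values mean compact, well-separated clusters. A pure-Python sketch of this classic definition with Euclidean distance, single-linkage separation and full diameters follows; `get_dunn()` in ETE may define the inter- and intra-cluster terms differently, so treat the numbers as illustrative:

```python
import math
from itertools import combinations

def euclidean(p, q):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def diameter(cluster):
    # Largest distance between two members of the same cluster
    return max((euclidean(p, q) for p, q in combinations(cluster, 2)), default=0.0)

def dunn_index(clusters):
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")
    # Smallest distance between any two items that sit in different clusters
    separation = min(
        euclidean(p, q)
        for c1, c2 in combinations(clusters, 2)
        for p in c1
        for q in c2
    )
    max_diameter = max(diameter(c) for c in clusters)
    return separation / max_diameter if max_diameter > 0 else math.inf

clusters = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
print(dunn_index(clusters))  # 5.0 (separation 10 / diameter 2)
```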

6. Tree Comparison

Quantify topological differences between trees:

Robinson-Foulds distance: Standard metric for tree comparison
Normalized RF: Scale-invariant distance (0.0 to 1.0)
Partition analysis: Identify unique and shared bipartitions
Consensus trees: Analyze support across multiple trees
Batch comparison: Compare multiple trees pairwise
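The RF distance counts bipartitions (the two-way splits of the leaf set created by cutting an internal edge) that occur in one tree but not in the other. A pure-Python sketch for two unrooted trees over the same leaves, written as nested tuples: each split is stored as the side that does not contain a fixed reference leaf, so both orientations of a split compare equal, and the maximum for two fully resolved unrooted trees with n leaves is 2(n - 3). This illustrates the definition only; ETE's `robinson_foulds()` adds options such as rooted versus unrooted comparison and matching leaves by other attributes:

```python
def leaf_set(node):
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))

def splits(tree):
    """Non-trivial bipartitions of an unrooted tree given as nested tuples."""
    all_leaves = leaf_set(tree)
    ref = min(all_leaves)  # fixed reference leaf used to orient every split
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        side = leaf_set(node)
        # Skip the root (whole set) and trivial splits that isolate a single leaf
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(all_leaves - side if ref in side else side)
        for child in node:
            visit(child)

    visit(tree)
    return found

def robinson_foulds(t1, t2):
    # Returns (RF distance, maximum RF for fully resolved unrooted trees)
    n = len(leaf_set(t1))
    return len(splits(t1) ^ splits(t2)), 2 * (n - 3)

t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
rf, max_rf = robinson_foulds(t1, t2)
print(rf, max_rf, rf / max_rf)  # 2 4 0.5
```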

Compare two trees:

from ete3 import Tree

tree1 = Tree("tree1.nw")
tree2 = Tree("tree2.nw")

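# Calculate RF distance. Depending on the ETE version, robinson_foulds()
# returns five or seven values; the first five have the same meaning in both.
rf, max_rf, common_leaves, parts_t1, parts_t2 = tree1.robinson_foulds(tree2)[:5]

norm_rf = rf / max_rf if max_rf > 0 else 0.0
print(f"RF distance: {rf}/{max_rf}")
print(f"Normalized RF: {norm_rf:.3f}")
print(f"Common leaves: {len(common_leaves)}")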

# Find unique partitions
unique_t1 = parts_t1 - parts_t2
unique_t2 = parts_t2 - parts_t1

print(f"Unique to tree1: {len(unique_t1)}")
print(f"Unique to tree2: {len(unique_t2)}")

Compare multiple trees:

import numpy as np
from ete3 import Tree

trees = [Tree(f"tree{i}.nw") for i in range(4)]

# Create a symmetric matrix of normalized RF distances
n = len(trees)
dist_matrix = np.zeros((n, n))

for i in range(n):
    for j in range(i + 1, n):
        # Only the first two return values (rf, max_rf) are needed here
        rf, max_rf = trees[i].robinson_foulds(trees[j])[:2]
        norm_rf = rf / max_rf if max_rf > 0 else 0
        dist_matrix[i, j] = norm_rf
        dist_matrix[j, i] = norm_rf

Installation and Setup

Install ETE toolkit:

# Basic installation
uv pip install ete3

# With external dependencies for rendering (optional but recommended)
# On macOS:
brew install qt@5

# On Ubuntu/Debian:
sudo apt-get install python3-pyqt5 python3-pyqt5.qtsvg

# For full features including GUI
uv pip install ete3[gui]

First-time NCBI Taxonomy setup:

The first time NCBITaxa is instantiated, it automatically downloads the NCBI taxonomy database (~300MB) to ~/.etetoolkit/taxa.sqlite. This happens only once:

from ete3 import NCBITaxa
ncbi = NCBITaxa()  # Downloads database on first run

Update taxonomy database:

ncbi.update_taxonomy_database()  # Download latest NCBI data

Common Use Cases

Use Case 1: Phylogenomic Pipeline

Complete workflow from gene tree to ortholog identification:

from ete3 import PhyloTree, NCBITaxa

# 1. Load gene tree with alignment
tree = PhyloTree("gene_tree.nw", alignment="alignment.fasta")

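# 2. Configure species naming (species code = text before the first "_")
tree.set_species_naming_function(lambda x: x.split("_")[0])

# 3. Detect evolutionary events
tree.get_descendant_evol_events()

# 4. Annotate with taxonomy. Hypothetical mapping from species codes to
#    NCBI taxids; adapt it to the codes used in your gene names.
species_to_taxid = {"Hsa": 9606, "Ptr": 9598, "Mmus": 10090}
ncbi = NCBITaxa()
for leaf in tree:
    if leaf.species in species_to_taxid:
        taxid = species_to_taxid[leaf.species]
        lineage = ncbi.get_lineage(taxid)
        leaf.add_feature("lineage", lineage)

# 5. Split at duplications into speciation trees (candidate ortholog groups);
#    get_speciation_trees() returns (count, duplications, iterator)
ntrees, ndups, sptrees = tree.get_speciation_trees()

# 6. Save each speciation tree
for i, ortho in enumerate(sptrees):
    ortho.write(outfile=f"ortho_{i}.nw")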

Use Case 2: Tree Preprocessing and Formatting

Batch process trees for analysis:

# Convert format
python scripts/tree_operations.py convert input.nw output.nw --in-format 0 --out-format 1

# Root at midpoint
python scripts/tree_operations.py reroot input.nw rooted.nw --midpoint

# Prune to focal taxa
python scripts/tree_operations.py prune rooted.nw pruned.nw --keep-taxa taxa_list.txt

# Get statistics
python scripts/tree_operations.py stats pruned.nw

Use Case 3: Publication-Quality Figures

Create styled visualizations:

from ete3 import Tree, TreeStyle, NodeStyle, TextFace

tree = Tree("tree.nw")

# Define clade colors
clade_colors = {
    "Mammals": "red",
    "Birds": "blue",
    "Fish": "green"
}

def layout(node):
    # Highlight clades
    if node.is_leaf():
        for clade, color in clade_colors.items():
            if clade in node.name:
                nstyle = NodeStyle()
                nstyle["fgcolor"] = color
                nstyle["size"] = 8
                node.set_style(nstyle)
    else:
        # Add support values
        if node.support > 0.95:
            support = TextFace(f"{node.support:.2f}", fsize=8)
            node.add_face(support, column=0, position="branch-top")

ts = TreeStyle()
ts.layout_fn = layout
ts.show_scale = True

# Render for publication
tree.render("figure.pdf", w=200, units="mm", tree_style=ts)
tree.render("figure.svg", tree_style=ts)  # Editable vector

Use Case 4: Automated Tree Analysis

Process multiple trees systematically:

from ete3 import Tree
import os

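input_dir = "trees"
output_dir = "processed"
os.makedirs(output_dir, exist_ok=True)  # tree.write() will not create the folder

for filename in os.listdir(input_dir):
    if filename.endswith(".nw"):
        tree = Tree(os.path.join(input_dir, filename))

        # Standardize: midpoint root
        midpoint = tree.get_midpoint_outgroup()
        tree.set_outgroup(midpoint)

        # Collapse low-support internal branches (assumes supports on a 0-1 scale;
        # use a threshold such as 50 for bootstrap percentages). Collect the nodes
        # first so the tree is not modified while it is being traversed.
        weak_nodes = [node for node in tree.traverse()
                      if not node.is_leaf() and not node.is_root() and node.support < 0.5]
        for node in weak_nodes:
            node.delete()

        # Resolve polytomies last: the nodes added by resolve_polytomy() get
        # zero support by default and would otherwise be collapsed again
        tree.resolve_polytomy(recursive=True)

        # Save processed tree
        output_file = os.path.join(output_dir, f"processed_{filename}")
        tree.write(outfile=output_file)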

Reference Documentation

For comprehensive API documentation, code examples, and detailed guides, refer to the following resources in the references/ directory:

api_reference.md: Complete API documentation for all ETE classes and methods (Tree, PhyloTree, ClusterTree, NCBITaxa), including parameters, return types, and code examples
workflows.md: Common workflow patterns organized by task (tree operations, phylogenetic analysis, tree comparison, taxonomy integration, clustering analysis)
visualization.md: Comprehensive visualization guide covering TreeStyle, NodeStyle, Faces, layout functions, and advanced visualization techniques

Load these references when detailed information is needed:

# To use API reference
# Read references/api_reference.md for complete method signatures and parameters

# To implement workflows
# Read references/workflows.md for step-by-step workflow examples

# To create visualizations
# Read references/visualization.md for styling and rendering options

Troubleshooting

Import errors:

# If "ModuleNotFoundError: No module named 'ete3'"
uv pip install ete3

# For GUI and rendering issues
uv pip install ete3[gui]

Rendering issues:

If tree.render() or tree.show() fails with Qt-related errors, install system dependencies:

# macOS
brew install qt@5

# Ubuntu/Debian
sudo apt-get install python3-pyqt5 python3-pyqt5.qtsvg

NCBI Taxonomy database:

If database download fails or becomes corrupted:

from ete3 import NCBITaxa
ncbi = NCBITaxa()
ncbi.update_taxonomy_database()  # Redownload database

Memory issues with large trees:

For very large trees (>10,000 leaves), prefer the iterator methods (iter_*) over methods that build a complete list:

# Memory-efficient iteration: yields one leaf at a time
for leaf in tree.iter_leaves():
    process(leaf)  # process() stands for your own per-leaf code

# Instead of
for leaf in tree.get_leaves():  # builds a list of all leaves first
    process(leaf)
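Either way the nodes themselves are already in memory; what an iterator saves is the extra list and the full upfront walk, and it lets you stop early. A pure-Python illustration of the two styles on a nested-tuple tree (not ETE code), using an explicit stack so that very deep trees do not hit Python's recursion limit:

```python
from itertools import islice

def iter_leaves(node):
    # Lazily yield leaves left to right using an explicit stack
    stack = [node]
    while stack:
        current = stack.pop()
        if isinstance(current, str):
            yield current
        else:
            stack.extend(reversed(current))  # reversed so the leftmost child is popped first

def get_leaves(node):
    # Eagerly materialize every leaf in a list
    return list(iter_leaves(node))

tree = (("A", "B"), ("C", ("D", "E")))
print(get_leaves(tree))  # ['A', 'B', 'C', 'D', 'E']
# The lazy version can stop early without walking the rest of the tree
print(list(islice(iter_leaves(tree), 2)))  # ['A', 'B']
```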

Newick Format Reference

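ETE supports several Newick format flavors (formats 0-9 and 100), which differ in what is stored for leaves and internal nodes:

Format 0: Flexible, with internal support values and branch lengths (default)
Format 1: Flexible, with internal node names and branch lengths
Format 2: All branch lengths, leaf names, and internal support values
Format 5: All branch lengths and leaf names (no internal node names)
Format 8: All node names (leaf and internal), no branch lengths
Format 9: Leaf names only
Format 100: Topology only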

Specify format when reading/writing:

tree = Tree("tree.nw", format=1)  # read internal node names
tree.write(outfile="output.nw", format=5)  # write branch lengths and leaf names only
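The format number tells the parser how to read each slot of a Newick string; for example, the label after an internal node's closing parenthesis is read as a support value in format 0 and as a node name in format 1. A minimal pure-Python parser for the simple case of node names plus branch lengths (format-1 style, with no quoted labels, comments, or NHX tags) makes those slots visible. It is a sketch for illustration, not ETE's parser:

```python
import re

PUNCT = {"(", ")", ",", ":", ";"}
# Tokens: single punctuation characters or runs of any other characters
TOKEN = re.compile(r"[(),:;]|[^(),:;]+")

def parse_newick(text):
    """Parse simple Newick (names and branch lengths only) into nested dicts."""
    tokens = [t.strip() for t in TOKEN.findall(text) if t.strip()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ""

    def parse_node():
        nonlocal pos
        node = {"name": "", "dist": None, "children": []}
        if peek() == "(":
            pos += 1
            node["children"].append(parse_node())
            while peek() == ",":
                pos += 1
                node["children"].append(parse_node())
            if peek() != ")":
                raise ValueError(f"expected ')' at token {pos}")
            pos += 1
        if peek() and peek() not in PUNCT:  # optional node label
            node["name"] = peek()
            pos += 1
        if peek() == ":":  # optional branch length
            pos += 1
            node["dist"] = float(peek())
            pos += 1
        return node

    root = parse_node()
    if peek() != ";":
        raise ValueError("Newick string must end with ';'")
    return root

def leaf_names(node):
    if not node["children"]:
        return [node["name"]]
    return [n for child in node["children"] for n in leaf_names(child)]

tree = parse_newick("((A:0.1,B:0.2)AB:0.3,C:0.4)root;")
print(leaf_names(tree))  # ['A', 'B', 'C']
print(tree["children"][0]["name"], tree["children"][0]["dist"])  # AB 0.3
```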

NHX (New Hampshire eXtended) format preserves custom features:

tree.write(outfile="tree.nhx", features=["habitat", "temperature", "depth"])
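In the written file, each node's extra features follow its branch length inside a bracketed comment of the form `[&&NHX:key=value:key=value]`. A small pure-Python sketch that builds and reads such a tag for one node; values come back as strings, and characters such as `:`, `=` or `]` inside values are not handled. It illustrates the syntax only and is not ETE's reader or writer:

```python
import re

def format_nhx(name, dist, features):
    # Build one node label such as "A:0.5[&&NHX:habitat=marine:depth=200]"
    tags = "".join(f":{key}={value}" for key, value in features.items())
    return f"{name}:{dist}[&&NHX{tags}]"

def parse_nhx(label):
    # Extract key=value pairs from the [&&NHX:...] comment, if there is one
    match = re.search(r"\[&&NHX([^\]]*)\]", label)
    if not match:
        return {}
    return dict(pair.split("=", 1) for pair in match.group(1).split(":") if "=" in pair)

label = format_nhx("A", 0.5, {"habitat": "marine", "depth": 200})
print(label)             # A:0.5[&&NHX:habitat=marine:depth=200]
print(parse_nhx(label))  # {'habitat': 'marine', 'depth': '200'}
```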

Best Practices

Preserve branch lengths: Use preserve_branch_length=True when pruning for phylogenetic analysis
Cache content: Use get_cached_content() for repeated access to node contents on large trees
Use iterators: Employ iter_* methods for memory-efficient processing of large trees
Choose appropriate traversal: Postorder for bottom-up analysis, preorder for top-down
Validate monophyly: Always check returned clade type (monophyletic/paraphyletic/polyphyletic)
Vector formats for publication: Use PDF or SVG for publication figures (scalable, editable)
Interactive testing: Use tree.show() to test visualizations before rendering to file
PhyloTree for phylogenetics: Use PhyloTree class for gene trees and evolutionary analysis
Copy method selection: "newick" for speed, "cpickle" for full fidelity, "deepcopy" for complex objects
NCBI query caching: Store NCBI taxonomy query results to avoid repeated database access
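Caching content pays off because the leaf sets of all nodes can be computed in one postorder pass, each node reusing its children's sets, instead of re-walking every subtree for each query. In ETE, `get_cached_content()` returns such a node-to-leaves mapping; the pure-Python sketch below shows the same idea on nested tuples (nodes are keyed by the tuples themselves, so identical subtrees share one entry) and is not ETE's implementation:

```python
def cached_content(tree):
    """Map every node of a nested-tuple tree to the frozenset of leaves below it,
    computed in a single postorder pass."""
    cache = {}

    def visit(node):
        if isinstance(node, str):
            content = frozenset([node])
        else:
            # Children are visited first, so their results are reused, not recomputed
            content = frozenset().union(*(visit(child) for child in node))
        cache[node] = content
        return content

    visit(tree)
    return cache

tree = (("A", "B"), ("C", ("D", "E")))
cache = cached_content(tree)
print(sorted(cache[("C", ("D", "E"))]))  # ['C', 'D', 'E']
print(len(cache))                        # 9
```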

Weekly Installs

117

Repository

davila7/claude-…emplates

GitHub Stars

22.6K

First Seen

Jan 21, 2026

Security Audits

Gen Agent Trust Hub: Pass, Socket: Pass, Snyk: Pass

Installed on

claude-code: 101

opencode: 91

cursor: 87

gemini-cli: 86

antigravity: 81

codex: 75
