Arboreto: Python library for gene regulatory network inference, with GRNBoost2/GENIE3 algorithms and distributed computing

arboreto by davila7/claude-code-templates

136 weekly installs

23,400 GitHub Stars

Install command

npx skills add https://github.com/davila7/claude-code-templates --skill arboreto

AI/Machine Learning, Bioinformatics, Data Processing


Arboreto

Overview

Arboreto is a computational library for inferring gene regulatory networks (GRNs) from gene expression data using parallelized algorithms that scale from single machines to multi-node clusters.

Core capability: Identify which transcription factors (TFs) regulate which target genes based on expression patterns across observations (cells, samples, conditions).

Quick Start

Install arboreto:

uv pip install arboreto

Basic GRN inference:

import pandas as pd
from arboreto.algo import grnboost2

if __name__ == '__main__':
    # Load expression data (genes as columns)
    expression_matrix = pd.read_csv('expression_data.tsv', sep='\t')

    # Infer regulatory network
    network = grnboost2(expression_data=expression_matrix)

    # Save results (TF, target, importance)
    network.to_csv('network.tsv', sep='\t', index=False, header=False)

Critical: Always wrap the script's entry point in an `if __name__ == '__main__':` guard, because Dask spawns new processes.

Core Capabilities

1. Basic GRN Inference

Covers standard GRN inference workflows, including:

Input data preparation (Pandas DataFrame or NumPy array)
Running inference with GRNBoost2 or GENIE3
Filtering by transcription factors
Output format and interpretation

See: references/basic_inference.md

Use the ready-to-run script scripts/basic_grn_inference.py for standard inference tasks:

python scripts/basic_grn_inference.py expression_data.tsv output_network.tsv --tf-file tfs.txt --seed 777
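The input-preparation step listed above can be sketched as follows. This is a minimal illustration, assuming the expression data is already in a pandas DataFrame; `prepare_expression_matrix` and its parameters are hypothetical names and not part of arboreto. It orients the matrix as observations x genes, drops genes with zero variance, and optionally keeps only the most variable genes to reduce memory use.

```python
import pandas as pd


def prepare_expression_matrix(df, genes_in_rows=False, n_top_variable=None):
    """Return an observations x genes DataFrame of floats (hypothetical helper)."""
    # arboreto expects genes as columns, so transpose a genes x cells table
    matrix = df.T if genes_in_rows else df
    matrix = matrix.astype(float)

    # Genes with zero variance carry no signal for the regression step
    variances = matrix.var(axis=0)
    keep = variances[variances > 0]

    # Optionally keep only the most variable genes
    if n_top_variable is not None:
        keep = keep.nlargest(n_top_variable)

    # Boolean mask preserves the original column order
    return matrix.loc[:, matrix.columns.isin(keep.index)]


# Toy genes x cells table with made-up values
raw = pd.DataFrame(
    {'cell1': [1, 5, 0], 'cell2': [2, 5, 4], 'cell3': [3, 5, 8]},
    index=['geneA', 'geneB', 'geneC'],
)
prepared = prepare_expression_matrix(raw, genes_in_rows=True)
```

The returned DataFrame can then be passed as `expression_data` to `grnboost2` or `genie3`.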

2. Algorithm Selection

Arboreto provides two algorithms:

GRNBoost2 (Recommended):

Fast gradient boosting-based inference
Optimized for large datasets (10k+ observations)
Default choice for most analyses

GENIE3:

Random Forest-based inference
Original multiple regression approach
Use for comparison or validation

Quick comparison:

from arboreto.algo import grnboost2, genie3

# Fast, recommended
network_grnboost = grnboost2(expression_data=matrix)

# Classic algorithm
network_genie3 = genie3(expression_data=matrix)

For detailed algorithm comparison, parameters, and selection guidance, see references/algorithms.md

3. Distributed Computing

Scale inference from local multi-core to cluster environments:

Local (default) - Uses all available cores automatically:

network = grnboost2(expression_data=matrix)

Custom local client - Control resources:

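from arboreto.algo import grnboost2
from distributed import LocalCluster, Client

if __name__ == '__main__':
    # 10 workers, each limited to 8 GB of memory
    local_cluster = LocalCluster(n_workers=10, memory_limit='8GB')
    client = Client(local_cluster)

    try:
        # matrix: expression DataFrame (observations x genes), loaded beforehand
        network = grnboost2(expression_data=matrix, client_or_address=client)
    finally:
        # Release the workers even if inference fails
        client.close()
        local_cluster.close()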

Cluster computing - Connect to remote Dask scheduler:

from distributed import Client

client = Client('tcp://scheduler:8786')
network = grnboost2(expression_data=matrix, client_or_address=client)

For cluster setup, performance optimization, and large-scale workflows, see references/distributed_computing.md

Installation

uv pip install arboreto

Dependencies: scipy, scikit-learn, numpy, pandas, dask, distributed

Common Use Cases

Single-Cell RNA-seq Analysis

import pandas as pd
from arboreto.algo import grnboost2

if __name__ == '__main__':
    # Load single-cell expression matrix (cells x genes)
    sc_data = pd.read_csv('scrna_counts.tsv', sep='\t')

    # Infer cell-type-specific regulatory network
    network = grnboost2(expression_data=sc_data, seed=42)

    # Filter high-confidence links (0.5 is an illustrative cutoff; tune it to your score distribution)
    high_confidence = network[network['importance'] > 0.5]
    high_confidence.to_csv('grn_high_confidence.tsv', sep='\t', index=False)

Bulk RNA-seq with TF Filtering

import pandas as pd
from arboreto.utils import load_tf_names
from arboreto.algo import grnboost2

if __name__ == '__main__':
    # Load data
    expression_data = pd.read_csv('rnaseq_tpm.tsv', sep='\t')
    tf_names = load_tf_names('human_tfs.txt')

    # Infer with TF restriction
    network = grnboost2(
        expression_data=expression_data,
        tf_names=tf_names,
        seed=123
    )

    network.to_csv('tf_target_network.tsv', sep='\t', index=False)

Comparative Analysis (Multiple Conditions)

import pandas as pd
from arboreto.algo import grnboost2

if __name__ == '__main__':
    # Infer networks for different conditions
    conditions = ['control', 'treatment_24h', 'treatment_48h']

    for condition in conditions:
        data = pd.read_csv(f'{condition}_expression.tsv', sep='\t')
        network = grnboost2(expression_data=data, seed=42)
        network.to_csv(f'{condition}_network.tsv', sep='\t', index=False)
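Once a network exists for each condition, one simple way to compare them is by edge overlap. The sketch below assumes each network is a DataFrame with `TF`, `target`, and `importance` columns; `edge_set` and `compare_networks` are hypothetical helpers, and the Jaccard index is just one possible similarity measure, not something arboreto prescribes.

```python
import pandas as pd


def edge_set(network, top_n=None):
    """Return the (TF, target) pairs of a network, optionally only the top_n by importance."""
    links = network.sort_values('importance', ascending=False)
    if top_n is not None:
        links = links.head(top_n)
    return set(zip(links['TF'], links['target']))


def compare_networks(net_a, net_b, top_n=None):
    """Summarize shared and condition-specific edges between two networks."""
    a, b = edge_set(net_a, top_n), edge_set(net_b, top_n)
    union = a | b
    return {
        'shared': a & b,
        'only_a': a - b,
        'only_b': b - a,
        # Jaccard index: shared edges divided by all edges seen in either network
        'jaccard': len(a & b) / len(union) if union else 0.0,
    }


# Toy networks with made-up gene names and scores
control = pd.DataFrame(
    {'TF': ['TF1', 'TF2'], 'target': ['geneA', 'geneB'], 'importance': [3.0, 1.0]}
)
treated = pd.DataFrame(
    {'TF': ['TF1', 'TF3'], 'target': ['geneA', 'geneB'], 'importance': [2.0, 4.0]}
)
comparison = compare_networks(control, treated)
```

Restricting both networks to the same `top_n` keeps the comparison from being dominated by the long tail of low-importance links.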

Output Interpretation

Arboreto returns a DataFrame with regulatory links:

Column	Description
`TF`	Transcription factor (regulator)
`target`	Target gene
`importance`	Regulatory importance score (higher = stronger)
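Because the Quick Start example saves the network with `header=False`, the resulting file has no column names. A minimal sketch of reading such a file back with the three columns named as in the table above; `load_network` and `NETWORK_COLUMNS` are hypothetical names, and the in-memory buffer stands in for a real `network.tsv`.

```python
import io

import pandas as pd

NETWORK_COLUMNS = ['TF', 'target', 'importance']


def load_network(path_or_buffer):
    """Read a headerless, tab-separated network file into a DataFrame."""
    return pd.read_csv(path_or_buffer, sep='\t', header=None, names=NETWORK_COLUMNS)


# Stand-in for a file written by network.to_csv(..., header=False); values are made up
demo_file = io.StringIO("TF1\tgeneA\t5.0\nTF2\tgeneA\t1.5\n")
loaded_network = load_network(demo_file)
```

Files written without `header=False` already carry the column names, so a plain `pd.read_csv(path, sep='\t')` is enough for those.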

Filtering strategies:

Top N links per target gene
Importance threshold (e.g., > 0.5; scores are not bounded to [0, 1], so inspect their distribution before choosing a cutoff)
Statistical significance testing (permutation tests)
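The first two strategies can be sketched with plain pandas operations. This is an illustration under the assumption that the network has the `TF`, `target`, and `importance` columns described above; `top_links_per_target` is a hypothetical helper, not an arboreto function.

```python
import pandas as pd


def top_links_per_target(network, n=10, min_importance=None):
    """Keep the n highest-importance links for each target gene."""
    links = network
    # Optional global cutoff applied before ranking
    if min_importance is not None:
        links = links[links['importance'] >= min_importance]
    # Sort once, then take the first n rows within each target group
    ranked = links.sort_values('importance', ascending=False)
    return ranked.groupby('target', sort=False).head(n).reset_index(drop=True)


# Toy network with made-up gene names and scores
example_network = pd.DataFrame({
    'TF': ['TF1', 'TF2', 'TF3', 'TF1'],
    'target': ['geneA', 'geneA', 'geneA', 'geneB'],
    'importance': [5.0, 1.0, 3.0, 0.2],
})
top2 = top_links_per_target(example_network, n=2)
top2_strict = top_links_per_target(example_network, n=2, min_importance=1.0)
```

Ranking per target rather than globally keeps weakly regulated targets from being dropped entirely when a few targets have very high scores.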

Integration with pySCENIC

Arboreto is a core component of the SCENIC pipeline for single-cell regulatory network analysis:

# Step 1: Use arboreto for GRN inference
from arboreto.algo import grnboost2
network = grnboost2(expression_data=sc_data, tf_names=tf_list)

# Step 2: Use pySCENIC for regulon identification and activity scoring
# (See pySCENIC documentation for downstream analysis)

Reproducibility

Always set a seed for reproducible results:

network = grnboost2(expression_data=matrix, seed=777)

Run multiple seeds for robustness analysis:

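from arboreto.algo import grnboost2
from distributed import LocalCluster, Client

if __name__ == '__main__':
    # Reuse one local Dask client across all runs
    client = Client(LocalCluster())

    seeds = [42, 123, 777]
    networks = []

    # matrix: expression DataFrame (observations x genes), loaded beforehand
    for seed in seeds:
        net = grnboost2(expression_data=matrix, client_or_address=client, seed=seed)
        networks.append(net)

    client.close()

    # Combine networks and filter consensus links
    # (analyze_consensus is a user-defined helper, not provided by arboreto)
    consensus = analyze_consensus(networks)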
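`analyze_consensus` is not defined in the snippet above and does not appear to be part of arboreto, so it has to be supplied by the user. One possible implementation is sketched below, assuming each network has `TF`, `target`, and `importance` columns; the `min_fraction` parameter and the output column names are assumptions made for illustration.

```python
import pandas as pd


def analyze_consensus(networks, min_fraction=1.0):
    """Keep links found in at least min_fraction of the runs, with their mean importance."""
    # Tag each run so that links can be counted across runs
    tagged = [net.assign(run=i) for i, net in enumerate(networks)]
    combined = pd.concat(tagged, ignore_index=True)

    summary = (
        combined.groupby(['TF', 'target'])
        .agg(n_runs=('run', 'nunique'), mean_importance=('importance', 'mean'))
        .reset_index()
    )

    # A link is a consensus link if enough runs recovered it
    required = min_fraction * len(networks)
    consensus = summary[summary['n_runs'] >= required]
    return consensus.sort_values('mean_importance', ascending=False).reset_index(drop=True)


# Two toy runs with made-up gene names and scores
run_a = pd.DataFrame({'TF': ['A', 'A'], 'target': ['x', 'y'], 'importance': [2.0, 1.0]})
run_b = pd.DataFrame({'TF': ['A', 'B'], 'target': ['x', 'y'], 'importance': [4.0, 1.0]})
consensus_links = analyze_consensus([run_a, run_b])
```

Lowering `min_fraction` (for example to 0.5) trades stricter agreement for a larger consensus network.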

Troubleshooting

Memory errors: Reduce the dataset size by filtering out low-variance genes, or use distributed computing.

Slow performance: Use GRNBoost2 instead of GENIE3, enable a distributed client, and restrict the TF list.

Dask errors: Ensure the `if __name__ == '__main__':` guard is present in scripts.

Empty results: Check the data format (genes as columns) and verify that TF names match gene names.
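For the empty-results case, a quick check of how many TF names actually occur among the gene names often points to the cause. The sketch below uses only the standard library; `check_tf_overlap` is a hypothetical helper, and flagging case-only mismatches is an extra assumption about a common source of failed matches.

```python
def check_tf_overlap(gene_names, tf_names):
    """Split TF names into exact matches, case-only mismatches, and missing names."""
    genes = set(gene_names)
    genes_by_lower = {gene.lower(): gene for gene in genes}

    matched, case_mismatch, missing = [], {}, []
    for tf in tf_names:
        if tf in genes:
            matched.append(tf)
        elif tf.lower() in genes_by_lower:
            # Same symbol, different capitalization (e.g. mouse vs human style)
            case_mismatch[tf] = genes_by_lower[tf.lower()]
        else:
            missing.append(tf)
    return matched, case_mismatch, missing


# Made-up example: one exact match, one case mismatch, one absent TF
matched, case_mismatch, missing = check_tf_overlap(
    gene_names=['SOX2', 'Gata1', 'ACTB'],
    tf_names=['SOX2', 'GATA1', 'MYC'],
)
```

With real data, `gene_names` would be `expression_data.columns` and `tf_names` the list returned by `load_tf_names`; an empty `matched` list means the inference has no usable regulators.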

Weekly Installs: 117
Repository: davila7/claude-…emplates
GitHub Stars: 22.6K
First Seen: Jan 21, 2026
