scikit-bio by davila7/claude-code-templates
npx skills add https://github.com/davila7/claude-code-templates --skill scikit-bio
scikit-bio is a comprehensive Python library for working with biological data. Apply this skill for bioinformatics analyses spanning sequence manipulation, alignment, phylogenetics, microbial ecology, and multivariate statistics.
This skill should be used when the user:
Work with biological sequences using specialized classes for DNA, RNA, and protein data.
Key operations:
Common patterns:
import skbio
# Read sequences from file
seq = skbio.DNA.read('input.fasta')
# Sequence operations
rc = seq.reverse_complement()
rna = seq.transcribe()
protein = rna.translate()
# Find motifs
motif_positions = list(seq.find_with_regex('ATG[ACGT]{3}'))  # list of slice objects
# Check for properties
has_degens = seq.has_degenerates()
seq_no_gaps = seq.degap()
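
To make the sequence operations above concrete, here is a minimal plain-Python sketch of reverse complementing, transcription, and regex motif search. It is not scikit-bio's implementation: the helper names are hypothetical, and unlike the DNA class it does not validate the alphabet or handle degenerate characters.

```python
import re

# Complement table for unambiguous DNA characters only (an assumption of this sketch)
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(dna: str) -> str:
    """Complement every base, then reverse the string."""
    return dna.translate(COMPLEMENT)[::-1]

def transcribe(dna: str) -> str:
    """Replace thymine with uracil to get the RNA form of the coding strand."""
    return dna.replace("T", "U").replace("t", "u")

def find_motif(dna: str, pattern: str) -> list:
    """Return (start, end) positions of each non-overlapping regex match."""
    return [(m.start(), m.end()) for m in re.finditer(pattern, dna)]

rc = reverse_complement("ATGC")
rna = transcribe("ATGC")
hits = find_motif("CCATGAAATT", "ATG[ACGT]{3}")
```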
Important notes:
Use the DNA, RNA, and Protein classes for grammared sequences with validation.
Use the Sequence class for generic sequences without alphabet restrictions.
Perform pairwise and multiple sequence alignments using dynamic programming algorithms.
Key capabilities:
Use TabularMSA to store and manipulate multiple sequence alignments.
Common patterns:
import skbio
from skbio.alignment import local_pairwise_align_ssw, TabularMSA
# Pairwise alignment (seq1 and seq2 are skbio.DNA or skbio.Protein objects)
# The function returns a tuple: the alignment, its score, and start/end positions
msa, score, start_end_positions = local_pairwise_align_ssw(seq1, seq2)
# Access aligned sequences
aligned_seq1, aligned_seq2 = msa
# Read multiple alignment from file
msa = TabularMSA.read('alignment.fasta', constructor=skbio.DNA)
# Calculate consensus
consensus = msa.consensus()
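
The dynamic programming idea behind local alignment can be sketched in a few lines. The function below is a plain-Python Smith-Waterman scorer with a linear gap penalty; the function name and the scoring values are assumptions chosen for illustration, and it is far slower and simpler than the SSW-based routine scikit-bio wraps.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -3, gap: int = -5) -> int:
    """Return the best local alignment score between a and b."""
    prev = [0] * (len(b) + 1)  # scores for the previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # A local alignment may restart anywhere, so scores never drop below 0
            cur[j] = max(0, prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best

score_identical = smith_waterman_score("ACGT", "ACGT")
score_shared_core = smith_waterman_score("TTACGTT", "GGACGGG")
```

Only two rows of the table are kept in memory because the score, unlike the full alignment, does not need a traceback.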
Important notes:
Use local_pairwise_align_ssw for local alignments (faster, SSW-based).
Use StripedSmithWaterman for protein alignments.
Construct, manipulate, and analyze phylogenetic trees representing evolutionary relationships.
Key capabilities:
Common patterns:
from skbio import TreeNode
from skbio.tree import nj
# Read tree from file
tree = TreeNode.read('tree.nwk')
# Construct tree from distance matrix
tree = nj(distance_matrix)
# Tree operations
subtree = tree.shear(['taxon1', 'taxon2', 'taxon3'])
tips = [node for node in tree.tips()]
lca = tree.lowest_common_ancestor(['taxon1', 'taxon2'])
# Calculate distances
patristic_dist = tree.find('taxon1').distance(tree.find('taxon2'))
tip_distances = tree.tip_tip_distances()  # DistanceMatrix of tip-to-tip (cophenetic) distances
# Compare trees
rf_distance = tree.compare_rfd(other_tree)  # Robinson-Foulds distance
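
As a rough illustration of what a Robinson-Foulds comparison measures, the sketch below represents trees as nested tuples of tip names and counts the clades found in one tree but not the other. This is a rooted, clade-based simplification with hypothetical helper names; scikit-bio's own calculation may treat rooting and normalization differently, so the numbers are not guaranteed to match.

```python
def clades(tree) -> set:
    """Collect the set of tip names under every internal node."""
    found = set()

    def walk(node):
        if isinstance(node, str):  # a tip
            return frozenset([node])
        tips = frozenset().union(*(walk(child) for child in node))
        found.add(tips)
        return tips

    walk(tree)
    return found

def rf_distance(tree1, tree2) -> int:
    """Number of clades present in exactly one of the two trees."""
    return len(clades(tree1) ^ clades(tree2))

t1 = (("A", "B"), ("C", "D"))
t2 = (("A", "C"), ("B", "D"))
distance = rf_distance(t1, t2)
```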
Important notes:
Use nj() for neighbor joining (classic phylogenetic method).
Use upgma() for UPGMA (assumes molecular clock).
Calculate alpha and beta diversity metrics for microbial ecology and community analysis.
Key capabilities:
Common patterns:
from skbio.diversity import alpha_diversity, beta_diversity
import skbio
# Alpha diversity
alpha = alpha_diversity('shannon', counts_matrix, ids=sample_ids)
faith_pd = alpha_diversity('faith_pd', counts_matrix, ids=sample_ids,
                           tree=tree, otu_ids=feature_ids)
# Beta diversity
bc_dm = beta_diversity('braycurtis', counts_matrix, ids=sample_ids)
unifrac_dm = beta_diversity('unweighted_unifrac', counts_matrix,
                            ids=sample_ids, tree=tree, otu_ids=feature_ids)
# Get available metrics
from skbio.diversity import get_alpha_diversity_metrics
print(get_alpha_diversity_metrics())
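
For readers who want to see what two of these metrics compute, here is a plain-Python sketch of the Shannon index for one sample and the Bray-Curtis dissimilarity between two samples. The function names are hypothetical and the logarithm base is an explicit parameter here; the default base used by scikit-bio may differ between versions, so compare values with care.

```python
import math

def shannon(counts, base: float = 2.0) -> float:
    """Shannon entropy of the relative abundances, ignoring zero counts."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p, base) for p in proportions)

def bray_curtis(u, v) -> float:
    """Sum of absolute count differences divided by the total count of both samples."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator

even_sample = shannon([5, 5])
dissimilarity = bray_curtis([6, 4], [4, 6])
```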
Important notes:
Use partial_beta_diversity() for computing specific sample pairs only.
Reduce high-dimensional biological data to visualizable lower-dimensional spaces.
Key capabilities:
Common patterns:
import skbio
from skbio.stats.ordination import pcoa, cca
# PCoA from distance matrix
pcoa_results = pcoa(distance_matrix)
pc1 = pcoa_results.samples['PC1']
pc2 = pcoa_results.samples['PC2']
# CCA with environmental variables
cca_results = cca(species_matrix, environmental_matrix)
# Save/load ordination results
pcoa_results.write('ordination.txt')
results = skbio.OrdinationResults.read('ordination.txt')
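
Classical PCoA typically starts by double-centering the squared distances before an eigendecomposition. The sketch below shows only that centering step in plain Python, under the assumption of a small square list-of-lists distance matrix; the function name is hypothetical and the eigendecomposition that yields the PC axes is left out.

```python
def gower_center(dm):
    """Double-center -0.5 * D**2 so that every row and column sums to zero."""
    n = len(dm)
    a = [[-0.5 * dm[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_means = [sum(row) / n for row in a]
    col_means = [sum(a[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [[a[i][j] - row_means[i] - col_means[j] + grand_mean
             for j in range(n)] for i in range(n)]

# Three points on a line at positions 0, 1, and 3
dm = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
b = gower_center(dm)
```

For Euclidean distances the centered matrix equals the inner products of the mean-centered coordinates, which is why its eigenvectors recover the sample positions.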
Important notes:
Perform hypothesis tests specific to ecological and biological data.
Key capabilities:
Common patterns:
from skbio.stats.distance import permanova, anosim, mantel
# Test if groups differ significantly
permanova_results = permanova(distance_matrix, grouping, permutations=999)
print(f"p-value: {permanova_results['p-value']}")
# ANOSIM test
anosim_results = anosim(distance_matrix, grouping, permutations=999)
# Mantel test between two distance matrices
mantel_results = mantel(dm1, dm2, method='pearson', permutations=999)
print(f"Correlation: {mantel_results[0]}, p-value: {mantel_results[1]}")
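
The permutation logic behind a Mantel test can be sketched without any dependencies. The version below correlates the upper triangles of two square matrices and builds a two-sided p-value by shuffling the sample order of the second matrix; the function names, the seed handling, and the two-sided choice are assumptions of this sketch and not a description of scikit-bio's internals.

```python
import math
import random
from itertools import combinations

def pearson(x, y) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def mantel_sketch(dm1, dm2, permutations: int = 999, seed: int = 0):
    """Return (correlation, p-value) from permuting rows and columns of dm2."""
    n = len(dm1)
    pairs = list(combinations(range(n), 2))  # upper-triangle index pairs
    x = [dm1[i][j] for i, j in pairs]
    observed = pearson(x, [dm2[i][j] for i, j in pairs])
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)  # relabel the samples of dm2
        y = [dm2[order[i]][order[j]] for i, j in pairs]
        if abs(pearson(x, y)) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (permutations + 1)

dm = [[0, 1, 2, 5], [1, 0, 3, 4], [2, 3, 0, 6], [5, 4, 6, 0]]
r, p = mantel_sketch(dm, dm, permutations=99)
```

Rows and columns are permuted together because the entries of a distance matrix are not independent observations.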
Important notes:
Read and write 19+ biological file formats with automatic format detection.
Supported formats:
Common patterns:
import skbio
# Read with automatic format detection (omit format= to let scikit-bio sniff it)
seq = skbio.DNA.read('file.fasta')
tree = skbio.TreeNode.read('tree.nwk')
# Write to file
seq.write('output.fasta', format='fasta')
# Generator for large files (memory efficient)
for seq in skbio.io.read('large.fasta', format='fasta', constructor=skbio.DNA):
    process(seq)  # process() is a placeholder for your own function
# Convert formats (FASTQ reading requires a quality-score variant or phred_offset)
seqs = skbio.io.read('input.fastq', format='fastq', variant='illumina1.8',
                     constructor=skbio.DNA)
skbio.io.write(seqs, format='fasta', into='output.fasta')  # pass the generator, not a list
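
The memory benefit of the generator pattern comes from yielding one record at a time. A minimal plain-Python FASTA reader illustrates the idea; the function name is hypothetical, and it skips the validation, metadata handling, and format sniffing that scikit-bio's readers perform.

```python
import io

def iter_fasta(handle):
    """Yield (header, sequence) pairs one record at a time."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # emit the finished record
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)  # emit the last record

records = list(iter_fasta(io.StringIO(">a\nAC\nGT\n>b\nTT\n")))
```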
Important notes:
Format can be auto-detected when the into parameter is specified.
Use verify=False for stdin/stdout piping.
Create and manipulate distance/dissimilarity matrices with statistical methods.
Key capabilities:
Common patterns:
from skbio import DistanceMatrix
from skbio.stats.ordination import pcoa
from skbio.stats.distance import permanova
import numpy as np
# Create from array
data = np.array([[0, 1, 2], [1, 0, 3], [2, 3, 0]])
dm = DistanceMatrix(data, ids=['A', 'B', 'C'])
# Access distances
dist_ab = dm['A', 'B']
row_a = dm['A']
# Read from file
dm = DistanceMatrix.read('distances.txt')
# Use in downstream analyses
pcoa_results = pcoa(dm)
permanova_results = permanova(dm, grouping)
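
A distance matrix is expected to be square, symmetric, and hollow (zeros on the diagonal). The plain-Python check below sketches those three conditions for a list-of-lists matrix; the function name and tolerance are assumptions for illustration, not scikit-bio's validation code.

```python
def check_distance_matrix(data, tol: float = 1e-12) -> bool:
    """Raise ValueError unless data is square, hollow, and symmetric."""
    n = len(data)
    if any(len(row) != n for row in data):
        raise ValueError("matrix must be square")
    for i in range(n):
        if abs(data[i][i]) > tol:
            raise ValueError(f"diagonal entry {i} is not zero")
        for j in range(i + 1, n):
            if abs(data[i][j] - data[j][i]) > tol:
                raise ValueError(f"entries ({i}, {j}) and ({j}, {i}) differ")
    return True

ok = check_distance_matrix([[0, 1, 2], [1, 0, 3], [2, 3, 0]])
```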
Important notes:
Work with feature tables (OTU/ASV tables) common in microbiome research.
Key capabilities:
Common patterns:
from skbio import Table
# Read BIOM table
table = Table.read('table.biom')
# Access data
sample_ids = table.ids(axis='sample')
feature_ids = table.ids(axis='observation')
counts = table.matrix_data
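# Filter (inplace=False returns a new table instead of modifying this one)
filtered = table.filter(sample_ids_to_keep, axis='sample', inplace=False)
# Convert to/from pandas (rows are observations, columns are samples)
df = table.to_dataframe(dense=True)
table = Table(df.to_numpy(), list(df.index), list(df.columns))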
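
Filtering a feature table by sample amounts to selecting columns while keeping every feature row. The sketch below shows this on a plain list-of-lists count matrix with features as rows and samples as columns; the function name and data are hypothetical, and a real BIOM table also carries metadata and sparse storage that this ignores.

```python
def filter_samples(counts, sample_ids, keep):
    """Return the count matrix and sample IDs restricted to the IDs in keep."""
    keep = set(keep)
    columns = [k for k, sid in enumerate(sample_ids) if sid in keep]
    filtered_counts = [[row[k] for k in columns] for row in counts]
    filtered_ids = [sample_ids[k] for k in columns]
    return filtered_counts, filtered_ids

counts = [[1, 0, 3],   # feature 1 across samples S1, S2, S3
          [2, 5, 0]]   # feature 2 across samples S1, S2, S3
kept_counts, kept_ids = filter_samples(counts, ["S1", "S2", "S3"], ["S1", "S3"])
```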
Important notes:
Work with protein language model embeddings for downstream analysis.
Key capabilities:
Common patterns:
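from skbio.embedding import (ProteinVector, embed_vec_to_numpy, embed_vec_to_dataframe,
                             embed_vec_to_distances, embed_vec_to_ordination)
# Create one vector per protein (vectors: 1-D arrays from a protein language model;
# sequences: the matching protein sequence strings)
protein_vectors = [ProteinVector(vec, seq) for vec, seq in zip(vectors, sequences)]
# Convert to distance matrix for analysis
dm = embed_vec_to_distances(protein_vectors, metric='euclidean')
# Ordination of the embedding space for visualization
ordination = embed_vec_to_ordination(protein_vectors)
# Export for machine learning
array = embed_vec_to_numpy(protein_vectors)
df = embed_vec_to_dataframe(protein_vectors)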
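
Turning embeddings into a distance matrix means computing a pairwise metric between the vectors. The sketch below does this with Euclidean distance in plain Python (math.dist needs Python 3.8 or later); the function name and the toy vectors are assumptions, and real protein language model embeddings have hundreds of dimensions.

```python
import math

def euclidean_distances(vectors):
    """Return a symmetric matrix of pairwise Euclidean distances."""
    n = len(vectors)
    dm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(vectors[i], vectors[j])
            dm[i][j] = dm[j][i] = d  # fill both triangles at once
    return dm

embedding_dm = euclidean_distances([[0, 0], [3, 4], [0, 1]])
```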
Important notes:
uv pip install scikit-bio
Use partial_beta_diversity() to parallelize beta diversity calculations.
For detailed API information, parameter specifications, and advanced usage examples, refer to references/api_reference.md, which contains comprehensive documentation.
Weekly installs: 163
Repository: davila7/claude-code-templates
GitHub stars: 23.5K
First seen: Jan 21, 2026
Security audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)
Installed on: claude-code (141), opencode (136), gemini-cli (130), cursor (127), antigravity (118), codex (118)