Biopython Complete Guide: A Python Bioinformatics Toolkit for DNA/RNA/Protein Sequence and Structure Analysis

biopython by davila7/claude-code-templates

Install command

npx skills add https://github.com/davila7/claude-code-templates --skill biopython

First seen: January 21, 2026

Biopython: Computational Molecular Biology in Python

Overview

Biopython is a comprehensive set of freely available Python tools for biological computation. It provides functionality for sequence manipulation, file I/O, database access, structural bioinformatics, phylogenetics, and many other bioinformatics tasks. This guide targets Biopython 1.85 (released January 2025), which supports Python 3 and requires NumPy.

When to Use This Skill

Use this skill when:

Working with biological sequences (DNA, RNA, or protein)
Reading, writing, or converting biological file formats (FASTA, GenBank, FASTQ, PDB, mmCIF, etc.)
Accessing NCBI databases (GenBank, PubMed, Protein, Gene, etc.) via Entrez
Running BLAST searches or parsing BLAST results
Performing sequence alignments (pairwise or multiple sequence alignments)
Analyzing protein structures from PDB files
Creating, manipulating, or visualizing phylogenetic trees
Finding sequence motifs or analyzing motif patterns
Calculating sequence statistics (GC content, molecular weight, melting temperature, etc.)
Performing structural bioinformatics tasks
Working with population genetics data
Any other computational molecular biology task

Core Capabilities

Biopython is organized into modular sub-packages, each addressing specific bioinformatics domains:

Sequence Handling - Bio.Seq and Bio.SeqIO for sequence manipulation and file I/O
Alignment Analysis - Bio.Align and Bio.AlignIO for pairwise and multiple sequence alignments
Database Access - Bio.Entrez for programmatic access to NCBI databases
BLAST Operations - Bio.Blast for running and parsing BLAST searches
Structural Bioinformatics - Bio.PDB for working with 3D protein structures
Phylogenetics - Bio.Phylo for phylogenetic tree manipulation and visualization
Advanced Features - Motifs, population genetics, sequence utilities, and more

Installation and Setup

Install Biopython with uv's pip interface, or with plain pip via pip install biopython. It requires Python 3, and NumPy is pulled in as a dependency:

uv pip install biopython

For NCBI database access, always set your email address (required by NCBI):

from Bio import Entrez
Entrez.email = "your.email@example.com"

# Optional: API key for higher rate limits (10 req/s instead of 3 req/s)
Entrez.api_key = "your_api_key_here"
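
The two request rates above translate into a minimum spacing between calls. The following is a minimal client-side throttle sketch for loops that issue many requests. `min_interval` and `Throttle` are hypothetical helper names introduced here for illustration; they are not part of Biopython.

```python
import time


def min_interval(api_key=None):
    """Seconds between requests: 10 req/s with an API key, 3 req/s without."""
    return 1 / 10 if api_key else 1 / 3


class Throttle:
    """Sleep just long enough to keep calls at least `interval` seconds apart."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


throttle = Throttle(min_interval(api_key=None))
for _ in range(3):
    throttle.wait()  # place the Entrez.esearch / Entrez.efetch call after this line
```

The clock and sleep functions are injectable only so the class can be exercised without real waiting.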

Using This Skill

This skill provides comprehensive documentation organized by functionality area. When working on a task, consult the relevant reference documentation:

1. Sequence Handling (Bio.Seq & Bio.SeqIO)

Reference: references/sequence_io.md

Use for:

Creating and manipulating biological sequences
Reading and writing sequence files (FASTA, GenBank, FASTQ, etc.)
Converting between file formats
Extracting sequences from large files
Sequence translation, transcription, and reverse complement
Working with SeqRecord objects

Quick example:

from Bio import SeqIO

# Read sequences from FASTA file
for record in SeqIO.parse("sequences.fasta", "fasta"):
    print(f"{record.id}: {len(record.seq)} bp")

# Convert GenBank to FASTA
SeqIO.convert("input.gb", "genbank", "output.fasta", "fasta")
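
The list above also mentions translation, transcription, and reverse complement, which the quick example does not show. A minimal sketch on a made-up nine-base sequence, using methods of the `Seq` class:

```python
from Bio.Seq import Seq

dna = Seq("ATGGCCTAA")  # toy coding sequence: Met, Ala, stop

rna = dna.transcribe()                 # replaces T with U
revcomp = dna.reverse_complement()     # complement each base, then reverse
protein = dna.translate()              # standard genetic code; "*" marks a stop
trimmed = dna.translate(to_stop=True)  # stop before the first stop codon

# Table 2 is the vertebrate mitochondrial code, where TGA encodes Trp
mito = Seq("ATGTGA").translate(table=2)

print(rna)      # AUGGCCUAA
print(revcomp)  # TTAGGCCAT
print(protein)  # MA*
print(trimmed)  # MA
print(mito)     # MW
```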

2. Alignment Analysis (Bio.Align & Bio.AlignIO)

Reference: references/alignment.md

Use for:

Pairwise sequence alignment (global and local)
Reading and writing multiple sequence alignments
Using substitution matrices (BLOSUM, PAM)
Calculating alignment statistics
Customizing alignment parameters

Quick example:

from Bio import Align

# Pairwise alignment
aligner = Align.PairwiseAligner()
aligner.mode = 'global'
alignments = aligner.align("ACCGGT", "ACGGT")
print(alignments[0])
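
Customizing alignment parameters, as listed above, comes down to setting scoring attributes on the aligner. The sketch below sets every score explicitly so the result does not depend on version-specific defaults; the score values themselves are arbitrary choices for illustration.

```python
from Bio import Align

aligner = Align.PairwiseAligner()
aligner.mode = "global"
aligner.match_score = 1.0
aligner.mismatch_score = -1.0
aligner.open_gap_score = -2.0
aligner.extend_gap_score = -0.5

# Best global alignment: five matches (+5) and one single-base gap (-2)
score = aligner.score("ACCGGT", "ACGGT")
print(score)  # 3.0
```

For protein sequences, a substitution matrix can replace the match and mismatch scores, for example `aligner.substitution_matrix = substitution_matrices.load("BLOSUM62")` after `from Bio.Align import substitution_matrices`.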

3. Database Access (Bio.Entrez)

Reference: references/databases.md

Use for:

Searching NCBI databases (PubMed, GenBank, Protein, Gene, etc.)
Downloading sequences and records
Fetching publication information
Finding related records across databases
Batch downloading with proper rate limiting

Quick example:

from Bio import Entrez
Entrez.email = "your.email@example.com"

# Search PubMed
handle = Entrez.esearch(db="pubmed", term="biopython", retmax=10)
results = Entrez.read(handle)
handle.close()
print(f"Found {results['Count']} results")
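
For the batch downloading mentioned above, one common approach is to split a long ID list into chunks and send each chunk as a comma-separated string. The sketch below shows only the chunking logic. `batched`, `fetch_in_batches`, and the default batch size of 200 are assumptions made for illustration, and the `fetch` argument is a stand-in for a function that wraps `Entrez.efetch`.

```python
from itertools import islice


def batched(ids, size):
    """Yield consecutive lists of at most `size` items."""
    it = iter(ids)
    while chunk := list(islice(it, size)):
        yield chunk


def fetch_in_batches(ids, fetch, size=200):
    """Call fetch(comma_separated_ids) once per batch and collect the results."""
    return [fetch(",".join(batch)) for batch in batched(ids, size)]


# Stand-in fetcher so the sketch runs offline; it simply echoes its argument
demo_ids = [str(n) for n in range(1, 8)]
calls = fetch_in_batches(demo_ids, fetch=lambda joined: joined, size=3)
print(calls)  # ['1,2,3', '4,5,6', '7']
```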

4. BLAST Operations (Bio.Blast)

Reference: references/blast.md

Use for:

Running BLAST searches via NCBI web services
Running local BLAST searches
Parsing BLAST XML output
Filtering results by E-value or identity
Extracting hit sequences

Quick example:

from Bio.Blast import NCBIWWW, NCBIXML

# Run BLAST search
result_handle = NCBIWWW.qblast("blastn", "nt", "ATCGATCGATCG")
blast_record = NCBIXML.read(result_handle)

# Display top hits
for alignment in blast_record.alignments[:5]:
    print(f"{alignment.title}: E-value={alignment.hsps[0].expect}")
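
Filtering by E-value, listed above, can be sketched with the same attributes the quick example reads (`alignments`, `title`, `hsps`, `expect`). `filter_hits` is a hypothetical helper, and the mock record below uses made-up values in place of a real BLAST result so that the sketch runs offline.

```python
from types import SimpleNamespace


def filter_hits(blast_record, max_evalue=1e-5):
    """Return (title, best_evalue) for hits whose best HSP passes the cutoff."""
    kept = []
    for alignment in blast_record.alignments:
        if not alignment.hsps:
            continue  # nothing to score for this hit
        best = min(hsp.expect for hsp in alignment.hsps)
        if best <= max_evalue:
            kept.append((alignment.title, best))
    return sorted(kept, key=lambda item: item[1])  # most significant first


# Mock record with invented E-values, mimicking the attributes used above
mock = SimpleNamespace(alignments=[
    SimpleNamespace(title="hit_A", hsps=[SimpleNamespace(expect=1e-30)]),
    SimpleNamespace(title="hit_B", hsps=[SimpleNamespace(expect=0.5)]),
    SimpleNamespace(title="hit_C", hsps=[SimpleNamespace(expect=1e-8),
                                         SimpleNamespace(expect=2.0)]),
])
print(filter_hits(mock))  # [('hit_A', 1e-30), ('hit_C', 1e-08)]
```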

5. Structural Bioinformatics (Bio.PDB)

Reference: references/structure.md

Use for:

Parsing PDB and mmCIF structure files
Navigating protein structure hierarchy (SMCRA: Structure/Model/Chain/Residue/Atom)
Calculating distances, angles, and dihedrals
Secondary structure assignment (DSSP)
Structure superimposition and RMSD calculation
Extracting sequences from structures

Quick example:

from Bio.PDB import PDBParser

# Parse structure
parser = PDBParser(QUIET=True)
structure = parser.get_structure("1crn", "1crn.pdb")

# Calculate distance between alpha carbons
chain = structure[0]["A"]
distance = chain[10]["CA"] - chain[20]["CA"]
print(f"Distance: {distance:.2f} Å")

6. Phylogenetics (Bio.Phylo)

Reference: references/phylogenetics.md

Use for:

Reading and writing phylogenetic trees (Newick, NEXUS, phyloXML)
Building trees from distance matrices or alignments
Tree manipulation (pruning, rerooting, ladderizing)
Calculating phylogenetic distances
Creating consensus trees
Visualizing trees

Quick example:

from Bio import Phylo

# Read and visualize tree
tree = Phylo.read("tree.nwk", "newick")
Phylo.draw_ascii(tree)

# Calculate distance
distance = tree.distance("Species_A", "Species_B")
print(f"Distance: {distance:.3f}")
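
To try the distance calculation without a tree file, a Newick string can be parsed from memory. This sketch uses a made-up three-taxon tree; the distance between two tips is the sum of the branch lengths on the path connecting them.

```python
from io import StringIO

from Bio import Phylo

newick = "((A:1.0,B:2.0):0.5,C:3.0);"  # toy tree, invented for illustration
tree = Phylo.read(StringIO(newick), "newick")

n_tips = tree.count_terminals()
d_ab = tree.distance("A", "B")  # 1.0 + 2.0
d_ac = tree.distance("A", "C")  # 1.0 + 0.5 + 3.0

print(n_tips)          # 3
print(f"{d_ab:.1f}")   # 3.0
print(f"{d_ac:.1f}")   # 4.5
```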

7. Advanced Features

Reference: references/advanced.md

Use for:

Sequence motifs (Bio.motifs) - Finding and analyzing motif patterns
Population genetics (Bio.PopGen) - GenePop files, Fst calculations, Hardy-Weinberg tests
Sequence utilities (Bio.SeqUtils) - GC content, melting temperature, molecular weight, protein analysis
Restriction analysis (Bio.Restriction) - Finding restriction enzyme sites
Clustering (Bio.Cluster) - K-means and hierarchical clustering
Genome diagrams (GenomeDiagram) - Visualizing genomic features

Quick example:

from Bio.SeqUtils import gc_fraction, molecular_weight
from Bio.Seq import Seq

seq = Seq("ATCGATCGATCG")
print(f"GC content: {gc_fraction(seq):.2%}")
print(f"Molecular weight: {molecular_weight(seq, seq_type='DNA'):.2f} g/mol")
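
Melting temperature is listed above but not shown. As a small sketch, `Tm_Wallace` from `Bio.SeqUtils.MeltingTemp` applies the Wallace rule of thumb (4 °C per G or C, 2 °C per A or T), which is only a rough estimate for short oligos.

```python
from Bio.Seq import Seq
from Bio.SeqUtils import MeltingTemp as mt

primer = Seq("ATCGATCGATCG")  # 6 G/C and 6 A/T bases
tm = mt.Tm_Wallace(primer)    # 4 * 6 + 2 * 6
print(f"Tm (Wallace): {tm:.1f} °C")  # Tm (Wallace): 36.0 °C
```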

General Workflow Guidelines

Reading Documentation

When a user asks about a specific Biopython task:

Identify the relevant module based on the task description
Read the appropriate reference file using the Read tool
Extract relevant code patterns and adapt them to the user's specific needs
Combine multiple modules when the task requires it

Example search patterns for reference files:

# Find information about specific functions
grep -n "SeqIO.parse" references/sequence_io.md

# Find examples of specific tasks
grep -n "BLAST" references/blast.md

# Find information about specific concepts
grep -n "alignment" references/alignment.md

Writing Biopython Code

Follow these principles when writing Biopython code:

Import modules explicitly

from Bio import SeqIO, Entrez
from Bio.Seq import Seq

Set Entrez email when using NCBI databases

Entrez.email = "your.email@example.com"

Use appropriate file formats - Check which format best suits the task

# Common formats: "fasta", "genbank", "fastq", "clustal", "phylip"

Handle files properly - Close handles after use or use context managers

with open("file.fasta") as handle:
    # SeqIO.parse is lazy, so consume the records before the file closes
    records = list(SeqIO.parse(handle, "fasta"))

Use iterators for large files - Avoid loading everything into memory

for record in SeqIO.parse("large_file.fasta", "fasta"):
    # Process one record at a time
    print(record.id, len(record.seq))
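
The iterator approach also works end to end: a generator can filter records as they stream from the parser into `SeqIO.write`, so only one record is held in memory at a time. In this sketch the in-memory `StringIO` objects stand in for real input and output files, and the length cutoff of 4 is arbitrary.

```python
from io import StringIO

from Bio import SeqIO

fasta_text = ">seq1\nATGC\n>seq2\nATGCATGCATGC\n>seq3\nAT\n"  # toy FASTA data

source = StringIO(fasta_text)  # stands in for open("large_file.fasta")
sink = StringIO()              # stands in for an output file handle

# Generator expression: records are parsed, tested, and written one by one
long_records = (rec for rec in SeqIO.parse(source, "fasta") if len(rec.seq) >= 4)
written = SeqIO.write(long_records, sink, "fasta")

print(written)  # 2
output = sink.getvalue()
```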

Handle errors gracefully - Network operations and file parsing can fail
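
On the file-parsing side, `SeqIO.read` raises `ValueError` when a handle contains zero records or more than one. A minimal sketch of catching that case follows; `read_single_record` is a hypothetical wrapper name, and the inputs are toy strings.

```python
from io import StringIO

from Bio import SeqIO


def read_single_record(handle, fmt):
    """Return the only record in `handle`, or None if the count is not exactly one."""
    try:
        return SeqIO.read(handle, fmt)
    except ValueError as err:
        print(f"Could not read a single {fmt} record: {err}")
        return None


ok = read_single_record(StringIO(">seq1\nATGC\n"), "fasta")
empty = read_single_record(StringIO(""), "fasta")
two = read_single_record(StringIO(">a\nAT\n>b\nGC\n"), "fasta")
print(ok.id, empty, two)  # seq1 None None
```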

Common Patterns

Pattern 1: Fetch Sequence from GenBank

from Bio import Entrez, SeqIO

Entrez.email = "your.email@example.com"

# Fetch sequence
handle = Entrez.efetch(db="nucleotide", id="EU490707", rettype="gb", retmode="text")
record = SeqIO.read(handle, "genbank")
handle.close()

print(f"Description: {record.description}")
print(f"Sequence length: {len(record.seq)}")

Pattern 2: Sequence Analysis Pipeline

from Bio import SeqIO
from Bio.SeqUtils import gc_fraction

for record in SeqIO.parse("sequences.fasta", "fasta"):
    # Calculate statistics
    gc = gc_fraction(record.seq)
    length = len(record.seq)

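    # Translate frame 1, trimmed to whole codons (a real ORF search would scan all six frames)
    coding = record.seq[: length - length % 3]
    protein = coding.translate()

    print(f"{record.id}: {length} bp, GC={gc:.2%}, {len(protein)} codons in frame 1")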

Pattern 3: BLAST and Fetch Top Hits

from Bio.Blast import NCBIWWW, NCBIXML
from Bio import Entrez, SeqIO

Entrez.email = "your.email@example.com"

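# Query as a raw nucleotide string or FASTA text ("query.fasta" is a placeholder file name)
with open("query.fasta") as query_file:
    sequence = query_file.read()

# Run BLAST
result_handle = NCBIWWW.qblast("blastn", "nt", sequence)
blast_record = NCBIXML.read(result_handle)
result_handle.close()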

# Get top hit accessions
accessions = [aln.accession for aln in blast_record.alignments[:5]]

# Fetch sequences
for acc in accessions:
    handle = Entrez.efetch(db="nucleotide", id=acc, rettype="fasta", retmode="text")
    record = SeqIO.read(handle, "fasta")
    handle.close()
    print(f">{record.description}")

Pattern 4: Build Phylogenetic Tree from Sequences

from Bio import AlignIO, Phylo
from Bio.Phylo.TreeConstruction import DistanceCalculator, DistanceTreeConstructor

# Read alignment
alignment = AlignIO.read("alignment.fasta", "fasta")

# Calculate distances
calculator = DistanceCalculator("identity")
dm = calculator.get_distance(alignment)

# Build tree
constructor = DistanceTreeConstructor()
tree = constructor.nj(dm)

# Visualize
Phylo.draw_ascii(tree)

Best Practices

Always read relevant reference documentation before writing code
Use grep to search reference files for specific functions or examples
Validate file formats before parsing
Handle missing data gracefully - Not all records have all fields
Cache downloaded data - Don't repeatedly download the same sequences
Respect NCBI rate limits - Use API keys and proper delays
Test with small datasets before processing large files
Keep Biopython updated to get latest features and bug fixes
Use appropriate genetic code tables for translation
Document analysis parameters for reproducibility
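
The caching advice above can be sketched as a small on-disk cache keyed by accession. `cached_fetch` and `fake_fetch` are hypothetical names; the fake downloader returns invented data and only counts its calls, and in practice it would be replaced by a function that wraps `Entrez.efetch(...)` and returns the downloaded text.

```python
import tempfile
from pathlib import Path


def cached_fetch(accession, fetch, cache_dir):
    """Return the text for `accession`, calling `fetch` only on a cache miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{accession}.txt"
    if path.exists():
        return path.read_text()
    text = fetch(accession)
    path.write_text(text)
    return text


calls = []


def fake_fetch(accession):
    """Stand-in downloader: records each call and returns made-up FASTA text."""
    calls.append(accession)
    return f">{accession}\nATGC\n"


with tempfile.TemporaryDirectory() as tmp:
    first = cached_fetch("EU490707", fake_fetch, tmp)
    second = cached_fetch("EU490707", fake_fetch, tmp)  # served from disk

print(len(calls))  # 1
```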

Troubleshooting Common Issues

Issue: "No handlers could be found for logger 'Bio.Entrez'"

Solution: This is just a warning. Set Entrez.email to suppress it.

Issue: "HTTP Error 400" from NCBI

Solution: Check that IDs/accessions are valid and properly formatted.

Issue: "ValueError: EOF" when parsing files

Solution: Verify file format matches the specified format string.

Issue: Alignment fails with "sequences are not the same length"

Solution: Ensure sequences are aligned before using AlignIO or MultipleSeqAlignment.
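
A quick way to confirm this diagnosis is to group record names by sequence length before building the alignment object. The sketch below is plain Python on made-up data; `length_report` is a hypothetical helper, not a Biopython function.

```python
def length_report(named_seqs):
    """Map each sequence length to the names of the sequences that have it."""
    report = {}
    for name, seq in named_seqs:
        report.setdefault(len(seq), []).append(name)
    return report


rows = [("s1", "ACG-T"), ("s2", "ACGGT"), ("s3", "ACGT")]  # s3 lacks gap padding
report = length_report(rows)
print(report)  # {5: ['s1', 's2'], 4: ['s3']}
if len(report) > 1:
    print("Unequal lengths: align the sequences before loading them as an alignment")
```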

Issue: BLAST searches are slow

Solution: Use local BLAST for large-scale searches, or cache results.

Issue: PDB parser warnings

Solution: Use PDBParser(QUIET=True) to suppress warnings, or investigate structure quality.

Additional Resources

Official Documentation: https://biopython.org/docs/latest/
Tutorial: https://biopython.org/docs/latest/Tutorial/
Cookbook: https://biopython.org/docs/latest/Tutorial/ (advanced examples)
GitHub: https://github.com/biopython/biopython
Mailing List: biopython@biopython.org

Quick Reference

To locate information in reference files, use these search patterns:

# Search for specific functions
grep -n "function_name" references/*.md

# Find examples of specific tasks
grep -n "example" references/sequence_io.md

# Find all occurrences of a module
grep -n "Bio.Seq" references/*.md

Summary

Biopython provides comprehensive tools for computational molecular biology. When using this skill:

Identify the task domain (sequences, alignments, databases, BLAST, structures, phylogenetics, or advanced)
Consult the appropriate reference file in the references/ directory
Adapt code examples to the specific use case
Combine multiple modules when needed for complex workflows
Follow best practices for file handling, error checking, and data management

The modular reference documentation ensures detailed, searchable information for every major Biopython capability.

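Error-handling example for the "Handle errors gracefully" principle in Writing Biopython Code:

from urllib.error import HTTPError
from Bio import Entrez

accession = "EU490707"  # accession reused from Pattern 1; assumes Entrez.email is already set
try:
    handle = Entrez.efetch(db="nucleotide", id=accession)
except HTTPError as e:
    print(f"Error: {e}")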
