pysam by davila7/claude-code-templates
npx skills add https://github.com/davila7/claude-code-templates --skill pysam
Pysam is a Python module for reading, manipulating, and writing genomic datasets. Read/write SAM/BAM/CRAM alignment files, VCF/BCF variant files, and FASTA/FASTQ sequences with a Pythonic interface to htslib. Query tabix-indexed files, perform pileup analysis for coverage, and execute samtools/bcftools commands.
This skill should be used when:
uv pip install pysam
Read alignment file:
import pysam
# Open BAM file and fetch reads in region
samfile = pysam.AlignmentFile("example.bam", "rb")
for read in samfile.fetch("chr1", 1000, 2000):
    print(f"{read.query_name}: {read.reference_start}")
samfile.close()
Read variant file:
# Open VCF file and iterate variants
vcf = pysam.VariantFile("variants.vcf")
for variant in vcf:
    print(f"{variant.chrom}:{variant.pos} {variant.ref}>{variant.alts}")
vcf.close()
Query reference sequence:
# Open FASTA and extract sequence
fasta = pysam.FastaFile("reference.fasta")
sequence = fasta.fetch("chr1", 1000, 2000)
print(sequence)
fasta.close()
Use the AlignmentFile class to work with aligned sequencing reads. This is appropriate for analyzing mapping results, calculating coverage, extracting reads, or quality control.
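As a rough illustration of the quality-control use case, the sketch below tallies mapped, unmapped, and duplicate reads from any iterable of read-like objects. It relies only on the `is_unmapped`, `is_duplicate`, and `mapping_quality` attributes that `pysam.AlignedSegment` exposes; the names `ReadSummary` and `summarize_reads` are hypothetical, and the stand-in reads exist only so the sketch runs without a BAM file.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Iterable


@dataclass
class ReadSummary:
    """Simple QC counters (hypothetical helper, not part of pysam)."""
    total: int = 0
    mapped: int = 0
    duplicates: int = 0
    mean_mapq: float = 0.0


def summarize_reads(reads: Iterable) -> ReadSummary:
    """Count reads and average MAPQ over the mapped ones."""
    summary = ReadSummary()
    mapq_sum = 0
    for read in reads:
        summary.total += 1
        if read.is_unmapped:
            continue  # unmapped reads contribute only to the total
        summary.mapped += 1
        mapq_sum += read.mapping_quality
        if read.is_duplicate:
            summary.duplicates += 1
    if summary.mapped:
        summary.mean_mapq = mapq_sum / summary.mapped
    return summary


# Stand-in reads so the sketch runs without an alignment file.
demo_reads = [
    SimpleNamespace(is_unmapped=False, is_duplicate=False, mapping_quality=60),
    SimpleNamespace(is_unmapped=False, is_duplicate=True, mapping_quality=20),
    SimpleNamespace(is_unmapped=True, is_duplicate=False, mapping_quality=0),
]
summary = summarize_reads(demo_reads)
print(summary)
```

With a real file, the same function could be fed `samfile.fetch("chr1", 1000, 2000)` from an open `pysam.AlignmentFile`.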
Common operations:
Reference: See references/alignment_files.md for detailed documentation on:
Region-based retrieval with fetch()
Use the VariantFile class to work with genetic variants from variant calling pipelines. This is appropriate for variant analysis, filtering, annotation, or population genetics.
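A minimal filtering sketch, assuming a record is worth keeping when its QUAL meets a threshold and its FILTER column is either PASS or unset. The function name `keep_variant`, the default threshold of 30, and the treatment of an unset FILTER are illustrative assumptions rather than anything pysam prescribes. It reads `record.qual` and `record.filter.keys()`, which `pysam.VariantRecord` provides; the stand-in records below mimic just those two attributes.

```python
from types import SimpleNamespace


def keep_variant(record, min_qual: float = 30.0) -> bool:
    """Return True for records with QUAL >= min_qual and FILTER of PASS or unset."""
    if record.qual is None or record.qual < min_qual:
        return False
    filters = set(record.filter.keys())
    # Assumption: an empty FILTER column is treated the same as PASS.
    return not filters or filters == {"PASS"}


# Stand-in records so the sketch runs without a VCF file.
records = [
    SimpleNamespace(qual=50.0, filter={"PASS": None}),     # kept
    SimpleNamespace(qual=10.0, filter={"PASS": None}),     # low QUAL
    SimpleNamespace(qual=50.0, filter={"LowQual": None}),  # failed a filter
    SimpleNamespace(qual=None, filter={}),                 # missing QUAL
]
kept = [keep_variant(r) for r in records]
print(kept)
```

Against a real file this might be used as `[rec for rec in vcf if keep_variant(rec)]` on an open `pysam.VariantFile`.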
Common operations:
Reference: See references/variant_files.md for detailed documentation on:
Use FastaFile for random access to reference sequences and FastxFile for reading raw sequencing data. This is appropriate for extracting gene sequences, validating variants against reference, or processing raw reads.
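When processing raw reads, a common first step is to turn the FASTQ quality string into numeric Phred scores. The sketch below does this in plain Python, assuming the usual Phred+33 encoding; `phred_scores` and `mean_phred` are hypothetical helper names. Entries yielded by `pysam.FastxFile` expose the raw string as `entry.quality`, and also offer `get_quality_array()` for the same decoding.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode an ASCII quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in quality]


def mean_phred(quality: str, offset: int = 33) -> float:
    """Average Phred score of a read; 0.0 for an empty quality string."""
    scores = phred_scores(quality, offset)
    return sum(scores) / len(scores) if scores else 0.0


# '!' is ASCII 33 (Phred 0) and 'I' is ASCII 73 (Phred 40).
print(phred_scores("!I"))
print(mean_phred("IIII"))
```

In a read loop this could look like `mean_phred(entry.quality)` for each `entry` in an open `pysam.FastxFile`, skipping entries whose `quality` is `None` (FASTA input).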
Common operations:
Reference: See references/sequence_files.md for detailed documentation on:
Pysam excels at integrating multiple file types for comprehensive genomic analyses. Common workflows combine alignment files, variant files, and reference sequences.
Common workflows:
Reference: See references/common_workflows.md for detailed examples of:
Critical: Pysam uses 0-based, half-open coordinates (Python convention):
Exception: Region strings in fetch() follow samtools convention (1-based):
samfile.fetch("chr1", 999, 2000) # 0-based: positions 999-1999
samfile.fetch("chr1:1000-2000") # 1-based string: positions 1000-2000
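The relationship between the two calls above can be sketched as a pair of pure-Python conversions. The helper names `region_to_interval` and `interval_to_region` are hypothetical, and the parser assumes a simple `contig:start-end` string whose contig name contains no colon.

```python
import re

# Matches "contig:start-end"; commas in the numbers are tolerated.
_REGION = re.compile(r"^(?P<contig>[^:]+):(?P<start>[\d,]+)-(?P<end>[\d,]+)$")


def region_to_interval(region: str) -> tuple[str, int, int]:
    """Convert a 1-based inclusive region string to a 0-based half-open interval."""
    match = _REGION.match(region)
    if match is None:
        raise ValueError(f"unsupported region string: {region!r}")
    start = int(match["start"].replace(",", ""))
    end = int(match["end"].replace(",", ""))
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates in region: {region!r}")
    # Only the start shifts: 1-based inclusive end equals 0-based exclusive end.
    return match["contig"], start - 1, end


def interval_to_region(contig: str, start: int, end: int) -> str:
    """Convert a 0-based half-open interval to a 1-based inclusive region string."""
    return f"{contig}:{start + 1}-{end}"


print(region_to_interval("chr1:1000-2000"))
print(interval_to_region("chr1", 999, 2000))
```

Only the start coordinate changes between the two conventions, which is why `("chr1", 999, 2000)` and `"chr1:1000-2000"` describe the same bases.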
VCF files: The file format uses 1-based coordinates, and VariantRecord.pos reports that 1-based position, but VariantRecord.start is 0-based.
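Because `VariantRecord.start` is already 0-based, it can be passed straight to a reference lookup. The sketch below checks a REF allele against the reference sequence; `ref_allele_matches` is a hypothetical helper that takes any `fetch(chrom, start, end)` callable, and the in-memory genome is a stand-in so the example runs without a FASTA file.

```python
from typing import Callable


def ref_allele_matches(
    fetch: Callable[[str, int, int], str],
    chrom: str,
    start: int,
    ref: str,
) -> bool:
    """Compare a REF allele with the reference bases at a 0-based start."""
    observed = fetch(chrom, start, start + len(ref))
    # Case-insensitive, since reference FASTA files may be soft-masked.
    return observed.upper() == ref.upper()


# Stand-in for an indexed FASTA: 0-based, half-open slicing like FastaFile.fetch.
genome = {"chr1": "ACGTACGT"}


def fake_fetch(chrom: str, start: int, end: int) -> str:
    return genome[chrom][start:end]


print(ref_allele_matches(fake_fetch, "chr1", 2, "GT"))
print(ref_allele_matches(fake_fetch, "chr1", 2, "AA"))
```

With pysam objects the call would be roughly `ref_allele_matches(fasta.fetch, record.chrom, record.start, record.ref)`; using `record.pos` instead would shift the lookup by one base.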
Random access to specific genomic regions requires index files:
BAM files: .bai index (create with pysam.index())
CRAM files: .crai index
FASTA files: .fai index (create with pysam.faidx())
Compressed VCF files: .tbi tabix index (create with pysam.tabix_index())
BCF files: .csi index
Without an index, use fetch(until_eof=True) for sequential reading.
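A small pre-flight check can report whether the expected index sits next to a data file before random access is attempted. The sketch below is a hypothetical helper, not a pysam function. It assumes the common convention of appending the index suffix to the full file name (for example `sample.bam.bai`), and tools may accept other layouts.

```python
from pathlib import Path
from typing import Optional

# Data-file extension -> index suffix, following the list above.
INDEX_SUFFIX = {
    ".bam": ".bai",
    ".cram": ".crai",
    ".fa": ".fai",
    ".fasta": ".fai",
    ".vcf.gz": ".tbi",
    ".bcf": ".csi",
}


def candidate_index_path(data_path: str) -> Optional[Path]:
    """Return the conventional index path for a data file, or None if unknown."""
    path = Path(data_path)
    name = path.name.lower()
    for extension, suffix in INDEX_SUFFIX.items():
        if name.endswith(extension):
            return path.with_name(path.name + suffix)
    return None


def has_index(data_path: str) -> bool:
    """True if the conventional index file exists on disk."""
    index_path = candidate_index_path(data_path)
    return index_path is not None and index_path.exists()


print(candidate_index_path("sample.bam"))
print(candidate_index_path("calls.vcf.gz"))
```

When `has_index()` returns False, the caller can either build the index first or fall back to sequential reading with `fetch(until_eof=True)`.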
Specify format when opening files:
"rb" - Read BAM (binary)
"r" - Read SAM (text)
"rc" - Read CRAM
"wb" - Write BAM
"w" - Write SAM
"wc" - Write CRAM
Use pileup() for column-wise analysis instead of repeated fetch operations
Use count() for counting instead of iterating and counting manually
Use until_eof=True for sequential processing without index
Open several iterators on the same file only when required (pass multiple_iterators=True if needed)
fetch() returns reads overlapping region boundaries, not just those fully contained
query_qualities cannot be modified in place after changing query_sequence; create a copy first
Pysam provides access to samtools and bcftools commands:
# Sort BAM file
pysam.samtools.sort("-o", "sorted.bam", "input.bam")
# Index BAM
pysam.samtools.index("sorted.bam")
# View specific region
pysam.samtools.view("-b", "-o", "region.bam", "input.bam", "chr1:1000-2000")
# BCF tools
pysam.bcftools.view("-O", "z", "-o", "output.vcf.gz", "input.vcf")
Error handling:
try:
    pysam.samtools.sort("-o", "output.bam", "input.bam")
except pysam.SamtoolsError as e:
    print(f"Error: {e}")
Detailed documentation for each major capability:
alignment_files.md - Complete guide to SAM/BAM/CRAM operations, including AlignmentFile class, AlignedSegment attributes, fetch operations, pileup analysis, and writing alignments
variant_files.md - Complete guide to VCF/BCF operations, including VariantFile class, VariantRecord attributes, genotype handling, INFO/FORMAT fields, and multi-sample operations
sequence_files.md - Complete guide to FASTA/FASTQ operations, including FastaFile and FastxFile classes, sequence extraction, quality score handling, and tabix-indexed file access
common_workflows.md - Practical examples of integrated bioinformatics workflows combining multiple file types, including quality control, coverage analysis, variant validation, and sequence extraction
For detailed information on specific operations, refer to the appropriate reference document:
alignment_files.md
variant_files.md
sequence_files.md
common_workflows.md
Official documentation: https://pysam.readthedocs.io/