Important prerequisite
Installing AI Skills requires unrestricted internet access through a proxy or VPN with TUN mode enabled. Without it, the installation may not complete.
pysam by k-dense-ai/claude-scientific-skills
npx skills add https://github.com/k-dense-ai/claude-scientific-skills --skill pysam
Pysam is a Python module for reading, manipulating, and writing genomic datasets. Read/write SAM/BAM/CRAM alignment files, VCF/BCF variant files, and FASTA/FASTQ sequences with a Pythonic interface to htslib. Query tabix-indexed files, perform pileup analysis for coverage, and execute samtools/bcftools commands.
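This skill should be used when working with SAM/BAM/CRAM alignment files, VCF/BCF variant files, or FASTA/FASTQ sequence files from Python.
Install:
uv pip install pysam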
Read alignment file:
import pysam
# Open BAM file and fetch reads in region
samfile = pysam.AlignmentFile("example.bam", "rb")
for read in samfile.fetch("chr1", 1000, 2000):
    print(f"{read.query_name}: {read.reference_start}")
samfile.close()
Read variant file:
# Open VCF file and iterate variants
vcf = pysam.VariantFile("variants.vcf")
for variant in vcf:
    print(f"{variant.chrom}:{variant.pos} {variant.ref}>{variant.alts}")
vcf.close()
Query reference sequence:
# Open FASTA and extract sequence
fasta = pysam.FastaFile("reference.fasta")
sequence = fasta.fetch("chr1", 1000, 2000)
print(sequence)
fasta.close()
Use the AlignmentFile class to work with aligned sequencing reads. This is appropriate for analyzing mapping results, calculating coverage, extracting reads, or quality control.
Common operations:
Reference: See references/alignment_files.md for detailed documentation, including region-based retrieval with fetch().
Use the VariantFile class to work with genetic variants from variant calling pipelines. This is appropriate for variant analysis, filtering, annotation, or population genetics.
Common operations:
Reference: See references/variant_files.md for detailed documentation.
Use FastaFile for random access to reference sequences and FastxFile for reading raw sequencing data. This is appropriate for extracting gene sequences, validating variants against reference, or processing raw reads.
Common operations:
Reference: See references/sequence_files.md for detailed documentation.
Pysam excels at integrating multiple file types for comprehensive genomic analyses. Common workflows combine alignment files, variant files, and reference sequences.
Common workflows:
Reference: See references/common_workflows.md for detailed examples.
Critical: Pysam uses 0-based, half-open coordinates (Python convention): start positions are inclusive and end positions are exclusive.
Exception: Region strings in fetch() follow samtools convention (1-based):
samfile.fetch("chr1", 999, 2000) # 0-based: positions 999-1999
samfile.fetch("chr1:1000-2000") # 1-based string: positions 1000-2000
VCF files: Use 1-based coordinates in the file format, but VariantRecord.start is 0-based.
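The conversion between the two conventions can be sketched in plain Python. The helper names `region_to_interval`, `interval_to_region`, and `vcf_pos_to_start` are hypothetical and are not part of pysam. They only spell out the arithmetic described above.

```python
def region_to_interval(region):
    """Convert a 1-based inclusive 'contig:start-end' string to a
    0-based half-open (contig, start, stop) tuple."""
    contig, _, span = region.rpartition(":")
    start_1, end_1 = (int(part) for part in span.split("-"))
    return contig, start_1 - 1, end_1


def interval_to_region(contig, start, stop):
    """Convert a 0-based half-open interval back to a 1-based region string."""
    return f"{contig}:{start + 1}-{stop}"


def vcf_pos_to_start(pos):
    """Convert a 1-based VCF POS to the 0-based value VariantRecord.start reports."""
    return pos - 1


print(region_to_interval("chr1:1000-2000"))
print(interval_to_region("chr1", 999, 2000))
print(vcf_pos_to_start(1001))
```

Only the start shifts by one. The end value is numerically the same in both conventions because an inclusive 1-based end and an exclusive 0-based end point at the same boundary.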
Random access to specific genomic regions requires index files:
BAM files: .bai index (create with pysam.index())
CRAM files: .crai index
FASTA files: .fai index (create with pysam.faidx())
VCF.gz files: .tbi tabix index (create with pysam.tabix_index())
BCF files: .csi index
Without an index, use fetch(until_eof=True) for sequential reading.
Specify format when opening files:
"rb" - Read BAM (binary)
"r" - Read SAM (text)
"rc" - Read CRAM
"wb" - Write BAM
"w" - Write SAM
"wc" - Write CRAM
Performance tips and common pitfalls:
Use pileup() for column-wise analysis instead of repeated fetch operations.
Use count() for counting instead of iterating and counting manually.
Use until_eof=True for sequential processing without an index.
Pass multiple_iterators=True if several iterators must read the same file at the same time.
fetch() returns reads overlapping region boundaries, not just those fully contained.
Assigning query_sequence invalidates query_qualities, so copy the qualities first and reassign them afterwards.
Pysam provides access to samtools and bcftools commands:
# Sort BAM file
pysam.samtools.sort("-o", "sorted.bam", "input.bam")
# Index BAM
pysam.samtools.index("sorted.bam")
# View specific region
pysam.samtools.view("-b", "-o", "region.bam", "input.bam", "chr1:1000-2000")
# BCF tools
pysam.bcftools.view("-O", "z", "-o", "output.vcf.gz", "input.vcf")
Error handling:
try:
    pysam.samtools.sort("-o", "output.bam", "input.bam")
except pysam.SamtoolsError as e:
    print(f"Error: {e}")
Detailed documentation for each major capability:
alignment_files.md - Complete guide to SAM/BAM/CRAM operations, including AlignmentFile class, AlignedSegment attributes, fetch operations, pileup analysis, and writing alignments
variant_files.md - Complete guide to VCF/BCF operations, including VariantFile class, VariantRecord attributes, genotype handling, INFO/FORMAT fields, and multi-sample operations
sequence_files.md - Complete guide to FASTA/FASTQ operations, including FastaFile and FastxFile classes, sequence extraction, quality score handling, and tabix-indexed file access
common_workflows.md - Practical examples of integrated bioinformatics workflows combining multiple file types, including quality control, coverage analysis, variant validation, and sequence extraction
For detailed information on specific operations, refer to the appropriate reference document: alignment_files.md, variant_files.md, sequence_files.md, or common_workflows.md.
Official documentation: https://pysam.readthedocs.io/
Weekly Installs: 55
Repository: k-dense-ai/claude-scientific-skills
GitHub Stars: 17.3K
First Seen: Jan 20, 2026
Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)
Installed on: opencode (48), codex (47), gemini-cli (46), claude-code (44), cursor (44), github-copilot (43)