ESM Protein Language Models: AI-Driven Protein Design, Structure Prediction, and Function Analysis

esm by davila7/claude-code-templates

176 weekly installs

24,300 GitHub Stars

Install command

npx skills add https://github.com/davila7/claude-code-templates --skill esm

AI/Machine Learning, Research Tools, Bioinformatics

ESM: Evolutionary Scale Modeling

Overview

ESM provides state-of-the-art protein language models for understanding, generating, and designing proteins. This skill enables working with two model families: ESM3 for generative protein design across sequence, structure, and function, and ESM C for efficient protein representation learning and embeddings.

Core Capabilities

1. Protein Sequence Generation with ESM3

Generate novel protein sequences with desired properties using multimodal generative modeling.

When to use:

Designing proteins with specific functional properties
Completing partial protein sequences
Generating variants of existing proteins
Creating proteins with desired structural characteristics

Basic usage:

from esm.models.esm3 import ESM3
from esm.sdk.api import ESM3InferenceClient, ESMProtein, GenerationConfig

# Load model locally
model: ESM3InferenceClient = ESM3.from_pretrained("esm3-sm-open-v1").to("cuda")

# Create protein prompt
protein = ESMProtein(sequence="MPRT___KEND")  # '_' represents masked positions

# Generate completion
protein = model.generate(protein, GenerationConfig(track="sequence", num_steps=8))
print(protein.sequence)
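
Prompts like the one above can also be built programmatically. The following is a minimal sketch, assuming `_` is the mask character as in the example; `mask_region` is a hypothetical helper and not part of the esm SDK, and the input sequence is made up for illustration.

```python
def mask_region(sequence: str, start: int, end: int, mask: str = "_") -> str:
    """Return `sequence` with the half-open, 0-indexed range [start, end) masked."""
    if not 0 <= start <= end <= len(sequence):
        raise ValueError(f"invalid region [{start}, {end}) for length {len(sequence)}")
    return sequence[:start] + mask * (end - start) + sequence[end:]


# Hypothetical 11-residue sequence; mask residues at indices 4, 5 and 6.
prompt = mask_region("MPRTAAAKEND", 4, 7)
masked_positions = prompt.count("_")
print(prompt, masked_positions)
```

The number of masked positions is one reasonable guide when picking `num_steps` for sequence generation, since there is nothing left to decode once every mask is filled.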

For remote/cloud usage via Forge API:

from esm.sdk.forge import ESM3ForgeInferenceClient
from esm.sdk.api import ESMProtein, GenerationConfig

# Connect to Forge (replace <token> with your Forge API token)
model = ESM3ForgeInferenceClient(model="esm3-medium-2024-08", url="https://forge.evolutionaryscale.ai", token="<token>")

# Generate: same prompt and call as the local example
protein = ESMProtein(sequence="MPRT___KEND")
protein = model.generate(protein, GenerationConfig(track="sequence", num_steps=8))

See references/esm3-api.md for detailed ESM3 model specifications, advanced generation configurations, and multimodal prompting examples.

2. Structure Prediction and Inverse Folding

Use ESM3's structure track for structure prediction from sequence or inverse folding (sequence design from structure).

Structure prediction:

from esm.sdk.api import ESM3InferenceClient, ESMProtein, GenerationConfig

# Predict structure from sequence (model is the ESM3 client loaded earlier)
protein = ESMProtein(sequence="MPRTKEINDAGLIVHSP...")
protein_with_structure = model.generate(
    protein,
    # A fully specified sequence contains no "_" masks, so counting them
    # would give num_steps=0; use a small fixed number of steps instead.
    GenerationConfig(track="structure", num_steps=8)
)

# Access predicted structure
coordinates = protein_with_structure.coordinates  # 3D coordinates
pdb_string = protein_with_structure.to_pdb_string()  # to_pdb(path) writes a file instead
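
Once a structure is available as PDB text, downstream checks often only need the C-alpha trace. The sketch below assumes the text follows the standard fixed-column ATOM record layout; `ca_coordinates` and `atom_line` are hypothetical helpers, and the coordinates are invented purely to exercise the parser.

```python
from typing import List, Tuple


def ca_coordinates(pdb_text: str) -> List[Tuple[float, float, float]]:
    """Collect (x, y, z) for every C-alpha atom in fixed-column PDB text."""
    coords = []
    for line in pdb_text.splitlines():
        # Atom name sits in columns 13-16; x, y, z sit in columns 31-54 (1-indexed).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords


def atom_line(serial: int, name: str, res: str, resseq: int,
              x: float, y: float, z: float) -> str:
    """Format one fixed-width ATOM record on chain A (illustrative data only)."""
    return f"ATOM  {serial:>5} {name:<4} {res:>3} A{resseq:>4}    {x:8.3f}{y:8.3f}{z:8.3f}"


example_pdb = "\n".join([
    atom_line(1, " N", "MET", 1, 1.0, 2.0, 3.0),
    atom_line(2, " CA", "MET", 1, 1.5, 2.5, 3.5),
    atom_line(3, " CA", "PRO", 2, 4.0, 5.0, 6.0),
])
trace = ca_coordinates(example_pdb)
print(trace)
```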

Inverse folding (sequence from structure):

# Design sequence for a target structure
protein_with_structure = ESMProtein.from_pdb("target_structure.pdb")
protein_with_structure.sequence = None  # Remove sequence

# Generate sequence that folds to this structure
designed_protein = model.generate(
    protein_with_structure,
    GenerationConfig(track="sequence", num_steps=50, temperature=0.7)
)
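
When the target structure has a known native sequence, one common sanity check for an inverse-folding result is the fraction of positions where the design matches the native sequence. This is a minimal sketch with toy sequences; `sequence_identity` is a hypothetical helper, and it assumes both sequences have the same length (no alignment is performed).

```python
def sequence_identity(designed: str, reference: str) -> float:
    """Fraction of positions at which two equal-length sequences agree."""
    if len(designed) != len(reference):
        raise ValueError("sequences must have equal length")
    if not reference:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(designed, reference))
    return matches / len(reference)


# Toy example: the two sequences differ at a single position out of eight.
identity = sequence_identity("MPRTKEND", "MPRSKEND")
print(identity)
```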

3. Protein Embeddings with ESM C

Generate high-quality embeddings for downstream tasks like function prediction, classification, or similarity analysis.

When to use:

Extracting protein representations for machine learning
Computing sequence similarities
Feature extraction for protein classification
Transfer learning for protein-related tasks

Basic usage:

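from esm.models.esmc import ESMC
from esm.sdk.api import ESMProtein, LogitsConfig

# Load ESM C model (the upstream README spells the local checkpoint "esmc_300m")
model = ESMC.from_pretrained("esmc_300m").to("cuda")

# Tokenize the protein
protein = ESMProtein(sequence="MPRTKEINDAGLIVHSP...")
protein_tensor = model.encode(protein)

# Request per-residue embeddings alongside the sequence logits
output = model.logits(protein_tensor, LogitsConfig(sequence=True, return_embeddings=True))
embeddings = output.embeddings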

Batch processing:

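from esm.sdk.api import LogitsConfig

# Encode multiple proteins
proteins = [
    ESMProtein(sequence="MPRTKEIND..."),
    ESMProtein(sequence="AGLIVHSPQ..."),
    ESMProtein(sequence="KTEFLNDGR...")
]

# logits() takes the encoded protein plus a config; forward() expects raw token tensors
config = LogitsConfig(sequence=True, return_embeddings=True)
embeddings_list = [model.logits(model.encode(p), config).embeddings for p in proteins]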

See references/esm-c-api.md for ESM C model details, efficiency comparisons, and advanced embedding strategies.
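
Per-residue embeddings usually need to be reduced to one fixed-size vector per protein before classification or similarity search. The sketch below shows mean pooling on plain Python lists. It assumes the first and last rows correspond to start and end tokens, which should be checked against the actual tokenizer output; `mean_pool` is a hypothetical helper and the numbers are toy values.

```python
from typing import List, Sequence


def mean_pool(per_residue: Sequence[Sequence[float]], skip_special: bool = True) -> List[float]:
    """Average per-residue vectors into a single fixed-size vector."""
    rows = list(per_residue)
    if skip_special:
        # Assumption: row 0 and row -1 belong to start/end tokens, not residues.
        rows = rows[1:-1]
    if not rows:
        raise ValueError("no residue rows to pool")
    dim = len(rows[0])
    return [sum(row[i] for row in rows) / len(rows) for i in range(dim)]


# Toy matrix: two special-token rows around two residue rows, dimension 2.
toy = [[9.0, 9.0], [1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
pooled = mean_pool(toy)
print(pooled)
```

With real tensors the same idea is typically a slice followed by a mean over the residue axis.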

4. Function Conditioning and Annotation

Use ESM3's function track to generate proteins with specific functional annotations or predict function from sequence.

Function-conditioned generation:

from esm.sdk.api import ESMProtein, FunctionAnnotation, GenerationConfig

# Create protein with desired function
protein = ESMProtein(
    sequence="_" * 200,  # Generate 200 residue protein
    function_annotations=[
        FunctionAnnotation(label="fluorescent_protein", start=50, end=150)
    ]
)

# Generate sequence with specified function
functional_protein = model.generate(
    protein,
    GenerationConfig(track="sequence", num_steps=200)
)
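
Annotation ranges that fall outside the prompt are easy to introduce when the prompt length changes. The following sketch checks ranges before a prompt is built. `Span` is a stand-in dataclass, not the SDK's `FunctionAnnotation`, and the 1-indexed inclusive convention is an assumption to confirm against the SDK documentation; the second label is invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Span:
    """Stand-in for an annotation range; not the SDK's FunctionAnnotation."""
    label: str
    start: int  # assumed 1-indexed, inclusive
    end: int    # assumed 1-indexed, inclusive


def invalid_spans(spans: List[Span], length: int) -> List[Span]:
    """Return the spans that do not fit inside a sequence of `length` residues."""
    return [s for s in spans if not (1 <= s.start <= s.end <= length)]


spans = [Span("fluorescent_protein", 50, 150), Span("too_long", 180, 250)]
bad = invalid_spans(spans, length=200)
print([s.label for s in bad])
```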

5. Chain-of-Thought Generation

Iteratively refine protein designs using ESM3's chain-of-thought generation approach.

from esm.sdk.api import ESMProtein, GenerationConfig

# Multi-step refinement
protein = ESMProtein(sequence="MPRT" + "_" * 100 + "KEND")

# Step 1: Generate initial structure
config = GenerationConfig(track="structure", num_steps=50)
protein = model.generate(protein, config)

# Step 2: Refine sequence based on structure
config = GenerationConfig(track="sequence", num_steps=50, temperature=0.5)
protein = model.generate(protein, config)

# Step 3: Predict function
config = GenerationConfig(track="function", num_steps=20)
protein = model.generate(protein, config)
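
The three steps above follow one pattern: feed the output of each track into the next. A minimal sketch of that pattern as data, using a stub in place of the model so it runs without the SDK; `Step`, `run_steps`, and `fake_generate` are hypothetical names, and the step values mirror the example above.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass(frozen=True)
class Step:
    track: str
    num_steps: int
    temperature: Optional[float] = None  # None means "leave the default untouched"


def run_steps(generate: Callable[[Any, Step], Any], protein: Any, steps: List[Step]) -> Any:
    """Apply each step in order, passing the previous result to the next one."""
    for step in steps:
        protein = generate(protein, step)
    return protein


PLAN = [Step("structure", 50), Step("sequence", 50, 0.5), Step("function", 20)]


def fake_generate(protein: List[str], step: Step) -> List[str]:
    """Stub that records which track ran instead of calling a model."""
    return protein + [step.track]


history = run_steps(fake_generate, [], PLAN)
print(history)
```

With the real client, `generate` would wrap `model.generate` and build a `GenerationConfig` from each `Step`; keeping the plan as data makes it easy to reorder or extend.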

6. Batch Processing with Forge API

Process multiple proteins efficiently using Forge's async executor.

import asyncio

from esm.sdk.api import ESMProtein, GenerationConfig
from esm.sdk.forge import ESM3ForgeInferenceClient

client = ESM3ForgeInferenceClient(model="esm3-medium-2024-08", token="<token>")

# Async batch processing
async def batch_generate(proteins_list):
    tasks = [
        client.async_generate(protein, GenerationConfig(track="sequence"))
        for protein in proteins_list
    ]
    return await asyncio.gather(*tasks)

# Execute
proteins = [ESMProtein(sequence=f"MPRT{'_' * 50}KEND") for _ in range(10)]
results = asyncio.run(batch_generate(proteins))

See references/forge-api.md for detailed Forge API documentation, authentication, rate limits, and batch processing patterns.
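
Launching every request at once can run into rate limits. One hedged way to cap in-flight requests is an `asyncio.Semaphore` around each call. The sketch below uses a stub coroutine in place of the Forge client; `gather_bounded` and `fake_generate` are hypothetical names, and the limit of 2 is arbitrary.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def gather_bounded(items: List[T], worker: Callable[[T], Awaitable[R]],
                         limit: int = 4) -> List[R]:
    """Run `worker` over `items` with at most `limit` calls in flight; keeps input order."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item: T) -> R:
        async with semaphore:
            return await worker(item)

    return await asyncio.gather(*(run_one(item) for item in items))


async def fake_generate(prompt: str) -> str:
    """Stub for a remote call: fills every mask with alanine."""
    await asyncio.sleep(0)
    return prompt.replace("_", "A")


results = asyncio.run(gather_bounded(["MP__", "_KEN", "____"], fake_generate, limit=2))
print(results)
```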

Model Selection Guide

ESM3 Models (Generative):

esm3-sm-open-v1 (1.4B) - Open weights, local usage, good for experimentation
esm3-medium-2024-08 (7B) - Best balance of quality and speed (Forge only)
esm3-large-2024-03 (98B) - Highest quality, slower (Forge only)

ESM C Models (Embeddings):

esmc-300m (30 layers) - Lightweight, fast inference
esmc-600m (36 layers) - Balanced performance
esmc-6b (80 layers) - Maximum representation quality

Selection criteria:

Local development/testing: Use esm3-sm-open-v1 or esmc-300m
Production quality: Use esm3-medium-2024-08 via Forge
Maximum accuracy: Use esm3-large-2024-03 or esmc-6b
High throughput: Use Forge API with batch executor
Cost optimization: Use smaller models, implement caching strategies

Installation

Basic installation:

uv pip install esm

With Flash Attention (recommended for faster inference):

uv pip install esm
uv pip install flash-attn --no-build-isolation

For Forge API access:

uv pip install esm  # SDK includes Forge client

No additional dependencies are needed. Obtain a Forge API token at https://forge.evolutionaryscale.ai.

Common Workflows

For detailed examples and complete workflows, see references/workflows.md, which includes:

Novel GFP design with chain-of-thought
Protein variant generation and screening
Structure-based sequence optimization
Function prediction pipelines
Embedding-based clustering and analysis

References

This skill includes comprehensive reference documentation:

references/esm3-api.md - ESM3 model architecture, API reference, generation parameters, and multimodal prompting
references/esm-c-api.md - ESM C model details, embedding strategies, and performance optimization
references/forge-api.md - Forge platform documentation, authentication, batch processing, and deployment
references/workflows.md - Complete examples and common workflow patterns

These references contain detailed API specifications, parameter descriptions, and advanced usage patterns. Load them as needed for specific tasks.

Best Practices

For generation tasks:

Start with smaller models for prototyping (esm3-sm-open-v1)
Use the temperature parameter to control diversity (0.0 = deterministic, 1.0 = diverse)
Implement iterative refinement with chain-of-thought for complex designs
Validate generated sequences with structure prediction or wet-lab experiments
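
To see why lower temperatures give more deterministic output, it helps to look at a temperature-scaled softmax. This is a generic illustration with made-up logits, not ESM3's internal sampler; `softmax_with_temperature` is a hypothetical helper, and temperature 0 is treated as the greedy limit.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Greedy limit: all probability mass on the highest logit.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
greedy = softmax_with_temperature(logits, 0.0)
sharp = softmax_with_temperature(logits, 0.5)
flat = softmax_with_temperature(logits, 1.0)
print(greedy, sharp, flat)
```

At temperature 0.5 the top option receives more probability than at 1.0, which is why lower values produce less varied designs.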

For embedding tasks:

Batch process sequences when possible for efficiency
Cache embeddings for repeated analyses
Normalize embeddings when computing similarities
Use appropriate model size based on downstream task requirements
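
Caching and normalization from the list above can be sketched together. The example assumes each protein has already been reduced to one vector; `EmbeddingCache`, `cosine_similarity`, and `toy_embed` are hypothetical names, and `toy_embed` is a stand-in that counts residues rather than calling a model.

```python
import math
from typing import Callable, Dict, List


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Dot product of the two L2-normalized vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


class EmbeddingCache:
    """Memoize embeddings by sequence so repeated analyses skip recomputation."""

    def __init__(self, embed: Callable[[str], List[float]]) -> None:
        self._embed = embed
        self._store: Dict[str, List[float]] = {}
        self.misses = 0

    def get(self, sequence: str) -> List[float]:
        if sequence not in self._store:
            self.misses += 1
            self._store[sequence] = self._embed(sequence)
        return self._store[sequence]


def toy_embed(sequence: str) -> List[float]:
    """Stand-in embedding: counts of A and K residues."""
    return [float(sequence.count("A")), float(sequence.count("K"))]


cache = EmbeddingCache(toy_embed)
similarity = cosine_similarity(cache.get("AAK"), cache.get("AAKK"))
cache.get("AAK")  # served from the cache, no new miss
print(round(similarity, 4), cache.misses)
```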

For production deployment:

Use Forge API for scalability and latest models
Implement error handling and retry logic for API calls
Monitor token usage and implement rate limiting
Consider AWS SageMaker deployment for dedicated infrastructure
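
Retry logic for remote calls can be as small as a loop with exponential backoff. The sketch below is generic and uses a stub in place of a Forge request; `call_with_retry` and `flaky` are hypothetical names. It catches every `Exception` for brevity, whereas production code should narrow that to the client's transient error types.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def call_with_retry(fn: Callable[[], T], attempts: int = 3, base_delay: float = 0.5,
                    sleep: Callable[[float], object] = time.sleep) -> T:
    """Call `fn`, retrying on failure with delays of base_delay * 2**attempt."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


calls = {"n": 0}


def flaky() -> str:
    """Stub request that fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


delays: List[float] = []
# Record the delays instead of sleeping so the example runs instantly.
result = call_with_retry(flaky, attempts=3, base_delay=0.5, sleep=delays.append)
print(result, delays)
```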

Resources and Documentation

GitHub Repository: https://github.com/evolutionaryscale/esm
Forge Platform: https://forge.evolutionaryscale.ai
Scientific Paper: Hayes et al., Science (2025) - https://www.science.org/doi/10.1126/science.ads0018
Blog Posts:
- ESM3 Release: https://www.evolutionaryscale.ai/blog/esm3-release
- ESM C Launch: https://www.evolutionaryscale.ai/blog/esm-cambrian
Community: Slack community at https://bit.ly/3FKwcWd
Model Weights: HuggingFace EvolutionaryScale organization

Responsible Use

ESM is designed for beneficial applications in protein engineering, drug discovery, and scientific research. Follow the Responsible Biodesign Framework (https://responsiblebiodesign.ai/) when designing novel proteins. Consider biosafety and ethical implications of protein designs before experimental validation.

Weekly Installs

117

Repository

davila7/claude-…emplates

GitHub Stars

22.6K

First Seen

Jan 21, 2026
