mamba-architecture by davila7/claude-code-templates
npx skills add https://github.com/davila7/claude-code-templates --skill mamba-architecture
Mamba is a state-space model architecture achieving O(n) linear complexity for sequence modeling.
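The O(n) claim comes from the underlying state-space recurrence: each token performs one fixed-cost update of a fixed-size hidden state, so total work grows linearly with sequence length. The loop below is a toy, non-selective illustration of that recurrence in plain PyTorch (names and shapes are illustrative only; the real library fuses an input-dependent, selective version of this scan into a CUDA kernel, see the selective-SSM reference at the end):
import torch

def naive_ssm_scan(x, A, B, C):
    """Toy linear-time SSM scan: one O(d_state * dim) update per token."""
    d_state = A.shape[0]
    h = torch.zeros(d_state)          # fixed-size state, independent of sequence length
    ys = []
    for x_t in x:                     # single pass over the sequence -> O(length)
        h = A * h + B @ x_t           # diagonal state update (Mamba keeps A diagonal)
        ys.append(C @ h)              # readout
    return torch.stack(ys)

length, dim, d_state = 64, 16, 8
y = naive_ssm_scan(
    torch.randn(length, dim),
    torch.rand(d_state) * 0.9,        # stable decay factors
    torch.randn(d_state, dim),
    torch.randn(dim, d_state),
)
print(y.shape)  # torch.Size([64, 16])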
Installation:
# Install causal-conv1d (optional, for efficiency)
pip install "causal-conv1d>=1.4.0"
# Install Mamba
pip install mamba-ssm
# Or both together
pip install "mamba-ssm[causal-conv1d]"
Prerequisites: Linux, NVIDIA GPU, PyTorch 1.12+, CUDA 11.6+
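A quick smoke test after installation (this assumes a CUDA device is visible; the fused kernels are GPU-only):
import torch
from mamba_ssm import Mamba

assert torch.cuda.is_available(), "mamba-ssm's fused kernels require a CUDA GPU"
block = Mamba(d_model=16).to("cuda")           # defaults: d_state=16, d_conv=4, expand=2
out = block(torch.randn(1, 8, 16, device="cuda"))
print("install OK:", out.shape)                # expected: torch.Size([1, 8, 16])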
Basic usage (Mamba block):
import torch
from mamba_ssm import Mamba
batch, length, dim = 2, 64, 16
x = torch.randn(batch, length, dim).to("cuda")
model = Mamba(
d_model=dim, # Model dimension
d_state=16, # SSM state dimension
d_conv=4, # Conv1d kernel size
expand=2 # Expansion factor
).to("cuda")
y = model(x) # O(n) complexity!
assert y.shape == x.shape
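A single Mamba block plays the role a Transformer layer's attention mixer would; in a model it is typically stacked with normalization and residual connections. The sketch below is an illustrative hand-rolled stack, not the exact layout MambaLMHeadModel uses internally (that path uses RMSNorm and fused add-norm kernels; plain LayerNorm keeps the sketch dependency-free):
import torch
import torch.nn as nn
from mamba_ssm import Mamba

class TinyMambaStack(nn.Module):
    """Pre-norm Mamba mixer with a residual connection, repeated n_layer times."""
    def __init__(self, d_model=64, n_layer=4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.ModuleDict({
                "norm": nn.LayerNorm(d_model),
                "mixer": Mamba(d_model=d_model, d_state=16, d_conv=4, expand=2),
            })
            for _ in range(n_layer)
        )
        self.norm_f = nn.LayerNorm(d_model)

    def forward(self, x):                                # x: (batch, length, d_model)
        for layer in self.layers:
            x = x + layer["mixer"](layer["norm"](x))     # residual around the mixer
        return self.norm_f(x)

stack = TinyMambaStack().to("cuda")
y = stack(torch.randn(2, 128, 64, device="cuda"))
print(y.shape)  # torch.Size([2, 128, 64])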
Complete LM with generation:
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel
from mamba_ssm.models.config_mamba import MambaConfig
import torch
# Configure Mamba-2 LM
config = MambaConfig(
d_model=1024, # Hidden dimension
n_layer=24, # Number of layers
vocab_size=50277, # Vocabulary size
ssm_cfg=dict(
layer="Mamba2", # Use Mamba-2
d_state=128, # Larger state for Mamba-2
headdim=64, # Head dimension
ngroups=1 # Number of groups
)
)
model = MambaLMHeadModel(config, device="cuda", dtype=torch.float16)
# Generate text
input_ids = torch.randint(0, 1000, (1, 20), device="cuda", dtype=torch.long)
output = model.generate(
input_ids=input_ids,
max_length=100,
temperature=0.7,
top_p=0.9
)
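Before training or downloading weights, it can be worth checking how big a given MambaConfig actually is; a plain parameter count over the model instantiated above is enough (the total depends on d_model, n_layer, and vocab_size):
# Rough size check, reusing `model` and `config` from the snippet above
n_params = sum(p.numel() for p in model.parameters())
print(f"d_model={config.d_model}, n_layer={config.n_layer}: {n_params / 1e6:.1f}M parameters")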
Load from HuggingFace:
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel
# Load pretrained model
model_name = "state-spaces/mamba-2.8b"
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b") # Use compatible tokenizer
model = MambaLMHeadModel.from_pretrained(model_name, device="cuda", dtype=torch.float16)
# Generate
prompt = "The future of AI is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")
output_ids = model.generate(
input_ids=input_ids,
max_length=200,
temperature=0.7,
top_p=0.9,
repetition_penalty=1.2
)
generated_text = tokenizer.decode(output_ids[0])
print(generated_text)
Available models:
- state-spaces/mamba-130m
- state-spaces/mamba-370m
- state-spaces/mamba-790m
- state-spaces/mamba-1.4b
- state-spaces/mamba-2.8b
Mamba-1 (smaller state):
from mamba_ssm import Mamba
model = Mamba(
d_model=256,
d_state=16, # Smaller state dimension
d_conv=4,
expand=2
).to("cuda")
Mamba-2 (multi-head, larger state):
from mamba_ssm import Mamba2
model = Mamba2(
d_model=256,
d_state=128, # Larger state dimension
d_conv=4,
expand=2,
headdim=64, # Head dimension for multi-head
ngroups=1 # Parallel groups
).to("cuda")
Key differences: Mamba-2 uses a much larger SSM state (d_state=128 vs 16) and a multi-head layout controlled by headdim and ngroups, which also enables tensor parallelism (see the Mamba-2 reference below); the block's external interface is otherwise the same.
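A quick way to make the difference concrete is to instantiate both blocks at the same d_model and compare parameter counts; despite the 8x larger state, Mamba-2's multi-head layout keeps the block in the same size ballpark (exact counts vary across mamba-ssm versions, so treat this as a probe rather than a spec):
from mamba_ssm import Mamba, Mamba2

d_model = 256
m1 = Mamba(d_model=d_model, d_state=16, d_conv=4, expand=2)
m2 = Mamba2(d_model=d_model, d_state=128, d_conv=4, expand=2, headdim=64, ngroups=1)

count = lambda m: sum(p.numel() for p in m.parameters())
print(f"Mamba-1 block: {count(m1) / 1e3:.0f}K params (d_state=16)")
print(f"Mamba-2 block: {count(m2) / 1e3:.0f}K params (d_state=128, multi-head)")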
Generation speed comparison:
# Benchmark Mamba
python benchmarks/benchmark_generation_mamba_simple.py \
--model-name "state-spaces/mamba-2.8b" \
--prompt "The future of machine learning is" \
--topp 0.9 --temperature 0.7 --repetition-penalty 1.2
# Benchmark Transformer
python benchmarks/benchmark_generation_mamba_simple.py \
--model-name "EleutherAI/pythia-2.8b" \
--prompt "The future of machine learning is" \
--topp 0.9 --temperature 0.7 --repetition-penalty 1.2
Expected results: Mamba sustains roughly constant per-token generation speed because its recurrent state is fixed-size, while the Transformer's throughput drops and memory grows as its KV cache lengthens with the sequence.
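To sanity-check this on your own prompts without the benchmark script, a rough timing loop around model.generate is enough. The sketch below assumes the model and input_ids from the HuggingFace example above; the repo's benchmark script also enables CUDA-graph capture, so its numbers will typically be higher:
import time
import torch

def tokens_per_second(model, input_ids, max_length=200):
    """Crude throughput probe around model.generate."""
    # Warm-up so one-time kernel/compile costs are not measured
    model.generate(input_ids=input_ids, max_length=max_length, temperature=0.7, top_p=0.9)
    torch.cuda.synchronize()
    start = time.time()
    out = model.generate(input_ids=input_ids, max_length=max_length, temperature=0.7, top_p=0.9)
    torch.cuda.synchronize()
    seqs = out if torch.is_tensor(out) else out.sequences   # handle either return style
    new_tokens = seqs.shape[1] - input_ids.shape[1]
    return new_tokens / (time.time() - start)

print(f"~{tokens_per_second(model, input_ids):.0f} tokens/sec")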
Use Mamba when: sequences are long enough that O(n^2) attention or a growing KV cache becomes the bottleneck (long documents, audio, genomics), or when generation throughput and memory matter.
Advantages: O(n) sequence processing, a fixed-size recurrent state instead of a KV cache during generation, and pretrained checkpoints from 130M to 2.8B parameters.
Use alternatives instead: when you depend on the mature Transformer fine-tuning and serving ecosystem, or when your sequences are short enough that attention's quadratic cost is not a problem.
Issue: CUDA out of memory
Reduce batch size or use gradient checkpointing:
model = MambaLMHeadModel(config, device="cuda", dtype=torch.float16)
model.gradient_checkpointing_enable() # Enable checkpointing
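If your mamba-ssm version does not expose gradient_checkpointing_enable on MambaLMHeadModel, the same memory-for-compute trade can be made manually with torch.utils.checkpoint around individual blocks in your own training stack; this is a generic PyTorch pattern, not a mamba-ssm API:
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint
from mamba_ssm import Mamba

class CheckpointedMamba(nn.Module):
    """Recompute the block's activations during backward instead of storing them."""
    def __init__(self, d_model):
        super().__init__()
        self.block = Mamba(d_model=d_model)

    def forward(self, x):
        if self.training:
            # use_reentrant=False is the recommended non-reentrant checkpointing mode
            return checkpoint(self.block, x, use_reentrant=False)
        return self.block(x)

layer = CheckpointedMamba(d_model=64).to("cuda")
x = torch.randn(2, 256, 64, device="cuda", requires_grad=True)
layer(x).sum().backward()   # block activations are recomputed here, lowering peak memory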
Issue: Slow installation
If pip falls back to compiling from source and the build is slow or fails, disable build isolation so the build can reuse your existing PyTorch install:
pip install mamba-ssm --no-build-isolation
Issue: Missing causal-conv1d
Install separately:
pip install "causal-conv1d>=1.4.0"
Issue: Model not loading from HuggingFace
Use MambaLMHeadModel.from_pretrained (not AutoModel):
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel
model = MambaLMHeadModel.from_pretrained("state-spaces/mamba-2.8b")
Selective SSM : See references/selective-ssm.md for mathematical formulation, state-space equations, and how selectivity enables O(n) complexity.
Mamba-2 architecture : See references/mamba2-details.md for multi-head structure, tensor parallelism, and distributed training setup.
Performance optimization : See references/performance.md for hardware-aware design, CUDA kernels, and memory efficiency techniques.
Performance (vs Transformers): the Mamba paper reports roughly 5x higher inference throughput than a similarly sized Transformer, with the 3B model matching the quality of Transformers about twice its size on standard language-modeling benchmarks.