model-merging by davila7/claude-code-templates
npx skills add https://github.com/davila7/claude-code-templates --skill model-merging
Use model merging when you need to combine specialized capabilities (e.g. math, chat, code) from multiple fine-tuned models without additional training.
Success stories: Marcoro14-7B-slerp (top of the Open LLM Leaderboard, 02/2024); many top HuggingFace models use merging
Tools: mergekit (Arcee AI), LazyMergekit, Model Soup
# Install mergekit
git clone https://github.com/arcee-ai/mergekit.git
cd mergekit
pip install -e .
# Or via pip
pip install mergekit
# Optional: Transformers and PyTorch
pip install transformers torch
# config.yml - Merge two models with equal weights
merge_method: linear
models:
  - model: mistralai/Mistral-7B-v0.1
    parameters:
      weight: 0.5
  - model: teknium/OpenHermes-2.5-Mistral-7B
    parameters:
      weight: 0.5
dtype: bfloat16
# Run merge
mergekit-yaml config.yml ./merged-model --cuda
# Use merged model from Python
python -c "from transformers import AutoModelForCausalLM; AutoModelForCausalLM.from_pretrained('./merged-model')"
# config.yml - Spherical linear interpolation
merge_method: slerp
slices:
  - sources:
      - model: mistralai/Mistral-7B-v0.1
        layer_range: [0, 32]
      - model: teknium/OpenHermes-2.5-Mistral-7B
        layer_range: [0, 32]
parameters:
  t: 0.5 # Interpolation factor (0=model1, 1=model2)
dtype: bfloat16
Linear (Model Soup)
Simple weighted average of parameters
Fast, works well for similar models
Can merge 2+ models
merged_weights = w1 * model1_weights + w2 * model2_weights + w3 * model3_weights
# where w1 + w2 + w3 = 1
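As a minimal sketch, with plain Python floats standing in for parameter tensors (`linear_merge` is a hypothetical helper, not mergekit's implementation):

```python
def linear_merge(state_dicts, weights):
    """Weighted average of parameters across models (model soup)."""
    assert len(state_dicts) == len(weights)
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name] for w, sd in zip(weights, state_dicts))
    return merged

# Two toy "models" with a single scalar parameter each
m1 = {"layer.weight": 1.0}
m2 = {"layer.weight": 3.0}
merged = linear_merge([m1, m2], [0.5, 0.5])
# merged["layer.weight"] -> 2.0
```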
SLERP (Spherical Linear Interpolation)
Interpolates along sphere in weight space
Preserves magnitude of weight vectors
Best for merging 2 models
Smoother than linear
merged = (sin((1-t)θ) / sin(θ)) * model1 + (sin(tθ) / sin(θ)) * model2
# where θ = arccos(dot(model1, model2)) and t ∈ [0, 1]
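The formula can be sketched on plain Python lists; as practical implementations do, this falls back to linear interpolation when the vectors are nearly parallel (`slerp` here is a hypothetical helper, not mergekit's code):

```python
import math

def slerp(t, v1, v2, eps=1e-8):
    """Spherical linear interpolation between two vectors of floats."""
    norm1 = math.sqrt(sum(x * x for x in v1))
    norm2 = math.sqrt(sum(x * x for x in v2))
    dot = sum(a * b for a, b in zip(v1, v2)) / (norm1 * norm2)
    dot = max(-1.0, min(1.0, dot))        # clamp for numerical safety
    theta = math.acos(dot)
    if theta < eps:                       # nearly parallel: use linear interpolation
        return [(1 - t) * a + t * b for a, b in zip(v1, v2)]
    s1 = math.sin((1 - t) * theta) / math.sin(theta)
    s2 = math.sin(t * theta) / math.sin(theta)
    return [s1 * a + s2 * b for a, b in zip(v1, v2)]

a, b = [1.0, 0.0], [0.0, 1.0]
mid = slerp(0.5, a, b)
# t=0 returns the first vector, t=1 the second; t=0.5 stays on the unit sphere
```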
Task Arithmetic
Extract "task vectors" (fine-tuned - base)
Combine task vectors, add to base
Good for merging multiple specialized models
task_vector = finetuned_model - base_model
merged = base_model + α₁*task_vector₁ + α₂*task_vector₂
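A minimal sketch of both steps on plain state dicts (floats in place of tensors; `task_vector` and `apply_task_vectors` are hypothetical helpers):

```python
def task_vector(finetuned, base):
    """Delta between a fine-tuned model and its base."""
    return {k: finetuned[k] - base[k] for k in base}

def apply_task_vectors(base, vectors, alphas):
    """base + sum(alpha_i * task_vector_i)"""
    merged = dict(base)
    for alpha, vec in zip(alphas, vectors):
        for k in merged:
            merged[k] += alpha * vec[k]
    return merged

base = {"w": 1.0}
math_model = {"w": 3.0}  # fine-tuned on math
chat_model = {"w": 2.0}  # fine-tuned on chat
tv_math = task_vector(math_model, base)  # {"w": 2.0}
tv_chat = task_vector(chat_model, base)  # {"w": 1.0}
merged = apply_task_vectors(base, [tv_math, tv_chat], [0.5, 0.3])
# merged["w"] == 1.0 + 0.5*2.0 + 0.3*1.0 == 2.3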
TIES-Merging
Trims each task vector, keeping only the largest-magnitude deltas (the density parameter)
Elects a majority sign per parameter and merges only agreeing deltas
Resolves interference when merging many models
DARE (Drop And REscale)
Randomly drops a fraction of each task vector's deltas and rescales the rest to compensate
Reduces redundancy; combines with TIES sign election as dare_ties
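DARE's drop-and-rescale step can be sketched on a single task vector (a hypothetical helper, not mergekit's implementation):

```python
import random

def dare(deltas, drop_rate, seed=0):
    """Randomly zero each delta with probability drop_rate,
    rescale survivors by 1/(1-drop_rate) to keep the expected sum."""
    rng = random.Random(seed)
    keep = 1.0 - drop_rate
    return [d / keep if rng.random() >= drop_rate else 0.0 for d in deltas]

deltas = [0.2, -0.1, 0.4, 0.05]
sparse = dare(deltas, drop_rate=0.5)
# each surviving delta is doubled (1 / (1 - 0.5)); the rest are zeroed
```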
# Basic structure
merge_method: <method> # linear, slerp, ties, dare_ties, task_arithmetic
base_model: <path>     # Optional: base model for task arithmetic
models:
  - model: <path/to/model1>
    parameters:
      weight: <float>  # Merge weight
      density: <float> # For TIES/DARE
  - model: <path/to/model2>
    parameters:
      weight: <float>
parameters:
  # Method-specific parameters
dtype: <dtype> # bfloat16, float16, float32

# Optional
slices:    # Layer-wise merging
tokenizer: # Tokenizer configuration
Best for: Simple model combinations, equal weighting
merge_method: linear
models:
  - model: WizardLM/WizardMath-7B-V1.1
    parameters:
      weight: 0.4
  - model: teknium/OpenHermes-2.5-Mistral-7B
    parameters:
      weight: 0.3
  - model: NousResearch/Nous-Hermes-2-Mistral-7B-DPO
    parameters:
      weight: 0.3
dtype: bfloat16
Best for: Two models, smooth interpolation
merge_method: slerp
slices:
  - sources:
      - model: mistralai/Mistral-7B-v0.1
        layer_range: [0, 32]
      - model: teknium/OpenHermes-2.5-Mistral-7B
        layer_range: [0, 32]
parameters:
  t: 0.5 # 0.0 = first model, 1.0 = second model
dtype: bfloat16
Layer-specific SLERP:
merge_method: slerp
slices:
  - sources:
      - model: model_a
        layer_range: [0, 32]
      - model: model_b
        layer_range: [0, 32]
parameters:
  t:
    - filter: self_attn # Attention layers
      value: 0.3
    - filter: mlp       # MLP layers
      value: 0.7
    - value: 0.5        # Default for other layers
dtype: bfloat16
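A rough sketch of how a per-tensor `t` could be resolved from such filter rules (the substring matching here is an assumption for illustration; mergekit's actual matching logic may differ):

```python
def resolve_t(param_name, rules):
    """Return the first matching filter's value, else the default entry."""
    default = 0.5
    for rule in rules:
        if "filter" not in rule:
            default = rule["value"]       # entry with no filter is the default
        elif rule["filter"] in param_name:
            return rule["value"]
    return default

rules = [
    {"filter": "self_attn", "value": 0.3},
    {"filter": "mlp", "value": 0.7},
    {"value": 0.5},
]
resolve_t("model.layers.3.self_attn.q_proj.weight", rules)  # 0.3
resolve_t("model.layers.3.mlp.down_proj.weight", rules)     # 0.7
resolve_t("model.embed_tokens.weight", rules)               # 0.5
```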
Best for: Combining specialized skills
merge_method: task_arithmetic
base_model: mistralai/Mistral-7B-v0.1
models:
  - model: WizardLM/WizardMath-7B-V1.1 # Math
    parameters:
      weight: 0.5
  - model: teknium/OpenHermes-2.5-Mistral-7B # Chat
    parameters:
      weight: 0.3
  - model: ajibawa-2023/Code-Mistral-7B # Code
    parameters:
      weight: 0.2
dtype: bfloat16
Best for: Many models, resolving conflicts
merge_method: ties
base_model: mistralai/Mistral-7B-v0.1
models:
  - model: WizardLM/WizardMath-7B-V1.1
    parameters:
      density: 0.5 # Keep top 50% of parameters
      weight: 1.0
  - model: teknium/OpenHermes-2.5-Mistral-7B
    parameters:
      density: 0.5
      weight: 1.0
  - model: NousResearch/Nous-Hermes-2-Mistral-7B-DPO
    parameters:
      density: 0.5
      weight: 1.0
parameters:
  normalize: true
dtype: bfloat16
Best for: Reducing redundancy
merge_method: dare_ties
base_model: mistralai/Mistral-7B-v0.1
models:
  - model: WizardLM/WizardMath-7B-V1.1
    parameters:
      density: 0.5 # Drop 50% of deltas
      weight: 0.6
  - model: teknium/OpenHermes-2.5-Mistral-7B
    parameters:
      density: 0.5
      weight: 0.4
parameters:
  int8_mask: true # Use int8 for masks (saves memory)
dtype: bfloat16
# Different models for different layers
merge_method: passthrough
slices:
  - sources:
      - model: mistralai/Mistral-7B-v0.1
        layer_range: [0, 16] # First half
  - sources:
      - model: teknium/OpenHermes-2.5-Mistral-7B
        layer_range: [16, 32] # Second half
dtype: bfloat16
# Create Mixture of Experts
merge_method: moe
base_model: mistralai/Mistral-7B-v0.1
experts:
  - source_model: WizardLM/WizardMath-7B-V1.1
    positive_prompts:
      - "math"
      - "calculate"
  - source_model: teknium/OpenHermes-2.5-Mistral-7B
    positive_prompts:
      - "chat"
      - "conversation"
  - source_model: ajibawa-2023/Code-Mistral-7B
    positive_prompts:
      - "code"
      - "python"
dtype: bfloat16
merge_method: linear
models:
  - model: mistralai/Mistral-7B-v0.1
  - model: custom/specialized-model
tokenizer:
  source: "union" # Combine vocabularies from both models
  tokens:
    <|special_token|>:
      source: "custom/specialized-model"
# ✅ Good: Same architecture
models = [
    "mistralai/Mistral-7B-v0.1",
    "teknium/OpenHermes-2.5-Mistral-7B",  # Both Mistral 7B
]

# ❌ Bad: Different architectures
models = [
    "meta-llama/Llama-2-7b-hf",   # Llama
    "mistralai/Mistral-7B-v0.1",  # Mistral (incompatible!)
]
# ✅ Good: Weights sum to 1.0
models:
  - model: model_a
    parameters:
      weight: 0.6
  - model: model_b
    parameters:
      weight: 0.4 # 0.6 + 0.4 = 1.0
# ⚠️ Acceptable: Weights don't sum to 1 (for task arithmetic)
models:
  - model: model_a
    parameters:
      weight: 0.8
  - model: model_b
    parameters:
      weight: 0.8 # May boost performance
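When weights are meant to sum to 1.0 (as in linear merges), normalizing them explicitly avoids surprises; a tiny hypothetical helper:

```python
def normalize_weights(weights):
    """Scale merge weights so they sum to 1.0 (for linear merges)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights sum to zero")
    return [w / total for w in weights]

normalize_weights([0.8, 0.8])  # [0.5, 0.5]
normalize_weights([2, 1, 1])   # [0.5, 0.25, 0.25]
```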
# Choose merge method based on use case:
# 2 models, smooth blend → SLERP
merge_method = "slerp"
# 3+ models, simple average → Linear
merge_method = "linear"
# Multiple task-specific models → Task Arithmetic or TIES
merge_method = "ties"
# Want to reduce redundancy → DARE
merge_method = "dare_ties"
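The decision rules above can be captured as a small lookup table (illustrative only; the keys are made-up labels for the use cases):

```python
# Use case -> merge method, per the rules above
MERGE_METHOD_FOR = {
    "two_models_smooth_blend": "slerp",
    "simple_average_3plus": "linear",
    "task_specific_models": "ties",  # or "task_arithmetic"
    "reduce_redundancy": "dare_ties",
}

MERGE_METHOD_FOR["two_models_smooth_blend"]  # "slerp"
```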
# Start conservative (keep more parameters)
parameters:
  density: 0.8 # Keep 80%

# If performance is good, increase sparsity
parameters:
  density: 0.5 # Keep 50%

# If performance degrades, reduce sparsity
parameters:
  density: 0.9 # Keep 90%
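The trimming behind `density` can be sketched as keeping the top fraction of deltas by magnitude (a hypothetical helper mirroring TIES's trim step, not mergekit's code):

```python
import math

def trim_by_density(deltas, density):
    """Keep the ceil(density * n) largest-magnitude deltas, zero the rest."""
    n_keep = math.ceil(density * len(deltas))
    top_idx = sorted(range(len(deltas)),
                     key=lambda i: abs(deltas[i]),
                     reverse=True)[:n_keep]
    keep = set(top_idx)
    return [d if i in keep else 0.0 for i, d in enumerate(deltas)]

trim_by_density([0.9, -0.05, 0.4, 0.01], 0.5)  # [0.9, 0.0, 0.4, 0.0]
```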
# Preserve base model's beginning and end
merge_method: passthrough
slices:
  - sources:
      - model: base_model
        layer_range: [0, 2]   # Keep first layers
  - sources:
      - model: merged_middle  # Merge middle layers
        layer_range: [2, 30]
  - sources:
      - model: base_model
        layer_range: [30, 32] # Keep last layers
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load merged model
model = AutoModelForCausalLM.from_pretrained("./merged-model")
tokenizer = AutoTokenizer.from_pretrained("./merged-model")

# Test on various tasks
test_prompts = {
    "math": "Calculate: 25 * 17 =",
    "code": "Write a Python function to reverse a string:",
    "chat": "What is the capital of France?",
}

for task, prompt in test_prompts.items():
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=100)
    print(f"{task}: {tokenizer.decode(outputs[0], skip_special_tokens=True)}")
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load merged model
model = AutoModelForCausalLM.from_pretrained("./merged-model")
tokenizer = AutoTokenizer.from_pretrained("./merged-model")
# Upload to HuggingFace Hub
model.push_to_hub("username/my-merged-model")
tokenizer.push_to_hub("username/my-merged-model")
# Quantize with GGUF
python convert.py ./merged-model --outtype f16 --outfile merged-model.gguf
# Quantize with GPTQ
python quantize_gptq.py ./merged-model --bits 4 --group_size 128
# Wrong: Different architectures
models:
  - model: meta-llama/Llama-2-7b # Llama architecture
  - model: mistralai/Mistral-7B  # Mistral architecture
Fix: Only merge models with the same architecture
# Suboptimal: One model dominates
models:
  - model: model_a
    parameters:
      weight: 0.95 # Too high
  - model: model_b
    parameters:
      weight: 0.05 # Too low
Fix: Use more balanced weights (0.3-0.7 range)
# Wrong: Merge and deploy without testing
mergekit-yaml config.yml ./merged-model
# Deploy immediately (risky!)
Fix: Always benchmark before deploying
references/methods.md - Deep dive into merge algorithms
references/examples.md - Real-world merge configurations
references/evaluation.md - Benchmarking and testing strategies