model-merging by davila7/claude-code-templates
npx skills add https://github.com/davila7/claude-code-templates --skill model-merging
Use model merging when you need to combine specialized capabilities (e.g. math, chat, code) from multiple fine-tuned models without additional training.
Success stories: Marcoro14-7B-slerp (top of the Open LLM Leaderboard, 02/2024); many top HuggingFace models use merging
Tools: mergekit (Arcee AI), LazyMergekit, Model Soup
# Install mergekit
git clone https://github.com/arcee-ai/mergekit.git
cd mergekit
pip install -e .
# Or via pip
pip install mergekit
# Optional: Transformers and PyTorch
pip install transformers torch
# config.yml - Merge two models with equal weights
merge_method: linear
models:
  - model: mistralai/Mistral-7B-v0.1
    parameters:
      weight: 0.5
  - model: teknium/OpenHermes-2.5-Mistral-7B
    parameters:
      weight: 0.5
dtype: bfloat16
# Run merge
mergekit-yaml config.yml ./merged-model --cuda
# Use merged model from Python
python -c "from transformers import AutoModelForCausalLM; AutoModelForCausalLM.from_pretrained('./merged-model')"
# config.yml - Spherical linear interpolation
merge_method: slerp
slices:
  - sources:
      - model: mistralai/Mistral-7B-v0.1
        layer_range: [0, 32]
      - model: teknium/OpenHermes-2.5-Mistral-7B
        layer_range: [0, 32]
parameters:
  t: 0.5 # Interpolation factor (0=model1, 1=model2)
dtype: bfloat16
Linear (Model Soup)
Simple weighted average of parameters
Fast, works well for similar models
Can merge 2+ models
merged_weights = w1 * model1_weights + w2 * model2_weights + w3 * model3_weights
# where w1 + w2 + w3 = 1
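As a minimal sketch, with plain Python floats standing in for parameter tensors (`linear_merge` is a hypothetical helper, not mergekit's implementation):

```python
def linear_merge(state_dicts, weights):
    """Weighted average of parameters across models (model soup)."""
    assert len(state_dicts) == len(weights)
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name] for w, sd in zip(weights, state_dicts))
    return merged

# Two toy "models" with a single scalar parameter each
m1 = {"layer.weight": 1.0}
m2 = {"layer.weight": 3.0}
merged = linear_merge([m1, m2], [0.5, 0.5])
# merged["layer.weight"] -> 2.0
```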
SLERP (Spherical Linear Interpolation)
Interpolates along sphere in weight space
Preserves magnitude of weight vectors
Best for merging 2 models
Smoother than linear
merged = (sin((1-t)θ) / sin(θ)) * model1 + (sin(tθ) / sin(θ)) * model2
# where θ = arccos(dot(model1, model2)) and t ∈ [0, 1]
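The formula can be sketched on plain Python lists; as practical implementations do, this falls back to linear interpolation when the vectors are nearly parallel (`slerp` here is a hypothetical helper, not mergekit's code):

```python
import math

def slerp(t, v1, v2, eps=1e-8):
    """Spherical linear interpolation between two vectors of floats."""
    norm1 = math.sqrt(sum(x * x for x in v1))
    norm2 = math.sqrt(sum(x * x for x in v2))
    dot = sum(a * b for a, b in zip(v1, v2)) / (norm1 * norm2)
    dot = max(-1.0, min(1.0, dot))        # clamp for numerical safety
    theta = math.acos(dot)
    if theta < eps:                       # nearly parallel: use linear interpolation
        return [(1 - t) * a + t * b for a, b in zip(v1, v2)]
    s1 = math.sin((1 - t) * theta) / math.sin(theta)
    s2 = math.sin(t * theta) / math.sin(theta)
    return [s1 * a + s2 * b for a, b in zip(v1, v2)]

a, b = [1.0, 0.0], [0.0, 1.0]
mid = slerp(0.5, a, b)
# t=0 returns the first vector, t=1 the second; t=0.5 stays on the unit sphere
```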
Task Arithmetic
Extract "task vectors" (fine-tuned - base)
Combine task vectors, add to base
Good for merging multiple specialized models
task_vector = finetuned_model - base_model
merged = base_model + α₁*task_vector₁ + α₂*task_vector₂
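A minimal sketch of both steps on plain state dicts (floats in place of tensors; `task_vector` and `apply_task_vectors` are hypothetical helpers):

```python
def task_vector(finetuned, base):
    """Delta between a fine-tuned model and its base."""
    return {k: finetuned[k] - base[k] for k in base}

def apply_task_vectors(base, vectors, alphas):
    """base + sum(alpha_i * task_vector_i)"""
    merged = dict(base)
    for alpha, vec in zip(alphas, vectors):
        for k in merged:
            merged[k] += alpha * vec[k]
    return merged

base = {"w": 1.0}
math_model = {"w": 3.0}  # fine-tuned on math
chat_model = {"w": 2.0}  # fine-tuned on chat
tv_math = task_vector(math_model, base)  # {"w": 2.0}
tv_chat = task_vector(chat_model, base)  # {"w": 1.0}
merged = apply_task_vectors(base, [tv_math, tv_chat], [0.5, 0.3])
# merged["w"] == 1.0 + 0.5*2.0 + 0.3*1.0 == 2.3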
TIES-Merging
Trims each task vector, keeping only the largest-magnitude deltas (the density parameter)
Elects a majority sign per parameter and merges only agreeing deltas
Resolves interference when merging many models
DARE (Drop And REscale)
Randomly drops a fraction of each task vector's deltas and rescales the rest to compensate
Reduces redundancy; combines with TIES sign election as dare_ties
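DARE's drop-and-rescale step can be sketched on a single task vector (a hypothetical helper, not mergekit's implementation):

```python
import random

def dare(deltas, drop_rate, seed=0):
    """Randomly zero each delta with probability drop_rate,
    rescale survivors by 1/(1-drop_rate) to keep the expected sum."""
    rng = random.Random(seed)
    keep = 1.0 - drop_rate
    return [d / keep if rng.random() >= drop_rate else 0.0 for d in deltas]

deltas = [0.2, -0.1, 0.4, 0.05]
sparse = dare(deltas, drop_rate=0.5)
# each surviving delta is doubled (1 / (1 - 0.5)); the rest are zeroed
```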
# Basic structure
merge_method: <method> # linear, slerp, ties, dare_ties, task_arithmetic
base_model: <path>     # Optional: base model for task arithmetic
models:
  - model: <path/to/model1>
    parameters:
      weight: <float>  # Merge weight
      density: <float> # For TIES/DARE
  - model: <path/to/model2>
    parameters:
      weight: <float>
parameters:
  # Method-specific parameters
dtype: <dtype> # bfloat16, float16, float32

# Optional
slices:    # Layer-wise merging
tokenizer: # Tokenizer configuration
Best for: Simple model combinations, equal weighting
merge_method: linear
models:
  - model: WizardLM/WizardMath-7B-V1.1
    parameters:
      weight: 0.4
  - model: teknium/OpenHermes-2.5-Mistral-7B
    parameters:
      weight: 0.3
  - model: NousResearch/Nous-Hermes-2-Mistral-7B-DPO
    parameters:
      weight: 0.3
dtype: bfloat16
Best for: Two models, smooth interpolation
merge_method: slerp
slices:
  - sources:
      - model: mistralai/Mistral-7B-v0.1
        layer_range: [0, 32]
      - model: teknium/OpenHermes-2.5-Mistral-7B
        layer_range: [0, 32]
parameters:
  t: 0.5 # 0.0 = first model, 1.0 = second model
dtype: bfloat16
Layer-specific SLERP:
merge_method: slerp
slices:
  - sources:
      - model: model_a
        layer_range: [0, 32]
      - model: model_b
        layer_range: [0, 32]
parameters:
  t:
    - filter: self_attn # Attention layers
      value: 0.3
    - filter: mlp       # MLP layers
      value: 0.7
    - value: 0.5        # Default for other layers
dtype: bfloat16
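A rough sketch of how a per-tensor `t` could be resolved from such filter rules (the substring matching here is an assumption for illustration; mergekit's actual matching logic may differ):

```python
def resolve_t(param_name, rules):
    """Return the first matching filter's value, else the default entry."""
    default = 0.5
    for rule in rules:
        if "filter" not in rule:
            default = rule["value"]       # entry with no filter is the default
        elif rule["filter"] in param_name:
            return rule["value"]
    return default

rules = [
    {"filter": "self_attn", "value": 0.3},
    {"filter": "mlp", "value": 0.7},
    {"value": 0.5},
]
resolve_t("model.layers.3.self_attn.q_proj.weight", rules)  # 0.3
resolve_t("model.layers.3.mlp.down_proj.weight", rules)     # 0.7
resolve_t("model.embed_tokens.weight", rules)               # 0.5
```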
Best for: Combining specialized skills
merge_method: task_arithmetic
base_model: mistralai/Mistral-7B-v0.1
models:
  - model: WizardLM/WizardMath-7B-V1.1 # Math
    parameters:
      weight: 0.5
  - model: teknium/OpenHermes-2.5-Mistral-7B # Chat
    parameters:
      weight: 0.3
  - model: ajibawa-2023/Code-Mistral-7B # Code
    parameters:
      weight: 0.2
dtype: bfloat16
Best for: Many models, resolving conflicts
merge_method: ties
base_model: mistralai/Mistral-7B-v0.1
models:
  - model: WizardLM/WizardMath-7B-V1.1
    parameters:
      density: 0.5 # Keep top 50% of parameters
      weight: 1.0
  - model: teknium/OpenHermes-2.5-Mistral-7B
    parameters:
      density: 0.5
      weight: 1.0
  - model: NousResearch/Nous-Hermes-2-Mistral-7B-DPO
    parameters:
      density: 0.5
      weight: 1.0
parameters:
  normalize: true
dtype: bfloat16
Best for: Reducing redundancy
merge_method: dare_ties
base_model: mistralai/Mistral-7B-v0.1
models:
  - model: WizardLM/WizardMath-7B-V1.1
    parameters:
      density: 0.5 # Drop 50% of deltas
      weight: 0.6
  - model: teknium/OpenHermes-2.5-Mistral-7B
    parameters:
      density: 0.5
      weight: 0.4
parameters:
  int8_mask: true # Use int8 for masks (saves memory)
dtype: bfloat16
# Different models for different layers
merge_method: passthrough
slices:
  - sources:
      - model: mistralai/Mistral-7B-v0.1
        layer_range: [0, 16] # First half
  - sources:
      - model: teknium/OpenHermes-2.5-Mistral-7B
        layer_range: [16, 32] # Second half
dtype: bfloat16
# Create Mixture of Experts
merge_method: moe
base_model: mistralai/Mistral-7B-v0.1
experts:
  - source_model: WizardLM/WizardMath-7B-V1.1
    positive_prompts:
      - "math"
      - "calculate"
  - source_model: teknium/OpenHermes-2.5-Mistral-7B
    positive_prompts:
      - "chat"
      - "conversation"
  - source_model: ajibawa-2023/Code-Mistral-7B
    positive_prompts:
      - "code"
      - "python"
dtype: bfloat16
merge_method: linear
models:
  - model: mistralai/Mistral-7B-v0.1
  - model: custom/specialized-model
tokenizer:
  source: "union" # Combine vocabularies from both models
  tokens:
    <|special_token|>:
      source: "custom/specialized-model"
# ✅ Good: Same architecture
models = [
    "mistralai/Mistral-7B-v0.1",
    "teknium/OpenHermes-2.5-Mistral-7B",  # Both Mistral 7B
]

# ❌ Bad: Different architectures
models = [
    "meta-llama/Llama-2-7b-hf",   # Llama
    "mistralai/Mistral-7B-v0.1",  # Mistral (incompatible!)
]
# ✅ Good: Weights sum to 1.0
models:
  - model: model_a
    parameters:
      weight: 0.6
  - model: model_b
    parameters:
      weight: 0.4 # 0.6 + 0.4 = 1.0
# ⚠️ Acceptable: Weights don't sum to 1 (for task arithmetic)
models:
  - model: model_a
    parameters:
      weight: 0.8
  - model: model_b
    parameters:
      weight: 0.8 # May boost performance
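When weights are meant to sum to 1.0 (as in linear merges), normalizing them explicitly avoids surprises; a tiny hypothetical helper:

```python
def normalize_weights(weights):
    """Scale merge weights so they sum to 1.0 (for linear merges)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights sum to zero")
    return [w / total for w in weights]

normalize_weights([0.8, 0.8])  # [0.5, 0.5]
normalize_weights([2, 1, 1])   # [0.5, 0.25, 0.25]
```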
# Choose merge method based on use case:
# 2 models, smooth blend → SLERP
merge_method = "slerp"
# 3+ models, simple average → Linear
merge_method = "linear"
# Multiple task-specific models → Task Arithmetic or TIES
merge_method = "ties"
# Want to reduce redundancy → DARE
merge_method = "dare_ties"
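The decision rules above can be captured as a small lookup table (illustrative only; the keys are made-up labels for the use cases):

```python
# Use case -> merge method, per the rules above
MERGE_METHOD_FOR = {
    "two_models_smooth_blend": "slerp",
    "simple_average_3plus": "linear",
    "task_specific_models": "ties",  # or "task_arithmetic"
    "reduce_redundancy": "dare_ties",
}

MERGE_METHOD_FOR["two_models_smooth_blend"]  # "slerp"
```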
# Start conservative (keep more parameters)
parameters:
  density: 0.8 # Keep 80%

# If performance is good, increase sparsity
parameters:
  density: 0.5 # Keep 50%

# If performance degrades, reduce sparsity
parameters:
  density: 0.9 # Keep 90%
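The trimming behind `density` can be sketched as keeping the top fraction of deltas by magnitude (a hypothetical helper mirroring TIES's trim step, not mergekit's code):

```python
import math

def trim_by_density(deltas, density):
    """Keep the ceil(density * n) largest-magnitude deltas, zero the rest."""
    n_keep = math.ceil(density * len(deltas))
    top_idx = sorted(range(len(deltas)),
                     key=lambda i: abs(deltas[i]),
                     reverse=True)[:n_keep]
    keep = set(top_idx)
    return [d if i in keep else 0.0 for i, d in enumerate(deltas)]

trim_by_density([0.9, -0.05, 0.4, 0.01], 0.5)  # [0.9, 0.0, 0.4, 0.0]
```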
# Preserve base model's beginning and end
merge_method: passthrough
slices:
  - sources:
      - model: base_model
        layer_range: [0, 2]   # Keep first layers
  - sources:
      - model: merged_middle  # Merge middle layers
        layer_range: [2, 30]
  - sources:
      - model: base_model
        layer_range: [30, 32] # Keep last layers
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load merged model
model = AutoModelForCausalLM.from_pretrained("./merged-model")
tokenizer = AutoTokenizer.from_pretrained("./merged-model")

# Test on various tasks
test_prompts = {
    "math": "Calculate: 25 * 17 =",
    "code": "Write a Python function to reverse a string:",
    "chat": "What is the capital of France?",
}

for task, prompt in test_prompts.items():
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=100)
    print(f"{task}: {tokenizer.decode(outputs[0], skip_special_tokens=True)}")
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load merged model
model = AutoModelForCausalLM.from_pretrained("./merged-model")
tokenizer = AutoTokenizer.from_pretrained("./merged-model")
# Upload to HuggingFace Hub
model.push_to_hub("username/my-merged-model")
tokenizer.push_to_hub("username/my-merged-model")
# Quantize with GGUF
python convert.py ./merged-model --outtype f16 --outfile merged-model.gguf
# Quantize with GPTQ
python quantize_gptq.py ./merged-model --bits 4 --group_size 128
# Wrong: Different architectures
models:
  - model: meta-llama/Llama-2-7b # Llama architecture
  - model: mistralai/Mistral-7B  # Mistral architecture
Fix: Only merge models with the same architecture
# Suboptimal: One model dominates
models:
  - model: model_a
    parameters:
      weight: 0.95 # Too high
  - model: model_b
    parameters:
      weight: 0.05 # Too low
Fix: Use more balanced weights (0.3-0.7 range)
# Wrong: Merge and deploy without testing
mergekit-yaml config.yml ./merged-model
# Deploy immediately (risky!)
Fix: Always benchmark before deploying
references/methods.md - Deep dive into merge algorithms
references/examples.md - Real-world merge configurations
references/evaluation.md - Benchmarking and testing strategies