simpo-training by davila7/claude-code-templates
npx skills add https://github.com/davila7/claude-code-templates --skill simpo-training
SimPO (Simple Preference Optimization) is a reference-free preference optimization method: it outperforms DPO while eliminating the frozen reference model and its overhead.
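The core idea can be sketched in a few lines: the implicit reward is the length-normalized log-probability of a response scaled by beta, and the loss pushes the chosen-vs-rejected reward gap past a target margin gamma = gamma_beta_ratio × beta. A pure-Python sketch of the sigmoid variant (the log-probabilities and lengths below are made-up illustrative numbers, not real model outputs):

```python
import math

def simpo_loss(logp_chosen, len_chosen, logp_rejected, len_rejected,
               beta=2.0, gamma_beta_ratio=0.5):
    """Sigmoid-variant SimPO loss for a single preference pair."""
    # Implicit reward: length-normalized sequence log-probability, scaled by beta
    r_chosen = beta * (logp_chosen / len_chosen)
    r_rejected = beta * (logp_rejected / len_rejected)
    # Target reward margin, derived from the two hyperparameters
    gamma = gamma_beta_ratio * beta
    margin = r_chosen - r_rejected - gamma
    # -log(sigmoid(margin)), i.e. softplus(-margin)
    return math.log1p(math.exp(-margin))

# Illustrative numbers: chosen response is more likely per token than rejected
loss = simpo_loss(logp_chosen=-40.0, len_chosen=20,
                  logp_rejected=-60.0, len_rejected=25)
```

Because the reward is the policy's own length-normalized log-probability, no reference-model forward pass is needed, which is what DPO spends its extra memory on.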
Installation:
# Create environment
conda create -n simpo python=3.10 && conda activate simpo
# Install PyTorch 2.2.2
# Visit: https://pytorch.org/get-started/locally/
# Install alignment-handbook
git clone https://github.com/huggingface/alignment-handbook.git
cd alignment-handbook
python -m pip install .
# Install Flash Attention 2
python -m pip install flash-attn --no-build-isolation
Training (Mistral 7B):
ACCELERATE_LOG_LEVEL=info accelerate launch \
--config_file accelerate_configs/deepspeed_zero3.yaml \
scripts/run_simpo.py \
training_configs/mistral-7b-base-simpo.yaml
Config (mistral-7b-base-simpo.yaml):
# Model
model_name_or_path: mistralai/Mistral-7B-v0.1
torch_dtype: bfloat16
# Dataset
dataset_mixer:
HuggingFaceH4/ultrafeedback_binarized: 1.0
dataset_splits:
- train_prefs
- test_prefs
# SimPO hyperparameters
beta: 2.0 # Reward scaling (2.0-10.0)
gamma_beta_ratio: 0.5 # Target margin (0-1)
loss_type: sigmoid # sigmoid or hinge
sft_weight: 0.0 # Optional SFT regularization
# Training
learning_rate: 5e-7 # Critical: 3e-7 to 1e-6
num_train_epochs: 1
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
# Output
output_dir: ./outputs/mistral-7b-simpo
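A quick sanity check on what this config implies for the optimizer: the effective batch size is per-device batch × gradient accumulation steps × number of processes. The GPU count below is an assumption for illustration; the real value comes from your accelerate/DeepSpeed launch:

```python
# Values from the config above; num_gpus is an assumed example value
per_device_train_batch_size = 1
gradient_accumulation_steps = 8
num_gpus = 8  # assumption: a typical single 8-GPU node

effective_batch_size = (per_device_train_batch_size
                        * gradient_accumulation_steps
                        * num_gpus)
print(effective_batch_size)  # 64 under these assumptions
```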
Launch training:
accelerate launch --config_file accelerate_configs/deepspeed_zero3.yaml \
scripts/run_simpo.py training_configs/mistral-7b-base-simpo.yaml
Config (llama3-8b-instruct-simpo.yaml):
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
dataset_mixer:
argilla/ultrafeedback-binarized-preferences-cleaned: 1.0
beta: 2.5
gamma_beta_ratio: 0.5
learning_rate: 5e-7
sft_weight: 0.1 # Add SFT loss to preserve capabilities
num_train_epochs: 1
per_device_train_batch_size: 2
gradient_accumulation_steps: 4
output_dir: ./outputs/llama3-8b-simpo
Launch:
accelerate launch --config_file accelerate_configs/deepspeed_zero3.yaml \
scripts/run_simpo.py training_configs/llama3-8b-instruct-simpo.yaml
For math/code tasks:
model_name_or_path: deepseek-ai/deepseek-math-7b-base
dataset_mixer:
argilla/distilabel-math-preference-dpo: 1.0
beta: 5.0 # Higher for stronger signal
gamma_beta_ratio: 0.7 # Larger margin
learning_rate: 3e-7 # Lower LR for reasoning
sft_weight: 0.0
num_train_epochs: 1
per_device_train_batch_size: 1
gradient_accumulation_steps: 16
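Note how the absolute target margin gamma = beta × gamma_beta_ratio varies across the three configs in this document; the math setup demands a much larger reward gap than the chat setups. A quick comparison using the values above:

```python
# beta / gamma_beta_ratio pairs copied from the three configs in this document
configs = {
    "mistral-7b-base":    {"beta": 2.0, "gamma_beta_ratio": 0.5},
    "llama3-8b-instruct": {"beta": 2.5, "gamma_beta_ratio": 0.5},
    "deepseek-math-7b":   {"beta": 5.0, "gamma_beta_ratio": 0.7},
}
gammas = {name: c["beta"] * c["gamma_beta_ratio"] for name, c in configs.items()}
for name, gamma in gammas.items():
    print(f"{name}: gamma = {gamma:.2f}")
# mistral 1.00, llama3 1.25, deepseek 3.50
```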
Use SimPO when:
Algorithm selection:
Use alternatives instead:
Issue: Loss divergence
Reduce learning rate:
learning_rate: 3e-7 # Reduce from 5e-7
Reduce beta:
beta: 1.0 # Reduce from 2.0
Issue: Model forgets capabilities
Add SFT regularization:
sft_weight: 0.1 # Add SFT loss component
Issue: Poor preference separation
Increase beta and margin:
beta: 5.0 # Increase from 2.0
gamma_beta_ratio: 0.8 # Increase from 0.5
Issue: OOM during training
Reduce batch size:
per_device_train_batch_size: 1
gradient_accumulation_steps: 16 # Maintain effective batch
Enable gradient checkpointing:
gradient_checkpointing: true
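On the loss_type choice that appears in the configs: both variants penalize the reward margin m falling short of gamma, but the sigmoid form decays smoothly while the hinge form is exactly zero once the target is met. A sketch of the standard formulations (see references/loss-functions.md for the precise ones the trainer uses):

```python
import math

def sigmoid_loss(margin, gamma):
    # -log(sigmoid(margin - gamma)): smooth, never exactly zero
    return math.log1p(math.exp(-(margin - gamma)))

def hinge_loss(margin, gamma):
    # max(0, gamma - margin): zero once the margin target is met
    return max(0.0, gamma - margin)

# With a comfortable margin, hinge stops pushing while sigmoid still does
print(hinge_loss(margin=3.0, gamma=1.0))    # 0.0
print(sigmoid_loss(margin=3.0, gamma=1.0))  # ~0.127
```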
Loss functions: See references/loss-functions.md for sigmoid vs. hinge loss, mathematical formulations, and when to use each.
Hyperparameter tuning: See references/hyperparameters.md for beta, gamma, and learning-rate selection guidance, plus model-size-specific recommendations.
Dataset preparation: See references/datasets.md for preference data formats, quality filtering, and custom dataset creation.
Memory optimization:
Weekly Installs: 151
Repository
GitHub Stars: 23.4K
First Seen: Jan 21, 2026
Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Warn)
Installed on: claude-code (131), opencode (123), gemini-cli (117), cursor (117), antigravity (105), codex (104)