Hugging Face TRL Model Training: Fine-Tune with SFT/DPO/GRPO on Cloud GPUs, Auto-Saved to the Hub

hugging-face-model-trainer by huggingface/skills


Install command

npx skills add https://github.com/huggingface/skills --skill hugging-face-model-trainer


🇨🇳中文介绍

在 Hugging Face Jobs 上进行 TRL 训练

概述

在完全托管的 Hugging Face 基础设施上使用 TRL（Transformer 强化学习）训练语言模型。无需本地 GPU 设置——模型在云端 GPU 上训练，结果自动保存到 Hugging Face Hub。

TRL 提供多种训练方法：

SFT（监督微调）- 标准的指令调优
DPO（直接偏好优化）- 基于偏好数据的对齐
GRPO（组相对策略优化）- 在线 RL 训练
奖励建模 - 为 RLHF 训练奖励模型

详细的 TRL 方法文档：

hf_doc_search("your query", product="trl")
hf_doc_fetch("https://huggingface.co/docs/trl/sft_trainer")  # SFT
hf_doc_fetch("https://huggingface.co/docs/trl/dpo_trainer")  # DPO
# etc.

另请参阅： references/training_methods.md 了解方法概述和选择指南

何时使用此技能

当用户希望时使用此技能：

在云端 GPU 上微调语言模型，无需本地基础设施
使用 TRL 方法（SFT、DPO、GRPO 等）进行训练
在 Hugging Face Jobs 基础设施上运行训练任务
将训练好的模型转换为 GGUF 格式以进行本地部署（Ollama、LM Studio、llama.cpp）
确保训练好的模型永久保存到 Hub
使用具有优化默认设置的现代工作流程

何时使用 Unsloth


✅ 账户与认证

拥有 Pro、Team 或 Enterprise 计划的 Hugging Face 账户（Jobs 需要付费计划）
已认证登录：使用 hf_whoami() 检查
用于 Hub 推送的 HF_TOKEN ⚠️ 关键 - 训练环境是临时的，必须推送到 Hub，否则所有训练结果都会丢失
Token 必须具有写入权限
必须在任务配置中传递 secrets={"HF_TOKEN": "$HF_TOKEN"} 以使 token 可用（$HF_TOKEN 语法引用您实际的 token 值）

✅ 数据集要求

数据集必须存在于 Hub 上或可通过 datasets.load_dataset() 加载
格式必须与训练方法匹配（SFT："messages"/文本/提示-完成；DPO：chosen/rejected；GRPO：仅提示）
在 GPU 训练之前，始终验证未知数据集，以防止格式失败（参见下面的数据集验证部分）
大小适合硬件（演示：在 t4-small 上 50-100 个示例；生产：在 a10g-large/a100-large 上 1K-10K+ 个示例）

⚠️ 关键设置

超时必须超过预期的训练时间 - 默认 30 分钟对于大多数训练来说太短。最低推荐：1-2 小时。如果超过超时时间，任务将失败并丢失所有进度。
必须启用 Hub 推送 - 配置：push_to_hub=True，hub_model_id="username/model-name"；任务：secrets={"HF_TOKEN": "$HF_TOKEN"}

⚠️ 重要：训练任务异步运行，可能需要数小时

当用户请求训练时：

创建训练脚本，包含 Trackio（使用 scripts/train_sft_example.py 作为模板）
立即提交，使用 hf_jobs() MCP 工具并内联脚本内容——除非用户要求，否则不要保存到文件
报告提交情况，包括任务 ID、监控 URL 和预计时间
等待用户 请求状态检查——不要自动轮询

任务在后台运行 - 提交后立即返回；训练独立继续
初始日志延迟 - 日志可能需要 30-60 秒才会出现
用户检查状态 - 等待用户请求状态更新
避免轮询 - 仅在用户请求时检查日志；提供监控链接代替

向用户提供：

✅ 任务 ID 和监控 URL
✅ 预计完成时间
✅ Trackio 仪表板 URL
✅ 告知用户稍后可以请求状态检查

✅ Job submitted successfully!

Job ID: abc123xyz
Monitor: https://huggingface.co/jobs/username/abc123xyz

Expected time: ~2 hours
Estimated cost: ~$10

The job is running in the background. Ask me to check status/logs when ready!

Quick Start: Four Approaches

💡 演示提示： 对于在较小 GPU（t4-small）上的快速演示，可以省略 eval_dataset 和 eval_strategy 以节省约 40% 的内存。您仍然可以看到训练损失和学习进度。

TRL 配置类使用 max_length（而不是 max_seq_length） 来控制分词后的序列长度：

# ✅ 正确 - 如果需要设置序列长度
SFTConfig(max_length=512)   # 将序列截断为 512 个 token
DPOConfig(max_length=2048)  # 更长的上下文（2048 个 token）

# ❌ 错误 - 此参数不存在
SFTConfig(max_seq_length=512)  # TypeError!

默认行为： max_length=1024（从右侧截断）。这适用于大多数训练。

何时需要覆盖：

更长上下文：设置更高值（例如 max_length=2048）
内存限制：设置更低值（例如 max_length=512）
视觉模型：设置 max_length=None（防止切割图像 token）

通常您根本不需要设置此参数——下面的示例使用了合理的默认值。

方法 1：UV 脚本（推荐——默认选择）

UV 脚本使用 PEP 723 内联依赖项，实现干净、自包含的训练。这是 Claude Code 的主要方法。

hf_jobs("uv", {
    "script": """
# /// script
# dependencies = ["trl>=0.12.0", "peft>=0.7.0", "trackio"]
# ///

from datasets import load_dataset
from peft import LoraConfig
from trl import SFTTrainer, SFTConfig
import trackio

dataset = load_dataset("trl-lib/Capybara", split="train")

# 创建训练/评估分割以进行监控
dataset_split = dataset.train_test_split(test_size=0.1, seed=42)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",
    train_dataset=dataset_split["train"],
    eval_dataset=dataset_split["test"],
    peft_config=LoraConfig(r=16, lora_alpha=32),
    args=SFTConfig(
        output_dir="my-model",
        push_to_hub=True,
        hub_model_id="username/my-model",
        num_train_epochs=3,
        eval_strategy="steps",
        eval_steps=50,
        report_to="trackio",
        project="meaningful_project_name",  # Trackio project that groups related runs
        run_name="meaningful_run_name",     # descriptive name for this specific training run (Trackio)
    )
)

trainer.train()
trainer.push_to_hub()
""",
    "flavor": "a10g-large",
    "timeout": "2h",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}
})

优点： 直接使用 MCP 工具，代码干净，依赖项内联声明（PEP 723），无需保存文件，完全控制 何时使用： Claude Code 中所有训练任务的默认选择，自定义训练逻辑，任何需要 hf_jobs() 的场景

⚠️ 重要： script 参数接受内联代码（如上所示）或 URL。本地文件路径无效。

为什么本地路径无效： 任务在隔离的 Docker 容器中运行，无法访问您的本地文件系统。脚本必须是：

内联代码（推荐用于自定义训练）
可公开访问的 URL
私有仓库 URL（需 HF_TOKEN）

# ❌ 这些都会失败
hf_jobs("uv", {"script": "train.py"})
hf_jobs("uv", {"script": "./scripts/train.py"})
hf_jobs("uv", {"script": "/path/to/train.py"})

# ✅ 内联代码（推荐）
hf_jobs("uv", {"script": "# /// script\n# dependencies = [...]\n# ///\n\n<your code>"})

# ✅ 来自 Hugging Face Hub
hf_jobs("uv", {"script": "https://huggingface.co/user/repo/resolve/main/train.py"})

# ✅ 来自 GitHub
hf_jobs("uv", {"script": "https://raw.githubusercontent.com/user/repo/main/train.py"})

# ✅ 来自 Gist
hf_jobs("uv", {"script": "https://gist.githubusercontent.com/user/id/raw/train.py"})

要使用本地脚本： 先上传到 HF Hub：

hf repos create my-training-scripts --type model
hf upload my-training-scripts ./train.py train.py
# 使用：https://huggingface.co/USERNAME/my-training-scripts/resolve/main/train.py

方法 2：TRL 维护的脚本（官方示例）

TRL 为所有方法提供了经过实战检验的脚本。可以从 URL 运行：

hf_jobs("uv", {
    "script": "https://raw.githubusercontent.com/huggingface/trl/main/trl/scripts/sft.py",
    "script_args": [
        "--model_name_or_path", "Qwen/Qwen2.5-0.5B",
        "--dataset_name", "trl-lib/Capybara",
        "--output_dir", "my-model",
        "--push_to_hub",
        "--hub_model_id", "username/my-model"
    ],
    "flavor": "a10g-large",
    "timeout": "2h",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}
})

优点： 无需编写代码，由 TRL 团队维护，经过生产测试 何时使用： 标准 TRL 训练，快速实验，不需要自定义代码 可用： 脚本可从 https://github.com/huggingface/trl/tree/main/examples/scripts 获取

在 Hub 上查找更多 UV 脚本

uv-scripts 组织提供了现成的 UV 脚本，作为数据集存储在 Hugging Face Hub 上：

# 发现可用的 UV 脚本集合
dataset_search({"author": "uv-scripts", "sort": "downloads", "limit": 20})

# 探索特定集合
hub_repo_details(["uv-scripts/classification"], repo_type="dataset", include_readme=True)

热门集合： ocr, classification, synthetic-data, vllm, dataset-creation

方法 3：HF Jobs CLI（直接终端命令）

当 hf_jobs() MCP 工具不可用时，直接使用 hf jobs CLI。

⚠️ 关键：CLI 语法规则

# ✅ 正确语法 - 标志在脚本 URL 之前
hf jobs uv run --flavor a10g-large --timeout 2h --secrets HF_TOKEN "https://example.com/train.py"

# ❌ 错误 - "run uv" 而不是 "uv run"
hf jobs run uv "https://example.com/train.py" --flavor a10g-large

# ❌ 错误 - 标志在脚本 URL 之后（将被忽略！）
hf jobs uv run "https://example.com/train.py" --flavor a10g-large

# ❌ 错误 - "--secret" 而不是 "--secrets"（复数）
hf jobs uv run --secret HF_TOKEN "https://example.com/train.py"

关键语法规则：

命令顺序是 hf jobs uv run（不是 hf jobs run uv）
所有标志（--flavor、--timeout、--secrets）必须在脚本 URL 之前
使用 --secrets（复数），而不是 --secret
脚本 URL 必须是最后一个位置参数

完整的 CLI 示例：

hf jobs uv run \
  --flavor a10g-large \
  --timeout 2h \
  --secrets HF_TOKEN \
  "https://huggingface.co/user/repo/resolve/main/train.py"

通过 CLI 检查任务状态：

hf jobs ps                        # 列出所有任务
hf jobs logs <job-id>             # 查看日志
hf jobs inspect <job-id>          # 任务详情
hf jobs cancel <job-id>           # 取消任务

方法 4：TRL Jobs 包（简化训练）

trl-jobs 包提供了优化的默认设置和一行式训练。

uvx trl-jobs sft \
  --model_name Qwen/Qwen2.5-0.5B \
  --dataset_name trl-lib/Capybara

优点： 预配置设置，自动 Trackio 集成，自动 Hub 推送，一行命令 何时使用： 用户直接在终端工作（不在 Claude Code 上下文中），快速本地实验 仓库： https://github.com/huggingface/trl-jobs

⚠️ 在 Claude Code 上下文中，如果可用，优先使用 hf_jobs() MCP 工具（方法 1）。

模型大小	推荐硬件	成本（约/小时）	使用场景
<1B 参数	`t4-small`	~$0.75	演示，仅用于快速测试（无评估步骤）
1-3B 参数	`t4-medium`, `l4x1`	~$1.50-2.50	开发
3-7B 参数	`a10g-small`, `a10g-large`	~$3.50-5.00	生产训练
7-13B 参数	`a10g-large`, `a100-large`	~$5-10	大模型（使用 LoRA）
13B+ 参数	`a100-large`, `a10g-largex2`	~$10-20	非常大的模型（使用 LoRA）

GPU 类型： cpu-basic/upgrade/performance/xl, t4-small/medium, l4x1/x4, a10g-small/large/largex2/largex4, a100-large, h100/h100x8

对于 >7B 的模型，使用 LoRA/PEFT 以减少内存
多 GPU 由 TRL/Accelerate 自动处理
测试时从较小的硬件开始

参见： references/hardware_guide.md 了解详细规格

关键：将结果保存到 Hub

⚠️ 临时环境——必须推送到 Hub

Jobs 环境是临时的。任务结束时所有文件都会被删除。如果模型没有推送到 Hub，所有训练都将丢失。

在训练脚本/配置中：

SFTConfig(
    push_to_hub=True,
    hub_model_id="username/model-name",  # 必须指定
    hub_strategy="every_save",  # 可选：推送检查点
)

在任务提交中：

{
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}  # 启用认证
}

配置中设置了 push_to_hub=True
hub_model_id 包含 username/repo-name
secrets 参数包含 HF_TOKEN
用户对目标仓库具有写入权限

参见： references/hub_saving.md 了解详细故障排除

⚠️ 默认：30 分钟——对于训练来说太短

{
    "timeout": "2h"   # 2 小时（格式："90m", "2h", "1.5h"，或整数秒）
}

场景	推荐	备注
快速演示（50-100 个示例）	10-30 分钟	验证设置
开发训练	1-2 小时	小数据集
生产（3-7B 模型）	4-6 小时	完整数据集
使用 LoRA 的大模型	3-6 小时	取决于数据集

始终增加 20-30% 的缓冲时间，用于模型/数据集加载、检查点保存、Hub 推送操作和网络延迟。

超时时： 任务立即终止，所有未保存的进度丢失，必须从头开始重新启动

当计划使用已知参数的任务时，主动提供成本估算。 使用 scripts/estimate_cost.py：

uv run scripts/estimate_cost.py \
  --model meta-llama/Llama-2-7b-hf \
  --dataset trl-lib/Capybara \
  --hardware a10g-large \
  --dataset-size 16000 \
  --epochs 3

输出包括预计时间、成本、推荐超时（含缓冲）和优化建议。

何时提供： 用户计划任务、询问成本/时间、选择硬件、任务运行时间 >1 小时或成本 >$5

生产就绪的模板，包含所有最佳实践：

正确加载这些脚本：

scripts/train_sft_example.py - 完整的 SFT 训练，包含 Trackio、LoRA、检查点
scripts/train_dpo_example.py - 用于偏好学习的 DPO 训练
scripts/train_grpo_example.py - 用于在线 RL 的 GRPO 训练

这些脚本演示了正确的 Hub 保存、Trackio 集成、检查点管理和优化参数。将它们的内容内联传递给 hf_jobs() 或用作自定义脚本的模板。

Trackio 提供实时指标可视化。完整设置指南请参阅 references/trackio_guide.md。

将 trackio 添加到依赖项
使用 report_to="trackio" and run_name="meaningful_name" 配置训练器

Trackio 配置默认值

除非用户另有指定，否则使用合理的默认值。 当生成带有 Trackio 的训练脚本时：

Space ID：{username}/trackio（使用 "trackio" 作为默认空间名称）
运行命名：除非另有说明，否则以用户能够识别的方式命名运行（例如，描述任务、模型或目的）
配置：保持最小化——仅包含超参数和模型/数据集信息
项目名称：使用项目名称将运行与特定项目关联

用户覆盖： 如果用户请求特定的 trackio 配置（自定义空间、运行命名、分组或额外配置），则应用他们的偏好而不是默认值。

这对于管理具有相同配置的多个任务或保持训练脚本的可移植性很有用。

完整文档请参阅 references/trackio_guide.md，包括为实验分组运行。

# 列出所有任务
hf_jobs("ps")

# 检查特定任务
hf_jobs("inspect", {"job_id": "your-job-id"})

# 查看日志
hf_jobs("logs", {"job_id": "your-job-id"})

记住： 等待用户请求状态检查。避免重复轮询。

在启动 GPU 训练之前验证数据集格式，以防止训练失败的首要原因：格式不匹配。

为什么需要验证

超过 50% 的训练失败是由于数据集格式问题
DPO 尤其严格：需要确切的列名（prompt、chosen、rejected）
失败的 GPU 任务浪费 $1-10 和 30-60 分钟
在 CPU 上验证成本约 $0.01，耗时 <1 分钟

始终验证以下情况：

未知或自定义数据集
DPO 训练（关键 - 90% 的数据集需要映射）
任何未明确标记为 TRL 兼容的数据集

对于已知的 TRL 数据集可以跳过验证：

trl-lib/ultrachat_200k、trl-lib/Capybara、HuggingFaceH4/ultrachat_200k 等

hf_jobs("uv", {
    "script": "https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py",
    "script_args": ["--dataset", "username/dataset-name", "--split", "train"]
})

该脚本速度很快，通常会同步完成。

输出显示每种训练方法的兼容性：

✓ READY - 数据集兼容，可直接使用
✗ NEEDS MAPPING - 兼容但需要预处理（提供映射代码）
✗ INCOMPATIBLE - 不能用于此方法

当需要映射时，输出包含一个 "MAPPING CODE" 部分，提供可直接复制粘贴的 Python 代码。

# 1. 检查数据集（成本约 $0.01，在 CPU 上 <1 分钟）
hf_jobs("uv", {
    "script": "https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py",
    "script_args": ["--dataset", "argilla/distilabel-math-preference-dpo", "--split", "train"]
})

# 2. 检查输出标记：
#    ✓ READY → 继续训练
#    ✗ NEEDS MAPPING → 应用下面的映射代码
#    ✗ INCOMPATIBLE → 选择不同的方法/数据集

# 3. 如果需要映射，在训练前应用：
def format_for_dpo(example):
    return {
        'prompt': example['instruction'],
        'chosen': example['chosen_response'],
        'rejected': example['rejected_response'],
    }
dataset = dataset.map(format_for_dpo, remove_columns=dataset.column_names)

# 4. 有信心地启动训练任务

常见场景：DPO 格式不匹配

大多数 DPO 数据集使用非标准列名。示例：

Dataset has: instruction, chosen_response, rejected_response
DPO expects: prompt, chosen, rejected

验证器会检测到这一点并提供确切的映射代码来修复它。

将模型转换为 GGUF

训练后，将模型转换为 GGUF 格式，以便与 llama.cpp、Ollama、LM Studio 和其他本地推理工具一起使用。

什么是 GGUF：

针对 llama.cpp 的 CPU/GPU 推理进行了优化
支持量化（4 位、5 位、8 位）以减少模型大小
兼容 Ollama、LM Studio、Jan、GPT4All、llama.cpp
对于 7B 模型，通常为 2-8GB（相对于未量化的 14GB）

使用 Ollama 或 LM Studio 在本地运行模型
通过量化减少模型大小
部署到边缘设备
分享模型用于本地优先使用

参见： references/gguf_conversion.md 了解完整的转换指南，包括生产就绪的转换脚本、量化选项、硬件要求、使用示例和故障排除。

hf_jobs("uv", {
    "script": "<see references/gguf_conversion.md for complete script>",
    "flavor": "a10g-large",
    "timeout": "45m",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"},
    "env": {
        "ADAPTER_MODEL": "username/my-finetuned-model",
        "BASE_MODEL": "Qwen/Qwen2.5-0.5B",
        "OUTPUT_REPO": "username/my-model-gguf"
    }
})

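See references/training_patterns.md for detailed examples, including:

Quick demo (5-10 minutes)
Production training with checkpoints
Multi-GPU training
DPO training (preference learning)
GRPO training (online RL)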

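Out of Memory (OOM)

Fixes (try in order):

Reduce batch size: per_device_train_batch_size=1 and raise gradient_accumulation_steps=8. Effective batch size is per_device_train_batch_size x gradient_accumulation_steps. For best performance, keep the effective batch size close to 128.
Enable: gradient_checkpointing=True
Upgrade hardware: t4-small → l4x1, a10g-small → a10g-large, etc.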
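The batch-size arithmetic above can be sketched as a small helper. This is only an illustration: `accumulation_steps` is a hypothetical name, not part of TRL, and it assumes a single device, matching the formula per_device_train_batch_size x gradient_accumulation_steps.

```python
def accumulation_steps(target_effective_batch: int, per_device_batch: int) -> int:
    """Smallest gradient_accumulation_steps such that
    per_device_batch * steps >= target_effective_batch (single device assumed)."""
    if target_effective_batch < 1 or per_device_batch < 1:
        raise ValueError("both arguments must be positive integers")
    # Ceiling division without floating point.
    return -(-target_effective_batch // per_device_batch)


if __name__ == "__main__":
    # Shrinking the per-device batch to 1 while keeping an effective batch of 8.
    print(accumulation_steps(8, 1))
    # Reaching an effective batch of 128 with 4 samples per device.
    print(accumulation_steps(128, 4))
```

Lowering the per-device batch reduces peak memory, while the accumulation steps keep the effective batch size (and therefore the optimization behavior) roughly unchanged.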

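Dataset Format Errors

Validate first with the dataset inspector:

uv run https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py \
  --dataset name --split train

Check the output for compatibility markers (✓ READY, ✗ NEEDS MAPPING, ✗ INCOMPATIBLE)
Apply the mapping code from the inspector output if needed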

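Job Timeout

Check the logs for the actual runtime: hf_jobs("logs", {"job_id": "..."})
Increase the timeout with a buffer: "timeout": "3h" (add 30% to the estimated time)
Or reduce training: lower num_train_epochs, use a smaller dataset, or set max_steps
Save checkpoints: save_strategy="steps", save_steps=500, hub_strategy="every_save"

Note: The default 30 minutes is not enough for real training. Use at least 1-2 hours.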

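Hub Push Failures

Add to the job: secrets={"HF_TOKEN": "$HF_TOKEN"}
Add to the config: push_to_hub=True, hub_model_id="username/model-name"
Verify authentication: mcp__huggingface__hf_whoami()
Check that the token has write permissions and that the repo exists (or set hub_private_repo=True)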

Missing Dependencies

Fix: Add the missing package to the PEP 723 header:

# /// script
# dependencies = ["trl>=0.12.0", "peft>=0.7.0", "trackio", "missing-package"]
# ///

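Job times out → Increase the timeout, reduce epochs/dataset size, use a smaller model or LoRA
Model not saved to Hub → Check push_to_hub=True, hub_model_id, secrets=HF_TOKEN
Out of memory (OOM) → Reduce batch size, increase gradient accumulation, enable LoRA, use a larger GPU
Dataset format error → Validate with the dataset inspector (see the Dataset Validation section)
Import/module errors → Add a PEP 723 header with dependencies and verify its format
Authentication errors → Check mcp__huggingface__hf_whoami(), token permissions, and the secrets parameter

See: references/troubleshooting.md for the complete troubleshooting guide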

References (in this skill)

references/training_methods.md - Overview of SFT, DPO, GRPO, KTO, PPO, and reward modeling
references/training_patterns.md - Common training patterns and examples
references/unsloth.md - Unsloth for fast VLM training (~2x speed, 60% less VRAM)
references/gguf_conversion.md - Complete GGUF conversion guide
references/trackio_guide.md - Trackio monitoring setup
references/hardware_guide.md - Hardware specifications and selection
references/hub_saving.md - Hub authentication troubleshooting
references/troubleshooting.md - Common issues and solutions
references/local_training_macos.md - Local training on macOS

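Scripts (in this skill)

scripts/train_sft_example.py - Production SFT template
scripts/train_dpo_example.py - Production DPO template
scripts/train_grpo_example.py - Production GRPO template
scripts/unsloth_sft_example.py - Unsloth text LLM training template (faster, less VRAM)
scripts/estimate_cost.py - Estimate time and cost (offer when appropriate)
scripts/convert_to_gguf.py - Complete GGUF conversion script

Dataset inspector - Validates dataset format before training (use via uv run or hf_jobs)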

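Submit scripts inline - The script parameter accepts Python code directly; no file saving is needed unless the user requests it
Jobs are asynchronous - Don't wait or poll; let the user check when ready
Always set a timeout - The default 30 minutes is not enough; at least 1-2 hours is recommended
Always enable Hub push - The environment is ephemeral; without a push, all results are lost
Include Trackio - Use the example scripts as templates for real-time monitoring
Offer cost estimation - When parameters are known, use scripts/estimate_cost.py
Use UV scripts (Approach 1) - Default to hf_jobs("uv", {...}) with inline scripts; use TRL maintained scripts for standard training; avoid bash trl-jobs commands in Claude Code
Use hf_doc_fetch/hf_doc_search for the latest TRL documentation
Validate dataset format before training with the dataset inspector (see the Dataset Validation section)
Choose hardware appropriate to the model size; use LoRA for models >7B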

January 20, 2026

🇺🇸English

TRL Training on Hugging Face Jobs

Overview

Train language models using TRL (Transformer Reinforcement Learning) on fully managed Hugging Face infrastructure. No local GPU setup required—models train on cloud GPUs and results are automatically saved to the Hugging Face Hub.

TRL provides multiple training methods:

SFT (Supervised Fine-Tuning) - Standard instruction tuning
DPO (Direct Preference Optimization) - Alignment from preference data
GRPO (Group Relative Policy Optimization) - Online RL training
Reward Modeling - Train reward models for RLHF

For detailed TRL method documentation:

hf_doc_search("your query", product="trl")
hf_doc_fetch("https://huggingface.co/docs/trl/sft_trainer")  # SFT
hf_doc_fetch("https://huggingface.co/docs/trl/dpo_trainer")  # DPO
# etc.

See also: references/training_methods.md for method overviews and selection guidance

When to Use This Skill

Use this skill when users want to:

Fine-tune language models on cloud GPUs without local infrastructure
Train with TRL methods (SFT, DPO, GRPO, etc.)
Run training jobs on Hugging Face Jobs infrastructure
Convert trained models to GGUF for local deployment (Ollama, LM Studio, llama.cpp)
Ensure trained models are permanently saved to the Hub
Use modern workflows with optimized defaults

When to Use Unsloth

Use Unsloth (references/unsloth.md) instead of standard TRL when:

Limited GPU memory - Unsloth uses ~60% less VRAM
Speed matters - Unsloth is ~2x faster
Training large models ( >13B) - memory efficiency is critical
Training Vision-Language Models (VLMs) - Unsloth has FastVisionModel support

See references/unsloth.md for complete Unsloth documentation and scripts/unsloth_sft_example.py for a production-ready training script.

Key Directives

When assisting with training jobs:

ALWAYS use the hf_jobs() MCP tool - Submit jobs using hf_jobs("uv", {...}), NOT bash trl-jobs commands. The script parameter accepts Python code directly. Do NOT save to local files unless the user explicitly requests it. Pass the script content as a string to hf_jobs(). If the user asks to "train a model", "fine-tune", or makes a similar request, you MUST create the training script AND submit the job immediately using hf_jobs().
Always include Trackio - Every training script should include Trackio for real-time monitoring. Use example scripts in scripts/ as templates.
Provide job details after submission - After submitting, provide job ID, monitoring URL, estimated time, and note that the user can request status checks later.
Reference the example scripts in scripts/ as starting points.

Local Script Execution

Repository scripts use PEP 723 inline dependencies. Run them with uv run:

uv run scripts/estimate_cost.py --help
uv run scripts/dataset_inspector.py --help

Prerequisites Checklist

Before starting any training job, verify:

✅ Account & Authentication

Hugging Face Account with Pro, Team, or Enterprise plan (Jobs require paid plan)
Authenticated login: Check with hf_whoami()
HF_TOKEN for Hub Push ⚠️ CRITICAL - Training environment is ephemeral, must push to Hub or ALL training results are lost
Token must have write permissions
MUST pass secrets={"HF_TOKEN": "$HF_TOKEN"} in job config to make the token available (the $HF_TOKEN syntax references your actual token value)

✅ Dataset Requirements

Dataset must exist on Hub or be loadable via datasets.load_dataset()
Format must match training method (SFT: "messages"/text/prompt-completion; DPO: chosen/rejected; GRPO: prompt-only)
ALWAYS validate unknown datasets before GPU training to prevent format failures (see Dataset Validation section below)
Size appropriate for hardware (Demo: 50-100 examples on t4-small; Production: 1K-10K+ on a10g-large/a100-large)

⚠️ Critical Settings

Timeout must exceed expected training time - Default 30min is TOO SHORT for most training. Minimum recommended: 1-2 hours. Job fails and loses all progress if timeout is exceeded.
Hub push must be enabled - Config: push_to_hub=True, hub_model_id="username/model-name"; Job: secrets={"HF_TOKEN": "$HF_TOKEN"}

Asynchronous Job Guidelines

⚠️ IMPORTANT: Training jobs run asynchronously and can take hours

Action Required

When user requests training:

Create the training script with Trackio included (use scripts/train_sft_example.py as template)
Submit immediately using hf_jobs() MCP tool with script content inline - don't save to file unless user requests
Report submission with job ID, monitoring URL, and estimated time
Wait for user to request status checks - don't poll automatically

Ground Rules

Jobs run in background - Submission returns immediately; training continues independently
Initial logs delayed - Can take 30-60 seconds for logs to appear
User checks status - Wait for user to request status updates
Avoid polling - Check logs only on user request; provide monitoring links instead

After Submission

Provide to user:

✅ Job ID and monitoring URL
✅ Expected completion time
✅ Trackio dashboard URL
✅ Note that user can request status checks later

Example Response:

✅ Job submitted successfully!

Job ID: abc123xyz
Monitor: https://huggingface.co/jobs/username/abc123xyz

Expected time: ~2 hours
Estimated cost: ~$10

The job is running in the background. Ask me to check status/logs when ready!

Quick Start: Four Approaches

💡 Tip for Demos: For quick demos on smaller GPUs (t4-small), omit eval_dataset and eval_strategy to save ~40% memory. You'll still see training loss and learning progress.

Sequence Length Configuration

TRL config classes use max_length (not max_seq_length) to control tokenized sequence length:

# ✅ CORRECT - If you need to set sequence length
SFTConfig(max_length=512)   # Truncate sequences to 512 tokens
DPOConfig(max_length=2048)  # Longer context (2048 tokens)

# ❌ WRONG - This parameter doesn't exist
SFTConfig(max_seq_length=512)  # TypeError!

Default behavior: max_length=1024 (truncates from right). This works well for most training.

When to override:

Longer context: Set higher (e.g., max_length=2048)
Memory constraints: Set lower (e.g., max_length=512)
Vision models: Set max_length=None (prevents cutting image tokens)

Usually you don't need to set this parameter at all - the examples below use the sensible default.

Approach 1: UV Scripts (Recommended—Default Choice)

UV scripts use PEP 723 inline dependencies for clean, self-contained training. This is the primary approach for Claude Code.

hf_jobs("uv", {
    "script": """
# /// script
# dependencies = ["trl>=0.12.0", "peft>=0.7.0", "trackio"]
# ///

from datasets import load_dataset
from peft import LoraConfig
from trl import SFTTrainer, SFTConfig
import trackio

dataset = load_dataset("trl-lib/Capybara", split="train")

# Create train/eval split for monitoring
dataset_split = dataset.train_test_split(test_size=0.1, seed=42)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",
    train_dataset=dataset_split["train"],
    eval_dataset=dataset_split["test"],
    peft_config=LoraConfig(r=16, lora_alpha=32),
    args=SFTConfig(
        output_dir="my-model",
        push_to_hub=True,
        hub_model_id="username/my-model",
        num_train_epochs=3,
        eval_strategy="steps",
        eval_steps=50,
        report_to="trackio",
        project="meaningful_project_name",  # Trackio project that groups related runs
        run_name="meaningful_run_name",     # descriptive name for this specific training run (Trackio)
    )
)

trainer.train()
trainer.push_to_hub()
""",
    "flavor": "a10g-large",
    "timeout": "2h",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}
})

Benefits: Direct MCP tool usage, clean code, dependencies declared inline (PEP 723), no file saving required, full control

When to use: Default choice for all training tasks in Claude Code, custom training logic, any scenario requiring hf_jobs()
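For scripts assembled programmatically, the PEP 723 header can be generated from a dependency list. The sketch below is illustrative only: `pep723_header` is a hypothetical helper, and it emits just the `dependencies` field of the inline metadata block.

```python
def pep723_header(dependencies):
    """Build a PEP 723 inline-metadata block listing the given dependencies."""
    lines = ["# /// script", "# dependencies = ["]
    # One requirement per line; a trailing comma is valid in a TOML array.
    lines += [f'#   "{dep}",' for dep in dependencies]
    lines += ["# ]", "# ///"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    header = pep723_header(["trl>=0.12.0", "peft>=0.7.0", "trackio"])
    training_code = "print('training code goes here')\n"
    # The combined string is what would be passed as the inline "script" value.
    print(header + "\n" + training_code)
```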

Working with Scripts

⚠️ Important: The script parameter accepts either inline code (as shown above) OR a URL. Local file paths do NOT work.

Why local paths don't work: Jobs run in isolated Docker containers without access to your local filesystem. Scripts must be:

Inline code (recommended for custom training)
Publicly accessible URLs
Private repo URLs (with HF_TOKEN)

Common mistakes:

# ❌ These will all fail
hf_jobs("uv", {"script": "train.py"})
hf_jobs("uv", {"script": "./scripts/train.py"})
hf_jobs("uv", {"script": "/path/to/train.py"})

Correct approaches:

# ✅ Inline code (recommended)
hf_jobs("uv", {"script": "# /// script\n# dependencies = [...]\n# ///\n\n<your code>"})

# ✅ From Hugging Face Hub
hf_jobs("uv", {"script": "https://huggingface.co/user/repo/resolve/main/train.py"})

# ✅ From GitHub
hf_jobs("uv", {"script": "https://raw.githubusercontent.com/user/repo/main/train.py"})

# ✅ From Gist
hf_jobs("uv", {"script": "https://gist.githubusercontent.com/user/id/raw/train.py"})
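A quick pre-submission check can catch local paths before a job is wasted. The following is a rough heuristic sketch, not how hf_jobs() itself decides: `classify_script` is a hypothetical helper that treats multi-line strings as inline code and single-line non-URL strings as local paths.

```python
def classify_script(script: str) -> str:
    """Return "url", "inline", or "local_path" for a candidate script value."""
    text = script.strip()
    if "\n" in text:
        # Real training scripts span several lines, so treat these as inline code.
        return "inline"
    if text.startswith(("http://", "https://")):
        return "url"
    # A single line that is not a URL is most likely a file path, which will fail.
    return "local_path"


if __name__ == "__main__":
    for candidate in ["train.py", "https://example.com/train.py", "import trl\nprint(trl)"]:
        print(repr(candidate), "->", classify_script(candidate))
```

The heuristic misclassifies one-line inline snippets as paths, which is acceptable for a guard whose only job is to flag likely mistakes.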

To use local scripts: Upload to HF Hub first:

hf repos create my-training-scripts --type model
hf upload my-training-scripts ./train.py train.py
# Use: https://huggingface.co/USERNAME/my-training-scripts/resolve/main/train.py

Approach 2: TRL Maintained Scripts (Official Examples)

TRL provides battle-tested scripts for all methods. Can be run from URLs:

hf_jobs("uv", {
    "script": "https://raw.githubusercontent.com/huggingface/trl/main/trl/scripts/sft.py",
    "script_args": [
        "--model_name_or_path", "Qwen/Qwen2.5-0.5B",
        "--dataset_name", "trl-lib/Capybara",
        "--output_dir", "my-model",
        "--push_to_hub",
        "--hub_model_id", "username/my-model"
    ],
    "flavor": "a10g-large",
    "timeout": "2h",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}
})

Benefits: No code to write, maintained by TRL team, production-tested

When to use: Standard TRL training, quick experiments, don't need custom code

Available: Scripts are available from https://github.com/huggingface/trl/tree/main/examples/scripts
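A script URL should return the raw file. A GitHub "blob" page URL serves an HTML page around the code, so it is likely to fail as a script source. The sketch below converts a blob URL to its raw.githubusercontent.com form; `to_raw_github_url` is a hypothetical helper and assumes the branch or tag name contains no slash.

```python
from urllib.parse import urlsplit


def to_raw_github_url(url: str) -> str:
    """Rewrite github.com/<owner>/<repo>/blob/<ref>/<path> to its raw form."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Expected layout: owner / repo / "blob" / ref / one or more path segments.
    if parts.netloc == "github.com" and len(segments) >= 5 and segments[2] == "blob":
        owner, repo, _, ref, *path = segments
        return "https://raw.githubusercontent.com/" + "/".join([owner, repo, ref, *path])
    return url  # anything else is returned unchanged


if __name__ == "__main__":
    print(to_raw_github_url("https://github.com/huggingface/trl/blob/main/trl/scripts/sft.py"))
```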

Finding More UV Scripts on Hub

The uv-scripts organization provides ready-to-use UV scripts stored as datasets on Hugging Face Hub:

# Discover available UV script collections
dataset_search({"author": "uv-scripts", "sort": "downloads", "limit": 20})

# Explore a specific collection
hub_repo_details(["uv-scripts/classification"], repo_type="dataset", include_readme=True)

Popular collections: ocr, classification, synthetic-data, vllm, dataset-creation

Approach 3: HF Jobs CLI (Direct Terminal Commands)

When the hf_jobs() MCP tool is unavailable, use the hf jobs CLI directly.

⚠️ CRITICAL: CLI Syntax Rules

# ✅ CORRECT syntax - flags BEFORE script URL
hf jobs uv run --flavor a10g-large --timeout 2h --secrets HF_TOKEN "https://example.com/train.py"

# ❌ WRONG - "run uv" instead of "uv run"
hf jobs run uv "https://example.com/train.py" --flavor a10g-large

# ❌ WRONG - flags AFTER script URL (will be ignored!)
hf jobs uv run "https://example.com/train.py" --flavor a10g-large

# ❌ WRONG - "--secret" instead of "--secrets" (plural)
hf jobs uv run --secret HF_TOKEN "https://example.com/train.py"

Key syntax rules:

Command order is hf jobs uv run (NOT hf jobs run uv)
All flags (--flavor, --timeout, --secrets) must come BEFORE the script URL
Use --secrets (plural), not --secret
Script URL must be the last positional argument

Complete CLI example:

hf jobs uv run \
  --flavor a10g-large \
  --timeout 2h \
  --secrets HF_TOKEN \
  "https://huggingface.co/user/repo/resolve/main/train.py"

Check job status via CLI:

hf jobs ps                        # List all jobs
hf jobs logs <job-id>             # View logs
hf jobs inspect <job-id>          # Job details
hf jobs cancel <job-id>           # Cancel a job

Approach 4: TRL Jobs Package (Simplified Training)

The trl-jobs package provides optimized defaults and one-liner training.

uvx trl-jobs sft \
  --model_name Qwen/Qwen2.5-0.5B \
  --dataset_name trl-lib/Capybara

Benefits: Pre-configured settings, automatic Trackio integration, automatic Hub push, one-line commands

When to use: User working in terminal directly (not Claude Code context), quick local experimentation

Repository: https://github.com/huggingface/trl-jobs

⚠️ In Claude Code context, prefer using the hf_jobs() MCP tool (Approach 1) when available.

Hardware Selection

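Model Size	Recommended Hardware	Cost (approx/hr)	Use Case
<1B params	`t4-small`	~$0.75	Demos, quick tests only without eval steps
1-3B params	`t4-medium`, `l4x1`	~$1.50-2.50	Development
3-7B params	`a10g-small`, `a10g-large`	~$3.50-5.00	Production training
7-13B params	`a10g-large`, `a100-large`	~$5-10	Large models (use LoRA)
13B+ params	`a100-large`, `a10g-largex2`	~$10-20	Very large models (use LoRA)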

GPU Flavors: cpu-basic/upgrade/performance/xl, t4-small/medium, l4x1/x4, a10g-small/large/largex2/largex4, a100-large, h100/h100x8

Guidelines:

Use LoRA/PEFT for models >7B to reduce memory
Multi-GPU automatically handled by TRL/Accelerate
Start with smaller hardware for testing
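The size-to-hardware guidance can be expressed as a simple lookup. This is a sketch under assumptions: `suggest_flavor` is a hypothetical helper, it returns only the first flavor recommended for each size tier, and the handling of values that fall exactly on a tier boundary is a choice made here, not something the guidance specifies.

```python
# (upper bound in billions of parameters, first suggested flavor for that tier)
_TIERS = [
    (1.0, "t4-small"),     # <1B: demos and quick tests
    (3.0, "t4-medium"),    # 1-3B: development
    (7.0, "a10g-small"),   # 3-7B: production training
    (13.0, "a10g-large"),  # 7-13B: large models, use LoRA
]


def suggest_flavor(params_billion: float) -> str:
    """Pick a starting GPU flavor for a model of the given size."""
    if params_billion <= 0:
        raise ValueError("model size must be positive")
    if params_billion < 1.0:
        return "t4-small"
    for upper_bound, flavor in _TIERS[1:]:
        # Boundary values (3, 7) stay in the smaller tier; 13 and above fall through.
        if params_billion < upper_bound or (upper_bound < 13.0 and params_billion == upper_bound):
            return flavor
    return "a100-large"  # 13B+: very large models, use LoRA


if __name__ == "__main__":
    for size in (0.5, 3, 8, 70):
        print(f"{size}B -> {suggest_flavor(size)}")
```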

See: references/hardware_guide.md for detailed specifications

Critical: Saving Results to Hub

⚠️ EPHEMERAL ENVIRONMENT—MUST PUSH TO HUB

The Jobs environment is temporary. All files are deleted when the job ends. If the model isn't pushed to Hub, ALL TRAINING IS LOST.

Required Configuration

In training script/config:

SFTConfig(
    push_to_hub=True,
    hub_model_id="username/model-name",  # MUST specify
    hub_strategy="every_save",  # Optional: push checkpoints
)

In job submission:

{
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}  # Enables authentication
}

Verification Checklist

Before submitting:

push_to_hub=True set in config
hub_model_id includes username/repo-name
secrets parameter includes HF_TOKEN
User has write access to target repo

See: references/hub_saving.md for detailed troubleshooting

Timeout Management

⚠️ DEFAULT: 30 MINUTES—TOO SHORT FOR TRAINING

Setting Timeouts

{
    "timeout": "2h"   # 2 hours (formats: "90m", "2h", "1.5h", or seconds as integer)
}
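To compare timeouts written in different formats, it can help to normalize them to seconds. The sketch below interprets only the formats listed above ("90m", "2h", "1.5h", or integer seconds); `timeout_to_seconds` is a hypothetical helper and does not claim to mirror how the Jobs service parses the value.

```python
import re

# Seconds per unit for the suffixes shown in the examples above.
_SECONDS_PER_UNIT = {"m": 60, "h": 3600}


def timeout_to_seconds(value):
    """Convert a timeout such as "90m", "1.5h", or 7200 into whole seconds."""
    if isinstance(value, int) and not isinstance(value, bool):
        return value  # plain integers are already seconds
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([mh])", str(value).strip())
    if match is None:
        raise ValueError(f"unsupported timeout format: {value!r}")
    amount, unit = match.groups()
    return round(float(amount) * _SECONDS_PER_UNIT[unit])


if __name__ == "__main__":
    for raw in ("90m", "2h", "1.5h", 3600):
        print(raw, "->", timeout_to_seconds(raw), "seconds")
```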

Timeout Guidelines

Scenario	Recommended	Notes
Quick demo (50-100 examples)	10-30 min	Verify setup
Development training	1-2 hours	Small datasets
Production (3-7B model)	4-6 hours	Full datasets
Large model with LoRA	3-6 hours	Depends on dataset

Always add 20-30% buffer for model/dataset loading, checkpoint saving, Hub push operations, and network delays.

On timeout: Job killed immediately, all unsaved progress lost, must restart from beginning
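The buffer rule can be applied mechanically once a runtime estimate exists. A minimal sketch, assuming the estimate is in whole minutes; `recommended_timeout` is a hypothetical helper, and the 30% default is the upper end of the 20-30% range suggested above.

```python
def recommended_timeout(estimated_minutes: int, buffer_percent: int = 30) -> str:
    """Pad an estimated runtime by a percentage and format it as "<N>m"."""
    if estimated_minutes <= 0 or buffer_percent < 0:
        raise ValueError("estimate must be positive and buffer non-negative")
    padded = estimated_minutes * (100 + buffer_percent)
    # Integer ceiling division so a partial minute always rounds up.
    total_minutes = -(-padded // 100)
    return f"{total_minutes}m"


if __name__ == "__main__":
    # A two-hour estimate with the default 30% buffer.
    print(recommended_timeout(120))
    # A 45-minute estimate with a 20% buffer.
    print(recommended_timeout(45, buffer_percent=20))
```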

Cost Estimation

Offer to estimate cost when planning jobs with known parameters. Use scripts/estimate_cost.py:

uv run scripts/estimate_cost.py \
  --model meta-llama/Llama-2-7b-hf \
  --dataset trl-lib/Capybara \
  --hardware a10g-large \
  --dataset-size 16000 \
  --epochs 3

Output includes estimated time, cost, recommended timeout (with buffer), and optimization suggestions.

When to offer: User planning a job, asks about cost/time, choosing hardware, job will run >1 hour or cost >$5
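When scripts/estimate_cost.py is not at hand, a back-of-the-envelope figure is hours times the hourly rate. The sketch below is not that script's logic: `estimate_cost_usd` is a hypothetical helper, and the rate used in the example is an approximate figure taken from the hardware table, not a quoted price.

```python
def estimate_cost_usd(hours: float, hourly_rate_usd: float) -> float:
    """Rough job cost: runtime in hours multiplied by the flavor's hourly rate."""
    if hours < 0 or hourly_rate_usd < 0:
        raise ValueError("hours and rate must be non-negative")
    return round(hours * hourly_rate_usd, 2)


if __name__ == "__main__":
    # Assumed rate: roughly $5/hour for a10g-large (approximate, see hardware table).
    cost = estimate_cost_usd(hours=2, hourly_rate_usd=5.0)
    print(f"Estimated cost: ~${cost:.2f}")
```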

Example Training Scripts

Production-ready templates with all best practices:

Load these scripts to get the setup right:

scripts/train_sft_example.py - Complete SFT training with Trackio, LoRA, checkpoints
scripts/train_dpo_example.py - DPO training for preference learning
scripts/train_grpo_example.py - GRPO training for online RL

These scripts demonstrate proper Hub saving, Trackio integration, checkpoint management, and optimized parameters. Pass their content inline to hf_jobs() or use as templates for custom scripts.

Monitoring and Tracking

Trackio provides real-time metrics visualization. See references/trackio_guide.md for complete setup guide.

Key points:

Add trackio to dependencies
Configure trainer with report_to="trackio" and run_name="meaningful_name"

Trackio Configuration Defaults

Use sensible defaults unless user specifies otherwise. When generating training scripts with Trackio:

Default Configuration:

Space ID: {username}/trackio (use "trackio" as default space name)
Run naming: Unless otherwise specified, name the run in a way the user will recognize (e.g., descriptive of the task, model, or purpose)
Config: Keep minimal - only include hyperparameters and model/dataset info
Project Name: Use a Project Name to associate runs with a particular Project

User overrides: If user requests specific trackio configuration (custom space, run naming, grouping, or additional config), apply their preferences instead of defaults.

This is useful for managing multiple jobs with the same configuration or keeping training scripts portable.

See references/trackio_guide.md for complete documentation including grouping runs for experiments.

Check Job Status

# List all jobs
hf_jobs("ps")

# Inspect specific job
hf_jobs("inspect", {"job_id": "your-job-id"})

# View logs
hf_jobs("logs", {"job_id": "your-job-id"})

Remember: Wait for user to request status checks. Avoid polling repeatedly.

Dataset Validation

Validate dataset format BEFORE launching GPU training to prevent the #1 cause of training failures: format mismatches.

Why Validate

50%+ of training failures are due to dataset format issues
DPO especially strict: requires exact column names (prompt, chosen, rejected)
Failed GPU jobs waste $1-10 and 30-60 minutes
Validation on CPU costs ~$0.01 and takes <1 minute

When to Validate

ALWAYS validate for:

Unknown or custom datasets
DPO training (CRITICAL - 90% of datasets need mapping)
Any dataset not explicitly TRL-compatible

Skip validation for known TRL datasets:

trl-lib/ultrachat_200k, trl-lib/Capybara, HuggingFaceH4/ultrachat_200k, etc.

Usage

hf_jobs("uv", {
    "script": "https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py",
    "script_args": ["--dataset", "username/dataset-name", "--split", "train"]
})

The script is fast, and will usually complete synchronously.

Reading Results

The output shows compatibility for each training method:

✓ READY - Dataset is compatible, use directly
✗ NEEDS MAPPING - Compatible but needs preprocessing (mapping code provided)
✗ INCOMPATIBLE - Cannot be used for this method

When mapping is needed, the output includes a "MAPPING CODE" section with copy-paste ready Python code.

Example Workflow

# 1. Inspect dataset (costs ~$0.01, <1 min on CPU)
hf_jobs("uv", {
    "script": "https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py",
    "script_args": ["--dataset", "argilla/distilabel-math-preference-dpo", "--split", "train"]
})

# 2. Check output markers:
#    ✓ READY → proceed with training
#    ✗ NEEDS MAPPING → apply mapping code below
#    ✗ INCOMPATIBLE → choose different method/dataset

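# 3. If mapping needed, apply before training:
from datasets import load_dataset

dataset = load_dataset("argilla/distilabel-math-preference-dpo", split="train")

def format_for_dpo(example):
    # Rename the dataset's columns to the names DPO expects
    return {
        'prompt': example['instruction'],
        'chosen': example['chosen_response'],
        'rejected': example['rejected_response'],
    }

# Drop the original columns so only prompt/chosen/rejected remain
dataset = dataset.map(format_for_dpo, remove_columns=dataset.column_names)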

# 4. Launch training job with confidence

Common Scenario: DPO Format Mismatch

Most DPO datasets use non-standard column names. Example:

Dataset has: instruction, chosen_response, rejected_response
DPO expects: prompt, chosen, rejected

The validator detects this and provides exact mapping code to fix it.
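The mismatch described here can also be checked locally from a dataset's column names alone. This is a minimal sketch in plain Python, not the inspector's actual logic. The helper names and the candidate table (including the "question" alias) are hypothetical.

```python
DPO_COLUMNS = ("prompt", "chosen", "rejected")


def missing_dpo_columns(column_names):
    """Return the DPO columns that are absent from a dataset's column list."""
    present = set(column_names)
    return [name for name in DPO_COLUMNS if name not in present]


def build_rename_map(column_names, candidates):
    """Map existing columns to DPO names using a caller-supplied candidate table.

    `candidates` maps each DPO column to source-column names to try, in order.
    """
    present = set(column_names)
    rename = {}
    for target, options in candidates.items():
        if target in present:
            continue  # already has the expected name
        for option in options:
            if option in present:
                rename[option] = target
                break
    return rename


columns = ["instruction", "chosen_response", "rejected_response"]
candidates = {
    "prompt": ["instruction", "question"],  # hypothetical aliases
    "chosen": ["chosen_response"],
    "rejected": ["rejected_response"],
}
print(missing_dpo_columns(columns))  # ['prompt', 'chosen', 'rejected']
print(build_rename_map(columns, candidates))
```

The resulting map could then be applied with the datasets library's rename_columns method. The inspector's generated mapping code remains the authoritative fix.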

Converting Models to GGUF

After training, convert models to GGUF format for use with llama.cpp, Ollama, LM Studio, and other local inference tools.

What is GGUF:

Optimized for CPU/GPU inference with llama.cpp
Supports quantization (4-bit, 5-bit, 8-bit) to reduce model size
Compatible with Ollama, LM Studio, Jan, GPT4All, llama.cpp
Typically 2-8GB for 7B models (vs 14GB unquantized)
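The size figures above follow from simple arithmetic: weight storage is roughly parameter count times bits per weight, divided by 8. The sketch below is a rough estimate only. It ignores file metadata and assumes every weight uses the same bit width, which real GGUF quantization schemes do not guarantee.

```python
def approx_model_size_gb(num_params, bits_per_weight):
    """Approximate weight storage in GB (10^9 bytes): params * bits / 8."""
    return num_params * bits_per_weight / 8 / 1e9


params_7b = 7_000_000_000
for label, bits in [("16-bit (unquantized)", 16), ("8-bit", 8), ("5-bit", 5), ("4-bit", 4)]:
    print(f"{label}: ~{approx_model_size_gb(params_7b, bits):.1f} GB")
```

For a 7B model this gives about 14 GB at 16 bits and about 3.5 GB at 4 bits, consistent with the range quoted above.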

When to convert:

Running models locally with Ollama or LM Studio
Reducing model size with quantization
Deploying to edge devices
Sharing models for local-first use

See: references/gguf_conversion.md for complete conversion guide, including production-ready conversion script, quantization options, hardware requirements, usage examples, and troubleshooting.

Quick conversion:

hf_jobs("uv", {
    "script": "<see references/gguf_conversion.md for complete script>",
    "flavor": "a10g-large",
    "timeout": "45m",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"},
    "env": {
        "ADAPTER_MODEL": "username/my-finetuned-model",
        "BASE_MODEL": "Qwen/Qwen2.5-0.5B",
        "OUTPUT_REPO": "username/my-model-gguf"
    }
})

Common Training Patterns

See references/training_patterns.md for detailed examples including:

Quick demo (5-10 minutes)
Production with checkpoints
Multi-GPU training
DPO training (preference learning)
GRPO training (online RL)

Common Failure Modes

Out of Memory (OOM)

Fix (try in order):

Reduce batch size: set per_device_train_batch_size=1 and raise gradient_accumulation_steps (e.g., to 8) to compensate. Effective batch size is per_device_train_batch_size × gradient_accumulation_steps × number of GPUs. For best performance keep effective batch size close to 128.
Enable: gradient_checkpointing=True
Upgrade hardware: t4-small → l4x1, a10g-small → a10g-large etc.
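The batch-size trade-off in the first fix can be worked through numerically. This is a small sketch of the arithmetic only. The function names are hypothetical, and the num_gpus factor is included on the assumption that each GPU processes its own per-device batch.

```python
import math


def effective_batch(per_device_batch, grad_accum_steps, num_gpus=1):
    """Samples contributing to each optimizer step."""
    return per_device_batch * grad_accum_steps * num_gpus


def accumulation_steps(target_effective_batch, per_device_batch, num_gpus=1):
    """Gradient accumulation steps needed to reach a target effective batch size."""
    per_step = per_device_batch * num_gpus
    return math.ceil(target_effective_batch / per_step)


print(effective_batch(1, 8))          # 8
print(accumulation_steps(128, 1))     # 128
print(accumulation_steps(128, 4, 2))  # 16
```

Lowering the per-device batch reduces peak memory, while raising the accumulation steps keeps the effective batch size, and therefore the optimization behavior, roughly unchanged.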

Dataset Misformatted

Fix:

Validate first with dataset inspector:

uv run https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py \
  --dataset name --split train

Check output for compatibility markers (✓ READY, ✗ NEEDS MAPPING, ✗ INCOMPATIBLE)
Apply mapping code from inspector output if needed

Job Timeout

Fix:

Check logs for actual runtime: hf_jobs("logs", {"job_id": "..."})
Increase timeout with buffer: "timeout": "3h" (add 30% to estimated time)
Or reduce training: lower num_train_epochs, use a smaller dataset, or cap the run with max_steps
Save checkpoints: save_strategy="steps", save_steps=500, hub_strategy="every_save"

Note: Default 30min is insufficient for real training. Minimum 1-2 hours.
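The buffer rule above can be turned into a small calculation. This is a sketch under stated assumptions: the helper name is hypothetical, the 60-minute floor reflects the minimum recommended in the note, and the output uses the same "45m" / "3h" string style that appears in this document's job examples.

```python
def timeout_with_buffer(estimated_minutes, buffer_percent=30, minimum_minutes=60):
    """Return a timeout string with headroom over the estimated runtime."""
    # Integer ceiling of estimated_minutes * (1 + buffer_percent / 100)
    buffered = (estimated_minutes * (100 + buffer_percent) + 99) // 100
    # Never go below the recommended minimum
    buffered = max(buffered, minimum_minutes)
    if buffered % 60 == 0:
        return f"{buffered // 60}h"
    return f"{buffered}m"


print(timeout_with_buffer(120))  # 156m
print(timeout_with_buffer(30))   # 1h
print(timeout_with_buffer(600))  # 13h
```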

Hub Push Failures

Fix:

Add to job: secrets={"HF_TOKEN": "$HF_TOKEN"}
Add to config: push_to_hub=True, hub_model_id="username/model-name"
Verify auth: mcp__huggingface__hf_whoami()
Check that the token has write permissions and that the target repo exists or can be created under your namespace (set hub_private_repo=True if it should be created as private)
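The first two fixes can be verified before a job is submitted. The following is a minimal sketch that inspects plain dictionaries. The helper name is hypothetical, and the hub_model_id pattern only checks for the "username/model-name" form this skill asks for. It does not contact the Hub.

```python
import re


def hub_push_problems(job_args, training_kwargs):
    """List configuration gaps that would prevent the model from reaching the Hub."""
    problems = []
    if "HF_TOKEN" not in job_args.get("secrets", {}):
        problems.append('job is missing secrets={"HF_TOKEN": "$HF_TOKEN"}')
    if training_kwargs.get("push_to_hub") is not True:
        problems.append("push_to_hub is not True")
    model_id = training_kwargs.get("hub_model_id") or ""
    # Expect exactly one slash separating namespace and model name
    if not re.fullmatch(r"[^/\s]+/[^/\s]+", model_id):
        problems.append('hub_model_id is not of the form "username/model-name"')
    return problems


job_args = {"flavor": "a10g-large", "timeout": "2h"}
training_kwargs = {"push_to_hub": True, "hub_model_id": "my-model"}
for problem in hub_push_problems(job_args, training_kwargs):
    print("-", problem)
```

With the example inputs the check reports the missing secret and the unqualified model ID. An empty list means the submission covers the required settings.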

Missing Dependencies

Fix: Add to PEP 723 header:

# /// script
# dependencies = ["trl>=0.12.0", "peft>=0.7.0", "trackio", "missing-package"]
# ///

Troubleshooting

Common issues:

Job times out → Increase timeout, reduce epochs/dataset, use smaller model/LoRA
Model not saved to Hub → Check push_to_hub=True, hub_model_id, and secrets={"HF_TOKEN": "$HF_TOKEN"} in the job configuration
Out of Memory (OOM) → Reduce batch size, increase gradient accumulation, enable LoRA, use larger GPU
Dataset format error → Validate with dataset inspector (see Dataset Validation section)
Import/module errors → Add PEP 723 header with dependencies, verify format
Authentication errors → Check mcp__huggingface__hf_whoami(), token permissions, secrets parameter

See: references/troubleshooting.md for complete troubleshooting guide

Resources

References (In This Skill)

references/training_methods.md - Overview of SFT, DPO, GRPO, KTO, PPO, Reward Modeling
references/training_patterns.md - Common training patterns and examples
references/unsloth.md - Unsloth for fast VLM training (~2x speed, 60% less VRAM)
references/gguf_conversion.md - Complete GGUF conversion guide
references/trackio_guide.md - Trackio monitoring setup
references/hardware_guide.md - Hardware specs and selection
references/hub_saving.md - Hub authentication troubleshooting
references/troubleshooting.md - Common issues and solutions
references/local_training_macos.md - Local training on macOS

Scripts (In This Skill)

scripts/train_sft_example.py - Production SFT template
scripts/train_dpo_example.py - Production DPO template
scripts/train_grpo_example.py - Production GRPO template
scripts/unsloth_sft_example.py - Unsloth text LLM training template (faster, less VRAM)
scripts/estimate_cost.py - Estimate time and cost (offer when appropriate)
scripts/convert_to_gguf.py - Complete GGUF conversion script

External Scripts

Dataset Inspector - Validate dataset format before training (use via uv run or hf_jobs)

External Links

Key Takeaways

Submit scripts inline - The script parameter accepts Python code directly; no file saving required unless user requests
Jobs are asynchronous - Don't wait/poll; let user check when ready
Always set timeout - Default 30 min is insufficient; minimum 1-2 hours recommended
Always enable Hub push - Environment is ephemeral; without push, all results lost
Include Trackio - Use example scripts as templates for real-time monitoring
Offer cost estimation - When parameters are known, use scripts/estimate_cost.py
Use UV scripts (Approach 1) - Default to hf_jobs("uv", {...}) with inline scripts; use TRL-maintained scripts for standard training; avoid bash trl-jobs commands in Claude Code
Use hf_doc_fetch/hf_doc_search for latest TRL documentation
Validate dataset format before training with dataset inspector (see Dataset Validation section)
Choose appropriate hardware for model size; use LoRA for models >7B

First Seen

Jan 20, 2026
