hugging-face-model-trainer by huggingface/skills
npx skills add https://github.com/huggingface/skills --skill hugging-face-model-trainer
Train language models using TRL (Transformer Reinforcement Learning) on fully managed Hugging Face infrastructure. No local GPU setup required—models train on cloud GPUs and results are automatically saved to the Hugging Face Hub.
TRL provides multiple training methods: SFT, DPO, GRPO, KTO, PPO, and reward modeling.
For detailed TRL method documentation:
hf_doc_search("your query", product="trl")
hf_doc_fetch("https://huggingface.co/docs/trl/sft_trainer") # SFT
hf_doc_fetch("https://huggingface.co/docs/trl/dpo_trainer") # DPO
# etc.
See also: references/training_methods.md for method overviews and selection guidance
Use this skill when users want to train or fine-tune language models on Hugging Face infrastructure.
Use Unsloth (references/unsloth.md) instead of standard TRL when:
- Training vision-language models that need FastVisionModel support

See references/unsloth.md for complete Unsloth documentation and scripts/unsloth_sft_example.py for a production-ready training script.
When assisting with training jobs:
ALWAYS use hf_jobs() MCP tool - Submit jobs using hf_jobs("uv", {...}), NOT bash trl-jobs commands. The script parameter accepts Python code directly. Do NOT save to local files unless the user explicitly requests it. Pass the script content as a string to hf_jobs(). If user asks to "train a model", "fine-tune", or similar requests, you MUST create the training script AND submit the job immediately using hf_jobs().
Always include Trackio - Every training script should include Trackio for real-time monitoring. Use example scripts in scripts/ as templates.
Provide job details after submission - After submitting, provide job ID, monitoring URL, estimated time, and note that the user can request status checks later.
Use example scripts as templates - Reference scripts/train_sft_example.py, scripts/train_dpo_example.py, etc. as starting points.
Repository scripts use PEP 723 inline dependencies. Run them with uv run:
uv run scripts/estimate_cost.py --help
uv run scripts/dataset_inspector.py --help
Before starting any training job, verify:
- Authentication: hf_whoami() confirms you are logged in
- Token: secrets={"HF_TOKEN": "$HF_TOKEN"} in the job config makes the token available (the $HF_TOKEN syntax references your actual token value)
- Dataset: loads with datasets.load_dataset()
- Hub saving: push_to_hub=True, hub_model_id="username/model-name" in the config; secrets={"HF_TOKEN": "$HF_TOKEN"} on the job

⚠️ IMPORTANT: Training jobs run asynchronously and can take hours
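The config-side checks in this list can be sketched as a small helper. This is an illustrative sketch only: the function name and messages are not part of the skill, and the authentication check (hf_whoami()) still has to be done separately.

```python
def preflight_config_checks(args: dict, job: dict) -> list[str]:
    """Return a list of problems found before submitting a training job.

    args: the SFTConfig/DPOConfig-style keyword arguments as a dict.
    job:  the hf_jobs() submission dict (flavor, timeout, secrets, ...).
    """
    problems = []
    # Ephemeral job environments delete everything on exit.
    if not args.get("push_to_hub"):
        problems.append("set push_to_hub=True or all training output is lost")
    # hub_model_id must be namespaced.
    if "/" not in args.get("hub_model_id", ""):
        problems.append("hub_model_id must look like 'username/repo-name'")
    # The job needs the token to authenticate the push.
    if job.get("secrets", {}).get("HF_TOKEN") != "$HF_TOKEN":
        problems.append('pass secrets={"HF_TOKEN": "$HF_TOKEN"} on the job')
    return problems
```

Running it on a complete config returns an empty list; on an empty config it reports all three problems.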
When user requests training:
1. Create the training script (use scripts/train_sft_example.py as template)
2. Submit via the hf_jobs() MCP tool with script content inline - don't save to file unless user requests
3. Provide to user:
Example Response:
✅ Job submitted successfully!
Job ID: abc123xyz
Monitor: https://huggingface.co/jobs/username/abc123xyz
Expected time: ~2 hours
Estimated cost: ~$10
The job is running in the background. Ask me to check status/logs when ready!
💡 Tip for Demos: For quick demos on smaller GPUs (t4-small), omit eval_dataset and eval_strategy to save ~40% memory. You'll still see training loss and learning progress.
TRL config classes use max_length (not max_seq_length) to control tokenized sequence length:
# ✅ CORRECT - If you need to set sequence length
SFTConfig(max_length=512) # Truncate sequences to 512 tokens
DPOConfig(max_length=2048) # Longer context (2048 tokens)
# ❌ WRONG - This parameter doesn't exist
SFTConfig(max_seq_length=512) # TypeError!
Default behavior: max_length=1024 (truncates from right). This works well for most training.
When to override:
- Long sequences needed: increase (max_length=2048)
- Limited GPU memory: decrease (max_length=512)
- Vision-language models: max_length=None (prevents cutting image tokens)

Usually you don't need to set this parameter at all - the examples below use the sensible default.
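Right-truncation simply keeps the first max_length token IDs. A minimal illustration of the behavior (not TRL's actual implementation, which happens inside the tokenizer/collator):

```python
def truncate_right(token_ids: list[int], max_length: int = 1024) -> list[int]:
    """Keep the first max_length tokens, dropping the rest (TRL's default direction)."""
    return token_ids[:max_length]

# A 2000-token sequence becomes its first 1024 tokens; everything after is dropped.
ids = list(range(2000))
kept = truncate_right(ids)
```

So with the default, any training example longer than 1024 tokens silently loses its tail, which is why long-document tasks should raise max_length.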
UV scripts use PEP 723 inline dependencies for clean, self-contained training. This is the primary approach for Claude Code.
hf_jobs("uv", {
"script": """
# /// script
# dependencies = ["trl>=0.12.0", "peft>=0.7.0", "trackio"]
# ///
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTTrainer, SFTConfig
import trackio
dataset = load_dataset("trl-lib/Capybara", split="train")
# Create train/eval split for monitoring
dataset_split = dataset.train_test_split(test_size=0.1, seed=42)
trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",
    train_dataset=dataset_split["train"],
    eval_dataset=dataset_split["test"],
    peft_config=LoraConfig(r=16, lora_alpha=32),
    args=SFTConfig(
        output_dir="my-model",
        push_to_hub=True,
        hub_model_id="username/my-model",
        num_train_epochs=3,
        eval_strategy="steps",
        eval_steps=50,
        report_to="trackio",
        project="meaningful_project_name",  # project name grouping runs (trackio)
        run_name="meaningful_run_name",  # descriptive name for this training run (trackio)
    ),
)
trainer.train()
trainer.push_to_hub()
""",
"flavor": "a10g-large",
"timeout": "2h",
"secrets": {"HF_TOKEN": "$HF_TOKEN"}
})
Benefits: Direct MCP tool usage, clean code, dependencies declared inline (PEP 723), no file saving required, full control

When to use: Default choice for all training tasks in Claude Code, custom training logic, any scenario requiring hf_jobs()
⚠️ Important: The script parameter accepts either inline code (as shown above) OR a URL. Local file paths do NOT work.
Why local paths don't work: Jobs run in isolated Docker containers without access to your local filesystem. Scripts must be passed inline or fetched from a publicly accessible URL (Hub, GitHub, Gist).
Common mistakes:
# ❌ These will all fail
hf_jobs("uv", {"script": "train.py"})
hf_jobs("uv", {"script": "./scripts/train.py"})
hf_jobs("uv", {"script": "/path/to/train.py"})
Correct approaches:
# ✅ Inline code (recommended)
hf_jobs("uv", {"script": "# /// script\n# dependencies = [...]\n# ///\n\n<your code>"})
# ✅ From Hugging Face Hub
hf_jobs("uv", {"script": "https://huggingface.co/user/repo/resolve/main/train.py"})
# ✅ From GitHub
hf_jobs("uv", {"script": "https://raw.githubusercontent.com/user/repo/main/train.py"})
# ✅ From Gist
hf_jobs("uv", {"script": "https://gist.githubusercontent.com/user/id/raw/train.py"})
To use local scripts: Upload to HF Hub first:
hf repos create my-training-scripts --type model
hf upload my-training-scripts ./train.py train.py
# Use: https://huggingface.co/USERNAME/my-training-scripts/resolve/main/train.py
TRL provides battle-tested scripts for all methods. Can be run from URLs:
hf_jobs("uv", {
"script": "https://github.com/huggingface/trl/blob/main/trl/scripts/sft.py",
"script_args": [
"--model_name_or_path", "Qwen/Qwen2.5-0.5B",
"--dataset_name", "trl-lib/Capybara",
"--output_dir", "my-model",
"--push_to_hub",
"--hub_model_id", "username/my-model"
],
"flavor": "a10g-large",
"timeout": "2h",
"secrets": {"HF_TOKEN": "$HF_TOKEN"}
})
Benefits: No code to write, maintained by TRL team, production-tested

When to use: Standard TRL training, quick experiments, don't need custom code

Available: Scripts are available from https://github.com/huggingface/trl/tree/main/examples/scripts
The uv-scripts organization provides ready-to-use UV scripts stored as datasets on Hugging Face Hub:
# Discover available UV script collections
dataset_search({"author": "uv-scripts", "sort": "downloads", "limit": 20})
# Explore a specific collection
hub_repo_details(["uv-scripts/classification"], repo_type="dataset", include_readme=True)
Popular collections: ocr, classification, synthetic-data, vllm, dataset-creation
When the hf_jobs() MCP tool is unavailable, use the hf jobs CLI directly.
⚠️ CRITICAL: CLI Syntax Rules
# ✅ CORRECT syntax - flags BEFORE script URL
hf jobs uv run --flavor a10g-large --timeout 2h --secrets HF_TOKEN "https://example.com/train.py"
# ❌ WRONG - "run uv" instead of "uv run"
hf jobs run uv "https://example.com/train.py" --flavor a10g-large
# ❌ WRONG - flags AFTER script URL (will be ignored!)
hf jobs uv run "https://example.com/train.py" --flavor a10g-large
# ❌ WRONG - "--secret" instead of "--secrets" (plural)
hf jobs uv run --secret HF_TOKEN "https://example.com/train.py"
Key syntax rules:
hf jobs uv run (NOT hf jobs run uv)--flavor, --timeout, --secrets) must come BEFORE the script URL--secrets (plural), not --secretComplete CLI example:
hf jobs uv run \
--flavor a10g-large \
--timeout 2h \
--secrets HF_TOKEN \
"https://huggingface.co/user/repo/resolve/main/train.py"
Check job status via CLI:
hf jobs ps # List all jobs
hf jobs logs <job-id> # View logs
hf jobs inspect <job-id> # Job details
hf jobs cancel <job-id> # Cancel a job
The trl-jobs package provides optimized defaults and one-liner training.
uvx trl-jobs sft \
--model_name Qwen/Qwen2.5-0.5B \
--dataset_name trl-lib/Capybara
Benefits: Pre-configured settings, automatic Trackio integration, automatic Hub push, one-line commands

When to use: User working in terminal directly (not Claude Code context), quick local experimentation

Repository: https://github.com/huggingface/trl-jobs
⚠️ In Claude Code context, prefer using the hf_jobs() MCP tool (Approach 1) when available.
| Model Size | Recommended Hardware | Cost (approx/hr) | Use Case |
|---|---|---|---|
| <1B params | t4-small | ~$0.75 | Demos and quick tests only (skip eval steps) |
| 1-3B params | t4-medium, l4x1 | ~$1.50-2.50 | Development |
| 3-7B params | a10g-small, a10g-large | ~$3.50-5.00 | Production training |
| 7-13B params | a10g-large, a100-large | ~$5-10 | Large models (use LoRA) |
| 13B+ params | a100-large, a10g-largex2 | ~$10-20 | Very large (use LoRA) |
GPU Flavors: cpu-basic/upgrade/performance/xl, t4-small/medium, l4x1/x4, a10g-small/large/largex2/largex4, a100-large, h100/h100x8
For sizing guidelines and detailed specifications, see references/hardware_guide.md
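The table's size-to-flavor mapping can be sketched as a helper. The thresholds follow the table above; where the table lists two options the sketch picks one, and the function name is illustrative, not part of the skill:

```python
def suggest_flavor(params_billions: float) -> str:
    """Suggest a Jobs GPU flavor from model size, per the hardware table."""
    if params_billions < 1:
        return "t4-small"      # demos and quick tests
    if params_billions < 3:
        return "t4-medium"     # development (l4x1 also works)
    if params_billions < 13:
        return "a10g-large"    # production; 7-13B should use LoRA
    return "a100-large"        # 13B+: LoRA strongly recommended
```

This is only a starting point; memory needs also depend on sequence length, batch size, and whether LoRA or full fine-tuning is used.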
⚠️ EPHEMERAL ENVIRONMENT—MUST PUSH TO HUB
The Jobs environment is temporary. All files are deleted when the job ends. If the model isn't pushed to Hub, ALL TRAINING IS LOST.
In training script/config:
SFTConfig(
push_to_hub=True,
hub_model_id="username/model-name", # MUST specify
hub_strategy="every_save", # Optional: push checkpoints
)
In job submission:
{
"secrets": {"HF_TOKEN": "$HF_TOKEN"} # Enables authentication
}
Before submitting:
- push_to_hub=True set in config
- hub_model_id includes username/repo-name
- secrets parameter includes HF_TOKEN

See: references/hub_saving.md for detailed troubleshooting
⚠️ DEFAULT: 30 MINUTES—TOO SHORT FOR TRAINING
{
"timeout": "2h" # 2 hours (formats: "90m", "2h", "1.5h", or seconds as integer)
}
| Scenario | Recommended | Notes |
|---|---|---|
| Quick demo (50-100 examples) | 10-30 min | Verify setup |
| Development training | 1-2 hours | Small datasets |
| Production (3-7B model) | 4-6 hours | Full datasets |
| Large model with LoRA | 3-6 hours | Depends on dataset |
Always add 20-30% buffer for model/dataset loading, checkpoint saving, Hub push operations, and network delays.
On timeout: Job killed immediately, all unsaved progress lost, must restart from beginning
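The buffer rule is simple arithmetic; a sketch (helper name illustrative) that turns a time estimate into a timeout string in one of the formats hf_jobs accepts:

```python
import math

def recommended_timeout(estimated_minutes: float, buffer: float = 0.25) -> str:
    """Add a 20-30% safety buffer (default 25%) and format as minutes, e.g. '150m'."""
    total = math.ceil(estimated_minutes * (1 + buffer))
    return f"{total}m"
```

For example, a 2-hour estimate with the default buffer yields "150m", comfortably above the 30-minute default.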
Offer to estimate cost when planning jobs with known parameters. Use scripts/estimate_cost.py:
uv run scripts/estimate_cost.py \
--model meta-llama/Llama-2-7b-hf \
--dataset trl-lib/Capybara \
--hardware a10g-large \
--dataset-size 16000 \
--epochs 3
Output includes estimated time, cost, recommended timeout (with buffer), and optimization suggestions.
When to offer: User planning a job, asks about cost/time, choosing hardware, job will run >1 hour or cost >$5
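scripts/estimate_cost.py does the full estimate; the core arithmetic is just hourly rate times buffered hours. A sketch using rates from the hardware table (helper name illustrative):

```python
def estimate_job_cost(hourly_rate_usd: float, estimated_hours: float,
                      buffer: float = 0.25) -> float:
    """Rough job cost in USD: rate x hours, padded by the timeout buffer."""
    return round(hourly_rate_usd * estimated_hours * (1 + buffer), 2)

# e.g. a10g-large at ~$5/hr for an estimated 2-hour run
cost = estimate_job_cost(5.0, 2.0)
```

Real costs also depend on startup time and whether the job finishes early, so treat this as an upper-bound planning number.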
Production-ready templates with all best practices:
Load these scripts correctly:
- scripts/train_sft_example.py - Complete SFT training with Trackio, LoRA, checkpoints
- scripts/train_dpo_example.py - DPO training for preference learning
- scripts/train_grpo_example.py - GRPO training for online RL

These scripts demonstrate proper Hub saving, Trackio integration, checkpoint management, and optimized parameters. Pass their content inline to hf_jobs() or use as templates for custom scripts.
Trackio provides real-time metrics visualization. See references/trackio_guide.md for complete setup guide.
Key points:
- Add trackio to dependencies
- Set report_to="trackio" and run_name="meaningful_name"

Use sensible defaults unless user specifies otherwise. When generating training scripts with Trackio:
Default Configuration:
- Space: {username}/trackio (use "trackio" as default space name)

User overrides: If user requests specific trackio configuration (custom space, run naming, grouping, or additional config), apply their preferences instead of defaults.
This is useful for managing multiple jobs with the same configuration or keeping training scripts portable.
See references/trackio_guide.md for complete documentation including grouping runs for experiments.
# List all jobs
hf_jobs("ps")
# Inspect specific job
hf_jobs("inspect", {"job_id": "your-job-id"})
# View logs
hf_jobs("logs", {"job_id": "your-job-id"})
Remember: Wait for user to request status checks. Avoid polling repeatedly.
Validate dataset format BEFORE launching GPU training to prevent the #1 cause of training failures: format mismatches.
Each training method expects specific column names (e.g., DPO expects prompt, chosen, rejected).

ALWAYS validate custom or unfamiliar datasets.
Skip validation for known TRL datasets:
- trl-lib/ultrachat_200k, trl-lib/Capybara, HuggingFaceH4/ultrachat_200k, etc.

hf_jobs("uv", {
"script": "https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py",
"script_args": ["--dataset", "username/dataset-name", "--split", "train"]
})
The script is fast, and will usually complete synchronously.
The output shows compatibility for each training method:
- ✓ READY - Dataset is compatible, use directly
- ✗ NEEDS MAPPING - Compatible but needs preprocessing (mapping code provided)
- ✗ INCOMPATIBLE - Cannot be used for this method

When mapping is needed, the output includes a "MAPPING CODE" section with copy-paste ready Python code.
# 1. Inspect dataset (costs ~$0.01, <1 min on CPU)
hf_jobs("uv", {
"script": "https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py",
"script_args": ["--dataset", "argilla/distilabel-math-preference-dpo", "--split", "train"]
})
# 2. Check output markers:
# ✓ READY → proceed with training
# ✗ NEEDS MAPPING → apply mapping code below
# ✗ INCOMPATIBLE → choose different method/dataset
# 3. If mapping needed, apply before training:
def format_for_dpo(example):
    return {
        'prompt': example['instruction'],
        'chosen': example['chosen_response'],
        'rejected': example['rejected_response'],
    }

dataset = dataset.map(format_for_dpo, remove_columns=dataset.column_names)
# 4. Launch training job with confidence
Most DPO datasets use non-standard column names. Example:
Dataset has: instruction, chosen_response, rejected_response
DPO expects: prompt, chosen, rejected
The validator detects this and provides exact mapping code to fix it.
After training, convert models to GGUF format for use with llama.cpp, Ollama, LM Studio, and other local inference tools.
See: references/gguf_conversion.md for what GGUF is, when to convert, and the complete conversion guide, including a production-ready conversion script, quantization options, hardware requirements, usage examples, and troubleshooting.
Quick conversion:
hf_jobs("uv", {
"script": "<see references/gguf_conversion.md for complete script>",
"flavor": "a10g-large",
"timeout": "45m",
"secrets": {"HF_TOKEN": "$HF_TOKEN"},
"env": {
"ADAPTER_MODEL": "username/my-finetuned-model",
"BASE_MODEL": "Qwen/Qwen2.5-0.5B",
"OUTPUT_REPO": "username/my-model-gguf"
}
})
See references/training_patterns.md for detailed examples.
Out of memory (OOM) - Fix (try in order):
1. Set per_device_train_batch_size=1, increase gradient_accumulation_steps=8. Effective batch size is per_device_train_batch_size x gradient_accumulation_steps. For best performance keep effective batch size close to 128.
2. Enable gradient_checkpointing=True

Dataset format error - Fix:
Validate first with dataset inspector:
uv run https://huggingface.co/datasets/mcp-tools/skills/raw/main/dataset_inspector.py \
--dataset name --split train
Check output for compatibility markers (✓ READY, ✗ NEEDS MAPPING, ✗ INCOMPATIBLE)
Apply mapping code from inspector output if needed
Job timed out - Fix:
- Check logs: hf_jobs("logs", {"job_id": "..."})
- Increase timeout: "timeout": "3h" (add 30% to estimated time)
- Reduce num_train_epochs, use a smaller dataset, or enable max_steps
- Save checkpoints: save_strategy="steps", save_steps=500, hub_strategy="every_save"

Note: Default 30min is insufficient for real training. Minimum 1-2 hours.
Model not saved to Hub - Fix:
- Verify secrets={"HF_TOKEN": "$HF_TOKEN"} is passed to the job
- Verify push_to_hub=True, hub_model_id="username/model-name" in the config
- Check authentication with mcp__huggingface__hf_whoami()
- Check whether the repo is private (hub_private_repo=True)

Missing dependency - Fix: Add to PEP 723 header:
# /// script
# dependencies = ["trl>=0.12.0", "peft>=0.7.0", "trackio", "missing-package"]
# ///
Common issues:
- Authentication: mcp__huggingface__hf_whoami(), token permissions, secrets parameter

See: references/troubleshooting.md for complete troubleshooting guide
- references/training_methods.md - Overview of SFT, DPO, GRPO, KTO, PPO, Reward Modeling
- references/training_patterns.md - Common training patterns and examples
- references/unsloth.md - Unsloth for fast VLM training (~2x speed, 60% less VRAM)
- references/gguf_conversion.md - Complete GGUF conversion guide
- references/trackio_guide.md - Trackio monitoring setup
- references/hardware_guide.md - Hardware specs and selection
- references/hub_saving.md - Hub authentication troubleshooting
- references/troubleshooting.md - Common issues and solutions
- references/local_training_macos.md - Local training on macOS
- scripts/train_sft_example.py - Production SFT template
- scripts/train_dpo_example.py - Production DPO template
- scripts/train_grpo_example.py - Production GRPO template
- scripts/unsloth_sft_example.py - Unsloth text LLM training template (faster, less VRAM)
- scripts/estimate_cost.py - Estimate time and cost (offer when appropriate)
- scripts/convert_to_gguf.py - Complete GGUF conversion script

Quick reference:
- Run repository scripts with uv run or hf_jobs
- The script parameter accepts Python code directly; no file saving required unless user requests
- Estimate cost with scripts/estimate_cost.py
- Prefer hf_jobs("uv", {...}) with inline scripts; use TRL-maintained scripts for standard training; avoid bash trl-jobs commands in Claude Code

Weekly Installs
417
Repository
GitHub Stars
9.9K
First Seen
Jan 20, 2026
Security Audits
Gen Agent Trust Hub: Pass | Socket: Pass | Snyk: Fail
Installed on
opencode: 324
codex: 312
gemini-cli: 311
claude-code: 304
github-copilot: 296
cursor: 291