Important note
Installing AI Skills requires a working network proxy with TUN mode enabled. This is critical and directly determines whether the installation completes successfully, so enable your proxy before anything else. See the full installation guide →
hugging-face-jobs by sickn33/antigravity-awesome-skills
npx skills add https://github.com/sickn33/antigravity-awesome-skills --skill hugging-face-jobs
Run any workload on fully managed Hugging Face infrastructure. No local setup required—jobs run on cloud CPUs, GPUs, or TPUs and can persist results to the Hugging Face Hub.
Common use cases:
For model training specifically: see the model-trainer skill for TRL-based training workflows.
Use this skill when users want to:
When assisting with jobs:
ALWAYS use hf_jobs() MCP tool - Submit jobs using hf_jobs("uv", {...}) or hf_jobs("run", {...}). The script parameter accepts Python code directly. Do NOT save to local files unless the user explicitly requests it. Pass the script content as a string to hf_jobs().
Always handle authentication - Jobs that interact with the Hub require HF_TOKEN via secrets. See Token Usage section below.
Provide job details after submission - After submitting, provide job ID, monitoring URL, estimated time, and note that the user can request status checks later.
Set appropriate timeouts - Default 30min may be insufficient for long-running tasks.
Before starting any job, verify:
- Run hf_whoami() to confirm you are authenticated
When tokens are required:
How to provide tokens:
{
"secrets": {"HF_TOKEN": "$HF_TOKEN"} # Recommended: automatic token
}
⚠️ CRITICAL: The $HF_TOKEN placeholder is automatically replaced with your logged-in token. Never hardcode tokens in scripts.
What are HF Tokens?
- Stored securely on your machine after hf auth login
Token Types:
Always Required:
Not Required:
hf_jobs("uv", {
"script": "your_script.py",
"secrets": {"HF_TOKEN": "$HF_TOKEN"} # ✅ Automatic replacement
})
How it works:
- $HF_TOKEN is a placeholder that gets replaced with your actual token (from hf auth login)
Benefits:
hf_jobs("uv", {
"script": "your_script.py",
"secrets": {"HF_TOKEN": "hf_abc123..."} # ⚠️ Hardcoded token
})
When to use:
Security concerns:
hf_jobs("uv", {
"script": "your_script.py",
"env": {"HF_TOKEN": "hf_abc123..."} # ⚠️ Less secure than secrets
})
Difference from secrets:
- env variables are visible in job logs
- secrets are encrypted server-side
- Prefer secrets for tokens
In your Python script, tokens are available as environment variables:
# /// script
# dependencies = ["huggingface-hub"]
# ///
import os
from huggingface_hub import HfApi
# Token is automatically available if passed via secrets
token = os.environ.get("HF_TOKEN")
# Use with Hub API
api = HfApi(token=token)
# Or let huggingface_hub auto-detect
api = HfApi() # Automatically uses HF_TOKEN env var
Best practices:
- Use os.environ.get("HF_TOKEN") to access the token
- Let huggingface_hub auto-detect when possible
Check if you're logged in:
from huggingface_hub import whoami
user_info = whoami() # Returns your username if authenticated
Verify token in job:
import os
assert "HF_TOKEN" in os.environ, "HF_TOKEN not found!"
token = os.environ["HF_TOKEN"]
print(f"Token starts with: {token[:7]}...") # Should start with "hf_"
Error: 401 Unauthorized
- Add secrets={"HF_TOKEN": "$HF_TOKEN"} to job config
- Verify hf_whoami() works locally
Error: 403 Forbidden
Error: Token not found in environment
- Cause: secrets not passed, or wrong key name
- Pass secrets={"HF_TOKEN": "$HF_TOKEN"} (not env)
- Access with os.environ.get("HF_TOKEN")
Error: Repository access denied
- Use the $HF_TOKEN placeholder or environment variables
# Example: Push results to Hub
hf_jobs("uv", {
"script": """
# /// script
# dependencies = ["huggingface-hub", "datasets"]
# ///
import os
from huggingface_hub import HfApi
from datasets import Dataset
# Verify token is available
assert "HF_TOKEN" in os.environ, "HF_TOKEN required!"
# Use token for Hub operations
api = HfApi(token=os.environ["HF_TOKEN"])
# Create and push dataset
data = {"text": ["Hello", "World"]}
dataset = Dataset.from_dict(data)
dataset.push_to_hub("username/my-dataset", token=os.environ["HF_TOKEN"])
print("✅ Dataset pushed successfully!")
""",
"flavor": "cpu-basic",
"timeout": "30m",
"secrets": {"HF_TOKEN": "$HF_TOKEN"} # ✅ Token provided securely
})
UV scripts use PEP 723 inline dependencies for clean, self-contained workloads.
MCP Tool:
hf_jobs("uv", {
"script": """
# /// script
# dependencies = ["transformers", "torch"]
# ///
from transformers import pipeline
import torch
# Your workload here
classifier = pipeline("sentiment-analysis")
result = classifier("I love Hugging Face!")
print(result)
""",
"flavor": "cpu-basic",
"timeout": "30m"
})
CLI Equivalent:
hf jobs uv run my_script.py --flavor cpu-basic --timeout 30m
Python API:
from huggingface_hub import run_uv_job
run_uv_job("my_script.py", flavor="cpu-basic", timeout="30m")
Benefits: Direct MCP tool usage, clean code, dependencies declared inline, no file saving required
When to use: Default choice for all workloads, custom logic, any scenario requiring hf_jobs()
By default, UV scripts use ghcr.io/astral-sh/uv:python3.12-bookworm-slim. For ML workloads with complex dependencies, use pre-built images:
hf_jobs("uv", {
"script": "inference.py",
"image": "vllm/vllm-openai:latest", # Pre-built image with vLLM
"flavor": "a10g-large"
})
CLI:
hf jobs uv run --image vllm/vllm-openai:latest --flavor a10g-large inference.py
Benefits: Faster startup, pre-installed dependencies, optimized for specific frameworks
By default, UV scripts use Python 3.12. Specify a different version:
hf_jobs("uv", {
"script": "my_script.py",
"python": "3.11", # Use Python 3.11
"flavor": "cpu-basic"
})
Python API:
from huggingface_hub import run_uv_job
run_uv_job("my_script.py", python="3.11")
⚠️ Important: There are two "script path" stories depending on how you run Jobs:
- hf_jobs() MCP tool (recommended in this repo): the script value must be inline code (a string) or a URL. A local filesystem path (like "./scripts/foo.py") won't exist inside the remote container.
- hf jobs uv run CLI: local file paths do work (the CLI uploads your script).
Common mistake with hf_jobs() MCP tool:
# ❌ Will fail (remote container can't see your local path)
hf_jobs("uv", {"script": "./scripts/foo.py"})
Correct patterns with hf_jobs() MCP tool:
# ✅ Inline: read the local script file and pass its *contents*
from pathlib import Path
script = Path("hf-jobs/scripts/foo.py").read_text()
hf_jobs("uv", {"script": script})
# ✅ URL: host the script somewhere reachable
hf_jobs("uv", {"script": "https://huggingface.co/datasets/uv-scripts/.../raw/main/foo.py"})
# ✅ URL from GitHub
hf_jobs("uv", {"script": "https://raw.githubusercontent.com/huggingface/trl/main/trl/scripts/sft.py"})
CLI equivalent (local paths supported):
hf jobs uv run ./scripts/foo.py -- --your --args
Add extra dependencies beyond what's in the PEP 723 header:
hf_jobs("uv", {
"script": "inference.py",
"dependencies": ["transformers", "torch>=2.0"], # Extra deps
"flavor": "a10g-small"
})
Python API:
from huggingface_hub import run_uv_job
run_uv_job("inference.py", dependencies=["transformers", "torch>=2.0"])
Run jobs with custom Docker images and commands.
MCP Tool:
hf_jobs("run", {
"image": "python:3.12",
"command": ["python", "-c", "print('Hello from HF Jobs!')"],
"flavor": "cpu-basic",
"timeout": "30m"
})
CLI Equivalent:
hf jobs run python:3.12 python -c "print('Hello from HF Jobs!')"
Python API:
from huggingface_hub import run_job
run_job(image="python:3.12", command=["python", "-c", "print('Hello!')"], flavor="cpu-basic")
Benefits: Full Docker control, use pre-built images, run any command
When to use: Need specific Docker images, non-Python workloads, complex environments
Example with GPU:
hf_jobs("run", {
"image": "pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel",
"command": ["python", "-c", "import torch; print(torch.cuda.get_device_name())"],
"flavor": "a10g-small",
"timeout": "1h"
})
Using Hugging Face Spaces as Images:
You can use Docker images from HF Spaces:
hf_jobs("run", {
"image": "hf.co/spaces/lhoestq/duckdb", # Space as Docker image
"command": ["duckdb", "-c", "SELECT 'Hello from DuckDB!'"],
"flavor": "cpu-basic"
})
CLI:
hf jobs run hf.co/spaces/lhoestq/duckdb duckdb -c "SELECT 'Hello!'"
The uv-scripts organization provides ready-to-use UV scripts stored as datasets on Hugging Face Hub:
# Discover available UV script collections
dataset_search({"author": "uv-scripts", "sort": "downloads", "limit": 20})
# Explore a specific collection
hub_repo_details(["uv-scripts/classification"], repo_type="dataset", include_readme=True)
Popular collections: OCR, classification, synthetic-data, vLLM, dataset-creation
Reference: HF Jobs Hardware Docs (updated 07/2025)
| Workload Type | Recommended Hardware | Use Case |
|---|---|---|
| Data processing, testing | cpu-basic, cpu-upgrade | Lightweight tasks |
| Small models, demos | t4-small | <1B models, quick tests |
| Medium models | t4-medium, l4x1 | 1-7B models |
| Large models, production | a10g-small, a10g-large | 7-13B models |
| Very large models | a100-large | 13B+ models |
| Batch inference | a10g-large, a100-large | High-throughput |
| Multi-GPU workloads | l4x4, a10g-largex2, a10g-largex4 | Parallel/large models |
| TPU workloads | v5e-1x1, v5e-2x2, v5e-2x4 | JAX/Flax, TPU-optimized |
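As a rough rule of thumb, the size-to-flavor mapping in the table above can be encoded as a small helper. The thresholds are illustrative; real requirements also depend on precision, batch size, and context length:

```python
def pick_flavor(model_params_b: float) -> str:
    """Suggest a Jobs flavor for a model of the given size
    (in billions of parameters), following the table above."""
    if model_params_b < 1:
        return "t4-small"      # small models, demos
    if model_params_b < 7:
        return "l4x1"          # medium models
    if model_params_b <= 13:
        return "a10g-large"    # large models, production
    return "a100-large"        # very large models
```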
All Available Flavors:
- CPU: cpu-basic, cpu-upgrade
- GPU: t4-small, t4-medium, l4x1, l4x4, a10g-small, a10g-large, a10g-largex2, a10g-largex4, a100-large
- TPU: v5e-1x1, v5e-2x2, v5e-2x4
Guidelines:
- See references/hardware_guide.md for detailed specifications
⚠️ EPHEMERAL ENVIRONMENT—MUST PERSIST RESULTS
The Jobs environment is temporary. All files are deleted when the job ends. If results aren't persisted, ALL WORK IS LOST.
1. Push to Hugging Face Hub (Recommended)
# Push models
model.push_to_hub("username/model-name", token=os.environ["HF_TOKEN"])
# Push datasets
dataset.push_to_hub("username/dataset-name", token=os.environ["HF_TOKEN"])
# Push artifacts
api.upload_file(
path_or_fileobj="results.json",
path_in_repo="results.json",
repo_id="username/results",
token=os.environ["HF_TOKEN"]
)
2. Use External Storage
# Upload to S3, GCS, etc.
import boto3
s3 = boto3.client('s3')
s3.upload_file('results.json', 'my-bucket', 'results.json')
3. Send Results via API
# POST results to your API
import requests
requests.post("https://your-api.com/results", json=results)
In job submission:
{
"secrets": {"HF_TOKEN": "$HF_TOKEN"} # Enables authentication
}
In script:
import os
from huggingface_hub import HfApi
# Token automatically available from secrets
api = HfApi(token=os.environ.get("HF_TOKEN"))
# Push your results
api.upload_file(...)
Before submitting:
- Include secrets={"HF_TOKEN": "$HF_TOKEN"} if using Hub
See: references/hub_saving.md for detailed Hub persistence guide
⚠️ DEFAULT: 30 MINUTES
Jobs automatically stop after the timeout. For long-running tasks like training, always set a custom timeout.
MCP Tool:
{
"timeout": "2h" # 2 hours
}
Supported formats:
- Integer seconds (e.g., 300 = 5 minutes)
- Strings: "5m" (minutes), "2h" (hours), "1d" (days)
- Examples: "90m", "2h", "1.5h", 300, "1d"
Python API:
from huggingface_hub import run_job, run_uv_job
run_job(image="python:3.12", command=[...], timeout="2h")
run_uv_job("script.py", timeout=7200) # 2 hours in seconds
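The accepted formats above can be normalized with a small helper. This is an illustrative sketch, not the library's actual parser:

```python
def to_seconds(timeout) -> int:
    """Normalize a Jobs timeout (int seconds, or a string like
    '90m', '2h', '1.5h', '1d') to whole seconds."""
    if isinstance(timeout, (int, float)):
        return int(timeout)
    units = {"s": 1, "m": 60, "h": 3600, "d": 86400}
    value, unit = timeout[:-1], timeout[-1]
    return int(float(value) * units[unit])
```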
| Scenario | Recommended | Notes |
|---|---|---|
| Quick test | 10-30 min | Verify setup |
| Data processing | 1-2 hours | Depends on data size |
| Batch inference | 2-4 hours | Large batches |
| Experiments | 4-8 hours | Multiple runs |
| Long-running | 8-24 hours | Production workloads |
Always add 20-30% buffer for setup, network delays, and cleanup.
On timeout: Job killed immediately, all unsaved progress lost
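Applying the 20-30% buffer can look like this (a hypothetical helper, rounding up to whole minutes so the result maps cleanly onto a "<N>m" timeout string):

```python
import math

def buffered_timeout(estimated_s: int, buffer_frac: float = 0.25) -> int:
    """Pad an estimated runtime by a safety buffer, rounded up
    to a whole minute."""
    padded = estimated_s * (1 + buffer_frac)
    return math.ceil(padded / 60) * 60

# A 60-minute estimate becomes a 75-minute timeout request.
```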
General guidelines:
Total Cost = (Hours of runtime) × (Cost per hour)
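The formula translates directly into code; the rate used below is a placeholder for illustration, not actual HF pricing:

```python
def job_cost(runtime_hours: float, rate_per_hour: float) -> float:
    """Total Cost = (hours of runtime) x (cost per hour)."""
    return round(runtime_hours * rate_per_hour, 2)

# e.g. a 2.5 h batch job on a hypothetical $1.20/h flavor:
# job_cost(2.5, 1.20) -> 3.0
```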
Example calculations:
Quick test:
Data processing:
Batch inference:
Cost optimization tips:
MCP Tool:
# List all jobs
hf_jobs("ps")
# Inspect specific job
hf_jobs("inspect", {"job_id": "your-job-id"})
# View logs
hf_jobs("logs", {"job_id": "your-job-id"})
# Cancel a job
hf_jobs("cancel", {"job_id": "your-job-id"})
Python API:
from huggingface_hub import list_jobs, inspect_job, fetch_job_logs, cancel_job
# List your jobs
jobs = list_jobs()
# List running jobs only
running = [j for j in list_jobs() if j.status.stage == "RUNNING"]
# Inspect specific job
job_info = inspect_job(job_id="your-job-id")
# View logs
for log in fetch_job_logs(job_id="your-job-id"):
print(log)
# Cancel a job
cancel_job(job_id="your-job-id")
CLI:
hf jobs ps # List jobs
hf jobs logs <job-id> # View logs
hf jobs cancel <job-id> # Cancel job
Remember: Wait for user to request status checks. Avoid polling repeatedly.
After submission, jobs have monitoring URLs:
https://huggingface.co/jobs/username/job-id
View logs, status, and details in the browser.
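For scripting convenience, the URL pattern above can be built with a trivial (hypothetical) helper:

```python
def job_url(username: str, job_id: str) -> str:
    """Build the browser monitoring URL for a submitted job."""
    return f"https://huggingface.co/jobs/{username}/{job_id}"
```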
import time
from huggingface_hub import inspect_job, run_job
# Run multiple jobs
jobs = [run_job(image=img, command=cmd) for img, cmd in workloads]
# Wait for all to complete
for job in jobs:
while inspect_job(job_id=job.id).status.stage not in ("COMPLETED", "ERROR"):
time.sleep(10)
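The wait loop above polls forever if a job hangs. A bounded variant (a sketch, with the status lookup injected as a callable) gives up after a deadline:

```python
import time

def wait_for(get_stage, terminal=("COMPLETED", "ERROR"),
             poll_s=10, max_wait_s=4 * 3600):
    """Poll get_stage() until it returns a terminal stage or the
    deadline passes. Returns the final stage, or None on deadline."""
    deadline = time.monotonic() + max_wait_s
    while time.monotonic() < deadline:
        stage = get_stage()
        if stage in terminal:
            return stage
        time.sleep(poll_s)
    return None
```

Use it as `wait_for(lambda: inspect_job(job_id=job.id).status.stage)`.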
Run jobs on a schedule using CRON expressions or predefined schedules.
MCP Tool:
# Schedule a UV script that runs every hour
hf_jobs("scheduled uv", {
"script": "your_script.py",
"schedule": "@hourly",
"flavor": "cpu-basic"
})
# Schedule with CRON syntax
hf_jobs("scheduled uv", {
"script": "your_script.py",
"schedule": "0 9 * * 1", # 9 AM every Monday
"flavor": "cpu-basic"
})
# Schedule a Docker-based job
hf_jobs("scheduled run", {
"image": "python:3.12",
"command": ["python", "-c", "print('Scheduled!')"],
"schedule": "@daily",
"flavor": "cpu-basic"
})
Python API:
from huggingface_hub import create_scheduled_job, create_scheduled_uv_job
# Schedule a Docker job
create_scheduled_job(
image="python:3.12",
command=["python", "-c", "print('Running on schedule!')"],
schedule="@hourly"
)
# Schedule a UV script
create_scheduled_uv_job("my_script.py", schedule="@daily", flavor="cpu-basic")
# Schedule with GPU
create_scheduled_uv_job(
"ml_inference.py",
schedule="0 */6 * * *", # Every 6 hours
flavor="a10g-small"
)
Available schedules:
- @annually, @yearly - Once per year
- @monthly - Once per month
- @weekly - Once per week
- @daily - Once per day
- @hourly - Once per hour
- Custom CRON expressions (e.g., "*/5 * * * *" for every 5 minutes)
Manage scheduled jobs:
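The predefined names are conventional cron aliases; assuming HF expands them the standard way, the mapping looks like:

```python
# Conventional cron equivalents of the predefined schedules
# (standard cron aliases; assumed to match HF's expansion).
PREDEFINED = {
    "@hourly":   "0 * * * *",   # minute 0 of every hour
    "@daily":    "0 0 * * *",   # midnight every day
    "@weekly":   "0 0 * * 0",   # midnight on Sunday
    "@monthly":  "0 0 1 * *",   # midnight on the 1st
    "@annually": "0 0 1 1 *",   # midnight on Jan 1
    "@yearly":   "0 0 1 1 *",
}

def expand_schedule(schedule: str) -> str:
    """Return the raw CRON expression for a schedule string;
    custom expressions pass through unchanged."""
    return PREDEFINED.get(schedule, schedule)
```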
# MCP Tool
hf_jobs("scheduled ps") # List scheduled jobs
hf_jobs("scheduled inspect", {"job_id": "..."}) # Inspect details
hf_jobs("scheduled suspend", {"job_id": "..."}) # Pause
hf_jobs("scheduled resume", {"job_id": "..."}) # Resume
hf_jobs("scheduled delete", {"job_id": "..."}) # Delete
Python API for management:
from huggingface_hub import (
list_scheduled_jobs,
inspect_scheduled_job,
suspend_scheduled_job,
resume_scheduled_job,
delete_scheduled_job
)
# List all scheduled jobs
scheduled = list_scheduled_jobs()
# Inspect a scheduled job
info = inspect_scheduled_job(scheduled_job_id)
# Suspend (pause) a scheduled job
suspend_scheduled_job(scheduled_job_id)
# Resume a scheduled job
resume_scheduled_job(scheduled_job_id)
# Delete a scheduled job
delete_scheduled_job(scheduled_job_id)
Trigger jobs automatically when changes happen in Hugging Face repositories.
Python API:
from huggingface_hub import create_webhook
# Create webhook that triggers a job when a repo changes
webhook = create_webhook(
job_id=job.id,
watched=[
{"type": "user", "name": "your-username"},
{"type": "org", "name": "your-org-name"}
],
domains=["repo", "discussion"],
secret="your-secret"
)
How it works:
- The job receives the event payload in the WEBHOOK_PAYLOAD environment variable
Use cases:
Access webhook payload in script:
import os
import json
payload = json.loads(os.environ.get("WEBHOOK_PAYLOAD", "{}"))
print(f"Event type: {payload.get('event', {}).get('action')}")
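A script can branch on the payload instead of just printing it. The action values below are illustrative, not an exhaustive list:

```python
import json

def classify(raw: str) -> str:
    """Map a webhook payload string to a job action.
    The 'create'/'update' routing here is illustrative."""
    payload = json.loads(raw or "{}")
    action = payload.get("event", {}).get("action")
    if action == "create":
        return "run-pipeline"
    if action == "update":
        return "refresh"
    return "ignore"

# In a job: classify(os.environ.get("WEBHOOK_PAYLOAD", ""))
```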
See Webhooks Documentation for more details.
This repository ships ready-to-run UV scripts in hf-jobs/scripts/. Prefer using them instead of inventing new templates.
scripts/generate-responses.pyWhat it does: loads a Hub dataset (chat messages or a prompt column), applies a model chat template, generates responses with vLLM, and pushes the output dataset + dataset card back to the Hub.
Requires: GPU + write token (it pushes a dataset).
from pathlib import Path
script = Path("hf-jobs/scripts/generate-responses.py").read_text()
hf_jobs("uv", {
"script": script,
"script_args": [
"username/input-dataset",
"username/output-dataset",
"--messages-column", "messages",
"--model-id", "Qwen/Qwen3-30B-A3B-Instruct-2507",
"--temperature", "0.7",
"--top-p", "0.8",
"--max-tokens", "2048",
],
"flavor": "a10g-large",
"timeout": "4h",
"secrets": {"HF_TOKEN": "$HF_TOKEN"},
})
scripts/cot-self-instruct.pyWhat it does: generates synthetic prompts/answers via CoT Self-Instruct, optionally filters outputs (answer-consistency / RIP), then pushes the generated dataset + dataset card to the Hub.
Requires: GPU + write token (it pushes a dataset).
from pathlib import Path
script = Path("hf-jobs/scripts/cot-self-instruct.py").read_text()
hf_jobs("uv", {
"script": script,
"script_args": [
"--seed-dataset", "davanstrien/s1k-reasoning",
"--output-dataset", "username/synthetic-math",
"--task-type", "reasoning",
"--num-samples", "5000",
"--filter-method", "answer-consistency",
],
"flavor": "l4x4",
"timeout": "8h",
"secrets": {"HF_TOKEN": "$HF_TOKEN"},
})
scripts/finepdfs-stats.pyWhat it does: scans parquet directly from Hub (no 300GB download), computes temporal stats, and (optionally) uploads results to a Hub dataset repo.
Requires: CPU is often enough; token needed only if you pass --output-repo (upload).
from pathlib import Path
script = Path("hf-jobs/scripts/finepdfs-stats.py").read_text()
hf_jobs("uv", {
"script": script,
"script_args": [
"--limit", "10000",
"--show-plan",
"--output-repo", "username/finepdfs-temporal-stats",
],
"flavor": "cpu-upgrade",
"timeout": "2h",
"env": {"HF_XET_HIGH_PERFORMANCE": "1"},
"secrets": {"HF_TOKEN": "$HF_TOKEN"},
})
Fix:
Fix:
- Increase the timeout, e.g. "timeout": "3h"
Fix:
- Pass secrets={"HF_TOKEN": "$HF_TOKEN"}
- Verify with assert "HF_TOKEN" in os.environ
Fix: Add to PEP 723 header:
# /// script
# dependencies = ["package1", "package2>=1.0.0"]
# ///
Fix:
- Verify hf_whoami() works locally
- Include secrets={"HF_TOKEN": "$HF_TOKEN"} in job config
- Re-login with hf auth login
Common issues:
See: references/troubleshooting.md for complete troubleshooting guide
- references/token_usage.md - Complete token usage guide
- references/hardware_guide.md - Hardware specs and selection
- references/hub_saving.md - Hub persistence guide
- references/troubleshooting.md - Common issues and solutions
- scripts/generate-responses.py - vLLM batch generation: dataset → responses → push to Hub
- scripts/cot-self-instruct.py - CoT Self-Instruct synthetic data generation + filtering → push to Hub
- scripts/finepdfs-stats.py - Polars streaming stats over finepdfs-edu parquet on Hub (optional push)
Official Documentation:
Related Tools:
- script parameter accepts Python code directly; no file saving required unless the user requests it
- Use secrets={"HF_TOKEN": "$HF_TOKEN"} for Hub operations
- Prefer hf_jobs("uv", {...}) with inline scripts for Python workloads

| Operation | MCP Tool | CLI | Python API |
|---|---|---|---|
| Run UV script | hf_jobs("uv", {...}) | hf jobs uv run script.py | run_uv_job("script.py") |
| Run Docker job | hf_jobs("run", {...}) | hf jobs run image cmd | run_job(image, command) |
| List jobs | hf_jobs("ps") | hf jobs ps | list_jobs() |
| View logs | hf_jobs("logs", {...}) | hf jobs logs <id> | fetch_job_logs(job_id) |
| Cancel job | hf_jobs("cancel", {...}) | hf jobs cancel <id> | cancel_job(job_id) |
| Schedule UV | hf_jobs("scheduled uv", {...}) | - | create_scheduled_uv_job() |
| Schedule Docker | hf_jobs("scheduled run", {...}) | - | create_scheduled_job() |