hugging-face-trackio by huggingface/skills
npx skills add https://github.com/huggingface/skills --skill hugging-face-trackio
Trackio is an experiment tracking library for logging and visualizing ML training metrics. It syncs to Hugging Face Spaces for real-time monitoring dashboards.
| Task | Interface | Reference |
|---|---|---|
| Logging metrics during training | Python API | references/logging_metrics.md |
| Firing alerts for training diagnostics | Python API | references/alerts.md |
| Retrieving metrics & alerts after/during training | CLI | references/retrieving_metrics.md |
Use `import trackio` in your training scripts to log metrics:

1. `trackio.init()`: initialize tracking
2. `trackio.log()`: log metrics, or use TRL's `report_to="trackio"`
3. `trackio.finish()`: finish the run

Key concept: For remote/cloud training, pass `space_id`. Metrics sync to a Space dashboard, so they persist after the instance terminates.
→ See references/logging_metrics.md for setup, TRL integration, and configuration options.
Insert `trackio.alert()` calls in training code to flag important events. They work like print statements inserted for debugging, but are structured and queryable:

- `trackio.alert(title="...", level=trackio.AlertLevel.WARN)`: fire an alert
- Alert levels: `INFO`, `WARN`, `ERROR`

Key concept for LLM agents: Alerts are the primary mechanism for autonomous experiment iteration. An agent should insert alerts into training code for diagnostic conditions (loss spikes, NaN gradients, low accuracy, training stalls). Since alerts are printed to the terminal, an agent that is watching the training script's output will see them automatically. For background or detached runs, the agent can poll via the CLI instead.
→ See references/alerts.md for the full alerts API, webhook setup, and autonomous agent workflows.
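The diagnostic conditions mentioned above (non-finite values, loss spikes) can be kept in one small pure function, which makes them easy to test apart from the training loop. The sketch below is illustrative only: `diagnose`, its `spike_factor` threshold, and the returned dictionary layout are hypothetical and not part of Trackio. The level is returned as a plain string naming one of the three levels listed above.

```python
import math
from typing import Optional


def diagnose(step: int, loss: float, prev_loss: Optional[float] = None,
             spike_factor: float = 3.0) -> Optional[dict]:
    """Describe the alert to fire for this step, or return None.

    Hypothetical helper: the thresholds and dict layout are assumptions,
    not Trackio behavior.
    """
    # NaN or infinite loss usually means the run cannot recover.
    if math.isnan(loss) or math.isinf(loss):
        return {"title": "Non-finite loss",
                "text": f"Loss is {loss} at step {step}",
                "level": "ERROR"}
    # A sudden jump relative to the previous step is worth a warning.
    if prev_loss is not None and prev_loss > 0 and loss > spike_factor * prev_loss:
        return {"title": "Loss spike",
                "text": f"Loss jumped from {prev_loss:.4f} to {loss:.4f} at step {step}",
                "level": "WARN"}
    return None


if __name__ == "__main__":
    print(diagnose(10, float("nan")))
    print(diagnose(11, 4.0, prev_loss=1.0))
    print(diagnose(12, 0.9, prev_loss=1.0))
```

In a training loop, a non-None result could be forwarded to `trackio.alert()` with its `title` and `text`, mapping the level string with something like `getattr(trackio.AlertLevel, result["level"])`. That mapping assumes `AlertLevel` exposes members under those names, as the examples in this document suggest.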
Use the `trackio` command to query logged metrics and alerts:

- `trackio list projects/runs/metrics`: discover what's available
- `trackio get project/run/metric`: retrieve summaries and values
- `trackio list alerts --project <name> --json`: retrieve alerts
- `trackio show`: launch the dashboard
- `trackio sync`: sync to HF Space

Key concept: Add `--json` for programmatic output suitable for automation and LLM agents.
→ See references/retrieving_metrics.md for all commands, workflows, and JSON output formats.
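For automation, the `--json` output can be consumed from Python by shelling out to the CLI. The following is a minimal sketch, not an official client: `trackio_json` is a hypothetical wrapper, and it makes no assumption about the structure of the JSON beyond it being valid. The `runner` parameter exists only so the function can be exercised without the CLI installed.

```python
import json
import subprocess
from typing import Any, Callable, List, Sequence


def trackio_json(args: Sequence[str],
                 runner: Callable[..., Any] = subprocess.run) -> Any:
    """Run `trackio <args> --json` and return the parsed JSON.

    Hypothetical wrapper; `runner` defaults to subprocess.run and can be
    replaced with a stub in tests.
    """
    cmd: List[str] = ["trackio", *args, "--json"]
    # check=True raises CalledProcessError if the CLI exits non-zero.
    result = runner(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

With the CLI on the path, `trackio_json(["list", "projects"])` would run the same command as `trackio list projects --json` and hand back whatever structure the CLI emits.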
Minimal logging example:

    import trackio

    trackio.init(project="my-project", space_id="username/trackio")
    trackio.log({"loss": 0.1, "accuracy": 0.9})
    trackio.log({"loss": 0.09, "accuracy": 0.91})
    trackio.finish()

Minimal retrieval example:

    trackio list projects --json
    trackio get metric --project my-project --run my-run --metric loss --json
When running experiments autonomously as an LLM agent, the recommended workflow is:

1. Insert `trackio.alert()` calls for diagnostic conditions
2. Run `trackio list alerts --project <name> --json --since <timestamp>` to check for new alerts
3. Use `trackio get metric ...` to inspect specific values

Example training loop with alerts:

    import trackio

    trackio.init(project="my-project", config={"lr": 1e-4})

    # num_steps and train_step() are placeholders for your own training loop.
    for step in range(num_steps):
        loss = train_step()
        trackio.log({"loss": loss, "step": step})

        # Loss is still high well into training.
        if step > 100 and loss > 5.0:
            trackio.alert(
                title="Loss divergence",
                text=f"Loss {loss:.4f} still high after {step} steps",
                level=trackio.AlertLevel.ERROR,
            )

        # Loss is effectively zero, which may indicate a collapse.
        if step > 0 and abs(loss) < 1e-8:
            trackio.alert(
                title="Vanishing loss",
                text="Loss near zero, possible gradient collapse",
                level=trackio.AlertLevel.WARN,
            )

    trackio.finish()

Then poll from a separate terminal/process:

    trackio list alerts --project my-project --json --since "2025-01-01T00:00:00"
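When polling repeatedly, the `--since` value has to advance so that each round asks only for newer alerts. The sketch below shows one way to do that bookkeeping. It is an assumption-laden illustration: `alerts_command` and `poll_alerts` are hypothetical names, `fetch` stands for any callable that runs the command and returns its parsed output, and the timestamp format simply mirrors the example above. Whether the CLI treats `--since` as inclusive, or as local time versus UTC, is not stated here and should be checked against the reference.

```python
import time
from datetime import datetime
from typing import Any, Callable, List


def alerts_command(project: str, since: str) -> List[str]:
    """Build the `trackio list alerts` invocation for one poll."""
    return ["trackio", "list", "alerts", "--project", project,
            "--json", "--since", since]


def poll_alerts(project: str,
                fetch: Callable[[List[str]], Any],
                start: str,
                rounds: int,
                interval: float = 0.0,
                now: Callable[[], datetime] = datetime.now) -> List[Any]:
    """Poll `rounds` times, advancing the --since timestamp each round.

    Hypothetical helper: `fetch` and `now` are injected so the loop can be
    run without the CLI or a real clock.
    """
    since = start
    results: List[Any] = []
    for _ in range(rounds):
        # Record the time before fetching, so an alert fired while the
        # command runs is more likely to be seen twice than missed.
        checked_at = now().strftime("%Y-%m-%dT%H:%M:%S")
        results.append(fetch(alerts_command(project, since)))
        since = checked_at
        time.sleep(interval)
    return results
```

An agent would typically pass a `fetch` that shells out to the CLI and parses the JSON, then act on any non-empty result before the next round.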
- Weekly installs: 225
- Repository: huggingface/skills (GitHub stars: 9.9K)
- First seen: Jan 20, 2026
- Security audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)
- Installed on: opencode (190), gemini-cli (187), codex (185), claude-code (178), github-copilot (174), cursor (172)