Trackio - ML experiment tracking library: real-time monitoring of training metrics and alerts, with Hugging Face Spaces integration

hugging-face-trackio by huggingface/skills

225 weekly installs

10,000 GitHub Stars

GitHub

Install command

npx skills add https://github.com/huggingface/skills --skill hugging-face-trackio

AI/ML automation and monitoring

Trackio - Experiment Tracking for ML Training

Trackio is an experiment tracking library for logging and visualizing ML training metrics. It syncs to Hugging Face Spaces for real-time monitoring dashboards.

Three Interfaces

Task	Interface	Reference
Logging metrics during training	Python API	references/logging_metrics.md
Firing alerts for training diagnostics	Python API	references/alerts.md
Retrieving metrics & alerts after/during training	CLI	references/retrieving_metrics.md

When to Use Each

Python API → Logging

Use import trackio in your training scripts to log metrics:

Initialize tracking with trackio.init()
Log metrics with trackio.log() or use TRL's report_to="trackio"
Finalize with trackio.finish()

Key concept: For remote or cloud training, pass space_id so that metrics sync to a Space dashboard and persist after the instance terminates.

→ See references/logging_metrics.md for setup, TRL integration, and configuration options.

Python API → Alerts

Insert trackio.alert() calls in training code to flag important events — like inserting print statements for debugging, but structured and queryable:

trackio.alert(title="...", level=trackio.AlertLevel.WARN) — fire an alert
Three severity levels: INFO, WARN, ERROR
Alerts are printed to terminal, stored in the database, shown in the dashboard, and optionally sent to webhooks (Slack/Discord)

Key concept for LLM agents: Alerts are the primary mechanism for autonomous experiment iteration. An agent should insert alerts into training code for diagnostic conditions such as loss spikes, NaN gradients, low accuracy, and training stalls. Because alerts are printed to the terminal, an agent watching the training script's output sees them automatically. For background or detached runs, the agent can poll for them via the CLI instead.
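The diagnostic conditions above can be turned into plain checks that run once per step. The following is a minimal, library-independent sketch. The class name `TrainingMonitor`, its thresholds, and the string levels "INFO"/"WARN"/"ERROR" (standing in for trackio.AlertLevel) are all hypothetical choices and not part of trackio. Each returned dict holds a title, text, and level that could be passed on to trackio.alert().

```python
import math
from collections import deque


class TrainingMonitor:
    """Hypothetical helper that turns per-step losses into alert dicts.

    Thresholds are illustrative assumptions, not trackio defaults. Levels are
    plain strings mirroring trackio's INFO / WARN / ERROR severities.
    """

    def __init__(self, window=50, spike_factor=3.0, patience=200):
        self.recent = deque(maxlen=window)  # recent finite losses
        self.spike_factor = spike_factor    # spike: loss > factor * recent mean
        self.patience = patience            # stall: no new best for this many steps
        self.best = math.inf
        self.best_step = 0

    def check(self, step, loss):
        """Return a (possibly empty) list of alert dicts for this step."""
        alerts = []
        if not math.isfinite(loss):
            alerts.append({"title": "Non-finite loss",
                           "text": f"Loss is {loss} at step {step}",
                           "level": "ERROR"})
            return alerts  # NaN/inf would poison the other checks, so stop here
        if self.recent:
            mean = sum(self.recent) / len(self.recent)
            if mean > 0 and loss > self.spike_factor * mean:
                alerts.append({"title": "Loss spike",
                               "text": f"Loss {loss:.4f} exceeds {self.spike_factor}x "
                                       f"the recent mean {mean:.4f} at step {step}",
                               "level": "WARN"})
        if loss < self.best:
            self.best, self.best_step = loss, step
        elif step - self.best_step >= self.patience:
            alerts.append({"title": "Training stall",
                           "text": f"No new best loss (best {self.best:.4f}) "
                                   f"for {step - self.best_step} steps",
                           "level": "WARN"})
            self.best_step = step  # restart the count so the alert does not repeat every step
        self.recent.append(loss)
        return alerts
```

Inside a training loop, each returned dict could be forwarded as trackio.alert(title=a["title"], text=a["text"], level=getattr(trackio.AlertLevel, a["level"])). This assumes the enum members are named INFO, WARN and ERROR, as listed above. Keeping the checks separate from the trackio calls also lets them be tested without a live run.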

→ See references/alerts.md for the full alerts API, webhook setup, and autonomous agent workflows.

CLI → Retrieving

Use the trackio command to query logged metrics and alerts:

trackio list projects/runs/metrics — discover what's available
trackio get project/run/metric — retrieve summaries and values
trackio list alerts --project <name> --json — retrieve alerts
trackio show — launch the dashboard
trackio sync — sync to HF Space

Key concept: Add --json to get machine-readable output suited to automation and LLM agents.

→ See references/retrieving_metrics.md for all commands, workflows, and JSON output formats.
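With --json, an agent can parse CLI output directly instead of scraping text. The following is a minimal sketch. It assumes the trackio executable is on PATH and prints a single JSON document to stdout. The helper name `trackio_json` is hypothetical. The sketch assumes nothing about the fields in the parsed result, because their structure is defined by the CLI (see references/retrieving_metrics.md).

```python
import json
import subprocess


def trackio_json(*args, runner=subprocess.run):
    """Run a trackio CLI command with --json appended and return the parsed output.

    `runner` defaults to subprocess.run; it is injectable so the helper can be
    exercised without trackio installed.
    """
    cmd = ["trackio", *args, "--json"]
    # check=True raises subprocess.CalledProcessError on a non-zero exit status.
    result = runner(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

For example, trackio_json("get", "metric", "--project", "my-project", "--run", "my-run", "--metric", "loss") issues the same query as trackio get metric --project my-project --run my-run --metric loss --json.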

Minimal Logging Setup

import trackio

trackio.init(project="my-project", space_id="username/trackio")
trackio.log({"loss": 0.1, "accuracy": 0.9})
trackio.log({"loss": 0.09, "accuracy": 0.91})
trackio.finish()

Minimal Retrieval

trackio list projects --json
trackio get metric --project my-project --run my-run --metric loss --json

Autonomous ML Experiment Workflow

When running experiments autonomously as an LLM agent, the recommended workflow is:

Set up training with alerts — insert trackio.alert() calls for diagnostic conditions
Launch training — run the script in the background
Poll for alerts — use trackio list alerts --project <name> --json --since <timestamp> to check for new alerts
Read metrics — use trackio get metric ... to inspect specific values
Iterate — based on alerts and metrics, stop the run, adjust hyperparameters, and launch a new run

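import random

import trackio

num_steps = 1000  # hypothetical: total number of training steps


def train_step():
    # Hypothetical placeholder: replace with a real forward/backward pass
    # that returns this step's scalar training loss.
    return random.uniform(0.0, 1.0)


trackio.init(project="my-project", config={"lr": 1e-4})

for step in range(num_steps):
    loss = train_step()
    trackio.log({"loss": loss, "step": step})

    # Loss still high well past warm-up: likely divergence.
    if step > 100 and loss > 5.0:
        trackio.alert(
            title="Loss divergence",
            text=f"Loss {loss:.4f} still high after {step} steps",
            level=trackio.AlertLevel.ERROR,
        )
    # Loss collapsing to zero is suspicious: warn rather than stop.
    if step > 0 and abs(loss) < 1e-8:
        trackio.alert(
            title="Vanishing loss",
            text="Loss near zero; possible gradient collapse",
            level=trackio.AlertLevel.WARN,
        )

trackio.finish()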

Then poll from a separate terminal/process:

trackio list alerts --project my-project --json --since "2025-01-01T00:00:00"
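The polling step can be wrapped in a small loop. The following is a sketch under stated assumptions, not part of trackio. It records the launch time in the same YYYY-MM-DDTHH:MM:SS form as the --since example above. It then calls trackio list alerts --project <name> --json --since <timestamp> through an injectable fetch function and stops once a caller-supplied predicate says to. The names `fetch_alerts`, `poll_alerts`, and `should_stop` are hypothetical. Whether the CLI reads the timestamp as local time or UTC is also an assumption here.

```python
import json
import subprocess
import time
from datetime import datetime


def fetch_alerts(project, since):
    """Hypothetical fetcher: shell out to the trackio CLI and parse its JSON."""
    cmd = ["trackio", "list", "alerts", "--project", project,
           "--json", "--since", since]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return json.loads(out)


def poll_alerts(project, since, should_stop, fetch=fetch_alerts,
                interval_s=60.0, max_polls=60, sleep=time.sleep):
    """Poll until should_stop(alerts) is true or max_polls is reached.

    Returns the alerts from the poll that triggered the stop, or None.
    """
    for attempt in range(max_polls):
        alerts = fetch(project, since)
        if alerts and should_stop(alerts):
            return alerts
        if attempt < max_polls - 1:
            sleep(interval_s)  # wait before the next poll
    return None


# Captured just before launching training, formatted like the --since example.
launch_time = datetime.now().strftime("%Y-%m-%dT%H:%M:%S")
```

An agent would launch training in the background and then call poll_alerts("my-project", launch_time, should_stop) with a predicate that matches ERROR-level alerts. The field to test depends on the actual JSON output, so inspect one manual trackio list alerts ... --json result first. Keeping `since` fixed at launch time means each poll sees every alert from the current run. This keeps the stop decision simple, at the cost of re-reading old alerts.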

First Seen

Jan 20, 2026

Security Audits

Gen Agent Trust Hub: Pass; Socket: Pass; Snyk: Pass

Installed on

opencode: 190
gemini-cli: 187
codex: 185
claude-code: 178
github-copilot: 174
cursor: 172