MLflow Metrics Query Script - Extract Performance and Evaluation Data from a Tracking Server

querying-mlflow-metrics by mlflow/skills

122 weekly installs

24 GitHub Stars

Install command

npx skills add https://github.com/mlflow/skills --skill querying-mlflow-metrics

AI/Machine Learning, Data Analysis, Monitoring

Arguments

Arg	Required	Description
`-s, --server`	Yes	MLflow server URL
`-x, --experiment-ids`	Yes	Experiment IDs (comma-separated)
`-m, --metric`	Yes	`trace_count`, `latency`, `input_tokens`, `output_tokens`, `total_tokens`
`-a, --aggregations`	Yes	`COUNT`, `SUM`, `AVG`, `MIN`, `MAX`, `P50`, `P95`, `P99`
`-d, --dimensions`	No	Group by: `trace_name`, `trace_status`
`-t, --time-interval`	No	Bucket size in seconds (3600 = hourly, 86400 = daily)
`--start-time`	No	`-24h`, `-7d`, `now`, ISO 8601, or epoch milliseconds
`--end-time`	No	Same formats as `--start-time`
`-o, --output`	No	`table` (default) or `json`

MLflow Metrics

Run scripts/fetch_metrics.py to query metrics from an MLflow tracking server.

Examples

Token usage summary:

python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -m total_tokens -a SUM,AVG

Output: AVG: 223.91 SUM: 7613
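
The two aggregates in this output are linked by the number of traces: dividing SUM by AVG recovers the count. The short Python check below uses only the numbers shown above and assumes both aggregates were computed over the same set of traces; the variable names are illustrative.

```python
# Values copied from the example output above.
total_tokens_sum = 7613
total_tokens_avg = 223.91

# SUM / AVG recovers the number of traces behind the aggregates
# (assuming both cover the same traces).
trace_count = round(total_tokens_sum / total_tokens_avg)
print(trace_count)  # 34

# Sanity check: the rounded count reproduces the reported average.
print(round(total_tokens_sum / trace_count, 2))  # 223.91
```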

Hourly token trend (last 24h):

python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -m total_tokens -a SUM \
    -t 3600 --start-time="-24h" --end-time=now

Output: Time-bucketed token sums per hour
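
To make the bucketing concrete, the sketch below computes the bucket start times that a 24-hour window with a 3600-second interval would produce. It is only an illustration of the arithmetic: `bucket_starts` is a hypothetical helper, and how the script or server actually aligns bucket boundaries is not stated here.

```python
from datetime import datetime, timedelta, timezone


def bucket_starts(end: datetime, window: timedelta, interval_s: int) -> list[datetime]:
    """Return the start of each fixed-size bucket covering [end - window, end)."""
    start = end - window
    n_buckets = int(window.total_seconds() // interval_s)
    return [start + timedelta(seconds=i * interval_s) for i in range(n_buckets)]


# A fixed "now" keeps the example reproducible.
now = datetime(2026, 2, 5, 12, 0, tzinfo=timezone.utc)
buckets = bucket_starts(now, timedelta(hours=24), 3600)

print(len(buckets))             # 24
print(buckets[0].isoformat())   # 2026-02-04T12:00:00+00:00
print(buckets[-1].isoformat())  # 2026-02-05T11:00:00+00:00
```

With `-t 86400` over the same window, the same arithmetic yields a single daily bucket.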

Latency percentiles by trace:

python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -m latency -a AVG,P95 -d trace_name

Trace count by status (the basis for an error rate):

python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -m trace_count -a COUNT -d trace_status

Quality scores by evaluator (assessments):

python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -v ASSESSMENTS \
    -m assessment_value -a AVG,P50 -d assessment_name

Output: Average and median scores for each evaluator (e.g., correctness, relevance)

Assessment count by name:

python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -v ASSESSMENTS \
    -m assessment_count -a COUNT -d assessment_name

JSON output: Add -o json to any command.

For SPANS metrics (span_count, latency), add -v SPANS. For ASSESSMENTS metrics, add -v ASSESSMENTS.
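
The pairing of views and metrics can be written down as a small lookup. This is a sketch that lists only the metrics mentioned on this page, so the ASSESSMENTS entry in particular may be incomplete; `METRICS_BY_VIEW` and `view_args` are hypothetical names, and `None` stands for running without `-v`.

```python
# Metrics named on this page, grouped by the view they are used with.
# None means no -v flag is passed.
METRICS_BY_VIEW = {
    None: {"trace_count", "latency", "input_tokens", "output_tokens", "total_tokens"},
    "SPANS": {"span_count", "latency"},
    "ASSESSMENTS": {"assessment_value", "assessment_count"},
}


def view_args(view, metric):
    """Return the extra CLI arguments for a view, or raise if the metric is not listed for it."""
    if metric not in METRICS_BY_VIEW[view]:
        raise ValueError(f"{metric!r} is not listed for view {view!r}")
    return [] if view is None else ["-v", view]


print(view_args("SPANS", "span_count"))  # ['-v', 'SPANS']
print(view_args(None, "total_tokens"))   # []
```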

See references/api_reference.md for filter syntax and full API details.

Repository

mlflow/skills

First Seen

Feb 5, 2026

Security Audits

Gen Agent Trust Hub: Pass
Socket: Pass
Snyk: Pass

Installed on

gemini-cli: 88
github-copilot: 88
codex: 87
opencode: 86
kimi-cli: 85
amp: 85