build-review-interface by hamelsmu/evals-skills
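npx skills add https://github.com/hamelsmu/evals-skills --skill build-review-interface
Build an HTML page that loads traces from a data source (JSON/CSV file), displays one trace at a time with Pass/Fail buttons, a free-text notes field, and Next/Previous navigation. Save labels to a local file (CSV/SQLite/JSON). Then customize to the domain using the guidelines below.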
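As a rough illustration of the load-and-save half of this, the sketch below reads traces from a JSON or CSV file and appends labels to a CSV file. The function names, the label columns (`trace_id`, `verdict`, `notes`), and the assumption that a JSON file holds a top-level array are all hypothetical choices, not part of the skill itself.

```python
import csv
import json
from pathlib import Path


def load_traces(path):
    """Load traces from a JSON array file or a CSV file with a header row."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".json":
        with path.open(encoding="utf-8") as f:
            return json.load(f)
    if suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
    raise ValueError(f"Unsupported trace file type: {path.suffix}")


def save_label(labels_path, trace_id, verdict, notes=""):
    """Append one label row; write the header only when the file is new."""
    labels_path = Path(labels_path)
    is_new = not labels_path.exists()
    with labels_path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if is_new:
            # Hypothetical column layout for the labels file.
            writer.writerow(["trace_id", "verdict", "notes"])
        writer.writerow([trace_id, verdict, notes])
```

Appending one row per action keeps earlier labels on disk if the review session is interrupted; a small local server or the browser's file APIs would have to call something like this from the HTML page.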
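Format all data in the most human-readable representation for the domain. Emails should look like emails. Code should have syntax highlighting. Markdown should be rendered. Tables should be tables. JSON should be pretty-printed and collapsible.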
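For the JSON case, one possible approach is to generate an HTML fragment that wraps pretty-printed JSON in a `<details>` element. This is a minimal sketch; the function name and the `summary` default are assumptions made for illustration.

```python
import html
import json


def render_json_collapsible(value, summary="JSON"):
    """Return an HTML fragment with pretty-printed JSON inside <details>."""
    pretty = json.dumps(value, indent=2, ensure_ascii=False, sort_keys=True)
    return (
        "<details>"
        f"<summary>{html.escape(summary)}</summary>"
        # Escape the JSON text so trace content cannot inject markup.
        f"<pre><code>{html.escape(pretty)}</code></pre>"
        "</details>"
    )
```

Escaping matters here because trace payloads often contain angle brackets or quotes that would otherwise be interpreted as HTML.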
<details> toggle.
Annotate at the trace level. The reviewer judges the whole trace, not individual spans.
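Once you have established failure categories from error analysis, you can later add predefined failure mode tags as clickable checkboxes, dropdowns or picklists so reviewers can select from known categories in addition to writing notes. But don't add these in the initial build.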
Arrow keys = Navigate traces
1 = Pass    2 = Fail
D = Defer    U = Undo last action
Cmd+S = Save    Cmd+Enter = Save and next
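The dispatch logic behind these shortcuts, including undo, can be modeled independently of the browser. The sketch below is written in Python to show the state handling only; the real page would implement the same idea in JavaScript. The action names, the `ReviewSession` class, and the choice of left/right arrows for previous/next are assumptions, and the Cmd+S and Cmd+Enter shortcuts are left out.

```python
from dataclasses import dataclass, field

# Shortcut table based on the list above; action names are hypothetical.
# "ArrowLeft"/"ArrowRight" are the values a browser reports in KeyboardEvent.key.
KEY_ACTIONS = {
    "ArrowLeft": "previous",
    "ArrowRight": "next",
    "1": "pass",
    "2": "fail",
    "d": "defer",
    "u": "undo",
}


@dataclass
class ReviewSession:
    trace_ids: list
    index: int = 0
    verdicts: dict = field(default_factory=dict)
    history: list = field(default_factory=list)  # (trace_id, previous verdict)

    def handle_key(self, key):
        """Apply the action bound to `key` and return its name (or None)."""
        # Single-character keys are matched case-insensitively (D or d).
        action = KEY_ACTIONS.get(key if len(key) > 1 else key.lower())
        if action == "next":
            self.index = min(self.index + 1, len(self.trace_ids) - 1)
        elif action == "previous":
            self.index = max(self.index - 1, 0)
        elif action in ("pass", "fail", "defer"):
            trace_id = self.trace_ids[self.index]
            # Remember the prior verdict so undo can restore it.
            self.history.append((trace_id, self.verdicts.get(trace_id)))
            self.verdicts[trace_id] = action
        elif action == "undo" and self.history:
            trace_id, previous = self.history.pop()
            if previous is None:
                self.verdicts.pop(trace_id, None)
            else:
                self.verdicts[trace_id] = previous
        return action
```

Keeping a history of previous verdicts, rather than a single "last action" slot, lets a reviewer undo several mistakes in a row.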
Build the app to accept traces from any source (JSON/CSV file). Keep sampling logic outside the app in a separate script. Start with random sampling.
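A separate sampling script along these lines might look like the following sketch. The script name, the argument order, and the fixed seed are hypothetical; the input is assumed to be a JSON file containing an array of traces.

```python
import json
import random
import sys


def sample_traces(traces, n, seed=None):
    """Return up to n traces chosen uniformly at random without replacement."""
    rng = random.Random(seed)
    return rng.sample(traces, min(n, len(traces)))


def main(argv):
    # Hypothetical usage: python sample_traces.py all.json sample.json 100
    src, dst, n = argv[1], argv[2], int(argv[3])
    with open(src, encoding="utf-8") as f:
        traces = json.load(f)
    with open(dst, "w", encoding="utf-8") as f:
        # A fixed seed makes the sample reproducible across runs.
        json.dump(sample_traces(traces, n, seed=0), f, indent=2)


if __name__ == "__main__" and len(sys.argv) == 4:
    main(sys.argv)
```

Because the review app only ever reads the output file, the sampling strategy can later be swapped for something more targeted without touching the interface.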
Reference panel: Toggle-able panel showing ground truth, expected answers, or rubric definitions alongside the trace.
Filtering: Filter traces by metadata dimensions relevant to the product (channel, user type, pipeline version).
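Metadata filtering of this kind can be sketched as a small helper. It assumes each trace is a dict with a `metadata` dict inside it; that layout and the key names in the usage example are illustrative, not prescribed by the skill.

```python
def filter_traces(traces, **criteria):
    """Keep traces whose metadata matches every given key/value pair."""
    return [
        t for t in traces
        if all(t.get("metadata", {}).get(k) == v for k, v in criteria.items())
    ]


# Hypothetical example data.
traces = [
    {"id": "t1", "metadata": {"channel": "email", "user_type": "free"}},
    {"id": "t2", "metadata": {"channel": "chat", "user_type": "free"}},
    {"id": "t3", "metadata": {"channel": "email", "user_type": "paid"}},
]
email_free = filter_traces(traces, channel="email", user_type="free")
```

Calling `filter_traces` with no criteria returns every trace, which is a convenient default for an "all" view.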
Clustering: Group traces by metadata or semantic similarity. Show representative traces per cluster with drill-down.
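The metadata variant of clustering can be approximated with a plain group-by, as in the sketch below. It again assumes a `metadata` dict on each trace, and it treats the first members of each group as "representative", which is a simplification; semantic-similarity clustering would need embeddings and is not shown.

```python
from collections import defaultdict


def cluster_by_metadata(traces, key):
    """Group traces by one metadata key; missing values go to 'unknown'."""
    clusters = defaultdict(list)
    for t in traces:
        clusters[t.get("metadata", {}).get(key, "unknown")].append(t)
    return dict(clusters)


def representatives(clusters, per_cluster=1):
    """Pick the first few traces of each cluster as its representatives."""
    return {name: members[:per_cluster] for name, members in clusters.items()}
```

The full member list per cluster stays available in the returned dict, so a drill-down view can show it when a reviewer opens a cluster.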
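After building the interface, verify it with Playwright.
Visual review: Take screenshots of the interface with representative trace data loaded. Review each screenshot for: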
Functional test: Write a Playwright script that performs a full annotation workflow:
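The individual workflow steps are not listed in this copy of the text, so the following is only a sketch of one plausible sequence using Playwright's Python sync API: open the page, mark the first trace as Pass, add a note, save and advance, then take a screenshot. The selectors `#trace-id` and `#notes`, the function name, and the note text are hypothetical and would need to match the actual interface.

```python
def run_annotation_workflow(url, screenshot_path="review.png"):
    """Label one trace as Pass with a note, save, and confirm the next trace loads."""
    # Imported lazily so this module can be loaded where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)

        first_id = page.inner_text("#trace-id")   # hypothetical selector
        page.keyboard.press("1")                   # 1 = Pass
        page.fill("#notes", "Looks correct.")      # hypothetical selector
        page.keyboard.press("Meta+Enter")          # Cmd+Enter = Save and next

        # Wait until the displayed trace ID differs from the first one.
        page.wait_for_function(
            "id => document.querySelector('#trace-id').innerText !== id",
            arg=first_id,
        )
        page.screenshot(path=screenshot_path, full_page=True)
        second_id = page.inner_text("#trace-id")
        browser.close()
    return first_id, second_id
```

A fuller test would also read back the labels file and check that the verdict and note were written for `first_id`.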
Weekly Installs: 134
Repository: hamelsmu/evals-skills
GitHub Stars: 955
First Seen: Mar 3, 2026
Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)
Installed on: codex (132), gemini-cli (131), kimi-cli (131), github-copilot (131), cursor (131), opencode (131)