golden-dataset by yonatangross/orchestkit
```bash
npx skills add https://github.com/yonatangross/orchestkit --skill golden-dataset
```
Comprehensive patterns for building, managing, and validating golden datasets for AI/ML evaluation. Each category has its own rule files in rules/, loaded on demand.
| Category | Rules | Impact | When to Use |
|---|---|---|---|
| Curation | 3 | HIGH | Content collection, annotation pipelines, diversity analysis |
| Management | 3 | HIGH | Versioning, backup/restore, CI/CD automation |
| Validation | 3 | CRITICAL | Quality scoring, drift detection, regression testing |
| Add Workflow | 1 | HIGH | 9-phase curation, quality scoring, bias detection, silver-to-gold |
Total: 10 rules across 4 categories
Curation: content collection, multi-agent annotation, and diversity analysis for golden datasets.
| Rule | File | Key Pattern |
|---|---|---|
| Collection | rules/curation-collection.md | Content type classification, quality thresholds, duplicate prevention |
| Annotation | rules/curation-annotation.md | Multi-agent pipeline, consensus aggregation, Langfuse tracing |
| Diversity | rules/curation-diversity.md | Difficulty stratification, domain coverage, balance guidelines |
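The Diversity rule's difficulty stratification can be checked with a per-level count. The sketch below is illustrative, not the skill's own code: it assumes each entry carries a lowercase "difficulty" field (a hypothetical schema) and uses the minimums from the difficulty-balance recommendation (Trivial 3, Easy 3, Medium 5, Hard 3).

```python
from collections import Counter

# Minimum entries per difficulty level, from the "Difficulty balance" recommendation.
MIN_PER_DIFFICULTY = {"trivial": 3, "easy": 3, "medium": 5, "hard": 3}


def difficulty_gaps(entries: list[dict]) -> dict[str, int]:
    """Return how many more entries each under-filled difficulty level needs.

    Assumes a hypothetical lowercase "difficulty" field on every entry.
    Levels that already meet their minimum are left out of the result.
    """
    counts = Counter(entry.get("difficulty") for entry in entries)
    return {
        level: minimum - counts[level]  # Counter returns 0 for missing levels
        for level, minimum in MIN_PER_DIFFICULTY.items()
        if counts[level] < minimum
    }


if __name__ == "__main__":
    sample = [{"difficulty": "easy"}] * 3 + [{"difficulty": "medium"}] * 2
    print(difficulty_gaps(sample))  # {'trivial': 3, 'medium': 3, 'hard': 3}
```

An empty result means the dataset meets every minimum; a CI job could fail whenever the dictionary is non-empty.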
Management: versioning, storage, and CI/CD automation for golden datasets.
| Rule | File | Key Pattern |
|---|---|---|
| Versioning | rules/management-versioning.md | JSON backup format, embedding regeneration, disaster recovery |
| Storage | rules/management-storage.md | Backup strategies, URL contract, data integrity checks |
| CI Integration | rules/management-ci.md | GitHub Actions automation, pre-deployment validation, weekly backups |
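The versioning and storage rules call for JSON backups that leave embeddings out and regenerate them on restore, plus integrity checks. Here is a minimal sketch of that round trip. The field names, the version tag, and the SHA-256 checksum used as the integrity check are assumptions, not the format defined in rules/management-versioning.md. The `embed` callable stands in for whatever embedding function the project uses.

```python
import hashlib
import json
from typing import Callable

# Hypothetical field names; the real schema is defined by the skill's rule files.
EMBEDDING_FIELD = "embedding"
TEXT_FIELD = "content"


def _checksum(entries: list) -> str:
    """SHA-256 over a canonical (sorted-key) JSON encoding of the entries."""
    canonical = json.dumps(entries, ensure_ascii=False, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def backup_to_json(entries: list, version: str) -> str:
    """Serialize entries without embeddings, adding a version tag and checksum."""
    stripped = [
        {key: value for key, value in entry.items() if key != EMBEDDING_FIELD}
        for entry in entries
    ]
    payload = {"version": version, "checksum": _checksum(stripped), "entries": stripped}
    return json.dumps(payload, ensure_ascii=False, indent=2, sort_keys=True)


def restore_from_json(raw: str, embed: Callable[[str], list]) -> list:
    """Verify the checksum, then regenerate each entry's embedding with `embed`."""
    payload = json.loads(raw)
    entries = payload["entries"]
    if _checksum(entries) != payload["checksum"]:
        raise ValueError("backup checksum mismatch; refusing to restore")
    return [{**entry, EMBEDDING_FIELD: embed(entry[TEXT_FIELD])} for entry in entries]
```

Omitting embeddings keeps the backup small and diff-friendly under version control. It also lets a restore pick up a newer embedding model without any migration.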
Validation: quality scoring, drift detection, and regression testing for golden datasets.
| Rule | File | Key Pattern |
|---|---|---|
| Quality | rules/validation-quality.md | Schema validation, content quality, referential integrity |
| Drift | rules/validation-drift.md | Duplicate detection, semantic similarity, coverage gap analysis |
| Regression | rules/validation-regression.md | Difficulty distribution, pre-commit hooks, full dataset validation |
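Duplicate detection based on semantic similarity can be expressed as a nearest-neighbour check against the skill's thresholds (block at >= 0.90, warn at >= 0.85). The sketch below assumes embeddings are already computed and uses a brute-force cosine scan. A real pipeline would more likely query its vector store; the function names here are hypothetical.

```python
import math
from typing import Optional

BLOCK_THRESHOLD = 0.90  # similarity at or above this blocks the addition
WARN_THRESHOLD = 0.85   # similarity at or above this only warns


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def duplicate_check(
    candidate: list[float], existing: dict[str, list[float]]
) -> tuple[str, Optional[str], float]:
    """Return ("block" | "warn" | "ok", closest entry id, best similarity)."""
    best_id: Optional[str] = None
    best_score = 0.0
    for entry_id, vector in existing.items():
        score = cosine_similarity(candidate, vector)
        if score > best_score:
            best_id, best_score = entry_id, score
    if best_score >= BLOCK_THRESHOLD:
        return "block", best_id, best_score
    if best_score >= WARN_THRESHOLD:
        return "warn", best_id, best_score
    return "ok", best_id, best_score


if __name__ == "__main__":
    existing = {"doc-1": [1.0, 0.0], "doc-2": [0.0, 1.0]}
    status, closest, score = duplicate_check([0.87, 0.5], existing)
    print(status, closest)  # warn doc-1
```

Returning the closest id along with the verdict lets a reviewer compare the two documents when the check only warns.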
Add Workflow: a structured workflow for adding new documents to the golden dataset.
| Rule | File | Key Pattern |
|---|---|---|
| Add Document | rules/curation-add-workflow.md | 9-phase curation, parallel quality analysis, bias detection |
```python
# The original imports embed_text from app.shared.services.embeddings, but this
# excerpt never calls it, so the project-specific import is omitted here.


async def validate_before_add(document: dict, source_url_map: dict) -> dict:
    """Pre-addition validation for golden dataset entries."""
    # source_url_map is unused in this excerpt; it is kept to match the original signature.
    errors = []

    # 1. URL contract check: reject placeholder URLs
    if "placeholder" in document.get("source_url", ""):
        errors.append("URL must be canonical, not a placeholder")

    # 2. Content quality: titles need at least 10 characters
    if len(document.get("title", "")) < 10:
        errors.append("Title too short (min 10 chars)")

    # 3. Tag requirements: at least 2 domain tags
    if len(document.get("tags", [])) < 2:
        errors.append("At least 2 domain tags required")

    return {"valid": len(errors) == 0, "errors": errors}
```
| Decision | Recommendation |
|---|---|
| Backup format | JSON (version controlled, portable) |
| Embedding storage | Exclude from backup (regenerate on restore) |
| Quality threshold | >= 0.70 quality score for inclusion |
| Confidence threshold | >= 0.65 for auto-include |
| Duplicate threshold | >= 0.90 similarity blocks, >= 0.85 warns |
| Min tags per entry | 2 domain tags |
| Min test queries | 3 per document |
| Difficulty balance | Trivial 3, Easy 3, Medium 5, Hard 3 minimum |
| CI frequency | Weekly automated backup (Sunday 2am UTC) |
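Several decisions in the table combine into one inclusion gate: a quality score of at least 0.70, confidence of at least 0.65 for auto-include, at least 2 domain tags, and at least 3 test queries per document. The sketch below is a hypothetical illustration. The field names are assumptions, and so is the choice to send entries that pass the quality bar but miss the confidence bar to manual review instead of rejecting them.

```python
QUALITY_MIN = 0.70              # minimum quality score for inclusion
AUTO_INCLUDE_CONFIDENCE = 0.65  # confidence needed to include without review
MIN_TAGS = 2                    # domain tags per entry
MIN_TEST_QUERIES = 3            # test queries per document


def inclusion_decision(entry: dict) -> str:
    """Return "reject", "review", or "include" for a candidate entry.

    Field names (tags, test_queries, quality_score, confidence) are hypothetical.
    Routing low-confidence entries to "review" is an assumption.
    """
    if len(entry.get("tags", [])) < MIN_TAGS:
        return "reject"
    if len(entry.get("test_queries", [])) < MIN_TEST_QUERIES:
        return "reject"
    if entry.get("quality_score", 0.0) < QUALITY_MIN:
        return "reject"
    if entry.get("confidence", 0.0) >= AUTO_INCLUDE_CONFIDENCE:
        return "include"
    return "review"


if __name__ == "__main__":
    candidate = {
        "tags": ["rag", "retrieval"],
        "test_queries": ["q1", "q2", "q3"],
        "quality_score": 0.82,
        "confidence": 0.58,
    }
    print(inclusion_decision(candidate))  # review
```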
See test-cases.json for 9 test cases across all categories.
Related skills:
- ork:rag-retrieval - Retrieval evaluation using golden datasets
- langfuse-observability - Tracing patterns for curation workflows
- ork:testing-unit - Unit testing patterns and strategies
- ai-native-development - Embedding generation for restore

Keywords: golden dataset, curation, content collection, annotation, quality criteria
Solves:
Keywords: golden dataset, backup, restore, versioning, disaster recovery
Solves:
Keywords: golden dataset, validation, schema, duplicate detection, quality metrics
Solves:
Weekly Installs: 68
Repository: https://github.com/yonatangross/orchestkit
GitHub Stars: 132
First Seen: Feb 14, 2026
Security Audits: Gen Agent Trust Hub: Pass; Socket: Pass; Snyk: Warn
Installed on: gemini-cli (66), opencode (66), codex (65), github-copilot (65), cursor (64), cline (60)