
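RAG and Search Engineering Complete Guide: Best Practices for Building Production-Grade Retrieval-Augmented Generation Systems

ai-rag by vasilyu1983/ai-agents-public

86 weekly installs

53 GitHub Stars

Install command

npx skills add https://github.com/vasilyu1983/ai-agents-public --skill ai-rag

Tags: AI/Machine Learning, Search, Natural Language Processing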


RAG & Search Engineering — Complete Reference

Build production-grade retrieval systems with hybrid search, grounded generation, and measurable quality.

This skill covers:

RAG: Chunking, contextual retrieval, grounding, adaptive/self-correcting systems
Search: BM25, vector search, hybrid fusion, ranking pipelines
Evaluation: recall@k, nDCG, MRR, groundedness metrics

Modern Best Practices (Jan 2026):

Separate retrieval quality from answer quality; evaluate both (RAG: https://arxiv.org/abs/2005.11401).
Default to hybrid retrieval (sparse + dense) with reranking when precision matters (DPR: https://arxiv.org/abs/2004.04906).
Use a failure taxonomy to debug systematically (Seven Failure Points in RAG: https://arxiv.org/abs/2401.05856).
Treat freshness/invalidation as first-class; staleness is a correctness bug, not a UX issue.
Add grounding gates: answerability checks, citation coverage checks, and refusal-on-missing-context defaults.
Threat-model RAG: retrieved text is untrusted input (OWASP LLM Top 10: https://owasp.org/www-project-top-10-for-large-language-model-applications/).
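The grounding gates in the list above can be sketched as follows. This is a minimal illustration, not the skill's own implementation: the `Evidence` type, the score threshold, the refusal message, and the idea that the generator returns cited IDs per sentence are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    # Hypothetical evidence record; field names are assumptions.
    doc_id: str
    text: str
    score: float  # retrieval or reranker score, higher is better


REFUSAL = "I don't have enough evidence to answer that."


def is_answerable(evidence, min_score=0.5, min_chunks=1):
    """Answerability gate: require enough sufficiently relevant chunks."""
    strong = [e for e in evidence if e.score >= min_score]
    return len(strong) >= min_chunks


def citation_coverage(cited_ids_per_sentence, evidence):
    """Fraction of answer sentences that cite at least one retrieved ID."""
    if not cited_ids_per_sentence:
        return 0.0
    known = {e.doc_id for e in evidence}
    covered = sum(
        1 for ids in cited_ids_per_sentence if any(i in known for i in ids)
    )
    return covered / len(cited_ids_per_sentence)


def gate_answer(answer, cited_ids_per_sentence, evidence, min_coverage=1.0):
    """Return the answer only if both gates pass; otherwise refuse."""
    if not is_answerable(evidence):
        return REFUSAL  # refusal-on-missing-context default
    if citation_coverage(cited_ids_per_sentence, evidence) < min_coverage:
        return REFUSAL  # some sentence is not backed by retrieved evidence
    return answer
```

The thresholds (`min_score`, `min_coverage`) are placeholders; in practice they would be tuned against a labelled test set.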

Default posture: deterministic pipeline, bounded context, explicit failure handling, and telemetry for every stage.

Scope note: For prompt structure and output contracts used in the generation phase, see ai-prompt-engineering.

Quick Reference

Task	Tool/Framework	Command/Pattern	When to Use
Decide RAG vs alternatives	Decision framework	Use RAG if you need freshness + citations + a large corpus; otherwise consider fine-tuning or caching	Avoid unnecessary retrieval latency/complexity
Chunking & parsing	Chunker + parser	Start simple; add structure-aware chunking per doc type	Ingestion for docs, code, tables, PDFs
Retrieval	Sparse + dense (hybrid)	Fusion (e.g., RRF) + metadata filters + top-k tuning	Mixed query styles; high recall requirements
Precision boost	Reranker	Cross-encoder/LLM rerank of top-k candidates	When top-k contains near-misses/noise
Grounding	Output contract + citations	Quote/ID citations; answerability gate; refuse on missing evidence	Compliance, trust, and auditability
Evaluation	Offline + online eval	Retrieval metrics + answer metrics + regression tests	Prevent silent regressions and failures caused by stale data
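The fusion step named in the Retrieval row (RRF, reciprocal rank fusion) can be sketched in a few lines. This is an illustrative version, assuming each retriever returns a ranked list of document IDs; the function name and the constant `k=60` (a commonly used default) are choices made for this example, not something the skill prescribes.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_k=10):
    """Fuse several ranked lists of doc IDs with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending; tie-break on doc_id so output is stable.
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, _ in ordered[:top_k]]


sparse_hits = ["a", "b", "c"]  # e.g. a BM25 ranking
dense_hits = ["b", "d", "a"]   # e.g. a vector-search ranking
fused = rrf_fuse([sparse_hits, dense_hits])
```

Because RRF uses only ranks, it does not need sparse and dense scores to be on a comparable scale, which is one reason it is a convenient baseline for hybrid retrieval.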

Decision Tree: RAG Architecture Selection

Building RAG system: [Architecture Path]
    ├─ Document type?
    │   ├─ Page/section-structured? → Structure-aware chunking (pages/sections + metadata)
    │   ├─ Technical docs/code? → Structure-aware + code-aware chunking (symbols, headers)
    │   └─ Simple content? → Fixed-size token chunking with overlap (baseline)
    │
    ├─ Retrieval accuracy low?
    │   ├─ Query ambiguity? → Query rewriting + multi-query expansion + filters
    │   ├─ Noisy results? → Add reranker + better metadata filters
    │   └─ Mixed queries? → Hybrid retrieval (sparse + dense) + reranking
    │
    ├─ Dataset size?
    │   ├─ <100k chunks? → Flat index (exact search)
    │   ├─ 100k-10M? → HNSW (low latency)
    │   └─ >10M? → IVF/ScaNN/DiskANN (scalable)
    │
    └─ Production quality?
        └─ Add: ACLs, freshness/invalidation, eval gates, and telemetry (end-to-end)
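Two branches of the tree above are simple enough to sketch directly: the fixed-size baseline chunker with overlap, and the dataset-size rule of thumb for index choice. The function names and default sizes below are assumptions for illustration; the thresholds mirror the tree and should be treated as starting points rather than hard limits.

```python
def chunk_tokens(tokens, chunk_size=200, overlap=40):
    """Fixed-size token chunking with overlap (the baseline strategy)."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last chunk already reaches the end of the input
    return chunks


def suggest_index(num_chunks):
    """Map corpus size to an index family, following the decision tree."""
    if num_chunks < 100_000:
        return "flat"  # exact search
    if num_chunks <= 10_000_000:
        return "hnsw"  # low-latency approximate search
    return "ivf/scann/diskann"  # scalable approximate search
```

For example, `chunk_tokens(list(range(10)), chunk_size=4, overlap=1)` yields three chunks, each sharing one token with its neighbor.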

Core Concepts (Vendor-Agnostic)

Pipeline stages: ingest → chunk → embed → index → retrieve → rerank → pack context → generate → verify.
Two evaluation planes: retrieval relevance (did we fetch the right evidence?) vs generation fidelity (did we use it correctly?).
Freshness model: staleness budget, invalidation triggers, and rebuild strategy (incremental vs full).
Trust boundaries: retrieved content is untrusted; apply the same rigor as user input (OWASP LLM Top 10: https://owasp.org/www-project-top-10-for-large-language-model-applications/).
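For the retrieval-relevance plane, the metrics this skill names (recall@k, MRR, nDCG) can be computed with the standard formulas below. The function signatures are assumptions for this sketch: `retrieved` is a ranked list of document IDs, `relevant` is the set of IDs judged relevant, and `gains` maps IDs to graded relevance.

```python
import math


def recall_at_k(retrieved, relevant, k):
    """Share of relevant docs that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant doc, or 0.0 if none is retrieved."""
    relevant = set(relevant)
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """MRR over an iterable of (retrieved, relevant) pairs, one per query."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def ndcg_at_k(retrieved, gains, k):
    """nDCG@k with graded gains; docs missing from `gains` count as 0."""
    dcg = sum(
        gains.get(doc_id, 0) / math.log2(i + 2)
        for i, doc_id in enumerate(retrieved[:k])
    )
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0
```

These cover only the retrieval plane; generation fidelity (groundedness, citation coverage) needs separate checks on the produced answer.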

Implementation Practices (Tooling Examples)

Use a retrieval API contract: query, filters, top_k, trace_id, and returned evidence IDs.
Instrument each stage with tracing/metrics (OpenTelemetry GenAI semantic conventions: https://opentelemetry.io/docs/specs/semconv/gen-ai/).
Add caches deliberately: embeddings cache, retrieval cache (query+filters), and response cache (with invalidation).
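A retrieval contract and a retrieval cache keyed on query plus filters might look like the sketch below. The class names, the in-memory dictionary store, and the coarse `invalidate()` method are assumptions for illustration; a production cache would more likely live in an external store and be invalidated per document or per index version.

```python
import hashlib
import json
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RetrievalRequest:
    # Contract fields from the text: query, filters, top_k, trace_id.
    query: str
    filters: dict = field(default_factory=dict)
    top_k: int = 10
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def cache_key(self):
        """Key on query + filters + top_k; trace_id is deliberately excluded."""
        payload = json.dumps(
            {"q": self.query, "f": self.filters, "k": self.top_k},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class RetrievalResponse:
    trace_id: str
    evidence_ids: tuple  # stable IDs of the returned evidence


class RetrievalCache:
    """In-memory retrieval cache (hypothetical; swap in a real store)."""

    def __init__(self):
        self._store = {}

    def get_or_compute(self, request, retrieve_fn):
        key = request.cache_key()
        if key not in self._store:
            self._store[key] = tuple(retrieve_fn(request))
        return RetrievalResponse(request.trace_id, self._store[key])

    def invalidate(self):
        """Drop everything, e.g. after an index rebuild."""
        self._store.clear()
```

Keeping `trace_id` out of the cache key lets two identical queries share a cached result while each response still carries its own trace for telemetry.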

Do / Avoid

Do

Do keep retrieval deterministic: fixed top_k, stable ranking, explicit filters.
Do enforce document-level ACLs at retrieval time (not only at generation time).
Do include citations with stable IDs and verify citation coverage in tests.
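The first two points above (deterministic retrieval and ACL enforcement at retrieval time) can be combined in one small sketch. The `Candidate` type and the group-based ACL model are assumptions for this example; real systems usually push the ACL filter down into the index query rather than filtering in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    # Hypothetical candidate record; field names are assumptions.
    doc_id: str
    score: float
    allowed_groups: frozenset


def retrieve_with_acl(candidates, user_groups, top_k=5):
    """Drop documents the user may not see, then rank deterministically."""
    groups = frozenset(user_groups)
    # Enforce the ACL before ranking so hidden docs never reach the prompt.
    visible = [c for c in candidates if c.allowed_groups & groups]
    # Stable ranking: score descending, then doc_id as an explicit tie-break.
    visible.sort(key=lambda c: (-c.score, c.doc_id))
    return [c.doc_id for c in visible[:top_k]]
```

With the explicit tie-break, the same candidates in any input order produce the same top_k, which keeps regression tests reproducible.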

Avoid

Avoid shipping RAG without a test set and regression gate.
Avoid "stuff everything" context packing; it increases cost and can reduce accuracy.
Avoid mixing corpora without metadata and tenant isolation.
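As an alternative to "stuff everything" packing, a bounded packer can fill the context in rank order until a token budget is reached. The sketch below is an assumption-laden illustration: the whitespace token counter is a crude stand-in for the model's real tokenizer, and stopping at the first chunk that does not fit is one possible policy among several.

```python
def pack_context(chunks, max_tokens, count_tokens=lambda text: len(text.split())):
    """Pack ranked (doc_id, text) chunks into a bounded context.

    Returns the packed chunks and the number of tokens used.
    """
    packed, used, seen = [], 0, set()
    for doc_id, text in chunks:
        if text in seen:
            continue  # skip exact duplicate passages
        cost = count_tokens(text)
        if used + cost > max_tokens:
            break  # stop at the first chunk that would exceed the budget
        packed.append((doc_id, text))
        used += cost
        seen.add(text)
    return packed, used
```

Stopping (rather than skipping ahead to smaller chunks) preserves strict rank order, at the cost of sometimes leaving part of the budget unused.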

When to Use This Skill

Use this skill when the user asks:

"Help me design a RAG pipeline."
"How should I chunk this document?"
"Optimize retrieval for my use case."
"My RAG system is hallucinating — fix it."
"Choose the right vector database / index type."
"Create a RAG evaluation framework."
"Debug why retrieval gives irrelevant results."

Tool/Model Recommendation Protocol

When users ask for vendor/model/framework recommendations, validate claims against current primary sources.

Triggers

"What's the best vector database for [use case]?"
"What should I use for [chunking/embedding/reranking]?"
"What's the latest in RAG development?"
"Current best practices for [retrieval/grounding/evaluation]?"
"Is [Pinecone/Qdrant/Chroma] still relevant in 2026?"
"[Vector DB A] vs [Vector DB B]?"
"Best embedding model for [use case]?"
"What RAG framework should I use?"

Required Checks

Read data/sources.json and start from sources with "add_as_web_search": true.
Verify 1-2 primary docs per recommendation (release notes, benchmarks, docs).
If browsing isn't available, state assumptions and give a verification checklist.
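The first check can be automated with a small loader. The schema of data/sources.json is not shown here, so the sketch below makes no assumption about its layout beyond the "add_as_web_search" flag: it walks the whole JSON tree and collects every object where that flag is true. The function names are hypothetical.

```python
import json
from pathlib import Path


def iter_web_search_sources(node):
    """Yield every JSON object whose "add_as_web_search" flag is true."""
    if isinstance(node, dict):
        if node.get("add_as_web_search") is True:
            yield node
        for value in node.values():
            yield from iter_web_search_sources(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_web_search_sources(item)


def load_priority_sources(path="data/sources.json"):
    """Read the sources file and return the entries to start from."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return list(iter_web_search_sources(data))
```

If the actual file turns out to have a fixed, flat structure, a direct list comprehension over that structure would be simpler than the recursive walk.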

What to Report

After checking, provide:

Current landscape: What vector DBs/embeddings are popular NOW (not 6 months ago)
Emerging trends: Techniques gaining traction (late interaction, agentic RAG, graph RAG)
Deprecated/declining: Approaches or tools losing relevance
Recommendation: Based on fresh data, not just static knowledge

Example Topics (verify with current sources)

Vector databases (Pinecone, Qdrant, Weaviate, Milvus, pgvector, LanceDB)
Embedding models (OpenAI, Cohere, Voyage AI, Jina, Sentence Transformers)
Reranking (Cohere Rerank, Jina Reranker, FlashRank, RankGPT)
RAG frameworks (LlamaIndex, LangChain, Haystack, txtai)
Advanced RAG (contextual retrieval, agentic RAG, graph RAG, CRAG)
Evaluation (RAGAS, TruLens, DeepEval, BEIR)

Related Skills

For adjacent topics, reference these skills:

ai-llm - Prompting, fine-tuning, instruction datasets
ai-agents - Agentic RAG workflows and tool routing
ai-llm-inference - Serving performance, quantization, batching
ai-mlops - Deployment, monitoring, security, privacy, and governance
ai-prompt-engineering - Prompt patterns for RAG generation phase

Templates

System Design (Start Here)

RAG System Design

Chunking & Ingestion

Embedding & Indexing

Retrieval & Reranking

Context Packaging & Grounding

Evaluation

Search Configuration

Query Rewriting

Query Rewrite

Navigation

Resources

Templates

Data

data/sources.json — Curated external references

Use this skill whenever the user needs retrieval-augmented system design or debugging, not prompt work or deployment.

Fact-Checking

Use web search/web fetch to verify current external facts, versions, pricing, deadlines, regulations, or platform behavior before final answers.
Prefer primary sources; report source links and dates for volatile information.
If web access is unavailable, state the limitation and mark guidance as unverified.

First Seen

Jan 23, 2026

Security Audits

Gen Agent Trust Hub: Pass
Socket: Pass
Snyk: Warn

Installed on

gemini-cli: 57
cursor: 57
opencode: 56
codex: 56
github-copilot: 53
cline: 47
