rag-retrieval by yonatangross/orchestkit
npx skills add https://github.com/yonatangross/orchestkit --skill rag-retrieval
Comprehensive patterns for building production RAG systems. Each category has individual rule files in rules/ loaded on-demand.
| Category | Rules | Impact | When to Use |
|---|---|---|---|
| Core RAG | 4 | CRITICAL | Basic RAG, citations, hybrid search, context management |
| Embeddings | 3 | HIGH | Model selection, chunking, batch/cache optimization |
| Contextual Retrieval | 3 | HIGH | Context-prepending, hybrid BM25+vector, pipeline |
| HyDE | 3 | HIGH | Vocabulary mismatch, hypothetical document generation |
| Agentic RAG | 4 | HIGH | Self-RAG, CRAG, knowledge graphs, adaptive routing |
| Multimodal RAG | 3 | MEDIUM | Image+text retrieval, PDF chunking, cross-modal search |
| Query Decomposition | 3 | MEDIUM | Multi-concept queries, parallel retrieval, RRF fusion |
| Reranking | 3 | MEDIUM | Cross-encoder, LLM scoring, combined signals |
| PGVector | 4 | HIGH | PostgreSQL hybrid search, HNSW indexes, schema design |
Total: 30 rules across 9 categories
Fundamental patterns for retrieval, generation, and pipeline composition.
| Rule | File | Key Pattern |
|---|---|---|
| Basic RAG | rules/core-basic-rag.md | Retrieve + context + generate with citations |
| Hybrid Search | rules/core-hybrid-search.md | RRF fusion (k=60) for semantic + keyword |
| Context Management | rules/core-context-management.md | Token budgeting + sufficiency check |
| Pipeline Composition | rules/core-pipeline-composition.md | Composable Decompose → HyDE → Retrieve → Rerank |
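The RRF fusion (k=60) named in core-hybrid-search reduces to a small pure function. A minimal sketch, with illustrative document IDs:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: merge ranked lists from several retrievers.

    A document scores sum(1 / (k + rank)) over every list it appears in;
    k=60 damps the advantage of a single top-1 placement.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Merge a keyword (BM25) ranking with a vector-search ranking
fused = rrf_fuse([["a", "b", "c"], ["b", "c", "d"]])
```

Documents appearing in both lists ("b", "c") outrank single-list hits, which is the point of the fusion.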
Embedding models, chunking strategies, and production optimization.
| Rule | File | Key Pattern |
|---|---|---|
| Models & API | rules/embeddings-models.md | Model selection, batch API, similarity |
| Chunking | rules/embeddings-chunking.md | Semantic boundary splitting, 512 token sweet spot |
| Advanced | rules/embeddings-advanced.md | Redis cache, Matryoshka dims, batch processing |
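One way to apply the 512-token sweet spot: pack whole paragraphs greedily up to a token budget, so chunk boundaries stay semantic. The sketch below approximates tokens by whitespace word count; swap in a real tokenizer (e.g. tiktoken) for production:

```python
def chunk_text(text: str, max_tokens: int = 512) -> list[str]:
    """Greedy paragraph packer: respect semantic boundaries (blank lines)
    and flush a chunk when the next paragraph would exceed the budget."""
    chunks: list[str] = []
    current: list[str] = []
    budget = 0
    for para in text.split("\n\n"):
        n = len(para.split())  # crude token proxy
        if current and budget + n > max_tokens:
            chunks.append("\n\n".join(current))
            current, budget = [], 0
        current.append(para)
        budget += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```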
Anthropic's context-prepending technique — 67% fewer retrieval failures.
| Rule | File | Key Pattern |
|---|---|---|
| Context Prepending | rules/contextual-prepend.md | LLM-generated context + prompt caching |
| Hybrid Search | rules/contextual-hybrid.md | 40% BM25 / 60% vector weight split |
| Complete Pipeline | rules/contextual-pipeline.md | End-to-end indexing + hybrid retrieval |
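The context-prepending step itself is simple: generate a short situating sentence for each chunk and index the concatenation instead of the bare chunk. A sketch, where the prompt wording paraphrases Anthropic's published template and `generate` stands in for any LLM call (with the full document ideally served from the prompt cache):

```python
CONTEXT_PROMPT = (
    "<document>\n{doc}\n</document>\n"
    "Here is the chunk we want to situate within the whole document:\n"
    "<chunk>\n{chunk}\n</chunk>\n"
    "Give a short, succinct context to situate this chunk for search retrieval."
)

def contextualize_chunk(doc: str, chunk: str, generate) -> str:
    """Return the chunk with an LLM-generated situating context prepended,
    ready to be embedded and BM25-indexed in place of the bare chunk."""
    context = generate(CONTEXT_PROMPT.format(doc=doc, chunk=chunk))
    return f"{context}\n\n{chunk}"
```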
Hypothetical Document Embeddings for bridging vocabulary gaps.
| Rule | File | Key Pattern |
|---|---|---|
| Generation | rules/hyde-generation.md | Embed hypothetical doc, not query |
| Per-Concept | rules/hyde-per-concept.md | Parallel HyDE for multi-topic queries |
| Fallback | rules/hyde-fallback.md | 2-3s timeout → direct embedding fallback |
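The 2-3s timeout with fallback can be sketched with `asyncio.wait_for`; `generate_doc` and `embed` are placeholders for your LLM and embedding clients:

```python
import asyncio

async def hyde_embed(query: str, generate_doc, embed, timeout: float = 2.5):
    """Embed a hypothetical answer document for the query; if generation
    is too slow, fall back to embedding the raw query directly."""
    try:
        hypothetical = await asyncio.wait_for(generate_doc(query), timeout)
        return await embed(hypothetical)
    except asyncio.TimeoutError:
        return await embed(query)
```

The fallback keeps worst-case latency bounded: HyDE is an optimization, never a hard dependency.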
Self-correcting retrieval with LLM-driven decision making.
| Rule | File | Key Pattern |
|---|---|---|
| Self-RAG | rules/agentic-self-rag.md | Binary document grading for relevance |
| Corrective RAG | rules/agentic-corrective-rag.md | CRAG workflow with web fallback |
| Knowledge Graph | rules/agentic-knowledge-graph.md | KG + vector hybrid for entity-rich domains |
| Adaptive Retrieval | rules/agentic-adaptive-retrieval.md | Query routing to optimal strategy |
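The binary grading step shared by Self-RAG and CRAG reduces to filtering on a yes/no judgment; `judge` here is a placeholder for an LLM grader prompt:

```python
def grade_documents(question: str, docs: list[str], judge) -> tuple[list[str], bool]:
    """Keep only documents the judge marks relevant ('yes').

    Returns (relevant_docs, need_fallback); an empty result signals the
    corrective path (query rewrite or web search, as in CRAG)."""
    relevant = [d for d in docs if judge(question, d) == "yes"]
    return relevant, not relevant
```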
Image + text retrieval with cross-modal search.
| Rule | File | Key Pattern |
|---|---|---|
| Embeddings | rules/multimodal-embeddings.md | CLIP, SigLIP 2, Voyage multimodal-3 |
| Chunking | rules/multimodal-chunking.md | PDF extraction preserving images |
| Pipeline | rules/multimodal-pipeline.md | Dedup + hybrid retrieval + generation |
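Cross-modal search works because CLIP-style models place images and text in one embedding space, so a single text-query embedding ranks pre-embedded items of either modality. A minimal sketch with an injected text encoder and pre-computed item embeddings:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def cross_modal_search(query_text: str, items, embed_text, top_k: int = 5):
    """Rank mixed image/text items against one text query.

    `items` is a list of (item_id, embedding) pairs produced offline by the
    model's image or text tower; `embed_text` is its text tower."""
    q = embed_text(query_text)
    ranked = sorted(items, key=lambda it: cosine(q, it[1]), reverse=True)
    return [it[0] for it in ranked[:top_k]]
```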
Breaking complex queries into concepts for parallel retrieval.
| Rule | File | Key Pattern |
|---|---|---|
| Detection | rules/query-detection.md | Heuristic indicators (<1ms fast path) |
| Decompose + RRF | rules/query-decompose.md | LLM concept extraction + parallel retrieval |
| HyDE Combo | rules/query-hyde-combo.md | Decompose + HyDE for maximum coverage |
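The sub-millisecond fast path is plain pattern matching; the marker list below is an illustrative assumption, not the rule file's exact heuristics:

```python
import re

# Surface markers that often indicate a multi-concept query (assumed list)
MULTI_CONCEPT_MARKERS = re.compile(
    r"\b(and|vs\.?|versus|compare|both|as well as)\b|[;,]", re.IGNORECASE
)

def looks_multi_concept(query: str) -> bool:
    """Cheap heuristic: only pay for LLM decomposition when surface
    markers suggest the query mixes several concepts."""
    return len(query.split()) > 6 and bool(MULTI_CONCEPT_MARKERS.search(query))
```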
Post-retrieval re-scoring for higher precision.
| Rule | File | Key Pattern |
|---|---|---|
| Cross-Encoder | rules/reranking-cross-encoder.md | ms-marco-MiniLM (~50ms, free) |
| LLM Reranking | rules/reranking-llm.md | Batch scoring + Cohere API |
| Combined | rules/reranking-combined.md | Multi-signal weighted scoring |
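Reranking is score-then-truncate; in the sketch below, `score_pairs` is a stand-in for a real scorer such as sentence-transformers' `CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2").predict`:

```python
def rerank(query: str, docs: list[str], score_pairs, top_n: int = 10) -> list[str]:
    """Score every (query, doc) pair and keep the top_n documents.

    Typical pattern: retrieve ~50 candidates cheaply, rerank down to 10."""
    scores = score_pairs([(query, d) for d in docs])
    ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in ranked[:top_n]]
```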
Production hybrid search with PostgreSQL.
| Rule | File | Key Pattern |
|---|---|---|
| Schema | rules/pgvector-schema.md | HNSW index + pre-computed tsvector |
| Hybrid Search | rules/pgvector-hybrid-search.md | SQLAlchemy RRF with FULL OUTER JOIN |
| Indexing | rules/pgvector-indexing.md | HNSW (17x faster) vs IVFFlat |
| Metadata | rules/pgvector-metadata.md | Filtering, boosting, Redis 8 comparison |
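The schema pattern (HNSW index plus a pre-computed tsvector) looks roughly like the DDL below; the table and column names are hypothetical, and VECTOR(1536) assumes text-embedding-3-small dimensions:

```python
# Illustrative pgvector DDL: dense vectors for semantic search, a stored
# generated tsvector for keyword search, indexed by HNSW and GIN respectively.
SCHEMA_DDL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
    id BIGSERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    embedding VECTOR(1536),
    content_tsv TSVECTOR GENERATED ALWAYS AS (to_tsvector('english', content)) STORED
);

CREATE INDEX documents_embedding_idx ON documents USING hnsw (embedding vector_cosine_ops);
CREATE INDEX documents_tsv_idx ON documents USING gin (content_tsv);
"""
```

Pre-computing the tsvector as a stored generated column keeps keyword search off the query's hot path.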
```python
from openai import OpenAI

client = OpenAI()

async def rag_query(question: str, top_k: int = 5) -> dict:
    """Basic RAG with citations."""
    # vector_db and llm are assumed pre-configured async clients
    docs = await vector_db.search(question, limit=top_k)
    context = "\n\n".join(f"[{i+1}] {doc.text}" for i, doc in enumerate(docs))
    response = await llm.chat([
        {"role": "system", "content": "Answer with inline citations [1], [2]. Use ONLY provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ])
    return {"answer": response.content, "sources": [d.metadata["source"] for d in docs]}
```
| Decision | Recommendation |
|---|---|
| Embedding model | text-embedding-3-small (general), voyage-3 (production) |
| Chunk size | 256-1024 tokens (512 typical) |
| Hybrid weight | 40% BM25 / 60% vector |
| Top-k | 3-10 documents |
| Temperature | 0.1-0.3 (factual) |
| Context budget | 4K-8K tokens |
| Reranking | Retrieve 50, rerank to 10 |
| Vector index | HNSW (production), IVFFlat (high-volume) |
| HyDE timeout | 2-3 seconds with fallback |
| Query decomposition | Heuristic first, LLM only if multi-concept |
See test-cases.json for 30 test cases across all categories.
Related skills:
- ork:langgraph - LangGraph workflow patterns (for agentic RAG workflows)
- caching - Cache RAG responses for repeated queries
- ork:golden-dataset - Evaluate retrieval quality
- ork:llm-integration - Local embeddings with nomic-embed-text
- vision-language-models - Image analysis for multimodal RAG
- ork:database-patterns - Schema design for vector search

Keywords by category:
- retrieval, context, chunks, relevance, rag
- hybrid, bm25, vector, fusion, rrf
- embedding, text to vector, vectorize, chunk, similarity
- contextual, anthropic, context-prepend, bm25
- hyde, hypothetical, vocabulary mismatch
- self-rag, crag, corrective, adaptive, grading
- multimodal, image, clip, vision, pdf
- decompose, multi-concept, complex query
- rerank, cross-encoder, precision, scoring
- pgvector, postgresql, hnsw, tsvector, hybrid
Weekly Installs: 138
Repository: github.com/yonatangross/orchestkit
GitHub Stars: 132
First Seen: Jan 22, 2026
Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Warn)

Installed on:
- codex: 127
- gemini-cli: 126
- opencode: 126
- github-copilot: 123
- cursor: 121
- amp: 115