rag by giuseppe-trisciuoglio/developer-kit
npx skills add https://github.com/giuseppe-trisciuoglio/developer-kit --skill rag
Build Retrieval-Augmented Generation systems that extend AI capabilities with external knowledge sources.
RAG (Retrieval-Augmented Generation) enhances AI applications by retrieving relevant information from knowledge bases and incorporating it into AI responses, reducing hallucinations and providing accurate, grounded answers.
Use this skill when:
Select an appropriate vector database based on your requirements:
Choose an embedding model based on your use case:
// Simple RAG setup for document Q&A
List<Document> documents = FileSystemDocumentLoader.loadDocuments("/docs");
InMemoryEmbeddingStore<TextSegment> store = new InMemoryEmbeddingStore<>();
EmbeddingStoreIngestor.ingest(documents, store);
DocumentAssistant assistant = AiServices.builder(DocumentAssistant.class)
.chatModel(chatModel)
.contentRetriever(EmbeddingStoreContentRetriever.from(store))
.build();
String answer = assistant.answer("What is the company policy on remote work?");
// RAG with metadata filtering for specific document categories
EmbeddingStoreContentRetriever retriever = EmbeddingStoreContentRetriever.builder()
.embeddingStore(store)
.embeddingModel(embeddingModel)
.maxResults(5)
.minScore(0.7)
.filter(metadataKey("category").isEqualTo("technical"))
.build();
// Combine multiple knowledge sources
ContentRetriever webRetriever = EmbeddingStoreContentRetriever.from(webStore);
ContentRetriever docRetriever = EmbeddingStoreContentRetriever.from(docStore);
List<Content> results = new ArrayList<>();
results.addAll(webRetriever.retrieve(query));
results.addAll(docRetriever.retrieve(query));
// Rerank and return the top results (guard against fewer than 5 hits)
List<Content> topResults = reranker.reorder(query, results)
        .subList(0, Math.min(5, results.size()));
// Conversational RAG with context retention
Assistant assistant = AiServices.builder(Assistant.class)
.chatModel(chatModel)
.chatMemory(MessageWindowChatMemory.withMaxMessages(10))
.contentRetriever(retriever)
.build();
// Multi-turn conversation with context
assistant.chat("Tell me about the product features");
assistant.chat("What about pricing for those features?"); // Maintains context
Store and efficiently retrieve document embeddings for semantic search.
Key Options:
Convert text to numerical vectors for similarity search.
Popular Models:
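Whatever model you pick, the core idea is the same: semantically similar texts map to vectors that point in similar directions, which cosine similarity turns into a single score. A toy sketch in plain Java (the 3-D vectors are made-up values; real models emit hundreds or thousands of dimensions):

```java
// Toy demo: embeddings reduce "semantic similarity" to a number.
// The 3-D vectors below are invented values, not real model output.
public class CosineDemo {

    // Cosine similarity: dot product divided by the product of the
    // vector magnitudes; 1.0 means identical direction.
    public static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] cat = {0.9, 0.1, 0.0};      // toy embedding for "cat"
        double[] kitten = {0.85, 0.2, 0.05}; // toy embedding for "kitten"
        double[] invoice = {0.0, 0.1, 0.95}; // toy embedding for "invoice"
        System.out.println(cosine(cat, kitten));  // close to 1.0
        System.out.println(cosine(cat, invoice)); // close to 0.0
    }
}
```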
Find relevant content based on user queries.
Approaches:
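At its simplest, retrieval scores every stored segment against the query embedding and keeps the top-k. A minimal dependency-free sketch (class and record names are illustrative; a real system would delegate this to a vector store's ANN index rather than scanning linearly):

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Minimal sketch of embedding-based retrieval: score every stored
// segment against the query vector and return the top-k matches.
public class TopKRetriever {

    public record Scored(String text, double score) {}

    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Brute-force top-k: fine for small stores; vector databases
    // replace this scan with an approximate-nearest-neighbor index.
    public static List<Scored> retrieve(double[] query,
                                        Map<String, double[]> store, int k) {
        return store.entrySet().stream()
                .map(e -> new Scored(e.getKey(), cosine(query, e.getValue())))
                .sorted(Comparator.comparingDouble(Scored::score).reversed())
                .limit(k)
                .toList();
    }
}
```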
// Load documents from file system
List<Document> documents = FileSystemDocumentLoader.loadDocuments("/path/to/docs");
// Create embedding store
InMemoryEmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>();
// Ingest documents into the store
EmbeddingStoreIngestor.ingest(documents, embeddingStore);
// Create AI service with RAG capability
Assistant assistant = AiServices.builder(Assistant.class)
.chatModel(chatModel)
.chatMemory(MessageWindowChatMemory.withMaxMessages(10))
.contentRetriever(EmbeddingStoreContentRetriever.from(embeddingStore))
.build();
// Split documents into chunks of up to 500 characters with 100-character overlap
// (LangChain4j's recursive splitter; the Python-style RecursiveCharacterTextSplitter
// class name does not exist in LangChain4j)
DocumentSplitter splitter = DocumentSplitters.recursive(500, 100);
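The chunk-size/overlap behavior can be sketched in plain Java, assuming simple character-based windows (real splitters also try to respect sentence and paragraph boundaries):

```java
import java.util.ArrayList;
import java.util.List;

// Character-based chunking sketch: each chunk is up to chunkSize chars,
// and consecutive chunks share the last `overlap` chars of the previous one.
public class ChunkDemo {

    public static List<String> split(String text, int chunkSize, int overlap) {
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap; // each chunk starts this far after the last
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) break; // final chunk reached the end
        }
        return chunks;
    }
}
```

Overlap exists so that a sentence cut at a chunk boundary still appears whole in the neighboring chunk, at the cost of some duplicated storage.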
// Create embedding model
EmbeddingModel embeddingModel = OpenAiEmbeddingModel.builder()
.apiKey(System.getenv("OPENAI_API_KEY"))
.build();
// Create embedding store
EmbeddingStore<TextSegment> embeddingStore = PgVectorEmbeddingStore.builder()
        .host("localhost")
        .port(5432)
        .database("postgres")
        .user("postgres")
        .password(System.getenv("DB_PASSWORD"))
        .table("embeddings")
        .dimension(1536) // must match the embedding model's output dimension
        .build();
// Process and store documents
for (Document document : documents) {
List<TextSegment> segments = splitter.split(document);
for (TextSegment segment : segments) {
Embedding embedding = embeddingModel.embed(segment).content();
embeddingStore.add(embedding, segment);
}
}
Create a basic Q&A system over your documents.
public interface DocumentAssistant {
String answer(String question);
}
DocumentAssistant assistant = AiServices.builder(DocumentAssistant.class)
.chatModel(chatModel)
.contentRetriever(retriever)
.build();
Filter results based on document metadata.
// Add metadata during document loading
Document document = Document.from(
        "Content here",
        new Metadata()
                .put("source", "technical-manual.pdf")
                .put("category", "technical")
                .put("date", "2024-01-15"));
// Filter during retrieval (metadataKey comes from a static import of
// dev.langchain4j.store.embedding.filter.MetadataFilterBuilder.metadataKey)
EmbeddingStoreContentRetriever retriever = EmbeddingStoreContentRetriever.builder()
.embeddingStore(embeddingStore)
.embeddingModel(embeddingModel)
.maxResults(5)
.minScore(0.7)
.filter(metadataKey("category").isEqualTo("technical"))
.build();
Combine results from multiple knowledge sources.
ContentRetriever webRetriever = EmbeddingStoreContentRetriever.from(webStore);
ContentRetriever documentRetriever = EmbeddingStoreContentRetriever.from(documentStore);
ContentRetriever databaseRetriever = EmbeddingStoreContentRetriever.from(databaseStore);
// Combine results
List<Content> allResults = new ArrayList<>();
allResults.addAll(webRetriever.retrieve(query));
allResults.addAll(documentRetriever.retrieve(query));
allResults.addAll(databaseRetriever.retrieve(query));
// Rerank combined results
List<Content> rerankedResults = reranker.reorder(query, allResults);
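The merge-and-rerank step can be sketched without any library: deduplicate hits that appear in several sources, keep the best score per text (a stand-in for a real scoring/reranking model), and return the top-k. Class and record names here are illustrative:

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of merging results from multiple retrievers: deduplicate by
// text, keep the highest score seen for each, then sort and truncate.
public class MergeRerank {

    public record Hit(String text, double score) {}

    public static List<Hit> merge(List<List<Hit>> sources, int k) {
        Map<String, Double> best = new HashMap<>();
        for (List<Hit> source : sources) {
            for (Hit hit : source) {
                best.merge(hit.text(), hit.score(), Math::max); // best score wins
            }
        }
        return best.entrySet().stream()
                .map(e -> new Hit(e.getKey(), e.getValue()))
                .sorted(Comparator.comparingDouble(Hit::score).reversed())
                .limit(k)
                .toList();
    }
}
```

A production reranker would re-score each hit against the query with a cross-encoder or scoring model instead of trusting the retrievers' own scores, but the dedupe-sort-truncate shape is the same.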
Problem: Retrieved documents don't match user queries. Solutions:
Problem: Retrieved documents contain relevant information but are not specific enough. Solutions:
Problem: Slow response times during retrieval. Solutions:
Problem: AI generates information not present in retrieved documents. Solutions:
assets/vector-store-config.yaml - Configuration templates for different vector stores
assets/retriever-pipeline.java - Complete RAG pipeline implementation
assets/evaluation-metrics.java - Evaluation framework code
Weekly Installs
313
Repository
GitHub Stars
173
First Seen
Feb 9, 2026
Security Audits
Gen Agent Trust Hub: Pass
Socket: Pass
Snyk: Fail
Installed on
opencode: 259
gemini-cli: 258
codex: 255
github-copilot: 245
cursor: 244
claude-code: 242