LangChain4j RAG Implementation Patterns: A Guide to Building Chat-with-Documents and Knowledge-Enhanced AI Applications

langchain4j-rag-implementation-patterns by giuseppe-trisciuoglio/developer-kit

359 weekly installs

174 GitHub Stars

Install command

npx skills add https://github.com/giuseppe-trisciuoglio/developer-kit --skill langchain4j-rag-implementation-patterns

AI/Machine Learning, Spring Boot, Natural Language Processing

LangChain4j RAG Implementation Patterns

Overview

Implements RAG systems with LangChain4j: document ingestion pipelines, embedding stores, and vector search for chat-with-documents and knowledge-enhanced AI applications.

When to Use This Skill

- Building chat-with-documents systems or document Q&A over PDFs, text files, or web pages
- Creating AI assistants with access to company knowledge bases or external sources
- Implementing semantic search or hybrid search over document repositories
- Building domain-specific AI with curated knowledge and source attribution

Instructions

Initialize RAG Project

Create a new Spring Boot project with required dependencies:

pom.xml:

<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-spring-boot-starter</artifactId>
    <version>1.8.0</version>
</dependency>
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-open-ai</artifactId>
    <version>1.8.0</version>
</dependency>

Set Up Document Ingestion

Configure document loading and processing with validation:

Validation Checkpoint: After ingestion, verify that the embedding count matches the segment count and test retrieval with a sample query.

@Configuration
public class RAGConfiguration {

    @Bean
    public EmbeddingModel embeddingModel() {
        return OpenAiEmbeddingModel.builder()
            .apiKey(System.getenv("OPENAI_API_KEY"))
            .modelName("text-embedding-3-small")
            .build();
    }

    @Bean
    public EmbeddingStore<TextSegment> embeddingStore() {
        return new InMemoryEmbeddingStore<>();
    }
}

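Create the document ingestion service:

import java.util.List;
import java.util.Map;

import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.DocumentSplitter;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;
import dev.langchain4j.data.document.splitter.DocumentSplitters;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.model.openai.OpenAiTokenCountEstimator;
import dev.langchain4j.store.embedding.EmbeddingMatch;
import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
import dev.langchain4j.store.embedding.EmbeddingStore;
import lombok.RequiredArgsConstructor;
import org.springframework.stereotype.Service;

@Service
@RequiredArgsConstructor
public class DocumentIngestionService {

    private final EmbeddingModel embeddingModel;
    private final EmbeddingStore<TextSegment> embeddingStore;

    public void ingestDocument(String filePath, Map<String, Object> metadata) {
        Document document = FileSystemDocumentLoader.loadDocument(filePath);
        document.metadata().putAll(metadata);

        // Split into segments of up to 500 tokens with a 50-token overlap
        DocumentSplitter splitter = DocumentSplitters.recursive(
            500, 50, new OpenAiTokenCountEstimator("text-embedding-3-small")
        );

        List<TextSegment> segments = splitter.split(document);
        List<Embedding> embeddings = embeddingModel.embedAll(segments).content();

        // Validation: verify the embedding count matches the segment count
        // before anything is written to the store
        if (embeddings.size() != segments.size()) {
            throw new IllegalStateException("Embedding count mismatch: expected " + segments.size() + ", got " + embeddings.size());
        }

        embeddingStore.addAll(embeddings, segments);
    }

    public boolean validateIngestion(String testQuery) {
        // Validation: test retrieval with a sample query
        Embedding queryEmbedding = embeddingModel.embed(testQuery).content();
        List<EmbeddingMatch<TextSegment>> results = embeddingStore.search(
            EmbeddingSearchRequest.builder()
                .queryEmbedding(queryEmbedding)
                .maxResults(1)
                .build()
        ).matches();
        return !results.isEmpty();
    }
}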

Configure Content Retrieval

Set up content retrieval with a result limit and a minimum relevance score:

Validation Checkpoint: After configuration, test retrieval with a known query to verify that embeddings are searchable.

@Configuration
public class ContentRetrieverConfiguration {

    @Bean
    public ContentRetriever contentRetriever(
            EmbeddingStore<TextSegment> embeddingStore,
            EmbeddingModel embeddingModel) {

        return EmbeddingStoreContentRetriever.builder()
            .embeddingStore(embeddingStore)
            .embeddingModel(embeddingModel)
            .maxResults(5)
            .minScore(0.7)
            .build();
    }
}

Create RAG-Enabled AI Service

Define AI service with context retrieval:

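import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.rag.content.retriever.ContentRetriever;
import dev.langchain4j.service.AiServices;
import dev.langchain4j.service.SystemMessage;
import org.springframework.stereotype.Service;

interface KnowledgeAssistant {
    @SystemMessage("""
        You are a knowledgeable assistant with access to a comprehensive knowledge base.

        When answering questions:
        1. Use the provided context from the knowledge base
        2. If information is not in the context, clearly state this
        3. Provide accurate, helpful responses
        4. When possible, reference specific sources
        5. If the context is insufficient, ask for clarification
        """)
    String answerQuestion(String question);
}

// No @RequiredArgsConstructor here: the explicit constructor below builds the
// assistant, and a Lombok-generated constructor would conflict with it.
@Service
public class KnowledgeService {

    private final KnowledgeAssistant assistant;

    public KnowledgeService(ChatModel chatModel, ContentRetriever contentRetriever) {
        // The content retriever injects retrieved segments into each request
        this.assistant = AiServices.builder(KnowledgeAssistant.class)
            .chatModel(chatModel)
            .contentRetriever(contentRetriever)
            .build();
    }

    public String answerQuestion(String question) {
        return assistant.answerQuestion(question);
    }
}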

Examples

Basic Document Processing

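import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.openai.OpenAiEmbeddingModel;
import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;
import dev.langchain4j.rag.query.Query;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;

public class BasicRAGExample {
    public static void main(String[] args) {
        var embeddingStore = new InMemoryEmbeddingStore<TextSegment>();

        var embeddingModel = OpenAiEmbeddingModel.builder()
            .apiKey(System.getenv("OPENAI_API_KEY"))
            .modelName("text-embedding-3-small")
            .build();

        // No splitter is configured, so the document is embedded as a single segment
        var ingestor = EmbeddingStoreIngestor.builder()
            .embeddingModel(embeddingModel)
            .embeddingStore(embeddingStore)
            .build();

        ingestor.ingest(Document.from("Spring Boot is a framework for building Java applications with minimal configuration."));

        var retriever = EmbeddingStoreContentRetriever.builder()
            .embeddingStore(embeddingStore)
            .embeddingModel(embeddingModel)
            .build();

        // Retrieve the segments most relevant to a sample question and print them
        retriever.retrieve(Query.from("What is Spring Boot?"))
            .forEach(content -> System.out.println(content.textSegment().text()));
    }
}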

Multi-Domain Assistant

interface MultiDomainAssistant {
    @SystemMessage("""
        You are an expert assistant with access to multiple knowledge domains:
        - Technical documentation
        - Company policies
        - Product information
        - Customer support guides

        Tailor your response based on the type of question and available context.
        Always indicate which domain the information comes from.
        """)
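    // With more than one parameter, each one must be annotated. @MemoryId also
    // requires a chatMemoryProvider on the AiServices builder.
    String answerQuestion(@MemoryId String userId, @UserMessage String question);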
}

Hierarchical RAG

// searchSummaries, searchChunksInDocument, and generateResponseWithChunks are
// helper methods that are not shown in this excerpt.
@Service
@RequiredArgsConstructor
public class HierarchicalRAGService {

    private final EmbeddingStore<TextSegment> chunkStore;
    private final EmbeddingStore<TextSegment> summaryStore;
    private final EmbeddingModel embeddingModel;

    public String performHierarchicalRetrieval(String query) {
        // Stage 1: find the documents whose summaries match the query
        List<EmbeddingMatch<TextSegment>> summaryMatches = searchSummaries(query);
        List<TextSegment> relevantChunks = new ArrayList<>();

        // Stage 2: search for chunks only inside the matched documents
        for (EmbeddingMatch<TextSegment> summaryMatch : summaryMatches) {
            String documentId = summaryMatch.embedded().metadata().getString("documentId");
            List<EmbeddingMatch<TextSegment>> chunkMatches = searchChunksInDocument(query, documentId);
            chunkMatches.stream()
                .map(EmbeddingMatch::embedded)
                .forEach(relevantChunks::add);
        }

        return generateResponseWithChunks(query, relevantChunks);
    }
}

Best Practices

Document Segmentation

- Use recursive splitting with 500-1000 token chunks for most applications
- Maintain 20-50 token overlap between chunks for context preservation
- Consider document structure (headings, paragraphs) when splitting
- Use token-aware splitters for optimal embedding generation
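
The overlap guidance above can be sketched with a sliding window in plain Java. This is a minimal illustration, not LangChain4j code: `OverlapChunker` and `chunk` are hypothetical names, and tokens are approximated by whitespace-separated words, which is an assumption made for brevity. A real pipeline would rely on a token-aware splitter instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class OverlapChunker {

    // Splits text into chunks of at most maxTokens words. Each chunk repeats
    // the last `overlap` words of the previous chunk.
    // Assumption: one whitespace-separated word stands in for one token.
    static List<String> chunk(String text, int maxTokens, int overlap) {
        if (maxTokens <= 0 || overlap < 0 || overlap >= maxTokens) {
            throw new IllegalArgumentException("require 0 <= overlap < maxTokens");
        }
        List<String> chunks = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return chunks;
        }
        List<String> words = Arrays.asList(trimmed.split("\\s+"));
        int step = maxTokens - overlap; // how far the window advances each time
        for (int start = 0; start < words.size(); start += step) {
            int end = Math.min(start + maxTokens, words.size());
            chunks.add(String.join(" ", words.subList(start, end)));
            if (end == words.size()) {
                break; // the window reached the end of the text
            }
        }
        return chunks;
    }
}
```

With `maxTokens = 4` and `overlap = 1`, the text "a b c d e f g" yields two chunks that share the word "d", which is the context carried across the boundary.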

Metadata Strategy

Include rich metadata for filtering and attribution:
- User and tenant identifiers for multi-tenancy
- Document type and category classification
- Creation and modification timestamps
- Version and author information
- Confidentiality and access level tags
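
To make the filtering role of metadata concrete, here is a minimal sketch in plain Java. `MetadataFilterSketch`, `Chunk`, and the `tenantId` key are hypothetical names chosen for illustration. The sketch only shows the matching logic; a production system would normally push the filter down to the vector store rather than filter in memory.

```java
import java.util.List;
import java.util.Map;

final class MetadataFilterSketch {

    // Hypothetical stand-in for a stored text segment and its metadata.
    record Chunk(String text, Map<String, String> metadata) {}

    // Keeps only the chunks whose metadata contains every required key/value pair.
    static List<Chunk> filter(List<Chunk> chunks, Map<String, String> required) {
        return chunks.stream()
            .filter(chunk -> required.entrySet().stream()
                .allMatch(e -> e.getValue().equals(chunk.metadata().get(e.getKey()))))
            .toList();
    }
}
```

Requiring a tenant identifier on every query, for example `Map.of("tenantId", "acme")`, is one way to keep one tenant's chunks out of another tenant's results.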

Query Processing

- Implement query preprocessing and cleaning
- Consider query expansion for better recall
- Apply dynamic filtering based on user context
- Use re-ranking for improved result quality
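
Query cleaning, the first item above, might look like the following sketch. `QueryCleaner` and its `maxChars` limit are hypothetical, and the specific rules (strip control characters, collapse whitespace, cap the length) are assumptions about what "cleaning" means for a given application.

```java
final class QueryCleaner {

    // Normalizes a raw user query before it is embedded.
    static String clean(String rawQuery, int maxChars) {
        String q = rawQuery == null ? "" : rawQuery;
        q = q.replaceAll("\\p{Cntrl}", " ");   // replace tabs, newlines, and other control characters
        q = q.trim().replaceAll("\\s+", " "); // collapse runs of whitespace into single spaces
        return q.length() <= maxChars ? q : q.substring(0, maxChars).trim();
    }
}
```

Normalizing queries this way also raises the hit rate of any embedding cache keyed on the query text, since trivially different inputs map to the same string.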

Performance Optimization

- Cache embeddings for repeated queries
- Use batch embedding generation for bulk operations
- Implement pagination for large result sets
- Consider asynchronous processing for long operations
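
Caching embeddings for repeated queries can be sketched with a small LRU cache. `EmbeddingCache` is a hypothetical class, and the `Function<String, float[]>` passed in stands for whatever call produces an embedding; in a LangChain4j application that function would wrap the embedding model.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

final class EmbeddingCache {

    private final Map<String, float[]> cache;
    private final Function<String, float[]> embedder;

    EmbeddingCache(int maxEntries, Function<String, float[]> embedder) {
        this.embedder = embedder;
        // accessOrder = true makes the map iterate from least to most recently used,
        // so removing the eldest entry implements LRU eviction.
        this.cache = new LinkedHashMap<String, float[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, float[]> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Returns the cached embedding, computing and storing it on a miss.
    synchronized float[] embed(String text) {
        float[] cached = cache.get(text);
        if (cached == null) {
            cached = embedder.apply(text);
            cache.put(text, cached);
        }
        return cached;
    }

    synchronized int entryCount() {
        return cache.size();
    }
}
```

The cache is bounded on purpose: embeddings are large, so an unbounded map would trade API cost for memory pressure.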

Common Patterns

Simple RAG Pipeline

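import java.util.List;
import java.util.stream.Collectors;

import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingMatch;
import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
import dev.langchain4j.store.embedding.EmbeddingStore;
import lombok.RequiredArgsConstructor;
import org.springframework.stereotype.Service;

@RequiredArgsConstructor
@Service
public class SimpleRAGPipeline {

    private final EmbeddingModel embeddingModel;
    private final EmbeddingStore<TextSegment> embeddingStore;
    private final ChatModel chatModel;

    public String answerQuestion(String question) {
        // 1. Embed the question
        Embedding queryEmbedding = embeddingModel.embed(question).content();
        EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
            .queryEmbedding(queryEmbedding)
            .maxResults(3)
            .build();

        // 2. Retrieve the three most similar segments
        List<TextSegment> segments = embeddingStore.search(request).matches().stream()
            .map(EmbeddingMatch::embedded)
            .collect(Collectors.toList());

        // 3. Join the segment texts into a single context string
        String context = segments.stream()
            .map(TextSegment::text)
            .collect(Collectors.joining("\n\n"));

        // 4. Ask the model; ChatModel exposes chat(String) in LangChain4j 1.x
        return chatModel.chat(context + "\n\nQuestion: " + question + "\nAnswer:");
    }
}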

Hybrid Search (Vector + Keyword)

// FullTextSearchEngine is assumed to be an application-defined keyword index.
// performVectorSearch, performKeywordSearch, and combineResults are helper
// methods that are not shown in this excerpt.
@Service
@RequiredArgsConstructor
public class HybridSearchService {

    private final EmbeddingStore<TextSegment> vectorStore;
    private final FullTextSearchEngine keywordEngine;
    private final EmbeddingModel embeddingModel;

    public List<Content> hybridSearch(String query, int maxResults) {
        // Vector search
        List<Content> vectorResults = performVectorSearch(query, maxResults);

        // Keyword search
        List<Content> keywordResults = performKeywordSearch(query, maxResults);

        // Combine and re-rank using the RRF (Reciprocal Rank Fusion) algorithm
        return combineResults(vectorResults, keywordResults, maxResults);
    }
}
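
The combining step in the service above is not shown, so here is a minimal sketch of Reciprocal Rank Fusion in plain Java. It is an illustration under assumptions rather than the author's implementation: `ReciprocalRankFusion` and `fuse` are hypothetical names, results are represented by string ids instead of `Content` objects, and `k = 60` is simply a commonly used constant.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ReciprocalRankFusion {

    // Fuses several ranked lists of ids. Each id scores the sum of 1 / (k + rank)
    // over every list it appears in, with rank starting at 1.
    static List<String> fuse(List<List<String>> rankings, int k, int maxResults) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                scores.merge(ranking.get(i), 1.0 / (k + i + 1), Double::sum);
            }
        }
        List<Map.Entry<String, Double>> entries = new ArrayList<>(scores.entrySet());
        // Highest fused score first; ties are broken by id so the order is deterministic
        entries.sort((a, b) -> {
            int byScore = Double.compare(b.getValue(), a.getValue());
            return byScore != 0 ? byScore : a.getKey().compareTo(b.getKey());
        });
        List<String> fused = new ArrayList<>();
        for (int i = 0; i < Math.min(maxResults, entries.size()); i++) {
            fused.add(entries.get(i).getKey());
        }
        return fused;
    }
}
```

Because RRF uses only ranks, it avoids having to normalize vector similarity scores and keyword scores onto a common scale. An id that appears near the top of both lists outranks one that tops a single list.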

Troubleshooting

Validation Failures

Embedding Count Mismatch: Thrown when the number of embeddings differs from the number of segments. Check the splitter configuration and model availability.

Empty Retrieval Results: Call validateIngestion(testQuery) to verify that embeddings are searchable, and check that the document was ingested successfully.

Low Retrieval Scores: Verify that the minScore threshold (0.7 in the retriever configuration above) is not too high for your use case. Test with known queries.
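
To judge whether a threshold such as 0.7 is too strict, it helps to see how a score might be derived. The sketch below assumes the store reports relevance as (cosine + 1) / 2, which is the mapping LangChain4j's in-memory store is understood to use; other stores may score differently, so treat this as an assumption to verify. `RelevanceScoreSketch` is a hypothetical name.

```java
final class RelevanceScoreSketch {

    // Cosine similarity of two vectors with the same dimension.
    // Note: a zero vector produces NaN, which never passes a threshold.
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Maps cosine similarity in [-1, 1] to a relevance score in [0, 1] (assumed mapping).
    static double relevance(double cosine) {
        return (cosine + 1) / 2;
    }

    static boolean passes(float[] query, float[] candidate, double minScore) {
        return relevance(cosine(query, candidate)) >= minScore;
    }
}
```

Under that mapping, a minScore of 0.7 corresponds to a cosine similarity of 0.4, and two unrelated (orthogonal) vectors already score 0.5. Logging the scores of a few known-good queries is a quick way to pick a threshold that fits the data.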

Common Issues

Poor Retrieval Results

- Check document chunk size and overlap settings
- Verify embedding model compatibility
- Ensure metadata filters are not too restrictive
- Consider adding a re-ranking step
- Run validation to confirm embeddings exist

Slow Performance

- Use cached embeddings for frequent queries
- Optimize database indexing for vector stores
- Implement pagination for large datasets
- Consider async processing for bulk operations

High Memory Usage

- Use disk-based embedding stores for large datasets
- Implement proper pagination and filtering
- Clean up unused embeddings periodically
- Monitor and optimize chunk sizes

Constraints and Warnings

- Embedding Model Costs: Generating embeddings for large document collections can be expensive; implement caching and batch processing.
- Vector Store Scalability: In-memory stores are suitable for development only; use persistent stores (Pinecone, Qdrant, Redis) for production.
- Chunk Size Trade-offs: Smaller chunks improve precision but lose context; larger chunks preserve context but may introduce noise.
- Stale Data: Cached embeddings become stale when source documents change; implement update strategies.
- Token Limits: Retrieved context shares the model's context window with the prompt and the answer; typically 3-5 retrieved chunks fit within standard model limits.
- Hallucination Risk: RAG reduces but doesn't eliminate hallucinations; always validate critical responses against sources.
- Latency: Vector search and embedding generation add latency; consider async processing for real-time applications.
- Metadata Filtering: Overly restrictive filters may return no results; implement fallback strategies.
- Multi-tenancy: Ensure proper metadata isolation to prevent cross-tenant data leakage.
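
The token limit constraint above can be enforced with a simple budget when assembling the context. This is a sketch under stated assumptions: `ContextBudget` is a hypothetical class, and the estimate of about four characters per token is a rough heuristic for English text, not a real tokenizer.

```java
import java.util.ArrayList;
import java.util.List;

final class ContextBudget {

    // Rough estimate: about 4 characters per token, rounded up (assumption).
    static int estimateTokens(String text) {
        return (text.length() + 3) / 4;
    }

    // Takes chunks in ranked order and stops before the budget would be exceeded.
    static List<String> fit(List<String> rankedChunks, int maxContextTokens) {
        List<String> selected = new ArrayList<>();
        int used = 0;
        for (String chunk : rankedChunks) {
            int cost = estimateTokens(chunk);
            if (used + cost > maxContextTokens) {
                break; // keep the ranking intact rather than skipping ahead
            }
            selected.add(chunk);
            used += cost;
        }
        return selected;
    }
}
```

Stopping at the first chunk that does not fit keeps the highest-ranked material together. Swapping the estimate for a model-specific token counter would make the budget accurate.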

References

API Reference - Complete API documentation and interfaces
Examples - Production-ready examples and patterns
Official LangChain4j Documentation

First Seen

Feb 3, 2026

Security Audits

Gen Agent Trust Hub: Pass
Socket: Pass
Snyk: Warn

Installed on

claude-code: 281
gemini-cli: 276
opencode: 275
codex: 271
cursor: 269
github-copilot: 255