llamaindex-development by mindrally/skills
npx skills add https://github.com/mindrally/skills --skill llamaindex-development
You are an expert in LlamaIndex for building RAG (Retrieval-Augmented Generation) applications, data indexing, and LLM-powered applications with Python.
project/
├── data/ # Source documents and data
├── indexes/ # Persisted index storage
├── loaders/ # Custom document loaders
├── retrievers/ # Custom retriever implementations
├── query_engines/ # Query engine configurations
├── prompts/ # Custom prompt templates
├── transformations/ # Document transformations
├── callbacks/ # Custom callback handlers
├── utils/ # Utility functions
├── tests/ # Test files
└── config/ # Configuration files
Use descriptive function names (e.g., create_vector_index, build_query_engine).
from llama_index.core import SimpleDirectoryReader
from llama_index.readers.file import PDFReader, DocxReader
# Load from directory
documents = SimpleDirectoryReader(
input_dir="./data",
recursive=True,
required_exts=[".pdf", ".txt", ".md"]
).load_data()
# Load specific file types
pdf_reader = PDFReader()
documents = pdf_reader.load_data(file="document.pdf")
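SimpleDirectoryReader can also attach per-file metadata at load time through its file_metadata callable; a minimal sketch (the "team" field is just an illustrative custom key):

# Attach metadata to every loaded document, keyed by filename
documents = SimpleDirectoryReader(
    input_dir="./data",
    file_metadata=lambda filename: {"source": filename, "team": "docs"},
).load_data()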
from llama_index.core.readers.base import BaseReader
from llama_index.core import Document
class CustomLoader(BaseReader):
def load_data(self, file_path: str) -> list[Document]:
# Custom loading logic
with open(file_path, 'r') as f:
content = f.read()
return [Document(
text=content,
metadata={"source": file_path}
)]
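Using the loader is then a direct call; a quick usage sketch (the file path is illustrative):

loader = CustomLoader()
documents = loader.load_data("notes.txt")
print(documents[0].metadata)  # {'source': 'notes.txt'}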
from llama_index.core.node_parser import (
SentenceSplitter,
SemanticSplitterNodeParser,
MarkdownNodeParser
)
# Simple sentence splitting
splitter = SentenceSplitter(
chunk_size=1024,
chunk_overlap=200
)
nodes = splitter.get_nodes_from_documents(documents)
# Semantic splitting (preserves meaning)
from llama_index.embeddings.openai import OpenAIEmbedding
semantic_splitter = SemanticSplitterNodeParser(
embed_model=OpenAIEmbedding(),
breakpoint_percentile_threshold=95
)
# Markdown-aware splitting
markdown_splitter = MarkdownNodeParser()
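All node parsers share the same get_nodes_from_documents interface, so swapping strategies is a one-line change; for example:

# Markdown-aware splitting; inspect the resulting node metadata
md_nodes = markdown_splitter.get_nodes_from_documents(documents)
print(md_nodes[0].metadata)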
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb
# In-memory index
index = VectorStoreIndex.from_documents(documents)
# With persistent vector store
chroma_client = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = chroma_client.get_or_create_collection("my_collection")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
documents,
storage_context=storage_context
)
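If the Chroma collection already holds embeddings from a previous run, you can reconnect without re-ingesting the documents:

# Rebuild the index object directly from the existing vector store
index = VectorStoreIndex.from_vector_store(vector_store)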
from llama_index.core import StorageContext, load_index_from_storage
# Persist index
index.storage_context.persist(persist_dir="./storage")
# Load index
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)
from llama_index.core import VectorStoreIndex
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(
similarity_top_k=5,
response_mode="compact"
)
response = query_engine.query("What is the main topic?")
print(response.response)
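Query engines can also stream tokens as they are generated, which improves perceived latency; a minimal sketch:

# Stream the response instead of waiting for the full answer
streaming_engine = index.as_query_engine(streaming=True)
streaming_response = streaming_engine.query("What is the main topic?")
streaming_response.print_response_stream()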
Response modes:
- refine: Iteratively refine the answer through each node
- compact: Combine chunks before sending to the LLM
- tree_summarize: Build a tree and summarize
- simple_summarize: Truncate and summarize
- accumulate: Accumulate responses from each node
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.postprocessor import SimilarityPostprocessor
query_engine = RetrieverQueryEngine.from_args(
retriever=index.as_retriever(similarity_top_k=10),
node_postprocessors=[
SimilarityPostprocessor(similarity_cutoff=0.7)
],
response_mode="compact"
)
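For summarization-style questions over many nodes, tree_summarize from the mode list above is usually a better fit than compact; a minimal sketch:

# Build a bottom-up summary tree over a larger set of retrieved nodes
summary_engine = index.as_query_engine(
    similarity_top_k=20,
    response_mode="tree_summarize"
)
response = summary_engine.query("Summarize the key themes across these documents")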
from llama_index.core.retrievers import VectorIndexRetriever
# Basic retriever
retriever = VectorIndexRetriever(
index=index,
similarity_top_k=10
)
# Retrieve nodes
nodes = retriever.retrieve("search query")
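Retrieval can also be constrained by node metadata; a sketch assuming your nodes carry a "source" key (as set by the custom loader above):

from llama_index.core.vector_stores import MetadataFilters, ExactMatchFilter
# Only retrieve nodes whose metadata matches the filter
filtered_retriever = index.as_retriever(
    similarity_top_k=10,
    filters=MetadataFilters(filters=[
        ExactMatchFilter(key="source", value="document.pdf")
    ])
)
filtered_nodes = filtered_retriever.retrieve("search query")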
from llama_index.core.retrievers import QueryFusionRetriever
# Combine multiple retrieval strategies
retriever = QueryFusionRetriever(
[
index.as_retriever(similarity_top_k=5),
bm25_retriever,  # Keyword-based; one way to build it is sketched below
],
num_queries=4,
use_async=True
)
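The bm25_retriever above is assumed to already exist; one way to construct it is with the separate llama-index-retrievers-bm25 package, reusing the nodes produced by the splitter earlier:

from llama_index.retrievers.bm25 import BM25Retriever
# Keyword (BM25) retriever over the same parsed nodes
bm25_retriever = BM25Retriever.from_defaults(
    nodes=nodes,
    similarity_top_k=5
)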
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings
# OpenAI embeddings
Settings.embed_model = OpenAIEmbedding(
model="text-embedding-3-small",
dimensions=512 # Optional dimension reduction
)
# Local embeddings
Settings.embed_model = HuggingFaceEmbedding(
model_name="BAAI/bge-small-en-v1.5"
)
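A quick way to verify whichever embedding model is configured (and its output size) is to embed a test string:

# Sanity-check the active embedding model
vector = Settings.embed_model.get_text_embedding("hello world")
print(len(vector))  # e.g., 512 with the dimension-reduced OpenAI setting above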
from llama_index.llms.openai import OpenAI
from llama_index.llms.anthropic import Anthropic
from llama_index.core import Settings
# OpenAI
Settings.llm = OpenAI(
model="gpt-4o",
temperature=0.1
)
# Anthropic
Settings.llm = Anthropic(
model="claude-sonnet-4-20250514",
temperature=0.1
)
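The configured LLM can be smoke-tested directly, and recent versions also accept a per-engine llm override if you need to mix models (the model name here is illustrative):

# Direct completion against the globally configured LLM
print(Settings.llm.complete("Say hello in one word."))
# Per-engine override leaves Settings.llm untouched elsewhere
fast_engine = index.as_query_engine(llm=OpenAI(model="gpt-4o-mini"))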
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool, ToolMetadata
# Create tools from query engines
tools = [
QueryEngineTool(
query_engine=documents_query_engine,
metadata=ToolMetadata(
name="documents",
description="Search through documents"
)
),
QueryEngineTool(
query_engine=code_query_engine,
metadata=ToolMetadata(
name="codebase",
description="Search through code"
)
)
]
# Create agent (uses Settings.llm unless an llm= override is passed)
agent = ReActAgent.from_tools(
    tools,
    verbose=True
)
response = agent.chat("Find information about X")
import asyncio
from llama_index.core import Settings
from llama_index.core.ingestion import IngestionPipeline, IngestionCache
# Cache ingestion work (parsing, embedding) so repeated runs skip recomputation
pipeline = IngestionPipeline(
    transformations=[SentenceSplitter(chunk_size=1024), Settings.embed_model],
    cache=IngestionCache(),
)
nodes = pipeline.run(documents=documents)
# Use async for better performance
response = await query_engine.aquery("question")
# Batch processing
responses = await asyncio.gather(*[
query_engine.aquery(q) for q in questions
])
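Outside a notebook, top-level await will not run; a minimal sketch wrapping the batch in an entry point:

import asyncio

async def main():
    questions = ["What is X?", "How does Y work?"]
    responses = await asyncio.gather(*[
        query_engine.aquery(q) for q in questions
    ])
    for r in responses:
        print(r.response)

asyncio.run(main())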
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler
# Debug handler for troubleshooting
debug_handler = LlamaDebugHandler()
callback_manager = CallbackManager([debug_handler])
Settings.callback_manager = callback_manager
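Callbacks can also track token usage; a sketch using TokenCountingHandler alongside the debug handler (counts depend on the tokenizer in use):

from llama_index.core.callbacks import TokenCountingHandler
# Track LLM token usage across queries
token_counter = TokenCountingHandler()
Settings.callback_manager = CallbackManager([debug_handler, token_counter])
response = query_engine.query("What is the main topic?")
print(token_counter.total_llm_token_count)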