Chroma Vector Database Guide: Local and Cloud Deployment, Hybrid Search, and Embedding Model Selection

chroma by chroma-core/agent-skills

104 weekly installs

13 GitHub Stars

Install command

npx skills add https://github.com/chroma-core/agent-skills --skill chroma

AI/Machine Learning, Database, Search

Instructions

Before writing any code, gather this information:

Deployment target: Local Chroma or Chroma Cloud?
- If Cloud: they'll need API key, tenant, and database configured
- If Local: determine if they need persistence or ephemeral storage
Search type (Cloud only): Dense only, or hybrid search?
- Dense only: simpler setup, good for most semantic search
- Hybrid (dense + sparse): better for keyword-heavy queries, use SPLADE
Embedding model: Which provider/model?
- Default: @chroma-core/default-embed (TypeScript) or built-in (Python)
- OpenAI: text-embedding-3-large is most popular, requires @chroma-core/openai
- Ask the user if they have a preference or existing provider
Data structure: What are they indexing?
- Needed to determine chunking strategy
- Needed to design metadata schema for filtering

Decision workflow

User wants to add search
Ask: Local Chroma or Chroma Cloud?
- Local Chroma
  - Use collection.query() with a dense embedding model
- Chroma Cloud
  - Ask if hybrid search is needed
    - Yes
      - Use Schema() + Search() APIs with SPLADE sparse index
    - No
      - Use collection.query() with a dense embedding model
Ask which embedding model to use
Design metadata schema
Implement data sync strategy
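
The branching at the top of this workflow can be written down as a small function. This is only a sketch of the decision logic; SearchPlan and plan_search are hypothetical names introduced for illustration, and the strings they return simply echo the options listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SearchPlan:
    client: str                  # which client class to import
    api: str                     # which query API to use
    sparse_index: Optional[str]  # sparse embedding strategy, if any


def plan_search(deployment: str, hybrid: bool = False) -> SearchPlan:
    """Map the two workflow questions onto a client and query API."""
    if deployment == "local":
        if hybrid:
            # Schema() and Search() are described as Cloud-only.
            raise ValueError("hybrid search requires Chroma Cloud")
        return SearchPlan("Client", "collection.query()", None)
    if deployment == "cloud":
        if hybrid:
            return SearchPlan("CloudClient", "Schema() + Search()", "SPLADE")
        return SearchPlan("CloudClient", "collection.query()", None)
    raise ValueError(f"unknown deployment target: {deployment!r}")
```

The remaining steps (embedding model, metadata schema, data sync) depend on the user's data and are not captured here.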

When to ask questions vs proceed

Ask first:

Embedding model choice (cost and quality implications)
Cloud vs local deployment
Hybrid vs dense-only search
Multi-tenant data isolation strategy

Proceed with sensible defaults:

Use getOrCreateCollection() / get_or_create_collection()
Use cosine similarity (most common)
Chunk size under 8KB
Store source IDs in metadata for updates/deletes
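
Two of these defaults, the chunk size limit and the source ID in metadata, can be combined in one preparation step. The sketch below is an assumption about how that might look: chunk_text and build_records are hypothetical helpers, the whitespace-based splitting is a deliberately naive strategy, and the "source_id" and "chunk_index" metadata keys are illustrative names.

```python
def chunk_text(text: str, max_bytes: int = 8 * 1024) -> list[str]:
    """Split text on whitespace into chunks whose UTF-8 size stays under max_bytes.

    A single word longer than max_bytes is emitted as its own oversized chunk.
    """
    chunks, current, size = [], [], 0
    for word in text.split():
        word_size = len(word.encode("utf-8")) + 1  # +1 for the joining space
        if current and size + word_size > max_bytes:
            chunks.append(" ".join(current))
            current, size = [], 0
        current.append(word)
        size += word_size
    if current:
        chunks.append(" ".join(current))
    return chunks


def build_records(source_id: str, text: str, max_bytes: int = 8 * 1024) -> dict:
    """Build ids, documents, and metadatas that all point back to source_id."""
    chunks = chunk_text(text, max_bytes)
    indexes = range(len(chunks))
    return {
        "ids": [f"{source_id}:{i}" for i in indexes],
        "documents": chunks,
        "metadatas": [{"source_id": source_id, "chunk_index": i} for i in indexes],
    }
```

The resulting dictionary is shaped so that it could be passed as collection.add(**records). When the source changes, every chunk can be located again by filtering on the source_id metadata key.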

What to validate

Environment variables are set for Cloud deployments
Correct client import (CloudClient vs Client)
Embedding function package is installed (TypeScript)
Schema and Search APIs only used with Cloud
Important: get_or_create_collection() accepts either an embedding_function OR a schema, but not both. Use Schema when you need multiple indexes (hybrid search) or sparse embeddings; use embedding_function for simple dense-only search.
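
The either/or rule above can be checked before the client is ever called. The following is a minimal sketch: collection_kwargs is a hypothetical helper, not part of chromadb, and only the parameter names embedding_function and schema come from the rule itself.

```python
def collection_kwargs(name: str, embedding_function=None, schema=None) -> dict:
    """Build keyword arguments for get_or_create_collection().

    Raises ValueError when both embedding_function and schema are given,
    since the two options are described as mutually exclusive.
    """
    if embedding_function is not None and schema is not None:
        raise ValueError("pass either embedding_function or schema, not both")
    kwargs = {"name": name}
    if schema is not None:
        kwargs["schema"] = schema  # multiple indexes or sparse embeddings
    elif embedding_function is not None:
        kwargs["embedding_function"] = embedding_function  # dense-only search
    return kwargs
```

The result would then be unpacked into the call, for example client.get_or_create_collection(**collection_kwargs("my_collection", schema=my_schema)), where my_schema stands for whatever Schema object the application has built.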

Quick Start

Chroma Cloud Setup (CLI)

To get started with Chroma Cloud, use the CLI to log in, create a database, and write your credentials to a .env file:

chroma login
chroma db create <my_database_name>
chroma db connect <my_database_name> --env-file

This writes a .env file with CHROMA_API_KEY, CHROMA_TENANT, and CHROMA_DATABASE to the current directory. The code examples below read from these environment variables.
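
Python's os.environ does not read a .env file on its own, so the values have to be loaded into the process before the Python example can find them. The sketch below assumes the file contains plain KEY=VALUE lines (the exact layout the CLI writes is not shown here); parse_env_file and missing_vars are hypothetical helper names.

```python
REQUIRED_VARS = ("CHROMA_API_KEY", "CHROMA_TENANT", "CHROMA_DATABASE")


def parse_env_file(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks, comments, and malformed lines."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(env) -> list[str]:
    """Return the required Chroma Cloud variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling missing_vars(os.environ) at startup gives a clear error message before any client is constructed. A dedicated loader such as python-dotenv handles more edge cases than this parser does.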

TypeScript (Chroma Cloud):

import { CloudClient } from 'chromadb';
import { DefaultEmbeddingFunction } from '@chroma-core/default-embed';

const client = new CloudClient({
  apiKey: process.env.CHROMA_API_KEY,
  tenant: process.env.CHROMA_TENANT,
  database: process.env.CHROMA_DATABASE,
});

const embeddingFunction = new DefaultEmbeddingFunction();
const collection = await client.getOrCreateCollection({
  name: 'my_collection',
  embeddingFunction,
});

// Add documents
await collection.add({
  ids: ['doc1', 'doc2'],
  documents: ['First document text', 'Second document text'],
});

// Query
const results = await collection.query({
  queryTexts: ['search query'],
  nResults: 5,
});

Python (Chroma Cloud):

import os
import chromadb

client = chromadb.CloudClient(
    api_key=os.environ["CHROMA_API_KEY"],
    tenant=os.environ["CHROMA_TENANT"],
    database=os.environ["CHROMA_DATABASE"],
)

collection = client.get_or_create_collection(name="my_collection")

# Add documents
collection.add(
    ids=["doc1", "doc2"],
    documents=["First document text", "Second document text"],
)

# Query
results = collection.query(
    query_texts=["search query"],
    n_results=5,
)
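
The value returned by collection.query() holds one inner list per query text, so the hits for the first query sit at index 0 of each field. The helper below is a small sketch of flattening that structure into rows; rows_for_query is a hypothetical name, and the sketch assumes the result behaves like a dictionary with "ids", "documents", and "distances" keys.

```python
def rows_for_query(results: dict, query_index: int = 0) -> list[dict]:
    """Flatten one query's hits into a list of {id, document, distance} rows."""
    ids = results["ids"][query_index]

    def column(key: str) -> list:
        # A field that was not requested may be missing or None.
        values = results.get(key)
        return values[query_index] if values else [None] * len(ids)

    return [
        {"id": doc_id, "document": document, "distance": distance}
        for doc_id, document, distance in zip(ids, column("documents"), column("distances"))
    ]
```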

Understanding Chroma

Chroma is a database. A Chroma database contains collections. A collection contains documents.

Unlike tables in a relational database, collections are created and destroyed at the application level. Each Chroma database can have millions of collections. There may be a collection for each user, team, or organization. Rather than partitioning a table by some key, Chroma uses the collection itself as the partition.
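
With one collection per user, team, or organization, the application needs a predictable way to derive a collection name from an owner. The sketch below shows one possible convention. collection_name is a hypothetical helper, and the character rule it applies (lowercase letters, digits, dots, underscores, and hyphens, at least three characters) is a conservative assumption rather than Chroma's documented limit.

```python
import re


def collection_name(kind: str, owner_id: str) -> str:
    """Derive a collection name such as 'user-alice-smith' from an owner."""
    raw = f"{kind}-{owner_id}".lower()
    # Replace any run of disallowed characters with a single hyphen,
    # then trim separators from both ends.
    safe = re.sub(r"[^a-z0-9._-]+", "-", raw).strip("-._")
    if len(safe) < 3:
        raise ValueError(f"cannot derive a usable name from {raw!r}")
    return safe
```

Different owner IDs can collapse to the same name under this scheme (for example "a b" and "a-b"), so an application with untrusted IDs would want to use a stable internal identifier instead.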

Collections don't have rows; they have documents. A document is the text data to be searched. When data is created or updated, the client creates an embedding of it. This happens on the client side, using the embedding function(s) provided to the client. To create the embedding, the client uses its configuration to call the configured embedding model provider through the embedding function. This can happen in process, but in the overwhelming majority of cases it is an HTTP call to a third-party service.
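
To make the idea of an embedding function concrete, here is a toy, in-process stand-in: a callable that takes a list of texts and returns one vector per text. ToyHashEmbedding is a hypothetical class, not part of chromadb, and the hashed bag-of-words vectors it produces carry no semantic meaning. A real embedding function would call a model or a provider's API instead.

```python
import hashlib
import math


class ToyHashEmbedding:
    """Map each text to a fixed-size, L2-normalized bag-of-words vector."""

    def __init__(self, dimensions: int = 16):
        self.dimensions = dimensions

    def __call__(self, input: list[str]) -> list[list[float]]:
        return [self._embed(text) for text in input]

    def _embed(self, text: str) -> list[float]:
        vector = [0.0] * self.dimensions
        for token in text.lower().split():
            # sha256 keeps the bucket stable across processes,
            # unlike Python's built-in hash().
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            bucket = int.from_bytes(digest[:4], "big") % self.dimensions
            vector[bucket] += 1.0
        norm = math.sqrt(sum(v * v for v in vector))
        return [v / norm for v in vector] if norm else vector
```

The same callable has to be used when writing and when querying, otherwise the query vector and the stored vectors would not be comparable.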

Data can be further partitioned or filtered with document metadata. Each document has a key/value metadata object. Keys are strings, and values can be strings, ints, or booleans. A variety of operators are available for filtering on metadata.
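
A metadata filter is expressed as a plain dictionary. The sketch below assumes Chroma's filter syntax in which a single condition is a one-key object, operators such as $gte are nested under the key, and several conditions are combined under $and. where_all is a hypothetical helper, and the "team" and "year" keys are made-up examples.

```python
from typing import Optional


def where_all(**conditions) -> Optional[dict]:
    """Combine keyword conditions into one filter that requires all of them.

    A value may be a literal (equality) or an operator object
    such as {"$gte": 2024}.
    """
    clauses = [{key: value} for key, value in conditions.items()]
    if not clauses:
        return None  # no filter at all
    if len(clauses) == 1:
        return clauses[0]
    return {"$and": clauses}
```

The result would be passed as the where argument of a query, for example collection.query(query_texts=["search query"], where=where_all(team="search", year={"$gte": 2024})).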

At query time, the query text is embedded using the collection's embedding function and sent to Chroma along with the rest of the query parameters. Chroma first applies any query parameters such as metadata filters to narrow the candidate set, then searches for the nearest neighbors by computing a distance between the query vector and the vectors indexed in the collection being queried.
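
The filter-then-rank sequence can be illustrated with an exhaustive scan over a handful of vectors. This is a conceptual sketch only: a real vector index avoids comparing the query against every record, and the record layout used here (dictionaries with "id", "embedding", and "metadata" keys) and the equality-only filter are assumptions made for the example.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity; 1.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def nearest(query_vector, records, n_results=5, where=None) -> list[str]:
    """Filter records by exact metadata match, then rank by cosine distance."""
    candidates = [
        r for r in records
        if where is None
        or all(r["metadata"].get(key) == value for key, value in where.items())
    ]
    ranked = sorted(candidates, key=lambda r: cosine_distance(query_vector, r["embedding"]))
    return [r["id"] for r in ranked[:n_results]]
```

Applying the filter first matters: a record that fails the metadata condition never competes for one of the n_results slots.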

Working with collections is made easy by get_or_create_collection() (getOrCreateCollection() in TypeScript) on the Chroma client, which avoids the boilerplate of checking whether a collection exists before creating it.

Local vs Cloud

Chroma can be run locally as a process or can be used in the cloud with Chroma Cloud.

Everything that can be done locally can be done in the cloud, but not everything that can be done in the cloud can be done locally.

The biggest difference in developer experience is the Schema() and Search() APIs, which are only available on Chroma Cloud.

Otherwise, the only thing that needs to change is the client imported from the Chroma package; the interface is the same.

If you're using Cloud, you probably want to use the Schema() and Search() APIs.

Also, if the user wants to use Cloud, ask them which type of search they want: dense embeddings only, or hybrid. For hybrid, you probably want to use SPLADE as the sparse embedding strategy.
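
Hybrid search ultimately has to merge a dense ranking and a sparse ranking into one list. One widely used, generic way to do that is reciprocal rank fusion (RRF). The sketch below illustrates the technique in plain Python; it is not Chroma's Search() API, and the constant k=60 is a conventional default chosen here as an assumption.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked ID lists; each hit scores 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

A document that appears near the top of both the dense and the sparse list ends up ahead of one that ranks highly in only one of them, which is the behavior that makes hybrid search useful for keyword-heavy queries.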

Embeddings

When working with embedding functions, the default embedding function is available, but it's often not the best option. The recommended option is Chroma Cloud Qwen. In TypeScript, run npm install @chroma-core/chroma-cloud-qwen. In Python, it is included, but needs pip install httpx.

In TypeScript, each embedding function lives in its own package, so install the one that matches the provider the user names.

Note that Chroma has server-side embedding support for SPLADE and Qwen (via @chroma-core/chroma-cloud-qwen in TypeScript); all other embedding functions are external.

Learn More

If you need more detailed information about Chroma beyond what's covered in this skill, fetch Chroma's llms.txt for comprehensive documentation: https://docs.trychroma.com/llms.txt

Available Topics

TypeScript

Chroma Regex Filtering - Learn how to use regex filters in Chroma queries
Query and Get - Query and Get Data from Chroma Collections
Schema - Schema() configures collections with multiple indexes
Updating and Deleting - Update existing documents and delete data from collections
Error Handling - Handling errors and failures when working with Chroma
Local Chroma - How to run and use local Chroma
Search() API - An expressive and flexible API for doing dense and sparse vector search on collections, as well as hybrid search

Python

Chroma Regex Filtering - Learn how to use regex filters in Chroma queries
Query and Get - Query and Get Data from Chroma Collections
Schema - Schema() configures collections with multiple indexes
Updating and Deleting - Update existing documents and delete data from collections
Error Handling - Handling errors and failures when working with Chroma
Local Chroma - How to run and use local Chroma
Search() API - An expressive and flexible API for doing dense and sparse vector search on collections, as well as hybrid search

General

Chroma CLI - Getting started with the Chroma CLI for cloud database management
Data Model - An overview of how Chroma stores data
Integrating Chroma into an existing system - Guidance for adding Chroma search to an existing application

Repository: chroma-core/agent-skills

First Seen: Jan 21, 2026
