google-gemini-embeddings by jezweb/claude-skills
npx skills add https://github.com/jezweb/claude-skills --skill google-gemini-embeddingsGoogle Gemini 嵌入 API 的完整生产就绪指南
此技能全面涵盖了用于生成文本嵌入的 gemini-embedding-001 模型,包括 SDK 使用、REST API 模式、批处理、与 Cloudflare Vectorize 的 RAG 集成,以及语义搜索和文档聚类等高级用例。
安装 Google Generative AI SDK:
npm install @google/genai@^1.37.0
对于 TypeScript 项目:
npm install -D typescript@^5.0.0
将你的 Gemini API 密钥设置为环境变量:
export GEMINI_API_KEY="your-api-key-here"
从以下地址获取你的 API 密钥:https://aistudio.google.com/apikey
广告位招租
在这里展示您的产品或服务
触达数万 AI 开发者,精准高效
import { GoogleGenAI } from "@google/genai";
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const response = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: 'What is the meaning of life?',
config: {
taskType: 'RETRIEVAL_QUERY',
outputDimensionality: 768
}
});
console.log(response.embedding.values); // [0.012, -0.034, ...]
console.log(response.embedding.values.length); // 768
结果:一个表示文本语义含义的 768 维嵌入向量。
当前模型:gemini-embedding-001(稳定,可用于生产)
gemini-embedding-exp-03-07(已于 2025 年 10 月弃用,请勿使用)该模型使用套娃表示学习支持灵活的输岀维度:
| 维度 | 用例 | 存储 | 性能 |
|---|---|---|---|
| 768 | 推荐用于大多数用例 | 低 | 快 |
| 1536 | 准确性与效率之间的平衡 | 中等 | 中等 |
| 3072 | 最高准确性(默认) | 高 | 较慢 |
| 128-3071 | 自定义(范围内的任意值) | 可变 | 可变 |
默认:3072 维 推荐:768、1536 或 3072 以获得最佳性能
| 层级 | RPM | TPM | RPD | 要求 |
|---|---|---|---|---|
| 免费 | 100 | 30,000 | 1,000 | 无需结算账户 |
| 层级 1 | 3,000 | 1,000,000 | - | 关联结算账户 |
| 层级 2 | 5,000 | 5,000,000 | - | 消费额 $250+,等待 30 天 |
| 层级 3 | 10,000 | 10,000,000 | - | 消费额 $1,000+,等待 30 天 |
RPM = 每分钟请求数 TPM = 每分钟令牌数 RPD = 每日请求数
{
embedding: {
values: number[] // 浮点数数组
}
}
单文本嵌入:
import { GoogleGenAI } from "@google/genai";
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const response = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: 'The quick brown fox jumps over the lazy dog',
config: {
taskType: 'SEMANTIC_SIMILARITY',
outputDimensionality: 768
}
});
console.log(response.embedding.values);
// [0.00388, -0.00762, 0.01543, ...]
适用于不支持 SDK 的 Workers/边缘环境:
export default {
async fetch(request: Request, env: Env): Promise<Response> {
const apiKey = env.GEMINI_API_KEY;
const text = "What is the meaning of life?";
const response = await fetch(
'https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:embedContent',
{
method: 'POST',
headers: {
'x-goog-api-key': apiKey,
'Content-Type': 'application/json'
},
body: JSON.stringify({
content: {
parts: [{ text }]
},
taskType: 'RETRIEVAL_QUERY',
outputDimensionality: 768
})
}
);
const data = await response.json();
// 响应格式:
// {
// embedding: {
// values: [0.012, -0.034, ...]
// }
// }
return new Response(JSON.stringify(data), {
headers: { 'Content-Type': 'application/json' }
});
}
};
interface EmbeddingResponse {
embedding: {
values: number[];
};
}
const response: EmbeddingResponse = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: 'Sample text',
config: { taskType: 'SEMANTIC_SIMILARITY' }
});
const embedding: number[] = response.embedding.values;
const dimensions: number = embedding.length; // 默认 3072
⚠️ 关键:当使用 3072 以外的维度时,在计算相似度之前必须对嵌入进行归一化。只有 3072 维的嵌入是由 API 预先归一化的。
为何重要:未归一化的嵌入具有不同的幅度,这会扭曲余弦相似度计算,导致错误的搜索结果。
归一化辅助函数:
/**
* 归一化嵌入向量以进行准确的相似度计算。
* 对于 3072 以外的维度是必需的。
*
* @param vector - 来自 API 响应的嵌入值
* @returns 归一化向量(单位长度)
*/
function normalize(vector: number[]): number[] {
const magnitude = Math.sqrt(
vector.reduce((sum, val) => sum + val * val, 0)
);
return vector.map(val => val / magnitude);
}
// 使用 768 或 1536 维度的用法
const response = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: text,
config: {
taskType: 'RETRIEVAL_QUERY',
outputDimensionality: 768 // 不是 3072
}
});
// ❌ 错误 - 直接使用原始值
const embedding = response.embedding.values;
await vectorize.insert([{ id, values: embedding }]);
// ✅ 正确 - 先归一化
const normalized = normalize(response.embedding.values);
await vectorize.insert([{ id, values: normalized }]);
来源:官方嵌入文档
同时为多个文本生成嵌入:
import { GoogleGenAI } from "@google/genai";
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const texts = [
"What is the meaning of life?",
"How does photosynthesis work?",
"Tell me about the history of the internet."
];
const response = await ai.models.embedContent({
model: 'gemini-embedding-001',
contents: texts, // 字符串数组
config: {
taskType: 'RETRIEVAL_DOCUMENT',
outputDimensionality: 768
}
});
// 处理每个嵌入
response.embeddings.forEach((embedding, index) => {
console.log(`Text ${index}: ${texts[index]}`);
console.log(`Embedding: ${embedding.values.slice(0, 5)}...`);
console.log(`Dimensions: ${embedding.values.length}`);
});
使用 batchEmbedContents 端点:
const response = await fetch(
'https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:batchEmbedContents',
{
method: 'POST',
headers: {
'x-goog-api-key': apiKey,
'Content-Type': 'application/json'
},
body: JSON.stringify({
requests: texts.map(text => ({
model: 'models/gemini-embedding-001',
content: {
parts: [{ text }]
},
taskType: 'RETRIEVAL_DOCUMENT'
}))
})
}
);
const data = await response.json();
// data.embeddings: Array of {values: number[]}
⚠️ 排序错误 (2025 年 12 月):批量 API 在处理大批量(>500 个文本)时可能无法保持顺序。
⚠️ 内存限制 (2025 年 12 月):大批量(>10k 嵌入)可能导致 ERR_STRING_TOO_LONG 崩溃。
Cannot create a string longer than 0x1fffffe8 characters⚠️ 速率限制异常 (2026 年 1 月):批量 API 可能返回 429 RESOURCE_EXHAUSTED,即使未超过配额。
处理大型数据集时,对请求进行分块以保持在速率限制内:
async function batchEmbedWithRateLimit(
texts: string[],
batchSize: number = 50, // 由于排序错误,从 100 减少
delayMs: number = 60000 // 批次间延迟 1 分钟
): Promise<number[][]> {
const allEmbeddings: number[][] = [];
for (let i = 0; i < texts.length; i += batchSize) {
const batch = texts.slice(i, i + batchSize);
console.log(`Processing batch ${i / batchSize + 1} (${batch.length} texts)`);
const response = await ai.models.embedContent({
model: 'gemini-embedding-001',
contents: batch,
config: {
taskType: 'RETRIEVAL_DOCUMENT',
outputDimensionality: 768
}
});
allEmbeddings.push(...response.embeddings.map(e => e.values));
// 等待下一个批次(最后一个批次除外)
if (i + batchSize < texts.length) {
await new Promise(resolve => setTimeout(resolve, delayMs));
}
}
return allEmbeddings;
}
// 用法
const embeddings = await batchEmbedWithRateLimit(documents, 50);
提示:
taskType 参数针对特定用例优化嵌入。为获得最佳结果,请始终指定任务类型。
| 任务类型 | 用例 | 示例 |
|---|---|---|
| RETRIEVAL_QUERY | 用户搜索查询 | "How do I fix a flat tire?" |
| RETRIEVAL_DOCUMENT | 待索引/搜索的文档 | 产品描述、文章 |
| SEMANTIC_SIMILARITY | 比较文本相似性 | 重复检测、聚类 |
| CLASSIFICATION | 文本分类 | 垃圾邮件检测、情感分析 |
| CLUSTERING | 分组相似文本 | 主题建模、内容组织 |
| CODE_RETRIEVAL_QUERY | 代码搜索查询 | "function to sort array" |
| QUESTION_ANSWERING | 寻求答案的问题 | FAQ 匹配 |
| FACT_VERIFICATION | 用证据验证声明 | 事实核查系统 |
RAG 系统(检索增强生成):
// 嵌入用户查询时
const queryEmbedding = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: userQuery,
config: { taskType: 'RETRIEVAL_QUERY' } // ← 使用 RETRIEVAL_QUERY
});
// 为索引嵌入文档时
const docEmbedding = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: documentText,
config: { taskType: 'RETRIEVAL_DOCUMENT' } // ← 使用 RETRIEVAL_DOCUMENT
});
语义搜索:
const embedding = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: text,
config: { taskType: 'SEMANTIC_SIMILARITY' }
});
文档聚类:
const embedding = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: text,
config: { taskType: 'CLUSTERING' }
});
使用正确的任务类型能显著提高检索质量:
// ❌ 差:未指定任务类型
const embedding1 = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: userQuery
});
// ✅ 好:指定了任务类型
const embedding2 = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: userQuery,
config: { taskType: 'RETRIEVAL_QUERY' }
});
结果:使用正确的任务类型可以将搜索相关性提高 10-30%。
RAG(检索增强生成)将向量搜索与 LLM 生成相结合,创建使用自定义知识库回答问题的 AI 系统。
import { GoogleGenAI } from "@google/genai";
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
// 为文本块生成嵌入
async function embedChunks(chunks: string[]): Promise<number[][]> {
const response = await ai.models.embedContent({
model: 'gemini-embedding-001',
contents: chunks,
config: {
taskType: 'RETRIEVAL_DOCUMENT', // ← 用于索引的文档
outputDimensionality: 768 // ← 匹配 Vectorize 索引维度
}
});
return response.embeddings.map(e => e.values);
}
// 存储到 Cloudflare Vectorize
async function storeInVectorize(
env: Env,
chunks: string[],
embeddings: number[][]
) {
const vectors = chunks.map((chunk, i) => ({
id: `doc-${Date.now()}-${i}`,
values: embeddings[i],
metadata: { text: chunk }
}));
await env.VECTORIZE.insert(vectors);
}
async function ragQuery(env: Env, userQuery: string): Promise<string> {
// 1. 嵌入用户查询
const queryResponse = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: userQuery,
config: {
taskType: 'RETRIEVAL_QUERY', // ← 查询,不是文档
outputDimensionality: 768
}
});
const queryEmbedding = queryResponse.embedding.values;
// 2. 在 Vectorize 中搜索相似文档
const results = await env.VECTORIZE.query(queryEmbedding, {
topK: 5,
returnMetadata: true
});
// 3. 从顶部结果中提取上下文
const context = results.matches
.map(match => match.metadata.text)
.join('\n\n');
// 4. 使用上下文生成响应
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: `Context:\n${context}\n\nQuestion: ${userQuery}\n\nAnswer based on the context above:`
});
return response.text;
}
创建 Vectorize 索引(适用于 Gemini 的 768 维度):
npx wrangler vectorize create gemini-embeddings --dimensions 768 --metric cosine
在 wrangler.jsonc 中绑定:
{
"name": "my-rag-app",
"main": "src/index.ts",
"compatibility_date": "2025-10-25",
"vectorize": {
"bindings": [
{
"binding": "VECTORIZE",
"index_name": "gemini-embeddings"
}
]
}
}
完整的 RAG Worker:
完整实现请参见 templates/rag-with-vectorize.ts。
1. API 密钥缺失或无效
// ❌ 错误:未设置 API 密钥
const ai = new GoogleGenAI({});
// ✅ 正确
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
if (!process.env.GEMINI_API_KEY) {
throw new Error('GEMINI_API_KEY environment variable not set');
}
2. 维度不匹配
// ❌ 错误:嵌入有 3072 维,Vectorize 期望 768
const embedding = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: text
// 未指定 outputDimensionality → 默认为 3072
});
await env.VECTORIZE.insert([{
id: '1',
values: embedding.embedding.values // 3072 维,但索引是 768!
}]);
// ✅ 正确:匹配维度
const embedding = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: text,
config: { outputDimensionality: 768 } // ← 匹配索引维度
});
3. 速率限制
// ❌ 错误:429 请求过多
for (let i = 0; i < 1000; i++) {
await ai.models.embedContent({ /* ... */ }); // 超过免费层级的 100 RPM
}
// ✅ 正确:实现速率限制
async function embedWithRetry(text: string, maxRetries = 3) {
for (let attempt = 0; attempt < maxRetries; attempt++) {
try {
return await ai.models.embedContent({
model: 'gemini-embedding-001',
content: text,
config: { taskType: 'SEMANTIC_SIMILARITY' }
});
} catch (error: any) {
if (error.status === 429 && attempt < maxRetries - 1) {
const delay = Math.pow(2, attempt) * 1000; // 指数退避
await new Promise(resolve => setTimeout(resolve, delay));
continue;
}
throw error;
}
}
}
所有 8 个已记录的错误及其详细解决方案,请参见 references/top-errors.md。
本节记录了在生产使用中发现的超出上述基本错误之外的其他问题。
错误:相似度分数不正确,未抛出错误 来源:官方嵌入文档 发生原因:只有 3072 维的嵌入是由 API 预先归一化的。所有其他维度(128-3071)具有不同的幅度,会扭曲余弦相似度。 预防:使用 3072 以外的维度时,始终对嵌入进行归一化。
function normalize(vector: number[]): number[] {
const magnitude = Math.sqrt(vector.reduce((sum, val) => sum + val * val, 0));
return vector.map(val => val / magnitude);
}
// 当使用 768 或 1536 维度时
const response = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: text,
config: { outputDimensionality: 768 }
});
const normalized = normalize(response.embedding.values);
// 现在可以安全地进行相似度计算
错误:静默数据损坏 - 嵌入以错误顺序返回 来源:GitHub Issue #1207 发生原因:批量 API 在处理大批量(>500 个文本)时不保持顺序。例如:第 328 项出现在第 628 位。 预防:处理较小的批次(<100 个文本)或添加唯一标识符以验证顺序。
// 带验证的更安全方法
const taggedTexts = texts.map((text, i) => `[ID:${i}] ${text}`);
const response = await ai.models.embedContent({
model: 'gemini-embedding-001',
contents: taggedTexts,
config: { taskType: 'RETRIEVAL_DOCUMENT', outputDimensionality: 768 }
});
// 如果需要,通过解析 ID 来验证顺序
错误:Cannot create a string longer than 0x1fffffe8 characters 来源:GitHub Issue #1205 发生原因:批量 API 响应包含过多的空白字符,导致响应大小在大型负载(>10k 嵌入)下超过 Node.js 字符串限制(约 536MB)。 预防:限制每批次 <5,000 个文本。
// 安全的批次大小
async function batchEmbedSafe(texts: string[]) {
const maxBatchSize = 5000;
if (texts.length > maxBatchSize) {
throw new Error(`Batch too large: ${texts.length} texts (max: ${maxBatchSize})`);
}
// 处理批次...
}
错误:维度不匹配 - 得到 3072 维度而不是指定的 768 来源:Medium 文章 已验证:多个社区报告 发生原因:LangChain 的 GoogleGenerativeAIEmbeddings 类在传递给构造函数(Python SDK)时静默忽略 output_dimensionality 参数。 预防:将维度参数传递给 embed_documents() 方法,而不是构造函数。JavaScript 用户应验证新的 @google/genai SDK 没有类似行为。
# ❌ 错误 - 参数被静默忽略
embeddings = GoogleGenerativeAIEmbeddings(
model="gemini-embedding-001",
output_dimensionality=768 # 被忽略!
)
# ✅ 正确 - 传递给方法
embeddings = GoogleGenerativeAIEmbeddings(model="gemini-embedding-001")
result = embeddings.embed_documents(["text"], output_dimensionality=768)
错误:单文本嵌入比预期更快地达到速率限制 来源:GitHub Issue #427 (Python SDK) 已验证:googleapis 组织中的官方问题 发生原因:embed_content() 函数内部调用 batchEmbedContents 端点,即使是单文本。这导致更高的速率限制消耗(批量端点有不同的限制)。 预防:在单嵌入请求之间添加延迟,并对 429 错误实现指数退避。
// 添加延迟以避免速率限制
async function embedWithDelay(text: string, delayMs: number = 100) {
const response = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: text,
config: { taskType: 'SEMANTIC_SIMILARITY' }
});
await new Promise(resolve => setTimeout(resolve, delayMs));
return response.embedding.values;
}
✅ 指定任务类型
// 任务类型针对你的用例优化嵌入
const embedding = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: text,
config: { taskType: 'RETRIEVAL_QUERY' } // ← 始终指定
});
✅ 与 Vectorize 匹配维度
// 确保嵌入与你的 Vectorize 索引维度匹配
const embedding = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: text,
config: { outputDimensionality: 768 } // ← 匹配索引
});
✅ 实现速率限制
// 对 429 错误使用指数退避
async function embedWithBackoff(text: string) {
// 实现来自错误处理部分
}
✅ 缓存嵌入
// 缓存嵌入以避免冗余的 API 调用
const cache = new Map<string, number[]>();
async function getCachedEmbedding(text: string): Promise<number[]> {
if (cache.has(text)) {
return cache.get(text)!;
}
const response = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: text,
config: { taskType: 'SEMANTIC_SIMILARITY' }
});
const embedding = response.embedding.values;
cache.set(text, embedding);
return embedding;
}
✅ 对多个文本使用批量 API
// 单个批量请求 vs 多个单独请求
const embeddings = await ai.models.embedContent({
model: 'gemini-embedding-001',
contents: texts, // 文本数组
config: { taskType: 'RETRIEVAL_DOCUMENT' }
});
❌ 不要跳过任务类型
// 降低质量 10-30%
const embedding = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: text
// 缺少 taskType!
});
❌ 不要混合不同的维度
// 无法比较不同维度的嵌入
const emb1 = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: text1,
config: { outputDimensionality: 768 }
});
const emb2 = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: text2,
config: { outputDimensionality: 1536 } // 不同维度!
});
// ❌ 无法计算不同维度之间的相似度
const similarity = cosineSimilarity(emb1.embedding.values, emb2.embedding.values);
❌ 不要为 RAG 使用错误的任务类型
// 降低搜索质量
const queryEmbedding = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: query,
config: { taskType: 'RETRIEVAL_DOCUMENT' } // 错误!应该是 RETRIEVAL_QUERY
});
package.json - 包含已验证版本的包配置basic-embeddings.ts - 使用 SDK 的单文本嵌入embeddings-fetch.ts - 适用于 Cloudflare Workers 的基于 Fetch 的方法batch-embeddings.ts - 带速率限制的批处理rag-with-vectorize.ts - 与 Vectorize 的完整 RAG 实现model-comparison.md - 比较 Gemini vs OpenAI vs Workers AI 嵌入vectorize-integration.md - Cloudflare Vectorize 设置和模式rag-patterns.md - 完整的 RAG 实现策略dimension-guide.md - 选择正确的维度(768 vs 1536 vs 3072)top-errors.md - 8 个常见错误及详细解决方案check-versions.sh - 验证 @google/genai 包版本是否为最新/websites/ai_google_dev_gemini-api令牌节省:与手动实现相比约 60% 预防的错误:13 个已记录的错误及解决方案(8 个基本 + 5 个已知问题)生产测试:✅ 已在 RAG 应用中验证 包版本:@google/genai@1.37.0 最后更新:2026-01-21 变更:添加了归一化要求、批量 API 警告(排序错误、内存限制、速率限制异常)、LangChain 兼容性说明
MIT 许可证 - 可在个人和商业项目中免费使用。
问题或疑问?
每周安装数
337
仓库
GitHub 星标数
643
首次出现
2026 年 1 月 20 日
安全审计
安装于
claude-code275
gemini-cli227
opencode224
antigravity211
cursor205
codex192
Complete production-ready guide for Google Gemini embeddings API
This skill provides comprehensive coverage of the gemini-embedding-001 model for generating text embeddings, including SDK usage, REST API patterns, batch processing, RAG integration with Cloudflare Vectorize, and advanced use cases like semantic search and document clustering.
Install the Google Generative AI SDK:
npm install @google/genai@^1.37.0
For TypeScript projects:
npm install -D typescript@^5.0.0
Set your Gemini API key as an environment variable:
export GEMINI_API_KEY="your-api-key-here"
Get your API key from: https://aistudio.google.com/apikey
import { GoogleGenAI } from "@google/genai";
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const response = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: 'What is the meaning of life?',
config: {
taskType: 'RETRIEVAL_QUERY',
outputDimensionality: 768
}
});
console.log(response.embedding.values); // [0.012, -0.034, ...]
console.log(response.embedding.values.length); // 768
Result : A 768-dimension embedding vector representing the semantic meaning of the text.
Current Model : gemini-embedding-001 (stable, production-ready)
gemini-embedding-exp-03-07 (deprecated October 2025, do not use)The model supports flexible output dimensionality using Matryoshka Representation Learning :
| Dimension | Use Case | Storage | Performance |
|---|---|---|---|
| 768 | Recommended for most use cases | Low | Fast |
| 1536 | Balance between accuracy and efficiency | Medium | Medium |
| 3072 | Maximum accuracy (default) | High | Slower |
| 128-3071 | Custom (any value in range) | Variable | Variable |
Default : 3072 dimensions Recommended : 768, 1536, or 3072 for optimal performance
| Tier | RPM | TPM | RPD | Requirements |
|---|---|---|---|---|
| Free | 100 | 30,000 | 1,000 | No billing account |
| Tier 1 | 3,000 | 1,000,000 | - | Billing account linked |
| Tier 2 | 5,000 | 5,000,000 | - | $250+ spending, 30-day wait |
| Tier 3 | 10,000 | 10,000,000 | - | $1,000+ spending, 30-day wait |
RPM = Requests Per Minute TPM = Tokens Per Minute RPD = Requests Per Day
{
embedding: {
values: number[] // Array of floating-point numbers
}
}
Single text embedding :
import { GoogleGenAI } from "@google/genai";
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const response = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: 'The quick brown fox jumps over the lazy dog',
config: {
taskType: 'SEMANTIC_SIMILARITY',
outputDimensionality: 768
}
});
console.log(response.embedding.values);
// [0.00388, -0.00762, 0.01543, ...]
For Workers/edge environments without SDK support :
export default {
async fetch(request: Request, env: Env): Promise<Response> {
const apiKey = env.GEMINI_API_KEY;
const text = "What is the meaning of life?";
const response = await fetch(
'https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:embedContent',
{
method: 'POST',
headers: {
'x-goog-api-key': apiKey,
'Content-Type': 'application/json'
},
body: JSON.stringify({
content: {
parts: [{ text }]
},
taskType: 'RETRIEVAL_QUERY',
outputDimensionality: 768
})
}
);
const data = await response.json();
// Response format:
// {
// embedding: {
// values: [0.012, -0.034, ...]
// }
// }
return new Response(JSON.stringify(data), {
headers: { 'Content-Type': 'application/json' }
});
}
};
interface EmbeddingResponse {
embedding: {
values: number[];
};
}
const response: EmbeddingResponse = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: 'Sample text',
config: { taskType: 'SEMANTIC_SIMILARITY' }
});
const embedding: number[] = response.embedding.values;
const dimensions: number = embedding.length; // 3072 by default
⚠️ CRITICAL : When using dimensions other than 3072, you MUST normalize embeddings before computing similarity. Only 3072-dimensional embeddings are pre-normalized by the API.
Why This Matters : Non-normalized embeddings have varying magnitudes that distort cosine similarity calculations, leading to incorrect search results.
Normalization Helper Function :
/**
* Normalize embedding vector for accurate similarity calculations.
* REQUIRED for dimensions other than 3072.
*
* @param vector - Embedding values from API response
* @returns Normalized vector (unit length)
*/
function normalize(vector: number[]): number[] {
const magnitude = Math.sqrt(
vector.reduce((sum, val) => sum + val * val, 0)
);
return vector.map(val => val / magnitude);
}
// Usage with 768 or 1536 dimensions
const response = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: text,
config: {
taskType: 'RETRIEVAL_QUERY',
outputDimensionality: 768 // NOT 3072
}
});
// ❌ WRONG - Use raw values directly
const embedding = response.embedding.values;
await vectorize.insert([{ id, values: embedding }]);
// ✅ CORRECT - Normalize first
const normalized = normalize(response.embedding.values);
await vectorize.insert([{ id, values: normalized }]);
Source : Official Embeddings Documentation
Generate embeddings for multiple texts simultaneously:
import { GoogleGenAI } from "@google/genai";
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const texts = [
"What is the meaning of life?",
"How does photosynthesis work?",
"Tell me about the history of the internet."
];
const response = await ai.models.embedContent({
model: 'gemini-embedding-001',
contents: texts, // Array of strings
config: {
taskType: 'RETRIEVAL_DOCUMENT',
outputDimensionality: 768
}
});
// Process each embedding
response.embeddings.forEach((embedding, index) => {
console.log(`Text ${index}: ${texts[index]}`);
console.log(`Embedding: ${embedding.values.slice(0, 5)}...`);
console.log(`Dimensions: ${embedding.values.length}`);
});
Use the batchEmbedContents endpoint:
const response = await fetch(
'https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:batchEmbedContents',
{
method: 'POST',
headers: {
'x-goog-api-key': apiKey,
'Content-Type': 'application/json'
},
body: JSON.stringify({
requests: texts.map(text => ({
model: 'models/gemini-embedding-001',
content: {
parts: [{ text }]
},
taskType: 'RETRIEVAL_DOCUMENT'
}))
})
}
);
const data = await response.json();
// data.embeddings: Array of {values: number[]}
⚠️ Ordering Bug (December 2025) : Batch API may not preserve ordering with large batch sizes (>500 texts).
⚠️ Memory Limit (December 2025) : Large batches (>10k embeddings) can cause ERR_STRING_TOO_LONG crash.
Cannot create a string longer than 0x1fffffe8 characters⚠️ Rate Limit Anomaly (January 2026) : Batch API may return 429 RESOURCE_EXHAUSTED even when under quota.
When processing large datasets, chunk requests to stay within rate limits:
async function batchEmbedWithRateLimit(
texts: string[],
batchSize: number = 50, // REDUCED from 100 due to ordering bug
delayMs: number = 60000 // 1 minute delay between batches
): Promise<number[][]> {
const allEmbeddings: number[][] = [];
for (let i = 0; i < texts.length; i += batchSize) {
const batch = texts.slice(i, i + batchSize);
console.log(`Processing batch ${i / batchSize + 1} (${batch.length} texts)`);
const response = await ai.models.embedContent({
model: 'gemini-embedding-001',
contents: batch,
config: {
taskType: 'RETRIEVAL_DOCUMENT',
outputDimensionality: 768
}
});
allEmbeddings.push(...response.embeddings.map(e => e.values));
// Wait before next batch (except last batch)
if (i + batchSize < texts.length) {
await new Promise(resolve => setTimeout(resolve, delayMs));
}
}
return allEmbeddings;
}
// Usage
const embeddings = await batchEmbedWithRateLimit(documents, 50);
Tips :
The taskType parameter optimizes embeddings for specific use cases. Always specify a task type for best results.
| Task Type | Use Case | Example |
|---|---|---|
| RETRIEVAL_QUERY | User search queries | "How do I fix a flat tire?" |
| RETRIEVAL_DOCUMENT | Documents to be indexed/searched | Product descriptions, articles |
| SEMANTIC_SIMILARITY | Comparing text similarity | Duplicate detection, clustering |
| CLASSIFICATION | Categorizing texts | Spam detection, sentiment analysis |
| CLUSTERING | Grouping similar texts | Topic modeling, content organization |
| CODE_RETRIEVAL_QUERY | Code search queries | "function to sort array" |
| QUESTION_ANSWERING | Questions seeking answers | FAQ matching |
RAG Systems (Retrieval Augmented Generation):
// When embedding user queries
const queryEmbedding = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: userQuery,
config: { taskType: 'RETRIEVAL_QUERY' } // ← Use RETRIEVAL_QUERY
});
// When embedding documents for indexing
const docEmbedding = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: documentText,
config: { taskType: 'RETRIEVAL_DOCUMENT' } // ← Use RETRIEVAL_DOCUMENT
});
Semantic Search :
const embedding = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: text,
config: { taskType: 'SEMANTIC_SIMILARITY' }
});
Document Clustering :
const embedding = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: text,
config: { taskType: 'CLUSTERING' }
});
Using the correct task type significantly improves retrieval quality:
// ❌ BAD: No task type specified
const embedding1 = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: userQuery
});
// ✅ GOOD: Task type specified
const embedding2 = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: userQuery,
config: { taskType: 'RETRIEVAL_QUERY' }
});
Result : Using the right task type can improve search relevance by 10-30%.
RAG (Retrieval Augmented Generation) combines vector search with LLM generation to create AI systems that answer questions using custom knowledge bases.
import { GoogleGenAI } from "@google/genai";
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
// Generate embeddings for chunks
async function embedChunks(chunks: string[]): Promise<number[][]> {
const response = await ai.models.embedContent({
model: 'gemini-embedding-001',
contents: chunks,
config: {
taskType: 'RETRIEVAL_DOCUMENT', // ← Documents for indexing
outputDimensionality: 768 // ← Match Vectorize index dimensions
}
});
return response.embeddings.map(e => e.values);
}
// Store in Cloudflare Vectorize
async function storeInVectorize(
env: Env,
chunks: string[],
embeddings: number[][]
) {
const vectors = chunks.map((chunk, i) => ({
id: `doc-${Date.now()}-${i}`,
values: embeddings[i],
metadata: { text: chunk }
}));
await env.VECTORIZE.insert(vectors);
}
async function ragQuery(env: Env, userQuery: string): Promise<string> {
// 1. Embed user query
const queryResponse = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: userQuery,
config: {
taskType: 'RETRIEVAL_QUERY', // ← Query, not document
outputDimensionality: 768
}
});
const queryEmbedding = queryResponse.embedding.values;
// 2. Search Vectorize for similar documents
const results = await env.VECTORIZE.query(queryEmbedding, {
topK: 5,
returnMetadata: true
});
// 3. Extract context from top results
const context = results.matches
.map(match => match.metadata.text)
.join('\n\n');
// 4. Generate response with context
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
contents: `Context:\n${context}\n\nQuestion: ${userQuery}\n\nAnswer based on the context above:`
});
return response.text;
}
Create Vectorize Index (768 dimensions for Gemini):
npx wrangler vectorize create gemini-embeddings --dimensions 768 --metric cosine
Bind in wrangler.jsonc :
{
"name": "my-rag-app",
"main": "src/index.ts",
"compatibility_date": "2025-10-25",
"vectorize": {
"bindings": [
{
"binding": "VECTORIZE",
"index_name": "gemini-embeddings"
}
]
}
}
Complete RAG Worker :
See templates/rag-with-vectorize.ts for full implementation.
1. API Key Missing or Invalid
// ❌ Error: API key not set
const ai = new GoogleGenAI({});
// ✅ Correct
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
if (!process.env.GEMINI_API_KEY) {
throw new Error('GEMINI_API_KEY environment variable not set');
}
2. Dimension Mismatch
// ❌ Error: Embedding has 3072 dims, Vectorize expects 768
const embedding = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: text
// No outputDimensionality specified → defaults to 3072
});
await env.VECTORIZE.insert([{
id: '1',
values: embedding.embedding.values // 3072 dims, but index is 768!
}]);
// ✅ Correct: Match dimensions
const embedding = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: text,
config: { outputDimensionality: 768 } // ← Match index dimensions
});
3. Rate Limiting
// ❌ Error: 429 Too Many Requests
for (let i = 0; i < 1000; i++) {
await ai.models.embedContent({ /* ... */ }); // Exceeds 100 RPM on free tier
}
// ✅ Correct: Implement rate limiting
async function embedWithRetry(text: string, maxRetries = 3) {
for (let attempt = 0; attempt < maxRetries; attempt++) {
try {
return await ai.models.embedContent({
model: 'gemini-embedding-001',
content: text,
config: { taskType: 'SEMANTIC_SIMILARITY' }
});
} catch (error: any) {
if (error.status === 429 && attempt < maxRetries - 1) {
const delay = Math.pow(2, attempt) * 1000; // Exponential backoff
await new Promise(resolve => setTimeout(resolve, delay));
continue;
}
throw error;
}
}
}
See references/top-errors.md for all 8 documented errors with detailed solutions.
This section documents additional issues discovered in production use (beyond basic errors above).
Error : Incorrect similarity scores, no error thrown Source : Official Embeddings Documentation Why It Happens : Only 3072-dimensional embeddings are pre-normalized by the API. All other dimensions (128-3071) have varying magnitudes that distort cosine similarity. Prevention : Always normalize embeddings when using dimensions other than 3072.
function normalize(vector: number[]): number[] {
const magnitude = Math.sqrt(vector.reduce((sum, val) => sum + val * val, 0));
return vector.map(val => val / magnitude);
}
// When using 768 or 1536 dimensions
const response = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: text,
config: { outputDimensionality: 768 }
});
const normalized = normalize(response.embedding.values);
// Now safe for similarity calculations
Error : Silent data corruption - embeddings returned in wrong order Source : GitHub Issue #1207 Why It Happens : Batch API does not preserve ordering with large batch sizes (>500 texts). Example: entry 328 appears in position 628. Prevention : Process smaller batches (<100 texts) or add unique identifiers to verify ordering.
// Safer approach with verification
const taggedTexts = texts.map((text, i) => `[ID:${i}] ${text}`);
const response = await ai.models.embedContent({
model: 'gemini-embedding-001',
contents: taggedTexts,
config: { taskType: 'RETRIEVAL_DOCUMENT', outputDimensionality: 768 }
});
// Verify ordering by parsing IDs if needed
Error : Cannot create a string longer than 0x1fffffe8 characters Source : GitHub Issue #1205 Why It Happens : Batch API response contains excessive whitespace causing response size to exceed Node.js string limit (~536MB) with large payloads (>10k embeddings). Prevention : Limit batches to <5,000 texts per request.
// Safe batch size
async function batchEmbedSafe(texts: string[]) {
const maxBatchSize = 5000;
if (texts.length > maxBatchSize) {
throw new Error(`Batch too large: ${texts.length} texts (max: ${maxBatchSize})`);
}
// Process batch...
}
Error : Dimension mismatch - getting 3072 dimensions instead of specified 768 Source : Medium Article Verified : Multiple community reports Why It Happens : LangChain's GoogleGenerativeAIEmbeddings class silently ignores output_dimensionality parameter when passed to constructor (Python SDK). Prevention : Pass dimension parameter to embed_documents() method, not constructor. JavaScript users should verify new @google/genai SDK doesn't have similar behavior.
# ❌ WRONG - parameter silently ignored
embeddings = GoogleGenerativeAIEmbeddings(
model="gemini-embedding-001",
output_dimensionality=768 # IGNORED!
)
# ✅ CORRECT - pass to method
embeddings = GoogleGenerativeAIEmbeddings(model="gemini-embedding-001")
result = embeddings.embed_documents(["text"], output_dimensionality=768)
Error : Hitting rate limits faster than expected with single text embeddings Source : GitHub Issue #427 (Python SDK) Verified : Official issue in googleapis organization Why It Happens : The embed_content() function internally calls batchEmbedContents endpoint even for single texts. This causes higher rate limit consumption (batch endpoint has different limits). Prevention : Add delays between single embedding requests and implement exponential backoff for 429 errors.
// Add delays to avoid rate limits
async function embedWithDelay(text: string, delayMs: number = 100) {
const response = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: text,
config: { taskType: 'SEMANTIC_SIMILARITY' }
});
await new Promise(resolve => setTimeout(resolve, delayMs));
return response.embedding.values;
}
✅ Specify Task Type
// Task type optimizes embeddings for your use case
const embedding = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: text,
config: { taskType: 'RETRIEVAL_QUERY' } // ← Always specify
});
✅ Match Dimensions with Vectorize
// Ensure embeddings match your Vectorize index dimensions
const embedding = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: text,
config: { outputDimensionality: 768 } // ← Match index
});
✅ Implement Rate Limiting
// Use exponential backoff for 429 errors
async function embedWithBackoff(text: string) {
// Implementation from Error Handling section
}
✅ Cache Embeddings
// Cache embeddings to avoid redundant API calls
const cache = new Map<string, number[]>();
async function getCachedEmbedding(text: string): Promise<number[]> {
if (cache.has(text)) {
return cache.get(text)!;
}
const response = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: text,
config: { taskType: 'SEMANTIC_SIMILARITY' }
});
const embedding = response.embedding.values;
cache.set(text, embedding);
return embedding;
}
✅ Use Batch API for Multiple Texts
// Single batch request vs multiple individual requests
const embeddings = await ai.models.embedContent({
model: 'gemini-embedding-001',
contents: texts, // Array of texts
config: { taskType: 'RETRIEVAL_DOCUMENT' }
});
❌ Don't Skip Task Type
// Reduces quality by 10-30%
const embedding = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: text
// Missing taskType!
});
❌ Don't Mix Different Dimensions
// Can't compare embeddings with different dimensions
const emb1 = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: text1,
config: { outputDimensionality: 768 }
});
const emb2 = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: text2,
config: { outputDimensionality: 1536 } // Different dimensions!
});
// ❌ Can't calculate similarity between different dimensions
const similarity = cosineSimilarity(emb1.embedding.values, emb2.embedding.values);
❌ Don't Use Wrong Task Type for RAG
// Reduces search quality
const queryEmbedding = await ai.models.embedContent({
model: 'gemini-embedding-001',
content: query,
config: { taskType: 'RETRIEVAL_DOCUMENT' } // Wrong! Should be RETRIEVAL_QUERY
});
package.json - Package configuration with verified versionsbasic-embeddings.ts - Single text embedding with SDKembeddings-fetch.ts - Fetch-based for Cloudflare Workersbatch-embeddings.ts - Batch processing with rate limitingrag-with-vectorize.ts - Complete RAG implementation with Vectorizemodel-comparison.md - Compare Gemini vs OpenAI vs Workers AI embeddingsvectorize-integration.md - Cloudflare Vectorize setup and patternsrag-patterns.md - Complete RAG implementation strategiesdimension-guide.md - Choosing the right dimensions (768 vs 1536 vs 3072)top-errors.md - 8 common errors and detailed solutionscheck-versions.sh - Verify @google/genai package version is current/websites/ai_google_dev_gemini-apiToken Savings : ~60% compared to manual implementation Errors Prevented : 13 documented errors with solutions (8 basic + 5 known issues) Production Tested : ✅ Verified in RAG applications Package Version : @google/genai@1.37.0 Last Updated : 2026-01-21 Changes : Added normalization requirement, batch API warnings (ordering bug, memory limits, rate limit anomaly), LangChain compatibility notes
MIT License - Free to use in personal and commercial projects.
Questions or Issues?
Weekly Installs
337
Repository
GitHub Stars
643
First Seen
Jan 20, 2026
Security Audits
Gen Agent Trust HubPassSocketPassSnykWarn
Installed on
claude-code275
gemini-cli227
opencode224
antigravity211
cursor205
codex192
Azure Data Explorer (Kusto) 查询技能:KQL数据分析、日志遥测与时间序列处理
102,600 周安装
| Verifying claims with evidence |
| Fact-checking systems |