Gemini Token Optimization Guide: Reducing AI Costs, Choosing Models, and Caching Strategies

gemini-token-optimization by melodic-software/claude-code-plugins

71 weekly installs

50 GitHub Stars

Install command

npx skills add https://github.com/melodic-software/claude-code-plugins --skill gemini-token-optimization

AI/Machine Learning, Command-Line Tools, Performance Optimization

Gemini Token Optimization

🚨 MANDATORY: Invoke gemini-cli-docs First

STOP - Before providing ANY response about Gemini token usage:

INVOKE gemini-cli-docs skill

QUERY for the specific token or pricing topic

BASE all responses EXCLUSIVELY on official documentation loaded

Overview

Skill for optimizing cost and token usage when delegating to Gemini CLI. Essential for efficient bulk operations and cost-conscious workflows.

When to Use This Skill

Keywords: token usage, cost optimization, gemini cost, model selection, flash vs pro, caching, batch queries, reduce tokens

Use this skill when:

Planning bulk Gemini operations
Optimizing costs for large-scale analysis
Choosing between Flash and Pro models
Understanding token caching benefits
Tracking usage across sessions

Token Caching

Gemini CLI automatically caches context to reduce costs by reusing previously processed content.

Availability

Auth Method	Caching Available
API key (Gemini API)	YES
Vertex AI	YES
OAuth (personal/enterprise)	NO
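
The table above can be turned into a small guard at the top of a script. This is only a sketch: the function name `caching_available` and the labels `api-key`, `vertex-ai`, and `oauth` are hypothetical names chosen for illustration, not identifiers from Gemini CLI.

```shell
# caching_available AUTH_METHOD
# AUTH_METHOD labels (api-key, vertex-ai, oauth) are hypothetical names for
# this sketch; they mirror the rows of the availability table.
# Returns 0 when the table lists caching as available, 1 when it does not,
# and 2 for a label the table does not cover.
caching_available() {
  case "$1" in
    api-key|vertex-ai) return 0 ;;
    oauth)             return 1 ;;
    *) echo "unknown auth method: $1" >&2; return 2 ;;
  esac
}

# Usage: decide whether cache-oriented tuning is worth the effort.
if caching_available "api-key"; then
  echo "caching available: keep prompts and file order stable"
else
  echo "no caching: focus on model choice and smaller prompts"
fi
```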

How It Works

System instructions and repeated context are cached
Cached tokens reduce what you pay for input; check the official pricing documentation for whether they are discounted or free on your plan
View savings via /stats command or JSON output
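
How much a cache hit saves depends on how cached tokens are priced for your account. The sketch below is an illustration, not official pricing logic: `estimate_input_cost` is a hypothetical helper, and both the price per million tokens and the cached-token factor are placeholders the caller must supply (use a factor of 0 if cached tokens are not billed, or a fraction such as 0.25 if they are discounted).

```shell
# estimate_input_cost TOTAL CACHED PRICE_PER_MILLION CACHED_FACTOR
# TOTAL and CACHED are token counts taken from the JSON stats.
# PRICE_PER_MILLION and CACHED_FACTOR are caller-supplied placeholders,
# not real Gemini rates.
estimate_input_cost() {
  awk -v total="$1" -v cached="$2" -v price="$3" -v factor="$4" 'BEGIN {
    uncached = total - cached
    # Uncached tokens pay full price; cached tokens pay price * factor.
    printf "%.4f\n", (uncached + cached * factor) * price / 1000000
  }'
}
```

For example, `estimate_input_cost 1000000 400000 1.00 0.25` prices 600,000 uncached tokens in full and 400,000 cached tokens at a quarter of the placeholder rate.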

Maximizing Cache Hits

Use consistent system prompts - Same prefix increases cache reuse
Batch similar queries - Group related analysis together
Reuse context files - Same files in same order
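
One way to apply these three tips in a script is to build every prompt from the same fixed prefix and to feed context files in a stable order. The following is a minimal sketch; `SHARED_PREFIX`, `build_prompt`, and `ordered_files` are hypothetical names introduced here for illustration.

```shell
# Hypothetical shared prefix; any fixed text works as long as it never
# changes between calls.
SHARED_PREFIX="You are reviewing a TypeScript codebase. Answer concisely."

# build_prompt QUESTION
# Puts the fixed prefix first and the variable question last.
build_prompt() {
  printf '%s\n\n%s\n' "$SHARED_PREFIX" "$1"
}

# ordered_files FILE...
# Prints the given paths in a stable, locale-independent order so that
# context is always concatenated the same way.
ordered_files() {
  printf '%s\n' "$@" | LC_ALL=C sort
}
```

A call would then look like `gemini "$(build_prompt "List all exports")" --output-format json`, with the file list coming from `ordered_files src/*.ts`.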

Monitoring Cache Usage

result=$(gemini "query" --output-format json)
total=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.total) | add // 0')
cached=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.cached) | add // 0')
uncached=$((total - cached))

# Guard against division by zero when the response reports no tokens
if [ "$total" -gt 0 ]; then
  savings=$((cached * 100 / total))
else
  savings=0
fi

echo "Total: $total tokens"
echo "Cached: $cached tokens ($savings% savings)"
echo "Uncached: $uncached tokens"
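
The jq filters above assume a particular JSON layout. The sample below shows that assumed layout so the filters can be tried without calling the API. The shape of `SAMPLE_RESULT` is inferred from the filters in this document and is not copied from real Gemini CLI output; `summarize_cache` is a hypothetical helper and requires jq.

```shell
# Assumed response shape, inferred from the jq filters used in this guide.
SAMPLE_RESULT='{
  "response": "ok",
  "stats": {
    "models": {
      "gemini-2.5-flash": {"tokens": {"total": 1000, "cached": 400}},
      "gemini-2.5-pro":   {"tokens": {"total": 500,  "cached": 200}}
    }
  }
}'

# summarize_cache JSON
# Sums total and cached tokens across all models and prints the cache share.
summarize_cache() {
  sc_total=$(printf '%s' "$1" | jq '.stats.models | to_entries | map(.value.tokens.total) | add // 0')
  sc_cached=$(printf '%s' "$1" | jq '.stats.models | to_entries | map(.value.tokens.cached) | add // 0')
  sc_savings=0
  if [ "$sc_total" -gt 0 ]; then
    sc_savings=$((sc_cached * 100 / sc_total))
  fi
  echo "total=$sc_total cached=$sc_cached savings=${sc_savings}%"
}

# Usage: summarize_cache "$SAMPLE_RESULT"
```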

Model Selection

Model Comparison

Model	Context Window	Speed	Cost	Quality
gemini-2.5-flash	Large	Fast	Lower	Good
gemini-2.5-pro	Very large	Slower	Higher	Best

Selection Criteria

Use Flash (-m gemini-2.5-flash) when:

Processing large files (bulk analysis)
Simple extraction tasks
Cost is a primary concern
Speed is critical
Task is straightforward

Use Pro (-m gemini-2.5-pro) when:

Complex reasoning required
Quality is critical
Nuanced analysis needed
Task requires deep understanding
Context is very large (confirm each model's current context window in the official documentation)
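
The criteria above can be encoded as a lookup so that scripts pick a model consistently. This is a sketch under stated assumptions: `choose_model` and the task labels are hypothetical, and defaulting to Flash for unknown tasks is a design choice of this example, on the reasoning that a run can be repeated with Pro if quality is insufficient.

```shell
# choose_model TASK_KIND
# TASK_KIND labels are hypothetical categories for this sketch.
choose_model() {
  case "$1" in
    extract|bulk|summary)            echo "gemini-2.5-flash" ;;
    security|architecture|reasoning) echo "gemini-2.5-pro" ;;
    # Unknown task: start with the cheaper model and escalate if needed.
    *)                               echo "gemini-2.5-flash" ;;
  esac
}
```

A call site would then read `gemini "List all exports" -m "$(choose_model extract)" --output-format json`.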

Model Selection Examples

# Bulk file analysis - use Flash
for file in src/*.ts; do
  gemini "List all exports" -m gemini-2.5-flash --output-format json < "$file"
done

# Security audit - use Pro for quality
gemini "Deep security analysis" -m gemini-2.5-pro --output-format json < critical-auth.ts

# Cost tracking with model info
result=$(gemini "query" --output-format json)
model=$(echo "$result" | jq -r '.stats.models | keys[0]')
tokens=$(echo "$result" | jq '.stats.models | to_entries[0].value.tokens.total')
echo "Used $model: $tokens tokens"

Batching Strategy

Why Batch?

Reduces API overhead
Increases cache hit rate
Provides consistent context

Batching Patterns

Pattern 1: Concatenate Files

# Instead of N separate calls
# Do one call with all files
cat src/*.ts | gemini "Analyze all TypeScript files for patterns" --output-format json
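
A plain `cat` drops file boundaries, which may make it harder for the model to attribute findings to a specific file. The sketch below adds a delimiter line before each file. The helper name `concat_with_headers` and the `=== FILE: ... ===` marker are hypothetical choices for illustration, not a format Gemini CLI requires.

```shell
# concat_with_headers FILE...
# Prints each file preceded by a marker line that names it.
concat_with_headers() {
  for cwh_file in "$@"; do
    printf '=== FILE: %s ===\n' "$cwh_file"
    cat "$cwh_file"
    printf '\n'
  done
}
```

It is used the same way as the pipeline above: `concat_with_headers src/*.ts | gemini "Analyze all TypeScript files for patterns" --output-format json`.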

Pattern 2: Batch Prompts

# Combine related questions
gemini "Answer these questions about the codebase:
1. What is the main architecture pattern?
2. How is authentication handled?
3. What database is used?" --output-format json
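
When the questions come from a list rather than being typed by hand, the combined prompt can be generated. A minimal sketch follows; `numbered_prompt` is a hypothetical helper introduced for this example.

```shell
# numbered_prompt INTRO QUESTION...
# Prints INTRO followed by the questions as a numbered list.
numbered_prompt() {
  np_intro=$1
  shift
  printf '%s\n' "$np_intro"
  np_i=1
  for np_q in "$@"; do
    printf '%d. %s\n' "$np_i" "$np_q"
    np_i=$((np_i + 1))
  done
}
```

The result can be passed as a single argument, for example `gemini "$(numbered_prompt "Answer these questions about the codebase:" "What database is used?")" --output-format json`.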

Pattern 3: Staged Analysis

# First pass: Quick overview with Flash
overview=$(cat src/*.ts | gemini "List all modules" -m gemini-2.5-flash --output-format json)

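# Second pass: Deep dive critical areas with Pro
# grep -E makes the auth|security alternation portable; < /dev/null keeps
# gemini from consuming the remaining lines of the loop's input
echo "$overview" | jq -r '.response' | grep -E "auth|security" | while IFS= read -r module; do
  gemini "Deep analysis of $module" -m gemini-2.5-pro --output-format json < /dev/null
done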

Cost Tracking

Per-Query Tracking

result=$(gemini "query" --output-format json)

# Extract all cost-relevant stats
total_tokens=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.total) | add // 0')
cached_tokens=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.cached) | add // 0')
models_used=$(echo "$result" | jq -r '.stats.models | keys | join(", ")')
tool_calls=$(echo "$result" | jq '.stats.tools.totalCalls // 0')
latency=$(echo "$result" | jq '.stats.models | to_entries | map(.value.api.totalLatencyMs) | add // 0')

echo "$(date): tokens=$total_tokens cached=$cached_tokens models=$models_used tools=$tool_calls latency=${latency}ms" >> usage.log
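
Once `usage.log` has accumulated entries in the format written above, it can be totalled with awk. This is a sketch that assumes each log line contains whitespace-separated `tokens=N` and `cached=N` fields exactly as in the `echo` above; `summarize_log` is a hypothetical helper.

```shell
# summarize_log LOGFILE
# Sums the tokens= and cached= fields over all lines and counts the lines.
summarize_log() {
  awk '{
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^tokens=/)      { split($i, kv, "="); tokens += kv[2] }
      else if ($i ~ /^cached=/) { split($i, kv, "="); cached += kv[2] }
    }
    calls++
  } END {
    printf "calls=%d tokens=%d cached=%d\n", calls, tokens, cached
  }' "$1"
}

# Usage: summarize_log usage.log
```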

Session Tracking

# Track cumulative usage across a session
total_session_tokens=0
total_session_cached=0
total_session_calls=0

track_usage() {
  local result="$1"
  local tokens=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.total) | add // 0')
  local cached=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.cached) | add // 0')

  total_session_tokens=$((total_session_tokens + tokens))
  total_session_cached=$((total_session_cached + cached))
  total_session_calls=$((total_session_calls + 1))
}

# Use in workflow
result=$(gemini "query 1" --output-format json)
track_usage "$result"

result=$(gemini "query 2" --output-format json)
track_usage "$result"

echo "Session total: $total_session_tokens tokens ($total_session_cached cached) in $total_session_calls calls"
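
Because the session counters are ordinary shell variables, they only accumulate when `track_usage` runs in the current shell. In bash with default settings, a `while` loop on the receiving end of a pipe runs in a subshell, so counter updates made there are lost. The sketch below shows the difference with a plain counter; `count_piped` and `count_redirected` are hypothetical names used only for this demonstration.

```shell
# Counter updated inside a piped loop: in bash with default settings the
# loop body runs in a subshell, so the outer n is not changed.
count_piped() {
  n=0
  printf 'a\nb\n' | while read -r cp_line; do n=$((n + 1)); done
  echo "$n"
}

# Counter updated inside a loop fed by redirection: the loop runs in the
# current shell, so the updates survive.
count_redirected() {
  n=0
  while read -r cr_line; do n=$((n + 1)); done <<EOF
a
b
EOF
  echo "$n"
}
```

When looping over many queries, feeding the loop by redirection (or a plain `for` loop) keeps the `track_usage` totals intact.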

Optimization Checklist

Before Large Operations

Choose appropriate model (Flash vs Pro)
Check if caching is available (API key or Vertex AI)
Plan batching strategy
Set up usage tracking

During Operations

Monitor cache hit rates
Track per-query costs
Adjust model if quality insufficient
Batch similar queries

After Operations

Review total usage
Calculate effective cost
Identify optimization opportunities
Document learnings

Quick Reference

Cost-Saving Commands

# Use Flash for bulk
gemini "query" -m gemini-2.5-flash --output-format json

# Check cache effectiveness
gemini "query" --output-format json | jq '{total: .stats.models | to_entries | map(.value.tokens.total) | add, cached: .stats.models | to_entries | map(.value.tokens.cached) | add}'

# Minimal output (fewer output tokens)
gemini "Answer in one sentence: {question}" --output-format json

Cost Estimation

Rough token estimates:

1 token ~ 4 characters (English)
1 page of code ~ 500-1000 tokens
Typical source file ~ 200-2000 tokens
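
The 4-characters-per-token rule of thumb can be applied before sending anything. This is a rough sketch only: `estimate_tokens` is a hypothetical helper, it counts bytes rather than characters (equivalent for ASCII English text), and real tokenization will differ.

```shell
# estimate_tokens
# Reads text on stdin and prints bytes / 4, rounded up.
estimate_tokens() {
  et_bytes=$(wc -c)
  echo $(( (et_bytes + 3) / 4 ))
}

# Usage: cat src/*.ts | estimate_tokens
```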

Keyword Registry (Delegates to gemini-cli-docs)

Topic	Query Keywords
Caching	`token caching`, `cached tokens`, `/stats`
Model selection	`model routing`, `flash vs pro`, `-m flag`
Costs	`quota pricing`, `token usage`, `billing`
Output control	`output format`, `json output`

Test Scenarios

Scenario 1: Check Token Usage

Query: "How do I see how many tokens Gemini used?"

Expected Behavior:

Skill activates on "token usage" or "gemini cost"
Provides JSON stats extraction pattern

Success Criteria: User receives jq commands to extract token counts

Scenario 2: Reduce Costs

Query: "How do I reduce Gemini CLI costs for bulk analysis?"

Expected Behavior:

Skill activates on "cost optimization" or "reduce tokens"
Recommends Flash model and batching

Success Criteria: User receives cost optimization strategies

Scenario 3: Model Selection

Query: "Should I use Flash or Pro for this task?"

Expected Behavior:

Skill activates on "flash vs pro" or "model selection"
Provides decision criteria table

Success Criteria: User receives model comparison and recommendation

References

Query gemini-cli-docs for official documentation on:

"token caching"
"model selection"
"quota and pricing"

Version History

v1.1.0 (2025-12-01): Added Test Scenarios section
v1.0.0 (2025-11-25): Initial release

First Seen

Jan 24, 2026

Security Audits

Gen Agent Trust Hub: Pass
Socket: Pass
Snyk: Pass

Installed on

gemini-cli: 63
opencode: 63
codex: 61
cursor: 60
github-copilot: 59
kimi-cli: 57
