neon-postgres-egress-optimizer by neondatabase/agent-skills
npx skills add https://github.com/neondatabase/agent-skills --skill neon-postgres-egress-optimizer
Guide the user through diagnosing and fixing application-side query patterns that cause excessive data transfer (egress) from their Postgres database. Most high egress bills come from the application fetching more data than it uses.
Identify which queries transfer the most data. The primary tool is the pg_stat_statements extension. Check that it is available:
SELECT 1 FROM pg_stat_statements LIMIT 1;
If this errors, the extension needs to be created:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
On Neon, it is available by default but may need this CREATE EXTENSION step.
Stats are cleared when a Neon compute scales to zero and restarts. If the stats are empty or the compute recently woke up, reset the counters and let traffic accumulate before analyzing:
SELECT pg_stat_statements_reset();
If the user has stats from a production database, use those. If they have no access to production stats, proceed to Step 2 and analyze the codebase directly; code-level patterns are often sufficient to identify the worst offenders.
Run these to identify the top egress contributors. Focus on queries that return many rows, return wide rows (JSONB, TEXT, BYTEA columns), or are called very frequently.
Queries returning the most total rows:
SELECT query, calls, rows AS total_rows, rows / calls AS avg_rows_per_call
FROM pg_stat_statements
WHERE calls > 0
ORDER BY rows DESC
LIMIT 10;
Queries returning the most rows per execution (poorly scoped SELECTs, missing pagination):
SELECT query, calls, rows AS total_rows, rows / calls AS avg_rows_per_call
FROM pg_stat_statements
WHERE calls > 0
ORDER BY avg_rows_per_call DESC
LIMIT 10;
Most frequently called queries (candidates for caching):
SELECT query, calls, rows AS total_rows, rows / calls AS avg_rows_per_call
FROM pg_stat_statements
WHERE calls > 0
ORDER BY calls DESC
LIMIT 10;
Longest running queries (not a direct egress measure, but helps identify problem queries during a spike):
SELECT query, calls, rows AS total_rows,
round(total_exec_time::numeric, 2) AS total_exec_time_ms
FROM pg_stat_statements
WHERE calls > 0
ORDER BY total_exec_time DESC
LIMIT 10;
Rank the findings by estimated egress impact. For each query identified in Step 1 (or, if no stats are available, for each database query in the codebase), check it against the anti-patterns below.
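To make the ranking concrete, here is a minimal sketch in Python. It assumes the pg_stat_statements rows have been exported as dicts; the function name and per-query byte figures are illustrative, since pg_stat_statements reports row counts, not bytes, so bytes per row must be estimated (for example by sampling pg_column_size() on the selected columns):

```python
# pg_stat_statements tracks 'rows' and 'calls' but not payload bytes,
# so egress is approximated as total rows * estimated bytes per row.

DEFAULT_ROW_BYTES = 100  # fallback guess when no sample is available

def rank_by_estimated_egress(stats, avg_row_bytes):
    """stats: list of {'query', 'calls', 'rows'} totals.
    avg_row_bytes: {query: estimated bytes per returned row}."""
    ranked = []
    for s in stats:
        per_row = avg_row_bytes.get(s["query"], DEFAULT_ROW_BYTES)
        ranked.append({**s, "est_bytes": s["rows"] * per_row})
    return sorted(ranked, key=lambda s: s["est_bytes"], reverse=True)

stats = [
    {"query": "SELECT * FROM products", "calls": 50, "rows": 500_000},
    {"query": "SELECT id FROM users", "calls": 9_000, "rows": 9_000},
]
ranked = rank_by_estimated_egress(
    stats, {"SELECT * FROM products": 2_048, "SELECT id FROM users": 8}
)
# The wide products scan dominates despite far fewer calls.
```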
Apply the appropriate fix for each problem found. Below are the most common egress anti-patterns and how to fix them.
Problem: The query fetches all columns but the application only uses a few. Large columns (JSONB blobs, TEXT fields) get transferred over the wire and discarded.
Before:
SELECT * FROM products;
After:
SELECT id, name, price, image_urls FROM products;
Problem: A list endpoint returns all rows with no LIMIT. This is an unbounded egress risk — every new row in the table increases data transfer on every request. Flag this regardless of current table size.
This is easy to miss because the application may work fine with small datasets. But at scale, an unpaginated endpoint returning 10,000 rows with even moderate column widths can transfer hundreds of megabytes per day.
Before:
SELECT id, name, price FROM products;
After:
SELECT id, name, price FROM products
ORDER BY id
LIMIT 50 OFFSET 0;
When adding pagination, check whether the consuming client already supports paginated responses. If not, pick sensible defaults and document the pagination parameters in the API.
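A self-contained sketch of the paginated pattern in application code, using Python's sqlite3 so it runs without a Postgres server (a Postgres driver such as psycopg uses %s placeholders instead of ?); the table and data are illustrative:

```python
import sqlite3

# The page size caps rows transferred per request regardless of table growth.
PAGE_SIZE = 50

def fetch_page(conn, page=0):
    cur = conn.execute(
        "SELECT id, name, price FROM products "
        "ORDER BY id LIMIT ? OFFSET ?",
        (PAGE_SIZE, page * PAGE_SIZE),
    )
    return cur.fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(i, f"item-{i}", float(i)) for i in range(1, 121)],
)

page0 = fetch_page(conn)     # ids 1..50
page1 = fetch_page(conn, 1)  # ids 51..100
```

For deep pages, keyset pagination (WHERE id > last_seen_id ORDER BY id LIMIT 50) avoids the server scanning and discarding the skipped OFFSET rows, though either form bounds the egress per request.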
Problem: A query is called thousands of times per day but returns data that rarely changes. Every call transfers the same rows from the database. This pattern is only visible from pg_stat_statements — the code itself looks normal.
Look for queries with extremely high call counts relative to other queries. Common examples: configuration tables, category lists, feature flags, user role definitions.
Fix: Add a caching layer between the application and the database so it avoids hitting the database on every request.
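A minimal in-process TTL cache sketch; the TTLCache class and load_categories function are illustrative, and production code might use Redis, memcached, or a library cache instead:

```python
import time

# Cached values are reused until ttl_seconds elapse, so a lookup that is
# called thousands of times per day hits the database only when the
# entry is missing or stale.

class TTLCache:
    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key, fetch):
        entry = self._store.get(key)
        now = time.monotonic()
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]
        value = fetch()
        self._store[key] = (value, now)
        return value

calls = 0
def load_categories():
    global calls
    calls += 1  # stands in for a SELECT against the database
    return ["books", "games"]

cache = TTLCache(ttl_seconds=300)
first = cache.get("categories", load_categories)
second = cache.get("categories", load_categories)
# load_categories ran once; the second call was served from memory.
```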
Problem: The application fetches all rows from a table and then computes aggregates (averages, counts, sums, groupings) in application code. The full dataset transfers over the wire even though the result is a small summary.
Fix: Push the aggregation into SQL.
Before: The application fetches entire tables and aggregates in code with loops or .reduce().
After:
SELECT p.category_id,
AVG(r.rating) AS avg_rating,
COUNT(r.id) AS review_count
FROM reviews r
INNER JOIN products p ON r.product_id = p.id
GROUP BY p.category_id;
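To see what the anti-pattern looks like in application code, here is an illustrative sketch; the sample rows stand in for a full joined result set that would otherwise cross the wire just to produce a tiny summary:

```python
# Anti-pattern: every review row is transferred so the app can compute
# one average per category. The GROUP BY query above returns the same
# result with only one row per category leaving the database.

rows = [
    {"category_id": 1, "rating": 4},
    {"category_id": 1, "rating": 5},
    {"category_id": 2, "rating": 3},
]

summary = {}
for r in rows:  # N rows in, ~1 value per category out
    s = summary.setdefault(r["category_id"], {"sum": 0, "count": 0})
    s["sum"] += r["rating"]
    s["count"] += 1

avg_by_category = {cid: s["sum"] / s["count"] for cid, s in summary.items()}
```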
Problem: A JOIN between a wide parent table and a child table duplicates all parent columns across every child row. If a product has 200 reviews and the product row includes a 50KB JSONB column, the join sends that 50KB × 200 = ~10MB for a single request.
This is distinct from the SELECT * problem. Even if you select only needed columns, a JOIN still repeats the parent data for every child row. The fix is structural: avoid the join entirely.
Before:
SELECT * FROM products
LEFT JOIN reviews ON reviews.product_id = products.id
WHERE products.id = 1;
After (two separate queries):
SELECT id, name, price, description, image_urls FROM products WHERE id = 1;
SELECT id, user_name, rating, body FROM reviews WHERE product_id = 1;
Two queries instead of one JOIN. The product data is fetched once. The reviews are fetched once. No duplication.
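A sketch of reassembling the two result sets in application code; the sample dicts stand in for rows returned by the two queries above:

```python
# Parent columns appear once in the response instead of once per review.
product_row = {"id": 1, "name": "Widget", "price": 9.99,
               "description": "...", "image_urls": []}
review_rows = [
    {"id": 10, "user_name": "ana", "rating": 5, "body": "Great"},
    {"id": 11, "user_name": "bo", "rating": 4, "body": "Good"},
]

response = {**product_row, "reviews": review_rows}
```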
After applying fixes, reset the statistics:
SELECT pg_stat_statements_reset();
Then let traffic run and re-run the diagnostic queries to compare before and after.