neon-postgres-egress-optimizer by neondatabase/agent-skills
npx skills add https://github.com/neondatabase/agent-skills --skill neon-postgres-egress-optimizer
Guide the user through diagnosing and fixing application-side query patterns that cause excessive data transfer (egress) from their Postgres database. Most high egress bills come from the application fetching more data than it uses.
Identify which queries transfer the most data. The primary tool is the pg_stat_statements extension. Check that it is available:
SELECT 1 FROM pg_stat_statements LIMIT 1;
If this errors, the extension needs to be created:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
On Neon, it is available by default but may need this CREATE EXTENSION step.
Stats are cleared when a Neon compute scales to zero and restarts. If the stats are empty or the compute recently woke up, reset the counters and let traffic accumulate before analyzing:
SELECT pg_stat_statements_reset();
If the user has stats from a production database, use those. If they have no access to production stats, proceed to Step 2 and analyze the codebase directly; code-level patterns are often sufficient to identify the worst offenders.
Run these to identify the top egress contributors. Focus on queries that return many rows, return wide rows (JSONB, TEXT, BYTEA columns), or are called very frequently.
Queries returning the most total rows:
SELECT query, calls, rows AS total_rows, rows / calls AS avg_rows_per_call
FROM pg_stat_statements
WHERE calls > 0
ORDER BY rows DESC
LIMIT 10;
Queries returning the most rows per execution (poorly scoped SELECTs, missing pagination):
SELECT query, calls, rows AS total_rows, rows / calls AS avg_rows_per_call
FROM pg_stat_statements
WHERE calls > 0
ORDER BY avg_rows_per_call DESC
LIMIT 10;
Most frequently called queries (candidates for caching):
SELECT query, calls, rows AS total_rows, rows / calls AS avg_rows_per_call
FROM pg_stat_statements
WHERE calls > 0
ORDER BY calls DESC
LIMIT 10;
Longest running queries (not a direct egress measure, but helps identify problem queries during a spike):
SELECT query, calls, rows AS total_rows,
round(total_exec_time::numeric, 2) AS total_exec_time_ms
FROM pg_stat_statements
WHERE calls > 0
ORDER BY total_exec_time DESC
LIMIT 10;
Rank the findings by estimated egress impact. For each query identified in Step 1 (or, if no stats are available, for each database query in the codebase), check it against the anti-patterns below.
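To make the ranking concrete, here is a minimal sketch in Python. It assumes the pg_stat_statements rows have been exported as dicts; the function name and per-query byte figures are illustrative, since pg_stat_statements reports row counts, not bytes, so bytes per row must be estimated (for example by sampling pg_column_size() on the selected columns):

```python
# pg_stat_statements tracks 'rows' and 'calls' but not payload bytes,
# so egress is approximated as total rows * estimated bytes per row.

DEFAULT_ROW_BYTES = 100  # fallback guess when no sample is available

def rank_by_estimated_egress(stats, avg_row_bytes):
    """stats: list of {'query', 'calls', 'rows'} totals.
    avg_row_bytes: {query: estimated bytes per returned row}."""
    ranked = []
    for s in stats:
        per_row = avg_row_bytes.get(s["query"], DEFAULT_ROW_BYTES)
        ranked.append({**s, "est_bytes": s["rows"] * per_row})
    return sorted(ranked, key=lambda s: s["est_bytes"], reverse=True)

stats = [
    {"query": "SELECT * FROM products", "calls": 50, "rows": 500_000},
    {"query": "SELECT id FROM users", "calls": 9_000, "rows": 9_000},
]
ranked = rank_by_estimated_egress(
    stats, {"SELECT * FROM products": 2_048, "SELECT id FROM users": 8}
)
# The wide products scan dominates despite far fewer calls.
```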
Apply the appropriate fix for each problem found. Below are the most common egress anti-patterns and how to fix them.
Problem: The query fetches all columns but the application only uses a few. Large columns (JSONB blobs, TEXT fields) get transferred over the wire and discarded.
Before:
SELECT * FROM products;
After:
SELECT id, name, price, image_urls FROM products;
Problem: A list endpoint returns all rows with no LIMIT. This is an unbounded egress risk — every new row in the table increases data transfer on every request. Flag this regardless of current table size.
This is easy to miss because the application may work fine with small datasets. But at scale, an unpaginated endpoint returning 10,000 rows with even moderate column widths can transfer hundreds of megabytes per day.
Before:
SELECT id, name, price FROM products;
After:
SELECT id, name, price FROM products
ORDER BY id
LIMIT 50 OFFSET 0;
When adding pagination, check whether the consuming client already supports paginated responses. If not, pick sensible defaults and document the pagination parameters in the API.
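A self-contained sketch of the paginated pattern in application code, using Python's sqlite3 so it runs without a Postgres server (a Postgres driver such as psycopg uses %s placeholders instead of ?); the table and data are illustrative:

```python
import sqlite3

# The page size caps rows transferred per request regardless of table growth.
PAGE_SIZE = 50

def fetch_page(conn, page=0):
    cur = conn.execute(
        "SELECT id, name, price FROM products "
        "ORDER BY id LIMIT ? OFFSET ?",
        (PAGE_SIZE, page * PAGE_SIZE),
    )
    return cur.fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(i, f"item-{i}", float(i)) for i in range(1, 121)],
)

page0 = fetch_page(conn)     # ids 1..50
page1 = fetch_page(conn, 1)  # ids 51..100
```

For deep pages, keyset pagination (WHERE id > last_seen_id ORDER BY id LIMIT 50) avoids the server scanning and discarding the skipped OFFSET rows, though either form bounds the egress per request.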
Problem: A query is called thousands of times per day but returns data that rarely changes. Every call transfers the same rows from the database. This pattern is only visible from pg_stat_statements — the code itself looks normal.
Look for queries with extremely high call counts relative to other queries. Common examples: configuration tables, category lists, feature flags, user role definitions.
Fix: Add a caching layer between the application and the database so it avoids hitting the database on every request.
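A minimal in-process TTL cache sketch; the TTLCache class and load_categories function are illustrative, and production code might use Redis, memcached, or a library cache instead:

```python
import time

# Cached values are reused until ttl_seconds elapse, so a lookup that is
# called thousands of times per day hits the database only when the
# entry is missing or stale.

class TTLCache:
    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key, fetch):
        entry = self._store.get(key)
        now = time.monotonic()
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]
        value = fetch()
        self._store[key] = (value, now)
        return value

calls = 0
def load_categories():
    global calls
    calls += 1  # stands in for a SELECT against the database
    return ["books", "games"]

cache = TTLCache(ttl_seconds=300)
first = cache.get("categories", load_categories)
second = cache.get("categories", load_categories)
# load_categories ran once; the second call was served from memory.
```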
Problem: The application fetches all rows from a table and then computes aggregates (averages, counts, sums, groupings) in application code. The full dataset transfers over the wire even though the result is a small summary.
Fix: Push the aggregation into SQL.
Before: The application fetches entire tables and aggregates in code with loops or .reduce().
After:
SELECT p.category_id,
AVG(r.rating) AS avg_rating,
COUNT(r.id) AS review_count
FROM reviews r
INNER JOIN products p ON r.product_id = p.id
GROUP BY p.category_id;
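To see what the anti-pattern looks like in application code, here is an illustrative sketch; the sample rows stand in for a full joined result set that would otherwise cross the wire just to produce a tiny summary:

```python
# Anti-pattern: every review row is transferred so the app can compute
# one average per category. The GROUP BY query above returns the same
# result with only one row per category leaving the database.

rows = [
    {"category_id": 1, "rating": 4},
    {"category_id": 1, "rating": 5},
    {"category_id": 2, "rating": 3},
]

summary = {}
for r in rows:  # N rows in, ~1 value per category out
    s = summary.setdefault(r["category_id"], {"sum": 0, "count": 0})
    s["sum"] += r["rating"]
    s["count"] += 1

avg_by_category = {cid: s["sum"] / s["count"] for cid, s in summary.items()}
```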
Problem: A JOIN between a wide parent table and a child table duplicates all parent columns across every child row. If a product has 200 reviews and the product row includes a 50KB JSONB column, the join sends that 50KB × 200 = ~10MB for a single request.
This is distinct from the SELECT * problem. Even if you select only needed columns, a JOIN still repeats the parent data for every child row. The fix is structural: avoid the join entirely.
Before:
SELECT * FROM products
LEFT JOIN reviews ON reviews.product_id = products.id
WHERE products.id = 1;
After (two separate queries):
SELECT id, name, price, description, image_urls FROM products WHERE id = 1;
SELECT id, user_name, rating, body FROM reviews WHERE product_id = 1;
Two queries instead of one JOIN. The product data is fetched once. The reviews are fetched once. No duplication.
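A sketch of reassembling the two result sets in application code; the sample dicts stand in for rows returned by the two queries above:

```python
# Parent columns appear once in the response instead of once per review.
product_row = {"id": 1, "name": "Widget", "price": 9.99,
               "description": "...", "image_urls": []}
review_rows = [
    {"id": 10, "user_name": "ana", "rating": 5, "body": "Great"},
    {"id": 11, "user_name": "bo", "rating": 4, "body": "Good"},
]

response = {**product_row, "reviews": review_rows}
```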
After applying fixes, reset the statistics:
SELECT pg_stat_statements_reset();
Then let traffic run and re-run the diagnostic queries to compare before and after.