setup-timescaledb-hypertables by timescale/pg-aiguide
npx skills add https://github.com/timescale/pg-aiguide --skill setup-timescaledb-hypertables
Instructions for insert-heavy data patterns where data is inserted but rarely changed:
CREATE TABLE your_table_name (
timestamp TIMESTAMPTZ NOT NULL,
entity_id TEXT NOT NULL, -- device_id, user_id, symbol, etc.
category TEXT, -- sensor_type, event_type, asset_class, etc.
value_1 DOUBLE PRECISION, -- price, temperature, latency, etc.
value_2 DOUBLE PRECISION, -- volume, humidity, throughput, etc.
value_3 INTEGER, -- count, status, level, etc.
metadata JSONB -- flexible additional data
) WITH (
tsdb.hypertable,
tsdb.partition_column='timestamp',
tsdb.enable_columnstore=true, -- Disable if table has vector columns
tsdb.segmentby='entity_id', -- See selection guide below
tsdb.orderby='timestamp DESC', -- See selection guide below
tsdb.sparse_index='minmax(value_1),minmax(value_2),minmax(value_3)' -- see selection guide below
);
Must be time-based (TIMESTAMP/TIMESTAMPTZ/DATE) or integer (INT/BIGINT) with good temporal/sequential distribution.
Common patterns:
timestamp, event_time, measured_at
event_time, created_at, logged_at
created_at, transaction_time, processed_at
id (auto-increment when no timestamp), sequence_number
created_at, inserted_at, id
Less ideal: ingested_at (when the data entered the system - use only if it's your primary query dimension)
Avoid: updated_at (breaks time ordering unless it's the primary query dimension)
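When no timestamp exists, an auto-incrementing id can serve as the partition column. A minimal sketch, assuming the TimescaleDB WITH-clause syntax used above is available; the events table, its columns, and the interval value are illustrative, and integer partition columns take an integer chunk interval rather than an INTERVAL:

```sql
-- Hypothetical events table partitioned on an auto-incrementing id
CREATE TABLE events (
    id BIGINT GENERATED ALWAYS AS IDENTITY,
    payload JSONB
) WITH (
    tsdb.hypertable,
    tsdb.partition_column='id'
);

-- For integer partition columns the chunk interval is a number of id values,
-- not an INTERVAL; 1000000 is a placeholder to tune for your insert rate
SELECT set_chunk_time_interval('events', 1000000);
```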
PREFER SINGLE COLUMN - multi-column rarely optimal. Multi-column can only work for highly correlated columns (e.g., metric_name + metric_type) with sufficient row density.
Requirements:
Examples:
device_id
symbol
service_name
service_name, metric_type (if sufficient row density)
metric_name, metric_type (if sufficient row density)
user_id if sufficient row density, otherwise session_id
product_id if sufficient row density, otherwise category_id
Row density guidelines:
Query pattern drives choice:
SELECT * FROM table WHERE entity_id = 'X' AND timestamp > ...
-- ↳ segment_by: entity_id (if >100 rows per chunk)
Avoid: timestamps, unique IDs, low-density columns (<100 rows/value/chunk), columns rarely used in filtering
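One way to estimate row density for a candidate segment_by column is to compare total rows to distinct values over one chunk's worth of time. A rough sketch against the template table above, assuming a 7-day chunk interval (adjust the window to your setting):

```sql
-- Average rows per entity over one chunk window; compare against the
-- >100 rows/value/chunk guideline before choosing segmentby='entity_id'
SELECT count(*)::numeric / NULLIF(count(DISTINCT entity_id), 0) AS avg_rows_per_entity
FROM your_table_name
WHERE timestamp > now() - INTERVAL '7 days';
```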
Creates natural time-series progression when combined with segment_by for optimal compression.
Most common: timestamp DESC
Examples:
timestamp DESC
metric_name, timestamp DESC (if metric_name has too low density for segment_by)
user_id, timestamp DESC (if user_id has too low density for segment_by)
Alternative patterns:
sequence_id DESC for event streams with sequence numbers
timestamp DESC, event_order DESC for sub-ordering within the same timestamp
Low-density column handling: If a column has <100 rows per chunk (too low for segment_by), prepend it to order_by:
metric_name has 20 rows/chunk → use segment_by='service_name', order_by='metric_name, timestamp DESC'
Good test: the ordering created by (segment_by_column, order_by_column) should form a natural time-series progression. Values close to each other in the progression should be similar.
Avoid in order_by: random columns, columns with high variance between adjacent rows, columns unrelated to segment_by
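The low-density handling above can be sketched as a columnstore configuration. The metrics table is hypothetical; the column names follow the example:

```sql
-- metric_name is too low-density to segment by, so it leads order_by instead
ALTER TABLE metrics SET (
    timescaledb.enable_columnstore,
    timescaledb.segmentby = 'service_name',
    timescaledb.orderby = 'metric_name, timestamp DESC'
);
```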
Sparse indexes enable query filtering on compressed data without decompression. Store metadata per batch (~1000 rows) to eliminate batches that don't match query predicates.
Types:
Use minmax for: price, temperature, measurement, timestamp (range filtering)
Use for: columns that queries filter on but that aren't covered by orderby (for example, if the table is partitioned on created_at, a minmax index on updated_at is useful).
Avoid: rarely filtered columns.
IMPORTANT: NEVER index columns in segmentby or orderby. Orderby columns will always have minmax indexes without any configuration.
Configuration: The format is a comma-separated list of type_of_index(column_name).
ALTER TABLE table_name SET (
timescaledb.sparse_index = 'minmax(value_1),minmax(value_2)'
);
Explicit configuration available since v2.22.0 (was auto-created since v2.16.0).
Default: 7 days (use if volume unknown, or ask user). Adjust based on volume:
High frequency: 1 hour - 1 day
Medium: 1 day - 1 week
Low: 1 week - 1 month
SELECT set_chunk_time_interval('your_table_name', INTERVAL '1 day');
Good test: recent chunk indexes should fit in less than 25% of RAM.
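The RAM test above can be checked with TimescaleDB's size utilities. A sketch using chunks_detailed_size(); compare index_bytes for recent chunks against roughly 25% of the machine's RAM:

```sql
-- Per-chunk index footprint for the hypertable, approximately newest first
-- (chunk names embed an increasing id)
SELECT chunk_name, pg_size_pretty(index_bytes) AS index_size
FROM chunks_detailed_size('your_table_name')
ORDER BY chunk_name DESC;
```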
Common index patterns - composite indexes on an id and timestamp:
CREATE INDEX idx_entity_timestamp ON your_table_name (entity_id, timestamp DESC);
Important: Only create indexes you'll actually use - each has maintenance overhead.
Primary key and unique constraints rules: Must include partition column.
Option 1: Composite PK with partition column
ALTER TABLE your_table_name ADD PRIMARY KEY (entity_id, timestamp);
Option 2: Single-column PK (only if it's the partition column)
CREATE TABLE ... (id BIGINT PRIMARY KEY, ...) WITH (tsdb.partition_column='id');
Option 3: No PK. Strict uniqueness is often not required for insert-heavy patterns.
IMPORTANT: If you used tsdb.enable_columnstore=true in Step 1, starting with TimescaleDB version 2.23 a columnstore policy is automatically created with after => INTERVAL '7 days'. You only need to call add_columnstore_policy() if you want to customize the after interval to something other than 7 days.
Set the after interval to the point when: data becomes mostly immutable (some updates/backfill OK) AND B-tree indexes aren't needed for queries (a less common criterion).
-- In TimescaleDB 2.23 and later only needed if you want to override the default 7-day policy created by tsdb.enable_columnstore=true
-- Remove the existing auto-created policy first:
-- CALL remove_columnstore_policy('your_table_name');
-- Then add custom policy:
-- CALL add_columnstore_policy('your_table_name', after => INTERVAL '1 day');
IMPORTANT: Don't guess - ask user or comment out if unknown.
-- Example - replace with requirements or comment out
SELECT add_retention_policy('your_table_name', INTERVAL '365 days');
Use different aggregation intervals for different uses.
For up-to-the-minute dashboards on high-frequency data.
CREATE MATERIALIZED VIEW your_table_hourly
WITH (timescaledb.continuous) AS
SELECT
time_bucket(INTERVAL '1 hour', timestamp) AS bucket,
entity_id,
category,
COUNT(*) as record_count,
AVG(value_1) as avg_value_1,
MIN(value_1) as min_value_1,
MAX(value_1) as max_value_1,
SUM(value_2) as sum_value_2
FROM your_table_name
GROUP BY bucket, entity_id, category;
For long-term reporting and analytics.
CREATE MATERIALIZED VIEW your_table_daily
WITH (timescaledb.continuous) AS
SELECT
time_bucket(INTERVAL '1 day', timestamp) AS bucket,
entity_id,
category,
COUNT(*) as record_count,
AVG(value_1) as avg_value_1,
MIN(value_1) as min_value_1,
MAX(value_1) as max_value_1,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY value_1) as median_value_1,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY value_1) as p95_value_1,
SUM(value_2) as sum_value_2
FROM your_table_name
GROUP BY bucket, entity_id, category;
Set up refresh policies based on your data freshness requirements.
start_offset: Usually omit (refreshes all). Exception: If you don't care about refreshing data older than X (see below). With retention policy on raw data: match the retention policy.
end_offset: Set beyond active update window (e.g., 15 min if data usually arrives within 10 min). Data newer than end_offset won't appear in queries without real-time aggregation. If you don't know your update window, use the size of the time_bucket in the query, but not less than 5 minutes.
schedule_interval: Set to the same value as the end_offset but not more than 1 hour.
Hourly - frequent refresh for dashboards:
SELECT add_continuous_aggregate_policy('your_table_hourly',
start_offset => NULL,
end_offset => INTERVAL '15 minutes',
schedule_interval => INTERVAL '15 minutes');
Daily - less frequent for reports:
SELECT add_continuous_aggregate_policy('your_table_daily',
start_offset => NULL,
end_offset => INTERVAL '1 hour',
schedule_interval => INTERVAL '1 hour');
Use start_offset only if you don't care about refreshing old data. Use it for high-volume systems where query accuracy on older data doesn't matter:
-- the following aggregate can be stale for data older than 7 days
-- SELECT add_continuous_aggregate_policy('aggregate_for_last_7_days',
-- start_offset => INTERVAL '7 days', -- only refresh last 7 days (NULL = refresh all)
-- end_offset => INTERVAL '15 minutes',
-- schedule_interval => INTERVAL '15 minutes');
IMPORTANT: you MUST set start_offset to be less than the retention policy on the raw data. By default, set start_offset equal to the retention period. If the retention policy is commented out, comment out the start_offset as well, like this:
SELECT add_continuous_aggregate_policy('your_table_daily',
start_offset => NULL, -- Use NULL to refresh all data, or set to retention period if enabled on raw data
-- start_offset => INTERVAL '<retention period here>', -- uncomment if retention policy is enabled on the raw data table
end_offset => INTERVAL '1 hour',
schedule_interval => INTERVAL '1 hour');
Real-time aggregation combines materialized data with recent raw data at query time. It provides up-to-date results at the cost of higher query latency.
More useful for fine-grained aggregates (e.g., minutely) than coarse ones (e.g., daily/monthly) since large buckets will be mostly incomplete with recent data anyway.
Disabled by default in v2.13+, before that it was enabled by default.
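Since several defaults here depend on the TimescaleDB version, it can help to check what is installed before relying on them (a standard PostgreSQL catalog query):

```sql
-- Report the installed TimescaleDB extension version
SELECT extversion FROM pg_extension WHERE extname = 'timescaledb';
```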
Use when: need data newer than end_offset, up-to-the-minute dashboards, can tolerate higher query latency.
Disable when: performance critical, refresh policies sufficient, high query volume, missing or stale recent data is acceptable.
Enable for current results (higher query cost):
ALTER MATERIALIZED VIEW your_table_hourly SET (timescaledb.materialized_only = false);
Disable for performance (but with stale results):
ALTER MATERIALIZED VIEW your_table_hourly SET (timescaledb.materialized_only = true);
Rule: segment_by = ALL GROUP BY columns except time_bucket, order_by = time_bucket DESC
-- Hourly
ALTER MATERIALIZED VIEW your_table_hourly SET (
timescaledb.enable_columnstore,
timescaledb.segmentby = 'entity_id, category',
timescaledb.orderby = 'bucket DESC'
);
CALL add_columnstore_policy('your_table_hourly', after => INTERVAL '3 days');
-- Daily
ALTER MATERIALIZED VIEW your_table_daily SET (
timescaledb.enable_columnstore,
timescaledb.segmentby = 'entity_id, category',
timescaledb.orderby = 'bucket DESC'
);
CALL add_columnstore_policy('your_table_daily', after => INTERVAL '7 days');
Aggregates are typically kept longer than raw data. IMPORTANT: Don't guess - ask the user; if unknown, you MUST comment these out.
-- Example - replace or comment out
SELECT add_retention_policy('your_table_hourly', INTERVAL '2 years');
SELECT add_retention_policy('your_table_daily', INTERVAL '5 years');
Index strategy: Analyze WHERE clauses in common queries → Create indexes matching filter columns + time ordering
Pattern: (filter_column, bucket DESC) supports WHERE filter_column = X AND bucket >= Y ORDER BY bucket DESC
Examples:
CREATE INDEX idx_hourly_entity_bucket ON your_table_hourly (entity_id, bucket DESC);
CREATE INDEX idx_hourly_category_bucket ON your_table_hourly (category, bucket DESC);
Multi-column filters: Create composite indexes for WHERE entity_id = X AND category = Y:
CREATE INDEX idx_hourly_entity_category_bucket ON your_table_hourly (entity_id, category, bucket DESC);
Important: Only create indexes you'll actually use - each has maintenance overhead.
Only for query patterns where you ALWAYS filter by the space-partition column, and it requires expert knowledge and extensive benchmarking. STRONGLY prefer time-only partitioning.
-- Check hypertable
SELECT * FROM timescaledb_information.hypertables
WHERE hypertable_name = 'your_table_name';
-- Check compression settings
SELECT * FROM hypertable_compression_stats('your_table_name');
-- Check aggregates
SELECT * FROM timescaledb_information.continuous_aggregates;
-- Check policies
SELECT * FROM timescaledb_information.jobs ORDER BY job_id;
-- Monitor chunk information
SELECT
chunk_name,
range_start,
range_end,
is_compressed
FROM timescaledb_information.chunks
WHERE hypertable_name = 'your_table_name'
ORDER BY range_start DESC;
timescaledb-tune for self-hosting (auto-configured on cloud)
TIMESTAMPTZ, not timestamp
>= and <, not BETWEEN, for timestamps
TEXT with constraints, not char(n)/varchar(n)
snake_case, not CamelCase
BIGINT GENERATED ALWAYS AS IDENTITY, not SERIAL
BIGINT for IDs by default over INTEGER or SMALLINT
DOUBLE PRECISION by default over REAL/FLOAT
NUMERIC, not MONEY
NOT EXISTS, not NOT IN
time_bucket() or date_trunc(), not timestamp(0), for truncation
Deprecated Parameters → New Parameters:
timescaledb.compress → timescaledb.enable_columnstore
timescaledb.compress_segmentby → timescaledb.segmentby
timescaledb.compress_orderby → timescaledb.orderby
Deprecated Functions → New Functions:
add_compression_policy() → add_columnstore_policy()
remove_compression_policy() → remove_columnstore_policy()
compress_chunk() → convert_to_columnstore() (use with CALL, not SELECT)
decompress_chunk() → convert_to_rowstore() (use with CALL, not SELECT)
Compression Stats (use functions, not views):
hypertable_compression_stats('table_name')
chunk_compression_stats('_timescaledb_internal._hyper_X_Y_chunk')
Views like columnstore_settings may not be available in all versions; use the functions instead.
Manual Compression Example:
-- Compress a specific chunk
CALL convert_to_columnstore('_timescaledb_internal._hyper_7_1_chunk');
-- Check compression statistics
SELECT
number_compressed_chunks,
pg_size_pretty(before_compression_total_bytes) as before_compression,
pg_size_pretty(after_compression_total_bytes) as after_compression,
ROUND(100.0 * (1 - after_compression_total_bytes::numeric / NULLIF(before_compression_total_bytes, 0)), 1) as compression_pct
FROM hypertable_compression_stats('your_table_name');
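The deprecated-to-new parameter renames listed earlier can be applied as a like-for-like migration. A sketch on a hypothetical metrics hypertable; the table and column names are illustrative:

```sql
-- Deprecated form (pre-rename):
-- ALTER TABLE metrics SET (
--     timescaledb.compress,
--     timescaledb.compress_segmentby = 'device_id',
--     timescaledb.compress_orderby = 'timestamp DESC'
-- );

-- Current form, same effect:
ALTER TABLE metrics SET (
    timescaledb.enable_columnstore,
    timescaledb.segmentby = 'device_id',
    timescaledb.orderby = 'timestamp DESC'
);
```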
Weekly Installs: 64
Repository: timescale/pg-aiguide
GitHub Stars: 1.7K
First Seen: Jan 22, 2026
Security Audits: Gen Agent Trust Hub: Pass, Socket: Pass, Snyk: Pass
Installed on: gemini-cli (51), opencode (50), codex (50), github-copilot (48), claude-code (48), cursor (47)