postgresql-database-engineering by manutej/luxor-claude-marketplace
npx skills add https://github.com/manutej/luxor-claude-marketplace --skill postgresql-database-engineering
A comprehensive skill for professional PostgreSQL database engineering, covering everything from query optimization and indexing strategies to high availability, replication, and production database management. This skill enables you to design, optimize, and maintain high-performance PostgreSQL databases at scale.
Use this skill when:
PostgreSQL uses a process-based architecture with several key components:
PostgreSQL's foundational concurrency mechanism:
Key Implications:
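A minimal illustration of MVCC snapshot visibility across two sessions (the accounts table is hypothetical):

```sql
-- Session A
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;

-- Session B (concurrent, READ COMMITTED): the read does not block on A's
-- uncommitted write; it sees the last committed version of the row
SELECT balance FROM accounts WHERE id = 1;

-- Session A
COMMIT;  -- the new row version becomes visible to statements starting after this
```

Readers never block writers and writers never block readers; the cost is dead row versions that VACUUM must later reclaim.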
PostgreSQL supports four isolation levels:
Choosing Isolation:
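Isolation can be set per transaction or as a session/server default; a short sketch:

```sql
-- Per-transaction: stable snapshot for the whole transaction
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- ... queries that must all see the same data ...
COMMIT;

-- Session default (READ COMMITTED is PostgreSQL's shipped default)
SET default_transaction_isolation = 'read committed';
```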
PostgreSQL offers multiple index types for different use cases:
PostgreSQL's query planner determines execution strategies:
Planner Components:
Key Statistics:
n_distinct: Number of distinct values (for selectivity)
correlation: Physical row ordering correlation
most_common_vals: MCV list for skewed distributions
histogram_bounds: Value distribution histogram

Understanding EXPLAIN:
Table partitioning for managing large datasets:
Partition Pruning:
Partition-Wise Operations:
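Pruning can be observed with EXPLAIN against a range-partitioned table (this assumes the events table partitioned by created_at, as defined later in this document):

```sql
-- Only partitions overlapping the WHERE range are scanned
EXPLAIN SELECT count(*)
FROM events
WHERE created_at >= '2024-01-15' AND created_at < '2024-01-20';
-- the plan should reference only the January partition (events_2024_01)
```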
PostgreSQL replication options:
Synchronous vs Asynchronous:
Managing database connections efficiently:
Pooling Modes:
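The common modes are session, transaction, and statement pooling. A minimal PgBouncer sketch using transaction pooling (host names and pool sizes are illustrative):

```ini
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction   ; server connection returned to pool at COMMIT/ROLLBACK
default_pool_size = 20
max_client_conn = 500
```

Transaction pooling gives the best connection reuse but is incompatible with session-level state such as prepared statements and advisory locks held across transactions.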
Critical maintenance operations:
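The core operations are VACUUM, ANALYZE, and REINDEX; a sketch against the users table used in later examples (the index name is illustrative):

```sql
-- Reclaim dead tuples and refresh planner statistics in one pass
VACUUM (VERBOSE, ANALYZE) users;

-- Statistics only (cheap; useful after bulk loads)
ANALYZE users;

-- Rebuild a bloated index without blocking writes (PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY idx_users_email;
```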
Key configuration parameters:
shared_buffers: 25% of RAM (start point)
effective_cache_size: 50-75% of RAM
work_mem: Per-operation memory (sort, hash)
maintenance_work_mem: VACUUM, CREATE INDEX memory
checkpoint_timeout: How often to checkpoint
max_wal_size: WAL size before checkpoint
checkpoint_completion_target: Spread checkpoint I/O
wal_buffers: WAL write buffer size
random_page_cost: Relative cost of random I/O
effective_io_concurrency: Concurrent I/O operations
default_statistics_target: Histogram detail level
max_connections: Maximum client connections
connection_limit: Per-database/user limits
Decision Matrix:
| Query Pattern | Index Type | Reason |
|---|---|---|
| WHERE id = 5 | B-tree | Equality lookup |
| WHERE created_at > '2024-01-01' | B-tree | Range query |
| ORDER BY name | B-tree | Sorting support |
| WHERE tags @> ARRAY['sql'] | GIN | Array containment |
| WHERE data->>'status' = 'active' | GIN (jsonb_path_ops) | JSONB query |
| WHERE to_tsvector(content) @@ query | GIN | Full-text search |
| WHERE location <-> point(0,0) | GiST | Nearest neighbor |
| WHERE timestamp BETWEEN ... (large table) | BRIN | Sequential time-series |
| WHERE ip_address << '192.168.0.0/16' | GiST or SP-GiST | IP range query |
Multi-column indexes for complex queries:
Column Ordering Rules:
Example:
-- Query: WHERE status = 'active' AND created_at > '2024-01-01' ORDER BY created_at
-- Optimal index: (status, created_at)
CREATE INDEX idx_users_status_created ON users(status, created_at);
Index subset of rows:
Benefits:
Use Cases:
WHERE deleted_at IS NULL
WHERE created_at > NOW() - INTERVAL '90 days'
WHERE status IN ('pending', 'processing')

Index computed values:
Examples:
-- Case-insensitive search
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
-- Date truncation
CREATE INDEX idx_events_date ON events(DATE(created_at));
-- JSONB field
CREATE INDEX idx_data_status ON documents((data->>'status'));
Include non-key columns for index-only scans:
CREATE INDEX idx_users_email_include
ON users(email)
INCLUDE (first_name, last_name, created_at);
Benefit: Query satisfied entirely from index, no table lookup
Monitoring Index Usage:
-- Unused indexes
SELECT schemaname, tablename, indexname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
Detecting Bloat:
-- Index bloat estimation
SELECT schemaname, tablename, indexname,
pg_size_pretty(pg_relation_size(indexrelid)) as index_size,
idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;
Understanding query execution:
-- Basic EXPLAIN
EXPLAIN SELECT * FROM users WHERE email = 'user@example.com';
-- EXPLAIN ANALYZE (actually runs query)
EXPLAIN ANALYZE SELECT * FROM users WHERE created_at > '2024-01-01';
-- Detailed output
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT u.*, o.total
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2024-01-01';
Key Metrics:
Problem: One query per row in a loop Solution: JOIN or batch queries
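The N+1 fix sketched as SQL (users/orders schema as used elsewhere in this document):

```sql
-- Instead of one query per user, executed N times from application code:
--   SELECT * FROM orders WHERE user_id = $1;
-- fetch everything in a single round trip:
SELECT u.id, u.email, o.id AS order_id, o.total
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at > '2024-01-01';
```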
Problem: Fetches unnecessary columns Solution: Select only needed columns
Problem: Index not used due to type mismatch Solution: Ensure query types match column types
Problem: WHERE UPPER(email) = 'USER@EXAMPLE.COM' Solution: Use expression index or compare correctly
Problem: WHERE status = 'A' OR status = 'B' Solution: Use IN: WHERE status IN ('A', 'B')
Join Types:
Nested Loop: best when the outer side is small and the inner side has an index
Hash Join: builds a hash table on the smaller input; good for large unsorted sets
Merge Join: merges two sorted inputs; good when both sides are already sorted (e.g. via indexes)
Join Order Matters:
Use SET join_collapse_limit to force join order

Techniques:
Materialized Views:
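A materialized view precomputes an expensive aggregate; with a unique index it can be refreshed without blocking readers. A sketch over the orders table (view and index names are illustrative):

```sql
CREATE MATERIALIZED VIEW daily_order_totals AS
SELECT DATE(created_at) AS day, SUM(total) AS revenue
FROM orders
GROUP BY DATE(created_at);

-- Unique index is required for REFRESH ... CONCURRENTLY
CREATE UNIQUE INDEX idx_daily_order_totals_day ON daily_order_totals(day);

-- Refresh without blocking concurrent SELECTs on the view
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_order_totals;
```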
Levels:
Time-series example:
-- Create partitioned table
CREATE TABLE events (
id BIGSERIAL,
event_type TEXT NOT NULL,
user_id INTEGER NOT NULL,
data JSONB,
created_at TIMESTAMP NOT NULL
) PARTITION BY RANGE (created_at);
-- Create partitions
CREATE TABLE events_2024_01 PARTITION OF events
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE events_2024_02 PARTITION OF events
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
-- Default partition for data outside ranges
CREATE TABLE events_default PARTITION OF events DEFAULT;
-- Indexes on partitions
CREATE INDEX idx_events_2024_01_user ON events_2024_01(user_id);
CREATE INDEX idx_events_2024_02_user ON events_2024_02(user_id);
Automated partition management:
-- Function to create monthly partitions
CREATE OR REPLACE FUNCTION create_monthly_partition(
base_table TEXT,
partition_date DATE
) RETURNS VOID AS $$
DECLARE
partition_name TEXT;
start_date DATE;
end_date DATE;
BEGIN
partition_name := base_table || '_' || TO_CHAR(partition_date, 'YYYY_MM');
start_date := DATE_TRUNC('month', partition_date);
end_date := start_date + INTERVAL '1 month';
EXECUTE format(
'CREATE TABLE IF NOT EXISTS %I PARTITION OF %I
FOR VALUES FROM (%L) TO (%L)',
partition_name, base_table, start_date, end_date
);
-- Create indexes
EXECUTE format(
'CREATE INDEX IF NOT EXISTS %I ON %I(user_id)',
'idx_' || partition_name || '_user', partition_name
);
END;
$$ LANGUAGE plpgsql;
Dropping old partitions:
-- Detach partition (fast, non-blocking)
ALTER TABLE events DETACH PARTITION events_2023_01;
-- Drop detached partition
DROP TABLE events_2023_01;
-- Or archive before dropping
CREATE TABLE archive.events_2023_01 AS SELECT * FROM events_2023_01;
DROP TABLE events_2023_01;
Primary server configuration (postgresql.conf):
# Replication settings
wal_level = replica
max_wal_senders = 10
max_replication_slots = 10
hot_standby = on
synchronous_commit = on # or off for async
synchronous_standby_names = 'standby1,standby2' # for sync replication
Create replication user:
CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'secure_password';
pg_hba.conf on primary:
# Allow replication connections
host replication replicator standby_ip/32 md5
Standby server setup:
# Stop standby PostgreSQL
systemctl stop postgresql
# Remove old data directory
rm -rf /var/lib/postgresql/14/main
# Base backup from primary
pg_basebackup -h primary_host -D /var/lib/postgresql/14/main \
-U replicator -P -v -R -X stream -C -S standby1
# Start standby
systemctl start postgresql
Standby configuration (created by -R flag):
# standby.signal file created automatically
# postgresql.auto.conf contains:
primary_conninfo = 'host=primary_host port=5432 user=replicator password=secure_password'
primary_slot_name = 'standby1'
On primary:
-- Check replication status
SELECT client_addr, state, sync_state, replay_lag
FROM pg_stat_replication;
-- Check replication slots
SELECT slot_name, active, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots;
On standby:
-- Check replication lag
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;
-- Check recovery status
SELECT pg_is_in_recovery();
Promoting standby to primary:
# Trigger failover
pg_ctl promote -D /var/lib/postgresql/14/main
# Or using SQL
SELECT pg_promote();
Controlled switchover:
# 1. Stop writes on primary
# 2. Wait for standby to catch up
# 3. Promote standby
# 4. Reconfigure old primary as new standby
On publisher (source):
-- Create publication
CREATE PUBLICATION my_publication FOR TABLE users, orders;
-- Or all tables
CREATE PUBLICATION all_tables FOR ALL TABLES;
On subscriber (destination):
-- Create subscription
CREATE SUBSCRIPTION my_subscription
CONNECTION 'host=publisher_host dbname=mydb user=replicator password=pass'
PUBLICATION my_publication;
-- Monitor subscription
SELECT * FROM pg_stat_subscription;
pg_basebackup:
# Full physical backup
pg_basebackup -h localhost -U postgres -D /backup/base \
-F tar -z -P -v
# With WAL files for point-in-time recovery
pg_basebackup -h localhost -U postgres -D /backup/base \
-X stream -F tar -z -P
Continuous archiving (WAL archiving):
# postgresql.conf
wal_level = replica
archive_mode = on
archive_command = 'cp %p /archive/wal/%f'
pg_dump:
# Single database
pg_dump -h localhost -U postgres -F c -b -v -f mydb.dump mydb
# All databases
pg_dumpall -h localhost -U postgres -f all_databases.sql
# Specific tables
pg_dump -h localhost -U postgres -t users -t orders -F c -f tables.dump mydb
# Schema only
pg_dump -h localhost -U postgres --schema-only -F c -f schema.dump mydb
pg_restore:
# Restore database
pg_restore -h localhost -U postgres -d mydb -v mydb.dump
# Parallel restore
pg_restore -h localhost -U postgres -d mydb -j 4 -v mydb.dump
# Restore specific tables
pg_restore -h localhost -U postgres -d mydb -t users -v mydb.dump
Setup:
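PITR setup combines the WAL archiving configuration above with a base backup to recover from; a sketch (paths are illustrative):

```conf
# postgresql.conf: archive every completed WAL segment
wal_level = replica
archive_mode = on
archive_command = 'test ! -f /archive/wal/%f && cp %p /archive/wal/%f'
```

```shell
# Take the base backup that recovery will replay WAL on top of
pg_basebackup -h localhost -U postgres -D /backup/base -F tar -z -P
```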
Recovery:
# 1. Restore base backup
tar -xzf base.tar.gz -C /var/lib/postgresql/14/main
# 2. Create recovery.signal file
touch /var/lib/postgresql/14/main/recovery.signal
# 3. Configure recovery target (postgresql.conf or postgresql.auto.conf)
restore_command = 'cp /archive/wal/%f %p'
recovery_target_time = '2024-01-15 14:30:00'
# Or: recovery_target_name = 'before_disaster'
# Or: recovery_target_lsn = '0/3000000'
# 4. Start PostgreSQL
systemctl start postgresql
3-2-1 Rule: keep at least 3 copies of your data, on 2 different storage media, with 1 copy offsite
Backup Schedule:
Testing Backups:
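A backup is only proven by a restore. A sketch that restores the latest dump into a scratch database and sanity-checks it (database and file names are illustrative):

```shell
# Restore into a throwaway database and verify it contains data
createdb mydb_restore_test
pg_restore -d mydb_restore_test -j 4 /backup/mydb.dump
psql -d mydb_restore_test -c 'SELECT count(*) FROM users;'
dropdb mydb_restore_test
```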
Database Health:
Query Performance:
System Resources:
Connection stats:
SELECT count(*) as total_connections,
count(*) FILTER (WHERE state = 'active') as active,
count(*) FILTER (WHERE state = 'idle') as idle,
count(*) FILTER (WHERE state = 'idle in transaction') as idle_in_transaction
FROM pg_stat_activity;
Cache hit ratio:
SELECT sum(heap_blks_read) as heap_read,
sum(heap_blks_hit) as heap_hit,
sum(heap_blks_hit)::float / NULLIF(sum(heap_blks_hit) + sum(heap_blks_read), 0) AS ratio  -- cast avoids integer division; NULLIF guards empty stats
FROM pg_statio_user_tables;
Table bloat:
SELECT schemaname, tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size,
n_dead_tup,
n_live_tup,
round(n_dead_tup * 100.0 / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_ratio
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC;
Long-running queries:
SELECT pid, now() - query_start AS duration, state, query
FROM pg_stat_activity
WHERE state != 'idle'
AND query NOT LIKE '%pg_stat_activity%'
ORDER BY duration DESC;
Lock monitoring:
SELECT blocked_locks.pid AS blocked_pid,
blocked_activity.usename AS blocked_user,
blocking_locks.pid AS blocking_pid,
blocking_activity.usename AS blocking_user,
blocked_activity.query AS blocked_statement,
blocking_activity.query AS blocking_statement
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
    ON blocking_locks.locktype = blocked_locks.locktype
    AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
    AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
    AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
    AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
    AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
    AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
    AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
    AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
    AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
    AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted
AND blocking_locks.granted;
Installation:
CREATE EXTENSION pg_stat_statements;
Configuration (postgresql.conf):
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.track = all
pg_stat_statements.max = 10000
Top queries by total time:
SELECT query,
calls,
total_exec_time,
mean_exec_time,
max_exec_time,
rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
Top queries by average time:
SELECT query,
calls,
mean_exec_time,
total_exec_time
FROM pg_stat_statements
WHERE calls > 100
ORDER BY mean_exec_time DESC
LIMIT 20;
Normalization:
Data Types:
Constraints:
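Declaring constraints in the schema keeps invariants in the database rather than scattered through application code; an illustrative orders definition (column names are assumptions):

```sql
CREATE TABLE orders (
    id         BIGSERIAL PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(id),
    status     TEXT NOT NULL CHECK (status IN ('pending', 'paid', 'shipped')),
    total      NUMERIC(12,2) NOT NULL CHECK (total >= 0),
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
```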
Zero-Downtime Migrations:
Add new column
ALTER TABLE users ADD COLUMN email_verified BOOLEAN;
Backfill data (in batches)
-- UPDATE has no LIMIT clause in PostgreSQL; batch via a keyed subquery
UPDATE users SET email_verified = false
WHERE id IN (
    SELECT id FROM users
    WHERE email_verified IS NULL
    LIMIT 10000
);
Add NOT NULL constraint
ALTER TABLE users ALTER COLUMN email_verified SET NOT NULL;
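On PostgreSQL 12+, a direct SET NOT NULL scans the whole table under an exclusive lock; a validated CHECK constraint lets it skip that scan. A sketch (constraint name is illustrative):

```sql
-- Instant: no scan, constraint merely recorded as NOT VALID
ALTER TABLE users
    ADD CONSTRAINT users_email_verified_not_null
    CHECK (email_verified IS NOT NULL) NOT VALID;

-- Scans the table but does not block concurrent writes
ALTER TABLE users VALIDATE CONSTRAINT users_email_verified_not_null;

-- SET NOT NULL can now use the validated constraint and skip its own scan
ALTER TABLE users ALTER COLUMN email_verified SET NOT NULL;
ALTER TABLE users DROP CONSTRAINT users_email_verified_not_null;
```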
Index Creation:
Use CREATE INDEX CONCURRENTLY in production
Monitor progress via pg_stat_progress_create_index

Large Table Modifications:
Use pg_repack for table rewrites

Authentication:
Authorization:
Network Security:
Audit Logging:
Daily:
Weekly:
Monthly:
Quarterly:
Configuration:
max_parallel_workers_per_gather = 4
max_parallel_workers = 8
parallel_setup_cost = 1000
parallel_tuple_cost = 0.1
min_parallel_table_scan_size = 8MB
Forcing parallel execution:
SET max_parallel_workers_per_gather = 4;
EXPLAIN ANALYZE SELECT COUNT(*) FROM large_table;
When parallelism helps:
Stored procedures:
CREATE OR REPLACE PROCEDURE update_user_statistics()
LANGUAGE plpgsql
AS $$
BEGIN
UPDATE users SET
order_count = (SELECT COUNT(*) FROM orders WHERE user_id = users.id),
last_order_date = (SELECT MAX(created_at) FROM orders WHERE user_id = users.id);
COMMIT;
END;
$$;
Functions with proper error handling:
CREATE OR REPLACE FUNCTION create_user(
p_email TEXT,
p_name TEXT
) RETURNS INTEGER
LANGUAGE plpgsql
AS $$
DECLARE
v_user_id INTEGER;
BEGIN
INSERT INTO users (email, name)
VALUES (p_email, p_name)
RETURNING id INTO v_user_id;
RETURN v_user_id;
EXCEPTION
WHEN unique_violation THEN
RAISE EXCEPTION 'Email already exists: %', p_email;
WHEN OTHERS THEN
RAISE EXCEPTION 'Error creating user: %', SQLERRM;
END;
$$;
Access external data sources:
-- Install postgres_fdw
CREATE EXTENSION postgres_fdw;
-- Create server
CREATE SERVER remote_db
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host 'remote_host', dbname 'remote_database', port '5432');
-- Create user mapping
CREATE USER MAPPING FOR current_user
SERVER remote_db
OPTIONS (user 'remote_user', password 'remote_password');
-- Import foreign schema
IMPORT FOREIGN SCHEMA public
FROM SERVER remote_db
INTO local_schema;
-- Query foreign table
SELECT * FROM local_schema.remote_table;
Indexing JSONB:
-- GIN index for containment queries
CREATE INDEX idx_data_gin ON documents USING GIN (data);
-- Expression index for specific field
CREATE INDEX idx_data_status ON documents ((data->>'status'));
-- GIN index with jsonb_path_ops (smaller, faster for @> queries)
CREATE INDEX idx_data_path_ops ON documents USING GIN (data jsonb_path_ops);
Efficient JSONB queries:
-- Containment query (uses GIN index)
SELECT * FROM documents WHERE data @> '{"status": "active"}';
-- Existence query
SELECT * FROM documents WHERE data ? 'email';
-- Path query
SELECT * FROM documents WHERE data->'user'->>'email' = 'user@example.com';
-- Array operations
SELECT * FROM documents WHERE data->'tags' @> '["sql", "postgres"]';
Basic setup:
-- Add tsvector column
ALTER TABLE articles ADD COLUMN search_vector tsvector;
-- Generate search vector
UPDATE articles SET search_vector =
to_tsvector('english', coalesce(title, '') || ' ' || coalesce(content, ''));
-- Create GIN index
CREATE INDEX idx_articles_search ON articles USING GIN (search_vector);
-- Trigger for automatic updates
CREATE TRIGGER articles_search_update
BEFORE INSERT OR UPDATE ON articles
FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(search_vector, 'pg_catalog.english', title, content);
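On PostgreSQL 12+, a generated column can replace the trigger approach above, keeping search_vector maintained automatically by the server:

```sql
-- Alternative to the trigger: a stored generated column (PostgreSQL 12+)
ALTER TABLE articles
    ADD COLUMN search_vector tsvector
    GENERATED ALWAYS AS (
        to_tsvector('english', coalesce(title, '') || ' ' || coalesce(content, ''))
    ) STORED;
```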
Search queries:
-- Basic search
SELECT title, ts_rank(search_vector, query) AS rank
FROM articles, to_tsquery('english', 'postgresql & database') query
WHERE search_vector @@ query
ORDER BY rank DESC;
-- Phrase search
SELECT title FROM articles
WHERE search_vector @@ phraseto_tsquery('english', 'database engineering');
-- Search with highlighting
SELECT title,
ts_headline('english', content, query) AS snippet
FROM articles, to_tsquery('english', 'postgresql') query
WHERE search_vector @@ query;
Problem: Slow Queries
Refresh planner statistics with ANALYZE table_name

Problem: High CPU Usage
Problem: Connection Exhaustion
Monitor connections with pg_stat_activity

Problem: Autovacuum Not Keeping Up
Problem: Replication Lag
Problem: Transaction ID Wraparound
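Wraparound risk is measured by the age of each database's oldest unfrozen transaction ID; a monitoring and remediation sketch (big_table is illustrative):

```sql
-- Compare xid_age against autovacuum_freeze_max_age (default 200 million);
-- near ~2 billion the server refuses new writes to protect data
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;

-- Force freezing on the worst offender if autovacuum cannot keep up
VACUUM (FREEZE, VERBOSE) big_table;
```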
Find missing indexes on foreign keys:
SELECT c.conrelid::regclass AS table,
c.confrelid::regclass AS referenced_table,
string_agg(a.attname, ', ') AS foreign_key_columns
FROM pg_constraint c
JOIN pg_attribute a ON a.attnum = ANY(c.conkey) AND a.attrelid = c.conrelid
WHERE c.contype = 'f'
AND NOT EXISTS (
SELECT 1 FROM pg_index i
WHERE i.indrelid = c.conrelid
AND c.conkey[1:array_length(c.conkey, 1)]
OPERATOR(pg_catalog.@>) i.indkey[0:array_length(c.conkey, 1) - 1]
)
GROUP BY c.conrelid, c.confrelid, c.conname;
Identify blocking queries:
SELECT activity.pid,
activity.usename,
activity.query,
blocking.pid AS blocking_id,
blocking.query AS blocking_query
FROM pg_stat_activity AS activity
JOIN pg_stat_activity AS blocking ON blocking.pid = ANY(pg_blocking_pids(activity.pid));
Skill Version: 1.0.0
Last Updated: October 2025
Skill Category: Database Engineering, Performance Optimization, Data Architecture
Compatible With: PostgreSQL 12+, 13, 14, 15, 16
Prerequisites: SQL knowledge, basic database concepts, Linux command line