clickhouse-architect by terrylica/cc-skills
npx skills add https://github.com/terrylica/cc-skills --skill clickhouse-architect

Prescriptive schema design, compression selection, and performance optimization for ClickHouse (v24.4+). Covers both ClickHouse Cloud (SharedMergeTree) and self-hosted (ReplicatedMergeTree) deployments.
Use this skill when:
Follow this sequence when designing or reviewing ClickHouse schemas:
See references/schema-documentation.md.
The ORDER BY clause is the most critical decision in ClickHouse schema design.
Rules:
tenant_id before timestamp). Example:
-- Correct: Low cardinality first, 4 columns
CREATE TABLE trades (
exchange LowCardinality(String),
symbol LowCardinality(String),
timestamp DateTime64(3),
trade_id UInt64,
price Float64,
quantity Float64
) ENGINE = MergeTree()
ORDER BY (exchange, symbol, timestamp, trade_id);
-- Wrong: High cardinality first (10x slower queries)
ORDER BY (trade_id, timestamp, symbol, exchange);
| Column Type | Default Codec | Read-Heavy Alternative | Example |
|---|---|---|---|
| DateTime/DateTime64 | CODEC(DoubleDelta, ZSTD) | CODEC(DoubleDelta, LZ4) | timestamp DateTime64(3) CODEC(DoubleDelta, ZSTD) |
| Float prices/gauges | CODEC(Gorilla, ZSTD) | CODEC(Gorilla, LZ4) | price Float64 CODEC(Gorilla, ZSTD) |
| Integer counters | CODEC(T64, ZSTD) | — | count UInt64 CODEC(T64, ZSTD) |
| Slowly changing integers | CODEC(Delta, ZSTD) | CODEC(Delta, LZ4) | version UInt32 CODEC(Delta, ZSTD) |
| Strings (low cardinality) | LowCardinality(String) | — | status LowCardinality(String) |
| General data | CODEC(ZSTD(3)) | CODEC(LZ4) | Default compression level 3 |
When to use LZ4 over ZSTD: LZ4 decompresses about 1.76x faster. Use LZ4 for read-heavy workloads with monotonic sequences (timestamps, counters). Use ZSTD (the default) when compression ratio matters or data patterns are unknown.
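To see whether a codec choice is paying off, compare on-disk versus raw bytes per column. A sketch against system.columns ('trades' is the example table used throughout this guide):

```sql
-- Compression ratio per column of the trades table
SELECT
    name,
    formatReadableSize(data_compressed_bytes)   AS compressed,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed,
    round(data_uncompressed_bytes / data_compressed_bytes, 2) AS ratio
FROM system.columns
WHERE database = currentDatabase() AND table = 'trades'
ORDER BY data_compressed_bytes DESC;
```

Re-run after an ALTER TABLE ... MODIFY COLUMN codec change (and a forced merge) to compare ratios before and after.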
Note on codec combinations:
Delta/DoubleDelta + Gorilla combinations are blocked by default (allow_suspicious_codecs) because Gorilla already performs implicit delta compression internally; combining them is redundant, not dangerous. A historical corruption bug (PR #45615, Jan 2023) was fixed, but the block remains as a best-practice guardrail.
Use each codec family independently for its intended data type:
-- Correct usage
price Float64 CODEC(Gorilla, ZSTD)            -- Floats: use Gorilla
timestamp DateTime64 CODEC(DoubleDelta, ZSTD) -- Timestamps: use DoubleDelta
timestamp DateTime64 CODEC(DoubleDelta, LZ4)  -- Read-heavy: use LZ4
PARTITION BY is for data lifecycle management, NOT query optimization.
Rules:
Example:
-- Correct: Monthly partitions for TTL management
PARTITION BY toYYYYMM(timestamp)
-- Wrong: Daily partitions (too many parts)
PARTITION BY toYYYYMMDD(timestamp)
-- Wrong: High-cardinality partition key
PARTITION BY user_id
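Because partitioning is a lifecycle tool, it pairs naturally with TTL: expiring a whole partition is a cheap, metadata-level operation. A minimal sketch (table name and the 12-month retention are illustrative):

```sql
CREATE TABLE events (
    timestamp DateTime,
    payload String
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY timestamp
TTL timestamp + INTERVAL 12 MONTH DELETE;
-- Expired months are dropped wholesale, which is exactly why
-- partition granularity should match the TTL unit.
```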
| Pattern | Severity | Modern Status | Fix |
|---|---|---|---|
| Too many parts (>300/partition) | Critical | Still critical | Reduce partition granularity |
| Small batch inserts (<1000) | Critical | Still critical | Batch to 10k-100k rows |
| High-cardinality-first ORDER BY | Critical | Still critical | Reorder: lowest cardinality first |
| No memory limits | High | Still critical | Set max_memory_usage |
| Denormalization overuse | High | Still critical | Use dictionaries + materialized views |
| Large JOINs | Medium | 180x improved | Still avoid for ultra-low latency |
| Mutations (UPDATE/DELETE) | Medium | 1700x improved | Use lightweight UPDATEs (v24.4+); see the DELETE strategy guide below |
Choose the right DELETE strategy based on scope. Ranked fastest to slowest:
| Strategy | Syntax | Speed | Use When |
|---|---|---|---|
| DROP PARTITION | ALTER TABLE t DROP PARTITION (key1, key2, keyN) | Instant (metadata-only) | Purging entire partition ranges (months, corrupt data, test data) |
| DELETE IN PARTITION | ALTER TABLE t DELETE IN PARTITION (...) WHERE condition | Fast (scans 1 partition) | Targeted row removal within a known partition |
| ALTER TABLE DELETE | ALTER TABLE t DELETE WHERE condition | Slow (scans all parts) | Fallback when the partition is unknown |
| DELETE FROM (lightweight) | DELETE FROM t WHERE condition | Variable | Anti-pattern for write pipelines (see warning below) |
Anti-pattern: lightweight DELETE FROM before INSERT
DELETE FROM sets a _row_exists=0 mask instead of physically removing rows. These ghost rows:
- can surface as phantom data in queries using FINAL
Use DELETE FROM only for ad-hoc data corrections where ghost rows don't matter (analytics cleanup, dev/test). Never use it in a write pipeline where an INSERT follows the DELETE.
All DELETE mutations should use SETTINGS mutations_sync = 1 to block until completion (prevents INSERT-DELETE race conditions).
Partition-aware DELETE tip: if your partition key includes the columns you are filtering on (e.g., PARTITION BY (symbol, threshold, toYYYYMM(timestamp))), use DELETE IN PARTITION to scope the scan to a single partition instead of scanning all parts.
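Putting the two rules above together, a safe targeted delete might look like this (the table, partition value, and predicate are illustrative; this assumes PARTITION BY toYYYYMM(timestamp)):

```sql
-- Scoped to one monthly partition, blocking until the mutation completes
ALTER TABLE trades
    DELETE IN PARTITION 202401 WHERE symbol = 'DELISTED'
    SETTINGS mutations_sync = 1;
```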
| Deployment | Engine | Use Case |
|---|---|---|
| ClickHouse Cloud | SharedMergeTree | Default for cloud deployments |
| Self-hosted cluster | ReplicatedMergeTree | Multi-node with replication |
| Self-hosted single node | MergeTree | Single-node development/testing |
Cloud (SharedMergeTree):
CREATE TABLE trades (...)
ENGINE = SharedMergeTree('/clickhouse/tables/{shard}/trades', '{replica}')
ORDER BY (exchange, symbol, timestamp);
Self-hosted (ReplicatedMergeTree):
CREATE TABLE trades (...)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/trades', '{replica}')
ORDER BY (exchange, symbol, timestamp);
This skill is the hub for ClickHouse-related tasks. When the user's needs extend beyond schema design, invoke the related skills below.
| User Need | Invoke Skill | Trigger Phrases |
|---|---|---|
| Create database users, manage permissions | devops-tools:clickhouse-cloud-management | "create user", "GRANT", "permissions", "credentials" |
| Configure DBeaver, generate connection JSON | devops-tools:clickhouse-pydantic-config | "DBeaver", "client config", "connection setup" |
| Validate schema contracts against a live database | quality-tools:schema-e2e-validation | "validate schema", "Earthly E2E", "schema contract" |
- clickhouse-cloud-management (if cloud credentials are needed)
- clickhouse-pydantic-config (generates DBeaver JSON)
- schema-e2e-validation (CI/CD schema contracts)
User: "I need to design a trades table for ClickHouse Cloud and set up DBeaver to query it."
Expected behavior:
- Invoke clickhouse-cloud-management to create the database user
- Invoke clickhouse-pydantic-config for DBeaver configuration
Create alternative sort orders that ClickHouse selects automatically:
ALTER TABLE trades ADD PROJECTION trades_by_symbol (
SELECT * ORDER BY symbol, timestamp
);
ALTER TABLE trades MATERIALIZE PROJECTION trades_by_symbol;
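The optimizer may still choose a different plan, so verify that the projection is actually used. A sketch (the query predicate is illustrative; system.query_log must be enabled):

```sql
EXPLAIN indexes = 1
SELECT * FROM trades WHERE symbol = 'BTCUSD' ORDER BY timestamp;
-- Expect the ReadFromMergeTree step to mention trades_by_symbol.

-- Alternatively, check which projections recent queries actually used:
SELECT query, projections
FROM system.query_log
WHERE type = 'QueryFinish'
ORDER BY event_time DESC
LIMIT 5;
```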
Pre-compute aggregations for dashboard queries:
CREATE MATERIALIZED VIEW trades_hourly_mv
ENGINE = SummingMergeTree()
ORDER BY (exchange, symbol, hour)
AS SELECT
exchange,
symbol,
toStartOfHour(timestamp) AS hour,
sum(quantity) AS total_volume,
count() AS trade_count
FROM trades
GROUP BY exchange, symbol, hour;
Replace JOINs with O(1) dictionary lookups for large-scale star schemas:
When to use dictionaries (v24.4+):
When JOINs are sufficient (v24.4+):
Benchmark context: 6.6x speedup measured on the Star Schema Benchmark (1.4B rows).
CREATE DICTIONARY symbol_info (
symbol String,
name String,
sector String
)
PRIMARY KEY symbol
SOURCE(CLICKHOUSE(TABLE 'symbols'))
LAYOUT(FLAT()) -- Best layout for <500k entries with monotonic keys
LIFETIME(3600);
-- Use in queries (O(1) lookup)
SELECT
symbol,
dictGet('symbol_info', 'name', symbol) AS symbol_name
FROM trades;
Execute a comprehensive schema audit:
clickhouse-client --multiquery < scripts/schema-audit.sql
The audit script checks:
| Reference | Content |
|---|---|
| references/schema-design-workflow.md | Complete workflow with examples |
| references/compression-codec-selection.md | Decision tree + benchmarks |
| references/anti-patterns-and-fixes.md | 13 deadly sins + v24.4+ status |
| references/audit-and-diagnostics.md | Query interpretation guide |
| references/idiomatic-architecture.md | Parameterized views, dictionaries, dedup |
| references/schema-documentation.md | COMMENT patterns + naming for AI comprehension |
| references/cache-schema-evolution.md | Cache invalidation + schema evolution patterns |
Use clickhouse-connect (official) for all Python integrations.
# ✅ RECOMMENDED: clickhouse-connect (official, HTTP)
import clickhouse_connect
client = clickhouse_connect.get_client(
    host='localhost',
    port=8123,  # HTTP port
    username='default',
    password=''
)
result = client.query("SELECT * FROM trades LIMIT 1000")
df = client.query_df("SELECT * FROM trades")  # Pandas integration
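The anti-pattern table above calls for batching INSERTs to 10k-100k rows. A small helper that chunks any iterable into insert-sized batches; the commented client.insert call is illustrative and assumes the client object from the snippet above:

```python
from itertools import islice

def batched(rows, batch_size=50_000):
    """Yield lists of at most batch_size rows from any iterable."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# Usage with clickhouse-connect (client from the snippet above):
# for batch in batched(generate_rows(), 50_000):
#     client.insert('trades', batch,
#                   column_names=['exchange', 'symbol', 'timestamp',
#                                 'trade_id', 'price', 'quantity'])
```

Generators work too, so rows can be streamed from a file or queue without materializing them all in memory.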
Comparison with clickhouse-driver:
| Factor | clickhouse-connect | clickhouse-driver |
|---|---|---|
| Maintainer | ClickHouse Inc. | Solo developer |
| Weekly commits | Yes (active) | Sparse (months apart) |
| Open issues | 41 (addressed) | 76 (accumulating) |
| Downloads/week | 2.7M | 1.5M |
| Bus-factor risk | Low (company) | High (1 person) |
Do NOT use clickhouse-driver despite its ~26% speed advantage for large exports. The maintenance risk outweighs the performance gain:
Exception: consider clickhouse-driver only if you have extreme performance requirements (exporting millions of rows) AND accept the maintenance risk.
Every ClickHouse table and column MUST have a COMMENT that fully documents its meaning, computation method, and constraints. The COMMENT is the SSoT; no external document, skill, or wiki supersedes it.
Comments are visible via DESCRIBE table, SHOW CREATE TABLE, and system.columns.
ALTER TABLE t COMMENT COLUMN session_label
'STRICT session label. 8 values: sydney_only, tokyo_only, ...
Only set when the ENTIRE bar (open→close) falls within one session.
cross_session = bar spans a boundary. Use WHERE is_pure_session=1.
GitHub: https://github.com/org/repo/issues/54
Source script: scripts/populate-sessions/populate_v3.py';
NEVER create a ClickHouse column without a COMMENT. An undocumented column is a column that will be misused.
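This rule is mechanically auditable: since comments are stored in system.columns, columns that violate it can be listed directly. A sketch:

```sql
-- Find columns with no COMMENT in the current database
SELECT table, name
FROM system.columns
WHERE database = currentDatabase()
  AND comment = ''
ORDER BY table, name;
```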
| Skill | Purpose |
|---|---|
| devops-tools:clickhouse-cloud-management | User/permission management |
| devops-tools:clickhouse-pydantic-config | DBeaver connection generation |
| quality-tools:schema-e2e-validation | YAML schema contracts |
| quality-tools:multi-agent-e2e-validation | Database migration validation |
| Issue | Cause | Solution |
|---|---|---|
| Too many parts | Over-partitioning | Reduce partition granularity (monthly, not daily) |
| Slow queries | Wrong ORDER BY order | Put the lowest-cardinality columns first |
| High memory usage | No memory limits set | Configure the max_memory_usage setting |
| Codec error on Delta+Gorilla | Suspicious codec combination | Use each codec family independently |
| Projection not used | Optimizer chose a different plan | Check EXPLAIN to verify projection selection |
| Dictionary stale | LIFETIME expired | Increase LIFETIME or trigger a refresh |
| Replication lag | Part merges falling behind | Check merge_tree settings, add resources |
| INSERT too slow | Batch sizes too small | Batch to 10k-100k rows per INSERT |
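Several rows in the table above (too many parts, over-partitioning) can be diagnosed from system.parts. A sketch using the >300 parts/partition threshold from the anti-pattern table:

```sql
-- Partitions whose active part count exceeds the danger threshold
SELECT
    table,
    partition,
    count() AS active_parts
FROM system.parts
WHERE active AND database = currentDatabase()
GROUP BY table, partition
HAVING active_parts > 300
ORDER BY active_parts DESC;
```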