clickhouse-architect by terrylica/cc-skills
npx skills add https://github.com/terrylica/cc-skills --skill clickhouse-architect

Prescriptive schema design, compression selection, and performance optimization for ClickHouse (v24.4+). Covers both ClickHouse Cloud (SharedMergeTree) and self-hosted (ReplicatedMergeTree) deployments.
Use this skill when:
Follow this sequence when designing or reviewing ClickHouse schemas:
See references/schema-documentation.md.
The ORDER BY clause is the most critical decision in ClickHouse schema design.
Rules:
tenant_id before timestamp). Example:
-- Correct: Low cardinality first, 4 columns
CREATE TABLE trades (
exchange LowCardinality(String),
symbol LowCardinality(String),
timestamp DateTime64(3),
trade_id UInt64,
price Float64,
quantity Float64
) ENGINE = MergeTree()
ORDER BY (exchange, symbol, timestamp, trade_id);
-- Wrong: High cardinality first (10x slower queries)
ORDER BY (trade_id, timestamp, symbol, exchange);
| Column Type | Default Codec | Read-Heavy Alternative | Example |
|---|---|---|---|
| DateTime/DateTime64 | CODEC(DoubleDelta, ZSTD) | CODEC(DoubleDelta, LZ4) | timestamp DateTime64(3) CODEC(DoubleDelta, ZSTD) |
| Float prices/gauges | CODEC(Gorilla, ZSTD) | CODEC(Gorilla, LZ4) | price Float64 CODEC(Gorilla, ZSTD) |
| Integer counters | CODEC(T64, ZSTD) | — | count UInt64 CODEC(T64, ZSTD) |
| Slowly changing integers | CODEC(Delta, ZSTD) | CODEC(Delta, LZ4) | version UInt32 CODEC(Delta, ZSTD) |
| Strings (low cardinality) | LowCardinality(String) | — | status LowCardinality(String) |
| General data | CODEC(ZSTD(3)) | CODEC(LZ4) | Default compression level 3 |
When to use LZ4 over ZSTD: LZ4 decompresses about 1.76x faster. Use LZ4 for read-heavy workloads with monotonic sequences (timestamps, counters). Use ZSTD (the default) when compression ratio matters or data patterns are unknown.
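To see whether a codec choice is paying off, compare on-disk versus raw bytes per column. A sketch against system.columns ('trades' is the example table used throughout this guide):

```sql
-- Compression ratio per column of the trades table
SELECT
    name,
    formatReadableSize(data_compressed_bytes)   AS compressed,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed,
    round(data_uncompressed_bytes / data_compressed_bytes, 2) AS ratio
FROM system.columns
WHERE database = currentDatabase() AND table = 'trades'
ORDER BY data_compressed_bytes DESC;
```

Re-run after an ALTER TABLE ... MODIFY COLUMN codec change (and a forced merge) to compare ratios before and after.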
Note on codec combinations:
Delta/DoubleDelta + Gorilla combinations are blocked by default (allow_suspicious_codecs) because Gorilla already performs implicit delta compression internally; combining them is redundant, not dangerous. A historical corruption bug (PR #45615, Jan 2023) was fixed, but the block remains as a best-practice guardrail.
Use each codec family independently for its intended data type:
-- Correct usage
price Float64 CODEC(Gorilla, ZSTD)            -- Floats: use Gorilla
timestamp DateTime64 CODEC(DoubleDelta, ZSTD) -- Timestamps: use DoubleDelta
timestamp DateTime64 CODEC(DoubleDelta, LZ4)  -- Read-heavy: use LZ4
PARTITION BY is for data lifecycle management, NOT query optimization.
Rules:
Example:
-- Correct: Monthly partitions for TTL management
PARTITION BY toYYYYMM(timestamp)
-- Wrong: Daily partitions (too many parts)
PARTITION BY toYYYYMMDD(timestamp)
-- Wrong: High-cardinality partition key
PARTITION BY user_id
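Because partitioning is a lifecycle tool, it pairs naturally with TTL: expiring a whole partition is a cheap, metadata-level operation. A minimal sketch (table name and the 12-month retention are illustrative):

```sql
CREATE TABLE events (
    timestamp DateTime,
    payload String
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY timestamp
TTL timestamp + INTERVAL 12 MONTH DELETE;
-- Expired months are dropped wholesale, which is exactly why
-- partition granularity should match the TTL unit.
```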
| Pattern | Severity | Modern Status | Fix |
|---|---|---|---|
| Too many parts (>300/partition) | Critical | Still critical | Reduce partition granularity |
| Small batch inserts (<1000) | Critical | Still critical | Batch to 10k-100k rows |
| High-cardinality-first ORDER BY | Critical | Still critical | Reorder: lowest cardinality first |
| No memory limits | High | Still critical | Set max_memory_usage |
| Denormalization overuse | High | Still critical | Use dictionaries + materialized views |
| Large JOINs | Medium | 180x improved | Still avoid for ultra-low latency |
| Mutations (UPDATE/DELETE) | Medium | 1700x improved | Use lightweight UPDATEs (v24.4+); see the DELETE strategy guide below |
Choose the right DELETE strategy based on scope. Ranked fastest to slowest:
| Strategy | Syntax | Speed | Use When |
|---|---|---|---|
| DROP PARTITION | ALTER TABLE t DROP PARTITION (key1, key2, keyN) | Instant (metadata-only) | Purging entire partition ranges (months, corrupt data, test data) |
| DELETE IN PARTITION | ALTER TABLE t DELETE IN PARTITION (...) WHERE condition | Fast (scans 1 partition) | Targeted row removal within a known partition |
| ALTER TABLE DELETE | ALTER TABLE t DELETE WHERE condition | Slow (scans all parts) | Fallback when the partition is unknown |
| DELETE FROM (lightweight) | DELETE FROM t WHERE condition | Variable | Anti-pattern for write pipelines (see warning below) |
Anti-pattern: lightweight DELETE FROM before INSERT
DELETE FROM sets a _row_exists=0 mask instead of physically removing rows. These ghost rows:
- can surface as phantom data in queries using FINAL
Use DELETE FROM only for ad-hoc data corrections where ghost rows don't matter (analytics cleanup, dev/test). Never use it in a write pipeline where an INSERT follows the DELETE.
All DELETE mutations should use SETTINGS mutations_sync = 1 to block until completion (prevents INSERT-DELETE race conditions).
Partition-aware DELETE tip: if your partition key includes the columns you are filtering on (e.g., PARTITION BY (symbol, threshold, toYYYYMM(timestamp))), use DELETE IN PARTITION to scope the scan to a single partition instead of scanning all parts.
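Putting the two rules above together, a safe targeted delete might look like this (the table, partition value, and predicate are illustrative; this assumes PARTITION BY toYYYYMM(timestamp)):

```sql
-- Scoped to one monthly partition, blocking until the mutation completes
ALTER TABLE trades
    DELETE IN PARTITION 202401 WHERE symbol = 'DELISTED'
    SETTINGS mutations_sync = 1;
```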
| Deployment | Engine | Use Case |
|---|---|---|
| ClickHouse Cloud | SharedMergeTree | Default for cloud deployments |
| Self-hosted cluster | ReplicatedMergeTree | Multi-node with replication |
| Self-hosted single node | MergeTree | Single-node development/testing |
Cloud (SharedMergeTree):
CREATE TABLE trades (...)
ENGINE = SharedMergeTree('/clickhouse/tables/{shard}/trades', '{replica}')
ORDER BY (exchange, symbol, timestamp);
Self-hosted (ReplicatedMergeTree):
CREATE TABLE trades (...)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/trades', '{replica}')
ORDER BY (exchange, symbol, timestamp);
This skill is the hub for ClickHouse-related tasks. When the user's needs extend beyond schema design, invoke the related skills below.
| User Need | Invoke Skill | Trigger Phrases |
|---|---|---|
| Create database users, manage permissions | devops-tools:clickhouse-cloud-management | "create user", "GRANT", "permissions", "credentials" |
| Configure DBeaver, generate connection JSON | devops-tools:clickhouse-pydantic-config | "DBeaver", "client config", "connection setup" |
| Validate schema contracts against a live database | quality-tools:schema-e2e-validation | "validate schema", "Earthly E2E", "schema contract" |
- clickhouse-cloud-management (if cloud credentials are needed)
- clickhouse-pydantic-config (generates DBeaver JSON)
- schema-e2e-validation (CI/CD schema contracts)
User: "I need to design a trades table for ClickHouse Cloud and set up DBeaver to query it."
Expected behavior:
- Invoke clickhouse-cloud-management to create the database user
- Invoke clickhouse-pydantic-config for DBeaver configuration
Create alternative sort orders that ClickHouse selects automatically:
ALTER TABLE trades ADD PROJECTION trades_by_symbol (
SELECT * ORDER BY symbol, timestamp
);
ALTER TABLE trades MATERIALIZE PROJECTION trades_by_symbol;
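The optimizer may still choose a different plan, so verify that the projection is actually used. A sketch (the query predicate is illustrative; system.query_log must be enabled):

```sql
EXPLAIN indexes = 1
SELECT * FROM trades WHERE symbol = 'BTCUSD' ORDER BY timestamp;
-- Expect the ReadFromMergeTree step to mention trades_by_symbol.

-- Alternatively, check which projections recent queries actually used:
SELECT query, projections
FROM system.query_log
WHERE type = 'QueryFinish'
ORDER BY event_time DESC
LIMIT 5;
```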
Pre-compute aggregations for dashboard queries:
CREATE MATERIALIZED VIEW trades_hourly_mv
ENGINE = SummingMergeTree()
ORDER BY (exchange, symbol, hour)
AS SELECT
exchange,
symbol,
toStartOfHour(timestamp) AS hour,
sum(quantity) AS total_volume,
count() AS trade_count
FROM trades
GROUP BY exchange, symbol, hour;
Replace JOINs with O(1) dictionary lookups for large-scale star schemas:
When to use dictionaries (v24.4+):
When JOINs are sufficient (v24.4+):
Benchmark context: 6.6x speedup measured on the Star Schema Benchmark (1.4B rows).
CREATE DICTIONARY symbol_info (
symbol String,
name String,
sector String
)
PRIMARY KEY symbol
SOURCE(CLICKHOUSE(TABLE 'symbols'))
LAYOUT(FLAT()) -- Best layout for <500k entries with monotonic keys
LIFETIME(3600);
-- Use in queries (O(1) lookup)
SELECT
symbol,
dictGet('symbol_info', 'name', symbol) AS symbol_name
FROM trades;
Execute a comprehensive schema audit:
clickhouse-client --multiquery < scripts/schema-audit.sql
The audit script checks:
| Reference | Content |
|---|---|
| references/schema-design-workflow.md | Complete workflow with examples |
| references/compression-codec-selection.md | Decision tree + benchmarks |
| references/anti-patterns-and-fixes.md | 13 deadly sins + v24.4+ status |
| references/audit-and-diagnostics.md | Query interpretation guide |
| references/idiomatic-architecture.md | Parameterized views, dictionaries, dedup |
| references/schema-documentation.md | COMMENT patterns + naming for AI comprehension |
| references/cache-schema-evolution.md | Cache invalidation + schema evolution patterns |
Use clickhouse-connect (official) for all Python integrations.
# ✅ RECOMMENDED: clickhouse-connect (official, HTTP)
import clickhouse_connect
client = clickhouse_connect.get_client(
    host='localhost',
    port=8123,  # HTTP port
    username='default',
    password=''
)
result = client.query("SELECT * FROM trades LIMIT 1000")
df = client.query_df("SELECT * FROM trades")  # Pandas integration
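The anti-pattern table above calls for batching INSERTs to 10k-100k rows. A small helper that chunks any iterable into insert-sized batches; the commented client.insert call is illustrative and assumes the client object from the snippet above:

```python
from itertools import islice

def batched(rows, batch_size=50_000):
    """Yield lists of at most batch_size rows from any iterable."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# Usage with clickhouse-connect (client from the snippet above):
# for batch in batched(generate_rows(), 50_000):
#     client.insert('trades', batch,
#                   column_names=['exchange', 'symbol', 'timestamp',
#                                 'trade_id', 'price', 'quantity'])
```

Generators work too, so rows can be streamed from a file or queue without materializing them all in memory.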
Comparison with clickhouse-driver:
| Factor | clickhouse-connect | clickhouse-driver |
|---|---|---|
| Maintainer | ClickHouse Inc. | Solo developer |
| Weekly commits | Yes (active) | Sparse (months apart) |
| Open issues | 41 (addressed) | 76 (accumulating) |
| Downloads/week | 2.7M | 1.5M |
| Bus-factor risk | Low (company) | High (1 person) |
Do NOT use clickhouse-driver despite its ~26% speed advantage for large exports. The maintenance risk outweighs the performance gain:
Exception: consider clickhouse-driver only if you have extreme performance requirements (exporting millions of rows) AND accept the maintenance risk.
Every ClickHouse table and column MUST have a COMMENT that fully documents its meaning, computation method, and constraints. The COMMENT is the SSoT; no external document, skill, or wiki supersedes it.
Comments are visible via DESCRIBE table, SHOW CREATE TABLE, and system.columns.
ALTER TABLE t COMMENT COLUMN session_label
'STRICT session label. 8 values: sydney_only, tokyo_only, ...
Only set when the ENTIRE bar (open→close) falls within one session.
cross_session = bar spans a boundary. Use WHERE is_pure_session=1.
GitHub: https://github.com/org/repo/issues/54
Source script: scripts/populate-sessions/populate_v3.py';
NEVER create a ClickHouse column without a COMMENT. An undocumented column is a column that will be misused.
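This rule is mechanically auditable: since comments are stored in system.columns, columns that violate it can be listed directly. A sketch:

```sql
-- Find columns with no COMMENT in the current database
SELECT table, name
FROM system.columns
WHERE database = currentDatabase()
  AND comment = ''
ORDER BY table, name;
```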
| Skill | Purpose |
|---|---|
| devops-tools:clickhouse-cloud-management | User/permission management |
| devops-tools:clickhouse-pydantic-config | DBeaver connection generation |
| quality-tools:schema-e2e-validation | YAML schema contracts |
| quality-tools:multi-agent-e2e-validation | Database migration validation |
| Issue | Cause | Solution |
|---|---|---|
| Too many parts | Over-partitioning | Reduce partition granularity (monthly, not daily) |
| Slow queries | Wrong ORDER BY order | Put the lowest-cardinality columns first |
| High memory usage | No memory limits set | Configure the max_memory_usage setting |
| Codec error on Delta+Gorilla | Suspicious codec combination | Use each codec family independently |
| Projection not used | Optimizer chose a different plan | Check EXPLAIN to verify projection selection |
| Dictionary stale | LIFETIME expired | Increase LIFETIME or trigger a refresh |
| Replication lag | Part merges falling behind | Check merge_tree settings, add resources |
| INSERT too slow | Batch sizes too small | Batch to 10k-100k rows per INSERT |
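Several rows in the table above (too many parts, over-partitioning) can be diagnosed from system.parts. A sketch using the >300 parts/partition threshold from the anti-pattern table:

```sql
-- Partitions whose active part count exceeds the danger threshold
SELECT
    table,
    partition,
    count() AS active_parts
FROM system.parts
WHERE active AND database = currentDatabase()
GROUP BY table, partition
HAVING active_parts > 300
ORDER BY active_parts DESC;
```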