NoSQL Expert Guide: Design Patterns and Performance Optimization for Cassandra and DynamoDB Distributed Databases

nosql-expert by claudiodearaujo/izacenter

1 weekly install

1 GitHub star

Install command

npx skills add https://github.com/claudiodearaujo/izacenter --skill nosql-expert

Database, Performance Optimization, Distributed Systems


NoSQL Expert Patterns (Cassandra & DynamoDB)

Overview

This skill provides professional mental models and design patterns for distributed wide-column and key-value stores (specifically Apache Cassandra and Amazon DynamoDB).

Unlike SQL databases (where you model data entities) or document stores (such as MongoDB), these distributed systems require you to model your queries first.

When to Use

Designing for Scale: Moving beyond simple single-node databases to distributed clusters.
Technology Selection: Evaluating or using Cassandra, ScyllaDB, or DynamoDB.
Performance Tuning: Troubleshooting "hot partitions" or high latency in existing NoSQL systems.
Microservices: Implementing "database-per-service" patterns where highly optimized reads are required.

The Mental Shift: SQL vs. Distributed NoSQL

Feature	SQL (Relational)	Distributed NoSQL (Cassandra/DynamoDB)
Data modeling	Model Entities + Relationships	Model Queries (Access Patterns)
Joins	CPU-intensive, at read time	Pre-computed (Denormalized) at write time
Storage cost	Expensive (minimize duplication)	Cheap (duplicate data for read speed)
Consistency	ACID (Strong)	BASE (Eventual) / Tunable
Scalability	Vertical (Bigger machine)	Horizontal (More nodes/shards)

The Golden Rule: In SQL, you design the data model to answer any query. In NoSQL, you design the data model to answer specific queries efficiently.

Core Design Patterns

1. Query-First Modeling (Access Patterns)

You typically cannot "add a query later" without migration or creating a new table/index.

Process:

List all Entities (User, Order, Product).
List all Access Patterns ("Get User by Email", "Get Orders by User sorted by Date").
Design Table(s) specifically to serve those patterns with a single lookup.
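The process above can be sketched in Python by writing the access patterns down as data and checking them against the table designs. This is a minimal sketch under stated assumptions: `AccessPattern`, `SCHEMA`, `unserved`, and all table names are hypothetical, and a pattern counts as served only when its table is keyed by exactly the attributes the query supplies.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class AccessPattern:
    name: str                       # the question the application asks
    table: str                      # table or index meant to answer it
    partition_key: str              # attribute the query supplies in full
    sort_key: Optional[str] = None  # attribute used for ordering / ranges, if any


# Hypothetical schema: table name -> (partition key, sort key or None).
SCHEMA: Dict[str, Tuple[str, Optional[str]]] = {
    "users_by_id": ("user_id", None),
    "users_by_email": ("email", None),
    "orders_by_user": ("user_id", "order_date"),
}

# Hypothetical access-pattern inventory for the User / Order / Product entities.
PATTERNS: List[AccessPattern] = [
    AccessPattern("Get User by ID", "users_by_id", "user_id"),
    AccessPattern("Get User by Email", "users_by_email", "email"),
    AccessPattern("Get Orders by User sorted by Date", "orders_by_user", "user_id", "order_date"),
    AccessPattern("Get Products by Category", "products_by_category", "category"),
]


def unserved(patterns: List[AccessPattern], schema) -> List[str]:
    """Names of patterns that no single key lookup in `schema` can answer."""
    missing = []
    for p in patterns:
        keys = schema.get(p.table)
        if keys is None:
            missing.append(p.name)  # no table exists for this pattern yet
            continue
        pk, sk = keys
        if p.partition_key != pk or (p.sort_key is not None and p.sort_key != sk):
            missing.append(p.name)  # table exists but is keyed for a different query
    return missing


print(unserved(PATTERNS, SCHEMA))  # ['Get Products by Category']
```

Running the check before any table is created turns "you cannot add a query later" into a concrete to-do list: every name it prints needs a new table or index.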

2. The Partition Key is King

Data is distributed across physical nodes based on the Partition Key (PK).

Goal: Even distribution of data and traffic.
Anti-Pattern: Using a low-cardinality PK (e.g., status="active" or gender="m") creates Hot Partitions, limiting throughput to a single node's capacity.
Best Practice: Use high-cardinality keys (User IDs, Device IDs, Composite Keys).
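A small simulation makes the cardinality argument concrete. It is a sketch, not how either database actually routes data: the hypothetical `node_for` hashes each key with MD5 and takes it modulo a hypothetical six-node cluster, whereas Cassandra assigns token ranges (Murmur3 by default) and DynamoDB uses its own internal hash function.

```python
import hashlib
from collections import Counter
from typing import Iterable, List

NODES = 6  # hypothetical cluster size


def node_for(partition_key: str, nodes: int = NODES) -> int:
    """Pick the node that owns a partition key.

    MD5 + modulo is a stand-in for the real partitioner (assumption for illustration).
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % nodes


def load_per_node(keys: Iterable[str]) -> List[int]:
    """How many requests land on each node."""
    counts = Counter(node_for(k) for k in keys)
    return [counts.get(n, 0) for n in range(NODES)]


# 10,000 requests keyed by a low-cardinality value: every request hits one node.
low = load_per_node(["status#active"] * 10_000)
# The same 10,000 requests keyed by a high-cardinality user ID.
high = load_per_node([f"user#{i}" for i in range(10_000)])

print("low-cardinality :", low)
print("high-cardinality:", high)
```

With the low-cardinality key, five of the six nodes sit idle regardless of how many nodes are added; the user-ID key spreads the same load roughly evenly.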

3. Clustering / Sort Keys

Within a partition, data is sorted on disk by the Clustering Key (Cassandra) or Sort Key (DynamoDB).

This allows for efficient Range Queries (e.g., WHERE user_id=X AND date > Y).
It effectively pre-sorts your data for specific retrieval requirements.
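To illustrate why sorted storage makes range queries cheap, the sketch below models one partition as a Python list kept in clustering-key order and answers `WHERE user_id=X AND date > Y` with a binary search plus one contiguous slice. The row data and the `orders_after` helper are hypothetical; the real stores do this inside their on-disk structures.

```python
import bisect
from datetime import date
from typing import List

# One partition (user_id = X) with rows kept sorted by the clustering / sort key,
# mirroring how the store lays rows out inside a partition.
rows = sorted([
    (date(2024, 1, 5), "ORDER#1"),
    (date(2024, 3, 9), "ORDER#3"),
    (date(2024, 2, 1), "ORDER#2"),
    (date(2024, 4, 20), "ORDER#4"),
])
order_dates = [d for d, _ in rows]  # the clustering-key column, already sorted


def orders_after(after: date) -> List[str]:
    """WHERE user_id = X AND order_date > after: binary-search the start, then
    return one contiguous slice instead of filtering every row."""
    start = bisect.bisect_right(order_dates, after)
    return [order_id for _, order_id in rows[start:]]


print(orders_after(date(2024, 2, 1)))  # ['ORDER#3', 'ORDER#4']
```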

4. Single-Table Design (Adjacency Lists)

Primary use: DynamoDB (but concepts apply elsewhere)

Storing multiple entity types in one table to enable pre-joined reads.

PK (Partition)	SK (Sort)	Data Fields...
`USER#123`	`PROFILE`	`{ name: "Ian", email: "..." }`
`USER#123`	`ORDER#998`	`{ total: 50.00, status: "shipped" }`
`USER#123`	`ORDER#999`	`{ total: 12.00, status: "pending" }`

Query: PK="USER#123"
Result: Fetches User Profile AND all Orders in one network request.
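The following in-memory emulation shows what the query above returns and how an entity-type prefix on the sort key narrows it. It is a sketch, not the DynamoDB API: `query` is a hypothetical helper that mimics a key condition of PK equality plus an optional `begins_with` on SK. With boto3, the equivalent condition would be built as `Key("PK").eq("USER#123") & Key("SK").begins_with("ORDER#")` from `boto3.dynamodb.conditions` and passed as `KeyConditionExpression` to `Table.query`.

```python
from typing import Any, Dict, List, Optional

Item = Dict[str, Any]

# The example items from the table above, one dict per row.
TABLE: List[Item] = [
    {"PK": "USER#123", "SK": "PROFILE", "name": "Ian", "email": "..."},
    {"PK": "USER#123", "SK": "ORDER#998", "total": 50.00, "status": "shipped"},
    {"PK": "USER#123", "SK": "ORDER#999", "total": 12.00, "status": "pending"},
]


def query(table: List[Item], pk: str, sk_prefix: Optional[str] = None) -> List[Item]:
    """Emulate a key-condition query: exact match on PK, optional begins_with on SK,
    results in ascending sort-key order."""
    hits = [
        item for item in table
        if item["PK"] == pk and (sk_prefix is None or item["SK"].startswith(sk_prefix))
    ]
    return sorted(hits, key=lambda item: item["SK"])


everything = query(TABLE, "USER#123")            # profile + all orders in one request
orders_only = query(TABLE, "USER#123", "ORDER#")  # narrowed by the entity-type prefix
print([i["SK"] for i in everything])   # ['ORDER#998', 'ORDER#999', 'PROFILE']
print([i["SK"] for i in orders_only])  # ['ORDER#998', 'ORDER#999']
```

Because `ORDER#` sorts before `PROFILE`, the orders come back first; a design that wants the profile first would choose its sort-key prefixes accordingly.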

5. Denormalization & Duplication

Don't be afraid to store the same data in multiple tables to serve different query patterns.

Table A: users_by_id (PK: uuid)
Table B: users_by_email (PK: email)

Trade-off: You must manage data consistency across tables (often using eventual consistency or batch writes).
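Here is a minimal sketch of the dual-write side of this trade-off, with two Python dicts standing in for the `users_by_id` and `users_by_email` tables (`put_user` is hypothetical). Changing an email moves the row in Table B, because the email is that table's partition key. In a real deployment the two writes can fail independently; one option is to group them in a Cassandra logged `BATCH` or a DynamoDB `TransactWriteItems` call.

```python
import uuid
from typing import Any, Dict

users_by_id: Dict[str, Dict[str, Any]] = {}     # Table A (PK: uuid)
users_by_email: Dict[str, Dict[str, Any]] = {}  # Table B (PK: email)


def put_user(user_id: str, email: str, name: str) -> None:
    """Write one user to both tables so each lookup is a single-key read."""
    previous = users_by_id.get(user_id)
    record = {"user_id": user_id, "email": email, "name": name}
    users_by_id[user_id] = dict(record)  # separate copies: the data is duplicated
    if previous is not None and previous["email"] != email:
        # email is Table B's partition key, so changing it is a delete + insert,
        # not an in-place update.
        users_by_email.pop(previous["email"], None)
    users_by_email[email] = dict(record)


uid = str(uuid.uuid4())
put_user(uid, "ian@example.com", "Ian")
put_user(uid, "ian@new.example.com", "Ian")
print(sorted(users_by_email))  # ['ian@new.example.com']
```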

Specific Guidance

Apache Cassandra / ScyllaDB

Primary Key Structure: ((Partition Key), Clustering Columns)
No Joins, No Aggregates: Do not try to JOIN or GROUP BY. Pre-calculate aggregates in a separate counter table.
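Avoid ALLOW FILTERING: If you see this in production, your data model is wrong. It tells Cassandra to read rows and discard the ones that don't match, which without a partition key restriction means scanning every partition in the cluster.
Writes are Cheap: Inserts and Updates are just appends to the LSM tree. Don't worry about write volume as much as read efficiency.
Tombstones: A delete writes a tombstone marker rather than removing data, and reads must skip over tombstones until compaction purges them (after gc_grace_seconds). Avoid high-velocity delete patterns (like queues) in standard tables.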
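To make the `((Partition Key), Clustering Columns)` structure and the counter-table advice concrete, the hypothetical helper below renders CQL `CREATE TABLE` statements as strings. It is only an illustrative sketch (it does not talk to a cluster and is not part of any driver); the table and column names are assumptions.

```python
from typing import Dict, Sequence, Tuple


def create_table_cql(table: str,
                     columns: Dict[str, str],
                     partition_keys: Sequence[str],
                     clustering: Sequence[Tuple[str, str]] = ()) -> str:
    """Render CREATE TABLE with PRIMARY KEY ((partition keys), clustering columns).

    `clustering` lists (column, "ASC" or "DESC") pairs in clustering order.
    """
    cols = ", ".join(f"{name} {cql_type}" for name, cql_type in columns.items())
    key = ", ".join(["(" + ", ".join(partition_keys) + ")"] + [c for c, _ in clustering])
    cql = f"CREATE TABLE {table} ({cols}, PRIMARY KEY ({key}))"
    if clustering:
        order = ", ".join(f"{c} {direction}" for c, direction in clustering)
        cql += f" WITH CLUSTERING ORDER BY ({order})"
    return cql + ";"


# Query table: one partition per user, newest orders first.
orders_by_user = create_table_cql(
    "orders_by_user",
    {"user_id": "uuid", "order_date": "timestamp", "order_id": "timeuuid", "total": "decimal"},
    partition_keys=["user_id"],
    clustering=[("order_date", "DESC"), ("order_id", "ASC")],
)

# Aggregate kept in a separate counter table instead of GROUP BY at read time;
# each new order would also run:
#   UPDATE order_counts_by_user SET order_count = order_count + 1 WHERE user_id = ?;
order_counts = create_table_cql(
    "order_counts_by_user",
    {"user_id": "uuid", "order_count": "counter"},
    partition_keys=["user_id"],
)

print(orders_by_user)
print(order_counts)
```

A Cassandra counter table may hold only counter columns outside its primary key, which is one reason the aggregate lives in its own table rather than alongside the order data.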

AWS DynamoDB

GSI (Global Secondary Index): Use GSIs to create alternative views of your data (e.g., "Search Orders by Date" instead of by User).
- Note: GSIs are eventually consistent.
LSI (Local Secondary Index): Sorts data differently within the same partition. Must be created at table creation time.
WCU / RCU: Understand the two capacity modes (provisioned and on-demand). Single-table design helps optimize consumed capacity units.
TTL: Use Time-To-Live attributes to expire old data automatically; deletes performed by TTL do not consume write capacity (a free delete).
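A rough Python model of the GSI and TTL points above, assuming hypothetical `GSI1PK`/`GSI1SK` attributes and an `expires_at` TTL attribute: `build_gsi` re-keys base-table items the way a GSI presents them (items lacking the index keys are left out, which is how sparse indexes work), and `with_ttl` stamps an expiry as a number of Unix epoch seconds, the format DynamoDB TTL expects. The helpers are illustrative, not an AWS API.

```python
import time
from collections import defaultdict
from typing import Any, Dict, List, Optional

Item = Dict[str, Any]

# Base-table items keyed by user (PK) and order (SK). GSI1PK / GSI1SK are
# hypothetical attribute names that carry the "orders by date" view.
orders: List[Item] = [
    {"PK": "USER#123", "SK": "ORDER#998", "GSI1PK": "DATE#2024-01-15", "GSI1SK": "ORDER#998"},
    {"PK": "USER#456", "SK": "ORDER#999", "GSI1PK": "DATE#2024-01-15", "GSI1SK": "ORDER#999"},
    {"PK": "USER#123", "SK": "ORDER#1000", "GSI1PK": "DATE#2024-01-16", "GSI1SK": "ORDER#1000"},
    {"PK": "USER#123", "SK": "PROFILE"},  # no GSI keys, so not projected (sparse index)
]


def build_gsi(items: List[Item], pk_attr: str, sk_attr: str) -> Dict[str, List[Item]]:
    """Re-key items under alternate attributes, as a GSI presents them.
    (DynamoDB updates the real index asynchronously, hence eventually consistent reads.)"""
    index: Dict[str, List[Item]] = defaultdict(list)
    for item in items:
        if pk_attr in item and sk_attr in item:
            index[item[pk_attr]].append(item)
    for partition in index.values():
        partition.sort(key=lambda item: item[sk_attr])
    return dict(index)


def with_ttl(item: Item, ttl_seconds: int, now: Optional[float] = None) -> Item:
    """Copy of `item` with an expiry time in Unix epoch seconds. The name
    `expires_at` is our choice; it must match the table's configured TTL attribute."""
    now = time.time() if now is None else now
    return {**item, "expires_at": int(now + ttl_seconds)}


gsi = build_gsi(orders, "GSI1PK", "GSI1SK")
print([o["SK"] for o in gsi["DATE#2024-01-15"]])  # ['ORDER#998', 'ORDER#999']
print(with_ttl(orders[0], 86_400, now=1_700_000_000)["expires_at"])  # 1700086400
```

DynamoDB deletes expired items in the background rather than at the exact expiry time, so reads that must never see expired data should also filter on the TTL attribute.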

Expert Checklist

Before finalizing your NoSQL schema:

Access Pattern Coverage: Does every query pattern map to a specific table or index?
Cardinality Check: Does the Partition Key have enough unique values to spread traffic evenly?
Split Partition Risk: For any single partition (e.g., a single user's orders), will it grow indefinitely? (If > 10GB, you need to "shard" the partition, e.g., USER#123#2024-01).
Consistency Requirement: Can the application tolerate eventual consistency for this read pattern?
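The sharding suggestion in the checklist can be sketched as a time-bucketed partition key. The monthly bucket and the helper names are assumptions; pick a bucket size that keeps each partition well below your limit. The cost is that a date-range read has to query one partition per bucket it spans, which `buckets_between` enumerates.

```python
from datetime import date
from typing import List


def bucketed_pk(user_id: str, day: date) -> str:
    """Partition key with a month bucket appended, e.g. USER#123#2024-01."""
    return f"USER#{user_id}#{day:%Y-%m}"


def buckets_between(user_id: str, start: date, end: date) -> List[str]:
    """Every month-bucket key a query from `start` to `end` (inclusive) must read."""
    keys = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        keys.append(f"USER#{user_id}#{year:04d}-{month:02d}")
        month += 1
        if month == 13:
            year, month = year + 1, 1
    return keys


print(bucketed_pk("123", date(2024, 1, 20)))  # USER#123#2024-01
print(buckets_between("123", date(2023, 11, 5), date(2024, 2, 1)))
# ['USER#123#2023-11', 'USER#123#2023-12', 'USER#123#2024-01', 'USER#123#2024-02']
```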

Common Anti-Patterns

❌ Scatter-Gather: Querying all partitions to find one item (Scan).
❌ Hot Keys: Putting all "Monday" data into one partition.
❌ Relational Modeling: Creating Author and Book tables and trying to join them in code. (Instead, embed Book summaries in Author, or duplicate Author info in Books.)
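One common remedy for a hot key such as "Monday" is write sharding: append a shard suffix to the partition key so writes spread across several partitions. In this sketch, `SHARDS`, `sharded_pk`, and `all_shard_keys` are hypothetical; the suffix is derived from the item ID with CRC32, so a single known item is still one lookup, while reading the whole logical key means querying each of the `SHARDS` partitions and merging. That is a bounded fan-out over known keys, not a Scan.

```python
import zlib
from collections import Counter
from typing import List

SHARDS = 8  # hypothetical shard count; size it to the key's peak write rate


def sharded_pk(base_key: str, item_id: str) -> str:
    """Physical partition key for one item of a hot logical key.
    CRC32 of the item ID picks the shard, so the same item always maps to the same shard."""
    shard = zlib.crc32(item_id.encode("utf-8")) % SHARDS
    return f"{base_key}#{shard}"


def all_shard_keys(base_key: str) -> List[str]:
    """Keys to query (and merge) when reading the whole logical key back."""
    return [f"{base_key}#{i}" for i in range(SHARDS)]


# 10,000 events that would otherwise all land on a single "DAY#Monday" partition.
writes = Counter(sharded_pk("DAY#Monday", f"event-{n}") for n in range(10_000))
for key in all_shard_keys("DAY#Monday"):
    print(key, writes[key])
```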


Security Audits: Gen Agent Trust Hub: Pass, Socket: Pass, Snyk: Pass
