What is Cassandra’s data model called?
A partitioned row store: data is stored in sparse, multidimensional hash tables. Each row belongs to a partition identified by its partition key, which is used to distribute partitions across nodes.
What are the basic Cassandra data structures in order from smallest to largest?
Column (name/value pair) → Row (container for columns) → Partition (group of related rows) → Table (container for partitions and their rows) → Keyspace (container for tables) → Cluster (container for keyspaces)
What is a keyspace in Cassandra?
The outermost container for data, corresponding to a database in relational terms. Has attributes defining keyspace-wide behavior like replication.
What is a partition in Cassandra?
A group of related rows stored together on the same nodes. Each partition is uniquely identified by a partition key.
What is a partition key?
The column(s) that determine which nodes store a partition. The partitioner hashes the partition key into a token, which places the partition on the ring.
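A toy model of how hashing a partition key picks a node may make this concrete. It is a minimal sketch under loud assumptions: the three node names and their tokens are hypothetical, MD5 stands in for the partitioner purely for illustration (Cassandra's default Murmur3Partitioner produces different tokens), and replication is ignored, so only the primary owner is found.

```python
import bisect
import hashlib

# Hypothetical 3-node ring, sorted by token. Each node owns the range that
# ends at its token; tokens past the last node wrap around to the first.
RING = [(-(2**62), "node-a"), (0, "node-b"), (2**62, "node-c")]
TOKENS = [t for t, _ in RING]

def token(partition_key: str) -> int:
    """Hash a partition key into a signed 64-bit toy token (MD5, illustration only)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

def owner(partition_key: str) -> str:
    """Return the first node whose token is >= the key's token, wrapping around."""
    i = bisect.bisect_left(TOKENS, token(partition_key))
    return RING[i % len(RING)][1]

node = owner("user:42")  # same key always maps to the same node
```

Because the token depends only on the partition key, every row that shares a partition key lands on the same node.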
What is a clustering column?
Columns that control how rows are sorted within a partition. Combined with the partition key, they form the primary key and uniquely identify each row.
What is a primary key in Cassandra?
The combination of the partition key and optional clustering columns. The partition key determines data distribution; clustering columns determine sort order within each partition.
How do you define a composite partition key?
Wrap the partition key columns in an extra set of parentheses: PRIMARY KEY ((col1, col2), col3). The columns in the inner parentheses (col1 and col2) together form the partition key; col3 is a clustering column.
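The following toy model shows how a primary key groups and orders rows, using a hypothetical table with PRIMARY KEY ((sensor_id, day), reading_time). It is a sketch, not Cassandra's storage engine: the dict-of-dicts layout and the function names are illustrative assumptions. Writing a row whose full primary key already exists overwrites it, mirroring the fact that the primary key uniquely identifies a row.

```python
# Toy storage layout: {(sensor_id, day): {reading_time: row}}

def upsert(partitions: dict, row: dict) -> None:
    """Insert or overwrite a row, keyed by its full primary key."""
    partition_key = (row["sensor_id"], row["day"])  # composite partition key
    clustering_key = row["reading_time"]            # clustering column
    partitions.setdefault(partition_key, {})[clustering_key] = row

def read_partition(partitions: dict, sensor_id: str, day: str) -> list:
    """Return one partition's rows sorted ascending by the clustering column.

    Both partition key columns are required to locate the partition."""
    partition = partitions.get((sensor_id, day), {})
    return [partition[ck] for ck in sorted(partition)]

partitions: dict = {}
upsert(partitions, {"sensor_id": "s1", "day": "2024-01-01", "reading_time": 30, "value": 1.5})
upsert(partitions, {"sensor_id": "s1", "day": "2024-01-01", "reading_time": 10, "value": 0.9})
upsert(partitions, {"sensor_id": "s1", "day": "2024-01-02", "reading_time": 5, "value": 2.0})
# Same full primary key as the first row, so it replaces that row.
upsert(partitions, {"sensor_id": "s1", "day": "2024-01-01", "reading_time": 30, "value": 1.7})

values = [r["value"] for r in read_partition(partitions, "s1", "2024-01-01")]  # [0.9, 1.7]
```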
What is a static column?
A column whose value is shared by every row in a partition, declared with the STATIC keyword. It cannot be part of the primary key and is only allowed in tables that have at least one clustering column.
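A small sketch of the idea, assuming a hypothetical orders table with PRIMARY KEY (user_id, order_id) and a static email column. The Partition class is an illustrative model, not how Cassandra stores data: it keeps one copy of each static value per partition and attaches it to every row that is read.

```python
class Partition:
    """Toy partition for PRIMARY KEY (user_id, order_id) with a static email column."""

    def __init__(self) -> None:
        self.static: dict = {}  # stored once per partition
        self.rows: dict = {}    # order_id (clustering key) -> regular columns

    def set_static(self, **cols) -> None:
        self.static.update(cols)

    def upsert_row(self, order_id: int, **cols) -> None:
        self.rows.setdefault(order_id, {}).update(cols)

    def select_all(self) -> list:
        """Every returned row carries the partition's current static values."""
        return [
            {"order_id": ck, **self.rows[ck], **self.static}
            for ck in sorted(self.rows)
        ]

alice = Partition()
alice.set_static(email="alice@example.com")
alice.upsert_row(1, total=40)
alice.upsert_row(2, total=15)
alice.set_static(email="alice@example.org")  # one write changes what every row shows
```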
What is the text data type?
A UTF-8 character string. Synonym for varchar. Recommended over ascii for internationalization support.
What is the uuid data type?
A Type 4 UUID (128-bit) based entirely on random numbers. Often used as a surrogate key. Generate one with the uuid() function.
What is the timeuuid data type?
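A Type 1 UUID based on the MAC address, system time, and a sequence number. Useful as a unique, time-ordered identifier that avoids collisions between writes made at the same instant. Functions: now(), dateOf(), unixTimestampOf() (the last two are deprecated in newer releases in favor of toTimestamp() and toUnixTimestamp()).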
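Python's standard uuid module can generate both kinds locally, which helps show the difference: uuid4() produces a random Type 4 UUID (the kind CQL's uuid() returns) and uuid1() a time-based Type 1 UUID (the layout timeuuid uses). The helper uuid1_to_unix_seconds is a hypothetical name; it recovers the embedded timestamp, roughly the information toTimestamp() extracts on the server side.

```python
import time
import uuid

# Number of 100 ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
UUID_EPOCH_OFFSET = 0x01B21DD213814000

def uuid1_to_unix_seconds(u: uuid.UUID) -> float:
    """Recover the Unix time (in seconds) embedded in a Type 1 UUID."""
    if u.version != 1:
        raise ValueError("only Type 1 (time-based) UUIDs embed a timestamp")
    return (u.time - UUID_EPOCH_OFFSET) / 1e7

random_id = uuid.uuid4()  # Type 4: random, like CQL uuid()
time_id = uuid.uuid1()    # Type 1: time + node + clock sequence, like timeuuid

print(random_id.version, time_id.version)  # 4 1
age_seconds = time.time() - uuid1_to_unix_seconds(time_id)  # close to 0
```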
What is the counter data type?
A 64-bit signed integer that can only be incremented or decremented, not set directly. Used for tracking statistics like page views.
What are the restrictions on counter columns?
Counter columns cannot be part of the primary key. If a table has a counter, all non-primary-key columns must be counters. Counter updates are not idempotent, so retrying a failed increment can count it twice.
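Why non-idempotence matters can be shown with a toy simulation. It assumes a hypothetical client that retries whenever an acknowledgement is lost, even though the write was already applied; ToyCounter and increment_with_retry are illustrative names, not Cassandra driver API.

```python
class ToyCounter:
    """Stand-in for a counter column: it supports only increments."""

    def __init__(self) -> None:
        self.value = 0

    def increment(self, delta: int = 1) -> None:
        self.value += delta

def increment_with_retry(counter: ToyCounter, lost_acks: int) -> int:
    """Apply +1, retrying after each lost acknowledgement.

    The first `lost_acks` attempts are applied but look like failures to the client."""
    attempts = 0
    while True:
        attempts += 1
        counter.increment()       # the write lands on every attempt
        if attempts > lost_acks:  # this attempt's ack reaches the client
            return attempts

views = ToyCounter()
increment_with_retry(views, lost_acks=1)
print(views.value)  # 2, although the client intended a single +1
```

By contrast, retrying a regular write such as SET col = 5 leaves the same final value, which is why ordinary writes can be retried safely.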
What is a set collection in Cassandra?
Stores an unordered collection of unique elements, which are returned in sorted order. Items can be added or removed without reading the set first. Syntax: set<text>
What is a list collection in Cassandra?
Stores an ordered collection of elements (duplicates allowed), accessed by position (index). Items can be prepended or appended. Setting or removing an element by index requires a read-before-write, so it is more expensive. Syntax: list<text>
What is a map collection in Cassandra?
Stores a collection of key-value pairs. Keys and values can be any type except counter. Individual items are accessed by key. Syntax: map<text, frozen<address>>
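The update semantics of the three collection types can be sketched with plain Python containers. The class and method names are hypothetical; each comment shows the CQL update form the method is meant to mirror, and the model ignores concurrency, tombstones, and storage details.

```python
class ToyRow:
    """Toy row with one column of each collection type."""

    def __init__(self) -> None:
        self.tags: set = set()   # set<text>
        self.events: list = []   # list<text>
        self.phones: dict = {}   # map<text, text>

    def add_tags(self, *items: str) -> None:
        # SET tags = tags + {'a', 'b'}: no read needed, duplicates ignored
        self.tags.update(items)

    def remove_tags(self, *items: str) -> None:
        # SET tags = tags - {'a'}
        self.tags.difference_update(items)

    def read_tags(self) -> list:
        # sets are returned in sorted order
        return sorted(self.tags)

    def append_event(self, item: str) -> None:
        # SET events = events + ['x']
        self.events.append(item)

    def prepend_event(self, item: str) -> None:
        # SET events = ['x'] + events
        self.events.insert(0, item)

    def put_phone(self, key: str, value: str) -> None:
        # SET phones['home'] = '555-0100': touches only this key
        self.phones[key] = value

row = ToyRow()
row.add_tags("nosql", "cassandra", "nosql")
row.remove_tags("nosql")
row.add_tags("db")
row.append_event("login")
row.prepend_event("signup")
row.put_phone("home", "555-0100")
row.put_phone("home", "555-0199")
```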
What is a tuple in Cassandra?
A fixed-length set of values of various types. Must update entire tuple at once (no individual field updates). Less commonly used than UDTs. Syntax: tuple<text, text, int>
What is a User-Defined Type (UDT)?
Custom types to extend the data model. Easier to use than tuples since values are named. Scoped by keyspace. Created with CREATE TYPE command.
What does FROZEN mean for collections?
Freezing serializes a collection as a single binary blob. It is required for nesting collections, and frozen collections must be read and written in their entirety.
What is the TTL (Time to Live)?
Column-level expiration. After the TTL (in seconds) elapses, Cassandra marks the data for deletion. Set it with the USING TTL clause and check the remaining time with the TTL() function.
How do you set TTL on data?
INSERT INTO table (cols) VALUES (vals) USING TTL 86400; or UPDATE table USING TTL 3600 SET col=val WHERE… (86400 seconds = 24 hours; 3600 seconds = 1 hour)
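The expiration behavior can be modeled with a toy store that records an expiry time per column. This is an illustrative sketch with hypothetical names: it injects a clock so the example is deterministic, returns None for expired or absent values, and mimics TTL() by reporting the remaining whole seconds (None when no TTL was set). Real Cassandra also writes tombstones and purges expired data during compaction, which this model skips.

```python
import time
from typing import Callable, Optional

class ToyTTLStore:
    """Toy column store with per-column expiry, in seconds like USING TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._cells: dict = {}  # column -> (value, expires_at or None)

    def write(self, column: str, value, ttl: Optional[int] = None) -> None:
        expires_at = self._clock() + ttl if ttl is not None else None
        self._cells[column] = (value, expires_at)

    def _live_cell(self, column: str):
        cell = self._cells.get(column)
        if cell is None:
            return None
        _, expires_at = cell
        if expires_at is not None and self._clock() >= expires_at:
            return None  # expired: behaves as if deleted
        return cell

    def read(self, column: str):
        cell = self._live_cell(column)
        return None if cell is None else cell[0]

    def ttl(self, column: str) -> Optional[int]:
        """Remaining whole seconds, like TTL(col); None if absent or no TTL."""
        cell = self._live_cell(column)
        if cell is None or cell[1] is None:
            return None
        return int(cell[1] - self._clock())

now = [0.0]
store = ToyTTLStore(clock=lambda: now[0])
store.write("session", "abc123", ttl=3600)  # like USING TTL 3600
now[0] = 1800.0
print(store.read("session"), store.ttl("session"))  # abc123 1800
now[0] = 3600.0
print(store.read("session"))  # None: the column has expired
```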
What is the timestamp of a column?
Every column value stores a timestamp of when it was last written. Timestamps are used for conflict resolution (last write wins). View it with the writetime() function.
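Last-write-wins reconciliation reduces to comparing timestamps. A minimal sketch with a hypothetical Cell type and reconcile function: timestamps are integers in microseconds, matching what writetime() reports. Equal timestamps are resolved here by keeping the first argument, which is a simplification; Cassandra applies its own deterministic tie-break.

```python
from typing import NamedTuple

class Cell(NamedTuple):
    value: str
    timestamp: int  # microseconds since the Unix epoch

def reconcile(a: Cell, b: Cell) -> Cell:
    """Return the cell with the newer write timestamp (last write wins)."""
    return b if b.timestamp > a.timestamp else a

replica_1 = Cell("blue", 1_700_000_000_000_000)
replica_2 = Cell("green", 1_700_000_000_500_000)  # written 0.5 s later
winner = reconcile(replica_1, replica_2)  # Cell(value='green', ...)
```

The result does not depend on which replica's copy arrives first, which is what lets replicas converge.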
What is the blob data type?
Binary large object: an arbitrary array of bytes, useful for media or other binary files. Cassandra doesn't validate blob contents. Blob literals are written as hexadecimal digits prefixed with 0x.
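Converting bytes to and from the hexadecimal literal form is straightforward in Python. The helper names below are hypothetical; the 0x-prefixed hex string is the form a CQL blob literal takes (for example, the value 0x4869 in an INSERT statement).

```python
def to_blob_literal(data: bytes) -> str:
    """Format bytes as a CQL-style blob literal: 0x followed by hex digits."""
    return "0x" + data.hex()

def from_blob_literal(literal: str) -> bytes:
    """Parse a 0x-prefixed hex literal back into bytes."""
    if not literal.lower().startswith("0x"):
        raise ValueError("blob literals start with 0x")
    return bytes.fromhex(literal[2:])

literal = to_blob_literal(b"Hi")  # "0x4869"
```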