Apache Cassandra, 3rd Edition - Part 2 Flashcards

(51 cards)

1
Q

What is Cassandra’s data model called?

A

A partitioned row store: data is stored in sparse multidimensional hash tables, and each row has a partition key that determines how rows are distributed across nodes.

2
Q

What are the basic Cassandra data structures in order from smallest to largest?

A

Column (name/value pair) → Row (container for columns) → Partition (group of related rows) → Table (container for rows) → Keyspace (container for tables) → Cluster (container for keyspaces)

3
Q

What is a keyspace in Cassandra?

A

The outermost container for data, corresponding to a database in relational terms. Has attributes defining keyspace-wide behavior like replication.

4
Q

What is a partition in Cassandra?

A

A group of related rows stored together on the same nodes. Each partition is uniquely identified by a partition key.

5
Q

What is a partition key?

A

The column(s) used to determine which node stores the data. It’s hashed by the partitioner to assign data to nodes in the ring.

6
Q

What is a clustering column?

A

One or more columns that control how rows are sorted within a partition. Combined with the partition key, they form the primary key and uniquely identify a row.

7
Q

What is a primary key in Cassandra?

A

Combination of partition key and optional clustering columns. The partition key determines data distribution; clustering columns determine sort order within partitions.

8
Q

How do you define a composite partition key?

A

Surround multiple columns with parentheses: PRIMARY KEY ((col1, col2), col3). The columns in parentheses form the partition key.
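
A minimal sketch of how the partition key, clustering column, and primary key cards fit together. The table and column names are hypothetical (not from the source), and a keyspace is assumed to be selected already with USE.

```cql
-- Hypothetical table: one partition per (hotel_id, month),
-- with rows inside each partition sorted by day, then room_number.
CREATE TABLE available_rooms (
    hotel_id text,
    month int,
    day int,
    room_number smallint,
    is_available boolean,
    -- (hotel_id, month) is the composite partition key;
    -- day and room_number are clustering columns.
    PRIMARY KEY ((hotel_id, month), day, room_number)
) WITH CLUSTERING ORDER BY (day ASC, room_number ASC);

-- Every partition key column must be restricted to read a single partition.
SELECT day, room_number, is_available
FROM available_rooms
WHERE hotel_id = 'AZ123' AND month = 6;
```

Without the inner parentheses, PRIMARY KEY (hotel_id, month, day, room_number) would make hotel_id alone the partition key and treat the rest as clustering columns.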

9
Q

What is a static column?

A

A column, declared with the STATIC keyword, whose value is shared by every row in a partition. It is not part of the primary key.
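
As a rough illustration, a static column might be declared and written like this. The table and column names are hypothetical, and a keyspace is assumed to be in use.

```cql
-- Hypothetical table: team_motto is stored once per team_name partition.
CREATE TABLE team_members (
    team_name text,
    member_id uuid,
    member_name text,
    team_motto text STATIC,
    PRIMARY KEY (team_name, member_id)
);

-- Writing a static column needs only the partition key,
-- because the value belongs to the partition rather than to one row.
UPDATE team_members SET team_motto = 'Ship it' WHERE team_name = 'core';
```

A table needs at least one clustering column for a static column to be meaningful, since otherwise each partition holds a single row.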

10
Q

What is the text data type?

A

A UTF-8 character string. Synonym for varchar. Recommended over ascii for internationalization support.

11
Q

What is the uuid data type?

A

A 128-bit universally unique identifier, typically a Type 4 UUID based entirely on random numbers. Often used as a surrogate key. Generate one with the uuid() function.

12
Q

What is the timeuuid data type?

A

A Type 1 UUID based on MAC address, system time, and a sequence number. Useful for conflict-free timestamps. Functions: now(), toTimestamp(), toUnixTimestamp() (the older dateOf() and unixTimestampOf() are deprecated).
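
A short sketch of the uuid and timeuuid cards in use. The events table and its columns are hypothetical, and a keyspace is assumed to be selected.

```cql
-- Hypothetical table: a random uuid identifies the sensor,
-- and a timeuuid clustering column orders readings by time.
CREATE TABLE events (
    sensor_id uuid,
    event_time timeuuid,
    reading double,
    PRIMARY KEY (sensor_id, event_time)
);

-- uuid() generates a random (Type 4) value; now() generates a timeuuid.
INSERT INTO events (sensor_id, event_time, reading)
VALUES (uuid(), now(), 21.5);

-- toTimestamp() extracts the time embedded in the timeuuid.
SELECT sensor_id, toTimestamp(event_time), reading FROM events;
```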

13
Q

What is the counter data type?

A

A 64-bit signed integer that can only be incremented or decremented, not set directly. Used for tracking statistics like page views.

14
Q

What are the restrictions on counter columns?

A

Cannot be part of primary key. If a table has a counter, all non-primary key columns must be counters. Counter operations are not idempotent.
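
A minimal counter sketch, assuming a hypothetical page_views table in an already selected keyspace:

```cql
-- Hypothetical table: every non-primary-key column is a counter.
CREATE TABLE page_views (
    page_url text PRIMARY KEY,
    views counter
);

-- Counters are changed relative to their current value with UPDATE;
-- INSERT is not allowed on a counter table.
UPDATE page_views SET views = views + 1 WHERE page_url = '/home';
UPDATE page_views SET views = views - 1 WHERE page_url = '/home';
```

Because an increment is not idempotent, a client that retries after a timeout may count the same event twice.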

15
Q

What is a set collection in Cassandra?

A

Stores an unordered collection of unique elements, returned in sorted order. Items can be added or removed without reading the set first. Syntax: set<text>

16
Q

What is a list collection in Cassandra?

A

Stores an ordered collection of elements, accessed by position (index). Items can be prepended or appended. Updating by index can be expensive. Syntax: list<text>

17
Q

What is a map collection in Cassandra?

A

Stores a collection of key-value pairs. Keys and values can be any type except counter. Individual items are accessed by key. Syntax: map<text, frozen<address>>
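
The set, list, and map cards can be sketched together as follows. The user_profiles table, its columns, and the literal values are hypothetical, and a keyspace is assumed to be in use.

```cql
-- Hypothetical table with one column of each collection type.
CREATE TABLE user_profiles (
    user_id uuid PRIMARY KEY,
    emails set<text>,
    phone_numbers list<text>,
    preferences map<text, text>
);

-- Add to a set, append to a list, and set one map entry,
-- all without reading the existing values first.
UPDATE user_profiles
SET emails = emails + {'a@example.com'},
    phone_numbers = phone_numbers + ['555-0100'],
    preferences['theme'] = 'dark'
WHERE user_id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;

-- Remove a set element and overwrite a list element by index.
-- The index update requires that position 0 already exists.
UPDATE user_profiles
SET emails = emails - {'a@example.com'},
    phone_numbers[0] = '555-0101'
WHERE user_id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;
```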

18
Q

What is a tuple in Cassandra?

A

A fixed-length set of values of various types. Must update entire tuple at once (no individual field updates). Less commonly used than UDTs. Syntax: tuple<text, text, int>

19
Q

What is a User-Defined Type (UDT)?

A

Custom types to extend the data model. Easier to use than tuples since values are named. Scoped by keyspace. Created with CREATE TYPE command.

20
Q

What does FROZEN mean for collections?

A

Freezing serializes a collection as a single binary blob. Required for nesting collections. Frozen collections must be read/written in entirety.
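
A sketch of a user-defined type nested inside a collection, which is where FROZEN is needed. The address type, the customers table, and the values are hypothetical, and a keyspace is assumed to be selected.

```cql
-- Hypothetical UDT, scoped to the current keyspace.
CREATE TYPE address (
    street text,
    city text,
    zip_code text
);

-- The UDT is nested inside a map, so it must be frozen.
CREATE TABLE customers (
    customer_id uuid PRIMARY KEY,
    addresses map<text, frozen<address>>
);

-- A frozen value is written as a whole; single fields cannot be updated.
UPDATE customers
SET addresses['home'] = {street: '1 Main St', city: 'Tucson', zip_code: '85701'}
WHERE customer_id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;
```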

21
Q

What is the TTL (Time to Live)?

A

Column-level expiration. After TTL seconds, Cassandra marks data for deletion. Set with USING TTL clause. Check with TTL() function.

22
Q

How do you set TTL on data?

A

INSERT INTO table (cols) VALUES (vals) USING TTL 86400; or UPDATE table USING TTL 3600 SET col=val WHERE… (86400 = 24 hours)
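
A concrete version of the statements above, using a hypothetical sessions table in an already selected keyspace:

```cql
-- Hypothetical table for short-lived session data.
CREATE TABLE sessions (
    session_id uuid PRIMARY KEY,
    user_name text
);

-- The inserted columns expire 86400 seconds (24 hours) after the write.
INSERT INTO sessions (session_id, user_name)
VALUES (uuid(), 'alice') USING TTL 86400;

-- TTL() reports the seconds remaining, or null if no TTL was set.
SELECT user_name, TTL(user_name) FROM sessions;
```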

23
Q

What is the timestamp of a column?

A

Cassandra stores a write timestamp (in microseconds) for each non-primary-key column value, recording when it was last modified. Used for conflict resolution (last write wins). View with the writetime() function.
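
Last-write-wins can be sketched by supplying timestamps explicitly. The hotels table and the timestamp values are hypothetical, chosen only to make the ordering obvious; a keyspace is assumed to be in use.

```cql
-- Hypothetical table used to show timestamp-based conflict resolution.
CREATE TABLE hotels (
    id text PRIMARY KEY,
    name text
);

INSERT INTO hotels (id, name) VALUES ('AZ123', 'Old Name') USING TIMESTAMP 1000;
INSERT INTO hotels (id, name) VALUES ('AZ123', 'New Name') USING TIMESTAMP 2000;

-- This write arrives last but carries an older timestamp, so it loses.
UPDATE hotels USING TIMESTAMP 1500 SET name = 'Stale Name' WHERE id = 'AZ123';

-- The value with the highest timestamp should be returned: 'New Name', 2000.
SELECT name, writetime(name) FROM hotels WHERE id = 'AZ123';
```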

24
Q

What is the blob data type?

A

Binary large object - arbitrary array of bytes. Useful for media or binary files. Cassandra doesn’t validate blob contents. Represented as hex digits.

25
What is the inet data type?
Represents IPv4 or IPv6 internet addresses. IPv4 displayed in dotted decimal format. IPv6 in eight groups of four hex digits.
26
What is the boolean data type?
Simple true/false value. cqlsh is case insensitive for input but outputs True or False with capital letters.
27
What is the difference between int and bigint?
int is a 32-bit signed integer. bigint is a 64-bit signed long integer (equivalent to Java long).
28
What are smallint and tinyint?
smallint is a 16-bit signed integer. tinyint is an 8-bit signed integer. Both added in Cassandra 2.2.
29
What is the varint data type?
Variable precision signed integer equivalent to java.math.BigInteger. Useful when you need arbitrary precision.
30
What is the decimal data type?
Variable precision decimal equivalent to java.math.BigDecimal. Useful for financial calculations requiring exact precision.
31
What is the date data type?
Represents a date without time of day. Added in Cassandra 2.2. Supports ISO 8601 formats. Stored as days since epoch.
32
What is the time data type?
Represents time of day without a date. Added in Cassandra 2.2. Stored as nanoseconds since midnight.
33
What is the timestamp data type?
A date and time stored as a 64-bit signed integer counting milliseconds since the Unix epoch. Supports ISO 8601 formats. Best practice: always include the time zone.
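The date, time, and timestamp cards can be illustrated with string literals. The appointments table and the values are hypothetical, and a keyspace is assumed to be selected.

```cql
-- Hypothetical table with one column of each temporal type.
CREATE TABLE appointments (
    id uuid PRIMARY KEY,
    day date,
    start_time time,
    created_at timestamp
);

-- The timestamp literal carries an explicit UTC offset (+0000),
-- so it does not depend on the time zone of the client or server.
INSERT INTO appointments (id, day, start_time, created_at)
VALUES (uuid(), '2024-03-15', '09:30:00', '2024-03-01T12:00:00+0000');
```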
34
Why can't you modify a primary key after table creation?
The primary key controls how data is distributed in the cluster and stored on disk. Changing it would require reorganizing all data.
35
What is the CREATE KEYSPACE syntax?
CREATE KEYSPACE name WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 2};
36
What is the CREATE TABLE syntax?
CREATE TABLE keyspace.table (col1 type, col2 type, PRIMARY KEY ((partition_cols), clustering_cols)) WITH options;
37
How do you add a column to an existing table?
ALTER TABLE table_name ADD column_name type;
38
How do you drop a column from a table?
ALTER TABLE table_name DROP column_name;
39
What is the INSERT syntax in CQL?
INSERT INTO table (col1, col2) VALUES (val1, val2) [USING TTL seconds] [AND TIMESTAMP microseconds];
40
What is the UPDATE syntax in CQL?
UPDATE table [USING TTL seconds] SET col1 = val1 WHERE partition_key = value AND clustering_col = value;
41
What is an UPSERT in Cassandra?
INSERT and UPDATE behave almost identically: both create a row if it doesn't exist or overwrite it if it does. A plain INSERT is not insert-only; for that, use a lightweight transaction (IF NOT EXISTS).
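A small sketch of upsert behavior, with a hypothetical users table in an already selected keyspace:

```cql
-- Hypothetical table used to show that INSERT overwrites existing rows.
CREATE TABLE users (
    user_id text PRIMARY KEY,
    email text
);

INSERT INTO users (user_id, email) VALUES ('u1', 'old@example.com');

-- Same primary key: this silently overwrites the email, with no error.
INSERT INTO users (user_id, email) VALUES ('u1', 'new@example.com');

-- A lightweight transaction is applied only if the row is absent.
-- Here the row exists, so the write is not applied.
INSERT INTO users (user_id, email) VALUES ('u1', 'other@example.com') IF NOT EXISTS;
```

Lightweight transactions cost extra round trips between replicas, so they are usually reserved for cases where insert-only behavior really matters.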
42
How do you delete data in CQL?
DELETE FROM table WHERE partition_key = value; or DELETE column FROM table WHERE... to delete specific columns.
43
What is TRUNCATE in CQL?
TRUNCATE table_name; removes all data from a table but keeps the table structure. Creates snapshots by default before truncating.
44
How do you create a secondary index?
CREATE INDEX [name] ON table (column); Name defaults to tablename_columnname_idx if not specified.
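For instance, with a hypothetical hotels_by_id table (names are assumptions, and a keyspace is assumed to be in use):

```cql
-- Hypothetical table: city is a regular column, not part of the key.
CREATE TABLE hotels_by_id (
    id text PRIMARY KEY,
    name text,
    city text
);

-- No name given, so the index is called hotels_by_id_city_idx.
CREATE INDEX ON hotels_by_id (city);

-- The index allows filtering on city without ALLOW FILTERING.
SELECT id, name FROM hotels_by_id WHERE city = 'Tucson';
```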
45
When should you avoid secondary indexes?
1) High cardinality columns 2) Very low cardinality columns 3) Frequently updated/deleted columns. Denormalization or materialized views often perform better.
46
What is SASI (SSTable Attached Secondary Index)?
An alternative secondary index implementation where indexes are stored as part of SSTable files. Supports inequality searches and LIKE pattern matching.
47
How do you create a SASI index?
CREATE CUSTOM INDEX name ON table (column) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode': 'CONTAINS'};
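A sketch of a LIKE query against a SASI index. The guests table and the index name are hypothetical, and depending on the Cassandra version SASI may first need to be enabled in cassandra.yaml.

```cql
-- Hypothetical table with a text column to search.
CREATE TABLE guests (
    guest_id uuid PRIMARY KEY,
    last_name text
);

-- CONTAINS mode allows matching anywhere inside the value.
CREATE CUSTOM INDEX guests_last_name_sasi ON guests (last_name)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {'mode': 'CONTAINS'};

-- Substring match served by the SASI index.
SELECT guest_id, last_name FROM guests WHERE last_name LIKE '%son%';
```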
48
What is a materialized view?
A view that stores preconfigured query results, maintained automatically by Cassandra when base table changes. Supports queries on columns not in original primary key.
49
What are the restrictions on materialized views?
The view's primary key must include all columns from the base table's primary key. Only one non-primary-key column from the base table can be added to the view's primary key. Still experimental as of 4.0.
50
How do you create a materialized view?
CREATE MATERIALIZED VIEW view_name AS SELECT * FROM table WHERE col IS NOT NULL PRIMARY KEY (new_pk_cols);
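A concrete sketch with hypothetical table, view, and column names. A keyspace is assumed to be selected, and depending on the version materialized views may need to be enabled in cassandra.yaml.

```cql
-- Hypothetical base table keyed by confirmation number.
CREATE TABLE reservations (
    confirm_number text PRIMARY KEY,
    guest_id uuid,
    hotel_id text
);

-- The view adds one non-key column (guest_id) to the primary key
-- and keeps the base table's key column (confirm_number).
-- Every view primary key column needs an IS NOT NULL restriction.
CREATE MATERIALIZED VIEW reservations_by_guest AS
    SELECT * FROM reservations
    WHERE guest_id IS NOT NULL AND confirm_number IS NOT NULL
    PRIMARY KEY (guest_id, confirm_number);

-- Query by guest_id, which the base table cannot serve directly.
SELECT confirm_number, hotel_id
FROM reservations_by_guest
WHERE guest_id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;
```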
51
What is the ALLOW FILTERING clause?
Overrides Cassandra's default behavior to allow queries that may require scanning all nodes. Should be avoided in production due to unpredictable performance.
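A sketch of where the clause comes up, using a hypothetical orders table in an already selected keyspace:

```cql
-- Hypothetical table: status is a regular, unindexed column.
CREATE TABLE orders (
    customer_id text,
    order_id timeuuid,
    status text,
    PRIMARY KEY (customer_id, order_id)
);

-- Without ALLOW FILTERING this query is rejected, because status is
-- neither a key column nor indexed; with it, every partition may be scanned.
SELECT * FROM orders WHERE status = 'open' ALLOW FILTERING;

-- Restricting the partition key first limits the scan to one partition.
SELECT * FROM orders
WHERE customer_id = 'c42' AND status = 'open' ALLOW FILTERING;
```

The second form is a common compromise: the filtering still happens, but the amount of data read is bounded by the size of a single partition.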