Data Science at Scale Flashcards

(312 cards)

1
Q

How have companies changed in the big data era and what has enabled this?

A

They have become more social, customer-oriented, and dynamic by collecting data, learning from it, and improving and adapting in response. This was enabled by cheaper storage and processing, faster networks, and free open-source tools.

2
Q

What technological shift enabled widespread data analytics?

A

Cloud-based infrastructure (e.g., AWS, GCP) and Infrastructure as a Service solutions from internet giants like Google, Amazon, and Microsoft.

3
Q

How has big data transformed marketing?

A

Customer profiling, targeted ads, and personalised communication and recommendations.

4
Q

What are the 3 V’s of big data?

A

Volume, Velocity and Variety

5
Q

What are the four fundamental functionalities that Data-intensive applications are built from?

A

Database - Store data so it can be retrieved later.
Caching - Store the results of expensive operations to be used again soon.
Indexing - Allow users to efficiently search the data.
Batch Processing - Periodically run specific routines on large amounts of accumulated data.

6
Q

What are the three important things that a Data-intensive application needs to be?

A

Reliable, Scalable, Maintainable

7
Q

What does it mean for a system to be reliable?

A

It performs the function the user expected. It can tolerate the user making mistakes. Its performance is good enough for the required use case, under the expected load and data volume. It prevents any unauthorised access.

8
Q

What are faults and failures?

A

A fault is when a component (hardware/software) of the system works in an unexpected way, and a failure is when the entire system stops providing the service.

9
Q

What are Hardware Faults and what measures can be taken to stop them?

A

Usually when an HDD, memory module, or PSU stops working; this is common in large data centres. Mitigations include hardware redundancy such as RAID for HDDs, redundant PSUs, and hot-swappable CPUs.

10
Q

What are Software Faults?

A

When the software stops working. These are harder to anticipate, and can be present on many nodes of a system causing widespread failures.

11
Q

What is Scalability?

A

A system’s ability to cope with increased load.

12
Q

What is Load?

A

A measure of the amount of use of a system, for example: requests per second, number of players, read/write ratio.

13
Q

What is performance?

A

How well the system is responding to the load, for example: response time, or time taken to process a dataset. The average and distribution are both important.

14
Q

What is Vertical Scaling?

A

Upgrading the specs of the current machine; this does not scale linearly and offers limited fault tolerance.

15
Q

What is Horizontal Scaling?

A

Increasing the number of machines in the system, which scales better and has better fault tolerance.

16
Q

What is Maintainability?

A

The overall cost of keeping a system operational and up to date.

17
Q

What is Operability?

A

How easy it is for the operation team to keep it running. Includes good monitoring, automation, and predictable behaviour.

18
Q

What is Simplicity?

A

How easy it is for new people working on the system to understand it. Simplicity is achieved by removing accidental complexity, not functionality.

19
Q

What is Evolvability?

A

How easy it is to make changes and update the system, which is closely linked with simplicity.

20
Q

What is a data model in the context of data storage?

A

A structure that maps real-world entities (e.g., objects in code) to how they are stored (e.g., tables, JSON).

21
Q

What is an ORM (Object-Relational Mapping)?

A

A system that maps classes in code to a relational database schema.

22
Q

In the relational model, how are one-to-many relationships handled?

A

By using separate tables with foreign keys pointing to the parent table.

23
Q

What structured field types can be used within relational DBs for complex fields?

A

XML or JSON fields within a table.

24
Q

What is a document model in NoSQL?

A

A system where data is stored in documents (e.g., JSON, XML) representing semi-structured data.

25
Why are document models considered more flexible than relational models?
They don’t require a rigid schema; each document can be different.
26
What’s a key weakness of document models?
Difficulty handling many-to-many relationships efficiently.
27
What are graph models used for?
Representing complex many-to-many relationships using nodes and edges.
28
In graph models, what are nodes and edges?
Nodes are entities or objects; edges are the relationships between them.
29
What is a NoSQL database?
A database that doesn’t use traditional relational models; prioritizes scalability and availability.
30
What is the CAP Theorem?
In a distributed system, you can guarantee at most two of: Consistency, Availability, and Partition Tolerance.
31
What does consistency mean in CAP?
Every read returns the most recent write or an error.
32
What does availability mean in CAP?
Every request gets a response, even if it's not the most up-to-date.
33
What does partition tolerance mean in CAP?
The system continues functioning despite network partitions.
34
What are the four types of NoSQL databases?
Document, Key-value, Wide-column, and Graph databases.
35
What is a key-value store?
A NoSQL database that stores data as key-value pairs, like a dictionary.
36
What is a wide-column store?
A database model where data is stored in rows and columns, but columns can vary between rows.
37
How is data organized in wide-column stores?
By column families instead of rows.
38
What is a property graph in a graph database?
A graph where nodes and edges can have associated properties.
39
What is the main advantage of using a graph store over relational or document models?
Better handling of complex and interconnected data.
40
What does schema-on-read mean in document databases?
The structure of data is defined when it's read by the application, not enforced when written.
41
Why is schema-on-read useful?
It's good for heterogeneous data or when you can't control the data structure, like tweets or logs.
42
How does schema flexibility in document DBs compare to relational DBs?
Document DBs allow format changes without changing the schema; relational DBs often need downtime and schema updates.
43
What is "locality" in document databases?
It means storing a whole document as a continuous string (like JSON), keeping related data together.
44
What’s a drawback of locality in document databases?
Even if you need only part of the document, the DB loads the whole thing, which can be inefficient.
45
What are the two main jobs of a database?
Store data and retrieve data.
46
What is a transactional workload?
A write-heavy workload (e.g., bank transactions).
47
What is an analytics workload?
A read-heavy workload (e.g., dashboards, reports).
48
What is a storage engine?
The part of the database that handles how data is written to and read from disk.
49
What is a log-structured database?
A database that appends all writes to a log file.
50
Why are appends fast in a log file?
Because they avoid random disk access and just add to the end of the file.
51
What's the downside of using a plain log for reads?
You have to scan the whole file (O(n) time complexity).
52
What is a hash index?
A map of keys to byte offsets in a log file.
53
What is the benefit of a hash index?
Fast lookups for keys.
54
What is a limitation of a hash index?
It doesn’t support range queries and must fit in memory.
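The log-plus-hash-index idea from the cards above fits in a short Python sketch; the `LogWithHashIndex` class and its CSV-style record format are illustrative, not a real storage engine:

```python
import io

# In-memory hash index over an append-only log: key -> byte offset.
class LogWithHashIndex:
    def __init__(self):
        self.log = io.BytesIO()   # stands in for a log file on disk
        self.index = {}           # key -> byte offset of latest record

    def put(self, key: str, value: str) -> None:
        offset = self.log.seek(0, io.SEEK_END)  # appends avoid random I/O
        self.log.write(f"{key},{value}\n".encode())
        self.index[key] = offset                # latest write wins

    def get(self, key: str) -> str:
        offset = self.index[key]                # O(1) lookup, no full scan
        self.log.seek(offset)
        line = self.log.readline().decode().rstrip("\n")
        return line.split(",", 1)[1]

db = LogWithHashIndex()
db.put("user:1", "alice")
db.put("user:1", "alicia")   # old record stays in the log until compaction
print(db.get("user:1"))      # the index points at the latest record
```

Appends only ever go to the end of the log, the index maps each key to the byte offset of its newest record, and stale versions linger until compaction merges them away.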
55
Why are log files split into segments?
To prevent a single log file from growing too large.
56
What is compaction?
A process that merges segments and removes duplicates or deleted records.
57
What does SSTable stand for?
Sorted String Table.
58
What is the key property of SSTables?
They store keys in sorted order and only once per segment.
59
Why are SSTables efficient for reads?
Because they support binary search and smaller indexes.
60
Do SSTables still need an index?
Yes, but only for some keys (sparse index).
61
How is data written to an SSTable?
Data is first written to a memtable (e.g., an AVL tree) in memory. When full, the memtable is flushed to disk as a new SSTable file.
62
How does reading from an SSTable work?
First, check the memtable; if not found, search the newest SSTable, then older ones.
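The SSTable cards can be condensed into a toy LSM-style store; `MiniLSM`, its flush threshold, and the use of a plain dict (sorted at flush time, standing in for the AVL-tree memtable) are all illustrative:

```python
import bisect

# Minimal LSM-style store: a memtable that flushes to sorted SSTable segments.
class MiniLSM:
    def __init__(self, memtable_limit: int = 3):
        self.memtable = {}
        self.sstables = []            # newest segment last
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            # Flush: keys written in sorted order, once per segment.
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:                 # 1) check the memtable first
            return self.memtable[key]
        for segment in reversed(self.sstables):  # 2) newest SSTable first
            keys = [k for k, _ in segment]
            i = bisect.bisect_left(keys, key)    # binary search on sorted keys
            if i < len(keys) and keys[i] == key:
                return segment[i][1]
        return None
```

Reads check the memtable first, then binary-search each segment from newest to oldest, matching the read path described above.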
63
What are the two main reasons for distributed storage?
Scalability and fault tolerance.
64
What is vertical scaling?
Adding more power (CPU, RAM) to a single machine.
65
What is horizontal scaling?
Adding more machines to the system.
66
What are the two main strategies for distributing data?
Replication and partitioning.
67
What is replication in distributed systems?
Each node holds a full copy of the data (a replica).
68
Name three benefits of replication.
Lower latency, higher availability, better read performance.
69
What is the downside of replication?
Writes must update all replicas.
70
What is the leader in a replicated system?
The node that receives all write operations.
71
What do followers do in replication?
Apply updates from the leader using a replication log.
72
What is synchronous replication?
Leader waits for confirmation from followers before acknowledging a write.
73
What is the difference between synchronous and asynchronous followers in replication?
A synchronous follower must confirm a write before it's considered successful, ensuring stronger consistency. An asynchronous follower receives updates later and doesn’t delay the leader’s response.
74
What happens if the synchronous follower fails in replication?
An asynchronous follower is promoted to be synchronous.
75
How do you add a new follower in replication?
Snapshot the leader, copy data, then replay changes since the snapshot.
76
What happens when a follower fails in replication?
It fetches missed changes from the leader using logs.
77
What happens when a leader fails in replication?
A follower is promoted to become the new leader (failover).
78
What is "split brain" in replication?
Multiple leaders making conflicting updates.
79
What is replication lag?
Followers being behind the leader in data updates.
80
What is "read-your-writes" consistency?
Users always see their own recent updates.
81
What is "monotonic reads" consistency?
Users never see older data than a previous read.
82
What is partitioning in distributed databases?
Breaking the dataset into subsets (partitions), each stored on a different node.
83
How many partitions does a piece of data belong to?
Exactly one.
83
Why is partitioning used?
To scale data storage and load across multiple nodes for large datasets.
83
Can a node store more than one partition?
Yes
84
What is a hot spot node?
A node that handles a disproportionately large amount of data or requests.
85
What is the simplest way to distribute partitions?
Randomly scatter data across nodes.
86
What is partitioning by key range?
Assign a continuous range of keys to each partition.
87
What’s a drawback of key range partitioning?
Can cause write hotspots if keys follow insertion order.
88
What is partitioning by hash of key?
Use a hash function to assign data to partitions.
89
What’s the benefit of hash partitioning?
Even distribution of data across partitions.
90
What’s the drawback of hash partitioning?
Poor support for range queries.
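A small Python sketch of the range-versus-hash trade-off; the four partitions and the alphabetical range boundaries are made up for illustration:

```python
import hashlib

N = 4  # number of partitions (illustrative)

def range_partition(key: str) -> int:
    # Contiguous key ranges, e.g. [..g), [g..m), [m..s), [s..]; boundaries
    # would be tuned to the real key distribution.
    for i, boundary in enumerate(["g", "m", "s"]):
        if key < boundary:
            return i
    return 3

def hash_partition(key: str) -> int:
    # A stable hash gives an even spread but destroys key ordering,
    # so range queries must hit every partition.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % N
```

Adjacent keys such as "apple" and "apricot" share a partition under range partitioning (good for range scans, risky for hotspots), while the hash spreads keys without regard to order.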
91
How do secondary indexes affect partitioning?
They don’t uniquely identify records and don’t map neatly to partitions.
91
What is rebalancing in partitioned systems?
Moving data between nodes to distribute load evenly.
92
When is rebalancing needed?
When nodes fail, data grows, or load increases.
93
What should happen during rebalancing?
Database continues operating, and minimal data is moved.
94
What is fixed partitioning?
A set number of partitions that get reassigned when nodes are added or removed.
95
What is dynamic partitioning?
Partitions split or merge depending on data size.
96
Which partitioning method supports range queries better?
Key range partitioning.
97
Which partitioning method avoids write hotspots better?
Hash partitioning.
98
According to the CAP theorem, what must a system choose between during a network partition?
Consistency and Availability.
99
What does "Consistency" mean in CAP?
Every read gets the most recent write or an error.
100
What does "Availability" mean in CAP?
Every request gets a (non-error) response.
101
What does "Partition Tolerance" mean in CAP?
The system still operates even if parts of the network are disconnected.
102
What are the ACID properties?
Atomicity, Consistency, Isolation, Durability.
103
What is eventual consistency?
All nodes will eventually hold the same data, but reads may return outdated values in the meantime.
104
What is linearizability?
All operations appear instant and atomic, as if there's only one copy of the data.
105
Why is linearizability important?
To guarantee strong consistency, such as unique IDs or avoiding double bookings.
106
What is the trade-off of linearizability in CAP?
It sacrifices availability during network partitions.
107
What happens to performance in linearizable systems with high network delays?
Response time increases with network uncertainty.
108
What is consensus in distributed systems?
Getting all nodes to agree on a value or action, like leader election or committing transactions.
109
What is the Two-Phase Commit protocol used for?
Ensuring atomic commit across multiple nodes.
110
What happens in the first phase of 2PC
The coordinator asks all nodes to prepare to commit.
111
What happens in the second phase of 2PC?
If all agree, the coordinator sends a commit. If any disagree, it sends an abort.
112
Why is 2PC difficult in distributed systems?
Nodes can crash, messages can be lost, and decisions can be left hanging.
113
What is a downside of eventual consistency?
Reads may return outdated or inconsistent data temporarily.
114
In a linearizable system, what happens once a client sees an update?
All other clients must see that update afterward.
115
What problem does the Paxos algorithm solve?
Reaching consensus in a distributed system, even with unreliable nodes.
116
What are the three roles in Paxos?
Proposer, Acceptor, Learner.
117
What is the role of a Proposer in Paxos?
Suggests a value to be agreed upon.
118
What is the role of an Acceptor in Paxos?
Votes to accept proposed values; consensus is reached when a majority accepts.
119
What is the role of a Learner in Paxos?
Learns the value that has been chosen but doesn't participate in voting.
120
What is a quorum in Paxos?
A majority of acceptors (more than half), required to make decisions.
121
What is a proposal in Paxos?
A value identified with a unique, increasing number.
122
What happens during Phase 1 of Paxos?
Proposer sends PREPARE(x); acceptors respond with PROMISE(x).
123
What does a PROMISE(x) mean in Paxos?
The acceptor won’t accept proposals with a number less than x.
124
What happens during Phase 2 of Paxos?
If proposer gets promises from a quorum, it sends ACCEPT(x, v) to acceptors.
125
When is a value considered "chosen" in Paxos?
When a majority of acceptors send ACCEPTED(x, v).
126
What does the Learner do in Paxos?
Learns the value chosen after the majority of acceptors accept it.
127
Why are unique proposal numbers important in Paxos?
They ensure that newer proposals can override older ones and prevent conflict.
128
What happens if multiple proposers send proposals?
Paxos ensures only one value is chosen by allowing only the highest-numbered proposal to proceed.
129
Why is consensus hard in distributed systems?
Because of node failures, network delays, and inconsistent message ordering.
130
What are the three types of data systems?
Services (online), Batch processing (offline), and Stream processing (near-real-time).
131
What is the main performance metric for batch processing?
Throughput.
132
What is batch processing?
Processing large amounts of data in scheduled jobs that produce output.
133
How does stream processing differ from batch processing?
Stream processing operates on data shortly after it is produced; batch processes large data sets periodically.
134
What is the Unix philosophy related to batch jobs?
Build small, simple, modular tools that do one thing well and chain them together.
135
What is MapReduce?
A batch processing framework for distributed computation on large datasets.
136
What is the role of the Mapper in MapReduce?
Extracts key-value pairs from input data; runs once per record.
137
Is the Mapper in MapReduce stateful?
No, it is stateless.
138
What is the role of the Reducer in MapReduce?
Aggregates values grouped by key and produces output.

139
What is the key strength of MapReduce?
Parallelism across distributed systems.
140
Can a single MapReduce job solve all batch problems?
No, jobs are often chained together into workflows.
141
What is a reduce-side join in MapReduce?
Joining datasets by emitting a shared key and processing all values for that key in the reducer.
142
Why are reduce-side joins used in batch processing?
To perform large-scale dataset joins efficiently and locally.
143
What kind of input/output model does MapReduce use?
Key-value pair model.
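The mapper/reducer division of labour can be simulated in a single Python process; word count stands in here for a real distributed job:

```python
from collections import defaultdict

# Single-process simulation of MapReduce: a stateless mapper emits
# key-value pairs, the framework groups them by key, and the reducer
# aggregates each group.

def mapper(record: str):
    for word in record.split():
        yield word.lower(), 1          # one pair per occurrence

def reducer(key, values):
    return key, sum(values)            # aggregate values grouped by key

def map_reduce(records):
    groups = defaultdict(list)
    for record in records:             # map phase: runs once per record
        for k, v in mapper(record):
            groups[k].append(v)        # shuffle: group by key
    return dict(reducer(k, vs) for k, vs in groups.items())

print(map_reduce(["to be or not to be"]))   # counts per distinct word
```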
144
What is stream processing?
A method of processing data in real-time, event-by-event, as it arrives.
145
What is an event in stream processing?
A small, immutable data record representing something that happened, usually with a timestamp.
146
What generates events in a stream system?
A producer (also called publisher or sender).
146
Who handles events in a stream system?
One or more consumers (also called subscribers).
147
Why not use databases or files for stream processing?
Polling them constantly adds overhead; they're not built for real-time notification.
148
What is a messaging system?
A service that allows producers to send messages and consumers to receive them in real time.
149
What happens if producers send messages faster than consumers can process?
Messages can be dropped, buffered in a queue, or flow-controlled (backpressure).
150
What happens to messages when a node crashes?
They can be lost or stored (if durability is implemented).
151
What is direct messaging?
Producers send messages directly to consumers, often using TCP or UDP.
152
What is a message broker?
A server that stores and forwards messages between producers and consumers.
153
What’s a benefit of using a message broker?
It can handle clients that disconnect or crash, improving reliability.
154
How do message brokers handle multiple consumers?
Via load balancing or fan-out.
155
What is load balancing in stream systems?
Each message is sent to one consumer in the group to share the load.
156
What is fan-out in stream systems?
Every consumer receives every message, useful for broadcasting.
157
Why are acknowledgements needed in stream systems?
To confirm that a message was processed; if not, it may be redelivered.
158
What can go wrong with message re-delivery?
It can affect message ordering.
159
How are databases different from message brokers?
Databases persist data and offer search; brokers focus on message delivery and often delete after consumption.
160
What is a stream operator?
A piece of code (or job) that consumes input streams and outputs a new derived stream.
161
How is a stream processor similar to MapReduce?
Both treat their inputs as read-only and write append-only outputs.
162
Why is fault tolerance harder in stream processing than batch?
Because streams never end — you can't just restart from the beginning.
163
What is Complex Event Processing (CEP)?
A system that detects patterns in streams using SQL-like queries and emits events when a match is found.
164
How is CEP different from databases?
CEP has persistent queries on transient data; databases have transient queries on persistent data.
165
What is stream analytics?
Real-time calculations over streams, like rates, rolling averages, and trend detection.
166
What is event time?
The time when the event actually occurred.
167
What is processing time?
The time when the system processed the event.
168
Why is confusing event time and processing time a problem?
It can lead to incorrect results and analysis.
169
Why is timestamp assignment tricky in streaming?
Because devices and servers may have different clocks and delays.
170
What are the three types of timestamps we might store for an event?
When it happened, when it was sent, when it was received.
171
What is a tumbling window?
A fixed-size, non-overlapping time window, each event belongs to one window.
172
What is a hopping window?
A fixed-size window that overlaps in time, allowing events to belong to multiple windows.
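The two window types in code: assigning an event timestamp to its window(s). Window size and hop values are illustrative, and windows are assumed to start at time 0 or later:

```python
def tumbling_window(ts: int, size: int) -> int:
    # Each event belongs to exactly one non-overlapping window;
    # return that window's start time.
    return (ts // size) * size

def hopping_windows(ts: int, size: int, hop: int):
    # Fixed-size windows start every `hop` units, so an event can fall
    # into several overlapping windows; return all their start times.
    first = ((ts - size) // hop + 1) * hop
    starts, start = [], max(0, first)
    while start <= ts:
        starts.append(start)
        start += hop
    return starts
```

With size 10 and hop 5, an event at t=12 falls in the single tumbling window starting at 10, but in the two hopping windows starting at 5 and 10.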
173
What is gradient descent?
An iterative algorithm that updates model parameters in the direction that decreases the loss, controlled by a step size called the learning rate.
174
In neural-network training, what two passes occur each epoch?
A forward pass (to compute predictions) and a backpropagation pass (to compute gradients and update weights).
175
What distinguishes full-batch learning from mini-batch learning?
Full batch uses all available data to compute the exact gradient, while mini-batch uses subsets to approximate it.
176
Name one advantage of full-batch learning.
It provides a smooth, consistent gradient each epoch, making convergence more stable.
177
Name one disadvantage of full-batch learning.
Retraining on large or growing datasets is computationally expensive and may not fit in memory.
178
What is concept drift?
When the relationship 𝑃(𝑦∣𝑥) itself changes over time in unanticipated ways.
179
What is mini-batch learning?
A method where the dataset is divided into small batches, each used to compute an approximate gradient.
180
Give one benefit of mini-batch learning.
Faster iterations since less data is processed per update.
181
Give one drawback of mini-batch learning.
Convergence can be noisy and unstable due to variability in each batch.
182
Why is the choice of batch size important in mini-batch learning?
It balances computational cost (smaller batch = faster) against convergence stability (larger batch = smoother).
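A minimal mini-batch gradient-descent loop for a one-parameter linear model; the data, learning rate, and batch size are illustrative:

```python
import random

# Mini-batch gradient descent for linear regression y = w*x (true w = 2.0).
random.seed(0)
data = [(x, 2.0 * x) for x in range(100)]

w, lr, batch_size = 0.0, 0.0001, 10
for epoch in range(50):
    random.shuffle(data)
    for i in range(0, len(data), batch_size):
        batch = data[i:i + batch_size]
        # Approximate gradient of the mean squared error on this batch.
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        w -= lr * grad                 # step against the gradient

print(round(w, 2))   # w converges toward the true slope 2.0
```

Each update uses an approximate gradient from one batch, so individual steps are noisy, but the parameter still converges to the true slope.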
183
What is streaming data in the context of model training?
Data arriving in a continuous, high-speed flow where only one pass per sample is possible.
184
Why do streaming-data models risk becoming outdated?
Because the data distribution can evolve quickly, and the model can’t revisit past examples.
185
Define online learning.
Updating model parameters immediately after each new data point arrives.
186
How does online learning relate to stochastic gradient descent?
Online learning uses a single-sample update, which is exactly SGD.
187
What role does the learning rate play in online learning?
It controls how aggressively the model adapts to each new observation.
188
Why must online learning be closely monitored?
A single outlier or bad data point can disproportionately affect the model’s performance.
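Online learning as single-sample SGD, sketched with a hypothetical stream that follows y = 3x:

```python
# Online learning: the model updates immediately after every observation.
w = 0.0
lr = 0.05

def observe(x: float, y: float) -> None:
    """One step of SGD on a single (x, y) pair."""
    global w
    grad = 2 * (w * x - y) * x      # gradient of squared error on this sample
    w -= lr * grad                  # learning rate controls adaptation speed

for x in [1.0, 2.0, 1.5, 3.0, 2.5]:
    observe(x, 3.0 * x)             # stream follows y = 3x
# w has moved toward 3.0; a single outlier would pull it away just as
# fast, which is why online learners need close monitoring.
```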
189
What is a covariate shift?
When the input distribution 𝑃(𝑥) changes but the conditional 𝑃(𝑦∣𝑥) remains the same.
190
Provide an example of covariate shift.
A face-recognition system struggling with masked faces.
191
What is prior probability shift?
When the distribution of target variables changes but inputs stay the same.
192
Provide an example of prior probability shift.
A flu-prediction model during a pandemic sees 𝑃(𝑦 = flu) surge while the symptoms-to-flu mapping stays constant.
193
How often might you retrain a mini-batch model?
It depends on the application—daily, weekly, monthly, etc., based on data volatility.
194
What is incremental learning?
Continuously updating the model with new data without retraining from scratch.
195
Why is full-batch learning “simpler to reason about”?
Because each epoch uses the entire dataset, so the gradient represents the true direction of steepest descent.
196
What’s the main trade-off between full-batch and mini-batch learning?
Stability and smooth convergence versus computational efficiency and adaptability to new data.
197
Why must we design communication patterns carefully in distributed ML?
To avoid idle workers waiting on data or synchronization, thereby maximizing utilization.
198
What does the push primitive do?
A machine sends data proactively to another without a request.
199
How is pull different from push?
In pull, a machine requests data from another, rather than receiving it unprompted.
201
What’s the role of broadcast?
To send identical data from one node to all other workers simultaneously.
202
Describe reduce in distributed settings.
Aggregating partial results (e.g., sums) from multiple workers onto a single machine.
203
How does all-reduce extend reduce?
After aggregation, it distributes the final result to every worker.
204
What happens at a wait primitive?
One machine halts its work until it gets a signal from another machine.
205
Define a barrier synchronization.
All machines pause until every one of them reaches the barrier, then all resume together.
206
Why overlap computation with communication?
To hide network latency by doing useful work while data moves.
207
In distributed SGD, what is an epoch?
One full pass over the entire training dataset.
208
How is the effective batch size B related to worker count M and local batch B′?
B = M × B′.
209
Outline the three steps of SGD with all-reduce.
(1) Compute a local gradient on each worker; (2) all-reduce to sum the gradients; (3) apply a synchronized parameter update.
210
What’s one key benefit of using all-reduce for parallel SGD?
It’s statistically identical to standard minibatch SGD, so hyperparameters carry over.
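A single-process sketch of that equivalence: summing per-worker gradients over local batches of size B′ gives exactly the gradient of the global batch B = M × B′. Worker count and data are illustrative:

```python
# Simulated SGD with all-reduce: M workers, local batch size B'.
M, B_prime = 4, 2
data = [(float(x), 2.0 * float(x)) for x in range(M * B_prime)]  # B = M*B' = 8
w = 0.5

def local_gradient(w, shard):
    # Each worker sees only its shard of the global batch.
    return sum(2 * (w * x - y) * x for x, y in shard)

shards = [data[i * B_prime:(i + 1) * B_prime] for i in range(M)]
total = sum(local_gradient(w, s) for s in shards)   # all-reduce: sum partials
global_grad = sum(2 * (w * x - y) * x for x, y in data)
print(total == global_grad)   # identical to single-machine minibatch SGD
```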
211
What’s a major drawback of this all-reduce approach?
Workers idle during the global reduction, since there’s no overlap.
212
In k-means MapReduce, what do mappers emit?
Pairs of (cluster_id → point data) after assigning each point to its nearest centroid.
213
What’s the combiner’s job in parallel k-means?
Locally summing coordinates and counts per cluster to reduce network shuffle.
214
During the reduce phase of k-means, how are new centroids computed?
By dividing the total sum of point coordinates by the total count for each cluster.
215
How do you test for convergence in MapReduce k-means?
Compare old and new centroids; if they move less than a threshold, you stop.
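One MapReduce-style k-means iteration on made-up 1-D points; the combiner is folded into the reducer here for brevity:

```python
# One iteration of k-means expressed as a map and a reduce phase.
points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
centroids = [0.0, 10.0]

def map_phase(point):
    # Mapper: emit (cluster_id -> point) for the nearest centroid.
    cid = min(range(len(centroids)), key=lambda c: abs(point - centroids[c]))
    return cid, point

def reduce_phase(pairs):
    # Reducer: new centroid = sum of coordinates / count, per cluster.
    sums, counts = {}, {}
    for cid, p in pairs:
        sums[cid] = sums.get(cid, 0.0) + p   # a combiner would pre-sum these
        counts[cid] = counts.get(cid, 0) + 1
    return {cid: sums[cid] / counts[cid] for cid in sums}

new_centroids = reduce_phase(map_phase(p) for p in points)
print(new_centroids)   # cluster 0 moves near 1.0, cluster 1 near 9.0
```

Comparing `new_centroids` against `centroids` with a small threshold gives the convergence test from the card above.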
216
Why might you include a combiner in MapReduce?
To cut down on data sent over the network by doing partial aggregation in the map stage.
217
When is a barrier sync useful?
When you need all workers to finish a phase before any can proceed (e.g., before starting a new epoch).
218
What would you overlap with communication to improve performance?
Local computation—e.g., computing the next gradient chunk while the previous one is being reduced.
219
In practice, why avoid naïve all-reduce across too many workers?
Because the communication overhead—and idle time—grows as the cluster scales.
220
What’s the difference between reduce and all-reduce in terms of result placement?
Reduce puts the aggregate on one node; all-reduce replicates it to all nodes.
221
What primitive would you use if one worker needed to pause for a signal from another?
Wait.
222
How does parallel k-means ensure every point is assigned before recomputing centroids?
A barrier-like synchronization happens naturally when MapReduce moves from the reduce phase back to the next map phase.
223
Name three forms of concept drift dynamics.
Gradual, abrupt, and recurring (cyclical) changes.
224
Why do batch learners become outdated under streaming data?
They can’t adapt to new distributions without expensive full retraining.
225
What does ADWIN stand for?
ADaptive WINdowing, a method for concept drift detection.
226
How does ADWIN detect drift?
By adaptively shrinking its window when statistical tests detect change in the stream.
227
What triggers ADWIN to grow its window?
When no significant change is detected, indicating stable data.
228
In DDM, what is P𝑖?
The observed error rate at time 𝑖.
229
How is S𝑖 computed in DDM?
S𝑖 = √(P𝑖(1 − P𝑖) / 𝑖)
230
When does DDM enter the “warning zone”?
P𝑖 + S𝑖 ≥ Pmin + 2*Smin
231
When does DDM declare concept drift?
P𝑖 + S𝑖 ≥ Pmin + 3*Smin
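A sketch of the DDM update rule from the last three cards; the 30-sample burn-in is a common implementation detail rather than part of the cards:

```python
import math

# DDM thresholds: p is the running error rate after i predictions;
# p_min/s_min are recorded at their joint minimum.
class DDM:
    def __init__(self):
        self.errors, self.i = 0, 0
        self.p_min, self.s_min = float("inf"), float("inf")

    def update(self, is_error: bool) -> str:
        self.i += 1
        self.errors += int(is_error)
        p = self.errors / self.i
        s = math.sqrt(p * (1 - p) / self.i)
        if self.i < 30:                       # ignore the unstable early stream
            return "stable"
        if p + s < self.p_min + self.s_min:   # track the best point seen so far
            self.p_min, self.s_min = p, s
        if p + s >= self.p_min + 3 * self.s_min:
            return "drift"
        if p + s >= self.p_min + 2 * self.s_min:
            return "warning"
        return "stable"
```

Feeding a stream with a stable 10% error rate keeps the detector quiet; when the error rate jumps, the drift threshold fires.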
232
What is the main idea behind EDDM?
Monitoring the average distance between classification errors to catch slow drift early.
233
Why might EDDM detect drift earlier than DDM?
It uses spacing of errors, which can show drift before error-rate spikes.
234
Give an example of abrupt drift.
A sudden market crash completely changing customer behavior.
235
What’s a recurring drift?
Seasonal or cyclical patterns, like weekend vs. weekday usage.
236
Why is it important to detect drift quickly?
To retrain or adapt the model before performance degrades significantly.
237
How do ADWIN and DDM differ in window handling?
ADWIN explicitly manages a data window; DDM tracks error statistics without a sliding window.
238
What is a 0-order tensor?
A scalar (single value).
239
What order tensor is a feature vector?
1-order tensor.
240
How many dimensions does a matrix have?
Two (rows × columns).
241
Give an example of a 3-order tensor in ML.
A colour image: height × width × channels. (A video, frames × height × width × channels, would be a 4-order tensor.)
242
Define a dense tensor.
One where most entries are non-zero.
243
Define a sparse tensor.
One where most entries are zero (O(n) non-zeros in an n×n tensor).
244
How is sparsity measured?
As the proportion of zero-valued elements.
245
Why use sparse representations?
To reduce storage and compute on data with many zeros.
246
What’s the dictionary-of-keys format?
A map from index tuples (i,j,…) to non-zero values.
247
What is COO (coordinate list) format?
A list of (row, col, value) triples for non-zeros.
248
Name the three arrays in CSR format.
V (values), COL_INDEX (columns), ROW_INDEX (row pointers).
249
What does ROW_INDEX[k] represent?
The start index in V/COL_INDEX for row k.
250
How do you reconstruct row i in CSR?
Slice V and COL_INDEX from ROW_INDEX[i] to ROW_INDEX[i+1].
251
In CSR, what does COL_INDEX store?
The column indices of each non-zero in V.
252
How many non-zeros does row 2 have in V=[5,8,3,6], COL_INDEX=[0,1,2,1], ROW_INDEX=[0,1,2,3,4]?
One non-zero (value 3 at column 2).
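The CSR arrays from the card above, reconstructed in Python:

```python
# CSR arrays from the example: V (values), COL_INDEX (columns),
# ROW_INDEX (row pointers into V/COL_INDEX).
V = [5, 8, 3, 6]
COL_INDEX = [0, 1, 2, 1]
ROW_INDEX = [0, 1, 2, 3, 4]   # ROW_INDEX[k] = start of row k

def row(i):
    # Slice V and COL_INDEX from ROW_INDEX[i] to ROW_INDEX[i+1].
    start, end = ROW_INDEX[i], ROW_INDEX[i + 1]
    return list(zip(COL_INDEX[start:end], V[start:end]))  # (column, value)

print(row(2))   # row 2 holds one non-zero: value 3 at column 2
```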
253
What is virtualisation?
Creating a software-based VM that behaves like a full computer, including hardware, OS, and peripherals.
254
What is a hypervisor?
The software (virtual machine monitor) that runs on a host to manage and isolate multiple guest VMs.
255
How does emulation differ from virtualisation?
Emulation mimics hardware without direct host-hardware interaction; virtualisation uses the real host hardware via a hypervisor.
256
What is a VM snapshot?
A point-in-time capture of a VM’s complete state, which can be restored later.
257
How can snapshots aid migrations?
By copying a snapshot to another host and restoring it, you move the VM seamlessly.
258
Give two drawbacks of virtualisation.
High overhead (each VM runs a full OS) and redundancy in duplicated system files.
259
What is containerisation?
OS-level virtualisation where containers share a host OS kernel but are isolated environments.
260
How do containers differ from VMs?
Containers share the OS kernel and have less overhead versus VMs, which each run a full guest OS.
261
What resources can a container use?
Only those explicitly allocated to it by the container runtime.
262
List three benefits of containers.
Portability, scalability, and ease of building/deploying/managing applications.
263
What does isolation mean in the container context?
Processes and files in one container cannot affect those in another.
264
What is container orchestration?
Automating deployment, scaling, management, and networking of containers across multiple hosts.
265
Why use PaaS in the context of containers?
To develop and run containerized applications without managing the infrastructure itself.
266
What is thread-level parallelism?
A programming model that splits work into threads, lightweight execution units within one process, to run concurrently on multiple CPU cores
267
Why are threads considered “lightweight” compared to processes?
Threads share the parent’s memory and resources, so they incur much less overhead to create and context-switch than full processes
268
What does SISD stand for?
Single Instruction, Single Data: one instruction stream on one data stream
269
How does SIMD differ from SISD?
SIMD applies the same instruction to multiple data streams in parallel, with each core working on its own data but fetching identical instructions
270
What characterizes MIMD architectures?
Multiple processors each fetch their own instructions and operate on their own data independently
271
Name two resources that threads share within a process
Global variables and file descriptors
272
What must threads share to run in a shared-memory system?
A common address space so they can directly access shared data
273
What is the risk when threads access shared data without synchronization?
Race conditions, leading to inconsistent or corrupted data
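A minimal Python sketch of that risk and its fix: the unguarded read-modify-write on `counter` is exactly the race condition described, and the lock serialises it.

```python
import threading

counter = 0
lock = threading.Lock()

def add_many(n):
    # Without the lock, two threads could read the same value of
    # `counter` before either writes back, losing an update.
    global counter
    for _ in range(n):
        with lock:
            counter += 1

threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# counter is exactly 40_000 on every run; remove the lock and it may fall short
```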
274
What API is commonly used for shared-memory threading in C/C++?
OpenMP
275
Describe the OpenMP fork–join model.
The master thread forks worker threads at a parallel region, they execute concurrently, then join back at a synchronization point
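OpenMP itself targets C/C++/Fortran, but the fork–join pattern can be sketched in Python, with `ThreadPoolExecutor` standing in for the forked worker team:

```python
from concurrent.futures import ThreadPoolExecutor

def work(chunk):
    """Work done by one worker thread in the parallel region."""
    return sum(chunk)

data = list(range(100))
chunks = [data[i:i + 25] for i in range(0, 100, 25)]

# Fork: the master thread hands one chunk to each worker.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(work, chunks))
# Join: leaving the `with` block waits for all workers, like the
# implicit barrier at the end of an OpenMP parallel region.
total = sum(partials)   # total == 4950
```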
276
What is MPI_COMM_WORLD?
The default MPI communicator that includes all processes in an MPI session
277
What attributes does an MPI communicator have?
Context/ID, Group (set of processes), Size (# processes), and Rank (each process’s integer ID)
278
How do MPI point-to-point communications work?
A sender posts a message with data and destination rank; the receiver must post a matching receive, making the exchange cooperative and two-sided
279
Give one advantage of shared‐memory threading versus distributed memory.
Threads can directly access shared data without explicit message passing, simplifying programming for on-node parallelism
280
What does OpenMP abstract away from the programmer?
Low-level thread creation, scheduling, and most synchronization details—letting you focus on marking parallel regions
281
What is software–hardware co-design?
A coupled development process where hardware is tailored to software requirements and software is tuned to exploit hardware features
282
Why are AI/ML workloads driving specialised hardware design?
They rely heavily on tensor operations, both dense (CNNs) and sparse (graph models), which general-purpose CPUs can’t efficiently handle
282
What’s the difference between bottom-up and top-down design?
Bottom-up builds hardware first then software; top-down derives hardware features from software workload demands
283
Name the four key steps in co-design.
Partitioning, Prototyping & Simulation, High-Level Synthesis, and Platform-Based Design
284
What is Partitioning in co-design?
Allocating which functions run in hardware (for performance) versus software (for flexibility/updates)
285
What is Prototyping and Simulation in co-design?
Modelling how the hardware and software will interact, using hardware description languages (HDLs) and software development tools to create accurate system models.
286
How does high-level synthesis aid co-design?
It automatically converts high-level language code into an HDL description, speeding up hardware implementation.
287
How does Platform-Based design aid co-design?
By using a predefined platform (a set of hardware and software components) as a starting point to reduce design time and complexity.
288
Give two advantages of co-design.
Enables thorough design-space exploration (power, cost, performance) and multi-level optimization (system, architectural, algorithmic)
289
What are IPUs and DPUs?
Infrastructure/Data Processing Units—specialised chips in data centres for networking, security, and management tasks
290
List three application domains of co-design.
Embedded systems, automotive electronics (autonomous vehicles), and 5G telecommunications
291
Why is edge computing important for AI/ML co-design?
Personalized models need on-device inference/training, requiring hardware/software tuned for low power and latency
292
What is High-Performance Computing (HPC)?
The use of supercomputers or large clusters of processors, combining parallel processing with specialised hardware, to solve complex computational problems.
292
How is HPC performance measured?
In FLOPS (Floating point operations per second): GFLOPS (10⁹), TFLOPS (10¹²), PFLOPS (10¹⁵), EFLOPS (10¹⁸).
293
What does Moore’s Law state?
Transistor counts on integrated circuits double roughly every two years, historically driving performance gains
294
What is Amdahl’s Law?
A formula predicting maximum speedup from parallelism, showing that the non-parallelizable portion limits overall speedup
295
What is the formula for Amdahl's law
S_latency = 1 / ((1 − p) + p/s), where p is the proportion of execution time that benefits from the enhancement and s is the speedup of that portion.
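The formula is easy to sanity-check in code; `amdahl_speedup` is an illustrative helper, and the numbers below follow directly from the definition.

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the runtime is sped up by factor s."""
    return 1.0 / ((1.0 - p) + p / s)

# 95% of the work parallelised across 8 cores: 1/(0.05 + 0.95/8) ≈ 5.93
eight_cores = amdahl_speedup(0.95, 8)

# Even with effectively infinite cores, the serial 5% caps speedup at 1/0.05 = 20
limit = amdahl_speedup(0.95, 1e12)
```

This is the key lesson of the law: the non-parallelisable fraction, not the core count, sets the ceiling.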
296
What are the three main tasks of HPC resource management software?
Resource allocation, workload scheduling, and support for distributed execution & monitoring
297
How is an HPC “job” defined?
A self-contained work unit with input data that produces output, run interactively or in batch, and queued until resources are available
298
Why is HPC important for data science?
It enables big data handling, complex analytics, faster ML/DL training, and large-scale scientific simulations
299
Why is network performance critical in distributed HPC systems?
Because LAN bandwidth and latency determine how fast nodes can exchange data, preventing communication bottlenecks.
300
What does “cost of ownership” mean for an HPC facility?
The total expense of running the system, including admin staff, maintenance, and up to $10 million/year in electricity costs.
301
What’s the difference between distributed and non-distributed HPC?
Distributed HPC spans multiple networked nodes that communicate over an interconnect, whereas non-distributed (shared-memory) HPC runs entirely within one multi-core system.
302
What three factors chiefly drive an HPC cluster’s processing power?
The number of nodes, processors per node, and cores per processor
303
What is SLURM?
An open-source, modular, extensible, scalable resource manager and workload scheduler for clusters and supercomputers.
304
In SLURM, what is a “partition”?
A logical grouping of nodes that defines a job queue with its own constraints and priorities
305
Name three possible node states in SLURM.
Draining, Drained, Down (also Completing, Allocated, Idle, Unknown)
306
What are the main job end-states in SLURM?
Completed, TimeOut, NodeFail, Cancelled, and Failed (with intermediate states Pending, Running, Suspended, Completing)
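The SLURM concepts above (partitions, queued jobs, end-states) come together in a batch script. A minimal sketch, where the partition name, resource sizes, and program path are illustrative assumptions, not real site values:

```shell
#!/bin/bash
#SBATCH --job-name=example          # name shown in the queue
#SBATCH --partition=compute         # partition (job queue); name is site-specific
#SBATCH --nodes=2                   # number of nodes requested
#SBATCH --ntasks-per-node=4         # tasks (e.g. MPI ranks) per node
#SBATCH --time=00:30:00             # wall-clock limit; exceeding it gives TimeOut
#SBATCH --output=job_%j.out         # stdout file, %j expands to the job ID

# Launch the (hypothetical) program across all allocated tasks
srun ./my_program
```

Submitted with `sbatch script.sh`, the job sits in Pending until the partition has free resources, then moves through Running to one of the end-states listed above.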