What is an offset in Apache Kafka?
A sequential number that uniquely identifies a message within a partition. It marks the message's position in the partition's log, and consumers use it to track how far they have read.
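A minimal sketch of the idea, assuming a toy in-memory log. The Partition class and its methods are hypothetical illustrations, not the Kafka client API. It shows that an offset is simply the position a message receives when it is appended, and that a consumer resumes by remembering the next offset to read.

```python
class Partition:
    """Toy append-only log standing in for one partition (hypothetical, not the Kafka API)."""

    def __init__(self):
        self._records = []

    def append(self, value):
        """Append a record and return the offset assigned to it."""
        self._records.append(value)
        return len(self._records) - 1

    def read_from(self, offset):
        """Return (offset, value) pairs starting at the given offset."""
        return list(enumerate(self._records[offset:], start=offset))


partition = Partition()
offsets = [partition.append(v) for v in ["a", "b", "c"]]

# A consumer that has already processed offset 0 resumes from offset 1.
next_offset = 1
unread = partition.read_from(next_offset)
```

Offsets are only meaningful inside one partition; two partitions of the same topic each start counting from 0.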
What is a DStream represented as in Spark Streaming?
a continuous series of RDDs
What is the main advantage of the Kappa architecture compared to the Lambda architecture?
Kappa removes the separate batch and speed layers and keeps a single streaming pipeline built on an immutable event log (for example, Kafka).
This simplifies the overall system design, codebase, and operations.
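The single-pipeline idea can be sketched roughly as follows, assuming a small in-memory tuple stands in for the immutable event log. The names event_log and run_pipeline are hypothetical. The same streaming code serves both live processing and reprocessing: to change the logic, the log is replayed from the start through the new version instead of maintaining a separate batch layer.

```python
# Immutable, ordered event log (a tuple stands in for a Kafka topic here).
event_log = (("user1", 5), ("user2", 3), ("user1", 2))


def run_pipeline(log, transform):
    """Fold every event in the log into per-key totals using one code path."""
    totals = {}
    for key, amount in log:
        totals[key] = totals.get(key, 0) + transform(amount)
    return totals


# Current logic: sum the raw amounts.
live_view = run_pipeline(event_log, lambda amount: amount)

# Changed logic: replay the same log through the same pipeline.
reprocessed_view = run_pipeline(event_log, lambda amount: amount * 2)
```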
Which one is a simpler data engineering architecture, kappa or lambda?
kappa
Based on what does Docker create reproducible environments?
Dockerfiles
What process is responsible for building, running, and distributing Docker containers?
Docker daemon
What happens when the docker build command is run from a bash terminal?
Docker follows the instructions in the Dockerfile found in the build context (for docker build . that is the current directory) and creates an image.
What do we specify in Dockerfiles?
In Docker, what is the key abstraction that enables communication between containers and between containers and the outside world?
Docker networks
(such as bridge, host, overlay, macvlan)
virtual networks created and managed by Docker’s network drivers that connect containers to each other and to external networks
What is iptables?
a command-line firewall utility that allows or blocks traffic based on policy chains
What are the two types of nodes that make up a Docker Swarm?
managers and workers
What does # syntax=docker/dockerfile:1 as the first line of a Dockerfile tell the Docker builder?
which Dockerfile syntax to use
the latest release of the version 1 syntax
What do anti-affinity rules do?
Anti-affinity rules ensure that selected virtual machines are not placed on the same host.
They spread VMs across different hosts so that a single host failure does not take them all down at once.
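A minimal sketch of what such a rule enforces, assuming a simple one-VM-per-host placement. The function and the host and VM names are hypothetical, not any hypervisor's API. It places each VM of the group on a distinct host and then shows that losing one host affects only one VM.

```python
def place_with_anti_affinity(vms, hosts):
    """Assign each VM to a different host, or fail if there are too few hosts."""
    if len(vms) > len(hosts):
        raise ValueError("not enough hosts to keep every VM on a separate host")
    return dict(zip(vms, hosts))


placement = place_with_anti_affinity(
    ["db-1", "db-2", "db-3"],
    ["host-a", "host-b", "host-c", "host-d"],
)

# Simulate the failure of a single host and see which VMs survive.
failed_host = "host-b"
survivors = [vm for vm, host in placement.items() if host != failed_host]
```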
What is Kubernetes?
a container management and orchestration tool
κυβερνήτης (Greek) = helmsman
What is the core function of Kubernetes?
to run and coordinate containerized applications across a cluster of machines
What happens to the pods of a deployment during a rolling update in Kubernetes?
During a rolling update, Kubernetes gradually replaces the existing pods with new pods, creating new ones and terminating old ones in small batches so the application remains available.
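The batching can be sketched as a toy simulation, assuming new pods become ready immediately. The function below is hypothetical and is not how the Deployment controller is implemented; its max_surge and max_unavailable parameters are modeled on the Deployment fields maxSurge and maxUnavailable. It records the number of old and new pods after every step.

```python
def rolling_update(replicas, max_surge=1, max_unavailable=0):
    """Return the (old_pods, new_pods) counts after each step of the update."""
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("max_surge and max_unavailable cannot both be 0")
    old, new = replicas, 0
    history = [(old, new)]
    while old > 0 or new < replicas:
        # Create new pods without exceeding replicas + max_surge in total.
        can_create = min(replicas - new, replicas + max_surge - (old + new))
        new += max(can_create, 0)
        history.append((old, new))
        # Terminate old pods without dropping below replicas - max_unavailable.
        can_remove = min(old, (old + new) - (replicas - max_unavailable))
        old -= max(can_remove, 0)
        history.append((old, new))
    return history


history = rolling_update(replicas=3)
```

With three replicas and the default arguments, the total pod count stays between 3 and 4 throughout, which is why the application remains available during the update.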
What is the main responsibility of the kubelet in Kubernetes?
The kubelet runs on each node and monitors the pods scheduled to that node. It continuously checks their actual state and works to keep them matching their desired state (for example, by starting or restarting containers when needed).
When a node stops sending kubelet heartbeats, the control plane marks the node as NotReady, and controllers recreate the affected Pods on other healthy nodes.
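The kubelet's desired-versus-actual comparison can be sketched as one pass of a reconcile function. This is a simplified illustration with hypothetical names and state values, not the kubelet's real code or API. Given the containers that should be running and their observed states, it returns the actions needed to close the gap.

```python
def reconcile(desired, actual):
    """Return the actions that bring the observed state to the desired state."""
    actions = []
    # Start (or restart) every desired container that is not running.
    for name in sorted(desired):
        if actual.get(name) != "running":
            actions.append(("start", name))
    # Stop every container that is present but no longer desired.
    for name in sorted(actual):
        if name not in desired:
            actions.append(("stop", name))
    return actions


desired = {"web", "sidecar"}
actual = {"web": "running", "sidecar": "exited", "old-job": "running"}
actions = reconcile(desired, actual)
```

In a real system this comparison runs repeatedly, so a container that exits later is picked up on a subsequent pass.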
In Kubernetes, what are the primary responsibilities of the control plane (legacy term: master node), which acts as the cluster's brain and main gateway?
The control plane exposes the Kubernetes API, stores and manages the cluster state, schedules pods onto worker nodes, and runs controllers that monitor node and workload health and automatically react to changes (for example, replacing failed pods or handling node failures).
In the context of Kubernetes, what is etcd and what is it used for?
etcd is a persistent, strongly consistent, distributed key–value store used by Kubernetes as the primary data store for all cluster configuration and state.
With which command-line client do you typically access and manage a Kubernetes cluster?
kubectl
the Kubernetes command-line client
to access and manage a cluster from a local machine
In Kubernetes, what are worker nodes responsible for?
Worker nodes host and run pods (the application workloads) and provide the local services (kubelet, container runtime, kube-proxy) that communicate with the control plane and handle node-level networking.
Which architecture pattern does Docker use?
a) Master-slave
b) Event-bus
c) Client-server
d) Model-view-controller
The correct answer is (c) Client-server.
Docker is built on a client-server architecture.
server: the Docker daemon, the persistent background process (often named dockerd)
client: the primary way most users interact with Docker, the Docker CLI or docker command
(When we type a command like docker run or docker build, the client translates that command into a REST API request and sends it to the Docker daemon.)
The control plane in a Kubernetes architecture is also known as…
a) …pod
b) …etcd
c) …API Server
d) …master node
The correct answer is (d) …master node.
The term “master node” is the older terminology used in Kubernetes.
The control plane refers to the set of components that act as the “brain” of the Kubernetes cluster. It is responsible for making global decisions about the cluster (like scheduling pods) and maintaining the cluster’s desired state.
What does the term overhead refer to in data engineering?
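the extra resources (such as compute time, memory, bandwidth, or storage) that a system consumes beyond what the useful work itself requires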