Kafka Interview Flashcards

(50 cards)

1
Q

What is Apache Kafka?

A

Kafka is a distributed event streaming platform used to publish, subscribe, store, and process data streams in real time.

2
Q

Why do we use Kafka?

A

Kafka is used for handling high-throughput, real-time data pipelines and decoupling systems for reliable message delivery.

3
Q

What are the main components of Kafka?

A

Producer, Consumer, Broker, Topic, Partition, and Offset.

4
Q

What is a Kafka topic?

A

A topic is like a channel or category where messages are published; topics are divided into partitions for parallel processing.

5
Q

What is a Kafka partition?

A

A partition is a unit of parallelism that stores ordered messages and is replicated across brokers for fault tolerance.

6
Q

What is an offset?

A

An offset is a sequential ID assigned to each message within a partition; consumers use offsets to track their read position.

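The offset idea from the card above can be sketched in plain Python, with no Kafka client needed. `PartitionLog` is a hypothetical illustration, not a Kafka API:

```python
class PartitionLog:
    """Toy model of one Kafka partition: an append-only list of messages,
    where each message's index in the list is its offset."""

    def __init__(self):
        self.messages = []

    def append(self, message):
        offset = len(self.messages)   # offsets are assigned sequentially
        self.messages.append(message)
        return offset

    def read_from(self, offset):
        # A consumer reads forward from its last known position.
        return self.messages[offset:]


log = PartitionLog()
for event in ["order-1", "order-2", "order-3"]:
    log.append(event)

# A consumer that has committed offset 1 resumes from index 1.
print(log.read_from(1))   # ['order-2', 'order-3']
```

Because the log is append-only, a committed offset is all a consumer needs to resume exactly where it left off.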
7
Q

What is a broker?

A

A broker is a Kafka server that stores and serves data to producers and consumers.

8
Q

What is a producer?

A

A producer sends messages to Kafka topics.

9
Q

What is a consumer?

A

A consumer subscribes to Kafka topics and reads messages.

10
Q

How does Kafka ensure fault tolerance?

A

By replicating partitions across multiple brokers so another replica can take over if one fails.

11
Q

How is Kafka different from RabbitMQ or traditional message queues?

A

Kafka is distributed, log-based, supports replaying messages, and handles very high throughput compared to queue-based systems like RabbitMQ.

12
Q

How did you use Kafka in your project?

A

We used Kafka to stream data from our source to a database; producers sent messages to topics and consumers wrote them into the DB.

13
Q

Why did you choose Kafka instead of writing directly to the database?

A

Kafka decoupled data ingestion from DB writes, provided buffering, and ensured no data loss if the DB was temporarily unavailable.

14
Q

How did your producer and consumer communicate?

A

The producer published to a Kafka topic and the consumer subscribed to that topic to fetch, process, and store data.

15
Q

How did you ensure no duplicate or missing data?

A

We committed consumer offsets only after successful DB writes (at-least-once delivery, so nothing is lost) and kept the writes idempotent so redelivered messages did not create duplicates.

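The commit-after-write pattern from this card can be simulated without a real broker. `fake_partition`, `fake_db`, and `committed_offset` below are illustrative stand-ins, not Kafka APIs:

```python
fake_partition = ["evt-a", "evt-b", "evt-c"]   # stand-in for a Kafka partition
fake_db = []                                   # stand-in for the sink database
committed_offset = 0                           # last safely processed position

def write_to_db(message):
    fake_db.append(message)                    # pretend this is a DB insert

while committed_offset < len(fake_partition):
    message = fake_partition[committed_offset]
    write_to_db(message)                       # 1) persist first...
    committed_offset += 1                      # 2) ...then commit the offset

# If the process crashed between steps 1 and 2, the message would be
# re-read and re-written on restart -- hence "at-least-once".
print(fake_db)   # ['evt-a', 'evt-b', 'evt-c']
```

Reversing the two steps (commit first, write second) would instead give at-most-once behavior: a crash between them silently drops a message.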
16
Q

Which library or client did you use for Kafka?

A

We used a Python Kafka client library (kafka-python or confluent-kafka) to create producers and consumers.

17
Q

What challenges did you face using Kafka?

A

We faced consumer lag and partition configuration issues; tuning batch sizes and partitioning improved throughput.

18
Q

How did you test your Kafka setup?

A

By producing and consuming sample messages using Kafka CLI tools and verifying the data flow to the database.

19
Q

What are consumer groups?

A

A consumer group is a set of consumers sharing the load of reading from a topic, with each partition assigned to one consumer per group.

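The one-partition-per-consumer rule can be sketched with a simplified round-robin assignment. `assign_round_robin` is a toy illustration; Kafka's real assignors (range, sticky, cooperative) are more involved:

```python
def assign_round_robin(partitions, consumers):
    """Toy round-robin assignment: every partition goes to exactly one
    consumer in the group (simplified from Kafka's real assignors)."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment

print(assign_round_robin([0, 1, 2, 3], ["c1", "c2"]))
# {'c1': [0, 2], 'c2': [1, 3]}
```

Note the consequence: with more consumers than partitions, the extra consumers sit idle, which is why partition count caps a group's parallelism.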
20
Q

What are message delivery semantics?

A

Kafka supports at-most-once, at-least-once, and exactly-once delivery semantics depending on offset handling and configurations.

21
Q

How does Kafka achieve high throughput?

A

Through sequential disk writes, zero-copy I/O, and batching messages for efficiency.

22
Q

What is Kafka Connect?

A

Kafka Connect is a tool to integrate Kafka with external systems like databases or file systems without custom code.

23
Q

What is Kafka Streams?

A

Kafka Streams is a client library for building real-time stream processing applications on top of Kafka topics.

24
Q

What is replication factor?

A

It’s the number of copies of each partition maintained across brokers to ensure reliability.

25
Q

What happens if a broker goes down?

A

Leadership for its partitions shifts to another in-sync replica so message flow continues without interruption.

26
Q

How does Kafka ensure message ordering?

A

Kafka guarantees ordering of messages within a single partition but not across multiple partitions.

27
Q

How do you decide the number of partitions for a topic?

A

Based on expected data volume and parallelism; more partitions allow more consumers to process data in parallel.

28
Q

If you had to improve your Kafka setup, what would you do?

A

Monitor consumer lag, optimize batch size, adjust partitions, and ensure balanced load among consumers.

29
Q

If the consumer lags behind, how would you debug it?

A

Check consumer group offsets, partition lag, network bottlenecks, and processing delays in the consumer.

30
Q

Can Kafka integrate with databases like MySQL or MongoDB?

A

Yes, using Kafka Connect or custom consumers to read from topics and write to databases.

31
Q

What is the difference between a topic and a partition?

A

A topic is a logical stream of data, while partitions are physical subsets of that stream for scalability.

32
Q

What is the replication leader and follower concept?

A

Each partition has one leader handling reads/writes and multiple followers replicating the data for failover.
33
Q

What is ISR (In-Sync Replica)?

A

ISR is the set of replicas that are fully caught up with the leader and eligible to take over if the leader fails.

34
Q

What is the role of ZooKeeper or KRaft in Kafka?

A

They manage cluster metadata, broker coordination, and leader election; KRaft replaces ZooKeeper in newer versions.
35
Q

What is the producer acknowledgment (acks) setting?

A

It controls when the producer considers a message sent — acks=0 (no wait), acks=1 (leader only), acks=all (leader plus all in-sync replicas).
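The three acks levels can be written as small config fragments in the dict style used by the confluent-kafka Python client (the `acks` property name is standard; the variable names here are illustrative):

```python
# Durability vs. latency trade-off, weakest to strongest:
fire_and_forget = {"acks": 0}       # don't wait for any acknowledgment
leader_only     = {"acks": 1}       # wait for the partition leader only
full_durability = {"acks": "all"}   # wait for leader + all in-sync replicas

# Higher acks = stronger durability guarantees, but higher produce latency.
```

acks=all is typically paired with a topic-level min.insync.replicas setting so "all" actually means more than one broker.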
36
Q

What is a consumer offset commit?

A

It’s the process of saving the last-read message position so consumers resume correctly after a restart.

37
Q

How do producers decide which partition to send to?

A

Either round-robin/randomly for keyless messages, or based on a key so related messages go to the same partition.
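Key-based partitioning can be sketched in a few lines. Kafka's default partitioner actually hashes the key bytes with murmur2; `choose_partition` below uses crc32 only to illustrate the idea that the same key always maps to the same partition:

```python
import random
import zlib

def choose_partition(key, num_partitions):
    """Sketch of key-based partitioning (illustrative, not Kafka's real
    murmur2 partitioner): same key -> same hash -> same partition."""
    if key is None:
        # Keyless messages get spread out (Kafka uses sticky/round-robin).
        return random.randrange(num_partitions)
    return zlib.crc32(key.encode()) % num_partitions

# All events for one user land in one partition, preserving their order.
p1 = choose_partition("user-42", 6)
p2 = choose_partition("user-42", 6)
print(p1 == p2)   # True
```

This is why per-key ordering survives scaling: ordering is only guaranteed within a partition, and keying pins each entity to one partition.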
38
Q

Can multiple consumers read the same data?

A

Yes, if they are in different consumer groups; each group gets its own copy of the data.

39
Q

What is message retention in Kafka?

A

Kafka retains messages for a configured period or size limit, even after consumers have read them.

40
Q

What is a real-world use case of Kafka?

A

Real-time analytics, event-driven microservices, log aggregation, or streaming IoT sensor data.

41
Q

How does Kafka handle back-pressure or slow consumers?

A

Messages remain in the log; consumers can catch up later using offsets, and producers can be throttled if necessary.

42
Q

How would you describe Kafka to a non-technical person?

A

Kafka is like a high-speed postal system that lets applications send and receive data reliably in real time.

43
Q

What happens when a new consumer joins a group?

A

Kafka triggers a rebalance so partitions are reassigned among all consumers in the group.

44
Q

How can you achieve exactly-once processing?

A

By using idempotent producers and transactional writes, introduced in Kafka 0.11+.

45
Q

How can Kafka be scaled horizontally?

A

By adding more brokers and increasing partitions to distribute data and load.
46
Q

What is consumer lag?

A

The difference between the latest offset in a partition and the consumer’s last committed offset; it shows how far behind the consumer is.
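The lag definition above is literally a subtraction; `consumer_lag` is a hypothetical helper just to make the formula concrete:

```python
def consumer_lag(latest_offset, committed_offset):
    """Lag = how many messages the consumer still has to catch up on."""
    return latest_offset - committed_offset

# Partition holds messages up to offset 1000; consumer committed 940.
print(consumer_lag(1000, 940))   # 60
```

In practice this per-partition number comes from tools like kafka-consumer-groups or from broker metrics; a steadily growing lag means consumers can't keep up with producers.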
47
Q

What metrics would you monitor in a Kafka setup?

A

Producer throughput, consumer lag, broker health, partition replication status, and disk usage.

48
Q

What’s the difference between Kafka and a database?

A

Kafka is for streaming and event transport; it doesn’t store relational data or provide queries like a database.

49
Q

In your project, what database did you use with Kafka?

A

We used [your DB name] as the sink database for storing streamed data from Kafka topics.

50
Q

What did you personally learn from using Kafka?

A

I learned how to design event-driven systems, configure producers/consumers, and manage real-time data streaming efficiently.