What is Apache Kafka?
Kafka is a distributed event streaming platform used to publish, subscribe, store, and process data streams in real time.
Why do we use Kafka?
Kafka is used for handling high-throughput, real-time data pipelines and decoupling systems for reliable message delivery.
What are the main components of Kafka?
Producer, Consumer, Broker, Topic, Partition, and Offset.
What is a Kafka topic?
A topic is like a channel or category where messages are published; topics are divided into partitions for parallel processing.
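The routing from message key to partition can be sketched in plain Python. Kafka's default partitioner hashes the key with murmur2; the sketch below substitutes stdlib `zlib.crc32` just to keep it self-contained, but the idea is the same: same key, same partition, which is what preserves per-key ordering.

```python
import zlib

def choose_partition(key: bytes, num_partitions: int) -> int:
    # Kafka's real default partitioner uses murmur2; crc32 stands in
    # here so the sketch stays stdlib-only.
    return zlib.crc32(key) % num_partitions

# Messages with the same key always land in the same partition.
p1 = choose_partition(b"user-42", 6)
p2 = choose_partition(b"user-42", 6)
```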
What is a Kafka partition?
A partition is a unit of parallelism that stores ordered messages and is replicated across brokers for fault tolerance.
What is an offset?
An offset is a sequential ID assigned to each message within a partition; consumers use it to track their read position.
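Since a partition is conceptually an append-only log, an offset is just a message's index in that log. This plain-Python illustration (not the client API) shows how a consumer resumes from its last committed offset:

```python
# A partition modeled as an append-only list; the offset is the index.
partition_log = []

def produce(log, message):
    log.append(message)
    return len(log) - 1  # offset of the appended message

def consume_from(log, offset):
    """Read all messages from a given offset onward."""
    return log[offset:]

for msg in ["a", "b", "c", "d"]:
    produce(partition_log, msg)

# A consumer that last committed offset 2 resumes there:
remaining = consume_from(partition_log, 2)  # ["c", "d"]
```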
What is a broker?
A broker is a Kafka server that stores and serves data to producers and consumers.
What is a producer?
A producer sends messages to Kafka topics.
What is a consumer?
A consumer subscribes to Kafka topics and reads messages.
How does Kafka ensure fault tolerance?
By replicating partitions across multiple brokers so another replica can take over if one fails.
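A toy model of failover: a partition has one leader replica and some followers, and if the leader's broker fails, an in-sync survivor is promoted (the real election runs through the Kafka controller; this only sketches the outcome).

```python
# Replica list for one partition; by convention the first entry leads.
replicas = ["broker-1", "broker-2", "broker-3"]

def elect_new_leader(replicas, failed_broker):
    # Drop the failed broker; promote the first surviving in-sync replica.
    survivors = [b for b in replicas if b != failed_broker]
    return survivors[0], survivors

leader, replicas = elect_new_leader(replicas, "broker-1")
```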
How is Kafka different from RabbitMQ or traditional message queues?
Kafka is distributed, log-based, supports replaying messages, and handles very high throughput compared to queue-based systems like RabbitMQ.
How did you use Kafka in your project?
We used Kafka to stream data from our source to a database; producers sent messages to topics and consumers wrote them into the DB.
Why did you choose Kafka instead of writing directly to the database?
Kafka decoupled data ingestion from DB writes, provided buffering, and ensured no data loss if the DB was temporarily unavailable.
How did your producer and consumer communicate?
The producer published to a Kafka topic and the consumer subscribed to that topic to fetch, process, and store data.
How did you ensure no duplicate or missing data?
We committed consumer offsets only after successful DB writes, ensuring at-least-once delivery semantics.
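This commit-after-write ordering is what makes the pipeline at-least-once. The toy model below (plain Python, not the client API) shows why: a crash after the database write but before the offset commit means the message is re-read on restart, so duplicates are possible but nothing is lost.

```python
# Toy model: commit the offset only after a successful "DB" write.
log = ["m0", "m1", "m2"]
db = []
committed_offset = 0

def run_consumer(crash_before_commit_at=None):
    global committed_offset
    offset = committed_offset
    while offset < len(log):
        db.append(log[offset])            # 1. write to the database
        if offset == crash_before_commit_at:
            return                        # crash: write done, commit lost
        committed_offset = offset + 1     # 2. commit only after the write
        offset += 1

run_consumer(crash_before_commit_at=1)  # crash after writing m1
run_consumer()                          # restart: m1 is written again
```

Reversing the two steps (commit first, then write) would give at-most-once instead: a crash between them loses the message rather than duplicating it.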
Which library or client did you use for Kafka?
We used a Python Kafka client (kafka-python; confluent-kafka is a common alternative) for creating producers and consumers.
What challenges did you face using Kafka?
We faced consumer lag and partition configuration issues; tuning batch sizes and partitioning improved throughput.
How did you test your Kafka setup?
By producing and consuming sample messages using Kafka CLI tools and verifying the data flow to the database.
What are consumer groups?
A consumer group is a set of consumers sharing the load of reading from a topic, with each partition assigned to one consumer per group.
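The "each partition assigned to one consumer per group" rule can be sketched with a round-robin assignor (Kafka ships range and round-robin strategies, among others; this is a simplified stand-in, not the rebalance protocol itself):

```python
# Round-robin assignment: every partition goes to exactly one consumer
# in the group, and the load is spread as evenly as possible.
def assign(partitions, consumers):
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

a = assign([0, 1, 2, 3, 4, 5], ["c1", "c2"])
```

Note that if the group has more consumers than partitions, the extras sit idle, since a partition is never split between two consumers of the same group.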
What are message delivery semantics?
Kafka supports at-most-once, at-least-once, and exactly-once delivery semantics depending on offset handling and configurations.
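As a rough configuration sketch, the three semantics map to client settings like these (librdkafka/confluent-kafka-style keys; exact spellings vary by client, and the `transactional.id` value is a hypothetical example):

```python
at_most_once_consumer = {
    "enable.auto.commit": True,        # offsets may be committed before
    "auto.commit.interval.ms": 5000,   # processing finishes -> possible loss
}

at_least_once_consumer = {
    "enable.auto.commit": False,       # commit manually *after* processing
}

exactly_once_producer = {
    "enable.idempotence": True,        # broker dedupes producer retries
    "transactional.id": "etl-writer",  # hypothetical id enabling transactions
}
```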
How does Kafka achieve high throughput?
Through sequential disk writes, zero-copy I/O, and batching messages for efficiency.
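The batching idea can be sketched in plain Python: the producer accumulates records until the batch fills (`batch.size`) or a time limit elapses (`linger.ms`), then sends the whole batch in one request. The sketch models only the size-based flush; the trailing partial batch is what `linger.ms` bounds in time.

```python
# Conceptual sketch of producer batching (not the client's actual buffer).
def batch_messages(messages, batch_size):
    batches, current = [], []
    for m in messages:
        current.append(m)
        if len(current) == batch_size:
            batches.append(current)  # full batch -> one send request
            current = []
    if current:
        batches.append(current)      # flush the final partial batch
    return batches

batches = batch_messages(list(range(10)), batch_size=4)
```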
What is Kafka Connect?
Kafka Connect is a tool to integrate Kafka with external systems like databases or file systems without custom code.
What is Kafka Streams?
Kafka Streams is a client library for building real-time stream processing applications on top of Kafka topics.
What is replication factor?
It’s the number of copies of each partition maintained across brokers to ensure reliability.
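A toy placement sketch: for each partition, the broker puts `rf` copies on distinct brokers, rotating the starting broker so load spreads across the cluster (Kafka's actual assignment is more involved, e.g. rack awareness, but the shape is similar):

```python
# Place `rf` replicas of each partition on distinct brokers, round-robin.
def place_replicas(num_partitions, brokers, rf):
    placement = {}
    for p in range(num_partitions):
        placement[p] = [brokers[(p + i) % len(brokers)] for i in range(rf)]
    return placement

placement = place_replicas(num_partitions=3, brokers=["b1", "b2", "b3"], rf=2)
# e.g. partition 0 -> ["b1", "b2"], partition 1 -> ["b2", "b3"], ...
```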