What is Kafka
Apache Kafka is an open-source distributed event streaming platform that can be used either as a message queue or as a stream processing system.
Kafka as Message Queue
Kafka behaves like a publish-subscribe messaging system (like RabbitMQ or ActiveMQ):
Producer: Sends messages (events) to a Kafka topic.
Consumer: Reads messages from the topic.
Kafka differs from traditional queues because messages are not deleted after consumption; they are retained for a configurable time (e.g., 7 days). As a result, multiple consumers can independently read the same stream, each at its own pace.
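To make the "retained, not deleted" behaviour concrete, here is a minimal in-memory sketch. It is not the Kafka client API: the TopicLog class and its send and poll methods are hypothetical names chosen for illustration, and time-based retention is left out.

```python
from dataclasses import dataclass, field


@dataclass
class TopicLog:
    """Toy stand-in for a single topic (hypothetical, not the Kafka API)."""
    records: list = field(default_factory=list)   # append-only log of messages
    offsets: dict = field(default_factory=dict)   # next offset to read, per consumer

    def send(self, message):
        # Producer side: append the message to the end of the log.
        self.records.append(message)

    def poll(self, consumer_id):
        # Consumer side: return every record this consumer has not seen yet.
        # Reading only advances this consumer's offset; nothing is deleted.
        start = self.offsets.get(consumer_id, 0)
        batch = self.records[start:]
        self.offsets[consumer_id] = len(self.records)
        return batch


log = TopicLog()
log.send("goal: team A")
log.send("yellow card: player 7")

website_batch = log.poll("website")      # first reader
analytics_batch = log.poll("analytics")  # second reader sees the same records
second_poll = log.poll("website")        # nothing new for "website"
```

Both readers receive the same two messages because each one tracks its own position in the log, which is the property that separates this model from a queue that removes a message once it is consumed.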
Kafka as Stream Processor
Kafka supports real-time processing of streams of data using:
Kafka Streams API (native Java library)
ksqlDB (SQL-like interface for stream processing)
You can:
Transform, filter, or join streams in real time.
Aggregate values over time (e.g., count events per minute).
Example: Monitor user clickstreams in real-time to generate live metrics or detect fraud.
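The "count events per minute" aggregation can be sketched in plain Python as a tumbling-window count. This only illustrates the idea and does not use Kafka Streams or ksqlDB; the count_per_minute function and the (timestamp, payload) event shape are assumptions made for the example.

```python
from collections import Counter


def count_per_minute(events):
    """Count events in one-minute tumbling windows.

    `events` is an iterable of (timestamp_seconds, payload) pairs.
    Returns a dict mapping each window's start time to its event count.
    """
    counts = Counter()
    for timestamp, _payload in events:
        # Round the timestamp down to the start of its one-minute window.
        window_start = (timestamp // 60) * 60
        counts[window_start] += 1
    return dict(counts)


clicks = [(0, "home"), (15, "cart"), (59, "checkout"), (60, "home"), (125, "cart")]
per_minute = count_per_minute(clicks)
# per_minute == {0: 3, 60: 1, 120: 1}
```

A real stream processor would emit these counts continuously as events arrive rather than over a finished list, but the windowing arithmetic is the same.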
Stream Processor vs Message Queue
Use a message queue to deliver events reliably between services.
Use a stream processor to derive real-time insights or decisions from the event stream.
Stream processor example: a service monitors the stream of “OrderPlaced” events to:
Count orders per minute
Detect fraud based on pattern
Trigger real-time recommendations
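As a rough illustration of pattern-based detection on an "OrderPlaced" stream, the sketch below flags users who place too many orders inside a sliding time window. The rule itself, the thresholds WINDOW_SECONDS and MAX_ORDERS, and the (timestamp, user_id) event shape are all hypothetical; the notes do not say which fraud pattern is meant. Events are assumed to arrive in timestamp order.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60  # hypothetical sliding-window length
MAX_ORDERS = 3       # hypothetical limit of orders per user per window


def suspicious_users(order_events):
    """Return the users who exceed MAX_ORDERS within WINDOW_SECONDS.

    `order_events` is an iterable of (timestamp_seconds, user_id) pairs,
    assumed to be sorted by timestamp.
    """
    recent = defaultdict(deque)  # user_id -> timestamps still inside the window
    flagged = set()
    for timestamp, user_id in order_events:
        window = recent[user_id]
        window.append(timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while window and timestamp - window[0] >= WINDOW_SECONDS:
            window.popleft()
        if len(window) > MAX_ORDERS:
            flagged.add(user_id)
    return flagged


orders = [(0, "alice"), (5, "bob"), (10, "alice"), (20, "alice"), (30, "alice"), (200, "bob")]
flagged = suspicious_users(orders)
# flagged == {"alice"}
```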
World Cup Scenario
We run a website that provides real-time statistics on the matches. Each time a goal is scored, a player is booked, or a substitution is made, we want to update our website with the latest information.
Events are placed on a queue when they occur. We call the server or process responsible for putting these events on the queue the producer. Downstream, we have a server that reads events off the queue and updates the website. We call this the consumer.
World Cup Scaling System
We need to scale the system by adding more servers to distribute our queue. But how do we ensure that the events are still processed in order?
If we were to randomly distribute the events across the servers, we would have a mess on our hands. Goals would be scored before the match even started, and players would be booked for fouls they haven’t committed yet.
A logical solution is to distribute the items in the queue based on the game they are associated with. This way, all events for a single game are processed in order because they exist on the same queue. This is one of the fundamental ideas behind Kafka: messages sent and received through Kafka require a user-specified distribution strategy.
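A small sketch of distributing events by game: each event carries a match ID as its key, and a hash of that key picks the queue. The names partition_for and NUM_PARTITIONS are hypothetical, and zlib.crc32 is used only because it is a stable hash in the Python standard library; Kafka's own default partitioner uses a different hash function.

```python
import zlib

NUM_PARTITIONS = 3  # hypothetical number of queues (partitions)


def partition_for(key, num_partitions=NUM_PARTITIONS):
    # The same key always hashes to the same partition, so all events
    # for one match land on the same queue.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


events = [
    ("match-1", "kickoff"),
    ("match-2", "kickoff"),
    ("match-1", "goal"),
    ("match-2", "yellow card"),
    ("match-1", "full time"),
]

partitions = {p: [] for p in range(NUM_PARTITIONS)}
for match_id, event in events:
    partitions[partition_for(match_id)].append((match_id, event))


def events_for(match_id):
    # Read one match's events back from its partition, in arrival order.
    return [e for m, e in partitions[partition_for(match_id)] if m == match_id]
```

Two different matches may share a partition, which is fine: ordering only has to hold among events for the same match, and it does because a partition is appended to in arrival order.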
Kafka Fundamental Idea
Messages sent and received through Kafka require a user-specified distribution strategy.
Consumer Group
Consumer groups are used to scale consumption across several consumers. Within a consumer group, each event is handled by only one consumer in the group, so the work is split rather than duplicated.
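One way to picture how a group splits the work is to give each queue (partition) exactly one owner within the group. The sketch below uses a simplified round-robin assignment; assign_partitions is a hypothetical helper, and real Kafka assignment strategies also handle consumers joining and leaving, which is omitted here.

```python
def assign_partitions(partitions, consumers):
    """Give each partition exactly one owning consumer, round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(sorted(partitions)):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


assignment = assign_partitions([0, 1, 2, 3], ["consumer-a", "consumer-b"])
# assignment == {"consumer-a": [0, 2], "consumer-b": [1, 3]}
```

Because every partition has a single owner, an event stored in that partition is read by only one member of the group.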
World Cup - Topic
Suppose we want to expand our hypothetical World Cup to more sports, like basketball. We don’t want our soccer website to cover basketball events, and we don’t want our basketball website to cover soccer events, so we introduce the concept of topics. Each event is associated with a topic, and consumers can subscribe to specific topics. Our consumers that update the soccer website subscribe only to the soccer topic, and our consumers that update the basketball website subscribe only to the basketball topic.
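Topic-based separation can be sketched with a dictionary of per-topic event lists. This is a toy model rather than the Kafka API: publish and read_all are hypothetical helper names, and the topic names follow the soccer and basketball example.

```python
from collections import defaultdict

topics = defaultdict(list)  # topic name -> append-only list of events


def publish(topic, event):
    # Producer side: every event is written to exactly one topic.
    topics[topic].append(event)


def read_all(subscribed_topics):
    # Consumer side: read only from the topics this consumer subscribed to.
    return [event for topic in subscribed_topics for event in topics[topic]]


publish("soccer", "goal")
publish("basketball", "three-pointer")
publish("soccer", "substitution")

soccer_site = read_all(["soccer"])
basketball_site = read_all(["basketball"])
# soccer_site == ["goal", "substitution"]
# basketball_site == ["three-pointer"]
```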