Basics Flashcards

(10 cards)

1
Q

What is Apache Kafka?

A

Apache Kafka is an open-source distributed event streaming platform that can be used either as a message queue or as a stream processing system.

2
Q

Kafka as Message Queue

A

Kafka behaves like a publish-subscribe messaging system (like RabbitMQ or ActiveMQ):
Producer: Sends messages (events) to a Kafka topic.
Consumer: Reads messages from the topic.
Kafka differs from traditional queues in that messages are not deleted after consumption. They are retained for a configurable time (e.g., 7 days), so multiple consumers can independently read the same stream.
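
The retention idea can be sketched with a toy in-memory log. This is not Kafka's client API: the `TopicLog` class, the consumer names, and the message strings are all hypothetical, and time-based expiry is left out. It only shows that reading does not remove messages and that each consumer tracks its own position.

```python
class TopicLog:
    """Toy append-only log (hypothetical; not Kafka's client API)."""

    def __init__(self):
        self.messages = []  # messages stay here after being read
        self.offsets = {}   # consumer name -> next position to read

    def produce(self, message):
        self.messages.append(message)

    def consume(self, consumer):
        """Return this consumer's unread messages and advance its offset."""
        start = self.offsets.get(consumer, 0)
        self.offsets[consumer] = len(self.messages)
        return self.messages[start:]


log = TopicLog()
log.produce("order-1")
log.produce("order-2")

billing_view = log.consume("billing")      # reads both messages
analytics_view = log.consume("analytics")  # reads the same two messages
billing_again = log.consume("billing")     # nothing new for billing
```

Both consumers see the full stream because each has its own offset, and the log still holds both messages afterwards.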

3
Q

Kafka as Stream Processor

A

Kafka supports real-time processing of data streams using:
Kafka Streams API (native Java library)
ksqlDB (SQL-like interface for stream processing)

You can:
Transform, filter, or join streams in real time.
Aggregate values over time (e.g., count events per minute).

Example: Monitor user clickstreams in real time to generate live metrics or detect fraud.
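
As a rough illustration of "count events per minute", here is a plain Python sketch. It does not use Kafka Streams or ksqlDB; the `count_per_minute` function and the sample click data are hypothetical, and a real stream processor would update these counts continuously rather than over a finished list.

```python
from collections import Counter


def count_per_minute(events):
    """Count events in fixed one-minute windows.

    events: iterable of (timestamp_in_seconds, payload) pairs.
    Returns a Counter keyed by the window's start time in seconds.
    """
    counts = Counter()
    for timestamp, _payload in events:
        window_start = timestamp - (timestamp % 60)
        counts[window_start] += 1
    return counts


# Hypothetical clickstream: (seconds since start, page)
clicks = [(0, "home"), (15, "cart"), (59, "pay"), (60, "home"), (125, "cart")]
per_minute = count_per_minute(clicks)
```

Here the first minute holds three clicks, and the second and third minutes hold one each.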

4
Q

Stream Processor vs Message Queue

A

Use a message queue to deliver events reliably between services.
Use a stream processor to derive real-time insights or decisions from the event stream.

Example (stream processor): a service monitors the stream of “OrderPlaced” events to:
Count orders per minute
Detect fraud based on patterns
Trigger real-time recommendations

5
Q

World Cup Scenario

A

Imagine we run a website that provides real-time statistics on World Cup matches. Each time a goal is scored, a player is booked, or a substitution is made, we want to update our website with the latest information.
Events are placed on a queue when they occur. We call the server or process responsible for putting these events on the queue the producer. Downstream, we have a server that reads events off the queue and updates the website. We call this the consumer.
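
The producer/consumer split above can be sketched with a single in-memory queue. Everything here is a stand-in: the `produce` and `consume_all` helpers, the `website` list, and the event fields are hypothetical and only illustrate the flow, not Kafka itself.

```python
from collections import deque

queue = deque()  # the single queue of match events
website = []     # stands in for the live statistics page


def produce(event):
    """Producer side: put an event on the queue as it happens."""
    queue.append(event)


def consume_all():
    """Consumer side: read events in arrival order and update the site."""
    while queue:
        website.append(queue.popleft())


produce({"match": "match-1", "type": "goal", "minute": 12})
produce({"match": "match-1", "type": "booking", "minute": 30})
produce({"match": "match-1", "type": "substitution", "minute": 61})
consume_all()
```

With one queue and one consumer, the website receives events in exactly the order they were produced.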

6
Q

World Cup Scaling System

A

We need to scale the system by adding more servers to distribute our queue. But how do we ensure that the events are still processed in order?
If we were to randomly distribute the events across the servers, we would have a mess on our hands. Goals would be scored before the match even started, and players would be booked for fouls they haven’t committed yet.
A logical solution is to distribute the items in the queue based on the game they are associated with. This way, all events for a single game are processed in order because they exist on the same queue. This is one of the fundamental ideas behind Kafka: messages sent and received through Kafka require a user-specified distribution strategy.
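
A minimal sketch of distributing by game, assuming three queues and a game identifier as the key. The `partition_for` and `events_for` helpers and the sample events are hypothetical, and CRC32 is only a stand-in for whatever hash a real system uses.

```python
import zlib

NUM_PARTITIONS = 3  # assumed number of queues


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a key to a queue index; the same key always gets the same index."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


partitions = [[] for _ in range(NUM_PARTITIONS)]

events = [
    ("match-1", "kickoff"),
    ("match-2", "kickoff"),
    ("match-1", "goal"),
    ("match-2", "yellow card"),
    ("match-1", "full time"),
]

# Route each event to the queue chosen by its game key.
for game, what in events:
    partitions[partition_for(game)].append((game, what))


def events_for(game):
    """Read one game's events back from the single queue that holds them."""
    queue = partitions[partition_for(game)]
    return [what for g, what in queue if g == game]
```

Because every event for a game lands on the same queue, `events_for("match-1")` returns that game's events in the order they were produced, even though events from other games are interleaved in the input.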

7
Q

Kafka Fundamental Idea

A

Messages sent and received through Kafka require a user-specified distribution strategy.

8
Q

Consumer Group

A

Used to scale the consumer side. Within a consumer group, each event is handed to only one consumer in the group, so adding consumers spreads the work instead of duplicating it.
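
One way to picture this, assuming the events are already split across several queues (partitions): each partition is owned by exactly one consumer in the group. The `assign_partitions` function and the consumer names are hypothetical, and round-robin is just one simple assignment strategy chosen for the sketch.

```python
def assign_partitions(partitions, consumers):
    """Round-robin assignment: each partition goes to exactly one consumer."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Four partitions shared by a group of two consumers.
assignment = assign_partitions([0, 1, 2, 3], ["consumer-a", "consumer-b"])
```

Since no partition appears under two consumers, no event on it is read by more than one member of the group.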

9
Q

World Cup - Topic

A

Suppose we want to expand our hypothetical World Cup site to more sports, like basketball. We don't want our soccer website to cover basketball events, and we don't want our basketball website to cover soccer events. So we introduce the concept of topics. Each event is associated with a topic, and consumers can subscribe to specific topics. Our consumers that update the soccer website subscribe only to the soccer topic, and our consumers that update the basketball website subscribe only to the basketball topic.
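
Topic-based routing can be sketched as follows. The `Broker` class, the topic names, and the event strings are hypothetical. The sketch pushes events into subscriber lists for brevity, whereas Kafka consumers pull from the topic themselves.

```python
from collections import defaultdict


class Broker:
    """Toy topic router (hypothetical; not Kafka's API)."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of inboxes

    def subscribe(self, topic, inbox):
        self.subscribers[topic].append(inbox)

    def publish(self, topic, event):
        # Only inboxes subscribed to this topic receive the event.
        for inbox in self.subscribers[topic]:
            inbox.append(event)


broker = Broker()
soccer_site = []
basketball_site = []
broker.subscribe("soccer", soccer_site)
broker.subscribe("basketball", basketball_site)

broker.publish("soccer", "goal")
broker.publish("basketball", "three-pointer")
broker.publish("soccer", "substitution")
```

Each site ends up with only the events published to its own topic.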

10
Q
A