Kafka Design Patterns Flashcards

(6 cards)

1
Q

Reference Link

A

https://dzone.com/refcardz/apache-kafka-patterns-and-anti-patterns

2
Q

Kafka Client API – Producer Pattern

A

Goal
While producing a message, you want to ensure that it has been sent to Kafka.
Pattern
Use the acks=all configuration for the producer.
Anti-Pattern
Using the default configuration (acks=1).
If you set acks=all (or -1), your application only receives a successful confirmation when all the in-sync replicas in the cluster have acknowledged the message.
If you don’t provide one explicitly, acks=1 is used by default. The client application receives an acknowledgment as soon as the leader node receives the message and writes it to its local log. If the leader node fails before the message has been replicated to the follower nodes, the message is lost.
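As a sketch, the pattern boils down to a single producer setting. The example below assumes the Python confluent-kafka client; the broker address and topic name are placeholders, not from the source:

```python
# Producer configuration for the acks=all pattern.
producer_conf = {
    "bootstrap.servers": "localhost:9092",  # placeholder broker address
    "acks": "all",  # wait for all in-sync replicas, not just the leader
}

# With confluent-kafka installed, the configuration would be used roughly as:
# from confluent_kafka import Producer
# producer = Producer(producer_conf)
# producer.produce("my-topic", value=b"payload",
#                  on_delivery=lambda err, msg: print(err or msg.offset()))
# producer.flush()  # block until outstanding messages are acknowledged
```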

3
Q

No Duplicates Pattern

A

Goal
Your application cannot tolerate duplicate messages, so the producer needs to be idempotent.
Pattern
Set enable.idempotence=true.
Anti-Pattern
Using the default configuration (enable.idempotence=false).
The producer application may end up sending the same message to Kafka more than once. Imagine a scenario where the message is actually received by the leader (and replicated to the in-sync replicas if acks=all is used), but the application does not receive the acknowledgment from the leader due to a request timeout, or because the leader node crashed. The producer will retry sending the message; if the retry succeeds, you end up with duplicate messages in Kafka.
The Producer API provides a simple way to avoid this by using the enable.idempotence property (which is set to false by default). When set to true, the producer attaches a sequence number to every message. This is validated by the broker so that a message with a duplicate sequence number will get rejected.
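A minimal configuration sketch for this pattern, again assuming the Python confluent-kafka client (the broker address is a placeholder):

```python
# Idempotent producer configuration: the producer attaches sequence
# numbers to messages so the broker can reject duplicates on retry.
idempotent_conf = {
    "bootstrap.servers": "localhost:9092",  # placeholder broker address
    "enable.idempotence": True,  # default is false; also implies acks=all
}

# Usage mirrors a regular producer:
# from confluent_kafka import Producer
# producer = Producer(idempotent_conf)
```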

4
Q

Kafka Client API – Consumer Group Pattern

A

Goal
Scale out your data processing pipeline.
Pattern
Run multiple instances of your consumer application.
Anti-Pattern
Running more consumer instances than there are topic partitions.
A Kafka consumer group is a set of consumers that ingest data from one or more topics. The topic partitions are load-balanced among consumers in the group. This load distribution is managed on the fly when new consumer instances are added or removed from a consumer group. For example, if there are ten topic partitions and five consumers in a consumer group for that topic, Kafka will make sure that each consumer instance receives data from two topic partitions of the topic.
Keep in mind: if you end up with more consumer instances than partitions, the extra instances remain idle and do not participate in processing data from Kafka.
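Concretely, the pattern is just a shared group.id across instances. A sketch with the Python confluent-kafka client (the broker address, group name, and topic are hypothetical):

```python
# Consumer-group configuration: every instance started with the same
# group.id joins one group, and Kafka balances partitions among them.
consumer_conf = {
    "bootstrap.servers": "localhost:9092",  # placeholder broker address
    "group.id": "order-processors",         # hypothetical group name
    "auto.offset.reset": "earliest",        # start point when no offset is committed
}

# Starting this same program N times yields N members of "order-processors";
# with 10 partitions and 5 instances, each instance is assigned 2 partitions.
# from confluent_kafka import Consumer
# consumer = Consumer(consumer_conf)
# consumer.subscribe(["orders"])  # hypothetical topic
```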

5
Q

Kafka - Data Duplication and Data Loss Prevention

A

Goal
Avoid duplicates and/or data loss while processing data from Kafka.
Pattern
Set enable.auto.commit to false and use manual offset management.
Anti-Pattern
Using default configuration with automatic offset management.
Consumers acknowledge the receipt (and processing) of messages by committing the offset of the last message they have read. By default, enable.auto.commit is set to true for consumer apps, which means the offsets are committed automatically and asynchronously (for example, by a background thread in the Java consumer client) at regular intervals (defined by the auto.commit.interval.ms property, which defaults to 5 seconds). While this is convenient, it allows for data loss and/or duplicate message processing.
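A sketch of the manual-commit pattern with the Python confluent-kafka client (the broker address, group name, topic, and process() helper are all hypothetical):

```python
# Manual offset management: disable auto-commit and commit only after a
# message has actually been processed (at-least-once semantics).
manual_commit_conf = {
    "bootstrap.servers": "localhost:9092",  # placeholder broker address
    "group.id": "order-processors",         # hypothetical group name
    "enable.auto.commit": False,            # take over offset management
}

# Poll/process/commit loop:
# from confluent_kafka import Consumer
# consumer = Consumer(manual_commit_conf)
# consumer.subscribe(["orders"])            # hypothetical topic
# while True:
#     msg = consumer.poll(1.0)
#     if msg is None or msg.error():
#         continue
#     process(msg.value())                  # hypothetical processing step
#     consumer.commit(msg, asynchronous=False)  # commit only after processing
```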

6
Q

Kafka Data Loss and Duplication Prevention - Explicitly set enable.auto.commit to false

A

Duplicate messages: Consider a scenario where the consumer app has read and processed messages from offsets 198, 199, and 200 of a topic partition, and the automatic commit process successfully committed offset 198 but then crashed or shut down. This triggers a rebalance to another consumer app instance (if available), which looks up the last committed offset, in this case 198. Hence, the messages at offsets 199 and 200 will be redelivered to the consumer app and processed a second time.

Data loss: The consumer app has read the messages at offsets 198, 199, and 200. The auto-commit process commits these offsets before the application has actually processed the messages (perhaps applying some transformation and storing the result in a downstream system), and then the consumer app crashes. The new consumer app instance will see that the last committed offset is 200 and will continue reading new messages from there. The messages at offsets 198, 199, and 200 are effectively lost.
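The two scenarios above differ only in where the crash falls relative to the commit. A toy replay of both, independent of any Kafka client:

```python
# Toy replay of the two failure modes for offsets 198-200.
offsets = [198, 199, 200]

# Duplicate scenario: all three were processed, but only 198 was
# committed before the crash, so the next instance resumes after 198.
last_committed = 198
redelivered = [o for o in offsets if o > last_committed]
assert redelivered == [199, 200]  # 199 and 200 get processed twice

# Data-loss scenario: auto-commit recorded 200 before processing
# finished, so the next instance resumes after 200.
last_committed = 200
redelivered = [o for o in offsets if o > last_committed]
assert redelivered == []  # 198-200 are never actually processed
```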
