Kafka docs Producer possible message loss - apache-kafka

I'm currently learning more about the Kafka Producer. I am a bit puzzled by the following paragraph from the docs:
Messages written to the partition leader are not immediately readable
by consumers regardless of the producer’s acknowledgement settings.
When all in-sync replicas have acknowledged the write, then the
message is considered committed, which makes it available for reading.
This ensures that messages cannot be lost by a broker failure after
they have already been read. Note that this implies that messages
which were acknowledged by the leader only (that is, acks=1) can be
lost if the partition leader fails before the replicas have copied the
message. Nevertheless, this is often a reasonable compromise in
practice to ensure durability in most cases while not impacting
throughput too significantly.
The way I interpret this is that messages can get lost during the sync between leader and replicated brokers, i.e. messages won't be committed unless they have been successfully replicated.
I don't understand how (for example) the Java application can shield against this message loss.
Does it receive different acknowledgements between 'only-leader' and the full replication?
this is often a reasonable compromise in practice
How is that? Do they assume that you should log failed messages and re-queue them manually? Or how does that work?

"Does it receive different acknowledgements between 'only-leader' and the full replication?"
There is no difference between a leader and replica acknowledgment. You only steer the behavior of the producer through its configuration acks. If it is set to 1 it will wait only for the leader acknowledgment, if you set it to all it will wait for all replicas (based on the replication factor of the topic) before the producer considers writing the message as successful.
If you set acks=all and the synchronisation between leader and replicas fail, your producer will receive a retriable Exception (either "NotEnoughReplicasException" or "NotEnoughReplicasAfterAppendException", see more details here). Based on the producer configuration retries it will try to re-send the message. Kafka is built in a way that it expects crashed brokers to be available again (in a "short" amount of time).
In case you have set acks=1 and the synchronisation between leader and replicas fail, your producer considers the message was successfully written to the cluster and it will not try to reproduce the message. Of course the leader will continue to replicate the message to its replicas. But it is not really guaranteed that this will happen. And before the message got replicated the leader broker itself could have issues which will cause the message to be lost forever.

Related

If the partition a Kafka producer try to send messages to went offline, can the producer try to send to a different partition?

My Kafka cluster has 5 brokers and the replication factor is 3 for topics. At some time some partitions went offline but eventually they went back online. My questions are:
How many brokers were down does it indicate, given the fact that there were offline partitions? I think given the cluster setup above, I can afford to lose 2 brokers at the same time. However, if there were 2 brokers down, for some partitions they no longer have quorum; will these partitions go offline in this case?
If there are offline partitions, and a Kafka producer tries to send messages to them and fails, will the producer try a different partition that may be online? The messages have no key in them.
Not sure if I understood your question completely right but I have the impression that you are mixing up partitions and replications. Or at least, your question cannot be looked at isolated on the producer. As soon as one broker is down some things will happen on the cluster.
Each TopicPartition has one Partition Leader and your clients (e.g. Producer and Consumer) are only communicating with this one leader, independen of the number of replications.
In the case where two out of five broker are not available, Kafka will move the partition leader as well as the replicas to a healthy broker. In that scenario you should therefore not get into trouble although it might take some time and retries for the new leader to be selected and the new replications to be created on the healthy broker. A leader selection can be made fast as you have set the replication factor to three, so even if two brokers go down, one broker should still have the complete data (assuming all partitions were in-sync). However, creating two new replicas could take some time dependent on the amount of data. For that scenario you need to look into the topic level configuration min.insync.replicas and the KafkaProducer confiruation acks (see below).
I think the following are the most important configurations for your KafkaProducer to handle such situation:
bootstrap.servers: If you are anticipating regular connection problems with your brokers, you should ensure that you list all five of them. Although it is sufficient to only mention one address (as one broker will then communicate will all other brokers in the cluster) it is safe to have them all listed in case one or even two broker are not available.
acks: This defaults to 1 and defines the number of acknowledgments the producer requires the partition leader to have received before considering a request as successful. Possible values are 0, 1 and all.
retries: This value defaults to 2147483647 and will cause the client to resend any record whose send fails with a potentially transient error until the time of delivery.timeout.ms is reached
delivery.timeout.ms: An upper bound on the time to report success or failure after a call to send() returns. This limits the total time that a record will be delayed prior to sending, the time to await acknowledgement from the broker (if expected), and the time allowed for retriable send failures. The producer may report failure to send a record earlier than this config if either an unrecoverable error is encountered, the retries have been exhausted, or the record is added to a batch which reached an earlier delivery expiration deadline. The value of this config should be greater than or equal to the sum of request.timeout.ms and linger.ms.
You will find more details on the documentation on the Producer configs.

what is the purpose of kafka ack?

kafka consumer saves offsets only when it commits.
Thus , when it rise after crash , it can start from previous offset.
But what is the purpose of ack? in case of crash, ack wouldn't help(if they weren't commited)
Acks = 0
No response is requested from the broker, so if the broker goes offline or an exception happens, we will not know and will lose data. Useful for data where it's fine to potentially lose messages as metric or log collection.
The best performance is this because the producer will not wait for any confirmation.
Acks = 1 (Default)
Leader response is requested, but replication is not guarantee, this happens in background. If an acknowledgement is not received, the producer could retry without duplicate data. If the leader broker goes offline but replicas haven't replicated the data yet, we have data lost.
Acks = all
The leader and the replicas are requested by acknowledgement, this means that the producer has to wait to receive confirmation of any replica of the broker before to continue, this add latency but ensures safety, this setting ensure no lose data.

Consumer receiving messages before all replicas acknowledge to leader kafka

Let's say high watermark for topic partition is 1000 and leader, all follower replicas have same messages exactly. In this scenario, producer sends a message with acks = all and a consumer is consuming from this topic. Is there a possibility here, where a consumer fetch request will be served before other replicas fetch request?
In other words, does leader serve consumer's fetch request before it receives acknowledgements from all in-sync followers in acks = all case?
This is because in our setup, consumer received a message before followers in acks=all case.
In Kafka a message is ready to be consumed after it is added to leader broker, but if you set acks=all leader will wait all in-sync-replicas to replicate message.
Normally it is expected that all replicas of a topic would be in-sync-replicas unless there is a problem in replication process. (if some of replicas become out-of-sync, you can still continue to produce messages if you have enough replicas (min.insync.replicas) even if you set acks=all)
min.insync.replicas: When a producer sets acks to "all" (or "-1"),
min.insync.replicas specifies the minimum number of replicas that must
acknowledge a write for the write to be considered successful. If this
minimum cannot be met, then the producer will raise an exception
(either NotEnoughReplicas or NotEnoughReplicasAfterAppend).
In your case it seems there is no way to bypass replication process if you set acks=all. But you can set acks=1 if you don't want to wait for replication process. With this config a message would be available to consumers right after leader write the message to its local log. (followers will also replicate messages, but leader will not wait them) But you should consider the risk of data loss with this config.
acks=1 This will mean the leader will write the record to its local
log but will respond without awaiting full acknowledgement from all
followers. In this case should the leader fail immediately after
acknowledging the record but before the followers have replicated it
then the record will be lost
In the docs, it's clearly mentioned that the message will be ready for consumption when all in-sync replicas get the message.
Messages written to the partition leader are not immediately readable by consumers regardless of the producer’s acknowledgement settings. When all in-sync replicas have acknowledged the write, then the message is considered committed, which makes it available for reading.
I would guess that you are observing this behavior because you left the min.insync.replicas to the default value which is 1.
The leader partition is included in the min.insync.replicas count, so it means that with min.insync.replicas = 1, it's just the leader that needs to do the write (then acks the producer) and then the message is available to the consumer; it's actually not waiting for the message to be replicated to other followers because the criteria on min.insync.replicas are already met. It makes acks=all the same as acks=1.
You will see a difference if you increase the min.insync.replicas > 1.

What do we mean by 'commit' data in Kafka broker?

In a Kafka cluster containing N brokers , for Topic T against a partition, producers publish data to Leader broker. By the term 'commit' in Kafka terminology , does it mean the data is committed in Leader broker or the data is committed to the Leader broker and also to the corresponding Followers available in the ISR list.
This is controlled by the producer configuration setting called ack:
acks=0 If set to zero then the producer will not wait for any acknowledgment from the server at all. The record will be immediately added to the socket buffer and considered sent. No guarantee can be made that the server has received the record in this case, and the retries configuration will not take effect (as the client won't generally know of any failures). The offset given back for each record will always be set to -1.
acks=1 (default) This will mean the leader will write the record to its local log but will respond without awaiting full acknowledgement from all followers. In this case should the leader fail immediately after acknowledging the record but before the followers have replicated it then the record will be lost.
acks=all This means the leader will wait for the full set of in-sync replicas to acknowledge the record. This guarantees that the record will not be lost as long as at least one in-sync replica remains alive. This is the strongest available guarantee.
Regardless of the "acks" setting in the producer, from a broker perspective a message is considered "committed" when all in-sync replicas for that partition have applied it to their log.
Only committed messages can be read by consumers.
The "acks" property only tells the producer whether it should wait for the message to be committed (acks=all), written to the leader (acks=1), or not wait at all (acks=0)
Commit of message means two different things from Kafka's point of view and from Producer point of view.
Because Kafka provides durability guarantees - for Kafka, a message is committed when the leader as well as all the InSyncReplicas have received the message. As example, say a topic is created with RF of 5 (1 leader and 4 followers) and out of those 4 follower replicas, say 2 are InSync. At this point when kafka receives a message, Kafka will consider it committed when the leader and 2 InSyncReplicas get that message.
From producer point of view, the producer application has the flexibility to define when they consider a message to be committed to Kafka.
acks = 0: means producer considers message to be committed without any confirmation (acknowledgement) from Kafka
acks = 1: means producer considers message to be committed with just leader confirming that it got the message
acks = all (default): means producer considers message to be committed when both leader and all the ISR confirms that they got the message.
The difference in the point of view of commit is because Kafka and Producer application might have different priorities. While for Kafka - not loosing a message after it has been received is the priority (and that's why it considers a message committed only when leader and all ISR receives the message); for producer throughput might be the priority and it cannot wait while all the ISRs get the message, so the moment leader gets the message, producer considers it committed safely enough.

kafka partition leader HW and ISR failover

I read from https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Replication
So, for each committed message, we guarantee that the message is stored in >multiple replicas in memory. However, there is no guarantee that any >replica has persisted the commit message to disks though
It makes sense to only store the message in follower's memory after ack in order to achieve low latency. But the article doesn't tell whether the leader persists the message. What if the leader crashes?
After pondering on the topic, I figured out that the leader also doesn't need to persist the message. The committed message guarantee is provided by the assumption that at least one replica in ISR will survive. A new leader will be elected with committed message available in case of the original leader crash