In a Kafka cluster containing N brokers, for a topic T and a given partition, producers publish data to the leader broker. By the term 'commit' in Kafka terminology, does it mean the data is committed only on the leader broker, or that the data is committed to the leader broker and also to the corresponding followers in the ISR list?
This is controlled by the producer configuration setting called acks (a short example of setting it follows below):
acks=0 If set to zero then the producer will not wait for any acknowledgment from the server at all. The record will be immediately added to the socket buffer and considered sent. No guarantee can be made that the server has received the record in this case, and the retries configuration will not take effect (as the client won't generally know of any failures). The offset given back for each record will always be set to -1.
acks=1 (default) This will mean the leader will write the record to its local log but will respond without awaiting full acknowledgement from all followers. In this case should the leader fail immediately after acknowledging the record but before the followers have replicated it then the record will be lost.
acks=all This means the leader will wait for the full set of in-sync replicas to acknowledge the record. This guarantees that the record will not be lost as long as at least one in-sync replica remains alive. This is the strongest available guarantee.
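For illustration, here is a minimal sketch of setting acks with the Apache Kafka Java client; the broker address and topic name are placeholders:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class AcksExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.ACKS_CONFIG, "all"); // "0", "1" or "all" as described above

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "key", "value")); // placeholder topic
            producer.flush(); // block until outstanding sends are acknowledged per the acks setting
        }
    }
}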
Regardless of the "acks" setting in the producer, from a broker perspective a message is considered "committed" when all in-sync replicas for that partition have applied it to their log.
Only committed messages can be read by consumers.
The "acks" property only tells the producer whether it should wait for the message to be committed (acks=all), written to the leader (acks=1), or not wait at all (acks=0).
The commit of a message means two different things from Kafka's point of view and from the producer's point of view.
Because Kafka provides durability guarantees, for Kafka a message is committed when the leader as well as all the in-sync replicas have received the message. As an example, say a topic is created with a replication factor of 5 (1 leader and 4 followers) and, out of those 4 follower replicas, 2 are in sync. When Kafka receives a message, it considers it committed once the leader and those 2 in-sync replicas have received it.
From the producer's point of view, the producer application has the flexibility to define when it considers a message to be committed to Kafka.
acks = 0: means the producer considers a message to be committed without any confirmation (acknowledgement) from Kafka
acks = 1: means the producer considers a message to be committed once just the leader confirms that it got the message
acks = all (default): means the producer considers a message to be committed when both the leader and all the ISR confirm that they got the message.
The difference in the point of view of commit exists because Kafka and the producer application might have different priorities. For Kafka, not losing a message after it has been received is the priority (and that's why it considers a message committed only when the leader and all the ISR have received the message); for the producer, throughput might be the priority, and it cannot wait while all the ISR get the message, so the moment the leader gets the message the producer considers it committed safely enough.
Related
I have a minor doubt about Kafka partitions. Let's say I have the following Kafka configs:
replication.factor = 3
min.insync.replicas = 3
replica.lag.time.max.ms = 10000
acks = all
I believe that when the producer sends a message to the broker (partition leader), the leader sends an ack only after the message has been persisted to all the in-sync followers. My doubt here is:
will the leader be "pushing" the event to the in-sync replicas, OR
will the followers be pulling the event from the leader? If this is the case, how does the leader know that the event has been pulled by all in-sync followers before giving an ack to the producer?
The official Apache Kafka documentation states that followers pull messages from the leader.
Followers consume messages from the leader just as a normal Kafka consumer would and apply them to their own log. Having the followers pull from the leader has the nice property of allowing the follower to naturally batch together log entries they are applying to their log.
The replicas let the leader know that the message was replicated successfully; when the min.insync.replicas count is reached, the message gets committed.
We can now more precisely define that a message is considered committed when all replicas in the ISR for that partition have applied it to their log. Only committed messages are ever given out to the consumer.
I'm currently learning more about the Kafka Producer. I am a bit puzzled by the following paragraph from the docs:
Messages written to the partition leader are not immediately readable
by consumers regardless of the producer’s acknowledgement settings.
When all in-sync replicas have acknowledged the write, then the
message is considered committed, which makes it available for reading.
This ensures that messages cannot be lost by a broker failure after
they have already been read. Note that this implies that messages
which were acknowledged by the leader only (that is, acks=1) can be
lost if the partition leader fails before the replicas have copied the
message. Nevertheless, this is often a reasonable compromise in
practice to ensure durability in most cases while not impacting
throughput too significantly.
The way I interpret this is that messages can get lost during the sync between the leader and the replica brokers, i.e. messages won't be committed unless they have been successfully replicated.
I don't understand how (for example) the Java application can shield against this message loss.
Does it receive different acknowledgements between 'only-leader' and the full replication?
this is often a reasonable compromise in practice
How is that? Do they assume that you should log failed messages and re-queue them manually? Or how does that work?
"Does it receive different acknowledgements between 'only-leader' and the full replication?"
There is no difference between a leader and a replica acknowledgment. You only steer the behavior of the producer through its configuration acks. If it is set to 1, the producer will wait only for the leader acknowledgment; if you set it to all, it will wait for all in-sync replicas of the partition before considering the write successful.
If you set acks=all and the synchronisation between leader and replicas fails, your producer will receive a retriable exception (either "NotEnoughReplicasException" or "NotEnoughReplicasAfterAppendException", see more details here). Based on the producer configuration retries, it will try to re-send the message. Kafka is built in a way that it expects crashed brokers to be available again (in a "short" amount of time).
In case you have set acks=1 and the synchronisation between leader and replicas fails, your producer considers the message to have been successfully written to the cluster and will not try to re-send it. Of course the leader will continue to replicate the message to its replicas, but it is not really guaranteed that this will happen. And before the message gets replicated, the leader broker itself could have issues that cause the message to be lost forever.
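To make the acks=all case concrete, here is a minimal sketch (again assuming the standard Java KafkaProducer; broker address and topic name are placeholders) of a producer that waits for all in-sync replicas and inspects the result of each send, so a failed replication surfaces as an exception the application can handle:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.RetriableException;
import org.apache.kafka.common.serialization.StringSerializer;

public class DurableProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.ACKS_CONFIG, "all");                 // wait for all in-sync replicas
        props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);  // let the client retry retriable errors

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "key", "value"), (metadata, exception) -> {
                if (exception == null) {
                    System.out.println("committed at offset " + metadata.offset());
                } else if (exception instanceof RetriableException) {
                    // e.g. NotEnoughReplicasException: the client already retried up to its limits;
                    // the application can log the record and re-send it later
                    System.err.println("retriable failure, not committed: " + exception);
                } else {
                    System.err.println("fatal failure: " + exception);
                }
            });
            producer.flush();
        }
    }
}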
A Kafka consumer saves offsets only when it commits them.
Thus, when it comes back up after a crash, it can start from the previous offset.
But what is the purpose of acks? In case of a crash, acks wouldn't help (if the offsets weren't committed).
Acks = 0
No response is requested from the broker, so if the broker goes offline or an exception happens, we will not know and will lose data. Useful for data where it's fine to potentially lose messages, such as metric or log collection.
This gives the best performance, because the producer does not wait for any confirmation.
Acks = 1 (Default)
A leader response is requested, but replication is not guaranteed; it happens in the background. If an acknowledgement is not received, the producer can retry without duplicating data. If the leader broker goes offline but the replicas haven't replicated the data yet, the data is lost.
Acks = all
Acknowledgement is requested from both the leader and the replicas, which means the producer has to wait for confirmation from the broker's replicas before continuing. This adds latency but ensures safety; this setting ensures no data is lost.
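Regarding the offset part of the question above: producer acks and consumer offset commits are independent mechanisms. A rough sketch of a consumer that commits its offsets explicitly (assuming the standard Java KafkaConsumer; group id, topic and broker address are placeholders) might look like this:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class OffsetCommitExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");                // placeholder group id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // commit offsets manually

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // process the record here ...
                }
                // after a crash, the consumer resumes from the last committed offset
                consumer.commitSync();
            }
        }
    }
}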
Let's say high watermark for topic partition is 1000 and leader, all follower replicas have same messages exactly. In this scenario, producer sends a message with acks = all and a consumer is consuming from this topic. Is there a possibility here, where a consumer fetch request will be served before other replicas fetch request?
In other words, does leader serve consumer's fetch request before it receives acknowledgements from all in-sync followers in acks = all case?
I ask because, in our setup, the consumer received a message before the followers did in the acks=all case.
In Kafka a message is ready to be consumed after it is added to the leader broker, but if you set acks=all the leader will wait for all in-sync replicas to replicate the message.
Normally it is expected that all replicas of a topic are in-sync replicas unless there is a problem in the replication process. (If some replicas become out of sync, you can still continue to produce messages, provided you have enough in-sync replicas (min.insync.replicas), even if you set acks=all.)
min.insync.replicas: When a producer sets acks to "all" (or "-1"),
min.insync.replicas specifies the minimum number of replicas that must
acknowledge a write for the write to be considered successful. If this
minimum cannot be met, then the producer will raise an exception
(either NotEnoughReplicas or NotEnoughReplicasAfterAppend).
In your case it seems there is no way to bypass the replication process if you set acks=all. But you can set acks=1 if you don't want to wait for the replication process. With this config a message is available to consumers right after the leader writes it to its local log. (Followers will still replicate the messages, but the leader will not wait for them.) However, you should consider the risk of data loss with this config.
acks=1 This will mean the leader will write the record to its local
log but will respond without awaiting full acknowledgement from all
followers. In this case should the leader fail immediately after
acknowledging the record but before the followers have replicated it
then the record will be lost
In the docs, it's clearly mentioned that the message will be ready for consumption when all in-sync replicas get the message.
Messages written to the partition leader are not immediately readable by consumers regardless of the producer’s acknowledgement settings. When all in-sync replicas have acknowledged the write, then the message is considered committed, which makes it available for reading.
I would guess that you are observing this behavior because you left min.insync.replicas at its default value, which is 1.
The leader replica is included in the min.insync.replicas count, so with min.insync.replicas = 1 it's just the leader that needs to do the write (and then ack the producer), and the message becomes available to the consumer; the leader isn't actually waiting for the message to be replicated to the other followers because the criterion on min.insync.replicas is already met. That makes acks=all behave the same as acks=1.
You will see a difference if you increase the min.insync.replicas > 1.
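For illustration, here is a sketch of creating a topic with min.insync.replicas raised to 2 using the Java AdminClient; the topic name and broker address are placeholders:

import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker

        try (AdminClient admin = AdminClient.create(props)) {
            NewTopic topic = new NewTopic("my-topic", 3, (short) 3) // 3 partitions, replication factor 3
                    .configs(Map.of("min.insync.replicas", "2"));   // writes with acks=all need >= 2 in-sync replicas
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}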
I am trying to understand the following message-loss behavior in Kafka. Briefly: one broker dies early on, and subsequently, after some message processing, all the other brokers die. When the broker that died first starts up, it does not catch up with the other brokers after they come up. Instead, all the other brokers report errors and reset their offset to match the first broker. Is this behavior expected, and what changes/settings ensure zero message loss?
Kafka version: 2.11-0.10.2.0
Reproducible steps
Started 1 zookeeper instance and 3 kafka brokers
Created one topic with a replication factor of 3 and 3 partitions
Attached a kafka-console-consumer to the topic
Used Kafka-console-producer to produce 2 messages
Killed two brokers (1&2)
Sent two messages
Killed last remaining broker (0)
Brought up broker (1), which had not seen the last two messages
Brought up broker (2), which had seen the last two messages, and it shows an error
[2017-06-16 14:45:20,239] INFO Truncating log my-second-topic-1 to offset 1. (kafka.log.Log)
[2017-06-16 14:45:20,253] ERROR [ReplicaFetcherThread-0-1], Current offset 2 for partition [my-second-topic,1] out of range; reset offset to 1 (kafka.server.ReplicaFetcherThread)
Finally, connected kafka-console-consumer and it saw two messages instead of the four that were published.
Response here : https://kafka.apache.org/documentation/#producerconfigs
The number of acknowledgments the producer requires the leader to have received before considering a request complete. This controls the durability of records that are sent. The following settings are allowed:
acks=0 If set to zero then the producer will not wait for any acknowledgment from the server at all. The record will be immediately added to the socket buffer and considered sent. No guarantee can be made that the server has received the record in this case, and the retries configuration will not take effect (as the client won't generally know of any failures). The offset given back for each record will always be set to -1.
acks=1 This will mean the leader will write the record to its local log but will respond without awaiting full acknowledgement from all followers. In this case should the leader fail immediately after acknowledging the record but before the followers have replicated it then the record will be lost.
acks=all This means the leader will wait for the full set of in-sync replicas to acknowledge the record. This guarantees that the record will not be lost as long as at least one in-sync replica remains alive. This is the strongest available guarantee. This is equivalent to the acks=-1 setting.
By default acks=1, so set it to 'all':
acks=all in your producer.properties file
Check if unclean.leader.election.enable = true and, if so, set it to false so that only replicas that are in sync can become the leader. If an out-of-sync replica is allowed to become leader, then messages can get truncated and lost.
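For illustration only, on recent client and broker versions these topic-level settings could be applied with the Java AdminClient as sketched below (the topic name and broker address are placeholders; the same properties can instead be set cluster-wide in each broker's server.properties):

import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class DurabilityConfigExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker

        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "my-second-topic"); // placeholder topic
            Collection<AlterConfigOp> ops = List.of(
                // keep out-of-sync replicas from being elected leader (avoids truncation and loss)
                new AlterConfigOp(new ConfigEntry("unclean.leader.election.enable", "false"),
                                  AlterConfigOp.OpType.SET),
                // with producer acks=all, require at least 2 in-sync replicas per write
                new AlterConfigOp(new ConfigEntry("min.insync.replicas", "2"),
                                  AlterConfigOp.OpType.SET));
            admin.incrementalAlterConfigs(Map.of(topic, ops)).all().get();
        }
    }
}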