kafka partition leader HW and ISR failover - apache-kafka

I read from https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Replication
So, for each committed message, we guarantee that the message is stored in multiple replicas in memory. However, there is no guarantee that any replica has persisted the commit message to disks though
It makes sense for a follower to only hold the message in memory when it acks, in order to achieve low latency. But the article doesn't say whether the leader persists the message. What if the leader crashes?

After pondering the topic, I figured out that the leader also doesn't need to persist the message. The committed-message guarantee is provided by the assumption that at least one replica in the ISR will survive: if the original leader crashes, a new leader is elected that has the committed messages available.
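To see what the leader and ISR actually look like for each partition, you can ask the cluster directly. A minimal sketch with the Java AdminClient (assuming kafka-clients 3.1+, a broker on localhost:9092 and a topic named my-topic, all placeholders):

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.TopicPartitionInfo;

import java.util.Collections;
import java.util.Properties;

public class IsrInspector {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

        try (Admin admin = Admin.create(props)) {
            TopicDescription description = admin
                    .describeTopics(Collections.singleton("my-topic"))
                    .allTopicNames().get()
                    .get("my-topic");

            // For each partition, print the current leader and the in-sync replica set.
            for (TopicPartitionInfo p : description.partitions()) {
                System.out.printf("partition=%d leader=%s isr=%s replicas=%s%n",
                        p.partition(), p.leader(), p.isr(), p.replicas());
            }
        }
    }
}
```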

Related

Kafka docs Producer possible message loss

I'm currently learning more about the Kafka Producer. I am a bit puzzled by the following paragraph from the docs:
Messages written to the partition leader are not immediately readable
by consumers regardless of the producer’s acknowledgement settings.
When all in-sync replicas have acknowledged the write, then the
message is considered committed, which makes it available for reading.
This ensures that messages cannot be lost by a broker failure after
they have already been read. Note that this implies that messages
which were acknowledged by the leader only (that is, acks=1) can be
lost if the partition leader fails before the replicas have copied the
message. Nevertheless, this is often a reasonable compromise in
practice to ensure durability in most cases while not impacting
throughput too significantly.
The way I interpret this is that messages can get lost during the sync between the leader and the replica brokers, i.e. messages won't be committed unless they have been successfully replicated.
I don't understand how (for example) the Java application can shield against this message loss.
Does it receive different acknowledgements between 'only-leader' and the full replication?
this is often a reasonable compromise in practice
How is that? Do they assume that you should log failed messages and re-queue them manually? Or how does that work?
"Does it receive different acknowledgements between 'only-leader' and the full replication?"
There is no difference between a leader and a replica acknowledgment. You only steer the behavior of the producer through its acks configuration. If it is set to 1 the producer will wait only for the leader's acknowledgment; if you set it to all it will wait for all in-sync replicas of the topic (normally all replicas, given the topic's replication factor) before it considers the write successful.
If you set acks=all and the synchronisation between leader and replicas fails, your producer will receive a retriable exception (either "NotEnoughReplicasException" or "NotEnoughReplicasAfterAppendException", see more details here). Based on the producer configuration retries, it will try to re-send the message. Kafka is built in a way that it expects crashed brokers to become available again (in a "short" amount of time).
If you have set acks=1 and the synchronisation between leader and replicas fails, your producer considers the message successfully written to the cluster and will not try to send it again. Of course the leader will continue to replicate the message to its replicas, but there is no guarantee that this will happen: if the leader broker itself runs into trouble before the message has been replicated, the message is lost forever.
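To make that concrete, here is a rough Java producer sketch (broker address and topic name are placeholders): with acks=all plus retries, retriable errors such as NotEnoughReplicasException are retried by the client, and the callback only sees a failure once those retries are exhausted.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.RetriableException;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class DurableProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder cluster
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.ACKS_CONFIG, "all");                 // wait until the write is committed
        props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);  // let the client retry retriable failures

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "key", "value"), (metadata, exception) -> {
                if (exception == null) {
                    System.out.println("committed at offset " + metadata.offset());
                } else if (exception instanceof RetriableException) {
                    // e.g. NotEnoughReplicas(AfterAppend)Exception: the client already retried up to
                    // its configured limits; surface it here for re-queueing or alerting.
                    System.err.println("retriable failure, not committed: " + exception);
                } else {
                    System.err.println("non-retriable failure: " + exception);
                }
            });
            producer.flush();
        }
    }
}
```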

what is the purpose of kafka ack?

A Kafka consumer saves offsets only when it commits them.
Thus, when it comes back up after a crash, it can resume from the previous offset.
But what is the purpose of acks? In case of a crash, acks wouldn't help (if the offsets weren't committed).
Acks = 0
No response is requested from the broker, so if the broker goes offline or an exception happens, we will not know and will lose data. Useful for data where it's fine to potentially lose messages, such as metric or log collection.
This gives the best performance, because the producer does not wait for any confirmation.
Acks = 1 (default before Kafka 3.0)
A response from the leader is requested, but replication is not guaranteed; it happens in the background. If an acknowledgement is not received, the producer can retry (which may produce duplicates unless idempotence is enabled). If the leader broker goes offline before the replicas have replicated the data, the data is lost.
Acks = all
Acknowledgement is requested from the leader and the in-sync replicas, which means the producer has to wait for confirmation from the replicas before continuing. This adds latency but ensures safety: no data is lost as long as at least one in-sync replica remains alive.
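Note that the offset commit mentioned in the question is a consumer-side mechanism and is independent of the producer's acks. A rough consumer sketch (group id, topic and broker address are placeholders) that commits offsets only after processing, so that after a crash the consumer re-reads records instead of losing them:

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class AtLeastOnceConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");                // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");         // commit offsets manually

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singleton("my-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // your processing logic
                }
                // Commit only after processing: if we crash before this line, the group
                // restarts from the last committed offset and re-reads these records.
                consumer.commitSync();
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
    }
}
```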

Consumer receiving messages before all replicas acknowledge to leader kafka

Let's say the high watermark for a topic partition is 1000 and the leader and all follower replicas have exactly the same messages. In this scenario, a producer sends a message with acks=all and a consumer is consuming from this topic. Is there a possibility that the consumer's fetch request will be served before the followers' fetch requests?
In other words, does the leader serve a consumer's fetch request before it receives acknowledgements from all in-sync followers in the acks=all case?
I ask because in our setup the consumer received a message before the followers did, in the acks=all case.
In Kafka a message is ready to be consumed after it is added to the leader broker, but if you set acks=all the leader will wait for all in-sync replicas to replicate the message before acknowledging the producer.
Normally all replicas of a topic are expected to be in-sync replicas unless there is a problem in the replication process. (If some replicas fall out of sync, you can still continue to produce messages as long as you have enough replicas (min.insync.replicas), even with acks=all.)
min.insync.replicas: When a producer sets acks to "all" (or "-1"),
min.insync.replicas specifies the minimum number of replicas that must
acknowledge a write for the write to be considered successful. If this
minimum cannot be met, then the producer will raise an exception
(either NotEnoughReplicas or NotEnoughReplicasAfterAppend).
In your case it seems there is no way to bypass the replication process if you set acks=all. But you can set acks=1 if you don't want to wait for the replication process. With this config a message becomes available to consumers right after the leader writes it to its local log. (Followers will still replicate the messages, but the leader will not wait for them.) But you should consider the risk of data loss with this config.
acks=1 This will mean the leader will write the record to its local
log but will respond without awaiting full acknowledgement from all
followers. In this case should the leader fail immediately after
acknowledging the record but before the followers have replicated it
then the record will be lost
In the docs, it's clearly mentioned that the message will be ready for consumption when all in-sync replicas get the message.
Messages written to the partition leader are not immediately readable by consumers regardless of the producer’s acknowledgement settings. When all in-sync replicas have acknowledged the write, then the message is considered committed, which makes it available for reading.
I would guess that you are observing this behavior because you left min.insync.replicas at its default value, which is 1.
The leader partition is included in the min.insync.replicas count, so with min.insync.replicas = 1 it's just the leader that needs to do the write (and then ack the producer) before the message is available to the consumer; it does not actually wait for the message to be replicated to the other followers, because the min.insync.replicas criterion is already met. This effectively makes acks=all the same as acks=1.
You will see a difference if you increase min.insync.replicas above 1.
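For example, here is a hedged sketch of creating such a topic with the Java AdminClient (assuming a 3-broker cluster; the broker address and topic name are placeholders), so that acks=all really does wait for at least one follower:

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

import java.util.Collections;
import java.util.Map;
import java.util.Properties;

public class CreateStrictTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

        try (Admin admin = Admin.create(props)) {
            // 3 partitions, replication factor 3, and at least 2 replicas
            // (leader included) must have the write before it counts as successful.
            NewTopic topic = new NewTopic("strict-topic", 3, (short) 3)
                    .configs(Map.of(TopicConfig.MIN_IN_SYNC_REPLICAS_CONFIG, "2"));

            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```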

What do we mean by 'commit' data in Kafka broker?

In a Kafka cluster containing N brokers, for a partition of topic T, producers publish data to the leader broker. By the term 'commit' in Kafka terminology, does it mean the data is committed on the leader broker only, or on the leader broker and also on the corresponding followers in the ISR list?
This is controlled by the producer configuration setting called acks:
acks=0 If set to zero then the producer will not wait for any acknowledgment from the server at all. The record will be immediately added to the socket buffer and considered sent. No guarantee can be made that the server has received the record in this case, and the retries configuration will not take effect (as the client won't generally know of any failures). The offset given back for each record will always be set to -1.
acks=1 (default) This will mean the leader will write the record to its local log but will respond without awaiting full acknowledgement from all followers. In this case should the leader fail immediately after acknowledging the record but before the followers have replicated it then the record will be lost.
acks=all This means the leader will wait for the full set of in-sync replicas to acknowledge the record. This guarantees that the record will not be lost as long as at least one in-sync replica remains alive. This is the strongest available guarantee.
Regardless of the "acks" setting in the producer, from a broker perspective a message is considered "committed" when all in-sync replicas for that partition have applied it to their log.
Only committed messages can be read by consumers.
The "acks" property only tells the producer whether it should wait for the message to be committed (acks=all), written to the leader (acks=1), or not wait at all (acks=0)
'Commit' of a message means two different things from Kafka's point of view and from the producer's point of view.
Because Kafka provides durability guarantees, for Kafka a message is committed when the leader as well as all the in-sync replicas have received the message. As an example, say a topic is created with a replication factor of 5 (1 leader and 4 followers) and, out of those 4 follower replicas, 2 are in sync. When Kafka receives a message, it will consider it committed once the leader and those 2 in-sync replicas have got it.
From the producer's point of view, the producer application has the flexibility to define when it considers a message to be committed to Kafka.
acks = 0: the producer considers a message committed without any confirmation (acknowledgement) from Kafka
acks = 1: the producer considers a message committed once just the leader confirms that it got the message
acks = all (the default since Kafka 3.0): the producer considers a message committed when both the leader and all the ISRs confirm that they got the message.
The difference in the point of view of commit exists because Kafka and the producer application may have different priorities. For Kafka, not losing a message after it has been received is the priority (which is why it considers a message committed only when the leader and all ISRs have received it); for the producer, throughput might be the priority, and it cannot always wait until all the ISRs get the message, so the moment the leader gets the message, the producer may consider it committed safely enough.
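As an illustration of the producer's point of view, a small hedged sketch (placeholder broker and topic): with acks=1 the send() future completes as soon as the leader has appended the record, which can be before the record is committed in Kafka's sense.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class LeaderOnlyAckProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.ACKS_CONFIG, "1"); // leader-only acknowledgement

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            RecordMetadata metadata =
                    producer.send(new ProducerRecord<>("my-topic", "value")).get();
            // The offset comes back once the leader has written the record to its log;
            // the ISR may not have replicated it yet, so Kafka does not yet consider it
            // committed (and it is not yet readable by consumers).
            System.out.println("leader acked at offset " + metadata.offset());
        }
    }
}
```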

Do you know why the partition leader is "-1" and how to solve it?

I tried stopping all the brokers belonging to the same ISR for testing.
All producers were sending to exactly partition 0. When brokers 11 and 12 shut down, the consumer could no longer receive messages, and the leader became -1.
The partition did not function normally until the corresponding leader came back.
Do you know what the solution is? And why this happens?
This happens when all ISR brokers are stopped, and it is expected behavior. However, if you favor availability over consistency you can enable unclean leader election (see the docs: https://kafka.apache.org/documentation/#brokerconfigs). This is not recommended unless you are sure you want availability at the cost of message loss.
unclean.leader.election.enable
Indicates whether to enable replicas not in the ISR set to be elected as leader as a last resort, even though doing so may result in data loss
If all the brokers in the ISR are down, then that partition goes offline. This is how the framework works; please read the Kafka documentation for more details.
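If you do decide to favor availability, the setting can also be applied per topic. A hedged sketch with the Java AdminClient (Kafka 2.3+; broker address and topic name are placeholders), keeping in mind that an unclean election can lose committed messages:

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

import java.util.Collections;
import java.util.Map;
import java.util.Properties;

public class AllowUncleanElection {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

        try (Admin admin = Admin.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "my-topic");
            AlterConfigOp enableUnclean = new AlterConfigOp(
                    new ConfigEntry("unclean.leader.election.enable", "true"),
                    AlterConfigOp.OpType.SET);

            // Allows an out-of-sync replica to become leader: restores availability,
            // but messages that only lived on the dead ISR members are lost.
            admin.incrementalAlterConfigs(
                    Map.of(topic, Collections.singleton(enableUnclean))).all().get();
        }
    }
}
```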