When producer raise NotEnoughReplicas in Kafka - apache-kafka

Kafka doc min.insync.replicas say:
When a producer sets acks to "all" (or "-1"), min.insync.replicas specifies the minimum number of replicas that must acknowledge a write for the write to be considered successful. If this minimum cannot be met, then the producer will raise an exception (either NotEnoughReplicas or NotEnoughReplicasAfterAppend).
When used together, min.insync.replicas and acks allow you to enforce greater durability guarantees. A typical scenario would be to create a topic with a replication factor of 3, set min.insync.replicas to 2, and produce with acks of "all". This will ensure that the producer raises an exception if a majority of replicas do not receive a write.
But in my understanding, it is not clear when the Producer will raise an NotEnoughReplicas exception.
My two questions:
Does the Producer raise the exception immediately when the number of ISR queues is less than that value and aks=all, or does it wait for a timeout?
When the Producer receives this error, it is also able to retry if the retries does not reach the maximum value?

When the request has been sent to the leader broker it writes into its leader partition and then the follower brokers will replicate that message. The followers must then acknowledge the message write.
This whole process (including replication) must complete within the producer timeout configured by request.timeout.ms parameter and this request.timeout.ms must be larger than the replica.lag.time.max.ms (broker config, which is the time after which the leader will remove a broker from the ISR).
If a follower broker is removed from the ISR after the message is appended to the log, then you get the NotEnoughReplicasAfterAppendException. If it before appending to the log, then you get NotEnoughReplicasException. In either case, retries happen till delivery.timeout.ms.
If your request.timeout.ms elapses before getting acknowledgement from all the replicas in the ISR, then it is retried till max retries is hit or delivery.timeout.ms is elapsed.
Does the Producer raise the exception immediately when the number of
ISR queues is less than that value and aks=all, or does it wait for a
timeout?
There is a timeout (request.timeout.ms) till which it will wait for the acknowledgement.
When the Producer receives this error, it is also able to retry if the
retries does not reach the maximum value?
Retries happen, both NotEnoughReplicasAfterAppendException and NotEnoughReplicasException are RetriableExceptions.

Related

Best way to configure retries in Kaka Producer

With In-Sync replicas configured as Acks=all and min.insync.replicas = N,
Want to understand how retries should be configured for the message/record for unprocessed producer records
Example:
When Kafka fails to process the record with ISR online was N-1 during processing and minimum configured ISR was N replicas.
What is acks?
The acks parameter control how many partition replicas must receive the record before the producer can consider the write successful.
There are 3 values for the acks parameter:
acks=0, the producer will not wait for a reply from the broker before assuming the message sent successfully.
acks=1, the producer will receive a successful response from the broker the moment the leader replica received the message. If the message can't be written to the leader, the producer will receive an error response and can retry.
acks=all, the producer will receive a successful response from the broker once all in-sync replicas received the message.
In your case, acks=all, which is the safest way since you can make sure one more broker has the message.
Retries:
If the producer receives the error message, then the value of the retries property comes into the picture. You can use retry.backoff.ms property to configure the time between retries. It is recommended, test how long it takes to recover from a crashed broker and setting the number of retries and delay between them such that the total amount of time spent retrying will be longer than the time it takes the Kafka cluster to recover from scratch.
Also, check the below link,
https://www.confluent.io/blog/hands-free-kafka-replication-a-lesson-in-operational-simplicity/
For above scenario, you will get NotEnoughReplicasException. At this stage, Kafka reties the message
Default Retries Value as 0 for kafka <=2.0 and this value set to very high value for Kafka >=2.1
Also, there is another setting with name "retry.backof.ms". Default value for this setting is 100ms. Kafka producer will retry every 100ms to produce this message till it gets succeed.
To avoid it to retry for infinite number of times or you can set "delivery.timout.ms" to make sure the Producer retries to send the message for that man millisecs. Default value for it is 120000ms. which is nothing but 2mins. Means, Producer will not retry after 2mins and considered the message as fail.
Also, you might need to consider "max.in.flight.requests" setting to make sure the retry messages are processed in Sequence when your Kafka messages have a Key in it

Kafka - min.insync.replicas interpretation

I am going through the documentation looking at multiple places, it is adding up confusion..
About the property min.insync.replicas
When a producer sets acks to "all" (or "-1"), this configuration
specifies the minimum number of replicas that must acknowledge a write
for the write to be considered successful. If this minimum cannot be
met, then the producer will raise an exception (either
NotEnoughReplicas or NotEnoughReplicasAfterAppend). When used
together, min.insync.replicas and acks allow you to enforce greater
durability guarantees. A typical scenario would be to create a topic
with a replication factor of 3, set min.insync.replicas to 2, and
produce with acks of "all". This will ensure that the producer raises
an exception if a majority of replicas do not receive a write.
The questions I had,
Is this property had the meaning only if it is used with "acks" as part of "Sending the record" ( Producer) OR does it have any influence as part of the Consumer flow as well ?
What if acks=all and min.insync.replicas = 1(default value :1 ) --> Is it same as acks = 1 ? ( considering replication-factor 3 ?
Update #1
I come across this phrase
"When a producer specifies ack (-1 / all config) it will still wait for acks from all in sync replicas at that moment (independent of the setting for min in-sync replicas). So if you publish when 4 replicas are in sync then you will not get an ack unless all 4 replicas commit the message (even if min in-sync replicas is configured as 2)."
how this phrase is relevant as of today ?Is this property "min in-sync replicas" still independent ?
There are two settings here that affect the producer:
acks - this is a producer-level setting
min.insync.replicas - this is a topic-level setting
The acks property determines how you want to handle writing to kafka:
acks=0 - I don't care about receiving acknowledgment of receipt
acks=1 - Send an acknowledgment when the leader partition has received the batch in memory
all/-1 - Wait for all replicas to receive the batch before sending an acknowledgment
Keep in mind, the receipt in the partition is in memory, Kafka by default doesn't wait for fsync to disk, so acks=1 is not a durable write!
min.insync.replicas is used when there is a problem in the topic, maybe one of the partitions is not in-sync, or offline. When this is the case the cluster will send an ack when min.insync.replicas is satisfied. So 3 replicas, with min.insync.replicas=2 will still be able to write:
The acks property has no affect on the consumers, just that the data won't be written until acks and min.insync.replicas is satisfied.
What if acks=all and min.insync.replicas = 1(default value :1 ) --> Is it same as acks = 1 ? ( considering replication-factor 3 ?
Only if there is a problem with the topic. If you have 3 replicas and min.insync.replicas=1 and two of the partitions are down this is the same as acks=1. If the topic is healthy, the producer will wait for all replicas before sending the ack.

If ISR is less than replication factor and producer acks are set to all, How many acks will the producer wait for?

RF = 3 , ISR = 3 , acks = all >>> sent successfully
RF = 3 , ISR = 2 , acks = all >>> sent successfully
RF = 3 , ISR = 1 , acks = all >>> sent successfully
RF = 3 , ISR = 1 , acks = all , min.isr.replicas = 3 >>> sent successfully !
So, If replication factor is 4, ISR is 3 and producer acks are set to all, How many acks will the producer wait for? I tried different scenarios, What should be the real behavior?
When it comes to the acks setting the replication factor of the topic itself only plays an implicit role. As written in the Broker Configuration documentation and cited below the min.insync.replicas defines the minimum number of successful replications broker-wide before it is seen as a successful write.
min.insync.replicas: When a producer sets acks to "all" (or "-1"), min.insync.replicas specifies the minimum number of replicas that must acknowledge a write for the write to be considered successful. If this minimum cannot be met, then the producer will raise an exception (either NotEnoughReplicas or NotEnoughReplicasAfterAppend).
When used together, min.insync.replicas and acks allow you to enforce greater durability guarantees. A typical scenario would be to create a topic with a replication factor of 3, set min.insync.replicas to 2, and produce with acks of "all". This will ensure that the producer raises an exception if a majority of replicas do not receive a write.
Type: int
Default: 1
Valid Values: [1,...]
Importance: high
Update Mode: cluster-wide
To answer your conrete question: If replication factor is 4, and min.insync.replicas is 3 and producer acks are set to all, then the producer will wait for 3 acknowledgments within the cluster before it is seen as a successful write.
If you set acks=all broker which is the partition leader will wait all in-sync-replicas to replicate the data. In-sync-replica is a replica which is not far behind the partition leader.
What I mean by not far behind:
In Kafka when a message is sent to a topic-partition (firstly message is received and stored in leader) and if replication factor for this topic is greater than 1, then replica broker(s) send fetch request(s) to leader broker and this data is replicated to other broker(s). If replica.lag.time.max.ms is passed from last caught up, replica is considered as out-of-sync and removed from ISR list. (It is still a replica and fetch messages but leader broker doesn't wait it until catch up and became an in-sync-replica again)
From Kafka docs:
Configuration parameter replica.lag.time.max.ms now refers not just to
the time passed since last fetch request from replica, but also to
time since the replica last caught up. Replicas that are still
fetching messages from leaders but did not catch up to the latest
messages in replica.lag.time.max.ms will be considered out of sync.
There is also min.insync.replicas parameter in broker config. It specifies minimum number of in-sync-replicas to continue sending message when acks=all.
min.insync.replicas: When a producer sets acks to "all" (or "-1"), min.insync.replicas specifies the minimum number of replicas that must
acknowledge a write for the write to be considered successful. If this
minimum cannot be met, then the producer will raise an exception
(either NotEnoughReplicas or NotEnoughReplicasAfterAppend). When used
together, min.insync.replicas and acks allow you to enforce greater
durability guarantees. A typical scenario would be to create a topic
with a replication factor of 3, set min.insync.replicas to 2, and
produce with acks of "all". This will ensure that the producer raises
an exception if a majority of replicas do not receive a write.
If replication factor is 4, ISR is 3 and producer acks are set to
all, How many acks will the producer wait for?
Answer: Broker which is topic-partition leader will wait 3 other brokers in ISR list to replicate the data and send acknowledgement. If number of replicas in ISR list is less than min.insync.replicas then your producer get an exception and cannot produce the data.
Note: You can check current replica and ISR list with command below.
bin/kafka-topics.sh --bootstrap-server localhost:9092 --topic myTopic --describe

Consumer receiving messages before all replicas acknowledge to leader kafka

Let's say high watermark for topic partition is 1000 and leader, all follower replicas have same messages exactly. In this scenario, producer sends a message with acks = all and a consumer is consuming from this topic. Is there a possibility here, where a consumer fetch request will be served before other replicas fetch request?
In other words, does leader serve consumer's fetch request before it receives acknowledgements from all in-sync followers in acks = all case?
This is because in our setup, consumer received a message before followers in acks=all case.
In Kafka a message is ready to be consumed after it is added to leader broker, but if you set acks=all leader will wait all in-sync-replicas to replicate message.
Normally it is expected that all replicas of a topic would be in-sync-replicas unless there is a problem in replication process. (if some of replicas become out-of-sync, you can still continue to produce messages if you have enough replicas (min.insync.replicas) even if you set acks=all)
min.insync.replicas: When a producer sets acks to "all" (or "-1"),
min.insync.replicas specifies the minimum number of replicas that must
acknowledge a write for the write to be considered successful. If this
minimum cannot be met, then the producer will raise an exception
(either NotEnoughReplicas or NotEnoughReplicasAfterAppend).
In your case it seems there is no way to bypass replication process if you set acks=all. But you can set acks=1 if you don't want to wait for replication process. With this config a message would be available to consumers right after leader write the message to its local log. (followers will also replicate messages, but leader will not wait them) But you should consider the risk of data loss with this config.
acks=1 This will mean the leader will write the record to its local
log but will respond without awaiting full acknowledgement from all
followers. In this case should the leader fail immediately after
acknowledging the record but before the followers have replicated it
then the record will be lost
In the docs, it's clearly mentioned that the message will be ready for consumption when all in-sync replicas get the message.
Messages written to the partition leader are not immediately readable by consumers regardless of the producer’s acknowledgement settings. When all in-sync replicas have acknowledged the write, then the message is considered committed, which makes it available for reading.
I would guess that you are observing this behavior because you left the min.insync.replicas to the default value which is 1.
The leader partition is included in the min.insync.replicas count, so it means that with min.insync.replicas = 1, it's just the leader that needs to do the write (then acks the producer) and then the message is available to the consumer; it's actually not waiting for the message to be replicated to other followers because the criteria on min.insync.replicas are already met. It makes acks=all the same as acks=1.
You will see a difference if you increase the min.insync.replicas > 1.

Does min insync replicas property effects consumers in kafka

I have configuration min.insync.replicas=2 and default.replication.factor=3 for my 3 node cluster.
If I try to produce when only one broker is up it was failed as I expected.
But If I try consume when only 1 broker is available the consumer is still able to consume messages. It seems min.insync.replicas=2 is not working for consumers. is it know behavior or I am missing anything ?
min.insync.replicas specifies the minimum number of replicas that must acknowledge a write in order to consider this write as successful and therefore, it has an effect on the producer side which is responsible for the writes. This configuration parameter does not have any direct impact on the consumer side and this is why it does not affect Consumers, even if the number of alive brokers is less than the value of min.insync.replicas.
According to the documentation,
When a producer sets acks to "all" (or "-1"), min.insync.replicas
specifies the minimum number of replicas that must acknowledge a write
for the write to be considered successful. If this minimum cannot be
met, then the producer will raise an exception (either
NotEnoughReplicas or NotEnoughReplicasAfterAppend). When used
together, min.insync.replicas and acks allow you to enforce greater
durability guarantees. A typical scenario would be to create a topic
with a replication factor of 3, set min.insync.replicas to 2, and
produce with acks of "all". This will ensure that the producer raises
an exception if a majority of replicas do not receive a write.