If my producers have acks=all, ISR is 2, and the number of partitions is 2, what is the scenario?
The number of partitions is irrelevant to the acks setting. The producer always writes to the leader replica of the partition chosen by the client partitioner.
When you set acks=all, the request blocks until the record has been written to every replica currently in the in-sync replica set (ISR), not to every replica implied by the topic's replication factor.
The min.insync.replicas setting determines when a partition stops accepting acks=all writes because too few replicas are in sync (the producer then gets a NotEnoughReplicas error). It does not change which partition the producer writes to.
For example, I have a topic with 2 partitions and a producer using the DefaultPartitioner (round-robin, I assumed) that writes to the topic. At some point, partition 1 becomes unavailable because all of its replica brokers go offline. Assuming the messages have no keys, will the producer resend the messages to partition 2, or simply get stuck?
That is an interesting question and we should look at it from a broader (cluster) perspective.
At some point, partition 1 becomes unavailable because all of the replica brokers go offline.
I see the following scenarios:
All replica brokers of partition one are different to the replica brokers of partition two.
All replica brokers of partition one are the same as for partition two.
Some replica brokers are the same as for partition two.
In scenario "1", partition 2 is unaffected because none of its replicas live on the failed brokers, so the producer can keep writing to it. Note that Kafka does not automatically move the replicas of a failed broker to other brokers: partition 1 stays offline until at least one of its replica brokers comes back, or until an administrator reassigns the partition.
In scenario "2", both partitions become unavailable and your KafkaProducer will eventually time out. Recovery then depends on the failed brokers coming back (or on reassigning the partitions to brokers that are alive).
In scenario "3", partition 2 loses some of its replicas but stays available as long as one of its in-sync replicas survives and can be elected leader. During that time the KafkaProducer will only write to partition 2, as this is the only available partition in the topic. As soon as partition 1 is back online, the producer will start producing to both partitions again.
Actually, I could think of many more scenarios. If you need a more concrete answer, you need to specify:
how many brokers you have,
what your replication factor actually is, and
in what order the brokers go down.
Assuming the messages have no specified keys, will the producer resend the messages to partition 2?
The KafkaProducer will not re-send the data that was previously sent to partition 1 to partition 2. Whatever was written to partition 1 will stay in partition 1.
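To make the keyless case concrete, here is a minimal Python model of a round-robin partitioner that skips partitions without a live leader, roughly how older DefaultPartitioner versions treated records without a key. It is a sketch, not Kafka's code: `make_round_robin_partitioner` and the `available` set are hypothetical, partitions are labelled 1 and 2 to match the question (Kafka itself numbers them from 0), and newer clients use sticky partitioning per batch rather than per record.

```python
from itertools import count


def make_round_robin_partitioner(partitions):
    """Return a toy partitioner for records without a key (hypothetical model)."""
    counter = count()

    def choose_partition(available):
        # Only consider partitions that currently have a live leader; if none
        # do, fall back to all partitions (those records will just time out).
        candidates = [p for p in partitions if p in available] or list(partitions)
        return candidates[next(counter) % len(candidates)]

    return choose_partition


choose = make_round_robin_partitioner([1, 2])
healthy = [choose({1, 2}) for _ in range(4)]  # both partitions online
degraded = [choose({2}) for _ in range(4)]    # partition 1 is offline
print(healthy)   # [1, 2, 1, 2]
print(degraded)  # [2, 2, 2, 2]
```

The model only decides where new records go; records already assigned to partition 1 are not part of this choice and, as the answer says, are not moved to partition 2.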
I have been going through the documentation in multiple places, and it is adding to my confusion.
About the property min.insync.replicas:
When a producer sets acks to "all" (or "-1"), this configuration
specifies the minimum number of replicas that must acknowledge a write
for the write to be considered successful. If this minimum cannot be
met, then the producer will raise an exception (either
NotEnoughReplicas or NotEnoughReplicasAfterAppend). When used
together, min.insync.replicas and acks allow you to enforce greater
durability guarantees. A typical scenario would be to create a topic
with a replication factor of 3, set min.insync.replicas to 2, and
produce with acks of "all". This will ensure that the producer raises
an exception if a majority of replicas do not receive a write.
My questions:
Does this property have meaning only when it is used with "acks" as part of sending the record (producer), or does it also influence the consumer flow?
What if acks=all and min.insync.replicas=1 (the default value)? Is that the same as acks=1, considering a replication factor of 3?
Update #1
I came across this phrase:
"When a producer specifies ack (-1 / all config) it will still wait for acks from all in sync replicas at that moment (independent of the setting for min in-sync replicas). So if you publish when 4 replicas are in sync then you will not get an ack unless all 4 replicas commit the message (even if min in-sync replicas is configured as 2)."
How relevant is this phrase today? Is the "min in-sync replicas" property still independent?
There are two settings here that affect the producer:
acks - this is a producer-level setting
min.insync.replicas - this is a topic-level setting (with a broker-level default)
The acks property determines how you want to handle writing to Kafka:
acks=0 - I don't care about receiving acknowledgment of receipt
acks=1 - Send an acknowledgment when the leader partition has received the batch in memory
acks=all (or -1) - Wait for all in-sync replicas to receive the batch before sending an acknowledgment
Keep in mind that "received" here means received in memory: by default Kafka doesn't wait for an fsync to disk, so acks=1 is not a durable write!
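The three acks modes can be summarised as "how many replicas must hold the batch before the producer gets its response". The Python function below is a simplified model, not client code: the name `replicas_needed_for_ack` and the broker ids are hypothetical, and it follows the Kafka docs quoted further down in this thread in treating acks=all as waiting for the replicas currently in the ISR.

```python
def replicas_needed_for_ack(acks, isr):
    """How many replicas must have the batch before the producer is answered.

    `isr` lists the in-sync replica ids, leader included (simplified model).
    """
    if acks == "0":
        return 0          # fire and forget: no response is awaited
    if acks == "1":
        return 1          # the leader alone
    if acks in ("all", "-1"):
        return len(isr)   # every replica currently in the ISR
    raise ValueError(f"unsupported acks value: {acks!r}")


isr = [101, 102, 103]  # hypothetical broker ids: leader 101 and two followers
for mode in ("0", "1", "all"):
    print(mode, replicas_needed_for_ack(mode, isr))
# 0 0
# 1 1
# all 3
```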
min.insync.replicas matters when there is a problem in the topic, for example when one of a partition's replicas is out of sync or offline. In that case the cluster still sends an ack as long as min.insync.replicas is satisfied. So a partition with 3 replicas and min.insync.replicas=2 will still be able to write when one replica is out of sync.
The acks property has no effect on consumers; it only means the write is not acknowledged until acks (and, for acks=all, min.insync.replicas) is satisfied.
What if acks=all and min.insync.replicas=1 (the default value)? Is that the same as acks=1, considering a replication factor of 3?
Only if there is a problem with the topic. If you have 3 replicas, min.insync.replicas=1, and the two follower replicas are down, this is the same as acks=1. If the topic is healthy, the producer will wait for all in-sync replicas before receiving the ack.
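The interaction between acks and min.insync.replicas described above can be sketched as a small decision function. This is a simplified Python model under stated assumptions: `NotEnoughReplicasError` is a hypothetical stand-in for Kafka's NotEnoughReplicas error, and only the replica count is modelled (no timing, no NotEnoughReplicasAfterAppend case).

```python
class NotEnoughReplicasError(Exception):
    """Hypothetical stand-in for Kafka's NotEnoughReplicas error."""


def replicas_to_wait_for(acks, isr_size, min_insync_replicas=1):
    """Return how many replicas the leader waits for, or raise (simplified)."""
    if acks in ("all", "-1"):
        # min.insync.replicas is only enforced for acks=all.
        if isr_size < min_insync_replicas:
            raise NotEnoughReplicasError(
                f"ISR size {isr_size} < min.insync.replicas={min_insync_replicas}"
            )
        return isr_size
    return 1 if acks == "1" else 0


# Replication factor 3, min.insync.replicas=2.
print(replicas_to_wait_for("all", isr_size=3, min_insync_replicas=2))  # 3
print(replicas_to_wait_for("all", isr_size=2, min_insync_replicas=2))  # 2
try:
    replicas_to_wait_for("all", isr_size=1, min_insync_replicas=2)
except NotEnoughReplicasError as err:
    print("rejected:", err)
# With the default min.insync.replicas=1 and only the leader in sync,
# acks=all waits for the same single replica as acks=1.
print(replicas_to_wait_for("all", isr_size=1))  # 1
```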
RF = 3 , ISR = 3 , acks = all >>> sent successfully
RF = 3 , ISR = 2 , acks = all >>> sent successfully
RF = 3 , ISR = 1 , acks = all >>> sent successfully
RF = 3 , ISR = 1 , acks = all , min.isr.replicas = 3 >>> sent successfully !
So, if the replication factor is 4, ISR is 3, and producer acks are set to all, how many acks will the producer wait for? I tried different scenarios; what should the real behavior be?
When it comes to the acks setting, the replication factor of the topic itself only plays an implicit role. As written in the Broker Configuration documentation (cited below), min.insync.replicas defines the minimum number of replicas that must acknowledge a write before it is seen as successful.
min.insync.replicas: When a producer sets acks to "all" (or "-1"), min.insync.replicas specifies the minimum number of replicas that must acknowledge a write for the write to be considered successful. If this minimum cannot be met, then the producer will raise an exception (either NotEnoughReplicas or NotEnoughReplicasAfterAppend).
When used together, min.insync.replicas and acks allow you to enforce greater durability guarantees. A typical scenario would be to create a topic with a replication factor of 3, set min.insync.replicas to 2, and produce with acks of "all". This will ensure that the producer raises an exception if a majority of replicas do not receive a write.
Type: int
Default: 1
Valid Values: [1,...]
Importance: high
Update Mode: cluster-wide
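To answer your concrete question: if the replication factor is 4, min.insync.replicas is 3, and producer acks are set to all, then at least 3 replicas (leader included) must acknowledge the write. The leader actually waits for every replica currently in the ISR (all 4 if they are all in sync) and rejects the write if fewer than 3 replicas are in sync.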
If you set acks=all, the broker that is the partition leader will wait for all in-sync replicas to replicate the data. An in-sync replica is a replica that is not far behind the partition leader.
What I mean by not far behind:
In Kafka, when a message is sent to a topic-partition, it is first received and stored by the leader. If the replication factor for the topic is greater than 1, the replica broker(s) send fetch requests to the leader broker and the data is replicated to them. If more than replica.lag.time.max.ms has passed since a replica last caught up, the replica is considered out of sync and removed from the ISR list. (It is still a replica and keeps fetching messages, but the leader broker doesn't wait for it until it catches up and becomes an in-sync replica again.)
From Kafka docs:
Configuration parameter replica.lag.time.max.ms now refers not just to
the time passed since last fetch request from replica, but also to
time since the replica last caught up. Replicas that are still
fetching messages from leaders but did not catch up to the latest
messages in replica.lag.time.max.ms will be considered out of sync.
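The lag rule can be expressed as a small function. This is a simplified Python model of the leader's bookkeeping, not broker code: `Follower`, `compute_isr` and the timestamps are hypothetical, and the 30 000 ms limit is just an assumed value for replica.lag.time.max.ms.

```python
from dataclasses import dataclass


@dataclass
class Follower:
    broker_id: int
    last_caught_up_ms: int  # last time this follower had fetched up to the leader's log end


def compute_isr(leader_id, followers, now_ms, replica_lag_time_max_ms):
    """Return the ISR as the leader would see it at now_ms (simplified model)."""
    isr = [leader_id]  # the leader is always in its own ISR
    for f in followers:
        # A follower that has not caught up within the lag window is dropped,
        # even if it is still sending fetch requests.
        if now_ms - f.last_caught_up_ms <= replica_lag_time_max_ms:
            isr.append(f.broker_id)
    return isr


followers = [Follower(2, last_caught_up_ms=95_000), Follower(3, last_caught_up_ms=50_000)]
print(compute_isr(1, followers, now_ms=100_000, replica_lag_time_max_ms=30_000))  # [1, 2]
```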
There is also a min.insync.replicas parameter in the broker config. It specifies the minimum number of in-sync replicas required to keep accepting messages when acks=all.
min.insync.replicas: When a producer sets acks to "all" (or "-1"), min.insync.replicas specifies the minimum number of replicas that must
acknowledge a write for the write to be considered successful. If this
minimum cannot be met, then the producer will raise an exception
(either NotEnoughReplicas or NotEnoughReplicasAfterAppend). When used
together, min.insync.replicas and acks allow you to enforce greater
durability guarantees. A typical scenario would be to create a topic
with a replication factor of 3, set min.insync.replicas to 2, and
produce with acks of "all". This will ensure that the producer raises
an exception if a majority of replicas do not receive a write.
If replication factor is 4, ISR is 3 and producer acks are set to
all, How many acks will the producer wait for?
Answer: The broker that is the topic-partition leader will wait for the 3 other brokers in the ISR list to replicate the data and send an acknowledgement. If the number of replicas in the ISR list is less than min.insync.replicas, your producer gets an exception and cannot produce the data.
Note: You can check the current replica and ISR lists with the command below.
bin/kafka-topics.sh --bootstrap-server localhost:9092 --topic myTopic --describe
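The answer above can be illustrated with a small Python model of a single record on a partition with replication factor 4. It is a sketch under stated assumptions: `record_status` and `NotEnoughReplicasError` are hypothetical names, log end offsets stand in for the followers' fetch progress, and the min.insync.replicas check is simplified to a single comparison.

```python
class NotEnoughReplicasError(Exception):
    """Hypothetical stand-in for Kafka's NotEnoughReplicas error."""


def record_status(offset, leader_id, isr, log_end_offsets, min_insync_replicas):
    """Return the ISR followers the leader is still waiting on for `offset`.

    An empty list means every in-sync replica has the record, so the
    acks=all response can be sent (simplified model).
    """
    if len(isr) < min_insync_replicas:
        raise NotEnoughReplicasError(f"only {len(isr)} in-sync replicas")
    # A replica holds `offset` once its log end offset has moved past it.
    return [b for b in isr if b != leader_id and log_end_offsets[b] <= offset]


isr = [1, 2, 3, 4]                          # RF=4, all replicas in sync, leader 1
leo = {1: 1001, 2: 1001, 3: 1000, 4: 1001}  # broker 3 has not fetched offset 1000 yet
print(record_status(1000, 1, isr, leo, min_insync_replicas=3))  # [3]
leo[3] = 1001
print(record_status(1000, 1, isr, leo, min_insync_replicas=3))  # []
```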
Let's say the high watermark for a topic partition is 1000, and the leader and all follower replicas have exactly the same messages. In this scenario, a producer sends a message with acks=all and a consumer is consuming from this topic. Is it possible that the consumer's fetch request will be served before the other replicas' fetch requests?
In other words, does the leader serve the consumer's fetch request before it receives acknowledgements from all in-sync followers in the acks=all case?
I ask because in our setup, the consumer received a message before the followers did in the acks=all case.
In Kafka, a message is ready to be consumed after it is added to the leader broker, but if you set acks=all, the leader will wait for all in-sync replicas to replicate the message.
Normally, all replicas of a topic are expected to be in-sync replicas unless there is a problem in the replication process. (If some replicas become out of sync, you can still continue to produce messages as long as you have enough in-sync replicas (min.insync.replicas), even with acks=all.)
min.insync.replicas: When a producer sets acks to "all" (or "-1"),
min.insync.replicas specifies the minimum number of replicas that must
acknowledge a write for the write to be considered successful. If this
minimum cannot be met, then the producer will raise an exception
(either NotEnoughReplicas or NotEnoughReplicasAfterAppend).
In your case, it seems there is no way to bypass the replication process if you set acks=all. But you can set acks=1 if you don't want to wait for replication. With this config, a message would be available to consumers right after the leader writes it to its local log (followers will still replicate messages, but the leader will not wait for them). However, you should consider the risk of data loss with this config.
acks=1 This will mean the leader will write the record to its local
log but will respond without awaiting full acknowledgement from all
followers. In this case should the leader fail immediately after
acknowledging the record but before the followers have replicated it
then the record will be lost
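The data-loss window described in that quote can be enumerated with a tiny Python model of one leader and one in-sync follower, where the leader crashes right after appending a record and the follower takes over. This is a hypothetical simplification (`leader_crash_outcome` is not a Kafka API) that ignores producer retries and timing.

```python
def leader_crash_outcome(acks, follower_replicated):
    """Return (acked, lost_after_ack) for one record when the leader then crashes.

    `follower_replicated` says whether the follower fetched the record before
    the crash; the follower becomes the new leader either way.
    """
    if acks == "1":
        acked = True                  # leader answers right after its local write
    elif acks == "all":
        acked = follower_replicated   # leader waits for the in-sync follower
    else:
        raise ValueError(f"unsupported acks value: {acks!r}")
    survives = follower_replicated    # only the follower's copy outlives the crash
    return acked, acked and not survives


for acks in ("1", "all"):
    for replicated in (False, True):
        print(acks, replicated, leader_crash_outcome(acks, replicated))
```

Only acks=1 with an unreplicated record produces an acknowledged-but-lost record; with acks=all an unreplicated record is never acknowledged, so the producer can still retry it.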
In the docs, it's clearly mentioned that the message will be ready for consumption when all in-sync replicas get the message.
Messages written to the partition leader are not immediately readable by consumers regardless of the producer’s acknowledgement settings. When all in-sync replicas have acknowledged the write, then the message is considered committed, which makes it available for reading.
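The documented rule that a message becomes readable once all in-sync replicas have it is usually described in terms of the high watermark. Here is a simplified Python model applied to the scenario in the question (high watermark 1000, one new record); `high_watermark`, `is_readable` and the broker ids are hypothetical, and real brokers track more state than this.

```python
def high_watermark(isr, log_end_offsets):
    """Simplified model: HW is the smallest log end offset among in-sync replicas."""
    return min(log_end_offsets[b] for b in isr)


def is_readable(offset, isr, log_end_offsets):
    """Consumers are only served offsets below the high watermark."""
    return offset < high_watermark(isr, log_end_offsets)


isr = [1, 2, 3]                     # leader 1 plus two in-sync followers
leo = {1: 1000, 2: 1000, 3: 1000}   # HW is 1000 and all replicas are identical
leo[1] = 1001                       # leader appends the new record at offset 1000
print(is_readable(1000, isr, leo))  # False: followers have not fetched it yet
leo[2] = leo[3] = 1001              # both followers fetch the record
print(is_readable(1000, isr, leo))  # True
```

In this model the producer's acks setting plays no part in when the record becomes readable.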
I would guess that you are observing this behavior because you left the min.insync.replicas to the default value which is 1.
The leader partition is included in the min.insync.replicas count, so it means that with min.insync.replicas = 1, it's just the leader that needs to do the write (then acks the producer) and then the message is available to the consumer; it's actually not waiting for the message to be replicated to other followers because the criteria on min.insync.replicas are already met. It makes acks=all the same as acks=1.
You will see a difference if you increase min.insync.replicas above 1.
Kafka has introduced rack-id to provide redundancy capabilities if a whole rack fails.
There is a min in-sync replica setting to specify the minimum number of replicas that need to be in-sync before a producer receives an ack (-1 / all config).
There is an unclean leader election setting to specify whether a leader can be elected when it is not in-sync.
So, given the following scenario:
Two racks. Rack 1, 2.
Replication count is 4.
Min in-sync replicas = 2
Producer ack=-1 (all).
Unclean leader election = false
The goal is at-least-once message delivery, redundancy of nodes, and tolerance to a rack failure.
Is it possible that there is a moment where the two in-sync replicas both come from rack 1, so the producer receives an ack and at that point rack 1 crashes (before any replicas from rack 2 are in-sync)?
This would mean that rack 2 contains only unclean replicas and no producer could add messages to the partition, essentially grinding it to a halt. Since the replicas would be unclean, no new leader could be elected in any case.
Is my analysis correct, or is there something under the hood to ensure that the replicas forming min in-sync replicas have to be from different racks?
Since replicas on the same rack would have lower latency it seems that the above scenario is reasonably likely.
The scenario is shown in the image below:
To be technically correct, you should fix some of the question's wording. It is not possible to have out-of-sync replicas "available". Also, the min in-sync replica setting specifies the minimum number of replicas that need to be in sync for the partition to remain available for writes. When a producer specifies ack (-1 / all config) it will still wait for acks from all in sync replicas at that moment (independent of the setting for min in-sync replicas). So if you publish when 4 replicas are in sync then you will not get an ack unless all 4 replicas commit the message (even if min in-sync replicas is configured as 2). It's still possible to construct a scenario similar to your question that highlights the same tradeoff problem: first let the 2 replicas in rack 2 fall out of sync, then publish when the only 2 ISRs are in rack 1, and then take rack 1 down. In that case those partitions would be unavailable for reads and writes. So the easiest fix for this problem would be to increase min in-sync replicas to 3. Another, less fault-tolerant fix would be to reduce the replication factor to 3.
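That scenario can be replayed with a small Python model of one partition with replication factor 4 across two racks. It is a sketch under stated assumptions: the broker-to-rack placement in `RACK` and the helper names are hypothetical, and leader election is reduced to "pick a live ISR member, or a live out-of-sync replica only if unclean election is enabled".

```python
RACK = {1: "rack1", 2: "rack1", 3: "rack2", 4: "rack2"}  # hypothetical placement


def accepts_write(isr, min_insync_replicas):
    """acks=all writes are accepted only while the ISR is large enough."""
    return len(isr) >= min_insync_replicas


def elect_leader(isr, alive, unclean_leader_election=False):
    """Pick a live in-sync replica; otherwise a live out-of-sync one if allowed."""
    clean = sorted(b for b in isr if b in alive)
    if clean:
        return clean[0]
    if unclean_leader_election:
        unclean = sorted(b for b in RACK if b in alive)
        return unclean[0] if unclean else None
    return None  # no eligible leader: the partition is offline


isr = {1, 2}  # the rack-2 replicas (3 and 4) have fallen out of sync
print(sorted({RACK[b] for b in isr}))             # ['rack1']: every ISR member is on one rack
print(accepts_write(isr, min_insync_replicas=2))  # True: the write is acked
print(accepts_write(isr, min_insync_replicas=3))  # False: the write is rejected instead
alive = {b for b, rack in RACK.items() if rack != "rack1"}  # rack 1 goes down
print(elect_leader(isr, alive))                                # None
print(elect_leader(isr, alive, unclean_leader_election=True))  # 3
```

With min in-sync replicas at 2, the acked write exists only on rack 1, so losing rack 1 leaves no clean leader; at 3, the same write is rejected up front, which is the trade-off the answer describes.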
Yes, I think it is possible, because Kafka can only maintain the ISR according to what actually happens at runtime, not according to the intent behind the configuration.
From https://engineering.linkedin.com/kafka/intra-cluster-replication-apache-kafka:
for each partition of a topic, we maintain an in-sync replica set (ISR). This is the set of replicas that are alive and have fully caught up with the leader (note that the leader is always in ISR). When a partition is created initially, every replica is in the ISR. When a new message is published, the leader waits until it reaches all replicas in the ISR before committing the message. If a follower replica fails, it will be dropped out of the ISR and the leader then continues to commit new messages with fewer replicas in the ISR. Notice that now, the system is running in an under replicated mode.
From https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Replication:
After a configured timeout period, the leader will drop the failed follower from its ISR and writes will continue on the remaining replicas in ISR. If the failed follower comes back, it first truncates its log to the last checkpointed HW. It then starts to catch up all messages after its HW from the leader. When the follower fully catches up, the leader will add it back to the current ISR.
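The recovery steps in that quote (truncate to the checkpointed HW, catch up from the leader, rejoin the ISR) can be sketched in Python as follows. This is a simplified model with hypothetical names; newer Kafka versions (KIP-101) truncate using leader epochs rather than the HW alone, so treat it as an illustration of the quoted design rather than of current broker code.

```python
def recover_follower(follower_log, checkpointed_hw, leader_log, isr, follower_id):
    """Bring a restarted follower back in line with the leader (simplified model)."""
    # 1. Drop everything after the last checkpointed high watermark: entries past
    #    it may never have been committed and can differ from the leader's log.
    log = follower_log[:checkpointed_hw]
    # 2. Catch up by copying the leader's entries after that point.
    log = log + leader_log[len(log):]
    # 3. Once fully caught up, the leader adds the follower back to the ISR.
    if log == leader_log and follower_id not in isr:
        isr = isr + [follower_id]
    return log, isr


leader_log = ["m0", "m1", "m2", "m3"]
follower_log = ["m0", "m1", "stale"]  # "stale" lies past the HW and was never committed
log, isr = recover_follower(follower_log, checkpointed_hw=2,
                            leader_log=leader_log, isr=[1, 2], follower_id=3)
print(log)  # ['m0', 'm1', 'm2', 'm3']
print(isr)  # [1, 2, 3]
```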
The min in-sync replicas setting you mentioned is just a lower limit; the ISR size does not depend on it. The setting means that if the producer's acks is "all" and the ISR size is less than the minimum, Kafka will refuse to write the message.
So initially the ISR is {1,2,3,4}, and if broker 3 or 4 goes down, it will be kicked out of the ISR. Then the case you mentioned can happen.
When rack 1's brokers fail, electing a new leader from rack 2 would be an unclean leader election.