min.insync.replicas vs. unclean.leader.election - apache-kafka

I want to achieve reliable data delivery using a Kafka topic.
If I set min.insync.replicas = 2, can I leave unclean.leader.election.enable at its default value (the default is true)?
Or should I additionally set unclean.leader.election.enable to false?
If min.insync.replicas equals 2, is there a risk of data loss because of unclean leader election?

Whether data is lost depends on the circumstances. Consider this scenario:
min.insync.replicas=2 && unclean.leader.election.enable=true (the default before Kafka 0.11; newer versions default to false)
In this case, data (message 3) is lost: because unclean leader election is allowed, Broker 1 becomes the new leader even though it was not in sync with the other broker. Note that while unclean.leader.election.enable is a cluster-wide setting, you can override it per topic.
The image is from the book Kafka in Action.

If you don't set the value, it'll take the default, of course.
Those settings are for different purposes, though; setting one won't override the other.
If you have 2 in-sync replicas (and your producers have ack'd all their messages), then in theory, you would always have at least one clean leader that can be elected.

unclean.leader.election.enable=true
allows replicas outside the ISR to become leader. This preserves availability, but consistency is not guaranteed because data loss can occur.
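The scenario described in the answers above can be sketched as a toy model. This is only an illustration under stated assumptions: the function elect_leader, the broker ids, and the offsets are hypothetical, and the real controller logic is more involved.

```python
def elect_leader(log_end_offsets, isr, alive, unclean_enabled):
    """Pick a new leader among the alive replicas (simplified model).

    log_end_offsets: dict broker_id -> last offset stored on that replica
    isr: set of broker ids that were in sync when the old leader failed
    alive: set of broker ids that are currently reachable
    Returns the elected broker id, or None if the partition stays offline.
    """
    clean = sorted(isr & alive)
    if clean:
        return clean[0]  # a clean election is always preferred
    if unclean_enabled:
        candidates = sorted(alive & set(log_end_offsets))
        return candidates[0] if candidates else None
    return None


# Hypothetical state: broker 0 led the partition and holds messages 1..3;
# broker 1 fell behind after message 2 and was dropped from the ISR.
offsets = {0: 3, 1: 2}
isr = {0}
alive = {1}  # broker 0 has just crashed

for unclean in (True, False):
    leader = elect_leader(offsets, isr, alive, unclean)
    if leader is None:
        print(f"unclean={unclean}: partition stays offline until broker 0 returns")
    else:
        lost = max(offsets.values()) - offsets[leader]
        print(f"unclean={unclean}: broker {leader} leads, {lost} message(s) lost")
```

With unclean election enabled, broker 1 takes over and message 3 is gone; with it disabled, the partition waits for broker 0 instead.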

Related

Is it possible to decrease min.insync.replicas?

I'm planning to run a Kafka cluster in a production environment. Before deploying the cluster, I am trying to find the best configuration to ensure HA and data consistency.
I read in the official docs that it is not possible to reduce the partition replication factor, but what about min.insync.replicas? When I decrease the value in a test environment, I don't see any difference in the topic description. After changing the value from 3 to 2, I still have 3 ISR. Is it because it's a minimum value, or because the configuration change is not taken into account?
Yes, it is possible to reduce (or in general change) the min.insync.replicas configuration of a topic.
However, since you are looking for the best configuration to ensure high availability and data consistency, it would be counter-intuitive to reduce the value.
There is a big difference between the ISR (in-sync replicas) and the setting min.insync.replicas. The ISR shown by kafka-topics --describe just tells you how healthy the data in your topic is and whether all the replicas are keeping up with the partition's leader.
On the other hand, min.insync.replicas works together with a KafkaProducer writing to the topic with that setting. The official Kafka docs on topic configs describe it as:
When a producer sets acks to "all" (or "-1"), min.insync.replicas specifies the minimum number of replicas that must acknowledge a write for the write to be considered successful.
To emphasize again: if you are looking for high availability and consistency, it is best to set acks=all in your producer while keeping min.insync.replicas high.
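The difference between the current ISR and the min.insync.replicas threshold can be shown with a minimal sketch. The helper write_accepted is hypothetical and only models the rule quoted from the docs: the threshold is checked for acks=all writes and does not itself change which replicas are in sync.

```python
def write_accepted(acks, isr_size, min_insync_replicas):
    """Return True if the leader accepts the write in this simplified model."""
    if acks in ("all", "-1"):
        return isr_size >= min_insync_replicas
    # acks=0 or acks=1: the min.insync.replicas check is not applied.
    return True


# Lowering the setting from 3 to 2 leaves the ISR list untouched; it only
# changes how far the ISR may shrink before acks=all writes are rejected.
for min_isr in (3, 2):
    for isr_size in (3, 2, 1):
        verdict = "accepted" if write_accepted("all", isr_size, min_isr) else "rejected"
        print(f"min.insync.replicas={min_isr} ISR size={isr_size} -> {verdict}")
```

This is why the topic description still shows 3 ISR after the change: the effect only becomes visible once a replica falls out of sync.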

Does the min.insync.replicas property affect consumers in Kafka?

I have configuration min.insync.replicas=2 and default.replication.factor=3 for my 3 node cluster.
If I try to produce when only one broker is up, it fails, as I expected.
But if I try to consume when only one broker is available, the consumer is still able to consume messages. It seems min.insync.replicas=2 does not apply to consumers. Is this known behavior, or am I missing something?
min.insync.replicas specifies the minimum number of replicas that must acknowledge a write for the write to be considered successful. It therefore takes effect on the producer side, which is responsible for writes. The parameter has no direct impact on the consumer side, which is why consumers are unaffected even when the number of live brokers is less than min.insync.replicas.
According to the documentation,
When a producer sets acks to "all" (or "-1"), min.insync.replicas
specifies the minimum number of replicas that must acknowledge a write
for the write to be considered successful. If this minimum cannot be
met, then the producer will raise an exception (either
NotEnoughReplicas or NotEnoughReplicasAfterAppend). When used
together, min.insync.replicas and acks allow you to enforce greater
durability guarantees. A typical scenario would be to create a topic
with a replication factor of 3, set min.insync.replicas to 2, and
produce with acks of "all". This will ensure that the producer raises
an exception if a majority of replicas do not receive a write.
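The asymmetry between producing and consuming can be sketched with a small in-memory model. Everything here is hypothetical: the Partition class and NotEnoughReplicasError are stand-ins for illustration, not Kafka client APIs.

```python
class NotEnoughReplicasError(Exception):
    """Stand-in for the broker-side NotEnoughReplicas error."""


class Partition:
    def __init__(self, replicas, min_insync_replicas):
        self.isr = set(replicas)  # all replicas start in sync
        self.min_insync_replicas = min_insync_replicas
        self.log = []

    def broker_down(self, broker_id):
        self.isr.discard(broker_id)

    def produce(self, message, acks="all"):
        # The threshold is only checked on the write path with acks=all.
        if acks == "all" and len(self.isr) < self.min_insync_replicas:
            raise NotEnoughReplicasError(
                f"ISR size {len(self.isr)} < min.insync.replicas "
                f"{self.min_insync_replicas}"
            )
        self.log.append(message)

    def consume(self, offset=0):
        # Reads only need a live leader; min.insync.replicas is not consulted.
        if not self.isr:
            raise RuntimeError("no live leader")
        return self.log[offset:]


p = Partition(replicas={1, 2, 3}, min_insync_replicas=2)
p.produce("m1")
p.broker_down(2)
p.broker_down(3)
try:
    p.produce("m2")
except NotEnoughReplicasError as err:
    print("produce failed:", err)
print("consume still works:", p.consume())
```

With two of three brokers down, the write is rejected while the previously committed message remains readable, which matches the behavior described in the question.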

How to achieve strong consistency in Kafka?

I am trying to understand how consistency is maintained in Kafka. Please consider the following scenario and help me understand.
Number of partition = 2
Replication factor = 3
Number of broker in the cluster = 4
In that case, how many nodes should acknowledge a write to achieve strong consistency? Should it be acks=all, acks=3, or some other value?
You might be interested in the talk "When It Absolutely, Positively, Has to Be There" from Kafka Summit.
It was given by an engineer at Cloudera, and Cloudera has its own documentation on Kafka availability.
To summarize, a replication factor greater than 1 and min.insync.replicas greater than 1 is a good start. Then on the producer, if you are okay with sacrificing throughput for data availability, meaning every in-sync replica must have the write before the producer continues, use acks=all. Otherwise, if you trust the leader broker to be highly available and unclean leader election is disabled, acks=1 should be okay in most cases.
acks=3 isn't a valid config, by the way. I think you are looking for min.insync.replicas=2 and acks=all with a replication factor of 3; from the link above:
If min.insync.replicas is set to 2 and acks is set to all, each message must be written successfully to at least two replicas. This guarantees that the message is not lost unless both hosts crash
Also, as of Kafka 0.11 you can enable the idempotent producer (the basis for the transactional producer) to work towards exactly-once processing:
enable.idempotence=true
In your setting, what you have is
4 brokers
Replication factor = 3
That means each message in a given partition will be replicated to 3 out of 4 brokers, including the leader for that partition.
To achieve strong consistency guarantees, set min.insync.replicas to 2 and use acks=all. This way, each write is guaranteed to reach at least 2 of the 3 brokers that hold the partition before it is acknowledged.
Setting acks to all provides the highest consistency guarantee at the expense of slower writes to the cluster.
If you use an older version of Kafka where unclean leader election is enabled by default, also consider setting it to false explicitly. This way, an out-of-sync broker won't be elected leader if the leader crashes (at the cost of availability).
Also, Kafka is a system where, by default, all reads go through the leader. This differs from some other distributed systems, such as ZooKeeper, which serve reads from replicas. So you do not have a situation where a client ends up reading directly from a stale broker. The leader ensures that writes are ordered, replicated to the designated number of in-sync replicas, and acknowledged according to your acks setting.
If you are looking for consistency in the sense of the ACID properties, all replicas need to acknowledge the write. Since you have 3 replicas, all 3 of those nodes should acknowledge it.
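The recommendations in the answers above (acks=all, min.insync.replicas=2 with a replication factor of 3, unclean leader election disabled) can be collected into a small sanity check. This is a hedged sketch: durability_warnings is a hypothetical helper operating on plain dictionaries, not part of any Kafka client, and the rules are only the ones stated in this thread.

```python
def durability_warnings(replication_factor, topic_config, producer_config):
    """Return a list of warnings; an empty list means nothing was flagged."""
    warnings = []
    # min.insync.replicas defaults to 1 when it is not set.
    min_isr = int(topic_config.get("min.insync.replicas", 1))
    if min_isr < 2:
        warnings.append(
            "min.insync.replicas < 2: an acknowledged write may exist on one replica only"
        )
    if min_isr >= replication_factor:
        warnings.append(
            "min.insync.replicas >= replication factor: one broker down blocks acks=all writes"
        )
    if str(producer_config.get("acks")) not in ("all", "-1"):
        warnings.append(
            "acks is not 'all': min.insync.replicas is not enforced for this producer"
        )
    unclean = str(topic_config.get("unclean.leader.election.enable", "false"))
    if unclean.lower() == "true":
        warnings.append(
            "unclean leader election enabled: an out-of-sync replica may become leader"
        )
    return warnings


# The setup suggested for the asker: 4 brokers, replication factor 3.
topic = {"min.insync.replicas": 2, "unclean.leader.election.enable": "false"}
producer = {"acks": "all", "enable.idempotence": True}
print(durability_warnings(3, topic, producer))  # prints []
```

Setting min.insync.replicas to 3 in the same setup would trigger the second warning, which reflects the availability cost of requiring every replica to acknowledge.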

Why is Kafka not P in the CAP theorem?

The main developer of Kafka said Kafka is CA but not P in the CAP theorem. But I'm confused: is Kafka not partition tolerant? I think it is: when one replica is down, another becomes leader and work continues!
Also, I would like to know: what if Kafka did provide P? Would P hurt C or A?
If you read how CAP defines C, A and P, "CA but not P" just means that when an arbitrary network partition happens, each Kafka topic-partition will either stop serving requests (lose A), or lose some data (lose C), or both, depending on its settings and partition's specifics.
If a network partition splits all ISRs from Zookeeper, with default configuration unclean.leader.election.enable = false, no replicas can be elected as a leader (lose A).
If at least one ISR can connect, it will be elected, so it can still serve requests (preserve A). But with default min.insync.replicas = 1 an ISR can lag behind the leader by approximately replica.lag.time.max.ms = 10000. So by electing it Kafka potentially throws away writes confirmed to producers by the ex-leader (lose C).
Kafka can preserve both A and C for a limited class of partitions. E.g., you have min.insync.replicas = 2 and replication.factor = 3, all 3 replicas are in sync when a network partition happens, and it splits off at most 1 ISR (a single-node failure, a single-DC failure, or a single cross-DC link failure).
To preserve C for arbitrary partitions, you have to set min.insync.replicas = replication.factor. This way, no matter which ISR is elected, it is guaranteed to have the latest data. But at the same time it won't be able to serve write requests until the partition heals (lose A).
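The trade-off in the last two paragraphs can be checked with a back-of-the-envelope counting model. It is a sketch under assumptions, not a description of Kafka's election protocol: the function after_partition is hypothetical, producers are assumed to use acks=all, and the worst case is taken in which an acknowledged write sits on exactly min.insync.replicas replicas.

```python
def after_partition(replication_factor, min_isr, split_off):
    """Worst-case outcome when `split_off` replicas are cut off.

    Returns (copy_reachable, writes_available):
      copy_reachable   - every acknowledged write is guaranteed to have at
                         least one copy on the reachable side
      writes_available - enough replicas remain to accept acks=all writes
    """
    reachable = replication_factor - split_off
    # An acknowledged write is on at least min_isr replicas, so one copy is
    # guaranteed to stay reachable only if fewer than min_isr are cut off.
    copy_reachable = split_off < min_isr
    # acks=all writes need at least min_isr in-sync replicas.
    writes_available = reachable >= min_isr
    return copy_reachable, writes_available


for rf, min_isr in ((3, 2), (3, 3)):
    for split_off in range(rf + 1):
        c, a = after_partition(rf, min_isr, split_off)
        print(f"rf={rf} min.insync.replicas={min_isr} split_off={split_off}: "
              f"copy reachable={c}, writes available={a}")
```

With replication.factor = 3 and min.insync.replicas = 2, both properties hold only while at most one replica is split off. With min.insync.replicas = 3, a copy stays reachable as long as any replica does, but writes stop as soon as a single replica is cut off.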
CAP Theorem states that any distributed system can provide at most two out of the three guarantees: Consistency, Availability and Partition tolerance.
According to the engineers at LinkedIn (where Kafka was originally developed), Kafka is a CA system:
All distributed systems must make trade-offs between guaranteeing
consistency, availability, and partition tolerance (CAP Theorem). Our
goal was to support replication in a Kafka cluster within a single
datacenter, where network partitioning is rare, so our design focuses
on maintaining highly available and strongly consistent replicas.
Strong consistency means that all replicas are byte-to-byte identical,
which simplifies the job of an application developer.
However, I would say that it depends on your configuration and more precisely on the variables acks, min.insync.replicas and replication.factor. According to the docs,
If a topic is configured with only two replicas and one fails (i.e.,
only one in sync replica remains), then writes that specify acks=all
will succeed. However, these writes could be lost if the remaining
replica also fails. Although this ensures maximum availability of the
partition, this behavior may be undesirable to some users who prefer
durability over availability. Therefore, we provide two topic-level
configurations that can be used to prefer message durability over
availability:
Disable unclean leader election - if all replicas become unavailable, then the partition will remain unavailable until the most
recent leader becomes available again. This effectively prefers
unavailability over the risk of message loss. See the previous section
on Unclean Leader Election for clarification.
Specify a minimum ISR size - the partition will only accept writes if the size of the ISR is above a certain minimum, in order to prevent
the loss of messages that were written to just a single replica, which
subsequently becomes unavailable. This setting only takes effect if
the producer uses acks=all and guarantees that the message will be
acknowledged by at least this many in-sync replicas. This setting
offers a trade-off between consistency and availability. A higher
setting for minimum ISR size guarantees better consistency since the
message is guaranteed to be written to more replicas which reduces the
probability that it will be lost. However, it reduces availability
since the partition will be unavailable for writes if the number of
in-sync replicas drops below the minimum threshold.
CAP is a proven theorem, so no distributed system can provide C, A, and P all together during a failure. If Kafka were to provide P, that is, keep functioning when the cluster splits into two or more isolated parts, either C or A would have to be sacrificed.
If we consider the Kafka and ZooKeeper nodes as one cluster, since Kafka depends on ZooKeeper, we cannot consider it partition tolerant when brokers lose their connection to the ZooKeeper nodes.

Is Kafka consistent when replication-factor = 2 and minimum ISR size = 1?

In Kafka, with
replication-factor = 2
minimum ISR size = 1
unclean.leader.election.enable = false
is there a chance (for example, during a network partition) that two brokers both think they are the leader and both accept writes, so that some messages are eventually lost without the producer even noticing?
The producer uses acks = all.
A similar question has been answered here: How does Kafka handle network partitions?
In your case, I think a network partition causes no problem. Since unclean.leader.election.enable is false, one of the two sides cannot elect a new leader, so only the other side can accept writes.
With the minimum ISR set to 1, there can be times when only a single broker holds the data, so if that broker's disk were to fail, you risk losing data.
If you want stronger guarantees, you need to increase the minimum ISR size. For example, if you set it to 2, at any time at least 2 brokers will have all the data. So in order to lose data in this configuration, you would need to lose the disks of both brokers within the same time frame which is a lot less likely than just losing a single disk.
If you increase minimum ISR, to ease maintenance, you probably also want to bump up the number of replicas so you can have 1 broker down and still be able to produce with acks = all.
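The sizing advice above comes down to two numbers per configuration, sketched below. The helper sizing is hypothetical, and the arithmetic assumes producers use acks=all.

```python
def sizing(replication_factor, min_isr):
    """Summarize what a (replication factor, minimum ISR) pair buys you."""
    return {
        # An acknowledged write is on at least this many brokers.
        "copies_of_acked_write": min_isr,
        # This many brokers can be down while acks=all writes still succeed.
        "brokers_down_with_writes_ok": max(replication_factor - min_isr, 0),
    }


for rf, min_isr in ((2, 1), (2, 2), (3, 2)):
    print(f"replication factor={rf}, minimum ISR={min_isr}: {sizing(rf, min_isr)}")
```

Going from (2, 1) to (2, 2) doubles the guaranteed copies but leaves no room for maintenance; (3, 2) keeps two copies and still tolerates one broker being down.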
Since your replication factor is 2, having 1 ISR out of 2 is sufficient. It means that even if the leader goes down, you have 1 replica left to handle requests. Having more replicas leads to higher write overhead and might reduce throughput. You can have a higher number of replicas, trading performance for reliability.