Messages are rejected since there are fewer in-sync replicas than required - apache-kafka

We have 3 Kafka brokers with min.insync.replicas=2, and all 3 brokers are in sync.
We are getting the error "Messages are rejected since there are fewer in-sync replicas than required" for some of the topics.
Since all the brokers are in sync, we were not expecting this error.

min.insync.replicas is both a topic and broker configuration. Do you have this set anywhere else?
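One way to check is to look at where min.insync.replicas is actually coming from on an affected topic. Below is a minimal sketch using Kafka's Java AdminClient; the bootstrap address, the topic name my-topic, and broker id 0 are placeholders for your own values.

import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class CheckMinInsyncReplicas {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker address
        try (AdminClient admin = AdminClient.create(props)) {
            // Compare the topic-level override with the broker-level default.
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "my-topic"); // hypothetical topic
            ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "0");      // broker id 0
            admin.describeConfigs(Arrays.asList(topic, broker)).all().get()
                 .forEach((resource, config) -> {
                     ConfigEntry entry = config.get("min.insync.replicas");
                     // The source tells you whether the value is a topic override, a broker override,
                     // or the cluster default.
                     System.out.println(resource + " -> " + entry.value() + " (source: " + entry.source() + ")");
                 });
        }
    }
}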

Related

When does Kafka send the acknowledgement if acks=all and all replicas are healthy?

Suppose the Kafka producer is configured with acks=all and there are 5 broker replicas with min.insync.replicas=2.
I understand that if 3 brokers go offline, but 2 brokers are still up and healthy, then the producer will still get the message acknowledgement and will be able to continue sending the messages.
But what happens if all 5 brokers are up and healthy:
#1 Will the producer receive the acknowledgement only after all 5 brokers write the message to themselves?
#2 Or will the producer already receive the acknowledgement after 2 brokers write the message, so it will not wait for feedback from the remaining 3 brokers?
I am interested in the throughput in the case when all broker replicas are healthy: will the throughput with min.insync.replicas=2 be higher than with min.insync.replicas=5 (for acks=all and 5 broker replicas)?
The producer will get the acknowledgement the moment the data is written to min.insync.replicas brokers, which is 2 in this case. It will not wait for all brokers to get the message.
And yes, the throughput will definitely be higher with min.insync.replicas=2.
That is the tradeoff Kafka gives the user: if throughput is critical for you, keep min.insync.replicas lower, but if reliability is more critical, keep min.insync.replicas higher.
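For reference, the only producer-side knob in this discussion is acks; the replica requirement itself (min.insync.replicas) lives in the topic or broker config. A minimal sketch of a producer using acks=all, assuming a placeholder broker address and a hypothetical topic my-topic:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

public class AcksAllProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.ACKS_CONFIG, "all"); // wait for the replication guarantee before acking

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Blocking on get() makes the per-record acknowledgement latency visible,
            // which is what the throughput question above is really about.
            RecordMetadata meta = producer.send(new ProducerRecord<>("my-topic", "key", "value")).get();
            System.out.printf("acked: partition %d, offset %d%n", meta.partition(), meta.offset());
        }
    }
}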

Would the produced message be copied to all brokers irrespective of the replication factor in kafka

Let's say I have a Kafka cluster with 5 brokers and my replication factor is 3. With this configuration, if I produce a message, will it be copied to just 3 nodes, or to all 5 nodes but acknowledged after copying to 3 nodes?
Normally it will be replicated to 3 brokers. The acknowledgement, however, depends on the producer's acks config and on min.insync.replicas.
acks=0 means no acknowledgement. The producer sends the message and doesn't care whether it reaches the broker. You can lose messages.
acks=1 means leader acknowledgement. The acknowledgement is sent as soon as the leader gets the message, without waiting for the other replicas to replicate it.
acks=all means the acknowledgement is sent when all in-sync replicas have written the message (the leader waits for the in-sync replicas to replicate it).
min.insync.replicas is the minimum number of in-sync replicas required to produce messages.
For example:
If you have 3 brokers, the replication factor of a topic is 3, and min.insync.replicas is 1, then initially the messages you produce are sent to the leader and the 2 replicas replicate them. But in case of broker failure or slowness on some of the brokers, your number of in-sync replicas can drop to just 1. At that point, even if you set acks=all, your messages will be stored only on the leader (until the problem with the brokers is fixed and they catch up with the leader).
So the minimum recommended configuration to avoid message loss is having 3 brokers and this config (a code sketch of this setup follows after the alternative below):
topic replication factor=3
min.insync.replicas=2
acks=all
But if you want 3 replicas to acknowledge every write in any case, then this configuration will work:
number of brokers in cluster=5
topic replication factor=3
min.insync.replicas=3
acks=all
With this config you can also tolerate up to 2 broker failures in the cluster.
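Here is a minimal sketch of creating a topic with the first recommended configuration above (replication factor 3, min.insync.replicas=2) using Kafka's Java AdminClient. The bootstrap address, topic name "orders", and partition count are placeholders; the matching producer simply sets acks=all as described.

import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateDurableTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker address
        try (AdminClient admin = AdminClient.create(props)) {
            // Hypothetical topic "orders": 3 partitions, replication factor 3,
            // with a topic-level override requiring 2 in-sync replicas per write.
            NewTopic topic = new NewTopic("orders", 3, (short) 3)
                    .configs(Map.of("min.insync.replicas", "2"));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}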

Apache Kafka Cluster Consumer Fail Over

I have a setup of a 3-node ZooKeeper ensemble and a 3-broker cluster. When one of my brokers goes down, the producer does not give any error, but the consumers throw an error saying:
Marking coordinator Dead for the group... Discovered coordinator for
the group.
As far as I know, as long as at least one broker is available in the cluster, I should not stop being able to consume messages.
But as of now, with server.1, server.2, and server.3, if my server.2 goes down, all my consumers stop consuming messages.
What are the exact parameters to set to achieve failover for producers as well as consumers?
if my server.2 goes down, all my consumers stop consuming messages.
For starters, disable unclean leader election on the brokers, and create your topics with --replication-factor=3 and a configuration of min.insync.replicas=2.
To ensure that a producer gets at least two durable writes (as set by min.insync.replicas), set acks=all.
Then, if any broker fails, and assuming the leader election does not hit any error, producers and consumers should seamlessly reconnect to the new leaders of the TopicPartitions.
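A minimal sketch of applying the topic-level part of that advice to an existing topic with the Java AdminClient: turning off unclean leader election and requiring two in-sync replicas. The bootstrap address and the topic name "events" are placeholders; on the producer side the only change is acks=all.

import java.util.Arrays;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class HardenTopicForFailover {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker address
        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "events"); // hypothetical topic
            admin.incrementalAlterConfigs(Map.of(topic, Arrays.asList(
                    // Never promote an out-of-sync replica to leader, even if it means unavailability.
                    new AlterConfigOp(new ConfigEntry("unclean.leader.election.enable", "false"),
                                      AlterConfigOp.OpType.SET),
                    // Reject writes unless at least 2 replicas are in sync.
                    new AlterConfigOp(new ConfigEntry("min.insync.replicas", "2"),
                                      AlterConfigOp.OpType.SET)
            ))).all().get();
        }
    }
}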

What happens when one of the Kafka replicas is down

I have a cluster of 2 Kafka brokers and a topic with replication factor 2. If one of the brokers dies, will my producers be able to continue sending new messages to this degraded cluster of 1 node? Or does replication factor 2 require 2 alive nodes, so messages will be refused?
It depends on a few factors:
What is your producer configuration for acks? If you set it to "all", the leader broker won't answer with an ACK until the message has been replicated to all nodes in the ISR list. At that point it is up to your producer to decide whether it cares about ACKs or not.
What is your value for min.insync.replicas? If the number of in-sync nodes falls below this config, the leader broker won't accept more messages from producers until more nodes are available.
So basically your producers may be paused for a while, until more nodes come back up.
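A minimal sketch of what that pause looks like from the producer's side, assuming acks=all, a placeholder broker address, and a hypothetical topic my-topic: while the partition's ISR is below min.insync.replicas, the send fails, typically with NotEnoughReplicasException (or a TimeoutException once the client's internal retries give up).

import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.NotEnoughReplicasException;
import org.apache.kafka.common.serialization.StringSerializer;

public class DegradedClusterSend {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.ACKS_CONFIG, "all"); // require the configured in-sync replicas

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            try {
                producer.send(new ProducerRecord<>("my-topic", "value")).get(); // hypothetical topic
            } catch (ExecutionException e) {
                if (e.getCause() instanceof NotEnoughReplicasException) {
                    // Too few in-sync replicas: back off and try again once brokers rejoin the ISR.
                    System.out.println("ISR below min.insync.replicas: " + e.getCause().getMessage());
                } else {
                    throw new RuntimeException(e.getCause());
                }
            }
        }
    }
}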
Messages will not be ignored if the number of alive brokers is smaller than the configured number of replicas. Whenever a new Kafka broker joins the cluster, the data gets replicated to that node.
You can reproduce this scenario by configuring the replication factor as 3 or more and starting only one broker.
Kafka will handle reassigning partitions for producers and consumers that were dealing with the lost partitions, but it will be problematic for new topics.
You can start one broker with a replication factor of 2 or 3. It does work. However, you cannot create a topic with that replication factor until you have that many brokers in the cluster. Whether the topic is auto-created on the first message or created manually, Kafka will throw an error:
kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 1 --topic test
Error while executing topic command : Replication factor: 3 larger than available brokers: 1.
[2018-08-08 15:23:18,339] ERROR org.apache.kafka.common.errors.InvalidReplicationFactorException: Replication factor: 3 larger than available brokers: 1.
As soon as a new node joins the Kafka cluster, the data will be replicated to it; the replication factor does not affect whether messages can be published.
replication-factor 2 doesn't require 2 live brokers; whether messages are published while one broker is down depends on these configurations:
- acks
- min.insync.replicas
Check those configurations, as mentioned above by @Javier.

Apache Kafka Topic Partition Message Handling

I'm a bit confused about topic partitioning in Apache Kafka, so I'm sketching out a simple use case and would like to know what happens in different scenarios. Here it is:
I have a topic T that has 4 partitions TP1, TP2, TP3 and TP4.
Assume that I have 8 messages M1 to M8. Now when my producer sends these messages to the topic T, how will they be received by the Kafka broker under the following scenarios:
Scenario 1: There is only one Kafka broker instance that has topic T with the aforementioned partitions.
Scenario 2: There are two Kafka broker instances, each node having the same topic T with the aforementioned partitions.
Now assuming that kafka broker instance 1 goes down, how will the consumers react? I'm assuming that my consumer was reading from broker instance 1.
I'll answer your questions by walking you through partition replication, because you need to learn about replication to understand the answer.
A single broker is considered the "leader" for a given partition. All produces and consumes occur with the leader. Replicas of the partition are kept on a configurable number of other brokers. The leader handles replicating a produce to the other replicas. Other replicas that are caught up to the leader are called "in-sync replicas." You can configure what "caught up" means.
A message is only made available to consumers when it has been committed to all in-sync replicas.
If the leader for a given partition fails, the Kafka controller will elect a new leader from the list of in-sync replicas, and consumers will begin consuming from this new leader. Consumers will see a few milliseconds of added latency while the new leader is elected. A new controller will also be elected automatically if the controller fails (this adds more latency, too).
If the topic is configured with no replicas, then when the leader of a given partition fails, consumers can't consume from that partition until the broker that was the leader is brought back online. Or, if it is never brought back online, the data previously produced to that partition will be lost forever.
To answer your question directly:
Scenario 1: if replication is configured for the topic, and there exists an in-sync replica for each partition, a new leader will be elected, and consumers will only experience a few milliseconds of latency because of the failure.
Scenario 2: now that you understand replication, I believe you'll see that this scenario is Scenario 1 with a replication factor of 2.
You may also be interested to learn about acks in the producer.
In the producer, you can configure acks such that the produce is acknowledged when:
the message is put on the producer's socket buffer (acks=0)
the message is written to the log of the lead broker (acks=1)
the message is written to the log of the lead broker, and replicated to all other in-sync replicas (acks=all)
Further, you can configure the minimum number of in-sync replicas required to commit a produce. Then, if not enough in-sync replicas exist given this configuration, the produce will fail. You can build your producer to handle this failure in different ways: buffer, retry, do nothing, block, etc.
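As an illustration of that last point, here is a minimal sketch (placeholder broker address, hypothetical topic my-topic) of an asynchronous send whose callback picks one of those strategies, buffering the record for a later retry when the failure is a shortage of in-sync replicas.

import java.util.Properties;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.NotEnoughReplicasException;
import org.apache.kafka.common.serialization.StringSerializer;

public class RetryOnIsrShortage {
    // Records that could not be committed are parked here for a later attempt.
    private static final BlockingQueue<ProducerRecord<String, String>> retryQueue = new LinkedBlockingQueue<>();

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.ACKS_CONFIG, "all"); // commit requires the configured in-sync replicas

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record = new ProducerRecord<>("my-topic", "payload"); // hypothetical topic
            producer.send(record, (metadata, exception) -> {
                if (exception == null) {
                    System.out.println("committed at offset " + metadata.offset());
                } else if (exception instanceof NotEnoughReplicasException) {
                    retryQueue.offer(record); // buffer: retry once more replicas are back in sync
                } else {
                    exception.printStackTrace(); // some other failure: drop, alert, block, etc.
                }
            });
            producer.flush(); // ensure the callback runs before the producer closes
        }
    }
}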