Kafka cluster fail handling explanation


I would like to set up an Apache Kafka cluster to use in a new project. Unfortunately, I can't find any detailed explanation of how Kafka handles broker failures and network partitioning.
For example, if I have a cluster made of 2 or more brokers and 1 node fails, does the node that is still up keep accepting messages?
If yes, when the second node comes back up, how does it resync its missing data?

Have a look at the description of the replication protocol that Kafka uses. Each partition in a Kafka topic has a 'leader', and messages are sent to the leader. Messages are replicated to 'followers'.
So, to answer your questions specifically, my understanding is:
If I have a cluster made of 2 or more brokers and 1 node fails, does the node that is still up keep accepting messages?
Only one node accepts messages for a given partition anyway: the leader. If a follower fails, the leader continues to accept messages.
If the leader fails, a new leader is elected from the followers that are up to date.
If yes, when the second node comes back up, how does it resync its missing data?
'Followers' act as consumers of the 'leader', so a follower, once brought back up, continues to consume messages from the leader until it is back in sync.
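To make the "followers act as consumers" idea concrete, here is a toy in-memory model in plain Java. It is a sketch under simplifying assumptions, not Kafka's replica fetcher: each replica is just an append-only list, offsets are list positions, and the names (FollowerCatchUpSketch, ReplicaLog, fetch, catchUp) and the batch size are made up. It also ignores log truncation and leader epochs. A restarted follower simply keeps fetching from its own log end offset until it matches the leader.

```java
import java.util.ArrayList;
import java.util.List;

// Toy in-memory model, NOT Kafka's implementation: a replica is an append-only list of
// records and its "log end offset" (LEO) is simply the list size.
class FollowerCatchUpSketch {

    static final class ReplicaLog {
        final List<String> records = new ArrayList<>();

        long logEndOffset() {
            return records.size();
        }

        void append(String record) {
            records.add(record);
        }
    }

    // Leader side (hypothetical helper): return up to maxRecords records starting at fetchOffset.
    static List<String> fetch(ReplicaLog leader, long fetchOffset, int maxRecords) {
        int from = (int) Math.min(fetchOffset, leader.logEndOffset());
        int to = (int) Math.min(leader.logEndOffset(), from + (long) maxRecords);
        return new ArrayList<>(leader.records.subList(from, to));
    }

    // Follower side: fetch from its own LEO until the leader has nothing new.
    // Returns how many fetch requests were needed.
    static int catchUp(ReplicaLog follower, ReplicaLog leader, int maxRecords) {
        int fetches = 0;
        while (true) {
            List<String> batch = fetch(leader, follower.logEndOffset(), maxRecords);
            fetches++;
            if (batch.isEmpty()) {
                return fetches; // follower LEO now equals leader LEO
            }
            batch.forEach(follower::append);
        }
    }

    public static void main(String[] args) {
        ReplicaLog leader = new ReplicaLog();
        ReplicaLog follower = new ReplicaLog();
        for (int i = 0; i < 3; i++) { // offsets 0..2 exist on both replicas
            leader.append("m" + i);
            follower.append("m" + i);
        }
        for (int i = 3; i < 10; i++) { // follower is down; the leader keeps accepting 3..9
            leader.append("m" + i);
        }
        int fetches = catchUp(follower, leader, 4); // follower restarts, fetches 4 at a time
        System.out.println("fetches=" + fetches + ", follower LEO=" + follower.logEndOffset()
                + ", leader LEO=" + leader.logEndOffset());
    }
}
```

Running main should report that the follower needed three fetches (two non-empty batches, then an empty one) to reach the leader's log end offset of 10.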

Related

What are the different scenarios where Kafka Broker tries to elect a leader for a partition?

I'm trying to list all the different scenarios in which Kafka does a leader election. So far, as per my research, a leader election is done when a node goes down: the partitions that were present on the node that went down require new leaders, and hence a leader election happens. Are there any other scenarios where leader election occurs?
I'm trying this to reproduce the NOT_LEADER_FOR_PARTITION exception, which I believe occurs when a client sends to a broker that is not the leader for a partition. I believe this is due to outdated metadata in the producer, which can be caused by a leader election; hence my efforts to reproduce it.
I tried publishing while stopping a VM with a broker in it, but haven't been able to replicate it yet.

Yes, your finding is correct. Basically, a leader election happens when a node that is a leader goes down in the Kafka cluster. But that is not the only time a leader election happens. The following are other scenarios in which a leader election happens:
When a new partition is added
When a leader for a partition becomes unavailable
When a partition is migrated to a different node
Coming to the NOT_LEADER_FOR_PARTITION exception: to replicate it, you can bring down the leader of your partition and send data. But you have to make sure that your client's metadata still lists that node as the leader. To give a little more insight, the client fetches metadata before sending data (it may not fetch it every time, since metadata can be cached). So you should make sure that the metadata your client holds points to the old leader (now a non-leader node) instead of the new leader.
In short, you can use any of the above-mentioned scenarios to produce a NotLeaderForPartition exception. The only requirement is that your client sends data to a non-leader node.
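The stale-metadata situation described above can be modelled in a few lines. This is a toy simulation with hypothetical names (StaleMetadataSketch, handleProduce, Client), not the Kafka client: a broker rejects a produce request for a partition it no longer leads, and the client, which still has the old leader cached, refreshes its metadata and retries. The real Java producer treats this error as retriable and handles it along similar lines internally, which can make the exception hard to observe from application code. In this model the old leader is still running when leadership moves away from it (as with a reassignment); a broker that is completely down cannot send any error response at all.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: why a client with a stale cached leader gets NOT_LEADER_FOR_PARTITION
// and how it recovers by refreshing metadata. All names here are hypothetical.
class StaleMetadataSketch {

    // Authoritative partition -> leader broker id (what the controller decided).
    static final Map<Integer, Integer> clusterLeaders = new HashMap<>();

    // A broker only accepts produce requests for partitions it currently leads.
    static String handleProduce(int brokerId, int partition) {
        Integer leader = clusterLeaders.get(partition);
        return (leader != null && leader == brokerId) ? "NONE" : "NOT_LEADER_FOR_PARTITION";
    }

    static final class Client {
        final Map<Integer, Integer> cachedLeaders = new HashMap<>();
        int metadataRefreshes = 0;

        // Copies the current cluster view into the client's cache.
        void refreshMetadata() {
            cachedLeaders.clear();
            cachedLeaders.putAll(clusterLeaders);
            metadataRefreshes++;
        }

        // Sends to the cached leader; on NOT_LEADER_FOR_PARTITION, refreshes and retries once.
        String send(int partition) {
            String error = handleProduce(cachedLeaders.getOrDefault(partition, -1), partition);
            if (error.equals("NOT_LEADER_FOR_PARTITION")) {
                System.out.println("got " + error + ", refreshing metadata");
                refreshMetadata();
                error = handleProduce(cachedLeaders.getOrDefault(partition, -1), partition);
            }
            return error;
        }
    }

    public static void main(String[] args) {
        clusterLeaders.put(0, 1);   // partition 0 is led by broker 1
        Client client = new Client();
        client.refreshMetadata();   // client caches "partition 0 -> broker 1"

        clusterLeaders.put(0, 2);   // leadership moves to broker 2; the client is now stale
        System.out.println("result: " + client.send(0));
    }
}
```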

Apache kafka about replica and partitions

I tried to follow
https://medium.com/@iet.vijay/kafka-multi-brokers-multi-consumers-and-message-ordering-b61ad7841875
to create multiple brokers and consumers.
I am able to produce messages and consume them.
When I describe the topic, I get the output shown in the image.
Can someone explain the partitions, leader, and replicas in that output?

By default, all producer and consumer requests for a partition are sent to that partition's leader broker, which is elected by the Kafka controller.
Replicas lists every broker that holds a copy of the partition, including the leader; the non-leader replicas are followers. Followers can be in or out of sync with the leader (ISR = "in-sync replicas").
The numbers shown are the broker.id values from each broker's properties file (if broker.id is not set, Kafka auto-generates an ID starting above reserved.broker.max.id, not from 0).
More details at https://kafka.apache.org/documentation/#replication
Worth pointing out: running multiple brokers on a single host is less than ideal. You still have a single point of failure, and you cause unnecessary duplicate writes to a single hard drive for each replica.
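The describe output from the question was posted as an image and is not reproduced here. As a sketch of how to read such output, the code below parses one per-partition line of kafka-topics --describe output. It assumes the usual whitespace-separated "Key: value" layout (the exact fields vary between Kafka versions), and the sample line is made up: partition 0 of a topic called demo, led by broker 1, with replicas on brokers 1, 2 and 0, of which only 1 and 2 are in the ISR. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: read one per-partition line of `kafka-topics --describe` output.
// Assumes whitespace-separated "Key: value" pairs; fields differ between Kafka versions.
class DescribeLineSketch {

    // Turns "Leader: 1" style pairs into a map; a key followed directly by another key gets "".
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new HashMap<>();
        String[] tokens = line.trim().split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].endsWith(":")) {
                String key = tokens[i].substring(0, tokens[i].length() - 1);
                boolean hasValue = i + 1 < tokens.length && !tokens[i + 1].endsWith(":");
                fields.put(key, hasValue ? tokens[++i] : "");
            }
        }
        return fields;
    }

    static List<String> brokerIds(String commaSeparated) {
        return Arrays.asList(commaSeparated.split(","));
    }

    // Brokers that hold a replica of the partition but are not in the ISR.
    static List<String> outOfSync(Map<String, String> fields) {
        List<String> result = new ArrayList<>(brokerIds(fields.getOrDefault("Replicas", "")));
        result.removeAll(brokerIds(fields.getOrDefault("Isr", "")));
        return result;
    }

    public static void main(String[] args) {
        String line = "Topic: demo\tPartition: 0\tLeader: 1\tReplicas: 1,2,0\tIsr: 1,2";
        Map<String, String> fields = parse(line);
        System.out.println("leader broker.id: " + fields.get("Leader"));
        System.out.println("replicas (leader included): " + fields.get("Replicas"));
        System.out.println("in sync: " + fields.get("Isr") + ", out of sync: " + outOfSync(fields));
    }
}
```

With that sample line, broker 0 is reported as the only out-of-sync replica. Note that the leader (1) appears in both Replicas and Isr.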

Do Kafka brokers in a cluster share the same location to store data logs?

I am reading an article about Kafka basics. It says that if a Kafka broker X dies in a cluster, broker X's data copies will move to the other live brokers in the cluster.
If that is the case, will ZooKeeper/the Kafka controller copy broker X's data folder and move it to the live brokers, like a copy-paste from one machine's hard disk to another (a physical copy)?
Or will the live brokers share a common location, so that ZooKeeper/the controller links/points to broker X's location (a logical copy)?
I am having a little trouble understanding this. Could someone help me with it?

If a broker dies, it's dead. There's no background process that will copy data off of it.
Replication of topics only happens while the broker is running.
Plus, that image is wrong. partitions = 2 means exactly that: a third partition doesn't just appear when a broker dies.

This all depends on whether the topic has a replication factor > 1. If it does, brokers holding follower replicas constantly send fetch requests to the leader replica (a specific broker), with the goal of staying head to head with the leader (both the follower replica and the leader replica having the same records stored on disk).
So when a broker goes down, all it takes is for the controller to select and promote an in-sync replica to take over as the leader of the partition (that is the default; it can also select an out-of-sync replica if unclean leader election is enabled). No copy/paste is required: all brokers holding a partition of that topic (as a follower replica or leader replica) were already storing the same information before the broker shut down.

If a broker dies, the behaviour depends on the role of the dead broker. If it was not the leader for its partition, there is no problem: when the broker comes back online, it copies all the missing data from the leader replica. If the dead broker was the leader for the partition, a new leader is elected according to some rules. If the newly elected leader was in sync before the old leader died, no committed messages are lost, and the followers sync their replicas from the new leader, as the old leader will do when it is up again. If the newly elected leader was not in sync, you might lose some messages. In any case, you can drive the behaviour of your Kafka cluster by setting various parameters (for example acks, min.insync.replicas and unclean.leader.election.enable) to balance speed, data integrity and reliability.
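The election rules mentioned in these answers can be sketched as follows. This is a toy model, not the controller's code, and the names (LeaderElectionSketch, electLeader, recordsMissingOnNewLeader) are hypothetical. It assumes the rule described above: prefer a live replica that is in the ISR (here, the first one in assignment order), and only fall back to an out-of-sync live replica when unclean leader election is allowed, accepting that records the new leader never fetched are gone.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Toy model of choosing a new partition leader (NOT Kafka's controller code).
class LeaderElectionSketch {

    static Optional<Integer> electLeader(List<Integer> assignedReplicas, Set<Integer> isr,
                                         Set<Integer> liveBrokers, boolean uncleanAllowed) {
        // Clean election: first live replica (in assignment order) that is in the ISR.
        for (int id : assignedReplicas) {
            if (liveBrokers.contains(id) && isr.contains(id)) {
                return Optional.of(id);
            }
        }
        // Unclean election: any live replica, even one that had fallen behind.
        if (uncleanAllowed) {
            for (int id : assignedReplicas) {
                if (liveBrokers.contains(id)) {
                    return Optional.of(id);
                }
            }
        }
        return Optional.empty(); // partition stays offline until an ISR member comes back
    }

    // Records the old leader had that the new leader never fetched (lost after the switch).
    static long recordsMissingOnNewLeader(Map<Integer, Long> logEndOffsets, int oldLeader, int newLeader) {
        return Math.max(0L, logEndOffsets.get(oldLeader) - logEndOffsets.get(newLeader));
    }

    public static void main(String[] args) {
        List<Integer> assignment = Arrays.asList(1, 2, 3);
        Set<Integer> isr = new HashSet<>(Arrays.asList(1, 2));      // broker 3 had fallen behind
        Set<Integer> live = new HashSet<>(Arrays.asList(2, 3));     // leader 1 died
        System.out.println("clean, 2 alive: " + electLeader(assignment, isr, live, false));

        Set<Integer> onlyLagger = new HashSet<>(Arrays.asList(3));  // brokers 1 and 2 both down
        System.out.println("clean, only 3 alive: " + electLeader(assignment, isr, onlyLagger, false));
        System.out.println("unclean, only 3 alive: " + electLeader(assignment, isr, onlyLagger, true));

        Map<Integer, Long> logEndOffsets = new HashMap<>();
        logEndOffsets.put(1, 100L);
        logEndOffsets.put(3, 90L);
        System.out.println("records lost if 3 replaces 1: " + recordsMissingOnNewLeader(logEndOffsets, 1, 3));
    }
}
```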

Fixing under replicated partitions in kafka

In our production environment, we often see partitions go under-replicated while messages are being consumed from the topics. We are using Kafka 0.11. From the documentation, what I understand is:
Configuration parameter replica.lag.max.messages was removed. Partition leaders will no longer consider the number of lagging messages when deciding which replicas are in sync.
Configuration parameter replica.lag.time.max.ms now refers not just to the time passed since last fetch request from the replica, but also to time since the replica last caught up. Replicas that are still fetching messages from leaders but did not catch up to the latest messages in replica.lag.time.max.ms will be considered out of sync.
How do we fix this issue? What are the different reasons for replicas going out of sync? In our scenario, all the Kafka brokers are in a single rack of blade servers, and all use the same network with 10 Gbps Ethernet (simplex). I do not see any reason for the replicas to go out of sync due to the network.
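The replica.lag.time.max.ms rule quoted above can be expressed as a simple check. This is a toy model with made-up broker IDs, timestamps and names (IsrCheckSketch, FollowerState, isInSync), not Kafka's code: a follower counts as in sync only if it last caught up to the leader's log end offset within replica.lag.time.max.ms, regardless of how recently it fetched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of the ISR rule quoted above (NOT Kafka's code).
class IsrCheckSketch {

    static final class FollowerState {
        final int brokerId;
        final long lastCaughtUpTimeMs; // last time this follower had fetched up to the leader's LEO

        FollowerState(int brokerId, long lastCaughtUpTimeMs) {
            this.brokerId = brokerId;
            this.lastCaughtUpTimeMs = lastCaughtUpTimeMs;
        }
    }

    // Time since the follower last caught up, not time since its last fetch, decides membership.
    static boolean isInSync(FollowerState follower, long nowMs, long replicaLagTimeMaxMs) {
        return nowMs - follower.lastCaughtUpTimeMs <= replicaLagTimeMaxMs;
    }

    static List<Integer> outOfSyncBrokers(List<FollowerState> followers, long nowMs, long replicaLagTimeMaxMs) {
        List<Integer> result = new ArrayList<>();
        for (FollowerState f : followers) {
            if (!isInSync(f, nowMs, replicaLagTimeMaxMs)) {
                result.add(f.brokerId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        long now = 100_000L;
        long replicaLagTimeMaxMs = 10_000L; // example value, not necessarily your cluster's setting
        List<FollowerState> followers = Arrays.asList(
                new FollowerState(2, 95_000L),  // caught up 5 s ago: in sync
                new FollowerState(3, 80_000L)); // still fetching, but last caught up 20 s ago
        System.out.println("out of sync: " + outOfSyncBrokers(followers, now, replicaLagTimeMaxMs));
    }
}
```

In this model, broker 3 is reported as out of sync even though it is still fetching. A follower that keeps fetching but never reaches the leader's end offset, for example because the leader is taking writes faster than the follower copies them, drops out of the ISR without any network failure.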

We faced the same issue.
The solution was:
Restart the ZooKeeper leader.
Restart the broker(s) that are not replicating some of the partitions.
No data loss.
The issue is due to a faulty state in ZooKeeper; there was an open ZooKeeper issue for this, but I don't remember the number.

I faced the same issue on Kafka 2.0.
After restarting the Kafka controller node, all the replicas caught up.
But I am still looking for the reason why a few partitions are under-replicated while the other partitions of the same topic on the same nodes work fine; I see this issue on random partitions.

Do NOT run reassignment for all topics together; consider running it in small portions.
Find the topic that has under-replicated partitions and where the reassignment process can't be completed.
Set unclean.leader.election.enable to true for this topic.
Find the stuck under-replicated partition for this topic and check its leader ID.
Stop that broker (just the Kafka service, not the instance).
Execute Preferred Replica Election (in yahoo/kafka-manager or manually).
Start the broker again.
Repeat for the rest of the topics that have the same problem.
I also tried this advice, but it didn't help me: https://stackoverflow.com/a/51063607/1929406

How producers find the Kafka partition leader

The producer sends messages after being configured with a list of Kafka brokers, as follows:
props.put("bootstrap.servers", "127.0.0.1:9092,127.0.0.1:9092,127.0.0.1:9092");
I wonder how the producer knows which of the three brokers holds the partition leader.
For a typical distributed server, you have either a load balancer or a virtual IP, but how is the load distributed in Kafka?
Does the producer connect to one broker at random and look for the broker with the partition leader?

A Kafka cluster contains multiple broker instances. For each partition, at any given time exactly one broker is the leader, while the other brokers holding that partition are followers; the followers that are fully caught up form the set of in-sync replicas (ISR). When a partition's leader broker is taken down unexpectedly, one of the ISR becomes the leader.
Kafka's controller (which relies on ZooKeeper in these versions) chooses one of each partition's replicas as its leader. When a producer publishes a message to a partition in a topic, it sends the message to that partition's leader.
According to Kafka documentation:
The partitions of the log are distributed over the servers in the Kafka cluster with each server handling data and requests for a share of the partitions. Each partition is replicated across a configurable number of servers for fault tolerance.
Each partition has one server which acts as the "leader" and zero or more servers which act as "followers". The leader handles all read and write requests for the partition while the followers passively replicate the leader. If the leader fails, one of the followers will automatically become the new leader. Each server acts as a leader for some of its partitions and a follower for others so load is well balanced within the cluster.
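The last sentence of that quote (each server leads some partitions and follows others) can be illustrated with a simplified round-robin placement. This is an assumption for illustration only; Kafka's real replica assignment also picks a random starting broker and shifts follower positions, and the names here (ReplicaPlacementSketch, assign, leaderCounts, followerCounts) are hypothetical. The first replica in each list is treated as the leader, matching Kafka's notion of the preferred leader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified round-robin placement (an illustration, NOT Kafka's assignment algorithm).
class ReplicaPlacementSketch {

    // Partition p gets brokers (p + k) % numBrokers for k = 0..replicationFactor-1;
    // the first broker in each list acts as the (preferred) leader.
    static List<List<Integer>> assign(int partitions, int numBrokers, int replicationFactor) {
        if (replicationFactor > numBrokers) {
            throw new IllegalArgumentException("replication factor cannot exceed broker count");
        }
        List<List<Integer>> assignment = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            List<Integer> replicas = new ArrayList<>();
            for (int k = 0; k < replicationFactor; k++) {
                replicas.add((p + k) % numBrokers);
            }
            assignment.add(replicas);
        }
        return assignment;
    }

    // How many partitions each broker leads (first replica in the list).
    static Map<Integer, Integer> leaderCounts(List<List<Integer>> assignment) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (List<Integer> replicas : assignment) {
            counts.merge(replicas.get(0), 1, Integer::sum);
        }
        return counts;
    }

    // How many partitions each broker follows (every replica after the first).
    static Map<Integer, Integer> followerCounts(List<List<Integer>> assignment) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (List<Integer> replicas : assignment) {
            for (int k = 1; k < replicas.size(); k++) {
                counts.merge(replicas.get(k), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<Integer>> assignment = assign(6, 3, 2);
        for (int p = 0; p < assignment.size(); p++) {
            System.out.println("partition " + p + " replicas " + assignment.get(p)
                    + " leader " + assignment.get(p).get(0));
        }
        System.out.println("partitions led per broker: " + leaderCounts(assignment));
        System.out.println("partitions followed per broker: " + followerCounts(assignment));
    }
}
```

With 6 partitions, 3 brokers and a replication factor of 2, every broker leads 2 partitions and follows 2 others.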
You can find topic and partition leader using this piece of code.
EDIT:
The producer sends a metadata request with a list of topics to one of the brokers you supplied when configuring the producer.
The response from the broker contains the list of partitions in those topics and the leader for each partition. The producer caches this information and therefore knows where to send the messages.

This is quite an old question, but I had the same question, and after researching it I want to share the answer in the hope that it helps others.
To determine the leader of a partition, the producer uses a request type called a metadata request, which includes the list of topics the producer is interested in.
The broker's response specifies which partitions exist in those topics, the replicas for each partition, and which replica is the leader.
Metadata requests can be sent to any broker because all brokers have a metadata cache that contains this information.
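The bootstrap-and-metadata flow described in these answers can be sketched as a toy simulation. The broker addresses, metadata and names (BootstrapSketch, metadataRequest, bootstrap) are made up, and the real client chooses which broker to ask differently; this sketch simply tries the bootstrap list in order. The point is only that any reachable broker can answer, and that records then go straight to each partition's leader, with no load balancer or virtual IP in between.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Toy model (NOT the real client): every broker keeps the same metadata cache, so any
// reachable bootstrap broker can answer a metadata request.
class BootstrapSketch {

    // Hypothetical cluster-wide metadata: partition -> leader address.
    static final Map<Integer, String> METADATA = new HashMap<>();
    // Hypothetical set of brokers that do not answer (down or unreachable).
    static final Set<String> UNREACHABLE = new HashSet<>();

    // A live broker answers with a copy of its metadata cache; an unreachable one does not answer.
    static Optional<Map<Integer, String>> metadataRequest(String broker) {
        if (UNREACHABLE.contains(broker)) {
            return Optional.empty();
        }
        Map<Integer, String> copy = new HashMap<>(METADATA);
        return Optional.of(copy);
    }

    // Tries the bootstrap brokers in order; one answer is enough.
    static Map<Integer, String> bootstrap(List<String> bootstrapServers) {
        for (String broker : bootstrapServers) {
            Optional<Map<Integer, String>> response = metadataRequest(broker);
            if (response.isPresent()) {
                return response.get();
            }
        }
        throw new IllegalStateException("no bootstrap broker reachable");
    }

    public static void main(String[] args) {
        METADATA.put(0, "broker-b:9092");
        METADATA.put(1, "broker-c:9092");
        UNREACHABLE.add("broker-a:9092");
        Map<Integer, String> cache = bootstrap(
                Arrays.asList("broker-a:9092", "broker-b:9092", "broker-c:9092"));
        for (int partition = 0; partition < 2; partition++) {
            System.out.println("partition " + partition + " -> send to leader " + cache.get(partition));
        }
    }
}
```

This is also why the bootstrap list does not need to contain every broker: it only needs at least one reachable broker from which the client can fetch the full metadata.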
