Kafka - How to recover if a partition is lost?

I have 4 Kafka nodes in a cluster, one topic split into 40 partitions with a replication factor of 2. The Kafka version is 2.3.1.
How can I recover from the situation where two Kafka nodes die at the same time, it is not possible to start them again, and their Kafka logs are lost?
I'm sure that I have lost some data because some partitions are gone (those partitions had replicas only on the dead nodes).
I tried adding two new Kafka nodes and reassigning partitions across all 4 available Kafka nodes. In the end, the lost partitions were not reassigned to the two new Kafka nodes, and clients cannot publish data that maps to the lost partitions.

Kafka recovers lost partitions on its own only if those partitions still have at least one live replica that was previously in sync. Otherwise, unclean.leader.election.enable must be set on the brokers to allow the leader to move to an out-of-sync replica.
Since your partitions had only 2 replicas and you lost 2 nodes, you may have lost some partitions entirely.
You can go from 2 replicas to 4 replicas for more reliability.
The two added nodes should use the same broker.id values as the failed nodes so they can take over their replica assignments.
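For the last-resort case where every previously in-sync replica of a partition is gone, a minimal sketch of enabling unclean leader election is shown below. The topic name and ZooKeeper endpoint are placeholders, and enabling this setting trades consistency (accepting possible data loss) for availability, so it is usually reverted once the cluster has recovered.

    # Placeholders: my-topic, localhost:2181 (Kafka 2.3.x topic-level override)
    bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
      --entity-type topics --entity-name my-topic \
      --add-config unclean.leader.election.enable=true

    # Alternatively, set unclean.leader.election.enable=true in each broker's
    # server.properties (requires a broker restart) and revert it afterwards.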

Apache kafka about replica and partitions

I tried to follow
https://medium.com/#iet.vijay/kafka-multi-brokers-multi-consumers-and-message-ordering-b61ad7841875
to create multiple brokers and consumers.
I am able to produce messages and consume them.
When I try to describe the topic, below is the output I got.
Can someone explain the partitions, leader, and replicas shown in the image above?
All producer and consumer requests are sent to the leader broker, which is elected by the Kafka Controller.
Replicas are the non-leader brokers. Replicas can be in or out of sync with the leader (ISR = "in-sync replica").
The numbers shown are the broker.id values from each broker's properties, which increment from 0 by default if not set explicitly.
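For reference, a hypothetical describe output for a 3-partition, 3-replica topic looks like the sketch below; the topic name, endpoint, and broker ids 0/1/2 are assumed here rather than taken from the question.

    bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic my-topic

    # Topic: my-topic  PartitionCount: 3  ReplicationFactor: 3  Configs:
    #   Topic: my-topic  Partition: 0  Leader: 0  Replicas: 0,1,2  Isr: 0,1,2
    #   Topic: my-topic  Partition: 1  Leader: 1  Replicas: 1,2,0  Isr: 1,2,0
    #   Topic: my-topic  Partition: 2  Leader: 2  Replicas: 2,0,1  Isr: 2,0,1

Each partition has one leader broker, Replicas lists every broker assigned to host that partition, and Isr lists the subset that is currently in sync.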
More details at https://kafka.apache.org/documentation/#replication
Worth pointing out that running multiple brokers on a single host is less than ideal; you still have a single point of failure, and each replica causes unnecessary duplicate writes to the same single hard drive.

How failover works in kafka along with keeping up replication factor

I am trying to understand how failover and replication factors work in kafka.
Let's say my cluster has 3 brokers and the replication factor is also 3. In this case each broker will have one copy of each partition, and one of the brokers is the leader. If the leader broker fails, then one of the follower brokers will become the leader, but now the replication factor is down to 2. At this point, if I add a new broker to the cluster, will Kafka make sure that the replication factor is 3 and copy the required data onto the new broker?
How will the above scenario work if my cluster already has an additional broker?
In your setup (3 brokers, 3 replicas), when 1 broker fails, Kafka will automatically elect new leaders (on the remaining brokers) for all the partitions whose leaders were on the failed broker.
The replication factor does not change. The replication factor is a topic configuration that can only be changed by the user.
Similarly, the replica list does not change; it lists the brokers that should host each partition.
However, the In Sync Replicas (ISR) list will change and only contain the 2 remaining brokers.
If you add another broker to the cluster, what happens depends on its broker.id:
If the broker.id is the same as the broker that failed, this new broker will start replicating data and eventually join the ISR for all the existing partitions.
If it uses a different broker.id, nothing happens automatically. You will be able to create new topics with 3 replicas (which is not possible while there are only 2 brokers), but Kafka will not automatically replicate existing partitions onto it. You can manually trigger a reassignment if needed; see the docs and the sketch below.
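As a sketch of what such a manual reassignment looks like (the topic name, partition number, and broker ids below are illustrative placeholders, not taken from the question):

    # Write a reassignment plan that puts partition 0 of my-topic on brokers 1, 2 and 3
    cat > reassignment.json <<'EOF'
    {
      "version": 1,
      "partitions": [
        { "topic": "my-topic", "partition": 0, "replicas": [1, 2, 3] }
      ]
    }
    EOF

    # Apply the plan, then re-run with --verify until it reports completion
    # (the 2.x tooling referenced here uses --zookeeper; newer releases use --bootstrap-server)
    bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
      --reassignment-json-file reassignment.json --execute
    bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
      --reassignment-json-file reassignment.json --verify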
Setting partitions aside (they are a separate Kafka concept):
The replication factor does not say how many times a topic is replicated, but rather how many times it should be replicated. It is not affected by brokers shutting down.
Once the leader broker shuts down, "leader" status moves to another broker that is in sync, meaning a broker that has replicated the current state and is not lagging behind. Electing a broker that is not in sync as leader would obviously lead to data loss, so this never happens (with the right settings).
The replicas eligible for taking over "leader" status are called in-sync replicas (ISR). This matters because of the min.insync.replicas setting, which specifies how many replicas (the leader included) must have a message before a write with acks=all is acknowledged. With min.insync.replicas=1, a message is acknowledged as "successful" as soon as it reaches the leader broker; if that broker dies before the followers catch up, any data that was not yet replicated is lost. With min.insync.replicas=2, the acknowledgement waits until at least one follower also has the data, so if the leader dies now there is still a replica covering it. If there are not enough in-sync brokers to satisfy the minimum, writes will start to fail.
So to answer your question: with 2 running brokers, min.insync.replicas=1 (the default) and a replication factor of 3, your cluster keeps running and will bring the third replica back in sync as soon as you start another broker. If another of the 2 remaining brokers dies before you bring the third one back up, you will run into problems.
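A minimal sketch of how these settings fit together, assuming a topic named my-topic and a ZooKeeper endpoint at localhost:2181 (both placeholders):

    # Require at least 2 in-sync replicas (leader + 1 follower) before a write
    # with acks=all is acknowledged
    bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
      --entity-type topics --entity-name my-topic \
      --add-config min.insync.replicas=2

    # In the producer configuration, pair this with:
    # acks=all

min.insync.replicas only has an effect on producers that use acks=all; with acks=1 the leader alone still acknowledges writes.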

How to do data rebalance on kafka if data is stored persistently

I'm new to Kafka and preparing to use it for production.
What strategies can be used to rebalance data storage when the brokers hosting a topic's current partitions are running out of disk space, given that more brokers can be added to the cluster?
As a simple example, say a topic starts with 3 partitions (1 replica to simplify the problem), 3 brokers each store 1 partition of the topic, and each partition takes up 1 TB of disk space.
How can I add 3 new broker servers, alter the topic's partition count to 6, and end up with each of the 6 partitions taking up 500 GB of disk space on its broker?
I think this problem is critical for storing a large amount of data indefinitely in a Kafka cluster.
Thanks.
kafka-reassign-partitions and kafka-preferred-replica-election are the built-in commands for handling such relocation tasks, as Kafka does not perform them automatically on cluster expansion.
There are vendor alternatives, such as those from Confluent and DataDog.
How can I add 3 more new broker servers
See Docs - Expanding your cluster
alter topic's partition amount to 6
Use kafka-topics --alter and increase the partition count (note: this does not relocate existing data to the new partitions, or in other words "re-key" the topic).
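A minimal sketch of that step, with my-topic and the ZooKeeper endpoint as placeholders:

    # Increase the partition count from 3 to 6; the new partitions start empty,
    # and existing records are not moved or re-keyed
    bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic my-topic --partitions 6

Spreading the existing partitions' data onto the new brokers is the separate kafka-reassign-partitions step mentioned above.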
Also, keep in mind that once you create a topic, its replicas and ISRs get defined. Where possible, choose a replication factor of 3 for resiliency and durability. Having a replication factor of 2 in a 3-node cluster is not helpful in certain sticky situations: if one of the 3 brokers goes down, none of the remaining online brokers will join the replica set (to satisfy the replication factor) and move into the ISR.
In a situation like this, you end up with an incomplete ISR and, worse, a single point of failure.
Note that a broker being down is different from expanding or contracting the Kafka cluster.

Does shutting down a broker move the replicas on that broker to a new broker automatically?

I have a Kafka cluster with 6 brokers and over 60 topics, with a replication factor of either 2 or 3. We plan to replace all the existing brokers with new nodes.
I have 2 questions:
Once we add 6 new nodes to the cluster, making it 12 nodes in total, and shut down the old brokers one by one, will the replicas move to the new brokers automatically?
If not, we will have to move them using the reassignment tool; in that case, do we need to move the __consumer_offsets topic as well, or will Kafka take care of that itself?
No, replicas are not moved automatically. Before shutting down the old brokers, you'll have to reassign replicas using the kafka-reassign-partitions.sh tool. See http://kafka.apache.org/documentation/#basic_ops_cluster_expansion for details on how to use this tool.
Yes, you will need to move all partitions, including the internal ones (__consumer_offsets and __transaction_state).
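A sketch of generating such a plan for the new brokers; the topic list and broker ids 6-11 are placeholders, and the tool prints both the current assignment (worth saving as a rollback) and the proposed one:

    # List every topic to move, including the internal ones
    cat > topics.json <<'EOF'
    {
      "version": 1,
      "topics": [
        { "topic": "my-topic" },
        { "topic": "__consumer_offsets" },
        { "topic": "__transaction_state" }
      ]
    }
    EOF

    # Generate a proposed assignment onto the new brokers, review it, save it to a
    # file, then re-run kafka-reassign-partitions.sh with --execute on that file
    bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
      --topics-to-move-json-file topics.json \
      --broker-list "6,7,8,9,10,11" --generate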

Shutdown Kafka Cluster and then Start Kafka Cluster

I have a 2-broker Kafka cluster with a 3-node ZooKeeper cluster. When stopping and starting the Kafka cluster, what steps should I take?
Do I stop the 2 brokers individually first and then stop the 3 ZooKeeper nodes individually?
And then start the ZooKeeper nodes individually and then the 2 Kafka brokers individually?
Assumptions
This is a production cluster and you don't want any data loss.
You have partition replicas spanned across the brokers.
For each partition you have at least one replica on each broker.
All ZooKeeper nodes are accessible by each broker.
This is how I would do it:
Take down one broker at a time.
When one broker is down, describe the topics to check that the only replicas missing are the ones that belong to the broker that was taken down.
Restart the broker and verify again that all partitions are in sync before moving on to the next broker.
Then stop and start each ZooKeeper node individually, each time checking that all replicas and partitions are in sync. That way at least one ZooKeeper node is always available for the 2 brokers to maintain their metadata.
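A minimal sketch of the in-sync check used between steps, with the ZooKeeper endpoint as a placeholder; an empty result means every partition's ISR matches its replica set, so it is safe to move on to the next node:

    # List partitions whose ISR is smaller than their replica set
    bin/kafka-topics.sh --zookeeper localhost:2181 --describe --under-replicated-partitions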