I have a Kafka cluster with 3 brokers and 3 ZooKeeper nodes running. We recently added a 4th broker. When we brought it up as a new cluster, a few partitions got stored on the 4th broker as expected. The replication factor for all topics is 3, and each topic has 10 partitions.
Later, whenever we bring down the whole Kafka cluster for a maintenance activity and bring it back up, all topic partitions end up on the first 3 brokers and no partitions are stored on the 4th broker. (Note: due to a bug, we had to use a new log directory every time Kafka was brought up, pretty much like a new cluster.)
I can see that all 4 brokers are registered in ZooKeeper (when I do ls /brokers/ids I can see 4 broker ids), but partitions are not distributed to the 4th broker.
But when I trigger a partition reassignment to move a few partitions to the 4th broker, it works fine and the 4th broker starts storing the given partitions. Both the producer and consumer are able to send and fetch data from the 4th broker. I can't find the reason why this storage imbalance is happening among the Kafka brokers. Please share your suggestions.
When we brought it up as a new cluster, a few partitions got stored on the 4th broker as expected.
This should only be expected when you create new topics or expand partitions of existing ones. Topics do not automatically relocate to new brokers.
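For example (topic names, addresses, and counts below are placeholders, and the commands assume a ZooKeeper-based Kafka version), creating a new topic or adding partitions after the 4th broker joined can place replicas on it, while existing partitions stay where they are:

    # Sketch: new topics created after broker 4 joined may get replicas on it
    bin/kafka-topics.sh --create --zookeeper localhost:2181 \
      --replication-factor 3 --partitions 10 --topic new-topic
    # Sketch: expanding an existing topic can also land the new partitions on broker 4
    bin/kafka-topics.sh --alter --zookeeper localhost:2181 \
      --partitions 12 --topic existing-topic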
had to use a new log directory every time Kafka is brought up
That might explain why data is missing. It's unclear what bug you're running into, but this step shouldn't be necessary.
when I trigger a partition reassignment to move a few partitions to the 4th broker, it works fine and the 4th broker starts storing the given partitions. Both the producer and consumer are able to send and fetch data from the 4th broker
This is the correct way to expand a cluster, and it sounds like it's working as expected.
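For reference, a reassignment like the one you triggered usually looks roughly like the following sketch (topic name, partition number, broker ids, and the ZooKeeper address are placeholders; newer Kafka versions use --bootstrap-server instead of --zookeeper). The replica list here deliberately includes broker 3, the 4th broker:

    # Sketch: hand-written reassignment file placing one partition's replicas
    # on brokers 1, 2 and 3 (the new broker)
    cat > reassignment.json <<'EOF'
    {
      "version": 1,
      "partitions": [
        {"topic": "my-topic", "partition": 0, "replicas": [1, 2, 3]}
      ]
    }
    EOF
    bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
      --reassignment-json-file reassignment.json --execute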
I tried to follow https://medium.com/@iet.vijay/kafka-multi-brokers-multi-consumers-and-message-ordering-b61ad7841875 to create multiple brokers and consumers.
I am able to produce messages and consume them.
When I try to describe the topic, below is the output I got.
Can someone explain the partitions, leader and replicas here in the above image?
All producer and consumer requests are sent to the leader broker, which is elected by the Kafka Controller.
Replicas are the non-leader brokers. Replicas can be in or out of sync with the leader (ISR = "in-sync replica").
The numbers shown are the broker.id values from each broker's properties, which by default increment from 0 if not explicitly set.
More details at https://kafka.apache.org/documentation/#replication
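As a purely hypothetical illustration (topic name, partition count and broker ids are made up), a describe output such as

    $ bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-topic
    Topic: my-topic  PartitionCount: 3  ReplicationFactor: 3  Configs:
        Topic: my-topic  Partition: 0  Leader: 1  Replicas: 1,2,0  Isr: 1,2,0
        Topic: my-topic  Partition: 1  Leader: 2  Replicas: 2,0,1  Isr: 2,0,1
        Topic: my-topic  Partition: 2  Leader: 0  Replicas: 0,1,2  Isr: 0,1,2

would mean that partition 0 is led by the broker with broker.id=1, its replicas live on brokers 1, 2 and 0, and all three replicas are currently in sync.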
Worth pointing out that running multiple brokers on a single host is less than ideal; you still have a single point of failure, and you're causing unnecessary duplicate writes to a single hard drive for each replica.
I have a topic "reptop" with replication factor 3. My cluster consists of 4 brokers [IDs: 0, 1, 2, 3]. When the topic was created, brokers 0, 2 and 3 were assigned to it, with broker 2 as leader. Now, when one of my brokers (leader or follower) goes down, Kafka does not replicate the topic to broker 1 even though it is healthy and the ISR is smaller than the replication factor; but when the broker that had gone down and was initially assigned to the topic comes back up, Kafka replicates the topic to that node. So the question is: why does Kafka not replicate the topic to the brokers that were not assigned to it when the topic was created, even though there are healthy brokers in the cluster and the ISR is smaller than the replication factor?
This is by design. If you want to reassign the partitions, you must do so with the reassignment tool. Another option is to bring up a new broker instance with the missing ID. Kafka does not "self-heal" like, say, HDFS, and there are many cases where you wouldn't want it to. If you want it to, there are tools out there, like the Confluent rebalancer, that can be used.
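A minimal sketch of that second option, assuming the broker that went down had broker.id=0 and the replacement machine has the same Kafka distribution installed (paths are placeholders):

    # Reuse the dead broker's ID on the replacement machine
    grep broker.id config/server.properties
    # broker.id=0   <- same ID as the broker that was lost
    bin/kafka-server-start.sh config/server.properties
    # Once broker 0 re-registers, the existing replica assignment for "reptop"
    # is satisfied again and its partitions re-replicate to this node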
My issue is that I have a three-broker Kafka cluster and an availability requirement to have access to consume and produce to a topic when one or two of my three brokers are down.
I also have a reliability requirement to have a replication factor of 3. These seem to be conflicting requirements to me. Here is how my problem manifests:
I create a new topic with replication factor 3
I send several messages to that topic
I kill one of my brokers to simulate a broker issue
I attempt to consume the topic I created
My consumer hangs
I review my logs and see the error:
Number of alive brokers '2' does not meet the required replication factor '3' for the offsets topic
If I set offsets.topic.replication.factor to 1 on all my brokers, then I'm able to produce and consume my topics, even if I set the topic-level replication factor to 3.
Is this an okay configuration? Or can you see any pitfalls in setting things up this way?
You only need as many brokers as your replication factor when creating the topic.
I'm guessing in your case, you start with a fresh cluster and no consumers have connected yet. In this case, the __consumer_offsets internal topic does not exist as it is only created when it's first needed. So first connect a consumer for a moment and then kill one of the brokers.
Apart from that, in order to consume you only need 1 broker up, the leader for the partition.
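A sketch of the sequence described above, assuming quickstart-style tooling and placeholder topic names and addresses (offsets.topic.replication.factor=3 can stay in server.properties):

    # 1. With all 3 brokers up, connect a consumer once so the internal
    #    __consumer_offsets topic is created with replication factor 3
    bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
      --topic my-topic --from-beginning --timeout-ms 10000
    # 2. Now take one broker down; consuming still works, because
    #    __consumer_offsets already exists and only a partition leader is needed
    bin/kafka-console-consumer.sh --bootstrap-server localhost:9093 \
      --topic my-topic --from-beginning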
So I'm trying the Kafka quickstart as per the main documentation. I got the multi-broker cluster example all set up and tested per the instructions, and it works. For example, if I bring down one broker, the producer and consumer can still send and receive.
However, as per the example, we set up 3 brokers, and we bring down broker 2 (with broker id = 1). Now if I bring up all brokers again but bring down broker 1 (with broker id = 0), the consumer just hangs. This only happens with broker 1 (id = 0); it does not happen with broker 2 or 3. I'm testing this on Windows 7.
Is there something special here with broker 1? Looking at the configs, they are exactly the same across all 3 brokers except for the id, port number and log file location.
I thought it was just a problem with the provided console consumer, which doesn't take a broker list, so I wrote a simple Java consumer as per their documentation using the default setup but specifying the list of brokers in the "bootstrap.servers" property. No dice, I still get the same problem.
The moment I start up broker 1 (broker id = 0), the consumers just resume working. This isn't highly available / fault-tolerant behavior for the consumer... any help on how to set up an HA / fault-tolerant consumer?
Producers don't seem to have an issue.
If you follow the quickstart, the created topic has only one partition with a single replica, which is hosted on the first broker by default, namely broker 1 (broker id = 0). That's why the consumer failed when you brought down that broker.
Try creating a topic with multiple replicas (specifying --replication-factor when creating the topic) and rerun your test to see whether it gives you higher availability.
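A minimal sketch following the quickstart layout (topic name, ports and the --zookeeper flag match the older quickstart; newer Kafka versions use --bootstrap-server for topic creation instead):

    # Create a topic whose single partition is replicated on all 3 brokers
    bin/kafka-topics.sh --create --zookeeper localhost:2181 \
      --replication-factor 3 --partitions 1 --topic my-replicated-topic
    # Consume with more than one broker in the bootstrap list, so the consumer
    # can still reach the cluster when one of them is down
    bin/kafka-console-consumer.sh \
      --bootstrap-server localhost:9092,localhost:9093,localhost:9094 \
      --topic my-replicated-topic --from-beginning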
I'm testing Kafka's partition reassignment as a precursor to launching a production system. I have several topics with 9 partitions each and a replication factor of 3. I've killed one of the brokers to simulate a failure condition and verified that some topics became under-replicated (verification done via a fork of Yahoo's Kafka Manager modified to allow adding a version 0.10.0.1 cluster).
I then started a new broker with a different id. I would now like to distribute partitions to this new broker. I attempted to use Kafka Manager's reassign-partitions functionality; however, that did not work (possibly due to an improperly modified fork).
I saw that Kafka comes with a bin/kafka-reassign-partitions.sh script, but the docs say that I have to manually write out the partition reassignments for each topic in JSON format. Is there a way to handle this without manually deciding which brokers the partitions must go to?
Hmm, what a coincidence that I was doing exactly the same thing today. I don't have an answer you're probably going to like, but I achieved what I wanted in the end.
Ultimately, what I did was execute the kafka-reassign-partitions command with what the same tool proposed for a reassignment. But in whatever it generated, I just replaced the new broker id with the old failed broker id. For some reason the generated JSON moved everything around.
This will fail (or rather never complete) because the old broker has passed on. I then had to delete the reassignment operation in ZooKeeper (the znode is /admin/reassign_partitions, or something like that).
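In case it helps, deleting that znode can be done with the ZooKeeper CLI that ships with Kafka; a sketch, with a placeholder ZooKeeper address:

    # Remove a stuck reassignment request (only once you're sure it cannot complete)
    bin/zookeeper-shell.sh localhost:2181 delete /admin/reassign_partitions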
Then I restarted kafka on the new broker and it magically picked up as leader of the partition that was looking for a new replacement leader.
I'll let you know if everything is still working tomorrow and if I still have a job ;-)
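For reference, the propose-then-execute workflow mentioned above (i.e. letting the tool decide instead of writing the JSON by hand) looks roughly like this sketch, with placeholder topic names, broker ids and ZooKeeper address:

    # List the topics you want the tool to rebalance
    cat > topics-to-move.json <<'EOF'
    {"version": 1, "topics": [{"topic": "my-topic"}]}
    EOF
    # Ask for a proposed assignment across the surviving brokers plus the new one
    bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
      --topics-to-move-json-file topics-to-move.json \
      --broker-list "1,2,3,4" --generate
    # Save the "Proposed partition reassignment configuration" output to a file
    # (e.g. reassignment.json), review it, then execute and verify
    bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
      --reassignment-json-file reassignment.json --execute
    bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
      --reassignment-json-file reassignment.json --verify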