Kafka consumers unable to keep pace on some brokers, not others - apache-kafka

I have a topic with 6 partitions spread over 3 brokers (ie 2 partitions per broker).
I have consumers on 6 separate worker nodes (using Storm).
Each partition is accepting 20 MB/s of messages.
2 partitions are able to deliver 20 MB/s to the consumers on 2 nodes, but the other 4 are only managing ~15 MB/s.
File cache is working properly and there are no direct disk reads on any broker.
Offset tracking for the partitions is done by the consumer (i.e. manualPartitionAssignment, with nothing committed to Kafka or ZooKeeper).
What could be causing the apparent internal latency on 2 of the brokers for 4 of the partitions? The load profile, GC etc. seem similar across all 3 brokers' JVMs. I am monitoring all manner of metrics for the consumer fetch operation etc. through the JMX MBeans but can't figure this out. Any pointers?
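To put the gap in perspective, here is a rough back-of-the-envelope sketch of how quickly a backlog builds on one slow partition. The function name is made up for illustration, and the rates are simply the 20 MB/s and ~15 MB/s figures quoted above, assumed to stay constant.

```python
def backlog_growth_mb(produce_mb_s: float, consume_mb_s: float, seconds: float) -> float:
    """Backlog (in MB) added over `seconds` when consumption trails production."""
    # A consumer that keeps up (or runs ahead) adds no backlog.
    return max(produce_mb_s - consume_mb_s, 0.0) * seconds


# Figures from the question: 20 MB/s in, ~15 MB/s out on a slow partition.
per_hour = backlog_growth_mb(20, 15, 3600)
print(f"backlog growth per slow partition: {per_hour:.0f} MB/hour")
```

At these rates each slow partition falls behind by roughly 18 GB per hour, so the lag is not something the consumers can absorb over time.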

Related

Kafka - broker partitions not in-sync after restart

We use 3-node Kafka clusters running 2.7.0 with quite a high number of topics and partitions. Almost all the topics have only 1 partition and a replication factor of 3, which gives us roughly:
topics: 7325
partitions total in cluster (including replicas): 22110
The brokers are relatively small, with
6 vCPU
16 GB memory
500 GB in /var/lib/kafka occupied by partition data
As you can imagine, because we have 3 brokers and a replication factor of 3, the data is spread very evenly across the brokers. Under normal circumstances each broker leads a very similar number of partitions and the number of partitions per broker is equal.
Before the rolling restart yesterday everything was in sync. We stopped the process and started it again after 1 minute. It took some 10 minutes to get synchronized with ZooKeeper and start listening on the port.
After logging 'Kafka server started', nothing is happening. There is no CPU, memory or disk activity. The partition data is visible on the data disk. There have been no messages in the log for more than a day since the process booted up.
We've tried restarting the ZooKeeper cluster (one node at a time). We've tried restarting the broker again. It has now been 24 hours since the last restart and there is still no change.
The broker itself reports that it leads 0 partitions. Leadership for all the partitions moved to the other brokers, and they report that every replica located on this broker is out of sync.
I'm aware the number of partitions per broker far exceeds the recommendation, but I'm still confused by the lack of any activity or log messages. Any ideas what should be checked further? It looks like something is stuck somewhere. I checked the Kafka ACLs and there are no block messages related to the broker username.
I tried another restart with DEBUG mode and it seems there is some problem with metadata. These two messages are constantly repeating:
[2022-05-13 16:33:25,688] DEBUG [broker-1-to-controller-send-thread]: Controller isn't cached, looking for local metadata changes (kafka.server.BrokerToControllerRequestThread)
[2022-05-13 16:33:25,688] DEBUG [broker-1-to-controller-send-thread]: No controller defined in metadata cache, retrying after backoff (kafka.server.BrokerToControllerRequestThread)
With kcat it's also impossible to fetch metadata about topics (meaning if I specify this broker as bootstrap server).

Kafka cluster performance dropped after adding more Kafka brokers

Does anybody know of a possible reason why message processing slows down when more Kafka brokers are added to the cluster?
The situation is the following:
Setup 1: In a Kafka cluster of 3 brokers I produce some messages to 50 topics (replication factor=2, 1 partition, acks=1), each of which has a consumer assigned. I measure the average time to process 1 message (from producing to consuming).
Setup 2: I add 2 more Kafka brokers to the cluster. They are created by the same standard tool, so they have the same characteristics (CPU, RAM) and the same Kafka configs. I create 50 new topics (replication factor=2, 1 partition, acks=1), just to save time and avoid reassigning replicas. So the replicas are spread over the 5 brokers. I produce some messages only to the 50 new topics and measure the average processing time: it became slower by almost 1/3.
So I didn't change any settings of producer, consumers or brokers (except for listing 2 new brokers in the config of Kafka and zookeeper), and can't explain the performance drop. Please point me to any config option/log file/useful article that would help to explain this, and thank you so much in advance.
"In a Kafka cluster of 3 brokers I produce some messages to 50 topics"
In the first setup, you have 50 topics on 3 brokers.
"I add 2 more Kafka brokers to the cluster. I create 50 new topics"
In the second setup, you have 100 topics on 5 brokers.
Even assuming scaling is linear, 100 topics at the same topics-per-broker ratio would call for 6 brokers, not 5.
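The arithmetic behind that claim can be sketched as follows. The function name is purely illustrative, and the sketch assumes that load scales with the number of topics per broker, which is a simplification.

```python
def brokers_for_same_density(old_topics: int, old_brokers: int, new_topics: int) -> int:
    """Brokers needed to host `new_topics` at the old topics-per-broker ratio."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-new_topics * old_brokers // old_topics)


# Setup 1: 50 topics on 3 brokers. Setup 2: 100 topics in total.
needed = brokers_for_same_density(old_topics=50, old_brokers=3, new_topics=100)
print(f"topics per broker, setup 1: {50 / 3:.1f}")   # about 16.7
print(f"topics per broker, setup 2: {100 / 5:.1f}")  # 20.0
print(f"brokers needed to keep the setup 1 ratio: {needed}")
```

So each broker in the second setup hosts about 20 topics instead of about 16.7, which is a higher density than before even though brokers were added.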
"So the replicas are spread over the 5 brokers"
How the replicas are spread also matters. One broker may be serving 10 partitions as leader while another serves 7, and so on. In that case a particular broker carries more load than the others, which could be the cause of the slowdown.
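One way to check for this kind of skew is to count leaders per broker. The sketch below assumes you have already exported a mapping from partition to leader broker id (for example from the output of the topic describe tool); the mapping shown here is made up for illustration and is not taken from the question.

```python
from collections import Counter


def leaders_per_broker(leader_by_partition: dict) -> Counter:
    """Count how many partitions each broker currently leads."""
    return Counter(leader_by_partition.values())


# Hypothetical leader layout for 50 single-partition topics on brokers 1..5.
hypothetical_leaders = [1] * 14 + [2] * 7 + [3] * 10 + [4] * 10 + [5] * 9
leader_by_partition = {
    f"topic-{i}-0": broker for i, broker in enumerate(hypothetical_leaders)
}

counts = leaders_per_broker(leader_by_partition)
busiest, busiest_count = counts.most_common(1)[0]
mean = sum(counts.values()) / len(counts)
print(f"busiest broker: {busiest} with {busiest_count} leaders (mean {mean:.1f})")
```

If one broker leads noticeably more partitions than the mean, it handles a disproportionate share of produce and fetch requests.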
Also, with replication.factor=2, it matters whether you use acks=all, acks=1 or acks=0. With acks=all, the leader waits for all in-sync replicas to acknowledge the write before it responds to the producer, which could slow things down.
Next, consider the locality and configuration of the new brokers: the machine configuration they run on, their CPU, RAM and processor load, and the network between the old brokers, the new brokers and the clients.
Moreover, if your application consumes many topics, it necessarily has to make requests to many brokers, since the topic partitions are spread among different brokers. Utilizing one broker to the fullest (CPU, memory etc.) versus utilizing multiple brokers is something you can benchmark.

Kafka - How to recover if a partition is lost?

I have 4 Kafka nodes in a cluster, one topic split to 40 partitions and replica count 2. Kafka version is 2.3.1.
How can I recover from a situation where two Kafka nodes die at the same time, it is not possible to start them again, and their Kafka logs are lost?
I'm sure that I lose some data because some partitions are lost (some partitions have replicas only on the dead nodes).
I tried adding two new Kafka nodes and reassigning partitions across all 4 available Kafka nodes. In the end, the lost partitions were not reassigned to the two new Kafka nodes. Clients cannot publish data that goes to the lost partitions.
Kafka recovers lost partitions by itself only if those partitions still have at least one live replica that was previously in sync. Otherwise unclean.leader.election.enable must be set to true on the brokers so that leadership can move to an out-of-sync replica.
Since the partitions had only 2 replicas and you lost 2 nodes, you might lose some partitions.
You can increase the replication factor from 2 to 4 for more reliability.
The two added nodes should have the same broker id as the ones they replace in order to pick up their replicas.
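To see which partitions are affected, you can compare each partition's replica list against the set of dead brokers. The sketch below uses a hypothetical round-robin layout for 40 partitions with 2 replicas on brokers 0..3; the real assignment in your cluster will differ and should be read from the topic description.

```python
def lost_partitions(replicas: dict, dead: set) -> list:
    """Partitions whose replicas all lived on dead brokers."""
    return sorted(p for p, brokers in replicas.items() if set(brokers) <= dead)


# Hypothetical layout: partition p has replicas on brokers p % 4 and (p + 1) % 4.
replicas = {p: [p % 4, (p + 1) % 4] for p in range(40)}

# Assume brokers 0 and 1 are the two nodes that died.
lost = lost_partitions(replicas, dead={0, 1})
print(f"{len(lost)} of {len(replicas)} partitions have no surviving replica: {lost}")
```

With this particular layout a quarter of the partitions lose both replicas. Any partition that keeps one replica on a surviving broker can still elect a leader.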

Kafka Producer, Consumer, Broker in same host?

Are there any downsides to running the same producer and consumer code on all nodes in the cluster? If there are 8 nodes in the cluster (8 consumers, 8 Kafka brokers, and 8 producers), would 8 producers then be running at the same time in the cluster? Is there a way to modify the cluster so that only one producer runs at a time?
A Kafka cluster is nothing but Kafka brokers running under distributed consensus. The cluster is agnostic about the number of producers and consumers running around it. Producers and consumers are clients of the Kafka cluster. Producers stream data to Kafka and consumers consume the data from Kafka. Within the cluster, data is organized into topics. Topics are sharded into partitions. If multiple consumers belong to the same consumer group, they can work in a self-healing fashion.
"Is there a way to modify cluster so that only one producer runs at a time?"
If you intend to run a single producer at a given point in time, you don't need to make any change within the cluster.
"Are there any downsides to running the same producer and consumer code for all nodes in the cluster?"
The primary downsides here would be scalability and memory usage.
Producers and Consumers are not required to run on Brokers. Producers should be deployed where data is being generated (or running as separate hosts, like Kafka Connect workers).
Consumers should be scaled out independently based on the throughput and ordering guarantees that you need in your downstream systems.
Nothing says that 8 brokers require 8 producers and 8 consumers; partitions are what matter more.
If you have N partitions in a topic, you can only scale to N active consumers per consumer group anyway, and to any number of producers.
8 brokers can hold lots of partitions for any given topic.
Running a single producer is a matter of how your own code is implemented. The broker cannot enforce it.
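The partition limit on consumers can be illustrated with a simplified round-robin assignment. This is only a toy model of what a consumer group does, not Kafka's actual assignor code, and the partition count and consumer names are made up.

```python
def assign_round_robin(num_partitions: int, consumers: list) -> dict:
    """Spread partitions over the consumers of one group, one owner per partition."""
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


def active_consumers(assignment: dict) -> int:
    """Consumers that own at least one partition."""
    return sum(1 for parts in assignment.values() if parts)


# Hypothetical topic with 8 partitions and a group of 10 consumers.
group = [f"consumer-{i}" for i in range(10)]
assignment = assign_round_robin(8, group)
idle = [c for c, parts in assignment.items() if not parts]
print(f"active: {active_consumers(assignment)}, idle: {idle}")
```

Consumers beyond the partition count sit idle until a rebalance hands them a partition, which is why adding consumers past N does not add throughput.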

storm-kafka-client spout consume message at different speed for different partition

I have a Storm cluster of 5 nodes and a Kafka cluster installed on the same nodes.
storm version: 1.2.1
kafka version: 1.1.0
I also have a Kafka topic with 10 partitions.
Now, I want to consume this topic's data and process it with Storm. But the message consumption speed is really strange.
For testing, my Storm topology has only one component, a Kafka spout, and I always set the Kafka spout parallelism to 10 so that each partition is read by only one thread.
When I run this topology on just 1 worker, all partitions are read quickly and the lag is almost the same (very small).
When I run this topology on 2 workers, 5 partitions are read quickly, but the other 5 partitions are read very slowly.
When I run this topology on 3 or 4 workers, 7 partitions are read quickly and the other 3 partitions are read very slowly.
When I run this topology on more than 5 workers, 8 partitions are read quickly and the other 2 partitions are read slowly.
Another strange thing is that when I use a different consumer group id when configuring the Kafka spout, the test result may be different.
For example, when I use a specific group id and run the topology on 5 workers, only 2 partitions can be read quickly, just the opposite of the test using another group id.
I have written a simple Java app that calls the high-level Kafka Java API. I ran it on each of the 5 Storm nodes and found that it can consume data very quickly for every partition, so a network issue can be excluded.
Has anyone met the same problem before? Or does anyone have an idea of what may cause such a strange problem?
Thanks!