storm-kafka-client spout consumes messages at different speeds for different partitions - apache-kafka

I have a Storm cluster of 5 nodes and a Kafka cluster installed on the same nodes.
Storm version: 1.2.1
Kafka version: 1.1.0
I also have a Kafka topic with 10 partitions.
Now I want to consume this topic's data and process it with Storm, but the message consumption speed is really strange.
For testing purposes, my Storm topology has only one component - the Kafka spout - and I always set the Kafka spout parallelism to 10, so that each partition is read by exactly one thread.
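For context, the spout-only topology is wired up roughly as in the sketch below (storm-kafka-client 1.2.1; the broker list, topic name and group id are placeholders, not the real cluster settings):

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.storm.kafka.spout.KafkaSpout;
import org.apache.storm.kafka.spout.KafkaSpoutConfig;
import org.apache.storm.topology.TopologyBuilder;

public class SpoutOnlyTopology {
    public static void main(String[] args) {
        // Placeholder brokers, topic and group id.
        KafkaSpoutConfig<String, String> spoutConfig =
                KafkaSpoutConfig.builder("kafka1:9092,kafka2:9092", "my-topic")
                        .setProp(ConsumerConfig.GROUP_ID_CONFIG, "storm-test-group")
                        .build();

        TopologyBuilder builder = new TopologyBuilder();
        // Parallelism hint of 10 => one spout executor (one Kafka consumer) per partition.
        builder.setSpout("kafka-spout", new KafkaSpout<>(spoutConfig), 10);
        // ...then submit with StormSubmitter/LocalCluster and the chosen number of workers.
    }
}
```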
When I run this topology on just 1 worker, all partitions are read quickly and the lag is almost the same (very small).
When I run this topology on 2 workers, 5 partitions are read quickly, but the other 5 partitions are read very slowly.
When I run this topology on 3 or 4 workers, 7 partitions are read quickly and the other 3 partitions are read very slowly.
When I run this topology on more than 5 workers, 8 partitions are read quickly and the other 2 partitions are read slowly.
Another strange thing is that when I use a different consumer group id when configuring the Kafka spout, the test results may differ.
For example, when I use a specific group id and run the topology on 5 workers, only 2 partitions are read quickly - just the opposite of the test using another group id.
I have written a simple Java app that calls the high-level Kafka Java consumer API. I ran it on each of the 5 Storm nodes and found it consumes data very quickly from every partition, so a network issue can be ruled out.
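A standalone consumer of that kind is essentially the standard poll loop; a minimal sketch (broker list, topic and group id are placeholders; Kafka 1.1.0 client API):

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PlainConsumerTest {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka1:9092");   // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "plain-consumer-test");    // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));       // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000L);
                for (ConsumerRecord<String, String> record : records) {
                    // Print partition/offset to eyeball per-partition throughput.
                    System.out.printf("partition=%d offset=%d%n", record.partition(), record.offset());
                }
            }
        }
    }
}
```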
Has anyone run into the same problem before, or have any idea what might cause such strange behavior?
Thanks!

Related

Kafka cluster performance dropped after adding more Kafka brokers

Does anybody know of a possible reason for message processing slowing down when more Kafka brokers are added to the cluster?
The situation is the following:
Setup 1: In a Kafka cluster of 3 brokers I produce some messages to 50 topics (replication factor=2, 1 partition, acks=1), each with a consumer assigned. I measure the average time to process 1 message (from producing to consuming).
Setup 2: I add 2 more Kafka brokers to the cluster - they are created by the same standard tool, so they have the same characteristics (CPU/RAM) and the same Kafka configs. I create 50 new topics (replication factor=2, 1 partition, acks=1) - just to save time and avoid replica reassignment - so the replicas are spread over the 5 brokers. I produce some messages only to the new 50 topics and measure the average processing time - it became slower by almost a third.
So I didn't change any settings of the producer, consumers or brokers (except for listing the 2 new brokers in the Kafka and ZooKeeper configs), and I can't explain the performance drop. Please point me to any config option/log file/useful article that would help to explain this, and thank you so much in advance.
In a Kafka cluster of 3 brokers I produce some messages to 50 topics
In the first setup, you have 50 topics with 3 brokers.
I add 2 more Kafka brokers to the cluster. I create 50 new topics
In the second setup, you have 100 topics with 5 brokers.
Even supposing scaling were linear, 100 topics would call for 6 brokers, not 5.
So the replicas are spread over the 5 brokers
Here, how the replicas are spread also matters. One broker may be serving 10 partitions as leader while another serves 7, and so on. In that case, a particular broker may carry more load than the others, which could be the cause of the slowdown.
Also, when you have replication.factor=2, what matters is whether acks=all, acks=1 or acks=0. If you have set acks=all, then all the replicas must acknowledge the write back to the producer, which could slow it down.
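For illustration, acks is a producer-side setting; a minimal sketch of where it goes (broker and topic names are placeholders):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AcksExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");  // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        // acks=1: only the partition leader must acknowledge (the setting used in the question).
        // acks=all: every in-sync replica must acknowledge, which adds latency.
        props.put(ProducerConfig.ACKS_CONFIG, "1");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("some-topic", "key", "value")); // placeholder topic
        }
    }
}
```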
Next is the locality and configuration of the new brokers: what machine configurations they run on, their CPU, RAM and processor load, and the network between the old brokers, the new brokers and the clients are also worth considering.
Moreover, if your application consumes a lot of topics, it necessarily has to make requests to a lot of brokers, since the topic partitions are spread across different brokers. Utilizing one broker to the fullest (CPU, memory, etc.) vs. utilizing multiple brokers is something worth benchmarking.

Kafka consumer not reading from one partition out of 4

I was using Kafka 0.9 and recently migrated to Kafka 1.0, but the client I am using is still 0.9. Irrespective of this, I have been facing a problem where our consumers sometimes intermittently stop consuming from one or two of the partitions.
I have 5 consumers reading from 24 partitions; these are consumer JVM threads created by an application deployed on a single server. Frequently one of the consumers (threads) will stop reading from one of the partitions it had been consuming from.
E.g. one consumer thread is reading from partitions 1, 2, 3 and 4. It stops reading from partition 1 and ends up building lag. I have to restart the consumer to start picking up messages from that particular partition again.
I want to understand the issue here.
My consumer configuration:
session.timeout.ms=150000
request.timeout.ms=300000
max.partition.fetch.bytes=153600
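(For completeness, these would be applied to the consumer roughly as below; everything other than the three listed settings - brokers, group id, topic, deserializers - is an assumed placeholder:)

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumerSettings {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092");   // placeholder
        props.put("group.id", "my-group");               // placeholder
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        // Settings from the question:
        props.put("session.timeout.ms", "150000");
        props.put("request.timeout.ms", "300000");
        props.put("max.partition.fetch.bytes", "153600");

        KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("my-topic"));  // placeholder topic
    }
}
```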

Kafka Streams: topic partitions -> consumer 1:1 mapping not happening

I have written a Streams application to talk to a topic on a cluster of 5 brokers, with 10 partitions. I have tried multiple combinations here, like 10 application instances (on 10 different machines) with 1 stream thread each, or 5 instances with 2 threads each. But for some reason, when I check in Kafka Manager, the 1:1 mapping between partition and stream thread is not happening. Some of the threads are picking up 2 partitions while some pick up none. Can you please help me with this? All threads are part of the same group and subscribed to only one topic.
The Kafka Streams version we are using is 0.11.0.2 and the broker version is 0.10.0.2.
Thanks for your help
Maybe you are hitting https://issues.apache.org/jira/browse/KAFKA-7144 -- I would recommend upgrading to the latest versions.
Note: you do not need to upgrade your brokers
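For reference, the thread count per instance is controlled by num.stream.threads; a minimal sketch using the newer StreamsBuilder API (which the recommended upgrade would give you; application id, broker list and topic are placeholders):

```java
import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class StreamThreadsExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app"); // placeholder; doubles as the consumer group
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092"); // placeholder
        // 5 instances x 2 threads = 10 stream threads for 10 partitions (ideally 1 partition each).
        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 2);

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("my-topic").foreach((key, value) -> { /* process */ }); // placeholder topic

        new KafkaStreams(builder.build(), props).start();
    }
}
```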

How exactly does the Apache NiFi ConsumeKafka_1_0 processor work?

I have a NiFi cluster, and Kafka is also installed there.
I created one topic with 5 partitions and started consuming that topic with one group id, so that each consumer will get unique messages.
Now I created 5 ConsumeKafka_1_0 processors with the intent of getting unique messages on each consumer side. But only 2 of the ConsumeKafka_1_0 processors are consuming all the messages; the rest are sitting idle.
Then I started 5 command-line Kafka consumers, and I was able to see that all the partitions are getting messages and could consume them from the command-line consumers in round-robin fashion.
Also, I tried describing the Kafka group, and what I saw was that only 2 of the NiFi ConsumeKafka_1_0 processors are consuming all 5 partitions and the rest are idle (see the snapshot).
Would you please let me know what I am doing wrong here with the NiFi consumer processor?
Note - the NiFi version I used is 1.5 and the Kafka version is 1.0.
I've written this article which explains how the integration with Kafka works:
https://bryanbende.com/development/2016/09/15/apache-nifi-and-apache-kafka
The Apache Kafka client (used by NiFi) is what assigns partitions to the consumers.
Typically if you had a 5 node NiFi cluster, with 1 ConsumeKafka processor on the canvas with 1 concurrent task, then each node would be consuming 1 partition.
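To see which partitions the client hands to each group member, a plain consumer can log the assignment it receives via a rebalance listener; a sketch (broker, topic and group id are placeholders):

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class AssignmentLogger {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092");   // placeholder
        props.put("group.id", "test-group");             // placeholder
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("my-topic"), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                System.out.println("assigned: " + partitions);  // partitions this member received
            }
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                System.out.println("revoked: " + partitions);
            }
        });
        while (true) {
            consumer.poll(1000L);  // assignment and rebalancing happen inside poll()
        }
    }
}
```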

Kafka consumers unable to keep pace on some brokers, not others

I have a topic with 6 partitions spread over 3 brokers (i.e. 2 partitions per broker).
I have consumers on 6 separate worker nodes (using Storm).
The partitions are all accepting 20 MB/s of messages.
2 partitions are able to output 20 MB/s to the consumers on 2 nodes, but the other 4 are only managing ~15 MB/s.
File cache is working properly and there are no direct disk reads on any broker.
The offset tracking for the partitions is done by the consumer (i.e. manualPartitionAssignment; nothing is committed to Kafka or ZooKeeper).
What could be causing the apparent internal latency on 2 of the brokers for 4 of the partitions? The load profile, GC, etc. seem similar across all 3 brokers' JVMs. I am monitoring all manner of metrics for the consumer fetch operation through the JMX MBeans but can't figure this out. Any pointers?
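For context, the manual-assignment pattern described above looks roughly like this (a sketch; broker, topic/partition and the externally tracked offset are placeholders):

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ManualAssignmentConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092");   // placeholder
        props.put("enable.auto.commit", "false");        // nothing committed to Kafka
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        TopicPartition tp = new TopicPartition("my-topic", 0);  // placeholder topic/partition
        long trackedOffset = 0L;                                // placeholder; kept by the consumer itself

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(Collections.singletonList(tp));     // manual assignment: no group rebalancing
            consumer.seek(tp, trackedOffset);
            while (true) {
                ConsumerRecords<byte[], byte[]> records = consumer.poll(1000L);
                for (ConsumerRecord<byte[], byte[]> record : records) {
                    trackedOffset = record.offset() + 1;        // offset tracking stays on the consumer side
                }
            }
        }
    }
}
```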