Kafka Connect task spawn strategy

I have a general question regarding Kafka Connect. I went through the documentation and blogs but couldn't find a straight answer.
If there are two workers running a single Connector instance:
How does a Connector instance decide when to spawn a new task, e.g. if tasks.max = 10? And how does it decide how many tasks to spawn?
Does it depend on the underlying hardware configuration, e.g. the number of cores, memory, or CPU utilization?

The exact algorithm is internal to each connector, but it generally relates to the number of topics and partitions. So, for example, if you set tasks.max = 10 and have the following sink connector configuration:
1 topic, 1 partition - Kafka Connect will spawn only a single task
2 topics, 1 partition each - Kafka Connect will spawn 2 tasks, 1 for each topic
2 topics, 5 partitions each - Kafka Connect will spawn 10 tasks, 1 for each topic partition
4 topics, 5 partitions each - Kafka Connect will spawn 10 tasks, each handling data from 2 topic partitions
Got this explanation on another forum.
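To add to that: the number of tasks is ultimately decided by the connector's own taskConfigs(maxTasks) implementation, and tasks.max is only an upper bound. A minimal Python sketch of the contiguous grouping many connectors use, illustrative only (real sink connectors lean on the consumer group assignor instead of doing this themselves):

```python
def group_partitions(partitions, tasks_max):
    """Split a list of topic-partitions across at most tasks_max tasks.

    Mirrors the contiguous grouping style of ConnectorUtils.groupPartitions
    in the Connect API: never more tasks than partitions, and the first
    (len % num_tasks) groups get one extra partition.
    """
    num_tasks = min(tasks_max, len(partitions))
    base, extra = divmod(len(partitions), num_tasks)
    groups, start = [], 0
    for i in range(num_tasks):
        size = base + (1 if i < extra else 0)
        groups.append(partitions[start:start + size])
        start += size
    return groups

# 4 topics x 5 partitions with tasks.max = 10 -> 10 tasks, 2 partitions each,
# matching the last example above.
parts = [f"topic{t}-p{p}" for t in range(4) for p in range(5)]
groups = group_partitions(parts, 10)
print(len(groups), [len(g) for g in groups])
```

With 1 topic and 1 partition the same function yields a single task, matching the first example: the partition count, not tasks.max, is the binding constraint.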

Related

Kafka cluster performance dropped after adding more Kafka brokers

Does anybody know of a possible reason why message processing slows down when more Kafka brokers are added to the cluster?
The situation is the following:
Setup 1: in a Kafka cluster of 3 brokers I produce some messages to 50 topics (replication factor = 2, 1 partition, acks=1), each with a consumer assigned. I measure the average time to process 1 message (from producing to consuming).
Setup 2: I add 2 more Kafka brokers to the cluster - they are created by the same standard tool, so they have the same CPU/RAM characteristics and the same Kafka configs. I create 50 new topics (replication factor = 2, 1 partition, acks=1) - just to save time and avoid replica reassignment. So the replicas are spread over the 5 brokers. I produce messages only to the new 50 topics and measure the average processing time - it became slower by almost a third.
So I didn't change any settings of the producer, consumers, or brokers (except for listing the 2 new brokers in the Kafka and ZooKeeper configs), and I can't explain the performance drop. Please point me to any config option, log file, or useful article that would help explain this. Thank you in advance.
In a Kafka cluster of 3 brokers I produce some messages to 50 topics
In the first setup, you have 50 topics with 3 brokers.
I add 2 more Kafka brokers to the cluster. I create 50 new topics
In the second setup, you have 100 topics with 5 brokers.
Even supposing scaling were linear, 100 topics would call for 6 brokers, not 5.
So the replicas are spread over the 5 brokers
Here, how the replicas are spread also matters. One broker may be serving 10 partitions as leader while another serves only 7, and so on. In that case, a particular broker carries more load than the others, which could be the cause of the slowdown.
Also, when you have replication.factor=2, what matters is whether acks=all, acks=1, or acks=0. If you have set acks=all, then all replicas must acknowledge the write to the producer, which could slow it down.
Next, consider the locality and configuration of the new brokers: the machine configurations they run under, their CPU and RAM, processor load, and the network between the old brokers, the new brokers, and the clients.
Moreover, if your application consumes many topics, it necessarily has to make requests to many brokers, since the topic partitions are spread among them. Utilizing one broker to the fullest (CPU, memory, etc.) vs. utilizing multiple brokers is worth benchmarking.
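One way to check for the leader skew described above is to dump each partition's leader with kafka-topics --describe and count leaders per broker. A toy illustration of the idea (the leader counts below are made up for the example):

```python
from collections import Counter

# Hypothetical leader placement for 50 single-partition topics across
# brokers 1..5, as kafka-topics --describe would report it.
leaders = [1] * 10 + [2] * 7 + [3] * 13 + [4] * 9 + [5] * 11

# Count how many partitions each broker leads; produce/consume traffic
# for a partition goes through its leader, so this approximates load.
per_broker = Counter(leaders)
for broker, n in sorted(per_broker.items()):
    print(f"broker {broker}: leader for {n} partitions")
```

In this made-up placement broker 3 leads nearly twice as many partitions as broker 2, so it would see nearly twice the produce traffic even though the cluster looks "balanced" at the topic level.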

Kafka Connect: tasks.max more than # of partitions but the status says RUNNING

In our setup, we have 50 tasks and 40 partitions in the topic. We have 2 workers. Ideally, the connector should start just 40 tasks, but we see all 50 tasks with the status RUNNING. How is that possible?
There may be idle tasks, but that does not necessarily mean they are in the UNASSIGNED or FAILED state. They are active and running as part of a consumer group (assuming a sink connector); with only 40 partitions, 10 of those consumers simply receive no partition assignment.
If you had a source connector, there would just be 50 running producer threads, sending data to all 40 partitions. There isn't a 1:1 partition limitation on producers like there is for consumers.
You're welcome to PUT a new configuration for the connector and set tasks.max back to 40.
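As a sketch of that PUT against the Connect REST API (the connector name, connector class, and worker URL below are placeholders), using only the standard library:

```python
import json
import urllib.request

def build_config_update(base_url, connector, config):
    """Build a PUT request for the Kafka Connect REST API endpoint
    PUT /connectors/{name}/config. Sending it is left to the caller."""
    return urllib.request.Request(
        f"{base_url}/connectors/{connector}/config",
        data=json.dumps(config).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

# Hypothetical sink connector; tasks.max lowered to match the 40 partitions.
req = build_config_update(
    "http://localhost:8083",
    "my-sink-connector",
    {"connector.class": "org.example.MySinkConnector", "tasks.max": "40"},
)
# urllib.request.urlopen(req) would apply it against a live Connect worker.
print(req.get_method(), req.full_url)
```

Note that PUT on /connectors/{name}/config must carry the complete connector configuration, not just the changed key.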

Kafka streams: topic partitions->consumer 1:1 mapping not happening

I have written a Streams application to talk to a topic with 10 partitions on a cluster of 5 brokers. I have tried multiple combinations here, like 10 application instances (on 10 different machines) with 1 stream thread each, and 5 instances with 2 threads each. But for some reason, when I check in Kafka Manager, the 1:1 mapping between partitions and stream threads is not happening. Some of the threads pick up 2 partitions while some pick up none. Can you please help me with this? All threads are part of the same group and subscribed to only one topic.
The Kafka Streams version we are using is 0.11.0.2 and the broker version is 0.10.0.2.
Thanks for your help
Maybe you are hitting https://issues.apache.org/jira/browse/KAFKA-7144 -- I would recommend upgrading to the latest version.
Note: you do not need to upgrade your brokers.

storm-kafka-client spout consume message at different speed for different partition

I have a Storm cluster of 5 nodes and a Kafka cluster installed on the same nodes.
Storm version: 1.2.1
Kafka version: 1.1.0
I also have a Kafka topic with 10 partitions.
Now, I want to consume this topic's data and process it with Storm. But the message consumption speed is really strange.
For testing, my Storm topology has only one component - the Kafka spout - and I always set the Kafka spout parallelism to 10, so that each partition is read by only one thread.
When I run this topology on just 1 worker, all partitions are read quickly and the lag is almost the same (very small).
When I run this topology on 2 workers, 5 partitions are read quickly, but the other 5 partitions are read very slowly.
When I run this topology on 3 or 4 workers, 7 partitions are read quickly and the other 3 partitions are read very slowly.
When I run this topology on more than 5 workers, 8 partitions are read quickly and the other 2 partitions are read slowly.
Another strange thing: when I use a different consumer group id when configuring the Kafka spout, the test results may differ.
For example, when I use a specific group id and run the topology on 5 workers, only 2 partitions can be read quickly - just the opposite of the test using another group id.
I have written a simple Java app that calls the high-level Kafka Java API. I ran it on each of the 5 Storm nodes and found it can consume data very quickly from every partition, so network issues can be excluded.
Has anyone met the same problem before? Or has any idea of what may cause such strange problem?
Thanks!

Kafka consumers unable to keep pace on some brokers, not others

I have a topic with 6 partitions spread over 3 brokers (ie 2 partitions per broker).
I have consumers on 6 separate worker nodes (using Storm).
The partitions are all accepting 20MB/s of messages.
2 partitions are able to output 20 MB/s to the consumers on 2 nodes, but the others are only managing ~15 MB/s.
File cache is working properly and there are no direct disk reads on any broker.
The offset tracking for the partitions is done by the consumer (i.e. manual partition assignment; nothing committed to Kafka or ZooKeeper).
What could be causing the apparent internal latency on 2 of the brokers for 4 of the partitions? The load profile, GC, etc. seem similar across all 3 brokers' JVMs. I am monitoring all manner of metrics for the consumer fetch operation through the JMX MBeans but can't figure this out. Any pointers?