Kafka performance reduced when adding more consumers or producers - apache-kafka

I have 3 servers with a 10GB connection between them. I run a Kafka cluster on 2 of the servers and generate test load from the third...
When I run a single Java producer (on the third server, which is not part of the Kafka cluster), sending 1 million messages takes 3 seconds. But when I run a second Java producer (with a different topic), both producers take 6 seconds to send their messages.
I'm sure the network connection is not the bottleneck (it is 10GB).
So why does this happen, and how can I solve it (I want both producers to take 3 seconds)?

Sounds like you are getting a consistent 333,333 messages/sec out of a two-node Kafka cluster, with ZooKeeper running on the same 2 machines as your 2 Kafka brokers. You don't say what size these messages are, what kind of disks you are using, how much memory you have, whether you are publishing with acks=all, or what programming language you are using (I assume Java), but that actually sounds like good, consistent results that are probably disk-I/O bound on the brokers or CPU bound on your single client machine.
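The producer settings that usually dominate this kind of test are acks, batching, and compression. Below is a minimal sketch of a timed Java producer with those knobs exposed; the broker addresses, topic name, and tuning values are illustrative assumptions, not taken from the question.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ThroughputTest {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder broker list; use your own cluster addresses.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        // These settings largely decide whether the test is latency- or throughput-bound.
        props.put(ProducerConfig.ACKS_CONFIG, "1");            // "all" is safer but slower
        props.put(ProducerConfig.LINGER_MS_CONFIG, "5");       // allow batching
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, "65536");  // 64 KB batches
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");

        long start = System.currentTimeMillis();
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 1_000_000; i++) {
                producer.send(new ProducerRecord<>("test-topic", Integer.toString(i), "payload-" + i));
            }
            producer.flush();
        }
        System.out.println("Sent 1M messages in " + (System.currentTimeMillis() - start) + " ms");
    }
}

Note that running two such producers on one client machine also doubles the serialization and network work on that single box, which is consistent with the combined time doubling if the client is the CPU bottleneck.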

Related

Apache Kafka Throughput and Latency

I set up a Kafka cluster on my local machine and was testing producers with different throughputs to see what happens to the latency.
I used the kafka-producer-perf-test benchmark for these tests:
https://docs.cloudera.com/runtime/7.2.10/kafka-managing/topics/kafka-manage-cli-perf-test.html
Different throughputs on the producer
When I set the throughput to 200,000, only about 22k records/sec are achieved. Does this mean that the Kafka cluster on my local machine cannot handle this kind of throughput?
I tested different throughputs to try to understand what happens here.

Messages are stuck in ActiveMQ Artemis cluster queues

We have a problem with Apache ActiveMQ Artemis cluster queues. Sometimes messages begin to pile up in particular cluster queues. It usually happens 1-4 times per day, mostly on production (it has happened only once in the last 90 days on one of the test environments).
These messages are not delivered to consumers on other cluster brokers until we restart cluster connector (or entire broker).
The problem looks related to ARTEMIS-3809.
Our setup is: 6 servers in one environment (3 pairs of master/backup servers). Operating system is Linux (Red Hat).
We have tried to:
upgrade from 2.22.0 to 2.23.1
increase minLargeMessageSize on the cluster connectors to 1024000
The messages still get stuck in the cluster queues.
Another problem: I tried to configure min-large-message-size (in the cluster-connection) as described in the documentation, but it caused errors at startup (broker.xml did not pass XSD validation), so the only option was to specify minLargeMessageSize in the URL parameters of the connector for each cluster broker. I don't know whether this setting has any effect.
So we had to write a script that checks whether messages are stuck in the cluster queues and restarts the cluster connector.
How can we debug this situation?
When the messages are stuck, nothing wrong is written to the log (no errors, no stacktraces etc.).
Which loggers (for which classes) should we set to debug or trace level to find out what is happening with the cluster connectors?
I believe you can remedy the situation by setting this on your cluster-connection:
<producer-window-size>-1</producer-window-size>
See ARTEMIS-3805 for more details.
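For orientation, here is a minimal sketch of where that element sits in broker.xml. The connection name, connector references, and other values are placeholders, and the element order must follow the broker.xml schema:

<cluster-connections>
   <cluster-connection name="my-cluster">
      <connector-ref>netty-connector</connector-ref>
      <message-load-balancing>ON_DEMAND</message-load-balancing>
      <max-hops>1</max-hops>
      <!-- -1 disables producer flow control on the cluster connection -->
      <producer-window-size>-1</producer-window-size>
      <static-connectors>
         <connector-ref>other-broker-connector</connector-ref>
      </static-connectors>
   </cluster-connection>
</cluster-connections>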
Generally speaking, moving messages around the cluster via the cluster-connection, while convenient, isn't terribly efficient (much less so for "large" messages). Ideally you would have a sufficient number of clients on each node to consume the messages that were originally produced there. If you don't have that many clients then you may want to re-evaluate the size of your cluster, as it may actually decrease overall message throughput rather than increase it.
If you're just using 3 HA pairs in order to establish a quorum for replication then you should investigate the recently added pluggable quorum voting, which allows integration with a third-party component (e.g. ZooKeeper) for leader election, eliminating the need for a quorum of brokers.

storm-kafka-client spout consumes messages at different speeds for different partitions

I have a Storm cluster of 5 nodes and a Kafka cluster installed on the same nodes.
Storm version: 1.2.1
Kafka version: 1.1.0
I also have a Kafka topic with 10 partitions.
Now, I want to consume this topic's data and process it with Storm, but the consumption speed is really strange.
For testing, my Storm topology has only one component, the Kafka spout, and I always set the Kafka spout parallelism to 10, so that each partition is read by exactly one thread.
When I run this topology on just 1 worker, all partitions are read quickly and the lag is almost the same (very small).
When I run this topology on 2 workers, 5 partitions are read quickly, but the other 5 partitions are read very slowly.
When I run this topology on 3 or 4 workers, 7 partitions are read quickly and the other 3 partitions are read very slowly.
When I run this topology on more than 5 workers, 8 partitions are read quickly and the other 2 partitions are read slowly.
Another strange thing: when I use a different consumer group id when configuring the Kafka spout, the test results may differ.
For example, when I use a specific group id and run the topology on 5 workers, only 2 partitions are read quickly, just the opposite of the test with another group id.
I have written a simple Java app that calls the high-level Kafka Java API. I ran it on each of the 5 Storm nodes and found it can consume data very quickly from every partition, so a network issue can be excluded.
Has anyone met the same problem before? Or does anyone have an idea of what may cause such a strange problem?
Thanks!

Is scaling Kafka Connect the same as scaling Kafka consumers?

We need to pull data from Kafka and write it into AWS S3. The Kafka cluster is managed by a separate department and we have access to only a specific topic.
Based on the Kafka documentation it looks like Kafka Connect is an easy solution for me, because I don't have any custom message processing logic.
Normally when we run a Kafka consumer we can run multiple JVMs with the same consumer group for scalability. The JVMs of a given consumer group can run on the same physical server or on different ones. What would be the case when I want to use Kafka Connect?
Let's say I have 20 partitions of the topic.
How can I run Kafka Connect with 20 instances?
Can I have multiple instances of Kafka Connect running on the same physical instance?
Kafka Connect handles balancing the load across all its workers. In your example of 20 partitions, you could have, for example:
1 Kafka Connect worker, processing 20 partitions
5 Kafka Connect workers, each processing 4 partitions
20 Kafka Connect workers, each processing 1 partition
It depends on your volumes and required throughput.
To run Kafka Connect in Distributed mode across multiple nodes, follow the instructions here and make sure you give them all the same group.id, which identifies them as members of the same Connect cluster (and thus eligible for sharing the workload of tasks across them). More config details for distributed mode here.
Even if you're running Kafka Connect on a single node, I would personally recommend running it in Distributed mode, as it makes scale-out simpler (you just add additional nodes, but the execution & config remain the same).
I don't see a benefit in running multiple Kafka Connect workers on a single node. Each Kafka Connect worker can run multiple tasks and connectors as required.
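As a concrete (hypothetical) illustration of the distributed-mode setup described above, each worker node runs with the same worker config, something like the following; the broker addresses, group.id value, and topic names are placeholders:

# connect-distributed.properties - identical on every worker node
bootstrap.servers=broker1:9092,broker2:9092
group.id=s3-sink-connect-cluster

key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter

# Internal topics Connect uses to share state between workers
offset.storage.topic=connect-offsets
config.storage.topic=connect-configs
status.storage.topic=connect-status

Connectors and their tasks are then submitted once via the REST API of any worker, and the cluster spreads the tasks across all workers sharing that group.id.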
My understanding is that if you only have a single machine, you should launch just one Kafka Connect instance and set the tasks.max property to the degree of parallelism you'd like to achieve (in your example, 20 might be good). This should allow Kafka Connect to read from your partitions in parallel; see the docs for this here.
You could launch multiple instances on the same machine in theory. It makes sense to do this if you need each instance to consume data from different topics. But if you want the instances to consume data from the same topic, I don't think doing this would benefit you. Using separate threads within the same process via tasks.max will give you the same, if not better, performance.
If you want Kafka Connect to run on multiple machines and read data from the same topic, you can run it in distributed mode.
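On the parallelism side, the connector configuration carries tasks.max. The sketch below assumes Confluent's S3 sink connector (io.confluent.connect.s3.S3SinkConnector); the topic, bucket, and region are placeholders, and in distributed mode these settings are submitted as JSON to any worker's REST API rather than as a file:

connector.class=io.confluent.connect.s3.S3SinkConnector
tasks.max=20
topics=my-topic
s3.region=us-east-1
s3.bucket.name=my-bucket
flush.size=1000
storage.class=io.confluent.connect.s3.storage.S3Storage
format.class=io.confluent.connect.s3.format.json.JsonFormat

With 20 partitions and tasks.max=20, each task ends up with one partition; with fewer tasks, partitions are shared among them.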

Kafka recommended system configuration

I'm expecting our inflow into Kafka to rise to around 2 TB/day over a period of time. I'm planning to set up a Kafka cluster with 2 brokers (each running on a separate system). What is the recommended hardware configuration for handling 2 TB/day?
As a baseline you could look here: https://docs.confluent.io/4.1.1/installation/system-requirements.html#hardware
You need to know the number of messages you get per second/hour, because this will determine the size of your cluster. For disks, SSDs are not strictly necessary because Kafka buffers writes in RAM (the OS page cache) first. Still, you will need reasonably fast disks to ensure that flushing the log to disk does not slow down your system.
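As a rough back-of-the-envelope check, assuming the 2 TB/day arrives evenly and is replicated 3 times (the replication factor is an assumption):

2 TB/day ÷ 86,400 s ≈ 23 MB/s of incoming data
23 MB/s × 3 replicas ≈ 70 MB/s of write bandwidth spread across the brokers

plus headroom for traffic peaks and for consumers that read data no longer in the page cache.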
I would also recommend using 3 Kafka brokers, and 3 or 5 ZooKeeper servers (an odd number is recommended for quorum).