Kafka Streams - Relation between "Stream threads" vs "Tasks" running on 1 C4.XLarge Machine

I have a Kafka Streams Topology which has 5 Processors & 1 Source. Source topic for this topology has 200 partitions. My understanding is 200 tasks get created to match # of partitions for input topic.
This Kafka Streams app is running on C4.XLarge & these 200 tasks run on single stream thread which means this streams thread should be using up all the CPU Cores (8) & memory.
I know Kafka Streams parallelism/scalability is controlled by the number of stream threads. I can increase num.stream.threads to 10, but how would that improve performance if all of them run on a single EC2 instance? How would it differ from running all tasks on a single stream thread on a single EC2 instance?

If you have an 8-core machine, you might want to run 8 StreamThreads.
Quoting the question: "This Kafka Streams app is running on C4.XLarge & these 200 tasks run on single stream thread which means this streams thread should be using up all the CPU Cores (8) & memory."
This does not sound correct. A single thread cannot utilize multiple cores. Configuring a single StreamThread does imply that a few other background threads are started (consumer heartbeat thread, producer sender thread), but I would assume that you cannot fully utilize all 8 cores with this setting.
If 8 StreamThreads do not fully utilize your 8 cores, you might consider configuring 16 threads. Note, however, that all threads share the same network; if the network is actually the limiting factor, running more threads won't give you higher throughput (or higher CPU utilization). In that case, you need to scale out using multiple EC2 instances.
Given that you have 200 tasks, you could conceptually run up to 200 StreamThreads, but you probably won't need 200 threads.
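To make the task/thread relation concrete, here is a minimal arithmetic sketch. It only illustrates how a fixed number of tasks could spread over a configurable number of threads; the helper name tasks_per_thread is made up for this example, and the real assignment is done by Kafka Streams itself and may not be perfectly even.

```python
def tasks_per_thread(num_tasks, num_threads):
    """Spread num_tasks as evenly as possible over num_threads.

    Hypothetical helper for illustration only; it is not part of Kafka Streams.
    """
    if num_threads <= 0:
        raise ValueError("num_threads must be positive")
    base, extra = divmod(num_tasks, num_threads)
    # The first `extra` threads take one additional task each.
    return [base + 1 if i < extra else base for i in range(num_threads)]


# 200 tasks (one per input partition) over different thread counts.
for threads in (1, 8, 16, 200):
    counts = tasks_per_thread(200, threads)
    print(f"{threads:>3} threads: max {max(counts)} / min {min(counts)} tasks per thread")
```

With more threads than tasks, the surplus threads would get zero tasks and sit idle, which is why 200 is the conceptual upper bound here.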


When does Kafka throw BufferExhaustedException?

org.apache.kafka.clients.producer.BufferExhaustedException: Failed to allocate memory within the configured max blocking time 5 ms.
This says the exception is thrown when the producer is unable to allocate memory for the record within the configured max blocking time of 5 ms.
This is what I got when I was trying to add Kafka s3-sink connectors. There are 11 topics on two Kafka brokers, and there were already consumers consuming from these topics. I was spinning up a 2-node Kafka Connect cluster with 11 connectors to consume from these topics. But there was a huge spike in errors when I started these s3-sink connectors. Once I stopped the connectors, the errors dropped and things seemed fine. Then I started the consumers again with fewer tasks, and this time the errors spiked when there was a sudden surge in traffic and went back to normal when the traffic did. There was a max retry of 5, and messages failed to write even after 5 attempts.
From what I have read, it might be due to the producer batch size, or to the producer rate being higher than the consumer rate. I guess each consumer will occupy up to 64 MB when traffic is bursty. Could that be the reason? Should I try increasing the blocking time?
Producer Config:
lingerTime: 0
maxBlockTime: 5
bufferMemory: 1024000
batchSize: 102400
ack: "1"
maxRequestSize: 102400
retries: 1
maxInFlightRequestsPerConn: 1000
It was actually due to an increase in IOPS on the EC2 instances that the Kafka brokers couldn't handle. Increasing the number of bytes fetched per poll and decreasing the poll frequency fixed it.
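Independent of the root cause, a quick back-of-the-envelope check shows how little headroom the producer config in the question leaves. This is a rough sketch, assuming bufferMemory, batchSize and maxBlockTime map to the producer settings buffer.memory, batch.size and max.block.ms, and assuming (as a simplification) that the producer reserves roughly one batch.size chunk per batch it is filling. The helper name max_full_batches is made up.

```python
def max_full_batches(buffer_memory, batch_size):
    """How many full-size batches fit in the producer buffer at once.

    Hypothetical helper; a simplification of how the producer pools memory.
    """
    return buffer_memory // batch_size


# Values from the question's producer config.
question_batches = max_full_batches(1_024_000, 102_400)

# Kafka producer defaults: buffer.memory = 32 MB, batch.size = 16 KB.
default_batches = max_full_batches(33_554_432, 16_384)

print("batches that fit with the question's config:", question_batches)
print("batches that fit with the defaults:", default_batches)
```

Under these assumptions only about ten batches fit in the buffer, and a send that cannot get memory waits just 5 ms before failing (the default max.block.ms is 60000). A slow broker during a traffic burst could plausibly exhaust that quickly.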

Kafka Streams: Threads vs CPU cores

If the machine has 16 cores and if we define 6 threads in the config, would Kafka Streams utilize 6 cores OR would all the threads run on just a single core OR there is no control over the cores?
It is wrong to think about it this way; multiple factors are involved here.
If we define 6 tasks, it means the topic has 6 partitions, which are consumed in parallel by the Kafka consumer or connector.
If you have 16 cores and no other process is running, chances are it will execute as you expect.
That is not a normal production scenario, though. In production we have multiple topics (each with more than one partition), which invalidates your theory.
You should size the tasks based on the consumers, and the machine should run only the worker.
Once the above condition is satisfied, we can run a performance test on that data:
How much time does it take to process 50k records?
What is our expected time?
We can upgrade our system based on these basic parameters.
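A measurement like the one described above could be sketched as follows. This is only a rough single-threaded harness; time_batch and the stand-in process_record are hypothetical names, and a real test would push 50k records through the actual topology instead of a local function.

```python
import time


def time_batch(process_record, records):
    """Run process_record over all records; return (elapsed_seconds, records_per_second)."""
    start = time.perf_counter()
    for record in records:
        process_record(record)
    elapsed = time.perf_counter() - start
    rate = len(records) / elapsed if elapsed > 0 else float("inf")
    return elapsed, rate


def process_record(record):
    # Placeholder workload; replace with the real per-record processing.
    return record * 2


elapsed, rate = time_batch(process_record, list(range(50_000)))
print(f"processed 50000 records in {elapsed:.3f} s ({rate:,.0f} records/s)")
```

Comparing the measured time with the expected time gives the basis for deciding whether to add threads or machines.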

Kafka performance reduced when adding more consumers or producers

I have 3 servers with a 10GB connection between them. I run a Kafka cluster on 2 of the servers and generate test load on the third...
When I run a single Java producer (on the third server, which is not in the Kafka cluster), sending 1 million messages takes 3 seconds, but when I run another Java producer (with a different topic), both producers take 6 seconds to send their messages.
I am sure the network connection is not the bottleneck (it is 10GB).
So why does this happen, and how can I solve it (I want both producers to take 3 seconds)?
Sounds like you are getting a consistent 333,333 messages/sec performance out of a two node kafka cluster, with zookeeper running on the same 2 machines as your 2 kafka brokers. You don’t say what size these messages are or what kind of disks you are using, or how much memory, or if you are publishing with acks=all, or what programming language you are using (I assume java) but that actually sounds like good consistent results that are probably disk IO bound on the brokers or cpu bound on your single client machine.
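The arithmetic behind that observation, as a small sketch using only the numbers from the question (the function name throughput is just for illustration):

```python
def throughput(messages, seconds):
    """Messages per second."""
    return messages / seconds


# One producer: 1 million messages in 3 seconds.
single = throughput(1_000_000, 3)

# Two producers at once: 2 million messages in total, both finishing after 6 seconds.
combined = throughput(2_000_000, 6)

print(f"single producer: {single:,.0f} msg/s")
print(f"two producers:   {combined:,.0f} msg/s in aggregate")
```

The aggregate rate is identical in both runs, which suggests a shared limit somewhere (broker disks or the single client machine) rather than a per-producer slowdown.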

Kafka recommended system configuration

I'm expecting our influx into Kafka to rise to around 2 TB/day over a period of time. I'm planning to set up a Kafka cluster with 2 brokers (each running on a separate system). What is the recommended hardware configuration for handling 2 TB/day?
To use as a base you could look here: https://docs.confluent.io/4.1.1/installation/system-requirements.html#hardware
You need to know the number of messages you get per second/hour, because this will determine the size of your cluster. For disks, it's not necessary to get SSDs, because the system will use RAM to store the data first. You may still need reasonably fast hard disks to ensure that flushing the queue to disk does not slow your system down.
I would also recommend using 3 Kafka brokers and 3 or 4 ZooKeeper servers.
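To turn the 2 TB/day figure into something you can size against, here is a rough sketch. It assumes decimal units (1 TB = 1,000,000 MB) and a perfectly even load over the day. The replication factor of 2 is an assumption for illustration, since the question does not state one, and real traffic will have peaks well above the average.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_mb_per_second(tb_per_day):
    """Average ingest rate in MB/s, assuming 1 TB = 1,000,000 MB."""
    return tb_per_day * 1_000_000 / SECONDS_PER_DAY


def per_broker_write_mb_per_second(ingest_mb_s, replication_factor, brokers):
    """Average disk write rate per broker if replicas are spread evenly (assumption)."""
    return ingest_mb_s * replication_factor / brokers


ingest = average_mb_per_second(2)
per_broker = per_broker_write_mb_per_second(ingest, replication_factor=2, brokers=2)

print(f"average ingest: {ingest:.1f} MB/s")
print(f"average write per broker (RF=2, 2 brokers): {per_broker:.1f} MB/s")
```

The average comes out at roughly 23 MB/s; you would size disks and network for the peak rate rather than this average.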

Why are my celery worker processes processing everything in serial?

I'm running multiple celery worker processes on an AWS c3.xlarge (4 core machine). There is a "batch" worker process with its --concurrency parameter set to 2, and a "priority" process with its --concurrency parameter set to 1. Both worker processes draw from the same priority queue. I am using Mongo as my broker. When I submit multiple jobs to the priority queue they are processed serially, one after the other, even though multiple workers are available. All items are processed by the "priority" process, but if I stop the "priority" process, the "batch" process will process everything (still serially). What could I have configured incorrectly that prevents celery from processing jobs asynchronously?
EDIT: It turned out that the synchronous bottleneck is in the server submitting the jobs rather than in celery.
By default the worker prefetches 4 * concurrency tasks to execute. Your first running worker therefore prefetches 4 tasks, so if you queue 4 or fewer tasks, they will all be processed by this worker alone, and there won't be any messages left in the queue for the second worker to consume.
You should set CELERYD_PREFETCH_MULTIPLIER to a number that works best for you. I had this problem before and set this option to 1; now all my tasks are fairly consumed by the workers.
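The prefetch effect can be illustrated with a toy model. This is not Celery code: prefetch_limit and reserve are hypothetical helpers, and the model simply assumes the first worker grabs as many queued tasks as its limit allows before the next worker gets a turn.

```python
def prefetch_limit(concurrency, multiplier=4):
    """Number of tasks a worker reserves up front: multiplier * concurrency."""
    return concurrency * multiplier


def reserve(queued, limits):
    """Greedy toy model: each worker in turn reserves up to its limit."""
    reserved = []
    for limit in limits:
        take = min(queued, limit)
        reserved.append(take)
        queued -= take
    return reserved


# "priority" worker (concurrency 1) first, then "batch" worker (concurrency 2).
default_limits = [prefetch_limit(1), prefetch_limit(2)]
tuned_limits = [prefetch_limit(1, multiplier=1), prefetch_limit(2, multiplier=1)]

print("multiplier 4:", reserve(4, default_limits))
print("multiplier 1:", reserve(4, tuned_limits))
```

In this model, 4 queued jobs all land on the first worker with the default multiplier, while a multiplier of 1 leaves work for the second worker to pick up.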