I have a single queue in RabbitMQ where there can be 5-6 tasks queued at a time. Currently there is one worker for the queue; it takes one task at a time, and until that task is completed and acknowledged, no other task from the queue is picked up. I want to have multiple consumers for the same queue, so that they take the remaining tasks and process them without any idle time.
3 options:
Start more threads in the consumer application
Start up more instances of the consumer application
Consume more than one message at a time and delegate the messages to worker threads (a sketch of this option follows the list).
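A minimal sketch of the third option with the RabbitMQ Java client; the host, the queue name `tasks`, and the pool size are assumptions. Raise the prefetch count with `basicQos` so the broker pushes several unacknowledged messages to the one consumer, hand each delivery to a worker pool, and acknowledge only after the work succeeds:

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DeliverCallback;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PrefetchWorker {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");                      // assumption: local broker
        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();

        String queue = "tasks";                            // hypothetical queue name
        channel.queueDeclare(queue, true, false, false, null);

        // Let the broker push up to 6 unacknowledged messages to this consumer,
        // then fan them out to a small pool of worker threads.
        channel.basicQos(6);
        ExecutorService workers = Executors.newFixedThreadPool(6);

        DeliverCallback onDeliver = (consumerTag, delivery) -> {
            long deliveryTag = delivery.getEnvelope().getDeliveryTag();
            String body = new String(delivery.getBody());
            workers.submit(() -> {
                try {
                    process(body);                                   // your task logic
                    channel.basicAck(deliveryTag, false);            // ack only after success
                } catch (Exception e) {
                    try {
                        channel.basicNack(deliveryTag, false, true); // requeue on failure
                    } catch (Exception ignored) { }
                }
            });
        };
        channel.basicConsume(queue, false, onDeliver, consumerTag -> { });
    }

    private static void process(String body) {
        // placeholder for the real work
    }
}
```

One caveat: the Java client's Channel is not designed for heavy concurrent use, so in production you may prefer one channel per worker thread, or funneling the acks back onto a single thread.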
I have been trying to understand how, in Kafka Streams, a StreamThread switches between executing the tasks it is assigned, but could not find the answer online.
Thread pools are well understood in Java: we know we should not block in our code, as this can quickly lead to thread starvation. In other words, a task executes until it finishes, and then the thread picks up whatever else has been submitted to the pool.
In the same spirit, given that tasks read data that continuously arrives through their input partitions and technically never ends, I am wondering how a StreamThread switches between tasks.
This information would help in deciding how many tasks we are prepared to pack per stream thread, depending on what we know about our workload.
The Kafka StreamThread is not analogous to a thread pool. A StreamThread processes one record at a time, from the first Task in the topology down through the completion of the Topology for that one Record.
Once the Record has been processed, a new input Record is fetched from the Input Topics.
Only one StreamThread may process records from any given partition; however, one StreamThread may process multiple partitions. Therefore it is useless to have more StreamThreads than input partitions.
I would recommend over-allocating the number of partitions so that you can increase the number of stream threads as you scale up.
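To make the relationship concrete, here is a minimal sketch of a Streams application where the thread count is set explicitly; the topic names, application id, and broker address are assumptions. With, say, 12 input partitions, anything from 1 to 12 threads is useful, and any threads beyond that would sit idle:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

import java.util.Properties;

public class ThreadCountExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");            // hypothetical id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // One task is created per input partition; tasks are spread across the
        // StreamThreads of this instance. Threads beyond the partition count idle.
        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 4);

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic").to("output-topic");                    // trivial topology

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```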
I have multiple queues in NATS with subjects like "queue.1", "queue.2", etc., and multiple workers in the same queue group (let's call it just "group"). These workers are subscribed to "queue.*", which means they want to receive messages from all queues. But I need to process these messages sequentially in the context of one queue. Right now, if the producer sends message "task.1" to "queue.1" and "worker.1" processes it slowly, more slowly than the producer sends "task.2" to the same queue, then another worker, "worker.2", will get "task.2" before "task.1" has been processed. How can I wait for "task.1" before processing "task.2"?
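For context, the setup described above looks roughly like this with the jnats client; the server URL and group name are assumptions. Note that this sketch only reproduces the situation in the question: NATS load-balances each message to one member of the queue group and does not, by itself, order messages per subject across members:

```java
import io.nats.client.Connection;
import io.nats.client.Dispatcher;
import io.nats.client.Nats;

import java.nio.charset.StandardCharsets;

public class QueueGroupWorker {
    public static void main(String[] args) throws Exception {
        Connection nc = Nats.connect("nats://localhost:4222");   // assumption: local server

        // Every worker in the queue group "group" subscribes to "queue.*";
        // each message goes to exactly one member of the group, with no
        // ordering guarantee across members for the same subject.
        Dispatcher d = nc.createDispatcher(msg -> {
            String subject = msg.getSubject();                    // e.g. "queue.1"
            String task = new String(msg.getData(), StandardCharsets.UTF_8);
            System.out.println("processing " + task + " from " + subject);
        });
        d.subscribe("queue.*", "group");
    }
}
```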
I am creating a service that polls messages from Kafka topics and hands over each message received during the poll interval to a worker thread from the thread pool. The worker thread processes the message by talking to another service.
How should I handle committing Kafka offsets in this case? If I choose to wait for all the threads to complete, processing speed decreases. On the other hand, once a message reaches a worker thread, it is guaranteed that either its processing completes successfully or it is added to a dead-letter topic to be handled later, provided the host on which the service is running doesn't go down. So I could commit the offsets as soon as I have submitted the messages to the thread pool, but then I run the risk of losing messages on host crashes. How should I prevent losing messages here, or should I use some other strategy for committing/maintaining offsets?
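For reference, here is a minimal sketch of the first strategy mentioned (wait for the whole batch before committing) with the Java consumer and an ExecutorService; the broker address, group id, topic name, and pool size are assumptions. This gives at-least-once delivery at the cost of the throughput hit the question describes:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BatchCommitWorker {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");         // assumption
        props.put("group.id", "worker-group");                    // hypothetical group id
        props.put("enable.auto.commit", "false");                 // commit manually
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("tasks"));   // hypothetical topic
        ExecutorService pool = Executors.newFixedThreadPool(8);

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            List<Future<?>> batch = new ArrayList<>();
            for (ConsumerRecord<String, String> record : records) {
                batch.add(pool.submit(() -> handle(record)));      // fan out to workers
            }
            for (Future<?> f : batch) {
                f.get();                                           // wait for the whole batch
            }
            consumer.commitSync();                                 // commit only once batch is done
        }
    }

    private static void handle(ConsumerRecord<String, String> record) {
        // call the downstream service; on failure, publish to a dead-letter topic
    }
}
```

A middle ground is to track completed offsets per partition and commit only up to the highest contiguous completed offset, but that requires bookkeeping beyond this sketch.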
For example, we have a queue of tasks that each handle a number, and 2 consumers:
             |-> c1
[5,4,3,2,1] -|
             |-> c2
When a message is consumed, let's say it performs some asynchronous action such as updating a number in a database.
my_number_table
---
number(int)
If consumer 1 gets task 1 and starts to update the row in the database, and consumer 2 gets task 2 and starts to update the same row in the database, won't the database lock?
I believe I would like task 2 to not get picked up by any consumer until task 1 is completed and the number has been successfully saved into the database first.
You can solve your problem by creating two different queues for the two consumers and sending all dependent tasks to the same queue, making sure only one consumer is consuming from that queue. For example, if you want task2 to start only after task1, send both of them to the same queue, queue1.
You could also solve it by using one queue for both consumers and acquiring a lock on the database, but even then you can't be sure the execution order is maintained. For instance, if consumer1 has very high latency and consumer2 has low latency, task2 will reach consumer2 before task1 reaches consumer1; by the time consumer1 acquires the lock, consumer2 might already have finished task2.
So from my understanding it's better to send the dependent tasks to the same consumer.
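A minimal sketch of the first suggestion with the RabbitMQ Java client; the host, queue names, and task payloads are placeholders. Dependent tasks are published to the same queue, which only one consumer reads, so their relative order is preserved:

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class DependentTaskProducer {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");                        // assumption: local broker
        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel()) {

            // All tasks that depend on each other go to the same queue, consumed
            // by exactly one consumer, so task2 never starts before task1 is acked.
            channel.queueDeclare("queue1", true, false, false, null);
            channel.basicPublish("", "queue1", null, "task1".getBytes());
            channel.basicPublish("", "queue1", null, "task2".getBytes());

            // Independent tasks can go to queue2 for the second consumer.
            channel.queueDeclare("queue2", true, false, false, null);
            channel.basicPublish("", "queue2", null, "task3".getBytes());
        }
    }
}
```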
I have written a worker service to consume messages from a Kafka queue, and I have also written a test script to add messages to the queue every few seconds.
What I have noticed is that the consumer will often sit idle for minutes at a time while messages are being added to the queue. Then suddenly the consumer will pick up the first message, process it, and rapidly move on to the rest. So it eventually catches up, but I'm wondering why there is such a delay in the first place?
The consumer group takes some time to contact the group coordinator and have partitions assigned automatically; that rebalance is the delay you are seeing.
If you use manual assignment, you will see less delay.
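A minimal sketch of manual assignment with the Java consumer; the broker address, group id, topic, and partition are assumptions. `assign()` skips the group-coordinator handshake and rebalance, so fetching starts almost immediately, at the cost of managing partition ownership yourself:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class ManualAssignConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumption
        props.put("group.id", "test-group");                 // still used for offset storage
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        // assign() bypasses group coordination and rebalancing entirely,
        // so there is no start-up delay waiting for a partition assignment.
        consumer.assign(Collections.singletonList(new TopicPartition("my-topic", 0)));

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(record.offset() + ": " + record.value());
            }
        }
    }
}
```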