Why are my celery worker processes processing everything in serial? - celery

I'm running multiple celery worker processes on an AWS c3.xlarge (a 4-core machine). There is a "batch" worker process with its --concurrency parameter set to 2, and a "priority" process with its --concurrency parameter set to 1. Both worker processes draw from the same priority queue. I am using Mongo as my broker. When I submit multiple jobs to the priority queue, they are processed serially, one after the other, even though multiple workers are available. All items are processed by the "priority" process, but if I stop the "priority" process, the "batch" process will process everything (still serially). What could I have configured incorrectly that prevents celery from processing jobs in parallel?
EDIT: It turned out that the synchronous bottleneck is in the server submitting the jobs rather than in celery.

By default a worker will prefetch 4 * concurrency tasks to execute, which means your first running worker prefetches 4 tasks at once. So if you queue 4 or fewer tasks, they will all be reserved by that worker alone, and there won't be any messages left in the queue for the second worker to consume.
You should set CELERYD_PREFETCH_MULTIPLIER to a number that works best for you. I had this problem before and set this option to 1; now all my tasks are consumed fairly across the workers.
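As a minimal sketch (the broker URL below is a placeholder, not from the question), the setting can live in a celeryconfig.py module; on Celery 4+ the equivalent lowercase name is worker_prefetch_multiplier:

```python
# celeryconfig.py -- minimal sketch; adjust the broker URL to your setup
BROKER_URL = "mongodb://localhost:27017/jobs"  # hypothetical Mongo broker URL

# Reserve only one task per worker process at a time instead of the default
# 4 * concurrency, so idle workers can pick up queued tasks.
CELERYD_PREFETCH_MULTIPLIER = 1
```

The module can then be loaded with app.config_from_object("celeryconfig") when the Celery app is created.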

Related

Kafka Streams - Relation between "Stream threads" vs "Tasks" running on 1 C4.XLarge Machine

I have a Kafka Streams topology which has 5 processors and 1 source. The source topic for this topology has 200 partitions. My understanding is that 200 tasks get created to match the number of partitions of the input topic.
This Kafka Streams app is running on a C4.XLarge, and these 200 tasks run on a single stream thread, which means this stream thread should be using up all the CPU cores (8) and memory.
I know Kafka Streams parallelism/scalability is controlled by the number of stream threads. I can increase num.stream.threads to 10, but how would that improve performance if all of them run on a single EC2 instance? How would it differ from running all tasks on a single stream thread on a single EC2 instance?
If you have an 8-core machine, you might want to run 8 StreamThreads.
This Kafka Streams app is running on a C4.XLarge, and these 200 tasks run on a single stream thread, which means this stream thread should be using up all the CPU cores (8) and memory.
This does not sound correct. A single thread cannot utilize multiple cores. While configuring a single StreamThread means that a few other background threads are started as well (the consumer heartbeat thread, the producer sender thread), I would assume that you cannot fully utilize all 8 cores with this setting.
If 8 StreamThreads do not fully utilize your 8 cores, you might consider configuring 16 threads. Note, however, that all threads share the same network, so if the network is the actual limiting factor, running more threads won't give you higher throughput (or higher CPU utilization). In that case, you need to scale out using multiple EC2 instances.
Given that you have 200 tasks, you could conceptually run up to 200 StreamThreads, but you probably won't need 200 threads.

CeleryExecutor: Does the airflow metric "executor.queued_tasks" report the number of tasks in the celery broker?

Using its statsd plugin, Airflow can report the metric executor.queued_tasks, among others.
I am using the CeleryExecutor and need to know how many tasks are waiting in the Celery broker, so that I know when new workers should be spawned. Indeed, I configure my workers so that they cannot take many tasks concurrently. Is this metric what I need?
Nope. If you want to know how many TIs are waiting in the broker, you'll have to connect to it.
Task instances that are waiting to get picked up in the celery broker are queued according to the Airflow DB, but running according to the CeleryExecutor. This is because the CeleryExecutor considers that any task instance that was successfully sent to the broker is now running (unlike the DB, which waits for a worker to pick it up before marking it as running).
Metric executor.queued_tasks reports the number of tasks queued according to the executor, not the DB.
The number of queued task instances according to the DB is not exactly what you need either, because it reports the number of task instances that are waiting in the broker plus the number of task instances queued to the executor. But when would TIs be stuck in the executor's queue, you ask? When the parallelism setting of Airflow prevents the executor from sending them to the broker.
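As a rough sketch of "connecting to the broker" (not part of the original answer, and assuming a Redis broker with Airflow's default queue name), you could count the messages waiting in the queue directly:

```python
# Hypothetical sketch: count task instances waiting in the Celery broker,
# assuming a Redis broker and a queue named "default" -- adjust both to your setup.
import redis

r = redis.Redis.from_url("redis://localhost:6379/0")  # placeholder broker URL
waiting = r.llen("default")  # Celery keeps a queue's pending messages in a Redis list
print(f"{waiting} task instances are waiting in the broker")
```

For a RabbitMQ broker, the queue depth would come from the management API instead.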

How to avoid repetition of job in celery

I am using celery with SQS.
One of my celery workers goes to the DB, checks for pending messages, and sends notifications.
I have 2 workers doing the same job.
The problem is that this job is scheduled and there are 2 workers for it, so some of the messages are getting sent twice.
How can I avoid 2 jobs picking up the same message?
Should I stop using 2 workers for processing scheduled jobs?

Kafka Streams: Threads vs CPU cores

If the machine has 16 cores and we define 6 threads in the config, would Kafka Streams utilize 6 cores, would all the threads run on just a single core, or is there no control over which cores are used?
It is wrong to think of it this way; multiple factors are involved.
If we define 6 tasks, it means we have 6 partitions for that topic, which will be consumed in parallel by the Kafka consumers.
If you have 16 cores and no other processes running, then chances are it will execute as you expect.
That is not a typical production scenario, though, where we have multiple topics (each with more than one partition), which invalidates your theory.
You should size tasks based on your consumers, and the machine should run only the workers.
Once the above conditions are satisfied, we can run a performance test on that data:
How much time does it take to process 50k records?
What is our expected time?
We can upgrade our system based on these basic parameters.

How to start a celery worker that only pushes tasks to the broker but does not consume them?

I have the main producer of tasks in a webserver. I do not want the webserver to consume any tasks; it should only send the tasks to the broker, where they get consumed by other nodes.
Right now I route tasks using the -Q option in the nodes by specifying the particular queues for each node. Is there a way to specify 0 queues for a worker?
Any help appreciated, thanks!
You do not need to use a worker to push tasks to the broker - you can do that from a regular Python process.
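For example, a minimal sketch (the broker URL, task name, and queue are placeholders) of publishing from a plain Python process inside the webserver:

```python
# Minimal sketch: publish tasks to the broker from a regular Python process.
# The broker URL, task name, and queue below are placeholders for your own setup.
from celery import Celery

app = Celery("tasks", broker="amqp://guest@localhost//")

# send_task only needs the task's registered name; the implementation lives
# on the consuming nodes, so the webserver never has to run a worker.
app.send_task("tasks.process_upload", args=["some-file-id"], queue="default")
```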