Celery group limitations - celery

I'm running about 30 celery tasks in a group and each of these tasks in a group might create subtasks which they also run in a group.
When I run this in production load, about 6000 tasks and subtasks are generated. Then some of the tasks in the top-level group just disappear and are not processed by the workers.
I'm wondering if there's any limitation on the number of tasks that can run in a group.

Related

Do all the tasks across agent jobs in the pipeline run on the same agent?

I'm learning about azure pipelines. By default you get 1 free parallel job for x number of minutes.
A pipeline contains number of tasks. And atleast 1 job. All the tasks in the pipeline (across multiple jobs) run on the same agent?
Does 1 parallel job means 1 pipeline execution containing 2 or more jobs? or only 2 jobs?
No. Each job will run on a new agent.
1 parallel job means that one job can run at a time. Two parallel jobs means that two parallel jobs can run at a time, each on a separate agent. And so on.

Cluster Resource Usage in Databricks

I was just wondering if anyone could explain if all compute resources in a Databricks cluster are shared or if the resources are tied to each worker. For example, if two users were connected to a cluster made up of 2 workers with 4 cores per worker and one user's job required 2 cores and the other's required 6 cores, would they be able to share the 8 total cores or would the full 4 cores from one worker be unavailable during the job that only required 2 cores?
TL;DR; Yes, default behavior is to allow sharing but you're going to have to tightly control the default parallelism with such a small cluster.
Take a look at Job Scheduling for Apache Spark. I'm assuming you are using an "all-purpose" / "interactive" cluster where users are working on notebooks OR you are submitting jobs to an existing, all-purpose cluster and it is NOT a job cluster with multiple spark applications being deployed.
Databricks Runs in FAIR Scheduling Mode by Default
Under fair sharing, Spark assigns tasks between jobs in a “round robin” fashion, so that all jobs get a roughly equal share of cluster resources. This means that short jobs submitted while a long job is running can start receiving resources right away and still get good response times, without waiting for the long job to finish. This mode is best for multi-user settings.
By default, all queries started in a notebook run in the same fair scheduling pool
The Apache Spark scheduler in Azure Databricks automatically preempts tasks to enforce fair sharing.
Apache Spark Defaults to FIFO
By default, Spark’s scheduler runs jobs in FIFO fashion. Each job is divided into “stages” (e.g. map and reduce phases), and the first job gets priority on all available resources while its stages have tasks to launch, then the second job gets priority, etc. If the jobs at the head of the queue don’t need to use the whole cluster, later jobs can start to run right away, but if the jobs at the head of the queue are large, then later jobs may be delayed significantly.
Keep in mind the word "job" is specific Spark term that represents an action being taken that launches one or more stages and tasks. See What is the concept of application, job, stage and task in spark?.
So in your example you have...
2 Workers with 4 cores each == 8 cores == 8 tasks can be handled in parallel
One application (App A) that has a job that launches a stage with only 2 tasks.
One application (App B) that has a job that launches a stage with 6 tasks.
In this case, YES, you will be able to share the resources of the cluster. However, the devil is in the default behaviors. If you're reading from many files, performing a join, aggregating, etc, you're going to run into the fact that Spark is going to partition your data into chunks that can be acted on in parallel (see configuration like spark.default.parallelism).
So, in a more realistic example, you're going to have...
2 Workers with 4 cores each == 8 cores == 8 tasks can be handled in parallel
One application (App A) that has a job that launches a stage with 200 tasks.
One application (App B) that has a job that launches three stage with 8, 200, and 1 tasks respectively.
In a scenario like this FIFO scheduling, as is the default, will result in one of these applications blocking the other since the number of executors is completely overwhelmed by the number of tasks in just one stage.
In a FAIR scheduling mode, there will still be some blocking since the number of executors is small but some work will be done on each job since FAIR scheduling does a round-robin at the task level.
In Apache Spark, you have tighter control by creating different pools of the resources and submitting apps only to those pools where they have "isolated" resources. The "better" way of doing this is with Databricks Job clusters that have isolated compute dedicated to the application being ran.

Concurrent tasks workers with celery

I have a mongodb collections which are 20 in number that i am using to store some data regarding a tasks that i am currently processing using cron jobs.
I have one worker per collection when using cron jobs. I want to improve this arrangement and i am looking into celery. I want to have at least 4 workers per collection since i have many records in each collection.
I want the jobs to be done as they come and not wait for the five minutes wait as its happening when using cron jobs.
Is this possible for me to have 4 workers per collection in celery in the way i have described?.
Celery workers will pick tasks as soon as a new task is initiated and will execute it, celery can use redis, or rabittMQ for storing the tasks queue. Any day you can scale the celery by running it distributed on multiple machines or by scaling up the machine and increasing the number of workers. https://www.slideshare.net/nicolasgrasset/scaling-up-task-processing-with-celery
Instead of using the crontab, use celery beat which is the task scheduler for celery.
There is no need of having collection wise celery workers.
Please go through the below celery documentation for understanding celery.
http://docs.celeryproject.org/en/latest/getting-started/introduction.html

Running Parallel Tasks in Batch

I have few questions about running tasks in parallel in Azure Batch. Per the official documentation, "Azure Batch allows you to set maximum tasks per node up to four times (4x) the number of node cores."
Is there a setup other than specifying the max tasks per node when creating a pool, that needs to be done (to the code) to be able to run parallel tasks with batch?
So if I am understanding this correctly, if I have a Standard_D1_v2 machine with 1 core, I can run up to 4 concurrent tasks running in parallel in it. Is that right? If yes, I ran some tests and I am quite not sure about the behavior that I got. In a pool of D1_v2 machines set up to run 1 task per node, I get about 16 min for my job execution time. Then, using the same applications and same parameters with the only change being a new pool with same setup, also D1_v2, except running 4 tasks per node, I still get a job execution time of about 15 min. There wasn't any improvement in the job execution time for running tasks in parallel. What could be happening? What am I missing here?
I ran a test with a pool of D3_v2 machines with 4 cores, set up to run 2 tasks per core for a total of 8 tasks per node, and another test with a pool (same number of machines as previous one) of D2_v2 machines with 2 cores, set up to run 2 tasks per core for a total of 4 parallel tasks per node. The run time/ job execution time for both these tests were the same. Isn't there supposed to be an improvement considering that 8 tasks are running per node in the first test versus 4 tasks per node in the second test? If yes, what could be a reason why I'm not getting this improvement?
No. Although you may want to look into the task scheduling policy, compute node fill type to control how your tasks are distributed amongst nodes in your pool.
How many tasks are in your job? Are your tasks compute-bound? If so, you won't see any improvement (perhaps even end-to-end performance degradation).
Batch merely schedules the tasks concurrently on the node. If the command/process that you're running utilizes all of the cores on the machine and is compute-bound, you won't see an improvement. You should double check your tasks start and end times within the job and the node execution info to see if they are actually being scheduled concurrently on the same node.

Scaling periodic tasks in celery

We have a 10 queue setup in our celery, a large setup each queue have a group of 5 to 10 task and each queue running on dedicated machine and some on multiple machines for scaling.
On the other hand, we have a bunch of periodic tasks, running on a separate machine with single instance, and some of the periodic tasks are taking long to execute and I want to run them in 10 queues instead.
Is there a way to scale celery beat or use it purely to trigger the task on a different destination "one of the 10 queues"?
Please advise?
Use celery routing to dispatch the task to where you need: