Multilevel feedback queue using SLURM - queue

I was wondering if there is a way to construct a multilevel feedback queue using Slurm.
I have set up 3 partitions (fastqueue, mediumqueue and slowqueue), which have different time limits (2 minutes, 5 minutes and 10 minutes respectively). What I want to achieve is that all jobs are submitted to the fast queue at first, but when the time limit for the job is exceeded, the job is requeued in the next queue (medium queue). This would imply that fast jobs aren't slowed down by slow jobs.
Is it possible to achieve some king of multilevel feedback queue using Slurm?
Instead of using multiple partitions to achieve this, can a multilevel queue be simulated by playing around with the job priorities?
Any pointers would be greatly appreciated!

Related

What queuing tools or algorithms support fair queuing of jobs

I am hitting a well known problem, but I can't find a simple answer that tells me how to solve it.
I would appreciate you directing me by answering which feature I should look for in available queuing software or suitable algorithms if the solution requires programming in addition to the tools. and if you can direct me to Python supported tools, it would be helpful
My problem is that I get over the span of the day jobs which deploy 10, 100 or 1000 tests (I exaggerate , but it helps make a point). Many jobs deploy 10 tests, some deploy 100 tests and one or two deploy 1000 tests.
I want to deploy the tests in such a manner that the delay in execution is spread in a fair manner between all jobs. Let me explain myself.
If the very large job takes 2 hours on a idle server, it would be acceptable if it completes after 4 hours.
If a small job takes 3 minutes on an idle server, it would be acceptable if it completes after 15 minutes.
I want the delay of running the jobs to be spread in a fair way, so jobs that started earlier don't get too delayed. If it looks that the job is going to be delayed more than allowed it's priority will increase.
I think that prioritizing queues may be the solution, so dynamically changing the weights on a large queue will make it faster when needed.
Is there a queue software that knows how to do the above automatically. Lets say that I give each job some time limit and the queue software knows how to prioritize the tests from each queue so that no job is delayed too much?
Thanks.
Adding information following Jim's comments.
Not enough information to supply an answer. Is a job essentially just a list of tests? Can multiple tests for a single job be run concurrently? Do you always run all tests for a job? – Jim Mischel 14 hours ago
Each job deploys between 10 to 1000 tests.
The test can run concurrently to all other tests from the same or other users without conflicts.
All tests that were deploy by a job, are planned to run.
Additional info:
I've learned so far that Prioritized Queues are actually about applying weights to items in a single queue, where items with the hightest are pulled first. If two or more items have the same highest priority, the first item to arrive will be executed first.
When I pondered about Priority Queues it was more in the way of:
Multiple Queues, where each queue has a priority assigned to the entire queue.
The priority can be changed dynamically in runtime, based on some condition, e.g. setting a time limit on the execution of the entire queue.

How to put a rate limit on a celery queue?

I read this in the celery documentation for Task.rate_limit:
Note that this is a per worker instance rate limit, and not a global rate limit. To enforce a global rate limit (e.g., for an API with a maximum number of requests per second), you must restrict to a given queue.
How do I put a rate limit on a celery queue?
Turns out it cant be done at queue level for multiple workers.
IT can be done at queue level for 1 worker. Or at queue level for each worker.
So if u say 10 jobs/ minute on 5 workers. Your workers will process upto 50 jobs per minute collectively.
So to have only 10 jobs running at a time you either chose one worker. Or chose 5 workers with a limit of 2/minute.
Update: How to exactly put the limit in settings/configuration:
task_annotations = {'tasks.<task_name>': {'rate_limit': '10/m'}}
or change the same for all tasks:
task_annotations = {'*': {'rate_limit': '10/m'}}
10/m means 10 tasks per minute, /s would mean per second. More details here: Task annotations setting
hey I am trying to find a way to do rate limit on queue, and I find out Celery can't do that, however Celery can control the rate per tasks, see this:
http://docs.celeryproject.org/en/latest/userguide/workers.html#rate-limits
so for a workaround, maybe you can set up one tasks per queue(which makes sense in a lot of situations), and put the limit on task.
You can set this limit in the flower > worker pane.
there is a specified blank space for entering your limit there.
The format that is suggested to be used is also like the below:
The rate limits can be specified in seconds, minutes or hours by appending “/s”, >“/m” or “/h” to the value. Tasks will be evenly distributed over the specified >time frame.
Example: “100/m” (hundred tasks a minute). This will enforce a minimum delay of >600ms between starting two tasks on the same worker instance.

Scheduling jobs in Quartz as a process

Is jobs in quartz are executed as process or thread?
If it is executed as a thread then will it effect the performance of quartz scheduler when heavy jobs or time consuming jobs are executed.
If so then please suggest the solution.
If we execute 10 time consuming jobs simultaneously what is the effect?
I read the tutorials but didnt find the solution.
Please suggest the solution.
Thanks.
Read the documentation regarding Configuring the thread pool which explains how the quartz thread pool can be suited for your needs. More specifically the org.quartz.threadPool.threadCount configuration property can be set according to your needs as the documentation explains:
The number of threads available for concurrent execution of jobs. You
can specify any positive integer, although only numbers between 1 and
100 are practical. If you only have a few jobs that fire a few times a
day, then one thread is plenty. If you have tens of thousands of jobs,
with many firing every minute, then you want a thread count more like
50 or 100 (this highly depends on the nature of the work that your
jobs perform, and your systems resources).
In the specific example you mentioned regarding 10 jobs firing simultaneously, if you have configured above property with more than 10 threads, then each job will run concurrently on its own thread. Otherwise if you have configured less, some will start first, and the others will wait for threads to become available. If no threads become available until a configured period of time, the misfire instructions you have set will handle the action to be taken, which usually is to trigger delayed jobs as soon as possible but this is also a configurable setting.

Scala task parallelization with actors => How does the scheduler work?

I have a task which can be easily be broken into parts which can and should be processed in parallel to optimize performance.
I wrote an producer actor which prepares each part of the task that could be processed independently. This preparation is relatively cheap.
I wrote a consumer Actor that processes each of the independent tasks. Depending on the parameters each piece of independent task may take up to a couple of seconds to be processed. All tasks are quite the same. They all process the same algorithm, with the same amount of data (but different values of course) resulting in about equal time of processing.
So the producer is much faster than the consumer. Hence there quickly may be 200 or 2000 tasks prepared (depending on the parameters). All of them consuming memory while just a couple of them can be executed at at once.
Now I see two simple strategies to consume and process the tasks:
Create a new consumer actor instance for each task.
Each consumer processes only on task.
I assume there would be many consumer actor instances at the same time, while only a couple of them, can be processed at any point in time.
How does the default scheduler work? Can each consumer actor finish processing before the next consumer will be scheduled? Or will a consumer be interrupted and be replaced by another consumer resulting in longer time until the first task will be finished? I think this actor scheduling is not the same as process or thread scheduling, but I can imagine, that interruption can still have some disadvantages (e.g. like more cache misses).
The other strategy is to use N instances of the consumer actor and send the tasks to process as messages to them.
Each consumer processes multiple tasks in sequence.
It is left up to me, to find a appropriate value for the N (number of consumers).
The distribution of the tasks over the N consumers is also left up to me.
I could imagine a more sophisticated solution where more coordination is done between the producer and the consumers, but I can't make a good decision without knowledge about the scheduler.
If manual solution will not result in significant better performance, I would prefer a default solution (delivered by some part of the Scala world), where scheduling tasks are not left up to me (like strategy 1).
Question roundup:
How does the default scheduler work?
Can each consumer actor finish processing before the next consumer will be scheduled?
Or will a consumer be interrupted and be replaced by another consumer resulting in longer time until the first task will be finished?
What are the disadvantages when the scheduler frequently interrupts an actor and schedules another one? Cache-Misses?
Would this interruption and scheduling be like a context-change in process scheduling or thread scheduling?
Are there any more advantages or disadvantages comparing these strategies?
Especially does strategy 1 have disadvantages over strategy 2?
Which of these strategies is the best?
Is there a better strategy than I proposed?
I'm afraid, that questions like the last two can not be answered absolutely, but maybe this is possible this time as I tried to give a case as concrete as possible.
I think the other questions can be answered without much discussion. With those answers it should be possible to choose the strategy fitting the requirements best.
I made some research and thoughts myself and came up with some assumptions. If any of these assumptions are wrong, please tell me.
If I were you, I would have gone ahead with 2nd option. A new actor instance for each task would be too tedious. Also with smart decision of N, complete system resources can be used.
Though this is not a complete solution. But one possible option is that, can't the producer stop/slow down the rate of producing tasks? This would be ideal. Only when there is a consumer available or something, the producer will produce more tasks.
Assuming you are using Akka (if you don't, you should ;-) ), you could use a SmallestMailboxRouter to start a number of actors (you can also add a Resizer) and the message distribution will be handled according to some rules. You can read everything about routers here.
For such a simple task, actors give no profit at all. Implement the producer as a Thread, and each task as a Runnable. Use a thread pool from java.util.concurrent to run the tasks. Use a java.util.concurrent. Semaphore to limit the number of prepared and running tasks: before creating the next tasks, producer aquires the sempahore, and each task releases the semaphore at the end of its execution.

Least load scheduler

I'm working on a system that uses several hundreds of workers in parallel (physical devices evaluating small tasks). Some workers are faster than others so I was wondering what the easiest way to load balance tasks on them without a priori knowledge of their speed.
I was thinking about keeping track of the number of tasks a worker is currently working on with a simple counter and then sorting the list to get the worker with the lowest active task count. This way slow workers would get some tasks but not slow down the whole system. The reason I'm asking is that the current round-robin method is causing hold up with some really slow workers (100 times slower than others) that keep accumulating tasks and blocking new tasks.
It should be a simple matter of sorting the list according to the current number of active tasks, but since I would be sorting the list several times a second (average work time per task is below 25ms) I fear that this might be a major bottleneck. So is there a simple version of getting the worker with the lowest task count without having to sort over and over again.
EDIT: The tasks are pushed to the workers via an open TCP connection. Since the dependencies between the tasks are rather complex (exclusive resource usage) let's say that all tasks are assigned to start with. As soon as a task returns from the worker all tasks that are no longer blocked are queued, and a new task is pushed to the worker. The work queue will never be empty.
How about this system:
Worker reaches the end of its task queue
Worker requests more tasks from load balancer
Load balancer assigns N tasks (where N is probably more than 1, perhaps 20 - 50 if these tasks are very small).
In this system, since you are assigning new tasks when the workers are actually done, you don't have to guess at how long the remaining tasks will take.
I think that you need to provide more information about the system:
How do you get a task to a worker? Does the worker request it or does it get pushed?
How do you know if a worker is out of work, or even how much work is it doing?
How are the physical devices modeled?
What you want to do is avoid tracking anything and find a more passive way to distribute the work.