Does Quartz Scheduler support configured jitter to avoid the herding effect?

For example, if you have a lot of jobs scheduled at exactly the same time, the resulting load spike becomes hard to handle.
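A common workaround, independent of whether Quartz offers it as a built-in feature, is to add a small random offset when computing each trigger's start time. A sketch of the idea (`jittered_delay` is a made-up helper, and the 30-second window is an arbitrary choice):

```python
import random

# Hypothetical workaround sketch (not a documented Quartz feature): add a
# random offset to each job's nominal start time so that a batch of jobs
# due at the same instant is spread over a window instead of firing together.
def jittered_delay(base_delay_s, max_jitter_s, rng=random.random):
    return base_delay_s + rng() * max_jitter_s

# 100 jobs nominally due "now" end up spread across a 30-second window.
delays = [jittered_delay(0.0, 30.0) for _ in range(100)]
assert all(0.0 <= d <= 30.0 for d in delays)
```

The jitter window only needs to be wide enough that the downstream system sees a trickle of starts rather than one simultaneous burst.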


How to structure an elastic Azure Batch application?

I am evaluating Batch for a project and, while it seems like it will do what I'm looking for, I'm not sure if what I am assuming is really correct.
I have what is basically a job runner from a queue. The current solution works but when the pool of nodes scales down, it just blindly kills off machines. I am looking for something that, when scaling down, will allow currently-running jobs to complete and then remove the node(s) from the pool. I also want to preemptively increase the pool size if a spike is likely to occur (and not have those nodes shut down). I can adjust the pool size externally if that makes sense (seems like the best option so far).
My current idea is to have one pool with one job & task per node, and that task listens to a queue in a loop for messages and processes them. After an iteration count and/or time limit, it shuts down, removing that node from the pool. If the pool size didn't change, I would like to replace that node with a new one. If the pool was shrunk, it should just go away. If the pool size increases, new nodes should run and start up the task.
I'm not planning on running something that continually adds pools, nodes, or tasks, though I will probably have something that sets the pool size periodically based on queue length or something similar. What I would rather not do is something like "there are 10 things in the queue, add a pool with x nodes, then delete it".
Is this possible or are my expectations incorrect? So far, from reading the docs, it seems like it should be doable, and I have a simple task working, but I'm not sure about the scaling mechanics or exactly how to structure the tasks/jobs/pools.
Here's one possible way to lean into the strengths of Azure Batch and achieve what you've described.
Create your job with a JobManagerTask that monitors your queue for incoming work and adds a new Batch Task for each item of your workload. Each task will process a single piece of work, then exit.
The Batch Scheduler will then take care of allocating tasks to compute nodes. It can also take care of retrying tasks that fail, and so on.
Configure your pool with an AutoScale formula to dynamically resize it to meet your load. Your formula can set the node deallocation option to taskcompletion to ensure running tasks get to complete before any compute node is removed.
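As an illustration, an autoscale formula along these lines could track the queue and drain nodes gracefully. The variable names come from the Batch autoscale formula language; the sampling window, thresholds, and pool cap here are made-up example values:

```
// Sample the pending-task metric over the last 15 minutes.
$samples = $PendingTasks.GetSamplePercent(TimeInterval_Minute * 15);
// If sampling data is too sparse, keep the current size;
// otherwise track the observed pending-task count.
$tasks = $samples < 70 ? max(0, $TargetDedicatedNodes) :
         max($PendingTasks.GetSample(TimeInterval_Minute * 15));
// Cap the pool at 10 dedicated nodes (arbitrary cap for the example).
$TargetDedicatedNodes = min(10, $tasks);
// Let running tasks finish before a node is removed on scale-down.
$NodeDeallocationOption = taskcompletion;
```

The last line is what addresses the "don't blindly kill machines" requirement: scale-down waits for each node's running tasks to complete.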
If your workload peaks are predictable (say, 9am every day), your AutoScale expression could scale up your pool in anticipation. If those spikes are not predictable, your external monitoring (or your JobManager) can change the AutoScale expression at any time to suit.
If appropriate, your job manager can terminate once all the required tasks have been added; set onAllTasksComplete to terminatejob, ensuring your job is completed after all your tasks have finished.
A single pool can process tasks from multiple jobs, so if you have multiple concurrent workloads, they could share the same pool. You can give jobs different values for priority if you want certain jobs to be processed first.

Starvation vs Convoy Effect

Is the only difference between starvation and convoy effect that convoy effect is mainly defined on FCFS scheduling algorithms and starvation is on priority based scheduling?
I researched on both effects but couldn't find a comparison. This is based on operating systems theory which I learned for my college degree.
Starvation and convoys can occur under both algorithms. The simplest, starvation, can be simulated by a task entering this loop (I hope it isn't undefined behavior):
while (1) {
}
In FCFS, this task will never surrender the CPU, thus all tasks behind it will starve. In a Priority based system, this same task will starve every task of a lower priority.
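The priority case can be sketched with a toy scheduler. The task names and priority values are hypothetical, and real schedulers track runnability rather than recomputing over a static list, but the starvation mechanism is the same:

```python
# Toy priority scheduler (hypothetical tasks): among runnable tasks, the one
# with the highest priority always wins, so a spinning high-priority task
# starves everything below it.
def pick_next(tasks):
    # tasks: list of (name, priority); a larger number means higher priority
    return max(tasks, key=lambda t: t[1])[0]

# "spinner" runs the empty while (1) loop above and never blocks,
# so it stays runnable and wins every scheduling decision.
tasks = [("spinner", 10), ("worker", 5), ("logger", 1)]
schedule = [pick_next(tasks) for _ in range(5)]
print(schedule)  # ['spinner', 'spinner', 'spinner', 'spinner', 'spinner']
```

Because "worker" and "logger" never appear in the schedule, they starve indefinitely unless the spinner blocks or the scheduler adds an anti-starvation mechanism such as aging.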
Convoys can be more generally recognized as a resource contention problem: one task holds a resource (the CPU), and other tasks have to wait until it is done with it. In a priority-based system, this manifests as priority inversion, where a high-priority task is blocked because it needs a resource owned by a lower-priority task. There are ways to mitigate this, including priority inheritance and ceiling protocols. Absent these mechanisms, tasks contending for a resource will form a convoy much as in FCFS; unlike FCFS, tasks not contending for the resource are free to execute at will.
The aspirations of responsiveness, throughput and fairness are often at odds, which is partly why we don't have a true solution to scheduling problems.

Why is scheduling threads between CPU cores expensive?

There are some articles that refer to so-called core affinity, a technique that binds a thread to a core and thereby decreases the cost of scheduling threads between cores. Hence my question:
Why does the operating system take more time when scheduling threads between cores?
You're probably misinterpreting something you read. It's not the actual scheduling that's slow, it's that a task will run slower when it moves to a new core because private per-core caches will be cold on that new core.
(And worse than that: lines that are dirty in the old core's cache require write-back before they can be read from the new core.)
In most OSes, it's not so much that a task is "scheduled to a core", as that the kernel running on each core grabs the highest-priority task that's currently runnable, subject to restrictions from the affinity mask. (The scheduler function on this core will only consider tasks whose affinity mask matches this core.)
There is no single-threaded master-control program that decides what each core should be doing; the scheduler in normal kernels is a cooperative multi-threaded algorithm.
It's mostly not the actual cost of CPU time in the kernel's scheduler function, it's that the task runs slower on a new core.

What makes RTOS behaviour predictable?

How can you ensure that interrupt latency will not exceed a certain value when there may be other variables and factors involved, like the hardware?
Hardware latency is predictable. It doesn't have to be constant, but it definitely is bounded - for example interrupt entry is usually 12 cycles, but sometimes it may take 15 cycles.
RTOS latency is predictable. It is also not constant, but you can be certain, for example, that the RTOS never blocks interrupts for longer than 1000 cycles. Usually it blocks them for much shorter periods, but never longer than stated.
As long as your application doesn't do something strange (like a while (1); in the thread with the highest possible priority), the latency of the whole system will be the sum of the hardware latency and the RTOS latency.
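Putting numbers on that sum, using the bounds from the two paragraphs above plus an assumed clock speed (the 100 MHz figure is made up for the example):

```python
# Worst-case interrupt latency = hardware bound + RTOS interrupt-masking bound.
hw_cycles = 15          # worst-case interrupt entry from the hardware example
rtos_cycles = 1000      # longest interval the RTOS keeps interrupts blocked
cpu_hz = 100_000_000    # assumed 100 MHz CPU clock

worst_case_cycles = hw_cycles + rtos_cycles
worst_case_us = worst_case_cycles / cpu_hz * 1e6
print(worst_case_cycles, round(worst_case_us, 2))  # 1015 cycles, ~10.15 us
```

The point is not the particular numbers but that every term is bounded, so the total is bounded too; that is exactly what a GPOS cannot promise.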
The important fact here is that using a real-time operating system is not, by itself, enough to make your application real-time. In your application you have to ensure that the real-time constraints are not violated. The main job of the RTOS is to NOT get in your way of doing that, so it must not introduce random/unpredictable delays.
Generally the most important of the "predictable" things in an RTOS is that the highest-priority thread that is not blocked is executing. Period. In a GPOS (like the one on your desktop computer, in tablets or in smartphones), this is not true, because the scheduler actively prevents low-priority threads from starving by allowing them to run for some time, even if there are more important things to do right now. This makes the behaviour of the application unpredictable: one day it may react within 10us, while on another day it may react within 10s, because the scheduler decided it was a great moment to save the logs to the hard drive or maybe do some garbage collection.
Alternatively, you can think of it this way: for an RTOS the latency is in the range of microseconds, maybe single milliseconds, while for a GPOS the maximum latency could easily be dozens of seconds.

Round Robin scheduling and IO

I'm unsure how Round Robin scheduling works with I/O operations. I've learned that CPU-bound processes are favoured by Round Robin scheduling, but what happens if a process finishes its time slice early?
Say we neglect the dispatching process itself and a process finishes its time slice early. Will the scheduler schedule another process if it's CPU-bound? Or will the current process start its I/O operation and, since that isn't CPU-bound, immediately be switched for another (CPU-bound) process? And if CPU-bound processes are favoured, will the scheduler run ALL CPU-bound processes until they are finished and only afterwards schedule the I/O-bound processes?
Please help me understand.
There are two distinct schedulers: the CPU (process/thread ...) scheduler, and the I/O scheduler(s).
CPU schedulers typically employ some hybrid algorithms, because they certainly do regularly encounter both pre-emption and processes which voluntarily give up part of their time-slice. They must service higher-priority work quickly, while not "starving" anyone. (A study of the current Linux scheduler is most interesting. There have been several.)
CPU schedulers identify processes as being either "primarily 'I/O-bound'" or "primarily 'CPU-bound'" at this particular time, knowing that their characteristics can and do change. If your process repeatedly consumes full time slices, it is seen as CPU-bound.
I/O schedulers seek to order and re-order the I/O request queues for maximum efficiency. For instance, to keep the read/write head of a physical disk-drive moving efficiently in a single direction. (The two components of disk-drive delay are "seek time" and "rotational latency," with "seek time" being by-far the worst of the two. Per contra, solid-state drives have very different timing.) I/O-schedulers also have to be aware of the channels (disk interface cards, cabling, etc.) that provide access to each device: they can't simply watch what any one drive is doing. As with the CPU-scheduler, requests must be efficiently handled but never "starved." Linux's I/O-schedulers are also readily available for your study.
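The "keep the head moving in a single direction" idea is the classic elevator (SCAN-style) ordering. A sketch with a hypothetical request list of track numbers and head position:

```python
# Elevator-style ordering sketch (hypothetical request list): service all
# requests at or above the current head position in increasing order, then
# sweep back down through the rest, minimizing back-and-forth seeking.
def elevator(requests, head, direction_up=True):
    up = sorted(r for r in requests if r >= head)
    down = sorted((r for r in requests if r < head), reverse=True)
    return up + down if direction_up else down + up

order = elevator([98, 183, 37, 122, 14, 124, 65, 67], head=53)
print(order)  # [65, 67, 98, 122, 124, 183, 37, 14]
```

Real I/O schedulers layer fairness, deadlines, and per-device/channel awareness on top of this, precisely so no request is starved by an endless stream of conveniently-placed neighbours.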
"Pure round-robin," as a scheduling discipline, simply means that all requests have equal priority and will be serviced sequentially in the order that they were originally submitted. Very pretty birds though they are, you rarely encounter Pure Robins in real life.
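The difference between a task exhausting its quantum and one yielding early can be sketched with a toy round-robin loop. The workloads are hypothetical, and for simplicity a task that blocks on I/O rejoins the ready queue immediately, whereas a real scheduler parks it on a wait queue until the I/O completes:

```python
from collections import deque

QUANTUM = 3  # arbitrary time-slice length for the example

def round_robin(tasks):
    # tasks: list of (name, bursts); each burst is the CPU time the task
    # needs before it blocks on I/O (or finishes, for the last burst).
    ready = deque(tasks)
    timeline = []
    while ready:
        name, bursts = ready.popleft()
        burst = bursts[0]
        ran = min(QUANTUM, burst)           # a task may yield before its slice expires
        timeline.append((name, ran))
        burst -= ran
        if burst > 0:                       # slice expired mid-burst: back of the queue
            ready.append((name, [burst] + bursts[1:]))
        elif len(bursts) > 1:               # burst done: task blocks on I/O, then rejoins
            ready.append((name, bursts[1:]))
        # else: task finished, drops out of the rotation
    return timeline

# "io" needs only 1 unit before blocking, so it gives up its slice early;
# "cpu" consumes a full quantum each time it runs.
print(round_robin([("cpu", [7]), ("io", [1, 1])]))
```

Note that the I/O-bound task is never locked out until the CPU-bound work finishes: yielding early just hands the remainder of the slice to the next task in the rotation.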