I have the following query regarding the scheduling of process threads.
a) If my process A has 3 threads, can these threads be scheduled concurrently on different CPUs in an SMP machine, or will they be given time slices on the same CPU?
b) Suppose I have two processes: process A with 3 threads and process B with 2 threads (all threads of the same priority). Does the CPU time allocated to each thread (the time slice) depend on the number of threads in its process?
Correct me if I am wrong: is CPU time allocated to the process and then shared among its threads, i.e. is the time slice given to process A's threads smaller than that given to process B's threads?
This depends on your OS and thread implementation. POSIX threads defines an interface (the contention scope, set via pthread_attr_setscope()) for controlling how threads are scheduled: with PTHREAD_SCOPE_SYSTEM each thread competes equally against every thread in the system, while with PTHREAD_SCOPE_PROCESS threads compete only within their own process. Not all scheduling types are supported on all platforms.
On Linux, using NPTL, the default behavior is to schedule all threads equally (system scope), so a process with 10 threads can get 10 times as much CPU time as a process with 1 thread, if all eleven threads are CPU bound.
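As a rough illustration of the difference, here is a toy Python sketch (not the kernel's actual algorithm; the thread and slice counts are made up) of a scheduler that hands out time slices per thread rather than per process:

from itertools import cycle

# 11 CPU-bound threads: process A has 10, process B has 1.
threads = [("A", i) for i in range(10)] + [("B", 0)]
cpu_time = {"A": 0, "B": 0}

slices = cycle(threads)          # round-robin over threads, not processes
for _ in range(1100):            # hand out 1100 equal time slices
    process, _tid = next(slices)
    cpu_time[process] += 1

print(cpu_time)                  # {'A': 1000, 'B': 100} -> a 10:1 ratio

Under a per-process policy the same loop would cycle over the two processes instead, and A and B would each end up with 550 slices.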
Consider an operating system with a non-preemptive SJF scheduler. If it is given a workload of, say, 10 processes, and each process performs a CPU burst ranging from 10ms to 20ms followed by a 500ms I/O burst, will any of the processes experience starvation?
Working through this, I know that the shortest process is scheduled first and that whichever process is running will run to completion. But I don't understand how to determine, from this information alone, whether any process will be postponed indefinitely because a resource is never allocated to it. How can I tell, given the workload and the type of scheduler?
Consider an operating system with a non-preemptive SJF scheduler. If it is given a workload of, say, 10 processes, and each process performs a CPU burst ranging from 10ms to 20ms followed by a 500ms I/O burst, will any of the processes experience starvation?
If you define "starvation" as "perpetually not getting any CPU time", then with a "shortest job first" algorithm:
a) longer jobs will starve when shorter jobs are created faster than they complete (regardless of how many CPUs there are, the longer jobs literally never get a turn because new shorter jobs keep arriving too often).
b1) if the number of tasks that take an infinite amount of time exceeds the number of CPUs and none of those tasks block (e.g. to wait for I/O), then one or more processes will be starved of CPU time (unless you augment SJF with some form of time sharing to avoid starvation among "always equal length" jobs).
b2) if the number of tasks that take an infinite amount of time exceeds the number of CPUs and some of those tasks do block (e.g. to wait for I/O), then whether starvation happens depends on the sum of the time each process spends not blocked.
If an SJF scheduler is given a workload of 10 processes, none of them are "infinite length", and no additional new processes are ever created; then all 10 tasks must complete sooner or later and none of them will wait perpetually for a CPU.
Of course this doesn't mean some tasks won't have to wait (temporarily, briefly) for a CPU.
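Here is a minimal sketch of that reasoning in Python (the burst lengths are made-up values in the 10-20ms range; the 500ms I/O burst doesn't affect whether a process starves waiting for the CPU):

bursts = [17, 10, 20, 12, 15, 11, 19, 14, 13, 16]   # ms of CPU per process

clock = 0
for pid, burst in sorted(enumerate(bursts), key=lambda p: p[1]):
    waited = clock                  # time spent waiting in the ready queue
    clock += burst                  # non-preemptive: runs to completion
    print(f"P{pid}: waited {waited:3}ms, finished at {clock:3}ms")

The longest job waits for the sum of all shorter bursts, which is finite, so with no new arrivals every process completes.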
Note: in real systems there are typically lots of infinite-length tasks that do block (e.g. for both Windows and Linux, there are often over 100 processes running as services/daemons and for the GUI); nobody knows how long any task will take (and not just because the speed of each CPU keeps changing due to power management - e.g. how long will the web browser you're using run for?); often nobody can know whether a process will take an infinite amount of time (halting problem); and sometimes a process will accidentally loop forever due to a bug. In other words, "shortest job first" is almost always impossible to implement.
What is the relation between a worker and a worker process in Celery? Does it make sense to run multiple workers on a single machine?
Here is the system configuration: 8 cores and 32 GB RAM.
The Celery configuration I tried is below:
celery -A Comments_DB worker --loglevel=INFO --concurrency=8
I want to increase the number of requests processed in a given time frame. Which is a better approach?
a. 2 workers with concurrency set to 8 each (2*8 = 16), or
b. 1 worker with concurrency set to 16 (1*16 = 16)?
Could anyone please clarify?
A worker (parent process) will have one or more worker processes (child processes). That way if any of the children die because of an error or because of a max task limit, the parent can kick off another child process.
One parent process with concurrency of 16 will generally have better performance than two processes with concurrency of 8. This is because there is less process overhead with one process than with two. You might want two processes if you had multiple queues and wanted to make sure that a slower queue wasn't blocking other important queue tasks from processing.
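For comparison, option (a) would look something like this (the -n/--hostname option gives each worker a unique node name; the names here are just examples):

celery -A Comments_DB worker -n worker1@%h --loglevel=INFO --concurrency=8
celery -A Comments_DB worker -n worker2@%h --loglevel=INFO --concurrency=8

and option (b) like this:

celery -A Comments_DB worker -n worker1@%h --loglevel=INFO --concurrency=16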
In an Operating System, threads are typically handled in user mode or kernel mode. What are some of the advantages and disadvantages of each?
User-mode threads are scheduled in user mode by something in the process, and the process itself is the only thing handled by the kernel scheduler.
That means your process gets a certain amount of grunt from the CPU and you have to share it amongst all your user mode threads.
Simple case, you have two processes, one with a single thread and one with a hundred threads.
With a simplistic kernel scheduling policy, the thread in the single-thread process gets 50% of the CPU and each thread in the hundred-thread process gets 0.5%.
With kernel mode threads, the kernel itself manages your threads and schedules them independently. Using the same simplistic scheduler, each thread would get just a touch under 1% of the CPU grunt (101 threads to share the 100% of CPU).
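The arithmetic, as a quick Python sketch (assuming an idealised, perfectly fair scheduler in both cases):

threads_per_process = [1, 100]           # our two example processes
total = sum(threads_per_process)         # 101 threads in all

# Process-level scheduling (user-mode threads): each process gets an
# equal share of the CPU, split among its own threads.
for n in threads_per_process:
    print(f"{n}-thread process: {100 / len(threads_per_process) / n:.2f}% per thread")

# Thread-level scheduling (kernel-mode threads): every thread gets an equal share.
print(f"kernel-scheduled: {100 / total:.2f}% per thread")   # ~0.99%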
In an Operating System, threads are typically handled in user mode or kernel mode.
Typically threads are handled in kernel mode.
What are some of the advantages and disadvantages of each?
In theory, the advantage of handling threads in user mode is that it avoids the cost of switching to/from the kernel when a thread needs to wait for something (which can be relatively expensive, as it involves privilege level switches). In practice this "advantage" often doesn't materialize, because the thread has to switch to the kernel anyway to ask the kernel to do whatever it would be waiting for (e.g. switching to the kernel to ask it to read data from a file, then returning to user-space to block/wait instead of simply blocking/waiting in the kernel while already there). Mostly, it only helps when the kernel isn't involved at all, which only really happens when user-space threads communicate with, or share locks with, other threads in the same process.
The advantage of handling threads in the kernel is that the kernel can support thread priorities properly. For example, if you have two processes that each have a very high priority thread and a very low priority thread, the kernel can make sure CPU time is given to the high priority threads whenever possible (including pre-empting low priority threads when a high priority thread unblocks), because it knows about all threads. User-space can't do this: one process doesn't know about the threads belonging to a different process, so user threading will get it wrong and ruin performance (one process giving CPU time to its own very low priority thread while a very high priority thread belonging to a different process needs the CPU and doesn't get it).
The other advantage of handling threads in the kernel is that (especially for systems with multiple CPUs) the kernel has access to better information and can make smarter scheduling decisions. This includes balancing the load (from any number of processes) across all CPUs while taking into account "CPU topology" (NUMA, SMT, etc.; possibly including heterogeneous CPUs, e.g. "big.LITTLE" arrangements), and making trade-offs between thread priorities, CPU temperatures and power consumption (e.g. if one of the CPUs is getting too hot, reduce that CPU's clock speed to let it cool down and use it for low priority threads so that the performance of high priority threads isn't affected).
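A toy illustration of the priority point (not a real kernel's scheduler; the priority numbers are made up): a kernel-level scheduler picks the best thread system-wide, while a user-level scheduler inside a process can only choose among the threads it can see:

threads = [
    {"proc": 1, "name": "P1-high", "prio": 10},
    {"proc": 1, "name": "P1-low",  "prio": 1},
    {"proc": 2, "name": "P2-high", "prio": 9},
    {"proc": 2, "name": "P2-low",  "prio": 2},
]

# Kernel-level: one global decision across all processes.
print("kernel picks:", max(threads, key=lambda t: t["prio"])["name"])   # P1-high

# User-level: suppose the kernel happens to give the CPU to process 2.
# Its scheduler only sees its own threads, so P1-high (priority 10) waits.
mine = [t for t in threads if t["proc"] == 2]
print("process 2 picks:", max(mine, key=lambda t: t["prio"])["name"])   # P2-high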
I'm unsure how Round Robin scheduling works with I/O Operations. I've learned that CPU bound processes are favoured by Round Robin scheduling, but what happens if a process finishes its time slice early?
Say we neglect the dispatching overhead itself: if a process gives up its time slice early, will the scheduler just dispatch the next CPU-bound process? Or will the current process start its I/O operation and, since that isn't CPU bound, the scheduler immediately switches to another (CPU-bound) process? And if CPU-bound processes are favoured, will the scheduler run ALL CPU-bound processes to completion and only afterwards schedule the I/O-bound ones?
Please help me understand.
There are two distinct schedulers: the CPU (process/thread ...) scheduler, and the I/O scheduler(s).
CPU schedulers typically employ some hybrid algorithms, because they certainly do regularly encounter both pre-emption and processes which voluntarily give up part of their time-slice. They must service higher-priority work quickly, while not "starving" anyone. (A study of the current Linux scheduler is most interesting. There have been several.)
CPU schedulers identify processes as being either "primarily 'I/O-bound'" or "primarily 'CPU-bound'" at this particular time, knowing that their characteristics can and do change. If your process repeatedly consumes full time slices, it is seen as CPU-bound.
I/O schedulers seek to order and re-order the I/O request queues for maximum efficiency. For instance, to keep the read/write head of a physical disk-drive moving efficiently in a single direction. (The two components of disk-drive delay are "seek time" and "rotational latency," with "seek time" being by-far the worst of the two. Per contra, solid-state drives have very different timing.) I/O-schedulers also have to be aware of the channels (disk interface cards, cabling, etc.) that provide access to each device: they can't simply watch what any one drive is doing. As with the CPU-scheduler, requests must be efficiently handled but never "starved." Linux's I/O-schedulers are also readily available for your study.
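The "single direction" idea is the classic elevator/SCAN approach; a small Python sketch (made-up cylinder numbers, not Linux's actual implementation):

pending = [98, 183, 37, 122, 14, 124, 65, 67]   # requested cylinders
head = 53                                       # current head position

# Sweep upward through everything ahead of the head, then back down.
upward = sorted(r for r in pending if r >= head)
downward = sorted((r for r in pending if r < head), reverse=True)
print(upward + downward)   # [65, 67, 98, 122, 124, 183, 37, 14]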
"Pure round-robin," as a scheduling discipline, simply means that all requests have equal priority and will be serviced sequentially in the order that they were originally submitted. Very pretty birds though they are, you rarely encounter Pure Robins in real life.
Is it possible to set the concurrency (the number of simultaneous workers) on a per-task level in Celery? I'm looking for something more fine-grained than CELERYD_CONCURRENCY (which sets the concurrency for the whole daemon).
The usage scenario is: I have a single celeryd running different types of tasks with very different performance characteristics - some are fast, some very slow. For some I'd like to do as many as I can as quickly as I can; for others I'd like to ensure only one instance is running at any time (i.e. a concurrency of 1).
You can use automatic routing to route tasks to different queues, which are then processed by celery workers with different concurrency levels.
celeryd-multi start fast slow -Q:fast fast -Q:slow slow -c:slow 3 -c:fast 5
This command launches 2 celery workers, one consuming the fast queue with a concurrency of 5 and one consuming the slow queue with a concurrency of 3 (the -Q:name options bind each worker to its queue; without them, both workers would consume the default queue).
CELERY_ROUTES = {"tasks.a": {"queue": "slow"}, "tasks.b": {"queue": "fast"}}
Tasks of type tasks.a will be routed to the slow queue, and tasks.b tasks to the fast queue, respectively.
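For completeness, a hypothetical tasks module matching those routes might look like this (the broker URL and task bodies are placeholders, not from the original question):

from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")

@app.task(name="tasks.a")
def a():
    pass   # slow work: consumed from the "slow" queue (concurrency 3)

@app.task(name="tasks.b")
def b():
    pass   # fast work: consumed from the "fast" queue (concurrency 5)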