Operating System vs Monitor

Without going into details, how is a Monitor different from an OS?
I read that first there was Serial Processing in the early days, then Monitors, and now operating systems.

Monitor in this context means Batch Monitor.
In the 1950s - mid 60s, before we had true operating systems, we had Batch Monitors. You would "program" the job onto punch cards and put them on an input queue that the machine would process one by one.
The programmer would sit in front of a monitor, which would display memory dumps, debugging information, etc - it was an incredibly tedious process.
Of course the major drawback of a Batch Monitor is that the CPU was often idle. Because CPU speeds are so much higher than I/O speeds, the machine would spend the majority of its time reading in the cards (I/O) while the CPU waited.
Nowadays, modern operating systems can run several processes at once and optimize CPU utilization. When a process on the run queue needs to do I/O, the OS puts it on another queue, and the CPU starts processing the next job. When the I/O is done, that process is moved back to the run queue. This way, the CPU is always doing something.
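To make that queue-hopping concrete, here is a minimal sketch in C of the bookkeeping involved (the struct and function names are invented for illustration, not taken from any real kernel): a process descriptor moves from the run queue to a wait queue when it starts I/O, and back again when the I/O completes.

    #include <stdio.h>

    /* Hypothetical process descriptor: just a PID and a link to the next one. */
    struct proc {
        int pid;
        struct proc *next;
    };

    /* Two FIFO queues: runnable processes, and processes waiting on I/O. */
    static struct proc *run_queue = NULL;
    static struct proc *wait_queue = NULL;

    static void enqueue(struct proc **q, struct proc *p) {
        p->next = NULL;
        while (*q)                 /* walk to the tail */
            q = &(*q)->next;
        *q = p;
    }

    static struct proc *dequeue(struct proc **q) {
        struct proc *p = *q;
        if (p)
            *q = p->next;
        return p;
    }

    int main(void) {
        struct proc a = { 1, NULL }, b = { 2, NULL };
        enqueue(&run_queue, &a);
        enqueue(&run_queue, &b);

        struct proc *cur = dequeue(&run_queue);    /* dispatch pid 1 */
        enqueue(&wait_queue, cur);                 /* it issues a read: park it on the wait queue */

        cur = dequeue(&run_queue);                 /* CPU immediately runs pid 2 instead of idling */
        printf("running pid %d while pid 1 waits for I/O\n", cur->pid);

        enqueue(&run_queue, dequeue(&wait_queue)); /* "disk interrupt": pid 1 is runnable again */
        printf("pid %d is back on the run queue\n", run_queue->pid);
        return 0;
    }

A real scheduler also tracks priorities, time slices and per-CPU queues, but this queue-hopping is the core idea behind keeping the CPU busy.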
Edit:
After looking up "batch monitor" and not finding many references to it, it seems to be more commonly referred to as a "batch system". Here's a book for reference; you should be able to find a PDF version online:
Modern Operating Systems.

Related

Throughput vs latency in computer architecture

I've come across articles on "throughput vs latency" in contexts like networking, e.g. https://homepage.cs.uri.edu/~thenry/resources/unix_art/ch12s04.html. But in the context of computer architecture / operating systems, I'm not able to understand why there would be a trade-off between latency (the response time of a program) and throughput (how many programs we're able to complete in a unit of time, say per hour). Is this solely due to the fact that we can choose to parallelize the processing of multiple programs / requests, leading to overheads like context switches and the sharing of caches, which make the start-to-end response time per process worse? Or am I missing something here?
In terms of single instructions in a superscalar pipelined out-of-order exec CPU, throughput vs. latency is very important because the CPU is trying to extract parallelism from an instruction stream that has to be executed as if in serial program order. See Assembly - How to score a CPU instruction by latency and throughput and the bottom of my answer on latency vs throughput in intel intrinsics for example.
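To get a concrete feel for that distinction, here is a hedged C sketch: the first loop is latency-bound because every addition depends on the previous result, while the second keeps four independent chains in flight so a superscalar core can overlap them. The exact speedup depends on the CPU, and you need to compile with low enough optimization (e.g. -O1) that the compiler doesn't vectorize the reduction and break the dependency chain for you.

    #include <stdio.h>

    #define N 100000000LL

    int main(void) {
        /* Latency-bound: every addition depends on the previous result, so the
           loop can retire at most one add per add-latency. */
        long long chain = 0;
        for (long long i = 0; i < N; i++)
            chain += i;

        /* Throughput-bound: four independent accumulators, so an out-of-order
           core can keep several additions in flight at once. */
        long long a = 0, b = 0, c = 0, d = 0;
        for (long long i = 0; i < N; i += 4) {
            a += i;
            b += i + 1;
            c += i + 2;
            d += i + 3;
        }

        /* Both loops compute the same sum; only the dependency structure differs. */
        printf("%lld %lld\n", chain, a + b + c + d);
        return 0;
    }

Timing each loop (e.g. with clock_gettime) on a typical x86 machine usually shows the second one finishing noticeably faster even though it performs the same number of additions.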
In terms of OS decisions that affect throughput vs. latency on a much longer timescale than a few clock cycles, that's a totally separate question.
One of the major factors there is choosing how to use the available physical RAM, and whether to page out (to a swap file) infrequently used code / data to make more room to cache disk files. (e.g. Linux's vm.swappiness is widely considered a key tunable in terms of setting it differently between servers and desktops. https://unix.stackexchange.com/questions/88693/why-is-swappiness-set-to-60-by-default).
If you alt-tab to a window of a process whose pages have largely been paged out, it will take some time before the process can redraw its window. (Multiple hard page faults, which can be quite slow, especially when paging from a rotational disk rather than an SSD.) So to optimize for latency, you want the kernel not to aggressively swap out pages from running processes, even if they've been idle for a few hours. Those pages, had they been freed, could have improved throughput for other processes by acting as buffers / cache.
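vm.swappiness is just a file under /proc, so besides sysctl you can inspect it programmatically; a minimal sketch (reading only - writing a new value requires root):

    #include <stdio.h>

    int main(void) {
        /* The vm.swappiness sysctl is exposed as a plain text file on Linux. */
        FILE *f = fopen("/proc/sys/vm/swappiness", "r");
        if (!f) {
            perror("fopen");
            return 1;
        }

        int swappiness;
        if (fscanf(f, "%d", &swappiness) == 1)
            printf("vm.swappiness = %d\n", swappiness);
        fclose(f);
        return 0;
    }

Lowering it (for example vm.swappiness=10 in /etc/sysctl.conf) biases the kernel toward keeping process pages resident, i.e. toward interactive latency; a higher value biases toward using RAM as file cache, i.e. toward throughput.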
A related factor is I/O scheduling: trying to group IO requests together to minimize HD seek times (for higher throughput and lower average latency), but sometimes at the expense of delaying a few requests for a longer time (higher worst-case latency). Linux for example has many to choose from, including deadline, Completely Fair Queuing (CFQ), and the original elevator (just grouping requests by locality without consideration of fairness or latency). https://wiki.archlinux.org/title/improving_performance#Input/output_schedulers
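To make the elevator idea concrete, here is a toy C sketch (not how any real kernel implements it): pending requests are sorted by block number and serviced in one sweep, which shortens the total seek distance but can make a request at the far end of the disk wait longer than it would under plain FIFO.

    #include <stdio.h>
    #include <stdlib.h>

    /* Compare two block numbers for qsort. */
    static int by_block(const void *a, const void *b) {
        long x = *(const long *)a, y = *(const long *)b;
        return (x > y) - (x < y);
    }

    int main(void) {
        /* Pending requests identified by block number; servicing them in
           arrival order would bounce the disk head back and forth. */
        long pending[] = { 98, 183, 37, 122, 14, 124, 65, 67 };
        size_t n = sizeof pending / sizeof pending[0];

        /* Elevator/SCAN-style: sort by position and service in one sweep.
           Total seek distance (throughput) improves, but the request at
           block 183 now waits longest even if it arrived first. */
        qsort(pending, n, sizeof pending[0], by_block);

        for (size_t i = 0; i < n; i++)
            printf("service block %ld\n", pending[i]);
        return 0;
    }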
CPU scheduling is also a factor: a context-switch hurts throughput, as it takes time itself and caches will likely be cold for the new task on this CPU. You also have to run the kernel's schedule() function to decide which task to run next, so that takes away some time from real work.
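One way to get a feel for that per-switch cost is the classic pipe ping-pong between two processes, which forces at least two context switches per round trip; a rough sketch (the measured time also includes pipe and syscall overhead, so treat it as an upper bound):

    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    #define ROUNDS 100000

    int main(void) {
        int p2c[2], c2p[2];                   /* parent->child and child->parent pipes */
        char byte = 'x';
        if (pipe(p2c) != 0 || pipe(c2p) != 0)
            return 1;

        if (fork() == 0) {                    /* child: echo each byte straight back */
            for (int i = 0; i < ROUNDS; i++) {
                read(p2c[0], &byte, 1);
                write(c2p[1], &byte, 1);
            }
            _exit(0);
        }

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < ROUNDS; i++) {    /* each round forces 2+ context switches */
            write(p2c[1], &byte, 1);
            read(c2p[0], &byte, 1);
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        printf("~%.0f ns per round trip (includes context switches)\n", ns / ROUNDS);
        return 0;
    }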
To minimize latency (for example between a socket message being sent to a process and it waking up when its poll or select system call returns), you want a short timeslice, like Linux HZ=1000. (Timer interrupts every 1 ms to run the scheduler). And you want to be able to pre-empt even the kernel itself, instead of waiting until the kernel is ready to return to the old user-space to consider the possibility of running a different user-space task.
But neither of these helps throughput, and in fact they hurt it (assuming the workload has enough parallelism not to bottleneck on latency). So HZ=100 was the default for "server" Linux builds, vs. 1000 on "desktop" builds tuned for interactive use. (Modern Linux can be "tickless", not using a fixed timer interrupt on every core at all, instead deciding when to schedule the next interrupt on a case-by-case basis.)
Real-time kernels take this even further, spending more time on finer-grained locking and stuff like that to enable pausing work and coming back to it later to minimize interrupt latency and other latencies between it being time to do something and actually starting to do that thing. (There are real-time patches for Linux, and there are also totally separate kernels built from the ground up for real-time operation.)
If you have an embedded system controlling a motor or something, you absolutely need hard real-time latency guarantees that it will never take longer than say 1 millisecond from an interrupt pin being asserted to the interrupt handler starting to run.
(Designing the system to make these guarantees possible often comes at the cost of throughput. e.g. obviously you have to pin some memory to make it not swappable, if we're talking about user-space, making it unavailable for cache even if it goes untouched for days.)
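On Linux, a user-space program that needs such guarantees typically locks its memory and requests a real-time scheduling class up front; a minimal sketch (needs root or suitable memlock/rtprio limits), which is exactly the throughput-for-latency trade mentioned above:

    #include <sched.h>
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void) {
        /* Lock all current and future pages into RAM so this process can never
           take a hard page fault - trading throughput (that RAM is no longer
           available for page cache) for predictable latency. */
        if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
            perror("mlockall");
            return 1;
        }

        /* Request a real-time FIFO class so this thread pre-empts all normal
           (SCHED_OTHER) tasks whenever it becomes runnable. */
        struct sched_param sp = { .sched_priority = 50 };
        if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0) {
            perror("sched_setscheduler");
            return 1;
        }

        puts("memory locked, running under SCHED_FIFO at priority 50");
        /* ... the time-critical control loop would go here ... */
        return 0;
    }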

Can Multiprocessor CPUs avoid context-switching?

Today's computer architectures try to maximize the number of registers. It is faster to access a register (an integrated memory circuit near the CPU) than to access the first-level cache. The problem is that each context switch has to save all the registers into the cache, because the next thread needs different register values. What a modern CPU does is cycle through, say, 100 tasks per second, and every time it saves the registers and fetches the old ones before the task can be resumed.
IMHO it would be nice to use one CPU per task, so that no context switching happens. That would mean 100 CPUs with 1000 registers each that never have to be saved. Is that possible, or have I ignored an important detail?
The only way to completely avoid context switching is to have at least as many cores as there are tasks. Generally, there is no guarantee regarding the maximum number of tasks that may run. Current GPUs, manycore processors, and co-processors contain hundreds of small cores. If you put several of these in the same system, or in a cluster of systems, you can have thousands of cores or more. Still, even if you could avoid context switching with such a design, these cores are much slower than traditional high-end CPU cores, so the net effect might be negative.
But let's take a step back here. The number of context switches is not determined solely by the number of tasks and cores. Tasks don't just perform computations; they also need to interact with I/O devices and wait for things to happen, such as results from other tasks or user input, so at any moment some tasks are in a wait state. The overhead of context switching therefore depends not only on the number of tasks but also on how those tasks behave.
Both processor architects and OS developers are aware of context-switching overhead and employ a variety of techniques to alleviate it. For example, x86 provides a number of instructions tuned to (partially) saving the context of the current task. The OS thread scheduler uses techniques such as priorities, preemption (with possibly large time slices on servers), and priority boosting. All of these help reduce the number of context switches and therefore their overall overhead. In addition, reducing the overhead of context switching is not the only thing that matters; the responsiveness of the system is very important as well, and that is at odds with minimizing the overhead.
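Closer to the "one CPU per task" idea in the question: on Linux you can at least pin a thread to a dedicated core so the scheduler never migrates it, and with boot options such as isolcpus or nohz_full that core can be kept almost free of other work. A hedged sketch (assumes the machine has at least four CPUs):

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void) {
        /* Restrict the calling thread to CPU 3 only. The kernel will never
           migrate it, so its cache (and registers, while it runs) are only
           disturbed when something else is scheduled on that same core. */
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(3, &set);

        if (sched_setaffinity(0, sizeof(set), &set) != 0) {
            perror("sched_setaffinity");
            return 1;
        }

        printf("now pinned to CPU %d\n", sched_getcpu());
        /* ... the task's work loop would run here ... */
        return 0;
    }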

System startup of multicore computer

I would really like to know how a multicore CPU starts when the computer starts up. I imagine there is something like a "dominant core" that loads the BIOS and later on the kernel into RAM, and then wakes up the rest of the cores, leaving them waiting for code to run (like an infinite while loop?). But that's only how I guess it works.
The other question is: after the kernel is loaded into memory, all cores can do system calls, right? And how does one core control the tasks of the other cores? Which instructions are used (in x86 / x86-64)?
Yes, there is a boot CPU. The firmware handles that. It's usually CPU 0, but what if that one is missing or defective? Then it gets trickier.
On x86 platforms there are the ACPI tables, which describe the CPU and memory layouts. The operating system starts the other CPUs with IPIs (inter-processor interrupts), which kick them out of idle into the interrupt handlers (which were set up in memory) and then into operating system functions, which then choose threads to run and start doing useful things.
If you really want to know how it all works read the source code for Linux or one of the BSDs.
Update: Looks like I was wrong about IPI. It is using interrupts but not the normal IPI ones. The Linux SMP boot is here: https://github.com/torvalds/linux/blob/master/arch/x86/kernel/smpboot.c
It seems to use NMI or to reset the CPU.
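For reference, the classic way to start an x86 application processor (described in Intel's MP/SDM documentation) is the INIT-SIPI-SIPI sequence: the boot CPU writes to the local APIC's Interrupt Command Register to reset the target processor and then hand it a page-aligned real-mode start address. The sketch below is heavily simplified (no delays, no INIT de-assert, xAPIC default base address assumed) and is firmware/kernel-level code, not something you can run from user space:

    #include <stdint.h>

    /* xAPIC registers, memory-mapped at the usual default base address. */
    #define LAPIC_BASE   0xFEE00000u
    #define LAPIC_ICR_LO (LAPIC_BASE + 0x300)   /* delivery mode / command */
    #define LAPIC_ICR_HI (LAPIC_BASE + 0x310)   /* destination APIC ID */

    static void lapic_write(uint32_t reg, uint32_t val) {
        *(volatile uint32_t *)(uintptr_t)reg = val;
    }

    /* Wake the application processor with the given APIC ID and make it start
       executing the real-mode trampoline at `trampoline` (4 KiB aligned, below
       1 MiB). The real sequence waits ~10 ms after INIT and ~200 us between
       the two STARTUP IPIs; those delays are omitted here. */
    void start_ap(uint8_t apic_id, uint32_t trampoline) {
        uint32_t vector = trampoline >> 12;             /* page number becomes the SIPI vector */

        lapic_write(LAPIC_ICR_HI, (uint32_t)apic_id << 24);
        lapic_write(LAPIC_ICR_LO, 0x00004500);          /* INIT IPI, assert */

        lapic_write(LAPIC_ICR_HI, (uint32_t)apic_id << 24);
        lapic_write(LAPIC_ICR_LO, 0x00004600 | vector); /* STARTUP IPI (SIPI) */

        lapic_write(LAPIC_ICR_HI, (uint32_t)apic_id << 24);
        lapic_write(LAPIC_ICR_LO, 0x00004600 | vector); /* the spec says to send it twice */
    }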

What are some of the advantages and disadvantages of user mode and kernel mode

In an Operating System, threads are typically handled in user mode or kernel mode. What are some of the advantages and disadvantages of each?
User-mode threads are scheduled in user mode by something in the process, and the process itself is the only thing handled by the kernel scheduler.
That means your process gets a certain amount of grunt from the CPU and you have to share it amongst all your user mode threads.
Simple case, you have two processes, one with a single thread and one with a hundred threads.
With a simplistic kernel scheduling policy, the thread in the single-thread process gets 50% of the CPU and each thread in the hundred-thread process gets 0.5%.
With kernel mode threads, the kernel itself manages your threads and schedules them independently. Using the same simplistic scheduler, each thread would get just a touch under 1% of the CPU grunt (101 threads sharing 100% of the CPU).
In an Operating System, threads are typically handled in user mode or kernel mode.
Typically threads are handled in kernel mode.
What are some of the advantages and disadvantages of each?
In theory, the advantage of handling threads in user mode is that it avoids the cost of switching to/from the kernel when a thread needs to wait for something (which can be relatively expensive, as it involves privilege level switches). In practice this "advantage" often doesn't materialize, because the thread has to switch to the kernel anyway to ask it to do whatever the thread would be waiting for (e.g. switching to the kernel to ask it to read data from a file, then returning to user space to block/wait, instead of just blocking/waiting in the kernel while you're already there). Mostly, it only helps if the kernel isn't involved at all, which only really happens when user-space threads communicate with, or share locks with, other threads in the same process.
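As an illustration of a user-level thread switch, here is a small sketch using the ucontext API; user-level thread libraries do essentially this plus a scheduler of their own. (Caveat: glibc's swapcontext also saves and restores the signal mask, which itself costs a system call, which is one reason serious green-thread libraries roll their own assembly switch routine.)

    #include <stdio.h>
    #include <ucontext.h>

    static ucontext_t main_ctx, worker_ctx;
    static char worker_stack[64 * 1024];

    static void worker(void) {
        puts("worker: running");
        /* "Yield" back to main: registers and stack pointer are swapped by
           library code, with no kernel scheduling decision involved. */
        swapcontext(&worker_ctx, &main_ctx);
        puts("worker: resumed, finishing");
    }

    int main(void) {
        getcontext(&worker_ctx);
        worker_ctx.uc_stack.ss_sp = worker_stack;
        worker_ctx.uc_stack.ss_size = sizeof worker_stack;
        worker_ctx.uc_link = &main_ctx;          /* where to go when worker() returns */
        makecontext(&worker_ctx, worker, 0);

        puts("main: switching to worker");
        swapcontext(&main_ctx, &worker_ctx);     /* user-level "context switch" */
        puts("main: worker yielded, switching back");
        swapcontext(&main_ctx, &worker_ctx);
        puts("main: done");
        return 0;
    }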
The advantage of handling threads in the kernel is that the kernel can support thread priorities properly. For example, if you have two processes that each have a very high priority thread and a very low priority thread, then the kernel can make sure CPU time is given to the high priority threads whenever possible (including pre-empting low priority threads when a high priority thread unblocks), because it knows about all threads. User space can't do this - one process doesn't know about threads belonging to a different process, so user threading will get it wrong and ruin performance (one process giving CPU time to its own very low priority thread while a very high priority thread belonging to a different process needs the CPU and doesn't get it).
The other advantage of handling threads in the kernel is that (especially for systems with multiple CPUs) the kernel has access to better information and can make smarter scheduling decisions. This includes balancing the load (from any number of processes) across all CPUs while taking into account "CPU topology" (NUMA, SMT, etc.; possibly including heterogeneous CPUs, e.g. "big.LITTLE" arrangements), and making trade-offs between thread priorities, CPU temperatures, and power consumption (e.g. if one of the CPUs is getting too hot, reduce that CPU's clock speed to let it cool down and use it for low priority threads so that the performance of high priority threads isn't affected).
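Because kernel-scheduled threads are visible to the kernel, their priorities can be set through the standard pthreads API and are weighed against every other thread in the system, not just those of the same process; a minimal sketch (the real-time SCHED_FIFO policy usually requires root or an rtprio rlimit):

    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>

    static void *work(void *arg) {
        printf("%s thread running\n", (const char *)arg);
        return NULL;
    }

    int main(void) {
        pthread_t hi, lo;
        pthread_create(&hi, NULL, work, "high-priority");
        pthread_create(&lo, NULL, work, "low-priority");

        /* These are kernel-scheduled threads, so the kernel compares these
           priorities against every other runnable thread on the system. */
        struct sched_param hip = { .sched_priority = 80 };
        struct sched_param lop = { .sched_priority = 10 };
        if (pthread_setschedparam(hi, SCHED_FIFO, &hip) != 0 ||
            pthread_setschedparam(lo, SCHED_FIFO, &lop) != 0)
            fprintf(stderr, "setting SCHED_FIFO failed (needs root/rtprio)\n");

        pthread_join(hi, NULL);
        pthread_join(lo, NULL);
        return 0;
    }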

Medium term scheduler

I have read about the medium-term scheduler in Galvin's operating systems book.
It was written that:
Sometimes, it is advantageous to swap out the process when it is not executing[waiting for I/O or waiting for CPU] in order to decrease the degree of multiprogramming.
Also, we get more amount of physical memory which makes the execution of other process faster by decreasing the number of page faults[as we have more memory].
So, it's the job of the medium-term scheduler to swap out and swap in partially executed processes.
But my question is: is the work of the medium-term scheduler really important in scenarios where we have plenty of available physical/main memory?
The purpose of the medium-term scheduler is to improve multiprogramming by allowing multiple processes to reside in main memory: it swaps out processes that are waiting (need I/O) or are low priority, and swaps in other processes that were in the ready queue.
So you can see that we require the medium-term scheduler when we have limited memory. This swapping in and out does not take place when we are running a single small program and have a large memory.
Similarly, if we are running multiple programs and we have very large memory (larger than the size of all processes plus additional space for other requirements), then the medium-term scheduler is not needed. Modern operating systems use paging, so instead of swapping whole processes they swap pages in and out of memory. In the same way, a system with very large (effectively infinite) memory would not suffer from page faults.
Medium-term scheduling is part of swapping. It removes processes from memory and thereby reduces the degree of multiprogramming. The medium-term scheduler is in charge of handling the swapped-out processes.
From TutorialsPoint, Simply Easy Learning, page 28:
A running process may become suspended if it makes an I/O request. A suspended process cannot make any progress towards completion. In this condition, to remove the process from memory and make space for other processes, the suspended process is moved to secondary storage. This is called swapping, and the process is said to be swapped out or rolled out. Swapping may be necessary to improve the process mix.