CPU scheduling policy (with multiple threads) - operating-system

In general operating system reference books like Operating System Concepts...
When they explain CPU scheduling (FCFS, RR, ...),
it sounds to me like they assume a single CPU / single thread by default.
So I wonder whether the same applies to a single CPU with multiple threads.

A thread is the smallest unit of CPU scheduling, so I think the same policies also apply to a single CPU running multiple threads.

A single CPU (or core, to be exact) can run only one thread at a time. The OS gives the impression of multitasking by constantly switching which thread is run.
If your question is about difference between single-core CPUs and multi-core CPUs, multi-core CPUs are handled in the same way as multiple single-core CPUs.
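To make the "constant switching" concrete, here is a minimal sketch (my own illustration, not how a real kernel is written) of round-robin time slicing on a single CPU. Each runnable unit is just "remaining work"; the same loop applies whether those units are processes or threads, which is why the textbook policies carry over:

    from collections import deque

    def round_robin(threads, quantum):
        """Simulate RR on one CPU: threads maps a name to remaining work units."""
        ready = deque(threads.items())        # ready queue of (name, remaining)
        timeline = []                         # which thread held the CPU in each slice
        while ready:
            name, remaining = ready.popleft() # dispatch the thread at the head
            ran = min(quantum, remaining)
            timeline.append((name, ran))
            remaining -= ran
            if remaining > 0:                 # quantum expired: preempt and requeue at the tail
                ready.append((name, remaining))
        return timeline

    print(round_robin({"A": 5, "B": 3, "C": 2}, quantum=2))
    # [('A', 2), ('B', 2), ('C', 2), ('A', 2), ('B', 1), ('A', 1)]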

Related

Why is scheduling threads between CPU cores expensive?

Some articles refer to so-called core affinity: a technique that binds a thread to a core, which supposedly decreases the cost of scheduling threads between cores. My question is the reverse of that:
why does the operating system take more time when it schedules threads between cores?
You're probably misinterpreting something you read. It's not the actual scheduling that's slow, it's that a task will run slower when it moves to a new core because private per-core caches will be cold on that new core.
(And worse than that, dirty on the old core requiring write-back before they can be read.)
In most OSes, it's not so much that a task is "scheduled to a core", as that the kernel running on each core grabs the highest-priority task that's currently runnable, subject to restrictions from the affinity mask. (The scheduler function on this core will only consider tasks whose affinity mask matches this core.)
There is no single-threaded master-control program that decides what each core should be doing; the scheduler in normal kernels is a cooperative multi-threaded algorithm.
It's mostly not the actual cost of CPU time in the kernel's scheduler function, it's that the task runs slower on a new core.
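On Linux you can inspect and change a task's affinity mask from user space, which is the "core affinity" technique the question mentions. A small sketch (Python, Linux-only system calls, assuming they are available on your system) that pins the current process to one core so its private caches stay warm there:

    import os

    # The set of CPUs this process is currently allowed to run on.
    allowed = os.sched_getaffinity(0)          # 0 means "the calling process"
    print("allowed CPUs:", allowed)

    # Restrict the process to a single core; the scheduler running on other
    # cores will now skip this task when it looks for runnable work.
    one_core = {min(allowed)}
    os.sched_setaffinity(0, one_core)
    print("now pinned to:", os.sched_getaffinity(0))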

Can Multiprocessor CPUs avoid context-switching?

Today's computer architectures try to maximize the number of registers, because it is faster to access a register (an integrated memory circuit near the CPU) than to access the first-level cache. The problem is that each context switch has to save all the registers, because the next thread needs different register values. So a modern CPU cycles through, say, 100 tasks per second, and each time it saves the current registers and restores the old ones before the next task can run.
IMHO it would be nice to use one CPU per task, so that no context switching happens at all. That would mean 100 CPUs, each with 1000 registers, which would never have to be saved. Is that possible, or have I ignored an important detail?
The only way to completely avoid context switching is to have at least as many cores as there are tasks. Generally, there is no guarantee regarding the maximum number of tasks that may run. Current GPUs, manycore processors, and co-processors contain hundreds of small cores. If you put multiple of these in the same system or in a cluster of systems, you can have thousands of cores or more. Still, even if you could avoid context switching with such a design, these cores are much slower than traditional high-end CPU cores, so the net effect might be negative.
But let's take a step back here. The number of context switches is not primarily determined by the number of tasks and cores. Tasks don't just perform computations; they also need to interact with I/O devices and wait for things to happen, such as results from other tasks or user input, so some tasks will be in a wait state at any given time. The overhead of context switching therefore depends not only on the number of tasks and cores but also on the behavior of those tasks.
Both processor architects and OS developers are aware of context-switching overhead and employ a variety of techniques to alleviate it. For example, x86 provides a number of instructions that are tuned to (partially) saving the context of the current task. The OS thread scheduler uses techniques such as priorities, preemption (with possibly large time slices on servers), and priority boosting. All of these help reduce the number of context switches and therefore their overall overhead. In addition, reducing the overhead of context switching is not the only thing that matters; in particular, the responsiveness of the system is very important as well, and it pulls in the opposite direction, since better responsiveness generally means more frequent switching.
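A crude way to feel this overhead from user space (a rough sketch, not a precise benchmark; absolute numbers depend heavily on the OS, the hardware, and in CPython also on the GIL) is to force two threads to hand a token back and forth, so that every hand-off implies at least one switch:

    import threading, queue, time

    N = 10_000
    a_to_b, b_to_a = queue.Queue(), queue.Queue()

    def partner():
        for _ in range(N):
            a_to_b.get()          # block until the other thread hands us the token
            b_to_a.put(None)      # hand it back, forcing another switch

    t = threading.Thread(target=partner)
    t.start()

    start = time.perf_counter()
    for _ in range(N):
        a_to_b.put(None)
        b_to_a.get()
    t.join()
    elapsed = time.perf_counter() - start
    print(f"{N} round trips in {elapsed:.3f}s -> ~{elapsed / (2 * N) * 1e6:.1f} us per hand-off")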

What are some of the advantages and disadvantages of user mode and kernel mode

In an Operating System, threads are typically handled in user mode or kernel mode. What are some of the advantages and disadvantages of each?
User-mode threads are scheduled in user mode by something in the process, and the process itself is the only thing handled by the kernel scheduler.
That means your process gets a certain amount of grunt from the CPU and you have to share it amongst all your user mode threads.
Simple case, you have two processes, one with a single thread and one with a hundred threads.
With a simplistic kernel scheduling policy, the thread in the single-thread process gets 50% of the CPU and each thread in the hundred-thread process gets 0.5% each.
With kernel mode threads, the kernel itself manages your threads and schedules them independently. Using the same simplistic scheduler, each thread would get just a touch under 1% of the CPU grunt (101 threads to share the 100% of CPU).
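The arithmetic behind those numbers, spelled out (my own restatement of the answer's example):

    # Two processes: one with 1 thread, one with 100 threads.
    threads_per_process = [1, 100]

    # User-mode threads: the kernel shares the CPU between the 2 *processes*,
    # and each process divides its share among its own threads.
    per_process = 100 / len(threads_per_process)                  # 50% each
    user_mode = [per_process / n for n in threads_per_process]    # [50.0, 0.5]

    # Kernel-mode threads: the kernel shares the CPU between all 101 *threads*.
    total_threads = sum(threads_per_process)
    kernel_mode = 100 / total_threads                             # ~0.99% per thread

    print(user_mode, kernel_mode)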
In an Operating System, threads are typically handled in user mode or kernel mode.
Typically threads are handled in kernel mode.
What are some of the advantages and disadvantages of each?
In theory, the advantage of handling threads in user mode is that it avoids the cost of switching to/from the kernel when a thread needs to wait for something (which can be relatively expensive as it involves privilege level switches). In practice this "advantage" often doesn't materialize, because the thread has to switch to the kernel anyway to ask the kernel to do whatever the thread would wait for (e.g. switching to the kernel to ask it to read data from a file and then returning to user space to block/wait, instead of blocking/waiting in the kernel while you're already there). Mostly, it only helps if the kernel isn't involved at all, which only really happens when user-space threads communicate with, or share locks with, other threads in the same process.
The advantage of handling threads in kernel is that the kernel can support thread priorities properly. For example, if you have two processes that both have a very high priority thread and a very low priority thread; then kernel can make sure CPU time is given to the high priority thread/s when possible (including pre-empting low priority threads when a high priority thread unblocks) because it knows about all threads; but user-space can't do this - one process doesn't know about threads belonging to a different process, so user threading will get it wrong and ruin performance (one process giving CPU time to its own very low priority thread while a very high priority thread belonging to a different process needs the CPU and doesn't get it).
The other advantage of handling threads in the kernel is that (especially for systems with multiple CPUs) the kernel has access to better information and can make smarter scheduling decisions. This includes balancing the load (from any number of processes) across all CPUs while taking into account "CPU topology" (NUMA, SMT, etc.; possibly including heterogeneous CPUs, e.g. "big.LITTLE" arrangements); and making trade-offs between thread priorities, CPU temperatures and power consumption (e.g. if one of the CPUs is getting too hot, reduce that CPU's clock speed to let it cool down and use it for low priority threads so that the performance of high priority threads isn't affected).
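To make the user-mode side of the distinction concrete, here is a toy cooperative user-space "thread" scheduler built on Python generators (purely illustrative; real user-level thread libraries save registers and switch stacks instead). Note that it can only switch when a thread voluntarily yields, and it knows nothing about threads in other processes, which is exactly the priority problem described above:

    from collections import deque

    def worker(name, steps):
        for i in range(steps):
            print(f"{name}: step {i}")
            yield                      # voluntarily give up the "CPU" (no preemption possible)

    def run(threads):
        ready = deque(threads)         # a user-space ready queue, invisible to the kernel
        while ready:
            thread = ready.popleft()
            try:
                next(thread)           # run until the thread yields
                ready.append(thread)   # still runnable: back to the tail of the queue
            except StopIteration:
                pass                   # thread finished

    run([worker("A", 3), worker("B", 2)])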

What is a "Logical CPU Core"

I am reading some Operating Systems materials. I read this phrase that confused me a little:
"Multicore refers to a computer or processor that has more than one logical CPU core, and that can execute multiple instructions at the same time."
What is a "logical CPU core", is it a processor? Does it correspond to something physical, or is it the OS which sees logical CPU cores but in reality there is less physical processors than logical CPU cores?
A logical CPU core contains the complete architectural context of a uniprocessor. This is the unit for which the OS can do scheduling and control architectural state such as the address for exceptions (for an architecture that does not hardwire such).
There are two common cases where it will not correspond one-to-one with a physical core. First, a single physical core can implement multiple virtual processors, e.g., Intel's Hyper-Threading. In this case the OS scheduler should be aware that virtual processors may share various resources such as instruction fetch, instruction scheduling hardware, and execution units, which generally means that tasks should be scheduled to distinct physical cores to maximize performance. (This issue also applies to a lesser extent to distinct cores that share L2 cache. Such concerns are somewhat related to NUMA optimizations for multi-CPU computers.)
In the second case, a hypervisor's virtualization of the hardware can present an arbitrary number of cores to the OS. While a hypervisor would typically make visible to a guest OS no more logical processors than provided by hardware (i.e., including virtual processors associated with hardware multithreading), theoretically the hypervisor could present an arbitrary number of processors to the OS (just as an OS can present the impression of an arbitrary number of processors to the application layer by using time slicing). In such a software virtualization context, the hypervisor may not expose to the OS the nature of the processors, so the OS could only treat them as abstract units for scheduling.
Somewhat complicating this division, it is also possible for hardware to implement multithreading without providing a full virtual processor for each thread. E.g., the MIPS Multithreading Application Specific Extension makes a distinction between Virtual Processing Elements (which behave as distinct processors in terms of architectural state) and Thread Contexts (which share the system coprocessor among threads in the same VPE). As a further complication, it may be possible for Thread Contexts to be migrated among VPEs. E.g., a physical processor core might have two VPEs and five Thread Contexts and the OS might be allowed to assign a given TC to either VPE such that either VPE could have between one and four TCs. In addition, unprivileged software can FORK and YIELD threads without OS involvement if spare hardware threads are available (in the case of FORK) or at least one thread will still be active (in the case of YIELD).
For MIPS MT-ASE, the OS would generally only be concerned with Thread Contexts, but some optimizations are possible with a more complete knowledge of the actual hardware configuration and some correctness issues are possible if a Thread Context is treated as a Virtual Processing Element.
It might be helpful to have some background knowledge:
Processor
A processor could describe either a single execution core or a single physical multi-core chip; the context of use defines which meaning is intended. For example, a normal PC typically has only one processor.
Chips
A chip refers to a physical integrated circuit (IC) in a computer. The term usually refers to an execution unit, which may be built with single- or multi-core technology.
Sockets
The socket refers to a physical connector on a computer motherboard that accepts a single physical chip. Many motherboards can have multiple sockets that can, in turn, accept multi-core chips. 
Cores
Since the advent of multi-core technology (dual-core, quad-core, and so on), the term "core" describes the execution units inside a chip. Essentially, a core comprises a logical execution unit containing an L1 cache and functional units. Cores are able to independently execute programs or threads. Supercomputers are listed as having thousands of cores.
Hyper-threading
Hyper-threading is an Intel technology that originally preceded multi-core systems and makes a single physical core appear logically as multiple cores on the same chip. It improves utilization by letting the operating system schedule more than one thread on a core at a time, so the core's execution resources are shared between those threads whenever possible. For more, see Intel Hyper-Threading Technology.
Physical/Logical Cores
(figure: sockets and cores)
As shown in the picture, you have 2 sockets, each socket has 4 cores, and each core can execute 4 threads concurrently (due to hyper-threading). In this case, if you run the command lscpu on Linux, you may see that you have 32 CPUs. Actually, you have 2 chips (one per socket), 2 sockets, 8 cores, and 32 CPUs (from the Linux perspective).
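On Linux you can recover the physical-vs-logical distinction yourself. The sketch below (assuming a Linux /proc/cpuinfo that exposes the usual "physical id" and "core id" fields, as x86 systems do) counts logical CPUs and the distinct physical cores behind them:

    import os

    logical = os.cpu_count()                       # what the OS schedules on ("CPUs" in lscpu output)

    physical_cores = set()
    cpu = {}
    with open("/proc/cpuinfo") as f:
        for line in f:
            if not line.strip():                   # a blank line ends one logical CPU's block
                if "physical id" in cpu and "core id" in cpu:
                    physical_cores.add((cpu["physical id"], cpu["core id"]))
                cpu = {}
            elif ":" in line:
                key, _, value = line.partition(":")
                cpu[key.strip()] = value.strip()
    if "physical id" in cpu and "core id" in cpu:  # handle a trailing block with no blank line
        physical_cores.add((cpu["physical id"], cpu["core id"]))

    print(f"{logical} logical CPUs backed by {len(physical_cores)} physical cores")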
I guess it refers to the ALU (Arithmetic Logic Unit) of the CPU.
The ALU of any processor is the part of the CPU responsible for performing all the arithmetic and logical operations.

How are the stack pointer and program status word maintained in multiprocessor architecture?

In a multi-processor architecture, how are registers organized?
For example, on a 4-core processor, up to 4 processes can truly run at the same time.
How are stack pointer, program status registers and program counter organized?
What about other general purpose registers?
My guess is, each core will have a separate set of registers.
Imagine 4 completely separate computers, each with a single-core CPU. A 4-core computer is like that; except:
All CPUs share the same physical address space (and can all use the same RAM, PCI devices, etc)
Interrupt/IRQ controllers may be designed so the OS can tell it which CPU/s should be interrupted by the IRQ
CPUs are typically able to signal each other (e.g. "inter-processor interrupts")
Some CPUs may share some caches
Some CPUs may share some control registers (e.g. for things like power management, cache configuration, etc)
For modern CPUs, some CPUs may share some or all execution units (SMT, hyper-threading, etc)
For modern systems (where memory controller is built into the physical chip) some CPUs may share the same memory controller
Most of this is "invisible" to most software. Unless you're writing the part of an OS that controls power management, you don't need to care whether power management is shared between CPUs; unless you're writing an OS's/kernel's low-level IRQ handling, you don't need to care how IRQs reach device drivers, etc.
The same applies to how many CPUs actually exist. The OS/kernel normally ensures that applications only need to care about higher-level abstractions (e.g. "threads"). How this abstraction works depends on the OS: normally (for most OSs) the OS/kernel attempts to provide the illusion that all threads are running at the same time by switching between them "quickly enough" (where, if there are only 4 CPUs, a maximum of 4 threads actually run at the exact same time). In practice it's usually far more complex than this (involving things like thread priorities, pre-emption rules, etc.), and (even though it's relatively rare) it may be very different: for some systems the same thread may be run on multiple CPUs at the same time for fault tolerance/redundancy purposes; for other systems there might just be a queue of functions and their data, where multiple functions run at the same time; etc.
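You can see that higher-level abstraction at work by starting more threads than you have CPUs: every thread still makes progress because the OS time-slices them. A small sketch (in CPython the GIL adds its own serialization on top, but the point about the OS illusion still holds):

    import os, threading, time

    n_threads = (os.cpu_count() or 1) * 4          # deliberately oversubscribe the CPUs
    progress = [0] * n_threads
    stop = time.monotonic() + 1.0                  # run for about one second

    def spin(i):
        while time.monotonic() < stop:
            progress[i] += 1                       # busy work; the OS preempts us as needed

    threads = [threading.Thread(target=spin, args=(i,)) for i in range(n_threads)]
    for t in threads: t.start()
    for t in threads: t.join()

    print(f"{n_threads} threads on {os.cpu_count()} CPUs, all made progress: {all(progress)}")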
Multiprocessor means that there are at least two discrete processors on the same platform -- usually on the same motherboard
A subset is distributed multiprocessing, where two PCs, for example, are programmed to appear as a single system with two processors.
Multicore means that most or all of the CPU is replicated many times on a single chip.
- this also means that the stack pointer, status word, program counter and all general-purpose registers are replicated per core.
Hyperthreading is a technique where the different stages of the pipeline can hold and execute instructions from different threads (and therefore different processes) at the same time.
At the OS level, multiprocessing means that everything a process consists of is switched in and out every now and then.
Multithreading is a lightweight variant of multiprocessing, where the threads share e.g. the same code segment, the same data segment, the same file descriptors etc., but have unique stacks (and of course unique status registers and program counters).
(The term "multiprocessing" is also used for the hardware architecture in general, i.e. multiple processors, as above.)
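A conceptual sketch of that split (the names are my own, not any OS's actual structures): the per-thread part is what a context switch between threads of the same process must save and restore, while the per-process part is shared by all of its threads:

    from dataclasses import dataclass, field

    @dataclass
    class ThreadContext:
        # Private to each thread: saved/restored on every context switch.
        program_counter: int = 0
        stack_pointer: int = 0
        status_word: int = 0
        general_registers: list = field(default_factory=lambda: [0] * 16)

    @dataclass
    class Process:
        # Shared by all threads of the process.
        code_segment: bytes = b""
        data_segment: bytearray = field(default_factory=bytearray)
        open_files: dict = field(default_factory=dict)
        threads: list = field(default_factory=list)

    p = Process()
    p.threads = [ThreadContext(stack_pointer=sp) for sp in (0x7000_0000, 0x7100_0000)]
    print(len(p.threads), "threads sharing one address space")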