What is a "Logical CPU Core" - operating-system

I am reading some Operating Systems materials. I read this phrase that confused me a little:
"Multicore refers to a computer or processor that has more than one logical CPU core, and that can execute multiple instructions at the same time."
What is a "logical CPU core", is it a processor? Does it correspond to something physical, or is it the OS which sees logical CPU cores but in reality there is less physical processors than logical CPU cores?

A logical CPU core contains the complete architectural context of a uniprocessor. This is the unit to which the OS can schedule work and whose architectural state it controls, such as the exception vector address (on architectures that do not hardwire it).
There are two common cases where it will not correspond one-to-one with a physical core. First, a single physical core can implement multiple virtual processors, e.g., Intel's Hyper-Threading. In this case the OS scheduler should be aware that virtual processors may share various resources such as instruction fetch, instruction scheduling hardware, and execution units, which generally means that tasks should be scheduled to distinct physical cores to maximize performance. (This issue also applies to a lesser extent to distinct cores that share L2 cache. Such concerns are somewhat related to NUMA optimizations for multi-CPU computers.)
In the second case, a hypervisor's virtualization of the hardware can present an arbitrary number of cores to the OS. While a hypervisor would typically make visible to a guest OS no more logical processors than provided by hardware (i.e., including virtual processors associated with hardware multithreading), theoretically the hypervisor could present an arbitrary number of processors to the OS (just as an OS can present the impression of an arbitrary number of processors to the application layer by using time slicing). In such a software virtualization context, the hypervisor may not expose to the OS the nature of the processors, so the OS could only treat them as abstract units for scheduling.
Somewhat complicating this division, it is also possible for hardware to implement multithreading without providing a full virtual processor for each thread. E.g., the MIPS Multithreading Application Specific Extension makes a distinction between Virtual Processing Elements (which behave as distinct processors in terms of architectural state) and Thread Contexts (which share the system coprocessor among threads in the same VPE). As a further complication, it may be possible for Thread Contexts to be migrated among VPEs. E.g., a physical processor core might have two VPEs and five Thread Contexts and the OS might be allowed to assign a given TC to either VPE such that either VPE could have between one and four TCs. In addition, unprivileged software can FORK and YIELD threads without OS involvement if spare hardware threads are available (in the case of FORK) or at least one thread will still be active (in the case of YIELD).
For MIPS MT-ASE, the OS would generally only be concerned with Thread Contexts, but some optimizations are possible with a more complete knowledge of the actual hardware configuration and some correctness issues are possible if a Thread Context is treated as a Virtual Processing Element.
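As a concrete illustration of "the logical processors the OS sees", here is a minimal sketch (assuming Linux/glibc; _SC_NPROCESSORS_ONLN is a widespread extension rather than strict POSIX) that asks the OS how many logical CPUs are currently online:

    #include <stdio.h>
    #include <unistd.h>

    /* Sketch (Linux/glibc): count the logical CPUs the OS currently
     * sees online. This counts schedulable logical processors, not
     * physical chips or cores. */
    int main(void)
    {
        long n = sysconf(_SC_NPROCESSORS_ONLN);
        printf("logical CPUs online: %ld\n", n);
        return 0;
    }

On a machine with two hyperthreaded cores this typically prints 4, since each hardware thread counts as one schedulable logical CPU.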

It might be helpful to have some background knowledge:
Processor
A processor can describe either a single execution core or a single physical multi-core chip; the context of use defines the meaning of the term. E.g., a typical desktop PC has only one processor.
Chips
A chip refers to a physical integrated circuit (IC) in a computer. A chip usually denotes an execution unit that can be built with single- or multi-core technology.
Sockets
The socket refers to a physical connector on a computer motherboard that accepts a single physical chip. Many motherboards can have multiple sockets that can, in turn, accept multi-core chips. 
Cores
The term became common with the advent of multi-core technology, such as dual-core and quad-core chips. Essentially, a core comprises a logical execution unit containing an L1 cache and functional units. Cores are able to independently execute programs or threads. Supercomputers are listed as having thousands of cores.
Hyper-threading
Hyper-threading is an Intel technology that originally preceded multi-core systems, and was used to make a single core appear logically as multiple cores on the same chip. Hyper-threading improves performance by sharing a core's execution resources between two hardware threads whenever possible, allowing the operating system to schedule more than one thread at a time. For more, see Intel Hyper-Threading Technology.
Physical/Logical Cores
(figure: sockets and cores)
As shown in the picture, you have 2 sockets, each socket has 4 cores, and each core can execute 4 threads concurrently (due to Hyper-threading). In this case, if you run the command lscpu on Linux, you may see that you have 32 CPUs. In fact, you have 2 chips (one per socket), 8 cores, and 32 logical CPUs (from the Linux perspective).
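If it helps, the arithmetic behind that count is just the product of the topology levels (a toy sketch using the figures from the example above):

    #include <stdio.h>

    /* Toy sketch: logical CPUs = sockets x cores-per-socket x
     * threads-per-core, using the figures from the example above. */
    int main(void)
    {
        int sockets = 2, cores_per_socket = 4, threads_per_core = 4;
        printf("logical CPUs: %d\n",
               sockets * cores_per_socket * threads_per_core);
        return 0;
    }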

I guess it refers to the ALU (Arithmetic Logic Unit) of the CPU.
The ALU of any processor is the part of the CPU responsible for performing all the arithmetic and logical operations.

Related

Multicore CPUs, Different types of CPUs and operating systems

An operating system should support a CPU architecture and not a specific CPU. For example, if some company has three types of CPUs, all based on the x86 architecture,
one a single-core processor, another a dual-core, and the last with five cores, the operating system isn't CPU-model based, it's architecture based. So how would the kernel know whether the CPU it is running on supports multi-core processing, or how many cores it even has?
Also, for example, timer interrupts: some versions of Intel's i386 processor family use the PIT and others use the APIC timer to generate periodic timed interrupts. How does the operating system recognize which one is present if it wants, for example, to configure it? (Specifically regarding timers, I know they are usually set up by the BIOS, but the ISRs for timer interrupts should also recognize the timer mechanism they are running on in order to disable / enable / modify it when handling an interrupt.)
Is there such a thing as a CPU driver that is relevant to the OS and not the BIOS? Also, if someone could refer me to somewhere I could learn more about how multi-core processing is triggered / implemented by the kernel in terms of code, that would be great.
The operating system kernel almost always has an abstraction layer called the HAL, which provides an interface above the hardware that the rest of the kernel can easily use. This HAL is also architecture-dependent, not model-dependent. The CPU architecture has to define some invocation method that allows the HAL to discover which features are and aren't present in the executing processor.
On the IA-32/64 architecture, there is an instruction known as CPUID. You might then ask:
Was CPUID present from the beginning?
No, CPUID wasn't present in the earliest x86 CPUs; it came considerably later, first appearing in late-model Intel486 processors. Bit 21 (the ID flag) in the EFLAGS register indicates support for the CPUID instruction, according to Intel Manual Volume 2A.
PUSHFD
Using the PUSHFD instruction, you can copy the contents of the EFLAGS register onto the stack, attempt to toggle bit 21, and check whether the change sticks; if the bit can be flipped, CPUID is supported.
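A minimal sketch of that test (assuming a 32-bit x86 build with GCC inline assembly, e.g. gcc -m32; on x86-64 CPUs CPUID is always present, so this check only matters on very old processors):

    #include <stdint.h>

    /* Sketch: try to toggle bit 21 (the ID flag) in EFLAGS. If the
     * flip sticks, the CPUID instruction is supported. */
    static int cpuid_supported(void)
    {
        uint32_t before, after;
        __asm__ volatile(
            "pushfl\n\t"             /* save EFLAGS on the stack     */
            "popl %0\n\t"            /* ...and load it into 'before' */
            "movl %0, %1\n\t"
            "xorl $0x200000, %1\n\t" /* flip bit 21 (the ID flag)    */
            "pushl %1\n\t"
            "popfl\n\t"              /* write the modified EFLAGS    */
            "pushfl\n\t"
            "popl %1\n\t"            /* read EFLAGS back             */
            : "=&r"(before), "=&r"(after)
            :
            : "cc");
        return (before ^ after) & 0x200000; /* nonzero if flip held */
    }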
How does CPUID return information, if it is just an instruction?
The CPUID instruction returns processor identification and feature information in the EAX, EBX, ECX, and EDX registers. Its output depends on the values put into the EAX and ECX registers before execution.
Each value (valid for CPUID) that can be put in the EAX register is known as a CPUID leaf. Some leaves have subleaves, i.e., they also depend on a sub-leaf value in the ECX register.
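For example, leaf 0 returns the highest supported leaf in EAX and the 12-byte vendor string in EBX, EDX, ECX. A minimal sketch using the __get_cpuid helper from GCC/Clang's cpuid.h (an assumption about your toolchain; MSVC has a different intrinsic):

    #include <cpuid.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;
        char vendor[13];

        /* Leaf 0: highest supported leaf in EAX; vendor string in
         * EBX, EDX, ECX (in that order). */
        if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
            return 1;

        memcpy(vendor + 0, &ebx, 4);
        memcpy(vendor + 4, &edx, 4);
        memcpy(vendor + 8, &ecx, 4);
        vendor[12] = '\0';

        printf("max leaf: %u, vendor: %s\n", eax, vendor);
        return 0;
    }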
How is multi-core support detected at the OS kernel level?
There is a standard known as ACPI (Advanced Configuration and Power Interface) which defines a set of ACPI tables. These include the MADT (Multiple APIC Description Table). This table contains entries that carry information about local APICs, I/O APICs, interrupt redirections, and much more. Each local APIC is associated with exactly one logical processor.
Using this table, the kernel can get the APIC ID of each local APIC present in the system (only those whose CPUs are working properly). The APIC ID itself is divided into topological IDs (bit fields) whose offsets are obtained using CPUID. This lets the OS know where each CPU is located: its domain, chip, core, and hyperthreading ID.
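For reference, here is a sketch of the MADT entry the kernel looks for, following the layout in the ACPI specification (type 0, "Processor Local APIC"; one enabled entry per logical processor):

    #include <stdint.h>

    /* Sketch of the ACPI MADT "Processor Local APIC" entry (type 0).
     * The kernel walks the MADT and counts the enabled entries of
     * this type to discover the logical processors. */
    struct __attribute__((packed)) madt_local_apic {
        uint8_t  type;              /* 0 = Processor Local APIC          */
        uint8_t  length;            /* 8 bytes for this entry type       */
        uint8_t  acpi_processor_id; /* matches the processor's ACPI ID   */
        uint8_t  apic_id;           /* this logical CPU's local APIC ID  */
        uint32_t flags;             /* bit 0: processor is enabled       */
    };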

CPU Scheduling policy (by multi thread)

In general operating-system reference books like Operating System Concepts...
when they explain CPU scheduling (FCFS, RR, ...),
it sounds like they assume a single CPU running a single thread by default.
So I wonder whether that also applies to a single CPU with multiple threads by default.
A thread is the smallest unit of CPU scheduling, so I think it also applies to single CPU / multi-thread.
A single CPU (or core, to be exact) can run only one thread at a time. The OS gives the impression of multitasking by constantly switching which thread is run.
If your question is about difference between single-core CPUs and multi-core CPUs, multi-core CPUs are handled in the same way as multiple single-core CPUs.
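To make the time-slicing idea concrete, here is a toy round-robin sketch (all numbers are made up for illustration): one CPU, three "threads", each handed a fixed slice in turn.

    #include <stdio.h>

    /* Toy round-robin: a single CPU gives each runnable thread a
     * fixed time slice in turn, creating the illusion that all of
     * them run simultaneously. */
    #define NTHREADS 3
    #define SLICE 2

    int main(void)
    {
        int remaining[NTHREADS] = {5, 3, 7}; /* work units left */
        int done = 0, t = 0;

        while (done < NTHREADS) {
            if (remaining[t] > 0) {
                int run = remaining[t] < SLICE ? remaining[t] : SLICE;
                remaining[t] -= run;
                printf("thread %d runs %d unit(s), %d left\n",
                       t, run, remaining[t]);
                if (remaining[t] == 0)
                    done++;
            }
            t = (t + 1) % NTHREADS; /* context switch to next thread */
        }
        return 0;
    }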

memory access in multi core processors vs multiple cpu's

I have a question,
Is it possible for a multiple-processor machine to access data from RAM (a single-RAM system)?
For example, the machine has 2 processors p1 and p2 executing in parallel; is it possible that they access the same RAM for reads and writes (of course, the writes are not to the same location)?
I understand that in multi-core machines this would not be possible, since the data bus is shared.
As long as the RAM is mapped to all cores or processors (such as in a multi-threaded application) it may be accessed from any core or processor.
There is no difference if you're discussing single processor/single core, single processor/multi-core, multi-processor/(each with a) single-core or multi-processor/multi-core. Since they have no system RAM of their own - the RAM in caches is not system RAM - the only RAM available to all of them is the system RAM.
The only difference between multi-processor/single-core (as in older systems) and single-processor/multi-core (modern systems) is that the former needs to coordinate RAM accesses with off-chip logic, whereas for the latter all coordination is on-chip and sometimes even on-die, which of course results in much faster and more power-efficient RAM accesses.
In the case of AMD's multi-processor/multi-core solutions each processor owns part of the system RAM. The processors themselves are interconnected with high-speed data (HyperTransport) channels to facilitate accesses to RAM not owned by the processor accessing it.
In any case it is up to the programmer to decide how the processors/cores access the RAM. Sure they can read and/or write to the same location if that is what the programmer wants.
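A minimal sketch of that (POSIX threads; the OS is free to place the two threads on different cores or processors, and both write to the same RAM without conflict because the regions are disjoint):

    #include <pthread.h>
    #include <stdio.h>

    #define N 8
    static int buf[2 * N]; /* one shared buffer in system RAM */

    /* Each thread fills its own half of the shared buffer; no
     * locking is needed because the regions never overlap. */
    static void *fill(void *arg)
    {
        int *half = arg;
        for (int i = 0; i < N; i++)
            half[i] = i;
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, fill, buf);     /* first half  */
        pthread_create(&t2, NULL, fill, buf + N); /* second half */
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        for (int i = 0; i < 2 * N; i++)
            printf("%d ", buf[i]);
        printf("\n");
        return 0;
    }

(Compile with -lpthread.)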

How are the stack pointer and program status word maintained in multiprocessor architecture?

In a multi-processor architecture, how are registers organized?
For example, on a 4-core processor, up to 4 processes can run truly simultaneously.
How are stack pointer, program status registers and program counter organized?
What about other general purpose registers?
My guess is, each core will have a separate set of registers.
Imagine 4 completely separate computers, each with a single-core CPU. A 4-core computer is like that; except:
All CPUs share the same physical address space (and can all use the same RAM, PCI devices, etc)
Interrupt/IRQ controllers may be designed so the OS can tell them which CPU(s) should be interrupted by each IRQ
CPUs are typically able to signal each other (e.g. "inter-processor interrupts")
Some CPUs may share some caches
Some CPUs may share some control registers (e.g. for things like power management, cache configuration, etc)
For modern CPUs, some CPUs may share some or all execution units (SMT, hyper-threading, etc)
For modern systems (where memory controller is built into the physical chip) some CPUs may share the same memory controller
Most of this is "invisible" to most software. Unless you're writing part of an OS that controls power management, you don't need to care if power management is shared between CPUs or not; unless you're writing an OSs/kernel's low level IRQ handling you don't need to care how IRQs reach device drivers, etc.
The same applies to how many CPUs actually exist. The OS/kernel normally ensures that applications only need to care about higher level abstractions (e.g. "threads"). How this higher level abstraction works depends on the OS - normally (for most OSs) the OS/kernel attempts to provide the illusion that all threads are running at the same time by switching between them "quickly enough" (where if there's only 4 CPUs a maximum of 4 threads actually do run at the exact same time), but it's usually far more complex than this (involving things like thread priorities, pre-emption rules, etc) and (even though it's relatively rare) it may be very different (e.g. for some systems the same thread may be run on multiple CPUs at the same time for fault tolerance/redundancy purposes; for some systems there might just be a queue of functions and their data, where multiple functions run at the same time; etc).
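Since each core has its own physical registers, what an OS actually keeps is one saved register set per thread, restored onto whichever core runs the thread next. A sketch of such a per-thread context record (field names are illustrative, loosely x86-flavoured, not any real kernel's layout):

    #include <stdint.h>

    /* Sketch: per-thread saved context. On a context switch, the
     * core's registers are stored into the outgoing thread's record
     * and reloaded from the incoming thread's record. */
    struct cpu_context {
        uintptr_t sp;      /* stack pointer             */
        uintptr_t ip;      /* program counter           */
        uintptr_t flags;   /* program status word       */
        uintptr_t gpr[16]; /* general-purpose registers */
    };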
Multiprocessor means that there are at least two discrete processors on the same platform -- usually on the same motherboard.
A subset is distributed multiprocessing, where two PCs, for example, are programmed to appear as a single system with two processors.
Multicore means that most or all of the CPU is replicated many times on a single chip.
- This also means that the stack pointer, status word, program counter, and all general-purpose registers are replicated.
Hyperthreading is a technique where each stage of the pipeline can execute instructions from different threads.
Multiprocessing, at the OS level, means that everything a process consists of is switched every now and then.
Multithreading is a lightweight variant of multiprocessing, where the threads share, e.g., the same code segment, the same data segment, the same file descriptors, etc., but have unique stacks (and of course unique status registers and program counters).
Multiprocessing can also refer to multiple processors in general (the hardware-architecture sense).

What are the differences between multi-CPU, multi-core and hyper-thread?

Could anyone explain to me the differences between multi-CPU, multi-core, and hyper-thread? I am always confused about these differences, and about the pros/cons of each architecture in different scenarios.
Here is my current understanding after learning online and learning from others' comments.
I think hyper-threading is the most limited technology among them, but cheap. Its main idea is to duplicate registers to save context-switch time.
Multi-processor is better than hyper-threading, but since the different CPUs are on different chips, the communication between CPUs has longer latency than in multi-core, and using multiple chips brings more expense and more power consumption than multi-core.
Multi-core integrates all the CPUs on a single chip, so the latency of communication between CPUs is greatly reduced compared with multi-processor. Since it uses one single chip to contain all the CPUs, it consumes less power and is less expensive than a multi-processor system.
Is this correct?
Multi-CPU was the first version: you'd have one or more mainboards with one or more CPU chips on them. The main problem here was that the CPUs would have to expose some of their internal data to the other CPUs so they wouldn't get in each other's way.
The next step was hyper-threading. One chip on the mainboard but it had some parts twice internally so it could execute two instructions at the same time.
The current development is multi-core. It's basically the original idea (several complete CPUs) but in a single chip. The advantage: Chip designers can easily put the additional wires for the sync signals into the chip (instead of having to route them out on a pin, then over the crowded mainboard and up into a second chip).
Super computers today are multi-cpu, multi-core: They have lots of mainboards with usually 2-4 CPUs on them, each CPU is multi-core and each has its own RAM.
[EDIT] You got that pretty much right. Just a few minor points:
Hyper-threading keeps track of two contexts at once in a single core, exposing more parallelism to the out-of-order CPU core. This keeps the execution units fed with work, even when one thread is stalled on a cache miss, branch mispredict, or waiting for results from high-latency instructions. It's a way to get more total throughput without replicating much hardware, but if anything it slows down each thread individually. See this Q&A for more details, and an explanation of what was wrong with the previous wording of this paragraph.
The main problem with multi-CPU is that code running on the CPUs will eventually access the RAM. There are N CPUs but only one bus to access the RAM. So you must have some hardware which makes sure that a) each CPU gets a fair amount of RAM access, b) accesses to the same part of the RAM don't cause problems, and c) most importantly, that CPU 2 will be notified when CPU 1 writes to some memory address which CPU 2 has in its internal cache. If that doesn't happen, CPU 2 will happily use the cached value, oblivious to the fact that it is outdated.
Just imagine you have tasks in a list and you want to spread them to all available CPUs. So CPU 1 will fetch the first element from the list and update the pointers. CPU 2 will do the same. For efficiency reasons, both CPUs will not only copy the few bytes into the cache but a whole "cache line" (whatever that may be). The assumption is that, when you read byte X, you'll soon read X+1, too.
Now both CPUs have a copy of the memory in their cache. CPU 1 will then fetch the next item from the list. Without cache sync, it won't have noticed that CPU 2 has changed the list, too, and it will start to work on the same item as CPU 2.
This is what effectively makes multi-CPU so complicated. Side effects of this can lead to a performance which is worse than what you'd get if the whole code ran only on a single CPU. The solution was multi-core: You can easily add as many wires as you need to synchronize the caches; you could even copy data from one cache to another (updating parts of a cache line without having to flush and reload it), etc. Or the cache logic could make sure that all CPUs get the same cache line when they access the same part of real RAM, simply blocking CPU 2 for a few nanoseconds until CPU 1 has made its changes.
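The cache-line effect described above is easy to demonstrate in user space. In this sketch (assuming a typical 64-byte cache line; timings will vary by machine), two threads bump two independent counters; if the counters share a cache line, the line ping-pongs between the cores and the loops run measurably slower:

    #include <pthread.h>
    #include <stdio.h>

    #define ITERS 100000000UL

    struct counters {
        volatile unsigned long a;
        char pad[64]; /* remove to put a and b on one cache line */
        volatile unsigned long b;
    };

    static struct counters c;

    static void *bump_a(void *unused)
    {
        (void)unused;
        for (unsigned long i = 0; i < ITERS; i++) c.a++;
        return NULL;
    }

    static void *bump_b(void *unused)
    {
        (void)unused;
        for (unsigned long i = 0; i < ITERS; i++) c.b++;
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, bump_a, NULL);
        pthread_create(&t2, NULL, bump_b, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("a=%lu b=%lu\n", c.a, c.b);
        return 0;
    }

Time it with and without the padding to see the difference.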
[EDIT2] The main reason why multi-core is simpler than multi-cpu is that on a mainboard, you simply can't run all wires between the two chips which you'd need to make sync effective. Plus a signal only travels 30cm/ns tops (speed of light; in a wire, you usually have much less). And don't forget that, on a multi-layer mainboard, signals start to influence each other (crosstalk). We like to think that 0 is 0V and 1 is 5V but in reality, "0" is something between -0.5V (overdrive when dropping a line from 1->0) and .5V and "1" is anything above 0.8V.
If you have everything inside of a single chip, signals run much faster and you can have as many as you like (well, almost :). Also, signal crosstalk is much easier to control.
You can find some interesting articles about dual CPU, multi-core and hyper-threading on Intel's website or in a short article from Yale University.
I hope you find here all the information you need.
In a nutshell: a multi-CPU or multi-processor system has several processors. A multi-core system is a multi-processor system with several processors on the same die. With hyperthreading, multiple threads can run on the same processor (that is, the context-switch time between these threads is very small).
Multi-processors have been around for 30 years now, but mostly in labs. Multi-core is the new popular form of multi-processor. Server processors nowadays implement hyperthreading along with multiple processors.
The wikipedia articles on these topics are quite illustrative.
Hyperthreading is a cheaper and slower alternative to having multiple cores
The Intel Manual Volume 3 System Programming Guide (325384-056US, September 2015), Section 8.7 "INTEL HYPER-THREADING TECHNOLOGY ARCHITECTURE", describes HT briefly, including a diagram (not reproduced here).
TODO: by how much percent is it slower, on average, in real applications?
Hyperthreading is possible because modern single CPU cores already execute multiple instructions at once via the instruction pipeline: https://en.wikipedia.org/wiki/Instruction_pipelining
The instruction pipeline is a separation of functions inside of a single core to ensure that each part of the circuit is used at any given time: reading memory, decoding instructions, executing instructions, etc.
Hyperthreading separates functions further by using:
a single backend, which actually runs the instructions with its pipeline.
Dual core has two backends, which explains the greater cost and performance.
two front-ends, which take two streams of instructions and order them in a way to maximize pipelining usage of the single backend by avoiding hazards.
Dual core would also have 2 front-ends, one for each backend.
There are edge cases where instruction reordering produces no benefit, making hyperthreading useless. But it produces a significant improvement on average.
Two hyperthreads in a single core share more cache levels (TODO how many? L1?) than two different cores, which share only L3; see:
Multiple threads and CPU cache
How are cache memories shared in multicore Intel CPUs?
The interface that each hyperthread exposes to the operating system is similar to that of an actual core, and both can be controlled separately. Thus cat /proc/cpuinfo shows me 4 processors, even though I only have 2 cores with 2 hyperthreads each.
Operating systems can however take advantage of knowing which hyperthreads are on the same core to run multiple threads of a given program on a single core, which might improve cache usage.
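On Linux you can see the pairing yourself via sysfs (a sketch; the path below is the standard Linux topology interface, shown here for logical CPU 0 only):

    #include <stdio.h>

    /* Sketch (Linux): read which physical core logical CPU 0 sits on.
     * Logical CPUs reporting the same core_id are hyperthread
     * siblings sharing one core. */
    int main(void)
    {
        FILE *f = fopen("/sys/devices/system/cpu/cpu0/topology/core_id", "r");
        int core_id;

        if (!f || fscanf(f, "%d", &core_id) != 1) {
            perror("core_id");
            return 1;
        }
        fclose(f);
        printf("cpu0 is on physical core %d\n", core_id);
        return 0;
    }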
This LinusTechTips video contains a light-hearted non-technical explanation: https://www.youtube.com/watch?v=wnS50lJicXc
Multi-CPU is a bit like multicore, but communication can only happen through RAM, not L3 cache
This means that, if possible, you want to partition the tasks so that those that use the same memory a lot end up on the same CPU.
E.g., the following SBI-7228R-T2X blade server contains 4 CPUs, 2 on each node.
(photo: SBI-7228R-T2X blade server)
In the photo, there seem to be 4 sockets for the CPUs, each covered by a heat sink, with one open.
I think that across the nodes they don't even share RAM and must communicate through some kind of networking, thus representing one further step up the hyperthread / multicore / multi-CPU hierarchy. TODO confirm:
https://scicomp.stackexchange.com/questions/7530/difference-between-nodes-and-cpus-when-running-software-on-a-cluster
SLURM nodes, tasks, cores, and cpus
https://www.quora.com/In-High-Performance-Computing-what-exactly-is-the-difference-between-the-terms-%E2%80%9Ccores-%E2%80%9D-%E2%80%9Cprocessors-%E2%80%9D-%E2%80%9Cnodes-%E2%80%9D-and-%E2%80%9Cclusters%E2%80%9D