System startup of multicore computer - multicore

I would really like to know how a multicore CPU starts when the computer boots. I imagine there is a "dominant core" that loads the BIOS and later the kernel into RAM, then wakes up the rest of the cores and leaves them waiting for code to run (like an infinite while loop?). But that is only my guess at how it works.
Another question: after the kernel is loaded into memory, all cores can make system calls, right? And how does one core control the tasks of the other cores? Which instructions are used (on x86 / x86-64)?

Yes there is a boot CPU. The firmware handles that. It's usually CPU 0, but what if that one is missing or defective? Then it gets trickier.
On x86 platforms the ACPI tables describe the CPU and memory layout. The operating system starts the other CPUs with IPIs (inter-processor interrupts), which kick them out of idle into the interrupt handlers (which were set up in memory) and then into operating system functions. Those functions then choose threads to run and start doing useful things.
If you really want to know how it all works read the source code for Linux or one of the BSDs.
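Update: Looks like I was not quite right about the IPIs. The other CPUs are woken by interrupts, but not by the normal IPIs the kernel uses at run time. The Linux SMP boot code is here: https://github.com/torvalds/linux/blob/master/arch/x86/kernel/smpboot.c
It seems to send an INIT IPI, which resets the target CPU, followed by STARTUP IPIs that tell it where to begin executing, or to use an NMI in some cases.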
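The handoff described above (one boot CPU, the others parked until they are woken and given work) can be imitated with ordinary threads. This is only a toy model in Python, not how the firmware or kernel does it: the names `boot`, `application_processor` and `NUM_CORES` are made up for the illustration, a `threading.Event` stands in for the startup interrupt, and a queue stands in for the scheduler.

```python
import queue
import threading

NUM_CORES = 4  # assumption: a small 4-core machine


def application_processor(core_id, wakeup, work, results):
    """A secondary core: parked until the boot core wakes it, then runs queued work."""
    wakeup.wait()                  # parked, waiting for the startup signal
    while True:
        task = work.get()          # idle until the "kernel" hands over something to run
        if task is None:           # hypothetical shutdown marker, only for this demo
            break
        results.put((core_id, task()))


def boot():
    """The boot core: sets everything up, then wakes the other cores one by one."""
    work, results = queue.Queue(), queue.Queue()
    wakeups = [threading.Event() for _ in range(1, NUM_CORES)]
    cores = [
        threading.Thread(target=application_processor, args=(i + 1, ev, work, results))
        for i, ev in enumerate(wakeups)
    ]
    for core in cores:
        core.start()
    for ev in wakeups:             # stands in for the per-core startup interrupt
        ev.set()
    for n in range(6):             # the scheduler hands out work
        work.put(lambda n=n: n * n)
    for _ in cores:                # one shutdown marker per secondary core
        work.put(None)
    for core in cores:
        core.join()
    return sorted(results.get()[1] for _ in range(6))


if __name__ == "__main__":
    print(boot())
```

Which core ends up running which task is not deterministic here, which is roughly the point: once the secondary cores are awake, each one just takes whatever the scheduler gives it.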

Related

What makes the firecracker microvm "micro" vs something like qemu?

From https://firecracker-microvm.github.io/:
Firecracker is an alternative to QEMU that is purpose-built for running serverless functions and containers safely and efficiently, and nothing more. Firecracker is written in Rust, provides a minimal required device model to the guest operating system while excluding non-essential functionality (only 5 emulated devices are available: virtio-net, virtio-block, virtio-vsock, serial console, and a minimal keyboard controller used only to stop the microVM). This, along with a streamlined kernel loading process enables a < 125 ms startup time and a < 5 MiB memory footprint. The Firecracker process also provides a RESTful control API, handles resource rate limiting for microVMs, and provides a microVM metadata service to enable the sharing of configuration data between the host and guest.
So what is the main thing that makes qemu slower—primarily the device emulation?
And that startup time of 125ms + 5MB is in contrast to...what?
Yes, Firecracker boots faster and is lighter than QEMU. How much varies (from a little to 10x) with the kernel used and the options (drivers, devices) given.
There is an older paper on that here: https://dreadl0ck.net/papers/Firebench.pdf, which finds Firecracker faster but not impressively so:
In our experiments
the mean kernel boot time of Firecracker microVM is 800ms
in the sequential experiments, and 1000ms in the concurrent
scenario. QEMU boots the Linux kernel 18% slower on
average. […] It is important to note
that the network stack setup during takes additional time,
without initialising the network stack the machine is able to
boot in 150ms-200ms. The reduced boot time of Firecracker
can be explained by the fact that Firecracker only emulates five
devices: virtio-net, virtio-block, virtio-vsock, serial console,
and a minimal keyboard controller used only to stop the
microVM.
But I would evaluate this from another perspective: Firecracker is purposefully minimal, which leaves less room for configuration mishaps and, importantly, presents a minimal attack surface (it's usually used to run untrusted workloads). Full control through a REST API also makes it easy to orchestrate.

How are applications and data accessed by the CPU from RAM

I am having a bit of trouble understanding how applications and data are accessed by the CPU from RAM after the application has been loaded into RAM and a file opened (thus data for the file also stored in RAM).
By my understanding, a CPU just gets instructions from RAM as the program counter ticks or carries out tasks after an interrupt. How then does it access the application and data. Is it that it doesn't and still just gets instructions (for example to load a file on the hard drive to be opened in the application) and processes any requests made by the application which are stored in RAM as instructions thereafter (like saving a file). Or does the application and data relating to an opened file (for example) just stay in RAM and not get accessed by the CPU at all.
Similarly, after reading an article, it said that a copy of the operating system is stored in RAM. The CPU can then access the operating system. (I thought the CPU just worked with instructions from RAM). How does it then communicate with the operating system and how are interrupts sent to the CPU, from the copy of the OS in RAM or from the OS in the hard drive.
Sorry if this is really confusing; there is a lot I didn't understand.
Root of your question: the lack of a clear distinction between a computer's hardware and its software.
Components of a Computer System
Just so that we are clear about both of them and that we understand their nature, let me state as follows:
Hardware: It includes the CPU, RAM, disks, registers, graphics card, network card, memory bus and everything else that you can touch and call 'the computer'. It is the body.
Software: It includes the operating system, programs, CPU instructions, compilers, programming languages and almost everything intangible about the computer. It is the soul.
Firmware: It is the basic code that is absolutely essential for the hardware to work. It is stored in read-only memory installed in the hardware itself. Because this piece of software is vital to the hardware, it is considered to sit midway between hardware and software, hence the name firmware.
We will start with understanding from the time when we say that the computer is up and running and is properly executing our instructions. But at that time you will say - How did I reach here? So I will mention a few points about the startup of the computer.
When the power button is pressed...
...the most primitive and basic input/output system (hence the name BIOS), which is written permanently into the computer hardware, begins execution. It is stored in read-only memory and starts the process of getting the machine to stand on its own: it loads the software (the operating system) from one piece of hardware (the disks) into another piece of hardware (RAM and CPU registers), enabling the software to work properly with the hardware.
Now the body and soul are together and the individual (machine) can work.
By now, the OS is already in RAM and the CPU. (Read "When the power button is pressed" again if you doubt it.) Let's handle your question paragraph by paragraph now:
First Paragraph
I am having a bit of trouble understanding how applications and data
are accessed by the CPU from RAM after the application has been loaded
into RAM and a file opened (thus data for the file also stored in
RAM).
The explanation is as follows:
The exact issue here is your thinking that the CPU and RAM decide on their own to access the data. The CPU and RAM are only executing units: the CPU fetches whatever instructions it is pointed at, and RAM holds whatever was put into it.
It is the OS (software) that decides which data to access, and it does so by means of the CPU and RAM (hardware). Applications are executed within the realm of the OS.
This is why you can install Linux and Windows on the same hardware but cannot run .exe files directly on Linux: the CPU would be perfectly able to execute the machine code inside them, but it is the OS that loads a program and provides the services it calls.
Further, how the CPU, RAM and disk physically interact to bring in the data, execute it, save it back and so on is in the domain of hardware. Explaining that involves logic gates (AND, OR, NOT...), diodes, circuitry and a hell of a lot of other things which an electronics guy can explain.
Second Paragraph
By my understanding, a CPU just gets instructions from RAM as the
program counter ticks or carries out tasks after an interrupt. How
then does it access the application and data. Is it that it doesn't
and still just gets instructions (for example to load a file on the
hard drive to be opened in the application) and processes any
requests made by the application which are stored in RAM as
instructions thereafter (like saving a file).
As you have guessed, the CPU just keeps fetching instructions; it is the operating system that decides which instructions end up in front of it. Also, just as the brain doesn't directly instruct the hands and legs to move and instead uses nerves for the interaction, the CPU doesn't move the disk's data itself. It works with RAM and registers, and asks the disk controller to transfer the data. Multiple units of hardware work in conjunction to provide a path for data and instructions to travel. The important pieces of hardware involved are:
Processor (CPU and registers built in the CPU)
Cache
Memory (RAM)
Disk
Tape
I like the image provided in this answer. This image not only lists the hardware pieces but also illustrates the mammoth difference in the execution speed of these pieces.
Let's move on to the...
Third Paragraph
Similarly, after reading an article, it said that a copy of the
operating system is stored in RAM. The CPU can then access the
operating system. (I thought the CPU just worked with instructions
from RAM). How does it then communicate with the operating system and
how are interrupts sent to the CPU, from the copy of the OS in RAM or
from the OS in the hard drive.
By now you already know that the OS is indeed present in RAM and CPU registers. That is where it lives, and from there it tells the CPU how to work. If the OS were small enough (or if registers and caches were big enough), the OS would live even closer to the CPU.
The CPU does not communicate with the OS as if the OS were a separate party. It is the worker that is controlled by a boss, and the OS is that boss.
The CPU does not "access" the operating system the way a program opens a file; the OS is simply the code the CPU is running. The CPU is the body, the OS is the soul. The soul tells the body what to do, not vice versa.
The CPU does work with instructions from RAM, and the OS is exactly that: instructions living in RAM. The CPU merely executes them. So even when there is an instruction to load some module of the OS into RAM, it is not the RAM/CPU but the OS itself that issues that instruction.
Interrupts are of two types, hardware and software. Hardware interrupts are signalled to the CPU by devices (timer, disk controller, network card); software interrupts are triggered by instructions in the running code. In both cases the code that handles them belongs to the OS living in RAM, not to the copy of the OS on the hard drive.
Conclusions
The lack of distinction between hardware and software is the basic cause of your confusion. Take a course on operating systems on Coursera or Academic Earth for a deeper understanding.
It is confusing indeed. Let me try to explain.
CPU and RAM
The CPU is hardwired to the RAM via the 'motherboard', and they work together. The CPU can perform many instructions, but it has to be told what to do by instructions in RAM. The CPU is basically in a loop: all it does is fetch the next instruction from RAM and execute it, over and over.
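That fetch-and-execute loop can be written down in a few lines. The sketch below is a made-up toy machine in Python, not any real instruction set: the opcodes `LOAD`, `ADD`, `STORE` and `HALT`, the single accumulator register and the function name `run` are all assumptions for the illustration.

```python
def run(memory):
    """Fetch the instruction at the program counter, execute it, repeat."""
    pc = 0    # program counter: address of the next instruction
    acc = 0   # a single accumulator register
    while True:
        op, arg = memory[pc]       # fetch
        pc += 1                    # point at the following instruction
        if op == "LOAD":           # acc <- memory[arg]
            acc = memory[arg]
        elif op == "ADD":          # acc <- acc + memory[arg]
            acc += memory[arg]
        elif op == "STORE":        # memory[arg] <- acc
            memory[arg] = acc
        elif op == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode {op!r}")


program = [
    ("LOAD", 4),    # address 0
    ("ADD", 5),     # address 1
    ("STORE", 6),   # address 2
    ("HALT", 0),    # address 3
    2,              # address 4: data
    3,              # address 5: data
    0,              # address 6: the result is stored here
]

result = run(list(program))
print(result[6])
```

Note that the program and the numbers it works on sit in the same list: to the loop they are all just memory.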
So how does this RAM get filled with instructions?
BIOS (basic input/output system)
When the computer first boots up, a portion of RAM is filled with data from a chip on the motherboard (the BIOS chip), and the CPU is turned on and starts processing. These are the factory settings.
The data from the BIOS chip that is copied to RAM consists of a library of instructions to access hardware devices (hard disks, CD/ROM, USB storage, network cards etc.),
and a program using that library to load what is called the bootsector, the first sector on the boot device, into RAM, and transfer control to it (with a jump instruction).
BOOTLOADER
The bootsector code that the BIOS program loaded from the boot device is very small (only 440 bytes of code in a 512-byte sector), but with the help of the BIOS library this is enough to load more sectors and execute them. The bootsector and the data it loads are called the bootloader, which is in charge of loading the operating system.
In effect, the bootloader is a more dynamic version of the BIOS: the BIOS program resides in flash memory, whereas the bootloader resides on hard disks, USB sticks, SSD drives etc., and thus can be larger and more complex.
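To give a feel for how small a classic boot sector is, here is a hedged Python sketch that checks a 512-byte sector for the `0x55 0xAA` signature a PC BIOS conventionally expects in its last two bytes, and slices out the 440-byte code area mentioned above. The helper names `is_bootable` and `boot_code` and the fake sector are invented for the example; a real sector would be read from a disk image.

```python
SECTOR_SIZE = 512
BOOT_CODE_SIZE = 440        # the code area at the start of the sector
SIGNATURE = b"\x55\xaa"     # conventional boot signature in the last two bytes


def is_bootable(sector: bytes) -> bool:
    """True if the sector has the right size and ends with the boot signature."""
    return len(sector) == SECTOR_SIZE and sector[510:512] == SIGNATURE


def boot_code(sector: bytes) -> bytes:
    """The part of the sector that holds the first-stage bootloader code."""
    return sector[:BOOT_CODE_SIZE]


# A made-up sector: a two-byte x86 infinite loop (jmp to itself), zero padding,
# then the signature.
fake_sector = b"\xeb\xfe" + bytes(SECTOR_SIZE - 4) + SIGNATURE

print(is_bootable(fake_sector), len(boot_code(fake_sector)))
```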
OPERATING SYSTEM
In its turn, the operating system (OS) is simply a more advanced version of the bootloader, as it can load and run multiple programs from multiple locations at the same time.
--
The BIOS knows about drives.
The Bootloader knows about drives and partitions.
The OS knows about drives, partitions, and file systems.
The CPU, as you've noticed, reads the program from RAM, instruction by instruction. When an instruction is executed, it might refer to data stored in memory. That data is either fetched explicitly into the registers with a separate instruction, or read from memory by the instruction itself (we are talking about a small amount of data). Registers are the internal storage of the CPU and quite small: on x86_64 that's 16 general-purpose 64-bit registers (one of which is SP) plus other stuff like segment registers and IP. That's all it really does.
Loading a file from a disk is done by asking the appropriate controller to fetch the data into a specific place in memory. The CPU is connected to buses that carry these commands to the appropriate controllers.
As to interrupts, these are special things: the CPU has several interrupt lines which can be activated by various devices, for example your network card. When the CPU receives such an interrupt, it is usually handled by an interrupt handler, which is just a program located in a well-known place in memory. Handlers can be registered by, for example, the operating system. Each interrupt line has its own interrupt handler. When an interrupt happens, the CPU saves the current state of the program it happens to be executing, handles the interrupt, restores the state and resumes the program.
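The save / handle / restore sequence can be sketched like this. It is a deliberately simplified Python model: the `CPU` class, its `state` dictionary, the line number 11 and `network_handler` are all made up, and real hardware saves only part of the state itself and leaves the rest to the handler.

```python
class CPU:
    def __init__(self):
        self.state = {"pc": 0, "acc": 0}   # state of the interrupted program
        self.handlers = {}                 # interrupt line -> handler function
        self.log = []

    def register_handler(self, line, handler):
        """What an operating system would do for each line it cares about."""
        self.handlers[line] = handler

    def interrupt(self, line):
        saved = dict(self.state)           # save the current program's state
        self.handlers[line](self)          # run the handler for this line
        self.state = saved                 # restore, so the program can resume


def network_handler(cpu):
    cpu.state["acc"] = 999                 # a handler may clobber registers freely
    cpu.log.append("packet received")


cpu = CPU()
cpu.register_handler(11, network_handler)
cpu.state.update(pc=40, acc=7)             # some program is in the middle of its work
cpu.interrupt(11)
print(cpu.state, cpu.log)
```

The interrupted program never notices that the handler ran: its program counter and accumulator are the same before and after.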
You seem to be asking about addressing modes. At the risk of gross oversimplification (ignoring caching, segments, and logical memory), memory is organized as a sequential array accessed by an integer address.
The CPU has a number of internal storage areas called registers. We will call them R0 to Rn. The processor assigns some registers dedicated purposes. One of those registers is the PC (program counter).
One common addressing mode is deferred. I indicate this mode as (Rn). An instruction like this:
MOV (R0), R1
uses the value contained in R0 as a memory address, fetches the value stored at that memory location, and stores a copy of that value in R1.
An instruction sequence like this:
MOV (R0), R1
MOV (R2), R3
is stored in memory as data. Ignoring protection, code, data, and variables all use the same type of memory. In other words, any memory location can be interpreted as code, data, or variable.
The CPU executes the next instruction located at (PC). After executing the instruction, the CPU automatically increments the PC to point to the next instruction.
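Here is a small Python sketch of the deferred mode and of the "code is just data" point, under the same simplifications. The tuple encoding of an instruction and the name `MOV_DEFERRED` are invented for the illustration; a real machine encodes instructions as numbers.

```python
def step(memory, regs):
    """Execute the instruction at (PC), then advance PC to the next one."""
    op, src, dst = memory[regs["PC"]]
    if op != "MOV_DEFERRED":
        raise ValueError(f"unknown instruction {op!r}")
    address = regs[src]             # the source register holds an address...
    regs[dst] = memory[address]     # ...and the value stored there is copied
    regs["PC"] += 1


memory = [
    ("MOV_DEFERRED", "R0", "R1"),   # address 0: MOV (R0), R1
    ("MOV_DEFERRED", "R2", "R3"),   # address 1: MOV (R2), R3
    42,                             # address 2: plain data
]
# R0 points at the data word; R2 points at address 0, which holds an instruction.
regs = {"PC": 0, "R0": 2, "R1": 0, "R2": 0, "R3": 0}

step(memory, regs)
step(memory, regs)
print(regs["R1"], regs["R3"])
```

After the two steps R1 holds the data word, while R3 holds a copy of the first instruction: the same memory location was executed as code and then read as data.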

Operating System vs Monitor

Without going into details, how is a Monitor different from an OS?
I read that first there was Serial Processing in the earlier days, and then Monitors and now OS.
Monitor in this context means Batch Monitor.
From the 1950s to the mid-60s, before we had true operating systems, we had Batch Monitors. You would "program" the job onto punch cards and put them on an input queue that the machine would process one by one.
The programmer would sit in front of a monitor, which would display memory dumps, debugging information, etc - it was an incredibly tedious process.
Of course the major drawback of a Batch Monitor is that the CPU was often idle. Because CPU speeds are so much faster than I/O speed, the machine would spend the majority of the time reading in the cards (I/O) while the CPU waited.
Nowadays, modern operating systems can run several processes at once and optimize CPU utilization. When a process on the run queue needs to do I/O, the OS puts it on another queue, and the CPU starts processing the next job. When the I/O is done, that process is moved back to the run queue. This way, the CPU is always doing something.
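The difference is easy to see in a toy simulation. The Python sketch below is an assumption-heavy model, not a real scheduler: each job is just a number of one-tick CPU bursts with a one-tick I/O wait between bursts, and the function name `simulate` is made up.

```python
from collections import deque


def simulate(bursts):
    """bursts: job name -> number of one-tick CPU bursts, with one I/O wait in between.

    Returns what the CPU did on each tick ('idle' when nothing was runnable).
    """
    left = dict(bursts)
    run_queue = deque(bursts)      # jobs ready for the CPU
    io_queue = []                  # jobs whose I/O is in flight
    trace = []
    while run_queue or io_queue:
        io_done, io_queue = io_queue, []   # assumption: every I/O takes one tick
        if run_queue:
            job = run_queue.popleft()
            trace.append(job)
            left[job] -= 1
            if left[job] > 0:
                io_queue.append(job)       # job needs I/O now; the CPU moves on
        else:
            trace.append("idle")           # nothing runnable: the CPU waits
        run_queue.extend(io_done)          # finished I/O: back on the run queue
    return trace


print(simulate({"A": 2}))           # one job at a time, as under a batch monitor
print(simulate({"A": 2, "B": 2}))   # two jobs interleaved
```

With a single job the CPU sits idle while the I/O completes; with two jobs the second one fills that gap.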
Edit:
After looking up "batch monitor" and not finding many references to it, it seems that it is more commonly referred to as a "batch system" - here's a book for reference; should be able to find a pdf version online:
Modern Operating Systems.

How are the stack pointer and program status word maintained in multiprocessor architecture?

In a multi-processor architecture, how are registers organized?
For example, in a 4 cores processor, a minimum of 4 processes can run at a time.
How are stack pointer, program status registers and program counter organized?
What about other general purpose registers?
My guess is, each core will have a separate set of registers.
Imagine 4 completely separate computers, each with a single-core CPU. A 4-core computer is like that, except:
- All CPUs share the same physical address space (and can all use the same RAM, PCI devices, etc)
- Interrupt/IRQ controllers may be designed so the OS can tell them which CPU/s should be interrupted by the IRQ
- CPUs are typically able to signal each other (e.g. "inter-processor interrupts")
- Some CPUs may share some caches
- Some CPUs may share some control registers (e.g. for things like power management, cache configuration, etc)
- For modern CPUs, some CPUs may share some or all execution units (SMT, hyper-threading, etc)
- For modern systems (where the memory controller is built into the physical chip) some CPUs may share the same memory controller
Most of this is "invisible" to most software. Unless you're writing part of an OS that controls power management, you don't need to care if power management is shared between CPUs or not; unless you're writing an OS's/kernel's low level IRQ handling, you don't need to care how IRQs reach device drivers, etc.
The same applies to how many CPUs actually exist. The OS/kernel normally ensures that applications only need to care about higher level abstractions (e.g. "threads"). How this higher level abstraction works depends on the OS. Normally (for most OSs) the OS/kernel attempts to provide the illusion that all threads are running at the same time by switching between them "quickly enough" (if there are only 4 CPUs, a maximum of 4 threads actually run at the exact same time). It's usually far more complex than this (involving things like thread priorities, pre-emption rules, etc). And, even though it's relatively rare, it may be very different: for some systems the same thread may be run on multiple CPUs at the same time for fault tolerance/redundancy purposes; for some systems there might just be a queue of functions and their data, where multiple functions run at the same time; and so on.
Multiprocessor means that there are at least two discrete processors on the same platform, usually on the same motherboard.
- A subset is distributed multiprocessing, where two PCs, for example, are programmed to appear as a single system with two processors.
Multicore means that most or all of the CPU is replicated several times on a single chip.
- this also means that stack, status, program counter and all generic purpose registers are replicated.
Hyperthreading is a technique where a single core keeps a separate register set for each of two hardware threads and feeds instructions from both into its shared pipeline and execution units.
Multiprocessing at the OS level means that everything a process consists of is switched in and out every now and then.
- It also means multiprocessing in general (hardware architecture).
Multithreading is a lightweight variant of multiprocessing, where the threads share e.g. the same code segment, the same data segment and the same file descriptors, but have unique stacks (and of course unique status registers and program counters).
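To make the "every core has its own registers, every thread has its own saved copy" point concrete, here is a minimal Python sketch. The `Registers` and `Core` classes, the four general-purpose registers and the `context_switch` helper are all invented for the illustration; real register sets are larger and the switch is done by kernel code in assembly.

```python
from dataclasses import dataclass, field


@dataclass
class Registers:
    pc: int = 0        # program counter
    sp: int = 0        # stack pointer
    status: int = 0    # program status word / flags
    gpr: list = field(default_factory=lambda: [0] * 4)   # general-purpose registers


@dataclass
class Core:
    regs: Registers = field(default_factory=Registers)   # each core has its own set


def context_switch(core, contexts, outgoing, incoming):
    """Save the core's registers for the outgoing thread, load the incoming thread's."""
    contexts[outgoing] = core.regs
    core.regs = contexts.pop(incoming)


cores = [Core() for _ in range(4)]
cores[0].regs.pc = 0x1000        # core 0 is running thread_a
cores[1].regs.pc = 0x2000        # core 1 is running something else entirely

# Saved register contexts of threads that are not currently on a core.
contexts = {"thread_b": Registers(pc=0x3000, sp=0x7000)}
context_switch(cores[0], contexts, "thread_a", "thread_b")

print(hex(cores[0].regs.pc), hex(cores[1].regs.pc), hex(contexts["thread_a"].pc))
```

Switching threads on core 0 does not touch core 1 at all, and thread_a's program counter survives in its saved context until it is scheduled again.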

Advice on using hypervisor to run a Real Time OS in parallel with Windows/Linux

What are your advice/experience of using a hypervisor (e.g. RTS Real-Time Hypervisor) to run an RTOS in parallel with a non real time OS. Are there any performance implications? Are there any risks involved? (like how can you ensure that the non-real time OS will not interfere with the real time aspects of the RTOS)
From what I understand, a dual core (or hyperthreading) CPU has to be used so that you can assign each OS its own core.
No, it doesn't need dual core or hyperthreading.
No, the non-RT tasks don't interfere with the RT ones.
The main idea is to have one RTOS, which executes tasks written specifically for this OS, using its own API. These tasks are set in strict priority levels, where a higher priority task will always take precedence over a lower priority one. The lowest priority tasks will execute only as long as there's no other task available to run (that is, they're all waiting for some event, either a timeout or an external signal).
All this is just like a usual multitasking OS scheduler; it doesn't need multiple cores or hardware threads. It's just that the timing guarantees are radically different, and the available API reflects this fact.
In those hybrid implementations, there's a single lowest-level task that runs a full non-RT OS kernel, usually Linux or some other Unix-like kernel (I don't know about Windows, but it should work the same). Nowadays, we call this architecture a hypervisor.
So, since the whole non-RT OS is run as the lowest-priority task, it doesn't have any guarantee of getting processing time at all. Any RT task can interrupt it at any time, even when it is accessing hardware. To make this workable, the RT tasks usually have very limited access to the hardware, or there is minimal arbitration at a very low level. That is, an RT task can interrupt a disk access (possibly resulting in an access error), but not a PCI access (as long as these are short-lived and time-bounded).
There have also been some soft-RT extensions to the Linux scheduler for some time now, but their timing guarantees aren't as tight as those of hard-RT OSes built with that in mind.
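The strict-priority arrangement described in this answer boils down to a very small decision rule. The Python sketch below only illustrates that rule, not any particular hypervisor: the task names, the priority numbers and the function `pick_next` are made up, and a higher number is assumed to mean a higher priority.

```python
def pick_next(tasks):
    """Return the name of the highest-priority ready task (higher number wins).

    tasks: list of (priority, name, is_ready) tuples.
    """
    ready = [(prio, name) for prio, name, is_ready in tasks if is_ready]
    return max(ready)[1]


# All RT tasks are waiting for an event: only then does the guest OS get the CPU.
quiet = [
    (10, "motor_control", False),
    (5, "sensor_poll", False),
    (0, "linux_guest", True),    # the whole non-RT OS: always ready, lowest priority
]

# As soon as any RT task becomes ready, it takes precedence over the guest.
busy = [
    (10, "motor_control", True),
    (5, "sensor_poll", True),
    (0, "linux_guest", True),
]

print(pick_next(quiet), pick_next(busy))
```

Because the guest is always ready and always last, it soaks up whatever time the RT tasks leave over and gets no guarantee beyond that.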