OS boot in a multiprocessor system

In a single-processor system, when power is applied the processor starts executing the boot ROM code and then the subsequent stages of the boot. However, how does this work in a multiprocessor system? Does one processor act as the master? Who decides which processor is the master and which are the helpers?
How and where is it configured?
Are the page tables shared between the processors? The processor caches are obviously separate, at least the L1 caches are.

Multiprocessor Booting
1. One processor is designated the 'Bootstrap Processor' (BSP)
– The designation is done either by hardware or by the BIOS
– All other processors are designated APs (Application Processors)
2. The BIOS boots the BSP
3. The BSP learns the system configuration
4. The BSP triggers the boot of each AP
– Done by sending a Startup IPI (inter-processor interrupt) to the AP (see the sketch below)
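For concreteness, here is a minimal C sketch of step 4, assuming an x86 system in xAPIC mode with the local APIC registers at the default physical address 0xFEE00000, identity-mapped MMIO, and hypothetical delay_ms()/delay_us() helpers. It is not taken from any particular OS; it just illustrates the INIT-SIPI-SIPI sequence the BSP sends to an AP:

    #include <stdint.h>

    #define LAPIC_BASE   0xFEE00000u          /* default local-APIC MMIO base */
    #define LAPIC_ICR_LO (LAPIC_BASE + 0x300) /* interrupt command register, low  */
    #define LAPIC_ICR_HI (LAPIC_BASE + 0x310) /* interrupt command register, high */

    extern void delay_ms(unsigned ms);        /* hypothetical timing helpers, */
    extern void delay_us(unsigned us);        /* not part of any standard API */

    static void lapic_write(uintptr_t reg, uint32_t val)
    {
        *(volatile uint32_t *)reg = val;      /* assumes identity-mapped MMIO */
    }

    /* apic_id: the target AP; start_page: 4 KiB page number of the real-mode
     * trampoline the AP should start executing (physical address >> 12). */
    void bsp_start_ap(uint8_t apic_id, uint8_t start_page)
    {
        /* INIT IPI puts the AP into wait-for-SIPI state */
        lapic_write(LAPIC_ICR_HI, (uint32_t)apic_id << 24);
        lapic_write(LAPIC_ICR_LO, 0x00004500);              /* INIT, level assert */
        delay_ms(10);

        /* Startup IPI, conventionally sent twice */
        for (int i = 0; i < 2; i++) {
            lapic_write(LAPIC_ICR_HI, (uint32_t)apic_id << 24);
            lapic_write(LAPIC_ICR_LO, 0x00004600 | start_page); /* SIPI + vector */
            delay_us(200);
        }
    }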


How can an operating system detect a live resize of disk capacity?

I saw the following discussion and had some questions:
live resize of a NVMe drive
If the physical capacity of the NVMe device changes (e.g., from 10 GB to 20 GB), how does the operating system detect this without rebooting?
In the linked discussion, re-scanning the PCI bus is given as the solution.
When the re-scan is executed, does the operating system ask the NVMe device to update its metadata (e.g., capacity)?
How does the OS interact with the disk specifically? That is, how does it read the changed device parameters from the disk rather than the stale parameters cached in memory?
This is probably an AWS virtual machine, so the disk is actually a virtual disk. You can't resize a physical disk the way you'd upgrade its capacity in place (you'd need to change the disk).
With that said, this machine probably runs on top of a type 1 hypervisor. What I understand about these is that the virtual machines (VMs) run as processes in a different ring on top of a minimal operating system (the hypervisor). When a VM executes a privileged instruction, it triggers a protection fault, and the hypervisor can then inspect who actually triggered the fault (was it the guest kernel or a user-mode process within the guest?). If it was the guest kernel, it can execute that instruction on behalf of the guest. Otherwise, it will probably do what a real kernel would do (raise an exception). It can tell the difference because the guest kernel runs in a different ring than ring 3 (user mode).
With that said, the NVMe drive itself isn't the PCI device; the NVMe host controller is. To rescan the NVMe drives, you read/write some registers that are memory-mapped into the physical address space and ask the NVMe PCI host controller for the size of the disks it finds. PCI is hot-pluggable (similarly to USB) in some cases, but mostly not on consumer motherboards. I don't think you get an interrupt when a PCI device is hot-plugged, so you are left doing a rescan of the devices.
For NVMe, whether you get an interrupt when a disk is swapped or changes size depends on the host controller. As for virtual disks, it probably depends on a lot of different things; you could definitely be left doing a PCI rescan here. I guess it depends on the hypervisor, on the OS, and on the host-controller configuration.
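As a small illustration of the rescan path (a sketch, assuming Linux and root privileges; /sys/bus/pci/rescan is the standard sysfs trigger, and the NVMe driver also exposes a per-controller rescan attribute), a user-space program can simply write "1" to the sysfs file and let the kernel re-probe the bus:

    #include <stdio.h>

    int main(void)
    {
        /* Ask the kernel to walk the PCI bus again and probe devices it has
         * not bound yet; this is the "re-scanning the PCI bus" step from the
         * linked answer. */
        FILE *f = fopen("/sys/bus/pci/rescan", "w");
        if (!f) {
            perror("open /sys/bus/pci/rescan");
            return 1;
        }
        fputs("1", f);
        fclose(f);

        /* If the controller is already bound and only the namespace size
         * changed, writing "1" to /sys/class/nvme/nvme0/rescan_controller
         * asks the NVMe driver to re-read the namespace metadata instead. */
        return 0;
    }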

How does VirtualBox-like virtualization work? (some technical details required)

First consider the situation when there is only one operating system installed. Now I run some executable. The processor reads instructions from the executable file and performs these instructions. Even though I can put whatever instructions I want into the file, my program can't read arbitrary areas of the HDD (or do many other potentially "bad" things).
It looks like magic, but I understand how this magic works. The operating system starts my program and puts the processor into some "unprivileged" state. "Unsafe" processor instructions are not allowed in this state, and the only way to put the processor back into the "privileged" state is to give control back to the kernel. Kernel code can use all of the processor's instructions, so it can do the potentially unsafe things my program "asked" for if it decides they are allowed.
Now suppose we have VMWare or VirtualBox on a Windows host and the guest operating system is Linux. I run a program in the guest, and it transfers control to the guest Linux kernel. The guest Linux kernel's code is supposed to run in the processor's "privileged" mode (it must contain the "unsafe" processor instructions!). But I strongly doubt that it has unlimited access to all the computer's resources.
I do not need too much technical details, I only want to understand how this part of magic works.
This is a great question and it really hits on some cool details regarding security and virtualization. I'll give a high-level overview of how things work on an Intel processor.
How are normal processes managed by the operating system?
An Intel processor has 4 different "protection rings" that it can be in at any time. The ring that code is currently running in determines the specific assembly instructions that may run. Ring 0 can run all of the privileged instructions whereas ring 3 cannot run any privileged instructions.
The operating system kernel always runs in ring 0. This ring allows the kernel to execute the privileged instructions it needs in order to control memory, start programs, write to the HDD, etc.
User applications run in ring 3. This ring does not permit privileged instructions (e.g. those for writing to the HDD) to run. If an application attempts to run a privileged instruction, the processor will take control from the process and raise an exception that the kernel will handle in ring 0; the kernel will likely just terminate the process.
Rings 1 and 2 tend not to be used in practice, though the architecture provides them as intermediate privilege levels (historically, e.g., for device drivers).
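To make the ring-3 behaviour above concrete, here is a tiny sketch (assuming x86 Linux and a compiler with GNU inline assembly): the program tries to execute HLT, a privileged instruction, the CPU raises a general-protection fault that the kernel handles in ring 0, and the process is killed with a signal rather than halting the machine:

    #include <stdio.h>

    int main(void)
    {
        puts("about to execute a privileged instruction from ring 3...");
        __asm__ volatile ("hlt");   /* privileged: faults in user mode */
        puts("never reached");      /* the kernel terminates the process */
        return 0;
    }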
How does virtualization work?
Before there was hardware support for virtualization, a virtual machine monitor (such as VMWare) would need to do something called binary translation (see this paper). At a high level, this consists of the VMM inspecting the binary of the guest operating system and emulating the behavior of the privileged instructions in a safe manner.
Now there is hardware support for virtualization in Intel processors (look up Intel VT-x). In addition to the four rings mentioned above, the processor has two states, each of which contains four rings: VMX root mode and VMX non-root mode.
The host operating system and its applications, along with the VMM (such as VMWare), run in VMX root mode. The guest operating system and its applications run in VMX non-root mode. Again, both of these modes each have their own four rings, so the host OS runs in ring 0 of root mode, the host OS applications run in ring 3 of root mode, the guest OS runs in ring 0 of non-root mode, and the guest OS applications run in ring 3 of non-root mode.
When code that is running in ring 0 of non-root mode attempts to execute a privileged instruction, the processor will hand control back to the host operating system running in root mode so that the host OS can emulate the effects and prevent the guest from having direct access to privileged resources (or in some cases, the processor hardware can just emulate the effect itself without getting the host involved). Thus, the guest OS can "execute" privileged instructions without having unsafe access to hardware resources - the instructions are just intercepted and emulated. The guest cannot just do whatever it wants - only what the host and the hardware allow.
Just to clarify, code running in ring 3 of non-root mode will cause an exception to be sent to the guest OS if it attempts to execute a privileged instruction, just as an exception will be sent to the host OS if code running in ring 3 of root mode attempts to execute a privileged instruction.
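As a hedged illustration of how a VMM drives these hardware modes in practice, here is a minimal sketch using Linux's KVM API (which uses VT-x/AMD-V underneath): it creates a VM and a vCPU, loads three bytes of guest code, and runs it in non-root mode until the guest executes HLT, at which point control returns to the host. Error handling is omitted for brevity and the guest runs in 16-bit real mode:

    #include <fcntl.h>
    #include <linux/kvm.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>

    int main(void)
    {
        int kvm = open("/dev/kvm", O_RDWR);
        int vm  = ioctl(kvm, KVM_CREATE_VM, 0);

        /* Guest "kernel": mov al, 0x42 ; hlt -- HLT causes a VM exit */
        const unsigned char code[] = { 0xB0, 0x42, 0xF4 };
        void *mem = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        memcpy(mem, code, sizeof code);

        struct kvm_userspace_memory_region region = {
            .slot = 0, .guest_phys_addr = 0x1000,
            .memory_size = 0x1000, .userspace_addr = (unsigned long)mem,
        };
        ioctl(vm, KVM_SET_USER_MEMORY_REGION, &region);

        int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);
        struct kvm_run *run =
            mmap(NULL, ioctl(kvm, KVM_GET_VCPU_MMAP_SIZE, 0),
                 PROT_READ | PROT_WRITE, MAP_SHARED, vcpu, 0);

        struct kvm_sregs sregs;
        ioctl(vcpu, KVM_GET_SREGS, &sregs);
        sregs.cs.base = 0;                  /* flat 16-bit real mode */
        sregs.cs.selector = 0;
        ioctl(vcpu, KVM_SET_SREGS, &sregs);

        struct kvm_regs regs = { .rip = 0x1000, .rflags = 2 };
        ioctl(vcpu, KVM_SET_REGS, &regs);

        ioctl(vcpu, KVM_RUN, 0);            /* enter VMX non-root mode */
        if (run->exit_reason == KVM_EXIT_HLT)
            puts("guest executed HLT -> control returned to the host VMM");
        return 0;
    }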

Interrupt routing for PCIe slot directly connected to the CPUs

If we look at a Haswell architecture diagram today, we can see that there are PCIe lanes directly connected to the CPU (for graphics) as well as some routed to the Platform Controller Hub (the southbridge replacement).
If we look at the Intel 8 Series datasheet (the specification of the C222), we will find that the Intel C222 contains the I/O APIC used to route legacy INTx interrupts (Chapter 5.10). My question is: what happens if a legacy INTx interrupt request arrives directly at the CPU (over the PCIe 3.0 lanes)? Does it have to be forwarded to the C222 first, or is there another I/O APIC in the system agent that I will have to program in that case? Also, with Intel Virtualization Technology for Directed I/O there is now an additional indirection, the interrupt-remapping table. Is that table in the system agent (the former northbridge) on the CPU or in the C222, and does that mean all interrupts from the PCIe 3.0 lanes need to be routed to the C222 first when remapping is enabled?
Legacy INTx interrupt requests arriving at a root port in the CPU are forwarded to the I/O APIC in the PCH.
There is a separate VT-d instance in the CPU (perhaps even a separate instance per root port), so message-signaled interrupts arriving at a root port do not go through the PCH.

Regarding Hardware-assisted Virtualization

I am trying to understand hardware-assisted virtualization for a project with the ARM Cortex-A8 and the ARM TrustZone feature. I am new to this topic, so I started with the Wikipedia entries to understand more.
Wikipedia explains hardware-assisted virtualization and adds a line to the definition:
Full virtualization is used to simulate a complete hardware
environment, or virtual machine, in which an unmodified guest
operating system (using the same instruction set as the host machine)
executes in complete isolation.
The text in bold is a bit confusing. How is the same instruction set of the processor used to provide two isolated environments? Can someone explain it? The ARM TrustZone manual also talks of a "virtual processor core" to provide security. Please shed some light.
thanks
The phrase "using the same instruction set as the host machine" means that the guest OS is not aware of the virtualization layer and behaves as if it is executed on a real machine (with the same instruction set). This is in contrast to the para-virtualization paradigm in which the guest OS is aware of virtualization and calls some specific VMM functions, i.e. hypercalls.
No, the CPU does not need additional instructions for the guest: the virtual machine's instruction stream is the ordinary instruction set, and a hypervisor component called the VMM (virtual machine monitor) arranges for it to execute on the physical CPU, intercepting only the sensitive instructions.
A physical CPU with hardware-assisted virtualization (e.g., Intel VT-x) introduces a new operating mode (VMX non-root mode) that allows the virtual machine to run its kernel in ring 0, while privileged operations still trap out to the hypervisor.
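Regarding the "virtual processor core" the question mentions: TrustZone time-slices one physical core between a secure world and a normal world, and the world switch goes through the SMC instruction, which traps into the Secure Monitor. A rough sketch (privileged normal-world code on ARMv7 with the Security Extensions; the function ID is made up for illustration):

    #include <stdint.h>

    #define HYPOTHETICAL_FID 0x83000001u    /* made-up secure-service ID */

    /* Must run in a privileged mode: SMC is undefined in user mode. */
    static inline uint32_t smc_call(uint32_t fid, uint32_t arg)
    {
        register uint32_t r0 __asm__("r0") = fid;
        register uint32_t r1 __asm__("r1") = arg;

        /* Trap into the Secure Monitor, which saves the normal-world
         * context and resumes the secure world's "virtual core". */
        __asm__ volatile (".arch_extension sec\n\tsmc #0"
                          : "+r"(r0) : "r"(r1) : "memory");
        return r0;                          /* value returned by secure world */
    }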

Linux device driver for SMP system

I have developed a Linux block device driver for a CD device. The driver works well, but now there is a requirement that it run on an SMP system. When I did a test run on the SMP system, I found that the driver's performance degrades: the bit rate for a data CD has gone down tremendously compared to the single-core system. So I understand that my driver needs to be modified to make it SMP-safe.
In my driver, I have used:
1. Kernel threads
2. Mutex
3. Semaphore
4. Completions
My SMP system is: ARM Cortex-A9, dual core, 600 MHz
Can someone please tell me what factors I should keep in mind while doing this porting?
Normally, on SMP systems, shared resources (I/O resources) and global variables must be handled so that simultaneous execution on different cores cannot overwrite or corrupt the data. For this you can use spinlocks, semaphores, etc. to ensure that only one core operates on a given block of data at a time. That is the logical implementation: you have to identify the potentially risky areas in the device driver (such as the ISR and the read and write operations), identify the multiple entry points into your driver, and identify the central code in the driver toward which they all lead.
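As a starting point, here is a hedged sketch of the usual pattern (hypothetical device structure and function names, Linux kernel style): protect the state shared between the ISR and the read path with a spinlock so that two cores cannot touch it at the same time:

    #include <linux/interrupt.h>
    #include <linux/spinlock.h>

    struct mycd_dev {                  /* hypothetical per-device structure */
        spinlock_t   lock;             /* spin_lock_init() at probe time    */
        unsigned int sectors_pending;  /* shared between ISR and read path  */
    };

    static irqreturn_t mycd_isr(int irq, void *data)
    {
        struct mycd_dev *dev = data;
        unsigned long flags;

        spin_lock_irqsave(&dev->lock, flags);   /* also masks local IRQs */
        dev->sectors_pending--;
        spin_unlock_irqrestore(&dev->lock, flags);
        return IRQ_HANDLED;
    }

    static void mycd_queue_read(struct mycd_dev *dev, unsigned int nsect)
    {
        unsigned long flags;

        spin_lock_irqsave(&dev->lock, flags);   /* one core at a time here */
        dev->sectors_pending += nsect;
        spin_unlock_irqrestore(&dev->lock, flags);
    }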