I am trying to understand how virtualization was performed in the past using shadow page tables. The articles I've read all talk about the translation from guest virtual memory to host physical memory. I understand how shadow page tables eliminate the need for a separate guest-virtual-to-guest-physical translation. My question is: what happened to the host-virtual-to-host-physical step (HVA -> HPA)?
Do the Virtual Machine Managers in the cited articles not use virtual memory on the host at all? Are they assumed to have direct access to the physical memory of the host system? Is that even possible? I thought TLB translation is implemented in hardware by the MMU, and every instruction's addresses are translated from virtual to physical by the MMU itself. Then again, I am not sure how kernel code works with the TLB. Do kernel instructions not go through the TLB?
I am not sure I understood your point exactly, but I'll do my best to answer.
There is no need for an HVA -> HPA step because what the guest needs is the HPA, not the HVA: the HVA is irrelevant when the guest accesses its own guest memory region.
So the translation flow you might expect, without a shadow page table, is:
GVA -> GPA -> HVA -> HPA
But most hypervisors run in kernel mode, where they know how both the host's and the guest's memory is allocated, so they can map GPA to HPA directly and eliminate the need for the HVA step:
GVA -> GPA -> HPA
This guest memory translation flow has nothing to do with the hypervisor's userspace, whose own flow is simply HVA -> HPA.
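To make this concrete, here is a toy sketch of what the shadow page table achieves (all names and sizes are hypothetical): the hypervisor composes the guest's GVA -> GPA mapping with its own GPA -> HPA map and installs the combined result, so the hardware MMU only ever walks GVA -> HPA:

```c
#include <stdint.h>
#include <stdio.h>

#define PAGES 16
static uint64_t guest_pt[PAGES];   /* guest's page table: GVA page -> GPA page */
static uint64_t gpa_to_hpa[PAGES]; /* hypervisor's map:   GPA page -> HPA page */
static uint64_t shadow_pt[PAGES];  /* what the real MMU walks: GVA -> HPA */

static void shadow_fill(uint64_t gva_page) {
    uint64_t gpa_page = guest_pt[gva_page];   /* step the guest OS set up */
    uint64_t hpa_page = gpa_to_hpa[gpa_page]; /* step the hypervisor owns */
    shadow_pt[gva_page] = hpa_page;           /* one combined step for the MMU */
}

int main(void) {
    guest_pt[3] = 7;    /* guest maps GVA page 3 to GPA page 7 */
    gpa_to_hpa[7] = 12; /* hypervisor placed GPA page 7 at HPA page 12 */
    shadow_fill(3);
    printf("GVA page 3 -> HPA page %llu\n", (unsigned long long)shadow_pt[3]);
    return 0;
}
```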
Not sure if the above answers your question.
The answer can be yes or no, depending on the hypervisor. If yes, the hypervisor maps guest RAM into virtual memory on the host, so the host may swap it in and out of host RAM. If no, the hypervisor maps guest RAM into locked physical memory on the host.
VirtualBox is in the no group. VirtualBox runs a device driver in the host kernel, and uses this driver to allocate locked memory for guest RAM. Each page of guest RAM stays resident at a fixed host physical address, so the host can never swap out the page. Because of this, guest RAM must be smaller than host RAM. VirtualBox's manual says to spare at least 256 MB to 512 MB for the host.
The MMU can only map virtual addresses to physical addresses. In VirtualBox, the guest has an emulated MMU to map guest virtual addresses to guest physical addresses. VirtualBox has its own map of guest physical addresses to host physical addresses, and uses the host MMU to map guest virtual addresses to host physical addresses. Because of locked memory, the host physical addresses never become invalid.
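A rough userspace analogue of this "locked memory" approach: VirtualBox does it in its kernel driver, but mlock() shows the idea (a sketch only; it needs a sufficient RLIMIT_MEMLOCK):

```c
#include <stdio.h>
#include <sys/mman.h>

int main(void) {
    size_t guest_ram = 256UL * 1024 * 1024;  /* 256 MB of guest RAM */
    void *ram = mmap(NULL, guest_ram, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (ram == MAP_FAILED) { perror("mmap"); return 1; }
    if (mlock(ram, guest_ram) != 0) {        /* pin: pages stay resident */
        perror("mlock");
        return 1;
    }
    /* Pages now keep stable host physical addresses until munlock(). */
    return 0;
}
```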
Mac-on-Linux is in the yes group. I once used it to run a guest Mac OS 9 inside a host PowerPC Linux. I gave 256 MB of RAM to Mac OS 9, but my real Linux machine had only 64 MB of RAM. This worked because MOL maps guest RAM into host virtual memory, with an ordinary mmap() call in a user process. MOL then uses a Linux kernel module to control the host MMU.
But the host MMU can only map to host physical addresses, not virtual ones. The guest has an emulated MMU that maps guest virtual to guest physical. MOL adds a base address to translate guest physical to host virtual. MOL's kernel module uses the host map to translate host virtual to host physical, then uses the host MMU to map guest virtual to host physical.
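A sketch of the MOL-style arrangement described above, with guest RAM as an ordinary pageable mapping and GPA -> HVA as a simple base offset (names are mine):

```c
#include <stdio.h>
#include <sys/mman.h>

int main(void) {
    size_t guest_ram_size = 256UL * 1024 * 1024;   /* 256 MB for the guest */
    unsigned char *guest_ram_base =
        mmap(NULL, guest_ram_size, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);  /* pageable: host may swap it */
    if (guest_ram_base == MAP_FAILED) { perror("mmap"); return 1; }

    unsigned long gpa = 0x1000;                    /* some guest physical address */
    unsigned char *hva = guest_ram_base + gpa;     /* GPA -> HVA is just an offset */
    *hva = 0x42;                                   /* touch guest memory from the host */
    return 0;
}
```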
If Linux swaps out a page of guest RAM, then the host physical address becomes invalid, and the guest system might overwrite someone else's memory and crash the host. There must be some way to notify MOL that Linux has swapped out the page. MOL solved this problem by patching an internal Linux kernel function named flush_map_page or flush_map_pages.
KVM is also in the yes group. Linux added a kernel feature called MMU notifiers to support KVM. When QEMU uses KVM for virtualization, it allocates guest RAM in host virtual memory. The MMU notifier tells KVM when the host is swapping out a page.
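As a concrete illustration, this is roughly how a userspace hypervisor hands pageable host virtual memory to KVM as guest RAM. The structure and ioctl below are the real KVM API; the rest is a trimmed-down sketch with most error handling omitted:

```c
#include <fcntl.h>
#include <linux/kvm.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

int main(void) {
    int kvm = open("/dev/kvm", O_RDWR);
    if (kvm < 0) { perror("open /dev/kvm"); return 1; }
    int vm = ioctl(kvm, KVM_CREATE_VM, 0);
    if (vm < 0) { perror("KVM_CREATE_VM"); return 1; }

    size_t size = 64UL * 1024 * 1024;                /* 64 MB of guest RAM */
    void *ram = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (ram == MAP_FAILED) { perror("mmap"); return 1; }

    struct kvm_userspace_memory_region region = {
        .slot = 0,
        .guest_phys_addr = 0,                        /* GPA 0 onwards ...      */
        .memory_size = size,
        .userspace_addr = (unsigned long)ram,        /* ... backed by this HVA */
    };
    if (ioctl(vm, KVM_SET_USER_MEMORY_REGION, &region) < 0)
        perror("KVM_SET_USER_MEMORY_REGION");
    return 0;
}
```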
I saw the following discussion and had some questions:
live resize of a NVMe drive
If the physical capacity of the NVMe device changes (e.g., from 10 GB to 20 GB), how does the operating system detect it without rebooting?
In the linked discussion, re-scanning the PCI bus is the solution.
When the re-scan is executed, does the operating system ask the NVMe device to update its meta-information (e.g., capacity)?
How does the OS interact with the disk, specifically? (How does it read the changed device parameters from the disk, rather than the old parameters cached in memory?)
This is probably an AWS virtual machine, so the disk is actually a virtual disk. You can't resize a physical disk to upgrade its capacity in place (you'd need to swap in a different disk).
With that said, this machine probably runs on top of a type 1 hypervisor. What I understand about these is that the virtual machines (VMs) run as processes in a different ring on top of a minimal operating system (the hypervisor). When a VM executes a privileged instruction, it triggers a protection fault, and the hypervisor can then inspect who actually triggered the fault (was it the guest kernel, or a user-mode process within the guest?). If it was the guest kernel, the hypervisor can execute that instruction on its behalf. Otherwise, it will probably do what a real kernel would do (raise an exception). It can tell the difference because the guest kernel runs in a different ring than ring 3 (user mode).
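You can see this "privileged instruction triggers a fault" mechanism from an ordinary process. The sketch below (x86 Linux with gcc assumed) executes HLT, which is only legal in ring 0, from ring 3; instead of halting the CPU it traps with a general protection fault, delivered as SIGSEGV, which is exactly the kind of trap a hypervisor catches and emulates:

```c
#include <stdio.h>

int main(void) {
    printf("about to execute HLT in user mode...\n");
    __asm__ volatile("hlt");  /* privileged: traps with #GP -> SIGSEGV */
    return 0;                 /* never reached */
}
```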
With that said, the NVMe device itself isn't PCI; it's the host controller of the NVMe drive that sits on PCI. To rescan the NVMe drives, you read/write some memory-mapped registers and ask the NVMe PCI host controller for the size of each disk it has found. PCI is hot-pluggable (similar to USB) in some cases, but mostly not on consumer motherboards. I don't think you get an interrupt when a PCI device is hot-plugged, so you are left doing a rescan of the devices.
For NVMe, whether you get an interrupt when a disk is swapped or changes size depends on the host controller. As for virtual disks, it probably depends on a lot of different things; you could definitely be left doing a PCI rescan here. I guess it depends on the hypervisor, the OS, and the host controller configuration.
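On Linux the PCI rescan can be triggered through sysfs (equivalent to `echo 1 > /sys/bus/pci/rescan`, root required); a minimal sketch in C is below. For NVMe namespaces specifically, nvme-cli's `nvme ns-rescan` does the equivalent at the NVMe level.

```c
#include <stdio.h>

int main(void) {
    FILE *f = fopen("/sys/bus/pci/rescan", "w");  /* standard Linux interface */
    if (!f) { perror("fopen"); return 1; }
    fputs("1", f);  /* writing 1 asks the kernel to re-enumerate the bus */
    fclose(f);
    return 0;
}
```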
I recently started working on my own operating system. I am following jsandler18's awesome tutorial and making changes as I go to allow it to run on the Raspberry Pi 4.
Sadly, jsandler18 stopped updating the tutorial before he had finished the page on virtual memory. I read through some other sources and found a little problem: the ARM L1 address translation table divides the computer's RAM into 1 MB blocks, and it only allows up to 4096 entries, i.e. 4 GB of virtual memory.
Is there some way I can use the ARM MMU to translate more than 4GB of virtual memory?
The tutorial being referenced appears to target ARMv7, which can be thought of as 32-bit ARM. This is roughly equivalent to running in 32-bit PAE mode on x86. Following this example, it is therefore not possible to use more than 4 GB of virtual memory.
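To make the 4 GB limit concrete, here is a small sketch of the ARMv7 short-descriptor arithmetic the question describes (helper names are mine):

```c
#include <stdint.h>
#include <stdio.h>

/* The top 12 bits of a 32-bit VA index the L1 table, so
   2^12 = 4096 entries x 1 MB sections = exactly the 4 GB 32-bit space. */

static inline uint32_t l1_index(uint32_t va) {
    return va >> 20;                     /* each entry covers 1 MB = 2^20 B */
}

/* Build a 1 MB "section" descriptor: bits [31:20] hold the physical
   section base, bits [1:0] = 0b10 mark the entry as a section. */
static inline uint32_t l1_section_entry(uint32_t pa, uint32_t flags) {
    return (pa & 0xFFF00000u) | flags | 0x2u;
}

int main(void) {
    uint32_t va = 0x80100000u;           /* some virtual address */
    printf("VA 0x%08x -> L1 index %u of 4096\n", va, l1_index(va));
    printf("section entry: 0x%08x\n", l1_section_entry(0x40000000u, 0));
    return 0;
}
```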
ARMv8 (AArch64) supports 64-bit virtual addresses and would allow mapping more than 4 GB of virtual memory.
Switching into ARMv8 is done by switching exception levels, which are usually denoted EL0, EL1, EL2 and EL3. The one challenge you could run into is that once you enter AArch32 mode, you cannot drop to a lower exception level and switch to AArch64: going from EL1 64-bit to EL0 32-bit is supported, but going from EL1 32-bit to EL0 64-bit is not. This could pose a challenge if the firmware handing execution off to your OS is in AArch32 mode.
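If you want to check which exception level your code is running at, here is a minimal AArch64 helper you could drop into your kernel (a sketch only; note that reading CurrentEL traps at EL0, so this is for EL1+ bare-metal/kernel code):

```c
/* Read the CurrentEL system register; returns 0-3. */
static inline unsigned long current_el(void) {
    unsigned long el;
    __asm__ volatile("mrs %0, CurrentEL" : "=r"(el));
    return (el >> 2) & 0x3;   /* the EL number lives in bits [3:2] */
}
```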
Everywhere I look I can find how Docker differs from a virtual machine, but nowhere is there an answer to how basic OS containers differ from a virtual machine.
If we consider the basics, it looks like both are the same, i.e. an operating system running within an operating system.
Would anybody explain the underlying difference?
Virtual machines
Virtual machines use hardware virtualization: there is an additional layer between the real hardware and the virtual hardware, which the virtual machine believes is real.
This model doesn't reuse anything from the host OS. That's why you can run a Windows VM on a Linux host and vice versa.
System Containers
System containers use operating-system-level virtualization. They reuse the kernel of the host OS and partition the real hardware directly among the containers. There is no additional layer on the path to the real hardware, so the overhead (loss of performance) is practically zero.
On the other hand, you can't run a Windows container inside a Linux host OS, since the kernel isn't the same.
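To make "reuses the host kernel" concrete, here is a minimal sketch of the mechanism Linux containers are built on, namespaces, using clone() (requires root; error handling mostly omitted). The child runs on the same kernel as the host but gets its own hostname and PID namespace:

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

static char child_stack[1024 * 1024];

static int child(void *arg) {
    sethostname("container", 9);         /* only visible in our namespace */
    execlp("/bin/sh", "sh", (char *)NULL);
    perror("execlp");
    return 1;
}

int main(void) {
    /* CLONE_NEWUTS / CLONE_NEWPID give the child a private hostname
       and private PID numbering; the kernel itself is shared. */
    pid_t pid = clone(child, child_stack + sizeof(child_stack),
                      CLONE_NEWUTS | CLONE_NEWPID | SIGCHLD, NULL);
    if (pid < 0) { perror("clone"); return 1; }
    waitpid(pid, NULL, 0);
    return 0;
}
```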
I was reading about virtualization and a doubt popped into my head: how does virtualization work internally, at the operating system level? The topic was discussed in my class, and I came across this:
A virtual box runs like a process with some extra privileges on the host operating system.
My doubt is: if a VM is running as a process, then who provides these extra privileges so that it can actually interfere with the underlying OS and hardware resources?
I read what a Hypervisor is: http://searchservervirtualization.techtarget.com/definition/hypervisor
A hypervisor is the connection between the VM and the host OS. Running a hypervisor on a host OS means we run it as a user process. Again, my doubt is the same.
How can a user process (the hypervisor) control the host processor and resources? As far as I know, user processes don't have those rights.
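For what it's worth, the usual answer on Linux is that the user process never gets those rights itself: it asks a kernel module to do the privileged work on its behalf. A minimal sketch against the real /dev/kvm interface (error handling mostly omitted):

```c
#include <fcntl.h>
#include <linux/kvm.h>
#include <stdio.h>
#include <sys/ioctl.h>

int main(void) {
    int kvm = open("/dev/kvm", O_RDWR);        /* ring 3 asks the kernel... */
    if (kvm < 0) { perror("open /dev/kvm"); return 1; }
    printf("KVM API version: %d\n", ioctl(kvm, KVM_GET_API_VERSION, 0));
    int vm = ioctl(kvm, KVM_CREATE_VM, 0);     /* ...to create a VM...      */
    int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);  /* ...and a virtual CPU      */
    printf("vm fd = %d, vcpu fd = %d\n", vm, vcpu);
    return 0;
}
```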
Thanks in advance.
I am trying to understand hardware-assisted virtualization for a project with the ARM Cortex-A8, using the ARM TrustZone feature. I am new to this topic, so I started with the Wikipedia entries to understand more.
Wikipedia explains hardware-assisted virtualization and adds this line in the definition:
Full virtualization is used to simulate a complete hardware
environment, or virtual machine, in which an unmodified guest
operating system (using the same instruction set as the host machine)
executes in complete isolation.
The text in bold ("using the same instruction set as the host machine") is a bit confusing. How is the same instruction set of the processor used to provide two isolated environments? Can someone explain it? The ARM TrustZone manual also talks of a "virtual processor core" to provide security. Please throw some light.
thanks
The phrase "using the same instruction set as the host machine" means that the guest OS is not aware of the virtualization layer and behaves as if it is executed on a real machine (with the same instruction set). This is in contrast to the para-virtualization paradigm in which the guest OS is aware of virtualization and calls some specific VMM functions, i.e. hypercalls.
No, CPU has not additional instructions. Virtual machine instruction set is translated by a hypervisor component called VMM (virtual machine manager) to be executed on the physical CPU.
Physical CPU with assisted Virtualization introduced only a new ring 0 mode called VMX that allow the virtual machine to execute some instructions in ring 0.