What happens in the kernel when modprobe is run - operating-system

I got this question in an interview. I just know that some firmware communication takes place. My understanding was: once modprobe is run, an interrupt is triggered, and the kernel handles this interrupt according to priority and loads the driver.
I also wanted to know the difference between modprobe and insmod.
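For what it's worth, no interrupt is involved: modprobe is an ordinary userspace program. It resolves module dependencies from modules.dep, then asks the kernel to load each module through the init_module(2)/finit_module(2) system calls; insmod skips dependency resolution and issues the same call on the single file it is given. A minimal C sketch of that final step, assuming a hypothetical prebuilt hello.ko and a reasonably recent Linux kernel:

    /* Minimal sketch: load a kernel module the way modprobe/insmod
     * ultimately do, via the finit_module(2) syscall (Linux >= 3.8).
     * "hello.ko" is a hypothetical prebuilt module; run as root. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("hello.ko", O_RDONLY | O_CLOEXEC);
        if (fd < 0) { perror("open"); return 1; }
        /* An ordinary system call, not an interrupt handler: the kernel
         * links the module into its address space and runs its init. */
        if (syscall(SYS_finit_module, fd, "", 0) != 0) {
            perror("finit_module");
            return 1;
        }
        close(fd);
        return 0;
    }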

Related

Change kernel switch_root with JasperE84/root-ro script

I'm currently using a slightly modified version of the script https://github.com/JasperE84/root-ro to boot the system from a squashfs image. It works almost as expected.
It does boot into the new read-only filesystem from the image; however, it boots with the kernel from the "main" system, the one the initramfs was built on. I tried the switch_root command from the initramfs, but I can't get it working; in fact, since this script creates an overlay, I don't think I should be using switch_root at all.
Could somebody help me with an idea or a solution for booting the kernel that is in the read-only image instead of the one the initramfs was built with?
Uros
If you want to use the kernel inside a squashfs file, you either need a boot loader that can read squashfs files, or you need to use kexec: boot a kernel that your boot loader can read, then jump into a kernel from any filesystem that the first kernel can read.
To elaborate on the kexec option, you would have:
1. A kernel and initramfs stored normally in the boot partition on a common filesystem.
2. A simple init script in the initramfs that mounts the squashfs file and then finds the new kernel.
3. A call to kexec to switch to the new kernel (a minimal sketch of this step follows below).
4. A second initramfs that again mounts the squashfs (the mount is lost during kexec), initializes the overlay as in your example, and finishes booting the system.
switch_root might still be needed in the second initramfs, but it only changes the view of the filesystem for userspace. It doesn't change kernels.
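A hedged C sketch of step 3, using the kexec_file_load(2) and reboot(2) syscalls that the kexec(8) tool wraps; the paths and command line are illustrative, and in a real initramfs you would more likely run kexec(8) itself:

    /* Hedged sketch of step 3: stage a kernel from the mounted squashfs
     * and jump into it, via the syscalls that kexec(8) wraps.
     * Paths and command line are illustrative; needs CONFIG_KEXEC_FILE. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <linux/reboot.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int main(void)
    {
        int kfd = open("/mnt/squash/boot/vmlinuz", O_RDONLY);
        int ifd = open("/mnt/squash/boot/initrd.img", O_RDONLY);
        const char *cmdline = "root=/dev/ram0 ro";
        if (kfd < 0 || ifd < 0) { perror("open"); return 1; }
        /* Stage the new kernel and initramfs (length includes the NUL). */
        if (syscall(SYS_kexec_file_load, kfd, ifd,
                    strlen(cmdline) + 1, cmdline, 0UL) != 0) {
            perror("kexec_file_load");
            return 1;
        }
        sync();
        /* Jump into the staged kernel; does not return on success. */
        syscall(SYS_reboot, LINUX_REBOOT_MAGIC1, LINUX_REBOOT_MAGIC2,
                LINUX_REBOOT_CMD_KEXEC, NULL);
        perror("reboot");
        return 1;
    }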
U-Boot could simplify this by loading the initial kernel directly from the squashfs file, but I've never used it and don't know whether it works on the Raspberry Pi, so I can't make a recommendation.

Speeding up syscall in Go

I've been working on a VPN written in Go, and I'm starting to optimize the data flow. From a cursory glance, the implementation code seems sound: there are no memory leaks, and CPU doesn't seem to be a constraint.
So I moved to pprof, and the problem I am seeing is that most of the execution time is spent in syscall.Syscall. I took a 6-second profile of a running iperf throughput test, and this is what I see:
This test is run with both the client and the server inside Docker containers, with the client given a --link to the server. Running iperf over the base bridge network yields around 40 Gbit/s of throughput; iperf through this VPN implementation on top of the same network nets about 500 Mbit/s.
A quick htop shows that about 3/4 of the time is spent in the system.
I've tried a couple of approaches to speed up the single-client case, but I can't seem to find a way to avoid the per-packet write cost in a VPN server... NB: iperf uses full MTU-sized packets during its test, which rules out some obvious optimizations.
Listing Syscall:
I'm not sure why this shows the CMPQ taking all the time; I'd think that should be attributed to SYSCALL.
pprof is a sampling profiler. While the OS is off executing the system call, samples are attributed to the instruction the Program Counter (PC) will resume at, which is the CMPQ immediately after the SYSCALL instruction, so the kernel time shows up on CMPQ.
As for speeding up the syscalls themselves: you can make the SYSCALL happen less often, you can improve the OS's SYSCALL mechanism, you can improve the OS code that you asked the SYSCALL to execute, you can use better hardware, and so on. A sketch of the first option follows below.
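For a VPN data path, "make the SYSCALL less often" usually means batching several packets into one kernel crossing. A hedged sketch of the idea, shown in C for concreteness using Linux's sendmmsg(2); the buffer layout and names such as send_batch and NPKT are illustrative:

    /* Hedged sketch: amortize syscall cost by sending up to NPKT UDP
     * packets with one sendmmsg(2) call instead of one syscall per
     * packet. Names (send_batch, NPKT, bufs, lens) are illustrative. */
    #define _GNU_SOURCE
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    #define NPKT 32
    #define MTU  1500

    /* sock must be a connected UDP socket; returns the number of
     * packets handed to the kernel, or -1 on error. */
    int send_batch(int sock, char bufs[NPKT][MTU], size_t lens[NPKT], int n)
    {
        struct mmsghdr msgs[NPKT];
        struct iovec iovs[NPKT];
        memset(msgs, 0, sizeof(msgs));
        for (int i = 0; i < n && i < NPKT; i++) {
            iovs[i].iov_base = bufs[i];
            iovs[i].iov_len  = lens[i];
            msgs[i].msg_hdr.msg_iov    = &iovs[i];
            msgs[i].msg_hdr.msg_iovlen = 1;
        }
        /* One user/kernel crossing for the whole batch. */
        return sendmmsg(sock, msgs, n < NPKT ? n : NPKT, 0);
    }

The same principle, fewer kernel crossings per packet, is what you would look for on the TUN side of the VPN as well, though the batching options there are more limited.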

How does VirtualBox-like virtualization work? (some technical details required)

First consider the situation where only one operating system is installed. Now I run some executable. The processor reads instructions from the executable file and performs them. Even though I can put whatever instructions I want into the file, my program can't read arbitrary areas of the HDD (or do many other potentially "bad" things).
It looks like magic, but I understand how this magic works. The operating system starts my program and puts the processor into some "unprivileged" state. "Unsafe" processor instructions are not allowed in this state, and the only way to put the processor back into the "privileged" state is to give control back to the kernel. Kernel code can use all of the processor's instructions, so it can do the potentially unsafe things my program "asked" for, if it decides they are allowed.
Now suppose we have VMWare or VirtualBox on a Windows host, and the guest operating system is Linux. I run a program in the guest, and it transfers control to the guest Linux kernel. The guest Linux kernel's code is supposed to run in the processor's "privileged" mode (it must contain the "unsafe" processor instructions!). But I strongly doubt that it has unlimited access to all of the computer's resources.
I don't need too many technical details; I only want to understand how this part of the magic works.
This is a great question and it really hits on some cool details regarding security and virtualization. I'll give a high-level overview of how things work on an Intel processor.
How are normal processes managed by the operating system?
An Intel processor has 4 different "protection rings", and it is in exactly one of them at any time. The ring that code is currently running in determines which assembly instructions may run: ring 0 can run all of the privileged instructions, whereas ring 3 cannot run any privileged instructions.
The operating system kernel always runs in ring 0. This ring allows the kernel to execute the privileged instructions it needs in order to control memory, start programs, write to the HDD, etc.
User applications run in ring 3. This ring does not permit privileged instructions (e.g. those for writing to the HDD) to run. If an application attempts to run a privileged instruction, the processor will take control from the process and raise an exception that the kernel will handle in ring 0; the kernel will likely just terminate the process.
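You can observe this from userspace. A hedged demo, assuming Linux on x86-64, where the resulting general-protection fault is delivered to the process as SIGSEGV: it executes the privileged cli instruction in ring 3 and catches the fault instead of crashing.

    /* Hedged demo: run the privileged cli instruction in ring 3.
     * On Linux/x86-64 the CPU raises a general-protection fault, which
     * the kernel delivers to the process as SIGSEGV. */
    #include <signal.h>
    #include <stdio.h>
    #include <unistd.h>

    static void on_fault(int sig)
    {
        (void)sig;
        static const char msg[] = "ring 3 refused the privileged instruction\n";
        write(STDOUT_FILENO, msg, sizeof(msg) - 1);
        _exit(0);
    }

    int main(void)
    {
        signal(SIGSEGV, on_fault);
        __asm__ volatile("cli");    /* privileged: disable interrupts */
        puts("unreachable on a normal kernel");
        return 0;
    }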
Rings 1 and 2 tend not to be used in practice, though they do have uses.
How does virtualization work?
Before there was hardware support for virtualization, a virtual machine monitor (such as VMWare) would need to do something called binary translation (see this paper). At a high level, this consists of the VMM inspecting the binary of the guest operating system and emulating the behavior of the privileged instructions in a safe manner.
Now there is hardware support for virtualization in Intel processors (look up Intel VT-x). In addition to the four rings mentioned above, the processor has two states, each of which contains four rings: VMX root mode and VMX non-root mode.
The host operating system and its applications, along with the VMM (such as VMWare), run in VMX root mode. The guest operating system and its applications run in VMX non-root mode. Again, both of these modes each have their own four rings, so the host OS runs in ring 0 of root mode, the host OS applications run in ring 3 of root mode, the guest OS runs in ring 0 of non-root mode, and the guest OS applications run in ring 3 of non-root mode.
When code that is running in ring 0 of non-root mode attempts to execute a privileged instruction, the processor will hand control back to the host operating system running in root mode so that the host OS can emulate the effects and prevent the guest from having direct access to privileged resources (or in some cases, the processor hardware can just emulate the effect itself without getting the host involved). Thus, the guest OS can "execute" privileged instructions without having unsafe access to hardware resources - the instructions are just intercepted and emulated. The guest cannot just do whatever it wants - only what the host and the hardware allow.
Just to clarify, code running in ring 3 of non-root mode will cause an exception to be sent to the guest OS if it attempts to execute a privileged instruction, just as an exception will be sent to the host OS if code running in ring 3 of root mode attempts to execute a privileged instruction.

How to stop all cores and run my code (in kernel space) on all the cores, in FreeBSD?

In the FreeBSD kernel, how can I first stop all the cores and then run my code (which can live in a kernel module) on all the cores? When it finishes, the cores should restore their contexts and continue executing.
Linux has APIs for this; I believe FreeBSD also has a set of APIs to do it.
edit:
Most likely I did not make clear what I want to do. First, the machine is x86_64 SMP.
I set a timer; when it expires, I want to stop all the threads (including kernel threads) on all cores, save their contexts, run my code on one core to do some kernel work, and, when it finishes, restore the contexts and let everything continue running, periodically. The other kernel threads and processes should not be affected (their relative priorities should not change).
I assume that your "code" (the kernel module) already takes advantage of SMP inherently.
So, one approach is:
1. Set the affinity of all your processes/threads to your desired CPUs (sched_setaffinity on Linux; cpuset_setaffinity(2) is the FreeBSD equivalent).
2. Set each of your threads to use real-time (RT) scheduling.
If it is a kernel module, you can (I believe) do this manually in your module by changing the scheduling policy for your task_struct to SCHED_RR (or SCHED_FIFO) after pinning each process to a core.
In userspace, you can use the FreeBSD rtprio command (http://www.freebsd.org/cgi/man.cgi?query=rtprio&sektion=1):
    rtprio, idprio -- execute, examine or modify a utility's or process's
    realtime or idletime scheduling priority
The effect will be that your code runs before any other non-essential process in the system until it finishes. A userspace sketch of this setup follows below.
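A hedged FreeBSD userspace sketch of the pin-and-prioritize setup, using cpuset_setaffinity(2) and the rtprio(2) syscall behind the rtprio command; the CPU number and priority value are illustrative:

    /* Hedged FreeBSD sketch: pin the current process to one CPU and
     * give it realtime priority, the programmatic form of cpuset(1)
     * plus rtprio(1). CPU 0 and priority 0 (highest) are illustrative. */
    #include <sys/param.h>
    #include <sys/cpuset.h>
    #include <sys/rtprio.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        cpuset_t set;
        CPU_ZERO(&set);
        CPU_SET(0, &set);               /* pin to CPU 0 */
        if (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_PID, -1,
                               sizeof(set), &set) != 0) {
            perror("cpuset_setaffinity");
            return 1;
        }

        struct rtprio rtp = { .type = RTP_PRIO_REALTIME, .prio = 0 };
        if (rtprio(RTP_SET, 0, &rtp) != 0) {   /* pid 0 = this process */
            perror("rtprio");
            return 1;
        }
        /* ... run the time-critical work here ... */
        return 0;
    }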

How does a hypervisor know that a privileged instruction happened inside a VM?

I've started reading about VMMs and wondered how the hypervisor knows that a privileged instruction (for example, cpuid) happened inside a VM and not in the real OS.
Let's say I've executed cpuid: a trap will occur and a VMEXIT will happen. How would the hypervisor know whether the instruction happened inside my regular OS or inside a VM?
First off, you are using the wrong terminology. When an OS runs on top of a hypervisor, the OS becomes the VM (virtual machine) itself, and the hypervisor is the VMM (= virtual machine monitor). A VM can also be called a "guest". Thus: OS on top of hypervisor = VM = guest (these expressions mean the same thing).
Secondly, you tell the CPU that it's executing inside the VM from the moment you execute VMLAUNCH or VMRESUME, assuming you're reading about Intel VMX. When the VM causes a hypervisor trap for some reason, we say that "a VM-exit occurred", and the CPU knows it's no longer executing inside the VM. Thus:
Between a VMLAUNCH/VMRESUME execution and a VM-exit, we are in the VM, and CPUID will trap (causing a VM-exit).
Between a VM-exit and the next VMLAUNCH/VMRESUME execution, we are in the VMM (= hypervisor), and CPUID will NOT trap, since we already are in the hypervisor.
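A heavily simplified, hedged sketch of the VMM side of this cycle, assuming Intel VMX; vmresume_guest(), vmread(), vmwrite(), and emulate_cpuid() are hypothetical wrappers around VMRESUME/VMREAD/VMWRITE and the CPUID emulation logic:

    /* Hedged sketch of a VMX exit loop: CPUID executed in the guest
     * causes an unconditional VM-exit (basic exit reason 10 on Intel),
     * which the VMM emulates before resuming the guest. */
    #include <stdint.h>

    #define EXIT_REASON_CPUID     10
    #define VMCS_EXIT_REASON      0x4402
    #define VMCS_EXIT_INSTR_LEN   0x440C
    #define VMCS_GUEST_RIP        0x681E

    extern void     vmresume_guest(void);    /* executes VMRESUME */
    extern uint64_t vmread(uint64_t field);  /* executes VMREAD  */
    extern void     vmwrite(uint64_t field, uint64_t value);
    extern void     emulate_cpuid(void);     /* fill guest RAX..RDX */

    void vmm_exit_loop(void)
    {
        for (;;) {
            /* Enter the guest; control comes back here on the next
             * VM-exit, via the host-state RIP stored in the VMCS. */
            vmresume_guest();
            uint64_t reason = vmread(VMCS_EXIT_REASON) & 0xFFFF;
            if (reason == EXIT_REASON_CPUID) {
                emulate_cpuid();
                /* Skip the guest's CPUID instruction before resuming. */
                uint64_t rip = vmread(VMCS_GUEST_RIP);
                vmwrite(VMCS_GUEST_RIP,
                        rip + vmread(VMCS_EXIT_INSTR_LEN));
            }
            /* ...other exit reasons dispatched here... */
        }
    }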
Privileged instructions generate exceptions when executed in user mode; the exception is usually an undefined-instruction exception (or, on x86, a general-protection fault). The hypervisor hooks this exception, inspects the faulting instruction, emulates it, and then returns control to the VM. When the host OS executes the same instruction, it is running at supervisor (elevated) privilege, so usually no exception is generated. So generally, these issues are handled by the CPU itself.
However, if an instruction is not available on the processor at all (say, floating-point emulation), then the hypervisor may emulate it for the VM, or chain to the OS handler if it does not. It may even allow the OS to handle the emulation for both VMs and user tasks in the OS.
So in general, this question is unanswerable for a generic CPU: it depends on how the instruction is emulated for the VM. The best case is that the hypervisor does not emulate any OS instructions at all, since emulation slows down not only the VMs but the entire machine, including user processes in the host OS.