I have developed a Linux block device driver for a CD device. The driver works well, but now there is a requirement that it run on an SMP system. When I did a test run on the SMP system, I found that the driver's performance degraded: the bit rate for a data CD dropped dramatically compared to the single-core system. So I understand that my driver needs to be modified to make it SMP safe.
In my driver, I have used:
1. Kernel threads
2. Mutex
3. Semaphore
4. Completions
My SMP system is an ARM Cortex-A9 dual-core running at 600 MHz.
Can someone please tell me which factors I should keep in mind while doing this port?
Normally on SMP systems, shared resources (I/O resources and global variables) must be handled so that simultaneous execution on different cores cannot overwrite or corrupt data. You can use spinlocks, semaphores, etc. to ensure that only one core operates on a given piece of data at a time. That is the logical model; in practice you have to identify the potentially risky areas in the device driver, such as the ISR and the read and write paths, map out the driver's multiple entry points, and work out which central tasks in the driver they converge on.
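For example, if both the request path and the interrupt handler touch the same per-device state, a spinlock is the usual tool. A minimal sketch, assuming a hypothetical cd_state structure (the field names are illustrative, not from your driver):

```c
#include <linux/spinlock.h>
#include <linux/interrupt.h>

/* Hypothetical state shared by the request path and the ISR. */
struct cd_state {
	spinlock_t lock;		/* protects bytes_done and status */
	unsigned long bytes_done;
	int status;
};

static struct cd_state cd = {
	.lock = __SPIN_LOCK_UNLOCKED(cd.lock),
};

/* Process context (e.g. the request function). */
static void cd_account(unsigned long n)
{
	unsigned long flags;

	/* The _irqsave variant is needed because the ISR takes the
	 * same lock: it masks interrupts on this core so the ISR
	 * cannot deadlock against us. */
	spin_lock_irqsave(&cd.lock, flags);
	cd.bytes_done += n;
	spin_unlock_irqrestore(&cd.lock, flags);
}

/* Interrupt context. */
static irqreturn_t cd_isr(int irq, void *dev)
{
	spin_lock(&cd.lock);	/* interrupts already masked here */
	cd.status = 1;		/* hypothetical completion flag */
	spin_unlock(&cd.lock);
	return IRQ_HANDLED;
}
```

Keep the critical sections short: on a dual-core Cortex-A9, time one core spends spinning is throughput lost on the other, which can itself be a source of SMP slowdown.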
I recently started working on my own operating system. I am following jsandler18's awesome tutorial and making changes as I go to allow it to run on the Raspberry Pi 4.
Sadly, jsandler18 stopped updating the tutorial before he had finished the page on virtual memory. I read through some other sources and found a problem: the ARM L1 address translation table divides the virtual address space into 1 MB sections, and it only allows up to 4096 entries, i.e. 4 GB of virtual memory.
Is there some way I can use the ARM MMU to translate more than 4GB of virtual memory?
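To make the limit concrete, here is the index arithmetic as a tiny standalone sketch (the address is arbitrary):

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t va = 0x20301000u;      /* example virtual address */
    uint32_t index = va >> 20;      /* top 12 bits pick the 1 MB section */

    printf("L1 index %u of %u entries\n", index, 1u << 12);
    /* 4096 entries x 1 MB = 4 GB: exactly the 32-bit space, no more. */
    return 0;
}
```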
The tutorial being referenced appears to be executing in ARMv7, which can be thought of as 32-bit ARM. This is roughly equivalent to running in 32-bit PAE mode on x86. Thus, using this approach, it is not possible to use more than 4 GB of virtual memory.
ARMv8 (AArch64) supports 64-bit virtual addresses and would allow mapping more than 4 GB of virtual memory.
Switching between AArch32 and AArch64 happens only on a change of exception level; the levels are usually denoted EL0, EL1, EL2 and EL3. The one challenge you could run into is that once you enter AArch32 at some level, a lower exception level cannot run in AArch64: going from EL1 64-bit -> EL0 32-bit is supported, but going EL1 32-bit -> EL0 64-bit is not. This could pose a challenge if the firmware handing execution off to your OS is running in AArch32 mode.
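If you are unsure which state your boot code ends up in, here is a minimal AArch64 sketch for reading the current exception level (GCC/Clang inline assembly; CurrentEL is readable at EL1 and above, so this belongs in early kernel code, not a user program):

```c
#include <stdint.h>

/* Returns 0-3 for EL0-EL3 when built for AArch64. */
static inline unsigned int current_el(void)
{
    uint64_t el;

    __asm__("mrs %0, CurrentEL" : "=r"(el));
    return (el >> 2) & 3;   /* the level lives in bits [3:2] */
}
```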
First, consider the situation when there is only one operating system installed. Now I run some executable. The processor reads instructions from the executable file and performs them. Even though I can put whatever instructions I want into the file, my program can't read arbitrary areas of the HDD (or do many other potentially "bad" things).
It looks like magic, but I understand how this magic works. The operating system starts my program and puts the processor into some "unprivileged" state. "Unsafe" processor instructions are not allowed in this state, and the only way to put the processor back into the "privileged" state is to give control back to the kernel. Kernel code can use all of the processor's instructions, so it can do the potentially unsafe things my program "asked" for, if it decides they are allowed.
Now suppose we have VMware or VirtualBox on a Windows host, and the guest operating system is Linux. I run a program in the guest; it transfers control to the guest Linux kernel. The guest Linux kernel's code is supposed to run in the processor's "privileged" mode (it must contain "unsafe" processor instructions!). But I strongly doubt that it has unlimited access to all the computer's resources.
I do not need too many technical details; I only want to understand how this part of the magic works.
This is a great question and it really hits on some cool details regarding security and virtualization. I'll give a high-level overview of how things work on an Intel processor.
How are normal processes managed by the operating system?
An Intel processor has four "protection rings", and at any moment it is executing in exactly one of them. The ring that code is currently running in determines which assembly instructions may run: ring 0 can run all of the privileged instructions, whereas ring 3 cannot run any of them.
The operating system kernel always runs in ring 0. This ring allows the kernel to execute the privileged instructions it needs in order to control memory, start programs, write to the HDD, etc.
User applications run in ring 3. This ring does not permit privileged instructions (e.g. those for writing to the HDD) to run. If an application attempts to run a privileged instruction, the processor will take control from the process and raise an exception that the kernel will handle in ring 0; the kernel will likely just terminate the process.
Rings 1 and 2 are rarely used in practice, though they were intended for intermediate privilege levels such as device drivers.
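You can observe the ring you are in from an ordinary program: the low two bits of the CS selector hold the Current Privilege Level (CPL). A small x86 sketch using GCC/Clang inline assembly:

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint16_t cs;

    __asm__("mov %%cs, %0" : "=r"(cs));
    /* CPL is in bits [1:0]; a user program prints 3. */
    printf("running in ring %d\n", cs & 3);
    return 0;
}
```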
How does virtualization work?
Before there was hardware support for virtualization, a virtual machine monitor (such as VMware) had to do something called binary translation. At a high level, the VMM inspects the guest operating system's binary and emulates the behavior of the privileged instructions in a safe manner.
Now there is hardware support for virtualization in Intel processors (look up Intel VT-x). In addition to the four rings mentioned above, the processor has two modes, VMX root mode and VMX non-root mode, each of which contains its own four rings.
The host operating system and its applications, along with the VMM (such as VMWare), run in VMX root mode. The guest operating system and its applications run in VMX non-root mode. Again, both of these modes each have their own four rings, so the host OS runs in ring 0 of root mode, the host OS applications run in ring 3 of root mode, the guest OS runs in ring 0 of non-root mode, and the guest OS applications run in ring 3 of non-root mode.
When code that is running in ring 0 of non-root mode attempts to execute a privileged instruction, the processor will hand control back to the host operating system running in root mode so that the host OS can emulate the effects and prevent the guest from having direct access to privileged resources (or in some cases, the processor hardware can just emulate the effect itself without getting the host involved). Thus, the guest OS can "execute" privileged instructions without having unsafe access to hardware resources - the instructions are just intercepted and emulated. The guest cannot just do whatever it wants - only what the host and the hardware allow.
Just to clarify, code running in ring 3 of non-root mode will cause an exception to be sent to the guest OS if it attempts to execute a privileged instruction, just as an exception will be sent to the host OS if code running in ring 3 of root mode attempts to execute a privileged instruction.
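Whether a processor provides these VMX modes can be checked from user space with the CPUID instruction; leaf 1 reports VMX support in bit 5 of ECX. A minimal sketch using GCC's cpuid.h helper:

```c
#include <stdio.h>
#include <cpuid.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        puts("CPUID leaf 1 not supported");
        return 1;
    }
    /* CPUID.01H:ECX bit 5 is the VMX feature flag. */
    printf("VT-x (VMX): %s\n", (ecx & (1u << 5)) ? "present" : "absent");
    return 0;
}
```

(Note that a hypervisor may mask this bit from its guests, and firmware can leave VMX disabled even when the bit is set.)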
In a single-processor system, when powered on, the processor starts executing the boot ROM code and the multiple stages of the boot. But how does this work in a multiprocessor system? Does one processor act as the master? Who decides which processor is the master and which are the helpers?
How and where is it configured?
Are the page tables shared between the processors? The processor caches are obviously separate, at least the L1 caches are.
Multiprocessor Booting
1. One processor is designated the bootstrap processor (BSP).
– The designation is done either by hardware or by the BIOS.
– All other processors are designated application processors (APs).
2. The BIOS boots the BSP.
3. The BSP learns the system configuration.
4. The BSP triggers the boot of the other APs.
– Done by sending a Startup IPI (inter-processor interrupt) to each AP (sketched below).
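Step 4 in practice: the BSP writes the target AP's APIC ID and an INIT/Startup-IPI command into the local APIC's Interrupt Command Register. A rough sketch (addresses and delays are simplified; real code reads the APIC base from the IA32_APIC_BASE MSR and inserts the delays the MP specification calls for):

```c
#include <stdint.h>

#define LAPIC_BASE   0xFEE00000u    /* common xAPIC default */
#define LAPIC_ICR_LO (LAPIC_BASE + 0x300)
#define LAPIC_ICR_HI (LAPIC_BASE + 0x310)

static void lapic_write(uint32_t reg, uint32_t val)
{
    *(volatile uint32_t *)(uintptr_t)reg = val;
}

/* Wake one AP: INIT, then a Startup IPI telling it to begin
 * executing at physical address (vector << 12). */
static void wake_ap(uint8_t apic_id, uint8_t vector)
{
    lapic_write(LAPIC_ICR_HI, (uint32_t)apic_id << 24);
    lapic_write(LAPIC_ICR_LO, 0x00004500);              /* INIT, assert */
    /* ...wait ~10 ms... */
    lapic_write(LAPIC_ICR_HI, (uint32_t)apic_id << 24);
    lapic_write(LAPIC_ICR_LO, 0x00004600 | vector);     /* Startup IPI */
    /* ...wait ~200 us; many systems send the SIPI twice... */
}
```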
I am using QEMU on Fedora and find that it does not seem to support multi-core. When I use the -smp parameter and set cores=2, it tells me:
qemu-system-riscv: Number of SMP CPUs requested (2) exceeds max CPUs supported by machine 'riscv'
In general, QEMU can support multicore guests, yes. However the number of cores supported depends on the particular board (machine) model you're using. The error message is telling you that the 'riscv' machine you've asked for only supports one CPU.
(In TCG emulation at the moment multicore guests won't be any faster than a single core guest because we don't use all the host cores; this should change in QEMU 2.9 for at least some host/guest combinations when multithreaded TCG support lands. KVM supports multicore guests with no problems.)
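For RISC-V specifically, a different board model may help; for example, `qemu-system-riscv64 -machine virt -smp 2 ...` works on builds where the 'virt' board supports multiple harts. Run `qemu-system-riscv64 -machine help` to see which boards your build offers.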
In the FreeBSD kernel, how can I first stop all the cores, then run my code (it can be a kernel module) on all the cores, and, when finished, let them restore their contexts and continue executing?
Linux has APIs like this; I believe FreeBSD also has a set of APIs to do this.
edit:
Most likely I did not make clear what I want to do. First, the machine is x86_64 SMP.
I set a timer; when it expires, I want to stop all the threads (including kernel threads) on all cores, save their contexts, run my code on one core to do some kernel work, and, when finished, restore the contexts and let them continue running, periodically. The other kernel threads and processes should not be affected (no change to their relative priorities).
I assume that your "code" (the kernel module) actually takes advantage of SMP inherently already.
So, one approach is:
Set the affinity of all your processes/threads to the desired CPUs (on FreeBSD, cpuset_setaffinity(2) is the analogue of Linux's sched_setaffinity).
Set each of your threads to use real-time (RT) scheduling.
If it is a kernel module, you can do this manually in your module (I believe) by changing the scheduling policy for your task_struct to SCHED_RR (or SCHED_FIFO) after pinning each process to a core.
In userspace, you can use the FreeBSD rtprio command (http://www.freebsd.org/cgi/man.cgi?query=rtprio&sektion=1):
rtprio, idprio -- execute, examine or modify a utility's or process's
realtime or idletime scheduling priority
The effect: your code runs ahead of any other non-essential process in the system until it finishes.
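As a concrete starting point, here is a minimal userspace sketch of that pin-plus-realtime approach on FreeBSD (the CPU number and priority are just examples):

```c
#include <sys/param.h>
#include <sys/cpuset.h>
#include <sys/rtprio.h>
#include <err.h>
#include <stdio.h>

int main(void)
{
    cpuset_t mask;
    struct rtprio rtp = { .type = RTP_PRIO_REALTIME, .prio = 0 };

    /* Pin the calling process to CPU 0. */
    CPU_ZERO(&mask);
    CPU_SET(0, &mask);
    if (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_PID, -1,
                           sizeof(mask), &mask) != 0)
        err(1, "cpuset_setaffinity");

    /* Highest realtime priority, like `rtprio 0 <command>`. */
    if (rtprio(RTP_SET, 0, &rtp) != 0)
        err(1, "rtprio");

    puts("pinned to CPU 0 with realtime priority");
    /* ...time-critical work goes here... */
    return 0;
}
```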