Build an RT-application using PREEMPT_RT - real-time

I would like to write real-time Linux programs using the PREEMPT_RT patch. I found the official wiki (https://rt.wiki.kernel.org/index.php/HOWTO:_Build_an_RT-application). There are some code examples, but I would like an explanation of the available RT functions.
Thank you,

It is important to underline that PREEMPT_RT is a patch that changes the internal code of the Linux kernel to reduce the maximum latency experienced by a user-level process. This is done by, for example, converting spinlocks into preemptible real-time mutexes, using threaded interrupts (i.e., hardware interrupt handlers run in separate kernel threads), and so on. Therefore, it doesn't provide any API for user-level programming, and you still rely on the standard libc and system call primitives. Just patch, configure and re-install the kernel (or, alternatively, install a pre-built PREEMPT_RT kernel).
You should still, of course, follow good real-time programming practice to avoid delays and contention. The page you mention explains how to configure the kernel and how to write your code so that you get the full benefit of the patch.
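To make that concrete, the kind of program the HOWTO describes uses only standard POSIX primitives. Here is a minimal sketch, assuming a PREEMPT_RT (or even vanilla) kernel; the priority value, the 1 ms period and the lack of error handling are illustrative only:

#include <sched.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>

int main(void)
{
    /* Lock all current and future memory to avoid page faults at runtime. */
    mlockall(MCL_CURRENT | MCL_FUTURE);

    /* Switch to the SCHED_FIFO real-time scheduling class (priority is arbitrary). */
    struct sched_param sp;
    memset(&sp, 0, sizeof sp);
    sp.sched_priority = 80;
    sched_setscheduler(0, SCHED_FIFO, &sp);

    /* Periodic loop with a 1 ms period, using absolute deadlines so that
     * jitter does not accumulate from one iteration to the next. */
    struct timespec next;
    clock_gettime(CLOCK_MONOTONIC, &next);
    for (int i = 0; i < 1000; i++) {
        /* ... real-time work goes here ... */
        next.tv_nsec += 1000000;
        if (next.tv_nsec >= 1000000000) {
            next.tv_nsec -= 1000000000;
            next.tv_sec++;
        }
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
    }
    return 0;
}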
If you are looking for specific real-time APIs, you may want to have a look at Xenomai 3.0.1, which provides a dedicated API for running your user-level process on top of either the standard Linux kernel or the Xenomai co-kernel (a layer that runs below the Linux kernel).

What are the main differences between eBPF and LTTng?

I read LTTng uses instrumentation: “Linux Trace Toolkit Next Generation (LTTng) is a tracer able to extract information from the Linux kernel, user space libraries and from programs. It is based on instrumentation of the executables”
https://lttng.org/files/papers/desnoyers-codebreakers.pdf
Does this mean you have to rebuild the kernel or is this also about instrumentation when working with kprobes?
I am more familiar with eBPF than with LTTng, but based on skimming the LTTng docs I can see the following differences:
LTTng requires the loading of a kernel module, whereas eBPF is a native part of the Linux kernel. The kernel gives a number of guarantees with regard to eBPF programs, mostly to protect the user from panicking their kernel or getting it stuck in infinite loops, etc. You don't have that protection with kernel modules.
LTTng seems to me specifically targeted towards tracing. eBPF on the other hand has a much broader scope which includes tracing, networking, security, infrared drivers, and potentially even CPU scheduling.
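To make the eBPF side concrete, here is a minimal sketch of a kernel-side eBPF program written in restricted C in the libbpf/clang style; the probed function name and the section conventions are assumptions that depend on your kernel and toolchain versions. The verifier checks a program like this before the kernel will load it:

/* Build with something like: clang -O2 -g -target bpf -c probe.c */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("kprobe/do_sys_openat2")      /* attach point is an assumption; pick any traceable symbol */
int count_openat(void *ctx)
{
    /* The verifier enforces bounded execution, checked memory accesses,
     * limited stack, etc., before this is ever run in the kernel. */
    bpf_printk("openat called\n");
    return 0;
}

char LICENSE[] SEC("license") = "GPL";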

Basics of Real Time OS

I am trying to learn an RTOS from scratch and for this I use freeRTOS.org as a reference. I found this site to be the best resource for learning an RTOS. However, I have some doubts that I was not able to find exact answers to.
1) How do I find out whether a device has real-time capability, e.g. some controllers do (TI Hercules) and others don't (MSP430)?
2) Does that depend on the architecture of the core (e.g. the ARM Cortex-R CPU in the TI Hercules TMS570)?
I know these questions may seem like a nuisance, but I don't know how else to get the answers.
Thanks in advance
EDIT:
One more query: what is meant by "OS" in RTOS? Is it the same kind of OS as any other, or is it just a collection of source code files for the APIs?
Figuring out whether a device has "real-time" capability is somewhat arbitrary and depends on your project's timing requirements. If your timing requirements are very tight, you'll want to use a faster microcontroller/processor.
Using an RTOS (e.g. FreeRTOS, eCOS, or uCOS-X) can help ensure that a given task will execute at a predictable time. The FreeRTOS website provides a good discussion of what operating systems are and what it means for an operating system to claim Real-Time capabilities. http://www.freertos.org/about-RTOS.html
You can also see from the ports pages of uC/OS-X and FreeRTOS that they can run on a variety of target microcontrollers/microprocessors.
Real-time capability is a matter of degree. A 32-bit DSP running at 1 GHz has more real-time capability than an 8-bit microcontroller running at 16 MHz. The more powerful microcontroller could be paired with faster memories and ports and could manage applications requiring large amounts of data and computations (such as real-time video image processing). The less powerful microcontroller would be limited to less demanding applications requiring a relatively small amount of data and computations (perhaps real-time motor control).
The MSP430 has real-time capabilities and it's used in a variety of real-time applications. There are many RTOS that have been ported to the MSP430, including FreeRTOS.
When selecting a microcontroller for a real-time application you need to consider the data bandwidth and computational requirements of the application. How much data needs to be processed in what amount of time? Also consider the range and precision of the data (integer or floating point). Then figure out which microcontroller can support those requirements.
While the Cortex-R is optimised for hard real-time, that does not imply that other processors are not suited to real-time applications, or even better suited to a specific application. What you need to consider is whether a particular combination of RTOS and processor will meet the real-time constraints of your application; and even then the most critical factor is your software design rather than the platform.
The main thing you want from an RTOS is determinism; most other features are already available in non-RTOS operating systems.
The "OS" part of RTOS means Operating System. Simply put, like all other operating systems, an RTOS provides the infrastructure for managing processor resources so that you can work at a higher level when designing your application. For accessing those facilities the OS provides an API, through which you can use semaphores, message queues, mutexes, etc.
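As a concrete illustration, here is a minimal sketch using the FreeRTOS API that the question refers to; the task name and body are made up for illustration, and the board setup and FreeRTOSConfig.h settings are assumed:

#include "FreeRTOS.h"
#include "task.h"

static void vBlinkTask(void *pvParameters)
{
    (void)pvParameters;
    for (;;) {
        /* toggle_led() would be a hypothetical board-specific helper. */
        vTaskDelay(pdMS_TO_TICKS(500));   /* block for 500 ms worth of RTOS ticks */
    }
}

int main(void)
{
    /* Create a task with a small stack and a priority just above idle. */
    xTaskCreate(vBlinkTask, "blink", configMINIMAL_STACK_SIZE, NULL,
                tskIDLE_PRIORITY + 1, NULL);
    vTaskStartScheduler();                /* the scheduler takes over; should not return */
    for (;;) { }
}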
An RTOS has one requirement to be an RTOS: it must be pre-emptive. This means that it must support task priorities, so that when a higher-priority task becomes ready to run (one of the possible task states), the scheduler switches the current context to that task.
This has two implications. One is the need for a precise, dedicated timer, the tick timer; the other is that context switching involves considerable memory-operation overhead: the current CPU state (or CPUs' state, in the case of multi-core SoCs) must be copied into the pre-empted task's context information, and the context of the new ready-to-run task must be restored on the CPU.
ARM cores already provide a system timer (e.g. SysTick on Cortex-M) which is intended for dedicated use as the OS tick timer. Not so long ago, the tick timer had to be implemented with a regular, non-dedicated timer.
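For instance, on a Cortex-M part with CMSIS headers, the dedicated system timer can be configured as the tick in a couple of lines; the 1 kHz rate and the device header are assumptions, and an RTOS port normally does this for you:

#include <stdint.h>
#include "stm32f4xx.h"   /* hypothetical device header; any CMSIS device header provides SystemCoreClock */

volatile uint32_t tick_count;

/* The RTOS port (or your own scheduler) hooks this interrupt as the tick. */
void SysTick_Handler(void)
{
    tick_count++;
}

int main(void)
{
    SysTick_Config(SystemCoreClock / 1000u);   /* fire the tick interrupt at 1 kHz */
    for (;;) {
        /* idle loop; a real RTOS would be scheduling tasks from here */
    }
}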
One optimization in cores designed with real-time capabilities in mind is the ability to save/restore the CPU context with minimal code, so it takes much less execution time than on regular processors.
It is possible to implement an RTOS on nearly any processor, and there are implementations targeted at resource-constrained cores. You mainly need a timer with interrupt capability and some RAM. If the CPU is very fast you can run the OS tick at a high rate (sub-millisecond in some real-time applications with DSPs), or at a lower rate, perhaps just 10-100 ticks per second, for applications with low timing requirements on low-end CPUs.
Some theoretical knowledge would be quite useful too, e.g. figuring out whether a given task set is schedulable under a given scheduling approach (sometimes it is not; a classic sufficient condition under rate-monotonic scheduling is that the total CPU utilization of n periodic tasks not exceed n(2^(1/n) - 1), about 69% for large n), the differences between static-priority and dynamic-priority scheduling, the priority inversion problem, etc.

Is it possible or a common technique to sandbox kernel mode drivers?

I see that kernel mode drivers are risky as they run in privileged mode, but are there any monolithic kernel's that do any form of driver/loadable module sandboxing or is this really the domain of microkernels?
Yes, there are platforms with "monolithic" (for some definition of monolithic) kernels that do driver sandboxing for some drivers. Windows does this in more recent versions with the User-Mode Driver Framework. There are two reasons for doing this:
It allows isolation. A failure in a user-mode driver need not bring down the whole system. This is important for drivers for hardware that is not considered system-critical; examples might be your printer or your sound card. In those cases, if the driver fails it can often simply be restarted, and the user often won't even notice it happened.
It makes writing and debugging drivers much easier. Driver writers can use regular user mode libraries and regular user mode debuggers, without having to worry about things like IRQL and DPCs.
The other poster said there is no sense in this; hopefully the above explains why you might want to do it. The other poster also said there is a performance concern. Again, this depends on the type of driver. In Windows this framework is typically used for USB drivers. A USB driver is not talking directly to the hardware anyway, regardless of the mode it operates in: it talks to another driver, which talks to the USB host controller, so the overhead of user-mode communication is much lower than it would be for a driver that had to bit-bang I/O ports from user mode. You would also avoid writing user-mode drivers for hardware that is performance-critical. In the case of printers and audio hardware, the hardware itself is so much slower than a user-mode transition that the cost of one or two additional context switches is probably irrelevant.
So sometimes it is worth doing simply because the additional robustness and ease of development make the small and often unnoticeable performance reduction worthwhile.
There is no sense in this sandboxing; the OS fully trusts driver code. Basically these drivers become part of the kernel. You can't fail over after a crash of the filesystem or any other major kernel subsystem. Basically it's bad (fail over after a crash? imagine what you could do after the storage driver for the boot disk crashes), because it can lead to data loss, for example.
And second, sandboxing leads to a performance hit for all kernel code.

OS memory isolation

I am trying to write a very thin hypervisor that would have the following restrictions:
runs only one operating system at a time (i.e. no OS concurrency, no hardware sharing, no way to switch to another OS)
it should only be able to isolate some portions of RAM (doing some memory translation behind the OS's back; let's say I have 6GB of RAM, I want Linux / Windows not to use the first 100MB, to see just 5.9GB, and to use that without knowing what's behind it)
I searched the Internet but found close to nothing on this specific matter, and I want to keep the overhead as low as possible (the current hypervisor implementations don't fit my needs).
What you are looking for already exists, in hardware!
It's called an IOMMU[1]. Basically, like page tables, it adds a translation layer between the executed instructions and the actual physical hardware.
AMD calls it IOMMU[2]; Intel calls it VT-d (search for "intel vt-d").
[1] http://en.wikipedia.org/wiki/IOMMU
[2] http://developer.amd.com/documentation/articles/pages/892006101.aspx
Here are a few suggestions / hints, which are necessarily somewhat incomplete, as developing a from-scratch hypervisor is an involved task.
Make your hypervisor "multiboot-compliant" at first. This will enable it to reside as a typical entry in a bootloader configuration file, e.g., /boot/grub/menu.lst or /boot/grub/grub.cfg.
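A hedged sketch of what "multiboot-compliant" means in practice: the image just needs a Multiboot (version 1) header within its first 8 KiB, which you can emit from C and place with your linker script; the section name and linker details below are assumptions:

#include <stdint.h>

#define MULTIBOOT_MAGIC  0x1BADB002u
#define MULTIBOOT_FLAGS  0x00000003u   /* page-align modules, request a memory map */

struct multiboot_header {
    uint32_t magic;
    uint32_t flags;
    uint32_t checksum;   /* magic + flags + checksum must equal 0 (mod 2^32) */
};

/* The ".multiboot" section name is a convention for the linker script, not a requirement. */
__attribute__((section(".multiboot"), used, aligned(4)))
static const struct multiboot_header mb_header = {
    .magic    = MULTIBOOT_MAGIC,
    .flags    = MULTIBOOT_FLAGS,
    .checksum = (uint32_t)-(MULTIBOOT_MAGIC + MULTIBOOT_FLAGS),
};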
You want to set aside your 100MB at the top of memory, e.g., from 5.9GB up to 6GB. Since you mentioned Windows I'm assuming you're interested in the x86 architecture. The long history of x86 means that the first few megabytes are filled with all kinds of legacy device complexities; there is plenty of material on the web about the "hole" between 640K and 1MB. Older ISA devices (many of which still survive in modern systems in "Super I/O" chips) are restricted to performing DMA to the first 16 MB of physical memory. If you try to get in between Windows or Linux and its relationship with these first few MB of RAM, you will have a lot more complexity to wrestle with. Save that for later, once you've got something that boots.
As physical addresses approach 4GB (2^32, hence the physical memory limit on a basic 32-bit architecture), things get complex again, as many devices are memory-mapped into this region. For example (referencing the other answer), the IOMMU that Intel provides with its VT-d technology tends to have its configuration registers mapped to physical addresses beginning with 0xfedNNNNN.
This is doubly true for a system with multiple processors. I would suggest you start on a uniprocessor system, disable the other processors from within the BIOS, or at least manually configure your guest OS not to enable the other processors (e.g., for Linux, include 'nosmp' on the kernel command line -- e.g., in your /boot/grub/menu.lst).
Next, learn about the "e820" map. Again there is plenty of material on the web, but perhaps the best place to start is to boot up a Linux system and look near the top of the output of 'dmesg'. This is how the BIOS communicates to the OS which portions of the physical memory space are "reserved" for devices or other platform-specific BIOS/firmware uses (e.g., to emulate a PS/2 keyboard on a system with only USB I/O ports).
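If it helps to inspect that map programmatically rather than through dmesg, recent Linux kernels also expose the firmware memory map under /sys/firmware/memmap (one numbered subdirectory per entry, each containing "start", "end" and "type" files); availability depends on your kernel configuration. A small sketch that walks it:

#include <stdio.h>

int main(void)
{
    char path[128], line[128];
    for (int i = 0; ; i++) {
        snprintf(path, sizeof path, "/sys/firmware/memmap/%d/type", i);
        FILE *f = fopen(path, "r");
        if (!f)
            break;                        /* entries are numbered from 0; stop at the first gap */
        if (fgets(line, sizeof line, f))
            printf("entry %d: %s", i, line);
        fclose(f);
    }
    return 0;
}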
One way for your hypervisor to "hide" its 100MB from the guest OS is to add an entry to the system's e820 map. A quick and dirty way to get things started is to use the Linux kernel command line option "mem=" or the Windows boot.ini / bcdedit flag "/maxmem".
There are a lot more details and things you are likely to encounter (e.g., x86 processors begin in 16-bit mode when first powered-up), but if you do a little homework on the ones listed here, then hopefully you will be in a better position to ask follow-up questions.

How does the NX flag work?

Could you please explain what the NX flag is and how it works (please be technical)?
It marks a memory page as non-executable in the virtual memory system and in the TLB (a structure used by the CPU for resolving virtual memory mappings). If any program code is executed from such a page, the CPU will fault and transfer control to the operating system for error handling.
Programs normally have their binary code and static data in a read-only memory section, and if they ever try to write there, the CPU will fault and the operating system will normally kill the application (this is known as a segmentation fault or access violation).
For security reasons, the read/write data memory of a program is usually NX-protected by default. This prevents an attacker from supplying malicious code to an application as data, getting the application to write it into its data area, and then having that code executed somehow, usually by exploiting a buffer overflow/underflow vulnerability in the application to overwrite the return address of a function on the stack with the location of the malicious code in the data area.
Some legitimate applications (most notably high-performance emulators and JIT compilers) also need to execute their data, since they compile code at runtime, but they specifically allocate memory without the NX flag set for that purpose.
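To illustrate that last point, here is a minimal sketch (Linux/POSIX and x86-64 assumed) of how a JIT-style program obtains memory it is allowed to execute, in contrast to ordinary data pages:

#include <string.h>
#include <sys/mman.h>

int main(void)
{
    /* x86-64 machine code for: mov eax, 42; ret */
    static const unsigned char code[] = { 0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3 };

    /* Map the page writable first, so we can copy the generated code in... */
    void *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return 1;
    memcpy(buf, code, sizeof code);

    /* ...then flip it to read+execute; without this, jumping into the page
     * would trigger the NX fault described above (SIGSEGV on Linux). */
    if (mprotect(buf, 4096, PROT_READ | PROT_EXEC) != 0)
        return 1;

    int (*fn)(void) = (int (*)(void))buf;
    return fn() == 42 ? 0 : 1;
}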
From Wikipedia:
The NX bit, which stands for No eXecute, is a technology used in CPUs to segregate areas of memory for use by either storage of processor instructions (or code) or for storage of data, a feature normally only found in Harvard architecture processors. However, the NX bit is being increasingly used in conventional von Neumann architecture processors, for security reasons.
An operating system with support for the NX bit may mark certain areas of memory as non-executable. The processor will then refuse to execute any code residing in these areas of memory. The general technique, known as executable space protection, is used to prevent certain types of malicious software from taking over computers by inserting their code into another program's data storage area and running their own code from within this section; this is known as a buffer overflow attack.
Have a look at the Wikipedia article on DEP, which makes use of the NX bit. As for a technical answer, sorry, I do not know enough about this, but to quote:
Data Execution Prevention (DEP) is a security feature included in modern Microsoft Windows operating systems that is intended to prevent an application or service from executing code from a non-executable memory region.
....
DEP was introduced in Windows XP Service Pack 2 and is included in Windows XP Tablet PC Edition 2005, Windows Server 2003 Service Pack 1 and later, Windows Vista, and Windows Server 2008, and all newer versions of Windows.
...
Hardware-enforced DEP enables the NX bit on compatible CPUs, through the automatic use of PAE kernel in 32-bit Windows and the native support on 64-bit kernels.
Windows Vista DEP works by marking certain parts of memory as being intended to hold only data, which the NX or XD bit enabled processor then understands as non-executable.
This helps prevent buffer overflow attacks from succeeding. In Windows Vista, the DEP status for a process, that is, whether DEP is enabled or disabled for a particular process, can be viewed on the Processes tab in the Windows Task Manager.
See also the MSDN knowledge base article about DEP, which gives a very detailed explanation of how this works.
Hope this helps,
Best regards,
Tom.