x86-64 mechanism for automatic bounds checking - x86-64

Is there any scheme in usermode x64 for the hardware to automatically apply bounds checking on each memory load and store, without explicit instrumentation from the compiler? I also do not want to rely on OS support (e.g. mprotect system call).
<begin: enforce all accesses within 0x10000000-0x10000100>
...
mov ___, ___ #hardware automatically performs range check
...
<end enforcement>
AFAIK, Intel MPX requires explicit bndcl/bndcu instructions before each checked memory access.

No, for the simple reason that CPUs are not crystal balls: the information about what the bounds are has to come from somewhere. It is the program's task to determine that "what" and to be the "somewhere".
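To make that concrete: without hardware or OS help, the range from the question's pseudocode has to be stated by the program, and each access then goes through an explicit check. Below is a minimal sketch in C of what such instrumentation might look like. The names `checked_region`, `region_contains`, and `checked_load_u8` are hypothetical, invented for illustration; they are not part of MPX or of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor: the program, not the CPU, supplies the bounds. */
struct checked_region {
    uintptr_t lo;  /* first valid address */
    uintptr_t hi;  /* one past the last valid address */
};

/* Returns 1 if [addr, addr + len) lies entirely inside the region, else 0. */
static int region_contains(const struct checked_region *r, uintptr_t addr, size_t len)
{
    assert(r->lo <= r->hi);
    return addr >= r->lo && addr <= r->hi && len <= r->hi - addr;
}

/* Explicitly checked one-byte load.
 * Returns 0 and stores the byte in *out on success, -1 if out of bounds. */
static int checked_load_u8(const struct checked_region *r, const uint8_t *p, uint8_t *out)
{
    if (!region_contains(r, (uintptr_t)p, 1))
        return -1;
    *out = *p;
    return 0;
}
```

The check is exactly the kind of per-access instrumentation the question hoped to avoid; the sketch only shows where the bounds information has to live.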


Why can a user call a system call directly?

Before I ask the question, the following is what I know.
The system call is in the kernel area.
The kernel area cannot be used (accessed) directly by the user.
There are two ways to call a system call:
1. direct call
2. wrapping function (API) that contains the system call
(Process for 2: (User Space) wrapping function -> system call interface -> (Kernel Space) system call)
So, in case 1, how can the user use the kernel area directly?
Or I wonder if there's anything I'm mistaken about.
What I tried: asked on an open SNS, searched the internet, and read Operating System Concepts, 10th edition (page 64).
The default is that nothing in user-space is able to execute anything in kernel space. How that works depends on the CPU and the OS, but likely involves some kind of "privilege level" that must be matched or exceeded before the CPU will allow software to access the kernel's part of virtual memory.
This default behavior alone would be horribly useless. For an OS to work there must be some way for user-space to transfer control/execution to (at least one) clearly marked and explicitly allowed kernel entry point. This also depends on the OS and CPU.
For example, across "all 80x86" (including all CPUs and CPU modes) an OS can choose between:
a software interrupt (interrupt gate or trap gate)
an exception (e.g. breakpoint exception)
a call gate
a task gate
the sysenter instruction
the syscall instruction
...and most modern operating systems now use the syscall instruction.
All of these possibilities share 2 things in common:
a) There is an implied privilege level switch done by the CPU as part of the control transfer
b) The caller is unable to specify the address they're calling. Instead it's set by the kernel (e.g. during the kernel's initialization).
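Both variants from the question can be tried from user space. The sketch below assumes Linux with glibc; `getpid_direct` and `getpid_wrapped` are hypothetical helper names. Note that the "direct" variant passes only a system call number: glibc's `syscall()` function loads the registers and executes the CPU's entry instruction, and the kernel decides which code runs for that number.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  /* exposes syscall() in <unistd.h> on glibc */
#endif
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* "Direct" call: the caller names a system call number, never a kernel address. */
static pid_t getpid_direct(void)
{
    return (pid_t)syscall(SYS_getpid);
}

/* Wrapper call: the ordinary library function that hides the same request. */
static pid_t getpid_wrapped(void)
{
    return getpid();
}
```

Both functions should return the same process ID, since they reach the same kernel entry point.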

How to decide the registers to be preserved for OS task switching?

When task switch happens in an OS, how to decide which registers should be preserved?
Is this purely decided by hardware architecture? Or also involve the OS implementation?
I once did a naïve implementation on the ARM architecture that preserved all of the R1 to R15 registers (if I remember correctly), but that seems like too much.
I also tried the x86 hardware task switching support; the TSS covers a lot of registers, so it does not perform well either.
I guess the design philosophy of an OS, especially the implementation of its task state, should decide this. But I am not sure whether there are any best practices, conventions, or other factors.
When task switch happens in an OS, how to decide which registers should be preserved?
Normally most of a scheduler would be written in a higher level language (e.g. C), and the low level task switch code will be written as a small assembly language function (and NOT inline assembly) because there's no sane way to predict what a compiler might do with the stack and local variables.
Because of this, which registers the low-level assembly function needs to save/restore depends on the ABI ("calling convention") the compiler uses. For example, the System V AMD64 ABI says the callee must preserve RBX, RSP, RBP, and R12 to R15 (and may clobber RAX, RCX, RDX, RSI, RDI, and R8 to R11, except where they carry return values).
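As an illustration of how small the saved state can be under that ABI, here is a sketch of a per-task save area for a switch routine that is called like an ordinary function. The struct name `saved_context` and its field order are assumptions made for this example; the assembly routine that would fill it in is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical save area: only the registers the System V AMD64 ABI
 * requires a callee to preserve, plus the stack pointer. */
struct saved_context {
    uint64_t rbx;
    uint64_t rbp;
    uint64_t r12;
    uint64_t r13;
    uint64_t r14;
    uint64_t r15;
    uint64_t rsp;  /* stack pointer of the suspended task */
};

/* An assembly switch routine would address fields by fixed offsets,
 * so the layout is pinned down at compile time. */
_Static_assert(sizeof(struct saved_context) == 7 * sizeof(uint64_t),
               "save area must hold exactly seven 64-bit registers");
_Static_assert(offsetof(struct saved_context, rsp) == 48,
               "rsp is expected at byte offset 48");
```

Everything else (RAX, RCX, RDX, RSI, RDI, R8 to R11) is already treated as clobbered by the compiler at the call site, so the routine does not need to store it.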
This does depend on the nature of the OS though. E.g. it's possible to design an OS where the kernel runs like a separate task and anything that causes a switch from user-space to kernel-space acts like a task switch and has to save everything before any higher level kernel code is executed.
There is a lot of theoretical wiggle room in which registers an OS chooses to preserve. For a "safe" implementation, an OS would save all registers that are accessible to a user and/or kernel thread. We typically think of the R0, R1, Rx, ... (ARM, MIPS, etc.) or RAX, RBX, ... (x86) registers as needing to be preserved. However, hardware floating point and vector registers (e.g. x86 AVX) may also need to be preserved.
This is often where the implementation of the OS has wiggle room. One could simply play it safe and preserve all floating point and vector registers. However, if these registers are not being used by a thread, saving them slows down context switching for no benefit. In addition, families of processors may share the same core instructions and registers but have optional floating point or vector extensions. Thus some operating systems support a per-thread flag that records whether the thread uses floating point or vector instructions, so the OS knows which additional registers to preserve.
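The per-thread flag idea can be sketched as follows. All names here (`struct task`, `uses_fp`, `task_mark_fp_used`, `state_bytes_to_save`) are hypothetical, and the sizes are illustrative only: 16 general-purpose registers of 8 bytes each, and 512 bytes, which is the size of the x86 FXSAVE area.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    GP_STATE_BYTES = 16 * 8,  /* illustrative: 16 general-purpose registers */
    FP_STATE_BYTES = 512      /* illustrative: size of the x86 FXSAVE area */
};

/* Hypothetical task structure with a flag for floating point / vector use. */
struct task {
    uint8_t gp_state[GP_STATE_BYTES];
    uint8_t fp_state[FP_STATE_BYTES];
    int uses_fp;  /* 0 until the task first touches FP/vector registers */
};

/* Would be called when the OS learns that the task uses FP/vector state. */
static void task_mark_fp_used(struct task *t)
{
    assert(t != NULL);
    t->uses_fp = 1;
}

/* Number of bytes the context switch has to save for this task. */
static size_t state_bytes_to_save(const struct task *t)
{
    assert(t != NULL);
    return GP_STATE_BYTES + (t->uses_fp ? FP_STATE_BYTES : 0);
}
```

A task that never sets the flag costs 128 bytes per switch in this sketch instead of 640. How the OS learns about first use (a trap, a compiler flag, an explicit API call) is a separate design choice.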

Device Drivers vs the /dev + glibc Interface

I am looking to have the processor read from I2C and store the data in DDR in an embedded system. As I have been looking at solutions, I have been introduced to Linux device drivers as well as the GNU C Library. It seems that many operations you can perform with the basic Linux drivers can also be performed with basic glibc system calls. I am somewhat confused about when one should be used over the other. Both interfaces can be accessed from user space.
When should I use a kernel driver to access a device like I2C or USB and when should I use the GNU C Library system functions?
The GNU C Library forwards function calls such as read, write, ioctl directly to the kernel. These functions are just very thin wrappers around the system calls. You could call the kernel all by yourself using inline assembly, but that is rarely helpful. So in this sense, all interactions with the kernel driver will go through these glibc functions.
If you have questions about specific interfaces and their trade-offs, you need to name them explicitly.
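For the I2C case, the two pieces fit together roughly as sketched below: the kernel's i2c-dev driver exposes a bus as a device node, and the program talks to it through the glibc wrappers `open`, `ioctl`, `read`, and `close`. This assumes Linux with the i2c-dev driver loaded; the function name `i2c_read_bytes` is hypothetical, and any device path or slave address passed to it is a placeholder for whatever the actual board uses.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/i2c-dev.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Reads up to len bytes from the I2C slave at 7-bit address addr on the
 * bus exposed at dev_path. Returns the number of bytes read, or -1 on error. */
static ssize_t i2c_read_bytes(const char *dev_path, int addr, uint8_t *buf, size_t len)
{
    assert(dev_path != NULL && buf != NULL);

    int fd = open(dev_path, O_RDWR);      /* glibc wrapper around the open system call */
    if (fd < 0)
        return -1;

    ssize_t n = -1;
    if (ioctl(fd, I2C_SLAVE, addr) == 0)  /* request handled by the i2c-dev driver */
        n = read(fd, buf, len);           /* the driver performs the bus transfer */

    close(fd);
    return n;
}
```

A call might look like `i2c_read_bytes("/dev/i2c-1", 0x48, buf, 2)`, with both arguments depending on the hardware. The buffer lives in ordinary process memory, so the data ends up in DDR without any further step.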
In ARM:
Privilege states are built into the processor and are changed via assembly instructions. A memory protection unit (or MMU), a part of the chip, is configured to disallow access to arbitrary ranges of memory depending on the privilege state.
In the case of the Linux kernel, ALL physical memory is privileged: memory addresses in userspace are virtual addresses, which the hardware translates to physical addresses through mappings that only privileged code can set up.
So, to access a privileged memory range, the mechanics are like a function call: you set the parameters indicating what you want, and then issue an SVC (supervisor call), a software interrupt that takes control of the program away from userspace and gives it to the kernel. The kernel looks at your parameters and does what you need.
The standard library basically makes that whole process easier.
Drivers create interfaces to physical memory addresses and provide an API through the SVC call and whatever 'arguments' it's passed.
If physical memory is not reserved by a driver, the kernel generally won't allow anyone to access it.
Accessing physical memory you're not privileged to will cause a "bus error".
BTW: You can use a driver like UIO to map physical memory into userspace.
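A minimal sketch of the UIO route, assuming a UIO driver has already been bound to the device so that a node such as `/dev/uio0` exists. The helper name `uio_map_region0` is hypothetical. In practice the region size would be read from `/sys/class/uio/uio0/maps/map0/size` rather than passed in by hand.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Maps the first memory region (map0) of a UIO device into this process.
 * Returns a pointer to the mapped registers, or NULL on failure. */
static volatile uint32_t *uio_map_region0(const char *uio_path, size_t size)
{
    assert(uio_path != NULL && size > 0);

    int fd = open(uio_path, O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;

    /* For UIO, an mmap offset of N * page size selects mapping N; 0 selects map0. */
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping remains valid after the descriptor is closed */

    return p == MAP_FAILED ? NULL : (volatile uint32_t *)p;
}
```

After a successful call, reads and writes through the returned pointer go to the device's physical address range without a system call per access.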

Time-Stamp Counter Restriction

I want to check if the RDTSC instruction is available. There must be an Intel Pentium or newer processor, and either the TSD flag in register CR4 is clear, or it is set and the CPL equals 0.
So, there's no problem obtaining the current privilege level (bits 0 and 1 of the CS segment register). There is also no problem checking whether the instruction itself is supported (CPUID.1:EDX[4] = 1).
But (and that's the problem) this must also run in user mode (PL3), and I can't read control register CR4 in user mode.
Is there any other way to check whether the operating system restricts access to the time-stamp counter?
The only way is to "try" the instruction and intercept the exception, provided that the operating system gives you the ability to react to the event in a safe way and recover your state so you can continue your program. Unfortunately, not every OS lets a program continue after an exception that it considers "fatal". On Windows you can use structured exception handling; on Linux the fault arrives as a signal (SIGSEGV for the general-protection fault raised when CR4.TSD is set, SIGILL if the opcode itself is unsupported). Other operating systems may not be as forgiving with this kind of exception.
PS: it's also possible, in principle, for an OS to intercept the exception and simulate the instruction, so the application has no way to tell whether the instruction is really available. I don't know if there are OSes that do this (virtual machines, maybe?).

A trivial SYSENTER/SYSCALL question

If a Windows executable makes use of SYSENTER and is executed on a processor implementing AMD64 ISA, what happens? I am both new and newbie to this topic (OSes, hardware/software interaction) but from what I've read I have understood that SYSCALL is the AMD64 equivalent to Intel's SYSENTER. Hopefully this question makes sense.
If you try to use SYSENTER where it is not supported, you'll get an "invalid opcode" exception (#UD). On AMD processors this is the case in long mode, where only SYSCALL is available.
Note that this situation is unusual - generally, Windows executables do not directly contain instructions to enter kernel mode.
As far as I know, AMD64 processors use different operating modes to handle such issues.
SYSENTER works fine but is not that fast.
A very useful site to get started about the different modes:
Wikipedia
A lot of unused functionality was dropped when the AMD64 extensions were developed. One of the main changes is that segmentation is mostly disabled in 64-bit mode: the CS, DS, ES, and SS registers still exist, but their bases and limits are ignored. Normally, loading a segment register is an extremely expensive operation (the CPU has to do permission checks, which could involve multiple memory accesses). Entering kernel mode requires loading new segment register values.
The SYSENTER instruction accelerates this by taking the new values from a set of model-specific registers (MSRs) and loading the (internal, hidden) segment descriptors directly, without doing any permission checks. Much of that benefit is lost when only a couple of segment registers still matter, which is probably part of the reasoning for AMD dropping SYSENTER in long mode in favour of its own SYSCALL instruction.