I have heard of privilege levels, rings, privileged instructions, non-privileged instructions, user mode, kernel mode, user space, and kernel space.
A user process runs with low privilege, whereas an OS process runs with higher privilege. I have also heard about the CPL register, which is responsible for general protection. Also, the CPU only knows the CPL, and it is decided on the basis of which page the instruction belongs to.
I want to know who/what initially decides the privilege level of a process?
When is it decided whether a process will run with a low or high privilege level? At compile time? At load time?
What indicates that the current program will run with a specific privilege level? Segment registers? Descriptors? The loader?
First, I see three questions:
Who/what initially decides the privilege level of a process?
When is it decided whether a process will run with a low or high privilege level?
What indicates that the current program will run with a specific privilege level?
Second, to confirm the definitions of some terms:
When you say privilege level, I believe you are referring to the level of privilege associated with the CPU's processor mode, as opposed to the generic level of any other privilege mechanism available.
When you say process, I believe you are referring to the currently running program, as opposed to some alternate definition.
User processes run in user mode with the user privilege for a given CPU architecture.
Kernel processes run in kernel mode with the supervisor privilege for a given CPU architecture.
Whether the process is user or kernel depends on which flags are set either in segment descriptors when paging isn't used or in page table or page directory entries where paging is used.
This means that the privilege level of a process is determined by where that process's code is located in memory. If it is in kernel space and marked as such using the relevant flags, then it is a kernel process. If it is in user space and marked as such using the relevant flags, then it is a user process.
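To make the "relevant flags" concrete for the paging case, here is a minimal sketch in Python of how the user/supervisor bit in an x86 page-table entry gates user-mode access. The bit positions are the architectural ones; the function name and the two example entries are made up for illustration, and the model deliberately ignores NX, SMEP/SMAP and the way flags from every level of the page-table hierarchy are combined.

```python
# Low flag bits of an x86 page-table entry (architectural bit positions).
PTE_PRESENT = 1 << 0   # page is mapped
PTE_WRITABLE = 1 << 1  # writes allowed
PTE_USER = 1 << 2      # U/S bit: set = user-mode code may access the page


def user_mode_may_access(pte: int, write: bool = False) -> bool:
    """Return True if code running in user mode may touch this page.

    Simplified model: only the three flag bits above are considered.
    """
    if not pte & PTE_PRESENT:
        return False
    if not pte & PTE_USER:
        return False
    if write and not pte & PTE_WRITABLE:
        return False
    return True


# Hypothetical entries: the frame addresses are invented for illustration.
kernel_page = 0x0000_1000 | PTE_PRESENT | PTE_WRITABLE
user_page = 0x0020_0000 | PTE_PRESENT | PTE_WRITABLE | PTE_USER

print(user_mode_may_access(kernel_page))            # False
print(user_mode_may_access(user_page, write=True))  # True
```

On real hardware a denied access does not return False; it raises a page fault that the kernel handles.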
If the process / program you are running isn't the kernel, it is a user process on most modern operating systems. So the decision is made at run time, not at compile time: the kernel gets its privilege at operating system initialization, when it is first loaded, and every program it starts afterwards is given user privilege.
Either the process is the kernel and it runs at supervisor privilege level, or it isn't and it runs at user privilege level.
The CPU checks every instruction fetch and data access against the relevant status register (the code segment register, CS, whose low two bits hold the current privilege level on Intel x86, and the current program status register, CPSR, on ARM).
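On x86 the current privilege level (CPL) lives in the low two bits of the CS segment selector. As a rough illustration, the sketch below splits a selector into its fields. The two example values are the CS selectors x86-64 Linux uses for kernel and 64-bit user code; other operating systems pick different numbers, and the helper names are my own.

```python
from typing import NamedTuple


class Selector(NamedTuple):
    index: int  # descriptor index within the table
    table: str  # "GDT" or "LDT"
    rpl: int    # requested privilege level; for CS this is the CPL


def decode_selector(value: int) -> Selector:
    """Split a 16-bit x86 segment selector into its three fields."""
    return Selector(
        index=value >> 3,
        table="LDT" if value & 0b100 else "GDT",
        rpl=value & 0b11,
    )


# Example values as used by x86-64 Linux; treat them as an assumption
# about one particular OS, not something the hardware mandates.
KERNEL_CS = 0x10
USER_CS = 0x33

print(decode_selector(KERNEL_CS))  # Selector(index=2, table='GDT', rpl=0)
print(decode_selector(USER_CS))    # Selector(index=6, table='GDT', rpl=3)
```

So "ring 0" and "ring 3" are simply the values 0 and 3 in those two bits.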
When a user process needs to access kernel resources, it generally asks the kernel to act on its behalf by making a system call, which causes a privilege switch while the kernel handles the request for the user process.
As a side note, Kernel Mode Linux allows you to run user processes in kernel / supervisor mode.
Most processors have a trap or software fault instruction that switches the processor into privileged mode. The kernel checks if the user mode process has permission for the particular operation. Since kernel data is protected, the kernel can enforce security policy - the user process can't directly give itself permissions.
Permissions are sometimes called privileges, so that's why I wanted to explain how processor modes work in enforcing security permissions.
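As a toy illustration of that split (hardware provides the mode switch, the kernel applies the policy), here is a sketch in Python. Everything in it is hypothetical: the class, the user IDs and the operation names are invented, and Python cannot actually protect the policy table the way a privileged mode protects kernel memory.

```python
class PermissionDenied(Exception):
    """Raised by the toy kernel when policy forbids an operation."""


class ToyKernel:
    def __init__(self):
        # Stand-in for kernel-private data. On real hardware user code
        # could not read or modify this table at all.
        self._allowed = {
            0: {"read_file", "mount_disk"},  # administrator
            1000: {"read_file"},             # ordinary user
        }

    def trap(self, uid: int, operation: str) -> str:
        """Stand-in for the trap handler, which runs in privileged mode."""
        if operation not in self._allowed.get(uid, set()):
            raise PermissionDenied(f"uid {uid} may not {operation}")
        return f"{operation} done for uid {uid}"


kernel = ToyKernel()
print(kernel.trap(1000, "read_file"))
try:
    kernel.trap(1000, "mount_disk")
except PermissionDenied as reason:
    print("denied:", reason)
```

The point is only that the check happens on the kernel's side of the trap, using data the caller cannot edit.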
From my understanding, kernel mode is a hardware feature; e.g. it can be set via a register (value1 -> kernel mode, value2 -> user mode).
When the kernel loads and runs a user application, the application must communicate with the kernel via a system call to perform a privileged action; an interrupt then occurs, execution switches to kernel mode, and the privileged action is performed.
My question is:
What is the mechanism that prevents a malicious user application from setting that "mode" register and entering kernel mode (e.g. on x86)?
It makes sense that only the kernel can set this register; I would like to know more details about how this is enforced.
I don't know how this is enforced in the hardware itself; it also depends on the architecture. In software on x86, there are several entry points to consider. When the CPU boots, it is in kernel mode: it can execute every instruction and do whatever it pleases with main memory.
The kernel takes advantage of this to set up the page tables and the interrupt handlers during boot, before starting any user mode processes.
On x86, memory protection between kernel mode and user mode is enforced by the page tables. If a user mode process attempts to access a page that is marked as kernel-only, the CPU raises a page fault and calls an interrupt handler in kernel mode. The kernel will then typically kill the process.
Faults and hardware interrupts are not meant to be a deliberate entry point into the kernel for user code. They do enter the kernel when a fault happens, but the user mode process does not control what runs next, and the kernel may kill the process if it decides it should. (Software interrupts are a different matter: 32-bit Linux historically used int 0x80 for system calls.)
On x86-64, the usual entry point into the kernel is the address stored in the LSTAR MSR (a model-specific register). This register can be written from kernel mode only. The syscall instruction jumps to the address held in that register and switches to kernel mode in the same step. User mode processes cannot jump into the kernel except through an entry point the kernel has set up. This lets the kernel offer services to user mode, which are called system calls.
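The idea can be modeled in a few lines of Python. This is only a toy under stated assumptions: ToyCPU, wrmsr_lstar, enter_user_mode and kernel_entry are invented names, and a real CPU does all of this in hardware. It shows the two properties that matter: writing the entry-point register faults in user mode, and the only way to raise the privilege level is an instruction that also jumps to the kernel's chosen address.

```python
class GeneralProtectionFault(Exception):
    """Stand-in for the #GP exception raised on a privilege violation."""


class ToyCPU:
    def __init__(self):
        self.cpl = 0       # the CPU starts out in kernel mode
        self.lstar = None  # syscall entry point, modeled as a callable

    def wrmsr_lstar(self, handler):
        # Privileged instruction: only legal at CPL 0.
        if self.cpl != 0:
            raise GeneralProtectionFault("wrmsr executed in user mode")
        self.lstar = handler

    def enter_user_mode(self):
        # Models the kernel's return-to-user step (iret/sysret on x86).
        self.cpl = 3

    def syscall(self, number, *args):
        # The only way up: CPL becomes 0 *and* control goes to the
        # kernel-chosen entry point in one indivisible step.
        saved_cpl, self.cpl = self.cpl, 0
        try:
            return self.lstar(number, *args)
        finally:
            self.cpl = saved_cpl  # models the return to user mode


def kernel_entry(number, *args):
    # Hypothetical system call table with a single service.
    table = {0: lambda a, b: a + b}
    return table[number](*args)


cpu = ToyCPU()
cpu.wrmsr_lstar(kernel_entry)  # done by the kernel during boot
cpu.enter_user_mode()

print(cpu.syscall(0, 2, 3))  # 5
try:
    cpu.wrmsr_lstar(lambda *a: "hijacked")  # user code tries to replace it
except GeneralProtectionFault as fault:
    print("fault:", fault)
```

User code never gets to run its own instructions at CPL 0; it can only ask the kernel's handler to run.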
I read that there are some privileged instructions in our system that can be executed only in kernel mode, but I am unable to understand who makes these instructions privileged. Is it the hardware manufacturer that hardwires some harmful instructions as privileged with the help of a mode bit, or is it the OS designers who make instructions privileged so that they work only in privileged mode?
Kernel vs. user mode, and which instructions aren't allowed in user mode, is part of the ISA. That's baked in to the hardware.
CPU architects usually have a pretty good idea of what OSes need to do and want to prevent user-space from doing, so these choices at least make privilege levels possible, i.e. make it impossible for user-space to simply take over the machine.
But that's not the full picture: on some ISAs, such as x86, later ISA extensions have added control-register flag bits that let the OS choose whether some other instructions are privileged or not. On x86 that's done for instructions that could leak information about kernel ASLR, or make timing side-channels easier.
For example, rdpmc (read performance monitor counter) can only be used from user-space if specially enabled by the kernel. rdtsc (Read TimeStamp Counter) can be executed from user-space by default, but the TSD (TimeStamp Disable) flag in CR4 can restrict its use to privilege level 0 (kernel mode). Stopping user-space from using high-resolution timing is a brute-force way of defending against timing side-channel attacks.
Another x86 extension defends against leaking kernel addresses to make kernel ASLR more secret; CR4.UMIP (User Mode Instruction Prevention) disables instructions like sgdt that read the virtual address of the GDT. Those instructions were basically useless for user-space in the first place, and unlike rdtsc easily could always have been privileged.
The Linux Kernel option to enable use of this extension describes it:
The User Mode Instruction Prevention (UMIP) is a security feature in newer Intel processors. If enabled, a general protection fault is issued if the SGDT, SLDT, SIDT, SMSW or STR instructions are executed in user mode. These instructions unnecessarily expose information about the hardware state.
The vast majority of applications do not use these instructions. For the very few that do, software emulation is provided in specific cases in protected and virtual-8086 modes. Emulated results are dummy.
Setting a new address for the IDT/GDT/LDT (e.g. lgdt/lidt) is of course a privileged instruction; those let you take over the machine. But until kernel ASLR was a thing, there wasn't any reason to stop user-space from reading the address. It could be in a page that had its page-table entry set to kernel only, preventing user-space from doing anything with that address. (... until Meltdown made it possible for user-space to use a speculative side-channel to read data from kernel-only pages that were hot in cache.)
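The CR4-controlled cases above (TSD, PCE for rdpmc, UMIP) can be summarized as a small decision function. This is a sketch in Python, not how any CPU or kernel implements it; the bit positions are the architectural CR4 ones, while the function name and the "hardened" example value are my own.

```python
CR4_TSD = 1 << 2    # Time Stamp Disable
CR4_PCE = 1 << 8    # Performance-monitoring Counter Enable
CR4_UMIP = 1 << 11  # User-Mode Instruction Prevention

UMIP_INSTRUCTIONS = {"sgdt", "sidt", "sldt", "smsw", "str"}


def allowed(instruction: str, cpl: int, cr4: int) -> bool:
    """Return True if `instruction` may run at privilege level `cpl`.

    Covers only the CR4-controlled cases discussed in the text.
    """
    if cpl == 0:
        return True  # ring 0 may always execute these
    if instruction == "rdtsc":
        return not cr4 & CR4_TSD
    if instruction == "rdpmc":
        return bool(cr4 & CR4_PCE)
    if instruction in UMIP_INSTRUCTIONS:
        return not cr4 & CR4_UMIP
    raise ValueError(f"instruction not modeled: {instruction}")


hardened = CR4_TSD | CR4_UMIP  # hypothetical OS configuration
print(allowed("rdtsc", 3, 0))         # True
print(allowed("rdtsc", 3, hardened))  # False
print(allowed("sgdt", 0, hardened))   # True
print(allowed("rdpmc", 3, hardened))  # False
```

Note the asymmetry: TSD and UMIP take a permission away from user-space, while PCE grants one.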
I read this paragraph in "Modern Operating Systems" by Tanenbaum:
Most computers have two modes of operation: kernel mode and user mode. The operating system is the most fundamental piece of software and runs in kernel mode (also called supervisor mode). In this mode it has complete access to all the hardware and can execute any instruction the machine is capable of executing. The rest of the software runs in user mode, in which only a subset of the machine instructions is available.
I am unable to see how they describe the difference between these two modes on the basis of the machine instructions available. At the user end, any software seems able to make changes at the hardware level; for example, we have software that can affect the functioning of the CPU or play with registry details. So how can we say that in user mode only a subset of the machine instructions is available?
The instructions that are available only in kernel mode tend to be very few. They are the ones that are needed only to manage the system.
For example, most processors have a HALT instruction that stops the CPU and is used for system shutdowns. Obviously you would not want any user to be able to execute HALT and stop the computer for everyone. Such instructions are therefore made accessible only in kernel mode.
Processors use a table of handlers for interrupts and exceptions. The operating system creates such a table listing the handlers for these events. Then it loads the register(s) giving the location (and size) of the table. The instructions for loading these register(s) are kernel mode only. Otherwise, any application could create total havoc on the system.
Instructions of this nature trigger an exception if executed in user mode.
Such instructions tend to be few in number.
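Here is a small Python model of that arrangement, purely as an illustration. ToyMachine, its method names and the two-instruction "privileged" set are invented; a real processor also switches to kernel mode when it enters the handler, which this sketch leaves out.

```python
class ProcessKilled(Exception):
    """Signals that the toy kernel terminated the offending process."""


PRIVILEGED = {"hlt", "lidt"}  # small illustrative subset


class ToyMachine:
    def __init__(self):
        self.kernel_mode = True
        self.handlers = {}  # models the interrupt/exception table

    def lidt(self, handlers):
        self.execute("lidt")  # the load itself is a privileged instruction
        self.handlers = dict(handlers)

    def execute(self, instruction):
        if instruction in PRIVILEGED and not self.kernel_mode:
            # Hardware raises an exception and runs the kernel's handler.
            return self.handlers["general_protection"](instruction)
        return f"executed {instruction}"


def on_general_protection(instruction):
    raise ProcessKilled(f"illegal use of {instruction} in user mode")


machine = ToyMachine()
machine.lidt({"general_protection": on_general_protection})  # during boot
machine.kernel_mode = False                                   # run user code

print(machine.execute("add"))  # executed add
try:
    machine.execute("hlt")
except ProcessKilled as reason:
    print("killed:", reason)
```

Because loading the table is itself privileged, user code cannot swap in handlers of its own.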
Well, in user mode, there is definitely only a subset of instructions available. This is the reason we have system calls.
Example:
A user wants to create a new process in C. He cannot do that without entering kernel mode, because a certain set of instructions is only available in kernel mode. So he uses the system call fork, which executes the instructions for creating a new process (not available in user mode). A system call is thus a mechanism for requesting a service from the kernel of the OS, to do something for the user that he/she cannot write code for.
The following excerpt from the link above sums it up best:
A program is usually limited to its own address space so that it
cannot access or modify other running programs or the operating system
itself, and is usually prevented from directly manipulating hardware
devices (e.g. the frame buffer or network devices).
However, many normal applications obviously need access to these
components, so system calls are made available by the operating system
to provide well defined, safe implementations for such operations. The
operating system executes at the highest level of privilege, and
allows applications to request services via system calls, which are
often initiated via interrupts. An interrupt automatically puts the
CPU into some elevated privilege level, and then passes control to the
kernel, which determines whether the calling program should be granted
the requested service. If the service is granted, the kernel executes
a specific set of instructions over which the calling program has no
direct control, returns the privilege level to that of the calling
program, and then returns control to the calling program.
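The fork example can be tried directly from Python on a Unix-like system, where os.fork and os.waitpid are thin wrappers around the corresponding kernel services. A minimal sketch (the helper name run_child is mine; it will not work on Windows, which has no fork):

```python
import os


def run_child(exit_code: int) -> int:
    """Fork a child that exits immediately; return its exit status."""
    pid = os.fork()          # ask the kernel to create a new process
    if pid == 0:
        os._exit(exit_code)  # child: exit without running parent cleanup
    _, status = os.waitpid(pid, 0)  # parent: wait for the child to finish
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(run_child(7))  # 7
```

The program never creates the process itself; it only requests it, and the kernel does the privileged work.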
What is the difference between OS privilege levels and privilege levels of the underlying hardware? Do all system calls cause a trap to the kernel? Why do system calls cause a trap? Is it because of privileged instructions such as IN in their assembly code?
To answer your questions directly:
What is the difference between OS privilege levels and privilege levels of the underlying hardware?
The privileges that must be enforced at the code level (i.e. privileged instructions) must be supported in hardware. Many OS security levels (e.g. permission to access a specific hardware device) do not require hardware support for authorizing that very specific function, but they do at least require hardware support to block code from accessing the device directly. So, in short, the ability of the OS to implement privilege levels depends on the underlying hardware, but the two need not be the same.
Do all system calls cause a trap to the kernel? Why do system calls cause a trap? Is it because of privileged instructions such as IN in their assembly code?
That's essentially correct. Trapping is the natural mechanism by which unprivileged code transitions to a privileged level. If, for example, a user program needs to access some piece of hardware, then because IN and OUT are privileged it must trap to the kernel; the kernel then performs the required operations and returns.
I am new to this OS stuff. Since the kernel controls the execution of all other programs and the resources they need, I think it should also be executed by the CPU. If so, where does it get executed? And if what the CPU executes is controlled by the kernel, then how does the kernel control the CPU if the CPU is executing the kernel itself?
It seems like a paradox to me... please explain. And by the way, I didn't get these CPU modes at all. If the kernel is controlling all the processes, why are there CPU modes? And if they are there, are they implemented by the software (OS) or by the hardware itself?
Thanks.
A quick answer. On platforms like x86, the kernel has full control of the CPU's interrupt and context-switching abilities. So, although the kernel is not running most of the time, every so often it gets a chance to decide which program the CPU will switch to and to let that program run for a while. This part of the kernel is called the scheduler. Other than that, the kernel gets a chance to execute every time a program makes a system call (such as a request to access some hardware, e.g. a disk drive).
P.S. The fact that the kernel can stop a running program, seize control of the CPU and schedule a different program is called preemptive multitasking.
UPDATE: About CPU modes, I assume you mean the x86-style rings? These are permission levels on the CPU for the currently executing code, allowing the CPU to decide whether the program that is currently running is "the kernel" and can do whatever it wants, or whether it is a lower-permission-level program that cannot do certain things (such as force a context switch or fiddle with virtual memory).
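To picture the scheduler, here is a toy round-robin loop in Python. It is only a model: real preemption is driven by a hardware timer interrupt, whereas here the "kernel" loop simply counts steps, and the names program, run and time_slice are invented for the example.

```python
from collections import deque


def program(name, steps):
    """A toy user program; each yield is a point where a timer tick can hit."""
    for i in range(steps):
        yield f"{name}:{i}"


def run(programs, time_slice=2):
    """Round-robin scheduler: run each program for `time_slice` steps."""
    ready = deque(programs)
    trace = []
    while ready:
        current = ready.popleft()
        for _ in range(time_slice):
            try:
                trace.append(next(current))
            except StopIteration:
                break              # program finished; do not requeue it
        else:
            ready.append(current)  # slice used up: preempt and requeue
    return trace


print(run([program("A", 3), program("B", 1)]))
# ['A:0', 'A:1', 'B:0', 'A:2']
```

Program A is interrupted after two steps even though it has more work to do, which is the essence of preemption.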
There is no paradox:
The kernel is a "program" that runs on the machine it controls. It is loaded by the boot loader at the startup of the machine.
Its task is to provide services to applications and control applications.
To do so, it must control the machine that it is running on.
For details, read here: http://en.wikipedia.org/wiki/Operating_System