Where is the mode bit? - operating-system

I just read this in "Operating System Concepts" from Silberschatz, p. 18:
A bit, called the mode bit, is added to the hardware of the computer
to indicate the current mode: kernel(0) or user(1). With the mode bit,
we are able to distinguish between a task that is executed on behalf
of the operating system and one that is executed on behalf of the
user.
Where is the mode bit stored?
(Is it a register in the CPU? Can you read the mode bit? As far as I understand it, the CPU has to be able to read the mode bit. How does it know which program gets mode bit 0? Do programs with a special address get mode bit 0? Who sets the mode bit, and how is it set?)

Please note that the answer depends heavily on the CPU itself; although it's uncommon, you might come across processors where this concept of user level and kernel level does not exist at all. On x86, the mode lives in the cs register:
The cs register has another important function: it includes a 2-bit
field that specifies the Current Privilege Level (CPL) of the CPU. The
value 0 denotes the highest privilege level, while the value 3 denotes
the lowest one. Linux uses only levels 0 and 3, which are respectively
called Kernel Mode and User Mode.
(Taken from "Understanding the Linux Kernel 3e", section 2.2.1)
Again, the details change from one CPU to another, but the general concept holds.
Who sets it? Typically the kernel and the CPU do; a user process cannot change it directly. Let me explain.
**This is an over-simplification, do not take it as it is**
Let's assume that the kernel is loaded and the first application (the first shell) is about to start. The kernel loads everything this application needs, sets the privilege bits in the cs register to user mode (if you are running x86), and then jumps to the code of the shell process.
The shell continues to execute all of its instructions in this context. If the process contains a privileged instruction, the CPU will fetch it but won't execute it; instead it raises a hardware exception that tells the kernel someone tried to execute a privileged instruction. The CPU sets cs to kernel mode and jumps to a known location where kernel code handles this type of error (maybe by terminating the process, maybe by doing something else).
So how can a process do something privileged? Talking to a certain device for instance?
This is where system calls come in; the kernel will do the job for you.
What happens is the following:
You put what you want in certain registers (for instance, that you want to access a file, that the file location is x, that you are opening it for reading, and so on; the kernel documentation tells you which registers to use) and then, on x86, you execute the int 0x80 instruction.
This interrupts the CPU, stops your work, sets the mode to kernel mode, and points the IP register at a known location that holds the code serving file I/O requests; execution continues from there.
Once your data is ready, the kernel puts it in a place you can access (a memory location or a register; it depends on the CPU, the kernel, and what you requested), sets the privilege bits in cs back to user mode, and jumps back to the instruction right after your int 0x80 instruction.
Finally, roughly the same thing happens whenever any switch into the kernel occurs: the kernel gets notified that something happened, so the CPU finishes your current instruction, changes its status, and jumps to the code that handles the event. The process explained above is, roughly speaking, how every switch between kernel mode and user mode works.

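It's a CPU register. On many architectures it's only accessible if you're already in kernel mode (x86 is a partial exception: the privilege bits in cs can be read from user mode, but user code cannot raise its own privilege by writing them).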
The details of how it gets set depend on the CPU design. In most common hardware, it gets set automatically when executing a special opcode that's used to perform system calls. However, there are other architectures where certain memory pages may have a flag set that indicates that they are "gateways" to the kernel -- calling a function on these pages sets the kernel mode bit.

These days it's given other names such as Supervisor Mode or a protection ring.

Related

how does the operating system treat few interrupts and keep processes going?

I'm learning computer organization and structure (I'm using Linux with the x86-64 architecture). We've studied that when an interrupt occurs in user mode, the OS is notified and switches from the user stack to the kernel stack by loading the kernel's rsp from the TSS. Afterwards it saves the necessary registers (such as rip), and in the case of a software interrupt it also saves the error code. In the end, just before jumping to the appropriate handler routine, it zeroes the TF, and in the case of a hardware interrupt it also zeroes the IF. I wanted to ask about a few things:
The error code is saved in the rip, so why load both?
If I consider a case where a few interrupts happen together, which causes the IF and TF to turn on, and I zero the TF and IF but treat only one interrupt at a time, am I not leaving all the other interrupts untreated? In general, how does the OS treat a few interrupts that occur at the same time when using an IDT with a specific vector for each interrupt?
Does this happen because each program has its own virtual memory, and thus the interrupt handling of all the programs is unrelated? Where can I read more about it?
How does an operating system keep other necessary processes running while handling the interrupt?
Thank you very much for your time and attention!
1. the error code is saved in the rip, so why load both?
You're misunderstanding some things about the error code. Specifically:
it's not generated by software interrupts (e.g. instructions like int 0x80)
it is generated by some exceptions (page fault, general protection fault, double fault, etc).
the error code (if used) is not saved in the RIP, it's pushed on the stack so that the exception handler can use it to get more information about the cause of the exception
2a. if I consider a case where few interrupts happen together which causes the IF and TF to turn on, if I zero the TF and IF, but I treat only one interrupt at a time, aren't I leave all the other interrupts untreated?
When the IF flag is clear, maskable IRQs (which don't include other types of interrupts, such as software interrupts and exceptions) are postponed (not disabled) until the IF flag is set again. They're "temporarily untreated" until they're treated later.
The TF flag only matters for debugging (e.g. single-step debugging, where you want the CPU to generate a trap after every instruction executed). It's only cleared in case the process (in user-space) was being debugged, so that you don't accidentally continue debugging the kernel itself; but most processes aren't being debugged like this so most of the time the TF flag is already clear (and clearing it when it's already clear doesn't really do anything).
2b. in general, how does the OS treat a few interrupts that occur at the same time when using the method of IDT with a specific vector for each interrupt? does this happen because each program has its own virtual memory and thus the interrupt handling of all the programs is unrelated? where can I read more about it?
There are complex rules that determine when an interrupt can interrupt (including when it can interrupt another interrupt). These rules mostly only apply to IRQs (not software interrupts that the kernel won't ever use itself, and not exceptions which are taken as soon as they occur). Understanding the rules means understanding the IF flag and the interrupt controller (e.g. how interrupt vectors and the "task priority register" in the local APIC influence the "processor priority register" in the local APIC, which determines which groups of IRQs will be postponed when the IF flag is set). Information about this can be obtained from Intel's manuals, but how Linux uses it can only be obtained from Linux source code and/or Linux specific documentation.
On top of that there's "whatever mechanisms and practices the OS felt like adding on top" (e.g. deferred procedure calls, tasklets, softIRQs, additional stack management) that add more complications (which can also only be obtained from Linux source code and/or Linux specific documentation).
Note: I'm not a Linux kernel developer so can't/won't provide links to places to look for Linux specific documentation.
3. how does an operating system keep other necessary processes running while handling the interrupt?
A single CPU can't run 2 different pieces of code (e.g. an interrupt handler and user-space code) at the same time. Instead it runs them one at a time (e.g. runs user-space code, then switches to an IRQ handler for a very short amount of time, then returns to the user-space code). Because the IRQ handler only runs for a very short amount of time it creates the illusion that everything is happening at the same time (even though it's not).
Of course when you have multiple CPUs, different CPUs can/do run different pieces of code at the same time.

What exactly happens when an OS goes into kernel mode?

I find that neither my textbooks nor my googling skills give me a proper answer to this question. I know it depends on the operating system, but on a general note: what happens and why?
My textbook says that a system call causes the OS to go into kernel mode, given that it's not already there. This is needed because the kernel mode is what has control over I/O devices and other things outside of a specific process' address space. But if I understand it correctly, a switch to kernel mode does not necessarily mean a process context switch (where you save the current state of the process elsewhere than the CPU so that some other process can run).
Why is this? I was kinda thinking that some "admin"-process was switched in and took care of the system call from the process and sent the result to the process' address space, but I guess I'm wrong. I can't seem to grasp what ACTUALLY is happening in a switch to and from kernel mode and how this affects a process' ability to operate on I/O-devices.
Thanks a lot :)
EDIT: bonus question: does a library call necessarily end up in a system call? If no, do you have any examples of library calls that do not end up in system calls? If yes, why do we have library calls?
Historically, system calls have been issued with interrupts. Linux used the 0x80 vector and Windows used the 0x2E vector to access system calls, with the function's index stored in the eax register. More recently, we started using the SYSENTER and SYSEXIT instructions. User applications run in Ring 3, also called userspace or user mode. The CPU is very tricky here, and switching from kernel mode to user mode requires special care. It actually involves fooling the CPU into thinking it came from user mode when issuing a special instruction called iret. The only way to get from user mode to kernel mode is via an interrupt or the already mentioned SYSENTER instruction (SYSEXIT is its counterpart for returning). Interrupts use a special structure called the Task State Segment, or TSS for short, which allows the CPU to find where the kernel's stack is (SYSENTER reads it from model-specific registers instead). So entering the kernel does require a stack switch, but that is not the same as a process context switch: the same process keeps running, just in kernel mode.
But what really happens?
When you issue a system call, the CPU looks for the TSS, gets its esp0 value, which is the kernel's stack pointer, and places it into esp. The CPU then looks up the interrupt vector's index in another special structure, the Interrupt Descriptor Table (IDT for short), and finds an address. This address is where the function that handles the system call is. The CPU pushes the user's stack pointer, the flags register, the code segment, and the instruction pointer of the instruction after the int instruction. After the system call has been serviced, the kernel issues an iret. Then the CPU returns to user mode and your application continues as normal.
Do all library calls end in system calls?
Many of them do, but there are some that don't. For example, take a look at memcpy and the other memory and string functions, which do their work entirely in user space.

The relation between privileged instructions, traps and system calls

I am trying to understand how a virtual machine monitor (VMM) virtualizes the CPU.
My understanding right now is that the CPU issues a protection fault interrupt when a privileged instruction is about to be executed while the CPU is in user mode. In high level languages like C, privileged instructions are wrapped inside system calls. For example, when an application needs the current date and time (instructions that interact with I/O devices are privileged), it calls a certain library function. The assembled version of this library function contains an instruction called 'int' that causes a trap in the CPU. The CPU switches from user mode to privileged mode and jumps to the trap handler the OS has provided. Each system call has its own trap handler. In this example, the trap handler reads the date and time from the hardware clock and returns, then the CPU switches itself from privileged to user mode. (source: http://elvis.rowan.edu/~hartley/Courses/OperatingSystems/Handouts/030Syscalls.html)
However, I am not quite sure this understanding is correct. This article mentions the notion that the (privileged) x86 popf instruction does not cause a trap, and thus complicates things for the VMM: http://www.csd.uwo.ca/courses/CS843a/papers/intro-vm.pdf. In my understanding the popf instruction should not cause a trap but a protection fault interrupt, when explicitly called by a user program and not through a system call.
So my two concrete questions are:
What happens when a user program executes a privileged instruction while the CPU is in user mode?
What happens when a user program performs a system call?
In no particular order:
Your confusion is mainly caused by the fact that the operating systems community does not have standardized vocabulary. Here are some terms that get slung around that sometimes mean the same thing, sometimes not: exception, fault, interrupt, system call, and trap. Any individual author will generally use the terms consistently, but different authors define them differently.
There are 3 different kinds of events that cause entry into privileged mode.
An asynchronous interrupt (caused, for example, by an i/o device needing service.)
A system call instruction (int on the x86). (More generally in the x86 manuals these are called traps and include a couple of other instructions (for debuggers mostly.))
An instruction that does something exceptional (illegal instruction, protection fault, divide-by-0, page fault, ...). (Different authors call these exceptions, faults or traps. x86 manuals call these faults.)
Each interrupt, trap or fault has a different number associated with it.
In all cases:
The processor enters privileged mode.
The user-mode registers are saved somewhere.
The processor finds the base address of the interrupt vector table, and uses the interrupt/trap/fault number as an offset into the table. This gives a pointer to the service routine for that interrupt/trap/fault.
The processor jumps to the service routine. Now we are in privileged mode, the user-level state is all saved somewhere we can get at it, and we're in the correct code inside the operating system.
When the service routine is finished it calls an interrupt-return instruction (iret on x86.) (This is the subtle distinction between a fault and a trap on x86: faults return to the instruction that caused the fault, traps return to the instruction after the trap.)
Note the confusing name "interrupt vector table." Even though it is called an interrupt table, it is used for faults and traps as well. (Which leads some authors to call everything an interrupt.)
The popf issue is rather subtle. This is essentially a bug in the x86 architecture. When popf executes from user mode it does not cause a trap or fault (or exception or interrupt or whatever you want to call it). It simply acts as a no-op as far as the privileged flags (such as the interrupt flag) are concerned: they are silently left unchanged.
Does this matter? Well, for a normal OS it doesn't really matter. If, on the other hand, you are implementing a virtual machine monitor (like VMWare or Xen or Hyper-V), the VMM is running in privileged mode, and you'd like to run the guest operating systems in user mode and efficiently emulate any privileged-mode code. When the guest operating system uses a popf instruction you want it to generate a general protection fault, but it doesn't. (The cli and sti instructions do generate a general protection fault if called from user mode, which is what you want.)
I'm not an expert on computer architecture. But I have several opinions for your consideration:
The CPU has two kinds of instructions
normal instructions, e.g., add, sub, etc.
privileged instructions, e.g., initiate I/O, load/store from protected memory etc.
The machine (CPU) has two kinds of modes (set by status bit in a protected register):
user mode: processor executes normal instructions in the user’s program
kernel mode: processor executes both normal and privileged instructions (OS == kernel)
Operating systems hide privileged instructions behind system calls. When a user program makes a system call, it causes an exception (it raises a software interrupt), which vectors to a kernel handler, traps to kernel mode, and switches contexts.
Upon encountering a privileged instruction in user mode, the processor traps to kernel mode. Depending on what happened it would be one of several traps, such as a memory access violation, an illegal instruction violation, or a register access violation. The trap switches the processor's execution to kernel mode and switches control to the operating system, which then decides on a course of action. The address is defined by the trap vector, which is set up when the operating system starts up.

Kernel Code vs User Code

Here's a passage from the book
When executing kernel code, the system is in kernel-space executing
in kernel mode. When running a regular process, the system is in
user-space executing in user mode.
Now what really is kernel code and user code? Can someone explain with an example?
Say I have an application that does printf("HelloWorld"). While executing this application, will it be user code or kernel code?
I guess that at some point of time, user-code will switch into the kernel mode and kernel code will take over, but I guess that's not always the case since I came across this
For example, the open() library function does little except call the open() system call.
Still other C library functions, such as strcpy(), should (one hopes) make no direct use
of the kernel at all.
If it does not make use of the kernel, then how does it make everything work?
Can someone please explain the whole thing in a lucid way.
There isn't much difference between kernel and user code as such, code is code. It's just that the code that executes in kernel mode (kernel code) can (and does) contain instructions only executable in kernel mode. In user mode such instructions can't be executed (not allowed there for reliability and security reasons), they typically cause exceptions and lead to process termination as a result of that.
I/O, especially with external devices other than the RAM, is usually performed by the OS somehow and system calls are the entry points to get to the code that does the I/O. So, open() and printf() use system calls to exercise that code in the I/O device drivers somewhere in the kernel. The whole point of a general-purpose OS is to hide from you, the user or the programmer, the differences in the hardware, so you don't need to know or think about accessing this kind of network card or that kind of display or disk.
Memory accesses, OTOH, most of the time can just happen without the OS' intervention. And strcpy() works as is: read a byte of memory, write a byte of memory, oh, was it a zero byte, btw? repeat if it wasn't, stop if it was.
I said "most of the time" because there's often page translation and virtual memory involved, and a memory access may result in a switch into the kernel, so the kernel can load something from the disk into memory and then let the instruction that caused the switch continue.

how does the processor know an instruction is making a system call

system call -- It is an instruction that generates an interrupt that causes OS to gain
control of processor.
So if a running process issues a system call (e.g. create/terminate/read/write etc.), an interrupt is generated which causes the KERNEL TO TAKE CONTROL of the processor, which then executes the required interrupt handler routine. Correct?
Then can anyone tell me how the processor knows that this instruction is supposed to block the process, go to privileged mode, and bring in kernel code?
I mean, as a programmer I would just type stream1=system.io.readfile(ABC) or something, which translates to open and read file ABC.
Now what is monitoring the execution of this process? Is there a magical power in the CPU to detect this?
From what I have read, a PROCESSOR can execute only one process at a time, so WHERE IS THE MONITOR PROGRAM RUNNING?
How can the KERNEL monitor if a system call is made or not when IT IS NOT IN RUNNING STATE!!
Or does the computer have a SYSTEM CALL INSTRUCTION TABLE which it compares with before executing any instruction?
Please help.
Thank you.
The kernel doesn't monitor the process to detect a system call. Instead, the process generates an interrupt which transfers control to the kernel, because that's what software-generated interrupts do according to the instruction set reference manual.
For example, on 32-bit x86 Linux the process puts the syscall number in eax and executes an int 0x80 instruction, which generates interrupt 0x80. The CPU reacts to this by looking in the Interrupt Descriptor Table to find the kernel's handler for that interrupt. This handler is the entry point for system calls.
So, to call _exit(0) (the raw system call, not the glibc exit() function which flushes buffers) in 32-bit x86 Linux:
    .globl _start
_start:
    movl $1, %eax        # the system-call number; __NR_exit is 1 on 32-bit x86
    xorl %ebx, %ebx      # zero ebx: the argument (exit status 0)
    int $0x80            # trap into the kernel
Let's go through each of the questions you have posed.
1. Yes, your understanding is correct.
2. If a process or thread wants to get inside the kernel, there are only two mechanisms: executing a TRAP machine instruction, or an interrupt. Interrupts are usually generated by hardware, so a process or thread that wants to enter the kernel goes through TRAP. When the process executes TRAP, it raises a (software) interrupt to the kernel. Along with the trap you also pass the system call number, which acts as input to the interrupt handler inside the kernel. Based on that number, the kernel finds the matching function in its system call table and starts executing it. As soon as the interrupt is taken, the mode bits in the cs register are switched to kernel mode, which tells the processor that the instructions that follow may be privileged. This is how the processor knows whether the current instruction is allowed to be privileged or not. Once the system call function has finished executing, the kernel executes the IRET instruction, which switches the mode bits in the cs register back to indicate that every instruction from now on comes from user mode.
3. There is no magical power inside the processor; switching between user and kernel context just makes us think that the processor is a magical thing. It is a piece of hardware that can execute tons of instructions at a very high rate.
4, 5, 6. These questions are all answered by the cases above.
I hope I've answered your questions up to some extent.
The interrupt controller signals the CPU that an interrupt has occurred and passes the interrupt number (interrupts are assigned priorities to handle simultaneous interrupts); the interrupt number determines which handler to start. The CPU jumps to the interrupt handler, and when the handler is done, the program state is reloaded and the program resumes.
[Reference: Silberschatz, Operating System Concepts, 8th Edition]
What you're looking for is the mode bit. Basically there is a register called the cs register, whose low two bits hold the current privilege level. Normally they are set to 3 (user mode). While kernel code runs, which is when privileged instructions are allowed, they are 0. Looking at this value, the processor knows whether a privileged instruction may execute. If you're interested in digging more please refer to this excellent article.
Other Ref.
Where is mode bit
Modern hardware supports multiple user sessions. If your hw supports multi-user mode, it provides a mechanism called an interrupt. An interrupt basically stops the execution of the current code to execute other code (e.g. kernel code).
Which code is executed is decided by parameters that get passed to the interrupt by the code that issues it. The hw will increase the run level and force the CPU to execute the kernel code for that interrupt. When the kernel code returns, it again directly informs the hw and the run level gets decreased.
The hw will then restore the CPU state from before the interrupt and set the CPU to the next instruction in the code that started the interrupt. Done.
Since the code is actively calling the hw, which again actively calls the kernel, no monitoring needs to be done by the kernel itself.
Side note:
Try to keep your question short. Make clear what you want. The first answer was correct for the question you posted, you just didn't phrase it well. Make clear that you are new to the topic and need a detailed explanation of basic concepts instead of explaining what you understood so far, and don't use caps lock.
Please accept the answer cnicutar provided. thank you.