I'm learning computer organization and structure (I'm using Linux OS with x86-64 architecture). we've studied that when an interrupt occurs in user mode, the OS is notified and it switches between the user stack and the kernel stack by loading the kernels rsp from the TSS, afterwards it saves the necessary registers (such as rip) and in case of software interrupt it also saves the error-code. in the end, just before jumping to the adequate handler routine it zeroes the TF and in case of hardware interrupt it zeroes the IF also. I wanted to ask about few things:
the error code is save in the rip, so why loading both?
if I consider a case where few interrupts happen together which causes the IF and TF to turn on, if I zero the TF and IF, but I treat only one interrupt at a time, aren't I leave all the other interrupts untreated? in general, how does the OS treat few interrupts that occur at the same time when using the method of IDT with specific vector for each interrupt?
does this happen because each program has it's own virtual memory and thus the interruption handling processes of all the programs are unrelated? where can i read more about it?
how does an operating system keep other necessary progresses running while handling the interrupt?
thank you very much for your time and attention!
the error code is save in the rip, so why loading both?
You're misunderstanding some things about the error code. Specifically:
it's not generated by software interrupts (e.g. instructions like int 0x80)
it is generated by some exceptions (page fault, general protection fault, double fault, etc).
the error code (if used) is not saved in the RIP, it's pushed on the stack so that the exception handler can use it to get more information about the cause of the exception
2a. if I consider a case where few interrupts happen together which causes the IF and TF to turn on, if I zero the TF and IF, but I treat only one interrupt at a time, aren't I leave all the other interrupts untreated?
When the IF flag is clear, mask-able IRQs (which doesn't include other types of interrupts - software interrupts, exceptions) are postponed (not disabled) until the IF flag is set again. They're "temporarily untreated" until they're treated later.
The TF flag only matters for debugging (e.g. single-step debugging, where you want the CPU to generate a trap after every instruction executed). It's only cleared in case the process (in user-space) was being debugged, so that you don't accidentally continue debugging the kernel itself; but most processes aren't being debugged like this so most of the time the TF flag is already clear (and clearing it when it's already clear doesn't really do anything).
2b. in general, how does the OS treat few interrupts that occur at the same time when using the method of IDT with specific vector for each interrupt? does this happen because each program has it's own virtual memory and thus the interruption handling processes of all the programs are unrelated? where can i read more about it?
There's complex rules that determine when an interrupt can interrupt (including when it can interrupt another interrupt). These rules mostly only apply to IRQs (not software interrupts that the kernel won't ever use itself, and not exceptions which are taken as soon as they occur). Understanding the rules means understanding the IF flag and the interrupt controller (e.g. how interrupt vectors and the "task priority register" in the local APIC influence the "processor priority register" in the local APIC, which determines which groups of IRQs will be postponed when the IF flag is set). Information about this can be obtained from Intel's manuals, but how Linux uses it can only be obtained from Linux source code and/or Linux specific documentation.
On top of that there's "whatever mechanisms and practices the OS felt like adding on top" (e.g. deferred procedure calls, tasklets, softIRQs, additional stack management) that add more complications (which can also only be obtained from Linux source code and/or Linux specific documentation).
Note: I'm not a Linux kernel developer so can't/won't provide links to places to look for Linux specific documentation.
how does an operating system keep other necessary progresses running while handling the interrupt?
A single CPU can't run 2 different pieces of code (e.g. an interrupt handler and user-space code) at the same time. Instead it runs them one at a time (e.g. runs user-space code, then switches to an IRQ handler for very short amount of time, then returns to the user-space code). Because the IRQ handler only runs for a very short amount of time it creates the illusion that everything is happening at the same time (even though it's not).
Of course when you have multiple CPUs, different CPUs can/do run different pieces of code at the same time.
Related
I have read some books about os kernel recently. I knew that when an event (like clock ticks) happens, it will trigger an interruption then the kernel's specified routine response.
So my questions are:
1)When an interruption was triggered and its corresponding kernel routine was still running, then another interruption was triggered for some sort of reason. How will the kernel response? Will it mask the second interruption when it was handling the first interruption? Or the first interruption's corresponding routine was interrupted by the second one? If the second condition was true, how the kernel make sure the routines are "reentrance"?
2)Does the kernel multithreaded or multiprocess? I mean when things go like the first question, the kernel will use CPU's extra cores to handle interruptions? If it did, how can the kernel make sure everything works correctly just like running on a single-core CPU?
1) If an interruption is triggered and its corresponding kernel routine is still running, then another interruption is triggered for some sort of reason; how will the kernel respond? Will it mask the second interruption when it was handling the first interruption? Or the first interruption's corresponding routine was interrupted by the second one?
Yes; different operating systems may either:
mask other IRQs while an IRQ is being handled
allow different IRQs to nest (interrupt each other)
allow all IRQs to nest (including the same IRQ interrupting itself)
mask some IRQs and allow other IRQs to nest
not use more than one IRQs (e.g. only use a timer IRQ, and poll everything else)
If the second condition was true, how does the kernel make sure the routines are "reentrant"?
If the OS designer decided that (some or all) IRQs may interrupt others; then they'll need to figure out how reentrancy will work for whatever cases they allowed. This can be "do nothing that causes a problem" (e.g. maybe IRQ handler just sends a notification to a task that does the real work later), and could be further restrictions (e.g. temporarily acquire a lock that prevents further IRQs for pieces of the IRQ handler that might cause a reentrancy problem but not other pieces that don't).
2) Does the kernel multithreaded or multiprocess? I mean when things go like the first question, the kernel will use CPU's extra cores to handle interruptions?
Yes; different operating systems may either use multi-threading or multi-processing (or both or neither); and may or may not use other cores to handle interrupts.
If it did, how can the kernel make sure everything works correctly just like running on a single-core CPU?
If a kernel does use other cores to handle interrupts; it will also do something to ensure everything works correctly. "Something" could be a system of locks, or transaction memory, or lock-free/block-free algorithms, or a "shared nothing" approach, or a combination of these things.
Quoting the following paragraph from Operating Systems: Three Easy Pieces,
Note that there are two types of register saves/restores that happen
during this protocol. The first is when the timer interrupt occurs; in
this case, the user registers of the running process are implicitly
saved by the hardware, using the kernel stack of that process. The
second is when the OS decides to switch from A to B; in this case, the
kernel registers are explicitly saved by the software (i.e., the OS),
but this time into memory in the process structure of the process.
Reading other literature on context switch I understand that timer interrupt throws the cpu into kernel mode which then saves the process context into kernel stack.
Why is the author talking about a multiple context save emphasising on hardware/software?
The author emphasizes on hardware/software part of it because basically its context saving that is being done , sometimes by hardware and sometimes by software.
When a timer interrupt occurs, the user registers are saved by hardware(meaning saved by the CPU itself) on the kernel stack of that process. When the interrupt handler code is finished, the user registers will be restored using the kernel stack of that process,thereby restoring user stack and process successfully returns to user mode from kernel mode.
In case of a context switch from process A to process B, the very kernel stacks of the two processes A and B are switched,inside the kernel, which indirectly means saving and restoring of kernel registers. The term Software is used because the scheduler process after selecting which process to run next calls a function(thats why software), that does the switching of kernel stacks. The context switch code need not worry about user register values - those are already safely saved away in the kernel stack by that point.
The first is when the timer interrupt occurs; in this case, the user registers of the running process are implicitly saved by the hardware, using the kernel stack of that process.
Often, only SOME of the registers are saved and this is usually to an interrupt stack.
The second is when the OS decides to switch from A to B; in this case, the kernel registers are explicitly saved by the software (i.e., the OS), but this time into memory in the process structure of the process.
Usually this switch occurs in HARDWARE through a special instruction. Maybe they are referring to the fact that the switch is triggered through software as opposed to an interrupt that is triggered by hardware.
Also thanks for that reference. I have just started to go through it. It MUCH better than most of the OS books that only serve to confuse.
I am trying to understand how a virtual machine monitor (VMM) virtualizes the CPU.
My understanding right now is that the CPU issues a protection fault interrupt when a privileged instruction is about to be executed while the CPU is in user mode. In high level languages like C, privileged instructions are wrapped inside system calls. For example, when an application needs the current date and time (instructions that interact with I/O devices are privileged), it calls a certain library function. The assembled version of this library function contains an instruction called 'int' that causes a trap in the CPU. The CPU switches from user mode to privileged mode and jumps to the trap handler the OS has provided. Each system call has its own trap handler. In this example, the trap handler reads the date and time from the hardware clock and returns, then the CPU switches itself from privileged to user mode. (source: http://elvis.rowan.edu/~hartley/Courses/OperatingSystems/Handouts/030Syscalls.html)
However, I am not quite sure this understanding is correct. This article mentions the notion that the (privileged) x86 popf instruction does not cause a trap, and thus complicates things for the VMM: http://www.csd.uwo.ca/courses/CS843a/papers/intro-vm.pdf. In my understanding the popf instruction should not cause a trap but a protection fault interrupt, when explicitly called by a user program and not through a system call.
So my two concrete questions are:
What happens when a user program executes a privileged instruction while the CPU is in user mode?
What happens when a user program performs a system call?
In no particular order:
Your confusion is mainly caused by the fact that the operating systems community does not have standardized vocabulary. Here are some terms that get slung around that sometimes mean the same thing, sometimes not: exception, fault, interrupt, system call, and trap. Any individual author will generally use the terms consistently, but different authors define them differently.
There are 3 different kinds of events that cause entry into privileged mode.
An asynchronous interrupt (caused, for example, by an i/o device needing service.)
A system call instruction (int on the x86). (More generally in the x86 manuals these are called traps and include a couple of other instructions (for debuggers mostly.))
An instruction that does something exceptional (illegal instruction, protection fault, divide-by-0, page fault, ...). (Different authors calls these exceptions, faults or traps. x86 manuals call these faults.)
Each interrupt, trap or fault has a different number associated with it.
In all cases:
The processor enters privileged mode.
The user-mode registers are saved somewhere.
The processor finds the base address of the interrupt vector table, and uses the interrupt/trap/fault number as an offset into the table. This gives a pointer to the service routine for that interrupt/trap/fault.
The processor jumps to the service routine. Now we are in protected mode, the user level state is all saved somewhere we can get at it, and we're in the correct code inside the operating system.
When the service routine is finished it calls an interrupt-return instruction (iret on x86.) (This is the subtle distinction between a fault and a trap on x86: faults return to the instruction that caused the fault, traps return to the instruction after the trap.)
Note the confusing name "interrupt vector table." Even though it is called an interrupt table, it is used for faults and traps as well. (Which leads some authors to call everything an interrupt.)
The popf issue is rather subtle. This is essentially a bug in the x86 architecture. When popf executes from user mode it does not cause a trap or fault (or exception or interrupt or whatever you want to call it.) It simply acts as a noop.
Does this matter? Well, for a normal OS it doesn't really matter. If, on the other hand, you are implementing a virtual machine monitor (like VMWare or Xen or Hyper-V), the VMM is running in protected mode, and you'd like to run the guest operating systems in user mode and efficiently emulate any protected mode code. When the guest operating system uses a popf instruction you want it to generate a general protection fault, but it doesn't. (The cli and sti instructions do generate a general protection fault if called from user mode, which is what you want.)
I'm not an expert on computer architecture. But I have several opinions for your consideration:
The CPU has two kinds of instructions
normal instructions, e.g., add, sub, etc.
privileged instructions, e.g., initiate I/O, load/store from protected memory etc.
The machine (CPU) has two kinds of modes (set by status bit in a protected register):
user mode: processor executes normal instructions in the user’s program
kernel mode: processor executes both normal and privileged instructions (OS == kernel)
Operating systems hide privileged instructions as system calls. And if user program calls them, it will cause an exception (throws a software interrupt), which
vectors to a kernel handler, trap to kernel modes and switch contexts.
Upon encountering a privileged instruction in user mode, processor trap to kernel mode. Depending on what happened it would be one of several traps, such as a memory access violation, an illegal instruction violation, or a register access violation. The trap switches the processor’s execution to kernel mode and switches control to the operating system, which then decides on a course of action. The address is defined by the trap vector, which is set up when the operating system starts up.
I just read this in "Operating System Concepts" from Silberschatz, p. 18:
A bit, called the mode bit, is added to the hardware of the computer
to indicate the current mode: kernel(0) or user(1). With the mode bit,
we are able to distinguish between a task that is executed on behalf
of the operating system and one that is executed on behalf of the
user.
Where is the mode bit stored?
(Is it a register in the CPU? Can you read the mode bit? As far as I understand it, the CPU has to be able to read the mode bit. How does it know which program gets mode bit 0? Do programs with a special adress get mode bit 0? Who does set the mode bit / how is it set?)
Please note that your question depends highly on the CPU itselt; though it's uncommon you might come across certain processors where this concept of user-level/kernel-level does not even exist.
The cs register has another important function: it includes a 2-bit
field that specifies the Current Privilege Level (CPL) of the CPU. The
value 0 denotes the highest privilege level, while the value 3 denotes
the lowest one. Linux uses only levels 0 and 3, which are respectively
called Kernel Mode and User Mode.
(Taken from "Understanding the Linux Kernel 3e", section 2.2.1)
Also note, this depends on the CPU as you can clearly see and it'll change from one to another but the concept, generally, holds.
Who sets it? Typically, the kernel/cpu and a user-process cannot change it but let me explain something here.
**This is an over-simplification, do not take it as it is**
Let's assume that the kernel is loaded and the first application has just started(the first shell), the kernel loads everything for this application to start, sets the bit in the cs register(if you are running x86) and then jumps to the code of the Shell process.
The shell will continue to execute all of its instructions in this context, if the process contains some privileged instruction, the cpu will fetch it and won't execute it; it'll give an exception(hardware exception) that tells the kernel someone tried to execute a privileged instruction and here the kernel code handles the job(CPU sets the cs to kernel mode and jumps to some known-location to handle this type of errors(maybe terminating the process, maybe something else).
So how can a process do something privileged? Talking to a certain device for instance?
Here comes the System Calls; the kernel will do this job for you.
What happens is the following:
You set what you want in a certain place(For instance you set that you want to access a file, the file location is x, you are accessing for reading etc) in some registers(the kernel documentation will let you know about this) and then(on x86) you will call int0x80 instruction.
This interrupts the CPU, stops your work, sets the mode to kernel mode, jumps the IP register to some known-location that has the code which serves file-IO requests and moves from there.
Once your data is ready, the kernel will set this data in a place you can access(memory location, register; it depends on the CPU/Kernel/what you requested), sets the cs flag to user-mode and jumps back to your instruction next to the it int 0x80 instruction.
Finally, this happens whenever a switch happens, the kernel gets notified something happened so the CPU terminates your current instruction, changes the CPU status and jumps to where the code that handles this thing; the process explained above, roughly speaking, applies to how a switch between kernel mode and user-mode happens.
It's a CPU register. It's only accessible if you're already in kernel mode.
The details of how it gets set depend on the CPU design. In most common hardware, it gets set automatically when executing a special opcode that's used to perform system calls. However, there are other architectures where certain memory pages may have a flag set that indicates that they are "gateways" to the kernel -- calling a function on these pages sets the kernel mode bit.
These days it's given other names such as Supervisor Mode or a protection ring.
system call -- It is an instruction that generates an interrupt that causes OS to gain
control of processor.
so if a running process issue a system call (e.g. create/terminate/read/write etc), a interrupt is generated which cause the KERNEL TO TAKE CONTROL of the processor which then executes the required interrupt handler routine. correct?
then can anyone tell me how the processor known that this instruction is supposed to block the process, go to privileged mode, and bring kernel code.
I mean as a programmer i would just type stream1=system.io.readfile(ABC) or something, which translates to open and read file ABC.
Now what is monitoring the execution of this process, is there a magical power in the cpu to detect this?
As from what i have read a PROCESSOR can only execute only process at a time, so WHERE IS THE MONITOR PROGRAM RUNNING?
How can the KERNEL monitor if a system call is made or not when IT IS NOT IN RUNNING STATE!!
or does the computer have a SYSTEM CALL INSTRUCTION TABLE which it compares with before executing any instruction?
please help
thanku
The kernel doesn't monitor the process to detect a system call. Instead, the process generates an interrupt which transfers control to the kernel, because that's what software-generated interrupts do according to the instruction set reference manual.
For example, on Unix the process stuffs the syscall number in eax and runs an an int 0x80 instruction, which generates interrupt 0x80. The CPU reacts to this by looking in the Interrupt Descriptor Table to find the kernel's handler for that interrupt. This handler is the entry point for system calls.
So, to call _exit(0) (the raw system call, not the glibc exit() function which flushes buffers) in 32-bit x86 Linux:
movl $1, %eax # The system-call number. __NR_exit is 1 for 32-bit
xor %ebx,%ebx # put the arg (exit status) in ebx
int $0x80
Let's analyse each questions you have posed.
Yes, your understanding is correct.
See, if any process/thread wants to get inside kernel there are only two mechanisms, one is by executing TRAP machine instruction and other is through interrupts. Usually interrupts are generated by the hardware, so any other process/threads wants to get into kernel it goes through TRAP. So as usual when TRAP is executed by the process it issues interrupt (mostly software interrupt) to your kernel. Along with trap you will also mentions the system call number, this acts as input to your interrupt handler inside kernel. Based on system call number your kernel finds the system call function inside system call table and it starts to execute that function. Kernel will set the mode bit inside cs register as soon as it starts to handle interrupts to intimate the processor as current instruction is a privileged instruction. By this your processor will comes to know whether the current instruction is privileged or not. Once your system call function finished it's execution your kernel will execute IRET instruction. Which will clear mode bit inside CS register to inform whatever instruction from now inwards are from user mode.
There is no magical power inside processor, switching between user and kernel context makes us to think that processor is a magical thing. It is just a piece of hardware which has the capability to execute tons of instructions at a very high rate.
4..5..6. Answers for all these questions are answered in above cases.
I hope I've answered your questions up to some extent.
The interrupt controller signals the CPU that an interrupt has occurred, passes the interrupt number (since interrupts are assigned priorities to handle simultaneous interrupts) thus the interrupt number to determine wich handler to start. The CPu jumps to the interrupt handler and when the interrupt is done, the program state reloaded and resumes.
[Reference: Silberchatz, Operating System Concepts 8th Edition]
What you're looking for is mode bit. Basically there is a register called cs register. Normally its value is set to 3 (user mode). For privileged instructions, kernel sets its value to 0. Looking at this value, processor knows which kind of instruction it is. If you're interested digging more please refer this excellent article.
Other Ref.
Where is mode bit
Modern hardware supports multiple user sessions. If your hw supports multi user mode, i provides a mechanism called interrupt. An interrupt basically stops the execution of the current code to execute other code (e.g kernel code).
Which code is executed is decided by parameters, that get passed to the interrupt, by the code that issues the interrupt. The hw will increase the run level, load the kernel code into the memory and forces the cpu to execute this code. When the kernel code returns, it again directly informs the hw and the run level gets decreased.
The HW will then restore the cpu state before the interrupt and set the cpu the the next line in the code that started the interrupt. Done.
Since the code is actively calling the hw, which again actively calls the kernel, no monitoring needs to be done by the kernel itself.
Side note:
Try to keep your question short. Make clear what you want. The first answer was correct for the question you posted, you just didnt phrase it well. Make clear that you are new to the topic and need a detailed explanation of basic concepts instead of explaining what you understood so far and don't use caps lock.
Please accept the answer cnicutar provided. thank you.