Time-Stamp Counter Restriction - operating-system

I want to check whether the RDTSC instruction is available. That requires an Intel Pentium or newer processor, and either the TSD flag in register CR4 must be clear, or it must be set and the CPL must equal 0.
Obtaining the current privilege level is no problem (bits 0 and 1 of the CS segment register), and neither is checking whether the CPU supports the instruction at all (CPUID.1:EDX[4] = 1).
The problem is that this must also run in user mode (PL3), and I can't read control register CR4 in user mode.
Is there any other way to check whether the operating system restricts access to the time-stamp counter?
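For reference, the two checks the question calls unproblematic can be sketched like this, assuming GCC or Clang on x86 (the `__get_cpuid` helper comes from their `<cpuid.h>` header; `cpu_has_tsc` and `current_cpl` are hypothetical names). Reading CS is not a privileged operation, which is why the CPL check works in user mode, but neither check says anything about CR4.TSD.

```c
#include <assert.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h> /* GCC/Clang header that provides __get_cpuid */
#endif

/* Returns 1 if CPUID.1:EDX[4] (TSC) is set, 0 if clear, -1 if unknown. */
int cpu_has_tsc(void) {
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return -1; /* CPUID leaf 1 not available */
    return (int)((edx >> 4) & 1u);
#else
    return -1; /* not an x86 CPU */
#endif
}

/* Returns the CPL (bits 0-1 of CS), or -1 on non-x86 targets. */
int current_cpl(void) {
#if defined(__x86_64__) || defined(__i386__)
    unsigned short cs;
    __asm__ volatile("mov %%cs, %0" : "=r"(cs)); /* allowed at any CPL */
    return cs & 3;
#else
    return -1;
#endif
}
```

In an ordinary user-space process `current_cpl()` returns 3, which is exactly the case where the answer to the CR4.TSD question stays unknown.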

The only way is to "try" the instruction and intercept the exception, provided that the operating system lets you react to the event safely and recover your state so you can continue your program. Unfortunately, not every OS lets a program continue after an exception that it considers "fatal". On Windows you can try structured exception handling; on Linux the exception is delivered as a signal (SIGILL if the CPU doesn't support the instruction, SIGSEGV if the CR4.TSD restriction raises a general-protection fault). Other OSes don't forgive this kind of exception.
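A minimal sketch of the "try it and catch the exception" approach for Linux on x86, assuming GCC/Clang inline assembly; `rdtsc_usable` is a hypothetical name. It installs temporary handlers for SIGILL (what Linux delivers for the #UD fault on a CPU without RDTSC) and SIGSEGV (what it delivers for the #GP fault raised when CR4.TSD is set and CPL > 0), then jumps back out of the handler with siglongjmp.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
static sigjmp_buf probe_env;

/* Jump back into rdtsc_usable() instead of letting the signal kill us. */
static void probe_handler(int sig) {
    siglongjmp(probe_env, sig);
}
#endif

/* Returns 1 if RDTSC ran, 0 if it raised SIGILL/SIGSEGV, -1 if not x86. */
int rdtsc_usable(void) {
#if defined(__x86_64__) || defined(__i386__)
    struct sigaction sa, old_ill, old_segv;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = probe_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGILL, &sa, &old_ill);   /* #UD: CPU lacks RDTSC */
    sigaction(SIGSEGV, &sa, &old_segv); /* #GP: CR4.TSD set and CPL > 0 */

    volatile int result = 0;
    if (sigsetjmp(probe_env, 1) == 0) {
        uint32_t lo, hi;
        __asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));
        (void)lo;
        (void)hi;
        result = 1; /* not reached if the instruction trapped */
    }

    /* Restore whatever handlers were installed before the probe. */
    sigaction(SIGILL, &old_ill, NULL);
    sigaction(SIGSEGV, &old_segv, NULL);
    return result;
#else
    return -1;
#endif
}
```

On Linux you can exercise the restricted path yourself: prctl(PR_SET_TSC, PR_TSC_SIGSEGV) sets TSD for the calling process, after which the probe should return 0. An OS or hypervisor that emulates RDTSC would still make it return 1.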
Bye
PS: in principle, an OS could also intercept the exception and emulate the instruction, so the application would have no way to tell whether the instruction is really available. I don't know whether any OS does this (virtual machines, maybe?).
Bye!

Related

Why can User call a system call directly?

Before I ask the question, the following is what I know.
The system call is in the kernel area.
The kernel area cannot be used (accessed) directly by the user.
There are two ways to invoke a system call:
1. a direct call
2. a wrapper function (API) that contains the system call
(The process for 2:
(user space) wrapper function ->
system call interface ->
(kernel space) system call)
So, in case 1:
How can a user program use the kernel area directly?
Or am I mistaken about something?
What I tried:
asked the question on social media (SNS)
searched the internet
read Operating System Concepts, 10th edition (p. 64)
The default is that nothing in user-space is able to execute anything in kernel space. How that works depends on the CPU and the OS, but likely involves some kind of "privilege level" that must be matched or exceeded before the CPU will allow software to access the kernel's part of virtual memory.
This default behavior alone would be horribly useless. For an OS to work there must be some way for user-space to transfer control/execution to (at least one) clearly marked and explicitly allowed kernel entry point. This also depends on the OS and CPU.
For example, across the whole 80x86 family (all CPUs and all CPU modes), an OS can choose between:
a software interrupt (interrupt gate or trap gate)
an exception (e.g. breakpoint exception)
a call gate
a task gate
the sysenter instruction
the syscall instruction
...and most modern operating systems now choose the syscall instruction.
All of these possibilities share 2 things in common:
a) There is an implied privilege level switch done by the CPU as part of the control transfer
b) The caller is unable to specify the address they're calling. Instead it's set by the kernel (e.g. during the kernel's initialization).
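To make case 1 concrete: a "direct" call is just user code executing the entry instruction itself instead of going through a libc wrapper. A minimal sketch, assuming Linux on x86-64 with GCC/Clang inline assembly (`raw_getpid` is a hypothetical name). Note that the only thing the caller picks is the system call number in RAX, which is point b) above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h> /* SYS_getpid */
#include <unistd.h>      /* getpid(), syscall() */

/* Issue getpid "directly", without the libc getpid() wrapper. */
long raw_getpid(void) {
#if defined(__x86_64__) && defined(__linux__)
    long ret;
    /* RAX = system call number. The CPU jumps to the entry point the
     * kernel stored in the IA32_LSTAR MSR; the caller cannot choose it.
     * SYSCALL overwrites RCX (return RIP) and R11 (saved RFLAGS). */
    __asm__ volatile("syscall"
                     : "=a"(ret)
                     : "a"((long)SYS_getpid)
                     : "rcx", "r11", "memory");
    return ret;
#else
    /* Elsewhere, fall back to libc's generic syscall() helper. */
    return syscall(SYS_getpid);
#endif
}
```

Both paths end up at the same kernel entry point; the wrapper just saves you from writing the assembly and from knowing each architecture's calling convention.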

how does the operating system treat few interrupts and keep processes going?

I'm learning computer organization and structure (I'm using Linux on x86-64). We've studied that when an interrupt occurs in user mode, the OS is notified and it switches from the user stack to the kernel stack by loading the kernel's rsp from the TSS; afterwards it saves the necessary registers (such as rip), and in case of a software interrupt it also saves the error code. Finally, just before jumping to the appropriate handler routine, it zeroes TF, and in case of a hardware interrupt it zeroes IF as well. I wanted to ask about a few things:
1. The error code is saved in the rip, so why load both?
2. If I consider a case where a few interrupts happen together, which causes IF and TF to turn on, and I zero TF and IF but treat only one interrupt at a time, aren't I leaving all the other interrupts untreated? In general, how does the OS treat several interrupts that occur at the same time when using the IDT method with a specific vector for each interrupt? Does this happen because each program has its own virtual memory, and thus the interrupt handling of all the programs is unrelated? Where can I read more about it?
3. How does an operating system keep other necessary processes running while handling the interrupt?
Thank you very much for your time and attention!
1. The error code is saved in the rip, so why load both?
You're misunderstanding some things about the error code. Specifically:
it's not generated by software interrupts (e.g. instructions like int 0x80)
it is generated by some exceptions (page fault, general protection fault, double fault, etc).
the error code (if used) is not saved in the RIP, it's pushed on the stack so that the exception handler can use it to get more information about the cause of the exception
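To make the difference concrete, here is a sketch of what a 64-bit exception handler finds on its stack, plus a decoder for the page-fault error code. The frame layout and bit meanings follow Intel's SDM (Volume 3) as I understand it; AMD-only vectors are left out, and the struct and function names are hypothetical. RIP and the error code are two separate slots: RIP says where to resume, the error code says why the exception happened.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stack seen by a 64-bit handler for an exception that pushes an error
 * code, lowest address first. */
struct exception_frame {
    uint64_t error_code; /* pushed last, so it sits at the top of the stack */
    uint64_t rip;        /* where to resume; unrelated to the error code */
    uint64_t cs;
    uint64_t rflags;
    uint64_t rsp;        /* in 64-bit mode SS:RSP are always pushed */
    uint64_t ss;
};

/* Intel-defined exception vectors that push an error code. */
int pushes_error_code(int vector) {
    switch (vector) {
    case 8:  /* #DF, always 0 */
    case 10: /* #TS */
    case 11: /* #NP */
    case 12: /* #SS */
    case 13: /* #GP */
    case 14: /* #PF */
    case 17: /* #AC, always 0 */
    case 21: /* #CP (control-flow enforcement) */
        return 1;
    default:
        return 0;
    }
}

/* Low bits of a page-fault (#PF, vector 14) error code. */
struct pf_info {
    int present;     /* bit 0: 1 = protection violation, 0 = page not present */
    int write;       /* bit 1: 1 = write access, 0 = read */
    int user;        /* bit 2: 1 = user-mode access */
    int reserved;    /* bit 3: reserved bit set in a paging entry */
    int instr_fetch; /* bit 4: fault was an instruction fetch */
};

struct pf_info decode_pf_error(uint64_t code) {
    struct pf_info i = {
        .present = (int)(code & 1),
        .write = (int)((code >> 1) & 1),
        .user = (int)((code >> 2) & 1),
        .reserved = (int)((code >> 3) & 1),
        .instr_fetch = (int)((code >> 4) & 1),
    };
    return i;
}
```

For example, error code 0x6 decodes to a user-mode write to a page that is not present, which is the typical fault the kernel resolves by mapping the page and resuming at the saved RIP.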
2a. If I consider a case where a few interrupts happen together, which causes IF and TF to turn on, and I zero TF and IF but treat only one interrupt at a time, aren't I leaving all the other interrupts untreated?
When the IF flag is clear, maskable IRQs (which don't include other types of interrupts such as software interrupts and exceptions) are postponed (not discarded) until the IF flag is set again. They're "temporarily untreated" until they're treated later.
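To illustrate "postponed, not discarded", here is a toy model (hypothetical names; not how any real CPU or Linux is implemented) in which pending IRQs are remembered in a bitmap, loosely like the local APIC's Interrupt Request Register, and delivered only while IF is set, one at a time, highest vector first.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model, not real hardware: 256 vectors of pending IRQs, plus IF. */
struct cpu_model {
    uint64_t pending[4]; /* bit v set = IRQ with vector v is waiting */
    int if_flag;         /* 1 = maskable IRQs may be delivered */
};

void raise_irq(struct cpu_model *c, int vector) {
    c->pending[vector / 64] |= (uint64_t)1 << (vector % 64);
}

/* Returns the vector delivered next, or -1 if nothing can be delivered.
 * With IF clear the IRQs stay pending: postponed, not lost. */
int deliver_next(struct cpu_model *c) {
    if (!c->if_flag)
        return -1;
    for (int v = 255; v >= 0; v--) { /* highest vector first */
        uint64_t bit = (uint64_t)1 << (v % 64);
        if (c->pending[v / 64] & bit) {
            c->pending[v / 64] &= ~bit;
            c->if_flag = 0; /* entering via an interrupt gate clears IF */
            return v;
        }
    }
    return -1;
}
```

If two IRQs are raised while IF is clear, nothing is delivered; once IF is set again, one is delivered, and the second waits until the handler returns and IF is set once more.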
The TF flag only matters for debugging (e.g. single-step debugging, where you want the CPU to generate a trap after every instruction executed). Clearing it only matters if the process (in user space) was being debugged, so that you don't accidentally continue debugging the kernel itself; most processes aren't being debugged like this, so most of the time the TF flag is already clear (and clearing it when it's already clear doesn't do anything).
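A sketch of what the CPU does to RFLAGS when it enters a handler through an IDT gate, based on the Intel SDM's description (the names are hypothetical). One detail worth knowing: whether IF is cleared depends on the gate type the kernel put in the IDT (interrupt gate vs trap gate), not on whether the interrupt came from hardware.

```c
#include <assert.h>
#include <stdint.h>

/* RFLAGS bits involved when the CPU enters a handler through an IDT gate. */
#define RFLAGS_TF (1u << 8)  /* trap flag: single-step */
#define RFLAGS_IF (1u << 9)  /* interrupt enable */
#define RFLAGS_NT (1u << 14) /* nested task */
#define RFLAGS_RF (1u << 16) /* resume flag */

/* RFLAGS the handler starts with, given the interrupted code's RFLAGS.
 * TF, NT and RF are always cleared; IF is cleared only for an
 * interrupt gate (a trap gate leaves IF unchanged). VM is ignored here. */
uint64_t rflags_on_gate_entry(uint64_t rflags, int interrupt_gate) {
    rflags &= ~(uint64_t)(RFLAGS_TF | RFLAGS_NT | RFLAGS_RF);
    if (interrupt_gate)
        rflags &= ~(uint64_t)RFLAGS_IF;
    return rflags;
}
```

The original RFLAGS is pushed before these bits are cleared, so iret restores the interrupted code's TF and IF.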
2b. In general, how does the OS treat several interrupts that occur at the same time when using the IDT method with a specific vector for each interrupt? Does this happen because each program has its own virtual memory, and thus the interrupt handling of all the programs is unrelated? Where can I read more about it?
There are complex rules that determine when an interrupt can interrupt (including when it can interrupt another interrupt). These rules mostly apply only to IRQs (not to software interrupts, which the kernel won't ever use itself, and not to exceptions, which are taken as soon as they occur). Understanding the rules means understanding the IF flag and the interrupt controller (e.g. how interrupt vectors and the "task priority register" in the local APIC influence the "processor priority register" in the local APIC, which determines which groups of IRQs will be postponed even while the IF flag is set). Information about this can be found in Intel's manuals, but how Linux uses it can only be learned from the Linux source code and/or Linux-specific documentation.
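The priority rule mentioned above is small enough to write down. A sketch based on my reading of the local APIC chapter of Intel's SDM (the function names are hypothetical): the PPR takes the higher of the TPR's priority class and that of the interrupt currently being serviced, and an IRQ whose class isn't above that stays pending even while IF is set.

```c
#include <assert.h>
#include <stdint.h>

/* Processor priority: compare the priority classes (bits 7:4) of the
 * TPR and of the highest in-service vector (ISRV; 0 if none). */
uint8_t compute_ppr(uint8_t tpr, uint8_t isrv) {
    if ((tpr >> 4) >= (isrv >> 4))
        return tpr;
    return (uint8_t)(isrv & 0xF0);
}

/* A pending fixed IRQ is delivered only if its priority class is above
 * the PPR's class (and, separately, only while IF = 1). */
int irq_can_be_delivered(uint8_t vector, uint8_t ppr) {
    return (vector >> 4) > (ppr >> 4);
}
```

So while vector 0x41 is being serviced with TPR = 0, vector 0x51 can interrupt the handler, but vector 0x45 (same class 4) has to wait until the handler signals end-of-interrupt.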
On top of that there are "whatever mechanisms and practices the OS felt like adding on top" (e.g. deferred procedure calls, tasklets, softIRQs, additional stack management) that add more complications (and details about those, too, can only be found in the Linux source code and/or Linux-specific documentation).
Note: I'm not a Linux kernel developer so can't/won't provide links to places to look for Linux specific documentation.
3. How does an operating system keep other necessary processes running while handling the interrupt?
A single CPU can't run 2 different pieces of code (e.g. an interrupt handler and user-space code) at the same time. Instead it runs them one at a time (e.g. runs user-space code, then switches to an IRQ handler for a very short amount of time, then returns to the user-space code). Because the IRQ handler only runs for a very short amount of time, it creates the illusion that everything is happening at the same time (even though it's not).
Of course when you have multiple CPUs, different CPUs can/do run different pieces of code at the same time.

Motorola 68K TRAP instruction as a bridge to OS

I'm not an expert, just a hobbyist. I played with the 68000 architecture in the past, and I've always wondered about its TRAP instruction. This instruction is always described as a "bridge" to an OS (some systems don't use it that way, but that's a different story). How is this achieved? TRAP itself is a privileged instruction, so how does this OS-invoking mechanism work in user mode? My guess is that a privilege violation exception is triggered and the exception handler checks which instruction caused it. If it's a TRAP instruction, the instruction is then simply executed (maybe after checking TRAP's operand, i.e. the TRAP vector number), now of course in supervisor mode. Am I right?
The TRAP instruction is not privileged, you can call it from either user mode or supervisor mode.
It's the TRAP instruction itself that forces the CPU into supervisor mode, and then, depending on the #xx number you used, it jumps to one of 16 possible handlers whose addresses are stored in the vector table from $80 to $BC.
TRAP also pushes the PC and SR onto the (supervisor) stack, so when the handler returns (with RTE) the CPU goes back to whatever mode was set up before you called TRAP.
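The vector range in that answer follows from the 68000's exception table: TRAP #n uses vector 32 + n, and each vector is a 4-byte handler address, so the entries run from $80 (TRAP #0) to $BC (TRAP #15). A minimal C model of that arithmetic and of the SR change, with hypothetical names; it assumes a plain 68000 whose table starts at address 0 (the 68010 and later add a vector base register).

```c
#include <assert.h>
#include <stdint.h>

#define SR_S (1u << 13) /* supervisor bit */
#define SR_T (1u << 15) /* trace bit */

/* Address of the vector-table entry used by TRAP #n on a 68000
 * (vector number 32 + n, 4 bytes per entry, table at address 0). */
uint32_t trap_vector_address(unsigned n) {
    assert(n < 16);
    return (32u + n) * 4u;
}

/* Model of the mode switch: the old SR is saved (with the PC) on the
 * supervisor stack, and the new SR has S set and T cleared.
 * RTE later restores the saved SR, and with it the old mode. */
uint16_t sr_after_trap(uint16_t old_sr) {
    return (uint16_t)((old_sr | SR_S) & ~SR_T);
}
```

Because user code can only pick n, and the handler addresses live in a table the OS controls, TRAP gives user mode a fixed set of doors into supervisor mode rather than arbitrary access.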

What exactly happens when an OS goes into kernel mode?

I find that neither my textbooks nor my googling skills give me a proper answer to this question. I know it depends on the operating system, but in general terms: what happens, and why?
My textbook says that a system call causes the OS to go into kernel mode, given that it's not already there. This is needed because kernel mode is what has control over I/O devices and other things outside a specific process's address space. But if I understand it correctly, a switch to kernel mode does not necessarily mean a process context switch (where you save the current state of the process somewhere other than the CPU so that some other process can run).
Why is this? I was kinda thinking that some "admin" process was switched in, took care of the system call from the process, and sent the result to the process's address space, but I guess I'm wrong. I can't seem to grasp what ACTUALLY happens in a switch to and from kernel mode and how this affects a process's ability to operate on I/O devices.
Thanks a lot :)
EDIT: bonus question: does a library call necessarily end up in a system call? If no, do you have any examples of library calls that do not end up in system calls? If yes, why do we have library calls?
Historically, system calls have been issued with software interrupts. Linux used vector 0x80 and Windows NT used vector 0x2E, with the function's index stored in the eax register. More recently, we started using the SYSENTER/SYSEXIT instructions (and, in 64-bit mode, SYSCALL/SYSRET). User applications run in ring 3, i.e. user space/user mode. Switching from kernel mode to user mode requires special care: it actually involves making the CPU believe it is returning to code that was interrupted in user mode, using a special instruction called iret. The only ways to get from user mode back to kernel mode are through an interrupt or exception, or through a dedicated instruction such as SYSENTER (SYSEXIT goes in the other direction). For interrupts, the CPU uses a special structure called the Task State Segment (TSS) to find where the kernel's stack is (SYSENTER takes the kernel stack pointer from a model-specific register instead). So the CPU does switch stacks and privilege level, but this is not a process/task switch: the kernel code runs on behalf of the same process.
But what really happens?
When you issue a system call with int 0x80, the CPU looks up the interrupt vector's index in another special structure, the Interrupt Descriptor Table (IDT for short), and finds an address: this is where the function that handles the system call lives. Because that handler runs at a higher privilege level, the CPU also gets the kernel's stack from the TSS (the ss0:esp0 values) and switches to it. On the kernel stack it pushes the user's stack segment and stack pointer, the flags register, the code segment, and the instruction pointer of the instruction after the int instruction. After the system call has been serviced, the kernel issues an iret; the CPU pops those values, returns to user mode, and your application continues as normal.
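A sketch of the frame described above, as a C struct for 32-bit protected mode (lowest address first, no error code), together with the reason iret can be used to "fake" a return to user mode: the privilege level iret returns to is the low 2 bits of the saved CS. The names are hypothetical, and the selectors are whatever the kernel's GDT defines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What "int 0x80" from ring 3 leaves on the kernel stack in 32-bit
 * protected mode, lowest address first (EIP is where the new ESP points). */
struct int_frame32 {
    uint32_t eip;    /* instruction after the int instruction */
    uint32_t cs;     /* user code selector in the low 16 bits; RPL = 3 */
    uint32_t eflags;
    uint32_t esp;    /* user stack pointer */
    uint32_t ss;     /* user stack selector in the low 16 bits */
};

/* iret returns to the privilege level in the saved CS's low 2 bits. */
int cpl_after_iret(const struct int_frame32 *f) {
    return (int)(f->cs & 3);
}

/* Builds a frame a kernel could iret through to start user code. */
struct int_frame32 make_user_entry_frame(uint32_t entry, uint32_t user_stack,
                                         uint32_t user_cs, uint32_t user_ss) {
    struct int_frame32 f = {
        .eip = entry,
        .cs = user_cs,
        .eflags = 0x202, /* IF set; bit 1 is always 1 */
        .esp = user_stack,
        .ss = user_ss,
    };
    return f;
}
```

For example, with a hypothetical GDT whose user code selector is 0x1B (index 3, RPL 3), `cpl_after_iret` on the frame built by `make_user_entry_frame` gives 3, so iret lands in user mode even though no user code ever ran before.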
Do all library calls end in system calls?
Well, most of them do, but some don't. For example, memcpy and the other memory and string functions run entirely in user space.
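A tiny illustration with a hypothetical helper: everything it calls only reads and writes memory the process already owns, so in typical C libraries no system call is made inside it (on Linux you could confirm this with a tool such as strace).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copies src into dst (at most cap bytes including the terminator) and
 * upper-cases it, using only library calls that stay in user space. */
size_t copy_upper(char *dst, size_t cap, const char *src) {
    size_t n = strlen(src);
    if (cap == 0)
        return 0;
    if (n >= cap)
        n = cap - 1; /* truncate to fit */
    memcpy(dst, src, n);
    dst[n] = '\0';
    for (size_t i = 0; i < n; i++)
        dst[i] = (char)toupper((unsigned char)dst[i]);
    return n;
}
```

Library calls like these exist because they're useful code you'd otherwise rewrite yourself; only the ones that need the kernel's help (I/O, memory mapping, process control) end up in a system call.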

Where is the mode bit?

I just read this in "Operating System Concepts" from Silberschatz, p. 18:
A bit, called the mode bit, is added to the hardware of the computer to indicate the current mode: kernel (0) or user (1). With the mode bit, we are able to distinguish between a task that is executed on behalf of the operating system and one that is executed on behalf of the user.
Where is the mode bit stored?
(Is it a register in the CPU? Can you read the mode bit? As far as I understand it, the CPU has to be able to read the mode bit. How does it know which program gets mode bit 0? Do programs at a special address get mode bit 0? Who sets the mode bit, and how is it set?)
Please note that the answer to your question depends heavily on the CPU itself; though it's uncommon, you might come across processors where this concept of user level/kernel level doesn't exist at all.
The cs register has another important function: it includes a 2-bit field that specifies the Current Privilege Level (CPL) of the CPU. The value 0 denotes the highest privilege level, while the value 3 denotes the lowest one. Linux uses only levels 0 and 3, which are respectively called Kernel Mode and User Mode.
(Taken from "Understanding the Linux Kernel 3e", section 2.2.1)
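On x86 that 2-bit field is the low two bits of the segment selector held in CS; the remaining bits pick a descriptor. A small decoder following the x86 selector format (the names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an x86 segment selector such as the value in CS. */
struct selector {
    unsigned index; /* bits 15:3, entry number in the GDT or LDT */
    unsigned ti;    /* bit 2, table indicator: 0 = GDT, 1 = LDT */
    unsigned rpl;   /* bits 1:0; for CS this is the CPL */
};

struct selector decode_selector(uint16_t sel) {
    struct selector s = {
        .index = (unsigned)(sel >> 3),
        .ti = (unsigned)((sel >> 2) & 1u),
        .rpl = (unsigned)(sel & 3u),
    };
    return s;
}
```

On 64-bit Linux, user code runs with CS = 0x33 and the kernel with CS = 0x10, which decode to index 6 with RPL 3 and index 2 with RPL 0 respectively.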
Also note that, as you can see, this depends on the CPU and will change from one to another, but the concept generally holds.
Who sets it? Typically the kernel and the CPU; a user process cannot change it. Let me explain.
**This is an over-simplification, do not take it as it is**
Let's assume that the kernel is loaded and the first application (the first shell) has just started. The kernel loads everything this application needs to start, sets the privilege bits in the cs register (if you are running on x86), and then jumps to the code of the shell process.
The shell will continue to execute all of its instructions in this context. If the process contains a privileged instruction, the CPU will fetch it but won't execute it; instead it raises an exception (a hardware exception) that tells the kernel someone tried to execute a privileged instruction, and the kernel code handles it (the CPU sets cs to kernel mode and jumps to a known location that handles this type of error, which may terminate the process or do something else).
So how can a process do something privileged? Talking to a certain device for instance?
This is where system calls come in: the kernel will do this job for you.
What happens is the following:
You put what you want in certain places (for instance, that you want to access a file, that the file's location is x, that you're accessing it for reading, etc.), usually in registers (the kernel documentation will tell you which), and then (on x86) you execute the int 0x80 instruction.
This interrupts the CPU, stops your work, sets the mode to kernel mode, and jumps to a known location (the kernel's system-call handler), from which the kernel runs the code that serves your file I/O request.
Once your data is ready, the kernel puts it somewhere you can access (a memory location or a register; it depends on the CPU, the kernel, and what you requested), sets cs back to user mode, and jumps back to the instruction after your int 0x80 instruction.
Finally, the same thing happens whenever a switch happens: when something occurs that the kernel must be notified about, the CPU stops what you were executing, changes the CPU state, and jumps to the code that handles that event. Roughly speaking, the process explained above is how every switch between user mode and kernel mode happens.
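The "known location" step can be pictured as a table lookup. A toy model of the kernel side of int 0x80 (the handler names and system call numbers are made up for illustration; Linux does keep a table of this kind, sys_call_table, but the real one is more involved): the user chooses only a number, and the kernel decides what code that number runs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy model: the user passes only a number (in EAX) plus arguments. */
typedef long (*syscall_fn)(long a, long b, long c);

static long sys_add_demo(long a, long b, long c) {
    (void)c;
    return a + b;
}

static long sys_answer_demo(long a, long b, long c) {
    (void)a;
    (void)b;
    (void)c;
    return 42;
}

/* The table lives in kernel memory, so user code cannot change it. */
static const syscall_fn syscall_table[] = {
    sys_add_demo,    /* number 0 (hypothetical) */
    sys_answer_demo, /* number 1 (hypothetical) */
};

/* Returns the value the kernel would place in EAX before returning. */
long dispatch_syscall(long nr, long a, long b, long c) {
    size_t count = sizeof syscall_table / sizeof syscall_table[0];
    if (nr < 0 || (size_t)nr >= count)
        return -ENOSYS; /* unknown number: Linux-style error return */
    return syscall_table[nr](a, b, c);
}
```

The bounds check is the important design point: since the number comes from user space, the kernel must reject anything outside the table before using it as an index.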
It's a CPU register. It's only accessible if you're already in kernel mode.
The details of how it gets set depend on the CPU design. In most common hardware, it gets set automatically when executing a special opcode that's used to perform system calls. However, there are other architectures where certain memory pages may have a flag set that indicates that they are "gateways" to the kernel -- calling a function on these pages sets the kernel mode bit.
These days it's given other names such as Supervisor Mode or a protection ring.