PC and CPU registers when context switching happens? - operating-system

According to this question: Storing and retrieving process control block
The PCB contains a lot of information, and it is managed by the kernel (to prevent user access).
But I have a question about the PC and the CPU registers. Does the kernel save these values every time an instruction is executed, or only during a context switch?
Are PCBs stored in a linked list?

Actually, the values in the CPU registers change as the sequence of instructions runs.
For example, the Instruction Pointer points to the next instruction to be executed, the Stack Pointer holds the address of the current top of the stack, and so on. These are all CPU registers!
One part of the PCB is the processor state data: the pieces of information that define the status of a process when it's suspended, allowing the OS to restart it later and still execute correctly. This always includes the contents of the CPU general-purpose registers, the CPU process status word, the stack and frame pointers, etc. During a context switch, the running process is stopped and another process is given a chance to run. The kernel must stop the execution of the running process, copy out the values in the hardware registers to its PCB, and update the hardware registers with the values from the PCB of the new process. (Taken from Wikipedia.)
Does the kernel save these values every time an instruction is executed, or only during a context switch?
So that answers your question: the kernel saves the values of the hardware (CPU) registers only when it needs to, i.e. on a context switch (and, more generally, when an interrupt or system call enters the kernel), not after every instruction. The rest of the time, the registers simply hold whatever the running process's own instructions put there.
As for your last question: PCBs are 'generally' kept in a doubly linked list data structure!
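As a rough illustration of PCBs on a doubly linked list, here is a sketch in C in the style of a circular list with a sentinel head (Linux, for instance, links its `task_struct`s through an embedded `struct list_head`). The `pcb_node` type and the function names are hypothetical. The main benefit of the "doubly" part is that removing a PCB is O(1), because each node knows both neighbours.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical PCB with embedded links to its neighbours. */
struct pcb_node {
    int pid;
    struct pcb_node *prev, *next;
};

/* Circular list with a sentinel head: an empty list points to itself. */
static void list_init(struct pcb_node *head)
{
    head->prev = head->next = head;
}

/* Insert p just before the head, i.e. at the tail. */
static void list_add_tail(struct pcb_node *head, struct pcb_node *p)
{
    p->prev = head->prev;
    p->next = head;
    head->prev->next = p;
    head->prev = p;
}

/* Unlink p in O(1): no traversal needed. */
static void list_del(struct pcb_node *p)
{
    assert(p->prev != NULL && p->next != NULL);
    p->prev->next = p->next;
    p->next->prev = p->prev;
    p->prev = p->next = NULL;
}

/* Count the PCBs currently on the list. */
static size_t list_len(const struct pcb_node *head)
{
    size_t n = 0;
    for (const struct pcb_node *p = head->next; p != head; p = p->next)
        n++;
    return n;
}
```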

Related

Hardware implementation of algorithms used in the OS scheduler

1. In an OS, when a new process comes along, does the hardware raise an interrupt (while another process is running) so that the OS can create a new PCB data structure for the new process?
2. Consider the Completely Fair Scheduler (CFS) algorithm, running on a single CPU core. As we know, it gives priority to the process that has had the lowest run time so far. Suppose a process is running and its quantum has not expired yet, and in the meantime another process's state changes to ready. Will this cause an interrupt (so the OS can reschedule)?
thanks.
1. In an OS, when a new process comes along, does the hardware raise an interrupt (while another process is running) so that the OS can create a new PCB data structure for the new process?
No; typically the hardware has no idea what any OS uses to keep track of processes (e.g. the contents and order of a PCB data structure's fields, if the OS has a PCB data structure at all, how the OS manages/keeps track of various structures, etc).
Instead, existing software typically makes a system call that provides information about the new process, and the kernel constructs whatever data structures it wants.
For one possible example, an OS might have an "int SpawnProcess(char *executableFileName, char *processName, int maxThreadPriority)" function. When someone calls that function, the kernel might construct a PCB (setting the process name field in that structure to whatever the caller said, setting the file name field, the max thread priority, etc.), then set other fields (CPU time consumed by the process, number of threads belonging to the process, amount of memory consumed by the process, ...) to default values; then put some kind of reference to the new PCB on some kind of master list of processes that exist; then create a TCB (thread control block) for the process's initial thread (and set fields in that structure to default values: thread state, initial thread name, initial thread priority, signal mask, default CPU register state, etc.); then put some kind of reference to the new thread in the new process's PCB; then put some kind of reference to the new thread on a scheduler queue (so that the scheduler knows the thread exists and will give it CPU time). When the scheduler actually does give the new process's initial thread some CPU time, it might start running kernel code that creates a new virtual address space, then loads the executable file and does things like dynamic linking of shared libraries (before finding the entry point from the executable file and jumping/returning to the executable's entry point). All of that is done with normal software, without any special hardware features.
2. Consider the Completely Fair Scheduler (CFS) algorithm, running on a single CPU core. As we know, it gives priority to the process that has had the lowest run time so far. Suppose a process is running and its quantum has not expired yet, and in the meantime another process's state changes to ready. Will this cause an interrupt (so the OS can reschedule)?
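The first half of that sequence (building the PCB and TCB and making them visible) might look roughly like this in C. Everything here is hypothetical: `spawn_process` is modelled on the made-up `SpawnProcess` above, and the PCB/TCB fields, fixed-size tables and FIFO run queue are invented for illustration. Creating the address space and loading the executable are left out entirely. The point is only that this is ordinary kernel software, with no special hardware involved.

```c
#include <assert.h>
#include <string.h>

#define MAX_PROCS   16
#define MAX_THREADS 64
#define NAME_LEN    32

enum thread_state { THREAD_READY, THREAD_RUNNING, THREAD_BLOCKED };

struct tcb {                        /* hypothetical thread control block */
    int tid;
    enum thread_state state;
    int priority;
    struct pcb *owner;
};

struct pcb {                        /* hypothetical process control block */
    int pid;
    char name[NAME_LEN];
    char exe[NAME_LEN];
    int max_thread_priority;
    unsigned long cpu_time;         /* default value: 0 */
    int thread_count;
    struct tcb *threads[4];
};

static struct pcb process_table[MAX_PROCS];   /* "master list" of processes */
static int process_count;
static struct tcb thread_table[MAX_THREADS];
static int thread_count;
static struct tcb *run_queue[MAX_THREADS];    /* simple FIFO scheduler queue */
static int rq_tail;

/* Returns the new pid, or -1 if the tables are full. */
static int spawn_process(const char *exe, const char *name, int max_prio)
{
    assert(exe != NULL && name != NULL);
    if (process_count == MAX_PROCS || thread_count == MAX_THREADS)
        return -1;

    struct pcb *p = &process_table[process_count];
    memset(p, 0, sizeof *p);                  /* default values */
    p->pid = process_count + 1;
    strncpy(p->name, name, NAME_LEN - 1);     /* fields from the caller */
    strncpy(p->exe, exe, NAME_LEN - 1);
    p->max_thread_priority = max_prio;
    process_count++;                          /* now on the master list */

    struct tcb *t = &thread_table[thread_count];   /* initial thread */
    t->tid = thread_count + 1;
    t->state = THREAD_READY;
    t->priority = max_prio;
    t->owner = p;
    thread_count++;

    p->threads[p->thread_count++] = t;        /* link the thread into its PCB */
    run_queue[rq_tail++] = t;                 /* let the scheduler see it */
    return p->pid;
}
```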
This works in the opposite order. Something will happen (a kernel system call or an IRQ) that will (eventually, after device driver and/or other kernel code does some work) cause one or more blocked tasks to unblock and become ready to run; and when that happens (e.g. when an "unblockTask(taskID task)" function in the scheduler is called by something else in the kernel) the scheduler may decide if the recently unblocked task should/shouldn't preempt the currently running task (and scheduler itself may have no clue why the task was unblocked or if any system call or interrupt was originally involved).

How do speculative loads and stores happen in a modern Intel processor? [duplicate]

Given the small program shown below (handcrafted to look the same from a sequential consistency / TSO perspective), and assuming it's being run by a superscalar out-of-order x86 cpu:
Load A <-- A is in main memory
Load B <-- B is in L2
Store C, 123 <-- C is in L1
I have a few questions:
Assuming a big enough instruction-window, will the three instructions be fetched, decoded, executed at the same time? I assume not, as that would break execution in program order.
The first load is going to take longer to fetch A from memory than the second takes to fetch B. Will the latter have to wait until the first is fully executed? Will the fetching of B only start after Load A is fully executed? Or until when does it have to wait?
Why would the store have to wait for the loads? If it does, will the instruction just wait to be committed in the store buffer until the loads finish, or will it have to sit and wait for the loads right after decoding?
Thanks
Terminology: "instruction-window" normally means out-of-order execution window, over which the CPU can find ILP. i.e. ROB or RS size. See Understanding the impact of lfence on a loop with two long dependency chains, for increasing lengths
The term for how many instructions can go through the pipeline in a single cycle is pipeline width. e.g. Skylake is 4-wide superscalar out-of-order. (Parts of its pipeline, like decode, uop-cache fetch, and retirement, are wider than 4 uops, but issue/rename is the narrowest point.)
Terminology: "wait to be committed in the store buffer": the store data + address get written into the store buffer when a store executes. It commits from the store buffer to L1d at any point after retirement, when it's known to be non-speculative.
(In program order, to maintain the TSO memory model of no store reordering. A store buffer allows stores to execute inside this core out of order but still commit to L1d (and become globally visible) in-order. Executing a store = writing address + data to the store buffer.)
See also: Can a speculatively executed CPU branch contain opcodes that access RAM?, What is a store buffer?, and Size of store buffers on Intel hardware? What exactly is a store buffer?
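A toy model of the store-buffer behaviour described above, in C: entries are allocated in program order, a store's address + data are written into its entry when it executes (possibly out of order), and entries only commit to "L1d" from the head of the FIFO, in program order, once retired. The structure, sizes and function names are invented for illustration and say nothing about real buffer sizes or Intel's actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SB_SIZE  8
#define MEM_SIZE 16

struct sb_entry {
    bool executed;    /* address + data written by the store uop */
    bool retired;     /* store is known to be non-speculative */
    unsigned addr;
    uint32_t data;
};

static struct sb_entry sb[SB_SIZE];
static unsigned sb_head, sb_tail;     /* FIFO in program order */
static uint32_t l1d[MEM_SIZE];        /* stand-in for L1d / global memory */

/* Issue: allocate an entry in program order; returns its index. */
static unsigned sb_alloc(void)
{
    assert(sb_tail - sb_head < SB_SIZE);   /* otherwise issue would stall */
    unsigned i = sb_tail++ % SB_SIZE;
    sb[i] = (struct sb_entry){0};
    return i;
}

/* Execute: may happen in any order relative to other stores. */
static void sb_execute(unsigned i, unsigned addr, uint32_t data)
{
    sb[i].addr = addr;
    sb[i].data = data;
    sb[i].executed = true;
}

/* Retirement happens in program order (callers must respect that). */
static void sb_retire(unsigned i) { sb[i].retired = true; }

/* Commit: drain from the head only, so stores become visible in order.
   Returns how many stores were committed. */
static int sb_commit(void)
{
    int n = 0;
    while (sb_head != sb_tail) {
        struct sb_entry *e = &sb[sb_head % SB_SIZE];
        if (!(e->executed && e->retired))
            break;                     /* older store not ready: younger ones wait */
        l1d[e->addr % MEM_SIZE] = e->data;
        sb_head++;
        n++;
    }
    return n;
}
```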
The front-end is irrelevant. 3 consecutive instructions might well be fetched in the same 16-byte fetch block, and might go through pre-decode and decode in the same cycle as a group. And (also or instead) issue into the out-of-order back-end as part of a group of 3 or 4 uops. IDK why you think any of that would cause any potential problem.
The front end (from fetch to issue/rename) processes instructions in program order. Processing simultaneously doesn't put later instructions before earlier ones, it puts them at the same time. And more importantly, it preserves the information of what program order is; that's not lost or discarded because it matters for instructions that depend on the previous one (footnote 1)!
There are queues between most pipeline stages, so (for example on Intel Sandybridge) instructions that pre-decode as part of a group of up-to-6 instructions might not hit the decoders as part of the same group of up-to-4 (or more with macro-fusion). See https://www.realworldtech.com/sandy-bridge/3/ for fetch, and the next page for decode. (And the uop cache.)
Executing (dispatching uops to execution ports from the out-of-order scheduler) is where ordering matters. The out-of-order scheduler has to avoid breaking single-threaded code (footnote 2).
Usually issue/rename is far ahead of execution, unless you're bottlenecked on the front-end. So there's normally no reason to expect that uops that issued together will execute together. (For the sake of argument, let's assume that the 2 loads you show do get dispatched for execution in the same cycle, regardless of how they got there via the front-end.)
But anyway, there's no problem here with starting both loads and the store at the same time. The uop scheduler doesn't know whether a load will hit or miss in L1d. It just sends 2 load uops to the load execution units in a cycle, and a store-address + store-data uop to those ports.
[load ordering]
This is the tricky part.
As I explained in an answer + comments on your last question, modern x86 CPUs will speculatively use the L2 hit result from Load B for later instructions, even though the memory model requires that this load happens after Load A.
But if no other cores write to cache line B before Load A completes, then nothing can tell the difference. The Memory-Order Buffer takes care of detecting invalidations of cache lines that were loaded from before earlier loads complete, and doing a memory-order mis-speculation pipeline flush (rollback to retirement state) in the rare case that allowing load re-ordering could change the result.
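A toy model of that check, in C: Load B's early result is kept but tracked as speculative; if another core's write invalidates B's cache line before Load A completes, the younger load is thrown away and re-executed (standing in for the full pipeline flush). The names and the single-load tracking are simplifications invented for this sketch, not how the Memory-Order Buffer is actually built.

```c
#include <assert.h>
#include <stdbool.h>

/* One in-flight load, tracked by a toy "memory-order buffer". */
struct mob_load {
    unsigned line;    /* cache line the load read from */
    int value;        /* value it (speculatively) returned */
    bool completed;
    bool snooped;     /* line invalidated while the load was still speculative */
};

static int memory[4];  /* toy coherent memory, indexed by cache line */

static void load_execute(struct mob_load *l, unsigned line)
{
    l->line = line;
    l->value = memory[line];
    l->completed = true;
    l->snooped = false;
}

/* Another core writes a line: mark any early load of that line as suspect. */
static void remote_store(struct mob_load *younger, unsigned line, int v)
{
    memory[line] = v;
    if (younger->completed && younger->line == line)
        younger->snooped = true;
}

/* Called when the older load finally completes.
   Returns true if a memory-order mis-speculation "flush" was needed. */
static bool older_load_done(struct mob_load *younger)
{
    if (!younger->snooped)
        return false;                       /* nobody can tell the difference */
    load_execute(younger, younger->line);   /* roll back and re-execute */
    return true;
}
```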
Why would the store have to wait for the loads?
It won't, unless the store-address depends on a load value. The uop scheduler will dispatch the store-address and store-data uops to execution units when their inputs are ready.
It's after the loads in program order, and the store buffer will make it even farther after the loads as far as global memory order is concerned. The store buffer won't commit the store data to L1d (making it globally visible) until after the store has retired. Since it's after the loads, they'll have also retired.
(Retirement is in-order to allow precise exceptions, and to make sure no previous instructions took an exception or were a mispredicted branch. In-order retirement allows us to say for sure that an instruction is non-speculative after it retires.)
So yes, this mechanism does ensure that the store can't commit to L1d until after both loads have taken data from memory (via L1d cache which provides a coherent view of memory to all cores). So this prevents LoadStore reordering (of earlier loads with later stores).
I'm not sure if any weakly-ordered OoO CPUs do LoadStore reordering. It is possible on in-order CPUs when a cache-miss load comes before a cache-hit store, and the CPU uses scoreboarding to avoid stalling until the load data is actually read from a register, if it still isn't ready. (LoadStore is a weird one: see also Jeff Preshing's Memory Barriers Are Like Source Control Operations.) Maybe some OoO exec CPUs can also track cache-miss loads post retirement, when they're known to be definitely happening but the data just still hasn't arrived yet. x86 doesn't do this because it would violate the TSO memory model.
Footnote 1: There are some architectures (typically VLIW) where bundles of simultaneous instructions are part of the architecture in a way that's visible to software. So if software can't fill all 3 slots with instructions that can execute simultaneously, it has to fill them with NOPs. It might even be allowed to swap 2 registers with a bundle that contained mov r0, r1 and mov r1, r0, depending on whether the ISA allows instructions in the same bundle to read and write the same registers.
But x86 is not like that: superscalar out-of-order execution must always preserve the illusion of running instructions one at a time in program order. The cardinal rule of OoO exec is: don't break single-threaded code.
Anything that would violate this can only be done with checking for hazards, or speculatively with rollback upon detection of mistakes.
Footnote 2: (continued from footnote 1)
You can fetch / decode / issue two back-to-back inc eax instructions, but they can't execute in the same cycle because register renaming + the OoO scheduler has to detect that the 2nd one reads the output of the first.
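To see why two back-to-back `inc eax` instructions can be fetched together but not executed together, here is a toy dependency model in C (not how real schedulers are built). Each instruction's earliest execution cycle is the cycle its source register's latest producer finishes, assuming 1-cycle latency and unlimited execution ports. Only true (read-after-write) dependencies delay anything, which is what register renaming leaves you with.

```c
#include <assert.h>

#define NREGS 8

/* A toy instruction: reads src, writes dst ("inc eax" has src == dst). */
struct insn { int src; int dst; };

/* For each instruction, in program order, compute the earliest cycle it
   could execute. ready_at[r] records when the latest producer of
   architectural register r has its result available. */
static void schedule(const struct insn *code, int n, int *cycle_out)
{
    int ready_at[NREGS] = {0};
    for (int i = 0; i < n; i++) {
        assert(code[i].src < NREGS && code[i].dst < NREGS);
        int start = ready_at[code[i].src];   /* wait for the producer */
        cycle_out[i] = start;
        ready_at[code[i].dst] = start + 1;   /* result ready next cycle */
    }
}
```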

How is it possible for OS processes to manage User processes while they themselves are processes?

Recently, I have been reading about Operating Systems, and this bugs me a lot.
How is it really possible for one process to manage another process?
Basically, a CPU simply executes instructions: after executing one instruction, it executes the instruction at the address pointed to by the IP and increments the IP.
Let me elaborate on my doubt with an example. Let's say I have a user process (or simply a process) which is being executed by the CPU. Let's say it has n instructions and is currently executing the i-th instruction, so the IP points to the (i+1)-th instruction.
So at this point, how can the other OS processes, like the scheduler, the dispatcher, etc., come into play, since the CPU is already executing another process?
One solution I could think of (just a guess) is the use of interrupts and interrupt service routines.
But it's only a guess.
PS: I searched and couldn't find any satisfying answer.
With the help of the hardware, timer ticks (periodic timer interrupts) cause the CPU to execute operating system code. This code checks the system state and the time that has elapsed since the current process started running. At this point, the operating system can decide to schedule a different process. All it has to do is save the state of the currently running process and load the state of the process that is about to start running (basically, saving the contents of the registers and then replacing them with the new process's saved values).
Eventually, the CPU is taken away even if the process doesn't want to yield it.
To address your concern, there are no operating system processes in the sense you're thinking of: it isn't as if OS processes sit in the queue, waiting among the other processes. The OS code simply runs whenever a timer interrupt (or another interrupt or system call) hands control to it.

OS: does the process scheduler run in a separate process?

I have a few doubts about how an operating system works.
Scheduler: does the scheduler run in a separate process (like any other process)? What exactly happens when a new process is swapped in? (I know the processor registers and memory tables are updated; my question is how they are updated. Can we write a program to update the registers (sc, pc) to point to a different process?)
The process scheduler could feasibly run in a separate process, but such a design would be very inefficient, since you would have to switch from one process to the scheduling process (which would then have to make several system calls to the kernel) and then on to the new process, as opposed to just placing the scheduler in the kernel, where you need neither system calls nor more than one context switch. Therefore, the scheduler is generally in the exclusive realm of the kernel.
Here are the steps that occur:
1. The scheduler determines which process will run in the next time slot (using one of various algorithms).
2. The scheduler tells the Memory Management Unit (MMU) to use the page table for the next process to run (this is done by setting a register to point to the table).
3. The scheduler programs the Programmable Interval Timer (PIT) to generate an interrupt after N clock cycles.
4. The scheduler restores the state of the registers from when the process was last running (or sets them to default values for new processes).
5. The scheduler jumps to the address of the first instruction the process has not yet executed, i.e. where it left off.
6. After N clock cycles, an interrupt occurs, and the operating system recognizes it as caused by the PIT, whose interrupt is registered to be handled by the scheduler.
7. The scheduler saves the state of the registers (including the stack pointer, etc.), grabs the program counter of where the interrupt occurred (saving it as the address to jump to next time around), and then goes back to step 1.
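The numbered steps can be sketched as a small round-robin simulation in C. It is a toy under obvious assumptions: each instruction takes exactly one clock cycle, a process's entire saved state is just its program counter, and steps 2 to 4 (page tables, timer programming, register restore) are reduced to a comment. `struct proc` and `run_round_robin` are hypothetical names.

```c
#include <assert.h>

/* Hypothetical saved state: here, just the program counter. */
struct proc {
    int pc;       /* index of the next instruction to execute */
    int length;   /* total number of instructions in the program */
};

/* Toy round-robin loop following the steps. The timer fires after
   `quantum` cycles (or the process finishes first). Records which process
   got each time slice in trace[] and returns the number of slices. */
static int run_round_robin(struct proc *procs, int n, int quantum,
                           int *trace, int max_trace)
{
    assert(n > 0 && quantum > 0);
    int finished = 0, slices = 0, next = 0;
    for (int i = 0; i < n; i++)
        if (procs[i].pc >= procs[i].length)
            finished++;

    while (finished < n && slices < max_trace) {
        struct proc *p = &procs[next];          /* step 1: choose a process */
        if (p->pc < p->length) {
            /* steps 2-4 (page table, timer, register restore) elided */
            int pc = p->pc;                     /* step 5: resume at saved pc */
            for (int cycle = 0; cycle < quantum && pc < p->length; cycle++)
                pc++;                           /* the process executes */
            p->pc = pc;                         /* steps 6-7: interrupt, save pc */
            if (p->pc == p->length)
                finished++;
            trace[slices++] = next;
        }
        next = (next + 1) % n;                  /* back to step 1 */
    }
    return slices;
}
```

For example, three processes of 5, 2 and 3 instructions with a quantum of 2 get the CPU in the order 0, 1, 2, 0, 2, 0.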
This is just one example of how it can be done, and many of the low level details are architecture specific. Essentially all the registers (the program state) can be saved to any place in RAM (say a linked list of structures that represent processes each having space for the registers, etc) and the virtual address space (defined by page tables) can be arbitrarily swapped out.
So essentially your question:
"Can we write a program to update the registers to point to a different process?"
is simply stated, and the answer is yes: we sure can.
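As a user-space illustration that a program really can save one set of registers (including the stack pointer and program counter) and load another, here is a sketch using the POSIX `<ucontext.h>` functions `getcontext`, `makecontext` and `swapcontext`. This is not what a kernel scheduler does: there is no preemption, and both flows of control share one process and address space. These functions were also removed from POSIX.1-2008, although glibc still provides them.

```c
#define _XOPEN_SOURCE 700   /* some systems (e.g. macOS) require this for ucontext */
#include <assert.h>
#include <ucontext.h>

static ucontext_t main_ctx, task_ctx;
static char task_stack[64 * 1024];   /* the second flow of control's own stack */
static char log_buf[16];
static int log_len;

static void task_body(void)
{
    log_buf[log_len++] = 'b';
    swapcontext(&task_ctx, &main_ctx);   /* save our registers, load main's */
    log_buf[log_len++] = 'd';
    /* returning continues at uc_link, i.e. main_ctx */
}

/* Returns the interleaving of the two flows of control as a string. */
static const char *run_demo(void)
{
    log_len = 0;
    getcontext(&task_ctx);
    task_ctx.uc_stack.ss_sp = task_stack;
    task_ctx.uc_stack.ss_size = sizeof task_stack;
    task_ctx.uc_link = &main_ctx;        /* where to go when task_body returns */
    makecontext(&task_ctx, task_body, 0);

    log_buf[log_len++] = 'a';
    swapcontext(&main_ctx, &task_ctx);   /* run the task until it switches back */
    log_buf[log_len++] = 'c';
    swapcontext(&main_ctx, &task_ctx);   /* resume the task where it left off */
    log_buf[log_len++] = 'e';
    log_buf[log_len] = '\0';
    assert(log_len == 5);
    return log_buf;
}
```

Calling `run_demo()` returns "abcde": each `swapcontext` stores the current registers in one `ucontext_t` and loads the registers saved in the other.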

How does the dispatcher work?

I have recently started my OS course. As far as I know, the job of the dispatcher is to save the context of the current process and load the context of the process to be run next. But how does it do that? When a process is preempted, then as soon as the dispatcher is loaded and executed (as it is also a program), the context of the previous process in the registers, PSW, etc. will be lost. How is it going to save the context before loading itself?
The simple answer is that some processors offer architectural extensions providing several banks of registers that can be swapped in hardware, so up to X tasks get to retain their full set of registers.
The more complex answer is that the dispatcher, when triggered by an interrupt, receives the full register set of the program that was running at the time of the interrupt (with the exception of the program counter, which the hardware saves as part of taking the interrupt, e.g. by pushing it on the stack or copying it into a dedicated register). Thus, the dispatcher must be carefully written to store the current state of the register bank as its first operation upon being triggered. In short, the dispatcher itself has no immediate context and thus doesn't suffer from the same problem.
Here is an attempt at a simple description of what goes on during the dispatcher call:
The program that currently has context is running on the processor. Registers, program counter, flags, stack base, etc are all appropriate for this program; with the possible exception of an operating-system-native "reserved register" or some such, nothing about the program knows anything about the dispatcher.
The timed interrupt for the dispatcher function is triggered. The only thing that happens at this point (in the vanilla architecture case) is that the hardware saves the interrupted program counter and jumps to the handler address that the OS registered for that interrupt in the interrupt vector table. This begins execution of the dispatcher's "dispatch" subroutine; everything else is left untouched, so the dispatcher sees the registers of the program that was previously executing.
The dispatcher (like all programs) has a set of instructions that operate on the current register set. These instructions are written in such a way that they know that the previously executing application has left all of its state behind. The first few instructions in the dispatcher will store this state in memory somewhere.
The dispatcher determines what the next program to have the cpu should be, takes all of its previously stored state and fills registers with it.
The dispatcher jumps to the saved program counter of the task that now has its full context established on the CPU.
To (over)simplify in summary: the dispatcher doesn't need registers of its own; all it does is write the current CPU state to a predetermined memory location, load another process's CPU state from a predetermined memory location, and jump to where that process left off.
Does that make it any clearer?
It doesn't generally get loaded in such a way that you lose the information on the current process.
Often, it's an interrupt that happens in the context of the current process.
So the dispatcher (or scheduler) can save all relevant information in a task control block of some sort before loading up that information for the next process.
This includes the register contents, stack pointer and so on.
It's worth noting that the context of the next process includes its state of being in the dispatcher interrupt itself so that, when it returns from the interrupt, it's to a totally different process.
The dispatcher module gives control of the CPU to the process selected by the short-term scheduler; this involves:
switching context,
switching to user mode,
jumping to the proper location in the user program to restart that program
The operating system's principal responsibility is controlling the execution of processes. This includes determining the pattern for execution and allocating resources to the processes.
A process may be in one of two states:
Running or
Not Running
When the OS creates a new process, it creates a process control block for the process and enters the process into the system in the Not Running state. The process exists, is known to the OS, and is waiting for an opportunity to execute.
From time to time, the currently running process will be interrupted, and the dispatcher portion of the OS will select some other process to run.
During execution, when the process lacks the resources it needs, it gets blocked. Once it is given those resources, it re-enters the ready state and then the running state. This transition from the ready state to the running state is done by the dispatcher: the dispatcher dispatches the process.
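The transitions just described can be written down as a small state check in C. This is a sketch with hypothetical names: "Not Running" is split into READY and BLOCKED, as in the last paragraph, and only the dispatcher is supposed to perform READY to RUNNING.

```c
#include <assert.h>
#include <stdbool.h>

/* READY and BLOCKED are both "Not Running" in the two-state model. */
enum proc_state { READY, RUNNING, BLOCKED };

struct proc { int pid; enum proc_state state; };

/* Which transitions the description allows. */
static bool can_transition(enum proc_state from, enum proc_state to)
{
    switch (from) {
    case READY:   return to == RUNNING;                 /* dispatcher dispatches it */
    case RUNNING: return to == READY || to == BLOCKED;  /* interrupted, or lacks a resource */
    case BLOCKED: return to == READY;                   /* resource has been provided */
    }
    return false;
}

/* Returns false (and leaves p unchanged) for an illegal transition. */
static bool set_state(struct proc *p, enum proc_state to)
{
    if (!can_transition(p->state, to))
        return false;
    p->state = to;
    return true;
}
```

A blocked process therefore cannot jump straight back to running: it has to become ready first and then be dispatched.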