Can OS processes share a single CPU stack? - operating-system

Can processes share a single stack?
I'm currently thinking yes and no: they "share" a stack, but the information already there needs to be copied and saved elsewhere before the stack is used, and restored when the first process is picked up by the CPU again. But I might be confusing this with registers in general.
Can someone help me shed some light on this?

Processes do not share CPU stacks.
While processes can potentially share memory using shared-memory facilities, processes do not share memory by default. Operating systems try to minimize the amount of sharing between processes as a way to ensure security.
Sharing a CPU stack between processes A and B would be detrimental to security, because process A would be able to poke around in "junk" left on the stack by process B, and vice versa. Hackers managed to exploit indirect sharing on a much smaller scale to create a major security vulnerability (see Meltdown and Spectre). Sharing CPU stacks would create a much bigger problem.
It goes without saying that an attempt to share stacks would require a degree of inter-process synchronization that would be prohibitive to overall performance. The ability to operate independently on a CPU stack is so critical to concurrency that threads inside the same process are allocated separate stacks, even though they already share all the memory allocated to the process, so security is not a concern there. Sharing stacks would effectively kill concurrency, because maintaining a shared stack would require frequent exclusive access with lots of synchronization.
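As a rough illustration of threads having their own stacks, here is a minimal Python sketch. It is only an analogy: Python threads each get their own call stack, but the interpreter's frames are not the raw CPU stack. The names `depth_sum` and `run_threads` are made up for this example.

```python
import threading


def depth_sum(n: int) -> int:
    # Every recursive call adds a frame to the *calling thread's* own stack.
    if n == 0:
        return 0
    return n + depth_sum(n - 1)


def run_threads(inputs):
    results = {}

    def worker(n):
        # Locals and call frames are private to this thread; only the
        # `results` dict lives in memory shared by all threads.
        results[n] = depth_sum(n)

    threads = [threading.Thread(target=worker, args=(n,)) for n in inputs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Each thread recurses independently without any locking, because no thread ever touches another thread's frames.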

Some systems use an interrupt stack shared by all processes. Generally, there is one interrupt stack per processor.
User stacks (and there is usually one for each processor mode used by the system) are unique to each process (or thread).

The difference between the registers and the stack is that the latter can be anywhere in memory (it is indirectly referenced by the appropriate registers) while the former are fixed (there is only one set of architecturally visible registers).
The stack is part of the state of a program. Just as it makes no sense to mix the instructions, data and context of two programs together, mixing two stacks makes no sense.
If program A pushes X it expects to pop X and not an eventual value pushed by program B meanwhile.
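A toy simulation of that situation, using Python lists as stand-ins for stacks (the function name `run_interleaved` is made up for this sketch, and the "context switch" is just the order of the statements):

```python
def run_interleaved(shared: bool) -> str:
    """Return what program A pops after B ran in between."""
    stack_a = []
    # With shared=True both programs use the very same stack object.
    stack_b = stack_a if shared else []

    stack_a.append("X")   # program A pushes X
    stack_b.append("Y")   # context switch: program B pushes Y
    return stack_a.pop()  # A resumes and pops, expecting X
```

With separate stacks A gets its "X" back; with a shared stack A pops B's "Y" instead.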
It's possible to make all programs share the same memory area for the stack, but this is, in general, counterproductive.
As you correctly noted, the stack must be swapped in and out. Thus, in the case of two programs A and B, two additional memory areas are needed: one for saving the stack of A and one for the stack of B.
In the end, three memory areas are used instead of two.
There are cases where the swapping is necessary: when the shared resource is at a fixed location.
This is the case, in some degenerate form, for registers, but other structures can have a fixed location too.
One simple example is the page table entries: if a program A is used to create two processes A1 and A2, most OSs will share their pages copy-on-write.
Under these circumstances, the two processes end up sharing a lot of pages, even all but a few. It may be easier for the OS to swap the few differing pages in and out rather than make the page table (or part of it) point to two different locations.
In general, if we cannot afford to have multiple instances of a resource, we need to time-share it.
Since we can afford to have more than one instance of the stack, we prefer to not share it.

Related

Stored Program Computer in modern computing

I was given this exact question on a quiz.
Question
Answer
Does the question make any sense? My understanding is that the OS schedules a process and manages what instructions it needs the processor to execute next. This is because the OS is liable to pull all sorts of memory management tricks, especially in main memory where fragmentation is a way of life. I remember that there is supposed to be a special register on the processor called the program counter. In light of the scheduler and memory management done by the OS I have trouble figuring out the purpose of this register unless it is just for the OS. Is the concept of the Stored Program Computer really relevant to how a modern computer operates?
Hardware fetches machine code from main memory, at the address in the program counter (which increments on its own as instructions execute, or is modified by executing a jump or call instruction).
Software has to load the code into RAM (main memory) and start the process with its program counter pointing into that memory.
And yes, if the OS wants to page that memory out to disk (or lazily load it in the first place), hardware will trigger a page fault when the CPU tries to fetch code from an unmapped page.
But no, the OS does not feed instructions to the CPU one at a time.
(Unless you're debugging a program by putting the CPU into "single step" mode when returning to user-space for that process, so it traps after executing one instruction. Like x86's trap flag, for example. Some ISAs only have software breakpoints, not HW support for single stepping.)
But anyway, the OS itself is made up of machine code that runs on the CPU. CPU hardware knows how to fetch and execute instructions from memory. An OS is just a fancy program that can load and manage other programs. (Remember, in a von Neumann architecture, code is data.)
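To make the fetch/execute idea concrete, here is a minimal sketch of a toy stored-program machine. The instruction set (`LOAD`, `ADD`, `STORE`, `JMP`, `HALT`) and the memory layout are invented for illustration and do not correspond to any real ISA.

```python
def run(memory, pc=0, max_steps=1000):
    """Fetch and execute instructions from `memory`, starting at `pc`."""
    acc = 0
    for _ in range(max_steps):
        op, arg = memory[pc]   # fetch from the address held in the PC
        pc += 1                # the PC advances on its own...
        if op == "LOAD":
            acc = memory[arg]
        elif op == "ADD":
            acc += memory[arg]
        elif op == "STORE":
            memory[arg] = acc
        elif op == "JMP":
            pc = arg           # ...unless a jump overwrites it
        elif op == "HALT":
            return acc
        else:
            raise ValueError(f"unknown opcode {op!r}")
    raise RuntimeError("step limit reached")


def demo():
    # Code and data live in the same memory (code is data).
    memory = [
        ("LOAD", 6),   # 0: acc = memory[6]
        ("JMP", 3),    # 1: skip the next instruction
        ("ADD", 6),    # 2: never executed
        ("ADD", 7),    # 3: acc += memory[7]
        ("STORE", 8),  # 4: memory[8] = acc
        ("HALT", 0),   # 5
        5,             # 6: data
        7,             # 7: data
        0,             # 8: result slot
    ]
    acc = run(memory)
    return acc, memory[8]
```

Nothing outside `run` hands it instructions one at a time; a loader (here, `demo`) only places the program in memory and sets the starting PC.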
Even the OS depends on the processor architecture. Memory today is often virtualized. That means the memory location seen by the program is not the real physical location; it is indirected through one or more tables describing the actual location and some attributes (e.g. whether read/write/execute is allowed) for memory accesses. If the accessed virtual memory has not been loaded into main memory (the tables say so), an exception is generated, and the address of an exception handler is loaded into the program counter. This exception handler is provided by the OS and resides in main memory. So the program counter is quite relevant in today's computers, but the next instruction can be changed on the fly by exceptions (exceptions or interrupts are also used for thread or process switching in preemptive multitasking systems).
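The table lookup described above can be sketched roughly as follows. This is a simplified single-level model: the 4 KiB page size, the dictionary-based table and the `PageFault` exception are assumptions of the sketch, not a description of any particular MMU.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class PageFault(Exception):
    """Stands in for the hardware exception that transfers control to the OS."""


def translate(page_table, vaddr, access="r"):
    """Map a virtual address to a physical one, or raise PageFault."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    entry = page_table.get(vpn)
    if entry is None or not entry["present"]:
        raise PageFault(f"page {vpn} not present")
    if access not in entry["perms"]:
        raise PageFault(f"{access!r} access to page {vpn} not permitted")
    return entry["frame"] * PAGE_SIZE + offset


# Hypothetical table: virtual page 0 lives in frame 7, page 1 is not loaded.
example_table = {
    0: {"present": True, "frame": 7, "perms": "rx"},
    1: {"present": False, "frame": None, "perms": "rw"},
}
```

In real hardware the "raise" corresponds to loading the handler's address into the program counter; the OS then decides what to do about the fault.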
Does the question make any sense?
Yes. It makes sense to me. It is a bit imprecise, but the meanings of each of the alternatives are sufficiently distinct to be able to say that D) is the best answer.
(In theory, you could create a von Neumann computer which was able to execute instructions out of secondary storage, registers or even the internet ... but it would be highly impractical for various reasons.)
My understanding is that the OS schedules a process and manages what instructions it needs the processor to execute next. This is because the OS is liable to pull all sorts of memory management tricks, especially in main memory where fragmentation is a way of life.
Fragmentation of main memory is not actually relevant. A modern machine uses special hardware (and page tables) to deal with that. From the perspective of executing code (application or kernel) this is all hidden. The code uses virtual addresses, and the hardware maps them to physical addresses. (This is even true when dealing with page faults, though special care will be taken to ensure that the code and page table entries for the page fault handler are in RAM pages that are never swapped out.)
I remember that there is supposed to be a special register on the processor called the program counter. In light of the scheduler and memory management done by the OS I have trouble figuring out the purpose of this register unless it is just for the OS.
The PC is fundamental. It contains the virtual memory address of the next instruction that the CPU is to execute. For application code AND for OS kernel code. When you switch between the application and kernel code, the value in the PC is updated as part of the context switch.
Is the concept of the Stored Program Computer really relevant to how a modern computer operates?
Yes. Unless you are working on a special custom machine where (say) the program has been transformed into custom silicon.

Why are page faults usually handled by the OS, not hardware?

I found that when handling a TLB miss, some architectures use hardware while others use the OS. But when it comes to page faults, most of them use the OS instead of hardware.
I tried to find the answer but didn't find any article that explains why.
Could anyone help with this?
Thanks.
If the hardware could handle it on its own, it wouldn't need to fault.
The whole point is that the OS hasn't wired the page into the hardware page tables, e.g. because it's not actually in memory at all, or because the OS needs to catch an attempt to write so the OS can implement copy-on-write.
Page faults come in three categories:
- valid (the process logically has the memory mapped, but the OS was lazy or playing tricks):
  - hard: the page needs to be paged in from disk, either from swap space or from a disk file (e.g. a memory mapped file, like a page of an executable or shared library). Usually the OS will schedule another task while waiting for I/O.
  - soft: no disk access required, just for example allocating + zeroing a new physical page to back a virtual page that user-space just tried to write. Or copy-on-write of a writeable page that multiple processes had mapped, but where changes by one shouldn't be visible to the other (like mmap(MAP_PRIVATE)). This turns a shared page into a private dirty page.
- invalid: There wasn't even a logical mapping for that page. A POSIX OS like Linux will deliver a SIGSEGV signal to the offending process/thread.
The hardware doesn't know which is which; all it knows is that a page walk didn't find a valid page-table entry for that virtual address, so it's time to let the OS decide what to do next (i.e. raise a page-fault exception which runs the OS's page-fault handler). Valid/invalid are purely software/OS concepts.
These example reasons are not an exhaustive list. e.g. an OS might remove the hardware mapping for a page without actually paging it out, just to see if the process touches it again soon. (In which case it's just a cheap soft page fault. But if not, then it might actually page it out to disk. Or drop it if it's clean.)
For HW to be able to fully handle a page fault, we'd need data structures with a hardware-specified layout that somehow lets hardware know what to do in some possible situations. Unless you build a whole kernel into the CPU microcode, it's not possible to have it handle every page fault, especially not invalid ones which require reading the OS's process / task-management data structures and delivering a signal to user-space. Either to a signal handler if there is one, or killing the process.
And especially not hard page faults, where a multi-tasking OS will let some other process run while waiting for the disk to DMA the page(s) into memory, before wiring up the page tables for this process and letting it retry the faulting load or store instruction.
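The software-side decision can be sketched like this. The bookkeeping here (a dict from virtual page number to a string describing how the page is backed) is a made-up stand-in for the OS's own data structures, which hardware knows nothing about.

```python
def classify_page_fault(vpn, mappings):
    """Decide what kind of fault a missing page-table entry represents.

    `mappings` is hypothetical OS bookkeeping: virtual page number ->
    "on_disk", "zero_fill" or "copy_on_write".
    """
    backing = mappings.get(vpn)
    if backing is None:
        # No logical mapping at all: a POSIX OS would deliver SIGSEGV.
        return "invalid"
    if backing == "on_disk":
        # Needs I/O; the OS would run another task while waiting.
        return "hard"
    # Zero-fill or copy-on-write: fixable without touching the disk.
    return "soft"


process_mappings = {
    0x10: "on_disk",
    0x11: "zero_fill",
    0x12: "copy_on_write",
}
```

The hardware only reports "no valid entry for this address"; everything in this function depends on information that exists purely in OS data structures.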

When do deadlocks occur in modern operating systems?

I know deadlocks were a hot research topic in the past. But even though I have studied lots of modern operating systems, I cannot see any major deadlock problem now. I know that some (most) of the resources on which deadlocks can occur are strictly managed by the operating system itself, which seems to prevent deadlocks in some way; I really haven't seen any case related to a deadlock. I know that many resources are handled differently across popular systems with different design principles, but they all manage to keep the system deadlock-free.
Try using two mutexes in your program. In the first thread, lock in sequence: mutex1, sleep(500ms), mutex2; in the second thread: mutex2, sleep(1000ms), mutex1.
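Here is a minimal Python sketch of that experiment. To keep it from hanging forever, it departs from the recipe in two ways that are assumptions of this sketch: the sleeps are replaced by barriers (so both threads are guaranteed to hold their first lock), and the second lock is requested with a timeout so the stuck state can be observed and then unwound.

```python
import threading


def lock_order_demo(timeout=0.2):
    mutex1, mutex2 = threading.Lock(), threading.Lock()
    both_hold_first = threading.Barrier(2)
    both_tried_second = threading.Barrier(2)
    got_second = {}

    def worker(name, first, second):
        with first:
            both_hold_first.wait()    # each thread now holds its first lock
            acquired = second.acquire(timeout=timeout)
            got_second[name] = acquired
            both_tried_second.wait()  # keep holding `first` until both have tried
            if acquired:
                second.release()

    t1 = threading.Thread(target=worker, args=("t1", mutex1, mutex2))
    t2 = threading.Thread(target=worker, args=("t2", mutex2, mutex1))  # opposite order
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return got_second
```

Neither thread can get its second lock: with a plain blocking `acquire()` instead of a timeout, both would wait forever. Taking the locks in the same order in both threads removes the cycle.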
At the system level: in Windows (including 8.1), if your application uses SendMessage to broadcast to HWND_BROADCAST and one application is hung, your application will hang as well. Also, in some cases of DDE communication (including ShellExecute for some programs), if one application is not responsive, your application can hang.
But you can use SendMessageTimeout...
Deadlock will always be possible wherever processes or threads are synchronized. Synchronization of processes and threads is a "must-have" element of applications.
AND... SYSTEM-WIDE deadlock (Windows):
Save all your documents before this action.
Create HWND h1 with parent=0 or parent=GetDesktopWindow and styles 0x96cf0000
Create HWND h2 with parent=h1 and styles 0x96cf0000
Create HWND h3 with parent=h2 and styles 0x56cf0000 (here must be a child window).
Use ::SetParent(h1, h3);
Then click any of these windows.
The system will try to reorder the windows in cyclic (triangle) order. The application hangs, and if any other application tries to use SetWindowPos, it will never return from this function. The Task Manager won't help, and Ctrl+Alt+Del also stops working. CPU usage goes to 100%... Only a hard reset will help you.
It is possible to prevent this, but the situation must be detected ASAP.
Operating system deadlocks still happen. When a system has limited, contended resources that it can't reclaim, a deadlock is still possible.
In Linux, look at kernel stalls; these happen when I/O doesn't release in a timely manner. Kernel stalls are particularly interesting between VMware and guest operating systems.
For external instigators, deadlocks happen when SAN systems and networks have issues.
Deadlocks in new releases happen fairly often while a kernel is maturing: not per user, but across the community as a whole.
Ever get a blue screen or instant reboot? Some of those are caused by lost resources.
Kernels are fairly mature, and have gotten good at reclaiming resources, but aren't perfect.
Most modern resource handlers tend to present as services now instead of being lockable objects. Most resource sharing within the operating system relies on separate channels, alleviating much of the overlap. There's a higher reliance on queues and toggles instead of direct locking contention on shared buffers. These are general trends in OS components that leave less opportunity for deadlocks, but there is no way to guarantee a deadlock-free system.

Difference between shared memory IPC mechanism and API/system-call invocation

I am studying operating systems (Silberschatz, Galvin et al.). My programming experience is limited to occasionally coding exercise problems given in a programming text or an algorithms text. In other words, I do not have proper application programming or system programming experience. I think my question below is a result of that lack of experience and hence a lack of context.
I am specifically studying IPC mechanisms. While reading about shared memory (SM), I couldn't imagine a real-life scenario where processes communicate using SM. An inspection of the processes attached to the same SM segment on my Linux (Ubuntu) machine (using 'ipcs' in a small shell script) is uploaded here
Most of the sharing by applications seems to be with the X daemon. From what I know, X is the process responsible for giving me my GUI. I inferred that these applications (mostly applets which stay on my taskbar) share data with X about what needs to change in their appearance and displayed values. Is this a reasonable inference?
If so,
my question is: what is the difference between my applications communicating with 'X' via shared memory segments versus my applications invoking certain APIs provided by 'X' to tell 'X' about the need to refresh their appearance? By difference I mean, why isn't the latter approach used?
Isn't that how user processes and the kernel communicate? Application invokes a system call when it wants to, say read a file, communicating the name of the file and other related info via arguments of the system call?
Also could you provide me with examples of routinely used applications which make use of shared memory and message-passing for communication?
EDIT
I have made the question clearer. I have formatted the edited part in bold.
First, since the X server is just another user space process, it cannot use the operating system's system call mechanism. Even when the communication is done through an API, if it is between user space processes, there will be some inter-process-communication (IPC) mechanism behind that API. Which might be shared memory, sockets, or others.
Typically shared memory is used when a lot of data is involved. Maybe there is a lot of data that multiple processes need to access, and it would be a waste of memory for each process to have its own copy. Or a lot of data needs to be communicated between processes, which would be slower if it were to be streamed, a byte at a time, through another IPC mechanism.
For graphics, it is not uncommon for a program to keep a buffer containing a pixel map of an image, a window, or even the whole screen that then needs to be regularly copied to the screen. Sometimes at a very high rate...30 times a second or more. I suspect this is why X uses shared memory when possible.
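As a rough sketch of the idea (not of how X itself does it), Python's `multiprocessing.shared_memory` module can show a buffer being written through one handle and read through another without streaming it. For brevity both handles are opened in the same process here; in practice the second attach would happen in a different process that was told the segment's name. The function name `share_frame` and the "pixel" bytes are made up for this example.

```python
from multiprocessing import shared_memory


def share_frame(pixels: bytes) -> bytes:
    """Write `pixels` into a shared segment and read it back via a second handle."""
    producer = shared_memory.SharedMemory(create=True, size=len(pixels))
    try:
        # The "client" writes its pixel data straight into the segment.
        producer.buf[:len(pixels)] = pixels

        # The "server" attaches to the same segment by name; nothing is
        # copied until it chooses to read.
        consumer = shared_memory.SharedMemory(name=producer.name)
        try:
            seen = bytes(consumer.buf[:len(pixels)])
        finally:
            consumer.close()
    finally:
        producer.close()
        producer.unlink()  # remove the segment once nobody needs it
    return seen
```

Only the segment name has to travel over a conventional IPC channel; the bulk data stays in place.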
The difference is that with an API you as a developer might not have access to what is happening inside these functions, so memory would not necessarily be shared.
Shared memory is essentially a specific region of memory that both apps can read from and write to. This of course requires that access to that memory is synchronized so things don't get corrupted.
Using somebody's API does not mean you are sharing memory with them, that process will just do what you asked and perhaps return the result of that operation to you, however that doesn't necessarily go via shared memory. Although it could, it depends, as always.
The preference for one over the other depends, I'd say, on the specifications of the particular application: what it is doing and what it needs to share. I can imagine that a big dataset of some kind would be shared via shared memory, while passing a file name to another app might only need an API call. However, this is largely dependent on requirements, I'd say.

Memory mapped files and "soft" page faults. Unavoidable?

I have two applications (processes) running under Windows XP that share data via a memory mapped file. Despite all my efforts to eliminate per-iteration memory allocations, I still get about 10 soft page faults per data transfer. I've tried every flag there is in CreateFileMapping() and MapViewOfFile() and it still happens. I'm beginning to wonder if it's just the way memory mapped files work.
If anyone there knows the O/S implementation details behind memory mapped files I would appreciate comments on the following theory: If two processes share a memory mapped file and one process writes to it while another reads it, then the O/S marks the pages written to as invalid. When the other process goes to read the memory areas that now belong to invalidated pages, this causes a soft page fault (by design) and the O/S knows to reload the invalidated page. Also, the number of soft page faults is therefore directly proportional to the size of the data write.
My experiments seem to bear out the above theory. When I share data I write one contiguous block of data. In other words, the entire shared memory area is overwritten each time. If I make the block bigger the number of soft page faults goes up correspondingly. So, if my theory is true, there is nothing I can do to eliminate the soft page faults short of not using memory mapped files because that is how they work (using soft page faults to maintain page consistency). What is ironic is that I chose to use a memory mapped file instead of a TCP socket connection because I thought it would be more efficient.
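If the theory holds, the expected fault count would simply track the number of pages the written block spans. A small sketch of that arithmetic, assuming 4 KiB pages (the usual size on x86 Windows XP); the function name and the example sizes are illustrative, not measurements:

```python
def pages_touched(offset: int, length: int, page_size: int = 4096) -> int:
    """Number of pages covered by a write of `length` bytes starting at `offset`."""
    if length <= 0:
        return 0
    first_page = offset // page_size
    last_page = (offset + length - 1) // page_size
    return last_page - first_page + 1
```

For instance, a page-aligned 40960-byte block spans 10 pages, while a 2-byte write that straddles a page boundary spans 2. This only models the proportionality; it says nothing about whether the faults actually come from page invalidation.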
Note: if soft page faults are harmless, please say so. I've heard that at some point, if the number is excessive, the system's performance can suffer. If soft page faults are not intrinsically harmful, then I'd like to hear any guidelines as to what number per second counts as "excessive".
Thanks.