Doubts about shared memory space between processes

I am new to the subject of operating systems and started studying them recently.
I am stuck on an abstraction that I am unable to get hold of.
While studying interprocess communication, I read that shmget() allocates a memory segment and returns an integer called shmid.
As far as I understood, this shared memory segment will be used for communication between two different processes, say P1 and P2.
But it is written that before any process can access the shared memory segment created by shmget(), the process must attach the segment to its address space.
I couldn't understand what is actually meant by attaching the shared memory segment to the address space of a process.
I mean, isn't it enough for a process to just know the starting address of the shared memory segment in order to access it?
Also, what is actually happening when the shared memory gets attached to the address space of a process? And whose address is it that is returned by shmat()?

While studying interprocess communication, I read that shmget() allocates a memory segment and returns an integer called shmid.
Most likely, your processes don't need to communicate that way. An application often uses several threads belonging to the same process instead, which simply share heap data by passing pointers around. In either case, it is the threads that are responsible for managing concurrent accesses and race conditions on the data. I haven't read about it in depth, but I'd say shmid is an id that other processes use to identify which segment of memory they want to attach; the OS keeps track of shared segments by giving each segment an id.
I couldn't understand what is actually meant by attaching the shared memory segment to the address space of a process.
If one thread requests a shared memory segment, the threads of other processes haven't yet notified the kernel that they also use it. One thread needs to create the segment and save the id; the threads that want to share it then use that id to notify the kernel that they also want to access that shared memory.
Also, what is actually happening when the shared memory gets attached to the address space of a process? And whose address is it that is returned by shmat()?
Each thread has a TCB that tells the kernel which virtual memory is allocated to it. Attaching memory to the thread means adding that memory segment to the list of memory allocated to it, to avoid a fatal page fault when the thread attempts an access. If the thread doesn't notify the kernel first, the kernel will kill it upon realising that the thread isn't permitted to access the data (because it isn't in its address space yet).

I couldn't understand what is actually meant by attaching the shared memory segment to the address space of a process.
Simply put, it means updating the page table of the process so that the process can map its virtual addresses onto the physical frames that back the shared memory.
We can understand this properly using the concept of paging; if you haven't studied paging yet, let it be an abstraction for now.
Also, what is actually happening when the shared memory gets attached to the address space of a process? And whose address is it that is returned by shmat()?
When a process, say P1, wants to share its memory, it must create a shared memory region. For that it calls the shmget() syscall with a few parameters; if the call succeeds, shmget() returns an identifier, say 52.
Now, if another process P2 wants to use the shared memory created by P1, it must call shmat() and pass that identifier (in this case 52); if the call succeeds, P2 is returned a pointer through which it can read, write, or do both.
P2 can then modify the shared memory region, e.g. write something into it.
As for whose address it is: the pointer returned by shmat() is a virtual address in the calling process's own address space (here P2's), which the kernel has mapped onto the same physical memory that backs P1's segment.
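To make this concrete, here is a minimal sketch of the System V calls involved; the key, size, and permission bits are illustrative choices, not requirements:

```c
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int main(void) {
    /* P1: create a 4 KB shared segment. ftok() just derives an
     * agreed-upon key that both processes can compute. */
    key_t key = ftok("/tmp", 'A');
    int shmid = shmget(key, 4096, IPC_CREAT | 0666);
    if (shmid == -1) { perror("shmget"); return 1; }

    /* Attach: the kernel maps the segment into THIS process's
     * address space and returns the virtual address it picked. */
    char *mem = shmat(shmid, NULL, 0);
    if (mem == (char *)-1) { perror("shmat"); return 1; }

    strcpy(mem, "hello from P1");   /* now just an ordinary memory write */

    shmdt(mem);                     /* detach from our address space */
    return 0;
}
```

P2 would call shmget() with the same key and then shmat(); the pointer it gets back may be numerically different from P1's, but both pointers map onto the same physical pages.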

Related

What is the real use of logical addresses?

This is what I understood of logical addresses:
Logical addresses are used so that data in physical memory does not get corrupted. Through the use of logical addresses, processes cannot access physical memory directly, which keeps them from storing data in physical memory locations already in use and hence protects data integrity.
I have a doubt whether it was really necessary to use logical addresses. Couldn't the integrity of the data in physical memory have been preserved by an algorithm that simply does not allow processes to access or modify memory locations already in use by other processes?
"The integrity of the data on the physical memory could have been preserved by using an algorithm or such which do not allow processes to access or modify memory locations which were already accessed by other processes."
Short Answer: It is impossible to devise an efficient algorithm as proposed to match the same level of performance with logical address.
The issue with this algorithm is that how are you going to intercept each processes' memory access? Without intercepting memory access, it is impossible to check if a process has privileges to access certain memory region. If we are really going to implement this algorithms, there are ways to intercept memory access without using the logical address provided by MMU (Memory management unit) on modern cpus (Assume you have a cpu without MMU). However, those methods will not be as efficient as using MMU. If your cpu does have a MMU, although logical address translation will be unavoidable, you could setup a one-to-one to the physical memory.
One way to intercept memory accesses without an MMU is to insert a kernel trap instruction before each memory-access instruction in a program. Since we cannot trust a user-level program, this job cannot be delegated to a compiler. Instead, you could write an OS that does the job before loading a program into memory: it scans through the program's binary and inserts a kernel trap before each memory access. The kernel can then inspect whether an access should be granted. However, this approach degrades your system's performance badly, because every memory access, legal or not, traps into the kernel, and trapping into the kernel involves a context switch that costs many CPU cycles.
Can we do better? What about doing a static analysis of our programs' memory accesses before loading them, so we only insert traps before illegal accesses? However, processes have no predefined execution order. Say you have programs A and B that both try to access the same memory region. Who should get it under our static analysis? We could randomly assign it to one of them, say B. Then how do we know when B is done with that memory, so we can give it to A and let it proceed? Say B uses the region to hold a global variable accessed many times throughout its life cycle. Do we wait until B completes before giving the region to A? What if B never ends?
Furthermore, static analysis of memory accesses is impossible in the presence of dynamic memory allocation. If program A or B allocates a region whose size depends on user input, neither the OS nor our static analysis tool can know ahead of time where the region will be or how big it is, and thus cannot do the analysis at all.
Thus, we fall back to trapping on every memory access and deciding at run time whether the access is legal. Sounds familiar? That is exactly the function of the MMU and logical addresses, except that with logical addresses a trap is incurred if and only if an illegal access happens, instead of on every memory access.
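To see why trapping on every access is so costly, here is a rough sketch of the check the kernel would have to run on each trapped access; the region table and all names are made up for illustration:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct region { uintptr_t start, end; int owner_pid; };

static struct region regions[64];   /* regions granted so far */
static size_t n_regions;

/* Executed inside the kernel on EVERY trapped memory access --
 * a context switch plus this scan, whether the access is legal or not. */
bool access_allowed(int pid, uintptr_t addr) {
    for (size_t i = 0; i < n_regions; i++) {
        if (addr >= regions[i].start && addr < regions[i].end)
            return regions[i].owner_pid == pid;
    }
    return false;   /* address not in any granted region: deny */
}
```

The MMU performs the equivalent lookup in hardware on every access; the kernel only gets involved when the lookup fails.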
Logical addressing is simulated by the OS so that programs behave as if they were using physical memory directly. The extra layer is necessary for data-integrity purposes. You can think of logical addresses as the OS's language for addresses: without this mapping, the OS could not tell which "actual" addresses are allowed to a given program. The mapping removes that ambiguity, because the OS knows which logical address maps to which physical address and whether that physical location is allowed to the program. The "integrity checks" are performed on logical addresses rather than on physical memory, because a logical address can be checked and manipulated in isolation, whereas doing the same directly on physical memory would affect the processes already running in it.
I would also mention that the base and limit registers are loaded by privileged instructions, privileged instructions execute only in kernel mode, and only the operating system has access to kernel mode; therefore user programs cannot modify these registers directly. I hope I helped a little :)
There are some things that you need to understand.
First of all, a CPU is unable to access physical memory directly. To calculate a physical address, the CPU starts from a logical address, which is then used to compute the physical one. This is the basic reason logical addresses are needed to reach physical memory: without a logical address you cannot access it, so the conversion is necessary. Moreover, a system that did not use virtual/logical addresses would be highly vulnerable, because a hacker or intruder could address physical memory directly and manipulate useful data at any location.
Second, when a process runs, the CPU generates logical addresses in order to work with that process in main memory. The purpose of the logical address here is memory management: physical memory is scarce compared to the total size of processes, so memory must be relocatable to obtain optimum efficiency. This is where the MMU (memory management unit) comes into play: it calculates the physical address from the logical address. So logical addresses are generated by processes, and the MMU accesses the physical address based on the logical one.
This example will make it clear.
Suppose a process's data is stored at physical address 50, so the base register holds 50 and the data sits at offset 0: logical address 0 maps to physical address 50. Now the MMU relocates the process to address 100; only the base register changes, to 100, while the logical address stays the same. When the data is retrieved via logical address 0, the MMU adds the new base: 100 + 0 = 100, and the access lands on the data at its new location. No matter how many times the data changes physical location, the change is absorbed by the mapping, so the same logical address keeps giving access to the data wherever it physically resides.
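A minimal sketch of that translation (the names are illustrative):

```c
#include <stdint.h>

static uint32_t base = 50;    /* process currently loaded at physical 50 */

/* The MMU does this addition in hardware on every access. */
uint32_t mmu_translate(uint32_t logical) {
    return base + logical;
}

/* When the OS relocates the process to physical 100, it only updates
 * the base register; logical address 0 now maps to 100 instead of 50,
 * and the program itself is untouched. */
void relocate(uint32_t new_base) {
    base = new_base;
}
```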
I hope it helps.

What is the purpose of logical addresses in an operating system? Why are they generated?

I want to know why the CPU generates logical addresses and then maps them to physical addresses with the help of the memory manager. Why do we need them?
Virtual addresses are required to run several programs on a computer.
Assume there is no virtual address mechanism. Compilers and link editors generate a memory layout with a given pattern: instructions (the text segment) are positioned in memory from address 0, then come the segments for initialized and uninitialized data (data and bss) and the dynamic memory (heap and stack). (See for instance https://www.geeksforgeeks.org/memory-layout-of-c-program/ if you have no idea about memory layout.)
When you run this program, it will occupy part of the memory that is then no longer available to other processes, in a completely unpredictable way. For instance, addresses 0 to 1M may be occupied, or 0 to 16k, or 0 to 128M; it depends entirely on the program's characteristics.
If you now want to run a second program concurrently, where will its instructions and data go in memory? Memory addresses are generated by the compiler, which obviously cannot know at compile time what memory will be free. And remember that memory addresses (for instructions or data) are essentially hard-coded in the program code.
A second problem arises when you want to run many processes and you run out of memory. In this situation, some processes are swapped out to disk and restored later. But when restored, a process goes wherever memory is free; again, this is unpredictable and would require modifying the internal addresses of the program.
Virtual memory simplifies all these tasks. When running a process (or restoring it after a swap), the system looks at the free memory and fills the page tables to create a mapping between virtual addresses (manipulated by the processor, and always unchanged) and physical addresses (which depend on the memory free on the computer at that given time).
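A small experiment on a POSIX system shows this mapping at work (a sketch to illustrate the point, not part of the mechanism itself): parent and child print the same virtual address for a variable yet see different values, because after the child's write that one virtual address is backed by different physical pages in each process.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int x = 42;

int main(void) {
    pid_t pid = fork();
    if (pid == 0) {                 /* child: write through the address */
        x = 99;
        printf("child:  &x=%p x=%d\n", (void *)&x, x);
    } else {
        wait(NULL);                 /* parent: same &x, still 42 */
        printf("parent: &x=%p x=%d\n", (void *)&x, x);
    }
    return 0;
}
```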
Logical address translation serves several functions.
One of these is to support a common mapping of the system address space into every process. This makes it possible for any process to handle an interrupt, because the system addresses needed to handle interrupts are always in the same place regardless of the process.
The logical translation system also handles page protection. This makes it possible to protect the common system address space from individual users messing with it. It also allows protecting the user address space, for example by making code and data read-only in order to catch errors.
Logical translation is also a prerequisite for implementing virtual memory. In a virtual memory system, each process's address space is constructed in secondary storage (i.e., disk), and pages within the address space are brought into memory as needed. This kind of system would be impossible to implement if processes with large address spaces had to be mapped contiguously within physical memory.

Can virtual memory exist without the paging concept?

We usually learn about virtual memory and paging at the same time in an operating systems course, and they seem interdependent. However, I wonder whether they can exist independently of each other.
The answer to your question depends on how you define "virtual memory". If you define it just as "the addresses that the application sees", then yes, virtual memory can exist without paging.
Prior to paging, systems used segmentation to isolate user processes. To put it in simple words, every process has its own segment, and all the addresses it "sees" are just offsets inside that segment. The hardware implicitly adds the segment base to the address requested by the application to get the physical address. Just like the page table, the segment bases can be modified only by the kernel, so they can effectively isolate processes' memory while still leaving scope for sharing some parts of memory between processes.
Segments also have limits, which are checked before every access to ensure that the user doesn't use a very big offset and spill into another process's memory.
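In code, the hardware's job amounts to something like this sketch (one segment per process, as in the simple model above; the names are illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

struct segment { uint32_t base, limit; };

/* Performed implicitly by the hardware on every access. */
bool translate(const struct segment *seg, uint32_t offset, uint32_t *phys) {
    if (offset >= seg->limit)   /* offset would spill past the segment */
        return false;           /* -> hardware raises a fault instead  */
    *phys = seg->base + offset; /* implicit base addition */
    return true;
}
```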
Segmentation support has largely been removed in Intel x86_64 long mode: the segment registers still exist, but their bases are treated as 0, with only the two registers %fs and %gs retaining usable bases. The limit checks on them are no longer performed by the hardware. These segments are now used by operating systems for thread-local storage.

Why are the ready queue and block queue stored in main memory?

It is said that the ready queue and block queues are stored in main memory. Can somebody please tell me why? What are the pros/cons if they are stored in secondary memory (hard disk)?
The ready and block queues must be stored in main memory because they are key, critical OS data structures. Anything not stored in main memory must be paged in (and another page evicted) before it can be accessed by address; this is typically triggered by a page fault and is a blocking operation. If your ready or block queues are not in main memory, then how can you block the current thread of execution and schedule another? You can't.
Transferring data to/from secondary memory (such as a hard disk) is slow. Preventing all other threads of execution from running during this period will seriously slow down the system. Therefore the thread that generated the page fault is often blocked while transferring the data.
The thread may also block if all main memory-to-secondary memory data transfer channels are already in use, or if another thread is already transferring the page from secondary memory to main memory, or if the internal structures that track which pages are in main memory are being manipulated. (There may be other reasons too.)
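For a picture of what such a critical structure looks like, here is a minimal sketch of a ready queue as a kernel might keep it, permanently resident in RAM (all names are illustrative):

```c
/* A ready queue as a linked list of thread control blocks. */
struct tcb {
    int         tid;
    void       *saved_sp;     /* saved stack pointer for context switches */
    struct tcb *next;
};

static struct tcb *ready_head;   /* walked by the scheduler on every  */
static struct tcb *ready_tail;   /* switch -- it must never page-fault */

/* Pick the next thread to run. If this dereference could fault, the
 * scheduler would need the scheduler to handle the fault: a deadlock. */
struct tcb *schedule_next(void) {
    struct tcb *t = ready_head;
    if (t) {
        ready_head = t->next;
        if (!ready_head) ready_tail = NULL;
    }
    return t;
}
```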
Hope this helps.
When you write a program, do you store your variables on hard disk?!
It is the same with an operating system: at run time, the operating system uses special data structures such as job queues, file-system structures, and many other kinds of variables and structures.
Any operating system, indeed any software, stores this kind of thing in main memory because it is much faster than the hard disk, and these variables and structures are needed only at run time. Hard disks are mainly used for "permanent" storage.

Where are multiple stacks and heaps put in virtual memory?

I'm writing a kernel and need (and want) to put multiple stacks and heaps into virtual memory, but I can't figure out how to place them efficiently. How do normal programs do it?
How (or where) are stacks and heaps placed into the limited virtual memory provided by a 32-bit system, such that they have as much growing space as possible?
For example, when a trivial program is loaded into memory, the layout of its address space might look like this:
[ Code Data BSS Heap-> ... <-Stack ]
In this case the heap can grow as big as virtual memory allows (e.g. up to the stack), and I believe this is how the heap works for most programs. There is no predefined upper bound.
Many programs have shared libraries that are put somewhere in the virtual address space.
Then there are multi-threaded programs that have multiple stacks, one for each thread. And .NET programs have multiple heaps, all of which have to be able to grow one way or another.
I just don't see how this can be done reasonably efficiently without putting a predefined limit on the size of all heaps and stacks.
I'll assume you have the basics of your kernel done: a trap handler for page faults that can map a virtual memory page to RAM. One level up, you need a virtual address space manager from which user-mode code can request address space. Pick a segment granularity that prevents excessive fragmentation; 64KB (16 pages) is a good number. Allow user-mode code both to reserve address space and to commit it. A simple bitmap of 4GB/64KB = 64K entries of 2 bits each, to keep track of segment state, gets the job done. The page-fault trap handler also needs to consult this bitmap to know whether a page request is valid or not.
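A sketch of that state bitmap, two bits per 64KB segment (the names are illustrative):

```c
#include <stdint.h>

enum seg_state { SEG_FREE = 0, SEG_RESERVED = 1, SEG_COMMITTED = 2 };

#define NSEGS (1u << 16)            /* 4 GB / 64 KB = 65536 segments */
static uint8_t segmap[NSEGS / 4];   /* 4 two-bit entries per byte = 16 KB */

static enum seg_state seg_get(uint32_t i) {
    return (enum seg_state)((segmap[i / 4] >> ((i % 4) * 2)) & 3u);
}

static void seg_set(uint32_t i, enum seg_state s) {
    uint32_t shift = (i % 4) * 2;
    segmap[i / 4] = (uint8_t)((segmap[i / 4] & ~(3u << shift))
                              | ((uint32_t)s << shift));
}

/* The page-fault handler consults this map: a fault in a RESERVED
 * segment gets a fresh RAM page; a fault in a FREE one is an error. */
```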
A stack is a fixed-size VM allocation, typically 1 megabyte. A thread usually needs only a handful of its pages, depending on function nesting depth, so reserve the 1MB and commit only the top few pages. When the thread nests deeper, it trips a page fault and the kernel simply maps the extra page to RAM to let the thread continue. You'll want to mark the bottom few pages as special: when the thread page-faults on those, you declare a stack overflow (this website's namesake).
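From user space the same reserve/commit trick can be sketched with POSIX calls (a kernel would manipulate page tables directly; the sizes and names here are illustrative):

```c
#include <sys/mman.h>

#define STACK_SIZE (1u << 20)    /* reserve 1 MB of address space */
#define PAGE       4096u
#define COMMIT     (4 * PAGE)    /* commit only the top few pages */
#define GUARD      (2 * PAGE)    /* bottom pages stay inaccessible */

void *alloc_stack(void) {
    /* Reserve the whole range with no access rights... */
    char *lo = mmap(NULL, STACK_SIZE, PROT_NONE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (lo == MAP_FAILED) return NULL;

    /* ...then commit just the top pages; stacks grow downward. */
    char *top = lo + STACK_SIZE;
    if (mprotect(top - COMMIT, COMMIT, PROT_READ | PROT_WRITE) != 0)
        return NULL;

    /* Faults above the GUARD region can be committed on demand;
     * a fault inside the bottom GUARD pages is a stack overflow. */
    return top;   /* initial stack pointer */
}
```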
The most important job of the heap manager is to prevent fragmentation. The best way to do that is to create a lookaside list that partitions heap requests by size: everything up to 8 bytes comes from the first list of segments, 8 to 16 bytes from the second, 16 to 32 from the third, and so on, increasing the bucket size as you go up. You'll have to play with the bucket sizes to get the best balance. Very large allocations come directly from the VM address manager.
The first time an entry in the lookaside list is hit, you allocate a new VM segment and subdivide it into smaller blocks chained in a linked list. When such an allocation is released, you add the block back to the list of free blocks. All blocks in a segment have the same size regardless of the program's request, so there is no fragmentation. When the segment is fully used and no free blocks are available, you allocate a new segment; when a segment contains nothing but free blocks, you can return it to the VM manager.
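A sketch of that lookaside scheme; segment_alloc() stands in for a call to the VM manager and is hypothetical:

```c
#include <stddef.h>

struct free_block { struct free_block *next; };

#define NBUCKETS 12                       /* 8 B, 16 B, ... size classes */
static struct free_block *buckets[NBUCKETS];

static int bucket_index(size_t size) {    /* <=8 -> 0, <=16 -> 1, ... */
    int i = 0;
    size_t cap = 8;
    while (cap < size && i < NBUCKETS - 1) { cap <<= 1; i++; }
    return i;
}

void *bucket_alloc(size_t size) {
    int i = bucket_index(size);
    if (!buckets[i]) {
        /* List empty: carve a fresh 64 KB segment from the VM manager
         * into equal-sized blocks and chain them onto buckets[i]. */
        /* buckets[i] = segment_alloc(8u << i); */
    }
    struct free_block *b = buckets[i];
    if (b) buckets[i] = b->next;
    return b;   /* same-sized blocks -> no fragmentation in a segment */
}

void bucket_free(void *p, size_t size) {
    struct free_block *b = p;
    int i = bucket_index(size);
    b->next = buckets[i];                 /* push back onto the free list */
    buckets[i] = b;
}
```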
This scheme allows you to create any number of stacks and heaps.
Simply put, as your system resources are always finite, you can't go limitless.
Memory management always consists of several layers, each with its own well-defined responsibility. From the perspective of the program, the application-level manager is visible, and it is usually concerned only with its own single allocated heap. A level above could deal with creating multiple heaps if needed out of its one global heap and assigning them to subprograms (each with its own memory manager). Above that sits the standard malloc()/free() that it uses, and above those the operating system dealing with pages and actual memory allocation per process (which is basically concerned neither with multiple heaps nor with user-level heaps in general).
Memory management is costly, and so is trapping into the kernel. Combining the two could impose a severe performance hit, so what appears to be the actual heap management from the application's point of view is really implemented in user space (the C runtime library) for the sake of performance (and other reasons out of scope for now).
When a shared (DLL) library is loaded at program startup, it will most probably be loaded next to CODE/DATA/etc., so no heap fragmentation occurs. On the other hand, if it is loaded at run time, there's pretty much no other choice than using up heap space.
Static libraries are, of course, simply linked into the CODE/DATA/BSS/etc sections.
At the end of the day, you'll need to impose limits on heaps and stacks so that they're unlikely to overflow, but you can allocate more of them.
If one needs to grow beyond that limit, you can either
Terminate the application with error
Have the memory manager allocate/resize/move the memory block for that stack/heap, and most probably defragment the heap (at its own level) afterwards; that's why free() usually performs poorly.
Considering a pretty large 1KB stack frame on every call as the average (which might happen with an inexperienced application developer), a 10MB stack would be sufficient for 10240 nested calls. Besides, there's pretty much no need for more than one stack and one heap per thread.