Why are the ready queue and blocked queue stored in main memory? - operating-system

It is said that the ready queue and blocked queues are stored in main memory. Could somebody please tell me why? What are the pros/cons if they are stored in secondary memory (hard disk)?

The ready and blocked queues must be stored in main memory because they are key/critical OS data structures. Anything not stored in main memory must be paged in (and another page evicted) before it can be accessed by address. This is typically triggered by a page fault and is a blocking operation. If your ready or blocked queues are not in main memory, then how can you block the current thread of execution and schedule another? You can't.
Transferring data to/from secondary memory (such as a hard disk) is slow. Preventing all other threads of execution from running during this period will seriously slow down the system. Therefore the thread that generated the page fault is often blocked while transferring the data.
The thread may also block if all main memory-to-secondary memory data transfer channels are already in use, or if another thread is already transferring the page from secondary memory to main memory, or if the internal structures that track which pages are in main memory are being manipulated. (There may be other reasons too.)
Hope this helps.

When you write a program, do you store your variables on the hard disk?!
It is the same with an operating system. During run time, the operating system uses special data structures, like job queues, file-system structures, and many other kinds of variables/structures.
Any operating system (indeed, any software) stores this kind of data in main memory because it is much faster than the hard disk, and the variables/structures are only needed at run time. Hard disks are mainly used for "permanent" storage.

Related

Doubts about Shared Memory space between processes

I am new to the subject of Operating Systems.
I started studying Operating Systems recently.
I am stuck on an abstraction that I am unable to get a hold of.
While studying interprocess communication, I read that shmget() allocates a memory segment and returns an integer called shmid.
As far as I understand, this shared memory segment will be used for communication between two different processes, say P1 and P2.
But it's written that before any process can access the shared memory segment created by shmget(), the process must attach this shared memory segment to its address space.
I couldn't understand what is actually meant by attaching the shared memory segment to the address space of a process.
I mean, isn't it enough for a process to just know the starting address of the shared memory segment in order to access it?
Also, what actually happens when the shared memory is attached to the address space of a process? And whose address is it that is returned by shmat()?
While studying interprocess communication, I read that shmget() allocates a memory segment and returns an integer called shmid.
Most likely, your processes don't need to communicate that way. An application often uses several threads belonging to the same process instead and simply shares heap data by passing pointers around. In all cases, it is the threads that are responsible for managing concurrent accesses and race conditions on the data. I haven't read about it in depth, but I'd say shmid is an id that threads in other processes use to identify which segment of memory they want to attach. The OS keeps track of shared segments, giving each segment an id.
I couldn't understand what is actually meant by attaching the shared memory segment to the address space of a process.
If one thread requests a shared memory segment, the threads in other processes haven't yet notified the kernel that they also use it. One thread needs to create the segment and save the id. The threads that want to share it then need to use that id to notify the kernel that they also want to access that shared memory.
Also, what actually happens when the shared memory is attached to the address space of a process? And whose address is it that is returned by shmat()?
Each thread has a TCB that informs the kernel about the virtual memory allocated to it. Attaching memory to the process means adding that memory segment to the list of memory allocated to it, so that an access to the segment does not end in an invalid page fault. If the process doesn't notify the kernel, the kernel will kill it after realising that it isn't permitted to access that data (because the segment isn't in its address space yet).
I couldn't understand what is actually meant by attaching the shared memory segment to the address space of a process
Simply put, it means updating the page table of the process so that it can map its own virtual addresses onto the physical frames of the shared memory created by the owner process.
We can understand it using the concept of paging, so let it be an abstraction for now.
Also, what actually happens when the shared memory is attached to the address space of a process? And whose address is it that is returned by shmat()?
When a process, say P1, wants to share its memory, it must create a shared memory region. For that it calls the shmget() syscall with a few parameters. If the call is successful, shmget() returns an identifier, say 52.
Now if another process P2 wants to use the shared memory created by P1, it must call shmat() and pass that identifier (in this case 52); if the call is successful, it is returned a pointer through which P2 can read, write, or both.
P2 can now modify the shared memory region, e.g. write something to it.
As for whose address it is: shmat() returns a virtual address in the calling process's own address space (here P2's), which the kernel maps onto the same physical memory that backs P1's shared region.
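To make the two steps concrete, here is a minimal sketch of the calls described above, assuming a System V / POSIX environment. Both halves are shown in one program for brevity; the key (0x1234), segment size, and message are purely illustrative, and error handling is abbreviated.

    #include <stdio.h>
    #include <string.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    int main(void)
    {
        /* What P1 does: create a 4 KiB segment under an agreed-upon key
         * and get back its identifier (the "52" in the example above). */
        int shmid = shmget((key_t)0x1234, 4096, IPC_CREAT | 0600);
        if (shmid == -1) { perror("shmget"); return 1; }

        /* What P2 does: attach the segment, identified by shmid, into its
         * own address space.  The kernel picks a free range of virtual
         * addresses, maps it onto the segment's physical frames, and
         * returns the start of that mapping. */
        char *addr = shmat(shmid, NULL, 0);
        if (addr == (void *)-1) { perror("shmat"); return 1; }

        /* From here on the segment is used like ordinary memory. */
        strcpy(addr, "hello through shared memory");
        printf("%s\n", addr);

        shmdt(addr);                    /* detach from this address space */
        shmctl(shmid, IPC_RMID, NULL);  /* mark the segment for removal   */
        return 0;
    }

In practice, unrelated processes usually agree on the key via ftok(), or the creator passes the shmid to a child process across fork().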

What is the TCM's connection with the Icache in this RISC-V core?

In the middle of this page (https://github.com/ultraembedded/riscv) there is a block diagram of the core. I really do not know what the TCM is doing in the same block as the Icache. Is it an optional thing to have inside the CPU?
Some embedded systems provide dedicated memory for code and/or for data. On some of these systems, Tightly-Coupled Memory serves as a replacement for the (instruction) cache, while on other such systems this memory is in addition to and alongside a cache, applying to a certain portion of the address space. This dedicated memory may be on the same chip as the processor.
This memory could be some kind of ROM or other memory that is initialized somehow prior to boot. In any case, TCM typically isn't backed by main memory, so it doesn't suffer cache misses or need the associated circuitry, and it usually also has high performance, like a cache when a hit occurs.
Some systems refer to this as Instruction Tightly Integrated Memory, ITIM, or Data Tightly Integrated Memory, DTIM.
When a system uses ITIM or DTIM, it performs more like a Harvard architecture than the Modified Harvard architecture of laptops and desktops.
The cache has no address space. The CPU does not ask the cache for data; it just asks for data at an address, and the memory system first checks whether that data is present in the cache. If it is, the data is fetched from the cache; if not, the controller goes to RAM. All the processor does is ask for data; it does not care where the data came from. In the case of TCM, the CPU can read and write the TCM directly, since the TCM occupies a specific address range. Think of TCM as a RAM that is close to the CPU.
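As an illustration of that "RAM with its own address range" view, embedded firmware commonly just asks the compiler/linker to place hot code or data at the TCM's addresses. This is only a hedged sketch: the section names .itcm/.dtcm are assumptions about a target-specific linker script, not something taken from the linked riscv repository.

    /* Hot data placed in data TCM: ordinary loads/stores go straight to
     * the tightly coupled RAM, with no possibility of a cache miss.
     * (.dtcm and .itcm are hypothetical sections defined by the linker script.) */
    __attribute__((section(".dtcm")))
    static int sample_buffer[256];

    /* A latency-critical routine placed in instruction TCM so that
     * instruction fetches never go through the I-cache/main-memory path. */
    __attribute__((section(".itcm")))
    void process_samples(void)
    {
        for (int i = 0; i < 256; i++)
            sample_buffer[i] >>= 1;
    }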

Does a user process have any control over paging?

A program might have some data that, when needed, it wants to access very fast. Let's call this VIP data. It would like to reduce the likelihood that the page the VIP data resides on gets swapped to disk when memory utilization on the system is high. What kind of control/influence does it have over this?
For example, I think it can consider the page replacement policy and try to influence the OS not to swap this VIP data to disk. If the policy is LRU, the program can periodically read the VIP data to ensure that its page has always been accessed fairly recently. A program can also use a very small amount of memory in total, making it likely that all its pages were accessed recently when it runs, and therefore that the VIP data is unlikely to be swapped to disk.
Can it exert any more explicit control over paging?
In order to do this, you might consider:
Prioritising the process using the renice command, or
Locking the process in main memory using mlock(2).
This is entirely operating system dependent. On some systems, if you have appropriate privileges you can lock pages in physical memory.
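For example, on POSIX systems the explicit control mentioned above is mlock(2)/munlock(2). Below is a minimal sketch, assuming Linux and a sufficient RLIMIT_MEMLOCK (or CAP_IPC_LOCK); the buffer size is arbitrary.

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>

    #define VIP_SIZE (64 * 1024)

    int main(void)
    {
        void *vip = malloc(VIP_SIZE);
        if (vip == NULL) return 1;

        /* Pin the VIP data in RAM: the pager will not evict these pages.
         * Fails with EPERM/ENOMEM if the memlock limit is too low. */
        if (mlock(vip, VIP_SIZE) != 0) {
            perror("mlock");
            return 1;
        }

        /* ... use the VIP data; it stays resident ... */

        munlock(vip, VIP_SIZE);
        free(vip);
        return 0;
    }

A process that wants everything resident (e.g. a real-time task) can instead call mlockall(MCL_CURRENT | MCL_FUTURE).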

Does it make sense to cache data obtained from a memory mapped file?

Or would it be faster to re-read that data from the mapped memory again, since the OS might implement its own cache?
The nature of the data is not known in advance; it is assumed that file reads are random.
I wanted to mention a few things I've read on the subject. The answer is no; you don't want to second-guess the operating system's memory manager.
The first comes from the idea of having your program (e.g. MongoDB, SQL Server) try to limit its memory use based on a percentage of free RAM:
Don't try to allocate memory until there is only x% free
Occasionally, a customer will ask for a way to design their program so it continues consuming RAM until there is only x% free. The idea is that their program should use RAM aggressively, while still leaving enough RAM available (x%) for other use. Unless you are designing a system where you are the only program running on the computer, this is a bad idea.
(read the article for the explanation of why it's bad, including pictures)
Next come some notes from the author of Varnish, a reverse proxy:
Varnish Cache - Notes from the architect
So what happens with squid's elaborate memory management is that it gets into fights with the kernel's elaborate memory management, and like any civil war, that never gets anything done.
What happens is this: Squid creates a HTTP object in "RAM" and it gets used some times rapidly after creation. Then after some time it get no more hits and the kernel notices this. Then somebody tries to get memory from the kernel for something and the kernel decides to push those unused pages of memory out to swap space and use the (cache-RAM) more sensibly for some data which is actually used by a program. This however, is done without squid knowing about it. Squid still thinks that these http objects are in RAM, and they will be, the very second it tries to access them, but until then, the RAM is used for something productive.
Imagine you do cache something from a memory-mapped file. At some point in the future that memory holding that "cache" will be swapped out to disk.
the OS has written to the hard-drive something which already exists on the hard drive
Next comes a time when you want to perform a lookup in your "cache" memory rather than the "real" memory. You attempt to access the "cache", and since it has been swapped out of RAM, the hardware raises a PAGE FAULT and the cache is swapped back into RAM.
your cache memory is just as slow as the "real" memory, since both are no longer in RAM
Finally, you want to free your cache (perhaps your program is shutting down). If the "cache" has been swapped out, the OS must first swap it back in so that it can be freed. If instead you just unmapped your memory-mapped file, everything is gone (nothing needs to be swapped in).
in this case your cache makes things slower
Again from Raymond Chen: If your application is closing - close already:
When DLL_PROCESS_DETACH tells you that the process is exiting, your best bet is just to return without doing anything
I regularly use a program that doesn't follow this rule. The program
allocates a lot of memory during the course of its life, and when I
exit the program, it just sits there for several minutes, sometimes
spinning at 100% CPU, sometimes churning the hard drive (sometimes
both). When I break in with the debugger to see what's going on, I
discover that the program isn't doing anything productive. It's just
methodically freeing every last byte of memory it had allocated during
its lifetime.
If my computer wasn't under a lot of memory pressure, then most of the
memory the program had allocated during its lifetime hasn't yet been
paged out, so freeing every last drop of memory is a CPU-bound
operation. On the other hand, if I had kicked off a build or done
something else memory-intensive, then most of the memory the program
had allocated during its lifetime has been paged out, which means that
the program pages all that memory back in from the hard drive, just so
it could call free on it. Sounds kind of spiteful, actually. "Come
here so I can tell you to go away."
All this anal-retentive memory management is pointless. The process
is exiting. All that memory will be freed when the address space is
destroyed. Stop wasting time and just exit already.
The reality is that programs no longer run in "RAM", they run in memory - virtual memory.
You can make use of a cache, but you have to work with the operating system's virtual memory manager:
you want to keep your cache within as few pages as possible
you want to ensure they stay in RAM, by virtue of being accessed a lot (i.e. actually being a useful cache)
Accessing:
a thousand 1-byte locations around a 400GB file
is much more expensive than accessing
a single 1000-byte location in a 400GB file
In other words: you don't really need to cache data, you need a more localized data structure.
If you keep your important data confined to a single 4k page, you will play much nicer with the VMM; Windows is your cache.
When you add 64-byte-aligned cache lines into the picture, there's even more incentive to adjust your data-structure layout. But then you don't want it too compact, or you'll start suffering the performance penalties of cache flushes caused by false sharing.
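To illustrate "working with the VMM instead of against it", here is a hedged sketch that maps a file and reads it in place, letting the kernel's page cache do the caching. The file name data.bin is made up, error handling is minimal, and madvise(MADV_RANDOM) simply matches the "random reads" assumption in the question.

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("data.bin", O_RDONLY);
        if (fd == -1) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) == -1) { perror("fstat"); return 1; }

        /* Map the whole file; pages come in on demand and stay in the
         * kernel's page cache for as long as memory pressure allows. */
        unsigned char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        /* Hint the access pattern instead of building a private cache:
         * MADV_RANDOM tells the kernel not to read ahead aggressively. */
        madvise(p, st.st_size, MADV_RANDOM);

        /* Access the mapping directly wherever the data is needed. */
        unsigned long sum = 0;
        for (off_t i = 0; i < st.st_size; i += 4096)
            sum += p[i];
        printf("touched %ld pages, sum %lu\n", (long)(st.st_size / 4096), sum);

        munmap(p, st.st_size);
        close(fd);
        return 0;
    }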
The answer is highly OS-specific. Generally speaking, there is no sense in caching this data. Both the "cached" copy and the memory-mapped data can be paged out at any time.
If there is any difference, it will be specific to the OS; unless you need that granularity, there is no sense in caching the data.

Can a shared ready queue limit the scalability of a multiprocessor system?

Can a shared ready queue limit the scalability of a multiprocessor system?
Simply put, most definitely. Read on for some discussion.
Tuning a service is an art form, or at least requires benchmarking (and the space of configurations you might need to benchmark is huge). I believe that it depends on factors such as the following (this is not exhaustive):
how much time an item picked up from the ready queue takes to process,
how many worker threads there are,
how many producers there are, and how often they produce, and
what type of wait primitives you are using: spin-locks or kernel waits (the latter being slower)?
So, if items are produced often, the number of threads is large, and the processing time is low, the data structure could be locked for large windows of time, causing heavy lock contention (thrashing).
Other factors include the data structure used and how long it is locked for. For example, if you use a linked list to manage such a queue, the add and remove operations take constant time; a priority queue (heap) takes a few more operations on average when items are added.
If your system is for business processing, you could take this question out of the picture by:
using a process-based architecture, spawning multiple producer/consumer processes and using the file system for communication, or
using a language with non-preemptive, cooperative threading, such as Stackless Python, Lua, or Erlang.
Also note: synchronization primitives cause inter-processor cache-coherence traffic, which is not good and should therefore be used sparingly.
The discussion could go on to fill a Ph.D dissertation :D
A per-CPU ready queue is the natural choice of data structure. This is because most operating systems try to keep a process on the same CPU, for many reasons you can google for. What does that imply? If a thread is ready and another CPU is idling, the OS will not quickly migrate the thread to the other CPU; load balancing only kicks in over the longer run.
Had the situation been different, that is, if keeping thread-CPU affinity were not a design goal and thread migration were frequent, then keeping separate per-CPU run queues would be costly.
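To make the contention argument concrete, here is a toy user-space sketch in C with pthreads, contrasting a single globally locked ready queue with per-CPU queues. The structures and names are purely illustrative and are not taken from any real kernel.

    #include <pthread.h>

    struct task {
        struct task *next;
        /* ... saved context, state, priority ... */
    };

    /* Shared design: every CPU's enqueue/dequeue takes this one lock, so
     * its cache line ping-pongs between cores and becomes the bottleneck
     * as the CPU count grows. (Shown only for contrast.) */
    struct shared_rq {
        pthread_spinlock_t lock;
        struct task *head;
    };

    /* Per-CPU design: in the common case each CPU touches only its own
     * queue and lock; cross-CPU traffic happens only when the load
     * balancer migrates tasks between queues. */
    #define NCPU 8
    static struct percpu_rq {
        pthread_spinlock_t lock;   /* mostly uncontended */
        struct task *head;
    } runqueue[NCPU];

    static void rq_init(void)
    {
        for (int i = 0; i < NCPU; i++) {
            pthread_spin_init(&runqueue[i].lock, PTHREAD_PROCESS_PRIVATE);
            runqueue[i].head = NULL;
        }
    }

    /* Push a ready task onto the queue of the CPU it last ran on. */
    static void enqueue(int cpu, struct task *t)
    {
        struct percpu_rq *rq = &runqueue[cpu];
        pthread_spin_lock(&rq->lock);
        t->next = rq->head;
        rq->head = t;
        pthread_spin_unlock(&rq->lock);
    }

    int main(void)
    {
        static struct task t0;
        rq_init();
        enqueue(0, &t0);   /* t0 is now ready on CPU 0's queue */
        return 0;
    }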