Can someone give me a high-level overview of the VSWS algorithm used in operating systems?

I am trying to find videos/resources that can give me a simple, clear, concise description of the VSWS algorithm but I cannot seem to find any. Any help would be appreciated!

Can Someone Give me a high-level overview of the VSWS Algorithm...
The basic idea of the Variable-Interval Sampled Working Set algorithm is:
each virtual page has a "was used" flag
while the program is running, if/when the program uses a virtual page (including when the page's data had to be fetched from elsewhere/disk before it could be used) the CPU or OS sets the page's "was used" flag.
after a variable amount of time, the OS checks all the "was used" flags and decides that any page that wasn't used is not part of the working set (and may evict it to free up physical memory); then clears all the "was used" flags (ready for the next variable amount of time).
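As a rough illustration of the three steps above, here is a small Python simulation. The interval rule (a scan happens once `max_interval` references have elapsed, or earlier if `fault_budget` page faults have occurred and at least `min_interval` references have elapsed) follows the usual textbook description of VSWS and is not spelled out above. The class and parameter names are made up for this sketch, and one memory reference is treated as one unit of virtual time.

```python
class VswsProcess:
    """Toy model of one process under a VSWS-like policy (illustrative only)."""

    def __init__(self, min_interval, max_interval, fault_budget):
        self.min_interval = min_interval  # earliest point at which a scan may happen
        self.max_interval = max_interval  # latest point at which a scan must happen
        self.fault_budget = fault_budget  # faults that trigger an early scan
        self.resident = {}                # page -> "was used" flag
        self.elapsed = 0                  # references since the last scan
        self.faults = 0                   # page faults since the last scan
        self.evicted = []                 # log of evicted pages, for inspection

    def access(self, page):
        """Simulate one memory reference."""
        if page not in self.resident:
            self.faults += 1              # page had to be fetched first
        self.resident[page] = True        # set the "was used" flag
        self.elapsed += 1
        if self._scan_due():
            self._scan()

    def _scan_due(self):
        if self.elapsed >= self.max_interval:
            return True
        return (self.faults >= self.fault_budget
                and self.elapsed >= self.min_interval)

    def _scan(self):
        """Evict pages whose flag is clear, then clear all remaining flags."""
        unused = [p for p, used in self.resident.items() if not used]
        for page in unused:
            del self.resident[page]
            self.evicted.append(page)
        for page in self.resident:
            self.resident[page] = False
        self.elapsed = 0
        self.faults = 0


proc = VswsProcess(min_interval=2, max_interval=4, fault_budget=2)
for page in [1, 2, 3, 1, 1, 1]:
    proc.access(page)
print(proc.evicted, sorted(proc.resident))
```

With these toy parameters, page 2 is dropped at the second scan because it was not touched after the first one, while pages 1 and 3 stay resident.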
... used in Operating Systems?
I wouldn't assume it's actually used in modern operating systems.
Most operating systems use something loosely based on "least recently used"; where a similar "variable sampling" approach is used to build up an estimate of "time when page was used last" (and not merely a single "was used" flag), which is then used to estimate "probability of future use"; which might then be combined with "cost of eviction" and "priority of program" to come up with a combined score; where the pages with the worst score are deemed "best to evict to free up physical memory".
Note 1: If a page was modified and needs to be written to swap space (and then possibly loaded back from swap space later) then it has a higher "cost of eviction"; and if a page hasn't been modified since it was fetched from a file or swap space last then it has a lower "cost of eviction". To improve performance (reduce the cost of eviction, not forgetting that estimates are crude and often poorly predict future use) it'd make sense to prefer the eviction of "cheaper to evict" pages.
Note 2: When there are multiple tasks running, it's good to give some tasks preferential treatment. For an extreme example, imagine the OS is under "low memory" conditions and constantly thrashing (transferring data to/from) disks; and an admin/user is trying to terminate a buggy program that is causing all the disk thrashing but can't because the tools they need to use to fix the problem are unresponsive (because those tools were not given preferential treatment and have to be fetched from the "already being thrashed" disk).
Note 3: In some cases (e.g. a task called sleep() and it's trivial to determine that it will wake up soon) it's possible to use other information to get a better estimate of "probability of future use" than a simple "least recently used" algorithm could provide.
Note 4: Typically when an OS needs to free up some physical memory there's other things (e.g. file data caches) that could also be considered (and could also participate in that "calculate a score and evict whatever has the worst score" system).
Note 5: Modern systems also pre-fetch data (e.g. from files, etc) before the data is actually requested. It's entirely possible for pre-fetched "not requested by any program, not used at all yet" data to be more important than "explicitly requested and previously used" data.
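To make the "combined score" idea a bit more concrete, here is a minimal Python sketch. Everything in it is hypothetical: the field names, the weights and the formula are invented for illustration and are not taken from any real kernel. It only shows how an estimate of recency, the cost of eviction (Note 1) and task priority (Note 2) could be folded into one number, with the lowest (worst) score being evicted first.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    idle_ticks: int      # estimated time since last use (built from sampled flags)
    dirty: bool          # modified since it was last fetched, so eviction needs a write
    owner_priority: int  # 0 = ordinary task, larger = task that deserves preference


def keep_score(c, dirty_bonus=50, priority_bonus=100):
    """Higher = more worth keeping in RAM. The weights are arbitrary."""
    score = -c.idle_ticks                      # long-idle pages score worse
    if c.dirty:
        score += dirty_bonus                   # expensive to evict, so keep longer
    score += priority_bonus * c.owner_priority
    return score


def pick_victim(candidates):
    """Return the candidate with the worst score."""
    return min(candidates, key=keep_score)


pages = [
    Candidate("A", idle_ticks=200, dirty=True, owner_priority=0),
    Candidate("B", idle_ticks=180, dirty=False, owner_priority=0),
    Candidate("C", idle_ticks=400, dirty=False, owner_priority=3),
]
victim = pick_victim(pages)
print(victim.name)
```

Here the clean page B is chosen over the slightly older but dirty page A, and page C survives despite being idle the longest because its owner has a high priority.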

Related

Page Fault - How does os search for the page in secondary storage?

My question is: when a page fault occurs and the required page is not in RAM, how does the OS know where to look for the given page in the entire secondary memory to bring it into RAM? Is the logical address the address in secondary storage, is the required secondary storage address stored in the page table itself, or is it done some other way?
I feel like I am probably missing something very basic here, but this doubt came to mind and a quick Google search is not providing any answers.
My question is when a page fault occurs and the required page is not in RAM, after that how does the OS know where to look for the given page in the entire secondary memory to bring it to the RAM?
If there were 50 different operating systems that supported an average of 10 different architectures each, there would be up to a maximum of 500 different answers; where one of the answers would be "all software uses physical addresses and there is no virtual memory and there is no secondary memory" and another answer would be "a virtual address is a location on the disk and RAM is just used as disk cache to speed it up" (see https://en.wikipedia.org/wiki/Single-level_store ).
For most typical modern operating systems running on most typical architectures; if you worked out all of the information the kernel needs to know about each virtual page (e.g. what the page is pretending to be, what the page actually is, location on disk if any, location in RAM if any, something to keep track of "least recently used", something to keep track of "number of copy-on-write copies", etc); then you could scatter all the information across multiple different data structures such that:
some of the data structures are used/required by the CPU itself and some aren't
the same information may or may not be in 2 or more data structures at the same time
some data structures have an entry for each virtual page and some just have an entry for each range of multiple pages
some data structures are arrays/tables, some are trees, some are trees of tables, and some are something else.
some use "virtual address" or "virtual page number" as a key to find information; and some use something else (e.g. inverted page tables on PowerPC and Itanium use "physical address" as an index because using what you're trying to find as an index is the least intelligent thing you could possibly do, so why not?).
some of the data structures may be in the kernel and some may not be (e.g. the L4 micro-kernel manages virtual memory mapping purely in user-space via an "abstract hierarchical address space" model).
In general; the information about where a page's data is in (each different piece of?) secondary memory (if there is secondary memory) will be stored in one or more places in one or more things.
Note that when a page fault occurs the page fault handler typically needs to make multiple decisions; possibly starting with figuring out what made the access (a process, the kernel itself?) and whether the access should be allowed or denied, then figuring out what to do about it (send SIGSEGV? do a kernel panic? fetch data into the CPU's TLB? invalidate stale data from CPU's TLB? do copy-on-write cloning? fetch data from swap space? fetch data from file?); so the page fault handler ends up finding multiple different pieces of data from (potentially) multiple different places.
A Concrete Example
For my OS designs (which are based on asynchronous message passing and use micro-kernels); a micro-kernel is small enough that it can be custom designed and optimised for a specific architecture (without any regard to portability). The operating system design is intended for distributed systems, and for that reason shared memory (and fork()) are not supported (you don't want page fault handler to have to fetch data from a remote computer over a congested network connection to do a "copy on write"); and the only case for "copy on write" is memory mapped files where the page is shared by one or more processes and the (local) VFS cache.
For 64-bit 80x86, the CPU requires a tree of 4 levels of tables (page tables, page directories, page directory pointer tables and page map level 4), and to improve efficiency (reduce memory consumption and reduce cache misses, etc) I use these tables as much as possible.
For page table entries (or page directory entries if 2 MiB pages are being used); if the page is not present there are 63 bits that are ignored by the CPU that the OS can use for its own purposes; and if the page is present then (depending on which features the CPU supports) there are at least 9 bits that the OS can use for its own purposes, and the flags that the CPU uses (e.g. the "read, write, no-execute" flags) can be used to augment the OS's own information.
When a page is not present, the 63 bits are split into 2 fields: one 8-bit field to keep track of the virtual type of the page (if it's supposed to act like RAM, if it's supposed to be executable, if it's supposed to use "write-back" caching, etc), and one 55-bit "where" field. If the highest bit in the "where" field is set the page was sent to swap space and the other 54 bits are a "swap space handle" (allowing for a maximum of "2**54 * 4 KiB" of swap space); and if the highest bit in the "where" field is clear then the other 54 bits are a "memory mapped file handle".
If a page fault occurs because of a "not present" page, the page fault handler uses the 8-bit field to determine if the access should be allowed or denied (or if it's already being handled due to a different thread accessing it already), then (if the access should be allowed) the page fault handler tells the scheduler to put the thread in a "WAITING FOR PAGE" state and marks the page as "being fetched" (so that other threads that belong to the same process know that it's being fetched already), then uses the "where" field to either send a request message asking for the page's data to the Swap Manager (which is a process in user-space), or to find a "memory mapped file descriptor" structure in kernel space that contains more information (that didn't fit in the page table entry) to determine the offset of the page within the file and a file handle, and send a request to the VFS for the page's data (the VFS or Virtual File System is another process in user-space).
Later; when the Swap Manager or VFS sends a reply message containing the page's data back to the kernel, the kernel fixes up the page table entry (putting the page of data from the message into the virtual address space) and tells the scheduler to unblock the thread/s (shift them from the "WAITING FOR PAGE" state to the "READY TO RUN" state).
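A small Python sketch of how such a "not present" entry could be packed and unpacked. The field widths (8-bit type, 55-bit "where" field whose highest bit selects swap space) come from the description above, but the exact bit positions are an assumption made for this example: bit 0 is the CPU's "present" flag, the type is placed in bits 1 to 8 and the "where" field in bits 9 to 63. The function names are hypothetical.

```python
PRESENT_BIT = 1 << 0               # bit 0: the CPU's "present" flag on x86-64
TYPE_SHIFT, TYPE_BITS = 1, 8       # assumed position of the 8-bit virtual type
WHERE_SHIFT, WHERE_BITS = 9, 55    # assumed position of the 55-bit "where" field
SWAP_FLAG = 1 << (WHERE_BITS - 1)  # highest bit of "where": set = swap space
HANDLE_MASK = SWAP_FLAG - 1        # the remaining 54 bits hold the handle


def encode_not_present(page_type, handle, in_swap):
    """Build a 64-bit entry for a page that is not present."""
    if not 0 <= page_type < (1 << TYPE_BITS):
        raise ValueError("page_type must fit in 8 bits")
    if not 0 <= handle <= HANDLE_MASK:
        raise ValueError("handle must fit in 54 bits")
    where = handle | (SWAP_FLAG if in_swap else 0)
    # The present bit is left clear, so the CPU ignores the other 63 bits.
    return (where << WHERE_SHIFT) | (page_type << TYPE_SHIFT)


def decode_not_present(entry):
    """Return (page_type, source, handle) for a not-present entry."""
    if entry & PRESENT_BIT:
        raise ValueError("entry is marked present")
    page_type = (entry >> TYPE_SHIFT) & ((1 << TYPE_BITS) - 1)
    where = (entry >> WHERE_SHIFT) & ((1 << WHERE_BITS) - 1)
    source = "swap" if where & SWAP_FLAG else "mmap_file"
    return page_type, source, where & HANDLE_MASK


swapped = encode_not_present(page_type=0x12, handle=0x1234, in_swap=True)
mapped = encode_not_present(page_type=0x03, handle=42, in_swap=False)
print(hex(swapped), decode_not_present(swapped))
print(hex(mapped), decode_not_present(mapped))
```

The page fault handler in the description would do the equivalent of `decode_not_present()` and then pick the Swap Manager or the VFS based on the decoded source.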
For both of these cases (memory mapped file and swap space) if the access was an "allowed read" then the page is mapped as read only (regardless of whether the page is supposed to be writeable). If the access was an "allowed write", or if a later "allowed write" is done to a page that was previously fetched and mapped as read only; then if the page's data came from swap space the page fault handler informs the Swap Manager that the copy of the page in swap space can be discarded (can't be re-used if the same page is sent to swap space later), and if the page's data came from a memory mapped file the page fault handler informs the VFS that there's one less process with a copy of that page and copies the "copy on write" page's data to a newly allocated page.
When a page is "present", it may still be part of a memory mapped file and there may still be a copy in swap space; but there isn't enough space in the page table entry to store the "where" field. In this case, if the page is in swap space and in RAM, the Swap Manager has to accept "Process ID + virtual address" instead of a "swap space handle" (which causes a little extra overhead in Swap Manager because it has to convert "Process ID + virtual address" into "swap space handle" itself). If the page is a "copy on write" memory mapped file, then the page fault handler searches the process' list of "memory mapped file descriptors" (which causes a little extra overhead).
Note that (in theory) when an OS is running low on free RAM it wants to select a "least likely to be needed soon" page to send to swap space, but this isn't easy/practical so most operating systems use "least recently used" instead.
My kernels don't do this at all. Instead they just send "random" pages to the Swap Manager, and (initially) the Swap Manager keeps the data in RAM and doesn't send it to any of the swap providers to store; and the Swap Manager uses "least recently sent to swap manager" to figure out which pages to send to a swap provider to store. A page that is used often may be sent to swap manager many times without ever actually being sent to a swap provider (and without causing slow disk IO for frequently used pages). Also note that, because "copy on write memory mapped file" is the only case that "copy on write" is used and because there is no other form of shared memory, the VFS can keep track of how many processes are sharing a copy of pages itself and the kernel never needs to keep track of how many processes are sharing a copy of any page (like most kernels for most operating systems do).
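The "least recently sent to swap manager" idea can be sketched in Python as a bounded in-RAM staging area in front of a slower store. This is only a model of the behaviour described above, not the actual design: the class and method names are invented, the swap provider is a plain dictionary, and a fetched page is simply removed from the manager (the real rules for when a swap copy may be discarded are more involved).

```python
from collections import OrderedDict


class SwapManagerModel:
    """Holds recently received pages in RAM; spills the oldest to a provider."""

    def __init__(self, ram_capacity):
        self.ram_capacity = ram_capacity
        self.in_ram = OrderedDict()  # handle -> data, least recently sent first
        self.on_provider = {}        # stand-in for a swap provider (e.g. a disk)

    def receive_page(self, handle, data):
        """The kernel hands over a page it has chosen to get rid of."""
        self.in_ram[handle] = data
        self.in_ram.move_to_end(handle)
        while len(self.in_ram) > self.ram_capacity:
            old_handle, old_data = self.in_ram.popitem(last=False)
            self.on_provider[old_handle] = old_data  # slow IO would happen here

    def fetch_page(self, handle):
        """The kernel wants a page back. Returns (data, needed_provider_io)."""
        if handle in self.in_ram:
            return self.in_ram.pop(handle), False
        return self.on_provider.pop(handle), True


manager = SwapManagerModel(ram_capacity=2)
manager.receive_page("A", b"a-data")
manager.receive_page("B", b"b-data")
first_a = manager.fetch_page("A")    # A is used again soon: no provider IO
manager.receive_page("A", b"a-data")
manager.receive_page("C", b"c-data")  # B is now the least recently sent page
second_a = manager.fetch_page("A")
b_result = manager.fetch_page("B")
print(first_a, second_a, b_result)
```

The frequently used page A bounces between the kernel and the manager without ever reaching the provider, while the page that was sent longest ago (B) is the one that ends up needing provider IO.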

Is there any standard for supporting lock-step processors?

I want to ask about supporting lock-step (lockstep) processors at the software level.
As far as I know, in AUTOSAR systems targeting ASIL-D, a lock-step processor is used for fault tolerance in the following scenario:
The input signals for a processor are copied to another processor (its lock-step pair).
The output signals from the two processors are compared.
If the two output signals differ, a trap is generated.
I think that if a trap is generated, it should be handled somewhere at the software level.
However, I could not find any standard for this handling.
I have read some of the software error handling topics specified in AUTOSAR, but I could not find any satisfying answers.
So, my questions are summarized below.
In AUTOSAR or another standard, where is the right place to handle the lock-step trap (SW-C, RTE or BSW)?
In AUTOSAR or another standard, what is the right action when handling the lock-step trap (RESET or ABORT)?
Thank you.
There are multiple concepts involved here, from different sources.
The ASIL levels are defined by ISO 26262. ASIL-D is the highest level and using a lockstep CPU is one of the methods typically used to achieve ASIL-D compliance for the whole system. Autosar doesn't define how you achieve ASIL-D, or any ASIL level at all. From an Autosar perspective, lockstep would be an implementation detail of the MCU driver, and Autosar doesn't require MCUs to support lockstep. How a particular lockstep implementation works (whether the outputs are compared after each instruction or not, etc.) depends on the hardware, so you can find those answers in the corresponding hardware manual.
Correspondingly, some decisions have to be made by people working on the system, including an expert on functional safety. The decision on what to do on lockstep failure is one such decision - how you react to a lockstep trap should be defined at the system level. This is also not defined by Autosar, although the most reasonable option is to reset your microcontroller after saving some information about the error.
As for where in the Autosar stack the trap should be handled, this is also an implementation decision, although the reasonable choice is for this to happen at the MCAL level - to the extent that talking about levels even makes sense here, as the trap will run in interrupt/trap context and not the normal OS task context. Typically, a trap would come with a higher priority than any interrupt, and also typically it's not possible to disable the traps in software. A trap will be handled by some routine that is registered by the OS in the same way it registers ISRs, so you'd want to configure the trap handler in whatever tool you're using for OS configuration. The lockstep trap may (again, depending on the hardware) be considered a non-recoverable trap, meaning that the trap handler should trigger a reset eventually. Calling the standard ShutdownOS() function may be reasonable.

What are the Parameters on which RTOS are compared?

I want to compare several RTOSes (e.g. Keil RTX, uC/OS-III and FreeRTOS), but I do not know which parameters I should compare them on, e.g. memory footprint, certification, etc.
On which points do we compare RTOSes?
You need to compare them on the parameters that are important to your application and meeting its requirements. Those may include for example:
Context switch time
Message passing performance
Scalability
RAM footprint
ROM footprint
Heap usage
OS primitives (queues, mutex, event-flags, semaphores, timer etc.)
Scheduling algorithms (priority-preemptive, round-robin, cooperative)
Per developer cost
Per unit royalty cost
Licence type/terms
Source or object code provided
Availability of integrated middleware libraries (filesystem, USB, CAN, TCP/IP etc.)
Safety certification
Platform/target support
RTOS aware debugger support
RTOS/scheduling monitor/debug tools availability
Vendor support
Community support
Documentation quality
The possible parameters are many, and only you can determine what is useful and important to your project.
I suggest selecting about five parameters important to your project, and then analysing each option using the Kepner-Tregoe method. For each parameter you assign a weight based on its relative importance, you score each solution against each parameter, and then you sum the score x weight products for an overall score. The method takes some of the subjectivity out of selection and, perhaps just as importantly, provides evidence of your decision making process when you have to justify it to your boss.
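The weighted-sum step described above is simple enough to sketch in a few lines of Python. The option names ("RTOS A", "RTOS B"), the five parameters, the weights and the ratings are all invented placeholders; substitute your own.

```python
def weighted_scores(weights, ratings):
    """Sum rating x weight over all parameters, for each option."""
    totals = {}
    for option, option_ratings in ratings.items():
        totals[option] = sum(weights[p] * option_ratings[p] for p in weights)
    return totals


# Relative importance of each parameter (made-up values, 1 to 10).
weights = {
    "context_switch_time": 10,
    "ram_footprint": 8,
    "licence_cost": 6,
    "middleware": 5,
    "vendor_support": 3,
}

# How well each option does on each parameter (made-up values, 1 to 10).
ratings = {
    "RTOS A": {"context_switch_time": 8, "ram_footprint": 6, "licence_cost": 9,
               "middleware": 4, "vendor_support": 5},
    "RTOS B": {"context_switch_time": 6, "ram_footprint": 9, "licence_cost": 5,
               "middleware": 8, "vendor_support": 7},
}

totals = weighted_scores(weights, ratings)
best = max(totals, key=totals.get)
print(totals, best)
```

Keeping the weights and ratings in a table like this is also what gives you the written record of how the decision was reached.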

What is the best definition of an RTOS?

I have yet to find a definition of an RTOS that is specific enough to have meaning. The best one I can find is on wiki:
https://en.wikipedia.org/wiki/Real-time_operating_system
However, I have some critical comments/questions:
"Real Time" seems to be undefined in all the definitions for RTOS I've found. Nothing can be as fast as actual real time (infinitesimally small!). Therefore, I believe "real time" only makes sense in the context of the observer. Real time for a human using an iPhone might be <20ms because human eyesight cannot detect changes faster than that. For an air bag deployment it might be <1ms. All definitions on the internet seem to gloss over the definition of "real time"!
If RTOS is defined by the requirement to execute something within a specific time frame ("deadline"), why does jitter come into the definition? If the iPhone response jitters between 12-14ms, is it no longer responding in real time? It meets the 20ms requirement, right? If one time the response went to 100ms, the user might notice, at which point the system is not an RTOS.
How can there possibly be a "soft" RTOS?! The definition of RTOS is meeting a particular deadline time requirement. If it doesn't meet it, then it's not an RTOS! The very definition of RTOS prohibits a "soft" RTOS.
To me it seems there is no formal and precise definition of RTOS. It's a general term to explain the characteristic of an OS whose main priority is the appearance of "real time" (per requirement number) to a particular type of observer. It also seems like the name has taken on implementation meaning such as how things are processed, multi-tasking, message passing, semaphores, etc... all of which may NOT be part of an RTOS at all if the system fails to respond within the "deadline" requirement, right?
Sorry about such a ubiquitous question, but I can't get a clear picture in my brain. All definitions I've found are simply not precise enough or cloud the definition with implementation details.
You're right that no definition defines the exact time bounds. That's not the goal of a definition. Real time isn't dependent on the observer, though, but the application. As applications differ, time bounds differ, and therefore a definition cannot give that bound as a number.
Jitter is irrelevant as long as the application's time bound is met. You're absolutely right about the example. If the deadline is 20 ms, taking 100 ms is a failure. If the OS is to blame for the delay, it's not an RTOS.
"Soft realtime" has a very specific meaning, and this is probably the only thing you really got wrong. The concept at work here is, what do you do when a task exceeds its deadline? (Note: this could be either the fault of the task itself or the RTOS.) In a hard realtime system, the task simply has no value anymore. A late outcome is as good as no outcome, and you cancel the task. No point in risking other tasks.
Soft RTOS is actually more complex. Finishing the task still has value, although diminished. So the RTOS cannot hard kill the task, but the OS still has to ensure other tasks meet their deadlines. That requires extra care, which wouldn't have been necessary if you'd just kill the task.
There is an Embedded Systems Dictionary. Here are some excerpts:
real-time adj. Having timeliness requirements, typically in the form of deadlines that can’t be missed.
real-time operating system n. An operating system designed specifically for use in real-time systems. Abbreviated RTOS.
real-time system n. Any computer system, embedded or otherwise, that has timeliness requirements. The following question can be used to distinguish real-time systems from the rest: “Is a late answer as bad, or even worse, than a wrong answer?” In other words, what happens if the computation doesn’t finish in time? If nothing bad happens, it’s not a real-time system. If someone dies or the mission fails, it’s generally considered “hard” real-time, which is meant to imply that the system has hard deadlines. Everything in between is “soft” real-time.

How can I limit the number of blocks written in a Write_10 command?

I have a product that is basically a USB flash drive based on an NXP LPC18xx microcontroller. I'm using a library provided from the manufacturer (LPCOpen) that handles the USB MSC and the SD card media (which is where I store data).
Here is the problem: internally the LPC18xx has a 64kB (limited by hardware) buffer used to cache reads/writes, which means it can only cache up to 128 blocks (512B each). The SCSI Write-10 command has a total-blocks field that can be up to 256 blocks (128kB). When originally tested on Windows 7 the host never wrote more than 128 blocks at a time, but when tested on Linux it sometimes writes more than 128 blocks, which causes the microcontroller to crash.
Is there a way to tell the host OS not to request more than 128 blocks? I see references[1] to a Read-Block-Limit command(05h) but it doesn't seem to be widely supported. Also, what sense key would I return on the Write-10 command to tell Linux the write is too large? I also see references to a block limit VPD page in some device spec sheets but cannot find a lot of documentation about how it is implemented.
[1]https://en.wikipedia.org/wiki/SCSI_command
Let me offer a disclaimer up front that this is what you SHOULD do, but none of this may work. A cursory search of the Linux SCSI driver didn't show me what I wanted to see. So, I'm not at all sure that "doing the right thing" will get you the results you want.
Going by the book, you've got to do two things: implement the Block Limits VPD page and handle too-large transfer sizes in WRITE and READ.
First, implement the Block Limits VPD page, which you can find in late revisions of SBC-3 floating around on the Internet (like this one: http://www.13thmonkey.org/documentation/SCSI/sbc3r25.pdf). It's probably worth going to the t10.org site, registering, and then downloading the last revision (http://www.t10.org/cgi-bin/ac.pl?t=f&f=sbc3r36.pdf).
The Block Limits VPD page has a maximum transfer length field that specifies the maximum number of blocks that can be transferred by all the READ and WRITE commands, and basically anything else that reads or writes data. Of course the downside of implementing this page is that you have to make sure that all the other fields you return are correct!
Second, when handling READ and WRITE, if the command's transfer length exceeds your maximum, respond with an ILLEGAL REQUEST key, and set the additional sense code to INVALID FIELD IN CDB. This behavior is indicated by a table in the section that describes the Block Limits VPD, but only in late revisions of SBC-3 (I'm looking at 35h).
You might just start with returning INVALID FIELD IN CDB, since it's the easiest course of action. See if that's enough?
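For what it's worth, the transfer-length check and the sense data could look roughly like the Python sketch below (your firmware would be C, but the byte layout is the same). The CDB offsets and the fixed-format sense layout are the standard ones; `MAX_BLOCKS` comes from the 64 kB buffer in the question, and the function names are made up for this example and are not part of LPCOpen.

```python
import struct

WRITE_10 = 0x2A
MAX_BLOCKS = 128                  # 64 kB buffer / 512-byte blocks
SENSE_KEY_ILLEGAL_REQUEST = 0x05
ASC_INVALID_FIELD_IN_CDB = 0x24   # used with ASCQ 0x00


def parse_write10(cdb):
    """Return (lba, transfer_length) from a 10-byte WRITE(10) CDB."""
    if len(cdb) != 10 or cdb[0] != WRITE_10:
        raise ValueError("not a WRITE(10) CDB")
    # Bytes 2-5: LBA, byte 6: group number, bytes 7-8: transfer length (big-endian).
    lba, _group, blocks = struct.unpack(">IBH", cdb[2:9])
    return lba, blocks


def build_fixed_sense(sense_key, asc, ascq=0x00):
    """Build 18 bytes of fixed-format sense data."""
    sense = bytearray(18)
    sense[0] = 0x70        # response code: current error, fixed format
    sense[2] = sense_key
    sense[7] = 10          # additional sense length (18 - 8)
    sense[12] = asc
    sense[13] = ascq
    return bytes(sense)


def check_write10(cdb):
    """Return None if the write fits, else the sense data to report."""
    _lba, blocks = parse_write10(cdb)
    if blocks > MAX_BLOCKS:
        return build_fixed_sense(SENSE_KEY_ILLEGAL_REQUEST,
                                 ASC_INVALID_FIELD_IN_CDB)
    return None


too_big = bytes([0x2A, 0, 0, 0, 0x10, 0x00, 0, 0x00, 0xF0, 0])  # 240 blocks
fits = bytes([0x2A, 0, 0, 0, 0x10, 0x00, 0, 0x00, 0x80, 0])     # 128 blocks
print(check_write10(too_big), check_write10(fits))
```

In a USB mass-storage device the command itself would be failed in its status stage, and the host would then pick up this sense data with a REQUEST SENSE command. Whether Linux reacts by retrying with smaller writes is exactly the part I can't promise.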