What is physical storage allocation? - operating-system

I am studying for an MSc IT. In the subject 'Advanced Operating System' there is a chapter on file systems that includes a topic named "PHYSICAL STORAGE ALLOCATION". I searched in books and on the internet but didn't find anything related to it. Can anyone give me a definition of PHYSICAL STORAGE ALLOCATION and explain how it works in 4-5 lines?

The term is IMHO ambiguous.
It could mean, in the context of file systems, the allocation and management of blocks on a physical device (e.g. a disk, a disk partition, some USB storage key). The ext2 wikipage has a nice figure explaining that (for the old ext2 file system). In general, data is read from and written to disks in blocks of fixed size (512 bytes on old disks, 4 Kbytes on newer ones), and the file system code organizes files and provides the "file" abstraction above such blocks.
(your edit, and the image you are showing, suggests that first meaning)
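To make the first meaning concrete, here is a minimal sketch of block allocation, assuming a 4096-byte block size. The `ToyBlockAllocator` class and its free-block list are hypothetical illustrations, only loosely in the spirit of a block bitmap; they are not how ext2 or any real file system is implemented.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes (older disks used 512)


def blocks_needed(file_size, block_size=BLOCK_SIZE):
    """Number of fixed-size blocks required to hold file_size bytes."""
    return (file_size + block_size - 1) // block_size  # integer ceiling


class ToyBlockAllocator:
    """Hypothetical allocator that tracks which device blocks are free."""

    def __init__(self, total_blocks):
        self.free = [True] * total_blocks

    def allocate(self, count):
        """Return the indices of `count` free blocks and mark them used."""
        chosen = [i for i, is_free in enumerate(self.free) if is_free][:count]
        if len(chosen) < count:
            raise OSError("no space left on device")
        for i in chosen:
            self.free[i] = False
        return chosen

    def release(self, blocks):
        """Mark the given blocks as free again (e.g. when a file is deleted)."""
        for i in blocks:
            self.free[i] = True
```

A 10000-byte file needs three 4096-byte blocks, so the last block is only partly filled; a real file system additionally records which blocks belong to which file.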
It could also mean, in the context of virtual memory, the allocation of pages in physical RAM (related to the configuration of the MMU), or maybe in the swap device.
Read also Operating Systems: Three Easy Pieces (freely downloadable).

Related

Strategy for local logging on an embedded Raspberry Pi?

My company uses the Raspberry Pi 3 as an embedded controller in a product. Users don't power it off gracefully, they just flip a switch. To avoid corruption, the /boot and /root file systems are read-only. This appears to be bulletproof - we've used a test rig to "pull the plug" over and over (2000+ cycles) with no problems.
We are working on a new feature that requires local logging. To do so, we created an additional ext4 read/write partition on the SD card (we are currently using about 2GB on an 8GB card) for the log file. To minimize wear, the application buffers the log data and writes to the card only once every minute. The log file is closed between writes. Nothing else uses that partition. The log file is not written to when the application is in a state that likely indicates the user is about to shut down.
In testing this, we've found that in spite of the rather conservative approach we're using, the read/write partition is always marked as "dirty" after a reboot, frequently contains filesystem errors, and often has a damaged log file. We've also had a number of cards suffer unrecoverable errors which prevent the device from booting up.
Loss of the last set of log entries is not a problem.
Loss of the log file is undesirable but acceptable.
Damage to the /root and /boot filesystems is unacceptable, as is physical damage (other than standard NAND flash wear) to the card.
Short of adding a UPS to gracefully shut down the Pi, is there any approach that will safely allow for read/write operations?
Is there a configuration of the SD card partition "geometry" that would ensure that no two partitions overlap one flash erase block?
Just some points:
Dirty flag: I guess that you are not unmounting the filesystem, right? That is a possible reason for seeing the dirty flag after each unclean reboot. Another (probably better) way is to switch the filesystem to read-only mode after writing and remount it read-write just before writing the file.
BTW, ext4 defers writes to the disk. Calling close() on a file doesn't mean that its contents have been written to the disk; you need an extra call to fsync() or sync (see Does Linux guarantee the contents of a file is flushed to disc after close()?). So it is better to explicitly ask the system to write the file out.
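As a minimal sketch of that last point, the hypothetical helper below appends buffered log lines and calls `os.fsync()` before the file is closed. The function name and the log format are assumptions for illustration. Note that `fsync` only asks the kernel to push the data to the device; what the SD card's controller does with it afterwards is outside the program's control.

```python
import os


def append_log_durably(path, lines):
    """Append lines to the log at `path` and ask the kernel to write them out."""
    with open(path, "a", encoding="utf-8") as f:
        f.writelines(line + "\n" for line in lines)
        f.flush()             # drain Python's userspace buffer into the kernel
        os.fsync(f.fileno())  # ask the kernel to write the file data to the device
```

Calling this once per minute with the buffered entries keeps the write pattern described in the question, while narrowing the window in which data sits only in the page cache.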
I suggest using UBIFS, JFFS2, or YAFFS2; that is the best-practice approach. I have also heard about LogFS.
These filesystems can stay mounted all the time and be written to without delay, because they are designed to cope with a hard shutdown.
Overview copied from https://superuser.com/questions/248078/choice-of-filesystem-for-gnu-linux-on-an-sd-card
JFFS2
Includes compression and elegant wear leveling protection.
YAFFS2
Single thing that makes the difference: short mount times, after successful umount.
Implements write once property: once data is written to one block, there is no need to rewrite it. This is important, as it reduces wear.
LogFS
Not very mature, but already included in Linux kernel tree.
Supports larger filesystems than JFFS2/YAFFS2 without problems.
UBIFS
More mature than LogFS
Write caching support
On scalability: better performance than JFFS2 on large disks (see the article linked in the source overview).
ext4
If neither the driver nor the card handles wear leveling (SSDs, for example, usually do have internal wear leveling), then ext4 is not the best idea, as it is not intended for raw flash usage.

Are pages only secondary memory like hard drives, or are they used for RAM too?

I'm confused by this.
Are pages only memory units that exist in secondary memory or do they also exist in RAM too?
A memory page is the smallest unit of memory used by a virtual memory manager. A page can be backed by physical RAM, or by swap space or a page file on a hard drive. Pages backed by RAM have much faster IO, but as RAM gets full the OS may have to swap out pages to the hard drive.
Pages do not exist [physically] at all. A page is simply a redirection mechanism.
The operating system sets up a linear, logical address space for each process. The logical address space is organized into pages that in turn may map to:
A physical page frame of memory
Nowhere
Somewhere on disk and managed by the operating system.
Paging is a memory management scheme by which a computer stores and retrieves data from secondary storage for use in main memory. Pages are used in RAM too, as a solution to external fragmentation. External fragmentation is the situation where the total free space is enough to hold another process but the available space is not contiguous. Compaction is one solution, but it works only for processes that are loaded at run time. Paging is therefore the real solution to external fragmentation: a page table gives the illusion that the process has been given contiguous memory. Every address from the CPU is broken down into a page number and an offset.
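The page number and offset split can be sketched as follows, assuming 4096-byte pages. The dictionary used as a page table and the frame numbers in the usage note are made up for illustration.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def split_address(addr, page_size=PAGE_SIZE):
    """Break a logical address into (page number, offset within the page)."""
    return divmod(addr, page_size)


def translate(addr, page_table, page_size=PAGE_SIZE):
    """Map a logical address to a physical one via a page -> frame table."""
    page, offset = split_address(addr, page_size)
    frame = page_table[page]  # raises KeyError if the page is not mapped
    return frame * page_size + offset
```

With the hypothetical table `{0: 5, 1: 9, 2: 1}`, logical address 8195 falls in page 2 at offset 3 and lands in frame 1. The frames do not need to be contiguous or in order, which is what removes the external fragmentation problem.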

Why does the sequential write to a journal file speed up if it is in a different file system?

As per MongoDB documentation at http://docs.mongodb.org/manual/core/journaling,
To speed the frequent sequential writes that occur to the current
journal file, you can ensure that the journal directory is on a
different filesystem
storing the journal file on a different file system speeds things up. Is it because two different hard disk spindles are at work? Just wanted to understand the mechanics of this optimization tip.
Yes,
If you are using physical rotating hard drives, there is significant performance benefit from separating the journal activities onto a separate (preferably dedicated) physical drive.
The benefits are not the same if you're using SAN hardware, and to an extent they are lessened by the larger drive caches available in modern hard drives. It's a different story again with SSD.
The main factor with spinning disks is seek time - the time that it takes for the read/write head to get to the right part of the disk. Hard disks are arranged with circular tracks. To get to a specific block on the disk, the head moves to the right track, and the disk spins around to the right place (the disks keep spinning of course, so it's simply a matter of waiting for the right place to come around).
This doesn't take much time, but when it's happening a lot it adds up.
When you have the primary activity and the journal activity on the same drive, the head has to rapidly move between the two (many, really) locations that the system needs to look at.
If you have your journalling on another physical drive, then the head on that drive can be almost (or perhaps more accurately, relatively) static, with the ability to more rapidly access the correct track / location required. Meanwhile the other drive (with the primary activity on it) will be more efficient also, because the head will not be constantly seeking back to where the journal entries are being written between the other activities required to keep the database running.
This benefit applies to most database systems and many other applications where there is a constant sequential writing to disk going on at the same time as other mixed disk activity.
You don't get the same profile if you're using SAN, because even if it appears to be separate file systems, it's actually likely to be striped across many drives which are both cached and shared.
SSD has a different profile also, because there is no physical seek time.
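The seek argument for spinning disks can be illustrated with a toy model that only counts how many tracks the head travels. The track numbers below are invented, and the model ignores rotational latency, caching, and request reordering, so treat it as a rough sketch rather than a performance prediction.

```python
def total_head_travel(track_requests, start=0):
    """Sum of track distances the head moves to serve requests in order."""
    travel, position = 0, start
    for track in track_requests:
        travel += abs(track - position)
        position = track
    return travel


JOURNAL_TRACK = 10                  # hypothetical location of the journal file
data_tracks = [500, 120, 900, 340]  # hypothetical data-file accesses

# One drive: every data access is followed by a journal append.
shared = []
for track in data_tracks:
    shared += [track, JOURNAL_TRACK]

same_drive = total_head_travel(shared, start=JOURNAL_TRACK)
# Two drives: the data drive never seeks back to the journal,
# and the journal drive's head stays where the journal is.
data_drive = total_head_travel(data_tracks, start=JOURNAL_TRACK)
journal_drive = total_head_travel([JOURNAL_TRACK] * len(data_tracks),
                                  start=JOURNAL_TRACK)
```

In this model the shared drive's head travels 3640 tracks, while with two drives the data drive travels 2210 and the journal drive does not move at all.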

What's the difference between "virtual memory" and "swap space"?

Can anyone please clarify the difference between virtual memory and swap space?
And why do we say that for a 32-bit machine the maximum virtual memory accessible is 4 GB only?
There's an excellent explanation of virtual memory over on superuser.
Simply put, virtual memory is a combination of RAM and disk space that running processes can use.
Swap space is the portion of virtual memory that is on the hard disk, used when RAM is full.
As for why a 32-bit CPU is limited to 4 GB of virtual memory, it's addressed well here:
By definition, a 32-bit processor uses
32 bits to refer to the location of
each byte of memory. 2^32 = 4.2
billion, which means a memory address
that's 32 bits long can only refer to
4.2 billion unique locations (i.e. 4 GB).
There is some confusion regarding the term Virtual Memory, and it actually refers to the following two very different concepts:
1. Using disk pages to extend the conceptual amount of physical memory a computer has - the correct term for this is actually Paging.
2. An abstraction used by various OS/CPUs to create the illusion of each process running in a separate contiguous address space.
Swap space, OTOH, is the name of the portion of disk used to store additional RAM pages when not in use.
An important realization to make is that the former is transparently possible due to the hardware and OS support of the latter.
In order to make better sense of all this, you should consider how the "Virtual Memory" (as in definition 2) is supported by the CPU and OS.
Suppose you have a 32-bit pointer (64-bit pointers are similar, but use slightly different mechanisms). Once "Virtual Memory" has been enabled, the processor considers this pointer to be made up of three parts:
The highest 10 bits are a Page Directory Entry
The following 10 bits are a Page Table Entry
The last 12 bits make up the Page Offset
Now, when the CPU tries to access the contents of a pointer, it first consults the Page Directory, a table of 1024 entries (on the x86 architecture, its location is held in the CR3 register). The 10-bit Page Directory Entry field is an index into this table, and the selected entry points to the physical location of a Page Table. This, in turn, is another table of 1024 entries, each of which holds a pointer into physical memory plus several important control bits (we'll get back to these later). Once a page has been found, the last 12 bits are used to find an address within that page.
There are many more details (TLBs, Large Pages, PAE, Selectors, Page Protection) but the short explanation above captures the gist of things.
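The 10/10/12 split described above can be sketched with shifts and masks. The nested dictionaries standing in for the Page Directory and Page Tables are a simplification for illustration; real entries are packed 32-bit words that also carry the control bits.

```python
def split_virtual_address(vaddr):
    """Split a 32-bit address into (directory index, table index, offset)."""
    if not 0 <= vaddr < 2 ** 32:
        raise ValueError("not a 32-bit address")
    directory_index = (vaddr >> 22) & 0x3FF  # highest 10 bits
    table_index = (vaddr >> 12) & 0x3FF      # following 10 bits
    offset = vaddr & 0xFFF                   # last 12 bits
    return directory_index, table_index, offset


def translate_two_level(vaddr, page_directory):
    """Walk a toy two-level table: directory -> page table -> page base."""
    directory_index, table_index, offset = split_virtual_address(vaddr)
    page_table = page_directory[directory_index]  # simplified directory entry
    page_base = page_table[table_index]           # physical base of the page
    return page_base + offset
```

For example, address `0x00403025` splits into directory index 1, table index 3, and offset `0x25`; with a hypothetical mapping of that entry to physical base `0x7000`, the walk yields `0x7025`.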
Using this translation mechanism, an OS can use a different set of physical pages for each process, thus giving each process the illusion of having all the memory for itself (as each process gets its own Page Directory)
On top of this Virtual Memory the OS may also add the concept of Paging. One of the control bits discussed earlier specifies whether an entry is "Present". If it isn't present, an attempt to access that entry results in a Page Fault exception. The OS can catch this exception and act accordingly. OSs supporting swapping/paging can thus decide to load a page from the Swap Space, fix the translation tables, and then issue the memory access again.
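The fault-and-retry sequence can be mimicked with a small simulation. Everything here (`PageFault`, `ToyMemory`, `access`) is a hypothetical model: a page being "Present" is represented by its key existing in the `ram` dictionary, and the OS's fault handler is an ordinary `except` clause.

```python
class PageFault(Exception):
    """Raised when an accessed page is not marked Present."""

    def __init__(self, page):
        super().__init__(f"page {page} not present")
        self.page = page


class ToyMemory:
    def __init__(self):
        self.ram = {}   # page number -> contents of resident (Present) pages
        self.swap = {}  # page number -> contents of swapped-out pages

    def read(self, page):
        if page not in self.ram:
            raise PageFault(page)
        return self.ram[page]


def access(memory, page):
    """Read a page, servicing a page fault from swap if needed."""
    try:
        return memory.read(page)
    except PageFault as fault:
        # "OS" handler: swap the page in and fix the translation table.
        memory.ram[fault.page] = memory.swap.pop(fault.page)
        return memory.read(page)  # issue the memory access again
```

A page that is in neither `ram` nor `swap` raises `KeyError` in this sketch, which roughly corresponds to an invalid access that a real OS would refuse.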
This is where the two terms combine: an OS supporting Virtual Memory and Paging can give processes the illusion of having more memory than is actually present by paging (swapping) pages in and out of the swap area.
As to your last question (why a 32-bit CPU is said to be limited to 4 GB of virtual memory): this refers to the "Virtual Memory" of definition 2, and is an immediate result of the pointer size. If the CPU can only use 32-bit pointers, you have only 32 bits to express different addresses, which gives you 2^32 bytes = 4 GB of addressable memory.
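The arithmetic is easy to check. The helper name below is just for illustration, and "GB" is taken in the binary sense (2^30 bytes).

```python
GIB = 2 ** 30  # bytes in one binary gigabyte


def addressable_bytes(pointer_bits):
    """Number of distinct byte addresses a pointer of this width can express."""
    return 2 ** pointer_bits


limit_32 = addressable_bytes(32)  # 4294967296 bytes
limit_32_gib = limit_32 // GIB    # 4
```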
Hope this makes things a bit clearer.
IMHO it is terribly misleading to use the concept of swap space as equivalent to virtual memory. VM is a concept much more general than swap space. Among other things, VM allows processes to reference virtual addresses during execution, which are translated into physical addresses with the support of hardware and page tables. Thus processes need not be concerned with how much physical memory the system has, or where an instruction or datum actually resides in the physical memory hierarchy. VM provides this mapping. The referenced item (instruction or data) may reside in L1, or L2, or RAM, or finally on disk, in which case it is loaded into main memory.
Swap space is just a place on secondary memory where pages are stored when they are inactive. If there is not sufficient RAM, the OS may decide to swap out pages of a process to make room for the pages of other processes. The processor never executes instructions or reads/writes data directly from swap space.
Notice that it would be possible to have swap space in a system with no VM. That is, processes that directly access physical addresses could still have portions of their memory on disk.
The thread is quite old and has already been answered, but I would still like to share this link, as it is the simplest explanation I have found so far. The link below has diagrams for better visualization.
Key Difference: Virtual memory is an abstraction of the main memory. It extends the available memory of the computer by storing the inactive parts of the RAM's content on a disk. Whenever the content is required, it fetches it back into RAM. Swap memory or swap space is the part of the hard disk drive that is used for virtual memory. Thus, the two terms are often used interchangeably.
Virtual memory is quite different from physical memory. Programmers get direct access to virtual memory rather than physical memory. Virtual memory is an abstraction of the main memory. It is used to hide the details of the real physical memory of the system. It extends the available memory of the computer by storing the inactive parts of the RAM's content on a disk. When the content is required, it fetches it back into RAM. Virtual memory creates an illusion of a whole address space with addresses beginning at zero. It is mainly preferred for its optimization feature, by which it reduces space requirements. It is composed of the available RAM and disk space.
Swap memory is generally called swap space. Swap space refers to the portion of virtual memory that is reserved as a temporary storage location. Swap space is used when the available RAM cannot meet the system's memory requirements. For example, in the Linux memory system, the kernel locates each page either in physical memory or in the swap space. The kernel also maintains a table that records which pages are swapped out and which are in physical memory.
Pages that have not been accessed for a long time are sent to the swap space area. This process is referred to as swapping out. If the same page is required again, it is swapped into physical memory by swapping out a different page. Thus, one can conclude that swap memory and virtual memory are interconnected, as swap memory is used to implement the technique of virtual memory.
difference-between-virtual-memory-and-swap-memory
"Virtual memory" is a generic term. In Windows, it is called paging or pagination. In Linux, it is called swap.

Total memory consumption of the system

Is it correct to assume that the total memory consumption (virtual + physical) of a system is the sum of the "Memory Usage" and "VM Size" columns shown by the Task Manager in Windows?
Read these posts by Mark Russinovich:
http://blogs.technet.com/markrussinovich/archive/2008/07/21/3092070.aspx
http://blogs.technet.com/markrussinovich/archive/2008/11/17/3155406.aspx
In modern Windows there really is no single truth about "Total Memory Consumption". It depends of course on the definition, but the real question is what you want to do with the answer.
Some processes like SQL-Server tend to use every byte of memory they can get their hands on, if you let them. The .NET CLR garbage collector monitors memory use and acts accordingly, trying to free more memory when it gets scarce.
So for instance you can have a system with 8 GB of physical memory, of which 90% is "used". How much of that memory is actually needed, is very hard to say. The same system may run on a 4 GB machine with no noticeable performance loss or any other issues.
If you want to explore some of the complexities of memory management under Windows, download "VMMap v2.0" from the former sysinternals site. It shows very detailed memory usage per process and may aid you in your quest.
To quote from VMMap's Help:
VMMap categorizes memory into one of several types:
Image
The memory represents an executable file such as a .exe or .dll. The Details column shows the file's path.
Private
Private memory cannot be shared with other processes, is charged against the system commit limit, and typically contains application data.
Shareable
Shareable memory can be shared with other processes, is charged against the system commit limit and typically contains data shared between DLLs in different processes or inter-process communication messages. The Windows APIs refer to this type of memory as pagefile-backed sections.
Mapped File
The memory represents a file on disk and the Details column shows the file's path. Mapped files typically contain application data.
Heap
Heaps represent memory managed by the user-mode heap manager and, like Private memory, is charged against the system commit limit and contains application data.
Managed Heap
Managed heap represents memory that's allocated and used by the .NET garbage collector.
Stack
Stacks are memory used to store function parameters, local function variables and function invocation records for individual threads. Stacks are charged against the commit limit and typically grow on demand.
System
System memory is kernel-mode physical memory associated with the process. The vast majority of System memory consists of the process page tables.
Free
Free memory regions are spaces in the process address space that are not allocated.
Now you just need to define what types of memory you consider as "used", add these up for all processes, remove multiple duplicates and look at the number... There is a reason why in task manager or other tools, there is no single number labeled "Total Memory Consumption" :-)
No, physical memory and virtual memory may overlap. If a page of memory is in virtual memory and then paged in to physical memory the virtual memory is not necessarily freed, it may be reserved for when the page gets paged out again.