How does external fragmentation happen?

As processes are loaded into and removed from memory, the free memory space is broken into little pieces, causing fragmentation. But how exactly does this happen?
And what is the best solution to external fragmentation?

External fragmentation exists when there is enough total memory to satisfy a request (usually from a process), but the required amount is not available at a contiguous location, i.e., the free memory is fragmented.
Solutions to external fragmentation:
1) Compaction: shuffling the fragmented memory into one contiguous block.
2) Virtual memory addressing using paging and segmentation.

External Fragmentation
External fragmentation happens when a dynamic memory allocation algorithm allocates some memory and a small piece is left over that cannot be effectively used. If too much external fragmentation occurs, the amount of usable memory is drastically reduced. Total memory space exists to satisfy a request, but it is not contiguous.
See the following example:
0x0000  0x1000  0x2000
A       B       C       // three allocated blocks A, B and C, each of size 0x1000
A               C       // block B freed
Now notice that the memory B used cannot satisfy any allocation larger than B's size, even though it is free.

External fragmentation can be reduced by compaction: shuffling memory contents to place all free memory together in one large block. To make compaction feasible, relocation must be dynamic. External fragmentation is also avoided by using paging.

The best solution to avoid external fragmentation is paging.
Paging is a memory management technique, usually used by virtual memory operating systems, to help ensure that the data you need is available as quickly as possible.
For more, see this: What's the difference between operating system "swap" and "page"?
With paging there is no external fragmentation, but it does not avoid internal fragmentation.

Related

Memory Address Translation in OS

Is memory address translation only useful when the total size of virtual memory (summed over all processes) needs to be larger than physical memory?
Basically, the size of virtual memory depends on what you call "virtual memory". If you mean the virtual memory of one process, then its size is fixed by the processor's address width. If you mean the combined virtual memory of all processes, then it can (technically) have an infinite size, because every process gets a whole address space of its own. The virtual address space of one process cannot be arbitrarily big, because the processor has a limited number of bits to address it. In modern long mode the processor uses only 48 bits to address memory at the byte level. That gives a very large address space, but most systems will have only 8 GB to 32 GB of RAM.
Technically, on an 8 GB RAM computer, every process could have 8 GB allocated. I say technically because eventually the computer would constantly be evicting page frames from RAM, and that would put so much overhead on the OS and on the machine that your system would freeze. In the end, the total virtual memory of all processes is limited by the capacity of your system (and OS) to run an efficient page-swapping algorithm (and by your willingness to tolerate a slow system).
Now to answer your question: paging (virtual memory) is also used to avoid fragmentation and to secure the system. With the old segmentation model, fragmentation was an issue because you had to run a complex algorithm to determine which part of memory a process gets. With paging, the smallest granularity of memory is 4 KB. This makes everything much easier, because a small process just gets a 4 KB page and can work within that page however it wants, while a bigger process gets several pages and can allocate more by making a system call. Paging largely solves external fragmentation because a process can get a page anywhere it is available, and it makes no difference (except for high- vs low-memory access latency). There is still the issue of internal fragmentation with paging.
Paging also secures the system. With segmentation you had several levels of ring protection; with paging you have only user or supervisor. With segmentation, memory is not well protected, because one process can access the memory of another process in the same segment. With paging there are two different protections. The first is the ring itself (user vs supervisor); the second is the page tables. The page tables isolate one process from another, because memory accesses are translated to other positions in RAM. It is the job of the OS to fill the page tables properly so that one process doesn't have access to the physical memory allocated to another process. The user vs supervisor bit in the page tables prevents a process from accessing the kernel except via a system call interface (the syscall instruction in x86 assembly).

How memory fragmentation is avoided in Swift

A GC's mark, sweep and compaction avoids heap memory fragmentation. So how is memory fragmentation avoided in Swift?
Are these statements correct?
1. Every time a reference count becomes zero, the allocated space gets added to an 'available' list.
2. For the next allocation, the frontmost chunk of memory which can fit the size is used.
3. Chunks of previously used memory will be reused to the best extent possible.
Is the 'available list' sorted by address location or size?
Will live objects be moved around for better compaction?
I did some digging in the assembly of a compiled Swift program, and I discovered that swift::swift_allocObject is the runtime function called when a new class object is instantiated. It calls SWIFT_RT_ENTRY_IMPL(swift_allocObject), which calls swift::swift_slowAlloc, which ultimately calls... malloc() from the C standard library. So Swift's runtime isn't doing the memory allocation; it's malloc() that does it.
malloc() is implemented in the C library (libc). You can see Apple's implementation of libc here. malloc() is defined in /gen/malloc.c. If you're interested in exactly what memory allocation algorithm is used, you can continue the journey down the rabbit hole from there.
Is the 'available list' sorted by address location or size?
That's an implementation detail of malloc that I welcome you to discover in the source code linked above.
1. Every time a reference count becomes zero, the allocated space gets added to an 'available' list.
Yes, that's correct. Except the "available" list might not be a list. Furthermore, this action isn't necessarily done by the Swift runtime library, but it could be done by the OS kernel through a system call.
2. For the next allocation, the frontmost chunk of memory which can fit the size is used.
Not necessarily frontmost. There are many different memory allocation schemes. The one you've thought of is called "first fit". Here are some example memory allocation techniques (from this site):
Best fit: The allocator places a process in the smallest block of unallocated memory in which it will fit. For example, suppose a process requests 12KB of memory and the memory manager currently has a list of unallocated blocks of 6KB, 14KB, 19KB, 11KB, and 13KB blocks. The best-fit strategy will allocate 12KB of the 13KB block to the process.
First fit: There may be many holes in the memory, so the operating system, to reduce the amount of time it spends analyzing the available spaces, begins at the start of primary memory and allocates memory from the first hole it encounters large enough to satisfy the request. Using the same example as above, first fit will allocate 12KB of the 14KB block to the process.
Worst fit: The memory manager places a process in the largest block of unallocated memory available. The idea is that this placement will create the largest hole after the allocation, thus increasing the possibility that, compared to best fit, another process can use the remaining space. Using the same example as above, worst fit will allocate 12KB of the 19KB block to the process, leaving a 7KB block for future use.
Objects will not be compacted during their lifetime. The libc handles memory fragmentation by the way it allocates memory. It has no way of moving around objects that have already been allocated.
Alexander’s answer is great, but there are a few other details somewhat related to memory layout. Garbage collection requires memory overhead for compaction, so the wasted space from malloc fragmentation isn’t really putting it at much of a disadvantage. Moving memory around also hits battery and performance, since it invalidates the processor cache. Apple’s memory management implementation can compress memory that hasn’t been accessed in a while. Even though the virtual address space can be fragmented, the actual RAM is less fragmented due to compression. The compression also allows faster swaps to disk.
Less related, but one of the big reasons Apple picked reference counting has more to do with C calls than memory layout. Apple’s solution works better if you are heavily interacting with C libraries. A garbage-collected system is normally an order of magnitude slower at interacting with C, since it needs to halt garbage collection operations before the call. The overhead is usually about the same as a syscall in any language. Normally that doesn’t matter unless you are calling C functions in a loop, such as with OpenGL or SQLite. Other threads/processes can normally use the processor resources while a C call is waiting for the garbage collector, so the impact is minimal if you can do your work in a few calls. In the future there may be advantages to Swift’s memory management when it comes to systems programming and Rust-like lifetime-based memory management. It is on Swift’s roadmap, but Swift 4 is not yet suited for systems programming. Typically in C# you would drop to managed C++ for systems programming and operations that make heavy use of C libraries.

Internal and External memory fragmentation

I am currently reading about operating systems and read about internal and external memory fragmentation.
Internal fragmentation arises with fixed-size partitioning. For example, paging is based on fixed-size partitioning, and hence paging suffers from internal fragmentation.
External fragmentation, on the other hand, arises with variable-size partitioning.
For example, segmentation is based on dynamic, variable-size partitioning, and hence segmentation suffers from external fragmentation.
So my doubt is: since paging has internal fragmentation, does it have zero external fragmentation, or something very small that we can neglect?
And similarly, does segmentation have zero internal fragmentation, or something very small that can be neglected?
Is my understanding right that internal fragmentation is tied to a fixed-size partitioning scheme and external fragmentation to variable-size partitioning?
No, there can never be external fragmentation in fixed-size partitioning, because the leftover space inside a partition cannot be allocated to any other process. External fragmentation occurs only when there is space available which could be allocated to a process, but it cannot be allocated because not enough of it is contiguous.
On the other hand, in the case of variable-size partitioning, there can never be internal fragmentation, because the leftover space can be allocated to a process of the same or smaller size (though the probability of such an allotment may be very low).
We can remove both internal and external fragmentation if we use non-contiguous allocation with variable-size partitioning.

Where are multiple stacks and heaps put in virtual memory?

I'm writing a kernel and need (and want) to put multiple stacks and heaps into virtual memory, but I can't figure out how to place them efficiently. How do normal programs do it?
How (or where) are stacks and heaps placed into the limited virtual memory provided by a 32-bit system, such that they have as much growing space as possible?
For example, when a trivial program is loaded into memory, the layout of its address space might look like this:
[ Code Data BSS Heap-> ... <-Stack ]
In this case the heap can grow as big as virtual memory allows (e.g. up to the stack), and I believe this is how the heap works for most programs. There is no predefined upper bound.
Many programs have shared libraries that are put somewhere in the virtual address space.
Then there are multi-threaded programs that have multiple stacks, one for each thread. And .NET programs have multiple heaps, all of which have to be able to grow one way or another.
I just don't see how this is done reasonably efficiently without putting a predefined limit on the size of all heaps and stacks.
I'll assume you have the basics in your kernel done, a trap handler for page faults that can map a virtual memory page to RAM. Next level up, you need a virtual memory address space manager from which usermode code can request address space. Pick a segment granularity that prevents excessive fragmentation, 64KB (16 pages) is a good number. Allow usermode code to both reserve space and commit space. A simple bitmap of 4GB/64KB = 64K x 2 bits to keep track of segment state gets the job done. The page fault trap handler also needs to consult this bitmap to know whether the page request is valid or not.
A stack is a fixed-size VM allocation, typically 1 megabyte. A thread usually only needs a handful of its pages, depending on function nesting level, so reserve the 1 MB and commit only the top few pages. When the thread nests deeper, it will trip a page fault and the kernel can simply map the extra page to RAM to allow the thread to continue. You'll want to mark the bottom few pages as special: when the thread page faults on those, you declare a stack overflow.
The most important job of the heap manager is to prevent fragmentation. The best way to do that is to create a lookaside list that partitions heap requests by size. Everything less than 8 bytes comes from the first list of segments. 8 to 16 from the second, 16 to 32 from the third, etcetera. Increasing the size bucket as you go up. You'll have to play with the bucket sizes to get the best balance. Very large allocations come directly from the VM address manager.
The first time an entry in the lookaside list is hit, you allocate a new VM segment. You subdivide the segment into smaller blocks with a linked list. When such an allocation is released, you add the block to the list of free blocks. All blocks have the same size regardless of the program request so there won't be any fragmentation. When the segment is fully used and no free blocks are available you allocate a new segment. When a segment contains nothing but free blocks you can return it to the VM manager.
This scheme allows you to create any number of stacks and heaps.
Simply put, as your system resources are always finite, you can't go limitless.
Memory management always consists of several layers, each having its own well-defined responsibility. From the perspective of the program, the application-level manager is visible, usually concerned only with its own single allocated heap. A level above could deal with creating multiple heaps, if needed, out of (its) one global heap, and assigning them to subprograms (each with its own memory manager). Above that could be the standard malloc()/free() that it uses, and above those the operating system, dealing with pages and actual memory allocation per process (which is basically not concerned with multiple heaps, or even user-level heaps in general).
Memory management is costly, and so is trapping into the kernel. Combining the two could impose a severe performance hit, so what appears to be actual heap management from the application's point of view is implemented in user space (in the C runtime library) for the sake of performance (and other reasons out of scope here).
When loading a shared (DLL) library, if it is loaded at program startup it will most probably be loaded into CODE/DATA/etc., so no heap fragmentation occurs. On the other hand, if it is loaded at runtime, there's pretty much no choice but to use heap space.
Static libraries are, of course, simply linked into the CODE/DATA/BSS/etc sections.
At the end of the day, you'll need to impose limits on heaps and stacks so that they're not likely to overflow, but you can allocate additional ones.
If one needs to grow beyond that limit, you can either
terminate the application with an error, or
have the memory manager allocate/resize/move the memory block for that stack/heap, and most probably defragment the heap (at its own level) afterwards; that's why free() usually performs poorly.
Considering a pretty large 1 KB stack frame on every call as an average (which might happen if the application developer is inexperienced), a 10 MB stack would be sufficient for 10240 nested calls. BTW, besides that, there's pretty much no need for more than one stack and heap per thread.

overhead of reserving address space using mmap

I have a program that routinely uses massive arrays, where the memory is allocated using mmap.
Does anyone know the typical overhead of allocating address space in large amounts before the memory is committed, either when allocating with MAP_NORESERVE or when backing the space with a sparse file? It strikes me that mmap can't be free, since it must make page table entries for the allocated space. I want to have some idea of this overhead before implementing an algorithm I'm considering.
Obviously the answer is going to be platform dependent; I'm most interested in x64 Linux, SPARC Solaris and SPARC Linux. I'm thinking that the availability of 1 MB pages makes the overhead rather less on SPARC than on x64.
The overhead of mmap depends on the way you use it. And it is typically negligible when you use it in an appropriate way.
In the Linux kernel, an mmap operation can be divided into two parts:
look for a free address range that can hold the mapping
create/enlarge a vma struct in the address space (mm_struct)
So allocating a large amount of memory with mmap does not introduce more overhead than allocating a small amount.
You should therefore allocate as much memory as possible in each call (avoid multiple small mmaps).
And you may provide the start address explicitly (if possible); this can save some time in the kernel's search for a large enough free range.
If your application is a multi-threaded program, you should avoid concurrent calls to mmap. That is because the address space is protected by a reader-writer lock, and mmap always takes the writer lock. mmap latency will be orders of magnitude greater in this case.
Moreover, mmap only creates the mapping, not the page table entries. Pages are allocated in the page fault handler when they are first touched. The page fault handler takes the reader lock that protects the address space, which can also affect mmap performance.
So you should always try to reuse your large array instead of munmapping it and mmapping again (this avoids the page faults).