Memory leak - absence of a garbage collector - operating-system

Consider a program with a memory leak, in which a block of heap memory is not freed before the program terminates. If this were (say) a Java program, the built-in garbage collector would have automatically deallocated this heap block before the program exits.
But even in C++, if the program exits, wouldn't the kernel automatically deallocate all space associated with the process? Also, in the Java case the kernel still has to deallocate the space for the text part (code) of the process, even if the stack and heap parts are deallocated by the garbage collector. So is the overall advantage of using a garbage collector just the time saved by having the program deallocate the heap itself rather than leaving it to the kernel (if there is any such saving)?
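For concreteness, here is a minimal C++ sketch of the scenario I mean (the block and its size are just illustrative):
#include <cstdlib>

int main() {
    // One heap block that is deliberately never freed.
    char* block = static_cast<char*>(std::malloc(1024 * 1024));
    // ... the program uses the block for its whole run ...
    (void)block;
    return 0;  // no free(); on a typical desktop OS the kernel reclaims the
               // entire address space of the process when it exits anyway
}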
EDIT: A primary doubt of mine, after reading the responses: will the GC invoke itself automatically when memory usage reaches some limit? Because if a GC were only invoked just before the program terminates, it would not be useful for long-running programs.

That assumes that the kernel cleans up after you. Not all OSs take care of dynamically-allocated memory automatically. (But to be fair: Most modern ones, at least on the desktop, do.)
Even the OSs that reclaim all memory only do so when the process terminates. Most programs allocate far more memory over their total runtime than they need at any given point in time (when run long enough, where "long" can be a few seconds for many data-crunching applications).
Because of that, many - especially long-running - processes would create more and more garbage (memory that isn't used any more, and won't be used ever again) over their lifetime without any hope of getting rid of it without terminating. You don't want to kill and restart the whole process just to keep memory usage low, do you?
Because such a process is almost never terminated (there are quite a few processes that run indefinitely, and some that run for hours), unused memory that is never disposed of leads to a serious memory shortage after a while. Your browser keeps all the images, HTML documents, JS objects, etc. you opened during this session in memory, because you won't bother to restart it every few minutes. That's a serious problem in the browser, you say? My point exactly.
Moreover, most GCs (that is to say, all good ones) don't deallocate everything - they run from time to time, when they think it's worth it, and when the process shuts down, everything that remains in memory is left to a lower level (be it a custom allocator or the OS) to be freed. This is also why finalizers aren't guaranteed to run - in a short-running program that doesn't make many allocations, the GC may never run.
So no, GC isn't about saving time. It's about saving tons of memory, preventing long-running, allocation-intensive programs from hogging all available memory and eventually making everything die from out-of-memory errors.

Let's say a program allocates some resource, uses it the whole time it is running, but doesn't release it properly before exit. When this program exits, the kernel deallocates all of the program's resources - that case is OK.
Now consider the situation where some function creates a memory leak on every call, and this function is called 100 times per second. After a few minutes or hours the program crashes because there is no free memory left.
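To make that second situation concrete, here is a minimal C++ sketch (handle_request is just a hypothetical name):
#include <cstdlib>

void handle_request() {
    // 4 KB leaked on every call: no pointer to it survives, nothing frees it.
    void* buffer = std::malloc(4096);
    (void)buffer;
}

int main() {
    // At 100 calls per second this leaks roughly 400 KB per second,
    // i.e. well over a gigabyte per hour, until allocation fails.
    for (;;) handle_request();
}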
The bad thing is that a programmer who makes memory and resource leaks of type 1 usually also makes leaks of type 2, producing dirty and unstable code. A professional programmer writes code with zero resource and memory leaks. If a garbage collector is available, fine. If not, manage resources yourself.
BTW, it is still possible to create leaks with a garbage collector - like the well-known .NET event source-consumer leak. So a garbage collector is very useful and saves a lot of developer time, but in any case the developer must carefully manage program resources.
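For illustration, here is a rough C++ analogue of that event source/consumer leak (this is not .NET code, just the same pattern: the objects stay reachable through the publisher's handler list, so they can never be reclaimed even though they will never be used again):
#include <functional>
#include <memory>
#include <vector>

struct Consumer {
    std::vector<char> bigBuffer = std::vector<char>(1 << 20);  // ~1 MB payload
    void onEvent() { /* handle the event */ }
};

struct EventSource {
    std::vector<std::function<void()>> handlers;  // keeps captured consumers alive
};

int main() {
    EventSource source;
    for (int i = 0; i < 1000; ++i) {
        auto consumer = std::make_shared<Consumer>();
        // Subscribe but never unsubscribe: the captured shared_ptr keeps each
        // Consumer (and its buffer) alive for as long as the source lives.
        source.handlers.push_back([consumer] { consumer->onEvent(); });
    }
    // Roughly a gigabyte is still reachable here, although no consumer is ever used again.
}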

Related

How memory fragmentation is avoided in Swift

A GC's mark, sweep, and compaction avoid heap memory fragmentation. So how is memory fragmentation avoided in Swift?
Are these statements correct?
Every time a reference count becomes zero, the allocated space gets added to an 'available' list.
For the next allocation, the frontmost chunk of memory which can fit the size is used.
Chunks of previously used memory will be reused to the best extent possible.
Is the 'available list' sorted by address location or size?
Will live objects be moved around for better compaction?
I did some digging in the assembly of a compiled Swift program, and I discovered that swift::swift_allocObject is the runtime function called when a new class object is instantiated. It calls SWIFT_RT_ENTRY_IMPL(swift_allocObject), which calls swift::swift_slowAlloc, which ultimately calls... malloc() from the C standard library. So Swift's runtime isn't doing the memory allocation; malloc() is.
malloc() is implemented in the C library (libc). You can see Apple's implementation of libc here. malloc() is defined in /gen/malloc.c. If you're interested in exactly what memory allocation algorithm is used, you can continue the journey down the rabbit hole from there.
Is the 'available list' sorted by address location or size?
That's an implementation detail of malloc that I welcome you to discover in the source code linked above.
1. Every time a reference count becomes zero, the allocated space gets added to an 'available' list.
Yes, that's correct. Except the "available" list might not be a list. Furthermore, this action isn't necessarily done by the Swift runtime library, but it could be done by the OS kernel through a system call.
2. For the next allocation, the frontmost chunk of memory which can fit the size is used.
Not necessarily the frontmost. There are many different memory allocation schemes. The one you've thought of is called "first fit". Here are some example memory allocation techniques (from this site); a small first-fit sketch follows the list:
Best fit: The allocator places a process in the smallest block of unallocated memory in which it will fit. For example, suppose a process requests 12KB of memory and the memory manager currently has a list of unallocated blocks of 6KB, 14KB, 19KB, 11KB, and 13KB blocks. The best-fit strategy will allocate 12KB of the 13KB block to the process.
First fit: There may be many holes in the memory, so the operating system, to reduce the amount of time it spends analyzing the available spaces, begins at the start of primary memory and allocates memory from the first hole it encounters large enough to satisfy the request. Using the same example as above, first fit will allocate 12KB of the 14KB block to the process.
Worst fit: The memory manager places a process in the largest block of unallocated memory available. The idea is that this placement will create the largest hole after the allocation, thus increasing the possibility that, compared to best fit, another process can use the remaining space. Using the same example as above, worst fit will allocate 12KB of the 19KB block to the process, leaving a 7KB block for future use.
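To make "first fit" concrete, here is the promised C++ sketch of scanning a free list held in address order; real allocators (including Apple's malloc) are of course far more sophisticated:
#include <cstddef>
#include <vector>

// One hole of unallocated memory (illustrative only).
struct FreeChunk {
    char*       addr;
    std::size_t size;
};

// First fit: walk the free list from the front and take the first hole big
// enough for the request, keeping whatever is left over as a smaller hole.
char* firstFit(std::vector<FreeChunk>& freeList, std::size_t request) {
    for (auto& chunk : freeList) {
        if (chunk.size >= request) {
            char* result = chunk.addr;
            chunk.addr += request;
            chunk.size -= request;
            return result;
        }
    }
    return nullptr;  // no hole can satisfy the request
}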
Objects will not be compacted during their lifetime. The libc handles memory fragmentation by the way it allocates memory. It has no way of moving around objects that have already been allocated.
Alexander's answer is great, but there are a few other details somewhat related to memory layout. Garbage collection requires memory overhead for compaction, so the space wasted by malloc fragmentation isn't really putting it at much of a disadvantage. Moving memory around also hurts battery life and performance, since it invalidates the processor cache. Apple's memory management implementation can compress memory that hasn't been accessed in a while. Even though the virtual address space can be fragmented, the actual RAM is less fragmented due to compression. The compression also allows faster swaps to disk.
Less related, but one of the big reasons Apple picked reference counting has more to do with C calls than memory layout. Apple's solution works better if you are interacting heavily with C libraries. A garbage-collected system is normally an order of magnitude slower at interacting with C, since it needs to halt garbage collection operations before the call. The overhead is usually about the same as a syscall in any language. Normally that doesn't matter unless you are calling C functions in a loop, such as with OpenGL or SQLite. Other threads/processes can normally use the processor resources while a C call is waiting for the garbage collector, so the impact is minimal if you can do your work in a few calls. In the future there may be advantages to Swift's memory management when it comes to systems programming and Rust-like lifetime-based memory management. It is on Swift's roadmap, but Swift 4 is not yet suited for systems programming. Typically in C# you would drop to managed C++ for systems programming and operations that make heavy use of C libraries.

What is the connection of autorelease pool to garbage collection?

I have read this in the Apple docs:
In a garbage collected environment, release is a no-op (a do-nothing instruction). NSAutoreleasePool therefore provides a drain method that in a reference-counted environment behaves the same as calling release, but which in a garbage collected environment triggers garbage collection (if the memory allocated since the last collection is greater than the current threshold). Typically, therefore, you should use drain rather than release to dispose of an autorelease pool.
but I am not getting the meaning of
1) "if the memory allocated since the last collection is greater than the current threshold."
and
2) iOS does not support the garbage collector, so what is the use of drain with garbage collection?
1) It probably means that the GC remembers the amount of allocated memory, and the next time drain is called the current amount of allocated memory is compared with that last amount. Only if the change is significant enough does garbage collection take place.
Let me explain it in a different way: garbage collection can be expensive, so you need to decide when to collect. In order to avoid unnecessary work, the GC might remember the amount of memory after a collection run (for example, 25MB are used). Now the next time the GC considers collecting, it first decides whether it's worth doing all the work. For example, if the amount of memory used is now 25.5MB it's probably not worth doing anything. But if now 50MB are in use the collecting is useful.
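A rough C++ sketch of that heuristic (purely illustrative, not Apple's actual implementation; the names and the 25 MB figure just restate the example above as code):
#include <cstddef>

struct CollectorState {
    std::size_t allocatedSinceLastCollection = 0;
    std::size_t threshold = 25 * 1024 * 1024;  // e.g. 25 MB
};

// drain() only triggers a collection when enough new memory has been
// allocated since the last run to make the expensive work worthwhile.
void drain(CollectorState& gc) {
    if (gc.allocatedSinceLastCollection > gc.threshold) {
        // collect();  // hypothetical: do the actual garbage collection
        gc.allocatedSinceLastCollection = 0;
    }
}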
2) This stuff originated on the Mac, where GC is available. To enable sharing of code between iOS and Mac OS X, these seemingly unnecessary methods still stick around on iOS to remain as compatible with Mac OS X as possible. I even suspect it's due to the fact that iOS is actually a "fork" of Mac OS X and thus inherited this stuff.

Every day GC is running at the same time

On my server, every day at 3:00 AM the GC runs and the heap space fills up in a flash.
This is causing a site outage. Any inputs?
The following are my JVM settings. I am using the JBoss server.
-Dprogram.name=run.sh -server -Xms1524m -Xmx1524m -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000 -XX:NewSize=512m -XX:MaxNewSize=512m -Djava.net.preferIPv4Stack=true -XX:MaxPermSize=512m -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -Djavax.net.ssl.trustStorePassword=changeit -Dcom.sun.management.jmxremote.port=8888 -Djava.rmi.server.hostname=192.168.100.140 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote
Any suggestions would be really helpful.
(This turned out somewhat long; there is an actual suggestion for a fix at the end.)
Very very briefly, garbage collection when you use -XX:+UseConcMarkSweepGC works like this:
All objects are allocated in the so-called young generation. This is typically a couple of hundred megs up to a gig in size, depending on VM settings, number of CPUs, and total heap size. The young generation is collected in a stop-the-world pause by a parallel (multiple-CPU), compacting (object-moving) collection. The young generation is sized so as to keep this pause reasonably short.
When objects survive (are still reachable after) a young-gen collection, they get promoted to the old generation ("old gen").
The old generation is where -XX:+UseConcMarkSweepGC kicks in. In the default mode (without -XX:+UseConcMarkSweepGC), when the old generation becomes full the entire heap is collected and compacted (objects are moved around, eliminating fragmentation) at once in one stop-the-world collection. That pause will typically be longer than young-gen pauses because the entire heap is involved, which is bigger.
With CMS (-XX:+UseConcMarkSweepGC), the work to collect the old generation is mostly concurrent (meaning it runs in the background, with the application not paused). This work is also not compacting; it works more like malloc()/free(), and you are subject to fragmentation.
The main upside of CMS is that, when things work well, you avoid long pause times that are linear in the size of the heap, because the main work is done concurrently (there are some stop-the-world steps involved, but they are supposed to usually be short).
The two primary downsides are that:
You are subject to fragmentation because old-gen is not compacted.
If you don't finish a concurrent collection cycle before old gen fills up, or if fragmentation prevents allocation, the resulting full collection of the entire heap is not parallel as it is with the default collector, i.e., only one CPU is used. That means that when/if you do hit a full garbage collection, the pause will be longer than it would have been with the default collector.
Now... your logs. "Concurrent mode failure" is intended to convey that the concurrent mark/sweep work did not complete in time for a young-gen GC that needed to promote surviving objects into the old generation. "Promotion failed" rather means that, during promotion from young gen to old gen, an object could not be allocated in old gen due to fragmentation.
Unless you are hitting a true bug in the JVM, the sudden increase in heap usage is almost certainly from your application, JBoss, or some external entity acting on your application. So I can't really help with that. However, what is likely happening is a combination of two things:
The spike in activity is causing an increase in heap usage too quick for the concurrent collection to complete in time.
Old-gen is too fragmented, causing problems especially when the old-gen is almost full.
I should also point out that the default behavior of CMS is to try to postpone concurrent collections as long as possible (yet not too long) for performance reasons. The later a collection happens, the more efficient it is (in terms of CPU usage). However, the trade-off is that you increase the risk of not finishing in time (which, again, will trigger a full GC and a long pause). It should also (I have not made empirical tests here, but it stands to reason) make fragmentation a greater concern; basically, the fuller old gen is when an object is promoted, the greater the likelihood that the promotion will worsen fragmentation (too long to go into the details here).
In your case, I would do two things:
Keep figuring out what is causing the activity. I would say it's fairly unlikely that it is a GC/JVM bug.
Re-configure the JVM to trigger concurrent collection cycles earlier, in order to avoid the heap ever becoming so full that fragmentation becomes a particularly huge concern, and to give the collector more time to complete even during your sudden spikes of activity.
You can accomplish (2) most easily by using the JVM options
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
in order to explicitly force the JVM to kick start a CMS cycle at a certain level of heap usage (in this example 75% - you may need to change that; the lower the percentage, the earlier it will kick in).
Note that depending on what your live size is (the number of bytes that are in fact live and reachable) in your application, forcing an earlier CMS cycle may also require that you increase your heap size to avoid CMS running constantly (not a good use of CPU).
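Regarding point (1) above, it usually helps to turn on GC logging so you can see from the log exactly what starts filling the heap around 3:00 AM. On this generation of HotSpot JVMs the usual flags are along the lines of:
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log
The resulting log shows each young-gen collection, each CMS cycle, and any concurrent mode / promotion failures, together with heap occupancy before and after, which makes it much easier to tell whether the 3:00 AM spike comes from the application, JBoss, or something external.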

Writing a daemon in Perl

I'm writing a daemon for a newsletter in Perl.
The daemon will be running 24/7 on the server. It'll have an active connection to a PostgreSQL database almost all the time.
I don't have that much experience with Perl, so I would love it if some of you could share information about the following:
How to limit the RAM usage. I don't want to run out of RAM. As I said, this program will be running all the time as a daemon without being stopped.
What should I be aware of when writing such daemons?
As far as the SQL connection goes - make sure you don't leak memory. Retrieve only the data you need from each query, and ensure that the data structures storing that data go out of scope as soon as you are done with them, so the garbage collector can reclaim them.
Please note that there may be memory leaks you have no control over (e.g. in the PostgreSQL connectivity code). It's been known to happen. The best solution for that problem (short of doing precise memory profiling and fixing the leaks in the underlying libraries) is for your daemon to pull a Phoenix - stop doing what it's doing and exec() a new copy of itself.
As far as writing Perl daemons, some resources:
http://www.webreference.com/perl/tutorial/9/
Proc::Daemon - Run Perl program(s) as a daemon process.
Regarding #1: Perl is garbage collected.
The effective meaning of that is you should ensure all references to data are cleaned up when you are done with them, thus allowing the garbage collector to run.
http://perldoc.perl.org/perlobj.html#Two-Phased-Garbage-Collection
One thing to watch for are memory leaks. There’s a very nice thread about memory leaks in Perl already on SO.

Does it make sense to cache data obtained from a memory mapped file?

Or would it be faster to re-read that data from mapped memory again, since the OS might implement its own cache?
The nature of data is not known in advance, it is assumed that file reads are random.
I wanted to mention a few things I've read on the subject. The answer is no: you don't want to second-guess the operating system's memory manager.
The first comes from the idea that you want your program (e.g. MongoDB, SQL Server) to try to limit its memory usage based on a percentage of free RAM:
Don't try to allocate memory until there is only x% free
Occasionally, a customer will ask for a way to design their program so it continues consuming RAM until there is only x% free. The idea is that their program should use RAM aggressively, while still leaving enough RAM available (x%) for other use. Unless you are designing a system where you are the only program running on the computer, this is a bad idea.
(read the article for the explanation of why it's bad, including pictures)
Next come some notes from the author of Varnish, a reverse proxy:
Varnish Cache - Notes from the architect
So what happens with Squid's elaborate memory management is that it gets into fights with the kernel's elaborate memory management, and like any civil war, that never gets anything done.
What happens is this: Squid creates an HTTP object in "RAM" and it gets used some times rapidly after creation. Then after some time it gets no more hits and the kernel notices this. Then somebody tries to get memory from the kernel for something, and the kernel decides to push those unused pages of memory out to swap space and use the (cache-RAM) more sensibly for some data which is actually used by a program. This, however, is done without Squid knowing about it. Squid still thinks that these HTTP objects are in RAM, and they will be, the very second it tries to access them, but until then the RAM is used for something productive.
Imagine you do cache something from a memory-mapped file. At some point in the future that memory holding that "cache" will be swapped out to disk.
the OS has written to the hard-drive something which already exists on the hard drive
Next comes a time when you want to perform a lookup from your "cache" memory, rather than the "real" memory. You attempt to access the "cache", and since it has been swapped out of RAM the hardware raises a PAGE FAULT, and cache is swapped back into RAM.
your cache memory is just as slow as the "real" memory, since both are no longer in RAM
Finally, you want to free your cache (perhaps your program is shutting down). If the "cache" has been swapped out, the OS must first swap it back in so that it can be freed. If instead you just unmapped your memory-mapped file, everything is gone (nothing needs to be swapped in).
in this case your cache makes things slower
Again from Raymond Chen: If your application is closing - close already:
When DLL_PROCESS_DETACH tells you that the process is exiting, your best bet is just to return without doing anything
I regularly use a program that doesn't follow this rule. The program allocates a lot of memory during the course of its life, and when I exit the program, it just sits there for several minutes, sometimes spinning at 100% CPU, sometimes churning the hard drive (sometimes both). When I break in with the debugger to see what's going on, I discover that the program isn't doing anything productive. It's just methodically freeing every last byte of memory it had allocated during its lifetime.
If my computer wasn't under a lot of memory pressure, then most of the memory the program had allocated during its lifetime hasn't yet been paged out, so freeing every last drop of memory is a CPU-bound operation. On the other hand, if I had kicked off a build or done something else memory-intensive, then most of the memory the program had allocated during its lifetime has been paged out, which means that the program pages all that memory back in from the hard drive, just so it could call free on it. Sounds kind of spiteful, actually. "Come here so I can tell you to go away."
All this anal-retentive memory management is pointless. The process is exiting. All that memory will be freed when the address space is destroyed. Stop wasting time and just exit already.
The reality is that programs no longer run in "RAM", they run in memory - virtual memory.
You can make use of a cache, but you have to work with the operating system's virtual memory manager:
you want to keep your cache within as few pages as possible
you want to ensure they stay in RAM, by virtue of being accessed a lot (i.e. actually being a useful cache)
Accessing:
a thousand 1-byte locations around a 400GB file
is much more expensive than accessing
a single 1000-byte location in a 400GB file
In other words: you don't really need to cache data, you need a more localized data structure.
If you keep your important data confined to a single 4k page, you will play much nicer with the VMM; Windows is your cache.
When you add 64-byte quadword-aligned cache lines into consideration, there's even more incentive to adjust your data structure layout. But then you don't want it too compact, or you'll start suffering the performance penalties of cache flushes caused by false sharing.
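As a concrete illustration of letting the OS be the cache, here is a minimal POSIX C++ sketch that reads straight from the mapping instead of copying into a private cache (readByte is just an illustrative helper; on Windows the analogous calls are CreateFileMapping/MapViewOfFile):
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstddef>

// Map a file read-only and fetch one byte from it. The OS page cache is the
// cache: hot pages stay in RAM because they are accessed a lot, cold pages are
// simply faulted in again, and there is no second user-level copy that could
// itself be paged out.
unsigned char readByte(const char* path, std::size_t offset) {
    int fd = ::open(path, O_RDONLY);
    if (fd < 0) return 0;
    struct stat st;
    ::fstat(fd, &st);
    auto* data = static_cast<unsigned char*>(
        ::mmap(nullptr, static_cast<std::size_t>(st.st_size), PROT_READ, MAP_SHARED, fd, 0));
    ::close(fd);  // the mapping remains valid after the descriptor is closed
    if (data == MAP_FAILED) return 0;
    unsigned char value = data[offset];
    ::munmap(data, static_cast<std::size_t>(st.st_size));
    return value;
}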
The answer is highly OS-specific. Generally speaking, there is no sense in caching this data. Both the "cached" data and the memory-mapped data can be paged out at any time.
If there is any difference, it will be specific to the OS - unless you need that granularity, there is no sense in caching the data.