Memory Management in Perl

I am facing a weird issue with memory handling in Perl.
I am working on a Perl application that uses pretty big hash structures, and I pass the hash ref back and forth between objects. But even after I deallocate the object and the hash at the end, the memory usage stays the same.
Here is a sample of the problem:
my $hash = {};
# ... this data structure gets populated with a lot of data ...
{
    my $obj = Class->new( data => $hash );
    # ...
}
# even undefing the $hash:
undef $hash;
# I would expect some improvement in memory utilization here, but it's not happening
I think I am making some very basic mistake. Can anyone suggest what it is?

You can't really return memory to the OS. Perl usually keeps it in order to reallocate it later, though it does garbage-collect occasionally.
See http://learn.perl.org/faq/perlfaq3.html#How-can-I-free-an-array-or-hash-so-my-program-shrinks-
and
http://clokwork.net/2012/02/12/memory-management-in-perl/
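To illustrate the point from the FAQ, here is a minimal sketch (the sizes are arbitrary) of freeing a big hash; the memory goes back to perl's own allocator for reuse, not to the OS:

use strict;
use warnings;

my $hash = { map { $_ => 'x' x 100 } 1 .. 100_000 };   # grow the heap

undef $hash;   # the hash is freed, but only back into perl's own
               # allocator for reuse; top/ps will still show the
               # process at its high-water size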

Generally speaking, Perl's memory management does what you need, and you needn't worry about it. For example, what is the harm of keeping a huge chunk of memory allocated for the rest of your program? Probably none: if the rest of the system needs memory, the OS can simply page that unused chunk out.
Suppose you had some special case, like a script that runs constantly in the background but occasionally needs to do a memory-intensive task. You could solve this by separating it into two scripts: background.pl and memory-intensive-task.pl. background.pl would execute memory-intensive-task.pl when needed, and the memory would be freed when that program completed and exited.
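A minimal sketch of that split, reusing the hypothetical script names above (work_available() is a made-up trigger):

# background.pl: the long-running, low-memory half of the split
use strict;
use warnings;

sub work_available { return -e 'job.todo' }   # hypothetical trigger

while (1) {
    if ( work_available() ) {
        # the heavy lifting happens in a separate process; all of its
        # memory is returned to the OS when that process exits
        system( $^X, 'memory-intensive-task.pl' ) == 0
            or warn "task failed: $?";
    }
    sleep 60;
}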

Related

What are the Perl techniques to detach just a portion of code to run independently?

I'm not well versed in close-to-the-OS programming techniques, but as far as I know, when it comes to doing something in parallel in Perl the weapon of choice is fork, and probably some useful modules built upon it. The doc page for fork says:
Does a fork(2) system call to create a new process running the same program at the same point.
As a consequence, having a big application that consumes a lot of memory and calling fork for a small task means there will be two big perl processes, with the second wasting resources just to do some simple work.
So, the question is: what should one do (or how should one use fork, if it's the only method) in order to have a detached portion of code running independently and consuming only the resources it needs?
Just a very simple example:

use strict;
use warnings;

my @big_array = ( 1 .. 2_000_000 );   # at least 80 MB of memory

sleep 10;   # time to inspect the memory usage easily
fork();
sleep 10;   # time to inspect the memory usage easily

and the child process consumes 80+ MB too.
To be clear: it's not important to communicate with this detached code or to use its result somehow; I just want to be able to say "hey, run this simple task for me in the background and let me continue my heavy work meanwhile ... and don't waste my resources!" from a heavy perl application.
fork() to exec() is your bunny here. You fork() to create a new process (a fairly cheap operation, see below), then exec() to replace the big perl you've got running with something smaller. It looks like this:
use strict;
use warnings;
use 5.010;

my @ary = (1 .. 10_000_000);

if (my $pid = fork()) {
    # parent
    say "Forked $pid from $$; sleeping";
    sleep 1_000;
}
else {
    # child: replace this big perl with a much smaller program
    exec($^X, '-e', 'sleep 1_000') or die "exec failed: $!";
}
(@ary was just used to fill up the original process's memory a bit.)
I said that fork()ing was relatively cheap, even though it does copy the entire original process. These statements are not in conflict; the guys who designed fork noticed this same problem. The copy is lazy, that is, only the bits that are actually changed are copied.
If you find you want the processes to talk to each other, you'll start getting into the more complex domain of IPC, about which a number of books have been written.
Your forked process is not actually using 80MB of resident memory. A large portion of that memory will be shared - 'borrowed' from the parent process until either the parent or child writes to it, at which point copy-on-write semantics will cause the memory to actually be copied.
If you want to drop that baggage completely, run exec in your fork. That will replace the child Perl process with a different executable, thus freeing the memory. It's also perfect if you don't need to communicate anything back to the parent.
There is no way to fork just a subset of your process's footprint, so the usual workarounds come down to:
fork before you run memory intensive code in the parent process
start a separate process with system or open HANDLE,'|-',.... Of course this new process won't inherit any data from its parent, so you will need to pass data to this child somehow.
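A sketch of the second approach: start a small, fresh perl that does not inherit our heap and feed it data over a pipe (small-task.pl is a placeholder):

use strict;
use warnings;

# the pipe is the only channel between the two processes
open my $child, '|-', $^X, 'small-task.pl'
    or die "can't start child: $!";

print {$child} "$_\n" for 1 .. 10;   # pass whatever data it needs
close $child or warn "child exited with status $?";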
fork() as implemented on most operating systems is nicely efficient. It commonly uses a technique called copy-on-write, meaning that pages are initially shared until one or the other process writes to them. Also, a lot of your process memory is going to be read-only mapped files anyway.
Just because one process uses 80 MB before fork() doesn't mean that the two will use 160 MB afterwards. To start with, the total will be only a tiny fraction more than 80 MB, until each process starts dirtying more pages.

Questions around memory utilization in Perl

SO community,
I have been scratching my head lately over two memory issues I am running into with some of my perl scripts, and I am hoping to find some help/pointers here to better understand what is going on.
Questionable observation #1:
I am running the same perl script on different server instances (local macOS laptop, dedicated server hardware, virtual server hardware) and am getting significantly varying results in the traced memory consumption. Just after script initialization, one instance reports a memory consumption of 210 MB, compared to 330 MB on another box, a difference of almost 60%. I understand that the malloc() implementation underlying Perl's memory allocation is OS specific, but are such deviations normal, or should I be looking more closely at what is going on?
Questionable observation #2:
One script that is having memory leaks is relatively trivial:
foreach (@dataSamples) {
    # memorycheck_1
    my $string = subRoutine($_);
    print FILE $string;
    # memorycheck_2
}
All variables in subRoutine are kept local and should be out of scope once the subroutine finishes. Yet when checking memory usage at #memorycheck_1 and #memorycheck_2 there is significant memory growth.
Is there any explanation for that? Using Devel::Leak, it seems there are leaked pointers, and I have a hard time understanding where they would be coming from. Is there an easy way to translate the output of Devel::Leak into something that can actually tell me where those leaked references originate?
Thanks
You have two different questions:
1) Why is the memory footprint not the same across various environments?
Well, are all the OSes involved 64-bit? Or is there a mix? If one OS is 32-bit and another 64-bit, the variation is to be expected. Or, as @hobbs notes in the comments, is one of the perls compiled with threads support whereas another is not?
2) Why does the memory footprint change between check #1 and check #2?
That does not necessarily mean there is a memory leak. Perl won't give back memory to the OS. The memory footprint of your program will be the largest footprint it reaches and will not go down.
Neither of these points is Perl-specific. For more specific answers, you'll need to show more of your code.
See also Question 7.25 in the C FAQ and further reading mentioned in that FAQ entry.
The most common reason for a memory leak in Perl is circular references. The simplest form would be something along the lines of:
sub subRoutine {
    my ( $this, $that );
    $this = \$that;
    $that = \$this;   # $this and $that now refer to each other, so
                      # their reference counts can never fall to zero
    return $_[0];
}
Now of course people reading that are probably saying, "Why would anyone do that?" And one generally wouldn't. But more complex data structures can contain circular references pretty easily, and we don't even blink an eye at them. Consider doubly-linked lists, where each node refers to the node to its left and to its right. It's important not to let the last explicit reference to such a list pass out of scope without first breaking the circular references contained in each of its nodes, or you'll get a structure that is inaccessible but can't be garbage-collected, because the reference count of each node never falls to zero.
Per Eric Strom's excellent suggestion, the core module Scalar::Util has a function called weaken. A reference that has been weakened won't hold a reference count on the entity it refers to. This can be helpful for preventing circular references. Another strategy is to implement your circular-reference-wielding data structure within a class where an object method explicitly breaks the circular reference. Either way, such data structures do require careful handling.
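A minimal sketch of weaken applied to the doubly-linked-list situation described above:

use strict;
use warnings;
use Scalar::Util qw(weaken);

my $left  = { name => 'left'  };
my $right = { name => 'right' };

$left->{next}  = $right;    # strong reference
$right->{prev} = $left;     # this alone would close the cycle...
weaken( $right->{prev} );   # ...so weaken it: it no longer counts
                            # toward $left's reference count

# when $left and $right go out of scope, both hashes can now be
# freed; without weaken() their refcounts would never reach zero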
Another source of trouble is poorly written XS modules (nothing against XS authors; it's just really tricky to write XS modules well). What goes on behind the closed doors of an XS module may be a memory leak.
Until we see what's happening inside of subRoutine we can only guess whether or not there's actually an issue, and what the source of the issue may be.

What can I do to find out what's causing my program to consume lots of memory over time?

I have an application using POE which has about 10 sessions doing various tasks. Over time, the app starts consuming more and more RAM and this usage doesn't go down even though the app is idle 80% of the time. My only solution at present is to restart the process often.
I'm not allowed to post my code here, so I realize it is difficult to get help, but maybe someone can tell me what I can do to find out myself?
Don't expect the process size to decrease. Memory isn't released back to the OS until the process terminates.
That said, might you have reference loops in data structures somewhere? AFAIK, the perl garbage collector can't sort out reference loops.
Are you using any XS modules anywhere? There could be leaks hidden inside those.
A guess: your program executes a loop for as long as it is running, and in this loop it may be allocating memory for a buffer (or more) each time some condition occurs. Since the scope is never exited, the memory remains and is never cleaned up. I suggest you check for something like this. If that is the case, move the allocating code into a sub that you call from the loop; the buffer will then go out of scope, and get cleaned up, on return to the loop.
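A sketch of that restructuring (next_event() and the buffer contents are made up); the buffer lives only for the duration of each call instead of accumulating in the loop's scope:

use strict;
use warnings;
use 5.010;   # for state

sub next_event { state $n = 0; return $n < 10 ? 'event' . ++$n : undef }   # hypothetical source

sub handle_event {
    my ($event) = @_;
    my @buffer = ($event) x 1_000;   # stand-in for a big allocation
    # ... process @buffer ...
    return;   # @buffer goes out of scope here and is reclaimed
}

while ( defined( my $event = next_event() ) ) {
    handle_event($event);
}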
Looks like Test::Valgrind is a tool for searching for memory leaks. I've never used it myself though (but I used plain valgrind with C source).
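For reference, the one-liner from the Test::Valgrind synopsis (leaky.pl being the script under test):

perl -MTest::Valgrind leaky.pl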
One technique is to periodically dump the contents of $POE::Kernel::poe_kernel to a time- or sequence-named file. $poe_kernel is the root of a tree spanning all known sessions and the contents of their heaps. The snapshots should monotonically grow if the leaked memory is referenced. You'll be able to find out what's leaking by diff'ing an early snapshot with a later one.
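A sketch of that snapshotting idea, assuming a POE application (the file naming and the use of Data::Dumper are my choices, not part of POE):

use strict;
use warnings;
use POE;            # exports $poe_kernel
use Data::Dumper;

my $seq = 0;

sub dump_kernel {
    local $Data::Dumper::Indent   = 1;
    local $Data::Dumper::Sortkeys = 1;   # stable ordering makes diffs readable
    open my $fh, '>', sprintf( 'kernel-%04d.dump', $seq++ )
        or die "can't write snapshot: $!";
    print {$fh} Dumper($POE::Kernel::poe_kernel);
    close $fh;
}

# call dump_kernel() periodically, e.g. from a recurring POE event,
# then diff an early snapshot against a later one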
You can export POE_ASSERT_DATA=1 to enable POE's internal data consistency checks. I don't expect it to surface problems, but if it does I'd be very happy to receive a bug report.
Perl cannot resolve reference cycles. Either you have zombie processes (which you can detect via ps axl) or you have a memory leak (reference cycles).
There are a ton of programs to detect memory leaks.
strace, mtrace, Devel::LeakTrace::Fast, Devel::Cycle

Perl memory usage profiling and leak detection?

I wrote a persistent network service in Perl that runs on Linux.
Unfortunately, as it runs, its Resident Set Size (RSS) just grows, and grows, and grows, slowly but surely.
This is despite diligent efforts on my part to expunge all unneeded hash keys and delete all references to objects that would otherwise cause reference counts to remain in place and obstruct garbage collection.
Are there any good tools for profiling the memory usage associated with various native data primitives, blessed hash reference objects, etc. within a Perl program? What do you use for tracking down memory leaks?
I do not habitually spend time in the Perl debugger or any of the various interactive profilers, so a warm, gentle, non-esoteric response would be appreciated. :-)
You could have a circular reference in one of your objects. When the garbage collector comes along to deallocate this object, the circular reference means that everything referred to by that reference will never get freed. You can check for circular references with Devel::Cycle and Test::Memory::Cycle. One thing to try (although it might get expensive in production code, so I'd disable it when a debug flag is not set) is checking for circular references inside the destructor for all your objects:
# make this the parent class for all objects you want to check,
# or alternatively stuff this into the UNIVERSAL class's destructor
package My::Parent;

use strict;
use warnings;
use Devel::Cycle;   # exports find_cycle() by default

sub DESTROY
{
    my $this = shift;

    # the callback will be called for every cycle found
    find_cycle($this, sub {
        my $path = shift;
        foreach (@$path)
        {
            my ($type, $index, $ref, $value) = @$_;
            print STDERR "Circular reference found while destroying object of type "
                . ref($this) . "! reftype: $type\n";
            # print other diagnostics if needed; see docs for find_cycle()
        }
    });

    # perhaps add code to weaken any circular references found,
    # so that the destructor can Do The Right Thing
}
You can use Devel::Leak to search for memory leaks. However, the documentation is pretty sparse... for example, just where does one get the $handle reference to pass to Devel::Leak::NoteSV()? If I find the answer, I will edit this response.
OK, it turns out that using this module is pretty straightforward (code stolen shamelessly from Apache::Leak):
use Devel::Leak;

my $handle;   # apparently this doesn't need to be anything at all
my $leaveCount = 0;
my $enterCount = Devel::Leak::NoteSV($handle);
print STDERR "ENTER: $enterCount SVs\n";

# ... code that may leak ...

$leaveCount = Devel::Leak::CheckSV($handle);
print STDERR "\nLEAVE: $leaveCount SVs\n";
I'd place as much code as possible in the middle section, with the CheckSV call as close to the end of execution as possible, after as many variables as possible have been deallocated or gone out of scope (if you can't get a variable out of scope, you can assign undef to it to free whatever it was pointing to).
What I'd try next (other than Devel::Leak):
Try to eliminate "unnecessary" parts of your program, or segment it into separate executables (they could use signals to communicate, or call each other with command-line arguments perhaps) -- the goal is to boil down an executable into the smallest amount of code that still exhibits the bad behaviour. If you're sure it's not your code that's doing it, reduce the number of external modules you're using, particularly those that have an XS implementation. If perhaps it is your own code, look for anything potentially fishy:
definitely any use of Inline::C or XS code
direct use of references, e.g. \@list or \%hash, rather than preconstructed references like [ qw(foo bar) ] (the former creates another reference which may get lost; with the latter there is just one reference to worry about, usually stored in a local lexical scalar)
manipulating variables indirectly, e.g. $$foo where $foo is modified, which can cause autovivification of variables (although you would need to disable strict 'refs' checking for that)
I recently used NYTProf as a profiler for a large Perl application. It doesn't track memory usage, but it does trace all executed code paths which helps with finding out where leaks originate. If what you are leaking is scarce resources such as database connections, tracing where they are allocated and closed goes a long way towards finding leaks.
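For reference, the usual NYTProf invocation (your_script.pl is a placeholder):

perl -d:NYTProf your_script.pl   # run once under the profiler
nytprofhtml                      # turn the resulting nytprof.out into an HTML report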
A nice guide about this is included in the Perl manual: Debugging Perl memory usage.

How can I find memory leaks in long-running Perl program?

Perl uses reference counting for GC, and it's quite easy to create a circular reference by accident. I see that my program seems to be using more and more memory, and it will probably run out after a few days.
Is there any way to debug memory leaks in Perl? Attaching to a program and getting numbers of objects of various types would be a good start. If I knew which objects are much more numerous than expected I could check all references to them and hopefully fix the leak.
It may be relevant that Perl never gives memory back to the system by itself: It's all up to malloc() and all the rules associated with that.
Knowing how malloc() allocates memory is important to answering the greater question, and it varies from system to system, but in general most malloc() implementations are optimized for programs that allocate and deallocate in stack-like orders. Perl uses reference counting for tracking memory, which means that (unlike in a GC-based language that uses malloc() underneath) it is actually not all that difficult to tell where deallocation is going to occur, and in what order.
It may be that you can reorganize your program to take advantage of this fact by calling undef($old_object) explicitly, and in the right order, in a manner similar to the way C programmers say free(old_object);
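A sketch of that explicit, ordered deallocation (load_huge_structure() is a made-up stand-in):

use strict;
use warnings;

sub load_huge_structure { return { payload => [ 1 .. 100_000 ] } }   # stand-in

my $old_object = load_huge_structure();
# ... work with $old_object ...

undef $old_object;   # drop the last reference explicitly, while the
                     # deallocation order still suits your malloc()

my $next_object = load_huge_structure();   # can reuse the memory just freed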
For long-running programs (days, months, etc.) where I have loads of load/copy/dump cycles, I garbage-collect using exit() and exec(), and where that's otherwise unfeasible, I simply pack up my data structures (using Storable) and file descriptors (using $^F) and exec($0), usually with an environment variable set like $ENV{EXEC_GC_MODE}. You may need something similar even if you don't have any leaks of your own, simply because Perl is leaking small chunks that your system's malloc() can't figure out how to give back.
Of course, if you do have leaks in your code, then the rest of my advice is somewhat more relevant. It was originally posted to another question on this subject, but it didn't explicitly cover long-running programs.
All Perl program memory leaks will be either XS code holding onto a reference, or a circular data structure. Devel::Cycle is a great tool for finding circular references, if you know which structures are likely to contain the loops. Devel::Peek can be used to find objects with a higher-than-expected reference count.
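A minimal Devel::Peek sketch for eyeballing a reference count:

use strict;
use warnings;
use Devel::Peek;   # exports Dump()

my $node  = { data => 'x' };
my $alias = $node;   # a second reference to the same hash

Dump($node);   # the dump (on STDERR) of the referenced HV shows
               # REFCNT = 2; a count higher than you can account
               # for means something is still holding a reference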
If you don't know where else to look, Devel::LeakTrace::Fast could be a good first place, but you'll need a perl built for debugging.
If you suspect the leak is inside XS-space, it's much harder, and Valgrind will probably be your best bet. Test::Valgrind may help you reduce the amount of code you need to search, but it won't work on Windows, so you'd have to port (at least the leaky portion) to Linux in order to use it.
Devel::Gladiator is another useful tool in this space.
It seems like the CPAN module Devel::Cycle is what you are looking for. It requires making some changes to your code, but it should help you find your references without too many problems.
valgrind is a great Linux application which locates memory leaks in running code. If your Perl code runs on Linux, you should check it out.
In addition to the other comments, you may find my Perl Memory Use talk at LPW2013 useful. I'd recommend watching the screencast as it explains the slides and has some cute visuals and some Q&A at the end.
I'd also suggest looking at Paul Evans's Devel::MAT module, which I mention in the talk.
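For completeness, a minimal Devel::MAT sketch using the module's documented dump() call (the file name is arbitrary); the resulting .pmat file is then explored offline with the pmat tools shipped alongside the distribution:

use strict;
use warnings;
use Devel::MAT::Dumper;

# ... run the workload that makes memory grow ...

# write a heap snapshot for offline analysis
Devel::MAT::Dumper::dump( 'myapp.pmat' );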