How can I access the ref count of a Perl hash? - perl

I'm trying to enable the garbage collector of my script to do a better job. There's a ton of memory that it should be able to reclaim, but something is stopping it.
I've used Devel::Cycle a bit and that's allowed me to get closer but I'm not quite there.
How do I find out the current reference count for a Perl hash (the storage for my objects)?
Is there a way to track who is holding a reference to an object? Perhaps a sort of Tie that says: whenever someone points at this object, remember who that someone is.

You are looking for Devel::Refcount.
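A minimal sketch of reading a reference count. Devel::Refcount is a CPAN module (as far as I know it exports a refcount($ref) function), but it may not be installed, so this sketch swaps in the core B module instead. refcount_of is a hypothetical helper name, and the absolute numbers it reports include the temporary copy of the reference made inside the helper, so only the difference between two readings is meaningful here.

```perl
use strict;
use warnings;
use B ();    # core module; used here instead of Devel::Refcount (assumption)

# Return the reference count of whatever $ref points at.
# Note: copying the reference into $ref adds one to the count while we look.
sub refcount_of {
    my ($ref) = @_;
    return B::svref_2object($ref)->REFCNT;
}

my %store  = (name => 'example');   # the hash that holds the "object"
my $first  = \%store;
my $before = refcount_of($first);

my $second = \%store;               # one more holder of the same hash
my $after  = refcount_of($first);

print "before: $before, after: $after\n";
```

Comparing readings taken at two points in the program shows whether something has picked up (or dropped) a reference in between. It does not say who the holder is; for that, Devel::Cycle or a heap dumper is still needed.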

If you are worried about returning unused memory to the OS, you should know that is not possible in general. The memory footprint of your Perl program will be proportional to the largest allocation during the lifetime of your program.
See How can I make my Perl program take less memory? in the Perl FAQ list as well as Mini-Tutorial: Perl's Memory Management (as pointed out by @Evan Carroll in the comments).

Related

Perl: How to free memory allocated for a scalar without access to the Perl variable?

This question is related to an answer to a former question about memory handling by Perl. I've learned that one can free memory in Perl by explicitly using the undef function on an available scalar, and that with Devel::Peek, Devel::Size or similar one can see how much memory is allocated for a scalar. In all those cases the scalars being debugged are used within their scope.
But is it possible to debug things like allocated memory outside the scope of variables, just on the level of a Perl interpreter? Something like searching for all allocated memory for all "things" that are a scalar in the current interpreter and print their associated data, like current value or such?
And if that's the case, if one does already have that information, is one even able to free the known memory? Just like calling undef on a scalar, but without the scalar, something more low level, like on those "things" output of Devel::Peek.
What I'm thinking about is having a mod_perl cleanup handler executed after a request, scanning the current mod_perl interpreter for large chunks of data and freeing them manually. Simply because I decide that large blocks of allocated data are of no use anymore, even if Perl thinks otherwise:
Finally and perhaps the biggest win is memory re-use: as calls are made into Perl subroutines, memory allocations are made for variables when they are used for the first time. Subsequent use of variables may allocate more memory, e.g. if a scalar variable needs to hold a longer string than it did before, or an array has new elements added. As an optimization, Perl hangs onto these allocations, even though their values "go out of scope".
https://perl.apache.org/docs/2.0/user/intro/overview.html#Threads_Support
I could find a lot of monitoring and debugging packages around low level memory access, but no hint yet how one could call something like the undef function on some low level Perl struct in Perl. Might simply not be possible without any XS or such...
is it possible to debug things like allocated memory outside the scope of variables
There really isn't any such memory. Any memory allocated outside of variables is surely needed. As you yourself point out, it's the memory allocated for variables that makes up most "wasted" space.
but no hint yet how one could call something like the undef function on some low level Perl struct in Perl.
It's because there are no such structs.
Just like calling undef on a scalar, but without the scalar, something more low level, like on those "things" output of Devel::Peek.
Devel::Peek's only function, Dump, outputs things in variables. Like you've said, undef is what you'd want to clear these.
From the above, it's obvious you want to know how to free the memory associated with the variables in subs.
You also overlooked the fact that many operators have an associated variable (called "target") in which they return their result.
Approach 1
A simple way to clear all those variables would be to selectively clear the symbol table (%::). This would effectively "unload" every module. Be sure not to clear core components (perl -E'say for sort keys %::'). And don't forget to clear %INC so the modules can be reloaded.
If clearing the symbol table is the approach you want to take, it might be less risky and time-consuming to take a snapshot of %:: early on, and restore that snapshot when it's time to clear the symbol table.
Approach 2
If you didn't want to reload the modules, you could attempt to locate every sub, and undef their vars, then undef the vars of their ops.
A sub's vars exist within its pads. Conveniently, so do opcode targets. There's a pad for each level of recursion the sub has experienced.
Given a reference to a sub, you can find the variables in a sub's pads. You can refer to PadWalker for an example of how to do this. You can't actually use PadWalker since it only returns one variable per variable name, even if there are more than one (due to more than one variable being declared with the same name, or due to recursion).
Captured variables and our variables should be left untouched. It's possible to detect whether a pad entry is one of these. (Again, refer to PadWalker.)
(Obviously, you could also look into freeing the sub's extra pads!)
How do you find all the subs? Well, navigating the symbol table will give you most of them. Finding anon ones will be trickier.
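As a rough sketch of the "navigate the symbol table" step, the following lists the named subs defined in one package. Demo::Cache and named_subs are hypothetical names made up for this illustration. It does not recurse into nested packages, does not find anonymous subs, and does not touch any pads; it only shows how the sub names can be discovered.

```perl
use strict;
use warnings;

{
    package Demo::Cache;    # hypothetical package, used only for this demo
    our $VERSION = '0.01';
    sub load  { return 1 }
    sub store { return 2 }
}

# Return the names of all named subs defined directly in $package.
sub named_subs {
    my ($package) = @_;
    no strict 'refs';       # symbolic references are needed to walk a stash
    return grep { defined &{"${package}::$_"} }   # keep entries with a code slot
           grep { !/::\z/ }                       # skip nested package stashes
           sort keys %{"${package}::"};
}

my @subs = named_subs('Demo::Cache');
print "$_\n" for @subs;
```

With a sub's name in hand, \&{"Package::name"} yields the code reference from which its pads could then be examined, for example along the lines of what PadWalker does.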
Approach 3
The most efficient approach is to simply terminate the mod_perl thread/process. A new clean one will automatically be spawned. It's also the simplest to implement, as it's simply a configuration change (setting MaxRequestsPerChild to 1).
Another form of wasted memory is a memory leak. That's another large question, so I'm not touching it.
I think you are looking for this answer to a similar question.
Everything you really need to know you can find in the internals to the Devel::MAT::* submodules. Namely the Devel::MAT::Dumper.xs, which has the structure of the heap for perl interpreter. The module is designed to dump the heap at signal and analyze it later, but I think you can turn it into a runtime check. If you need help on reading the XS, look here.
To debug memory allocation, you should recompile perl with -Accflags=-DPERL_MEM_LOG
(see the related question about how to recompile perl).
You may also be interested in MEMORY DEBUGGERS.
To free a Perl scalar, just as happens when it leaves its scope:
{ # Scope enter
my $x; # Memory allocation
} # << HERE $x is freed
you just need to reduce the variable's REFCNT to zero with the SvREFCNT_dec macro:
To free an SV that you've created, call SvREFCNT_dec(SV*) . Normally this call is not necessary (see Reference Counts and Mortality).
Here is pseudo code:
{
my $x;
call_xs_sub( $x ); # << HERE $x is freed
}
XS pseudo code:
call_xs_sub( SV *sv ) {
...
SvREFCNT_dec( sv ); # <<HERE scalar is freed
...
}
To spy on every memory allocation, you would have to walk perl's arenas.
At compile time, you can view every place where a variable is declared and accessed with the help of the B::Xref module.
Or run perl with the -Dm option (perl must be compiled with the corresponding options; see this topic):
perl -Dm script.pl

How to identify places accumulating memory use in a Perl script?

The memory occupied by my Perl script grows rapidly while it runs. I have tried clearing suspect variables as soon as they are no longer needed, but that has not fixed the problem. Is there a way to monitor the change in memory usage before and after executing a block?
I have recently had to troubleshoot an out-of-memory situation in one of my programs. While I do not claim to be an expert in this matter by any means, I'm going to share my findings in the hope that it will benefit someone.
1. High, but stable, memory usage
First, you should ensure that you do not just have a case of high, but stable, memory usage. If memory usage is stable, even if your process does not fit in available memory, the discussion below won't be of much help. Here are some notes worth reading in Perl's documentation here and here, in this SO question, in this PerlMonks discussion. There is an interesting analysis here if you're familiar with Perl internals. A lot of deep information is to be found in Tim Bunce's presentation. You should be aware that Perl may not return memory to the system even if you undef stuff. Finally, there's this opinion from a Perl developer that you shouldn't worry too much about memory usage.
2. Steadily growing memory usage
In case memory usage steadily grows, this may eventually cause an out-of-memory situation. My problem turned out to be a case of circular references. According to this answer on StackOverflow, circular references are a common source of memory leaks in Perl. The underlying reason is that Perl uses a reference counting mechanism and cannot release circularly referenced memory until program exit. (Note: I haven't been able to find a more up-to-date version in Perl's documentation of the last claim.)
You can use Scalar::Util::weaken to 'weaken' a circular reference chain (see also http://perlmaven.com/eliminate-circular-reference-memory-leak-using-weaken).
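A small sketch of the effect, using only core modules. Node, build_pair and $destroyed are hypothetical names invented for this illustration: two objects point at each other, and a counter in DESTROY shows whether Perl was able to free them once the variables went out of scope.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

our $destroyed = 0;    # counts how many Node objects have been freed

{
    package Node;      # hypothetical class, used only for this demo
    sub new     { my ($class) = @_; return bless {}, $class }
    sub DESTROY { $main::destroyed++ }
}

# Build a parent/child pair that point at each other.
sub build_pair {
    my ($weaken_back_link) = @_;
    my $parent = Node->new;
    my $child  = Node->new;
    $parent->{child} = $child;
    $child->{parent} = $parent;                        # completes the cycle
    weaken($child->{parent}) if $weaken_back_link;     # back link no longer counts
    return;            # $parent and $child go out of scope here
}

build_pair(0);
my $freed_without_weaken = $destroyed;

build_pair(1);
my $freed_with_weaken = $destroyed - $freed_without_weaken;

print "freed without weaken: $freed_without_weaken\n";
print "freed with weaken:    $freed_with_weaken\n";
```

The usual design choice is to weaken the "upward" link (child to parent), so that the parent keeps its children alive but not the other way round.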
3. Further reading
Tim Bunce's presentation (slides here); also in this blog post
http://www.perlmonks.org/?node_id=472366
Perl memory usage profiling and leak detection?
and of course the link given by @mpapec: http://perlmaven.com/how-much-memory-does-the-perl-application-use
4. Tools
on Unix, you could do system("ps -p $$ -o vsz,rsz,sz,size"). Caution: as explained in Tim Bunce's presentation, you'll want to track VSIZE instead of RSS.
How to find the amount of physical memory occupied by a hash in Perl?
https://metacpan.org/pod/Devel::Size
and a more recent take by Tim Bunce, which adds the possibility of estimating the total interpreter memory size: https://metacpan.org/pod/Devel::SizeMe
in test scripts, you can use https://metacpan.org/pod/Test::LeakTrace and https://metacpan.org/pod/Test::Memory::Cycle; an example here
https://metacpan.org/pod/Devel::InterpreterSize
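The ps-based approach from the list above can be wrapped into a before/after measurement around a block, which is what the question asks for. This is only a sketch: vsz_kb and measure are hypothetical helper names, it assumes a ps that accepts "-o vsz= -p PID" (common on Linux and macOS), and the numbers are coarse because Perl and the C library allocate memory in chunks.

```perl
use strict;
use warnings;

# Virtual size of this process in kilobytes, or undef if ps is unavailable.
sub vsz_kb {
    my $out = `ps -o vsz= -p $$ 2>/dev/null`;
    return (defined $out && $out =~ /(\d+)/) ? $1 : undef;
}

# Run a code block and report how much the virtual size grew while it ran.
sub measure {
    my ($label, $code) = @_;
    my $before = vsz_kb();
    $code->();
    my $after = vsz_kb();
    my $delta = (defined $before && defined $after) ? $after - $before : undef;
    printf "%s: %s KB\n", $label, defined $delta ? sprintf('%+d', $delta) : 'n/a';
    return $delta;
}

# Hypothetical workload: keep roughly 20 MB of strings alive in @keep.
my @keep;
my $delta = measure('fill @keep', sub { push @keep, 'x' x 1024 for 1 .. 20_000 });
```

Wrapping suspect sections of a long-running script in measure(...) calls gives a rough picture of where the growth happens; Devel::Size can then be used on the variables in that section.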

Can I use dtrace on OS X 10.5 to determine which of my perl subs is causing the most memory allocation?

We have a pretty big perl codebase.
Some processes that run for multiple hours (ETL jobs) suddenly started consuming a lot more RAM than usual. Analysis of the changes in the relevant release is a slow and frustrating process. I am hoping to identify the culprit using more automated analysis.
Our live environment is perl 5.14 on Debian squeeze.
I have access to lots of OS X 10.5 machines, though. Dtrace and perl seem to play together nicely on this platform. It seems that using dtrace on Linux requires a bit more work. I am hoping that memory allocation patterns will be similar between our live system and a dev OS X system, or at least similar enough to help me find the origin of this new memory use.
This slide deck:
https://dgl.cx/2011/01/dtrace-and-perl
shows how to use dtrace to show the number of calls to malloc by perl sub. I am interested in tracking the total amount of memory that perl allocates while executing each sub over the lifetime of a process.
Any ideas on how this can be done?
There's no single way to do this, and doing it on a sub-by-sub basis isn't always the best way to examine memory usage. I'm going to recommend a set of tools that you can use: some work on the program as a whole, others allow you to examine a single section of your code or a single variable.
You might want to consider using Valgrind. There's even a Perl module called Test::Valgrind that will help set up a suppression file for your Perl build, and then check for memory leaks or errors in your script.
There's also Devel::Size which does exactly what you asked for, but on a variable-by-variable basis rather than a sub-by-sub basis.
You can use Devel::Cycle to search for inadvertent circular memory references in complex data structures. While a circular reference doesn't mean that you're wasting memory as you use the object, circular references prevent anything in the chain from being freed until the cycle is broken.
Devel::Leak is a little bit more arcane than the rest, but it basically will allow you to get full information on any SVs that are created and not destroyed between two points in your program's execution. If you check this across a sub call, you'll know any new memory that that subroutine allocated.
You may also want to read the perldebguts section of the Perl manual.
I can't really help more because every codebase is going to wind up being different. Test::Valgrind will work great for some codebases and terribly on others. If you are going to try it, I recommend you use the latest version of Valgrind available and Perl >= 5.10, as Perl 5.8 and Valgrind historically didn't get along too well.
You might want to look at Memory::Usage and Devel::Size
To check the whole process or sub:
use Memory::Usage;
my $mu = Memory::Usage->new();
# Record amount of memory used by current process
$mu->record('starting work');
# Do the thing you want to measure
$object->something_memory_intensive();
# Record amount in use afterwards
$mu->record('after something_memory_intensive()');
# Spit out a report
$mu->dump();
Or to check specific variables:
use Devel::Size qw(size total_size);
my $size = size("A string");
my @foo = (1, 2, 3, 4, 5);
my $other_size = size(\@foo);
my $foo = {
a => [1, 2, 3],
b => {a => [1, 3, 4]}
};
my $total_size = total_size($foo);
The answer to the question is 'yes'. Dtrace can be used to analyze memory usage in a perl process.
This snippet of code:
https://github.com/astletron/perl-dtrace-malloc/blob/master/perl-malloc-total-bytes-by-sub.d
tracks how memory use increases between the call and return of every sub in a program. As an added bonus, dtrace seems to sort the output for you (at least on OS X). Cool.
Thanks to all that chimed in. I answered this one myself as the question is really specific to dtrace/perl.
You could write a simple debug module based on Devel::CallTrace that prints the sub entered as well as the current memory size of the current process. (Using /proc or whatever.)

Is there a way to see how much memory a variable uses?

I have been working with Perl for some months now.
For now, my scripts work, but they are far from perfect.
I would now like to optimize their memory usage, so I am looking for a way to break down the memory usage per variable/hash.
Is there a way to see how much memory a variable uses?
Devel::Size or Devel::Size::Report can be used to get memory usage for a variable/structure.
You might want to check perl guts illustrated to see what the numbers really mean.
Have a look at Devel::Size on CPAN.

Which tool should I use for finding out my memory allocation in Perl?

I've slurped in a big file using File::Slurp but given the size of the file I can see that I must have it in memory twice or perhaps it's getting inflated by being turned into 16 bit unicode. How can I best diagnose that sort of a problem in Perl?
The file I pulled in is 800 MB in size and my perl process that's analysing that data has roughly 1.6 GB allocated at runtime.
I realise that I may be wrong about my reason for the problem but I'm not sure the most efficient way to prove/disprove my theory.
Update:
I have eliminated dodgy character encoding from the list of suspects. It looks like I'm copying the variable at some point; I just can't figure out where.
Update 2:
I have now done some more investigation and discovered that it's actually just getting the data from File::Slurp that's causing the problem. I had a look through the documentation and discovered that I can get it to return a scalar_ref, i.e.
my $data = read_file($file, binmode => ':raw', scalar_ref => 1);
Then I don't get the inflation of my memory. Which makes some sense and is the most logical thing to do when getting the data in my situation.
The information about looking at what variables exist etc. has been generally helpful though, thanks.
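The same idea (hand back a reference to the buffer instead of returning the string by value) can be sketched without File::Slurp, using only core modules. slurp_ref is a hypothetical helper name, and the temporary file exists only to make the sketch self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read a whole file and return a reference to the buffer, so the caller
# receives a reference rather than a copy of the (possibly huge) string.
sub slurp_ref {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    local $/;               # undefined record separator: read everything at once
    my $data = <$fh>;
    close $fh;
    return \$data;
}

# Demo: write a small temporary file, then read it back by reference.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "line $_\n" for 1 .. 1000;
close $tmp_fh or die "Cannot close $tmp_path: $!";

my $data_ref = slurp_ref($tmp_path);
printf "read %d bytes\n", length $$data_ref;
```

The caller then works on $$data_ref (for example with regex matches or substr) and should avoid assigning it to another scalar, since that assignment would create the second copy again.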
Maybe Devel::DumpSizes and/or Devel::Size can help out? I think the former would be more useful in your case.
Devel::DumpSizes - Dump the name and size in bytes (in increasing order) of variables that are available at a given point in a script.
Devel::Size - Perl extension for finding the memory usage of Perl variables
Here are some generic resources on memory issues in Perl:
http://perl.active-venture.com/pod/perldebguts-perlmemory.html
Perl memory usage profiling and leak detection?
How can I find memory leaks in long-running Perl program?
As far as your own suggestion, the simplest way to disprove would be to write a simple Perl program that:
Creates a big (100M) file of plain text, probably by just outputting the same string in a loop into a file, or for binary files running dd command via system() call
Read the file in using standard Perl open()/@a=<>;
Measure memory consumption.
Then repeat #2-#3 for your 800M file.
That will tell you if the issue is File::Slurp, some weird logic in your program, or some specific content in the file (e.g. non-ASCII, although I'd be surprised if that ends up being the reason).