Perl memory allocation - perl

The following simple C code allocates about 1.6% of my computer's memory and completes in less than 2 seconds:
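#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int i = 0;
    char *array = malloc(64000000);
    if (array == NULL) return 1; /* allocation failed */
    for (i = 0; i < 64000000; i++) {
        array[i] = i % 256;
    }
    getchar(); /* pause so memory usage can be inspected */
    return 0;
}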
How can I do a similar thing in Perl?
The following Perl code consumes about 70% of my computer's memory (at which point I kill it):
my @array;
for(my $i=0;$i<64000000;$i++)
{
    $array[$i]=1;
}
getc();
exit;
How do I malloc in Perl ?

You allocated an array of 64,000,000 SV* plus 64,000,000 scalars. The array alone is already 8 times the size of what you allocated in your C program. That's not counting any of the 64,000,000 scalars or the overhead of allocating 64,000,000 memory blocks.
To allocate 64,000,000 bytes, you can use the following:
my $s = "\0" x 64_000_000;
However, that places two copies in memory.[1] The following doesn't.
use Fcntl qw( SEEK_SET );
my $s;
{
open my $fh, '>', \$s;
seek($fh, 64_000_000-1, SEEK_SET);
print $fh "\0";
}
pack+substr can be used to store a number, and substr+unpack can be used to extract a number.
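As a minimal sketch of the pack+substr idea, the following stores and fetches single unsigned bytes in a preallocated string. The helper names store_byte and fetch_byte and the small buffer size are assumptions made for illustration; they are not part of the original answer.

```perl
use strict;
use warnings;

my $size = 1_000;                 # small buffer for illustration only
my $buf  = "\0" x $size;          # one byte per element, like the C char array

# Hypothetical helpers: store/fetch one unsigned byte at a given offset.
sub store_byte {
    my ($i, $value) = @_;
    substr($buf, $i, 1, pack('C', $value));   # 4-arg substr replaces in place
}

sub fetch_byte {
    my ($i) = @_;
    return unpack('C', substr($buf, $i, 1));
}

store_byte($_, $_ % 256) for 0 .. $size - 1;
print fetch_byte(300), "\n";      # 300 % 256 = 44
```

The buffer stays a single scalar, so the per-element overhead of a Perl array is avoided at the cost of a function call per access.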
Finally, rather than dealing with packed numbers, you could use PDL.
[1] Technically, it only places one copy into memory, and it does so at compile-time. Thanks to the copy-on-write (COW) mechanism, the assignment simply causes $s to share the buffer of the constant. But, I presume you intend to modify the buffer in $s, which would require making a writable copy of its buffer.

You are seeing the difference in variable sizes between languages.
See http://perlmaven.com/how-much-memory-do-perl-variables-use
This also has a good explanation of memory usage:
http://search.cpan.org/~nwclark/Devel-Size-0.79/lib/Devel/Size.pm
In short, your perl array will need at least 1536 MB of space to store that array.

Related

How can I remove perl object from memory

I'm having some issues with the memory usage of a perl script I wrote (code below). The script initiates some variables, fills them with data, and then undefines them again. However, the memory usage of the script after deleting everything is still way too high for a script that contains no data.
According to ps the script uses 1.027 Mb memory (RSS) during the first 39 seconds (so everything before the foreach loop). Then, memory usage starts rising and ends up fluctuating between 204.391 Mb and 172.410 Mb. However, even in the last 10 seconds of the script (where all data is supposed to be removed), memory usage never goes below 172.410 Mb.
Is there a way to permanently delete a variable and all data in it in perl (in order to reduce the memory usage of the script)? If so, how should I do it?
use strict;
use warnings;
sleep(30);
my $ELEMENTS = 1_000_000;
my $MAX_ELEMENT = 1_000_000_000;
my $if_condition = 1;
sleep(5);
my %hash = (1 => {}, 2 => {}, 3 => {}, 4 => {});
foreach my $key (keys %hash){
if( $if_condition ){
my $arrref1 = [ (rand($MAX_ELEMENT)) x $ELEMENTS ];
my $arrref2 = [ (rand($MAX_ELEMENT)) x $ELEMENTS ];
my $arrref3 = [ (rand($MAX_ELEMENT)) x $ELEMENTS ];
sleep(2);
if(!defined($hash{$key}->{'amplification'})){
$hash{$key}->{'amplification'} = [];
}
push(@{$hash{$key}->{'amplification'}},@{$arrref1});
undef($arrref1);
push(@{$hash{$key}->{'amplification'}},@{$arrref2});
undef($arrref2);
push(@{$hash{$key}->{'amplification'}},@{$arrref3});
undef($arrref3);
sleep(3);
delete($hash{$key});
sleep(5);
}
}
sleep(10);
Perl FAQ 3 - How can I free an array or hash so my program shrinks?
You usually can't. Memory allocated to lexicals (i.e. my() variables)
cannot be reclaimed or reused even if they go out of scope. It is
reserved in case the variables come back into scope. Memory allocated
to global variables can be reused (within your program) by using
undef() and/or delete().
On most operating systems, memory allocated
to a program can never be returned to the system. That's why
long-running programs sometimes re-exec themselves. Some operating
systems (notably, systems that use mmap(2) for allocating large chunks
of memory) can reclaim memory that is no longer used, but on such
systems, perl must be configured and compiled to use the OS's malloc,
not perl's.
In general, memory allocation and de-allocation isn't
something you can or should be worrying about much in Perl.
See also
"How can I make my Perl program take less memory?"
In general, perl won't release memory back to the system. It keeps its own pool of memory in case it is required for another purpose. This happens a lot because lexical data is often used in a loop; for instance, each of your $arrref1 variables refers to a million-element array. If the memory for those arrays were returned to the system and reallocated every time around the loop, there would be an enormous speed penalty.
As I wrote, 170MB isn't a lot, but you can reduce the footprint by dropping your big temporary arrays and adding the list directly to the hash element. As it stands, you are unnecessarily keeping two copies of each array.
It would look like this:
use strict;
use warnings 'all';
sleep 30;
use constant ELEMENTS => 1_000_000;
use constant MAX_ELEMENT => 1_000_000_000;
my $if_condition = 1;
sleep 5;
my %hash = ( 1 => {}, 2 => {}, 3 => {}, 4 => {} );
foreach my $key ( keys %hash ) {
next unless $if_condition;
sleep 2;
push @{ $hash{$key}{amplification} }, (rand MAX_ELEMENT) x ELEMENTS;
push @{ $hash{$key}{amplification} }, (rand MAX_ELEMENT) x ELEMENTS;
push @{ $hash{$key}{amplification} }, (rand MAX_ELEMENT) x ELEMENTS;
sleep 3;
delete $hash{$key};
sleep 5;
}
sleep 10;

Read binary file bit by bit

Is there a way that I can read a binary file bit by bit, without saving it as an array?
I have a very large binary file that I need to read bit by bit. Saving it as an array takes a lot of time, so I want to avoid that. I don't care what happens to the file content.
$size = stat($args{file});
my $vector;
open BIN, "<$args{file}";
read(BIN, $vector, $size->[7], 0);
close BIN;
# The code below is the part that takes a lot of time.
my @unpacked = split //, (unpack "B*", $vector);
return @unpacked;
Read in the file 1 byte at a time using the special $/ variable, and then use bitwise operators to check each bit in the byte. Should end up being something like the following:
$/ = \1; # read 1 byte at a time
while(<>) {
my $ord = ord($_);
# for each bit in the byte
for(1 .. 8) {
if($ord & 1) {
# do 1 stuff
}
else {
# do 0 stuff
}
# move onto the next bit
$ord >>= 1;
}
}
Use the builtin vec function to manipulate Perl scalars as bit vectors.
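A minimal sketch of the vec approach follows. It walks every bit of a string without building an array of bits. The two-byte sample string stands in for data read from the file and is an assumption for illustration. Note that vec numbers bits from the low-order end of each byte (the same order as unpack "b*"), not the high-order end used by "B*".

```perl
use strict;
use warnings;

# Hypothetical sample standing in for a chunk read from the binary file.
my $data  = "\x05\x80";
my $nbits = 8 * length $data;

my $set_bits = 0;
for my $i (0 .. $nbits - 1) {
    # vec($data, $i, 1) returns bit $i (0 or 1); no intermediate array is built.
    if (vec($data, $i, 1)) {
        $set_bits++;          # "do 1 stuff"
    }
    else {
        # "do 0 stuff"
    }
}
print "$set_bits bits set out of $nbits\n";   # 3 bits set out of 16
```

For a very large file, the same loop could be applied to each chunk returned by read rather than to the whole file at once.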

In Perl, how can I release memory to the operating system?

I am having some problems with memory in Perl. When I fill up a big hash, I can not get the memory to be released back to the OS. When I do the same with a scalar and use undef, it will give the memory back to the OS.
Here is a test program I wrote.
#!/usr/bin/perl
###### Memory test
######
## Use Commands
use Number::Bytes::Human qw(format_bytes);
use Data::Dumper;
use Devel::Size qw(size total_size);
## Create Varable
my $share_var;
my %share_hash;
my $type_hash = 1;
my $type_scalar = 1;
## Start Main Loop
while (true) {
&Memory_Check();
print "Hit Enter (add to memory): "; <>;
&Up_Mem(100_000);
&Memory_Check();
print "Hit Enter (Set Varable to nothing): "; <>;
$share_var = "";
$share_hash = ();
&Memory_Check();
print "Hit Enter (clean data): "; <>;
&Clean_Data();
&Memory_Check();
print "Hit Enter (start over): "; <>;
}
exit;
#### Up Memory
sub Up_Mem {
my $total_loops = shift;
my $n = 1;
print "Adding data to shared varable $total_loops times\n";
until ($n > $total_loops) {
if ($type_hash) {
$share_hash{$n} = 'X' x 1111;
}
if ($type_scalar) {
$share_var .= 'X' x 1111;
}
$n += 1;
}
print "Done Adding Data\n";
}
#### Clean up Data
sub Clean_Data {
print "Clean Up Data\n";
if ($type_hash) {
## Method to fix hash (Trying Everything i can think of!
my $n = 1;
my $total_loops = 100_000;
until ($n > $total_loops) {
undef $share_hash{$n};
$n += 1;
}
%share_hash = ();
$share_hash = ();
undef $share_hash;
undef %share_hash;
}
if ($type_scalar) {
undef $share_var;
}
}
#### Check Memory Usage
sub Memory_Check {
## Get current memory from shell
my @mem = `ps aux | grep \"$$\"`;
my($results) = grep !/grep/, @mem;
## Parse Data from Shell
chomp $results;
$results =~ s/^\w*\s*\d*\s*\d*\.\d*\s*\d*\.\d*\s*//g; $results =~ s/pts.*//g;
my ($vsz,$rss) = split(/\s+/,$results);
## Format Numbers to Human Readable
my $h = Number::Bytes::Human->new();
my $virt = $h->format($vsz);
my $h = Number::Bytes::Human->new();
my $res = $h->format($rss);
print "Current Memory Usage: Virt: $virt RES: $res\n";
if ($type_hash) {
my $total_size = total_size(\%share_hash);
my @arr_c = keys %share_hash;
print "Length of Hash: " . ($#arr_c + 1) . " Hash Mem Total Size: $total_size\n";
}
if ($type_scalar) {
my $total_size = total_size($share_var);
print "Length of Scalar: " . length($share_var) . " Scalar Mem Total Size: $total_size\n";
}
}
OUTPUT:
./Memory_Undef_Simple.cgi
Current Memory Usage: Virt: 6.9K RES: 2.7K
Length of Hash: 0 Hash Mem Total Size: 92
Length of Scalar: 0 Scalar Mem Total Size: 12
Hit Enter (add to memory):
Adding data to shared varable 100000 times
Done Adding Data
Current Memory Usage: Virt: 228K RES: 224K
Length of Hash: 100000 Hash Mem Total Size: 116813243
Length of Scalar: 111100000 Scalar Mem Total Size: 111100028
Hit Enter (Set Varable to nothing):
Current Memory Usage: Virt: 228K RES: 224K
Length of Hash: 100000 Hash Mem Total Size: 116813243
Length of Scalar: 0 Scalar Mem Total Size: 111100028
Hit Enter (clean data):
Clean Up Data
Current Memory Usage: Virt: 139K RES: 135K
Length of Hash: 0 Hash Mem Total Size: 92
Length of Scalar: 0 Scalar Mem Total Size: 24
Hit Enter (start over):
So as you can see the memory goes down, but it only goes down the size of the scalar. Any ideas how to free the memory of the hash?
Also Devel::Size shows the hash is only taking up 92 bytes even though the program still is using 139K.
Generally, yeah, that's how memory management on UNIX works. If you are using Linux with a recent glibc, and are using its malloc, freed memory can be returned to the OS. I am not sure Perl does this, though.
If you want to work with large datasets, don't load the whole thing into memory, use something like BerkeleyDB:
https://metacpan.org/pod/BerkeleyDB
Example code, stolen verbatim:
use strict ;
use BerkeleyDB ;
my $filename = "fruit" ;
unlink $filename ;
tie my %h, "BerkeleyDB::Hash",
-Filename => $filename,
-Flags => DB_CREATE
or die "Cannot open file $filename: $! $BerkeleyDB::Error\n" ;
# Add a few key/value pairs to the file
$h{apple} = "red" ;
$h{orange} = "orange" ;
$h{banana} = "yellow" ;
$h{tomato} = "red" ;
# Check for existence of a key
print "Banana Exists\n\n" if $h{banana} ;
# Delete a key/value pair.
delete $h{apple} ;
# print the contents of the file
while (my ($k, $v) = each %h)
{ print "$k -> $v\n" }
untie %h ;
(OK, not verbatim. Their use of use vars is ... legacy ...)
You can store gigabytes of data in a hash this way, and you will only use a tiny bit of memory. (Basically, whatever BDB's pager decides to keep in memory; this is controllable.)
In general, you cannot expect perl to release memory to the OS.
See the FAQ: How can I free an array or hash so my program shrinks?.
You usually can't. Memory allocated to lexicals (i.e. my() variables) cannot be reclaimed or reused even if they go out of scope. It is reserved in case the variables come back into scope. Memory allocated to global variables can be reused (within your program) by using undef() and/or delete().
On most operating systems, memory allocated to a program can never be returned to the system. That's why long-running programs sometimes re-exec themselves. Some operating systems (notably, systems that use mmap(2) for allocating large chunks of memory) can reclaim memory that is no longer used, but on such systems, perl must be configured and compiled to use the OS's malloc, not perl's.
It is always a good idea to read the FAQ list, also installed on your computer, before wasting your time.
For example, How can I make my Perl program take less memory? is probably relevant to your issue.
Why do you want Perl to release the memory to the OS? You could just use a larger swap.
If you really must, do your work in a forked process, then exit.
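The forked-process suggestion could be sketched roughly as below, assuming a Unix-like system where fork and pipe are available. The child builds the large hash, sends back only a small result through a pipe, and exits, at which point the OS reclaims all of its memory. The hash contents and the variable names are made up for illustration.

```perl
use strict;
use warnings;

pipe(my $reader, my $writer) or die "pipe failed: $!";

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: do the memory-hungry work here.
    close $reader;
    my %big = map { $_ => 'X' x 100 } 1 .. 100_000;
    my $count = keys %big;
    print {$writer} "$count\n";   # send back only the small result
    close $writer;
    exit 0;                       # all of the child's memory goes back to the OS
}

# Parent: never holds the big hash, so its footprint stays small.
close $writer;
my $result = <$reader>;
close $reader;
waitpid($pid, 0);
chomp $result;
print "child processed $result entries\n";
```

Only the summary crosses the pipe; anything larger would need to be serialized, for example to a file.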
Try recompiling perl with the option -Uusemymalloc to use the system malloc and free. You might see some different results

How do I get the size of a file in megabytes using Perl?

I want to get the size of a file on disk in megabytes. Using the -s operator gives me the size in bytes, but I'm going to assume that then dividing this by a magic number is a bad idea:
my $size_in_mb = (-s $fh) / (1024 * 1024);
Should I just use a read-only variable to define 1024 or is there a programmatic way to obtain the amount of bytes in a kilobyte?
EDIT: Updated the incorrect calculation.
If you'd like to avoid magic numbers, try the CPAN module Number::Bytes::Human.
use Number::Bytes::Human qw(format_bytes);
my $size = format_bytes(-s $file); # 4.5M
This is an old question that has already been answered correctly, but in case your program is restricted to core modules and you cannot use Number::Bytes::Human, here are several other options I have collected over time. I have kept them because each one takes a different Perl approach and makes a nice example of TIMTOWTDI:
example 1: uses state to avoid reinitializing the variable on each call (state must be enabled with use feature 'state', use v5.10 or later, or perl -E)
http://kba49.wordpress.com/2013/02/17/format-file-sizes-human-readable-in-perl/
sub formatSize {
my $size = shift;
my $exp = 0;
state $units = [qw(B KB MB GB TB PB)];
for (@$units) {
last if $size < 1024;
$size /= 1024;
$exp++;
}
return wantarray ? ($size, $units->[$exp]) : sprintf("%.2f %s", $size, $units->[$exp]);
}
example 2: using sort map
sub scaledbytes {
# http://www.perlmonks.org/?node_id=378580
(sort { length $a <=> length $b
} map { sprintf '%.3g%s', $_[0]/1024**$_->[1], $_->[0]
}[" bytes"=>0]
,[KB=>1]
,[MB=>2]
,[GB=>3]
,[TB=>4]
,[PB=>5]
,[EB=>6]
)[0]
}
example 3: Take advantage of the fact that 1 Gb = 1024 Mb, 1 Mb = 1024 Kb and 1024 = 2 ** 10:
# http://www.perlmonks.org/?node_id=378544
my $kb = 1024 * 1024; # set to 1 Gb
my $mb = $kb >> 10;
my $gb = $mb >> 10;
print "$kb kb = $mb mb = $gb gb\n";
__END__
1048576 kb = 1024 mb = 1 gb
example 4: use of ++$n and ... until .. to obtain an index for the array
# http://www.perlmonks.org/?node_id=378542
#! perl -slw
use strict;
sub scaleIt {
my( $size, $n ) =( shift, 0 );
++$n and $size /= 1024 until $size < 1024;
return sprintf "%.2f %s",
$size, ( qw[ bytes KB MB GB ] )[ $n ];
}
my $size = -s $ARGV[ 0 ];
print "$ARGV[ 0 ]: ", scaleIt $size;
Even if you can not use Number::Bytes::Human, take a look at the source code to see all the things that you need to be aware of.
You could of course create a function for calculating this. That is a better solution than creating constants in this instance.
sub size_in_mb {
my $size_in_bytes = shift;
return $size_in_bytes / (1024 * 1024);
}
No need for constants. Changing the 1024 to some kind of variable/constant won't make this code more readable.
Well, there's not 1024 bytes in a meg, there's 1024 bytes in a K, and 1024 K in a meg...
That said, 1024 is a safe "magic" number that will never change in any system you can expect your program to work in.
I would read this into a variable rather than use a magic number. Even if magic numbers are not going to change, like the number of bytes in a megabyte, using a well named constant is a good practice because it makes your code more readable. It makes it immediately apparent to everybody else what your intention is.
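A minimal sketch of the named-constant approach, using the core constant pragma. The constant name BYTES_PER_MB and the function name are illustrative choices, not something from the question.

```perl
use strict;
use warnings;
use constant BYTES_PER_MB => 1024 * 1024;   # illustrative name

# Returns the size of a file in megabytes, or undef if it cannot be read.
sub file_size_in_mb {
    my ($path) = @_;
    my $bytes = -s $path;                   # -s yields the size in bytes
    return defined $bytes ? $bytes / BYTES_PER_MB : undef;
}

my $mb = file_size_in_mb($0);               # measure this script itself
printf "%s is %.4f MB\n", $0, $mb if defined $mb;
```

Whether this reads better than a literal 1024 * 1024 is a matter of taste, as the other answers point out.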
1) You don't want 1024. That gives you kilobytes. You want 1024*1024, or 1048576.
2) Why would dividing by a magic number be a bad idea? It's not like the number of bytes in a megabyte will ever change. Don't overthink things too much.
Don't get me wrong, but: I think that declaring 1024 as a Magic Variable goes a bit too far, that's a bit like "$ONE = 1; $TWO = 2;" etc.
A Kilobyte has been falsely declared as 1024 Bytes for more than 20 years, and I seriously doubt that the operating system manufacturers will ever correct that bug and change it to 1000.
What could make sense though is to declare non-obvious stuff, like "$megabyte = 1024 * 1024" since that is more readable than 1048576.
Since the -s operator returns the file size in bytes you should probably be doing something like
my $size_in_mb = (-s $fh) / (1024 * 1024);
and use int() if you need a round figure. It's not like the dimensions of KB or MB are going to change anytime in the near future :)

Is « my » overwriting memory when called in a loop?

A simple but relevant question: Is « my » overwriting memory when called in a loop?
For instance, is it "better" (in terms of memory leaks, performance, speed) to declare it outside of the loop:
my $variable;
for my $number ( @array ) {
$variable = $number * 5;
_sub($variable);
}
Or should I declare it inside the loop:
for my $number ( @array ) {
my $variable = $number * 5;
_sub($variable);
}
(I just made that code up, it's not meant to do anything nor be used - as it is - in real life)
Will Perl allocate a new space in memory for each and every one of the for iterations ?
Aamir already told you what will happen.
I recommend sticking to the second version unless there is some reason to use the first. You don't want to have to care about the previous state of $variable; it's simplest to start each iteration with a fresh variable. And if $variable holds a reference, you might actually shoot yourself in the foot if you push it onto an array.
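One way the reference pitfall can show up is sketched below. It assumes the loop stores a reference to a variable declared outside the loop; the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Variable declared once, outside the loop: every stored reference
# points at the same array, so later iterations overwrite earlier data.
my @rows_shared;
my @shared_pair;
for my $n (1 .. 3) {
    @shared_pair = ($n, $n * 5);
    push @rows_shared, \@shared_pair;
}

# Variable declared inside the loop: each iteration gets a fresh array.
my @rows_fresh;
for my $n (1 .. 3) {
    my @pair = ($n, $n * 5);
    push @rows_fresh, \@pair;
}

my $shared = join ',', map { $_->[1] } @rows_shared;
my $fresh  = join ',', map { $_->[1] } @rows_fresh;
print "$shared\n";   # 15,15,15
print "$fresh\n";    # 5,10,15
```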
Edit:
Yes, there is a performance hit. Using a recycled variable will be faster. However, it is hard to tell how much faster it will be, as this will depend on your specific situation. No matter how much faster it is though, always remember: Premature optimization is the root of all evil.
From your examples above:
In the first example, new space for $variable is not allocated every time; the previous one is reused.
In the second example, new space is allocated for every iteration of the loop and de-allocated again in the same iteration.
These are things you aren't supposed to think about with a dynamic language such as Perl. Even though you might get an answer about what the current implementation does, that's not a feature and it isn't something you should rely on.
Define your variables in the shortest scope possible.
However, merely out of curiosity, you can use the Devel::Peek module to cheat a bit and see the internal (not physical) memory address:
use Devel::Peek;
foreach ( 0 .. 5 ) {
my $var = $_;
Dump( $var );
}
In this small case, the address ends up being the same. That's no guarantee that it will always be the same for different situations, or even the same program:
SV = IV(0x9ca968) at 0x9ca96c
REFCNT = 1
FLAGS = (PADMY,IOK,pIOK)
IV = 0
SV = IV(0x9ca968) at 0x9ca96c
REFCNT = 1
FLAGS = (PADMY,IOK,pIOK)
IV = 1
SV = IV(0x9ca968) at 0x9ca96c
REFCNT = 1
FLAGS = (PADMY,IOK,pIOK)
IV = 2
SV = IV(0x9ca968) at 0x9ca96c
REFCNT = 1
FLAGS = (PADMY,IOK,pIOK)
IV = 3
SV = IV(0x9ca968) at 0x9ca96c
REFCNT = 1
FLAGS = (PADMY,IOK,pIOK)
IV = 4
SV = IV(0x9ca968) at 0x9ca96c
REFCNT = 1
FLAGS = (PADMY,IOK,pIOK)
IV = 5
You can benchmark the difference between the two uses using the Benchmark module which is made for these types of micro-benchmarking comparisons:
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw( cmpthese );
sub outside {
my $x;
for my $y ( 1 .. 1_000_000 ) {
$x = $y;
}
return;
}
sub inside {
for my $y ( 1 .. 1_000_000 ) {
my $x = $y;
}
return;
}
cmpthese -1 => {
inside => \&inside,
outside => \&outside,
};
Results on my Windows XP SP3 laptop:
          Rate  inside outside
inside  4.44/s      --    -25%
outside 5.91/s     33%      --
Predictably, the difference is less pronounced when the body of the loop is executed only once.
That said, I would not declare $x outside the loop unless I needed the value assigned to $x inside the loop to be available after the loop.
You are totally safe using "my" inside a for loop or any other block. In general you don't have to worry about memory leaks in perl, but you would be equally safe in this circumstance with a non-garbage-collecting language like C++. A normal variable is deallocated at the end of the block in which it has scope.