I'm having some issues with the memory usage of a perl script I wrote (code below). The script initializes some variables, fills them with data, and then undefines them again. However, the memory usage of the script after deleting everything is still way too high to contain no data.
According to ps the script uses 1.027 MB memory (RSS) during the first 39 seconds (so everything before the foreach loop). Then, memory usage starts rising and ends up fluctuating between 204.391 MB and 172.410 MB. However, even in the last 10 seconds of the script (where all data is supposed to be removed), memory usage never goes below 172.410 MB.
Is there a way to permanently delete a variable and all data in it in perl (in order to reduce the memory usage of the script)? If so, how should I do it?
use strict;
use warnings;

sleep(30);

my $ELEMENTS = 1_000_000;
my $MAX_ELEMENT = 1_000_000_000;
my $if_condition = 1;

sleep(5);

my %hash = (1 => {}, 2 => {}, 3 => {}, 4 => {});
foreach my $key (keys %hash){
    if( $if_condition ){
        my $arrref1 = [ (rand($MAX_ELEMENT)) x $ELEMENTS ];
        my $arrref2 = [ (rand($MAX_ELEMENT)) x $ELEMENTS ];
        my $arrref3 = [ (rand($MAX_ELEMENT)) x $ELEMENTS ];

        sleep(2);

        if(!defined($hash{$key}->{'amplification'})){
            $hash{$key}->{'amplification'} = [];
        }
        push(@{$hash{$key}->{'amplification'}}, @{$arrref1});
        undef($arrref1);
        push(@{$hash{$key}->{'amplification'}}, @{$arrref2});
        undef($arrref2);
        push(@{$hash{$key}->{'amplification'}}, @{$arrref3});
        undef($arrref3);

        sleep(3);

        delete($hash{$key});

        sleep(5);
    }
}

sleep(10);
Perl FAQ 3 - How can I free an array or hash so my program shrinks?
You usually can't. Memory allocated to lexicals (i.e. my() variables)
cannot be reclaimed or reused even if they go out of scope. It is
reserved in case the variables come back into scope. Memory allocated
to global variables can be reused (within your program) by using
undef() and/or delete().
On most operating systems, memory allocated
to a program can never be returned to the system. That's why
long-running programs sometimes re-exec themselves. Some operating
systems (notably, systems that use mmap(2) for allocating large chunks
of memory) can reclaim memory that is no longer used, but on such
systems, perl must be configured and compiled to use the OS's malloc,
not perl's.
In general, memory allocation and de-allocation isn't
something you can or should be worrying about much in Perl.
See also
"How can I make my Perl program take less memory?"
In general, perl won't release memory back to the system. It keeps its own pool of memory in case it is required for another purpose. This happens a lot because lexical data is often reused in a loop; for instance, your $arrref1 variables refer to a million-element array. If the memory for those arrays was returned to the system and reallocated every time around the loop, there would be an enormous speed penalty.
As I wrote, 170MB isn't a lot, but you can reduce the footprint by dropping your big temporary arrays and adding the list directly to the hash element. As it stands, you are unnecessarily keeping two copies of each array.
It would look like this:
use strict;
use warnings 'all';

sleep 30;

use constant ELEMENTS    => 1_000_000;
use constant MAX_ELEMENT => 1_000_000_000;

my $if_condition = 1;

sleep 5;

my %hash = ( 1 => {}, 2 => {}, 3 => {}, 4 => {} );

foreach my $key ( keys %hash ) {

    next unless $if_condition;

    sleep 2;

    push @{ $hash{$key}{amplification} }, (rand MAX_ELEMENT) x ELEMENTS;
    push @{ $hash{$key}{amplification} }, (rand MAX_ELEMENT) x ELEMENTS;
    push @{ $hash{$key}{amplification} }, (rand MAX_ELEMENT) x ELEMENTS;

    sleep 3;

    delete $hash{$key};

    sleep 5;
}

sleep 10;
Related
I have a shared hash using the following:
my $glue = 'data';
my %options = (
    create    => 1,
    exclusive => 0,
    mode      => 0644,
    destroy   => 0,
);
tie %hash1, 'IPC::Shareable', $glue, { %options };
The %hash1, declared as above, is in a single perl file, but that file is used by multiple applications; each application modifies its own key of the hash:
Application1 --> $hash1{app1}="alpha";
Application2 --> $hash1{app2}="betta";
...
Given that the applications may or may not run simultaneously, will there be any data loss if application1 and application2 try to modify the hash at the same time?
You need to use a locking mechanism. (One is provided by the module.) Otherwise, changing any value of the hash can cause the loss of any other value changed at the same time. The following program demonstrates this quite easily:
use strict;
use warnings;
use IPC::Shareable qw( );
my $glue = 'data';
my %options = (
    create    => 1,
    exclusive => 0,
    mode      => 0644,
    destroy   => 0,
);

my ($key) = @ARGV
    or die("usage\n");

tie(my %h, 'IPC::Shareable', $glue, \%options);

my $c;
while (1) {
    $h{$key} = ++$c;
    my $got = $h{$key};
    die("$key was overwritten (got $got; expected $c)\n")
        if $got != $c;
}
Run it as follows in one console:
perl a.pl foo
Then run it as follows in another console:
perl a.pl bar
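One way to avoid the race is to wrap each read-modify-write in the lock the module provides. A minimal sketch, assuming the same $glue and %options as in the question (the app1 key and the increment are just for illustration; shlock/shunlock are the locking methods IPC::Shareable exposes on the tied object):

use strict;
use warnings;
use IPC::Shareable;

my $glue = 'data';
my %options = (
    create    => 1,
    exclusive => 0,
    mode      => 0644,
    destroy   => 0,
);

tie(my %h, 'IPC::Shareable', $glue, \%options);

# Hold the lock across the whole read-modify-write so no other
# process can interleave between the read and the write.
(tied %h)->shlock;
$h{app1} = ($h{app1} // 0) + 1;
(tied %h)->shunlock;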
The key (no pun intended) to making it work with IPC::Shareable is to provide a key as one of the initial parameters:
use IPC::SysV qw( ftok );
tie( %hash, 'IPC::Shareable', $glue, { key => ftok(__FILE__,1234567) });
It is better to use the ftok function from IPC::SysV, because if you provide a string to IPC::Shareable, it will turn it into a numeric id using only the first 4 characters. Thus if you have two processes, one using test_one as a key and the other test_two, both processes will use the same shared memory segment, because in both cases the resulting key produced by IPC::Shareable will be 1953719668. Worse, if the first process creates a shared memory segment of, say, 2048 bytes and you want the second to create one of 524288 bytes, you will actually end up accessing the first shared memory segment, only 2048 bytes in size, and its data, with the risk of modifying the wrong data.
If you use ftok, even if you spawn 10,000 processes, they will always access the shared data without loss, because IPC::Shareable uses semaphores to ensure each process accesses the data one after the other.
The following simple C code allocates about 1.6% of my computer's memory and completes in less than 2 seconds:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int i = 0;
    char *array = malloc(64000000);
    for (i = 0; i < 64000000; i++) {
        array[i] = i % 256;
    }
    getchar();
    return 0;
}
How can I do a similar thing in Perl?
The following Perl code consumes about 70% of my computer's memory (at which point I kill it):
my @array;
for (my $i = 0; $i < 64000000; $i++)
{
    $array[$i] = 1;
}
getc();
exit;
How do I malloc in Perl ?
You allocated an array of 64,000,000 SV* plus 64,000,000 scalars. The array alone is already 8 times the size of what you allocated in your C program. That's not counting any of the 64,000,000 scalars or the overhead of allocating 64,000,000 memory blocks.
To allocate 64,000,000 bytes, you can use the following:
my $s = "\0" x 64_000_000;
However, that places two copies in memory.[1] The following doesn't.
use Fcntl qw( SEEK_SET );

my $s;
{
    open my $fh, '>', \$s;
    seek($fh, 64_000_000-1, SEEK_SET);
    print $fh "\0";
}
pack+substr can be used to store a number, and substr+unpack can be used to extract a number.
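For example, here is a sketch of treating the buffer as an array of 32-bit unsigned integers (the 'L' pack format; the slot number and value are just for illustration):

use strict;
use warnings;

# Pre-allocate the buffer of NUL bytes.
my $s = "\0" x 64_000_000;

my $i = 5;                                   # arbitrary slot, for illustration
substr($s, $i * 4, 4) = pack('L', 42);       # store 42 at slot 5

my $n = unpack('L', substr($s, $i * 4, 4));  # read it back
print "$n\n";                                # prints 42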
Finally, rather than dealing with packed numbers, you could use PDL.
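If PDL is available, a minimal sketch might look like this; zeroes(byte, ...) allocates one compact buffer of unsigned bytes rather than millions of scalars:

use strict;
use warnings;
use PDL;

# One buffer of 64,000,000 unsigned bytes (~64 MB plus small overhead),
# instead of 64,000,000 individual Perl scalars.
my $array = zeroes(byte, 64_000_000);

# Set and read elements without creating a scalar per slot.
$array->set($_, $_ % 256) for 0 .. 9;   # just a few, to keep the loop cheap
print $array->at(5), "\n";              # prints 5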
[1] Technically, it only places one copy into memory, and it does so at compile-time. Thanks to the copy-on-write (COW) mechanism, the assignment simply causes $s to share the buffer of the constant. But I presume you intend to modify the buffer in $s, which would require making a writable copy of its buffer.
You are seeing the difference in variable sizes between languages.
See http://perlmaven.com/how-much-memory-do-perl-variables-use
This also has a good explanation of memory usage:
http://search.cpan.org/~nwclark/Devel-Size-0.79/lib/Devel/Size.pm
In short, your perl array will need at least 1536 MB of space to store that data.
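Devel::Size lets you check this on your own perl; a small sketch (the sample size is arbitrary, and exact numbers vary with perl version and platform):

use strict;
use warnings;
use Devel::Size qw(total_size);

# Measure a small array and extrapolate.
my @sample = (1) x 10_000;
my $bytes  = total_size(\@sample);
printf "10,000 elements: %d bytes (%.1f bytes/element)\n",
    $bytes, $bytes / 10_000;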
So, basically I have a very large array that I need to read data from. I want to be able to do this in parallel; however, when I tried, I failed miserably. For the sake of simplicity, let's say I have an array with 100 elements in it. My idea was to partition the array into 10 equals parts and try to read them in parallel (10 is arbitrary, but I don't know how many processes I could run at once and 10 seemed low enough). I need to return a computation (new data structure) based off of my readings from each partition, but I am NOT modifying anything in the original array.
Instead of trying the above exactly, I tried something simpler, but I did it incorrectly, because it didn't work in any capacity. So then I tried to simply use child processes to push to an array. The code below uses Time::HiRes to see how much faster I can get this to run using forking as opposed to not, but I'm not at that point yet (I'm going to test that when I have closer to a few million entries in my array):
use strict;
use warnings;
use Time::HiRes;

print "Starting main program\n";

my %child;
my @array = ();
my $counter = 0;

my $start = Time::HiRes::time();

for (my $count = 1; $count <= 10; $count++)
{
    my $pid = fork();
    if ($pid)
    {
        $child{$pid}++;
    }
    elsif ($pid == 0)
    {
        addToArray(\$counter, \@array);
        exit 0;
    }
    else
    {
        die "couldnt fork: $!\n";
    }
}

while (keys %child)
{
    my $pid = waitpid(-1, 0);
    delete $child{$pid};
}

my $stop = Time::HiRes::time();
my $duration = $stop - $start;

print "Time spent: $duration\n";
print "Size of array: " . scalar(@array) . "\n";
print "End of main program\n";

sub addToArray
{
    my $start = shift;
    my $count = ${$start};
    ${$start} += 10;
    my $array = shift;
    for (my $i = $count; $i < $count + 10; $i++)
    {
        push @{$array}, $i;
    }
    print scalar(@{$array}) . "\n";
}
NB: I used push in lieu of ${$array}[$i]=$i, because I realized that my $counter wasn't actually updating, so that would never work with this code.
I assume this doesn't work because the children are all copies of the original program, so I'm never actually adding anything to the array in my "original program". On that note, I'm very stuck. The actual problem I'm trying to solve is how to partition my array (with data in it), read the partitions in parallel, and return a computation based on those readings (NOTE: I'm not going to modify the original array), but I'm never going to be able to do that if I can't figure out how to actually get my $counter to update. I'd also like to know how to get the code above to do what I want it to do, but that's a secondary goal.
Once I can get my counter to update correctly, is there any chance that another process would start before it updates and I wouldn't actually be reading in the entire array? If so, how do I account for this?
Please, any help would be much appreciated. I'm very frustrated/stuck. I hope there is an easy fix. Thanks in advance.
EDIT: I attempted to use Parallel::ForkManager, but to no avail:
#!/usr/local/roadm/bin/perl
use strict;
use warnings;
use Time::HiRes;
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(10);

for (my $count = 1; $count <= 10; $count++)
{
    my $pid = $pm->start and next;
    sub1(\$counter, \@array);
    $pm->finish; # Terminates the child process
}
$pm->wait_all_children;
I didn't include the other extraneous stuff, see above for missing code/sub... Again, help would be much appreciated. I'm very new to this and kind of need someone to hold my hand. I also tried to do something with run_on_start and run_on_finish, but they didn't work either.
Your code has two issues: your child processes share no data, and you would have a race condition if forked processes did share data. The solution is to use threads. Any possibility of race conditions can be eliminated by partitioning the data in the parent thread and, of course, by not using shared data.
Threads
Threads in Perl behave similarly to forking: by default, there is no shared memory. This makes using threads quite easy. However, each thread runs its own perl interpreter, which makes threads quite costly. Use sparingly.
First, we have to activate threading support via use threads. To start a thread, we do threads->create(\&code, @args), which returns a thread object. The code will then run in a separate thread, and will be invoked with the given arguments. After the thread has finished execution, we can collect the return value by calling $thread->join. Note: The context of the threaded code is determined by the create method, not by join.
We could mark variables with the :shared attribute. Your $counter and @array would be examples of this, but it is generally better to pass explicit copies of data around than to use shared state (disclaimer: from a theoretical standpoint, that is). To avoid race conditions with the shared data, you'd actually have to protect your $counter with a semaphore, but again, there is no need for shared state.
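If you did want shared state, a minimal sketch using threads::shared and lock() might look like this (the thread and iteration counts are arbitrary):

use strict;
use warnings;
use threads;
use threads::shared;

# A shared counter; lock() serializes access so increments don't race.
my $counter :shared = 0;

my @workers = map {
    threads->create(sub {
        for (1 .. 10_000) {
            lock($counter);   # lock is released at the end of this block
            $counter++;
        }
    });
} 1 .. 4;

$_->join for @workers;
print "counter = $counter\n";   # always 40000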
Here is a toy program showing how you could use threads to parallelize a calculation:
use strict;
use warnings;
use threads;
use 5.010; # for `say`, and sane threads
use Test::More;

# This program calculates differences between elements of an array

my @threads;
my @array = (1, 4, 3, 5, 5, 10, 7, 8);
my @delta = ( 3, -1, 2, 0, 5, -3, 1 );
my $number_of_threads = 3;

my @partitions = partition( $#array, $number_of_threads );
say "partitions: @partitions";

for (my $lower_bound = 0; @partitions; $lower_bound += shift @partitions) {
    my $upper_bound = $lower_bound + $partitions[0];
    say "spawning thread with [@array[$lower_bound .. $upper_bound]]";
    # pass copies of the values in the array slice to new thread:
    push @threads, threads->create(\&differences, @array[$lower_bound .. $upper_bound]);
    # note that threads->create was called in list context
}

my @received;
push @received, $_->join for @threads; # will block until all are finished

is_deeply \@received, \@delta;
done_testing;

# calculates the differences. This doesn't need shared memory.
# note that @array could have been safely accessed, as it is never written to
# If I had written to an (unshared) variable, these changes would have been thread-local
sub differences {
    say "Hi from a worker thread, I have ", 0+@_, " elements to work on";
    return map $_[$_] - $_[$_-1], 1 .. $#_;
    # or more readable:
    # my @d;
    # for my $i (1 .. $#_) {
    #     push @d, $_[$i] - $_[$i-1];
    # }
    # return @d;
}

# divide workload into somewhat fair parts, giving earlier threads more work
sub partition {
    my ($total, $parts) = @_;
    my $base_size = int($total / $parts);
    my @partitions = ($base_size) x $parts;
    $partitions[$_-1]++ for 1 .. $total - $base_size*$parts;
    return @partitions;
}
A note on the number of threads: This should depend on the number of processors of your system. If you have four cores, more than four threads don't make much sense.
If you're going to use child processes after forking, each child process is autonomous and has its own copy of the data in the program as of the time it was forked from the main program. The changes made by the child in its own memory have no effect on the parent's memory. If you need that, either you need a threading Perl and to use threads, or you need to think again — maybe using shared memory, but locating Perl data into the shared memory might be tricky.
So, one option is to read all the data into memory before forking off and having the children work on their own copies of the data.
Depending on the structure of the problem, another possibility might be to have each child read and work on a portion of the data. This won't work if each child must have access to all the data.
It isn't clear how much speed up you'll get through threading or forking if the threads or processes are all tied up reading the same file. Getting the data into memory may be best treated as a single-threaded (single-tasking) operation; the parallelism can spring into effect — and yield benefits — once the data is in memory.
There are some CPAN modules that make your life easier. One of them is Parallel::ForkManager, which is a simple parallel processing fork manager.
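Parallel::ForkManager can also pass a data structure from each child back to the parent (it serializes whatever reference you hand to finish and delivers it to the run_on_finish callback), which addresses the "children are copies" problem. A rough sketch under that design, with made-up partition sizes and a trivial sum standing in for the per-partition computation:

use strict;
use warnings;
use Parallel::ForkManager;

# Hypothetical setup: a 100-element array split into 10 partitions.
my @array      = (1 .. 100);
my $partitions = 10;
my $chunk      = int(@array / $partitions);   # assumes even divisibility

my @results;
my $pm = Parallel::ForkManager->new($partitions);

# run_on_finish receives whatever reference the child passed to finish().
$pm->run_on_finish(sub {
    my ($pid, $exit_code, $ident, $signal, $core, $data) = @_;
    push @results, @$data if defined $data;
});

for my $i (0 .. $partitions - 1) {
    $pm->start and next;    # parent: launch child, move on
    # Child: read its slice of the (inherited copy of the) array ...
    my @slice = @array[ $i * $chunk .. ($i + 1) * $chunk - 1 ];
    # ... compute something from it (a sum, as a stand-in) ...
    my $sum = 0;
    $sum += $_ for @slice;
    # ... and ship the result back to the parent, serialized.
    $pm->finish(0, [$sum]);
}
$pm->wait_all_children;

print "partial sums: @results\n";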
So, after my struggle, here's the fix:
EDIT: THIS DOES NOT ACCOMPLISH WHAT I WANTED TO DO
#!/usr/local/roadm/bin/perl
use strict;
use warnings;
use Time::HiRes;
use Parallel::ForkManager;

print "Starting main program\n";

my @array = ();
my $counter = 0;

my $start = Time::HiRes::time();

my $max_processes = 20;
my $partition = 10;
my $max_elements = 100;

my $pm = Parallel::ForkManager->new($max_processes);

$pm->run_on_start( sub {
    my ($pid, $exit_code, $ident) = @_;
    sub1(\$counter, \@array);
});

while ($counter < $max_elements)
{
    my $pid = $pm->start and next;
    $pm->finish; # Terminates the child process
}
$pm->wait_all_children;

my $stop = Time::HiRes::time();
my $duration = $stop - $start;

print "Time spent: $duration\n";
print "Size of array: " . scalar(@array) . "\n";
print "\nEnd of main program\n";

sub sub1 {
    my $start = shift;
    my $count = ${$start};
    ${$start} += $partition;
    my $array = shift;
    for (my $i = $count; $i < $count + $partition; $i++)
    {
        push @{$array}, $i;
    }
    return @{$array};
}
I am having some problems with memory in Perl. When I fill up a big hash, I cannot get the memory to be released back to the OS. When I do the same with a scalar and use undef, the memory is given back to the OS.
Here is a test program I wrote.
#!/usr/bin/perl

###### Memory test
######

## Use Commands
use Number::Bytes::Human qw(format_bytes);
use Data::Dumper;
use Devel::Size qw(size total_size);

## Create Variables
my $share_var;
my %share_hash;
my $type_hash = 1;
my $type_scalar = 1;

## Start Main Loop
while (1) {
    &Memory_Check();
    print "Hit Enter (add to memory): "; <>;
    &Up_Mem(100_000);
    &Memory_Check();

    print "Hit Enter (Set Variable to nothing): "; <>;
    $share_var = "";
    $share_hash = ();
    &Memory_Check();

    print "Hit Enter (clean data): "; <>;
    &Clean_Data();
    &Memory_Check();

    print "Hit Enter (start over): "; <>;
}
exit;

#### Up Memory
sub Up_Mem {
    my $total_loops = shift;
    my $n = 1;
    print "Adding data to shared variable $total_loops times\n";
    until ($n > $total_loops) {
        if ($type_hash) {
            $share_hash{$n} = 'X' x 1111;
        }
        if ($type_scalar) {
            $share_var .= 'X' x 1111;
        }
        $n += 1;
    }
    print "Done Adding Data\n";
}

#### Clean up Data
sub Clean_Data {
    print "Clean Up Data\n";
    if ($type_hash) {
        ## Methods to clear the hash (trying everything I can think of!)
        my $n = 1;
        my $total_loops = 100_000;
        until ($n > $total_loops) {
            undef $share_hash{$n};
            $n += 1;
        }
        %share_hash = ();
        $share_hash = ();
        undef $share_hash;
        undef %share_hash;
    }
    if ($type_scalar) {
        undef $share_var;
    }
}

#### Check Memory Usage
sub Memory_Check {
    ## Get current memory from shell
    my @mem = `ps aux | grep \"$$\"`;
    my ($results) = grep !/grep/, @mem;

    ## Parse Data from Shell
    chomp $results;
    $results =~ s/^\w*\s*\d*\s*\d*\.\d*\s*\d*\.\d*\s*//g;
    $results =~ s/pts.*//g;
    my ($vsz, $rss) = split(/\s+/, $results);

    ## Format Numbers to Human Readable
    my $h = Number::Bytes::Human->new();
    my $virt = $h->format($vsz);
    my $res  = $h->format($rss);
    print "Current Memory Usage: Virt: $virt RES: $res\n";

    if ($type_hash) {
        my $total_size = total_size(\%share_hash);
        my @arr_c = keys %share_hash;
        print "Length of Hash: " . ($#arr_c + 1) . " Hash Mem Total Size: $total_size\n";
    }
    if ($type_scalar) {
        my $total_size = total_size($share_var);
        print "Length of Scalar: " . length($share_var) . " Scalar Mem Total Size: $total_size\n";
    }
}
OUTPUT:
./Memory_Undef_Simple.cgi
Current Memory Usage: Virt: 6.9K RES: 2.7K
Length of Hash: 0 Hash Mem Total Size: 92
Length of Scalar: 0 Scalar Mem Total Size: 12
Hit Enter (add to memory):
Adding data to shared variable 100000 times
Done Adding Data
Current Memory Usage: Virt: 228K RES: 224K
Length of Hash: 100000 Hash Mem Total Size: 116813243
Length of Scalar: 111100000 Scalar Mem Total Size: 111100028
Hit Enter (Set Variable to nothing):
Current Memory Usage: Virt: 228K RES: 224K
Length of Hash: 100000 Hash Mem Total Size: 116813243
Length of Scalar: 0 Scalar Mem Total Size: 111100028
Hit Enter (clean data):
Clean Up Data
Current Memory Usage: Virt: 139K RES: 135K
Length of Hash: 0 Hash Mem Total Size: 92
Length of Scalar: 0 Scalar Mem Total Size: 24
Hit Enter (start over):
So as you can see, the memory goes down, but only by the size of the scalar. Any ideas how to free the memory of the hash?
Also, Devel::Size shows the hash is only taking up 92 bytes even though the program is still using 139K.
Generally, yeah, that's how memory management on UNIX works. If you are using Linux with a recent glibc, and are using that malloc, you can return free'd memory to the OS. I am not sure Perl does this, though.
If you want to work with large datasets, don't load the whole thing into memory, use something like BerkeleyDB:
https://metacpan.org/pod/BerkeleyDB
Example code, stolen verbatim:
use strict;
use BerkeleyDB;

my $filename = "fruit";
unlink $filename;

tie my %h, "BerkeleyDB::Hash",
    -Filename => $filename,
    -Flags    => DB_CREATE
    or die "Cannot open file $filename: $! $BerkeleyDB::Error\n";

# Add a few key/value pairs to the file
$h{apple}  = "red";
$h{orange} = "orange";
$h{banana} = "yellow";
$h{tomato} = "red";

# Check for existence of a key
print "Banana Exists\n\n" if $h{banana};

# Delete a key/value pair.
delete $h{apple};

# print the contents of the file
while (my ($k, $v) = each %h) {
    print "$k -> $v\n";
}

untie %h;
(OK, not verbatim. Their use of use vars is ... legacy ...)
You can store gigabytes of data in a hash this way, and you will only use a tiny bit of memory. (Basically, whatever BDB's pager decides to keep in memory; this is controllable.)
In general, you cannot expect perl to release memory to the OS.
See the FAQ: How can I free an array or hash so my program shrinks?.
You usually can't. Memory allocated to lexicals (i.e. my() variables) cannot be reclaimed or reused even if they go out of scope. It is reserved in case the variables come back into scope. Memory allocated to global variables can be reused (within your program) by using undef() and/or delete().
On most operating systems, memory allocated to a program can never be returned to the system. That's why long-running programs sometimes re-exec themselves. Some operating systems (notably, systems that use mmap(2) for allocating large chunks of memory) can reclaim memory that is no longer used, but on such systems, perl must be configured and compiled to use the OS's malloc, not perl's.
It is always a good idea to read the FAQ list, also installed on your computer, before wasting your time.
For example, How can I make my Perl program take less memory? is probably relevant to your issue.
Why do you want Perl to release the memory to the OS? You could just use a larger swap.
If you really must, do your work in a forked process, then exit.
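For example, a minimal sketch of that pattern (the data sizes are arbitrary, and how the child reports its results back, e.g. via a file or pipe, is elided):

use strict;
use warnings;

# Do the memory-hungry work in a child process; when it exits, all of
# its memory goes back to the OS, and the parent stays small.
my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    my %big = map { $_ => 'X' x 1111 } 1 .. 100_000;   # large transient data
    # ... process %big, writing results to a file or pipe for the parent ...
    exit 0;
}
waitpid($pid, 0);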
Try recompiling perl with the option -Uusemymalloc to use the system malloc and free. You might see some different results.
A simple but relevant question: Is « my » overwriting memory when called in a loop?
For instance, is it "better" (in terms of memory leaks, performance, speed) to declare it outside of the loop:
my $variable;
for my $number ( @array ) {
    $variable = $number * 5;
    _sub($variable);
}
Or should I declare it inside the loop:
for my $number ( @array ) {
    my $variable = $number * 5;
    _sub($variable);
}
(I just made that code up, it's not meant to do anything nor be used - as it is - in real life)
Will Perl allocate a new space in memory for each and every one of the for iterations?
Aamir already told you what will happen.
I recommend sticking to the second version unless there is some reason to use the first. You don't want to care about the previous state of $variable; it's simplest to start each iteration with a fresh variable. And if the variable contains a reference, you might actually shoot yourself in the foot by pushing it onto an array.
Edit:
Yes, there is a performance hit. Using a recycled variable will be faster. However, it is hard to tell how much faster it will be, as this will depend on your specific situation. No matter how much faster it is though, always remember: premature optimization is the root of all evil.
From your examples above:
In the first, a new space for $variable will not be allocated every time; the one from the previous iteration will be reused.
In the second, a new space will be allocated for every iteration of the loop and de-allocated in the same iteration.
These are things you aren't supposed to think about with a dynamic language such as Perl. Even though you might get an answer about what the current implementation does, that's not a feature and it isn't something you should rely on.
Define your variables in the shortest scope possible.
However, to be merely curious, you can use the Devel::Peek module to cheat a bit to see the internal (not physical) memory address:
use Devel::Peek;

foreach ( 0 .. 5 ) {
    my $var = $_;
    Dump( $var );
}
In this small case, the address ends up being the same. That's no guarantee that it will always be the same for different situations, or even the same program:
SV = IV(0x9ca968) at 0x9ca96c
REFCNT = 1
FLAGS = (PADMY,IOK,pIOK)
IV = 0
SV = IV(0x9ca968) at 0x9ca96c
REFCNT = 1
FLAGS = (PADMY,IOK,pIOK)
IV = 1
SV = IV(0x9ca968) at 0x9ca96c
REFCNT = 1
FLAGS = (PADMY,IOK,pIOK)
IV = 2
SV = IV(0x9ca968) at 0x9ca96c
REFCNT = 1
FLAGS = (PADMY,IOK,pIOK)
IV = 3
SV = IV(0x9ca968) at 0x9ca96c
REFCNT = 1
FLAGS = (PADMY,IOK,pIOK)
IV = 4
SV = IV(0x9ca968) at 0x9ca96c
REFCNT = 1
FLAGS = (PADMY,IOK,pIOK)
IV = 5
You can benchmark the difference between the two uses using the Benchmark module which is made for these types of micro-benchmarking comparisons:
#!/usr/bin/perl

use strict;
use warnings;

use Benchmark qw( cmpthese );

sub outside {
    my $x;
    for my $y ( 1 .. 1_000_000 ) {
        $x = $y;
    }
    return;
}

sub inside {
    for my $y ( 1 .. 1_000_000 ) {
        my $x = $y;
    }
    return;
}

cmpthese -1 => {
    inside  => \&inside,
    outside => \&outside,
};
Results on my Windows XP SP3 laptop:
           Rate  inside outside
inside   4.44/s      --    -25%
outside  5.91/s     33%      --
Predictably, the difference is less pronounced when the body of the loop is executed only once.
That said, I would not declare $x outside the loop unless I needed outside the loop what is assigned to $x inside the loop.
You are totally safe using "my" inside a for loop or any other block. In general you don't have to worry about memory leaks in perl, but you would be equally safe in this circumstance with a non-garbage-collecting language like C++. A normal variable is deallocated at the end of the block in which it has scope.