Conditions in Perl loops and performance - perl

There are many idioms in Perl that use operators/functions/subroutines/methods in loop conditions, and books advise using them! But as I understand it, these conditions are evaluated on each iteration. Am I right?
Perl 5:
foreach my $key (keys %hash) { ... }
for my $value (values %hash) { ... }
Perl 6:
for 'words.txt'.IO.lines -> $line { ... }
while $index < $fruit.chars { ... }
Why don't programmers assign the condition to a variable before the loop and use that variable in the loop? Wouldn't that increase speed? The first example would then look like this:
my @array = keys %hash;
foreach my $key (@array) { ... }

The list is only calculated once, when the loop starts, so I do not think precalculating the array before the loop would increase speed. Example:
for my $key (get_keys()) {
    say $key;
}

sub get_keys {
    say "Calculating keys..";
    return qw(a b c d);
}
Output:
Calculating keys..
a
b
c
d

foreach my $key (keys %hash) { ... }
for my $value (values %hash) { ... }
The for and the foreach are synonymous in Perl, so aside from the fact that your two example snippets are operating on different parts of the hash, they're the same thing.
Ok, so here's what happens internally: In each case all keys, or all values are calculated as a list, and then the looping construct iterates on that calculated list. There is an internal check, but that check is only to see if the loop has reached the offset of the last element in the list yet. That is a cheap operation in the underlying C code. To be clear, keys and values are not called on each iteration. The list of things iterated over is computed only once at the beginning of the loop.
Also, $key and $value are aliases to the actual key or the actual value, not copies. So there is no per-iteration copy made.
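Because of this aliasing, writing to the loop variable writes through to the hash itself. A minimal sketch:

```perl
my %h = (a => 1, b => 2);

# $_ aliases each stored value, so multiplying it modifies %h in place
$_ *= 10 for values %h;

# %h is now (a => 10, b => 20); no copies were made
```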
The nuance that is often missed is the fact that the iteration list is precomputed upon entering the loop. That is why it's considered a terrible idea to do this:
foreach my $line (<$file_handle>) {...}
...because the entire file must be read and held in memory at once before the first line can be processed. The fact that a list must be available internally first is typically an acceptable memory trade-off for things that are already held in memory to begin with. But for external sources such as a file there's no guarantee that available memory can hold the whole thing -- particularly if it's some endless stream. Consider this code:
open my $fh, '<', '/dev/urandom';
say ord while <$fh>;
It will never terminate, but will emit a constant stream of ordinal values. However, it does not grow in memory usage.
Now change the second line to this:
say ord for <$fh>;
This will appear to hang while it consumes all of the system's memory attempting to retrieve the entire contents of /dev/urandom (an endless stream). It must do this before it can start iterating, because that's how a range-based foreach loop works in Perl and some other languages.
So a range based foreach loop is inexpensive in its computational overhead, but in some cases potentially expensive in its memory footprint.
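For external sources, the memory-safe idiom is a while loop that reads one record per iteration. A sketch, using an in-memory filehandle as a stand-in for a real file:

```perl
my $data = "one\ntwo\nthree\n";
open my $fh, '<', \$data or die $!;   # in-memory filehandle, for illustration

my $count = 0;
while (my $line = <$fh>) {            # one line per iteration; nothing is slurped
    chomp $line;
    $count++;
}
close $fh;
# $count is 3
```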
Speaking to your final example:
my @array = keys %hash;
foreach my $key (@array) { ... }
It doesn't make an appreciable difference, and may actually be slower or consume more memory. When I compare the two approaches with a hash of 100000 elements the difference between the two is only 2%, or within the margin of error:
         Rate  copy direct
copy   35.9/s    --    -2%
direct 36.7/s    2%     --
Here's the code:
use Benchmark qw(cmpthese);

my %hash;
@hash{1..100000} = (1..100000);

sub copy {
    my @array = keys %hash;
    my $b = 0;
    $b += $_ foreach @array;
    return $b;
}

sub direct {
    my $b = 0;
    $b += $_ foreach keys %hash;
    return $b;
}

cmpthese(-5, {
    copy   => \&copy,
    direct => \&direct,
});

Related

Memory/performance tradeoff when determining the size of a Perl hash

I was browsing through some Perl code in a popular repository on GitHub and ran across this method of calculating the size of a hash:
while ( my ($a, undef ) = each %h ) { $num++; }
I thought: why would one go through the trouble of writing all that code when it could be written more simply as
$num = scalar keys %h;
So, I compared both methods with Benchmark.
my %h = (1 .. 1000);

cmpthese(-10, {
    keys => sub {
        my $num = 0;
        $num = scalar keys %h;
    },
    whileloop => sub {
        my $num = 0;
        while ( my ($a, undef) = each %h ) {
            $num++;
        }
    },
});
RESULTS
                Rate whileloop      keys
whileloop     5090/s        --     -100%
keys       7234884/s   142047%        --
The results show that using keys is MUCH faster than the while loop. My question is this: why would the original coder use such a slow method? Is there something that I'm missing? Also, is there a faster way?
I cannot read the mind of whoever might have written that piece of code, but he or she likely thought that:
my $n = keys %hash;
used more memory than iterating through everything using each.
Note that the scalar variable on the left-hand side of the assignment already creates scalar context, so there is no need for scalar unless you want to force scalar context where there would otherwise be list context.
Or perhaps he simply didn't know about the ability of keys to return the number of elements in the hash when called in scalar context.
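For reference, keys in scalar context returns the element count directly, without building the key list. A quick sketch:

```perl
my %h = (a => 1, b => 2, c => 3);

my $n = keys %h;          # assignment to a scalar gives scalar context
my $m = scalar keys %h;   # the explicit 'scalar' is redundant here

# both $n and $m are 3
```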

Perl takes a long time to evaluate: keys %hash / iterate through a large hash

In a Perl script, I build up a large hash (around 10 GB) with around 100 million keys, which takes about 40 minutes. Next I want to loop through the keys of the hash, like so:
foreach my $key (keys %hash) {
However, this line takes 1 hour and 20 minutes to evaluate! Once inside the loop, the code runs through the whole hash at a quick pace.
Why does entering the for loop take so long? And how can I speed the process up?
foreach my $key (keys %hash) {
This code will first create a list of all the keys in %hash, and since you said your %hash is huge, that will take a while to finish, especially if you start swapping memory to disk because you ran out of real memory.
You could use while (my ($key, $value) = each %hash) { to iterate over the hash instead; this does not create that huge list. If you were swapping before, this will be much faster, since you won't be anymore.
There are two approaches to iterating over a hash, each with its pros and cons.
Approach 1:
foreach my $k (keys %h)
{
    print "key: $k, value: $h{$k}\n";
}
Pros:
It is possible to sort the output by key.
Cons:
It creates a temporary list to hold the keys; if your hash is very large, you end up using a lot of memory.
Approach 2:
while ( my ($k, $v) = each %h )
{
    print "key: $k, value: $v\n";
}
Pros:
This uses very little memory, as each call to each returns just one (key, value) pair.
Cons:
You can't order the output by key.
The iterator it uses belongs to %h. If the code inside the loop calls anything that does keys %h, values %h, or each %h, the loop won't work properly, because %h has only one iterator.
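A sketch of that shared-iterator behavior, including the documented trick of calling keys in void context purely to reset the iterator:

```perl
my %h = (a => 1);

# keys, values and each all share %h's single internal iterator, so
# calling keys %h inside a while-each loop would rewind it every time
# and the loop would never terminate.

each %h;                 # advances the iterator past the only pair
keys %h;                 # in void context this just resets the iterator
my ($k, $v) = each %h;   # starts from the beginning again
# $k is 'a', $v is 1
```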

Can I copy a hash without resetting its "each" iterator?

I am using each to iterate through a Perl hash:
while (my ($key,$val) = each %hash) {
...
}
Then something interesting happens and I want to print out the hash. At first I consider something like:
while (my ($key,$val) = each %hash) {
if (something_interesting_happens()) {
foreach my $k (keys %hash) { print "$k => $hash{$k}\n" }
}
}
But that won't work, because everyone knows that calling keys (or values) on a hash resets the internal iterator used for each, and we may get an infinite loop. For example, these scripts will run forever:
perl -e '%a=(foo=>1); while(each %a){keys %a}'
perl -e '%a=(foo=>1); while(each %a){values %a}'
No problem, I thought. I could make a copy of the hash, and print out the copy.
if (something_interesting_happens()) {
%hash2 = %hash;
foreach my $k (keys %hash2) { print "$k => $hash2{$k}\n" }
}
But that doesn't work, either. This also resets the each iterator. In fact, any use of %hash in a list context seems to reset its each iterator. So these run forever, too:
perl -e '%a=(foo=>1); while(each %a){%b = %a}'
perl -e '%a=(foo=>1); while(each %a){@b = %a}'
perl -e '%a=(foo=>1); while(each %a){print %a}'
Is this documented anywhere? It makes sense that perl might need to use the same internal iterator to push a hash's contents onto a return stack, but I can also imagine hash implementations that didn't need to do that.
More importantly, is there any way to do what I want? To get to all the elements of a hash without resetting the each iterator?
This also suggests you can't debug a hash inside an each iteration, either. Consider running the debugger on:
%a = (foo => 123, bar => 456);
while ( ($k,$v) = each %a ) {
    $DB::single = 1;
    $o .= "$k,$v;";
}
print $o;
Just by inspecting the hash where the debugger stops (say, typing p %a or x %a), you will change the output of the program.
Update: I uploaded Hash::SafeKeys as a general solution to this problem. Thanks @gpojd for pointing me in the right direction and @cjm for a suggestion that made the solution much simpler.
Have you tried Storable's dclone to copy it? It would probably be something like this:
use Storable qw(dclone);
my %hash_copy = %{ dclone( \%hash ) };
How big is this hash? How long does it take to iterate through it, such that you care about the timing of the access?
Just set a flag and do the action after the end of the iteration:
my $print_it;
while (my ($key,$val) = each %hash) {
    $print_it = 1 if something_interesting_happens();
    ...
}
if ($print_it) {
    foreach my $k (keys %hash) { print "$k => $hash{$k}\n" }
}
Although there's no reason not to use each in the printout code, too, unless you were planning on sorting by key or something.
Let's not forget that keys %hash is already defined when you enter the while loop. One could have simply saved the keys into an array for later use:
my @keys = keys %hash;
while (my ($key,$val) = each %hash) {
    if (something_interesting_happens()) {
        print "$_ => $hash{$_}\n" for @keys;
    }
}
Downside:
It's less elegant (subjective)
It won't work if %hash is modified (but then why would one use each in the first place?)
Upside:
It uses less memory by avoiding hash-copying
Not really. each is incredibly fragile. It stores iteration state on the iterated hash itself, state which is reused by other parts of perl when they need it. Far safer is to forget that it exists, and always iterate your own list from the result of keys %hash instead, because the iteration state over a list is stored lexically as part of the for loop itself, so is immune from corruption by other things.
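A sketch of that safer pattern: the for loop iterates its own list produced by keys, so touching the hash's iterator inside the body does no harm:

```perl
my %h = (a => 1, b => 2);

my $out = '';
for my $k (sort keys %h) {   # iteration state is part of the loop, not of %h
    my @all = keys %h;       # harmless: the for loop's position is unaffected
    $out .= "$k=$h{$k};";
}
# $out is "a=1;b=2;"
```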

How to optimize two-dimensional hash traversing in Perl?

I have a hash of hashes %signal_db. A typical element is: $signal_db{$cycle}{$key}. There are 10,000s of signals, and 10,000s of keys.
Is there any way to optimize (timewise) this piece of code:
foreach my $cycle (sort numerically keys %signal_db) {
    foreach my $key (sort keys %{$signal_db{$cycle}}) {
        print $signal_db{$cycle}{$key}.$key."\n";
    }
}
The elements have to be printed in the same order as in my code.
Two micro-optimizations: look up the inner hash once instead of constantly dereferencing, and buffer output instead of calling print constantly. It's also possible to get rid of the sorting by using an alternative storage format; I tested two variants. Results:
               Rate  original  try3  alternative  alternative2
original     46.1/s        --  -12%         -21%          -32%
try3         52.6/s       14%    --         -10%          -22%
alternative  58.6/s       27%   11%           --          -13%
alternative2 67.5/s       46%   28%          15%            --
Conclusion:
It's better to use a presorted storage format, but without dropping to C the win would probably stay within 100% (on my test dataset). The information provided about the data suggests that the keys of the outer hash are almost-sequential numbers, which cries out for an array.
Script:
#!/usr/bin/env perl
use strict; use warnings;
use Benchmark qw/timethese cmpthese/;

my %signal_db = map { $_ => {} } 1..1000;
%$_ = map { $_ => $_ } 'a'..'z' foreach values %signal_db;

my @signal_db = map { { cycle => $_ } } 1..1000;
$_->{'samples'} = { map { $_ => $_ } 'a'..'z' } foreach @signal_db;

my @signal_db1 = map { $_ => [] } 1..1000;
@$_ = map { $_ => $_ } 'a'..'z' foreach grep ref $_, @signal_db1;

use Sort::Key qw(nsort);
sub numerically { $a <=> $b }

my $result = cmpthese( -2, {
    'original' => sub {
        open my $out, '>', 'tmp.out';
        foreach my $cycle (sort numerically keys %signal_db) {
            foreach my $key (sort keys %{$signal_db{$cycle}}) {
                print $out $signal_db{$cycle}{$key}.$key."\n";
            }
        }
    },
    'try3' => sub {
        open my $out, '>', 'tmp.out';
        foreach my $cycle (map $signal_db{$_}, sort numerically keys %signal_db) {
            my $tmp = '';
            foreach my $key (sort keys %$cycle) {
                $tmp .= $cycle->{$key}.$key."\n";
            }
            print $out $tmp;
        }
    },
    'alternative' => sub {
        open my $out, '>', 'tmp.out';
        foreach my $cycle (map $_->{'samples'}, @signal_db) {
            my $tmp = '';
            foreach my $key (sort keys %$cycle) {
                $tmp .= $cycle->{$key}.$key."\n";
            }
            print $out $tmp;
        }
    },
    'alternative2' => sub {
        open my $out, '>', 'tmp.out';
        foreach my $cycle (grep ref $_, @signal_db1) {
            my $tmp = '';
            for (my $i = 0; $i < @$cycle; $i += 2) {
                $tmp .= $cycle->[$i+1].$cycle->[$i]."\n";
            }
            print $out $tmp;
        }
    },
} );
my %signal_db = map {$_ => {1 .. 1000}} 1 .. 1000;

sub numerically {$a <=> $b}

sub orig {
    my $x;
    foreach my $cycle (sort numerically keys %signal_db) {
        foreach my $key (sort keys %{$signal_db{$cycle}}) {
            $x += length $signal_db{$cycle}{$key}.$key."\n";
        }
    }
}

sub faster {
    my $x;
    our ($cycle, $key, %hash); # move allocation out of the loop
    local *hash;               # and use package variables which are faster to alias into

    foreach $cycle (sort {$a <=> $b}    # the {$a <=> $b} literal is optimized
                    keys %signal_db) {
        *hash = $signal_db{$cycle};     # alias into %hash
        foreach $key (sort keys %hash) {
            $x += length $hash{$key}.$key."\n";  # simplify the lookup
        }
    }
}

use Benchmark 'cmpthese';

cmpthese -5 => {
    orig   => \&orig,
    faster => \&faster,
};
which gets:
         Rate   orig faster
orig   2.56/s     --   -15%
faster 3.03/s    18%     --
Not a huge gain, but it is something. There isn't much more you can optimize without changing your data structure to use presorted arrays. (or writing the whole thing in XS)
Switching the foreach loops to use external package variables saves a little bit of time since perl does not have to create lexicals in the loop. Also package variables seem to be a bit faster to alias into. Reducing the inner lookup to a single level also helps.
I assume you are printing to STDOUT and then redirecting the output to a file? If so, using Perl to open the output file directly and then printing to that handle may allow for improvements in file IO performance. Another micro-optimization could be to experiment with different record sizes. For example, does it save any time to build an array in the inner loop, then join / print it at the bottom of the outer loop? But that is something that is fairly device dependent (and possibly pointless due to other IO caching layers), so I will leave that test up to you.
I'd first experiment with the Sort::Key module because sorting takes longer than simple looping and printing. Also, if the inner hashes keys are (mostly) identical, then you should simply presort them, but I'll assume this isn't the case or else you'd be doing that already.
You should obviously try assigning $signal_db{$cycle} to a reference too. You might find that each is faster than keys plus retrieval as well, especially if used with Sort::Key. I'd check if map runs faster than foreach too, probably the same, but who knows. You might find print runs faster if you pass it a list or call it multiple times.
I haven't tried this code, but throwing together all these ideas except each gives:
foreach my $cycle (nsort keys %signal_db) {
    my $r = $signal_db{$cycle};
    map { print($r->{$_}, $_, "\n"); } (nsort keys %$r);
}
There is an article about sorting in Perl here; check out the Schwartzian Transform if you wish to see how one might use each.
If your code need not be security conscious, you could conceivably disable Perl's protection against algorithmic complexity attacks by setting PERL_HASH_SEED or related variables, and/or recompile Perl with altered settings, so that perl's keys and values commands return the keys or values in sorted order already, saving you considerable sorting time. But please watch this 28C3 talk before doing so. I don't know if this will even work; you'd need to read that part of Perl's source code, and it may be easier to just implement your loop in C.

three questions on a Perl function

I am trying to use an existing Perl program, which includes the following function, GetItems. The way to call this function is listed below.
I have several questions for this program:
What does foreach my $ref (@_) aim to do? I think @_ should be related to the parameters passed, but I'm not quite sure.
In my @items = sort { $a <=> $b } keys %items;, shouldn't the "items" on the left side be different from the "items" on the right side? Why do they use the same name?
What does $items{$items[$i]} = $i + 1; aim to do? It looks like it just sets up the values of the hash %items sequentially.
$items = GetItems($classes, $pVectors, $nVectors, $uVectors);
######################################
sub GetItems
######################################
{
    my $classes = shift;
    my %items = ();

    foreach my $ref (@_)
    {
        foreach my $id (keys %$ref)
        {
            foreach my $cui (keys %{$ref->{$id}}) { $items{$cui} = 1 }
        }
    }

    my @items = sort { $a <=> $b } keys %items;

    open(VAL, "> $classes.items");
    for my $i (0 .. $#items)
    {
        print VAL "$items[$i]\n";
        $items{$items[$i]} = $i + 1;
    }
    close VAL;

    return \%items;
}
When you enter a function, @_ starts out as an array of (aliases to) all the parameters passed into the function; but the my $classes = shift removes the first element of @_ and stores it in the variable $classes, so the foreach my $ref (@_) iterates over all the remaining parameters, storing (aliases to) them one at a time in $ref.
Scalars, hashes, and arrays are all distinguished by the syntax, so they're allowed to have the same name. You can have a $foo, a @foo, and a %foo all at the same time, and they don't have to have any relationship to each other. (This, together with the fact that $foo[0] refers to @foo and $foo{'a'} refers to %foo, causes a lot of confusion for newcomers to the language; you're not alone.)
Exactly. It sets each element of %items to a distinct integer ranging from one to the number of elements, proceeding in numeric (!) order by key.
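In other words, the loop turns the placeholder hash into a rank table. A small sketch of the effect, using a hypothetical three-key hash:

```perl
my %items = (10 => 1, 2 => 1, 33 => 1);       # placeholder values from the first pass
my @items = sort { $a <=> $b } keys %items;   # (2, 10, 33)

# rank each key by its position in the numerically sorted key list
$items{$items[$_]} = $_ + 1 for 0 .. $#items;

# %items is now (2 => 1, 10 => 2, 33 => 3)
```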
foreach my $ref (@_) loops through each hash reference passed as a parameter to GetItems. If the call looks like this:
$items = GetItems($classes, $pVectors, $nVectors, $uVectors);
then the loop processes the hash refs in $pVectors, $nVectors, and $uVectors.
@items and %items are COMPLETELY DIFFERENT VARIABLES!! @items is an array variable and %items is a hash variable.
$items{$items[$i]} = $i + 1 does exactly as you say. It sets the value of the %items hash whose key is $items[$i] to $i+1.
Here is an (nearly) line by line description of what is happening in the subroutine
Define a sub named GetItems.
sub GetItems {
Store the first value of the default array @_ in $classes, removing it from the array.
my $classes = shift;
Create a new hash named %items.
my %items;
Loop over the remaining values given to the subroutine, setting $ref to the value on each iteration.
for my $ref (@_){
This code assumes that the previous line set $ref to a hash ref. It loops over the unsorted keys of the hash referenced by $ref, storing the key in $id.
for my $id (keys %$ref){
Using the key ($id) given by the previous line, loop over the keys of the hash ref at that position in $ref, setting $cui on each iteration.
for my $cui (keys %{$ref->{$id}}) {
Set the value of %items at key $cui to 1.
$items{$cui} = 1;
End of the loops on the previous lines.
}
}
}
Store a sorted list of the keys of %items in @items, ordered by numeric value.
my @items = sort { $a <=> $b } keys %items;
Open the file named by $classes with .items appended to it. This uses the old-style two arg form of open. It also ignores the return value of open, so it continues on to the next line even on error. It stores the file handle in the global *VAL{IO}.
open(VAL, "> $classes.items");
Loop over the indexes of @items.
for my $i (0 .. $#items){
Print the value at that index on its own line to *VAL{IO}.
print VAL "$items[$i]\n";
Using the value at that index as a key into %items (which it is a key of), set the corresponding value to the index plus one.
$items{$items[$i]} = $i + 1;
End of loop.
}
Close the file handle *VAL{IO}.
close VAL;
Return a reference to the hash %items.
return \%items;
End of subroutine.
}
I have several questions for this program:
What does foreach my $ref (@_) aim to do? I think @_ should be related to the parameters passed, but not quite sure.
Yes, you are correct. When you pass parameters into a subroutine, they are automatically placed in the @_ array. The foreach my $ref (@_) begins a loop. This loop is repeated for each item in the @_ array, and each time, $ref is assigned the next item in the array. See Perldoc's Perlsyn (Perl Syntax) section about for loops and foreach loops. Also look at Perldoc's Perlvar (Perl Variables) section on General variables for information about special variables like @_.
Now, the line my $classes = shift; removes the first item from the @_ array and puts it into the variable $classes. Thus, the foreach loop is repeated three times. Each time, $ref is set to the value of $pVectors, then $nVectors, and finally $uVectors.
By the way, these aren't really scalar values. In Perl, you can have what is called a reference. This is the memory location of the data structure you're referencing. For example, I have five students, and each student has a series of tests they've taken. I want to store all the values of each test in a hash keyed by the student's ID.
Normally, each entry in the hash can only contain a single item. However, what if this item refers to a list that contains the student's grades?
Here's the list of student #100's grade:
@grades = (100, 93, 89, 95, 74);
And here's how I set Student 100's entry in my hash:
$student{100} = \@grades;
Now, I can talk about the first grade of the year for Student #100 as $student{100}[0]. See perlreftut, Mark's very short tutorial about references, in the Perldocs.
In my @items = sort { $a <=> $b } keys %items;, shouldn't the "items" on the left side be different from the "items" on the right side? Why do they use the same name?
In Perl, you have three major variable types: Arrays (what some people call Lists), Hashes (what some people call Keyed Arrays), and Scalars. In Perl, it is perfectly legal for variables of different types to have the same name. Thus, you can have $var, %var, and @var in your program, and they'll be treated as completely separate variables1.
This is usually a bad thing to do and is highly discouraged. It gets worse when you think of the individual values: $var refers to the scalar while $var[3] refers to the list, and $var{3} refers to the hash. Yes, it can be very, very confusing.
In this particular case, he has a hash (a keyed array) called %items, and he's converting the keys of this hash into a list sorted numerically:
my @items = sort { $a <=> $b } keys %items;
Note that the explicit { $a <=> $b } block matters: the plain
my @items = sort keys %items;
sorts as strings, which is only equivalent when the keys happen to have the same string and numeric order.
See the Perldocs on the sort function and the keys function.
What does $items{$items[$i]} = $i + 1; aim to do? It looks like it just sets up the values of the hash %items sequentially.
Let's look at the entire loop:
foreach my $i (0 .. $#items)
{
    print VAL "$items[$i]\n";
    $items{$items[$i]} = $i + 1;
}
The subroutine is going to go through this loop once for each item in the @items list, which is the sorted list of keys of the old %items hash. $#items means the largest index in the @items list. For example, if @items = ("foo", "bar", "foobar"), then $#items would be 2, because the last item in this list is $items[2], which equals foobar.
This way, he's hitting the index of each entry in @items. (REMEMBER: This is different from %items!)
The next line is a bit tricky:
$items{$items[$i]} = $i + 1;
Remember that $items{...} refers to the %items hash. He's overwriting its placeholder values, keyed by each item in the @items list, with the index of that item plus 1. Let's assume that:
@items = ("foo", "bar", "foobar")
In the end, he's doing this:
$items{foo} = 1;
$items{bar} = 2;
$items{foobar} = 3;
1 Well, this isn't 100% true. Perl stores each variable in a kind of hash structure. In memory, $var, @var, and %var will be stored in the same entry, but in slots corresponding to each variable type. 99.9999% of the time, this matters not one bit. As far as you are concerned, these are three completely different variables.
However, there are a few rare occasions where a programmer will take advantage of this when they futz directly with memory in Perl.
I want to show you how I would write that subroutine.
But first, I want to show you some of the steps of how, and why, I changed the code.
Reduce the number of for loops:
First off this loop doesn't need to set the value of $items{$cui} to anything in particular. It also doesn't have to be a loop at all.
foreach my $cui (keys %{$ref->{$id}}) { $items{$cui} = 1 }
This does practically the same thing. The only real difference is it sets them all to undef instead.
@items{ keys %{$ref->{$id}} } = ();
If you really needed to set the values to 1: note that (1) x @keys returns a list of 1's with the same number of elements as @keys.
my @keys = keys %{$ref->{$id}};
@items{ @keys } = (1) x @keys;
If you are going to have to loop over a very large number of elements then a for loop may be a good idea, but only if you have to set the value to something other than undef. Since we are only using the loop variable once, to do something simple; I would use this code:
$items{$_} = 1 for keys %{$ref->{$id}};
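The difference between the slice and the loop shows up with exists and defined. A sketch, using a hypothetical input shape:

```perl
my %items;
my $ref = { id1 => { x => 1, y => 2 } };   # hypothetical input shape

@items{ keys %{$ref->{id1}} } = ();        # creates keys x and y, values undef

# the keys exist, but their values are undefined:
# exists $items{x} is true; defined $items{x} is false
```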
Swap keys with values:
On the line before that we see:
foreach my $id (keys %$ref){
In case you didn't notice $id was used only once, and that was for getting the associated value.
That means we can use values and get rid of the %{$ref->{$id}} syntax.
for my $hash (values %$ref){
    @items{ keys %$hash } = ();
}
( $hash isn't a good name, but I don't know what it represents. )
3 arg open:
It isn't recommended to use the two argument form of open, or to blindly use the bareword style of filehandles.
open(VAL, "> $classes.items");
As an aside, did you know there is also a one-argument form of open? I don't really recommend it, though; it's mostly there for backward compatibility.
our $VAL = "> $classes.items";
open(VAL);
The recommended way to do it is with 3 arguments.
open my $val, '>', "$classes.items";
There may be some rare edge cases where you need/want to use the two argument version though.
Put it all together:
sub GetItems {
    # this will cause open and close to die on error (in this subroutine only)
    use autodie;

    my $classes = shift;
    my %items;

    for my $vector_hash (@_){
        # use values so that we don't have to use $vector_hash->{$id}
        for my $hash (values %$vector_hash){
            # create the keys in %items
            @items{keys %$hash} = ();
        }
    }

    # This assumes that the keys of %items are numbers
    my @items = sort { $a <=> $b } keys %items;

    # using 3 arg open
    open my $output, '>', "$classes.items";

    my $index; # = 0;
    for my $item (@items){
        print {$output} $item, "\n";
        $items{$item} = ++$index; # 1...
    }
    close $output;

    return \%items;
}
Another option for that last for loop:
for my $index ( 1..@items ){
    my $item = $items[$index-1];
    print {$output} $item, "\n";
    $items{$item} = $index;
}
If your version of Perl is 5.12 or newer, you could write that last for loop like this:
while( my($index,$item) = each @items ){
    print {$output} $item, "\n";
    $items{$item} = $index + 1;
}