Is perl's each function worth using? - perl

From perldoc -f each we read:
There is a single iterator for each hash, shared by all each, keys, and values function calls in the program; it can be reset by reading all the elements from the hash, or by evaluating keys HASH or values HASH.
The iterator is not reset when you leave the scope containing the each(), and this can lead to bugs:
my %h = map { $_, 1 } qw(1 2 3);
while (my $k = each %h) { print "1: $k\n"; last }
while (my $k = each %h) { print "2: $k\n" }
Output:
1: 1
2: 3
2: 2
What are the common workarounds for this behavior? And is it worth using each in general?

I think it is worth using as long as you are aware of this. It's ideal when you need both key and value in iteration:
while (my ($k,$v) = each %h) {
say "$k = $v";
}
In your example you can reset the iterator by adding keys %h; like so:
my %h = map { $_ => 1 } qw/1 2 3/;
while (my $k = each %h) { print "1: $k\n"; last }
keys %h; # reset %h
while (my $k = each %h) { print "2: $k\n" }
From Perl 5.12 each will also allow iteration on an array.

I find each to be very handy for idioms like this:
my $hashref = some_really_complicated_method_that_builds_a_large_and_deep_structure();
while (my ($key, $value) = each %$hashref)
{
# code that does stuff with both $key and $value
}
Contrast that code to this:
my $hashref = ...same call as above
foreach my $key (keys %$hashref)
{
my $value = $hashref->{$key};
# more code here...
}
In the first case, both $key and $value are immediately available to the body of the loop. In the second case, $value must be fetched first. Additionally, the list of keys of $hashref may be really huge, which takes up memory. This is occasionally an issue. each does not incur such overhead.
However, the drawbacks of each are not instantly apparent: if aborting from the loop early, the hash's iterator is not reset. Additionally (and I find this one more serious and even less visible): you cannot call keys(), values() or another each() from within this loop. To do so would reset the iterator, and you would lose your place in the while loop. The while loop would continue forever, which is definitely a serious bug.

each is too dangerous to ever use, and many style guides prohibit its use completely. The danger is that if a cycle of each is aborted before the end of the hash, the next cycle will start there. This can cause very hard-to-reproduce bugs; the behavior of one part of the program will depend on a completely unrelated other part of the program. You might use each right, but what about every module ever written that might use your hash (or hashref; it's the same)?
keys and values are always safe, so just use those. keys makes it easier to traverse the hash in deterministic order, anyway, which is almost always more useful. (for my $key (sort keys %hash) { ... })

each is not only worth using, it's pretty much mandatory if you want to loop over all of a tied hash too big for memory.
A void-context keys() (or values, but consistency is nice) before beginning the loop is the only "workaround" necessary; is there some reason you are looking for some other workaround?

use the keys() function to reset the iterator. See the faq for more info

each has a buit-in, hidden global variable that can hurt you. Unless you need this behavior, it's safer to just use keys.
Consider this example where we want to group our k/v pairs (yes, I know printf would do this better):
#!perl
use strict;
use warnings;
use Test::More 'no_plan';
{ my %foo = map { ($_) x 2 } (1..15);
is( one( \%foo ), one( \%foo ), 'Calling one twice works with 15 keys' );
is( two( \%foo ), two( \%foo ), 'Calling two twice works with 15 keys' );
}
{ my %foo = map { ($_) x 2 } (1..105);
is( one( \%foo ), one( \%foo ), 'Calling one twice works with 105 keys' );
is( two( \%foo ), two( \%foo ), 'Calling two twice works with 105 keys' );
}
sub one {
my $foo = shift;
my $r = '';
for( 1..9 ) {
last unless my ($k, $v) = each %$foo;
$r .= " $_: $k -> $v\n";
}
for( 10..99 ) {
last unless my ($k, $v) = each %$foo;
$r .= " $_: $k -> $v\n";
}
return $r;
}
sub two {
my $foo = shift;
my $r = '';
my #k = keys %$foo;
for( 1..9 ) {
last unless #k;
my $k = shift #k;
$r .= " $_: $k -> $foo->{$k}\n";
}
for( 10..99 ) {
last unless #k;
my $k = shift #k;
$r .= " $_: $k -> $foo->{$k}\n";
}
return $r;
}
Debugging the error shown in the tests above in a real application would be horribly painful. (For better output use Test::Differences eq_or_diff instead of is.)
Of course one() can be fixed by using keys to clear the iterator at the start and end of the subroutine. If you remember. If all your coworkers remember. It's perfectly safe as long as no one forgets.
I don't know about you, but I'll just stick with using keys and values.

It's best if used as it's name: each. It's probably the wrong thing to use if you mean "give me the first key-value pair," or "give me the first two pairs" or whatever. Just keep in mind that the idea is flexible enough that each time you call it, you get the next pair (or key in a scalar context).

each() can be more efficient if you are iterating through a tied hash, for example a database that contains millions of keys; that way you don't have to load all the keys in memory.

Related

Understanding Perl pointers to hashes in my script

So I know how the split is working, my question is how the pointer $p is working here. Does it have a different value for each iteration, and the value gets pushed on to the hash as an array? Is that how it will keep those values together when I need to extract them? I have over 100 lines of values I need to reference back to and I'm not sure how it will do that if $p is not changing with each iteration. Thanks!
else{
my($site,$x,$y) = split /,/, $_;
my $p;
$p->{site} = $site;
$p->{x} = $x;
$p->{y} = $y;
push #{$die_loc{$pattern}}, $p;
}
I take it that this is all in a loop where $_ is assigned every time through.
You declare my $p every time so each gets its own memory location once it is assigned to. At that point it is autovivified into a hash reference, since that's how it is assigned. That reference is copied onto the array, so you will have them all. You can get the memory address of a reference using refaddr from the core module Scalar::Util. Or, for that matter, just print $p.
What you have can be written as
my $p = { site => $site, x => $x, y => $y };
push #{$die_loc{$pattern}}}, $p;
So after all is said and done, the hash %die_loc will under key $pattern have an array reference, which has for elements the hash references with keys site, x, and y.
use feature 'say';
foreach my $hr (#{$die_loc{$pattern}}) {
say "site: $hr->{site}, x: $hr->{x}, y: $hr->{y}"
}
This prints a line for each (presumed) iteration that you processed. But generally you don't want to type key names but rather use keys to print a hash, for example
foreach my $hr (#{$die_loc{$pattern}}) {
say join ', ', map { "$_: $hr->{$_}" } sort keys %$hr;
}
where keys are also sorted for consistent prints. Or use a module, like Data::Dump.
Note that references are a little different from pointers.
There is only a code fragment posted so let me also say that you want to always start with
use warnings 'all';
use strict;
That code is much better written like this
else {
my ( $site, $x, $y ) = split /,/;
my %p = (
site => $site,
x => $x,
y => $y,
);
push #{ $die_loc{$pattern} }, \%p;
}
or, perhaps better still
else {
my %p;
#p{qw/ site x y /} = split /,/;
push #{ $die_loc{$pattern} }, \%p;
}

Why isn't my sort working in Perl?

I have never used Perl, but I need to complete this exercise. My task is to sort an array in a few different ways. I've been provided with a test script. This script puts together the array and prints statements for each stage of it's sorting. I've named it foo.pl:
use strict;
use warnings;
use MyIxHash;
my %myhash;
my $t = tie(%myhash, "MyIxHash", 'a' => 1, 'abe' => 2, 'cat'=>'3');
$myhash{b} = 4;
$myhash{da} = 5;
$myhash{bob} = 6;
print join(", ", map { "$_ => $myhash{$_}" } keys %myhash) . " are the starting key => val pairs\n";
$t->SortByKey; # sort alphabetically
print join(", ", map { "$_ => $myhash{$_}" } keys %myhash) . " are the alphabetized key => val pairs\n";
$t->SortKeyByFunc(sub {my ($a, $b) = #_; return ($b cmp $a)}); # sort alphabetically in reverse order
print join(", ", map { "$_ => $myhash{$_}" } keys %myhash) . " are the reverse alphabetized key => val pairs\n";
$t->SortKeyByFunc(\&abcByLength); # use abcByLength to sort
print join(", ", map { "$_ => $myhash{$_}" } keys %myhash) . " are the abcByLength sorted key => val pairs\n";
print "Done\n\n";
sub abcByLength {
my ($a, $b) = #_;
if(length($a) == length($b)) { return $a cmp $b; }
else { return length($a) <=> length($b) }
}
Foo.pl uses a package called MyIxHash which I've created a module for called MyIxHash.pm. The script runs through the alphabetical sort: "SortByKey", which I've inherited via the "IxHash" package in my module. The last two sorts are the ones giving me issues. When the sub I've created: "SortKeyByFunc" is ran on the array, it passes in the array and a subroutine as arguments. I've attempted to take those arguments and associate them with variables.
The final sort is supposed to sort by string length, then alphabetically. A subroutine for this is provided at the bottom of foo.pl as "abcByLength". In the same way as the reverse alpha sort, this subroutine is being passed as a parameter to my SortKeyByFunc subroutine.
For both of these sorts, it seems the actual sorting work is done for me, and I just need to apply this subroutine to my array.
My main issue here seems to be that I don't know how, if possible, to take my subroutine argument and run my array through it as a parameter. I'm a running my method on my array incorrectly?
package MyIxHash;
#use strict;
use warnings;
use parent Tie::IxHash;
use Data::Dumper qw(Dumper);
sub SortKeyByFunc {
#my $class = shift;
my ($a, $b) = #_;
#this is a reference to the already alphabetaized array being passed in
my #letters = $_[0][1];
#this is a reference to the sub being passed in as a parameter
my $reverse = $_[1];
#this is my variable to contain my reverse sorted array
my #sorted = #letters->$reverse();
return #sorted;
}
1;
"My problem occurs where I try: my #sorted = #letters->$reverse(); I've also tried: my #sorted = sort {$reverse} #letters;"
You were really close; the correct syntax is:
my $reverse = sub { $b cmp $a };
# ...
my #sorted = sort $reverse #letters;
Also note that, for what are basically historical reasons, sort passes the arguments to the comparison function in the (slightly) magic globals $a and $b, not in #_, so you don't need to (and indeed shouldn't) do my ($a, $b) = #_; in your sortsubs (unless you declare them with a prototype; see perldoc -f sort for the gritty details).
Edit: If you're given a comparison function that for some reason does expect its arguments in #_, and you can't change the definition of that function, then your best bet is probably to wrap it in a closure like this:
my $fixed_sortsub = sub { $weird_sortsub->($a, $b) };
my #sorted = sort $fixed_sortsub #letters;
or simply:
my #sorted = sort { $weird_sortsub->($a, $b) } #letters;
Edit 2: Ah, I see the/a problem. When you write:
my #letters = $_[0][1];
what you end up with a is a single-element array containing whatever $_[0][1] is, which is presumably an array reference. You should either dereference it immediately, like this:
my #letters = #{ $_[0][1] };
or just keep is as a reference for now and dereference it when you use it:
my $letters = $_[0][1];
# ...
my #sorted = sort $whatever #$letters;
Edit 3: Once you do manage to sort the keys, then, as duskwuff notes in his original answer, you'll also need to call the Reorder() method from your parent class, Tie::IxHash to actually change the order of the keys. Also, the first line:
my ($a, $b) = #_;
is completely out of place in what's supposed to be an object method that takes a code reference (and, in fact, lexicalizing $a and $b is a bad idea anyway if you want to call sort later in the same code block). What it should read is something like:
my ($self, $sortfunc) = #_;
In fact, rather than enumerating all the things that seem to be wrong with your original code, it might be easier to just fix it:
package MyIxHash;
use strict;
use warnings;
use parent 'Tie::IxHash';
sub SortKeyByFunc {
my ($self, $sortfunc) = #_;
my #unsorted = $self->Keys();
my #sorted = sort { $sortfunc->($a, $b) } #unsorted;
$self->Reorder( #sorted );
}
1;
or simply:
sub SortKeyByFunc {
my ($self, $sortfunc) = #_;
$self->Reorder( sort { $sortfunc->($a, $b) } $self->Keys() );
}
(Ps. I now see why the comparison functions were specified as taking their arguments in #_ rather than in the globals $a and $b where sort normally puts them: it's because the comparison functions belong to a different package, and $a and $b are not magical enough to be the same in every package like, say, $_ and #_ are. I guess that could be worked around, but it would take some quite non-trivial trickery with caller.)
(Pps. Please do credit me and duskwuff / Stack Overflow when you hand in your exercise. And good luck with learning Perl — trust me, it'll be a useful skill to have.)
Your SortKeyByFunc method returns the results of sorting the array (#sorted), but it doesn't modify the array "in place". As a result, just calling $t->SortKeyByFunc(...); doesn't end up having any visible permanent effects.
You'll need to call $t->Reorder() within your SortKeyByFunc method to have any lasting impact on the array. I haven't tried it, but something like:
$t->Reorder(#sorted);
at the end of your method may be sufficient.

How to get the top keys from a hash by value

I have a hash that I sorted by values greatest to least. How would I go about getting the top 5? There was a post on here that talked about getting only one value.
What is the easiest way to get a key with the highest value from a hash in Perl?
I understand that so would lets say getting those values add them to an array and delete the element in the hash and then do the process again?
Seems like there should be an easier way to do this then that though.
My hash is called %words.
Edited Took out code as the question answered without really needing it.
Your question is how to get the five highest values from your hash. You have this code:
my #keys = sort {
$words{$b} <=> $words{$a}
or
"\L$a" cmp "\L$b"
} keys %words;
Where you have your sorted hash keys. Take the five top keys from there?
my #highest = splice #keys, 0, 5; # also deletes the keys from the array
my #highest = #keys[0..4]; # non-destructive solution
Also some comments on your code:
open( my $filehandle0, '<', $file0 ) || die "Could not open $file0\n";
It is a good idea to include the error message $! in your die statement to get valuable information for why the open failed.
for (#words) {
s/[\,|\.|\!|\?|\:|\;|\"]//g;
}
Like I said in the comment, you do not need to escape characters or use alternations in a character class bracket. Use either:
s/[,.!?:;"]//g for #words; #or
tr/,.!?:;"//d for #words;
This next part is a bit odd.
my #stopwords;
while ( my $line = <$filehandle1> ) {
chomp $line;
my #linearray = split( " ", $line );
push( #stopwords, #linearray );
}
for my $w ( my #stopwords ) {
s/\b\Q$w\E\B//ig;
}
You read in the stopwords from a file... and then you delete the stopwords from $_? Are you even using $_ at this point? Moreover, you are redeclaring the #stopwords array in the loop header, which will effectively mean your new array will be empty, and your loop will never run. This error is silent, it seems, so you might never notice.
my %words = %words_count;
Here you make a copy of %words_count, which seems to be redundant, since you never use it again. If you have a big hash, this can decrease performance.
my $key_count = 0;
$key_count = keys %words;
This can be done in one line: my $key_count = keys %words. More readable, in my opinion.
$value_count = $words{$key} + $value_count;
Can also be abbreviated with the += operator: $value_cont += $words{$key}
It is very good that you use strict and warnings.
If performance isn't a big deal
(sort {$words{$a} <=> $words{$b}} keys %words)[0..4])
if you absolutely need killer speed, a selection sort which terminates after 5 iterations is probably the best thing for you.
my #results;
for (0..4) {
my $maxkey;
my $max = 0;
for my $key (keys %words){
if ($max < $words{$key}){
$maxkey = $key;
$max = $words{$key};
}
}
push #results, $maxkey;
delete $words{$maxkey};
}
say join(","=>#results);
There's CPAN module for that, Sort::Key::Top.
It has a straight-forward interface and an efficient XS implementation:
use Sort::Key::Top qw(rnkeytop);
my #results = rnkeytop { $words{$_} } 5 => keys %words;

Can I copy a hash without resetting its "each" iterator?

I am using each to iterate through a Perl hash:
while (my ($key,$val) = each %hash) {
...
}
Then something interesting happens and I want to print out the hash. At first I consider something like:
while (my ($key,$val) = each %hash) {
if (something_interesting_happens()) {
foreach my $k (keys %hash) { print "$k => $hash{$k}\n" }
}
}
But that won't work, because everyone knows that calling keys (or values) on a hash resets the internal iterator used for each, and we may get an infinite loop. For example, these scripts will run forever:
perl -e '%a=(foo=>1); while(each %a){keys %a}'
perl -e '%a=(foo=>1); while(each %a){values %a}'
No problem, I thought. I could make a copy of the hash, and print out the copy.
if (something_interesting_happens()) {
%hash2 = %hash;
foreach my $k (keys %hash2) { print "$k => $hash2{$k}\n" }
}
But that doesn't work, either. This also resets the each iterator. In fact, any use of %hash in a list context seems to reset its each iterator. So these run forever, too:
perl -e '%a=(foo=>1); while(each %a){%b = %a}'
perl -e '%a=(foo=>1); while(each %a){#b = %a}'
perl -e '%a=(foo=>1); while(each %a){print %a}'
Is this documented anywhere? It makes sense that perl might need to use the same internal iterator to push a hash's contents onto a return stack, but I can also imagine hash implementations that didn't need to do that.
More importantly, is there any way to do what I want? To get to all the elements of a hash without resetting the each iterator?
This also suggests you can't debug a hash inside an each iteration, either. Consider running the debugger on:
%a = (foo => 123, bar => 456);
while ( ($k,$v) = each %a ) {
$DB::single = 1;
$o .= "$k,$v;";
}
print $o;
Just by inspecting the hash where the debugger stops (say, typing p %a or x %a), you will change the output of the program.
Update: I uploaded Hash::SafeKeys as a general solution to this problem. Thanks #gpojd for pointing me in the right direction and #cjm for a suggestion that made the solution much simpler.
Have you tried Storable's dclone to copy it? It would probably be something like this:
use Storable qw(dclone);
my %hash_copy = %{ dclone( \%hash ) };
How big is this hash? How long does it take to iterate through it, such that you care about the timing of the access?
Just set a flag and do the action after the end of the iteration:
my $print_it;
while (my ($key,$val) = each %hash) {
$print_it = 1 if something_interesting_happens();
...
}
if ($print_it) {
foreach my $k (keys %hash) { print "$k => $hash{$k}\n" }
}
Although there's no reason not to use each in the printout code, too, unless you were planning on sorting by key or something.
Let's not forget that keys %hash is already defined when you enter the while loop. One could have simply saved the keys into an array for later use:
my #keys = keys %hash;
while (my ($key,$val) = each %hash) {
if (something_interesting_happens()) {
print "$_ => $hash{$_}\n" for #keys;
}
}
Downside:
It's less elegant (subjective)
It won't work if %hash is modified (but then why would one use each in the first place?)
Upside:
It uses less memory by avoiding hash-copying
Not really. each is incredibly fragile. It stores iteration state on the iterated hash itself, state which is reused by other parts of perl when they need it. Far safer is to forget that it exists, and always iterate your own list from the result of keys %hash instead, because the iteration state over a list is stored lexically as part of the for loop itself, so is immune from corruption by other things.

Is there a simple way to validate a hash of hash element comparsion?

Is there a simple way to validate a hash of hash element comparsion ?
I need to validate a Perl hash of hash element $Table{$key1}{$key2}{K1}{Value} compare to all other elements in hash
third key will be k1 to kn and i want comprare those elements and other keys are same
if ($Table{$key1}{$key2}{K1}{Value} eq $Table{$key1}{$key2}{K2}{Value}
eq $Table{$key1}{$key2}{K3}{Value} )
{
#do whatever
}
Something like this may work:
use List::MoreUtils 'all';
my #keys = map "K$_", 1..10;
print "All keys equal"
if all { $Table{$key1}{$key2}{$keys[1]}{Value} eq $Table{$key1}{$key2}{$_}{Value} } #keys;
I would use Data::Dumper to help with a task like this, especially for a more general problem (where the third key is more arbitrary than 'K1'...'Kn'). Use Data::Dumper to stringify the data structures and then compare the strings.
use Data::Dumper;
# this line is needed to assure that hashes with the same keys output
# those keys in the same order.
$Data::Dumper::Sortkeys = 1;
my $string1= Data::Dumper->Dump($Table{$key1}{$key2}{k1});
for ($n=2; exists($Table{$key1}{$key2}{"k$n"}; $n++) {
my $string_n = Data::Dumper->Dump($Table{$key1}{$key2}{"k$n"});
if ($string1 ne $string_n) {
warn "key 'k$n' is different from 'k1'";
}
}
This can be used for the more general case where $Table{$key1}{$key2}{k7}{value} itself contains a complex data structure. When a difference is detected, though, it doesn't give you much help figuring out where that difference is.
A fairly complex structure. You should be looking into using object oriented programming techniques. That would greatly simplify your programming and the handling of these complex structures.
First of all, let's simplify a bit. When you say:
$Table{$key1}{$key2}{k1}{value}
Do you really mean:
my $value = $Table{$key1}->{$key2}->{k1};
or
my $actual_value = $Table{$key1}->{$key2}->{k1}->{Value};
I'm going to assume the first one. If I'm wrong, let me know, and I'll update my answer.
Let's simplify:
my %hash = %{$Table{$key1}->{$key2}};
Now, we're just dealing with a hash. There are two techniques you can use:
Sort the keys of this hash by value, then if two keys have the same value, they will be next to each other in the sorted list, making it easy to detect duplicates. The advantage is that all the duplicate keys would be printed together. The disadvantage is that this is a sort which takes time and resources.
Reverse the hash, so it's keyed by value and the value of that key is the key. If a key already exists, we know the other key has a duplicate value. This is faster than the first technique because no sorting is involved. However, duplicates will be detected, but not printed together.
Here's the first technique:
my %hash = %{$Table{$key1}->{$key2}};
my $previous_value;
my $previous_key;
foreach my $key (sort {$hash{$a} cmp $hash{$b}} keys %hash) {
if (defined $previous_key and $previous_value eq $hash{$key}) {
print "\$hash{$key} is a duplicate of \$hash{$previous_key}\n";
}
$previous_value = $hash{$key};
$previous_key = $key;
}
And the second:
my %hash = %{$Table{$key1}->{$key2}};
my %reverse_hash;
foreach $key (keys %hash) {
my $value = $hash{$key};
if (exists $reverse_hash{$value}) {
print "\$hash{$reverse_hash{$value}} has the same value as \$hash{$key}\n";
}
else {
$reverse_hash{$value} = $key;
}
}
Alternative approach to the problem is make utility function which will compare all keys if has same value returned from some function for all keys:
sub AllSame (&\%) {
my ($c, $h) = #_;
my #k = keys %$h;
my $ref;
$ref = $c->() for $h->{shift #k};
$ref ne $c->() and return for #$h{#k};
return 1
}
print "OK\n" if AllSame {$_->{Value}} %{$Table{$key1}{$key2}};
But if you start thinking in this way you can found this approach much more generic (recommended way):
sub AllSame (#) {
my $ref = shift;
$ref ne $_ and return for #_;
return 1
}
print "OK\n" if AllSame map {$_->{Value}} values %{$Table{$key1}{$key2}};
If mapping operation is expensive you can make lazy counterpart of same:
sub AllSameMap (&#) {
my $c = shift;
my $ref;
$ref = $c->() for shift;
$ref ne $c->() and return for #_;
return 1
}
print "OK\n" if AllSameMap {$_->{Value}} values %{$Table{$key1}{$key2}};
If you want only some subset of keys you can use hash slice syntax e.g.:
print "OK\n" if AllSame map {$_->{Value}} #{$Table{$key1}{$key2}}{map "K$_", 1..10};