In Perl, how do I process an entire hash? - perl

I would like to process all elements of a hash table in Perl. How can I do that?
This is a question from the official perlfaq. We're importing the perlfaq to Stack Overflow.

(This is the official perlfaq answer, minus any subsequent edits)
There are a couple of ways that you can process an entire hash. You can get a list of keys and then go through each key, or you can grab one key-value pair at a time.
To go through all of the keys, use the keys function. This extracts all of the keys of the hash and gives them back to you as a list. You can then get the value through the particular key you're processing:
foreach my $key ( keys %hash ) {
my $value = $hash{$key};
...
}
Once you have the list of keys, you can process that list before you process the hash elements. For instance, you can sort the keys so you can process them in lexical order:
foreach my $key ( sort keys %hash ) {
my $value = $hash{$key};
...
}
Or, you might want to only process some of the items. If you only want to deal with the keys that start with text:, you can select just those using grep:
foreach my $key ( grep /^text:/, keys %hash ) {
my $value = $hash{$key};
...
}
If the hash is very large, you might not want to create a long list of keys. To save some memory, you can grab one key-value pair at a time using each(), which returns a pair you haven't seen yet:
while( my( $key, $value ) = each( %hash ) ) {
...
}
The each operator returns the pairs in apparently random order, so if ordering matters to you, you'll have to stick with the keys method.
The each() operator can be a bit tricky though. You can't add or delete keys of the hash while you're using it without possibly skipping or re-processing some pairs after Perl internally rehashes all of the elements. Additionally, a hash has only one iterator, so if you use keys, values, or each on the same hash, you can reset the iterator and mess up your processing. See the each entry in perlfunc for more details.
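A minimal, self-contained sketch that puts the approaches above side by side. The contents of %hash are made up for illustration and are not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical sample data, not from the original answer.
my %hash = (
    'text:title' => 'Hello',
    'text:body'  => 'World',
    'meta:lang'  => 'en',
);

# 1. All keys in lexical order.
my @sorted = sort keys %hash;
print "$_ => $hash{$_}\n" for @sorted;

# 2. Only the keys that start with "text:".
my @text_keys = sort grep { /^text:/ } keys %hash;
print "selected: @text_keys\n";

# 3. One pair at a time with each(); the order is not predictable.
my $pairs = 0;
while ( my ( $key, $value ) = each %hash ) {
    $pairs++;
}
print "each() visited $pairs pairs\n";
```

The keys-based loops build the whole key list first, which is what makes sort and grep possible; the each() loop never holds more than one pair at a time.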

Related

How to get a single key (random ok) from a very large hash in perl?

Suppose you have a very large hash (lots of keys), and have a function that potentially deletes many of those keys, e.g.:
while ( each %in ) {
push @out, $_;
functionThatDeletesOneOrMoreKeys($_, \%in);
}
I believe each in this case is an efficient way to pull a single key from the hash, but the documentation says each should not be used when deleting keys from the hash.
Otherwise I could use while (%in) { $_ = (keys(%in))[0] .... but that seems horribly inefficient for a very large hash.
Is there a better way to do this?
This seems to be a horrible thing to do, and it would be better if you explained what you were trying to achieve.
However, the problem with deleting hash elements while iterating with each is that each holds a state value for the hash, and that state depends on the hash remaining unchanged.
You can clear that state by calling keys (or values) on the same hash.
Here's an example that deletes three elements, the one with the given key plus those before and after it, to emulate what your function_that_deletes_one_or_more_keys (which is what it should be called) might do:
use strict;
use warnings 'all';
use feature 'say';
my %h = map +( $_ => 1), 0 .. 9;
while ( my $key = each %h ) {
say $key;
delete @h{$key-1, $key, $key+1};
keys %h; # Reset "each" state
}
I recommend that you don't use the global $_ variable for this
output
2
9
4
6
0
General advice without knowing details of the complexity to which you refer:
Change the function lines that delete keys to instead store the keys to be deleted in an array or hash (or file). After the first loop completes, loop through that array or hash (or file) and delete those keys from the first hash.
my @in_keys = keys %in;
for (@in_keys) {
if (exists $in{$_}) {
...
}
}
Or do as Borodin shows, resetting the iterator each time you know you've deleted at least one element.
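The "record now, delete later" idea can be sketched as follows. The helper keys_to_delete is a hypothetical stand-in for the asker's function (which is not shown in the question), and the data is made up; the point is only that no key is removed while each is still running.

```perl
use strict;
use warnings;

my %in = map { $_ => 1 } 0 .. 9;    # hypothetical data

# Hypothetical stand-in for the asker's function: instead of deleting,
# it returns the keys it would like removed.
sub keys_to_delete {
    my ($key) = @_;
    return grep { $_ >= 0 && $_ <= 9 } ( $key - 1, $key, $key + 1 );
}

my ( @out, %doomed );

# First pass: iterate safely, only recording what should go.
while ( my ($key) = each %in ) {
    next if $doomed{$key};          # already scheduled for deletion
    push @out, $key;
    $doomed{$_} = 1 for keys_to_delete($key);
}

# Second pass: the iteration is over, so deleting is safe.
delete @in{ keys %doomed };

print "processed: @{[ sort { $a <=> $b } @out ]}\n";
print "remaining: ", scalar( keys %in ), "\n";
```

The cost is a second hash holding the doomed keys, so this trades some memory for not having to reset the iterator on every pass.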

Dereferencing hashes of hashes in Perl

I'm trying to collect the values that I store in a hash of hashes, but I'm kinda confused in how perl does that. So, I create my hash of hashes as follows:
my %hash;
my @items;
#... some code missing here, generally I'm just populating the @items list
#with $currentitem items
while (<FILE>) { #read the file
($a, $b) = split(/\s+/,$_,-1);
$hash{$currentitem} => {$a => $b};
print $hash{$currentitem}{$a} . "\n";#this is a test and it works
}
The above code seems to work. Now, to the point: I have an array @items, which keeps the $currentitem values. And I want to do something like this:
@test = keys %hash{ $items[$num] };
So that I can get all the key/value pairs for a specific item. I've tried the line of code above, as well as
while ( ($key, $value) = each( $hash{$items[$num]} ) ) {
print "$key, $value\n";
}
I've even tried to populate the hash as follows:
$host{ "$currentcustomer" }{"$a"} => "$b";
Which seems to be more correct according to the various online sources I've met. But still, I can't access the data inside that hash... Any ideas?
I am confused by you saying that this works:
$hash{$currentitem} => {$a => $b};
That shouldn't work (and doesn't work for me). The => operator is a special kind of comma, not an assignment (see perlop). In addition, the construct on the right makes a new anonymous hash. Using that, a new anonymous hash would overwrite the old one for each element you tried to add. You would only ever have one element for each $currentitem.
Here is what you want for assignment:
$hash{$currentitem}{$a} = $b;
And here is how to get the keys:
keys %{ $hash{ $items[$num] } };
I suggest reading up on Perl references to get a better handle on this. The syntax can be a bit tricky at first.
Long answer is in perldoc perldsc.
Short answer is:
keys %{ $expr_producing_hash_ref };
In your case I believe it's
keys %{ $hash{$items[$num]} };
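Putting the assignment and the dereference together, a small runnable sketch might look like this. The item names and the in-memory @lines array are assumptions that stand in for the asker's file, which is not shown.

```perl
use strict;
use warnings;

my %hash;
my @items = ( 'item1', 'item2' );    # hypothetical item names

# Hypothetical input, "item key value" per line, standing in for the file.
my @lines = ( 'item1 color red', 'item1 size large', 'item2 color blue' );

for my $line (@lines) {
    my ( $currentitem, $k, $v ) = split /\s+/, $line;
    $hash{$currentitem}{$k} = $v;    # autovivifies the inner hash
}

my $num  = 0;
my @test = sort keys %{ $hash{ $items[$num] } };
print "@test\n";

# each() needs a real hash as well, so dereference here too.
while ( my ( $key, $value ) = each %{ $hash{ $items[$num] } } ) {
    print "$key, $value\n";
}
```

Because the inner hash is updated in place rather than replaced, item1 keeps both of its entries.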

Is it safe in Perl to delete a key from a hash reference when I loop on the same hash? And why?

I basically want to do this:
foreach my $key (keys %$hash_ref) {
# Do stuff with $key and $hash_ref
# Delete the key from the hash
delete $hash_ref->{$key};
}
Is it safe? And why?
You're not iterating over the hash, you're iterating over the list of keys returned by keys before you even started looping. Keep in mind that
for my $key (keys %$hash_ref) {
...
}
is roughly the same as
my @anon = keys %$hash_ref;
for my $key (@anon) {
...
}
Deleting from the hash causes no problem whatsoever.
each, on the other hand, does iterate over a hash. Each time it's called, each returns a different element. Yet, it's still safe to delete the current element!
# Also safe
while (my ($key) = each(%$hash_ref)) {
...
delete $hash_ref->{$key};
...
}
If you add or delete a hash's elements while iterating over it, entries may be skipped or duplicated--so don't do that. Exception: It is always safe to delete the item most recently returned by each()
It is safe because keys %hash builds the entire list once, before you start iterating. foreach then works through this pre-generated list, no matter what you change inside the actual hash.
It does eat up memory, though, because the entire list is kept until the loop is done.
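A quick way to convince yourself is to delete every key inside the loop and check that each one was still visited. This is only an illustrative sketch with made-up data:

```perl
use strict;
use warnings;

my $hash_ref = { map { $_ => 1 } 'a' .. 'e' };    # hypothetical data

my @seen;
for my $key ( keys %$hash_ref ) {    # the key list is built once, up front
    push @seen, $key;
    delete $hash_ref->{$key};        # does not disturb the list being looped over
}

print "visited: ", scalar(@seen), ", left: ", scalar( keys %$hash_ref ), "\n";
# prints "visited: 5, left: 0"
```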

How can I combine hashes in Perl?

What is the best way to combine both hashes into %hash1? I always know that %hash2 and %hash1 always have unique keys. I would also prefer a single line of code if possible.
$hash1{'1'} = 'red';
$hash1{'2'} = 'blue';
$hash2{'3'} = 'green';
$hash2{'4'} = 'yellow';
Quick Answer (TL;DR)
%hash1 = (%hash1, %hash2)
## or else ...
@hash1{keys %hash2} = values %hash2;
## or with references ...
$hash_ref1 = { %$hash_ref1, %$hash_ref2 };
Overview
Context: Perl 5.x
Problem: The user wishes to merge two hashes[1] into a single variable
Solution
use the syntax above for simple variables
use Hash::Merge for complex nested variables
Pitfalls
What to do when both hashes contain one or more duplicate keys
(see e.g., Perl - Merge hash containing duplicate keys)
(see e.g., Perl hashes: how to deal with duplicate keys and get possible pair)
Should a key-value pair with an empty value ever overwrite a key-value pair with a non-empty value?
What constitutes an empty vs non-empty value in the first place? (e.g. undef, zero, empty string, false, falsy ...)
See also
PM post on merging hashes
PM Categorical Q&A hash union
Perl Cookbook 5.10. Merging Hashes
websearch://perlfaq "merge two hashes"
websearch://perl merge hash
https://metacpan.org/pod/Hash::Merge
Footnotes
[1] aka associative array, aka dictionary
Check out perlfaq4: How do I merge two hashes. There is a lot of good information already in the Perl documentation and you can have it right away rather than waiting for someone else to answer it. :)
Before you decide to merge two hashes, you have to decide what to do if both hashes contain keys that are the same and if you want to leave the original hashes as they were.
If you want to preserve the original hashes, copy one hash (%hash1) to a new hash (%new_hash), then add the keys from the other hash (%hash2) to the new hash. Checking that the key already exists in %new_hash gives you a chance to decide what to do with the duplicates:
my %new_hash = %hash1; # make a copy; leave %hash1 alone
foreach my $key2 ( keys %hash2 )
{
if( exists $new_hash{$key2} )
{
warn "Key [$key2] is in both hashes!";
# handle the duplicate (perhaps only warning)
...
next;
}
else
{
$new_hash{$key2} = $hash2{$key2};
}
}
If you don't want to create a new hash, you can still use this looping technique; just change the %new_hash to %hash1.
foreach my $key2 ( keys %hash2 )
{
if( exists $hash1{$key2} )
{
warn "Key [$key2] is in both hashes!";
# handle the duplicate (perhaps only warning)
...
next;
}
else
{
$hash1{$key2} = $hash2{$key2};
}
}
If you don't care that one hash overwrites keys and values from the other, you could just use a hash slice to add one hash to another. In this case, values from %hash2 replace values from %hash1 when they have keys in common:
@hash1{ keys %hash2 } = values %hash2;
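To see the overwrite behaviour of the hash slice, here is a small sketch in which the two hashes deliberately share one key. The data is hypothetical and differs from the question, where the keys are known to be unique.

```perl
use strict;
use warnings;

my %hash1 = ( 1 => 'red',   2 => 'blue' );
my %hash2 = ( 2 => 'green', 3 => 'yellow' );    # key 2 collides on purpose

# keys and values return their lists in the same order for an unmodified hash,
# so each key lines up with its own value.
@hash1{ keys %hash2 } = values %hash2;

print join( ', ', map { "$_ => $hash1{$_}" } sort keys %hash1 ), "\n";
# prints "1 => red, 2 => green, 3 => yellow"
```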
This is an old question, but comes out high in my Google search for 'perl merge hashes' - and yet it does not mention the very helpful CPAN module Hash::Merge
For hash references, you should use curly braces, like this:
$hash_ref1 = {%$hash_ref1, %$hash_ref2};
and not parentheses, as suggested in the answer above:
$hash_ref1 = ($hash_ref1, $hash_ref2);
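The difference is easy to check. In the sketch below (variable names $merged and $not_merged are illustrative), the parenthesized form is just the comma operator in scalar context, so it hands back the last reference instead of merging anything.

```perl
use strict;
use warnings;

my $hash_ref1 = { 1 => 'red' };
my $hash_ref2 = { 3 => 'green' };

# Curly braces build a new anonymous hash from both flattened lists.
my $merged = { %$hash_ref1, %$hash_ref2 };

# Parentheses in a scalar assignment are the comma operator:
# the result is the last operand, so nothing is merged.
my $not_merged;
{
    no warnings 'void';    # Perl would otherwise warn about the discarded operand
    $not_merged = ( $hash_ref1, $hash_ref2 );
}

print scalar( keys %$merged ), " keys after merging\n";
print "parentheses gave back \$hash_ref2\n" if $not_merged == $hash_ref2;
```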

What's the safest way to iterate through the keys of a Perl hash?

If I have a Perl hash with a bunch of (key, value) pairs, what is the preferred method of iterating through all the keys? I have heard that using each may in some way have unintended side effects. So, is that true, and is one of the two following methods best, or is there a better way?
# Method 1
while (my ($key, $value) = each(%hash)) {
# Something
}
# Method 2
foreach my $key (keys(%hash)) {
# Something
}
The rule of thumb is to use the function most suited to your needs.
If you just want the keys and do not plan to ever read any of the values, use keys():
foreach my $key (keys %hash) { ... }
If you just want the values, use values():
foreach my $val (values %hash) { ... }
If you need the keys and the values, use each():
keys %hash; # reset the internal iterator so a prior each() doesn't affect the loop
while(my($k, $v) = each %hash) { ... }
If you plan to change the keys of the hash in any way except for deleting the current key during the iteration, then you must not use each(). For example, this code to create a new set of uppercase keys with doubled values works fine using keys():
%h = (a => 1, b => 2);
foreach my $k (keys %h)
{
$h{uc $k} = $h{$k} * 2;
}
producing the expected resulting hash:
(a => 1, A => 2, b => 2, B => 4)
But using each() to do the same thing:
%h = (a => 1, b => 2);
keys %h;
while(my($k, $v) = each %h)
{
$h{uc $k} = $h{$k} * 2; # BAD IDEA!
}
produces incorrect results in hard-to-predict ways. For example:
(a => 1, A => 2, b => 2, B => 8)
This, however, is safe:
keys %h;
while(my($k, $v) = each %h)
{
if(...)
{
delete $h{$k}; # This is safe
}
}
All of this is described in the perl documentation:
% perldoc -f keys
% perldoc -f each
One thing you should be aware of when using each is that it has
the side effect of adding "state" to your hash (the hash has to remember
what the "next" key is). When using code like the snippets posted above,
which iterate over the whole hash in one go, this is usually not a
problem. However, you will run into hard to track down problems (I speak from
experience ;), when using each together with statements like
last or return to exit from the while ... each loop before you
have processed all keys.
In this case, the hash will remember which keys it has already returned, and
when you use each on it the next time (maybe in a totally unrelated piece of
code), it will continue at this position.
Example:
my %hash = ( foo => 1, bar => 2, baz => 3, quux => 4 );
# find key 'baz'
while ( my ($k, $v) = each %hash ) {
print "found key $k\n";
last if $k eq 'baz'; # found it!
}
# later ...
print "the hash contains:\n";
# iterate over all keys:
while ( my ($k, $v) = each %hash ) {
print "$k => $v\n";
}
This prints:
found key bar
found key baz
the hash contains:
quux => 4
foo => 1
What happened to keys "bar" and "baz"? They're still there, but the
second each starts where the first one left off, and stops when it reaches the end of the hash, so we never see them in the second loop.
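One common way to guard against this is to reset the iterator with keys before starting a new each loop. A minimal sketch, reusing the same sample hash:

```perl
use strict;
use warnings;

my %hash = ( foo => 1, bar => 2, baz => 3, quux => 4 );

# Leave the first loop early, which leaves the iterator mid-hash.
while ( my ( $k, $v ) = each %hash ) {
    last;
}

keys %hash;    # called in void context only to reset the iterator

my $count = 0;
while ( my ( $k, $v ) = each %hash ) {
    $count++;
}
print "second loop saw $count of ", scalar( keys %hash ), " pairs\n";
# prints "second loop saw 4 of 4 pairs"
```

Without the keys %hash line, the second loop would start wherever the first one stopped and report fewer than four pairs.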
The place where each can cause you problems is that it's a true, non-scoped iterator. By way of example:
while ( my ($key,$val) = each %a_hash ) {
print "$key => $val\n";
last if $val; #exits loop when $val is true
}
# but "each" hasn't reset!!
while ( my ($key,$val) = each %a_hash ) {
# continues where the last loop left off
print "$key => $val\n";
}
If you need to be sure that each gets all the keys and values, you need to make sure you use keys or values first (as that resets the iterator). See the documentation for each.
Using the each syntax will prevent the entire set of keys from being generated at once. This can be important if you're using a tie-ed hash to a database with millions of rows. You don't want to generate the entire list of keys all at once and exhaust your physical memory. In this case each serves as an iterator whereas keys actually generates the entire array before the loop starts.
So, the only place "each" is of real use is when the hash is very large compared to the memory available. That is only likely to happen when the hash does not live in memory at all, unless you're programming a handheld data collection device or something else with little memory.
If memory is not an issue, the map or keys paradigm is usually the more prevalent and easier to read.
A few miscellaneous thoughts on this topic:
There is nothing unsafe about any of the hash iterators themselves. What is unsafe is modifying the keys of a hash while you're iterating over it. (It's perfectly safe to modify the values.) The only potential side-effect I can think of is that values returns aliases which means that modifying them will modify the contents of the hash. This is by design but may not be what you want in some circumstances.
John's accepted answer is good with one exception: the documentation is clear that it is not safe to add keys while iterating over a hash. It may work for some data sets but will fail for others depending on the hash order.
As already noted, it is safe to delete the last key returned by each. This is not true for keys as each is an iterator while keys returns a list.
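The aliasing behaviour of values mentioned above can be seen in a short sketch (the data is made up): modifying the loop variable changes the hash itself.

```perl
use strict;
use warnings;

my %h = ( a => 1, b => 2 );    # hypothetical data

# Each element returned by values() is an alias to the stored value.
$_ *= 10 for values %h;

print join( ', ', map { "$_ => $h{$_}" } sort keys %h ), "\n";
# prints "a => 10, b => 20"
```

If you want to work on copies instead, assign the list to an array first, for example my @copy = values %h;.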
I always use method 2 as well. The only benefit of using each is if you're just reading (rather than re-assigning) the value of the hash entry, you're not constantly de-referencing the hash.
I may get bitten by this one, but I think that it's personal preference. I can't find any reference in the docs to each() being different from keys() or values() (other than the obvious "they return different things" answer). In fact, the docs state that they use the same iterator, that they all return actual list values instead of copies of them, and that modifying the hash while iterating over it using any call is bad.
All that said, I almost always use keys() because to me it is usually more self documenting to access the key's value via the hash itself. I occasionally use values() when the value is a reference to a large structure and the key to the hash was already stored in the structure, at which point the key is redundant and I don't need it. I think I've used each() 2 times in 10 years of Perl programming and it was probably the wrong choice both times =)
I usually use keys and I can't think of the last time I used or read a use of each.
Don't forget about map, depending on what you're doing in the loop!
map { print "$_ => $hash{$_}\n" } keys %hash;
I would say:
Use whatever's easiest to read/understand for most people (so keys, usually, I'd argue)
Use whatever you decide consistently throughout the whole code base.
This gives two major advantages:
It's easier to spot "common" code so you can refactor it into functions/methods.
It's easier for future developers to maintain.
I don't think it's more expensive to use keys over each, so no need for two different constructs for the same thing in your code.