How can I combine hashes in Perl? - perl

What is the best way to combine both hashes into %hash1? I always know that %hash2 and %hash1 always have unique keys. I would also prefer a single line of code if possible.
$hash1{'1'} = 'red';
$hash1{'2'} = 'blue';
$hash2{'3'} = 'green';
$hash2{'4'} = 'yellow';

Quick Answer (TL;DR)
%hash1 = (%hash1, %hash2)
## or else ...
#hash1{keys %hash2} = values %hash2;
## or with references ...
$hash_ref1 = { %$hash_ref1, %$hash_ref2 };
Overview
Context: Perl 5.x
Problem: The user wishes to merge two hashes1 into a single variable
Solution
use the syntax above for simple variables
use Hash::Merge for complex nested variables
Pitfalls
What do to when both hashes contain one or more duplicate keys
(see e.g., Perl - Merge hash containing duplicate keys)
(see e.g., Perl hashes: how to deal with duplicate keys and get possible pair)
Should a key-value pair with an empty value ever overwrite a key-value pair with a non-empty value?
What constitutes an empty vs non-empty value in the first place? (e.g. undef, zero, empty string, false, falsy ...)
See also
PM post on merging hashes
PM Categorical Q&A hash union
Perl Cookbook 5.10. Merging Hashes
websearch://perlfaq "merge two hashes"
websearch://perl merge hash
https://metacpan.org/pod/Hash::Merge
Footnotes
1 * (aka associative-array, aka dictionary)

Check out perlfaq4: How do I merge two hashes. There is a lot of good information already in the Perl documentation and you can have it right away rather than waiting for someone else to answer it. :)
Before you decide to merge two hashes, you have to decide what to do if both hashes contain keys that are the same and if you want to leave the original hashes as they were.
If you want to preserve the original hashes, copy one hash (%hash1) to a new hash (%new_hash), then add the keys from the other hash (%hash2 to the new hash. Checking that the key already exists in %new_hash gives you a chance to decide what to do with the duplicates:
my %new_hash = %hash1; # make a copy; leave %hash1 alone
foreach my $key2 ( keys %hash2 )
{
if( exists $new_hash{$key2} )
{
warn "Key [$key2] is in both hashes!";
# handle the duplicate (perhaps only warning)
...
next;
}
else
{
$new_hash{$key2} = $hash2{$key2};
}
}
If you don't want to create a new hash, you can still use this looping technique; just change the %new_hash to %hash1.
foreach my $key2 ( keys %hash2 )
{
if( exists $hash1{$key2} )
{
warn "Key [$key2] is in both hashes!";
# handle the duplicate (perhaps only warning)
...
next;
}
else
{
$hash1{$key2} = $hash2{$key2};
}
}
If you don't care that one hash overwrites keys and values from the other, you could just use a hash slice to add one hash to another. In this case, values from %hash2 replace values from %hash1 when they have keys in common:
#hash1{ keys %hash2 } = values %hash2;

This is an old question, but comes out high in my Google search for 'perl merge hashes' - and yet it does not mention the very helpful CPAN module Hash::Merge

For hash references. You should use curly braces like the following:
$hash_ref1 = {%$hash_ref1, %$hash_ref2};
and not the suggested answer above using parenthesis:
$hash_ref1 = ($hash_ref1, $hash_ref2);

Related

How to make a particular change in all the keys of a hash?

I have a hash which is like this:
'IRQ_VSAFE_LPM_ASC_0' => '140',
'IRQ_VSAFE_LPM_ASC_1' => '141'.......and so on
I want to replace ASC_ by ASC_1 in all keys in the hash. I tried this:
foreach $_(keys $hash)
{
s/ASC_/ASC_1/g;
}
but it's not working.
You have to delete old keys from the hash and insert new ones,
use strict;
use warnings;
sub rename_keys {
my ($hash, $func) = #_;
my #k1 = my #k2 = keys %$hash;
$func->() for #k2;
#$hash{#k2} = delete #$hash{#k1};
}
my %hash = (
'IRQ_VSAFE_LPM_ASC_0' => '140',
'IRQ_VSAFE_LPM_ASC_1' => '141',
);
rename_keys(\%hash, sub { s/ASC_/ASC_1/ });
The previous answer addressed a way to do what you want. However, it also makes sense to explain why what you tried to do didn't work.
The problem is that the syntax used for working with hashes in Perl can mislead you with its simplicity compared to the actual way the hash works underneath.
What you see in Perl code is simply two pieces of information: a hash key and a corresponding hash value: $myHash{$key} = $value; or even more misleading %myHash = ($key => $value);
However, the way the hashes work, this isn't merely storing the key and a value as a pair, as the code above may lead you into thinking. Instead, a hash is a complicated data structure, in which the key serves as an input into the addressing which is done via a formula (hash function) and an algorithm (to deal with collistions) - the details are well covered on Wikipedia article.
As such, changing a hash key as if it was merely a value isn't enough, because what is stored in the hash isn't just a value - it's a whole data structure with addressing based on that value. Therefore when you change a hash key, it would ALSO change the location of the value in the data structure, and doing that isn't possible without removing the old entry and adding a brand new entry under a new key, which will delete and re-insert the value in the correct place.
A simple way to do this may be to use pairmap from recent List::Util.
use 5.014; # so we can use the /r flag to s///
use List::Util qw( pairmap );
my %new = pairmap { ($a =~ s/ASC_/ASC_1/r) => $b } %oldhash;

How to get a single key (random ok) from a very large hash in perl?

Suppose you have a very large hash (lots of keys), and have a function that potentially deletes many of those keys, e.g.:
while ( each %in ) {
push #out, $_;
functionThatDeletesOneOrMoreKeys($_, \%in);
}
I believe each in this case is an efficient way to pull a single key from the hash, but the documentation says each should not be used when deleting keys from the hash.
Otherwise I could use while (%in) { $_ = (keys(%in))[0] .... but that seems horribly inefficient for a very large hash.
Is there a better way to do this?
This seems to be a horrible thing to do, and it would be better if you explained what you were trying to achieve
However the problem with deleting hash elements while iterating using each is that each holds a state value for the hash that depends on the hash remaining unchanged
You can clear that state by calling keys (or values) on the same hash.
Here's an example which deletes the three elements with the given key and those before and after it in an attempt to emulate what your function_that_deletes_one_or_more_keys (which is what it should be called) might do
use strict;
use warnings 'all';
use feature 'say';
my %h = map +( $_ => 1), 0 .. 9;
while ( my $key = each %h ) {
say $key;
delete #h{$key-1, $key, $key+1};
keys %h; # Reset "each" state
}
I recommend that you don't use the global $_ variable for this
output
2
9
4
6
0
General advice without knowing details of the complexity to which you refer:
Change the function lines that delete keys to instead store the keys to be deleted in an array or hash (or file). After the first loop completes, loop through that array or hash (or file) and delete those keys from the first hash.
my #in_keys = keys %in;
for (#in_keys) {
if (exists $in{$_}) {
...
}
}
Or do as Borodin shows, resetting the iterator each time you know you've deleted at least one element.

Adding multiple values to key in perl hash

I need to create multi-dimensional hash.
for example I have done:
$hash{gene} = $mrna;
if (exists ($exon)){
$hash{gene}{$mrna} = $exon;
}
if (exists ($cds)){
$hash{gene}{$mrna} = $cds;
}
where $gene, $mrna, $exon, $cds are unique ids.
But, my issue is that I want some properties of $gene and $mrna to be included in the hash.
for example:
$hash{$gene}{'start_loc'} = $start;
$hash{gene}{mrna}{'start_loc'} = $start;
etc. But, is that a feasible way of declaring a hash? If I call $hash{$gene} both $mrna and start_loc will be printed. What could be the solution?
How would I add multiple values for the same key $gene and $mrna being the keys in this case.
Any suggestions will be appreciated.
What you need to do is to read the Perl Reference Tutorial.
Simple answer to your question:
Perl hashes can only take a single value to a key. However, that single value can be a reference to a memory location of another hash.
my %hash1 = ( foo => "bar", fu => "bur" }; #First hash
my %hash2;
my $hash{some_key} = \%hash1; #Reference to %hash1
And, there's nothing stopping that first hash from containing a reference to another hash. It's turtles all the way down!.
So yes, you can have a complex and convoluted structure as you like with as many sub-hashes as you want. Or mix in some arrays too.
For various reasons, I prefer the -> syntax when using these complex structures. I find that for more complex structures, it makes it easier to read. However, the main this is it makes you remember these are references and not actual multidimensional structures.
For example:
$hash{gene}->{mrna}->{start_loc} = $start; #Quote not needed in string if key name qualifies as a valid variable name.
The best thing to do is to think of your hash as a structure. For example:
my $person_ref = {}; #Person is a hash reference.
my $person->{NAME}->{FIRST} = "Bob";
my $person->{NAME}->{LAST} = "Rogers";
my $person->{PHONE}->{WORK}->[0] = "555-1234"; An Array Ref. Might have > 1
my $person->{PHONE}->{WORK}->[1] = "555-4444";
my $person->{PHONE}->{CELL}->[0] = "555-4321";
...
my #people;
push #people, $person_ref;
Now, I can load up my #people array with all my people, or maybe use a hash:
my %person;
$person{$bobs_ssn} = $person; #Now, all of Bob's info is index by his SSN.
So, the first thing you need to do is to think of what your structure should look like. What are the fields in your structure? What are the sub-fields? Figure out what your structure should look like, and then setup your hash of hashes to look like that. Figure out exactly how it will be stored and keyed.
Remember, this hash contains references to your genes (or whatever), so you want to choose your keys wisely.
Read the tutorial. Then, try your hand at it. It's not all that complicated to understand. However, it can be a bear to maintain.
When you say use strict;, you give yourself some protection:
my $foo = "bar";
say $Foo; #This won't work!
This won't work because you didn't declare $Foo, you declared $foo. The use stict; can catch variable names that are mistyped, but:
my %var;
$var{foo} = "bar";
say $var{Foo}; #Whoops!
This will not be caught (except maybe that $var{Foo} has not been initialized. The use strict; pragma can't detect mistakes in typing in your keys.
The next step, after you've grown comfortable with references is to move onto object oriented Perl. There's a Tutorial for that too.
All Object Oriented Perl does is to take your hash references, and turns them into objects. Then, it creates subroutines that will help you keep track of manipulating objects. For example:
sub last_name {
my $person = shift; #Don't worry about this for now..
my $last_name = shift;
if ( exists $last_name ) {
my $person->{NAME}->{LAST} = $last_name;
}
return $person->{NAME}->{LAST};
}
When I set my last name using this subroutine ...I mean method, I guarantee that the key will be $person->{NAME}->{LAST} and not $person->{LAST}->{NAME} or $person->{LAST}->{NMAE}. or $person->{last}->{name}.
The main problem isn't learning the mechanisms, but learning to apply them. So, think about exactly how you want to represent your items. This about what fields you want, and how you're going to pull up that information.
You could try pushing each value onto a hash of arrays:
my (#gene, #mrna, #exon, #cds);
my %hash;
push #{ $hash{$gene[$_]} }, [$mrna[$_], $exon[$_], $cds[$_] ] for 0 .. $#gene;
This way gene is the key, with multiple values ($mrna, $exon, $cds) associated with it.
Iterate over keys/values as follows:
for my $key (sort keys %hash) {
print "Gene: $key\t";
for my $value (#{ $hash{$key} } ) {
my ($mrna, $exon, $cds) = #$value; # De-references the array
print "Values: [$mrna], [$exon], [$cds]\n";
}
}
The answer to a question I've asked previously might be of help (Can a hash key have multiple 'subvalues' in perl?).

In Perl, how do I process an entire hash?

I would like to process all elements of a hash table in Perl. How can I do that?
This is a question from the official perlfaq. We're importing the perlfaq to Stack Overflow.
(This is the official perlfaq answer, minus any subsequent edits)
There are a couple of ways that you can process an entire hash. You can get a list of keys, then go through each key, or grab a one key-value pair at a time.
To go through all of the keys, use the keys function. This extracts all of the keys of the hash and gives them back to you as a list. You can then get the value through the particular key you're processing:
foreach my $key ( keys %hash ) {
my $value = $hash{$key}
...
}
Once you have the list of keys, you can process that list before you process the hash elements. For instance, you can sort the keys so you can process them in lexical order:
foreach my $key ( sort keys %hash ) {
my $value = $hash{$key}
...
}
Or, you might want to only process some of the items. If you only want to deal with the keys that start with text:, you can select just those using grep:
foreach my $key ( grep /^text:/, keys %hash ) {
my $value = $hash{$key}
...
}
If the hash is very large, you might not want to create a long list of keys. To save some memory, you can grab one key-value pair at a time using each(), which returns a pair you haven't seen yet:
while( my( $key, $value ) = each( %hash ) ) {
...
}
The each operator returns the pairs in apparently random order, so if ordering matters to you, you'll have to stick with the keys method.
The each() operator can be a bit tricky though. You can't add or delete keys of the hash while you're using it without possibly skipping or re-processing some pairs after Perl internally rehashes all of the elements. Additionally, a hash has only one iterator, so if you use keys, values, or each on the same hash, you can reset the iterator and mess up your processing. See the each entry in perlfunc for more details.

What's the safest way to iterate through the keys of a Perl hash?

If I have a Perl hash with a bunch of (key, value) pairs, what is the preferred method of iterating through all the keys? I have heard that using each may in some way have unintended side effects. So, is that true, and is one of the two following methods best, or is there a better way?
# Method 1
while (my ($key, $value) = each(%hash)) {
# Something
}
# Method 2
foreach my $key (keys(%hash)) {
# Something
}
The rule of thumb is to use the function most suited to your needs.
If you just want the keys and do not plan to ever read any of the values, use keys():
foreach my $key (keys %hash) { ... }
If you just want the values, use values():
foreach my $val (values %hash) { ... }
If you need the keys and the values, use each():
keys %hash; # reset the internal iterator so a prior each() doesn't affect the loop
while(my($k, $v) = each %hash) { ... }
If you plan to change the keys of the hash in any way except for deleting the current key during the iteration, then you must not use each(). For example, this code to create a new set of uppercase keys with doubled values works fine using keys():
%h = (a => 1, b => 2);
foreach my $k (keys %h)
{
$h{uc $k} = $h{$k} * 2;
}
producing the expected resulting hash:
(a => 1, A => 2, b => 2, B => 4)
But using each() to do the same thing:
%h = (a => 1, b => 2);
keys %h;
while(my($k, $v) = each %h)
{
$h{uc $k} = $h{$k} * 2; # BAD IDEA!
}
produces incorrect results in hard-to-predict ways. For example:
(a => 1, A => 2, b => 2, B => 8)
This, however, is safe:
keys %h;
while(my($k, $v) = each %h)
{
if(...)
{
delete $h{$k}; # This is safe
}
}
All of this is described in the perl documentation:
% perldoc -f keys
% perldoc -f each
One thing you should be aware of when using each is that it has
the side effect of adding "state" to your hash (the hash has to remember
what the "next" key is). When using code like the snippets posted above,
which iterate over the whole hash in one go, this is usually not a
problem. However, you will run into hard to track down problems (I speak from
experience ;), when using each together with statements like
last or return to exit from the while ... each loop before you
have processed all keys.
In this case, the hash will remember which keys it has already returned, and
when you use each on it the next time (maybe in a totaly unrelated piece of
code), it will continue at this position.
Example:
my %hash = ( foo => 1, bar => 2, baz => 3, quux => 4 );
# find key 'baz'
while ( my ($k, $v) = each %hash ) {
print "found key $k\n";
last if $k eq 'baz'; # found it!
}
# later ...
print "the hash contains:\n";
# iterate over all keys:
while ( my ($k, $v) = each %hash ) {
print "$k => $v\n";
}
This prints:
found key bar
found key baz
the hash contains:
quux => 4
foo => 1
What happened to keys "bar" and baz"? They're still there, but the
second each starts where the first one left off, and stops when it reaches the end of the hash, so we never see them in the second loop.
The place where each can cause you problems is that it's a true, non-scoped iterator. By way of example:
while ( my ($key,$val) = each %a_hash ) {
print "$key => $val\n";
last if $val; #exits loop when $val is true
}
# but "each" hasn't reset!!
while ( my ($key,$val) = each %a_hash ) {
# continues where the last loop left off
print "$key => $val\n";
}
If you need to be sure that each gets all the keys and values, you need to make sure you use keys or values first (as that resets the iterator). See the documentation for each.
Using the each syntax will prevent the entire set of keys from being generated at once. This can be important if you're using a tie-ed hash to a database with millions of rows. You don't want to generate the entire list of keys all at once and exhaust your physical memory. In this case each serves as an iterator whereas keys actually generates the entire array before the loop starts.
So, the only place "each" is of real use is when the hash is very large (compared to the memory available). That is only likely to happen when the hash itself doesn't live in memory itself unless you're programming a handheld data collection device or something with small memory.
If memory is not an issue, usually the map or keys paradigm is the more prevelant and easier to read paradigm.
A few miscellaneous thoughts on this topic:
There is nothing unsafe about any of the hash iterators themselves. What is unsafe is modifying the keys of a hash while you're iterating over it. (It's perfectly safe to modify the values.) The only potential side-effect I can think of is that values returns aliases which means that modifying them will modify the contents of the hash. This is by design but may not be what you want in some circumstances.
John's accepted answer is good with one exception: the documentation is clear that it is not safe to add keys while iterating over a hash. It may work for some data sets but will fail for others depending on the hash order.
As already noted, it is safe to delete the last key returned by each. This is not true for keys as each is an iterator while keys returns a list.
I always use method 2 as well. The only benefit of using each is if you're just reading (rather than re-assigning) the value of the hash entry, you're not constantly de-referencing the hash.
I may get bitten by this one but I think that it's personal preference. I can't find any reference in the docs to each() being different than keys() or values() (other than the obvious "they return different things" answer. In fact the docs state the use the same iterator and they all return actual list values instead of copies of them, and that modifying the hash while iterating over it using any call is bad.
All that said, I almost always use keys() because to me it is usually more self documenting to access the key's value via the hash itself. I occasionally use values() when the value is a reference to a large structure and the key to the hash was already stored in the structure, at which point the key is redundant and I don't need it. I think I've used each() 2 times in 10 years of Perl programming and it was probably the wrong choice both times =)
I usually use keys and I can't think of the last time I used or read a use of each.
Don't forget about map, depending on what you're doing in the loop!
map { print "$_ => $hash{$_}\n" } keys %hash;
I woudl say:
Use whatever's easiest to read/understand for most people (so keys, usually, I'd argue)
Use whatever you decide consistently throught the whole code base.
This give 2 major advantages:
It's easier to spot "common" code so you can re-factor into functions/methiods.
It's easier for future developers to maintain.
I don't think it's more expensive to use keys over each, so no need for two different constructs for the same thing in your code.