Dereferencing hashes of hashes in Perl - perl

I'm trying to collect the values that I store in a hash of hashes, but I'm kinda confused in how perl does that. So, I create my hash of hashes as follows:
my %hash;
my @items;
#... some code missing here, generally I'm just populating the @items list
#with $currentitem items
while (<FILE>) { #read the file
($a, $b) = split(/\s+/,$_,-1);
$hash{$currentitem} => {$a => $b};
print $hash{$currentitem}{$a} . "\n";#this is a test and it works
}
The above code seems to work. Now, to the point: I have an array @items, which keeps the $currentitem values. And I want to do something like this:
@test = keys %hash{ $items[$num] };
So that I can get all the key/value pairs for a specific item. I've tried the line of code above, as well as
while ( ($key, $value) = each( $hash{$items[$num]} ) ) {
print "$key, $value\n";
}
I've even tried to populate the hash as follows:
$host{ "$currentcustomer" }{"$a"} => "$b";
Which seems to be more correct according to the various online sources I've met. But still, I can't access the data inside that hash... Any ideas?

I am confused by you saying that this works:
$hash{$currentitem} => {$a => $b};
That shouldn't work (and doesn't work for me). The => operator is a special kind of comma, not an assignment (see perlop). In addition, the construct on the right makes a new anonymous hash. Using that, a new anonymous hash would overwrite the old one for each element you tried to add. You would only ever have one element for each $currentitem.
Here is what you want for assignment:
$hash{$currentitem}{$a} = $b;
And here is how to get the keys:
keys %{ $hash{ $items[$num] } };
I suggest reading up on Perl references to get a better handle on this. The syntax can be a bit tricky at first.
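Putting the assignment and the key lookup together, here is a minimal sketch. It assumes the input is whitespace-separated key/value lines; the sample data and the names @lines and @inner_keys are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the lines read from the file.
my @lines       = ("color red", "size large");
my $currentitem = 'item1';

my %hash;
for my $line (@lines) {
    my ($k, $v) = split /\s+/, $line, -1;
    $hash{$currentitem}{$k} = $v;    # autovivifies the inner hash on first use
}

# $hash{$currentitem} holds a hash reference; %{ ... } dereferences it.
my @inner_keys = sort keys %{ $hash{$currentitem} };
print "$_ => $hash{$currentitem}{$_}\n" for @inner_keys;
```

The same %{ ... } wrapper is what each needs as well: each %{ $hash{$currentitem} }.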

Long answer is in perldoc perldsc.
Short answer is:
keys %{ $expr_producing_hash_ref };
In your case I believe it's
keys %{ $hash{$items[$num]} };

Related

How to make a particular change in all the keys of a hash?

I have a hash which is like this:
'IRQ_VSAFE_LPM_ASC_0' => '140',
'IRQ_VSAFE_LPM_ASC_1' => '141'.......and so on
I want to replace ASC_ by ASC_1 in all keys in the hash. I tried this:
foreach $_(keys $hash)
{
s/ASC_/ASC_1/g;
}
but it's not working.
You have to delete old keys from the hash and insert new ones,
use strict;
use warnings;
sub rename_keys {
my ($hash, $func) = @_;
my @k1 = my @k2 = keys %$hash;
$func->() for @k2;
@$hash{@k2} = delete @$hash{@k1};
}
my %hash = (
'IRQ_VSAFE_LPM_ASC_0' => '140',
'IRQ_VSAFE_LPM_ASC_1' => '141',
);
rename_keys(\%hash, sub { s/ASC_/ASC_1/ });
The previous answer addressed a way to do what you want. However, it also makes sense to explain why what you tried to do didn't work.
The problem is that the syntax used for working with hashes in Perl can mislead you with its simplicity compared to the actual way the hash works underneath.
What you see in Perl code is simply two pieces of information: a hash key and a corresponding hash value: $myHash{$key} = $value; or even more misleading %myHash = ($key => $value);
However, the way hashes work, this isn't merely storing the key and a value as a pair, as the code above may lead you into thinking. Instead, a hash is a complicated data structure in which the key serves as the input to an addressing scheme made up of a formula (the hash function) and an algorithm (to deal with collisions). The details are well covered in the Wikipedia article on hash tables.
As such, changing a hash key as if it was merely a value isn't enough, because what is stored in the hash isn't just a value - it's a whole data structure with addressing based on that value. Therefore when you change a hash key, it would ALSO change the location of the value in the data structure, and doing that isn't possible without removing the old entry and adding a brand new entry under a new key, which will delete and re-insert the value in the correct place.
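The delete-and-reinsert step can be sketched for a single hash as follows. This is only an illustration using the two keys from the question; the variable names $old and $new are made up.

```perl
use strict;
use warnings;

my %hash = (
    IRQ_VSAFE_LPM_ASC_0 => '140',
    IRQ_VSAFE_LPM_ASC_1 => '141',
);

# keys returns a snapshot list, so the loop never visits keys it has just added.
for my $old (keys %hash) {
    (my $new = $old) =~ s/ASC_/ASC_1/;
    next if $new eq $old;
    $hash{$new} = delete $hash{$old};    # remove the old entry, store its value under the new key
}

print join(", ", sort keys %hash), "\n";
```

Note that this in-place version assumes no renamed key collides with a key that is still waiting to be renamed; building a fresh hash avoids that concern.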
A simple way to do this may be to use pairmap from recent List::Util.
use 5.014; # so we can use the /r flag to s///
use List::Util qw( pairmap );
my %new = pairmap { ($a =~ s/ASC_/ASC_1/r) => $b } %oldhash;

Adding multiple values to key in perl hash

I need to create multi-dimensional hash.
for example I have done:
$hash{gene} = $mrna;
if (exists ($exon)){
$hash{gene}{$mrna} = $exon;
}
if (exists ($cds)){
$hash{gene}{$mrna} = $cds;
}
where $gene, $mrna, $exon, $cds are unique ids.
But, my issue is that I want some properties of $gene and $mrna to be included in the hash.
for example:
$hash{$gene}{'start_loc'} = $start;
$hash{gene}{mrna}{'start_loc'} = $start;
etc. But, is that a feasible way of declaring a hash? If I call $hash{$gene} both $mrna and start_loc will be printed. What could be the solution?
How would I add multiple values for the same key $gene and $mrna being the keys in this case.
Any suggestions will be appreciated.
What you need to do is to read the Perl Reference Tutorial.
Simple answer to your question:
Perl hashes can only take a single value to a key. However, that single value can be a reference to a memory location of another hash.
my %hash1 = ( foo => "bar", fu => "bur" ); #First hash
my %hash2;
$hash2{some_key} = \%hash1; #Reference to %hash1
And, there's nothing stopping that first hash from containing a reference to another hash. It's turtles all the way down!.
So yes, you can have a complex and convoluted structure as you like with as many sub-hashes as you want. Or mix in some arrays too.
For various reasons, I prefer the -> syntax when using these complex structures. I find that for more complex structures, it makes the code easier to read. However, the main thing is that it makes you remember these are references and not actual multidimensional structures.
For example:
$hash{gene}->{mrna}->{start_loc} = $start; #Quote not needed in string if key name qualifies as a valid variable name.
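One way to keep a gene's own properties apart from its mRNA entries is to give the children their own sub-hash. The layout below is only a sketch: the key names mrna, start_loc and exons, and the ids gene1 and mrna1, are assumptions made for illustration.

```perl
use strict;
use warnings;

my %hash;
my ($gene, $mrna) = ('gene1', 'mrna1');    # hypothetical ids

$hash{$gene}->{start_loc}                  = 100;                   # property of the gene
$hash{$gene}->{mrna}->{$mrna}->{start_loc} = 120;                   # property of one mRNA
$hash{$gene}->{mrna}->{$mrna}->{exons}     = ['exon1', 'exon2'];    # several values under one key

for my $id (sort keys %{ $hash{$gene}->{mrna} }) {
    my $rec = $hash{$gene}->{mrna}->{$id};
    print "$id starts at $rec->{start_loc}, exons: @{ $rec->{exons} }\n";
}
```

With this shape, keys %{ $hash{$gene}->{mrna} } returns only mRNA ids, and start_loc never shows up among them.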
The best thing to do is to think of your hash as a structure. For example:
my $person_ref = {}; #Person is a hash reference.
$person_ref->{NAME}->{FIRST} = "Bob";
$person_ref->{NAME}->{LAST} = "Rogers";
$person_ref->{PHONE}->{WORK}->[0] = "555-1234"; #An array ref. Might have > 1
$person_ref->{PHONE}->{WORK}->[1] = "555-4444";
$person_ref->{PHONE}->{CELL}->[0] = "555-4321";
...
my @people;
push @people, $person_ref;
Now, I can load up my @people array with all my people, or maybe use a hash:
my %person;
$person{$bobs_ssn} = $person_ref; #Now, all of Bob's info is indexed by his SSN.
So, the first thing you need to do is to think of what your structure should look like. What are the fields in your structure? What are the sub-fields? Figure out what your structure should look like, and then setup your hash of hashes to look like that. Figure out exactly how it will be stored and keyed.
Remember, this hash contains references to your genes (or whatever), so you want to choose your keys wisely.
Read the tutorial. Then, try your hand at it. It's not all that complicated to understand. However, it can be a bear to maintain.
When you say use strict;, you give yourself some protection:
my $foo = "bar";
say $Foo; #This won't work!
This won't work because you didn't declare $Foo, you declared $foo. The use strict; pragma can catch variable names that are mistyped, but:
my %var;
$var{foo} = "bar";
say $var{Foo}; #Whoops!
This will not be caught (except maybe by a warning that $var{Foo} has not been initialized). The use strict; pragma can't detect mistakes in typing in your keys.
The next step, after you've grown comfortable with references is to move onto object oriented Perl. There's a Tutorial for that too.
All Object Oriented Perl does is to take your hash references, and turns them into objects. Then, it creates subroutines that will help you keep track of manipulating objects. For example:
sub last_name {
my $person = shift; #Don't worry about this for now..
my $last_name = shift;
if ( defined $last_name ) {
$person->{NAME}->{LAST} = $last_name;
}
return $person->{NAME}->{LAST};
}
When I set my last name using this subroutine ...I mean method, I guarantee that the key will be $person->{NAME}->{LAST} and not $person->{LAST}->{NAME}, $person->{LAST}->{NMAE}, or $person->{last}->{name}.
The main problem isn't learning the mechanisms, but learning to apply them. So, think about exactly how you want to represent your items. Think about what fields you want, and how you're going to pull up that information.
You could try pushing each value onto a hash of arrays:
my (@gene, @mrna, @exon, @cds);
my %hash;
push @{ $hash{$gene[$_]} }, [$mrna[$_], $exon[$_], $cds[$_] ] for 0 .. $#gene;
This way gene is the key, with multiple values ($mrna, $exon, $cds) associated with it.
Iterate over keys/values as follows:
for my $key (sort keys %hash) {
print "Gene: $key\t";
for my $value (@{ $hash{$key} } ) {
my ($mrna, $exon, $cds) = @$value; # De-references the array
print "Values: [$mrna], [$exon], [$cds]\n";
}
}
The answer to a question I've asked previously might be of help (Can a hash key have multiple 'subvalues' in perl?).

How to manipulate a hash-ref with Perl?

Take a look at this code. After hours of trial and error. I finally got a solution. But have no idea why it works, and to be quite honest, Perl is throwing me for a loop here.
use Data::Diff 'Diff';
use Data::Dumper;
my $out = Diff(\@comparr,\@grabarr);
my @uniq_a;
@temp = ();
my $x = @$out{uniq_a};
foreach my $y (@$x) {
@temp = ();
foreach my $z (@$y) {
push(@temp, $z);
}
push(@uniq_a, [$temp[0], $temp[1], $temp[2], $temp[3]]);
}
Why is it that the only way I can access the elements of the $out array is to pass a hash key into a scalar which has been cast as an array using a for loop? my $x = @$out{uniq_a}; I'm totally confused. I'd really appreciate anyone who can explain what's going on here so I'll know for the future. Thanks in advance.
$out is a hash reference, and you use the dereferencing operator ->{...} to access members of the hash that it refers to, like
$out->{uniq_a}
What you have stumbled on is Perl's hash slice notation, where you use the @ sigil in front of the name of a hash to conveniently extract a list of values from that hash. For example:
%foo = ( a => 123, b => 456, c => 789 );
$foo = { a => 123, b => 456, c => 789 };
print @foo{"b","c"}; # prints 456789 (the values 456 and 789)
print @$foo{"c","a"}; # prints 789123 (the values 789 and 123)
Using hash slice notation with a single element inside the braces, as you do, is not the typical usage and gives you the results you want by accident.
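To make the difference concrete, here is a small sketch comparing element access with a slice. The data in $out is made up and only mimics the shape of a hash reference with uniq_a and same keys.

```perl
use strict;
use warnings;

# Made-up data shaped like a hash reference with a few array-ref values.
my $out = { uniq_a => [ [1, 2], [3, 4] ], same => [] };

my $by_element      = $out->{uniq_a};               # element access: one scalar
my ($by_slice)      = @{$out}{'uniq_a'};            # one-key slice: a list holding one value
my ($same, $uniq_a) = @{$out}{qw(same uniq_a)};     # the usual use of a slice: several keys

print "same reference\n" if $by_element == $by_slice;
```

Both spellings reach the same array reference here, but $out->{uniq_a} states the intent directly, while the slice only happens to work because the list has a single element.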
The Diff function returns a hash reference. You are accessing the element of this hash that has key uniq_a by extracting a one-element slice of the hash, instead of the correct $out->{uniq_a}. Your code should look like this
my $out = Diff(\@comparr, \@grabarr);
my @uniq_a;
my $uniq_a = $out->{uniq_a};
for my $list (@$uniq_a) {
my @temp = @$list;
push @uniq_a, [ @temp[0..3] ];
}
In the documentation for Data::Diff it states:
The value returned is always a hash reference and the hash will have
one or more of the following hash keys: type, same, diff, diff_a,
diff_b, uniq_a and uniq_b
So $out is a reference and you have to access the values through the mentioned keys.

What does it mean when you try to print an array or hash using Perl and you get, Array(0xd3888)?

What does it mean when you try to print an array or hash and you see the following: ARRAY(0xd3888) or HASH(0xd3978)?
EXAMPLE
CODE
my @data = (
['1_TEST','1_T','1_TESTER'],
['2_TEST','2_T','2_TESTER'],
['3_TEST','3_T','3_TESTER'],
['4_TEST','4_T','4_TESTER'],
['5_TEST','5_T','5_TESTER'],
['6_TEST','6_T','^_TESTER']
);
foreach my $line (@data) {
chomp($line);
@random = split(/\|/,$line);
print "".$random[0]."".$random[1]."".$random[2]."","\n";
}
RESULT
ARRAY(0xc1864)
ARRAY(0xd384c)
ARRAY(0xd3894)
ARRAY(0xd38d0)
ARRAY(0xd390c)
ARRAY(0xd3948)
It's hard to tell whether you meant it or not, but the reason why you're getting array references is because you're not printing what you think you are.
You started out right when iterating over the 'rows' of @data with:
foreach my $line (@data) { ... }
However, the next line is a no-go. It seems that you're confusing text strings with an array structure. Yes, each row contains strings, but Perl treats @data as an array, not a string.
split is used to convert strings to arrays. It doesn't operate on arrays! Same goes for chomp (with an irrelevant exception).
What you'll want to do is replace the contents of the foreach loop with the following:
foreach my $line (@data) {
print $line->[0].", ".$line->[1].", ".$line->[2]."\n";
}
You'll notice the -> notation, which is there for a reason. $line refers to an array. It is not an array itself. The -> arrows dereference the array, allowing you access to individual elements of the array referenced by $line.
If you're not comfortable with the idea of dereferencing with arrows (and most beginners usually aren't), you can create a temporary array as shown below and use that instead.
foreach my $line (@data) {
my @random = @{ $line };
print $random[0].", ".$random[1].", ".$random[2]."\n";
}
OUTPUT
1_TEST, 1_T, 1_TESTER
2_TEST, 2_T, 2_TESTER
3_TEST, 3_T, 3_TESTER
4_TEST, 4_T, 4_TESTER
5_TEST, 5_T, 5_TESTER
6_TEST, 6_T, ^_TESTER
A one-liner might go something like print "@$_\n" for @data; (which is a bit OTT), but if you want to just print the array to see what it looks like (say, for debugging purposes), I'd recommend using the Data::Dump module, which pretty-prints arrays and hashes for you without you having to worry about it too much.
Just put use Data::Dump 'dump'; at the beginning of your script, and then dump @data;. As simple as that!
It means you do not have an array; you have a reference to an array.
Note that an array is specified with round brackets - as a list; when you use the square bracket notation, you are creating a reference to an array.
foreach my $line (@data)
{
my @array = @$line;
print "$array[0] - $array[1] - $array[2]\n";
}
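The distinction between the two bracket styles can be checked directly. This is a minimal sketch with made-up variable names; the exact address printed for the reference will differ from run to run.

```perl
use strict;
use warnings;

my @array = ('1_TEST', '1_T', '1_TESTER');    # round brackets: a real array
my $ref   = ['2_TEST', '2_T', '2_TESTER'];    # square brackets: a reference to an anonymous array

my $as_string = "$ref";                       # stringifies to something like ARRAY(0x...)
my $joined    = join ', ', @$ref;             # dereference with @ to reach the elements

print "$as_string\n";
print "$joined\n";
print ref($ref), "\n";                        # ref() reports the kind of reference
```

Printing a reference without dereferencing it is what produces the ARRAY(0x...) lines in the question.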
Illustrating the difference:
my @data = (
['1_TEST','1_T','1_TESTER'],
['2_TEST','2_T','2_TESTER'],
['3_TEST','3_T','3_TESTER'],
['4_TEST','4_T','4_TESTER'],
['5_TEST','5_T','5_TESTER'],
['6_TEST','6_T','^_TESTER']
);
# Original print loop
foreach my $line (@data)
{
chomp($line);
@random = split(/\|/,$line);
print "".$random[0]."".$random[1]."".$random[2]."","\n";
}
# Revised print loop
foreach my $line (@data)
{
my @array = @$line;
print "$array[0] - $array[1] - $array[2]\n";
}
Output
ARRAY(0x62c0f8)
ARRAY(0x649db8)
ARRAY(0x649980)
ARRAY(0x649e48)
ARRAY(0x649ec0)
ARRAY(0x649f38)
1_TEST - 1_T - 1_TESTER
2_TEST - 2_T - 2_TESTER
3_TEST - 3_T - 3_TESTER
4_TEST - 4_T - 4_TESTER
5_TEST - 5_T - 5_TESTER
6_TEST - 6_T - ^_TESTER
You're printing a reference to the hash or array, rather than the contents OF that.
In the particular code you're describing I seem to recall that Perl automagically makes the foreach looping index variable (my $line in your code) into an "alias" (a sort of reference I guess) of the value at each stage through the loop.
So $line is a reference to @data[x] ... which is, at each iteration, some array. To get at one of the elements of @data[0] you'd need the $ sigil (because the elements of the array at @data[0] are scalars). However $line[0] is a reference to some package/global variable that doesn't exist (use warnings; use strict; will tell you that, BTW).
[Edited after Ether pointed out my ignorance]
@data is a list of anonymous array references, each of which contains a list of scalars. Thus you have to use the sort of explicit de-referencing I describe below:
What you need is something more like:
print ${$line}[0], ${$line}[1], ${$line}[2], "\n";
... notice that the ${xxx}[0] is ensuring that the xxx is dereferenced, then indexing is performed on the result of the dereference, which is then extracted as a scalar.
I tested this as well:
print $$line[0], $$line[1], $$line[2], "\n";
... and it seems to work. (However, I think that the first form is more clear, even if it's more verbose).
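For comparison, the arrow form and the two block forms all reach the same element. A minimal sketch, with a made-up row standing in for one element of the data:

```perl
use strict;
use warnings;

my $line = ['1_TEST', '1_T', '1_TESTER'];    # one row, held as an array reference

# Three spellings of "first element of the array that $line refers to".
my @forms = ( $line->[0], ${$line}[0], $$line[0] );

print "@forms\n";
```

Which spelling to use is a matter of readability; they are equivalent.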
Personally I chalk this up to yet another gotchya in Perl.
[Further editorializing] I still count this as a "gotchya." Stuff like this, and the fact that most of the responses to this question have been technically correct while utterly failing to show any effort to actually help the original poster, has once again reminded me why I shifted to Python so many years ago. The code I posted works, of course, and probably accomplishes what the OP was attempting. My explanation was wholly wrong. I saw the word "alias" in the `perlsyn` man page and remembered that there were some funky semantics somewhere there; so I totally missed the part that [...] is creating an anonymous reference. Unless you drink from the Perl Kool-Aid in deep drafts then even the simplest code cannot be explained.

What's the safest way to iterate through the keys of a Perl hash?

If I have a Perl hash with a bunch of (key, value) pairs, what is the preferred method of iterating through all the keys? I have heard that using each may in some way have unintended side effects. So, is that true, and is one of the two following methods best, or is there a better way?
# Method 1
while (my ($key, $value) = each(%hash)) {
# Something
}
# Method 2
foreach my $key (keys(%hash)) {
# Something
}
The rule of thumb is to use the function most suited to your needs.
If you just want the keys and do not plan to ever read any of the values, use keys():
foreach my $key (keys %hash) { ... }
If you just want the values, use values():
foreach my $val (values %hash) { ... }
If you need the keys and the values, use each():
keys %hash; # reset the internal iterator so a prior each() doesn't affect the loop
while(my($k, $v) = each %hash) { ... }
If you plan to change the keys of the hash in any way except for deleting the current key during the iteration, then you must not use each(). For example, this code to create a new set of uppercase keys with doubled values works fine using keys():
%h = (a => 1, b => 2);
foreach my $k (keys %h)
{
$h{uc $k} = $h{$k} * 2;
}
producing the expected resulting hash:
(a => 1, A => 2, b => 2, B => 4)
But using each() to do the same thing:
%h = (a => 1, b => 2);
keys %h;
while(my($k, $v) = each %h)
{
$h{uc $k} = $h{$k} * 2; # BAD IDEA!
}
produces incorrect results in hard-to-predict ways. For example:
(a => 1, A => 2, b => 2, B => 8)
This, however, is safe:
keys %h;
while(my($k, $v) = each %h)
{
if(...)
{
delete $h{$k}; # This is safe
}
}
All of this is described in the perl documentation:
% perldoc -f keys
% perldoc -f each
One thing you should be aware of when using each is that it has
the side effect of adding "state" to your hash (the hash has to remember
what the "next" key is). When using code like the snippets posted above,
which iterate over the whole hash in one go, this is usually not a
problem. However, you will run into hard to track down problems (I speak from
experience ;), when using each together with statements like
last or return to exit from the while ... each loop before you
have processed all keys.
In this case, the hash will remember which keys it has already returned, and
when you use each on it the next time (maybe in a totally unrelated piece of
code), it will continue at this position.
Example:
my %hash = ( foo => 1, bar => 2, baz => 3, quux => 4 );
# find key 'baz'
while ( my ($k, $v) = each %hash ) {
print "found key $k\n";
last if $k eq 'baz'; # found it!
}
# later ...
print "the hash contains:\n";
# iterate over all keys:
while ( my ($k, $v) = each %hash ) {
print "$k => $v\n";
}
This prints:
found key bar
found key baz
the hash contains:
quux => 4
foo => 1
What happened to keys "bar" and "baz"? They're still there, but the
second each starts where the first one left off, and stops when it reaches the end of the hash, so we never see them in the second loop.
The place where each can cause you problems is that it's a true, non-scoped iterator. By way of example:
while ( my ($key,$val) = each %a_hash ) {
print "$key => $val\n";
last if $val; #exits loop when $val is true
}
# but "each" hasn't reset!!
while ( my ($key,$val) = each %a_hash ) {
# continues where the last loop left off
print "$key => $val\n";
}
If you need to be sure that each gets all the keys and values, you need to make sure you use keys or values first (as that resets the iterator). See the documentation for each.
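The reset can be sketched like this; the hash contents are made up, and the early last simply stands in for any code that abandons an each loop partway through.

```perl
use strict;
use warnings;

my %hash = ( foo => 1, bar => 2, baz => 3, quux => 4 );

# Leave an each() loop early; the iterator stays parked partway through the hash.
while ( my ($k, $v) = each %hash ) {
    last;
}

keys %hash;    # calling keys resets the iterator

my @seen;
while ( my ($k, $v) = each %hash ) {
    push @seen, $k;
}
print scalar(@seen), " keys seen\n";
```

Without the keys %hash; line, the second loop would start after whichever key the first loop stopped on and would miss at least one entry.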
Using the each syntax will prevent the entire set of keys from being generated at once. This can be important if you're using a tie-ed hash to a database with millions of rows. You don't want to generate the entire list of keys all at once and exhaust your physical memory. In this case each serves as an iterator whereas keys actually generates the entire array before the loop starts.
So the only place where each is of real use is when the hash is very large compared to the available memory. That is likely to happen only when the hash doesn't live in memory at all (a tied hash, for example), unless you're programming a handheld data collection device or something else with little memory.
If memory is not an issue, the map or keys paradigm is usually the more prevalent and easier-to-read choice.
A few miscellaneous thoughts on this topic:
There is nothing unsafe about any of the hash iterators themselves. What is unsafe is modifying the keys of a hash while you're iterating over it. (It's perfectly safe to modify the values.) The only potential side-effect I can think of is that values returns aliases which means that modifying them will modify the contents of the hash. This is by design but may not be what you want in some circumstances.
John's accepted answer is good with one exception: the documentation is clear that it is not safe to add keys while iterating over a hash. It may work for some data sets but will fail for others depending on the hash order.
As already noted, it is safe to delete the last key returned by each. This is not true for keys as each is an iterator while keys returns a list.
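The aliasing behaviour of values mentioned above can be seen in a few lines. This is a sketch with a made-up two-entry hash; it also shows that the strings returned by keys are copies.

```perl
use strict;
use warnings;

my %h = ( a => 1, b => 2 );

$_ *= 2 for values %h;    # each $_ aliases a stored value, so the hash itself changes

for my $k (keys %h) {
    $k = uc $k;           # only changes the copy returned by keys; the hash keys stay as they were
}

print join(", ", map { "$_ => $h{$_}" } sort keys %h), "\n";
```

So modifying values through values is safe and takes effect in place, while keys can only be changed by deleting and re-inserting entries.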
I always use method 2 as well. The only benefit of using each is if you're just reading (rather than re-assigning) the value of the hash entry, you're not constantly de-referencing the hash.
I may get bitten by this one but I think that it's personal preference. I can't find any reference in the docs to each() being different than keys() or values() (other than the obvious "they return different things" answer). In fact the docs state that they use the same iterator and that they all return actual list values instead of copies of them, and that modifying the hash while iterating over it using any call is bad.
All that said, I almost always use keys() because to me it is usually more self documenting to access the key's value via the hash itself. I occasionally use values() when the value is a reference to a large structure and the key to the hash was already stored in the structure, at which point the key is redundant and I don't need it. I think I've used each() 2 times in 10 years of Perl programming and it was probably the wrong choice both times =)
I usually use keys and I can't think of the last time I used or read a use of each.
Don't forget about map, depending on what you're doing in the loop!
map { print "$_ => $hash{$_}\n" } keys %hash;
I would say:
Use whatever's easiest to read/understand for most people (so keys, usually, I'd argue)
Use whatever you decide consistently throughout the whole code base.
This gives two major advantages:
It's easier to spot "common" code so you can re-factor into functions/methods.
It's easier for future developers to maintain.
I don't think it's more expensive to use keys over each, so no need for two different constructs for the same thing in your code.