Adding multiple values to key in perl hash - perl

I need to create multi-dimensional hash.
for example I have done:
$hash{gene} = $mrna;
if (exists ($exon)){
$hash{gene}{$mrna} = $exon;
}
if (exists ($cds)){
$hash{gene}{$mrna} = $cds;
}
where $gene, $mrna, $exon, $cds are unique ids.
But, my issue is that I want some properties of $gene and $mrna to be included in the hash.
for example:
$hash{$gene}{'start_loc'} = $start;
$hash{gene}{mrna}{'start_loc'} = $start;
etc. But, is that a feasible way of declaring a hash? If I call $hash{$gene} both $mrna and start_loc will be printed. What could be the solution?
How would I add multiple values for the same key $gene and $mrna being the keys in this case.
Any suggestions will be appreciated.

What you need to do is to read the Perl Reference Tutorial.
Simple answer to your question:
Perl hashes can only take a single value to a key. However, that single value can be a reference to a memory location of another hash.
my %hash1 = ( foo => "bar", fu => "bur" }; #First hash
my %hash2;
my $hash{some_key} = \%hash1; #Reference to %hash1
And, there's nothing stopping that first hash from containing a reference to another hash. It's turtles all the way down!.
So yes, you can have a complex and convoluted structure as you like with as many sub-hashes as you want. Or mix in some arrays too.
For various reasons, I prefer the -> syntax when using these complex structures. I find that for more complex structures, it makes it easier to read. However, the main this is it makes you remember these are references and not actual multidimensional structures.
For example:
$hash{gene}->{mrna}->{start_loc} = $start; #Quote not needed in string if key name qualifies as a valid variable name.
The best thing to do is to think of your hash as a structure. For example:
my $person_ref = {}; #Person is a hash reference.
my $person->{NAME}->{FIRST} = "Bob";
my $person->{NAME}->{LAST} = "Rogers";
my $person->{PHONE}->{WORK}->[0] = "555-1234"; An Array Ref. Might have > 1
my $person->{PHONE}->{WORK}->[1] = "555-4444";
my $person->{PHONE}->{CELL}->[0] = "555-4321";
...
my #people;
push #people, $person_ref;
Now, I can load up my #people array with all my people, or maybe use a hash:
my %person;
$person{$bobs_ssn} = $person; #Now, all of Bob's info is index by his SSN.
So, the first thing you need to do is to think of what your structure should look like. What are the fields in your structure? What are the sub-fields? Figure out what your structure should look like, and then setup your hash of hashes to look like that. Figure out exactly how it will be stored and keyed.
Remember, this hash contains references to your genes (or whatever), so you want to choose your keys wisely.
Read the tutorial. Then, try your hand at it. It's not all that complicated to understand. However, it can be a bear to maintain.
When you say use strict;, you give yourself some protection:
my $foo = "bar";
say $Foo; #This won't work!
This won't work because you didn't declare $Foo, you declared $foo. The use stict; can catch variable names that are mistyped, but:
my %var;
$var{foo} = "bar";
say $var{Foo}; #Whoops!
This will not be caught (except maybe that $var{Foo} has not been initialized. The use strict; pragma can't detect mistakes in typing in your keys.
The next step, after you've grown comfortable with references is to move onto object oriented Perl. There's a Tutorial for that too.
All Object Oriented Perl does is to take your hash references, and turns them into objects. Then, it creates subroutines that will help you keep track of manipulating objects. For example:
sub last_name {
my $person = shift; #Don't worry about this for now..
my $last_name = shift;
if ( exists $last_name ) {
my $person->{NAME}->{LAST} = $last_name;
}
return $person->{NAME}->{LAST};
}
When I set my last name using this subroutine ...I mean method, I guarantee that the key will be $person->{NAME}->{LAST} and not $person->{LAST}->{NAME} or $person->{LAST}->{NMAE}. or $person->{last}->{name}.
The main problem isn't learning the mechanisms, but learning to apply them. So, think about exactly how you want to represent your items. This about what fields you want, and how you're going to pull up that information.

You could try pushing each value onto a hash of arrays:
my (#gene, #mrna, #exon, #cds);
my %hash;
push #{ $hash{$gene[$_]} }, [$mrna[$_], $exon[$_], $cds[$_] ] for 0 .. $#gene;
This way gene is the key, with multiple values ($mrna, $exon, $cds) associated with it.
Iterate over keys/values as follows:
for my $key (sort keys %hash) {
print "Gene: $key\t";
for my $value (#{ $hash{$key} } ) {
my ($mrna, $exon, $cds) = #$value; # De-references the array
print "Values: [$mrna], [$exon], [$cds]\n";
}
}
The answer to a question I've asked previously might be of help (Can a hash key have multiple 'subvalues' in perl?).

Related

How to make a particular change in all the keys of a hash?

I have a hash which is like this:
'IRQ_VSAFE_LPM_ASC_0' => '140',
'IRQ_VSAFE_LPM_ASC_1' => '141'.......and so on
I want to replace ASC_ by ASC_1 in all keys in the hash. I tried this:
foreach $_(keys $hash)
{
s/ASC_/ASC_1/g;
}
but it's not working.
You have to delete old keys from the hash and insert new ones,
use strict;
use warnings;
sub rename_keys {
my ($hash, $func) = #_;
my #k1 = my #k2 = keys %$hash;
$func->() for #k2;
#$hash{#k2} = delete #$hash{#k1};
}
my %hash = (
'IRQ_VSAFE_LPM_ASC_0' => '140',
'IRQ_VSAFE_LPM_ASC_1' => '141',
);
rename_keys(\%hash, sub { s/ASC_/ASC_1/ });
The previous answer addressed a way to do what you want. However, it also makes sense to explain why what you tried to do didn't work.
The problem is that the syntax used for working with hashes in Perl can mislead you with its simplicity compared to the actual way the hash works underneath.
What you see in Perl code is simply two pieces of information: a hash key and a corresponding hash value: $myHash{$key} = $value; or even more misleading %myHash = ($key => $value);
However, the way the hashes work, this isn't merely storing the key and a value as a pair, as the code above may lead you into thinking. Instead, a hash is a complicated data structure, in which the key serves as an input into the addressing which is done via a formula (hash function) and an algorithm (to deal with collistions) - the details are well covered on Wikipedia article.
As such, changing a hash key as if it was merely a value isn't enough, because what is stored in the hash isn't just a value - it's a whole data structure with addressing based on that value. Therefore when you change a hash key, it would ALSO change the location of the value in the data structure, and doing that isn't possible without removing the old entry and adding a brand new entry under a new key, which will delete and re-insert the value in the correct place.
A simple way to do this may be to use pairmap from recent List::Util.
use 5.014; # so we can use the /r flag to s///
use List::Util qw( pairmap );
my %new = pairmap { ($a =~ s/ASC_/ASC_1/r) => $b } %oldhash;

Nicer way to test if hash entry exists before assigning it

I'm looking for a nicer way to first "test" if a hash key exists before using it. I'm currently writing a eventlog parser that decodes hex numbers into strings. As I cannot be sure that my decode table contains hex numbers I first need to check if the key exists in a hash before assigning the value to a new variable. So what I'm doing a lot is:
if ($MEL[$i]{type} eq '5024') {
$MEL[$i]{decoded_inline} = $decode_hash{checkpoint}{"$MEL[$i]{raw}[128]"}
if exists ($decode_hash{checkpoint}{"$MEL[$i]{raw}[128]"})
}
What I do not like is that the expression $decode_hash{checkpoint}{"$MEL[$i]{raw}[128]"} is twice in my code. Is there a nicer or shorter version of the line above?
I doubt this qualifies as "nice", but I think it is achieving the goal of not referring to the expression twice. I'm not sure it's worth this pain, mind you:
my $foo = $decode_hash{checkpoint};
my $bar = $MEL[$i]{raw}[128];
if ($MEL[$i]{type} eq '5024') {
$MEL[$i]{decoded_inline} = $foo->{$bar}
if exists ( $foo->{$bar} );
}
Yes there is an easier way. You know that you can only store references in an array or hash, right? Well, there's a neat side effect to that. You can take references to deep hash or array slots and then treat them like scalar references. The unfortunate side-effect is that it autovivifies the slot, but if you're always going to assign to that slot, and just want to do some checking first, it's not a bad way to keep from typing things over and over--as well as repeatedly indexing the structures as well.
my $ref = \$decode_hash{checkpoint}{"$MEL[$i]{raw}[128]"};
unless ( defined( $$ref )) {
...
$$ref = {};
...
}
As long as an existing hash element can't have an undefined value, I would write this
if ($MEL[$i]{type} eq '5024') {
my $value = $decode_hash{checkpoint}{$MEL[$i]{raw}[128]};
$MEL[$i]{decoded_inline} = $value if defined $value;
}
(Note that you shouldn't have the double-quotes around the hash key.)

How do I create a hash of hashes in Perl?

Based on my current understanding of hashes in Perl, I would expect this code to print "hello world." It instead prints nothing.
%a=();
%b=();
$b{str} = "hello";
$a{1}=%b;
$b=();
$b{str} = "world";
$a{2}=%b;
print "$a{1}{str} $a{2}{str}";
I assume that a hash is just like an array, so why can't I make a hash contain another?
You should always use "use strict;" in your program.
Use references and anonymous hashes.
use strict;use warnings;
my %a;
my %b;
$b{str} = "hello";
$a{1}={%b};
%b=();
$b{str} = "world";
$a{2}={%b};
print "$a{1}{str} $a{2}{str}";
{%b} creates reference to copy of hash %b. You need copy here because you empty it later.
Hashes of hashes are tricky to get right the first time. In this case
$a{1} = { %b };
...
$a{2} = { %b };
will get you where you want to go.
See perldoc perllol for the gory details about two-dimensional data structures in Perl.
Short answer: hash keys can only be associated with a scalar, not a hash. To do what you want, you need to use references.
Rather than re-hash (heh) how to create multi-level data structures, I suggest you read perlreftut. perlref is more complete, but it's a bit overwhelming at first.
Mike, Alexandr's is the right answer.
Also a tip. If you are just learning hashes perl has a module called Data::Dumper that can pretty-print your data structures for you, which is really handy when you'd like to check what values your data structures have.
use Data::Dumper;
print Dumper(\%a);
when you print this it shows
$VAR1 = {
'1' => {
'str' => 'hello'
},
'2' => {
'str' => 'world'
}
};
Perl likes to flatten your data structures. That's often a good thing...for example, (#options, "another option", "yet another") ends up as one list.
If you really mean to have one structure inside another, the inner structure needs to be a reference. Like so:
%a{1} = { %b };
The braces denote a hash, which you're filling with values from %b, and getting back as a reference rather than a straight hash.
You could also say
$a{1} = \%b;
but that makes changes to %b change $a{1} as well.
I needed to create 1000 employees records for testing a T&A system. The employee records were stored in a hash where the key was the employee's identity number, and the value was a hash of their name, date of birth, and date of hire etc. Here's how...
# declare an empty hash
%employees = ();
# add each employee to the hash
$employees{$identity} = {gender=>$gender, forename=>$forename, surname=>$surname, dob=>$dob, doh=>$doh};
# dump the hash as CSV
foreach $identity ( keys %employees ){
print "$identity,$employees{$identity}{forename},$employees{$identity}{surname}\n";
}

Should Perl hashes always contain values?

I had an earlier question that received the following response from the noted Perl expert, Perl author and Perl trainer brian d foy:
[If] you're looking for a fixed sequence of characters at the end of each filename. You want to know if that fixed sequence is in a list of sequences that interest you. Store all the extensions in a hash and look in that hash:
my( $extension ) = $filename =~ m/\.([^.]+)$/;
if( exists $hash{$extension} ) { ... }
You don't need to build up a regular expression, and you don't need to go through several possible regex alternations to check every extension you have to examine.
Thanks for the advice brian.
What I now want to know is what is the best practice in a case like the above. Should one only define the keys, which is all I need to achieve what's described above, or should one always define a value as well?
It's usually preferable to set a defined value for every key. The idiomatic value (when you don't care about the value) is 1.
my %hash = map { $_ => 1 } #array;
Doing it this way makes the code the uses the hash slightly simpler because you can use $hash{key} as a Boolean value. If the value can be undefined you need to use the more verbose exists $hash{key} instead.
That said, there are situations where a value of undef is desirable. For example: imagine that you're parsing C header files to extract preprocessor symbols. It would be logical to store these in a hash of name => value pairs.
#define FOO 1
#define BAR
In Perl, this would map to:
my %symbols = ( FOO => 1, BAR => undef);
In C a #define defines a symbol, not a value -- "defined" in C is mapped to "exists" in Perl.
You can't create a hash key without a value. The value can be undef but it will be there. How else would you construct a hash. Or was your question regarding whether the value can be undef? In which case I would say that the value you store there (undef, 1, 0...) is entirely up to you. If a lot of folks are using it then you probably want to store some true value though incase some one else uses if ($hash{$extension}) {...} instead of exists because they weren't paying attention.
undef is a value.
Of course, stuff like that is always depndent on what you are currently doing. But $foo{bar} is just a variable like $bar and I don't see any reason why either one should not be undef every now and then.
PS:
That's why exists exists.
As others have said, the idiomatic solution for a hashset (a hash that only contains keys, not values) is to use 1 as the value because this makes the testing for existence easy. However, there is something to be said for using undef as the value. It will force the users to test for existence with exists which is marginally faster. Of course, you could test for existence with exists even when the value is 1 and avoid the inevitable bugs from users who forget to use exists.
Using undef as a value in hash is more memory efficient than storing 1.
Storing '1' in a Set-hash Considered Harmful
I know using Considered Harmful is considered harmful, but this is bad, almost as bad as unrestrained goto usage.
Ok, I've harped on this in a few comments, but I think I need a full response to demonstrate the issue.
Let's say we have a daemon process that provides back-end inventory control for a shop that sells widgets.
my #items = qw(
widget
thingy
whozit
whatsit
);
my #items_in_stock = qw(
widget
thingy
);
my %in_stock;
my #in_stock(#items_in_stock) = (1) x #items_in_stock; #initialize all keys to 1
sub Process_Request {
my $request = shift;
if( $request eq REORDER ) {
Reorder_Items(\#items, \%in_stock);
}
else {
Error_Response( ILLEGAL_REQUEST );
}
}
sub Reorder_Items{
my $items = shift;
my $in_stock = shift;
# Order items we do not have in-stock.
for my $item ( #$items ) {
Reorder_Item( $item )
if not exists $in_stock->{$item};
}
}
The tool is great, it automatically keeps items in stock. Very nice. Now, the boss asks for automatically generated catalogs of in-stock items. So we modify Process_Request() and add catalog generation.
sub Process_Request {
my $request = shift;
if( $request eq REORDER ) {
Reorder_Items(\#items, \%in_stock);
}
if( $request eq CATALOG ) {
Build_Catalog(\#items, \%in_stock);
}
else {
Error_Response( ILLEGAL_REQUEST );
}
}
sub Build_Catalog {
my $items = shift;
my $in_stock = shift;
my $catalog_response = '';
foreach my $item ( #$items ) {
$catalog_response .= Catalog_Item($item)
if $in_stock->{$item};
}
return $catalog_response;
}
In testing, Build_Catalog() works fine. Hooray, we go live with the app.
Oops. For some reason nothing is being ordered, the company runs out of stock of everything.
The Build_Catalog() routine adds keys to %in_stock, so Reorder_Items() now sees everything as in stock and never makes an order.
Using Hash::Util's lock_hash can help prevent accidental hash modification. If we locked %in_stock before calling Build_Catalog() we would have gotten a fatal error and would never have gone live with the bug.
In summary, it is best to test existence of keys rather than truth of your set-hash values. If you are using existence as a signifier, don't set your values to '1' because that will mask bugs and make them harder to track. Using lock_hash can help catch these problems.
If you must check for the truth of the values, do so in every case.

How can I combine hashes in Perl?

What is the best way to combine both hashes into %hash1? I always know that %hash2 and %hash1 always have unique keys. I would also prefer a single line of code if possible.
$hash1{'1'} = 'red';
$hash1{'2'} = 'blue';
$hash2{'3'} = 'green';
$hash2{'4'} = 'yellow';
Quick Answer (TL;DR)
%hash1 = (%hash1, %hash2)
## or else ...
#hash1{keys %hash2} = values %hash2;
## or with references ...
$hash_ref1 = { %$hash_ref1, %$hash_ref2 };
Overview
Context: Perl 5.x
Problem: The user wishes to merge two hashes1 into a single variable
Solution
use the syntax above for simple variables
use Hash::Merge for complex nested variables
Pitfalls
What do to when both hashes contain one or more duplicate keys
(see e.g., Perl - Merge hash containing duplicate keys)
(see e.g., Perl hashes: how to deal with duplicate keys and get possible pair)
Should a key-value pair with an empty value ever overwrite a key-value pair with a non-empty value?
What constitutes an empty vs non-empty value in the first place? (e.g. undef, zero, empty string, false, falsy ...)
See also
PM post on merging hashes
PM Categorical Q&A hash union
Perl Cookbook 5.10. Merging Hashes
websearch://perlfaq "merge two hashes"
websearch://perl merge hash
https://metacpan.org/pod/Hash::Merge
Footnotes
1 * (aka associative-array, aka dictionary)
Check out perlfaq4: How do I merge two hashes. There is a lot of good information already in the Perl documentation and you can have it right away rather than waiting for someone else to answer it. :)
Before you decide to merge two hashes, you have to decide what to do if both hashes contain keys that are the same and if you want to leave the original hashes as they were.
If you want to preserve the original hashes, copy one hash (%hash1) to a new hash (%new_hash), then add the keys from the other hash (%hash2 to the new hash. Checking that the key already exists in %new_hash gives you a chance to decide what to do with the duplicates:
my %new_hash = %hash1; # make a copy; leave %hash1 alone
foreach my $key2 ( keys %hash2 )
{
if( exists $new_hash{$key2} )
{
warn "Key [$key2] is in both hashes!";
# handle the duplicate (perhaps only warning)
...
next;
}
else
{
$new_hash{$key2} = $hash2{$key2};
}
}
If you don't want to create a new hash, you can still use this looping technique; just change the %new_hash to %hash1.
foreach my $key2 ( keys %hash2 )
{
if( exists $hash1{$key2} )
{
warn "Key [$key2] is in both hashes!";
# handle the duplicate (perhaps only warning)
...
next;
}
else
{
$hash1{$key2} = $hash2{$key2};
}
}
If you don't care that one hash overwrites keys and values from the other, you could just use a hash slice to add one hash to another. In this case, values from %hash2 replace values from %hash1 when they have keys in common:
#hash1{ keys %hash2 } = values %hash2;
This is an old question, but comes out high in my Google search for 'perl merge hashes' - and yet it does not mention the very helpful CPAN module Hash::Merge
For hash references. You should use curly braces like the following:
$hash_ref1 = {%$hash_ref1, %$hash_ref2};
and not the suggested answer above using parenthesis:
$hash_ref1 = ($hash_ref1, $hash_ref2);