How do I create a hash of hashes in Perl?

Based on my current understanding of hashes in Perl, I would expect this code to print "hello world." It instead prints nothing.
%a=();
%b=();
$b{str} = "hello";
$a{1}=%b;
$b=();
$b{str} = "world";
$a{2}=%b;
print "$a{1}{str} $a{2}{str}";
I assume that a hash is just like an array, so why can't I make a hash contain another?

You should always use "use strict;" in your program.
Use references and anonymous hashes.
use strict;
use warnings;
my %a;
my %b;
$b{str} = "hello";
$a{1}={%b};
%b=();
$b{str} = "world";
$a{2}={%b};
print "$a{1}{str} $a{2}{str}";
{%b} creates a reference to a copy of hash %b. You need a copy here because you empty %b afterwards.

Hashes of hashes are tricky to get right the first time. In this case
$a{1} = { %b };
...
$a{2} = { %b };
will get you where you want to go.
See perldoc perllol for the gory details about two-dimensional data structures in Perl.

Short answer: hash keys can only be associated with a scalar, not a hash. To do what you want, you need to use references.
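A minimal sketch of the difference (the keys flat and nested are made up for illustration): assigning %b directly stores only a scalar summary of the hash, whose exact form depends on your Perl version, while { %b } stores a reference to a new anonymous hash holding the same pairs.

```perl
use strict;
use warnings;

my %b = ( str => 'hello' );
my %a;

$a{flat}   = %b;       # %b in scalar context: a summary, not the hash itself
$a{nested} = { %b };   # reference to an anonymous hash with %b's pairs

print "flat is a reference?   ", ( ref $a{flat}   ? 'yes' : 'no' ), "\n";
print "nested is a reference? ", ( ref $a{nested} ? 'yes' : 'no' ), "\n";
print "nested value: $a{nested}{str}\n";    # hello
```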
Rather than re-hash (heh) how to create multi-level data structures, I suggest you read perlreftut. perlref is more complete, but it's a bit overwhelming at first.

Mike, Alexandr's is the right answer.
Also, a tip: if you are just learning hashes, Perl has a module called Data::Dumper that can pretty-print your data structures for you, which is really handy when you want to check what values your data structures hold.
use Data::Dumper;
print Dumper(\%a);
When you print this, it shows:
$VAR1 = {
          '1' => {
                   'str' => 'hello'
                 },
          '2' => {
                   'str' => 'world'
                 }
        };

Perl likes to flatten your data structures. That's often a good thing; for example, (@options, "another option", "yet another") ends up as one list.
If you really mean to have one structure inside another, the inner structure needs to be a reference. Like so:
$a{1} = { %b };
The braces create an anonymous hash, which you fill with the key/value pairs from %b, and you get back a reference to it rather than a plain hash.
You could also say
$a{1} = \%b;
but then any later change to %b also changes $a{1}, since both refer to the same hash.
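A small sketch showing both forms side by side, reusing the %a and %b names from the question (the keys copy and alias are illustrative). Note that { %b } is a shallow copy: if %b itself held references, those would still be shared.

```perl
use strict;
use warnings;

my %b = ( str => 'hello' );
my %a;

$a{copy}  = { %b };   # anonymous hash filled with a copy of %b's pairs
$a{alias} = \%b;      # reference to %b itself

$b{str} = 'world';    # change the original hash afterwards

print "copy:  $a{copy}{str}\n";    # still hello
print "alias: $a{alias}{str}\n";   # now world
```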

I needed to create 1000 employee records for testing a T&A system. The employee records were stored in a hash where the key was the employee's identity number and the value was a hash of their name, date of birth, date of hire, etc. Here's how...
# declare an empty hash
my %employees = ();
# add each employee to the hash
$employees{$identity} = {gender=>$gender, forename=>$forename, surname=>$surname, dob=>$dob, doh=>$doh};
# dump the hash as CSV
foreach my $identity ( keys %employees ){
    print "$identity,$employees{$identity}{forename},$employees{$identity}{surname}\n";
}
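A self-contained version of the same idea, assuming two made-up employees in place of the generated test data (the identities, names and dates are hypothetical). Since keys %employees returns keys in no particular order, this sketch sorts them numerically so the CSV output is repeatable.

```perl
use strict;
use warnings;

my %employees;    # identity number => reference to a hash of that employee's details

# Hypothetical sample rows: identity, gender, forename, surname, dob, doh
my @rows = (
    [ 1002, 'F', 'Alice', 'Smith', '1980-01-31', '2005-06-01' ],
    [ 1001, 'M', 'Bob',   'Jones', '1975-07-04', '2001-09-15' ],
);

for my $row (@rows) {
    my ( $identity, $gender, $forename, $surname, $dob, $doh ) = @$row;
    $employees{$identity} = {
        gender   => $gender,
        forename => $forename,
        surname  => $surname,
        dob      => $dob,
        doh      => $doh,
    };
}

# Dump as CSV, sorted numerically by identity so the order is stable
for my $identity ( sort { $a <=> $b } keys %employees ) {
    my $emp = $employees{$identity};
    print join( ',', $identity, $emp->{forename}, $emp->{surname} ), "\n";
}
```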

Related

How to make a particular change in all the keys of a hash?

I have a hash which is like this:
'IRQ_VSAFE_LPM_ASC_0' => '140',
'IRQ_VSAFE_LPM_ASC_1' => '141',
... and so on
I want to replace ASC_ with ASC_1 in all keys in the hash. I tried this:
foreach $_(keys $hash)
{
s/ASC_/ASC_1/g;
}
but it's not working.
You have to delete the old keys from the hash and insert new ones:
use strict;
use warnings;
sub rename_keys {
    my ($hash, $func) = @_;
    my @k1 = my @k2 = keys %$hash;
    $func->() for @k2;
    @$hash{@k2} = delete @$hash{@k1};
}
my %hash = (
    'IRQ_VSAFE_LPM_ASC_0' => '140',
    'IRQ_VSAFE_LPM_ASC_1' => '141',
);
rename_keys(\%hash, sub { s/ASC_/ASC_1/ });
The previous answer addressed a way to do what you want. However, it also makes sense to explain why what you tried to do didn't work.
The problem is that the syntax used for working with hashes in Perl can mislead you with its simplicity compared to the actual way the hash works underneath.
What you see in Perl code is simply two pieces of information, a hash key and its corresponding value: $myHash{$key} = $value; or, even more misleadingly, %myHash = ($key => $value);
However, a hash does not merely store the key and the value as a pair, as the code above may lead you to think. A hash is a more complicated data structure, in which the key is the input to the addressing, which is done via a formula (the hash function) and an algorithm for dealing with collisions. The details are well covered in the Wikipedia article on hash tables.
As such, changing a hash key as if it were merely a value isn't enough, because what the hash stores isn't just a value; it's a whole data structure whose addressing is based on that key. Changing a key would ALSO change the location of the value in the data structure, and that isn't possible without removing the old entry and adding a brand new entry under the new key, which re-inserts the value in the correct place. (In your loop, keys returns copies of the keys, so the s/// changes only those copies and leaves the hash untouched.)
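A sketch of that delete-and-reinsert step as a plain loop, using the two keys from the question. It assumes no renamed key collides with a key that is still waiting to be renamed; if that can happen, build a new hash instead.

```perl
use strict;
use warnings;

my %hash = (
    'IRQ_VSAFE_LPM_ASC_0' => '140',
    'IRQ_VSAFE_LPM_ASC_1' => '141',
);

# keys %hash is evaluated once, up front, so changing %hash inside the loop is safe
for my $old ( keys %hash ) {
    ( my $new = $old ) =~ s/ASC_/ASC_1/;
    next if $new eq $old;
    # Remove the old entry and re-insert its value under the new key
    $hash{$new} = delete $hash{$old};
}

print "$_ => $hash{$_}\n" for sort keys %hash;
```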
A simple way to do this may be to use pairmap from recent List::Util.
use 5.014; # so we can use the /r flag to s///
use List::Util qw( pairmap );
my %new = pairmap { ($a =~ s/ASC_/ASC_1/r) => $b } %oldhash;

Unable to understand the behaviour of Perl hash ordering

I am a beginner in Perl, and I am trying to run this sample example from "Beginning Perl" by Curtis Poe:
#!/perl/bin/perl
use strict;
use warnings;
use diagnostics;
my $hero = 'Ovid';
my $fool = $hero;
print "$hero isn't that much of a hero. $fool is a fool.\n";
$hero = 'anybody else';
print "$hero is probably more of a hero than $fool.\n";
my %snacks = (
    stinky   => 'limburger',
    yummy    => 'brie',
    surprise => 'soap slices',
);
my @cheese_tray = values %snacks;
print "Our cheese tray will have: ";
for my $cheese (@cheese_tray) {
    print "'$cheese' ";
}
print "\n";
Here is the output of the above code when I ran it on my Windows 7 system with ActivePerl and on codepad.org:
Ovid isn't that much of a hero. Ovid is a fool.
anybody else is probably more of a hero than Ovid.
Our cheese tray will have: 'limburger''soap slices''brie'
I don't understand the third line, which prints 'limburger''soap slices''brie', while the hash was defined in the order 'limburger', 'brie', 'soap slices'.
Please help me to understand.
Hashes are not ordered. If you want a specific order, you need to use an array.
For example:
my @desc = qw(stinky yummy surprise);
my @type = ("limburger", "brie", "soap slices");
my %snacks;
@snacks{@desc} = @type;
Now you still have the types, in their original order, in @type.
You can of course also use sort:
my @sorted_desc = sort keys %snacks;
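A sketch of how the parallel arrays preserve the order: the hash still has no order of its own, but walking @desc and looking each key up in %snacks gives the values back in the order they were listed.

```perl
use strict;
use warnings;

my @desc = qw(stinky yummy surprise);
my @type = ( "limburger", "brie", "soap slices" );

my %snacks;
@snacks{@desc} = @type;    # hash slice: pairs $desc[i] with $type[i]

# @desc remembers the original order, so use it to drive the lookups
my @cheese_tray = map { $snacks{$_} } @desc;

print "Our cheese tray will have: ";
print "'$_' " for @cheese_tray;
print "\n";
```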
perldoc perldata:
Hashes are unordered collections of scalar values indexed by their
associated string key.
You can sort keys or values as needed.
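For example, two ways to get a predictable cheese tray from the %snacks hash in the question: sorting the values themselves, or sorting the keys and mapping each one to its value.

```perl
use strict;
use warnings;

my %snacks = (
    stinky   => 'limburger',
    yummy    => 'brie',
    surprise => 'soap slices',
);

# Values in alphabetical order of the values themselves
my @by_value = sort values %snacks;

# Values in alphabetical order of their keys (stinky, surprise, yummy)
my @by_key = map { $snacks{$_} } sort keys %snacks;

print "By value: @by_value\n";
print "By key:   @by_key\n";
```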
I think the key is:
my @cheese_tray = values %snacks
From http://perldoc.perl.org/functions/values.html:
"Hash entries are returned in an apparently random order. The actual random order is specific to a given hash; the exact same series of operations on two hashes may result in a different order for each hash."

Adding multiple values to key in perl hash

I need to create a multi-dimensional hash.
For example, I have done:
$hash{gene} = $mrna;
if (exists ($exon)){
$hash{gene}{$mrna} = $exon;
}
if (exists ($cds)){
$hash{gene}{$mrna} = $cds;
}
where $gene, $mrna, $exon, $cds are unique ids.
But my issue is that I want some properties of $gene and $mrna to be included in the hash.
for example:
$hash{$gene}{'start_loc'} = $start;
$hash{gene}{mrna}{'start_loc'} = $start;
etc. But is that a feasible way of declaring a hash? If I access $hash{$gene}, both $mrna and start_loc will be printed. What could be the solution?
How would I add multiple values for the same key, with $gene and $mrna being the keys in this case?
Any suggestions will be appreciated.
What you need to do is to read the Perl Reference Tutorial.
Simple answer to your question:
Perl hashes can only associate a single value with a key. However, that single value can be a reference to another hash.
my %hash1 = ( foo => "bar", fu => "bur" ); # First hash
my %hash2;
$hash2{some_key} = \%hash1; # Reference to %hash1
And there's nothing stopping that first hash from containing a reference to another hash. It's turtles all the way down!
So yes, you can have as complex and convoluted a structure as you like, with as many sub-hashes as you want. Or mix in some arrays too.
For various reasons, I prefer the -> syntax when using these complex structures. I find that for more complex structures, it makes the code easier to read. However, the main thing is that it reminds you these are references and not actual multidimensional structures.
For example:
$hash{gene}->{mrna}->{start_loc} = $start; # Quotes aren't needed around a key name that is a valid identifier.
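A sketch of one possible layout for the gene question, assuming hypothetical IDs (gene1, mrna1, exon1, exon2, cds1), made-up start locations, and hypothetical field names (start_loc, mrna, exons, cds). The gene's own properties sit next to a separate mrna sub-hash, so they never share one level of keys with the mRNA IDs, and multiple exons or CDSs per mRNA go into array references.

```perl
use strict;
use warnings;

my %hash;
my ( $gene, $mrna ) = ( 'gene1', 'mrna1' );    # hypothetical IDs

# Properties of the gene itself (value is made up)
$hash{$gene}->{start_loc} = 100;

# Child mRNAs live in their own sub-hash under the 'mrna' key
$hash{$gene}->{mrna}->{$mrna}->{start_loc} = 150;

# Several exons/CDSs per mRNA: push onto autovivified array references
push @{ $hash{$gene}->{mrna}->{$mrna}->{exons} }, 'exon1', 'exon2';
push @{ $hash{$gene}->{mrna}->{$mrna}->{cds} },   'cds1';

for my $g ( sort keys %hash ) {
    print "$g starts at $hash{$g}{start_loc}\n";
    for my $m ( sort keys %{ $hash{$g}{mrna} } ) {
        my $rec = $hash{$g}{mrna}{$m};
        print "  $m starts at $rec->{start_loc}; exons: @{ $rec->{exons} }\n";
    }
}
```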
The best thing to do is to think of your hash as a structure. For example:
my $person_ref = {}; # $person_ref is a hash reference.
$person_ref->{NAME}->{FIRST} = "Bob";
$person_ref->{NAME}->{LAST} = "Rogers";
$person_ref->{PHONE}->{WORK}->[0] = "555-1234"; # An array ref. Might have > 1
$person_ref->{PHONE}->{WORK}->[1] = "555-4444";
$person_ref->{PHONE}->{CELL}->[0] = "555-4321";
...
my @people;
push @people, $person_ref;
Now, I can load up my @people array with all my people, or maybe use a hash:
my %person;
$person{$bobs_ssn} = $person_ref; # Now, all of Bob's info is indexed by his SSN.
So, the first thing you need to do is to think about what your structure should look like. What are the fields in your structure? What are the sub-fields? Figure out what your structure should look like, and then set up your hash of hashes to match. Figure out exactly how it will be stored and keyed.
Remember, this hash contains references to your genes (or whatever), so you want to choose your keys wisely.
Read the tutorial. Then, try your hand at it. It's not all that complicated to understand. However, it can be a bear to maintain.
When you say use strict;, you give yourself some protection:
my $foo = "bar";
say $Foo; #This won't work!
This won't work because you didn't declare $Foo; you declared $foo. use strict; can catch variable names that are mistyped, but:
my %var;
$var{foo} = "bar";
say $var{Foo}; #Whoops!
This will not be caught (except perhaps by a use warnings warning that $var{Foo} is uninitialized). The use strict; pragma can't detect mistyped keys.
The next step, after you've grown comfortable with references, is to move on to object-oriented Perl. There's a tutorial for that too.
All object-oriented Perl does is take your hash references and turn them into objects. Then you write subroutines that help you keep track of manipulating those objects. For example:
sub last_name {
    my $person = shift; # Don't worry about this for now...
    my $last_name = shift;
    if ( defined $last_name ) {
        $person->{NAME}->{LAST} = $last_name;
    }
    return $person->{NAME}->{LAST};
}
When I set my last name using this subroutine... I mean method, I guarantee that the key will be $person->{NAME}->{LAST} and not $person->{LAST}->{NAME}, $person->{LAST}->{NMAE}, or $person->{last}->{name}.
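A minimal sketch of the blessing step described above, wrapping the last_name method in a class. The class name Person and the new constructor are illustrative assumptions, not something from the original answer.

```perl
use strict;
use warnings;

package Person;    # hypothetical class name for this sketch

sub new {
    my ($class) = @_;
    my $self = {};                  # an ordinary hash reference...
    return bless $self, $class;     # ...blessed into the Person class
}

sub last_name {
    my $person    = shift;          # the object (a blessed hash reference)
    my $last_name = shift;
    if ( defined $last_name ) {
        $person->{NAME}->{LAST} = $last_name;
    }
    return $person->{NAME}->{LAST};
}

package main;

my $bob = Person->new;
$bob->last_name('Rogers');
print $bob->last_name(), "\n";    # Rogers
```

Because every caller goes through last_name, the NAME/LAST key layout is written in exactly one place.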
The main problem isn't learning the mechanisms, but learning to apply them. So think about exactly how you want to represent your items: what fields you want, and how you're going to pull up that information.
You could try pushing each value onto a hash of arrays:
my (@gene, @mrna, @exon, @cds);
my %hash;
push @{ $hash{$gene[$_]} }, [ $mrna[$_], $exon[$_], $cds[$_] ] for 0 .. $#gene;
This way gene is the key, with multiple values ($mrna, $exon, $cds) associated with it.
Iterate over keys/values as follows:
for my $key (sort keys %hash) {
    print "Gene: $key\t";
    for my $value (@{ $hash{$key} }) {
        my ($mrna, $exon, $cds) = @$value; # De-references the array
        print "Values: [$mrna], [$exon], [$cds]\n";
    }
}
The answer to a question I've asked previously might be of help (Can a hash key have multiple 'subvalues' in perl?).

Confusion about proper usage of dereference in Perl

I noticed the other day, while altering values in a hash, that when you dereference a hash in Perl you actually make a copy of that hash. To confirm it, I wrote this quick little script:
#! perl
use warnings;
use strict;
my %h = ();
my $hRef = \%h;
my %h2 = %{$hRef};
my $h2Ref = \%h2;
if($hRef eq $h2Ref) {
print "\n\tThey're the same $hRef $h2Ref";
}
else {
print "\n\tThey're NOT the same $hRef $h2Ref";
}
print "\n\n";
The output:
They're NOT the same HASH(0x10ff6848) HASH(0x10fede18)
This leads me to realize that there could be spots in some of my scripts where they aren't behaving as expected. Why is it even like this in the first place? If you're passing or returning a hash, it would be more natural to assume that dereferencing the hash would allow me to alter the values of the hash being dereferenced. Instead I'm just making copies all over the place without any real need/reason to beyond making syntax a little more obvious.
I realize the fact that I hadn't even noticed this until now shows it's probably not that big of a deal (in terms of needing to go fix all of my scripts, but it's important going forward). I think it's going to be pretty rare to see noticeable performance differences from this, but that doesn't change the fact that I'm still confused.
Is this by design in Perl? Is there some explicit reason for this that I don't know about, or is this just something you, as the programmer, are expected to know and write scripts accordingly?
The problem is that you are making a copy of the hash to work with in this line:
my %h2 = %{$hRef};
And that is understandable, since many posts here on SO use that idiom to make a local name for a hash, without explaining that it is actually making a copy.
In Perl, a hash is a plural value, just like an array. This means that in list context (such as you get when assigning to a hash) the aggregate is taken apart into a list of its contents. This list of pairs is then assembled into a new hash as shown.
What you want to do is work with the reference directly.
for (keys %$href) {...}
for (values %$href) {...}
my $x = $href->{some_key};
# or
my $x = $$href{some_key};
$$href{new_key} = 'new_value';
When working with a normal hash, you have the sigil, which is either % when talking about the entire hash, $ when talking about a single element, or @ when talking about a slice. Each of these sigils is then followed by an identifier.
%hash # whole hash
$hash{key} # element
@hash{qw(a b)} # slice
To work with a reference named $href simply replace the string hash in the above code with $href. In other words, $href is the complete name of the identifier:
%$href # whole hash
$$href{key} # element
@$href{qw(a b)} # slice
Each of these could be written in a more verbose form as:
%{$href}
${$href}{key}
@{$href}{qw(a b)}
Which is again a substitution of the string '$href' for 'hash' as the name of the identifier.
%{hash}
${hash}{key}
@{hash}{qw(a b)}
You can also use a dereferencing arrow when working with an element:
$hash->{key} # exactly the same as $$hash{key}
But I prefer the doubled sigil syntax since it is similar to the whole aggregate and slice syntax, as well as the normal non-reference syntax.
So to sum up, any time you write something like this:
my @array = @$array_ref;
my %hash = %$hash_ref;
You will be making a copy of the first level of each aggregate. When using the dereferencing syntax directly, you will be working on the actual values, and not a copy.
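A short sketch contrasting the two (the hash contents are made up): modifying %copy leaves the original alone, while writes through $href, in either arrow or doubled-sigil form, land in %h itself.

```perl
use strict;
use warnings;

my %h    = ( a => 1 );
my $href = \%h;

# Copy: list-assign the dereferenced hash into a new variable
my %copy = %$href;
$copy{a} = 99;    # changes %copy only

# Direct: work through the reference
$href->{b} = 2;   # adds a key to %h itself
$$href{a}++;      # same as $href->{a}++

print "h{a}=$h{a} h{b}=$h{b} copy{a}=$copy{a}\n";    # h{a}=2 h{b}=2 copy{a}=99
```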
If you want a REAL local name for a hash, but want to work on the same hash, you can use the local keyword to create an alias.
sub some_sub {
my $hash_ref = shift;
our %hash; # declare a lexical name for the global %{__PACKAGE__::hash}
local *hash = \%$hash_ref;
# install the hash ref into the glob
# the `\%` bit ensures we have a hash ref
# use %hash here, all changes will be made to $hash_ref
} # local unwinds here, restoring the global to its previous value if any
That is the pure Perl way of aliasing. If you want to use a my variable to hold the alias, you can use the module Data::Alias.
You are confusing the actions of dereferencing, which does not inherently create a copy, and using a hash in list context and assigning that list, which does. $hashref->{'a'} is a dereference, but most certainly does affect the original hash. This is true for $#$arrayref or values(%$hashref) also.
Without the assignment, just the list context %$hashref is a mixed beast; the resulting list contains copies of the hash keys but aliases to the actual hash values. You can see this in action:
$ perl -wle'$x={"a".."f"}; for (%$x) { $_=chr(ord($_)+10) }; print %$x'
epcnal
vs.
$ perl -wle'$x={"a".."f"}; %y=%$x; for (%y) { $_=chr(ord($_)+10) }; print %$x; print %y'
efcdab
epcnal
But %$hashref isn't acting any differently from %hash here.
No, dereferencing does not create a copy of the referent. It's my that creates a new variable.
$ perl -E'
my %h1; my $h1 = \%h1;
my %h2; my $h2 = \%h2;
say $h1;
say $h2;
say $h1 == $h2 ?1:0;
'
HASH(0x83b62e0)
HASH(0x83b6340)
0
$ perl -E'
my %h;
my $h1 = \%h;
my $h2 = \%h;
say $h1;
say $h2;
say $h1 == $h2 ?1:0;
'
HASH(0x9eae2d8)
HASH(0x9eae2d8)
1
No, $#{$someArrayHashRef} does not create a new array.
If perl did what you suggest, then variables would get aliased very easily, which would be far more confusing. As it is, you can alias variables with globbing, but you need to do so explicitly.

Dynamic array using perl code. Beginner [duplicate]

I want to create arrays dynamically based on user input. For example, if the user enters 3, then three arrays should be created with the names @message1, @message2, and @message3.
How do I do it in Perl?
Don't. Instead, use an array of arrays:
my @message;
my $input = 3;
for my $index ( 0..$input-1 ) {
    $message[$index][0] = "element 0";
    $message[$index][1] = 42;
}
print "The second array has ", scalar( @{ $message[1] } ), " elements\n";
print "They are:\n";
for my $index ( 0..$#{ $message[1] } ) {
    print "\t", $message[1][$index], "\n";
}
Some helpful rules are at http://perlmonks.org/?node=References+quick+reference
I have to ask why you want to do this, because it's not the right way to go. If you have three streams of input, each of which needs to be stored as a list, then store one list, which is a list of the lists (where the lists are stored as array references):
my @input = (
    [ 'data', 'from', 'first', 'user' ],
    [ qw(data from second user) ],
    [ qw(etc etc etc) ],
);
If you have names associated with each user's data, you might want to use those as hash keys for indexing the data:
my %input = (
senthil => [ 'data', 'from', 'first', 'user' ],
ether => [ qw(data from second user) ],
apu => [ qw(etc etc etc) ],
);
Please refer to the Perl Data Structures Cookbook (perldoc perldsc) for more on selecting the right data structure for the situation, and how to define them.
Creating new named arrays dynamically is almost never a good idea. Mark Dominus, author of the enlightening book Higher-Order Perl, has written a three-part series detailing the pitfalls.
You have names in mind for these arrays, so put them in a hash:
sub create_arrays {
    my($where,$n) = @_;
    for (1 .. $n) {
        $where->{"message$_"} = [];
    }
}
For a quick example that shows the structure, the code below
my $n = @ARGV ? shift : 3;
my %hash;
create_arrays \%hash, $n;
use Data::Dumper;
$Data::Dumper::Indent = $Data::Dumper::Terse = 1;
print Dumper \%hash;
outputs
$ ./prog.pl
{
  'message2' => [],
  'message3' => [],
  'message1' => []
}
Specifying a different number of arrays, we get
$ ./prog.pl 7
{
  'message2' => [],
  'message6' => [],
  'message5' => [],
  'message4' => [],
  'message3' => [],
  'message1' => [],
  'message7' => []
}
The order of the keys looks funny because they're inside a hash, an unordered data structure.
Recall that [] creates a reference to a new anonymous array, so, for example, to add values to message2, you'd write
push @{ $hash{"message2"} }, "Hello!";
To print it, you'd write
print $hash{"message2"}[0], "\n";
Maybe instead you want to know how long all the arrays are:
foreach my $i (1 .. $n) {
    print "message$i: ", scalar @{ $hash{"message$i"} }, "\n";
}
For more details on how to use references in Perl, see the following documentation:
Mark's very short tutorial about references, or perldoc perlreftut
Perl references and nested data structures, or perldoc perlref
Perl Data Structures Cookbook, or perldoc perldsc
In compiled languages, variables don't have a name. The name you see in the code is a unique identifier associated with some numerical offset. In an identifier like message_2 the '2' only serves to make it a unique identifier. Anybody can tell that you could make your three variables: message_125, message_216, and message_343. As long as you can tell what you should put into what, they work just as well as message_1...
The "name" of the variable is only for you keeping them straight while you're writing the code.
Dynamic languages add capability by not purging the symbol table(s). But a symbol table is simply an association of a name with a value. Because Perl offers you lists and hashes so cheaply, there is no need to use the programming/logistical method of keeping track of variables to allow a flexible runtime access.
Chances are that if you find yourself naming lists @message1, @message2, ..., where the items differ only by their order, these names are just as good: $message[1], $message[2], ....
In addition, since symbol tables are usually a mapping from name to offset (either on the stack or in the heap), they're really not much more than the key-value pairs you find in a hash. So hashes work just as well for looking up more distinct names.
$h{messages} = [];
$h{replies} = [];
I mean really if you wanted to, you could store everything that you put into a lexical variable into a single hash for the scope, if you didn't mind writing: $h{variable_name} for everything. But you wouldn't get the benefit of Perl's implicit scope management, and across languages, programmers have preferred implicit scope management.
Perl allows symbolic manipulation, but over the years the dynamic languages have found that to be a mixed blessing. In Perl you have both "perspectives", to give them a name. Because you can predict what code will do more easily in a compiled language than in a dynamic one, it has proven less error-prone to use a "compiled perspective" for more things. So, given the offset management and compiled lookup behavior that core Perl already gives you, there is no reason to mess with the symbol table if you don't have to.
Creating an array dynamically is as simple as []. Storing it somewhere when we don't know how many we want is as easy as:
push @message, [];
And creating a list of lists all at once is as easy as:
@message = map { [] } 1..$num_lists;
for some specified value in $num_lists.
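A sketch of the map version, with $num_lists hard-coded to 3 in place of the user's input: each iteration of map evaluates [] again, so every element is a distinct anonymous array, and pushing onto one leaves the others empty.

```perl
use strict;
use warnings;

my $num_lists = 3;    # stands in for the user's input

# One new, distinct anonymous array per iteration
my @message = map { [] } 1 .. $num_lists;

push @{ $message[0] }, 'first';
push @{ $message[2] }, 'third', 'again';

for my $i ( 0 .. $#message ) {
    printf "message %d has %d element(s)\n", $i + 1, scalar @{ $message[$i] };
}
```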