I am a first timer with Perl and I have to make changes to a Perl script and I have come across the following:
my %summary ;
for my $id ( keys %trades ) {
my ( $sym, $isin, $side, $type, $usrOrdrNum, $qty ) = @{$trades{$id}} ;
$type = "$side $type" ;
$summary{$sym}{$type} += $qty ;
$summary{$sym}{'ISIN'} = $isin ;
}
The portion I do not understand is $summary{$sym}{$type} += $qty ;. What is the original author trying to do here?
This piece of code populates the %summary hash with a summary of the data in %trades. Each trade is an array with multiple fields which are unpacked inside the loop, i.e. $sym is the value of the first array field of the current trade and $qty is the last field.
$summary{$sym} accesses the $sym field in the %summary hash. The entry named $type in the $summary{$sym} field is then accessed. If the field does not exist, it is created. If $summary{$sym} does not hold a hashref, one is created there, so everything Just Works. (technical term: autovivification)
$var += $x adds $x to $var, so $summary{$sym}{$type} holds a sum of all $qty values with the same $sym and $type after the loop finishes.
The $summary{$sym}{ISIN} field will hold the $isin value of the last trade with name $sym (I suspect they are the same for all such trades).
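To see the summing and the autovivification in action, here is a minimal runnable sketch. The contents of %trades (symbols, ISINs, quantities) are made up for illustration; only the loop body comes from the question.

```perl
use strict;
use warnings;

# Hypothetical sample data: each trade is an array reference holding
# ( symbol, ISIN, side, type, user order number, quantity ).
my %trades = (
    1 => [ 'ABC', 'US0000000001', 'BUY',  'LIMIT',  'u1', 100 ],
    2 => [ 'ABC', 'US0000000001', 'BUY',  'LIMIT',  'u2', 50 ],
    3 => [ 'ABC', 'US0000000001', 'SELL', 'MARKET', 'u3', 30 ],
    4 => [ 'XYZ', 'US0000000002', 'BUY',  'MARKET', 'u4', 10 ],
);

my %summary;
for my $id ( keys %trades ) {
    my ( $sym, $isin, $side, $type, $usrOrdrNum, $qty ) = @{ $trades{$id} };
    $type = "$side $type";
    $summary{$sym}{$type} += $qty;    # inner hash and entry spring into existence
    $summary{$sym}{ISIN} = $isin;
}

# Print one line per symbol and field, in a stable order.
for my $sym ( sort keys %summary ) {
    for my $field ( sort keys %{ $summary{$sym} } ) {
        print "$sym / $field: $summary{$sym}{$field}\n";
    }
}
```

The two ABC buy orders end up as a single "BUY LIMIT" entry of 150, which is the whole point of the += line.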
Perl has three different built-in data types:
Scalars (as in $foo).
Arrays (as in @foo).
Hashes (as in %foo).
The problem is that each of these deals with single bits of data. Sure, there can be lots of items in arrays and hashes, but they are lots of single bits of data.
Let's say I want to keep track of people. People have a first name, last name, phone, etc. Let's define a person:
my %person;
$person{FIRST_NAME} = "Bob";
$person{LAST_NAME} = "Smith";
$person{PHONE_NUMBER} = "555-1234";
Okay, now I need to store another person. Do I create another hash? What if I could have, say an array of hashes with each hash representing a single person?
Perl allows you to do this by making a reference to the hash:
my @list;
push @list, \%person;
The \%person is my reference to the memory location that contains my hash. $list[0] points to that memory location and allows me to access that person through dereferencing.
Now, my array contains my person. I can create a second one. It has to live in a new hash, because pushing \%person again would only add a second reference to the same hash:
my %person2;
$person2{FIRST_NAME} = "Susan";
$person2{LAST_NAME} = "Brown";
$person2{PHONE_NUMBER} = "555-9876";
push @list, \%person2;
Okay, how do I get my person back out? In Perl, you dereference by putting the correct sigil in front of your reference. For example:
my $person_ref = $list[0]; #Reference to Bob's hash
my %person = %{$person_ref}; #Dereference to Bob's hash. %person now holds Bob's data.
There are a couple of problems here: I'm moving a lot of data from one variable to another, and I am not really using those variables. Let's eliminate the variables, or at least their names:
my @list;
push @list, {}; #Anonymous hash in my list
$list[0] still points to a reference to a hash, but I never had to give that hash a name. Now, how do I put Bob's information into it?
If $list[0] is a reference to a hash, I could dereference it by putting %{...} around it!
%person = %{ $list[0] }; #Person is an empty hash, but you get the idea
Let's fill up that hash!
${ $list[0] }{FIRST_NAME} = "Bob";
${ $list[0] }{LAST_NAME} = "Smith";
${ $list[0] }{PHONE_NUMBER} = "555-1234";
That's easy to read...
Fortunately, Perl provides a bit of syntactic sweetener. This is the same:
$list[0]->{FIRST_NAME} = "Bob";
$list[0]->{LAST_NAME} = "Smith";
$list[0]->{PHONE_NUMBER} = "555-1234";
The -> operator dereferences whatever is on its left: $list[0]->{FIRST_NAME} is the FIRST_NAME entry of the hash that $list[0] refers to.
Also, in certain circumstances, I don't need the {...} curly braces. Think of it like math operations where there's an order of precedence:
(3 x 4) + (5 x 8)
is the same as:
3 x 4 + 5 x 8
In one, I specify the order of operations, and in the other I don't.
The original code for adding names into a hash reference stored in a list:
${ $list[0] }{FIRST_NAME} = "Bob";
${ $list[0] }{LAST_NAME} = "Smith";
${ $list[0] }{PHONE_NUMBER} = "555-1234";
Can be rewritten as:
$list[0]{FIRST_NAME} = "Bob";
$list[0]{LAST_NAME} = "Smith";
$list[0]{PHONE_NUMBER} = "555-1234";
(And I didn't have to do push @list, {}; first. I just wanted to emphasize that this was a reference to a hash.)
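Putting the pieces together, here is a small sketch of an array of anonymous hashes, read back with each of the three spellings discussed above. The names and numbers are the same illustrative ones used earlier.

```perl
use strict;
use warnings;

my @list;

# Each { ... } builds a new anonymous hash and returns a reference to it,
# so the two people do not share storage.
push @list, { FIRST_NAME => "Bob",   LAST_NAME => "Smith", PHONE_NUMBER => "555-1234" };
push @list, { FIRST_NAME => "Susan", LAST_NAME => "Brown", PHONE_NUMBER => "555-9876" };

# Three spellings of the same kind of element access:
my $first = ${ $list[0] }{FIRST_NAME};    # explicit dereference block
my $last  = $list[0]->{LAST_NAME};        # arrow
my $phone = $list[1]{PHONE_NUMBER};       # arrow omitted between subscripts

print "$first $last\n";
print "Susan's phone: $phone\n";

# Dereference a whole element to loop over one person's fields.
for my $field ( sort keys %{ $list[1] } ) {
    print "$field = $list[1]{$field}\n";
}
```

Which spelling to use is mostly taste; the arrow form tends to be the easiest to read once structures get deeper.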
Thus:
$trades{$id}
Is a reference to an array of data.
Think of it as this way:
my @list = qw(a bunch of data);
$trades{$id} = \@list;
And to dereference that reference to a list, I do this:
@{$trades{$id}}
See Mark's Short Tutorial About References.
$summary{$sym}{$type} is a scalar inside a hashref inside a hash.
+= is an operator that takes the left hand side, adds the right hand side to it, then assigns the result back to the left hand side.
$qty is the value to add to the previously stored value.
$summary{$sym}{$type} += $qty ; #is the same as
#$summary{$sym}{$type} = $summary{$sym}{$type} + $qty;
#This line calculates total of the values from the hash %trades ($trades{$id}[5];).
The best way to see types in Perl if you are a newbie is to use the Perl debugger.
You can run the script as:
perl -d <scriptname>
And then within the debugger (you will see something like this)
DB<1>
type the following to go to the code where you want to debug:
DB<1> c <linenumber>
Then you can use x to see the variables like:
DB<2> x %trades
DB<3> x $trades{$id}
DB<4> print Dumper \%trades
This way you can actually see what's inside the hash, or even a hash of hashes. (The Dumper line only works if the script has loaded Data::Dumper.)
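If you would rather not step through the debugger, the same inspection can be done from inside the script. This is only a sketch with a single made-up trade; Data::Dumper and its Sortkeys setting are standard, the data is not from the original script.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;    # print hash keys in sorted order

# Hypothetical stand-in for the real %trades.
my %trades = (
    1 => [ 'ABC', 'US0000000001', 'BUY', 'LIMIT', 'u1', 100 ],
);

print Dumper( \%trades );       # shows the hash and the array inside it

# Index 5 of the inner array is the quantity.
my $qty = $trades{1}[5];
print "qty of trade 1: $qty\n";
```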
It computes the sum of all values in the last field (qty) for each combination of sym, side and type.
If the hash was a SQL table instead (and why not - something like DBD::CSV may come in handy here) with fields id, sym, isin, side, type, usrOrdrNum, qty, the code would translate to something like
SELECT sym, CONCAT(side,' ',type) AS type, SUM(qty), isin
FROM trades
GROUP BY sym, CONCAT(side,' ',type);
Given:
my @list1 = ('a');
my @list2 = ('b');
my @list0 = ( \@list1, \@list2 );
then
my @listRef = $list0[1];
my @list = @$listRef; # works
but
my @list = @$($list0[1]); # gives an error message
I can't figure out why. What am I missing?
There is one simple de-referencing rule that covers this. Loosely put:
What follows the sigil needs to be the correct reference for it, or a block that evaluates to that.
A specific case from perlreftut
You can always use an array reference, in curly braces, in place of the name of an array.
In your case then, it should be
my @list = @{ $list0[1] };
(not index [2] since your @list0 has two elements) Spaces are there only for readability.
The attempted @$($list0[2]) is a syntax error, first because the ( (following the $) isn't allowed in an identifier (variable name), which is what presumably follows that $.
A block {} though would be allowed after the $ and would be evaluated, and must yield a scalar reference in this case, to be dereferenced by that $ in front of it; but then the first @ would be in error. That can then be fixed as well but this gets messy if pushed, and wasn't meant to go that far. While the exact rules are (still) a little murky, see Identifier Parsing in perldata.
The @$listRef earlier is correct syntax in general. But it refers to a scalar variable $listRef (which must be an array reference since it's getting dereferenced into an array by the first @), and there is no such thing in the example -- you have an array variable @listRef.
So with use strict; in effect this, too, would fail to compile.
Dereferencing an arrayref to assign a new array is expensive as it has to copy all elements (and to construct the new array variable), while it's rarely needed (unless you actually want a copy). With the array reference on hand ($ar) all that one may need is readily available
@$ar; # list of elements
$ar->[$index]; # specific element
@$ar[@indices]; # slice -- list of some elements, like @$ar[0,2..5,-1]
$ar->@[0,-1]; # slice, with new "postfix dereferencing" (stable at v5.24)
$#$ar; # last index in the anonymous array referred by $ar
See Slices in perldata and Postfix reference slicing in perlref
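As a runnable illustration of the forms listed above (leaving out postfix dereferencing so it runs on older Perls too), here is a sketch with made-up numbers. It also shows that only the first line actually copies the array.

```perl
use strict;
use warnings;
use feature 'say';

my $ar = [ 10, 20, 30, 40 ];    # hypothetical data

my @all     = @$ar;             # copies every element into a new array
my $second  = $ar->[1];         # one element, no copy of the array
my @ends    = @$ar[ 0, -1 ];    # slice: first and last element
my $last_ix = $#$ar;            # last index
my $count   = @$ar;             # array in scalar context: element count

$ar->[0] = 11;                  # changes the original, not the copy in @all

say "original: @$ar";
say "copy:     @all";
say "second=$second ends=@ends last_ix=$last_ix count=$count";
```

The copy in @all still starts with 10 after the assignment, while the data behind $ar now starts with 11.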
You need
@{ $list0[1] }
Whenever you can use the name of a variable, you can use a block that evaluates to a reference. That means the syntax for getting the elements of an array is
@NAME # If you have the name
@BLOCK # If you have a reference
That means that
my @array1 = 4..5;
my @array2 = @array1;
and
my $array1 = [ 4..5 ];
my @array2 = @{ $array1 };
are equivalent.
When the only thing in the block is a simple scalar ($NAME or $BLOCK), you can omit the curlies. That means that
@{ $array1 }
is equivalent to
@$array1
That's why @$listRef works, and it's why @{ $list0[1] } can't be simplified.
See Perl Dereferencing Syntax.
You have a lot going on there and multiple levels of inadvertent references, so let's go through it:
First, you start by making a list of two items, each of which is an array reference. You store that in an array:
my @list0 = ( \@list2, \@list2 );
Then you ask for the item with index 2, which is a single item, and store that in an array:
my @listRef = $list0[2];
However, there is no item with index 2 because Perl indexes from zero. The value in @listRef is undefined. Not only that, but you've asked for a single item and stored it in an array instead of a scalar. That's probably not what you meant.
You say this following line works, but I don't think you know that because it won't give you the value you were expecting even if you didn't get an error. Something else is happening. You haven't declared or used a variable $listRef, so Perl creates it for you and gives it the value undef. When you try to dereference it, Perl uses "autovivification" to create the reference. This is the process where Perl helpfully creates a reference structure for you if you start with undef:
my @list = @$listRef; # works
There is nothing in that array so @list should be empty.
Fix that to get the last item, which has index of 1, and fix it so you are assigning the single value (the reference) to a scalar variable:
my $listRef = $list0[1];
Data::Dumper is handy here:
use Data::Dumper;
my @list2 = qw(a b c);
my @list0 = ( \@list2, \@list2 );
my $listRef = $list0[1];
print Dumper($listRef);
You get the output:
$VAR1 = [
'a',
'b',
'c'
];
Perl has some features that can catch these sorts of variable naming mistakes and will help you track down problems. Add these to the top of your program:
use strict;
use warnings;
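A short sketch of what those two lines buy you, using the arrays from the question. The string eval and the name $missing are only there so the compile-time error can be shown without stopping the script; they are not part of the original code.

```perl
use strict;
use warnings;

my @list1 = ('a');
my @list2 = ('b');
my @list0 = ( \@list1, \@list2 );

my $listRef = $list0[1];         # a scalar holds the reference
my @list    = @{ $list0[1] };    # block form, works for any expression
my @same    = @$listRef;         # short form, only for a simple scalar

print "@list @same\n";

# Under strict, dereferencing the undeclared $missing fails to compile.
my $ok = eval q{ my @x = @$missing; 1 };
print "strict caught it: $@" unless $ok;
```

Without use strict; the eval would succeed quietly and @x would simply be empty, which is the kind of silent mistake discussed above.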
For the rest, you might want to check out my book Intermediate Perl which explains all this reference stuff.
And, recent Perls have a new feature called postfix dereferencing that allows you to write dereferences from left to right:
my @items = ( \@list2, \@list2 );
my @items_of_last_ref = $items[1]->@*;
my @list = @$listRef; # works
I doubt that works. That may not throw a syntax error but it sure as hell does not do what you think it does. For one thing,
my @list0 = ( \@list2, \@list2 );
defines an array with 2 elements and you access
my @listRef = $list0[2];
the third element. So @listRef is an array that contains one element which is undef. The following code doesn't make sense either.
Unless the question is purely academic (answered by zdim already), I assume you want the second element of @list0 in a separate array. I would write
my @list = @{ $list0[1] };
The question is not complete and not clear on desired outcome.
OP tries to access an element $list0[2] of array @list0 which does not exist -- the array has elements with indexes 0 and 1.
Perhaps @listRef should be $listRef instead in the post.
Below is my take on the described problem
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my @list1 = qw/word1 word2 word3 word4/;
my @list2 = 1000..1004;
my @list0 = (\@list1, \@list2);
my $ref_array = $list0[0];
say for @{$ref_array};
$ref_array = $list0[1];
say for @{$ref_array};
say "Element: " . $ref_array->[2];
output
word1
word2
word3
word4
1000
1001
1002
1003
1004
Element: 1002
Let's say I have a large hash and I want to iterate over its contents. The standard idiom would be something like this:
while(($key, $value) = each(%{$hash_ref})){
# do something
}
However, if I understand my perl correctly this is actually doing two things. First the
%{$hash_ref}
is translating the ref into list context. Thus returning something like
(key1, value1, key2, value2, key3, value3 etc)
which will be stored in my stack's memory. Then the each function will run, eating the first two values in memory (key1 & value1) and returning them to my while loop to process.
If my understanding of this is right, that means that I have effectively copied my entire hash into my stack's memory only to iterate over the new copy, which could be expensive for a large hash, due to the expense of iterating over the data twice, but also due to potential cache misses if both copies can't be held in memory at once. It seems pretty inefficient. I'm wondering if this is what really happens, or if I'm either misunderstanding the actual behavior or the compiler optimizes away the inefficiency for me?
Follow up questions, assuming I am correct about the standard behavior.
Is there a syntax to avoid copying the hash by iterating over its values in the original hash? If not for a hash, is there one for the simpler array?
Does this mean that in the above example I could get inconsistent values between the copy of my hash and my actual hash if I modify the hash_ref content within my loop, resulting in $value having a different value than $hash_ref->{$key}?
No, the syntax you quote does not create a copy.
This expression:
%{$hash_ref}
is exactly equivalent to:
%$hash_ref
and assuming the $hash_ref scalar variable does indeed contain a reference to a hash, then adding the % on the front is simply 'dereferencing' the reference - i.e. it resolves to a value that represents the underlying hash (the thing that $hash_ref was pointing to).
If you look at the documentation for the each function, you'll see that it expects a hash as an argument. Putting the % on the front is how you provide a hash when what you have is a hashref.
If you wrote your own subroutine and passed a hash to it like this:
my_sub(%$hash_ref);
then on some level you could say that the hash had been 'copied', since inside the subroutine the special @_ array would contain a list of all the key/value pairs from the hash. However even in that case, the elements of @_ are actually aliases for the keys and values. You'd only actually get a copy if you did something like: my @args = @_.
Perl's builtin each function is declared with the prototype '+' which effectively coerces a hash (or array) argument into a reference to the underlying data structure.
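As an aside, Perl versions 5.14 through 5.22 experimentally let the each function take a reference to a hash directly. That feature was removed in 5.24, so it is best avoided in new code. On those versions, instead of:
($key, $value) = each(%{$hash_ref})
you could simply say:
($key, $value) = each($hash_ref)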
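For reference, here is a minimal sketch of iterating through a hash reference with the portable %$hash_ref spelling. The fruit data is invented. Updating the value of an existing key inside the loop goes straight to the original hash, which would not be possible if each were walking a copy.

```perl
use strict;
use warnings;

my $hash_ref = { apple => 1, banana => 2, cherry => 3 };    # hypothetical data

# each works on the hash that $hash_ref points to; nothing is flattened.
while ( my ( $key, $value ) = each %$hash_ref ) {
    $hash_ref->{$key} = $value * 10;    # assigning to an existing key is safe here
}

# keys builds a list of the keys only; the values are read from the hash itself.
for my $key ( sort keys %$hash_ref ) {
    print "$key => $hash_ref->{$key}\n";
}
```

Adding or deleting other keys while each is in progress is a different matter and is best avoided.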
No copy is created by each (though you do copy the returned values into $key and $value through assignment). The hash itself is passed to each.
each is a little special. It supports the following syntaxes:
each HASH
each ARRAY
As you can see, it doesn't accept an arbitrary expression. (That would be each EXPR or each LIST). The reason for that is to allow each(%foo) to pass the hash %foo itself to each rather than evaluating it in list context. each can do that because it's an operator, and operators can have their own parsing rules. However, you can do something similar with the \% prototype.
use Data::Dumper;
sub f { print(Dumper(@_)); }
sub g(\%) { print(Dumper(@_)); } # Similar to each
my %h = (a=>1, b=>2);
f(%h); # Evaluates %h in list context.
print("\n");
g(%h); # Passes a reference to %h.
Output:
$VAR1 = 'a'; # 4 args, the keys and values of the hash
$VAR2 = 1;
$VAR3 = 'b';
$VAR4 = 2;
$VAR1 = { # 1 arg, a reference to the hash
'a' => 1,
'b' => 2
};
%{$h_ref} is the same as %h, so all of the above applies to %{$h_ref} too.
Note that the hash isn't copied even if it is flattened. The keys are "copied", but the values are returned directly.
use Data::Dumper;
my %h = (abc=>"def", ghi=>"jkl");
print(Dumper(\%h));
$_ = uc($_) for %h;
print(Dumper(\%h));
Output:
$VAR1 = {
'abc' => 'def',
'ghi' => 'jkl'
};
$VAR1 = {
'abc' => 'DEF',
'ghi' => 'JKL'
};
You can read more about this here.
Today I start my perl journey, and now I'm exploring the data type.
My code looks like:
@list=(1,2,3,4,5);
%dict=(1,2,3,4,5);
print "$list[0]\n"; # using [ ] to wrap index
print "$dict{1}\n"; # using { } to wrap key
print "@list[2]\n";
print "%dict{2}\n";
It seems $ + var_name works for both array and hash, but @ + var_name can be used to access an array element, while % + var_name can't be used to access a hash element.
Why?
@list[2] works because it is a slice of an array.
In Perl 5, a sigil indicates--in a non-technical sense--the context of your expression. Apart from some of the non-standard behavior that slices have in a scalar context, the basic thought is that the sigil represents what you want to get out of the expression.
If you want a scalar out of a hash, it's $hash{key}.
If you want a scalar out of an array, it's $array[0]. However, Perl allows you to get slices of the aggregates. And that allows you to retrieve more than one value in a compact expression. Slices take a list of indexes. So,
@list = @hash{ qw<key1 key2> };
gives you a list of items from the hash. And,
@list2 = @list[0..3];
gives you the first four items from the array. For your case, @list[2] still has a "list" of indexes, it's just that list is the special case of a "list of one".
As scalar and list contexts were rather well defined, and there was no "hash context", it stayed pretty stable at $ for scalar and @ for "lists" and until recently, Perl did not support addressing any variable with %. So neither %hash{@keys} nor %hash{key} had meaning. Now, however, you can dump out pairs of indexes with values by putting the % sigil on the front.
my %hash = qw<a 1 b 2>;
my @list = %hash{ qw<a b> }; # yields ( 'a', 1, 'b', 2 )
my @l2 = %list[0..2]; # yields ( 0, 'a', 1, '1', 2, 'b' )
So, I guess, if you have an older version of Perl, you can't, but if you have 5.20, you can.
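Here is a compact sketch of the sigil rule with the different slice forms side by side. It assumes Perl 5.20 or later because of the % slices, and the data is made up.

```perl
use v5.20;       # key/value slices (%name{...}) need Perl 5.20 or later
use warnings;

my @list = ( 1, 2, 3, 4, 5 );
my %dict = ( a => 1, b => 2, c => 3 );    # hypothetical data

my $elem  = $list[2];            # $ sigil: one scalar
my @some  = @list[ 0 .. 2 ];     # @ sigil: array slice, a list of values
my @vals  = @dict{ qw(a c) };    # @ sigil: hash slice, a list of values
my %part  = %dict{ qw(a b) };    # % sigil: key/value slice, pairs
my @pairs = %list[ 0, 1 ];       # % sigil on an array: index/value pairs

say "elem=$elem";
say "some=@some";
say "vals=@vals";
say "part=", join ",", map { $_ . "=" . $part{$_} } sort keys %part;
say "pairs=@pairs";
```

In every line the sigil on the left of the name describes what comes out, and the brackets or braces say whether the container is an array or a hash.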
But for a completist's sake, slices have a non-intuitive way that they work in a scalar context. Because the standard behavior of putting a list into a scalar context is to count the list, if a slice worked with that behavior:
( $item = @hash{ @keys } ) == scalar @keys;
Which would make the expression:
$item = @hash{ @keys };
no more valuable than:
scalar @keys;
So, Perl seems to treat it like the expression:
$s = ( $hash{$keys[0]}, $hash{$keys[1]}, ... , $hash{$keys[$#keys]} );
And when a comma-delimited list is evaluated in a scalar context, it evaluates to the last expression. So it really ends up that
$item = @hash{ @keys };
is no more valuable than:
$item = $hash{ $keys[-1] };
But it makes writing something like this:
$item = @hash{ source1(), source2(), @array3, $banana, ( map { "$_" } source4() ) };
slightly easier than writing:
$item = $hash{ [ source1(), source2(), @array3, $banana, ( map { "$_" } source4() ) ]->[-1] };
But only slightly.
Arrays are interpolated within double quotes, so you see the actual contents of the array printed.
On the other hand, %dict{1} works, but is not interpolated within double quotes. So, something like my %partial_dict = %dict{1,3} is valid and does what you expect i.e. %partial_dict will now have the value (1,2,3,4). But "%dict{1,3}" (in quotes) will still be printed as %dict{1,3}.
Perl Cookbook has some tips on printing hashes.
I'm completely new to perl and I'm trying to build up a data structure which should be rather simple. I have a couple of loops collecting data from the database on each iteration, and I want to be able to store this data in an array of hashmaps.
Now this is where my difficulty currently lies: There is a loop that runs before the data collecting loops and just builds a list of names that will get looped over in the collection loop. What I'm trying to do on that loop is to create an array of hashmaps and just assign the name to a field in the map and leave the others empty.
Once that's done how do I assign a value to an item inside a map contained in an array in perl?
-----EDIT seeing as Im getting down voted----
my @characters;
for my $name (@names) {
my %flinstones = (
husband => $name,
pal => "",
);
push @characters, %flinstones;
}
Now how do I set the pal field later in the program?
Perl has three basic variable structures: Scalar variables ($foo), Arrays (@foo), and Hashes (%foo).
Perl allows you to use references to these structures. A reference is a pointer to a particular variable type. For example, I could have a reference to a Hash or an Array. It's using references where you can build more complex structures.
It's important to keep in mind that these more complex structures are references and can be tricky for someone new in Perl to understand. For example:
@foo = (1, 2, 3, 4); # This is a standard array.
%foo = (1, 2, 3, 4); # This is a hash with two items. 1 & 3 are keys. 2 & 4 data.
$foo = [1, 2, 3, 4]; # A reference to a nameless array that contains four items
$foo = {1, 2, 3, 4}; # A reference to a nameless hash that contains two key/value pairs.
Note that when I merely change the parentheses to square brackets or curly braces, I am now talking about references and not hashes or arrays. Also note that arrays and hashes can be converted back and forth.
@foo = (1, 2, 3, 4); # This is an array of four numbers
%foo = @foo; # %foo is a hash with two data items 1=> 2 and 3 => 4
@bar = %foo; # @bar is another array!
No wonder people new to Perl can get confused by this!
Let's take a look at that loop:
my @characters;
for my $name (@names) {
my %flinstones = (
husband => $name,
pal => "",
);
push @characters, %flinstones; # What's this?
}
In your @characters array, you're pushing in a list of four items, and not a hash!
The push is taking %flinstones in list context. If %flinstones is
husband => Fred,
pal => "",
The push will look like this:
@characters = ( "husband", "Fred", "pal", "" );
The next time it executes (and assuming $name gets changed to Barney), you'll see this:
@characters = ("husband", "Fred", "pal", "", "husband", "Barney", "pal", "");
You're basically destroying the structure of your hash with that loop. Not what you want.
What you might have meant is this:
push @characters, \%flinstones; # See the difference from the above?
The backslash in front of %flinstones says you're pushing a reference to that %flinstones hash into your array, and not the items (both keys and values) into your array. That one little backslash makes a big difference in your program. Even worse, both push statements are grammatically correct. Your Perl program will run with either one.
No wonder new Perl users find references so confusing!
After your loop, your array will look something like this:
$characters[0] = { husband => "Fred", pal => "" };
$characters[1] = { husband => "Barney", pal => "" };
Note that curly braces talk about a hash reference! Note that you have an array of hash references this way, and your hash structure is saved. You could pop off each hash reference (and remember it's a reference!) or talk about $characters[0]->{husband} being set to Fred.
I highly recommend you read the Perl Tutorial on References. I also recommend that you look at the Data::Dumper module. You can use this tool to print out your complex data structure and see what's going on.
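To answer the "how do I set the pal field later" part directly, here is a minimal sketch. The contents of @names are assumed for the example; the loop pushes one anonymous hash per name, which has the same effect as pushing \%flinstones for a hash declared inside the loop.

```perl
use strict;
use warnings;

my @names = ( "Fred", "Barney" );    # hypothetical input list

my @characters;
for my $name (@names) {
    # A fresh anonymous hash per iteration; push stores a reference to it.
    push @characters, { husband => $name, pal => "" };
}

# Later in the program: set a field through the reference.
$characters[0]->{pal} = "Barney";
$characters[1]{pal}   = "Fred";      # the arrow is optional between subscripts

for my $character (@characters) {
    print "$character->{husband} is pals with $character->{pal}\n";
}
```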
By the way, you usually see a hash of hashes for things like this. Imagine instead of using an array @characters, you use a hash %character keyed by the first name of that character:
my %character;
$character{Fred} = {}; #Some hash reference. We'll fill it out later...
$character{Barney} = {};
$character{Wilma} = {};
$character{Betty} = {};
Now, we can talk about each character! Let's fill in some fields:
$character{Fred}->{spouse} = "Wilma";
$character{Fred}->{pal} = "Barney";
$character{Barney}->{spouse} = "Betty";
$character{Barney}->{pal} = "Fred";
$character{Betty}->{spouse} = "Barney";
$character{Betty}->{pal} = "Wilma";
So, we have a hash, and each entry is a sub-hash (a reference to another hash) that contains two entries (one for spouse and one for pal). We can use this structure to track complex relationships.
In fact, it's likely people have more than one pal. Let's make pal point to an array!
$character{Fred}->{pal} = []; #This is an array reference!
push @{ $character{Fred}->{pal} }, ("Barney", "Joe");
Now, $character{Fred} has two pals! Note the dereferencing syntax to turn that array reference ($character{Fred}->{pal}) back into an array, so I can push items into it.
Read the tutorial. It's pretty simple, and then play around with references until you have a better idea what's going on. Remember too that references and hashes need to be referenced and dereferenced, and that the type of data grouping parameters you use (parentheses, curly brackets, or square brackets) can make a big difference on whether you're talking about a reference or an array or hash, and even what type of reference you're talking about.
I think your data structure doesn't work as well as you think it does - an array is for an ordered sequence of items.
A hash is for an unordered set of key-value pairs. It's not all that usual to nest hashes inside an array, for that reason.
The problem you'll be having with that 'push' is that if you treat a hash like an array, it actually works ... like an array. Internally, they're both 'sequences of scalars' and the key difference is that an array maintains its order, where a hash just ensures the relationships between keys and values persist.
For example:
my @array = ( "husband", "fred", "pet", "" );
my %hash = @array;
foreach my $key ( keys %hash ) {
print "$key = $hash{$key}\n";
}
This works in reverse too:
@array = %hash;
print join (":", @array );
So what you're doing is shoving into @characters ("husband", $name, "pal", "" ). That's even less likely to be what you want to do.
So first off - to insert your hash into your array, you need to put a hash reference:
push @characters, \%flinstones; ## ITYM flintstones
Then you'll be able to:
for my $character ( @characters ) {
print $character -> {'husband'};
}
But I don't actually think that structure does what you want, so you may want to consider taking an object oriented approach instead.
I'd avoid the array of hashes/hashes of arrays and spend a little bit of time looking at how to do OO in Perl. It's a little bit more work up-front but will save maintenance headaches further down the line. You can do a man perlboot (perlootut on newer Perls) or have a look at the online Perl OO tutorial.
I need to create multi-dimensional hash.
for example I have done:
$hash{gene} = $mrna;
if (exists ($exon)){
$hash{gene}{$mrna} = $exon;
}
if (exists ($cds)){
$hash{gene}{$mrna} = $cds;
}
where $gene, $mrna, $exon, $cds are unique ids.
But, my issue is that I want some properties of $gene and $mrna to be included in the hash.
for example:
$hash{$gene}{'start_loc'} = $start;
$hash{gene}{mrna}{'start_loc'} = $start;
etc. But, is that a feasible way of declaring a hash? If I call $hash{$gene} both $mrna and start_loc will be printed. What could be the solution?
How would I add multiple values for the same key, with $gene and $mrna being the keys in this case?
Any suggestions will be appreciated.
What you need to do is to read the Perl Reference Tutorial.
Simple answer to your question:
Perl hashes can only take a single value to a key. However, that single value can be a reference to a memory location of another hash.
my %hash1 = ( foo => "bar", fu => "bur" ); #First hash
my %hash2;
$hash2{some_key} = \%hash1; #Reference to %hash1
And, there's nothing stopping that first hash from containing a reference to another hash. It's turtles all the way down!
So yes, you can have a complex and convoluted structure as you like with as many sub-hashes as you want. Or mix in some arrays too.
For various reasons, I prefer the -> syntax when using these complex structures. I find that for more complex structures, it makes it easier to read. However, the main thing is it makes you remember these are references and not actual multidimensional structures.
For example:
$hash{gene}->{mrna}->{start_loc} = $start; #Quote not needed in string if key name qualifies as a valid variable name.
The best thing to do is to think of your hash as a structure. For example:
my $person_ref = {}; #Person is a hash reference.
$person_ref->{NAME}->{FIRST} = "Bob";
$person_ref->{NAME}->{LAST} = "Rogers";
$person_ref->{PHONE}->{WORK}->[0] = "555-1234"; #An Array Ref. Might have > 1
$person_ref->{PHONE}->{WORK}->[1] = "555-4444";
$person_ref->{PHONE}->{CELL}->[0] = "555-4321";
...
my @people;
push @people, $person_ref;
Now, I can load up my @people array with all my people, or maybe use a hash:
my %person;
$person{$bobs_ssn} = $person_ref; #Now, all of Bob's info is indexed by his SSN.
So, the first thing you need to do is to think of what your structure should look like. What are the fields in your structure? What are the sub-fields? Figure out what your structure should look like, and then setup your hash of hashes to look like that. Figure out exactly how it will be stored and keyed.
Remember, this hash contains references to your genes (or whatever), so you want to choose your keys wisely.
Read the tutorial. Then, try your hand at it. It's not all that complicated to understand. However, it can be a bear to maintain.
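Applied to the gene/mRNA question, one possible layout keeps a gene's own properties and its mRNAs under separate keys, so that start_loc and the mRNA ids never collide. This is only a sketch: the ids, coordinates, and the key names mrna, exons and cds are assumptions chosen for the example.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;    # stable key order in the dump

my %hash;

# Hypothetical ids and coordinates, only for illustration.
my ( $gene, $mrna ) = ( 'gene1', 'mrna1' );

$hash{$gene}{start_loc}              = 100;    # property of the gene
$hash{$gene}{mrna}{$mrna}{start_loc} = 120;    # property of one mRNA

# Several values under one key: store them in an array reference.
push @{ $hash{$gene}{mrna}{$mrna}{exons} }, 'exon1', 'exon2';
push @{ $hash{$gene}{mrna}{$mrna}{cds} }, 'cds1';

print Dumper( \%hash );

# Walk the mRNAs of one gene without touching the gene's own properties.
for my $id ( sort keys %{ $hash{$gene}{mrna} } ) {
    my $m = $hash{$gene}{mrna}{$id};
    print "$id starts at $m->{start_loc}, exons: @{ $m->{exons} }\n";
}
```

Because the mRNAs sit under their own mrna key, looping over them never picks up start_loc by accident.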
When you say use strict;, you give yourself some protection:
my $foo = "bar";
say $Foo; #This won't work!
This won't work because you didn't declare $Foo, you declared $foo. The use strict; pragma can catch variable names that are mistyped, but:
my %var;
$var{foo} = "bar";
say $var{Foo}; #Whoops!
This will not be caught (except maybe by a warning that $var{Foo} has not been initialized). The use strict; pragma can't detect mistakes in typing in your keys.
The next step, after you've grown comfortable with references is to move onto object oriented Perl. There's a Tutorial for that too.
All Object Oriented Perl does is to take your hash references, and turns them into objects. Then, it creates subroutines that will help you keep track of manipulating objects. For example:
sub last_name {
my $person = shift; #Don't worry about this for now..
my $last_name = shift;
if ( defined $last_name ) {
$person->{NAME}->{LAST} = $last_name;
}
return $person->{NAME}->{LAST};
}
When I set my last name using this subroutine ...I mean method, I guarantee that the key will be $person->{NAME}->{LAST} and not $person->{LAST}->{NAME} or $person->{LAST}->{NMAE} or $person->{last}->{name}.
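For a feel of how that method is used, here is a minimal self-contained sketch. The package name Person and the new constructor are assumptions added for the example; bless is the built-in that turns a plain hash reference into an object.

```perl
use strict;
use warnings;

package Person;    # hypothetical class name

# Constructor: bless a hash reference so method calls work on it.
sub new {
    my ($class) = @_;
    return bless { NAME => {} }, $class;
}

# Combined getter/setter: sets the name when given an argument.
sub last_name {
    my $self = shift;
    if (@_) {
        $self->{NAME}->{LAST} = shift;
    }
    return $self->{NAME}->{LAST};
}

package main;

my $person = Person->new;
$person->last_name("Rogers");
print $person->last_name, "\n";
```

The caller never spells out {NAME}->{LAST}, so a mistyped key can only happen in one place, inside the method.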
The main problem isn't learning the mechanisms, but learning to apply them. So, think about exactly how you want to represent your items. Think about what fields you want, and how you're going to pull up that information.
You could try pushing each value onto a hash of arrays:
my (@gene, @mrna, @exon, @cds);
my %hash;
push @{ $hash{$gene[$_]} }, [$mrna[$_], $exon[$_], $cds[$_] ] for 0 .. $#gene;
This way gene is the key, with multiple values ($mrna, $exon, $cds) associated with it.
Iterate over keys/values as follows:
for my $key (sort keys %hash) {
print "Gene: $key\t";
for my $value (@{ $hash{$key} } ) {
my ($mrna, $exon, $cds) = @$value; # De-references the array
print "Values: [$mrna], [$exon], [$cds]\n";
}
}
The answer to a question I've asked previously might be of help (Can a hash key have multiple 'subvalues' in perl?).