Perl data structure - perl

I'm completely new to Perl and I'm trying to build up a data structure which should be rather simple. I have a couple of loops collecting data from the database on each iteration, and I want to be able to store this data in an array of hashmaps.
Now this is where my difficulty currently lies: there is a loop that runs before the data-collecting loops and just builds a list of names that will be looped over in the collection loop. What I'm trying to do in that loop is to create an array of hashmaps, assign the name to a field in each map, and leave the other fields empty.
Once that's done, how do I assign a value to an item inside a map contained in an array in Perl?
EDIT (seeing as I'm getting downvoted):
my @characters;
for my $name (@names) {
my %flinstones = (
husband => $name,
pal => "",
);
push @characters, %flinstones;
}
Now how do I set the pal field later in the program?

Perl has three basic variable structures: Scalars ($foo), Arrays (@foo), and Hashes (%foo).
Perl allows you to use references to these structures. A reference is a pointer to a particular variable type. For example, I could have a reference to a Hash or an Array. It's by using references that you can build more complex structures.
It's important to keep in mind that these more complex structures are references and can be tricky for someone new in Perl to understand. For example:
@foo = (1, 2, 3, 4);   # This is a standard array.
%foo = (1, 2, 3, 4);   # This is a hash with two items. 1 & 3 are keys. 2 & 4 are values.
$foo = [1, 2, 3, 4];   # A reference to a nameless array that contains four items.
$foo = {1, 2, 3, 4};   # A reference to a nameless hash that contains two items (keys 1 and 3).
Note that when I merely change the parentheses to square brackets or curly braces, I am now talking about references and not hashes or arrays. Also note that you can convert data back and forth between arrays and hashes.
@foo = (1, 2, 3, 4);   # This is an array of four numbers
%foo = @foo;           # %foo is a hash with two data items: 1 => 2 and 3 => 4
@bar = %foo;           # @bar is another array! (The order of the pairs is not guaranteed.)
No wonder people new to Perl can get confused by this!
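A small sketch that makes these differences visible: it counts the elements each construct holds and asks `ref` what the two references point to. The variable names are just the ones used above, with `my` added so it runs under `use strict`.

```perl
use strict;
use warnings;

my @foo  = (1, 2, 3, 4);    # a plain array with four elements
my %foo  = @foo;            # list assigned to a hash: 1 => 2, 3 => 4
my $aref = [1, 2, 3, 4];    # reference to an anonymous array
my $href = {1, 2, 3, 4};    # reference to an anonymous hash

print scalar(@foo), "\n";         # 4 elements
print scalar(keys %foo), "\n";    # 2 keys
print ref($aref), "\n";           # ARRAY
print ref($href), "\n";           # HASH
print $foo{3}, "\n";              # 4, the value stored under key 3

# Flattening the hash again yields its key/value pairs as a list;
# the order of the pairs is not guaranteed.
my @bar = %foo;
print scalar(@bar), "\n";         # 4
```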
Let's take a look at that loop:
my @characters;
for my $name (@names) {
my %flinstones = (
husband => $name,
pal => "",
);
push @characters, %flinstones;   # What's this?
}
In your @characters array, you're pushing in a list of four items, and not a hash!
The push evaluates %flinstones in list context, so the hash is flattened into its keys and values. If %flinstones is
husband => "Fred",
pal => "",
The push will look like this:
@characters = ( "husband", "Fred", "pal", "" );
The next time it executes (and assuming $name gets changed to Barney), you'll see this:
@characters = ("husband", "Fred", "pal", "", "husband", "Barney", "pal", "");
You're basically destroying the structure of your hash with that loop. Not what you want.
What you might have meant is this:
push @characters, \%flinstones;   # See the difference from the above?
The backslash in front of %flinstones says you're pushing a reference to that %flinstones hash into your array, and not its items (both keys and values). That one little backslash makes a big difference in your program. Even worse, both push statements are syntactically valid. Your Perl program will run with either one. (It also matters that %flinstones is declared with my inside the loop: each iteration creates a new hash, so every reference you push points to a different one.)
No wonder new Perl users find references so confusing!
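A minimal sketch contrasting the two push statements on a single hash (the hash here is a stand-in for one iteration of the loop, with the name spelled %flintstones):

```perl
use strict;
use warnings;

my %flintstones = (husband => 'Fred', pal => '');

my @flat;
push @flat, %flintstones;     # list context: pushes keys and values
my @refs;
push @refs, \%flintstones;    # pushes a single scalar, a reference to the hash

print scalar(@flat), "\n";        # 4 (two keys plus two values)
print scalar(@refs), "\n";        # 1
print $refs[0]{husband}, "\n";    # Fred
```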
After your loop, your array will look something like this:
$characters[0] = { husband => "Fred", pal => "" };
$characters[1] = { husband => "Barney", pal => "" };
Note that the curly braces here denote a hash reference! You have an array of hash references this way, and your hash structure is preserved. You could pop off each hash reference (and remember it's a reference!) or talk about $characters[0]->{husband} being set to Fred. To set the pal field later in the program, assign through the reference in the same way: $characters[0]->{pal} = "Barney";
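Putting the pieces together, here is a minimal sketch of the whole round trip: build the array of hash references, then set pal later in the program, either by index or by searching for the right element. The @names contents are a hypothetical stand-in for the question's database results, and %flintstones is the question's %flinstones with the spelling fixed.

```perl
use strict;
use warnings;

# Hypothetical input; in the question these names come from a database loop.
my @names = ('Fred', 'Barney');

my @characters;
for my $name (@names) {
    # "my" inside the loop creates a new hash on every iteration,
    # so each pushed reference points to a different hash.
    my %flintstones = (
        husband => $name,
        pal     => "",
    );
    push @characters, \%flintstones;
}

# Later in the program: update a field by array index...
$characters[0]->{pal} = 'Barney';

# ...or by searching for the element you want.
for my $character (@characters) {
    $character->{pal} = 'Fred' if $character->{husband} eq 'Barney';
}

print "$_->{husband}'s pal is $_->{pal}\n" for @characters;
```

A common shorthand for the loop body is push @characters, { husband => $name, pal => "" }; which pushes a reference to a new anonymous hash without ever naming it.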
I highly recommend you read the Perl Tutorial on References. I also recommend that you look at the Data::Dumper module. You can use this tool to print out your complex data structure and see what's going on.
By the way, you usually see a hash of hashes for things like this. Imagine that instead of using an array, @characters, you use a hash, %characters, keyed by the first name of each character:
my %characters;
$characters{Fred} = {};     # Some hash reference. We'll fill it out later...
$characters{Barney} = {};
$characters{Wilma} = {};
$characters{Betty} = {};
Now, we can talk about each character! Let's fill in some fields:
$characters{Fred}->{spouse} = "Wilma";
$characters{Fred}->{pal} = "Barney";
$characters{Barney}->{spouse} = "Betty";
$characters{Barney}->{pal} = "Fred";
$characters{Betty}->{spouse} = "Barney";
$characters{Betty}->{pal} = "Wilma";
So, we have a hash, and each entry is a sub-hash (a reference to another hash) that contains two entries (one for spouse and one for pal). We can use this structure to track complex relationships.
In fact, it's likely people have more than one pal. Let's make pal point to an array!
$characters{Fred}->{pal} = [];   # This is an array reference!
push @{ $characters{Fred}->{pal} }, ("Barney", "Joe");
Now, $characters{Fred} has two pals! Note the dereferencing syntax, @{ ... }, which turns that array reference ($characters{Fred}->{pal}) back into an array, so I can push items into it.
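A short sketch of the hash-of-hashes version with pal as an array reference, walking the structure and then dumping it with Data::Dumper as suggested above. Only Fred and Barney are filled in here, and Barney's pal list is an illustrative assumption.

```perl
use strict;
use warnings;
use Data::Dumper;

my %characters;
$characters{Fred}   = { spouse => 'Wilma', pal => [] };   # pal holds an array reference
$characters{Barney} = { spouse => 'Betty', pal => [] };

push @{ $characters{Fred}->{pal} }, ('Barney', 'Joe');
push @{ $characters{Barney}{pal} }, 'Fred';   # the arrow between subscripts is optional

for my $name (sort keys %characters) {
    my $info = $characters{$name};            # a hash reference
    print "$name: spouse $info->{spouse}, pals ",
          join(', ', @{ $info->{pal} }), "\n";
}

# Dump the whole structure; Sortkeys makes the key order stable.
local $Data::Dumper::Sortkeys = 1;
print Dumper(\%characters);
```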
Read the tutorial. It's pretty simple; then play around with references until you have a better idea of what's going on. Remember too that arrays and hashes need to be referenced, and references dereferenced, and that the type of grouping characters you use (parentheses, curly braces, or square brackets) can make a big difference in whether you're talking about a reference or an array or hash, and even what type of reference you're talking about.

I think your data structure doesn't work as well as you think it does: an array is for an ordered sequence of items.
A hash is for an unordered set of key-value pairs. It's not all that usual to nest hashes inside an array, for that reason.
The problem you'll be having with that push is that if you treat a hash like an array, it actually works ... like an array. In list context, both flatten to a sequence of scalars, and the key difference is that an array maintains its order, whereas a hash just ensures the relationships between keys and values persist.
For example:
my @array = ( "husband", "fred", "pet", "" );
my %hash = @array;
foreach my $key ( keys %hash ) {
print "$key = $hash{$key}\n";
}
This works in reverse too:
@array = %hash;
print join (":", @array );
So what you're doing is shoving ("husband", $name, "pal", "") into @characters. That's even less likely to be what you want to do.
So first off - to insert your hash into your array, you need to put a hash reference:
push #characters, \%flinstones; ## ITYM flintstones
Then you'll be able to:
for my $character ( @characters ) {
print $character->{'husband'}, "\n";
}
But I don't actually think that structure does what you want, so you may want to consider taking an object-oriented approach instead.

I'd avoid the array of hashes/hashes of arrays and spend a little bit of time looking at how to do OO in Perl. It's a little bit more work up front, but will save maintenance headaches further down the line. You can run man perlboot or have a look at the online Perl OO tutorial.

Related

What are the advantages of anonymous arrays in Perl?

What are the advantages of anonymous array in Perl?
Array references (anonymous arrays are one kind) allow Perl to treat the array as a single item. This also allows you to build complex, nested data structures. This applies to hash, code, and all the other reference types too, but here I'll show only the array reference. You can read Intermediate Perl for much more.
For the immediate, literal question, consider that there are two ways to make an array: with a named variable, and without a named variable ("anonymous"):
my @named_array = ( 1, 3, 7 );
[ 1, 3, 7 ];
The first line takes a list and stores it in a named array variable. You are probably used to seeing that everywhere.
That second line, [ 1, 3, 7 ], doesn't do anything. It's just a value.
But, consider this analogue, where you store a scalar value in a scalar variable (overloaded use of "scalar" there), and the next line that is just the value:
my $number = 6;
6;
Now here's the trick. You know that you can pass a scalar variable to a subroutine. You could write that as this:
my $number = 6;
some_sub( $number );
But why bother with the variable at all if that's the only use of it? Get rid of it altogether and pass the value directly:
some_sub( 6 );
It's the same thing with anonymous references. You can make the named version first and take a reference to it:
my @array = ( 1, 3, 7 );
some_sub( \@array );
But just like the scalar example, you don't need to clutter your code with a named array if it's only there so you can get a reference to it. Just make the reference directly:
some_sub( [ 1, 3, 7 ] );
But there's more to the story, and you have to know a little about how Perl works to understand it.
Why references at all?
Perl is mostly built around scalars (single items) and lists (multiple items). A scalar variable holds a scalar value, and an array variable holds a list (see What’s the difference between a list and an array?).
There are many places where you can use only a scalar, including single array elements, hash keys, and hash values:
$array[$i] = $single_item;
$hash{$single_item} = $other_single_item;
Other places are always a list, such as the argument list to a subroutine:
sub some_sub {
my @args = @_;
...
}
Even if you call some_sub with two arrays, you end up with a single list stored in @_. You can't tell where @array_1 stopped and @array_2 started. This is all one list whose size is the combined sizes of the two arrays:
some_sub( @array_1, @array_2 );
References are a way to treat something as a single item. When you get that single item, you dereference it to get back to the original.
This means that you can store an array reference as a hash value:
$hash{$key} = \@some_array;   # ref to named variable
$hash{$key} = [ 1, 3, 7 ]; # anonymous array directly
Or, you can create a list where each item is an array reference rather than the single, "flat" list you saw before:
my @Array_of_Arrays = ( \@array_1, \@array_2 , [1,3,7], ... );
my $seventh_of_ninth = $Array_of_Arrays[8][6];   # indices start at 0
The Perl Data Structures Cookbook (perldsc) has many examples of different sorts of complex data structures which you build with references.
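A small sketch of both ideas: array references stored as hash values and as elements of an array of arrays. The array contents and the hash keys named and anon are illustrative assumptions. Because a reference points at the original array, later changes to that array show through it.

```perl
use strict;
use warnings;

my @array_1 = (1, 2, 3);
my @array_2 = (4, 5, 6);

my %hash;
$hash{named} = \@array_1;    # reference to a named array
$hash{anon}  = [1, 3, 7];    # reference to an anonymous array

my @Array_of_Arrays = (\@array_1, \@array_2, [1, 3, 7]);

print $hash{anon}[2], "\n";              # 7
print $Array_of_Arrays[1][0], "\n";      # 4, the first element of the second array
print scalar(@{ $hash{named} }), "\n";   # 3

# The references point at the original array, so later changes show through them.
push @array_1, 99;
print $Array_of_Arrays[0][-1], "\n";     # 99
print $hash{named}[-1], "\n";            # 99
```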
You can pass references to subroutines so the array elements don't mix. This argument list is exactly two elements, and inside the subroutine you know which array you are dealing with:
some_sub( \@array_1, \@array_2 );
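A minimal sketch of the flattening problem and the reference fix. count_args and sizes are hypothetical helpers written only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration.
sub count_args { return scalar @_ }

sub sizes {
    my ($first, $second) = @_;           # two array references
    return (scalar @$first, scalar @$second);
}

my @array_1 = (1, 2, 3);
my @array_2 = (4, 5);

my $flat = count_args(@array_1, @array_2);    # 5: the arrays are flattened together
my $refs = count_args(\@array_1, \@array_2);  # 2: one reference per array
my ($n1, $n2) = sizes(\@array_1, [4, 5]);     # 3 and 2

print "$flat $refs $n1 $n2\n";
```

Note that the second argument to sizes is an anonymous array, which is exactly the "no named variable needed" case described earlier.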
If you were curious about another aspect of this, you can update your question.

Does iterating over a hash reference require implicitly copying it in perl?

Let's say I have a large hash and I want to iterate over its contents. The standard idiom would be something like this:
while(($key, $value) = each(%{$hash_ref})){
# do something
}
}
However, if I understand my perl correctly this is actually doing two things. First the
%{$hash_ref}
is translating the ref into list context. Thus returning something like
(key1, value1, key2, value2, key3, value3 etc)
which will be stored in my stack memory. Then the each method will run, eating the first two values in memory (key1 & value1) and returning them to my while loop to process.
If my understanding of this is right, that means that I have effectively copied my entire hash into my stack memory only to iterate over the new copy, which could be expensive for a large hash, due to the expense of iterating over the array twice, but also due to potential cache misses if both hashes can't be held in memory at once. It seems pretty inefficient. I'm wondering if this is what really happens, or if I'm either misunderstanding the actual behavior or the compiler optimizes away the inefficiency for me?
Follow up questions, assuming I am correct about the standard behavior.
Is there a syntax to avoid copying the hash by iterating over the values in the original hash? If not for a hash, is there one for the simpler array?
Does this mean that in the above example I could get inconsistent values between the copy of my hash and my actual hash if I modify the hash_ref content within my loop, resulting in $value having a different value than $hash_ref->{$key}?
No, the syntax you quote does not create a copy.
This expression:
%{$hash_ref}
is exactly equivalent to:
%$hash_ref
and assuming the $hash_ref scalar variable does indeed contain a reference to a hash, then adding the % on the front is simply 'dereferencing' the reference - i.e. it resolves to a value that represents the underlying hash (the thing that $hash_ref was pointing to).
If you look at the documentation for the each function, you'll see that it expects a hash as an argument. Putting the % on the front is how you provide a hash when what you have is a hashref.
If you wrote your own subroutine and passed a hash to it like this:
my_sub(%$hash_ref);
then on some level you could say that the hash had been 'copied', since inside the subroutine the special @_ array would contain a list of all the key/value pairs from the hash. However, even in that case, the values in @_ are actually aliases for the hash's values (only the keys are copies). You'd only actually get a copy of the values if you did something like: my @args = @_.
Perl's builtin each function is declared with the prototype '+' which effectively coerces a hash (or array) argument into a reference to the underlying data structure.
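As an aside, starting with version 5.14, the each function could also take a reference to a hash. So instead of:
($key, $value) = each(%{$hash_ref})
you could simply say:
($key, $value) = each($hash_ref)
That form was experimental, however, and it was removed in Perl 5.24, where it is now an error. On current Perl, write each(%$hash_ref), or each $hash_ref->%* with postfix dereferencing.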
No copy is created by each (though you do copy the returned values into $key and $value through assignment). The hash itself is passed to each.
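A minimal sketch addressing the follow-up question: iterate with each over a dereferenced hash reference and update values in place. The hash contents are illustrative. Assigning to an existing key neither adds nor deletes entries, so it does not disturb the iterator, but $value is the copy made by the list assignment, so after the update it no longer matches $hash_ref->{$key}.

```perl
use strict;
use warnings;

my $hash_ref = { a => 1, b => 2, c => 3 };

my $total = 0;
while (my ($key, $value) = each %$hash_ref) {
    $total += $value;
    # Changing the value of an existing key does not add or delete
    # entries, so the each iterator is not disturbed.
    $hash_ref->{$key} = $value * 10;
    # $value is a copy made by the list assignment; it still holds the old value.
}

print "$total\n";                                                       # 6
print join(',', map { $hash_ref->{$_} } sort keys %$hash_ref), "\n";   # 10,20,30
```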
each is a little special. It supports the following syntaxes:
each HASH
each ARRAY
As you can see, it doesn't accept an arbitrary expression. (That would be each EXPR or each LIST). The reason for that is to allow each(%foo) to pass the hash %foo itself to each rather than evaluating it in list context. each can do that because it's an operator, and operators can have their own parsing rules. However, you can do something similar with the \% prototype.
use Data::Dumper;
sub f { print(Dumper(@_)); }
sub g(\%) { print(Dumper(@_)); }   # Similar to each
my %h = (a=>1, b=>2);
f(%h); # Evaluates %h in list context.
print("\n");
g(%h); # Passes a reference to %h.
Output:
$VAR1 = 'a'; # 4 args, the keys and values of the hash
$VAR2 = 1;
$VAR3 = 'b';
$VAR4 = 2;
$VAR1 = { # 1 arg, a reference to the hash
'a' => 1,
'b' => 2
};
%{$h_ref} is the same as %h, so all of the above applies to %{$h_ref} too.
Note that the hash isn't copied even if it is flattened. The keys are "copied", but the values are returned directly.
use Data::Dumper;
my %h = (abc=>"def", ghi=>"jkl");
print(Dumper(\%h));
$_ = uc($_) for %h;
print(Dumper(\%h));
Output:
$VAR1 = {
'abc' => 'def',
'ghi' => 'jkl'
};
$VAR1 = {
'abc' => 'DEF',
'ghi' => 'JKL'
};
You can read more about this here.
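To the question of whether there is a copy-free way to walk the values: a short sketch (the data is illustrative) relying on two documented behaviours, namely that values returns the hash's actual values rather than copies, and that a foreach loop variable is aliased to each array element. Modifying the loop variable therefore modifies the original data.

```perl
use strict;
use warnings;

my $h_ref = { a => 1, b => 2 };
my $a_ref = [1, 2, 3];

# values returns the hash's values themselves, not copies,
# so modifying $_ modifies the hash.
$_ *= 100 for values %$h_ref;

# foreach aliases its loop variable to each array element.
for my $element (@$a_ref) {
    $element += 1;
}

print "$h_ref->{a} $h_ref->{b}\n";   # 100 200
print "@$a_ref\n";                    # 2 3 4
```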

Need help understanding portion of script (globs and references)

I was reviewing this question, esp the response from Mr Eric Strom, and had a question regarding a portion of the more "magical" element within. Please review the linked question for the context as I'm only trying to understand the inner portion of this block:
for (qw($SCALAR @ARRAY %HASH)) {
my ($sigil, $type) = /(.)(.+)/;
if (my $ref = *$glob{$type}) {
$vars{$sigil.$name} = /\$/ ? $$ref : $ref
}
}
So, it loops over three words, breaking each into two vars, $sigil and $type. The if {} block is what I am not understanding. I suspect the portion inside the ( .. ) is getting a symbolic reference to the content within $glob{$type}... there must be some "magic" (some esoteric element of the underlying mechanism that I don't yet understand) relied upon there to determine the type of the "pointed-to" data?
The next line is also partly baffling. Appears to me that we are assigning to the vars hash, but what is the rhs doing? We did not assign to $_ in the last operation ($ref was assigned), so what is being compared to in the /\$/ block? My guess is that, if we are dealing with a scalar (though I fail to discern how we are), we deref the $ref var and store it directly in the hash, otherwise, we store the reference.
So, just looking for a little tale of what is going on in these three lines. Many thanks!
You have hit upon one of the most arcane parts of the Perl language, and I can best explain by referring you to Symbol Tables and Typeglobs from brian d foy's excellent Mastering Perl. Note also that there are further references to the relevant sections of Perl's own documentation at the bottom of the page, the most relevant of which is Typeglobs and Filehandles in perldata.
Essentially, the way Perl symbol tables work is that every package has a "stash" (a "symbol table hash") whose name is the same as the package but with a pair of trailing colons. So the stash for the default package main is called %main::. If you run this simple program
perl -E"say for keys %main::"
you will see all the familiar built-in identifiers.
The values of the stash elements are typeglobs, which behave somewhat like hashes: they have slots that correspond to the different data types, SCALAR, ARRAY, HASH, CODE, etc., and each slot holds a reference to the data item with that type and identifier.
Suppose you define a scalar variable $xx, or more fully, $main::xx:
our $xx = 99;
Now the stash for the main package is %main::, and the typeglob for all data items with the identifier xx is referenced by $main::{xx} so, because the sigil for typeglobs is a star * in the same way that scalar identifiers have a dollar $, we can dereference this as *{$main::{xx}}. To get the reference to the scalar variable that has the identifier xx, this typeglob can be indexed with the SCALAR string, giving *{$main::{xx}}{SCALAR}. Once more, this is a reference to the variable we're after, so to collect its value it needs dereferencing once again, and if you write
say ${*{$main::{xx}}{SCALAR}};
then you will see 99.
That may look a little complex when written in a single statement, but it is fairly straightforward when split up. The code in your question has the variable $glob set to a typeglob, which corresponds to this with respect to $main::xx:
my $type = 'SCALAR';
my $glob = $main::{xx};
my $ref = *$glob{$type};
Now if we print $ref we get SCALAR(0x1d12d94) or similar, which is a reference to $main::xx as before, and printing $$ref will show 99 as expected.
The subsequent assignment to %vars is straightforward Perl, and I don't think you should have any problem understanding that once you get the principle that a package's symbol table is a stash of typeglobs, which you can think of as a hash of hashes.
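A runnable sketch of the step-by-step lookup above, with an array added alongside $xx (the @xx contents are an illustrative assumption). It also shows that a slot that was never used, here HASH, comes back undef.

```perl
use strict;
use warnings;

our $xx = 99;
our @xx = (1, 2, 3);

my $glob = $main::{xx};          # the stash entry for the name "xx"
my $sref = *$glob{SCALAR};       # reference to $main::xx
my $aref = *$glob{ARRAY};        # reference to @main::xx
my $href = *$glob{HASH};         # undef: %main::xx was never used

print $$sref, "\n";              # 99
print "@$aref\n";                # 1 2 3
print defined $href ? "hash slot used\n" : "hash slot empty\n";
```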
The elements of the iteration are strings. Since we don't have a lexical variable at the top of the loop, the element variable is $_. And it retains that value throughout the loop. Only one of those strings has a literal dollar sign, so we're telling the difference between '$SCALAR' and the other cases.
So what it is doing is getting 3 slots out of a package-level typeglob (sometimes shortened, with a little ambiguity, to "glob"): *g{SCALAR}, *g{ARRAY} and *g{HASH}. The glob stores a hash and an array as a reference, so we simply store the reference into the hash. But the glob stores a scalar as a reference to a scalar, so it needs to be dereferenced to be stored as just a scalar.
So if you had a glob *a and in your package you had:
our $a = 'boo';
our @a = ( 1, 2, 3 );
our %a = ( One => 1, Two => 2 );
The resulting hash would be:
{ '$a' => 'boo'
, '%a' => { One => 1, Two => 2 }
, '@a' => [ 1, 2, 3 ]
};
Meanwhile the glob can be thought to look like this:
a =>
{ SCALAR => \'boo'
, ARRAY => [ 1, 2, 3 ]
, HASH => { One => 1, Two => 2 }
, CODE => undef
, IO => undef
, GLOB => undef
};
So, to specifically answer your question:
if (my $ref = *$glob{$type}) {
$vars{$sigil.$name} = /\$/ ? $$ref : $ref
}
If a slot is not used, it is undef (the SCALAR slot is the exception: perlref notes that *foo{SCALAR} returns a reference to an anonymous scalar even if $foo hasn't been used yet). Thus $ref is assigned either a reference or undef, which evaluates to true as a reference and false as undef. So if we have a reference, we store the contents of that glob slot in the %vars hash under the key $sigil . $name: the reference itself if it is a "container type" (array or hash), but the dereferenced value if it is a scalar.
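A self-contained sketch of the three lines from the question, wrapped in the minimum scaffolding needed to run. The package variables named thing and the choice of main as the package are assumptions for illustration; in the original code $glob and $name come from the surrounding context in the linked answer.

```perl
use strict;
use warnings;

# Hypothetical package variables sharing the name "thing".
our $thing = 'boo';
our @thing = (1, 2, 3);
our %thing = (One => 1, Two => 2);

my $name = 'thing';
my $glob = $main::{$name};       # typeglob for *main::thing
my %vars;

for (qw($SCALAR @ARRAY %HASH)) {
    my ($sigil, $type) = /(.)(.+)/;          # e.g. '$' and 'SCALAR'
    if (my $ref = *$glob{$type}) {           # skip slots that are undef
        # Only '$SCALAR' contains a dollar sign: store the scalar's value,
        # but keep array and hash slots as references.
        $vars{$sigil . $name} = /\$/ ? $$ref : $ref;
    }
}

print $vars{'$thing'}, "\n";                    # boo
print join(' ', @{ $vars{'@thing'} }), "\n";    # 1 2 3
print $vars{'%thing'}{Two}, "\n";               # 2
```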

Adding multiple values to key in perl hash

I need to create multi-dimensional hash.
for example I have done:
$hash{gene} = $mrna;
if (exists ($exon)){
$hash{gene}{$mrna} = $exon;
}
if (exists ($cds)){
$hash{gene}{$mrna} = $cds;
}
where $gene, $mrna, $exon, $cds are unique ids.
But, my issue is that I want some properties of $gene and $mrna to be included in the hash.
for example:
$hash{$gene}{'start_loc'} = $start;
$hash{gene}{mrna}{'start_loc'} = $start;
etc. But, is that a feasible way of declaring a hash? If I call $hash{$gene} both $mrna and start_loc will be printed. What could be the solution?
How would I add multiple values for the same key, with $gene and $mrna being the keys in this case?
Any suggestions will be appreciated.
What you need to do is to read the Perl Reference Tutorial.
Simple answer to your question:
Perl hashes can only hold a single value per key. However, that single value can be a reference to another hash.
my %hash1 = ( foo => "bar", fu => "bur" );   # First hash
my %hash2;
$hash2{some_key} = \%hash1;                 # Reference to %hash1
And there's nothing stopping that first hash from containing a reference to another hash. It's turtles all the way down!
So yes, you can have as complex and convoluted a structure as you like, with as many sub-hashes as you want. Or mix in some arrays too.
For various reasons, I prefer the -> syntax when using these complex structures. I find that for more complex structures, it makes them easier to read. However, the main thing is that it makes you remember these are references and not actual multidimensional structures.
For example:
$hash{gene}->{mrna}->{start_loc} = $start;   # Quotes aren't needed around a key that is a simple identifier.
The best thing to do is to think of your hash as a structure. For example:
my $person_ref = {};   # Person is a hash reference.
$person_ref->{NAME}->{FIRST} = "Bob";
$person_ref->{NAME}->{LAST} = "Rogers";
$person_ref->{PHONE}->{WORK}->[0] = "555-1234";   # An array ref. Might have > 1
$person_ref->{PHONE}->{WORK}->[1] = "555-4444";
$person_ref->{PHONE}->{CELL}->[0] = "555-4321";
...
my @people;
push @people, $person_ref;
Now, I can load up my @people array with all my people, or maybe use a hash:
my %person;
$person{$bobs_ssn} = $person_ref;   # Now, all of Bob's info is indexed by his SSN.
So, the first thing you need to do is to think of what your structure should look like. What are the fields in your structure? What are the sub-fields? Figure out what your structure should look like, and then set up your hash of hashes to look like that. Figure out exactly how it will be stored and keyed.
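Applied to the question, one possible layout (a sketch, not the only sensible one) keeps each gene's own properties and its mRNAs under separate keys, so listing the mRNAs never mixes them up with start_loc. The IDs, the coordinate, and the key names mrna, exons and cds are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical IDs and coordinate, for illustration only.
my ($gene, $mrna, $exon, $cds, $start) = ('gene1', 'mrna1', 'exon1', 'cds1', 100);

my %genes;
$genes{$gene}{start_loc} = $start;                       # a property of the gene
$genes{$gene}{mrna}{$mrna}{start_loc} = $start;          # a property of one mRNA
push @{ $genes{$gene}{mrna}{$mrna}{exons} }, $exon;      # an mRNA can have many exons
push @{ $genes{$gene}{mrna}{$mrna}{cds} },   $cds;

# The gene's own fields and its mRNAs live under separate keys,
# so listing the mRNA IDs does not pick up start_loc.
for my $id (sort keys %{ $genes{$gene}{mrna} }) {
    my $m = $genes{$gene}{mrna}{$id};
    print "$id starts at $m->{start_loc}; exons: @{ $m->{exons} }\n";
}
```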
Remember, this hash contains references to your genes (or whatever), so you want to choose your keys wisely.
Read the tutorial. Then, try your hand at it. It's not all that complicated to understand. However, it can be a bear to maintain.
When you say use strict;, you give yourself some protection:
my $foo = "bar";
say $Foo; #This won't work!
This won't work because you didn't declare $Foo; you declared $foo. The use strict; pragma can catch variable names that are mistyped, but:
my %var;
$var{foo} = "bar";
say $var{Foo}; #Whoops!
This will not be caught (except maybe by a warning that $var{Foo} is uninitialized, if you use warnings). The use strict; pragma can't detect mistakes in typing your keys.
The next step, after you've grown comfortable with references is to move onto object oriented Perl. There's a Tutorial for that too.
All object-oriented Perl does is take your hash references and turn them into objects. Then you write subroutines (methods) that help you keep track of manipulating those objects. For example:
sub last_name {
my $person = shift;      # Don't worry about this for now..
my $last_name = shift;
if ( defined $last_name ) {
$person->{NAME}->{LAST} = $last_name;
}
return $person->{NAME}->{LAST};
}
When I set my last name using this subroutine ...I mean method, I guarantee that the key will be $person->{NAME}->{LAST} and not $person->{LAST}->{NAME} or $person->{LAST}->{NMAE} or $person->{last}->{name}.
The main problem isn't learning the mechanisms, but learning to apply them. So, think about exactly how you want to represent your items. Think about what fields you want, and how you're going to pull up that information.
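A minimal sketch of the accessor above placed inside a class, to show where "turning hash references into objects" happens: bless. The package name Person and the constructor are hypothetical; checking @_ instead of defined is a design choice that also lets a caller deliberately set the value to undef.

```perl
use strict;
use warnings;

package Person;

sub new {
    my ($class) = @_;
    return bless { NAME => {} }, $class;   # the object is just a blessed hash reference
}

# Accessor: with an argument it sets the last name, without one it only reads it.
sub last_name {
    my $self = shift;
    $self->{NAME}{LAST} = shift if @_;
    return $self->{NAME}{LAST};
}

package main;

my $person = Person->new;
$person->last_name('Rogers');
print $person->last_name, "\n";    # Rogers
```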
You could try pushing each value onto a hash of arrays:
my (@gene, @mrna, @exon, @cds);
my %hash;
push @{ $hash{$gene[$_]} }, [$mrna[$_], $exon[$_], $cds[$_] ] for 0 .. $#gene;
This way gene is the key, with multiple values ($mrna, $exon, $cds) associated with it.
Iterate over keys/values as follows:
for my $key (sort keys %hash) {
print "Gene: $key\t";
for my $value (@{ $hash{$key} } ) {
my ($mrna, $exon, $cds) = @$value;   # De-references the array
print "Values: [$mrna], [$exon], [$cds]\n";
}
}
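The same hash-of-arrays approach with a little hypothetical data filled in, so the shape of the result is visible: each gene key maps to an array reference, and each element of that array is itself an array reference holding one (mrna, exon, cds) row.

```perl
use strict;
use warnings;

# Hypothetical parallel arrays: row $i describes one mRNA of gene $gene[$i].
my @gene = ('g1', 'g1', 'g2');
my @mrna = ('m1', 'm2', 'm3');
my @exon = ('e1', 'e2', 'e3');
my @cds  = ('c1', 'c2', 'c3');

my %hash;
push @{ $hash{ $gene[$_] } }, [ $mrna[$_], $exon[$_], $cds[$_] ] for 0 .. $#gene;

print scalar(@{ $hash{g1} }), "\n";   # 2 rows for g1
print $hash{g1}[1][0], "\n";          # m2
```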
The answer to a question I've asked previously might be of help (Can a hash key have multiple 'subvalues' in perl?).

Cannot understand the following Perl syntax

I am a first timer with Perl and I have to make changes to a Perl script and I have come across the following:
my %summary ;
for my $id ( keys %trades ) {
my ( $sym, $isin, $side, $type, $usrOrdrNum, $qty ) = @{$trades{$id}} ;
$type = "$side $type" ;
$summary{$sym}{$type} += $qty ;
$summary{$sym}{'ISIN'} = $isin ;
}
The portion I do not understand is $summary{$sym}{$type} += $qty ;. What is the original author trying to do here?
This piece of code populates the %summary hash with a summary of the data in %trades. Each trade is an array with multiple fields which are unpacked inside the loop, i.e. $sym is the value of the first array field of the current trade, and $qty the last field.
$summary{$sym} accesses the $sym field in the %summary hash. The entry named $type in the $summary{$sym} field is then accessed. If the field does not exist, it is created. If $summary{$sym} does not hold a hashref, one is created there, so everything Just Works. (technical term: autovivification)
$var += $x adds $x to $var, so $summary{$sym}{$type} holds a sum of all $qty values with the same $sym and $type after the loop finishes.
The $summary{$sym}{ISIN} field will hold the $isin value of the last trade with name $sym (I suspect they are the same for all such trades).
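A runnable sketch of the loop with a few hypothetical trades, to make the accumulation concrete. The symbols, ISIN placeholder and quantities are invented; only the field order (sym, isin, side, type, usrOrdrNum, qty) comes from the original code.

```perl
use strict;
use warnings;

# Hypothetical trades: each value is an array reference with the fields
# (sym, isin, side, type, usrOrdrNum, qty) in that order.
my %trades = (
    1 => [ 'ABC', 'ISIN-ABC', 'BUY',  'LIMIT',  'o1', 100 ],
    2 => [ 'ABC', 'ISIN-ABC', 'BUY',  'LIMIT',  'o2',  50 ],
    3 => [ 'ABC', 'ISIN-ABC', 'SELL', 'MARKET', 'o3',  30 ],
);

my %summary;
for my $id (keys %trades) {
    my ($sym, $isin, $side, $type, $usrOrdrNum, $qty) = @{ $trades{$id} };
    $type = "$side $type";
    $summary{$sym}{$type} += $qty;      # autovivifies $summary{$sym} on first use
    $summary{$sym}{ISIN} = $isin;
}

for my $key (sort keys %{ $summary{ABC} }) {
    print "ABC $key: $summary{ABC}{$key}\n";
}
```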
Perl has three different built-in data types:
Scalars (as in $foo).
Arrays (as in @foo).
Hashes (as in %foo).
The problem is that each of these deals with single bits of data. Sure, there can be lots of items in arrays and hashes, but they are lots of single bits of data.
Let's say I want to keep track of people. People have a first name, last name, phone, etc. Let's define a person:
my %person;
$person{FIRST_NAME} = "Bob";
$person{LAST_NAME} = "Smith";
$person{PHONE_NUMBER} = "555-1234";
Okay, now I need to store another person. Do I create another hash? What if I could have, say an array of hashes with each hash representing a single person?
Perl allows you to do this by making a reference to the hash:
my #list;
push @list, \%person;
The \%person is my reference to the memory location that contains my hash. $list[0] points to that memory location and allows me to access that person through dereferencing.
Now, my array contains my person. I can create a second one, but it has to be a new hash: if I reused %person and pushed \%person again, both array elements would point to the same hash, and Susan would overwrite Bob:
my %person2;
$person2{FIRST_NAME} = "Susan";
$person2{LAST_NAME} = "Brown";
$person2{PHONE_NUMBER} = "555-9876";
push @list, \%person2;
Okay, how do I get at my person? In Perl, you dereference by putting the correct sigil in front of your reference. For example:
my $person_ref = $list[0];   # Reference to Bob's hash
my %bob = %{$person_ref};    # Dereference to Bob's hash. %bob is now a copy of Bob.
Notice that I'm moving a lot of data from one variable to another, and I'm not really using those variables. Let's eliminate the variables, or at least their names:
my @list;
push @list, {};   # Anonymous hash in my list
$list[0] is still a reference to a hash, but I never had to give that hash a name. Now, how do I put Bob's information into it?
If $list[0] is a reference to a hash, I could dereference it by putting %{...} around it!
%person = %{ $list[0] };   # %person is an empty hash, but you get the idea
Let's fill up that hash!
${ $list[0] }{FIRST_NAME} = "Bob";
${ $list[0] }{LAST_NAME} = "Smith";
${ $list[0] }{PHONE_NUMBER} = "555-1234";
That's easy to read...
Fortunately, Perl provides a bit of syntactic sweetener. This is the same:
$list[0]->{FIRST_NAME} = "Bob";
$list[0]->{LAST_NAME} = "Smith";
$list[0]->{PHONE_NUMBER} = "555-1234";
The -> operator marks the dereference you're doing.
Also, in certain circumstances, I don't need the ${...} curly braces or even the arrow: between two subscripts, the -> is optional. Think of it like math operations where there's an order of precedence:
(3 * 4) + (5 * 8)
is the same as:
3 * 4 + 5 * 8
In one, I specify the order of operations explicitly; in the other, I rely on precedence.
So the original code, which adds names to a hash reference stored in a list:
${ $list[0] }{FIRST_NAME} = "Bob";
${ $list[0] }{LAST_NAME} = "Smith";
${ $list[0] }{PHONE_NUMBER} = "555-1234";
Can be rewritten as:
$list[0]{FIRST_NAME} = "Bob";
$list[0]{LAST_NAME} = "Smith";
$list[0]{PHONE_NUMBER} = "555-1234";
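(And I didn't have to do push @list, {}; first; Perl creates the hash reference automatically when I assign through $list[0], which is called autovivification. I just wanted to emphasize that this was a reference to a hash.)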
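A short sketch showing that the three spellings reach the same hash, and that the hash reference is autovivified without any push. The field values are the ones used in the example above.

```perl
use strict;
use warnings;

my @list;
# No push needed: assigning through $list[0] autovivifies the hash reference.
${ $list[0] }{FIRST_NAME} = 'Bob';      # explicit block dereference
$list[0]->{LAST_NAME}     = 'Smith';    # arrow notation
$list[0]{PHONE_NUMBER}    = '555-1234'; # arrow omitted between subscripts

print ref($list[0]), "\n";              # HASH
print join(', ', map { "$_=$list[0]{$_}" } sort keys %{ $list[0] }), "\n";
```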
Thus:
$trades{$id}
is a reference to an array of data.
Think of it this way:
my @list = qw(a bunch of data);
$trades{$id} = \@list;
And to dereference that reference to a list, I do this:
@{$trades{$id}}
See Mark's Short Tutorial About References.
$summary{$sym}{$type} is a scalar inside a hashref inside a hash.
+= is an operator that takes the left hand side, adds the right hand side to it, then assigns the result back to the left hand side.
$qty is the value to add to the previously stored value.
$summary{$sym}{$type} += $qty ;   # is the same as
# $summary{$sym}{$type} = $summary{$sym}{$type} + $qty;
# This line accumulates the total of the qty values ($trades{$id}[5]) from %trades.
The best way to see types in Perl if you are a newbie is to use perl debugger option.
You can run the script as :
perl -d <scriptname>
And then within the debugger (you will see something like this)
DB<1>
type the following to go to the code where you want to debug:
DB<1> c <linenumber>
Then You can use x to see the variables like:
DB<2>x %trades
DB<3>x $trades{$id}
DB<4>print Dumper \%trades
This way you can actually see what's inside the hash, or even a hash of hashes.
It computes the sum of the last field (qty) for each combination of the sym, side, and type fields.
If the hash was a SQL table instead (and why not - something like DBD::CSV may come in handy here) with fields id, sym, isin, side, type, usrOrdrNum, qty, the code would translate to something like
SELECT sym, CONCAT(side,' ',type) AS type, SUM(qty), isin
FROM trades
GROUP BY sym, CONCAT(side,' ',type);