Perl hash of hashes rationale - perl

I decided to give Perl a try and stumbled across a language construct that seems to be valid, though I can hardly believe it is. I assume there is some rationale behind it, so I decided to ask.
Take the following Perl code:
%data = ('John Paul' => ('Age' => 45), 'Lisa' => 30);
print "\$data{'John Paul'} = $data{'John Paul'}{'Age'}\n";
print "\$data{'Lisa'} = $data{'Lisa'}\n";
My intention was to check how hash of hashes works. The above code prints:
$data{'John Paul'} =
$data{'Lisa'} =
To make it a valid hash of hashes one needs:
%data = ('John Paul' => {'Age' => 45}, 'Lisa' => 30);
and the result would be:
$data{'John Paul'} = 45
$data{'Lisa'} = 30
Does anyone know:
Why there is non uniformity and the internal hash needs {} instead of ()?
Why do I get no error or warning that something is wrong when there is () instead of {} for the internal hash? It is very easy to make this kind of mistake. What is more, ('Age' => 45) breaks not only the value for 'John Paul' but also the one for 'Lisa'. I can't imagine hunting for this kind of "bug" in a project with thousands of lines of code.

( 'John Paul' => ( 'Age' => 45 ), 'Lisa' => 30 )
is just another way of writing
'John Paul', 'Age', 45, 'Lisa', 30
Parens don't create any data structure; they just affect precedence like in (3+4)*5. The reason we don't write
my %h = a => 4;
or the equivalent
my %h = 'a', 4;
is that it would be interpreted as
( my %h = 'a' ), 4;
What creates the hash is my %data, not the parens. The right-hand side of the assignment just places an arbitrary number of scalars on the stack, not a hash. The assignment operator adds these scalars to the hash.
But sometimes, we want to create an anonymous hash. This is where {} comes in.
my %data = ( 'John Paul' => { 'Age' => 45 }, 'Lisa' => 30 );
is basically equivalent to
my %anon = ( 'Age' => 45 );
my %data = ( 'John Paul' => \%anon, 'Lisa' => 30 );
Note that \%anon returns a scalar, a reference to a hash. This is fundamentally different than what ( 'John Paul' => \%anon, 'Lisa' => 30 ) and 'John Paul' => \%anon, 'Lisa' => 30 return, four scalars.
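To make the difference concrete, here is a minimal sketch that counts what each form actually puts into a list. The array names @flat and @nested are only for illustration and are not part of the original code.

```perl
use strict;
use warnings;

# Parentheses only group; the inner list is flattened into the outer one.
my @flat = ( 'John Paul' => ( 'Age' => 45 ), 'Lisa' => 30 );
print scalar(@flat), "\n";        # 5 scalars: an odd count for a hash

# Braces build an anonymous hash and return a single scalar: a reference.
my @nested = ( 'John Paul' => { 'Age' => 45 }, 'Lisa' => 30 );
print scalar(@nested), "\n";      # 4 scalars: two key/value pairs
print ref( $nested[1] ), "\n";    # HASH
```

Assigning @flat to a hash would pair 'John Paul' with 'Age' and 45 with 'Lisa', which is why both values in the question came out wrong.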
Why there is non uniformity and the internal hash needs {} instead of ()?
An underlying premise of this question is false: Hashes don't need (). For example, the following are perfectly valid:
my %h1 = 'm'..'p';
sub f { return x => 4, y => 5 }
my %h2 = f();
my %h3 = do { i => 6, j => 7 };
() has nothing to do with hashes. The lack of uniformity comes from the lack of parallel. One uses {} to create a hash. One uses () to override precedence.
Since parens just affect precedence, one could use
my %data = ( 'John Paul' => ({ 'Age' => 45 }), 'Lisa' => 30 ); # ok (but weird)
This is very different than the following:
my %data = ( 'John Paul' => ( 'Age' => 45 ), 'Lisa' => 30 ); # XXX
Why do I get no error or warning that something is wrong when there is () instead of {} for the internal hash?
Not only is using () valid, using () around expressions that contain commas is commonly needed. So when exactly should it warn? The point is that it's arguable whether this should be a warning or something perlcritic finds, at least at first glance. The latter should definitely find this, but I wouldn't know if a rule for it exists or not.

Why there is non uniformity and the internal hash needs {} instead of ()?
An assignment to a hash is a list of scalars (alternating between keys and values).
You can't have a hash (because it isn't a scalar) as a value there, but you can have a hash reference.
Lists get flattened.
Why do I get no error or warning that something is wrong when there is () instead of {} for the internal hash?
Because you didn't turn them on with the use strict; use warnings; pragmas (which are off by default for reasons of horrible backwards compatibility; the original Perl 7 announcement planned to turn them on by default, and since Perl 5.36 a use v5.36; line enables both).
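As a rough sketch of what the two pragmas report for the code in the question: the @warnings array and the eval wrapper below are my own additions, used only so that the messages can be captured and printed instead of aborting the script.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them, so they can be inspected.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# Five scalars after flattening: an odd-sized list for a hash assignment.
my %data = ( 'John Paul' => ( 'Age' => 45 ), 'Lisa' => 30 );

# Under strict refs, using the plain string 'Age' as a hash reference dies.
my $ok    = eval { my $age = $data{'John Paul'}{'Age'}; 1 };
my $error = $@;

print "warning: $warnings[0]";
print "error:   $error";
```

The warning comes from the assignment itself and the error from the nested lookup, so both symptoms of the () mistake are reported close to where it was made.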

Related

Perl: Access hash of dynamic depth

I am struggling with accessing/ modifying hashes of unknown (i.e. dynamic) depth.
Suppose I am reading in a table of measurements (Length, Width, Height) from a file, then calculating Area and Volume to create a hash like the following:
# #Length Width Height Results
my %results = (
    '2' => {
        '3' => {
            '7' => {
                'Area' => 6,
                'Volume' => 42,
            },
        },
    },
    '6' => {
        '4' => {
            '2' => {
                'Area' => 24,
                'Volume' => 48,
            },
        },
    },
);
I understand how to access a single item in the hash, e.g. $results{2}{3}{7}{'Area'} would give me 6, or I could check if that combination of measurements has been found in the input file with exists $results{2}{3}{7}{'Area'}. However that notation with the series of {} braces assumes I know when writing the code that there will be 4 layers of keys.
What if there are more or less and I only discover that at runtime? E.g. if there were only Length and Width in the file, how would you make code that would then access the hash like $results{2}{3}{'Area'}?
I.e. given a hash and dynamic-length list of nested keys that may or may not have a resultant entry in that hash, how do you access the hash for basic things like checking if that key combo has a value or modifying the value?
I almost want a notation like:
my @hashkeys = (2,3,7);
if exists ( $hash{join("->",@hashkeys)} ){
print "Found it!\n";
}
I know you can access sub-hashes of a hash and get their references, so in this last example I could iterate through @hashkeys, checking for each one whether the current hash has a sub-hash at that key and, if so, saving a reference to that sub-hash for the next iteration. However, that feels complex, and I suspect there is already a much easier way to do this.
Hopefully this is enough to understand my question but I can try to work up a MWE if not.
Thanks.
So here's a recursive function which does more or less what you want:
sub fetch {
    my $ref = shift;
    my $key = shift;
    my @remaining_path = @_;
    return undef unless ref $ref;
    return undef unless defined $ref->{$key};
    return $ref->{$key} unless scalar @remaining_path;
    return fetch($ref->{$key}, @remaining_path);
}
fetch(\%results, 2, 3, 7, 'Volume'); # 42
fetch(\%results, 2, 3); # hashref
fetch(\%results, 2, 3, 7, 'Area', 8); # undef
fetch(\%results, 2, 3, 8, 'Area'); # undef
But please check the comment about bad data structure which is already given by someone else, because it's very true. And if you still think that this is what you need, at least rewrite it using a for-loop, as perl does not optimize tail recursion.
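For reference, a loop-based variant could look roughly like this. The name fetch_iter is made up for this sketch, and it checks exists rather than defined, so it behaves slightly differently from fetch when a key is present with an undef value.

```perl
use strict;
use warnings;

# Iterative variant of fetch(): walk the path one key at a time.
sub fetch_iter {
    my ( $ref, @path ) = @_;
    for my $key (@path) {
        return undef unless ref($ref) eq 'HASH';    # ran out of nesting
        return undef unless exists $ref->{$key};    # no such key at this level
        $ref = $ref->{$key};
    }
    return $ref;
}

my %results = (
    2 => { 3 => { 7 => { Area => 6,  Volume => 42 } } },
    6 => { 4 => { 2 => { Area => 24, Volume => 48 } } },
);

my @path  = ( 2, 3, 7, 'Volume' );
my $value = fetch_iter( \%results, @path );
print "$value\n";                                   # 42

my $missing = fetch_iter( \%results, 2, 3, 8, 'Area' );
print defined($missing) ? "found\n" : "missing\n";  # missing
```

Because each level is tested with exists before descending, the lookup does not autovivify intermediate hashes for paths that are not there.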
Take a look at $; in "man perlvar".
http://perldoc.perl.org/perlvar.html#%24%3b
You can use this idea to convert a variable-length list of keys into a single key.
my %foo;
my (@KEYS)=(2,3,7);
$foo{ join( $; , @KEYS ) }{Area}=6;
$foo{ join( $; , @KEYS ) }{Volume}=42;
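Here is a slightly fuller sketch of the same idea with a completely flat hash. It uses the core English module so that $; can be written as $SUBSEP; the %measure hash and the @probe list are hypothetical names for this example.

```perl
use strict;
use warnings;
use English qw( -no_match_vars );    # gives $; the readable name $SUBSEP

my %measure;
my @keys = ( 2, 3, 7 );

# Store under one flat key built from a key list of any length.
$measure{ join $SUBSEP, @keys, 'Area' } = 6;

# A literal comma-separated subscript is joined with $; by Perl itself.
my $area = $measure{ 2, 3, 7, 'Area' };
print "$area\n";                                    # 6

# A lookup with a dynamic key list needs no nested references at all.
my @probe = ( 2, 3, 8, 'Area' );
my $found = exists $measure{ join $SUBSEP, @probe };
print $found ? "found\n" : "missing\n";             # missing

# The individual keys can be recovered from a flat key.
for my $flat ( keys %measure ) {
    my @parts = split /\Q$SUBSEP\E/, $flat;
    print "@parts\n";                               # 2 3 7 Area
}
```

The trade-off is that you can no longer grab a whole sub-tree such as "everything under 2, 3" with a single lookup; you would have to scan the keys instead.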

Perl using 'map' to rewrite these codes

Can I use 'map' or some similar function to make this code simpler?
# $animal and @loads are pre-defined somewhere else.
my @bucket;
foreach my $item (@loads) {
    push @bucket, $item->{'carrot'} if $animal eq 'rabbit' && $item->{'carrot'};
    push @bucket, $item->{'meat'} if $animal eq 'lion' && $item->{'meat'};
}
Are you looking for something like this?
my %food = ( 'lion' => 'meat', 'rabbit' => 'carrot' );
# ...
foreach my $item (@loads) {
    push @bucket, $item->{$food{$animal}} if $item->{$food{$animal}};
}
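Since the question asked about map specifically, the same lookup table can also feed a map/grep chain. This is only a sketch: the sample @loads data and the $wanted variable are assumptions made for the example.

```perl
use strict;
use warnings;

my %food   = ( lion => 'meat', rabbit => 'carrot' );
my $animal = 'rabbit';

# Assumed sample data; the real @loads is defined elsewhere.
my @loads = (
    { carrot => 47, meat => 32, zebras => 25 },
    { carrot => 7,  zebras => 81 },
);

# Pick the field for this animal once, pull it out of every load,
# and keep only the true values (the same test as the "if" in the loop).
my $wanted = $food{$animal};
my @bucket = grep { $_ } map { $_->{$wanted} } @loads;

print "@bucket\n";    # 47 7
```

Like the loop version, this drops amounts that are 0 or missing, so use grep { defined } instead if a zero amount should be kept.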
This question would be easier to answer authoritatively with a bit more sample data. As it is I need to make a lot of assumptions.
Assuming:
@loads = (
    { carrot => 47, meat => 32, zebras => 25 },
    { carrot => 7, zebras => 81 },
);
and @buckets should look like:
@buckets = ( 47, 32, 7 );
when @animals looks like:
@animals = qw/ rabbit lion /;
Here's a maptastic approach. To understand it you will need to think in terms of lists of values as the operands rather than scalar operands:
my #animals = qw/ rabbit lion /;
my %eaten_by = (
lion => 'meat',
rabbit => 'carrot',
mouse => 'cheese',
);
# Use a "hash slice" to get a list of foods consumed by desired animals.
# hash slices let you access a list of hash values from a hash all at once.
my @foods_eaten = @eaten_by{ @animals };
# Hint: read map/grep chains back to front.
# Here, start with @loads and then work back to the assignment.
my @bucket =
    grep $_,                    # Include only true (non-zero, defined) food amounts
    map @{$_}{@foods_eaten},    # Hash slice from the load, extract amounts of eaten foods.
    @loads;                     # Process the list of loads in the @loads array
Rewritten in a verbose nested loop you get:
my @buckets;
for my $item ( @loads ) {
    for my $animal ( @animals ) {
        my $amount = $item->{ $eaten_by{$animal} };
        next unless $amount;
        push @buckets, $amount;
    }
}
Which one to use? It all depends on your audience: who will be maintaining the code? Are you working with a team of Perl hackers featuring 4 of the perl5porters? Use the first one. Is your team composed of one or two interns that come and go with the seasons and who will spend 1% of their time working on code of any kind? Use the second example. Likely, your situation is somewhere in the middle. Use your discretion.
Happy hacking!

Slicing a hash using keys stored in a hash of arrays in Perl

Inspired by the answer to this other question: Slicing a nested hash in Perl, what is the syntax for slicing a hash using a list of keys held in another hash?
I thought the following would do it, but it doesn't:
@slice_result = @{$hash1{@($hash_2{$bin})}};
I get an error that says "scalar found where operator expected".
Your ambiguous description of your data make me think you're not even sure of what you have. You should spend some time absorbing the structure of your data until you can describe it clearly.
I think you are saying you have
my %hash1 = (
apple => 2,
banana => 3,
orange => 4,
);
my %hash2 = (
red => [qw( apple )],
yellow => [qw( apple banana )],
orange => [qw( orange )],
);
You want to use the array referenced by one of the elements of %hash2 as the keys of a slice of %hash1. If you understand that, it's just a question of doing it step by step.
$hash2{yellow}
will get us the reference to the desired array, and
@{ $hash2{yellow} }
will get us the array itself. We want to use that as the index expression of a hash slice
@hash1{EXPR}
so we get:
@hash1{ @{ $hash2{yellow} } } # 2,3
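Put together as a small runnable sketch, with the bin name held in a variable as in the question ($bin and @slice_result are taken from the question; the data is the guess made above):

```perl
use strict;
use warnings;

my %hash1 = ( apple => 2, banana => 3, orange => 4 );
my %hash2 = (
    red    => [qw( apple )],
    yellow => [qw( apple banana )],
    orange => [qw( orange )],
);

my $bin = 'yellow';

# Dereference the array stored in %hash2, then use it as the slice keys.
my @slice_result = @hash1{ @{ $hash2{$bin} } };

print "@slice_result\n";    # 2 3
```

The values come back in the order of the key list, not in hash order, so the result is stable.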
This is the correct syntax for a hash slice based on the keys of another hash:
my %hash1 = ( 'this' => 2,
'that' => 1,
);
my %hash2 = ( 'this' => 'two',
'that' => 'one',
);
my @slice = @hash1{keys %hash2};
print @slice; # prints 21 or 12, since the order of keys %hash2 is not guaranteed

How can I find out if a hash has an odd number of elements in assignment?

How could I find out if this hash has an odd number of elements?
my %hash = ( 1, 2, 3, 4, 5 );
Ok, I should have written more information.
sub routine {
my ( $first, $hash_ref ) = @_;
if ( $hash_ref refers to a hash with odd numbers of elements ) {
"Second argument refers to a hash with odd numbers of elements.\nFalling back to default values";
$hash_ref = { option1 => 'office', option2 => 34, option3 => 'fast' };
}
...
...
}
routine( [ 'one', 'two', 'three' ], { option1 =>, option2 => undef, option3 => 'fast' } );
Well, I suppose there is some terminological confusion in the question that should be clarified.
A hash in Perl always has the same number of keys and values - because it's fundamentally an engine to store some values by their keys. I mean, key-value pair should be considered as a single element here. )
But I guess that's not what was asked really. ) I suppose the OP tried to build a hash from a list (not an array - the difference is subtle, but it's still there), and got the warning.
So the point is to check the number of elements in the list which will be assigned to a hash. It can be done as simple as ...
my @list = ( ... there goes a list ... );
print @list % 2; # 1 if the list had an odd number of elements, 0 otherwise
Notice that % operator imposes the scalar context on the list variable: it's simple and elegant. )
UPDATE as I see, the problem is slightly different. Ok, let's talk about the example given, simplifying it a bit.
my $anhash = {
option1 =>,
option2 => undef,
option3 => 'fast'
};
See, => is just a syntax sugar; this assignment could be easily rewritten as...
my $anhash = {
'option1', , 'option2', undef, 'option3', 'fast'
};
The point is that a missing value after the first comma and undef are not the same: lists (any lists) are flattened automatically in Perl, so undef is a normal element of any list, but an empty slot is simply ignored.
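A quick way to see this is to count the elements of both lists. This is just a sketch, and the array names are invented for the comparison.

```perl
use strict;
use warnings;

# The empty slot between the two commas contributes nothing to the list.
my @with_gap = ( 'option1', , 'option2', undef, 'option3', 'fast' );

# An explicit undef is a real element and keeps the pairs aligned.
my @with_undef = ( 'option1', undef, 'option2', undef, 'option3', 'fast' );

print scalar(@with_gap),   "\n";    # 5
print scalar(@with_undef), "\n";    # 6

my $is_odd = @with_gap % 2;
print $is_odd ? "odd\n" : "even\n"; # odd
```

So the hash built from the first list pairs 'option1' with 'option2', and everything after it shifts by one.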
Note that the warning you care about (if use warnings is in effect) is raised before your procedure is called, at the point where the caller builds the invalid hash reference. So whoever caused it should deal with it in their own code: fail early, as they say. )
You want to use named arguments, but set some default values for missing ones? Use this technique:
sub test_sub {
my ($args_ref) = @_;
my $default_args_ref = {
option1 => 'xxx',
option2 => 'yyy',
};
$args_ref = { %$default_args_ref, %$args_ref, };
}
Then your test_sub might be called like this...
test_sub( { option1 => 'zzz' } );
... or even ...
test_sub( {} );
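To show the merge in action, here is a minimal sketch that returns the merged options so they can be inspected. The name with_defaults is hypothetical, and the default values are borrowed from the question.

```perl
use strict;
use warnings;

sub with_defaults {
    my ($args_ref) = @_;
    my %defaults = ( option1 => 'office', option2 => 34, option3 => 'fast' );

    # Later pairs win in a hash constructor, so caller values override defaults.
    return { %defaults, %{ $args_ref || {} } };
}

my $merged = with_defaults( { option1 => 'zzz' } );
print "$merged->{option1} $merged->{option2} $merged->{option3}\n";    # zzz 34 fast

my $plain = with_defaults();
print "$plain->{option1}\n";                                           # office
```

The "|| {}" guard is an addition of mine so that calling the sub with no argument at all falls back to the defaults instead of dying under strict refs.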
The simple answer is: You get a warning about it:
Odd number of elements in hash assignment at...
Assuming you have not been foolish and turned warnings off.
The hard answer is, once assignment to the hash has been done (and warning issued), it is not odd anymore. So you can't.
my %hash = (1,2,3,4,5);
use Data::Dumper;
print Dumper \%hash;
$VAR1 = {
'1' => 2,
'3' => 4,
'5' => undef
};
As you can see, undef has been inserted in the empty spot. Now, you can check for undefined values and pretend that any existing undefined value means the hash was assigned an odd number of elements. However, should an undefined value be a valid value in your hash, you're in trouble.
perl -lwe '
sub isodd { my $count = @_ = grep defined, @_; return ($count % 2) };
%a=(a=>1,2);
print isodd(%a);'
Odd number of elements in hash assignment at -e line 1.
1
In this one-liner, the function isodd counts the defined arguments and returns whether the amount of arguments is odd or not. But as you can see, it still gives the warning.
You can use the $SIG{__WARN__} handler to "trap" an incorrect hash assignment.
use strict ;
use warnings ;
my $odd_hash_length = 0 ;
{
local $SIG{__WARN__} = sub {
my $msg = shift ;
if ($msg =~ m{Odd number of elements in hash assignment at}) {
$odd_hash_length = 1 ;
}
} ;
my %hash = (1, 2, 3, 4, 5) ;
}
# Now do what you want based on $odd_hash_length
if ($odd_hash_length) {
die "the hash had an odd hash length assignment...aborting\n" ;
} else {
print "the hash was initialized correctly\n";
}
See also How to capture and save warnings in Perl.

How to convert a Perl hash-of-hashes to a more flexible data structure?

In a quick-and-dirty Perl script, I have a data structure like this:
$tax_revenue{YEAR}{STATE}{GOVLEV}{TAX} = integer
The hash keys assume values like this:
YEAR: 1900 .. 2000
STATE: AK, AL, ... WY
GOVLEV: state, local
TAX: type of tax (income, sales, etc.)
In addition, the hash keys are unique. For example, no value for the TAX parameter collides with a value for any other parameter.
I am starting a medium-sized project working with this data and I would like to implement the data structure in a more flexible way. I don't know all of the data-retrieval functionality I will need yet, but here are some examples:
# Specify the parameters in any order.
Tax_rev( qw(1902 WY state property) );
Tax_rev( qw(state property 1902 WY) );
# Use named parameters.
Tax_rev(year => 1902, state => 'WY', govlev => 'state', tax => 'property');
# Use wildcards to obtain a list of values.
# For example, state property tax revenue in 1902 for all states.
Tax_rev( qw(1902 * state property) );
My initial inclination was to keep storing the data as a hash-of-hashes and to build one or more utility functions (probably as part of a class) to retrieve the values. But then I wondered whether there is a better strategy -- some way of storing the underlying data other than a hash-of-hashes. Any advice about how to approach this problem would be appreciated.
Please consider putting the data in an SQLite database. Then, you have the flexibility of running whatever query you want (via DBI or just the command line interface to SQL) and getting data structures that are suitable for generating reports for taxes by state or states by taxes or taxes for a given year for all states whose names begin with the letter 'W' etc etc. I presume the data are already in some kind of character separated format (tab, comma, pipe etc) and therefore can be easily bulk imported into an SQLite DB, saving some work and code on that end.
If you want a pure Perl implementation, you could build an array of hashes:
my #taxdata = (
{ year => 1902, state => 'WY', level => 'state', type => 'property', amount => 500 },
# ...
);
my #matches = grep {
$_->{year} == 1902 &&
$_->{level} eq 'state' &&
$_->{type} eq 'property'
} @taxdata;
That's flexible if you want to run arbitrary queries against it, but slow if you want to be able to get to a specific record.
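As a rough sketch of how the named-parameter and wildcard queries from the question could sit on top of such an array of hashes: the lower-case tax_rev name, its argument convention, and all rows other than the first are assumptions made up for this example.

```perl
use strict;
use warnings;

# Illustrative rows only; the amounts are placeholders, not real data.
my @taxdata = (
    { year => 1902, state => 'WY', level => 'state', type => 'property', amount => 500 },
    { year => 1902, state => 'AK', level => 'state', type => 'property', amount => 300 },
    { year => 1903, state => 'WY', level => 'local', type => 'sales',    amount => 120 },
);

# Return every record matching the named criteria; '*' matches anything.
sub tax_rev {
    my ( $records, %want ) = @_;
    my @matches;
    RECORD: for my $rec (@$records) {
        for my $field ( keys %want ) {
            next if $want{$field} eq '*';
            next RECORD if $rec->{$field} ne $want{$field};
        }
        push @matches, $rec;
    }
    return @matches;
}

my @rows = tax_rev( \@taxdata,
    year => 1902, state => '*', level => 'state', type => 'property' );

print "$_->{state}: $_->{amount}\n" for @rows;    # WY: 500, then AK: 300
```

Every query is still a full scan of the array, so this only addresses the flexibility part, not the speed of reaching one specific record.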
A better solution might be a database with a single table where each row contains the fields you listed. Then you could write an SQL query to extract data according to arbitrary criteria. You can use the DBI module to handle the connection.
I would advise you to look into an object system such as Moose. The learning curve isn't too steep (or steep at all) and the benefits will be enormous. You'd start with something like:
package MyApp;
use Moose; # use strict automagically in effect
has 'year' => ( is => 'ro', isa => 'Int', required => 1 );
has 'state' => ( is => 'ro', isa => 'Str', required => 1 );
has 'govlev' => ( is => 'ro', isa => 'Str', required => 1 );
has 'tax' => ( is => 'ro', isa => 'Str', required => 1 );
Then in your main program:
use MyApp;
my $obj = MyApp->new(
year => 2000,
state => 'AK',
govlev => 'local',
tax => 'revenue'
);
# ...
With the flexibility of MooseX::Types you can go on to declare your own type classes, with enums, etc.
Once you go Moose, you never look back :)
Check out Data::Diver: "Simple, ad-hoc access to elements of deeply nested structures". It seems to do exactly what you want from Tax_rev:
use Data::Diver qw( Dive );
...
$tax_revenue{ 1900 }{ NC }{ STATE }{ SALES } = 1000;
...
Dive( \%tax_revenue, qw( 1900 NC STATE SALES ) ); # 1000
Dive( \%tax_revenue, qw( 1901 NC STATE SALES ) ); # undef
If you aren't going to use objects, I think that data structure will work just fine.
Here is an example of Tax_rev(). It isn't full featured, but you can give it the 4 arguments in any order. If you actually use it you might want to check the inputs.
my $result = Tax_rev( \%data, qw(state property 1902 WY) );
use strict;
use warnings;
use 5.010;