Perl hash of hashes of hashes of hashes... is there an 'easy' way to get an element at the end of the list? - perl

I have a Perl hash of hashes of ... around 11 or 12 elements deep. Please forgive me for not repeating the structure below!
Some of the levels have fixed labels, e.g. 'NAMES', 'AGES' or similar, so accessing these levels is fine as I can use the labels directly, but I need to loop over the other variables, which results in some very long statements. This is an example of half of one set of loops:
foreach my $person (sort keys %$people) {
    foreach my $name (sort keys %{$people->{$person}{'NAMES'}}) {
        foreach my $age (sort keys %{$people->{$person}{'NAMES'}{$name}{'AGES'}}) {
            . . . # and so on until I get to the push @list, $element; part
This is just an example, but it follows the structure of the one I have. It might be shorter not to have the fixed name sections (elements in caps), but they are required for reference purposes elsewhere.
I tried to cast the elements as hashes to shorten it at each stage,
e.g. for the second foreach I tried various forms of:
foreach my $name (sort keys %{$person->{'NAMES'}})
but this didn't work. I'm sure I've seen something similar before, so the semantics may be incorrect.
I've studied pages regarding Hash of Hashes and references to hashes and their elements and so on without luck. I've seen examples of while each loops but they don't seem to be particularly shorter or easier to implement. Maybe there is just a different method of doing this and I'm missing the point. I've written out the full set of foreach loops once and it would be great if I don't have to repeat it another six times or so.
Of course, there may be no 'easy' way, but all help appreciated!

$person is the key; to shorten things for the inner loops you need to assign the value to something:
foreach my $person_key (sort keys %$people) {
    my $person = $people->{$person_key};
    my $names = $person->{NAMES};
    foreach my $name (sort keys %$names) {
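A fuller sketch of the same idea, carried through three levels. The sample data in $people is made up for illustration and only assumes the NAMES/AGES layout described in the question; each level is fetched once into a lexical so the inner loops stay short:

```perl
use strict;
use warnings;

# Hypothetical sample data following the question's layout.
my $people = {
    alice => { NAMES => { al  => { AGES => { 30 => 1, 31 => 1 } } } },
    bob   => { NAMES => { rob => { AGES => { 40 => 1 } } } },
};

my @list;
foreach my $person_key (sort keys %$people) {
    my $names = $people->{$person_key}{NAMES};    # hash ref, fetched once
    foreach my $name (sort keys %$names) {
        my $ages = $names->{$name}{AGES};         # next level down
        foreach my $age (sort keys %$ages) {
            push @list, "$person_key/$name/$age";
        }
    }
}
print "$_\n" for @list;
```

Each intermediate variable is just a reference into the original structure, so nothing is copied.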

You can also work with the each function. This should definitely help.
while ( my ($person, $val1) = each(%$people) ) {
    while ( my ($name, $val2) = each(%{ $val1->{NAMES} }) ) {
        while ( my ($age, $val3) = each(%{ $val2->{AGES} }) ) {
            print $val3->{Somekey};
        }
    }
}

You could use Data::Walk, which is kind of File::Find for data structures.

If you want to build a somewhat more flexible solution, you could traverse the data tree recursively. Consider this example data tree (arbitrary depth):
Example data
my %people = (
    memowe => {
        NAMES => {
            memo => {AGE => 666},
            we => {AGE => 667},
        },
    },
    bladepanthera => {
        NAMES => {
            blade => {AGE => 42},
            panthera => {AGE => 17},
        },
    },
);
From your question I concluded you just want to work on the leaves (AGEs in this case). So one could write a recursive traverse subroutine that executes a given subref on all leaves it could possibly find in key-sorted depth-first order. This subref gets the leaf itself and a path of hash keys for convenience:
Preparations
sub traverse (&$@) {
    my ($do_it, $data, @path) = @_;
    # iterate
    foreach my $key (sort keys %$data) {
        # handle sub-tree
        if (ref($data->{$key}) eq 'HASH') {
            traverse($do_it, $data->{$key}, @path, $key);
            next;
        }
        # handle leaf
        $do_it->($data->{$key}, @path, $key);
    }
}
I think it's pretty clear how this guy works from the inlined comments. It would be no big change to execute the coderef on all nodes and not the leaves only, if you wanted. Note that I exceptionally added a prototype here for convenience because it's pretty easy to use traverse with the well-known map or grep syntax:
Executing stuff on your data
traverse { say shift . " (@_)" } \%people;
Also note that it works on hash references and we initialized the @path with an implicit empty list.
Output:
42 (bladepanthera NAMES blade AGE)
17 (bladepanthera NAMES panthera AGE)
666 (memowe NAMES memo AGE)
667 (memowe NAMES we AGE)
The given subroutine (written as a { block }) could do anything with the given data. For example this more readable push subroutine:
my @flattened_people = ();
traverse {
    my ($thing, @path) = @_;
    push @flattened_people, { age => $thing, path => \@path };
} \%people;

Related

Find key in subhash without iterate through the whole hash

I have a hash that looks like this:
my $hash = {
level1_f1 => {
level2_f1 => 'something',
level2_f2 => 'another thing'
},
level1_f2 => {
level2_f3 => 'yet another thing',
level2_f4 => 'bla bla',
level2_f5 => ''
}
...
}
I also have a list of values that correspond to the "level2" keys, and I want to know whether they exist in the hash.
@list = ("level2_f2", "level2_f4", "level2_f99");
I don't know which "level1" key each element of @list belongs to. The only way I could think of to find out whether they exist was a foreach loop over @list and another foreach loop over the keys of the hash, checking
foreach my $i (@list) {
    foreach my $k (keys %$hash) {
        if (exists $hash->{$k}{$i}) { ... }
    }
}
but I wanted to know if there is a more efficient or maybe a more elegant way to do it. All the answers I found require you to know the "level1" key, which I don't.
Thanks!!
Use values:
for my $inner_hash (values %$hash) {
    say grep exists $inner_hash->{$_}, @list;
}
You have to loop over all the level1 keys. But if you don't need to know which keys match and merely care for the existence of any, then you don't have to ask for each member of your list explicitly. You could say
foreach my $k (keys %hash) {
    if ( grep { defined } @{ $hash{$k} }{ @list } ) {
    }
}
The hash slice returns the subhash's value for every key in the list, with undef for keys that are not in the subhash; the grep discards those. (A bare slice used as the condition would be evaluated in scalar context and yield only its last element.)
Note however, that this does potentially more work than you may really need.
You don't need to iterate over "the entire hash".
You will necessarily iterate over the elements of the outer hash since you want to check the value of each one, but you don't need to iterate over the elements of the inner hashes. Your solution already demonstrates that.
So your solution is as efficient as it can be, at least in terms of how well it scales. You can only perform small optimizations such as stopping as soon as a match is found.
for my $i (@list) {
    while ( my (undef, $inner) = each(%hash) ) {
        if (exists($inner->{$i})) {
            ...
            last;
        }
    }
    keys(%hash); # Reset iterator since it might not be exhausted.
}
As a micro optimization, it might be beneficial to invert the nesting of the loops.
my %list = map { $_ => 1 } @list;
while ( my (undef, $inner) = each(%hash) ) {
    while (defined( my $k = each(%$inner) )) {
        if ($list{$k}) {
            delete($list{$k});
            ...
            last if !keys(%list);
        }
    }
    keys(%$inner); # Reset iterator since it might not be exhausted.
    last if !keys(%list);
}
keys(%hash); # Reset iterator since it might not be exhausted.
If the hashes are small, these changes might actually slow things down.
Honestly, if there's truly a speed issue, the problem is that you used the wrong data structure for the type of query you want to run on it!
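One possible reading of "a different data structure" is an inverted index, built once, that maps each level2 key to the level1 keys containing it. The sketch below is an assumption about what would suit this query, not something the answer spells out; %index is a hypothetical name and the data is taken from the question:

```perl
use strict;
use warnings;

my $hash = {
    level1_f1 => { level2_f1 => 'something', level2_f2 => 'another thing' },
    level1_f2 => { level2_f3 => 'yet another thing', level2_f4 => 'bla bla', level2_f5 => '' },
};
my @list = ("level2_f2", "level2_f4", "level2_f99");

# Build the index once: level2 key => list of level1 keys holding it.
my %index;
for my $outer (sort keys %$hash) {
    push @{ $index{$_} }, $outer for keys %{ $hash->{$outer} };
}

# Each lookup is now a single hash access.
for my $key (@list) {
    if (my $owners = $index{$key}) {
        print "$key found in @$owners\n";
    }
    else {
        print "$key not found\n";
    }
}
```

The cost of walking every inner hash is paid once up front, so this only helps if the same structure is queried repeatedly.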

How can I work with multiple values using ->?

I am new to Perl and made some changes to an existing script, but I am not sure if this is the right usage in Perl. In C# we do things differently, so is the code sample below correct?
$group->{$type}{class} = 1;
The code I added is
$group->{$type}{class} = 1;
$group->{$name}{port} = 1;
Is this right? Can $group point to both type and name? I tried this with a sample Perl script and it seemed to set and return '1' correctly. But I am not sure if this is how I should do this.
Yes, that looks correct. You are building a complex data structure, specifically a hash of hashes (HoH). The group hash has two keys, $type and $name. The $type subhash has one key, class. The $name subhash has one key, port. It looks like this, roughly, if you dumped it or declared it all at once:
$group = {
    $type => {
        class => 1
    },
    $name => {
        port => 1
    }
};
Of course, $type and $name will evaluate to whatever they're set to. The hash stores those values as its keys, not references to the variables.
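To see this for yourself, here is a small sketch; the values 'router' and 'eth0' are made up for illustration:

```perl
use strict;
use warnings;

my ($type, $name) = ('router', 'eth0');    # hypothetical values
my $group = {};
$group->{$type}{class} = 1;    # autovivifies $group->{router}
$group->{$name}{port}  = 1;    # autovivifies $group->{eth0}

# List each top-level key with the keys of its subhash.
for my $key (sort keys %$group) {
    my @fields = sort keys %{ $group->{$key} };
    print "$key: @fields\n";
}
```

The intermediate hashes come into existence on first assignment, which is why no explicit setup of $group->{$type} is needed.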
Um ... this is Perl. If it works, it's right. But, to your question: in your code $group is a reference to a hash (maybe called a dictionary in C#). I think you are probably looking for this:
my $group = {}; # make the ref
my @types = ('hot','cold','warm'); # make some types
my @names = ('sink','bath','drain'); # and some names
foreach my $type (@types) {
    $group->{'type'}->{$type}++; # add a new $type to the "type" sub hash
}
foreach my $name (@names) {
    $group->{'name'}->{$name}++; # add a new $name to the "name" sub hash
}
Now cycle through the types for example:
foreach my $typeKey (keys %{$group->{'type'}}) {
    print "type is " . $typeKey; # this is from the @types array
    print ", value = " . $group->{'type'}->{$typeKey}; # this would be 1
}

Search duplicates in Hash of Hash

I can't work this out in my head; it's too much for me. Perhaps someone can help me:
@Hosts = ("srv1","db1","srv2","db3","srv3","db3","srv3","db4","srv3","db5");
my $count = @Hosts;
$count= $count / 2;
my %hash;
$i = 0;
$ii = 1;
$j = 0;
$jj = 0;
while ($jj < $count) {
$hash{$j}{$Hosts[$i]} = $Hosts[$ii];
$i = $i + 2;
$ii = $ii +2;
$j++;
$jj++
}
print Dumper(\%hash);
Output:
$VAR1 = {
'4' => {
'srv4' => 'db3'
},
'1' => {
'srv2' => 'db3'
},
'3' => {
'srv3' => 'db3'
},
'0' => {
'srv1' => 'db1'
},
'2' => {
'srv3' => 'db3'
}
'5' => {
'srv3' => 'db5'
}
};
I know this is ugly code; I don't know how to do it better. What I need to do is find duplicate servers and duplicate dbs, and put the positions and the strings of the duplicates in an array or something like that. I want to generate a NagVis map file out of that.
The Icinga config file contains a members string like this:
members srv1, db1, srv2, db3, srv3, db3, srv3, db3, srv4
It consists of pairs (server, db, server, db). Here is a sample of the NagVis config:
define host {
object_id=5e78fb
host_name=srv1
x=237
y=122
}
define service {
object_id=30646e
host_name=srv1
service_description=db1
x=52
y=122
}
define host {
object_id=021861
host_name=srv2
x=237
y=217
}
define service {
object_id=a5e725
host_name=srv1
service_description=db2
x=52
y=217
}
Thanks in advance
You need to clarify exactly what you want. It's very difficult to tell by your description.
And, your code is in very poor condition. Indenting loops and if statements like this:
while ($jj < $count) {
    $hash{$j}{$Hosts[$i]} = $Hosts[$ii];
    $i = $i + 2;
    $ii = $ii + 2;
    $j++;
    $jj++
}
makes your code much easier to understand. You also use generic names. What data is stored in @array? Is it a list of systems? Call it @systems. What are $i and $jj supposed to represent? What do you want $hash{$j}{$Hosts[$i]} to represent?
You should always, always, always add the following lines to the top of your program:
use strict;
use warnings;
If you use strict, you must declare all of your variables with my. This makes sure you don't do things like have @array in one place and @Hosts in another. These two lines will catch about 90% of your errors.
I don't know if you want a list of all the DB systems that connect to the various servers or if you want a list of the various servers that connect to the DB systems. Therefore, I'll give you both.
I am guessing that your @array is a list of all of your machines and databases in one list:
use strict;
use warnings;
use feature qw(say); # Allows me to use "say" instead of "print"
use Data::Dumper;
# The qw(...) is like putting quotes around each word.
# A nice way to define an array. Comments must stay outside qw(...),
# because inside it a "#" would become a list element.
my @systems = qw(
    svr1 db1
    srv2 db3
    srv3 db3
    srv3 db4
    srv3 db5
);
my %db_systems; # Database systems with their servers.
my %servers; # Servers with their database systems.
for (;;) { # Loop forever (until I say otherwise)
    my $server = shift @systems;
    #
    # Let's check to make sure that there's a DB machine for this server
    #
    if ( not @systems ) {
        die qq(Cannot get database for server "$server". Odd number of items in array);
    }
    my $database = shift @systems;
    $servers{$server}->{$database} = 1;
    $db_systems{$database}->{$server} = 1;
    last if not @systems; # break out of loop if there are no more systems
}
say "Servers:" . Dumper \%servers;
say "Databases: " . Dumper \%db_systems;
This produces:
Servers:$VAR1 = {
'srv3' => {
'db4' => 1,
'db3' => 1,
'db5' => 1
},
'svr1' => {
'db1' => 1
},
'srv2' => {
'db3' => 1
}
};
Databases: $VAR1 = {
'db4' => {
'srv3' => 1
},
'db3' => {
'srv3' => 1,
'srv2' => 1
},
'db5' => {
'srv3' => 1
},
'db1' => {
'svr1' => 1
}
};
Is this close to what you want?
Addendum
Hi, this is working! Now I need to understand how to access the values to print them in my file. This hash of hashes thing is kind of rough for me. Thanks for the quick help!
You need to read the Perl tutorial on References and the Perl Reference Page on References.
In Perl, all data is scalar which means that variables talk about single values. In other programming languages, you have structures or records, but not Perl.
Even arrays and hashes are nothing but collections of individual bits of data. What happens when you need something a bit more complex?
A reference is a memory location of another Perl data structure. You could have references to scalar variables like $foo, but that wouldn't do you much good in most circumstances. Where this is helpful is when you have a reference pointing to an array or a hash. This way, you could have much more complex structures that can be used to represent this data.
Imagine an array of ten items ($foo[0] to $foo[9]). Each entry in the array is pointing to another array of ten items. There are now 11 separate arrays (one outer, ten inner) holding 100 values between them. We can treat them as a single structure, but it's important to remember that they are separate arrays.
I have a reference to an array at $foo[0]. How do I get access to the array itself? I do what is known as a dereference. To do that, I use curly braces with the right sigil in front. (The sigil is the $, @, or % you see in front of Perl variables.)
$foo[0]; # Reference to an array
my @temp = @{ $foo[0] }; # Dereferencing.
$temp[0]; # Now I can access that inner array
Having to use a temporary array each time I have to dereference it is rather clumsy, so I don't have to:
$foo[0]; # Reference to an array
my $value = ${ $foo[0] }[0]; # Getting the value of an item in my array reference
You can see that the last line is a bit hard to read. Imagine if I have a hash of a hash of an array of items:
my $phone = ${ ${ ${ $employee{$emp_number} }{phone} }[0] }{NUMBER};
It's a bit unwieldy. Fortunately, Perl allows you a few shortcuts. First, I can nest the references and use the default precedence:
my $phone = $employee{$emp_number}{phone}[0]{NUMBER};
I prefer using the -> notation:
my $phone = $employee{$emp_number}->{phone}->[0]->{NUMBER};
The arrow notation is cleaner because it separates the parts out, and it reminds you that these are references, not some single complex data structure. This helps remind you when you have to do a dereference, such as when you use the keys, pop, or push commands:
for my $field ( keys %{ $employee{$emp_number} } ) { # Dereference the hash
    say "Field $field = " . $employee{$emp_number}->{$field}
        if ( not ref $employee{$emp_number}->{$field} );
}
Look up the ref function to see what it does and why I am only interested in printing out the field if ref returns an empty string.
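The three dereferencing styles can be compared side by side in a small sketch. The %employee record below is invented for illustration; only its shape (hash of hash of array of hash) follows the text:

```perl
use strict;
use warnings;

# Hypothetical employee record: hash -> hash -> array -> hash.
my %employee = (
    1001 => {
        name  => 'Alice',
        phone => [ { NUMBER => '555-0100' }, { NUMBER => '555-0101' } ],
    },
);
my $emp_number = 1001;

my $long   = ${ ${ ${ $employee{$emp_number} }{phone} }[0] }{NUMBER};
my $short  = $employee{$emp_number}{phone}[0]{NUMBER};
my $arrows = $employee{$emp_number}->{phone}->[0]->{NUMBER};
print "all three agree: $long\n" if $long eq $short && $short eq $arrows;

# ref returns '' for plain values, so only "name" is printed here;
# "phone" holds an array reference and is skipped.
for my $field ( sort keys %{ $employee{$emp_number} } ) {
    my $value = $employee{$emp_number}->{$field};
    print "Field $field = $value\n" if not ref $value;
}
```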
By now, you should be able to see how to access your hash of hashes using the -> syntax:
my $db_for_server = $servers{$server}->{$database};
And you can use two loops:
for my $server ( keys %servers ) {
    my %db_systems = %{ $servers{$server} }; # Dereferencing
    for my $db_system ( keys %db_systems ) {
        say "Server $server has a connection to $db_system";
    }
}
Or, without an intermediate hash...
for my $server ( keys %servers ) {
    for my $db_system ( keys %{ $servers{$server} } ) {
        say "Server $server has a connection to $db_system";
    }
}
Now, go out there and get a good book on Modern Perl. You need to learn good programming techniques like using good variable names, indenting, and using strict and warnings in order to help you write better programs that are easier to decipher and support.

How can I cleanly turn a nested Perl hash into a non-nested one?

Assume a nested hash structure %old_hash ..
my %old_hash;
$old_hash{"foo"}{"bar"}{"zonk"} = "hello";
.. which we want to "flatten" (sorry if that's the wrong terminology!) to a non-nested hash using the sub &flatten(...) so that ..
my %h = &flatten(\%old_hash);
die unless($h{"zonk"} eq "hello");
The following definition of &flatten(...) does the trick:
sub flatten {
my $hashref = shift;
my %hash;
my %i = %{$hashref};
foreach my $ii (keys(%i)) {
my %j = %{$i{$ii}};
foreach my $jj (keys(%j)) {
my %k = %{$j{$jj}};
foreach my $kk (keys(%k)) {
my $value = $k{$kk};
$hash{$kk} = $value;
}
}
}
return %hash;
}
While the code given works it is not very readable or clean.
My question is two-fold:
In what ways does the given code not correspond to modern Perl best practices? Be harsh! :-)
How would you clean it up?
Your method is not best practice because it doesn't scale. What if the nested hash is six, ten levels deep? The repetition should tell you that a recursive routine is probably what you need.
sub flatten {
    my ($in, $out) = @_;
    for my $key (keys %$in) {
        my $value = $in->{$key};
        if ( defined $value && ref $value eq 'HASH' ) {
            flatten($value, $out);
        }
        else {
            $out->{$key} = $value;
        }
    }
}
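This version writes into a hash reference supplied by the caller rather than returning a list, so the call looks a little different from the one in the question. A minimal usage sketch (the extra baz entry is added here for illustration):

```perl
use strict;
use warnings;

sub flatten {
    my ($in, $out) = @_;
    for my $key (keys %$in) {
        my $value = $in->{$key};
        if ( ref $value eq 'HASH' ) {
            flatten($value, $out);    # descend, writing into the same output
        }
        else {
            $out->{$key} = $value;
        }
    }
}

my %old_hash;
$old_hash{foo}{bar}{zonk} = "hello";
$old_hash{foo}{baz}       = "world";

my %h;
flatten(\%old_hash, \%h);
print "$_ => $h{$_}\n" for sort keys %h;
```

Passing the output hash down the recursion avoids building and copying a list at every level.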
Alternatively, good modern Perl style is to use CPAN wherever possible. Data::Traverse would do what you need:
use Data::Traverse;
sub flatten {
    my %hash = @_;
    my %flattened;
    traverse { $flattened{$a} = $b } \%hash;
    return %flattened;
}
As a final note, it is usually more efficient to pass hashes by reference to avoid them being expanded out into lists and then turned into hashes again.
First, I would use perl -c to make sure it compiles cleanly, which it does not. So, I'd add a trailing } to make it compile.
Then, I'd run it through perltidy to improve the code layout (indentation, etc.).
Then, I'd run perlcritic (in "harsh" mode) to automatically tell me what it thinks are bad practices. It complains that:
Subroutine does not end with "return"
Update: the OP essentially changed every line of code after I posted my Answer above, but I believe it still applies. It's not easy shooting at a moving target :)
There are a few problems with your approach that you need to figure out. First off, what happens in the event that there are two leaf nodes with the same key? Does the second clobber the first, is the second ignored, should the output contain a list of them? Here is one approach. First we construct a flat list of key value pairs using a recursive function to deal with other hash depths:
my %data = (
foo => {bar => {baz => 'hello'}},
fizz => {buzz => {bing => 'world'}},
fad => {bad => {baz => 'clobber'}},
);
sub flatten {
    my $hash = shift;
    map {
        my $value = $$hash{$_};
        ref $value eq 'HASH'
            ? flatten($value)
            : ($_ => $value)
    } keys %$hash
}
print join( ", " => flatten \%data), "\n";
# baz, clobber, bing, world, baz, hello
my %flat = flatten \%data;
print join( ", " => %flat ), "\n";
# baz, hello, bing, world # lost (baz => clobber)
A fix could be something like this, which will create a hash of array refs containing all the values:
sub merge {
    my %out;
    while (@_) {
        my ($key, $value) = splice @_, 0, 2;
        push @{ $out{$key} }, $value
    }
    %out
}
my %better_flat = merge flatten \%data;
In production code, it would be faster to pass references between the functions, but I have omitted that here for clarity.
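A reference-passing variant might look like the sketch below. It folds the flatten and merge steps into one routine; flatten_into is a hypothetical name, and keys are sorted only to make the order of collected values predictable:

```perl
use strict;
use warnings;

my %data = (
    foo  => { bar  => { baz  => 'hello' } },
    fizz => { buzz => { bing => 'world' } },
    fad  => { bad  => { baz  => 'clobber' } },
);

# Collect every leaf into $out->{key} = [values...], sharing one output hash.
sub flatten_into {
    my ($hash, $out) = @_;
    for my $key (sort keys %$hash) {
        my $value = $hash->{$key};
        if (ref $value eq 'HASH') {
            flatten_into($value, $out);
        }
        else {
            push @{ $out->{$key} }, $value;
        }
    }
    return $out;
}

my $flat = flatten_into(\%data, {});
print "$_: @{ $flat->{$_} }\n" for sort keys %$flat;
```

Both baz values survive because each leaf is pushed onto an array instead of overwriting the previous entry.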
Is it your intent to end up with a copy of the original hash or just a reordered result?
Your code starts with one hash (the original hash that is used by reference) and makes two copies %i and %hash.
The statement my %i = %{$hashref} is not necessary. You are copying the entire hash to a new hash. In either case (whether you want a copy or not) you can use references to the original hash.
You are also losing data if a nested hash contains the same key as another hash in the structure. Is this intended?

Traversing a multi-dimensional hash in Perl

If you have a hash (or reference to a hash) in Perl with many dimensions and you want to iterate across all values, what's the best way to do it? In other words, if we have
$f->{$x}{$y}, I want something like
foreach ($x, $y) (deep_keys %{$f})
{
}
instead of
foreach $x (keys %$f)
{
    foreach $y (keys %{$f->{$x}})
    {
    }
}
Stage one: don't reinvent the wheel :)
A quick search on CPAN throws up the incredibly useful Data::Walk. Define a subroutine to process each node, and you're sorted
use Data::Walk;
my $data = {
    # some complex hash/array mess
};
sub process {
    print "current node $_\n";
}
walk \&process, $data;
And Bob's your uncle. Note that if you want to pass it a hash to walk, you'll need to pass a reference to it (see perldoc perlref), as follows (otherwise it'll try and process your hash keys as well!):
walk \&process, \%hash;
For a more comprehensive solution (but harder to find at first glance in CPAN), use Data::Visitor::Callback or its parent module - this has the advantage of giving you finer control of what you do, and (just for extra street cred) is written using Moose.
Here's an option. This works for arbitrarily deep hashes:
sub deep_keys_foreach
{
    my ($hashref, $code, $args) = @_;
    while (my ($k, $v) = each(%$hashref)) {
        my @newargs = defined($args) ? @$args : ();
        push(@newargs, $k);
        if (ref($v) eq 'HASH') {
            deep_keys_foreach($v, $code, \@newargs);
        }
        else {
            $code->(@newargs);
        }
    }
}
deep_keys_foreach($f, sub {
    my ($k1, $k2) = @_;
    print "inside deep_keys, k1=$k1, k2=$k2\n";
});
This sounds to me as if Data::Diver or Data::Visitor are good approaches for you.
Keep in mind that Perl lists and hashes do not have dimensions and so cannot be multidimensional. What you can have is a hash item that is set to reference another hash or list. This can be used to create fake multidimensional structures.
Once you realize this, things become easy. For example:
sub f($) {
    my $x = shift;
    if( ref $x eq 'HASH' ) {
        foreach( values %$x ) {
            f($_);
        }
    } elsif( ref $x eq 'ARRAY' ) {
        foreach( @$x ) {
            f($_);
        }
    }
}
Add whatever else needs to be done besides traversing the structure, of course.
One nifty way to do what you need is to pass a code reference to be called from inside f. By using sub prototyping you could even make the calls look like Perl's grep and map functions.
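A sketch of the code-reference idea, without prototypes so the call is an ordinary function call. The name walk_leaves and the sample data are assumptions made for this example; hash keys are sorted only so the visiting order is predictable:

```perl
use strict;
use warnings;

# walk_leaves($data, $code): call $code on every non-reference leaf.
sub walk_leaves {
    my ($x, $code) = @_;
    if ( ref $x eq 'HASH' ) {
        walk_leaves($x->{$_}, $code) for sort keys %$x;
    }
    elsif ( ref $x eq 'ARRAY' ) {
        walk_leaves($_, $code) for @$x;
    }
    else {
        $code->($x);
    }
}

my $data = { a => [ 1, 2, { b => 3 } ], c => { d => 4 } };
my @seen;
walk_leaves($data, sub { push @seen, $_[0] });
print "@seen\n";
```

Because the callback is a closure, it can collect into @seen or any other variable in the caller's scope.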
You can also fudge multi-dimensional arrays if you always have all of the key values, or you just don't need to access the individual levels as separate arrays:
$arr{"foo",1} = "one";
$arr{"bar",2} = "two";
while(($key, $value) = each(%arr))
{
    @keyValues = split($;, $key);
    print "key = [", join(",", @keyValues), "] : value = [", $value, "]\n";
}
This uses the subscript separator "$;" as the separator for multiple values in the key.
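The same technique under use strict, as a self-contained sketch. The separator is copied into a lexical ($sep, a name chosen here) and escaped before being used as a split pattern, and keys are sorted so the output order is stable:

```perl
use strict;
use warnings;

my %arr;
$arr{"foo", 1} = "one";    # key is actually join($;, "foo", 1)
$arr{"bar", 2} = "two";

my $sep = quotemeta $;;    # escape the separator for use in a pattern
my @lines;
for my $key (sort keys %arr) {
    my @parts = split /$sep/, $key;    # recover the individual subscripts
    push @lines, "[" . join(",", @parts) . "] = $arr{$key}";
}
print "$_\n" for @lines;
```

This only works while no subscript contains the separator character itself.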
There's no way to get the semantics you describe because foreach iterates over a list one element at a time. You'd have to have deep_keys return a LoL (list of lists) instead. Even that doesn't work in the general case of an arbitrary data structure. There could be varying levels of sub-hashes, some of the levels could be ARRAY refs, etc.
The Perlish way of doing this would be to write a function that can walk an arbitrary data structure and apply a callback at each "leaf" (that is, non-reference value). bmdhacks' answer is a starting point. The exact function would vary depending on what you wanted to do at each level. It's pretty straightforward if all you care about is the leaf values. Things get more complicated if you care about the keys, indices, etc. that got you to the leaf.
It's easy enough if all you want to do is operate on values, but if you want to operate on keys, you need specifications of how levels will be recoverable.
a. For instance, you could specify keys as "$level1_key.$level2_key.$level3_key"--or any separator, representing the levels.
b. Or you could have a list of keys.
I recommend the latter.
The level is given by scalar @$key_stack
and the most local key is $key_stack->[-1].
The path can be reconstructed by: join( '.', @$key_stack )
Code:
use constant EMPTY_ARRAY => [];
use strict;
use Scalar::Util qw<reftype>;
sub deep_keys (\%) {
    sub deeper_keys {
        my ( $key_ref, $hash_ref ) = @_;
        return [ $key_ref, $hash_ref ] if reftype( $hash_ref ) ne 'HASH';
        my @results;
        while ( my ( $key, $value ) = each %$hash_ref ) {
            my $k = [ @{ $key_ref || EMPTY_ARRAY }, $key ];
            push @results, deeper_keys( $k, $value );
        }
        return @results;
    }
    return deeper_keys( undef, shift );
}
foreach my $kv_pair ( deep_keys %$f ) {
    my ( $key_stack, $value ) = @$kv_pair; # each result is a [ key stack, value ] pair
    ...
}
This has been tested in Perl 5.10.
If you are working with tree data going more than two levels deep, and you find yourself wanting to walk that tree, you should first consider that you are going to make a lot of extra work for yourself if you plan on reimplementing everything you need to do manually on hashes of hashes of hashes when there are a lot of good alternatives available (search CPAN for "Tree").
Not knowing what your data requirements actually are, I'm going to blindly point you at a tutorial for Tree::DAG_Node to get you started.
That said, Axeman is correct, a hashwalk is most easily done with recursion. Here's an example to get you started if you feel you absolutely must solve your problem with hashes of hashes of hashes:
#!/usr/bin/perl
use strict;
use warnings;
my %hash = (
"toplevel-1" =>
{
"sublevel1a" => "value-1a",
"sublevel1b" => "value-1b"
},
"toplevel-2" =>
{
"sublevel1c" =>
{
"value-1c.1" => "replacement-1c.1",
"value-1c.2" => "replacement-1c.2"
},
"sublevel1d" => "value-1d"
}
);
hashwalk( \%hash );
sub hashwalk
{
    my ($element) = @_;
    if( ref($element) =~ /HASH/ )
    {
        foreach my $key (keys %$element)
        {
            print $key," => \n";
            hashwalk($$element{$key});
        }
    }
    else
    {
        print $element,"\n";
    }
}
It will output:
toplevel-2 =>
sublevel1d =>
value-1d
sublevel1c =>
value-1c.2 =>
replacement-1c.2
value-1c.1 =>
replacement-1c.1
toplevel-1 =>
sublevel1a =>
value-1a
sublevel1b =>
value-1b
Note that you CAN NOT predict in what order the hash elements will be traversed unless you tie the hash via Tie::IxHash or similar — again, if you're going to go through that much work, I recommend a tree module.