How to find the depth of a nested hash of hashes? - perl

I am trying to write a Perl subroutine to process any given hash (passed by reference) but I would like to make it a generic one so that I can use it anywhere.
Assuming the hash has simple key/value pairs and is not an elaborated record (containing arrays of arrays or arrays of hashes), is there any way to find how deep the hash of hashes runs?
For example
my %stat = (
1 => { one => "One is one.", two => "two is two"},
2 => { one => "second val wone", two => "Seconv v"}
);
The hash above has first level key 1 which has two keys one and two. So it is a hash with two levels.
My question is whether is there any way to test and find this information that the hash has two levels?
Update from comment
About the "problem that has led me to believe that I need to know the depth that a Perl hash is nested". Here is my situation.
The program is creating a data structure of three levels, and also publishing it in a text file for other scripts which are processing this published data and doing something else.
The program is also reading and hashing other data structures of five levels, which has data related to the hash in the first point.
The program is also processing a continuously growing log file and collecting data.

Assuming a uniform hash structure:
use strict;
use warnings;
sub depth {
my ($h) = #_;
my $d = 0;
while () {
return $d if ref($h) ne 'HASH';
($h) = values %$h;
$d++;
}
}
my %stat = (
1 => { one => "One is one.", two => "two is two"},
2 => { one => "second val wone", two => "Seconv v"}
);
print depth(\%stat), "\n";
Output:
2

Traversing hash values and tracking levels could give answer,
use strict;
use warnings;
use v5.16;
{ my $max;
sub hlevel {
my ($h, $n) = #_;
$max = 0 if !$n;
$max = $n if $max < $n;
__SUB__->($_, $n +1) for grep {ref eq "HASH"} values %$h;
return $max;
}}
my %h;
$h{a}{b}{c}{d}{e} =1;
$h{a}{b}{c}{d}{e1}{f} =1;
$h{a}{b}{c}{d}{e1}{f1}{g} =1;
print hlevel(\%h, 0);
output
6

Related

how to use reference of hashes in perl

I'm trying to learn the reference function, but I can't figure out a way to put hashes in reference at the same time. I want to write a subroutine that will take two simple hash references as arguments and to check whether these two hashes are equal or not. My code is:
#!/usr/bin/perl
use strict;
use warnings;
use feature qw(say);
my $hash1_r = {ITALY => "ROME",
FRANCE => "PARIS"};
my $hash2_r = {ITALY => "MILAN",
FRANCE => "PARIS"};
my $hash3_r = {ITALY => "ROME"};
my $hash4_r = {SPAIN => "ROME",
FRANCE => "PARIS"};
sub compareHashes(%$hash1, %$hash2){
my $hash1; my $hash2;
for (my $i =0; $i < keys %$hash1; $i++){
say "The first hash:";
say "keys %$hash1\t, values %$hash1";
}
for (my $i =0; $i < keys %$hash2; $i++){
say "The second hash:";
say "keys %$hash2\t, values %$hash2";
}
for (keys %$hash1) {
if (keys %$hash1 ne keys %$hash2){
say "Two above hashes are not equal";
}elsif (my $key1 (keys %$hash1) ne my $key2 (keys %$hash2)){
say "Two above hashes are not equal";
}elsif (%$hash1->{$_} ne %$hash2->{$_}){
say "Two above hashes are not equal";
}else {
say "Two above hashes are equal";
}
}
}
compareHashes (%$hash1_r, %$hash1_r);
compareHashes (%$hash1_r, %$hash2_r);
compareHashes (%$hash1_r, %$hash3_r);
compareHashes (%$hash1_r, %$hash4_r);
However, I got those errors:
Prototype after '%' for main::compareHashes : %$hash1,%$hash2 at compareHashes2.pl line 16.
Illegal character in prototype for main::compareHashes : %$hash1,%$hash2 at compareHashes2.pl line 16.
syntax error at compareHashes2.pl line 30, near "$key1 ("
syntax error at compareHashes2.pl line 32, near "}elsif"
Global symbol "$hash2" requires explicit package name at compareHashes2.pl line 32.
Any solutions? Any help will be greatly appreciated!
I would recommend reading the following excellent perl documentation for the general idea:
perldoc perlreftut
A slight simplification of your code, getting the references to work:
#!/usr/bin/perl
use strict;
use warnings;
use feature qw(say);
# { ... } creates a hash reference, you can pass this to a function directly
my $hash1_r = { ITALY => "ROME", FRANCE => "PARIS" };
my $hash2_r = { ITALY => "MILAN", FRANCE => "PARIS" };
my $hash3_r = { ITALY => "ROME" };
my $hash4_r = { SPAIN => "ROME", FRANCE => "PARIS" };
sub compareHashes {
my ($hash1, $hash2) = #_; # #_ is the default array
# You can just use hash references directly by prepending with a '%' symbol
# when you need the actual hash, such as when using 'keys', 'values', 'each', etc.
# You can access the elements by using an arrow: $hashref->{'key_name'}
say "-"x40;
say "The first hash:";
while ( my ($key, $value) = each %$hash1 ) {
say "$key => $value";
}
say "The second hash:";
while ( my ($key, $value) = each %$hash2 ) {
say "$key => $value";
}
my (#keys1) = keys %$hash1;
my ($nkey1) = scalar #keys1;
my (#keys2) = keys %$hash2;
my ($nkey2) = scalar #keys2;
if ($nkey1 != $nkey2) {
say "=> unequal number of keys: $nkey1 vs $nkey2";
return 0; # False, the hashes are different, we don't need to test any further
}
# Create a new hash using all of the keys from hash1 and hash2
# The effect is to eliminate duplicates, as repeated keys, i.e.
# common to both hash1 and hash2 will just produce one key in %uniq
# You can use the 'uniq' function from List::MoreUtils to achieve
# the same thing.
# In perl, using a hash to eliminate duplicates, or test for set
# membership is a very common idiom.
# The 'map' function iterates over a list and performs the
# operation inside the curly braces {...}, returning all
# of the results.
# For example: map { 2 * $_ } ( 1,2,3 ) # ( 2,4,6 )
# If you assign a list to a hash, it takes pairs of values
# and turns them into key/value pairs
# The '=>' is equivalent to a ',' but makes the intent easier
# to understand
my %uniq = map { $_ => 1 } ( #keys1, #keys2 );
my $nuniqkey = scalar keys %uniq;
if ($nkey1 != $nuniqkey) {
say "=> unequal set of keys";
return 0; # False, the hashes are different, we don't need to test any further
}
# Now test the values
# If we neglected to check for uniqueness in the above block,
# we would run into the situation where hash1 might have a key
# that hash2 doesn't have (and vice-versa). This would trigger a
# 'use of uninitialized value' warning in the comparison operator
for my $key (#keys1) {
my ($value1) = $hash1->{$key};
my ($value2) = $hash2->{$key};
if ($value1 ne $value2) {
say "=> unequal values for key '$key' : $value1 vs $value2";
return 0; # False, the hashes are different, we don't need to test any further
}
}
say "=> equal, yay!";
return 1; # True, the hashes are equal after all!
}
compareHashes($hash1_r, $hash1_r);
compareHashes($hash1_r, $hash2_r);
compareHashes($hash1_r, $hash3_r);
compareHashes($hash1_r, $hash4_r);
You have a good answer that you have already accepted. But for people finding this question in the future, I think it's worth explaining some of the errors you have made.
You start by defining some anonymous hashes. That's fine.
my $hash1_r = {
ITALY => "ROME",
FRANCE => "PARIS"
};
my $hash2_r = {
ITALY => "MILAN",
FRANCE => "PARIS"
};
my $hash3_r = {
ITALY => "ROME"
};
my $hash4_r = {
SPAIN => "ROME",
FRANCE => "PARIS"
};
I'm now going to skip to where you call your subroutine (I'll get back to the subroutine itself soon).
compareHashes (%$hash1_r, %$hash1_r);
compareHashes (%$hash1_r, %$hash2_r);
compareHashes (%$hash1_r, %$hash3_r);
compareHashes (%$hash1_r, %$hash4_r);
One of the most important uses for references is to enable you to pass multiple arrays and hashes into a subroutine without them being flattened into a single array. As you have hash references already, it would make sense to pass those references into the subroutine. But you don't do that. You dereference your hashes which means you send the actual hashes into the subroutine. That means that, for example, your first call passes in the list ('ITALY', 'ROME', 'FRANCE', 'PARIS', 'ITALY', 'MILAN', 'FRANCE', 'PARIS'). And there is no way for the code inside your subroutine to separate that list into two hashes.
Now, let's look at the subroutine itself. You start by defining a prototype for the subroutine. In most cases, prototypes are unnecessary. In many cases, they change the code behaviour in hard-to-understand ways. No Perl expert would recommend using prototypes in this code. And, as your error message says, you get the prototype wrong.
sub compareHashes(%$hash1, %$hash2){
I'm not sure what you were trying to do with this prototype. Perhaps it's not a prototype at all - perhaps it's a function signature (but if it was, you would need to turn the feature on).
On the next line, you declare two variables. Variables that you never give values to.
my $hash1; my $hash2;
There are then two very confused for loops.
for (my $i =0; $i < keys %$hash1; $i++){
say "The first hash:";
say "keys %$hash1\t, values %$hash1";
}
$hash1 has no value. So %$hash1 is zero (the hash has no keys) and the loop isn't executed. But we're not missing much as the loop body just prints the same uninitialised values each time.
And you could simplify your for loop by making it a foreach-style loop.
foreach my $i (0 .. keys %$hash1 - 1) { ... }
Or (given that you don't use $i at all:
foreach (1 .. keys %$hash1) { ... }
After another, equally ineffective, for loop for $hash2, you try to compare your two hashes.
for (keys %$hash1) {
if (keys %$hash1 ne keys %$hash2){
say "Two above hashes are not equal";
}elsif (my $key1 (keys %$hash1) ne my $key2 (keys %$hash2)){
say "Two above hashes are not equal";
}elsif (%$hash1->{$_} ne %$hash2->{$_}){
say "Two above hashes are not equal";
}else {
say "Two above hashes are equal";
}
}
I have no idea at all why this is all in a for loop. but your comparisons do nothing to actually compare the values in the hash. All you are comparing is the number of keys in the hashes (which are always going to be equal here - as your hashes are always empty).
All in all, this is the work who is extremely confused about how hashes, subroutines and references work in Perl. I would urge you to stop what you are doing and take the time to work through a good reference book like Learning Perl followed by Intermediate Perl before you continue down your current route and just confuse yourself more.

Find values of nested hash matching a specific key

I've created a hash of hashes in perl, where this is an example of what the hash ends up looking like:
my %grades;
$grades{"Foo Bar"}{Mathematics} = 97;
$grades{"Foo Bar"}{Literature} = 67;
$grades{"Peti Bar"}{Literature} = 88;
$grades{"Peti Bar"}{Mathematics} = 82;
$grades{"Peti Bar"}{Art} = 99;
and to print the entire hash, I'm using:
foreach my $name (sort keys %grades) {
foreach my $subject (keys %{ $grades{$name} }) {
print "$name, $subject: $grades{$name}{$subject}\n";
}
}
I need to print just the inner hash referring to "Peti Bar" and find the highest value, so theoretically, I should just parse through Peti Bar, Literature; Peti Bar, Mathematics; and Peti Bar, Art and end up returning Art, since it has the highest value.
Is there a way to do this or do I need to parse through the entire 2d hash?
You don't need to parse through the first level if you know the key that you're interested. Just leave out the first loop and access it directly. To get the highest value, you have to look at each subject once.
Keep track of the highest value and the key that goes with it, and then print.
my $max_value = 0;
my $max_key;
foreach my $subject (keys %{ $grades{'Peti Bar'} }) {
if ($grades{'Peti Bar'}{$subject} > $max_value){
$max_value = $grades{'Peti Bar'}{$subject};
$max_key = $subject;
}
}
print $max_key;
This will output
Art
An alternative implementation with sort would look like this:
print +(
sort { $grades{'Peti Bar'}{$b} <=> $grades{'Peti Bar'}{$a} }
keys %{ $grades{'Peti Bar'} }
)[0];
The + in +( ... ) tells Perl that the parenthesis () are not meant for the function call to print, but to construct a list. The sort sorts on the keys, descending, because it has $b first. It returns a list, and we take the first value (index 0).
Note that this is more expensive than the first implementation, and not necessarily more concise. Unless you're building a one-liner or your ; is broken I wouldn't recommend the second solution.
This is trivial using the List::UtilsBy module
The code is made clearer by extracting a reference to the inner hash that we're interested in. The max_by is called to return the keys of that hash that has the maximum value
use strict;
use warnings 'all';
use feature 'say';
use List::UtilsBy 'max_by';
my %grades = (
'Foo Bar' => { Literature => 67, Mathematics => 97 },
'Peti Bar' => { Literature => 88, Mathematics => 82, Art => 99 },
);
my $pb_grades = $grades{'Peti Bar'};
say max_by { $pb_grades->{$_} } keys %$pb_grades;
output
Art
As a Perl beginner I would use the List::Util core module:
use 5.014;
use List::Util 'reduce';
my $k='Peti Bar';
say reduce { $grades{$k}{$a} > $grades{$k}{$b} ? $a : $b } keys %{$grades{$k}};

Memory/performance tradeoff when determining the size of a Perl hash

I was browsing through some Perl code in a popular repositiory on GitHub and ran across this method to calculate the size of a hash:
while ( my ($a, undef ) = each %h ) { $num++; }
I thought why would one go through the trouble of writing all that code when it could more simply be written as
$num = scalar keys %h;
So, I compared both methods with Benchmark.
my %h = (1 .. 1000);
cmpthese(-10, {
keys => sub {
my $num = 0;
$num = scalar keys %h;
},
whileloop => sub {
my $num = 0;
while ( my ($a, undef ) = each %h ) {
$num++;
}
},
});
RESULTS
Rate whileloop keys
whileloop 5090/s -- -100%
keys 7234884/s 142047% --
The results show that using keys is MUCH faster than the while loop. My question is this: why would the original coder use such a slow method? Is there something that I'm missing? Also, is there a faster way?
I cannot read the mind of whomever might have written that piece of code, but he/she likely thought:
my $n = keys %hash;
used more memory than iterating through everything using each.
Note that the scalar on the left hand side of the assignment creates scalar context: There is no need for scalar unless you want to create a scalar context in what would otherwise have been list context.
Because he didn't know about keys's ability to return the number of elements in the hash.

How do I preserve the order of a hash in Perl?

I have a .sql file from which I am reading my input. Suppose the file contains the following input....
Message Fruits Fruit="Apple",Color="Red",Taste="Sweet";
Message Flowers Flower="Rose",Color="Red";
Now I have written a perl script to generate hash from this file..
use strict;
use Data::Dumper;
if(open(MYFILE,"file.sql")){
my #stack;
my %hash;
push #stack,\%hash;
my #file = <MYFILE>;
foreach my $row(#file){
if($row =~ /Message /){
my %my_hash;
my #words = split(" ",$row);
my #sep_words = split(",",$words[2]);
foreach my $x(#sep_words){
my($key,$value) = split("=",$x);
$my_hash{$key} = $value;
}
push #stack,$stack[$#stack]->{$words[1]} = {%my_hash};
pop #stack;
}
}
print Dumper(\%hash);
}
I am getting the following output..
$VAR1 = {
'Flowers' => {
'Flower' => '"Rose"',
'Color' => '"Red";'
},
'Fruits' => {
'Taste' => '"Sweet";',
'Fruit' => '"Apple"',
'Color' => '"Red"'
}
};
Now here the hash is not preserving the order in which the input is read.I want my hash to be in the same order as in input file.
I have found some libraries like Tie::IxHash but I want to avoid the use of any libraries.Can anybody help me out???
For a low key approach, you could always maintain the keys in an array, which does have an order.
foreach my $x(#sep_words){
my($key,$value) = split("=",$x);
$my_hash{$key} = $value;
push(#list_keys,$key);
}
And then to extract, iterate over the keys
foreach my $this_key (#list_keys) {
# do something with $my_hash{$this_key}
}
But that does have the issue of, you're relying on the array of keys and the hash staying in sync. You could also accidentally add the same key multiple times, if you're not careful.
Joel has it correct - you cannot reliably trust the order of a hash in Perl. If you need a certain order, you'll have to store your information in an array.
A hash is a set of key-value pairs with unique keys. A set is never ordered per se.
An array is a sequence of any number of scalars. An array is ordered per se, but uniqueness would have to be enforced externally.
Here is my take on your problem:
#!/usr/bin/perl
use strict; use warnings;
use Data::Dumper;
local $/ = ";\n";
my #messages;
while (<DATA>) {
chomp;
my ($msg, $to, $what) = split ' ', $_, 3; # limit number of fragments.
my %options;
while($what =~ /(\w+) = "((?:[^"]++|\\.)*)" (?:,|$)/xg) {
$options{$1} = $2;
}
push #messages, [$to => \%options];
}
print Dumper \#messages;
__DATA__
Message Fruits Fruit="Apple",Color="Red",Taste="Sweet";
Message Flowers Flower="Rose",Color="Red";
I put the messages into an array, because it has to be sorted. Also, I dont do weird gymnastics with a stack I don't need.
I don't split on all newlines, because you could have quoted value that contain newlines. For the same reason, I don't blindly split on , or = and use a sensible regex. It may be worth adding error detection, like die if not defined pos $what or pos($what) != length($what); at the end (requires /c flag on regex), to see if we actually processed everything or were thrown out of the loop prematurely.
This produces:
$VAR1 = [
[ 'Fruits',
{
'Taste' => 'Sweet',
'Fruit' => 'Apple',
'Color' => 'Red'
}
],
[ 'Flowers',
{
'Flower' => 'Rose',
'Color' => 'Red'
}
]
];
(with other indenting, but that's irrelevant).
One gotcha exists: The file has to be terminated by a newline, or the last semicolon isn't caught.

How to count duplicate key and add all values of duplicate key together to make new hash with non duplicate key?

Hi I am new to perl and in a beginners stage Please Help
I am having a hash
%hash = { a => 2 , b=>6, a=>4, f=>2, b=>1, a=>1}
I want output as
a comes 3 times
b comes 2 times
f comes 1 time
a new hash should be
%newhash = { a => 7, b=>7,f =>2}
How can I do this?
To count the frequency of keys in hash i am doing
foreach $element(sort keys %hash) {
my $count = grep /$element/, sort keys %hash;
print "$element comes in $count times \n";
}
But by doing this I am getting the output as:
a comes 1 times
b comes 1 times
a comes 1 times
f comes 1 times
b comes 1 times
a comes 1 times
Which is not what I want.
How can I get the correct number of frequency of the duplicate keys? How can I add the values of those duplicate key and store in a new hash?
By definition, a hash can not have the same hash key in it multiple times. You probably want to store your initial data in a different data structure, such as a two-dimensional array:
use strict;
use warnings;
use Data::Dumper;
my #data = ( [ a => 2 ],
[ b => 6 ],
[ a => 4 ],
[ f => 2 ],
[ b => 1 ],
[ a => 1 ],
);
my %results;
for my $value (#data) {
$results{$value->[0]} += $value->[1];
}
print Dumper %results;
# $VAR1 = 'a';
# $VAR2 = 7;
# $VAR3 = 'b';
# $VAR4 = 7;
# $VAR5 = 'f';
# $VAR6 = 2;
That said, other wrong things:
%hash = { a => 2 , b=>6, a=>4, f=>2, b=>1, a=>1}
You can't do this, it's assigning a hashref ({}) to a hash. Either use %hash = ( ... ) or $hashref = { ... }.
Sonam:
I've reedited your post in order to help format it for reading. Study the Markdown Editing Help Guide and that'll make your posts clearer and easier to understand. Here are a couple of hints:
Indent your code by four spaces. That tells Markdown to leave it alone and don't reformat it.
When you make a list, put astricks with a space in front. Markdown understands it's a bulleted list and formats it as such.
Press "Edit" on your original post, and you can see what changes I made.
Now on to your post. I'm not sure I understand your data. If your data was in a hash, the keys would be unique. You can't have duplicate keys in a hash, so where is your data coming from?
For example, if you're reading it in from a file with two numbers on each line, you could do this:
use autodie;
use strict;
use warnings;
open (my $data_fh, "<", "$fileName");
my %hash;
while (my $line = <$data_fh>) {
chomp $line;
my ($key, $value) = split /\s+/, $line;
$hash{$key}++;
}
foreach my $key (sort keys %hash) {
print "$key appears $hash{$key} times\n";
}
The first three lines are Perl pragmas. They change the way Perl operates:
use autodie: This tells the program to die in certain circumstances such as when you try to open a file that doesn't exist. This way, I didn't have to check to see if the open statement worked or not.
use strict: This makes sure you have to declare your variables before using them which helps eliminate 90% of Perl bugs. You declare a variable most of the time using my. Variables declared with my last in the block where they were declared. That's why my %hash had to be declared before the while block. Otherwise, the variable would become undefined once that loops completes.
use warnings: This has Perl generate warnings in certain conditions. For example, you attempt to print out a variable that has no user set value.
The first loop simply goes line by line through my data and counts the number of occurrences of your key. The second loop prints out the results.