Minimum and maximum values for Perl hash of hashes - perl

This is a variation from another question asked on perlmonks and is similar to the problem I'm trying to figure out. I have the following hash of hashes.
%Year = (
2007 => {
ID1 => 07,
ID4 => 34,
ID2 => 24,
ID9 => 14,
ID3 => 05,
},
2008 => {
ID7 => 11,
ID9 => 64,
ID10 => 20,
ID5 => 13,
ID8 => 22,
}
)
I would like to find the two smallest and two largest values together with their corresponding IDs for each year. Can this be done using List::Util qw (min max)?
Desired results:
2007 - max1:ID4,34 max2:ID2,24 min1:ID3,05 min2:ID1,07
2008 - max1:ID9,64 max2:ID10,20 min1:ID7,11 min2:ID5,13

Unless the lists are huge, it is probably best to find the two largest and two smallest hash values just by sorting the entire hash and picking the first two and last two elements.
You seem to have incorrect expectations for your output. For 2008 the hash data sorted by value looks like
ID7 => 11
ID5 => 13
ID10 => 20
ID8 => 22
ID9 => 64
so max1 and max2 are ID9 and ID8, while min1 and min2 are ID7 and ID5. But your question says that you expect max2 to be ID10, whose value is 20, right in the middle of the sorted range. I think max2 should be ID8, which has a value of 22, the second-largest value in the 2008 hash.
I suggest this solution to produce the output that I think you want
use strict;
use warnings;
use 5.010;
my %year = (
2007 => { ID1 => 7, ID2 => 24, ID3 => 5, ID4 => 34, ID9 => 14 },
2008 => { ID10 => 20, ID5 => 13, ID7 => 11, ID8 => 22, ID9 => 64 },
);
for my $year (sort { $a <=> $b } keys %year) {
my $data = $year{$year};
my @sorted_keys = sort { $data->{$a} <=> $data->{$b} } keys %$data;
printf "%4d - max1:%s,%02d max2:%s,%02d min1:%s,%02d min2:%s,%02d\n",
$year, map { $_ => $data->{$_} } @sorted_keys[-1,-2,0,1];
}
output
2007 - max1:ID4,34 max2:ID2,24 min1:ID3,05 min2:ID1,07
2008 - max1:ID9,64 max2:ID8,22 min1:ID7,11 min2:ID5,13

TIMTOWTDI: You've mentioned a hash of hashes, so you can sort each inner hash by value and take a list slice (that is, the first two and last two elements).
#!/usr/bin/perl
use strict;
use warnings;
my %Year = (
2007 => { ID1 => 7, ID2 => 24, ID3 => 5, ID4 => 34, ID9 => 14 },
2008 => { ID10 => 20, ID5 => 13, ID7 => 11, ID8 => 22, ID9 => 64 },
);
for my $year (sort keys %Year) {
printf "%4d - max1:%s,%02d max2:%s,%02d min1:%s,%02d min2:%s,%02d\n",
$year,
map { $_, $Year{$year}{$_} }
( sort { $Year{$year}{$b} <=> $Year{$year}{$a} } keys %{$Year{$year}} )[0,1,-1,-2];
}
Output:
2007 - max1:ID4,34 max2:ID2,24 min1:ID3,05 min2:ID1,07
2008 - max1:ID9,64 max2:ID8,22 min1:ID7,11 min2:ID5,13

You have hashes, and List::Util works on lists/arrays. That disqualifies you right there since both the keys and the data are still important for you.
It's possible to create a second hash that's keyed by the data, then I could use something from List::Util or List::MoreUtils on that to pull up the data you want, and then look up the keys for that data. However, that's a lot of work just to get the information you want.
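The reverse-lookup idea can be sketched roughly as follows. This is only an illustration: it assumes every value within a year is unique (duplicates would silently overwrite each other in the reversed hash), and the helper name extremes_by_value is made up for the example.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical helper: returns (max_id, max_value, min_id, min_value).
# Assumes all values in %$data are unique.
sub extremes_by_value {
    my ($data) = @_;
    my %id_for = reverse %$data;        # value => ID
    my $hi = max(keys %id_for);
    my $lo = min(keys %id_for);
    return ($id_for{$hi}, $hi, $id_for{$lo}, $lo);
}

my %y2008 = (ID7 => 11, ID9 => 64, ID10 => 20, ID5 => 13, ID8 => 22);
my @r = extremes_by_value(\%y2008);
print "max:$r[0],$r[1] min:$r[2],$r[3]\n";   # max:ID9,64 min:ID7,11
```

Getting the second-largest and second-smallest this way would mean deleting the extremes and repeating, which is why sorting the keys is the simpler route.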
In reality, you're not sorting the hash of hashes, but just the data in each year. This makes the job a lot easier.
Normally, when you sort a hash, you're sorting on the keys. However, you can pass a block to sort to change the way Perl orders things. Perl hands the block two items, $a and $b, which here are keys of your hash. The block returns a negative number, zero, or a positive number to say whether $a belongs before, level with, or after $b. Perl gives you <=> to do that for numbers and cmp for strings.
All I have to do is specify sort { $array{$a} cmp $array{$b} } keys %array to sort by the data and not the keys. (cmp is safe here only because the values are stored as zero-padded two-digit strings; for general numbers use <=>.) I simply toss the sorted keys into another array, then use index positioning to pull out the data I want.
#! /usr/bin/env perl
use warnings;
use strict;
use autodie;
use feature qw(say);
use Data::Dumper;
my %year;
#
# Data
#
$year{2007}->{ID1} = "07";
$year{2007}->{ID2} = "24";
$year{2007}->{ID3} = "05";
$year{2007}->{ID4} = "34";
$year{2007}->{ID9} = "14";
$year{2008}->{ID7} = "11";
$year{2008}->{ID9} = "64";
$year{2008}->{ID10} = "20";
$year{2008}->{ID5} = "13";
$year{2008}->{ID8} = "22";
#
# For Each Year...
#
for my $year ( sort keys %year ) {
print "$year - ";
#
# No need to do this dereferencing, but it makes the rest of the code cleaner
#
my %id_hash = %{ $year{$year} };
#
# Now I sort my IDs by their data and not the key names
#
my @keys = sort { $id_hash{$a} cmp $id_hash{$b} } keys %id_hash;
#
# And print them out
#
print "max1:$keys[-1],$id_hash{$keys[-1]} ";
print "max2:$keys[-2],$id_hash{$keys[-2]} ";
print "min1:$keys[0],$id_hash{$keys[0]}, ";
print "min2:$keys[1],$id_hash{$keys[1]}\n";
}
The output is:
2007 - max1:ID4,34 max2:ID2,24 min1:ID3,05, min2:ID1,07
2008 - max1:ID9,64 max2:ID8,22 min1:ID7,11, min2:ID5,13

Related

Sort both levels of keys for hash of hashes in perl

I have a code where I need to keep track of some values (that come up at random) at given positions in different categories (and a fairly large number of them; ~40,000), so I thought a hash of hashes would be the best way, with categories as first layer of keys, position as second and values as values; something like:
my %HoH = (
'cat1' => {
'7010' => 19,
'6490' => 13,
'11980' => 2
},
'cat2' => {
'7010' => 28,
'10470' => 13,
'205980' => 54
}
);
Then I need to sort and print them in order of both categories and then position, to get an output file like:
cat1 6490 13
cat1 7010 19
...
cat2 7010 28
But I can't work out the syntax for the nested sorting (alternatively, anyone got a better idea than this approach?)
Perl makes it easy to efficiently sort the keys while iterating through a hash of hashes:
for my $cat (sort keys %HoH) {
# numerical sort:
for my $digits (sort { $a <=> $b } keys %{$HoH{$cat}}) {
print join("\t", $cat, $digits, $HoH{$cat}{$digits}) . "\n";
}
}

sorting hash values in perl [duplicate]

I have browsed the web quite a bit for a solution, but I couldn't find anything that meets my needs.
I have a large list of words with values attached to each word
Example:
my %list = (
word => 10,
xword => 15,
yword => 1
);
The list goes on and on, but I want to be able to return the top 5 hash elements with the highest corresponding values
use strict;
use warnings;
sub topN {
my ($N, %list) = (shift, @_);
$N = keys %list if $N > keys %list;
return (sort { $list{$b} <=> $list{$a} } keys %list)[0..$N-1];
}
my %list = ( word => 10, xword => 15, yword => 1, zword => 4);
print join (",", topN(5, %list)), "\n";
Output:
xword,word,zword,yword
This does what you need. Note that it will throw Use of uninitialized value warnings if your hash has fewer than five elements and you may have to add code to cater for that. It is also inefficient in that it sorts the entire hash rather than finding only the top five values. Whether or not that is an issue depends on your circumstance.
use strict;
use warnings;
my %list = (
word => 10,
xword => 15,
yword => 1,
);
my @top5 = (sort { $list{$b} <=> $list{$a} } keys %list)[0..4];
print "$_\n" for @top5;
output
xword
word
yword
use strict;
use warnings;
my %list = (
word => 10,
xword => 15,
yword => 1,
);
my @top5 = sort { $list{$b} <=> $list{$a} } keys %list;
splice(@top5, 5) if @top5 > 5;
print "$_\n" for @top5;
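If sorting the whole hash ever becomes a real cost, one alternative is to keep only the best N keys seen so far while walking the hash once. The following is a minimal sketch of that idea, not a tuned implementation; the name top_n_keys is hypothetical, and it takes a hash reference rather than a flattened hash.

```perl
use strict;
use warnings;

# Hypothetical helper: return up to $n keys of %$h, highest value first,
# without sorting every key. Work per key is O($n), so it pays off only
# when $n is small compared with the size of the hash.
sub top_n_keys {
    my ($n, $h) = @_;
    my @top;                                  # at most $n keys, best first
    for my $k (keys %$h) {
        my $i = 0;
        $i++ while $i < @top && $h->{ $top[$i] } >= $h->{$k};
        next if $i >= $n;                     # not good enough to keep
        splice @top, $i, 0, $k;               # insert at its ranked position
        pop @top if @top > $n;                # drop the weakest if over size
    }
    return @top;
}

my %list = (word => 10, xword => 15, yword => 1, zword => 4);
print join(",", top_n_keys(2, \%list)), "\n";   # xword,word
```

It also returns fewer than N keys without warnings when the hash is small.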

sum hash of hash values using perl

I have a Perl script that parses an Excel file and counts, for each value in column A, how many times each value appears next to it in column B. The script looks like this:
use strict;
use warnings;
use Spreadsheet::XLSX;
use Data::Dumper;
use List::Util qw( sum );
my $col1 = 0;
my %hash;
my $excel = Spreadsheet::XLSX->new('inout_chartdata_ronald.xlsx');
my $sheet = ${ $excel->{Worksheet} }[0];
$sheet->{MaxRow} ||= $sheet->{MinRow};
my $count = 0;
# Iterate through each row
foreach my $row ( $sheet->{MinRow}+1 .. $sheet->{MaxRow} ) {
# The cell in column 1
my $cell = $sheet->{Cells}[$row][$col1];
if ($cell) {
# The adjacent cell in column 2
my $adjacentCell = $sheet->{Cells}[$row][ $col1 + 1 ];
# Use a hash of hashes
$hash{ $cell->{Val} }{ $adjacentCell->{Val} }++;
}
}
print "\n", Dumper \%hash;
The output looks like this :
$VAR1 = {
'13' => {
'klm' => 1,
'hij' => 2,
'lkm' => 4,
},
'12' => {
'abc' => 2,
'efg' => 2
}
};
This works great. My question is: how can I access the elements of this output $VAR1 so that, for value 13, klm + hij = 3, and get a final output like this:
$VAR1 = {
'13' => {
'somename' => 3,
'lkm' => 4,
},
'12' => {
'abc' => 2,
'efg' => 2
}
};
So basically what I want to do is loop through my final hash of hashes and access its specific elements based on a unique key and finally do their sum.
Any help would be appreciated.
Thanks
I used @do_sum to indicate which keys you want to combine. The new key is hardcoded in the script. Note that the new key is not created if none of the listed keys exists in the subhash (that is what the $found flag is for).
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
my %hash = (
'13' => {
'klm' => 1,
'hij' => 2,
'lkm' => 4,
},
'12' => {
'abc' => 2,
'efg' => 2
}
);
my @do_sum = qw(klm hij);
for my $num (keys %hash) {
my $found;
my $sum = 0;
for my $key (@do_sum) {
next unless exists $hash{$num}{$key};
$sum += $hash{$num}{$key};
delete $hash{$num}{$key};
$found = 1;
}
$hash{$num}{somename} = $sum if $found;
}
print Dumper \%hash;
It sounds like you need to learn about Perl References, and maybe Perl Objects which are just a nice way to deal with references.
As you know, Perl has three basic data-structures:
Scalars ($foo)
Arrays (@foo)
Hashes (%foo)
The problem is that these data structures can only contain scalar data. That is, each element in an array can hold a single value or each key in a hash can hold a single value.
In your case %hash is a Hash where each entry in the hash references another hash. For example:
Your %hash has an entry in it with a key of 13. This doesn't contain a plain scalar value, but a reference to another hash with three keys in it: klm, hij, and lkm. You can reach those values via this syntax:
${ $hash{13} }{klm} = 1
${ $hash{13} }{hij} = 2
${ $hash{13} }{lkm} = 4
The inner curly braces are needed here: without them, $$hash{13}{klm} would dereference a scalar named $hash rather than the element $hash{13}. Likewise, %{ $hash{13} } dereferences the hash reference stored in $hash{13}, so I can now get at the keys of that hash. You can imagine this getting more complex as you talk about hashes of hashes of arrays of hashes of arrays. Fortunately, Perl includes an easier syntax:
$hash{13}->{klm} = 1
$hash{13}->{hij} = 2
$hash{13}->{lkm} = 4
Read up about hashes and how to manipulate them. After you get comfortable with this, you can start working on learning about Object Oriented Perl which handles references in a safer manner.
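To make the reference syntax concrete, here is a small sketch using the data from the question. The helper inner_total is a made-up name for illustration; it simply sums the values of one inner hash.

```perl
use strict;
use warnings;

my %hash = (
    13 => { klm => 1, hij => 2, lkm => 4 },
    12 => { abc => 2, efg => 2 },
);

# Hypothetical helper: sum every value in the inner hash under $outer_key.
sub inner_total {
    my ($hoh, $outer_key) = @_;
    my $inner = $hoh->{$outer_key};     # a reference to the inner hash
    my $total = 0;
    $total += $_ for values %$inner;    # %$inner dereferences it
    return $total;
}

print $hash{13}{klm}, "\n";             # 1 (arrow may be omitted between subscripts)
print $hash{13}->{hij}, "\n";           # 2 (same kind of access, arrow written out)
print inner_total(\%hash, 13), "\n";    # 7
```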

Traversing a hash in order

I would like to traverse a hash in order, one entry at a time, not in random order. Any ideas? For example, I have a hash something like this:
our %HASH = (
'rajesh:1700' => 'Bangalore',
'rajesh:1730' => 'Delhi',
'rajesh:1770' => 'Ranchi',
'rajesh:1780' => 'Mumbai',
'rajesh:1800' => 'MYCITY',
'rajesh:1810' => 'XCF',
);
and it should print in the same order. I tried the following, but it failed. Any ideas?
while ( my $gPort = each %HASH)
{
print "$gPort\n";
}
for my $gPort ( keys %HASH )
{
print "$gPort\n";
}
Given the keys in your question, Perl's default string sort on the keys will give your desired output.
for my $gPort (sort keys %HASH) {
print "$gPort => $HASH{$gPort}\n";
}
Note: the code above assumes all numbers in keys will occur at the same position and have the same length. For instance, a rajesh:001775 key will come out first rather than between 1770 and 1780.
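If the numeric parts might differ in width, one way to relax that assumption is to compare on the number after the colon instead. This is a minimal sketch that assumes every key has the form name:digits; the helper name sort_by_port is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: order keys by the number after the last colon,
# falling back to a string comparison when the numbers are equal.
# Assumes every key ends in ":<digits>".
sub sort_by_port {
    return sort {
        my ($na) = $a =~ /:(\d+)$/;
        my ($nb) = $b =~ /:(\d+)$/;
        $na <=> $nb or $a cmp $b
    } @_;
}

my @keys = ('rajesh:1780', 'rajesh:001775', 'rajesh:1770');
print "$_\n" for sort_by_port(@keys);
# rajesh:1770
# rajesh:001775
# rajesh:1780
```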
You could also sort and print the hash ordered by value rather than by key. The values here are strings, so compare them with cmp rather than <=>.
for my $gPort (sort { $HASH{$a} cmp $HASH{$b} } keys %HASH) {
print "$gPort => $HASH{$gPort}\n";
}
Take a look at Data::Dumper. In particular, if you set $Data::Dumper::Sortkeys, then you would get the dump in sorted order.
As an example:
use Data::Dumper;
$Data::Dumper::Sortkeys = 1;
my %some_hash;
# code to populate hash
[ . . . ]
print Dumper(\%some_hash);
Of course, this would work only if you want to plainly dump the hash. If you want the printing to be done in some other format, you would want to just sort the keys and print, like
foreach my $key (sort keys %some_hash) {
print "[KEY]: $key; [VAL]: $some_hash{$key}\n";
}
If you wish to preserve the insertion order of the elements in your hash, then Tie::IxHash may be the tool for you. Its usage is very simple.
A simple example:
use Tie::IxHash;
tie my %days_in => 'Tie::IxHash',
January => 31,
February => 28,
March => 31,
April => 30,
May => 31,
June => 30,
July => 31,
August => 31,
September => 30,
October => 31,
November => 30,
December => 31;
print join(" ", keys %days_in), "\n";
# prints: January February March April May June July August
# September October November December

How do I sort a hash array by value in perl?

I have a program that finds all the files in a directory and creates a hash array of their names and sizes.
example
%files = ("file1" => 10, "file2" => 30, "file3" => 5);
I want to be able to sort the files by size descending and add the names/values to a new array.
example
%filesSorted = ("file2" => 30, "file1" => 10, "file3" => 5);
I have found many ways to sort the array by value and then print the values but that's not what I want.
You must put the names of the files into an array in sorted order. Unlike Perl hashes, arrays are ordered and will retain their order. This code demonstrates the point using your own data
use strict;
use warnings;
my %files = (file1 => 10, file2 => 30, file3 => 5);
my @sorted = sort { $files{$b} <=> $files{$a} } keys %files;
foreach my $file (@sorted) {
print "$file => $files{$file}\n";
}
OUTPUT
file2 => 30
file1 => 10
file3 => 5
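If you want the names and sizes to travel together in the new array, one option is an array of [name, size] pairs. This is a sketch of that variation rather than part of the answer above, and sorted_pairs is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: return a list of [name, size] pairs, largest size first.
sub sorted_pairs {
    my ($files) = @_;
    return map  { [ $_, $files->{$_} ] }
           sort { $files->{$b} <=> $files->{$a} }
           keys %$files;
}

my %files = (file1 => 10, file2 => 30, file3 => 5);
for my $pair (sorted_pairs(\%files)) {
    my ($name, $size) = @$pair;
    print "$name => $size\n";
}
# file2 => 30
# file1 => 10
# file3 => 5
```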