sorting hash values in perl [duplicate] - perl

Closed 10 years ago.
Possible Duplicate:
Perl sorting hash by values in the hash
I have browsed the web quite a bit for a solution, but I couldn't find anything that meets my needs.
I have a large list of words with values attached to each word.
Example:
my %list = (
word => 10,
xword => 15,
yword => 1
);
The list goes on and on, but I want to be able to return the top 5 hash elements with the highest corresponding values.

use strict;
use warnings;
sub topN {
my ($N, %list) = (shift, @_);
$N = keys %list if $N > keys %list;
return (sort { $list{$b} <=> $list{$a} } keys %list)[0..$N-1];
}
my %list = ( word => 10, xword => 15, yword => 1, zword => 4);
print join (",", topN(5, %list)), "\n";
Output:
xword,word,zword,yword

This does what you need. Note that it will emit "Use of uninitialized value" warnings if your hash has fewer than five elements, because the slice [0..4] then returns undef for the missing positions, and you may have to add code to cater for that. It is also inefficient in that it sorts the entire hash rather than finding only the top five values. Whether or not that is an issue depends on your circumstances.
use strict;
use warnings;
my %list = (
word => 10,
xword => 15,
yword => 1,
);
my @top5 = (sort { $list{$b} <=> $list{$a} } keys %list)[0..4];
print "$_\n" for @top5;
output
xword
word
yword
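The answer above points out that sorting the entire hash is wasteful when only the top five are needed. A minimal sketch of one alternative, assuming the values are numeric: walk the keys once and keep a small buffer of the best candidates, sorted by descending value. The sub name top_n is hypothetical, and this is only one possible selection strategy (a heap would be another).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return up to $n keys of %$h with the largest values,
# in descending order of value, without sorting the whole hash.
sub top_n {
    my ($n, $h) = @_;
    return () if $n <= 0;
    my @top;    # candidate keys, kept in descending order of value
    for my $key (keys %$h) {
        # Buffer is full: skip keys that do not beat its smallest member
        next if @top == $n && $h->{$key} <= $h->{ $top[-1] };
        # Find the insertion point that keeps @top in descending order
        my $i = 0;
        $i++ while $i < @top && $h->{ $top[$i] } >= $h->{$key};
        splice @top, $i, 0, $key;
        pop @top if @top > $n;    # drop the smallest once the buffer overflows
    }
    return @top;
}

my %list = (word => 10, xword => 15, yword => 1, zword => 4);
my @top2 = top_n(2, \%list);
my @top5 = top_n(5, \%list);    # only four keys exist: returns all four, no undefs
print join(",", @top2), "\n";
print join(",", @top5), "\n";
```

Each insertion scans at most N buffered keys, so this only pays off when N is small compared with the size of the hash; for modest hashes the one-line sort is simpler.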

use strict;
use warnings;
my %list = (
word => 10,
xword => 15,
yword => 1,
);
my @top5 = sort { $list{$b} <=> $list{$a} } keys %list;
splice(@top5, 5) if @top5 > 5;
print "$_\n" for @top5;

Related

Minimum and maximum values for Perl hash of hashes

This is a variation of another question asked on PerlMonks and is similar to the problem I'm trying to figure out. I have the following hash of hashes:
%Year = (
2007 => {
ID1 => 07,
ID4 => 34,
ID2 => 24,
ID9 => 14,
ID3 => 05,
},
2008 => {
ID7 => 11,
ID9 => 64,
ID10 => 20,
ID5 => 13,
ID8 => 22,
}
);
I would like to find the two smallest and two largest values together with their corresponding IDs for each year. Can this be done using List::Util qw(min max)?
Desired results:
2007 - max1:ID4,34 max2:ID2,24 min1:ID3,05 min2:ID1,07
2008 - max1:ID9,64 max2:ID10,20 min1:ID7,11 min2:ID5,13
Unless the lists are huge, it is probably best to find the two largest and two smallest hash values just by sorting the entire hash and picking the first two and last two elements.
You seem to have incorrect expectations for your output. For 2008 the hash data sorted by value looks like this:
ID7 => 11
ID5 => 13
ID10 => 20
ID8 => 22
ID9 => 64
so max1 and max2 are ID9 and ID8, while min1 and min2 are ID7 and ID5. But your question says that you expect max2 to be ID10, whose value is 20, right in the middle of the sorted range. I think max2 should be ID8, which has a value of 22, the second largest value in the 2008 hash.
I suggest this solution to produce the output that I think you want:
use strict;
use warnings;
use 5.010;
my %year = (
2007 => { ID1 => 7, ID2 => 24, ID3 => 5, ID4 => 34, ID9 => 14 },
2008 => { ID10 => 20, ID5 => 13, ID7 => 11, ID8 => 22, ID9 => 64 },
);
for my $year (sort { $a <=> $b } keys %year) {
my $data = $year{$year};
my @sorted_keys = sort { $data->{$a} <=> $data->{$b} } keys %$data;
printf "%4d - max1:%s,%02d max2:%s,%02d min1:%s,%02d min2:%s,%02d\n",
$year, map { $_ => $data->{$_} } @sorted_keys[-1,-2,0,1];
}
output
2007 - max1:ID4,34 max2:ID2,24 min1:ID3,05 min2:ID1,07
2008 - max1:ID9,64 max2:ID8,22 min1:ID7,11 min2:ID5,13
TIMTOWTDI: you've mentioned a hash of hashes, so you can sort each inner hash by value and take a slice (the first two and last two elements).
#!/usr/bin/perl
use strict;
use warnings;
my %Year = (
2007 => { ID1 => 7, ID2 => 24, ID3 => 5, ID4 => 34, ID9 => 14 },
2008 => { ID10 => 20, ID5 => 13, ID7 => 11, ID8 => 22, ID9 => 64 },
);
for my $year (sort keys %Year) {
printf "%4d - max1:%s,%02d max2:%s,%02d min1:%s,%02d min2:%s,%02d\n",
$year,
map { $_, $Year{$year}{$_} }
( sort { $Year{$year}{$b} <=> $Year{$year}{$a} } keys %{$Year{$year}} )[0,1,-1,-2];
}
Output:
2007 - max1:ID4,34 max2:ID2,24 min1:ID3,05 min2:ID1,07
2008 - max1:ID9,64 max2:ID8,22 min1:ID7,11 min2:ID5,13
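On the List::Util question itself: min and max return only values, not the keys they belong to, but List::Util's reduce can be run over the keys and compare their values. A minimal sketch that finds only the single largest and smallest entry per year; for the top two and bottom two, sorting as above remains simpler. The %extremes structure is just an illustrative name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(reduce);

my %Year = (
    2007 => { ID1 => 7,  ID2 => 24, ID3 => 5,  ID4 => 34, ID9 => 14 },
    2008 => { ID10 => 20, ID5 => 13, ID7 => 11, ID8 => 22, ID9 => 64 },
);

my %extremes;    # year => [ key of max value, key of min value ]
for my $year (sort keys %Year) {
    my $h = $Year{$year};
    # reduce walks the keys once, keeping whichever key has the larger (or smaller) value
    my $max_key = reduce { $h->{$a} >= $h->{$b} ? $a : $b } keys %$h;
    my $min_key = reduce { $h->{$a} <= $h->{$b} ? $a : $b } keys %$h;
    $extremes{$year} = [ $max_key, $min_key ];
    printf "%s - max:%s,%02d min:%s,%02d\n",
        $year, $max_key, $h->{$max_key}, $min_key, $h->{$min_key};
}
```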
You have hashes, and List::Util works on lists/arrays. That disqualifies it right there, since its min and max return only the values, and both the keys and the data are still important to you.
It's possible to create a second hash that's keyed by the data, then I could use something from List::Util or List::MoreUtils on that to pull up the data you want, and then look up the keys for that data. However, that's a lot of work just to get the information you want.
In reality, you're not sorting the hash of hashes, but just the data in each year. This makes the job a lot easier.
Normally, when you sort a hash, you're sorting on the keys. However, you can give sort a block to change the way Perl compares items. Perl hands the block two items, $a and $b, which represent two of the keys being sorted. The block returns a negative number, zero, or a positive number depending on whether $a should come before, alongside, or after $b. Perl gives you <=> for numeric comparison and cmp for string comparison.
All I have to do is specify sort { $array{$a} cmp $array{$b} } keys %array to sort by the data and not the keys. I simply toss the sorted keys into another array, then use index positioning to pull out the data I want.
#! /usr/bin/env perl
use warnings;
use strict;
use autodie;
use feature qw(say);
use Data::Dumper;
my %year;
#
# Data
#
$year{2007}->{ID1} = "07";
$year{2007}->{ID2} = "24";
$year{2007}->{ID3} = "05";
$year{2007}->{ID4} = "34";
$year{2007}->{ID9} = "14";
$year{2008}->{ID7} = "11";
$year{2008}->{ID9} = "64";
$year{2008}->{ID10} = "20";
$year{2008}->{ID5} = "13";
$year{2008}->{ID8} = "22";
#
# For Each Year...
#
for my $year ( sort keys %year ) {
print "$year - ";
#
# No need to do this dereferencing, but it makes the rest of the code cleaner
#
my %id_hash = %{ $year{$year} };
#
# Now I sort my IDs by their data and not the key names
#
my @keys = sort { $id_hash{$a} cmp $id_hash{$b} } keys %id_hash;
#
# And print them out
#
print "max1:$keys[-1],$id_hash{$keys[-1]} ";
print "max2:$keys[-2],$id_hash{$keys[-2]} ";
print "min1:$keys[0],$id_hash{$keys[0]} ";
print "min2:$keys[1],$id_hash{$keys[1]}\n";
}
The output is:
2007 - max1:ID4,34 max2:ID2,24 min1:ID3,05 min2:ID1,07
2008 - max1:ID9,64 max2:ID8,22 min1:ID7,11 min2:ID5,13
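The answer above sorts with cmp, which gives the right order here only because every value is stored as a two-digit, zero-padded string such as "07". With unpadded numbers, a string comparison and a numeric comparison disagree. A small illustration with made-up values:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %size = (a => 9, b => 10, c => 100);

# String comparison: "10" and "100" both sort before "9"
my @by_cmp = sort { $size{$a} cmp $size{$b} } keys %size;
# Numeric comparison: 9 < 10 < 100
my @by_num = sort { $size{$a} <=> $size{$b} } keys %size;

print "cmp: @by_cmp\n";    # cmp: b c a
print "<=>: @by_num\n";    # <=>: a b c
```

So for numeric data, <=> is the safer default.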

sum hash of hash values using perl

I have a Perl script that parses an Excel file and does the following: for each value in column A, it counts the number of elements it has in column B. The script looks like this:
use strict;
use warnings;
use Spreadsheet::XLSX;
use Data::Dumper;
use List::Util qw( sum );
my $col1 = 0;
my %hash;
my $excel = Spreadsheet::XLSX->new('inout_chartdata_ronald.xlsx');
my $sheet = ${ $excel->{Worksheet} }[0];
$sheet->{MaxRow} ||= $sheet->{MinRow};
my $count = 0;
# Iterate through each row
foreach my $row ( $sheet->{MinRow}+1 .. $sheet->{MaxRow} ) {
# The cell in column 1
my $cell = $sheet->{Cells}[$row][$col1];
if ($cell) {
# The adjacent cell in column 2
my $adjacentCell = $sheet->{Cells}[$row][ $col1 + 1 ];
# Use a hash of hashes
$hash{ $cell->{Val} }{ $adjacentCell->{Val} }++;
}
}
print "\n", Dumper \%hash;
The output looks like this:
$VAR1 = {
'13' => {
'klm' => 1,
'hij' => 2,
'lkm' => 4,
},
'12' => {
'abc' => 2,
'efg' => 2
}
};
This works great. My question is: how can I access the elements of this output $VAR1 in order to compute, for value 13, klm + hij = 3, and get a final output like this:
$VAR1 = {
'13' => {
'somename' => 3,
'lkm' => 4,
},
'12' => {
'abc' => 2,
'efg' => 2
}
};
So basically what I want to do is loop through my final hash of hashes and access its specific elements based on a unique key and finally do their sum.
Any help would be appreciated.
Thanks
I used @do_sum to list the keys whose values you want to sum. The new key is hardcoded in the script. Note that the new key is not created if none of the keys in @do_sum exists in the subhash (that is what the $found flag tracks).
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
my %hash = (
'13' => {
'klm' => 1,
'hij' => 2,
'lkm' => 4,
},
'12' => {
'abc' => 2,
'efg' => 2
}
);
my @do_sum = qw(klm hij);
for my $num (keys %hash) {
my $found;
my $sum = 0;
for my $key (@do_sum) {
next unless exists $hash{$num}{$key};
$sum += $hash{$num}{$key};
delete $hash{$num}{$key};
$found = 1;
}
$hash{$num}{somename} = $sum if $found;
}
print Dumper \%hash;
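Since the question's script already imports sum from List::Util, the same idea can be written a little more compactly. A sketch, relying on the fact that delete on a hash slice removes the keys and returns their values (undef for keys that were absent); "somename" is the placeholder key from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum);
use Data::Dumper;

my %hash = (
    13 => { klm => 1, hij => 2, lkm => 4 },
    12 => { abc => 2, efg => 2 },
);
my @do_sum = qw(klm hij);

for my $num (keys %hash) {
    # Remove the listed keys and keep only the values that actually existed
    my @vals = grep { defined } delete @{ $hash{$num} }{@do_sum};
    # Only create the new key when at least one listed key was present
    $hash{$num}{somename} = sum(@vals) if @vals;
}
print Dumper \%hash;
```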
It sounds like you need to learn about Perl References, and maybe Perl Objects which are just a nice way to deal with references.
As you know, Perl has three basic data structures:
Scalars ($foo)
Arrays (@foo)
Hashes (%foo)
The problem is that these data structures can only contain scalar data. That is, each element in an array can hold a single value or each key in a hash can hold a single value.
In your case %hash is a Hash where each entry in the hash references another hash. For example:
Your %hash has an entry in it with a key of 13. This doesn't contain a scalar value, but a reference to another hash with three keys in it: klm, hij, and lkm. You can reference this via this syntax:
${ $hash{13} }{klm} = 1
${ $hash{13} }{hij} = 2
${ $hash{13} }{lkm} = 4
The curly braces around $hash{13} are required here, because it is a hash element rather than a simple scalar variable. %{ $hash{13} } refers to the whole hash contained in $hash{13}, so I can now reference the keys of that hash. You can imagine this getting more complex as you talk about hashes of hashes of arrays of hashes of arrays. Fortunately, Perl includes an easier syntax:
$hash{13}->{klm} = 1
$hash{13}->{hij} = 2
$hash{13}->{lkm} = 4
Read up about hashes and how to manipulate them. After you get comfortable with this, you can start working on learning about Object Oriented Perl which handles references in a safer manner.
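A short sketch comparing the access forms described above on the question's data. It also uses the fact that the arrow between two subscripts is optional, so $hash{13}{klm} means the same as $hash{13}->{klm}.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %hash = (13 => { klm => 1, hij => 2, lkm => 4 });

my $v1 = ${ $hash{13} }{klm};    # explicit block dereference
my $v2 = $hash{13}->{klm};       # arrow notation
my $v3 = $hash{13}{klm};         # arrow omitted between subscripts

# Dereference the whole inner hash to loop over its keys
for my $key (sort keys %{ $hash{13} }) {
    print "$key => $hash{13}{$key}\n";
}
print "all equal\n" if $v1 == $v2 && $v2 == $v3;
```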

How do I sort a hash array by value in perl?

I have a program that finds all the files in a directory and creates a hash array of their names and sizes.
example
%files = ("file1" => 10, "file2" => 30, "file3" => 5);
I want to be able to sort the files by size descending and add the names/values to a new array.
example
%filesSorted = ("file2" => 30, "file1" => 10, "file3" => 5);
I have found many ways to sort the array by value and then print the values but that's not what I want.
You must put the names of the files into an array in sorted order. Unlike Perl hashes, arrays are ordered and will retain their order. This code demonstrates the point using your own data:
use strict;
use warnings;
my %files = (file1 => 10, file2 => 30, file3 => 5);
my @sorted = sort { $files{$b} <=> $files{$a} } keys %files;
foreach my $file (@sorted) {
print "$file => $files{$file}\n";
}
OUTPUT
file2 => 30
file1 => 10
file3 => 5
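The question also asks to keep the names together with their sizes in a new, ordered structure. Since a hash has no order, one option is an array of [name, size] pairs. A sketch; the tie-break on the file name is an addition so that equal sizes come out in a reproducible order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %files = (file1 => 10, file2 => 30, file3 => 5);

# Largest first; equal sizes fall back to alphabetical order of the name
my @sorted_pairs =
    map  { [ $_, $files{$_} ] }
    sort { $files{$b} <=> $files{$a} or $a cmp $b } keys %files;

for my $pair (@sorted_pairs) {
    my ($name, $size) = @$pair;
    print "$name => $size\n";
}
```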

How to shuffle the values in a hash?

I have a hash of string IDs. What is the best way to shuffle the IDs?
As an example, my hash assigns the following IDs:
this => 0
is => 1
a => 2
test => 3
Now I'd like to randomly shuffle that. An example outcome would be:
this => 1
is => 0
a => 3
test => 2
You could use the shuffle function from List::Util to help out:
use List::Util qw(shuffle);
...
my @values = shuffle(values %hash);
$hash{$_} = shift(@values) for keys %hash;
A hash slice would be the clearest way to me:
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw/shuffle/;
use Data::Dumper;
my %h = (
this => 0,
is => 1,
a => 2,
test => 3,
);
@h{keys %h} = shuffle values %h;
print Dumper \%h;
This has a drawback in that huge hashes would take up a lot of memory as you pull all of their keys and values out. A more efficient (from a memory standpoint) solution would be:
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw/shuffle/;
use Data::Dumper;
my %h = (
this => 0,
is => 1,
a => 2,
test => 3,
);
{ # bare block so that @keys is freed when the block ends
my @keys = shuffle keys %h;
while (defined(my $k1 = each %h)) {
my $k2 = shift @keys;
@h{$k1, $k2} = @h{$k2, $k1};
}
}
print Dumper \%h;
This code has the benefit of only having to duplicate the keys (rather than the keys and values).
The following code doesn't truly randomize the values: it only swaps the values of keys that are adjacent in Perl's internal hash order, which is randomized only on some Perl versions (5.8.1, and 5.18 onwards). It does mix up the order, though, and it has the benefit of working in place without much extra memory usage:
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw/shuffle/;
use Data::Dumper;
my %h = (
this => 0,
is => 1,
a => 2,
test => 3,
);
my $k1 = each %h;
while (defined(my $k2 = each %h)) {
@h{$k1, $k2} = @h{$k2, $k1};
last unless defined($k1 = each %h);
}
print Dumper \%h;
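Whichever variant you pick, the shuffled hash should keep exactly the same keys and the same set of values, only reassigned. A small sanity-check sketch using the hash-slice version:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(shuffle);

my %h = (this => 0, is => 1, a => 2, test => 3);
my @before = sort { $a <=> $b } values %h;

# keys and values return matching orders, so the slice reassigns a permutation
@h{ keys %h } = shuffle values %h;

my @after = sort { $a <=> $b } values %h;
print "values preserved\n" if "@before" eq "@after";
print "$_ => $h{$_}\n" for sort keys %h;
```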

Perl map - need to map an array into a hash as arrayelement->array_index

I have a array like this:
my @arr = ("Field3","Field1","Field2","Field5","Field4");
Now I use map like below, where /DOSOMETHING/ is the answer I am seeking:
my %hash = map { $_ => /DOSOMETHING/ } @arr;
Now I require the hash to look like below:
Field3 => 0
Field1 => 1
Field2 => 2
Field5 => 3
Field4 => 4
Any help?
%hash = map { $arr[$_] => $_ } 0..$#arr;
print Dumper(\%hash)
$VAR1 = {
'Field4' => 4,
'Field2' => 2,
'Field5' => 3,
'Field1' => 1,
'Field3' => 0
};
my %hash;
@hash{@arr} = 0..$#arr;
In Perl 5.12 and later you can use each on an array to iterate over its index/value pairs:
use 5.012;
my %hash;
while (my ($index, $value) = each @arr) {
$hash{$value} = $index;
}
Here's one more way I can think of to accomplish this:
sub get_bumper {
my $i = 0;
sub { $i++ };
}
my $bump = get_bumper; # $bump is a closure with its very own counter
my %hash = map { $_ => $bump->() } @arr;
As with many things that you can do in Perl: Don't do this. :) If the sequence of values you need to assign is more complex (e.g. 0, 1, 4, 9, 16... or a sequence of random numbers, or numbers read from a pipe), it's easy to adapt this approach to it, but it's generally even easier to just use unbeli's approach. The only advantage of this method is that it gives you a nice clean way to provide and consume arbitrary lazy sequences of numbers: a function that needs a caller-specified sequence of numbers can just take a coderef as a parameter, and call it repeatedly to get the numbers.
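A sketch of the coderef idea from the last sentence: a function takes a number generator as a parameter and calls it once per item. Both label_with and squares are hypothetical names; the generator here produces 0, 1, 4, 9, 16, one of the sequences mentioned above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pair each item with the next number from $next_num
sub label_with {
    my ($next_num, @items) = @_;
    return map { $_ => $next_num->() } @items;
}

# Hypothetical generator: a closure that yields 0, 1, 4, 9, ... on successive calls
sub squares {
    my $n = 0;
    return sub { my $sq = $n * $n; $n++; return $sq };
}

my @arr  = ("Field3", "Field1", "Field2", "Field5", "Field4");
my %hash = label_with(squares(), @arr);
print "$_ => $hash{$_}\n" for @arr;
```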
A very old question, but I had the same problem and this is my solution:
use feature ':5.10';
my @arr = ("Field3","Field1","Field2","Field5","Field4");
my %hash = map { state $i = 0; $_ => $i++ } @arr;
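One caveat with the state approach: a state variable is initialised only once, so if the map runs more than once (for example inside a sub that is called repeatedly), the numbering continues where it stopped instead of restarting at 0. A small sketch of the pitfall next to the hash-slice form; both sub names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'state';

# state $i is initialised once for this sub, so the counter carries over between calls
sub index_by_state { return map { state $i = 0; $_ => $i++ } @_ }

# The slice form recomputes the indices on every call
sub index_by_slice { my %h; @h{@_} = 0 .. $#_; return %h }

my %first  = index_by_state(qw(a b));
my %second = index_by_state(qw(c d));    # continues from 2, not 0
my %third  = index_by_slice(qw(c d));    # starts at 0 again

print "state: c => $second{c}, d => $second{d}\n";
print "slice: c => $third{c}, d => $third{d}\n";
```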