Weighted sort in Perl?

I have a hash of hashes where the values are all numerical. I can sort fine using sort and chaining the comparisons with or, ordering by the hash values first to last. But what if I want to weight the results instead of just ordering them by the keys in the order specified? Is there a way to do that?
EDIT: OK, here's the code:
my @check_order = ("disk_usage", "num_dbs", "qps_avg");
my %weights = ( disk_usage => .7,
num_dbs => .4,
qps_avg => .2
);
my @dbs = sort {
($stats{$a}->{$check_order[0]}*$weights{$check_order[0]}) <=>
($stats{$b}->{$check_order[0]}*$weights{$check_order[0]}) or
($stats{$a}->{$check_order[1]}*$weights{$check_order[1]}) <=>
($stats{$b}->{$check_order[1]}*$weights{$check_order[1]}) or
($stats{$a}->{$check_order[2]}*$weights{$check_order[2]}) <=>
($stats{$b}->{$check_order[2]}*$weights{$check_order[2]})
} keys(%stats);

You want to sort the list by a value computed from each element, so call a function in your sort block:
@sorted = sort { sort_function($a) <=> sort_function($b) } @unsorted;   # elements are hashrefs such as $stats{...}
sub sort_function {
    my ($input) = @_;
    # Either hard-code the weights:
    #   return $input->{disk_usage} * 0.7
    #        + $input->{num_dbs}    * 0.4
    #        + $input->{qps_avg}    * 0.2;
    # -or- more generally, take them from %weights:
    my $value = 0;
    for my $key (keys %weights) {
        $value += $input->{$key} * $weights{$key};
    }
    return $value;
}
When your sort function is expensive and there are many items to be sorted, a Schwartzian transform can improve the performance of your sort:
@sorted = map  { $_->[0] }
          sort { $a->[1] <=> $b->[1] }
          map  { [ $_, sort_function($_) ] }
          @unsorted;
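To tie the pieces together, here is a self-contained sketch that ranks the keys of a %stats hash of hashes by weighted score using the Schwartzian transform. The server names and numbers in %stats are made up for illustration; the weights are the ones from the question, and weighted_score is a hypothetical name for sort_function adapted to take one inner hash.

```perl
use strict;
use warnings;

# Hypothetical sample data: one inner hash of metrics per database server.
my %stats = (
    db1 => { disk_usage => 50, num_dbs => 10, qps_avg => 100 },
    db2 => { disk_usage => 80, num_dbs =>  2, qps_avg =>  10 },
    db3 => { disk_usage => 20, num_dbs => 30, qps_avg =>  50 },
);

# Weights from the question.
my %weights = ( disk_usage => 0.7, num_dbs => 0.4, qps_avg => 0.2 );

# Weighted sum of one inner hash's metrics.
sub weighted_score {
    my ($input) = @_;
    my $value = 0;
    for my $key (keys %weights) {
        $value += $input->{$key} * $weights{$key};
    }
    return $value;
}

# Schwartzian transform: compute each score once, sort on it, keep the key.
my @dbs = map  { $_->[0] }
          sort { $a->[1] <=> $b->[1] }
          map  { [ $_, weighted_score($stats{$_}) ] }
          keys %stats;

print "$_\n" for @dbs;
```

Because the comparator only sees precomputed numbers, weighted_score runs once per key rather than once per comparison.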

If your weights are stored in another hash, %property, this will sort the hash keys by the product $hash{$key} * $property{$key}:
#!/usr/bin/perl
use strict;
use warnings;
my %hash = (
a => 51,
b => 61,
c => 71,
);
my %property = ( a => 7, b => 6, c => 5 );
foreach (sort { ($hash{$a}*$property{$a}) <=>
($hash{$b}*$property{$b}) } keys %hash)
{
printf("[%d][%d][%d]\n",
$hash{$_},$property{$_},$hash{$_}*$property{$_});
}

Related

Custom sort method for hashes which will automatically use the appropriate hash

My hash contains binary numbers as keys:
my %h = ("1010" => 1, "1110" => 0, "0001" => 3, "1100" => 2);
In Perl I can use a custom function to sort a hash. This is my function for sorting binary numbers from smallest to largest:
sub sort_binary_numbers {
my $a_dec = oct("0b".$a);
my $b_dec = oct("0b".$b);
return $a_dec <=> $b_dec;
}
I can sort the hash keys using this function as follows:
print Dumper sort sort_binary_numbers keys %h;
And the result will be:
$VAR1 = '0001';
$VAR2 = '1010';
$VAR3 = '1100';
$VAR4 = '1110';
I want to sort the hash by values, not keys. I can do the following:
print Dumper sort { $h{$b} <=> $h{$a} } keys %h;
As you can see, I have to use the hash name in the sorting block. The problem is how to rewrite this sorting block as a function (as in the examples above) that automatically gets the appropriate hash inside the function. I've tried accessing the hash through @_, but nothing was printed, e.g.:
sub sort_by_value {
print Dumper @_; # This was not printed
print ref @_; # This was not printed
return $b <=> $a;
}
And call it this way:
print Dumper sort sort_by_value keys %h;
The interesting part is that when I wrap this sort in another function and call it in a loop from that function, I get the Data::Dumper output that was previously missing (but I still don't get any output from ref):
sub calling_from_function {
my %h = %{$_[0]};
foreach my $key (sort sort_by_value keys %h){
}
}
&calling_from_function(\%h);
Then I get this output:
$VAR1 = {
'0001' => 3,
'1010' => 1,
'1110' => 0,
'1100' => 2
};
$VAR1 = {
'0001' => 3,
'1010' => 1,
'1110' => 0,
'1100' => 2
};
$VAR1 = {
'0001' => 3,
'1010' => 1,
'1110' => 0,
'1100' => 2
};
$VAR1 = {
'0001' => 3,
'1010' => 1,
'1110' => 0,
'1100' => 2
};
Questions:
How can I replace the sorting block in print Dumper sort { $h{$b} <=> $h{$a} } keys %h; with a function and get the appropriate hash inside the sorting function?
Why does wrapping it in another function work?
Why does ref not work?
A sorting subroutine normally (i.e. unless prototypes are involved) doesn't receive its parameters through @_, but through the package variables $a and $b. ref @array never returns anything useful: ref puts its argument in scalar context, so it sees the element count, which is never a reference.
Wrapping it in another function works because the sort subroutine is not given its own @_, so @_ inside it is the one you populated with the wrapper's parameters.
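A quick way to see the ref behaviour: ref takes a single scalar argument, so ref @arr is really ref applied to the element count. The array @arr below is just for illustration.

```perl
use strict;
use warnings;

my %h   = ("1010" => 1, "1110" => 0);
my @arr = (\%h);                 # an array holding one hash reference

# ref gets @arr in scalar context, i.e. the number 1, which is not a reference
my $on_array   = ref @arr;       # ''
# ref on the element itself sees the hash reference
my $on_element = ref $arr[0];    # 'HASH'

print "ref \@arr: '$on_array', ref \$arr[0]: '$on_element'\n";
```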
Use a wrapper to sort any hash:
sub sort_by_value {
my %h = @_;
return sort { $h{$b} <=> $h{$a} } keys %h
}
print Dumper(sort_by_value(%h));
You can also pass a hash reference to the subroutine:
sub sort_by_value {
my ($h) = @_;
return sort { $h->{$b} <=> $h->{$a} } keys %$h
}
print Dumper sort_by_value(\%h);
So you want to have a generic sorting function such as
my $sorter = sub { $_[0]{$b} <=> $_[0]{$a} };
When it comes time to sort, just use
my @sorted_keys = sort { $sorter->(\%h) } keys(%h);
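A related sketch that answers the first question more directly: a function takes the hash reference once and returns a comparator closure, which sort can use through a scalar variable (sort $subref LIST). The factory name by_value_desc is hypothetical. Because the closure reads $a and $b, it has to be compiled in the same package as the sort call (main here).

```perl
use strict;
use warnings;

# Hypothetical factory: capture the hash once, return a comparator.
sub by_value_desc {
    my ($h) = @_;
    return sub { $h->{$b} <=> $h->{$a} };   # $a/$b are the package vars sort sets
}

my %h = ("1010" => 1, "1110" => 0, "0001" => 3, "1100" => 2);

my $cmp    = by_value_desc(\%h);
my @sorted = sort $cmp keys %h;   # sort accepts a code ref held in a scalar

print "@sorted\n";
```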
You can treat the hash as a list, convert it to key/value array-ref pairs, sort on the values (the second element), and pick the keys from the sorted list (roughly a Schwartzian transform in disguise):
use strict;
use warnings;
use List::Util 'pairs';
my %h = ("1010" => 1, "1110" => 0, "0001" => 3, "1100" => 2);
my @k = map $_->[0],
sort { $b->[1] <=> $a->[1] }
pairs %h;
Without additional modules:
my @k = map $_->[0],
sort { $b->[1] <=> $a->[1] }
map [ $_, $h{$_} ],
keys %h;

Sort an array of hashes

I have a reference that has the following data structure when dumped:
$VAR1 = [
{
'0' => 0
},
{
'1' => 1
},
{
'-1' => 2
},
{
'2' => 3
},
];
I am trying to loop over them and eventually sort by key. Here is an example of my code:
use strict;
use warnings;
use Data::Dumper;
my $skew_ref;
push @{$skew_ref}, { 0 => 0, 1 => 1, -1 => 2, 2 => 3, };
my @sorted;
for my $ref ( @{$skew_ref} ) {
while ( my ($k, $v ) = each %{$ref} ) {
print "$k => $v\n";
}
@sorted = sort { %{$b} <=> %{$a} } keys %{$ref};
}
print Dumper(\@sorted);
What am I doing wrong? I want the value for the smallest key, and it is giving me the largest.
The output should just be 2 in this case.
use List::Util qw( min );
my $skews = { 0 => 0, 1 => 1, -1 => 2, 2 => 3 };
my $val = $skews->{ min keys %$skews };
Contrary to what you imply, there cannot be more than one result, since a hash cannot have two elements with the same key.
my @sorted = map $_->[0],
sort { $a->[1] <=> $b->[1] }
map [ $_, keys %$_ ], @arr;
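For the structure shown in the dump (an array of one-pair hashes, rather than the single hash the code builds), here is a self-contained sketch of that Schwartzian transform, followed by picking out the value for the smallest key. @arr is assumed to be the dereferenced array from the question.

```perl
use strict;
use warnings;

# The structure from the question's dump: one key/value pair per hash.
my @arr = ( { 0 => 0 }, { 1 => 1 }, { -1 => 2 }, { 2 => 3 } );

# Pair each hashref with its single key, sort numerically by key, unwrap.
my @sorted = map  { $_->[0] }
             sort { $a->[1] <=> $b->[1] }
             map  { [ $_, keys %$_ ] } @arr;

# The first element now holds the smallest key; fetch its value.
my ($smallest_key) = keys %{ $sorted[0] };
my $val = $sorted[0]{$smallest_key};

print "$smallest_key => $val\n";
```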
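Answering your direct question: you swapped $a and $b in the sort block. Also, $a and $b are the keys themselves (plain strings such as "-1"), not hash references, so compare them directly instead of dereferencing them (under use strict, %{$a} dies because of strict refs):
@sorted = sort { $a <=> $b } keys %{$ref};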

Perl extract range of elements from a hash

If I have a hash:
%hash = ("Dog",1,"Cat",2,"Mouse",3,"Fly",4);
How can I extract the first X elements of this hash? For example, if I want the first 3 elements, %newhash would contain ("Dog",1,"Cat",2,"Mouse",3).
I'm working with large hashes (~8000 keys).
"First X elements of this hash" doesn't mean anything, since a hash has no defined order. Do you mean the first three elements in order by numeric value?
my %hash = ( 'Dog' => 1, 'Cat' => 2, 'Mouse' => 3, 'Fly' => 4 );
my @hashkeys = sort { $hash{$a} <=> $hash{$b} } keys %hash;
splice(@hashkeys, 3);
my %newhash;
@newhash{@hashkeys} = @hash{@hashkeys};
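If X can be larger than the number of keys, splice with an offset past the end of the array warns under use warnings. A sketch of a small helper that caps the count first; the name first_n_by_value is hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper: the $n keys with the smallest values, as a new hash.
sub first_n_by_value {
    my ($hash, $n) = @_;
    my @keys   = sort { $hash->{$a} <=> $hash->{$b} } keys %$hash;
    my $count  = min($n, scalar @keys);       # never ask for more keys than exist
    my @wanted = @keys[0 .. $count - 1];
    my %new;
    @new{@wanted} = @{$hash}{@wanted};        # hash slice copies the pairs
    return %new;
}

my %hash    = ( Dog => 1, Cat => 2, Mouse => 3, Fly => 4 );
my %newhash = first_n_by_value(\%hash, 3);
my %all     = first_n_by_value(\%hash, 10);   # fewer keys than requested

print join(", ", map { "$_=$newhash{$_}" } sort keys %newhash), "\n";
```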
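You might want to use something like this (it takes the first three keys in alphabetical order):
use feature 'say';   # say needs this (or use v5.10 or later)
my %hash = ("Dog",1,"Cat",2,"Mouse",3,"Fly",4);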
for ( (sort keys %hash)[0..2] ) {
say $hash{$_};
}
You should build an array first:
my %hash = ("Dog" => 1, "Cat" => 2, "Mouse" => 3, "Fly" => 4);
my @array;
foreach my $value (sort { $hash{$a} <=> $hash{$b} } keys %hash) {
    push(@array, { $value => $hash{$value} });
}
# get range:
my @part = @array[0..2];
# print part of result ($part[1] is { Cat => 2 }, since Cat has the second-smallest value):
print $part[1]{'Cat'} . "\n";

perl: shuffle value-sorted hash?

First, sorry for my English; I hope you will understand me.
There is a hash:
$hash{a} = 1;
$hash{b} = 3;
$hash{c} = 3;
$hash{d} = 2;
$hash{e} = 1;
$hash{f} = 1;
I want to sort it by values (not keys) so I have:
for my $key ( sort { $hash{ $a } <=> $hash{ $b } } keys %hash ) { ... }
This gives me all the keys with value 1 first, then those with value 2, etc. Great.
But if the hash doesn't change, the order of keys with the same value (in this sort-by-value) is always the same.
Question: How can I shuffle the sort results, so that every time I run the for loop I get a different order of the keys with value 1, value 2, etc.?
I'm not quite sure I understand your needs, but is this OK?
use feature 'say';
use List::Util qw(shuffle);
my %hash;
$hash{a} = 1;
$hash{b} = 3;
$hash{c} = 3;
$hash{d} = 2;
$hash{e} = 1;
$hash{f} = 1;
for my $key (sort { $hash{ $a } <=> $hash{ $b } } shuffle( keys %hash )) {
say "hash{$key} = $hash{$key}"
}
You can simply add another level of sorting, which will be used when the regular sorting method cannot distinguish between two values. E.g.:
sort { METHOD_1 || METHOD_2 || ... METHOD_N } LIST
For example:
sub regular_sort {
my $hash = shift;
for (sort { $hash->{$a} <=> $hash->{$b} } keys %$hash) {
print "$_ ";
};
}
sub random_sort {
my $hash = shift;
my %rand = map { $_ => rand } keys %$hash;
for (sort { $hash->{$a} <=> $hash->{$b} ||
$rand{$a} <=> $rand{$b} } keys %$hash ) {
print "$_ ";
};
}
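The same METHOD_1 || METHOD_2 chaining also works with a deterministic tie-breaker. Here is a sketch that sorts by value and then alphabetically by key, which makes the order of equal values repeatable rather than random:

```perl
use strict;
use warnings;

my %hash = (a => 1, b => 3, c => 3, d => 2, e => 1, f => 1);

# Primary key: numeric value. Secondary key (only consulted on ties): the key string.
my @keys = sort { $hash{$a} <=> $hash{$b} || $a cmp $b } keys %hash;

print "@keys\n";
```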
To sort the keys by value, with random ordering of keys with identical values, I see two solutions:
use List::Util qw( shuffle );
use sort 'stable';
my @keys =
sort { $hash{$a} <=> $hash{$b} }
shuffle keys %hash;
or
my @keys =
map $_->[0],
sort { $a->[1] <=> $b->[1] || $a->[2] <=> $b->[2] }
map [ $_, $hash{$_}, rand ],
keys %hash;
The use sort 'stable'; is required to prevent sort from corrupting the randomness of the list returned by shuffle.
The above's use of the Schwartzian Transform is not an attempt at optimisation. I've seen people use rand in the compare function itself to try to achieve the above result, but doing so is buggy for two reasons.
When using "misbehaving" comparisons such as that, the results are documented as being undefined, so sort is allowed to return garbage, repeated elements, missing elements, etc.
Even if sort doesn't return garbage, it won't be a fair sort: the result will be weighted (biased).
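A self-contained sketch of the second approach above, using the hash from the question. Keys with equal values can come out in a different order on each run, while the values themselves stay ascending, because rand is drawn once per key rather than inside the comparator.

```perl
use strict;
use warnings;

my %hash = (a => 1, b => 3, c => 3, d => 2, e => 1, f => 1);

# Decorate each key with its value and one random number, drawn exactly once.
my @keys = map  { $_->[0] }
           sort { $a->[1] <=> $b->[1] || $a->[2] <=> $b->[2] }
           map  { [ $_, $hash{$_}, rand ] }
           keys %hash;

print join(" ", map { "$_=$hash{$_}" } @keys), "\n";
```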
You can have two functions for ascending and descending order and use them accordingly, like:
sub hashAscending {
$hash{$a} <=> $hash{$b};
}
sub hashDescending {
$hash{$b} <=> $hash{$a};
}
foreach $key (sort hashAscending (keys(%hash))) {
print "\t$hash{$key} \t\t $key\n";
}
foreach $key (sort hashDescending (keys(%hash))) {
print "\t$hash{$key} \t\t $key\n";
}
It seems like you want to randomize the order in which you loop through the keys.
Perl does not store hash keys in sequential or sorted order, but that doesn't seem to be random enough for you, so you may want to create an array of keys and loop through that instead.
First, populate an array with the keys, then repeatedly pick a random index into it (0..$#array), remove the key at that position, and push it onto a second array, @array_of_keys.
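A minimal sketch of that pick-and-remove idea, assuming the goal is simply a random key order. Indices start at 0, and removing each picked key means no key is chosen twice. In practice, List::Util's shuffle does the same job.

```perl
use strict;
use warnings;

my %hash = (a => 1, b => 3, c => 3, d => 2, e => 1, f => 1);

my @keys = keys %hash;
my @array_of_keys;
while (@keys) {
    my $i = int rand @keys;                      # random index in 0 .. $#keys
    push @array_of_keys, splice(@keys, $i, 1);   # move that key across
}

print "@array_of_keys\n";
```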
If you're trying to randomize the keys of the sorted-by-value hash, that's a little different.
my %hash = (a=>1, b=>3, c=>3, d=>2, e=>1, f=>1);
my %hash_by_val;
for my $key ( sort { $hash{$a} <=> $hash{$b} } keys %hash ) {
push @{ $hash_by_val{$hash{$key}} }, $key;
}
for my $key (sort { $a <=> $b } keys %hash_by_val){
my @arr = @{$hash_by_val{$key}};
my $arr_ubound = $#arr;
for (0..$arr_ubound){
my $randnum = int(rand($arr_ubound + 1)); # random index in 0..$arr_ubound
my $val = splice(@arr,$randnum,1);
$arr_ubound--;
print "$key : $val\n"; # notice: output varies b/t runs
}
}

Return all hash key/value pairs with maximum value

I have a hash (in Perl) where the values are all numbers. I need to create another hash that contains all key/value pairs from the first hash where the value is the maximum of all values.
For example, given
my %hash = (
key1 => 2,
key2 => 6,
key3 => 6,
);
I would like to create a new hash containing:
%hash_max = (
key2 => 6,
key3 => 6,
);
I'm sure there are many ways to do this, but am looking for an elegant solution (and an opportunity to learn!).
use List::Util 'max';
my $max = max(values %hash);
my %hash_max = map { $hash{$_}==$max ? ($_, $max) : () } keys %hash;
Or a one-pass approach (similar to but slightly different from another answer):
my $max;
my %hash_max;
keys %hash; # reset iterator
while (my ($key, $value) = each %hash) {
if ( !defined $max || $value > $max ) {
%hash_max = ();
$max = $value;
}
$hash_max{$key} = $value if $max == $value;
}
This makes one pass over the data, but wastes a lot of hash writes:
use strict;
use warnings;
my %hash = (
key1 => 2,
key2 => 6,
key3 => 6,
);
my %hash_max = ();
my $max;
foreach my $key (keys %hash) {
if (!defined($max) || $max < $hash{$key} ) {
%hash_max = ();
$max = $hash{$key};
$hash_max{$key} = $hash{$key};
}
elsif ($max == $hash{$key}) {
$hash_max{$key} = $hash{$key};
}
}
foreach my $key (keys %hash_max) {
print "$key\t$hash_max{$key}\n";
}
# sort numerically descending
my @topkey = sort {$hash{$b} <=> $hash{$a}} keys %hash;
Then copy the top values to %hash_max, with a loop terminator after the last max value:
for $key (@topkey) {
if ($hash{$key} == $hash{$topkey[0]}) {
$hash_max{$key} = $hash{$key}
} else { last }
}
ETA: Note to the unbelievers: last works because the keys in @topkey are sorted, so we can break out of the loop as soon as a value is no longer equal to the first one; all the following values are lower.