How to show top values from hash in Perl?

Let's say I have the following info stored in hash:
kiwi 15
oranges 25
cherries 30
apples 2
pears 1
I want to write code that displays the top 3 entries in descending order (by amount).
So the output should be
cherries 30
oranges 25
kiwi 15
I can't seem to find a clear answer on that.

This will do as you ask. It uses sort with a comparison block that compares each key's corresponding hash value in reverse order. Then each of the first three sorted keys is printed along with its value from the hash.
Note that if several hash elements share the same value, their relative order is arbitrary, so when a tie spans the cut-off this code prints an arbitrary selection of the tied elements.
use strict;
use warnings;
my %data = qw/
kiwi 15
oranges 25
cherries 30
apples 2
pears 1
/;
my @sorted_keys = sort { $data{$b} <=> $data{$a} } keys %data;
for my $key ( @sorted_keys[0..2] ) {
print "$key $data{$key}\n";
}
output
cherries 30
oranges 25
kiwi 15
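If a reproducible result matters when values tie, one option is to add a secondary comparison on the key. The following is only a minimal sketch: the helper name top_n is hypothetical and not part of the answer above, and breaking ties alphabetically is an assumption about what "deterministic" should mean here.

```perl
use strict;
use warnings;

# Hypothetical helper: return the keys of the $n largest values,
# breaking ties alphabetically by key so the result is deterministic.
sub top_n {
    my ($hash, $n) = @_;
    my @keys = sort { $hash->{$b} <=> $hash->{$a} || $a cmp $b } keys %$hash;
    $n = @keys if $n > @keys;    # avoid undef entries for small hashes
    return @keys[0 .. $n - 1];
}

my %data = (kiwi => 15, oranges => 25, cherries => 30, apples => 2, pears => 1);
print "$_ $data{$_}\n" for top_n(\%data, 3);
```

With the sample data this prints the same three lines as the output above. The length check matters because slicing past the end of the sorted list would otherwise yield undefined elements.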
Update
For a more general solution, the (non-core) List::UtilsBy module offers a number of utility functions that offer sorts, maxima and minima as a function of the object list. It lets me write the above as
use List::UtilsBy qw/ nsort_by /;
my @sorted_keys = nsort_by { $data{$_} } keys %data;
for my $key ( (reverse @sorted_keys)[0..2] ) {
print "$key $data{$key}\n";
}
or, if you prefer the reverse in a different place
use List::UtilsBy qw/ rev_nsort_by /;
my @sorted_keys = rev_nsort_by { $data{$_} } keys %data;
for my $key ( @sorted_keys[0..2] ) {
print "$key $data{$key}\n";
}
Observe that the difference between the module's sort_by and nsort_by is equivalent to the difference between the cmp and <=> comparison operators, respectively.
Both of these alternatives generate identical output to the original above.
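To make the cmp versus <=> distinction concrete, here is a small sketch that uses only the core sort, so it does not depend on List::UtilsBy being installed. The sample values are made up for illustration.

```perl
use strict;
use warnings;

my @values = (10, 9, 100, 25);

# String comparison (the cmp behaviour): compares character by character.
my @as_strings = sort { $a cmp $b } @values;

# Numeric comparison (the <=> behaviour): compares by magnitude.
my @as_numbers = sort { $a <=> $b } @values;

print "cmp: @as_strings\n";    # cmp: 10 100 25 9
print "<=>: @as_numbers\n";    # <=>: 9 10 25 100
```

For amounts such as the fruit counts above, the numeric variant is the one that gives the expected ranking.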

Related

How to sort an array by substring, perl

For example, I have an array like this:
my @arr = (
"blabla\t23\t55",
"jkdcbx\t55\t89",
"jdxjcl\t88\t69",
......)
And I need to sort this array by the second column after \t, without splitting it in a separate step. Is that possible?
There may be a more elegant way, but this will work:
my @arr = ( "blabla\t23\t55", "jkdcbx\t55\t89", "jdxjcl\t88\t69");
for (sort {(split(/\t/,$a))[2] <=> (split(/\t/,$b))[2]} @arr) {
print "$_\n";
}
Update
I've just realised that your question may mean that you want to sort by the third column instead of the second.
That would be done by using
my ($aa, $bb) = map { (split /\t/)[2] } $a, $b;
instead
output
blabla 23 55
jdxjcl 88 69
jkdcbx 55 89
I always prefer to use map to convert the values from the original data into the values that they should be sorted by.
This program demonstrates.
I assume you want the values sorted numerically? Unfortunately your example data is already sorted as you describe
use strict;
use warnings 'all';
use feature 'say';
my @arr = (
"blabla\t23\t55",
"jkdcbx\t55\t89",
"jdxjcl\t88\t69",
);
my @sorted = sort {
my ($aa, $bb) = map { (split /\t/)[1] } $a, $b;
$aa <=> $bb;
} @arr;
say for @sorted;
output
blabla 23 55
jkdcbx 55 89
jdxjcl 88 69
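The program above splits both lines on every comparison. For longer lists, a common alternative is to extract each sort key once and sort on the cached keys (the Schwartzian transform). This is a sketch only: the helper name sort_by_column and its zero-based column argument are assumptions made for illustration, and the input order is shuffled so that the sort visibly does something.

```perl
use strict;
use warnings;

# Hypothetical helper: sort tab-separated lines numerically by the
# zero-based column $col, splitting each line only once.
sub sort_by_column {
    my ($col, @lines) = @_;
    return map  { $_->[1] }                       # recover the original line
           sort { $a->[0] <=> $b->[0] }           # compare the cached keys
           map  { [ (split /\t/)[$col], $_ ] }    # pair each line with its key
           @lines;
}

my @arr = ("jdxjcl\t88\t69", "blabla\t23\t55", "jkdcbx\t55\t89");
print "$_\n" for sort_by_column(1, @arr);
```

Passing 2 instead of 1 sorts by the third column, which covers the other reading of the question.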
Try this
use warnings;
use strict;
no warnings "numeric";
my @arr = (
"blabla\t23\t55",
"jkdcbx\t85\t89",
"jdxjcl\t83\t69",
);
my @result = sort {$a=~s/^[^\t]*\t//r <=> $b=~s/^[^\t]*\t//r } @arr;
$, = "\n";
print @result,"\n";
I used the following techniques with sort to do it:
A negated character class
The non-destructive modifier (/r), which performs the substitution on a copy and returns the new value
I also turned off the warning for numeric
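As a small illustration of the /r modifier used above, the sketch below shows that the substitution returns a modified copy while the original string is left alone. The variable names are made up for the example.

```perl
use strict;
use warnings;
use 5.014;    # the /r modifier was introduced in Perl 5.14

my $line = "blabla\t23\t55";

# /r returns the modified copy and leaves $line untouched.
my $rest = $line =~ s/^[^\t]*\t//r;

say $rest eq "23\t55"         ? "copy has first column removed" : "unexpected";
say $line eq "blabla\t23\t55" ? "original is unchanged"         : "unexpected";
```

The copy still contains the third column, so comparing it with <=> relies on Perl using only the leading digits; that is what triggers the numeric warning the answer switches off.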

Sort hash attending to two parameters

I have a hash with the keys in the following format:
scaffold_902_159
scaffold_2_1980420
scaffold_2_10
scaffold_10_402
I want to print out the hash sorted in the following format:
scaffold_2_10
scaffold_2_1980420
scaffold_10_402
scaffold_902_159
So I first have to sort numerically by the first number and then by the last one. I don't want a regular expression that searches for "scaffold_", since this prefix may vary: the hash could have keys in another format, such as "blablabla_NUMBER_NUMBER" or "blablablaNUMBER_NUMBER". The trailing _NUMBER is the only part that is always present.
I have this code, but it only sorts numerically by the first number:
my @keys = sort {
my ($aa) = $a =~ /(\d+)/;
my ($bb) = $b =~ /(\d+)/;
$aa <=> $bb;
} keys %hash;
foreach my $key (@keys) {
print $key;
}
Any suggestion?
Sort::Naturally to the rescue!
#!/usr/bin/perl
use strict;
use warnings;
use Sort::Naturally qw(nsort);
my %hash = (
scaffold_902_159 => 'v1',
scaffold_2_1980420 => 'v2',
scaffold_2_10 => 'v3',
scaffold_10_402 => 'v4',
);
print "$_\n" for nsort keys %hash;
Output:
scaffold_2_10
scaffold_2_1980420
scaffold_10_402
scaffold_902_159
As per your query, I tried out some keys that do not have a number in the middle.
#!/usr/bin/perl
use strict;
use warnings;
use Sort::Naturally qw(nsort);
my @keys = qw(
should_come_last_9999_0
blablabla_10_403
scaffold_902_159
scaffold_2_1980420
scaffold_2_10
scaffold_10_402
blablabla902_1
blablabla901_3
);
print "$_\n" for nsort @keys;
Output:
blablabla_10_403
blablabla901_3
blablabla902_1
scaffold_2_10
scaffold_2_1980420
scaffold_10_402
scaffold_902_159
should_come_last_9999_0
This sorts on two columns, and uses the Schwartzian transform to create those columns from your strings.
use strict;
use warnings;
use feature 'say';
my @keys = qw(
scaffold_902_159
scaffold_2_1980420
scaffold_2_10
scaffold_10_402
);
@keys =
map { $_->[0] } # transform back
sort { $a->[1] <=> $b->[1] || $a->[2] <=> $b->[2] } # sort
map { # transform
m/(\d+)(?:\D+(\d+))?/; # the second number is optional, so $2 may be undefined
[ $_, ( defined $2 ? ( $1, $2 ) : ( 0xffffffff, $1 ) ) ]
} @keys;
say for @keys;
Output:
scaffold_2_10
scaffold_2_1980420
scaffold_10_402
scaffold_902_159
The data structure returned by the initial transformation map looks like this:
[ 'scaffold_902_159', 902, 159 ]
The sort uses that to first sort by index 1 (the 902 above) with the numerical comparison operator <=>. That operator returns 0 if the LHS and the RHS are equal, so the logical or (||) continues with the right-hand expression, which sorts on index 2 (the 159).
Because you said the first number is optional, and if only the second number is there those elements should come last, we have to substitute a very high number for that. Without going into 64bit integers, 0xffffffff is the highest number we can make.
The second map pulls the full key out of index 0 of the array reference.
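The way <=> and || combine can be seen in isolation with a few hand-built records. This is a sketch under the assumption that the records already have the [ key, first number, second number ] shape described above; the three sample records are taken from the keys in the question.

```perl
use strict;
use warnings;

# Each record is [ original_key, first_number, second_number ],
# mirroring the structure described above.
my @records = (
    [ 'scaffold_2_1980420', 2,  1980420 ],
    [ 'scaffold_10_402',    10, 402     ],
    [ 'scaffold_2_10',      2,  10      ],
);

# <=> yields -1, 0 or 1; only a 0 (a tie) lets || fall through
# to the second comparison.
my @sorted = sort { $a->[1] <=> $b->[1] || $a->[2] <=> $b->[2] } @records;

print "$_->[0]\n" for @sorted;
```

The two scaffold_2 records tie on the first number, so their order is decided by the second comparison; scaffold_10_402 never reaches it.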
If we add some other things to the input, like the blablablaNUMBER_NUMBER you suggested, it will still only sort on the numbers and ignore the string part completely.
my @keys = qw(
should_come_last_9999_0
blablabla_10_403
scaffold_902_159
scaffold_2_1980420
scaffold_2_10
scaffold_10_402
no_first_number_1
blablabla902_1
);
Here's the output:
scaffold_2_10
scaffold_2_1980420
scaffold_10_402
blablabla_10_403
blablabla902_1
scaffold_902_159
should_come_last_9999_0
no_first_number_1

finding highest value in hash

I have a hash with 5 keys, and each of these keys has 5 values:
foreach $a (@mass) {
if ($a =~ some regex) {
@value = ($1,$2,$3,$4,$5);
$hash{"keysname$c"} = "@value";
$c++;
}
}
Each scalar is the value of a different parameter. I have to determine the highest first value across all keys in the hash.
Edit:
The code must compare the first value of key1 with the first value of key2, key3...key5 and print the highest one.
This will print the max value at each position for a structure like
my %hash = ( k1 => [6,4,1], k2 => [16,14,11] );
use List::Util qw(max);
# longest array
my $n = max map $#$_, values %hash;
for my $i (0 .. $n) {
my $max = max map $_->[$i], values %hash;
print "max value on position $i is $max\n";
}
and for strings,
my %hash = ( k1 => "6 4 1", k2 => "16 14 11" );
use List::Util qw(max);
# longest array
my $n = max map $#{[ split ]}, values %hash;
for my $i (0 .. $n) {
my $max = max map [split]->[$i], values %hash;
print "max value on position $i is $max\n";
}
If I understand your question correctly (and it's a little unclear) then I think you want something like this:
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use List::Util 'max';
my (@data, @max);
while (<DATA>) {
chomp;
push @data, [split];
}
for my $i (0 .. $#{$data[0]}) {
push @max, max map { $_->[$i] } @data;
}
say "@max";
__DATA__
93 3 26 87 7
66 96 46 77 42
26 3 71 64 91
31 27 14 40 86
82 72 71 34 7
Try this:
map {push @temp, @{$_}} values %hash;
@desc_sorted = sort {$b <=> $a} @temp;
print $desc_sorted[0],"\n";
map flattens all the lists into a single list, and sort orders that combined array in descending order, so its first element is the overall maximum.
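That snippet finds the largest value across all positions. If only the first value of each key should be compared, as the question's edit asks, a sketch could look like the following. It assumes the values are stored as array references rather than joined strings, and the sample data and variable names are made up for illustration.

```perl
use strict;
use warnings;
use List::Util qw(max);

my %hash = (
    key1 => [ 6,  40, 1  ],
    key2 => [ 16, 14, 11 ],
    key3 => [ 9,  2,  99 ],
);

# Compare only element 0 of every array.
my $highest_first = max map { $_->[0] } values %hash;

# Report which key(s) hold that value.
my @winners = sort { $a cmp $b } grep { $hash{$_}[0] == $highest_first } keys %hash;

print "highest first value is $highest_first (in @winners)\n";
```

Using max avoids sorting the whole list when only the largest element is needed.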

why does sort with uniq not work together

I have the following script:
use strict;
use List::MoreUtils qw/uniq/;
use Data::Dumper;
my @x = (3,2);
my @y = (4,3);
print "unique results \n";
print Dumper([uniq(@x,@y)]);
print "sorted unique results\n";
print Dumper([sort uniq(@x,@y)]);
The output is
unique results
$VAR1 = [
3,
2,
4
];
sorted unique results
$VAR1 = [
2,
3,
3,
4
];
So it looks like sort does not work with uniq.
I do not understand why.
I ran the perl script with -MO=Deparse and got
use List::MoreUtils ('uniq');
use Data::Dumper;
use strict 'refs';
my(@x) = (3, 2);
my(@y) = (4, 3);
print "unique results \n";
print Dumper([uniq(@x, @y)]);
print "sorted unique results\n";
print Dumper([(sort uniq @x, @y)]);
My interpretation is that perl decided to remove the parentheses from uniq(@x,@y) and to use uniq as the comparison function for sort.
Why did perl decide to do it?
How can I avoid these and similar pitfalls?
Thanks,
David
The sort builtin accepts a subroutine name or block as first argument which is passed two items. It then must return a number which determines the order between the items. These snippets all do the same:
use feature 'say';
my @letters = qw/a c a d b/;
say "== 1 ==";
say for sort @letters;
say "== 2 ==";
say for sort { $a cmp $b } @letters;
say "== 3 ==";
sub func1 { $a cmp $b }
say for sort func1 @letters;
say "== 4 ==";
sub func2 ($$) { $_[0] cmp $_[1] } # special case for $$ prototype
say for sort func2 @letters;
Notice that there isn't any comma between the function name and the list, and note that parens in Perl are primarily used to determine precedence: sort func1 @letters and sort func1 (@letters) are the same, and neither executes func1(@letters).
To disambiguate, place a + before the function name:
sort +uniq @letters;
To avoid such unexpected behaviour, the best solution is to read the docs when you aren't sure how a certain builtin behaves; unfortunately, many have some special parsing rules.
You could put parentheses around the uniq function call:
print Dumper([sort (uniq(@x,@y))]);
output:
$VAR1 = [
2,
3,
4
];
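Another way to sidestep the ambiguity is to always give sort an explicit comparison block, so that the function call can only be read as the list to sort. The sketch below assumes a local stand-in named my_uniq (hypothetical, written here only to avoid the non-core List::MoreUtils dependency) and uses a numeric comparison, which is a choice made for this example rather than what the question's default string sort did.

```perl
use strict;
use warnings;

# Hypothetical stand-in for List::MoreUtils::uniq so the sketch has no
# non-core dependencies: keep the first occurrence of each value.
sub my_uniq {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

my @x = (3, 2);
my @y = (4, 3);

# An explicit comparison block fills sort's first slot, so my_uniq(...)
# can only be parsed as the list to sort.
my @with_block = sort { $a <=> $b } my_uniq(@x, @y);

# Storing the intermediate list in a variable removes the ambiguity too.
my @unique   = my_uniq(@x, @y);
my @via_temp = sort { $a <=> $b } @unique;

print "@with_block\n";    # 2 3 4
print "@via_temp\n";      # 2 3 4
```

Both forms cost a few extra characters but do not depend on remembering how sort parses a bare function name.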

sorting an array on the first number found in each element

I'm looking for help sorting an array where each element is made up of "a number, then a string, then a number". I would like to sort on the first number of each array element, descending (so that the higher numbers are listed first), while still printing the text and the second number.
I am still a beginner, so alternatives to the below are also welcome.
use strict;
use warnings;
my @arr = map {int( rand(49) + 1) } ( 1..100 ); # build an array of 100 random numbers between 1 and 49
my @count2;
foreach my $i (1..49) {
my @count = join(',', @arr) =~ m/$i,/g; # maybe try to make a string only once then search through it... ???
my $count1 = scalar(@count); # I want this $count1 to be the number of times each of the numbers($i) was found within the string/array.
push(@count2, $count1 ." times for ". $i); # pushing a "number then text and a number / scalar, string, scalar" to an array.
}
#for (@count2) {print "$_\n";}
# try to add up all numbers in the first column to make sure they == 100
#sort @count2 and print the top 7
@count2 = sort {$b <=> $a} @count2; # try to stop printout of this, or sort on =~ m/^anumber/ ??? or just on the first one or two \d
foreach my $i (0..6) {
print $count2[$i] ."\n"; # seems to be sorted right anyway
}
First, store your data in an array, not in a string:
# inside the first loop, replace your line with the push() with this one:
push(@count2, [$count1, $i]);
Then you can easily sort by the first element of each subarray:
my @sorted = sort { $b->[0] <=> $a->[0] } @count2;
And when you print it, construct the string:
printf "%d times for %d\n", $sorted[$i][0], $sorted[$i][1];
See also: http://perldoc.perl.org/perlreftut.html, perlfaq4
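Put together, the three steps above might look like the sketch below. To keep the result reproducible it uses a small fixed list instead of random draws, counts with a hash rather than regex matches on a joined string, and breaks ties by the smaller number; all three are assumptions made for this example.

```perl
use strict;
use warnings;

# Fixed sample input instead of random draws, so the result is reproducible.
my @arr = (7, 3, 7, 12, 3, 7, 49, 12, 7);

# Count occurrences with a hash rather than regex matches on a joined string.
my %count;
$count{$_}++ for @arr;

# One [count, number] pair per distinct number.
my @count2 = map { [ $count{$_}, $_ ] } keys %count;

# Highest count first; ties broken by the smaller number.
my @sorted = sort { $b->[0] <=> $a->[0] || $a->[1] <=> $b->[1] } @count2;

printf "%d times for %d\n", @$_ for @sorted;
```

Because the count and the number stay separate until printf, the sort never has to parse a string.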
Taking your requirements as is. You're probably better off not embedding count information in a string. However, I'll take it as a learning exercise.
Note, I am trading memory for brevity and likely speed by using a hash to do the counting.
However, the sort could be optimized by using a Schwartzian Transform.
EDIT: Create results array using only numbers that were drawn
#!/usr/bin/perl
use strict; use warnings;
my @arr = map {int( rand(49) + 1) } ( 1..100 );
my %counts;
++$counts{$_} for @arr;
my @result = map sprintf('%d times for %d', $counts{$_}, $_),
sort {$counts{$a} <=> $counts{$b}} keys %counts;
print "$_\n" for @result;
However, I'd probably have done something like this:
#!/usr/bin/perl
use strict; use warnings;
use YAML;
my @arr;
$#arr = 99; #initialize @arr capacity to 100 elements
my %counts;
for my $i (0 .. 99) {
my $n = int(rand(49) + 1); # pick a number
$arr[ $i ] = $n; # store it
++$counts{ $n }; # update count
}
# sort keys according to counts, keys of %counts has only the numbers drawn
# for each number drawn, create an anonymous array ref where the first element
# is the number drawn, and the second element is the number of times it was drawn
# and put it in the @result array
my @result = map [$_, $counts{$_}],
sort {$counts{$a} <=> $counts{$b} }
keys %counts;
print Dump \@result;