Sort hash attending to two parameters - perl

I have a hash with the keys in the following format:
scaffold_902_159
scaffold_2_1980420
scaffold_2_10
scaffold_10_402
I want to print out the hash sorted in the following format:
scaffold_2_10
scaffold_2_1980420
scaffold_10_402
scaffold_902_159
So first I have to order numerically attending to the first number and then attending to the last one. I don't want a regular expression searching for "scaffold_" since this may vary. I mean, I can have the hash with other format like "blablabla_NUMBER_NUMBER, or blablablaNUMBER_NUMBER". The last part of the key _NUMBER, is the only thing that is permanent.
I've this code but only sorts numerically attending to the first number:
my #keys = sort {
my ($aa) = $a =~ /(\d+)/;
my ($bb) = $b =~ /(\d+)/;
$aa <=> $bb;
} keys %hash;
foreach my $key (#keys) {
print $key;
}
Any suggestion?

Sort::Naturally to the rescue!
#!/usr/bin/perl
use strict;
use warnings;
use Sort::Naturally qw(nsort);
my %hash = (
scaffold_902_159 => 'v1',
scaffold_2_1980420 => 'v2',
scaffold_2_10 => 'v3',
scaffold_10_402 => 'v4',
);
print "$_\n" for nsort keys %hash;
Output:
scaffold_2_10
scaffold_2_1980420
scaffold_10_402
scaffold_902_159
As per your query, tried out some keys which did not have number in middle.
#!/usr/bin/perl
use strict;
use warnings;
use Sort::Naturally qw(nsort);
my #keys = qw(
should_come_last_9999_0
blablabla_10_403
scaffold_902_159
scaffold_2_1980420
scaffold_2_10
scaffold_10_402
blablabla902_1
blablabla901_3
);
print "$_\n" for nsort #keys;
Output:
blablabla_10_403
blablabla901_3
blablabla902_1
scaffold_2_10
scaffold_2_1980420
scaffold_10_402
scaffold_902_159
should_come_last_9999_0

This sorts on two columns, and uses the Schwartzian transform to create those columns from your strings.
use strict;
use warnings;
use feature 'say';
my #keys = qw(
scaffold_902_159
scaffold_2_1980420
scaffold_2_10
scaffold_10_402
);
#keys =
map { $_->[0] } # transform back
sort { $a->[1] <=> $b->[1] || $a->[2] <=> $b->[2] } # sort
map { # transform
m/(\d+)(?:\D+(\d+))/;
[ $_, ( defined $2 ? ( $1, $2 ) : ( 0xffffffff, $1 ) ) ]
} #keys;
say for #keys;
Output:
scaffold_2_10
scaffold_2_1980420
scaffold_10_402
scaffold_902_159
The data structure returned by the initial transformation map looks like this:
[ 'scaffold_902_159', 902, 159 ]
The sort uses that to first sort by index 1 (the 902) above with the numerical sort <=>. That operator returns 0 if both the RHS and the LHS are equal, so the or || continues with the right expression, It then sorts on index 2 (the 159).
Because you said the first number is optional, and if only the second number is there those elements should come last, we have to substitute a very high number for that. Without going into 64bit integers, 0xffffffff is the highest number we can make.
The second map pulls the full key out of index 0 of the array reference.
If we add some other things to the input, like the blablablaNUMBER_NUMBER you suggested, it will still only sort on the numbers and ignore the string part completely.
my #keys = qw(
should_come_last_9999_0
blablabla_10_403
scaffold_902_159
scaffold_2_1980420
scaffold_2_10
scaffold_10_402
no_first_number_1
);
Here's the output:
scaffold_2_10
scaffold_2_1980420
scaffold_10_402
blablabla_10_403
blablabla902_1
scaffold_902_159
should_come_last_9999_0
no_first_number_1

Related

How can I remove the title name of each unique number in Data::Dumper in perl?

I use Data::Dumper to catch uniqe number in each element.
#!perl
use warnings;
use strict;
use Data::Dumper;
$Data::Dumper::Sortkeys = 1;
my #names = qw(A A A A B B B C D);
my %counts;
$counts{$_}++ for #names;
print Dumper (\%counts);
exit;
This is output.
$VAR1 = {
'A' => 4,
'B' => 3,
'C' => 1,
'D' => 1
};
How can I remove the title name of each unique number to get output like this format?
$VAR1 = { 4 ,3 ,1 ,1 }
Presuming you want the counts in descending order, you could use the following:
printf "\$VAR1 = { %s};\n",
join ',',
map "$_ ",
sort { $b <=> $a }
values(%counts);
If instead you want the counts sorted by key,
printf "\$VAR1 = { %s};\n",
join ',',
map "$counts{$_} ",
sort
keys(%counts);
Either way, that's a really weird format. Square brackets would make more sense than curly ones.
One of many ways to get desired result
use strict;
use warnings;
use feature 'say';
my #names = qw( A A A A B B B C D );
my %counts;
$counts{$_}++ for #names;
my #values = map { $counts{$_} } sort keys %counts;
say join(',', #values);
output
4,3,1,1

How to sort an array by substring, perl

So, for example i have array like that:
my #arr = (
"blabla\t23\t55",
"jkdcbx\t55\t89",
"jdxjcl\t88\t69",
......)
And i need to sort this array by second column after \t, without outer splits. Is it possible to do?
May be a more elegant way but this will work :
my #arr = ( "blabla\t23\t55", "jkdcbx\t55\t89", "jdxjcl\t88\t69");
for (sort {(split(/\t/,$a))[2] <=> (split(/\t/,$b))[2]} #arr) {
print "$_\n";
}
Update
I've just realised that your question may mean that you want to sort by the third column instead of the second
That would be done by using
my ($aa, $bb) = map { (split /\t/)[2] } $a, $b;
instead
output
blabla 23 55
jdxjcl 88 69
jkdcbx 55 89
I always prefer to use map to convert the values from the original data into the function that they should be sorted by
This program demonstrates
I assume you want the values sorted numerically? Unfortunately your example data is already sorted as you describe
use strict;
use warnings 'all';
use feature 'say';
my #arr = (
"blabla\t23\t55",
"jkdcbx\t55\t89",
"jdxjcl\t88\t69",
);
my #sorted = sort {
my ($aa, $bb) = map { (split /\t/)[1] } $a, $b;
$aa <=> $bb;
} #arr;
say for #sorted;
output
blabla 23 55
jkdcbx 55 89
jdxjcl 88 69
Try this
use warnings;
use strict;
no warnings "numeric";
my #arr = (
"blabla\t23\t55",
"jkdcbx\t85\t89",
"jdxjcl\t83\t69",
);
my #result = sort {$a=~s/^[^\t]*\t//r <=> $b=~s/^[^\t]*\t//r } #arr;
$, = "\n";
print #result,"\n";
I have used following technique with sort for to do it
Negation character class
Non-destructive modifier(-r) - perform non-destructive substitution and return the new value
And tured of the warning for numeric

Recursive sorting in Perl

I have a hash that contains keys that correspond to database subscripts, but the database can have multidimensional records so the key could be a single subscript, or a list of subscripts.
I need to find a way to sort these records so I can print them in a logical order.
Example:
my $data = {
'1,1,1' => 'data1',
'1,2' => 'data2',
'1,1,3' => 'stuff',
'2,1,1' => 'data3',
'2,1,2' => 'data4',
'2,1,3' => 'data blah',
'2,2,2' => 'datawk2n',
'3,1,2' => 'more',
};
# Should print the keys in the properly sorted order
print join "\n", sort some_function keys %$data;
sub some_function {
# Do some sorting magikz
}
I want it to sort by the leftmost subscript first. If the leftmost value is identical I want to move to the next value and compare those. If those are identical I want to continue to the next one ... and so on ... until all possibilities are exhausted.
This will most likely involve some recursion, but I can't figure out how to make recursion work with those fancy $a and $b variables.
What can I put in some_function to get the following output?
1,1,1
1,1,3
1,2
2,1,1
2,1,2
2,1,3
2,2,2
3,1,2
The following is the fastest solution (by far!):
my #sorted_keys =
map { join ',', unpack 'N*', $_ }
sort
map { pack 'N*', split /,/, $_ }
keys(%$data);
If you want something simpler, and still quite fast, you could use a "natural sort".
Sort::Key::Natural
use Sort::Key::Natural qw( natsort );
my #sorted_keys = natsort(keys(%$data));
Sort::Naturally
use Sort::Naturally qw( nsort );
my #sorted_keys = nsort(keys(%$data));
Benchmarks:
Rate SN SKN grt
SN 3769/s -- -40% -88%
SKN 6300/s 67% -- -79%
grt 30362/s 705% 382% --
Benchmark code:
use strict;
use warnings;
use Benchmark qw( cmpthese );
use List::Util qw( shuffle );
use Sort::Key::Natural qw( );
use Sort::Naturally qw( );
my #keys =
shuffle
split ' ',
'1 1,0 1,1 1,1,1 1,1,3 1,2 2,1,1 2,1,2 2,1,3 2,2,2 3,1,2 10,1,1';
sub grt {
my #sorted_keys =
map { join ',', unpack 'N*', $_ }
sort
map { pack 'N*', split /,/, $_ }
#keys;
}
sub SKN { my #sorted_keys = Sort::Key::Natural::natsort(#keys); }
sub SN { my #sorted_keys = Sort::Naturally::nsort(#keys); }
cmpthese(-3, {
grt => \&grt,
SKN => \&SKN,
SN => \&SN,
});
I thought the Sort::Naturally module would help you here, but it seems not
I must have had a bug in my test. This works fine
use Sort::Naturally 'nsort';
say for nsort keys %$data;
I recommend either this or the Sort::Key::Naturally solution as they are the clearest
It is bad practice to chase speed of execution, especially at the expense of readability, before there is evidence that a given solution is too slow. Even then it is foolish to randomly optimise chunks of your code in the hope of making a difference, and your solution should be run through a profiler to discover where it would be most fruitful to make enhancements
There is no need for recursion. This program shows a sort subroutine by_elements which simply compares each item in the list until it finds either a mismatch or the end of one of the lists
In the former case the result is just the comparison of the two differ elements, and in the latter it is a comparison of the number of elements in the two lists
use strict;
use warnings 'all';
use feature 'say';
my $data = {
'1,1,1' => 'data1',
'1,2' => 'data2',
'1,1,3' => 'stuff',
'2,1,1' => 'data3',
'2,1,2' => 'data4',
'2,1,3' => 'data blah',
'2,2,2' => 'datawk2n',
'3,1,2' => 'more',
'10,1,1' => 'odd',
'1,1' => 'simple',
'1,0' => 'simple0',
'1' => 'simpler',
};
say for sort by_elements keys %$data;
sub by_elements {
my ( $aa, $bb ) = map [/\d+/g], $a, $b;
for ( my $i = 0; $i < #$aa and $i < #$bb; ++$i ) {
my $cmp = $aa->[$i] <=> $bb->[$i];
return $cmp if $cmp;
}
return #$aa <=> #$bb;
}
output
1
1,0
1,1
1,1,1
1,1,3
1,2
2,1,1
2,1,2
2,1,3
2,2,2
3,1,2
10,1,1
Use natsort of Sort::Key::Natural:
#!/usr/bin/env perl
use strict;
use warnings;
use v5.10;
use Sort::Key::Natural qw(natsort);
my %data = (
'1,1,1' => 'data1',
'1,2' => 'data2',
'1,1,3' => 'stuff',
'2,1,1' => 'data3',
'2,1,2' => 'data4',
'2,1,3' => 'data blah',
'2,2,2' => 'datawk2n',
'10,1,2' => 'more',
);
say for natsort keys %data;
Outputs:
1,1,1
1,1,3
1,2
2,1,1
2,1,2
2,1,3
2,2,2
10,1,2
No need for recursion, just a loop that you can break out of.
sub some_function {
my #aa = split /,/, $a;
my #bb = split /,/, $b;
my $cmp = 0;
for (my $i=0; $i<#aa || $i<#bb; $i++) {
$cmp = $aa[$i] <=> $bb[$i];
last if $cmp;
}
$cmp;
}
But if your heart is set on a recursive solution, there's
sub aref_sort_recurse {
my ($c,$d) = #_;
#$c ? #$d ? shift #$c <=> shift #$d || aref_sort_recurse($c,$d) ? 1 : -#$d
}
sub some_function {
aref_sort_recurse( [split /,/, $a], [split /,/, $b] )
}

slicing and array into a hasn value

What I was trying to do was combine elements[1..3] into a single array, and then make the has out of that. Then sort by keys and print out the whole thing.
#!/usr/bin/perl
my %hash ;
while ( <> ) {
#elements = split /,/, $_;
#slice_elements = #elements[1..3] ;
if ($elements[0] ne '' ) {
$hash{ $elements[0] } = $slice_elements[0];
}
}
foreach $key (sort keys %hash ) {
print "$key; $hash{$key}\n";
}
This is what I get when I print this out -
casper_mint#casper-mint-dell /tmp $ /tmp/dke /tmp/File1.csv
060001.926941; TOT
060002.029434; RTP
060002.029568; RTP
060002.126895; UL
060002.229327; RDS/A
060002.312512; EON
060002.429382; RTP
060002.585408; BCS
060002.629333; LYG
060002.712240; HBC
This is waht I want the elements of the array - element[0] is the key and element[1..3] in the value
060001.926941,TOT,86.26,86.48
060002.029434,RTP,310.0,310.66
060002.029568,RTP,310.0,310.74
060002.126895,UL,34.06,34.14
060002.229327,RDS/A,84.47,84.72
060002.312512,EON,56.88,57.04
060002.429382,RTP,310.08,310.77
060002.585408,BCS,58.96,59.06
060002.629333,LYG,46.13,46.41
060002.712240,HBC,93.06,93.23
Always include use strict; and use warnings; at the top of EVERY perl script.
What you need is to create a new anonymous array [ ] as the value to your hash. Then join the values when displaying the results:
#!/usr/bin/perl
use strict;
use warnings;
my %hash;
while (<>) {
chomp;
my #elements = split /,/, $_;
if ($elements[0] ne '' ) {
$hash{ $elements[0] } = [#elements[1..3]];
}
}
foreach my $key (sort keys %hash ) {
print join(',', $key, #{$hash{$key}}) . "\n";
}
Of course, if your data really is fixed width like that, and you're not actually doing anything with the values, there actually is no need to split and join. The following would do the same thing:
use strict;
use warnings;
print sort <>;

In Perl, how can I print the key corresponding to the maximum value in a hash?

How can I print only the first key and element of my hash?
I have already a sorted hash, but I want to print only the first key and respective value
thanks,
Thanks to all of you at the end I push the keys and the values to two different #array and print element 0 of each array and it works :)
Hashes have unordered keys. So, there is no such key as a first key in a hash.
However, if you need the key that sorts first (for maximum key value):
my %hash = (
'foo' => 'bar',
'qux' => 'baz',
);
my ($key) = sort { $b cmp $a } keys %hash;
print "$key => $hash{$key}"; # Outputs: qux => baz
Remember to use <=> instead of cmp for numerical sorting.
In perl hashes there is no ordering for keys. Use sort function to get the keys in the order that you want or you can push the keys into an array as you create the hash and your first key will be in zero th index in the array
You can use the below code, i am assuming hash name is my_hash and keys and values are numbers. If you have strings, you can use cmp instead of <=>. Refer to the sort documentation for more details
Get the max key
foreach (sort {$b <=> $a} keys %my_hash) {
print "Keys is $_\n";
print "Value is $my_hash{$_}\n";
last;
}
Get the key corresponding to the max value
foreach (sort {$my_hash{$b} <=> $my_hash{$a}} keys %my_hash) {
print "Keys is $_\n";
print "Value is $my_hash{$_}\n";
last;
}
foreach my $key (sort keys(%hash)) {
print "$key" . "$hash{$key}" . "\n";
last;
}
For large hashes, if you do not need the sorted keys for any other reason, it might be better to avoid sorting.
#!/usr/bin/env perl
use strict; use warnings;
my %hash = map { $_ => rand(10_000) } 'aa' .. 'zz';
my ($argmax, $max) = each %hash;
keys %hash; # reset iterator
while (my ($k, $v) = each %hash) {
if ($v >= $max) {
$max = $v;
$argmax = $k;
}
}
print "$argmax => $max\n";
If you are intent on sorting, you only need the key with the maximum value, not the entire arrays of keys and values:
#!/usr/bin/env perl
use strict; use warnings;
my %hash = map { $_ => rand(10_000) } 'aa' .. 'zz';
my ($argmax) = sort { $hash{$b} <=> $hash{$a} } keys %hash;
print "$argmax => $hash{$argmax}\n";
Just as Alan wrote - hashes don't have specific order, but you can sort hash keys:
foreach my $key (sort keys(%hash)) {
print $key . ': ' . $hash{$key} . "\n";
}
or, as you wish, get first element from keys array:
my #keys = keys(%hash);
print $keys[0];