How to sort an array by substring in Perl

So, for example, I have an array like this:
my @arr = (
"blabla\t23\t55",
"jkdcbx\t55\t89",
"jdxjcl\t88\t69",
......)
And I need to sort this array by the second column after \t, without splitting outside the sort. Is that possible?

There may be a more elegant way, but this will work:
my @arr = ( "blabla\t23\t55", "jkdcbx\t55\t89", "jdxjcl\t88\t69");
for (sort {(split(/\t/,$a))[2] <=> (split(/\t/,$b))[2]} @arr) {
print "$_\n";
}

Update
I've just realised that your question may mean that you want to sort by the third column instead of the second.
That would be done by using
my ($aa, $bb) = map { (split /\t/)[2] } $a, $b;
instead.
output
blabla 23 55
jdxjcl 88 69
jkdcbx 55 89
I always prefer to use map to convert the values from the original data into the values that they should be sorted by.
This program demonstrates.
I assume you want the values sorted numerically? Unfortunately, your example data is already sorted as you describe.
use strict;
use warnings 'all';
use feature 'say';
my @arr = (
"blabla\t23\t55",
"jkdcbx\t55\t89",
"jdxjcl\t88\t69",
);
my @sorted = sort {
my ($aa, $bb) = map { (split /\t/)[1] } $a, $b;
$aa <=> $bb;
} @arr;
say for @sorted;
output
blabla 23 55
jkdcbx 55 89
jdxjcl 88 69
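If the list is long, the split can be done once per row instead of once per comparison. The following is only a sketch of that idea; the helper name sort_by_column and its 0-based column argument are made up for illustration and are not part of the answers above.

```perl
use strict;
use warnings;

# Hypothetical helper: sort tab-separated rows numerically by a 0-based column.
sub sort_by_column {
    my ($rows, $col) = @_;
    return
        map  { $_->[1] }                          # 3. recover the original row
        sort { $a->[0] <=> $b->[0] }              # 2. compare the cached keys
        map  { [ (split /\t/, $_)[$col], $_ ] }   # 1. extract the key once per row
        @$rows;
}

my @arr = ("jdxjcl\t88\t69", "blabla\t23\t55", "jkdcbx\t55\t89");
print "$_\n" for sort_by_column(\@arr, 1);
```

Passing 2 instead of 1 would sort by the third column.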

Try this
use warnings;
use strict;
no warnings "numeric";
my @arr = (
"blabla\t23\t55",
"jkdcbx\t85\t89",
"jdxjcl\t83\t69",
);
my @result = sort { $a =~ s/^[^\t]*\t//r <=> $b =~ s/^[^\t]*\t//r } @arr;
$, = "\n";
print @result,"\n";
I used the following techniques with sort:
A negated character class ([^\t]).
The non-destructive modifier (/r), which performs the substitution on a copy and returns the new value.
I also turned off the "numeric" warning.
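A small illustration of what the /r modifier does may help. This is a sketch on a single made-up row, assuming Perl 5.14 or later (where /r was introduced).

```perl
use strict;
use warnings;

my $row = "blabla\t23\t55";

# /r returns the modified copy and leaves $row untouched (needs Perl 5.14+).
my $rest = $row =~ s/^[^\t]*\t//r;

print "$rest\n";   # the row without its first column
print "$row\n";    # still the full original row
```

Because $a and $b are aliases to the array elements inside sort, a plain s/// there would modify the data being sorted, which is why the copy matters.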

Related

Sort hash attending to two parameters

I have a hash with the keys in the following format:
scaffold_902_159
scaffold_2_1980420
scaffold_2_10
scaffold_10_402
I want to print out the hash sorted in the following format:
scaffold_2_10
scaffold_2_1980420
scaffold_10_402
scaffold_902_159
So first I have to sort numerically by the first number and then by the last one. I don't want a regular expression that searches for "scaffold_", since this prefix may vary: the keys may have another format, such as "blablabla_NUMBER_NUMBER" or "blablablaNUMBER_NUMBER". The final _NUMBER part of the key is the only thing that is always present.
I have this code, but it only sorts numerically by the first number:
my @keys = sort {
my ($aa) = $a =~ /(\d+)/;
my ($bb) = $b =~ /(\d+)/;
$aa <=> $bb;
} keys %hash;
foreach my $key (@keys) {
print $key;
}
Any suggestion?
Sort::Naturally to the rescue!
#!/usr/bin/perl
use strict;
use warnings;
use Sort::Naturally qw(nsort);
my %hash = (
scaffold_902_159 => 'v1',
scaffold_2_1980420 => 'v2',
scaffold_2_10 => 'v3',
scaffold_10_402 => 'v4',
);
print "$_\n" for nsort keys %hash;
Output:
scaffold_2_10
scaffold_2_1980420
scaffold_10_402
scaffold_902_159
As per your query, I tried out some keys that do not have a number in the middle.
#!/usr/bin/perl
use strict;
use warnings;
use Sort::Naturally qw(nsort);
my @keys = qw(
should_come_last_9999_0
blablabla_10_403
scaffold_902_159
scaffold_2_1980420
scaffold_2_10
scaffold_10_402
blablabla902_1
blablabla901_3
);
print "$_\n" for nsort @keys;
Output:
blablabla_10_403
blablabla901_3
blablabla902_1
scaffold_2_10
scaffold_2_1980420
scaffold_10_402
scaffold_902_159
should_come_last_9999_0
This sorts on two columns, and uses the Schwartzian transform to create those columns from your strings.
use strict;
use warnings;
use feature 'say';
my @keys = qw(
scaffold_902_159
scaffold_2_1980420
scaffold_2_10
scaffold_10_402
);
@keys =
map { $_->[0] } # transform back
sort { $a->[1] <=> $b->[1] || $a->[2] <=> $b->[2] } # sort
map { # transform
m/(\d+)(?:\D+(\d+))?/; # the second number is optional, so $2 may be undefined
[ $_, ( defined $2 ? ( $1, $2 ) : ( 0xffffffff, $1 ) ) ]
} @keys;
say for @keys;
Output:
scaffold_2_10
scaffold_2_1980420
scaffold_10_402
scaffold_902_159
The data structure returned by the initial transformation map looks like this:
[ 'scaffold_902_159', 902, 159 ]
The sort uses that to first sort by index 1 (the 902 above) with the numerical comparison operator <=>. That operator returns 0 if the LHS and the RHS are equal, so the || continues with the right-hand expression, which compares index 2 (the 159).
Because you said the first number is optional, and elements with only the second number should come last, we have to substitute a very high number for the missing one. Without going into 64-bit integers, 0xffffffff is the highest number we can make.
The second map pulls the full key out of index 0 of the array reference.
If we add some other things to the input, like the blablablaNUMBER_NUMBER you suggested, it will still only sort on the numbers and ignore the string part completely.
my @keys = qw(
should_come_last_9999_0
blablabla_10_403
scaffold_902_159
scaffold_2_1980420
scaffold_2_10
scaffold_10_402
blablabla902_1
no_first_number_1
);
Here's the output:
scaffold_2_10
scaffold_2_1980420
scaffold_10_402
blablabla_10_403
blablabla902_1
scaffold_902_159
should_come_last_9999_0
no_first_number_1
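Since the question says the final _NUMBER is the only part that is always present, a variant that anchors the match at the end of the key might be a little more robust against digits elsewhere in the prefix. This is only a sketch; the helper name sort_by_trailing_numbers is made up, and it keeps the same convention as above of sending keys without a first number to the end.

```perl
use strict;
use warnings;

# Hypothetical helper: order keys by the optional number directly before the
# final "_NUMBER", then by that final number. Keys lacking the first number
# are given 0xffffffff so that they sort last.
sub sort_by_trailing_numbers {
    my @keys = @_;
    return
        map  { $_->[0] }
        sort { $a->[1] <=> $b->[1] || $a->[2] <=> $b->[2] }
        map  {
            my ($first, $last) = /(\d+)?_(\d+)\z/;
            [ $_, $first // 0xffffffff, $last // 0 ];
        }
        @keys;
}

print "$_\n" for sort_by_trailing_numbers(qw(
    scaffold_902_159
    scaffold_2_1980420
    scaffold_2_10
    scaffold_10_402
    no_first_number_1
));
```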

perl hash for loop to get number

I have a hash like
key value
1 ababababab
11 cdcdcdcdcd
21 efefefefef
31 fgfgfgfgfg
41 ererererer
Now I have an array with array[0] = 5 and array[1] = 22.
How can I get the string from position 5 to 22?
abababababcdcdcdcdcdef
I plan to use foreach to compare each key with 5 and 22, but I don't know how to do it.
my %hash = qw(
1 ababababab
11 cdcdcdcdcd
21 efefefefef
31 fgfgfgfgfg
41 ererererer
);
my @array = (5,22);
my $str = join "", map $hash{$_}, sort {$a <=> $b} keys %hash;
print
my $result = substr($str, $array[0]-1, $array[1]-$array[0]+1);
Why are you storing your data in a hash if you want it to work like a string? Just put it in a string:
my $string = "abababababcdcdcdcdcdefefefefeffgfgfgfgfgererererer";
Or, if you need the hash for some other purpose later, construct a string from it for this operation:
my $string = join "", map $hash{$_}, sort {$a <=> $b} keys %hash;
Then get the substring from position 5-22:
my $fragment = substr $string, 5, 17; # Arguments: string, start (0-based offset), length
I'm sure you can find a hacky way to do this with your hash, but that's not what hashes are made for and this will be far more optimal and readable.
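Because substr counts offsets from 0, it can be easy to be off by one when the positions in the question are counted from 1. The sketch below wraps the arithmetic in a small helper; the name slice_1based is made up, and it assumes both positions are 1-based and inclusive.

```perl
use strict;
use warnings;

# Hypothetical helper: characters $from..$to, counted from 1, both ends included.
sub slice_1based {
    my ($string, $from, $to) = @_;
    return substr $string, $from - 1, $to - $from + 1;
}

my %hash = (
    1  => 'ababababab',
    11 => 'cdcdcdcdcd',
    21 => 'efefefefef',
    31 => 'fgfgfgfgfg',
    41 => 'ererererer',
);

# Join the values in numeric key order to rebuild the full string.
my $string = join '', map { $hash{$_} } sort { $a <=> $b } keys %hash;

print slice_1based($string, 5, 22), "\n";
```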
First you need to track down the hash key/value pairs you need. For 5 and 22, you need $hash{1}, $hash{11} and $hash{21}. A small snippet:
my ($starting_hash_key, $ending_hash_key);
for (my $k = 1; $k <= 41; $k += 10)
{
# the value stored under key $k covers positions $k .. $k+9
if ($array[0] >= $k && $array[0] < $k + 10)
{
$starting_hash_key = $k;
}
if ($array[1] >= $k && $array[1] < $k + 10)
{
$ending_hash_key = $k;
}
}
Get all the hash values between those two keys, then use the substr function (http://perldoc.perl.org/functions/substr.html) on the starting and ending values to get the right characters.

finding highest value in hash

I have a hash with 5 keys, and each of these keys has 5 values:
foreach $a (@mass){
if($a=~some regex){
@value=($1,$2,$3,$4,$5);
$hash{"keysname$c"}="@value";
$c++;
}
}
Each scalar is the value of a different parameter. I have to determine the highest value of the first array for all the keys in the hash.
Edit:
The code must compare the first value of key1 with the first value of key2, key3 ... key5 and print the highest one.
This will print max value for structure like
my %hash = ( k1 => [6,4,1], k2 => [16,14,11] );
use List::Util qw(max);
# longest array
my $n = max map $#$_, values %hash;
for my $i (0 .. $n) {
my $max = max map $_->[$i], values %hash;
print "max value on position $i is $max\n";
}
and for strings,
my %hash = ( k1 => "6 4 1", k2 => "16 14 11" );
use List::Util qw(max);
# longest array
my $n = max map $#{[ split ]}, values %hash;
for my $i (0 .. $n) {
my $max = max map [split]->[$i], values %hash;
print "max value on position $i is $max\n";
}
If I understand your question correctly (and it's a little unclear) then I think you want something like this:
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use List::Util 'max';
my (@data, @max);
while (<DATA>) {
chomp;
push @data, [split];
}
for my $i (0 .. $#{$data[0]}) {
push @max, max map { $_->[$i] } @data;
}
say "@max";
__DATA__
93 3 26 87 7
66 96 46 77 42
26 3 71 64 91
31 27 14 40 86
82 72 71 34 7
try this
map {push @temp, @{$_}} values %hash;
@desc_sorted= sort {$b <=> $a} @temp;
print $desc_sorted[0],"\n";
map will consolidate all the lists into a single list, and sort will sort that consolidated array in descending order, so its first element is the highest value.
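Note that this gives the single highest value across every position, not one maximum per position. If that overall maximum is what is wanted, sorting is not strictly needed; a sketch with max from List::Util, assuming the values are array references as in the k1/k2 example above:

```perl
use strict;
use warnings;
use List::Util qw(max);

my %hash = ( k1 => [6, 4, 1], k2 => [16, 14, 11] );

# Flatten every array in the hash and take the largest element in one pass.
my $largest = max map { @$_ } values %hash;

print "$largest\n";
```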

sorting by mmyy (month and year)

I'm looking for a logical way (without an additional module) to sort strings in the following format. I have a list of strings which looks like:
asdadasBBBsfasdasdas-0112
asdanfnfnfnfnf222ads-1210
etc.
I can't just sort by the numbers because, for instance, 812 > 113 (812 = August 2012, 113 = January 2013), so the result would be incorrect.
Any good strategy?
thanks,
A schwartzian transform would be a huge waste here. This similar construct whose name I can never remember would be way better.
my #sorted =
map substr($_, 4),
sort
map substr($_, -2) . substr($_, -4, 2) . $_,
@unsorted;
Using the match operator instead of substr:
my #sorted =
map substr($_, 4),
sort
map { /(..)(..)\z/s; $2.$1.$_ }
@unsorted;
How about Schwartzian transform:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dump qw(dump);
my @list = (
'asdadasBBBsfasdasdas-0112',
'asdanfnfnfnfnf222ads-1210',
'asdanfnfnfnfnf222ads-1211',
'asdanfnfnfnfnf222ads-1010',
'asdanfnfnfnfnf222ads-1011',
);
my @sorted =
map { $_->[0] }
sort { $a->[1] <=> $b->[1] or $a->[2] <=> $b->[2] }
map { /-(\d\d)(\d\d)$/; [$_, $2, $1] } @list;
dump @sorted;
output:
(
"asdanfnfnfnfnf222ads-1010",
"asdanfnfnfnfnf222ads-1210",
"asdanfnfnfnfnf222ads-1011",
"asdanfnfnfnfnf222ads-1211",
"asdadasBBBsfasdasdas-0112",
)
Use a sorting function that looks at the year first, and then the month:
sub mmyy_sorter {
my $a_yy = substr($a, -2);
my $b_yy = substr($b, -2);
my $a_mm = substr($a, -4, 2);
my $b_mm = substr($b, -4, 2);
return ($a_yy cmp $b_yy) || ($a_mm cmp $b_mm);
}
my @sorted = sort mmyy_sorter @myarray;
NB: this is technically not as efficient as it could be as it has to re-calculate the month and year subfields for every comparison, not just once for each item in the array.
It would also be possible to take advantage of Perl's automatic type conversion and use the <=> operator in place of cmp, since all of the values actually represent numbers.
What about converting everything to months? For example:
812 = 12 * 12 + 8
113 = 13 * 12 + 1
If you turn the years into months, the values compare correctly. You can use a regex to extract the numbers.
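A minimal sketch of that idea follows. The helper name month_index and the two sample strings are made up for illustration, and it assumes every string ends in four digits in mmyy order.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a trailing "mmyy" into a month count (yy * 12 + mm).
sub month_index {
    my ($string) = @_;
    my ($mm, $yy) = $string =~ /(\d\d)(\d\d)\z/
        or die "no mmyy suffix in '$string'";
    return $yy * 12 + $mm;
}

# Made-up sample data: January 2013 and August 2012.
my @unsorted = ('asdadasBBBsfasdasdas-0113', 'asdanfnfnfnfnf222ads-0812');
my @sorted   = sort { month_index($a) <=> month_index($b) } @unsorted;

print "$_\n" for @sorted;
```

Like any comparison that recomputes its keys, this calls the helper twice per comparison, which is fine for short lists.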
Thanks to @M42 for the sample data.
use strict;
use warnings;
use feature 'say';
my @list = (
'asdadasBBBsfasdasdas-0112',
'asdanfnfnfnfnf222ads-1210',
'asdanfnfnfnfnf222ads-1211',
'asdanfnfnfnfnf222ads-1010',
'asdanfnfnfnfnf222ads-1011',
);
my @sorted = sort {
my ($aa, $bb) = map { /(..)(..)\z/ and $2.$1 } $a, $b;
$aa <=> $bb;
} @list;
say for @sorted;
output
asdanfnfnfnfnf222ads-1010
asdanfnfnfnfnf222ads-1210
asdanfnfnfnfnf222ads-1011
asdanfnfnfnfnf222ads-1211
asdadasBBBsfasdasdas-0112

sorting an array on the first number found in each element

I'm looking for help sorting an array where each element is made up of "a number, then a string, then a number". I would like to sort on the first number part of the array elements, descending (so that I list the higher numbers first), while also listing the text etc.
I am still a beginner, so alternatives to the code below are also welcome.
use strict;
use warnings;
my @arr = map {int( rand(49) + 1) } ( 1..100 ); # build an array of 100 random numbers between 1 and 49
my @count2;
foreach my $i (1..49) {
my @count = join(',', @arr) =~ m/$i,/g; # maybe try to make a string only once then search through it... ???
my $count1 = scalar(@count); # I want this $count1 to be the number of times each of the numbers($i) was found within the string/array.
push(@count2, $count1 ." times for ". $i); # pushing a "number then text and a number / scalar, string, scalar" to an array.
}
#for (@count2) {print "$_\n";}
# try to add up all numbers in the first column to make sure they == 100
#sort @count2 and print the top 7
@count2 = sort {$b <=> $a} @count2; # try to stop printout of this, or sort on =~ m/^anumber/ ??? or just on the first one or two \d
foreach my $i (0..6) {
print $count2[$i] ."\n"; # seems to be sorted right anyway
}
First, store your data in an array, not in a string:
# inside the first loop, replace your line with the push() with this one:
push(@count2, [$count1, $i]);
Then you can easily sort by the first element of each subarray:
my @sorted = sort { $b->[0] <=> $a->[0] } @count2;
And when you print it, construct the string:
printf "%d times for %d\n", $sorted[$i][0], $sorted[$i][1];
See also: http://perldoc.perl.org/perlreftut.html, perlfaq4
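Put together, the three steps might look like the sketch below. One assumption on my part: the counting uses grep with an exact numeric comparison instead of the regex over a joined string, since a pattern such as m/1,/ would also match inside "11," or "21,".

```perl
use strict;
use warnings;

my @arr = map { int(rand(49) + 1) } 1 .. 100;   # 100 random draws from 1..49

# Count each number by exact equality, keeping the pair as [count, number].
my @count2;
for my $i (1 .. 49) {
    my $count1 = grep { $_ == $i } @arr;
    push @count2, [ $count1, $i ];
}

# Highest counts first.
my @sorted = sort { $b->[0] <=> $a->[0] } @count2;

# Build the display string only when printing the top 7.
printf "%d times for %d\n", @$_ for @sorted[0 .. 6];
```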
Taking your requirements as is. You're probably better off not embedding count information in a string. However, I'll take it as a learning exercise.
Note, I am trading memory for brevity and likely speed by using a hash to do the counting.
However, the sort could be optimized by using a Schwartzian Transform.
EDIT: Create results array using only numbers that were drawn
#!/usr/bin/perl
use strict; use warnings;
my @arr = map {int( rand(49) + 1) } ( 1..100 );
my %counts;
++$counts{$_} for @arr;
my @result = map sprintf('%d times for %d', $counts{$_}, $_),
sort {$counts{$a} <=> $counts{$b}} keys %counts;
print "$_\n" for @result;
However, I'd probably have done something like this:
#!/usr/bin/perl
use strict; use warnings;
use YAML;
my @arr;
$#arr = 99; # initialize @arr capacity to 100 elements
my %counts;
for my $i (0 .. 99) {
my $n = int(rand(49) + 1); # pick a number
$arr[ $i ] = $n; # store it
++$counts{ $n }; # update count
}
# sort keys according to counts, keys of %counts has only the numbers drawn
# for each number drawn, create an anonymous array ref where the first element
# is the number drawn, and the second element is the number of times it was drawn
# and put it in the @result array
my @result = map [$_, $counts{$_}],
sort {$counts{$a} <=> $counts{$b} }
keys %counts;
print Dump \@result;