Perl sort array by pattern match - perl

I would like to sort this array based on the value after the comma:
my @coords;
$coords[0] = "33.7645539, -84.3585973";
$coords[1] = "33.7683870, -84.3559850";
$coords[2] = "33.7687753, -84.3541355";
foreach my $coord (@sorted_coords) {
print "$coord\n";
}
Desired output:
33.7687753, -84.3541355
33.7683870, -84.3559850
33.7645539, -84.3585973
I've thought about using map, grep, and capture groups as the list input for sort, but I haven't gotten very far:
my @sorted_coords = sort { $a <=> $b } map {$_ =~ /, (-*\d+\.\d+)/} @unique_coords;

It is easy to give in to the temptation to use a fancy implementation instead of something straightforward and clear. Unless the data set is huge, the speed advantage of using a transform is negligible, and it comes at the cost of much reduced legibility.
A standard sort block is all that's necessary here:
use strict;
use warnings;
my @coords = (
"33.7645539, -84.3585973",
"33.7683870, -84.3559850",
"33.7687753, -84.3541355",
);
my @sorted_coords = sort {
my ($aa, $bb) = map { (split)[1] } $a, $b;
$bb <=> $aa;
} @coords;
print "$_\n" for @sorted_coords;
Output:
33.7687753, -84.3541355
33.7683870, -84.3559850
33.7645539, -84.3585973
Update
If you prefer, the second field can be extracted from the input records with a regex instead. Replacing the map statement with something like this:
my ($aa, $bb) = map /(\S+)$/, $a, $b;
will work fine.
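For reference, here is a self-contained version of the plain sort block that pulls the longitude out with a regex anchored at the end of the string. The pattern /(\S+)$/ is an assumption that the longitude is always the last run of non-space characters, which holds for the "lat, lon" strings shown here.

```perl
use strict;
use warnings;

my @coords = (
    "33.7645539, -84.3585973",
    "33.7683870, -84.3559850",
    "33.7687753, -84.3541355",
);

# Sort on the last whitespace-free field (the longitude), largest first
my @sorted_coords = sort {
    my ($aa) = $a =~ /(\S+)$/;
    my ($bb) = $b =~ /(\S+)$/;
    $bb <=> $aa;
} @coords;

print "$_\n" for @sorted_coords;
```

Because the regex runs twice per comparison, this recomputes the key each time; for a few thousand records that cost is unlikely to matter.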

Looks like you could use a Schwartzian transform. You had the right idea:
my @coords;
$coords[1] = "33.7683870, -84.3559850";
$coords[2] = "33.7687753, -84.3541355";
$coords[0] = "33.7645539, -84.3585973";
my @sorted_coords = map { $_->[0] } # 3. extract the first element
sort { $b->[1] <=> $a->[1] } # 2. sort on the second
# element, descending
map { [ $_, /,\s*(\S+)$/ ] } # 1. create list of array refs
@coords;
foreach my $coord (@sorted_coords) {
print "$coord\n";
}
Edit: Adding Joshua's suggestion:
my @sorted_coords = map { join ', ', @$_ }
sort { $b->[1] <=> $a->[1] }
map { [ split /, / ] }
@coords;
It seems easier to look at and more descriptive than my original example.
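Putting Joshua's variant into a runnable script makes the three stages easy to check. It assumes every record is exactly two fields separated by a comma and one space, because join ', ' rebuilds the string from the split pieces; with different spacing the output would not match the input byte for byte.

```perl
use strict;
use warnings;

my @coords = (
    "33.7645539, -84.3585973",
    "33.7683870, -84.3559850",
    "33.7687753, -84.3541355",
);

my @sorted_coords =
    map  { join ', ', @$_ }          # 3. rebuild "lat, lon" from the pair
    sort { $b->[1] <=> $a->[1] }     # 2. sort on longitude, descending
    map  { [ split /, / ] }          # 1. split each record into [lat, lon]
    @coords;

print "$_\n" for @sorted_coords;
```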

Related

sort hash by values *but* return keys

The incoming file is in the form of account:data. In this case, the data are Y-SNPs.
I want to sort by value (SNPs) and return the key (account) with the data, so that I can keep the two associated. The script below prints only the data, and doing a regular array sort on the second field doesn't work either.
#!/usr/bin/perl
@lines = <STDIN>;
chomp @lines;
foreach (@lines)
{
(@f) = split /:/,$_;
$h{$f[0]} = $f[1];
}
@s = map { [ $_, $h{$_} ] } sort values %h;
foreach (@s) {print "@$_\n"}
If you want to sort a hash by values but get the associated keys, you can do it with a custom sort block, for example:
my @sorted_keys = sort { $h{$a} <=> $h{$b} } keys %h;
This will return the keys in %h in the order of their values, in this case sorted numerically with <=>. If you want to sort them alphabetically, use cmp instead.
For more see the documentation of sort which even has a similar example:
# this sorts the %age hash by value instead of key
# using an in-line function
my @eldest = sort { $age{$b} <=> $age{$a} } keys %age;
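Applied to the question's account:data records, the sort might look like the sketch below. Two assumptions: the SNP values are treated as strings, so the comparison uses cmp rather than <=>, and the records in @lines are made-up samples (the real script would read them from STDIN).

```perl
use strict;
use warnings;

# Hypothetical sample records in the question's account:data form
my @lines = ("acct3:M269", "acct1:M35", "acct2:M170");

my %h;
for my $line (@lines) {
    # A limit of 2 keeps any further colons inside the data field
    my ($account, $snp) = split /:/, $line, 2;
    $h{$account} = $snp;
}

# Order the keys by their values; cmp because the values are strings
my @sorted_keys = sort { $h{$a} cmp $h{$b} } keys %h;

# Print key and value together so they stay associated
print "$_:$h{$_}\n" for @sorted_keys;
```

Note that cmp compares character by character, so M35 sorts after M269 here.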

How to sort perl hash keys [duplicate]

This question already has answers here:
Sorting hash keys by Alphanumeric sort
(4 answers)
Closed 8 years ago.
I have a hash which looks like this:
my %hash = (
'124:8' => '',
'4:2' => '',
'17:11' => '',
'17:0' => '',
#and so on
);
I tried to sort the hash keys from the smallest number to the biggest and use them:
for my $keys ( sort { $a > $b } keys %hash ) {
#do stuff
}
This gives me results that look correct, but it sometimes fails. I don't know how to compare the numbers, e.g. 124:8 with 4:2, since each key has a : in the middle. Any suggestions?
You might want to sort on the first number and then the second, which are delimited by a colon:
my #sorted = sort {
my ($aa, $bb) = map [ split /:/ ], $a, $b;
$aa->[0] <=> $bb->[0] || $aa->[1] <=> $bb->[1]
} keys %hash;
for my $key (@sorted) { ... }
Using a Schwartzian transform:
my #sorted = map $_->[0],
sort {
$a->[1] <=> $b->[1] || $a->[2] <=> $b->[2]
}
map [ $_, split /:/ ],
keys %hash;
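As a runnable check of the Schwartzian version: the extra key 17:2 is added here only to show that it sorts before 17:11, which a plain string comparison would not do.

```perl
use strict;
use warnings;

my %hash = (
    '124:8' => '',
    '4:2'   => '',
    '17:11' => '',
    '17:0'  => '',
    '17:2'  => '',   # extra key, not in the question
);

my @sorted =
    map  { $_->[0] }                                     # keep the original key
    sort { $a->[1] <=> $b->[1] || $a->[2] <=> $b->[2] }  # first number, then second
    map  { [ $_, split /:/ ] }                           # [key, first, second]
    keys %hash;

print "@sorted\n";
```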
When you sort numbers, you use the <=> operator:
for my $key (sort { $a <=> $b } keys %hash) {
This operator returns 1, 0 or -1 depending on the comparison. The > operator only returns true or false, which explains why it works for some keys but not all.
Because your keys are not numbers, they will only partially convert to numbers, and you will get warnings such as:
Argument "17:11" isn't numeric in sort
So you will need to use something like Sort::Key::Natural, or roll your own, such as:
sort {
my @a = $a =~ /\d+/g;
my @b = $b =~ /\d+/g;
$a[0] <=> $b[0] ||
$a[1] <=> $b[1] # continue as long as needed
} keys %hash
You may also use a Schwartzian transform to cache the numbers and possibly speed up the sort.
Or just sort by string comparison, though this will cause 17:11 to end up before 17:2 (and 124:8 before 4:2).
Not as elegant as the solutions above, but what about converting the : to . and comparing the keys as floating-point numbers? Because no arithmetic is performed, there are no rounding errors, so the following could work:
my %tmp = map { (my $x = $_) =~ s/:/./; $_,$x} keys %hash;
my @sortedkeys = sort { $tmp{$a} <=> $tmp{$b} } keys %tmp;
#4:2 17:0 17:11 124:8
Or is this approach wrong?
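One way to answer the closing question is to run both orderings side by side. The key 17:2 is an addition for this check (it is not in the question's hash): converting it gives 17.2, while 17:11 gives 17.11, and numerically 17.11 is less than 17.2, so the float ordering puts 17:11 first. Comparing the two parts as separate integers avoids that.

```perl
use strict;
use warnings;

my @keys = ('124:8', '4:2', '17:11', '17:0', '17:2');

# The float approach: "17:11" -> 17.11, "17:2" -> 17.2
my %tmp = map { (my $x = $_) =~ s/:/./; ($_ => $x) } @keys;
my @by_float = sort { $tmp{$a} <=> $tmp{$b} } @keys;

# Comparing each colon-separated part as an integer instead
my @by_parts = sort {
    my @x = split /:/, $a;
    my @y = split /:/, $b;
    $x[0] <=> $y[0] || $x[1] <=> $y[1];
} @keys;

print "float: @by_float\n";   # 17:11 lands before 17:2
print "parts: @by_parts\n";
```

So the float trick only works while every second number has the same count of digits.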

Perl sort genomic positions

I have a list of genomic positions in the format:
chromosome:start-end
For example:
chr1:100-110
chr1:1000-1100
chr1:200-300
chr10:100-200
chr2:100-200
chrX:100-200
I want to sort this by chromosome number and numerical start position to get this:
chr1:100-110
chr1:200-300
chr1:1000-1100
chr2:100-200
chr10:100-200
chrX:100-200
What is a good and efficient way to do this in perl?
Just use the module Sort::Key::Natural:
use strict;
use warnings;
use Sort::Key::Natural qw(natsort);
print natsort <DATA>;
__DATA__
chr1:100-110
chr1:1000-1100
chr1:200-300
chr10:100-200
chr2:100-200
chrX:100-200
chrY:100-200
chrX:1-100
chr10:100-150
Outputs:
chr1:100-110
chr1:200-300
chr1:1000-1100
chr2:100-200
chr10:100-150
chr10:100-200
chrX:1-100
chrX:100-200
chrY:100-200
You can sort this by providing a custom comparator. It appears that you want a two-level value as the sorting key, so your custom comparator would derive the key for each row and then compare the keys:
# You want karyotypical sorting on the first element,
# so set up this hash with an appropriate normalized value
# per available input:
my %karyotypical_sort = (
1 => 1,
...
X => 100,
);
sub row_to_sortable {
my $row = shift;
$row =~ /chr(.+):(\d+)-/; # assuming match here! Be careful
return [$karyotypical_sort{$1}, $2];
}
sub sortable_compare {
my ($one, $two) = @_;
return $one->[0] <=> $two->[0] || $one->[1] <=> $two->[1];
# If first comparison returns 0 then try the second
}
@lines = ...
print join "\n", sort {
sortable_compare(row_to_sortable($a), row_to_sortable($b))
} @lines;
Since the calculation would be slightly onerous (string manipulation is not free) and since you are probably dealing with a lot of data (genomes!), it is likely you will notice improved performance if you perform a Schwartzian Transform. This is done by precalculating the sort key for each row, sorting on that key, and finally removing the additional data:
@st_lines = map { [ row_to_sortable($_), $_ ] } @lines;
@sorted_st_lines = sort { sortable_compare($a->[0], $b->[0]) } @st_lines;
@sorted_lines = map { $_->[1] } @sorted_st_lines;
Or combined:
print join "\n",
map { $_->[1] }
sort { sortable_compare($a->[0], $b->[0]) }
map { [ row_to_sortable($_), $_ ] } @lines;
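Here is a self-contained run of the combined version. The karyotype mapping is an assumption made for the example (chromosomes 1 to 22 map to themselves, X to 23, Y to 24), since the answer leaves the full table elided; this row_to_sortable also dies on rows it cannot parse instead of silently reusing stale captures.

```perl
use strict;
use warnings;

# Assumed karyotypic order: autosomes 1..22, then X, then Y
my %karyotypical_sort = ( (map { $_ => $_ } 1 .. 22), X => 23, Y => 24 );

sub row_to_sortable {
    my $row = shift;
    # Capture chromosome name and start position; fail loudly on bad input
    $row =~ /^chr(\w+):(\d+)-/ or die "Unparseable row: $row";
    my $rank = $karyotypical_sort{$1};
    die "Unknown chromosome: $1" unless defined $rank;
    return [ $rank, $2 ];
}

sub sortable_compare {
    my ($one, $two) = @_;
    # Chromosome rank first; start position breaks ties
    return $one->[0] <=> $two->[0] || $one->[1] <=> $two->[1];
}

my @lines = qw(chr1:100-110 chr1:1000-1100 chr1:200-300
               chr10:100-200 chr2:100-200 chrX:100-200);

my @sorted_lines =
    map  { $_->[1] }                             # drop the cached key
    sort { sortable_compare($a->[0], $b->[0]) }  # compare cached keys only
    map  { [ row_to_sortable($_), $_ ] }         # compute each key once
    @lines;

print "$_\n" for @sorted_lines;
```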
It looks to me like you want to sort in order of the following:
By Chromosome Number
Then by the Start Position
Then (maybe) by the End Position.
So, perhaps a custom sort like this:
use strict;
use warnings;
print sort {
my @a = split /chr|:|-/, $a;
my @b = split /chr|:|-/, $b;
"$a[1]$b[1]" !~ /\D/ ? $a[1] <=> $b[1] : $a[1] cmp $b[1]
or $a[2] <=> $b[2]
or $a[3] <=> $b[3]
} <DATA>;
__DATA__
chr1:100-110
chr1:1000-1100
chr1:200-300
chr10:100-200
chr2:100-200
chrX:100-200
chrY:100-200
chrX:1-100
chr10:100-150
Outputs:
chr1:100-110
chr1:200-300
chr1:1000-1100
chr2:100-200
chr10:100-150
chr10:100-200
chrX:1-100
chrX:100-200
chrY:100-200
You could do something like the following script, which takes a text file containing your input above. The sorting on the chromosome number would need to change a bit, because it's neither purely lexical nor purely numerical, but I'm sure you could tweak what I have below:
use strict;
my %chromosomes;
while(<>){
if ($_ =~ /^chr(\w+):(\d+)-\d+$/)
{
my $chr_num = $1;
my $chr_start = $2;
$chromosomes{$chr_num}{$chr_start} = $_;
}
}
my @chr_nums = sort(keys(%chromosomes));
foreach my $chr_num (@chr_nums) {
my @chr_starts = sort { $a <=> $b }(keys(%{$chromosomes{$chr_num}}));
foreach my $chr_start (@chr_starts) {
print "$chromosomes{$chr_num}{$chr_start}";
}
}
1;
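One possible tweak for the chromosome ordering is to replace the plain sort(keys(%chromosomes)) call with a comparator that compares numerically when both names are all digits and falls back to string comparison otherwise. This is a sketch: @chr_names is a hypothetical stand-in for keys %chromosomes, and putting numbered chromosomes before lettered ones is an assumption about the order you want.

```perl
use strict;
use warnings;

# Hypothetical chromosome names, standing in for keys %chromosomes
my @chr_names = qw(1 10 2 X Y);

my @sorted = sort {
    my $a_is_num = $a =~ /^\d+$/ ? 1 : 0;
    my $b_is_num = $b =~ /^\d+$/ ? 1 : 0;
    # Numbered chromosomes first; if both are numbers compare
    # numerically, otherwise fall back to string comparison
    ($b_is_num <=> $a_is_num)
        || ($a_is_num ? $a <=> $b : $a cmp $b)
} @chr_names;

print "@sorted\n";
```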
There is a similar question asked and answered here:
How to do alpha numeric sort perl?
What you are likely looking for is a natural sort that treats runs of digits as numbers, like GNU sort -V (version sort).

convert array to hash using grep and map in perl

I have an array as follows:
@array = ('a:b','c:d','e:f:g','h:j');
How can I convert this into the following using grep and map?
%hash = (a=>1,b=>1,c=>1,d=>1,e=>1,f=>1,h=>1,j=>1);
I've tried:
@arr;
foreach(@array){
@a = split ':' , $_;
push @arr,@a;
}
%hash = map {$_=>1} @arr;
but I am getting all the values; I should only get the first two values of each element.
It's very easy:
%hash = map {$_=>1} grep { defined $_ } map { (split /:/, $_)[0..1] } @array;
So you split each array element on the ":" delimiter, which gives a bigger list, and keep only the first two values; then grep out undefined values and pass the rest to another map that makes the key-value pairs.
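Here is the same pipeline as a small runnable check. The grep { defined } step only matters for an element with no colon at all: slicing index 1 past the end of a one-element list yields undef, which grep then drops.

```perl
use strict;
use warnings;

my @array = ('a:b', 'c:d', 'e:f:g', 'h:j');

# Split each element, keep at most the first two fields,
# drop undef values, then build key => 1 pairs
my %hash = map  { $_ => 1 }
           grep { defined }
           map  { (split /:/)[0, 1] }
           @array;

print join(',', sort keys %hash), "\n";
```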
You have to ignore everything except the first two elements after the split:
my @arr;
foreach (@array){
my @a = split ':', $_;
push @arr, @a[0,1];
}
my %hash = map {$_=>1} @arr;
Using map:
my %hash =
map { $_ => 1 }
map { (split /:/)[0,1] }
@array;
I think this should work, though it's not elegant enough. I use a temporary array to hold the result of split and return the first two elements.
my %hash = map { $_ => 1 } map { my @t = split ':', $_; $t[0], $t[1] } @array;
This filters out the g key:
my %hash = map { map { $_ => 1; } (split /:/)[0,1]; } @array;

perl: shuffle value-sorted hash?

First of all, sorry for my English. I hope you will understand me.
There is a hash:
$hash{a} = 1;
$hash{b} = 3;
$hash{c} = 3;
$hash{d} = 2;
$hash{e} = 1;
$hash{f} = 1;
I want to sort it by values (not keys) so I have:
for my $key ( sort { $hash{ $a } <=> $hash{ $b } } keys %hash ) { ... }
At first I get all the keys with value 1, then with value 2, etc. Great.
But if the hash doesn't change, the order of the keys (in this sort by value) is always the same.
Question: How can I shuffle the sort results, so that every time I run the 'for' loop, I get a different order of the keys with value 1, value 2, etc.?
I'm not quite sure I understand your needs, but is this OK?
use strict;
use warnings;
use feature 'say';
use List::Util qw(shuffle);
my %hash;
$hash{a} = 1;
$hash{b} = 3;
$hash{c} = 3;
$hash{d} = 2;
$hash{e} = 1;
$hash{f} = 1;
for my $key (sort { $hash{ $a } <=> $hash{ $b } } shuffle( keys %hash )) {
say "hash{$key} = $hash{$key}"
}
You can simply add another level of sorting, which will be used when the regular sorting method cannot distinguish between two values. E.g.:
sort { METHOD_1 || METHOD_2 || ... METHOD_N } LIST
For example:
sub regular_sort {
my $hash = shift;
for (sort { $hash->{$a} <=> $hash->{$b} } keys %$hash) {
print "$_ ";
};
}
sub random_sort {
my $hash = shift;
my %rand = map { $_ => rand } keys %$hash;
for (sort { $hash->{$a} <=> $hash->{$b} ||
$rand{$a} <=> $rand{$b} } keys %$hash ) {
print "$_ ";
};
}
To sort the keys by value, with random ordering of keys with identical values, I see two solutions:
use List::Util qw( shuffle );
use sort 'stable';
my @keys =
sort { $hash{$a} <=> $hash{$b} }
shuffle keys %hash;
or
my @keys =
map $_->[0],
sort { $a->[1] <=> $b->[1] || $a->[2] <=> $b->[2] }
map [ $_, $hash{$_}, rand ],
keys %hash;
The use sort 'stable'; is required to prevent sort from corrupting the randomness of the list returned by shuffle.
The use of the Schwartzian Transform above is not an attempt at optimisation. I've seen people use rand in the compare function itself to try to achieve the above result, but doing so is buggy for two reasons.
When using "misbehaving" comparisons such as that, the results are documented as being undefined, so sort is allowed to return garbage, repeated elements, missing elements, etc.
Even if sort doesn't return garbage, it won't be a fair sort: the result will be weighted (biased).
You can have two functions, for ascending and descending order, and use them accordingly, like this:
sub hashAscending {
$hash{$a} <=> $hash{$b};
}
sub hashDescending {
$hash{$b} <=> $hash{$a};
}
foreach my $key (sort hashAscending (keys(%hash))) {
print "\t$hash{$key} \t\t $key\n";
}
foreach my $key (sort hashDescending (keys(%hash))) {
print "\t$hash{$key} \t\t $key\n";
}
It seems like you want to randomize looping through the keys.
Perl does not store hash keys in sequential or sorted order, but that doesn't seem to be random enough for you, so you may want to create an array of keys and loop through that instead.
First, populate an array with the keys, then repeatedly pick a random index between 0 and the array's last index ($#array), remove the key at that position, and push it onto array_of_keys.
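The procedure above is essentially a shuffle of the key array. A common in-place way to do that is the Fisher-Yates shuffle, sketched below as a named substitute for the push-based description; List::Util's shuffle does the same job if you prefer a library call. The sample %hash is the one from the question.

```perl
use strict;
use warnings;

my %hash = (a => 1, b => 3, c => 3, d => 2, e => 1, f => 1);

# Fisher-Yates shuffle: walk from the end, swapping each slot
# with a randomly chosen slot at or before it
my @keys = keys %hash;
for (my $i = $#keys; $i > 0; $i--) {
    my $j = int rand($i + 1);         # 0 <= $j <= $i
    @keys[$i, $j] = @keys[$j, $i];
}

print "$_ => $hash{$_}\n" for @keys;
```

Picking $j from 0..$i (not 0..$#keys) is what keeps every permutation equally likely.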
If you're trying to randomize the keys of the sorted-by-value hash, that's a little different.
my %hash = (a=>1, b=>3, c=>3, d=>2, e=>1, f=>1);
my %hash_by_val;
for my $key ( sort { $hash{$a} <=> $hash{$b} } keys %hash ) {
push @{ $hash_by_val{$hash{$key}} }, $key;
}
for my $key (sort keys %hash_by_val){
my @arr = @{$hash_by_val{$key}};
my $arr_ubound = $#arr;
for (0..$arr_ubound){
my $randnum = int(rand($arr_ubound + 1)); # any index 0..$arr_ubound
my $val = splice(@arr,$randnum,1);
$arr_ubound--;
print "$key : $val\n"; # notice: output varies b/t runs
}
}
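The same group-then-randomize idea can lean on List::Util's shuffle (a core module) instead of splicing random indices by hand. This is a sketch using the question's sample data; the variable names are illustrative.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

my %hash = (a => 1, b => 3, c => 3, d => 2, e => 1, f => 1);

# Group keys by their value
my %hash_by_val;
push @{ $hash_by_val{ $hash{$_} } }, $_ for keys %hash;

# Visit values in numeric order; shuffle the keys within each group
my @ordered;
for my $val (sort { $a <=> $b } keys %hash_by_val) {
    push @ordered, shuffle @{ $hash_by_val{$val} };
}

print "$hash{$_} : $_\n" for @ordered;
```

Sorting the group values with <=> also keeps the order right if values ever reach two digits.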