I have a list in Perl.
#alist=("a_vld","a_sf","a_ef","a_val");
print join(',', #alist), "\n";
Output:
a_vld,a_sf,a_ef,a_val
How can I re-order the elements in list such that its output is as following expected output:
a_sf,a_ef,a_vld,a_val
Note: a_ will keep changing with different strings but I want to preserve the order of sf,ef,vld,val
One way to order (sort) by a given set is to associate a numeric order with its elements.
Then we need to find these keys in the strings in the list to sort. That is done once, ahead of the actual sorting, via the Schwartzian transform.
use List::Util qw(first);
my #alist = ("a_vld", "a_sf", "a_ef", "a_val");
my #keys = qw(sf ef vld val); # keys to sort by
my %order_by = map { $keys[$_] => $_ } 0..$#keys;
my #sorted =
map { $_->[0] }
sort { $order_by{$a->[1]} <=> $order_by{$b->[1]} }
map {
my $elem = $_;
[ $elem, first { $elem =~ /$_/ } keys %order_by ]
}
#alist;
say "#sorted";
This prints the line: a_sf a_ef a_vld a_val
The elements of the list are associated with the sorting keys by finding the key in the string via a regex. With this in mind the above can be used reasonably generally, with a desired list of keys .
For a short list, use a sort function that identifies the sequences in order
#sorted = sort {
($b =~ /sf$/) <=> ($a =~ /sf$/)
|| ($b =~ /ef$/) <=> ($a =~ /ef$/)
|| ($b =~ /vld$/) <=> ($a =~ /vld$/)
|| ($b =~ /val$/) <=> ($a =~ /val$/)
} #alist;
If there's always a _ before the code, and if the code is always at the end
use Sort::Key qw( ikeysort );
my #order = qw( sf ef vld val );
my %order = map { $order[$_] => $_ } 0..$#order;
my #sorted = ikeysort { /_([^_\W]+)\z/ ? $order{$1} : 0 } #unsorted;
The above is a faster, cleaner version of the following:
my #order = qw( sf ef vld val );
my %order = map { $order[$_] => $_ } 0 .. $#order;
my #sorted =
sort {
my $key_a = /_([^_\W]+)\z/ ? $order{$1} : 0;
my $key_b = /_([^_\W]+)\z/ ? $order{$1} : 0;
$key_a <=> $key_b || $a cmp $b
}
#unsorted;
Otherwise
use Sort::Key qw( ikeysort );
my #order = qw( sf ef vld val );
my %order = map { $order[$_] => $_ } 0..$#order;
my $alt = join '|', map quotemeta, #order;
my $re = qr/($alt)/;
my #sorted = ikeysort { /$re/ ? $order{$1} : 0 } #unsorted;
The above is a faster, cleaner version of the following:
my #order = qw( sf ef vld val );
my %order = map { $order[$_] => $_ } 0..$#order;
my $alt = join '|', map quotemeta, #order;
my $re = qr/($alt)/;
my #sorted =
sort {
my $key_a = /$re/ ? $order{$1} : 0;
my $key_b = /$re/ ? $order{$1} : 0;
$key_a <=> $key_b || $a cmp $b
}
#unsorted;
Related
I have a file which looks like this:
80,1p21
81,19q13
82,6p12.3
83,Xp11.22
84,3pter-q21
86,3q26.33
87,14q24.1-q24.2|14q24|14q22-q24
88,1q42-q43
89,11q13.1
90,2q23-q24
91,12q13
92,2q22.3
93,3p22
94,12q11-q14
95,3p21.1
97,14q24.3
98,2p16.2
And I want to sort them based on the second column. And the first column should change accordingly too. When you use the 'sort' command in Perl, it doesn't do it because it says it's not numeric. Is there a way to sort things alpha numerically in Perl?
If you read the documentation for sort, you'll see that you don't need to do a numeric sort in Perl. You can do string comparisons too.
#sorted = sort { $a cmp $b } #unsorted;
But that still leaves you with a problem as, for example, 19q will sort before 6p. So you can write your own sort function which can make whatever transformations you want before doing the comparison.
#sorted = sort my_complex_sort #unsorted;
sub my_complex_sort {
# code that compares $a and $b and returns -1, 0 or 1 as appropriate
# It's probably best in most cases to do the actual comparison using cmp or <=>
# Extract the digits following the first comma
my ($number_a) = $a =~ /,(\d+)/;
my ($number_b) = $b =~ /,(\d+)/;
# Extract the letter following those digits
my ($letter_a) = $a =~ /,\d+(a-z)/;
my ($letter_b) = $b =~ /,\d+(a-z)/;
# Compare and return
return $number_a <=> $number_b or $letter_a cmp $letter_b;
}
#!/usr/bin/env perl
use strict;
use warnings;
my #datas = map { /^(\d+),(\d*)(.*)$/; [$1, $2, $3]; } <DATA>;
my #res = sort {$a->[1] <=> $b->[1] or $a->[2] cmp $b->[2]} #datas;
foreach my $data (#res) {
my ($x, $y, $z) = #{$data};
print "$x,$y$z\n";
}
__DATA__
80,1p21
81,19q13
82,6p12.3
83,Xp11.22
84,3pter-q21
86,3q26.33
87,14q24.1-q24.2|14q24|14q22-q24
88,1q42-q43
89,11q13.1
90,2q23-q24
91,12q13
92,2q22.3
93,3p22
94,12q11-q14
95,3p21.1
97,14q24.3
98,2p16.2
I actually found the answer to this. The code looks a bit complicated though.
#!/usr/bin/env perl
use strict;
use warnings;
sub main {
my $file;
if (#ARGV != 1) {
die "Usage: perl hashofhash_sort.pl <filename>\n";
}
else {
$file = $ARGV[0];
}
open(IN, $file) or die "Error!! Cannot open the $file file: $!\n";
my #file = <IN>;
chomp #file;
my ($entrez_gene, $loci, $chr, $band, $pq, $band_num);
my (%chromosome, %loci_entrez);
foreach my $line (#file) {
if ($line =~ /(\d+),(.+)/) {
# Entrez genes
$entrez_gene = $1;
# Locus like 12p23.4
$loci = $2;
if ($loci =~ /^(\d+)(.+)?/) {
# chromosome number alone (only numericals)
$chr = $1;
if ($2) {
# locus minus chromosome number. If 12p23.4, then $band is p23.4
$band = "$2";
if ($band =~ /^([pq])(.+)/) {
# either p or q
$pq = $1;
# stores the numericals. for p23.4, stores 23.4
$band_num = $2;
}
if (exists $chromosome{$chr}) {
if (exists $chromosome{$chr}{$pq}) {
push (#{$chromosome{$chr}{$pq}}, $band_num);
}
else {
$chromosome{$chr}{$pq} = [$band_num];
}
}
else {
$chromosome{$chr}{$pq} = [$band_num];
}
}
}
}
} # End of foreach loop
foreach my $key (sort {$a <=> $b} keys %chromosome) {
my %seen = ();
foreach my $key2 (sort {$a cmp $b } keys %{$chromosome{$key}}) {
my #unique = grep { ! $seen{$_}++ } #{$chromosome{$key}{$key2}};
my #sorted = sort #unique;
foreach my $element (#sorted) {
my $sorted_locus = "$key$key2$element";
if (exists $loci_entrez{$sorted_locus}) {
foreach my $element2 (#{$loci_entrez{$sorted_locus}}) {
print "$element2,$sorted_locus\n";
}
}
}
}
}
} # End of main
main();
In the very general case, the question is ambiguous on what to do with integers that are equal but written differently, because of the possibility of leading zeros. The following comparison function (for sort) allows one to consider the lexicographic order as soon as one doesn't have different integers. This is the same as zsh's numeric sort.
sub alphanumcmp ($$)
{
my (#u,#v);
if ((#u = $_[0] =~ /^(\d+)/) &&
(#v = $_[1] =~ /^(\d+)/))
{
my $c = $u[0] <=> $v[0];
return $c if $c;
}
if ((#u = $_[0] =~ /^(.)(.*)/) &&
(#v = $_[1] =~ /^(.)(.*)/))
{
return $u[0] cmp $v[0] || &alphanumcmp($u[1],$v[1]);
}
return $_[0] cmp $_[1];
}
For instance, one would get the following sorted elements:
a0. a00. a000b a00b a0b a001b a01. a01b a1. a1b a010b a10b a011b a11b
Note 1: The use of <=> assumes that the numbers are not too large.
Note 2: In the question, the user wants to do an alphanumeric sort on the second column (instead of the whole string). So, in this particular case, the comparison function could just be adapted to ignore the first column or a Schwartzian transform could be used.
At first sorry for my english - i hope you will understand me.
There is a hash:
$hash{a} = 1;
$hash{b} = 3;
$hash{c} = 3;
$hash{d} = 2;
$hash{e} = 1;
$hash{f} = 1;
I want to sort it by values (not keys) so I have:
for my $key ( sort { $hash{ $a } <=> $hash{ $b } } keys %hash ) { ... }
And at first I get all the keys with value 1, then with value 2, etc... Great.
But if hash is not changing, the order of keys (in this sort-by-value) is always the same.
Question: How can I shuffle sort-results, so every time I run 'for' loop, I get different order of keys with value 1, value 2, etc. ?
Not quite sure I well understand your needs, but is this ok:
use List::Util qw(shuffle);
my %hash;
$hash{a} = 1;
$hash{b} = 3;
$hash{c} = 3;
$hash{d} = 2;
$hash{e} = 1;
$hash{f} = 1;
for my $key (sort { $hash{ $a } <=> $hash{ $b } } shuffle( keys %hash )) {
say "hash{$key} = $hash{$key}"
}
You can simply add another level of sorting, which will be used when the regular sorting method cannot distinguish between two values. E.g.:
sort { METHOD_1 || METHOD_2 || ... METHOD_N } LIST
For example:
sub regular_sort {
my $hash = shift;
for (sort { $hash->{$a} <=> $hash->{$b} } keys %$hash) {
print "$_ ";
};
}
sub random_sort {
my $hash = shift;
my %rand = map { $_ => rand } keys %hash;
for (sort { $hash->{$a} <=> $hash->{$b} ||
$rand{$a} <=> $rand{$b} } keys %$hash ) {
print "$_ ";
};
}
To sort the keys by value, with random ordering of keys with identical values, I see two solutions:
use List::Util qw( shuffle );
use sort 'stable';
my #keys =
sort { $hash{$a} <=> $hash{$b} }
shuffle keys %hash;
or
my #keys =
map $_->[0],
sort { $a->[1] <=> $b->[1] || $a->[2] <=> $b->[2] }
map [ $_, $hash{$_}, rand ],
keys %hash;
The use sort 'stable'; is required to prevent sort from corrupting the randomness of the list returned by shuffle.
The above's use of the Schwartzian Transform is not an attempt at optimisation. I've seen people use rand in the compare function itself to try to achieve the above result, but doing so is buggy for two reasons.
When using "misbehaving" comparisons such as that, the results are documented as being undefined, so sort is allowed to return garbage, repeated elements, missing elements, etc.
Even if sort doesn't return garbage, it won't be a fair sort. The result will be weighed.
You can have two functions for ascending and decending order and use them accordingly like
sub hasAscending {
$hash{$a} <=> $hash{$b};
}
sub hashDescending {
$hash{$b} <=> $hash{$a};
}
foreach $key (sort hashAscending (keys(%hash))) {
print "\t$hash{$key} \t\t $key\n";
}
foreach $key (sort hashDescending (keys(%hash))) {
print "\t$hash{$key} \t\t $key\n";
}
It seems like you want to randomize looping through the keys.
Perl, does not store in sequential or sorted order, but this doesn't seem to be random enough for you, so you may want to create an array of keys and loop through that instead.
First, populate an array with keys, then use a random number algorithm (1..$#length_of_array) to push the key at that position in the array, to the array_of_keys.
If you're trying to randomize the keys of the sorted-by-value hash, that's a little different.
See Codepad
my %hash = (a=>1, b=>3, c=>3, d=>2, e=>1, f=>1);
my %hash_by_val;
for my $key ( sort { $hash{$a} <=> $hash{$b} } keys %hash ) {
push #{ $hash_by_val{$hash{$key}} }, $key;
}
for my $key (sort keys %hash_by_val){
my #arr = #{$hash_by_val{$key}};
my $arr_ubound = $#arr;
for (0..$arr_ubound){
my $randnum = int(rand($arr_ubound));
my $val = splice(#arr,$randnum,1);
$arr_ubound--;
print "$key : $val\n"; # notice: output varies b/t runs
}
}
I have a list of filenames which are like so:
fw_d.log.1.gz
through
fw_d.log.300.gz
When I use this code block below, it almost sorts it the way I want, but not quite:
#!/usr/bin/perl -w
my $basedir = "/var/log";
my #verdir = qw(fw_d);
my $fulldir;
my $configs;
my $combidir;
foreach $combidir (#verdir) {
$fulldir = "$basedir/$combidir";
opendir (DIR, $fulldir);
my #files = grep { $_ ne '.' && $_ ne '..' && $_ ne 'CVS' readdir DIR;
closedir (DIR);
#files1 = sort {$a cmp $b}(#files);
foreach my $configs (#files1) {
print "Checking $configs\n";
system("less $basedir/$combidir/$configs | grep \'.* Group = , Username = .* autheauthenticated.\' >> output.log" );
}
}
Here is a snippet output:
Checking fw_d.log
Checking fw_d.log.1.gz
Checking fw_d.log.10.gz
Checking fw_d.log.100.gz
Checking fw_d.log.101.gz
Checking fw_d.log.102.gz
As you can see, it almost sorts it how I was hoping... Does anyone have any suggestions, on either reading, or a code snippet I can use?
You could use Schartzian-transform :
my #sorted = map { $_->[0] }
sort { $a->[1] <=> $b->[1] }
map { [$_, $_=~/(\d+)/] }
#files;
print Dumper \#sorted;
Added benchmark for comparison between Schwartzian-Transform and subroutine
use Benchmark qw(:all);
# build list of files
my #files = map {'fw_d.log.'.int(rand()*1000).'.log' } 0 ..300;
my $count = -3;
my $r = cmpthese($count, {
'subname' => sub {
sub expand {
my $file=shift;
$file=~s{(\d+)}{sprintf "%04d", $1}eg;
return $file;
}
my #sorted = sort { expand($a) cmp expand($b) } #files;
},
'schwartzian' => sub {
my #sorted = map { $_->[0] }
sort { $a->[1] <=> $b->[1] }
map { [$_, $_=~/(\d+)/] }
#files;
}
});
Result:
Rate subname schwartzian
subname 21.2/s -- -92%
schwartzian 279/s 1215% --
Schwartzian-transform is about 13 times more efficient for sorting 300 files.
the problem is that the code does what you tell it to do: sort the file names in alphabetical order.
You should replace sort { $a cmp $b } by sort { expand($a) cmp expand($b) }
with expand:
sub expand
{ my $file=shift;
$file=~s{(\d+)}{sprintf "%04d", $1}eg; # expand all numbers to 4 digits
return $file;
}
What you can try is using a custom sort function:
sub sort_by_number {
$a =~ /(\d+)/;
$numa = $1;
$b =~ /(\d+)/;
$numb = $1;
return $numa <=> $numb;
}
and then sort like this:
#files1 = sort sort_by_number #files;
This will sort the strings in #files by the value of the first number in each string.
Older question, but there's an answer as yet unmentioned.
Sort::Naturally does this for you:
Sort lexically, but sort numeral parts numerically
#!/usr/bin/env perl
use strict;
use warnings;
use Sort::Naturally;
print nsort <DATA>;
__DATA__
fw_d.log
fw_d.log.101.gz
fw_d.log.1.gz
fw_d.log.10.gz
fw_d.log.100.gz
fw_d.log.2.gz
fw_d.log.102.gz
fw_d.log.12.gz
This orders as:
fw_d.log
fw_d.log.1.gz
fw_d.log.2.gz
fw_d.log.10.gz
fw_d.log.12.gz
fw_d.log.100.gz
fw_d.log.101.gz
fw_d.log.102.gz
I've a set of strings with variable sizes, for example:
AAA23
AB1D1
A1BC
AAB212
My goal is have in alphabetical order and unique characters collected for COLUMNS, such as:
first column : AAAA
second column : AB1A
and so on...
For this moment I was able to extract the posts through a hash of hashes. But now, how can I sort data? Could I for each hash of hash make a new array?
Thank you very much for you help!
Al
My code:
#!/usr/bin/perl
use strict;
use warnings;
my #sessions = (
"AAAA",
"AAAC",
"ABAB",
"ABAD"
);
my $length_max = 0;
my $length_tmp = 0;
my %columns;
foreach my $string (#sessions){
my $l = length($string);
if ($l > $length_tmp){
$length_max = $l;
}
}
print "max legth : $length_max\n\n";
my $n = 1;
foreach my $string (#sessions){
my #ch = split("",$string);
for my $col (1..$length_max){
$columns{$n}{$col} = $ch[$col-1];
}
$n++;
}
foreach my $col (keys %columns) {
print "colonna : $col\n";
my $deref = $columns{$col};
foreach my $pos (keys %$deref){
print " posizione : $pos --> $$deref{$pos}\n";
}
print "\n";
}
exit(0);
What you're doing is rotating the array. It doesn't need a hash of hash or anything, just another array. Surprisingly, neither List::Util nor List::MoreUtils supplies one. Here's a straightforward implementation with a test. I presumed you want short entries filled in with spaces so the columns come out correct.
#!/usr/bin/perl
use strict;
use warnings;
use Test::More;
use List::Util qw(max);
my #Things = qw(
AAA23
AB1D1
A1BC
AAB212
);
sub rotate {
my #rows = #_;
my $maxlength = max map { length $_ } #rows;
my #columns;
for my $row (#rows) {
my #chars = split //, $row;
for my $colnum (1..$maxlength) {
my $idx = $colnum - 1;
$columns[$idx] .= $chars[$idx] || ' ';
}
}
return #columns;
}
sub print_columns {
my #columns = #_;
for my $idx (0..$#columns) {
printf "Column %d: %s\n", $idx + 1, $columns[$idx];
}
}
sub test_rotate {
is_deeply [rotate #_], [
"AAAA",
"AB1A",
"A1BB",
"2DC2",
"31 1",
" 2",
];
}
test_rotate(#Things);
print_columns(#Things);
done_testing;
You can sort the output of %columns in your code with
foreach my $i (sort { $a <=> $b } keys %columns) {
print join(" " => sort values %{ $columns{$i} }), "\n";
}
This gives
A A A A
A A A C
A A B B
A A B D
But using index numbers as hash keys screams that you should use an array instead, so let's do that. To get the columns, use
sub columns {
my #strings = #_;
my #columns;
while (#strings) {
push #columns => [ sort map s/^(.)//s ? $1 : (), #strings ];
#strings = grep length, #strings;
}
#columns;
}
Given the strings from your question, it returns
A A A A
1 A A B
1 A B B
2 2 C D
1 1 3
2
As you can see, this is unsorted and repeats characters. With Perl, when you see the word unique, always think of hashes!
sub unique_sorted_columns {
map { my %unique;
++$unique{$_} for #$_;
[ sort keys %unique ];
}
columns #_;
}
If you don't mind destroying information, you can have columns sort and filter duplicates:
sub columns {
my #strings = #_;
my #columns;
while (#strings) {
my %unique;
map { ++$unique{$1} if s/^(.)//s } #strings;
push #columns => [ sort keys %unique ];
#strings = grep length, #strings;
}
#columns;
}
Output:
A
1 A B
1 A B
2 C D
1 3
2
I think I need some sort of Schwartzian Transform to get this working, but I'm having trouble figuring it out, as perl isn't my strongest language.
I have a directory with contents as such:
album1.htm
album2.htm
album3.htm
....
album99.htm
album100.htm
I'm trying to get the album with the highest number from this directory (in this case, album100.htm). Note that timestamps on the files are not a reliable means of determining things, as people are adding old "missing" albums after the fact.
The previous developer simply used the code snippet below, but this clearly breaks down once there are more than 9 albums in a directory.
opendir(DIR, PATH) || print $!;
#files = readdir(DIR);
foreach $file ( sort(#files) ) {
if ( $file =~ /album/ ) {
$last_file = $file;
}
}
If you just need to find the album with the highest number, you don't really need to sort the list, just run through it and keep track of the maximum.
#!/usr/bin/perl
use strict;
use warnings;
my $max = 0;
while ( <DATA> ) {
my ($album) = $_ =~ m/album(\d+)/;
$max = $album if $album > $max;
}
print "album$max.htm";
__DATA__
album1.htm
album100.htm
album2.htm
album3.htm
album99.htm
To find the highest number, try a custom sort...
sub sort_files {
(my $num_a = $a) =~ s/^album(\d+)\.htm$/$1/;
(my $num_b = $b) =~ s/^album(\d+)\.htm$/$1/;
return $num_a <=> $num_b;
}
my #sorted = sort \&sort_files #files;
my $last = pop #sorted;
Also, take a look at the File::Next module. It will let you pick out just the files that begin with the word "album". I find it a little easier than readdir.
The reason why you're encountering difficulties is the operator, <=> is the numeric comparison, cmp is the default and it is string comparison.
$ perl -E'say for sort qw/01 1 02 200/';
01
02
1
200
With a slight modification we get something much closer to correct:
$ perl -E'say for sort { $a <=> $b } qw/01 1 02 200/';
01
1
02
200
However, in your case you need to remove the non digits.
$ perl -E'say for sort { my $s1 = $a =~ m/(\d+)/; my $s2 = $b =~ /(\d+)/; $s1 <=> $s2 } qw/01 1 02 200/';
01
1
02
200
Here is it more pretty:
sort {
my $s1 = $a =~ m/(\d+)/;
my $s2 = $b =~ /(\d+)/;
$s1 <=> $s2
}
This isn't flawless, but it should give you a good idea of your issue with sort.
Oh, and as a follow up, the Shcwartzian Transform solves a different problem: it stops you from having to run a complex task (unlike the one you're needing -- a regex) multiple times in the search algorithm. It comes at a memory cost of having to cache the results (not to be unexpected). Essentially, what you do is map the input of the problem, to the output (typically in an array) [$input, $output] then you sort on the outputs $a->[1] <=> $b->[1]. With your stuff now sorted you map back over to get your original inputs $_->[0].
map $_->[0],
sort { $a->[1] <=> $b->[1] }
map [ $_, fn($_) ]
, qw/input list here/
;
It is cool because it is so compact while being so efficient.
Here you go, using Schwartzian Transform:
my #files = <DATA>;
print join '',
map { $_->[1] }
sort { $a->[0] <=> $b->[0] }
map { [ m/album(\d+)/, $_ ] }
#files;
__DATA__
album12.htm
album1.htm
album2.htm
album10.htm
Here's an alternative solution using reduce:
use strict;
use warnings;
use List::Util 'reduce';
my $max = reduce {
my ($aval, $bval) = ($a =~ m/album(\d+)/, $b =~ m/album(\d+)/);
$aval > $bval ? $a : $b
} <DATA>;
print "max album is $max\n";
__DATA__
album1.htm
album100.htm
album2.htm
album3.htm
album99.htm
Here's a generic solution:
my #sorted_list
= map { $_->[0] } # we stored it at the head of the list, so we can pull it out
sort {
# first test a normalized version
my $v = $a->[1] cmp $b->[1];
return $v if $v;
my $lim = #$a > #$b ? #$a : #$b;
# we alternate between ascii sections and numeric
for ( my $i = 2; $i < $lim; $i++ ) {
$v = ( $a->[$i] || '' ) cmp ( $b->[$i] || '' );
return $v if $v;
$i++;
$v = ( $a->[$i] || 0 ) <=> ( $b->[$i] || 0 );
return $v if $v;
}
return 0;
}
map {
# split on digits and retain captures in place.
my #parts = split /(\d+)/;
my $nstr = join( '', map { m/\D/ ? $_ : '0' x length() } #parts );
[ $_, $nstr, #parts ];
} #directory_names
;