Perl : sort <=> not working

Perl : sort <=> not working - perl

I am not sure why this Perl sorting is not working.
Please suggest how to resolve this.
while (<>) {
chomp;
if (/VIOLATE/) {
#lines = split " ", $_;
#print "$lines[-2]\n"; ## Print last but one column
my #viol = "$lines[-2]\n";
#sorted = sort {$a <=> $b} #viol;
print "#sorted";
}
};
Command : perl test.pl test.log
test.log :
0.98 2.04 -1.106 VIOLATE
0.98 2.04 3.06
0.98 2.04 -11.06 VIOLATE
0.98 2.04 -1.06 VIOLATE
0.98 2.04 1.06
0.98 2.04 -0.226 VIOLATE
0.98 2.04 -2.06 VIOLATE

Are you trying to match any line with VIOLATE in it, put the result in an array then sort all the violations? If so you need to declare and sort #viol outside the loop:
use strict;
use warnings; # Don't forget these!
my #viol;
while (<>) {
chomp;
if (/VIOLATE/) {
my #lines = split(/\s+/); # Split on one or more whitespace characters.
push #viol, $lines[-2];
}
}
# sort and print
my #sorted = sort {$a <=> $b} #viol;
print "#sorted";
This outputs: -11.06 -2.06 -1.106 -1.06 -0.226

Your sort works just fine. The only problem is that your array only has one element. Right above the sort, you do the assignment.
If you want this to work, you need to fill your array before you sort.
This is also a one-liner:
perl -lanwe 'push(#a, $F[-2]) if /VIOLATE/ }{ print for sort { $a <=> $b } #a'
Note the use of the "Eskimo Kiss" operator, }{. It works in a way similar to an END { ... } block, in that whatever comes after it is executed at the end of the input.
For the curious: The "Eskimo Kiss" works because the switch -n adds a while(<>) { ... } loop around the -e program string, in a very literal way. Deparsed, it looks like this, with comments for clarity:
perl -MO=Deparse -lanwe 'push(#a, $F[-2]) if /VIOLATE/ }{ print for sort { $a <=> $b } #a'
BEGIN { $^W = 1; } # warnings enabled by -w
BEGIN { $/ = "\n"; $\ = "\n"; } # line endings enabled by -l
LINE: while (defined($_ = <ARGV>)) { # while(<>) loop added by -n
chomp $_; # chomp added by -l
our(#F) = split(' ', $_, 0); # autosplit enabled by -a
push #a, $F[-2] if /VIOLATE/; # our code
} # eskimo kiss close
{ # eskimo kiss open
print $_ foreach (sort {$a <=> $b} #a); # our END code
} # closing bracket added by -n
-e syntax OK

Related

Pick up the longest peptide using perl

I want to find out the longest possible protein sequence translated from cds in 6 forward and reverse frame.
This is the example input format:
>111
KKKKKKKMGFSOXLKPXLLLLLLLLLLLLLLLLLMJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJX
>222
WWWMPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPMPPPPPXKKKKKK
I would like to find out all the strings which start from "M" and stop at "X", count the each length of the strings and select the longest.
For example, in the case above:
the script will find,
>111 has two matches:
MGFSOX
MJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJX
>222 has one match:
MPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPMPPPPPX
Then count each match's length, and print the string and number of longest matches which is the result I want:
>111
MJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJX 32
>222
MPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPMPPPPPX 38
But it prints out no answer. Does anyone know how to fix it? Any suggestion will be helpful.
#!/usr/bin/perl -w
use strict;
use warnings;
my #pep=();
my $i=();
my #Xnum=();
my $n=();
my %hash=();
my #k=();
my $seq=();
$n=0;
open(IN, "<$ARGV[0]");
while(<IN>){
chomp;
if($_=~/^[^\>]/){
#pep=split(//, $_);
if($_ =~ /(X)/){
push(#Xnum, $1);
if($n >= 0 && $n <= $#Xnum){
if(#pep eq "M"){
for($i=1; $i<=$#pep; $i++){
$seq=join("",#pep);
$hash{$i}=$seq;
push(#k, $i);
}
}
elsif(#pep eq "X"){
$n=$n+1;
}
foreach (sort {$a cmp $b} #k){
print "$hash{$k[0]}\t$k[0]";
}
}
}
}
elsif($_=~/^\>/){
print "$_\n";
}
}
close IN;

Check out this Perl one-liner
$ cat iris.txt
>111
KKKKKKKMGFSOXLKPXLLLLLLLLLLLLLLLLLMJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJX
>222
WWWMPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPMPPPPPXKKKKKK
$ perl -ne ' if(!/^>/) { print "$p"; while(/(M[^M]+?X)/g ) { if(length($1)>length($x)) {$x=$1 } } print "$x ". length($x)."\n";$x="" } else { $p=$_ } ' iris.txt
>111
MJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJX 32
>222
MPPPPPX 7
$

There's more than one way to do it!
Try this too:
print and next if /^>/;
chomp and my #z = $_ =~ /(M[^X]*X)/g;
my $m = "";
for my $s (#z) {
$m = $s if length $s > length $m
}
say "$m\t" . length $m
Output:
>111
MJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJX 32
>222
MPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPMPPPPPX 38
uses >=5.14 and make sure to run script with perl -n
As a one-liner:
perl -E 'print and next if /^>/; chomp and my #z = $_ =~ /(M[^X]*X)/g; my $m = ""; for my $s (#z) { $m = $s if length $s > length $m } say "$m\t" . length $m' -n data.txt

Here is solution using reduce from List::Util.
Edit: mistakenly used maxstr which gave results but is not what was needed. Have reedited this post to use reduce (correctly) instead.
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw/reduce/;
open my $fh, '<', \<<EOF;
>111
KKKKKKKMGFSOXLKPXLLLLLLLLLLLLLLLLLMJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJX
>222
WWWMPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPMPPPPPXKKKKKK
EOF
my $id;
while (<$fh>) {
chomp;
if (/^>/) {
$id = $_;
}
else {
my $data = reduce {length($a) > length($b) ? $a : $b} /M[^X]*X/g;
print "$id\n$data\t" . length($data) . "\n" if $data;
}
}

Here's my take on it.
I like fasta files tucked into a hash, with the fasta name as the key. This way you can just add descriptions to it, e.g. base composition etc...
#!/usr/local/ActivePerl-5.20/bin/env perl
use strict;
use warnings;
my %prot;
open (my $fh, '<', '/Users/me/Desktop/fun_prot.fa') or die $!;
my $string = do { local $/; <$fh> };
close $fh;
chomp $string;
my #fasta = grep {/./} split (">", $string);
for my $aa (#fasta){
my ($key, $value) = split ("\n", $aa);
$value =~ s/[A-Z]*(M.*M)[A-Z]/$1/;
$prot{$key}->{'len'} = length($value);
$prot{$key}->{'prot'} = $value;
}
for my $sequence (sort { $prot{$b}->{'len'} <=> $prot{$a}->{'len'} } keys %prot){
print ">" . $sequence, "\n", $prot{$sequence}->{'prot'}, "\t", $prot{$sequence}->{'len'}, "\n";
last;
}
__DATA__
>1232
ASDFASMJJJJJMFASDFSDAFSDDFSA
>2343
AASFDFASMJJJJJJJJJJJJJJMRGQEGDAGDA
Output
>2343
MJJJJJJJJJJJJJJM 16

Why does the sort fail in the script but not in the one-liner?

The following snippet works fine:
cat versions.txt | perl -Mversion -le'
chomp( #a = <> );
print for sort { $left = $a; $right = $b; $left =~ s!^\D+!!; $right =~ s!^\D+!!; version->parse($left) cmp version->parse($right) } #a;
'
The following code does not, but I don't understand why. They seem to be effectively the same code.
use strict;
use warnings;
use version;
chomp(my #versions = <>);
#versions = sort { version->parse($a =~ s/^\D+//r) cmp version->parse($b =~ s/^D+//r) } #versions;
print $_."\n" for #versions;
I get:
Invalid version format (non-numeric data) at script line 7

In the non-working code, there's a simple typo. You have:
#versions = sort { version->parse($a =~ s/^\D+//r) cmp version->parse($b =~ s/^D+//r) } #versions;
That might be better formatted as:
#versions = sort { version->parse($a =~ s/^\D+//r) cmp
version->parse($b =~ s/^D+//r) } #versions;
Split over two lines, the asymmetry (bug) is obvious.
You are missing the backslash before the second D:
#versions = sort { version->parse($a =~ s/^\D+//r) cmp
version->parse($b =~ s/^\D+//r) } #versions;

Display text file 5 lines at a time in Perl

I'm not sure how to do this.
I have a file (which will never be large, so won't need a module) and want to break it down so that I can display it on my web page 5 lines per row.
This is as far as I have got.
$row="5";
#DD=<DATA>;
foreach $line (#DD) {
$count++;
chomp($line);
if ($count <= $row) {
print qq~$line ~; # This shows5, but don't know what to do next.
}
}
exit;
__DATA__
aaaa
bbbb
cccc
dddd
eeee
ffff
gggg
hhhh
iiii
jjjj
kkkk
llll
mmmm
Expected result (should be in 3 line but your forum software won't let me)
aaaa bbbb cccc dddd eeee
ffff gggg hhhh iiii jjjj
kkkk llll mmmm
Could someone help please?

You'd have to reset the count and print a new line at 5.
print qq~$line~;
if ( $count == $row ) {
print "\n";
$count = 0;
}
else {
print ' ';
}
However, easier still is a modulus:
use strict;
use warnings;
my $row = 5;
my $count = 0;
foreach my $line ( <DATA> ) {
chomp( $line );
print $line, ++$count % $row ? ' ' : "\n";
}
If $count is a multiple of $row print a newline, else print a space.

When you reach the limit (5) reset the counter to 0 and print a newline
$count = 0;
print "\n";
BTW there are a number of improvements you could do to your code, but the most important would be to use strict and warnings
I think this will work:
use strict;
use warnings;
my $rows = 5;
my $count = 0;
my #lines = <DATA>;
chomp #lines;
foreach my $line (#lines) {
$count++;
if ($count <= $rows) {
print qq{$line };
} else {
$count = 0;
print "\n";
}
}

There are many problems with your code. See my comments below.
#!/usr/bin/env perl
use strict;
use warnings;
my $threshold = 5;
my #buffer;
while (my $line = <DATA>) {
$line =~ s/\s\z//;
push #buffer, $line;
if (#buffer % $threshold == 0) {
print join(' ', #buffer), "\n";
#buffer = ();
}
}
#buffer
and print join(' ', #buffer), "\n";
__DATA__
aaaa
bbbb
cccc
dddd
eeee
ffff
gggg
hhhh
iiii
jjjj
kkkk
llll
mmmm
Here is a list of things you should think about:
First, You should use strict and warnings.
$row="5";
$row is intended to be used as numeric variable. Why assign a string to it?
#DD=<DATA>;
foreach $line (#DD) {
No need to create an extra array by slurping, of all things, your __DATA__ section. Instead, use while and read line-by-line.
$count++;
Perl's builtin $. counts the number of lines read. No need for an additional variable.
For variety: If you insist on slurping, you can slurp into a string:
#!/usr/bin/env perl
use strict;
use warnings;
my $threshold = 5;
my $contents = do { local $/; <DATA> };
while ($contents) {
($contents, my #fields) = reverse split(qr{\n}, $contents, $threshold + 1);
print join(' ', reverse #fields), "\n";
}
or, continue to slurp into an array and use splice:
#!/usr/bin/env perl
use strict;
use warnings;
my $threshold = 5;
my #contents = <DATA>;
while (#contents) {
print join(' ', map { chomp; $_ } splice #contents, 0, $threshold), "\n";
}

# always start your Perl 5 files with these
# two pragmas until you know exactly why they
# are recommended
use strict;
use warnings;
my $row = 5;
while ( <> ){
chomp;
print;
print $. % $row ? ' ' : "\n";
}
# makes sure there is always a trailing newline
print "\n" if $. % $row;
$ time ./example.pl /usr/share/dict/words
...
real 0m2.217s
user 0m0.097s
sys 0m0.084s
In Perl 6 I would probably write it as:
'filename'.IO.lines.rotor(5, :partial).map: *.say;
( currently takes about 15 seconds to process /usr/share/dict/words under the Moar backend, but it hasn't had 20 years of optimizations applied to it like Perl 5 has. It may be faster with the JVM backend )

sorting some data

I am trying to sort some data in bash. Data looks like below.
20110724.gz 1347
20110724.gz 2128
20110725.gz 1315
20110725.gz 2334
20110726.gz 808
20110726.gz 1088
-bash-3.2$
After sorting, it should look like
20110724.gz 3475
20110725.gz 3649
20110726.gz 1896
Basically, for a given date, the data are summed up. Can somebody help? Thanks.
hmm, hopefully I figure it out in a few days.

Here's a quick and dirty perl oneliner:
$ perl -e 'my %h = (); while (<>) { chomp; my ($fname, $count) = split; $h{$fname} += $count;} foreach my $k (sort keys %h) {print $k, " ", $h{$k}, "\n"}' < datafile

Here's a perl solution.
Usage: script.pl input.txt > output.txt
Code:
use warnings;
use strict;
use ARGV::readonly;
my %sums;
while (<>) {
my ($date, $num) = split;
$sums{$date} += $num;
}
for my $date (sort keys %sums) {
print "$date $sums{$date}\n";
}
Or as a one-liner:
$ perl -we 'my %h; while(<>) { ($d,$n)=split; $h{$d}+=$n; } print "$_ $h{$_}\n" for sort keys %h;' data2.txt
In case you do need a numerical sort on the dates:
sort { substr($a,0,8) <=> substr($b,0,8) } keys %sums;

You don't need perl for doing that. Some shell trickery will help :)
sort -n -k1,8 <file | while true ; do
if ! read line ; then
test -n "$accfile" && echo $accfile $value
break
fi
line=$(echo $line | tr -s ' ' )
curfile=$(echo $line | cut -d\ -f1)
curvalue=$(echo $line | cut -d\ -f2)
if [ $curfile != "$accfile" ] ; then
# new file, output the last if not empty
test -n "$accfile" && echo $accfile $value
accfile=$curfile
value=$curvalue
else
value=$(expr $value \+ $curvalue)
fi
done
The k parameter tells sort what characters use to sort. As dates are put in number-ordered format, a number sort (-n) works.

How can I iterate through nested arrays?

I have created an array as follows
while (defined ($line = `<STDIN>`))
{
chomp ($line);
push #stack,($line);
}
each line has two numbers.
15 6
2 8
how do iterate over each item in each line?
i.e. I want to print
15
6
2
8
I understand it's something like
foreach (#{stack}) (#stack){
print "?????
}
This is where I am stuck.

See the perldsc documentation. That's the Perl Data Structures Cookbook, which has examples for dealing with arrays of arrays. From what you're doing though, it doesn't look like you need an array of arrays.
For your problem of taking two numbers per line and outputting one number per line, just turn the whitespace into newlines:
while( <> ) {
s/\s+/\n/; # turn all whitespace runs into newlines
print; # it's ready to print
}
With Perl 5.10, you can use the new \h character class that matches only horizontal whitespace:
while( <> ) {
s/\h+/\n/; # turn all horizontal whitespace runs into newlines
print; # it's ready to print
}
As a Perl one-liner, that's just:
% perl -pe 's/\h+/\n/' file.txt

#!/usr/bin/perl
use strict;
use warnings;
while ( my $data = <DATA> ) {
my #values = split ' ', $data;
print $_, "\n" for #values;
}
__DATA__
15 6
2 8
Output:
C:\Temp> h
15
6
2
8
Alternatively, if you want to store each line in #stack and print out later:
my #stack = map { [ split ] } grep { chomp; length } <DATA>;
The line above slurps everything coming from the DATA filehandle into a list of lines (because <DATA> happens in list context). The grep chomps each line and filters by length after chomping (to avoid getting any trailing empty lines in the data file -- you can avoid it if there are none). The map then splits each line along spaces, and then creates an anonymous array reference for each line. Finally, such array references are stored in each element of #stack. You might want to use Data::Dumper to look at #stack to understand what's going on.
print join("\n", #$_), "\n" for #stack;
Now, we look over each entry in stack, dereferencing each array in turn, then joining the elements of each array with newlines to print one element per line.
Output:
C:\Temp> h
15
6
2
8
The long way of writing essentially the same thing (with less memory consumption) would be:
my #stack;
while ( my $line = <DATA> ) {
last unless $line =~ /\S/;
my #values = split ' ', $line;
push #stack, \#values;
}
for my $ref ( #stack ) {
print join("\n", #$ref), "\n";
}
Finally, if you wanted do something other than printing all values, say, sum all the numbers, you should store one value per element of #stack:
use List::Util qw( sum );
my #stack;
while ( my $line = <DATA> ) {
last unless $line =~ /\S/;
my #values = split ' ', $line;
push #stack, #values;
}
printf "The sum is %d\n", sum #stack;

#!/usr/bin/perl
while ($line = <STDIN>) {
chomp ($line);
push #stack, $line;
}
# prints each line
foreach $line (#stack) {
print "$line\n";
}
# splits each line into items using ' ' as separator
# and prints the items
foreach $line (#stack) {
#items = split / /, $line;
foreach $item (#items) {
print $item . "\n";
}
}

I use 'for' for "C" style loops, and 'foreach' for iterating over lists.

#!/usr/bin/perl
use strict;
use warnings;
open IN, "< read.txt" or
die "Can't read in 'read.txt'!";
my $content = join '', <IN>;
while ($content =~ m`(\d+)`g) {
print "$1\n";
}