How to properly declare hash outside BEGIN block? - perl

Consider this simple program. Can you explain why the output is different after uncommenting the first two lines? What is happening with my hash with use strict? How to fix the program to work with use strict?
echo -e "key1\nkey2\nkey3" | perl -lne '
#use strict; use warnings;
#my %hash;
BEGIN {
$hash{'key3'} = "value";
}
chomp;
if ($hash{$_}) {
print "$_ matched";
} else {
print "$_ unmatched ";
}
'
Output:
key1 unmatched
key2 unmatched
key3 matched

perl -ln wraps the code so that you end up with
BEGIN { $/ = "\n"; $\ = "\n"; }
LINE: while (<>) {
chomp;
use strict; use warnings;
my %hash;
BEGIN { $hash{key3} = "value"; }
...
}
Notice how you create a new %hash for every line? If you wanted to use use strict;, you'd use a package variable.
perl -lne'
use strict; use warnings;
our %hash;
BEGIN { $hash{key3} = "value"; }
...
'
Otherwise, you have to drop -n.
perl -le'
use strict; use warnings;
my %hash = ( key3 => "value" );
while (<>) {
chomp;
...
}
'
PS - As you may have noticed, it's useless to chomp when using both -l and -n.

Using the -n switch implictly puts your entire program inside a while (defined($_=<ARGV>)) { ... block, including your my %hash statement:
perl -MO=Deparse -lne '
use strict; use warnings; my %hash;
BEGIN {
$hash{'key3'} = "value";
}
chomp;
if ($hash{$_}) {
print "$_ matched";
} else {
print "$_ unmatched ";
}
'
BEGIN { $/ = "\n"; $\ = "\n"; }
LINE: while (defined($_ = <ARGV>)) {
chomp $_;
use warnings;
use strict 'refs';
my %hash;
sub BEGIN {
$hash{'key3'} = 'value';
}
&BEGIN;
chomp $_;
if ($hash{$_}) {
print "$_ matched";
}
else {
print "$_ unmatched ";
}
}
-e syntax OK
That is, my %hash is redeclared in each iteration of the loop. To keep this as a one-liner and without making it too cumbersome, consider declaring our %hash to make %hash a package variable.

Related

Pick up the longest peptide using perl

I want to find out the longest possible protein sequence translated from cds in 6 forward and reverse frame.
This is the example input format:
>111
KKKKKKKMGFSOXLKPXLLLLLLLLLLLLLLLLLMJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJX
>222
WWWMPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPMPPPPPXKKKKKK
I would like to find out all the strings which start from "M" and stop at "X", count the each length of the strings and select the longest.
For example, in the case above:
the script will find,
>111 has two matches:
MGFSOX
MJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJX
>222 has one match:
MPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPMPPPPPX
Then count each match's length, and print the string and number of longest matches which is the result I want:
>111
MJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJX 32
>222
MPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPMPPPPPX 38
But it prints out no answer. Does anyone know how to fix it? Any suggestion will be helpful.
#!/usr/bin/perl -w
use strict;
use warnings;
my #pep=();
my $i=();
my #Xnum=();
my $n=();
my %hash=();
my #k=();
my $seq=();
$n=0;
open(IN, "<$ARGV[0]");
while(<IN>){
chomp;
if($_=~/^[^\>]/){
#pep=split(//, $_);
if($_ =~ /(X)/){
push(#Xnum, $1);
if($n >= 0 && $n <= $#Xnum){
if(#pep eq "M"){
for($i=1; $i<=$#pep; $i++){
$seq=join("",#pep);
$hash{$i}=$seq;
push(#k, $i);
}
}
elsif(#pep eq "X"){
$n=$n+1;
}
foreach (sort {$a cmp $b} #k){
print "$hash{$k[0]}\t$k[0]";
}
}
}
}
elsif($_=~/^\>/){
print "$_\n";
}
}
close IN;
Check out this Perl one-liner
$ cat iris.txt
>111
KKKKKKKMGFSOXLKPXLLLLLLLLLLLLLLLLLMJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJX
>222
WWWMPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPMPPPPPXKKKKKK
$ perl -ne ' if(!/^>/) { print "$p"; while(/(M[^M]+?X)/g ) { if(length($1)>length($x)) {$x=$1 } } print "$x ". length($x)."\n";$x="" } else { $p=$_ } ' iris.txt
>111
MJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJX 32
>222
MPPPPPX 7
$
There's more than one way to do it!
Try this too:
print and next if /^>/;
chomp and my #z = $_ =~ /(M[^X]*X)/g;
my $m = "";
for my $s (#z) {
$m = $s if length $s > length $m
}
say "$m\t" . length $m
Output:
>111
MJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJX 32
>222
MPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPMPPPPPX 38
uses >=5.14 and make sure to run script with perl -n
As a one-liner:
perl -E 'print and next if /^>/; chomp and my #z = $_ =~ /(M[^X]*X)/g; my $m = ""; for my $s (#z) { $m = $s if length $s > length $m } say "$m\t" . length $m' -n data.txt
Here is solution using reduce from List::Util.
Edit: mistakenly used maxstr which gave results but is not what was needed. Have reedited this post to use reduce (correctly) instead.
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw/reduce/;
open my $fh, '<', \<<EOF;
>111
KKKKKKKMGFSOXLKPXLLLLLLLLLLLLLLLLLMJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJX
>222
WWWMPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPMPPPPPXKKKKKK
EOF
my $id;
while (<$fh>) {
chomp;
if (/^>/) {
$id = $_;
}
else {
my $data = reduce {length($a) > length($b) ? $a : $b} /M[^X]*X/g;
print "$id\n$data\t" . length($data) . "\n" if $data;
}
}
Here's my take on it.
I like fasta files tucked into a hash, with the fasta name as the key. This way you can just add descriptions to it, e.g. base composition etc...
#!/usr/local/ActivePerl-5.20/bin/env perl
use strict;
use warnings;
my %prot;
open (my $fh, '<', '/Users/me/Desktop/fun_prot.fa') or die $!;
my $string = do { local $/; <$fh> };
close $fh;
chomp $string;
my #fasta = grep {/./} split (">", $string);
for my $aa (#fasta){
my ($key, $value) = split ("\n", $aa);
$value =~ s/[A-Z]*(M.*M)[A-Z]/$1/;
$prot{$key}->{'len'} = length($value);
$prot{$key}->{'prot'} = $value;
}
for my $sequence (sort { $prot{$b}->{'len'} <=> $prot{$a}->{'len'} } keys %prot){
print ">" . $sequence, "\n", $prot{$sequence}->{'prot'}, "\t", $prot{$sequence}->{'len'}, "\n";
last;
}
__DATA__
>1232
ASDFASMJJJJJMFASDFSDAFSDDFSA
>2343
AASFDFASMJJJJJJJJJJJJJJMRGQEGDAGDA
Output
>2343
MJJJJJJJJJJJJJJM 16

Printing specific line from a hash in perl

My question is how can I print a specific line from a hash. The code so far(Thank you Joel Berger for it) is:
#!/usr/bin/env perl
use strict;
use warnings;
use LWP::Simple;
my $content = get('http://temptrax.itworks.com/temp');
my %probes = $content =~ /Probe\s*(\d)\|\s*(\-?[\d\.]+)/gx;
foreach my $probe (sort keys %probes) {
print "$probe => $probes{$probe}\n";
}
The output of it is:
1 => 74.0
2 => -99.9
3 => 74.4
4 => 68.1
How can I get a specific line to print? Such as if I put in number 1, then only line 1 would print. Thank you for taking a look at this.
UPDATE: I was finally able to figure it out after some reading
#!/usr/bin/env perl
use v5.10.1;
use strict;
use warnings;
use LWP::Simple;
my $content = get('http://temptrax.itworks.com/temp');
my %probes = $content =~ /Probe\s*(\d)\|\s*(\-?[\d\.]+)/gx;
for ($ARGV[0]) {
when(1) {print "$probes{1}\n"; }
when(2) {print "$probes{2}\n"; }
when(3) {print "$probes{3}\n"; }
when(4) {print "$probes{4}\n"; }
default {print "error"; }
}
UPDATE2: Figured out an even easier way to do it
#!/usr/bin/env perl
use v5.10.1;
use warnings;
use LWP::Simple;
my $content = get('http://temptrax.itworks.com/temp');
my %probes = $content =~ /Probe\s*(\d)\|\s*(\-?[\d\.]+)/gx;
$MyVar = $ARGV[0];
print $probes{$MyVar};
print $probes{1};
Perhaps? Rather simple. Or:
print "$_ => ", $probes{$_} for 1,2,4; # selected numbers

making one single line

keyword harry /
sally/
tally/
want that whenever the string matches with keyword it should also look for "/" character.This signifies continuation of line
Then I want output as
keyword harry sally tally
==========================
My current code:
#!/usr/bin/perl
open (file2, "trial.txt");
$keyword_1 = keyword;
foreach $line1 (<file2>) {
s/^\s+|\s+$//g;
if ($line1 =~ $keyword_1) {
$line2 =~ (s/$\//g, $line1) ;
print " $line2 " ;
}
}
If the ===== lines in your question are supposed to be in the output, then use
#! /usr/bin/env perl
use strict;
use warnings;
*ARGV = *DATA; # for demo only; delete
sub print_line {
my($line) = #_;
$line =~ s/\n$//; # / fix Stack Overflow highlighting
print $line, "\n",
"=" x (length($line) + 1), "\n";
}
my $line = "";
while (<>) {
$line .= $line =~ /^$|[ \t]$/ ? $_ : " $_";
if ($line !~ s!/\n$!!) { # / ditto
print_line $line;
$line = "";
}
}
print_line $line if length $line;
__DATA__
keyword jim-bob
keyword harry /
sally/
tally/
Output:
keyword jim-bob
================
keyword harry sally tally
==========================
You did not specify what to do with the lines that do not contain the keyword. You might use this code as an inspiration, though:
#!/usr/bin/perl
use warnings;
use strict;
my $on_keyword_line;
while (<>) {
if (/keyword/ or $on_keyword_line) {
chomp;
if (m{/$}) {
chop;
$on_keyword_line = 1;
} else {
$on_keyword_line = 0;
}
print;
} else {
$on_keyword_line = 0;
print "\n";
}
}
A redo is useful when dealing with concatenating continuation lines.
my $line;
while ( defined( $line = <DATA> )) {
chomp $line;
if ( $line =~ s{/\s*$}{ } ) {
$line .= <DATA>;
redo unless eof(DATA);
}
$line =~ s{/}{};
print "$line\n";
}
__DATA__
keyword harry /
sally/
tally/
and
done!!!
$ ./test.pl
keyword harry sally tally and
done!!!
I think you need to simply concatenate all lines that end in a slash, regardless of the keyword.
I suggest this code.
Updated to account for the OP's comment that continuation lines are terminated by backslashes.
while (<>) {
s|\\\s*\z||;
print;
}

Split and add digits

If I open a file with strings like "233445", how can I then split that string into digits "2 3 3 4 4 5" and add each one to each other "2 + 3 + 3 etc..." and print out the result.
My code so far looks like this:
use strict;
#open (FILE, '<', shift);
#my #strings = <FILE>;
#strings = qw(12243434, 345, 676744); ## or a contents of a file
foreach my $numbers (#strings) {
my #done = split(undef, $numbers);
print "#done\n";
}
But I don't know where to start for the actual add function.
use strict;
use warnings;
my #strings = qw( 12243434 345 676744 );
for my $string (#strings) {
my $sum;
$sum += $_ for split(//, $string);
print "$sum\n";
}
or
use strict;
use warnings;
use List::Util qw( sum );
my #strings = qw( 12243434 345 676744 );
for my $string (#strings) {
my $sum = sum split(//, $string);
print "$sum\n";
}
PS — Always use use strict; use warnings;. It would have detected your misuse of commas in qw, and it would have dected your misuse of undef for split's first argument.
use strict;
my #done;
#open (FILE, '<', shift);
#my #strings = <FILE>;
my #strings = qw(12243434, 345, 676744); ## or a contents of a file
foreach my $numbers (#strings) {
#done = split(undef, $numbers);
print "#done\n";
}
my $tot;
map { $tot += $_} #done;
print $tot, "\n";
No one suggested an eval solution?
my #strings = qw( 12243434 345 676744 );
foreach my $string (#strings) {
my $sum = eval join '+',split //, $string;
print "$sum\n";
}
If your numbers are in a file, a one-liner might be nice:
perl -lnwe 'my $sum; s/(\d)/$sum += $1/eg; print $sum' numbers.txt
Since addition only uses numbers, it is safe to ignore all other characters. So just extract them one at the time with the regex and sum them up.
TIMTOWTDI:
perl -MList::Util=sum -lnwe 'print sum(/\d/g);' numbers.txt
perl -lnwe 'my $a; $a+=$_ for /\d/g; print $a' numbers.txt
Options:
-l auto-chomp input and add newline to print
-n implicit while(<>) loop around program -- open the file name given as argument and read each line into $_.

How many different ways are there to concatenate two files line by line using Perl?

Suppose file1 looks like this:
bye bye
hello
thank you
And file2 looks like this:
chao
hola
gracias
The desired output is this:
bye bye chao
hello hola
thank you gracias
I myself have already come up with five different approaches to solve this problem. But I think there must be more ways, probably more concise and more elegant ways, and I hope I can learn more cool stuff :)
The following is what I have tried so far, based on what I've learnt from the many solutions of my previous problems. Also, I'm trying to sort of digest or internalize the knowledge I've acquired from the Llama book.
Code 1:
#!perl
use autodie;
use warnings;
use strict;
open my $file1,'<','c:/file1.txt';
open my $file2,'<','c:/file2.txt';
while(defined(my $line1 = <$file1>)
and defined(my $line2 = <$file2>)){
die "Files are different sizes!\n" unless eof(file1) == eof(file2);
$line1 .= $line2;
$line1 =~ s/\n/ /;
print "$line1 \n";
}
Code 2:
#!perl
use autodie;
use warnings;
use strict;
open my $file1,'<','c:/file1.txt';
my #file1 = <$file1>;
open my $file2,'<','c:/file2.txt';
my #file2 =<$file2>;
for (my $n=0; $n<=$#file1; $n++) {
$file1[$n] .=$file2[$n];
$file1[$n]=~s/\n/ /;
print $file1[$n];
}
Code 3:
#!perl
use autodie;
use warnings;
use strict;
open my $file1,'<','c:/file1.txt';
open my $file2,'<','c:/file2.txt';
my %hash;
while(defined(my $line1 = <$file1>)
and defined(my $line2 = <$file2>)) {
chomp $line1;
chomp $line2;
my ($key, $val) = ($line1,$line2);
$hash{$key} = $val;
}
print map { "$_ $hash{$_}\n" } sort keys %hash;
Code 4:
#!perl
use autodie;
use warnings;
use strict;
open my $file1,'<','c:/file1.txt';
open my $file2,'<','c:/file2.txt';
while(defined(my $line1 = <$file1>)
and defined(my $line2 = <$file2>)) {
$line1 =~ s/(.+)/$1 $line2/;
print $line1;
}
Code 5:
#!perl
use autodie;
use warnings;
use strict;
open my $file1,'<','c:/file1.txt';
my #file1 =<$file1>;
open my $file2,'<','c:/file2.txt';
my #file2 =<$file2>;
while ((#file1) && (#file2)){
my $m = shift (#file1);
chomp($m);
my $n = shift (#file2);
chomp($n);
$m .=" ".$n;
print "$m \n";
}
I have tried something like this:
foreach $file1 (#file2) && foreach $file2 (#file2) {...}
But Perl gave me a syntactic error warning. I was frustrated. But can we run two foreach loops simultaneously?
Thanks, as always, for any comments, suggestions and of course the generous code sharing :)
This works for any number of files:
use strict;
use warnings;
use autodie;
my #handles = map { open my $h, '<', $_; $h } #ARGV;
while (#handles){
#handles = grep { ! eof $_ } #handles;
my #lines = map { my $v = <$_>; chomp $v; $v } #handles;
print join(' ', #lines), "\n";
}
close $_ for #handles;
The most elegant way doesn't involve perl at all:
paste -d' ' file1 file2
If I were a golfing man, I could rewrite #FM's answer as:
($,,$\)=(' ',"\n");#_=#ARGV;open $_,$_ for #_;print
map{chomp($a=<$_>);$a} #_=grep{!eof $_} #_ while #_
which you might be able to turn into a one-liner but that is just evil. ;-)
Well, here it is, under 100 characters:
C:\Temp> perl -le "$,=' ';#_=#ARGV;open $_,$_ for #_;print map{chomp($a =<$_>);$a} #_=grep{!eof $_ }#_ while #_" file1 file2
If it is OK to slurp (and why the heck not — we are looking for different ways), I think I have discovered the path the insanity:
#_=#ARGV;chomp($x[$.-1]{$ARGV}=$_) && eof
and $.=0 while<>;print "#$_{#_}\n" for #x
C:\Temp> perl -e "#_=#ARGV;chomp($x[$.-1]{$ARGV}=$_) && eof and $.=0 while<>;print qq{#$_{#_}\n} for #x" file1 file2
Output:
bye bye chao
hello hola
thank you gracias
An easier alternative to your Code 5 which allows for an arbitrary number of lines and does not care if files have different numbers of lines (hat tip #FM):
#!/usr/bin/perl
use strict; use warnings;
use File::Slurp;
use List::AllUtils qw( each_arrayref );
my #lines = map [ read_file $_ ], #ARGV;
my $it = each_arrayref #lines;
while ( my #lines = grep { defined and chomp and length } $it->() ) {
print join(' ', #lines), "\n";
}
And, without using any external modules:
#!perl
use autodie; use warnings; use strict;
my ($file1, $file2) = #ARGV;
open my $file1_h,'<', $file1;
my #file1 = grep { chomp; length } <$file1_h>;
open my $file2_h,'<', $file2;
my #file2 = grep { chomp; length } <$file2_h>;
my $n_lines = #file1 > #file2 ? #file1 : #file2;
for my $i (0 .. $n_lines - 1) {
my ($line1, $line2) = map {
defined $_ ? $_ : ''
} $file1[$i], $file2[$i];
print $line1, ' ', $line2, "\n";
}
If you want to concatenate only the lines that appear in both files:
#!perl
use autodie; use warnings; use strict;
my ($file1, $file2) = #ARGV;
open my $file1_h,'<', $file1;
my #file1 = grep { chomp; length } <$file1_h>;
open my $file2_h,'<', $file2;
my #file2 = grep { chomp; length } <$file2_h>;
my $n_lines = #file1 < #file2 ? #file1 : #file2;
for my $i (0 .. $n_lines - 1) {
print $file1[$i], ' ', $file2[$i], "\n";
}
An easy one with minimal error checking:
#!/usr/bin/perl -w
use strict;
open FILE1, '<file1.txt';
open FILE2, '<file2.txt';
while (defined(my $one = <FILE1>) or defined(my $twotemp = <FILE2>)){
my $two = $twotemp ? $twotemp : <FILE2>;
chomp $one if ($one);
chomp $two if ($two);
print ''.($one ? "$one " : '').($two ? $two : '')."\n";
}
And no, you can't run two loops simultaneous within the same thread, you'd have to fork, but that would not be guaranteed to run synchronously.