In my Perl script I have a double infinite while loop. I read lines from a file with the diamond operator. But somehow if my script reaches the last line of the file, it does not return undef, but hangs forever.
If I reduced my code to a single while loop this did not happen. So I wonder if I am doing something wrong or if this is a known limitation of the language. (This is actually my first perl script.)
Below is my script. It is meant to count the size of DNA sequences in fasta files, but the hanging behavior can be observed with any other file with multiple lines of text.
Perl version 5.18.2
Invoked from the commandline like perl script.pl file.fa
$l = <>;
while (1) {
$N = 0;
while (1) {
print "Get line";
$l = <>;
print "Got line";
if (not($l)) {
last;
}
if ($l =~ /^>/) {
last;
}
$N += length($l);
}
print $N;
if (not($N)) {
last;
}
}
I put some debug print statements so that you can see that the last line printed is "Get line" and then it hangs.
Welcome to Perl.
The issue with your code is that you have no way of escaping the outer loop. <> will return undef when it reaches the end of the file. At this point your inner loop ends and the outer loop sends it back in. Forcing further reads causes <> to start looking at STDIN which never sends an EOF, so your loop continues forever.
As this is your first Perl script I'm going to rewrite it for you with some comments. Perl is a fantastic language, you can write some great code, however mostly due to it's age there are some older styles which are no longer advised.
use warnings; # Warn about coding errors
use strict; # Enforce good style
use 5.010; # Enable modernish (10 year old) features
# Another option which mostly does the same as above.
# I normally do this, but it does require a non-standard CPAN library
# use Modern::Perl;
# Much better style to have the condition in the while loop
# Much clearer than having an infinite loop with break/last statements
# Also avoid $l as a variable name, it looks too much like $1
my $count = 0; # Note variable declaration, enforced by strict
while(my $line = <>) {
if ($line =~ /^>/) {
# End of input block, output and reset
say $count;
$count = 0;
} else {
$count += length($line);
}
}
# Have reached the end of the input files
say $count;
try "echo | perl script.pl file.fa".
works for me with same "problem" in my code.
gets EOF from stdin.
Related
I'm having some problem with a subroutine that locates certain files and extracts some data out of them.
This subroutine is called inside a foreach loop, but whenever the call is made the loop skips to its next iteration. So I am wondering whether any of the next;'s are somehow escaping from the subroutine to the foreach loop where it is called?
To my knowledge the sub looks solid though so I'm hoping if anyone can see something I'm missing?
sub FindKit{
opendir(DH, "$FindBin::Bin\\data");
my #kitfiles = readdir(DH);
closedir(DH);
my $nametosearch = $_[0];
my $numr = 1;
foreach my $kitfile (#kitfiles)
{
# skip . and .. and Thumbs.db and non-K-files
if($kitfile =~ /^\.$/) {shift #kitfiles; next;}
if($kitfile =~ /^\.\.$/) {shift #kitfiles; next;}
if($kitfile =~ /Thumbs\.db/) {shift #kitfiles; next;}
if($kitfile =~ /^[^K]/) {shift #kitfiles; next;}
# $kitfile is the file used on this iteration of the loop
open (my $fhkits,"<","data\\$kitfile") or die "$!";
while (<$fhkits>) {}
if ($. <= 1) {
print " Empty File!";
next;
}
seek($fhkits,0,0);
while (my $kitrow = <$fhkits>) {
if ($. == 0 && $kitrow =~ /Maakartikel :\s*(\S+)\s+Montagekit.*?($nametosearch)\s{3,}/g) {
close $fhkits;
return $1;
}
}
$numr++;
close $fhkits;
}
return 0;
}
To summarize comments, the refactored code:
use File::Glob ':bsd_glob';
sub FindKit {
my $nametosearch = $_[0];
my #kitfiles = glob "$FindBin::Bin/data/K*"; # files that start with K
foreach my $kitfile (#kitfiles)
{
open my $fhkits, '<', $kitfile or die "$!";
my $kitrow_first_line = <$fhkits>; # read first line
return if eof; # next read is end-of-file so it was just header
my ($result) = $kitrow_first_line =~
/Maakartikel :\s*(\S+)\s+Montagekit.*?($nametosearch)\s{3,}/;
return $result if $result;
}
return 0;
}
I use core File::Glob and enable :bsd_glob option, which can handle spaces in filenames. I follow the docs note to use "real slash" on Win32 systems.
I check whether there is only a header line using eof.†
I do not see how this can affect the calling code, other than by its return value. Also, I don't see how the posted code can make the caller skip the beat, either. That problem is unlikely to be in this sub.
Please let me know if I missed some point with the above rewrite.
† Previous version used to check whether there is just one (header) line by
1 while <$fhkits>; # check number of lines ...
return if $. == 1; # there was only one line, the header
Also correct but eof is way better
The thing that is almost certainly screwing you here, is that you are shifting the list that you are iterating.
That's bad news, as you're deleting elements ... but in places you aren't necessarily thinking.
For example:
#!/usr/bin/env perl
use strict;
use warnings;
my #list = qw ( one two three );
my $count;
foreach my $value ( #list ) {
print "Iteration ", ++$count," value is $value\n";
if ( $value eq 'two' ) { shift #list; next };
}
print "#list";
How many times do you think that should iterate, and which values should end up in the array?
Because you shift you never process element 'three' and you delete element 'one'. That's almost certainly what's causing you problems.
You also:
open using a relative path, when your opendir used an absolute one.
skip a bunch of files, and then skip anything that doesn't start with K. Why not just search for things that do start with K?
read the file twice, and one is to just check if it's empty. The perl file test -z will do this just fine.
you set $kitrow for each line in the file, but don't really use it for anything other than pattern matching. It'd probably work better using implicit variables.
You only actually do anything on the first line - so you don't ever need to iterate the whole file. ($numr seems to be discarded).
you use a global match, but only use one result. The g flag seems redundant here.
I'd suggest a big rewrite, and do something like this:
#!/usr/bin/env perl
use strict;
use warnings;
use FindBin;
sub FindKit{
my ($nametosearch) = #_;
my $numr = 1;
foreach my $kitfile (glob "$FindBin::Bin\\data\\K*" )
{
if ( -z $kitfile ) {
print "$kitfile is empty\n";
next;
}
# $kitfile is the file used on this iteration of the loop
open (my $fhkits,"<", $kitfile) or die "$!";
<$kitfile> =~ m/Maakartikel :\s*(\S+)\s+Montagekit.*?($nametosearch)\s{3,}/
and return $1;
return 0;
}
}
As a big fan of the Path::Tiny module (me have it always installed and using it in every project) my solution would be:
use strict;
use warnings;
use Path::Tiny;
my $found = FindKit('mykit');
print "$found\n";
sub FindKit {
my($nametosearch) = #_;
my $datadir = path($0)->realpath->parent->child('data');
die "$datadir doesn't exists" unless -d $datadir;
for my $file ($datadir->children( qr /^K/ )) {
next if -z $file; #skip empty
my #lines = $file->lines;
return $1 if $lines[0] =~ /Maakartikel :\s*(\S+)\s+Montagekit.*?($nametosearch)\s{3,}/;
}
return;
}
Some comments and still opened issues:
Using the Path::Tiny you could always use forward slashes in the path-names, regardless of the OS (UNIX/Windows), e.g. the data/file will work on windows too.
AFAIK the FindBin is considered broken - so the above uses the $0 and realpath ...
what if the Kit is in multiple files? The above always returns on the 1st found one
the my #lines = $file->lines; reads all lines - unnecessary - but on small files doesn't big deal.
the the reality this function returns the arg for the Maakartikel, so probably better name would be find_articel_by_kit or find_articel :)
easy to switch to utf8 - just change the $file->lines to $file->lines_utf8;
This little snippet is from the first chapter of the LWP perl oreilly book.
This line
$count++ while $catalog =~ m/Perl/gi;
perplexes me
I do not understand how the while statement iterates through the lines in the $catalog variable to find the matched, I don't even know how to explain what that line does in english much less perl
#!/usr/bin/perl -w
use strict ;
use LWP::Simple ;
my $catalog = get("http://www.oreilly.com/catalog");
my $count = 0;
$count++ while $catalog =~ m/Perl/gi;
print "$count\n";
so I have tried writing it out long hand to no avail.
#!/usr/bin/perl -w
use strict ;
use LWP::Simple ;
my $catalog = get("http://www.oreilly.com/catalog");
open( my $fh_catalog ,"<" , $catalog) || die "cant open $!";
while (<$fh_catalog>) {
print $_ ;
sleep 1;
}
I even tried
#!/usr/bin/perl -w
use strict ;
use LWP::Simple ;
my $catalog = get("http://www.oreilly.com/catalog");
while (<$catalog>) {
print $_ ;
sleep 1;
}
$catalog contains the string <!DOCTYPE HTML PUB[...][newline][newline]<html>[...].
Your first snippet fails because $catalog doesn't contain a file name.
Your second snippet fails because $catalog doesn't contain a file handle.
When a match operator with the /g modifier is used scalar context, it searches from where the last search left off.
The analog would be
use Time::HiRes qw( sleep ); # Support sleeping fractions of seconds.
$| = 1; # Turn off STDOUT's output buffering.
for my $i (0..length($content)-1) {
print(substr($content, $i, 1));
sleep 0.1;
}
Let's use a simpler string as an example.
my $s = "a000a000a000";
++$count while $s =~ /a/g;
Here's what happens:
The match operator is executed. It finds the first a, sets pos($s) = 1;, and returns true.
The loop body is entered, and $count is incremented.
The match operator is executed. It behaves as if the string started as pos($s) (1), finds the second a, sets pos($s) = 5;, and returns true.
The loop body is entered, and $count is incremented.
The match operator is executed. It behaves as if the string started as pos($s) (5), finds the third a, sets pos($s) = 9;, and returns true.
The loop body is entered, and $count is incremented.
The match operator is executed. It behaves as if the string started as pos($s) (9), fails to find a match, clears pos($s), and returns false. The loops exits.
Nothing changes if some of the characters of the string are newlines.
#!/usr/bin/env perl
use Term::ReadKey;
ReadMode 4;
END {
ReadMode 0; # Reset tty mode before exiting
}
while (<>) {
$key = ReadKey(0);
$key == "\x04" and last; # Ctrl+D breaks the loop
print $key;
}
When I had it without the while loop, it was printing back what I typed in.
It doesn't even produce any output at the end (if it was buffering it or something). Like I'd run it and type a few letters and hit Ctrl+D. It prints nothing.
I'm trying to make a program to convert mouse scroll escape codes into keypresses. I hope I'm not barking up the wrong tree.
This line
while (<>)
reads a line from STDIN (assuming you ran the program with no command line arguments). Once a line has been read, it enters the body of the while loop. Whatever you typed up to and including the newline is now in $_.
Now, you press a key, it's stored in $key and numerically compared to CTRL-D. Since neither is numeric, they both end up being zero, the loop terminates.
This is why you should turn on warnings which would have told you:
Argument "^D" isn't numeric in numeric eq (==) at ./tt.pl line 15, line 1.
Argument "c" isn't numeric in numeric eq (==) at ./tt.pl line 15, line 1.
Of course, it would make sense to put the loop-termination condition where it belongs as well:
#!/usr/bin/env perl
use strict;
use warnings;
use Term::ReadKey;
ReadMode 4;
END {
ReadMode 0; # Reset tty mode before exiting
}
my $input;
{
local $| = 1;
while ((my $key = ReadKey(0)) ne "\x04") {
print $key;
$input .= $key;
}
}
print "'$input'\n";
Just replace the while condition to:
while(1) {
# ...
}
I'm reading this textfile to get ONLY the words in it and ignore all kind of whitespaces:
hello
now
do you see this.sadslkd.das,msdlsa but
i hoohoh
And this is my Perl code:
#!usr/bin/perl -w
require 5.004;
open F1, './text.txt';
while ($line = <F1>) {
#print $line;
#arr = split /\s+/, $line;
foreach $w (#arr) {
if ($w !~ /^\s+$/) {
print $w."\n";
}
}
#print #arr;
}
close F1;
And this is the output:
hello
now
do
you
see
this.sadslkd.das,msdlsa
but
i
hoohoh
The output is showing two newlines but I am expecting the output to be just words. What should I do to just get words?
You should always use strict and use warnings (in preference to the -w command-line qualifier) at the top of every Perl program, and declare each variable at its first point of use using my. That way Perl will tell you about simple errors that you may otherwise overlook.
You should also use lexical file handles with the three-parameter form of open, and check the status to make sure it succeeded. There is little point in explicitly closing an input file unless you expect your program to run for an appreciable time, as Perl will close all files for you on exit.
Do you really need to require Perl v5.4? That version is fifteen years old, and if there is anything older than that installed then you have a museum!
Your program would be better like this:
use strict;
use warnings;
open my $fh, '<', './text.txt' or die $!;
while (my $line = <$fh>) {
my #arr = split /\s+/, $line;
foreach my $w (#arr) {
if ($w !~ /^\s+$/) {
print $w."\n";
}
}
}
Note: my apologies. The warnings pragma and lexical file handles were introduced only in v5.6 so that part of my answer is irrelevant. The latest version of Perl is v5.16 and you really should upgrade
As Birei has pointed out, the problem is that, when the line has leading whitespace, there is a empty field before the first separator. Imagine if your data was comma-separated, then you would want Perl to report a leading empty field if the line started with a comma.
To extract all the non-space characters you can use a regular expression that does exactly that
my #arr = $line =~ /\S+/g;
and this can be emulated by using the default parameter for split which is a single quoted space (not a regular expression)
my #arr = $line =~ split ' ', $line;
In this case split behaves like the awk utility and discards any leading empty fields as you expected.
This is even simpler if you let Perl use the $_ variable in the read loop, as all of the parameters for split can be defaulted:
while (<F1>) {
my #arr = split;
foreach my $w (#arr) {
print "$w\n" if $w !~ /^\s+$/;
}
}
This line is the problem:
#arr=split(/\s+/,$line);
\s+ does a match just before the leading spaces. Use ' ' instead.
#arr=split(' ',$line);
I believe that in this line:
if(!($w =~ /^\s+$/))
You wanted to ask if there's nothing in this row - don't print it.
But the "+" in the REGEX actually force it to have at least 1 space.
If you change the "\s+" to "\s*", you'll see that it's working. because * is 0 occurrences or more ...
I am new to Perl, but I know a little bit of C though. I came across this snippet in one of our classroom notes:
$STUFF = "C:/scripts/stuff.txt";
open STUFF or die "Cannot open $STUFF for read: $!";
print "Line $. is: $_" while (<STUFF>);
Why is the while after the print statement? What does it do?
It's the same as
while (<STUFF>) {
print "Line $. is : $_";
}
It's written the way it is because it's simpler and more compact. This is, in general, Perl's (in)famous "statement modifier" form for conditionals and loops.
The other answers have explained the statement modifier form of the while loop. However, there's a lot of other magic going on here. In particular, the script relies on three of Perl's special variables. Two of these ($_ and $!) are very common; the other ($.) is reasonably common. But they're all worth knowing.
When you run while <$fh> on an opened filehandle, Perl automagically runs through the file, line by line, until it hits EOF. Within each loop, the current line is set to $_ without you doing anything. So these two are the same:
while (<$fh>) { # something }
while (defined($_ = <$fh>)) { # something }
See perldoc perlop, the section on I/O operators. (Some people find this too magical, so they use while (my $line = <$fh>) instead. This gives you $line for each line rather than $_, which is a clearer variable name, but it requires more typing. To each his or her own.)
$! holds the value of a system error (if one is set). See perldoc perlvar, the section on $OS_ERROR, for more on how and when to use this.
$. holds a line number. See perldoc perlvar, the section on $NR. This variable can be surprisingly tricky. It won't necessarily have the line number of the file you are currently reading. An example:
#!/usr/bin/env perl
use strict;
use warnings;
while (<>) {
print "$ARGV: $.\n";
}
If you save this as lines and run it as perl lines file1 file2 file3, then Perl will count lines straight through file1, file2 and file3. You can see that Perl knows what file it's reading from (it's in $ARGV; the filenames will be correct), but it doesn't reset line numbering automatically for you at the end of each file. I mention this since I was bit by this behavior more than once until I finally got it through my (thick) skull. You can reset the numbering to track individual files this way:
#!/usr/bin/env perl
use strict;
use warnings;
while (<>) {
print "$ARGV: $.\n";
}
continue {
close ARGV if eof;
}
You should also check out the strict and warnings pragmas and take a look at the newer, three-argument form of open. I just noticed that you are "unknown (google)", which means you are likely never to return. I guess I got my typing practice for the day, at least.
The following snippets are exactly equivalent, just different ways of writing the same thing:
print "Line $. is : $_" while (<STUFF>);
while (<STUFF>) {
print "Line $. is : $_";
}
What this does is each iteration through the loop, Perl reads one line of text from the STUFF file and puts it in the special variable $_ (this is what the angle brackets do). Then the body of the loop prints lines like:
Line 1 is : test
Line 2 is : file
The special variable $. is the line number of the last line read from a file, and $_ is the contents of that line as set by the angle bracket operator.
Placing the while after the print, makes the line read almost like normal English.
It also puts emphasis on the print instead of the while. And you don't need the curly brackets: { ... }
It can also be used with if and unless, for example,
print "Debug: foobar=$foobar\n" if $DEBUG;
print "Load file...\n" unless $QUIET;
I've taken the liberty of rewriting your snippet as I would.
Below my suggested code is a rogues gallery of less than optimal methods you might see in the wild.
use strict;
use warnings;
my $stuff_path = 'c:/scripts/stuff.txt';
open (my $stuff, '<', $stuff_path)
or die "Cannot open'$stuff_path' for read : $!\n";
# My preferred method to loop over a file line by line:
# while loop with explicit variable
while( my $line = <$stuff> ) {
print "Line $. is : $line\n";
}
Here are other methods you might see. Each one could be substituted for the while loop in my example above.
# while statement modifier
# - OK, but less clear than explicit code above.
print "Line $. is : $_" while <$stuff>;
# while loop
# - OK, but if I am operating on $_ explicitly, I prefer to use an explicit variable.
while( <$stuff> ) {
print "Line $. is : $_";
}
# for statement modifier
# - inefficient
# - loads whole file into memory
print "Line $. is : $_" for <$stuff>;
# for loop - inefficient
# - loads whole file into memory;
for( <$stuff> ) {
print "Line $. is : $_\n";
}
# for loop with explicit variable
# - inefficient
# - loads whole file into memory;
for my $line ( <$stuff> ) {
print "Line $. is : $line\n";
}
# Exotica -
# map and print
# - inefficient
# - loads whole file into memory
# - stores complete output in memory
print map "Line $. is : $_\n", <$stuff>;
# Using readline rather than <>
# - Alright, but overly verbose
while( defined (my $line = readline($stuff) ) {
print "Line $. is : $line\n";
}
# Using IO::Handle methods on a lexical filehandle
# - Alright, but overly verbose
use IO::Handle;
while( defined (my $line = $stuff->readline) ) {
print "Line $. is : $line\n";
}
Note that the while statement can only follow your loop body if it's a one-liner. If your loop body runs over several lines, then your while has to precede it.
It is the same as a:
while (<STUFF>) { print "Line $. is: $_"; }