I am reading a file line by line and want to process each line through a subroutine. Since I am not interested in the line itself, I put the read from the filehandle directly into the subroutine call. However, this leads to unexpected behaviour I don't quite understand.
I created a minimal example demonstrating the effect:
#!/usr/bin/perl
use strict;
use warnings;
use Carp;
use English qw( -no_match_vars );
print "This works as expected:\n";
open my $TEST1, '<', 'filetest1.txt' or croak "Can't open filetest1.txt - $ERRNO";
my $line1 = <$TEST1>;
print_line( $line1 );
while ( 1 ) {
last if eof $TEST1;
$line1 = <$TEST1>;
print $line1;
}
close $TEST1;
print "\n";
print "Unexpected effect here:\n";
open my $TEST2, '<', 'filetest1.txt' or croak "Can't open filetest1.txt - $ERRNO";
print_line(<$TEST2>); # this works, but just once
while ( 1 ) {
last if eof $TEST2; # is triggered immediately
print "This is never printed\n";
print_line(<$TEST2>);
}
close $TEST2;
sub print_line {
my $line = shift;
print $line;
return 1;
}
Content of filetest1.txt:
test line 1
test line 2
test line 3
test line 4
test line 5
Result of the script:
This works as expected:
test line 1
test line 2
test line 3
test line 4
test line 5
Unexpected effect here:
test line 1
It seems that when I read the next line directly in the subroutine call, it works exactly once and then eof is triggered. I haven't found any explanation for that effect, and it forces me to create the variable $line1 just to pass the line to the subroutine.
I'm looking for an explanation why that happens, and consequently I'd like to understand the limits of what I can or cannot do with a filehandle.
In your code print_line(<$FH>); the read from the filehandle happens in list context, meaning you don't read a single line but the whole file. In your subroutine you use only the first line and discard the rest. That's why the filehandle is already at end-of-file after the first call.
You could just provide the filehandle to the subroutine and read a line there:
open my $FH , '<' , 'somefile' or die "Cannot read: $!" ;
while( ! eof $FH ) {
print_line( $FH ) ;
}
sub print_line {
my ( $fh ) = @_ ;
my $line = <$fh> ;
print $line ;
}
You have a context problem. The $TEST1 read is in scalar context, so it only read one line. The $TEST2 read is in list context, so all the lines from the file are read and print_line() is called with a list of them as its arguments. So the second time you try to read from $TEST2 you get EOF, since all the lines were read the first time.
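To make the context difference concrete, here is a minimal sketch. It assumes an in-memory file in place of filetest1.txt, and count_args is a hypothetical helper that only reports how many arguments it received. Wrapping the read in scalar forces a single-line read even inside a subroutine call.

```perl
use strict;
use warnings;

# In-memory stand-in for filetest1.txt (an assumption for this demo).
my $content = "test line 1\ntest line 2\ntest line 3\n";

# Hypothetical helper: returns the number of arguments it was called with.
sub count_args { return scalar @_ }

# List context: <$fh> returns every remaining line, so the handle is drained.
open my $fh, '<', \$content or die "Can't open in-memory file: $!";
my $list_args = count_args(<$fh>);
my $list_eof  = eof($fh) ? 1 : 0;
close $fh;

# Scalar context: only one line is read, the rest stays available.
open $fh, '<', \$content or die "Can't open in-memory file: $!";
my $scalar_args = count_args( scalar <$fh> );
my $scalar_eof  = eof($fh) ? 1 : 0;
close $fh;

print "list context:   $list_args argument(s), eof afterwards: $list_eof\n";
print "scalar context: $scalar_args argument(s), eof afterwards: $scalar_eof\n";
# list context:   3 argument(s), eof afterwards: 1
# scalar context: 1 argument(s), eof afterwards: 0
```

So print_line( scalar <$TEST2> ) would avoid the temporary variable in the original script.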
Is there a way to get the line number (and maybe filename) where a __DATA__ token was coded? Or some other way to know the actual line number in the original source file where a line of data read from the DATA filehandle came from?
Note that $. counts from 1 when reading from the DATA filehandle. So if the line number of the __DATA__ token were added to $. it would be what I'm looking for.
For example:
#!/usr/bin/perl
while (<DATA>) {
my $n = $. + WHAT??;
die "Invalid data at line $n\n" if /bad/;
}
__DATA__
something good
something bad
I want this to say "Invalid data at line 9", not "line 2" (which is what you get if $. is used by itself).
In systems that support /proc/<pid> virtual filesystems (e.g., Linux), you can do:
# find the file where <DATA> handle is read from
my $DATA_FILE = readlink("/proc/$$/fd/" . fileno(*DATA));
# find the line where DATA begins
open my $THIS, "<", $DATA_FILE;
my @THIS = <$THIS>;
my ($DATA_LINE) = grep { $THIS[$_] =~ /^__DATA__\b/ } 0 .. $#THIS;
Files don't actually have lines; they're just sequences of bytes. The OS doesn't even offer the capability of getting a line from a file, so it has no concept of line numbers.
Perl, on the other hand, does keep track of a line number for each handle. It is accessed via $..
However, the Perl handle DATA is created from a file descriptor that's already been moved to the start of the data —it's the file descriptor that Perl itself uses to load and parse the file— so there's no record of how many lines have already been read. So the line 1 of DATA is the first line after __DATA__.
To correct the line count, one must seek back to the start of the file, and read it line by line until the file handle is back at the same position it started.
#!/usr/bin/perl
use strict;
use warnings qw( all );
use Fcntl qw( SEEK_SET );
# Determines the line number at the current file position without using «$.».
# Corrects the value of «$.» and returns the line number.
# Sets «$.» to «1» and returns «undef» if unable to determine the line number.
# The handle is left pointing to the same position as when this was called, or this dies.
sub fix_line_number {
my ($fh) = @_;
( my $initial_pos = tell($fh) ) >= 0
or return undef;
seek($fh, 0, SEEK_SET)
or return undef;
$. = 1;
while (<$fh>) {
( my $pos = tell($fh) ) >= 0
or last;
if ($pos >= $initial_pos) {
if ($pos > $initial_pos) {
seek($fh, $initial_pos, SEEK_SET)
or die("Can't reset handle: $!\n");
}
return $.;
}
}
seek($fh, $initial_pos, SEEK_SET)
or die("Can't reset handle: $!\n");
$. = 1;
return undef;
}
my $prefix = fix_line_number(\*DATA) ? "" : "+";
while (<DATA>) {
printf "%s:%s: %s", __FILE__, "$prefix$.", $_;
}
__DATA__
foo
bar
baz
Output:
$ ./a.pl
./a.pl:48: foo
./a.pl:49: bar
./a.pl:50: baz
$ perl <( cat a.pl )
/dev/fd/63:+1: foo
/dev/fd/63:+2: bar
/dev/fd/63:+3: baz
Perl keeps track of the file and line at which each symbol is created. A symbol is normally created when the parser/compiler first encounters it. But if __DATA__ is encountered before DATA is otherwise created, this will create the symbol. We can take advantage of this to set the line number associated with the file handle in DATA.
For the case where the Package::DATA handle is not used in Package.pm itself, the line number of the __DATA__ token could be obtained via B::GV->LINE on the DATA handle:
$ cat Foo.pm
package Foo;
1;
__DATA__
good
bad
$ perl -I. -MFoo -MB -e '
my $ln = B::svref_2object(\*Foo::DATA)->LINE;
warn "__DATA__ at line $ln\n";
Foo::DATA->input_line_number($ln);
while(<Foo::DATA>){ die "no good" unless /good/ }
'
__DATA__ at line 4
no good at -e line 1, <DATA> line 6.
In the case where the DATA handle is referenced in the file itself, a possible kludge would be to use an @INC hook:
$ cat DH.pm
package DH;
unshift @INC, sub {
my ($sub, $fname) = @_;
for(@INC){
if(open my $fh, '<', my $fpath = "$_/$fname"){
$INC{$fname} = $fpath;
return \'', $fh, sub {
our (%ln, %pos);
if($_){ $pos{$fname} += length; ++$ln{$fname} }
}
}
}
};
$ cat Bar.pm
package Bar;
print while <DATA>;
1;
__DATA__
good
bad
$ perl -I. -MDH -MBar -e '
my $fn = "Bar.pm";
warn "__DATA__ at line $DH::ln{$fn} pos $DH::pos{$fn}\n";
seek Bar::DATA, $DH::pos{$fn}, 0;
Bar::DATA->input_line_number($DH::ln{$fn});
while (<Bar::DATA>){ die "no good" unless /good/ }
'
good
bad
__DATA__ at line 6 pos 47
no good at -e line 6, <DATA> line 8.
Just for the sake of completion, in the case where you do have control over the file, all could be easily done with:
print "$.: $_" while <DATA>;
BEGIN { our $ln = __LINE__ + 1; DATA->input_line_number($ln) }
__DATA__
...
You can also use the first B::GV solution, provided that you reference the DATA handle via an eval:
use B;
my ($ln, $data) = eval q{B::svref_2object(\*DATA)->LINE, \*DATA}; die $@ if $@;
$data->input_line_number($ln);
print "$.: $_" while <$data>;
__DATA__
...
None of these solutions assumes that the source files are seekable (except if you want to read the DATA more than once, as I did in the second example), or tries to reparse your files, etc.
Comparing the end of the file to itself in reverse might do what you want:
#!/usr/bin/perl
open my $f, "<", $0;
my @lines;
my @dataLines;
push @lines ,$_ while <$f>;
close $f;
push @dataLines, $_ while <DATA>;
my @revLines= reverse @lines;
my @revDataLines=reverse @dataLines;
my $count=@lines;
my $offset=0;
$offset++ while ($revLines[$offset] eq $revDataLines[$offset]);
$count-=$offset;
print "__DATA__ section is at line $count\n";
__DATA__
Hello there
"Some other __DATA__
lkjasdlkjasdfklj
ljkasdf
Running it gives this output:
__DATA__ section is at line 19
The above script reads itself (using $0 for the file name) into the @lines array and reads the DATA filehandle into the @dataLines array.
The arrays are reversed and then compared element by element until they differ. The number of matching lines is tracked in $offset, and this is subtracted from $count, which is the number of lines in the file.
The result is the line number the DATA section starts at. Hope that helps.
Thank you @mosvy for the clever and general idea.
Below is a consolidated solution which works anywhere. It uses a symbolic reference instead of eval to avoid mentioning "DATA" at compile time, but otherwise uses the same ideas as mosvy.
The important point is that code in a package containing __DATA__ must not refer to the DATA symbol by name so that that symbol won't be created until the compiler sees the __DATA__ token. The way to avoid mentioning DATA is to use a filehandle ref created at run-time.
# Get the DATA filehandle for a package (default: the caller's),
# fixed so that "$." provides the actual line number in the
# original source file where the last-read line of data came
# from, rather than counting from 1.
#
# In scalar context, returns the fixed filehandle.
# In list context, returns ($fh, $filename)
#
# For this to work, a package containing __DATA__ must not
# explicitly refer to the DATA symbol by name, so that the
# DATA symbol (glob) will not yet be created when the compiler
# encounters the __DATA__ token.
#
# Therefore, use the filehandle ref returned by this
# function instead of DATA!
#
sub get_DATA_fh(;$) {
my $pkg = $_[0] // caller;
# Using a symbolic reference to avoid mentioning "DATA" at
# compile time, in case we are reading our own module's __DATA__
my $fh = do{ no strict 'refs'; *{"${pkg}::DATA"} };
use B;
$fh->input_line_number( B::svref_2object(\$fh)->LINE );
wantarray ? ($fh, B::svref_2object(\$fh)->FILE) : $fh
}
Usage examples:
my $fh = get_DATA_fh; # read my own __DATA__
while (<$fh>) { print "$. : $_"; }
or
my ($fh,$fname) = get_DATA_fh("Otherpackage");
while (<$fh>) {
print " $fname line $. : $_";
}
I have a very basic perl script which prints the next line in a text file after matching a search pattern.
@ARGV = <dom_boot.txt>;
while ( <> ) {
print scalar <> if /name=sacux445/;
}
This works. However, I would like to capture the output in a file for further use, rather than printing it to STDOUT.
I'm just learning (slowly) so attempted this:
my $fh;
my $dom_bootdev = 'dom_bootdev.txt';
open ($fh, '>', $dom_bootdev) or die "No such file";
@ARGV = <dom_boot.txt>;
while ( <> ) {
print $fh <> if /name=sacux445/;
}
close $fh;
But I get a syntax error.
syntax error at try.plx line 19, near "<>"
I'm struggling to figure this out. I'm guessing it's probably very simple so any help would be appreciated.
Thanks,
Luke.
The Perl parser sometimes has problems with indirect notation. The canonical way to handle it is to wrap the handle into a block:
print {$fh} <> if /name=sacux445/;
Are you sure you want to remove scalar?
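As a minimal sketch of why both the block and scalar matter, the following uses in-memory filehandles; the sample input is hypothetical and merely stands in for dom_boot.txt. Without scalar, print would evaluate <$in> in list context and write every remaining line to the output.

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for dom_boot.txt.
my $input  = "name=sacux445\nboot-device=disk0\nname=other\nboot-device=disk1\n";
my $output = '';

open my $in,  '<', \$input  or die "Can't read in-memory input: $!";
open my $out, '>', \$output or die "Can't write in-memory output: $!";

while (<$in>) {
    # The block around $out marks it as the filehandle; scalar limits the
    # read to one line instead of slurping the rest of the input.
    print {$out} scalar <$in> if /name=sacux445/;
}

close $out;
close $in;

print $output;    # boot-device=disk0
```

Note that this sketch does not guard against the pattern matching on the very last line, where the inner read would return undef.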
Simply fetch the next line within the loop and print it, if the line matches the pattern:
while (<>) {
next unless /name=sacux445/;
my $next = <>;
last unless defined $next;
print $fh $next;
}
Note, you need to check the return value of the diamond operator.
Input
name=sacux445 (1)
aaa
name=sacux445 (2)
bbb
name=sacux445 (3)
Output
aaa
bbb
One should learn to use state machines for parsing data. A state machine allows the input read to be in only one place in the code. Rewriting the code as a state machine:
use strict;
use warnings;
use autodie; # See http://perldoc.perl.org/autodie.html
my $dom_bootdev = 'dom_bootdev.txt';
open ( my $fh, '>', $dom_bootdev ); # autodie handles open errors
use File::Glob qw( :bsd_glob ); # Perl's default glob() does not handle spaces in file names
@ARGV = glob( 'dom_boot.txt' );
my $print_next_line = 0;
while( my $line = <> ){
if( $line =~ /name=sacux445/ ){
$print_next_line = 1;
next;
}
if( $print_next_line ){
print {$fh} $line;
$print_next_line = 0;
next;
}
}
When To Use a State Machine
If the data is context-free, it can be parsed using only regular expressions.
If the data has a tree structure, it can be parsed using a simple state machine.
For more complex structures, at least one state machine with a push-down stack is required. The stack records the previous state so that the machine can return to it when the current state is finished.
The most complex data structure in use is XML. It requires a state machine for its syntax and a second one with a stack for its semantics.
I am trying to both learn perl and use it in my research. I need to do a simple task which is counting the number of sequences and their lengths in a file such as follow:
>sequence1
ATCGATCGATCG
>sequence2
AAAATTTT
>sequence3
CCCCGGGG
The output should look like this:
sequence1 12
sequence2 8
sequence3 8
Total number of sequences = 3
This is the code I have written which is very crude and simple:
#!/usr/bin/perl
use strict;
use warnings;
my ($input, $output) = @ARGV;
open(INFILE, '<', $input) or die "Can't open $input, $!\n"; # Open a file for reading.
open(OUTFILE, '>', $output) or die "Can't open $output, $!"; # Open a file for writing.
while (<INFILE>) {
chomp;
if (/^>/)
{
my $number_of_sequences++;
}else{
my length = length ($input);
}
}
print length, number_of_sequences;
close (INFILE);
I'd be grateful if you could give me some hints, for example, in the else block, when I use the length function, I am not sure what argument I should pass into it.
Thanks in advance
You're printing out just the last length, not each sequence length, and you want to catch the sequence names as you go:
#!/usr/bin/perl
use strict;
use warnings;
my ($input, $output) = @ARGV;
my ($lastSeq, $number_of_sequences) = ('', 0);
open(INFILE, '<', $input) or die "Can't open $input, $!\n"; # Open a file for reading.
# You never use OUTFILE
# open(OUTFILE, '>', $output) or die "Can't open $output, $!"; # Open a file for writing.
while (<INFILE>) {
chomp;
if (/^>(.+)/)
{
$lastSeq = $1;
$number_of_sequences++;
}
else
{
my $length = length($_);
print "$lastSeq $length\n";
}
}
print "Total number of sequences = $number_of_sequences\n";
close (INFILE);
Since you have indicated that you want feedback on your program, here goes:
my ($input, $output) = @ARGV;
open(INFILE, '<', $input) or die "Can't open $input, $!\n"; # Open a file for reading.
open(OUTFILE, '>', $output) or die "Can't open $output, $!"; # Open a file for writing.
Personally, I think when dealing with a simple input/output file relation, it is best to just use the diamond operator and standard output. That means that you read from the special file handle <>, commonly referred to as "the diamond operator", and you print to STDOUT, which is the default output. If you want to save the output in a file, just use shell redirection:
perl program.pl input.txt > output.txt
In this part:
my $number_of_sequences++;
you are creating a new variable. This variable will go out of scope as soon as you leave the block { .... }, in this case: the if-block.
In this part:
my length = length ($input);
you forgot the $ sigil. You are also using length on the file name, not the line you read. If you want to read a line from your input, you must use the file handle:
my $length = length(<INFILE>);
Although this will also include the newline in the length.
Here you have forgotten the sigils again:
print length, number_of_sequences;
And of course, this will not create the expected output. It will print something like sequence112.
Recommendations:
Use a while (<>) loop to read your input. This is the idiomatic method to use.
You do not need to keep a count of your input lines, there is a line count variable: $.. Though keep in mind that it will also count "bad" lines, like blank lines or headers. Using your own variable will allow you to account for such things.
Remember to chomp the line before finding out its length. Or use an alternative method that counts only the characters you want: my $length = ( <> =~ tr/ATCG// ). This reads a line, counts the letters A, T, C and G, returns the count and discards the line that was read.
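A small sketch of the difference between length and the tr counting idiom mentioned above. The sample line is taken from the question's data, with the trailing newline it would carry when read from a file.

```perl
use strict;
use warnings;

my $line = "ATCGATCGATCG\n";    # a sequence line as read, newline included

my $with_newline = length $line;              # counts the newline too
my $bases        = ( $line =~ tr/ATCG// );    # counts only A, T, C and G

print "length: $with_newline, tr count: $bases\n";
# length: 13, tr count: 12
```

With an empty replacement list and no modifiers, tr/// leaves the string unchanged and simply returns the number of matching characters.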
Summary:
use strict;
use warnings; # always use these two pragmas
my $count;
while (<>) {
next unless /^>/; # ignore non-header lines
$count++; # increment counter
chomp;
my $length = (<> =~ tr/ATCG//); # get length of next line
s/^>(\S+)/$1 $length\n/; # remove > and insert length
} continue {
print; # print to STDOUT
}
print "Total number of sequences = $count\n";
Note the use of continue here, which will allow us to skip a line that we do not want to process, but that will still get printed.
And as I said above, you can redirect this to a file if you want.
For starters, you need to change your inner loop to this:
...
chomp;
if (/^>/)
{
$number_of_sequences++;
$sequence_name = $_;
}else{
print "$sequence_name ", length($_), "\n";
}
...
Note the following:
The my declaration has been removed from $number_of_sequences
The sequence name is captured in the variable $sequence_name. It is used later when the next line is read.
To make the script run under strict mode, you can add my declarations for $number_of_sequences and $sequence_name outside of the loop:
my $sequence_name;
my $number_of_sequences = 0;
while (<INFILE>) {
...(as above)...
}
print "Total number of sequences: $number_of_sequences\n";
The my keyword declares a new lexically scoped variable - i.e. a variable which only exists within a certain block of code, and every time that block of code is entered, a new version of that variable is created. Since you want to have the value of $sequence_name carry over from one loop iteration to the next you need to place the my outside of the loop.
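The scoping rule can be seen in a few lines. This is only an illustrative sketch: the variable names $outside and $inside are made up for the demo, and the input lines are hard-coded instead of being read from a file.

```perl
use strict;
use warnings;

my @lines = ( '>sequence1', 'ATCGATCGATCG', '>sequence2', 'AAAATTTT' );

my $outside = 0;      # declared once, keeps its value across iterations
my @inside_values;    # records what $inside held at the end of each pass

for my $line (@lines) {
    my $inside = 0;    # a fresh variable on every pass through the loop
    if ( $line =~ /^>/ ) {
        $inside++;
        $outside++;
    }
    push @inside_values, $inside;
}

print "outside: $outside\n";          # outside: 2
print "inside: @inside_values\n";     # inside: 1 0 1 0
```

Only the counter declared outside the loop accumulates; the one declared inside never gets past 1.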
#!/usr/bin/perl
use strict;
use warnings;
my ($file, $line, $length, $tag, $count);
$file = $ARGV[0];
open (FILE, "$file") or print"can't open file $file\n";
while (<FILE>){
$line=$_;
chomp $line;
if ($line=~/^>/){
$tag = $line;
}
else{
$length = length ($line);
$count=1;
}
if ($count==1){
print "$tag\t$length\n";
$count=0
}
}
close FILE;
I have a big file with repeated lines as follows:
#UUSM
ABCDEADARFA
+------qqq
!2wqeqs6777
I would like to output every 'second line' in the file. I have the following code snippet for doing this, but it's not working as expected. Lines 1, 3 and 4 are in the output instead.
open(IN,"<", "file1.txt") || die "cannot open input file:$!";
while (<IN>) {
$line = $line . $_;
if ($line =~ /^\#/) {
<IN>;
#next;
my $line = $line;
}
}
print "$line";
Please help!
try this
open(IN,"<", "file1.txt") || die "cannot open input file:$!";
my $lines = "";
while (<IN>) {
$lines .= $_ if $. % 4 == 2;
}
print "$lines";
I assume what you are asking is how to print the line that comes after a line that begins with #:
perl -ne 'if (/^\#/) { print scalar <> }' file1.txt
This says, "If the line begins with #, then print the next line. Do this for all the files in the argument list." The scalar function is used here to impose a scalar context on the file handle, so that it does not print the whole file. By default print has a list context for its arguments.
If you actually want to print the second line in the file, well, that's even easier. Here's a few examples:
Using the line number $. variable, printing if it equals line number 2.
perl -ne '$. == 2 and print, close ARGV' yourfile.txt
Note that if you have multiple files, you must close the ARGV file handle to reset the counter $.. Note also that using the lower-precedence operator and binds both print and close to the conditional.
Using regular logic.
perl -ne 'print scalar <>; close ARGV;'
perl -pe '$_ = <>; close ARGV;'
Both of these use a short-circuit feature by closing the ARGV file handle when the second line is printed. If you want to print every other line of a file, both will do that if you remove the close statements.
perl -ne '$at = $. if /^\#/; print if $. - 1 == $at' file1.txt
Written out longhand, the above is equivalent to
open my $fh, "<", "file1.txt";
my $at_line = 0;
while (<$fh>) {
if (/^\#/) {
$at_line = $.;
}
else {
print if $. - 1 == $at_line;
}
}
If you want lines 2, 6, 10 printed, then:
while (<>)
{
print if $. % 4 == 2;
}
Where $. is the current line number — and I didn't spend the time opening and closing the file. That might be:
{
my $file = "file1.txt";
open my $in, "<", $file or die "cannot open input file $file: $!";
while (<$in>)
{
print if $. % 4 == 2;
}
}
This uses the modern preferred form of file handle (a lexical file handle), and the braces around the construct mean the file handle is closed automatically. The name of the file that couldn't be opened is included in the error message; the or operator is used so the precedence is correct (the parentheses and || in the original were fine too and could be used here, but conventionally are not).
If you want the line after a line starting with # printed, you have to organize things differently.
my $print_next = 0;
while (<>)
{
if ($print_next)
{
print $_;
$print_next = 0;
}
elsif (m/^#/)
{
$print_next = 1;
}
}
Dissecting the code in the question
The original version of the code in the question was (line numbers added for convenience):
1 open(IN,"<", "file1.txt") || die "cannot open input file:$!";
2 while (<IN>) {
3 $line = $line . $_;
4 if ($line =~ /^\#/) {
5 <IN>;
6 #next;
7 my $line = $line;
8 }
9 }
10 print "$line";
Discussion of each line:
1. OK, though it doesn't use a lexical file handle or report which file could not be opened.
2. OK.
3. Premature and misguided. This adds the current line to the variable $line before any analysis is done. If it was desirable, it could be written $line .= $_;
4. Suggests that the correct description for the desired output is not 'the second lines' but 'the line after a line starting with #'. Note that since there is no multi-line modifier on the regex, this will always match only the first line segment in the variable $line. Because of the premature concatenation, it will match on each line (because the first line of data starts with #), executing the code in lines 5-8.
5. Reads another line into $_. It doesn't test for EOF, but that's harmless.
6. Comment line; no significance except to suggest some confusion.
7. my $line = $line; is a self-assignment to a new variable hiding the outer $line. Mainly, this is weird, and to a lesser extent it is a no-op. You are not using use strict; and use warnings; because you would have warnings if you did. Perl experts use use strict; and use warnings; to make sure they haven't made silly mistakes; novices should use them for the same reason.
8. Of itself, OK. However, the code in the condition has not really done very much. It skips the second line in the file; it will later skip the fourth, the sixth, the eighth, etc.
9. OK.
10. OK, but if you're only interested in printing the lines after the line starting with #, or only interested in printing the line numbers 2N+2 for integral N, then there is no need to build up the entire string in memory before printing each line. It will be simpler to print each line that needs printing as it is found.
I usually loop through lines in a file using the following code:
open my $fh, '<', $file or die "Could not open file $file for reading: $!\n";
while ( my $line = <$fh> ) {
...
}
However, in answering another question, Evan Carroll edited my answer, changing my while statement to:
while ( defined( my $line = <$fh> ) ) {
...
}
His rationale was that if you have a line that's 0 (it'd have to be the last line, else it would have a carriage return) then your while would exit prematurely if you used my statement ($line would be set to "0", and the return value from the assignment would thus also be "0" which gets evaluated to false). If you check for defined-ness, then you don't run into this problem. It makes perfect sense.
So I tried it. I created a textfile whose last line is 0 with no carriage return on it. I ran it through my loop and the loop did not exit prematurely.
I then thought, "Aha, maybe the value isn't actually 0, maybe there's something else there that's screwing things up!" So I used Dump() from Devel::Peek and this is what it gave me:
SV = PV(0x635088) at 0x92f0e8
REFCNT = 1
FLAGS = (PADMY,POK,pPOK)
PV = 0X962600 "0"\0
CUR = 1
LEN = 80
That seems to tell me that the value is actually the string "0", as I get a similar result if I call Dump() on a scalar I've explicitly set to "0" (the only difference is in the LEN field -- from the file LEN is 80, whereas from the scalar LEN is 8).
So what's the deal? Why doesn't my while() loop exit prematurely if I pass it a line that's only "0" with no carriage return? Is Evan's loop actually more defensive, or does Perl do something crazy internally that means you don't need to worry about these things and while() actually only does exit when you hit eof?
Because
while (my $line = <$fh>) { ... }
actually compiles down to
while (defined( my $line = <$fh> ) ) { ... }
It may have been necessary in a very old version of perl, but not any more! You can see this from running B::Deparse on your script:
>perl -MO=Deparse
open my $fh, '<', $file or die "Could not open file $file for reading: $!\n";
while ( my $line = <$fh> ) {
...
}
^D
die "Could not open file $file for reading: $!\n" unless open my $fh, '<', $file;
while (defined(my $line = <$fh>)) {
do {
die 'Unimplemented'
};
}
- syntax OK
So you're already good to go!
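The effect is easy to check without creating a test file. This sketch assumes an in-memory file whose last line is a bare "0" with no trailing newline; the loop still sees it because of the implicit defined test.

```perl
use strict;
use warnings;

# In-memory file (an assumption for the demo); the last line is "0"
# with no trailing newline, which is false as a plain boolean.
my $data = "first\n0";

open my $fh, '<', \$data or die "Can't open in-memory file: $!";

my @seen;
while ( my $line = <$fh> ) {    # Perl wraps this condition in defined()
    chomp $line;
    push @seen, $line;
}
close $fh;

print join( ',', @seen ), "\n";    # first,0
```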
BTW, this is covered in the I/O Operators section of perldoc perlop:
In scalar context, evaluating a filehandle in angle brackets yields the next line from that file (the newline, if any, included), or "undef" at end-of-file or on error. When $/ is set to "undef" (sometimes known as file-slurp mode) and the file is empty, it returns '' the first time, followed by "undef" subsequently.
Ordinarily you must assign the returned value to a variable, but there is one situation where an automatic assignment happens. If and only if the input symbol is the only thing inside the conditional of a "while" statement (even if disguised as a "for(;;)" loop), the value is automatically assigned to the global variable $_, destroying whatever was there previously. (This may seem like an odd thing to you, but you'll use the construct in almost every Perl script you write.) The $_ variable is not implicitly localized. You'll have to put a "local $_;" before the loop if you want that to happen.
The following lines are equivalent:
while (defined($_ = <STDIN>)) { print; }
while ($_ = <STDIN>) { print; }
while (<STDIN>) { print; }
for (;<STDIN>;) { print; }
print while defined($_ = <STDIN>);
print while ($_ = <STDIN>);
print while <STDIN>;
This also behaves similarly, but avoids $_ :
while (my $line = <STDIN>) { print $line }
In these loop constructs, the assigned value (whether assignment is automatic or explicit) is then tested to see whether it is defined. The defined test avoids problems where line has a string value that would be treated as false by Perl, for example a "" or a "0" with no trailing newline. If you really mean for such values to terminate the loop, they should be tested for explicitly:
while (($_ = <STDIN>) ne '0') { ... }
while (<STDIN>) { last unless $_; ... }
In other boolean contexts, "<filehandle>" without an explicit "defined" test or comparison elicit a warning if the "use warnings" pragma or the -w command-line switch (the $^W variable) is in effect.
While it is correct that the form of while (my $line=<$fh>) { ... } gets compiled to while (defined( my $line = <$fh> ) ) { ... } consider there are a variety of times when a legitimate read of the value "0" is misinterpreted if you do not have an explicit defined in the loop or testing the return of <>.
Here are several examples:
#!/usr/bin/perl
use strict; use warnings;
my $str = join "", map { "$_\n" } -10..10;
$str.="0";
my $sep='=' x 10;
my ($fh, $line);
open $fh, '<', \$str or
die "could not open in-memory file: $!";
print "$sep Should print:\n$str\n$sep\n";
#Failure 1:
print 'while ($line=chomp_ln()) { print "$line\n"; }:',
"\n";
while ($line=chomp_ln()) { print "$line\n"; } #fails on "0"
rewind();
print "$sep\n";
#Failure 2:
print 'while ($line=trim_ln()) { print "$line\n"; }',"\n";
while ($line=trim_ln()) { print "$line\n"; } #fails on "0"
print "$sep\n";
last_char();
#Failure 3:
# fails on last line of "0"
print 'if(my $l=<$fh>) { print "$l\n" }', "\n";
if(my $l=<$fh>) { print "$l\n" }
print "$sep\n";
last_char();
#Failure 4 and no Perl warning:
print 'print "$_\n" if <$fh>;',"\n";
print "$_\n" if <$fh>; #fails to print;
print "$sep\n";
last_char();
#Failure 5
# fails on last line of "0" with no Perl warning
print 'if($line=<$fh>) { print $line; }', "\n";
if($line=<$fh>) {
print $line;
} else {
print "READ ERROR: That was supposed to be the last line!\n";
}
print "BUT, line read really was: \"$line\"", "\n\n";
sub chomp_ln {
# if I have "warnings", Perl says:
# Value of <HANDLE> construct can be "0"; test with defined()
if($line=<$fh>) {
chomp $line ;
return $line;
}
return undef;
}
sub trim_ln {
# if I have "warnings", Perl says:
# Value of <HANDLE> construct can be "0"; test with defined()
if (my $line=<$fh>) {
$line =~ s/^\s+//;
$line =~ s/\s+$//;
return $line;
}
return undef;
}
sub rewind {
seek ($fh, 0, 0) or
die "Cannot seek on in-memory file: $!";
}
sub last_char {
seek($fh, -1, 2) or
die "Cannot seek on in-memory file: $!";
}
I am not saying these are good forms of Perl! I am saying that they are possible; especially Failure 3,4 and 5. Note the failure with no Perl warning on number 4 and 5. The first two have their own issues...