I am new to Perl, though I know a little C. I came across this snippet in one of our classroom notes:
$STUFF = "C:/scripts/stuff.txt";
open STUFF or die "Cannot open $STUFF for read: $!";
print "Line $. is: $_" while (<STUFF>);
Why is the while after the print statement? What does it do?
It's the same as
while (<STUFF>) {
    print "Line $. is : $_";
}
It's written the way it is because it's simpler and more compact. This is, in general, Perl's (in)famous "statement modifier" form for conditionals and loops.
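You can check the equivalence with a short sketch. It uses an in-memory filehandle (opened on a reference to a string) in place of C:/scripts/stuff.txt, and it collects each form's output into an array instead of printing it, so the two forms can be compared:

```perl
use strict;
use warnings;

# Hypothetical file contents; an in-memory filehandle stands in for stuff.txt.
my $data = "alpha\nbeta\n";

# Statement-modifier form: the loop condition follows the statement.
my @modifier;
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
push @modifier, "Line $. is: $_" while <$fh>;
close $fh;    # closing also resets $. for the next pass

# Equivalent block form.
my @block;
open $fh, '<', \$data or die "Cannot open in-memory file: $!";
while (<$fh>) {
    push @block, "Line $. is: $_";
}
close $fh;

print @modifier;
print @block;
```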
The other answers have explained the statement modifier form of the while loop. However, there's a lot of other magic going on here. In particular, the script relies on three of Perl's special variables. Two of these ($_ and $!) are very common; the other ($.) is reasonably common. But they're all worth knowing.
When you write while (<$fh>) with an open filehandle, Perl automagically runs through the file, line by line, until it hits EOF. On each iteration, the current line is assigned to $_ without you doing anything. So these two are the same:
while (<$fh>) { ... }
while (defined($_ = <$fh>)) { ... }
See perldoc perlop, the section on I/O operators. (Some people find this too magical, so they use while (my $line = <$fh>) instead. This gives you $line for each line rather than $_, which is a clearer variable name, but it requires more typing. To each his or her own.)
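The defined() part matters when the last line is false but still present, such as a bare 0 with no trailing newline. The sketch below uses an in-memory file in place of a real one. According to perlop, Perl adds the same defined() test to while (my $line = <$fh>), so neither loop drops that line:

```perl
use strict;
use warnings;

# The last "line" is a bare 0 with no newline: false, but defined.
my $data = "10\n0";

open my $fh, '<', \$data or die $!;
my @implicit;
while (<$fh>) {                  # same as while (defined($_ = <$fh>))
    push @implicit, $_;
}
close $fh;

open $fh, '<', \$data or die $!;
my @explicit;
while (my $line = <$fh>) {       # Perl wraps this in defined() too
    push @explicit, $line;
}
close $fh;

print scalar(@implicit), " and ", scalar(@explicit), " lines read\n";
```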
$! holds the value of a system error (if one is set). See perldoc perlvar, the section on $OS_ERROR, for more on how and when to use this.
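Here is a minimal sketch of reading $! after a failed open. The path is hypothetical and is assumed not to exist. $! is a dualvar: in numeric context it gives the errno value, and in string context it gives the system's message. The numeric value can be compared with constants from the core Errno module:

```perl
use strict;
use warnings;
use Errno qw(ENOENT);

# Hypothetical path, assumed not to exist on the machine running this.
my $path = '/no/such/dir/stuff.txt';

my $ok = open my $fh, '<', $path;

# Copy $! straight away: any later system call may overwrite it.
my $errno   = $! + 0;    # numeric context: the errno value
my $message = "$!";      # string context: the system's error text

if (!$ok) {
    print "Cannot open $path for read: $message\n";
    print "File not found\n" if $errno == ENOENT;
}
```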
$. holds a line number. See perldoc perlvar, the section on $NR. This variable can be surprisingly tricky. It won't necessarily have the line number of the file you are currently reading. An example:
#!/usr/bin/env perl
use strict;
use warnings;
while (<>) {
    print "$ARGV: $.\n";
}
If you save this as lines and run it as perl lines file1 file2 file3, then Perl will count lines straight through file1, file2 and file3. You can see that Perl knows what file it's reading from (it's in $ARGV; the filenames will be correct), but it doesn't reset line numbering automatically for you at the end of each file. I mention this since I was bitten by this behavior more than once until I finally got it through my (thick) skull. You can reset the numbering to track individual files this way:
#!/usr/bin/env perl
use strict;
use warnings;
while (<>) {
    print "$ARGV: $.\n";
}
continue {
    close ARGV if eof;
}
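This sketch shows both behaviors in one run. It uses the core File::Temp module to create two throwaway input files in place of file1 and file2, and it records $. instead of printing it. The loop that closes ARGV runs first, so the second loop also starts counting from 1:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Two throwaway input files with two lines each (stand-ins for file1, file2).
my @files;
for my $content ("a\nb\n", "c\nd\n") {
    my ($tmp, $name) = tempfile(UNLINK => 1);
    print {$tmp} $content;
    close $tmp;
    push @files, $name;
}

# Closing ARGV at each file's eof resets $. before the next file is opened.
my @per_file;
@ARGV = @files;
while (<>) {
    push @per_file, $.;
}
continue {
    close ARGV if eof;
}

# Without the explicit close, $. keeps counting across files.
my @continuous;
@ARGV = @files;
while (<>) {
    push @continuous, $.;
}

print "per file:   @per_file\n";
print "continuous: @continuous\n";
```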
You should also check out the strict and warnings pragmas and take a look at the newer, three-argument form of open. I just noticed that you are "unknown (google)", which means you are likely never to return. I guess I got my typing practice for the day, at least.
The following snippets are exactly equivalent, just different ways of writing the same thing:
print "Line $. is : $_" while (<STUFF>);
while (<STUFF>) {
    print "Line $. is : $_";
}
On each iteration of the loop, Perl reads one line of text from the STUFF filehandle and puts it in the special variable $_ (this is what the angle brackets do). Then the body of the loop prints lines like:
Line 1 is : test
Line 2 is : file
The special variable $. is the line number of the last line read from a file, and $_ is the contents of that line as set by the angle bracket operator.
Placing the while after the print makes the line read almost like normal English.
It also puts emphasis on the print instead of the while. And you don't need the curly brackets: { ... }
It can also be used with if and unless, for example,
print "Debug: foobar=$foobar\n" if $DEBUG;
print "Load file...\n" unless $QUIET;
I've taken the liberty of rewriting your snippet as I would.
Below my suggested code is a rogues' gallery of less-than-optimal methods you might see in the wild.
use strict;
use warnings;
my $stuff_path = 'c:/scripts/stuff.txt';
open (my $stuff, '<', $stuff_path)
    or die "Cannot open '$stuff_path' for read : $!\n";
# My preferred method to loop over a file line by line:
# while loop with explicit variable
while( my $line = <$stuff> ) {
    print "Line $. is : $line";   # $line still ends with its newline
}
Here are other methods you might see. Each one could be substituted for the while loop in my example above.
# while statement modifier
# - OK, but less clear than explicit code above.
print "Line $. is : $_" while <$stuff>;
# while loop
# - OK, but if I am operating on $_ explicitly, I prefer to use an explicit variable.
while( <$stuff> ) {
    print "Line $. is : $_";
}
# for statement modifier
# - inefficient
# - loads whole file into memory
# - $. already holds the total line count, so every line gets the same number
print "Line $. is : $_" for <$stuff>;
# for loop
# - inefficient
# - loads whole file into memory
# - $. already holds the total line count
for( <$stuff> ) {
    print "Line $. is : $_";
}
# for loop with explicit variable
# - inefficient
# - loads whole file into memory
# - $. already holds the total line count
for my $line ( <$stuff> ) {
    print "Line $. is : $line";
}
# Exotica -
# map and print
# - inefficient
# - loads whole file into memory
# - stores complete output in memory
# - $. already holds the total line count
print map "Line $. is : $_", <$stuff>;
# Using readline rather than <>
# - Alright, but overly verbose
while( defined( my $line = readline($stuff) ) ) {
    print "Line $. is : $line";
}
# Using IO::Handle methods on a lexical filehandle
# - Alright, but overly verbose
use IO::Handle;
while( defined( my $line = $stuff->getline ) ) {
    print "Line $. is : $line";
}
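Memory use is not the only difference between the while and for variants. In the for variants, <$stuff> is in list context, so it reads every line before the loop body runs. By the time the body runs, $. already holds the final line count. Here is a small sketch, with an in-memory string standing in for stuff.txt:

```perl
use strict;
use warnings;

my $data = "one\ntwo\nthree\n";   # hypothetical three-line file

# while: one line is read per iteration, so $. tracks the current line.
open my $fh, '<', \$data or die $!;
my @while_numbers;
while (my $line = <$fh>) {
    push @while_numbers, $.;
}
close $fh;                        # also resets $.

# for: <$fh> in list context slurps every line before the body runs,
# so $. already equals the total line count on every iteration.
open $fh, '<', \$data or die $!;
my @for_numbers;
for my $line (<$fh>) {
    push @for_numbers, $.;
}
close $fh;

print "while: @while_numbers\n";
print "for:   @for_numbers\n";
```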
Note that the while modifier can only follow a single statement. If your loop body has several statements, put the while first and wrap the body in a block.
It is the same as:
while (<STUFF>) { print "Line $. is: $_"; }
Related
In my Perl script I have two nested infinite while loops. I read lines from a file with the diamond operator. But somehow, when my script reaches the last line of the file, it does not return undef but hangs forever.
If I reduce my code to a single while loop, this does not happen. So I wonder if I am doing something wrong or if this is a known limitation of the language. (This is actually my first Perl script.)
Below is my script. It is meant to count the size of DNA sequences in fasta files, but the hanging behavior can be observed with any other file with multiple lines of text.
Perl version 5.18.2
Invoked from the commandline like perl script.pl file.fa
$l = <>;
while (1) {
    $N = 0;
    while (1) {
        print "Get line";
        $l = <>;
        print "Got line";
        if (not($l)) {
            last;
        }
        if ($l =~ /^>/) {
            last;
        }
        $N += length($l);
    }
    print $N;
    if (not($N)) {
        last;
    }
}
I put some debug print statements so that you can see that the last line printed is "Get line" and then it hangs.
Welcome to Perl.
The issue with your code is that you have no way of escaping the outer loop. <> returns undef when it reaches the end of the file. At that point your inner loop ends and the outer loop sends it back in. Once the files named on the command line are exhausted, further reads make <> fall back to reading STDIN, which never sends an EOF here, so your loop waits forever.
As this is your first Perl script, I'm going to rewrite it for you with some comments. Perl is a fantastic language and you can write some great code in it; however, mostly due to its age, there are some older styles which are no longer advised.
use warnings; # Warn about coding errors
use strict; # Enforce good style
use 5.010; # Enable modernish (10 year old) features
# Another option which mostly does the same as above.
# I normally do this, but it does require a non-standard CPAN library
# use Modern::Perl;
# Much better style to have the condition in the while loop
# Much clearer than having an infinite loop with break/last statements
# Also avoid $l as a variable name, it looks too much like $1
my $count = 0; # Note variable declaration, enforced by strict
while(my $line = <>) {
    if ($line =~ /^>/) {
        # End of input block, output and reset
        say $count;
        $count = 0;
    } else {
        $count += length($line);
    }
}
# Have reached the end of the input files
say $count;
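Two details of this rewrite may matter for FASTA input. First, length($line) includes the trailing newline. Second, the first > header prints a count of 0 because nothing has been counted yet. The variant below handles both. sequence_lengths is a hypothetical helper name, and the variant assumes that any lines before the first header should be ignored:

```perl
use strict;
use warnings;
use 5.010;

# Hypothetical helper: returns one sequence length per FASTA record in $fh.
sub sequence_lengths {
    my ($fh) = @_;
    my @lengths;
    my $count;                          # undef until the first header is seen
    while (my $line = <$fh>) {
        chomp $line;                    # don't count the newline itself
        if ($line =~ /^>/) {
            push @lengths, $count if defined $count;
            $count = 0;                 # start a new record
        } elsif (defined $count) {      # lines before any header are ignored
            $count += length $line;
        }
    }
    push @lengths, $count if defined $count;   # last record
    return @lengths;
}

my $fasta = ">seq1\nACGT\nAC\n>seq2\nGGG\n";   # stand-in for file.fa
open my $fh, '<', \$fasta or die $!;
say for sequence_lengths($fh);
```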
Try echo | perl script.pl file.fa.
It works for me with the same "problem" in my code, because <> then gets EOF from STDIN.
In AWK, it is common to see this kind of structure for a script that runs on two files:
awk 'NR==FNR { print "first file"; next } { print "second file" }' file1 file2
This uses the fact that awk defines two variables: FNR, the line number in the current file, and NR, the global line count (equivalent to Perl's $.).
Is there something similar to this in Perl? I suppose that I could maybe use eof and a counter variable:
perl -nE 'if (! $fn) { say "first file" } else { say "second file" } ++$fn if eof' file1 file2
This works but it feels like I might be missing something.
To provide some context, I wrote this answer in which I manually define a hash but instead, I would like to populate the hash from the values in the first file, then do the substitutions on the second file. I suspect that there is a neat, idiomatic way of doing this in Perl.
Unfortunately, Perl doesn't have a construct similar to NR==FNR for telling two files apart. What you can do is use the BEGIN block to process one file and the main body to process the other.
For example, to process a file with the following:
map.txt
a=apple
b=ball
c=cat
d=dog
alpha.txt
f
a
b
d
You can do:
perl -lne'
    BEGIN {
        $x = pop;
        %h = map { chomp; ($k,$v) = split /=/; $k => $v } <>;
        @ARGV = $x
    }
    print join ":", $_, $h{$_} //= "Not Found"
' map.txt alpha.txt
f:Not Found
a:apple
b:ball
d:dog
Update:
I gave a pretty simple example, and looking at it now, I can only say TIMTOWTDI, since you can also do:
perl -F'=' -lane'
    if (@F == 2) { $h{$F[0]} = $F[1]; next }
    print join ":", $_, $h{$_} //= "Not Found"
' map.txt alpha.txt
f:Not Found
a:apple
b:ball
d:dog
However, I can say for sure that there is no NR==FNR construct in Perl, and there are various ways to process the files depending on what they contain.
It looks like what you're aiming for is to use the same loop for reading both files, and have a conditional inside the loop that chooses what to do with the data. I would avoid that idea because you are hiding two distinct processes in the same stretch of code, making it less than clear what is going on.
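But, in the case of just two files, you could compare the current file, $ARGV, with the first element of @ARGV. Save that element in a BEGIN block, though: <> shifts each file name off @ARGV as it opens the file, so while the first file is being read, $ARGV[0] already names the second one.
perl -nE 'BEGIN { $first = $ARGV[0] } if ($ARGV eq $first) { say "first file" } else { say "second file" }' file1 file2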
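The sketch below shows why the file name has to be saved before the loop starts. <> shifts each name off @ARGV as it opens that file, so while the first file is being read, $ARGV[0] already names the second one. The two one-line input files are hypothetical and are created with the core File::Temp module:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Two hypothetical one-line input files.
my @files;
for my $n (1 .. 2) {
    my ($tmp, $name) = tempfile(UNLINK => 1);
    print {$tmp} "line from file $n\n";
    close $tmp;
    push @files, $name;
}

my $first = $files[0];     # saved before <> starts shifting @ARGV
my @report;
@ARGV = @files;
while (<>) {
    push @report, {
        current  => $ARGV,                     # file being read now
        next_up  => $ARGV[0],                  # undef once @ARGV is empty
        is_first => ($ARGV eq $first ? 1 : 0),
    };
}

print "$_->{current}: ", ($_->{is_first} ? 'first file' : 'second file'), "\n"
    for @report;
```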
Forgetting about one-line programs, which I hate with a passion, I would just explicitly open $ARGV[0] and $ARGV[1]. Perhaps naming them like this
use strict;
use warnings;
use 5.010;
use autodie;
my ($definitions, $data) = @ARGV;
open my $fh, '<', $definitions;
while (<$fh>) {
    # Build hash
}
open $fh, '<', $data;
while (<$fh>) {
    # Process file
}
But if you want to avail yourself of the automatic opening facilities, then you can mess with @ARGV like this
use strict;
use warnings;
my ($definitions, $data) = @ARGV;
@ARGV = ($definitions);
while (<>) {
    # Build hash
}
@ARGV = ($data);
while (<>) {
    # Process file
}
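Here is a self-contained run of this @ARGV approach on the map.txt and alpha.txt data from the earlier answer. temp_file_with is a hypothetical helper that writes the sample data to temporary files with the core File::Temp module. After the first loop hits end-of-file, the next <> starts over with whatever @ARGV then holds:

```perl
use strict;
use warnings;
use 5.010;                  # for say and the // (defined-or) operator
use File::Temp qw(tempfile);

# Hypothetical helper: write $content to a temporary file, return its name.
sub temp_file_with {
    my ($content) = @_;
    my ($tmp, $name) = tempfile(UNLINK => 1);
    print {$tmp} $content;
    close $tmp;
    return $name;
}

# The map.txt / alpha.txt sample data.
my $definitions = temp_file_with("a=apple\nb=ball\nc=cat\nd=dog\n");
my $data        = temp_file_with("f\na\nb\nd\n");

# Pass 1: build the hash from the definitions file.
my %h;
@ARGV = ($definitions);
while (<>) {
    chomp;
    my ($k, $v) = split /=/, $_, 2;
    $h{$k} = $v;
}

# Pass 2: <> starts over with the new @ARGV and reads the data file.
my @out;
@ARGV = ($data);
while (<>) {
    chomp;
    push @out, join ':', $_, $h{$_} // 'Not Found';
}
say for @out;
```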
You can also create your own $fnr counter and compare it with $. on each line.
Given:
var='first line
second line'
echo "$var" >f1
echo "$var" >f2
echo "$var" >f3
You can create a pseudo FNR by setting a variable in the BEGIN block and resetting at each eof:
perl -lnE 'BEGIN{$fnr=1;}
if ($fnr==$.) {
say "first file: $ARGV, $fnr, $. $_";
}
else {
say "$ARGV, $fnr, $. $_";
}
eof ? $fnr=1 : $fnr++;' f{1..3}
Prints:
first file: f1, 1, 1 first line
first file: f1, 2, 2 second line
f2, 1, 3 first line
f2, 2, 4 second line
f3, 1, 5 first line
f3, 2, 6 second line
Definitely not as elegant as awk but it works.
Note that Ruby has support for FNR==NR type logic.
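You can also run the comparison the other way round in Perl by combining it with the close ARGV if eof idiom. Closing ARGV makes $. restart per file, so $. behaves like FNR, and a counter you never reset plays the role of NR. The sketch below uses two hypothetical input files created with the core File::Temp module:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Two hypothetical input files with the same two lines each.
my @files;
for (1 .. 2) {
    my ($tmp, $name) = tempfile(UNLINK => 1);
    print {$tmp} "first line\nsecond line\n";
    close $tmp;
    push @files, $name;
}

my $nr = 0;          # plays the role of awk's NR (never reset)
my @seen;
@ARGV = @files;
while (<>) {
    $nr++;
    # Because ARGV is closed at each eof, $. restarts per file, like FNR.
    my $which = $nr == $. ? 'first file' : 'later file';
    push @seen, "$which: NR=$nr FNR=$.";
}
continue {
    close ARGV if eof;
}
print "$_\n" for @seen;
```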
I have a simple question:
Why does the first program not print the first line of the file, while the second one does?
#! /usr/bin/perl
use warnings;
use strict;
my $protfile = "file.txt";
open (FH, $protfile);
while (<FH>) {
    print (<FH>);
}
#! /usr/bin/perl
use warnings;
use strict;
my $protfile = "file.txt";
open (FH, $protfile);
while (my $file = <FH>) {
    print ("$file");
}
Context.
Your first program tests for end-of-file on FH by reading the first line, then reads FH in list context as an argument to print. That translates to the whole file, as a list with one line per item. It then tests for EOF again, most likely detects it, and stops.
Your second program iterates by line, each one read in scalar context to variable $file, and prints them individually. It detects EOF by a special case in the while syntax. (see the code samples in the documentation)
So the specific reason your first program doesn't print the first line is that the line is consumed by the read in the while condition and never printed. Do note that the two programs' structure is pretty different: the first only runs a single while iteration, while the second iterates once per line.
PS: nowadays, the recommended way to manage files tends towards lexical filehandles (open my $file, 'name'; print <$file>;).
Because you are consuming the first line with the <> operator in the while condition and then using <> again in the print, the first line has already been read but is never printed. <> is the readline operator. You need to print the $_ variable, or assign the line to a named variable as you are doing in the second program. You could rewrite the body of the first loop as:
print;
And it would work, because print uses $_ if you don't give it anything.
When used in scalar context, <FH> returns the next single line from the file.
When used in list context, <FH> returns a list of all remaining lines in the file.
while (my $file = <FH>) is a scalar context, since you're assigning to a scalar. while (<FH>) is short for while(defined($_ = <FH>)), so it is also a scalar context. print (<FH>); makes it a list context, since you're using it as argument to a function that can take multiple arguments.
while (<FH>) {
    print (<FH>);
}
The while part reads the first line into $_ (which is never used again). Then the print part reads the rest of the lines all at once, then prints them all out again. Then the while condition is checked again, but since there are now no lines left, <FH> returns undef and the loop quits after just one iteration.
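Here is a small sketch that reproduces this with an in-memory file standing in for file.txt. Each version prints into its own in-memory output buffer, so the results can be compared:

```perl
use strict;
use warnings;

my $file = "first\nsecond\nthird\n";   # stand-in for file.txt

# Buggy version: the while condition reads "first" into $_, then
# <$in> in list context (print's argument) reads everything else.
my $buggy = '';
open my $in,  '<', \$file  or die $!;
open my $out, '>', \$buggy or die $!;
while (<$in>) {
    print {$out} <$in>;
}
close $out;

# Fixed version: print the line that the condition already read.
my $fixed = '';
open $in,  '<', \$file  or die $!;
open $out, '>', \$fixed or die $!;
while (<$in>) {
    print {$out} $_;
}
close $out;

print "buggy:\n$buggy", "fixed:\n$fixed";
```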
while (my $file = <FH>) {
    print ("$file");
}
does more what you probably expect: reads and then prints one line during each iteration of the loop.
By the way, print $file; does the same as print ("$file");
while (<FH>) {
    print (<FH>);
}
Use this instead:
while (<FH>) {
    print $_;
}
I'm reading this text file to get ONLY the words in it and ignore all kinds of whitespace:
hello
now
do you see this.sadslkd.das,msdlsa but
i hoohoh
And this is my Perl code:
#!/usr/bin/perl -w
require 5.004;
open F1, './text.txt';
while ($line = <F1>) {
    #print $line;
    @arr = split /\s+/, $line;
    foreach $w (@arr) {
        if ($w !~ /^\s+$/) {
            print $w."\n";
        }
    }
    #print @arr;
}
close F1;
And this is the output:
hello
now
do
you
see
this.sadslkd.das,msdlsa
but
i
hoohoh
The output is showing two newlines but I am expecting the output to be just words. What should I do to just get words?
You should always use strict and use warnings (in preference to the -w command-line qualifier) at the top of every Perl program, and declare each variable at its first point of use using my. That way Perl will tell you about simple errors that you may otherwise overlook.
You should also use lexical file handles with the three-parameter form of open, and check the status to make sure it succeeded. There is little point in explicitly closing an input file unless you expect your program to run for an appreciable time, as Perl will close all files for you on exit.
Do you really need to require Perl v5.4? That version is fifteen years old, and if there is anything older than that installed then you have a museum!
Your program would be better like this:
use strict;
use warnings;
open my $fh, '<', './text.txt' or die $!;
while (my $line = <$fh>) {
    my @arr = split /\s+/, $line;
    foreach my $w (@arr) {
        if ($w !~ /^\s+$/) {
            print $w."\n";
        }
    }
}
Note: my apologies. The warnings pragma and lexical file handles were introduced only in v5.6, so that part of my answer is irrelevant. The latest version of Perl is v5.16, and you really should upgrade.
As Birei has pointed out, the problem is that, when the line has leading whitespace, there is an empty field before the first separator. Imagine if your data were comma-separated: then you would want Perl to report a leading empty field if the line started with a comma.
To extract all the non-space characters, you can use a regular expression that does exactly that:
my @arr = $line =~ /\S+/g;
This can be emulated by using the default pattern for split, which is a single space given as a string ' ' (not a regular expression):
my @arr = split ' ', $line;
In this case split behaves like the awk utility and discards any leading empty fields as you expected.
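The sketch below runs the three approaches side by side on a sample line with leading spaces. It is a standalone illustration, not part of the original program:

```perl
use strict;
use warnings;

my $line = "   do you see this\n";   # leading spaces, trailing newline

my @regex_split = split /\s+/, $line;   # leading empty field survives
my @awk_split   = split ' ', $line;     # awk-style: leading whitespace skipped
my @matches     = $line =~ /\S+/g;      # every run of non-space characters

for my $result (['split /\s+/', \@regex_split],
                ["split ' '",  \@awk_split],
                ['m/\S+/g',    \@matches]) {
    my ($label, $fields) = @$result;
    print "$label: ", join('', map { "[$_]" } @$fields), "\n";
}
```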
This is even simpler if you let Perl use the $_ variable in the read loop, as all of the parameters for split can be defaulted:
while (<F1>) {
    my @arr = split;
    foreach my $w (@arr) {
        print "$w\n" if $w !~ /^\s+$/;
    }
}
This line is the problem:
@arr=split(/\s+/,$line);
With /\s+/ as the pattern, the leading whitespace itself counts as a separator, so split returns an empty first field. Use ' ' instead:
@arr=split(' ',$line);
I believe that in this line:
if(!($w =~ /^\s+$/))
You wanted to ask if there's nothing in this row - don't print it.
But the "+" in the regex actually forces it to have at least one whitespace character.
If you change "\s+" to "\s*", you'll see that it works, because * means zero or more occurrences, so the empty field matches and is skipped.
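Here is a quick sketch of that fix. It uses grep instead of the foreach/if loop; grep applies the same filter and is used here only to keep the example short:

```perl
use strict;
use warnings;

my $line   = "  hello world\n";
my @fields = split /\s+/, $line;          # ('', 'hello', 'world')

# /^\s+$/ needs at least one whitespace char, so it never matches ''.
my @kept_plus = grep { $_ !~ /^\s+$/ } @fields;
# /^\s*$/ also matches the empty string, so the empty field is dropped.
my @kept_star = grep { $_ !~ /^\s*$/ } @fields;

print scalar(@kept_plus), " fields kept with \\s+, ",
      scalar(@kept_star), " with \\s*\n";
```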
I have some code that looks like
my ($ids,$nIds);
while (<myFile>){
    chomp;
    $ids.= $_ . " ";
    $nIds++;
}
This should concatenate every line in my myFile, and nIds should be my number of lines. How do I print out my $ids and $nIds?
I tried simply print $ids, but Perl complains.
my ($ids, $nIds)
is a list, right? With two elements?
print "Number of lines: $nIds\n";
print "Content: $ids\n";
How did Perl complain? print $ids should work, though you probably want a newline at the end, either explicitly with print as above or implicitly by using say or -l/$\.
If you want to interpolate a variable in a string and have something immediately after it that would look like part of the variable name but isn't, enclose the variable name in {}:
print "foo${ids}bar";
You should always include all relevant code when asking a question. In this case, that is the print statement at the center of your question, which is the most crucial piece of information. The second most crucial piece of information is the error, which you also did not include. Next time, include both.
print $ids should be a fairly hard statement to mess up, but it is possible. Possible reasons:
1. $ids is undefined. This gives a warning such as Use of uninitialized value $ids in print.
2. $ids is out of scope. With use strict, this is a fatal compile-time error, Global symbol "$ids" requires explicit package name; without strict, you get the undefined-value warning from above.
3. You forgot a semicolon at the end of the line.
4. You tried to do print $ids $nIds, in which case Perl thinks that $ids is supposed to be a filehandle, and you get an error such as print() on unopened filehandle.
Explanations
1: Should not happen. It might happen if you do something like this (assuming you are not using strict):
my $var;
while (<>) {
    $Var .= $_;
}
print $var;
Gives the warning for undefined value, because $Var and $var are two different variables.
2: Might happen, if you do something like this:
if ($something) {
    my $var = "something happened!";
}
print $var;
my declares the variable inside the current block. Outside the block, it is out of scope.
3: Simple enough, common mistake, easily fixed. Perl reports it as a syntax error, often pointing at the line after the one missing the semicolon.
4: Also a common mistake. There are a number of ways to correctly print two variables in the same print statement:
print "$var1 $var2"; # interpolation inside a double-quoted string
print $var1 . $var2; # concatenation
print $var1, $var2; # supplying print with a list of args
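The sketch below compares the output of the three forms. It writes to in-memory buffers (open on a scalar reference) instead of STDOUT so the results can be checked. Only the interpolated form includes a space. The list form joins its arguments with $, which is empty by default:

```perl
use strict;
use warnings;

my ($var1, $var2) = ('foo', 'bar');

# Print each form into its own in-memory buffer instead of STDOUT.
my ($interpolated, $concatenated, $as_list) = ('', '', '');

open my $out, '>', \$interpolated or die $!;
print {$out} "$var1 $var2";     # interpolation, including the literal space
close $out;

open $out, '>', \$concatenated or die $!;
print {$out} $var1 . $var2;     # the . operator joins the two strings
close $out;

open $out, '>', \$as_list or die $!;
print {$out} $var1, $var2;      # list: joined with $, (empty by default)
close $out;

print "[$interpolated] [$concatenated] [$as_list]\n";
```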
Lastly, some perl magic tips for you:
use strict;
use warnings;
# open with explicit direction '<', check the return value
# to make sure open succeeded. Using a lexical filehandle.
open my $fh, '<', 'file.txt' or die $!;
# read the whole file into an array and
# chomp all the lines at once
chomp(my @file = <$fh>);
close $fh;
my $ids = join(' ', @file);
my $nIds = scalar @file;
print "Number of lines: $nIds\n";
print "Text:\n$ids\n";
Reading the whole file into an array is suitable for small files only, otherwise it uses a lot of memory. Usually, line-by-line is preferred.
Variations:
print "@file" is equivalent to $ids = join(' ', @file); print $ids;
$#file returns the last index in @file. Since arrays usually start at 0, $#file + 1 is equivalent to scalar @file.
You can also do:
my $ids;
do {
    local $/;
    $ids = <$fh>;
}
By temporarily "turning off" $/, the input record separator, i.e. newline, you will make <$fh> return the entire file. What <$fh> really does is read until it finds $/, then return that string. Note that this will preserve the newlines in $ids.
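Here is a self-contained version of this slurp with an in-memory string standing in for the file. It uses the value returned by the do block instead of assigning inside it. Because of local, $/ goes back to its normal value when the block ends:

```perl
use strict;
use warnings;

my $content = "line one\nline two\n";     # stand-in for the real file
open my $fh, '<', \$content or die $!;

my $ids = do {
    local $/;        # undef $/ only inside this block: <$fh> reads to EOF
    <$fh>;
};
close $fh;

print length($ids), " characters read\n";
```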
Line-by-line solution:
open my $fh, '<', 'file.txt' or die $!; # btw, $! contains the most recent error
my $ids;
while (<$fh>) {
    chomp;
    $ids .= "$_ "; # concatenate with string
}
my $nIds = $.; # $. is the current line number for the last filehandle accessed.
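Here is a self-contained version of the line-by-line approach, with an in-memory string standing in for file.txt. Closing the filehandle resets $., so read $. before you call close:

```perl
use strict;
use warnings;

my $content = "alpha\nbeta\ngamma\n";   # stand-in for file.txt
open my $fh, '<', \$content or die $!;

my $ids = '';
while (<$fh>) {
    chomp;
    $ids .= "$_ ";          # each line followed by a space
}
my $nIds = $.;              # read $. before close, which resets it
close $fh;

print "Number of lines: $nIds\n";
print "Text: $ids\n";
```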
How do I print out my $ids and $nIds?
print "$ids\n";
print "$nIds\n";
I tried simply print $ids, but Perl complains.
Complains about what? Uninitialised value? Perhaps your loop was never entered due to an error opening the file. Be sure to check if open returned an error, and make sure you are using use strict; use warnings;.
my ($ids, $nIds) is a list, right? With two elements?
It's a (very special) function call. $ids,$nIds is a list with two elements.