reading and printing a text file in Perl - perl

I have simple question:
why the first code does not print the first line of the file but the second one does?
#! /usr/bin/perl
use warnings;
use strict;
my $protfile = "file.txt";
open (FH, $protfile);
while (<FH>) {
print (<FH>);
}
#! /usr/bin/perl
use warnings;
use strict;
my $protfile = "file.txt";
open (FH, $protfile);
while (my $file = <FH>) {
print ("$file");
}

Context.
Your first program tests for end-of-file on FH by reading the first line, then reads FH in list context as an argument to print. That translates to the whole file, as a list with one line per item. It then tests for EOF again, most likely detects it, and stops.
Your second program iterates by line, each one read in scalar context to variable $file, and prints them individually. It detects EOF by a special case in the while syntax. (see the code samples in the documentation)
So the specific reason why your program doesn't print the first line in one case is that it's lost in the argument to while. Do note that the two programs' structure is pretty different: the first only runs a single while iteration, while the second iterates once per line.
PS: nowadays, the recommended way to manage files tends towards lexical filehandles (open my $file, 'name'; print <$file>;).

Because you are comsuming the first line with the <> operator and then using it again in the print, so the first line has already gone but you are not printing it. <> is the readline operator. You need to print the $_ variable, or assign it to a defined variable as you are doing in the second code. You could rewrite the first:
print;
And it would work, because print uses $_ if you don't give it anything.

When used in scalar context, <FH> returns the next single line from the file.
When used in list context, <FH> returns a list of all remaining lines in the file.
while (my $file = <FH>) is a scalar context, since you're assigning to a scalar. while (<FH>) is short for while(defined($_ = <FH>)), so it is also a scalar context. print (<FH>); makes it a list context, since you're using it as argument to a function that can take multiple arguments.
while (<FH>) {
print (<FH>);
}
The while part reads the first line into $_ (which is never used again). Then the print part reads the rest of the lines all at once, then prints them all out again. Then the while condition is checked again, but since there are now no lines left, <FH> returns undef and the loop quits after just one iteration.
while (my $file = <FH>) {
print ("$file");
}
does more what you probably expect: reads and then prints one line during each iteration of the loop.
By the way, print $file; does the same as print ("$file");

while (<FH>) {
print (<FH>);
}
use this instead:
while (<FH>) {
print $_;
}

Related

open multiline text file in perl

I have the following text file
Hello my
name is
Jeff
and I want to open it using perl. However, when using this code
use strict;
use warnings;
open(my $fh, "<", "input.txt") or die "cannot open input text!";
my $text= <$fh>;
print $text;
The output text is only Hello my. How can I make it print all of my input text lines and not just the first one ?
If you read the section on I/O operators in perldoc perlop, you will see the following:
In scalar context, evaluating a filehandle in angle brackets yields the next line from that file
And, a bit further down:
If a <FILEHANDLE> is used in a context that is looking for a list, a list comprising all input lines is returned, one line per list element.
If you assign <FILEHANDLE> to a scalar variable, you will get the next line (actually record) from the file. This is what you are doing:
my $text = <$fh>;
If you assign <FILEHANDLE> to an array, you will get all of the remaining data from the file, with each line (actually record) in a separate element of the array.
my #text = <$fh>;
If you want to get all data from a file into a scalar, there are a few approaches you can take. The naive approach is to read the data in list context and then join it using an empty string:
my $text = join '', <$fh>;
You can use $/ to change Perl's idea of a "record" (often done in a do block):
my $text = do { local $/; <$fh> };
I like the slurp method from Path::Tiny.
use Path::Tiny;
my $text = path($filename)->slurp;
In scalar context, evaluating a filehandle in angle brackets yields the next line from that file
so you can use while loop to read the lines (preferred method)
while (<$fh>) #output will store into default variable $_
{
print ;
}
Or you can use the array but you are trying to storing the all content of the file at time in the variable
my #array = <$fh>;
And the slurp mode to read all file in the scalar variable
my $multiline = do { local $/; <$fh>; } #$/ input record seprator

So what exactly does <FILE> do?

So, I've used <FILE> a large number of times. A simple example would be:
open (FILE, '<', "someFile.txt");
while (my $line = <FILE>){
print $line;
}
So, I had thought that using <FILE> would take a a part of a file at a time (a line specifically) and use it, and when it was called on again, it would go to the next line. And indeed, whenever I set <FILE> to a scalar, that's exactly what it would do. But, when I told the computer a line like this one:
print <FILE>;
it printed the entire file, newlines and all. So my question is, what does the computer think when it's passed <FILE>, exactly?
Diamond operator <> used to read from file is actually built-in readline function.
From perldoc -f readline
Reads from the filehandle whose typeglob is contained in EXPR (or from *ARGV if EXPR is not provided). In scalar context, each call reads and returns the next line until end-of-file is reached, whereupon the subsequent call returns undef. In list context, reads until end-of-file is reached and returns a list of lines.
If you would like to check particular context in perl,
sub context { return wantarray ? "LIST" : "SCALAR" }
print my $line = context(), "\n";
print my #array = context(), "\n";
print context(), "\n";
output
SCALAR
LIST
LIST
It depends if it's used in a scalar context or a list context.
In scalar context: my $line = <file> it reads one line at a tie.
In list context: my #lines = <FILE> it reads the whole file.
When you say print <FILE>; it's list context.
The behaviour is different depending on what context it is being evaluated in:
my $scalar = <FILE>; # Read one line from FILE into $scalar
my #array = <FILE>; # Read all lines from FILE into #array
As print takes a list argument, <FILE> is evaluated in list context and behaves in the latter way.

'merging' 2 files into a third using perl

I am reviewing for a test and I can't seem to get this example to code out right.
Problem: Write a perl script, called ileaf, which will linterleave the lines of a file with those of another file writing the result to a third file. If the files are a different length then the excess lines are written at the end.
A sample invocation:
ileaf file1 file2 outfile
This is what I have:
#!/usr/bin/perl -w
open(file1, "$ARGV[0]");
open(file2, "$ARGV[1]");
open(file3, ">$ARGV[2]");
while(($line1 = <file1>)||($line2 = <file2>)){
if($line1){
print $line1;
}
if($line2){
print $line2;
}
}
This sends the information to screen so I can immediately see the result. The final verson should "print file3 $line1;" I am getting all of file1 then all of file2 w/out and interleaving of the lines.
If I understand correctly, this is a function of the use of the "||" in my while loop. The while checks the first comparison and if it's true drops into the loop. Which will only check file1. Once file1 is false then the while checks file2 and again drops into the loop.
What can I do to interleave the lines?
You're not getting what you want from while(($line1 = <file1>)||($line2 = <file2>)){ because as long as ($line1 = <file1>) is true, ($line2 = <file2>) never happens.
Try something like this instead:
open my $file1, "<", $ARGV[0] or die;
open my $file2, "<", $ARGV[1] or die;
open my $file3, ">", $ARGV[2] or die;
while (my $f1 = readline ($file1)) {
print $file3 $f1; #line from file1
if (my $f2 = readline ($file2)) { #if there are any lines left in file2
print $file3 $f2;
}
}
while (my $f2 = readline ($file2)) { #if there are any lines left in file2
print $file3 $f2;
}
close $file1;
close $file2;
close $file3;
You'd think if they're teaching you Perl, they'd use the modern Perl syntax. Please don't take this personally. After all, this is how you were taught. However, you should know the new Perl programming style because it helps eliminates all sorts of programming mistakes, and makes your code easier to understand.
Use the pragmas use strict; and use warnings;. The warnings pragma replaces the need for the -w flag on the command line. It's actually more flexible and better. For example, I can turn off particular warnings when I know they'll be an issue. The use strict; pragma requires me to declare my variables with either a my or our. (NOTE: Don't declare Perl built in variables). 99% of the time, you'll use my. These variables are called lexically scoped, but you can think of them as true local variables. Lexically scoped variables don't have any value outside of their scope. For example, if you declare a variable using my inside a while loop, that variable will disappear once the loop exits.
Use the three parameter syntax for the open statement: In the example below, I use the three parameter syntax. This way, if a file is called >myfile, I'll be able to read from it.
**Use locally defined file handles. Note that I use my $file_1_fh instead of simply FILE_1_HANDLE. The old way, FILE_1_HANDLE is globally scoped, plus it's very difficult to pass the file handle to a function. Using lexically scoped file handles just works better.
Use or and and instead of || and &&: They're easier to understand, and their operator precedence is better. They're more likely not to cause problems.
Always check whether your open statement worked: You need to make sure your open statement actually opened a file. Or use the use autodie; pragma which will kill your program if the open statements fail (which is probably what you want to do anyway.
And, here's your program:
#! /usr/bin/env perl
#
use strict;
use warnings;
use autodie;
open my $file_1, "<", shift;
open my $file_2, "<", shift;
open my $output_fh, ">", shift;
for (;;) {
my $line_1 = <$file_1>;
my $line_2 = <$file_2>;
last if not defined $line_1 and not defined $line_2;
no warnings qw(uninitialized);
print {$output_fh} $line_1 . $line_2;
use warnings;
}
In the above example, I read from both files even if they're empty. If there's nothing to read, then $line_1 or $line_2 is simply undefined. After I do my read, I check whether both $line_1 and $line_2 are undefined. If so, I use last to end my loop.
Because my file handle is a scalar variable, I like putting it in curly braces, so people know it's a file handle and not a variable I want to print out. I don't need it, but it improves clarity.
Notice the no warnings qw(uninitialized);. This turns off the uninitialized warning I'll get. I know that either $line_1 or $line_3 might be uninitialized, so I don't want the warning. I turn it back on right below my print statement because it is a valuable warning.
Here's another way to do that for loop:
while ( 1 ) {
my $line_1 = <$file_1>;
my $line_2 = <$file_2>;
last if not defined $line_1 and not defined $line_2;
print {$output_fh} $line_1 if defined $line_1;
print {$output_fh} $line_2 if defined $line_2;
}
The infinite loop is a while loop instead of a for loop. Some people don't like the C style of for loop and have banned it from their coding practices. Thus, if you have an infinite loop, you use while ( 1 ) {. To me, maybe because I came from a C background, for (;;) { means infinite loop, and while ( 1 ) { takes a few extra milliseconds to digest.
Also, I check whether $line_1 or $line_2 is defined before I print them out. I guess it's better than using no warning and warning, but I need two separate print statements instead of combining them into one.
Here's another option that uses List::MoreUtils's zip to interleave arrays and File::Slurp to read and write files:
use strict;
use warnings;
use List::MoreUtils qw/zip/;
use File::Slurp qw/read_file write_file/;
chomp( my #file1 = read_file shift );
chomp( my #file2 = read_file shift );
write_file shift, join "\n", grep defined $_, zip #file1, #file2;
Just noticed Tim A has a nice solution already posted. This solution is a bit wordier, but might illustrate exactly what is going on a bit more.
The method I went with reads all of the lines from both files into two arrays, then loops through them using a counter.
#!/usr/bin/perl -w
use strict;
open(IN1, "<", $ARGV[0]);
open(IN2, "<", $ARGV[1]);
my #file1_lines;
my #file2_lines;
while (<IN1>) {
push (#file1_lines, $_);
}
close IN1;
while (<IN2>) {
push (#file2_lines, $_);
}
close IN2;
my $file1_items = #file1_lines;
my $file2_items = #file2_lines;
open(OUT, ">", $ARGV[2]);
my $i = 0;
while (($i < $file1_items) || ($i < $file2_items)) {
if (defined($file1_lines[$i])) {
print OUT $file1_lines[$i];
}
if (defined($file2_lines[$i])) {
print OUT $file2_lines[$i];
}
$i++
}
close OUT;

How to print variables in Perl

I have some code that looks like
my ($ids,$nIds);
while (<myFile>){
chomp;
$ids.= $_ . " ";
$nIds++;
}
This should concatenate every line in my myFile, and nIds should be my number of lines. How do I print out my $ids and $nIds?
I tried simply print $ids, but Perl complains.
my ($ids, $nIds)
is a list, right? With two elements?
print "Number of lines: $nids\n";
print "Content: $ids\n";
How did Perl complain? print $ids should work, though you probably want a newline at the end, either explicitly with print as above or implicitly by using say or -l/$\.
If you want to interpolate a variable in a string and have something immediately after it that would looks like part of the variable but isn't, enclose the variable name in {}:
print "foo${ids}bar";
You should always include all relevant code when asking a question. In this case, the print statement that is the center of your question. The print statement is probably the most crucial piece of information. The second most crucial piece of information is the error, which you also did not include. Next time, include both of those.
print $ids should be a fairly hard statement to mess up, but it is possible. Possible reasons:
$ids is undefined. Gives the warning undefined value in print
$ids is out of scope. With use
strict, gives fatal warning Global
variable $ids needs explicit package
name, and otherwise the undefined
warning from above.
You forgot a semi-colon at the end of
the line.
You tried to do print $ids $nIds,
in which case perl thinks that $ids
is supposed to be a filehandle, and
you get an error such as print to
unopened filehandle.
Explanations
1: Should not happen. It might happen if you do something like this (assuming you are not using strict):
my $var;
while (<>) {
$Var .= $_;
}
print $var;
Gives the warning for undefined value, because $Var and $var are two different variables.
2: Might happen, if you do something like this:
if ($something) {
my $var = "something happened!";
}
print $var;
my declares the variable inside the current block. Outside the block, it is out of scope.
3: Simple enough, common mistake, easily fixed. Easier to spot with use warnings.
4: Also a common mistake. There are a number of ways to correctly print two variables in the same print statement:
print "$var1 $var2"; # concatenation inside a double quoted string
print $var1 . $var2; # concatenation
print $var1, $var2; # supplying print with a list of args
Lastly, some perl magic tips for you:
use strict;
use warnings;
# open with explicit direction '<', check the return value
# to make sure open succeeded. Using a lexical filehandle.
open my $fh, '<', 'file.txt' or die $!;
# read the whole file into an array and
# chomp all the lines at once
chomp(my #file = <$fh>);
close $fh;
my $ids = join(' ', #file);
my $nIds = scalar #file;
print "Number of lines: $nIds\n";
print "Text:\n$ids\n";
Reading the whole file into an array is suitable for small files only, otherwise it uses a lot of memory. Usually, line-by-line is preferred.
Variations:
print "#file" is equivalent to
$ids = join(' ',#file); print $ids;
$#file will return the last index
in #file. Since arrays usually start at 0,
$#file + 1 is equivalent to scalar #file.
You can also do:
my $ids;
do {
local $/;
$ids = <$fh>;
}
By temporarily "turning off" $/, the input record separator, i.e. newline, you will make <$fh> return the entire file. What <$fh> really does is read until it finds $/, then return that string. Note that this will preserve the newlines in $ids.
Line-by-line solution:
open my $fh, '<', 'file.txt' or die $!; # btw, $! contains the most recent error
my $ids;
while (<$fh>) {
chomp;
$ids .= "$_ "; # concatenate with string
}
my $nIds = $.; # $. is Current line number for the last filehandle accessed.
How do I print out my $ids and $nIds?
print "$ids\n";
print "$nIds\n";
I tried simply print $ids, but Perl complains.
Complains about what? Uninitialised value? Perhaps your loop was never entered due to an error opening the file. Be sure to check if open returned an error, and make sure you are using use strict; use warnings;.
my ($ids, $nIds) is a list, right? With two elements?
It's a (very special) function call. $ids,$nIds is a list with two elements.

Why doesn't Perl file glob() work outside of a loop in scalar context?

According to the Perl documentation on file globbing, the <*> operator or glob() function, when used in a scalar context, should iterate through the list of files matching the specified pattern, returning the next file name each time it is called or undef when there are no more files.
But, the iterating process only seems to work from within a loop. If it isn't in a loop, then it seems to start over immediately before all values have been read.
From the Perl docs:
In scalar context, glob iterates through such filename expansions, returning undef when the list is exhausted.
http://perldoc.perl.org/functions/glob.html
However, in scalar context the operator returns the next value each time it's called, or undef when the list has run out.
http://perldoc.perl.org/perlop.html#I/O-Operators
Example code:
use warnings;
use strict;
my $filename;
# in scalar context, <*> should return the next file name
# each time it is called or undef when the list has run out
$filename = <*>;
print "$filename\n";
$filename = <*>; # doesn't work as documented, starts over and
print "$filename\n"; # always returns the same file name
$filename = <*>;
print "$filename\n";
print "\n";
print "$filename\n" while $filename = <*>; # works in a loop, returns next file
# each time it is called
In a directory with 3 files...file1.txt, file2.txt, and file3.txt, the above code will output:
file1.txt
file1.txt
file1.txt
file1.txt
file2.txt
file3.txt
Note: The actual perl script should be outside the test directory, or you will see the file name of the script in the output as well.
Am I doing something wrong here, or is this how it is supposed to work?
Here's a way to capture the magic of the <> glob operator's state into an object that you can manipulate in a normal sort of way: anonymous subs (and/or closures)!
sub all_files {
return sub { scalar <*> };
}
my $iter = all_files();
print $iter->(), "\n";
print $iter->(), "\n";
print $iter->(), "\n";
or perhaps:
sub dir_iterator {
my $dir = shift;
return sub { scalar glob("$dir/*") };
}
my $iter = dir_iterator("/etc");
print $iter->(), "\n";
print $iter->(), "\n";
print $iter->(), "\n";
Then again my inclination is to file this under "curiosity". Ignore this particular oddity of glob() / <> and use opendir/readdir, IO::All/readdir, or File::Glob instead :)
The following code also seems to create 2 separate instances of the iterator...
for ( 1..3 )
{
$filename = <*>;
print "$filename\n" if defined $filename;
$filename = <*>;
print "$filename\n" if defined $filename;
}
I guess I see the logic there, but it is kind of counter intuitive and contradictory to the documentation. The docs don't mention anything about having to be in a loop for the iteration to work.
Also from perlop:
A (file)glob evaluates its (embedded) argument only when it is starting a new list.
Calling glob creates a list, which is either returned whole (in list context) or retrieved one element at a time (in scalar context). But each call to glob creates a separate list.
(Scratching away at my rusty memory of Perl...) I think that multiple lexical instances of <*> are treated as independent invokations of glob, whereas in the while loop you are invoking the same "instance" (whatever that means).
Imagine, for instance, if you did this:
while (<*>) { ... }
...
while (<*>) { ... }
You certainly wouldn't expect those two invocations to interfere with each other.