Perl File Handling - perl

The below is the Perl script that I wrote today. This reads the content from one file and writes on the other file. It works but, not completely.
#---------------------------------------------------------------------------
#!/usr/bin/perl
open IFILE, "text3.txt" or die "File not found";
open OFILE, ">text4.txt" or die "File not found";
my $lineno = 0;
while(<IFILE>)
{
#var=<IFILE>;
$lineno++;
print OFILE "#var";
}
close(<IFILE>);
close(<OFILE>);
#---------------------------------------------------------------------------
The issue is, it reads and writes contens, but not all.
text3.txt has four lines. The above script reads only from second line and writes on text4.txt. So, finally I get only three lines (line.no 2 to line.no 4) of text3.txt.
What is wrong with the above program. I don't have any idea about how to check the execution flow on Perl scripts. Kindly help me.
I'm completely new to Programming. I believe, learning all these would help me in changing my career path.
Thanks in Advance,
Vijay

<IFILE> reads one line from IFILE (only one because it's in scalar context). So while(<IFILE>) reads the first line, then the <IFILE> in list context within the while block reads the rest. What you want to do is:
# To read each line one by one:
while(!eof(IFILE)) { # check if end of file is reached instead of reading a line
my $line = <IFILE>; # scalar context, reads only one line
print OFILE $line;
}
# Or to read the whole file at once:
my #content = <IFILE>; # list context, read whole file
print OFILE #content;

The problem is that this line...
while(<IFILE>)
...reads one line from text3.txt, and then this line...
#var=<IFILE>;
...reads ALL of the remaining lines from text3.txt.
You can do it either way, by looping with while or all at once with #var=<IFILE>, but trying to do both won't work.

This is how I would have written the code in your question.
#!/usr/bin/perl
use warnings;
use strict;
use autodie;
# don't need to use "or die ..." when using the autodie module
open my $input, '<', 'text3.txt';
open my $output, '>', 'text4.txt';
while(<$input>){
my $lineno = $.;
print {$output} $_;
}
# both files get closed automatically when they go out of scope
# so no need to close them explicitly
I would recommend always putting use strict and use warnings at the beginning of all Perl files. At least until you know exactly why it is recommended.
I used autodie so that I didn't have to check the return value of open manually. ( autodie was added to Core in version 5.10.1 )
I used the three argument form of open because it is more robust.
It is important to note that while (<$input>){ ... } gets transformed into while (defined($_ = <$input>)){ ... } by the compiler. Which means that the current line is in the $_ variable.
I also used the special $. variable to get the current line number, rather than trying to keep track of the number myself.

There is a couple of questions you might want to think about, if you are strictly copying a file you could use File::Copy module.
If you are going to process the input before writing it out, you might also consider whether you want to keep both files open at the same time or instead read the whole content of the first file (into memory) first, and then write it to the outfile.
This depends on what you are doing underneath. Also if you have a huge binary file, each line in the while-loop might end up huge, so if memory is indeed an issue you might want to use more low-level stream-based reading, more info on I/O: http://oreilly.com/catalog/cookbook/chapter/ch08.html
My suggestion would be to use the cleaner PBP suggested way:
#!/usr/bin/perl
use strict;
use warnings;
use English qw(-no_match_vars);
my $in_file = 'text3.txt';
my $out_file = 'text4.txt';
open my $in_fh, '<', $in_file or die "Unable to open '$in_file': $OS_ERROR";
open my $out_fh, '>', $out_file or die "Unable to open '$out_file': $OS_ERROR";
while (<$in_fh>) {
# $_ is automatically populated with the current line
print { $out_fh } $_ or die "Unable to write to '$out_file': $OS_ERROR";
}
close $in_fh or die "Unable to close '$in_file': $OS_ERROR";
close $out_fh or die "Unable to close '$out_file': $OS_ERROR";
OR just print out the whole in-file directly:
#!/usr/bin/perl
use strict;
use warnings;
use English qw(-no_match_vars);
my $in_file = 'text3.txt';
my $out_file = 'text4.txt';
open my $in_fh, '<', $in_file or die "Unable to open '$in_file': $OS_ERROR";
open my $out_fh, '>', $out_file or die "Unable to open '$out_file': $OS_ERROR";
local $INPUT_RECORD_SEPARATOR; # Slurp mode, read in all content at once, see: perldoc perlvar
print { $out_fh } <$in_fh> or die "Unable to write to '$out_file': $OS_ERROR";;
close $in_fh or die "Unable to close '$in_file': $OS_ERROR";
close $out_fh or die "Unable to close '$out_file': $OS_ERROR";
In addition if you just want to apply a regular expression or similar to a file quickly, you can look into the -i switch of the perl command: perldoc perlrun
perl -p -i.bak -e 's/foo/bar/g' text3.txt; # replace all foo with bar in text3.txt and save original in text3.txt.bak

When you're closing the files, use just
close(IFILE);
close(OFILE);
When you surround a file handle with angle brackets like <IFILE>, Perl interprets that to mean "read a line of text from the file inside the angle brackets". Instead of reading from the file, you want to close the actual file itself here.

Related

Open a file and overwrite the file with adjustments and no backup

I have the following three lines:
rename($file_path, $file_fh.'.bak');
open( my $file_IN_fh, '<' , $file_path.'.bak') || die "die message";
open( my $file_OUT_fh, '>' , $file_path) || die "die message";
It works great. It allows me to go through the in file while(<$file_IN_fh>), make a bunch of changes with a script (s///g, if() to determine if the line stays or not, etc), and write to the out file. In the end I get my edited file and the file name is unchanged.
My issue is that I am at a point where I no longer (currently) want the backup files, so I want to replace the code with something similar that won't create the backup file, and comment back and forth the three lines over the years if my needs change.
How do I do this kind of editing in place not from the command line?
One basic way is to read the file line by line and write desired output lines to a temporary file, which is then renamed so to overwrite the original.
use File::Copy qw(move);
open my $fh, '<', $file or die "Can't open $file: $!";
open my $fh_out, '>', $outfile or die "Can't open $outfile: $!";
while (<$fh>) {
next if /line_to_skip/;
s/patt/repl/g;
print $fh_out $_;
}
close $_ for ($fh, $fh_out);
move ($outfile, $file) or die "Can't move $outfile to $file: $!";
This is what is normally done by tools that edit files "in place" (with additional safety, checks, and flexibility). Since the $outfile is temporary use File::Temp.
Add checks when close-ing files.
Note that this changes the file's inode number, which may matter for some applications.†
If the file isn't huge you can simplify this and read it in first
open my $fh, '<', $file or die "Can't open $file: $!";
my #lines = <$fh>;
open $fh, '>', $file or die "Can't open $file for writing: $!";
for (#lines) {
next if /line_to_skip/;
s/patt/repl/g;
print $fh_out $_;
}
close $fh;
what preserves the inode number, since > mode truncates the existing inode data.
† If this is indeed a problem, you can still keep the same inode. After the temporary file is written, open it for reading and open the original file for writing; that truncates the contents of that inode. Then copy the temporary file to the original. Close handles and delete the temporary file.
If the file is huge, then I'd question why you'd want to avoid the temporary file. Otherwise, I'd suggest just loading the file into memory, make modifications, then write it back out.
use File::Slurp qw( read_file write_file );
my $in = read_file($qfn, array_ref => 1);
my #out;
while (defined( $_ = shift(#$in) )) {
s/a/b/g; # For example.
push #out, $_ if /c/; # For example.
}
write_file($qfn, \#out);
I avoided using expensive splice by using two arrays.
Note that using Tie::File might save one line of code, but this will be 30x faster[1], and probably use less memory (despite memory-saving being Tie::File's goal). Tie::File is never the answer!!!
This is not necessarily representative of all Tie::File uses, but I have indeed timed Tie::File taking 30x longer than the alternative at some basic task. That means that 2 seconds worth of work would have taken 1 minute with Tie::File!
Take a look at the Tie::File module. It is a core module and so shouldn't need installing, and the code is as simple as
use Tie::File;
tie my #file, 'Tie::File', $filepath or die $!;
Thereafter the array #file will hold the contents of the file, one line per element, and any changes to the array will be reflected in the file. All array operations such as push, splice, etc. will work fine
Note that line one of the file is in element zero of the array etc.

Printing a text file using perl

I am completely new to this and this should be the easiest thing to do but for some reason I cannot get my local text file to print. After trying multiple times with different code I came to use the following code but it doesn't print.
I have searched for days on various threads to solve this and have had no luck. Please help. Here is my code:
#!/usr/bin/perl
$newfile = "file.txt";
open (FH, $newfile);
while ($file = <FH>) {
print $file;
}
I updated my code to the following:
#!/user/bin/perl
use strict; # Always use strict
use warnings; # Always use warnings.
open(my $fh, "<", "file.txt") or die "unable to open file.txt: $!";
# Above we open file using 3 handle method
# or die die with error if unable to open it.
while (<$fh>) { # While in the file.
print $_; # Print each line
}
close $fh; # Close the file
system('C:\Users\RSS\file.txt');
It returns the following: my first report generated by perl. I do not know where this is coming from. Nowhere do I have a print "my first report generated by perl."; statement and it definitely is not in my text file.
My text file is full of various emails, addresses, phone numbers and snippets of emails.
Thank you all for your help. I figured out my problem. I somehow managed to kick myself out of my directory and did not realize it.
This is most likely a combination of a failure to open the file, and a failure to check the return value of open.
If you are completely new to perl, I warmly recommend reading the excellent "perlintro" man page, using either man perlintro or perldoc perlintro on the command line, or taking a look here: https://perldoc.perl.org/perlintro.html.
The "Files and I/O" section there gives a good and concise way of doing this:
open(my $in, "<", "input.txt") or die "Can't open input.txt: $!";
while (<$in>) { # assigns each line in turn to $_
print "Just read in this line: $_";
}
This version will give you an explanation and abort if anything goes wrong while trying to open the file. For example, if there is no file named file.txt in the current working directory, your version will quietly fail to open the file, and afterwards it will quietly fail to read from the closed file handle.
Also, always adding at least one of these to your perl scripts will save you a lot of trouble in the long run:
use warnings; # or use the -w command line switch to turn warnings on globally
use diagnostics;
These won't catch the failure to open the file, but will alert on the failed read.
In the first example here you can see that without the diagnostics module, the code fails without any error messages. The second example shows how the diagnostics module changes this.
$ perl -le 'open FH, "nonexistent.txt"; while(<FH>){print "foo"}'
$ perl -le 'use diagnostics; open FH, "nonexistent.txt"; while(<FH>){print "foo"}'
readline() on closed filehandle FH at -e line 1 (#1)
(W closed) The filehandle you're reading from got itself closed sometime
before now. Check your control flow.
By the way, the legendary "Camel Book" is basically the perl man pages formatted for paper printing, so reading the perldocs in the order listed in perldoc perl will give you a high level of understanding of the language in a reasonably accessible and inexpensive manner.
Happy hacking!
This is simple and including explanations.
use strict; # Always use strict
use warnings; # Always use warnings.
open(my $fh, "<", "file.txt") or die "unable to open file.txt: $!";
# Above we open file using 3 handle method
# or die die with error if unable to open it.
while (<$fh>) { # While in the file.
print $_; # Print each line
}
close $fh; # Close the file
There is then also the case where you are trying to open a file which is not in a location where you think it is. So consider doing full path, if not in the same dir.
open(my $fh, "<", 'F:\Workdir\file.txt') or die "unable to open < input.txt: $!";
EDIT: After your comments, it seems that you are opening an empty file. Please add this at the bottom of that same script and rerun. It will open the file in C:\Users\RSS and make sure it does actually contain data?
system('C:\Users\RSS\file.txt');
First, of all as you are starting out, it is better to enable all warnings by 'use warnings' and disable all such expression which can lead to uncertain behavior or are difficult to debug by pragma 'use strict'.
As you are dealing with file stream, it is always recommended to the check if you were able to open the stream. so, try to use croak or die both would terminate the program with a given message.
Instead of reading inside the while condition, I would recommend checking for end of file. So, loop breaks as end is found. Usually, when reading a line you would use it for further processing, so it is good idea to remove end of lines using chomp.
A sample for reading a file in perl can be as follows:
#!/user/bin/perl
use strict;
use warnings;
my $newfile = "file.txt";
open (my $fh, $newfile) or die "Could not open file '$newfile' $!";
while (!eof($fh))
{
my $line=<$fh>;
chomp($line);
print $line , "\n";
}

How to delete common lines from one of 2 files in Perl?

I have 2 files, a small one and a big one. The small file is a subset of the big one.
For instance:
Small file:
solar:1000
alexey:2000
Big File:
andrey:1001
solar:1000
alexander:1003
alexey:2000
I want to delete all the lines from Big.txt which are also present in Small.txt. In other words, I want to delete the lines in Big file which are common to the small File.
So, I wrote a Perl Script as shown below:
#! /usr/bin/perl
use strict;
use warnings;
my ($small, $big, $output) = #ARGV;
open(BIG, "<$big") || die("Couldn't read from the file: $big\n");
my #contents = <BIG>;
close (BIG);
open(SMALL, "<$small") || die ("Couldn't read from the file: $small\n");
while(<SMALL>)
{
chomp $_;
#contents = grep !/^\Q$_/, #contents;
}
close(SMALL);
open(OUTPUT, ">>$output") || die ("Couldn't open the file: $output\n");
print OUTPUT #contents;
close(OUTPUT);
However, this Perl Script does not delete the lines in Big.txt which are common to Small.txt
In this script, I first open the big file stream and copy the entire contents into the array, #contents. Then, I iterate over each entry in the small file and check for its presence in the bigger file. I filter the line from Big File and save it back into the array.
I am not sure why this script does not work? Thanks
Your script does NOT work because grep uses $_ and takes over (for the duration of grep) the old value of your $_ from the loop (e.g. the variable $_ you use in the regex is NOT the variable used for storing the loop value in the while block - they are named the same, but have different scopes).
Use a named variable instead (as a rule, NEVER use $_ for any code longer than 1 line, precisely to avoid this type of bug):
while (my $line=<SMALL>) {
chomp $line;
#contents = grep !/^\Q$line/, #contents;
}
However, as Oleg pointed out, a more efficient solution is to read small file's lines into a hash and then process the big file ONCE, checking hash contents (I also improved the style a bit - feel free to study and use in the future, using lexical filehandle variables, 3-arg form of open and IO error printing via $!):
#! /usr/bin/perl
use strict;
use warnings;
my ($small, $big, $output) = #ARGV;
use File::Slurp;
my #small = read_file($small);
my %small = map { ($_ => 1) } #small;
open(my $big, "<", $big) or die "Can not read $big: Error: $!\n";
open(my $output, ">", $output) or die "Can not write to $output: Error: $!\n";
while(my $line=<$big>) {
chomp $line;
next if $small{$line}; # Skip common
print $output "$line\n";
}
close($big);
close($output);
It doesn't work for several reasons. First, lines in #content still have their newlines in. And second, when you grep, $_ in !/^\Q$_/ is set not to the last line from small file, but for each element of #contents array, effectively making it: for each element in list return everything except this element, leaving you with empty list at the end.
This isn't really the good way to do it - you're reading big file and then trying to reprocess it several times. First, read a small file and put every line in hash. Then read big file inside while(<>) loop, so you won't waste your memory reading it entirely. On each line, check if key exists in previously populated hash and if it does - go to next iteration, otherwise print the line.
Here is a small and efficient solution to your problem:
#!/usr/bin/perl
use strict;
use warnings;
my ($small, $big, $output) = #ARGV;
my %diffx;
open my $bfh, "<", $big or die "Couldn't read from the file $big: $!\n";
# load big file's contents
my #big = <$bfh>;
chomp #big;
# build a lookup table, a structured table for big file
#diffx{#big} = ();
close $bfh or die "$!\n";
open my $sfh, "<", $small or die "Couldn't read from the file $small: $!\n";
my #small = <$sfh>;
chomp #small;
# delete the elements that exist in small file from the lookup table
delete #diffx{#small};
close $sfh;
# print join "\n", keys %diffx;
open my $ofh, ">", $output or die "Couldn't open the file $output for writing: $!\n";
# what is left is unique lines from big file
print $ofh join "\n", keys %diffx;
close $ofh;
__END__
P.S. I learned this trick and many others from Perl Cookbook, 2nd Edition. Thanks

perl appending issues

I have some code that appends into some files in the nested for loops. After exiting the for loops, I want to append .end to all the files.
foreach my $file (#SPICE_FILES)
{
open(FILE1, ">>$file") or die "[ERROR $0] cannot append to file : $file\n";
print FILE1 "\n.end\n";
close FILE1;
}
I noticed in some strange cases that the ".end" is appended into the middle of the files!
how do i resolve this??
Since I do not yet have the comment-privilege I'll have to write this as an 'answer'.
Do you use any dodgy modules?
I have run into issues where (obviously) broken perl-modules have done something to the output buffering. For me placing
$| = 1;
in the code has helped. The above statement turns off perls output buffering (AFAIK). It might have had other effects too, but I have not seen anything negative come out of it.
I guess you've got data buffered in some previously opened file descriptors. Try closing them before re-opening:
open my $fd, ">>", $file or die "Can't open $file: $!";
print $fd, $data;
close $fd or die "Can't close: $!";
Better yet, you can append those filehanles to an array/hash and write to them in cleanup:
push #handles, $fd;
# later
print $_ "\n.end\n" for #handles;
Here's a case to reproduce the "impossible" append in the middle:
#!/usr/bin/perl -w
use strict;
my $file = "file";
open my $fd, ">>", $file;
print $fd "begin"; # no \n -- write buffered
open my $fd2, ">>", $file;
print $fd2 "\nend\n";
close $fd2; # file flushed on close
# program ends here -- $fd finally closed
# you're left with "end\nbegin"
It’s not possible to append something to the middle of the file. The O_APPEND flag guarantees that each write(2) syscall will place its contents at the old EOF and update the st_size field by incrementing it by however many bytes you just wrote.
Therefore if you find that your own data is not showing up at the end when you go to look at it, then another agent has written more data to it afterwards.

How can I read input from a text file in Perl?

I would like to take input from a text file in Perl. Though lot of info are available over the net, it is still very confusing as to how to do this simple task of printing every line of text file. So how to do it? I am new to Perl, thus the confusion .
eugene has already shown the proper way. Here is a shorter script:
#!/usr/bin/perl
print while <>
or, equivalently,
#!/usr/bin/perl -p
on the command line:
perl -pe0 textfile.txt
You should start learning the language methodically, following a decent book, not through haphazard searches on the web.
You should also make use of the extensive documentation that comes with Perl.
See perldoc perltoc or perldoc.perl.org.
For example, opening files is covered in perlopentut.
First, open the file:
open my $fh, '<', "filename" or die $!;
Next, use the while loop to read until EOF:
while (<$fh>) {
# line contents's automatically stored in the $_ variable
}
close $fh or die $!;
# open the file and associate with a filehandle
open my $file_handle, '<', 'your_filename'
or die "Can't open your_filename: $!\n";
while (<$file_handle>) {
# $_ contains each record from the file in turn
}