Perl File I/O grep

There may be syntax errors as I'm doing this from memory, but:
use strict;
use warnings;
open(FILE, "+>file") or die "can't open";
print FILE "foo";
if (grep {/^foo$/m} <FILE>) {
print "bar";
}
close(FILE) or die "can't close";
Why doesn't this print bar and how should I do this? I'm writing to a file and I need to check in the future if I've written certain things to the file before continuing, ie. if foo already exists then don't write foo.

Reading data from a file (e.g., <FILE>) starts reading from the current file pointer, not from the start of the file. In this case, the pointer is at the end of the file after the write, so nothing gets read.
If you wanted to restart reading from the beginning, you could seek to the beginning first:
seek FILE, 0, 0;
However, keep in mind that this will be very inefficient. If you expect this to be a common operation, you'll be much better off storing the things you've written to an array and searching through that.
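The suggestion above (remember what has been written instead of re-reading the file) could be sketched roughly as follows. The helper name append_once and the %seen hash are hypothetical, and the record only covers lines written during the current run, not lines already in the file from earlier runs.

```perl
use strict;
use warnings;

# Hypothetical in-memory record of what has been written, per file path.
my %seen;

# Append $line to $path unless this helper has already written it.
# Returns 1 if the line was written, 0 if it was skipped.
sub append_once {
    my ($path, $line) = @_;
    return 0 if $seen{$path}{$line}++;    # already written, skip
    open my $fh, '>>', $path or die "Can't open $path: $!";
    print {$fh} "$line\n" or die "Can't write to $path: $!";
    close $fh or die "Can't close $path: $!";
    return 1;
}
```

Calling append_once('file', 'foo') twice writes foo once; the second call returns 0 without touching the file.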

There are a few problems with the code that could be causing this.
The output to the file may be buffered so reading from the file will not see the output until after the buffer has been flushed to the file.
In addition when you write to the file the file handle's location is moved to the end of where you have written so the read will not see it. This isn't tested but you should be looking along the lines of something like this.
use Modern::Perl;
use Fcntl qw(SEEK_SET);
open(my $file_handle, "+>", "file") or die "can't open";
print $file_handle "foo";
$file_handle->flush;
seek($file_handle, 0, SEEK_SET);
if (grep {/^foo$/m} <$file_handle>) {
print "bar\n";
}
close($file_handle) or die "can't close";

Related

Open a file and overwrite the file with adjustments and no backup

I have the following three lines:
rename($file_path, $file_path.'.bak');
open( my $file_IN_fh, '<' , $file_path.'.bak') || die "die message";
open( my $file_OUT_fh, '>' , $file_path) || die "die message";
It works great. It allows me to go through the in file while(<$file_IN_fh>), make a bunch of changes with a script (s///g, if() to determine if the line stays or not, etc), and write to the out file. In the end I get my edited file and the file name is unchanged.
My issue is that I am at a point where I no longer (currently) want the backup files, so I want to replace the code with something similar that won't create the backup file, while still letting me switch back by commenting the lines in or out if my needs change over the years.
How do I do this kind of editing in place not from the command line?

One basic way is to read the file line by line and write the desired output lines to a temporary file, which is then renamed so as to overwrite the original.
use File::Copy qw(move);
open my $fh, '<', $file or die "Can't open $file: $!";
open my $fh_out, '>', $outfile or die "Can't open $outfile: $!";
while (<$fh>) {
next if /line_to_skip/;
s/patt/repl/g;
print $fh_out $_;
}
close $_ for ($fh, $fh_out);
move ($outfile, $file) or die "Can't move $outfile to $file: $!";
This is what is normally done by tools that edit files "in place" (with additional safety, checks, and flexibility). Since the $outfile is temporary use File::Temp.
Add checks when close-ing files.
Note that this changes the file's inode number, which may matter for some applications.†
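As a rough sketch of the File::Temp variant mentioned above: the helper name edit_in_place and its callback convention are assumptions made for illustration, not part of any module. The temporary file is created in the same directory as the original so that the final move stays on one filesystem.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Basename qw(dirname);
use File::Copy qw(move);

# Hypothetical helper: rewrite $file through a temporary file.
# $edit is called with the current line in $_; it may modify $_ and
# must return true to keep the line or false to drop it.
sub edit_in_place {
    my ($file, $edit) = @_;
    open my $in, '<', $file or die "Can't open $file: $!";
    my ($out, $tmp) = tempfile(DIR => dirname($file), UNLINK => 0);
    while (my $line = <$in>) {
        local $_ = $line;
        next unless $edit->();
        print {$out} $_ or die "Can't write to $tmp: $!";
    }
    close $in  or die "Can't close $file: $!";
    close $out or die "Can't close $tmp: $!";
    move($tmp, $file) or die "Can't move $tmp to $file: $!";
    return;
}
```

A call such as edit_in_place($file, sub { return 0 if /line_to_skip/; s/patt/repl/g; 1 }) mirrors the loop above. File::Temp creates its files with restrictive permissions, so the original file's mode may need to be restored afterwards.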
If the file isn't huge you can simplify this and read it in first:
open my $fh, '<', $file or die "Can't open $file: $!";
my @lines = <$fh>;
open $fh, '>', $file or die "Can't open $file for writing: $!";
for (@lines) {
next if /line_to_skip/;
s/patt/repl/g;
print $fh $_;
}
close $fh;
This preserves the inode number, since > mode truncates the existing inode data.
† If this is indeed a problem, you can still keep the same inode. After the temporary file is written, open it for reading and open the original file for writing; that truncates the contents of that inode. Then copy the temporary file to the original. Close handles and delete the temporary file.

If the file is huge, then I'd question why you'd want to avoid the temporary file. Otherwise, I'd suggest just loading the file into memory, making the modifications, then writing it back out.
use File::Slurp qw( read_file write_file );
my $in = read_file($qfn, array_ref => 1);
my @out;
while (defined( $_ = shift(@$in) )) {
s/a/b/g; # For example.
push @out, $_ if /c/; # For example.
}
write_file($qfn, \@out);
I avoided using expensive splice by using two arrays.
Note that using Tie::File might save one line of code, but this will be 30x faster[1], and probably use less memory (despite memory-saving being Tie::File's goal). Tie::File is never the answer!!!
[1] This is not necessarily representative of all Tie::File uses, but I have indeed timed Tie::File taking 30x longer than the alternative at some basic task. That means that 2 seconds worth of work would have taken 1 minute with Tie::File!

Take a look at the Tie::File module. It is a core module and so shouldn't need installing, and the code is as simple as
use Tie::File;
tie my @file, 'Tie::File', $filepath or die $!;
Thereafter the array @file will hold the contents of the file, one line per element, and any changes to the array will be reflected in the file. All array operations such as push, splice, etc. will work fine.
Note that line one of the file is in element zero of the array etc.

Holding files in memory in perl while using them like file handles

I have a script in perl that I need to modify. The script opens, reads and seeks through two large (ASCII) files (they are several GB in size). Since it does that quite a bit, I would like to put these two files completely into RAM. The easiest way of doing this while not modifying the script a lot would be to load the files into the memory in a way that I can treat the resulting variable just as a file handle - and for example use seek to get to a specific byte position. Is that possible in perl?
Update: Using File::Slurp as proposed does the job only for small files. If the files are larger than about 2GB, it doesn't work.
Minimum example:
#!/usr/bin/env perl
use strict;
use warnings;
use Tie::File;
use File::Slurp 'read_file';
my $fn="testfile";
#buffer, then open as file, read first line:
read_file($fn, buf_ref => \my $file_contents_forests) or die "Could not read file!";
my $filehandle;
open($filehandle, "<", \$file_contents_forests) or die "Could not open buffer: $!\n";
my $line = "the first line:".<$filehandle>;
print $line."\n";
close($filehandle);
#open as file, read first line:
open( FORESTS, "<",$fn) or die "Could not open file.\n";
$line = "the first line:".<FORESTS>;
print $line;
close(FORESTS);
The output in this case is identical for the two methods if the file size is < 2 GB. If the file is larger, then slurping returns an empty line.

Read in the file:
use File::Slurp 'read_file';
read_file( "filename", buf_ref => \my $file_contents );
and open a filehandle to it:
open my $file_handle, '<', \$file_contents;
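Once the contents are in a scalar, the in-memory handle accepts the same seek and read calls as a handle on a real file. The sketch below uses a hypothetical helper named read_at to show a byte-offset read; it does not address the size limit reported in the question's update.

```perl
use strict;
use warnings;

# Hypothetical helper: read $len bytes starting at byte offset $pos
# from the buffer that $buf_ref points to, via an in-memory filehandle.
sub read_at {
    my ($buf_ref, $pos, $len) = @_;
    open my $fh, '<', $buf_ref or die "Can't open in-memory handle: $!";
    seek($fh, $pos, 0) or die "Can't seek: $!";    # 0 means SEEK_SET
    my $n = read($fh, my $chunk, $len);
    die "Can't read: $!" unless defined $n;
    close $fh;
    return $chunk;
}
```

For example, with my $data = "line one\nline two\n", the call read_at(\$data, 9, 8) returns "line two".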

How to delete common lines from one of 2 files in Perl?

I have 2 files, a small one and a big one. The small file is a subset of the big one.
For instance:
Small file:
solar:1000
alexey:2000
Big File:
andrey:1001
solar:1000
alexander:1003
alexey:2000
I want to delete all the lines from Big.txt which are also present in Small.txt. In other words, I want to delete the lines in Big file which are common to the small File.
So, I wrote a Perl Script as shown below:
#! /usr/bin/perl
use strict;
use warnings;
my ($small, $big, $output) = @ARGV;
open(BIG, "<$big") || die("Couldn't read from the file: $big\n");
my @contents = <BIG>;
close (BIG);
open(SMALL, "<$small") || die ("Couldn't read from the file: $small\n");
while(<SMALL>)
{
chomp $_;
@contents = grep !/^\Q$_/, @contents;
}
close(SMALL);
open(OUTPUT, ">>$output") || die ("Couldn't open the file: $output\n");
print OUTPUT @contents;
close(OUTPUT);
However, this Perl Script does not delete the lines in Big.txt which are common to Small.txt
In this script, I first open the big file stream and copy the entire contents into the array, @contents. Then, I iterate over each entry in the small file and check for its presence in the bigger file. I filter the line from Big File and save it back into the array.
I am not sure why this script does not work? Thanks

Your script does NOT work because grep sets $_ to each element of its list in turn while it runs, hiding the value of $_ that the while loop read from SMALL. Inside the regex, $_ therefore refers to the element being tested, not to the current line of the small file (the two uses share a name, but grep localizes its own value for the duration of the call).
Use a named variable instead (as a rule, NEVER use $_ for any code longer than 1 line, precisely to avoid this type of bug):
while (my $line=<SMALL>) {
chomp $line;
@contents = grep !/^\Q$line/, @contents;
}
However, as Oleg pointed out, a more efficient solution is to read small file's lines into a hash and then process the big file ONCE, checking hash contents (I also improved the style a bit - feel free to study and use in the future, using lexical filehandle variables, 3-arg form of open and IO error printing via $!):
#! /usr/bin/perl
use strict;
use warnings;
my ($small, $big, $output) = @ARGV;
use File::Slurp;
my @small = read_file($small);
chomp @small; # keys must match the chomped lines of the big file
my %small = map { ($_ => 1) } @small;
open(my $big_fh, "<", $big) or die "Can not read $big: Error: $!\n";
open(my $output_fh, ">", $output) or die "Can not write to $output: Error: $!\n";
while(my $line=<$big_fh>) {
chomp $line;
next if $small{$line}; # Skip common
print $output_fh "$line\n";
}
close($big_fh);
close($output_fh);

It doesn't work for several reasons. First, lines in @contents still have their newlines in. And second, when you grep, $_ in !/^\Q$_/ is set not to the last line read from the small file but to each element of the @contents array in turn, effectively making it: for each element in the list, return everything except this element, leaving you with an empty list at the end.
This isn't really a good way to do it: you're reading the big file and then reprocessing it several times. First, read the small file and put every line in a hash. Then read the big file inside a while(<>) loop, so you won't waste memory reading it in entirely. On each line, check whether the key exists in the previously populated hash; if it does, go to the next iteration, otherwise print the line.

Here is a small and efficient solution to your problem:
#!/usr/bin/perl
use strict;
use warnings;
my ($small, $big, $output) = @ARGV;
my %diffx;
open my $bfh, "<", $big or die "Couldn't read from the file $big: $!\n";
# load big file's contents
my @big = <$bfh>;
chomp @big;
# build a lookup table, a structured table for big file
@diffx{@big} = ();
close $bfh or die "$!\n";
open my $sfh, "<", $small or die "Couldn't read from the file $small: $!\n";
my @small = <$sfh>;
chomp @small;
# delete the elements that exist in small file from the lookup table
delete @diffx{@small};
close $sfh;
# print join "\n", keys %diffx;
open my $ofh, ">", $output or die "Couldn't open the file $output for writing: $!\n";
# what is left is unique lines from big file
print $ofh join "\n", keys %diffx;
close $ofh;
__END__
P.S. I learned this trick and many others from Perl Cookbook, 2nd Edition. Thanks

perl appending issues

I have some code that appends into some files in the nested for loops. After exiting the for loops, I want to append .end to all the files.
foreach my $file (@SPICE_FILES)
{
open(FILE1, ">>$file") or die "[ERROR $0] cannot append to file : $file\n";
print FILE1 "\n.end\n";
close FILE1;
}
I noticed in some strange cases that the ".end" is appended into the middle of the files!
How do I resolve this?

Since I do not yet have the comment-privilege I'll have to write this as an 'answer'.
Do you use any dodgy modules?
I have run into issues where (obviously) broken perl-modules have done something to the output buffering. For me placing
$| = 1;
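in the code has helped. The above statement turns on autoflush for the currently selected output handle (STDOUT unless you have changed it with select), so output on that handle is no longer held in a buffer. It does not affect other filehandles. It might have had other effects too, but I have not seen anything negative come out of it.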

I guess you've got data buffered in some previously opened file descriptors. Try closing them before re-opening:
open my $fd, ">>", $file or die "Can't open $file: $!";
print $fd $data;
close $fd or die "Can't close: $!";
Better yet, you can append those filehandles to an array/hash and write to them in cleanup:
push @handles, $fd;
# later
print {$_} "\n.end\n" for @handles;
Here's a case to reproduce the "impossible" append in the middle:
#!/usr/bin/perl -w
use strict;
my $file = "file";
open my $fd, ">>", $file;
print $fd "begin"; # no \n -- write buffered
open my $fd2, ">>", $file;
print $fd2 "\nend\n";
close $fd2; # file flushed on close
# program ends here -- $fd finally closed
# you're left with "\nend\nbegin"

It’s not possible to append something to the middle of the file. The O_APPEND flag guarantees that each write(2) syscall will place its contents at the old EOF and update the st_size field by incrementing it by however many bytes you just wrote.
Therefore if you find that your own data is not showing up at the end when you go to look at it, then another agent has written more data to it afterwards.
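Since each write lands at the current end of file, the order in the file depends on when buffered data is actually flushed. A minimal sketch of one way to avoid stale buffers, assuming a hypothetical helper append_flushed that turns on autoflush for every append handle it opens:

```perl
use strict;
use warnings;
use IO::Handle;

# Hypothetical helper: open $path for appending with autoflush enabled
# and write $text at once, so nothing lingers in this handle's buffer.
# Returns the handle so the caller can keep writing to it.
sub append_flushed {
    my ($path, $text) = @_;
    open my $fh, '>>', $path or die "Can't open $path: $!";
    $fh->autoflush(1);
    print {$fh} $text or die "Can't write to $path: $!";
    return $fh;
}
```

With this, writing "begin" through one handle and then "\nend\n" through a second handle on the same file leaves the two pieces in the order they were printed, even though the first handle is still open.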

Perl File Handling

The below is the Perl script that I wrote today. This reads the content from one file and writes on the other file. It works but, not completely.
#---------------------------------------------------------------------------
#!/usr/bin/perl
open IFILE, "text3.txt" or die "File not found";
open OFILE, ">text4.txt" or die "File not found";
my $lineno = 0;
while(<IFILE>)
{
@var=<IFILE>;
$lineno++;
print OFILE "@var";
}
close(<IFILE>);
close(<OFILE>);
#---------------------------------------------------------------------------
The issue is, it reads and writes contents, but not all.
text3.txt has four lines. The above script reads only from second line and writes on text4.txt. So, finally I get only three lines (line.no 2 to line.no 4) of text3.txt.
What is wrong with the above program. I don't have any idea about how to check the execution flow on Perl scripts. Kindly help me.
I'm completely new to Programming. I believe, learning all these would help me in changing my career path.
Thanks in Advance,
Vijay

<IFILE> reads one line from IFILE (only one because it's in scalar context). So while(<IFILE>) reads the first line, then the <IFILE> in list context within the while block reads the rest. What you want to do is:
# To read each line one by one:
while(!eof(IFILE)) { # check if end of file is reached instead of reading a line
my $line = <IFILE>; # scalar context, reads only one line
print OFILE $line;
}
# Or to read the whole file at once:
my @content = <IFILE>; # list context, read whole file
print OFILE @content;

The problem is that this line...
while(<IFILE>)
...reads one line from text3.txt, and then this line...
@var=<IFILE>;
...reads ALL of the remaining lines from text3.txt.
You can do it either way, by looping with while or all at once with @var=<IFILE>, but trying to do both won't work.

This is how I would have written the code in your question.
#!/usr/bin/perl
use warnings;
use strict;
use autodie;
# don't need to use "or die ..." when using the autodie module
open my $input, '<', 'text3.txt';
open my $output, '>', 'text4.txt';
while(<$input>){
my $lineno = $.;
print {$output} $_;
}
# both files get closed automatically when they go out of scope
# so no need to close them explicitly
I would recommend always putting use strict and use warnings at the beginning of all Perl files. At least until you know exactly why it is recommended.
I used autodie so that I didn't have to check the return value of open manually. ( autodie was added to Core in version 5.10.1 )
I used the three argument form of open because it is more robust.
It is important to note that while (<$input>){ ... } gets transformed into while (defined($_ = <$input>)){ ... } by the compiler. Which means that the current line is in the $_ variable.
I also used the special $. variable to get the current line number, rather than trying to keep track of the number myself.

There are a couple of questions you might want to think about. If you are strictly copying a file, you could use the File::Copy module.
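For the plain-copy case, a minimal sketch with File::Copy might look like this. The wrapper name copy_file is hypothetical; copy itself is the function File::Copy exports.

```perl
use strict;
use warnings;
use File::Copy qw(copy);

# Hypothetical wrapper: copy $from to $to and die with the OS error
# message if the copy fails.
sub copy_file {
    my ($from, $to) = @_;
    copy($from, $to) or die "Can't copy '$from' to '$to': $!";
    return;
}
```

For the files in the question this would be called as copy_file('text3.txt', 'text4.txt').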
If you are going to process the input before writing it out, you might also consider whether you want to keep both files open at the same time or instead read the whole content of the first file (into memory) first, and then write it to the outfile.
This depends on what you are doing underneath. Also if you have a huge binary file, each line in the while-loop might end up huge, so if memory is indeed an issue you might want to use more low-level stream-based reading, more info on I/O: http://oreilly.com/catalog/cookbook/chapter/ch08.html
My suggestion would be to use the cleaner PBP suggested way:
#!/usr/bin/perl
use strict;
use warnings;
use English qw(-no_match_vars);
my $in_file = 'text3.txt';
my $out_file = 'text4.txt';
open my $in_fh, '<', $in_file or die "Unable to open '$in_file': $OS_ERROR";
open my $out_fh, '>', $out_file or die "Unable to open '$out_file': $OS_ERROR";
while (<$in_fh>) {
# $_ is automatically populated with the current line
print { $out_fh } $_ or die "Unable to write to '$out_file': $OS_ERROR";
}
close $in_fh or die "Unable to close '$in_file': $OS_ERROR";
close $out_fh or die "Unable to close '$out_file': $OS_ERROR";
OR just print out the whole in-file directly:
#!/usr/bin/perl
use strict;
use warnings;
use English qw(-no_match_vars);
my $in_file = 'text3.txt';
my $out_file = 'text4.txt';
open my $in_fh, '<', $in_file or die "Unable to open '$in_file': $OS_ERROR";
open my $out_fh, '>', $out_file or die "Unable to open '$out_file': $OS_ERROR";
local $INPUT_RECORD_SEPARATOR; # Slurp mode, read in all content at once, see: perldoc perlvar
print { $out_fh } <$in_fh> or die "Unable to write to '$out_file': $OS_ERROR";
close $in_fh or die "Unable to close '$in_file': $OS_ERROR";
close $out_fh or die "Unable to close '$out_file': $OS_ERROR";
In addition if you just want to apply a regular expression or similar to a file quickly, you can look into the -i switch of the perl command: perldoc perlrun
perl -p -i.bak -e 's/foo/bar/g' text3.txt; # replace all foo with bar in text3.txt and save original in text3.txt.bak

When you're closing the files, use just
close(IFILE);
close(OFILE);
When you surround a file handle with angle brackets like <IFILE>, Perl interprets that to mean "read a line of text from the file inside the angle brackets". Instead of reading from the file, you want to close the actual file itself here.
