Perl Read/Write File Handle - Unable to Overwrite

I have a Perl Script which performs a specific operation and based on the result, it should update a file.
Basic overview is:
Read a value from the file handle, FILE
Perform some operation and then compare the result with the value stored in INPUT file.
If there is a change, then update the file corresponding to File Handle.
By update, I mean overwrite the existing value in the INPUT file with the new one.
An overview of the script:
#! /usr/bin/perl
use warnings;
use diagnostics;
$input=$ARGV[0];
open(FILE,"+<",$input) || die("Couldn't open the file, $input with error: $!\n");
# perform some operation and set $new_value here.
while(<FILE>)
{
    chomp $_;
    $old_value=$_;
    if($new_value!=$old_value)
    {
        print FILE $new_value,"\n";
    }
}
close FILE;
However, this appends the $new_value to the file instead of overwriting it.
I have read the documentation for this filehandle mode in several places, and everywhere it is described as read/write mode without append.
I am not sure why it is unable to overwrite. One possible reason: since I am reading from the handle in the while loop and writing to it at the same time, that might not work.
Thanks.

Your guess is right. You read from the file first, so the file pointer is positioned just after the old value. I haven't tried this myself, but you can probably seek the file pointer back to 0 before printing:
seek(FILE, 0, 0);

You should add truncate to your program along with seek.
if( $new_value != $old_value )
{
    seek( FILE, 0, 0 );
    truncate FILE, 0;
    print FILE $new_value,"\n";
}
Since the file is opened for reading and writing, writing a shorter $new_value will leave some of the $old_value in the file. truncate will remove it.
See perldoc -f seek and perldoc -f truncate for details.
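Putting seek and truncate together, a complete version of the original script might look like the sketch below. It rests on assumptions: the file holds a single value on its first line, the hypothetical compute_new_value() stands in for "perform some operation", and a File::Temp file replaces $ARGV[0] so the sketch runs on its own. It reads only the first line instead of looping, so there is exactly one read before the rewrite.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for "perform some operation".
sub compute_new_value { return 42 }

# Demo input: a temporary file whose old value is longer than the new one.
my ($seed_fh, $input) = tempfile(UNLINK => 1);
print {$seed_fh} "123456\n";
close $seed_fh or die "close: $!";

open(my $fh, '+<', $input) or die "Couldn't open $input: $!";
my $old_value = <$fh>;                 # read only the first line
chomp $old_value if defined $old_value;
my $new_value = compute_new_value();

if (!defined $old_value || $new_value != $old_value) {
    seek($fh, 0, 0)  or die "seek: $!";      # rewind before writing
    truncate($fh, 0) or die "truncate: $!";  # drop leftovers of the longer old value
    print {$fh} $new_value, "\n";
}
close $fh or die "close: $!";
```

Without the truncate, the file would end up as "42\n456\n", because only the first three bytes are overwritten.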

You have to close the file handle and open a different one (or the same one, if you like) on the output file in write mode, like this:
close FILE;
open FILE, ">$input" or die $!;
...
close FILE;
That should do the trick.
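A self-contained sketch of this close-and-reopen approach, using lexical handles. The demo file from File::Temp and the value 8 are made up for illustration. Because '>' truncates the file on open, no seek or truncate is needed.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Demo input standing in for $ARGV[0].
my ($seed_fh, $input) = tempfile(UNLINK => 1);
print {$seed_fh} "7\n";
close $seed_fh or die "close: $!";

my $new_value = 8;    # hypothetical result of "some operation"

# First handle: read-only, closed as soon as the old value is known.
open(my $in, '<', $input) or die "Can't read $input: $!";
chomp(my $old_value = <$in>);
close $in;

# Second handle: '>' truncates the file, so only the new value remains.
if ($new_value != $old_value) {
    open(my $out, '>', $input) or die "Can't write $input: $!";
    print {$out} $new_value, "\n";
    close $out or die "close: $!";
}
```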

Related

Perl: Substitute text string with value from list (text file or scalar context)

I am a Perl novice, but I have read "Learning Perl" by Schwartz, foy and Phoenix and have a weak understanding of the language. I am still struggling, even after using the book and the web.
My goal is to be able to do the following:
Search a specific folder (current folder) and grab filenames with full path. Save filenames with complete path and current foldername.
Open a template file and insert the filenames with full path at a specific location (e.g. using substitution) as well as current foldername (in another location in the same text file, I have not gotten this far yet).
Save the new modified file to a new file in a specific location (current folder).
I have many files/folders that I want to process and plan to copy the perl program to each of these folders so the perl program can make new .
Here is what I have so far:
use strict;
use warnings;
use Cwd;
use File::Spec;
use File::Basename;
my $current_dir = getcwd;
open SECONTROL_TEMPLATE, '<secontrol_template.txt' or die "Can't open SECONTROL_TEMPLATE: $!\n";
my @secontrol_template = <SECONTROL_TEMPLATE>;
close SECONTROL_TEMPLATE;
opendir(DIR, $current_dir) or die $!;
my @seq_files = grep {
    /gz/
} readdir (DIR);
open FASTQFILENAMES, '> fastqfilenames.txt' or die "Can't open fastqfilenames.txt: $!\n";
my @fastqfiles;
foreach (@seq_files) {
    $_ = File::Spec->catfile($current_dir, $_);
    push(@fastqfiles,$_);
}
print FASTQFILENAMES @fastqfiles;
open (my ($fastqfilenames), "<", "fastqfilenames.txt") or die "Can't open fastqfilenames.txt: $!\n";
my @secontrol;
foreach (@secontrol_template) {
    $_ =~ s/#/$fastqfilenames/eg;
    push(@secontrol,$_);
}
open SECONTROL, '> secontrol.txt' or die "Can't open SECONTROL: $!\n";
print SECONTROL @secontrol;
close SECONTROL;
close FASTQFILENAMES;
My problem is that I cannot figure out how to use my list of files to replace the "#" in my template text file:
my @secontrol;
foreach (@secontrol_template) {
    $_ =~ s/#/$fastqfilenames/eg;
    push(@secontrol,$_);
}
The substitute function will not replace the "#" with the list of files listed in $fastqfilenames. I get the "#" replaced with GLOB(0x8ab1dc).
Am I doing this the wrong way? Should I not use substitute as this can not be done, and then rather insert the list of files ($fastqfilenames) in the template.txt file? Instead of the $fastqfilenames, can I substitute with content of file (e.g. s/A/{r file.txt ...). Any suggestions?
Cheers,
JamesT
EDIT:
This made it all better.
foreach (@secontrol_template) {
    s/#/$fastqfilenames/g;
    push @secontrol, $_;
}
And as both answers pointed out, $fastqfilenames is a filehandle.
replaced this: open (my ($fastqfilenames), "<", "fastqfilenames.txt") or die "Can't open fastqfilenames.txt: $!\n";
with this:
my $fastqfilenames = join "\n", @fastqfiles;
made it all good. Thanks both of you.
$fastqfilenames is a filehandle. You have to read the information out of the filehandle before you can use it.
However, you have other problems.
You are printing all of the filenames to a file, then reading them back out of the file. This is not only a questionable design (why read from the file again, since you already have what you need in an array?), it also won't even work:
Perl buffers file I/O for performance reasons. The lines you have written to the file may not actually be there yet, because Perl is waiting until it has a large chunk of data saved up, to write it all at once.
You can override this buffering behavior in a few different ways (closing the file handle being the simplest if you are done writing to it), but as I said, there is no reason to reopen the file again and read from it anyway.
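To see both points in isolation, here is a small sketch with made-up paths. Stringifying the handle gives GLOB(...), which is what ended up in the template, while reading from it gives the text. The write handle is closed first so the data has actually reached the file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my @fastqfiles = ('/data/a.fastq.gz', '/data/b.fastq.gz');   # made-up paths

my ($out, $listfile) = tempfile(UNLINK => 1);
print {$out} map { "$_\n" } @fastqfiles;
close $out or die "close: $!";          # closing flushes Perl's buffer to disk

open(my $in, '<', $listfile) or die "Can't open $listfile: $!";
my $as_string = "$in";                  # e.g. GLOB(0x...): the handle, not the text
my $contents  = do { local $/; <$in> }; # reading the handle yields the text
close $in;
```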
Also note, the /e option in a regex replacement evaluates the replacement as Perl code. This is not necessary in your case, so you should remove it.
Solution: instead of reopening the file and reading it, just use the @fastqfiles variable you created earlier when replacing in the template. It is not clear exactly what you mean by replacing # with the filenames.
Do you want to replace each # with a list of all filenames together? If so, you will probably need to join the filenames together in some way before doing the replacement.
Do you want to create a separate version of the template file for each filename? If so, you need an inner for loop that goes over each filename for each template. And you will need something other than a simple replacement, because the replacement changes the original string the first time through. If you are on Perl 5.14 or later, you can use the /r modifier to replace non-destructively: push(@secontrol, s/#/$file_name/gr); otherwise, copy the string to another variable before doing the replacement.
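Both options can be sketched with in-memory data. The template lines, the file names and the space separator are hypothetical. The /r modifier needs Perl 5.14 or later, and option 1 copies each line before substituting so the template stays reusable.

```perl
use strict;
use warnings;

# Hypothetical template lines ('#' is the placeholder) and file names.
my @secontrol_template = ("input = #\n", "threads = 4\n");
my @fastqfiles = ('/data/a.fastq.gz', '/data/b.fastq.gz');

# Option 1: one output, with '#' replaced by all names joined together.
my $fastqfilenames = join ' ', @fastqfiles;   # separator is an assumption
my @secontrol;
for my $line (@secontrol_template) {
    (my $copy = $line) =~ s/#/$fastqfilenames/g;   # substitute in a copy
    push @secontrol, $copy;
}

# Option 2: one copy of the template per file; /r returns the
# modified string and leaves the template line unchanged.
my %per_file;
for my $file_name (@fastqfiles) {
    $per_file{$file_name} = [ map { s/#/$file_name/gr } @secontrol_template ];
}
```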
$_ =~ s/#/$fastqfilenames/eg;
$fastqfilenames is a file handle, not the file contents.
In any case, I recommend the Text::Template module for this kind of work (text substitution in files).

Re-reading from already read filehandle

I opened a file to read from line by line:
open(FH,"<","$myfile") or die "could not open $myfile: $!";
while (<FH>)
{
    # ...do something
}
Later on in the program, I try to re-read the file (walk through the file again):
while (<FH>)
{
    # ...do something
}
and realized that the file position seems to be stuck at EOF, so the loop does not iterate from the first line of the file. Is this the default behavior? How can I work around it? The file is big and I do not want to keep it in memory as an array. Is my only option to close and reopen the file?
Use seek to rewind to the beginning of the file:
seek FH, 0, 0;
Or, being more verbose:
use Fcntl qw(:seek);
seek FH, 0, SEEK_SET;
Note that it greatly limits the usefulness of your tool if you must seek on the input, as it can never be used as a filter. It is extremely useful to be able to read from a pipe, and you should strive to arrange your program so that the seek is not necessary.
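A sketch of the seek approach on a small demo file, created with File::Temp in place of $myfile. The first pass counts lines, seek rewinds, and the second pass starts from line 1 again. The explicit reset of $. is there because seek does not reset the line counter.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);
use File::Temp qw(tempfile);

# Demo file standing in for $myfile.
my ($seed_fh, $myfile) = tempfile(UNLINK => 1);
print {$seed_fh} "alpha\nbeta\ngamma\n";
close $seed_fh or die "close: $!";

open(my $fh, '<', $myfile) or die "could not open $myfile: $!";

my $count = 0;
$count++ while <$fh>;                     # first pass: count the lines

seek($fh, 0, SEEK_SET) or die "seek: $!"; # rewind to the first byte
$. = 0;                                   # seek does not reset the line counter $.

my @long;
while (my $line = <$fh>) {                # second pass starts again at line 1
    chomp $line;
    push @long, $line if length $line > 4;
}
close $fh;
```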
You have a few options.
Reopen the file handle
Set the position to the beginning of the file using seek, as William Pursell suggested.
Use a module such as Tie::File, which lets you read the file as an array, without loading it into memory.
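A minimal Tie::File sketch (Tie::File ships with Perl). The demo file and its contents are made up. Each array element is one line with its newline removed, the array can be walked as often as needed, and assigning to an element rewrites that line in the file.

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Demo file standing in for $myfile.
my ($seed_fh, $myfile) = tempfile(UNLINK => 1);
print {$seed_fh} "one\ntwo\nthree\n";
close $seed_fh or die "close: $!";

# Lines are fetched from the file on demand, not loaded all at once.
tie my @lines, 'Tie::File', $myfile or die "Cannot tie $myfile: $!";

my $first_pass  = join ',', @lines;          # walk the file once
my $second_pass = join ',', reverse @lines;  # and again, no seek needed
$lines[1] = 'TWO';                           # written back to the file
untie @lines;
```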

Reading from file using Perl

I am learning Perl and have looked up this question but haven't been able to get it to work for me although it terminates without error.
I enter a name that exists in the file (stored as name-0-0-0), but it skips the while loop altogether.
open FILE, '+>>userinfo.txt';
print("What is your name?");
$name = <>;
chomp $name;
while (<FILE>) {
    chomp;
    ($nameRead,$wins, $losses, $cats) = split("-");
    if ($nameRead eq $name){
        print("Oh hello $name, your current record is $wins wins - $losses losses - $cats ties");
        print("Would you like to play again? type y for yes or n for no\n");
        $bool = <>;
        if ($bool == "y"){
            print("Okay let's play!");
            play();
            exit();
        }
        else {
            printf("well fine goodbye!");
            exit();
        }
    }
}
Well it seems my problem was indeed related to the +>>. I am trying to add on to the file, but I wanted to be able to write, not just append. I changed it to +< and everything worked great. Thanks guys I really appreciate it!
Your primary problem is that you have chosen an arcane open mode for userinfo.txt, which will allow you to open an existing file for both read and write but create a new file if it doesn't exist.
You must always check whether a file open has succeeded, and it looks like all you want to do is read from this file, so you want
open FILE, '<', 'userinfo.txt' or die $!;
You must also always add
use strict;
use warnings;
to the top of your program, and declare all variables with my at their first point of use.
Once you have made these changes you will most likely understand yourself what is going wrong, but if you have further problems please post your modified code.
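With those changes the lookup might look like this sketch. The temporary file stands in for userinfo.txt and $name is hard-coded instead of read from STDIN; the record format name-wins-losses-ties is taken from the question. Strings are compared with eq: the question's $bool == "y" compares numerically, so any non-numeric answer equals "y" (both are 0), and $bool would still contain its newline unless it is chomped.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in for userinfo.txt; records are name-wins-losses-ties.
my ($seed_fh, $userinfo) = tempfile(UNLINK => 1);
print {$seed_fh} "alice-3-1-0\nbob-0-2-1\n";
close $seed_fh or die "close: $!";

my $name = 'bob';    # would come from <STDIN> (chomped) in the real program

open(my $fh, '<', $userinfo) or die "Can't open $userinfo: $!";
my $found;
while (my $line = <$fh>) {
    chomp $line;
    my ($name_read, $wins, $losses, $ties) = split /-/, $line;
    if ($name_read eq $name) {            # string comparison, not ==
        $found = "$wins wins - $losses losses - $ties ties";
        last;
    }
}
close $fh;
```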
It appears you're using the wrong syntax to open the file for reading. Try
use autodie qw(:all);
open my $FILE, '<', '/path/to/file';
The syntax you're using opens the file for reading and appending, with the cursor at the end.
Why are you opening the file in +>> mode? That mode opens your file for input and output, but sets the filehandle cursor to the end of the file. Even experienced Perl programmers rarely have a need to do that.
Since the filehandle is positioned at the end of the file when you open it, you won't get anything when you attempt to read from it.
Is there a reason you aren't just saying open FILE, '<', 'userinfo.txt' ?
The following loop keeps reading until it reaches the end of the file:
while (<FILE>) {
...
}
You might want to remove the while loop entirely and just do everything inside it.
Edit/Alternative Solution:
The real reason nothing was happening is that +>> opens the file for read/append as you'd expect, but immediately places the cursor at the end of the file. So when you reach while (<FILE>) { ... }, there's nothing to read.
One solution would be to reset the file cursor position:
open FILE, '+>>userinfo.txt';
seek(FILE,0,0); # set the file cursor at the top
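If you really do need both reading and appending, a sketch of that pattern follows; the demo file and records are made up. One caveat: in append mode the operating system places every write at the end of the file, wherever you seek to, so +>> can read existing records and add new ones but cannot overwrite them in place. That is what +< is for.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($seed_fh, $file) = tempfile(UNLINK => 1);
print {$seed_fh} "alice-3-1-0\n";
close $seed_fh or die "close: $!";

open(my $fh, '+>>', $file) or die "Can't open $file: $!";
seek($fh, 0, 0) or die "seek: $!";   # +>> starts at EOF, so rewind to read

my %known;
while (my $line = <$fh>) {
    chomp $line;
    my ($name) = split /-/, $line;
    $known{$name} = 1;
}

# Append mode: this write lands at the end regardless of the cursor.
print {$fh} "carol-0-0-0\n" unless $known{carol};
close $fh or die "close: $!";
```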

In Perl, why does print not generate any output after I close STDOUT?

I have the code:
open(FILE, "<$new_file") or die "Can't open file \n";
@lines=<FILE>;
close FILE;
open(STDOUT, ">$new_file") or die "Can't open file\n";
$old_fh = select(OUTPUT_HANDLE);
$| = 1;
select($old_fh);
for(@lines){
    s/(.*?xsl.*?)xsl/$1xslt/;
    print;
}
close(STDOUT);
STDOUT -> autoflush(1);
print "file changed";
After closing STDOUT, the program does not write the output of the last statement, print "file changed". Why is this?
Edit: I want this message printed on the console, not to the file.
I suppose it is because print's default filehandle is STDOUT, which is already closed at that point. You could reopen it, or print to another filehandle, for example STDERR.
print STDERR "file changed";
It's because you've closed the filehandle stored in STDOUT, so print can't use it anymore. Generally speaking opening a new filehandle into one of the predefined handle names isn't a very good idea because it's bound to lead to confusion. It's much clearer to use lexical filehandles, or just a different name for your output file. Yes you then have to specify the filehandle in your print call, but then you don't have any confusion over what's happened to STDOUT.
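Following that advice, here is a sketch of the question's program with a separate lexical handle for the output file, so STDOUT stays attached to the console. The demo file contents are made up; the substitution is the question's own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Demo file standing in for $new_file.
my ($seed_fh, $new_file) = tempfile(UNLINK => 1);
print {$seed_fh} "a.xsl b.xsl\nplain line\n";
close $seed_fh or die "close: $!";

open(my $in, '<', $new_file) or die "Can't open $new_file: $!";
my @lines = <$in>;
close $in;

# A separate lexical handle for the output file; STDOUT is never touched.
open(my $out, '>', $new_file) or die "Can't open $new_file: $!";
for (@lines) {
    s/(.*?xsl.*?)xsl/$1xslt/;
    print {$out} $_;
}
close $out or die "close: $!";

print "file changed\n";    # still goes to the console
```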
A print statement outputs the string to STDOUT, which is the default output filehandle.
So the statement
print "This is a message";
is same as
print STDOUT "This is a message";
In your code, you close STDOUT and then print the message, which will not work. Reopen the STDOUT filehandle, or do not close it; when the script ends, filehandles are closed automatically. For example, save a duplicate of STDOUT first and restore it afterwards:
open OLDOUT, ">&", \*STDOUT or die "Can't dup STDOUT: $!";
close STDOUT;
open(STDOUT, ">$new_file") or die "Can't open file\n";
...
close(STDOUT);
open(STDOUT, ">&", \*OLDOUT) or die "Can't restore STDOUT: $!";
print "file changed";
You seem to be confused about how file IO operations are done in perl, so I would recommend you read up on that.
What went wrong?
What you are doing is:
Open a file for reading
Read the entire file and close it
Open the same file for overwriting (the original file is truncated), using the STDOUT filehandle.
Juggle around the default print handle in order to set autoflush on a file handle which is not even opened in the code you show.
Perform a substitution on all lines and print them
Close STDOUT then print a message when everything is done.
Your biggest mistake is trying to reopen the default output filehandle STDOUT. I assume this is because you do not know that print accepts a filehandle, as in print FILEHANDLE "text", or that STDOUT is a predefined filehandle.
Your other errors:
You did not use use strict; use warnings;. No program you write should be without these. They will prevent you from doing bad things, and give you information on errors, and will save you hours of debugging.
You should never "slurp" a file (read the entire file into a variable) unless you really need to, because it is inefficient and slow, and for huge files it will crash your program by running out of memory.
Never reassign the default file handles STDIN, STDOUT, STDERR, unless A) you really need to, B) you know what you are doing.
select sets the default filehandle for print (see perldoc -f select). This is rarely something you need to concern yourself with. The variable $| turns on autoflush (when set to a true value) for the currently selected filehandle. So what you did actually accomplished nothing, because OUTPUT_HANDLE is a nonexistent filehandle. If you had skipped the select statements, it would have set autoflush for STDOUT (but you wouldn't have noticed any difference).
print uses print buffers because it is efficient. I assume you are trying to autoflush because you think your prints get caught in the buffer, which is not true. Generally speaking, this is not something you need to worry about. All the print buffers are automatically flushed when a program ends.
For the most part, you do not need to explicitly close file handles. File handles are automatically closed when they go out of scope, or when the program ends.
Using lexical filehandles, e.g. open my $fh, ..., instead of global ones, e.g. open FILE, ..., is recommended, because of the previous point and because it is always a good idea to avoid global variables.
Using three-argument open is recommended: open FILEHANDLE, MODE, FILENAME. With the two-argument form, metacharacters in a file name (such as a leading > or a trailing |) can change what the open statement does.
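For the select and $| point in the list above, a short sketch of setting autoflush on a real, open handle (a File::Temp demo file): first with the select/$| idiom, then with the equivalent autoflush method from IO::Handle. The size check shows the line reached the file without closing the handle.

```perl
use strict;
use warnings;
use IO::Handle;              # needed for ->autoflush on perls older than 5.14
use File::Temp qw(tempfile);

my ($fh, $path) = tempfile(UNLINK => 1);

# select/$| idiom: $| applies to whichever handle is currently selected.
my $old_fh = select($fh);
$| = 1;
select($old_fh);             # restore the previous default for print

# Method form, equivalent and easier to read.
$fh->autoflush(1);

print {$fh} "written immediately\n";
my $size_now = -s $path;     # data is on disk without closing $fh
```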
The quick fix:
Now, as I said in the comments, this (or rather what you intended, because this code is wrong) is pretty much identical to the idiomatic use of the -p command-line switch:
perl -pi.bak -e 's/(.*?xsl.*?)xsl/$1xslt/' file.txt
This short little snippet actually does all that your program does, but does it much better. Explanation:
-p switch automatically assumes that the code you provide is inside a while (<>) { } loop, and prints each line, after your code is executed.
-i switch tells perl to edit the file in place, saving a backup copy as "file.txt.bak".
So, that one-liner is equivalent to a program such as this:
$^I = ".bak";    # turns in-place editing on
while (<>) {     # diamond operator automatically uses STDIN or files from @ARGV
    s/(.*?xsl.*?)xsl/$1xslt/;
    print;
}
Which is equivalent to this:
my $file = shift;                        # first argument from @ARGV
open my $fh,  "<", $file          or die $!;
open my $tmp, ">", "/tmp/foo.bar" or die $!;  # not sure where tmpfile is
while (<$fh>) {                          # read lines from original file
    s/(.*?xsl.*?)xsl/$1xslt/;
    print $tmp $_;                       # print line to tmp file
}
close $tmp or die $!;                    # flush the tmp file before renaming it
rename($file, "$file.bak") or die $!;    # save backup
rename("/tmp/foo.bar", $file) or die $!; # overwrite original (fails if /tmp is another filesystem)
The in-place edit option actually writes a separate file, which then replaces the original. If you use the backup option, the original file is backed up first. You don't need to know this; just know that the -i switch makes the -p (and -n) options actually change your original file.
Using the -i switch with a backup extension is not required (except on Windows), but it is recommended. A good idea is to run the one-liner without -i first, so the output is printed to the screen, and add -i once you see that the output is OK.
The regex
s/(.*?xsl.*?)xsl/$1xslt/;
You search for a string that contains "xsl" twice. The use of .*? is good in the second case, but not in the first. Any time you find yourself starting a regex with a wildcard string, you're probably doing something wrong, unless you are trying to capture that part.
In this case, though, you capture it and remove it, only to put it back, which is completely useless. So the first order of business is to take that part out:
s/(xsl.*?)xsl/$1xslt/;
Now, removing something and putting it back is really just a magic trick for not removing it at all. We don't need magic tricks like that, when we can just not remove it in the first place. Using look-around assertions, you can achieve this.
In this case, since the look-behind would have to cover a variable-length expression, we use the \K escape (mnemonic: Keep) instead, because Perl does not support general variable-length look-behind (Perl 5.30 added only limited, experimental support).
s/xsl.*?\Kxsl/xslt/;
So, since we didn't take anything out, we don't need to put anything back using $1. Now, you may notice, "Hey, if I replace 'xsl' with 'xslt', I don't need to remove 'xsl' at all." Which is true:
s/xsl.*?xsl\K/t/;
You may consider using options for this regex, such as /i, which causes it to ignore case and thus also match strings such as "XSL FOO XSL". Or the /g option which will allow it to perform all possible matches per line, and not just the first match. Read more in perlop.
Conclusion
The finished one-liner is:
perl -pi.bak -e 's/xsl.*?xsl\K/t/' file.txt
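A quick way to check that the finished substitution behaves like the original one is to run both over a few sample strings (the samples are made up). \K requires Perl 5.10 or later.

```perl
use strict;
use warnings;

my @samples = ('a.xsl b.xsl', 'only one xsl here', 'xsl1xsl2xsl', 'no match');

my (@original, @final);
for my $s (@samples) {
    (my $x = $s) =~ s/(.*?xsl.*?)xsl/$1xslt/;   # the question's version
    (my $y = $s) =~ s/xsl.*?xsl\K/t/;           # the finished version
    push @original, $x;
    push @final,    $y;
}
print "$_\n" for @final;
```

Both versions change only the second "xsl" of the first pair on each line; lines with fewer than two occurrences are left alone.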

Both Reading and writing to a file

I am new to Perl and am trying to both read and write a CSV file, but nothing happens. Can someone help me find the problem? I can read without a problem using '<', but I am unable to write.
use strict;
use warnings;
use Text::CSV_XS;
my $file = 'file.csv';
my $csv = Text::CSV_XS->new();
open (InCSV, '+>>', $file) or die $!;
while (<InCSV>) {
    if ($csv->parse($_)) {
        my @columns = $csv->fields();
        if ($columns[1] eq "01") {
            my $str = "Selected $columns[6] \n ";
            push(@columns,$str);
            print InCSV join("," , @columns), "\n";
        }
    } else {
        my $err = $csv->error_input;
        print "Failed to parse line: $err";
    }
}
close InCSV;
Opening a file in +>> mode will seek to the end of the file, so there will be nothing to read unless you seek back to the beginning (or the middle) of the file. To open in read/write mode with the file cursor at the beginning of the file, use +< mode.
That said, you probably want to rethink your approach to this problem. It looks like you are trying to read a row of data, modify it, and write it back to the file. But the way you have done it, you are overwriting the next row of data rather than the row you have just read, and anyway the new data is longer (has more bytes) than the old data. This is certain to corrupt your data file.
Some better approaches might be to
read and process all data first, then close and overwrite the input with processed data
write data to a temporary file while you are processing it, then overwrite the input with the temporary file (see also about the perl interpreter's in-place editing mode)
use a module like Tie::File to handle the line-based I/O for this task
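A sketch of the second approach, using only core modules: write each processed row to a temporary file in the same directory, then rename it over the original. To stay self-contained it splits on plain commas, which assumes no quoted fields containing commas; with real CSV data, keep Text::CSV_XS for parsing. The demo rows are made up, and the column logic loosely mirrors the question's.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Basename qw(dirname);

# Demo input standing in for file.csv (assumes no quoted commas).
my ($seed_fh, $file) = tempfile(UNLINK => 1);
print {$seed_fh} "a,01,x,x,x,x,foo\nb,02,x,x,x,x,bar\n";
close $seed_fh or die "close: $!";

# Temporary file in the same directory, so rename() stays on one filesystem.
my ($out, $tmpname) = tempfile(DIR => dirname($file));

open(my $in, '<', $file) or die "Can't open $file: $!";
while (my $line = <$in>) {
    chomp $line;
    my @columns = split /,/, $line, -1;   # -1 keeps trailing empty fields
    push @columns, "Selected $columns[6]" if $columns[1] eq '01';
    print {$out} join(',', @columns), "\n";
}
close $in;
close $out or die "close: $!";

rename($tmpname, $file) or die "rename: $!";   # replace the original in one step
```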