Deleting a line from a huge file in Perl - perl

I have huge text file and first five lines of it reads as below :
This is fist line
This is second line
This is third line
This is fourth line
This is fifth line
Now, I want to write something at a random position of the third line of that file which will replace the characters in that line by the new string I am writing. I am able to achieve that with the below code :
use strict;
use warnings;
my #pos = (0);
open my $fh, "+<", "text.txt";
while(<$fh) {
push #pos, tell($fh);
}
seek $fh , $pos[2]+1, 0;
print $fh "HELLO";
close($fh);
However, I am not able to figure out with the same kind of approach how can I delete the entire third line from that file so that the texts reads below :
This is fist line
This is second line
This is fourth line
This is fifth line
I do not want to read the entire file into an array, neither do I want to use Tie::File. Is it possible to achieve my requirement using seek and tell ? A solution will be very helpful.

A file is a sequence of bytes. We can replace (overwrite) some of them, but how would we remove them? Once a file is written its bytes cannot be 'pulled out' of the sequence or 'blanked' in any way. (The ones at the end of the file can be dismissed, by truncating the file as needed.)
The rest of the content has to move 'up', so that what follows the text to be removed overwrites it. We have to rewrite the rest of the file. In practice it is often far simpler to rewrite the whole file.
As a very basic example
use warnings 'all';
use strict;
use File::Copy qw(move);
my $file_in = '...';
my $file_out = '...'; # best use `File::Temp`
open my $fh_in, '<', $file_in or die "Can't open $file_in: $!";
open my $fh_out, '>', $file_out or die "Can't open $file_out: $!";
# Remove a line with $pattern
my $pattern = qr/this line goes/;
while (<$fh_in>)
{
print $fh_out $_ unless /$pattern/;
}
close $fh_in;
close $fh_out;
# Rename the new fie into the original one, thus replacing it
move ($file_out, $file_in) or die "Can't move $file_out to $file_in: $!";
This writes every line of input file into the output file, unless a line matches a given pattern. Then that file is renamed, replacing the original (what does not involve data copy). See this topic in perlfaq5.
Since we really use a temporary file I'd recommend the core module File::Temp for that.
This may be made more efficient, but far more complicated, by opening in update '+<' mode so to overwrite only a portion of the file. You iterate until the line with the pattern, record (tell) its position and the line length, then copy all remaining lines in memory. Then seek back to the position minus length of that line, and dump the copied rest of the file, overwriting the line and all that follows it.
Note that now the data for the rest of the file is copied twice, albeit one copy is in memory. Going to this trouble may make sense if the line to be removed is far down a very large file. If there are more lines to remove this gets messier.
Writing out a new file and copying it over the original changes the file's inode number. That may be a problem for some tools or procedures, and if it is you can instead update the original by either
Once the new file is written out, open it for reading and open the original for writing. This clobbers the original file. Then read from the new file and write to the original one, thus copying the content back to the same inode. Remove the new file when done.
Open the original file in read-write mode ('+<') to start with. Once the new file is written, seek to the beginning of the original (or to the place from which to overwrite) and write to it the content of the new file. Remember to also set the end-of-file if the new file is shorter,
truncate $fh, tell($fh);
after copying is done. This requires some care and the first way is probably generally safer.
If the file weren't huge the new "file" can be "written" in memory, as an array or a string.

Use sed command from Linux command line in Perl:
my $return = `sed -i '3d' text.txt`;
Where "3d" means delete the 3rd row.

It is useful to look at perlrun and see how perl itself modifies a file 'in-place.'
Given:
$ cat text.txt
This is fist line
This is second line
This is third line
This is fourth line
This is fifth line
You can apparently 'modify in-place', sed like, by using the -i and -p switch to invoke Perl:
$ perl -i -pe 's/This is third line\s*//' text.txt
$ cat text.txt
This is fist line
This is second line
This is fourth line
This is fifth line
But if you consult the Perl Cookbook recipe 7.9 (or look at perlrun) you will see that this:
$ perl -i -pe 's/This is third line\s*//' text.txt
is equivalent to:
while (<>) {
if ($ARGV ne $oldargv) { # are we at the next file?
rename($ARGV, $ARGV . '.bak');
open(ARGVOUT, ">$ARGV"); # plus error check
select(ARGVOUT);
$oldargv = $ARGV;
}
s/This is third line\s*//;
}
continue{
print;
}
select (STDOUT); # restore default output

Related

Change line in textfile using perl

I read other places on how to do this but they were confusing for me.
I want to read lines from a text file and when I come across a certain line I want to append something to it.
My code is:
open my $p, "$username_filename" or die "can not open $username_filename: $!";
foreach $line (<$p>){
if ($line =~ /^listen/){
`echo "whatever" >> $username_file`;
}
}
However when I run this I get this error
sh: -c: line 0: syntax error near unexpected token `newline' sh: -c: line 0: `echo "current_user" >> '
Is this way correct to edit the file and why am I getting this error?
Working with files is not like editing in a word processor. Lines are an illusion, a file is just a big string of characters. You can't change a line in the middle of a file for the same reason you can't change a line in the middle of a book, the words can't be moved around to make room.
Instead, like a book, if you want to change something you need to rewrite the whole thing.
The basic algorithm is to...
Open the file for reading.
Open a temporary file for writing.
Read a line, alter the line, write the line.
Repeat 3 until done reading.
Overwrite the file with the temp file.
Some other notes...
print writes to STDOUT by default, but you can give it a filehandle to write to instead.
foreach my $line (<$fh>) is unfortunately not optimized to read files. It will read the possibly enormous file into memory. while(my $line = <$fh>) reads one line at a time.
I've turned on strict. This forces you to declare your variables. It protects you from typos like the one you made of $username_file vs $username_filename.
You could use something like "$filename.tmp" but File::Temp provides temp files that are guaranteed to be temporary, unique and cleaned up when the program exits.
use strict;
use warnings;
use autodie; # because writing 'or die' gets old fast
use File::Temp; # provides safe temp files
my $filename = ...; # set it somehow
open my $read, "<", $filename;
my $temp = File::Temp->new;
while(my $line = <$read>) {
if( $line =~ /^listen/ ) {
chomp $line; # remove the newline
$line .= " whatever\n"; # add our content and put a newline back
}
# Write the line to the temp file
print $temp $line;
}
# Overwrite our file with the rewritten temp file
rename $temp->filename, $filename;
That's inside a program. If you just want to do it quickly, you can do it on the command line with -i and -p.
perl -i.bak -pe 'if( /^listen/ ) { chomp; $_ .= "whatever" }' filename
-p says to run the code on each line of the file. The line will be put into $_ and whatever is in $_ will be printed. -i says to edit the file in place. -i.bak makes a backup of the original file just in case you make a mistake.
There are a few problems with your attempt. The big one is that using echo >> file will append to the file, not insert at some arbitrary place inside the file.
Another problem is that you're trying to append to a file called $username_file, and you haven't declared or defined that variable.
I don't think perl lets you insert into the middle of a file. I think your best bet would be to read the file a line at a time, and on the correct line(s), append the text you want. Write each line to a new file, then swap the files around at the end.
For example:
#!/usr/bin/perl
my $in_filename = "in.txt";
my $out_filename = "out.txt";
open (my $in, "<", $in_filename) or die;
open (my $out, ">", $out_filename) or die;
while (my $lline = <$in>)
{
chomp $lline;
if ( $lline =~ /listen/ )
{
print "$lline whatever\n";
}
else
{
print "$lline\n";
}
}
close $in;
close $out;
rename $in_filename, "$in_filename.original";
rename $out_filename, $in_filename;
I use chomp to remove line endings, because <$in> gives us a line including its line endings, wish otherwise messes up the append.
As always there are many ways to achieve this. I think using sed is probably a better option for this, but you specifically asked how to do it in perl, so perl it is.

Line breaks don't exist on input from FTP file (Perl)

I downloaded a csv file using Net::FTP. When I look at this file in text editor or excel or even when I cut/paste it has line breaks and looks like this:
000000000G911|06
0000000000CDR|25|123
0000000000EGP|19
When I read the file in Perl it sees the entire text as one line like this:
000000000G911|060000000000CDR|25|1230000000000EGP|19
I have tried reading it using
tie #lines, 'Tie::File', "C:/Programs/myfile.csv", autochomp=>0 or die "Can't read file: $!\n";
foreach $l (#lines1)
{print "$l\n";
}
and
open FILE, "`<`$filename" or die $!;
my #lines=`<`FILE>;
foreach $l (#lines)
{print "$l\n";
}
close FILE;
The file has line breaks in a format that Perl is not recognizing because it is coming from a different operating system. The other programs are automatically detecting the different line break format, but Perl doesn't do that.
If you have Net::FTP perform the transfer in ASCII mode (e.g. $ftp->ascii to enable this mode), this should be taken care of and corrected for you.
Alternatively, you can figure out what is being used for line breaks and then set the special $/ variable to that value.

Perl Read/Write File Handle - Unable to Overwrite

I have a Perl Script which performs a specific operation and based on the result, it should update a file.
Basic overview is:
Read a value from the file handle, FILE
Perform some operation and then compare the result with the value stored in INPUT file.
If there is a change, then update the file corresponding to File Handle.
When I say, update, I mean, overwrite the existing value in INPUT file with the new one.
An overview of the script:
#! /usr/bin/perl
use warnings;
use diagnostics;
$input=$ARGV[0];
open(FILE,"+<",$input) || die("Couldn't open the file, $input with error: $!\n");
# perform some operation and set $new_value here.
while(<FILE>)
{
chomp $_;
$old_value=$_;
if($new_value!=$old_value)
{
print FILE $new_value,"\n";
}
}
close FILE;
However, this appends the $new_value to the file instead of overwriting it.
I have read the documentation in several places for this mode of FILE Handle and everywhere it says, read/write mode without append.
I am not sure, why it is unable to overwrite. One reason I could think of is, since I am reading from the handle in the while loop and trying to overwrite it at the same time, it might not work.
Thanks.
your guess is right. You first read the file so file pointer is actually in the position of end of old value. I didn't try this myself, but you can probably seek file pointer to 0 before print it out.
seek(FILE, 0, 0);
You should add truncate to your program along with seek.
if( $new_value != $old_value )
{
seek( FILE, 0, 0 );
truncate FILE, 0;
print FILE $new_value,"\n";
}
Since the file is opened for reading and writing, writing a shorter $new_value will leave some of the $old_value in the file. truncate will remove it.
See perldoc -f seek and perldoc -f truncate for details.
you have to close the file handle and open a different one (or the same one if you like) set to the output file. like this.
close FILE;
open FILE, ">$input" or die $!;
...
close FILE;
that should do the trick

In Perl, why does print not generate any output after I close STDOUT?

I have the code:
open(FILE, "<$new_file") or die "Cant't open file \n";
#lines=<FILE>;
close FILE;
open(STDOUT, ">$new_file") or die "Can't open file\n";
$old_fh = select(OUTPUT_HANDLE);
$| = 1;
select($old_fh);
for(#lines){
s/(.*?xsl.*?)xsl/$1xslt/;
print;
}
close(STDOUT);
STDOUT -> autoflush(1);
print "file changed";
After closing STDOUT closing the program does not write the last print print "file changed". Why is this?
*Edited* Print message I want to write on Console no to file
I suppose it is because print default filehandle is STDOUT, which at that point it is already closed. You could reopen it, or print to other filehandle, for example, STDERR.
print STDERR "file changed";
It's because you've closed the filehandle stored in STDOUT, so print can't use it anymore. Generally speaking opening a new filehandle into one of the predefined handle names isn't a very good idea because it's bound to lead to confusion. It's much clearer to use lexical filehandles, or just a different name for your output file. Yes you then have to specify the filehandle in your print call, but then you don't have any confusion over what's happened to STDOUT.
A print statement will output the string in the STDOUT, which is the default output file handle.
So the statement
print "This is a message";
is same as
print STDOUT "This is a message";
In your code, you have closed STDOUT and then printing the message, which will not work. Reopen the STDOUT filehandle or do not close it. As the script ends, the file handles will be automatically closed
open OLDOUT, ">&", STDOUT;
close STDOUT;
open(STDOUT, ">$new_file") or die "Can't open file\n";
...
close(STDOUT);
open (STDOUT, ">&",OLDOUT);
print "file changed";
You seem to be confused about how file IO operations are done in perl, so I would recommend you read up on that.
What went wrong?
What you are doing is:
Open a file for reading
Read the entire file and close it
Open the same file for overwrite (org file is truncated), using the STDOUT file handle.
Juggle around the default print handle in order to set autoflush on a file handle which is not even opened in the code you show.
Perform a substitution on all lines and print them
Close STDOUT then print a message when everything is done.
Your main biggest mistake is trying to reopen the default output file handle STDOUT. I assume this is because you do not know how print works, i.e. that you can supply a file handle to print to print FILEHANDLE "text". Or that you did not know that STDOUT was a pre-defined file handle.
Your other errors:
You did not use use strict; use warnings;. No program you write should be without these. They will prevent you from doing bad things, and give you information on errors, and will save you hours of debugging.
You should never "slurp" a file (read the entire file to a variable) unless you really need to, because this is ineffective and slow and for huge files will cause your program to crash due to lack of memory.
Never reassign the default file handles STDIN, STDOUT, STDERR, unless A) you really need to, B) you know what you are doing.
select sets the default file handle for print, read the documentation. This is rarely something that you need to concern yourself with. The variable $| sets autoflush on (if set to a true value) for the currently selected file handle. So what you did actually accomplished nothing, because OUTPUT_HANDLE is a non-existent file handle. If you had skipped the select statements, it would have set autoflush for STDOUT. (But you wouldn't have noticed any difference)
print uses print buffers because it is efficient. I assume you are trying to autoflush because you think your prints get caught in the buffer, which is not true. Generally speaking, this is not something you need to worry about. All the print buffers are automatically flushed when a program ends.
For the most part, you do not need to explicitly close file handles. File handles are automatically closed when they go out of scope, or when the program ends.
Using lexical file handles, e.g. open my $fh, ... instead of global, e.g. open FILE, .. is recommended, because of the previous statement, and because it is always a good idea to avoid global variables.
Using three-argument open is recommended: open FILEHANDLE, MODE, FILENAME. This is because you otherwise risk meta-characters in your file names to corrupt your open statement.
The quick fix:
Now, as I said in the comments, this -- or rather, what you intended, because this code is wrong -- is pretty much identical to the idiomatic usage of the -p command line switch:
perl -pi.bak -e 's/(.*?xsl.*?)xsl/$1xslt/' file.txt
This short little snippet actually does all that your program does, but does it much better. Explanation:
-p switch automatically assumes that the code you provide is inside a while (<>) { } loop, and prints each line, after your code is executed.
-i switch tells perl to do inplace-edit on the file, saving a backup copy in "file.txt.bak".
So, that one-liner is equivalent to a program such as this:
$^I = ".bak"; # turns inplace-edit on
while (<>) { # diamond operator automatically uses STDIN or files from #ARGV
s/(.*?xsl.*?)xsl/$1xslt/;
print;
}
Which is equivalent to this:
my $file = shift; # first argument from #ARGV -- arguments
open my $fh, "<", $file or die $!;
open my $tmp, ">", "/tmp/foo.bar" or die $!; # not sure where tmpfile is
while (<$fh>) { # read lines from org file
s/(.*?xsl.*?)xsl/$1xslt/;
print $tmp $_; # print line to tmp file
}
rename($file, "$file.bak") or die $!; # save backup
rename("/tmp/foo.bar", $file) or die $!; # overwrite original file
The inplace-edit option actually creates a separate file, then copies it over the original. If you use the backup option, the original file is first backed up. You don't need to know this information, just know that using the -i switch will cause the -p (and -n) option to actually perform changes on your original file.
Using the -i switch with the backup option activated is not required (except on Windows), but recommended. A good idea is to run the one-liner without the option first, so the output is printed to screen instead, and then adding it once you see the output is ok.
The regex
s/(.*?xsl.*?)xsl/$1xslt/;
You search for a string that contains "xsl" twice. The usage of .*? is good in the second case, but not in the first. Any time you find yourself starting a regex with a wildcard string, you're probably doing something wrong. Unless you are trying to capture that part.
In this case, though, you capture it and remove it, only to put it back, which is completely useless. So the first order of business is to take that part out:
s/(xsl.*?)xsl/$1xslt/;
Now, removing something and putting it back is really just a magic trick for not removing it at all. We don't need magic tricks like that, when we can just not remove it in the first place. Using look-around assertions, you can achieve this.
In this case, since you have a variable length expression and need a look-behind assertion, we have to use the \K (mnemonic: Keep) option instead, because variable length look-behinds are not implemented.
s/xsl.*?\Kxsl/xslt/;
So, since we didn't take anything out, we don't need to put anything back using $1. Now, you may notice, "Hey, if I replace 'xsl' with 'xslt', I don't need to remove 'xsl' at all." Which is true:
s/xsl.*?xsl\K/t/;
You may consider using options for this regex, such as /i, which causes it to ignore case and thus also match strings such as "XSL FOO XSL". Or the /g option which will allow it to perform all possible matches per line, and not just the first match. Read more in perlop.
Conclusion
The finished one-liner is:
perl -pi.bak -e 's/xsl.*?xsl\K/t/' file.txt

How to insert a line into the middle of an existing file

consider an example where i want to insert few lines of text when
particular patter matches(if $line=~m/few lines in here/ then
insert lines in next line):
*current file:*
"This is my file and i wanna insert few lines in here and other
text of the file will continue."
*After insertion:*
"This is my file and i wanna insert few lines in here this is my
new text which i wanted to insert and other text of the file will
continue."
This is my code:
my $sourcename = $ARGV[1];
my $destname = $ARGV[0];
print $sourcename,"\n";
print $destname,"\n";
my $source_excel = new Spreadsheet::ParseExcel;
my $source_book = $source_excel->Parse($sourcename) or die "Could not open source Excel file $sourcename: $!";
my $source_cell;
#Sheet 1 - source sheet page having testnumber and worksheet number
my $source_sheet = $source_book->{Worksheet}[0]; #It is used to access worksheet
$source_cell = $source_sheet->{Cells}[1][0]; #Reads content of the cell;
my $seleniumHost = $source_cell->Value;
print $seleniumHost,"\n";
open (F, '+>>',"$destname") or die "Couldn't open `$destname': $!";
my $line;
while ($line = <F>){
print $line;
if($line=~m/FTP/){
#next if /FTP/;
print $line;
print F $seleniumHost;}
The perlfaq covers this. How do I change, delete, or insert a line in a file, or append to the beginning of a file?
Files are fixed blocks of data. They behave much like a piece of paper. How do you insert a line into the middle of a piece of paper? You can't, not unless you left space. You must recopy the whole thing, inserting your line into the new copy.
In a perl one-liner :
perl -ane 's/few lines in here and other\n/this is my\nnew text which i wanted to insert and other /; s/continue./\ncontinue./; print ' FILE
If you don't want a one-liner, it's easy to takes the substitutions in any script ;)
As long as you know the line:
perl -ne 'if ($. == 8) {s//THIS IS NEW!!!\n/}; print;'
Obviously you'd have to use -i to make the actual changes
OR:
perl -i -pe 'if($. == 8) {s//THIS IS NEW!!!\n/}' file
Someone mentioned Tie::File, which is a solution I'll have to look at for editing a file, but I generally use File::Slurp, which has relatively recently added edit_file and edit_file_lines subs.
Using perl's in-place edit flag (-i), it's easy to add lines to an existing file using Perl, as long as you can key off a text string, such as (in your case) "wanna insert few lines in here":
perl -pi -e 's{wanna insert few lines in here}{wanna insert few lines in here this is my\nnew text which i wanted to insert }' filename
It overwrites your old sentence (don't be scared) with a copy of your old sentence (nothing lost) plus the new stuff you want injected. You can even create a backup of the original file if you wish by passing a ".backup" extension to the -i flag:
perl -p -i'.backup' -e 's{wanna insert few lines in here}{wanna insert few lines in here this is my\nnew text which i wanted to insert }' filename
More info on Perl's search & replace capabilities can be found here:
http://www.atrixnet.com/in-line-search-and-replace-in-files-with-real-perl-regular-expressions/
You can avoid having to repeat the "markup" text using variable substitution.
echo -e "first line\nthird line" | perl -pe 's/(^first line$)/\1\nsecond line/'