Open file for reading and writing (not appending) in Perl

Is there any way with the standard Perl libraries to open a file and edit it, without having to close it and then open it again? All I know how to do is either read a file into a string, close the file, and then overwrite the file with a new one; or read a file and then append to the end of it.
The following currently works, but I have to open and close each file twice instead of once:
#!/usr/bin/perl
use warnings; use strict;
use utf8; binmode(STDIN, ":utf8"); binmode(STDOUT, ":utf8");
use IO::File; use Cwd; my $owd = getcwd()."/"; # OriginalWorkingDirectory
use Text::Tabs qw(expand unexpand);
$Text::Tabs::tabstop = 4; # sets the number of spaces in a tab
opendir (DIR, $owd) || die "$!";
my @files = grep {/(.*)\.(c|cpp|h|java)/} readdir DIR;
foreach my $x (@files){
    my $str;
    my $fh = new IO::File("+<".$owd.$x);
    if (defined $fh){
        while (<$fh>){ $str .= $_; }
        $str =~ s/( |\t)+\n/\n/mgos; # removes trailing spaces or tabs
        $str = expand($str); # convert tabs to spaces
        $str =~ s/\/\/(.*?)\n/\/\*$1\*\/\n/mgos; # make all comments multi-line
        #print $fh $str; # this just appends to the file
        close $fh;
    }
    $fh = new IO::File(">".$owd.$x);
    if (defined $fh){
        print $fh $str; # this overwrites the file
        undef $str; undef $fh; # automatically closes the file
    }
}

You already opened the file for reading and writing by opening it with the mode +<; you're just not doing anything useful with that fact. If you want to replace the contents of the file instead of writing at the current position (the end of the file), you should seek back to the beginning, write what you need to, and then truncate, to make sure there's nothing left over if you made the file shorter.
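For example, a minimal sketch of that read, rewind, rewrite, truncate sequence (the file name and the single cleanup regex are illustrative):

#!/usr/bin/perl
use strict;
use warnings;

# Open for update: read-write without truncating ("example.c" is an illustrative name).
open my $fh, '+<', 'example.c' or die "can't open example.c: $!";

my $str = do { local $/; <$fh> };   # slurp the whole file
$str =~ s/[ \t]+\n/\n/g;            # remove trailing spaces or tabs

seek $fh, 0, 0;                     # rewind to the beginning
print {$fh} $str;                   # overwrite the contents
truncate $fh, tell $fh;             # drop anything left over if the file shrank

close $fh or die "can't close: $!";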
But since what you're trying to do is in-place filtering of a file, might I suggest that you use Perl's in-place editing facility instead of doing all of the work yourself?
#!perl
use strict;
use warnings;
use Text::Tabs qw(expand unexpand);
$Text::Tabs::tabstop = 4;

my @files = glob("*.c *.h *.cpp *.java");

{
    local $^I = "";       # Enable in-place editing.
    local @ARGV = @files; # Set files to operate on.
    while (<>) {
        s/( |\t)+$//g;          # Remove trailing tabs and spaces
        $_ = expand($_);        # Expand tabs
        s{//(.*)$}{/*$1*/}g;    # Turn //comments into /*comments*/
        print;
    }
}
And that's all the code you need -- perl handles the rest. Setting the $^I variable is equivalent to using the -i command-line flag. I made several changes to your code along the way: use utf8 does nothing for a program with no literal UTF-8 in the source, calling binmode on STDIN and STDOUT does nothing for a program that never uses STDIN or STDOUT, and saving the CWD does nothing for a program that never chdirs. There was no reason to read each file in all at once, so I changed it to process lines one at a time, and I made the regexes less awkward (and incidentally, the /o regex modifier is good for almost precisely nothing these days, except adding hard-to-find bugs to your code).
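For comparison, the same in-place edit could also be run straight from the shell with the -i flag; the following one-liner is a rough sketch of an equivalent command (the tab stop is set with an ordinary assignment, which is redundant per line but harmless):

perl -MText::Tabs -i -pe '$Text::Tabs::tabstop = 4; s/[ \t]+$//; $_ = expand($_); s{//(.*)$}{/*$1*/}' *.c *.h *.cpp *.java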

Related

Edit file contents in Perl

I would like to read an input file, delete any line that matches my regex, and have the file saved without those lines.
I have written:
open(my $fh, '<:encoding(UTF-8)', $original_file_path) or die "Could not open file $original_file_path: $!";
while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ /myregex/) {
        # delete line from file -- this is the part I don't know how to do
    }
}
Thank you
You can modify a file in place by using the -i flag.
In your case, a simple one-liner would do:
perl -i -ne 'print unless /myregex/' the_name_of_your_file
As mentioned by PerlDuck, if you wish to keep a copy of the original file, that's possible: add an extension after the -i flag, like -i.bak or -i~, and the original file will be kept with this extension appended to its name.
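For example, to keep the original as the_name_of_your_file.bak:

perl -i.bak -ne 'print unless /myregex/' the_name_of_your_file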
You can find more information about in-place file modification in perlrun.
Note that if you are using Windows (MS-DOS), you will need to specify an extension for the backup file, which you are free to delete afterward.
You can obtain the same behavior in a script by setting $^I to a value other than undef. For instance:
#!/usr/bin/perl
use strict;
use warnings 'all';

{
    local @ARGV = ( $original_file_path );
    local $^I = ''; # or set it to something else if you want to keep a backup
    while (<>) {
        print unless /myregex/;
    }
}
I've used local @ARGV so that, if you already had something in @ARGV, it won't cause any trouble. If @ARGV was empty, then push @ARGV, $original_file_path would be fine too.
However, if you have more stuff to do in your script, you might prefer a full script over a one-liner. In that case, you should read your input file and print the lines you want to keep to another file, then move the second file over the first.
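A minimal sketch of that read-filter-rename approach (the temporary file name and the example value of $original_file_path are my own illustration):

use strict;
use warnings;

my $original_file_path = 'input.txt';             # illustrative name
my $tmp_file_path      = "$original_file_path.tmp";

open my $in,  '<:encoding(UTF-8)', $original_file_path or die "Can't read $original_file_path: $!";
open my $out, '>:encoding(UTF-8)', $tmp_file_path      or die "Can't write $tmp_file_path: $!";

while (my $line = <$in>) {
    print {$out} $line unless $line =~ /myregex/;   # keep only the non-matching lines
}

close $in;
close $out or die "Can't close $tmp_file_path: $!";

rename $tmp_file_path, $original_file_path
    or die "Can't rename $tmp_file_path to $original_file_path: $!";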
There are some modules that can make your life easier. E.g. here's a solution using Path::Tiny:
use Path::Tiny;

path($original_file_path)->edit_lines_utf8(sub {
    if (/myregex/) {
        $_ = '';
    }
});

Perl Script can't use Tie::File

I'm trying to run a perl script which uses the Tie::File module.
What it is basically supposed to do is read in all the files from the current directory, cut off the last line of the first document, then the first and last line of every other document and the first line of the last document, and then write everything to a new document.
When I try to run my script (which might have some mistakes in it... I'd be happy if someone could correct any they find), I get an error message:
Can't locate object method "TIEARRAY" via package "TIE:File" at script.pl line 28, <$fh> line 7.
I've marked line 28 in the code.
I've installed the latest version of Tie::File and checked with
cpan Tie::File
and
cpan Tie::Array
whether everything is installed. I received Tie::Array is up to date (v1.06) and Tie::File is up to date (v1.00) from the terminal, so they should be installed correctly.
#!/usr/bin/perl
use Cwd;
use Tie::File;
use Tie::Array;

my $cwd = getcwd();
my $buff = '';

# Get all files in cwd.
#my @files = grep { -f && /\.txt$/ } readdir $cwd;
my @files = grep ( -f, <*.txt> );

# Cut off footer of first ($files[0]) file
print 'Opening' . $files[0] . "\n";
use Tie::File;
tie (@lines, Tie::File, $files[0]) or die "can't update $file: $!";
delete $lines[-1];

# Cut off header and footer of $files[1] to $files[-2]
for ($a = 1, $a < $#files-1, $a++){
    print 'Opening' . $file . "\n";
    use Tie::FILE;
    tie (@lines, TIE::File, $files[$a]) or die "can't update $file: $!"; #### this is line 28
    delete $lines[0];
    delete $lines[-1];
    open (FILE, "<", $files[$a]) or die $!;
    while (my $line = <FILE>) {
        $buff .= $line;
    }
    close FILE;
}

print 'Opening' . $files[-1] . "\n";
use Tie::FILE;
tie (@lines, TIE::File, $files[-1]) or die "can't update $file: $!";
delete $lines[0];
open (lastfile, "<", $files[-1]) or die "can't open $files[-1]: $!";
while (my $line = <lastfile>) {
    $buff .= $line;
}
close lastfile;

# Write the buffer to a new file.
my $allfilename = $cwd.'/Trace.txt';
print 'Writing all files into new file: ' . $allfilename . "\n";
open $outputfile, ">".$allfilename or die $!;
# Write the buffer into the output file.
print $outputfile $buff;
close $outputfile;
Perl module names are case sensitive. The module is called Tie::File, not Tie::FILE or TIE::File.
Your program is frankly a bit of a mess. You seem to be trying things in the hope that they work but without any real reasoning.
I have refactored your code to do what I think you want below. Here are the main changes I have made
You must always add use strict and use warnings to every Perl program you write, and declare all your variables with my as close as possible to their first point of use. Those simple measures alone will save you from a lot of simple errors that you will otherwise overlook
You don't need Tie::Array or Cwd. They are irrelevant to this program
Your tie statement needs a string as its second parameter, so you need to write the quoted string 'Tie::File' instead of the bareword Tie::File.
Your output file Trace.txt will be found by the <*.txt> glob, so unless you take measures to specifically exclude it, your program will trim the first and last lines of Trace.txt and copy its remaining contents onto itself. In my program I have simply checked in the for loop whether the current file name is Trace.txt and skipped it if so.
There is no point in accumulating the data in a buffer $buff. You may as well just write the data to the file as you encounter it
The lines in the tied array #lines have no trailing newline, so you will presumably want to add one when you write to the file
As has been discussed in the comments, you are using Tie::FILE and TIE::File as well as the correct Tie::File. And you have written use Tie::File (and its variations) four times in total. Sure, it doesn't stop the program from working, but it is a major indication of foggy thinking, and that you are just throwing statements around in the hope that they make your program work.
Using delete on anything other than the last element of an array just sets that element to undef: it doesn't delete it, and all that happens in the tied file is that the text is removed leaving just a newline. You need to use splice instead
Separating your files into the first, the last, and the rest is unnecessary and makes your code illegible. In my program below I have used a single loop that removes the first line of the file unless it's the first file, and removes the last line of the file unless it's the last file. It's far easier to read that way.
Lastly, I'm not at all sure whether you want to remove the first and last lines from the existing files, or if you just want all the data copied to your output file except those lines. I have written my program according to your specification, but bear in mind that the files will get shorter by two lines every time you run it, and that probably isn't the effect you want (a non-destructive variant is sketched after the code below). If you have a different requirement and can't see how to modify the code to achieve it then please ask another question.
I hope this helps you.
use strict;
use warnings;

use Tie::File;

my @files = grep -f, glob '*.txt';

my $all_filename = 'Trace.txt';
open my $out_fh, '>', $all_filename or die qq{Unable to open "$all_filename" for output: $!};

for my $i ( 0 .. $#files ) {
    my $file = $files[$i];
    next if $file eq $all_filename;

    print "Opening $file\n";

    tie my @lines, 'Tie::File', $file or die qq{Can't update "$file": $!};

    splice @lines, 0, 1 unless $i == 0;
    splice @lines, -1, 1 unless $i == $#files;

    print $out_fh "$_\n" for @lines;
}

close $out_fh;
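As mentioned in the last point above, if what you actually want is to leave the original files untouched and merely skip their first and last lines while copying, a variant without Tie::File (my own sketch, not part of the program above) could look like this:

use strict;
use warnings;

my @files = grep -f, glob '*.txt';

my $all_filename = 'Trace.txt';
open my $out_fh, '>', $all_filename or die qq{Unable to open "$all_filename" for output: $!};

for my $i ( 0 .. $#files ) {
    my $file = $files[$i];
    next if $file eq $all_filename;

    open my $in_fh, '<', $file or die qq{Unable to open "$file" for input: $!};
    my @lines = <$in_fh>;
    close $in_fh;

    shift @lines unless $i == 0;        # skip the header line except in the first file
    pop   @lines unless $i == $#files;  # skip the footer line except in the last file

    print {$out_fh} @lines;             # the source files are never modified
}

close $out_fh;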

Perl incorrectly adding newline characters?

This is my tab-delimited input file:
Name<tab>Street<tab>Address
This is how I want my output file to look:
Street<tab>Address<tab>Address
(yes, duplicate the last two columns). My output file looks like this instead:
Street<tab>Address
<tab>Address
What is going on with Perl? This is my code:
open (IN, $ARGV[0]);
open (OUT, ">output.txt");
while ($line = <IN>){
    chomp $line;
    @line = split /\t/, $line;
    $line[2] =~ s/\n//g;
    print OUT $line[1]."\t".$line[2]."\t".$line[2]."\n";
}
close(OUT);
First of all, you should always
use strict and use warnings for even the most trivial programs. You will also need to declare each of your variables using my as close as possible to their first use
use lexical file handles and the three-parameter form of open
check the success of every open call, and die with a string that includes $! to show the reason for the failure
Note also that there is no need to explicitly open files named on the command line that appear in @ARGV: you can just read from them using <>.
As others have said, it looks like you are reading a file of DOS or Windows origin on a Linux system. Instead of using chomp, you can remove all trailing whitespace characters from each line using s/\s+\z//. Since CR and LF both count as "whitespace", this will remove all line terminators from each record. Beware, however, that, if trailing space is significant or if the last field may be blank, then this will also remove spaces and tabs. In that case, s/[\r\n]+\z// is more appropriate.
This version of your program works fine.
use strict;
use warnings;

@ARGV = 'addr.txt';

open my $out, '>', 'output.txt' or die $!;

while (<>) {
    s/\s+\z//;
    my @fields = split /\t/;
    print $out join("\t", @fields[1, 2, 2]), "\n";
}

close $out or die $!;
If you know beforehand the origin of your data file, and know it to be a DOS-like file that terminates records with CR LF, you can use the PerlIO :crlf layer when you open the file, like this:
open my $in, '<:crlf', $ARGV[0] or die $!;
then all records will appear to end in just "\n" when they are read on a Linux system.
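Applied to the program above, that might look like this (a sketch, assuming the input file is the first command-line argument):

use strict;
use warnings;

open my $in,  '<:crlf', $ARGV[0]     or die $!;
open my $out, '>',      'output.txt' or die $!;

while ( my $line = <$in> ) {
    chomp $line;    # the :crlf layer has already turned "\r\n" into "\n"
    my @fields = split /\t/, $line;
    print $out join( "\t", @fields[ 1, 2, 2 ] ), "\n";
}

close $out or die $!;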
A general solution to this problem is to install PerlIO::eol. Then you can write
open my $in, '<:raw:eol(LF)', $ARGV[0] or die $!;
and the line ending will always be "\n" regardless of the origin of the file, and regardless of the platform where Perl is running.
Did you try to eliminate not only the "\n" but also the "\r"?
$file[2] =~ s/\r\n//g;
$file[3] =~ s/\r\n//g; # Is it the "good" one?
It could work: DOS line endings are "\r\n", so there is a "\r" to remove as well as the "\n".
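A slightly more defensive version (my suggestion, using the array index from the question's code) removes the newline and an optional carriage return in one go:

$line[2] =~ s/\r?\n\z//;   # strip the "\n" and a preceding "\r" if there is one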
Another way to avoid end of line problems is to only capture the characters you're interested in:
open (IN, $ARGV[0]);
open (OUT, ">output.txt");

while (<IN>) {
    print OUT "$1\t$2\t$2\n" if /^\w+\t(\w+)\t(\w+)/;
}

close(OUT);

Is there an issue with opening filenames provided on the command line through $_?

I'm having trouble modifying a script that processes files passed as command line arguments, from merely copying those files to additionally modifying them. The following Perl script worked just fine for copying files:
use strict;
use warnings;

use File::Copy;

foreach $_ (@ARGV) {
    my $orig = $_;
    (my $copy = $orig) =~ s/\.js$/_extjs4\.js/;
    copy($orig, $copy) or die(qq{failed to copy $orig -> $copy});
}
Now that I have files named "*_extjs4.js", I would like to pass them to a script that similarly takes file names from the command line and further processes the lines within those files. So far I am able to get a file handle successfully, as the following script and its output show:
use strict;
use warnings;

foreach $_ (@ARGV) {
    print "$_\n";
    open(my $fh, "+>", $_) or die $!;
    print $fh;
    #while (my $line = <$fh>) {
    #    print $line;
    #}
    close $fh;
}
Which outputs (in part):
./filetree_extjs4.js
GLOB(0x1a457de8)
./async_submit_extjs4.js
GLOB(0x1a457de8)
What I really want to do, though, rather than printing a representation of the file handle, is to work with the contents of the files themselves. A start would be to print the files' lines, which I've tried to do with the commented-out code above.
But that code has no effect: the files' lines do not get printed. What am I doing wrong? Is there a conflict between the $_ used to process command line arguments and the one used to process file contents?
It looks like there are a couple of questions here.
What I really want to do, though, rather than printing a representation of the file handle, is to work with the contents of the files themselves.
The reason print $fh outputs GLOB(0x1a457de8) is that the scalar $fh is the filehandle itself, not the contents of the file. To access the contents of the file, use <$fh>. For example:
while (my $line = <$fh>) {
    print $line;
}
# or simply: print while <$fh>;
will print the contents of the entire file.
This is documented in perldoc perlop:
If what the angle brackets contain is a simple scalar variable (e.g., <$foo>), then that variable contains the name of the filehandle to input from, or its typeglob, or a reference to the same.
But it has already been tried!
I can see that. Try it after changing the open mode to +<.
According to perldoc perlfaq5:
How come when I open a file read-write it wipes it out?
Because you're using something like this, which truncates the file and then gives you read-write access:
open my $fh, '+>', '/path/name'; # WRONG (almost always)
Whoops. You should instead use this, which will fail if the file doesn't exist:
open my $fh, '+<', '/path/name'; # open for update
Using ">" always clobbers or creates. Using "<" never does either. The "+" doesn't change this.
It goes without saying that the or die $! after the open is highly recommended.
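Putting those pieces together, a sketch of the update-in-place pattern for the files in @ARGV might look like this (the substitution is just a placeholder for whatever processing you actually need):

use strict;
use warnings;

foreach my $file (@ARGV) {
    open my $fh, '+<', $file or die "can't open $file: $!";

    my $content = do { local $/; <$fh> };   # slurp the whole file
    $content =~ s/foo/bar/g;                # placeholder edit

    seek $fh, 0, 0;                         # rewind before rewriting
    print {$fh} $content;
    truncate $fh, tell $fh;                 # discard any leftover tail

    close $fh or die "can't close $file: $!";
}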
But take a step back.
There is a more Perlish way to back up the original file and subsequently manipulate it. In fact, it is doable via the command line itself (!) using the -i flag:
$ perl -p -i._extjs4 -e 's/foo/bar/g' *.js
See perldoc perlrun for more details.
I can't fit my needs into the command-line.
If the manipulation is too much for the command-line to handle, the Tie::File module is worth a try.
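For example, a minimal Tie::File sketch (again, the substitution is just a placeholder):

use strict;
use warnings;
use Tie::File;

foreach my $file (@ARGV) {
    tie my @lines, 'Tie::File', $file or die "can't tie $file: $!";
    s/foo/bar/g for @lines;   # placeholder edit; each change is written straight back to the file
    untie @lines;
}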
To read the contents of a filehandle you have to call readline or read, or place the filehandle in angle brackets <>.
my $line = readline $fh;
my $actually_read = read $fh, $text, $bytes;
my $line = <$fh>; # similar to readline
To print to a filehandle other than STDOUT, you have to give it as the first argument to print, followed by what you want to print, with no comma between them.
print $fh 'something';
To prevent someone from accidentally adding a comma, I prefer to put the filehandle in a block.
print {$fh} 'something';
You could also select your new handle.
{
    my $oldfh = select $fh;
    print 'something';
    select $oldfh; # reset it back to the previous handle
}
Also, your mode argument to open causes it to clobber the contents of the file, at which point there is nothing left to read.
Try this instead:
open my $fh, '+<', $_ or die;
I'd like to add something to Zaid's excellent suggestion of using a one-liner.
When you are new to Perl, and trying some tricky regexes, it can be nice to use a source file for them, as the command line may get rather crowded. I.e.:
The file:
#!/usr/bin/perl
use warnings;
use strict;
s/complicated/regex/g;
While tweaking the regex, use the source file like so:
perl -p script.pl input.js
perl -p script.pl input.js > testfile
perl -p script.pl input.js | less
Note that you don't use the -i flag here while testing. These commands will not change the input files, only print the changes to stdout.
When you're ready to execute the (permanent!) changes, just add the in-place edit -i flag, and if you wish (recommended), supply an extension for backups, e.g. ".bak".
perl -pi.bak script.pl *.js

File is adding one extra space in each line

I am trying to add all the elements to an array using push, and then store them in another file.
But at the beginning of every line I am seeing one extra whitespace character.
What is the issue? Has anyone faced this issue before?
open FILE , "a.txt"
while (<FILE>)
{
my $temp =$_;
push #array ,$temp;
}
close(FILE);
open FILE2, "b.txt";
print FILE2 "#array";
close FILE2;
When you quote an array variable like this: "@array", it gets interpolated with spaces between the elements. That's where they come from in your output. So do not quote the array if you do not need or want this sort of interpolation.
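To make the difference concrete (a small illustration, not part of the original program):

my @array = ("line one\n", "line two\n");

print "@array";   # prints "line one\n line two\n" -- note the extra space before "line two"
print @array;     # prints "line one\nline two\n"  -- no extra space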
Now let's rewrite your program in modern Perl.
use strict;
use warnings FATAL => 'all';
use autodie qw(:all);
my @array;

{
    open my $in, '<', 'a.txt';
    @array = <$in>;
}

{
    open my $out, '>', 'b.txt';
    print {$out} @array;
}
You put quotes around "@array". That makes it a string interpolation, which for arrays is equivalent to join($", @array). The default value of $" is (guess what?) a space.
Try
print FILE2 @array;
open usually takes another argument that specifies whether the file is opened for input or for output (or for both or for some other special case). You have omitted this argument, and so by default FILE2 is an input filehandle.
You wanted to say
open FILE2, '>', "b.txt"
If you put the line
use warnings;
at the beginning of every Perl script, the interpreter will catch many issues like this for you.