Is there a simple way to do bulk file text substitution in place? - perl

I've been trying to code a Perl script to substitute some text in all source files of my project. I need something like:
perl -p -i.bak -e "s/thisgoesout/thisgoesin/gi" *.{cs,aspx,ascx}
but one that processes all the files of a directory recursively.
I just started a script:
use File::Find::Rule;
use strict;
my @files = (File::Find::Rule->file()->name('*.cs','*.aspx','*.ascx')->in('.'));
foreach my $f (@files){
    if ($f =~ s/thisgoesout/thisgoesin/gi) {
        # In-place file editing, or something like that
    }
}
But now I'm stuck. Is there a simple way to edit all files in place using Perl?
Please note that I don't need to keep a copy of every modified file; I have 'em all in Subversion =)
Update: I tried this on Cygwin:
perl -p -i.bak -e "s/thisgoesout/thisgoesin/gi" {*,*/*,*/*/*}.{cs,aspx,ascx}
But it looks like my argument list exceeded the maximum size allowed. In fact, I'm getting very strange errors on Cygwin...

If you assign @ARGV before using *ARGV (aka the diamond <>), $^I/-i will work on those files instead of what was specified on the command line.
use File::Find::Rule;
use strict;
@ARGV = (File::Find::Rule->file()->name('*.cs', '*.aspx', '*.ascx')->in('.'));
$^I = '.bak'; # or set `-i` in the #! line or on the command-line
while (<>) {
    s/thisgoesout/thisgoesin/gi;
    print;
}
This should do exactly what you want.
If your pattern can span multiple lines, add undef $/; (or, better, local $/; inside a block) before the <> so that Perl reads a whole file at a time instead of line by line.
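Here is a minimal sketch of the slurp variant that uses only core Perl. The temp directory and Demo.cs file are hypothetical stand-ins for your project files, and the multi-line pattern is only an illustration. Using local instead of a bare assignment keeps @ARGV, $^I and $/ from leaking into the rest of the program.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical demo file standing in for one of your project sources.
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'Demo.cs');
open my $fh, '>', $file or die "Can't write $file: $!";
print {$fh} "keep\nthisgoes\nout\nThisGoesOut\n";
close $fh or die "Can't close $file: $!";

{
    local @ARGV = ($file);   # the files to edit (e.g. from File::Find::Rule)
    local $^I   = '.bak';    # in-place editing, keeping a .bak copy
    local $/;                # slurp mode: each <> returns a whole file
    while (<>) {
        # \s* can match newlines, so this match may cross line boundaries
        s/thisgoes\s*out/thisgoesin/gi;
        print;               # goes to the rewritten file, not STDOUT
    }
}

# Show the result.
open my $in, '<', $file or die "Can't read $file: $!";
my $result = do { local $/; <$in> };
close $in;
print $result;
```

Because \s* can match a newline, the first substitution joins "thisgoes" and "out" across a line break. With line-by-line reading, that substitution would never match.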

You may be interested in File::Transaction::Atomic or File::Transaction.
The SYNOPSIS for F::T::A looks very similar to what you're trying to do:
# In this example, we wish to replace
# the word 'foo' with the word 'bar' in several files,
# with no risk of ending up with the replacement done
# in some files but not in others.
use File::Transaction::Atomic;
my $ft = File::Transaction::Atomic->new;
eval {
    foreach my $file (@list_of_file_names) {
        $ft->linewise_rewrite($file, sub {
            s#\bfoo\b#bar#g;
        });
    }
};
if ($@) {
    $ft->revert;
    die "update aborted: $@";
}
else {
    $ft->commit;
}
Couple that with the File::Find::Rule code you've already written, and you should be good to go.

You can use Tie::File to scalably access large files and change them in place. See the manpage (man 3perl Tie::File).
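Below is a minimal Tie::File sketch (Tie::File ships with Perl). The sample file name and contents are hypothetical. Each array element is one line, and assigning to an element rewrites that line in the file, so the usual for (@array) { s/.../.../ } idiom edits the file in place.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use Tie::File;

# Hypothetical sample file; point $file at one of your real files instead.
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'Page.aspx');
open my $fh, '>', $file or die "Can't write $file: $!";
print {$fh} "a thisgoesout line\nuntouched\nTHISGOESOUT\n";
close $fh or die "Can't close $file: $!";

# Each element of @lines is one line of the file, without its newline.
# Changing an element rewrites that line in the file itself.
tie my @lines, 'Tie::File', $file or die "Can't tie $file: $!";
s/thisgoesout/thisgoesin/gi for @lines;
untie @lines;    # flush the changes and release the file

open my $in, '<', $file or die "Can't read $file: $!";
my $result = do { local $/; <$in> };
close $in;
print $result;
```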

Change
foreach my $f (@files){
    if ($f =~ s/thisgoesout/thisgoesin/gi) {
        #inplace file editing, or something like that
    }
}
To
foreach my $f (@files){
    open my $in, '<', $f or die "Can't read $f: $!";
    open my $out, '>', "$f.out" or die "Can't write $f.out: $!";
    while (my $line = <$in>){
        chomp $line;
        $line =~ s/thisgoesout/thisgoesin/gi;
        print $out "$line\n";
    }
    close $in;
    close $out;
}
This assumes that the pattern doesn't span multiple lines. If the pattern might span lines, you'll need to slurp in the file contents. ("slurp" is a pretty common Perl term).
The chomp isn't actually necessary; I've just been bitten by unchomped lines one too many times (if you drop the chomp, change print $out "$line\n"; to print $out $line;).
Likewise, you can change open my $out, '>', "$f.out"; to open my $out, '+>', undef; to open an anonymous temporary file, then rewind it and copy it back over the original when the substitution's done. In fact, and especially if you slurp in the whole file, you can simply make the substitution in memory and then write over the original file. But I've made enough mistakes doing that that I always write to a new file and verify the contents.
Note, I originally had an if statement in that code. That was most likely wrong. That would have only copied over lines that matched the regular expression "thisgoesout" (replacing it with "thisgoesin" of course) while silently gobbling up the rest.
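The sketch below combines this with a recursive search and uses only core modules. File::Find takes the place of File::Find::Rule, and each file is handled with the "write to $f.out, then rename" approach described above. The directory tree it builds is a hypothetical demo, and replace_in_file is an illustrative name, not a library function.

```perl
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);

# Rewrite one file: copy it line by line to "$file.out", substituting as we
# go, then rename the new file over the original.
sub replace_in_file {
    my ($file) = @_;
    my $tmp = "$file.out";
    open my $in,  '<', $file or die "Can't read $file: $!";
    open my $out, '>', $tmp  or die "Can't write $tmp: $!";
    while (my $line = <$in>) {
        $line =~ s/thisgoesout/thisgoesin/gi;
        print {$out} $line;          # newline is kept, so no chomp needed
    }
    close $in;
    close $out or die "Can't close $tmp: $!";
    rename $tmp, $file or die "Can't rename $tmp to $file: $!";
}

# Hypothetical project tree for the demo.
my $root = tempdir(CLEANUP => 1);
make_path(File::Spec->catdir($root, 'sub', 'deeper'));
for my $rel ('a.cs', 'sub/b.aspx', 'sub/deeper/c.ascx', 'notes.txt') {
    my $path = File::Spec->catfile($root, split m{/}, $rel);
    open my $fh, '>', $path or die "Can't write $path: $!";
    print {$fh} "ThisGoesOut\n";
    close $fh;
}

# Collect matching files recursively; inside the callback $_ is the basename.
my @files;
find(sub { push @files, $File::Find::name if -f $_ && /\.(?:cs|aspx|ascx)\z/ }, $root);
replace_in_file($_) for @files;
```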

You could use find:
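find . \( -name '*.cs' -o -name '*.aspx' -o -name '*.ascx' \) -print0 | xargs -0 perl -p -i.bak -e "s/thisgoesout/thisgoesin/gi"
This will list all the matching filenames recursively, then xargs will read its stdin and run the rest of the command line with the filenames appended to the end. One nice thing about xargs is that it will run the command more than once if the command line it builds gets too long to run in one go, which avoids the argument-list size problem. -print0 and -0 separate the names with NUL bytes, so filenames containing spaces survive.
Note that find's -name test takes a single glob pattern and does not do shell brace expansion, so -name '*.{cs,aspx,ascx}' would match nothing; that's why each extension gets its own -name above. Alternatively, you can filter with grep (this version breaks on filenames containing spaces):
find . -type f | grep -E '\.(cs|aspx|ascx)$' | xargs ...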
When using pipelines like this, I like to build up the command line and run each part individually before proceeding, to make sure each program is getting the input it wants. So you could run the part without xargs first to check it.
It just occurred to me that although you didn't say so, you're probably on Windows, judging by the file suffixes you're looking for. In that case, the above pipeline could be run using Cygwin. It's also possible to write a Perl script to do the same thing, as you started to do. The -i switch won't apply automatically there, but you can get the same behaviour by assigning the file list to @ARGV and setting $^I before a <> loop.

Thanks to ephemient on this question and on this answer, I got this:
use File::Find::Rule;
use strict;
sub ReplaceText {
    my $regex = shift;
    my $replace = shift;
    @ARGV = (File::Find::Rule->file()->name('*.cs','*.aspx','*.ascx')->in('.'));
    $^I = '.bak';
    while (<>) {
        s/$regex/$replace->()/gie;
        print;
    }
}
ReplaceText qr/some(crazy)regexp/, sub { "some $1 text" };
Now I can even loop through a hash containing regexp=>subs entries!
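One caveat with a hash is that a qr// key gets stringified. The stringified form still works as a pattern, but hash order is unpredictable. If one substitution can change the text another one sees, an ordered list of pairs is safer. Here is a sketch along those lines. replace_all and the rules are hypothetical, and the demo file stands in for the File::Find::Rule results.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Apply each [pattern, replacement-sub] pair, in order, to every line of
# @files, editing them in place (same @ARGV/$^I trick as ReplaceText).
sub replace_all {
    my ($rules, @files) = @_;
    local @ARGV = @files;
    local $^I   = '.bak';
    while (<>) {
        for my $rule (@$rules) {
            my ($re, $replace) = @$rule;
            s/$re/$replace->()/ge;   # the sub can read $1, $2, ... of this match
        }
        print;
    }
}

# Hypothetical rules; an array keeps them in a predictable order.
my @rules = (
    [ qr/some(crazy)regexp/i, sub { "some $1 text" } ],
    [ qr/thisgoesout/i,       sub { 'thisgoesin' } ],
);

# Demo on a throwaway file.
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'demo.cs');
open my $fh, '>', $file or die "Can't write $file: $!";
print {$fh} "x SomeCrazyRegexp y\nThisGoesOut\n";
close $fh or die "Can't close $file: $!";

replace_all(\@rules, $file);

open my $in, '<', $file or die "Can't read $file: $!";
my $result = do { local $/; <$in> };
close $in;
print $result;
```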

Related

edit file contents in perl

I would like to read an input file, delete any line that matches my regex, and have the file saved without that line.
I have written
open(my $fh, '<:encoding(UTF-8)', $original_file_path) or die "Could not open file $original_file_path $!";
while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ /myregex/) {
        # delete line from file???
    }
}
Thank you
You can modify a file in place by using the -i flag.
In your case, a simple one liner would do:
perl -i -ne 'print unless /myregex/' the_name_of_your_file
As mentioned by PerlDuck, if you wish to keep a copy of the original file, that's possible. Add an extension after the -i flag, like -i.bak or -i~, and the original file will be kept with that extension appended to its name.
You can find more information about in-place file modification in perlrun.
Note that if you are using Windows (MS-DOS), you will need to specify an extension for the backup file, which you are free to delete afterward.
You can obtain the same behavior in a script by setting $^I to a value other than undef. For instance:
#!/usr/bin/perl
use strict;
use warnings 'all';
{
    local @ARGV = ( $original_file_path );
    local $^I = ''; # or set it to something else if you want to keep a backup
    while (<>) {
        print unless /myregex/;
    }
}
I've used local @ARGV so that anything already in @ARGV won't cause any trouble. If @ARGV was empty, then push @ARGV, $original_file_path would be fine too.
However, if you have more to do in your script, you might prefer a full script over a one-liner. In that case, you should read your input file, print the lines you want to keep to another file, and then move the second file over the first.
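Here is a sketch of that full-script approach that uses only core modules. delete_matching_lines is a hypothetical name, and 'myregex' is the placeholder from the question. The temporary file is created in the same directory as the original, so rename() never has to move it across filesystems.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Spec;
use File::Temp qw(tempdir);

# Copy every line that does NOT match $regex into a temporary file created
# in the same directory, then rename it over the original.
sub delete_matching_lines {
    my ($path, $regex) = @_;
    open my $in, '<:encoding(UTF-8)', $path or die "Could not open $path: $!";
    my $tmp = File::Temp->new(DIR => dirname($path));   # same directory
    binmode $tmp, ':encoding(UTF-8)';
    while (my $line = <$in>) {
        print {$tmp} $line unless $line =~ $regex;
    }
    close $in;
    close $tmp or die "Could not close temp file: $!";
    # File::Temp creates files as 0600; copy the original's permission bits.
    chmod((stat $path)[2] & 07777, $tmp->filename);
    rename $tmp->filename, $path or die "Could not replace $path: $!";
}

# Demo on a throwaway file.
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'input.txt');
open my $fh, '>', $file or die "Can't write $file: $!";
print {$fh} "keep\nthis has myregex in it\nkeep too\n";
close $fh or die "Can't close $file: $!";

delete_matching_lines($file, qr/myregex/);

open my $check, '<', $file or die "Can't read $file: $!";
my $result = do { local $/; <$check> };
close $check;
print $result;
```

If anything dies before the rename, the File::Temp object removes its temporary file when it goes out of scope, and the original is left untouched.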
There are some modules that can make your life easier. E.g. here's a solution using Path::Tiny:
use Path::Tiny;
path($original_file_path)->edit_lines_utf8(sub {
if (/myregex/) {
$_ = '';
}
});

Change line in textfile using perl

I read other places on how to do this but they were confusing for me.
I want to read lines from a text file and when I come across a certain line I want to append something to it.
My code is:
open my $p, "$username_filename" or die "can not open $username_filename: $!";
foreach $line (<$p>){
if ($line =~ /^listen/){
`echo "whatever" >> $username_file`;
}
}
However when I run this I get this error
sh: -c: line 0: syntax error near unexpected token `newline' sh: -c: line 0: `echo "current_user" >> '
Is this way correct to edit the file and why am I getting this error?
Working with files is not like editing in a word processor. Lines are an illusion; a file is just a big string of characters. You can't change a line in the middle of a file for the same reason you can't change a line in the middle of a book: the words can't be moved around to make room.
Instead, like a book, if you want to change something you need to rewrite the whole thing.
The basic algorithm is to:
1. Open the file for reading.
2. Open a temporary file for writing.
3. Read a line, alter the line, write the line.
4. Repeat step 3 until done reading.
5. Overwrite the file with the temp file.
Some other notes...
print writes to STDOUT by default, but you can give it a filehandle to write to instead.
foreach my $line (<$fh>) is unfortunately not optimized to read files. It will read the possibly enormous file into memory. while(my $line = <$fh>) reads one line at a time.
I've turned on strict. This forces you to declare your variables. It protects you from typos like the one you made of $username_file vs $username_filename.
You could use something like "$filename.tmp" but File::Temp provides temp files that are guaranteed to be temporary, unique and cleaned up when the program exits.
use strict;
use warnings;
use autodie;                   # because writing 'or die' gets old fast
use File::Temp;                # provides safe temp files
use File::Basename qw(dirname);
my $filename = ...; # set it somehow
open my $read, "<", $filename;
# Create the temp file next to the original so rename() stays on one filesystem
my $temp = File::Temp->new(DIR => dirname($filename));
while (my $line = <$read>) {
    if ($line =~ /^listen/) {
        chomp $line;               # remove the newline
        $line .= " whatever\n";    # add our content and put a newline back
    }
    # Write the line to the temp file
    print $temp $line;
}
close $read;
close $temp;                   # flush everything before renaming
# Overwrite our file with the rewritten temp file
rename $temp->filename, $filename;
That's inside a program. If you just want to do it quickly, you can do it on the command line with -i and -p.
perl -i.bak -pe 'if( /^listen/ ) { chomp; $_ .= " whatever\n" }' filename
-p says to run the code on each line of the file. The line will be put into $_ and whatever is in $_ will be printed. -i says to edit the file in place. -i.bak makes a backup of the original file just in case you make a mistake.
There are a few problems with your attempt. The big one is that using echo >> file will append to the file, not insert at some arbitrary place inside the file.
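Another problem is that you're trying to append to a file called $username_file, and you haven't declared or defined that variable. Because it's empty, the shell receives an echo command with nothing after the >>, which is exactly the syntax error you're seeing.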
I don't think perl lets you insert into the middle of a file. I think your best bet would be to read the file a line at a time, and on the correct line(s), append the text you want. Write each line to a new file, then swap the files around at the end.
For example:
#!/usr/bin/perl
use strict;
use warnings;
my $in_filename = "in.txt";
my $out_filename = "out.txt";
open (my $in, "<", $in_filename) or die "Can't read $in_filename: $!";
open (my $out, ">", $out_filename) or die "Can't write $out_filename: $!";
while (my $lline = <$in>)
{
    chomp $lline;
    if ( $lline =~ /^listen/ )
    {
        print $out "$lline whatever\n";
    }
    else
    {
        print $out "$lline\n";
    }
}
close $in;
close $out;
rename $in_filename, "$in_filename.original";
rename $out_filename, $in_filename;
I use chomp to remove line endings because <$in> gives us a line including its line ending, which would otherwise mess up the append.
As always there are many ways to achieve this. I think using sed is probably a better option for this, but you specifically asked how to do it in perl, so perl it is.

Perl in place editing within a script (rather than one liner)

So, I'm used to the perl -i to use perl as I would sed and in place edit.
The docs for $^I in perlvar:
$^I
The current value of the inplace-edit extension. Use undef to disable inplace editing.
OK. So this implies that I can perhaps mess around with 'in place' editing in a script?
The thing I'm having trouble with is this:
If I run:
perl -pi -e 's/^/fish/' test_file
And then deparse it:
BEGIN { $^I = ""; }
LINE: while (defined($_ = <ARGV>)) {
s/^/fish/;
}
continue {
die "-p destination: $!\n" unless print $_;
}
Now - if I were to want to use $^I within a script, say to:
foreach my $file ( glob "*.csv" ) {
#inplace edit these files - maybe using Text::CSV to manipulate?
}
How do I 'enable' this to happen? Is it a question of changing $_ (as s/something/somethingelse/ does by default) and letting perl implicitly print it? Or is there something else going on?
My major question is - can I do an 'in place edit' that applies a CSV transform (or XML tweak, or similar).
I appreciate I can open separate file handles, read/print etc. I was wondering if there was another way. (even if it is only situationally useful).
The edit-in-place behaviour that is enabled by the -i command-line option or by setting $^I works only on the ARGV filehandle. That means the files must either be named on the command line or @ARGV must be set up within the program.
This program will change all lower-case letters to upper-case in all CSV files. Note that I have set $^I to a non-empty string, which is advisable while you are testing so that your original data files are retained.
use strict;
use warnings;
$^I = '.bak';
while ( my $file = glob '*.csv' ) {
    print "Processing $file\n";
    our @ARGV = ($file);
    while ( <ARGV> ) {
        tr/a-z/A-Z/;
        print;
    }
}
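Inside a larger script, one option is to wrap the same ARGV/$^I mechanism in a small helper and localize both variables so they don't leak. In the sketch below, edit_in_place is a hypothetical name. The comma splitting assumes simple CSV with no quoted or embedded commas; for real CSV, a parser such as Text::CSV is the safer choice.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: run $code on each line ($_) of each file, editing the
# files in place. local() keeps @ARGV and $^I from leaking out of the sub.
sub edit_in_place {
    my ($code, @files) = @_;
    local @ARGV = @files;
    local $^I   = '.bak';
    while (<ARGV>) {
        $code->();   # may modify $_
        print;       # writes $_ to the file being rewritten
    }
}

# Demo on a throwaway file.
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'data.csv');
open my $fh, '>', $file or die "Can't write $file: $!";
print {$fh} "name,city\nalice,paris\nbob,rome\n";
close $fh or die "Can't close $file: $!";

# Naive "CSV" transform: upper-case the second column.
edit_in_place(sub {
    chomp;
    my @fields = split /,/, $_, -1;
    $fields[1] = uc $fields[1] if @fields > 1;
    $_ = join(',', @fields) . "\n";
}, $file);

open my $in, '<', $file or die "Can't read $file: $!";
my $result = do { local $/; <$in> };
close $in;
print $result;
```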
There is a much simpler answer, if your script is always going to do in-place editing and your OS uses shebang:
#!perl -i
while (<>) {
print "LINE: $_"
}
Will add 'LINE: ' at the beginning of a line for each file it's given. (Note that you'd probably use the full path to perl, i.e., "#!/usr/bin/perl -i")
You can also call your script as:
% perl -i <script> <file1> <file2> ...
to run the script as an in-place editor on file1, file2, etc., if you don't have shebang support.

Is there an issue with opening filenames provided on the command line through $_?

I'm having trouble modifying a script that processes files passed as command-line arguments. It used to just copy those files, and now I also want it to modify them. The following Perl script worked just fine for copying files:
use strict;
use warnings;
use File::Copy;
foreach $_ (@ARGV) {
    my $orig = $_;
    (my $copy = $orig) =~ s/\.js$/_extjs4\.js/;
    copy($orig, $copy) or die(qq{failed to copy $orig -> $copy});
}
Now that I have files named "*_extjs4.js", I would like to pass those into a script that similarly takes file names from the command line and further processes the lines within those files. So far I am able to get a filehandle successfully, as the following script and its output show:
use strict;
use warnings;
foreach $_ (@ARGV) {
    print "$_\n";
    open(my $fh, "+>", $_) or die $!;
    print $fh;
    #while (my $line = <$fh>) {
    #    print $line;
    #}
    close $fh;
}
Which outputs (in part):
./filetree_extjs4.js
GLOB(0x1a457de8)
./async_submit_extjs4.js
GLOB(0x1a457de8)
What I really want to do, though, rather than printing a representation of the filehandle, is to work with the contents of the files themselves. A start would be to print the files' lines, which I've tried to do with the commented-out code above.
But that code has no effect, the files' lines do not get printed. What am I doing wrong? Is there a conflict between the $_ used to process command line arguments, and the one used to process file contents?
It looks like there are a couple of questions here.
What I really want to do though rather than printing a representation of the file handle, is to work with the contents of the files themselves.
The reason print $fh outputs GLOB(0x1a457de8) is that the scalar $fh holds a filehandle, not the contents of the file. To access the contents of the file, use <$fh>. For example:
while (my $line = <$fh>) {
print $line;
}
# or simply print while <$fh>;
will print the contents of the entire file.
This is documented in perldoc perlop:
If what the angle brackets contain is a simple scalar variable (e.g.,
<$foo>), then that variable contains the name of the filehandle to
input from, or its typeglob, or a reference to the same.
But it has already been tried!
I can see that. Try it after changing the open mode to +<.
According to perldoc perlfaq5:
How come when I open a file read-write it wipes it out?
Because you're using something like this, which truncates the file
then gives you read-write access:
open my $fh, '+>', '/path/name'; # WRONG (almost always)
Whoops. You should instead use this, which will fail if the file
doesn't exist:
open my $fh, '+<', '/path/name'; # open for update
Using ">" always clobbers or creates. Using "<" never does either. The
"+" doesn't change this.
It goes without saying that the or die $! after the open is highly recommended.
But take a step back.
There is a more Perlish way to back up the original file and subsequently manipulate it. In fact, it is doable via the command line itself (!) using the -i flag:
$ perl -p -i._extjs4 -e 's/foo/bar/g' *.js
See perldoc perlrun for more details.
I can't fit my needs into the command-line.
If the manipulation is too much for the command-line to handle, the Tie::File module is worth a try.
To read the contents of a filehandle, you have to call readline or read, or place the filehandle in angle brackets <>.
my $line = readline $fh;
my $actually_read = read $fh, $text, $bytes;
my $line = <$fh>; # similar to readline
To print to a filehandle other than STDOUT, you have to give it as the first argument to print, followed by what you want to print, with no comma between them.
print $fh 'something';
To prevent someone from accidentally adding a comma, I prefer to put the filehandle in a block.
print {$fh} 'something';
You could also select your new handle.
{
my $oldfh = select $fh;
print 'something';
select $oldfh; # reset it back to the previous handle
}
Also, your mode argument to open causes it to clobber the contents of the file, at which point there is nothing left to read.
Try this instead:
open my $fh, '+<', $_ or die "Can't open $_: $!";
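The '+<' route can do the whole read-modify-write through one handle. The catch is that you must seek back before writing, and truncate afterwards in case the new content is shorter. Here is a sketch, where rewrite_in_place is a hypothetical helper and the demo file is made up:

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Rewrite a file through one read-write handle: read it all, transform it in
# memory, seek back to the start, write it out, then cut off any old tail.
sub rewrite_in_place {
    my ($path, $transform) = @_;
    open my $fh, '+<', $path or die "Can't open $path: $!";   # '+<' does not truncate
    my $text = do { local $/; <$fh> };
    $text = $transform->($text);
    seek $fh, 0, 0    or die "Can't seek in $path: $!";       # needed between read and write
    print {$fh} $text or die "Can't write $path: $!";
    my $end = tell $fh;                                       # where the new content ends
    close $fh         or die "Can't close $path: $!";
    truncate $path, $end or die "Can't truncate $path: $!";  # drop leftovers if it shrank
}

# Demo on a throwaway file: the replacement is shorter, so truncation matters.
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'filetree_extjs4.js');
open my $out, '>', $file or die "Can't write $file: $!";
print {$out} "keep thisgoesout keep\n";
close $out or die "Can't close $file: $!";

rewrite_in_place($file, sub { my $t = shift; $t =~ s/thisgoesout/in/g; return $t });

open my $in, '<', $file or die "Can't read $file: $!";
my $result = do { local $/; <$in> };
close $in;
print $result;
```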
I'd like to add something to Zaid's excellent suggestion of using a one-liner.
When you are new to Perl and trying some tricky regexes, it can be nice to put them in a source file, as the command line may get rather crowded. For example:
The file:
#!/usr/bin/perl
use warnings;
use strict;
s/complicated/regex/g;
While tweaking the regex, use the source file like so:
perl -p script.pl input.js
perl -p script.pl input.js > testfile
perl -p script.pl input.js | less
Note that you don't use the -i flag here while testing. These commands will not change the input files, only print the changes to stdout.
When you're ready to execute the (permanent!) changes, just add the in-place edit -i flag, and if you wish (recommended), supply an extension for backups, e.g. ".bak".
perl -pi.bak script.pl *.js

Perl File Name Change

I am studying and extending a Perl script written by others. It has a line:
@pub=`ls $sourceDir | grep '\.htm' | grep -v Default | head -550`;
foreach (@pub) {
    my $docName = $_;
    chomp($docName);
    $docName =~ s/\.htm$//g;
    ...
}
I know that it first uses UNIX commands to list all the .htm files, then strips the file extension.
Now I need to do one more thing, which is also very important: I need to rename the actual files stored on disk, replacing the spaces in their names with underscores. I am stuck because I am not sure whether I should follow his code style and do this with UNIX commands, or do it in Perl. The point is that I need to modify the real files on disk, not the strings that hold the file names.
Thanks.
Something like this should help (not tested):
use File::Basename;
use File::Spec;
use File::Copy;
use strict;
my @files = grep { ! /Default/ } glob("$sourceDir/*.htm");
# I didn't implement the "head -550" part as I don't understand the point.
# But you can easily do it using the splice() function.
foreach my $file (@files) {
    next unless (-f $file);            # Don't rename directories!
    my $dirname = dirname($file);      # file's directory, so we rename only the file itself
    my $file_name = basename($file);   # file name for renaming
    my $new_file_name = $file_name;
    $new_file_name =~ s/ /_/g;         # replace all spaces with underscores
    rename($file, File::Spec->catfile($dirname, $new_file_name))
        or die $!;                     # Error handling: what if we couldn't rename?
}
It will be faster to use File::Copy to move the file to its new name than to use this method, which forks off a new process, spawns a new shell, etc. That takes more memory and is slower than doing it within Perl itself.
Edit: you can get rid of all that backtick b.s., too, like this:
my @files = grep {!/Default/} glob "$sourceDir/*.htm";
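Here is a sketch of the pure-Perl rename using File::Copy's move, which falls back to copy-and-delete when a plain rename isn't possible (for example, across filesystems). The demo directory and file names are hypothetical. readdir is used instead of glob so that spaces in the directory name can't confuse the pattern.

```perl
use strict;
use warnings;
use File::Copy qw(move);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical $sourceDir with a few .htm files, for the demo only.
my $sourceDir = tempdir(CLEANUP => 1);
for my $name ('My Page.htm', 'Default Page.htm', 'other doc.htm') {
    my $path = File::Spec->catfile($sourceDir, $name);
    open my $fh, '>', $path or die "Can't create $path: $!";
    close $fh;
}

# No backticks or shell: readdir lists the directory, grep filters it.
opendir my $dh, $sourceDir or die "Can't open $sourceDir: $!";
my @files = grep { /\.htm\z/ && !/Default/ && -f(File::Spec->catfile($sourceDir, $_)) }
            readdir $dh;
closedir $dh;

for my $name (@files) {
    (my $new = $name) =~ tr/ /_/;     # spaces -> underscores
    next if $new eq $name;             # nothing to rename
    my $from = File::Spec->catfile($sourceDir, $name);
    my $to   = File::Spec->catfile($sourceDir, $new);
    die "Refusing to overwrite $to\n" if -e $to;
    move($from, $to) or die "Can't move $from to $to: $!";
}
```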