Perl in-place editing within a script (rather than a one-liner)

So, I'm used to using perl -i to edit files in place, the way I would with sed.
The docs for $^I in perlvar say:
$^I
The current value of the inplace-edit extension. Use undef to disable inplace editing.
OK. So this implies that I can perhaps mess around with 'in place' editing in a script?
The thing I'm having trouble with is this:
If I run:
perl -pi -e 's/^/fish/' test_file
And then deparse it:
BEGIN { $^I = ""; }
LINE: while (defined($_ = <ARGV>)) {
    s/^/fish/;
}
continue {
    die "-p destination: $!\n" unless print $_;
}
Now - if I were to want to use $^I within a script, say to:
foreach my $file ( glob "*.csv" ) {
    # in-place edit these files - maybe using Text::CSV to manipulate?
}
How do I 'enable' this to happen? Is it a question of changing $_ (as s/something/somethingelse/ does by default) and letting perl implicitly print it? Or is there something else going on?
My major question is - can I do an 'in place edit' that applies a CSV transform (or XML tweak, or similar).
I appreciate I can open separate file handles, read/print etc. I was wondering if there was another way. (even if it is only situationally useful).

The edit-in-place behaviour that is enabled by the -i command-line option or by setting $^I works only on the ARGV file handle. That means the files must either be named on the command line or @ARGV must be set up within the program.
This program will change all lower-case letters to upper-case in all CSV files. Note that I have set $^I to a non-null string, which is advisable while you are testing so that your original data files are retained.
use strict;
use warnings;
$^I = '.bak'; # no "my" or "our" here: $^I and @ARGV are global special variables
while ( my $file = glob '*.csv' ) {
    print "Processing $file\n";
    @ARGV = ($file);
    while ( <ARGV> ) {
        tr/a-z/A-Z/;
        print;
    }
}
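To the follow-up about CSV transforms: the same mechanism combines with Text::CSV. Here is a minimal sketch, assuming Text::CSV's parse/fields/combine/string interface; the uppercasing of the first column is only a stand-in for a real transform:
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new( { binary => 1 } ) or die Text::CSV->error_diag;
$^I = '.bak';
@ARGV = glob '*.csv';
while ( <ARGV> ) {
    chomp;
    $csv->parse($_)        or die $csv->error_diag;  # parse the current line
    my @fields = $csv->fields;
    $fields[0] = uc $fields[0];                      # example transform on column 1
    $csv->combine(@fields) or die $csv->error_diag;  # re-assemble the row
    print $csv->string, "\n";                        # goes to the in-place output handle
}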

There is a much simpler answer, if your script is always going to do in-place editing and your OS supports shebang lines:
#!perl -i
while (<>) {
    print "LINE: $_";
}
This will add 'LINE: ' at the beginning of each line in every file it's given. (Note that you'd probably use the full path to perl, i.e., "#!/usr/bin/perl -i".)
You can also call your script as:
% perl -i <script> <file1> <file2> ...
to run the script as an in-place editor on file1, file2, etc., if you don't have shebang support.

Related

edit file contents in perl

I would like to read an input file and delete any line that matches my regex, then have the file saved without those lines.
I have written:
open(my $fh, '<:encoding(UTF-8)', $original_file_path) or die "Could not open file $original_file_path: $!";
while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ /myregex/) {
        # delete line from file
    }
}
Thank you
You can modify a file in place by using the -i flag.
In your case, a simple one-liner would do:
perl -i -ne 'print unless /myregex/' the_name_of_your_file
As mentioned by PerlDuck, if you wish to keep a copy of the original file, that's possible: add an extension after the -i flag, like -i.bak or -i~, and the original file will be kept with this extension appended to its name.
You can find more information about inplace file modification on perlrun.
Note that if you are using Windows (MS-DOS), you will need to specify an extension for the backup file, that you are free to delete afterward. See this link.
You can obtain the same behavior in a script by setting $^I to a value different than undef. For instance:
#!/usr/bin/perl
use strict;
use warnings 'all';
{
    local @ARGV = ( $original_file_path );
    local $^I = ''; # or set it to something else if you want to keep a backup
    while (<>) {
        print unless /myregex/;
    }
}
I've used local @ARGV so that if you already had something in @ARGV, it won't cause any trouble. If @ARGV was empty, then push @ARGV, $original_file_path would be fine too.
However, if you have more stuff to do in your script, you might prefer a full script over a one-liner. In that case, you should read your input file, and print the lines you want to keep to another file, then move the second file to the first.
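For instance, a minimal sketch of that two-file approach (the .new suffix for the temporary file is just an assumption):
open my $in,  '<:encoding(UTF-8)', $original_file_path       or die "Can't open $original_file_path: $!";
open my $out, '>:encoding(UTF-8)', "$original_file_path.new" or die "Can't create $original_file_path.new: $!";
while (my $line = <$in>) {
    print {$out} $line unless $line =~ /myregex/;  # copy only the lines to keep
}
close $in;
close $out or die $!;
rename "$original_file_path.new", $original_file_path or die "Can't rename: $!";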
There are some modules that can make your life easier. E.g. here's a solution using Path::Tiny:
use Path::Tiny;
path($original_file_path)->edit_lines_utf8(sub {
    if (/myregex/) {
        $_ = '';
    }
});

How to read the contents of a file

I am confused on how to read the contents of a text file. I'm able to read the file's name but can't figure out how to get the contents. By the way, the file is encrypted; that's why I'm trying to decrypt it.
#!/Strawberry/perl/bin/perl
use v5.14;
sub encode_decode {
    shift =~ tr/A-Za-z/Z-ZA-Yz-za-y/r;
}
my ($file1) = @ARGV;
open my $fh1, '<', $file1;
while (<$fh1>) {
    my $enc = encode_decode($file1);
    print my $dec = encode_decode($enc);
    # ... do something with $_ from $file1 ...
}
close $fh1;
This line
my $enc = encode_decode($file1)
passes the name of the file to encode_decode
A loop like while ( <$fh1> ) { ... } puts each line from the file into the default variable $_. You've written so yourself in your comment “do something with $_ from $file1 ...”. You probably want
my $enc = encode_decode($_)
And, by the way, your encode_decode subroutine won't reverse its own encoding. You've written what is effectively a ROT25 encoding, so you would have to apply encode_decode 26 times to get back to the original string
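(For contrast, an addition not part of the original code: a ROT13 transliteration is its own inverse, so applying it twice restores the input.)
# ROT13 swaps A..M with N..Z, so two applications are the identity
sub rot13 { shift =~ tr/A-Za-z/N-ZA-Mn-za-m/r }
print rot13('Hello'), "\n";           # prints "Uryyb"
print rot13( rot13('Hello') ), "\n";  # prints "Hello" again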
It's also worth noting that your shebang line
#!/Strawberry/perl/bin/perl
is pointless on Windows because the command shell doesn't process shebang lines. Perl itself will check the line for options like -w or -i, but you shouldn't be using those anyway. Just omit the line, or if you want to be able to run your program on Linux as well as Windows then use
#!/usr/bin/env perl
which will cause a Linux shell to search the PATH variable for the first perl executable.

How to remove a specific word from a file in perl

A file contains:
rhost=localhost
ruserid=abcdefg_xxx
ldir=
lfile=
rdir=p01
rfile=
pgp=none
mainframe=no
ftpmode=binary
ftpcmd1=
ftpcmd2=
ftpcmd3=
ftpcmd1a=
ftpcmd2a=
notifycc=no
firstfwd=Yes
NOTIFYNYL=
decompress=no
compress=no
I want to write some simple code that removes the "_xxx" in that second line. Keep in mind that there will never be a file that contains the string "_xxx" anywhere else, so that should make it much easier. I'm just not too familiar with the syntax. Thanks!
The short answer:
Here's how you can remove just the literal '_xxx'.
perl -pli.bak -e 's/_xxx$//' filename
The detailed explanation:
Since Perl has a reputation for code that is indistinguishable from line noise, here's an explanation of the steps.
-p creates an implicit loop that looks something like this:
while( <> ) {
    # Your code goes here.
}
continue {
    print or die;
}
-l sort of acts like "auto-chomp", but also places the line ending back on the line before printing it again. It's more complicated than that, but in its simplest use, it changes your implicit loop to look like this:
while( <> ) {
    chomp;
    # Your code goes here.
}
continue {
    print $_, $/;
}
-i tells Perl to "edit in place." Behind the scenes it creates a separate output file and at the end it moves that temporary file to replace the original.
.bak tells Perl that it should create a backup named 'originalfile.bak' so that if you make a mistake it can be reversed easily enough.
Inside the substitution:
s/
    _xxx$  # Match (but don't capture) the final '_xxx' in the string.
//x;       # Replace the entire match with nothing.
The reference material:
For future reference, information on the command line switches used in Perl "one-liners" can be obtained in Perl's documentation at perlrun. A quick introduction to Perl's regular expressions can be found at perlrequick. And a quick overview of Perl's syntax is found at perlintro.
This overwrites the original file, getting rid of _xxx in the 2nd line:
use warnings;
use strict;
use Tie::File;
my $filename = shift;
tie my @lines, 'Tie::File', $filename or die $!;
$lines[1] =~ s/_xxx//;
untie @lines;
Maybe this can help
perl -ple 's/_.*// if /^ruserid/' < file
will remove anything after (and including) the first '_' in the line that starts with "ruserid".
One way using perl: in the second line ($. == 2), delete from the last _ to the end of the line:
perl -lpe 's/_[^_]*\Z// if $. == 2' infile

Is there an issue with opening filenames provided on the command line through $_?

I'm having trouble modifying a script that processes files passed as command line arguments, merely for copying those files, so that it additionally modifies those files. The following perl script worked just fine for copying files:
use strict;
use warnings;
use File::Copy;
foreach $_ (@ARGV) {
    my $orig = $_;
    (my $copy = $orig) =~ s/\.js$/_extjs4\.js/;
    copy($orig, $copy) or die(qq{failed to copy $orig -> $copy});
}
Now that I have files named "*_extjs4.js", I would like to pass those into a script that similarly takes file names from the command line, and further processes the lines within those files. So far I am able to get a file handle successfully, as the following script and its output show:
use strict;
use warnings;
foreach $_ (@ARGV) {
    print "$_\n";
    open(my $fh, "+>", $_) or die $!;
    print $fh;
    #while (my $line = <$fh>) {
    #    print $line;
    #}
    close $fh;
}
Which outputs (in part):
./filetree_extjs4.js
GLOB(0x1a457de8)
./async_submit_extjs4.js
GLOB(0x1a457de8)
What I really want to do though rather than printing a representation of the file handle, is to work with the contents of the files themselves. A start would be to print the files' lines, which I've tried to do with the commented-out code above.
But that code has no effect; the files' lines do not get printed. What am I doing wrong? Is there a conflict between the $_ used to process command line arguments, and the one used to process file contents?
It looks like there are a couple of questions here.
What I really want to do though rather than printing a representation of the file handle, is to work with the contents of the files themselves.
The reason why print $fh is returning GLOB(0x1a457de8) is because the scalar $fh is a filehandle and not the contents of the file itself. To access the contents of the file itself, use <$fh>. For example:
while (my $line = <$fh>) {
    print $line;
}
# or simply: print while <$fh>;
will print the contents of the entire file.
This is documented in perldoc perlop:
If what the angle brackets contain is a simple scalar variable (e.g.,
<$foo>), then that variable contains the name of the filehandle to
input from, or its typeglob, or a reference to the same.
But it has already been tried!
I can see that. Try it after changing the open mode to +<.
According to perldoc perlfaq5:
How come when I open a file read-write it wipes it out?
Because you're using something like this, which truncates the file
then gives you read-write access:
open my $fh, '+>', '/path/name'; # WRONG (almost always)
Whoops. You should instead use this, which will fail if the file
doesn't exist:
open my $fh, '+<', '/path/name'; # open for update
Using ">" always clobbers or creates. Using "<" never does either. The
"+" doesn't change this.
It goes without saying that the or die $! after the open is highly recommended.
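As an aside (an addition, not part of the original answer): a full read-modify-write cycle through a single '+<' handle needs a seek back to the start before writing and a truncate at the end. A sketch, assuming the file fits in memory and $file holds its name:
open my $fh, '+<', $file or die $!;
my $contents = do { local $/; <$fh> };  # slurp the whole file
$contents =~ s/foo/bar/g;               # any edit you like
seek $fh, 0, 0;                         # rewind before writing
print {$fh} $contents;
truncate $fh, tell $fh;                 # drop any leftover tail
close $fh or die $!;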
But take a step back.
There is a more Perlish way to back up the original file and subsequently manipulate it. In fact, it is doable via the command line itself (!) using the -i flag:
$ perl -p -i._extjs4 -e 's/foo/bar/g' *.js
See perldoc perlrun for more details.
I can't fit my needs into the command-line.
If the manipulation is too much for the command-line to handle, the Tie::File module is worth a try.
To read the contents of a filehandle you have to call readline or read, or place the filehandle in angle brackets <>.
my $line = readline $fh;
my $actually_read = read $fh, $text, $bytes;
my $line = <$fh>; # similar to readline
To print to a filehandle other than STDOUT you have to have it as the first argument to print, followed by what you want to print, without a comma between them.
print $fh 'something';
To prevent someone from accidentally adding a comma, I prefer to put the filehandle in a block.
print {$fh} 'something';
You could also select your new handle.
{
    my $oldfh = select $fh;
    print 'something';
    select $oldfh; # reset it back to the previous handle
}
Also, your mode argument to open causes it to clobber the contents of the file, at which point there is nothing left to read.
Try this instead:
open my $fh, '+<', $_ or die;
I'd like to add something to Zaid's excellent suggestion of using a one-liner.
When you are new to perl, and trying some tricky regexes, it can be nice to use a source file for them, as the command line may get rather crowded. For example:
The file:
#!/usr/bin/perl
use warnings;
use strict;
s/complicated/regex/g;
While tweaking the regex, use the source file like so:
perl -p script.pl input.js
perl -p script.pl input.js > testfile
perl -p script.pl input.js | less
Note that you don't use the -i flag here while testing. These commands will not change the input files, only print the changes to stdout.
When you're ready to execute the (permanent!) changes, just add the in-place edit -i flag, and if you wish (recommended), supply an extension for backups, e.g. ".bak".
perl -pi.bak script.pl *.js

Is there a simple way to do bulk file text substitution in place?

I've been trying to code a Perl script to substitute some text on all source files of my project. I'm in need of something like:
perl -p -i.bak -e "s/thisgoesout/thisgoesin/gi" *.{cs,aspx,ascx}
But I need one that parses all the files of a directory recursively.
I just started a script:
use File::Find::Rule;
use strict;
my @files = (File::Find::Rule->file()->name('*.cs', '*.aspx', '*.ascx')->in('.'));
foreach my $f (@files) {
    if ($f =~ s/thisgoesout/thisgoesin/gi) {
        # In-place file editing, or something like that
    }
}
But now I'm stuck. Is there a simple way to edit all files in place using Perl?
Please note that I don't need to keep a copy of every modified file; I have 'em all subversioned =)
Update: I tried this on Cygwin,
perl -p -i.bak -e "s/thisgoesout/thisgoesin/gi" {*,*/*,*/*/*}.{cs,aspx,ascx}
But it looks like my arguments list exploded to the maximum size allowed. In fact, I'm getting very strange errors on Cygwin...
If you assign @ARGV before using *ARGV (aka the diamond <>), $^I/-i will work on those files instead of what was specified on the command line.
use File::Find::Rule;
use strict;
@ARGV = (File::Find::Rule->file()->name('*.cs', '*.aspx', '*.ascx')->in('.'));
$^I = '.bak'; # or set `-i` in the #! line or on the command line
while (<>) {
    s/thisgoesout/thisgoesin/gi;
    print;
}
This should do exactly what you want.
If your pattern can span multiple lines, add undef $/; before the <> so that Perl operates on a whole file at a time instead of line-by-line.
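For example, a sketch of the slurp-mode variant, reusing the file list from above (the multi-line pattern here is made up):
@ARGV = (File::Find::Rule->file()->name('*.cs', '*.aspx', '*.ascx')->in('.'));
$^I = '.bak';
undef $/;                                # <> now reads a whole file at a time
while (<>) {
    s/this\s+goes\s+out/thisgoesin/gi;   # \s+ can now match across newlines
    print;
}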
You may be interested in File::Transaction::Atomic or File::Transaction.
The SYNOPSIS for F::T::A looks very similar to what you're trying to do:
# In this example, we wish to replace
# the word 'foo' with the word 'bar' in several files,
# with no risk of ending up with the replacement done
# in some files but not in others.
use File::Transaction::Atomic;
my $ft = File::Transaction::Atomic->new;
eval {
    foreach my $file (@list_of_file_names) {
        $ft->linewise_rewrite($file, sub {
            s#\bfoo\b#bar#g;
        });
    }
};
if ($@) {
    $ft->revert;
    die "update aborted: $@";
}
else {
    $ft->commit;
}
Couple that with the File::Find you've already written, and you should be good to go.
You can use Tie::File to scalably access large files and change them in place. See the manpage (man 3perl Tie::File).
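A minimal sketch of that approach, reusing the question's File::Find::Rule file list and substitution:
use Tie::File;
use File::Find::Rule;
for my $file ( File::Find::Rule->file()->name('*.cs', '*.aspx', '*.ascx')->in('.') ) {
    tie my @lines, 'Tie::File', $file or die $!;
    s/thisgoesout/thisgoesin/gi for @lines;  # each changed line is written back to the file
    untie @lines;
}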
Change
foreach my $f (@files) {
    if ($f =~ s/thisgoesout/thisgoesin/gi) {
        # In-place file editing, or something like that
    }
}
To
foreach my $f (@files) {
    open my $in,  '<', $f       or die $!;
    open my $out, '>', "$f.out" or die $!;
    while (my $line = <$in>) {
        chomp $line;
        $line =~ s/thisgoesout/thisgoesin/gi;
        print $out "$line\n";
    }
}
This assumes that the pattern doesn't span multiple lines. If the pattern might span lines, you'll need to slurp in the file contents. ("slurp" is a pretty common Perl term).
The chomp isn't actually necessary; I've just been bitten by lines that weren't chomped one too many times (if you drop the chomp, change print $out "$line\n"; to print $out $line;).
Likewise, you can change open my $out, '>', "$f.out"; to open my $out, '+>', undef; to open an anonymous temporary file (the read-write '+>' mode lets the temporary file be read back) and then copy that file back over the original when the substitution's done. In fact, and especially if you slurp in the whole file, you can simply make the substitution in memory and then write over the original file. But I've made enough mistakes doing that that I always write to a new file, and verify the contents.
Note, I originally had an if statement in that code. That was most likely wrong. That would have only copied over lines that matched the regular expression "thisgoesout" (replacing it with "thisgoesin" of course) while silently gobbling up the rest.
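For illustration, a sketch of that anonymous-temp-file variant (assuming $in and $f from the loop above):
open my $out, '+>', undef or die $!;  # anonymous temp file; '+>' so it can be read back
while (my $line = <$in>) {
    $line =~ s/thisgoesout/thisgoesin/gi;
    print $out $line;
}
seek $out, 0, 0;                      # rewind the temp file
open my $final, '>', $f or die $!;    # only now clobber the original
print {$final} $_ while <$out>;
close $final;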
You could use find:
find . -name '*.{cs,aspx,ascx}' | xargs perl -p -i.bak -e "s/thisgoesout/thisgoesin/gi"
This will list all the filenames recursively, then xargs will read its stdin and run the remainder of the command line with the filenames appended on the end. One nice thing about xargs is it will run the command line more than once if the command line it builds gets too long to run in one go.
Note that I'm not sure whether find completely understands all the shell methods of selecting files, so if the above doesn't work then perhaps try:
find . | grep -E '(cs|aspx|ascx)$' | xargs ...
When using pipelines like this, I like to build up the command line and run each part individually before proceeding, to make sure each program is getting the input it wants. So you could run the part without xargs first to check it.
It just occurred to me that although you didn't say so, you're probably on Windows due to the file suffixes you're looking for. In that case, the above pipeline could be run using Cygwin. It's possible to write a Perl script to do the same thing, as you started to do, but you'll have to do the in-place editing yourself because you can't take advantage of the -i switch in that situation.
Thanks to ephemient on this question and on this answer, I got this:
use File::Find::Rule;
use strict;
sub ReplaceText {
    my $regex   = shift;
    my $replace = shift;
    @ARGV = (File::Find::Rule->file()->name('*.cs', '*.aspx', '*.ascx')->in('.'));
    $^I = '.bak';
    while (<>) {
        s/$regex/$replace->()/gie;
        print;
    }
}
ReplaceText qr/some(crazy)regexp/, sub { "some $1 text" };
Now I can even loop through a hash containing regexp => sub entries!
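For example, with a hypothetical %edits hash mapping patterns to replacement subs:
my %edits = (
    'some(crazy)regexp' => sub { "some $1 text" },
    'other(odd)pattern' => sub { "other $1 text" },
);
while ( my ( $pattern, $sub ) = each %edits ) {
    ReplaceText( qr/$pattern/, $sub );  # each call rewrites the matching files in place
}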