Perl search & replace script for all files in a directory

I have a directory with nearly 1,200 files. In a Perl script, I need to go through each file in turn and search and replace any occurrences of 66 strings, so for each file I need to run all 66 substitutions. My replacement strings are in Thai, so I cannot use the shell. It must be a .pl file or similar so that I can say use utf8;. I am just not familiar with how to open all the files in a directory one by one to perform actions on them. Here is a sample of one of my substitutions:
s/psa0*(\d+)/เพลงสดุดี\1/g;
Thanks for any help.

use utf8;
use strict;
use warnings;
use File::Glob qw( bsd_glob );
@ARGV = map bsd_glob($_), @ARGV;   # expand any wildcards the shell left alone (e.g. on Windows)
while (<>) {
    s/psa0*(?=\d)/เพลงสดุดี/g;
    print;
}
perl -i.bak script.pl *
I used File::Glob's bsd_glob because the built-in glob splits its argument on whitespace, so it won't handle filenames containing spaces "correctly". They are actually the same function underneath; it just behaves differently depending on how it's called.
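To see the whitespace difference for yourself, here is a small sketch. It assumes a scratch directory created with File::Temp and a made-up file name, my file.txt; both calls are given the same pattern.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

# Work in a throwaway directory holding one file whose name contains a space.
my $orig_dir = getcwd();
my $dir      = tempdir(CLEANUP => 1);
chdir $dir or die "Can't chdir to $dir: $!";
open(my $fh, '>', 'my file.txt') or die "Can't create 'my file.txt': $!";
close $fh;

my @core = glob('my file*.txt');      # built-in glob: splits into 'my' and 'file*.txt'
my @bsd  = bsd_glob('my file*.txt');  # bsd_glob: one pattern, space included

chdir $orig_dir or die "Can't chdir back to $orig_dir: $!";
print "glob:     @core\n";
print "bsd_glob: @bsd\n";
```

Only bsd_glob finds my file.txt; the built-in glob looks for a file named my and for names matching file*.txt instead.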
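By the way, using \1 in the replacement expression (i.e. outside a regular expression) makes no sense. \1 is regex syntax that means "match what the first capture captured". Perl accepts it in the replacement only as a sed-style shorthand, and under use warnings it complains "\1 better written as $1". So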
s/psa0*(\d+)/เพลงสดุดี\1/g;
should be
s/psa0*(\d+)/เพลงสดุดี$1/g;
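The following is a faster alternative. The lookahead (?=\d) checks for a digit without consuming it, so the digits never need to be captured and copied back: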
s/psa0*(?=\d)/เพลงสดุดี/g;
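With 66 pairs, one alternative to writing 66 separate s/// lines is to keep the pairs in a hash and build a single regex from its keys. This is a different technique from the answer above, sketched under assumptions: the %book hash is hypothetical and shows only three of the 66 entries, and it uses the lookahead form, so no space is inserted before the number.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical table: abbreviation => Thai book name (only three of the 66 shown).
my %book = (
    gen => 'ปฐมกาล',
    exo => 'อพยพ',
    psa => 'เพลงสดุดี',
);

# One alternation built from the keys, longest first so that a key which is a
# prefix of another key cannot win; every abbreviation is handled in one pass.
my $alternation = join '|', map { quotemeta($_) } sort { length($b) <=> length($a) } keys %book;
my $re = qr/($alternation)0*(?=\d)/;

sub translate {
    my ($line) = @_;
    $line =~ s/$re/$book{$1}/g;   # swap the abbreviation (and leading zeros) for the Thai name
    return $line;
}
```

Inside the while (<>) loop above, $_ = translate($_); would then stand in for the individual substitutions.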

See opendir/readdir/closedir for functions that can iterate through all the filenames in a directory (much like you would use open/readline/close to iterate through all the lines in a file).
Also see the glob function, which returns a list of filenames that match some pattern.
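As a sketch of the opendir/readdir/closedir pattern (plain_files is a made-up helper name, not from any library), the function below collects the regular files in a directory. readdir returns bare names rather than paths, so each one has to be joined back onto the directory before testing it with -f.

```perl
use strict;
use warnings;

# Return the names of the regular files in $dir, sorted; . and .., subdirectories
# and other non-plain entries are skipped.
sub plain_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Can't open directory $dir: $!";
    my @names = grep { -f "$dir/$_" } readdir $dh;   # readdir returns bare names, not paths
    closedir $dh;
    return sort @names;
}
```

Each returned name could then be opened as "$dir/$name" and run through the substitutions.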

In case someone finds it useful in the future, this is what I actually did:
use warnings;
use strict;
use utf8;
my @files = glob("*.html");
foreach my $file (@files) {
    # Read the original as UTF-8 and write the result next to it as "$file-".
    open my $in,  '<:encoding(UTF-8)', $file     or die "Can't read $file: $!";
    open my $out, '>:encoding(UTF-8)', "$file-" or die "Can't write $file-: $!";
    while (<$in>) {
        s/gen0*(\d+)/ปฐมกาล $1/g;
        s/exo0*(\d+)/อพยพ $1/g;
        s/lev0*(\d+)/เลวีนิติ $1/g;
        s/num0*(\d+)/กันดารวิถี $1/g;
        # ...etc...
        print {$out} $_;
    }
    close $in;
    close $out or die "Can't close $file-: $!";
}

Related

How do I open a file in Perl using a search path (e.g. $PATH)?

I have a situation in which I am reading filenames from a file in Perl. These filenames never have a directory associated with them, only a file name (e.g. "foo.bar"). I need to search the equivalent of gmake's VPATH (or a shell's PATH) for that file.
I figure I can split the PATH at the colons, concatenate each segment to the file, and see if it exists. Is there an easier way to do this, though?
What's so difficult about that?
#! /usr/bin/perl
use warnings;
use strict;
use List::Util qw{ first };
sub find_in_path {
    my $file = shift;
    return first { -f } map "$_/$file", split /:/, $ENV{PATH};
}
print find_in_path('grep'), "\n";
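If portability matters, a variant of the same idea can let File::Spec pick the PATH separator instead of hard-coding ':'. This is a sketch: the optional directory-list argument is a hypothetical extension so that a VPATH-style list can be searched the same way.

```perl
use strict;
use warnings;
use File::Spec;
use List::Util qw(first);

# File::Spec->path splits $ENV{PATH} with the platform's own separator
# (':' on Unix, ';' on Windows); catfile joins a directory and a file name.
sub find_in_path {
    my ($file, @dirs) = @_;
    @dirs = File::Spec->path unless @dirs;   # default: search $ENV{PATH}
    return first { -f } map { File::Spec->catfile($_, $file) } @dirs;
}
```

For example, find_in_path('foo.bar', @vpath_dirs) searches a list of directories you have built yourself, and returns undef when the file is not found anywhere.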

Merge multiple HTML Files

I am trying to merge multiple HTML files in a directory/subdirectory into a single HTML file within the same directories. I went through some websites and tried the code below:
#!/usr/bin/perl -w
use strict;
use File::Slurp;
my $basedir = 'c:/test';
opendir(DIR, $basedir) or die $!;
my @files = readdir(DIR); # name arrays plural, hashes singular
closedir DIR;
my $outfilename = 'final.htm';
my $outfilesrc = undef;
foreach (sort @files){
    $outfilesrc.= File::Slurp::slurp("$basedir/$_");
}
open(OUT, "> $basedir/$outfilename") or die ("Can't open for writing: $basedir/$outfilename : $!");
print OUT $outfilesrc;
close OUT;
exit;
But I get the following error and cannot merge the files:
read_file 'c:/test.' - sysopen: Permission denied at mergehtml.pl line 15
Can anyone help me? Is there any way to merge HTML files into a single file in Perl?
Your error most likely comes from trying to open the "current directory" entry, c:\test\., for reading. This comes from using readdir to list the files: readdir returns every entry in the directory, including . and .., not just the regular files.
If all you want to do is concatenate the files, it's rather simple on Linux: cat test/* > final.htm. Unfortunately, on Windows it's a bit trickier:
perl -pe"BEGIN { @ARGV = map glob, @ARGV }" "C:/test/*" > final.htm
Explanation:
We use the -p option to read and print the contents of the files named as arguments. In this case the argument is a glob, and the Windows command shell does not expand globs automagically, so we have to ask perl to do it with the built-in glob function. We do this in a BEGIN block so that it runs once, before the implicit read loop that -p adds. The "rest of the code" is in this case just (basically) a while (<>) { print } block that reads and prints the contents of the files. At the end of the line we redirect all the output to the file final.htm.
Why use glob over readdir? Well, for one thing, readdir includes the entries . (current dir) and .. (parent dir), which will mess up your code, as I mentioned at the top, so you would need to filter them out. glob does this for you: * does not match names that start with a dot, so . and .. never show up (subdirectories still do, which is one more reason to use a more specific pattern).
If you want the longer version of this script, you can do
use strict;
use warnings;
@ARGV = map glob, @ARGV;
while (<>) {
    print;
}
Note that I suspect you only want HTML files to be merged, so it would be a good idea to change your glob from * to something like
*.htm *.html
Filter out the entries "." and ".." from your @files list.
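For completeness, here is a sketch of the original readdir approach with that filter applied. It uses plain open instead of File::Slurp, and it additionally keeps only .htm/.html regular files and skips the output file itself; those extra checks are assumptions about what you want, and merge_html is an illustrative name.

```perl
use strict;
use warnings;

# Concatenate the .htm/.html regular files in $dir (sorted by name) into $dir/$outfile.
sub merge_html {
    my ($dir, $outfile) = @_;
    opendir(my $dh, $dir) or die "Can't open directory $dir: $!";
    my @names = grep { $_ ne '.' && $_ ne '..' } readdir $dh;   # drop the dot entries
    closedir $dh;

    open(my $out, '>', "$dir/$outfile") or die "Can't write $dir/$outfile: $!";
    for my $name (sort @names) {
        next if $name eq $outfile;           # never merge the output into itself
        next unless $name =~ /\.html?\z/i;   # assumption: only HTML files are wanted
        next unless -f "$dir/$name";         # skip subdirectories and the like
        open(my $in, '<', "$dir/$name") or die "Can't read $dir/$name: $!";
        print {$out} $_ while <$in>;
        close $in;
    }
    close $out or die "Can't close $dir/$outfile: $!";
}
```

Called as merge_html('c:/test', 'final.htm'), it replaces the loop and the File::Slurp calls in the question.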

Is there an issue with opening filenames provided on the command line through $_?

I'm having trouble modifying a script that processes files passed as command-line arguments so that, instead of merely copying those files, it also modifies them. The following Perl script worked just fine for copying files:
use strict;
use warnings;
use File::Copy;
foreach $_ (@ARGV) {
    my $orig = $_;
    (my $copy = $orig) =~ s/\.js$/_extjs4\.js/;
    copy($orig, $copy) or die(qq{failed to copy $orig -> $copy});
}
Now that I have files named "*_extjs4.js", I would like to pass those into a script that similarly takes file names from the command line and further processes the lines within those files. So far I am able to get a file handle successfully, as the following script and its output show:
use strict;
use warnings;
foreach $_ (@ARGV) {
    print "$_\n";
    open(my $fh, "+>", $_) or die $!;
    print $fh;
    #while (my $line = <$fh>) {
    #    print $line;
    #}
    close $fh;
}
Which outputs (in part):
./filetree_extjs4.js
GLOB(0x1a457de8)
./async_submit_extjs4.js
GLOB(0x1a457de8)
What I really want to do, though, rather than printing a representation of the file handle, is to work with the contents of the files themselves. A start would be to print the files' lines, which I've tried to do with the commented-out code above.
But that code has no effect, the files' lines do not get printed. What am I doing wrong? Is there a conflict between the $_ used to process command line arguments, and the one used to process file contents?
It looks like there are a couple of questions here.
What I really want to do though rather than printing a representation of the file handle, is to work with the contents of the files themselves.
print $fh prints GLOB(0x1a457de8) because the scalar $fh holds a filehandle (a glob reference), not the contents of the file itself. To access the contents of the file, use <$fh>. For example:
while (my $line = <$fh>) {
print $line;
}
# or simply print while <$fh>;
will print the contents of the entire file.
This is documented in perldoc perlop:
If what the angle brackets contain is a simple scalar variable (e.g.,
<$foo>), then that variable contains the name of the filehandle to
input from, or its typeglob, or a reference to the same.
But it has already been tried!
I can see that. Try it after changing the open mode to +<.
According to perldoc perlfaq5:
How come when I open a file read-write it wipes it out?
Because you're using something like this, which truncates the file
then gives you read-write access:
open my $fh, '+>', '/path/name'; # WRONG (almost always)
Whoops. You should instead use this, which will fail if the file
doesn't exist:
open my $fh, '+<', '/path/name'; # open for update
Using ">" always clobbers or creates. Using "<" never does either. The
"+" doesn't change this.
It goes without saying that the or die $! after the open is highly recommended.
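The perlfaq5 point is easy to check. This sketch (the helper lines_after_open and the temporary file are illustrative only) opens the same two-line file first with '+<' and then with '+>' and counts how many lines can be read back.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Open $path with $mode, read everything back, and return how many lines there were.
sub lines_after_open {
    my ($mode, $path) = @_;
    open(my $fh, $mode, $path) or die "Can't open $path with '$mode': $!";
    my @lines = <$fh>;
    close $fh;
    return scalar @lines;
}

# A scratch file with two lines in it.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "one\ntwo\n";
close $tmp;

my $with_update  = lines_after_open('+<', $path);   # file left intact
my $with_clobber = lines_after_open('+>', $path);   # file truncated by the open itself
print "+< read $with_update lines, +> read $with_clobber lines\n";
```

It reports 2 lines for '+<' and 0 for '+>', which is exactly why the commented-out loop in the question printed nothing.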
But take a step back.
There is a more Perlish way to back up the original file and subsequently manipulate it. In fact, it is doable via the command line itself (!) using the -i flag:
$ perl -p -i._extjs4 -e 's/foo/bar/g' *.js
See perldoc perlrun for more details.
I can't fit my needs into the command-line.
If the manipulation is too much for the command-line to handle, the Tie::File module is worth a try.
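A minimal Tie::File sketch, assuming plain newline-terminated text files; replace_in_file is a hypothetical helper name. Tie::File presents the file as an array of lines (without their newlines), and assigning to an element rewrites that line in the file.

```perl
use strict;
use warnings;
use Tie::File;

# Replace every literal occurrence of $from with $to in the file at $path.
sub replace_in_file {
    my ($path, $from, $to) = @_;
    tie my @lines, 'Tie::File', $path or die "Can't tie $path: $!";
    s/\Q$from\E/$to/g for @lines;   # modifying an element rewrites that line of the file
    untie @lines;                   # flush pending writes and release the file
}
```

Unlike the -i approach, no backup copy is made, so it is worth testing on a copy of the files first.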
To read the contents of a filehandle, you have to call readline or read, or place the filehandle in angle brackets <>.
my $line = readline $fh;
my $actually_read = read $fh, $text, $bytes;
my $line = <$fh>; # similar to readline
To print to a filehandle other than the currently selected one (normally STDOUT), you have to give it as the first argument to print, followed by what you want to print, with no comma between them.
print $fh 'something';
To prevent someone from accidentally adding a comma, I prefer to put the filehandle in a block.
print {$fh} 'something';
You could also select your new handle.
{
my $oldfh = select $fh;
print 'something';
select $oldfh; # reset it back to the previous handle
}
Also, your mode argument to open causes it to clobber the contents of the file, at which point there is nothing left to read.
Try this instead:
open my $fh, '+<', $_ or die;
I'd like to add something to Zaid's excellent suggestion of using a one-liner.
When you are new to Perl and trying some tricky regexes, it can be nice to put them in a source file, as the command line may get rather crowded. For example:
The file:
#!/usr/bin/perl
use warnings;
use strict;
s/complicated/regex/g;
While tweaking the regex, use the source file like so:
perl -p script.pl input.js
perl -p script.pl input.js > testfile
perl -p script.pl input.js | less
Note that you don't use the -i flag here while testing. These commands will not change the input files, only print the changes to stdout.
When you're ready to execute the (permanent!) changes, just add the in-place edit -i flag, and if you wish (recommended), supply an extension for backups, e.g. ".bak".
perl -pi.bak script.pl *.js

Open file for reading and writing (not appending) in perl

Is there any way, with the standard Perl libraries, to open a file and edit it without having to close it and then open it again? All I know how to do is either read a file into a string, close the file, and then overwrite it with a new one, or read a file and then append to the end of it.
The following currently works, but I have to open and close the file twice instead of once:
#!/usr/bin/perl
use warnings; use strict;
use utf8; binmode(STDIN, ":utf8"); binmode(STDOUT, ":utf8");
use IO::File; use Cwd; my $owd = getcwd()."/"; # OriginalWorkingDirectory
use Text::Tabs qw(expand unexpand);
$Text::Tabs::tabstop = 4; #sets the number of spaces in a tab
opendir (DIR, $owd) || die "$!";
my @files = grep {/(.*)\.(c|cpp|h|java)/} readdir DIR;
foreach my $x (@files){
    my $str;
    my $fh = new IO::File("+<".$owd.$x);
    if (defined $fh){
        while (<$fh>){ $str .= $_; }
        $str =~ s/( |\t)+\n/\n/mgos;#removes trailing spaces or tabs
        $str = expand($str);#convert tabs to spaces
        $str =~ s/\/\/(.*?)\n/\/\*$1\*\/\n/mgos;#make all comments multi-line.
        #print $fh $str;#this just appends to the file
        close $fh;
    }
    $fh = new IO::File(" >".$owd.$x);
    if (defined $fh){
        print $fh $str; #this just appends to the file
        undef $str; undef $fh; # automatically closes the file
    }
}
You already opened the file for reading and writing by opening it with the mode +<; you're just not doing anything useful with it. If you want to replace the contents of the file instead of writing at the current position (the end of the file), you should seek back to the beginning, write what you need to, and then truncate to make sure that nothing is left over if you made the file shorter.
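A sketch of that seek-and-truncate approach with a single '+<' open; rewrite_in_place and its code-reference argument are illustrative, not part of any library.

```perl
use strict;
use warnings;

# Read the whole file through one '+<' handle, transform it, seek back to the start,
# write the new text, then truncate whatever is left over from the old contents.
sub rewrite_in_place {
    my ($path, $transform) = @_;
    open(my $fh, '+<', $path) or die "Can't open $path: $!";
    my $text = do { local $/; <$fh> };   # slurp the current contents
    $text = $transform->($text);
    seek($fh, 0, 0)          or die "Can't seek in $path: $!";
    print {$fh} $text        or die "Can't write $path: $!";
    truncate($fh, tell($fh)) or die "Can't truncate $path: $!";
    close($fh)               or die "Can't close $path: $!";
}
```

For example, rewrite_in_place($path, sub { uc $_[0] }) upper-cases a file with one open and one close.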
But since what you're trying to do is in-place filtering of a file, might I suggest that you use Perl's in-place editing feature instead of doing all of the work yourself?
#!perl
use strict;
use warnings;
use Text::Tabs qw(expand unexpand);
$Text::Tabs::tabstop = 4;
my @files = glob("*.c *.h *.cpp *.java");
{
    local $^I = "";        # Enable in-place editing.
    local @ARGV = @files;  # Set files to operate on.
    while (<>) {
        s/( |\t)+$//g;         # Remove trailing tabs and spaces
        $_ = expand($_);       # Expand tabs
        s{//(.*)$}{/*$1*/}g;   # Turn //comments into /*comments*/
        print;
    }
}
And that's all the code you need -- perl handles the rest. Setting the $^I variable is equivalent to using the -i commandline flag. I made several changes to your code along the way -- use utf8 does nothing for a program with no literal UTF-8 in the source, binmodeing stdin and stdout does nothing for a program that never uses stdin or stdout, saving the CWD does nothing for a program that never chdirs. There was no reason to read each file in all at once so I changed it to linewise, and made the regexes less awkward (and incidentally, the /o regex modifier is good for almost precisely nothing these days, except adding hard-to-find bugs to your code).

File is adding one extra space in each line

I am trying to add all the lines of a file to an array using push, and then write the array to another file.
But in the output I see an extra space at the beginning of each line.
What is the issue? Has anyone faced this before?
open FILE, "a.txt";
while (<FILE>)
{
    my $temp = $_;
    push @array, $temp;
}
close(FILE);
open FILE2, "b.txt";
print FILE2 "@array";
close FILE2;
When you quote an array variable like this, "@array", it gets interpolated with spaces between the elements. That's where the spaces in your output come from, so don't quote it if you don't need or want that sort of interpolation.
Now let's rewrite your program to modern Perl.
use strict;
use warnings FATAL => 'all';
use autodie qw(:all);
my @array;
{
    open my $in, '<', 'a.txt';
    @array = <$in>;
}
{
    open my $out, '>', 'b.txt';
    print {$out} @array;
}
You put quotes around "@array". That makes it a string interpolation, which for arrays is equivalent to join($", @array). The default value of $" is (guess what?) a space.
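A small sketch of the difference, using made-up lines in place of the contents of a.txt:

```perl
use strict;
use warnings;

my @lines = ("a\n", "b\n", "c\n");        # as if read from a.txt

my $interpolated = "@lines";              # elements joined with $" (a space by default)
my $joined       = join($", @lines);      # the same string, spelled out
my $printed      = join('', @lines);      # what print FILE2 @lines writes when $, and $\ are unset

print "interpolated: [$interpolated]\n";
print "printed:      [$printed]\n";
```

Every line after the first in the interpolated string starts with a space, because the separator lands right after each newline.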
Try
print FILE2 @array;
open usually takes another argument that specifies whether the file is opened for input or for output (or for both or for some other special case). You have omitted this argument, and so by default FILE2 is an input filehandle.
You wanted to say
open FILE2, '>', "b.txt"
If you put the line
use warnings;
at the beginning of every Perl script, the interpreter will catch many issues like this for you.