How can I do bulk search and replace with Perl? - perl

I have the following script that takes in an input file, output file and
replaces the string in the input file with some other string and writes out
the output file.
I want to change the script to traverse through a directory of files
i.e. instead of prompting for input and output files, the script should take
as argument a directory path such as C:\temp\allFilesTobeReplaced\ and
search for a string x and replace it with y for all files under that
directory path and write out the same files.
How do I do this?
Thanks.
$file=$ARGV[0];
open(INFO,$file);
#lines=<INFO>;
print #lines;
open(INFO,">c:/filelist.txt");
foreach $file (#lines){
#print "$file\n";
print INFO "$file";
}
#print "Input file name: ";
#chomp($infilename = <STDIN>);
if ($ARGV[0]){
$file= $ARGV[0]
}
print "Output file name: ";
chomp($outfilename = <STDIN>);
print "Search string: ";
chomp($search = <STDIN>);
print "Replacement string: ";
chomp($replace = <STDIN>);
open(INFO,$file);
#lines=<INFO>;
open(OUT,">$outfilename") || die "cannot create $outfilename: $!";
foreach $file (#lines){
# read a line from file IN into $_
s/$search/$replace/g; # change the lines
print OUT $_; # print that line to file OUT
}
close(IN);
close(OUT);

The use of the perl single liner
perl -pi -e 's/original string/new string/' filename
can be combined with File::Find, to give the following single script (this is a template I use for many such operations).
use File::Find;
# search for files down a directory hierarchy ('.' taken for this example)
find(\&wanted, ".");
sub wanted
{
if (-f $_)
{
# for the files we are interested in call edit_file().
edit_file($_);
}
}
sub edit_file
{
my ($filename) = #_;
# you can re-create the one-liner above by localizing #ARGV as the list of
# files the <> will process, and localizing $^I as the name of the backup file.
local (#ARGV) = ($filename);
local($^I) = '.bak';
while (<>)
{
s/original string/new string/g;
}
continue
{
print;
}
}

You can do this with the -i param:
Just process all the files as normal, but include -i.bak:
#!/usr/bin/perl -i.bak
while ( <> ) {
s/before/after/;
print;
}
This should process each file, and rename the original to original.bak And of course you can do it as a one-liner as mentioned by #Jamie Cook

Try this
#!/usr/bin/perl -w
#files = <*>;
foreach $file (#files) {
print $file . '\n';
}
Take also a look to glob in Perl:
http://perldoc.perl.org/File/Glob.html
http://www.lyingonthecovers.net/?p=312

I know you can use a simple Perl one-liner from the command line, where filename can be a single filename or a list of filenames. You could probably combine this with bgy's answer to get the desired effect:
perl -pi -e 's/original string/new string/' filename
And I know it's trite but this sounds a lot like sed, if you can use gnu tools:
for i in `find ./allFilesTobeReplaced`; do sed -i s/original string/new string/g $i; done

perl -pi -e 's#OLD#NEW#g' filename.
You can replace filename with the pattern that suits your file list.

Related

Perl wildcards in the file paths

I am working on my project where the GNU Makefile should automatically test my Perl program with different input files. I have this code which reads only one file from inputs directory, searches stop words and outputs out.txt file with frequency table of non-stop words.
#!/usr/bin/perl
use strict;
use warnings;
use Lingua::StopWords qw(getStopWords);
my %found;
my $src = '/programu-testavimas/1-dk/trunk/tests/inputs/test.txt';
my $des = '/programu-testavimas/1-dk/trunk/tests/outputs/out.txt';
open(SRC,'<',$src) or die $!;
open(DES,'>',$des) or die $!;
my $stopwords = getStopWords('en');
while( my $line = <SRC>){
++$found{$_} for grep { !$stopwords->{$_} }
split /\s+/, lc $line;
}
print DES $_, "\t\t", $found{$_}, $/ for sort keys %found;
close(SRC);
close(DES);
My goal is to test many files with separate case.sh scripts where the input files should be different in each case, this is one of the case:
#!/bin/sh
perl /programu-testavimas/1-dk/trunk/scripts/test.pl /programu-testavimas/1-dk/trunk/tests/inputs/test.txt > /home/aleksandra/programų-testavimas/1-dk/trunk/tests/outputs/out.txt
Then, my Makefile at once should test program with different inputs in each case. So, right now I'm struggling with my Perl code where my input file is only one individual and I need to make it read different files in inputs directory. How can I change path correctly that bash scripts could have each case with individual input file?
EDIT: I tried this with glob function but it outputs empty file
open(DES,'>',$des) or die $!;
my $stopwords = getStopWords('en');
for my $file ( glob $src ) {
open(SRC,'<',$file) or die "$! opening $file";
while( my $line = <SRC>){
++$found{$_} for grep { !$stopwords->{$_} }
split /\s+/, lc $line;
}
print DES $_, "\t\t", $found{$_}, $/ for sort keys %found;
close(SRC);
}
close(DES);
Correct me if I'm wrong, but to me it sounds like you have different shell scripts, each calling your perl script with a different input, and redirecting your perl's script output to a new file.
You don't need to glob anything in your perl script. It already has all the information it needs: which file to read. Your shell script/Makefile is handling the rest.
So given the shell script
#!/bin/sh
perl /path/to/test.pl /path/to/input.txt > /path/to/output.txt
Then in your perl script, simply read from the file provided via the first positional parameter:
#!/usr/bin/perl
use strict;
use warnings;
use Lingua::StopWords qw(getStopWords);
my %found;
my $stopwords = getStopWords('en');
while(my $line = <>) {
++$found{$_} for grep { !$stopwords->{$_} }
split /\s+/, lc $line;
}
print $_, "\t\t", $found{$_}, $/ for sort keys %found;
while(<>) will read from STDIN or from ARGV.
Your shell script could then call your perl script with different inputs and define outputs:
#!/bin/sh
for input in /path/to/*.txt; do
perl /path/to/test.pl "$input" > "$input.out"
done

Perl to find the extension of file

I have a program that takes directory name as input from user and searches all files inside the directory and prints the contents of file. Is there any way so that I can read the extension of file and read the contents of file that are of specified extension? For example, it should read contents of file that is in ".txt" format.
My code is
use strict;
use warnings;
use File::Basename;
#usr/bin/perl
print "enter a directory name\n";
my $dir = <>;
print "you have entered $dir \n";
chomp($dir);
opendir DIR, $dir or die "cannot open directory $!";
while ( my $file = readdir(DIR) ) {
next if ( $file =~ m/^\./ );
my $filepath = "${dir}${file}";
print "$filepath\n";
print " $file \n";
open( my $fh, '<', $filepath ) or die "unable to open the $file $!";
while ( my $row = <$fh> ) {
chomp $row;
print "$row\n";
}
}
To get just the ".txt" files, you can use a file test operator (-f : regular file) and a regex.
my #files = grep { -f && /\.txt$/ } readdir $dir;
Otherwise, you can look for just text files, using perl's -T (ascii-text file test operator)
my #files = grep { -T } readdir $dir;
Otherwise you can try even this:
my #files = grep {-f} glob("$dir/*.txt");
You're pretty close here. You have a main loop that looks like this:
while ( my $file = readdir(DIR) ) {
next if $file =~ /^\./; # skip hidden files
# do stuff
}
See where you're skipping loop iterations if the filename starts with a dot. That's an excellent place to put any other skip requirements that you have - like skipping files that don't end with '.txt'.
while ( my $file = readdir(DIR) ) {
next if $file =~ /^\./; # skip hidden files
next unless $file =~ /\.txt$/i; # skip non-text files
# do stuff
}
In the same way as your original test checked for the start of the string (^) followed by a literal dot (\.), we're now searching for a dot (\.) followed by txt and the end of the string ($). Note that I've also added the /i option to the match operator to make the match case-insensitive - so that we match ".TXT" as well as ".txt".
It's worth noting that the extension of a file is a terrible way to work out what the file contains.
Try this. Below code gives what you expect.
use warnings;
use strict;
print "Enter the directory name: ";
chomp(my $dir=<>);
print "Enter the file extension type: "; #only type the file format. like txt rtf
chomp(my $ext=<>);
opendir('dir',"$dir");
my #files = grep{m/.$ext/g} readdir('dir');
foreach my $ech(#files){
open('file',"$dir/$ech");
print <file>;
}
I'm store the all file from the particular directory to store the one array and i get the particular file format by using the grep command. Then open the files into the foreach condition

Find file which content not match a string pattern in Perl

I'm writing a code to find the file which not contain a string pattern. Provided I have a list of files, I have to look into the content of each file, I would like to get the file name if the string pattern "clean" not appear inside the file. Pls help.
Here is the scenario:
I have a list of files, inside each file is having numerous of lines. If the file is clean, it will have the "clean" wording. But if the file is dirty, the "clean" wording not exist and there will be no clear indication to tell the file is dirty. So as long as inside each file, if the "clean" wording is not detect, I'll category it as dirty file and I would like to trace the file name
You can use a simple one-liner:
perl -0777 -nlwE 'say $ARGV if !/clean/i' *.txt
Slurping the file with -0777, making the regex check against the entire file. If the match is not found, we print the file name.
For perl versions lower than 5.10 that do not support -E you can substitute -E with -e and say $ARGV with print "$ARGV".
perl -0777 -nlwe 'print "$ARGV\n" if !/clean/i' *.txt
If you need to generate the list within Perl, the File::Finder module will make life easy.
Untested, but should work:
use File::Finder;
my #wanted = File::Finder # finds all ..
->type( 'f' ) # .. files ..
->name( '*.txt' ) # .. ending in .txt ..
->in( '.' ) # .. in current dir ..
->not # .. that do not ..
->contains( qr/clean/ ); # .. contain "clean"
print $_, "\n" for #wanted;
Neat stuff!
EDIT:
Now that I have a clearer picture of the problem, I don't think any module is necessary here:
use strict;
use warnings;
my #files = glob '*.txt'; # Dirty & clean laundry
my #dirty;
foreach my $file ( #files ) { # For each file ...
local $/ = undef; # Slurps the file in
open my $fh, $file or die $!;
unless ( <$fh> =~ /clean/ ) { # if the file isn't clean ..
push #dirty, $file; # .. it's dirty
}
close $fh;
}
print $_, "\n" for #dirty; # Dirty laundry list
Once you get the mechanics, this can be simplified a la grep, etc.
One way like this:
ls *.txt | grep -v "$(grep -l clean *.txt)"
#!/usr/bin/perl
use strict;
use warnings;
open(FILE,"<file_list_file>");
while(<FILE>)
{
my $flag=0;
my $filename=$_;
open(TMPFILE,"$_");
while(<TMPFILE>)
{
$flag=1 if(/<your_string>/);
}
close(TMPFILE);
if(!$flag)
{
print $filename;
}
}
close(FILE);

How to change all occurrences of a char in all files in a directory (and subdirectories ) using Perl

** I have a follow-up question that is marked with '**' **
I was asked to write Perl code that replaces every { with {function(<counter>) and in every replacement the counter should get larger by 1. e.g. first replacement of { will be {function(0) ,
second replacement of { will be {function(1) etc.
It suppose to do the replacement in every *.c and *.h file in a folder including subfolders.
I wrote this code :
#!/usr/bin/perl
use Tie::File;
use File::Find;
$counter = 0;
$flag = 1;
#directories_to_search = 'd:\testing perl';
#newString = '{ function('.$counter.')';
$refChar = "{";
finddepth(\&fileMode, #directories_to_search);
sub fileMode
{
my #files = <*[ch]>; # get all files ending in .c or .h
foreach $file (#files) # go through all the .c and .h flies in the directory
{
if (-f $file) # check if it is a file or dir
{
my #lines;
# copy each line from the text file to the string #lines and add a function call after every '{' '
tie #lines, 'Tie::File', $file or die "Can't read file: $!\n";
foreach ( #lines )
{
if (s/{/#newString/g)
{
$counter++;
#newString = '{function('.$counter.')';
}
untie #lines; # free #lines
}
}
}
}
The code searches the directory d:\testing Perl and does the replacement but instead of getting
{function(<number>) I get {function(number1) function(number3) function(number5) function(number7) for instance for the first replacement I get
{function(0) function(2) function(4) function(6) and I wanted to get {function(0)
I really don't know what is wrong with my code.
An awk solution or any other Perl solution will also be great!
* I have a follow-up question.
now I want my perl program to do the same substitution in all the files except the lines when there is a '{'
and a '}' in the same line. so i modified the code this way.
#!/usr/bin/perl
use strict;
use warnings;
use Tie::File;
use File::Find;
my $dir = "C:/test dir";
# fill up our argument list with file names:
find(sub { if (-f && /\.[hc]$/) { push #ARGV, $File::Find::name } }, $dir);
$^I = ".bak"; # supply backup string to enable in-place edit
my $counter = 0;
# now process our files
#foreach $filename (#ARGV)
while (<>)
{
my #lines;
# copy each line from the text file to the string #lines and add a function call after every '{' '
tie #lines, 'Tie::File', $ARGV or die "Can't read file: $!\n";
#$_='{function(' . $counter++ . ')';
foreach (#lines)
{
if (!( index (#lines,'}')!= -1 )) # if there is a '}' in the same line don't add the macro
{
s/{/'{function(' . $counter++ . ')'/ge;
print;
}
}
untie #lines; # free #lines
}
what I was trying to do is to go through all the files in #ARGV that i found in my dir and subdirs and for each *.c or *.h file I want to go line by line and check if this line contains '{'. if it does the program won't check if there is a '{' and won't make the substitution, if it doesn't the program will substitute '{' with '{function();'
unfortunately this code does not work.
I'm ashamed to say that I'm trying to make it work all day and still no go.
I would really appreciate some help.
Thank You!!
This is a simple matter of combining a finding method with an in-place edit. You could use Tie::File, but it is really the same end result. Also, needless to say, you should keep backups of your original files, always, when doing edits like these because changes are irreversible.
So, if you do not need recursion, your task is dead simple in Unix/Linux style:
perl -pi -we 's/{/"{ function(" . $i++ . ")"/ge' *.h *.c
Of course, since you seem to be using Windows, the cmd shell won't glob our arguments, so we need to do that manually. And we need to change the quotes around. And also, we need to supply a backup argument for the -i (in-place edit) switch.
perl -pi.bak -we "BEGIN { #ARGV = map glob, #ARGV }; s/{/'{ function(' . $i++ . ')'/ge" *.h *.c
This is almost getting long enough to make a script of.
If you do need recursion, you would use File::Find. Note that this code is pretty much identical in functionality as the one above.
use strict;
use warnings;
use File::Find;
my $dir = "d:/testing perl"; # use forward slashes in paths
# fill up our argument list with file names:
find(sub { if (-f && /\.[hc]$/) { push #ARGV, $File::Find::name } }, $dir);
$^I = ".bak"; # supply backup string to enable in-place edit
my $counter = 0;
# now process our files
while (<>) {
s/{/'{ function(' . $counter++ . ')'/ge;
print;
}
Don't be lulled into a false sense of security by the backup option: If you run this script twice in a row, those backups will be overwritten, so keep that in mind.
$ perl -pi -e 's| (?<={) | q#function(# . ($i++) . q#)# |gex' *.c *.h
It can be done in a single line as below:
perl -pi -e 's/({)/"{function(".++$a.")"/ge;' your_file
I have just taken an example input file and tested too.
> cat temp
line-1 { { { {
line-2 { { {
line-3 { {
line-4 {
Now the execution:
> perl -pi -e 's/({)/"{function(".++$a.")"/ge;' temp
> cat temp
line-1 {function(1) {function(2) {function(3) {function(4)
line-2 {function(5) {function(6) {function(7)
line-3 {function(8) {function(9)
line-4 {function(10)
Using awk '/{/{gsub(/{/,"{function("i++")");print;next}{print}' and your code as input:
$ awk '/{/{gsub(/{/,"{function("i++")");print;next}{print}' file
sub fileMode
{function(0)
my #files = <*[ch]>; # get all files ending in .c or .h
foreach $file (#files) # go through all the .c and .h flies in the directory
{function(1)
if (-f $file) # check if it is a file or dir
{function(2)
my #lines;
# copy each line from the text file to the string #lines and add a function call after every '{function(3)' '
tie #lines, 'Tie::File', $file or die "Can't read file: $!\n";
foreach ( #lines )
{function(4)
if (s/{function(5)/#newString/g)
{function(6)
$counter++;
#newString = '{function(7)function('.$counter.')';
}
untie #lines; # free #lines
}
}
}
}
Note: The function number won't be incremented for inline nested {.
$ echo -e '{ { \n{\n-\n{' | awk '/{/{gsub(/{/,"{function("i++")");print;next}1'
{function(0) {function(0)
{function(1)
-
{function(2)
Explanation:
/{/ # For any lines that contain {
gsub( /{/ , "{function("i++")" ) # replace { with function(i++)
print;next # print the line where the replacement happened and skip to the next
print # print all the lines

Why does this program fail to copy files?

this morning, my friend and I discussed and wrote the below code. The idea behind this Perl script is to create the directory structure and copy the files to the corresponding directory.
#!/usr/bin/perl
use File::Path;
use File::Copy;
use Path::Class;
use File::Basename qw/dirname/;
my $src = "/Vijay/new.txt";
unless (open(MYFILE, "file1")) {
die ("cannot open input file file1\n");
}
$line = <MYFILE>;
while ($line ne "") {
print ($line);
mkdir_and_copy($src,$line);
$line = <MYFILE>;
}
sub mkdir_and_copy {
my ($from, $to) = #_;
my($directory, $filename) = $to =~ m/(.*\/)(.*)$/;
print("creating dir $directory");
system "mkdir -p $directory";
print("copying file $from to $to");
system "cp -f $from $to";
return;
}
The above piece of code creates the directory structure, but fails to copy the files to the corresponding directory. Could you please let us know, where exactly we are wrong?
Contents of file1:
test/test1/test2/test.txt
Contents of new.txt:
Shell/Test/test1/test1.txt
Shell/Test/test2/test2.txt
Shell/Test/test3/test3.txt
Output:
> ./mypgm.pl
test/test1/test2/test.txt
creating dir test/test1/test2/copying file /Vijay/new.txt to test/test1/test2/test.txt
cp: cannot access /Vijay/new.txt: No such file or directory
>
The directory Vijay has the file new.txt with the above mentioned content.
Thanks in advance,
Vijay
Hello everyone,
I just modified my code. Please refer the below section of code.
#!/usr/bin/perl
use File::Path;
use File::Copy;
use File::Basename qw/dirname/;
my $src = "./Vijay/new.txt";
unless (open(MYFILE, "file1"))
{
die ("cannot open input file file1\n");
}
$line = ;
while ($line ne "")
{
print ($line); print("\n");
mkdir_and_copy($src,$line);
$line = ""; }
sub mkdir_and_copy
{
my ($from, $to) = #_;
my($directory, $filename) = $to =~ m/(.\/)(.)$/;
$temp = $directory.$filename;
print("Creating dirrectory $directory \n");
if(! -d $directory)
{
mkpath($directory) #or die "Failed to create path";
}
printf("From: $from \n");
printf("To: $temp \n");
copy($from,$temp) or die "Failed to Copy";
return;
}
Now, it creates the exact directory structure and copies the file to the corresponding directory. Could you please tell me that, whether the above code is a proper one?
Your goal is not clear to me, but perhaps this will help you solve the problem:
# Perl scripts should always include this.
# Your original script was generating some useful warnings.
use strict;
use warnings;
my $src = "/Vijay/new.txt";
my $f1 = 'file1';
# This is the recommended way to open a file --
# that is, using a lexical file handle.
open(my $file_handle, '<', $f1) or die "open() failed : $f1 : $!";
# This is the typical way of iterating over the lines in a file.
while (my $line = <$file_handle>){
# You probably want to remove the newline
# before passing the line to mkdir_and_copy()
chomp $line;
mkdir_and_copy($src, $line);
}
sub mkdir_and_copy {
my ($from, $to) = #_;
my ($directory, $filename) = $to =~ m/(.*\/)(.*)$/;
# When writing a script that makes system() calls,
# start by simply printing them. After everything
# looks good, convert the print commands to system() calls.
print "system(): mkdir -p $directory", "\n";
print "system(): cp -f $from $to", "\n";
# The return is not needed.
}
When I run the script with the inputs you provided, here's the output:
system(): mkdir -p test/test1/test2/
system(): cp -f /Vijay/new.txt test/test1/test2/test.txt
This can't be your intent. In particular, why are you iterating over file1 when it contains only one line? Perhaps you meant to iterate over new.txt?
The first thing to do if something "does't work" is to catch errors and to look at them. Then to investigate content of variables. In your case the variable $to just contains the file name, so the script copies it into the current working directory, I'd imagine, not into the newly created directory.
HOWEVER, the methods you're using to get your job done are not exactly the best. It would be better to actually use File::Path and File::Copy, and in particular your way of splitting a path into directory and filename at the first slash is anything but general. This sort of thing should be done in libraries, of which Perl has many.
I'll bet your $line variable still has a newline appended to it. The input returned from the filehandle input operator (<MYFILE>) includes the record separator (usually the newline character(s) for your OS). Try this:
$line = <MYFILE>;
chomp($line);