Find files whose content does not match a string pattern in Perl

I'm writing code to find the files that do not contain a string pattern. Given a list of files, I have to look into the content of each one, and I would like to get the file name if the string pattern "clean" does not appear inside the file. Please help.
Here is the scenario:
I have a list of files, and each file contains numerous lines. If a file is clean, it will contain the word "clean". But if the file is dirty, the word "clean" does not exist, and there is no other clear indication that the file is dirty. So as long as the word "clean" is not detected inside a file, I'll categorize it as a dirty file, and I would like to record its name.

You can use a simple one-liner:
perl -0777 -nlwE 'say $ARGV if !/clean/i' *.txt
Slurping the file with -0777 makes the regex check the entire file at once. If no match is found, we print the file name.
For Perl versions older than 5.10, which do not support -E, you can substitute -e for -E and print "$ARGV\n" for say $ARGV.
perl -0777 -nlwe 'print "$ARGV\n" if !/clean/i' *.txt

If you need to generate the list within Perl, the File::Finder module will make life easy.
Untested, but should work:
use File::Finder;
my @wanted = File::Finder          # finds all ..
    ->type( 'f' )                  # .. files ..
    ->name( '*.txt' )              # .. ending in .txt ..
    ->not                          # .. that do not ..
    ->contains( qr/clean/ )        # .. contain "clean" ..
    ->in( '.' );                   # .. in the current dir
print $_, "\n" for @wanted;
Neat stuff!
EDIT:
Now that I have a clearer picture of the problem, I don't think any module is necessary here:
use strict;
use warnings;
my @files = glob '*.txt';              # Dirty & clean laundry
my @dirty;
foreach my $file ( @files ) {          # For each file ...
    local $/ = undef;                  # Slurp the file in
    open my $fh, '<', $file or die $!;
    unless ( <$fh> =~ /clean/ ) {      # if the file isn't clean ..
        push @dirty, $file;            # .. it's dirty
    }
    close $fh;
}
print $_, "\n" for @dirty;             # Dirty laundry list
Once you get the mechanics, this can be simplified a la grep, etc.
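For instance, here is a minimal sketch of that grep-style simplification, under the same assumptions as above (*.txt files in the current directory):
use strict;
use warnings;

# keep only the files whose slurped contents lack "clean"
my @dirty = grep {
    open my $fh, '<', $_ or die "$_: $!";
    local $/;            # slurp mode
    <$fh> !~ /clean/;
} glob '*.txt';

print "$_\n" for @dirty;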

One way is like this:
ls *.txt | grep -v "$(grep -l clean *.txt)"
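If your grep supports the -L (--files-without-match) option, it can also list the non-matching files directly, without the double grep:
grep -L clean *.txt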

#!/usr/bin/perl
use strict;
use warnings;

open my $list, '<', '<file_list_file>' or die $!;
while (my $filename = <$list>) {
    chomp $filename;
    my $flag = 0;
    open my $fh, '<', $filename or die "$filename: $!";
    while (<$fh>) {
        $flag = 1 if /<your_string>/;
    }
    close $fh;
    print "$filename\n" unless $flag;
}
close $list;


concatenate and print each array element with a suffix

I have some files in a directory like:
A.txt
B.txt
C.txt
I want to print them with a new suffix:
#!/usr/bin/perl
my @dir_list_initial = `ls *.txt | sed 's/.txt//g'`;
my $suffix = ".dat";
for (my $i = 0; $i <= $#dir_list_initial; $i++) {
    print $dir_list_initial[$i] . $suffix;
}
I expect to print
A.dat
B.dat
C.dat
but it prints
.datA
.datB
.datC
You might try
chomp(@dir_list_initial);
and later
print $dir_list_initial[$i] . "$suffix\n";
as every element of the @dir_list_initial array has a newline at the end of the string.
Even better would be to skip the shell altogether and use only Perl:
my @dir_list_initial = map { s|\.txt$||; $_ } glob("*.txt");
You're doing half of your program's work by running ls. Don't shell out when you can use built-in Perl mechanisms to do the same job. The glob can do everything you're doing with ls, and you don't have to depend on the ls command (what if this is Windows?).
Also, always use strict; and use warnings; in your programs; they can save you a ton of grief by picking up common mistakes.
#! /usr/bin/env perl
#
use strict;
use warnings;
use feature qw(say);

for my $file ( glob "*.txt" ) {
    chomp $file;
    $file =~ s/\.txt$/.dat/;
    say $file;
}
Note I'm using s/...old.../...new.../ to substitute one suffix for another. You need to learn regular expressions: your s/.txt// is not correct. The . can stand for any character, and you don't require .txt to be at the end of the string. If a file were called ftxtback.txt, you'd change its name to .datback.txt, which is not what you want.
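A quick demonstration of the difference, using a hypothetical file name:
my $name = "ftxtback.txt";
(my $bad  = $name) =~ s/.txt/.dat/;    # the unescaped . matches "f", giving ".datback.txt"
(my $good = $name) =~ s/\.txt$/.dat/;  # escaped and anchored, giving "ftxtback.dat"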
There's no need to call sed here; Perl can handle that well:
my @dir_list_initial = `ls *.txt`;
chomp @dir_list_initial;
my $suffix = ".dat";
foreach (@dir_list_initial) {
    s/\.txt$/$suffix/;   # this alters @dir_list_initial
    print "$_\n";
}
If you have Perl 5.14 or newer, you can also do this:
use 5.014;
my @dir_list_initial = `ls *.txt`;
chomp @dir_list_initial;
my $suffix = ".dat";
say s/\.txt$/$suffix/r foreach @dir_list_initial;   # this doesn't alter the array
And, as mpapec has already pointed out, it would be better not to involve the shell at all:
say s/\.txt$/.dat/r foreach <*.txt>;

How to change all occurrences of a char in all files in a directory (and subdirectories) using Perl

** I have a follow-up question below, marked with '**'. **
I was asked to write Perl code that replaces every { with {function(<counter>), where the counter grows by 1 with each replacement. E.g. the first replacement of { will be {function(0),
the second replacement of { will be {function(1), etc.
It is supposed to do the replacement in every *.c and *.h file in a folder, including subfolders.
I wrote this code:
#!/usr/bin/perl
use Tie::File;
use File::Find;

$counter = 0;
$flag = 1;
@directories_to_search = 'd:\testing perl';
@newString = '{ function(' . $counter . ')';
$refChar = "{";

finddepth(\&fileMode, @directories_to_search);

sub fileMode
{
    my @files = <*[ch]>;            # get all files ending in .c or .h
    foreach $file (@files)          # go through all the .c and .h files in the directory
    {
        if (-f $file)               # check if it is a file or a dir
        {
            my @lines;
            # copy each line of the text file into @lines and add a function call after every '{'
            tie @lines, 'Tie::File', $file or die "Can't read file: $!\n";
            foreach ( @lines )
            {
                if (s/{/@newString/g)
                {
                    $counter++;
                    @newString = '{function(' . $counter . ')';
                }
                untie @lines;       # free @lines
            }
        }
    }
}
The code searches the directory d:\testing perl and does the replacement, but instead of getting
{function(<number>) I get {function(<number1>) function(<number3>) function(<number5>) function(<number7>); for instance, for the first replacement I get
{function(0) function(2) function(4) function(6) when I wanted to get {function(0).
I really don't know what is wrong with my code.
An awk solution or any other Perl solution would also be great!
** I have a follow-up question.
Now I want my Perl program to do the same substitution in all the files, except on lines where there is both a '{'
and a '}' on the same line, so I modified the code this way:
#!/usr/bin/perl
use strict;
use warnings;
use Tie::File;
use File::Find;

my $dir = "C:/test dir";
# fill up our argument list with file names:
find(sub { if (-f && /\.[hc]$/) { push @ARGV, $File::Find::name } }, $dir);
$^I = ".bak";   # supply backup string to enable in-place edit
my $counter = 0;

# now process our files
#foreach $filename (@ARGV)
while (<>)
{
    my @lines;
    # copy each line of the text file into @lines and add a function call after every '{'
    tie @lines, 'Tie::File', $ARGV or die "Can't read file: $!\n";
    #$_ = '{function(' . $counter++ . ')';
    foreach (@lines)
    {
        if (!( index (@lines, '}') != -1 ))   # if there is a '}' in the same line don't add the macro
        {
            s/{/'{function(' . $counter++ . ')'/ge;
            print;
        }
    }
    untie @lines;   # free @lines
}
What I was trying to do is go through all the files in @ARGV that I found in my dir and subdirs, and for each *.c or *.h file go line by line and check whether the line contains '}'. If it does, the program won't make the substitution; if it doesn't, the program will substitute '{' with '{function(<counter>)'.
Unfortunately this code does not work.
I'm ashamed to say that I've been trying to make it work all day and still no go.
I would really appreciate some help.
Thank you!!
This is a simple matter of combining a finding method with an in-place edit. You could use Tie::File, but the end result is really the same. Also, needless to say, you should always keep backups of your original files when doing edits like these, because the changes are irreversible.
So, if you do not need recursion, your task is dead simple in Unix/Linux style:
perl -pi -we 's/{/"{ function(" . $i++ . ")"/ge' *.h *.c
Of course, since you seem to be using Windows, the cmd shell won't glob our arguments, so we need to do that manually. And we need to change the quotes around. And also, we need to supply a backup argument for the -i (in-place edit) switch.
perl -pi.bak -we "BEGIN { @ARGV = map glob, @ARGV }; s/{/'{ function(' . $i++ . ')'/ge" *.h *.c
This is almost getting long enough to make a script of.
If you do need recursion, you would use File::Find. Note that this code is pretty much identical in functionality to the one above.
use strict;
use warnings;
use File::Find;

my $dir = "d:/testing perl";   # use forward slashes in paths
# fill up our argument list with file names:
find(sub { if (-f && /\.[hc]$/) { push @ARGV, $File::Find::name } }, $dir);
$^I = ".bak";   # supply backup string to enable in-place edit
my $counter = 0;

# now process our files
while (<>) {
    s/{/'{ function(' . $counter++ . ')'/ge;
    print;
}
Don't be lulled into a false sense of security by the backup option: If you run this script twice in a row, those backups will be overwritten, so keep that in mind.
$ perl -pi -e 's| (?<={) | q#function(# . ($i++) . q#)# |gex' *.c *.h
It can be done in a single line as below:
perl -pi -e 's/({)/"{function(".++$a.")"/ge;' your_file
I have just taken an example input file and tested it, too.
> cat temp
line-1 { { { {
line-2 { { {
line-3 { {
line-4 {
Now the execution:
> perl -pi -e 's/({)/"{function(".++$a.")"/ge;' temp
> cat temp
line-1 {function(1) {function(2) {function(3) {function(4)
line-2 {function(5) {function(6) {function(7)
line-3 {function(8) {function(9)
line-4 {function(10)
Using awk '/{/{gsub(/{/,"{function("i++")");print;next}{print}' and your code as input:
$ awk '/{/{gsub(/{/,"{function("i++")");print;next}{print}' file
sub fileMode
{function(0)
    my @files = <*[ch]>;            # get all files ending in .c or .h
    foreach $file (@files)          # go through all the .c and .h files in the directory
    {function(1)
        if (-f $file)               # check if it is a file or a dir
        {function(2)
            my @lines;
            # copy each line of the text file into @lines and add a function call after every '{function(3)'
            tie @lines, 'Tie::File', $file or die "Can't read file: $!\n";
            foreach ( @lines )
            {function(4)
                if (s/{function(5)/@newString/g)
                {function(6)
                    $counter++;
                    @newString = '{function(7)function(' . $counter . ')';
                }
                untie @lines;       # free @lines
            }
        }
    }
}
Note: the counter is not incremented for multiple { on the same line, because the replacement string "{function("i++")" is evaluated once per line, not once per brace.
$ echo -e '{ { \n{\n-\n{' | awk '/{/{gsub(/{/,"{function("i++")");print;next}1'
{function(0) {function(0)
{function(1)
-
{function(2)
Explanation:
/{/                                # for any line that contains {
gsub( /{/ , "{function("i++")" )   # replace each { with {function(i++)
print; next                        # print the line where the replacement happened and skip to the next
print                              # print all other lines

Find a particular string in a file and count how many times it is repeated using Perl

I have a long file, say 10000 lines.
It is the same set of data repeated: ten lines, and the next ten lines will be the same.
I want to find the string "ObjectName" in that file and count how many times it appears.
Can anyone post detailed code? I am new to Perl.
Using Perl (s///g returns the number of substitutions it made, so $x accumulates the total count):
perl -ne '$x+=s/objname//g;END{print $x,"\n";}' file
Update:
Since the OP wants a solution using file handles:
#!/usr/bin/perl
use warnings;
use strict;

open my $fh, '<', 'f.txt' or die 'Cannot open file';
my $x = 0;
while (<$fh>) {
    chomp;
    $x += s/objname//g;
}
close $fh;
print "$x\n";
Here's another option that also addresses your comment about searching in a whole directory:
#!/usr/bin/env perl
use warnings;
use strict;

my $dir   = '.';
my $count = 0;
my $find  = 'ObjectName';

for my $file (<$dir/*.txt>) {
    open my $fh, '<', $file or die $!;
    while (<$fh>) {
        $count += () = /\Q$find\E/g;   # count every match on the line, not just the first
    }
    close $fh;
}
print "$count\n";
The glob denoted by <$dir/*.txt> will non-recursively get the names of all text files in the directory $dir. If you want all files, change it to <$dir/*>. Each file is opened and read, line by line. The regex /\Q$find\E/g matches the contents of $find against each line; assigning the match to an empty list forces list context, so every occurrence on the line is returned and counted, not just the first. The \Q ... \E notation escapes any meta-characters in the string you're looking for, which could otherwise interfere with the matching.
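If you also need to recurse into subdirectories, File::Find can take the place of the glob. A minimal, untested sketch along the same lines (the starting directory and the *.txt filter are assumptions):
use strict;
use warnings;
use File::Find;

my $find  = 'ObjectName';
my $count = 0;

find(sub {
    return unless -f && /\.txt$/;      # only plain *.txt files
    open my $fh, '<', $_ or die "$File::Find::name: $!";
    while (<$fh>) {
        $count += () = /\Q$find\E/g;   # count every occurrence on the line
    }
    close $fh;
}, '.');

print "$count\n";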
Hope this helps!
This could be a one-liner in bash, though note that it counts matching lines, not total occurrences:
grep "ObjectName" <filename> | wc -l

Perl script to read and print lines from multiple txt files?

We have 300+ txt files, which are basically replicates of an email; each txt file has the following format:
To: blabla@hotmail.com
Subject: blabla
From: bla1@hotmail.com
Message: Hello World!
The platform I am to run the script on is Windows, and everything is local (including the Perl instance). The aim is to write a script which crawls through each file (all located within the same directory) and prints out a list of each 'unique' email address in the From field. The concept is very easy.
Can anyone point me in the right direction here? I know how to start off a Perl script, and I am able to read a single file and print all details:
#!/usr/local/bin/perl
open (MYFILE, 'emails/email_id_1.txt');
while (<MYFILE>) {
    chomp;
    print "$_\n";
}
close (MYFILE);
So now I need to be able to read and print line 3 of this file, and perform this activity not just once but for all of the files. I've looked into the File::Find module; could this be of any use?
What platform? If Linux, then it's simple:
foreach $f (@ARGV) {
    # Do stuff
}
and then call it with:
perl mything.pl *.txt
On Windows you'll need to expand the wildcard first, as cmd.exe doesn't expand wildcards (unlike Linux shells):
@ARGV = map glob, @ARGV;
foreach $f (@ARGV) {
    # Do stuff
}
Then extracting the third line is just a simple case of reading each line in and counting until you've reached line 3, so you know when to print the result.
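For example, a minimal sketch of that counting approach, using Perl's $. line-number variable (closing ARGV at end-of-file resets $. for the next file):
@ARGV = map glob, @ARGV;    # expand the wildcard for cmd.exe
while (<>) {
    print if $. == 3;       # line 3 holds the From: field in these emails
    close ARGV if eof;      # reset $. before the next file
}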
The glob() builtin can give you a list of files in a directory:
chdir $dir or die $!;
my @files = glob('*');
You can use Tie::File to access the 3rd line of a file:
use Tie::File;
for (@files) {
    tie my @lines, 'Tie::File', $_ or die $!;
    print $lines[2], "\n";
}
Perl one-liner, Windows version:
perl -wE "@ARGV = glob '*.txt'; while (<>) { say $1 if /^From:\s*(.*)/ }"
It will check all the lines, but only print if it finds a valid From: tag.
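Since the goal is a list of unique addresses, a %seen hash can filter out duplicates. An untested variant of the same one-liner (-w is dropped here because %seen appears only once, which would trigger a "used only once" warning):
perl -E "@ARGV = glob '*.txt'; while (<>) { say $1 if /^From:\s*(.*)/ && !$seen{$1}++ }"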
Are you using a Unix-style shell? You can do this in the shell without even using Perl.
grep "^From:" ./* | sort | uniq -c"
The breakdown is as follows:
grep will grab every line that starts with "From:", and send it to...
sort, which will alpha sort those lines, then...
uniq, which will filter out dupe lines. The "-c" part will count the occurrences.
Your output would look like:
3 From: dave@example.com
5 From: foo@bar.example.com
etc...
Possible issues:
I'm not sure how complex your "From" lines will be, e.g. multiple addresses, different formats, etc.
You could enhance that grep step in a few ways, or replace it with a Perl script that has less-broad functionality than your proposed all-in-one script.
Please comment if anything isn't clear.
Here's my solution (I hope this isn't homework).
It checks all files in the current directory whose names end with ".txt", case-insensitive (e.g., it will find "foo.TXT", which is probably what you want under Windows). It also allows for possible variations in line terminators (at least CR-LF and LF), and searches for the From: prefix case-insensitively, and allows arbitrary whitespace after the :.
#!/usr/bin/perl
use strict;
use warnings;

opendir my $DIR, '.' or die "opendir .: $!\n";
my @files = grep /\.txt$/i, readdir $DIR;
closedir $DIR;
# print "Got ", scalar @files, " files\n";

my %seen = ();
foreach my $file (@files) {
    open my $FILE, '<', $file or die "$file: $!\n";
    while (<$FILE>) {
        if (/^From:\s*(.*?)\r?$/i) {
            $seen{$1} = 1;
        }
    }
    close $FILE;
}
foreach my $addr (sort keys %seen) {
    print "$addr\n";
}

How can I do bulk search and replace with Perl?

I have the following script that takes an input file and an output file, and
replaces a string in the input file with some other string before writing out
the output file.
I want to change the script to traverse a directory of files:
i.e., instead of prompting for input and output files, the script should take
as an argument a directory path such as C:\temp\allFilesTobeReplaced\,
search for a string x, replace it with y in all files under that
directory path, and write out the same files.
How do I do this?
Thanks.
$file = $ARGV[0];
open(INFO, $file);
@lines = <INFO>;
print @lines;
open(INFO, ">c:/filelist.txt");
foreach $file (@lines) {
    #print "$file\n";
    print INFO "$file";
}
#print "Input file name: ";
#chomp($infilename = <STDIN>);
if ($ARGV[0]) {
    $file = $ARGV[0];
}
print "Output file name: ";
chomp($outfilename = <STDIN>);
print "Search string: ";
chomp($search = <STDIN>);
print "Replacement string: ";
chomp($replace = <STDIN>);
open(INFO, $file);
@lines = <INFO>;
open(OUT, ">$outfilename") || die "cannot create $outfilename: $!";
foreach (@lines) {
    # read a line from file INFO into $_
    s/$search/$replace/g;   # change the line
    print OUT $_;           # print that line to file OUT
}
close(INFO);
close(OUT);
The Perl one-liner
perl -pi -e 's/original string/new string/' filename
can be combined with File::Find to give the following single script (this is a template I use for many such operations).
use File::Find;

# search for files down a directory hierarchy ('.' taken for this example)
find(\&wanted, ".");

sub wanted
{
    if (-f $_)
    {
        # for the files we are interested in, call edit_file()
        edit_file($_);
    }
}

sub edit_file
{
    my ($filename) = @_;

    # you can re-create the one-liner above by localizing @ARGV as the list of
    # files the <> will process, and localizing $^I as the name of the backup file
    local (@ARGV) = ($filename);
    local ($^I) = '.bak';

    while (<>)
    {
        s/original string/new string/g;
    }
    continue
    {
        print;
    }
}
You can do this with the -i param:
Just process all the files as normal, but include -i.bak:
#!/usr/bin/perl -i.bak
while ( <> ) {
    s/before/after/;
    print;
}
This should process each file and rename the original to original.bak. And of course you can do it as a one-liner, as mentioned by @Jamie Cook.
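For example, on Windows, where cmd.exe does not expand wildcards, the one-liner form might look like this (untested; the BEGIN glob trick mirrors the earlier answers):
perl -i.bak -pe "BEGIN { @ARGV = map glob, @ARGV } s/before/after/" *.txt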
Try this:
#!/usr/bin/perl -w
my @files = <*>;
foreach my $file (@files) {
    print $file . "\n";
}
Also take a look at glob in Perl:
http://perldoc.perl.org/File/Glob.html
http://www.lyingonthecovers.net/?p=312
I know you can use a simple Perl one-liner from the command line, where filename can be a single filename or a list of filenames. You could probably combine this with bgy's answer to get the desired effect:
perl -pi -e 's/original string/new string/' filename
And I know it's trite, but this sounds a lot like a job for sed, if you can use GNU tools:
for i in `find ./allFilesTobeReplaced`; do sed -i 's/original string/new string/g' "$i"; done
perl -pi -e 's#OLD#NEW#g' filename.
You can replace filename with the pattern that suits your file list.