Need to loop through directory - delete lines that match pattern - perl

I need to loop through a Unix directory and search each line in each file. If a line matches a pattern, that line should be deleted. I was not able to get the line deletion to work, so for now I'm just trying to find the pattern and replace it with another string.
I'm populating an array with file names and looping through it. I have a counter set, and it is looking at each of the lines in each file (at least the count is correct).
#!/usr/bin/perl -l
#!/usr/bin/perl -i.bak -w
#!/usr/bin/env perl
use strict;
use warnings;
use File::Find;
# 4-1-19
# pfs
# remove lines with dental code ADD2999 from all HMO Max load files in /home/hsxxx/dat/feeLoad directory
$| = 1;
chdir "/home/hstrn/dat/feeLoad";
chdir;
my $dir = </home/hstrn/dat/feeLoad/>;
my @files;
my $count=0;
opendir(DIR, $dir) or die "Cannot open directory $dir, Perl says $!\n";
while (my $file = readdir DIR)
{
push @files, "$dir/$file" unless -d "$dir/$file";
}
closedir DIR;
{
local @ARGV = @files;
while (<>)
{
s/ADD2999/sometext/g;
$count++;
}
print "Total lines read are: $count";
}
I would expect every occurrence of ADD2999 to be replaced with sometext.

To remove lines, you need to avoid printing them when writing to the new file. Your code doesn't write to any files at all.
This might be a job for existing tools.
find /home/hstrn/dat/feeLoad -maxdepth 1 -type f \
-exec perl -i~ -ne'print if !/ADD2999/' {} +
Use -i instead of -i~ if you want to avoid creating a backup. I prefer creating the backups, then deleting them once I've confirmed that everything is ok.
Show the files that are going to get deleted:
find /home/hstrn/dat/feeLoad -maxdepth 1 -type f -name '*~'
Delete the files:
find /home/hstrn/dat/feeLoad -maxdepth 1 -type f -name '*~' -delete
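The same idea can be written as a standalone Perl script. This is a minimal sketch, not a definitive implementation: the sub name `delete_matching_lines` and its parameters are made up for illustration, and it relies on Perl's in-place editing variable `$^I` (the script-level equivalent of `-i`).

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Remove every line matching $pattern from the given files, editing in place.
# $backup_ext is the suffix for backup copies (e.g. '~'); pass '' for no backup.
# Returns the number of lines removed. (Hypothetical helper, for illustration.)
sub delete_matching_lines {
    my ($pattern, $backup_ext, @files) = @_;
    return 0 unless @files;      # with an empty @ARGV, <> would read STDIN
    my $removed = 0;
    local $^I   = $backup_ext;   # enable in-place editing, like -i on the command line
    local @ARGV = @files;        # <> iterates over these files
    while (my $line = <>) {
        if ($line =~ $pattern) {
            $removed++;          # drop the line by not printing it
            next;
        }
        print $line;             # goes to the rewritten file, not to the terminal
    }
    return $removed;
}

# Usage: perl delete_lines.pl /home/hstrn/dat/feeLoad/*
if (@ARGV) {
    my $n = delete_matching_lines(qr/ADD2999/, '~', grep { -f } @ARGV);
    print "Removed $n line(s)\n";
}
```

As with the one-liner, the `~` backups can be removed once the result has been checked.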

This would be my first attempt at the problem, but it could use more corner-case checking, e.g. how to handle write-protected files. It also assumes that the files are small enough to fit into memory for processing.
#!/usr/bin/perl
use warnings;
use strict;
use autodie;
use File::Spec;
use File::Slurper qw(read_text write_text);
my $count = 0;
my $dir = "tmp";
opendir(my $dh, $dir);
while (readdir $dh) {
# skip anything that shouldn't be processed
next if /^\.\.?$/; # . && ..
my $file = File::Spec->catfile($dir, $_);
next if -d $file; # directories
# slurp file content and replace text
my $content = read_text($file);
my $matches = ($content =~ s/ADD2999/sometext/g);
# count lines
my @eols = ($content =~ /(\n)/g);
$count += @eols;
# replace original file if contents were modified
write_text($file, $content) if $matches;
}
closedir($dh);
print "Total lines read are: $count\n";
exit 0;
Test run:
$ wc -l tmp/test*.txt
5 tmp/test2.txt
6 tmp/test.txt
11 total
$ fgrep ADD2999 tmp/*.txt
tmp/test2.txt:asddsada ADD2999 asdsadasd
tmp/test2.txt:21312398 ADD2999 dasdas
$ perl dummy.pl
Total lines read are: 11
$ fgrep ADD2999 tmp/*.txt
$ fgrep sometext tmp/*.txt
tmp/test2.txt:asddsada sometext asdsadasd
tmp/test2.txt:21312398 sometext dasdas
If the files are large, you will need to use a line-by-line processing approach (only the contents of the loop are shown here). This has the side effect that all files will be touched, even those that contain nothing to replace:
# read file and replace text
open(my $ifh, '<', $file);
my $tmpfile = File::Spec->catfile($dir, "$_.$$");
open(my $ofh, '>', $tmpfile);
while (<$ifh>) {
s/ADD2999/sometext/g;
print $ofh $_;
}
$count += $.; # total lines in $ifh
close($ofh);
close($ifh);
# replace original file with new file
unlink($file);
rename($tmpfile, $file);
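If touching unmodified files is a concern, the line-by-line variant can count substitutions and only swap in the new file when something changed. The following is a sketch under assumptions: `replace_in_file` is a hypothetical helper, the search text is treated as a literal string, and the temporary file comes from the core module File::Temp. Note that the replaced file gets the temporary file's permissions.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Basename qw(dirname);

# Replace the literal text $from with $to, one line at a time.
# Returns (lines read, substitutions made). The original file is only
# replaced when at least one substitution happened.
sub replace_in_file {
    my ($file, $from, $to) = @_;
    open my $in, '<', $file or die "Cannot read $file: $!";
    # Temp file in the same directory, so rename() stays on one filesystem
    my ($out, $tmp) = tempfile(DIR => dirname($file));
    my ($lines, $changes) = (0, 0);
    while (my $line = <$in>) {
        $lines++;
        # s///g returns the number of substitutions, or false if there were none
        $changes += ($line =~ s/\Q$from\E/$to/g) || 0;
        print {$out} $line;
    }
    close $in;
    close $out or die "Cannot write $tmp: $!";
    if ($changes) {
        rename $tmp, $file or die "Cannot replace $file: $!";
    }
    else {
        unlink $tmp;    # nothing changed: leave the original untouched
    }
    return ($lines, $changes);
}

# Usage: perl replace.pl tmp/*
for my $file (grep { -f } @ARGV) {
    my ($lines, $changes) = replace_in_file($file, 'ADD2999', 'sometext');
    print "$file: $lines line(s) read, $changes replacement(s)\n";
}
```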

Related

Recovering a specific line in multiple .txt in a directory using Perl

I have the output of a program that ran a search and produced 2000+ .txt files. I just need one specific line from each file. This is what I have been trying with Perl:
opendir(DIR, $dirname) or die "Could not open $dirname\n";
while ($filename = readdir(DIR)) {
print "$filename\n";
open ($filename, '<', $filename)or die("Could not open file.");
my $line;
while( <$filename> ) {
if( $. == $27 ) {
print "$line\n";
last;
}
}
}
closedir(DIR);
But there is a problem with the $filename in line 5 and I don't know an alternative to it so I don't have to manually name each file.
Several issues with that code:
Using an old-school bareword identifier for the directory handle instead of an autovivified variable, as you do for the file handle.
Using the same variable for the filename and file handle is pretty strange.
You don't check to see if the file is a directory or something else other than a plain file before trying to open it.
$27?
You never assign anything to that $line variable before printing it.
Unless $dirname is your program's current working directory, you're running into an issue mentioned in the readdir documentation:
If you're planning to filetest the return values out of a readdir, you'd better prepend the directory in question. Otherwise, because we didn't chdir there, it would have been testing the wrong file.
(Substitute open for filetest)
Always use strict; and use warnings;.
Personally, if you just want to print the 27th line of a large number of files, I'd turn to awk and find (Using its -exec test to avoid potential errors about the command line maximum length being hit):
find directory/ -maxdepth 1 -type f -exec awk 'FNR == 27 { print FILENAME; print }' \{\} \+
If you're on a Windows system without standard unix tools like those installed, or it's part of a bigger program, a fixed up perl way:
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
use feature qw/say/;
use File::Spec;
my $directory = shift;
opendir(my $dh, $directory);
while (my $filename = readdir $dh) {
my $fullname = File::Spec->catfile($directory, $filename); # Construct a full path to the file
next unless -f $fullname; # Only look at regular files
open my $fh, "<", $fullname;
while (my $line = <$fh>) {
if ($. == 27) {
say $fullname;
print $line;
last;
}
}
close $fh;
}
closedir $dh;
You might also consider using glob to get the filenames instead of opendir/readdir/closedir.
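A glob-based variant might look like the sketch below. The helper name `nth_line` is made up for illustration. Two caveats: `glob` skips dot-files by default and splits its pattern on whitespace, so a directory name containing spaces needs extra care.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Return line number $n of $file, or undef if the file has fewer lines.
# (Hypothetical helper, for illustration.)
sub nth_line {
    my ($file, $n) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    while (my $line = <$fh>) {
        return $line if $. == $n;    # $. is the current line number of $fh
    }
    return;                          # file is shorter than $n lines
}

# Usage: perl line27.pl directory
if (@ARGV) {
    my $directory = shift;
    # glob returns paths that already include the directory part
    for my $file (grep { -f } glob("$directory/*")) {
        my $line = nth_line($file, 27);
        next unless defined $line;
        print "$file\n$line";
    }
}
```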
And if you have Path::Tiny available, a simpler version is:
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
use feature qw/say/;
use Path::Tiny;
my $directory = shift;
my $dir = path $directory;
for my $file ($dir->children) {
next unless -f $file;
my @lines = $file->lines({count => 27});
if (@lines == 27) {
say $file;
print $lines[-1];
}
}

Perl code to check the last line number of a file

I have written Perl code that prints the line number of every line as it processes a file. I would like to get only the last line number of each file. The code is as follows:
#!/usr/bin/perl
use strict;
use warnings;
my $qbsid_dir = "/root/deep/";
opendir (DIR, "$qbsid_dir") or die "Cannot open the directory!\n";
while (my $file = readdir DIR){
next if ($file =~ m/^\./);
open(FH, "$qbsid_dir/$file") or die "Cannot open the file\n";
while (my $line = <FH>){
print "$.\n";
}
close (FH);
}
closedir (DIR);
The '/root/deep' directory contains two files, one with 90 lines and the other with 100 lines.
I want those two numbers to be printed instead of the individual numbers 1..90 and 1..100 that $. produces.
Thanks.
Do you really want to use Perl?
wc -l <File>
If you want the last line number, wait to print $. until outside the while loop for processing the file:
open my $fh, '<', "$qbsid_dir/$file" or die "Can't open $file: $!";
1 while (<$fh>);
print "$file -> $.\n";
close $fh;
Be sure to read: perlfaq5 - How do I count the number of lines in a file?
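For large files, a common approach is to read fixed-size chunks and count the newline characters in each. The sketch below assumes that technique. The name `count_lines` and the 64 KB buffer size are arbitrary choices. Like `wc -l`, it counts newlines, so a final line without a trailing newline is not counted, whereas `$.` would count it.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Count newline characters in $file by reading it in chunks.
# (Hypothetical helper; the buffer size is an arbitrary choice.)
sub count_lines {
    my ($file) = @_;
    open my $fh, '<', $file or die "Can't open $file: $!";
    my ($count, $buffer) = (0, '');
    while (sysread $fh, $buffer, 65536) {
        $count += ($buffer =~ tr/\n//);    # tr returns how many characters matched
    }
    close $fh;
    return $count;
}

# Usage: perl count.pl /root/deep/*
printf "%s -> %d\n", $_, count_lines($_) for grep { -f } @ARGV;
```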
Try this. It also prints the file name next to its last line number.
$filedir = '/root/deep/';
opendir (dir, "$filedir");
@directory = readdir(dir);
@grep = grep{m/.*\.txt/g} @directory; #It matches the particular file format
foreach $dir(@grep){
open(file,"$filedir/$dir");
@line =<file>;
$total = @line; #$total variable store the last file number from array (@line)
print "File $dir Last line \t: $total\n";
}
You can use a one-liner too:
perl -nE '}{say $.' filename
#or
command | perl -nE '}{say $.'
test
$ seq 10 | perl -nE '}{say $.'
10
$ seq 10 | wc -l
10

Perl script to compare file contents of dirA with file contents of dirB and output the difference in a separate file

I am just a beginner at perl and I need help with the following.
DirA and DirB both contain files f1, f2, f3, f4, f5 (not necessarily 5 in number). I need to compare the contents of f1 in DirA with the contents of f1 in DirB and write the result to f1_diff, and do the same for all other files in directories A and B. So in the above case, assuming the contents of all files differ, the script will output 5 files (f1_diff, f2_diff, f3_diff, f4_diff, f5_diff), each containing the lines that differ between the DirA and DirB versions of that file.
For example, if
f1 dir A has line1, line2, line3, line4xxxx, line5
f1 dir B has line1, line2xxxx, line3, line4, line5
f1_diff should have
line2 --> line2xxxx
line4xxxx -->line4
Can someone please help me with this.
Update:
I have the below script right now and need to add the following.
Filename: # of different lines
File 1 – 1
File 2 - 30
File 3 – missing in dir1
File 3a – missing in dir2
Secondly, I need the number of differing lines. Can someone help me modify the script below?
#!/usr/bin/perl
package Main; {
use strict;
use warnings;
use Cwd;
my $DirA;
my $DirB;
my $y;
print ("\nChoose one of the entries below\n");
print ("e1\e2\e3\n\n");
print("Enter here --->");
my $dir = <>;
chomp($dir);
if ($dir eq "e1"){
$DirA = "./dir1";
$DirB = "./dir2";
}
elsif ($dir eq "e2"){
$DirA = "./dir3";
$DirB = "./dir4";
}
else{
$DirA = "./dir5";
$DirB = "./dir6";
}
opendir my($dh), "$DirA" or die "Couldn't open dir DirA!";
my @files = readdir $dh;
closedir $dh;
system("rm -rf diffs");
system ("mkdir diffs");
foreach my $file (@files) {
next if -d $file;
my $diff = `diff -y --suppress-common-lines "$DirA/$file" "$DirB/$file"`;
open DIFF_FILE, ">$file"."_diff";
print DIFF_FILE $diff;
close DIFF_FILE;#}
}
chdir("./diffs/");
my $cwd = cwd();
system("mv ../*_diff .");
foreach(glob('*.*')) {
unlink if (-f $_) && !-s _;
}
print("Total number of differences = "); system("ls *_diff | wc -l");print("\n");
}
I recommend using CPAN modules for these tasks:
To find files: File::Find
To compare files: File::Compare
To show file differences: Text::Diff
Check if the output format of Text::Diff is what you need, it offers configuration features
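As a rough sketch of the per-file status report requested in the update, the core module File::Compare is enough to tell identical files from differing ones, and two hashes of file names reveal files missing on either side. The sub names `plain_files` and `compare_dirs` and the status strings are made up for illustration. Counting the differing lines would still need a diff tool or Text::Diff on top of this.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Compare qw(compare);

# Names (without path) of the plain files directly inside $dir.
sub plain_files {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot open $dir: $!";
    my @names = grep { -f "$dir/$_" } readdir $dh;
    closedir $dh;
    return @names;
}

# Returns a hash: file name => 'same', 'different', or 'missing in <dir>'.
sub compare_dirs {
    my ($dir_a, $dir_b) = @_;
    my %in_a = map { $_ => 1 } plain_files($dir_a);
    my %in_b = map { $_ => 1 } plain_files($dir_b);
    my %status;
    for my $name (keys %in_a, keys %in_b) {
        if    (!$in_b{$name}) { $status{$name} = "missing in $dir_b" }
        elsif (!$in_a{$name}) { $status{$name} = "missing in $dir_a" }
        else {
            # compare() returns 0 if equal, 1 if different, -1 on error
            my $rc = compare("$dir_a/$name", "$dir_b/$name");
            die "Cannot compare $name: $!" if $rc < 0;
            $status{$name} = $rc ? 'different' : 'same';
        }
    }
    return %status;
}

# Usage: perl compare_dirs.pl dir1 dir2
if (@ARGV == 2) {
    my %status = compare_dirs(@ARGV);
    print "$_: $status{$_}\n" for sort keys %status;
}
```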
You could use Text::Diff.
#!/usr/bin/perl
use strict;
use warnings;
use Text::Diff;
opendir my($dh), "DirA" or die "Couldn't open dir DirA: $!";
my @files = readdir $dh;
closedir $dh;
foreach my $file (@files) {
next unless -f "DirA/$file"; # readdir returns bare names, so prepend the directory
my $diff = diff "DirA/$file", "DirB/$file";
open my $out, '>', "${file}_diff" or die "Couldn't write ${file}_diff: $!";
print $out $diff;
close $out;
}
It is easy if GNU parallel is available:
ls dirA/f* | parallel 'echo {} - dirB/{/} >>{/}_dif ; diff {} dirB/{/} >> {/}_dif '

How to print all filenames in a directory while running line by line on all the files

The program I wrote goes through all the files in a directory and its subdirectories line by line and runs some commands that involve $counter.
I would like to produce a .txt file in which every line looks like this:
<file name> <$counter in the beginning of the file>
For example, if I have three files a.txt, b.txt and c.txt, and the counter reaches 10 by the end of file a, 20 by the end of file b and 30 by the end of file c, the output file would look like this:
a.txt 0
b.txt 10
c.txt 20
My program looks like this:
use strict;
use warnings;
use File::Find;
my $dir = "C:/New Folder";
open (MYFILE, '>>data.txt');
# fill up our argument list with file names:
find(sub { if (-f && /\.[c]$/) { push @ARGV, $File::Find::name } }, $dir);
$^I = ".bak"; # supply backup string to enable in-place edit
foreach $argvv(@ARGV)
{
while (<>)
{
if ($prev_arg ne $argvv)
{
print MYFILE "$argvv $counter\n";
$prev_arg = $argvv;
}
#some unrelated line by line code here
close (MYFILE);
}
}
What I was trying to do is to make the program print the file name and the counter every time it finishes going through a file and starts another one.
The data.txt file i get is the name of the first file and the counter printed for each line of each file in the directory.
Needless to say that I'm a total noob in Perl so I would really appreciate some help.
Thanks :)
There are a few issues with your code.
You are using strict and warnings but you have not declared $prev_arg and $argvv. The warnings have told you about them.
You never open a file for reading. Instead, you try to use the -i command line switch and an argument list to read from STDIN. None of that makes sense to me. Instead, you should be using a normal array and open the files one by one.
You are using old-fashioned bareword filehandles and the two-argument form of open. Instead, please use lexical filehandles as they are not global but only exist in their surrounding block. Use the three-argument form of open so you don't run into security problems. See here.
Try this:
use warnings;
use strict;
use File::Find;
my $dir = 'D:/temp';
my @files;
find(
sub {
if (-f && /\.c.txt$/) {
push @files, $File::Find::name }
},
$dir
);
open my $fh_out, '>>', 'data.txt' # open the output filehandle
or die $!;
foreach my $file (@files) { # iterate the file list
open my $fh_in, '<', $file # open the current file for reading
or die $!;
my $counter = <$fh_in>; # read the first line
chomp $counter; # remove trailing linebreak
close $fh_in; # close the input filehandle (explicit)
print $fh_out "$file $counter\n"; # write the file name and its first line
}
close $fh_out; # close the output filehandle

How to change all occurrences of a char in all files in a directory (and subdirectories ) using Perl

** I have a follow-up question that is marked with '**' **
I was asked to write Perl code that replaces every { with {function(<counter>), where the counter increases by 1 with every replacement. E.g. the first replacement of { will be {function(0),
the second replacement of { will be {function(1), etc.
It is supposed to do the replacement in every *.c and *.h file in a folder, including subfolders.
I wrote this code :
#!/usr/bin/perl
use Tie::File;
use File::Find;
$counter = 0;
$flag = 1;
@directories_to_search = 'd:\testing perl';
@newString = '{ function('.$counter.')';
$refChar = "{";
finddepth(\&fileMode, @directories_to_search);
sub fileMode
{
my @files = <*[ch]>; # get all files ending in .c or .h
foreach $file (@files) # go through all the .c and .h flies in the directory
{
if (-f $file) # check if it is a file or dir
{
my @lines;
# copy each line from the text file to the string @lines and add a function call after every '{' '
tie @lines, 'Tie::File', $file or die "Can't read file: $!\n";
foreach ( @lines )
{
if (s/{/@newString/g)
{
$counter++;
@newString = '{function('.$counter.')';
}
untie @lines; # free @lines
}
}
}
}
The code searches the directory d:\testing Perl and does the replacement but instead of getting
{function(<number>) I get {function(number1) function(number3) function(number5) function(number7) for instance for the first replacement I get
{function(0) function(2) function(4) function(6) and I wanted to get {function(0)
I really don't know what is wrong with my code.
An awk solution or any other Perl solution will also be great!
* I have a follow-up question.
now I want my perl program to do the same substitution in all the files except the lines when there is a '{'
and a '}' in the same line. so i modified the code this way.
#!/usr/bin/perl
use strict;
use warnings;
use Tie::File;
use File::Find;
my $dir = "C:/test dir";
# fill up our argument list with file names:
find(sub { if (-f && /\.[hc]$/) { push @ARGV, $File::Find::name } }, $dir);
$^I = ".bak"; # supply backup string to enable in-place edit
my $counter = 0;
# now process our files
#foreach $filename (@ARGV)
while (<>)
{
my @lines;
# copy each line from the text file to the string @lines and add a function call after every '{' '
tie @lines, 'Tie::File', $ARGV or die "Can't read file: $!\n";
#$_='{function(' . $counter++ . ')';
foreach (@lines)
{
if (!( index (@lines,'}')!= -1 )) # if there is a '}' in the same line don't add the macro
{
s/{/'{function(' . $counter++ . ')'/ge;
print;
}
}
untie @lines; # free @lines
}
What I was trying to do is go through all the files in @ARGV that I found in my directory and its subdirectories, and for each *.c or *.h file go line by line and check whether the line contains '}'. If it does, the program should not make the substitution; if it doesn't, the program should substitute '{' with '{function();'.
unfortunately this code does not work.
I'm ashamed to say that I'm trying to make it work all day and still no go.
I would really appreciate some help.
Thank You!!
This is a simple matter of combining a finding method with an in-place edit. You could use Tie::File, but it is really the same end result. Also, needless to say, you should keep backups of your original files, always, when doing edits like these because changes are irreversible.
So, if you do not need recursion, your task is dead simple in Unix/Linux style:
perl -pi -we 's/{/"{ function(" . $i++ . ")"/ge' *.h *.c
Of course, since you seem to be using Windows, the cmd shell won't glob our arguments, so we need to do that manually. And we need to change the quotes around. And also, we need to supply a backup argument for the -i (in-place edit) switch.
perl -pi.bak -we "BEGIN { @ARGV = map glob, @ARGV }; s/{/'{ function(' . $i++ . ')'/ge" *.h *.c
This is almost getting long enough to make a script of.
If you do need recursion, you would use File::Find. Note that this code is pretty much identical in functionality as the one above.
use strict;
use warnings;
use File::Find;
my $dir = "d:/testing perl"; # use forward slashes in paths
# fill up our argument list with file names:
find(sub { if (-f && /\.[hc]$/) { push @ARGV, $File::Find::name } }, $dir);
$^I = ".bak"; # supply backup string to enable in-place edit
my $counter = 0;
# now process our files
while (<>) {
s/{/'{ function(' . $counter++ . ')'/ge;
print;
}
Don't be lulled into a false sense of security by the backup option: If you run this script twice in a row, those backups will be overwritten, so keep that in mind.
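For the follow-up question (leave lines alone when '{' and '}' appear on the same line), one possible sketch is to test each line before substituting. The helper name `instrument_line` is made up for illustration. The check simply looks for a '}' anywhere on the line, which is an assumption about what "in the same line" should mean.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $counter = 0;    # shared across all lines and all files

# Add a numbered call after every '{', unless the line also contains '}'.
# (Hypothetical helper, for illustration.)
sub instrument_line {
    my ($line) = @_;
    return $line if index($line, '}') != -1;    # '}' on the line: leave it alone
    # /e evaluates the replacement as code, so $counter grows once per '{'
    $line =~ s/\{/'{function(' . $counter++ . ')'/ge;
    return $line;
}

# Usage: perl instrument.pl file1.c file2.h ...
if (@ARGV) {
    $^I = '.bak';    # in-place edit with backups, as in the script above
    while (my $line = <>) {
        print instrument_line($line);
    }
}
```

The file list can be filled with File::Find exactly as in the script above; only the loop body changes.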
$ perl -pi -e 's| (?<={) | q#function(# . ($i++) . q#)# |gex' *.c *.h
It can be done in a single line as below:
perl -pi -e 's/({)/"{function(".++$a.")"/ge;' your_file
I have just taken an example input file and tested too.
> cat temp
line-1 { { { {
line-2 { { {
line-3 { {
line-4 {
Now the execution:
> perl -pi -e 's/({)/"{function(".++$a.")"/ge;' temp
> cat temp
line-1 {function(1) {function(2) {function(3) {function(4)
line-2 {function(5) {function(6) {function(7)
line-3 {function(8) {function(9)
line-4 {function(10)
Using awk '/{/{gsub(/{/,"{function("i++")");print;next}{print}' and your code as input:
$ awk '/{/{gsub(/{/,"{function("i++")");print;next}{print}' file
sub fileMode
{function(0)
my @files = <*[ch]>; # get all files ending in .c or .h
foreach $file (@files) # go through all the .c and .h flies in the directory
{function(1)
if (-f $file) # check if it is a file or dir
{function(2)
my @lines;
# copy each line from the text file to the string @lines and add a function call after every '{function(3)' '
tie @lines, 'Tie::File', $file or die "Can't read file: $!\n";
foreach ( @lines )
{function(4)
if (s/{function(5)/@newString/g)
{function(6)
$counter++;
@newString = '{function(7)function('.$counter.')';
}
untie @lines; # free @lines
}
}
}
}
Note: The function number won't be incremented for inline nested {.
$ echo -e '{ { \n{\n-\n{' | awk '/{/{gsub(/{/,"{function("i++")");print;next}1'
{function(0) {function(0)
{function(1)
-
{function(2)
Explanation:
/{/ # For any line that contains {
gsub( /{/ , "{function("i++")" ) # replace every { with {function(N), where N is the current value of i; i++ is evaluated once per line
print;next # print the modified line and skip to the next one
print # print every other line unchanged