Adding more features in perl script - perl

In the below perl script, I check my folder name (which is in the date format like 11-08-31) with the current date. If it matches, I process the folder. It also checks the previous day folder if there is no folder in today's date. I already asked this type of question here but I need to make some changes here and add new features as well:
The script checks for the previous date if todays not find. But I need to check if the previous date has already been processed or not so that I donot process it again. So, Do I need to create a list for it?
This script checks only for the one previous date. What if I have to check for the 2 previous days? Thanks for your help. hope you understand my doubts.
Updated: This perl script run automatically when It checks the curent date with the folder name. The folder is a tar folder which is loaded from other server.
So, basically I need to run the script if it matched with the folder name and current date.
Problem: Sometimes, I used to get the folder next day and my perl script checks only for the current date. The folder i get has the name which is previous date (not the current date).So, I need to do processing of the folder manually. I need to automate it in my perl script
#!/usr/bin/perl
use strict;
use warnings;
use Cwd;
use DateTime;
use File::Copy;
# set to your desired time zone
my $today = DateTime->now( time_zone => "America/New_York" );
my $td = $today->strftime("%y-%m-%d");
# strongly recommended to do date math in the 'floating'/UTC zone
my $yesterday = $today->set_time_zone('floating')->subtract( days => 1);
my $yd = $yesterday->set_time_zone('America/New_York')->strftime("%y-%m-%d");
my $dir = shift or die "Provide path on command line. $!";
if ($dir eq '.') {
$dir = cwd;
}
elsif ($dir !~ /^\//) {
$dir = cwd() . "/$dir";
}
opendir my $dh, $dir or die $!;
my #dir = sort grep {-d and /$td/ || /$yd/} readdir $dh;
closedir $dh or die $!;
#dir or die "Found no date directories. $!";
my $dday = "$dir/$dir[-1]"; # is today unless today not found, then yesterday
my $fdir = '/some/example/path/';
my #gzfiles = glob("$dday/*tar.gz");
foreach my $zf (#gzfiles) {
next if (($zf =~ /BMP/) || ($zf =~ /LG/) || ($zf =~ /MAP/) || ($zf =~ /STR/));
print "$zf\n";
copy($zf, $fdir) or die "Unable to copy. $!";
}

Well, another way to do it, as suggested by mugen kenichi, is to use Storable. This way stores a hash with all processed directories in it. Then when you run your program, it can check the hash to see if they have been processed.
You would need a one-time script to set up the hash of processed directories.
#!/usr/bin/perl
use strict;
use warnings;
use Storable;
# This script to be run 1 time only. Sets up 'processed' directories hash.
# After this script is run, ready to run the daily script.
my $dir = '.'; # or what ever directory the date-directories are stored in
opendir my $dh, $dir or die "Opening failed for directory $dir $!";
my #dir = grep {-d && /^\d\d-\d\d-\d\d$/ && $_ le '11-04-21'} readdir $dh;
closedir $dh or die "Unable to close $dir $!";
my %processed = map {$_ => 1} #dir;
store \%processed, 'processed_dirs.dat';
Then, a script to be run periodically to find and process your date directories.
#!/usr/bin/perl
use strict;
use warnings;
use File::Copy;
use Storable;
my $dir = shift or die "Provide path on command line. $!";
my $processed = retrieve('processed_dirs.dat'); # $processed is a hashref
opendir my $dh, $dir or die "Opening failed for directory $dir $!";
my #dir = grep {-d && /^\d\d-\d\d-\d\d$/ && !$processed->{$_} } readdir $dh;
closedir $dh or die "Unable to close $dir $!";
#dir or die "Found no unprocessed date directories";
my $fdir = '/some/example/path';
for my $date (#dir) {
my $dday = "$dir/$date";
my #gzfiles = glob("$dday/*tar.gz");
foreach my $zf (#gzfiles) {
next if $zf =~ /BMP/ || $zf =~ /LG/ || $zf =~ /MAP/ || $zf =~ /STR/;
print "$zf\n";
copy($zf, $fdir) or die "Unable to copy $zf to $fdir. $!";
}
$processed->{ $date } = 1;
}
store $processed, 'processed_dirs.dat';

If you want to persist the status of whether these directories were processed beyond a single run of your app, you could create a .processed file in each directory and check for the existence of this file before you process the directory.
If you just need to store the status of these directories (processed or unprocessed) during the execution of your script, you could use a hash keyed with the directory name:
my %PROCESSED = ();
if ($processing_done) {
%PROCESSED{$dirname} = 1;
} else {
%PROCESSED{$dirname} = 0;
}
You can check to see if each directory has been processed by reading the key value from the hash:
if (%PROCESSED{$dirname} == 0) {
... do some processing
} else {
... this one is already done
}

This solution finds all directories yet to be processed that are newer than the most recent direcory-date processed. You have manually record it the first time, (before the script is run). The script will update it from that point on.
The file could be named like my $last = 'dir_last.dat'; I just entered a file at the command line like:
C:\Old_Data\perlp>echo 11-07-14 > dir_last.bat
C:\Old_Data\perlp>type dir_last.bat
11-07-14
C:\Old_Data\perlp>
This assumes the newest directory was 11-07-14. You must find out this yourself before running the script.
#!/usr/bin/perl
use strict;
use warnings;
use File::Copy;
my $dir = shift or die "Provide path on command line. $!";
my $last = 'dir_last.dat';
open my $fh, "<", $last or die "Unable to open $last $!";
chomp(my $last_proc = <$fh>);
close $fh or die "Unable to close $last $!";
opendir my $dh, $dir or die "Opening failed for directory $dir $!";
my #dir = sort grep {-d && /^\d\d-\d\d-\d\d$/ && $_ gt $last_proc} readdir $dh;
closedir $dh or die "Unable to close $dir $!";
#dir or die "Found no date directories after last update: $last_proc";
my $fdir = '/some/example/path';
for my $date (#dir) {
my $dday = "$dir/$date";
my #gzfiles = glob("$dday/*tar.gz");
foreach my $zf (#gzfiles) {
next if $zf =~ /BMP/ || $zf =~ /LG/ || $zf =~ /MAP/ || $zf =~ /STR/;
print "$zf\n";
copy($zf, $fdir) or die "Unable to copy $zf to $fdir. $!";
}
}
open $fh, ">", $last or die "Unable to open $last $!";
print $fh "$dir[-1]\n"; # record the newest date-directory as processed
close $fh or die "Unable to close $last $!";
Notice that I didn't rely on cwd like the first script. It really wasn't needed there and isn't needed here. opendir, glob and copy all can handle the dot (cwd) directory and relative paths.
The header includes the lines use strict; and use warnings;. Their purpose is to alert you of errors in your code (most all perl scripts should use them unless an expert decides to exclude them - for what reason I don't know). The first line tells unix where to find the interpreter (perl).

Related

Search string with multiple words in the pattern

My program is trying to search a string from multiple files in a directory. The code searches for single patterns like perl but fails to search a long string like Status Code 1.
Can you please let me know how to search for strings with multiple words?
#!/usr/bin/perl
my #list = `find /home/ad -type f -mtime -1`;
# printf("Lsit is $list[1]\n");
foreach (#list) {
# print("Now is : $_");
open(FILE, $_);
$_ = <FILE>;
close(FILE);
unless ($_ =~ /perl/) { # works, but fails to find string "Status Code 1"
print "found\n";
my $filename = 'report.txt';
open(my $fh, '>>', $filename) or die "Could not open file '$filename' $!";
say $fh "My first report generated by perl";
close $fh;
} # end unless
} # end For
There are a number of problems with your code
You must always use strict and use warnings at the top of every Perl program. There is little point in delcaring anything with my without strict in place
The lines returned by the find command will have a newline at the end which must be removed before Perl can find the files
You should use lexical file handles (my $fh instead of FILE) and the three-parameter form of open as you do with your output file
$_ = <FILE> reads only the first line of the file into $_
unless ($_ =~ /perl/) is inverted logic, and there's no need to specify $_ as it is the default. You should write if ( /perl/ )
You can't use say unless you have use feature 'say' at the top of your program (or use 5.010, which adds all features available in Perl v5.10)
It is also best to avoid using shell commands as Perl is more than able to do anything that you can using command line utilities. In this case -f $file is a test that returns true if the file is a plain file, and -M $file returns the (floating point) number of days since the file's modification time
This is how I would write your program
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
for my $file ( glob '/home/ad/*' ) {
next unless -f $file and int(-M $file) == 1;
open my $fh, '<', $file or die $!;
while ( <$fh> ) {
if ( /perl/ ) {
print "found\n";
my $filename = 'report.txt';
open my $out_fh, '>>', $filename or die "Could not open file '$filename': $!";
say $fh "My first report generated by perl";
close $out_fh;
last;
}
}
}
it should have matched unless $_ contains text in different case.
try this.
unless($_ =~ /Status\s+Code\s+1/i) {
Change
unless ($_ =~ /perl/) {
to:
unless ($_ =~ /(Status Code 1)/) {
I am certain the above works, except it's case sensitive.
Since you question it, I rewrote your script to make more sense of what you're trying to accomplish and implement the above suggestion. Correct me if I am wrong, but you're trying to make a script which matches "Status Code 1" in a bunch of files where last modified within 1 day and print the filename to a text file.
Anyways, below is what I recommend:
#!/usr/bin/perl
use strict;
use warnings;
my $output_file = 'report.txt';
my #list = `find /home/ad -type f -mtime -1`;
foreach my $filename (#list) {
print "PROCESSING: $filename";
open (INCOMING, "<$filename") || die "FATAL: Could not open '$filename' $!";
foreach my $line (<INCOMING>) {
if ($line =~ /(Status Code 1)/) {
open( FILE, ">>$output_file") or die "FATAL: Could not open '$output_file' $!";
print FILE sprintf ("%s\n", $filename);
close(FILE) || die "FATAL: Could not CLOSE '$output_file' $!";
# Bail when we get the first match
last;
}
}
close(INCOMING) || die "FATAL: Could not close '$filename' $!";
}

Perl - search and replace across multiple lines across multiple files in specified directory

At the moment this code replaces all occurences of my matching string with my replacement string, but only for the file I specify on the command line. Is there a way to change this so that all .txt files for example, in the same directory (the directory I specify) are processed without having to run this 100s of times on individual files?
#!/usr/bin/perl
use warnings;
my $filename = $ARGV[0];
open(INFILE, "<", $filename) or die "Cannot open $ARGV[0]";
my(#fcont) = <INFILE>;
close INFILE;
open(FOUT,">$filename") || die("Cannot Open File");
foreach $line (#fcont) {
$line =~ s/\<br\/\>\n([[:space:]][[:space:]][[:space:]][[:space:]][A-Z])/\n$1/gm;
print FOUT $line;
}
close INFILE;
I have also tried this:
perl -p0007i -e 's/\<br\/\>\n([[:space:]][[:space:]][[:space:]][[:space:]][A-Z])/\n$1/m' *.txt
But have noticed that is only changes the first occurence of the matched pattern and ignores all the rest in the file.
I also have tried this, but it doesn't work in the sense that it just creates a blank file:
use v5.14;
use strict;
use warnings;
use DBI;
my $source_dir = "C:/Testing2";
# Store the handle in a variable.
opendir my $dirh, $source_dir or die "Unable to open directory: $!";
my #files = grep /\.txt$/i, readdir $dirh;
closedir $dirh;
# Stop script if there aren't any files in the list
die "No files found in $source_dir" unless #files;
foreach my $file (#files) {
say "Processing $source_dir/$file";
open my $in, '<', "$source_dir/$file" or die "Unable to open $source_dir/$file: $!\n";
open(FOUT,">$source_dir/$file") || die("Cannot Open File");
foreach my $line (#files) {
$line =~ s/\<br\/\>\n([[:space:]][[:space:]][[:space:]][[:space:]][A-Z])/\n$1/gm;
print FOUT $line;
}
close $in;
}
say "Status: Processing of complete";
Just wondering what am I missing from my code above? Thanks.
You could try the following:
opendir(DIR,"your_directory");
my #all_files = readdir(DIR);
closedir(DIR);
for (#all_files) .....

perl - loop through directory to find file.mdb and execute code if file.ldb not found

I am a beginner PERL programmer and I have come across a snag that I can't get by. I have been reading and re-reading web posts and Simon Cozens book at perl.org all day, but can't seem to solve the problem.
My intention with the code below is to loop through files in a directory and when the file has a certain string a name to verify that the same file name doesn't exist with a different extension and if it doesn't, to print me the file name (later I will implement a delete of the file, but for now I want to ensure it will work.) Specifically, I am finding .mdb files and after checking there are no associated .ldb's files, deleting the .mdb file.
right now my code returns this:
RRED_Database_KHOVIS.ldb
RRED_Database_KHOVIS.mdb
I will kill RRED_Database_KHOVIS.mdb
RRED_Database_mkuttler.mdb
I will kill RRED_Database_mkuttler.mdb
RRED_Database_SBreslow.ldb
RRED_Database_SBreslow.mdb
I will kill RRED_Database_SBreslow.mdb
i want it to only return the "I will kill..." after a .mdb file with no associated .ldb file.
My current code is below. I appreciate any help offered...
use strict;
use warnings;
use File::Find;
use diagnostics;
my $dir = "//vfg1msfs01ab/vfgcfs01\$/Regulatory Reporting/Access Database/";
my $filename = "RRED_Database";
my $fullname, my $ext;
opendir DH, $dir or die "Couldn't open the directory: $!";
while ($_ = readdir(DH)) {
my $ext = ".mdb";
if ((/$filename/) && ($_ ne $filename . $ext)) {
print "$_ \n";
unless (-e $dir . s/.mdb/.ldb/) {
s/.ldb/.mdb/;
print "I will kill $_ \n\n" ;
#unlink $_ or print "oops, couldn't delete $_: $!\n";
}
s/.ldb/.mdb/;
}
}
When looping through files, I like to use 'next' statements repeatedly to assure that I'm only looking at exactly what I want. Try this:
use strict;
use warnings;
use File::Find;
use diagnostics;
my $dir = "//vfg1msfs01ab/vfgcfs01\$/Regulatory Reporting/Access Database/";
my $filename = "RRED_Database";
my $fullname, my $ext;
opendir DH, $dir or die "Couldn't open the directory: $!";
while ($_ = readdir(DH)) {
my $ext = ".mdb";
# Jump to next while() iteration unless the file begins
# with $filename and ends with $ext,
# and capture the basename in $1
next unless $_ =~ m|($filename.*)$ext|;
# Jump to next while() iteration if if the file basename.ldb is found
next if -f $1 . ".ldb";
# At this point, we have an mdb file with no matching ldb file
print "$_ \n";
print "I will kill $_ \n\n" ;
#unlink $_ or print "oops, couldn't delete $_: $!\n";
}
While stuart's anwser made it more lean... I was able to also get it to work with the code below... (i changed .mdb to .accdb because I am now dealing with different file type)
use strict;
use warnings;
use File::Spec;
use diagnostics;
my $dir = "//vfg1msfs01ab/vfgcfs01\$/Regulatory Reporting/Access Database/";
my $filename = "RRED_Database";
my $ext;
opendir DH, $dir or die "Couldn't open the directory: $!";
while ($_ = readdir(DH)) {
my $ext = ".accdb";
if ((/$filename/) && ($_ ne $filename . $ext) && ($_ !~ /.laccdb/)) {
# if file contains database name, is not the main database and is not a locked version of db
s/$ext/.laccdb/;
unless (-e File::Spec->join($dir,$_)) {
s/.laccdb/$ext/;
#print "I will kill $_ \n\n";
unlink $_ or print "oops, couldn't delete $_: $!\n";
}
s/.laccdb/$ext/;
}
}

location of file in perl

As suggested by Chris, a user on this site. In the 1st perl script: The values are stored in the dictionary.The first script is fine.The first script runs for only one time and stores the values. It is working.
In the 2nd script:
my $processed = retrieve('processed_dirs.dat'); # $processed is a hashref
Here it is reading the "processed_durs.dat" which is in the first script. So, I am just wondering how the second script knows the location of Processed_dirs.dat here?
#!/usr/bin/perl
use strict;
use warnings;
use Storable;
# This script to be run 1 time only. Sets up 'processed' directories hash.
# After this script is run, ready to run the daily script.
my $dir = '.'; # or what ever directory the date-directories are stored in
opendir my $dh, $dir or die "Opening failed for directory $dir $!";
my #dir = grep {-d && /^\d\d-\d\d-\d\d$/ && $_ le '11-04-21'} readdir $dh;
closedir $dh or die "Unable to close $dir $!";
my %processed = map {$_ => 1} #dir;
store \%processed, 'processed_dirs.dat';
2nd Script:
#!/usr/bin/perl
use strict;
use warnings;
use File::Copy;
use Storable;
my $dir = shift or die "Provide path on command line. $!";
my $processed = retrieve('processed_dirs.dat'); # $processed is a hashref
opendir my $dh, $dir or die "Opening failed for directory $dir $!";
my #dir = grep {-d && /^\d\d-\d\d-\d\d$/ && !$processed->{$_} } readdir $dh;
closedir $dh or die "Unable to close $dir $!";
#dir or die "Found no unprocessed date directories";
my $fdir = '/some/example/path';
for my $date (#dir) {
my $dday = "$dir/$date";
my #gzfiles = glob("$dday/*tar.gz");
foreach my $zf (#gzfiles) {
next if $zf =~ /BMP/ || $zf =~ /LG/ || $zf =~ /MAP/ || $zf =~ /STR/;
print "$zf\n";
copy($zf, $fdir) or die "Unable to copy $zf to $fdir. $!";
}
$processed->{ $date } = 1;
}
store $processed, 'processed_dirs.dat';
Unless I'm missing something, the answer is: Both scripts use a file called "processed_dirs.dat", in whatever directory they are run from. So as long as both scripts are run from the same directory, they will both use the same file.

Unable to print directory names to file using Perl

I'm trying to run this Perl script but it is not running as required. It is supposed to store the values of folders name which are in the format of date( example : 11-03-23)
I have some folders placed at this location in my account:
/hqfs/datastore/files
11-02-23 11-02-17 11-04-21
I'm storing these in "processed_dirs.dat" file.
But in the output: I got "pst12345678" in processed_dirs.dat
And when I printed $dh, I got GLOB(0x12345) some thing like this:
Please help me in getting the right output.
#!/usr/bin/perl
use strict;
use warnings;
use Storable;
# This script to be run 1 time only. Sets up 'processed' directories hash.
# After this script is run, ready to run the daily script.
my $dir = '/hqfs/datastore/files'; # or what ever directory the date-directories are stored in
opendir my $dh, $dir or die "Opening failed for directory $dir $!";
my #dir = grep {-d && /^\d\d-\d\d-\d\d$/ && $_ le '11-07-25'} readdir $dh;
closedir $dh or die "Unable to close $dir $!";
my %processed = map {$_ => 1} #dir;
store \%processed, 'processed_dirs.dat';
You are missing an argument for -d. Try -d "$dir/$_" && .... (Unless the current directory is always going to be the directory you are reading.)
There is almost no reason to ever use store instead of Storable::nstore.
Why were you trying to print dh?
$dh is a directory handle object. There's nothing useful you can get by printing it.
The output of Storable::store is not intended to be human-readable. If you're expecting something readable in processed_dirs.dat, don't... you will need to use Storable::retrieve to fetch it back out through perl, or Data::Dumper to print out the variable in a readable format.
This implementation works and gives you accurate information.
#!/usr/bin/perl
my $dir = '/Volumes/Data/Alex/';
opendir $dh , $dir
or die "Cannot open dir: $!";
my #result = ();
foreach ( readdir $dh )
{
if ( ! /^\d{2}-\d{2}-\d{2}$/ ) { next; } else { push #result , $_; }
}