Automation in Perl script

I am running a Perl script. In it, I check the current date against a folder name, which is also in date format (for example, 11-12-07). The script runs automatically, matching the current date against the folder name. The folder is a tar folder that is loaded from another server.
So, basically, I need to run the script when the folder name matches the current date.
Problem: Sometimes I get the folder the next day, and my Perl script checks only for the current date. The folder I get is named with the previous date (not the current date), so I have to process it manually. I need to automate this in my Perl script.
Please suggest some ideas to make this happen.
Thanks!!
Code for Reference:
my $tfilename = 'tarmd5.tar';
my $td = `date '+%y-%m-%d'`; # date in yy-mm-dd format
chomp ($td);
my $td2 = `date '+%Y%m%d'`; # date in yyyymmdd format
chomp ($td2);
#
# get directory from command line
$dir = shift;
leave("'$dir' is not a valid directory") unless (-d $dir);
if ($dir eq '.') {$dir = cwd();}
elsif ($dir !~ /^\//) {$dir = cwd()."/$dir";}
# print out the time
print scalar(localtime()),"\n";
######## This section unpacks transferred data ########
# go to directory for today and find *tar.gz files to copy
my $dday = "$dir/$td";
next unless (-d "$dday");
@gzfiles = glob("$dday/*tar.gz");
foreach $zf (@gzfiles) {
next if (($zf =~ /BMP/) || ($zf =~ /LG/) || ($zf =~ /MAP/) || ($zf =~ /STR/));
print "$zf\n";
($status,$message) = systemcall("/bin/cp $zf $fdir");
}

Maybe use DateTime to do the math. I redid the solution, as the first version was poorly written. I changed DateTime->today to DateTime->now because one wants the hms portion when converting back to the desired time zone (from 'floating' or 'UTC').
I also used Perl functions instead of shelling out to the Unix system (date functions, current working directory via cwd, and the copy function).
Update: elsif ($dir != /^\//) is incorrect. Changed to elsif ($dir !~ /^\//).
#!/usr/bin/perl
use strict;
use warnings;
use Cwd;
use DateTime;
use File::Copy;
# set to your desired time zone
my $today = DateTime->now( time_zone => "America/New_York" );
my $td = $today->strftime("%y-%m-%d");
# strongly recommended to do date math in the 'floating'/UTC zone
my $yesterday = $today->set_time_zone('floating')->subtract( days => 1);
my $yd = $yesterday->set_time_zone('America/New_York')->strftime("%y-%m-%d");
my $dir = shift or die "Provide path on command line. $!";
if ($dir eq '.') {
$dir = cwd;
}
elsif ($dir !~ /^\//) {
$dir = cwd() . "/$dir";
}
opendir my $dh, $dir or die $!;
my @dir = sort grep {-d and /$td/ || /$yd/} readdir $dh;
closedir $dh or die $!;
@dir or die "Found no date directories. $!";
my $dday = "$dir/$dir[-1]"; # is today unless today not found, then yesterday
my $fdir = '/some/example/path/';
my @gzfiles = glob("$dday/*tar.gz");
foreach my $zf (@gzfiles) {
next if (($zf =~ /BMP/) || ($zf =~ /LG/) || ($zf =~ /MAP/) || ($zf =~ /STR/));
print "$zf\n";
copy($zf, $fdir) or die "Unable to copy. $!";
}
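The $td/$yd pair above only tolerates a one-day lag. If the tarball can arrive even later, one way to generalize is to accept any of the last N dates; a minimal sketch, where the number of days and the time zone are assumptions to adjust:
use DateTime;

my $tz        = 'America/New_York';
my $days_back = 3;   # how many days of lag to tolerate (assumption)

my @accept;
for my $back ( 0 .. $days_back ) {
    my $dt = DateTime->now( time_zone => $tz )
                     ->set_time_zone('floating')   # do the date math in 'floating'
                     ->subtract( days => $back );
    push @accept, $dt->set_time_zone($tz)->strftime('%y-%m-%d');
}
# @accept now holds today's date string plus the previous $days_back days,
# and can replace the /$td/ || /$yd/ test in the grep above.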

So you want to get all the directory names that match the current day or any previous days? I presume you move the directories somewhere else when they're done being processed.
A good place to start is the DateTime module. Getting the current date is easy enough:
my $now = DateTime->now();
Then you need to iterate through all directories and pick out the dates you want. Use "perldoc -f" to look up opendir(), readdir(), and closedir() for getting the directories. To match them, parse out the day/month/year and create another DateTime object:
my $dir_date = DateTime->new(
day => $dir_day,
month => $dir_month,
year => $dir_year,
);
Once you have all that together, finding if the given directory is a hit is as easy as:
processDir( $dir_name )
if DateTime->compare( $now, $dir_date ) >= 0;
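Putting those pieces together, here is a minimal sketch; it assumes the yy-mm-dd directory names from the question, that two-digit years mean 20xx, and that $base_dir and processDir() stand in for your own path and processing routine:
use DateTime;

my $now = DateTime->now();

opendir my $dh, $base_dir or die "Can't open $base_dir: $!";
while ( defined( my $name = readdir $dh ) ) {
    next unless $name =~ /^(\d\d)-(\d\d)-(\d\d)$/;   # only yy-mm-dd names
    my $dir_date = DateTime->new(
        year  => 2000 + $1,   # assumption: two-digit years are 20xx
        month => $2,
        day   => $3,
    );
    processDir($name) if DateTime->compare( $now, $dir_date ) >= 0;
}
closedir $dh;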

I wonder if it wouldn't be simpler with a bash script. If I understand what you are trying to do, it is
find recent .tar.gz files, with names not containing "BMP", "LG", etc.
copy these files to another dir ($fdir, which is undefined in your example)
Maybe you could just ignore the whole folder-name problem, and search for files not older than 24 hours?
dir=/your/bas/dir
fdir=/your/destination
find $dir -iname "*.tar.gz" -mtime -1 -not \( -name "*BMP*" -o -name "*LG*" -o -name "*MAP*" -o -name "*STR*" \) -exec cp "{}" "$fdir" \;


Perl: Fastest way to find files older than X number of minutes, sorted oldest to newest

I'm trying to check if there is a file (I don't care about folders) older than X minutes. Unfortunately I can't tell where the bug is in this code.
I will appreciate any help :)
1. Find the files older than X number of minutes
#!/usr/bin/perl
my $maindir = "C:\\Users\\Dor\\Desktop\\aba";
my $minutesold = 60;
my $now = time;
my $filedisc;
# Declare arrays
my @xmlfiles;
my @qulfiedfiles;
# Declare a Dictionary
my %filedisc;
opendir(my $dh, $maindir) or die "opendir($maindir): $!";
# Read all the files
while (my $de = readdir($dh))
{
# get the Full path of the file
my $f = $maindir . $de;
if ( -f $f )
{
push (@xmlfiles, $f);
}
}
closedir($dh);
# For every file in directory
for my $file (@xmlfiles) {
# Get stats about a file
my @stats = stat($file);
# If time stamp is older than minutes provided
if ($stats[9] >= ($now - (( $minutesold * 60) ))){
# Put the File and Time stamp in the dictionary
print($stats[9] ." .| " .$file ."\n\n");
}
#print($now ."\n")
#print($now - ( $minutesold * 60) ."\n");
}
It's usually best to use glob rather than opendir/readdir, so as to avoid having to “rebuild” the full path to the file for every result
You will probably want to enable the :bsd_glob option on Windows so that paths with spaces, such as C:\Program Files, are handled correctly
use strict;
use warnings 'all';
use File::Glob ':bsd_glob'; # Provide for spaces in path
my $root = 'C:\Users\Dor\Desktop\aba';
my $minutesold = 60;
my @old_files = grep { -f and ( -M $_ ) * 24 * 60 > $minutesold } glob "$root\\*.*"; # parens needed: -M binds looser than *
The path and file isn't correct.
my $f = $maindir . $de;
Should be (add slash between path and file)
my $f = "$maindir/$de";
Solving this in kind of a functional programming style is the way to go here I think:
my $dir = shift() || $ENV{HOME}; #command line arg or else home dir
my $minutesold = 60; #1h
opendir my $dh, $dir or die "ERR: opendir($dir) $!\n";
print
map "$$_{timestamp} .| $$_{file}\n",
#sort { $$a{timestamp} <=> $$b{timestamp} } # sort by age
#sort { $$a{file} cmp $$b{file} } # sort by name
grep $^T-$$_{timestamp} >= 60*$minutesold, # $^T is program startup time()
map {{timestamp=>(stat($_))[9], file=>$_}}
grep -f $_,
map "$dir/$_",
readdir $dh;
closedir $dh;
You have missed one simple way to get the modification time of a file in Perl: the -M file test.
my $modifiedTimeinDays = -M "$file";
my $modifiedTimeinSec = $modifiedTimeinDays*60*60*24;
if($modifiedTimeinSec > 60)
{
# file older than 60 sec
}
As simple as that.
See perldoc -f -X to learn about all of the file tests.

Adding more features in a Perl script

In the Perl script below, I check my folder name (which is in a date format like 11-08-31) against the current date. If it matches, I process the folder. It also checks the previous day's folder if there is no folder for today's date. I already asked this type of question here, but I need to make some changes and add new features:
The script checks for the previous date if today's is not found. But I need to check whether the previous date has already been processed so that I do not process it again. So, do I need to create a list for it?
This script checks only for one previous date. What if I have to check for the 2 previous days? Thanks for your help. Hope you understand my doubts.
Updated: This Perl script runs automatically, checking the current date against the folder name. The folder is a tar folder that is loaded from another server.
So, basically, I need to run the script when the folder name matches the current date.
Problem: Sometimes I get the folder the next day, and my Perl script checks only for the current date. The folder I get is named with the previous date (not the current date), so I have to process it manually. I need to automate this in my Perl script.
#!/usr/bin/perl
use strict;
use warnings;
use Cwd;
use DateTime;
use File::Copy;
# set to your desired time zone
my $today = DateTime->now( time_zone => "America/New_York" );
my $td = $today->strftime("%y-%m-%d");
# strongly recommended to do date math in the 'floating'/UTC zone
my $yesterday = $today->set_time_zone('floating')->subtract( days => 1);
my $yd = $yesterday->set_time_zone('America/New_York')->strftime("%y-%m-%d");
my $dir = shift or die "Provide path on command line. $!";
if ($dir eq '.') {
$dir = cwd;
}
elsif ($dir !~ /^\//) {
$dir = cwd() . "/$dir";
}
opendir my $dh, $dir or die $!;
my @dir = sort grep {-d and /$td/ || /$yd/} readdir $dh;
closedir $dh or die $!;
@dir or die "Found no date directories. $!";
my $dday = "$dir/$dir[-1]"; # is today unless today not found, then yesterday
my $fdir = '/some/example/path/';
my @gzfiles = glob("$dday/*tar.gz");
foreach my $zf (@gzfiles) {
next if (($zf =~ /BMP/) || ($zf =~ /LG/) || ($zf =~ /MAP/) || ($zf =~ /STR/));
print "$zf\n";
copy($zf, $fdir) or die "Unable to copy. $!";
}
Well, another way to do it, as suggested by mugen kenichi, is to use Storable. This way stores a hash with all processed directories in it. Then when you run your program, it can check the hash to see if they have been processed.
You would need a one-time script to set up the hash of processed directories.
#!/usr/bin/perl
use strict;
use warnings;
use Storable;
# This script to be run 1 time only. Sets up 'processed' directories hash.
# After this script is run, ready to run the daily script.
my $dir = '.'; # or what ever directory the date-directories are stored in
opendir my $dh, $dir or die "Opening failed for directory $dir $!";
my @dir = grep {-d && /^\d\d-\d\d-\d\d$/ && $_ le '11-04-21'} readdir $dh;
closedir $dh or die "Unable to close $dir $!";
my %processed = map {$_ => 1} @dir;
store \%processed, 'processed_dirs.dat';
Then, a script to be run periodically to find and process your date directories.
#!/usr/bin/perl
use strict;
use warnings;
use File::Copy;
use Storable;
my $dir = shift or die "Provide path on command line. $!";
my $processed = retrieve('processed_dirs.dat'); # $processed is a hashref
opendir my $dh, $dir or die "Opening failed for directory $dir $!";
my @dir = grep {-d && /^\d\d-\d\d-\d\d$/ && !$processed->{$_} } readdir $dh;
closedir $dh or die "Unable to close $dir $!";
@dir or die "Found no unprocessed date directories";
my $fdir = '/some/example/path';
for my $date (@dir) {
my $dday = "$dir/$date";
my @gzfiles = glob("$dday/*tar.gz");
foreach my $zf (@gzfiles) {
next if $zf =~ /BMP/ || $zf =~ /LG/ || $zf =~ /MAP/ || $zf =~ /STR/;
print "$zf\n";
copy($zf, $fdir) or die "Unable to copy $zf to $fdir. $!";
}
$processed->{ $date } = 1;
}
store $processed, 'processed_dirs.dat';
If you want to persist the status of whether these directories were processed beyond a single run of your app, you could create a .processed file in each directory and check for the existence of this file before you process the directory.
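A minimal sketch of that marker-file idea; here $dir is the parent directory and @dir the list of date-directories, as in the scripts above, and process_dir() is a hypothetical stand-in for your own processing:
for my $date (@dir) {
    my $dday   = "$dir/$date";
    my $marker = "$dday/.processed";
    next if -e $marker;    # skip directories handled on an earlier run

    process_dir($dday);    # hypothetical: your copying/processing goes here

    # drop the marker so the next run skips this directory
    open my $fh, '>', $marker or die "Can't create $marker: $!";
    close $fh;
}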
If you just need to store the status of these directories (processed or unprocessed) during the execution of your script, you could use a hash keyed with the directory name:
my %PROCESSED = ();
if ($processing_done) {
$PROCESSED{$dirname} = 1;
} else {
$PROCESSED{$dirname} = 0;
}
You can check to see if each directory has been processed by reading the key value from the hash:
if ($PROCESSED{$dirname} == 0) {
... do some processing
} else {
... this one is already done
}
This solution finds all directories yet to be processed that are newer than the most recent directory-date processed. You have to record that date manually the first time (before the script is run). The script will update it from that point on.
The file could be named like my $last = 'dir_last.dat'; I just created the file at the command line:
C:\Old_Data\perlp>echo 11-07-14 > dir_last.dat
C:\Old_Data\perlp>type dir_last.dat
11-07-14
C:\Old_Data\perlp>
This assumes the newest directory processed so far was 11-07-14. You must determine this yourself before running the script.
#!/usr/bin/perl
use strict;
use warnings;
use File::Copy;
my $dir = shift or die "Provide path on command line. $!";
my $last = 'dir_last.dat';
open my $fh, "<", $last or die "Unable to open $last $!";
chomp(my $last_proc = <$fh>);
close $fh or die "Unable to close $last $!";
opendir my $dh, $dir or die "Opening failed for directory $dir $!";
my @dir = sort grep {-d && /^\d\d-\d\d-\d\d$/ && $_ gt $last_proc} readdir $dh;
closedir $dh or die "Unable to close $dir $!";
@dir or die "Found no date directories after last update: $last_proc";
my $fdir = '/some/example/path';
for my $date (@dir) {
my $dday = "$dir/$date";
my @gzfiles = glob("$dday/*tar.gz");
foreach my $zf (@gzfiles) {
next if $zf =~ /BMP/ || $zf =~ /LG/ || $zf =~ /MAP/ || $zf =~ /STR/;
print "$zf\n";
copy($zf, $fdir) or die "Unable to copy $zf to $fdir. $!";
}
}
open $fh, ">", $last or die "Unable to open $last $!";
print $fh "$dir[-1]\n"; # record the newest date-directory as processed
close $fh or die "Unable to close $last $!";
Notice that I didn't rely on cwd like the first script. It really wasn't needed there and isn't needed here. opendir, glob and copy all can handle the dot (cwd) directory and relative paths.
The header includes the lines use strict; and use warnings;. Their purpose is to alert you to errors in your code (almost all Perl scripts should use them, unless an expert decides to exclude them - for what reason I don't know). The first line tells Unix where to find the interpreter (perl).

Perl program help on opendir and readdir

So I have a program with which I want to clean some text files. The program asks the user to enter the full path of a directory containing these text files. From there I want to read the files in the directory, print them to a new file (specified by the user), and then clean them in the way I need. I have already written the script that cleans the text files.
I ask the user for the directory to use:
chomp ($user_supplied_directory = <STDIN>);
opendir (DIR, $user_supplied_directory);
Then I need to read the directory.
my @dir = readdir DIR;
foreach (@dir) {
Now I am lost.
Any help please?
I'm not certain of what you want, so I made some assumptions:
When you say clean the text file, you mean delete the text file
The names of the files you want to write into are formed by a pattern.
So, if I'm right, try something like this:
chomp ($user_supplied_directory = <STDIN>);
opendir (DIR, $user_supplied_directory);
my @dir = readdir DIR;
foreach (@dir) {
next if (($_ eq '.') || ($_ eq '..'));
# Reads the content of the original file
open FILE, $_;
$contents = do { local $/; <FILE> }; # slurp the whole file, not just the first line
close FILE;
# Here you supply the new filename
$new_filename = $_ . ".new";
# Writes the content to the new file
open FILE, '>'.$new_filename;
print FILE $contents;
close FILE;
# Deletes the old file
unlink $_;
}
I would suggest that you switch to File::Find. It can be a bit of a challenge in the beginning but it is powerful and cross-platform.
But, to answer your question, try something like:
my @files = readdir DIR;
foreach $file (@files) {
foo("$user_supplied_directory/$file");
}
where "foo" is whatever you need to do to the files. A few notes might help:
using "#dir" as the array of files was a bit misleading
the folder name needs to be prepended to the file name to get the right file
it might be convenient to use grep to throw out unwanted files and subfolders, especially ".."
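For example, a one-line grep along these lines (a sketch; DIR and $user_supplied_directory are the handle and path from the question, and it assumes you only want plain files):
my @files = grep { !/^\.\.?$/ && -f "$user_supplied_directory/$_" } readdir DIR;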
I wrote something today that used readdir. Maybe you can learn something from it. This is just a part of a (somewhat) larger program:
our @Perls = ();
{
my $perl_rx = qr { ^ perl [\d.] + $ }x;
for my $dir (split(/:/, $ENV{PATH})) {
### scanning: $dir
my $absolute = ($dir =~ m{^/});
my $dirpath = $absolute ? $dir : "$cwd/$dir";
unless (chdir($dirpath)) {
warn "can't cd to $dirpath: $!\n";
next;
}
opendir(my $dot, ".") || next;
while ($_ = readdir($dot)) {
next unless /$perl_rx/o;
### considering: $_
next unless -f;
next unless -x _;
### saving: $_
push @Perls, "$dir/$_";
}
}
}
{
my $two_dots = qr{ [.] .* [.] }x;
if (grep /$two_dots/, @Perls) {
@Perls = grep /$two_dots/, @Perls;
}
}
{
my (%seen, $dev, $ino);
@Perls = grep {
($dev, $ino) = stat $_;
! $seen{$dev, $ino}++;
} @Perls;
}
The crux is push(@Perls, "$dir/$_"): filenames read by readdir are basenames only; they are not full pathnames.
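That is, whenever readdir results will be used as paths, something like this is needed (a generic sketch):
# readdir returns basenames only; prepend the directory to get usable paths
my @paths = map { "$dir/$_" } readdir $dh;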
You can do the following, which allows the user to supply their own directory or, if no directory is specified by the user, it defaults to a designated location.
The example shows the use of opendir and readdir, stores all the files in the directory in the @files array, and stores only the files that end with '.txt' in the @keys array. The while loop ensures that the full path to the files is stored in the arrays.
This assumes that your "text files" end with the ".txt" suffix. I hope that helps, as I'm not quite sure what's meant by "cleaning the files".
use feature ':5.24';
use File::Copy;
my $dir = shift || "/some/default/directory";
my ( @files, @keys );
opendir(my $dh, $dir) || die "Can't open $dir: $!";
while ( readdir $dh ) {
push( @files, "$dir/$_");
}
# store ".txt" files in new array
foreach my $file ( @files ) {
push( @keys, $file ) if $file =~ /\S+\.txt\z/;
}
# Move files to new location, even if it's across different devices
for ( @keys ) {
move( $_, "/some/other/directory/" ) || die "Couldn't move files: $!\n";
}
See the perldoc of File::Copy for more info.

How to get nested directory contents in Perl

I'm trying to write a script that will process certain files. The data is organized like this: there is a folder (let's call it X) where my script will be placed. In this same folder there is a subfolder called 'data'. This contains several more subfolders with various names, and each of these contains many files (no other subfolders, just files). I need to process all files in a subfolder (more specifically, run a function on each file) and then merge the results for all files in the subfolder, so for each folder there is one result (no matter how many files it contains).
The problem is, I'm not able to get to the files so I can run my function on them. What I have now is this:
$dirname = "data";
opendir ( DIR, $dirname ) || die "Error in opening dir $dirname\n";
while( ($dirname2 = readdir(DIR)) )
{
next if $dirname2 eq ".";
next if $dirname2 eq "..";
opendir ( DIR2, $dirname2 ) || die "Error in opening dir $dirname2\n";
while( ($file = readdir(DIR2)) )
{
next if $file eq ".";
next if $file eq "..";
print( "file:$file\n" );
}
closedir(DIR2);
}
closedir(DIR);
It always fails with the message "Error in opening dir alex". 'alex' happens to be the first directory in the data directory. My question is - where is the problem? Is this even the correct way to achieve what I'm trying to do? I'm also worried that this may fail if there is a file in the data folder, since I cannot open a file with opendir, or can I?
PS: sorry for the horrible Perl code - I'm still trying to learn this language.
Thanks,
Peter
You can try File::Path - Create or remove directory trees
Running your program, I think you have to specify the full path when opening a directory, i.e.,
opendir ( DIR2, $dirname."\\".$dirname2 ) || die "Error in opening dir $dirname2\n"; # running code on Windows
It will work; try it.
You can use File::Find to find files in nested directories.
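A minimal File::Find sketch for this layout; process_file() and merge_results() are hypothetical stand-ins for your own per-file function and per-folder merge:
use strict;
use warnings;
use File::Find;

my %results;    # accumulated results, keyed by subfolder of data/

find(
    {
        no_chdir => 1,    # keep $File::Find::name usable as a path
        wanted   => sub {
            return unless -f $File::Find::name;
            # capture the subfolder directly under data/
            my ($sub) = $File::Find::name =~ m{^data/([^/]+)/};
            return unless defined $sub;    # skips files sitting directly in data/
            push @{ $results{$sub} }, process_file($File::Find::name);
        },
    },
    'data',
);

# one merged result per subfolder, no matter how many files it held
merge_results( $_, $results{$_} ) for sort keys %results;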
Are you sure the folder contains only folders? Add an additional check:
next if !(-d $dirname2);
(Note that -d needs the parent path prepended as well, e.g. -d "$dirname/$dirname2", for the same reason opendir does.)
Here is a slightly cleaned up version of what was posted in the question.
use strict;
use warnings;
use autodie;
use File::Spec::Functions qw'catdir catfile';
my $dirname = "data";
{
opendir my $dir_h, $dirname;
while( my $dirname2 = readdir($dir_h) ){
next if $dirname2 eq ".";
next if $dirname2 eq "..";
$dirname2 = catdir( $dirname, $dirname2 );
next unless -d $dirname2;
opendir my $dir_h2, $dirname2;
while( my $file = readdir($dir_h2) )
{
next if $file eq ".";
next if $file eq "..";
$file = catfile($dirname2,$file);
if( -f $file ){
print( "file:$file\n" );
}
}
# $dir_h2 automatically closes here
}
# $dir_h automatically closes here
}
If you are going to run it on Perl versions earlier than 5.12.0 you should wrap the while loop's conditional with defined().
while( my $dirname2 = readdir($dir_h) ){
while( defined( my $dirname2 = readdir($dir_h) ) ){

How can I find the newest created file in a directory?

Is there an elegant way in Perl to find the newest file in a directory (newest by modification date)?
What I have so far searches for the files I need, gets each one's modification time, pushes the filename and modification time into an array, and then sorts it.
There must be a better way.
Your way is the "right" way if you need a sorted list (and not just the first, see Brian's answer for that). If you don't fancy writing that code yourself, use this
use File::DirList;
my @list = File::DirList::list('.', 'M');
Personally I wouldn't go with the ls -t method - that involves forking another program and it's not portable. Hardly what I'd call "elegant"!
Regarding rjray's hand-coded solution, I'd change it slightly:
opendir(my $DH, $DIR) or die "Error opening $DIR: $!";
my @files = map { [ stat "$DIR/$_", $_ ] } grep(! /^\.\.?$/, readdir($DH));
closedir($DH);
sub rev_by_date { $b->[9] <=> $a->[9] }
my @sorted_files = sort rev_by_date @files;
After this, @sorted_files contains the sorted list, where the 0th element is the newest file, and each element itself contains a reference to the results of stat, with the filename itself in the last element:
my @newest = @{$sorted_files[0]};
my $name = pop(@newest);
The advantage of this is that it's easier to change the sorting method later, if desired.
EDIT: here's an easier-to-read (but longer) version of the directory scan, which also ensures that only plain files are added to the listing:
my @files;
opendir(my $DH, $DIR) or die "Error opening $DIR: $!";
while (defined (my $file = readdir($DH))) {
my $path = $DIR . '/' . $file;
next unless (-f $path); # ignore non-files - automatically does . and ..
push(@files, [ stat(_), $path ]); # re-uses the stat results from '-f'
}
closedir($DH);
NB: the test for defined() on the result of readdir() is because a file called '0' would cause the loop to fail if you only test for if (my $file = readdir($DH))
You don't need to keep all of the modification times and filenames in a list, and you probably shouldn't. All you need to do is look at one file and see whether it's newer than the newest you've previously seen:
use File::Spec;
{
opendir my $dh, $dir or die "Could not open $dir: $!";
my( $newest_name, $newest_time ) = ( undef, 2**31 -1 );
while( defined( my $file = readdir( $dh ) ) ) {
my $path = File::Spec->catfile( $dir, $file );
next if -d $path; # skip directories, or anything else you like
( $newest_name, $newest_time ) = ( $file, -M _ ) if( -M $path < $newest_time );
}
print "Newest file is $newest_name\n";
}
You could try using the shell's ls command:
@list = `ls -t`;
$newest = $list[0];
Assuming you know the $DIR you want to look in:
opendir(my $DH, $DIR) or die "Error opening $DIR: $!";
my %files = map { $_ => (stat("$DIR/$_"))[9] } grep(! /^\.\.?$/, readdir($DH));
closedir($DH);
my @sorted_files = sort { $files{$b} <=> $files{$a} } (keys %files);
# $sorted_files[0] is the most-recently modified. If it isn't the actual
# file-of-interest, you can iterate through @sorted_files until you find
# the interesting file(s).
The grep that wraps the readdir filters out the "." and ".." special files in a UNIX(-ish) filesystem.
If you can't let ls do the sorting for you as @Nathan suggests, then you can optimize your process by keeping only the newest modification time and associated filename seen so far, replacing them every time you find a newer file in the directory. There's no need to keep around any files that you know are older than the newest one you've seen so far, and certainly no need to sort them, since you can detect the newest one while reading from the directory.
The subject is old, but maybe someone will try this - it isn't portable (Unix-like systems only), but it's quite simple and works:
chdir $directory or die "cannot change directory";
my $newest_file = `bash -c 'ls -t | head -1'`;
chomp $newest_file;
print "$newest_file \n";