Perl script does not find the newest file - perl

I use the following Perl script from "https://exchange.nagios.org/directory/Plugins/Operating-Systems/Linux/Check-Newest-files-age-and-size-in-Diredtory/"
But there is an error in this script: it does not find the newest file. Can someone spot the mistake? In the comments on that site somebody wrote that the mistake is in line 22, but I can't find it.
Here is the code:
# Check that file exists (can be directory or link)
unless (-d $opt_f) {
    print "FILE_AGE CRITICAL: Folder not found - $opt_f\n";
    exit $ERRORS{'CRITICAL'};
}
my $list = opendir DIRHANDLE, $opt_f or die "Cant open directory: $!";
while ($_ = readdir(DIRHANDLE))
{
    $file = sprintf("%s/%s", $opt_f, $_);
    $attrs = stat("$file");
    $diff = time() - $attrs->mtime;
    if ($temp == 0)
    {
        #$temp=$diff;
        $new = $file;
    }
    if ($_ ne "." && $_ ne "..")
    {
        if ($diff < $temp)
        {
            $temp = $diff;
            $new = $_;
        }
        else
        {
            $temp = $diff;
            $new = $_;
        }
    }
}
$st = File::stat::stat($opt_f . "/" . $new);
$age = time - $st->mtime;
$size = $st->size;
Example:
I have some files on a filer (backups in .img files). I use this script to check the newest file's size. If I create a new folder with a new file, the check looks at the correct file. But if I create a second file, the check still looks at the old file every time. If I create a third file, the check finds the correct file. The fourth file is wrong and the fifth file is correct again (and so on).

An easy (easier?) way to do this would be to use the built-in glob function to read the directory instead of opening it, and then use simple file tests to sort the files by modification time:
my @files = sort { -M $a <=> -M $b } glob "*";  # -M: days since last modification
# (-C uses inode-change time on Unix, creation time on Windows)
# $files[0] is the newest file
A list of file test operators is at
https://users.cs.cf.ac.uk/Dave.Marshall/PERL/node69.html
Note that -C and -M relate to when the script started, so for long-running or daemon scripts you might need to do something a bit different.
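For instance, here is a minimal sketch that compares absolute epoch mtimes instead of script-relative ages (assuming File::stat and that the files of interest are in the current directory):
use File::stat;
my @files = grep { -f } glob "*";
my ($newest) = sort { stat($b)->mtime <=> stat($a)->mtime } @files;
# $newest holds the file with the largest (most recent) epoch mtime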

You want to find the largest (most recent) mtime, so we're talking about a simple comparison of the previously-found largest mtime with the mtime of the current file. But there's so much code beyond that in what you posted... and the first thing you do with the value you want to compare is change it? What?
Let's start over.
use File::stat;          # makes stat() return an object with an mtime method
use feature qw( say );

opendir(my $dh, $dir_qfn)
    or die("Can't open directory \"$dir_qfn\": $!\n");

my $latest_mtime = -1;
my $latest_qfn;
while (defined( my $fn = readdir($dh) )) {
    next if $fn =~ /^\.\.?\z/;      # skip "." and ".."
    my $qfn = "$dir_qfn/$fn";
    my $stats = stat($qfn)
        or warn("Can't stat \"$qfn\": $!\n"), next;
    my $mtime = $stats->mtime;
    if ($mtime > $latest_mtime) {   # a larger mtime means a newer file
        $latest_mtime = $mtime;
        $latest_qfn   = $qfn;
    }
}
if (defined($latest_qfn)) {
    say $latest_qfn;
}

The biggest issue with the script seems to be that $attrs = stat("$file") calls the core version of stat, but the next line expects its return value to be that of File::stat::stat(). I suspect that the test for '.' or '..' should be done at the top of the while loop, and all the variables should be defined before they are used.
As Jeremy has said, you're better off sorting an array of the files and pushing/popping the first/last value, depending on what you're looking for.
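A minimal sketch of that sorting approach (assuming File::stat and the $opt_f directory variable from the question):
use File::stat;
my @sorted = sort { stat($a)->mtime <=> stat($b)->mtime }
             grep { -f } glob "$opt_f/*";
my $newest = $sorted[-1];   # the last element has the largest mtime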

Related

Save contents of those files which contain a specific known string in a single .txt or .tmp file using perl

I'm trying to write a Perl script that saves the whole contents of every file that contains a specific string, 'PYAG_GENERATED', into a single .txt/.tmp file, one file after another. The file names follow a specific pattern, 'output_nnnn.txt', where nnnn is 0001, 0002 and so on, but I don't know how many files with this 'output_nnnn.txt' name are present.
I'm new to Perl and I don't know how to resolve this issue to get the output correctly. Can anyone help me? Thanks in advance.
I've tried to write the Perl script in different ways but nothing comes out in the output file. I'm giving here one of my attempts. 'new_1.txt' is the new file where I want to save the expected output, and "PYAG_GENERATED" is the specific string I'm looking for in the files.
open(NEW,">>new_1.txt") or die "could not open:$!";
$find2="PYAG_GENERATED";
$n='0001';
while('output_$n.txt'){
    if(/find2/){
        print NEW;
    }
    $n++;
}
close NEW;
close NEW;
I expect that the output file 'new_1.txt' will contain the whole contents of the files (with filename pattern 'output_nnnn.txt') which have the 'PYAG_GENERATED' string at least once inside.
Well, you tried I guess.
Welcome to the wonderful world of Perl, where there are always a dozen ways of doing X :-) Here's one possible way to achieve what you want. I put in a lot of comments I hope are helpful. It's also a bit verbose for the sake of clarity. I'm sure it could be golfed down to 5 lines of code.
use warnings; # Always start your Perl code with these two lines,
use strict; # and Perl will tell you about possible mistakes
use experimental 'signatures';
use File::Slurp;
# this is a subroutine/function, a block of code that can be called from
# somewhere else. it takes two arguments that the caller must provide
sub find_in_file( $filename, $what_to_look_for )
{
    # the open function opens $filename for reading
    # (that's what the "<" means, ">" stands for writing)
    # if successful, open returns true and we have a "file handle" in the variable $in
    # if not, open returns false ...
    open( my $in, "<", $filename )
        or die $!; # ... and the program will exit here. The variable $! will contain the error message
    # now we read the file using a loop
    # readline will give us the next line in the file
    # or something false when there is nothing left to read
    while ( my $line = readline($in) )
    {
        # now we test whether the current line contains what
        # we are looking for.
        # the index function gives us the position of a string within another string,
        # or -1 if it is not found. positions start at 0,
        # so for example index("abc", "c") will give us 2
        if ( index( $line, $what_to_look_for ) >= 0 )
        {
            # we found what we were looking for
            # so we don't need to keep looking in this file anymore
            # so we must first close the file
            close( $in );
            # and then we indicate to the caller that the search was a success
            # this will immediately end the subroutine
            return 1;
        }
    }
    # If we arrive here the search was unsuccessful
    # so we tell that to the caller
    return 0;
}
# Here starts the main program
# First we get a list of files we want to look at
my @possible_files = glob( "where/your/files/are/output_*.txt" );
# Here we will store the files that we are interested in, aka that contain PYAG_GENERATED
my @wanted_files;
# and now we can loop over the files and see if they contain what we are looking for
foreach my $filename ( @possible_files )
{
    # here we use the function we defined earlier
    if ( find_in_file( $filename, "PYAG_GENERATED" ) )
    {
        # with push we can add things to the end of an array
        push @wanted_files, $filename;
    }
}
# We are finished searching, now we can start adding the files together
# if we found any
if ( scalar @wanted_files > 0 )
{
    # Now we could code that ourselves: open the files, loop through them and write out
    # line by line. But we make life easy for us and just
    # use two functions from the module File::Slurp, which comes with Perl I believe.
    # If not you have to install it
    foreach my $filename ( @wanted_files )
    {
        append_file( "new_1.txt", read_file( $filename ) );
    }
    print "Output created from " . (scalar @wanted_files) . " files\n";
}
else
{
    print "No input files\n";
}
use strict;
use warnings;
my @a;
my $i=1;
my $find1="PYAG_GENERATED";
my $n=1;
my $total_files=47276; # got this no. of files by running 'ls' in the terminal
while($n<=$total_files){
    open(NEW,"<output_$n.txt") or die "could not open:$!";
    my $join=join('',<NEW>);
    $a[$i]=$join;
    #print "$a[10]";
    $n++;
    $i++;
}
close NEW;
for($i=1;$i<=$total_files;$i++){
    if($a[$i]=~m/$find1/){
        open(NEW1,">>new_1.tmp") or die "could not open:$!";
        print NEW1 $a[$i];
    }
}
close NEW1;

How to find a file which exists in different directories under a given path in Perl

I'm looking for a method to find a file which resides in several directories under a given path. In other words, those directories all contain files with the same filename. My script seems to have a hierarchy problem in locating the correct path to grep the filenames for processing. I have a fixed path as input, and the script needs to look into that path and find the files there, but my script seems to stop two tiers up and process from there, rather than looking into the last directories in the tier (in my case it processes at "ln" and "nn" and starts the subroutine there).
The fixed input path is:
/nfs/disks/version_2.0/
The files that I want to post-process with a subroutine exist under several directories, as below. Basically I want to check whether file1.abc exists in all of the directories temp1, temp2 & temp3 under the ln directory, and likewise whether file2.abc exists in temp1, temp2, temp3 under the nn directory.
The full paths of the files that I want to check look like this:
/nfs/disks/version_2.0/dir_a/ln/temp1/file1.abc
/nfs/disks/version_2.0/dir_a/ln/temp2/file1.abc
/nfs/disks/version_2.0/dir_a/ln/temp3/file1.abc
/nfs/disks/version_2.0/dir_a/nn/temp1/file2.abc
/nfs/disks/version_2.0/dir_a/nn/temp2/file2.abc
/nfs/disks/version_2.0/dir_a/nn/temp3/file2.abc
My script is as below:
#! /usr/bin/perl -w
my $dir = '/nfs/fm/disks/version_2.0/' ;
opendir(TEMP, $dir) || die $! ;
foreach my $file (readdir(TEMP)) {
    next if ($file eq "." || $file eq "..") ;
    if (-d "$dir/$file") {
        my $d = "$dir/$file";
        print "Directory:- $d\n" ;
        &getFile($d);
        &compare($file) ;
    }
}
Note that I put the print "Directory:- $d\n"; there for debugging purposes, and it printed this:
/nfs/disks/version_2.0/dir_a/
/nfs/disks/version_2.0/dir_b/
So I knew it went into the wrong path and processed the following subroutine from there.
Can somebody help point out where the error in my script is? Thanks!
To be clear: the script is supposed to recurse through a directory and look for files with a particular filename? In this case, I think the following code is the problem:
if (-d "$dir/$file") {
    my $d = "$dir/$file";
    print "Directory:- $d\n" ;
    &getFile($d);
    &compare($file) ;
}
I'm assuming the &getFile($d) is meant to step into a directory (i.e., the recursive step). This is fine. However, it looks like the &compare($file) is the action that you want to take when the object that you're looking at isn't a directory. Therefore, that code block should look something like this:
if (-d "$dir/$file") {
    &getFile("$dir/$file"); # the recursive step, for directories inside of this one
} elsif( -f "$dir/$file" ){
    &compare("$dir/$file"); # the action on files inside of the current directory
}
The general pseudo-code should look like this:
sub myFind {
    my $dir = shift;
    opendir( my $dh, $dir ) or die $!;
    foreach my $file ( readdir $dh ){
        next if $file eq "." || $file eq "..";
        my $obj = "$dir/$file";
        if( -d $obj ){
            myFind( $obj );
        } elsif( -f $obj ){
            doSomethingWithFile( $obj );
        }
    }
    closedir $dh;
}
myFind( "/nfs/fm/disks/version_2.0" );
As a side note: this script is reinventing the wheel. You only need to write a script that does the processing on an individual file. You could do the rest entirely from the shell:
find /nfs/fm/disks/version_2.0 -type f -name "the-filename-you-want" -exec your_script.pl {} \;
Wow, it's like reliving the 1990s! Perl code has evolved somewhat, and you really need to learn the new stuff. It looks like you learned Perl in version 3.0 or 4.0. Here are some pointers:
Use use warnings; instead of -w on the command line.
Use use strict;. This will require you to predeclare variables using my which will scope them to the local block or the file if they're not in a local block. This helps catch a lot of errors.
Don't put & in front of subroutine names.
Use and, or, and not instead of &&, ||, and !.
Learn about Perl Modules which can save you a lot of time and effort.
When someone says detect duplicates, I immediately think of hashes. If you use a hash based upon your file's name, you can easily see if there are duplicate files.
Of course a hash can only have a single value for each key. Fortunately, in Perl 5.x, that value can be a reference to another data structure.
So, I recommend you use a hash that contains a reference to a list (array in old parlance). You can push each instance of the file to that list.
Using your example, you'd have a data structure that looks like this:
%file_hash = (
    'file1.abc' => [
        '/nfs/disks/version_2.0/dir_a/ln/temp1',
        '/nfs/disks/version_2.0/dir_a/ln/temp2',
        '/nfs/disks/version_2.0/dir_a/ln/temp3',
    ],
    'file2.abc' => [
        '/nfs/disks/version_2.0/dir_a/nn/temp1',
        '/nfs/disks/version_2.0/dir_a/nn/temp2',
        '/nfs/disks/version_2.0/dir_a/nn/temp3',
    ],
);
And, here's a program to do it:
#! /usr/bin/env perl
#
use strict;
use warnings;
use feature qw(say);  # `say` is like `print`, but adds a trailing "\n"
use File::Basename;   # imports `dirname` and `basename` commands
use File::Find;       # implements the Unix `find` command
use constant DIR => "/nfs/disks/version_2.0";

# Find all duplicates
my %file_hash;
find (\&wanted, DIR);

# Print out all the duplicates
foreach my $file_name (sort keys %file_hash) {
    if (scalar (@{$file_hash{$file_name}}) > 1) {
        say qq(Duplicate File: "$file_name");
        foreach my $dir_name (@{$file_hash{$file_name}}) {
            say "    $dir_name";
        }
    }
}

sub wanted {
    return if not -f $_;
    if (not exists $file_hash{$_}) {
        $file_hash{$_} = [];
    }
    push @{$file_hash{$_}}, $File::Find::dir;
}
Here are a few things about File::Find:
The work takes place in the subroutine wanted.
The $_ is the name of the file, and I can use this to see if this is a file or directory.
$File::Find::name is the full name of the file including the path.
$File::Find::dir is the name of the directory.
If the array reference doesn't exist, I create it with the $file_hash{$_} = [];. This isn't necessary, but I find it comforting, and it can prevent errors. To use $file_hash{$_} as an array, I have to dereference it. I do that by putting a @ in front of it: @{$file_hash{$_}}.
Once all the files are found, I can print out the entire structure. The only thing I do is check to make sure there is more than one member in each array. If there's only a single member, then there are no duplicates.
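For instance, a minimal illustration of that dereference (the data is hypothetical):
my %file_hash = ( 'file1.abc' => [ '/tmp/a', '/tmp/b' ] );
my @dirs  = @{ $file_hash{'file1.abc'} };  # dereference the array reference
my $count = scalar @dirs;                  # 2 members, so it's a duplicate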
Response to Grace
Hi David W., thank you very much for your explanation and sample script. Sorry, maybe I was not really clear in defining my problem statement. I think I can't use a hash for the path finding in the data structure, since there are a few hundred file*.abc, the number is undetermined, and each file*.abc may have the same filename but actually differ in content across the directory structures.
For example, the file1.abc residing under "/nfs/disks/version_2.0/dir_a/ln/temp1" does not have the same content as the file1.abc residing under "/nfs/disks/version_2.0/dir_a/ln/temp2" and "/nfs/disks/version_2.0/dir_a/ln/temp3". My intention is to grep the list of files*.abc in each of the directory structures (temp1, temp2 and temp3) and compare the filename list with a master list. Could you help shed some light on how to solve this? Thanks. – Grace yesterday
I'm just printing the file in my sample code, but instead of printing the file, you could open them and process them. After all, you now have the file name and the directory. Here's the heart of my program again. This time, I'm opening the file and looking at the content:
foreach my $file_name (sort keys %file_hash) {
    if (scalar (@{$file_hash{$file_name}}) > 1) {
        #say qq(Duplicate File: "$file_name");
        foreach my $dir_name (@{$file_hash{$file_name}}) {
            #say "    $dir_name";
            open (my $fh, "<", "$dir_name/$file_name")
                or die qq(Can't open file "$dir_name/$file_name" for reading);
            # Process your file here...
            close $fh;
        }
    }
}
If you are only looking for certain files, you could modify the wanted function to skip over files you don't want. For example, here I am only looking for files which match the file*.txt pattern. Note I use a regular expression of /^file.*\.txt$/ to match the name of the file. As you can see, it's the same as the previous wanted subroutine. The only difference is my test: I'm looking for something that is a file (-f) and has the correct name (file*.txt):
sub wanted {
    return unless -f $_ and /^file.*\.txt$/;
    if (not exists $file_hash{$_}) {
        $file_hash{$_} = [];
    }
    push @{$file_hash{$_}}, $File::Find::dir;
}
If you are looking at the file contents, you can use an MD5 hash to determine whether the file contents match. This reduces a file to a 128-bit value (32 hex characters) which could even be used as a hash key instead of the file name. This way, files that have matching MD5 hashes (and thus matching contents) would be in the same hash list.
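For example, using Digest::file (the same module the program below uses); $path here is a hypothetical file name:
use Digest::file qw(digest_file_hex);
my $md5 = digest_file_hex($path, "MD5");   # 32 hex characters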
You talk about a "master list" of files and it seems you have the idea that this master list needs to match the content of the file you're looking for. So, I'm making a slight mod in my program. I am first taking that master list you talked about, and generating MD5 sums for each file. Then I'll look at all the files in that directory, but only take the ones with the matching MD5 hash...
By the way, this has not been tested.
#! /usr/bin/env perl
#
use strict;
use warnings;
use feature qw(say);  # `say` is like `print`, but adds a trailing "\n"
use File::Find;       # implements the Unix `find` command
use Digest::file qw(digest_file_hex);

use constant DIR             => "/nfs/disks/version_2.0";
use constant MASTER_LIST_DIR => "/some/directory";

# First, I'm going through the MASTER_LIST_DIR directory
# and finding all of the master list files. I'm going to take
# the MD5 hash of those files, and store them in a Perl hash
# that's keyed by the name of the file. Thus, when I find a
# file with a matching name, I can compare the MD5 of that file
# and the master file. If they match, the files are the same. If
# not, they're different.
# In this example, I'm inlining the function I use to find the files
# instead of making it a separate function.
my %master_hash;
find (
    sub {
        $master_hash{$_} = digest_file_hex($_, "MD5") if -f;
    },
    MASTER_LIST_DIR
);
# Now I have the MD5 of all the master files, I'm going to search my
# DIR directory for the files that have the same MD5 hash as the
# master list files did. If they do have the same MD5 hash, I'll
# print out their names as before.
my %file_hash;
find (\&wanted, DIR);

# Print out all the duplicates
foreach my $file_name (sort keys %file_hash) {
    if (scalar (@{$file_hash{$file_name}}) > 1) {
        say qq(Duplicate File: "$file_name");
        foreach my $dir_name (@{$file_hash{$file_name}}) {
            say "    $dir_name";
        }
    }
}
# The wanted function has been modified since the last example.
# Here, I'm only going to put files in the %file_hash if they
# have the same MD5 hash as the master file of the same name.
sub wanted {
    return unless -f $_ and exists $master_hash{$_};
    if ( $master_hash{$_} eq digest_file_hex($_, "MD5") ) {
        $file_hash{$_} //= [];  # Using TLP's syntax hint
        push @{$file_hash{$_}}, $File::Find::dir;
    }
}

How could I auto-increment a file and have it reset

I am trying to figure out a way to organize files in a folder with
perl. The basis of my script is sort of simple. The script reads the
subfolders in the root folder, goes through each folder, and uses a
simple comparison of the first 3 digits in the file name to the 3
digit folder name. If they do not match, it puts it into the right
place.
My problem is that while looking through the files I sometimes run
into a duplicate order number. Because the order already exists I
cannot overwrite the original, but the two files cannot exist in the
same place for obvious reasons. So I had come up with the idea to
append the word _TEMP to the end of the file name in order to move
them and they can be renamed later. The problem I am running into now
is when I already have two duplicates. I am looking for a way to
allow the TEMP tag to increment by 1 every time it is used and then
reset to zero each time the loop starts again. I am just not really
sure where I should implement this idea.
Here is the main routine for the script:
foreach my $office (keys %office_names) {
    make_junk_folder($office);
    # The matches and unmatches come back as array references!
    my ($returned_matches, $returned_unmatches) = read_root_folder($office);
    foreach my $folder (@$returned_matches) {
        my $returned_files = read_subfolder($office, $folder);
        foreach my $file (@$returned_files) {
            analyze_file($office,$folder,$file);
        }
    }
    foreach my $folder (@$returned_unmatches) {
        print "$folder\n";
        remove_root_junk($office,$folder);
    }
}
Here is the subroutine that handles the file moving and renaming:
sub analyze_file {
    my $office = shift;
    my $folder = shift;
    my $file   = shift;
    my $order_docs_path = $office_names{$office};
    if ($file =~ /^(?<office> (C[AFL]|ME)) (?<folder_num> \d{3})
                   (?<file_num> \d{3}) ([_|\-] \d+)? \. (?<file_ext> pdf)
                   $/xmi) {
        my $file_office = uc($+{office});
        my $folder_num  = $+{folder_num};
        my $file_num    = $+{file_num};
        my $file_ext    = lc($+{file_ext});
        # Change hyphens to an underscore
        $file_num =~ s/\-/_/;
        my $file_name = "$file_office" . "$folder_num" . "$file_num" .
                        "\." . "$file_ext";
        my $temp_name = "$file_office" . "$folder_num" . "$file_num" .
                        "_TEMP" . "\." . "$file_ext";
        if ($folder != $folder_num) {
            # If the folder does not exist create the folder
            if (! -e "$order_docs_path\\$folder_num") {
                system "mkdir $order_docs_path\\$folder_num";
            }
            # Check to see if the file already exists
            if ( -e "$order_docs_path\\$folder_num\\$file_name") {
                # Append the file with a "_TEMP". These files are
                # missorted pages belonging to a larger document
                rename ("$order_docs_path\\$folder\\$file",
                        "$order_docs_path\\$folder_num\\$temp_name");
            } else {
                # Moves the file to correct place, these are mismatched files
                rename ("$order_docs_path\\$folder\\$file",
                        "$order_docs_path\\$folder_num\\$file_name");
            }
        } else {
            # Files are in the correct place, the file name will be
            # corrected only
            rename ("$order_docs_path\\$folder\\$file",
                    "$order_docs_path\\$folder_num\\$file_name");
        }
    }
}
Some example filenames look like this:
CF100145.pdf
CA310244.pdf
CL211745.pdf
CL211745_1.pdf (This denotes a second page, from our document scanner)
ME102103.pdf
Problems occur when the same job is changed: rather than putting the file where it should go, the person places the updated job information into the current job folder, so the directory does not match the first 3 numbers in the file name. These later have to be sorted out to resolve errors; the problem is there are over 500,000 documents in just one office, and we have 4 offices.
I would change this part:
if ( -e "$order_docs_path\\$folder_num\\$file_name") {
    # Append the file with a "_TEMPn". These files are
    # missorted pages belonging to a larger document
    my $n = 1;
    while (-e "$order_docs_path\\$folder_num\\$temp_name") {
        $temp_name =~ s/TEMP\d*/TEMP$n/;
        $n++;
    }
    rename ("$order_docs_path\\$folder\\$file",
            "$order_docs_path\\$folder_num\\$temp_name");
} # ...
Alternatively, you can append _TEMP_OLD_FOLDER_NAME to it. Since the old folder held only one file with that name, no two files will end up with the same new name.

How can I determine the next number to assign a file when attempting to organize a directory?

I am working on a perl script to organize a folder we have that holds all of our order documents. The script works for the most part except for a curve ball someone threw at me the other day.
The problem is when we have an order that we have redone recently. We are land surveyors and sometimes we will do a survey and then a few years later we will do what is called a "flyby" where we will go back and "append" the order with another file either noting changes to the land or just simply say that everything is okay and nothing has changed.
Where this causes a problem for me is that the new file we make has the same order number / document number as the older document. For example we might have a document named CF145323; that document would then have several single-page PDF files named CF145323.pdf, *_1.pdf, *_2.pdf and so on.
What I am looking for is a way to modify my script to count the files it finds and determine / predict the next file number to come. So if there were files *_1.pdf through *_3.pdf, I want Perl to take the mismatched file and make it *_4.pdf. Follow me?
The other catch is that the files are sometimes in different folders that do not match the first number in the file name. That part I seem to have figured out; it's just the numbering I don't have worked out.
Also I am working in Windows so I cannot use any Linux commands.
Here is my script in the last state I left it:
#!/usr/bin/perl
use strict;
use warnings;

# Root folder for Order Documents
my $orders_root = "C:\\Users\\Ian\\Desktop\\Order_docs";
# Keep track of how many files are processed
my $files_counter = 0;
# Keep track of how many junk files are processed
my $junk_counter = 0;
# Store a list of folders that match the 3 number naming scheme
my @matched_folders;

# Create a place to move junk files into
if (! -e "$orders_root\\Junk") {
    system "mkdir $orders_root\\Junk";
}

# Clear the screen
system "cls";
print "Processing files, please wait...\n\n";

# Open $order_dir_root
opendir(ORDERS_ROOT, "$orders_root") or die $!;
# Collect a list of all sub folders
my @folders = readdir(ORDERS_ROOT);
# Close $order_dir_root
closedir(ORDERS_ROOT);
# Remove the directories "." and ".." from the output
splice @folders, 0, 2;

foreach my $folder (@folders) {
    # Filter out all directories that don't match the numbering system
    if ($folder =~ / \d{3} /xm) {
        # If the folder matches the expression above, add it to the list of matched folders
        push @matched_folders, $folder;
        # Open each folder inside of the Order Documents root
        opendir(CURRENT_FOLDER, "$orders_root\\$folder");
        # Foreach folder opened, collect a list of files in the folder for sorting
        my @files = readdir(CURRENT_FOLDER);
        # Close the current folder
        closedir(CURRENT_FOLDER);
        # Remove the directories "." and ".." from the output
        splice @files, 0, 2;
        foreach my $file (@files) {
            # Match each file to the standard naming scheme
            if ($file =~ /^ (C[AFL]|ME) \d{3} \d{3}([_|\-] \d+)? \. pdf /xmi) {
                ++$files_counter;
            # If that file does not match, move it to a junk folder
            } else {
                ++$junk_counter;
                rename ("$orders_root\\$folder\\$file", "$orders_root\\Junk\\$file");
            } # End pdf match
        } # End foreach $file
    } # End folder match
} # End foreach $folder

foreach my $folder (@matched_folders) {
    # Open $folder
    opendir(CURRENT_FOLDER, "$orders_root\\$folder");
    # Collect a list of all sub folders
    my @files = readdir(CURRENT_FOLDER);
    # Close $folder
    closedir(CURRENT_FOLDER);
    splice @files, 0, 2;
    foreach my $file (@files) {
        if ($file =~ /^ (?<office> (C[AFL]|ME)) (?<folder_num> \d{3}) (?<file_num> \d{3}([_|\-] \d+)?) \. (?<file_ext> pdf) /xmi) {
            my $office     = uc($+{office});
            my $folder_num = $+{folder_num};
            my $file_num   = $+{file_num};
            my $file_ext   = lc($+{file_ext});
            # Change hyphens to an underscore
            $file_num =~ s/\-/_/;
            my $file_name   = "$office" . "$folder_num" . "$file_num" . "\." . "$file_ext";
            my $fly_by_name = "$office" . "$folder_num" . "$file_num" . "_FB" . "\." . "$file_ext";
            # Check if the current file belongs in the current folder
            if ($folder != $folder_num) {
                # If the folder does not exist create the folder
                if (! -e "$orders_root\\$folder_num") {
                    system "mkdir $orders_root\\$folder_num";
                }
                # Check to see if the file already exists
                if (! -e "$orders_root\\$folder_num\\$file_name") {
                    # Moves the file to correct place, these are mismatched files
                    rename ("$orders_root\\$folder\\$file", "$orders_root\\$folder_num\\$file_name");
                } else {
                    # Appends the file with a "_#" where # is equal to 1 + the last file number, these files are fly bys
                    rename ("$orders_root\\$folder\\$file", "$orders_root\\$folder_num\\$fly_by_name");
                }
            # Files are in the correct place, the file name will be corrected only
            } else {
                rename ("$orders_root\\$folder\\$file", "$orders_root\\$folder_num\\$file_name");
            }
        } # End $file match
    } # End foreach $file
} # End foreach $folder

# Show statistics after processing
print "Done!\n\n";
print "$#folders folders processed\n";
print "$files_counter files processed\n";
print "$junk_counter junk files removed\n";
Your script is rather large to wade through, but I suggest a different approach.
First, and most obvious, is something like this:
my $base = "CF145323";
my $num = 1;
$num++ while -f "${base}_$num.pdf";
my $filename = "${base}_$num.pdf";
print "$filename\n";
In other words, see if the file already exists. You would have to modify this to test the various directories you hold the files in, and this won't work if there are gaps in the numbering sequence.
It might be easier to keep a record of each file and the latest generation of it. Typically that would be in a hash, using, for example, 'CF145323' as the key and the latest version number as its value. The hash can be saved and restored using the Storable module (very easy to use, and in the Perl core).
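A minimal sketch of that bookkeeping with Storable (the file name 'versions.db' and the base name are hypothetical):
use Storable qw(store retrieve);
my %latest = -e 'versions.db' ? %{ retrieve('versions.db') } : ();
my $num = ++$latest{'CF145323'};     # next version number for this base name
my $filename = "CF145323_$num.pdf";
store \%latest, 'versions.db';       # persist the counters for the next run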

Nested while loop which does not seem to keep variables appropriately

I'm an amateur Perl coder, and I'm having a lot of trouble figuring out what is causing this particular issue. It seems as though it's a variable issue.
sub patch_check {
    my $pline;
    my $sline;
    while (<SYSTEMINFO>) {
        chomp($_);
        $sline = $_;
        while (<PATCHLIST>) {
            chomp($_);
            $pline = $_;
            print "sline $sline pline $pline underscoreline $_ "; # troubleshooting
            print "$sline - $pline\n";
            if ($pline =~ /($sline)/) {
                #print " - match $pline -\n";
            }
        } # end while
    }
}
There is more code, but I don't think it is relevant. When I print $sline in the first loop it works fine, but not in the second loop. I tried making the variables global, but that did not work either.
The point of the subroutine is: I want to open one file (patches) and see whether each line is in the other (systeminfo). I also tried reading the files into arrays and doing foreach loops.
Does anyone have another solution?
It looks like your actual goal here is to find lines which are in both files, correct? The normal (and much more efficient! - it only requires you to read in each file once, rather than reading all of one file for each line in the other) way to do this in Perl would be to read the lines from one file into a hash, then use hash lookups on each line in the other file to check for matches.
Untested (but so simple it should work) code:
sub patch_check {
    my %slines;
    while (<SYSTEMINFO>) {
        # Since we'll just be comparing one file's lines
        # against the other file's lines, there's no real
        # reason to chomp() them
        $slines{$_}++;
    }
    # %slines now has all lines from SYSTEMINFO as its
    # keys and the values are the number of times the
    # line appears, in case that's interesting to you
    while (<PATCHLIST>) {
        print "match: $_" if exists $slines{$_};
    }
}
Incidentally, if you're reading your data from SYSTEMINFO and PATCHLIST, then you're doing it the old-fashioned way. When you get a chance, read up on lexical filehandles and the three-argument form of open if you're not already familiar with them.
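For reference, a minimal sketch of that style (the file name is hypothetical):
open my $systeminfo, '<', 'systeminfo.txt'
    or die "Can't open systeminfo.txt: $!";
while ( my $line = <$systeminfo> ) {
    # process $line ...
}
close $systeminfo;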
Your code is not entering the PATCHLIST while loop the 2nd time through the SYSTEMINFO while loop because you already read all the contents of PATCHLIST the first time through. You'd have to re-open the PATCHLIST filehandle to accomplish what you're trying to do.
That's a pretty inefficient way to see if the lines of one file match the lines of another file. Take a look at grep with the -f flag for another way.
grep -f PATCHFILE SYSTEMINFO
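Note that grep -f treats each line of PATCHFILE as a regular expression; if the patch list holds literal strings, GNU grep's -F (fixed strings) and -x (match whole lines) flags are probably what you want:
grep -Fxf PATCHFILE SYSTEMINFO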
What I like to do in such cases is: read one file and create hash keys from the values you are looking for. Then read the second file and check whether those keys exist. This way you have to read each file only once.
Here is example code, untested:
sub patch_check {
    my %patches = ();
    open(my $PatchList,  '<', "patch.txt")      or die $!;
    open(my $SystemInfo, '<', "SystemInfo.txt") or die $!;
    while ( my $PatchRow = <$PatchList> ) {
        $patches{$PatchRow} = 0;
    }
    while ( my $SystemRow = <$SystemInfo> ) {
        if ( exists $patches{$SystemRow} ) {
            # The patch is in SystemInfo
            # Do whatever you want
        }
    }
}
You cannot read one file inside the read loop of another: after the first pass, the inner filehandle is exhausted. Slurp one file into an array first, then loop over that array (the inner loop) inside the read loop of the other file (the outer loop).
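A minimal sketch of that restructuring (file names and lexical handles are hypothetical stand-ins for the bareword handles above):
open my $patch_fh, '<', 'patchlist.txt' or die $!;
my @patches = <$patch_fh>;             # slurp the whole patch file once
close $patch_fh;

open my $sys_fh, '<', 'systeminfo.txt' or die $!;
while ( my $sline = <$sys_fh> ) {      # the read loop
    foreach my $pline (@patches) {     # loop over the slurped lines
        print "match: $sline" if $pline eq $sline;
    }
}
close $sys_fh;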