Copy data from huge files while they are open - perl

I am trying to merge data from huge files to a combined file using Perl.
The source files stay open the whole time and data is continuously being appended to them, at around 50,000 lines per minute.
The files are stored in a network shared folder accessed by between 10 and 30 machines.
These are JTL files generated by JMeter.
This merge runs every minute for about 6 or 7 hours, and the time taken should not be more than 30 to 40 seconds.
The process is triggered every minute by a Web Application deployed in a Linux machine.
I have written a script which, for each source file, records in a separate file the last line that has been copied into the combined file.
This works fine for roughly the first 15 minutes, but after that the merge time keeps increasing.
My script
#!/usr/bin/perl
use File::Basename;
use File::Path;

$consolidatedFile = $ARGV[0];
$testEndTimestamp = $ARGV[1];
@csvFiles         = @ARGV[ 2 .. $#ARGV ];
$testInProcess    = 0;
$newMerge         = 0;
$lastLines        = "_LASTLINES";
$lastLine         = "_LASTLINE";

# time() gives the current timestamp
if ( time() <= $testEndTimestamp ) {
    $testInProcess = 1;
}

# File exists, has a size of zero
if ( -z $consolidatedFile ) {
    mkdir $consolidatedFile . $lastLines;
    $newMerge = 1;
}

open( CONSOLIDATED, ">>" . $consolidatedFile );

foreach my $file (@csvFiles) {
    open( INPUT, "<" . $file );
    @linesArray = <INPUT>;
    close INPUT;

    if ($newMerge) {
        print CONSOLIDATED @linesArray[ 0 .. $#linesArray - 1 ];
        open my $fh, ">", $consolidatedFile . $lastLines . "/" . basename($file) . $lastLine;
        print $fh $linesArray[ $#linesArray - 1 ];
        close $fh;
    }
    else {
        open( AVAILABLEFILE, "<" . $consolidatedFile . $lastLines . "/" . basename($file) . $lastLine );
        @lineArray = <AVAILABLEFILE>;
        close AVAILABLEFILE;
        $availableLastLine = $lineArray[0];

        open( FILE, "<" . $file );
        while (<FILE>) {
            if (/$availableLastLine/) {
                last;
            }
        }
        @grabbed = <FILE>;
        close(FILE);

        if ($testInProcess) {
            if ( $#grabbed > 0 ) {
                pop @grabbed;
                print CONSOLIDATED @grabbed;
                open( AVAILABLEFILE, ">" . $consolidatedFile . $lastLines . "/" . basename($file) . $lastLine );
                print AVAILABLEFILE $grabbed[ $#grabbed - 1 ];
            }
            close AVAILABLEFILE;
        }
        else {
            if ( $#grabbed >= 0 ) {
                print CONSOLIDATED @grabbed;
            }
        }
    }
}
close CONSOLIDATED;

if ( !$testInProcess ) {
    rmtree $consolidatedFile . $lastLines;
}
I need to optimize the script in order to reduce the time.
Is it possible to store the last line in a cache?
Can anyone suggest another way to do this type of merging?
Here is another script, which stores the last line in a cache instead of a file.
Even this does not complete the merge within 1 minute.
#!/usr/bin/perl
use CHI;
use File::Basename;
use File::Path;

my $cache = CHI->new(
    driver   => 'File',
    root_dir => '/path/to/root'
);

$consolidatedFile = $ARGV[0];
$testEndTimestamp = $ARGV[1];
@csvFiles         = @ARGV[ 2 .. $#ARGV ];
$testInProcess    = 0;
$newMerge         = 0;
$lastLines        = "_LASTLINES";
$lastLine         = "_LASTLINE";

# time() gives the current timestamp
if ( time() <= $testEndTimestamp ) {
    $testInProcess = 1;
}

# File exists, has a size of zero
if ( -z $consolidatedFile ) {
    $newMerge = 1;
}

open( CONSOLIDATED, ">>" . $consolidatedFile );

foreach my $file (@csvFiles) {
    $fileLastLineKey =
      $consolidatedFile . $lastLines . "_" . basename($file) . $lastLine;
    open( INPUT, "<" . $file );
    @linesArray = <INPUT>;
    close INPUT;

    if ($newMerge) {
        print CONSOLIDATED @linesArray[ 0 .. $#linesArray - 1 ];
        $fileLastLine = $linesArray[ $#linesArray - 1 ];
        $cache->set( $fileLastLineKey, $fileLastLine );
    }
    else {
        $availableLastLine = $cache->get($fileLastLineKey);
        open( FILE, "<" . $file );
        while (<FILE>) {
            if (/$availableLastLine/) {
                last;
            }
        }
        @grabbed = <FILE>;
        close(FILE);

        if ($testInProcess) {
            if ( $#grabbed > 0 ) {
                pop @grabbed;
                print CONSOLIDATED @grabbed;
                $fileLastLine = $grabbed[ $#grabbed - 1 ];
                $cache->set( $fileLastLineKey, $fileLastLine );
            }
        }
        else {
            if ( $#grabbed >= 0 ) {
                print CONSOLIDATED @grabbed;
                $cache->remove($fileLastLineKey);
            }
        }
    }
}
close CONSOLIDATED;
I am thinking of reading each file from the last merged line up to the required line and copying just those lines to the consolidated file.
Can anyone suggest how best to do this?
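Something along these lines is what I have in mind, a rough, untested sketch only: the _OFFSET bookkeeping file is just an illustration, and it reuses the CONSOLIDATED handle from the script above.
# Rough, untested sketch: remember how many bytes of each source file have already
# been merged (via tell) and seek straight to that position on the next run.
# The "_OFFSET" bookkeeping file is hypothetical, mirroring the "_LASTLINE" files.
use Fcntl qw(SEEK_SET);

my $offsetFile = $file . "_OFFSET";
my $offset     = 0;
if ( -e $offsetFile ) {
    open my $ofh, '<', $offsetFile or die $!;
    $offset = <$ofh>;
    close $ofh;
}

open my $in, '<', $file or die "Cannot open $file: $!";
seek $in, $offset, SEEK_SET;        # skip everything merged in earlier runs
while ( my $line = <$in> ) {
    last unless $line =~ /\n\z/;    # ignore a possibly incomplete trailing line
    print CONSOLIDATED $line;
    $offset = tell $in;             # position just after the last complete line
}
close $in;

open my $ofh, '>', $offsetFile or die $!;
print $ofh $offset;
close $ofh;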

You may want to try opening the file in binmode and reading it blockwise in a loop. This usually offers significant performance improvements. The following function is an example: it puts at most $maxblocks blocks of a file, starting from block $offset, into an array passed by reference. Note that the last block may not contain the full $block bytes when the file is not large enough.
sub file2binarray {
    my $file      = shift;
    my $array     = shift;
    my $maxblocks = shift;
    my $offset    = shift;
    my $block     = 2048;

    $offset = 0 if ( ( !defined($offset) ) || ( $offset !~ /^\s*\d+\s*$/o ) );
    $maxblocks = "ALL"
      if ( !defined($maxblocks) || ( $maxblocks !~ /^\s*\d+\s*$/o ) );

    my $size = ( stat($file) )[7];
    my $mb   = $size / $block;
    $mb++ if ( $mb * $block < $size );
    $maxblocks = $mb - $offset
      if ( ( $maxblocks eq "ALL" ) || ( $maxblocks > $mb - $offset ) );
    $offset *= $block;

    open( IN, "$file" ) || die("Cannot open file <$file>\n");
    binmode(IN);
    seek( IN, $offset, 0 );

    my ( $blk, $bytes_read, $buffer ) = ( 0, $block, "" );
    while ( ( $bytes_read == $block ) && ( $blk < $maxblocks ) ) {
        $bytes_read = sysread( IN, $buffer, $block );
        push( @$array, $buffer );
        $blk++;
    }
    close(IN);
}
To read the entire file at once, you call it like this:
my @array;
my $filename = "somefile";
file2binarray( $filename, \@array, "ALL", 0 );
but probably you'd rather call it in a loop with some bookkeeping over the offset, and parse the array between subsequent calls.
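For example, a pass over the whole file in fixed-size groups of blocks might look like this rough, untested sketch; the 100-block batch size and the parsing step are just placeholders:
my $filename = "somefile";
my $offset   = 0;      # offset in blocks, matching the 2048-byte blocks used above
my $perpass  = 100;    # how many blocks to pull per call
while (1) {
    my @chunk;
    file2binarray( $filename, \@chunk, $perpass, $offset );
    last unless @chunk;             # nothing left to read
    my $data = join '', @chunk;
    # ... parse $data here, e.g. split it into lines ...
    $offset += $perpass;            # move on to the next group of blocks
}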
Hope this helps.

Related

Hash incorrectly tracking counts, runtime long

I am working on a program in Perl and my output is wrong and taking forever to process. The code is meant to take in a large DNA sequence file, read through it in 15-letter increments (kmers), stepping forward 1 position at a time. I'm supposed to enter the kmer sequences into a hash, with their value being the number of occurrences of that kmer, meaning each key should be unique and, when a duplicate is found, the count for that particular kmer should be increased. I know from my Prof.'s expected output file that I have too many lines, so it is allowing duplicates and not counting correctly. It's also running 5+ minutes, so I have to Ctrl+C to escape. When I go look at kmers.txt, the file is at least written and formatted correctly.
#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;
# countKmers.pl
# Open file /scratch/Drosophila/dmel-2L-chromosome-r5.54.fasta
# Identify all k-mers of length 15, load them into a hash
# and count the number of occurrences of each k-mer. Each
# unique k-mer and its count will be written to file
# kmers.txt
#Create an empty hash
my %kMersHash = ();
#Open a filehandle for the output file kmers.txt
unless ( open ( KMERS, ">", "kmers.txt" ) ) {
die $!;
}
#Call subroutine to load Fly Chromosome 2L
my $sequenceRef = loadSequence("/scratch/Drosophila/dmel-2L-chromosome-r5.54.fasta");
my $kMer = 15; #Set the size of the sliding window
my $stepSize = 1; #Set the step size
for (
#The sliding window's start position is 0
my $windowStart = 0;
#Prevent going past end of the file
$windowStart <= ( length($$sequenceRef) - $kMer );
#Advance the window by the step size
$windowStart += $stepSize
)
{
#Get the substring from $windowStart for length $kMer
my $kMerSeq = substr( $$sequenceRef, $windowStart, $kMer );
#Call the subroutine to iterate through the kMers
processKMers($kMerSeq);
}
sub processKMers {
my ($kMerSeq) = @_;
#Initialize $kCount with at least 1 occurrence
my $kCount = 1;
#If the key already exists, the count is
#increased and changed in the hash
if ( not exists $kMersHash{$kMerSeq} ) {
#The hash key=>value is loaded: kMer=>count
$kMersHash{$kMerSeq} = $kCount;
}
else {
#Increment the count
$kCount ++;
#The hash is updated
$kMersHash{$kMerSeq} = $kCount;
}
#Print out the hash to filehandle KMERS
for (keys %kMersHash) {
print KMERS $_, "\t", $kMersHash{$_}, "\n";
}
}
sub loadSequence {
#Get my sequence file name from the parameter array
my ($sequenceFile) = @_;
#Initialize my sequence to the empty string
my $sequence = "";
#Open the sequence file
unless ( open( FASTA, "<", $sequenceFile ) ) {
die $!;
}
#Loop through the file line-by-line
while (<FASTA>) {
#Assign the line, which is in the default
#variable to a named variable for readability.
my $line = $_;
#Chomp to get rid of end-of-line characters
chomp($line);
#Check to see if this is a FASTA header line
if ( $line !~ /^>/ ) {
#If it's not a header line append it
#to my sequence
$sequence .= $line;
}
}
#Return a reference to the sequence
return \$sequence;
}
Here's how I would write your application. The processKMers subroutine boils down to just incrementing a hash element, so I've removed it. I've also altered the identifiers to match the snake_case that is more usual in Perl code, and I didn't see any point in load_sequence returning a reference to the sequence, so I've changed it to return the string itself.
use strict;
use warnings 'all';
use constant FASTA_FILE => '/scratch/Drosophila/dmel-2L-chromosome-r5.54.fasta';
use constant KMER_SIZE => 15;
use constant STEP_SIZE => 1;
my $sequence = load_sequence( FASTA_FILE );
my %kmers;
for ( my $offset = 0;
      $offset + KMER_SIZE <= length $sequence;
      $offset += STEP_SIZE ) {
    my $kmer_seq = substr $sequence, $offset, KMER_SIZE;
    ++$kmers{$kmer_seq};
}

open my $out_fh, '>', 'kmers.txt' or die $!;

for ( keys %kmers ) {
    printf $out_fh "%s\t%d\n", $_, $kmers{$_};
}

sub load_sequence {
    my ( $sequence_file ) = @_;
    my $sequence = "";
    open my $fh, '<', $sequence_file or die $!;
    while ( <$fh> ) {
        next if /^>/;
        chomp;
        $sequence .= $_;
    }
    return $sequence;
}
If you'd rather not apply ++ directly to the hash element, here is the long-hand equivalent of incrementing it
my $n;
if ( exists $kMersHash{$kMerSeq} ) {
    $n = $kMersHash{$kMerSeq};
}
else {
    $n = 0;
}
++$n;
$kMersHash{$kMerSeq} = $n;
Everything looks fine in your code besides processKMers. The main issues are:
$kCount is not persistent between calls to processKMers, so in your else branch $kCount will always be 2.
You are printing the entire hash every time you call processKMers, which is what is slowing you down. Printing that often slows your program down significantly; you should wait until the end of the program and print the hash once.
Keeping your code mostly the same:
sub processKMers {
    my ($kMerSeq) = @_;
    if ( not exists $kMersHash{$kMerSeq} ) {
        $kMersHash{$kMerSeq} = 1;
    }
    else {
        $kMersHash{$kMerSeq}++;
    }
}
Then you want to move your print logic to immediately after your for-loop.
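In other words, keeping your KMERS filehandle and %kMersHash names, something along these lines placed after the sliding-window loop (a small sketch):
# Write the finished hash out once, after all k-mers have been counted.
for my $kMerSeq ( keys %kMersHash ) {
    print KMERS "$kMerSeq\t$kMersHash{$kMerSeq}\n";
}
close KMERS;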

Nested Loop running very slowly

I'm trying to run a program to check each line of one file against each line of a second file to see if some of the elements match. Each file is around 200k lines.
What I've got so far looks like this;
#!/usr/bin/perl
#gffgenefind.pl
use strict;
use warnings;

die "SNP gff\n" unless @ARGV == 4;

open( my $snp,  "<", $ARGV[0] ) or die "Can't open $!";
open( my $gff,  "<", $ARGV[1] ) or die "Can't open $!";
open( my $outg, ">", $ARGV[2] );
open( my $outs, ">", $ARGV[3] );

my $scaffold;
my $site;
my @snplines = <$snp>;
my @gfflines = <$gff>;

foreach my $snpline (@snplines) {
    my @arr = split( /\t/, $snpline );
    $scaffold = $arr[0];
    $site     = $arr[1];
    foreach my $line (@gfflines) {
        my @arr1 = split( /\t/, $line );
        if ( $arr1[3] <= $site and $site <= $arr1[4] and $arr1[0] eq $scaffold ) {
            print $outg "$line";
            print $outs "$snpline";
        }
    }
}
File 1 (snp) looks like this: scaffold_100 10689 A C A 0 0 0 0 0 0
File 2 (gff) looks like this: scaffold_1 phytozomev10 gene 750912 765975 . - . ID=Carubv10008059m.g.v1.0;Name=Carubv10008059m.g
Essentially, I'm looking to see if the first values match and if the second value from snp is within the range defined on the second file (in this case 750912 to 765975)
I've seen that nested loops are to be avoided, and was wondering if there's an alternative way for me to look through this data.
Thanks!
Firstly - lose the foreach loop. That reads your whole file into memory, when you probably don't need to.
Try instead:
while ( my $snpline = <$snp> ) {
because it reads line by line.
Generally, mixing array indices and named variables is also bad style.
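For example, the two fields the loop actually uses can be given names right at the split (a small sketch based on the question's variables):
my ( $scaffold, $site ) = ( split /\t/, $snpline )[ 0, 1 ];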
The core problem, though, is most likely that for each line of your first file you're cycling through all of the second file.
Edit: Note - because 'scaffold' isn't unique, amended accordingly
This seems like a good place to use a hash. E.g.
my %sites;
while ( <$snp> ) {
    my ( $scaffold, $site ) = split ( /\t/ );
    $sites{$scaffold}{$site}++;
}
while ( <$gff> ) {
    my ( $name, $tmp1, $tmp2, $range_start, $range_end ) = split ( /\t/ );
    if ( $sites{$name} ) {
        foreach my $site ( keys %{ $sites{$name} } ) {
            if (    $site > $range_start
                and $site < $range_end ) {
                #do stuff with it;
                print;
            }
        }
    }
}
Hopefully you get the gist, even if it isn't specifically what you're after?
Try this Python snippet:
#!/usr/bin/env python
import sys
import contextlib

if len(sys.argv) != 5:
    raise Exception('SNP gff')

snp, gff, outg, outs = sys.argv[1:]

gff_dict = {}
with open(gff) as gff_handler:
    for line in gff_handler:
        fields = line.split()
        try:
            gff_dict[fields[0]].append(fields[1:])
        except KeyError:
            gff_dict[fields[0]] = [fields[1:]]

with contextlib.nested(open(snp),
                       open(outs, 'w'),
                       open(outg, 'w')) as (snp_handler,
                                            outs_handler,
                                            outg_handler):
    for line_snp in snp_handler:
        fields = line_snp.split()
        key = fields[0]
        if key in gff_dict:
            for ele in gff_dict[key]:
                if int(ele[2]) <= int(fields[1]) <= int(ele[3]):
                    outs_handler.write(line_snp)
                    outg_handler.write("{0}\t{1}\n".format(key, "\t".join(ele)))

Pass lines from 2 files to same subroutine

I'm in the process of learning how to use perl for genomics applications. I am trying to clean up paired end reads (1 forward, 1 reverse). These are stored in 2 files, but the lines match. What I'm having trouble doing is getting the relevant subroutines to read from the second file (the warnings I get are for uninitialized values).
These files are set up in 4-line blocks (fastq) where the first line is a run ID, the 2nd is a sequence, the 3rd is a "+", and the fourth holds quality values for the sequence in line 2.
I had no real trouble with this code when it was applied to just one file, but I think I'm misunderstanding how to handle multiple files.
Any guidance is much appreciated!
The warning I get in this scenario is: Use of uninitialized value $thisline in subtraction (-) at ./pairedendtrim.pl line 137, line 4.
#!/usr/bin/perl
#pairedendtrim.pl by AHU
use strict;
use warnings;
die "usage: readtrimmer.pl <file1> <file2> <nthreshold> " unless #ARGV == 3;
my $nthreshold = "$ARGV[2]";
open( my $fastq1, "<", "$ARGV[0]" );
open( my $fastq2, "<", "$ARGV[1]" );
my #forline;
my #revline;
while ( not eof $fastq2 and not eof $fastq1 ) {
chomp $fastq1;
chomp $fastq2;
$forline[0] = <$fastq1>;
$forline[1] = <$fastq1>;
$forline[2] = <$fastq1>;
$forline[3] = <$fastq1>;
$revline[0] = <$fastq2>;
$revline[1] = <$fastq2>;
$revline[2] = <$fastq2>;
$revline[3] = <$fastq2>;
my $ncheckfor = removen( $forline[1] );
my $ncheckrev = removen( $revline[1] );
my $fortest = 0;
if ( $ncheckfor =~ /ok/ ) { $fortest = 1 }
my $revtest = 0;
if ( $ncheckrev =~ /ok/ ) { $revtest = 1 }
if ( $fortest == 1 and $revtest == 1 ) { print "READ 1 AND READ 2" }
if ( $fortest == 1 and $revtest == 0 ) { print "Read 1 only" }
if ( $fortest == 0 and $revtest == 1 ) { print "READ 2 only" }
}
sub removen {
my ($thisline) = $_;
my $ntotal = 0;
for ( my $i = 0; $i < length($thisline) - 1; $i++ ) {
my $pos = substr( $thisline, $i, 1 );
#print "$pos\n";
if ( $pos =~ /N/ ) { $ntotal++ }
}
my $nout;
if ( $ntotal <= $nthreshold ) #threshold for N
{
$nout = "ok";
} else {
$nout = "bad";
}
return ($nout);
}
The parameters to a subroutine are in @_, not $_
sub removen {
    my ($thisline) = @_;
I have a few other tips for you as well:
Use autodie; any time that you're doing file processing.
Assign the values in @ARGV to variables first thing. This quickly documents what they hold.
Do not chomp a file handle. This does not do anything. Instead apply chomp to the values returned from reading.
Do not use the strings ok and bad as boolean values.
tr can be used to count the number of times a character occurs in a string.
The following is a cleaned up version of your code:
#!/usr/bin/perl
#pairedendtrim.pl by AHU
use strict;
use warnings;
use autodie;

die "usage: readtrimmer.pl <file1> <file2> <nthreshold> " unless @ARGV == 3;

my ( $file1, $file2, $nthreshold ) = @ARGV;

open my $fh1, '<', $file1;
open my $fh2, '<', $file2;

while ( not eof $fh2 and not eof $fh1 ) {
    chomp( my @forline = map { scalar <$fh1> } ( 1 .. 4 ) );
    chomp( my @revline = map { scalar <$fh2> } ( 1 .. 4 ) );

    my $ncheckfor = removen( $forline[1] );
    my $ncheckrev = removen( $revline[1] );

    print "READ 1 AND READ 2" if $ncheckfor and $ncheckrev;
    print "Read 1 only"       if $ncheckfor and !$ncheckrev;
    print "READ 2 only"       if !$ncheckfor and $ncheckrev;
}

sub removen {
    my ($thisline) = @_;
    my $ntotal = $thisline =~ tr/N/N/;
    return $ntotal <= $nthreshold;    #threshold for N
}

print n users and their value using perl

I am new to Perl and need your help to build the logic.
I have, let's say, 10 files in a directory, and each file has data like the lines below. The number of lines in each file depends on the number of users configured; for example, if there are 4 users then 4 lines get printed by the server.
1405075666889,4044,SOA_breade,200,OK,Thread Group 1-1,text,true,623,4044
1405075666889,4041,SOA_breade,200,OK,Thread Group 1-1,text,true,623,4041
1405075666889,4043,SOA_breade,200,OK,Thread Group 1-1,text,true,623,4043
1405075666889,4045,SOA_breade,200,OK,Thread Group 1-1,text,true,623,4044
I want some piece of logic that creates a single file in an output directory, and that file should contain 10 lines:
Min_Value, Max_Value, Avg_Value, User1, User2, User3......User4
with the corresponding values on the following lines; in this case the values come from the second column.
Min_Value, Max_Value, Avg_Value, User1, User2, User3......User4
4.041,4.045,4.044,4.041,4.043,4.045
.
.
.
.
.
10th file data
Here is my code. It is working, however I cannot work out how to print User1, User2, ... in sequence along with their corresponding values.
my @soaTime;
my @soaminTime;
my @soamaxTime;
my @soaavgTime;
my $soadir = $Json_Time;

foreach my $inputfile ( glob("$soadir/*Overview*.txt") ) {
    open( INFILE, $inputfile ) or die("Could not open file.");
    foreach my $line (<INFILE>) {
        my @values  = split( ',', $line );    # parse the file
        my $time_ms = $values[1] / 1000;
        push( @soaTime, $time_ms );
    }
    my $min = min @soaTime;
    push( @soaminTime, $min );
    print $soaminTime[0];
    my $max = max @soaTime;
    push( @soamaxTime, $max );
    sub mean { return @_ ? sum(@_) / @_ : 0 };
    #print mean(@soaTime);
    push( @soaavgTime, mean() );
    close(INFILE);
}

my $outputfile = $report_path . "abc.txt";
open( OUTFILE, ">$outputfile" );
print OUTFILE ( "Min_Value,Max_Value,User1,User2,User3,User4" . "\n" );    # Printing the data
for ( my $count = 0; $count <= $#soaTC; $count++ ) {
    print OUTFILE ("$soaminTime[0],$soamaxTime[0],$soaTime[0],$soaTime[1],$soaTime[2],$soaTime[3]" . "\n");    # Printing the data
}
close(OUTFILE);
Please help.
Is this what you're after?
use strict;
use List::Util qw( min max sum );

my $Json_Time   = "./test";
my $report_path = "./out/";

my @soaTime;
my @soaminTime;
my @soamaxTime;
my @soaavgTime;
my @users;
my $maxusers = -1;
my $soadir   = $Json_Time;

sub mean { return @_ ? sum(@_) / @_ : 0 }

foreach my $inputfile ( glob("$soadir/*Overview*.txt") ) {
    open( INFILE, $inputfile ) or die("Could not open file.");
    my $i = 0;
    my @m_users;
    my @m_soaTime;
    foreach my $line (<INFILE>) {
        my @values  = split( ',', $line );    # parse the file
        my $time_ms = $values[1] / 1000;
        push( @m_soaTime, $time_ms );
        $i++;
        push( @m_users, "User" . $i );
    }
    push( @soaTime, \@m_soaTime );
    if ( $maxusers < $#m_users ) {
        @users    = @m_users;
        $maxusers = $#m_users;
    }
    my $min = min(@m_soaTime);
    push( @soaminTime, $min );
    my $max = max(@m_soaTime);
    push( @soamaxTime, $max );
    push( @soaavgTime, mean(@m_soaTime) );
    close(INFILE);
}

my $outputfile = $report_path . "abc.txt";
open( OUTFILE, ">$outputfile" );
print OUTFILE "Min_Value,Max_Value,Avg_Value," . join( ',', @users ) . "\n";    # Printing the data
for ( my $count = 0; $count <= $#soaavgTime; $count++ ) {
    print OUTFILE $soaminTime[$count] . ","
      . $soamaxTime[$count] . ","
      . $soaavgTime[$count] . ","
      . join( ',', @{ $soaTime[$count] } )
      . "\n";    # Printing the data
}
close(OUTFILE);

Finding two newest files in two separate directories and merging them

I have two directories, each containing pictures. The regional directory is updated every 5 minutes; the watch directory is updated every 15 minutes.
What I am trying to do is find the newest file in each directory and use ImageMagick to combine those two files into a third image.
What I have works some of the time but is very inconsistent; for example, my code will sometimes miss the regional file when its timestamp matches the watch file's.
Other times it will merge two watch files, even though the watch files and regional files are in two separate directories.
I have no clue how to fix it.
Here is my code:
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use vars qw/%files_watch/;
use vars qw/%files_regional/;
sub findNewestFiles {
my $element = $File::Find::name;
return if ( !-f $element );
$files_watch{$element} = ( stat($element) )[10];
$files_regional{$element} = ( stat($element) )[10];
}
my $image_magick_exe = "composite.exe\"";
my $pic_dir = "C:\\eterra\\eterravision\\weather";
my $watch_dir = "C:\\eterra\\eterravision\\weather\\watch";
my $regional_dir = "C:\\eterra\\eterravision\\weather\\regional";
open( OUT, ">>names.txt" ) || die;
find( \&findNewestFiles, $watch_dir );
my $newestfile_watch;
my $time_watch = 0;
while ( my ( $t1, $t2 ) = each(%files_watch) ) {
if ( $t2 > $time_watch ) {
$newestfile_watch = $t1;
$time_watch = $t2;
}
}
$time_watch = localtime($time_watch);
find( \&findNewestFiles, $regional_dir );
my $newestfile_regional;
my $time_regional = 0;
while ( my ( $t3, $t4 ) = each(%files_regional) ) {
if ( $t4 > $time_regional ) {
$newestfile_regional = $t3;
$time_regional = $t4;
}
}
$time_regional = localtime($time_regional);
$newestfile_watch =~ s/\//\\/g;
$newestfile_regional =~ s/\//\\/g; #replacing the "/" in the file path to "\"
my @temp = split( /_/, $newestfile_regional );
my $type = $temp[0];
my $date = $temp[1];
my $time = $temp[2];
my $check = "$pic_dir/radarwatch\_$date\_$time"; #check if file was created
unless ( -e $check )
{
system("\"$image_magick_exe \"$newestfile_regional\" \"$newestfile_watch\" \"$pic_dir\\radarwatch\_$date\_$time\"");
print "file created\n";
}
I have tried changing the [10] in the sub to [8] and [9]: 8 is access time, 9 is modification time, and 10 is creation time; 10 has been the most successful.
I think the problem is with the sub function.
Is there a better way to search for the newest creation time? Something that is more reliable than what I have?
I think the crux of your problem is finding the most recent file in each directory and then processing them. Leaving aside the details of the processing, here is a script that finds the most recent files. I leave out the ImageMagick stuff; that can all be put into the process_latest subroutine. There is no need for File::Find, and File::stat allows us to use names instead of trying to remember those stat-index numbers. The program has a clearer structure.
use strict;
use warnings;
use File::stat;
my $watch_dir = "C:\\eterra\\eterravision\\weather\\watch";
my $regional_dir = "C:\\eterra\\eterravision\\weather\\regional";
# get the latest in each directory
my $latest_regional = get_latest_file($regional_dir);
my $latest_watch = get_latest_file($watch_dir);
# do whatever you want here...
process_latest ($latest_regional, $latest_watch);
# I exit 1 in Windows, exit 0 in Unix
exit 1;
#--------------------------------
# subroutines
#--------------------------------
sub get_latest_file {
my $dir = shift;
opendir my $DIR, $dir or die "$dir $!";
my $latest_time = -1;
my $latest_file = '';
FILE:
while (readdir($DIR)) {
my $file = "$dir\\$_";
next FILE unless -f $file;
my $file_time = stat($file)->mtime;
print "$file $file_time\n";
if ($file_time > $latest_time) {
$latest_time = $file_time;
$latest_file = $file;
}
}
closedir $DIR;
return $latest_file;
}
sub process_latest {
my $regional = shift;
my $watch = shift;
print "Latest Regional: $regional\n";
print "Latest Watch: $watch\n";
}
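For the ImageMagick step itself, process_latest could shell out much as the original script does. This is a rough sketch only, reusing the composite.exe call, paths, and radarwatch naming scheme from the question; adjust them to your setup:
sub process_latest {
    my ( $regional, $watch ) = @_;

    my $image_magick_exe = '"composite.exe"';    # path to ImageMagick's composite, as in the question
    my $pic_dir          = "C:\\eterra\\eterravision\\weather";

    # derive date/time from the regional file name, as the original script does
    my ( undef, $date, $time ) = split /_/, $regional;
    my $output = "$pic_dir\\radarwatch_${date}_$time";

    unless ( -e $output ) {                      # skip if the combined image already exists
        system qq{$image_magick_exe "$regional" "$watch" "$output"};
        print "file created\n";
    }
}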