Rolling archival file containing last 1 year's data - perl

A script needs to be developed to create an archive file containing only the last year's data. The script will copy the content of the main data file (CEMI.log) and append it to the archived log file (CEMI.Archive.log). This will happen once a week (every Monday at 3 am). The script should also be able to remove data older than one year from the archive file.
I am stuck at the point where I need to remove the data that is older than one year. How can this be done using a shell or Perl script?
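For the scheduling part, a crontab entry along these lines would append the week's data and then run the pruning script every Monday at 3 am. This is only a sketch; the paths and the script name rotate_archive.pl are placeholders, not taken from the question:

# m h dom mon dow  command
0 3 * * 1  cat /var/log/CEMI.log >> /var/log/CEMI.Archive.log && /usr/local/bin/rotate_archive.pl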
sample file:
-bash-3.2# more test.txt
2015-01-15,09:17:10,101,20a6475d-4d0c-4fe4-8765-35065ddfe887,_1.1,L,
2015-01-15,09:18:57,70,al Test,20a6475d-4d0c-4fe4-8765-35065ddfe887,1,L,
2015-01-15,10:59:28,1,,,,Best Practice,9f02745244d6440584b24012d882f935,,L,
2015-01-15,11:00:52,2,,,,Best Practice,9f02745244d6440584b24012d882f935,,L,
2015-01-15,11:00:56,1,,,,Best Practice,9f02745244d6440584b24012d882f935,,L,
2015-01-15,11:03:14,1,,,,Best Practice,9f02745244d6440584b24012d882f935,,L,
2015-01-15,11:03:38,1,,,,Best Practice,9f02745244d6440584b24012d882f935,,L,
2015-01-15,11:04:01,1,,,,Best Practice,9f02745244d6440584b24012d882f935,,L,
2015-01-15,11:05:07,2,,,,Best Practice,9f02745244d6440584b24012d882f935,,L,
2015-01-15,11:06:45,32,,,,Best Practice,9f02745244d6440584b24012d882f935,,L,
2015-01-15,12:57:13,36,,,560909,Best Practice,e8418950-6561-4465-b16b-30e118e826b7,,L,
2015-01-15,13:37:56,1032,,xml-data1,Test,20a6475d-4d0c-4fe4-8765-35065ddfe887,Request_1.1,L,Success
2015-01-15,13:39:01,38,,,Internal Test,20a6475d-4d0c-4fe4-8765-35065ddfe887,Request_1.1,L,
2015-01-15,13:39:50,113,,xml-dat1,al Test,20a6475d-4d0c-4fe4-8765-35065ddfe887,Request_1.1,L,
2015-01-15,13:40:17,74,,, Test,20a6475d-4d0c-4fe4-8765-35065ddfe887,Request_1.1,L,

This skips leading lines that are older than one year and stops at the first line that is recent enough or that does not match the date format; that line and everything after it is kept.
#!/usr/bin/env perl
use strict;
use warnings;
use DateTime;
use File::Copy qw/move/;
use File::Temp qw/tempfile/;

# filenames
my $log = "test.txt";
my ( undef, $tmp ) = tempfile( undef, OPEN => 0 ); # get safe temporary filename, but do not open file

# move old log file to temporary location
move $log => $tmp
    or die "Cannot rename '$log' to '$tmp': $!";

# open temporary file (contains old log) and new log
open( my $in, "<", $tmp )
    or die "Cannot open '$tmp': $!";
open( my $out, ">", $log )
    or die "Cannot open '$log': $!";

# calculate a DateTime value one year in the past
my $limit = DateTime->today->subtract( years => 1 );

# skip leading lines whose date is too old
while (<$in>) {
    if (m/^(\d\d\d\d)-(\d\d)-(\d\d)/) {
        # get DateTime object from matched date parts
        my $dt = DateTime->new(
            year  => $1,
            month => $2,
            day   => $3,
        );
        # keep on skipping lines while dates are too old
        next if $dt < $limit;
    }
    # no date found, or date is recent enough: keep this line and end skipping
    print $out $_;
    last;
}

# copy all remaining lines from temporary file to new log
while (<$in>) {
    print $out $_;
}

close $in  or warn "Error closing '$tmp': $!";
close $out or warn "Error closing '$log': $!";
unlink $tmp or die "Cannot delete '$tmp': $!";
How to execute this:
$ cat > test.txt   # press Ctrl+D to end text input
2014-01-01,will be removed
stops here
2014-01-01,will be kept because skipping stopped
2015-01-01,would be kept anyway
$ perl rotate_log.pl
$ cat test.txt
stops here
2014-01-01,will be kept because skipping stopped
2015-01-01,would be kept anyway

Related

date creation with Image::ExifTool or Date::Handler

I'm trying to merge all directories containing pictures taken in different time zones into a single directory, renaming every file to the same format (Hungarian style yyyymmdd_hhmmss) and eliminating the huge number of duplicates, all at once.
I was looking for modules that can read the file's creation date, not the last modification or access date (the stat built-in only returns an epoch, e.g. (stat($file))[9]).
I found Image::ExifTool, which seems to be the most effective and easiest to use.
I noticed that after several passes through the directory the module stops extracting the correct date/time and gets stuck on the same wrong date/time value for all following files. Here is the code, and below it one of the pictures that gets me into this trouble: more precisely, the file's properties on Win10 give a creation date of 20140626 12:16 pm, but running the routine I obtain 20021208_120000.
foreach $img (@img) {
    next if -l $img;
    $img =~ /.+(\..+$)/;
    $ext = $1;
    # %ENV;
    $exif = new Image::ExifTool;
    $exif->ExtractInfo($dir.$img);
    $for = $exif->GetValue('CreateDate');
    $for =~ s/$space/\_/g;
    $for =~ s/\://g;
    $for = '_'.$for;
    $size = (stat($dir.$img))[7];
    # $date = Date->new($date[9]);
    # @data = $date->array;
    # tie my %date, 'Date::Tie', utc_epoch => $date{$date[9]}; # tz => $date{tz};
    # my $date = Date::Tie->new( epoch => $date[9] );
    %date;
    # $for = IMG.$for.$ext;
    if (!$all{'IMG'.$for.$ext}) {
        $all{'IMG'.$for.$ext} = $size;
        rename $dir.$img, $dir.'IMG'.$for.$ext;
        print "rename $dir.$img, $dir.'IMG'.$for.$ext\n";
    } elsif ($all{'IMG'.$for.$ext} == $size) {
        unlink $dir.$img;
        print "Deleting $dir.$img\n";
    }
}
Checking the properties of the files involved, the "wrong" ones seem to have the same properties as the "working" ones: both show the creation/acquisition date, the modification date and the last access date.
I can't understand where the module fails.
Do you have any recommendation? Any different module to use?
Thanks
Simon
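One thing worth ruling out first: in the loop above, $for is not lexically scoped and the result of GetValue('CreateDate') is never checked, so a file that lacks that tag can end up reusing a stale or mangled value. A minimal, more defensive sketch of the extraction (the fallback tag order and printing instead of renaming are my assumptions, not from the question):

use strict;
use warnings;
use Image::ExifTool;

my $exif = Image::ExifTool->new;
for my $img (@img) {                      # @img assumed to hold the picture file names
    unless ( $exif->ExtractInfo($dir . $img) ) {
        warn "Cannot read metadata from $img\n";
        next;
    }
    # try more than one date tag; not every camera writes CreateDate
    my $date = $exif->GetValue('DateTimeOriginal')
            // $exif->GetValue('CreateDate');
    next unless defined $date;            # skip files with no usable date tag
    ( my $stamp = $date ) =~ s/://g;      # "2014:06:26 12:16:33" -> "20140626 121633"
    $stamp =~ s/ /_/;                     # -> "20140626_121633"
    print "$img -> IMG_$stamp\n";
}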
On Windows you can use Win32API::File::Time to read and modify the file creation time:
use feature qw(say);
use strict;
use warnings;
use Win32API::File::Time qw(GetFileTime SetFileTime);
use File::Temp qw(tempdir);

my $dir = tempdir( CLEANUP => 1 );
my $fn = 'test.txt';
open( my $fh, '>', $fn ) or die "Could not open file '$fn': $!";
say $fh "Foobar";
close $fh;

print_file_times($fn);
my ( $atime, $mtime, $ctime ) = GetFileTime($fn);
SetFileTime( $fn, undef, undef, $ctime - 180 );
print_file_times($fn);

sub print_file_times {
    my ($fn) = @_;
    my ( $atime, $mtime, $ctime ) = GetFileTime($fn);
    say "File: $fn:";
    say "  File access time: $atime";
    say "  File modification time: $mtime";
    say "  File creation time: $ctime";
    return $ctime;
}
Output:
File: test.txt:
File access time: 1614640101
File modification time: 1614640101
File creation time: 1614639958
File: test.txt:
File access time: 1614640101
File modification time: 1614640101
File creation time: 1614639778

Scan a large .gz file, split its content on a known word (which is repeated in the file), and save all the split strings in a .txt file

I'm trying to write a Perl script that opens and reads a .gz file, splits it on a known word ('.EOM') which is repeated many times in the file, and saves all the resulting pieces in a .txt or .tmp file. The .gz file is very, very large (several GB). I've tried many different ways, but every time it shows the following error at the end:
"panic: sv_setpvn called with negative strlen at perl_gz1.pl line 7, <IN> line 38417185"
Here 'perl_gz1.pl' is my Perl file name and line 7 is the line where I've written: my @spl = split('.EOM', $join);
I don't know what kind of error this is or how to resolve it. Can anyone help? Is there another way to do the same thing without getting this error? Thanks in advance.
I've attached my full code.
I've tried the following:
use strict;
use warnings;

my $file = "/nfs/iind/disks/saptak/dsbnatrgd.scntcl.gz";
open(IN, "gzcat $file |") or die "gunzip $file: $!";
my $join = join('', <IN>);
#print $join;
my @spl = split('.EOM', $join);
print @spl;
close IN;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

my $input  = "/nfs/iind/disks/cpc_disk0025/saptak/dsbnatrgd.scntcl.gz";
my $output = "NEW1.tmp";
gunzip $input => $output or die "gunzip failed: $GunzipError\n";
my $data = join("", "NEW1.tmp");
#use File::Slurp;
#my $data = read_file("NEW1.tmp");
my @spl = split(/.EOM/, $data);
and
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
use IO::File;

my $input = new IO::File "</nfs/iind/disks/cpc_disk0025/saptak/dsbnatrgd.scntcl.gz" or die "Cannot open 'file1.txt.gz': $!\n";
my $buffer;
gunzip $input => \$buffer or die "gunzip failed: $GunzipError\n";
print $buffer;
my @spl = split(".EOM", $buffer);
But the same error comes every time.
I expect the array @spl to hold the contents of the file split at the specified word/string, and the output to be printed, so that I can work forward with @spl. But no output comes, and the error "panic: sv_setpvn called with negative strlen at perl_gz1.pl line 7, <IN> line 38417185" is shown on the screen.
This might be how I would do it if it was a one time job:
zcat dsbnatrgd.scntcl.gz | perl -ne'sub newf{$n||="0000";$n++;open($fh,">","output_$n.txt")||die}$fh||newf();/(.*)\.EOM(.*)/ and print {$fh} $1 and newf() and print {$fh} $2 or print {$fh} $_'
This gives you a new file output_nnnn.txt each time a .EOM is seen, where nnnn is 0001, 0002 and so on. A .EOM can occur in the middle of a line as well; in that case the text before it becomes the last string of the previous file and the text after it becomes the first string of the next file.
The one-liner explained:
sub newf {
    $n ||= "0000";
    $n++;                                     # increase the filename counter
    open( $fh, ">", "output_$n.txt" ) || die  # open a new output filehandle
}
$fh || newf();        # 1st input line: create the $fh filehandle if it doesn't exist yet
/(.*)\.EOM(.*)/       # if the input line has a .EOM mark, grab what's before and after it
  and print {$fh} $1  # ...and print the "before" part to the current file
  and newf()          # ...and open a new file
  and print {$fh} $2  # ...and print the part after .EOM to the new file
  or print {$fh} $_   # or, if there is no .EOM on this line, print it to the current file
(Or did you mean that the .EOM mark appears in the raw, still-compressed bytes of the .gz file? In that case the .gz file is probably invalid.)
The reason your approach doesn't work is probably the sheer size of the input. You mentioned that the .gz file is some GB, so the uncompressed input is probably several times bigger than that. My approach doesn't attempt to keep everything in memory at once, so it doesn't matter how big the file is.
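If you would rather have a script than a one-liner, the same streaming idea can be written with IO::Uncompress::Gunzip reading one line at a time, so that no scalar ever has to hold the whole file. A sketch (the input file name is from the question; the output naming scheme is my own choice):

use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

my $z = IO::Uncompress::Gunzip->new("dsbnatrgd.scntcl.gz")
    or die "gunzip failed: $GunzipError\n";

my $n = 0;
open( my $out, '>', sprintf("output_%04d.txt", ++$n) ) or die $!;
while ( my $line = $z->getline ) {
    # limit -1 keeps trailing empty fields, so a line ending in .EOM still starts a new file
    my @parts = split /\.EOM/, $line, -1;
    for my $i ( 0 .. $#parts ) {
        print {$out} $parts[$i];
        if ( $i < $#parts ) {    # a .EOM followed this part: switch to a new output file
            close $out;
            open( $out, '>', sprintf("output_%04d.txt", ++$n) ) or die $!;
        }
    }
}
close $out;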

Why is this Perl foreach loop executing only once?

I am trying to copy the content of three separate .vect files into one. I want to do this for all 5,000 files in the $fromdir directory.
When I run this program it generates just a single modified .vect file in the output directory. If I include the close(DATA) calls after the individual while loops inside the foreach loop, I get the same behavior: a single output file in the output directory instead of the wanted 5,000 files.
I have done some reading, and at first thought I may not be opening the files. But if I print($vectfile) in the foreach loop, every file name in the directory is printed.
My second thought was that it was how I was closing the files, but I get the same behavior whether I close the file handles inside or outside the foreach loop.
My final thought was maybe I don't have write permission to the file or directory, but I don't know how to change this.
How can I get this loop to run all 5,000 times and not just once?
use strict;
use warnings;
use feature qw(say);

my $dir = "D:\\Downloads";
# And M3.1 and P3.1
my $subfolder = "A0.1";
my $fromdir = $dir . "\\" . $subfolder;
my @files = <$fromdir/*vect>;

# Top of file
my $readfiletop = "C:\\Users\\Owner\\Documents\\MoreKnotVis\\ScriptsForAdditionalDataSets\\VectFileHeader.vect";
# Bottom of file
my $readfilebottom = "C:\\Users\\Owner\\Documents\\MoreKnotVis\\ScriptsForAdditionalDataSets\\VectFileCloser.vect";

foreach my $vectfile ( @files ) {
    say("$vectfile");
    my $count = 0;
    my $readfilebody = $vectfile;
    my $out_file = "D:\\Downloads\\ColorsA0.1\\" . "$count" . ".vect";
    $count++;
    # open top part of each file
    open(DATA1, "<", $readfiletop) or die "Can't open '$readfiletop': $!";
    # open bottom part of each file
    open(DATA3, "<", $readfilebottom) or die "Can't open '$readfilebottom': $!";
    # open a file to read
    open(DATA2, "<", $vectfile) or die "Can't open '$vectfile': $!";
    # open a file to write to
    open(DATA4, ">", $out_file) or die "Can't open '$out_file': $!";
    # Copy data from VectFileTop file to another.
    while ( <DATA1> ) {
        print DATA4 $_;
    }
    # Copy the data from VectFileBody to another.
    while ( <DATA2> ) {
        print DATA4 $_, $_ if 8..12;
    }
    # Copy the data from VectFileBottom to another.
    while ( <DATA3> ) {
        print DATA4 $_;
    }
}
close( DATA1 );
close( DATA2 );
close( DATA3 );
close( DATA4 );
print("quit\n");
You construct the output file name including $count in it. But note what you do with this variable:
- inside the loop you set it to 0,
- the output file name is constructed with 0 in it,
- then you increment it, but this has no effect, because the variable is set to 0 again on the next pass of the loop.
The effect is that:
- the loop executes the required number of times,
- but the output file name contains 0 as the "number" every time,
- so you keep overwriting the same file with new content.
Move the my $count = 0; instruction before the loop and everything should be OK.
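In other words, a minimal sketch of the fix, keeping the original variable names (the rest of the loop body is elided):

my $count = 0;    # moved out of the loop so it survives between iterations
foreach my $vectfile ( @files ) {
    my $out_file = "D:\\Downloads\\ColorsA0.1\\" . $count . ".vect";
    $count++;
    ...           # the rest of the loop body is unchanged
}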
You seem to be clinging to a specific form of code in fear of everything falling apart if you change a single thing. I recommend that you dare to stray a little more from the formula so that the code is more concise and readable
The problem is that you reset your $count to zero before processing each input file, so all the output files have the same name and overwrite one another. The remaining output file contains only the data from the last input file
Here's a refactoring of your code. I can't guarantee that it will run correctly but it looks right and does compile
I've added use autodie to avoid having to check the status of every IO operation
I've used the same lexical file handle $fh for all the input files. Opening another file on a file handle that is already open will close it first, and a lexical file handle will be closed by perl when it goes out of scope at the end of the block
I've used a while loop to iterate over the input file names instead of reading the whole list into an array, which unnecessarily uses an additional variable @files and wastes space
I've used forward slashes instead of backslashes in all the file paths. This is fine in library calls on Windows: it is only a problem if they appear in command line input
I hope you'll agree that this form is more readable. I think you would have stood a much better chance of finding the problem if your code were in this form
use strict;
use warnings;
use autodie;
use feature qw/ say /;

my $indir     = 'D:/Downloads';
my $subdir    = 'A0.1';    # And M3.1 and P3.1
my $extrasdir = 'C:/Users/Owner/Documents/MoreKnotVis/ScriptsForAdditionalDataSets';

my $outdir     = "$indir/Colors$subdir";
my $topfile    = "$extrasdir/VectFileHeader.vect";
my $bottomfile = "$extrasdir/VectFileCloser.vect";

my $filenum;

while ( my $vectfile = glob "$indir/$subdir/*.vect" ) {
    say qq/Processing "$vectfile"/;
    $filenum++;
    open my $outfh, '>', "$outdir/$filenum.vect";
    my $fh;
    open $fh, '<', $topfile;
    print { $outfh } $_ while <$fh>;
    open $fh, '<', $vectfile;
    while ( <$fh> ) {
        print { $outfh } $_, $_ if 8..12;
    }
    open $fh, '<', $bottomfile;
    print { $outfh } $_ while <$fh>;
}
say 'DONE';

How to print lines from a log file which occur after some particular time

I want to print all the lines which occur after a time whose value is returned by the localtime function of Perl, inside a Perl script. I tried something like below:
my $timestamp = localtime();
open(CMD, '-|', 'cat xyz.log | grep -A1000 \$timestamp') || die('Could not open');
while (defined(my $line = <CMD>)) {
    print $line;
}
If I replace the $timestamp in the cat command with an actual time component from xyz.log then it prints lines, but it's not printing with the $timestamp variable.
Is there an alternative way I can print the lines that occur after the current time in the log file, or how can I improve the above command?
Your $timestamp is never evaluated in Perl as it appears only in single quotes. But why go out to the shell in order to match a string and process a file? Perl is far better for that.
Here is a direct way first, then a basic approach. A full script is shown in the second example.
Read the file until you get to the line with the pattern, and exit the loop at that point. The next time you access that filehandle you'll be on the next line and can start printing, in another loop.
while (<$fh>) { last if /$timestamp/ }
print while <$fh>;
This prints out the part of the file starting with the line following the one which has the $timestamp anywhere in it. Adjust how exactly to match the timestamp if it is more specific.
Or -- set a flag when a line matches the timestamp, print if flag is set.
use warnings 'all';
use strict;

my $timestamp = localtime();
my $logfile = 'xyz.log';
open my $fh, '<', $logfile or die "Can't open $logfile: $!";

my $mark = 0;
while (<$fh>)
{
    if (not $mark) {
        $mark = 1 if /$timestamp/;
    }
    else { print }
}
close $fh;
If you're doing the grepping in Shell anyway you might as well do it the other way round and call perl only to give you the result of localtime:
sed <xyz.log -ne"/^$(perl -E'say scalar localtime')/,\$p"
This uses sed's range addressing: first keep it from printing lines unless explicitly told to (using -n), then select everything between the first occurrence of the timestamp (I added a ^ for good measure, just in case log lines could contain timestamps in plain text) and the end of file ($) and print it (p).
A pure Perl solution could look like this:
my $timestamp = localtime();
my $found;
open(my $fh, '<', 'xyz.log') or die("Could not open xyz.log: $!");
while (<$fh>) {
    if ($found) {
        print;
    } else {
        $found = 1 if /^$timestamp/;
    }
}
I would suggest a Perlish approach like below:
open(my $cmd, "<", "xyz.log") or die $!;

# get all lines into an array, one line per index
my @log_lines = <$cmd>;

my $index = 0;
foreach my $line (@log_lines) {
    # write a regex to capture the time from a log line
    my $rex = qr/regex_for_time_as_per_logline/is;
    if ($line =~ /$rex/) {
        # found the line with the expected time
        last;
    }
    $index++;
}

# At this point we have the array index where our expected time starts,
# so all later indexes hold the desired lines, which we can print:
foreach ($index .. $#log_lines) {
    print $log_lines[$_];
}
If you share one of your logline, I could help with the regex.
You may also try this approach:
In this case, I tried to open /var/log/messages, convert each line's timestamp to an epoch, and find all the lines which have occurred after time().
use Date::Parse;

my $epoch_now = time();    # current time as an epoch
open(my $fh, "</var/log/messages") || die "error: $!\n";
while (<$fh>) {
    chomp;
    # one log line looks like this:
    # Sep  9 08:17:01 localhost rsyslogd: rsyslogd was HUPed
    my ($mon, $day, $hour, $min, $sec) = ($_ =~ /(\S+)\s*(\d+)\s*(\d+):(\d+):(\d+)/);
    # the date string parts shouldn't be empty
    if (defined($mon) && defined($day)
        && defined($hour) && defined($min)
        && defined($sec)) {
        my $epoch_log = str2time("$mon $day $hour:$min:$sec");
        if ($epoch_log > $epoch_now) {
            print "$_\n";
        }
    }
}

Generate dynamic file name list

I started programming in Perl a few months back and this is my first question on stackoverflow.com. I hope I can get a solution.
I want to copy some files from an FTP server. The file names are in this format:
abc_201149_cde_07_fgh_include_internal
In this example the numeric part changes on a weekly basis, e.g. 201149 says year = 2011 and week = 49. Similarly, 07 says which version it is.
I have copied all the file names into one file called "sdk_link.txt", and I am reading each file name from it and then copying the file to my local PC:
use Net::FTP;
use File::Copy;

$Login = "<redacted>";
$Pwd = "<redacted>";
$ftpHost = "<redacted>";
$ftpFolder = "/daj/dhakj/ahdakl/abc_201206_def_05";

$ftp = Net::FTP->new($ftpHost, Timeout => 100);
if ($ftp)
{
    print $ftp->message;
}
$ftp->login($Login, $Pwd);
print $ftp->message;
$ftp->cwd($ftpFolder);
print $ftp->message;

open FILE, "sdk_link.txt" or die $!;
while ($test = <FILE>)
{
    chomp($test);
    # Copy the file
    copy("$test", "/testing");
}
$ftp->quit;
I want to run this script every week on Windows. How can I make the numeric part change so that the correct files get downloaded?
Well, the obvious answer is to keep a template on file, and insert the correct numbers. For example:
echo abc_%s_cde_%s_fgh_include_internal |
perl -MPOSIX -nE 'say sprintf $_, strftime("%Y%U", localtime()), "07";'
Output:
abc_201207_cde_07_fgh_include_internal
So if you have a file with templates, you can use %s to insert strings, and provide arguments either from your own list of arguments, or dynamically, as you prefer. E.g.:
my $year = "2011";
my $week = "49";
my $ver  = "07";    # Strings or numbers does not really matter

open my $fh, '<', "sdk_link.txt" or die $!;
while (<$fh>) {
    my $file = sprintf $_, $year, $week, $ver;
    copy($file, "/testing") or die "Copy failed for file $file: $!";
}
I am not so sure File::Copy::copy works as intended for remote files, but that's another question. I believe Net::FTP::get() might be what you want.
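Putting the two together, here is a sketch of the download loop using Net::FTP::get() with the week number generated by POSIX::strftime. The folder template and the local target directory are assumptions based on the question, not tested against the real server:

use strict;
use warnings;
use POSIX qw(strftime);
use Net::FTP;

my $yearweek = strftime("%Y%U", localtime);    # e.g. "201149"
my $ver      = "07";

my $ftp = Net::FTP->new("<redacted>", Timeout => 100)
    or die "Cannot connect: $@";
$ftp->login("<redacted>", "<redacted>")
    or die "Login failed: ", $ftp->message;
$ftp->cwd("/daj/dhakj/ahdakl/abc_${yearweek}_def_$ver")
    or die "cwd failed: ", $ftp->message;

open my $list, '<', "sdk_link.txt" or die $!;
while (my $template = <$list>) {
    chomp $template;
    # fill the %s placeholders in the template with this week's values
    my $file = sprintf $template, $yearweek, $ver;
    $ftp->get($file, "/testing/$file")
        or warn "get failed for $file: ", $ftp->message;
}
close $list;
$ftp->quit;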