Running into a memory error using Perl and DateTime


I am writing a small tool to parse some application logs and collect data that will be used as input for Zabbix monitoring. I only want to keep data from log lines written within the past two hours.
The format of the logs is pretty simple: the fields are separated by whitespace, and the first three fields give the time at which the line was written.
Here is an example of the first three fields of a log line:
Jan 5 13:42:07
What I set out to do was use one of my favorite modules, DateTime: convert the above into a DateTime object, then compare that object to another DateTime object created when the utility is invoked.
Everything was fine and dandy and working nicely until I ran the utility against a portion of the logs it would actually be parsing, only a couple of gigabytes in size. The test run was done on a kitchen-invoked Ubuntu VirtualBox instance on my laptop, so the resources are, as expected, rather limited. The script would halt with the word 'Killed' displayed.
Looking into /var/log/messages I would see log lines describing the process being killed due to resource issues.
When I invoked the process again and switched to another screen instance to watch top, I noticed that the memory percentage would grow and swap space would begin to be consumed, until the script again stopped with the 'Killed' message.
When I would rerun the script with the DateTime portion commented out, the script would execute as expected.
In the script I have a subroutine that is called to create a DateTime object from the first three fields of the log line. I have tried creating the object at the beginning of the subroutine and then undef-ing it before returning a value at the end. I have also tried creating a global object (using our) and then using the DateTime set_* methods to modify what I thought would be a single object's values.
I have read that Perl does not clean up hash memory so that it can be reused by the program, and I feel that this is at the root of the issue I am running into.
At this point, I feel the need to get input from others, which is the reason for this post. All comments and criticisms would be appreciated.
This utility was running on Perl v5.14.2.
This code produces the memory leak:
#!/usr/bin/perl -w
use strict;
use DateTime;
my $month = 1;
my $day = 6;
my $hour = 20;
my $minute = 30;
my $second = 00;
for (my $count = 0; $count <= 25_000_000; $count++) {
    my $epoch = &get_epoch( $month, $day, $hour, $minute, $second );
}
sub get_epoch {
    my $mon = shift;
    my $day = shift;
    my $hour = shift;
    my $min = shift;
    my $sec = shift;
    my $temp_dt = DateTime->new(
        year => 2015,
        month => $mon,
        day => $day,
        hour => $hour,
        minute => $min,
        second => $sec,
        nanosecond => 500_000_000,
        time_zone => 'UTC',
    );
    return( $temp_dt->epoch );
}

This is a bug in Params::Validate 1.15 and will be fixed very soon.
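Independently of that bug, the filtering described in the question can be sketched without building an object per log line. The following is a minimal sketch rather than the asker's code: it uses the core Time::Local module, and the helper names log_epoch and recent_lines, the sample lines, and the assumption that the caller supplies the year (the log format carries none) are all illustrative.

```perl
use strict;
use warnings;
use Time::Local qw(timelocal);

# Month abbreviations as they appear in the log, mapped to 0-based months.
my %MONTH;
@MONTH{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} = (0 .. 11);

# Hypothetical helper: turn the first three fields of a log line
# ("Jan 5 13:42:07 ...") into epoch seconds in the local time zone.
# The year is an assumption passed in by the caller.
sub log_epoch {
    my ($line, $year) = @_;
    my ($mon, $day, $hms) = split ' ', $line;
    return unless defined $hms && exists $MONTH{$mon};
    my ($hour, $min, $sec) = split /:/, $hms;
    return timelocal($sec, $min, $hour, $day, $MONTH{$mon}, $year);
}

# Hypothetical filter: keep only lines stamped at or after the cutoff.
sub recent_lines {
    my ($cutoff, $year, @lines) = @_;
    return grep {
        my $epoch = log_epoch($_, $year);
        defined $epoch && $epoch >= $cutoff;
    } @lines;
}

my @sample = (
    "Jan 5 13:42:07 host app: first message",
    "Jan 5 16:10:00 host app: second message",
);

# Fixed cutoff for the demo; a real run would use "time - 2 * 60 * 60".
my $cutoff = timelocal(0, 0, 15, 5, 0, 2015);
print "$_\n" for recent_lines($cutoff, 2015, @sample);
```

With the fixed cutoff of 15:00 on January 5, only the second sample line is printed. Logs that span a year boundary would need extra logic to pick the right year for each line.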

Related

How can I get array of Sundays between two dates?

I'm starting out with two dates.
my $date1 = "01/01/2016";
my $date2 = "05/15/2016";
I need to collect the dates of all the Sundays between the two dates into an array.
Any idea where I should start?

Your solution is good, but it potentially consumes a lot of memory creating an array of Sundays that might never be used. DateTime objects are neither small nor cheap.
An alternative is an iterator, a function which every time it's called generates the next Sunday. It generates each Sunday on demand rather than calculating them all beforehand. This saves memory and supports potentially infinite Sundays.
use strict;
use warnings;
use v5.10;
sub sundays_iterator {
    my($start, $end) = @_;
    # Since we're going to modify it, copy it.
    $start = $start->clone;
    # Move to Sunday, if necessary.
    $start->add( days => 7 - $start->day_of_week );
    # Create an iterator using a closure.
    # This will remember the values of $start and $end as
    # they were when the function was returned.
    return sub {
        # Clone the date to return it without modifications.
        # We always start on a Sunday.
        my $date = $start->clone;
        # Move to the next Sunday.
        # Do this after cloning because we always start on Sunday.
        $start->add( days => 7 );
        # Return Sundays until we're past the end date
        return $date <= $end ? $date : ();
    };
}
That returns a closure, an anonymous subroutine which remembers the lexical variables it was created with. Sort of like an inside out object, a function with data attached. You can then call it like any subroutine reference.
my $sundays = sundays_iterator($startDate, $endDate);
while( my $sunday = $sundays->() ) {
    say $sunday;
}
The upside is that it saves a lot of memory. This can be especially important if you're taking the dates as user input: a malicious attacker could ask for an enormous range, consuming a lot of your server's memory.
It also allows you to separate generating the list from using the list. Now you have a generic way of generating Sundays within a date range (or, with a slight tweak, any day of the week).
The downside is it's likely to be a bit slower than building an array in a loop... but probably not noticeably so. Function calls are relatively slow in Perl, so making one function call for each Sunday will be slower than looping, but calling those DateTime methods (which call other methods which call other methods) will swamp that cost. Compared to using DateTime, calling the iterator function is a drop in the bucket.

You should start by picking a module. I'm partial to DateTime, using DateTime::Format::Strptime for the parsing.
use DateTime qw( );
use DateTime::Format::Strptime qw( );
my $start_date = "01/01/2016";
my $end_date = "05/15/2016";
my $format = DateTime::Format::Strptime->new(
    pattern => '%m/%d/%Y',
    time_zone => 'floating', # Best not to use "local" for dates with no times.
    on_error => 'croak',
);
my $start_dt = $format->parse_datetime($start_date)->set_formatter($format);
my $end_dt = $format->parse_datetime($end_date )->set_formatter($format);
my $sunday_dt = $start_dt->clone->add( days => 7 - $start_dt->day_of_week );
while ($sunday_dt <= $end_dt) {
    print "$sunday_dt\n";
    $sunday_dt->add( days => 7 );
}
Note: You really shouldn't use DateTime->new as Bill used and Schwern endorsed. It's not the recommended use of DateTime because it creates code that's far more complicated and error-prone. As you can see, using a formatter cut the code size in half.
Note: Schwern is advocating the use of an iterator, replacing the last four lines of my answer with something 4 times longer (all the code in his answer). There's no reason for that level of complexity! He goes to great lengths to say how much memory the iterator is saving, but it doesn't save any at all.

DateTime::Set makes constructing an iterator easy:
use feature qw( say );
use DateTime::Format::Strptime ();
use DateTime::Set ();
my $start_date = "01/01/2016";
my $end_date = "05/15/2016";
my $format = DateTime::Format::Strptime->new(
    pattern => '%m/%d/%Y',
    time_zone => 'local',
    on_error => 'croak',
);
my $iterator = DateTime::Set->from_recurrence(
    start => $format->parse_datetime($start_date)->set_formatter($format),
    end => $format->parse_datetime($end_date)->set_formatter($format),
    recurrence => sub { $_[0]->add( days => 7 - $_[0]->day_of_week || 7 ) }, # next Sunday after $_[0]
);
while ( my $date = $iterator->next ) {
    say $date;
}

This is what I came up with but please let me know if there is a better way.
use DateTime;
my $date1 = "1/1/2016";
my $date2 = "5/15/2016";
my ($startMonth, $startDay, $startYear) = split(/\//, $date1);
my ($endMonth, $endDay, $endYear) = split(/\//, $date2);
my $startDate = DateTime->new(
    year => $startYear,
    month => $startMonth,
    day => $startDay
);
my $endDate = DateTime->new(
    year => $endYear,
    month => $endMonth,
    day => $endDay
);
my @sundays;
do {
    my $date = DateTime->new(
        year => $startDate->year,
        month => $startDate->month,
        day => $startDate->day
    );
    push @sundays, $date if ($date->day_of_week == 7);
    $startDate->add(days => 1);
} while ($startDate <= $endDate);
foreach my $sunday (@sundays) {
    print $sunday->strftime("%m/%d/%Y");
}

Perl execute a command at a specified time

I need to write a Perl script that executes a command at a specified time. It should:
Use Net::SSH::Expect to log in to a router.
Read the time from the router's clock (the "show clock" command displays the time).
At 17:30:00, execute a command.
I tried writing a script for it, but it doesn't work. Any suggestions, please?
use strict;
use warnings;
use autodie;
use feature qw/say/;
use Net::SSH::Expect;
my $Time;
my $ssh = Net::SSH::Expect->new(
    host => "ip",
    password => 'pwd',
    user => 'user name',
    raw_pty => 1,
);
my $login_output = $ssh->login();
while(1) {
    $Time = localtime();
    if( $Time == 17:30:00 ) {
        my $cmd = $ssh->exec("cmd");
        print($cmd);
    } else {
        print" Failed to execute the cmd \n";
    }
}

Several things here:
First, use Time::Piece. It's now included in Perl.
use Time::Piece;
for (;;) { # I prefer using "for" for infinite loops
    my $time = localtime; # localtime creates a Time::Piece object
    # I could also simply look at $time
    if ( $time->hms eq "17:30:00" ) {
        my $cmd = $ssh->exec("cmd");
        print "$cmd\n";
    }
    else {
        print "Didn't execute command\n";
    }
}
Second, you shouldn't use a loop like this because you're going to be tying up a process just looping over and over again. You can try sleeping until the correct time:
use strict;
use warnings;
use feature qw(say);
use Time::Piece;
my $time_zone = "-0500"; # Or whatever your offset from GMT
my $current_time = localtime;
my $run_time = Time::Piece->strptime(
    $current_time->mdy . " 17:30:00 $time_zone", # Time you want to run including M/D/Y
    "%m-%d-%Y %H:%M:%S %z"); # Format of timestamp
sleep $run_time - $current_time;
$ssh->exec("cmd");
...
What I did here was calculate the difference between the time you want to run your command and the current time, then sleep for that long. The only issue arises if the script is started after 5:30pm local time: the difference is then negative, so you would have to target 5:30pm on the next day instead.
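The next-day case can be handled by computing the target on tomorrow's date whenever today's slot has already passed. The following is a minimal sketch using the core Time::Local module instead of Time::Piece; the helper name seconds_until and its optional $now argument are assumptions made for illustration.

```perl
use strict;
use warnings;
use Time::Local qw(timelocal);

# Hypothetical helper: seconds from $now (epoch seconds, defaults to the
# current time) until the next local occurrence of $hour:$min:$sec.
sub seconds_until {
    my ($hour, $min, $sec, $now) = @_;
    $now = time unless defined $now;

    my @today  = (localtime $now)[3, 4, 5];    # mday, mon, year
    my $target = timelocal($sec, $min, $hour, @today);

    if ($target <= $now) {
        # Already past today's slot: step from noon today to roughly noon
        # tomorrow (safe across DST shifts) and rebuild the target there.
        my $noon     = timelocal(0, 0, 12, @today);
        my @tomorrow = (localtime($noon + 24 * 60 * 60))[3, 4, 5];
        $target = timelocal($sec, $min, $hour, @tomorrow);
    }
    return $target - $now;
}

printf "Next 17:30:00 is %d seconds away\n", seconds_until(17, 30, 0);
```

The result can be passed straight to sleep. Stepping via noon rather than adding 24 hours to the target keeps the calculation on the right calendar day when a DST change shortens or lengthens the day.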
Or, even better, if you're on Unix, look up the crontab and use that. The crontab will allow you to specify exactly when a particular command should be executed, and you don't have to worry about calculating it in your program. Simply create an entry in the crontab table:
30 17 * * * my_script.pl
The 30 and 17 say you want to run your script every day at 5:30pm. The asterisks are for the day of the month, the month, and the day of the week. For example, if you only want to run your program on weekdays:
30 17 * * 1-5 my_script.pl # Sunday is 0, Mon is 1...
Windows has a similar method called the Schedule Control Panel where you can set up jobs that run at particular times. You might have to use perl my_script.pl, so Windows knows to use the Perl interpreter for executing your program.
I highly recommend using the crontab route. It's efficient, guaranteed to work, and allows you to concentrate on your program and not on finagling when to execute it. Plus, it's flexible, everyone knows about it, and no one will kill your task while it sits there and waits for 5:30pm.

localtime converts a Unix timestamp (seconds since epoch, which is about 1.4 billion now) to a list of values. The time function conveniently provides that timestamp. From perldoc -f localtime:
Converts a time as returned by the time function to a 9-element
list with the time analyzed for the local time zone. Typically
used as follows:
# 0 1 2 3 4 5 6 7 8
($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) =
localtime(time);
For your time comparison you could do:
$Time = sprintf '%02d:%02d:%02d', (localtime(time))[2, 1, 0];
if ($Time eq '17:30:00') {
    ...
}
Since Perl allows the postcircumfix [...] operator to index into lists just like it does with arrays, we can use it to extract the slice of the (localtime(time)) list that contains hours, minutes, and seconds, format them with sprintf as a zero-padded, colon-separated string (a plain join ':' would yield 17:30:0, which never matches), and assign the resulting string to $Time.
Note that because $Time now holds a string, you should compare it to the quoted string '17:30:00' rather than the bare 17:30:00, which isn't valid Perl and results in a compilation error. And since we're comparing strings instead of numbers, we use the eq operator. == forces numeric context on its operands, and since '17:30:00' isn't a valid number, Perl will use only its leading numeric part (17) and, with warnings enabled, warn you with
Argument "17:30:00" isn't numeric in numeric eq (==) at ....

Scheduling scripts at a different timezone

There are some programs/scripts that need to be run at specific times in a timezone different from the system timezone.
A la crontab in Perl, but one that honors a timezone and DST rules in a region different from that in which the system is configured.
Here is the use case : I will create an excel sheet with the time in PT in column B and the corresponding program/Perl script to run in column C.
Nothing specific about this information being in a Excel sheet - could be plain text file/"crontab" entry too.
A Perl script will read in the data from the excel sheet and run/spawn those scripts at the correct time.
The thing to keep in mind is that the Perl script should run correctly regardless of the timezone of the system it is running on.
Regardless of whether the script is running on a box in NY, IL, or CA, it should spawn the scripts at the times given in the file entries, interpreted as Pacific Time with DST taken into account.
It is very important, as I said before, that it be aware "automagically" (without me doing any explicit programming) of the latest DST rules for the PT region.
What would you suggest?
Maybe I can visit some website that shows current time in that region and scan the time value from it, and run the scripts when it's the correct time?
Any such Perl screen scraper friendly site?
Or maybe I can use some smart Perl module, like Schedule::Cron
For the record, a large number of good suggestions came by at http://www.perlmonks.org/index.pl?node_id=772934, however, they, in typical at/cron fashion, work as per the system configured timezone.

In general, if you care about timezones, represent times internally in some universal format and convert times for display purposes only.
Applying this to your problem, write a crontab whose times are expressed in GMT. On each worker machine, convert to local time and install the crontab.
Front matter:
#! /usr/bin/perl
use warnings;
use strict;
use feature qw/ switch /;
use Time::Local qw/ timegm /;
For the conversions this program supports, use today's date and substitute the time from the current cronjob. Return the adjusted hour and day-of-week offset:
sub gmtoday {
    my($gmmin,$gmhr,$gmmday,$gmmon,$gmwday) = @_;
    my @gmtime = gmtime $^T;
    my(undef,undef,$hour,$mday,$mon,$year,$wday) = @gmtime;
    my @args = (
        0, # sec
        $gmmin eq "*" ? "0" : $gmmin,
        $gmhr,
        $mday,
        $mon,
        $year,
    );
    my($lhour,$lwday) = (localtime timegm @args)[2,6];
    ($lhour, $lwday - $wday);
}
Take the five-field time specification from the current cronjob and convert it from GMT to local time. Note that a fully general implementation would support 32 (i.e., 2 ** 5) cases.
sub localcron {
    my($gmmin,$gmhr,$gmmday,$gmmon,$gmwday) = @_;
    given ("$gmmin,$gmhr,$gmmday,$gmmon,$gmwday") {
        # trivial case: no adjustment necessary
        when (/^\d+,\*,\*,\*,\*$/) {
            return ($gmmin,$gmhr,$gmmday,$gmmon,$gmwday);
        }
        # hour and maybe minute
        when (/^(\d+|\*),\d+,\*,\*,\*$/) {
            my($lhour) = gmtoday @_;
            return ($gmmin,$lhour,$gmmday,$gmmon,$gmwday);
        }
        # day of week, hour, and maybe minute
        when (/^(\d+|\*),\d+,\*,\*,\d+$/) {
            my($lhour,$wdoff) = gmtoday @_;
            return ($gmmin,$lhour,$gmmday,$gmmon,$gmwday+$wdoff);
        }
        default {
            warn "$0: unhandled case: $gmmin $gmhr $gmmday $gmmon $gmwday";
            return;
        }
    }
}
Finally, the main loop reads each line from the input and generates the appropriate output. Note that we do not destroy unhandled times: they instead appear in the output as comments.
while (<>) {
    if (/^\s*(?:#.*)?$/) {
        print;
        next;
    }
    chomp;
    my @gmcron = split " ", $_, 6;
    my $cmd = pop @gmcron;
    my @localcron = localcron @gmcron;
    if (@localcron) {
        print join(" " => @localcron), "\t", $cmd, "\n"
    }
    else {
        print "# ", $_, "\n";
    }
}
For this sorta-crontab
33 * * * * minute only
0 0 * * * minute and hour
0 10 * * 1 minute, hour, and wday (same day)
0 2 * * 1 minute, hour, and wday (cross day)
the output is the following when run in the US Central timezone:
33 * * * * minute only
0 18 * * * minute and hour
0 4 * * 1 minute, hour, and wday (same day)
0 20 * * 0 minute, hour, and wday (cross day)

In the schedule, store the number of seconds from the epoch when each run should occur rather than a date/time string.
Expanding a little:
#!/usr/bin/perl
use strict; use warnings;
use DateTime;
my $dt = DateTime->new(
    year => 2010,
    month => 3,
    day => 14,
    hour => 2,
    minute => 0,
    second => 0,
    time_zone => 'America/Chicago',
);
print $dt->epoch, "\n";
gives me
Invalid local time for date in time zone: America/Chicago
because 2:00 am on March 14, 2010 is when the switch occurs. On the other hand, using hour => 3, I get: 1268553600. Now, in New York, I use:
C:\Temp> perl -e "print scalar localtime 1268553600"
Sun Mar 14 04:00:00 2010
So, the solution seems to be to avoid scheduling these events during non-existent times in your local time zone. This does not require elaborate logic: Just wrap the DateTime constructor call in an eval and deal with the exceptional time.

While I certainly think that there are likely "cleaner" solutions, would the following work?
set the cron to run the scripts several hours ahead of the possible range of times you actually want the script to run
handle the timezone detection in the script and have it sleep for the appropriate amount of time
Again, I know this is kinda kludgey but I thought I would put it out there.

Use the DateTime module to calculate times.
So if your setup says to run a script at 2:30 am every day, you will need logic to:
Try to create a DateTime object for 2:30 am in time zone America/Los_Angeles.
If no object is created, add 5 minutes to the time and try again. Give up after a 2-hour offset.
Once you have a DateTime object, you can do comparisons with DateTime->now or extract an epoch time from your object and compare that with the results of time.
Note that I chose 2:30 am, since that time won't exist at least 1 day a year. That's why you need to have a loop that adds an offset.
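The retry loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a complete scheduler: the helper name first_valid_time and its named arguments are hypothetical, and it relies on the DateTime constructor dying for a local time that does not exist in the zone.

```perl
use strict;
use warnings;
use DateTime;

# Hypothetical helper: build a DateTime for the requested wall-clock time.
# If that time does not exist in the zone (DST gap), retry in 5-minute
# steps and give up after a 2-hour offset, returning nothing.
sub first_valid_time {
    my (%want) = @_;    # year, month, day, hour, minute, time_zone
    my $start = $want{hour} * 60 + $want{minute};

    for (my $offset = 0; $offset <= 120; $offset += 5) {
        my $total = $start + $offset;
        my $dt = eval {
            DateTime->new(
                year      => $want{year},
                month     => $want{month},
                day       => $want{day},
                hour      => int($total / 60),
                minute    => $total % 60,
                time_zone => $want{time_zone},
            );
        };
        return $dt if $dt;    # constructor succeeded: the time exists
    }
    return;
}

my $dt = first_valid_time(
    year => 2010, month => 3, day => 14,
    hour => 2, minute => 30,
    time_zone => 'America/Los_Angeles',
);
print $dt ? $dt->iso8601 . "\n" : "no valid time found\n";
```

The returned object can then be compared with DateTime->now, or its epoch compared with time. Note that this sketch does not roll over midnight: a request late enough that the offset pushes the hour past 23 makes every constructor call fail, so the helper gives up.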

How can I test Perl applications using a changing system time?

I have a web application that I want to run some system tests on, and in order to do that I'm going to need to move the system time around. The application used DateTime all the way through.
Has anyone got any recommendations for how to change the time that DateTime->now reports? The only thing that comes to mind is subclassing DateTime and messing about with all the 'use' lines, but this seems rather invasive.
Note on answers:
All three will work fine, but the Hook::LexWrap one is the one I've chosen because (a) I want to move the clock rather than jiggle it a bit (which is more the purpose of what Time::Mock and friends do); (b) I do, consistently, use DateTime, and I'm happy to have errors come out if I've accidentally not used it; and (c) Hook::LexWrap is simply more elegant than a hack in the symbol table, for all that it does the same thing. (Also, it turns out to be a dependency of some module I already installed, so I didn't even have to CPAN it...)

Rather than taking the high-level approach and wrapping DateTime specifically, you might want to look into the modules Test::MockTime and Time::Mock, which override the low-level functions that DateTime etc. make use of, and (with any luck) will do the right thing on any time-sensitive code. To me it seems like a more robust way to test.

I think Hook::LexWrap is overkill for this situation. It's easier to just redefine such a simple function.
use DateTime;
my $offset;
BEGIN {
    $offset = 24 * 60 * 60; # Pretend it's tomorrow
    no warnings 'redefine';
    sub DateTime::now
    {
        shift->from_epoch( epoch => ($offset + scalar time), @_ )
    }
} # end BEGIN
You can replace my $offset with our $offset if you need to access the $offset from outside the file which contains this code.
You can adjust $offset at any time, if you want to change DateTime's idea of the current time during the run.
The calculation of $offset should probably be more complicated than shown above. For example, to set the "current time" to an absolute time:
my $want = DateTime->new(
year => 2009,
month => 9,
day => 14,
hour => 12,
minute => 0,
second => 0,
time_zone => 'America/Chicago',
);
my $current = DateTime->from_epoch(epoch => scalar time);
$offset = $want->subtract_datetime_absolute($current)->in_units('seconds');
But you probably do want to calculate a fixed number of seconds to add to the current time, so that time will advance normally after that. The problem with using add( days => 1 ); in the redefined now method is that things like DST changes will cause the time to jump at the wrong pseudotime.

You can use code injection via Hook::LexWrap to intercept the now() method.
use Hook::LexWrap;
use DateTime;
# Use real now
test();
{
    my $wrapper = wrap 'DateTime::now',
        post => sub {
            $_[-1] = DateTime->from_epoch( epoch => 0 );
        };
    # Use fake now
    test();
}
# use real now again
test();
sub test {
    my $now = DateTime->now;
    print "The time is $now\n";
}

When designing a new class with testability in mind, the ideal solution is to be able to inject new date objects.
However, for existing code using DateTime->now and DateTime->today a possible, suitably scoped, solution is below. I include it here as a way to do this without introducing Hook::LexWrap as a dependency and without affecting the behaviour globally.
{
    no strict 'refs';
    no warnings 'redefine';
    local *{'DateTime::today'} = sub {
        return DateTime->new(
            year => 2012,
            month => 5,
            day => 31
        );
    };
    say DateTime->today->ymd(); # 2012-05-31
};
say DateTime->today->ymd(); # today
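The injection approach mentioned at the start of this answer can be sketched as follows. This is a minimal illustration, not code from the application in question: the SessionChecker class, its clock and ttl parameters, and the use of epoch seconds instead of DateTime objects (to keep the example dependency-free) are all assumptions.

```perl
use strict;
use warnings;

package SessionChecker;    # hypothetical class, for illustration only

sub new {
    my ($class, %args) = @_;
    my $self = {
        # Injected time source; defaults to the real clock.
        clock => $args{clock} || sub { time },
        ttl   => $args{ttl}   || 3600,
    };
    return bless $self, $class;
}

# True when more than ttl seconds have passed since $started_at.
sub is_expired {
    my ($self, $started_at) = @_;
    return $self->{clock}->() - $started_at > $self->{ttl} ? 1 : 0;
}

package main;

# In a test, hand the object a clock we control.
my $fake_now = 1_000_000;
my $checker  = SessionChecker->new( clock => sub { $fake_now }, ttl => 60 );

print $checker->is_expired(999_990) ? "expired\n" : "active\n";    # 10 s old
$fake_now += 120;                                                  # move the clock
print $checker->is_expired(999_990) ? "expired\n" : "active\n";    # 130 s old
```

Because the class asks its clock for the time instead of calling time or DateTime->now directly, a test can move the clock arbitrarily without touching any global symbol. The same pattern works with a code reference that returns DateTime objects.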

How can I convert a log4j timestamp to milliseconds in Perl?

The log4j logs I have contain timestamps in the following format:
2009-05-10 00:48:41,905
I need to convert it in Perl to milliseconds since the epoch, which in this case would be 124189673005, as I currently do using the following gawk function. How do I do it in Perl?
I have little or no experience in Perl, so I would appreciate it if someone could post an entire script that does this.
function log4jTimeStampToMillis(log4jts) {
# log4jts is of the form 2009-03-02 20:04:13,474
# extract milliseconds, which come after the comma
split(log4jts, tsparts, ",");
millis = tsparts[2];
# remove - : from tsstr
tsstr = tsparts[1];
gsub("[-:]", " ", tsstr);
seconds = mktime(tsstr);
print log4jts;
return seconds * 1000 + millis;
}

Though I almost always tell people to go use one of the many excellent modules from the CPAN for this, most of them do have one major drawback - speed. If you're parsing a large number of log files in real-time, that can sometimes be an issue. In those cases, rolling your own can often be a more suitable solution, but there are many pitfalls and nuances that must be considered and handled properly. Hence the preference for using a known-correct, proven, reliable module written by somebody else. :)
However, before I even considered my advice above, I looked at your code and had converted it to perl in my head... therefore, here is a more-or-less direct conversion of your gawk code into perl. I've tried to write it as simply as possible, so as to highlight some of the more delicate parts of dealing with dates and times in perl by hand.
# import the mktime function from the (standard) POSIX module
use POSIX qw( mktime );
sub log4jTimeStampToMillis {
    my ($log4jts, $dst) = @_;
    # extract the millisecond field
    my ($tsstr, $millis) = split( ',', $log4jts );
    # extract values to pass to mktime(): sec, min, hour, mday, mon, year
    my @mktime_args = reverse split( '[-: ]', $tsstr );
    # munge values for posix compatibility (ugh):
    # month is 0-based and year counts from 1900; mday is already 1-based
    $mktime_args[4] -= 1;
    $mktime_args[5] -= 1900;
    # print Dumper \@mktime_args; ## DEBUG
    # convert, make sure to account for daylight savings
    my $seconds = mktime( @mktime_args, 0, 0, $dst );
    # return that time as milliseconds since the epoch
    return $seconds * 1000 + $millis;
}
One important difference between my code and yours: my log4jTimeStampToMillis subroutine takes two parameters:
the log timestamp string
whether or not that timestamp is using daylight savings time ( 1 for true, 0 for false )
Of course, you could just add code to detect if that time falls in DST or not and adjust automatically, but I was trying to keep it simple. :)
NOTE: If you uncomment the line marked DEBUG, make sure to add "use Data::Dumper;" before that line in your program so it will work.
Here's an example of how you could test that subroutine (the values shown are for a machine at UTC-4 with DST in effect, e.g. US Eastern time):
my $milliseconds = log4jTimeStampToMillis( "2009-05-10 00:48:41,905", 1 );
my $seconds = int( $milliseconds / 1000 );
my $local = scalar localtime( $seconds );
print "ms: $milliseconds\n"; # ms: 1241930921905
print "sec: $seconds\n"; # sec: 1241930921
print "local: $local\n"; # local: Sun May 10 00:48:41 2009
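For the automatic DST detection mentioned above, one option is to pass -1 as the last argument to POSIX::mktime, which asks the C library to decide from the local time zone rules whether DST applies. The following is a minimal sketch, assuming the timestamps are in the machine's local time zone; the name log4j_to_millis and the strict regular expression are illustrative choices.

```perl
use strict;
use warnings;
use POSIX qw(mktime);

# Hypothetical variant: convert "YYYY-MM-DD HH:MM:SS,mmm" (local time)
# to milliseconds since the epoch, letting mktime work out DST itself.
sub log4j_to_millis {
    my ($ts) = @_;
    my ($year, $mon, $mday, $hour, $min, $sec, $millis) =
        $ts =~ /^(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d),(\d{3})$/
        or return;    # not a log4j timestamp

    # mktime wants a 0-based month and a year counted from 1900;
    # the trailing -1 means "DST status unknown, please determine it".
    my $seconds = mktime($sec, $min, $hour, $mday, $mon - 1, $year - 1900,
                         0, 0, -1);
    return unless defined $seconds;

    return $seconds * 1000 + $millis;
}

my $ms = log4j_to_millis("2009-05-10 00:48:41,905");
print "ms: $ms\n";
print "local: ", scalar localtime(int($ms / 1000)), "\n";
```

The millisecond value depends on the time zone of the machine running the code, so the first line of output varies; the second line should always echo the wall-clock time from the input.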

You should take advantage of the great DateTime package, specifically use DateTime::Format::Strptime:
use DateTime;
use DateTime::Format::Strptime;
sub log4jTimeStampToMillis {
    my $log4jts = shift(@_);
    #see package docs for how the pattern parameter works
    my $formatter = DateTime::Format::Strptime->new(pattern => '%Y-%m-%d %T,%3N');
    my $dayObj = $formatter->parse_datetime($log4jts);
    return $dayObj->epoch()*1000+$dayObj->millisecond();
}
print log4jTimeStampToMillis('2009-05-10 10:48:41,905')."\n";
#prints my local version of the TS: 1241952521905
This saves you the pain of figuring out DST yourself (although you'll have to pass your server's TZ to Strptime via the time_zone parameter). It also saves you from dealing with leap everything if it becomes relevant (and I'm sure it will).

Haven't used it, but you might want to check out Time::ParseDate.

SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS");
Date time = dateFormat.parse(log4jts);
long millis = time.getTime();
