Running only one Perl script instance by cron - perl

I need to run a Perl script from cron periodically (every 3-5 minutes or so). I want to ensure that only one instance of the script is running at a time, so the next cycle won't start until the previous one has finished. Can/should this be achieved with some built-in functionality of cron or Perl, or do I need to handle it at the script level?
I am quite new to Perl and cron, so help and general recommendations are appreciated.

I have always had good luck using File::NFSLock to get an exclusive lock on the script itself.
use Fcntl qw(LOCK_EX LOCK_NB);
use File::NFSLock;
# Try to get an exclusive lock on myself.
my $lock = File::NFSLock->new($0, LOCK_EX|LOCK_NB);
die "$0 is already running!\n" unless $lock;
This is much the same as the other lock-file suggestions, except that I don't have to do anything beyond attempting to get the lock.

The Sys::RunAlone module does what you want very nicely. Just add
use Sys::RunAlone;
near the top of your code.
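One thing to know: Sys::RunAlone works by taking a flock on the script's own DATA handle, so the script needs to end with an __END__ (or __DATA__) tag. A minimal skeleton:
#!/usr/bin/perl
use strict;
use warnings;
use Sys::RunAlone;   # exits right away if another copy already holds the lock

# ... the periodic work goes here ...

__END__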

Use File::Pid to store the script's pid in a file, which the script should check for at the start, and abort if found. You can remove the pidfile when the script is done, but it's not truly necessary, as you can simply check later to see if that process id is still alive (which will also account for the cases when your script aborts unexpectedly):
use strict;
use warnings;
use File::Pid;
my $pidfile = File::Pid->new({file => '/var/run/myscript'});
exit if $pidfile->running();
$pidfile->write();
# ... rest of script...
# end of script
$pidfile->remove();
exit;

A typical approach is for each process to open and lock a certain file. Then the process reads the process ID contained in the file.
If a process with that ID is running, the latecomer exits quietly. Otherwise, the new winner writes its process ID ($$ in Perl) to the pidfile, closes the handle (which releases the lock), and goes about its business.
Example implementation below:
#! /usr/bin/perl
use warnings;
use strict;
use Fcntl qw/ :DEFAULT :flock :seek /;
my $PIDFILE = "/tmp/my-program.pid";
sub take_lock {
    sysopen my $fh, $PIDFILE, O_RDWR | O_CREAT or die "$0: open $PIDFILE: $!";
    flock $fh => LOCK_EX                       or die "$0: flock $PIDFILE: $!";
    my $pid = <$fh>;
    if (defined $pid) {
        chomp $pid;
        if (kill 0 => $pid) {
            close $fh;
            exit 1;
        }
    }
    else {
        die "$0: readline $PIDFILE: $!" if $!;
    }
    sysseek  $fh, 0, SEEK_SET or die "$0: sysseek $PIDFILE: $!";
    truncate $fh, 0           or die "$0: truncate $PIDFILE: $!";
    print    $fh "$$\n"       or die "$0: print $PIDFILE: $!";
    close    $fh              or die "$0: close: $!";
}
take_lock;
print "$0: [$$] running...\n";
sleep 2;

I have always used this: small and simple, no dependency on any module, and it works on both Windows and Linux.
use Fcntl ':flock';
### Check to make sure there is only one instance ###
open SELF, "< $0" or die("Cannot run two instances of this program");
unless ( flock SELF, LOCK_EX | LOCK_NB ) {
print "You cannot run two instances of this program , a process is still running";
exit 1;
}
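For the cron side, an entry along these lines (hypothetical paths) runs the script every five minutes; the flock check above then makes any overlapping run exit immediately:
*/5 * * * * /usr/bin/perl /path/to/myscript.pl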

AFAIK Perl has no such thing built in. You could easily create a temporary file when you start your application and delete it when your script is done.
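A minimal sketch of that idea (hypothetical lock-file path; note that unlike the flock-based answers this check-then-create is not race-free, and the file is left behind if the script is killed before cleanup):
my $lockfile = "/tmp/myscript.lock";   # hypothetical path
my $created  = 0;

exit 0 if -e $lockfile;                # another run appears to be active

open my $fh, '>', $lockfile or die "Cannot create $lockfile: $!";
close $fh;
$created = 1;

END { unlink $lockfile if $created }   # clean up when the script is done

# ... do the real work here ...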

Given the frequency, I would normally write a daemon (server) that waits idly between job runs (i.e. sleep()) rather than try to use cron for fairly fine-grained scheduling.
If necessary, on Unix/Linux systems you could run it from /etc/inittab (or a replacement) to ensure that it is always running and is automatically restarted if the process is killed or dies.
Added: (and some irrelevant stuff removed)
The always-present (running, but mostly idle) daemon approach has the benefit of eliminating the possibility of concurrent instances of the script being started by cron automatically.
However, it does mean you are responsible for managing the timing correctly, such as when there is an overlap (i.e. a previous run is still going when a new trigger occurs). This may help you decide whether to use a forking or a non-forking daemon design. Threads don't provide any advantage in this scenario, so there is no need to consider using them.
This does not completely eliminate the possibility of multiple processes running, but that is a common problem with many daemons. The typical solution is to use a semaphore, such as a mutually-exclusive lock on a file, to prevent a second instance from running. The file lock is automatically forgotten when the process ends, so in the case of abnormal termination (e.g. power failure) no clean-up of the lock itself is necessary.
An approach using the Fcntl module and a Perl sysopen with an O_EXCL flag (or O_RDWR | O_CREAT | O_EXCL) was given by Greg Bacon. The only differences I would make are to combine the exclusive locking into the sysopen call (i.e. use the flags I've suggested) and remove the then-redundant flock call. Oh, and I would follow the UNIX (and Linux FHS) file-system naming conventions and use /var/run/daemonname.pid.
Another approach would be to use djb's daemontools or similar to "daemonize" the task.
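A skeleton of the non-forking daemon variant described above (the interval and the job body are placeholders):
#!/usr/bin/perl
use strict;
use warnings;

my $interval = 300;    # seconds between runs (placeholder)

while (1) {
    my $started = time;
    run_job();         # the work cron would otherwise have triggered
    my $elapsed = time - $started;
    sleep($interval - $elapsed) if $elapsed < $interval;
}

sub run_job {
    # ... the periodic work goes here ...
}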

Related

perl open() always returns the PID of the sh instead of the underlying program

I have to kill a program that I am opening via
$pid = open(FH, "program|")
or
$pid = open(FH, "-|", "program")
However, the program (mosquitto_sub, to be specific) still lingers around in the background, because open is returning the PID of the sh that perl is using to run the program, so I am only killing the sh wrapper instead of the actual program.
Is there a way to get the program's real PID? What is the point of getting the sh's PID?
There are a few ways to deal with this.
First, you can use a list form to open a process and then no shell is involved so the child process (with pid returned by open) is precisely the one with the program you need to stop
my @cmd = ('progname', '-arg1', ...);
my $pid = open my $fh, '-|', @cmd or die "Can't open \"@cmd\": $!";
...
my $num_signaled = kill 15, $pid;
This sketch needs some checks added. Please see the linked documentation (look for "pipe").
If this isn't suitable for some reason -- perhaps you need the shell to run that program -- then you can find the program's pid, and the Proc::ProcessTable module is good for this. A basic demo:
use Proc::ProcessTable;
my $prog_name = ...
my $pid;
my $pt = Proc::ProcessTable->new();
foreach my $proc (@{$pt->table}) {
    if ($proc->cmndline =~ /\Q$prog_name/) { # is this enough to identify it?
        $pid = $proc->pid;
        last;
    }
}
my $num_signaled = kill 15, $pid;
Please be careful with identifying the program by its name -- on a modern system there may be all kinds of processes running that contain the name of the program you want to terminate. For more detail and discussion please see this post and this post, for starters.
Finally, you can use a module to run your external programs and then you'll be able to manage and control them far more nicely. Here I'd recommend IPC::Run.
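For example, with IPC::Run no shell is involved and the harness keeps track of the real child for you (a sketch; the command and topic arguments are placeholders):
use IPC::Run qw(start);

my @cmd = ('mosquitto_sub', '-t', 'some/topic');   # placeholder arguments
my ($in, $out, $err) = ('', '', '');

my $h = start \@cmd, \$in, \$out, \$err;

$h->pump until $out =~ /\n/;   # wait until at least one line has arrived

$h->kill_kill;                 # send TERM to the real child, escalating to KILL if needed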

Perl Behavioral Differences Closing Child Process Spawned with open() vs. IPC::Open3

I'm trying to figure this out but haven't been able to wrap my head around it. I need to open a piped subprocess and read from its output. Originally I was using the standard open() call like this:
#!/usr/bin/perl
use warnings;
use strict;
use Scalar::Util qw(openhandle);
use IPC::Open3;
my $fname = "/var/log/file.log.1.gz";
my $pid = open(my $fh, "-|:encoding(UTF-8)", "gunzip -c \"$fname\" | tac");
# Read one line from the file
while (my $row = <$fh>) {
    print "Row: $row\n";
    last; # Bail out early
}
# Check if the PID is valid and kill it if so
if (kill(0, $pid) == 1) {
    kill(15, $pid);
    waitpid($pid, 0);
    $pid = 0;
}
# Close the filehandle if it is still open
if (openhandle($fh)) {
    close $fh;
}
The above works, except that I get errors from tac in the logs saying:
tac: write error
From what I can tell from the testing and research I've done, this is happening because killing the PID returned from open() just kills the first child process (but not the second), so when I then close the filehandle, tac is still writing to it, hence the "write error" due to the broken pipe. The strange thing is that, at times, when the close() call returns false and I check ($? >> 8), it returns 141, indicating it received a SIGPIPE (backing up my theory above). However, other times it returns 0, which is strange.
Furthermore, if I run the same command but without a double-pipe (only a single one), like this (everything else the same as above):
my $pid = open(my $fh, "-|:encoding(UTF-8)", "gunzip -c \"$fname\"");
...I'll get an error in the logs like this:
gzip: stdout: Broken pipe
...but in this case, gunzip/gzip was the only process (which I killed via the returned PID), so I'm not sure why it would still be writing to the pipe when I close the filehandle (since it was supposed to be killed already, AND waited for with waitpid()).
I'm trying to repro this in the Perl debugger but it's difficult because I can't get the stderr of the child process with plain open() (the way I'm seeing the external process's stderr in prod is in the apache2 logs - this is a CGI script).
I understand from reading the docs that I can't get the PIDs of all child processes in a multi-piped open with open(), so I decided to resort to a different method so that I could close all processes cleanly. I tried open3(), and interestingly, without making any other changes (running basically the same scenario as above, but with open3() instead of open()):
my $pid = open3(my $in, my $fh, undef, "gunzip -c \"$fname\"");
...and then killing it just like I did above, I don't get any errors. This holds true both for the single-piped process shown above and for the double-piped process that involves piping to "tac".
Therefore, I'm wondering what I am missing here? I know there are differences in the way open() and open3() work, but are there differences in the way that child processes are spawned from them? In both cases I can see that the initial child (the PID returned) is itself a child of the Perl process. But it's almost as if the process spawned by open() is not getting properly killed and/or cleaned up (via waitpid()), while the same process spawned by open3() is, and that's the part I can't figure out.
And, more to the bigger picture and the issue at hand - what is the suggestion for the best way to cleanly close a multi-piped process in this sort of scenario? Am I spending more time than is warranted on this? The script itself works as it should aside from these errors, so if it turns out that the tac and gzip errors I'm seeing are inconsequential, should I just live with them and move on?
Any help is much appreciated!
If you just want to read the last line of a gzipped file, it's easy to do it in pure perl without calling an external program:
#!/usr/bin/env perl
use warnings;
use strict;
use feature qw/say/;
use IO::Uncompress::Gunzip qw/$GunzipError/;
my $fname = 'foo.txt.gz';
my $z = IO::Uncompress::Gunzip->new($fname) or die "Couldn't open file: $GunzipError\n";
my $row;
while (<$z>) {
    $row = $_;
}
say "Row: $row";
This happens because either your perl script or its parent is ignoring the SIGPIPE signal, and an ignored signal disposition is inherited by child processes.
Here is a simpler testcase for your condition:
$ perl -e '$SIG{PIPE}="IGNORE"; open my $fh, "-|", "seq 100000 | tac; true"; print scalar <$fh>'
100000
tac: write error
$ (trap "" PIPE; perl -e 'open my $fh, "-|", "seq 100000 | tac"; print scalar <$fh>')
100000
tac: write error
$ (trap "" PIPE; perl -e 'my $pid = open my $fh, "-|", "seq 100000 | tac"; print scalar <$fh>; kill 15, $pid; waitpid $pid, 0')
100000
$ tac: write error
The latter version does the same kill as the version from the OP, which will not kill either the right or left side of the pipeline, but the shell running and waiting for both (some shells will exec through the left side of a pipeline; with such shells, a ; exit $? could be appended to the command in order to reproduce the example).
A case where SIGPIPE is ignored upon entering a perl script is when run via fastcgi -- which sets the SIGPIPE disposition to ignore, and expects the script to handle it. In that case simply setting an SIGPIPE handler instead of IGNORE (even an empty handler) would work, since in that case the signal disposition will be reset to default upon executing external commands:
$SIG{PIPE} = sub { };
open my $fh, '-|', 'trap - PIPE; ... | tac';
When run as a standalone script it could be some setup bug (I've seen it happen in questions related to containerization on Linux), or someone trying to exploit buggy programs that run with elevated privileges and don't bother to handle write(2) errors (EPIPE in this case).
my $pid = open3(my $in, my $fh, undef, "gunzip -c \"$fname\"");
...and then killing it just like I did above, I don't get any errors.
Where should you get the errors from, if you're redirecting its stderr to the same $fh you only read the first line from?
The thing is absolutely no different with open3:
$ (trap "" PIPE; perl -MIPC::Open3 -e 'my $pid = open3 my $in, my $out, my $err, "seq 100000 | tac 2>/dev/tty"; print scalar <$out>')
100000
$ tac: write error

Synchronize processes by locking a file

One of my scripts installs a component. When run in parallel, the same script tries to install the same component, so I thought about synchronizing the processes by locking a file while the script is installing and having the other script wait while the installation is in progress.
The code would look like this:
# this will create a file handler on a file from TEMP dir with the
# name of the component; if it doesn't exist in TEMP dir, it will create it
my $file = $ENV{"TEMP"}. "\\" . $componentName;
open (my $fh, ">", "$file") or die "Couldn't open file!";
# this will apply an exclusive lock meaning that if another process
# already locked the file, it will wait until the lock is removed
flock($fh, 2) or die "Failed to lock the file";
# install the component..
# closing the file handle automatically removes the lock
close $fh;
I am concerned about the situation where one script locks the file and starts the installation, and a second script comes along and tries to create a file handle on the locked file. I didn't see any errors, but I don't want to miss something.
Will there be a problem with this?
The important thing to remember is that the open will work in either case, because open doesn't test the lock. It's the flock operation that will block until the lock is released.
And this should work just fine, although once the lock is released you might want to check whether you still need to run the install (see the sketch after the list below), unless you don't really care about doing it twice (e.g. if the rest of the script makes use of/relies upon it).
Also, are there other sources of 'installing' that aren't your script, which could cause the same problem? A lock is an advisory thing.
It would be a style improvement in your program to also:
Test $ENV{'TEMP'} to see that it exists, and default (or fail) if it doesn't.
use Fcntl qw ( :flock ); because then you can flock ( $fh, LOCK_EX ); to make it clear you're taking an exclusive lock.
You appear to be using \\ as a file separator. For portability reasons, it would probably be better to use something like File::Spec for that.
You can use a LOCK_NB for nonblocking: flock ( $fh, LOCK_EX | LOCK_NB ) and then just skip if it's locked.
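Putting those suggestions together with the re-check mentioned earlier, a sketch (it assumes $componentName from the question's snippet is in scope; the marker file and install_component() are placeholders for however you detect and perform the install):
use Fcntl qw(:flock);
use File::Spec;

my $tempDir = $ENV{TEMP}
    or die "TEMP environment variable is not set";
my $lockFile = File::Spec->catfile($tempDir, $componentName);

open my $fh, '>', $lockFile or die "Couldn't open $lockFile: $!";
flock($fh, LOCK_EX) or die "Failed to lock $lockFile: $!";

# Another process may have completed the install while we waited for the
# lock, so check again before doing the work (placeholder marker file).
unless (-e File::Spec->catfile($tempDir, "$componentName.installed")) {
    install_component($componentName);   # placeholder for the actual install step
}

close $fh;   # closing the handle releases the lock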
A lock doesn't prevent the file from being opened or modified; it prevents it from being locked.
This means the open won't fail, and it will clobber the file even if it's locked and still being used. If the lock is meant to protect access to the file (i.e. if the programs actually write to the locked file), you want to use sysopen to avoid clobbering the file if it already exists.
use Fcntl qw( LOCK_EX O_CREAT O_WRONLY );
# Open the file without clobbering it, creating it if necessary.
sysopen(my $fh, $qfn, O_WRONLY | O_CREAT)
or die($!);
# Wait for the file to become available.
flock($fh, LOCK_EX)
or die($!);
truncate($fh, 0)
or die($!);
...
or
use Fcntl qw( LOCK_EX LOCK_NB O_CREAT O_WRONLY );
# Open the file without clobbering it, creating it if necessary.
sysopen(my $fh, $qfn, O_WRONLY | O_CREAT)
or die($!);
# Check if the file is locked.
flock($fh, LOCK_EX | LOCK_NB)
or die($!{EWOULDBLOCK} ? "File already in use\n" : $!);
truncate($fh, 0)
or die($!);
...
You could also use open(my $fh, '>>', $qfn) if you don't mind having the file in append mode.

Atomic open of non-existing file in Perl

I want to write something to a file whose name is in the variable $filename.
I don't want to overwrite it, so I check first whether it exists and then open it:
#stage1
if (-e $filename)
{
    print "file $filename exists, not overwriting\n";
    exit 1;
}
#stage2
open(OUTFILE, ">", $filename) or die $!;
But this is not atomic. Theoretically someone could create the file between stage 1 and stage 2. Is there some variant of the open command that will do both things atomically, so that it fails to open the file for writing if the file already exists?
Here is an atomic way of opening files:
#!/usr/bin/env perl
use strict;
use warnings qw(all);
use Fcntl qw(:DEFAULT :flock);
my $filename = 'test';
my $fh;
# this is "atomic open" part
unless (sysopen($fh, $filename, O_CREAT | O_EXCL | O_WRONLY)) {
    print "file $filename exists, not overwriting\n";
    exit 1;
}
# flock() isn't required for "atomic open" per se
# but useful in real world usage like log appending
flock($fh, LOCK_EX);
# use the handle as you wish
print $fh scalar localtime;
print $fh "\n";
# unlock & close
flock($fh, LOCK_UN);
close $fh;
Debug session:
stas@Stanislaws-MacBook-Pro:~/stackoverflow$ cat test
Wed Dec 19 12:10:37 2012
stas@Stanislaws-MacBook-Pro:~/stackoverflow$ perl sysopen.pl
file test exists, not overwriting
stas@Stanislaws-MacBook-Pro:~/stackoverflow$ cat test
Wed Dec 19 12:10:37 2012
If you're concerned about multiple Perl scripts modifying the same file, just use the flock() function in each one to lock the file you're interested in.
If you're worried about external processes, which you probably don't have control over, you can use the sysopen() function. According to the Programming Perl book (which I highly recommend, by the way):
To fix this problem of overwriting, you’ll need to use sysopen, which
provides individual controls over whether to create a new file or
clobber an existing one. And we'll ditch that -e file existence test
since it serves no useful purpose here and only increases our exposure
to race conditions.
They also provide this sample block of code:
use Fcntl qw/O_WRONLY O_CREAT O_EXCL/;
open(FH, "<", $file)
|| sysopen(FH, $file, O_WRONLY | O_CREAT | O_EXCL)
|| die "can't create new file $file: $!";
In this example, they first pull in a few constants (to be used in the sysopen call). Next, they try to open the file with open, and if that fails, they then try sysopen. They continue on to say:
Now even if the file somehow springs into existence between when open
fails and when sysopen tries to open a new file for writing, no harm
is done, because with the flags provided, sysopen will refuse to open
a file that already exists.
So, to make things clear for your situation, remove the file test completely (no more stage 1), and only do the open operation using code similar to the block above. Problem solved!
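Applied to the question's snippet, that advice boils down to something like this (a sketch that keeps the original $filename and messages; the data written is a placeholder):
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);

# Atomically create the file; refuse to touch it if it already exists.
sysopen(my $out, $filename, O_WRONLY | O_CREAT | O_EXCL) or do {
    die "can't create $filename: $!" unless $!{EEXIST};
    print "file $filename exists, not overwriting\n";
    exit 1;
};

print $out "some data\n";   # placeholder payload
close $out or die "close $filename: $!";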