Perl: Monitor a background process's output without waiting for it to complete

I'm attempting to write a manager in Perl to automate a bioinformatics pipeline my lab has been using. (The REPET pipeline, for anyone who's interested.) The pipeline has eight steps, several of which are broken down into substeps which can be run in parallel. Most notably, step 3 is broken down into three parts, and step 4 into three corresponding parts. Each part of step 3 can be run independently, and its corresponding part in step 4 can be started as soon as its step 3 companion is finished. I'd like my manager to be able to launch step 3 in three parallel threads, and, for each thread, move on to step 4 as soon as step 3 is finished. The best way I can think to do that is to monitor the output of each process. The output from each step looks like this:
START TEdenovo.py (2012-08-23 11:20:10)
version 2.0
project name = dm3_chr2L
project directory = /home/<etc>
beginning of step 1
submitting job(s) with groupid 'dm3_chr2L_TEdenovo_prepareBatches' (2012-08-23 11:20:10)
waiting for 1 job(s) with groupid 'dm3_chr2L_TEdenovo_prepareBatches' (2012-08-23 11:20:10)
execution time per job: n=1 mean=2.995 var=0.000 sd=0.000 min=2.995 med=2.995 max=2.995
step 1 finished successfully
version 2.0
END TEdenovo.py (2012-08-23 11:20:25)
That's the output for step 1; for step 3, once "step 3 finished successfully" appears in the output, it's safe to move on to step 4. The problem has been keeping track of the output of three of these processes while they run at once. Essentially, this is the behavior I want (pseudocode):
my $log31 = `TEdenovo.py [options] &`;
my $log32 = `TEdenovo.py [options] &`;
my $log33 = `TEdenovo.py [options] &`;
while(1) {
    # start step 41 if $log31 =~ /step 3 finished successfully/;
    # start step 42 if $log32 =~ /step 3 finished successfully/;
    # start step 43 if $log33 =~ /step 3 finished successfully/;
    # monitor logs 41, 42, 43 similarly
    # last if all logs read "finished successfully"
    sleep(5);
}
#move on to step 5
The problem is that invoking a process with backticks causes Perl to wait until that process has finished before moving on; as I discovered, it isn't like system(), where you can spin something off into the background with & and then proceed immediately. As far as I know, there isn't a good way to use system() to get the effect I'm looking for. I suppose I could do this:
system("TEdenovo.py [options] > log31.txt &");
And then poll log31.txt periodically to see whether "finished successfully" has appeared, but that seems unnecessarily messy.
I've also tried opening the process in a filehandle:
open(my $step3, "TEdenovo.py [options] |");
my @log3;
while(1)
{
    push(@log3, <$step3>);
    last if grep(/step 3 finished successfully/, @log3);
    sleep(5);
}
...but, once again, Perl waits until the process has finished in order to move on (in this case, at the push()). I tried the above with $| both set and unset.
So, the essence of my question is: Is there a way to capture the standard output of a running background process in perl?

Maybe you could try
open(my $step3, "TEdenovo.py [options] |");
while(<$step3>)
{
    last if /step 3 finished successfully/;
}
instead of while(1)?

The approach of using open and reading from the pipe handle is the correct approach. If Nahuel's suggestion of reading from the handle in scalar context doesn't help, you could still be suffering from buffering.
$| changes the buffering behavior of Perl's output, but not the behavior of any external programs called from Perl. You need the external program itself to avoid buffering its output. In this case, I believe that is possible by passing the -u option to python:
open(my $step3, "|-", "python -u TEdenovo.py [more options]");

Related

How can I log the results of a series of TAP tests, while keeping the output to the user separate?

I have a series of tests that I run through a Test::Harness. I'm quite happy with the level of verbosity that the tests have so far: the user knows the number of tests that ran and the number of tests that succeeded, and the overall results of the test:
/home/user/project/t/01.my_test.t .. ok
All tests successful.
Files=1, Tests=136, 15 wallclock secs ( 0.03 usr 0.00 sys + 13.84 cusr 0.55 csys = 14.42 CPU)
Result: PASS
However, I would like to log the output of each test into a separate file that can be checked at a later date. The regular output while running the tests should stay the same (an aggregate), but the test.log (or whatever it is named) should have the specific result for each test that was run (hopefully including any additional parts of the output, like comments and such).
The closest I've got is capturing the entire output with Capture::Tiny and then processing it myself manually. But apart from this being cumbersome and error prone, the output is processed all as a whole, so while the tests are running there is nothing to show. I would like to avoid this.
I've looked at TAP::Harness::Archive as well, but I couldn't get it to run, so I'm not sure if it can be used for this.
What is the best way to do this?
Edit
I'm implementing a pseudo package manager, and the tests I'm running are part of the installation process. When the packages get installed it is tested, and if the tests fail installation is aborted.
This is implemented right now as a test method for a package object. I thought that, since this would have to run on multiple different platforms, it was probably a better idea to do things within Perl, without having to call system commands (including prove), but I'm ready to change my ways.
This is how tests are being run right now:
use Path::Class;
use TAP::Harness;

my @tests;
while (my $file = readdir(DIR)) {
    push @tests, file($path, $file) if ($file =~ /\.t$/);
}
@tests = sort @tests;

# Run the tests
my $harness = TAP::Harness->new({
    failures => 1,
    exec     => [ 'myinterpreter' ],
});
my $aggregator = $harness->runtests(@tests);
if ($aggregator->all_passed) { return 1 } else { return 0 }
You could run your tests with the tool prove. It provides the option -a to archive each run in TAP format. This can then be processed later, e.g. with Archive::TAP::Convert.
Here I run the tool in a module directory, where the code is in lib and the tests are in t:
prove -a tests.tgz -Ilib t/
The results are displayed to terminal and also captured in the file tests.tgz.
Maybe not exactly what you were looking for? But it works and is solid. You should use the tools that are already there.
NOTE: According to the comment from Michael Carman, "prove is just a (very thin) wrapper around App::Prove. If you're trying to avoid system calls, you could use the module directly."
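If you want to stay inside Perl, a minimal sketch of that suggestion (assuming App::Prove's documented new/process_args/run interface, and the same arguments as the prove command above) would be:
use strict;
use warnings;
use App::Prove;

# Roughly equivalent to `prove -a tests.tgz -Ilib t/`, without shelling out.
my $app = App::Prove->new;
$app->process_args('-a', 'tests.tgz', '-Ilib', 't/');
exit($app->run ? 0 : 1);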

Catching the error status while running scripts in parallel on Jenkins

I'm running two Perl scripts in parallel on Jenkins, plus one more script which should get executed if the first two succeed. If I get an error in script 1, script 2 still runs, and hence the exit status becomes successful.
I want to run them in such a way that if either of the parallel scripts fails, the job stops with a failure status.
Currently my setup looks like
perl_script_1 &
perl_script_2 &
wait
perl_script_3
If script 1 or 2 fails in the middle, the job should be terminated with a Failure status without executing job 3.
Note: I'm using tcsh shell in Jenkins.
I have a similar setup where I run several java processes (tests) in parallel and wait for them to finish. If any fail, I fail the rest of my script.
Each test process writes its result to a file to be tested once done.
Note - the code examples below are written in bash, but it should be similar in tcsh.
To do this, I get the process id for every execution:
test1 &
test1_pid=$!
# test1 will write pass or fail to file test1_result
test2 &
test2_pid=$!
...
Now, I wait for the processes to finish by using the kill -0 PID command
For example test1:
# Check test1
kill -0 $test1_pid
# Check if process is done or not
if [ $? -ne 0 ]
then
    echo process test1 finished
    # check results
    grep fail test1_result
    if [ $? -eq 0 ]
    then
        echo test1 failed
        mark_whole_build_failed
    fi
fi
Same for other tests (you can do a loop to test all running processes periodically).
Later condition the rest of the execution based on mark_whole_build_failed.
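If you would rather keep this logic in Perl than in tcsh, here is a rough sketch of the same idea using fork and waitpid; the script names are placeholders for your real paths:
#!/usr/bin/perl
use strict;
use warnings;

my @parallel = ('perl_script_1', 'perl_script_2');   # placeholder names
my %pid_of;

for my $script (@parallel) {
    defined(my $pid = fork()) or die "fork failed: $!";
    if ($pid == 0) {
        exec('perl', $script) or die "exec $script failed: $!";
    }
    $pid_of{$pid} = $script;
}

my $failed = 0;
while (%pid_of) {
    my $pid = waitpid(-1, 0);
    last if $pid <= 0;
    my $script = delete $pid_of{$pid} or next;
    if ($? != 0) {
        warn "$script exited with status " . ($? >> 8) . "\n";
        $failed = 1;
    }
}

# A non-zero exit makes Jenkins mark the build as failed, and script 3 never runs.
exit 1 if $failed;
system('perl', 'perl_script_3') == 0 or exit 1;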
I hope this helps.

how to disable Net::SSH::Expect timeout time

My master script executes on a gateway server. It invokes another script and executes it on a remote server; however, there is no definite run time for the remote script, so I would like to disable the timeout.
Ok let me explain.
sub rollout
{
    (my $path, my $address) = @_;
    my $errorStatus = 0;
    print "Rollout process being initiated on netsim\n";

    # Creating SSH object
    my $ssh = Net::SSH::Expect->new(
        host    => "netsim",
        user    => 'root',
        raw_pty => 1
    );

    # Starting SSH process
    $ssh->run_ssh() or die "SSH process couldn't start: $!";

    # Start the interactive session
    $ssh->exec("/tmp/rollout.pl $path $address", 30);

    return $errorStatus;
}
When this part of the code gets executed, the rollout.pl script is invoked; however, after a second the process gets killed on the remote server.
Without knowing exactly what is going wrong, let me take a wild guess at an answer. Should I have misinterpreted your question, I will edit this later:
The default and enforced timeout of Net::SSH::Expect is only a timeout for read operations. It does not mean that a started script will be killed when its time has run out; it just means that the last read operation no longer waits for it to produce output.
You can start your remote script with, for example, a timeout of 1 second, and the exec call may return after one second. The string returned by exec will then only contain the first second of output from the remote script. But the remote script may continue to run; you just need to start a new read operation (there are several to choose from) to capture future output. The remote script, if everything works as expected, should continue to run until it is finished or you close your ssh connection.
That means it will be killed if you just exec, wait for that to return, and then close your connection. One way to go would be to have the remote script print something when it is finished ("done" comes to mind) and wait for that string with a waitfor with a very high timeout, and only close the connection when waitfor has returned TRUE. Or, well, if the timeout was reasonably long, also when it returns FALSE. If you want to monitor your script for crashing, I would suggest using pgrep or ps on the remote machine. Or make sure your remote script says something when dying, for example using eval{} or die "done".
edit:
One thing to remember is that this exec is in fact not something like system. It basically sends the command to the remote shell and hits enter; at least that is how you might visualize what is happening. But then, to appear more like system, it waits for the default timeout of 1 second for any output and stores that in its own return value. Sadly, it is a rather dumb read operation, as it just sits there and listens, so it is not a good idea to give it a high timeout or actually use it to interact with the script.
I would suggest using exec with a timeout of 0 and then using waitfor with a very high timeout to wait for whatever pattern your remote script prints once it is finished. And only then close your connection.
edit 2:
Just replace your exec with something like:
$ssh->exec("/tmp/rollout.pl $path $address", 0);
my $didItReturn = $ssh->waitfor("done", 300);
And make sure rollout.pl prints "done" when it is finished. And not before. Now this will start the script and then wait for up to 5 minutes for the "done". If a "done" showed up, $didItReturn will be true, if it timed out, it will be false. It will not wait for 5 minutes if it sees a "done" (as opposed to giving a timeout to exec).
edit 3:
If you want no timeout at all:
$ssh->exec("/tmp/rollout.pl $path $address", 0);
my $didItReturn = 0;
while (!$didItReturn)
{
    $didItReturn = $ssh->waitfor("done", 300);
}

How do I exit the vi session which runs during Perl script execution?

I have to run a few jobs. After running each job, it opens vi on the contents. After writing and quitting (usually I do :wq!), the data get updated in the database. As there are more than a hundred of these jobs, I thought of automating the process using Perl.
But when I ran the script, I got stuck in vi, unable to make it exit on its own. This requires manual intervention and defeats the purpose of my script. I need help on how to handle this situation, as it will save me time and effort.
Code is as given below:
print "Enter job name - \n";
$job_rc = <>;
print "Job entered by you is $job_rc \n";
my @job_name = ("job1", "job2", "job3", "job4");
my $total_job = @job_name;
print "Total job present = $total_job \n";
for ($i = 0; $i < $total_job; $i++) {
    print "Current job name: $job_name[$i] \n";
    system "cr_job $job_name[$i] $job_rc";
    sleep(10);
}
I think you are approaching the problem from the wrong side. Instead of exiting vi, think about not running it.
I can only guess why vi runs; it seems related to your “jobs”. One possible reason is that they run a default text editor to grab some user input (a well-known example of such behaviour is that when you call hg commit, svn commit, cvs ci, etc. without providing a message, they automatically run a text editor to get the commit message).
If this is the case, first check your “jobs”, as they may have options to disable this prompt. If not, they may be using the $EDITOR environment variable to decide which editor to run, and setting this variable to something you prepare (for example, a script which will write a default message to the file given as a parameter) may do the trick.
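For example, if the cr_job command from the question honours $EDITOR, something along these lines might do; /bin/true is just a stand-in for whatever non-interactive "editor" you prepare:
# Point the editor at a no-op command so the job proceeds without interaction.
# A small wrapper script that writes a canned message into the file it is given
# would work the same way.
local $ENV{EDITOR} = '/bin/true';
system "cr_job $job_name[$i] $job_rc";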

How do I daemonize an arbitrary script in unix?

I'd like a daemonizer that can turn an arbitrary, generic script or command into a daemon.
There are two common cases I'd like to deal with:
I have a script that should run forever. If it ever dies (or on reboot), restart it. Don't let there ever be two copies running at once (detect if a copy is already running and don't launch it in that case).
I have a simple script or command line command that I'd like to keep executing repeatedly forever (with a short pause between runs). Again, don't allow two copies of the script to ever be running at once.
Of course, it's trivial to write a "while(true)" loop around the script in case 2 and then apply a solution for case 1, but a more general solution will just solve case 2 directly, since that covers the script in case 1 as well (you may just want a shorter pause, or none at all, if the script is not intended to ever die; and of course, if the script really never dies, the pause doesn't matter).
Note that the solution should not involve, say, adding file-locking code or PID recording to the existing scripts.
More specifically, I'd like a program "daemonize" that I can run like
% daemonize myscript arg1 arg2
or, for example,
% daemonize 'echo `date` >> /tmp/times.txt'
which would keep a growing list of dates appended to times.txt. (Note that if the argument(s) to daemonize is a script that runs forever as in case 1 above, then daemonize will still do the right thing, restarting it when necessary.) I could then put a command like above in my .login and/or cron it hourly or minutely (depending on how worried I was about it dying unexpectedly).
NB: The daemonize script will need to remember the command string it is daemonizing so that if the same command string is daemonized again it does not launch a second copy.
Also, the solution should ideally work on both OS X and linux but solutions for one or the other are welcome.
EDIT: It's fine if you have to invoke it with sudo daemonize myscript myargs.
(If I'm thinking of this all wrong or there are quick-and-dirty partial solutions, I'd love to hear that too.)
PS: In case it's useful, here's a similar question specific to python.
And this answer to a similar question has what appears to be a useful idiom for quick-and-dirty daemonizing of an arbitrary script:
You can daemonize any executable in Unix by using nohup and the & operator:
nohup yourScript.sh script args&
The nohup command allows you to shut down your shell session without it killing your script, while the & places your script in the background so you get a shell prompt to continue your session. The only minor problem with this is that standard out and standard error both get sent to ./nohup.out, so if you start several scripts in this manner, their output will be intertwined. A better command would be:
nohup yourScript.sh script args >script.out 2>script.error&
This will send standard out to the file of your choice and standard error to a different file of your choice. If you want to use just one file for both standard out and standard error, you can use this:
nohup yourScript.sh script args >script.out 2>&1 &
The 2>&1 tells the shell to redirect standard error (file descriptor 2) to the same file as standard out (file descriptor 1).
To run a command only once and restart it if it dies you can use this script:
#!/bin/bash
if [[ $# < 1 ]]; then
    echo "Name of pid file not given."
    exit
fi

# Get the pid file's name.
PIDFILE=$1
shift

if [[ $# < 1 ]]; then
    echo "No command given."
    exit
fi

echo "Checking pid in file $PIDFILE."

# Check to see if process running.
PID=$(cat $PIDFILE 2>/dev/null)
if [[ $? = 0 ]]; then
    ps -p $PID >/dev/null 2>&1
    if [[ $? = 0 ]]; then
        echo "Command $1 already running."
        exit
    fi
fi

# Write our pid to file.
echo $$ >$PIDFILE

# Get command.
COMMAND=$1
shift

# Run command until we're killed.
while true; do
    $COMMAND "$@"
    sleep 10 # if command dies immediately, don't go into un-ctrl-c-able loop
done
The first argument is the name of the pid file to use. The second argument is the command. And all other arguments are the command's arguments.
If you name this script restart.sh this is how you would call it:
nohup restart.sh pidFileName yourScript.sh script args >script.out 2>&1 &
I apologise for the long answer (please see comments about how my answer nails the spec). I'm trying to be comprehensive, so you have as good of a leg up as possible. :-)
If you are able to install programs (have root access), and are willing to do one-time legwork to set up your script for daemon execution (i.e., more involved than simply specifying the command-line arguments to run on the command line, but only needing to be done once per service), I have a way that's more robust.
It involves using daemontools. The rest of the post describes how to set up services using daemontools.
Initial setup
Follow the instructions in How to install daemontools. Some distributions (e.g., Debian, Ubuntu) already have packages for it, so just use that.
Make a directory called /service. The installer should have already done this, but verify it, or create it yourself if you installed manually. If you dislike this location, you can change it in your svscanboot script, although most daemontools users are used to /service and will get confused if you don't use it.
If you're using Ubuntu or another distro that doesn't use standard init (i.e., doesn't use /etc/inittab), you will need to use the pre-installed inittab as a base for arranging svscanboot to be called by init. It's not hard, but you need to know how to configure the init that your OS uses.
svscanboot is a script that calls svscan, which does the main work of looking for services; it's called from init so init will arrange to restart it if it dies for any reason.
Per-service setup
Each service needs a service directory, which stores housekeeping information about the service. You can also make a location to house these service directories so they're all in one place; usually I use /var/lib/svscan, but any new location will be fine.
I usually use a script to set up the service directory, to save lots of manual repetitive work. e.g.,
sudo mkservice -d /var/lib/svscan/some-service-name -l -u user -L loguser "command line here"
where some-service-name is the name you want to give your service, user is the user to run that service as, and loguser is the user to run the logger as. (Logging is explained in just a little bit.)
Your service has to run in the foreground. If your program backgrounds by default, but has an option to disable that, then do so. If your program backgrounds without a way to disable it, read up on fghack, although this comes at a trade-off: you can no longer control the program using svc.
Edit the run script to ensure it's doing what you want it to. You may need to place a sleep call at the top, if you expect your service to exit frequently.
When everything is set up right, create a symlink in /service pointing to your service directory. (Don't put service directories directly within /service; it makes it harder to remove the service from svscan's watch.)
Logging
The daemontools way of logging is to have the service write log messages to standard output (or standard error, if you're using scripts generated with mkservice); svscan takes care of sending log messages to the logging service.
The logging service takes the log messages from standard input. The logging service script generated by mkservice will create auto-rotated, timestamped log files in the log/main directory. The current log file is called current.
The logging service can be started and stopped independently of the main service.
Piping the log files through tai64nlocal will translate the timestamps into a human-readable format. (TAI64N is a 64-bit atomic timestamp with a nanosecond count.)
Controlling services
Use svstat to get the status of a service. Note that the logging service is independent, and has its own status.
You control your service (start, stop, restart, etc.) using svc. For example, to restart your service, use svc -t /service/some-service-name; -t means "send SIGTERM".
Other signals available include -h (SIGHUP), -a (SIGALRM), -1 (SIGUSR1), -2 (SIGUSR2), and -k (SIGKILL).
To down the service, use -d. You can also prevent a service from automatically starting at bootup by creating a file named down in the service directory.
To start the service, use -u. This is not necessary unless you've downed it previously (or set it up not to auto-start).
To ask the supervisor to exit, use -x; usually used with -d to terminate the service as well. This is the usual way to allow a service to be removed, but you have to unlink the service from /service first, or else svscan will restart the supervisor.
Also, if you created your service with a logging service (mkservice -l), remember to also exit the logging supervisor (e.g., svc -dx /var/lib/svscan/some-service-name/log) before removing the service directory.
Summary
Pros:
daemontools provides a bulletproof way to create and manage services. I use it for my servers, and I highly recommend it.
Its logging system is very robust, as is the service auto-restart facility.
Because it starts services with a shell script that you write/tune, you can tailor your service however you like.
Powerful service control tools: you can send most any signal to a service, and can bring services up and down reliably.
Your services are guaranteed a clean execution environment: they will execute with the same environment, process limits, etc., as what init provides.
Cons:
Each service takes a bit of setup. Thankfully, this only needs doing once per service.
Services must be set up to run in the foreground. Also, for best results, they should be set up to log to standard output/standard error, rather than syslog or other files.
Steep learning curve if you're new to the daemontools way of doing things. You have to restart services using svc, and cannot run the run scripts directly (since they would then not be under the control of the supervisor).
Lots of housekeeping files, and lots of housekeeping processes. Each service needs its own service directory, and each service uses one supervisor process to auto-restart the service if it dies. (If you have many services, you will see lots of supervise processes in your process table.)
In balance, I think daemontools is an excellent system for your needs. I welcome any questions about how to set it up and maintain it.
You should have a look at daemonize. It can detect a second copy (it uses a file locking mechanism). It also works on various UNIX and Linux distributions.
If you need to automatically start your application as daemon, then you need to create appropriate init-script.
You can use the following template:
#!/bin/sh
#
# mydaemon      This shell script takes care of starting and stopping
#               the <mydaemon>
#

# Source function library
. /etc/rc.d/init.d/functions

# Do preliminary checks here, if any
#### START of preliminary checks #########
##### END of preliminary checks #######

# Handle manual control parameters like start, stop, status, restart, etc.
case "$1" in
  start)
    # Start daemons.
    echo -n $"Starting <mydaemon> daemon: "
    echo
    daemon <mydaemon>
    echo
    ;;
  stop)
    # Stop daemons.
    echo -n $"Shutting down <mydaemon>: "
    killproc <mydaemon>
    echo
    # Do clean-up works here like removing pid files from /var/run, etc.
    ;;
  status)
    status <mydaemon>
    ;;
  restart)
    $0 stop
    $0 start
    ;;
  *)
    echo $"Usage: $0 {start|stop|status|restart}"
    exit 1
esac
exit 0
I think you may want to try start-stop-daemon(8). Check out scripts in /etc/init.d in any Linux distro for examples. It can find started processes by command line invoked or PID file, so it matches all your requirements except being a watchdog for your script. But you can always start another daemon watchdog script that just restarts your script if necessary.
As an alternative to the already mentioned daemonize and daemontools, there is the daemon command of the libslack package.
daemon is quite configurable and does care about all the tedious daemon stuff such as automatic restart, logging or pidfile handling.
If you're using OS X specifically, I suggest you take a look at how launchd works. It will automatically check to ensure your script is running and relaunch it if necessary. It also includes all sorts of scheduling features, etc. It should satisfy both requirement 1 and 2.
As for ensuring only one copy of your script can run, you need to use a PID file. Generally I write a file to /var/run/<name>.pid that contains the PID of the currently running instance. If the file exists when the program starts, it checks whether the PID in the file is actually running (the program may have crashed or otherwise forgotten to delete the PID file). If it is, abort. If not, start running and overwrite the PID file.
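In Perl, that check might look roughly like this (a sketch only; the pidfile path is a placeholder, and kill 0 can report "not running" for a process that belongs to another user):
use strict;
use warnings;

my $pidfile = '/var/run/myscript.pid';   # placeholder path

# Abort if a previous instance is still alive.
if (open my $in, '<', $pidfile) {
    chomp(my $old_pid = <$in> // '');
    close $in;
    if ($old_pid =~ /^\d+$/ and kill 0, $old_pid) {
        die "Already running as PID $old_pid\n";
    }
}

# Record our own PID, overwriting any stale file.
open my $out, '>', $pidfile or die "Cannot write $pidfile: $!";
print {$out} "$$\n";
close $out;

# ... the real work of the script goes here ...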
Daemontools ( http://cr.yp.to/daemontools.html ) is a set of pretty hard-core utilities used to do this, written by D. J. Bernstein. I have used it with some success. The annoying part about it is that none of the scripts return any visible results when you run them - just invisible return codes. But once it's running, it's bulletproof.
First get createDaemon() from http://code.activestate.com/recipes/278731/
Then the main code:
import subprocess
import sys
import time
createDaemon()
while True:
    subprocess.call(" ".join(sys.argv[1:]), shell=True)
    time.sleep(10)
You could give immortal a try. It is a *nix cross-platform (OS-agnostic) supervisor.
For a quick try on macOS:
brew install immortal
In case you are using FreeBSD from the ports or by using pkg:
pkg install immortal
For Linux by downloading the precompiled binaries or from source: https://immortal.run/source/
You can either use it like this:
immortal -l /var/log/date.log date
Or by a configuration YAML file which gives you more options, for example:
cmd: date
log:
  file: /var/log/date.log
  age: 86400 # seconds
  num: 7 # int
  size: 1 # MegaBytes
  timestamp: true # will add timestamp to log
If you would like to keep also the standard error output in a separate file you could use something like:
cmd: date
log:
  file: /var/log/date.log
  age: 86400 # seconds
  num: 7 # int
  size: 1 # MegaBytes
stderr:
  file: /var/log/date-error.log
  age: 86400 # seconds
  num: 7 # int
  size: 1 # MegaBytes
  timestamp: true # will add timestamp to log
This is a working version complete with an example which you can copy into an empty directory and try out (after installing the CPAN dependencies, which are Getopt::Long, File::Spec, File::Pid, and IPC::System::Simple -- all pretty standard and are highly recommended for any hacker: you can install them all at once with cpan <modulename> <modulename> ...).
keepAlive.pl:
#!/usr/bin/perl
# Usage:
# 1. put this in your crontab, to run every minute:
# keepAlive.pl --pidfile=<pidfile> --command=<executable> <arguments>
# 2. put this code somewhere near the beginning of your script,
# where $pidfile is the same value as used in the cron job above:
# use File::Pid;
# File::Pid->new({file => $pidfile})->write;
# if you want to stop your program from restarting, you must first disable the
# cron job, then manually stop your script. There is no need to clean up the
# pidfile; it will be cleaned up automatically when you next call
# keepAlive.pl.
use strict;
use warnings;
use Getopt::Long;
use File::Spec;
use File::Pid;
use IPC::System::Simple qw(system);
my ($pid_file, $command);
GetOptions("pidfile=s" => \$pid_file,
           "command=s" => \$command)
    or print "Usage: $0 --pidfile=<pidfile> --command=<executable> <arguments>\n", exit;
my @arguments = @ARGV;

# check if process is still running
my $pid_obj = File::Pid->new({file => $pid_file});
if ($pid_obj->running())
{
    # process is still running; nothing to do!
    exit 0;
}

# no? restart it
print "Pid " . $pid_obj->pid . " no longer running; restarting $command @arguments\n";
system($command, @arguments);
example.pl:
#!/usr/bin/perl
use strict;
use warnings;
use File::Pid;
File::Pid->new({file => "pidfile"})->write;
print "$0 got arguments: #ARGV\n";
Now you can invoke the example above with: ./keepAlive.pl --pidfile=pidfile --command=./example.pl 1 2 3 and the file pidfile will be created, and you will see the output:
Pid <random number here> no longer running; restarting ./example.pl 1 2 3
./example.pl got arguments: 1 2 3
You might also try Monit. Monit is a service that monitors and reports on other services. While it's mainly used as a way to notify (via email and sms) about runtime problems, it can also do what most of the other suggestions here have advocated. It can auto (re)start and stop programs, send emails, initiate other scripts, and maintain a log of output that you can pick up. In addition, I've found it's easy to install and maintain since there's solid documentation.
I have made a series of improvements on the other answer.
stdout out of this script is purely made up of stdout coming from its child UNLESS it exits due to detecting that the command is already being run
cleans up after its pidfile when terminated
optional configurable timeout period (Accepts any positive numeric argument, sends to sleep)
usage prompt on -h
arbitrary command execution, rather than single-command execution. The last arg, or the remaining args if there is more than one, are sent to eval, so you can construct any sort of shell script as a string and pass it to this script as the final arg (or trailing args) for it to daemonize
argument count comparisons done with -lt instead of <
Here is the script:
#!/bin/sh
# this script builds a mini-daemon, which isn't a real daemon because it
# should die when the owning terminal dies, but what makes it useful is
# that it will restart the command given to it when it completes, with a
# configurable timeout period elapsing before doing so.
if [ "$1" = '-h' ]; then
echo "timeout defaults to 1 sec.\nUsage: $(basename "$0") sentinel-pidfile [timeout] command [command arg [more command args...]]"
exit
fi
if [ $# -lt 2 ]; then
echo "No command given."
exit
fi
PIDFILE=$1
shift
TIMEOUT=1
if [[ $1 =~ ^[0-9]+(\.[0-9]+)?$ ]]; then
    TIMEOUT=$1
    [ $# -lt 2 ] && echo "No command given (timeout was given)." && exit
    shift
fi
echo "Checking pid in file ${PIDFILE}." >&2
#Check to see if process running.
if [ -f "$PIDFILE" ]; then
PID=$(< $PIDFILE)
if [ $? = 0 ]; then
ps -p $PID >/dev/null 2>&1
if [ $? = 0 ]; then
echo "This script is (probably) already running as PID ${PID}."
exit
fi
fi
fi
# Write our pid to file.
echo $$ >$PIDFILE
cleanup() {
    rm $PIDFILE
}
trap cleanup EXIT
# Run command until we're killed.
while true; do
    eval "$@"
    echo "I am $$ and my child has exited; restart in ${TIMEOUT}s" >&2
    sleep $TIMEOUT
done
Usage:
$ term-daemonize.sh pidfilefortesting 0.5 'echo abcd | sed s/b/zzz/'
Checking pid in file pidfilefortesting.
azzzcd
I am 79281 and my child has exited; restart in 0.5s
azzzcd
I am 79281 and my child has exited; restart in 0.5s
azzzcd
I am 79281 and my child has exited; restart in 0.5s
^C
$ term-daemonize.sh pidfilefortesting 0.5 'echo abcd | sed s/b/zzz/' 2>/dev/null
azzzcd
azzzcd
azzzcd
^C
Beware that if you run this script from different directories, it may use different pidfiles and not detect any existing running instances. Since it is designed to run and restart ephemeral commands provided through an argument, there is no way to know whether something has already been started, because who is to say whether it is the same command or not? To improve on this enforcement of running only a single instance of something, a solution specific to the situation is required.
Also, for it to function as a proper daemon, you must use (at the bare minimum) nohup as the other answer mentions. I have made no effort to provide any resilience to signals the process may receive.
One more point to take note of is that killing this script (if it was called from yet another script which is killed, or with a signal) may not succeed in killing the child, especially if the child is yet another script. I am uncertain why this is, but it seems to be related to the way eval works, which is mysterious to me. So it may be prudent to replace that line with something that accepts only a single command, as in the other answer.
There is also a very simple double-fork + setsid approach to detach any script from its parent process
( setsid my-regular-script arg [arg ...] 1>stdout.log 2>stderr.log & )
setsid is a part of standard util-linux package which has been with linux since birth. This works when launched in any POSIX compatible shell I know.
Another double-fork based approach doesn't even require any extra executables or packages and relies purely on a POSIX-based shell:
( my-regular-script arg [arg ...] 1>stdout.log 2>stderr.log & ) &
It also survives becoming an orphan when the parent process leaves the stage