how to use a shell script to supervise a program? - perl

I've searched around but haven't quite found what I'm looking for. In a nutshell I have created a bash script to run in a infinite while loop, sleeping and checking if a process is running. The only problem is even if the process is running, it says it is not and opens another instance.
I know I should check by process name and not process id, since another process could jump in and take the id. However all perl programs are named Perl5.10.0 on my system, and I intend on having multiple instances of the same perl program open.
The following "if" always returns false, what am I doing wrong here???
while true; do
if [ ps -p $pid ]; then
echo "Program running fine"
sleep 10
else
echo "Program being restarted\n"
perl program_name.pl &
sleep 5
read -r pid < "${filename}_pid.txt"
fi
done

Get rid of the square brackets. It should be:
if ps -p $pid; then
The square brackets are syntactic sugar for the test command. This is an entirely different beast and does not invoke ps at all:
if test ps -p $pid; then
In fact that yields "-bash: [: -p: binary operator expected" when I run it.

Aside from the syntax error already pointed out, this is a lousy way to ensure that a process stays alive.
First, you should find out why your program is dying in the first place; this script doesn't fix a bug, it tries to hide one.
Secondly, if it is so important that a program remain running, why do you expect your (at least once already) buggy shell script will do the job? Use a system facility that is specifically designed to restart server processes. If you say what platform you are using and the nature of your server process. I can offer more concrete advice.
added in response to comment:
Sure, there are engineering exigencies, but as the OP noted in the OP, there is still a bug in this attempt at a solution:
I know I should check by process name
and not process id, since another
process could jump in and take the id.
So now you are left with a PID tracking script, not a process "nanny". Although the chances are small, the script as it now stands has a ten second window in which
the "monitored" process fails
I start up my week long emacs process which grabs the same PID
the nanny script continues on blissfully unaware that its dependent has failed
The script isn't merely buggy, it is invalid because it presumes that PIDs are stable identifiers of a process. There are ways that this could be better handled even at the shell script level. The simplest is to never detach the execution of perl from the script since the script is doing nothing other than watching the subprocess. For example:
while true ; do
if perl program_name.pl ; then
echo "program_name terminated normally, restarting"
else
echo "oops program_name died again, restarting"
fi
done
Which is not only shorter and simpler, but it actually blocks for the condition that you are really interested in: the run-state of the perl program. The original script repeatedly checks a bad proxy indication of the run state condition (the PID) and so can get it wrong. And, since the whole purpose of this nanny script is to handle faults, it would be bad if it were faulty itself by design.

I totally agree that fiddling with the PID is nearly always a bad idea. The while true ; do ... done script is quite good, however for production systems there a couple of process supervisors which do exactly this and much more, e.g.
enable you to send signals to the supervised process (without knowing it's PID)
check how long a service has been up or down
capturing its output and write it to a log file
Examples of such process supervisors are daemontools or runit. For a more elaborate discussion and examples see Init scripts considered harmful. Don't be disturbed by the title: Traditional init scripts suffer from exactly the same problem like you do (they start a daemon, keep it's PID in a file and then leave the daemon alone).

I agree that you should find out why your program is dying in the first place. However, an ever running shell script is probably not a good idea. What if this supervising shell script dies? (And yes, get rid of the square braces around ps -p $pid. You want the exit status of ps -p $pid command. The square brackets are a replacement for the test command.)
There are two possible solutions:
Use cron to run your "supervising" shell script to see if the process you're supervising is still running, and if it isn't, restart it. The supervised process can output it's PID into a file. Your supervising program can then cat this file and get the PID to check.
If the program you're supervising is providing a service upon a particular port, make it an inetd service. This way, it isn't running at all until there is a request upon that port. If you set it up correctly, it will terminate when not needed and restart when needed. Takes less resources and the OS will handle everything for you.

That's what kill -0 $pid is for. It returns success if a process with pid $pid exists.

Related

Does a background process complete if its parent finishes first?

I am not sure if this question is specific to Perl, but it is the language I am using. Say I launch a background process to save a web page to a local file like this:
system("curl http://google.com > output_file.html &");
I know this will launch a background process, though I'm not totally sure of the details (for example does it get its own PID?). But what's particularly important to me is, what happens if the process that launched it terminates before curl finishes downloading? Will curl be allowed to continue, or will it terminate too?
Is there any reason the solution wouldn't be to prepend the above command with nohup (nohup curl ...)? See http://linux.101hacks.com/unix/nohup-command/
Yes, your backgrounded process should complete even if the script exits first.
The system call forks, what means that at that point a new, independent, process is created as a near clone of the parent. That process is then replaced by the command to run, or by a shell that will run the command.† The system then waits for the child process to complete.
The & in the command makes sure that it is a shell that is run by the system, which then executes the command. The shell itself forks a process (subshell), in which the command is executed, and doesn't wait for it but returns right away.
At that point system's job is done and it returns control to the script.
The fate of the process forked by the shell has nothing more to do with the shell, or with your script, and the process will run to its completion.
The parent may well exit right away. See this with
use warnings;
use strict;
use feature 'say';
system("(sleep 5; echo hi) &");
say "Parent exiting.";
or, from a terminal
perl -wE'system("(sleep 3; echo me)&"); say "done"'
Once in the shell, the () starts a sub-shell, used here to put multiple commands in the background for this example (and representing your command). Keep that in mind when tracking process IDs via bash internal variables $BASHPID, $$, $PPID (here $$ differs from $BASHPID)
perl -wE'say $$; system("
( sleep 30; echo \$BASHPID; echo \$PPID; echo \$\$ ) &
"); say "done"'
Then view processes while this sleeps (by ps aux on my system, with | tail -n 10).
Most of the time the PID of a system-run command will be by two greater than that of the script, as there is a shell between them (for a backgrounded process as well, on my system). In the example above it should be greater by 3, because of an additional () subshell with mulitple commands.
This assumes that the /bin/sh which system uses does get relegated to bash.
Note: when the parent exits first the child is re-parented by init and all is well (no zombies).
† From system
Does exactly the same thing as exec, except that a fork is done first and the parent process waits for the child process to exit. Note that argument processing varies depending on the number of arguments. If there is more than one argument in LIST, or if LIST is an array with more than one value, starts the program given by the first element of the list with arguments given by the rest of the list. If there is only one scalar argument, the argument is checked for shell metacharacters, and if there are any, the entire argument is passed to the system's command shell for parsing (this is /bin/sh -c on Unix platforms, but varies on other platforms). If there are no shell metacharacters in the argument, it is split into words and passed directly to execvp, which is more efficient. ...
The "... starts the program given by the first element ..." also means by execvp, see exec.

Odd behavior with Perl system() command

Note that I'm aware that this is probably not the best or most optimal way to do this but I've run into this somewhere before and I'm curious as to the answer.
I have a perl script that is called from an init that runs and occasionally dies. To quickly debug this, I put together a quick wrapper perl script that basically consists of
#$path set from library call.
while(1){
system("$path/command.pl " . join(" ",#ARGV) . " >>/var/log/outlog 2>&1");
sleep 30; #Added this one later. See below...
}
Fire this up from the command line and it runs fine and as expected. command.pl is called and the script basically halts there until the child process dies then goes around again.
However, when called from a start script (actually via start-stop-daemon), the system command returns immediately, leaving command.pl running. Then it goes around for another go. And again and again. (This was not fun without the sleep command.). ps reveals the parent of (the many) command.pl to be 1 rather than the id of the wrapper script (which it is when I run from the command line).
Anyone know what's occurring?
Maybe the command.pl is not being run successfully. Maybe the file doesn't have execute permission (do you need to say perl command.pl?). Maybe you are running the command from a different directory than you thought, and the command.pl file isn't found.
There are at least three things you can check:
standard error output of your command. For now you are swallowing it by saying 2>&1. Remove that part and observe what errors the system command produces.
the return value of system. The command may run and system may still return an exit code, but if system returns 0, you know the command was successful.
Perl's error variable $!. If there was a problem, Perl will set $!, which may or may not be helpful.
To summarize, try:
my $ec = system("command.pl >> /var/log/outlog");
if ($ec != 0) {
warn "exit code was $ec, \$! is $!";
}
Update: if multiple instance of the command keep showing up in your ps output, then it sounds like the program is forking and running itself in the background. If that is indeed what the command is supposed to do, then what you do NOT want to do is run this command in an endless loop.
Perhaps when run from a deamon the "system" command is using a different shell than the one used when you are running as yourself. Maybe the shell used by the daemon does not recognize the >& construct.
Instead of system("..."), try exec("...") function if that works for you.

Should I turn a perl script that parses a /var/log/.* file into a daemon?

I am writing a perl script to parse, for example, /var/log/syslog.
The perl script triggers further subsequent tasks when particular events in the log appear. The log is parsed following the advice of this post:
Command line: monitor log file and add data to database
Which what I believe is the use of a pipe.
Now I'd like this script to forever run in the background.
This sounds like a daemon to me, and the daemon program referenced in the following question seems ideal:
How can I run a Perl script as a system daemon in linux?
But from this post, it seems clear that daemon's have no open file handles. So how can I have a daemon, or a perl script that becomes a daemon, that monitors a logfile?
It sounds like what you want is a daemon. In that case the advise given in the second post you reference is the best practice. However, you do have other options like daemontools, which removes the fork complexity.
Daemons are allowed to have filehandles, but you should close STDIN, STDOUT, and STDERRR because you shouldn't really use them anymore. A lot of this has to do with the way fork works in *nix systems. Just open the pipe filehandle after your second fork, and you shouldn't have any issues.
this doesn't answer your question, but is another route to consider which may or may not be appropriate for you:
rsyslog can execute a program when a certain message is logged
see Filter Conditions for setting up the up the trigger, Templates for formatting the output that's passed to the script, and Actions > Shell Execute for specifying the executable.
Be sure to read the security implications, and that ryslog blocks while the external program runs. But if your script runs reliably quickly, it may be an option.

What does the EC2 command line say when a machine won't start?

When starting an instance on Amazon EC2, how would I detect a failure, for instance, if there's no machine available to fulfill my request? I'm using one of the less-common machine types and am concerned it won't start up, but am having trouble finding out what message to look for to detect this.
I'm using the EC2 commandline tools to do this. I know I can look for 'running' when I do ec2-describe-instance to see if the machine is up, but don't know what to look for to see if the startup failed.
Thanks!
The output from ec2-start-instances only returns you stopped pending, and as you say you need to use ec2-describe-instances to retrieve the state.
For that, you have a couple of choices; you can either use a loop to check for instance-state-name, looking for a result of running or stopped; alternatively you could look at either the reason or state-reason-code fields; unfortunately you'll need to trigger the failure you're worried about, to obtain the values that indicate failure.
The batch file I use to wait for a successful startup (fill in the underscores):
#echo off
set EC2_HOME=C:\tools\ec2-api-tools
set EC2_PRIVATE_KEY=C:\_\pk-_.pem
set EC2_CERT=C:\_\cert-_.pem
set JAVA_HOME=C:\Program Files (x86)\Java\jre6
%EC2_HOME%\bin\ec2-start-instances i-_
:docheck
%EC2_HOME%\bin\ec2-describe-instances | C:\tools\gnuwin32\bin\grep.exe -c stopped > %EC2_HOME%\temp.txt
findstr /m "1" %EC2_HOME%\temp.txt > nul
if %errorlevel%==0 (c:\tools\gnuwin32\bin\echo -n "."
goto docheck)
del temp.txt
ec2-start-instances will return you the previous state (after last command to instance) and the current state (after your command). ec2-stop instances does the same thing. THE PROBLEM IS, if you are scripting and you use -start- on a 'stopping' instance -OR- you use a -stop- on a 'pending' instance. These will cause exceptions in the command line tool and NASTILY exit your scripts all the way to the original console (VERY BAD BEHVIOR, AMAZON). So you have to go all the way through parsing the ec2-describe-instances [instance-id] result. HOWVER, that still leaves you vulnerable to that tiny little bit of time between when you GET the status from your instance and you APPLY A COMMAND. If someone else, or Amazon, puts you into pending or stopping, and you then do 'stop' or 'start respectively, your script will break. I really don't know how to catch such an exception with script. Bad Amazon AWS, BAD DOG!

How do I daemonize an arbitrary script in unix?

I'd like a daemonizer that can turn an arbitrary, generic script or command into a daemon.
There are two common cases I'd like to deal with:
I have a script that should run forever. If it ever dies (or on reboot), restart it. Don't let there ever be two copies running at once (detect if a copy is already running and don't launch it in that case).
I have a simple script or command line command that I'd like to keep executing repeatedly forever (with a short pause between runs). Again, don't allow two copies of the script to ever be running at once.
Of course it's trivial to write a "while(true)" loop around the script in case 2 and then apply a solution for case 1, but a more general solution will just solve case 2 directly since that applies to the script in case 1 as well (you may just want a shorter or no pause if the script is not intended to ever die (of course if the script really does never die then the pause doesn't actually matter)).
Note that the solution should not involve, say, adding file-locking code or PID recording to the existing scripts.
More specifically, I'd like a program "daemonize" that I can run like
% daemonize myscript arg1 arg2
or, for example,
% daemonize 'echo `date` >> /tmp/times.txt'
which would keep a growing list of dates appended to times.txt. (Note that if the argument(s) to daemonize is a script that runs forever as in case 1 above, then daemonize will still do the right thing, restarting it when necessary.) I could then put a command like above in my .login and/or cron it hourly or minutely (depending on how worried I was about it dying unexpectedly).
NB: The daemonize script will need to remember the command string it is daemonizing so that if the same command string is daemonized again it does not launch a second copy.
Also, the solution should ideally work on both OS X and linux but solutions for one or the other are welcome.
EDIT: It's fine if you have to invoke it with sudo daemonize myscript myargs.
(If I'm thinking of this all wrong or there are quick-and-dirty partial solutions, I'd love to hear that too.)
PS: In case it's useful, here's a similar question specific to python.
And this answer to a similar question has what appears to be a useful idiom for a quick-and-dirty demonizing of an arbitrary script:
You can daemonize any executable in Unix by using nohup and the & operator:
nohup yourScript.sh script args&
The nohup command allows you to shut down your shell session without it killing your script, while the & places your script in the background so you get a shell prompt to continue your session. The only minor problem with this is standard out and standard error both get sent to ./nohup.out, so if you start several scripts in this manor their output will be intertwined. A better command would be:
nohup yourScript.sh script args >script.out 2>script.error&
This will send standard out to the file of your choice and standard error to a different file of your choice. If you want to use just one file for both standard out and standard error you can us this:
nohup yourScript.sh script args >script.out 2>&1 &
The 2>&1 tells the shell to redirect standard error (file descriptor 2) to the same file as standard out (file descriptor 1).
To run a command only once and restart it if it dies you can use this script:
#!/bin/bash
if [[ $# < 1 ]]; then
echo "Name of pid file not given."
exit
fi
# Get the pid file's name.
PIDFILE=$1
shift
if [[ $# < 1 ]]; then
echo "No command given."
exit
fi
echo "Checking pid in file $PIDFILE."
#Check to see if process running.
PID=$(cat $PIDFILE 2>/dev/null)
if [[ $? = 0 ]]; then
ps -p $PID >/dev/null 2>&1
if [[ $? = 0 ]]; then
echo "Command $1 already running."
exit
fi
fi
# Write our pid to file.
echo $$ >$PIDFILE
# Get command.
COMMAND=$1
shift
# Run command until we're killed.
while true; do
$COMMAND "$#"
sleep 10 # if command dies immediately, don't go into un-ctrl-c-able loop
done
The first argument is the name of the pid file to use. The second argument is the command. And all other arguments are the command's arguments.
If you name this script restart.sh this is how you would call it:
nohup restart.sh pidFileName yourScript.sh script args >script.out 2>&1 &
I apologise for the long answer (please see comments about how my answer nails the spec). I'm trying to be comprehensive, so you have as good of a leg up as possible. :-)
If you are able to install programs (have root access), and are willing to do one-time legwork to set up your script for daemon execution (i.e., more involved than simply specifying the command-line arguments to run on the command line, but only needing to be done once per service), I have a way that's more robust.
It involves using daemontools. The rest of the post describes how to set up services using daemontools.
Initial setup
Follow the instructions in How to install daemontools. Some distributions (e.g., Debian, Ubuntu) already have packages for it, so just use that.
Make a directory called /service. The installer should have already done this, but just verify, or if installing manually. If you dislike this location, you can change it in your svscanboot script, although most daemontools users are used to using /service and will get confused if you don't use it.
If you're using Ubuntu or another distro that doesn't use standard init (i.e., doesn't use /etc/inittab), you will need to use the pre-installed inittab as a base for arranging svscanboot to be called by init. It's not hard, but you need to know how to configure the init that your OS uses.
svscanboot is a script that calls svscan, which does the main work of looking for services; it's called from init so init will arrange to restart it if it dies for any reason.
Per-service setup
Each service needs a service directory, which stores housekeeping information about the service. You can also make a location to house these service directories so they're all in one place; usually I use /var/lib/svscan, but any new location will be fine.
I usually use a script to set up the service directory, to save lots of manual repetitive work. e.g.,
sudo mkservice -d /var/lib/svscan/some-service-name -l -u user -L loguser "command line here"
where some-service-name is the name you want to give your service, user is the user to run that service as, and loguser is the user to run the logger as. (Logging is explained in just a little bit.)
Your service has to run in the foreground. If your program backgrounds by default, but has an option to disable that, then do so. If your program backgrounds without a way to disable it, read up on fghack, although this comes at a trade-off: you can no longer control the program using svc.
Edit the run script to ensure it's doing what you want it to. You may need to place a sleep call at the top, if you expect your service to exit frequently.
When everything is set up right, create a symlink in /service pointing to your service directory. (Don't put service directories directly within /service; it makes it harder to remove the service from svscan's watch.)
Logging
The daemontools way of logging is to have the service write log messages to standard output (or standard error, if you're using scripts generated with mkservice); svscan takes care of sending log messages to the logging service.
The logging service takes the log messages from standard input. The logging service script generated by mkservice will create auto-rotated, timestamped log files in the log/main directory. The current log file is called current.
The logging service can be started and stopped independently of the main service.
Piping the log files through tai64nlocal will translate the timestamps into a human-readable format. (TAI64N is a 64-bit atomic timestamp with a nanosecond count.)
Controlling services
Use svstat to get the status of a service. Note that the logging service is independent, and has its own status.
You control your service (start, stop, restart, etc.) using svc. For example, to restart your service, use svc -t /service/some-service-name; -t means "send SIGTERM".
Other signals available include -h (SIGHUP), -a (SIGALRM), -1 (SIGUSR1), -2 (SIGUSR2), and -k (SIGKILL).
To down the service, use -d. You can also prevent a service from automatically starting at bootup by creating a file named down in the service directory.
To start the service, use -u. This is not necessary unless you've downed it previously (or set it up not to auto-start).
To ask the supervisor to exit, use -x; usually used with -d to terminate the service as well. This is the usual way to allow a service to be removed, but you have to unlink the service from /service first, or else svscan will restart the supervisor.
Also, if you created your service with a logging service (mkservice -l), remember to also exit the logging supervisor (e.g., svc -dx /var/lib/svscan/some-service-name/log) before removing the service directory.
Summary
Pros:
daemontools provides a bulletproof way to create and manage services. I use it for my servers, and I highly recommend it.
Its logging system is very robust, as is the service auto-restart facility.
Because it starts services with a shell script that you write/tune, you can tailor your service however you like.
Powerful service control tools: you can send most any signal to a service, and can bring services up and down reliably.
Your services are guaranteed a clean execution environment: they will execute with the same environment, process limits, etc., as what init provides.
Cons:
Each service takes a bit of setup. Thankfully, this only needs doing once per service.
Services must be set up to run in the foreground. Also, for best results, they should be set up to log to standard output/standard error, rather than syslog or other files.
Steep learning curve if you're new to the daemontools way of doing things. You have to restart services using svc, and cannot run the run scripts directly (since they would then not be under the control of the supervisor).
Lots of housekeeping files, and lots of housekeeping processes. Each service needs its own service directory, and each service uses one supervisor process to auto-restart the service if it dies. (If you have many services, you will see lots of supervise processes in your process table.)
In balance, I think daemontools is an excellent system for your needs. I welcome any questions about how to set it up and maintain it.
You should have a look at daemonize. It allows to detect second copy (but it uses file locking mechanism). Also it works on different UNIX and Linux distributions.
If you need to automatically start your application as daemon, then you need to create appropriate init-script.
You can use the following template:
#!/bin/sh
#
# mydaemon This shell script takes care of starting and stopping
# the <mydaemon>
#
# Source function library
. /etc/rc.d/init.d/functions
# Do preliminary checks here, if any
#### START of preliminary checks #########
##### END of preliminary checks #######
# Handle manual control parameters like start, stop, status, restart, etc.
case "$1" in
start)
# Start daemons.
echo -n $"Starting <mydaemon> daemon: "
echo
daemon <mydaemon>
echo
;;
stop)
# Stop daemons.
echo -n $"Shutting down <mydaemon>: "
killproc <mydaemon>
echo
# Do clean-up works here like removing pid files from /var/run, etc.
;;
status)
status <mydaemon>
;;
restart)
$0 stop
$0 start
;;
*)
echo $"Usage: $0 {start|stop|status|restart}"
exit 1
esac
exit 0
I think you may want to try start-stop-daemon(8). Check out scripts in /etc/init.d in any Linux distro for examples. It can find started processes by command line invoked or PID file, so it matches all your requirements except being a watchdog for your script. But you can always start another daemon watchdog script that just restarts your script if necessary.
As an alternative to the already mentioned daemonize and daemontools, there is the daemon command of the libslack package.
daemon is quite configurable and does care about all the tedious daemon stuff such as automatic restart, logging or pidfile handling.
If you're using OS X specifically, I suggest you take a look at how launchd works. It will automatically check to ensure your script is running and relaunch it if necessary. It also includes all sorts of scheduling features, etc. It should satisfy both requirement 1 and 2.
As for ensuring only one copy of your script can run, you need to use a PID file. Generally I write a file to /var/run/.pid that contains a PID of the current running instance. if the file exists when the program runs, it checks if the PID in the file is actually running (the program may have crashed or otherwise forgotten to delete the PID file). If it is, abort. If not, start running and overwrite the PID file.
Daemontools ( http://cr.yp.to/daemontools.html ) is a set of pretty hard-core utilities used to do this, written by dj bernstein. I have used this with some success. The annoying part about it is that none of the scripts return any visible results when you run them - just invisible return codes. But once it's running it's bulletproof.
First get createDaemon() from http://code.activestate.com/recipes/278731/
Then the main code:
import subprocess
import sys
import time
createDaemon()
while True:
subprocess.call(" ".join(sys.argv[1:]),shell=True)
time.sleep(10)
You could give a try to immortal It is a *nix cross-platform (OS agnostic) supervisor.
For a quick try on macOS:
brew install immortal
In case you are using FreeBSD from the ports or by using pkg:
pkg install immortal
For Linux by downloading the precompiled binaries or from source: https://immortal.run/source/
You can either use it like this:
immortal -l /var/log/date.log date
Or by a configuration YAML file which gives you more options, for example:
cmd: date
log:
file: /var/log/date.log
age: 86400 # seconds
num: 7 # int
size: 1 # MegaBytes
timestamp: true # will add timesamp to log
If you would like to keep also the standard error output in a separate file you could use something like:
cmd: date
log:
file: /var/log/date.log
age: 86400 # seconds
num: 7 # int
size: 1 # MegaBytes
stderr:
file: /var/log/date-error.log
age: 86400 # seconds
num: 7 # int
size: 1 # MegaBytes
timestamp: true # will add timesamp to log
This is a working version complete with an example which you can copy into an empty directory and try out (after installing the CPAN dependencies, which are Getopt::Long, File::Spec, File::Pid, and IPC::System::Simple -- all pretty standard and are highly recommended for any hacker: you can install them all at once with cpan <modulename> <modulename> ...).
keepAlive.pl:
#!/usr/bin/perl
# Usage:
# 1. put this in your crontab, to run every minute:
# keepAlive.pl --pidfile=<pidfile> --command=<executable> <arguments>
# 2. put this code somewhere near the beginning of your script,
# where $pidfile is the same value as used in the cron job above:
# use File::Pid;
# File::Pid->new({file => $pidfile})->write;
# if you want to stop your program from restarting, you must first disable the
# cron job, then manually stop your script. There is no need to clean up the
# pidfile; it will be cleaned up automatically when you next call
# keepAlive.pl.
use strict;
use warnings;
use Getopt::Long;
use File::Spec;
use File::Pid;
use IPC::System::Simple qw(system);
my ($pid_file, $command);
GetOptions("pidfile=s" => \$pid_file,
"command=s" => \$command)
or print "Usage: $0 --pidfile=<pidfile> --command=<executable> <arguments>\n", exit;
my #arguments = #ARGV;
# check if process is still running
my $pid_obj = File::Pid->new({file => $pid_file});
if ($pid_obj->running())
{
# process is still running; nothing to do!
exit 0;
}
# no? restart it
print "Pid " . $pid_obj->pid . " no longer running; restarting $command #arguments\n";
system($command, #arguments);
example.pl:
#!/usr/bin/perl
use strict;
use warnings;
use File::Pid;
File::Pid->new({file => "pidfile"})->write;
print "$0 got arguments: #ARGV\n";
Now you can invoke the example above with: ./keepAlive.pl --pidfile=pidfile --command=./example.pl 1 2 3 and the file pidfile will be created, and you will see the output:
Pid <random number here> no longer running; restarting ./example.pl 1 2 3
./example.pl got arguments: 1 2 3
You might also try Monit. Monit is a service that monitors and reports on other services. While it's mainly used as a way to notify (via email and sms) about runtime problems, it can also do what most of the other suggestions here have advocated. It can auto (re)start and stop programs, send emails, initiate other scripts, and maintain a log of output that you can pick up. In addition, I've found it's easy to install and maintain since there's solid documentation.
I have made a series of improvements on the other answer.
stdout out of this script is purely made up of stdout coming from its child UNLESS it exits due to detecting that the command is already being run
cleans up after its pidfile when terminated
optional configurable timeout period (Accepts any positive numeric argument, sends to sleep)
usage prompt on -h
arbitrary command execution, rather than single command execution. The last arg OR remaining args (if more than one last arg) are sent to eval, so you can construct any sort of shell script as a string to send to this script as a last arg (or trailing args) for it to daemonize
argument count comparisons done with -lt instead of <
Here is the script:
#!/bin/sh
# this script builds a mini-daemon, which isn't a real daemon because it
# should die when the owning terminal dies, but what makes it useful is
# that it will restart the command given to it when it completes, with a
# configurable timeout period elapsing before doing so.
if [ "$1" = '-h' ]; then
echo "timeout defaults to 1 sec.\nUsage: $(basename "$0") sentinel-pidfile [timeout] command [command arg [more command args...]]"
exit
fi
if [ $# -lt 2 ]; then
echo "No command given."
exit
fi
PIDFILE=$1
shift
TIMEOUT=1
if [[ $1 =~ ^[0-9]+(\.[0-9]+)?$ ]]; then
TIMEOUT=$1
[ $# -lt 2 ] && echo "No command given (timeout was given)." && exit
shift
fi
echo "Checking pid in file ${PIDFILE}." >&2
#Check to see if process running.
if [ -f "$PIDFILE" ]; then
PID=$(< $PIDFILE)
if [ $? = 0 ]; then
ps -p $PID >/dev/null 2>&1
if [ $? = 0 ]; then
echo "This script is (probably) already running as PID ${PID}."
exit
fi
fi
fi
# Write our pid to file.
echo $$ >$PIDFILE
cleanup() {
rm $PIDFILE
}
trap cleanup EXIT
# Run command until we're killed.
while true; do
eval "$#"
echo "I am $$ and my child has exited; restart in ${TIMEOUT}s" >&2
sleep $TIMEOUT
done
Usage:
$ term-daemonize.sh pidfilefortesting 0.5 'echo abcd | sed s/b/zzz/'
Checking pid in file pidfilefortesting.
azzzcd
I am 79281 and my child has exited; restart in 0.5s
azzzcd
I am 79281 and my child has exited; restart in 0.5s
azzzcd
I am 79281 and my child has exited; restart in 0.5s
^C
$ term-daemonize.sh pidfilefortesting 0.5 'echo abcd | sed s/b/zzz/' 2>/dev/null
azzzcd
azzzcd
azzzcd
^C
Beware that if you run this script from different directories it may use different pidfiles and not detect any existing running instances. Since it is designed to run and restart ephemeral commands provided through an argument there is no way to know whether something's been already started, because who is to say whether it is the same command or not? To improve on this enforcement of only running a single instance of something, a solution specific to the situation is required.
Also, for it to function as a proper daemon, you must use (at the bare minimum) nohup as the other answer mentions. I have made no effort to provide any resilience to signals the process may receive.
One more point to take note of is that killing this script (if it was called from yet another script which is killed, or with a signal) may not succeed in killing the child, especially if the child is yet another script. I am uncertain of why this is, but it seems to be something related to the way eval works, which is mysterious to me. So it may be prudent to replace that line with something that accepts only a single command like in the other answer.
There is also a very simple double-fork + setsid approach to detach any script from its parent process
( setsid my-regular-script arg [arg ...] 1>stdout.log 2>stderr.log & )
setsid is a part of standard util-linux package which has been with linux since birth. This works when launched in any POSIX compatible shell I know.
Another double-fork based approach doesn't even require any extra exacutables or packages and relies purely on POSIX based shell
( my-regular-script arg [arg ...] 1>stdout.log 2>stderr.log & ) &
It also survives becoming an orphan when the parent process leaves the stage