When using Upstart on Ubuntu, how do I issue a command that starts a job if it is not running and restarts it if it is already running? When deploying an app to a new node, the job is not defined.
initctl restart JOB complains if not already running
initctl start JOB complains if already running.
I can script it to do
initctl start JOB
initctl restart JOB
But it doesn't seem to be the nicest thing to do.
I was faced with the same problem.
Short of a "lazy stop-then-start" command built into initctl, we have to script it.
Invoke start, and restart if that fails:
initctl start JOB || initctl restart JOB
This one-liner is probably not the answer either of us was looking for, but it is short enough to be worth mentioning.
As long as the service works nicely, it will do the trick.
When the service fails, this command fails twice; for example, if the service was stopped and actually fails to start, it will also fail to restart.
Definitely looking for an improvement to this.
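One possible refinement, assuming initctl status JOB reports "start/running" while the job is up (which stock Upstart does), is to branch on the job's current state instead of letting the first command fail:
if initctl status JOB | grep -q "start/running"; then
    initctl restart JOB   # already running, so restart it
else
    initctl start JOB     # not running, so start it
fi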
I hope this helps.
I also tried the 'start or restart' method that hmalphettes suggested, but ran into trouble: with that approach, updates to the Upstart job definition were not applied. Instead I use this, which works as I would expect:
sudo stop JOB || true && sudo start JOB
This basically reads 'Stop the job if it's running, then start it.'
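Note that || and && have equal precedence in sh and associate left, so the one-liner is equivalent to this explicitly grouped form:
( sudo stop JOB || true ) && sudo start JOB
In other words, the start always runs, whether or not the stop succeeded.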
sudo service JOB restart
The service command was patched in Ubuntu to make it work the same on Upstart as it does in the most common cases on sysvinit.
initctl restart JOB
Has some unexpected effects and in general should be studied carefully before use. It is mostly there so you can restart a job without reloading the job definition, which is a really uncommon case.
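If what you actually want is for edits to /etc/init/JOB.conf to take effect, a stop followed by a start does pick up the new job definition; a quick comparison (JOB is a placeholder):
sudo initctl restart JOB   # keeps the job definition already loaded in memory
sudo initctl stop JOB      # stop, then...
sudo initctl start JOB     # ...start, using the current /etc/init/JOB.conf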
I'm using a Raspberry Pi 4, v10 (buster).
I installed supervisor per the instructions here: http://supervisord.org/installing.html
Except I changed "pip" to "pip3" because I want to monitor running things that use the python3 kernel.
I'm using Prefect, and supervisord.conf runs the program with command=/home/pi/.local/bin/prefect "agent local start" (I tried this with and without the double quotes).
Looking at the supervisord.log file, it seems like the Prefect agent does start; I see the ASCII art that normally shows up when I start it from the command line. But then it shows it was terminated by SIGTERM; not expected, WARN received SIGTERM indicating exit request.
I saw this post: Supervisor gets a SIGTERM for some reason, quits and stops all its processes but I don't even have that 10Periodic file it references.
Anyone know why/how the Supervisor processes are getting killed by SIGTERM?
It could be that your process exits immediately because you don't have an API key in your command, and this is required to connect your agent to the Prefect Cloud API. Additionally, it's a best practice to always assign a unique label to your agents; below is an example with "raspberry" as the label.
You can also check the logs/status:
supervisorctl status
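To see the program's most recent log output as well (the program name prefect-agent matches the config below):
supervisorctl tail prefect-agent          # last part of the program's stdout log
supervisorctl tail prefect-agent stderr   # same for stderr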
Here is a command you can try; you can also specify a directory in your Supervisor config (not sure whether the environment variables are needed, but I saw them used by another Raspberry Pi Supervisor user):
[program:prefect-agent]
command=prefect agent local start -l raspberry -k YOUR_API_KEY --no-hostname-label
directory=/home/pi/.local/bin/prefect
user=pi
environment=HOME="/home/pi/.local/bin/prefect",USER="pi"
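After editing the config, Supervisor has to pick up the new or changed program definition; something like this should do it (program name as above):
sudo supervisorctl reread                 # re-read config files and report changes
sudo supervisorctl update                 # apply them (starts the new program)
sudo supervisorctl status prefect-agent   # confirm it is RUNNING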
I've been working with Airflow for a while now, which was set up by a colleague. Lately I have run into several errors, which require me to understand in more depth how to fix certain things within Airflow.
I do understand what the three processes are; I just don't understand the underlying things that happen when I run them. What exactly happens when I run one of the commands? Can I somewhere see afterwards that they are running? And if I run one of these commands, does it overwrite older webservers/schedulers/workers or add a new one?
Moreover, if I for example run airflow webserver, the screen shows some of the things that are happening. Can I simply get out of this by pressing CTRL + C? Because when I do this, it says things like Worker exiting and Shutting down: Master. Does this mean I'm shutting everything down? How else should I get out of the webserver screen then?
Each process does what it is built to do while it is running (the webserver provides a UI, the scheduler determines when things need to be run, and the workers actually run the tasks).
I think your confusion is that you may be seeing them as commands that tell some sort of "Airflow service" to do something, but they are each standalone commands that start the processes that do the work. I.e., starting from nothing, you run airflow scheduler: now you have a scheduler running. Run airflow webserver: now you have a webserver running. When you run airflow webserver, it starts a Python Flask app, and while that process is running, the webserver is running; if you kill the command, it goes down.
All three have to be running for Airflow as a whole to work (assuming you are using an executor that needs workers). You should only ever have one scheduler running, but if you were to run two airflow webserver processes (ignoring port conflicts), you would then have two separate HTTP servers running against the same metadata database. Workers are a little different, in that you may want multiple worker processes running so you can execute more tasks concurrently. So if you create multiple airflow worker processes, you'll end up with multiple processes taking jobs from the queue, executing them, and updating the task instance with the status of the task.
When you run any of these commands you'll see the stdout and stderr output in the console. If you are running them as a daemon or background process, you can check what processes are running on the server.
If you Ctrl+C, you are sending a signal to kill the process. Ideally, for a production Airflow cluster you should have some supervisor monitoring the processes and ensuring that they are always running. Locally, you can either run the commands in the foreground of separate shells, minimize them and just keep them running when you need them, or run them as background daemons with the -D argument, i.e. airflow webserver -D.
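As a rough sketch (commands as in Airflow 1.x; in 2.x the worker is started with airflow celery worker):
airflow webserver -D    # gunicorn/Flask UI, daemonized
airflow scheduler -D    # scheduling loop, daemonized
airflow worker -D       # Celery worker, only needed with CeleryExecutor
ps aux | grep airflow   # see which airflow processes are currently running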
I've tried the stanzas below, but they don't work:
1. start on (net-device-up IFACE!=lo and runlevel [2345])
2. start on started network-interface INTERFACE=eth1
I saw that the networking service is still brought up by calling the init.d scripts,
so I doubt that on RHEL 6 there is a real Upstart "event" for a NIC coming up.
Does anyone have any idea?
I figured it out, more or less, with:
start on stopped rc RUNLEVEL=[2345]
The Upstart job rc (/etc/init/rc.conf) just calls all the init scripts with
exec /etc/rc.d/rc $RUNLEVEL
When all the rc scripts are done, all the NICs are up, and the "rc" job in Upstart goes through a state change and emits the corresponding event.
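So a minimal job that waits for the network this way could look like the following (the file name and the exec line are just placeholders):
# /etc/init/myapp.conf
# start once the sysvinit scripts for the runlevel have finished,
# i.e. after /etc/init.d/network has brought the NICs up
start on stopped rc RUNLEVEL=[2345]
stop on runlevel [!2345]
respawn
exec /usr/local/bin/myapp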
I want to implement automatic service restarting for several Tomcat applications, applications that take a long time to start, sometimes over 10 minutes.
Mainly, the test would check whether the application responds over HTTP with a valid response.
Still, this is not the problem; the problem is how to prevent this uptime check from failing while the service is under maintenance, scheduled or not.
I don't want this service to be started if it was stopped manually with `service appname stop`.
I considered creating .maintenance files on stop or restart actions of the daemon and checking for them before triggering an automated restart.
So far the only problem that I wasn't able to properly solve is how to detect that the app has finished starting up and then remove the .maintenance file, so that the automatic restart works properly again.
Note that an init.d script is not supposed to wait, so the init script should start a background command that solves this problem.
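A rough sketch of that background command, assuming the health check is a plain HTTP GET and that the uptime checker skips its restart whenever the flag file exists (URL, paths and timings are placeholders):
#!/bin/sh
# launched with & by the init script's start/restart actions
MAINT=/var/run/appname.maintenance
URL=http://localhost:8080/health
touch "$MAINT"                            # app is starting: suppress auto-restarts
until curl -fs --max-time 5 "$URL" >/dev/null; do
    sleep 30                              # Tomcat startup can take 10+ minutes
done
rm -f "$MAINT"                            # app answered: auto-restarts allowed again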
I use Ant to start/shut down a JBoss 5 server through Jenkins. The Ant java task's spawn and fork attributes are set to "true", so the command is executed in the background.
Jenkins successfully starts up the server, waits two minutes (a "sleep" command in Jenkins), and then, after the sleep, for some strange reason it shuts down the server. The sleep command is the last step in the build job. The shutdown says:
2013-01-29 17:03:39,332 INFO [org.jboss.bootstrap.microcontainer.ServerImpl] Runtime shutdown hook called, forceHalt: true
I googled it and tried the suggested -Xrs option, but it didn't help. What is happening here?
Jenkins has something called the process tree killer that will kill all processes created by the job (even those started with spawn and fork set to true).
There are some workarounds to this behavior.
Disable the process tree killer:
-Dhudson.util.ProcessTreeKiller.disable=true
or
set the env var BUILD_ID=dontKillMe in the JBoss process:
export BUILD_ID=dontKillMe
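If the Ant task is invoked from an "Execute shell" build step, exporting the variable there is enough, since the spawned JBoss JVM inherits it (build file and target name are made up):
# Jenkins "Execute shell" build step
export BUILD_ID=dontKillMe    # keep the process tree killer away from this subtree
ant -f build.xml start-jboss  # the forked/spawned JBoss process inherits BUILD_ID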
You can browse the ProcessTreeKiller wiki article or the Jenkins JIRA to find various workarounds for this issue.
This source (see the comments) suggests other environment variables, apparently for older versions of Jenkins. For me it didn't work until I started using JENKINS(_SERVER)_COOKIE.