supervisord: How to stop supervisord on PROCESS_STATE_FATAL

I'm using supervisord to manage multiple processes in a docker container.
However, one process is always the 'master', and the others are monitoring and reporting processes.
What I want to do is kill supervisord if the master process fails to start after startretries.
What I tried to do is use eventlistener to kill the process:
[eventlistener:master]
events=PROCESS_STATE_FAIL
command=supervisorctl stop all
But I don't think the events subsystem is this sophisticated. I think I need to actually write an event listener to handle the events.
Is that correct? Is there a simpler way to kill the entire supervisord instance if one of the processes dies?
Thanks

Another try:
[eventlistener:quit_on_failure]
events=PROCESS_STATE_FATAL
command=sh -c 'echo "READY"; while read -r line; do echo "$line"; supervisorctl shutdown; done'
Especially for Docker containers, it would literally be a killer feature to have a simple, straightforward shutdown on errors. The container should go down when its processes die.

Answer:
The command parameter of an [eventlistener:x] section MUST be a supervisord event listener, i.e. a program that implements the event listener protocol; it can't be an arbitrary command.
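A minimal sketch of such a listener, assuming the script lives at /usr/local/bin/kill_supervisor.sh (path and names are placeholders) and that supervisorctl can reach supervisord (an [unix_http_server] section is configured):

[eventlistener:quit_on_fatal]
events=PROCESS_STATE_FATAL
command=/usr/local/bin/kill_supervisor.sh

#!/bin/sh
# Minimal supervisord event listener: shut everything down on the first event received.
while true; do
  printf "READY\n"            # tell supervisord we are ready to receive an event
  read -r header              # event header line, e.g. "ver:3.0 ... eventname:PROCESS_STATE_FATAL len:71"
  supervisorctl shutdown      # stop supervisord; in a container this ends the main process and the container
  printf "RESULT 2\nOK"       # acknowledge the event per the listener protocol
done
# Note: the event payload is not read here; that is tolerable only because we shut down immediately.

With this in place, a program that exhausts startretries enters the FATAL state, the listener receives PROCESS_STATE_FATAL, and supervisord (and thus the container) goes down.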

Related

Preventing the Docker container from exiting when the main process dies

I am using Postgres with repmgr. One small problem I am having is that sometimes repmgr has to stop and start the Postgres service, and that just kills the container. I tried some of the solutions online in the Dockerfile, but none seems to work. Is there something I can add in the docker-compose file to prevent Docker from exiting immediately? I don't want it to stay alive forever, but maybe a couple of minutes?
Remember that docker-compose is mostly a development tool. For production there are other options, like Kubernetes.
The only solution I know is to run your own .sh script as the main process, with an infinite loop that performs the necessary checks.
This way you control how to check - for example ps aux piped through grep for what you need - and you exit the main process when your logic decides to.
The shell script would look something like:
#!/bin/sh
while sleep 180; do
  # check whether the Postgres process is still running
  ps aux | grep postgres_service_name | grep -v grep > /dev/null
  POSTGRES=$?
  if [ $POSTGRES -ne 0 ]; then
    # do what you need before exiting the whole container
    exit 1
  fi
done
Make sure you replace postgres_service_name with the real name of the Postgres service on Linux.
Use that script as the startup command in docker-compose, or whatever you would use in prod.
If you really need a couple of minutes before the container goes down, implement logic that measures the time elapsed since the process was first found missing, as in the sketch below.
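A minimal sketch of that grace-period idea, reusing the same ps/grep check (the 10-second poll interval and 120-second threshold are assumptions):

#!/bin/sh
MISSING_SINCE=0
while sleep 10; do
  if ps aux | grep postgres_service_name | grep -v grep > /dev/null; then
    MISSING_SINCE=0                                  # process is back, reset the timer
  else
    [ "$MISSING_SINCE" -eq 0 ] && MISSING_SINCE=$(date +%s)
    ELAPSED=$(( $(date +%s) - MISSING_SINCE ))
    if [ "$ELAPSED" -ge 120 ]; then                  # gone for two minutes
      # do what you need before exiting the whole container
      exit 1
    fi
  fi
done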
The way Docker is designed, it starts a container by running the command specified as its entrypoint/command, and when that process terminates Docker kills all remaining processes in the container and shuts it down.
So to keep the container running while the Postgres process is restarted you need to have another command running as the root process in the container.
You can achieve this by writing a simple shell script as a wrapper which will only exit when no Postgres process is running anymore or by using a dedicated init tool such as supervisord.
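A minimal sketch of the supervisord variant (the Postgres command and data directory are placeholders; whether supervisord or repmgr should be the one restarting Postgres is a design decision the answer above does not settle):

[supervisord]
; run in the foreground so supervisord stays the container's main process
nodaemon=true

[program:postgres]
; placeholder command and data directory
command=postgres -D /var/lib/postgresql/data
; keep the container alive and bring Postgres back if something stops it
autorestart=true

Because supervisord is now the root process, the container only exits when supervisord itself exits, not when Postgres is restarted.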

systemd not executing ExecStop script

I have created a service (named develop) using systemd. Following is the content of my develop unit file:
[Unit]
Description=Develop Manager Service
[Service]
Type=forking
PIDFile = /home/nayasa/data/var/run/developPid
User=root
Group=root
ExecStartPre = /bin/bash /home/nayasa/control_scripts/develop_startPre.sh
ExecStart =/bin/bash /home/nayasa/control_scripts/develop_start.sh
ExecStop =/bin/bash /home/nayasa/control_scripts/develop_stop.sh
[Install]
WantedBy=multi-user.target
My develop.service forks multiple processes during runtime.
Whenever I run systemctl stop develop.service, systemd stops all processes in the CGroup of my develop service, whereas the develop_stop script that I have provided only kills the main process using the pid from the pidfile. I want to stop only the main process. It seems to me that systemd is not using my stop script. How do I force systemd to execute my stop script to stop the service and not kill all processes of the CGroup? FYI - I know that using the KillMode option I can direct systemd to kill only the main process and leave the others, but I want to know why my script is not being executed.
It's a little weird to expect orphaned processes to persist after stopping a service. You would be left with a system that's in an unknown state. What would happen if you started the service again?
I think what you probably want is more complicated than a single service.
Let's say you wanted develop.service to launch proc1 and proc2. You want systemctl stop develop.service to kill proc1 but not proc2. In this case, you still need something to manage proc2, otherwise you have a rogue, orphaned, unmanaged and unmonitored process. The answer is to use another service.
Instead, try making two services. develop.service would launch proc1, possibly using your scripts. Then add a Wants=proc2.service to your [Unit] section. proc2.service would be responsible for proc2.
This means systemctl start develop.service will launch proc1 and proc2. Meanwhile systemctl stop develop.service will only kill proc1. proc2 can still be stopped/monitored by inspecting proc2.service.
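A minimal sketch of that split (unit names, paths and the proc1/proc2 commands are placeholders):

# develop.service
[Unit]
Description=Develop Manager Service
# start proc2.service whenever this service is started
Wants=proc2.service

[Service]
# placeholder for the real proc1 command
ExecStart=/usr/local/bin/proc1

[Install]
WantedBy=multi-user.target

# proc2.service
[Unit]
Description=proc2 service

[Service]
# placeholder for the real proc2 command
ExecStart=/usr/local/bin/proc2
Restart=on-failure

[Install]
WantedBy=multi-user.target

With this layout, systemctl stop develop.service kills only proc1; proc2 keeps running and is stopped or inspected through proc2.service.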

celery stdout/stderr logging while running under supervisor

I'm running celery worker with some concurrency level (e.g. 4) under supervisord:
[program:wgusf-wotwgs1.celery]
command=/home/httpd/wgusf-wotwgs1/app/bin/celery -A roles.frontend worker -c 4 -l info
directory=/home/httpd/wgusf-wotwgs1/app/src
numprocs=1
stdout_logfile=/home/httpd/wgusf-wotwgs1/logs/supervisor_celery.log
stderr_logfile=/home/httpd/wgusf-wotwgs1/logs/supervisor_celery.log
autostart=true
autorestart=true
startsecs=3
killasgroup=true
stopsignal=QUIT
user=wgusf-wotwgs1
The problem is: some of the stdout messages from the worker (about receiving tasks and successful task execution) are missing from the logfile. But when running the celery worker with the same concurrency level from a shell, everything seems fine and messages steadily appear for all the tasks.
Any ideas how to fix this behavior?
I think it's because, by default, celery logs to stderr instead of stdout.
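If that is the cause, one option (an assumption, not stated in the original answer) is to let supervisord merge both streams into one logfile with redirect_stderr instead of pointing stdout_logfile and stderr_logfile at the same file:

[program:wgusf-wotwgs1.celery]
command=/home/httpd/wgusf-wotwgs1/app/bin/celery -A roles.frontend worker -c 4 -l info
; merge the worker's stderr into the stdout log instead of keeping two handles on the same file
redirect_stderr=true
stdout_logfile=/home/httpd/wgusf-wotwgs1/logs/supervisor_celery.log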

Can't kill celery processes started by Supervisor

I am running a VPS on Digital Ocean with Ubuntu 14.04.
I set up supervisor to run a bash script that exports environment variables and then starts celery:
#!/bin/bash
DJANGODIR=/webapps/myproj/myproj
# Activate the virtual environment
cd $DJANGODIR
source ../bin/activate
export REDIS_URL="redis://localhost:6379"
...
celery -A connectshare worker --loglevel=info --concurrency=1
Now I've noticed that supervisor does not seem to be killing these processes when I do supervisorctl stop. Furthermore, when I try to manually kill the processes they won't stop. How can I set up a better script for supervisor and how can I kill the processes that are running?
You should configure the stopasgroup=true option in the supervisord.conf file, so that not only the parent process but also the child processes are killed.
Sending kill -9 should kill the process. If supervisorctl stop doesn't stop your process, you can try setting stopsignal to one of the other values, for example QUIT or KILL.
You can see more in the supervisord documentation.
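A minimal sketch of those suggestions combined (the program name and script path are placeholders):

[program:connectshare-celery]
; the wrapper script from the question
command=/webapps/myproj/start_celery.sh
; deliver the stop signal to the whole process group, not just the bash wrapper
stopasgroup=true
killasgroup=true
; signal to send on stop; try QUIT or KILL if the default TERM is ignored
stopsignal=QUIT
stopwaitsecs=30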

Supervisord can't stop celery, how to do the same using Monit

I can't stop my celery worker using Supervisord. In the config file, it looks like this:
command=/usr/local/myapp/src/manage.py celery worker --concurrency=1 --loglevel=INFO
and when I try to stop it using the following command:
sudo service supervisord stop
It reports that the worker has stopped, while in fact it has not.
One more problem: when you restart a program outside supervisord's scope, supervisord completely loses control over that program, because of the parent-child relationship between supervisord and its child processes.
My question is: how to run celery workers using Monit?