Supervisord sometimes starts Celery, sometimes not - Kubernetes

I'm deploying my Flask API on Kubernetes. When the container starts, the following commands are executed:
supervisord -c /etc/supervisor/conf.d/celery.conf
gunicorn wsgi:app --bind=0.0.0.0:5000 --workers 1 --threads 12 --log-level=warning --access-logfile /var/log/gunicorn-access.log --error-logfile /var/log/gunicorn-error.log
As you can see above, I start Celery via supervisord first and then run the gunicorn server. Content of celery.conf:
[supervisord]
logfile = /tmp/supervisord.log
logfile_maxbytes = 50MB
logfile_backups=10
loglevel = info
pidfile = /tmp/supervisord.pid
nodaemon = false
minfds = 1024
minprocs = 200
umask = 022
identifier = supervisor
directory = /tmp
nocleanup = true
[program:celery]
directory = /mydir/app
command = celery -A celery_worker.celery worker --loglevel=debug
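For reference, the program section can also be extended so that supervisord captures the worker's own output and restarts it if it exits; a sketch (the log path is only an example):
[program:celery]
directory = /mydir/app
command = celery -A celery_worker.celery worker --loglevel=debug
; capture the worker's stdout/stderr so the reason for an exit is visible
stdout_logfile = /var/log/celery-worker.log
redirect_stderr = true
; restart the worker instead of leaving the pod without one
autostart = true
autorestart = true
startsecs = 10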
When I log into my pods, I can see that sometimes Celery starts correctly (example from pod 1):
> more /tmp/supervisord.log
2021-06-08 18:19:46,460 CRIT Supervisor running as root (no user in config file)
2021-06-08 18:19:46,462 INFO daemonizing the supervisord process
2021-06-08 18:19:46,462 INFO set current directory: '/tmp'
2021-06-08 18:19:46,463 INFO supervisord started with pid 9
2021-06-08 18:19:47,469 INFO spawned: 'celery' with pid 15
2021-06-08 18:19:48,470 INFO success: celery entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Sometimes it does not (example from pod 2):
> more /tmp/supervisord.log
2021-06-08 18:19:42,979 CRIT Supervisor running as root (no user in config file)
2021-06-08 18:19:42,988 INFO daemonizing the supervisord process
2021-06-08 18:19:42,988 INFO set current directory: '/tmp'
2021-06-08 18:19:42,989 INFO supervisord started with pid 9
2021-06-08 18:19:43,992 INFO spawned: 'celery' with pid 11
2021-06-08 18:19:44,994 INFO success: celery entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
>>>> 2021-06-08 18:19:58,642 INFO exited: celery (exit status 2; expected) <<<<<HERE
In my pod 1, a ps command shows the following:
> ps aux | grep celery
root 9 0.0 0.0 55308 16376 ? Ss 18:45 0:00 /usr/bin/python /usr/bin/supervisord -c /etc/supervisor/conf.d/celery.conf
root 23 2.2 0.8 2343684 352940 ? S 18:45 0:05 /usr/bin/python3 /usr/local/bin/celery -A celery_worker.celery worker --loglevel=debug
root 37 0.0 0.5 2341860 208716 ? S 18:46 0:00 /usr/bin/python3 /usr/local/bin/celery -A celery_worker.celery worker --loglevel=debug
root 38 0.0 0.5 2341864 208716 ? S 18:46 0:00 /usr/bin/python3 /usr/local/bin/celery -A celery_worker.celery worker --loglevel=debug
root 39 0.0 0.5 2341868 208716 ? S 18:46 0:00 /usr/bin/python3 /usr/local/bin/celery -A celery_worker.celery worker --loglevel=debug
root 40 0.0 0.5 2341872 208724 ? S 18:46 0:00 /usr/bin/python3 /usr/local/bin/celery -A celery_worker.celery worker --loglevel=debug
root 41 0.0 0.5 2341876 208728 ? S 18:46 0:00 /usr/bin/python3 /usr/local/bin/celery -A celery_worker.celery worker --loglevel=debug
root 42 0.0 0.5 2341880 208728 ? S 18:46 0:00 /usr/bin/python3 /usr/local/bin/celery -A celery_worker.celery worker --loglevel=debug
root 43 0.0 0.5 2341884 208736 ? S 18:46 0:00 /usr/bin/python3 /usr/local/bin/celery -A celery_worker.celery worker --loglevel=debug
root 44 0.0 0.5 2342836 211384 ? S 18:46 0:00 /usr/bin/python3 /usr/local/bin/celery -A celery_worker.celery worker --loglevel=debug
In pod 2, the supervisord process is still there, but none of the individual /usr/local/bin/celery worker processes that I have in pod 1 are present:
> ps aux | grep celery
root 9 0.0 0.0 55308 16296 ? Ss 18:19 0:00 /usr/bin/python /usr/bin/supervisord -c /etc/supervisor/conf.d/celery.conf
This behavior is not consistent. Sometimes when the pods are restarted both of them manage to launch Celery, sometimes neither does. In the latter case, if I make a request to my API that is supposed to launch a Celery task, I can see in my broker console (RabbitMQ) that a task is created, but there is no message activity and nothing is written to my database table (the end result of my Celery task).
If I start celery manually in my pods:
celery -A celery_worker.celery worker --loglevel=debug
everything works.
What could explain such a behavior?

Following the comments above, the best solution is to have two containers: the first with gunicorn as its entrypoint and the second with the celery worker command. If the second container is built from the same image as the first, this works very well and I can scale each container independently on Kubernetes. The only drawback is keeping the code in sync: each time I change the code for the first container I have to apply the same changes manually to the second, so maybe there is a better way to address this particular code-sharing issue.
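A minimal sketch of that setup, assuming both workloads are built from one shared application image (the image name, labels and replica counts below are placeholders): two Deployments that differ only in the command they run.
# API Deployment: gunicorn only
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flask-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: flask-api
  template:
    metadata:
      labels:
        app: flask-api
    spec:
      containers:
        - name: api
          image: myregistry/flask-app:latest
          command: ["gunicorn", "wsgi:app", "--bind=0.0.0.0:5000", "--workers", "1", "--threads", "12"]
          ports:
            - containerPort: 5000
---
# Worker Deployment: same image, Celery worker as the only process
apiVersion: apps/v1
kind: Deployment
metadata:
  name: celery-worker
spec:
  replicas: 2
  selector:
    matchLabels:
      app: celery-worker
  template:
    metadata:
      labels:
        app: celery-worker
    spec:
      containers:
        - name: worker
          image: myregistry/flask-app:latest
          command: ["celery", "-A", "celery_worker.celery", "worker", "--loglevel=info"]
Because both Deployments reference the same image tag, a single image build feeds both, and each can still be scaled independently with kubectl scale.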

Related

How to run a process in daemon mode with systemd service?

I've googled and read quite a few blogs and posts on this, and I've also been trying things out manually on my EC2 instance. However, I still can't configure the systemd service unit to run the process in the background as I expect. The process I'm running is the Nessus agent service. Here's my service unit definition:
$ cat /etc/systemd/system/nessusagent.service
[Unit]
Description=Nessus
[Service]
ExecStart=/opt/myorg/bin/init_nessus
Type=simple
[Install]
WantedBy=multi-user.target
and here is my script /opt/myorg/bin/init_nessus:
$ cat /opt/myorg/bin/init_nessus
#!/usr/bin/env bash
set -e
NESSUS_MANAGER_HOST=...
NESSUS_MANAGER_PORT=...
NESSUS_CLIENT_GROUP=...
NESSUS_LINKING_KEY=...
#-------------------------------------------------------------------------------
# link nessus agent with manager host
#-------------------------------------------------------------------------------
/opt/nessus_agent/sbin/nessuscli agent link --key=${NESSUS_LINKING_KEY} --host=${NESSUS_MANAGER_HOST} --port=${NESSUS_MANAGER_PORT} --groups=${NESSUS_CLIENT_GROUP}
if [ $? -ne 0 ]; then
echo "Cannot link the agent to the Nessus manager, quitting."
exit 1
fi
/opt/nessus_agent/sbin/nessus-service -q -D
When I run the service, I always get the following:
$ systemctl status nessusagent.service
● nessusagent.service - Nessus
Loaded: loaded (/etc/systemd/system/nessusagent.service; enabled; vendor preset: enabled)
Active: inactive (dead) since Mon 2020-08-24 06:40:40 UTC; 9min ago
Process: 27787 ExecStart=/opt/myorg/bin/init_nessus (code=exited, status=0/SUCCESS)
Main PID: 27787 (code=exited, status=0/SUCCESS)
...
Aug 24 06:40:40 ip-10-27-0-104 init_nessus[27787]: + /opt/nessus_agent/sbin/nessuscli agent link --key=... --host=... --port=8834 --groups=...
Aug 24 06:40:40 ip-10-27-0-104 init_nessus[27787]: [info] [agent] HostTag::getUnix: setting TAG value to '8596420322084e3ab97d3c39e5c92e00'
Aug 24 06:40:40 ip-10-27-0-104 init_nessus[27787]: [info] [agent] Successfully linked to <myorg.com>:8834
Aug 24 06:40:40 ip-10-27-0-104 init_nessus[27787]: + '[' 0 -ne 0 ']'
Aug 24 06:40:40 ip-10-27-0-104 init_nessus[28506]: + /opt/nessus_agent/sbin/nessus-service -q -D
However, I can't see the process that I expect to see:
$ ps faux | grep nessus
root 28565 0.0 0.0 12940 936 pts/0 S+ 06:54 0:00 \_ grep --color=auto nessus
If I run the last command manually, I can see it:
$ /opt/nessus_agent/sbin/nessus-service -q -D
$ ps faux | grep nessus
root 28959 0.0 0.0 12940 1016 pts/0 S+ 07:00 0:00 \_ grep --color=auto nessus
root 28952 0.0 0.0 6536 116 ? S 07:00 0:00 /opt/nessus_agent/sbin/nessus-service -q -D
root 28953 0.2 0.0 69440 9996 pts/0 Sl 07:00 0:00 \_ nessusd -q
What is it that I'm missing here?
I eventually figured out that this was because of the extra -D option in the last command; removing it fixed the issue. Running the process in daemon mode inside a service manager is not the way to go: the process needs to run in the foreground and let the service manager handle it.
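Concretely, the last line of the init script becomes a plain foreground invocation (exec is optional, but it replaces the wrapper shell so the unit's main PID is nessus-service itself):
# run nessus-service in the foreground and let systemd supervise it
exec /opt/nessus_agent/sbin/nessus-service -q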

Why does the host's process list contain a Kubernetes pod's process?

When I list the processes on the host using this command:
[root@fat001 ~]# ps -o user,pid,pidns,%cpu,%mem,vsz,rss,tty,stat,start,time,args ax|grep "room"
root 3488 4026531836 0.0 0.0 107992 644 pts/11 S+ 20:06:01 00:00:00 tail -n 200 -f /data/logs/soa-room/spring.log
root 18114 4026534329 8.5 2.2 5721560 370032 ? Sl 23:17:51 00:01:53 java -jar /root/soa-room-service-1.0.0-SNAPSHOT.jar
root 19107 4026531836 0.0 0.0 107992 616 pts/8 S+ 19:14:10 00:00:00 tail -f -n 200 /data/logs/soa-room/spring.log
root 23264 4026531836 0.0 0.0 112684 1000 pts/13 S+ 23:39:57 00:00:00 grep --color=auto room
root 30416 4026531836 3.4 3.4 4122552 567232 ? Sl 19:52:03 00:07:53 /opt/dabai/tools/jdk1.8.0_211/bin/java -Xmx256M -Xms128M -jar -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=5011 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/dump /data/jenkins/soa-room-service/soa-room-service-1.0.0-SNAPSHOT.jar
I am very sure that this process belongs to a Kubernetes pod:
root 18114 4026534329 8.5 2.2 5721560 370032 ? Sl 23:17:51 00:01:53 java -jar /root/soa-room-service-1.0.0-SNAPSHOT.jar
Why does the Kubernetes container's process show up on the host? It should be inside the Docker container!
This is perfectly normal. Containers are not VMs.
Every process run by Docker runs on the host kernel; there is no isolation at the kernel level.
There is, of course, isolation between containers at the process level, since each container's processes run in an isolated PID namespace.
In summary: container A can't see container B's processes (not by default, anyway), but since all container processes run on your host, you will always be able to see them from the host.
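You can confirm this from the host by comparing PID namespaces; a quick sketch using the PID from the listing above (the outputs shown are examples):
# PID 1 on the host and the pod's java process live in different PID namespaces
readlink /proc/1/ns/pid        # pid:[4026531836]  <- host namespace
readlink /proc/18114/ns/pid    # pid:[4026534329]  <- container namespace
# the cgroup path shows which container/pod the process belongs to
cat /proc/18114/cgroup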

supervisorctl 3.3.1 not working, complaining about not finding .conf file

root@dev-demo-karl:~# supervisord -v
3.3.1
I'm getting the following error when trying to access supervisorctl:
Error: .ini file does not include supervisorctl section
For help, use /usr/bin/supervisorctl -h
Supervisor does not seem to be using the configuration file:
root@dev-demo-karl:/srv/www# /usr/bin/supervisorctl
Error: .ini file does not include supervisorctl section
For help, use /usr/bin/supervisorctl -h
root@dev-demo-karl:/srv/www# cd /etc/
root@dev-demo-karl:/etc# cat supervisor
supervisor/ supervisord/ supervisord.conf
root@dev-demo-karl:/etc# ls supervisord/conf.d
supervisord.conf
root@dev-demo-karl:/etc# ls supervisor/conf.d
supervisord.conf
root@dev-demo-karl:/etc# ls supervisord
conf.d supervisord.conf
root@dev-demo-karl:/etc# ls supervisor
conf.d supervisord.conf
All the supervisord.conf files have the following:
root@dev-demo-karl:/etc# cat supervisord.conf
[supervisord]
nodaemon=true
[program:node]
directory=/srv/www
command=npm run demo
autostart=true
autorestart=true
[program:mongod]
command=/usr/bin/mongod --auth --fork --smallfiles --logpath /var/log/mongodb.log
I KNOW supervisord is finding one of them because the services are up:
root@dev-demo-karl:~# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.8 47624 17744 ? Ss 21:03 0:00 /usr/bin/python /usr/bin/supervisord
root 8 0.1 2.4 1003400 49580 ? Sl 21:03 0:00 npm
root 16 0.6 2.3 295224 48192 ? Sl 21:03 0:03 /usr/bin/mongod --auth --fork --smallfiles --logpath /var/log/mongodb.log
root 40 0.0 0.0 4512 844 ? S 21:03 0:00 sh -c npm run prod
root 41 0.1 2.4 1003412 49584 ? Sl 21:03 0:00 npm
root 52 0.0 0.0 4512 712 ? S 21:03 0:00 sh -c NODE_ENV=production NODE_PATH="$(pwd)" node src/index.js
root 54 0.4 8.1 1068568 166080 ? Sl 21:03 0:02 node src/index.js
root 79 0.0 0.1 18240 3248 ? Ss+ 21:04 0:00 bash
root 238 0.0 0.1 18248 3248 ? Ss 21:06 0:00 bash
root 501 0.0 0.1 34424 2884 ? R+ 21:12 0:00 ps aux
Why isn't supervisorctl working?
And last:
root@dev-demo-karl:~# cat /etc/supervisord.conf
[supervisord]
nodaemon=true
[program:node]
directory=/srv/www
command=npm run demo
autostart=true
autorestart=true
[program:mongod]
command=/usr/bin/mongod --auth --fork --smallfiles --logpath /var/log/mongodb.log
root@dev-demo-karl:~# supervisorctl -c /etc/supervisord.conf
Error: .ini file does not include supervisorctl section
For help, use /usr/bin/supervisorctl -h
Anyone know what I'm doing wrong? I'm starting it in a Docker container via this command:
CMD ["/usr/bin/supervisord"]
First, how to start supervisord:
# Start server
supervisord -c /path/to/supervisor.conf
# Then use client
supervisorctl -c /path/to/supervisor.conf status
Second, a typical configuration (works for me with supervisord -v -> 4.1.0):
[inet_http_server]
port = 127.0.0.1:9001
[supervisorctl]
serverurl = http://127.0.0.1:9001
[program:<your_program_name>]
or
[unix_http_server]
file=/tmp/supervisor.sock
[supervisord]
nodaemon=true
logfile=/var/log/supervisor/supervisord.log
pidfile=/var/run/supervisord.pid
childlogdir=/var/log/supervisor
[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface
[supervisorctl]
serverurl=unix:///tmp/supervisor.sock
[program:<your_program_name>]
...
P.S. I answered this post because it comes up first in a Google search.
Supervisor version 3.3.1 added a few more required sections; see http://supervisord.org/configuration.html.
I added them and the configuration is now being read!
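Putting this together for the configuration in the question, a minimal supervisord.conf that supervisorctl accepts could look like this (the socket path is only an example):
[unix_http_server]
file=/tmp/supervisor.sock

[supervisord]
nodaemon=true

[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

[supervisorctl]
serverurl=unix:///tmp/supervisor.sock

[program:node]
directory=/srv/www
command=npm run demo
autostart=true
autorestart=true

[program:mongod]
command=/usr/bin/mongod --auth --fork --smallfiles --logpath /var/log/mongodb.log
Start and query it with the same file, e.g. supervisord -c /etc/supervisord.conf followed by supervisorctl -c /etc/supervisord.conf status.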

Celery doesn't restart subprocesses

I have an issue with my Celery deployment: when I restart it, old subprocesses don't stop and keep processing some of the jobs. I use supervisord to run Celery. Here is my config:
$ cat /etc/supervisor/conf.d/celery.conf
[program:celery]
; Full path to use virtualenv, honcho to load .env
command=/home/ubuntu/venv/bin/honcho run celery -A stargeo worker -l info --no-color
directory=/home/ubuntu/app
environment=PATH="/home/ubuntu/venv/bin:%(ENV_PATH)s"
user=ubuntu
numprocs=1
stdout_logfile=/home/ubuntu/logs/celery.log
stderr_logfile=/home/ubuntu/logs/celery.err
autostart=true
autorestart=true
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600
; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true
; if rabbitmq is supervised, set its priority higher
; so it starts first
priority=998
Here is how celery processes look:
$ ps axwu | grep celery
ubuntu 983 0.0 0.1 47692 10064 ? S 11:47 0:00 /home/ubuntu/venv/bin/python /home/ubuntu/venv/bin/honcho run celery -A stargeo worker -l info --no-color
ubuntu 984 0.0 0.0 4440 652 ? S 11:47 0:00 /bin/sh -c celery -A stargeo worker -l info --no-color
ubuntu 985 0.0 0.5 168720 41356 ? S 11:47 0:01 /home/ubuntu/venv/bin/python /home/ubuntu/venv/bin/celery -A stargeo worker -l info --no-color
ubuntu 990 0.0 0.4 167936 36648 ? S 11:47 0:00 /home/ubuntu/venv/bin/python /home/ubuntu/venv/bin/celery -A stargeo worker -l info --no-color
ubuntu 991 0.0 0.4 167936 36648 ? S 11:47 0:00 /home/ubuntu/venv/bin/python /home/ubuntu/venv/bin/celery -A stargeo worker -l info --no-color
When I run sudo supervisorctl restart celery, it only stops the first process (the python ... honcho one) and all the others keep running. If I try to kill them they keep running too (only kill -9 works).
This turned out to be a bug in honcho. I ended up with a workaround: starting this script from supervisord instead:
#!/bin/bash
source /home/ubuntu/venv/bin/activate
exec env $(cat .env | grep -v ^# | xargs) \
celery -A stargeo worker -l info --no-color
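The corresponding supervisord entry then just runs the wrapper script instead of honcho (the script path is an assumption; remember to make it executable; the other options are the same as before):
[program:celery]
; run the wrapper script that loads .env and execs celery
command=/home/ubuntu/app/run_celery.sh
directory=/home/ubuntu/app
user=ubuntu
stdout_logfile=/home/ubuntu/logs/celery.log
stderr_logfile=/home/ubuntu/logs/celery.err
autostart=true
autorestart=true
startsecs=10
stopwaitsecs=600
killasgroup=true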

Correct way to start mysqld_safe

I've been searching around a lot but could not figure out how to start mysqld in "safe mode".
This is what I got so far:
[root@localhost bin]# service mysqld_safe start
mysqld_safe: unrecognized service
I'm running CentOS; this is my MySQL version:
[root@localhost ~]# mysql --version
mysql Ver 14.12 Distrib 5.0.95, for redhat-linux-gnu (i686) using readline 5.1
Any help would be appreciated!
Starting mysqld should do the trick:
[root@green-penny ~]# service mysqld start
Starting mysqld: [ OK ]
[root@green-penny ~]# ps axu | grep mysql
root 7540 0.8 0.0 5112 1380 pts/0 S 09:29 0:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --socket=/var/lib/mysql/mysql.sock --pid-file=/var/run/mysqld/mysqld.pid --basedir=/usr --user=mysql
mysql 7642 1.5 0.7 135480 15344 pts/0 Sl 09:29 0:00 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
root 7660 0.0 0.0 4352 724 pts/0 S+ 09:29 0:00 grep mysql
(Note that mysqld_safe is running.)
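If you ever need to run it by hand, for example for password recovery with --skip-grant-tables, stop the service first and call the wrapper directly; a sketch:
# stop the normally supervised instance
service mysqld stop
# start the server through the mysqld_safe wrapper in the background
/usr/bin/mysqld_safe --skip-grant-tables &
# ... perform the maintenance, then shut it down and return to the normal service
mysqladmin shutdown
service mysqld start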