Multiple celerycam processes on the same server - celery

I have two celerycam processes configured for running under supervisor. Here is part of my supervisord.conf:
[program:dev1_celerycam]
directory = /var/www/dev1.example.com
command = /usr/bin/python2.7 /var/www/dev1.example.com/manage.py celerycam --logfile=/var/log/supervisor/dev1_celerycam.log --workdir=/var/www/dev1.example.com
stderr_logfile = /var/log/supervisor/dev1_celerycam_error.log
stdout_logfile = /var/log/supervisor/dev1_celerycam.log
exitcodes=0,2
priority=993
[program:dev_celerycam]
directory = /var/www/dev.example.com
command = /usr/bin/python2.7 /var/www/dev.example.com/manage.py celerycam --logfile=/var/log/supervisor/dev_celerycam.log --workdir=/var/www/dev.example.com
stderr_logfile = /var/log/supervisor/dev_celerycam_error.log
stdout_logfile = /var/log/supervisor/dev_celerycam.log
exitcodes=0,2
priority=995
I also have two celeryd processes in supervisord.conf. They start perfectly fine on the same server. But for one of the celerycam processes I get the following in supervisord.log:
2013-09-01 09:35:12,546 INFO exited: dev_celerycam (exit status 1; not expected)
2013-09-01 09:35:12,546 INFO received SIGCLD indicating a child quit
2013-09-01 09:35:15,555 INFO spawned: 'dev_celerycam' with pid 25504
2013-09-01 09:35:16,540 INFO exited: dev_celerycam (exit status 1; not expected)
2013-09-01 09:35:16,540 INFO received SIGCLD indicating a child quit
2013-09-01 09:35:17,542 INFO gave up: dev_celerycam entered FATAL state, too many start retries too quickly
This occurs for dev_celerycam or dev1_celerycam on supervisord restart: one of them starts fine while the other one fails, and which one fails appears to be random.
Is there any chance to get both celerycam processes working?

Both celerycam processes were somehow creating their pid file at the same path. I had to add a --pidfile parameter for each of the celerycam processes.
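For reference, this is roughly what the fixed commands might look like; the pidfile paths below are just examples, the important part is that each process gets its own pidfile in a directory it can write to:
command = /usr/bin/python2.7 /var/www/dev1.example.com/manage.py celerycam --logfile=/var/log/supervisor/dev1_celerycam.log --workdir=/var/www/dev1.example.com --pidfile=/var/run/dev1_celerycam.pid
command = /usr/bin/python2.7 /var/www/dev.example.com/manage.py celerycam --logfile=/var/log/supervisor/dev_celerycam.log --workdir=/var/www/dev.example.com --pidfile=/var/run/dev_celerycam.pid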

Related

server isn't responding when started with procmon

I'm using boofuzz 0.1.6 on an Ubuntu machine. I'm trying to get process_monitor_unix to connect to the server program I want to fuzz. When I start procmon and my script, I get the following output from procmon:
[05:47.20] Process Monitor PED-RPC server initialized:
[05:47.20] listening on: 0.0.0.0:26002
[05:47.20] crash file: /home/rico/PycharmProjects/iec104_server_fuzz/boofuzz-crash-bin
[05:47.20] # records: 0
[05:47.20] proc name: None
[05:47.20] log level: 1
[05:47.20] awaiting requests...
[05:47.24] updating target process name to './simple_server'
[05:47.24] updating stop commands to: [u'kill -SIGINT $(pidof simple_server)']
[05:47.24] updating start commands to: [u'/home/rico/iec60870/lib60870-master/lib60870-C/examples/cs104_server/simple_server']
[05:47.24] updating crash bin filename to 'boofuzz-crash-bin-2020-03-19T16-47-24'
[05:47.24] Starting target...
[05:47.24] starting target process
[05:47.24] done. waiting for start command to terminate.
APCI parameters:
t0: 10
t1: 15
t2: 10
t3: 20
k: 12
w: 8
The output "APCI parameters ..." is a message of the server which is send everytime the server is started. Therefore I think it's up and running. My problem is that it isn't responding to incoming tcp-packages.
The output of my fuzzscript is the following:
[2020-03-19 17:47:24,314] Info: Web interface can be found at http://localhost:26000
[2020-03-19 17:47:24,316] Test Case: 1: activate->s_formatAPDU.no-name.1
[2020-03-19 17:47:24,316] Info: Type: Bytes. Default value: b'\x91\xef\xa5'. Case 1 of 270 overall.
[2020-03-19 17:47:24,316] Test Step: Calling procmon pre_send()
It gets stuck at this test step.
When I start the server first, then procmon, then the fuzzscript, I get the following error:
[10:29.51] Process Monitor PED-RPC server initialized:
[10:29.51] listening on: 0.0.0.0:26002
[10:29.51] crash file: /home/rico/PycharmProjects/iec104_server_fuzz/boofuzz-crash-bin
[10:29.51] # records: 0
[10:29.51] proc name: None
[10:29.51] log level: 1
[10:29.51] awaiting requests...
[10:29.55] updating target process name to './simple_server'
[10:29.55] updating stop commands to: [u'kill -SIGINT $(pidof simple_server)']
[10:29.55] updating start commands to: [u'/home/rico/iec60870/lib60870-master/lib60870-C/examples/cs104_server/simple_server']
[10:29.55] updating crash bin filename to 'boofuzz-crash-bin-2020-03-19T21-29-55'
[10:29.55] Starting target...
[10:29.55] starting target process
[10:29.55] done. waiting for start command to terminate.
APCI parameters:
t0: 10
t1: 15
t2: 10
t3: 20
k: 12
w: 8
Starting server failed!
[10:29.56] searching for process by name "./simple_server"
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/home/rico/.local/lib/python2.7/site-packages/boofuzz/utils/debugger_thread_simple.py", line 130, in run
self.spawn_target()
File "/home/rico/.local/lib/python2.7/site-packages/boofuzz/utils/debugger_thread_simple.py", line 115, in spawn_target
self.watch()
File "/home/rico/.local/lib/python2.7/site-packages/boofuzz/utils/debugger_thread_simple.py", line 166, in watch
for (pid, name) in _enumerate_processes():
File "/home/rico/.local/lib/python2.7/site-packages/boofuzz/utils/debugger_thread_simple.py", line 36, in _enumerate_processes
yield (pid, psutil.Process(pid).name())
File "/home/rico/.local/lib/python2.7/site-packages/psutil/__init__.py", line 346, in __init__
self._init(pid)
File "/home/rico/.local/lib/python2.7/site-packages/psutil/__init__.py", line 386, in _init
raise NoSuchProcess(pid, None, msg)
NoSuchProcess: psutil.NoSuchProcess no process found with pid 21574
Now this seems strange to me, because pid 21574 isn't the pid of the running server process. Does someone know more about this? Even wild guesses are appreciated!
If you need any other info as well, I will gladly provide it.
I fixed the error by deleting the line
"proc_name": '/home/rico/iec60870/lib60870-master/lib60870-C/examples/cs104_server/simple_server'
from my fuzzscript. I also had to make sure that the server is !not! already running when I start my fuzzscript.
Now, the server starts in the terminal which runs procmon.
I don't know if there is a better way to fix this, but at least procmon can do its job now.
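For comparison, a rough sketch of the relevant part of the fuzzscript after the fix. The option names are the ones procmon echoes in the log above; the exact Session/Target wiring depends on your boofuzz version and is omitted here, so treat this purely as an illustration:
# Hypothetical fragment of the fuzzscript: options handed to the process
# monitor, with the "proc_name" entry removed as described above.
procmon_options = {
    "start_commands": [
        "/home/rico/iec60870/lib60870-master/lib60870-C/examples/cs104_server/simple_server"
    ],
    "stop_commands": ["kill -SIGINT $(pidof simple_server)"],
    # "proc_name": "./simple_server",  # deleting this entry fixed the NoSuchProcess error for me
}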

How to start Cloud SQL proxy with supervisor

I tried to start the Cloud SQL proxy under supervisor, but I have no idea what is wrong with it. The documentation does not offer any clues about this issue. Any ideas would be much appreciated.
I tried the setup on a clean Ubuntu 16 machine, then installed supervisor and downloaded cloud_sql_proxy. I put the files under /root and run everything as root for debugging.
Here is my current setup:
/etc/supervisord.conf
[unix_http_server]
file=/tmp/supervisor.sock ; the path to the socket file
chmod=0766 ; socket file mode (default 0700)
[supervisord]
logfile=/tmp/supervisord.log ; main log file; default $CWD/supervisord.log
logfile_maxbytes=50MB ; max main logfile bytes b4 rotation; default 50MB
logfile_backups=10 ; # of main logfile backups; 0 means none, default 10
loglevel=info ; log level; default info; others: debug,warn,trace
pidfile=/tmp/supervisord.pid ; supervisord pidfile; default supervisord.pid
nodaemon=false ; start in foreground if true; default false
minfds=1024 ; min. avail startup file descriptors; default 1024
minprocs=200 ; min. avail process descriptors;default 200
[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface
[supervisorctl]
serverurl=unix:///tmp/supervisor.sock ; use a unix:// URL for a unix socket
[include]
files = /etc/supervisor/conf.d/*.conf
/etc/supervisor/conf.d/cloud_sql_proxy.conf
[program:cloud_sql_proxy]
command=/root/cloud_sql_proxy -dir=/cloudsql -instances="project_id:us-central1:instance-name" -credential_file="/root/service-account.json"
autostart=true
autorestart=true
startretries=1
startsecs=8
stdout_logfile=/var/log/cloud_sql_proxy-stdout.log
stderr_logfile=/var/log/cloud_sql_proxy-stderr.log
I got the following error after inspecting /tmp/supervisord.log:
2018-10-14 15:49:49,984 INFO spawned: 'cloud_sql_proxy' with pid 3569
2018-10-14 15:49:49,989 INFO exited: cloud_sql_proxy (exit status 0; not expected)
2018-10-14 15:49:50,991 INFO spawned: 'cloud_sql_proxy' with pid 3574
2018-10-14 15:49:50,996 INFO exited: cloud_sql_proxy (exit status 0; not expected)
2018-10-14 15:49:51,998 INFO gave up: cloud_sql_proxy entered FATAL state, too many start retries too quickly
2018-10-14 15:51:46,981 INFO spawned: 'cloud_sql_proxy' with pid 3591
2018-10-14 15:51:46,986 INFO exited: cloud_sql_proxy (exit status 0; not expected)
2018-10-14 15:51:47,989 INFO spawned: 'cloud_sql_proxy' with pid 3596
2018-10-14 15:51:47,998 INFO exited: cloud_sql_proxy (exit status 0; not expected)
2018-10-14 15:51:47,999 INFO gave up: cloud_sql_proxy entered FATAL state, too many start retries too quickly
Finally I managed to figure out a working solution, and here it is:
Create a new file /root/start_cloud_sql_proxy.sh:
#!/bin/bash
/root/cloud_sql_proxy -dir=/cloudsql -instances="project_id:us-central1:instance-name" -credential_file="/root/service-account.json"
Under /etc/supervisor/conf.d/cloud_sql_proxy.conf, change the command to execute a bash file:
command=/root/start_cloud_sql_proxy.sh
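For completeness, the resulting program block then looks something like this (same options as before, only the command now points at the wrapper script; also make sure the script is executable, e.g. chmod +x /root/start_cloud_sql_proxy.sh):
[program:cloud_sql_proxy]
command=/root/start_cloud_sql_proxy.sh
autostart=true
autorestart=true
startretries=1
startsecs=8
stdout_logfile=/var/log/cloud_sql_proxy-stdout.log
stderr_logfile=/var/log/cloud_sql_proxy-stderr.log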

Starting Celery with supervisord: AttributeError: 'module' object has no attribute 'celery'

I used to have all my Flask app code and celery code in one file, and it worked fine with supervisor. However, it got very hairy, so I split my tasks out into celery_tasks.py, and now this problem occurs.
In my project directory, I can start celery manually with the following command:
celery -A celery_tasks worker --loglevel=INFO
However, because this is a server, I need celery to run as a daemon in background.
But it shows the following error when I call sudo supervisorctl restart celeryd:
celeryd: ERROR (abnormal termination)
and the log said:
Traceback (most recent call last):
File "/srv/www/learningapi.stanford.edu/peerAPI/peerAPIenv/bin/celery", line 9, in <module>
load_entry_point('celery==3.0.19', 'console_scripts', 'celery')()
File "/srv/www/learningapi.stanford.edu/peerAPI/peerAPIenv/local/lib/python2.7/site-packages/celery/__main__.py", line 14, in main
main()
File "/srv/www/learningapi.stanford.edu/peerAPI/peerAPIenv/local/lib/python2.7/site-packages/celery/bin/celery.py", line 957, in main
cmd.execute_from_commandline(argv)
File "/srv/www/learningapi.stanford.edu/peerAPI/peerAPIenv/local/lib/python2.7/site-packages/celery/bin/celery.py", line 901, in execute_from_commandline
super(CeleryCommand, self).execute_from_commandline(argv)))
File "/srv/www/learningapi.stanford.edu/peerAPI/peerAPIenv/local/lib/python2.7/site-packages/celery/bin/base.py", line 185, in execute_from_commandline
argv = self.setup_app_from_commandline(argv)
File "/srv/www/learningapi.stanford.edu/peerAPI/peerAPIenv/local/lib/python2.7/site-packages/celery/bin/base.py", line 300, in setup_app_from_commandline
self.app = self.find_app(app)
File "/srv/www/learningapi.stanford.edu/peerAPI/peerAPIenv/local/lib/python2.7/site-packages/celery/bin/base.py", line 318, in find_app
return sym.celery
AttributeError: 'module' object has no attribute 'celery'
I used the following config.
[program:celeryd]
command = celery -A celery_tasks worker --loglevel=INFO
user=peerapi
numprocs=4
stdout_logfile = <path to log>
stderr_logfile = <path to log>
autostart = true
autorestart = true
environment=PATH="<path to my project>"
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600
; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true
; if rabbitmq is supervised, set its priority higher
; so it starts first
priority=998
My code also initializes celery properly:
celery = Celery('celery_tasks', broker='amqp://guest:guest@localhost:5672//',
                backend='amqp')
celery.config_from_object(celeryconfig)
and my celeryconfig.py is working normally
CELERY_TASK_SERIALIZER='json'
CELERY_RESULT_SERIALIZER='json'
CELERY_TIMEZONE='America/Los_Angeles'
CELERY_ENABLE_UTC=True
Any clue?
Looks like your application can't find your celeryconfig; this happens, for example, when your CWD is not set. Try something like:
cd app_path; celeryd ...
You also need to set up the environment:
# local settings
PATH=/home/ubuntu/envs/app/bin:$PATH
PYTHONHOME=/home/ubuntu/envs/app/
PYTHONPATH=/home/ubuntu/projects/app/
Should work.
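In supervisord terms that advice translates to something like the block below; the paths are placeholders for your project directory and virtualenv, so adjust them to your layout (PYTHONHOME is left out here since the command already points at the virtualenv's celery binary):
[program:celeryd]
; directory makes supervisord cd into the project before starting celery,
; so celery_tasks.py can be found on the default module search path
directory=/path/to/your/project
command=/path/to/your/virtualenv/bin/celery -A celery_tasks worker --loglevel=INFO
environment=PATH="/path/to/your/virtualenv/bin",PYTHONPATH="/path/to/your/project"
user=peerapi
autostart=true
autorestart=true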

sun gridengine error "shepherd of job 119232.1 exited with exit status = 26"

We use gridengine (more exactly, Open Grid Scheduler 2011.11.p1) as our batch-queuing system. I just added an execd host named host094, but when jobs are submitted there, errors are issued and the job status is Eqw. The log in $SGE_ROOT/default/spool/host094/messages says:
shepherd of job 119232.1 exited with exit status = 26
can't open usage file active_jobs/119232.1/usage for job 119232.1: No such file or directory
What does this mean?

Dotcloud supervisord shows error but process is running

My dotcloud setup (django-celery with rabbitmq) was working fine a week ago - the processes were starting up ok and the logs were clean. However, I recently repushed (without updating any of the code), and now the logs are saying that the processes fail to start even though they seem to be running.
Supervisord log
dotcloud@hack-default-www-0:/var/log/supervisor$ more supervisord.log
2012-06-03 10:51:51,836 CRIT Set uid to user 1000
2012-06-03 10:51:51,836 WARN Included extra file "/etc/supervisor/conf.d/uwsgi.conf" during parsing
2012-06-03 10:51:51,836 WARN Included extra file "/home/dotcloud/current/supervisord.conf" during parsing
2012-06-03 10:51:51,938 INFO RPC interface 'supervisor' initialized
2012-06-03 10:51:51,938 WARN cElementTree not installed, using slower XML parser for XML-RPC
2012-06-03 10:51:51,938 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2012-06-03 10:51:51,946 INFO daemonizing the supervisord process
2012-06-03 10:51:51,947 INFO supervisord started with pid 144
2012-06-03 10:51:53,128 INFO spawned: 'celerycam' with pid 159
2012-06-03 10:51:53,133 INFO spawned: 'apnsd' with pid 161
2012-06-03 10:51:53,148 INFO spawned: 'djcelery' with pid 164
2012-06-03 10:51:53,168 INFO spawned: 'uwsgi' with pid 167
2012-06-03 10:51:53,245 INFO exited: djcelery (exit status 1; not expected)
2012-06-03 10:51:53,247 INFO exited: celerycam (exit status 1; not expected)
2012-06-03 10:51:54,698 INFO spawned: 'celerycam' with pid 176
2012-06-03 10:51:54,698 INFO success: apnsd entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2012-06-03 10:51:54,705 INFO spawned: 'djcelery' with pid 177
2012-06-03 10:51:54,706 INFO success: uwsgi entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2012-06-03 10:51:54,731 INFO exited: djcelery (exit status 1; not expected)
2012-06-03 10:51:54,754 INFO exited: celerycam (exit status 1; not expected)
2012-06-03 10:51:56,760 INFO spawned: 'celerycam' with pid 178
2012-06-03 10:51:56,765 INFO spawned: 'djcelery' with pid 179
2012-06-03 10:51:56,790 INFO exited: celerycam (exit status 1; not expected)
2012-06-03 10:51:56,791 INFO exited: djcelery (exit status 1; not expected)
2012-06-03 10:51:59,798 INFO spawned: 'celerycam' with pid 180
2012-06-03 10:52:00,538 INFO spawned: 'djcelery' with pid 181
2012-06-03 10:52:00,565 INFO exited: celerycam (exit status 1; not expected)
2012-06-03 10:52:00,571 INFO gave up: celerycam entered FATAL state, too many start retries too quickly
2012-06-03 10:52:00,573 INFO exited: djcelery (exit status 1; not expected)
2012-06-03 10:52:01,575 INFO gave up: djcelery entered FATAL state, too many start retries too quickly
dotcloud@hack-default-www-0:/var/log/supervisor$
The djerror log:
dotcloud@hack-default-www-0:/var/log/supervisor$ more djcelery_error.log
Traceback (most recent call last):
File "hack/manage.py", line 2, in
from django.core.management import execute_manager
ImportError: No module named django.core.management
Traceback (most recent call last):
File "hack/manage.py", line 2, in
from django.core.management import execute_manager
ImportError: No module named django.core.management
Traceback (most recent call last):
File "hack/manage.py", line 2, in
from django.core.management import execute_manager
ImportError: No module named django.core.management
Traceback (most recent call last):
File "hack/manage.py", line 2, in
from django.core.management import execute_manager
ImportError: No module named django.core.management
dotcloud#hack-default-www-0:/var/log/supervisor$
supervisorctl status shows that the processes are running, but the pids are different. Also, the celery functionality seems to be working OK: messages are processed, and I can see the messages being processed in the Django admin interface (django-celery's celerycam is running).
# supervisorctl status
apnsd RUNNING pid 225, uptime 0:00:44
celerycam RUNNING pid 224, uptime 0:00:44
djcelery RUNNING pid 226, uptime 0:00:44
Supervisord.conf file:
[program:djcelery]
directory = /home/dotcloud/current/
command = python hack/manage.py celeryd -E -l info -c 2
stderr_logfile = /var/log/supervisor/%(program_name)s_error.log
stdout_logfile = /var/log/supervisor/%(program_name)s.log
[program:celerycam]
directory = /home/dotcloud/current/
command = python hack/manage.py celerycam
stderr_logfile = /var/log/supervisor/%(program_name)s_error.log
stdout_logfile = /var/log/supervisor/%(program_name)s.log
http://jefurii.cafejosti.net/blog/2011/01/26/celery-in-virtualenv-with-supervisord/ suggests that the problem may be that the wrong python binary is being used, so I've explicitly specified the python in the supervisord file. It now works, but that doesn't explain what I'm seeing above, or why I had to change my configuration when it was working fine last week.
Also, not all of the pids are lining up:
2012-06-03 11:19:03,045 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2012-06-03 11:19:03,051 INFO daemonizing the supervisord process
2012-06-03 11:19:03,052 INFO supervisord started with pid 144
2012-06-03 11:19:04,061 INFO spawned: 'celerycam' with pid 151
2012-06-03 11:19:04,066 INFO spawned: 'apnsd' with pid 153
2012-06-03 11:19:04,085 INFO spawned: 'djcelery' with pid 155
2012-06-03 11:19:04,104 INFO spawned: 'uwsgi' with pid 156
2012-06-03 11:19:05,271 INFO success: celerycam entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2012-06-03 11:19:05,271 INFO success: apnsd entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2012-06-03 11:19:05,271 INFO success: djcelery entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2012-06-03 11:19:05,271 INFO success: uwsgi entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
The status output shows that the celerycam pids aren't lining up:
# supervisorctl status
apnsd RUNNING pid 153, uptime 0:06:17
celerycam RUNNING pid 150, uptime 0:06:17
djcelery RUNNING pid 155, uptime 0:06:17
My first guess is that you're using the wrong python binary (the system python instead of the virtualenv python), and that it is causing the error below because the system python doesn't have that package installed.
Traceback (most recent call last):
File "hack/manage.py", line 2, in
from django.core.management import execute_manager
ImportError: No module named django.core.management
You should change your supervisord.conf to the following to make sure you are pointing to the correct python version.
[program:djcelery]
directory = /home/dotcloud/current/
command = /home/dotcloud/env/bin/python hack/manage.py celeryd -E -l info -c 2
stderr_logfile = /var/log/supervisor/%(program_name)s_error.log
stdout_logfile = /var/log/supervisor/%(program_name)s.log
[program:celerycam]
directory = /home/dotcloud/current/
command = /home/dotcloud/env/bin/python hack/manage.py celerycam
stderr_logfile = /var/log/supervisor/%(program_name)s_error.log
stdout_logfile = /var/log/supervisor/%(program_name)s.log
The python path went from python to /home/dotcloud/env/bin/python.
I'm not sure why supervisor says the processes are running when they are not, but hopefully this one little change will clear up your errors and get everything back to working again.