I am trying to run the Cloud SQL proxy under supervisor, but I have no idea what is wrong with my setup. The documentation gives no clues about this issue. Any ideas would be much appreciated.
I tried the setup on a clean Ubuntu 16 install, then installed supervisor and downloaded cloud_sql_proxy. For debugging, I put the files under /root and run everything as root.
Here is my current setup:
/etc/supervisord.conf
[unix_http_server]
file=/tmp/supervisor.sock ; the path to the socket file
chmod=0766 ; socket file mode (default 0700)
[supervisord]
logfile=/tmp/supervisord.log ; main log file; default $CWD/supervisord.log
logfile_maxbytes=50MB ; max main logfile bytes b4 rotation; default 50MB
logfile_backups=10 ; # of main logfile backups; 0 means none, default 10
loglevel=info ; log level; default info; others: debug,warn,trace
pidfile=/tmp/supervisord.pid ; supervisord pidfile; default supervisord.pid
nodaemon=false ; start in foreground if true; default false
minfds=1024 ; min. avail startup file descriptors; default 1024
minprocs=200 ; min. avail process descriptors;default 200
[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface
[supervisorctl]
serverurl=unix:///tmp/supervisor.sock ; use a unix:// URL for a unix socket
[include]
files = /etc/supervisor/conf.d/*.conf
/etc/supervisor/conf.d/cloud_sql_proxy.conf
[program:cloud_sql_proxy]
command=/root/cloud_sql_proxy -dir=/cloudsql -instances="project_id:us-central1:instance-name" -credential_file="/root/service-account.json"
autostart=true
autorestart=true
startretries=1
startsecs=8
stdout_logfile=/var/log/cloud_sql_proxy-stdout.log
stderr_logfile=/var/log/cloud_sql_proxy-stderr.log
I got the following error after inspecting /tmp/supervisord.log:
2018-10-14 15:49:49,984 INFO spawned: 'cloud_sql_proxy' with pid 3569
2018-10-14 15:49:49,989 INFO exited: cloud_sql_proxy (exit status 0; not expected)
2018-10-14 15:49:50,991 INFO spawned: 'cloud_sql_proxy' with pid 3574
2018-10-14 15:49:50,996 INFO exited: cloud_sql_proxy (exit status 0; not expected)
2018-10-14 15:49:51,998 INFO gave up: cloud_sql_proxy entered FATAL state, too many start retries too quickly
2018-10-14 15:51:46,981 INFO spawned: 'cloud_sql_proxy' with pid 3591
2018-10-14 15:51:46,986 INFO exited: cloud_sql_proxy (exit status 0; not expected)
2018-10-14 15:51:47,989 INFO spawned: 'cloud_sql_proxy' with pid 3596
2018-10-14 15:51:47,998 INFO exited: cloud_sql_proxy (exit status 0; not expected)
2018-10-14 15:51:47,999 INFO gave up: cloud_sql_proxy entered FATAL state, too many start retries too quickly
Finally I managed to figure out a working solution; here it is:
Create a new file /root/start_cloud_sql_proxy.sh:
#!/bin/bash
/root/cloud_sql_proxy -dir=/cloudsql -instances="project_id:us-central1:instance-name" -credential_file="/root/service-account.json"
In /etc/supervisor/conf.d/cloud_sql_proxy.conf, change the command to execute the bash script instead:
command=/root/start_cloud_sql_proxy.sh
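One detail worth spelling out (my assumption, not something stated in the original answer): the wrapper script must be executable, and supervisor needs to re-read the changed configuration before it will manage the program. For example:
chmod +x /root/start_cloud_sql_proxy.sh
supervisorctl reread
supervisorctl update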
I am running rsyslogd 8.24.0 with a local logfile.
I have a test which runs a program that does some syslog logging (with entries from my test going to a separate file via an rsyslog.conf rule), then exits back to a shell script that checks the log has the expected content. This usually works, but sometimes fails as though the logging hadn't happened. I've added a flush (using the HUP signal) to the shell script before it does the check. I can see that the HUP has happened and that the correct entry is in the log, but the script's check still fails.
Is there a way for the shell script to wait until the flush has completed? I can add an arbitrary sleep but would prefer to have something more definite.
Here are the relevant bits of the shell script:
# Set rsyslog to send dump_hook's logging to a local logfile...
# (the append to /etc/rsyslog.conf needs root, so use tee -a under sudo)
echo "user.* `pwd`/dump_hook_log" | sudo tee -a /etc/rsyslog.conf > /dev/null
sudo systemctl restart rsyslog.service
echo "" > ./dump_hook_log
# run the test program which does syslog logging
kill -HUP `cat /var/run/syslogd.pid` # flush syslog
if [ $? -ne 0 ]
then
logFail "failed to HUP `cat /var/run/syslogd.pid`: $?"
fi
echo "sent HUP to `cat /var/run/syslogd.pid`"
grep <the string I want> ./dump_hook_log >/dev/null
The string in question is always in the dump_hook_log by the time that the test has reported fail and I've gone to look at it. I presume it must be that the flush hasn't completed by the time of the grep.
Here is an example:
In /var/log/messages
2019-01-30T12:13:27.216523+00:00 apx-ont-1 apx_dump_hook[28279]: Failed to open raw dump file "core" (Is a directory)
2019-01-30T12:13:27.216754+00:00 apx-ont-1 rsyslogd: [origin software="rsyslogd" swVersion="8.24.0" x-pid="28185" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
Mod date of the log file (n.b. this is earlier than the entries it contains!):
-rw-rw-rw- 1 nealec appexenv1_group 2205 2019-01-30 12:13:27.215053296 +0000 testdir_OPT/dump_hook_log
Last line of the log file (only apx_dump_hook entries in here):
2019-01-30T12:13:27.216523+00:00 apx-ont-1 apx_dump_hook[28279]: Failed to open raw dump file "core" (Is a directory)
Script reporting error:
Wed 30 Jan 12:13:27 GMT 2019 PSE Test 0.2b FAILED: 'Failed to open raw dump file' not found in ./dump_hook_log
I think I understand this now. The HUP causes rsyslogd to close its open files but it doesn’t reopen a file until it needs to log to it.
Consider the following:
I use inotify to wait for a file to close, like this:
// Requires <sys/inotify.h>, <unistd.h>, <cstdio>, <cstring>, <cerrno>, <cstdlib>, <climits>
case 9:
{
    // Wait for the file, specified in argv[2], to be closed
    int inotfd = inotify_init();
    if (inotfd < 0) {
        printf("inotify_init failed; errno %d: %s\n",
               errno, strerror(errno));
        exit(99);
    }
    int watch_desc = inotify_add_watch(inotfd, argv[2], IN_CLOSE);
    if (watch_desc < 0) {
        printf("can't watch %s; errno %d: %s\n",
               argv[2], errno, strerror(errno));
        exit(99);
    }
    size_t bufsiz = sizeof(struct inotify_event) + PATH_MAX + 1;
    struct inotify_event* event = static_cast<inotify_event*>(malloc(bufsiz));
    if (!event) {
        printf("Failed to malloc event buffer; errno %d: %s\n",
               errno, strerror(errno));
        exit(99);
    }
    /* wait for an event to occur with a blocking read */
    if (read(inotfd, event, bufsiz) < 0) {
        printf("read of inotify event failed; errno %d: %s\n",
               errno, strerror(errno));
        exit(99);
    }
}
Then in my shell script I wait for that:
# Start a process that waits for the log file to be closed
${bin}/test_dump_hook.exe 9 "./dump_hook_log" &
wait_pid=$!
# Signal syslogd to make it close/reopen its log files
kill -HUP `cat /var/run/syslogd.pid` # flush syslog
if [ $? -ne 0 ]
then
    logFail "failed to HUP `cat /var/run/syslogd.pid`: $?"
fi
wait $wait_pid
I find this never returns. Sending a HUP to rsyslogd from another process doesn't break it out of the wait either, but a cat (which does open/close the file) of the log file does.
That's because the HUP in the shell script was sent before the other process started waiting. The file was therefore already closed when the wait began, and since nothing more is logged to that file it is never reopened, so it has nothing to close on any subsequent HUP and the event that would end the wait never occurs.
Having understood this behaviour, how can I be sure that the log has been written before I check it? I've gone with this solution: put a known message into the log and wait until that appears; any entries I'm waiting for must have been written before it. Like this:
function flushSyslog
{
    logger -p user.info -t dump_hook_test "flushSyslog"
    # Signal syslogd to make it close its log file
    kill -HUP `cat /var/run/syslogd.pid` # flush syslog
    if [ $? -ne 0 ]
    then
        logFail "failed to HUP `cat /var/run/syslogd.pid`: $?"
    fi
    # wait up to 10 secs for the entry we've just logged to appear
    sleeps=0
    until
        grep "flushSyslog" ./dump_hook_log > /dev/null
    do
        sleeps=$((sleeps+1))
        if [ $sleeps -gt 100 ]
        then
            logFail "failed to flush syslog dump_hook_log"
        fi
        sleep 0.1
    done
}
That seems a bit heavyweight as a solution; instead you can use the system's inotify API to wait for the log file to be closed (the result of the HUP signal). For example,
inotifywait -e close ./dump_hook_log
will hang until rsyslogd (or any process) closes the file, when you will get the message
./dump_hook_log CLOSE_WRITE,CLOSE
and the program will exit with return code 0. You can add a timeout.
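For instance (assuming the inotify-tools package, which provides inotifywait, is installed), a bounded wait that gives up after 10 seconds instead of hanging forever:
inotifywait -t 10 -e close_write ./dump_hook_log
inotifywait exits with status 0 when the event fires and with a non-zero status on timeout, so the script can still take an error path if the flush never happens.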
I can't start my celeryd and celerybeat services. I used the same code in another environment (configuring everything from scratch), but here it doesn't work. I think it is a permissions problem, but I couldn't get it running. Please help me.
This is my Celery configuration in settings.py:
CELERY_RESULT_BACKEND = 'djcelery.backends.database:DatabaseBackend'
CELERY_BROKER_URL = 'amqp://localhost'
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERYBEAT_SCHEDULER = 'djcelery.schedulers.DatabaseScheduler'
CELERY_ENABLE_UTC = True
CELERY_TIMEZONE = TIME_ZONE # 'America/Lima'
CELERY_BEAT_SCHEDULE = {}
This is my /etc/init.d/celeryd file, taken from:
https://github.com/celery/celery/blob/master/extra/generic-init.d/celeryd
Then I run:
sudo chmod 755 /etc/init.d/celeryd
sudo chown admin1:admin1 /etc/init.d/celeryd
and I created /etc/default/celeryd
CELERY_BIN="/home/admin1/Env/tos/bin/celery"
# App instance to use
CELERY_APP="tos"
# Where to chdir at start.
CELERYD_CHDIR="/home/admin1/webapps/tos/"
# Extra command-line arguments to the worker
CELERYD_OPTS="--time-limit=300 --concurrency=8"
# %n will be replaced with the first part of the nodename.
CELERYD_LOG_FILE="/var/log/celery/%n%I.log"
CELERYD_PID_FILE="/var/run/celery/%n.pid"
# Workers should run as an unprivileged user.
# You need to create this user manually (or you can choose
# a user/group combination that already exists (e.g., nobody).
CELERYD_USER="admin1"
CELERYD_GROUP="admin1"
# If enabled pid and log directories will be created if missing,
# and owned by the userid/group configured.
CELERY_CREATE_DIRS=1
export SECRET_KEY="foobar"
For celerybeat I created a file at /etc/init.d/celerybeat with:
https://github.com/celery/celery/blob/master/extra/generic-init.d/celerybeat
and started the services like this:
sudo /etc/init.d/celeryd start
sudo /etc/init.d/celerybeat start
and I get this error:
sudo: unable to resolve host SIO
celery init v10.1.
Using config script: /etc/default/celeryd
celery multi v3.1.25 (Cipater)
> Starting nodes...
> celery@SIO-PRODUCION: OK
ERROR: Pidfile (celery.pid) already exists.
Seems we're already running? (pid: 30198)
/etc/init.d/celeryd: 515: /etc/init.d/celeryd: --pidfile=/var/run/celery/%n.pid: not found
I also get it when checking with:
sudo C_FAKEFORK=1 sh -x /etc/init.d/celeryd start
some data .....
starting nodes...
ERROR: Pidfile (celery.pid) already exists.
Seems we're already running? (pid: 30198)
> celery@SIO-PRODUCION: * Child terminated with errorcode 73
FAILED
+ --pidfile=/var/run/celery/%n.pid
/etc/init.d/celeryd: 515: /etc/init.d/celeryd: --pidfile=/var/run/celery/%n.pid: not found
+ --logfile=/var/log/celery/%n%I.log
/etc/init.d/celeryd: 517: /etc/init.d/celeryd: --logfile=/var/log/celery/%n%I.log: not found
+ --loglevel=INFO
/etc/init.d/celeryd: 519: /etc/init.d/celeryd: --loglevel=INFO: not found
+ --app=tos
/etc/init.d/celeryd: 521: /etc/init.d/celeryd: --app=tos: not found
+ --time-limit=300 --concurrency=8
/etc/init.d/celeryd: 523: /etc/init.d/celeryd: --time-limit=300: not found
+ exit 0
I had the same problem and resolved it like this:
rm -f /webapps/celery.pid && /etc/init.d/celeryd start
You can try this: before starting celery, remove the stale pid file, gluing the two commands together with &&.
Another way is to create a Django management command:
import shlex
import subprocess

from django.core.management.base import BaseCommand


class Command(BaseCommand):
    def handle(self, *args, **options):
        kill_worker_cmd = 'pkill -9 celery'
        subprocess.call(shlex.split(kill_worker_cmd))
Call it before you start, or just run:
pkill -9 celery
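A note on wiring that up (the app and command names here are my assumptions, not from the original answer): the file has to live in an app's management/commands/ directory, and it is then run through manage.py before starting the workers:
# assuming the command above was saved as yourapp/management/commands/kill_celery.py
python manage.py kill_celery && sudo /etc/init.d/celeryd start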
I have two celerycam processes configured for running under supervisor. Here is part of my supervisord.conf:
[program:dev1_celerycam]
directory = /var/www/dev1.example.com
command = /usr/bin/python2.7 /var/www/dev1.example.com/manage.py celerycam --logfile=/var/log/supervisor/dev1_celerycam.log --workdir=/var/www/dev1.example.com
stderr_logfile = /var/log/supervisor/dev1_celerycam_error.log
stdout_logfile = /var/log/supervisor/dev1_celerycam.log
exitcodes=0,2
priority=993
[program:dev_celerycam]
directory = /var/www/dev.example.com
command = /usr/bin/python2.7 /var/www/dev.example.com/manage.py celerycam --logfile=/var/log/supervisor/dev_celerycam.log --workdir=/var/www/dev.example.com
stderr_logfile = /var/log/supervisor/dev_celerycam_error.log
stdout_logfile = /var/log/supervisor/dev_celerycam.log
exitcodes=0,2
priority=995
I also have two celeryd processes in supervisord.conf. They start perfectly fine on the same server. But for one of the celerycam processes I get the following in supervisord.log:
2013-09-01 09:35:12,546 INFO exited: dev_celerycam (exit status 1; not expected)
2013-09-01 09:35:12,546 INFO received SIGCLD indicating a child quit
2013-09-01 09:35:15,555 INFO spawned: 'dev_celerycam' with pid 25504
2013-09-01 09:35:16,540 INFO exited: dev_celerycam (exit status 1; not expected)
2013-09-01 09:35:16,540 INFO received SIGCLD indicating a child quit
2013-09-01 09:35:17,542 INFO gave up: dev_celerycam entered FATAL state, too many start retries too quickly
This happens to either dev_celerycam or dev1_celerycam on supervisord restart: one of them starts fine while the other fails, seemingly at random.
Is there any chance to get both celerycam processes working?
Somehow both celerycam processes were creating their pid file at the same path. I had to add a distinct --pidfile parameter for each of the celerycam processes.
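For example (the pid file paths below are illustrative values of mine, assuming the celerycam command accepts a --pidfile option like the other celery commands do):
command = /usr/bin/python2.7 /var/www/dev1.example.com/manage.py celerycam --pidfile=/var/run/dev1_celerycam.pid --logfile=/var/log/supervisor/dev1_celerycam.log --workdir=/var/www/dev1.example.com
command = /usr/bin/python2.7 /var/www/dev.example.com/manage.py celerycam --pidfile=/var/run/dev_celerycam.pid --logfile=/var/log/supervisor/dev_celerycam.log --workdir=/var/www/dev.example.com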
When I run sudo supervisorctl start stage I get ERROR (abnormal termination). Will you please take a look?
Here is my file /etc/supervisord.conf. Am I missing something? Thanks.
[unix_http_server]
file=/tmp/supervisor.sock ; (the path to the socket file)
[supervisord]
logfile=/tmp/supervisord.log ; (main log file;default $CWD/supervisord.log)
logfile_maxbytes=50MB ; (max main logfile bytes b4 rotation;default 50MB)
logfile_backups=10 ; (num of main logfile rotation backups;default 10)
loglevel=info ; (log level;default info; others: debug,warn,trace)
pidfile=/tmp/supervisord.pid ; (supervisord pidfile;default supervisord.pid)
nodaemon=false ; (start in foreground if true;default false)
minfds=1024 ; (min. avail startup file descriptors;default 1024)
minprocs=200 ; (min. avail process descriptors;default 200)
[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface
[supervisorctl]
serverurl=unix:///tmp/supervisor.sock ; use a unix:// URL for a unix socket
[program:stage]
command=/home/me/envs/project/bin/python /home/me/webapps/project/manage.py run_gunicorn -b 127.0.0.1:8002 --log-file=/tmp/stage_gunicorn.log
directory=/home/me/webapps/project/
user=www-data
autostart=true
autorestart=true
stdout_logfile=/tmp/stage_supervisord.log
redirect_stderr=true
I met the same problem as yours. As Martijn Pieters says, it doesn't mean that something is wrong with your supervisorctl; it just tells you that the program didn't start successfully. You can find the error details in the log.
Since it indicated an error, find it using the command below:
supervisorctl tail <APP_NAME>
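For the stage program in this question, that would be, for example:
supervisorctl tail stage
supervisorctl tail stage stderr
With redirect_stderr=true (as in the config above) the stderr output lands in the stdout log, so the first form is usually enough.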
This error occurs because the underlying stage application is not starting properly. To find the cause, go to your console and run the command that you are passing to supervisor. In your case it is:
/home/me/envs/project/bin/python /home/me/webapps/project/manage.py run_gunicorn -b 127.0.0.1:8002 --log-file=/tmp/stage_gunicorn.log
Running it directly will show you the error that needs to be fixed.
It means that your app is failing to start. Go and check the [program:stage] section; the path or something else in it is not correct.
Just set the log level to trace, then restart supervisord and watch what happens in the supervisord log:
[supervisord]
loglevel=trace
sudo systemctl restart supervisord.service
tail -f /path/to/supervisord.log
Once the problem has been resolved, set the loglevel back to info.
My dotcloud setup (django-celery with rabbitmq) was working fine a week ago - the processes were starting up ok and the logs were clean. However, I recently repushed (without updating any of the code), and now the logs are saying that the processes fail to start even though they seem to be running.
Supervisord log
dotcloud@hack-default-www-0:/var/log/supervisor$ more supervisord.log
2012-06-03 10:51:51,836 CRIT Set uid to user 1000
2012-06-03 10:51:51,836 WARN Included extra file "/etc/supervisor/conf.d/uwsgi.conf" during parsing
2012-06-03 10:51:51,836 WARN Included extra file "/home/dotcloud/current/supervisord.conf" during parsing
2012-06-03 10:51:51,938 INFO RPC interface 'supervisor' initialized
2012-06-03 10:51:51,938 WARN cElementTree not installed, using slower XML parser for XML-RPC
2012-06-03 10:51:51,938 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2012-06-03 10:51:51,946 INFO daemonizing the supervisord process
2012-06-03 10:51:51,947 INFO supervisord started with pid 144
2012-06-03 10:51:53,128 INFO spawned: 'celerycam' with pid 159
2012-06-03 10:51:53,133 INFO spawned: 'apnsd' with pid 161
2012-06-03 10:51:53,148 INFO spawned: 'djcelery' with pid 164
2012-06-03 10:51:53,168 INFO spawned: 'uwsgi' with pid 167
2012-06-03 10:51:53,245 INFO exited: djcelery (exit status 1; not expected)
2012-06-03 10:51:53,247 INFO exited: celerycam (exit status 1; not expected)
2012-06-03 10:51:54,698 INFO spawned: 'celerycam' with pid 176
2012-06-03 10:51:54,698 INFO success: apnsd entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2012-06-03 10:51:54,705 INFO spawned: 'djcelery' with pid 177
2012-06-03 10:51:54,706 INFO success: uwsgi entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2012-06-03 10:51:54,731 INFO exited: djcelery (exit status 1; not expected)
2012-06-03 10:51:54,754 INFO exited: celerycam (exit status 1; not expected)
2012-06-03 10:51:56,760 INFO spawned: 'celerycam' with pid 178
2012-06-03 10:51:56,765 INFO spawned: 'djcelery' with pid 179
2012-06-03 10:51:56,790 INFO exited: celerycam (exit status 1; not expected)
2012-06-03 10:51:56,791 INFO exited: djcelery (exit status 1; not expected)
2012-06-03 10:51:59,798 INFO spawned: 'celerycam' with pid 180
2012-06-03 10:52:00,538 INFO spawned: 'djcelery' with pid 181
2012-06-03 10:52:00,565 INFO exited: celerycam (exit status 1; not expected)
2012-06-03 10:52:00,571 INFO gave up: celerycam entered FATAL state, too many start retries too quickly
2012-06-03 10:52:00,573 INFO exited: djcelery (exit status 1; not expected)
2012-06-03 10:52:01,575 INFO gave up: djcelery entered FATAL state, too many start retries too quickly
dotcloud@hack-default-www-0:/var/log/supervisor$
The djcelery error log:
dotcloud@hack-default-www-0:/var/log/supervisor$ more djcelery_error.log
Traceback (most recent call last):
File "hack/manage.py", line 2, in
from django.core.management import execute_manager
ImportError: No module named django.core.management
Traceback (most recent call last):
File "hack/manage.py", line 2, in
from django.core.management import execute_manager
ImportError: No module named django.core.management
Traceback (most recent call last):
File "hack/manage.py", line 2, in
from django.core.management import execute_manager
ImportError: No module named django.core.management
Traceback (most recent call last):
File "hack/manage.py", line 2, in
from django.core.management import execute_manager
ImportError: No module named django.core.management
dotcloud@hack-default-www-0:/var/log/supervisor$
supervisorctl status shows that the processes are running, but the pids are different. Also, the celery functionality seems to be working fine: messages are processed, and I can see them being processed in the Django admin interface (so djcelery's celerycam is running).
# supervisorctl status
apnsd RUNNING pid 225, uptime 0:00:44
celerycam RUNNING pid 224, uptime 0:00:44
djcelery RUNNING pid 226, uptime 0:00:44
Supervisord.conf file:
[program:djcelery]
directory = /home/dotcloud/current/
command = python hack/manage.py celeryd -E -l info -c 2
stderr_logfile = /var/log/supervisor/%(program_name)s_error.log
stdout_logfile = /var/log/supervisor/%(program_name)s.log
[program:celerycam]
directory = /home/dotcloud/current/
command = python hack/manage.py celerycam
stderr_logfile = /var/log/supervisor/%(program_name)s_error.log
stdout_logfile = /var/log/supervisor/%(program_name)s.log
http://jefurii.cafejosti.net/blog/2011/01/26/celery-in-virtualenv-with-supervisord/ says that the problem may be that the python being used is incorrect, so I've explicitly specified the python in the supervisord file. It now works, but it doesn't explain what I'm seeing above and why I've had to change my configuration when it was working fine last week.
Also, not all of the pids are lining up:
2012-06-03 11:19:03,045 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2012-06-03 11:19:03,051 INFO daemonizing the supervisord process
2012-06-03 11:19:03,052 INFO supervisord started with pid 144
2012-06-03 11:19:04,061 INFO spawned: 'celerycam' with pid 151
2012-06-03 11:19:04,066 INFO spawned: 'apnsd' with pid 153
2012-06-03 11:19:04,085 INFO spawned: 'djcelery' with pid 155
2012-06-03 11:19:04,104 INFO spawned: 'uwsgi' with pid 156
2012-06-03 11:19:05,271 INFO success: celerycam entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2012-06-03 11:19:05,271 INFO success: apnsd entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2012-06-03 11:19:05,271 INFO success: djcelery entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2012-06-03 11:19:05,271 INFO success: uwsgi entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
The status shows that the celerycam pids aren't lining up:
# supervisorctl status
apnsd RUNNING pid 153, uptime 0:06:17
celerycam RUNNING pid 150, uptime 0:06:17
djcelery RUNNING pid 155, uptime 0:06:17
My first guess is that you're using the wrong python binary (the system python instead of the virtualenv python), and that is causing the error below, because the system python doesn't have Django installed.
Traceback (most recent call last):
File "hack/manage.py", line 2, in
from django.core.management import execute_manager
ImportError: No module named django.core.management
You should change your supervisord.conf to the following to make sure you are pointing to the correct python version.
[program:djcelery]
directory = /home/dotcloud/current/
command = /home/dotcloud/env/bin/python hack/manage.py celeryd -E -l info -c 2
stderr_logfile = /var/log/supervisor/%(program_name)s_error.log
stdout_logfile = /var/log/supervisor/%(program_name)s.log
[program:celerycam]
directory = /home/dotcloud/current/
command = /home/dotcloud/env/bin/python hack/manage.py celerycam
stderr_logfile = /var/log/supervisor/%(program_name)s_error.log
stdout_logfile = /var/log/supervisor/%(program_name)s.log
The python path went from python to /home/dotcloud/env/bin/python.
I'm not sure why supervisor says it is running when it is not, but hopefully this one little change will clear up your errors and get everything working again.
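A quick way to confirm that diagnosis (my own suggestion; the virtualenv path is taken from the answer above) is to check whether each python binary can actually import Django:
# system python - expected to fail with ImportError if the guess is right
python -c "import django; print(django.get_version())"
# virtualenv python - expected to print a version number
/home/dotcloud/env/bin/python -c "import django; print(django.get_version())"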