Django celery daemon doesn't work: it can't create pid file - celery

I can't init my celeryd and celerybeat services. I used the same code on another environment (configuring everything from the start), but here it doesn't work. I think it is a permissions issue, but I couldn't get it running. Please help me.
This is my celery config in settings.py:
CELERY_RESULT_BACKEND = 'djcelery.backends.database:DatabaseBackend'
CELERY_BROKER_URL = 'amqp://localhost'
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERYBEAT_SCHEDULER = 'djcelery.schedulers.DatabaseScheduler'
CELERY_ENABLE_UTC = True
CELERY_TIMEZONE = TIME_ZONE  # 'America/Lima'
CELERY_BEAT_SCHEDULE = {}
This is my /etc/init.d/celeryd file:
https://github.com/celery/celery/blob/master/extra/generic-init.d/celeryd
Then I run:
sudo chmod 755 /etc/init.d/celeryd
sudo chown admin1:admin1 /etc/init.d/celeryd
and I created /etc/default/celeryd with:
CELERY_BIN="/home/admin1/Env/tos/bin/celery"
# App instance to use
CELERY_APP="tos"
# Where to chdir at start.
CELERYD_CHDIR="/home/admin1/webapps/tos/"
# Extra command-line arguments to the worker
CELERYD_OPTS="--time-limit=300 --concurrency=8"
# %n will be replaced with the first part of the nodename.
CELERYD_LOG_FILE="/var/log/celery/%n%I.log"
CELERYD_PID_FILE="/var/run/celery/%n.pid"
# Workers should run as an unprivileged user.
# You need to create this user manually (or you can choose
# a user/group combination that already exists (e.g., nobody).
CELERYD_USER="admin1"
CELERYD_GROUP="admin1"
# If enabled pid and log directories will be created if missing,
# and owned by the userid/group configured.
CELERY_CREATE_DIRS=1
export SECRET_KEY="foobar"
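(CELERY_CREATE_DIRS=1 is supposed to create the pid/log directories named above; they can also be created by hand, e.g.:)

# Create the runtime directories from CELERYD_PID_FILE / CELERYD_LOG_FILE and
# give them to the worker user (admin1 here, per the config above).
sudo mkdir -p /var/run/celery /var/log/celery
sudo chown -R admin1:admin1 /var/run/celery /var/log/celery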
For celerybeat I created a file at /etc/init.d/celerybeat with:
https://github.com/celery/celery/blob/master/extra/generic-init.d/celerybeat
and I start the services like this:
sudo /etc/init.d/celeryd start
sudo /etc/init.d/celerybeat start
and I get this error:
sudo: unable to resolve host SIO
celery init v10.1.
Using config script: /etc/default/celeryd
celery multi v3.1.25 (Cipater)
> Starting nodes...
> celery@SIO-PRODUCION: OK
ERROR: Pidfile (celery.pid) already exists.
Seems we're already running? (pid: 30198)
/etc/init.d/celeryd: 515: /etc/init.d/celeryd: --pidfile=/var/run/celery/%n.pid: not found
I also get it when I check it with:
sudo C_FAKEFORK=1 sh -x /etc/init.d/celeryd start
some data .....
starting nodes...
ERROR: Pidfile (celery.pid) already exists.
Seems we're already running? (pid: 30198)
> celery@SIO-PRODUCION: * Child terminated with errorcode 73
FAILED
+ --pidfile=/var/run/celery/%n.pid
/etc/init.d/celeryd: 515: /etc/init.d/celeryd: --pidfile=/var/run/celery/%n.pid: not found
+ --logfile=/var/log/celery/%n%I.log
/etc/init.d/celeryd: 517: /etc/init.d/celeryd: --logfile=/var/log/celery/%n%I.log: not found
+ --loglevel=INFO
/etc/init.d/celeryd: 519: /etc/init.d/celeryd: --loglevel=INFO: not found
+ --app=tos
/etc/init.d/celeryd: 521: /etc/init.d/celeryd: --app=tos: not found
+ --time-limit=300 --concurrency=8
/etc/init.d/celeryd: 523: /etc/init.d/celeryd: --time-limit=300: not found
+ exit 0
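Note that those "--pidfile=...: not found" lines mean sh is trying to run the option strings as commands of their own, which usually points at broken backslash line continuations in the init script (for example CRLF line endings picked up while editing or copying the file). A hedged way to check for and strip carriage returns:

# Print the lines around the reported errors with non-printing characters shown;
# a trailing \r breaks the backslash continuations.
sed -n '510,525l' /etc/init.d/celeryd
# If \r shows up, strip it (equivalent to dos2unix).
sudo sed -i 's/\r$//' /etc/init.d/celeryd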

I had the same problem and I resolved it like this:
rm -f /webapps/celery.pid && /etc/init.d/celeryd start
You can try this: before starting celery, clean up the pid files, gluing the two commands together with &&.
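Since the question's config puts pid files under /var/run/celery/%n.pid, the stale file may live there rather than in the project directory; a hedged variant of the same cleanup:

# Remove stale pid files from the location configured in /etc/default/celeryd,
# then start the daemon again in one go.
sudo rm -f /var/run/celery/*.pid && sudo /etc/init.d/celeryd start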

Another way is to create a Django management command:
import shlex
import subprocess

from django.core.management.base import BaseCommand


class Command(BaseCommand):
    def handle(self, *args, **options):
        # Kill any running celery workers before (re)starting the daemon.
        kill_worker_cmd = 'pkill -9 celery'
        subprocess.call(shlex.split(kill_worker_cmd))
Call it before you start, or just run:
pkill -9 celery
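For the management-command approach to work, the file has to live inside an app's management/commands/ package; the app and command names below are placeholders, not from the question:

# Hypothetical layout -- "myapp" and "kill_celery" are placeholder names.
mkdir -p myapp/management/commands
touch myapp/management/__init__.py myapp/management/commands/__init__.py
# Save the Command class above as myapp/management/commands/kill_celery.py, then:
python manage.py kill_celery && sudo /etc/init.d/celeryd start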

Related

GitLab K8s Runner fails for get_sources

We are trying to move our gitlab-runners from standard CentOS VMs to Kubernetes.
But after setup and registration, the pipeline fails with an unknown error:
Running with gitlab-runner 15.7.0 (259d2fd4)
on Kubernetes-local JXRw3mH1
Preparing the "kubernetes" executor
00:00
Using Kubernetes namespace: gitlab-runner
Using Kubernetes executor with image gitlab-test.domain:5005/image:latest ...
Using attach strategy to execute scripts...
Preparing environment
00:04
Waiting for pod gitlab-runner/runner-jxrw3mh1-project-290-concurrent-0dpd88 to be running, status is Pending
Running on runner-jxrw3mh1-project-290-concurrent-0dpd88 via gitlab-runner-d7df6c548-hsgxg...
Getting source from Git repository
00:00
error: could not lock config file /root/.gitconfig: Read-only file system
Cleaning up project directory and file based variables
00:01
ERROR: Job failed: command terminated with exit code 1
Inside the log of the job pod we found:
helper Running on runner-jxrw3mh1-project-290-concurrent-0dpd88 via gitlab-runner-d7df6c548-hsgxg...
helper
helper {"command_exit_code": 0, "script": "/scripts-290-207166/prepare_script"}
helper error: could not lock config file /root/.gitconfig: Read-only file system
helper
helper {"command_exit_code": 1, "script": "/scripts-290-207166/get_sources"}
helper
helper {"command_exit_code": 0, "script": "/scripts-290-207166/cleanup_file_variables"}
Inside the log of the gitlab-runner pod we found:
Starting in container "helper" the command ["gitlab-runner-build" "<<<" "/scripts-290-207167/get_sources" "2>&1 | tee -a /logs-290-207167/output.log"] with script: #!/usr/bin/env bash
if set -o | grep pipefail > /dev/null; then set -o pipefail; fi; set -o errexit
set +o noclobber
: | eval $'export FF_CMD_DISABLE_DELAYED_ERROR_LEVEL_EXPANSION=$\'false\'\nexport FF_NETWORK_PER_BUILD=$\'false\'\nexport FF_USE_LEGACY_KUBERNETES_EXECUTION_STRATEGY=$\'false\'\nexport FF_USE_DIRECT_DOWNLOAD
exit 0
job=207167 project=290 runner=JXRw3mH1
Remote process exited with the status: CommandExitCode: 1, Script: /scripts-290-207167/get_sources job=207167 project=290 runner=JXRw3mH1
Container "helper" exited with error: command terminated with exit code 1 job=207167 project=290 runner=JXRw3mH1
Notes:
the error "error: could not lock config file /root/.gitconfig: Read-only file system" occurs because the current user inside the container is not root
the file /logs-290-207167/output.log contains the log of the job pod
Inside the job pod shell we also tested some git commands and successfully performed fetch and clone using our personal credentials (the same user that runs the pipeline from the GitLab GUI).
We think the problem may be related to the gitlab-ci-token, but we have exhausted our investigation... :frowning:
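As a quick illustration of the first note: the failure is just git being unable to write its global config under a read-only $HOME, so pointing HOME at a writable path from inside the job pod shows whether that is the only blocker (the path is illustrative):

# Reproduce the failure: git config --global writes to $HOME/.gitconfig,
# which is read-only here.
git config --global user.name test
# Point HOME at a writable directory and the same write succeeds.
export HOME=/tmp
git config --global user.name test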

Unable to configure a new agent : Failed to create CoreCLR, HRESULT: 0x80004005

We had some 12 agents (vsts-agent-linux-x64-2.188.4) running on one Azure VM (Ubuntu 20.04.2 LTS) as processes (./config.sh && screen ./run.sh). All was well.
I had to run some command related to the /tmp folder, but it kept showing busy, and we suspected that our agents might be using /tmp. Unfortunately, instead of stopping the agents in a clean way, we killed all the processes on this VM manually, including the agents'.
After the /tmp-related command ran successfully, I tried running screen ./run.sh from one of the agent directories and got an error:
Failed to create CoreCLR, HRESULT: 0x80004005
I also tried .agent2/run.sh and got the error:
ldd: ./bin/libcoreclr.so: No such file or directory
ldd: ./bin/System.Security.Cryptography.Native.OpenSsl.so: No such file or directory
ldd: ./bin/System.IO.Compression.Native.so: No such file or directory
ldd: ./bin/System.Net.Http.Native.so: No such file or directory
Failed to create CoreCLR, HRESULT: 0x80004005
I even downloaded a new .tar for the agent and ran a fresh ./config, but I get the same error on ./config as well.
Is there a solution to this? Please help.
Running export COMPlus_EnableDiagnostics=0 and then ./config from the agent directory worked!
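Spelled out as commands (no spaces around the = in the export), with the agent directory as an assumption:

# COMPlus_EnableDiagnostics=0 stops the CLR from creating its debug pipes under
# /tmp, which is the mknod call that fails when /tmp permissions are wrong.
cd ~/myagent                        # placeholder agent directory
export COMPlus_EnableDiagnostics=0
./config.sh                         # or ./run.sh for an already-configured agent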
I had this issue when running as the non-privileged user specified in the systemd file, but running as the root user worked fine.
Finally used:
strace -f -o trace.log /<executable path>/<executable name>
Which led me to:
9183 mknod("/tmp/clr-debug-pipe-9183-8112345738-in", S_IFIFO|0700) = -1 EACCES (Permission denied)
This caused me to compare the /tmp directory between working and non-working boxes.
[<not-working-hostname>]$ ll /
drwxrwxr-x 7 root root 93 Jan 5 21:37 tmp
[<working-hostname>]$ ll /
drwxrwxrwt 7 root root 93 Jan 5 21:59 tmp
(Note the r-x vs rwt)
Fix:
[<hostname>]# chmod 1777 /tmp

Snakemake: Singularity parameters --home and --bind set by default but disallowed on HPC

I have already posted this as an issue on GitHub at https://github.com/snakemake/snakemake/issues/279 but haven't gotten any response yet. I hope to find help here.
Version
I am using the following versions on our HPC cluster:
Snakemake v5.4.4
singularity version 3.5.3
Minimal example
singularity: "docker://bash"
rule test:
shell: "echo test"
Describe the bug
snakemake --use-singularity --debug
returns this message:
Building DAG of jobs...
Pulling singularity image docker://bash.
Using shell: /bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 test
1
[Fri Mar 13 15:59:30 2020]
rule test:
jobid: 0
Activating singularity image /data/nanopore/test/.snakemake/singularity/36b22e49e8a03fd08160e9345dd1034e.simg
FATAL: container creation failed: not mounting user requested home: user bind control is disallowed
[Fri Mar 13 15:59:30 2020]
Error in rule test:
jobid: 0
RuleException:
CalledProcessError in line 4 of /data/nanopore/test/Snakefile:
Command ' singularity exec --home /data/nanopore/test --bind /opt/snakemake/v5.4.4/lib/python3.5/site-packages/snakemake-5.4.4-py3.5.egg:/mnt/snakemake /data/nanopore/test/.snakemake/singularity/36b22e49e8a03fd08160e9345dd1034e.simg bash -c 'set -euo pipefail; echo test'' returned non-zero exit status 255
File "/data/nanopore/test/Snakefile", line 4, in __rule_test
File "/usr/lib/python3.5/concurrent/futures/thread.py", line 55, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /data/nanopore/test/.snakemake/log/2020-03-13T155917.601627.snakemake.log
Apparently, snakemake runs singularity with default values for --home and --bind. These were disallowed by the administrator, however.
Executing
singularity exec --home /data/nanopore/test --bind /opt/snakemake/v5.4.4/lib/python3.5/site-packages/snakemake-5.4.4-py3.5.egg:/mnt/snakemake /data/nanopore/test/.snakemake/singularity/36b22e49e8a03fd08160e9345dd1034e.simg bash -c 'set -euo pipefail;'
returns:
FATAL: container creation failed: not mounting user requested home: user bind control is disallowed
Additional context
Is there a way to disable the Singularity default parameter setting in snakemake? Inside the singularity container the /data directory is still writeable and readable anyway.
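(One way to confirm it is really those two flags, and not the image, would be to rerun the same exec line without them; this is only a diagnostic, not a snakemake-level fix:)

# Same container and command that snakemake runs, minus --home and --bind.
singularity exec \
    /data/nanopore/test/.snakemake/singularity/36b22e49e8a03fd08160e9345dd1034e.simg \
    bash -c 'set -euo pipefail; echo test'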
Thanks a lot

Why is the gcloud command slow to start?

Just typing gcloud for help takes 5 secs.
$ gcloud
...
gcloud 0.30s user 0.13s system 7% cpu 5.508 total
$ gcloud version
Google Cloud SDK 128.0.0
alpha 2016.01.12
bq 2.0.24
bq-nix 2.0.24
core 2016.09.23
core-nix 2016.09.20
gcloud
gsutil 4.21
gsutil-nix 4.21
kubectl
kubectl-darwin-x86_64 1.3.7
$ uname -a
Darwin hiroshi-MacBook.local 16.0.0 Darwin Kernel Version 16.0.0: Mon Aug 29 17:56:20 PDT 2016; root:xnu-3789.1.32~3/RELEASE_X86_64 x86_64
EDIT 2017-03-31: Zachary said that gcloud 148.0.0 addressed this issue, so try gcloud components update. See https://stackoverflow.com/users/4922212/zachary-newman
tl;dr
It turns out that socket.gethostbyaddr(socket.gethostname()) is slow for .local hostnames on macOS.
$ python -i
>>> socket.gethostname()
'hiroshi-MacBook.local'
>>> socket.gethostbyaddr(socket.gethostname()) # it takes about 5 seconds
('localhost', ['1.0.0.127.in-addr.arpa'], ['127.0.0.1'])
So, as a workaround, I just added the hostname to the localhost line of /etc/hosts.
127.0.0.1 localhost hiroshi-Macbook.local
After that the return value is different, but it comes back in an instant.
>>> socket.gethostbyaddr(socket.gethostname())
('localhost', ['1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa'], ['::1'])
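A quick way to confirm the fix is to time the lookup directly; with the /etc/hosts entry in place it should return almost instantly instead of hanging for ~5 seconds:

# Time the hostname resolution that gcloud's metrics collector performs at startup.
time python2.7 -c 'import socket; socket.gethostbyaddr(socket.gethostname())'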
How I got there
Where the gcloud command is:
$ which gcloud
/Users/hiroshi/google-cloud-sdk/bin/gcloud
I edited the end of the shell script...
...
+ echo "$CLOUDSDK_PYTHON" $CLOUDSDK_PYTHON_ARGS "${CLOUDSDK_ROOT_DIR}/lib/gcloud.py" "$#"
"$CLOUDSDK_PYTHON" $CLOUDSDK_PYTHON_ARGS "${CLOUDSDK_ROOT_DIR}/lib/gcloud.py" "$#"
to echo where the gcloud.py is:
$ gcloud
python2.7 -S /Users/hiroshi/google-cloud-sdk/lib/gcloud.py
OK. Who takes the 5 secs?
$ python2.7 -S -m cProfile -s time /Users/hiroshi/google-cloud-sdk/lib/gcloud.py
173315 function calls (168167 primitive calls) in 5.451 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 5.062 5.062 5.062 5.062 {_socket.gethostbyaddr}
...
_socket.gethostbyaddr is.
What do the argument of the function call and the backtrace look like?
I added some lines before main() in gcloud.py:
import traceback

def mygethostbyaddr(addr):
    # Print which address is being resolved and where the call comes from.
    print addr
    traceback.print_stack()
    return addr

import socket
socket.gethostbyaddr = mygethostbyaddr
Execute gcloud again.
I see it is the .local name of my machine.
$ gcloud
hiroshi-MacBook.local
File "/Users/hiroshi/google-cloud-sdk/lib/gcloud.py", line 74, in <module>
main()
File "/Users/hiroshi/google-cloud-sdk/lib/gcloud.py", line 70, in main
sys.exit(googlecloudsdk.gcloud_main.main())
File "/Users/hiroshi/google-cloud-sdk/lib/googlecloudsdk/gcloud_main.py", line 121, in main
metrics.Started(START_TIME)
File "/Users/hiroshi/google-cloud-sdk/lib/googlecloudsdk/core/metrics.py", line 411, in Wrapper
return func(*args, **kwds)
File "/Users/hiroshi/google-cloud-sdk/lib/googlecloudsdk/core/metrics.py", line 554, in Started
collector = _MetricsCollector.GetCollector()
File "/Users/hiroshi/google-cloud-sdk/lib/googlecloudsdk/core/metrics.py", line 139, in GetCollector
_MetricsCollector._instance = _MetricsCollector()
File "/Users/hiroshi/google-cloud-sdk/lib/googlecloudsdk/core/metrics.py", line 197, in __init__
hostname = socket.getfqdn()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 141, in getfqdn
hostname, aliases, ipaddrs = gethostbyaddr(name)
File "/Users/hiroshi/google-cloud-sdk/lib/gcloud.py", line 32, in mygethostbyaddr
traceback.print_stack()
@hiroshi's answer solved the issue for me, i.e. running gcloud components update. However, since I had installed gcloud through their Cloud SDK using a package manager, I was stuck with the following error*:
ERROR: (gcloud.components.update) You cannot perform this action
because the Google Cloud CLI component manager is disabled for this
installation.
Hence, I had to explicitly list the apt packages to perform the update. The following command helped me get it done in one go:
sudo apt-get update && sudo apt-get --only-upgrade install google-cloud-sdk-app-engine-go google-cloud-sdk-datastore-emulator google-cloud-sdk-datalab google-cloud-sdk-firestore-emulator google-cloud-sdk-kubectl-oidc google-cloud-sdk google-cloud-sdk-app-engine-python-extras google-cloud-sdk-cloud-build-local kubectl google-cloud-sdk-cbt google-cloud-sdk-minikube google-cloud-sdk-skaffold google-cloud-sdk-cloud-run-proxy google-cloud-sdk-pubsub-emulator google-cloud-sdk-config-connector google-cloud-sdk-gke-gcloud-auth-plugin google-cloud-sdk-kpt google-cloud-sdk-local-extract google-cloud-sdk-nomos google-cloud-sdk-app-engine-grpc google-cloud-sdk-bigtable-emulator google-cloud-sdk-app-engine-python google-cloud-sdk-terraform-validator google-cloud-sdk-anthos-auth google-cloud-sdk-spanner-emulator google-cloud-sdk-app-engine-java
*More details as to why the aforementioned error occurs and how to permanently solve it can be found here.

Starting Celery with supervisord: AttributeError: 'module' object has no attribute 'celery'

I used to have all my Flask app code and celery code in one file and it worked fine with supervisor. However, it got very hairy, so I split my tasks into celery_tasks.py, and now this problem occurs.
In my project directory, I can start celery manually with the following command
celery -A celery_tasks worker --loglevel=INFO
However, because this is a server, I need celery to run as a daemon in the background.
But it shows the following error when I call sudo supervisorctl restart celeryd:
celeryd: ERROR (abnormal termination)
and the log said:
Traceback (most recent call last):
File "/srv/www/learningapi.stanford.edu/peerAPI/peerAPIenv/bin/celery", line 9, in <module>
load_entry_point('celery==3.0.19', 'console_scripts', 'celery')()
File "/srv/www/learningapi.stanford.edu/peerAPI/peerAPIenv/local/lib/python2.7/site-packages/celery/__main__.py", line 14, in main
main()
File "/srv/www/learningapi.stanford.edu/peerAPI/peerAPIenv/local/lib/python2.7/site-packages/celery/bin/celery.py", line 957, in main
cmd.execute_from_commandline(argv)
File "/srv/www/learningapi.stanford.edu/peerAPI/peerAPIenv/local/lib/python2.7/site-packages/celery/bin/celery.py", line 901, in execute_from_commandline
super(CeleryCommand, self).execute_from_commandline(argv)))
File "/srv/www/learningapi.stanford.edu/peerAPI/peerAPIenv/local/lib/python2.7/site-packages/celery/bin/base.py", line 185, in execute_from_commandline
argv = self.setup_app_from_commandline(argv)
File "/srv/www/learningapi.stanford.edu/peerAPI/peerAPIenv/local/lib/python2.7/site-packages/celery/bin/base.py", line 300, in setup_app_from_commandline
self.app = self.find_app(app)
File "/srv/www/learningapi.stanford.edu/peerAPI/peerAPIenv/local/lib/python2.7/site-packages/celery/bin/base.py", line 318, in find_app
return sym.celery
AttributeError: 'module' object has no attribute 'celery'
I used the following config.
[program:celeryd]
command = celery -A celery_tasks worker --loglevel=INFO
user=peerapi
numprocs=4
stdout_logfile = <path to log>
stderr_logfile = <path to log>
autostart = true
autorestart = true
environment=PATH="<path to my project>"
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600
; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true
; if rabbitmq is supervised, set its priority higher
; so it starts first
priority=998
My code also inits celery properly:
celery = Celery('celery_tasks', broker='amqp://guest:guest@localhost:5672//',
backend='amqp')
celery.config_from_object(celeryconfig)
and my celeryconfig.py is working normally
CELERY_TASK_SERIALIZER='json'
CELERY_RESULT_SERIALIZER='json'
CELERY_TIMEZONE='America/Los_Angeles'
CELERY_ENABLE_UTC=True
Any clue?
It looks like your application can't find your celeryconfig; this happens, for example, when your CWD is not set. Try something like:
cd app_path; celeryd ...
You also need to set up the environment:
# local settings
PATH=/home/ubuntu/envs/app/bin:$PATH
PYTHONHOME=/home/ubuntu/envs/app/
PYTHONPATH=/home/ubuntu/projects/app/
Should work.
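A quick way to check the same thing by hand before touching supervisord (the paths below are assumed from the traceback's virtualenv location): activate the env, cd into the project so celery_tasks.py is importable, and start the worker. In the supervisord config itself, the directory= option of the [program:celeryd] section does the equivalent of the cd.

# Assumed project/env paths, taken from the traceback in the question.
cd /srv/www/learningapi.stanford.edu/peerAPI
. peerAPIenv/bin/activate
# With the CWD set, celery can import celery_tasks and find its 'celery' attribute.
celery -A celery_tasks worker --loglevel=INFO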