Celery worker --exclude-queues option has no effect

I use Celery 4.0.2.
I want my Celery worker to consume only specific queues (or to exclude a specific queue).
So I started the worker as below:
celery -A mycelery worker -Q queue1,queue2 -E --logfile=./celery.log --pidfile=./celery.pid &
But when I run code like this, 'testqueue' is consumed just fine:
mycelery.control.add_consumer('testqueue', reply=False)
myfunc.apply_async(queue='testqueue')
So I changed the option as below and ran the code again:
celery -A mycelery worker -X testqueue -E --logfile=./celery.log --pidfile=./celery.pid &
myfunc still runs.
The '-Q' option means 'consume only the named queues',
and the '-X' option means 'never consume the named queues'... doesn't it?
What's wrong?
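For reference, a minimal sketch of the setup described above (the names mycelery, myfunc and testqueue come from the question; the broker URL is a placeholder assumption):
from celery import Celery

mycelery = Celery('mycelery', broker='amqp://localhost')

@mycelery.task
def myfunc():
    return 'done'

if __name__ == '__main__':
    # Documented behaviour of the two options:
    #   celery -A mycelery worker -Q queue1,queue2   -> consume only queue1 and queue2
    #   celery -A mycelery worker -X testqueue       -> consume everything except testqueue
    # so a task routed like this is not expected to be picked up by either worker:
    myfunc.apply_async(queue='testqueue')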

kubectl wait for completion of multiple jobs (fail or success)

I want to wait for multiple jobs which can fail or succeed. I wrote a simple script based on an answer from Sebastian N. Its purpose is to wait for either success or failure of a job. The script works fine for one job (which, obviously, can only fail or succeed).
Now for the problem... I need to wait for multiple jobs identified by the same label. The script works fine when all jobs fail or all jobs succeed, but when some jobs fail and some succeed, the kubectl wait times out.
For what I intend to do next it's not necessary to know which jobs failed or succeeded; I just want to know when they have all ended. Here is the "wait" part of the script I wrote (LABEL is the label that identifies the jobs I want to wait for):
# Wait in the background for all jobs matching LABEL to complete or to fail.
kubectl wait --for=condition=complete job -l LABEL --timeout 14400s && exit 0 &
completion_pid=$!
kubectl wait --for=condition=failed job -l LABEL --timeout 14400s && exit 1 &
failure_pid=$!
# Block until whichever of the two background waits finishes first.
wait -n $completion_pid $failure_pid
exit_code=$?
if (( exit_code == 0 )); then
    echo "Job succeeded"
    pkill -P $failure_pid
else
    echo "Job failed"
    pkill -P $completion_pid
fi
If someone is curious why I kill the other kubectl wait command: it's because of the timeout I set. When the jobs succeed, that process ends, but the other one keeps waiting until the timeout is reached. To stop it from running in the background, I simply kill it.
I found a workaround that fits my purpose. It turns out that kubectl logs with the --follow (-f) flag, redirected to /dev/null, effectively "waits" until all jobs are done.
Further explanation:
The --follow flag means the logs are streamed continuously, regardless of the job's finishing state. In addition, redirecting the logs to /dev/null keeps any unwanted text out of the output. I needed to print the output of the logs via Python, so I added another kubectl logs call at the end (which I think is not ideal, but it serves the purpose). I use sleep because I assume there is some procedure after all jobs are completed; without it the logs are not printed. Finally, I use the --tail=-1 flag because my logs are expected to have a large output.
Here is my updated script (this part replaces everything from the script shown in the question):
# wait until all jobs are completed; it doesn't matter whether they failed or succeeded
kubectl logs -f -l LABEL > /dev/null
# sleep for some time, then print the final logs
sleep 2 && kubectl logs -l LABEL --tail=-1
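Since the answer mentions printing the logs via Python, a rough sketch of the same wait-then-collect pattern driven from Python might look like this (the LABEL value is a placeholder and kubectl is assumed to be on PATH):
import subprocess
import time

LABEL = 'job-group=my-batch'  # hypothetical label selector

# Block until every pod matching LABEL stops streaming logs, i.e. all jobs have ended.
subprocess.run(['kubectl', 'logs', '-f', '-l', LABEL], stdout=subprocess.DEVNULL, check=False)

# Give the cluster a moment to settle, then collect the complete logs.
time.sleep(2)
result = subprocess.run(['kubectl', 'logs', '-l', LABEL, '--tail=-1'],
                        capture_output=True, text=True, check=False)
print(result.stdout)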

Printing to stdout in Celery worker shows up as basic.qos: prefetch_count->X

I have a periodic Celery task in which I do quite a bit of printing to stdout.
For example:
print(f"Organization {uuid} updated")
All of these print statements look like this in my worker output:
[2019-10-31 10:36:00,466: DEBUG/MainProcess] basic.qos: prefetch_count->102
The counter at the end is incremented for each print. Why is this? What would I have to change to see stdout?
I run the worker as such:
$ celery -A project worker --purge -l DEBUG
You should not call print() in your tasks. Instead, create a logger with something like logger = get_task_logger(__name__) and use it in your tasks whenever you need to write something to the log at a certain level.
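A minimal sketch of that pattern in a task (the app name project comes from the worker command above; the task name and its body are placeholders):
from celery import Celery
from celery.utils.log import get_task_logger

app = Celery('project')
logger = get_task_logger(__name__)

@app.task
def update_organization(uuid):
    # Goes through Celery's logging setup instead of raw stdout, so the message
    # shows up in the worker log at the chosen level.
    logger.info('Organization %s updated', uuid)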

How to avoid running code on the head node of the cluster

I am using a cluster to run my code, which I launch with a runm file. The runm script is as below:
#!/bin/sh
#SBATCH --job-name="....."
#SBATCH -n 4
#SBATCH --output=bachoutput
#SBATCH --nodes=1-1
#SBATCH -p all
#SBATCH --time=1-01:00:00
matlab < znoDisplay.m > o1
Today, while my code was running, I received an email from the cluster administrator saying: please don't run your code on the head node; use the other nodes. I did a lot of searching but couldn't find out how to move my job from the head node to the other nodes. Is there anything I can add to runm to change this?
Could anyone help me avoid running my code on the head node?
If the Matlab process was running on the head node, it means you did not submit your script; most probably you simply ran it directly.
Make sure to submit it with
sbatch runm
Then you can see it waiting in the queue (or running) with
squeue -u $USER
and check that it is not running on the frontend with
top
Also note @atru's comment about the Matlab options -nodisplay and -nosplash, which are needed for Matlab to work properly in batch mode.

Celery: CELERYD_CONCURRENCY and number of workers

Following another Stack Overflow answer, I've tried to limit Celery's number of worker processes.
After I terminated all the Celery workers, I restarted Celery with a new configuration:
CELERYD_CONCURRENCY = 1 (in Django's settings.py)
Then I typed the following command to check how many Celery worker processes are running:
ps auxww | grep 'celery worker' | grep -v grep | awk '{print $2}'
It returns two PIDs, like 24803 and 24817.
Then I changed the configuration to CELERYD_CONCURRENCY = 2 and restarted Celery.
The same command returns three PIDs, like 24944, 24958, 24959 (as you can see, the last two PIDs are sequential).
This implies that the number of workers increased as I expected.
However, I don't understand why it returns two PIDs even though only one Celery worker is working.
Is there some subsidiary process that helps the worker?
I believe one process always acts as the controller: it listens for tasks and then distributes them to its child processes, which actually perform the work. Therefore, you will always have one more process than the concurrency setting.
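A rough way to see this layout from Python is to ask a running worker for its stats (the app name and broker URL below are placeholder assumptions):
from celery import Celery

app = Celery('proj', broker='amqp://localhost')

stats = app.control.inspect().stats() or {}
for worker_name, info in stats.items():
    # 'pid' is the main (controller) process; with the prefork pool,
    # 'pool' -> 'processes' lists the child PIDs that actually run tasks.
    print(worker_name, 'controller pid:', info.get('pid'))
    print(worker_name, 'pool child pids:', info.get('pool', {}).get('processes'))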

How do I find a complete list of available Torque PBS queues?

Q: How do I find the available PBS queues on the "typical" Torque MPI system?
(asking our admin takes 24+ hours, and the system changes with constant migration)
(for example, "Std8" is one possible queue)
#PBS -q Std8
The admin finally got back to me. To get a list of queues on our HPC system, the command is:
$ qstat -q
qstat -f -Q
shows the available queues and details about their limits (cputime, walltime, associated nodes, etc.).
How about simply "pbsnodes" - that should probably tell you more than you care to know. Or I suppose "qstat -Q".
Run
qhost -q
to see the node-queue mapping.
Another option:
qmgr -c 'p q'
The p and q are short for "print queues".