I want to purge Celery tasks. Celery is running on GCP Kubernetes in my case. Is there a way to do it from terminal? For example via kubectl?
The solution I found was to write to a file shared by both the API and Celery containers. Whenever an interruption is captured, a flag in this file is set to true. Inside the Celery containers I periodically check the contents of that file; if the flag is set to true, I gracefully clean things up and raise an error.
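A minimal sketch of that check inside a task, assuming the flag is a small JSON file at a path both containers mount; the path, task name, and per-item work below are made up:

import json
from pathlib import Path

from celery import shared_task

# Hypothetical path mounted into both the API and the Celery containers.
FLAG_FILE = Path("/shared/interrupt_flag.json")

def interrupted() -> bool:
    # True once the API side has written {"interrupted": true} to the shared file.
    if not FLAG_FILE.exists():
        return False
    return json.loads(FLAG_FILE.read_text()).get("interrupted", False)

@shared_task(bind=True)
def long_running_task(self, items):
    done = []
    for item in items:
        if interrupted():
            # Gracefully clear things up, then surface an error as described above.
            raise RuntimeError(f"Interrupted after {len(done)} of {len(items)} items")
        done.append(item)  # placeholder for the real per-item work
    return done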
Does this solve your problem? How can I properly kill a Celery task in a Kubernetes environment?
An alternate solution may be:
$ celery -A proj purge
or
from proj.celery import app

# Discards all messages waiting in the task queues.
app.control.purge()
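Keep in mind that purge only discards messages still waiting in the broker; tasks a worker has already picked up keep running. From a terminal on Kubernetes you would typically kubectl exec into a worker pod and run the celery -A proj purge command there. To stop a task that is already executing, a hedged sketch using Celery's revoke control command (the task id is a placeholder):

from proj.celery import app

task_id = "<task-id>"  # placeholder: the id of the task you want to stop
# terminate=True also signals the worker process running the task to kill it.
app.control.revoke(task_id, terminate=True)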
I have a FastAPI app and I use Celery for some async tasks. I also use Docker, so FastAPI runs in one container and Celery in another. Now I am moving to break the workers into different queues, and they will run in different containers. Right now I am using almost the same image for FastAPI and Celery, but for this new worker I will end up with an image way bigger than it should be, since it includes code and packages that the worker doesn't need. To get around that I now have two different Dockerfiles, one for each worker, but in both of them I have the exact same file for setting up the Celery app.
This is the Celery app I was setting up:
from celery import Celery

# config here is the project's own settings module (its import is omitted in the original).
celery_app = Celery(
    broker=config.celery_settings.broker_url,
    backend=config.celery_settings.result_backend,
    include=[
        "src.iam.service_layer.tasks",
        "src.receipt_tracking.service_layer.tasks",
        "src.cfe_scraper.tasks",
    ],
)
And the idea is to spin off src.cfe_scraper.tasks, and in that Docker image I will not have src.iam.service_layer.tasks or src.receipt_tracking.service_layer.tasks. But when I try to build the image I get an error saying that those paths don't exist, which is correct for that case; yet if I simply delete the include argument, the worker won't have any tasks registered. Is there an easy way to solve that without having to maintain two modules that set up different Celery apps?
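One possible approach (a sketch, not from the original thread): drive the include list from an environment variable so the same setup module works in every image. CELERY_INCLUDE is a made-up variable name, and the broker/backend defaults are placeholders:

import os

from celery import Celery

# CELERY_INCLUDE is a hypothetical env var: a comma-separated list of task modules.
# The full image would set all three modules; the scraper image only "src.cfe_scraper.tasks".
include = [m for m in os.environ.get("CELERY_INCLUDE", "").split(",") if m]

celery_app = Celery(
    broker=os.environ.get("CELERY_BROKER_URL", "redis://localhost:6379/0"),       # placeholder
    backend=os.environ.get("CELERY_RESULT_BACKEND", "redis://localhost:6379/1"),  # placeholder
    include=include,
)

Celery's autodiscover_tasks() is another option if the task modules follow a consistent package layout, but the environment-variable route keeps the explicit include list from the original setup.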
From the celery worker help output:
> celery worker -h
...
Embedded Beat Options:
  -B, --beat  Also run the celery beat periodic task scheduler. Please note that
              there must only be one instance of this service.
              .. note:: -B is meant to be used for development purposes. For
              production environment, you need to start celery beat separately.
This also appears in the docs.
You can also embed beat inside the worker by enabling the workers -B
option, this is convenient if you’ll never run more than one worker
node, but it’s not commonly used and for that reason isn’t recommended
for production use:
celery -A proj worker -B
But it's not actually explained why it's "bad" to use this in production. Would love some insight.
The --beat option will start a beat scheduler along with the worker,
but you only need one beat scheduler.
In a production environment you usually have more than one worker running, so using the --beat option can be a disaster.
For example: you have an event scheduled at 12 a.m. each day.
If you start two beat processes, the event will run twice at 12 a.m. each day.
If you'll never run more than one worker node, the --beat option is just fine.
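For illustration (a sketch; the app name, broker, and task are placeholders), a schedule like the one below is enqueued by every running beat instance, so two workers started with -B would each fire it at midnight:

from celery import Celery
from celery.schedules import crontab

app = Celery("proj", broker="redis://localhost:6379/0")  # placeholder broker

app.conf.beat_schedule = {
    # Hypothetical daily job; each beat instance that is running will enqueue it.
    "nightly-report": {
        "task": "proj.tasks.nightly_report",
        "schedule": crontab(hour=0, minute=0),
    },
}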
I have two systemd services:
one handles my Celery workers (10 queues for different tasks) and one handles celery beat.
After deploying new code I restart the Celery worker service to pick up the new tasks and updated Celery jobs.
Should I restart celery beat along with the Celery worker service too,
or does it pick up the new tasks automatically?
It depends on what type of scheduler you're using.
If it's the default PersistentScheduler, then yes: you need to restart the beat daemon so it can pick up the new configuration from the beat_schedule setting.
But if you're using something like django-celery-beat, which allows managing periodic tasks at runtime, then you don't have to restart celery beat.
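For reference, a sketch of pointing beat at django-celery-beat's database-backed scheduler (the app name and broker are placeholders, and it assumes django-celery-beat is installed and Django is configured); with this scheduler, schedule changes made at runtime are picked up without restarting beat:

from celery import Celery

app = Celery("proj", broker="redis://localhost:6379/0")  # placeholder broker

# Replace the default PersistentScheduler (which reads beat_schedule only at startup)
# with the database-backed scheduler from django-celery-beat.
app.conf.beat_scheduler = "django_celery_beat.schedulers:DatabaseScheduler"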
I'm looking to fully understand Jobs in Kubernetes.
I have successfully created and executed a Job, but I do not see the use case.
Not being able to rerun a Job, or to actively listen for its completion, makes me think it is a bit difficult to manage.
Is anyone using them? What is the use case?
Thank you.
A Job retries pods until they complete, so that you can tolerate errors that cause pods to be deleted.
If you want to run a Job repeatedly and periodically, you can use the CronJob resource (alpha at the time) or cronetes.
Some Helm charts use Jobs to run install, setup, or test commands on clusters as part of installing services.
If you save the YAML for the Job, then you can re-run it by deleting the old Job and creating it again, or by editing the YAML to change the name (or use e.g. sed in a script).
You can watch a job's status with this command:
kubectl get jobs myjob -w
The -w option watches for changes. You are looking for the SUCCESSFUL column to show 1.
Here is a shell command loop to wait for job completion (e.g. in a script):
until kubectl get jobs myjob -o jsonpath='{.status.conditions[?(@.type=="Complete")].status}' | grep True ; do sleep 1 ; done
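If you would rather watch for completion from Python than from a shell loop, a sketch using the official kubernetes client (the job name and namespace are placeholders):

from kubernetes import client, config, watch

config.load_kube_config()  # use load_incluster_config() when running inside a pod
batch = client.BatchV1Api()

w = watch.Watch()
for event in w.stream(batch.list_namespaced_job,
                      namespace="default",                    # placeholder namespace
                      field_selector="metadata.name=myjob"):  # placeholder job name
    status = event["object"].status
    if status.succeeded:  # set once the Job has completed successfully
        print("job finished")
        w.stop()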
One use case can be taking a backup of a DB. But, as already mentioned, there is some overhead to running a Job: when a Job completes, its pods are not deleted, so you need to manually delete the Job (which will also delete the pods it created). So the recommended option is to use a CronJob instead of bare Jobs.
I have a Kubernetes cluster running Django, Celery, RabbitMq and Celery Beat. I have several periodic tasks spaced out throughout the day (so as to keep server load down). There are only a few hours when no tasks are running, and I want to limit my rolling-updates to those times, without having to track it manually. So I'm looking for a solution that will allow me to fire off a script or task of some sort that will monitor the Celery server, and trigger a rolling update once there's a window in which no tasks are actively running. There are two possible ways I thought of doing this, but I'm not sure which is best, nor how to implement either one.
1. Run a script (bash or otherwise) that checks up on the Celery server every few minutes, and initiates the rolling update if the server is inactive (see the sketch after this question).
2. Increment the Celery app name before each update (in the Beat run command, the Celery run command, and in the celery.py config file), create a new Celery pod, rolling-update the Beat pod, and then delete the old Celery pod 12 hours later (a reasonable time span for all running tasks to finish).
Any thoughts would be greatly appreciated.
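A sketch of the first option (not from the original post): poll the workers with Celery's inspect API and only trigger the rolling update once nothing is active or reserved. The app import, deployment name, and polling interval are placeholders:

import subprocess
import time

from proj.celery import app  # placeholder import, as in the earlier examples

def workers_idle() -> bool:
    inspect = app.control.inspect(timeout=5)
    active = inspect.active() or {}      # tasks currently executing, per worker
    reserved = inspect.reserved() or {}  # tasks already fetched but not yet started
    return not any(active.values()) and not any(reserved.values())

while not workers_idle():
    time.sleep(120)  # check every couple of minutes

# Kick off the rolling update once the workers are quiet.
subprocess.run(
    ["kubectl", "rollout", "restart", "deployment/celery-worker"],  # placeholder name
    check=True,
)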