How to have celery expire results when using a database backend - celery

I'm not sure I understand how result_expires works.
I read,
result_expires
Default: Expire after 1 day.
Time (in seconds, or a timedelta object) for when after stored task tombstones will be deleted.
A built-in periodic task will delete the results after this time (celery.backend_cleanup), assuming that celery beat is enabled. The task runs daily at 4am.
...
When using the database backend, celery beat must be running for the results to be expired.
(from here: http://docs.celeryproject.org/en/latest/userguide/configuration.html#std:setting-result_expires)
So, in order for this to work, I have to actually do something like this:
python -m celery -A myapp beat -l info --detach
?
Is that what the documentation is referring to by "celery beat is enabled"? Or, rather than executing this manually, there is some configuration that needs to be set which would cause celery beat to be called automatically?

Re: celery beat--you are correct. If you use a database backend, you have to run celery beat as you posted in your original post. By default celery beat sets up a daily task that will delete older results from the results database. If you are using a redis results backend, you do not have to run celery beat. How you choose to run celery beat is up to you, personally, we do it via systemd.
If you want to configure the default expiration time to be something other than the default 1 day, you can use the result_expires setting in celery to set the number of seconds after a result is recorded that it should be deleted. e.g., 1800 for 30 minutes.

Related

How to prevent celery.backend_cleanup from executing in default queue

I am using python + flask + SQS and I'm also using celery beat to execute some scheduled tasks.
Recently I went from having one single default "celery" queue to execute all my tasks to having dedicated queues/workers for each task. This includes tasks scheduled by celery beat which now all go to a queue named "scheduler".
Before dropping the "celery" queue, I monitored it to see if any tasks would wind up in that queue. To my surprise, they did.
Since I had no worker consuming from that queue, I could easily inspect the messages which piled up using the AWS console. What is saw was that all tasks were celery.backend_cleanup!!!
I cannot find out from the celery docs how do I prevent this celery.backend_cleanup from getting tossed into this default "celery" queue which I want to get rid of! And the docs on beat do not show an option to pass a queue name. So how do I do this?
This is how I am starting celery beat:
/venv/bin/celery -A backend.app.celery beat -l info --pidfile=
And this is how I am starting the worker
/venv/bin/celery -A backend.app.celery worker -l info -c 2 -Ofair -Q scheduler
Keep in mind, I don't want to stop backend_cleanup from executing, I just want it to go in whatever queue I specify.
Thanks ahead for the assistance!
You can override this in the beat task setup. You could also change the scheduled time to run here if you wanted to.
app.conf.beat_schedule = {
'backend_cleanup': {
'task': 'celery.backend_cleanup',
'options': {'queue': <name>,
'exchange': <name>,
'routing_key': <name>}
}
}

Do I need to enable celery beat to activate result_expires?

I have django celery up and running.
Do I have something else in order to activate https://docs.celeryproject.org/en/master/userguide/configuration.html#std:setting-result_expires or it works already?
I don't have celery beat installed.
No. You do not need Celery beat for that. Result expiry is handled internally by Celery and/or the backend you use in your project.
However, keep in mind this:
Note
For the moment this only works with the AMQP, database, cache, Couchbase, and Redis backends.
When using the database backend, celery beat must be running for the results to be expired.

Is a crontab enough to schedule a celery task to run periodically?

I have a periodic task that uses a crontab to run every day at 1:01 AM using
run_every = crontab(hour=1, minute=1)
Once I get my server up and running, is that enough to trigger the task to run once a day? Or do I also need to use a database scheduler?
Yes. It should be enough as Celery beat has own state file that is enough to run everything as you require.

Change timeout for builtin celery tasks (i.e. celery.backend_cleanup)

We're using Celery 4.2.1 and Redis with global soft and hard timeouts set for our tasks. All of our custom tasks are designed to stay under the limits, but every day the builtin task backend_cleanup task ends up forcibly killed by the timeouts.
I'd rather not have to raise our global timeout just to accommodate builtin Celery tasks. Is there a way to set the timeout of these builtin tasks directly?
I've had trouble finding any documentation on this or even anyone hitting the same problem.
Relevant source from celery/app/builtins.py:
#connect_on_app_finalize
def add_backend_cleanup_task(app):
"""Task used to clean up expired results.
If the configured backend requires periodic cleanup this task is also
automatically configured to run every day at 4am (requires
:program:`celery beat` to be running).
"""
#app.task(name='celery.backend_cleanup', shared=False, lazy=False)
def backend_cleanup():
app.backend.cleanup()
return backend_cleanup
You may set backend cleanup schedule directly in celery.py.
app.conf.beat_schedule = {
'backend_cleanup': {
'task': 'celery.backend_cleanup',
'schedule': 600, # 10 minutes
},
}
And then run the beat celery process:
celery -A YOUR_APP_NAME beat -l info --detach

Gracefully update running celery pod in Kubernetes

I have a Kubernetes cluster running Django, Celery, RabbitMq and Celery Beat. I have several periodic tasks spaced out throughout the day (so as to keep server load down). There are only a few hours when no tasks are running, and I want to limit my rolling-updates to those times, without having to track it manually. So I'm looking for a solution that will allow me to fire off a script or task of some sort that will monitor the Celery server, and trigger a rolling update once there's a window in which no tasks are actively running. There are two possible ways I thought of doing this, but I'm not sure which is best, nor how to implement either one.
Run a script (bash or otherwise) that checks up on the Celery server every few minutes, and initiates the rolling-update if the server is inactive
Increment the celery app name before each update (in the Beat run command, the Celery run command, and in the celery.py config file), create a new Celery pod, rolling-update the Beat pod, and then delete the old Celery 12 hours later (a reasonable time span for all running tasks to finish)
Any thoughts would be greatly appreciated.