I have a bunch of Celery Beat tasks running at different times of day, but one particular task, scheduled at 8:00 AM to send birthday messages, also gets executed when Beat's cleanup happens at 4:00 AM, so the task runs twice a day. I've noticed this happens when I restart Celery Beat the previous day. How do I get around this and tell Celery not to execute it at 4:00 AM?
I have a periodic task that uses a crontab to run every day at 1:01 AM using
run_every = crontab(hour=1, minute=1)
Once I get my server up and running, is that enough to trigger the task to run once a day? Or do I also need to use a database scheduler?
Yes, that should be enough: Celery Beat keeps its own state file (celerybeat-schedule by default) with the last run times, so it will run the task once a day as you require, and no database scheduler is needed.
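For reference, a minimal sketch of how such a schedule can be declared with the beat_schedule setting; the project name proj, broker URL, and task name here are illustrative assumptions, not from the question:

from celery import Celery
from celery.schedules import crontab

app = Celery("proj", broker="redis://localhost:6379/0")  # broker URL is an assumption

app.conf.beat_schedule = {
    # Runs proj.tasks.daily_job every day at 1:01 AM, matching run_every above.
    "daily-job": {
        "task": "proj.tasks.daily_job",
        "schedule": crontab(hour=1, minute=1),
    },
}

@app.task(name="proj.tasks.daily_job")
def daily_job():
    print("daily job ran")

Start the scheduler with celery -A proj beat and a worker with celery -A proj worker; Beat persists its last-run times in the celerybeat-schedule state file between restarts.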
I have one Celery Beat task that schedules other scraping tasks.
When those tasks are not processed, the queue starts to grow.
I know Celery uses a backend database, but it only stores: id, task_id, status, result, date_done, traceback.
My first idea is to switch from Celery Beat to having the tasks reschedule themselves, but some tasks are unconnected or could get lost, so Celery Beat is useful in those cases.
My second idea is to add my own logging, e.g. a table where I save each task's id and context, so I can check whether a task already exists.
Maybe you have a better approach? Thanks
Celery tasks can be given an expires argument, so tasks that aren't picked up in time are discarded instead of piling up in the queue:
http://docs.celeryproject.org/en/latest/userguide/calling.html#expiration
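For example, a sketch of both ways to attach an expiry; the module paths, task name, and timing values are made up:

from proj.celery import app  # hypothetical app module
from proj.tasks import scrape  # hypothetical task

# Ad-hoc call: if no worker starts this within 10 minutes, it is revoked
# instead of sitting in the queue.
scrape.apply_async(args=("https://example.com",), expires=600)

# The same option can be set on a beat schedule entry, so a run that is
# still unprocessed when the next one is due gets dropped:
app.conf.beat_schedule = {
    "scrape-hourly": {
        "task": "proj.tasks.scrape",
        "schedule": 3600.0,
        "options": {"expires": 3500},
    },
}

Note that expires doesn't delay the task; it discards tasks that were not picked up in time, which keeps the queue from growing when the workers fall behind.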
I have a Kubernetes cluster running Django, Celery, RabbitMQ, and Celery Beat. I have several periodic tasks spaced out throughout the day (so as to keep server load down). There are only a few hours when no tasks are running, and I want to limit my rolling updates to those times, without having to track it manually. So I'm looking for a solution that will let me fire off a script or task of some sort that monitors the Celery server and triggers a rolling update once there's a window in which no tasks are actively running. There are two possible ways I thought of doing this, but I'm not sure which is best, nor how to implement either one:
1. Run a script (bash or otherwise) that checks on the Celery server every few minutes and initiates the rolling update once the server is inactive (see the sketch below).
2. Increment the Celery app name before each update (in the Beat run command, the Celery run command, and in the celery.py config file), create a new Celery pod, rolling-update the Beat pod, and then delete the old Celery pod 12 hours later (a reasonable time span for all running tasks to finish).
Any thoughts would be greatly appreciated.
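For the first option, a rough sketch of a watcher that polls the workers and only kicks off the rollout when nothing is active, scheduled, or reserved; the import path proj.celery and the deployment name celery-worker are assumptions:

import subprocess
import time

from proj.celery import app  # assumed location of the Celery app

def workers_idle():
    inspect = app.control.inspect()
    # Each call returns {worker_name: [tasks...]}, or None if no worker replies.
    for tasks in (inspect.active(), inspect.scheduled(), inspect.reserved()):
        if tasks and any(tasks.values()):
            return False
    return True

while True:
    if workers_idle():
        subprocess.run(
            ["kubectl", "rollout", "restart", "deployment/celery-worker"],
            check=True,
        )
        break
    time.sleep(300)  # re-check every five minutes

One caveat: Beat could dispatch a task between the idle check and the rollout, so checking reserved tasks (as above) or pausing Beat first narrows that race but doesn't eliminate it.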
Recently, I had to restart some inexplicably idle workers run by supervisord. We are thinking about adding a periodic restart, say once or twice a day.
This could easily be done using supervisorctl, but is there any chance tasks will be lost while the restart occurs?
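In case it helps: a warm shutdown (the SIGTERM that supervisord sends by default) lets a worker finish its current tasks before exiting, and with late acknowledgement unfinished tasks are redelivered rather than lost. A minimal sketch of the relevant settings (setting names per Celery 4+; the app location is an assumption, and this is a starting point, not a guarantee):

from proj.celery import app  # assumed app location

app.conf.task_acks_late = True           # ack only after the task finishes, so a
                                         # killed worker's task gets re-queued
app.conf.worker_prefetch_multiplier = 1  # don't prefetch extra messages that a
                                         # restarting worker would hand back

With those in place, a supervisorctl restart of the worker program should not lose tasks, provided the tasks are idempotent enough to survive an occasional redelivery.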
I am trying to create a bunch of Celery tasks asynchronously on the fly. Say I start 1000 tasks asynchronously and have only one celeryd process running to execute them. How many threads will Celery create to handle these tasks?
If Celery starts multiple threads automatically to process the task queue, how do I limit it to executing only 100 at any given point in time?
Thanks.
It starts as many as you specify with the --concurrency option (typically passed through CELERYD_OPTS); note that with the default prefork pool these are worker processes rather than threads, and the default equals the number of CPUs.
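Concretely, a couple of illustrative ways to cap it at 100; the project name proj and the value are assumptions:

# On the command line:
#   celery -A proj worker --concurrency=100
# Or in the init-script config that CELERYD_OPTS refers to:
#   CELERYD_OPTS="--concurrency=100"

# Equivalent Python configuration:
from proj.celery import app  # assumed app location

app.conf.worker_concurrency = 100  # number of prefork worker processes

If you want actual threads or greenlets instead of processes, the pool can be switched (e.g. --pool=gevent) while keeping the same concurrency limit.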