Celery Beat runs duplicate tasks - celery

I have one celery beat task that schedules other scraping tasks.
When those tasks are not processed, the queue starts to grow.
I know Celery uses a backend DB, but it only stores: id, task_id, status, result, date_done, traceback.
My first idea is to switch from celery beat to tasks that reschedule themselves, but some tasks are unconnected or could get lost, so celery beat is still useful in those cases.
My second idea is to add my own logging, e.g. a table where I save the task id and task context, so I can find out whether a task already exists.
Maybe you have a better approach? Thanks.

Celery tasks can be given an expiration with the expires argument:
http://docs.celeryproject.org/en/latest/userguide/calling.html#expiration
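For example (a minimal sketch; the task names, broker URL, and ten-minute expiry below are placeholders, not taken from the question):

from celery import Celery

app = Celery('scraper', broker='redis://localhost:6379/0')

@app.task
def scrape_page(url):
    ...  # the actual scraping work

@app.task
def schedule_scrapes():
    for url in ('https://example.com/a', 'https://example.com/b'):
        # If a worker does not receive this task within 10 minutes,
        # it is marked as expired and skipped instead of running late,
        # so an unprocessed backlog does not keep executing.
        scrape_page.apply_async(args=[url], expires=600)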

Related

How to prevent celery.backend_cleanup from executing in default queue

I am using python + flask + SQS and I'm also using celery beat to execute some scheduled tasks.
Recently I went from having one single default "celery" queue to execute all my tasks to having dedicated queues/workers for each task. This includes tasks scheduled by celery beat which now all go to a queue named "scheduler".
Before dropping the "celery" queue, I monitored it to see if any tasks would wind up in that queue. To my surprise, they did.
Since I had no worker consuming from that queue, I could easily inspect the messages which piled up using the AWS console. What I saw was that all the tasks were celery.backend_cleanup!
I cannot figure out from the celery docs how to prevent celery.backend_cleanup from getting tossed into the default "celery" queue, which I want to get rid of. The docs on beat do not show an option to pass a queue name. So how do I do this?
This is how I am starting celery beat:
/venv/bin/celery -A backend.app.celery beat -l info --pidfile=
And this is how I am starting the worker
/venv/bin/celery -A backend.app.celery worker -l info -c 2 -Ofair -Q scheduler
Keep in mind, I don't want to stop backend_cleanup from executing, I just want it to go in whatever queue I specify.
Thanks ahead for the assistance!
You can override this in the beat schedule configuration. You could also change the scheduled run time here if you wanted to.
from celery.schedules import crontab

app.conf.beat_schedule = {
    # Use the same entry name as the built-in cleanup entry so this
    # overrides it instead of adding a second copy.
    'celery.backend_cleanup': {
        'task': 'celery.backend_cleanup',
        'schedule': crontab(minute='0', hour='4'),  # the default run time; change if you like
        'options': {'queue': <name>,
                    'exchange': <name>,
                    'routing_key': <name>},
    },
}

Django celery delete specific tasks

We know we have some badly formed tasks in the celery queue that are crashing our workers. Is there an easy way to manually delete them?
We don't want to flush the whole thing as there might be some important emails to be sent...
Use Celery Flower to manage and monitor Celery tasks. Refer to https://flower.readthedocs.io/en/latest/
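If you already know the ids of the offending tasks, you can also revoke them programmatically. A minimal sketch, assuming app is your existing Celery application instance and the ids are known (Flower itself can be started with celery -A yourproject flower):

# 'bad_task_ids' is a placeholder; fill it with the ids of the badly formed tasks.
bad_task_ids = ['a1b2c3d4-...', 'e5f6a7b8-...']

for task_id in bad_task_ids:
    # Workers discard a revoked task when they pull it from the queue,
    # so the rest of the queue (e.g. the pending emails) is left alone.
    app.control.revoke(task_id)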

celery beat running task on cleanup, how to stop it

I have a bunch of celery beat tasks running at different times of day, but there is one particular task at 8:00 AM that sends birthday messages, and it also gets executed when beat's cleanup happens at 4:00 AM, so my task runs twice a day. I noticed this happens when I restart celery beat the previous day. How do I get around this and tell celery not to execute it at 4:00 AM?

What if i schedule tasks for celery to perform every minute and it is not able to complete it in time?

If I schedule the task to run every minute and it cannot be completed in time (one minute), will the task wait in the queue, and will it keep going on like this? If so, after a few hours it will be overloaded. Is there any solution for this kind of problem?
I am using the beat and worker combination for this. It works fine when there are few records to process, but for a large database I think this could cause problems.
Tasks are assigned to a queue (RabbitMQ, for example).
Workers are queue consumers; more workers (or a worker with higher concurrency) means more tasks can be handled in parallel.
Your periodic task produces messages of the same type (I guess), and your celery router routes them to the same queue.
Just set your workers to consume messages from that queue and that's all. See the routing sketch after the command below.
celery worker -A celeryapp:app -l info -Q default -c 4 -n default_worker#%h -Ofair
In the example above I used -c 4 for a concurrency of four (equivalent to 4 consumers/workers). You can also start more workers and let them consume from the same queue with -Q <queue_name> (in my example it's the default queue).
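If your router does not already send those messages to the queue the worker listens on, you can route them explicitly. A small sketch, assuming a task called tasks.scrape_records (the name is a placeholder) and the default queue from the command above:

# Route the periodic task's messages to the 'default' queue the worker consumes.
app.conf.task_routes = {
    'tasks.scrape_records': {'queue': 'default'},
}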
EDIT:
When using celery (the worker code) you instantiate a Celery object. In the Celery constructor you set your broker and backend (celery uses them as parts of the system).
for more info: http://docs.celeryproject.org/en/latest/getting-started/first-steps-with-celery.html#application
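A minimal sketch of that constructor call, assuming Redis for both the broker and the result backend (swap in your own URLs):

from celery import Celery

# The broker transports task messages; the backend stores task state/results.
app = Celery(
    'celeryapp',
    broker='redis://localhost:6379/0',
    backend='redis://localhost:6379/1',
)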

Number of celery tasks executed at a given point of time

I am trying to create a bunch of celery tasks asynchronously on the fly. Say there are 1000 tasks I start asynchronously and I have only one celeryd process running to execute tasks. How many threads will be created by celery to handle these tasks?
If there are multiple threads that celery starts automatically to process the task queue, how do I limit celery to execute only 100 threads at a given point in time?
Thanks.
It starts as many as you specify with the CELERYD_OPTS concurrency parameter, which is also discussed here.
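For example, to cap a single worker at 100 concurrent executions (the app name is a placeholder; with the old init-script style you would put -c 100 into CELERYD_OPTS instead):

celery -A proj worker -l info -c 100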