I'm trying to set up retries on a Celery task, but each retry needs to happen after a specific delay.
Is it possible to do that, and how?
For example, the first retry would be after 5 seconds, the next after 25, the next after 125.
@app.task(bind=True, max_retries=10, delay=5)
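The schedule in the example (5, 25, 125 seconds) is a power of 5, so one way to express it is as a function of the retry count. The sketch below is plain Python with no Celery import; `retry_countdown` is a hypothetical helper name. The assumption is that, inside a bound task, its result would be passed as the `countdown` argument of `self.retry`, with `self.request.retries` as the input.

```python
def retry_countdown(retries, base=5):
    """Delay in seconds before the next retry.

    `retries` is the number of retries already attempted (0 on the first
    failure), so the sequence is base, base**2, base**3, ...
    """
    return base ** (retries + 1)


# Assumed use inside a bound Celery task (not executed here):
#     raise self.retry(exc=exc, countdown=retry_countdown(self.request.retries))

delays = [retry_countdown(n) for n in range(3)]
print(delays)  # [5, 25, 125]
```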
We have registered the activities with the auto-heartbeat configuration EnableAutoHeartBeat: true and also configured the activity option HeartbeatTimeout: 15Min in the activity implementation.
Do we still need to explicitly send heartbeats using activity.heartbeat(), or is that taken care of automatically by the Go client library?
If it's automatic, what will happen if the activity is waiting for an external API response with, say, a >15Min delay?
What will happen to the activity heartbeat if the worker executing the activity crashes or is killed?
Will Cadence retry the activities due to heart-beat failures?
No, the SDK will take care of it with this config.
The auto heartbeat sends a heartbeat at every interval. The interval is 80% of the heartbeat timeout (15 minutes in your case), so the activity won't time out as long as the activity worker is still alive.
So you should use a smaller heartbeat timeout; 10~20s is ideal.
The activity will fail with "heartbeat timeout".
Yes, if you have set a retry policy.
See my other answer for retry policy:
How to set proper timeout values for Cadence activities(local and regular activities, with or without retry)?
Example
Let's say your activity implementation is waiting on an AWS SDK API call for 2 hours (max API timeout configured).
You should still use 10-20 s for the heartbeat timeout, and also use 2 hours for the activity start-to-close timeout.
The heartbeat timeout is for detecting that the host is no longer alive, so that the activity can be restarted as early as possible.
Imagine this case:
Because the API call takes 2 hours, suppose the activity worker gets restarted during those 2 hours.
If the HB timeout is 15 minutes, then Cadence will retry this activity after 15 minutes.
If the HB timeout is 10s, then Cadence can retry it after 10s, because it will hit the HB timeout within 10 seconds.
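To put numbers on the two cases above, here is a small plain-Python sketch (no Cadence client involved). It only applies the 80% rule stated in the answer; `auto_heartbeat_interval_s` is a hypothetical helper name, not part of any SDK.

```python
def auto_heartbeat_interval_s(heartbeat_timeout_s):
    """Interval between automatic heartbeats: 80% of the heartbeat timeout."""
    return heartbeat_timeout_s * 4 / 5


for label, timeout_s in [("15 minutes", 15 * 60), ("10 seconds", 10)]:
    interval = auto_heartbeat_interval_s(timeout_s)
    # A crashed worker stops heartbeating, so the failure is only noticed
    # once the heartbeat timeout elapses without a heartbeat.
    print(f"HB timeout {label}: heartbeat every {interval:.0f}s, "
          f"dead worker noticed after roughly {timeout_s}s")
```

With a 15-minute timeout the heartbeats go out every 720 seconds; with a 10-second timeout they go out every 8 seconds, which is why the shorter timeout lets a lost worker be detected much sooner.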
For example, I have a VPS with 2 shared CPUs, 10,000 receivers, and a task that should not be executed more than 15 times per second. Also, if the request receives a 429 code, it needs to be made again after 1800 seconds.
for i in receivers_arr:
    send_message.delay(i)
@celery_app.task(ignore_result=True,
                 time_limit=5,
                 autoretry_for=(Exception,),
                 retry_backoff=1800,
                 retry_kwargs={'max_retries': 2},
                 retry_jitter=False,
                 rate_limit=1)
def send_message(reciever_id):
    code = send(reciever_id)
    if code == 429:
        raise Exception
How do I choose the right number of workers and concurrency? Also, am I using the decorator arguments correctly (at the moment I have 3 workers with a concurrency of 4)? The main goal is to avoid RuntimeError: can't start new thread.
I want all asynchronous tasks in my app to retry on any exception, and I also want the retries to follow exponential backoff.
@celery_app.task(autoretry_for=(Exception,))
def some_task():
    ...
In my configuration I have
CELERY_TASK_ANNOTATIONS = {'*': {'max_retries': 5, 'retry_backoff': 5}}
The max_retries setting works and all tasks are now retried 5 times before failing. But all of them are retried after 180 seconds.
I want some way for all the tasks to follow retry_backoff without having to specify it for each of them so that I can change it anytime at one place.
It looks like according to the Celery documentation the property you want to set is retry_backoff_max.
Task.retry_backoff_max
A number. If retry_backoff is enabled, this option will set a maximum
delay in seconds between task autoretries. By default, this option is
set to 600, which is 10 minutes.
retry_backoff can be a number or a boolean, and the backoff behaves differently depending on which it is. For an exponential backoff, it appears you want to set this to True.
Task.retry_backoff
A boolean, or a number. If this option is set to
True, autoretries will be delayed following the rules of exponential
backoff. The first retry will have a delay of 1 second, the second
retry will have a delay of 2 seconds, the third will delay 4 seconds,
the fourth will delay 8 seconds, and so on. (However, this delay value
is modified by retry_jitter, if it is enabled.) If this option is set
to a number, it is used as a delay factor. For example, if this option
is set to 3, the first retry will delay 3 seconds, the second will
delay 6 seconds, the third will delay 12 seconds, the fourth will
delay 24 seconds, and so on. By default, this option is set to False,
and autoretries will not be delayed.
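The documented rule can be written out as a small calculation. This is a sketch of the behaviour described in the quoted text, not Celery's own code: `autoretry_delay` is a hypothetical name, and the jitter branch assumes that jitter picks a random delay between zero and the computed value.

```python
import random


def autoretry_delay(retries, factor=1, maximum=600, jitter=False):
    """Delay in seconds before the next autoretry.

    retries: number of retries already attempted (0 for the first retry).
    factor:  1 models retry_backoff=True, any other number models
             retry_backoff=<number>.
    maximum: models retry_backoff_max (600 by default per the docs above).
    """
    delay = min(maximum, factor * (2 ** retries))
    if jitter:
        # Assumption: jitter treats the computed delay as an upper bound.
        delay = random.randrange(delay + 1)
    return delay


print([autoretry_delay(n) for n in range(4)])            # [1, 2, 4, 8]
print([autoretry_delay(n, factor=3) for n in range(4)])  # [3, 6, 12, 24]
print(autoretry_delay(0, factor=1800))                   # 600
```

The last line shows the interaction with the cap: under this rule, a factor larger than retry_backoff_max is limited to the maximum unless the maximum is raised as well.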
What you can do to avoid changing this in multiple places is to have a global variable, say global_retry_backoff=5, that you use in your task decorators: @celery_app.task(autoretry_for=(Exception,), retry_backoff=global_retry_backoff).
1 ) Celery chain.
On the doc I read this:
Here’s a simple chain, the first task executes passing its return value to the next task in the chain, and so on.
>>> from celery import chain
>>> # 2 + 2 + 4 + 8
>>> res = chain(add.s(2, 2), add.s(4), add.s(8))()
>>> res.get()
16
But where exactly is a chain item's result passed to the next chain item? On the Celery server side, or is it passed to my app, which then passes it to the next chain item?
This is important to me because my results are quite big to pass to the app, and I want all of this messaging to happen within the Celery server.
2 ) Celery group.
>>> g = group(add.s(i) for i in xrange(10))
>>> g(10).get()
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
Can I be sure that these tasks will be executed together as much as possible? Will Celery give priority to a certain group once the first task of that group has started executing?
For example, I have 100 requests, each request runs a group of tasks, and I don't want tasks from different groups to be interleaved. Otherwise the first request to start processing can be the last to complete, while its last tasks wait for free workers that are busy with tasks from other requests. It seems better for a group's tasks to be executed together as much as possible.
I would really appreciate your help.
1. Celery Chain
Results are passed on the Celery side using a message broker such as RabbitMQ. Results are stored using the result backend (explicitly required for chord execution). You can verify this by running your Celery worker with loglevel 'INFO' and observing how tasks are invoked.
Celery maintains a dependency graph once you invoke tasks, so it knows exactly how to chain your tasks.
Consider callbacks, where you link two different tasks:
http://docs.celeryproject.org/en/latest/userguide/canvas.html#callbacks
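To make the data flow of the chain from the question concrete, here is a plain-Python model with no Celery involved. It only mimics how each step's return value becomes the first argument of the next step; it says nothing about where the steps run. `run_chain` is a hypothetical helper, and `add` stands in for the task of the same name.

```python
def add(x, y):
    return x + y


def run_chain(first_args, *later_args):
    """Model of chain(add.s(2, 2), add.s(4), add.s(8)).

    The first step receives all of its arguments; each later step receives
    the previous result as its first argument plus its own extra argument.
    """
    result = add(*first_args)
    for extra in later_args:
        result = add(result, extra)
    return result


print(run_chain((2, 2), 4, 8))  # 16
```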
2. Celery Group
When you call tasks in a group, Celery invokes them in parallel. The Celery workers pick them up according to how much workload they can handle. If you invoke more tasks than your workers can handle, it is certainly possible that the first few tasks will be executed first and the workers will pick up the rest gradually.
If you have a very large number of tasks to invoke in parallel, it is better to invoke them in chunks of a certain pool size.
You can specify the priority of tasks, as mentioned in answer
Completion of the tasks in a group depends on how much time each task takes. Celery tries to schedule tasks as fairly as possible.
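The chunking suggestion above can be sketched in plain Python. `chunked` is a hypothetical helper that only splits the work into batches; submitting each batch as its own group is left as an assumption and is not shown.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


batches = list(chunked(list(range(10)), 4))
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Choosing `size` close to the total worker concurrency would keep one batch from queueing far more tasks than can run at once.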
I have a big task, which I break down into smaller tasks and analyse them. I have a basic model:
Master, worker and listener.
The master creates the tasks and gives them to worker actors. Once a worker actor completes a task, it asks the master for another one. Once all tasks are completed, they inform the listener. It usually takes less than 2 minutes to complete 1000 tasks.
Now, sometimes some tasks may take longer than others. I want to set a timer for each task, and if a task takes too long, the master should abort the worker's task and resubmit it later as a new one. How can I implement this? I can calculate the time taken by a worker task, but how does the master actor keep track of the time taken by all worker actors in real time?
One way of handling this would be for each worker, on receipt of a task, to set a timeout before changing state to process the task, e.g.:
context.setReceiveTimeout(5 minutes) // for the '5 minutes' notation - import scala.concurrent.duration._
If the timeout is received, the worker can abort the task (or take whatever other action you deem appropriate, e.g. kill itself or pass a notification message back to the master). Don't forget to cancel the timeout (set duration = Duration.Undefined) once the task is completed or the like.
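For the master-side bookkeeping the question asks about, a different option from the receive timeout above is for the master to record a start time per task and check periodically for overdue ones. The sketch below is language-agnostic logic written in Python rather than Akka code; `DeadlineTracker` and its methods are hypothetical names, and the clock is injectable so the behaviour can be checked without waiting.

```python
import time


class DeadlineTracker:
    """Tracks when each task was handed out and reports overdue ones."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.started = {}  # task id -> start time

    def start(self, task_id):
        self.started[task_id] = self.clock()

    def finish(self, task_id):
        self.started.pop(task_id, None)

    def overdue(self):
        """Task ids that have been running longer than the timeout."""
        now = self.clock()
        return [t for t, s in self.started.items() if now - s > self.timeout_s]


# Simulated run with a fake clock (seconds).
fake_now = [0.0]
tracker = DeadlineTracker(timeout_s=300, clock=lambda: fake_now[0])
tracker.start("a")
tracker.start("b")
fake_now[0] = 200.0
tracker.finish("a")   # "a" completed in time
tracker.start("c")
fake_now[0] = 301.0
overdue = tracker.overdue()
print(overdue)  # ['b']
```

The master would call `overdue()` on a periodic tick, abort the listed tasks and resubmit them as new ones.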