In Celery tasks, do retry_backoff and retry_backoff_max affect the manual self.retry() calls?

So according to the docs, default_retry_delay specifies the time in seconds before a retry of the task should be executed. I just want to be sure that this only affects the manually called self.retry() calls and not the auto-retries triggered by Celery when the task encounters predefined exceptions.
Likewise, I want to know whether retry_backoff and retry_backoff_max only affect the auto-retries and not the manual self.retry().
Finally, what happens when all of these are set?

The answer may change depending on the version of Celery in use. However, as far as I have checked the source code of versions v4.3.1, v4.4.7 and v5.0.0, retry_backoff and retry_backoff_max only affect auto-retries. If autoretry_for is not set to a truthy value, the backoff parameters are not used.
As can be seen from the implementation, when autoretry_for is set to a truthy value, Celery uses the retry_backoff, retry_backoff_max, and retry_jitter arguments to compute the countdown of the next retry before calling the .retry method. These retry_* arguments are therefore not used directly by the .retry method; they only affect the value of countdown that is passed to it, so they are only used in auto-retry.
When countdown is explicitly defined or computed via auto-retry, the value of default_retry_delay is ignored, as can be seen in the implementation of v5.0.0. The same holds when eta is set. So, to actually use default_retry_delay, neither countdown nor eta should be set, and autoretry_for should either not be set or be set to a falsy value.
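For instance, a minimal sketch of a task that relies only on default_retry_delay (the app, task and exception names here are just placeholders) could look like this:
@app.task(bind=True, default_retry_delay=360, max_retries=5)
def my_plain_task(self, data):
    try:
        ...  # some code here that may raise ExternalApiError
    except ExternalApiError as exc:
        # no countdown/eta and no autoretry_for, so this retry is
        # scheduled default_retry_delay (360 s) from now
        raise self.retry(exc=exc)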
Another approach, to use default_retry_delay and auto-retry at the same time, is to define both the auto-retry and retry options on the task, but in the manual retry calls set eta and countdown to None explicitly. One example:
@app.task(bind=True, default_retry_delay=360, autoretry_for=[ExternalApiError], max_retries=5, retry_backoff=1800, retry_backoff_max=8 * 3600, retry_jitter=False)
def my_task(self, data):
    try:
        ...  # some code here that may raise ExternalApiError
    except SomeOtherError:
        try:
            self.retry(countdown=None, eta=None)
        except self.MaxRetriesExceededError:
            ...  # when the maximum number of retries has been exceeded
If you want to use backoff parameters with manual retry calls, you can do so with Celery's get_exponential_backoff_interval helper. One example could be this:
from celery.utils.time import get_exponential_backoff_interval

@app.task(bind=True, max_retries=5)
def my_task(self, data):
    try:
        ...  # some code here that may raise ExternalApiError
    except SomeOtherError:
        try:
            retry_backoff = True
            retry_backoff_max = 5
            retry_jitter = True
            countdown = get_exponential_backoff_interval(
                factor=retry_backoff,
                retries=self.request.retries,
                maximum=retry_backoff_max,
                full_jitter=retry_jitter,
            )
            self.retry(countdown=countdown)
        except self.MaxRetriesExceededError:
            ...  # when the maximum number of retries has been exceeded
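As a side note, as far as I can tell the helper computes roughly factor * 2 ** retries capped at maximum (and randomised between 0 and that value when full_jitter is enabled), so you can check the countdowns it will produce outside of a task, e.g.:
from celery.utils.time import get_exponential_backoff_interval

# countdowns Celery would compute for successive retries with
# retry_backoff=2, retry_backoff_max=600 and retry_jitter disabled
for retries in range(6):
    print(retries, get_exponential_backoff_interval(
        factor=2, retries=retries, maximum=600, full_jitter=False))
# -> 2, 4, 8, 16, 32, 64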

How to implement retry logic?

I'm trying to use pytransitions to implement retransmit logic from an initialization state. The summary is that, during the init state, if the other party isn't responding after 1 second, the packet should be resent. This is very similar to what I see here: https://github.com/pytransitions/transitions/pull/461
I tried this patch, and even though I see the timeouts/failures happening, my callback is only called the first time. This is true with before/after and on_enter/exit. No matter what I've tried, I can't get the retransmit to occur again. Any ideas?
Even though this question is a bit dated, I'd like to post an answer, since Retry states have been added to transitions in release 0.9.
Retry itself only counts how often a state has been re-entered, meaning that the counter increases when the transition's source and destination are equal and resets otherwise. It is entirely passive and needs another means to trigger events. The Timeout state extension is commonly used together with Retry to achieve this. In the example below, a state machine is decorated with the Retry and Timeout state extensions, which allows a couple of keywords to be used in state definitions:
timeout - time in seconds before a timeout is triggered after a state has been entered
on_timeout - the callback(s) called when the timeout was triggered
retries - the number of retries before failure callbacks are called when a state is re-entered
on_failure - the callback(s) called when the re-entrance counter reaches retries
The example will re-enter pinging unless a randomly generated number between 0 and 1 is larger than 0.8. This can be interpreted as a server that answers roughly only every fifth request. When you execute the example, the number of attempts required to reach 'initialized' can vary, or the machine can even end up in 'init_failed' when the retry limit is reached.
from transitions import Machine
from transitions.extensions.states import add_state_features, Retry, Timeout
import random
import time


# create a custom machine with state extension features and also
# add enter callbacks for the states 'pinging', 'initialized' and 'init_failed'
@add_state_features(Retry, Timeout)
class RetryMachine(Machine):

    def on_enter_pinging(self):
        print("pinging server...")
        if random.random() > 0.8:
            self.to_initialized()

    def on_enter_initialized(self):
        print("server answered")

    def on_enter_init_failed(self):
        print("server did not answer!")


states = ["init",
          {"name": "pinging",
           "timeout": 0.5,  # after 0.5s we assume the "server" won't answer
           "on_timeout": "to_pinging",  # when the timeout hits, enter 'pinging' again
           "retries": 3,  # three pinging attempts will be conducted
           "on_failure": "to_init_failed"},
          "initialized",
          "init_failed"]

# we don't pass a model to the machine, which will result in the machine
# itself acting as a model; if we add another model, the 'on_enter_<state>'
# methods must be defined on the model and not on the machine
m = RetryMachine(states=states, initial="init")

assert m.is_init()
m.to_pinging()
while m.is_pinging():
    time.sleep(0.2)

AnyLogic - Assembler should stop working for 2 hours after 10 assemblies are done

The "Assembler" should stop working for 2 hours after 10 assemblies are done.
How can I achieve that?
There are so many ways to do this, depending on what it means to stop working and what the implications are for the incoming parts... but here's one option.
Create a ResourcePool called Machine; this will be used along with the technicians.
In the "on exit" action of the assembler, do this (I use 9 instead of 10 because out.count() doesn't count the agent until it is completely out, so when it counts 9, it means that you have produced 10):
if (self.out.count() == 9) {
    machine.set_capacity(0);
    create_MyDynamicEvent(2, HOUR);
}
In your dynamic event (which you have to create) you add the following code:
machine.set_capacity(1);
A second option is to have a variable countAssembler count the number of items produced. Then:
On exit, you write countAssembler++;
On enter delay, you write the following:
if (countAssembler == 10) {
    self.suspend(agent);
    create_MyDynamicEvent(2, HOUR, agent);
}
On the dynamic event you write:
assembler.resume(agent);
Don't forget to add the parameter needed in the dynamic event.
Create a variable called countAssembler of type int and increment it as agents pass through the assembler. Also create a variable called assemblerStopTime, and record the assembler stop time with assemblerStopTime = time().
Place a selectOutputOut block before the assembler and let agents in if the countAssembler value is less than 10; otherwise, send them to a Wait block.
Now, to maintain the FIFO rule, in the selectOutputOut condition you also need to check whether there is any agent in the Wait block and whether time() - assemblerStopTime is greater than 2 hours. If there is, free it and send it to the assembler with the wait.free(0) function, and send the current agent to the Wait block instead. You also need to reset countAssembler to zero.

Product ID in agent - AnyLogic

I have created an agent ("Handsfree"), and in the agent I introduced a parameter "WIRE". How do I make that parameter vary as the simulation goes on, so that it is automatically set to either true or false, and so that I can assign the delay a specific time according to the parameter "WIRE"?
When does WIRE switch between true and false values? Is it based on time? If so, you can write a simple function to switch the initial value from true to false (or vice-versa). That function could also set the delay time in another parameter, "pDelayTime". Then, you'd call that function with an event at specified or random times (whatever is needed). The delay time in your block would need to be set to "pDelayTime".

How to trigger handle_info due to a timeout in Erlang?

I am using the gen_server behaviour and trying to understand how handle_info/2 can be triggered by a timeout occurring in a handle_call, for example:
-module(server).
-export([init/1,handle_call/3,handle_info/2,terminate/2]).
-export([start/0,stop/1]).

init(Data) ->
    {ok,33}.

start() ->
    gen_server:start_link(?MODULE,?MODULE,[]).

stop(Pid) ->
    gen_server:stop(Pid).

handle_call(Request,From,State) ->
    Return = {reply,State,State,5000},
    Return.

handle_info(Request,State) ->
    {stop,Request,State}.

terminate(Reason,State) ->
    {ok,S} = file:open("D:/Erlang/Supervisor/err.txt",[read,write]),
    io:format(S,"~s~n",[Reason]),
    ok.
What I want to do:
I was expecting that if I launch the server and do not use gen_server:call/2 for 5 seconds (in my case), then handle_info would be called, which would in turn issue the stop, thus calling terminate.
I see it does not happen this way; in fact, handle_info is not called at all.
In examples such as this one, I see the timeout is set in the return of init/1. What I can deduce is that handle_info gets triggered only if I initialize the server and then issue nothing (neither cast nor call) for N seconds. If so, why can I provide Timeout in the return of both handle_cast/2 and handle_call/3?
Update:
I was trying to get the following functionality:
If no call is issued in X seconds trigger handle_info/2
If no cast is issued in Y seconds trigger handle_info/2
I thought these timeouts could be set in the returns of handle_call and handle_cast:
{reply,Reply,State,X} % for call
{noreply,State,Y} % for cast
If not, when are those timeouts triggered, since they are return values?
To initiate timeout handling from the handle_call/3 callback, this callback has to be called in the first place. Your Return = {reply,State,State,5000} is not executed at all.
Instead, if you want to "launch the server and not use gen_server:call/2 for 5 seconds, then have handle_info/2 called", you might return an {ok,State,Timeout} tuple from the init/1 callback:
init(Data) ->
    {ok,33,5000}.
You cannot set different timeouts for different calls and casts. As stated by Alexey Romanov in the comments,
Having different timeouts for different types of messages just isn’t something any gen_* behavior does and would have to be simulated by maintaining them inside state.
If one returns a {reply,Reply,State,Timeout} tuple from handle_call/3 (or a {noreply,State,Timeout} tuple from handle_cast/2), the timeout will be triggered if the mailbox of this process is still empty after Timeout.
I suggest you read the source code: gen_server.erl
% gen_server.erl
% line 400
loop(Parent, Name, State, Mod, Time, HibernateAfterTimeout, Debug) ->
    Msg = receive
              Input ->
                  Input
          after Time ->
                  timeout
          end,
    decode_msg(Msg, Parent, Name, State, Mod, Time, HibernateAfterTimeout, Debug, false).
It helps you to understand the Timeout parameter.

Celery countdown sets eta in the past

I have a celery task which I call using the countdown keyword.
def plan_my_task():
    countdown = some_computation_function()  # result is a positive integer
    res = my_task.apply_async(args=[some_arg], countdown=countdown)

@task
def my_task(some_arg):
    do_something()
In my logs I see something like
[2013-11-14 01:22:31,516: INFO/MainProcess] Received task: my_module.my_task[d5d36a59-b88a-43cb-b7ac-bf0737cdab2c] eta:[2013-11-14 01:16:17.513310+01:00]
As you can see, the eta is set before the current time!
I use celery 3.1.
I don't actually use Celery, but from the API, it looks like countdown is a keyword parameter to both Task.apply_async and Task.retry. It's not a keyword argument of the function itself that was decorated with @task.
EDIT: According to this answer, it may be that the log time is in local time and the ETA time is in UTC. This might be possible if countdown was in the thousands (to give us a few hours off instead of the few minutes it looks like it is off when comparing the times directly).
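To illustrate that hypothesis with a stand-alone sketch (this is not Celery code, just an illustration of the local-vs-UTC mixup): a task scheduled a few minutes into the future can look like it is in the past if the ETA is printed as UTC while the log timestamp uses local time.
from datetime import datetime, timedelta, timezone

# Illustration only: schedule 5 minutes ahead, then render the "log time"
# in a local zone (UTC+1 here) and the ETA in UTC. Compared naively, the
# ETA appears to be ~55 minutes in the past even though it is in the future.
countdown = 300  # seconds
now_utc = datetime.now(timezone.utc)
eta_utc = now_utc + timedelta(seconds=countdown)
local_tz = timezone(timedelta(hours=1))

print("log timestamp (local):", now_utc.astimezone(local_tz).strftime("%Y-%m-%d %H:%M:%S"))
print("eta (UTC):            ", eta_utc.strftime("%Y-%m-%d %H:%M:%S"))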