How to implement retry logic?

How to implement retry logic? - pytransitions

I'm trying to use pytransitions to implement retransmit logic from an initialization state. The summary is that during the init state if the other party isn't responding after 1 second resend the packet. This is very similar to what I see here: https://github.com/pytransitions/transitions/pull/461
I tried this patch, and even though I see the timeouts/failures happening, my callback is only called the first time. This is true with before/after and on_enter/exit. No matter what I've tried, I can't get the retransmit to occur again. Any ideas?

Even though this question is a bit dated I'd like to post an answer since Retry states have been added to transitions in release 0.9.
Retry itself will only count how often a state has been re-entered meaning that the counter will increase when transition source and destination are equal and reset otherwise. It's entirely passive and need another mean to trigger events. The Timeout state extension is commonly used in addition to Retry to achieve this. In the example below a state machine is decorated with Retry and Timeout state extensions which allows to use a couple of keywords for state definitions:
timeout - time in seconds before a timeout is triggered after a state has been entered
on_timeout- the callback(s) called when timeout was triggered
retries - the number of retries before failure callbacks are called when a state is re-entered
on_failure - the callback(s) called when the re-entrance counter reaches retries
The example will re-enter pinging unless a randomly generated number between 0 and 1 is larger than 0.8. This can be interpreted as a server that roughly answers only every fifth request. When you execute the example the retries required to reach 'initialized' can vary or even fail when retries are reached.
from transitions import Machine
from transitions.extensions.states import add_state_features, Retry, Timeout
import random
import time
# create a custom machine with state extension features and also
# add enter callbacks for the states 'pinging', 'initialized' and 'init_failed'
#add_state_features(Retry, Timeout)
class RetryMachine(Machine):
def on_enter_pinging(self):
print("pinging server...")
if random.random() > 0.8:
self.to_initialized()
def on_enter_initialized(self):
print("server answered")
def on_enter_init_failed(self):
print("server did not answer!")
states = ["init",
{"name": "pinging",
"timeout": 0.5, # after 0.5s we assume the "server" wont answer
"on_timeout": "to_pinging", # when timeout enter 'pinging' again
"retries": 3, # three pinging attempts will be conducted
"on_failure": "to_init_failed"},
"initialized",
"init_failed"]
# we don't pass a model to the machine which will result in the machine
# itself acting as a model; if we add another model, the 'on_enter_<state>'
# methods must be defined on the model and not machine
m = RetryMachine(states=states, initial="init")
assert m.is_init()
m.to_pinging()
while m.is_pinging():
time.sleep(0.2)

Related

Anylogic: Queue TimeOut blocks flow

I have a pretty simple Anylogic DE model where POs are launched regularly, and a certain amount of material gets to the incoming Queue in one shot (See Sample Picture below). Then the Manufacturing process starts using that material at a regular rate, but I want to check if the material in the queue gets outdated, so I'm using the TimeOut option of that queue, in order to scrap the outdated material (older than 40wks).
The problem is that every time that some material gets scrapped through this Timeout exit, the downstream Manufacturing process "stops" pulling more material, instead of continuing, and it does not get restarted until a new batch of material gets received into the Queue.
What am I doing wrong here? Thanks a lot in advance!!
Kindest regards

Your situation is interesting because there doesn't seem to be anything wrong with what you're doing. So even though what you are doing seems to be correct, I will provide you with a workaround. Instead of the Queue block, use a Wait block. You can assign a timeout and link the timeout port just like you did for the queue (seem image at the end of the answer).
In the On Enter field of the wait block (which I will assume is named Fridge), write the following code:
if( MFG.size() < MFG.capacity ) {
self.free(agent);
}
In the On Enter of MFG block write the following:
if( self.size() < self.capacity && Fridge.size() > 0 ) {
Fridge.free(Fridge.get(0));
}
And finally, in the On Exit of your MFG block write the following:
if( Fridge.size() > 0 ) {
Fridge.free(Fridge.get(0));
}
What we are doing in the above, is we are manually pushing the agents. Each time an agent is processed, the model checks if there is capacity to send more, if yes, a new agent is sent.
I know this is an unpleasant workaround, but it provides you with a solution until AnyLogic support can figure it out.

How to trigger handle_info due to timeout in erlang?

I am using a gen_server behaviour and trying to understand how can handle_info/2 be triggered from a timeout occurring in a handle_call for example:
-module(server).
-export([init/1,handle_call/3,handle_info/2,terminate/2).
-export([start/0,stop/0]).
init(Data)->
{ok,33}.
start()->
gen_server:start_link(?MODULE,?MODULE,[]).
stop(Pid)->
gen_server:stop(Pid).
handle_call(Request,From,State)->
Return={reply,State,State,5000},
Return.
handle_info(Request,State)->
{stop,Reason,State}.
terminate(Reason,State)->
{ok,S}=file:file_open("D:/Erlang/Supervisor/err.txt",[read,write]),
io:format(S,"~s~n",[Reason]),
ok.
What i want to do:
I was expecting that if I launch the server and would not use gen_server:call/2 for 5 seconds (in my case) then handle_info would be called, which would in turn issue the stop thus calling terminate.
I see it does not happen this way, in fact handle_info is not called at all.
In examples such as this i see the timeout is set in the return of init/1.What I can deduce is that it handle_info gets triggered only if I initialize the server and issue nothing (nor cast nor call for N seconds).If so why I can provide Timeout in the return of both handle_cast/2 and handle_call/3 ?
Update:
I was trying to get the following functionality:
If no call is issued in X seconds trigger handle_info/2
If no cast is issued in Y seconds trigger handle_info/2
I thought this timeouts can be set in the return of handle_call and handle_cast:
{reply,Reply,State,X} //for call
{noreply,State,Y} //for cast
If not, when are those timeouts triggered since they are returns?

To initiate timeout handling from gen_server:handle_call/3 callback, this callback has to be called in the first place. Your Return={reply,State,State,5000}, is not executed at all.
Instead, if you want to “launch the server and would not use gen_server:call/2 for 5 seconds then handle_info/2 would be called”, you might return {ok,State,Timeout} tuple from gen_server:init/1 callback.
init(Data)->
{ok,33,5000}.
You cannot set the different timeouts for different calls and casts. As stated by Alexey Romanov in comments,
Having different timeouts for different types of messages just isn’t something any gen_* behavior does and would have to be simulated by maintaining them inside state.
If one returns {reply,State,Timeout} tuple from any handle_call/3/handle_cast/2, the timeout will be triggered if the mailbox of this process is empty after Timeout.

i suggest you read source code:gen_server.erl
% gen_server.erl
% line 400
loop(Parent, Name, State, Mod, Time, HibernateAfterTimeout, Debug) ->
Msg = receive
Input ->
Input
after Time ->
timeout
end,
decode_msg(Msg, Parent, Name, State, Mod, Time, HibernateAfterTimeout, Debug, false).
it helps you to understand the parameter Timeout

What is the meaning of CANBUS function mode initilazing settings for STM32?

I want to understand meaning of the following function mode definition, there is explanation in the library. But I don't understand that because explanations are very short and not enough. I searched on the net I couldnt find any information about.
CAN_InitStructure.CAN_TTCM = DISABLE;
CAN_InitStructure.CAN_ABOM = DISABLE;
CAN_InitStructure.CAN_AWUM = DISABLE;
CAN_InitStructure.CAN_NART = ENABLE;
CAN_InitStructure.CAN_RFLM = DISABLE;
CAN_InitStructure.CAN_TXFP = ENABLE;

These are the names of the bits located in the CAN master control register (CAN_MCR). So, the proper source for their meaning is the reference manual. My following answer will be somewhat copy & paste from the reference manual, but I will try to explain these bits in detail.
TTCM (Time triggered communication mode): This bit activates the Time Triggered Communication (TTCAN) mode, which is an extension to the CAN standard. I don't know much about TTCAN, but as I understand, it assigns time windows to messages to satisfy some real-time requirements. So, normally this bit should remain 0.
ABOM (Automatic bus-off management): If the transmit error counter (TEC) becomes greater than 255, the CAN hardware switches to bus-off state. To recover, it must wait for the recovery sequence, 128 occurrences of 11 consecutive recessive bits. Only after that, the CAN hardware may return to the normal operating state. This bit controls the returning behavior. If it's 1, returning to normal state is automatic. Otherwise, software should make the request, provided that the recovery sequence has been observed.
AWUM (Automatic wakeup mode): The CAN module can be in one of 3 modes: Initialization mode, normal mode or sleep (low power) mode. Sleep mode is requested by the software. However, you have 2 options to exit sleep mode. If this bit is 0, then you have to exit sleep mode manually. You may enable CAN wakeup interrupt to inform you about bus activity, then exit the sleep mode in ISR. But if this bit is 1, the hardware returns to normal mode automatically when it detects bus activity.
NART (No automatic retransmission): Normally, CAN hardware retries to transmit a message if its previous attempts fail, because of arbitration lost etc. But if you make this bit 1, the transmitter does not retry. This is required when you use Time Triggered Communication (TTCAN). Otherwise, you should keep this bit 0.
RFLM (Receive FIFO locked mode): Your receive mailboxes have 3 levels depth, meaning that they can store maximum 3 messages before they are overrun. This bit controls what happens in case of mailbox overrun. Default behavior is to keep the oldest 2 messages and the newest one. For example, if you received 5 messages, the buffer keeps the messages 1, 2 & 5. However, if you make this bit 1, the mailbox keeps the messages 1, 2 & 3 and discards the new arrivals.
TXFP (Transmit FIFO priority): You have 3 transmit mailboxes. When you fill more than one, the hardware must decide which one to transmit first. Normally, one can assume that a message with a lower ID number is more important and should be transmitted first. But if you want to transfer them in a first-comes-first-served fashion for some reason, you need to make this bit 1. Of course, this is just a local priority. On the physical bus, the messages with lower ID always have priority.

In Celery tasks, do retry_backoff and retry_backoff_max affect the manual self.retry() calls?

So according to the docs, default_retry_delayspecifies the time in seconds before a retry of the task should be executed. I just want to be sure that this only affects the manually called self.retry()calls and not the autoretries triggered by Celery when the task encounters predefined exceptions.
Likewise, I want to know if retry_backoff and retry_backoff_maxonly affect the autoretries and not the manual self.retry().
Finally, what happens when all of these are set?

That answer may change depending on the version of Celery that is used. However, as far as I have checked with the source code of versions v4.3.1, v4.4.7 and v5.0.0, retry_backoff and retry_backoff_max only affect the auto-retry. If autoretry_for is not set as True (or a True value such as positive integers), then the backoff parameters is not used.
As it can be seen from the implementation, when a True value is set to autoretry_for, Celery uses retry_backoff, retry_backoff_max, and retry_jitter arguments in order to compute the countdown of the next retry, before calling .retry method. Thus, these retry_xx arguments are not directly used by the .retry method; however, they affect the value of countdown. Thus, they are only used in the auto_retry.
When the countdown is explicity defined or computed via autoretry, then the value of default_retry_delay is ignored, as it can be seen in the implementation of v5.0.0. The same is valid when eta is set. Thus, to be able to use default_retry_delay, countdown, and eta should not be set. Additionally, autoretry_for should not be set or it should be set with a False value.
Another approach to use default_retry_delay and autoretry at the same time would be to define autoretry and retry variables; but for manual retry calls, set the eta and countdown to None explicitly. One example:
#app.task(bind=True, default_retry_delay=360, autoretry_for=[ExternalApiError], max_retries=5, retry_backoff=1800, retry_backoff_max=8 * 3600, retry_jitter=False)
def my_task(self, data):
try:
... # some code here that may raise ExternalApiError
except SomeOtherError:
try:
self.retry(countdown=None, eta=None)
except self.MaxRetriesExceededError:
... # when maximum number of retries has been exceeded.
If you want to use backoff parameters with manual retry calls, it can be implemented by using the method get_exponential_backoff_interval of Celery. One example could be this:
from celery.utils.time import get_exponential_backoff_interval
#app.task(bind=True, max_retries=5)
def my_task(self, data):
try:
... # some code here that may raise ExternalApiError
except SomeOtherError:
try:
retry_backoff = True
retry_backoff_max = 5
retry_jitter = True
countdown = get_exponential_backoff_interval(
factor=retry_backoff,
retries=self.request.retries,
maximum=retry_backoff_max,
full_jitter=retry_jitter)
)
self.retry(countdown=countdown)
except self.MaxRetriesExceededError:
... # when maximum number of retries has been exceeded.

Quickfix reset sequence number at start time but not set ResetSeqNum in Logon message

When the quickfix initiator reconnects at startTime (defined in config) it deletes the files with sequence number, but does not set ResetSeqNumFlag to Y, and the server replies with a Logout message with text "seq msg number to low ..."
Is there a way to set ResetSeqNumFlag = Y only for this behavior? I don`t want to reset the sequence on every log-on.

This appears to be a QuickFIX/J quirk (some might consider it a bug). If ResetOnLogon=N then no ResetSeqNumFlag=Y is sent when the session start time triggers a logon. If ResetOnLogon=Y, the ResetSeqNumFlag=Y is sent on every logon. I believe this is not a big problem in practice because participants in a FIX session typically reset their sequence numbers locally after a session ends (logically ends at the end time, not a connection disconnect).
If you want to slightly modify the source code to implement this behavior, you'd modify the quickfix.Session next() method. You could add a local flag that indicates a session has restarted (per the schedule as determined by checkSessionTime()). Pass that flag to generateLogon() and that method would use it to determine when to send ResetSeqNumFlag=Y regardless of the ResetOnLogon configuration.

I don`t want to reset the sequence on every log-on.
Then don't do it! Set ResetOnLogon=N.
At StartTime, the session will reset sequence numbers always. If ResetOnLogon=N, then they won't reset again until the next StartTime.
The initiator and acceptor should always have matching ResetOnXXX settings.

What you are asking cannot, should not be done. You start you engine with some config and then you change the config while running. If something goes wrong it will be very difficult to pinpoint what started the issue.
Instead of doing ResetSeqNumFlag = Y try adding ResetOnLogon=Y in your config for the acceptor side(that is if you have control over it) or ResetOnLogout=Y / ResetOnDisconnect=Y in your initiator config file. That would be much easier and changing config while running, is possibly not the best solution.
Your logout(disconnect can happen anytime) will happen anyways at EndTime anyways and should be easier for your application.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse