When configuring RetryOnFailure what is the maxRetryDelay parameter - entity-framework-core

I'm using Entity Framework Core 2.2 and I decided to follow a blog's suggestion and enable retry on failure:
services.AddDbContext<MyDbContext>(options =>
    options.UseSqlServer(
        Configurations["ConnectionString"],
        sqlServerOptionsAction: sqlOptions =>
        {
            sqlOptions.EnableRetryOnFailure(
                maxRetryCount: 10,
                maxRetryDelay: TimeSpan.FromSeconds(5),
                errorNumbersToAdd: null);
        }));
My question is what is the maxRetryDelay argument for?
I would expect it to be the delay time between retries, but the name implies it's the maximum time. Does that mean my 10 retries could happen 1 second apart rather than the 5 seconds apart I want?

The delay between retries is randomized, up to the value specified by maxRetryDelay, which acts as an upper bound rather than a fixed interval.
This is done to avoid multiple retries occurring at the same time and overwhelming a server. Imagine, for example, 10K requests to a web service failing due to a network issue and all retrying at the same time after 15 seconds. The database server would get a sudden wave of 10K queries.
By randomizing the delay, retries are spread out over time and across clients.
The delay for each retry is calculated by ExecutionStrategy.GetNextDelay. The source shows it's a randomized exponential backoff.
The default SqlServerRetryingExecutionStrategy uses that implementation. A custom retry strategy could use a different implementation.
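To make the "randomized exponential backoff" concrete, here is a rough sketch in Python (chosen only so it runs anywhere; it is not EF Core code). The constants are assumptions based on my recollection of the ExecutionStrategy source (exponential base 2, a 1-second coefficient, a random factor of 1.1), so verify them against the version you use. The function name next_delay is made up for the illustration.

```python
import random

# Assumed defaults, as recalled from the EF Core ExecutionStrategy source;
# verify against the version you actually use.
EXPONENTIAL_BASE = 2.0
COEFFICIENT_SECONDS = 1.0
RANDOM_FACTOR = 1.1


def next_delay(retry_index, max_retry_delay_seconds, rng=random):
    """Delay in seconds before retry number retry_index (0-based)."""
    # Exponential growth before jitter: 0, 1, 3, 7, 15, ...
    growth = EXPONENTIAL_BASE ** retry_index - 1.0
    # Jitter stretches the value by a random factor in [1.0, 1.1).
    jitter = 1.0 + rng.random() * (RANDOM_FACTOR - 1.0)
    # maxRetryDelay is only an upper bound on the result.
    return min(COEFFICIENT_SECONDS * growth * jitter, max_retry_delay_seconds)


if __name__ == "__main__":
    for i in range(10):
        print(i, round(next_delay(i, 5.0), 2))
```

Under these assumptions, with maxRetryDelay set to 5 seconds the first retry is immediate, the second waits roughly 1 second, the third roughly 3 seconds, and every later retry waits the full 5 seconds. So the retries are not evenly spaced 5 seconds apart; the parameter only caps the backoff.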

Related

EF core request cannot wake-up Azure Sql (serverless sku) database and times out

I'm using EF Core with one of my apps to query an Azure Sql database. It's the serverless sku, that scales down to zero (goes to sleep) after 1h of inactivity.
Now, in that app there is a scheduled function that queries the database at certain points in time, often while the DB is sleeping. To compensate for this, I'm using the following in DbContext.cs:
optionsBuilder.UseSqlServer(connection, opt => opt.EnableRetryOnFailure(
maxRetryCount: 20,
maxRetryDelay: TimeSpan.FromSeconds(30),
errorNumbersToAdd: null
));
If the delay is evenly distributed, that results in an average of 15s per retry; with 20 retries, that means a timeout after about 5 minutes.
I thought this should be plenty, since when querying a sleeping database with SSMS it usually takes well under 1 minute to get going. However, this is not the case: the functions regularly time out and the queries fail.
Is there a better way to deal with this than increasing the timeout even further? Should 5 minutes really not be enough?
Cheers
I think I got it working now. The EF Core retry settings in the snippet above apply to command timeout occurrences. However, since the database was sleeping during the request, this was actually a connection timeout issue. I fixed it by adding Connect Timeout=120 to the connection string itself.

Distributed timer service

I am looking for a distributed timer service. Multiple remote client services should be able to register for callbacks (via REST apis) after specified intervals. The length of an interval can be 1 minute. I can live with an error margin of around 1 minute. The number of such callbacks can go up to 100,000 for now but I would need to scale up later. I have been looking at schedulers like Quartz but I am not sure if they are a fit for the problem. With Quartz, I will probably have to save the callback requests in a DB and poll every minute for overdue requests on 100,000 rows. I am not sure that will scale. Are there any out of the box solutions around? Else, how do I go about building one?
Posting as an answer since I can't comment.
One more option to consider is a message queue, where you publish a message with a scheduled delay so that consumers can only consume it after that delay.
Amazon SQS Delay Queues
Delay queues let you postpone the delivery of new messages in a queue for the specified number of seconds. If you create a delay queue, any message that you send to that queue is invisible to consumers for the duration of the delay period. You can use the CreateQueue action to create a delay queue by setting the DelaySeconds attribute to any value between 0 and 900 (15 minutes). You can also change an existing queue into a delay queue using the SetQueueAttributes action to set the queue's DelaySeconds attribute.
Scheduling Messages with RabbitMQ
https://github.com/rabbitmq/rabbitmq-delayed-message-exchange/
A user can declare an exchange with the type x-delayed-message and then publish messages with the custom header x-delay expressing in milliseconds a delay time for the message. The message will be delivered to the respective queues after x-delay milliseconds.
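As a small illustration of the SQS variant, the sketch below builds the arguments for a per-message delay. It is Python and deliberately avoids any network call; the helper name delayed_send_params and the idea of using the callback URL as the message body are my own assumptions, not part of any library.

```python
MAX_DELAY_SECONDS = 900  # upper bound from the SQS text quoted above


def delayed_send_params(queue_url, callback_url, delay_seconds):
    """Build keyword arguments for an SQS send_message call with a per-message delay."""
    if not 0 <= delay_seconds <= MAX_DELAY_SECONDS:
        raise ValueError("SQS delays must be between 0 and 900 seconds")
    return {
        "QueueUrl": queue_url,
        "MessageBody": callback_url,  # hypothetical payload: the URL to call back
        "DelaySeconds": int(delay_seconds),
    }


# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("sqs").send_message(**delayed_send_params(url, callback, 60))
```

Because of the 900-second limit, intervals longer than 15 minutes would need something extra, for example re-publishing the message with the remaining delay when it is first received.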
Out of the box solution
RocketMQ meets your requirements since it supports scheduled messages:
Scheduled messages differ from normal messages in that they won’t be
delivered until a provided time later.
You can register your callbacks by sending such messages:
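// The message body must be a byte[]; an empty payload is used as a placeholder.
Message message = new Message("TestTopic", new byte[0]);
// Level 3 selects the third entry of the broker's messageDelayLevel list.
message.setDelayTimeLevel(3);
producer.send(message);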
And then, listen to this topic to deal with your callbacks:
consumer.subscribe("TestTopic", "*");
consumer.registerMessageListener(new MessageListenerConcurrently() {...})
It does well in almost every way, except that the DelayTimeLevel options can only be defined before the RocketMQ server starts. This means that if your MQ server is configured with messageDelayLevel=1s 5s 10s, you cannot register a callback with delayIntervalTime=3s.
DIY
Quartz plus storage can build such a callback service, as you mentioned. However, I don't recommend storing the callback data in a relational DB: you want high TPS, and in a distributed service it is hard to avoid locks and transactions, which add complexity to the DB code.
I do suggest storing the callback data in Redis, because it performs better than a relational DB and its sorted-set (ZSET) data structure suits this scenario well.
I once developed a timed callback service based on Redis and Dubbo. It provides some more useful features. Maybe you can get some ideas from it: https://github.com/joooohnli/delay-callback
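To show why a sorted set fits, here is a minimal in-memory sketch in Python that mimics the pattern: callbacks are stored with their due time as the score, and a poller pops everything whose score is at or before "now". The class DueTimeIndex and the key name timers are hypothetical; the comments name the Redis commands I would expect to use in a real version.

```python
import bisect


class DueTimeIndex:
    """In-memory stand-in for a Redis sorted set scored by due time."""

    def __init__(self):
        self._entries = []  # kept sorted as (due_time, callback_id)

    def schedule(self, callback_id, due_time):
        # Redis equivalent: ZADD timers <due_time> <callback_id>
        bisect.insort(self._entries, (due_time, callback_id))

    def pop_due(self, now):
        # Redis equivalent: ZRANGEBYSCORE timers -inf <now>, then ZREM each member.
        split = 0
        while split < len(self._entries) and self._entries[split][0] <= now:
            split += 1
        due, self._entries = self._entries[:split], self._entries[split:]
        return [callback_id for _, callback_id in due]


if __name__ == "__main__":
    index = DueTimeIndex()
    index.schedule("callback-b", 120.0)
    index.schedule("callback-a", 60.0)
    print(index.pop_due(90.0))
```

The poller only touches entries that are actually due, so its cost does not depend on the total number of registered callbacks. With several pollers against a real Redis, the read and the removal would have to be made atomic, for example by treating the worker whose ZREM actually removed the member as the owner of that callback.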

SAPUI5 request timeout to the Gateway

I have an odata-request in my SAPUI5 application which calls the Gateway.
On the Gateway, I have a Trusted RFC connection to the backend.
Now I have a complex algorithm that runs for around 2 minutes.
After 60 seconds, I get a timeout error:
HTTP request failed500,Internal Server Error,500 Connection timed out
Is there a way to increase the timeout?
I tried the parameters gw/reg_timeout and gw/conn_pending, as well as the keepalive timeout of the RFC connection.
None of these options solved my problem.
I guess you already tried everything from SAP Help.
Maybe this is an ICM/WebDispatcher timeout. Check the link and try some of the settings, e.g. PROCTIMEOUT. Also consider the hints there:
Recommendation
In systems where the standard timeout setting of 60
seconds for the keep-alive and processing timeouts is not sufficient
due to long-running applications, SAP recommends that both the TIMEOUT
and PROCTIMEOUT parameters are set for the services concerned so that
they can be configured independently of each other. The TIMEOUT value
should not be set unnecessarily high. We recommend you set this
parameter as follows:
icm/server_port_0 = PROT=HTTP,PORT=1080,TIMEOUT=60,PROCTIMEOUT=600
in order to allow a
maximum processing time of 10 minutes.

Getting QuotaExceededException - What are the operation quota limitations for Azure Notification Hubs?

I was doing some latency/performance testing for sending push notifications with Azure Notification Hub by consecutively sending many notifications in a foreach loop. It worked fine for 100 "SendNotification" requests, although it was relatively slow (14s), but I got a QuotaExceededException for 1000 requests in a row:
[QuotaExceededException: The remote server returned an error: (403)
Forbidden. The request was terminated because the namespace
pushnotification-testing is being throttled. Please wait 60 seconds
and try again. TrackingId:...
Even when I don't wait for 60 seconds as advised, I can again execute 100 consecutive requests, but 1000 requests in a row always fail... Anything slightly above 100 consecutive requests fails most of the time...
I couldn't find any documentation on these limitations. This should be documented somewhere, so I can be sure Azure Notification Hubs will fit my needs.
The answer to this question says
There is throttling on the rate of CRUD operations. Quotas depend on the tier
you are on, but it is not going to be less than 2000 operations per
minute per namespace anyway. If the quota is exceeded, the service returns
403.
For me, it seems to be less than 2000 operations. By the way, I'm using the "FREE" tier for testing, but I guess we would switch to "STANDARD" for production.
Has anyone had similar experiences, or does anyone know where to look for more information?
In particular, what are the operation quota limitations per time frame for the different tiers of Azure Notification Hubs?
UPDATE1: It's weird, but sending 1000 requests in parallel works most of the time, while sending them consecutively fails on the 101st request.
To the best of my knowledge, NH currently has the following limits on the number of SENDS (not registrations) per namespace, per minute, per NH machine:
Free tier: 100
Basic tier: 900
Standard tier: 11500
Massive sending in parallel allows you to send more because the calls are very likely to be routed to different machines.
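If you want to stay under such a per-minute limit on the client side, a sliding-window throttle is one option. The Python sketch below is only an illustration: the class SendThrottle is hypothetical, and the limit you pass in (for example 100 for the free tier) comes from the numbers above, not from official documentation.

```python
import time
from collections import deque


class SendThrottle:
    """Allow at most max_sends within any window of window_seconds."""

    def __init__(self, max_sends, window_seconds=60.0, clock=time.monotonic):
        self.max_sends = max_sends
        self.window_seconds = window_seconds
        self.clock = clock
        self._sent_at = deque()  # timestamps of sends still inside the window

    def seconds_until_allowed(self):
        now = self.clock()
        # Forget sends that have left the window.
        while self._sent_at and now - self._sent_at[0] >= self.window_seconds:
            self._sent_at.popleft()
        if len(self._sent_at) < self.max_sends:
            return 0.0
        # Wait until the oldest send in the window expires.
        return self.window_seconds - (now - self._sent_at[0])

    def record_send(self):
        self._sent_at.append(self.clock())
```

Before each send you would sleep for seconds_until_allowed() and then call record_send(). Since the limit is described as per machine, treating it as a limit for the whole namespace is the conservative choice.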

Handling Latency in Real Time Distributed Systems

I am trying to implement a poker server. An http server forwards data packets to the backend servers which handle the state of all the poker hands. In any given hand the player to act gets 10 seconds to act (bet,fold,call,raise,etc.). If there is no response within 10 seconds the server automatically folds for them. To check that 10 seconds has passed an event list of when actions must be received is maintained. It is a priority queue ordered by time and each poker hand currently being played has an entry in the priority queue.
Consider the following scenario: after the last action, 9.99 seconds pass before the next action arrives at the http server. By the time the action is forwarded to the backend servers, extra time has passed, so a total of 10.1 seconds have now elapsed. The backend servers will have declared the hand folded, but I would like the action to be processed, since technically it arrived at the http server after 9.99 seconds. One solution would be to have the backends wait some extra time before declaring a hand folded, to see if an action timestamped at 9.99 seconds comes in. But that would delay when the next person in the hand gets to act.
The goals I would like to achieve are:
1. Handle actions reaching the http server at 9.99 seconds instead of folding their hand.
2. Aggressively minimize the delay that results from the idle waiting needed to "solve" the problem mentioned in point 1.
What are the various solutions? For experts in distributed systems: is there known literature on the trade-offs between them? I would like to know the solutions deemed acceptable by the distributed systems literature, not just various ad hoc solutions.
Maybe you could take a timestamp on the server side when the client request arrives?
So you would have "start" and "stop" timestamps, to measure exactly 9.9s?
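A rough Python sketch of that idea, under two assumptions that are mine and not from the question: the http server and the backends use clocks that agree closely enough, and the forwarding latency has a known upper bound. The names fold_time and action_is_valid are made up for the example.

```python
TURN_SECONDS = 10.0  # time each player gets to act, from the question


def fold_time(turn_started_at, max_forward_latency):
    """Earliest moment the backend may auto-fold the hand."""
    # Wait out the turn plus the worst-case http-to-backend forwarding time,
    # so an action stamped just before the deadline can still arrive.
    return turn_started_at + TURN_SECONDS + max_forward_latency


def action_is_valid(turn_started_at, received_at_http):
    """Judge an action by its http ingress timestamp, not its backend arrival."""
    return received_at_http - turn_started_at <= TURN_SECONDS


if __name__ == "__main__":
    # Stamped at 9.99s on the http server, reaching the backend at 10.1s.
    print(action_is_valid(0.0, 9.99), fold_time(0.0, 0.2))
```

The extra wait is bounded by the assumed forwarding latency and is only paid when a timer actually runs out; a player who acts in time hands over to the next player immediately.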