Handling Latency in Real-Time Distributed Systems

I am trying to implement a poker server. An HTTP server forwards data packets to the backend servers, which manage the state of all the poker hands. In any given hand the player to act gets 10 seconds to act (bet, fold, call, raise, etc.). If there is no response within 10 seconds the server automatically folds for them. To check that 10 seconds have passed, an event list of when actions must be received is maintained: a priority queue ordered by time, with an entry for each poker hand currently being played.
Consider the following scenario: 9.99 seconds pass after the last action before the next action arrives at the HTTP server. By the time the action is forwarded to the backend servers, extra time has passed, so a total of 10.1 seconds have now elapsed. The backend servers will have declared the hand folded, but I would like the action to be processed, since technically it arrived at the HTTP server after only 9.99 seconds. One solution would be to have the backends wait some extra time before declaring a hand folded, to see if an action timestamped at 9.99 seconds arrives. But that would delay when the next person in the hand gets to act.
The goals I would like are:
1. Handle actions reaching the HTTP server at 9.99 seconds instead of folding the hand.
2. Aggressively minimize the delay caused by the idle waiting used to "solve" the problem in goal 1.
What are the various solutions? For experts in distributed systems: is there known literature on the trade-offs between the various approaches? I would like to know which solutions the distributed-systems literature considers acceptable, not just various ad hoc solutions.

Maybe on the server side, when the client request arrives, you could take the timestamp?
So you would take "start" and "stop" timestamps, to measure exactly 9.99 s?
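Building on the comments above: one standard approach is to stamp each action when it reaches the HTTP server (ingress) and have the backend judge timeliness by that ingress timestamp rather than by its own arrival time. A minimal sketch in Python (the names and the 10-second window are illustrative, not from any specific framework):

```python
import heapq

ACT_TIMEOUT = 10.0  # seconds a player has to act

class HandTimers:
    """Priority queue of (deadline, hand_id) entries, ordered by time."""

    def __init__(self):
        self._pq = []

    def arm(self, hand_id, last_action_ts):
        # Entry fires when the player's 10 seconds run out.
        heapq.heappush(self._pq, (last_action_ts + ACT_TIMEOUT, hand_id))

    def on_action(self, hand_id, ingress_ts, last_action_ts):
        # Judge by the HTTP-ingress timestamp, not by when the backend
        # happened to receive the forwarded packet.
        if ingress_ts - last_action_ts <= ACT_TIMEOUT:
            return "process"
        return "fold"

timers = HandTimers()
t0 = 1000.0                 # time of the previous action
timers.arm("hand-1", t0)
# Stamped at ingress after 9.99 s, forwarded with 0.11 s of extra delay:
print(timers.on_action("hand-1", t0 + 9.99, t0))   # process
print(timers.on_action("hand-1", t0 + 10.10, t0))  # fold
```

The remaining wrinkle is exactly goal 2: the backend must either hold its final fold decision briefly, until no earlier-stamped action can still be in flight, or be able to roll back a provisional fold when one arrives.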

Related

Ajax polling vs SSE (performance on server side)

I'm curious whether there is some standard threshold for when it is better to use Ajax polling instead of SSE, from a server-side viewpoint.
1 request every second: I'm pretty sure SSE is better.
1 request per minute: I'm pretty sure Ajax is better.
But what about 1 request every 5 seconds? How can we calculate the frequency at which the balance tips between Ajax and SSE?
No way is 1 request per minute always better with Ajax, so that assumption is flawed from the start. Any kind of frequent polling is nearly always a costly choice. It seems from our previous conversation in comments on another question that you start from the belief that an open TCP socket (whether an SSE connection or a webSocket connection) is somehow costly to server performance. An idle TCP connection takes zero CPU (maybe every once in a long while a keep-alive might be sent, but other than that, an idle socket does not use CPU). It does use a bit of server memory for the socket descriptor, but a highly tuned server can have 1,000,000 sockets open at once. So your CPU usage is going to depend more on how many connections are being established, and what they ask the server to do each time, than on how many open (and mostly idle) connections there are.
Remember, every http connection has to create a TCP socket (which costs round trips between client and server), then send the http request, then get the http response, then close the socket. That's a lot of round trips of data to repeat every minute. If the connection is https, establishing it is even more work and more round trips because of the crypto layer and endpoint certificate verification. So doing all that every minute for hundreds of thousands of clients seems like a massive waste of resources and bandwidth when you could create one SSE connection and have the client just listen for data streamed from the server over that connection.
As I said in our earlier comment exchange on a different question, these types of questions are not really answerable in the abstract. You have to have specific requirements of both client and server and a specific understanding of the data being delivered and how urgent it is on the client and therefore a specific polling interval and a specific scale in order to begin to do some calculations or test harnesses to evaluate which might be the more desirable way to do things. There are simply too many variables to come up with a purely hypothetical answer. You have to define a scenario and then analyze different implementations for that specific scenario.
Number of requests per second is only one of many possible variables. For example, if most of the time you poll there's actually nothing new, then that gives even more of an advantage to the SSE case, because it would have nothing to do at all (zero load on the server other than a little memory for an open socket most of the time), whereas polling creates continual load even when there is nothing to do.
The #1 advantage of server push (whether implemented with SSE or webSocket) is that the server only has to do anything with the client when there is actually pertinent data to send to that specific client. All the rest of the time, the socket is just sitting there idle (perhaps occasionally, on a long interval, sending a keep-alive).
The #1 disadvantage to polling is that there may be lots of times that the client is polling the server and the server has to expend resources to deal with the polling request only to inform that client that it has nothing new.
How can we calculate where is the limit frequency for Ajax or SSE?
It's a pretty complicated process. Lots of variables in a specific scenario need to be defined. It's not as simple as just requests/sec. Then, you have to decide what you're attempting to measure or evaluate and at what scale? "Server performance" is the only thing you mention, but that has to be completely defined and different factors such as CPU usage and memory usage have to be weighted into whatever you're measuring or calculating. Then, you may even need to run some test harnesses if the calculations don't yield an obvious answer or if the decision is so critical that you want to verify your calculations with real metrics.
It sounds like you're looking for an answer like "at greater than x requests/min, you should use polling instead of SSE" and I don't think there is an answer that simple. It depends upon far more things than requests/min or requests/sec.
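To make the "lots of variables" point concrete, here is a toy back-of-the-envelope model. All the constants are made-up illustrative values, not measurements; the model only counts connection-setup round trips, which polling pays on every request and SSE pays once:

```python
# Rough round-trip counts per connection phase (illustrative only):
RTT_TCP = 1.5   # SYN / SYN-ACK / ACK
RTT_TLS = 2.0   # TLS 1.2-style handshake
RTT_HTTP = 1.0  # request + response

def polling_round_trips_per_hour(poll_interval_s, https=True):
    """Each poll opens a fresh connection (no keep-alive), then closes it."""
    per_poll = RTT_TCP + RTT_HTTP + (RTT_TLS if https else 0)
    return (3600 / poll_interval_s) * per_poll

def sse_round_trips_per_hour(https=True):
    """One setup, then the server pushes only when there is data."""
    return RTT_TCP + RTT_HTTP + (RTT_TLS if https else 0)

print(polling_round_trips_per_hour(60))  # 270.0 round trips/hour at 1 req/min
print(sse_round_trips_per_hour())        # 4.5, a one-time setup cost
```

Even at the supposedly Ajax-friendly rate of 1 request per minute, polling pays the full setup cost sixty times an hour per client, which is why a simple requests-per-minute threshold doesn't exist.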
"Polling" incurs overhead on all parties. If you can avoid it, don't poll.
If SSE is an option, it might be a good choice. "It depends".
Q: What (if any) kind of "event(s)" will your app need to handle?

Distributed timer service

I am looking for a distributed timer service. Multiple remote client services should be able to register for callbacks (via REST APIs) after specified intervals. The length of an interval can be 1 minute, and I can live with an error margin of around 1 minute. The number of such callbacks can go up to 100,000 for now, but I will need to scale up later. I have been looking at schedulers like Quartz, but I am not sure they fit the problem. With Quartz, I would probably have to save the callback requests in a DB and poll every minute for overdue requests across 100,000 rows; I am not sure that will scale. Are there any out-of-the-box solutions around? If not, how do I go about building one?
Posting as an answer since I can't comment.
One more option to consider is a message queue, where you publish a message with a scheduled delay so that consumers consume it only after that delay.
Amazon SQS Delay Queues
Delay queues let you postpone the delivery of new messages in a queue for the specified number of seconds. If you create a delay queue, any message that you send to that queue is invisible to consumers for the duration of the delay period. You can use the CreateQueue action to create a delay queue by setting the DelaySeconds attribute to any value between 0 and 900 (15 minutes). You can also change an existing queue into a delay queue using the SetQueueAttributes action to set the queue's DelaySeconds attribute.
Scheduling Messages with RabbitMQ
https://github.com/rabbitmq/rabbitmq-delayed-message-exchange/
A user can declare an exchange with the type x-delayed-message and then publish messages with the custom header x-delay expressing in milliseconds a delay time for the message. The message will be delivered to the respective queues after x-delay milliseconds.
Out of the box solution
RocketMQ meets your requirements since it supports scheduled messages:
Scheduled messages differ from normal messages in that they won't be delivered until a provided time later.
You can register your callbacks by sending such messages:
Message message = new Message("TestTopic", "your callback payload".getBytes()); // topic + body
message.setDelayTimeLevel(3); // 3 = the 3rd entry in the broker's messageDelayLevel list
producer.send(message);
And then, listen to this topic to deal with your callbacks:
consumer.subscribe("TestTopic", "*");
consumer.registerMessageListener(new MessageListenerConcurrently() {...})
It does well in almost every way, except that the DelayTimeLevel options can only be defined before the RocketMQ server starts. That means if your MQ server is configured with messageDelayLevel=1s 5s 10s, you simply cannot register a callback with delayIntervalTime=3s.
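A common workaround for that limitation is to snap each requested delay to a configured level. Rounding up (rather than to the nearest level) guarantees callbacks never fire early. A sketch, with a hypothetical messageDelayLevel list:

```python
import bisect

# Hypothetical broker config: messageDelayLevel = "1s 5s 10s 30s 1m 5m"
LEVELS_SECONDS = [1, 5, 10, 30, 60, 300]

def level_for_delay(requested_s):
    """Snap up to the next configured level so callbacks never fire early.

    Returns (1-based level index, actual delay in seconds).
    Requests beyond the largest level are capped at it.
    """
    i = bisect.bisect_left(LEVELS_SECONDS, requested_s)
    i = min(i, len(LEVELS_SECONDS) - 1)
    return i + 1, LEVELS_SECONDS[i]

print(level_for_delay(3))   # (2, 5): 3 s is not configurable, round up to 5 s
print(level_for_delay(60))  # (5, 60): exact match on an existing level
```

For delays longer than the largest configured level, another option is chaining: the consumer re-publishes the message with a further delay until the total requested delay has elapsed.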
DIY
Quartz plus storage can build the callback service you described, but I don't recommend storing callback data in a relational DB if you want high TPS; making the service distributed is hard without locks and transactions, which add complexity to the DB code.
I suggest storing callback data in Redis instead. It performs better than a relational DB, and its ZSET data structure suits this scenario well.
I once developed a timed-callback service based on Redis and Dubbo; it provides some more useful features. Maybe you can get some ideas from it: https://github.com/joooohnli/delay-callback
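The ZSET pattern the answer alludes to is: ZADD each callback with its due timestamp as the score, then repeatedly pop everything whose score is <= now. Here is a self-contained sketch of the same idea with an in-memory stand-in for the ZSET; a real deployment would use redis-py's zadd/zrangebyscore (with a Lua script or ZPOPMIN for atomicity across workers):

```python
import heapq

class DelayQueue:
    """In-memory stand-in for a Redis ZSET keyed by due-time score."""

    def __init__(self):
        self._zset = []  # (due_ts, callback_url) pairs, like score/member

    def schedule(self, due_ts, callback_url):
        # Equivalent to: ZADD callbacks <due_ts> <callback_url>
        heapq.heappush(self._zset, (due_ts, callback_url))

    def due(self, now):
        # Equivalent to: ZRANGEBYSCORE callbacks 0 <now>, then ZREM each.
        ready = []
        while self._zset and self._zset[0][0] <= now:
            ready.append(heapq.heappop(self._zset)[1])
        return ready

q = DelayQueue()
q.schedule(100, "http://svc-a/cb")
q.schedule(160, "http://svc-b/cb")
print(q.due(120))  # ['http://svc-a/cb']
print(q.due(200))  # ['http://svc-b/cb']
```

A worker loop that calls `due(time.time())` once a second comfortably meets the stated one-minute error margin, and the ZSET scales well past 100,000 members.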

real-time multiplayer server, interpolation and prediction validation

I'm building an HTML5/WebSockets-based multiplayer canvas game for Facebook, and I've been working on the server code for a few days now. While the game's pretty simple (2d top-down, WSAD controls, mouse click fires a projectile toward the cursor x/y), I've never done real-time multiplayer before. I've read a few great documents, but I'm hoping I can outline my general understanding of the topic and someone can validate the approach and/or point out areas for improvement.
Authoritative multiplayer server, client-side prediction and entity interpolation (and questions below)
1. Client connects to server
2. Client syncs time to server
3. Server has two main update loops:
   Update the game physics (or game state) on the server at a frequency of 30 per second (tick rate?)
   Broadcast the game state to all clients at a frequency of 10 per second
4. Client stores three updates before being allowed to move; this builds up the cache for entity interpolation between update states (old to new, with one redundancy in case of packet loss)
5. Upon input from the user, the client sends input commands to the server at a frequency of 10 per second; these input commands are time-stamped with the client's time
6. Client moves the player on screen as a prediction of what the server will return as the final (authoritative) position of the client
7. Server applies all updates to its physics/state in the previously mentioned update loop
8. Server sends out time-stamped world updates.
9. Client (if behind server time && has updates in the queue) linearly interpolates from the old position to the new.
Questions
At 1: possibility to use NTP time and sync between the two?
At 5: time-stamped? Is the main purpose here to time-stamp each packet?
At 7: The input commands that come in will be out of order because of the clients' different latencies. I'm guessing these need to be sorted before being applied? Or is this overkill?
At 9: is the lerp always a fixed amount, 0.5f for example? Should I be doing something smarter?
Lots of questions I know but any help would be appreciated!!
At 1 : You're overthinking this a bit. All you really have to do is send the server time to the client and, on the client side, increment it in your update loop so you're tracking time in server time. On every sync, you set your own value to the one that came from the server. Be EXTRA careful about this part: validate every speed/time server-side, or you will get extremely easy-to-do but incredibly nasty hacks.
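The sync step described above can be sketched as a one-round-trip offset estimate, using the usual NTP-style simplification that the network path is roughly symmetric (all timestamps below are made-up example values):

```python
def estimate_offset(t_send, t_server, t_recv):
    """Estimate (server_clock - client_clock) from one request/response.

    t_send / t_recv are client-clock timestamps taken around the round
    trip; t_server is the server's clock when it handled the request.
    """
    rtt = t_recv - t_send
    # Assume the server stamped the reply halfway through the round trip.
    return t_server - (t_send + rtt / 2)

# Client sends at 100.0, server (clock ~5 s ahead) stamps 105.25,
# and the reply arrives at 100.5 client time:
offset = estimate_offset(100.0, 105.25, 100.5)
print(offset)  # 5.0
# The client then renders "server time" as local_clock + offset
# and re-estimates on every sync.
```

Averaging several such estimates (or keeping the one with the smallest RTT) smooths out jitter; and per the warning above, the server must still validate everything the client claims about time.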
At 5 : Timestamping is important when you do this communication via UDP, as the order of packets is not ensured unless you specifically make it so. Via websockets it shouldn't be that big an issue, but it's still good practice (just make sure to validate those timestamps, or speed hacks ensue).
At 7 : It can be overkill; it depends on the type of game. If your clients have large lag, they will by definition send fewer inputs to the server, so make sure you only process those that arrived before the point of processing and queue the rest for the next update.
At 9 : This post from gamedev stackexchange might answer this better than I would, especially the text posted by user ggambett at the bottom.
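On the "At 9" question specifically: the lerp factor is usually not a fixed constant like 0.5f; it is derived from where the (deliberately delayed) render time falls between the two buffered snapshots. A sketch, assuming snapshots are (tick, position) pairs and the client renders about one broadcast interval in the past:

```python
def interpolate(snap_old, snap_new, render_time):
    """Linearly interpolate an entity between two timestamped snapshots."""
    t0, p0 = snap_old
    t1, p1 = snap_new
    if t1 <= t0:
        return p1
    # Fraction of the way from the old snapshot to the new one, clamped
    # so packet loss degrades to holding the last known position.
    alpha = max(0.0, min(1.0, (render_time - t0) / (t1 - t0)))
    return p0 + (p1 - p0) * alpha

old = (10, 50.0)  # snapshot at server tick 10: x = 50
new = (11, 60.0)  # snapshot at tick 11: x = 60
print(interpolate(old, new, 10.5))  # 55.0, halfway between the snapshots
print(interpolate(old, new, 12.0))  # 60.0, clamped when we run out of data
```

With a 10 Hz broadcast rate, rendering ~100-150 ms behind the newest snapshot means there is almost always a pair of states to interpolate between, which is what the three-update buffer in point 4 provides.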

Basic client-server synchronization

Let's take a simple case: we have a cloud, which the client draws, and a server which sends commands to move the cloud. Assume client 1 runs at 60 fps and client 2 runs at 30 fps, and we want a reasonably smooth cloud transition.
First problem: the server's tick rate differs from the clients', and if it sends a move command every tick, it will spam commands much faster than the clients can draw.
Possible solution 1: the client sends an "I want an update" command after finishing a frame.
Possible solution 2: the server sends move-cloud commands every x ms, but then the cloud will not move smoothly. Can be combined with solution 3.
Possible solution 3: the server sends "start moving the cloud with speed x" and "change cloud direction" instead of "move cloud to x". But the problem again is that checks for changing the cloud's direction at the edge of the screen will trigger faster than the cloud is actually drawn on the client.
Also, client 2 draws 2 times slower than client 1; how do I compensate for this?
How do I sync server logic with client drawing in a basic way?
Solution 3 sounds like the best one by far, if you can do it. All of your other solutions are much too chatty: they require extremely frequent communication between the client and server, much too frequent unless servers and clients have a very good network connection between them.
If your cloud movements are all simple enough that they can be sent to the clients as vectors, such that the client can move the cloud along one vector for an extended period of time (many frames) before receiving new instructions (a new starting location and vector) from the server, then you should definitely do that. If your cloud movements are not so easily representable as simple vectors, then you can choose a more complex model (e.g., add instructions to transform the vector over time) and send the model's parameters to the clients.
If the cloud is part of a larger world and the clients track time in the world, then each of the sets of instructions coming from the server should include a timestamp representing the time when the initial conditions in the model are valid.
As for your question about how to compensate for client 2 drawing two times slower than client 1, you need to make your world clock tick at a consistent rate on both clients. This rate need not have any relationship with the screen refresh rate on either client.
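Combining solution 3 with the world-clock advice above: the server sends (t0, start position, velocity) once, and each client evaluates the position against the shared clock. A 30 fps client and a 60 fps client then sample the same trajectory at different rates but never drift apart. A 1-D sketch with made-up numbers:

```python
def cloud_position(t, t0, x0, vx):
    """Evaluate the server-sent movement model at world time t."""
    return x0 + vx * (t - t0)

# Server sends once: at world time t0 = 2.0 the cloud is at x = 10,
# moving at 4 units/second.
t0, x0, vx = 2.0, 10.0, 4.0

# The 60 fps client samples every 1/60 s and the 30 fps client every
# 1/30 s, but at any shared world time they agree exactly:
print(cloud_position(2.5, t0, x0, vx))  # 12.0 on both clients
print(cloud_position(3.0, t0, x0, vx))  # 14.0 on both clients
```

The frame rate only determines how often the function is sampled, not where the cloud is, which is exactly why the world clock rather than the refresh rate must drive the simulation.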

Throttling in VBA

The Back Story
A little while back, I was asked if we could implement a mass email solution in house so that we would have better control over sensitive information. I proposed a two step plan: Develop a prototype in Excel/VBA/CDO for user familiarity, then phase a .Net/SQL server solution for speed and robustness.
What's Changed
Three months into the 2nd phase, management decided to go ahead and outsource email marketing to another company, which is fine. The 1st problem is that management has not yet chosen a company to go through, so I am still implicitly obligated to keep the current prototype working.
Still, the prototype works, or at least it did. The 2nd problem came when our Exchange 2003 relay server was switched to Exchange 2010. It turns out more "safety" features, like throttling policies, are turned on by default, and I have been helping the sysadmin iron out a server config that works. What's happening is that after 100+ emails get sent, the server starts rejecting the send requests with the following error:
The message could not be sent to the SMTP server. The transport error code is 0x800ccc67.
The server response was 421 4.3.2 The maximum number of concurrent connections has exceeded a limit, closing transmission channel
Unfortunately, we only get to test the server configuration when Marketing has something to send out, which is about once per month.
What's Next?
I am looking at Excel's VBA Timer function to throttle my main loop and thereby throttle the send requests. The 3rd problem, from what I understand from reading, is that the best precision I can get from the timer is 1 second. 1 email per second would take considerably longer (about 4x-5x) than the 5 emails/sec we have been sending at, turning a 3-hour process into an all-day process running past the hours of staff availability. I suppose I could invert the rate by sending 5 emails for every second that passes, but that creates more of a burst effect, as opposed to the steady rate I could achieve with more timer precision. In my opinion this makes for a less controlled process, and I am not sure how the server will handle bursts as opposed to a steady rate. What are my options?
You can use the Windows Sleep API if you need finer timer control. Its argument is in milliseconds:
Private Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
'On 64-bit Office, declare it as:
'Private Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As LongPtr)

Public Sub Testing()
    'do something
    Sleep 1000 'sleep for 1 second
    'continue doing something
End Sub
I'm not very familiar with Exchange, so I can't comment on the throttling policies in place.
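For reference, the steady-rate pacing this enables is language-agnostic: to send r emails per second smoothly, sleep 1000/r milliseconds between sends instead of sending a burst of r and sleeping a full second. A quick comparison of the two schedules (Python used purely to illustrate the timing math with hypothetical numbers; the Sleep call above plays the same role in VBA):

```python
def steady_schedule(rate_per_s, n):
    """Send times (ms) when pacing one email every 1000/rate ms."""
    gap_ms = 1000.0 / rate_per_s
    return [round(i * gap_ms) for i in range(n)]

def burst_schedule(rate_per_s, n):
    """Send times (ms) when sending `rate` emails, then sleeping 1 s."""
    return [(i // rate_per_s) * 1000 for i in range(n)]

# 5 emails/second, first 6 sends:
print(steady_schedule(5, 6))  # [0, 200, 400, 600, 800, 1000]
print(burst_schedule(5, 6))   # [0, 0, 0, 0, 0, 1000]
```

Both schedules deliver the same long-run rate; the steady one just avoids opening several near-simultaneous connections, which is what the 421 "maximum number of concurrent connections" error is complaining about.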