How to reliably recover/resend data through sockets when a server goes down? - sockets

Let's say I have an internal system of 20+ nodes that pass data back and forth to each other through sockets, where low latency is a very high priority. How do I design it so that if a random server (or servers) goes down, I can recover/resend the data that was already sent but not processed by the downed server?
For example, say A is streaming data to B, but at some point B goes down without processing some of the data. If we assume A can detect that B went down and reroute the data to C, how would I design it so that I know what data was sent to B and should now be rerouted to C?
I'm assuming I'll have to rely on one of the various message queue packages out there, but I'm wondering if there is also an easier way to do this!

Related

Omnetpp application sends multiple streams

Let's say I have a car with different sensors: several cameras, LIDAR and so on. The data from these sensors is going to be sent to some host over a 5G network (omnetpp + inet + simu5g). For video it is roughly 5000 packets of 1400 bytes each, for LIDAR 7500 packets of 1240 bytes, and so on. Each flow is encoded in UDP packets.
So in the omnetpp module, in the handleMessage method, I have two sendTo calls, each scheduled "as soon as possible", i.e., with no delay - that corresponds to the idea of multiple parallel streams. How does omnetpp handle situations where it needs to send two different packets at the same time from the same module to the same module (some client which receives the sensor data streams)? Does it create some internal buffer on the sender or receiver side, so that really only one packet is sent per handleMessage call, or is that wrong? I want to optimize data transmission and play with packet sizes and maybe with sending intervals, so I want to know how omnetpp handles multiple streams at the same time, because if it actually buffers, then maybe it makes sense to form a single packet from multiple streams, where each such packet consists of a certain amount of data from each stream.
There is some confusion here that needs to be clarified first:
OMNeT++ is a discrete event simulator framework. An OMNeT++ model contains modules that communicate with each other using OMNeT++ API calls like sendTo() and handleMessage(). Any call of the sendTo() method just queues the provided message into the future event queue (an internal, time-ordered queue). So if you send more than one packet in a single handleMessage() method, they will be queued in that order. The packets will be delivered one by one to the requested destination modules when the requested simulation time is reached. So you can send as many packets as you wish, and those packets will be delivered one by one to the destination's handleMessage() method. But beware! Even though the different packets are delivered one by one sequentially in the program's logic, they can still be delivered simultaneously in terms of simulation time. There are two time concepts here: real time, which describes the execution order of the code, and simulation time, which describes the time that passes from the point of view of the simulated system. That's why, even though OMNeT++ is a single-threaded application that runs events sequentially, it can still simulate any number of systems running in parallel.
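To make this concrete, here is a minimal plain-OMNeT++ sketch (no INET) of a module that sends two packets from the same handleMessage() call; the module, gate and message names are made up for illustration, and the cSimpleModule method used for sending is send():

    #include <omnetpp.h>
    using namespace omnetpp;

    // Hypothetical module: emits one "camera" and one "lidar" packet per tick.
    class SensorSource : public cSimpleModule
    {
      protected:
        virtual void initialize() override {
            scheduleAt(simTime(), new cMessage("tick"));   // self-message drives the loop
        }
        virtual void handleMessage(cMessage *msg) override {
            // Both sends are inserted into the future event queue during this one
            // event; they run sequentially in real time, but arrive at the
            // destination at the same simulation time (assuming the "out" gate
            // leads over an ideal connection with no datarate channel).
            send(new cPacket("cameraFrame"), "out");
            send(new cPacket("lidarScan"), "out");
            scheduleAt(simTime() + 0.01, msg);             // next sending interval
        }
    };

    Define_Module(SensorSource);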
BUT:
You are not modeling directly with OMNeT++ modules, but rather using the INET Framework, which is a model framework created specifically to simulate internet protocols and networks. INET's core entity is a node, which is something that has network interface(s) (and queues belonging to them). Transmissions between nodes are properly modeled, and only a single packet can travel on an Ethernet line at a time. Other packets must queue in the network interface queue and wait for an opportunity to be delivered from there.
This is actually the core problem of Time-Sensitive Networking (TSN): given a lot of pre-defined data streams in a network, how do the various packets interfere with and affect each other, how do they change the delay and jitter statistics of the various streams at the destination, and how can you configure the source and network gate scheduling to achieve some desired upper bounds on those statistics?
The INET master branch (to be released as INET 4.4) contains a lot of TSN code, so I highly recommend trying it if you want to model in-vehicle networks.
If you are not interested in in-vehicle communication, but rather want to stream some data over 5G, then TSN is not what you need, but you should NOT start to multiplex/demultiplex data streams at the application level. The communication layers below your UDP application will fragment/defragment and queue the packets exactly as it is done in the real world. You will not gain anything by doing mux/demux at the application layer.

Push data to client, how to handle slow clients?

In a push model, where server pushes data to clients, how does one handle clients with low or variable bandwidth?
For example, I receive data from a producer and send the data to my clients (push). What if one of my clients decides to download a Linux ISO, so the available bandwidth to that client becomes too small to keep up with my data?
Now when my producer produces data and the server pushes it to the clients, all clients have to wait until every client has downloaded the data. This is a problem when there are one or more slow clients with little bandwidth.
I can cache the data to be sent for every client, but because the data size is big this isn't really an option (lots of clients * data size = huge memory requirements).
How is this generally solved? No need for code, just a few thoughts/ideas are already more than welcome.
Now when my producer produces data and the server pushes it to the clients, all clients have to wait until every client has downloaded the data.
The above shouldn't be the case -- your clients should be able to download asynchronously from each other, with each client maintaining its own independent download state. That is, client A should never have to wait for client B to finish, and vice versa.
I can cache the data to be sent for every client, but because the data size is big this isn't really an option (lots of clients * data size = huge memory requirements).
As Warren said in his answer, this problem can be reduced by keeping only one copy of the data rather than one copy per client. Reference-counting (e.g. via shared_ptr, if you are using C++, or something equivalent in another language) is an easy way to make sure that the shared data is deleted only when all clients are done downloading it. You can make the sharing more fine-grained, if necessary, by breaking up the data into chunks (e.g. instead of all clients holding a reference to a single 800MB Linux ISO, you could break it up into 800 1MB chunks, so that you can start removing the earlier chunks from memory as soon as all clients have downloaded them, instead of having to hold the entire 800MB of data in memory until every client has downloaded the entire thing).
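As a rough illustration of that idea (not anyone's actual code; the class and member names are invented), a shared, chunked cache in C++ could keep one reference-counted copy of each chunk and a per-client cursor into it:

    #include <algorithm>
    #include <cstddef>
    #include <deque>
    #include <map>
    #include <memory>
    #include <vector>

    // One immutable, reference-counted chunk (e.g. 1MB of the file being pushed).
    using Chunk = std::shared_ptr<const std::vector<char>>;

    struct ClientCursor {
        std::size_t next = 0;          // absolute index of the next chunk this client needs
    };

    class ChunkedCache {
    public:
        void append(Chunk c) { chunks_.push_back(std::move(c)); }

        // Hand a client its next chunk, or nullptr if it has caught up.
        Chunk take(ClientCursor& cur) {
            if (cur.next < dropped_ || cur.next - dropped_ >= chunks_.size())
                return nullptr;
            return chunks_[cur.next++ - dropped_];
        }

        // Drop the chunks that every connected client has already consumed,
        // so memory stays bounded by the slowest client's backlog.
        void trim(const std::map<int, ClientCursor>& clients) {
            std::size_t minNext = dropped_ + chunks_.size();
            for (const auto& kv : clients)
                minNext = std::min(minNext, kv.second.next);
            while (dropped_ < minNext && !chunks_.empty()) {
                chunks_.pop_front();
                ++dropped_;
            }
        }

    private:
        std::deque<Chunk> chunks_;     // chunks still needed by at least one client
        std::size_t dropped_ = 0;      // chunks already trimmed from the front
    };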
Of course, that sort of optimization only gets you so far -- e.g. if two clients each request a different 800MB file, then you're liable to end up with 1.6GB of RAM usage for caching, unless you come up with a more clever solution.
Here are some possible approaches you could try (from less complex to more complex). You could try any of these either separately or in combination:
Monitor how large each client's "backlog" is -- that is, keep a count of the amount of data you have cached waiting to send to that client. Also keep track of the total number of bytes of cached data your server is currently holding; if that number gets too high, force-disconnect the client with the largest backlog in order to free up memory. (This doesn't result in a good user experience for that client, of course; but if the client has a buggy or slow connection he was unlikely to have a good user experience anyway. It does keep your server from crashing or swapping itself to death just because a single client has a bad connection.)
Keep track of how much data your server has cached and waiting to send out. If the amount of data you have cached is too large (for some appropriate value of "too large"), temporarily stop reading from the socket(s) that are pushing the data out to you (or if you are generating your data internally, temporarily stop generating it). Once the amount of cached data gets down to an acceptable level again, you can resume receiving (or generating) more data to push.
(this may or may not be applicable to your use-case) Revise your data model so that instead of being communications-oriented, it becomes state-oriented. For example, if your goal is to update the clients' state to match the state of the data-source, and you can organize the data-source's state into a set of key/value pairs, then you can require that the data-source include a key with each piece of data it sends. Whenever a key/value pair is received from the data-source, simply place that key/value pair into a map (or hash table or some other key/value oriented data structure) for each client (again, use shared_ptrs or similar here to keep memory usage reasonable). Whenever a given client has drained its queue of outgoing TCP data, remove the oldest item from that client's key/value map, convert it into TCP bytes to send, and add them to the outgoing-TCP-data queue. Repeat as necessary. The advantage of this is that "obsolete" values for a given key are automatically dropped inside the server and therefore never need to be sent to the slow clients; rather, the slow clients will only ever get the "latest" value for that given key. The beneficial consequence is that a given client's maximum "backlog" will be limited by the number of keys in the state model, regardless of how slow or intermittent the client's bandwidth is. Thus a slow client might see fewer updates (per second/minute/hour), but the updates it does see will still be as recent as possible given its bandwidth.
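A minimal sketch of that state-oriented approach (illustrative names; socket handling, framing and locking omitted) might look like this:

    #include <deque>
    #include <map>
    #include <memory>
    #include <string>
    #include <vector>

    // The latest value for a key is shared between all clients (no per-client copy).
    using Value = std::shared_ptr<const std::vector<char>>;

    struct ClientSession {
        std::map<std::string, Value> pending;   // latest not-yet-sent value per key
        std::deque<char> outgoing;              // bytes queued for this client's socket
    };

    // The data source published a new key/value pair: overwrite it per client, so an
    // obsolete value that a slow client never got around to sending is dropped here.
    void onUpdate(std::map<int, ClientSession>& clients,
                  const std::string& key, const Value& v)
    {
        for (auto& kv : clients)
            kv.second.pending[key] = v;
    }

    // A client's outgoing TCP buffer has drained: move one pending key/value pair
    // into its byte queue. (The text above suggests taking the oldest entry; a FIFO
    // of "dirty" keys per client would give you that ordering.)
    void fillOutgoing(ClientSession& s)
    {
        if (s.pending.empty()) return;
        auto it = s.pending.begin();
        s.outgoing.insert(s.outgoing.end(), it->first.begin(), it->first.end());
        s.outgoing.insert(s.outgoing.end(), it->second->begin(), it->second->end());
        s.pending.erase(it);
    }

With this shape, a client's backlog can never exceed one entry per key, no matter how slow its connection is.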
Cache the data once only, and have each client handler keep track of where it is in the download, all using the same cache. Once all clients have all the data, the cached data can be deleted.

Continuous stream of data via socket gets progressively more delayed

I am working on an application which, through a Java program, links two different robot simulation environments. One simulation environment (let's call it A) sends the current state of the robot to the Java application, which does some calculations and then sends data about this current state, as well as some other information, on to the other simulation environment (let's call it B). Simulation B then updates the state of the robot to match Simulation A's version.
The problem is that as the program continues to run, simulation B begins to lag behind what simulation A is doing. This lag increases continuously, so that after a minute or so simulation B is several seconds behind.
I am using TCP sockets to send data between these environments and the Java program. From background reading on socket programming, I found out it is bad practice to rapidly open and close sockets over and over, so what I am doing currently is just keeping both sockets open. I have a loop running which grabs data from Sim A, does some calculations, and then sends the position data to Sim B; then I have the thread wait for 100 ms before the loop repeats. To be clear, the position data sent to B is unaltered from what is received from A.
Upon researching the lag issue, someone suggested to me that for streams of data it is actually a good idea to open and close sockets, because if you keep the socket open and one simulation takes longer to process things than the other, the position data stacks up in the buffer and gets read sequentially, instead of the most recent data being read. Is this true? Would rewriting my code to open and close sockets every 100 ms potentially get rid of the delay? Or is this not how sockets actually work?
Edit for clarification: It is more critical that the simulations stay in sync than that all position data is sent; in other words, it is acceptable to not pass along all data points for the sake of staying in sync.
Besides the possibility that keeping the socket open is causing problems, does anyone have any ideas of what might be causing the lag issue?
Thanks in advance for any insight/suggestions/hints!
You are correct about using a single connection. Data can indeed back up, but using multiple connections doesn't change that.
The basic question here is whether the Java program can calculate as fast as the robot can send data. If it can't, it will get behind. You can do various things to the networking to speed it up but if the computations can't keep up they are futile. So you need to investigate your timings.

Akka and state among actors in cluster

I am working on my bachelor's thesis project, which should be a Minecraft server written in Scala and Akka. The server should be easily deployable in the cloud or onto a cluster (not sure whether I'm using the proper terminology... it should run on multiple nodes). I am, however, a newbie in Akka, and I have been wondering how to implement such a thing. The problem I'm trying to figure out right now is how to share state among actors on different nodes. My first idea was to have a Camel actor that would read the TCP stream from Minecraft clients and then send it to a load balancer, which would select a node to process the request and then send some response to the client via TCP. Let's say I have an AuthenticationService actor that checks whether the credentials provided by a user are valid. Every node would have such an actor (or perhaps several of them), and all the actors should have exactly the same database (or state) of users at all times. My question is, what is the best approach to keeping this state? I have come up with some possible solutions, but I haven't done anything like this before, so please point out the flaws:
Solution #1: Keep state in a database. This would probably work very well for the authentication example, where the state is represented by something like a list of usernames and passwords, but it probably wouldn't work in cases where the state contains objects that can't easily be broken down into integers and strings.
Solution #2: Every time there is a request to a certain actor that changes its state, the actor will, after processing the request, broadcast information about the change to all other actors of the same type, which would then update their state according to the info sent by the original actor. This seems very inefficient and rather clumsy.
Solution #3: Have a certain node serve as a sort of state node, containing actors that represent the state of the entire server. All other actors outside that node would have no state and would ask the actors in the "state node" every time they need some data. This also seems inefficient and not particularly fault-tolerant.
So there you have it. The only solution I actually like is the first one, but like I said, it probably works for only a very limited subset of problems (when the state can be broken into Redis structures). Any response from more experienced gurus would be very much appreciated.
Regards, Tomas Herman
Solution #1 could possibly be slow. Also, it is a bottleneck and a single point of failure (meaning the application stops working if the node with the database fails). Solution #3 has similar problems.
Solution #2 is less trivial than it seems. First, it is a single point of failure. Second, there are no atomicity or other ordering guarantees (such as regularity) for reads or writes, unless you do a total order broadcast (which is more expensive than a regular broadcast). In fact, most distributed register algorithms will do broadcasts under-the-hood, so, while inefficient, it may be necessary.
From what you've described, you need atomicity for your distributed register. What do I mean by atomicity? Atomicity means that any read or write in a sequence of concurrent reads and writes appears as if it occurs at a single point in time.
Informally, in Solution #2 with a single actor holding a register, this guarantees that if two subsequent writes W1 and then W2 to the register occur (meaning two broadcasts), then no other actor reading the values from the register will read them in an order other than first W1 and then W2 (it's actually more involved than that). If you go through a couple of examples of subsequent broadcasts where messages arrive at the destination at different points in time, you will see that such an ordering property isn't guaranteed at all.
If ordering guarantees or atomicity aren't an issue, some sort of a gossip-based algorithm might do the trick to slowly propagate changes to all the nodes. This probably wouldn't be very helpful in your example.
If you want something fully fault-tolerant and atomic, I recommend reading this book on reliable distributed programming by Rachid Guerraoui and Luís Rodrigues, or at least the parts related to distributed register abstractions. These algorithms are built on top of a message-passing communication layer and maintain a distributed register supporting read and write operations. You can use such an algorithm to store distributed state information. However, they aren't applicable to thousands of nodes or large clusters because they do not scale, typically having complexity polynomial in the number of nodes.
On the other hand, you may not need to replicate the state of the distributed register across all of the nodes - replicating it across a subset of your nodes (instead of just one node) and accessing those nodes to read or write from it provides a certain level of fault tolerance (the register information is lost only if the entire subset of nodes fails). You can possibly adapt the algorithms in the book to serve this purpose.

Implement a good performing "to-send" queue with TCP

In order not to flood the remote endpoint my server app will have to implement a "to-send" queue of packets I wish to send.
I use Windows Winsock, I/O Completion Ports.
So, I know that when my code calls "socket->send(.....)", my custom "send()" function will check to see whether data is already "on the wire" (towards that socket).
If data is indeed on the wire, it will simply queue the new data to be sent later.
If no data is on the wire it will call WSASend() to really send the data.
So far everything is nice.
Now, the size of the data I'm going to send is unpredictable, so I break it into smaller chunks (say 64 bytes) in order not to waste memory for small packets, and queue/send these small chunks.
When a "write-done" completion status is given by IOCP regarding the packet I've sent, I send the next packet in the queue.
That's the problem; The speed is awfully low.
I'm actually getting speeds like 200kb/s, and that's on a local connection (127.0.0.1).
So, I know I'll have to call WSASend() with several chunks (an array of WSABUF objects), and that will give much better performance, but how much should I send at once?
Is there a recommended number of bytes? I'm sure the answer is specific to my needs, yet I'm also sure there is some "general" point to start from.
Is there any other, better, way to do this?
Of course, you only need to resort to providing your own queue if you are trying to send data faster than the peer can process it (either due to the link speed or the speed at which the peer can read and process the data). Even then, you only need to resort to your own data queue if you want to control the amount of system resources being used. If you only have a few connections then it is likely that this is all unnecessary; if you have 1000s then it's something that you need to be concerned about. The main thing to realise here is that if you use ANY of the asynchronous network send APIs on Windows, managed or unmanaged, then you are handing control over the lifetime of your send buffers to the receiving application and the network. See here for more details.
And once you have decided that you DO need to bother with this, you still don't always need to bother: if the peer can process the data faster than you can produce it, then there's no need to slow things down by queuing on the sender. You'll see that you need to queue data when your write completions begin to take longer, because the overlapped writes that you issue cannot complete while the TCP stack is unable to send any more data due to flow control issues (see http://www.tcpipguide.com/free/t_TCPWindowSizeAdjustmentandFlowControl.htm). At that point you are potentially using an unconstrained amount of limited system resources (both non-paged pool memory and the number of memory pages that can be locked are limited, and (as far as I know) both are used by pending socket writes)...
Anyway, enough of that... I assume you already have achieved good throughput before you added your send queue? To achieve maximum performance you probably need to set the TCP window size to something larger than the default (see http://msdn.microsoft.com/en-us/library/ms819736.aspx) and post multiple overlapped writes on the connection.
Assuming you already HAVE good throughput, you then need to allow a number of pending overlapped writes before you start queuing; this maximises the amount of data that is ready to be sent. Once you have your magic number of pending writes outstanding, you can start to queue the data and then send it based on subsequent completions. Of course, as soon as you have ANY data queued, all further data must be queued. Make the number configurable and profile to see what works best as a trade-off between speed and resources used (i.e. the number of concurrent connections that you can maintain).
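A rough per-connection sketch of that scheme (names and structure invented for illustration; error handling, matching completions back to buffers via the OVERLAPPED pointer, and thread safety are all omitted) could look like this:

    #include <winsock2.h>
    #include <cstddef>
    #include <deque>
    #include <memory>
    #include <vector>

    // Allow a few overlapped writes to be outstanding; beyond that, queue.
    constexpr std::size_t kMaxPendingWrites = 4;

    struct SendBuffer {
        std::vector<char> data;
        WSAOVERLAPPED     ov{};        // one OVERLAPPED per outstanding write
        WSABUF            wsabuf{};
    };

    struct Connection {
        SOCKET sock = INVALID_SOCKET;
        std::size_t pendingWrites = 0;
        std::deque<std::unique_ptr<SendBuffer>> queued;     // not yet handed to Winsock
        std::deque<std::unique_ptr<SendBuffer>> inFlight;   // owned until completion

        void post(std::unique_ptr<SendBuffer> buf) {
            buf->wsabuf.buf = buf->data.data();
            buf->wsabuf.len = static_cast<ULONG>(buf->data.size());
            DWORD sent = 0;
            // Overlapped send; the completion is reported later via the IOCP.
            int rc = WSASend(sock, &buf->wsabuf, 1, &sent, 0, &buf->ov, nullptr);
            if (rc == 0 || WSAGetLastError() == WSA_IO_PENDING) {
                ++pendingWrites;
                inFlight.push_back(std::move(buf));
            }
            // (real code must handle hard failures here)
        }

        void send(std::unique_ptr<SendBuffer> buf) {
            // As soon as anything is queued, everything must be queued to keep ordering.
            if (pendingWrites < kMaxPendingWrites && queued.empty())
                post(std::move(buf));
            else
                queued.push_back(std::move(buf));
        }

        // Call from the IOCP thread when a write completion for this connection is
        // dequeued (simplified: assumes completions arrive in the order of issue).
        void onWriteCompleted() {
            --pendingWrites;
            if (!inFlight.empty()) inFlight.pop_front();     // releases that buffer
            if (!queued.empty()) {
                auto next = std::move(queued.front());
                queued.pop_front();
                post(std::move(next));
            }
        }
    };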
I tend to queue the whole data buffer that is due to be sent as a single entry in a queue of data buffers. Since you're using IOCP, it's likely that these data buffers are already reference counted, to make it easy to release them when the completions occur and not before; so the queuing process is made simpler, as you simply hold a reference to the send buffer whilst the data is in the queue and release it once you've issued the send.
Personally I wouldn't optimise by using scatter/gather writes with multiple WSABUFs until you have the base working and you know that doing so actually improves performance, I doubt that it will if you have enough data already pending; but as always, measure and you will know.
64 bytes is too small.
You may have already seen this but I wrote about the subject here: http://www.lenholgate.com/blog/2008/03/bug-in-timer-queue-code.html though it's possibly too vague for you.