Deserializing multiple objects from an asynchronous socket

All,
I'm a bit of a newbie to C# and socket programming and could use some advice. I have been looking on this site and similar sites but haven't really found a solution to my problem.
I am developing a client application and a server application and the two are communicating over an asynchronous socket. The client sends objects to the server, one at a time, by serializing it to a MemoryStream using BinaryFormatter. The resulting byte array is sent over the socket and deserialized by the server.
This works well when the server has time to receive and process the object before the client sends a new one. However when the client sends objects faster than the server can handle them, they queue up at the server side. The next EndReceive() call reads all queued objects from the socket, but the serializer only deserializes the first object and the other ones are lost.
The objects are of variable size, so I guess I can't use the Position property of MemoryStream. Is there a way to detect where in the byte array each object starts?
Also, I have read in other posts that EndReceive() may not receive everything that was sent in one read; additional reads may be needed. So I guess that's something else I'll have to deal with?
Any pointers? Any help would be greatly appreciated. :-)

You could read as much as is available and queue it up for processing so that data doesn't back up on the socket. The server could simply read the incoming data and push it into a message queue to be processed asynchronously.
It's concerning that the server can't process fast enough to keep up with writes though; you might want to look into optimizing that.
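To illustrate the idea, here is a rough sketch in C of that decoupling; the same shape applies in C# with a background thread and a queue. The framing problem (knowing where one serialized object ends and the next begins) still has to be solved separately, e.g. with a length prefix. All struct and function names here are invented for the example:

    /* Sketch only: a receive thread drains the socket into a thread-safe
     * queue so the kernel buffer never backs up; a worker thread
     * deserializes at its own pace. */
    #include <pthread.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    struct msg { struct msg *next; size_t len; char data[]; };

    static struct msg *head, *tail;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  more = PTHREAD_COND_INITIALIZER;

    static void enqueue(struct msg *m) {
        pthread_mutex_lock(&lock);
        if (tail) tail->next = m; else head = m;
        tail = m;
        pthread_cond_signal(&more);   /* wake the worker thread */
        pthread_mutex_unlock(&lock);
    }

    static void *receiver(void *arg) {
        int sock = *(int *)arg;
        char buf[65536];
        for (;;) {
            ssize_t n = recv(sock, buf, sizeof buf, 0);
            if (n <= 0) break;                     /* closed or error */
            struct msg *m = malloc(sizeof *m + (size_t)n);
            m->next = NULL;
            m->len  = (size_t)n;
            memcpy(m->data, buf, (size_t)n);
            enqueue(m);   /* the worker reassembles frames from these
                             chunks and deserializes complete objects */
        }
        return NULL;
    }

A worker thread pops messages off the queue and deserializes them at its own pace, so a slow deserializer never stalls the socket.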

Related

Sending and receiving data over the Internet

This question is not about a concrete implementation of how this is done. It is more about the concept and design of sending information over the Internet with some kind of protocol - either TCP or UDP. I know only that sockets are needed, but I am wondering about the rest. For example, after a connection is made you send information through it, but how does the other end listen on a specific port, and does it listen constantly?
Is listening done in a background thread waiting for information to be received? (In order to be able to do other things/processing while waiting for information)
So in essence, I think a real-world example of how such an application works at a high level would be enough to explain the data flow. For example, sending files in Skype or something similar.
P.S. Most other questions on similar topics are about a concrete implementation or a bug that someone has.
What I currently do in an application is the following, using POSIX sockets with the TCP protocol:
The most important thing is: most of these functions are blocking. So when you tell your server to wait for a client connection, the function will block until a connection is established (if you need a server that handles multiple clients at once, you need to use threading!).
The server listens on a specific port until a client connects. After the connect, you get a new socket file descriptor to communicate with the client while the initial socket keeps listening for new connections. My server then creates a new thread to handle that client while waiting for new connections on the initial socket. In the new thread the server waits for a request command from the client (e.g. "request login token"). After a request is received, the server gathers the information, packs it together using Google's Protocol Buffers and sends it to the client. The client then either tells the server to terminate the session (once it has received all the data it needs) or sends another request.
That's basically the idea of my server. The bigger problem is the way you transmit and receive data. E.g. you can't send structs or classes (at least not portably in C++) over the wire; you need some kind of serializer, and you have to make sure the other side knows how much to receive. So what I do is: first send a 4-byte integer over the wire containing the size of the incoming package, then send the package itself using a serializer (in my case Google's Protocol Buffers). The other side waits for 4 bytes to be available, knowing that this will be the size of the incoming package. After those 4 bytes are received, the program waits for exactly that amount of data to be available on the socket, then reads it out of the buffer and deserializes it. If the socket receives no data for 30 seconds, a timeout triggers and the connection is terminated.
What you always need to be aware of is the endianness of the systems. A big-endian system (e.g. PowerPC) and a little-endian system (e.g. x86) will have problems when you send an integer directly over the wire. For example, the 32-bit integer 1 leaves an x86 as the byte sequence
01 00 00 00
but a PowerPC interprets those same bytes as
0x01000000 = 16777216
so the receiver ends up with a completely different value. So you should always use functions like ntohl and htonl, which convert between host byte order and network byte order (network byte order is always big endian).
Hope this kind of helps. I could also provide some code to you if that would help.
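In that spirit, a minimal C sketch of the 4-byte length prefix scheme described above; send_all, recv_all, send_packet and recv_packet are names I made up for this illustration:

    #include <arpa/inet.h>   /* htonl, ntohl */
    #include <stdint.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    /* Loop until exactly `len` bytes have been sent/received; TCP may
     * transfer fewer bytes per call. Returns 0 on success, -1 on error
     * or peer close. */
    static int send_all(int fd, const void *buf, size_t len) {
        const char *p = buf;
        while (len > 0) {
            ssize_t n = send(fd, p, len, 0);
            if (n <= 0) return -1;
            p += n; len -= (size_t)n;
        }
        return 0;
    }

    static int recv_all(int fd, void *buf, size_t len) {
        char *p = buf;
        while (len > 0) {
            ssize_t n = recv(fd, p, len, 0);
            if (n <= 0) return -1;
            p += n; len -= (size_t)n;
        }
        return 0;
    }

    /* Sender: 4-byte big-endian length, then the serialized payload. */
    int send_packet(int fd, const void *payload, uint32_t len) {
        uint32_t nlen = htonl(len);
        if (send_all(fd, &nlen, 4) < 0) return -1;
        return send_all(fd, payload, len);
    }

    /* Receiver: read the 4-byte prefix, then exactly that many bytes. */
    int recv_packet(int fd, void *payload, uint32_t maxlen, uint32_t *outlen) {
        uint32_t nlen;
        if (recv_all(fd, &nlen, 4) < 0) return -1;
        *outlen = ntohl(nlen);
        if (*outlen > maxlen) return -1;   /* sanity-check the size */
        return recv_all(fd, payload, *outlen);
    }

The loops are the important part: TCP is a byte stream, so a single send or recv may transfer less than you asked for, and you must keep going until the full count is reached.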

Game server TCP networking sockets - fairness

I'm writing a game server for a turn-based game. One criteria is that the game needs to be as fair for all players as possible.
So far it works like this:
Each client has a TCP connection. (If relevant, the connection is opened via WebSockets)
While running, continually check for incoming socket messages via epoll.
Iterate through clients with sockets ready to read:
Read all messages from the client.
Update the internal game state for each message.
Queue outgoing messages to affected clients.
At the end of each "window" (turn):
Iterate through clients and write all queued outgoing messages to their sockets
My concern for fairness raises the following questions:
Does it matter in which order I send messages to the clients?
Calling write() on all the sockets takes only a fraction of a second for my program, but somewhere in the underlying OS or networking would it make a difference if I sorted the client list?
Perhaps I should be sending to the highest-latency clients first?
Does it matter how I write the outgoing messages to the sockets?
Currently I'm writing them as one large chunk. The size can exceed a single packet.
Would it be faster for the client to begin its processing if I sent messages in smaller chunks than 1 packet?
Would it be better to write 1 packet worth to each client at a time, and iterate over the clients multiple times?
Are there any linux/networking configurations that would bear impact here?
Thanks in advance for your feedback and tips.
Does it matter in which order I send messages to the clients?
Yes, by fractions of milliseconds. If the network interface is available for sending, the OS will start sending immediately. Why would it wait?
Perhaps I should be sending to the highest-latency clients first?
I think you should be sending in random order. Shuffle the list prior to sending. This makes it fair. I think your question is valid and this should be addressed.
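For example, a sketch of what I mean, assuming the clients are held in a plain array of socket descriptors (names invented):

    #include <stdlib.h>

    /* Fisher-Yates shuffle of the client fd array before each send pass,
     * so no client is systematically served first. Seed with srand()
     * once at startup. */
    void shuffle_clients(int *fds, int n) {
        for (int i = n - 1; i > 0; i--) {
            int j = rand() % (i + 1);
            int tmp = fds[i]; fds[i] = fds[j]; fds[j] = tmp;
        }
    }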
Currently I'm writing them as one large chunk. [...]
First, realize that TCP is stream-based and that there are no packets/messages at the protocol level. On a physical level data is indeed packetized.
It is not necessary to manually split off packets because clients will read data as it arrives anyway. If a client issues a read, that read will complete immediately once the first packet has arrived. There is no artificial waiting in the OS.
Are there any linux/networking configurations that would bear impact here?
I don't know of any. Be sure to disable Nagle's algorithm, though.
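Disabling Nagle is a one-liner on a POSIX socket; something like this sketch:

    #include <netinet/in.h>
    #include <netinet/tcp.h>   /* TCP_NODELAY */
    #include <sys/socket.h>

    /* Disable Nagle's algorithm so small turn updates go out immediately
     * instead of being coalesced while the stack waits for ACKs. */
    int disable_nagle(int fd) {
        int on = 1;
        return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof on);
    }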

How to maintain a persistent network connection between two applications over a network?

I was recently approached by my management with an interesting problem. I'm fairly sure I'm giving my bosses the correct information, but I want to be certain I'm telling them the right things.
I am being asked to develop some software that has this function:
An application at one location is constantly processing real-time data every second and only generates data if the underlying data has changed in any way.
In the event that the data has changed, sends the results to another box over the network
Maintains a persistent connection between the two machines, alerting the remote box if for some reason the network connection goes down
From what I understand, I imagine I need to do some reading on some sort of TCP/IP socket-level programming. That way, if the connection is dropped, the remote location will be aware that the data it has received may be stale.
However, management seems to be very convinced that this can be accomplished using SOAP. I was under the impression that SOAP is more or less a way for a client to invoke a procedure on a server and get some results via the HTTP protocol. Am I wrong in assuming this? I haven't been able to find much information on how SOAP might solve a problem like this.
I feel like a lot of people around my office are using SOAP as a buzzword and that has generated a bit of confusion over what SOAP actually is - and is capable of.
Any thoughts on how to accomplish this task would be appreciated!
I think SOAP is the wrong tool. SOAP is a spec for exchanging structured data. For your problem, the simplest thing would be to write a program to just transfer data and figure out if the other end is alive. Sockets are a good way to go. There are lots of socket programming tutorials on the net. Pick your language, and ask Mr. Google. Write a couple of demo programs to teach yourself how it works. Ask if you have more specific questions.
For the problem, you'll need a sender and a receiver. The sender sends data when it gets it; the receiver waits for data and hands it off when it arrives. Get that working first. Next, add in heartbeats: a message that says "I'm alive", sent periodically. Get that working next. You'll need to determine the exact behavior you want -- whether both sides send heartbeats to the other end, the maximum time you are willing to wait for a heartbeat, and what action to take should heartbeats stop arriving. The network connection can drop, the other end can crash, the other end can hang, and perhaps there are other conditions you should think about (e.g., what if the real-time data is nonsense?). Figure out how to handle each condition, and code up the error handling. Test it out, and serve with a side of documentation.
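To make the heartbeat idea concrete, here is a rough C sketch of the receiving side; the one-second poll and 15-second dead timer are arbitrary values I picked for the example:

    #include <stdio.h>
    #include <sys/select.h>
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <time.h>

    #define DEAD_AFTER 15   /* seconds of silence => assume peer is down */

    /* Real data and heartbeats both reset the timer; if nothing arrives
     * for DEAD_AFTER seconds, treat the connection as dead. */
    void receive_loop(int sock) {
        time_t last_heard = time(NULL);
        char buf[4096];
        for (;;) {
            fd_set rfds;
            FD_ZERO(&rfds);
            FD_SET(sock, &rfds);
            struct timeval tv = { 1, 0 };        /* wake every second */
            int r = select(sock + 1, &rfds, NULL, NULL, &tv);
            if (r > 0 && FD_ISSET(sock, &rfds)) {
                ssize_t n = recv(sock, buf, sizeof buf, 0);
                if (n <= 0) { puts("connection closed"); return; }
                last_heard = time(NULL);
                /* hand any non-heartbeat bytes to the application */
            }
            if (time(NULL) - last_heard > DEAD_AFTER) {
                puts("peer stopped responding; data may be stale");
                return;
            }
        }
    }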
SOAP certainly won't tell you when the data source goes down, though you could use "heartbeats" to add that.
Probably you are right, and they are just repeating a buzzword and don't actually know much about what SOAP is or does, or have any real argument for why it ought to be used here.

serving large file using select, epoll or kqueue

Nginx uses epoll or other multiplexing techniques (select) to handle multiple clients, i.e. it does not spawn a new thread for every request, unlike Apache.
I tried to replicate the same in my own test program using select. I could accept connections from multiple clients by creating a non-blocking socket and using select to decide which client to serve. My program simply echoes their data back to them. It works fine for small data transfers (a few bytes per client).
The problem occurs when I need to send a large file over a connection to a client. Since I have only one thread to serve all clients, I cannot resume serving the other clients until I have finished reading the file and writing it to the socket.
Is there a known solution to this problem, or is it best to create a thread for every such request?
When using select you should not send the whole file at once. If you use e.g. sendfile to do this, it will block until the whole file has been sent. Instead, use a small buffer and send a little data at a time to each client. Then use select to identify when the socket is ready to be written to again, and send some more until all data has been sent. This will allow you to handle multiple clients in parallel.
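A sketch of the bookkeeping this requires, with made-up struct and field names; the idea is that each client remembers how far into its file it has gotten:

    #include <sys/socket.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Invented per-client state: how much of the file has been sent. */
    struct client {
        int    fd;
        char  *data;     /* file contents (or a window of them) */
        size_t len;      /* total bytes to send */
        size_t sent;     /* bytes written so far */
    };

    /* Called when select() reports this client's socket writable: send
     * what the kernel will take now, remember the offset, and return to
     * the event loop so the other clients get served in between. */
    void on_writable(struct client *c) {
        ssize_t n = send(c->fd, c->data + c->sent, c->len - c->sent, 0);
        if (n > 0)
            c->sent += (size_t)n;
        else if (n < 0)
            close(c->fd);   /* sketch: a real server would check EAGAIN */
        /* once c->sent == c->len, stop adding fd to the write set */
    }

Only put a client's fd in the write set while it still has unsent data; otherwise select will report it writable constantly and spin.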
The simplest approach is to create a thread per request, but it's certainly not the most scalable approach. I think at this time basically all high-performance web servers use various asynchronous approaches built on things like epoll (Linux), kqueue (BSD), or IOCP (Windows).
Since you don't provide any information about your performance requirements, and since all the non-threaded approaches require restructuring your application to use these often-complex asynchronous techniques (as described in the C10K article and others found from there), for now your best bet is just to use the threaded approach.
Please update your question with concrete requirements for performance and other relevant data if you need more.
For background, this may be useful reading: http://www.kegel.com/c10k.html
I think you are using your callback to handle a single connection. That is not how it was designed: your callback has to handle the however-many-thousand connections you are planning to serve. That is, from the file descriptor you get as a parameter, you have to know (by looking at your global state) what to do with that client, either read() or send() or... whatever.
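Something like this sketch, with invented names; the table maps each file descriptor to what that connection is currently doing:

    #include <stddef.h>
    #include <sys/select.h>   /* FD_SETSIZE */

    /* One table of per-connection state, indexed by fd. When the event
     * loop reports activity on a descriptor, look up what that client
     * is doing and continue from there. */
    enum state { READING_REQUEST, SENDING_FILE };

    struct conn {
        enum state state;
        size_t     offset;   /* progress within the current operation */
    };

    static struct conn conns[FD_SETSIZE];

    void on_ready(int fd) {
        switch (conns[fd].state) {
        case READING_REQUEST:
            /* read(), parse, maybe switch to SENDING_FILE */
            break;
        case SENDING_FILE:
            /* send the next chunk starting at conns[fd].offset */
            break;
        }
    }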

Socket programming: how do I handle out-of-band data?

I just looked at Wikipedia's entry on out-of-band data, and as far as I understand, OOB data is somehow flagged as more important and treated like ordinary data, but transmitted in a separate stream, which profoundly confuses me.
The actual question would be (besides "Could someone explain what OOB data is?"):
I'm writing a unix application that uses sockets and need to make use of select() and was wondering what to do with the exceptfds parameter? Do I need to put all my sockets into this parameter and react to such events? Or do I just ignore them?
I know you've decided you don't need to handle OOB data, but here are some things to keep in mind if you ever do care about OOB...
TCP doesn't really send OOB data on a separate channel, or at a different priority. It is just a flag (the urgent pointer) in the TCP header.
OOB data is extremely limited -- 1 byte!
OOB data can be received either inline or separately depending on socket options
An "exception" signaling OOB data may occur even if the next read doesn't contain the OOB data (the network stack on the sender may flag any already queued data, so the other side will know there's OOB ASAP). This is often handled by entering a "drain" loop where you discard data until the actual OOB data is available.
If this seems a bit confusing and worthless, that's because it mostly is. There are good reasons to use OOB, but it's rare. One example is FTP, where the user may be in the middle of a large transfer but decide to abort. The abort is sent as OOB data. At that point the server and client just eat any further "normal" data to drain anything that's still in transit. If the abort were handled inline with the data then all the outstanding traffic would have to be processed, only to be dumped.
It's good to be aware that OOB exists and the basics of how it works, just in case you ever do need it. But don't bother learning it inside-out unless you're just curious. Chances are decent you may never use it.
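If you do end up needing it, the mechanics are small. A sketch under the assumption that SO_OOBINLINE is not set (with it set, the byte would instead arrive in the normal stream):

    #include <stdio.h>
    #include <sys/select.h>
    #include <sys/socket.h>

    /* Sender: an urgent (OOB) byte is just a flag on send(). */
    void send_abort(int sock) {
        char c = '!';
        send(sock, &c, 1, MSG_OOB);
    }

    /* Receiver: put the socket in exceptfds; when it fires, pull the
     * single OOB byte with MSG_OOB. */
    void wait_for_oob(int sock) {
        fd_set efds;
        FD_ZERO(&efds);
        FD_SET(sock, &efds);
        if (select(sock + 1, NULL, NULL, &efds, NULL) > 0 &&
            FD_ISSET(sock, &efds)) {
            char c;
            if (recv(sock, &c, 1, MSG_OOB) == 1)
                printf("urgent byte: %c\n", c);
        }
    }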
I think I found the answer on this page. In short:
I don't need to handle OOB data on the receiving side if I'm not sending any OOB data. I had thought that OOB data could be generated by the OS of the sender.
You don't need to handle it at the receiving end even if you are sending it - OOB data is transparently ignored in all circumstances unless you actively go about receiving it.