TCP Sockets: "Rollback" after timeout occurred

This is a rather general question about TCP sockets. I have a client/server application where messages are sent over the wire via TCP. The implementation uses C++ POCO; however, the question is not tied to any particular technology.
A message can be a request (initiated by the client) or a response (initiated by the server).
A request has the structure:
Message Header
Request Header
Parameters
A response has the structure
Message Header
Response Header
Parameters
I know TCP guarantees that sent data will be delivered in the order it was sent. However, nothing can be assumed about how long a delivery might take.
On both sides I have a read/send timeout configured. Now I wonder how to get back to a clean state on the transmitted data after a timeout. I don't know how to express this in the right terms, so let me describe an example:
Server S sends a response to the client (Message Header, Response Header, Parameters are put into the stream)
Client C receives the message header partially (e.g. the first 4 bytes of 12)
After these 4 bytes have been received, the reception timeout occurs
On the client side, an appropriate exception is thrown and reception stops.
The client considers the message invalid.
Now the problem is that when the client tries to receive the next message, he might receive the remaining part of the "old" response message header. From the point of view of the currently processed transaction (send request/get response), the client receives garbage.
So it seems that after a timeout has occurred (no matter whether it was on the client or server side), the communication should continue with a "clean setup", meaning that neither communication partner will try to send any old message data and that no old message data remains in the stream buffer of the respective socket.
So how are such situations commonly handled? Is there some kind of design pattern / idiomatic way to solve this?
How are such situations handled within other TCP-based protocols, e.g. HTTP?
In all the TCP samples around the net I've never seen an implementation that deals with these kinds of problems...
Thank you in advance

when the client tries to receive the next message, he might receive the remaining part of the "old" response message header
He will receive the rest of the failed message, if he receives anything at all. He can't receive anything else, and specifically data that was sent later can't be received before or instead of data that was sent earlier. It is a reliable byte-stream. You can code accordingly.
the communication should continue with a "clean setup", meaning that neither communication partner will try to send any old message data
You can't control that. If a following message has been written to the TCP socket send buffer, which is all that send() actually does, it will be sent, and there is no way of preventing it short of resetting the connection.
So you either need to code your client to cope with the entire byte stream as it arrives, or close the connection on a timeout and start again.
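A minimal sketch of the second option with plain POSIX calls, assuming a fixed 12-byte message header and a receive timeout set via SO_RCVTIMEO (both are assumptions for the example; POCO's socket classes wrap the same machinery). The point is that any failure, including a timeout, invalidates the whole stream, so the caller closes the socket and reconnects instead of trying to resynchronize mid-stream:

#include <sys/socket.h>
#include <sys/types.h>

/* Read exactly `len` bytes or fail. Returns 0 on success, -1 on error,
   timeout (EAGAIN/EWOULDBLOCK with SO_RCVTIMEO set) or peer close. */
static int read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* One request/response round trip. On any failure the caller must discard
   this socket and reconnect -- never keep reading a stream whose framing
   is in an unknown state. */
static int do_transaction(int fd, const void *req, size_t req_len,
                          unsigned char header[12])
{
    if (send(fd, req, req_len, 0) != (ssize_t)req_len)  /* a robust version loops on partial sends */
        return -1;
    if (read_full(fd, header, 12) < 0)   /* e.g. only 4 of 12 bytes arrive before the timeout */
        return -1;
    /* ... read the response header and parameters the same way ... */
    return 0;
}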

Related

How to find out that the client is reading from the TCP buffer in Go

I have been using golang for quite some time on a project. In my project I have to implement a TCP server which responds to TCP clients. The server has to send a number of messages to a client.
The problem is that when the server writes a message to a client connection, it has to wait until the client has read that message from the buffer, and only then send another message (the server has to wait until the client calls the reader.ReadString('\n') method).
In my server code I wrote:
for {
    data := <-client.outgoing               // next message queued for this client
    client.writer.WriteString(data + "\n")
    client.writer.Flush()                   // flushes to the OS buffers; does not wait for the client to read
}
but the server sends all the messages to the client without waiting for the client's ReadString call.
How can I make the server wait until the client has read a message before sending the next one?
I think that either the assignment is ambiguous or you're misinterpreting it and solving the XY problem.
The short answer is that you can never know whether the client has read a message just by looking at the TCP conversation. You have to implement this "protocol" in your application.
Here are a few problems:
From your application you don't really have access to what TCP is doing. You get a stream on which you can perform I/O.
The fact that a write to your stream "succeeds" only means that TCP has agreed to try to transport your stuff and has an independent copy. It doesn't say anything about whether the data has been received, and it doesn't even mean the data has been sent yet.
You may find certain mechanisms to peer into TCP's inner workings (such as ioctls, SIOCINQ, SIOCOUTQ or various setsockopts): these won't help
Even if you find out what your TCP is doing, this only tells you what the remote TCP is doing. So if you have full control over your TCP and even see the acknowledgments from the peer, you still don't know what the application is doing. It's very possible the application didn't read the data yet (it might not have requested the data, the TCP might be withholding it in a buffer for some weird reason, the scheduler might not have scheduled the remote process etc.)
Going back to your question, the way to really know whether the remote application has received your message is to have the remote application tell you. This means you have to restructure your protocol, as sketched after this list, to:
Send stuff from the server
Wait for a message from the application telling you it received your stuff
Send more stuff (because you know from point 2 it's safe to do so)
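As a rough sketch of the server side of such a protocol, here it is in C with plain sockets (the send_all/recv_line helpers and the "ACK\n" token are made up for the example; in Go the same shape is a buffered write plus Flush followed by reader.ReadString('\n') before the next send):

#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Write the whole buffer, looping over partial sends. */
static int send_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = send(fd, buf, len, 0);
        if (n <= 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read one '\n'-terminated line, one byte at a time (naive but simple). */
static ssize_t recv_line(int fd, char *buf, size_t cap)
{
    size_t i = 0;
    while (i + 1 < cap) {
        ssize_t n = recv(fd, buf + i, 1, 0);
        if (n <= 0)
            return -1;
        if (buf[i++] == '\n')
            break;
    }
    buf[i] = '\0';
    return (ssize_t)i;
}

/* Send one message, then block until the client explicitly acknowledges it
   at the application level before the caller sends the next one. */
static int send_with_ack(int fd, const char *msg)
{
    char line[64];

    if (send_all(fd, msg, strlen(msg)) < 0 || send_all(fd, "\n", 1) < 0)
        return -1;
    if (recv_line(fd, line, sizeof line) <= 0)
        return -1;
    return strcmp(line, "ACK\n") == 0 ? 0 : -1;   /* the ack token is protocol-specific */
}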

Can we just reset TCP connections after an application level acknowledgement has been received?

I'm investigating resetting a TCP connection as a solution to the TIME_WAIT issue.
Let's use the following request-reply protocol as an example:
The client opens a connection to the server.
The client sends a request.
The server replies.
The server closes.
The client closes as well.
This causes a TIME_WAIT state at the server. As a variation, the client could close first. Then, the TIME_WAIT lands on the client.
Can we not replace steps 4 and 5 with the following?
The client resets.
The server resets in response to the incoming reset.
This seems to be a way to avoid the TIME_WAIT issue. The server has proven that it received and processed the request by sending its reply. Once the client has the reply the connection is expendable and can just go away.
Is this a good idea?
I would say: No it's not a good idea. Every possible solution ends up with the same "problem" that TIME_WAIT ultimately addresses: how does party A, acknowledging the ending of the connection (or acknowledging the other side's final acknowledgment of the ending of the connection), know that party B got the acknowledgment? And the answer is always: it can't ever know that for sure.
You say:
the server has proven that it received and processed the request by sending its reply
... but what if that reply gets lost? The server has now cleaned up its side of the session, but the client will be waiting forever for that reply.
The TCP state machine may seem overly complicated at first glance but it's all done that way for good reason.
The only problem is that the server doesn't know whether the client received everything. The situation is ambiguous: did the client connection reset because the client received the whole reply, or was it reset for some other reason?
Adding an application-level acknowledgement doesn't reliably fix the problem. If the client acknowledges and then immediately closes abortively, the client can't be sure that the server received that acknowledgement, because the abortive close discards untransmitted data. Moreover, even if the data is transmitted, it can be lost, since the underlying network is unreliable; and once the connection has been aborted, the TCP stack will no longer retransmit that data.
The regular, non-abortive situation addresses the problem by having the client and server TCP stacks take care of the final rites independently of application execution.
So, in summary, the aborts are okay if all we care about is that the client receives its reply, and the server doesn't care whether or not that succeeded: not an unreasonable assumption in many circumstances.
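For reference, the client-side "reset" in the proposed variation is normally produced by an abortive close via SO_LINGER. A minimal sketch; emitting an RST and skipping TIME_WAIT on close is what common stacks do here, though the sockets API doesn't describe it in those terms:

#include <sys/socket.h>
#include <unistd.h>

/* Abortive close: with l_onoff set and l_linger 0, close() discards any
   unsent data and (on common stacks) emits an RST rather than a FIN, so
   this side does not enter TIME_WAIT. */
static void abort_connection(int fd)
{
    struct linger lg = { .l_onoff = 1, .l_linger = 0 };
    setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg);
    close(fd);
}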
I suspect you are wrong about the TIME_WAIT being on the server.
If you follow this sequence for a single TCP-based client-server transaction, then the TIME_WAIT is on the client side:
client initiates active connection to server
client sends request to server.
client half-closes the connection (i.e. sends FIN)
server reads client request until EOF (FIN segment)
server sends reply and closes (generating FIN)
clients reads response to EOF
client closes.
Since client was the first to send the FIN, it goes into TIME_WAIT.
The trick is that the client must close the sending direction first, and the server synchronizes on it by reading the entire request. In other words, you use the stream boundaries as your message boundaries.
What you're trying to do is do the request framing purely inside the application protocol and not use the TCP framing at all. That is to say, the server recognizes the end of the client message without the client having closed, and likewise the client parses the server response without caring about reading until the end.
Even if your protocol is like this, you can still go through the motions of the half-close dance. The server, after having retrieved the client request, can nevertheless keep reading from its socket and discarding bytes until EOF, even though no further bytes are expected.
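A sketch of the client side of that half-close dance with plain POSIX calls (error handling and partial-send loops omitted to keep the shape visible; the step numbers refer to the sequence above):

#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

static void do_request(int fd, const char *req, size_t req_len)
{
    char buf[4096];
    ssize_t n;

    send(fd, req, req_len, 0);     /* 2. send the request                          */
    shutdown(fd, SHUT_WR);         /* 3. half-close: our FIN marks end of request  */

    /* 6. read the reply until the server's FIN arrives (recv returns 0) */
    while ((n = recv(fd, buf, sizeof buf, 0)) > 0)
        fwrite(buf, 1, (size_t)n, stdout);   /* stand-in for real reply handling */

    close(fd);                     /* 7. the client sent its FIN first, so it holds TIME_WAIT */
}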

Sending and receiving data over the Internet

This question is not about a concrete implementation of how this is done. It is more about the concept and design of sending information over the Internet with some kind of protocol, either TCP or UDP. I know only that sockets are needed, but I am wondering about the rest. For example, after a connection is made you send the information through it, but how does the other end listen on a specific port, and does it listen constantly?
Is listening done in a background thread waiting for information to be received? (In order to be able to do other things/processing while waiting for information)
So in essence, I think a real world example of how such an application works on a high level would be enough to explain the data flow. For example sending files in Skype or something similar.
P.S. Most other questions on similar topics are about a concrete implementation or a bug that someone has.
What I currently do in an application is the following, using POSIX sockets with the TCP protocol:
Most important thing is: most of the functions are blocking. So when you tell your server to wait for a client connection, the function will block until a connection is established (if you need a server that handles multiple clients at once, you need to use threading!).
The server listens on a specific port until a client connects. After the connect, you get a new socket file descriptor to communicate with the client, whilst the initial socket can listen for new connections. My server then creates a new thread to handle that client whilst waiting for new connections on the initial socket. In the new thread the server waits for a request command from the client (e.g. "Request Login Token"). After a request has been received, the server gathers the information, packs it together using Google's Protocol Buffers and sends it to the client. The client now either tells the server to terminate the session (if the client has received all the data it needs) or sends another request.
That's basically the idea in my server. The bigger problem is the way you transmit and receive data. E.g. you can't send structs or classes (at least not via C++) directly over the wire; you need some kind of serializer, and you have to make sure the other side knows how much to receive. So what I do is first send a 4-byte integer over the wire containing the size of the incoming package, then send the package itself using a serializer (in my case Google's Protocol Buffers). The other side waits for 4 bytes to be available, knowing that this will be the size of the incoming package. After the 4 bytes are received, the program waits for exactly that amount of data to be available on the socket; when available, it reads the data out of the buffer and deserializes it. When the socket has not received data for 30 seconds, a timeout is triggered and the connection is terminated.
What you always need to be aware of is the endianness of the systems involved. A big-endian system (e.g. PowerPC) and a little-endian system (e.g. x86) will have problems when you send an integer directly over the wire: simplified, a value written as 0001 on the x86 side is read as 1000 on the PowerPC side, thus making an 8 out of a 1. So you should always use functions like ntohl and htonl, which convert data between host byte order and network byte order (network byte order is always big-endian).
Hope this kind of helps. I could also provide some code to you if that would help.
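As a rough illustration of the length-prefix pattern described above, here is a sketch with plain POSIX calls and a raw byte payload instead of Protocol Buffers (error handling, partial-send loops and the 30-second timeout are left out):

#include <arpa/inet.h>   /* htonl / ntohl */
#include <stdint.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Read exactly `len` bytes, looping over short reads; 0 on success, -1 on error/EOF. */
static int read_exact(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Sender: 4-byte length prefix in network byte order, then the serialized payload. */
static int send_packet(int fd, const void *payload, uint32_t len)
{
    uint32_t net_len = htonl(len);                 /* host -> network byte order */
    if (send(fd, &net_len, sizeof net_len, 0) != (ssize_t)sizeof net_len)
        return -1;
    return send(fd, payload, len, 0) == (ssize_t)len ? 0 : -1;
}

/* Receiver: read the 4-byte length first, then exactly that many payload bytes. */
static char *recv_packet(int fd, uint32_t *out_len)
{
    uint32_t net_len;
    if (read_exact(fd, &net_len, sizeof net_len) < 0)
        return NULL;
    *out_len = ntohl(net_len);                     /* network -> host byte order */

    char *buf = malloc(*out_len);
    if (buf == NULL || read_exact(fd, buf, *out_len) < 0) {
        free(buf);
        return NULL;
    }
    return buf;                                    /* caller deserializes and frees */
}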

Has the client ACK'd all the data I sent to it?

RFC 7230 defines the HTTP/1.1 protocol and has an interesting passage in section 6.6, "Tear-down" (Connection Management):
To avoid the TCP reset problem, servers typically close a connection
in stages. First, the server performs a half-close by closing only the
write side of the read/write connection. The server then continues to
read from the connection until it receives a corresponding close by
the client, or until the server is reasonably certain that its own TCP
stack has received the client's acknowledgement of the packet(s)
containing the server's last response. Finally, the server fully
closes the connection.
Basically it boils down to the following:
shutdown(s, SD_SEND);                                            // half-close: send our FIN, no more data will be sent
while (recv(s, throwaway_buffer, throwaway_buffer_len, 0) > 0);  // drain until the client closes its side
closesocket(s);
which is the standard way of doing the graceful shutdown. However, it also acknowledges that a misbehaving client may exist (that keeps sending requests even after receiving a response with Connection: close header), and that the server has to cope with it by resetting the connection after it's sure the client has received the last response.
However, the socket interface doesn't seem to provide the functionality to learn whether all data passed to send() has actually been sent and ACK'd by the remote host. Is it actually there? Without it, all I can think of is to set up a timer of sorts and call recv until either it signals that the remote host has closed the connection or the time runs out, whichever comes first. But what would be an appropriate timeout? Is 60 seconds okay?
The Sockets interface provides this means via the little-used and less-understood SO_LINGER option. It allows you, inter alia, to define a timeout during which close() and possibly shutdown() will block while pending data is being sent. It is of little practical use and, as I've stated, it is rarely used ... at least rarely used correctly.
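For example, a POSIX-style sketch of a lingering close (with Winsock the final call would be closesocket(); exactly how long close() blocks, and whether it waits for the peer's ACK of the last data, is implementation-dependent):

#include <sys/socket.h>
#include <unistd.h>

/* Lingering close: close() blocks for up to `seconds` while pending data is
   being sent; if the timeout expires with data still unsent, the connection
   is typically reset instead. */
static void close_lingering(int s, int seconds)
{
    struct linger lg = { .l_onoff = 1, .l_linger = seconds };
    setsockopt(s, SOL_SOCKET, SO_LINGER, &lg, sizeof lg);
    close(s);
}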

Ensuring send() data delivered

Is there any way of checking if data sent using winsock's send() or WSASend() are really delivered to destination?
I'm writing an application talking to a third-party server, which sometimes goes down after working for some time, and I need to be sure whether messages sent to that server are delivered or not. The problem is that sometimes a call to send() finishes without error even if the server is already down, and only the next send() finishes with an error, so I have no idea whether the previous message was delivered or not.
I suppose that at the TCP layer there is information about whether certain (or all) sent packets were ACKed or not, but it is not available through the socket interface (or I cannot find a way to get at it).
Worst of all, I cannot change the code of the server, so I can't get any delivery confirmation messages.
I'm sorry, but given what you're trying to achieve, you should realise that even if the TCP stack COULD give you an indication that a particular set of bytes has been ACK'd by the remote TCP stack, it wouldn't actually tell you anything different from what you know at the moment.
The problem is that unless you have an application-level ACK from the remote application, one which is only sent once the remote application has actioned the data that you sent to it, you will never know for sure whether the data has been received by the remote application.
'but I can assume it's close enough'
is just delusional. You may as well make that assumption when your send() completes, as it's about as valid.
The issue is that even if the TCP stack could tell you that the remote stack had ACK'd the data (1), that's not the same thing as the remote application receiving the data (2), and that is not the same thing as the remote application actually USING the data (3).
Given that the remote application COULD crash at any point, 1, 2 OR 3, the only worthwhile indication that the data has arrived is one that is sent by the remote application after it has used the data for the intended purpose.
Everything else is just wishful thinking.
Not from the return value of send(). All send() says is that the data was pushed into the send buffer. Connected socket streams are not guaranteed to send all the data at once, just that the data will arrive in order. So you can't assume that your send() will go out in a single packet, or that it will ever occur at all, due to network delay or interruption.
If you need a full acknowledgement, you'll have to look at a higher, application-level ack (the server sending back a formatted ack message, not just TCP-level packet ACKs).
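If the protocol does define some reply that the server normally sends back, a receive timeout at least keeps the client from hanging forever when the server has died. A rough POSIX-style sketch; the 5-second value is arbitrary, and on Winsock SO_RCVTIMEO takes a DWORD of milliseconds rather than a struct timeval:

#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Send a request and wait (bounded) for whatever application-level reply the
   protocol defines. A timeout or error tells you only that you cannot assume
   delivery -- not that delivery definitely failed. */
static int send_and_wait_for_reply(int fd, const void *req, size_t req_len,
                                   void *reply, size_t reply_cap)
{
    struct timeval tv = { .tv_sec = 5, .tv_usec = 0 };   /* arbitrary bound */
    setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);

    if (send(fd, req, req_len, 0) != (ssize_t)req_len)
        return -1;               /* local failure only; says nothing about delivery */

    ssize_t n = recv(fd, reply, reply_cap, 0);
    return n > 0 ? (int)n : -1;  /* -1: error, EOF or timeout */
}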