Is application-level heartbeating preferable to TCP keepalives?

Is there a reason why I should use application-level heartbeating instead of TCP keepalives to detect stale connections, given that only Windows and Linux machines are involved in our setup?

The reason I ask: it seems that the TCP keepalive parameters can't be set on a per-socket basis on Windows or OS X.
Edit: All parameters except the number of keepalive retransmissions can in fact be set on Windows (2000 onwards) too: http://msdn.microsoft.com/en-us/library/windows/desktop/dd877220%28v=vs.85%29.aspx
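On Linux the same parameters can be set per socket with setsockopt. A minimal sketch of what that looks like (the TCP_KEEP* option names are Linux-specific, the values are arbitrary examples, and on Windows the equivalent is WSAIoctl with SIO_KEEPALIVE_VALS, per the link above):

    /* Sketch: per-socket keepalive tuning on Linux. */
    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    int enable_keepalive(int fd)
    {
        int on = 1, idle = 60, intvl = 10, cnt = 5;  /* example values */
        if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
            return -1;
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof idle);    /* secs idle before first probe */
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof intvl); /* secs between probes */
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof cnt);       /* failed probes before giving up */
        return 0;
    }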
I was trying to do this with ZeroMQ, but it seems that ZeroMQ does not support setting these options on Windows?

From John Jefferies' response: ZMQ Pattern Dealer/Router HeartBeating
"Heartbeating isn't necessary to keep the connection alive (there is a ZMQ_TCP_KEEPALIVE socket option for TCP sockets). Instead, heartbeating is required for both sides to know that the other side is still active. If either side does detect that the other is inactive, it can take alternative action."

TCP keepalives serve an entirely different function from application-level heartbeating. A keepalive does just that: it keeps the TCP session active rather than allowing it to time out after long periods of silence. This is important and good, and (if appropriate) you should use it in your application. But a TCP session dying due to inactivity is only one way that the connection between a pair of ZMQ sockets can be severed. One endpoint could lose power for 90 minutes and be offline; TCP keepalives wouldn't do squat for you in that scenario.
Application-level heartbeating is not intended to keep the TCP session active; you are expected to rely on keepalives for that, where possible. Heartbeating is there to tell your application that the connection is in fact still active and the peer socket is still functioning properly. A missed heartbeat tells you that your peer is unavailable, so you can behave appropriately: by caching messages, throwing an exception, sending an alert, etc.
In short:
- a TCP keepalive is intended to keep the connection alive (but doesn't protect against all disconnection scenarios)
- an app-level heartbeat is intended to tell your application whether the connection is still alive
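To make the distinction concrete, here is a minimal sketch (not taken from the answer) that enables OS-level keepalives on a libzmq socket and layers a PING/PONG heartbeat on top. The endpoint, message format, and timing values are invented for illustration:

    #include <zmq.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        void *ctx = zmq_ctx_new();
        void *sock = zmq_socket(ctx, ZMQ_DEALER);

        /* OS-level keepalive: keeps the idle TCP session from timing out. */
        int ka = 1, idle = 60, intvl = 10;
        zmq_setsockopt(sock, ZMQ_TCP_KEEPALIVE, &ka, sizeof ka);
        zmq_setsockopt(sock, ZMQ_TCP_KEEPALIVE_IDLE, &idle, sizeof idle);
        zmq_setsockopt(sock, ZMQ_TCP_KEEPALIVE_INTVL, &intvl, sizeof intvl);

        zmq_connect(sock, "tcp://localhost:5555");   /* hypothetical endpoint */

        /* App-level heartbeat: PING every 5 s; if no PONG arrives in time,
         * conclude that the peer is gone and take recovery action. */
        for (;;) {
            zmq_send(sock, "PING", 4, 0);

            zmq_pollitem_t items[] = { { sock, 0, ZMQ_POLLIN, 0 } };
            if (zmq_poll(items, 1, 5000) > 0) {      /* wait up to 5000 ms */
                char buf[16];
                int n = zmq_recv(sock, buf, sizeof buf, 0);
                if (n == 4 && memcmp(buf, "PONG", 4) == 0)
                    continue;                        /* peer is alive */
            }
            fprintf(stderr, "peer missed a heartbeat\n");
            break;   /* cache messages, raise an alert, reconnect, ... */
        }

        zmq_close(sock);
        zmq_ctx_destroy(ctx);
        return 0;
    }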

Related

CRIU - checkpointing a TCP socket without stopping it (CoW?)

I'm working on a failover mechanism for TCP connections. If a host breaks down (hardware failure), I'd like to be able to take over the connection on another machine. I want to periodically stream the state of the "live" socket to a "backup" host and have it take over (tcp_repair and all) when the "live" host breaks.
I have a prototype with libsoccr and it works OK, except that I have to pause the socket for the dump; depending on the buffer sizes this can take some time (hundreds of microseconds, sometimes 1-2 ms), which is a problem for my application, since I dump the state quite often (~every 10 ms).
I'd like to be able to checkpoint a TCP socket (via libsoccr if that's the way, and I'm also OK with raw syscalls if necessary) without pausing it. Is it possible to just "fork" or duplicate a TCP socket with its complete state, with some kind of CoW, so that the live socket isn't paused? Would fork help here? Any ideas?

Are application-level retransmission and acknowledgement needed over TCP?

I have the following queries:
1) Does TCP guarantee delivery of packets, and is application-level re-transmission thus ever required if the transport protocol used is TCP? Let's say I have established a TCP connection between a client and a server, and the server sends a message to the client. However, the client goes offline and comes back only after, say, 10 hours. Will the TCP stack handle re-transmission and deliver the message to the client, or will the application running on the server need to handle it?
2) Related to the above question: is an application-level ACK needed if the transport protocol is TCP? One reason for an application ACK would be that without it, the application would not know when the remote end received the message. Is there any reason other than that? That is, is delivery of the message itself guaranteed?
Does TCP guarantee delivery of packets, and is application-level re-transmission thus ever required if the transport protocol used is TCP?
TCP guarantees delivery of message stream bytes to the TCP layer on the other end of the TCP connection. So an application shouldn't have to bother with the nuances of retransmission. However, read the rest of my answer before taking that as an absolute.
However, the client goes offline and comes back only after, say, 10 hours. Will the TCP stack handle re-transmission and deliver the message to the client, or will the application running on the server need to handle it?
No, not really. Even though TCP has some degree of retry logic for individual TCP packets, it cannot perform reconnections if the remote endpoint is disconnected. In other words, it will retry for a while, but will eventually "time out" waiting for a TCP ACK from the remote side, give up, and notify the application through the socket interface that the remote endpoint connection is dead or closed. The typical pattern is that when a client application detects that it has lost the socket connection to the server, it either reports an error to the application's user interface or retries the connection. Either way, it is an application-level decision how to handle a failed TCP connection.
is an application-level ACK needed if the transport protocol is TCP
Yes, absolutely. Most client-server protocols have some notion of a request/response pair of messages. A TCP socket can only indicate to the application that data it "sent" was successfully queued to the kernel's network stack; it provides no guarantee that the application on top of the socket at the remote end actually "got it" or "processed it". Your protocol on top of TCP should provide some sort of response indication whenever a message is processed. Take HTTP as a good example: if a client sent an HTTP POST message to the server but there was no acknowledgement (e.g. 200 OK) from the server, how would the client know the server processed it?
In a world of Network Address Translators (NATs) and proxy servers, TCP connections that are idle (no data flowing between the endpoints) can fail, because the NAT or proxy closes the connection on behalf of the actual endpoint when it perceives a lack of data being sent. The solution is some sort of periodic "ping"/"pong" protocol by which the applications keep the TCP connection alive even when they have no data to send.
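A minimal sketch of an application-level ACK over a plain connected TCP socket; the one-byte 'A' acknowledgement and the 5-second timeout are invented for illustration and are not part of any standard protocol:

    #include <sys/socket.h>
    #include <sys/time.h>
    #include <sys/types.h>

    /* Returns 0 once the peer application has confirmed processing, -1 otherwise. */
    int send_with_ack(int fd, const void *msg, size_t len)
    {
        if (send(fd, msg, len, 0) != (ssize_t)len)
            return -1;                        /* only queued to the kernel so far */

        struct timeval tv = { 5, 0 };         /* don't wait forever for the ACK */
        setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);

        char ack;
        if (recv(fd, &ack, 1, 0) == 1 && ack == 'A')
            return 0;                         /* peer processed the message */
        return -1;                            /* timeout, close, or error */
    }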

Keep TCP connection on permanently with ESP8266 TCP client

I am using the ESP8266 WiFi chip with the SMING framework.
I am able to establish a TCP connection as a client to a remote server. The code for initiating the client connection to the server is simple:
tcpClient.connect(SERVER_HOST, SERVER_PORT);
Unfortunately, the connection closes after idling for some time. I would like to keep this connection open permanently. How can this be done?
You will actually need to monitor the connection state and reconnect if it fails. Your protocol on top of it will need to keep track of what was actually received by the other side and retransmit anything that wasn't.
In any wireless network your link may go down for one reason or another, and if you need to maintain a long-term connection you will have to handle that in a layer above TCP itself.
TCP will stay connected as long as both sides allow it (neither of them disconnects) and there are no errors on the link. In that situation, sending keepalives may actually cause disconnects: a keepalive probe may fail at a moment when the link would have recovered, and without the keepalive the connection would have stayed up.
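As a sketch of that reconnect logic in plain BSD sockets (the ESP8266/SMING API differs; the host, port, and backoff values here are invented):

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Keep retrying with exponential backoff until the link comes back. */
    int connect_with_retry(const char *ip, int port)
    {
        int delay = 1;                       /* seconds, doubled up to a cap */
        for (;;) {
            int fd = socket(AF_INET, SOCK_STREAM, 0);
            struct sockaddr_in addr = { 0 };
            addr.sin_family = AF_INET;
            addr.sin_port = htons(port);
            inet_pton(AF_INET, ip, &addr.sin_addr);

            if (connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0)
                return fd;                   /* caller resumes the protocol, retransmitting as needed */

            close(fd);
            sleep(delay);
            if (delay < 60)
                delay *= 2;
        }
    }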

How to handle TCP keepalive in application

I have a TCP application running on VxWorks. I have the SO_KEEPALIVE option set for my TCP connections. My application keeps track of all TCP connections and puts them into a linked list.
If a client is idle for a long time, we see that the connection is closed down; it is no longer listed in netstat output.
As the connection is closed by the TCP stack, the resources allocated for that connection are not cleaned up. Can you please help me figure out how the application gets notified if a connection is closed due to keepalive failures?
TCP keepalive is intended primarily to prevent network routers from shutting the TCP connection down during long periods of inactivity, not to prevent your OS or application from shutting down the connection when it deems appropriate.
In most TCP/IP implementations, you can determine if a connection has been closed by attempting to read from it.
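A sketch of that check with BSD-style sockets (MSG_DONTWAIT and the errno values are POSIX/Linux flavored; VxWorks details may differ):

    #include <errno.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    /* Returns 1 if fd still looks alive, 0 if it should be cleaned up.
     * Note: any real data read here must be handed to the application. */
    int connection_alive(int fd)
    {
        char buf[256];
        ssize_t n = recv(fd, buf, sizeof buf, MSG_DONTWAIT);
        if (n > 0)
            return 1;                        /* real data arrived */
        if (n == 0)
            return 0;                        /* orderly close (FIN received) */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return 1;                        /* no data yet, still connected */
        return 0;                            /* ECONNRESET, ETIMEDOUT (keepalive), ... */
    }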
From this reference: http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/overview.html
I quote:
This procedure is useful because if the other peers lose their connection (for example by rebooting) you will notice that the connection is broken, even if you don't have traffic on it. If the keepalive probes are not replied to by your peer, you can assert that the connection cannot be considered valid and then take the correct action.
If you have a server, for instance, and a lot of clients can connect to it without sending data regularly, you might end up in a situation with clients that are no longer there. A client may have rebooted, and this goes undetected because a FIN is never sent in that case.
For cases like this the keepalive exists.
From TCP's point of view there is nothing special about a keepalive. Hence, if the peer fails to ACK a keepalive, you will receive 0 bytes on your socket and will have to close your end of the socket, which is the only corrective action you can take at that moment.
As the connection is closed by TCP stack, resources allocated for that connection are not cleaned up.
Only if you never use the connection again.
If a client is idle for a long time, we see that the connection is closed down; it is no longer listed in netstat output.
Make up your mind. Either you see it or you don't. What you will see is the port in CLOSE_WAIT in netstat.
Can you please help me figure out how the application gets notified if a connection is closed due to keepalive failures?
Next time you use the connection for read or write you will get an ECONNRESET.

Akka TCP IO: ConnectionClosed is not called when the Internet is down

I have implemented a socket-client interaction using Akka's TCP module. I am trying to make the application detect when the socket is closed and release the resources assigned to that client's socket.
Akka has a case _ : ConnectionClosed case in order to handle this kind of situation, but I have realized that it is not triggered when the internet connection is down.
I couldn't find any way to detect that the client side of the socket has been disconnected from the internet.
Are there any specifics that I am missing?
The network connection going down doesn't necessarily close any sockets, the OS is free to leave them open in case the network connection recovers. I believe this is really an issue with your OS, and not with Akka. TCP connections will eventually timeout, but this can take tens of minutes. See TCP Socket no connection timeout.
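If tens of minutes is too long, one Linux-specific mitigation (a kernel socket option, not an Akka API) is to bound how long TCP waits for unacknowledged data with TCP_USER_TIMEOUT; a sketch:

    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    /* After 30 s of transmitted-but-unacknowledged data, the kernel errors
     * the connection out, so the application sees the close much sooner. */
    int bound_tcp_timeout(int fd)
    {
        unsigned int timeout_ms = 30000;     /* example value */
        return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
                          &timeout_ms, sizeof timeout_ms);
    }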