Socket Keepalive with Periodic Send

I have a C/C++ application set up as follows:
A non-blocking TCP server socket on a linux platform
A thread which writes a small packet (less than 20 bytes) to the socket at 1 Hz
The socket is configured with keepalive enabled and with: keepidle=5, keepintvl=5 and keepcnt=3
My intention is that the keepalive mechanism should detect a physical disconnection of the network link. However, when the link is cut, I do not see the keepalive probes (zero-length segments) that should be generated by the keepalive mechanism (I am using tcpdump to monitor traffic). My impression is that, after the cable is disconnected, the application keeps making send requests, and the presence of pending send requests prevents the keepalive mechanism from being activated. Is this explanation valid?
In order to check my explanation, I have modified my test as follows:
A non-blocking TCP server socket on a linux platform
A cyclical thread which writes a small packet (about 100 bytes) to the socket with a period of 30 seconds
The socket is configured with keepalive enabled and with: keepidle=5, keepintvl=5 and keepcnt=2
In this case, if I cut the connection, the keepalive mechanism triggers within about 15-20 seconds (which is what I would expect).
On a related point, I would like to understand the exact semantics of tcp_keepidle. It is defined as: "The number of seconds a connection needs to be idle before TCP begins sending out keep-alive probes". What exactly does 'idle' mean in this context? Does it simply mean that nothing is received and nothing is put on the network, or does it mean that nothing is received and no send requests are made to the socket?
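For reference, the keepalive parameters from the first test above would typically be applied like this on Linux (a minimal sketch; sock is assumed to be an already-created TCP socket, and error handling is reduced to returning -1):

    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    /* Enable keepalive and set the probe timing used in the first test:
     * keepidle=5, keepintvl=5, keepcnt=3. */
    int enable_keepalive(int sock)
    {
        int on    = 1;
        int idle  = 5;  /* seconds of idle time before the first probe (TCP_KEEPIDLE) */
        int intvl = 5;  /* seconds between unanswered probes (TCP_KEEPINTVL)          */
        int cnt   = 3;  /* unanswered probes before the connection is dropped (TCP_KEEPCNT) */

        if (setsockopt(sock, SOL_SOCKET,  SO_KEEPALIVE,  &on,    sizeof(on))    < 0) return -1;
        if (setsockopt(sock, IPPROTO_TCP, TCP_KEEPIDLE,  &idle,  sizeof(idle))  < 0) return -1;
        if (setsockopt(sock, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) < 0) return -1;
        if (setsockopt(sock, IPPROTO_TCP, TCP_KEEPCNT,   &cnt,   sizeof(cnt))   < 0) return -1;
        return 0;
    }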

Related

Does TCP keepalive refresh the timeout on a NAT?

I've read that NAT routers "assume a connection has been terminated if no data has been sent for a certain time period."
I've also read that TCP keepalive packets usually shouldn't contain any data.
So my questions are:
Are the above statements true?
Do NAT routers consider empty TCP keepalive packets when refreshing/cleaning their tables?
I'm asking this because I need a reliable connection between two endpoints where both of them have to be able to detect and react to connection problems. I know that I might just implement a keepalive mechanism myself but I want to know whether the TCP implementation could be used for that.
I do believe the second statement refers to the payload (the shortest possible TCP/IP packet is 40 bytes long: 20 bytes of TCP header + 20 bytes of IPv4 header).
Regarding the first, here's a quote from RFC 2663:
End of session for TCP, UDP and others
The end of a TCP session is detected when FIN is acknowledged by
both halves of the session or when either half receives a segment with
the RST bit in TCP flags field. However, because it is impossible for
a NAT device to know whether the packets it sees will actually be
delivered to the destination [...] the NAT device cannot safely assume
that the segments containing FINs or SYNs will be the last packets of
the session [...] Consequently, a session can be assumed to have been
terminated only after a period of 4 minutes subsequent to this
detection. The need for this extended wait period is described in RFC
793 [Ref 7], which suggests a TIME-WAIT duration of 2 * MSL (Maximum
Segment Lifetime) or 4 minutes.
Reference: https://www.rfc-editor.org/rfc/rfc2663
To my understanding, any packet that identifies a session would reset the TTL counter - but that depends heavily on the implementation, since 'data' can be understood as 'packet' (minimum 40 bytes) or 'packet payload'. Nonetheless, @CodeCaster is spot-on; never assume that a connection is alive - make sure it is before sending (and, if possible and depending on criticality, acknowledge receipt).

Is there any chance of data from an old TCP connection sneaking into a new TCP connection on the same port?

I am setting the SO_REUSEADDR option on sockets.
Suppose a socket is closed from one end,
and the socket descriptor then gets reassigned to another process.
Is there any chance of data from the old TCP connection sneaking into the new TCP connection?
Has anybody observed old data sneaking into a new TCP connection, especially on Solaris?
No.
If you re-use the local port, but either the remote host or port changes in the subsequent connection, then this is impossible.
For the case of reconnecting back to the same remote IP/port from the same local IP/port, also known as TIME-WAIT Assassination, there are some rules for the TCP stack to abide by - mainly, starting out with a higher sequence number than the previous connection. You can read the fine print in RFC 1337. But here's a better link and quote that outlines how the sequence number is adjusted on subsequent connections.
http://blogs.technet.com/b/networking/archive/2010/08/11/how-tcp-time-wait-assassination-works.aspx
In a situation where the server side socket goes to a TIME-WAIT state
and the client reconnects to the server within 2MSL (default TIME-WAIT
time), there are 2 things that can happen:
The server will not respond to the SYN packets from the client because the socket is
in the TIME-WAIT state.
The server may accept the SYN from the client and change the state of the socket
from TIME-WAIT to ESTABLISHED. This is known as TIME-WAIT assassination, or
incarnation of a previous connection.
The key to scenario ‘2’ above is that the ISN (Initial Sequence
number) of the SYN sent needs to be higher than the highest sequence
number used in the previous session. If the ISN is not as expected,
the server will not respond to the SYN and the socket will wait for
2MSL before being available for use again.
That's what the TIME_WAIT state is for. It lasts for twice the maximum segment lifetime, so that any data sent to an old connection will expire before a new connection between the same IP:port pairs can be formed.
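For context, SO_REUSEADDR is typically applied to the listening socket before bind(); a minimal sketch (make_listener is a hypothetical helper with only basic error handling):

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Create a listening socket that can bind to a port even while old
     * connections on that port are still sitting in TIME-WAIT. */
    int make_listener(unsigned short port)
    {
        int sock = socket(AF_INET, SOCK_STREAM, 0);
        if (sock < 0)
            return -1;

        int on = 1;
        /* Must be set before bind(); it does not shorten TIME-WAIT, it only
         * lets the bind succeed while old connections are still draining. */
        setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family      = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port        = htons(port);

        if (bind(sock, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
            listen(sock, SOMAXCONN) < 0) {
            close(sock);
            return -1;
        }
        return sock;
    }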

How to set the keep alive interval for winsock

I am using winsock and TCP.
I have set the KeepAlive option as follows
    int aliveToggle = 1;
    setsockopt(mySocket, SOL_SOCKET, SO_KEEPALIVE, (char*)&aliveToggle, sizeof(aliveToggle));
But how do I specify the keep-alive time and interval?
I am using VC++ on Windows 7.
From C/C++ you should be able to use SIO_KEEPALIVE_VALS to control the timeouts. You can't use setsockopt, but you should be able to use WSAIoctl. See https://web.archive.org/web/20130828175019/http://msdn.microsoft.com/en-us/library/windows/desktop/dd877220(v=vs.85).aspx
Here's an example https://web.archive.org/web/20130827074722/http://read.pudn.com/downloads79/ebook/301417/Chapter09/SIO_KEEPALIVE_VALS/alive.c__.htm
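As a rough sketch of that approach (assuming a connected SOCKET s, Ws2_32.lib linked, and error handling left to the caller):

    #include <winsock2.h>
    #include <mstcpip.h>   /* struct tcp_keepalive, SIO_KEEPALIVE_VALS */

    /* Enable keep-alive and set its timing on a connected socket.
     * Both values are in milliseconds. Returns 0 on success. */
    int set_keepalive(SOCKET s, ULONG timeMs, ULONG intervalMs)
    {
        struct tcp_keepalive ka;
        DWORD bytesReturned = 0;

        ka.onoff             = 1;          /* turn keep-alive on                */
        ka.keepalivetime     = timeMs;     /* idle time before the first probe  */
        ka.keepaliveinterval = intervalMs; /* time between unanswered probes    */

        return WSAIoctl(s, SIO_KEEPALIVE_VALS, &ka, sizeof(ka),
                        NULL, 0, &bytesReturned, NULL, NULL);
    }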
Two registry settings under the key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters control the behavior of TCP/IP keep-alives:
The KeepAliveTime value specifies how long the TCP connection sits idle, with no traffic, before TCP sends a keep-alive packet. The default is 7,200,000 milliseconds (ms) or 2 hours.
The KeepAliveInterval value indicates how many milliseconds to wait for a response after sending a keep-alive before repeating the keep-alive. If no response is received, the TCP/IP stack continues sending keep-alives at this interval until a response is received or until the stack reaches the packet retry limit specified in the TCPMaxDataRetransmissions registry key. KeepAliveInterval defaults to 1 second (1,000 ms).
TCP keep-alives are disabled by default, but Windows Sockets applications can use the setsockopt function to enable them on a per-connection basis.
Note: If the developer elects to use TCP keep-alive messages on a particular connection, the timing of those messages is specified by the registry values described above. It is not possible to use different timing on different keep-alive requests.

When is TCP option SO_LINGER (0) required?

I think I understand the formal meaning of the option. In some legacy code I'm handling now, the option is used. The customer complains that when it closes the connection from its side, its FIN is answered with an RST.
I am not sure I can remove it safely, since I don't understand when it should be used.
Can you please give an example of when the option would be required?
For my suggestion, please read the last section: “When to use SO_LINGER with timeout 0”.
Before we come to that, a little lecture about:
Normal TCP termination
TIME_WAIT
FIN, ACK and RST
Normal TCP termination
The normal TCP termination sequence looks like this (simplified):
We have two peers: A and B
A calls close()
A sends FIN to B
A goes into FIN_WAIT_1 state
B receives FIN
B sends ACK to A
B goes into CLOSE_WAIT state
A receives ACK
A goes into FIN_WAIT_2 state
B calls close()
B sends FIN to A
B goes into LAST_ACK state
A receives FIN
A sends ACK to B
A goes into TIME_WAIT state
B receives ACK
B goes to CLOSED state – i.e. is removed from the socket tables
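In application code this handshake is usually just a close() on each side, but peer A can make the steps above explicit with a half-close; a minimal sketch (sock is assumed to be A's connected socket):

    #include <sys/socket.h>
    #include <unistd.h>

    /* Peer A: send our FIN first, then wait for B to finish and close. */
    void graceful_close(int sock)
    {
        char buf[256];

        shutdown(sock, SHUT_WR);                   /* A sends FIN to B            */
        while (read(sock, buf, sizeof(buf)) > 0)   /* drain any remaining data    */
            ;                                      /* read() == 0 means B's FIN   */
        close(sock);                               /* A now sits in TIME_WAIT     */
    }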
TIME_WAIT
So the peer that initiates the termination – i.e. calls close() first – will end up in the TIME_WAIT state.
To understand why the TIME_WAIT state is our friend, please read section 2.7 in "UNIX Network Programming" third edition by Stevens et al (page 43).
However, lots of sockets in the TIME_WAIT state on a server can be a problem, as they could eventually prevent new connections from being accepted.
To work around this problem, I have seen many suggesting to set the SO_LINGER socket option with timeout 0 before calling close(). However, this is a bad solution as it causes the TCP connection to be terminated with an error.
Instead, design your application protocol so the connection termination is always initiated from the client side. If the client always knows when it has read all remaining data it can initiate the termination sequence. As an example, a browser knows from the Content-Length HTTP header when it has read all data and can initiate the close. (I know that in HTTP 1.1 it will keep it open for a while for a possible reuse, and then close it.)
If the server needs to close the connection, design the application protocol so the server asks the client to call close().
When to use SO_LINGER with timeout 0
Again, according to "UNIX Network Programming" third edition page 202-203, setting SO_LINGER with timeout 0 prior to calling close() will cause the normal termination sequence not to be initiated.
Instead, the peer setting this option and calling close() will send a RST (connection reset) which indicates an error condition and this is how it will be perceived at the other end. You will typically see errors like "Connection reset by peer".
Therefore, in the normal situation it is a really bad idea to set SO_LINGER with timeout 0 prior to calling close() – from now on called abortive close – in a server application.
However, certain situations warrant doing so anyway:
If a client of your server application misbehaves (times out, returns invalid data, etc.) an abortive close makes sense to avoid being stuck in CLOSE_WAIT or ending up in the TIME_WAIT state.
If you must restart your server application which currently has thousands of client connections you might consider setting this socket option to avoid thousands of server sockets in TIME_WAIT (when calling close() from the server end) as this might prevent the server from getting available ports for new client connections after being restarted.
On page 202 in the aforementioned book it specifically says: "There are certain circumstances which warrant using this feature to send an abortive close. One example is an RS-232 terminal server, which might hang forever in CLOSE_WAIT trying to deliver data to a stuck terminal port, but would properly reset the stuck port if it got an RST to discard the pending data."
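For reference, the abortive close discussed in this section is set up like this on a POSIX socket (a minimal sketch; error handling omitted):

    #include <sys/socket.h>
    #include <unistd.h>

    /* Abortive close: discard any unsent data and send an RST instead of a FIN.
     * The local socket skips the FIN_WAIT/TIME_WAIT states entirely. */
    void abortive_close(int sock)
    {
        struct linger lin;
        lin.l_onoff  = 1;   /* linger enabled...      */
        lin.l_linger = 0;   /* ...with a zero timeout */

        setsockopt(sock, SOL_SOCKET, SO_LINGER, &lin, sizeof(lin));
        close(sock);        /* triggers the RST       */
    }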
I would recommend this long article which I believe gives a very good answer to your question.
The typical reason to set a SO_LINGER timeout of zero is to avoid large numbers of connections sitting in the TIME_WAIT state, tying up all the available resources on a server.
When a TCP connection is closed cleanly, the end that initiated the close ("active close") ends up with the connection sitting in TIME_WAIT for several minutes. So if your protocol is one where the server initiates the connection close, and involves very large numbers of short-lived connections, then it might be susceptible to this problem.
This isn't a good idea, though - TIME_WAIT exists for a reason (to ensure that stray packets from old connections don't interfere with new connections). It's a better idea to redesign your protocol to one where the client initiates the connection close, if possible.
When linger is on but the timeout is zero the TCP stack doesn't wait for pending data to be sent before closing the connection. Data could be lost due to this but by setting linger this way you're accepting this and asking that the connection be reset straight away rather than closed gracefully. This causes an RST to be sent rather than the usual FIN.
Thanks to EJP for his comment, see here for details.
Whether you can remove the linger in your code safely or not depends on the type of your application: is it a "client" (opening TCP connections and actively closing them first) or is it a "server" (listening for a TCP open and closing the connection after the other side initiated the close)?
If your application has the flavor of a "client" (closing first) AND you initiate and close a huge number of connections to different servers (e.g. when your app is a monitoring app supervising the reachability of a huge number of different servers), your app has the problem that all your client connections are stuck in the TIME_WAIT state. Then I would recommend shortening the linger timeout to a value smaller than the default, to still shut down gracefully but free up the client connection resources earlier. I would not set the timeout to 0, as 0 does not shut down gracefully with a FIN but abortively with an RST.
If your application has the flavor of a "client" and has to fetch a huge number of small files from the same server, you should not initiate a new TCP connection per file and end up with a huge number of client connections in TIME_WAIT; instead, keep the connection open and fetch all data over the same connection. The linger option can and should be removed.
If your application is a "server" (closing second, as a reaction to the peer's close), your connection is shut down gracefully on close() and resources are freed up, since you don't enter the TIME_WAIT state. Linger should not be used. But if your server app has a supervisory process that detects inactive open connections idling for a long time ("long" is to be defined), you can shut down such an inactive connection from your side - see it as a kind of error handling - with an abortive shutdown. This is done by setting the linger timeout to 0. close() will then send an RST to the client, telling it that you are angry :-)
I just saw that the WebSocket RFC (RFC 6455) explicitly states that the server should call close() on the TCP socket first(!)
I was in awe, as I hold the answer/posts by @mgd in this thread as definitive, and the RFC clearly goes against that. But perhaps this would be a case where setting a linger time of 0 would be acceptable.
The underlying TCP connection, in most normal cases, SHOULD be closed
first by the server, so that it holds the TIME_WAIT state and not the
client
I'm very interested to hear any thoughts/insight on this.
In servers, you may like to send an RST instead of a FIN when disconnecting misbehaving clients. That skips the FIN-WAIT and TIME-WAIT socket states on the server, which keeps server resources from being depleted and hence protects against this kind of denial-of-service attack.
I like Maxim's observation that DoS attacks can exhaust server resources. It also happens without an actually malicious adversary.
Some servers have to deal with the 'unintentional DoS attack' that occurs when the client app has a connection-leak bug and keeps creating a new connection for every command it sends to your server, perhaps eventually closing its connections if it hits GC pressure, or letting the connections time out.
Another scenario is when all clients have the same TCP address. Then client connections are distinguishable only by port numbers (if they connect to a single server), and if clients start rapidly cycling through opening and closing connections for any reason, they can exhaust the (client addr+port, server IP+port) tuple space.
So I think servers may be best advised to switch to the Linger-Zero strategy when they see a high number of sockets in the TIME_WAIT state - although it doesn't fix the client behavior, it might reduce the impact.
A server's listening socket can use linger with a time of 0 so that it can bind back to the address immediately and reset any clients whose connections have not yet finished connecting. TIME_WAIT is only interesting when you have a multi-path network and can end up with misordered packets, or are otherwise dealing with odd packet ordering/arrival timing.

What's the difference between TIME-WAIT Assassination and SO_REUSEADDR

I was reading about using the SO_LINGER socket option to intentionally 'assassinate' the time-wait state by setting the linger time to zero. The author of the book then goes on to say we should never do this and in general that we should never interfere with the time-wait state. He then immediately recommends using the SO_REUSEADDR option to bypass the time-wait state.
My question is, what's the difference? In both cases you're prematurely terminating the time-wait state and taking the risk of receiving duplicate segments. Why is one good and the other bad?
TIME_WAIT is absolutely normal. It occurs on the side that closes first, after the FIN/ACK exchange in both directions has completed. In TIME_WAIT you are just waiting for any stray packets to arrive at the local address. If there is a lost or stray packet, TIME_WAIT ensures that its TTL ("time to live") expires before the address is used again.
If you use SO_REUSEADDR then you are basically saying: I will assume that there are no stray packets. That assumption is increasingly safe on modern, reliable TCP networks; stray packets are still possible, but unlikely.
Setting SO_LINGER to zero causes you to initiate an abnormal close, also called "slamming the connection shut." Here you do not respect TIME_WAIT and ignore the possibility of a stray packet.
If you see FIN_WAIT_1 then this can cause problems, as the remote side has not acknowledged your FIN. Either the remote process was killed, or the acknowledgement was lost due to a network partition or a bad route.
When you see CLOSE_WAIT you have a problem: you are leaking connections, because your application is not closing the socket (and therefore never sends its own FIN) after receiving the peer's FIN.
I did some more reading and this is my understanding of what happens (hopefully correct):
When you call close on a socket which has SO_REUSEADDR set (or your app crashes), the following sequence occurs:
TCP sends any remaining data in the send buffer and a FIN
If close was called, it returns immediately without indicating whether any remaining data was delivered successfully.
If data was sent, the peer sends a data ACK
The peer sends an ACK of the FIN and sends its own FIN packet
The peer's FIN is ACKed and the socket resources are deallocated.
The socket does not enter TIME-WAIT.
When you close a socket with the SO_LINGER time set to zero:
TCP discards any data in the send buffer
TCP sends a RST packet to the peer
The socket resources are deallocated.
The socket does not enter TIME-WAIT
So beyond the fact that setting linger to zero is a hack and bad style, it's also bad manners, as it doesn't go through a clean shutdown of the connection.
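To make the contrast above concrete, here is a small sketch of the two close paths side by side (close_socket is a hypothetical helper; sock is a connected TCP socket):

    #include <sys/socket.h>
    #include <unistd.h>

    void close_socket(int sock, int abortive)
    {
        if (abortive) {
            /* Linger on with zero timeout: unsent data is discarded, an RST
             * is sent, and the socket does not enter TIME-WAIT. */
            struct linger lin = { .l_onoff = 1, .l_linger = 0 };
            setsockopt(sock, SOL_SOCKET, SO_LINGER, &lin, sizeof(lin));
        }
        /* Otherwise close() returns immediately and TCP tries to deliver any
         * remaining data, then goes through the normal FIN handshake. */
        close(sock);
    }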
I have used SO_REUSEADDR to wildcard bind() to a local port on which some other program already had a connection open. It turns out this particular use will never cause a problem, so long as no two sockets try to listen() on the same addr/port combo at the same time.