What is the callback mechanism used when the non-blocking version of the connect() call is used in socket programming?

In socket programming, let us say the server is listening for TCP connections on a particular port.
Now, on the client side, I create a socket and call connect() to establish a connection with the server. Note: the connect() API is called in non-blocking mode.
Since it is a non-blocking call, and no callback is passed to connect() to be notified on completion of the event, I want to know HOW the client gets to know when the TCP connection has been established successfully, so that it can initiate the data transfer.
Second part of the question - WHEN. Basically, for the TCP connection to be established, there should be a 3-way handshake happening as below:
I assume that when the connect() API is called from the client, a SYN packet is sent and the connection establishment process is initiated. Since the connect() API is called in non-blocking mode, it just asks the kernel to initiate the connection and returns. Once the connection is successfully established, the kernel has to notify the client that it is good to go and transfer data. My confusion here is: the last phase of the 3-way handshake completes at the server side (after the ACK packet reaches the server), so how does the kernel on the client side become aware of the completion of the connection process?
Or is it that the kernel will notify the client process of the establishment of the connection as soon as it receives the SYN+ACK from the server?

There is no callback mechanism. Callback mechanisms are associated with asynchronous I/O in some APIs, not with non-blocking I/O. And no, they aren't the same thing.
When a non-blocking connect() doesn't complete immediately, as it usually doesn't (otherwise what would be the point), it returns -1 with errno set to EINPROGRESS. You should then select(), poll(), or epoll() the socket for writability, and so on as described in man connect. This is not, to restate the point, a callback mechanism. It is in fact a polling mechanism.

When a non-blocking socket is used, connect() will usually return -1 with errno set to EINPROGRESS.
In that case, you can use the select() function to wait for connection establishment:
Add the socket to the write set of the select() call.
When the connection is established (or has failed), select() returns and the write set indicates that your socket is writable. Then you can call getsockopt() to get the result of the non-blocking connect:
int error = 0;
socklen_t len = sizeof(error);
if (getsockopt(socket, SOL_SOCKET, SO_ERROR, &error, &len) != -1)
    ...  /* error is 0 on success, otherwise the errno-style failure code */
A blocking TCP connect() returns when the client has received the SYN-ACK.
It works the same way with a non-blocking TCP socket: select() returns when the SYN-ACK is received. (There is a deliberate bit of inaccuracy in the accompanying diagram, to make it clearer: the slowness of the network is illustrated by placing the SYN after the select() call and the ACK after select() returns.)
The TCP state of the client changes to ESTABLISHED when the SYN-ACK is received. The TCP state of the server changes to ESTABLISHED when the ACK (of the SYN-ACK) is received. So the client application can start sending data to the server before the server has returned from the accept() call. It is also possible that the ACK (and its retries) is lost in the network, and the server never enters the ESTABLISHED state.
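
Putting the pieces together, here is a minimal sketch of the whole sequence, assuming a POSIX system; the address 127.0.0.1:8080, the timeout, and the payload are placeholders, and real code would need fuller error handling:

#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    fcntl(sock, F_SETFL, fcntl(sock, F_GETFL, 0) | O_NONBLOCK);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(8080);                 /* placeholder port */
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

    /* Initiate the handshake; the call returns immediately. */
    if (connect(sock, (struct sockaddr *)&addr, sizeof(addr)) == -1 &&
        errno != EINPROGRESS) {
        perror("connect");
        return 1;
    }

    /* Poll for writability: the handshake has either completed or failed. */
    fd_set wset;
    FD_ZERO(&wset);
    FD_SET(sock, &wset);
    struct timeval tv = { 5, 0 };                /* 5-second timeout */
    if (select(sock + 1, NULL, &wset, NULL, &tv) <= 0) {
        fprintf(stderr, "connect timed out\n");
        return 1;
    }

    /* Writability alone does not mean success; fetch the result. */
    int error = 0;
    socklen_t len = sizeof(error);
    if (getsockopt(sock, SOL_SOCKET, SO_ERROR, &error, &len) == -1 || error != 0) {
        fprintf(stderr, "connect failed: %s\n", strerror(error));
        return 1;
    }

    /* ESTABLISHED (the SYN-ACK has arrived): safe to transfer data. */
    send(sock, "ping", 4, 0);
    close(sock);
    return 0;
}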

Related

error/timeout detection from socket send() call

I am troubleshooting a socket connection issue where a peer irregularly gets WSAETIMEDOUT (10060) from socket send(), and I would like to understand the details at the actual TCP level.
The actual implementation uses a blocking Winsock socket and has the following call pattern:
auto result = ::send(...);
if (result == SOCKET_ERROR)
{
    auto err = ::WSAGetLastError();
    // err can be WSAETIMEDOUT
}
As far as I understand, send() returns immediately once the outgoing data has been copied to the kernel buffer [as asked in another SO question].
On the other hand, I assume that the error WSAETIMEDOUT should be caused by a missing TCP ACK from the receiving side. Right? (See Steffen Ullrich's answer.)
What I am not sure about is whether such a WSAETIMEDOUT only happens when the option SO_SNDTIMEO is set.
The default value of SO_SNDTIMEO is 0, meaning never time out. Does that mean an unsuccessful send would block forever, or is there a built-in/hard-coded timeout on Windows for such a case?
And how does TCP retransmission come into play?
I assume that an unacknowledged packet would trigger retransmission. But what happens if all retransmission attempts fail? Does the socket connection just stall, or would WSAETIMEDOUT be raised (independent of SO_SNDTIMEO)?
My assumption about my connection issue is this:
The current send operation returns SOCKET_ERROR with error code WSAETIMEDOUT because the desired outgoing data cannot be copied to the kernel buffer, which is still occupied with old outgoing data that is either lost or cannot be ACKed by the socket peer in time. Is my understanding right?
Possible causes may be: an intermediate router drops packets, the intermediate network gets disconnected, or the peer has problems receiving. What else?
What can be wrong on the receiving side?
Maybe the peer application hangs and stops reading data from the socket buffer. The receive buffer (on the receiver side) is full and blocks the sender from sending data.
Thank you for clarifying all my questions.
On the other hand, I assume that the error WSAETIMEDOUT should be caused by a missing TCP ACK from the receiving side. Right?
No. send does not provide any information about whether the data has been acknowledged by the other side. A timeout simply means that the data could not be stored into the local socket buffer in time, because that buffer was already full the whole time. The socket buffer stays full if data cannot be delivered to the other side, i.e. if the recipient does not read the data, or does not read it fast enough.
But what happens if all retransmission attempts fail? Does the socket connection just stall?
TCP sockets will not try to retransmit data forever; they give up after some time, treat the connection as dead, and consider the associated socket closed. This error is propagated to the application within send. Thus in this case send might return WSAETIMEDOUT (or ETIMEDOUT on UNIX systems) due to a retransmission timeout even before the send timeout of the socket (SO_SNDTIMEO) has expired.
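
For illustration, a minimal sketch of how a send timeout is typically configured with Winsock; the 5-second value and the variable names are arbitrary examples, and sock is assumed to be an already-connected SOCKET:

#include <winsock2.h>

/* On Windows, SO_SNDTIMEO takes the timeout in milliseconds as a DWORD. */
DWORD sndTimeoutMs = 5000;                     /* arbitrary example: 5 s */
if (setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO,
               (const char *)&sndTimeoutMs, sizeof(sndTimeoutMs)) == SOCKET_ERROR)
{
    /* handle WSAGetLastError() */
}
/* A later send() that cannot place its data into the (still full) socket
   buffer within 5 seconds fails with WSAETIMEDOUT. Independently of this
   option, a retransmission timeout can also surface as WSAETIMEDOUT. */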

Socket Keepalive with Periodic Send

I have a C/C++ application set up as follows:
A non-blocking TCP server socket on a Linux platform
A thread which writes a small packet (less than 20 bytes) to the socket at 1 Hz
The socket is configured with keepalive enabled and with: keepidle=5, keepintvl=5 and keepcnt=3
My intention is that the keepalive mechanism should detect a physical disconnection of the network link. However, when the link is cut, I do not see the zero-length keepalive probes which the mechanism should generate (I am using tcpdump to monitor traffic). My impression is that what happens is this: after the cable disconnection, the application keeps making send requests, and the fact that there are pending send requests prevents the keepalive mechanism from being activated. Is this explanation valid?
In order to check my explanation, I have modified my test as follows:
A non-blocking TCP server socket on a Linux platform
A thread which writes a small packet (about 100 bytes) to the socket every 30 seconds
The socket is configured with keepalive enabled and with: keepidle=5, keepintvl=5 and keepcnt=2
In this case, if I cut the connection, the keepalive mechanism triggers within about 15-20 seconds (which is what I would expect).
On a related point, I would like to understand the exact semantics of tcp_keepidle. It is defined as: "The number of seconds a connection needs to be idle before TCP begins sending out keep-alive probes". What exactly does 'idle' mean in this context? Does it simply mean that nothing is received and nothing is put on the network, or does it mean that nothing is received and no send requests are made to the socket?
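
For reference, a minimal sketch of how the parameters from the first test (keepidle=5, keepintvl=5, keepcnt=3) would typically be set on Linux; fd is assumed to be a connected TCP socket, and error handling is omitted:

#include <netinet/in.h>
#include <netinet/tcp.h>   /* TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT (Linux) */
#include <sys/socket.h>

static void enable_keepalive(int fd)
{
    int on = 1, idle = 5, intvl = 5, cnt = 3;

    /* Turn the mechanism on, then tune the Linux-specific knobs. */
    setsockopt(fd, SOL_SOCKET,  SO_KEEPALIVE,  &on,    sizeof(on));
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,  &idle,  sizeof(idle));  /* idle seconds before first probe */
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)); /* seconds between probes */
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,   &cnt,   sizeof(cnt));   /* unanswered probes before reset */
}

Note that keepalive probes are only sent on an otherwise idle connection (in particular, one with no unacknowledged data in flight, for which the retransmission timer applies instead), which would be consistent with the behaviour observed in the first test.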

Half-Established TCP Connections

Half-Established Connections
With a half-established connection I mean a connection for which the client's call to connect() returned successfully, but the server's call to accept() didn't. This can happen in the following way: the client calls connect(), resulting in a SYN packet to the server. The server goes into state SYN-RECEIVED and sends a SYN-ACK packet to the client. This causes the client to reply with an ACK, go into state ESTABLISHED, and return from the connect() call. If the final ACK is lost (or ignored due to a full accept queue at the server, which is probably the more likely scenario), the server is still in state SYN-RECEIVED and the accept() does not return. Due to timeouts associated with the SYN-RECEIVED state, the SYN-ACK will be resent, allowing the client to resend the ACK. If the server is able to process the ACK eventually, it will go into state ESTABLISHED as well. Otherwise it will eventually reset the connection (i.e. send a RST to the client).
You can create this scenario by starting lots of connections on a single listen socket (if you do not adjust the backlog and tcp_max_syn_backlog). See this question and this article for more details.
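
As a rough illustration, here is a sketch of how one might try to reproduce this on Linux: a server that listens with a tiny backlog and never calls accept(), and a client that opens many connections. The port is a placeholder, and whether connections actually end up half-established depends on kernel settings such as somaxconn, tcp_max_syn_backlog, tcp_abort_on_overflow, and SYN cookies, so treat this as an experiment, not a guarantee:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    struct sockaddr_in a;
    memset(&a, 0, sizeof(a));
    a.sin_family = AF_INET;
    a.sin_port = htons(8080);                    /* placeholder port */
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    /* Server: tiny backlog, and accept() is never called. */
    int ls = socket(AF_INET, SOCK_STREAM, 0);
    bind(ls, (struct sockaddr *)&a, sizeof(a));
    listen(ls, 1);

    if (fork() == 0) {
        /* Client: once the accept queue is full, further connect()s may
           still complete here (client ESTABLISHED on receiving SYN-ACK)
           while the server side stays in SYN-RECEIVED. Inspect both
           sides with `ss -tan`. */
        for (int i = 0; i < 32; i++) {
            int s = socket(AF_INET, SOCK_STREAM, 0);
            connect(s, (struct sockaddr *)&a, sizeof(a));
            /* deliberately keep s open */
        }
    }
    pause();                                     /* never accept() */
    return 0;
}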
Experiments
I performed several experiments (with variations of this code) and observed some behaviour I cannot explain. All experiments were performed using Erlang's gen_tcp and a current Linux, but I strongly suspect that the answers are not specific to this setup, so I tried to keep it more general here.
connect() -> wait -> send() -> receive()
My starting point was to establish a connection from the client, wait between 1 and 5 seconds, send a "Ping" message to the server, and wait for the reply. With this setup I observed that the receive() failed with the error closed when I had a half-established connection. There was never an error during the send() on a half-established connection. You can find a more detailed description of this setup here.
connect() -> long wait -> send()
To see if I can get errors while sending data on a half-established connection, I waited for 4 minutes before sending data. The 4 minutes should cover all timeouts and retries associated with the half-established connection. Sending data was still possible, i.e. send() returned without error.
connect() -> receive()
Next I tested what happens if I only call receive() with a very long timeout (5 minutes). My expectation was to get a closed error for the half-established connections, as in the original experiments. Alas, nothing happened: no error was thrown and the receive eventually timed out.
My questions
Is there a common name for what I call a half-established connection?
Why is the send() on a half-established connection successful?
Why does a receive() only fail if I send data first?
Any help, especially links to detailed explanations, is welcome.
From the client's point of view, the session is fully established, it sent SYN, got back SYN/ACK and sent ACK. It is only on the server side that you have a half-established state. (Even if it gets a repeated SYN/ACK from the server, it will just re-ACK because it's in the established state.)
The send on this session works fine because, as far as the client is concerned, the session is established. The sent data does not have to be acknowledged by the far side in order for send to succeed (the send system call is finished when the data is copied into kernel buffers) - but see below.
I believe that the send here actually generates an error on the connection (probably a RST), because the receiving system cannot ACK data on a session it has not finished establishing. My guess is that any system call referencing the socket on the client side made after the send plus a short delay (i.e. once the RST has had a chance to come back) will result in an error.
The receive by itself never causes an error, because the client side doesn't need to do anything (TCP protocol-wise) for a receive; it's just idly waiting. But once you send some data, you've forced the server side's hand: it either has completed the session establishment (in which case it can accept the data), or it must send a reset (my guess here is that it can't "hold" undelivered data on a session that isn't fully established).

SO_KEEPALIVE makes which side of the connection send keepalive probes?

If a socket is set up with SO_KEEPALIVE via setsockopt, does that mean the side which invokes setsockopt will send the keepalive probes?
So if one side performs the following steps, it will send keepalive probes:
Create a socket with socket
Use setsockopt to set SO_KEEPALIVE
Invoke connect
Begin data transfer
And if the other side performs the following steps, it will also send keepalive probes:
Create a socket with accept
Use setsockopt to set SO_KEEPALIVE
Begin data transfer
I have searched on Google and browsed the TCP Keepalive HOWTO, but I can't find a clear answer.
Keepalive probes are sent from the end whose application sets SO_KEEPALIVE on the socket. When to trigger a probe on an idle line, the interval between probes, and the number of unacknowledged probes that triggers a reset are all configured as socket options on the side that sets SO_KEEPALIVE. The peer application does not even know that its peer is sending keepalives.
That's correct. Socket options only affect what the local side does.
If a local socket is doing keepalive and gets no response after some retries, it will reset the connection. The other side must fend for itself.
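
As a small sketch of the two call sequences from the question (POSIX, error handling omitted; listen_fd is assumed to be a bound, listening socket):

int on = 1;

/* Accept side: enable keepalive on the socket returned by accept();
   probes for this connection are then sent from this end. */
int conn = accept(listen_fd, NULL, NULL);
setsockopt(conn, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));

/* Connect side: enabling it on the connecting socket works the same
   way and makes that end send its own, independent probes. */
int sock = socket(AF_INET, SOCK_STREAM, 0);
setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));
/* ... connect(sock, ...) and begin data transfer ... */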

Socket programming - API doubt

There was this question posted in class today about API design in socket programming.
Why are listen() and accept() provided as different functions and not merged into one function?
Now as far as I know, listen() marks a socket as willing to accept connections and sets a maximum bound on the queue of incoming connections. If accept() and listen() were merged, could such a queue not still be maintained?
Or is there some other explanation?
Thanks in advance.
listen() means "start listening for clients"
accept() means "accept a client, blocking until one connects if necessary"
It makes sense to separate these two, because if they were merged, the single merged function would block. That would cause problems for non-blocking I/O programs.
For example, let's take a typical server that wants to listen for new client connections but also monitor existing client connections for new messages. A server like this typically uses a non-blocking I/O model so that it is not blocked on any one particular socket, so it needs a way to "start listening" on the server socket without blocking on it. Once listening on the server socket has been initiated, the server socket is added to the set of sockets being monitored via select() (or poll()). The select() call indicates when there is a client pending on the server socket, and the program can then call accept() without fear of blocking on that socket.
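
A sketch of that pattern, assuming a POSIX system; listen_fd is an already-bound, listening socket, and clients[]/nclients are placeholders for however the program tracks its connected sockets:

/* One iteration of the event loop: watch the listening socket and all
   existing clients for readability at the same time. */
fd_set rset;
FD_ZERO(&rset);
FD_SET(listen_fd, &rset);
int maxfd = listen_fd;
for (int i = 0; i < nclients; i++) {
    FD_SET(clients[i], &rset);
    if (clients[i] > maxfd)
        maxfd = clients[i];
}

select(maxfd + 1, &rset, NULL, NULL, NULL);  /* block here, not in accept() */

if (FD_ISSET(listen_fd, &rset)) {
    /* A connection is already queued, so accept() returns immediately. */
    int conn = accept(listen_fd, NULL, NULL);
    /* ... add conn to clients[] ... */
}
for (int i = 0; i < nclients; i++) {
    if (FD_ISSET(clients[i], &rset)) {
        /* ... read the pending message from clients[i] ... */
    }
}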
listen(2) makes a given TCP socket a server socket, i.e. it creates a queue for accepting connection requests from clients. Only the listening-side port, and possibly IP address, are bound (thus you need to call bind(2) before listen(2)). accept(2) then actually takes such a connection request from that queue and turns it into a connected socket (the four parts required for two-way communication - source IP address, source port number, destination IP address, and destination port number - are assigned). listen(2) is called only once, while accept(2) is usually called multiple times.
Under the hood, bind assigns an address and a port to a socket descriptor. It means the port is now reserved for that socket, and therefore the system won't be able to assign the same port to another application (an exception exists, but I won't go into details here). It's also a one-time-per-socket operation.
Then listen is responsible for establishing the number of connections that can be queued for a given socket descriptor, and indicate that you're now willing to receive connections.
On the other hand, accept is used to dequeue the first connection from the queue of pending connections and create a new socket to handle further communication through it. It may be called multiple times, and generally is. By default, this operation blocks if there are no connections in the queue.
Now suppose you want to use an async I/O mechanism (like epoll, poll, kqueue, select, etc.). If listen and accept were a single API, how would you indicate that a given socket is willing to receive connections? The async mechanism needs to know that you wish to handle this type of event as well.
Since the two operations have quite different semantics, it makes sense to keep them apart.