Reliable Message Delivery XMPP - sockets

In some cases the server cannot determine the state of the client at the socket level: sometimes the user is no longer connected because the connection dropped, yet the socket send still returns success. Some messages get dropped. How can we overcome this issue?
Client C1 sends a message to server S1, and the server sends the message to client C2. We can only determine the state of the client connection from the socket send: if it returns success, we assume that the client is alive and the message has been sent successfully. However, sometimes a message gets dropped because the connection no longer exists and the socket cannot detect that. How can we overcome this problem?
Socket m_Sock;
m_Sock.BeginSend(byteData, 0, byteData.Length, 0, out errCode, SendCallback, null);
If the error code is success, we assume that the message has been sent successfully to this client. However, the socket does not report the failure immediately after the connection has dropped; the expected error code is returned only after a couple of seconds, and in that interval we lose messages.
if (errCode == SocketError.Success)

Related

error/timeout detection from socket send() call

I am troubleshooting a socket connection issue where a peer irregularly gets WSAETIMEDOUT (10060) from socket send(), and I would like to understand the details at the TCP level.
The actual implementation uses a blocking Winsock socket and has the following call pattern:
auto result = ::send(...);
if (result == SOCKET_ERROR)
{
    auto err = ::WSAGetLastError();
    // err can be WSAETIMEDOUT
}
As far as I understand, send returns as soon as the outgoing data has been copied to the kernel buffer [as asked in another SO].
On the other hand, I assume that the error WSAETIMEDOUT should be caused by missing TCP ACK from the receiving side. Right? (see Steffen Ullrich's answer)
What I am not sure about is whether such a WSAETIMEDOUT only happens when the option SO_SNDTIMEO is set.
The default value of SO_SNDTIMEO is 0, meaning it never times out. Does that mean an unsuccessful send would block forever? Or is there a built-in/hard-coded timeout on Windows for such a case?
And how does TCP retransmission come into play?
I assume that an unacknowledged packet would trigger retransmission. But what happens if all retransmission attempts fail? Does the socket connection just stall? Or would WSAETIMEDOUT be raised (independent of SO_SNDTIMEO)?
My assumption for my connection issue would be like this:
A send operation returns SOCKET_ERROR with error code WSAETIMEDOUT because the outgoing data cannot be copied to the kernel buffer, which is still occupied by old outgoing data that was either lost or could not be ACKed by the peer in time. Is my understanding right?
Possible causes may be: an intermediate router drops packets, an intermediate network gets disconnected, or the peer has trouble receiving. What else?
What can be wrong on the receiving side?
Maybe the peer application hangs and stops reading data from its socket buffer. The receive buffer (on the receiver side) then fills up and blocks the sender from sending data.
Thank you for clarifying all my questions.
On the other hand, I assume that the error WSAETIMEDOUT should be caused by missing TCP ACK from the receiving side. Right?
No. send does not provide any information about whether the data has been acknowledged by the other side. A timeout simply means that the data could not be stored in the local socket buffer in time, because the buffer was full the whole time. The socket buffer stays full if data cannot be delivered to the other side, i.e. if the recipient does not read the data, or does not read it fast enough.
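The effect described above can be reproduced locally. The following is a minimal Python sketch, not the asker's Winsock code: it assumes a local socket pair as a stand-in for a TCP connection and uses Python's settimeout() in place of SO_SNDTIMEO. The function name fill_send_buffer and its parameters are made up for this illustration.

```python
import socket


def fill_send_buffer(timeout_s=0.2, chunk_size=64 * 1024):
    """Send to a peer that never reads until send() times out.

    Returns (bytes_accepted, timed_out).
    """
    # Assumption: a local socket pair stands in for a TCP connection.
    sender, receiver = socket.socketpair()
    sender.settimeout(timeout_s)  # Python-level timeout, a rough analogue of SO_SNDTIMEO
    chunk = b"x" * chunk_size
    accepted = 0
    timed_out = False
    try:
        while accepted < (1 << 30):  # safety cap so the loop cannot run forever
            # send() succeeds as long as the local buffers have room,
            # although the receiver never calls recv().
            accepted += sender.send(chunk)
    except socket.timeout:
        # The buffers stayed full for timeout_s seconds.
        timed_out = True
    finally:
        sender.close()
        receiver.close()
    return accepted, timed_out
```

The first calls succeed although nobody reads the data, which illustrates why a successful send says nothing about delivery; the timeout only appears once the buffers are full.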
But what happens if all retransmission attempts fail? Does the socket connection just stall?
TCP sockets will not retransmit data forever; they give up after some time, treat the connection as dead, and close the associated socket. This error is propagated to the application by send. Thus in this case send might return WSAETIMEDOUT (or ETIMEDOUT on UNIX systems) due to a retransmission timeout even before the socket's send timeout (SO_SNDTIMEO) has expired.

What is the callback mechanism used when non-blocking version of connect() call is used in socket programming?

In socket programming, let us say the server is listening for TCP connections on a particular port.
Now, on the client side, I create a socket and call connect() to establish a connection with the server. Note: the connect() API is called in non-blocking mode.
Since it is a non-blocking call and no callback is passed to connect() to be notified on completion, I want to know HOW the client gets to know that the TCP connection has been established successfully, so that it can initiate the data transfer.
Second part of the question: WHEN. Basically, for the TCP connection to be established, a 3-way handshake (SYN, SYN+ACK, ACK) has to happen.
I assume that when the connect() API is called from the client, a SYN packet is sent from the client and the connection establishment process is initiated. Since connect() is called in non-blocking mode, it just initiates the connection by asking the kernel and then returns. Once the connection is successfully established, the kernel has to notify the client that it is good to go and transfer data. My confusion is this: the last phase of the 3-way handshake completes on the server side (after the ACK packet has reached the server), so how does the kernel on the client side become aware that the connection process has completed?
Or does the kernel notify the client process that the connection is established as soon as it receives the SYN+ACK from the server?
There is no callback mechanism. Callback mechanisms are associated with asynchronous I/O, in some APIs. Not with non-blocking I/O. And no, they aren't the same thing.
When a non-blocking connect() doesn't complete immediately, as it usually doesn't (otherwise what would be the point), it returns -1 with errno set to EINPROGRESS. You should then select() or poll() or epoll() the socket for writeability, and so on, as described in man connect. This is not, to restate the point, a callback mechanism. It is in fact a polling mechanism.
When a non-blocking socket is used, connect() will usually return EINPROGRESS.
In that case, you can use the select() function to wait for connection establishment:
Add the socket to the write set of the select() call.
When the connection is established or has failed, select() returns and the write set indicates that your socket is writable. Then you can call getsockopt() to get the result of the non-blocking connect:
if (getsockopt(socket, SOL_SOCKET, SO_ERROR, &error, &len) != -1)
...
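The same steps (start the connect, wait for writeability, read SO_ERROR) can be sketched end to end in Python, whose socket module wraps the same calls. This is an illustration under assumptions, not the answerer's code: the function name nonblocking_connect and the timeout parameter are hypothetical, and the exceptional set is also watched because some platforms report a failed connect there rather than in the write set.

```python
import errno
import select
import socket


def nonblocking_connect(host, port, timeout_s=5.0):
    """Start a non-blocking connect and wait for the outcome with select()."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    # connect_ex() returns an error code instead of raising an exception.
    rc = sock.connect_ex((host, port))
    if rc not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
        sock.close()
        raise OSError(rc, "connect failed immediately")
    # The socket becomes writable once the attempt has succeeded or failed.
    _, writable, failed = select.select([], [sock], [sock], timeout_s)
    if not writable and not failed:
        sock.close()
        raise TimeoutError("connect did not complete in time")
    # SO_ERROR holds the result of the connect attempt: 0 means success.
    err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    if err != 0:
        sock.close()
        raise OSError(err, "connect failed")
    return sock
```

The returned socket is still in non-blocking mode, so a caller would keep using select() (or switch it back with setblocking(True)) before transferring data.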
A blocking TCP connect() returns when the client has received the SYN-ACK.
It works the same way with a non-blocking TCP socket: select() returns when the SYN-ACK is received:
The picture is slightly inaccurate for the sake of clarity. I tried to illustrate the slowness of the network by placing the SYN after the select call, and the ACK after the select return.
The client's TCP state changes to ESTABLISHED when the SYN-ACK is received. The server's TCP state changes to ESTABLISHED when the ACK (of the SYN-ACK) is received. So the client application can start sending data to the server before the server has returned from the accept() call. It is also possible that the ACK (and its retries) is lost in the network, so that the server never enters the ESTABLISHED state.
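That the client can send before the server application has accepted the connection is easy to check on the loopback interface. A minimal Python sketch, assuming a listener on 127.0.0.1 with an ephemeral port; the function name send_before_accept is made up for this illustration:

```python
import socket


def send_before_accept(payload=b"ping"):
    """Connect and send before the server calls accept(), then read the data."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    # The kernel completes the handshake; accept() has not been called yet.
    client = socket.create_connection(listener.getsockname(), timeout=5.0)
    client.sendall(payload)  # succeeds from the client's point of view
    # Only now does the server application pick up the connection.
    conn, _ = listener.accept()
    conn.settimeout(5.0)
    received = b""
    while len(received) < len(payload):
        chunk = conn.recv(len(payload) - len(received))
        if not chunk:  # peer closed before everything arrived
            break
        received += chunk
    for s in (conn, client, listener):
        s.close()
    return received
```

The data sent before accept() is queued by the server's kernel and handed to the application once it accepts the connection.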

Half-Established TCP Connections

Half-Established Connections
By a half-established connection I mean a connection for which the client's call to connect() returned successfully, but the server's call to accept() didn't. This can happen in the following way: The client calls connect(), resulting in a SYN packet to the server. The server goes into state SYN-RECEIVED and sends a SYN-ACK packet to the client. This causes the client to reply with ACK, go into state ESTABLISHED and return from the connect() call. If the final ACK is lost (or ignored, due to a full accept queue at the server, which is probably the more likely scenario), the server is still in state SYN-RECEIVED and the accept() does not return. Due to timeouts associated with the SYN-RECEIVED state the SYN-ACK will be resent, allowing the client to resend the ACK. If the server is able to process the ACK eventually, it will go into state ESTABLISHED as well. Otherwise it will eventually reset the connection (i.e. send a RST to the client).
You can create this scenario by starting lots of connections on a single listen socket (if you do not adjust the backlog and tcp_max_syn_backlog). See this question and this article for more details.
Experiments
I performed several experiments (with variations of this code) and observed some behaviour I cannot explain. All experiments were performed using Erlang's gen_tcp and a current Linux, but I strongly suspect that the answers are not specific to this setup, so I tried to keep it more general here.
connect() -> wait -> send() -> receive()
My starting point was to establish a connection from the client, wait between 1 and 5 seconds, send a "Ping" message to the server and wait for the reply. With this setup I observed that the receive() failed with the error closed when I had a half-established connection. There was never an error during the send() on a half-established connection. You can find a more detailed description of this setup here.
connect() -> long wait -> send()
To see whether I could get errors while sending data on a half-established connection, I waited for 4 minutes before sending data. The 4 minutes should cover all timeouts and retries associated with the half-established connection. Sending data was still possible, i.e. send() returned without error.
connect() -> receive()
Next I tested what happens if I only call receive() with a very long timeout (5 minutes). My expectation was to get a closed error for the half-established connections, as in the original experiments. Alas, nothing happened: no error was thrown and the receive eventually timed out.
My questions
Is there a common name for what I call a half-established connection?
Why is the send() on a half-established connection successful?
Why does a receive() only fail if I send data first?
Any help, especially links to detailed explanations, is welcome.
From the client's point of view, the session is fully established, it sent SYN, got back SYN/ACK and sent ACK. It is only on the server side that you have a half-established state. (Even if it gets a repeated SYN/ACK from the server, it will just re-ACK because it's in the established state.)
The send on this session works fine because as far as the client is concerned, the session is established. The sent data does not have to be acknowledged by the far side in order to succeed (the send system call is finished when the data is copied into kernel buffers) but see below.
I believe here that the send actually is generating an error on the connection (probably a RST) because the receiving system cannot ACK data on a session it has not finished establishing. My guess is that any system call referencing the socket on the client side that happens after the send plus a short delay (i.e. when the RST has had a chance to come back) will result in an error.
The receive by itself never causes an error because the client side doesn't need to do anything (I mean TCP protocol-wise) for a receive; it's just idly waiting. But once you send some data, you've forced the server side's hand: it either has completed the session establishment (in which case it can accept the data) or it must send a reset (my guess here is that it can't "hold" undelivered data on a session that isn't fully established).

tcp keep alive basic query

I have a TCP socket for my app. TCP keep-alive is enabled with a 10-second interval.
In addition, I also have messages flowing between the app and the server every 1 second to get status.
So, since messages are flowing over the socket at a faster rate anyway, no keep-alives will be sent at all.
Now, consider this scenario: the remote server is down, so the periodic message send (which happens every 1 second) fails 3-5 times in a row. I don't think that enabling TCP keep-alives lets us detect that the socket is broken, does it?
Do we then have to build logic into our code so that if this periodic message fails a certain number of times in a row, the other end is assumed to be dead?
Let me know.
In your application it makes no sense to enable keep alive.
Keep alive is for applications that have an open connection and don't use it all the time. You are using it all the time, so keep alive is not needed.
When you send something and the other end has crashed, TCP on the client will send all retransmissions with an increasing timeout. Finally, if you have a blocking socket, you will get an error indication on the send operation, at which point you know that you have to close the socket and retry the connection.
An error indication is where the return code of the socket operation is < 0.
I don't know the value of these timeouts by heart, but it can go up to a minute or longer.
When the server is shut down gracefully, meaning it closes its sending side of the socket, you will get that information by receiving 0 bytes on your receiving socket.
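The "receiving 0 bytes" signal can be checked like this in Python, where recv() returns an empty bytes object once the peer has closed its side. This is a sketch under assumptions: the helper name peer_closed_gracefully is hypothetical, and a real client would process any data it reads instead of discarding it as this probe does.

```python
import socket


def peer_closed_gracefully(sock, timeout_s=1.0):
    """Return True if recv() reports end-of-stream (0 bytes) within timeout_s.

    Note: this probe changes the socket's timeout and discards any data it
    reads, which is only acceptable in an illustration.
    """
    sock.settimeout(timeout_s)
    try:
        # An empty result means the peer closed its sending side.
        return sock.recv(4096) == b""
    except socket.timeout:
        # Neither data nor a close arrived in time: the state is unknown.
        return False
```

A crashed peer, as opposed to a gracefully closed one, does not produce this signal; in that case the error surfaces later on a send or through a timeout, as described above.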
You might wanna check out my answer of yesterday as well :
Reset TCP connection if server closes/crashes mid connection
No, you don't need to assume anything. The connection will break either because a send will time out or a keep alive will time out. Either way, the connection will break and you'll start getting errors on reads and writes.

How to send Udp packet 2 or 3 times after failed received packet in java?

I send a UDP packet to the server. If the server is OK, I receive the response packet nicely, but when the server is down I do not get any response packet. Can anybody tell me how I can send my packet to the server multiple times when I fail to receive the response packet? Moreover, I want to keep the connection with the server alive. Thanks in advance.
Well,
After you've sent the packet, you wait for the ACK (response) packet from the server. You could set DatagramSocket.setSoTimeout() to an appropriate time; if you get the timeout exception, increment a counter. If that counter is below your retry limit (2 or 3), send the packet again and repeat these steps. Once the counter reaches the limit, assume the server is down and just quit.
According to the Java documentation, receive will block until a packet is received or the timeout has expired.
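The retry loop described above looks roughly like this. The sketch is in Python rather than Java, with settimeout() playing the role of DatagramSocket.setSoTimeout(); the function name request_with_retries and its parameters are assumptions made for this illustration.

```python
import socket


def request_with_retries(server_addr, payload, attempts=3, timeout_s=1.0):
    """Send a datagram and wait for a reply, resending up to `attempts` times.

    Returns the reply bytes, or None if no reply arrived in any attempt.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)  # bounds each wait for a reply
        for _ in range(attempts):
            sock.sendto(payload, server_addr)
            try:
                # Simplification: a reply from any sender is accepted.
                reply, _ = sock.recvfrom(2048)
                return reply
            except (socket.timeout, ConnectionError):
                # No reply in time (or, on some platforms, the port was
                # reported unreachable): fall through and send again.
                continue
    return None
```

A caller treats None as "server is down". Because UDP may also duplicate or reorder packets, a real protocol would usually put a request id in the payload so that late replies can be matched to the right request.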
To keep the connection alive you need to implement a ping-pong. In another thread of your program, you send a keep-alive packet (any small packet will do) and wait for a response. I suggest using a different port number for this purpose so that these packets don't get mixed up with the normal data packets. These packets can be sent every 2 seconds or every 2 minutes, depending on your particular needs. When the thread receives the ACK packet it updates a private time variable with the current time, for example:
lastTimeSeen = System.currentTimeMillis();
Put a method in your thread class to access the value of that variable.
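One way to package that variable and its accessor is a small thread-safe tracker. The following is a Python sketch of the idea, not the answerer's Java class: the class name LivenessTracker, its methods, and the injectable clock are all assumptions made for this illustration.

```python
import threading
import time


class LivenessTracker:
    """Remembers when the last keep-alive reply was seen."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock            # injectable so it can be tested
        self._lock = threading.Lock()  # the ping thread and readers share state
        self._last_seen = None         # None until the first reply arrives

    def mark_seen(self):
        """Call this from the ping thread whenever a reply is received."""
        with self._lock:
            self._last_seen = self._clock()

    def is_alive(self, max_silence_s):
        """True if a reply was seen within the last max_silence_s seconds."""
        with self._lock:
            last = self._last_seen
        return last is not None and self._clock() - last <= max_silence_s
```

The main thread can then call is_alive() with a limit of a few ping intervals before sending data. A monotonic clock is used here instead of wall-clock time so that system clock adjustments do not produce false timeouts.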