tcp connection issue for unreachable server after connection - sockets

I am facing an issue with tcp connection..
I have a number of clients connected to the a remote server over tcp .
Now,If due to any issue i am not able to reach my server , after the successful establishment of the tcp connection , i do not receive any error on the client side .
On client end if i do netstat , it shows me that clients are connected the remote server , even though i am not able to ping the server.
So,now i am in the case where the server shows it is not connected to any client and on another end the client shows it is connected the server.
I have tested this for websocket also with node.js , but the same behavior persists over there also .
I have tried to google it around , but no luck .
Is there any standard solution for that ?

This is by design.
If two endpoints have a successful socket (TCP) connection between each other, but aren't sending any data, then the TCP state machines on both endpoints remains in the CONNECTED state.
Imagine if you had a shell connection open in a terminal window on your PC at work to a remote Unix machine across the Internet. You leave work that evening with the terminal window still logged in and at the shell prompt on the remote server.
Overnight, some router in between your PC and the remote computer goes out. Hours later, the router is fixed. You come into work the next day and start typing at the shell prompt. It's like the loss of connectivity never happened. How is this possible? Because neither socket on either endpoint had anything to send during the outage. Given that, there was no way that the TCP state machine was going to detect a connectivity failure - because no traffic was actually occurring. Now if you had tried to type something at the prompt during the outage, then the socket connection would eventually time out within a minute or two, and the terminal session would end.
One workaround is to to enable the SO_KEEPALIVE option on your socket. YMMV with this socket option - as this mode of TCP does not always send keep-alive messages at a rate in which you control.
A more common approach is to just have your socket send data periodically. Some protocols on top of TCP that I've worked with have their own notion of a "ping" message for this very purpose. That is, the client sends a "ping" message over the TCP socket every minute and the server responds back with "pong" or some equivalent. If neither side gets the expected ping/pong message within N minutes, then the connection, regardless of socket error state, is assumed to be dead. This approach of sending periodic messages also helps with NATs that tend to drop TCP connections for very quiet protocols when it doesn't observe traffic over a period of time.

Related

client/server connection timed out

i am having an application running inside a gateway,
this application is a coap-server coded using the libcoap library
the server is running perfectly fine, the ip:port is tested using different commands such as nmap , telnet and others, each time it shows that the port is open and the connection is a success.
My problem is that there's no response from the server, wireshark is showing that the requests are being re-transmitted until timeout.
After some research, i thought that the gateway doesn't support NAT loopback, so i tried sending requests from another connection (i used my phones 4G). I even disabled firewall on the gateway too, But no success either.
UPDATE:
after some digging, i managed to receive a response from the server, but only when using TCP connection, the UDP still sends requests until timeout,
from a logical point of view, what may be the problem here ?
note: UDP is a must in this application so i cant just ignore it.

What causes "Transport endpoint is not connected" in ZeroMQ?

I am working on a product which uses ZeroMQ (version 4.0.1).
The server and client communicate based on ZeroMQ ROUTER-socket.
To read socket events, server and client also create socket-monitor sockets (PAIR). There are three ports on which server binds and listens. Out of these three ports, one port is in a non-secured mode. Other two ports are using md5-authentication.
The issue I am facing is that, both the server and the client spontaneously receive socket disconnect for one of the secure port sockets (please see a log below). I have checked multiple times that server and client both have L3 reachability to each other.
What else I should check for?
What really triggers this error scenario?
zmq_print_callback:ZmQ: int zmq::stream_engine_t::read(void*, size_t):923
Stream engine recv():
TCP socket (187) to unknown:0 was disconnected
with error 107 [Transport endpoint is not connected]
Below sequence of events can trigger this error on server
Server receives ACCEPTED event for clientY and gets FD1.
Link-flap/network issue happens and clientY disconnects but server does not receive this disconnect.
Network recovers and clientY connects back to server.
Server receives ACCEPTED event for clientY and gets FD2. However, packets sent to this sockets does not go out of the server.
After 1 min or so, clientY receives "Transport endpoint is not connected error" for FD1.
Application can use this to treat as client disconnect.

Why does the server application send RST after having gone through SYN->SYN,ACK->ACK?

I have a system with server/client applications. The client will send in socket connection request and the server will accept the socket connection when it's working correctly. However, in some situations (most likely due to ungraceful socket disconnection like system shutdown on client side or crash), the client will not be able to reconnect to the server application. The Wireshark capture shows the client will continue to try to connect; but after going through SYN->SYN,ACK->ACK, the server application will send RST. At this point, sometimes the netstat -an will show the connection is in CLOSE_WAIT state and other times would not show this connection. The capture shows 'Acknowledgment Number: Broken TCP. The ackowledge field is nonzero while the ACK flag is not set.
My questions is why the server application would send this RST?

Keep TCP connection on permanently with ESP8266 TCP client

I am using the wifi chip ESP8266 with SMING framework.
I am able to establish a TCP connection as a client to a remote server. The code for initiating client connection to server is simple.
tcpClient.connect(SERVER_HOST, SERVER_PORT);
Unfortunately, the connection will close after idling for some time. I would like to keep this connection open forever permanently. How can this be done?
You will actually need to monitor the connection state and reconnect it if it failed. Your protocol on top of it will need to keep track of what got actually received by the other side and retransmit it.
In any wireless network your link may go down for one reason or another and if you need to maintain a long term connection you will need to have it in a layer above TCP itself.
TCP will continue to be connected as long as both sides allow for it (none of them disconnected) and there are no errors on the link, in this case sending keepalives may actually cause disconnects since the keepalive may fail at one time but the link could recover and if you didn't have the keepalive the link would have stayed up.

TCP connection between client and server gone wrong

I establish a TCP connection between my server and client which runs on the same host. We gather and read from the server or say source in our case continuously.
We read data on say 3 different ports.
Once the source stops publishing data or gets restarted , the server/source is not able to publish data again on the same port saying port is already bind. The reason given is that client still has established connection on those ports.
I wanted to know what could be the probable reasons of this ? Can there be issue since client is already listening on these ports and trying to reconnect again and again because we try this reconnection mechanism. I am more looking for reason on source side as the same code in client sides when source and client are on different host and not the same host works perfectly fine for us.
Edit:-
I found this while going through various article .
On the question of using SO_LINGER to send a RST on close to avoid the TIME_WAIT state: I've been having some problems with router access servers (names withheld to protect the guilty) that have problems dealing with back-to-back connections on a modem dedicated to a specific channel. What they do is let go of the connection, accept another call, attempt to connect to a well-known socket on a host, and the host refuses the connection because there is a connection in TIME_WAIT state involving the well-known socket. (Stevens' book TCP Illustrated, Vol 1 discusses this problem in more detail.) In order to avoid the connection-refused problem, I've had to install an option to do reset-on-close in the server when the server initiates the disconnection.
Link to source:- http://developerweb.net/viewtopic.php?id=2941
I guess i am facing the same problem: 'attempt to connect to a well-known socket on a host, and the host refuses the connection'. Probable fix mention is 'option to do reset-on-close in the server when the server initiates the disconnection'. Now how do I do that ?
Set the SO_REUSEADDR option on the server socket before you bind it and call listen().
EDIT The suggestion to fiddle around with SO_LINGER option is worthless and dangerous to your data in flight. Just use SO_RESUSEADDR.
You need to close the socket bound to that port before you restart/shutdown the server!
http://www.gnu.org/software/libc/manual/html_node/Closing-a-Socket.html
Also, there's a timeout time, which I think is 4 minutes, so if you created a TCP socket and close it, you may still have to wait 4 minutes until it closes.
You can use netstat to see all the bound ports on your system. If you shut down your server, or close your server after forking on connect, you may have zombie processes which are bound to certain ports that do not close and remain active, and thus, you can't rebind to the same port. Show some code.