Ill effects of reducing TCP socket connection retries

I have a TCP client on an embedded Linux device that establishes a connection with the server while the device is in running mode.
We also have a program mode, in which all activity has to cease because the system parameters will be changed.
The way I designed it was to create a socket at boot, close the connection when entering program mode, and reopen it after coming out of program mode.
My problem is that the 'connect' during boot-up blocks for more than 2 minutes, and the delay keeps increasing as time goes on, making the system sluggish.
Someone told me that changing 'tcp_syn_retries' would reduce the hang time; I tried it and found that it brings the blocking time down to under 1 ms.
Can anyone tell me about the possible implications of this change?
Also, can you suggest how to implement the connect in non-blocking mode? The approach I tried didn't establish the connection.
Any comments or responses will be helpful.
Edit: Since TCP has a 3-way handshake, this change reduces the number of SYN requests sent to the server during the handshake. As a result, connecting to remote TCP servers over a slow or lossy link will not be reliable.
This is the information I got from googling. How much is too much? Any suggestions are welcome.
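For the non-blocking connect part, the usual pattern on Linux is: put the socket in non-blocking mode, call connect(), wait for writability with poll() up to a chosen timeout, then read SO_ERROR to see whether the handshake actually succeeded. Below is a minimal sketch of that pattern; the address, port and timeout are placeholders, not values from the question.

    /* Minimal sketch: non-blocking connect() with an explicit timeout.
     * The peer address and the caller-supplied timeout are examples only. */
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <fcntl.h>
    #include <poll.h>
    #include <errno.h>
    #include <unistd.h>

    int connect_with_timeout(const char *ip, int port, int timeout_ms)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;

        /* Switch to non-blocking mode before connect() so it cannot hang. */
        fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);

        struct sockaddr_in addr = {0};
        addr.sin_family = AF_INET;
        addr.sin_port = htons(port);
        inet_pton(AF_INET, ip, &addr.sin_addr);

        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
            return fd;                          /* connected immediately */
        if (errno != EINPROGRESS) {
            close(fd);
            return -1;                          /* immediate failure */
        }

        /* The handshake is in progress; wait until the socket is writable. */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        if (poll(&pfd, 1, timeout_ms) <= 0) {
            close(fd);
            return -1;                          /* timeout or poll error */
        }

        /* Writable only means the attempt finished; check the actual result. */
        int err = 0;
        socklen_t len = sizeof(err);
        getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len);
        if (err != 0) {
            close(fd);
            return -1;
        }
        return fd;                              /* connected, still non-blocking */
    }

This keeps the boot-up path from hogging the system even when the server is unreachable, without changing the global tcp_syn_retries setting for every connection on the box.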

Related

TCP connection issue for unreachable server after connection

I am facing an issue with a TCP connection.
I have a number of clients connected to a remote server over TCP.
Now, if for any reason I am not able to reach my server after the TCP connection has been successfully established, I do not receive any error on the client side.
On the client end, if I run netstat, it shows that the clients are connected to the remote server, even though I am not able to ping the server.
So now I am in a situation where the server shows it is not connected to any client, while the client shows it is connected to the server.
I have also tested this with WebSockets and node.js, and the same behavior persists there too.
I have tried to google around, but no luck.
Is there any standard solution for that ?
This is by design.
If two endpoints have a successful socket (TCP) connection between each other but aren't sending any data, then the TCP state machines on both endpoints remain in the ESTABLISHED state.
Imagine if you had a shell connection open in a terminal window on your PC at work to a remote Unix machine across the Internet. You leave work that evening with the terminal window still logged in and at the shell prompt on the remote server.
Overnight, some router in between your PC and the remote computer goes out. Hours later, the router is fixed. You come into work the next day and start typing at the shell prompt. It's like the loss of connectivity never happened. How is this possible? Because neither socket on either endpoint had anything to send during the outage. Given that, there was no way that the TCP state machine was going to detect a connectivity failure - because no traffic was actually occurring. Now if you had tried to type something at the prompt during the outage, then the socket connection would eventually time out within a minute or two, and the terminal session would end.
One workaround is to enable the SO_KEEPALIVE option on your socket. YMMV with this socket option, as this mode of TCP does not always send keep-alive messages at a rate you control.
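For reference, enabling the option is a single setsockopt() call; on Linux the probe timing can additionally be tuned per socket with TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT. A short sketch, with example values only:

    /* Sketch: enable TCP keepalive on an already connected socket fd.
     * The timing values are illustrative, not recommendations. */
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>

    static void enable_keepalive(int fd)
    {
        int on = 1;
        setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));

    #ifdef __linux__
        int idle = 60;      /* seconds of silence before the first probe */
        int interval = 10;  /* seconds between unanswered probes */
        int count = 5;      /* unanswered probes before the connection is dropped */
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle));
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval, sizeof(interval));
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count));
    #endif
    }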
A more common approach is to just have your socket send data periodically. Some protocols on top of TCP that I've worked with have their own notion of a "ping" message for this very purpose. That is, the client sends a "ping" message over the TCP socket every minute and the server responds back with "pong" or some equivalent. If neither side gets the expected ping/pong message within N minutes, then the connection, regardless of socket error state, is assumed to be dead. This approach of sending periodic messages also helps with NATs that tend to drop TCP connections for very quiet protocols when it doesn't observe traffic over a period of time.
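What the client side of such a ping/pong exchange might look like, assuming a simple line-based protocol in which the literal "ping"/"pong" strings and the timing are placeholders for whatever your protocol actually defines:

    /* Sketch: send one application-level ping and wait for the reply.
     * "ping"/"pong" and the timeout are assumptions, not part of any standard. */
    #include <poll.h>
    #include <string.h>
    #include <unistd.h>

    /* Returns 1 if the peer answered, 0 if the connection should be treated as dead. */
    static int ping_once(int fd, int timeout_ms)
    {
        char buf[16];

        if (write(fd, "ping\n", 5) != 5)
            return 0;                       /* write failed: connection is gone */

        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        if (poll(&pfd, 1, timeout_ms) <= 0)
            return 0;                       /* no reply within the deadline */

        ssize_t n = read(fd, buf, sizeof(buf) - 1);
        if (n <= 0)
            return 0;
        buf[n] = '\0';
        return strncmp(buf, "pong", 4) == 0;
    }

A caller would run this once a minute (or at whatever interval the protocol specifies) and tear down and re-establish the connection whenever it returns 0.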

TCP connection between client and server gone wrong

I establish a TCP connection between my server and client, which run on the same host. We continuously gather and read data from the server (call it the source in our case).
We read data on, say, 3 different ports.
Once the source stops publishing data or gets restarted, the server/source is not able to publish data again on the same port, reporting that the port is already bound. The reason given is that the client still has established connections on those ports.
I want to know what the probable reasons for this could be. Could it be an issue because the client is still listening on these ports and trying to reconnect again and again (we have a reconnection mechanism)? I am mostly looking for the reason on the source side, as the same client code works perfectly fine for us when the source and client are on different hosts rather than the same host.
Edit:
I found this while going through various articles.
On the question of using SO_LINGER to send a RST on close to avoid the TIME_WAIT state: I've been having some problems with router access servers (names withheld to protect the guilty) that have problems dealing with back-to-back connections on a modem dedicated to a specific channel. What they do is let go of the connection, accept another call, attempt to connect to a well-known socket on a host, and the host refuses the connection because there is a connection in TIME_WAIT state involving the well-known socket. (Stevens' book TCP Illustrated, Vol 1 discusses this problem in more detail.) In order to avoid the connection-refused problem, I've had to install an option to do reset-on-close in the server when the server initiates the disconnection.
Link to source: http://developerweb.net/viewtopic.php?id=2941
I guess I am facing the same problem: 'attempt to connect to a well-known socket on a host, and the host refuses the connection'. The probable fix mentioned is an 'option to do reset-on-close in the server when the server initiates the disconnection'. Now how do I do that?
Set the SO_REUSEADDR option on the server socket before you bind it and call listen().
EDIT: The suggestion to fiddle around with the SO_LINGER option is worthless and dangerous to your data in flight. Just use SO_REUSEADDR.
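A minimal sketch of what that looks like when the server sets up its listening socket (the port is only an example):

    /* Sketch: set SO_REUSEADDR before bind() so the server can be restarted
     * while old connections on the same port are still in TIME_WAIT. */
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <unistd.h>

    int make_listener(unsigned short port)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;

        int on = 1;
        setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(port);

        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
            listen(fd, SOMAXCONN) < 0) {
            close(fd);
            return -1;
        }
        return fd;
    }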
You need to close the socket bound to that port before you restart or shut down the server!
http://www.gnu.org/software/libc/manual/html_node/Closing-a-Socket.html
Also, there's a timeout (the TIME_WAIT period, which I think is 4 minutes), so if you create a TCP socket and close it, you may still have to wait up to 4 minutes before the port is fully released.
You can use netstat to see all the bound ports on your system. If you shut down your server, or close your server after forking on connect, you may have zombie processes that remain bound to certain ports and stay active, and thus you can't rebind to the same port. Show some code.

Jain-sip TCP socket is not closed even after multiple retransmissions on CentOS

I am using the jain-sip stack on a 64-bit CentOS machine; below are the detailed steps and an explanation of the issue:
Register to the SIP server using the jain-sip client.
The registration expires every 360 seconds, triggering re-registration.
During re-registration, if the server is down due to a network issue, the outbound TCP socket continues to retransmit the registration request, and it takes around 15 minutes for this socket to be closed.
So the client is unable to register for 15 minutes, even after losing network connectivity for only a few seconds.
The same code works fine on Windows: there are 5 retransmissions, then the socket gets closed, and when the client retries a new socket is opened.
Please help me resolve this issue on CentOS.
If the socket is in a frozen state, the OS should take care of the recovery. Monitor what happens with the TCP retransmissions using tcpdump. If you want to force JSIP to close the socket, use ((SIPTransactionStack)sipStack).closeAllSockets(); and then implement your recovery logic in the app.

How to start a huge number of TCP client connections

Happy Spring Festival - the Chinese New Year.
I'm working on server programming and I'm stuck on error 10055 (WSAENOBUFS).
I have a TCP client application, which can simulate a huge number of clients.
Having heard that 65534 is the maximum number of TCP client connections from one computer,
I use Asio to implement a simulation client that starts 50000 asynchronous TCP connects.
pseudocode:
for (int i = 0; i < 50000; ++i)
    asyn_connect(...);
Development Environment is:
Windows XP, x86, 4 GB memory, 4-core CPU
HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\MaxUserPort=65000
The result is:
when the number of connections reaches 17000, error 10055 occurs.
I tried another computer; the error occurred at 30000 connections, which is better but still not good enough.
(The server app runs on another computer, also using Asio.)
The question is:
How can I successfully start 50000 client connections on one computer?
You could try to do it in blocks:
E.g. start with 10000 connections. As soon as 5000 connections have succeeded, start the next 5000 async_connect calls. Then repeat until you have reached your target. That would at least put less stress on the I/O completion port. If it doesn't work, I would try even smaller blocks.
However, depending on where the OS runs out of memory, that still might not help.
Do you start asynchronous reads directly after each connect succeeds? These will also drain memory resources.

Is application level heartbeating preferable to TCP keepalives?

Is there a reason why I should use application level heartbeating instead of TCP keepalives to detect stale connections, given that only Windows and Linux machines are involved in our setup?
It seems that the TCP keepalive parameters can't be set on a per-socket basis on Windows or OS X; that's why.
Edit: All parameters except the number of keepalive retransmissions can in fact be set on Windows (2000 onwards) too: http://msdn.microsoft.com/en-us/library/windows/desktop/dd877220%28v=vs.85%29.aspx
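For what it's worth, those per-socket values are set through WSAIoctl with SIO_KEEPALIVE_VALS; a rough sketch, with example values only:

    /* Sketch: per-socket keepalive timing on Windows via SIO_KEEPALIVE_VALS.
     * Only the idle time and the probe interval are configurable this way;
     * the number of retransmissions is fixed by the OS. Values are examples. */
    #include <winsock2.h>
    #include <mstcpip.h>

    static int set_keepalive(SOCKET s)
    {
        struct tcp_keepalive ka;
        DWORD bytes_returned = 0;

        ka.onoff = 1;                 /* enable keepalive on this socket */
        ka.keepalivetime = 60000;     /* ms of idle time before the first probe */
        ka.keepaliveinterval = 10000; /* ms between probes */

        return WSAIoctl(s, SIO_KEEPALIVE_VALS, &ka, sizeof(ka),
                        NULL, 0, &bytes_returned, NULL, NULL);
    }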
I was trying to do this with zeromq, but it just seems that zeromq does not support this on Windows?
From John Jefferies' response: ZMQ Pattern Dealer/Router HeartBeating
"Heartbeating isn't necessary to keep the connection alive (there is a ZMQ_TCP_KEEPALIVE socket option for TCP sockets). Instead, heartbeating is required for both sides to know that the other side is still active. If either side does detect that the other is inactive, it can take alternative action."
TCP keepalives serve an entirely different function from application-level heartbeating. A keepalive does just that: it keeps the TCP session active rather than allowing it to time out after long periods of silence. This is important and good, and (if appropriate) you should use it in your application. But a TCP session dying due to inactivity is only one way that the connection between a pair of ZMQ sockets can be severed. One endpoint could lose power for 90 minutes and be offline; TCP keepalives wouldn't do squat for you in that scenario.
Application-level heartbeating is not intended to keep the TCP session active; you are expected to rely on keepalives for that function if possible. Heartbeating is there to tell your application that the connection is in fact still active and the peer socket is still functioning properly. If the heartbeats stop arriving, that tells you your peer is unavailable, so you can behave appropriately by caching messages, throwing an exception, sending an alert, etc.
In short:
a TCP keepalive is intended to keep the connection alive (but doesn't protect against all disconnection scenarios)
an app-level heartbeat is intended to tell your application if the connection is alive