How to start a huge number of TCP client connections (sockets)

Happy Spring Festival - the Chinese New Year!
I'm working on server programming and I'm stuck on error 10055 (WSAENOBUFS).
I have a TCP client application that can simulate a huge number of clients.
Having heard that 65534 is the maximum number of TCP client connections from one computer, I use Asio to implement a simulation client that starts 50000 asynchronous TCP connects.
pseudocode:
for (int i = 0; i < 50000; ++i)
    async_connect(...);   // one pending asynchronous connect per simulated client
Development environment:
Windows XP, x86, 4 GB memory, 4-core CPU
HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\MaxUserPort=65000
The result is:
When the connection count reaches 17000, error 10055 occurs.
I tried another computer; there the error occurs at 30000 connections, which is better but still not good enough.
(The server application runs on another computer, also using Asio.)
The question is:
How can I successfully start 50000 client connections from one computer?

You could try to do it more blockwise:
E.g. start with 10000 connections. As soon as 5000 of them have completed successfully, start the next 5000 async_connect calls, and repeat until you have reached your target. That would at least put less stress on the I/O completion port. If it doesn't work, I would try even smaller blocks.
However, depending on where the OS runs out of memory, that still might not help.
Do you start asynchronous reads directly after each connect succeeds? Those will also drain memory resources.
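A minimal sketch of that blockwise idea with a current Boost.Asio (this is not the original poster's code; the endpoint address, window size and total count are placeholder values): keep a fixed number of connects in flight and start another one each time a pending connect completes.

#include <boost/asio.hpp>
#include <functional>
#include <iostream>
#include <memory>
#include <vector>

using boost::asio::ip::tcp;

int main() {
    boost::asio::io_context io;
    // Assumed server address/port; replace with the real endpoint.
    tcp::endpoint server(boost::asio::ip::make_address("192.168.0.10"), 9000);

    const int kTotal  = 50000;  // connections we ultimately want
    const int kWindow = 5000;   // connects allowed in flight at once
    int started = 0;

    std::vector<std::shared_ptr<tcp::socket>> sockets;
    sockets.reserve(kTotal);

    // Start one async connect; when it completes, start the next one,
    // so at most kWindow connects are ever pending at the same time.
    std::function<void()> start_next = [&]() {
        if (started >= kTotal) return;
        ++started;
        auto sock = std::make_shared<tcp::socket>(io);
        sockets.push_back(sock);
        sock->async_connect(server,
            [&, sock](const boost::system::error_code& ec) {
                if (ec) std::cerr << "connect failed: " << ec.message() << '\n';
                start_next();  // keep the window full
            });
    };

    for (int i = 0; i < kWindow; ++i) start_next();  // prime the window
    io.run();
}

Whether this avoids error 10055 still depends on how much non-paged pool and socket buffer space the machine has, so treat it as a starting point rather than a guaranteed fix.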


ill effects of reducing TCP socket connection retries

I have a TCP client on an embedded Linux device that establishes a connection with the server while the device is in running mode.
We have a program mode, in which all activity has to cease because the system parameters will be changed.
The way I designed it was to create a socket at boot, close the connection when entering program mode, and reopen it after coming out of program mode.
My problem is that the 'connect' during boot-up blocks for more than 2 minutes, and the time keeps increasing, making the system sluggish.
Someone told me that changing 'tcp_syn_retries' would reduce the hang time; I tried it and found that it reduces the blocking time to under 1 ms.
Can anyone tell me about the possible implications of this change?
Also, can you suggest how to implement the connect in non-blocking mode (see the sketch below)? The one I tried didn't establish the connection.
Any comments or responses will be helpful.
Edit: Since TCP uses a 3-way handshake, this change reduces the number of SYN requests sent to the server during the handshake. As a result, connecting to remote TCP servers over a slow or sluggish link will not be reliable.
This is the info I got from googling. How much is too much? Any suggestions are welcome.
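Since the question also asks how to do a non-blocking connect, here is a rough sketch (POSIX/Linux, plain sockets, not the asker's code) of one common pattern: put the socket into O_NONBLOCK mode, call connect(), wait for writability with poll(), and then check SO_ERROR. The TCP_SYNCNT line shows the Linux-specific per-socket alternative to changing the global tcp_syn_retries; the retry count and timeout are illustrative values.

#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cerrno>
#include <cstring>

// Connect with a caller-controlled timeout; returns the fd or -1 on failure.
int connect_with_timeout(const char* ip, int port, int timeout_ms) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    // Optional and Linux-specific: limit SYN retransmissions for this socket
    // only, instead of changing the global tcp_syn_retries sysctl.
    int syn_retries = 2;
    setsockopt(fd, IPPROTO_TCP, TCP_SYNCNT, &syn_retries, sizeof(syn_retries));

    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);  // non-blocking mode

    sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    inet_pton(AF_INET, ip, &addr.sin_addr);

    if (connect(fd, (sockaddr*)&addr, sizeof(addr)) == 0) return fd;  // connected immediately
    if (errno != EINPROGRESS) { close(fd); return -1; }               // immediate failure

    pollfd pfd = { fd, POLLOUT, 0 };
    if (poll(&pfd, 1, timeout_ms) <= 0) { close(fd); return -1; }     // timed out

    int err = 0;
    socklen_t len = sizeof(err);
    getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len);
    if (err != 0) { close(fd); return -1; }                           // handshake failed
    return fd;  // connected; the caller may switch the fd back to blocking if desired
}

Regarding the implications: fewer SYN retries simply means connect() gives up sooner, which is exactly why connections over slow or lossy links become less reliable, as the edit above notes.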

40+ clients on home router + repeater, but the communication fails after a few hours

I have around 43 embedded devices (2.4 GHz band) connected to a home router (Netgear N300). I found that my Netgear router does not allow more than 32 DHCP clients to connect, so I added a repeater (D-Link DIR 816, dual antenna, DHCP server disabled) just to extend the router's client table capacity. This worked: all 43 embedded devices and 3 computers stay connected to the main router.

The devices connect to a TCP server hosted on a computer that is assigned a static IP in the router table. To avoid half-broken TCP connections, the server sends an "ALIVE" packet to each device every second; if a device fails to receive this packet within a 5-second window, it breaks the connection and reconnects.

This setup has worked for a few months, but now I have run into a weird problem. After a few hours of operation, my devices stop receiving the "ALIVE" packets and continuously make and break connections. Once I restart my computer, everything becomes normal again for the next few hours. I am unable to identify what the issue might be.
I have deduced the following:
1. It is not a computer problem, as I have changed the computer hosting the TCP server and the issue remains.
2. It is not a router problem, as the issue does not go away even when I restart the router.
3. It is not a TCP server problem, as the connect-disconnect cycle continues even if I restart my TCP server.
Can anybody help me figure out what might be causing this problem?
(All communication with my devices is limited to a few bytes.)
So you're sending 43 alive packets over TCP via Wi-Fi per second using a $40 router. TCP transmission implies unicast delivery with acknowledgements, over Wi-Fi, 43 times per second. That makes me think the Wi-Fi access point hardware is the weakest point here (let alone that 43 devices is simply too many for a home-grade router).
To check the connection over Wi-Fi, make your embedded devices ping your server over ICMP. The devices may not have a ping application available, but it's pretty straightforward to implement yourself.
If the AP is broken, you'll probably see a wide range of response times: from 100 ms to 3 seconds.
If this theory proves true, you may also want to revise your system's architecture. TCP is a heavy thing: it doesn't support multicast, whereas UDP, IP and Wi-Fi do (though I'm not sure about the last). A single multicast alive message to all clients instead of 43 TCP transmissions should greatly reduce the load on your network.
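As a hypothetical illustration of that suggestion (the group address 239.0.0.1, port 5000 and message text are made-up values), the server side could send one multicast datagram per second instead of 43 TCP writes:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstring>

int main() {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    unsigned char ttl = 1;  // keep the datagram inside the local network
    setsockopt(fd, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, sizeof(ttl));

    sockaddr_in group;
    std::memset(&group, 0, sizeof(group));
    group.sin_family = AF_INET;
    group.sin_port = htons(5000);
    inet_pton(AF_INET, "239.0.0.1", &group.sin_addr);

    const char msg[] = "ALIVE";
    for (;;) {
        // One multicast send replaces 43 separate unicast TCP transmissions.
        sendto(fd, msg, sizeof(msg) - 1, 0, (sockaddr*)&group, sizeof(group));
        sleep(1);
    }
}

On the receiving side each device would join the group with setsockopt(IP_ADD_MEMBERSHIP) and reuse its existing 5-second timeout logic. Keep in mind that multicast over Wi-Fi is itself best-effort, so this trades per-client acknowledgements for much lower airtime usage.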

tcp connection issue for unreachable server after connection

I am facing an issue with a TCP connection.
I have a number of clients connected to a remote server over TCP.
Now, if for any reason I am not able to reach the server after the TCP connection has been successfully established, I do not receive any error on the client side.
If I run netstat on the client end, it shows the clients as connected to the remote server, even though I am not able to ping the server.
So now I am in a situation where the server shows it is not connected to any client, while the client shows it is connected to the server.
I have tested this with WebSockets on node.js as well, and the same behaviour persists there too.
I have tried to google around, but no luck.
Is there a standard solution for this?
This is by design.
If two endpoints have a successful socket (TCP) connection between each other, but aren't sending any data, then the TCP state machines on both endpoints remain in the ESTABLISHED state.
Imagine if you had a shell connection open in a terminal window on your PC at work to a remote Unix machine across the Internet. You leave work that evening with the terminal window still logged in and at the shell prompt on the remote server.
Overnight, some router in between your PC and the remote computer goes out. Hours later, the router is fixed. You come into work the next day and start typing at the shell prompt. It's like the loss of connectivity never happened. How is this possible? Because neither socket on either endpoint had anything to send during the outage. Given that, there was no way that the TCP state machine was going to detect a connectivity failure - because no traffic was actually occurring. Now if you had tried to type something at the prompt during the outage, then the socket connection would eventually time out within a minute or two, and the terminal session would end.
One workaround is to enable the SO_KEEPALIVE option on your socket. YMMV with this socket option, as this mode of TCP does not always send keep-alive messages at a rate you control.
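As a rough, Linux-flavoured sketch of that workaround (the idle/interval/count values below are illustrative, not recommendations from this answer):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Enable TCP keep-alive on an already-connected socket and tighten its timing.
void enable_keepalive(int fd) {
    int on = 1;
    setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));

    // Linux-specific per-socket knobs; other systems fall back to kernel defaults.
    int idle = 30, interval = 5, count = 3;
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,  &idle,     sizeof(idle));     // seconds idle before first probe
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval, sizeof(interval)); // seconds between probes
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,   &count,    sizeof(count));    // failed probes before the connection is dropped
}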
A more common approach is to just have your socket send data periodically. Some protocols on top of TCP that I've worked with have their own notion of a "ping" message for this very purpose. That is, the client sends a "ping" message over the TCP socket every minute and the server responds back with "pong" or some equivalent. If neither side gets the expected ping/pong message within N minutes, then the connection, regardless of socket error state, is assumed to be dead. This approach of sending periodic messages also helps with NATs that tend to drop TCP connections for very quiet protocols when they don't observe traffic for a period of time.
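A minimal sketch of such a ping, assuming a made-up "ping"/"pong" exchange and a 10-second reply window; the caller would invoke it periodically and drop the connection when it returns false:

#include <poll.h>
#include <unistd.h>
#include <cstring>

// Send one ping and wait briefly for the peer's pong.
// Returns false when the connection should be treated as dead.
bool ping_once(int fd) {
    const char ping[] = "ping\n";
    if (write(fd, ping, sizeof(ping) - 1) < 0) return false;

    pollfd pfd = { fd, POLLIN, 0 };
    if (poll(&pfd, 1, 10 * 1000) <= 0) return false;  // no reply within 10 seconds

    char buf[64];
    ssize_t n = read(fd, buf, sizeof(buf));
    return n > 0 && std::strncmp(buf, "pong", 4) == 0;
}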

Memory leak in long-running Windows service with non-stop socket listener

I wrote an async TCP listener using sockets. It listens to only one client, which sends ~8 messages per second, 24/7. I hosted it in a Windows service to keep it alive all day long. It works well for some time, and then all of a sudden memory consumption starts growing rapidly. This makes my listener hang and stop receiving incoming data, but the Windows service still shows as running. I think my program is so busy listening that it is not letting the GC clear the memory; is that correct? Or is this common for a long-running TCP listener Windows service?
Please suggest. I am stuck here.

JAIN-SIP TCP socket is not closed even after multiple retransmissions on CentOS

I am using the JAIN-SIP stack on a 64-bit CentOS machine; below are the detailed steps and an explanation of the issue:
Register to the SIP server using the JAIN-SIP client.
The registration times out every 360 seconds.
During re-registration, if the server is down due to a network issue, the outbound TCP socket keeps retransmitting the registration request, and it takes around 15 minutes for this socket to be closed.
So the client is unable to register for 15 minutes, even though network connectivity was lost only for a few seconds.
This works fine with the same code on Windows: there are 5 retransmissions, then the socket gets closed, and when the client retries a new socket is opened.
Please help me resolve this issue on CentOS.
If the socket is in a frozen state, the OS should take care of the recovery. Monitor with tcpdump to see what happens with the TCP retransmissions. If you want to force JSIP to close the socket, use ((SIPTransactionStack)sipStack).closeAllSockets(); and then implement your recovery logic in the app.