I'm working on a legacy VB6 app that uses sockets to communicate to various devices.
On a 2012 system, we are noticing the time between calling winSock.Connect() to the connection event being fired is holding at about 9 seconds, across multiple systems on different domains.
On a 2008 R2 or lower system, it's taking 1-3 milliseconds between the call and the event being fired.
Has anyone run into this before, or has any ideas on what could be causing this?
I've done some snooping with Wireshark, and found that the first few TCP transmissions are not connecting and being retransmitted, not sure if that will help.
I ended up finding the answer to this after some extensive digging.
Starting in Windows Server 2012, Microsoft has enabled an extension of TCP called Explicit Congestion Notification (ECN). This allows end-to-end notification of network congestion with the loss of packets. The way this is enabled on a TCP packet is via a flag, which is defined in the definition of ECN (RFC 3168(2001)).
What was happening for me was that the devices my application talks to are older, and don't support the ECN flag. When they received packets with that flag enabled, they wouldn't acknowledge the transmission, leading to a timeout from the server. After two failed transmissions, it looks like Windows shuts off the ECN flag, and the device acknowledged the packets.
I disabled ECN running the following command from an Administrator Command Prompt:
netsh interface tcp set global ecncapability=disabled
There is nothing particularly "special" about the Winsock control, which is just a thin wrapper on top of the API. The only thing of note really is that it is 32-bit and must run inside WOW64.
You're probably doing something funny or all 32-bit programs using the winsock API the same way should see the same issue.
Perhaps you have a name resolution issue on this server?
Related
Summary:
I am guessing that the issue here is something to do with how Windows and Linux handle TCP connections, or sockets, but I have no idea what it is. I'm initiating a TCP connection to a piece of custom hardware that someone else has developed and I am trying to understand its behaviour. In doing so, I've created a .Net core 2.2 application; run on a Windows system, I can initiate the connection successfully, but on Linux (latest Raspbian), I cannot.
It appears that it may be because Linux systems do not try to retry/retransmit a SYN after a RST, whereas Windows ones do - and this behaviour seems key to how this peculiar piece of hardware works..
Background:
We have a black box piece of hardware that can be controlled and queried over a network, by using a manufacturer-supplied Windows application. Data is unencrypted and requires no authentication to connect to it and the application has some other issues. Ultimately, we want to be able to relay data from it to another system, so we decided to make our own application.
I've spent quite a long time trying to understand the packet format and have created a library, which targets .net core 2.2, that can be used to successfully communicate with this kit. In doing so, I discovered that the device seems to require a kind of "request to connect" command to be sent, via UDP. Straight afterwards, I am able to initiate a TCP connection on port 16000, although the first TCP attempt always results in a RST,ACK being returned - so a second attempt needs to be made.
What I've developed works absolutely fine on both Windows (x86) and Linux (Raspberry Pi/ARM) systems and I can send and receive data. However, when run on the Raspbian system, there seems to be problems when initiating the TCP connection. I could have sworn that we had it working absolutely fine on a previous build, but none of the previous commits seem to work - so it may well be a system/kernel update that has changed something.
The issue:
When initiating a TCP connection to this device, it will - straight away - reset the connection. It does this even with the manufacturer-supplied software, which itself then immediately re-attempts the connection again and it succeeds; so this kind of reset-once-then-it-works-the-second-time behaviour in itself isn't a "problem" that I have any control over.
What I am trying to understand is why a Windows system immediately re-attempts the connection through a retransmission...
..but the Linux system just gives up after one attempt (this is the end of the packet capture..)
To prove it is not an application-specific issue, I've tried using ncat/netcat on both the Windows system and the Raspbian system, as well as a Kali system on a separate laptop to prove it isn't an ARM/Raspberry issue. Since the UDP "request" hasn't been sent, the connection will never succeed anyway, but this simply demonstrates different behaviour between the OSes.
Linux versions look pretty much the same as above, whereby they send a single packet that gets reset - whereas the Windows attempt demonstrates the multiple retransmissions..
So, does anyone have any answer for this behaviour difference? I am guessing it isn't a .net core specific issue, but is there any way I can set socket options to attempt a retransmission? Or can it be set at the OS level with systemctl commands or something? I did try and see if there are any SocketOptionNames, in .net, that look like they'd control attempts/retries, as this answer had me wonder, but no luck so far.
If anyone has any suggestions as to how to better align this behaviour across platforms, or can explain the reason for this difference is at all, I would very much appreciate it!
Nice find! According to this, Windows´ TCP will retry a connection if it receives a RST/ACK from the remote host after sending a SYN:
... Upon receiving the ACK/RST client from the target host, the client determines that there is indeed no service listening there. In the Microsoft Winsock implementation of TCP, a pending connection will keep attempting to issue SYN packets until a maximum retry value is reached (set in the registry, this value defaults to 3 extra times)...
The value used to limit those retries is set in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\TcpMaxConnectRetransmissions according to the same article. At least in Win10 Pro it doesn´t seem to be present by default.
Although this is a conveniece for Windows machines, an application still should determine its own criteria for handling a failed connect attempt IMO (i. e number of attempts, timeouts etc).
Anyhow, as I said, surprising fact! Living and learning I guess ...
Cristian.
I am generating UDP packets on a 100 multicast groups on one VM Ubuntu 16.04 machine and subscribe to those groups on the other VM Ubuntu 16.04 machine. Both are on a HP server run by Hyper-V manager. The problem is that my application only receives 2 out of 100 groups. However, when Wireshark is capturing, the application starts receiving all messages.
I found several other similar questions like this one, where it explains that because Wireshark is running in promiscuous mode, it allows all packets to get through (through what?), and this explains why my application starts "seeing" them too. Thus, changing the Ethernet interface configuration to promiscuous mode allows the application to receive all the messages without running the Wireshark.
But what is the problem with the other packets that are not normally received? I tried to cross-verify the hex-dump of the "good" and "bad" messages and they don't seem to be different. The check sums for on the IP and UDP levels are correct. What else could be the problem?
Multicast ip range 239.1.4.1-100
Destination port 50003
Source port range ~33000 - 60900
firewall is disabled
EDIT:
It looks like when the application is subscribed to only 8 multicast groups, it works fine, however, if subscribed to more than 8, it receives only 2 (if they end on .7 or .8) or none, as described above. So, I would assume that the packets are correct. Could the problem be in the network settings? Or the application itself - need to find the bug in the script I did not write.
EDIT2:
I installed the ISO image on the other machine (Virtual box instead of HP Windows Server) and it works as it should. Thus, I assume my application works fine and all the ubuntu OS configurations are correct. Now I put all the blame on the Virtual Manager/settings. Any ideas?
It sounds as if you didn't tell the kernel about them.
See http://tldp.org/HOWTO/Multicast-HOWTO-6.html
You have to use setsockopt with IP_ADD_MEMBERSHIP. And be sure to use the correct values for your local interfaces.
I am trying to run the ZeroMQ's local_thr/remote_thr on SDP (infiniband) compiled on MSVS2012. But it's not connecting.
On IPoIB it is working properly. Operating System is Windows Server 2008 R2. On further investigation I found out that select() calls within ZeroMQ libraries are not working for Asynch accept() and send().
I also created a test application using BSD socket API and used select to accept connection on non blocking socket. But selectis not receiving the event for accept.
Please let me know what can be done to troubleshoot this issue.
I have a small portable tool which connects to around 150 servers at diverse locations to get a quick status check from them. It is important to get the status for all servers back to the user relatively quickly so the tool connects to the servers in parallel using non-blocking connect, and uses select() to determine when each socket is ready. The use of select() is fairly straightforward, and the tool is failure mature now and works well on Linux. It runs on windows xp, but connections to the vast majority of the servers out there do not complete. The tool staggers the calls to connect to avoid creating what looks like a SYN flood. It connects to one server about 100 msecs. I also have a check in place to ensure FD_SETSIZE is not violated. I have anecdotal evidence from someone else that the behaviour is better on later windows versions, but have not been able to verify.
I have used WinDump to verify that the syn packets are being sent, and I can see ack packets coming back, but select() keeps returning zero, and the code simply can't connect to most of the servers that do exist, and I can connect to just fine with the same code on Linux.
Has anyone seen and or solved any similar issues with many non-blocking connects and select on Windows XP?
After another day or so of digging I seem to have found the answer. On windows XP SP2 there is a limit of 10 concurrent connecting sockets system wide. If 10 or more half open connections exist, a System event is logged noting that the limit has been reached, and new connecting sockets are throttled silently. The System Event number is 4226.
I have fixed my code by adding version checks for Windows XP, and throttled to less than 10 connections on those systems. So far I have no reports of other versions being affected.
I have a server and client program on the same machine. The server is part of an application- it can start and stop arbitrarily. When the server is up, I want the client to connect to the server's listening socket. There are win32 functions to wait on file system changes (ReadDirectoryChangesW) and registry changes (RegNotifyChangeKeyValue)- is there anything similar for network changes? I'd rather not have the client constantly polling.
There is no such Win32 API, however this can be easily accomplished by using an event. The client would wait on that event to be signaled. The server would signal the event when it starts up.
The related API that you will need to use is CreateEvent, OpenEvent, SetEvent, ResetEvent and WaitForSingleObject.
If your server will run as a service, then for Vista and up it will run in session 0 isolation. That means you will need to use an event with a name prefixed with "Global\".
You probably do have a good reason for needing this, but before you implement this please consider:
Is there some reason you need a connect right away? I see this as a non issue because if you perform an action in the client, you can at that point make a new server connection.
Is the server starting and stopping more frequently than the client? You could switch roles of who listens/connects
Consider using some form of Windows synchronization, such as semaphore. The client can wait on the synchronization primitive and the server can signal it when it starts up.
Personally I'd use a UDP broadcast from the server and have the "client" listening for it. The server could broadcast a UDP packet every X period whilst running and when the client gets one, if it's not already connected, it could connect.
This has the advantage that you can move the client onto a different machine without any issues (and since the main connection from client to server is sockets already it would be a pity to tie the client and server to the same machine simply because you selected a local IPC method for the initial bootstrap).