TCP retransmission on RST - Different socket behaviour on Windows and Linux? - sockets

Summary:
I am guessing that the issue here is something to do with how Windows and Linux handle TCP connections, or sockets, but I have no idea what it is. I'm initiating a TCP connection to a piece of custom hardware that someone else has developed and I am trying to understand its behaviour. In doing so, I've created a .Net core 2.2 application; run on a Windows system, I can initiate the connection successfully, but on Linux (latest Raspbian), I cannot.
It appears that this may be because Linux systems do not retransmit a SYN after receiving a RST, whereas Windows ones do, and this behaviour seems key to how this peculiar piece of hardware works.
Background:
We have a black box piece of hardware that can be controlled and queried over a network, by using a manufacturer-supplied Windows application. Data is unencrypted and requires no authentication to connect to it and the application has some other issues. Ultimately, we want to be able to relay data from it to another system, so we decided to make our own application.
I've spent quite a long time trying to understand the packet format and have created a library, which targets .net core 2.2, that can be used to successfully communicate with this kit. In doing so, I discovered that the device seems to require a kind of "request to connect" command to be sent, via UDP. Straight afterwards, I am able to initiate a TCP connection on port 16000, although the first TCP attempt always results in a RST,ACK being returned - so a second attempt needs to be made.
What I've developed works absolutely fine on both Windows (x86) and Linux (Raspberry Pi/ARM) systems once connected, and I can send and receive data. However, when run on the Raspbian system, there seem to be problems when initiating the TCP connection. I could have sworn that we had it working absolutely fine on a previous build, but none of the previous commits seem to work, so it may well be a system/kernel update that has changed something.
The issue:
When initiating a TCP connection to this device, it will - straight away - reset the connection. It does this even with the manufacturer-supplied software, which itself then immediately re-attempts the connection again and it succeeds; so this kind of reset-once-then-it-works-the-second-time behaviour in itself isn't a "problem" that I have any control over.
What I am trying to understand is why a Windows system immediately re-attempts the connection through a retransmission...
..but the Linux system just gives up after one attempt (this is the end of the packet capture..)
To prove it is not an application-specific issue, I've tried using ncat/netcat on both the Windows system and the Raspbian system, as well as a Kali system on a separate laptop to prove it isn't an ARM/Raspberry issue. Since the UDP "request" hasn't been sent, the connection will never succeed anyway, but this simply demonstrates different behaviour between the OSes.
Linux versions look pretty much the same as above, whereby they send a single packet that gets reset - whereas the Windows attempt demonstrates the multiple retransmissions..
So, does anyone have an explanation for this difference in behaviour? I am guessing it isn't a .net core specific issue, but is there any way I can set socket options to attempt a retransmission? Or can it be set at the OS level with systemctl commands or something? I did try to see if there are any SocketOptionNames, in .net, that look like they'd control attempts/retries, as this answer had me wondering, but no luck so far.
If anyone has any suggestions as to how to better align this behaviour across platforms, or can explain what the reason for this difference is, I would very much appreciate it!

Nice find! According to this, Windows' TCP will retry a connection if it receives a RST/ACK from the remote host after sending a SYN:
... Upon receiving the ACK/RST client from the target host, the client determines that there is indeed no service listening there. In the Microsoft Winsock implementation of TCP, a pending connection will keep attempting to issue SYN packets until a maximum retry value is reached (set in the registry, this value defaults to 3 extra times)...
The value used to limit those retries is set in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\TcpMaxConnectRetransmissions according to the same article. At least in Win10 Pro it doesn't seem to be present by default.
Although this is a convenience on Windows machines, an application should still determine its own criteria for handling a failed connect attempt IMO (i.e. number of attempts, timeouts etc).
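To illustrate what application-level criteria could look like, here is a minimal sketch in Python (not the .NET code from the question). It retries the connect only when the peer answers with a RST, which Python surfaces as ConnectionRefusedError. The function name, the default of 4 attempts (one initial try plus the 3 extra mentioned in the quote) and the delay are assumptions for illustration, not anything the device or Windows mandates.

```python
import socket
import time


def connect_with_retry(host, port, attempts=4, delay=0.1, timeout=3.0):
    """Open a TCP connection, retrying when the peer resets the attempt.

    Hypothetical helper: the name and default values are illustrative only.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(timeout)
        try:
            sock.connect((host, port))
            return sock  # caller owns the connected socket
        except ConnectionRefusedError:
            # A RST in reply to our SYN ends up here (ECONNREFUSED).
            sock.close()
            if attempt == attempts:
                raise  # out of attempts: let the caller see the refusal
            time.sleep(delay)
        except OSError:
            # Timeouts, unreachable hosts etc. are not retried in this sketch.
            sock.close()
            raise
```

With something like this, the same retry policy applies on Linux and Windows; on Windows the OS-level SYN retries described above would simply happen inside each attempt. For the device in the question, the call would be along the lines of connect_with_retry(device_ip, 16000), made right after the UDP request.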
Anyhow, as I said, surprising fact! Living and learning I guess ...
Cristian.

Related

How to Confirm PostgreSQL on Ubuntu VM is communicating with External Server for Updates

I have an Ubuntu VM installed on a client's VMware system. Recently, the client's IT informed us that his firewall has been detecting consistent potential port scans to our VM's internal IP address (coming from 87.238.57.227). He asked if this was part of a known package update process on our VM.
He sent us a firewall output where we can see several instances of the port scan, but there are also instances of our Ubuntu VM trying to communicate back to the external server on port 37258 (this is dropped by the firewall).
Based on a google lookup, the hostname of the external IP address is "feris.postgresql.org", with the ASN pointing to a European company called Redpill-Linpro. As far as I can tell, they offer IT consulting services, specializing in open source software (like PostgreSQL, which is installed on our VM). I have never heard of them before though and have no idea why our VM would be communicating with them or vice-versa. I'm also not sure if I'm interpreting the IP lookup information correctly: https://ipinfo.io/87.238.57.227
I'm looking for a way to confirm or disprove that this is just our VM pinging for a standard postgres update. If that's the case I'd like to restrict this behaviour. We would prefer to do these types of updates manually and limit the communication outside of the VM to what is strictly necessary for the functionality of our application.
Update
I sent an email to Redpill's abuse account. They responded quickly saying that the server should not be port scanning anyone and if it appears that way, something is wrong.
The server is part of a cluster of machines that serves apt.postgresql.org among other postgres download sites. I don't think we have anything like ansible or puppet installed that would automatically check for updates but I will look into that to make sure. I'm wondering if Ubuntu reaching out to update the MOTD with the number of available packages would explain why our VM is trying to reach out to the external postgres server?
The abuse rep said in any case there should only be outgoing connections from the VM, not incoming. He asked for some additional info so I will keep communicating with him and try to update this post accordingly.
My communication with the client's IT dropped off so I did not get a definitive answer on this, but I'll provide some new details:
I reached out to the abuse email for Redpill-Linpro. He got back to me and confirmed the server corresponding to the detected IP address is part of a cluster that hosts postgres download sites, including apt.postgresql.org. He was surprised to learn we had detected a port scan from their server and seems eager to figure out why that is happening.
He asked if the client IT could pass along some necessary info for them to set up tracking on that server. But the client IT never got back to me. I think he was satisfied that it wasn't malicious and stopped pursuing it.
Here's one of the messages the abuse rep sent me that may be relevant:
That does look a lot like the tcp to the apt download server yes. It's strange that your firewall reports that many incoming connections, but they could be fallout from some connection tracking that's not operating as intended. The timing appears to be matching up more or less perfectly. And there should definitely not be any ping-back connections from it.
Since you appear to be using the http version of the server (and not https) bringing the data in cleartext, they should be able to just dump the TCP connection contents and verify exactly what it does. But I bet they are going to see a number of http requests initiated by the apt client that is checking for updates.

socket opening on Windows 2012 extremely slow

I'm working on a legacy VB6 app that uses sockets to communicate to various devices.
On a 2012 system, we are noticing the time between calling winSock.Connect() to the connection event being fired is holding at about 9 seconds, across multiple systems on different domains.
On a 2008 R2 or lower system, it's taking 1-3 milliseconds between the call and the event being fired.
Has anyone run into this before, or has any ideas on what could be causing this?
I've done some snooping with Wireshark, and found that the first few TCP transmissions are not connecting and being retransmitted, not sure if that will help.
I ended up finding the answer to this after some extensive digging.
Starting in Windows Server 2012, Microsoft has enabled an extension of TCP called Explicit Congestion Notification (ECN). This allows end-to-end notification of network congestion without the loss of packets. It is signalled on a TCP packet via flags in the TCP header, which are defined in the ECN specification (RFC 3168, 2001).
What was happening for me was that the devices my application talks to are older, and don't support the ECN flag. When they received packets with that flag enabled, they wouldn't acknowledge the transmission, leading to a timeout from the server. After two failed transmissions, it looks like Windows shuts off the ECN flag, and the device acknowledged the packets.
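For anyone checking a capture for the same symptom, here is a small Python sketch of how the flags could be read. It assumes you already have the TCP flags byte from the capture (Wireshark shows it as a hex value); the helper names are made up for illustration. Per RFC 3168, a SYN that requests ECN has both the ECE and CWR bits set, so it shows up as 0xC2 rather than the plain SYN value 0x02.

```python
# Bit positions of the flags in the TCP header's flags byte.
FLAG_BITS = {
    "FIN": 0x01,
    "SYN": 0x02,
    "RST": 0x04,
    "PSH": 0x08,
    "ACK": 0x10,
    "URG": 0x20,
    "ECE": 0x40,  # ECN-Echo (RFC 3168)
    "CWR": 0x80,  # Congestion Window Reduced (RFC 3168)
}


def decode_flags(flags_byte):
    """Return the set of flag names that are set in a TCP flags byte."""
    return {name for name, bit in FLAG_BITS.items() if flags_byte & bit}


def is_ecn_setup_syn(flags_byte):
    """True for an initial SYN that also carries ECE and CWR (an ECN-setup SYN)."""
    flags = decode_flags(flags_byte)
    return {"SYN", "ECE", "CWR"} <= flags and "ACK" not in flags
```

In a capture like the one described, the first SYNs would be expected to satisfy is_ecn_setup_syn, while the later SYN that the device finally answers would decode to just SYN.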
I disabled ECN running the following command from an Administrator Command Prompt:
netsh interface tcp set global ecncapability=disabled
There is nothing particularly "special" about the Winsock control, which is just a thin wrapper on top of the API. The only thing of note really is that it is 32-bit and must run inside WOW64.
You're probably doing something funny or all 32-bit programs using the winsock API the same way should see the same issue.
Perhaps you have a name resolution issue on this server?

Sockets won't connect after Windows 8.1 update

I am currently working on a project that involves the use of sockets. The program was working just fine, but after a Windows 8.1 update on my computer, either the client socket isn't sending out a signal to be accepted or the server socket isn't receiving the signal so it can accept. I tried to use it on my emulator and it worked just fine. In addition, I have updated my GPU drivers to see if this could be fixed, but they still won't connect. The program doesn't crash or give me an error; it just sits there and waits. Does anyone have any ideas? Perhaps the update is blocking my peer to peer connection? Any help is much appreciated. Thanks.

Missed connections using select() on many non-blocking connecting TCP sockets on windows XP

I have a small portable tool which connects to around 150 servers at diverse locations to get a quick status check from them. It is important to get the status for all servers back to the user relatively quickly, so the tool connects to the servers in parallel using non-blocking connect, and uses select() to determine when each socket is ready. The use of select() is fairly straightforward, and the tool is fairly mature now and works well on Linux. It runs on Windows XP, but connections to the vast majority of the servers out there do not complete. The tool staggers the calls to connect to avoid creating what looks like a SYN flood: it connects to one server about every 100 msecs. I also have a check in place to ensure FD_SETSIZE is not violated. I have anecdotal evidence from someone else that the behaviour is better on later Windows versions, but have not been able to verify.
I have used WinDump to verify that the SYN packets are being sent, and I can see ACK packets coming back, but select() keeps returning zero, and the code simply can't connect to most of the servers, even though they do exist and I can connect to them just fine with the same code on Linux.
Has anyone seen and/or solved any similar issues with many non-blocking connects and select on Windows XP?
After another day or so of digging I seem to have found the answer. On windows XP SP2 there is a limit of 10 concurrent connecting sockets system wide. If 10 or more half open connections exist, a System event is logged noting that the limit has been reached, and new connecting sockets are throttled silently. The System Event number is 4226.
I have fixed my code by adding version checks for Windows XP, and throttled to less than 10 connections on those systems. So far I have no reports of other versions being affected.
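For reference, the throttling idea can be sketched roughly like this in Python (this is not my actual tool; the function name, the cap of 8 and the timeout handling are assumptions for illustration). It keeps at most max_in_flight non-blocking connects pending at once and only starts a new one when an earlier one has finished.

```python
import errno
import select
import socket


def check_servers(addresses, max_in_flight=8, timeout=5.0):
    """Return {address: True/False}, never having more than
    max_in_flight connects pending at the same time."""
    pending = list(addresses)
    in_flight = {}  # connecting socket -> address
    results = {}
    while pending or in_flight:
        # Start new connects only while we are below the cap.
        while pending and len(in_flight) < max_in_flight:
            addr = pending.pop(0)
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            sock.setblocking(False)
            err = sock.connect_ex(addr)
            if err in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
                in_flight[sock] = addr
            else:
                results[addr] = False  # failed immediately
                sock.close()
        if not in_flight:
            continue
        socks = list(in_flight)
        # Writable means the connect finished; on Windows a failed connect
        # is reported through the exceptional set instead.
        _, writable, failed = select.select([], socks, socks, timeout)
        finished = set(writable) | set(failed)
        timed_out = not finished
        if timed_out:
            # Nothing progressed within `timeout`: give up on all pending ones.
            finished = set(socks)
        for sock in finished:
            addr = in_flight.pop(sock)
            err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
            results[addr] = (not timed_out) and err == 0 and sock not in failed
            sock.close()
    return results
```

The cap would be chosen per platform, e.g. something below 10 when the version check detects Windows XP and a larger value elsewhere. Note the timeout here applies to each select() call rather than to each connection, which is a simplification.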

how can we sniff traffic between two ports of same machine?

I am testing a thick client which is connected to a database, and I need to sniff the traffic between two TCP ports on the same machine.
Wireshark (formerly Ethereal) will work perfectly. If you're not familiar with it: it can be a little tricky on OSX, on Windows it's no problem, and on Linux it can be a headache. You can download it here: http://www.wireshark.org/, and read a short primer here: http://www.ipprimer.com/packets.cfm
Essentially there's a capture phase, and then you can work with the data – for your purposes you can live-capture and filter the output to the packets on the port/destination you care about, I've used it many-a-time to debug dodgy home networking, or problems at the office.
Beware if using MySQL and localhost, for example: localhost is a keyword for MySQL and it will in fact use the Unix socket instead of TCP, which makes things a bit more complicated. You can circumvent this problem by always making sure to use 127.0.0.1 if working with MySQL. (Perhaps other software uses this convention?)
You can try some tools like WireShark.
Assuming you're on Windows:
I'd split the client and server across two machines, either two real ones, or a VM with something like VMware. Then I'd use Wireshark.