TeamCity fails to publish artifacts and stop builds - sockets

I'm having an issue with TeamCity that is proving very difficult to solve for a number of reasons. I've looked around and not found any useful answers so far.
We have a TeamCity server running on port 8080 with two agents connecting to it on ports 9090 and 9091 respectively. The agents register successfully and accept new builds just fine. When a build completes, the tests have passed and the log states "Sending artifacts", everything stops and the artifacts never reach the server. After leaving it to sit overnight, my requests to stop the build also fail.
We recently switched to a new firewall, but things were working after we set the required port rules for 8080, 9090 and 9091. No changes have been made since we got things working, yet now things do not work.
To the logs...
The server is aware of the failure as I can see logs in several places stating:
jetbrains.buildServer.SERVER - Failed to upload artifact, due to error: org.apache.commons.fileupload.FileUploadBase$IOFileUploadException: Processing of multipart/form-data request failed. Read timed out
The agent also has logs stating a similar reason:
jetbrains.buildServer.AGENT - Failed to publish artifacts because of error: java.net.SocketException: Connection reset by peer: socket write error, will try again.
During all this the firewall logs show that all traffic on the expected ports is being allowed through. What is odd though are some logs that look like this:
2016-04-01 10:45:00 Deny [sourceIp] [targetIP] 49426/tcp 8080 49426 0-External Firebox tcp syn checking failed (expecting SYN packet for new TCP connection, but received ACK, FIN, or RST instead). 558 113 (Internal Policy) proc_id="firewall" rc="101" msg_id="3000-0148" tcp_info="offset 5 A 478076245 win 258"
Examining port 49426 on the agent shows that it was being used by java.exe. Now I'm assuming this might have something to do with TeamCity as it runs in the JVM. The next step was to scour every bit of config I can find to figure out where this port number comes from. After a while the agent decided to retry and the port changed. It looks to me that java is just using whatever port it wants (as if unassigned in code) so could there be something missing in the agent config instructing it which port to use for artifact uploads?
I did read somewhere that the server or the firewall might not like requests or file uploads that exceed a certain size (the largest file is 81 MB), but we found nothing to suggest such a rule was in place.
The TeamCity version is old (v7.1.1), but we are currently unable to upgrade (I am waiting on approval to use a newer, bigger server due to hard disk space issues).
UPDATE
We very briefly opened up part of the firewall to see whether it was the cause of the issues, to no avail. At this point I'm not convinced the firewall is the problem.
Any ideas?
Thanks in advance.
UPDATE 2
I've ended up setting up a whole new build server and things work just fine there. The new server has the latest TeamCity version but the agents are the same machines and artifact uploads appear to work just fine. This isn't really a solution to the question but at least I have a working setup now.

This can happen when the agent is too slow to start sending data for whatever reason. This workaround by JetBrains employee Pavel Sher might help:
Increase the connectionTimeout value on the Connector element in the server.xml file:
    <Connector port="8111" protocol="HTTP/1.1"
               connectionTimeout="20000"
               redirectPort="8543"
               enableLookups="false"
               useBodyEncodingForURI="true" />
Change the value from 20000 to 60000 or even more.
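On the question of where port 49426 comes from: in an ordinary outbound TCP connection only the destination port (8080 here) is fixed, and the operating system picks the client's source port from its ephemeral range. A different java.exe port on every retry is therefore expected and does not point to a missing agent setting. Below is a minimal Python sketch of that behaviour, using a throwaway loopback listener as a stand-in for the server; nothing in it is TeamCity-specific and the function name is made up for illustration.

```python
import socket

def show_ephemeral_port():
    """Connect to a local listener and return (server_port, client_source_port)."""
    # Stand-in for the server: listen on any free loopback port.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    server_port = server.getsockname()[1]

    # The client never calls bind(), so the OS assigns its source port.
    client = socket.create_connection(("127.0.0.1", server_port), timeout=5)
    client_port = client.getsockname()[1]

    client.close()
    server.close()
    return server_port, client_port

if __name__ == "__main__":
    server_port, client_port = show_ephemeral_port()
    print(f"destination port {server_port}, OS-chosen source port {client_port}")
```

Running it several times should print a different source port on most runs, which mirrors what the firewall log showed for the agent.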

Related

TCP retransmission on RST - Different socket behaviour on Windows and Linux?

Summary:
I am guessing that the issue here is something to do with how Windows and Linux handle TCP connections, or sockets, but I have no idea what it is. I'm initiating a TCP connection to a piece of custom hardware that someone else has developed and I am trying to understand its behaviour. In doing so, I've created a .NET Core 2.2 application; run on a Windows system, it can initiate the connection successfully, but on Linux (latest Raspbian), it cannot.
It appears that this may be because Linux systems do not retry/retransmit a SYN after a RST, whereas Windows ones do, and this behaviour seems key to how this peculiar piece of hardware works.
Background:
We have a black box piece of hardware that can be controlled and queried over a network, by using a manufacturer-supplied Windows application. Data is unencrypted and requires no authentication to connect to it and the application has some other issues. Ultimately, we want to be able to relay data from it to another system, so we decided to make our own application.
I've spent quite a long time trying to understand the packet format and have created a library, targeting .NET Core 2.2, that can be used to successfully communicate with this kit. In doing so, I discovered that the device seems to require a kind of "request to connect" command to be sent via UDP. Straight afterwards, I am able to initiate a TCP connection on port 16000, although the first TCP attempt always results in a RST,ACK being returned, so a second attempt needs to be made.
What I've developed works absolutely fine on both Windows (x86) and Linux (Raspberry Pi/ARM) systems and I can send and receive data. However, when run on the Raspbian system, there seem to be problems when initiating the TCP connection. I could have sworn that we had it working absolutely fine on a previous build, but none of the previous commits seem to work, so it may well be a system/kernel update that has changed something.
The issue:
When initiating a TCP connection to this device, it will reset the connection straight away. It does this even with the manufacturer-supplied software, which then immediately re-attempts the connection and succeeds; so this reset-once-then-it-works-the-second-time behaviour isn't in itself a "problem" that I have any control over.
What I am trying to understand is why a Windows system immediately re-attempts the connection through a retransmission, but the Linux system just gives up after one attempt (this is the end of the packet capture).
To prove it is not an application-specific issue, I've tried using ncat/netcat on both the Windows system and the Raspbian system, as well as a Kali system on a separate laptop to prove it isn't an ARM/Raspberry issue. Since the UDP "request" hasn't been sent, the connection will never succeed anyway, but this simply demonstrates different behaviour between the OSes.
The Linux versions look pretty much the same as above, in that they send a single packet that gets reset, whereas the Windows attempt demonstrates the multiple retransmissions.
So, does anyone have an explanation for this difference in behaviour? I am guessing it isn't a .NET Core specific issue, but is there any way I can set socket options to attempt a retransmission? Or can it be set at the OS level with systemctl commands or something? I did try to see whether there are any SocketOptionNames in .NET that look like they'd control attempts/retries, as this answer made me wonder, but no luck so far.
If anyone has any suggestions as to how to better align this behaviour across platforms, or can explain the reason for this difference at all, I would very much appreciate it!
Nice find! According to this, Windows' TCP will retry a connection if it receives a RST/ACK from the remote host after sending a SYN:
... Upon receiving the ACK/RST client from the target host, the client determines that there is indeed no service listening there. In the Microsoft Winsock implementation of TCP, a pending connection will keep attempting to issue SYN packets until a maximum retry value is reached (set in the registry, this value defaults to 3 extra times)...
The value used to limit those retries is set in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\TcpMaxConnectRetransmissions according to the same article. At least in Win10 Pro it doesn't seem to be present by default.
Although this is a convenience on Windows machines, an application should still determine its own criteria for handling a failed connect attempt IMO (i.e. number of attempts, timeouts etc.).
Anyhow, as I said, surprising fact! Living and learning I guess ...
Cristian.
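The point about an application setting its own retry criteria can be sketched as follows. This is a hedged Python illustration, not the asker's .NET code: the function name, the attempt count, the delay and the injectable `connect` parameter are all assumptions chosen for the example, not values taken from Windows or from the device.

```python
import socket
import time

def connect_with_retry(host, port, attempts=4, delay=0.2,
                       connect=socket.create_connection):
    """Open a TCP connection, retrying when the peer refuses or resets it.

    `attempts`, `delay` and the injectable `connect` callable are
    illustrative choices, not defaults of any operating system.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect((host, port), timeout=5)
        except (ConnectionRefusedError, ConnectionResetError) as exc:
            # A RST in reply to the SYN surfaces as "connection refused".
            last_error = exc
            if attempt < attempts:
                time.sleep(delay)
    # Every attempt failed: re-raise the last error for the caller.
    raise last_error
```

Because the retry loop lives in the application, the behaviour is the same on Windows and Linux regardless of what the TCP stack does on its own.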

Tableau Server v2018.2 refusing to use port 80 despite it being open

I have a Windows Server 2016 machine that used to run Tableau Server v2018.1 (and a few versions before that). During this last update, I performed a backup and then wiped Tableau off the server (using the tableau-obliterate script, which removed all things Tableau).
I then proceeded to install Tableau v2018.2 as a clean install, set up the configuration to use port 80 and started the server successfully.
However, I quickly discovered that Tableau had moved the gateway to port 8000. I reviewed the ports to ensure nothing else was using port 80 (this VM has nothing other than Tableau installed on it), using TCPView to monitor the ports while Tableau Server was running and while it was stopping and starting. The only hint I found of something touching port 80 was the output of netstat, which showed a TCP entry for vizqlserver.exe in the CLOSE_WAIT state.
I have tried manually setting the port through TSM configuration (run set, confirm with get, restart), TSM Settings import, and manually adjusting the configuration file for gateway, but Tableau just reverts back to port 8000.
I am at a loss as to why this is happening as again, nothing else has ever been on this server and nothing has changed since removing v2018.1 (which was running on port 80).
I tried to post this on the Tableau community forum, but 20 hrs later, it is still pending moderator approval :(
Would appreciate any help!
A recent Windows update has been causing some port conflicts. Try this:
https://kb.tableau.com/articles/Issue/kb4338818-windows-update-causing-tableau-server-to-become-unstable

client is waiting forever for remote server to return a webpage

I have an application with a server written in F# that serves web files using Suave. I log in remotely via PowerShell to another machine on the network to run the application (the application itself is on one of the network drives). I do that because that machine has access to third-party APIs needed by the server. Now when I browse to [IPAddress_Of_Remote_Machine]/[html_file] or [name_of_pc]/[html_file], Chrome waits forever and never returns the web page. This wasn't happening before; I ran into the problem only recently. I opened a different port and used it instead of the default port 80. This made things work, but the problem keeps showing up after a couple of days. I don't think it's a firewall issue, but I'm clueless as to why this is happening.
When running netstat -an, this is what I get (I hid the IP address):
As you can see, all of the connections are either in CLOSE_WAIT or ESTABLISHED, and none is LISTENING. All of these TCP connections are probably there because I have PhantomJS and two other APIs running in the application as well. However, the loopback address is also open on the same port, 5959:
I'm not sure what the difference between these two is, but when I use PortQryUI to query the remote server it returns a success!
I have already made an inbound rule for port 5959 on the server so it should be allowed. The web page is stuck at Waiting for [name_of_pc]. Also, sometimes this problem disappears and everything works fine.
What is the potential problem behind this? Why would this happen all of a sudden?
UPDATE:
I re-ran the application today and it's working correctly. It could be that something is dynamically set within the firewall? Not really sure what is going on. The machine I'm running the server on has a bunch of applications running on it as well so maybe there is an external process that is affecting it?
I made a hello world app with Suave and deployed it on the network drive to test whether it would work. I opened an inbound rule for port 6001.
Then I ran the app:
However, it's still not working and this time it says the site cannot be reached when I do: http://[name_of_pc]:6001.
Moving this to an answer so that it can be closed:
Could you post the bindings section of your Suave config? I'm guessing you know where that is since you are using a non-standard port, but if you don't, search for HttpBinding. I suspect you will find it pointing to 127.0.0.1, which is not good enough for remote access. You could try changing it to 0.0.0.0 or to the server's actual IP address. I would try 0.0.0.0 first for the flexibility it provides.
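The difference between the two bind addresses is not specific to Suave, so it can be shown with plain sockets. The sketch below is in Python and is not F#/Suave code; the helper name is invented for the example. It binds a listener to each address and reports what the OS recorded, which is the same local address netstat -an shows in its LISTENING rows.

```python
import socket

def bound_address(host):
    """Bind a listening socket to `host` on a free port and return the bound IP."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind((host, 0))  # port 0: let the OS pick a free port
        listener.listen(1)
        return listener.getsockname()[0]

if __name__ == "__main__":
    # Loopback only: reachable from the same machine, never from the network.
    print(bound_address("127.0.0.1"))
    # Wildcard: accepts connections arriving on any local interface.
    print(bound_address("0.0.0.0"))
```

A listener that shows up only under 127.0.0.1 explains why a local query succeeds while a browser on another machine keeps waiting.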

Azure drops TCP connection after a few packets

I'm having a strange problem trying to maintain TCP connections from my local PC to Azure (oddly remote desktop works fine). I first noticed the problem with my own software, but it's not limited to it. What I've noticed is:
TCP 3 way handshake completes
Some data is successfully sent and received
Something bad happens and no more data is sent
To rule out my software, I tested netcat. On my Azure machine I set up a netcat server to echo a large text file. On my local PC I established the netcat connection to the Azure server and observed some of the text file being printed and then it just stopped.
The first Wireshark image is from the Azure server, and the second image is from my PC. Both were captured at the same time doing the netcat test I described above.
Here is my Azure endpoint configuration (same result with both endpoints):
I'm currently at a loss, and don't know enough about what the problem may be to continue my debugging efforts. Any suggestions would be greatly appreciated.
Thanks!
Someone from Azure support told me that the Azure network's maximum TCP packet size is 1350. So if your packets are larger than this, it might be a problem. Try limiting them to 1300 and test again.
Several things could cause this kind of network problem, such as the following:
Azure SLB/SNAT
Azure Physical n/w issues(ACLs on Azure routers where your vms reside)
Proxies in front of your client applications etc.
We should systematically prune the problem space when tackling this kind of problem. For example, in this instance, you should run your client app on one of the Azure VMs and verify whether the issue is reproducible. If it is not, then 99% of the time the problem lies in your local PC's network (or in the proxy your machine sits behind).
If the issue is reproducible from another Azure VM too, then 99% of the time something is wrong on the Azure side.
Some tips to identify issues on Azure side
Check whether your TCP connection sits idle (not sending data) for 4 minutes. If that's the case, the Azure SLB/SNAT layer drops the TCP connection on the Azure side. You can prevent this either by sending TCP keep-alives or by increasing the VM endpoint idle timeout using AzureEndpoint.
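The keep-alive option mentioned above can be sketched in Python as follows. The helper name and the timing values (first probe after 120 seconds of idle time, comfortably under the 4 minutes mentioned) are assumptions for illustration. The fine-grained options are platform-specific, so the code sets only those that the running platform exposes.

```python
import socket

def enable_keepalive(sock, idle_seconds=120, interval_seconds=30, probes=4):
    """Turn on TCP keep-alive probes for `sock` and return it.

    The timing values are illustrative; pick an idle time below the
    idle timeout of whatever sits between the two endpoints.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket tuning knobs are not available everywhere; set what exists.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_seconds)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock
```

Where the per-socket options are missing, only SO_KEEPALIVE is set and the operating system's default keep-alive timing applies, which may be far longer than 4 minutes.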
Hope this helps.

Netbeans & Eclipse hang when I attempt remote EC2 debugging via Xdebug

Already, I've checked at least 20 resources and am out of ideas:
I have a clean, remote Ubuntu EC2 instance, fresh from the AMI, having stopped only to install LAMP, phpmyadmin, and xdebug on it. Yes, I have configured my remote EC2 instance's php.ini file as follows:
Meanwhile, back on my laptop I have Netbeans & Eclipse installed. While I can get either to seamlessly upload and run my PHP web app on my EC2 site (via SSH/SFTP), as soon as I hit "Debug" from either, index.php gets uploaded, a browser window opens, and then NOTHING HAPPENS. The page doesn't load, the Debug perspective doesn't open, breakpoints don't get triggered, nothing. Netbeans just hangs out saying "waiting for connection" whereas Eclipse just sits at the notorious 57% level (and yes, I toggled the xdebug.idekey before testing with Eclipse).
So I tested xdebug's functionality on my server according to the instructions found here and here (both passed). I tried changing to port 9001 (in remote php.ini as well as in local Netbeans/Eclipse), I even tried launching this brand spanking-new EC2 instance with pretty much open Security group settings (SSH=0.0.0.0/0), but nothing seems to be working. I am out & out flummoxed, a self-confessed noob, and appreciative of any insight seasoned professionals in the community may have to offer.
Thanks,
Debbie
This feels like a networking issue to me. Port 9000 may not be accessible. The quickest way to test is to telnet to port 9000 on the remote system (if you have a telnet client installed that allows you to specify which port to telnet to). If the telnet attempt times out or is closed by the remote system you will see the error and this verifies that there is a networking issue.
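If no telnet client is at hand, the same reachability test can be approximated with a few lines of Python. This is a minimal sketch; the function name and the default timeout are assumptions, and the check only shows whether a TCP connection can be opened, not whether Xdebug itself is answering.

```python
import socket

def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts alike.
        return False
```

For example, port_is_open("ec2-host.example.com", 9000) with a hypothetical host name returns False both when the connection is refused and when it times out, so a False result means "not reachable from here" without saying why.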
I would check /etc/services to make sure that port 9000 is not reserved for use by something else. If port 9000 is listed and uncommented, then something else may be using the port, and that service does not know how to respond to your request, so it hangs.
I would run netstat (look up the parameters for showing "all" listening ports) and make sure the remote system is listening on port 9000. If you don't see port 9000, then the remote system is not configured to establish the connection.
If you are on a WiFi network, then port 9000 may need to be port forwarded to the remote system using the cable modem's internal configuration menu/utility. This is the scenario I favor because I've wasted so much time solving this kind of problem with different software.
Good luck, you have more troubleshooting ahead of you and different questions to ask to resolve your problem.