Azure VM TCP idle timeout - sockets

I have a problem setting up an FTP server on an Azure VM.
In normal use the server runs fine. The problems start with large file transfers over a passive FTP connection.
Setup
The FTP server software is FileZilla Server.
The Azure VM endpoint, Windows Firewall and FileZilla are configured to use ports 10000-10009 for passive connections.
The client is a 3rd party device.
Problem
On large file transfers that take longer than 4 minutes, the connection hits an idle timeout.
I found a Microsoft blog entry that says:
"When FTP is transferring large files, the elapsed time for transfer may exceed 4 minutes, especially if the VM size is A0. Any time the file transfer exceeds 4 minutes, the Azure SLB will time out the idle TCP/21 connection, which causes issues with cleanly finishing up the FTP transfer once all the data has been transferred. [..] Basically, FTP uses TCP/21 to set everything up and begin the transfer of data. The transfer of data happens on another port. The TCP/21 connection goes idle for the duration of the transfer on the other port. When the transfer is complete, FTP tries to send data on the TCP/21 connection to finish up the transfer, but the SLB sends a TCP reset instead."
Unfortunately, my 3rd party client cannot be configured to send TCP keep-alives to avoid the idle timeout.
Question
How can I tell the Azure VM not to close idle TCP connections after 4 minutes?
I don't even understand why this happens, because it goes against the NAT requirements for TCP: RFC 5382 says an established connection must not be dropped for idleness in less than 2 hours 4 minutes. In other words, if Azure drops idle connections this early, it cannot be used for long FTP transfers.
Please help!
Regards
Steffen

I found two solutions!
1. Increase the endpoint idle timeout
The idle timeout of a VM endpoint can be raised to up to 30 minutes.
The PowerShell command to do this is:
Get-AzureVM -ServiceName "MyService" -Name "MyVM" | Set-AzureEndpoint -Name "MyEndpoint" -IdleTimeoutInMinutes 30 | Update-AzureVM
More information here.
2. Create an ILIP (instance-level IP)
You can create an ILIP to bypass the VM web service endpoint layer. The PowerShell command to do this is:
Get-AzureVM -ServiceName "MyService" -Name "MyVM" | Set-AzurePublicIP -PublicIPName "MyNewEndpoint" | Update-AzureVM
More information here.

I'm using the latest version of FileZilla (3.14.1), and you can set FileZilla to send keep-alive commands. I would recommend trying that first rather than altering the default Azure load balancer timeouts. However, the load balancer timeouts are user-configurable (i.e. under your control), and details can be found here: https://azure.microsoft.com/en-us/documentation/articles/load-balancer-tcp-idle-timeout/
To turn on keep-alive commands in FileZilla:
• Open the FileZilla "Edit" menu and select "Settings." On a Mac, open the "FileZilla" menu and choose "Preferences."
• Select the "FTP" page in the "Connection" section of the Settings dialog box. Look for the "FTP Keep-alive" section of the page.
• Activate the "Send FTP keep-alive commands" box in the "FTP Keep-alive" section. This sends commands between FileZilla and the FTP server at short intervals, resetting the time-out function and preventing the server from closing the connection.
Hope that helps.

Related

Can I increase azure linux vms tcp keep alive timeout if not using load balancer?

I have an Azure VM running Linux (ubuntu 18.06) with a Python socket server on it. The problem is that any socket client that shows no activity for 4 minutes gets disconnected. I have gone through https://github.com/wbuchwalter/azure-content/blob/master/includes/guidance-tcp-session-timeout-include.md and changed /etc/sysctl.conf on my Linux instance, but it's not working. My question is:
1. Is it possible to change the keep-alive timeout with the default public IP of an Azure VM? The link says "outbound using SNAT (Source NAT). This timeout is set to 4 minutes, and cannot be adjusted."
The inbound TCP idle timeout for a public IP can be controlled. For outbound connections, the default value is 4 minutes and cannot be changed. You can still keep the session active by sending keep-alive packets.
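As a rough illustration of sending keep-alive packets from a Python socket server, here is a minimal sketch. Kernel keep-alive settings only take effect on sockets that have SO_KEEPALIVE switched on, which is one possible reason why editing /etc/sysctl.conf alone appears to do nothing. The helper name enable_keepalive and the 120/30/4 values are assumptions chosen for this example; the only constraint taken from the thread is that the first probe should go out well before the 4-minute limit.

```python
import socket


def enable_keepalive(sock, idle_s=120, interval_s=30, probes=4):
    """Turn on TCP keep-alive for one socket (hypothetical helper).

    idle_s is the quiet time before the first probe; it is kept well
    under 240 seconds so probes go out before the 4-minute idle limit.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket tuning options below use their Linux names. Any
    # option this platform's socket module does not expose is skipped,
    # in which case the system-wide defaults apply.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock
```

In a server this would be called on each accepted connection, for example conn, addr = server.accept() followed by enable_keepalive(conn).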

Get Network statistic for Client Apps in PowerShell

I was looking into network statistics in powershell and there are a lot of good scripts to measure:
Network traffic through a network card using counter
List all connections using for example windows built in functions
What I am looking for is detailed information about network traffic per client connection, i.e. how much traffic (bytes sent/received) is consumed by a remote client application.
Any suggestions on how to start?
I think TCPView by Sysinternals should do the trick. This gives you real-time traffic on a per process level.
You would filter by the process or remote address or source port, depending on what the Client App in question is, and then you would see sent and received packets and bytes for each TCP/UDP connection.

Azure drops TCP connection after a few packets

I'm having a strange problem trying to maintain TCP connections from my local PC to Azure (oddly remote desktop works fine). I first noticed the problem with my own software, but it's not limited to it. What I've noticed is:
TCP 3 way handshake completes
Some data is successfully sent and received
Something bad happens and no more data is sent
To rule out my software, I tested netcat. On my Azure machine I set up a netcat server to echo a large text file. On my local PC I established the netcat connection to the Azure server and observed some of the text file being printed and then it just stopped.
The first Wireshark image is from the Azure server, and the second image is from my PC. Both were captured at the same time doing the netcat test I described above.
Here is my Azure endpoint configuration (same result with both endpoints):
I'm currently at a loss, and don't know enough about what the problem may be to continue my debugging efforts. Any suggestions would be greatly appreciated.
Thanks!
Someone from Azure support told me that the Azure network's maximum TCP packet size is 1350. So if your packets are larger than this, it might be a problem. Try limiting them to 1300 and test again.
Several things could cause this kind of network problem, such as:
Azure SLB/SNAT
Azure physical network issues (ACLs on the Azure routers where your VMs reside)
Proxies in front of your client applications, etc.
We should systematically narrow down the problem space when tackling this kind of problem. For example, in this instance, you should run your client app on one of the Azure VMs and check whether the issue is reproducible. If it is not reproducible, the problem most likely lies in your local PC's network (or the proxy your machine sits behind).
If the issue is reproducible from another Azure VM too, something is most likely wrong on the Azure side.
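To make that comparison repeatable from different locations, a small script can stand in for the netcat test. The sketch below is only an illustration: the function names are made up for this example, and run_local_check exercises both ends over the loopback interface to confirm the script itself works. For a real test, the sending side would run on the Azure VM and receive_all would be pointed at its address and endpoint port.

```python
import socket
import threading


def serve_once(listener, payload):
    """Accept one client, send the whole payload, then close."""
    conn, _ = listener.accept()
    with conn:
        conn.sendall(payload)


def receive_all(host, port, timeout_s=10.0):
    """Connect and count bytes until the peer closes or the read stalls.

    Returns (bytes_received, stalled).
    """
    received = 0
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        while True:
            try:
                chunk = sock.recv(65536)
            except socket.timeout:
                return received, True   # no data for timeout_s seconds
            if not chunk:
                return received, False  # the server closed the connection
            received += len(chunk)


def run_local_check(size=1_000_000):
    """Loopback self-test: send `size` bytes and report what arrived."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    sender = threading.Thread(target=serve_once,
                              args=(listener, b"x" * size))
    sender.start()
    result = receive_all("127.0.0.1", port)
    sender.join()
    listener.close()
    return result
```

A result with stalled set to True and fewer bytes than were sent would match the symptom described in the question; comparing the byte count from a local PC with the count from a second Azure VM shows on which side the transfer stops.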
Some tips for identifying issues on the Azure side:
Check whether your TCP connection has been idle (not sending data) for 4 minutes. If so, the Azure SLB/SNAT layer drops the TCP connection on the Azure side. You can prevent this by either sending TCP keep-alives or increasing the VM endpoint idle timeout using Set-AzureEndpoint.
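One way to check for idleness from inside your own application is to record when data last moved on each connection. The sketch below is an assumption-laden illustration: IdleTracker, its method names and the 120-second heartbeat threshold are invented for this example, and only the 240-second limit comes from the 4-minute timeout discussed here.

```python
import time

AZURE_IDLE_LIMIT_S = 240  # the 4-minute idle timeout discussed in this thread


class IdleTracker:
    """Track the last time data moved on a connection (hypothetical helper)."""

    def __init__(self, heartbeat_after_s=120, clock=time.monotonic):
        self._clock = clock
        self.heartbeat_after_s = heartbeat_after_s
        self.last_activity = clock()

    def touch(self):
        """Call after every successful send or receive."""
        self.last_activity = self._clock()

    def idle_seconds(self):
        """Seconds since the last recorded send or receive."""
        return self._clock() - self.last_activity

    def heartbeat_due(self):
        """True once the connection has been quiet for heartbeat_after_s."""
        return self.idle_seconds() >= self.heartbeat_after_s
```

Logging idle_seconds() when a connection drops shows whether drops cluster around AZURE_IDLE_LIMIT_S, and heartbeat_due() can trigger an application-level ping if the protocol has one.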
Hope this helps.

Session getting disconnected in the middle of working

Sessions are getting disconnected automatically (in the middle of working).
The disconnections happen while users are working over a telnet connection to a Linux server using PuTTY.
At the time of the disconnections, network bandwidth utilization is high, and there is no limit on the total number of users on the network.
Error "Hangup signal received (562)"
Any ideas?
The network connection was interrupted or a hangup signal was sent via "kill".
You mention network utilization being "high" when disconnects happen. How do you know that? What measurement are you looking at that tells you it is "high"? That might be a symptom of a networking issue that is at the root of the problem.
There are a few directions to try:
OpenEdge has published this article with links to implementing keep-alive packets:
https://knowledgebase.progress.com/articles/Article/Telnet-connection-times-out-after-15-minutes
Increase the number of "instances" in xinetd.conf, and then restart the service.
Make sure that the database watchdog is up and running: https://documentation.progress.com/output/ua/OpenEdge_latest/index.html#page/dmadm/prowdog-command.html
Check the database log file, to find out what happened just before the hangup (https://documentation.progress.com/output/ua/OpenEdge_latest/index.html#page/gsins/openedge-database-log-file.html)

TCP communication stops with firewall on

I have a client/server application developed in VB.NET. The server app sends a file to the client app at regular intervals, using TCP/IP.
After installation, the application runs fine with the firewall off. But when the firewall is on and an exception has been added for the application, the file transfer works for around 10 minutes and then stops.
As soon as I turn off the firewall, the transfer starts again. Please suggest how to resolve this issue.
When you say "file transfer works for around 10 mins and then stops", can you elaborate a little more? For example: "I am transferring a large file and during the transfer, it stops." or "I transferred a file successfully, and ten minutes later, went to transfer another, and it didn't work."
In the first scenario (large transfer), some form of bandwidth limitation or rule may be stopping the transfer. In the second, stateful processing could be involved: a stateful session is established, the firewall doesn't close it, another connection is initiated 10 minutes later, and the firewall treats it as "someone is trying to piggyback/hijack this session... better close it".
I would turn on verbose logging on the firewall to see what the firewall is doing and how it perceives the connection.