Is it possible to deploy without downtime and without disconnecting established TCP connections?

There is a long-lived TCP connection. At most two clients connect to the server, so the load is not high. However, once a TCP connection is established, the socket is never closed unless there is an accident such as a server power-down or a network failure. Is it possible to reuse an existing TCP socket when restarting the process? I think a TCP load balancer like AWS NLB cannot be used, since the existing socket won't be moved to the new application instance. I'd like a deployment without downtime, because the system I'm working on can suffer financial damage when a socket is broken and data is lost. Low-level socket programming is OK.
I have read CloudFlare's https://blog.cloudflare.com/graceful-upgrades-in-go/ article explaining graceful reload mechanisms such as Nginx's. Since an HTTP server opens and closes sockets frequently, that article assumes that the server's connections will eventually be closed, but my situation is slightly different, so I'm not sure whether the same approach applies.

A socket can be shared between multiple processes, for example by opening the socket in the parent process and then forking a child process. But once the last process using the socket closes it, the socket and thus the underlying connection are implicitly closed.
This means you must make sure that there is always a running process that holds the socket open. For example, the deployment of the new software should not first exit the old process and then create the new one; instead, the new process starts and the old process transfers the socket to it. See Can I share a file descriptor to another process on linux or are they local to the process? for how this can be done on Linux. Another way would be to use file descriptor inheritance when doing a fork().
Note that this kind of file descriptor sharing only works with plain sockets, where the state is kept entirely in the OS kernel. It will be much harder or impossible with TLS sockets, since in that case the current user-space state also needs to be shared somehow.
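As a rough illustration of the descriptor-passing approach on Linux, here is a minimal sketch of handing an open socket to another process over an AF_UNIX channel with SCM_RIGHTS. The helper names are invented and error handling is trimmed; treat it as a starting point, not a drop-in implementation.

    /* Sketch: pass an open socket between processes over an AF_UNIX channel. */
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    /* Send fd_to_share over the already-connected Unix-domain socket `channel`. */
    int send_fd(int channel, int fd_to_share)
    {
        char dummy = '*';                       /* must transfer at least one byte */
        struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };

        union {                                 /* properly aligned control buffer */
            struct cmsghdr hdr;
            char buf[CMSG_SPACE(sizeof(int))];
        } ctrl;
        memset(&ctrl, 0, sizeof(ctrl));

        struct msghdr msg = {
            .msg_iov = &iov, .msg_iovlen = 1,
            .msg_control = ctrl.buf, .msg_controllen = sizeof(ctrl.buf),
        };

        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type  = SCM_RIGHTS;          /* "pass file descriptors" */
        cmsg->cmsg_len   = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd_to_share, sizeof(int));

        return sendmsg(channel, &msg, 0) < 0 ? -1 : 0;
    }

    /* Receive a descriptor on the other end; returns the new local fd or -1. */
    int recv_fd(int channel)
    {
        char dummy;
        struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };

        union {
            struct cmsghdr hdr;
            char buf[CMSG_SPACE(sizeof(int))];
        } ctrl;
        memset(&ctrl, 0, sizeof(ctrl));

        struct msghdr msg = {
            .msg_iov = &iov, .msg_iovlen = 1,
            .msg_control = ctrl.buf, .msg_controllen = sizeof(ctrl.buf),
        };

        if (recvmsg(channel, &msg, 0) <= 0)
            return -1;

        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
        if (cmsg == NULL || cmsg->cmsg_type != SCM_RIGHTS)
            return -1;

        int fd;
        memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
        return fd;                              /* same connection, new descriptor */
    }

The receiving process gets a new descriptor number that refers to the same open connection, so the old process can exit afterwards without the kernel tearing the connection down.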
Another way is to have some intermediate "proxy" which on the one hand holds the stable socket connection to your fragile application, and on the other hand does robust socket handling (i.e. reconnecting when needed) toward the application you want to update. This proxy then transfers traffic between both sides and reconnects the socket whenever a problem occurs.

Related

Java/Tomcat Open TCP Connections - Resource Monitor

Right now we are running into a problem where we have a bunch of "open TCP connections" on our Windows servers that are running a Tomcat web server. The Java code makes a SOAP call to a vendor, and we see a lot of open connections in Resource Monitor showing the vendor's IP address. I've tried a couple of different methods of making the SOAP call, thinking the connection wasn't being explicitly closed somewhere behind the scenes. Nothing has worked so far, so I'm thinking that I may be misunderstanding what this page is actually showing.
What is the lifecycle of a TCP connection as it relates to the Windows Resource Monitor? Is it normal for connections that are no longer being used to stay around for a while? If not, how do I remedy the situation?
It'll be either a connection pool or a resource leak in your code.
To make sure it's not a resource leak, check your code to confirm that whatever object makes the network call closes the connection after it's used; otherwise you'll be waiting until the garbage collector runs.
However, if the network client supports connection pooling, then closing it may only return the open connection to a pool, ready for quick re-use. You don't say which client API you're using, but if it supports pooling it should provide a way to configure how long released connections remain in the pool.
There is no Windows Winsock-level pooling or persistence. If the underlying socket gets closed, then that's it: it gets closed.

How can two Unicorn servers bind to the same Unix socket?

This (rather old) article seems to suggest that two Unicorn master processes can bind to the same Unix socket path:
When the old master receives the QUIT, it starts gracefully shutting down its workers. Once all the workers have finished serving requests, it dies. We now have a fresh version of our app, fully loaded and ready to receive requests, without any downtime: the old and new workers all share the Unix Domain Socket so nginx doesn’t have to even care about the transition.
Reading around, I don't understand how this is possible. From what I understand, to truly have zero downtime you have to use SO_REUSEPORT to let the old and new servers temporarily be bound to the same socket. But SO_REUSEPORT is not supported on Unix sockets.
(I tested this by binding to a Unix socket path that is already in use by another server, and I got an EADDRINUSE.)
So how can the configuration that the article describes be achieved?
Nginx forwards HTTP requests to a Unix socket.
Normally a single Unicorn server accepts requests on this socket and handles them (fair enough).
During redeployment, a new Unicorn server begins to accept requests on this socket and handles them, while the old server is still running (how?)
My best guess is that the second server calls unlink on the socket file immediately before calling bind with the same socket path, so in fact there is a small window in which no process is bound to the socket and a connection would be refused.
Interestingly, if I bind to a socket file and then immediately delete the file, the next connection to the socket actually gets accepted. The second and subsequent connections are refused with ENOENT, as expected. So maybe the kernel covers for you somewhat while one process is taking control of a socket that was bound by another process. (This is on Linux, BTW.)
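For what it's worth, the file descriptor inheritance mentioned in the first answer above is one way such a handover can work without SO_REUSEPORT and without any unbind/rebind window: the already-bound listener simply stays open across fork() and exec(). Below is a hedged sketch of that idea only; the LISTEN_FD environment variable is invented here and is not Unicorn's actual protocol.

    /* Sketch: keep a bound listening socket alive across a re-exec. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Old server: hand the already-bound listener to a freshly deployed binary. */
    void reexec_with_listener(int listen_fd, const char *new_binary)
    {
        /* Ensure the listener survives exec() (clear close-on-exec). */
        int flags = fcntl(listen_fd, F_GETFD);
        fcntl(listen_fd, F_SETFD, flags & ~FD_CLOEXEC);

        char buf[16];
        snprintf(buf, sizeof(buf), "%d", listen_fd);
        setenv("LISTEN_FD", buf, 1);            /* invented hand-over convention */

        execl(new_binary, new_binary, (char *)NULL);
        perror("execl");                        /* only reached if exec fails */
    }

    /* New server: reuse the inherited socket instead of unlink()/bind(). */
    int get_inherited_listener(void)
    {
        const char *env = getenv("LISTEN_FD");
        return env ? atoi(env) : -1;
    }

Because at least one process holds the descriptor at all times, the kernel keeps the socket bound and keeps queueing incoming connections on its backlog throughout the swap. A setup like the one the article describes would presumably fork first and exec in the child, so that the old master keeps running during the transition; the sketch above only shows the inheritance part.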

TCP connection between client and server gone wrong

I establish a TCP connection between my server and client, which run on the same host. We gather and read data continuously from the server, or "source" as we call it in our case.
We read data on, say, 3 different ports.
Once the source stops publishing data or gets restarted, the server/source is not able to publish data again on the same port, saying the port is already bound. The reason given is that the client still has established connections on those ports.
I wanted to know what the probable reasons for this could be. Could the issue be that the client is still listening on these ports and trying to reconnect again and again, since we have a reconnection mechanism? I am looking more for the reason on the source side, because the same client code works perfectly fine for us when the source and client are on different hosts rather than the same one.
Edit:
I found this while going through various articles.
On the question of using SO_LINGER to send a RST on close to avoid the TIME_WAIT state: I've been having some problems with router access servers (names withheld to protect the guilty) that have problems dealing with back-to-back connections on a modem dedicated to a specific channel. What they do is let go of the connection, accept another call, attempt to connect to a well-known socket on a host, and the host refuses the connection because there is a connection in TIME_WAIT state involving the well-known socket. (Stevens' book TCP Illustrated, Vol 1 discusses this problem in more detail.) In order to avoid the connection-refused problem, I've had to install an option to do reset-on-close in the server when the server initiates the disconnection.
Link to source:- http://developerweb.net/viewtopic.php?id=2941
I guess I am facing the same problem: 'attempt to connect to a well-known socket on a host, and the host refuses the connection'. The probable fix mentioned is the 'option to do reset-on-close in the server when the server initiates the disconnection'. Now how do I do that?
Set the SO_REUSEADDR option on the server socket before you bind() it and call listen().
EDIT: The suggestion to fiddle around with the SO_LINGER option is worthless and dangerous to your data in flight. Just use SO_REUSEADDR.
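As a rough example of the suggestion above (the port number is just a placeholder and error handling is minimal), the option has to be set after socket() and before bind():

    /* Sketch: allow rebinding the well-known port despite sockets in TIME_WAIT. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>

    int make_listener(unsigned short port)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;

        int one = 1;                            /* must be set before bind() */
        setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(port);

        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
            return -1;                /* would be EADDRINUSE without the option */
        listen(fd, SOMAXCONN);
        return fd;
    }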
You need to close the socket bound to that port before you restart/shut down the server!
http://www.gnu.org/software/libc/manual/html_node/Closing-a-Socket.html
Also, there's a timeout, which I think is 4 minutes, so if you created a TCP socket and closed it, you may still have to wait up to 4 minutes until it fully goes away.
You can use netstat to see all the bound ports on your system. If you shut down your server, or close your server after forking on connect, you may have zombie processes which are bound to certain ports, do not close, and remain active, and thus you can't rebind to the same port. Show some code.

Is AF_INET slower than AF_UNIX due to the three-way handshake involved in AF_INET?

I have a requirement in which a server needs to interact with 2 clients, one residing on the local machine and one on a remote machine.
So, initially I was thinking of creating a socket using AF_UNIX for communication with the local client (since it's faster than AF_INET) and AF_INET for communication with the remote client, and polling between them.
But in the case of the local client, the channel will only be created at the beginning and will exist for as long as the server is running, i.e. a single accept followed by multiple reads/writes.
So, can I replace this AF_UNIX socket with AF_INET, since connection establishment is done only once?
Where does the performance hit come from in the case of AF_INET? Is it the three-way handshake, or somewhere else as well?
Quote from Performance: TCP loopback connection vs Unix Domain Socket:
When the server and client benchmark programs run on the same box, both the TCP/IP loopback and unix domain sockets can be used. Depending on the platform, unix domain sockets can achieve around 50% more throughput than the TCP/IP loopback (on Linux for instance). The default behavior of redis-benchmark is to use the TCP/IP loopback.
However, make sure that the performance gain is worth the tradeoff of complicating the network stack of your application (by using various types of sockets depending on client location).
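To make the trade-off concrete, here is a rough sketch of the layout described in the question: one AF_UNIX listener for the local client and one AF_INET listener for the remote client, multiplexed with poll(). The socket path and port are placeholders and error handling is omitted.

    /* Sketch: serve a local client over AF_UNIX and a remote client over AF_INET. */
    #include <netinet/in.h>
    #include <poll.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>

    int main(void)
    {
        /* Local client: Unix-domain socket (placeholder path). */
        int ufd = socket(AF_UNIX, SOCK_STREAM, 0);
        struct sockaddr_un ua = { .sun_family = AF_UNIX };
        strncpy(ua.sun_path, "/tmp/server.sock", sizeof(ua.sun_path) - 1);
        unlink(ua.sun_path);
        bind(ufd, (struct sockaddr *)&ua, sizeof(ua));
        listen(ufd, 1);

        /* Remote client: TCP socket (placeholder port). */
        int tfd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in ta;
        memset(&ta, 0, sizeof(ta));
        ta.sin_family = AF_INET;
        ta.sin_addr.s_addr = htonl(INADDR_ANY);
        ta.sin_port = htons(9000);
        bind(tfd, (struct sockaddr *)&ta, sizeof(ta));
        listen(tfd, 1);

        struct pollfd fds[2] = { { .fd = ufd, .events = POLLIN },
                                 { .fd = tfd, .events = POLLIN } };
        for (;;) {
            poll(fds, 2, -1);
            if (fds[0].revents & POLLIN) { /* accept()/read the local client here */ }
            if (fds[1].revents & POLLIN) { /* accept()/read the remote client here */ }
        }
    }

Keeping both listeners behind the same poll() loop keeps the choice of address family an implementation detail, so switching the local side between AF_UNIX and AF_INET later only touches the setup code.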

Is there a way to wait for a listening socket on win32?

I have a server and a client program on the same machine. The server is part of an application; it can start and stop arbitrarily. When the server is up, I want the client to connect to the server's listening socket. There are Win32 functions to wait on file system changes (ReadDirectoryChangesW) and registry changes (RegNotifyChangeKeyValue); is there anything similar for network changes? I'd rather not have the client constantly polling.
There is no such Win32 API; however, this can easily be accomplished by using an event. The client waits for the event to be signaled, and the server signals the event when it starts up.
The related APIs you will need are CreateEvent, OpenEvent, SetEvent, ResetEvent and WaitForSingleObject.
If your server will run as a service, then for Vista and up it will run in session 0 isolation. That means you will need to use an event whose name is prefixed with "Global\".
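A minimal sketch of that pattern; the event name "Global\MyServerReady" is invented here, and production code should also check the return codes:

    /* Sketch: client blocks until the server signals a named event at startup. */
    #include <windows.h>

    /* Server side: call this right after the listening socket is ready. */
    void announce_ready(void)
    {
        HANDLE ev = CreateEventW(NULL, TRUE /* manual reset */, FALSE,
                                 L"Global\\MyServerReady");   /* invented name */
        if (ev != NULL)
            SetEvent(ev);                  /* wakes every waiting client */
    }

    /* Client side: call this before connect(); returns 0 when the server is up. */
    int wait_for_server(DWORD timeout_ms)
    {
        /* CreateEvent opens the event if it already exists, so it does not
           matter whether the client or the server starts first. */
        HANDLE ev = CreateEventW(NULL, TRUE, FALSE, L"Global\\MyServerReady");
        if (ev == NULL)
            return -1;
        DWORD rc = WaitForSingleObject(ev, timeout_ms);
        CloseHandle(ev);
        return rc == WAIT_OBJECT_0 ? 0 : -1;
    }

One caveat: a manual-reset event stays signaled after the server exits, so the client should still treat a failed connect() as "server went away again" rather than trusting the event alone.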
You probably do have a good reason for needing this, but before you implement it, please consider:
Is there some reason you need to connect right away? I see this as a non-issue, because if you perform an action in the client, you can make a new server connection at that point.
Is the server starting and stopping more frequently than the client? You could switch the roles of who listens and who connects.
Consider using some form of Windows synchronization, such as a semaphore. The client can wait on the synchronization primitive and the server can signal it when it starts up.
Personally, I'd use a UDP broadcast from the server and have the "client" listen for it. The server could broadcast a UDP packet every X period whilst running, and when the client gets one, if it's not already connected, it connects.
This has the advantage that you can move the client onto a different machine without any issues (and since the main connection from client to server is already over sockets, it would be a pity to tie the client and server to the same machine simply because you selected a local IPC method for the initial bootstrap).
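A rough POSIX-style sketch of that announcement loop (the Winsock version is essentially the same calls after WSAStartup, with Sleep() instead of sleep()); the port and payload are arbitrary:

    /* Sketch: periodically broadcast "I'm up" so clients know when to connect. */
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    void announce_loop(void)
    {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);

        int yes = 1;                            /* required to allow broadcasting */
        setsockopt(fd, SOL_SOCKET, SO_BROADCAST, &yes, sizeof(yes));

        struct sockaddr_in dst;
        memset(&dst, 0, sizeof(dst));
        dst.sin_family = AF_INET;
        dst.sin_port = htons(50000);            /* arbitrary discovery port */
        dst.sin_addr.s_addr = htonl(INADDR_BROADCAST);

        const char msg[] = "server-up";         /* invented payload */
        for (;;) {
            sendto(fd, msg, sizeof(msg), 0, (struct sockaddr *)&dst, sizeof(dst));
            sleep(1);                           /* the "every X period" from the answer */
        }
    }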