observing on-off socket problem - sockets

I recently encountered a problem. I am using two programs A and B, developed by someone else, which communicate with each other over TCP sockets; A is the server and B is the client. Here is what I observed: when I start both A and B, they run and communicate with each other. If I kill A and then restart it, the process list shows that A launched successfully, but B cannot connect to it, no matter how many times I restart B. However, if I keep killing this unreachable A and starting it again, eventually B can detect it again.
At the same time, if I close B's socket before killing A, then when I start A and B again, they work fine.
What might the problem be, and is there a way to see which sockets are still open after I kill A?

It depends on the OS you are using.
lsof -p <pid> is quite common on UNIX and lets you list all file descriptors used by a process.
netstat is probably available too and will also list open ports.
This is probably due to the TIME_WAIT state. When you kill A, the OS keeps the server port allocated for a while. A can rebind it immediately only if it sets a specific flag (SO_REUSEADDR) when opening its listening socket. Otherwise, A cannot reuse the port until the OS releases it, which can take a few minutes; that is why, after you keep killing and restarting A, the port eventually becomes available again. I don't know what A does when it cannot open its server port because of this.
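For illustration, here is a minimal sketch in Python of a listener that sets SO_REUSEADDR before binding, so that a quick restart is not blocked by leftover connections in TIME_WAIT (the port number 5000 is just an example):

    import socket

    def make_listener(port):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # Allow rebinding even while old connections sit in TIME_WAIT.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind(("0.0.0.0", port))
        s.listen()
        return s

    # Without the setsockopt line above, bind() can fail with
    # EADDRINUSE for a few minutes after the previous instance dies.
    srv = make_listener(5000)
    conn, addr = srv.accept()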

Related

Is it possible to deploy without downtime and without disconnecting connected TCP sockets?

There is a long-lived TCP connection. Up to two clients can connect to the server; in other words, the load is not high. However, once a TCP connection is made, the socket will not be disconnected unless there is an accident such as a server power-down or network failure. Is it possible to reuse an existing TCP socket when restarting the process? I think a TCP load balancer like AWS NLB cannot be used, since the existing socket won't be moved to the new application. I'd like to deploy without downtime, as the system I'm working on can suffer financial damage when a socket breaks and data is lost. Low-level socket programming is OK.
I have read CloudFlare's https://blog.cloudflare.com/graceful-upgrades-in-go/ article explaining Nginx's graceful reload mechanism. Since an HTTP server opens and closes sockets frequently, that article assumes the server's connections will eventually be closed, but my situation is slightly different, so I'm not sure this approach applies.
A socket can be shared between multiple processes, for example by opening the socket in the same parent process and forking a child process. But when the last process using the socket closes it, the socket, and thus the underlying connection, is implicitly closed.
This means you must make sure that there is always at least one process holding the socket open. One way to do this is to change the deployment so that it does not first exit the old process and then create the new one; instead, the new process starts first and the old process transfers the socket to it. See Can I share a file descriptor to another process on linux or are they local to the process? for how this can be done on Linux. Another way is file descriptor inheritance across a fork().
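For a rough illustration of that handover, here is a sketch using Python's socket.send_fds/recv_fds (Unix only, Python 3.9+) to pass a live connection's file descriptor from the old process to the new one over a Unix domain socket; the control-socket path and the two helper names are made up for the example:

    import os
    import socket

    CONTROL_PATH = "/tmp/handover.sock"  # assumed rendezvous path

    def send_connection(conn):
        """Old process: hand the live connection's fd to the new process."""
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as ctrl:
            ctrl.connect(CONTROL_PATH)
            # SCM_RIGHTS under the hood: the kernel duplicates the fd
            # into the receiving process.
            socket.send_fds(ctrl, [b"conn"], [conn.fileno()])

    def receive_connection():
        """New process: wait on the control socket and adopt the fd."""
        if os.path.exists(CONTROL_PATH):
            os.unlink(CONTROL_PATH)
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as ctrl:
            ctrl.bind(CONTROL_PATH)
            ctrl.listen(1)
            peer, _ = ctrl.accept()
            _, fds, _, _ = socket.recv_fds(peer, 1024, 1)
            return socket.socket(fileno=fds[0])  # same TCP connection

After the transfer, the old process can close its copy of the descriptor and exit; the connection stays open because the new process now holds a reference to it.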
Note that this kind of file descriptor sharing only works with plain sockets, where the state is kept entirely in the OS kernel. It is much harder, or impossible, with TLS sockets, since in that case the current user-space state would also somehow need to be shared.
Another way is to have some intermediate "proxy" which on the one hand holds the stable socket connection to your fragile application, and on the other hand does robust socket handling (i.e. reconnecting when needed) toward the application you want to update. This proxy transfers the traffic between both sides and reconnects the socket whenever a problem occurs.
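A heavily simplified sketch of that proxy idea (the addresses, ports, and single-connection assumption are all made up for the example): it holds the stable client connection open while retrying the upstream side until the restarted application is back. A real proxy would also need to buffer and re-establish the upstream side mid-stream:

    import socket
    import threading
    import time

    LISTEN_ADDR = ("0.0.0.0", 9000)      # stable side, for the fragile client
    UPSTREAM_ADDR = ("127.0.0.1", 9001)  # side that restarts on deployments

    def connect_upstream():
        # Retry until the (possibly restarting) application is back up.
        while True:
            try:
                return socket.create_connection(UPSTREAM_ADDR)
            except OSError:
                time.sleep(0.5)

    def pump(src, dst):
        # Copy bytes one way until the source side closes.
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)

    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(LISTEN_ADDR)
    listener.listen(1)
    client, _ = listener.accept()        # the long-lived, stable connection
    upstream = connect_upstream()
    threading.Thread(target=pump, args=(client, upstream), daemon=True).start()
    pump(upstream, client)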

How can two Unicorn servers bind to the same Unix socket?

This (rather old) article seems to suggest that two Unicorn master processes can bind to the same Unix socket path:
When the old master receives the QUIT, it starts gracefully shutting down its workers. Once all the workers have finished serving requests, it dies. We now have a fresh version of our app, fully loaded and ready to receive requests, without any downtime: the old and new workers all share the Unix Domain Socket so nginx doesn’t have to even care about the transition.
Reading around, I don't understand how this is possible. From what I understand, to truly have zero downtime you have to use SO_REUSEPORT to let the old and new servers temporarily be bound to the same socket. But SO_REUSEPORT is not supported on Unix sockets.
(I tested this by binding to a Unix socket path that is already in use by another server, and I got an EADDRINUSE.)
So how can the configuration that the article describes be achieved?
Nginx forwards HTTP requests to a Unix socket.
Normally a single Unicorn server accepts requests on this socket and handles them (fair enough).
During redeployment, a new Unicorn server begins to accept requests on this socket and handles them, while the old server is still running (how?)
My best guess is that the second server calls unlink on the socket file immediately before calling bind with the same socket file, so in fact there is a small window where no process is bound to the socket and a connection would be refused.
Interestingly, if I bind to a socket file and then immediately delete the file, the next connection to the socket is actually accepted. The second and subsequent connections are refused with ENOENT, as expected. So maybe the kernel covers for you somewhat while one process is taking control of a socket that was bound by another process. (This is on Linux, BTW.)
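For what it's worth, a listening socket can also be handed to a replacement process without ever rebinding the path: a file descriptor survives exec() if it is marked inheritable, so the new server never calls bind() at all. This is a sketch of the general mechanism in Python, not Unicorn's actual code, and the INHERITED_FD environment variable name is made up:

    import os
    import socket
    import sys

    PATH = "/tmp/app.sock"

    if "INHERITED_FD" in os.environ:
        # Re-executed replacement: adopt the already-bound listener.
        srv = socket.socket(fileno=int(os.environ["INHERITED_FD"]))
    else:
        # First start: bind the Unix socket the normal way.
        if os.path.exists(PATH):
            os.unlink(PATH)
        srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        srv.bind(PATH)
        srv.listen(128)
        # Re-exec ourselves; the listener fd stays open across exec.
        os.set_inheritable(srv.fileno(), True)
        os.environ["INHERITED_FD"] = str(srv.fileno())
        os.execv(sys.executable, [sys.executable, *sys.argv])

    conn, _ = srv.accept()  # accepting on the inherited socket, no re-bind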

What happens when two processes have the same port configured

Suppose we have two processes configured with the same port, but only one uses it at a time: the other stays in passive mode, and once the active process goes down, the passive one starts reading from the port.
Now, since in Linux everything is a file descriptor, I wanted to know whether there is any way for the passive process to start reading from the port immediately.
Currently I close the port in the active process and then open it again in the passive one once it becomes active.
Thanks in advance.
No, only one process can read from a port at a time. Even on Linux, when a process uses a port, it holds it exclusively. Linux architecture does treat almost everything as a file (with the exception of network devices), but sockets are special files, not regular ones, and you cannot manipulate them as normal files.
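You can see this for yourself; in this small demonstration (port 5000 and the loopback address are arbitrary) the second bind fails immediately, and only succeeds once the first socket is closed:

    import socket

    a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    a.bind(("127.0.0.1", 5000))
    a.listen()

    b = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        b.bind(("127.0.0.1", 5000))  # same port, second socket
    except OSError as e:
        print(e)                     # [Errno 98] Address already in use (Linux)

    a.close()
    b.bind(("127.0.0.1", 5000))      # succeeds after the first socket closes
    b.listen()

This is essentially the close-then-reopen handover the asker already does.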

Sockets on a webhost

If you telnet to the IP address 192.43.244.18 on port 13, you'll get the current time.
Well, if I'm not wrong, this is simply a server socket. But one thing is strange: how is this socket always listening?
If I write socket code in a PHP page, I still have to request the page first in order to activate the server socket, but this one isn't associated with any page. Even if I write a Perl script, I still have to request it in order to start the server socket!
My question is: how can I make such a thing, an always-listening socket, on a webhost (any language will do)?
You can run the process that's listening on the socket as a daemon (Linux) or service (Windows), or just a regular program really (although that's less elegant).
A simple place to begin would be http://docs.oracle.com/javase/tutorial/networking/sockets/clientServer.html, which teaches you how to make a simple ServerSocket in Java that listens for connections on a specific port. The resulting program has to be running at all times to be able to accept connections.
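The same idea fits in a dozen lines of Python, if that is easier to experiment with (port 1300 is arbitrary; the real daytime service in the question uses port 13, which normally requires root to bind):

    import socket
    from datetime import datetime

    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("0.0.0.0", 1300))
    srv.listen()

    # This loop is what "always listening" means in practice:
    # the process never exits, it just waits for the next client.
    while True:
        conn, addr = srv.accept()
        conn.sendall(datetime.now().isoformat().encode() + b"\r\n")
        conn.close()

To keep it alive you would start it as a daemon or service, as the answer above suggests.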

How can I force a refresh of what ports have listeners

I'm trying to re-launch a WCF service that I killed earlier, but I'm getting an AddressAlreadyInUseException. The port it's attempting to use is 1819.
I ran netstat -nao from the command line and found a process listening on port 1819 with a PID of 4840. I went into Process Explorer (from Sysinternals) to try to kill PID 4840, but it's not there.
I'm guessing PID 4840 was the WCF service that was running earlier (and that I killed), but its connections were never cleared out. How can I force a refresh of the ports being listened on? Otherwise I'll have to reboot every time this happens.
It doesn't look like there's a way to refresh it. For now I have reconfigured the service to use another port until it's more convenient for me to restart.
I had the same problem; the only way to free up and refresh the port was to restart the computer. It is a bit tedious, but that was the only way for me to solve the problem.
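If this keeps happening, one code-level workaround in the spirit of the accepted answer is to treat the configured port as a preference and fall back to the next free one. A sketch of the idea (in Python rather than WCF, and the helper name is made up):

    import socket

    def bind_with_fallback(preferred, attempts=10):
        """Try the preferred port; on 'address already in use', walk upward."""
        for port in range(preferred, preferred + attempts):
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            try:
                s.bind(("0.0.0.0", port))
                s.listen()
                print("listening on", port)
                return s
            except OSError:
                s.close()  # port held by a stale listener, try the next one
        raise RuntimeError("no free port found")

    srv = bind_with_fallback(1819)  # 1819 is the port from the question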