How to know whether any process is bound to a Unix domain socket?

I'm writing a Unix domain socket server for Linux.
A peculiarity of Unix domain sockets I quickly found out is that, while creating a listening Unix socket creates the matching filesystem entry, closing the socket doesn't remove it. Moreover, until the filesystem entry is removed manually, it's not possible to bind() a socket to the same path again: bind() fails with EADDRINUSE if the path it is given already exists in the filesystem.
As a consequence, the socket's filesystem entry needs to be unlink()'ed on server shutdown to avoid getting EADDRINUSE on server restart. However, this cannot always be done (e.g., after a server crash). Most FAQs, forum posts, and Q&A websites I found only advise, as a workaround, to unlink() the socket prior to calling bind(). In this case, however, it becomes desirable to know whether a process is bound to this socket before unlink()'ing it.
Indeed, unlink()'ing a Unix socket while a process is still bound to it and then re-creating the listening socket doesn't raise any error. As a result, however, the old server process is still running but unreachable: the old listening socket is "masked" by the new one. This behavior has to be avoided.
Ideally, the socket API should expose for Unix domain sockets the same "mutual exclusion" behavior it exposes when binding TCP or UDP sockets: "I want to bind socket S to address A; if a process is already bound to this address, just complain!" Unfortunately this is not the case...
Is there a way to enforce this "mutual exclusion" behavior? Or, given a filesystem path, is there a way to know, via the socket API, whether any process on the system has a Unix domain socket bound to this path? Should I use a synchronization primitive external to the socket API (flock(), ...)? Or am I missing something?
Thanks for your suggestions.
Note : Linux's abstract namespace Unix sockets seem to solve this issue, as there is no filesystem entry to unlink(). However, the server I'm writing aims to be generic : it must be robust against both types of Unix domain sockets, as I am not responsible for choosing listening addresses.

I know I am very late to the party and that this was answered a long time ago, but I just encountered this while searching for something else, and I have an alternate proposal.
When you encounter the EADDRINUSE return from bind(), you can enter an error-checking routine that connects to the socket. If the connection succeeds, there is a running process that is at least alive enough to have done the accept(). This strikes me as the simplest and most portable way of achieving what you want. It has drawbacks in that the server that created the UDS in the first place may actually still be running but "stuck" somehow and unable to do an accept(), so this solution certainly isn't fool-proof, but it is a step in the right direction, I think.
If the connect() fails then go ahead and unlink() the endpoint and try the bind() again.
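A minimal sketch of that probe in C (error handling trimmed; the function name is illustrative):

#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Returns 1 if a live process answers on `path`, 0 if the socket file
 * looks stale (connect() fails with ECONNREFUSED), -1 on other errors. */
int probe_unix_socket(const char *path)
{
    struct sockaddr_un addr;
    int fd, rc;

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);

    rc = connect(fd, (struct sockaddr *)&addr, sizeof(addr));
    close(fd);

    if (rc == 0)
        return 1;                           /* someone is listening */
    return (errno == ECONNREFUSED) ? 0 : -1;
}

If the probe returns 0, unlink() the path and retry the bind(); note there is still a small race window between the probe and the unlink().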

I don't think there is much to be done beyond things you have already considered. You seem to have researched it well.
There are ways to determine whether a process is bound to a Unix socket (obviously lsof and netstat do it), but they are complicated and system-dependent enough that I question whether they are worth the effort for the problems you raise.
You are really raising two problems - dealing with name collisions with other applications and dealing with previous instances of your own app.
By definition multiple instances of your program should not be trying to bind to the same path, so that probably means you only want one instance to run at a time. If that's the case you can just use the standard pid-file lock technique so two instances don't run simultaneously. You shouldn't be unlinking the existing socket, or even running, if you can't get the lock. This takes care of the server-crash scenario as well. If you can get the lock, then you know you can unlink the existing socket path before binding.
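A minimal sketch of that pid-file lock in C, using flock(); the lock path and function name are illustrative:

#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

/* Returns the held lock fd on success, -1 if another instance is
 * running. Keep the fd open for the lifetime of the process. */
int acquire_instance_lock(const char *lockpath)
{
    int fd = open(lockpath, O_RDWR | O_CREAT, 0644);
    if (fd == -1)
        return -1;

    if (flock(fd, LOCK_EX | LOCK_NB) == -1) {
        close(fd);                /* another instance holds the lock */
        return -1;
    }

    /* Record our pid for diagnostics; the lock itself is what matters. */
    char buf[32];
    int len = snprintf(buf, sizeof(buf), "%ld\n", (long)getpid());
    ftruncate(fd, 0);
    write(fd, buf, len);
    return fd;
}

Because the kernel drops the flock() automatically when the process dies, a crashed server never leaves a stale lock behind - exactly the property the socket path itself lacks.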
There is not much you can do, AFAIK, to control other programs creating collisions. File permissions aren't perfect, but if the option is available to you, you could put your app in its own user/group. If there is an existing socket path and you don't own it, don't unlink it; put out an error message and let the user or sysadmin sort it out. Using a config file to make the path easily changeable - and available to clients - might work. Beyond that you almost have to go to some kind of discovery service, which seems like massive overkill unless this is a really critical application.
On the whole you can take some comfort that this doesn't actually happen often.

Assuming you only have one server program that opens that socket, then what about this:
Exclusively create a file that contains the PID of the server process (maybe also the path of the socket)
If you succeed, then write your PID (and socket path) there and continue creating the socket.
If you fail, the socket was (most likely) created before, but the server may be dead. Therefore read the PID from the existing file, and check whether such a process still exists (e.g. using kill() with signal 0):
If a process exists, it may be the server process, or it may be an unrelated process
(More steps may be needed here)
If no such process exists, remove the file and begin trying to create it exclusively.
Whenever the process terminates, remove the file after having closed (and removed) the socket.
If you place the socket and the lock file both in a volatile filesystem (/tmp in older ages, /run in modern times), then a reboot will most likely clear old sockets and lock files automatically.
Unless administrators like to play with kill -9, you could also establish a signal handler that tries to remove the lock file when receiving fatal signals.
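A C sketch of the exclusive-create-and-check steps above (path, function name, and error handling are illustrative):

#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/* Try to create the pid file exclusively; if it already exists,
 * check whether the recorded pid is still alive. Returns 0 when we
 * now own the pid file, 1 when another live process holds it. */
int claim_pidfile(const char *path)
{
    for (;;) {
        int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
        if (fd != -1) {
            dprintf(fd, "%ld\n", (long)getpid());
            close(fd);
            return 0;                      /* we own the pid file */
        }
        if (errno != EEXIST)
            return -1;

        FILE *f = fopen(path, "r");
        long pid = 0;
        if (f) {
            fscanf(f, "%ld", &pid);
            fclose(f);
        }
        if (pid > 0 && kill((pid_t)pid, 0) == 0)
            return 1;                      /* a process with that pid lives */

        unlink(path);                      /* stale; remove and retry */
    }
}

As the answer notes, a live pid only proves that some process exists; on a busy system the pid may have been recycled by an unrelated process, so more checks (e.g. matching the process name) may be needed.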

Why does re-binding to a socket fail randomly?

This question is related to a question about getting a free port in Haskell, where I included a getFreePort function that retrieved the first available port. This function works on a Windows system, but when I tried it on my Linux box it fails randomly (the free port is reported as busy).
I've modified the function to try to re-bind to the free address, and it fails at random:
import Network.Socket

getFreePort :: IO Integer
getFreePort = do
    -- Bind to any free port, note its number, and release it.
    sock <- socket AF_INET Stream defaultProtocol
    bind sock (SockAddrInet aNY_PORT iNADDR_ANY)
    port <- socketPort sock
    close sock
    -- Now try to bind a fresh socket to the port we were just handed.
    print "Trying to rebind to the sock"
    sock' <- socket AF_INET Stream defaultProtocol
    bind sock' (SockAddrInet port 0x0100007f)  -- 0x0100007f: 127.0.0.1 (byte-order dependent)
    port' <- socketPort sock'
    close sock'
    return (toInteger port')
I understand that there is a race condition where another process acquires that port in between, but isn't this unlikely?
As a general remark, the pattern of "check if a resource is available and if so take it" is often an anti-pattern. Whenever you do that, you run the risk that another process takes the resource after the check but before you actually acquire it yourself.
The only info you have after such a check is that the resource was not used at that particular point in time. It may or may not help you to guess the port's state in the future, but the information you have is in no way binding at any later time. You cannot assume that because the resource was free at time t it will still be free at t+dt, even if dt is very small. It is perhaps a bit more likely to still be free when you ask quickly, but that's just it - a higher probability.
You should just try to acquire the resource and handle failure appropriately. The only way you can be sure a port was really free is when you have just successfully opened it. Then you know it was indeed free. As soon as you close it, all bets are off again.
I don't think you can ever safely check if a port is free in one process and then assume it still is free in another process. That does not make sense. It does not even make sense within the same process!
At the very least you would have to design a protocol that would go back and forth:
here's a port that was just free, try that
nope, it's taken now
ok, here's another one
nope, it's taken now
ok, here's another one
yep, got it, thanks
But that is pretty silly to begin with. The process that needs the port should just open it. When it already has the port open and not before, then it should communicate the port number to the other party.
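A minimal C sketch of that approach: bind to port 0 so the kernel picks a free port, and only publish the number while the socket is still open (error handling trimmed):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

int main(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);

    int fd = socket(AF_INET, SOCK_STREAM, 0);

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                 /* 0 = kernel assigns a free port */

    bind(fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(fd, 16);

    /* Only now, while we still hold the port, do we publish its number. */
    getsockname(fd, (struct sockaddr *)&addr, &len);
    printf("listening on port %d\n", ntohs(addr.sin_port));
    return 0;
}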

Kernel gets stuck after sock_release() call in a custom module

I wrote a Kernel module that deals with socket-based TCP connections. Everything works great except in one specific use case. I'd appreciate it if somebody could advise me on how to solve the problem described below.
I have:
A Kernel module, which is a device registered using misc_register().
A user-space application that communicates with this module using the standard file I/O functions: open, close, ioctl, etc.
The exact scenario looks like this:
Load the module using insmod.
Open the associated device from the user application using the standard open() function.
Call ioctl(), which performs the following actions in the Kernel module (insignificant code lines omitted):
...
sock_create(PF_INET, SOCK_STREAM, 0, sock);   /* sock is a struct socket ** here */
...
flags = O_NONBLOCK;
sock_map_fd(*sock, flags);
...
kernel_connect(*sock, (struct sockaddr *)server_addr,
               sizeof(struct sockaddr_in), (*sock)->file->f_flags);
...
All functions return successfully, and the TCP connection is established successfully. After that there can also be reads/writes on this connection, but that doesn't influence the problem.
If the application finishes naturally, or I interrupt it by sending SIGINT, the connection is closed nicely - with a FIN exchange etc. On SIGKILL, the TCP teardown also happens as I expect. No problems so far.
Now I would like to close this socket without stopping the application. I try to do it by calling sock_release() in my Kernel module via another ioctl call. Upon this call the TCP connection is also closed nicely. However, now the Kernel gets stuck when my application finishes or is interrupted!
I suspect that the Kernel somehow is not “informed” that the socket is closed. It tries to close it again and fails once the socket memory structure is de-allocated.
Did somebody use sockets from Kernel modules and had similar problems?
Can you recommend an alternative way to work with TCP sockets from Kernel modules?
Alternative ways to close sockets from within Kernel?
Thank you very much in advance.
After investigating the Kernel code I found out that if you map a socket to a file using the sock_map_fd() function, it is not enough to call sock_release(). That function doesn't release the file descriptor associated with the socket. If you really need to map a Kernel socket to a file, keep the file descriptor returned by sock_map_fd() and use the sys_close() function to close the socket and clean up the associated file. Note that when the device file descriptor is closed, all sockets created in the module and associated with files are also closed automatically.
Alternatively, you can just avoid mapping the socket to a file descriptor. The socket's basic functionality stays fine even without the mapping, and in that case sock_release() works perfectly.
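A rough sketch of the first approach, assuming an older (2.6-era) kernel where sock_map_fd() and sys_close() are usable from a module; names and error handling are illustrative, not a drop-in implementation:

#include <linux/file.h>
#include <linux/net.h>
#include <linux/syscalls.h>

static struct socket *sock;
static int sock_fd = -1;

static int my_open_socket(void)
{
    int err = sock_create(PF_INET, SOCK_STREAM, 0, &sock);
    if (err < 0)
        return err;

    /* Keep the fd: it is the only safe handle for teardown later. */
    sock_fd = sock_map_fd(sock, O_NONBLOCK);
    if (sock_fd < 0) {
        sock_release(sock);   /* not mapped yet, so sock_release() is fine */
        return sock_fd;
    }
    return 0;
}

static void my_close_socket(void)
{
    if (sock_fd >= 0) {
        sys_close(sock_fd);   /* releases the file AND the socket */
        sock_fd = -1;
        /* do NOT also call sock_release(sock) here */
    }
}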

Perl IO::Socket::UNIX Connect with Timeout gives EAGAIN/EWOULDBLOCK

Ubuntu Linux, 2.6.32-45 kernel, 64b, Perl 5.10.1
I connect many new IO::Socket::UNIX stream sockets to a server, and mostly they work fine. But sometimes, in a heavily threaded environment on a faster processor, they return "Resource temporarily unavailable" (EAGAIN/EWOULDBLOCK). I use a timeout on the connect, which causes the sockets to be put into non-blocking mode during the connect. But my timeout period never kicks in - it doesn't wait any noticeable time; it returns quickly.
I see that inside IO::Socket it tries the connect, and if that fails with EINPROGRESS or EAGAIN/EWOULDBLOCK, it does a select to wait for the write bit to be set. This seems normal so far. In my case the select quickly succeeds, implying that the write bit is set, and the code then tries a re-connect. (I guess this is an attempt to pick up any error via error slippage?) Anyway, the re-connect fails again with EAGAIN/EWOULDBLOCK.
In my code this is easy to fix with a retry loop. But I don't understand why, when the socket becomes writable, the socket is not re-connectable. I thought the select guard was always sufficient for a non-blocking connect. Apparently not; so my questions are:
What conditions cause the connect to fail when the select works (the write bit gets set)?
Is there a better way than spinning and retrying to wait for the connect to succeed? The spinning wastes cycles. Instead I'd like to block on something like a select/poll, but I still need a timeout.
Thanx,
-- Steve
But I don't understand why, when the socket becomes writeable, that the socket is not re-connectable.
I imagine it's because whatever needed resource became free was snatched up again before you were able to connect. Replacing the select with a spin loop would not help that.
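For reference, the conventional non-blocking connect pattern checks SO_ERROR after select() rather than calling connect() a second time; a C sketch (the socket path, function name, and timeout are illustrative):

#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Returns a connected fd on success, -1 on error or timeout. */
int connect_with_timeout(const char *path, int timeout_sec)
{
    struct sockaddr_un addr;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
        return fd;                          /* connected immediately */
    if (errno != EINPROGRESS && errno != EAGAIN) {
        close(fd);
        return -1;
    }

    fd_set wfds;
    struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };
    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    if (select(fd + 1, NULL, &wfds, NULL, &tv) <= 0) {
        close(fd);
        return -1;                          /* timeout or select error */
    }

    int err = 0;
    socklen_t len = sizeof(err);
    getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len);
    if (err != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

One caveat specific to this question: on Linux, connect() on an AF_UNIX socket appears to fail immediately with EAGAIN (rather than EINPROGRESS) when the listener's backlog is full, and no amount of select()-ing makes that connect complete; retrying - ideally after a short sleep rather than a hot spin - seems hard to avoid in that case.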

getting local host name as destination using getaddrinfo/getnameinfo

I am looking through some out-of-date code which uses getaddrinfo and getnameinfo to determine host name information and then falls back to gethostname and gethostbyname if getnameinfo fails.
Now, this seems wrong to me. I am trying to understand the intent of the code so that I can make a recommendation. I don't want to repost the entire code here because it is long and complicated, but I'll try to summarize:
As far as I can tell, the point of this code is to generate a string which can be used by another process to connect to a listening socket. This seems to be not just for local processes, but also for remote hosts to connect back to this computer.
So the code in question is basically doing the following:
getaddrinfo(node = NULL, service = port, hints.ai_flags = AI_PASSIVE, ai); -- this gets a list of possible arguments for socket() that can be used with bind().
go through the list of results and create a socket.
first time a socket is successfully created, this is selected as the "used" addrinfo.
for the ai_addr of the selected addrinfo, call getnameinfo() to get the associated host name.
if this fails, call gethostname(), then look up gethostbyname() on the result.
There are a few reasons I think this is wrong, but I want to verify my logic. Firstly, it seems from some experiments that getnameinfo() pretty much always fails here. I suppose that is because the input address is unspecified - it is a listening socket, not a destination, so it doesn't need a valid IP from this point of view. Then, calling gethostname() and passing the result to gethostbyname() pretty much always returns the same result as gethostname() by itself. In other words, it's just verifying the local host name, which seems pointless to me. This is problematic because it's not even necessarily usable by remote hosts, is it?
Somehow I think it's possible that the whole idea of trying to determine your own host name on the subnet is not that useful, but rather you must ping a message to another host and see what IP address they see it as. (Unfortunately in this context that doesn't make sense, since I don't know other peers at this level of the program.) For instance, the local host could have more than one NIC and therefore multiple IP addresses, so trying to determine a single host-address pair is nonsensical. (Is the correct resolution to just bind() and simultaneously listen on all addrinfo results?)
I also noticed that one can get names resolved by just passing them to getaddrinfo() and setting the AI_CANONNAME flag, meaning the getnameinfo() step may be redundant. However, I guess this is not done here because they are trying to determine some kind of unbiased view of the hostname without supplying it a priori. Of course, it fails, and they end up using gethostname() anyway! I also tried supplying "localhost" to getaddrinfo(), and it reports in ai_canonname the host name under Linux, but just results in "localhost" on OS X - not so useful, since this is supposed to be cross-platform.
I guess to summarize, my question is, what is the correct way, if one exists, to get a local hostname that can be announced to subnet peers, in modern socket programming? I am leaning towards replacing this code with simply returning the results of gethostname(), but I'm wondering if there's a more appropriate solution using modern calls like getaddrinfo().
If the answer is that there's no way to do this, I'll just have to use gethostname() anyways since I must return something here, or it would break the API.
If I read this correctly, you just want to get a non-localhost socket address that is likely to succeed for creating a local socket, and for a remote host to connect back on.
I have a function that I wrote that you can reference called "GetBestAddressForSocketBind". You can get it off my GitHub project page here. You may need to reference some of the code in the parent directory.
The code essentially just uses getifaddrs to enumerate adapters and picks the first one that is "up", not a loopback/local and has an IP address of the desired address family (AF_INET or AF_INET6).
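That approach might look roughly like this in C (the function name is illustrative, not the one from the linked project):

#include <ifaddrs.h>
#include <net/if.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Walk the interface list and copy out the first address that is up,
 * not loopback, and of the requested family (AF_INET or AF_INET6). */
int first_usable_address(int family, char *host, size_t hostlen)
{
    struct ifaddrs *ifap, *ifa;
    int found = -1;

    if (getifaddrs(&ifap) == -1)
        return -1;

    for (ifa = ifap; ifa != NULL; ifa = ifa->ifa_next) {
        if (ifa->ifa_addr == NULL || ifa->ifa_addr->sa_family != family)
            continue;
        if (!(ifa->ifa_flags & IFF_UP) || (ifa->ifa_flags & IFF_LOOPBACK))
            continue;

        socklen_t salen = (family == AF_INET)
                              ? sizeof(struct sockaddr_in)
                              : sizeof(struct sockaddr_in6);
        if (getnameinfo(ifa->ifa_addr, salen, host, hostlen,
                        NULL, 0, NI_NUMERICHOST) == 0) {
            found = 0;
            break;
        }
    }
    freeifaddrs(ifap);
    return found;
}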
Hope this helps.
I think that you should look at Ulrich Drepper's article about IPv6 programming. It is relatively short and may answer some of your concerns. I found it really useful. I'm posting this link because it is very difficult to answer your question(s) without (at least) pseudo-code.

Socket Read In Multi-Threaded Application Returns Zero Bytes or EINTR (104)

I've been a C coder for a while now - neither a newbie nor an expert. I have a certain daemonized application in C on a PPC Linux box. I use PHP's socket_connect as a client to connect to this service locally. The server uses epoll for multiplexing connections via a Unix socket. A user-submitted string is parsed for certain characters/words using strstr() and, if found, spawns 4 joinable threads to different websites simultaneously. I use socket, connect, write and read to interact with the said webservers via TCP on their port 80 in each thread. All connections and writes seem successful. Reads from the webserver sockets fail, however, with either (A) all 3 threads seeming to hang, and only one thread returning -1 with errno set to 104 - the responding thread takes like 10 minutes, an eternity :-( (I read somewhere that 104 (is it EINTR?) in the network context suggests that 'the connection was reset by peer'); or (B) 0 bytes from 3 threads, and only 1 of the 4 threads actually returning some data. Aren't socket read/write thread-safe? I use thread-safe (and reentrant) libc functions such as strtok_r, gethostbyname_r, etc.
*I doubt that the said webhosts are actually resetting the connection, because when I run a single-threaded standalone (everything else equal) all things work perfectly right, but of course in series, not parallel.
There's a second problem too (oops): I can't write back to the client that connects to my epoll-ed Unix socket. My daemon application will hang and hog CPU > 100% forever, yet nothing is written to the client's end. I'm sure the client (a very typical PHP socket application) hasn't closed the connection whenever this happens - no error(s) detected either. Any ideas?
I cannot figure out what is wrong, even with Valgrind, GDB, or much logging. Kindly help where you can.
Yes, read/write are thread-safe. But beware of gethostbyname() and getservbyname() if you're using them - they return pointers to static data, and may not be thread-safe.
errno 104 is ECONNRESET (not EINTR). Use strerror or perror to get the textual error message (like 'Connection reset by peer') for a particular errno code.
The best way to figure out what's going wrong is often to do very detailed logging - log the results of every operation, plus details like the IP address/port connecting to, the number of bytes read/written, the thread id, and so forth. And, of course, make sure your logging code is thread-safe :-)
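Inside threads, the reentrant strerror_r() is the safer choice for that logging; a minimal sketch (the log format and helper name are illustrative, and note that glibc also ships a GNU variant of strerror_r with a different signature):

#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Log an errno value without touching the shared static buffer that
 * plain strerror() may use. Uses the XSI strerror_r(), which returns
 * 0 on success. */
static void log_errno(const char *what, int err)
{
    char msg[256];
    if (strerror_r(err, msg, sizeof(msg)) != 0)
        snprintf(msg, sizeof(msg), "unknown error %d", err);
    fprintf(stderr, "[thread %lu] %s: %s (errno %d)\n",
            (unsigned long)pthread_self(), what, msg, err);
}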
Getting an ECONNRESET after 10 minutes sounds like the result of your connection timing out. Either the web server isn't sending the data or your app isn't receiving it.
To test the former, hook up a program like Wireshark to the local loopback device and look for traffic to and from the port you are using.
For the latter, take a look at the epoll() man page. It mentions a scenario where using edge-triggered events can result in a lockup, because there is still data in the buffer but no new data arrives, so no new event is triggered.
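A sketch of the usual guard against that edge-triggered lockup: after an EPOLLET notification, keep reading until the kernel reports EAGAIN, so no buffered data is left waiting for an event that will never come (buffer size and helper name are illustrative):

#include <errno.h>
#include <unistd.h>

/* Call once per EPOLLET readiness event on a non-blocking fd. */
static void drain_socket(int fd)
{
    char buf[4096];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0) {
            /* process n bytes of data here */
            continue;
        }
        if (n == 0)          /* peer closed the connection */
            break;
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            break;           /* fully drained; wait for the next event */
        if (errno == EINTR)
            continue;        /* interrupted by a signal; retry */
        break;               /* real error; handle/close elsewhere */
    }
}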