How does libevent detect that a socket is closed?

If I add an event for a specific socket (for example, a connected TCP socket) to the event loop, and it then happens that the socket is closed, how will libevent act? Can it detect this?
Thanks!

EDIT: I think I misinterpreted your question at first.
If you mean that the socket is closed from the remote end:
As per the documentation you can use the event_new() and event_add() calls to register interest in a socket. Make sure you specify EV_READ since you are interested in when the socket is closed.
Remember that there is no difference in file descriptor readiness between data available for reading and a closed socket. Normally you must read the socket to find out which condition is true, but if you don't want to read the socket then you can look here for a hint.
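As a rough sketch of what that looks like in code (the callback and helper names here are made up for illustration, and error handling is omitted), assuming a connected TCP fd and an existing event_base:

```c
#include <event2/event.h>
#include <unistd.h>
#include <stdio.h>

/* Fires both when data arrives and when the peer closes the connection:
 * the descriptor becomes readable in either case. */
static void on_readable(evutil_socket_t fd, short events, void *arg)
{
    struct event *ev = arg;          /* set via event_self_cbarg() below */
    char buf[4096];
    ssize_t n;

    (void)events;
    n = read(fd, buf, sizeof(buf));

    if (n == 0) {
        /* read() returning 0 on a TCP socket means the peer closed it. */
        printf("peer closed connection on fd %d\n", (int)fd);
        event_del(ev);               /* unregister before closing the fd */
        close(fd);
        /* event_free(ev) would also go here once nothing else uses it */
    } else if (n > 0) {
        /* handle n bytes of data in buf */
    }
}

/* Hypothetical setup helper. */
static void watch_socket(struct event_base *base, evutil_socket_t fd)
{
    struct event *ev = event_new(base, fd, EV_READ | EV_PERSIST,
                                 on_readable, event_self_cbarg());
    event_add(ev, NULL);
}
```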
If you mean that the socket is locally closed (the file descriptor was closed):
Using a file descriptor after it has been closed is never defined and can always lead to undefined results. This is not specific to libevent. Before you close a file descriptor, you must make sure that no other thread in your program is using it, and you must make sure that no other part of your program is going to try using it in the future. That means unregistering the file descriptor from libevent at the same time that you close it.
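In other words, tear things down in this order (a tiny sketch, assuming ev is the struct event you registered for fd):

```c
event_del(ev);    /* libevent stops monitoring the descriptor */
event_free(ev);   /* release the event object */
close(fd);        /* only now is it safe to close the descriptor */
```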

Related

What is correct procedure following a failure to connect a TCP socket?

I'm writing a TCP client using asynchronous calls. If the server is active when the app starts then it connects and talks OK. However if the first connect fails, then every subsequent call to connect() fails with WSAENOTCONN(10057) without producing any network traffic (checked with Wireshark).
Currently the code does not close the socket when it gets the error. The TCP state diagram does not seem to require it. It simply waits for 30 seconds and tries the connect() again.
That subsequent connect() and the following read() both return the WSAENOTCONN error.
Do I need to close the socket and open a new one? If so, which errors require me to close the socket, since there are a lot of different errors, some of which I will probably never see on the test bench.
You can assume this is MS Winsock2, although it is actually Interval Zero RTX 2009, which is subtly different in some places.
Do I need to close the socket and open a new one?
Yes.
If so, which errors require me to close the socket, since there are a lot of different errors, some of which I will probably never see on the test bench.
Almost all errors are fatal to the connection and should result in you closing the socket. EAGAIN/EWOULDBLOCK is a prominent exception, as is EINTR, but I can't think of any others offhand.
Do I need to close the socket and open a new one?
Yes.
You should close the socket under all error conditions that mean the connection is gone for good (say, the peer has closed the connection).
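To make that advice concrete, here is a sketch of the retry loop using plain blocking Winsock calls (the real client is asynchronous; the helper name retry_connect and the fixed 30-second wait are just for illustration, and WSAStartup() is assumed to have been called already):

```c
#include <winsock2.h>
#include <windows.h>
/* link with ws2_32.lib */

/* On any connect() failure, discard the socket and create a fresh one
 * before the next attempt. Reusing a socket whose connect() failed is
 * what leads to WSAENOTCONN on later calls. */
static SOCKET retry_connect(const struct sockaddr_in *addr)
{
    for (;;) {
        SOCKET s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
        if (s == INVALID_SOCKET)
            return INVALID_SOCKET;

        if (connect(s, (const struct sockaddr *)addr, sizeof(*addr)) == 0)
            return s;                   /* connected */

        closesocket(s);                 /* failed: throw the socket away */
        Sleep(30 * 1000);               /* wait, then retry with a new socket */
    }
}
```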

Kernel gets stuck after sock_release() call in a custom module

I wrote a Kernel module that deals with socket-based TCP connections. Everything works great except for one specific use case. I'd appreciate it if somebody could advise me how to solve the problem described below.
I have:
A Kernel module which is a device registered using misc_register().
A user-space application that communicates with this module using the standard file I/O functions: open, close, ioctl, etc.
The exact scenario looks like this:
Load the module using insmod.
Open the associated device from the user application using the standard open() function.
Call ioctl(), which performs the following actions in the Kernel module (insignificant code lines omitted):
```
...
/* create a kernel-space TCP socket */
sock_create(PF_INET, SOCK_STREAM, 0, sock);
...
flags = O_NONBLOCK;
/* map the socket to a file descriptor */
sock_map_fd(*sock, flags);
...
kernel_connect(*sock, (struct sockaddr *)server_addr, sizeof(struct sockaddr_in), (*sock)->file->f_flags);
...
```
All functions return successfully. The TCP connection is established successfully. After that there can also be reads/writes on this connection, but that doesn't influence the problem.
If the application finishes naturally or I interrupt it by sending SIGINT, the connection is closed nicely, with FIN exchange etc. On SIGKILL it tears down the TCP connection as I expect. No problems so far.
Now I would like to close this socket w/o stopping application. I try to do it by calling sock_release() in my Kernel module via another ioctl call. Upon this call the TCP connection is also closed nicely. However now the Kernel gets stuck when my application finishes or is interrupted!
I suspect that the Kernel somehow is not “informed” that the socket is closed. It tries to close it again and fails once the socket memory structure is de-allocated.
Has anybody used sockets from Kernel modules and had similar problems?
Can you recommend an alternative way to work with TCP sockets from Kernel modules?
Are there alternative ways to close sockets from within the Kernel?
Thank you very much in advance.
After Kernel code investigation I found out that in case you map a socket to a file using the sock_map_fd() function, it is not enough to call sock_release(). This function doesn't release the file descriptor associated with the socket. In case you really need to map a Kernel socket to a file, keep the file descriptor returned by sock_map_fd() and use the sys_close() function to close the socket and clean up the associated file. Note that when the device file descriptor is closed, all sockets created in the module and associated with files are also closed automatically.
Alternatively, you can just avoid mapping the socket to a file descriptor. The socket's basic functionality will stay OK even without the mapping. In this case sock_release() works perfectly.
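A sketch of the first option, for kernels where sock_map_fd() and sys_close() are available to modules (these interfaces change between kernel versions, so treat this as the idea rather than copy-paste code; the function and variable names are placeholders):

```c
#include <linux/net.h>
#include <linux/fcntl.h>
#include <linux/syscalls.h>

static struct socket *my_sock;
static int my_sock_fd = -1;

/* Create the socket and remember the fd that sock_map_fd() returns. */
static int my_open_socket(void)
{
    int err = sock_create(PF_INET, SOCK_STREAM, 0, &my_sock);
    if (err < 0)
        return err;
    my_sock_fd = sock_map_fd(my_sock, O_NONBLOCK);
    return my_sock_fd < 0 ? my_sock_fd : 0;
}

/* Close via the fd so the struct file is cleaned up together with the
 * socket; do not additionally call sock_release() on a mapped socket. */
static void my_close_socket(void)
{
    if (my_sock_fd >= 0) {
        sys_close(my_sock_fd);
        my_sock_fd = -1;
        my_sock = NULL;
    }
}
```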

Properly close a TCP socket

What is the recommended way to "gracefully" close a TCP socket?
I've learned that close()ing before read()ing all the remaining data in my host buffer can cause problems for the remote host (he could lose all the data in his receive buffer that has been acked, but not yet been read by his application). Is that correct?
What would be a good approach to avoid that situation? Is there some way to tell the API that I don't really care about data being lost due to my ignoring any remaining buffered data and closing the socket?
Or do I have to consider the problem at the application protocol level and use some kind of implicit or explicit "end of transmission" signal to let the other party know that it's safe to close a socket for reading?
1) Call shutdown to indicate that you will not write any more data to the socket.
2) Continue to read from the socket until you get either an error or the connection is closed.
3) Now close the socket.
If you don't do this, you may wind up closing the connection while there's still data to be read. This will result in an ungraceful close.
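A sketch of those three steps on a blocking socket (error handling trimmed):

```c
#include <sys/socket.h>
#include <unistd.h>

/* Gracefully close a connected TCP socket:
 * 1) tell the peer we will send no more data,
 * 2) keep reading until the peer closes its side (read() returns 0) or errors,
 * 3) close the descriptor. */
static void graceful_close(int fd)
{
    char buf[4096];

    shutdown(fd, SHUT_WR);                            /* step 1 */

    while (read(fd, buf, sizeof(buf)) > 0)            /* step 2 */
        ;                                             /* discard remaining data */

    close(fd);                                        /* step 3 */
}
```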

How to know whether any process is bound to a Unix domain socket?

I'm writing a Unix domain socket server for Linux.
A peculiarity of Unix domain sockets I quickly found out is that, while creating a listening Unix socket creates the matching filesystem entry, closing the socket doesn't remove it. Moreover, until the filesystem entry is removed manually, it's not possible to bind() a socket to the same path again: bind() fails with EADDRINUSE if the path it is given already exists in the filesystem.
As a consequence, the socket's filesystem entry needs to be unlink()'ed on server shutdown to avoid getting EADDRINUSE on server restart. However, this cannot always be done (e.g. after a server crash). Most FAQs, forum posts, and Q&A websites I found only advise, as a workaround, to unlink() the socket prior to calling bind(). In this case, however, it becomes desirable to know whether a process is bound to this socket before unlink()'ing it.
Indeed, unlink()'ing a Unix socket while a process is still bound to it and then re-creating the listening socket doesn't raise any error. As a result, however, the old server process is still running but unreachable: the old listening socket is "masked" by the new one. This behavior has to be avoided.
Ideally, using Unix domain sockets, the socket API should have exposed the same "mutual exclusion" behavior that is exposed when binding TCP or UDP sockets: "I want to bind socket S to address A; if a process is already bound to this address, just complain!" Unfortunately this is not the case...
Is there a way to enforce this "mutual exclusion" behavior? Or, given a filesystem path, is there a way to know, via the socket API, whether any process on the system has a Unix domain socket bound to this path? Should I use a synchronization primitive external to the socket API (flock(), ...)? Or am I missing something?
Thanks for your suggestions.
Note : Linux's abstract namespace Unix sockets seem to solve this issue, as there is no filesystem entry to unlink(). However, the server I'm writing aims to be generic : it must be robust against both types of Unix domain sockets, as I am not responsible for choosing listening addresses.
I know I am very late to the party and that this was answered a long time ago but I just encountered this searching for something else and I have an alternate proposal.
When you encounter the EADDRINUSE return from bind() you can enter an error checking routine that connects to the socket. If the connection succeeds, there is a running process that is at least alive enough to have done the accept(). This strikes me as being the simplest and most portable way of achieving what you want to achieve. It has drawbacks in that the server that created the UDS in the first place may actually still be running but "stuck" somehow and unable to do an accept(), so this solution certainly isn't fool-proof, but it is a step in the right direction I think.
If the connect() fails then go ahead and unlink() the endpoint and try the bind() again.
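Roughly, that check could look like the sketch below (the function name and the single retry are my own choices; listening on the returned descriptor is left to the caller):

```c
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>

/* Try to bind a listening Unix socket at `path`. If the path is in use,
 * probe it with connect(): a failed connection means the old owner is
 * gone, so unlink the stale entry and bind again. Returns the bound fd
 * or -1 on failure. */
static int bind_unix_socket(const char *path)
{
    struct sockaddr_un addr;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
        return fd;

    if (errno == EADDRINUSE) {
        int probe = socket(AF_UNIX, SOCK_STREAM, 0);
        if (probe >= 0 &&
            connect(probe, (struct sockaddr *)&addr, sizeof(addr)) == 0) {
            /* Someone is alive and accepting on this path: leave it alone. */
            close(probe);
            close(fd);
            return -1;
        }
        if (probe >= 0)
            close(probe);

        /* Nobody answered: assume a stale socket file and retry once. */
        unlink(path);
        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
            return fd;
    }

    close(fd);
    return -1;
}
```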
I don't think there is much to be done beyond things you have already considered. You seem to have researched it well.
There are ways to determine whether any process is bound to a unix socket (obviously lsof and netstat do it), but they are complicated and system-dependent enough that I question whether they are worth the effort to deal with the problems you raise.
You are really raising two problems - dealing with name collisions with other applications and dealing with previous instances of your own app.
By definition, multiple instances of your program should not be trying to bind to the same path, so that probably means you only want one instance to run at a time. If that's the case you can just use the standard PID lock-file technique so two instances don't run simultaneously. You shouldn't be unlinking the existing socket or even running if you can't get the lock. This takes care of the server crash scenario as well. If you can get the lock then you know you can unlink the existing socket path before binding.
There is not much you can do, AFAIK, to control other programs creating collisions. File permissions aren't perfect, but if the option is available to you, you could put your app in its own user/group. If there is an existing socket path and you don't own it then don't unlink it; put out an error message and let the user or sysadmin sort it out. Using a config file to make it easily changeable (and available to clients) might work. Beyond that you almost have to go to some kind of discovery service, which seems like massive overkill unless this is a really critical application.
On the whole you can take some comfort that this doesn't actually happen often.
Assuming you only have one server program that opens that socket.
Then what about this (a code sketch of these steps follows below):
Exclusively create a file that contains the PID of the server process (maybe also the path of the socket)
If you succeed, then write your PID (and socket path) there and continue creating the socket.
If you fail, the socket was created before (most likely), but the server may be dead. Therefore read the PID from the file that exists, and then check whether such a process still exists (e.g. by using kill() with signal 0):
If a process exists, it may be the server process, or it may be an unrelated process
(More steps may be needed here)
If no such process exists, remove the file and begin trying to create it exclusively.
Whenever the process terminates, remove the file after having closed (and removed) the socket.
If you place the socket and the lock file both in a volatile filesystem (/tmp in older times, /run these days), then a reboot will most likely clear old sockets and lock files automatically.
Unless administrators like to play with kill -9, you could also establish a signal handler that tries to remove the lock file when receiving fatal signals.
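A minimal sketch of the lock-file part of those steps (the function name and path are placeholders, and the "more steps may be needed" case above, where a live process with that PID turns out to be unrelated, is not handled):

```c
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>

/* Try to take the exclusive PID lock file. Returns 0 on success, -1 if
 * another live process appears to hold it (or on unexpected errors). */
static int take_pid_lock(const char *lock_path)
{
    for (;;) {
        int fd = open(lock_path, O_WRONLY | O_CREAT | O_EXCL, 0644);
        if (fd >= 0) {
            /* We created it exclusively: record our PID and continue. */
            dprintf(fd, "%ld\n", (long)getpid());
            close(fd);
            return 0;
        }
        if (errno != EEXIST)
            return -1;

        /* The lock file already exists: check whether its owner is alive. */
        long pid = 0;
        FILE *f = fopen(lock_path, "r");
        if (f) {
            if (fscanf(f, "%ld", &pid) != 1)
                pid = 0;
            fclose(f);
        }
        if (pid > 0 && (kill((pid_t)pid, 0) == 0 || errno == EPERM))
            return -1;            /* some process with that PID still exists */

        /* Stale lock file: remove it and retry the exclusive create. */
        unlink(lock_path);
    }
}
```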

What causes the ENOTCONN error?

I'm currently maintaining some web server software and I need to perform a lot of I/O operations. The read(), write(), close() and shutdown() calls, when used on a socket, may sometimes raise an ENOTCONN error. What exactly does this error mean? What are the conditions that would trigger it? I can never seem to reproduce it locally but there are users who can.
Right now I just ignore ENOTCONN when raised by close() and shutdown() because it seems harmless, but I'm not entirely sure.
EDIT:
I am absolutely sure that the connect() call succeeded. I check for its return value.
ENOTCONN is most often raised by close() and shutdown(). I've only very rarely seen read() or write() raise ENOTCONN.
If you are sure that nothing on your side of the TCP connection is closing the connection, then it sounds to me like the remote side is closing the connection.
ENOTCONN, as others have pointed out, simply means that the socket is not connected. This doesn't necessarily mean that connect failed. The socket may well have been connected previously, it just wasn't at the time of the call that resulted in ENOTCONN.
This differs from:
ECONNRESET: the other end of the connection sent a TCP reset packet. This can happen if the other end is refusing a connection, or doesn't acknowledge that it is already connected, among other things.
ETIMEDOUT: this generally applies only to connect. This can happen if the connection attempt is not successful within a system-dependent amount of time.
EPIPE can sometimes be returned by some socket-related system calls under conditions that are more or less the same as ENOTCONN. For example, on some systems, EPIPE and ENOTCONN are synonymous when returned by send.
While it's not unusual for shutdown to return ENOTCONN, since this function is supposed to tear down the TCP connection, I would be surprised to see close return ENOTCONN. It really should never do that.
Finally, as dwc mentioned, EBADF shouldn't apply in your scenario unless you are attempting some operation on a file descriptor that has already been closed. Having a socket get disconnected (i.e. the TCP connection has broken) is not the same as closing the file descriptor associated with that socket.
It's because, at the moment you shutdown() the socket, you have data in the socket's buffer waiting to be delivered to the remote party, which has close()d or shutdown() its receiving socket.
I don't fully understand how sockets work; I am rather a noob, and I've failed to even find the files where this "shutdown" function is implemented. But seeing that there's practically no user manual for the whole sockets thing, I started trying all possibilities until I got the error in a "controlled" environment. It could be something else, but after much experimenting these are the explanations I settled on:
If you sent data after the remote side closed the connection, when you shutdown(), you get the error.
If you sent data before the remote side closed the connection but it wasn't received on the other end, you can shutdown() once; the next time you try to shutdown(), you get the error.
If you didn't send any data, you can shutdown() as many times as you want, as long as the remote side doesn't shutdown(); once the remote side has shutdown(), if you try to shutdown() and the socket was already shut down, you get the error.
I believe ENOTCONN is returned, because shutdown() is not supposed to return ECONNRESET or other more accurate errors.
It is wrong to assume that the other side "just" closed the connection. On the TCP level, the other side can only half-close a connection (or abort it). The connection is ordinarily fully closed if both sides do a shutdown() (or close()). If both sides do that, shutdown() actually succeeds for both of them!
The problem is that shutdown() did not succeed in ordinarily (half-)closing the connection, neither as the first one to close it, nor as the second one. From the errors listed in the POSIX docs for shutdown(), ENOTCONN is the least inappropriate, because the others indicate problems with the arguments passed to shutdown() (or local resource problems handling the request).
So what happened? These days, a NAT device somewhere between the two parties involved might have dropped the association and be sending out RESET packets as a reaction. Reset connections are so common for IPv4 that you will get them anywhere in your code, even masked as ENOTCONN in shutdown().
A coding bug might also be the reason. On a non-blocking socket, for example, a connect() can return 0 without indicating a successful connection yet.
Transport endpoint is not connected
The socket is associated with a connection-oriented protocol and has not been connected. This is usually a programming flaw.
From: http://www.wlug.org.nz/ENOTCONN
If you're sure you've connected properly in the first place, ENOTCONN is most likely to be caused by either the fd being closed on your end (perhaps in another thread?) while you're in the middle of a request, or by the connection dropping while you're in the middle of the request.
In any case, it means that the socket is not connected. Go ahead and clean up that socket. It's dead. No problem calling close() or shutdown() on it.