SOCK_DGRAM socket recv thread-safe? - sockets

Are UNIX SOCK_DGRAM sockets thread-safe with respect to recv()?
If multiple threads are calling recv() on the same socket, is each guaranteed to get one clean UDP packet, or is there a chance of the data getting mixed up?
Will the behavior be affected by whether the socket is in blocking or non-blocking mode? Any pointers to documentation would be highly appreciated.

Calling recv() from multiple threads is a safe operation. If the socket is a datagram socket, then each recv() returns one complete datagram that is not mixed up with other datagrams.
The POSIX standard explicitly enumerates all standard functions that need not be thread-safe:
2.9.1 Thread-Safety
All functions defined by this volume of POSIX.1-2017 shall be thread-safe, except that the following functions need not be thread-safe.
asctime() basename() catgets() crypt() ctime()
....
There are almost 100 such functions, plus further functions that are thread-safe only under certain conditions. recv() is not among them. See POSIX.1-2017 2.9.1 Thread-Safety.
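A minimal sketch of the idea, assuming an IPv4 UDP socket bound to the arbitrary example port 9000; several threads block in recv() on the same descriptor and each call hands back exactly one whole datagram:
/* Sketch: multiple threads recv()ing on one shared SOCK_DGRAM socket.
   Each recv() returns one complete datagram; datagrams are never split
   or interleaved between threads. Port 9000 is an arbitrary example. */
#include <netinet/in.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

static void *reader(void *arg)
{
    int fd = *(int *)arg;
    char buf[2048];
    for (;;) {
        ssize_t n = recv(fd, buf, sizeof buf, 0);  /* one datagram per call */
        if (n < 0) {
            perror("recv");
            break;
        }
        printf("thread %lu got a %zd-byte datagram\n",
               (unsigned long)pthread_self(), n);
    }
    return NULL;
}

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in addr = { 0 };
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(9000);
    bind(fd, (struct sockaddr *)&addr, sizeof addr);
    pthread_t tid[4];
    for (int i = 0; i < 4; i++)
        pthread_create(&tid[i], NULL, reader, &fd);
    for (int i = 0; i < 4; i++)
        pthread_join(tid[i], NULL);
    close(fd);
    return 0;
}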

When is a file descriptor not considered available for writing? [duplicate]

When, exactly, does the BSD socket send() function return to the caller?
In non-blocking mode, it should return immediately, correct?
As for blocking mode, the man page says:
When the message does not fit into the send buffer of the socket, send() normally blocks, unless the socket has been placed in non-blocking I/O mode.
Questions:
Does this mean that the send() call will always return immediately if there is room in the kernel send buffer?
Is the behavior and performance of the send() call identical for TCP and UDP? If not, why not?
Does this mean that the send() call will always return immediately if there is room in the kernel send buffer?
Yes. As long as "immediately" means after the data you provided has been copied into the kernel's buffer. Which, in some edge cases, may not be so immediate. For instance, if the pointer you pass in triggers a page fault that needs to pull the buffer in from a memory-mapped file or from swap, that adds significant delay before the call returns.
Is the behavior and performance of the send() call identical for TCP and UDP? If not, why not?
Not quite. Possible performance differences depend on the OS's implementation of the TCP/IP stack. In theory the UDP socket could be slightly cheaper, since the OS needs to do fewer things with it.
EDIT: On the other hand, since you can send much more data per system call with TCP, the cost per byte is typically a lot lower with TCP. This can be mitigated with sendmmsg() in recent Linux kernels.
As for the behavior, it's nearly identical.
For blocking sockets, both TCP and UDP will block until there's space in the kernel buffer. The distinction however is that the UDP socket will wait until your entire buffer can be stored in the kernel buffer, whereas the TCP socket may decide to only copy a single byte into the kernel buffer (typically it's more than one byte though).
If you try to send packets that are larger than 64kiB, a UDP socket will likely consistently fail with EMSGSIZE. This is because UDP, being a datagram socket, guarantees to send your entire buffer as a single IP packet (or train of IP packet fragments) or not send it at all.
Non-blocking sockets behave identically to the blocking versions, with the single exception that instead of blocking (when there's not enough space in the kernel buffer), the call fails with EAGAIN (or EWOULDBLOCK). When this happens, it's time to put the socket back into epoll/kqueue/select (or whatever you're using) to wait for it to become writable again.
As usual when working on POSIX, keep in mind that your call may fail with EINTR (if the call was interrupted by a signal). In this case you most likely want to call send() again.
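A minimal sketch of such a loop, assuming a non-blocking TCP socket and using poll() (rather than epoll/kqueue) to wait for writability; the helper name send_all is made up for illustration:
/* Sketch: send an entire buffer on a non-blocking socket, handling partial
   writes, EINTR, and EAGAIN/EWOULDBLOCK by waiting for POLLOUT. */
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

static int send_all(int fd, const char *buf, size_t len)
{
    size_t off = 0;
    while (off < len) {
        ssize_t n = send(fd, buf + off, len - off, 0);
        if (n >= 0) {
            off += (size_t)n;            /* TCP may accept only part of it */
        } else if (errno == EINTR) {
            continue;                    /* interrupted by a signal: retry */
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            struct pollfd p = { .fd = fd, .events = POLLOUT };
            if (poll(&p, 1, -1) < 0 && errno != EINTR)
                return -1;               /* wait until writable again */
        } else {
            return -1;                   /* real error */
        }
    }
    return 0;
}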
If there is room in the kernel buffer, then send() copies as many bytes as it can into the buffer and exits immediately, returning how many bytes were actually copied (which can be fewer than how many you requested). If there is no room in the kernel buffer, then send() blocks until either room becomes available or a timeout occurs (if one is configured).
The send() will return as soon as the data has been accepted by the kernel.
In case of a blocking socket: send() will block if the kernel buffer does not have enough free space to take the data provided to the send() call.
Non-blocking sockets: send() will not block; it either fails and returns -1 with errno set to EWOULDBLOCK or EAGAIN, or it returns the number of bytes copied, which may be only part of what you asked for (depending on the buffer space available). A failure means that at the time of the send() the buffer could not take all of the bytes, and you should wait with select() before calling send() again. Alternatively you could loop with a sleep() and call send() again, but either way you have to keep track of how many bytes were actually sent and how many remain to be sent.
Does this mean that the send() call will always return immediately if there is room in the kernel send buffer?
Shouldn't it? The moment after which the data "is sent" can be defined in different ways. I think it is the moment when the OS accepts your data for delivery onto its stack. Otherwise it's quite difficult to define: is it the moment when the data is transferred to the network card's buffer, or the moment when the data has been pushed out of the network card's buffer?
Is there a problem for which you need to know this for sure, or are you just curious?
Your presumption is correct. If there is room in the kernel send buffer, the kernel will copy the data into the send buffer and send() will return.

Boost socket async_send: how does it handle ewouldblock?

On Unix, send() on a non-blocking socket can fail with EWOULDBLOCK if the outbound socket buffer is full. In this case, one should call select() to determine when it's possible to retry. Do Boost sockets in non-blocking mode handle all of this for you?
Yes, it does. You can check for yourself, for example in boost/asio/detail/impl/socket_ops.ipp.

Mechanism of MSG_WAITALL in Berkeley socket

In Berkeley sockets, does the recv() function with the MSG_WAITALL flag set replace having to call read() multiple times until all of the requested data has been read?
I mean, does recv() read the whole block determined by the size argument in one call, whereas read() might return only part of the data block, so that I need to call it in a loop until the whole block is read?
Yes, MSG_WAITALL tells recv() to wait until all of the requested bytes have been read. However, it is only supported in blocking mode, not in non-blocking mode, and it only works on stream-oriented sockets like TCP. Even then, you still have to loop, for example on Linux if recv() gets interrupted by a signal and has to be called again to continue reading.
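A small sketch of both approaches on a blocking TCP socket; the helper names are made up for illustration:
/* Variant 1: ask the kernel to wait for the full amount. It can still
   return short on a signal or if the peer closes the connection. */
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

static ssize_t recv_exact_waitall(int fd, void *buf, size_t len)
{
    return recv(fd, buf, len, MSG_WAITALL);
}

/* Variant 2: loop manually until all requested bytes have arrived. */
static ssize_t recv_exact_loop(int fd, char *buf, size_t len)
{
    size_t got = 0;
    while (got < len) {
        ssize_t n = recv(fd, buf + got, len - got, 0);
        if (n == 0)
            return (ssize_t)got;         /* peer closed the connection */
        if (n < 0) {
            if (errno == EINTR)
                continue;                /* interrupted: keep reading */
            return -1;
        }
        got += (size_t)n;
    }
    return (ssize_t)got;
}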

How is it possible to have send timeout on a non blocking socket?

I have some problems understanding the working of sockets in Linux.
struct timeval timeout = { .tv_sec = 5, .tv_usec = 0 };
setsockopt(sockfd, SOL_SOCKET, SO_SNDTIMEO, &timeout, sizeof(timeout));
ssize_t written = write(sockfd, buf, len);
In the above code, since writes are buffered, a send timeout doesn't seem to make any sense (the write system call returns as soon as the user-space buffer has been copied into the kernel buffers). The send buffer size looks like the much more important parameter, while the send timeout seems to do nothing worthwhile. But I am certainly wrong, as I have seen quite a lot of code that uses SO_SNDTIMEO. How can user-space code time out using SO_SNDTIMEO, assuming that the receiver is very slow?
How is it possible to have send timeout on a non blocking socket?
It isn't. Timeouts are for blocking mode. A non-blocking send() won't block, and therefore cannot time out either.
I have seen a lot of code which uses SO_SNDTIMEO.
Not in non-blocking mode unless the code concerned is nonsense.
SO_SNDTIMEO is useful for a blocking socket. If the socket's buffer is full, send() can block, in which case it may be useful to use the SO_SNDTIMEO socket option. For non-blocking sockets, if the socket's buffer is full, send() will fail immediately, so there is no point in setting SO_SNDTIMEO on a non-blocking socket.
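A minimal sketch of the blocking-socket case, using an arbitrary 5-second timeout; the helper name send_with_timeout is made up for illustration:
/* Sketch: SO_SNDTIMEO on a blocking socket. The option takes a struct
   timeval; if the timeout expires while the send buffer is still full,
   send() fails with EAGAIN/EWOULDBLOCK (or returns a partial count if
   some data had already been copied). */
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

static ssize_t send_with_timeout(int sockfd, const void *buf, size_t len)
{
    struct timeval tv = { .tv_sec = 5, .tv_usec = 0 };
    if (setsockopt(sockfd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof tv) < 0)
        return -1;
    ssize_t n = send(sockfd, buf, len, 0);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        fprintf(stderr, "send timed out: receiver is too slow\n");
    return n;
}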

WinSock select() on listen()ing socket, non-blocking I/O?

When I do a select() on a listen()ing socket on Windows that is in non-blocking mode, do I get a read event or a write event when there is a connection pending?
read.
From MSDN:
The parameter readfds identifies the sockets that are to be checked for readability. If the socket is currently in the listen state, it will be marked as readable if an incoming connection request has been received such that an accept is guaranteed to complete without blocking.
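A minimal sketch of the pattern, written with BSD/POSIX headers (the Winsock version differs only in the includes and in using closesocket()); the helper name accept_with_select is made up for illustration:
/* Sketch: wait until the listening socket is reported readable, then
   accept() the pending connection without blocking. */
#include <stdio.h>
#include <sys/select.h>
#include <sys/socket.h>

static int accept_with_select(int listen_fd)
{
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(listen_fd, &readfds);
    /* A listening socket becomes readable once a connection is pending. */
    if (select(listen_fd + 1, &readfds, NULL, NULL, NULL) < 0) {
        perror("select");
        return -1;
    }
    if (FD_ISSET(listen_fd, &readfds))
        return accept(listen_fd, NULL, NULL);
    return -1;
}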