Why would one need to use `MSG_WAITALL` FLAG instead of `0` FLAG? Why to use it with UDP? - sockets

At some point when coding sockets one will face the receive-family of functions (recv, recvfrom, recvmsg).
This function accepts a FLAG argument, in which I see that the MSG_WAITALL is used in many examples on the web, such as this example on UDP.
Here is a definition of the MSG_WAITALL flag
MSG_WAITALL (since Linux 2.2)
This flag requests that the operation block until the full request is satisfied. However, the call may still return less data than requested if a signal is caught, an error or disconnect occurs, or the next data to be received is of a different type than that returned. This flag has no effect for datagram sockets.
Hence, my two questions:
Why would one need to use MSG_WAITALL FLAG instead of 0 FLAG? (Could someone explain a scenario of a problem for which the use of this would be the solution?)
Why to use it with UDP?

As the quoted man page mentions, MSG_WAITALL has no effect on UDP sockets, so there's no reason to use it there. Examples that do use it are probably confused and/or the result of several generations of cargo-cult/copy-and-paste programming. :)
For TCP, OTOH, the default behavior of recv() is to block until at least one byte of data can be copied into the user's buffer from the sockets incoming-data-buffer. The TCP stack will try to provide as many bytes of data as it can, of course, but in a case where the socket's incoming-data-buffer contains fewer bytes of data than the user has passed in to recv(), the TCP stack will copy as many bytes as it can, and return the byte-count indicating how many bytes it actually provided.
However, some people find would prefer to have their recv() call keep blocking until all of the bytes in their passed-in array have been filled in, regardless of how long that might take. For those people, the MSG_WAITALL flag provides a simple way to obtain that behavior. (The flag is not strictly necessary, since the programmer could always emulate that behavior by writing a while() loop that calls recv() multiple times as necessary, until all the bytes in the buffer have been populated... but it's provided as a convenience nonetheless)

Related

When is a file descriptor not considered available for writing? [duplicate]

When, exactly, does the BSD socket send() function return to the caller?
In non-blocking mode, it should return immediately, correct?
As for blocking mode, the man page says:
When the message does not fit into the send buffer of the socket, send() normally blocks, unless the socket has been placed in non-blocking I/O mode.
Questions:
Does this mean that the send() call will always return immediately if there is room in the kernel send buffer?
Is the behavior and performance of the send() call identical for TCP and UDP? If not, why not?
Does this mean that the send() call will always return immediately if there is room in the kernel send buffer?
Yes. As long as immediately means after the memory you provided it has been copied to the kernel's buffer. Which, in some edge cases, may not be so immediate. For instance if the pointer you pass in triggers a page fault that needs to pull the buffer in from either a memory mapped file or the swap, that would add significant delay to the call returning.
Is the behavior and performance of the send() call identical for TCP and UDP? If not, why not?
Not quite. Possible performance differences depends on the OS' implementation of the TCP/IP stack. In theory the UDP socket could be slightly cheaper, since the OS needs to do fewer things with it.
EDIT: On the other hand, since you can send much more data per system call with TCP, typically the cost per byte can be a lot lower with TCP. This can be mitigated with sendmmsg() in recent linux kernels.
As for the behavior, it's nearly identical.
For blocking sockets, both TCP and UDP will block until there's space in the kernel buffer. The distinction however is that the UDP socket will wait until your entire buffer can be stored in the kernel buffer, whereas the TCP socket may decide to only copy a single byte into the kernel buffer (typically it's more than one byte though).
If you try to send packets that are larger than 64kiB, a UDP socket will likely consistently fail with EMSGSIZE. This is because UDP, being a datagram socket, guarantees to send your entire buffer as a single IP packet (or train of IP packet fragments) or not send it at all.
Non blocking sockets behave identical to the blocking versions with the single exception that instead of blocking (in case there's not enough space in the kernel buffer), the calls fail with EAGAIN (or EWOULDBLOCK). When this happens, it's time to put the socket back into epoll/kqueue/select (or whatever you're using) to wait for it to become writable again.
As usual when working on POSIX, keep in mind that your call may fail with EINTR (if the call was interrupted by a signal). In this case you most likely want to call send() again.
If there is room in the kernel buffer, then send() copies as many bytes as it can into the buffer and exits immediately, returning how many bytes were actually copied (which can be fewer than how many you requested). If there is no room in the kernel buffer, then send() blocks until either room becomes available or a timeout occurs (if one is configured).
The send() will return as soon as the data has been accepted by the kernel.
In case of blocking socket: The send() will block if the kernel buffer is not free enough to intake the data provided to send() call.
Non blocking sockets: send() will not block, but would fail and returns -1 or it may return number of bytes copied partially(depending on the buffer space available). It sets the errno EWOULDBLOCK or EAGAIN. This means at that time of send(), the buffer was not able to intake all the bytes and you should try again with select() call to send() the data again. Or you could put a loop with a sleep() and call send(), but you have to take care of number of bytes actually sent and the remaining number of bytes that are to be sent.
Does this mean that the send() call
will always return immediately if
there is room in the kernel send
buffer?
Shouldn't it? The moment after which the data "is sent" can be defined differently. I think this is a moment when OS accepted your data for delivery on stack. Otherwise it's quite diffucult to define it. Is it a moment, when data is transmitted to network card buffer? Or after the moment when data is pushed out of network card buffer?
Is there any problem you need to know this for sure or you are just curious?
Your presumption is correct. If there is room in the kernel send buffer, the kernel will copy the data into the send buffer and send() will return.

How can I get the remote address from an incoming message on UDP listener socket?

Although it's possible to read from a Gio.Socket by wrapping it's file-descriptor in Gio.DataInputStream, using Gio.Socket.receive_from() in GJS to receive is not possible because as commented here:
GJS will clone array arguments before passing them to the C-code which will make the call to Socket.receive_from work and return the number of bytes received as well as the source of the packet. The buffer content will be unchanged as buffer actually read into is a freed clone.
Thus, input arguments are cloned and data will be written to the cloned buffer, not the instance of buffer actually passed in.
Although reading from a data stream is not a problem, Gio.Socket.receive_from() is the only way I can find to get the remote address from a UDP listener, since Gio.Socket.remote_address will be undefined. Unfortunately as the docs say for Gio.Socket.receive():
For G_SOCKET_TYPE_DATAGRAM [...] If the received message is too large to fit in buffer, then the data beyond size bytes will be discarded, without any explicit indication that this has occurred.
So if I try something like Gio.Socket.receive_from(new Uint8Array(0), null); just to get the address, the packet is swallowed, but if I read via the file-descriptor I can't tell where the message came from. Is there another non-destructive way to get the incoming address for a packet?
Since you’re using a datagram socket, it should be possible to use Gio.Socket.receive_message() and pass the Gio.SocketMsgFlags.PEEK flag to it. This isn’t possible for a stream-based socket, but you are not going to want the sender address for each read you do in that case.
If you want improved performance, you may be able to use Gio.Socket.receive_messages(), although I am not sure whether that’s completely introspectable at the moment.

Mechanism of MSG_WAITALL in Berkeley socket

In Berkeley socket, is recv function with MSG_WAITALL flag set, replaces having multiple read functions until the the whole data requested has been read?
I mean does recv function read the whole block determined by the size in one call, whereas the the read function might read part of the data block, and I need to call it multiple times in a loop until the whole block is read?
Yes, MSG_WAITALL tells recv() to wait until all of the requested bytes have been read. However, that it is only supported in blocking mode and not in non-blocking mode, and it only works on stream-oriented sockets, like TCP. Even then, you also still have to loop, such as on Linux if recv() gets interrupted by a signal and has to be called again to continue reading.

MSG_READALL is to recv() as ?? is to send()

From the recv(2) man page:
MSG_WAITALL
This flag requests that the operation block until the full request is satisfied. However, the call may still return less data than requested if a signal is caught, an error or disconnect occurs, or the next data to be received is of a different type than that returned.
It doesn't look like there's an equivalent flag for send(2), which strikes me as strange. Maybe send()s are guaranteed to always accept the whole buffer, but I don't see that written anywhere (and anyway, that seems unlikely to me).
Is there a way to tell send() to wait until it's sent the whole buffer before returning, equivalently to recv()'s MSG_WAITALL?
Edit: I understand that send() just copies data into a buffer in the operating system and that I can't force send() to put data onto the wire. My question is: Can I force send() to block until all the data I gave it has been copied into the OS's buffer?
You can't. send just offloads the buffer to the kernel, then returns.
To quote from the Unix standard:
The send() function shall initiate transmission of a message from the specified socket to its peer (...)
Successful completion of a call to send() does not guarantee delivery of the message.
Note the word "initiate". It doesn't actually send anything, rather tells the OS to send the message when it's ready for it (when its buffers are full or after some time has passed).
send(2) for TCP does not actually "send" anything on the wire, but places your bytes into the socket send buffer. It tells you how many bytes it was able to copy there in the return value.
Make the send buffer bigger (see setsockopt(2) and tcp(7)), pay attention to the syscall return value. In any case, TCP is a stream. You need to manage application-level protocol yourself.

Socket Protocol Fundamentals

Recently, while reading a Socket Programming HOWTO the following section jumped out at me:
But if you plan to reuse your socket for further transfers, you need to realize that there is no "EOT" (End of Transfer) on a socket. I repeat: if a socket send or recv returns after handling 0 bytes, the connection has been broken. If the connection has not been broken, you may wait on a recv forever, because the socket will not tell you that there's nothing more to read (for now). Now if you think about that a bit, you'll come to realize a fundamental truth of sockets: messages must either be fixed length (yuck), or be delimited (shrug), or indicate how long they are (much better), or end by shutting down the connection. The choice is entirely yours, (but some ways are righter than others).
This section highlights 4 possibilities for how a socket "protocol" may be written to pass messages. My question is, what is the preferred method to use for real applications?
Is it generally best to include message size with each message (presumably in a header), as the article more or less asserts? Are there any situations where another method would be preferable?
The common protocols either specify length in the header, or are delimited (like HTTP, for instance).
Keep in mind that this also depends on whether you use TCP or UDP sockets. Since TCP sockets are reliable you can be sure that you get everything you shoved into them. With UDP the story is different and more complex.
These are indeed our choices with TCP. HTTP, for example, uses a mix of second, third, and forth option (double new-line ends request/response headers, which might contain the Content-Length header or indicate chunked encoding, or it might say Connection: close and not give you the content length but expect you to rely on reading EOF.)
I prefer the third option, i.e. self-describing messages, though fixed-length is plain easy when suitable.
If you're designing your own protocol then look at other people's work first; there might already be something similar out there that you could either use 'as is' or repurpose and adjust. For example; ISO-8583 for financial txns, HTTP or POP3 all do things differently but in ways that are proven to work... In fact it's worth looking at these things anyway as you'll learn a lot about how real world protocols are put together.
If you need to write your own protocol then, IMHO, prefer length prefixed messages where possible. They're easy and efficient to parse for the receiver but possibly harder to generate if it is costly to determine the length of the data before you begin sending it.
The decision should depend on the data you want to send (what it is, how is it gathered). If the data is fixed length, then fixed length packets will probably be the best. If data can be easily (no escaping needed) split into delimited entities then delimiting may be good. If you know the data size when you start sending the data piece, then len-prefixing may be even better. If the data sent is always single characters, or even single bits (e.g. "on"/"off") then anything different than fixed size one character messages will be too much.
Also think how the protocol may evolve. EOL-delimited strings are good as long as they do not contain EOL characters themselves. Fixed length may be good until the data may be extended with some optional parts, etc.
I do not know if there is a preferred option. In our real-world situation (client-server application), we use the option of sending the total message length as one of the first pieces of data. It is simple and works for both our TCP and UDP implementations. It makes the logic reasonably "simple" when reading data in both situations. With TCP, the amount of code is fairly small (by comparison). The UDP version is a bit (understatement) more complex but still relies on the size that is passed in the initial packet to know when all data has been sent.