Does recvmsg() that returns ENOBUFS also return the available messages? - sockets

I'm using NETLINK socket to receive NETLINK_ROUTE notifications in a user-space application.
I understand that ENOBUFS error is returned from recvmsg() when:
The user-space application is too slow to handle all the NETLINK messages that the kernel subsystem sends at a given rate.
The queue that is used to store messages that go from kernel to user-space is too small.
Now, I am sure that the second point does not apply in my case, since I am able to receive some notifications initially without any error.
After a period of time, I get the ENOBUFS error.
My question is, when recvmsg() returns ENOBUFS:
Will it also fill and return the available messages that are still in the socket buffer?
Or will it just return ENOBUFS?
Because, according to my understanding, if the socket buffer is full and NETLINK cannot write any more notifications onto the socket buffer, then it means there are still messages that need to be read from the socket.

Related

Very long block in send(), seems to be thread related, not TCP

I have an application whose main purpose is to transform an RTP stream into an HTTP stream. One thread receives RTP packets and writes them into a circular buffer, and another thread acts as a mini web server and answers HTTP requests by reading from that buffer (only one GET request can happen at a time).
This HTTP thread, once the GET has been received, is a simple loop that calls send() whenever there is something in the circular buffer. But sometimes send() blocks for an insane amount of time (more than 1 s), creating audio dropouts.
To be clear, RTP packets arrive in real time, with no overflow or underflow. The HTTP socket is blocking on purpose, as the receiver is expected to regulate its flow using TCP when it does not need audio (because it has enough in its own buffers). But the HTTP client is not overwhelmed by audio, as the RTP source is, again, just producing data in real time.
But obviously, something else happens and I've observed that on Linux, MacOS and Windows (the code works on all these) and on two different network topologies.
I'm wondering whether the long blocks in send() are due to something other than TCP flow control, such as something I'm missing about what happens when a thread blocks in send().
Get a wireshark trace so you can see where the TCP stall is happening. I suspect what is happening is any of the following:
You're actually sending faster than client is consuming. I think you've already ruled that out...
The more likely case is that an IP packet is getting lost and TCP is stuck waiting for the ACK, times out, and then retransmits. Meanwhile your sending thread is trying to stuff more data into the socket and it's getting backed up and eventually blocks.
One simple thing you can do is to try increasing the send buffer (SO_SNDBUF) on the socket you send with. This value specifies how many untransmitted bytes the app can write to the socket before blocking. And if possible, increase the receive buffer (SO_RCVBUF) on the client side. That way, if the network hiccups for a couple of seconds, your socket will take longer to fill up before blocking.
int size = 512 * 1024; /* requested send buffer size in bytes */
if (setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &size, sizeof(size)) < 0)
    perror("setsockopt(SO_SNDBUF)");
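The OS is free to round or cap the requested size, so it can be worth reading the value back. A minimal sketch, assuming a POSIX system; the helper name set_send_buffer is made up for illustration:

```c
#include <assert.h>
#include <sys/socket.h>

/* Hypothetical helper: request a send buffer of `bytes` and return the size
   the kernel reports afterwards, or -1 on error. */
static int set_send_buffer(int sock, int bytes)
{
    assert(bytes > 0);
    if (setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) < 0)
        return -1;

    int effective = 0;
    socklen_t len = sizeof(effective);
    if (getsockopt(sock, SOL_SOCKET, SO_SNDBUF, &effective, &len) < 0)
        return -1;

    /* May differ from the request: the OS can round it up or cap it. */
    return effective;
}
```

Calling set_send_buffer(sock, 512 * 1024) and logging the result shows whether the system actually granted the larger buffer.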

TCP connection for real time

I want to use a real-time TCP connection. I have a stream of data from a server that is received by a client, but the client is too slow to receive as fast as the sender sends, so the server buffers the data until it reaches the destination. For example, if I "produce" data at time t and the client is 10 times slower, then the data produced at time t will arrive at time 10t.
How can I make the server "drop" the data that can't reach the client in time and instead send the new data, which is expected to arrive on time?
P.S.: I know that UDP does this, but I want to do it with TCP.
I've done this sort of thing in the past, and got reasonably good results. Here's how I did it:
1) On the sending side, use setsockopt(SOL_SOCKET, SO_SNDBUF) to make the server's TCP socket's send buffer as small as you can get away with (since you can't drop data once it's already in the socket's send buffer, you want to keep as little data there as possible)
2) On the sending side, never proactively send() any outgoing data into the socket at all. Instead, write a function (we'll call it DumpCurrentStateToBuffer()) that writes the "current state" bytes (that you want to send to the client) into an in-memory buffer.
3) When the client's socket select()'s (or poll()'s, or whatever mechanism you use) as ready-for-write, call DumpCurrentStateToBuffer() to create a memory-buffer of bytes that are to be sent to the client. Now send that data to the client (if you're using blocking I/O you can do it synchronously, at the cost of potentially stalling your server until the data can be sent; OTOH if you're using non-blocking I/O, you may need to keep the memory-buffer and your current sent-bytes index into the buffer around as state variables, so you can keep sending more sub-chunks of the memory buffer over time, whenever the socket indicates that it can receive more bytes)
4) Once the memory-buffer's contents have been fully sent, you can free the memory buffer, and then wait for the socket to select as ready-for-write again; when it does, goto (3).
This technique doesn't solve all of TCP's non-real-time issues; for example, a dropped TCP packet will still have to be resent to the client. What it does do is guarantee that the server-to-client data backlog will never be more than one or two "states" long, because you never generate any new data unless/until there is at least some room in the socket's output buffer.
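Steps (2) to (4) could be sketched roughly as follows for the non-blocking case. This is only an illustration: the state_sender struct, the on_writable() function, and the body and signature of DumpCurrentStateToBuffer() are invented here, and MSG_DONTWAIT is assumed to be available (Linux and the BSDs).

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical stand-in for DumpCurrentStateToBuffer(): writes the latest
   state into buf and returns the number of bytes written. */
static size_t DumpCurrentStateToBuffer(char *buf, size_t cap, unsigned counter)
{
    int n = snprintf(buf, cap, "state %u\n", counter);
    if (n < 0)
        return 0;
    return ((size_t)n < cap) ? (size_t)n : cap - 1; /* truncated output */
}

struct state_sender {
    char   buf[256]; /* snapshot currently being sent */
    size_t len;      /* bytes in buf */
    size_t sent;     /* bytes already handed to the socket */
};

/* Call when select()/poll() reports the socket as ready-for-write.
   Returns 0 on success (possibly a partial send), -1 on a real error. */
static int on_writable(int sock, struct state_sender *s, unsigned counter)
{
    assert(s != NULL);
    if (s->sent == s->len) {
        /* Previous snapshot fully sent: only now generate a fresh one. */
        s->len  = DumpCurrentStateToBuffer(s->buf, sizeof(s->buf), counter);
        s->sent = 0;
    }
    ssize_t n = send(sock, s->buf + s->sent, s->len - s->sent, MSG_DONTWAIT);
    if (n < 0)
        return (errno == EAGAIN || errno == EWOULDBLOCK) ? 0 : -1;
    s->sent += (size_t)n;
    return 0;
}
```

A zero-initialized struct state_sender per client is enough to start; the event loop calls on_writable() each time the socket becomes writable, so stale states are never queued behind the one in flight.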

What happens when I write data to a blocking socket faster than the other side reads?

Suppose I write data really fast [I have all the data in memory] to a blocking socket.
Further suppose the other side reads data very slowly [like sleeping 1 second between each read].
What is the expected behavior on the writing side in this case?
Would the write operation block until the other side reads enough data, or will the write return an error like connection reset?
For a blocking socket, the send() call will block until all the data has been copied into the networking stack's buffer for that connection. It does not have to be received by the other side. The size of this buffer is implementation dependent.
Data is cleared from the buffer when the remote side acknowledges it. This is an OS thing and is not dependent upon the remote application actually reading the data. The size of this buffer is also implementation dependent.
When the remote buffer is full, it tells your local stack to stop sending. When data is cleared from the remote buffer (by being read by the remote application) then the remote system will inform the local system to send more data.
In both cases, small systems (like embedded systems) may have buffers of a few KB or smaller and modern servers may have buffers of a few MB or larger.
Once space is available in the local buffer, more data from your send() call will be copied. Once all of that data has been copied, your call will return.
You won't get a "connection reset" error (from the OS -- libraries may do anything) unless the connection actually does get reset.
So... It really doesn't matter how quickly the remote application is reading data until you've sent as much data as both local & remote buffer sizes combined. After that, you'll only be able to send() as quickly as the remote side will recv().
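One way to see how much data the buffers absorb before a blocking send() would stall is to write with a non-blocking flag until the kernel refuses more. A rough sketch, assuming MSG_DONTWAIT is available (Linux and the BSDs); fill_until_would_block is a made-up name:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Sends to `sock` without ever blocking until the kernel refuses more data.
   Returns the number of bytes accepted, or -1 on an unexpected error. */
static long fill_until_would_block(int sock)
{
    char chunk[4096];
    long total = 0;

    assert(sock >= 0);
    memset(chunk, 'x', sizeof(chunk));

    for (;;) {
        ssize_t n = send(sock, chunk, sizeof(chunk), MSG_DONTWAIT);
        if (n > 0) {
            total += n; /* copied into the local buffer, not necessarily read */
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return total; /* a blocking send() would start waiting here */
        return -1;
    }
}
```

With a peer that never reads, the returned total approximates the combined buffering available on the path; the exact figure depends on the implementation.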
The output (send) buffer fills up until it is full, and send() then blocks until enough of the buffer has been freed to enqueue the packet.
As the send manual page says:
When the message does not fit into the send buffer of the socket,
send() normally blocks, unless the socket has been placed in non-
blocking I/O mode.
Look at this: http://manpages.ubuntu.com/manpages/lucid/man2/send.2.html

What is the benefit of using non-blocking sockets with the "select" function?

I'm writing a server in Linux that will have to support simultaneous read/write operations from multiple clients. I want to use the select function to manage read/write availability.
What I don't understand is this: Suppose I want to wait until a socket has data available to be read. The documentation for select states that it blocks until there is data available to read, and that the read function will not block.
So if I'm using select and I know that the read function will not block, why would I need to set my sockets to non-blocking?
There might be cases when a socket is reported as ready but by the time you get to check it, it changes its state.
One good example is accepting connections. When a new connection arrives, a listening socket is reported as ready for read. By the time you get to call accept, the connection might have been closed by the other side before ever sending anything. Of course, the handling of this case is OS-dependent, but it's possible that accept will simply block until a new connection is established, which will cause our application to wait for an indefinite amount of time, preventing processing of other sockets. If your listening socket is in non-blocking mode, this won't happen: you'll get EWOULDBLOCK or some other error, but accept will not block.
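A sketch of what that can look like, assuming a POSIX system. The helper names (set_nonblocking, make_loopback_listener, try_accept) and the ACCEPT_NONE/ACCEPT_ERROR codes are invented for this illustration, and which errno values count as "nothing to accept" is a judgment call:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

enum { ACCEPT_NONE = -1, ACCEPT_ERROR = -2 };

/* Put a descriptor into non-blocking mode; 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) ? -1 : 0;
}

/* Hypothetical helper: TCP listener on 127.0.0.1, kernel-chosen port. */
static int make_loopback_listener(void)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel pick a free port */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Returns a connected descriptor, ACCEPT_NONE if no connection is pending
   (or the pending one went away), or ACCEPT_ERROR on any other failure. */
static int try_accept(int listen_fd)
{
    assert(listen_fd >= 0);
    int fd = accept(listen_fd, NULL, NULL);
    if (fd >= 0)
        return fd;
    if (errno == EAGAIN || errno == EWOULDBLOCK ||
        errno == ECONNABORTED || errno == EINTR)
        return ACCEPT_NONE; /* go back to select() instead of blocking */
    return ACCEPT_ERROR;
}
```

After select() reports the listener as readable, the loop calls try_accept() and simply returns to select() on ACCEPT_NONE.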
Some kernels used to have (I hope it's fixed now) an interesting bug with UDP and select. When a datagram arrives, select wakes up with the socket marked as ready for read. The datagram checksum validation is postponed until user code calls recvfrom (or some other API capable of receiving UDP datagrams). When the code calls recvfrom and the validating code detects a checksum mismatch, the datagram is simply dropped and recvfrom ends up blocked until the next datagram arrives. One of the patches fixing this problem (along with the problem description) can be found here.
Other than the kernel bugs mentioned by others, a different reason for choosing non-blocking sockets, even with a polling loop, is that it allows for greater performance with fast-arriving data. Think what happens when a blocking socket is marked as "readable". You have no idea how much data has arrived, so you can safely read it only once. Then you have to get back to the event loop to have your poller check whether the socket is still readable. This means that for every single read from or write to the socket you have to do at least two system calls: the select to tell you it's safe to read, and the reading/writing call itself.
With non-blocking sockets you can skip the unnecessary calls to select after the first one. When a socket is flagged as readable by select, you have the option of reading from it as long as it returns data, which allows faster processing of quick bursts of data.
This is going to sound snarky but it isn't. The best reason to make them non-blocking is so you don't block.
Think about it. select() tells you there is something to read but you don't know how much. Could be 2 bytes, could be 2,000. In most cases it is more efficient to drain whatever data is there before going back to select. So you enter a while loop to read:
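while (1)
{
    n = read(sock, buffer, 200);
    if (n > 0)
        continue; /* got n bytes: process them, then keep draining */
    break; /* n == 0: peer closed; n < 0: inspect errno (EWOULDBLOCK means drained) */
}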
What happens on the last read when there is nothing left to read? If the socket isn't non-blocking you will block, thereby defeating (at least partially) the point of the select().
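A fuller version of that drain loop might look like the sketch below, assuming a POSIX system. The names set_nonblocking and drain are illustrative, and the "process the data" step is left as a comment:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Put a descriptor into non-blocking mode; 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) ? -1 : 0;
}

/* Reads from a non-blocking socket until nothing is left.
   Returns the total number of bytes read, or -1 on a real error.
   *eof is set to 1 if the peer closed the connection. */
static long drain(int sock, int *eof)
{
    char buffer[200];
    long total = 0;

    assert(eof != NULL);
    *eof = 0;
    for (;;) {
        ssize_t n = read(sock, buffer, sizeof(buffer));
        if (n > 0) {
            total += n; /* a real program would process buffer[0..n) here */
            continue;
        }
        if (n == 0) {
            *eof = 1; /* orderly shutdown by the peer */
            return total;
        }
        if (errno == EINTR)
            continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total; /* drained: go back to select() */
        return -1;
    }
}
```

The last read() returns immediately with EAGAIN or EWOULDBLOCK instead of blocking, which is exactly the case the question above is about.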
One of the benefits is that it will catch any programming errors you make, because if you try to read a socket that would normally block, you'll get EWOULDBLOCK instead. For objects other than sockets, the exact API behaviour may differ; see http://www.scottklement.com/rpg/socktut/nonblocking.html.

Is a successful send() "atomic"?

Does a successful call to send() with the number returned equal to the amount specified in the size parameter guarantee that no "partial sends" will occur?
Or is there some way that the OS might be interrupted while servicing the system call, send part of the data, wait for a possibly long time, then send the rest and return without notifying me with a smaller return value?
I'm not talking about a case where there is not enough room in the kernel buffer; I realize that I would then get a smaller return value and have to try again.
Update:
Based on the answers so far, my question could be rephrased as follows:
Is there any way for packets/data to be sent over the wire before the call to send() returns?
Does a successful call to send() with the number returned equal to the amount specified in the size parameter guarantee that no "partial sends" will occur?
No, it's possible that part of your data gets passed over the wire while another part only goes as far as being copied into the internal buffers of the local TCP stack. send() returns the number of bytes passed to the local TCP stack, not the number of bytes that get passed onto the wire (and even if the data reaches the wire, it might not reach the peer).
Or is there some way that the OS might be interrupted while servicing the system call, send part of the data, wait for a possibly long time, then send the rest and return without notifying me with a smaller return value?
As send() only returns the number of bytes passed into the local TCP stack, not whether send() actually sent anything, you can't really distinguish these two cases anyway. But yes, it's possible that only some of the data makes it over the wire. Even if there's enough space in the local buffer, the peer might not have enough space. If you send 2 bytes but the peer only has room for 1 more byte, 1 byte might be sent and the other will reside in the local TCP stack until the peer has enough room again.
(That's an extreme example; most TCP stacks protect against sending such small segments of data at a time, but the same applies if you try to send 4k of data and the peer only has room for 3k.)
I'm not talking about a case where there is not enough room in the kernel buffer; I realize that I would then get a smaller return value and have to try again
That will only happen if your socket is non-blocking. If it's blocking and the local buffers are full, send() will wait until there's room in the local buffers again (or it might return a short count if part of the data was delivered but an error occurred in the meantime).
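Because a short count is possible, callers commonly wrap send() in a loop. A minimal sketch, assuming a blocking POSIX socket (on a non-blocking socket EAGAIN would surface as an error here); send_all is an illustrative name:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Keeps calling send() until all `len` bytes have been handed to the local
   stack. Returns 0 on success, -1 on error (errno is left as set by send()).
   Success says nothing about whether the peer has received the data. */
static int send_all(int sock, const void *buf, size_t len)
{
    const char *p = buf;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = send(sock, p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted before anything was copied: retry */
            return -1;
        }
        p   += n; /* short count: advance and send the remainder */
        len -= (size_t)n;
    }
    return 0;
}
```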
Edit to answer:
Is there any way for packets/data to be sent over the wire before the call to send() returns?
Yes. That might happen for many reasons.
e.g.
The local buffers get filled up by that recent send() call, and you use blocking I/O.
The TCP stack sends your data over the wire but decides to schedule other processes to run before the sending process returns from send().
Though this depends on the protocol you are using, the general answer is no.
For TCP the data gets buffered inside the kernel and then sent out at the discretion of the TCP packetization algorithm, which is pretty hairy - it keeps multiple timers, minds path MTU trying to avoid IP fragmentation.
For UDP you can only assume this kind of "atomicity" if your datagram fits into a single link frame (the usual maximum payload is 1472 bytes = 1500-byte Ethernet MTU - 20 bytes of IP header - 8 bytes of UDP header). Otherwise your sending host will have to IP-fragment the datagram.
Even then, intermediate routers can still IP-fragment the passing packet if their outgoing link MTU is less than the packet size.
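The arithmetic behind the 1472-byte figure, as a small sketch. It assumes IPv4 with no IP options (a 20-byte header); the function and constant names are illustrative:

```c
#include <assert.h>

enum {
    IPV4_HEADER_BYTES = 20, /* IPv4 header without options (assumption) */
    UDP_HEADER_BYTES  = 8
};

/* Largest UDP payload that fits into one IP packet for the given MTU
   without fragmentation at the sending host. */
static int max_udp_payload(int mtu)
{
    assert(mtu > IPV4_HEADER_BYTES + UDP_HEADER_BYTES);
    return mtu - IPV4_HEADER_BYTES - UDP_HEADER_BYTES;
}
```

For the usual Ethernet MTU of 1500 this gives 1472; a path with a smaller MTU lowers the limit accordingly.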