Determine how many bytes can be sent with winsock (FIONWRITE)?

With select I can determine if any bytes can be received or sent without blocking.
With this function I can determine how many bytes can be received:
function BytesAvailable(S: TSocket): Integer;
begin
  // FIONREAD reports how many bytes are queued for reading
  if ioctlsocket(S, FIONREAD, Result) = SOCKET_ERROR then
    Result := -1;
end;
Is there also a way to determine how many bytes can be sent?
That way, when I call send() with N bytes, I can be sure it will report exactly N bytes sent (or SOCKET_ERROR), not fewer because the send buffer is full.
FIONWRITE is not available for Winsock.

According to MVP Alexander Nickolov, there is no such facility in Windows. He also mentions that "good socket code" doesn't use FIONWRITE-like ioctls, but doesn't explain why.
To circumvent this issue, you could enable non-blocking I/O (using FIONBIO, I guess) on sockets you're interested in. That way, WSASend will succeed on such sockets when it can complete sending without blocking, or fail with WSAGetLastError() == WSAEWOULDBLOCK when the buffer is full (as stated in the documentation for WSASend):
WSAEWOULDBLOCK
Overlapped sockets: There are too many outstanding overlapped I/O requests. Nonoverlapped sockets: The socket is marked as nonblocking and the send operation cannot be completed immediately.
Also read further notes about this error code.
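For illustration, a minimal C sketch of that workaround (the helper name try_send is made up; s is assumed to be a connected TCP SOCKET, and error handling is trimmed):

#include <winsock2.h>

/* Sketch: enable non-blocking mode with FIONBIO, then treat
   WSAEWOULDBLOCK from send() as "send buffer full, retry later". */
int try_send(SOCKET s, const char *buf, int len)
{
    u_long nonblocking = 1;
    ioctlsocket(s, FIONBIO, &nonblocking);   /* switch to non-blocking */

    int sent = send(s, buf, len, 0);
    if (sent == SOCKET_ERROR && WSAGetLastError() == WSAEWOULDBLOCK)
        return 0;                            /* buffer full: nothing queued */
    return sent;                             /* may still be less than len */
}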

Winsock send() blocks only if the socket is running in blocking mode and the socket's outbound buffer fills up with queued data. If you are managing multiple sockets in the same thread, do not use blocking mode: if one receiver does not read data in a timely manner, it can cause all of the connections on that thread to be affected. Use non-blocking mode instead; send() will then report when a socket has entered a state where blocking would occur, and you can use select() to detect when the socket can accept new data again.

A better option is to use overlapped I/O or I/O Completion Ports instead. Submit outbound data to the OS and let the OS handle all of the waiting for you, notifying you when the data has eventually been accepted/sent. Do not submit new data for a given socket until you receive that notification. For scalability to a large number of connections, I/O Completion Ports are generally the better choice.
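As a rough sketch of the select()-based detection just described (the function name is mine, not from the answer):

#include <winsock2.h>

/* Sketch: after a non-blocking send() reports WSAEWOULDBLOCK, wait with
   select() until the socket can accept outbound data again.
   Returns 1 if writable, 0 on timeout, SOCKET_ERROR on failure. */
int wait_until_writable(SOCKET s, long timeout_sec)
{
    fd_set wfds;
    FD_ZERO(&wfds);
    FD_SET(s, &wfds);

    struct timeval tv;
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    return select(0, NULL, &wfds, NULL, &tv); /* first arg is ignored by Winsock */
}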


When, exactly, does the BSD socket send() function return to the caller?
In non-blocking mode, it should return immediately, correct?
As for blocking mode, the man page says:
When the message does not fit into the send buffer of the socket, send() normally blocks, unless the socket has been placed in non-blocking I/O mode.
Questions:
Does this mean that the send() call will always return immediately if there is room in the kernel send buffer?
Is the behavior and performance of the send() call identical for TCP and UDP? If not, why not?
Does this mean that the send() call will always return immediately if there is room in the kernel send buffer?
Yes. As long as "immediately" means "after the memory you provided has been copied to the kernel's buffer". Which, in some edge cases, may not be so immediate: for instance, if the pointer you pass in triggers a page fault that needs to pull the buffer in from either a memory-mapped file or swap, that will add significant delay to the call returning.
Is the behavior and performance of the send() call identical for TCP and UDP? If not, why not?
Not quite. Possible performance differences depend on the OS's implementation of the TCP/IP stack. In theory, a UDP socket could be slightly cheaper, since the OS needs to do fewer things with it.
EDIT: On the other hand, since you can send much more data per system call with TCP, the cost per byte is typically much lower with TCP. For UDP, this can be mitigated with sendmmsg() in recent Linux kernels.
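For reference, a minimal Linux sketch of batching UDP sends with sendmmsg() (the socket fd and the payloads are assumptions for this example):

#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Sketch: queue two datagrams with a single sendmmsg() call on a
   connected UDP socket. Returns the number of datagrams sent, or -1. */
int send_two_datagrams(int fd)
{
    char a[] = "first", b[] = "second";
    struct iovec iov[2] = {
        { .iov_base = a, .iov_len = sizeof a - 1 },
        { .iov_base = b, .iov_len = sizeof b - 1 },
    };

    struct mmsghdr msgs[2];
    memset(msgs, 0, sizeof msgs);
    msgs[0].msg_hdr.msg_iov = &iov[0];
    msgs[0].msg_hdr.msg_iovlen = 1;
    msgs[1].msg_hdr.msg_iov = &iov[1];
    msgs[1].msg_hdr.msg_iovlen = 1;

    return sendmmsg(fd, msgs, 2, 0);
}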
As for the behavior, it's nearly identical.
For blocking sockets, both TCP and UDP will block until there's space in the kernel buffer. The distinction, however, is that a UDP socket will wait until your entire buffer can be stored in the kernel buffer, whereas a TCP socket may decide to copy only a single byte into the kernel buffer (typically it's more than one byte, though).
If you try to send packets larger than 64 KiB, a UDP socket will likely consistently fail with EMSGSIZE. This is because UDP, being a datagram socket, guarantees to send your entire buffer as a single IP packet (or train of IP packet fragments), or not send it at all.
Non-blocking sockets behave identically to the blocking versions, with the single exception that instead of blocking (when there's not enough space in the kernel buffer), the calls fail with EAGAIN (or EWOULDBLOCK). When this happens, it's time to put the socket back into epoll/kqueue/select (or whatever you're using) and wait for it to become writable again.
As usual when working on POSIX, keep in mind that your call may fail with EINTR (if the call was interrupted by a signal). In this case you most likely want to call send() again.
If there is room in the kernel buffer, then send() copies as many bytes as it can into the buffer and exits immediately, returning how many bytes were actually copied (which can be fewer than how many you requested). If there is no room in the kernel buffer, then send() blocks until either room becomes available or a timeout occurs (if one is configured).
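A classic way to cope with such short writes on a blocking socket is a send-all loop; a minimal sketch (the helper name send_all is hypothetical):

#include <sys/socket.h>
#include <sys/types.h>

/* Sketch: loop until every byte has been handed to the kernel, since a
   single send() may copy fewer bytes than requested. */
ssize_t send_all(int fd, const char *buf, size_t len)
{
    size_t total = 0;
    while (total < len) {
        ssize_t n = send(fd, buf + total, len - total, 0);
        if (n < 0)
            return -1;          /* caller inspects errno */
        total += (size_t)n;
    }
    return (ssize_t)total;
}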
send() will return as soon as the data has been accepted by the kernel.
In the case of a blocking socket: send() will block if the kernel buffer does not have enough free space to take in the data passed to the call.
Non-blocking sockets: send() will not block, but will instead fail and return -1, or it may return the number of bytes that were partially copied (depending on the buffer space available), setting errno to EWOULDBLOCK or EAGAIN. This means that at the time of the send(), the buffer could not take in all the bytes, and you should use select() to determine when to send() the remaining data. Alternatively, you could loop with a sleep() and call send() again, but you have to keep track of the number of bytes actually sent and the number remaining to be sent.
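To make that bookkeeping concrete, here is a rough sketch of the retry pattern for a non-blocking POSIX socket (the function name is made up; error handling is abbreviated):

#include <errno.h>
#include <sys/select.h>
#include <sys/socket.h>

/* Sketch: track how much was actually queued; when the kernel buffer is
   full (EAGAIN/EWOULDBLOCK), wait with select() and retry the remainder. */
ssize_t send_remaining(int fd, const char *buf, size_t len)
{
    size_t total = 0;
    while (total < len) {
        ssize_t n = send(fd, buf + total, len - total, 0);
        if (n >= 0) {
            total += (size_t)n;               /* partial progress is normal */
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            fd_set wfds;
            FD_ZERO(&wfds);
            FD_SET(fd, &wfds);
            if (select(fd + 1, NULL, &wfds, NULL, NULL) < 0)
                return -1;                    /* wait until writable, then retry */
        } else if (errno != EINTR) {
            return -1;                        /* EINTR means just retry */
        }
    }
    return (ssize_t)total;
}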
Does this mean that the send() call will always return immediately if there is room in the kernel send buffer?
Shouldn't it? The moment after which the data "is sent" can be defined in different ways. I think it is the moment when the OS accepts your data for delivery on its stack. Otherwise it's quite difficult to define: is it the moment when the data is transmitted to the network card's buffer? Or the moment when the data is pushed out of the network card's buffer?
Is there a problem that requires you to know this for sure, or are you just curious?
Your presumption is correct. If there is room in the kernel send buffer, the kernel will copy the data into the send buffer and send() will return.

UDP socket gives error WSAETIMEDOUT

I have a call to sendto() for a UDP socket. Sometimes (not always) it blocks my application for ~2.5 seconds. When I check the return value of the sendto() call I get SOCKET_ERROR (-1), and WSAGetLastError() returns WSAETIMEDOUT (10060).
Why would a UDP socket timeout? Under what circumstances would sendto() be a blocking call?
Why would a UDP socket timeout?
It can happen if the socket is running in blocking mode (the default mode), and has a send timeout assigned to it.
Under what circumstances would sendto() be a blocking call?
Sockets are created in blocking mode by default. You have to explicitly request non-blocking behavior if you need it.
In blocking mode, a UDP socket can block if the kernel buffer fills up or if Winsock has to wait for a network event before completing the send. This is documented behavior:
sendto() function
When issuing a blocking Winsock call such as sendto, Winsock may need to wait for a network event before the call can complete. Winsock performs an alertable wait in this situation, which can be interrupted by an asynchronous procedure call (APC) scheduled on the same thread. Issuing another blocking Winsock call inside an APC that interrupted an ongoing blocking Winsock call on the same thread will lead to undefined behavior, and must never be attempted by Winsock clients.
...
If no buffer space is available within the transport system to hold the data to be transmitted, sendto will block unless the socket has been placed in a nonblocking mode. On nonblocking, stream oriented sockets, the number of bytes written can be between 1 and the requested length, depending on buffer availability on both the client and server systems. The select, WSAAsyncSelect or WSAEventSelect function can be used to determine when it is possible to send more data.
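For illustration, a minimal Winsock sketch showing both knobs discussed above: clearing the send timeout and enabling non-blocking mode (the helper name is mine):

#include <winsock2.h>

/* Sketch: two ways to keep a blocking sendto() from stalling.
   A send timeout of 0 means "never time out" (the default). */
void configure_udp_socket(SOCKET s)
{
    DWORD timeout_ms = 0;                    /* clear any send timeout */
    setsockopt(s, SOL_SOCKET, SO_SNDTIMEO,
               (const char *)&timeout_ms, sizeof timeout_ms);

    u_long nonblocking = 1;                  /* or go non-blocking entirely */
    ioctlsocket(s, FIONBIO, &nonblocking);
}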

UDP non-blocking socket on a real-time OS: sendto() and recvfrom() can return with partial message?

This is my first message here.
I'm working with a non-blocking UDP socket on a real-time OS (OnTime and VxWorks).
I have read the documentation and some forums, but I have some doubts about the 'atomicity' of the sendto() and recvfrom() functions:
sendto() returns the number of bytes enqueued, or an error. Is it possible that it's less than the input buffer length? Maybe the output buffer does not have enough free space and only a few bytes are enqueued...
recvfrom() returns the number of bytes received, or an error. Is it possible that it's less than the size of the message the source has sent? I mean a partial message read...
I hope the reading and writing functions are atomic (full message or no message read/written).
Thanks.
Emanuele.
I asked OnTime support and they told me that it's possible for sendto() to enqueue a partial message if the output buffer does not have enough free space. I don't know whether recvfrom() could also return a partial message in some cases. I suppose there's no standard behavior across socket implementations on different OSes.
sendto() returns the number of bytes enqueued, or an error. Is it possible that it's less than the input buffer length?
No. It's sent wholly, or not at all for UDP.
recvfrom() returns the number of bytes received, or an error. Is it possible that it's less than the size of the message the source has sent? I mean a partial message read...
If the buffers of the OS stack can't hold an entire UDP packet, it is dropped. If your application buffers can't hold the entire packet, you get the initial content of the packet.
I.e., you can read a partial message in just one case: when the data cannot fit in the buffer you pass to recvfrom(). In that case the rest of the packet is discarded. With recvmsg() you can detect whether the packet was truncated, but this is normally resolved either by using a maximum-sized buffer (a UDP datagram must fit in an IP packet, whose maximum size is 2^16 - 1 bytes) or by designing the protocol you carry inside UDP so that it sets its own reasonable maximum size.
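A minimal POSIX sketch of that recvmsg() truncation check (the buffer and the function name are illustrative):

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Sketch: receive one datagram and report whether it was truncated,
   using the MSG_TRUNC flag the kernel sets in msg_flags. */
ssize_t recv_datagram(int fd, char *buf, size_t len)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg;
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    ssize_t n = recvmsg(fd, &msg, 0);
    if (n >= 0 && (msg.msg_flags & MSG_TRUNC))
        fprintf(stderr, "datagram truncated: receive buffer too small\n");
    return n;
}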
I'm not really familiar with these systems, but I would be very surprised if they break normal UDP socket semantics, which is always to enqueue a full datagram on "send" and to dequeue a full single datagram on a "receive".

How is it possible to have send timeout on a non blocking socket?

I have some problems understanding the working of sockets in Linux.
struct timeval timeout = { .tv_sec = 5, .tv_usec = 0 };  /* 5 s is an example */
setsockopt(sockfd, SOL_SOCKET, SO_SNDTIMEO, &timeout, sizeof(timeout));
ssize_t written = write(sockfd, buf, len);
In the above code, since writes are buffered, the send timeout doesn't seem to make any sense (the write system call will return immediately once the user-space buffer has been copied into the kernel buffers). The send buffer size looks like the much more important parameter, while the send timeout seems to do nothing worthwhile. But I am certainly wrong, as I have seen quite a lot of code that uses SO_SNDTIMEO. How can user-space code time out using SO_SNDTIMEO, assuming the receiver is very slow?
How is it possible to have send timeout on a non blocking socket?
It isn't. Timeouts are for blocking mode. A non-blocking send() won't block, and therefore cannot time out either.
I have seen a lot of code which uses SO_SNDTIMEO.
Not in non-blocking mode unless the code concerned is nonsense.
SO_SNDTIMEO is useful for a blocking socket. If the socket's buffer is full, send() can block, in which case it may be useful to use the SO_SNDTIMEO socket option. For non-blocking sockets, if the socket's buffer is full, send() will fail immediately, so there is no point in setting SO_SNDTIMEO on a non-blocking socket.
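As a concrete Linux illustration, a sketch with an arbitrary 5-second timeout (per the socket(7) man page, an expired send timeout surfaces as EAGAIN or EWOULDBLOCK):

#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Sketch: blocking socket with a 5-second send timeout. If the kernel
   buffer stays full past the timeout, send() fails with EAGAIN or
   EWOULDBLOCK instead of blocking forever. */
int send_with_timeout(int fd, const char *buf, size_t len)
{
    struct timeval tv = { .tv_sec = 5, .tv_usec = 0 };
    if (setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof tv) < 0)
        return -1;

    ssize_t n = send(fd, buf, len, 0);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;        /* timed out before (all) data was queued */
    return (int)n;
}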

close() socket directly after send(): unsafe?

Is it wise/safe to close() a socket directly after the last send()?
I know that TCP is supposed to try to deliver all remaining data in the send buffer even after closing the socket, but can I really count on that?
I'm making sure that there is no remaining data in my receive buffer so that no RST will be sent following my close.
In my case, the close is actually the very last statement of code before calling exit().
Will the TCP stack really continue to try and transmit the data even after the process sending it has terminated? Is that as reliable as making close() itself wait for the data, by setting SO_LINGER with an arbitrary timeout?
That is, do the same TCP timeouts apply, or are they shorter? With a big send buffer and a slow connection, the time to actually transfer all the buffered data could be substantial, after all.
I'm not interested at all in being notified of the last byte sent; I just want them to eventually arrive at the remote host as reliably as possible.
Application layer acknowledgements are not an option (the protocol is HTTP, and I'm writing a small server).
I've been reading The ultimate SO_LINGER page, or: why is my tcp not reliable blog post a lot. I recommend you read it too. It discusses edge cases of large data transfers with regard to TCP sockets.
I'm not an expert on SO_LINGER, but in my server code (still in active development) I do the following:
After the last byte is sent via send(), I call shutdown(sock, SHUT_WR) to trigger a FIN to be sent.
Then I wait for a subsequent recv() call on that socket to return 0 (or for recv() to return -1 with errno set to anything other than EAGAIN/EWOULDBLOCK).
Then the server does a close() on the socket.
The assumption is that the client will close his socket first after it has received all the bytes of the response.
But I do have a timeout enforced between the final send() and when recv() indicates EOF. If the client never closes his end of the connection, the server will give up waiting and close the connection anyway. I'm at 45-90 seconds for this timeout.
All of my sockets are non-blocking and I use poll/epoll to be notified of connection events as a hint to see if it's time to try calling recv() or send() again.
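A simplified blocking-mode sketch of that close sequence (the real code described above is non-blocking, uses poll/epoll, and enforces a timeout around the drain loop):

#include <sys/socket.h>
#include <unistd.h>

/* Sketch: half-close, drain until the peer closes, then close. */
void graceful_close(int fd)
{
    char scratch[4096];

    shutdown(fd, SHUT_WR);                    /* send FIN: we write no more */

    /* Read until EOF (0) or error, discarding anything the peer sends. */
    while (recv(fd, scratch, sizeof scratch, 0) > 0)
        ;

    close(fd);                                /* peer has seen our FIN */
}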
Application layer acknowledgements are not an option (the protocol is HTTP, and I'm writing a small server).
The HTTP protocol doesn't suffer from this problem. An HTTP server is not supposed to close the connection in any normal operation. The client closes it after recv(), and it knows exactly how many bytes it expects.
And just to be clear, the answer is "no".
Yes, it is safe to call close() immediately after send(). The kernel will send out all the data in the buffer and wait for ACKs, then FIN the socket gracefully.