Understanding select() in socket programming

I am working on a game. The server and clients exchange messages over non-blocking UDP at a high rate.
(This may be the odd part...) The legacy code also uses select() with the timeout value set to 0, which means select() does not block. The select() call sits inside an infinite while loop: when select() returns a number greater than 0, the following code receives a message through recvfrom(); when it returns 0, the code does not try to receive.
From the print-out info, I saw that select() sometimes returns 1 (greater than 0). I am confused: since the timeout is set to 0, how does select() have time to check whether a message is ready to read on any of the readfds? Thanks.

According to the spec:
Upon successful completion, the pselect() or select() function shall
modify the objects pointed to by the readfds, writefds, and errorfds
arguments to indicate which file descriptors are ready for reading,
ready for writing, or have an error condition pending, respectively,
and shall return the total number of ready descriptors in all the
output sets. For each file descriptor less than nfds, the
corresponding bit shall be set upon successful completion if it was
set on input and the associated condition is true for that file
descriptor.
If none of the selected descriptors are ready for the requested operation, the pselect() or select() function shall block until at least one of the requested operations becomes ready, until the timeout occurs, or until interrupted by a signal.
The timeout parameter controls how long the pselect() or select() function shall take before timing out. If the timeout parameter is not a null pointer, it specifies a maximum interval to wait for the selection to complete. If the specified time interval expires without any requested operation becoming ready, the function shall return. If the timeout parameter is a null pointer, then the call to pselect() or select() shall block indefinitely until at least one descriptor meets the specified criteria. To effect a poll, the timeout parameter should not be a null pointer, and should point to a zero-valued timespec structure.
In short, checking the descriptors happens before checking the timeout. If a socket already has data ready when select() is called, the timeout is irrelevant and select() returns immediately with that descriptor marked as ready.
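Here is a minimal sketch of the pattern the question describes (the socket name and buffer size are assumptions, not from the original code):

#include <sys/select.h>
#include <sys/socket.h>

void poll_once(int sock)
{
    fd_set rfds;
    struct timeval tv = {0, 0};   /* zero timeout: select() returns at once */
    char buf[1500];

    FD_ZERO(&rfds);
    FD_SET(sock, &rfds);

    /* Readiness is checked before the timeout, so if a datagram is
       already queued, select() returns 1 despite the zero timeout. */
    if (select(sock + 1, &rfds, NULL, NULL, &tv) > 0 && FD_ISSET(sock, &rfds))
        recvfrom(sock, buf, sizeof(buf), 0, NULL, NULL);
}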

Related

Does select()-ing for ready-to-write fds cause a busy loop?

Usually we use select() to wait for a socket to become ready to read. However, if writefds is also passed to select(), then it will return immediately when fds are either readable or writable. The problem is that sockets are writable most of the time. Won't this cause a busy loop?
You should only wait for writability when you have something to write. The same goes for reading: wait for readability only when you are ready to receive data. Each successful writable check should be followed by a write(), and each successful readable check by a read(), as in the sketch below.
If you meet these criteria you can't introduce a busy-wait loop, because neither your output stream nor the socket buffer is infinite.
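A rough sketch of one iteration of such a loop, assuming a hypothetical outbuf/outlen pair holding the queued output (these names are not from the question):

#include <sys/select.h>
#include <unistd.h>

void loop_once(int sock, const char *outbuf, size_t *outlen)
{
    fd_set rfds, wfds;
    FD_ZERO(&rfds);
    FD_ZERO(&wfds);
    FD_SET(sock, &rfds);
    if (*outlen > 0)               /* watch writability only while there is
                                      queued output to send */
        FD_SET(sock, &wfds);

    if (select(sock + 1, &rfds, &wfds, NULL, NULL) <= 0)
        return;

    if (FD_ISSET(sock, &rfds)) {
        char buf[4096];
        read(sock, buf, sizeof(buf));          /* then process the data */
    }
    if (*outlen > 0 && FD_ISSET(sock, &wfds)) {
        ssize_t n = write(sock, outbuf, *outlen);
        if (n > 0)
            *outlen -= (size_t)n;  /* real code would also advance outbuf */
    }
}

Because the socket is only ever in wfds while *outlen > 0, a permanently writable socket cannot spin the loop.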

HttpURLConnection setReadTimeout

public void setReadTimeout (int timeoutMillis)
Sets the maximum time to wait for an input stream read to complete before giving up. Reading will fail with a SocketTimeoutException if the timeout elapses before data becomes available. The default value of 0 disables read timeouts; read attempts will block indefinitely.
Parameters
timeoutMillis - the read timeout in milliseconds. Non-negative.
What is the meaning of the info in bold characters? Is it good to include this in a network connection?
It is the maximum time a single read on the connection's input stream may block waiting for data after the request has been made.
The default value is good enough unless you want the call to return within a certain duration and you are sure the server never takes longer than that to respond.
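For comparison with the C sockets discussed elsewhere on this page, the analogous knob there is SO_RCVTIMEO; this is an illustrative sketch, not part of the Java API above:

#include <sys/socket.h>
#include <sys/time.h>

void set_read_timeout(int sock, int timeout_ms)
{
    /* After this, a blocking recv() that waits longer than timeout_ms
       fails with EAGAIN/EWOULDBLOCK.  A zero timeout (the default)
       means reads block indefinitely, mirroring setReadTimeout(0). */
    struct timeval tv;
    tv.tv_sec  = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}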

select() socket call CPU consumption

If I have the following code with a select() call, and suppose the socket fds are not ready for I/O most of the time, will the thread consume CPU, or will it sleep and let the other thread do its job? Is select() a CPU-intensive call?
while (1)
{
    select(maxfd, &rfds, NULL, NULL, NULL);
}
Will the behaviour be the same if the timeout is 0 (a kind of polling) instead of NULL?
If the timeout value is NULL, select() will block indefinitely until data is available on one of the sockets or file descriptors in rfds. However, as soon as data is available on any file descriptor in that set, the code you have will consume the entire CPU, since nothing in it drains the data off the socket. (The second call to select() will return immediately, indicating there is still data on the socket.)
If you pass in a zeroed-out timeval to select(), it becomes a non-blocking poll. It's comparable to calling send() or recv() with the MSG_DONTWAIT flag (but without any data being copied).
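For reference, a sketch of the loop with the missing pieces filled in (the single-socket setup is an assumption): rfds has to be rebuilt before every call, because select() overwrites it, and ready data has to be drained so select() does not keep waking up.

#include <sys/select.h>
#include <sys/socket.h>

void serve(int sock)
{
    for (;;) {
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(sock, &rfds);

        /* NULL timeout: the thread sleeps inside the kernel, consuming
           no CPU, until the descriptor becomes readable. */
        if (select(sock + 1, &rfds, NULL, NULL, NULL) > 0 &&
            FD_ISSET(sock, &rfds)) {
            char buf[4096];
            recv(sock, buf, sizeof(buf), 0);   /* drain it, or the next
                                                  select() returns at once */
        }
    }
}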

What is the benefit of using non-blocking sockets with the "select" function?

I'm writing a server in Linux that will have to support simultaneous read/write operations from multiple clients. I want to use the select function to manage read/write availability.
What I don't understand is this: Suppose I want to wait until a socket has data available to be read. The documentation for select states that it blocks until there is data available to read, and that the read function will not block.
So if I'm using select and I know that the read function will not block, why would I need to set my sockets to non-blocking?
There are cases where a socket is reported as ready, but by the time you get around to acting on it, its state has changed.
One good example is accepting connections. When a new connection arrives, a listening socket is reported as ready for read. By the time you get to call accept(), the connection might have been closed by the other side, before it ever sent anything and before we called accept(). The handling of this case is OS-dependent, but it's possible that accept() will simply block until a new connection is established, which will make our application wait for an indefinite amount of time, preventing the processing of other sockets. If your listening socket is in non-blocking mode, this won't happen: you'll get EWOULDBLOCK or some other error, but accept() will not block.
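A small sketch of that defensive pattern (listen_fd and the surrounding event loop are assumed):

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/socket.h>

void on_listen_ready(int listen_fd)
{
    /* Normally done once at setup: put the listening socket in
       non-blocking mode. */
    fcntl(listen_fd, F_SETFL, fcntl(listen_fd, F_GETFL, 0) | O_NONBLOCK);

    int client = accept(listen_fd, NULL, NULL);
    if (client < 0) {
        if (errno == EWOULDBLOCK || errno == EAGAIN)
            return;          /* the connection vanished; back to select() */
        perror("accept");
        return;
    }
    /* register client with the event loop */
}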
Some kernels used to have (I hope it's fixed by now) an interesting bug with UDP and select(). When a datagram arrives, select() wakes up with the socket carrying the datagram marked as ready for read. Validation of the datagram checksum is postponed until user code calls recvfrom() (or some other API capable of receiving UDP datagrams). When the code calls recvfrom() and the validating code detects a checksum mismatch, the datagram is simply dropped, and recvfrom() ends up blocking until the next datagram arrives. One of the patches fixing this problem (along with the problem description) can be found here.
Other than the kernel bugs mentioned by others, a different reason for choosing non-blocking sockets, even with a polling loop, is that they allow greater performance with fast-arriving data. Think about what happens when a blocking socket is marked as "readable". You have no idea how much data has arrived, so you can safely read it only once; then you have to get back to the event loop to have your poller check whether the socket is still readable. This means that for every single read from or write to the socket you make at least two system calls: the select() that tells you it's safe, and the read or write call itself.
With non-blocking sockets you can skip the unnecessary calls to select() after the first one. When select() flags a socket as readable, you can keep reading from it until it stops returning data, which allows faster processing of quick bursts.
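In code, draining a non-blocking socket after one select() wakeup might look like this (a sketch; sock is assumed to have O_NONBLOCK set):

#include <errno.h>
#include <unistd.h>

void drain(int sock)
{
    char buf[4096];
    for (;;) {
        ssize_t n = read(sock, buf, sizeof(buf));
        if (n > 0) {
            /* process n bytes, then try again without another select() */
            continue;
        }
        if (n == 0)
            break;                    /* peer closed the connection */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            break;                    /* buffer empty: back to select() */
        break;                        /* real error: report/handle it */
    }
}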
This is going to sound snarky, but it isn't. The best reason to make them non-blocking is so you don't block.
Think about it. select() tells you there is something to read, but you don't know how much. It could be 2 bytes, it could be 2,000. In most cases it is more efficient to drain whatever data is there before going back to select(). So you enter a while loop to read:
char buffer[200];
ssize_t n;
while (1)
{
    n = read(sock, buffer, sizeof(buffer));
    if (n <= 0)
        break;      /* error, EOF, or (if non-blocking) nothing left */
    /* process n bytes */
}
What happens on the last read when there is nothing left to read? If the socket isn't non-blocking you will block, thereby defeating (at least partially) the point of the select().
One of the benefits is that it will catch any programming errors you make: if you try to read from a socket that would normally block you, you'll get EWOULDBLOCK instead. For objects other than sockets, the exact API behaviour may change; see http://www.scottklement.com/rpg/socktut/nonblocking.html.

Do I need to poll nonblocking sockets for better performance?

I have a list of nonblocking sockets.
I could call recv() on each one (in this case, some calls will fail) or poll the list and later call recv() on the ready sockets.
Is there a performance difference between these approaches?
Thanks!
Unless the rate of data on the sockets is quite high (e.g., recv() will fail less than 25% of the time), using poll() or select() is almost always the better choice.
A modern operating system will intelligently block a poll() call until one of the fds in the set is ready: the kernel puts the thread to sleep on the whole set and wakes it only when one of those fds becomes ready. (In practice wakeups still happen more often than strictly necessary, so there is some residual busy-waiting, but it's far better than nothing.) A recv() loop, on the other hand, always busy-waits.
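For illustration, the poll()-based variant (the caller is assumed to have filled in fds/nfds with POLLIN in each events field):

#include <poll.h>
#include <sys/socket.h>

void wait_and_read(struct pollfd *fds, nfds_t nfds)
{
    /* The thread sleeps here; the kernel wakes it only when some fd
       becomes ready, so idle sockets cost no CPU. */
    if (poll(fds, nfds, -1) > 0) {
        for (nfds_t i = 0; i < nfds; i++) {
            if (fds[i].revents & POLLIN) {
                char buf[4096];
                recv(fds[i].fd, buf, sizeof(buf), 0);  /* then process */
            }
        }
    }
}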