Unexpected WSA_IO_PENDING from blocking (with overlapped I/O attribute) Winsock2 calls - sockets

Short version:
I get WSA_IO_PENDING from blocking socket API calls. How should I handle it? The socket has the overlapped I/O attribute and a timeout set.
Long version:
Platform: Windows 10. Visual Studio 2015
A socket is created in a very traditional simple way.
s = ::socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
The socket has the overlapped I/O attribute enabled by default. This can be verified with getsockopt / SO_OPENTYPE.
I do need the overlapped attribute because I want to use the timeout feature, e.g. SO_SNDTIMEO.
I use the socket only in a blocking (i.e., synchronous) manner.
The socket read operation runs only within a single thread.
The socket write operation can be performed from different threads, synchronized with a mutex.
The socket has the timeout and keep-alive features enabled with...
::setsockopt(s, SOL_SOCKET, SO_RCVTIMEO, ...);
::setsockopt(s, SOL_SOCKET, SO_SNDTIMEO, ...);
::WSAIoctl(s, SIO_KEEPALIVE_VALS, ...);
The socket operations are done with
::send(s, sbuffer, ssize, 0); and
::recv(s, rbuffer, rsize, 0);
I also tried using WSARecv and WSASend with both lpOverlapped and lpCompletionRoutine set to NULL.
[MSDN] ... If both lpOverlapped and lpCompletionRoutine are NULL, the socket in
this function will be treated as a non-overlapped socket.
::WSARecv(s, &dataBuf, 1, &nBytesReceived, &flags, NULL/*lpOverlapped*/, NULL/*lpCompletionRoutine*/)
::WSASend(s, &dataBuf, 1, &nBytesSent, 0, NULL/*lpOverlapped*/, NULL/*lpCompletionRoutine*/)
The Problem:
These blocking send / recv / WSARecv / WSASend calls return an error with the WSA_IO_PENDING error code!
Questions:
Q0: Is there any reference on the overlapped attribute combined with blocking calls and a timeout?
How does it behave when I have a socket with the overlapped "attribute" plus the timeout feature enabled, and just use the blocking socket API with "non-overlapped I/O semantics"?
I could not find any reference about this yet (e.g. on MSDN).
Q1: Is this expected behavior?
I observed this issue (getting WSA_IO_PENDING) after migrating code from Windows XP / Windows 7 to Windows 10.
Here is the client code (note: assert is not used in the real code; it just indicates here that the corresponding error would be handled and a faulty socket would stop the procedure):
auto s = ::socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
assert(s != INVALID_SOCKET);
// On Windows, SO_RCVTIMEO / SO_SNDTIMEO take a DWORD in milliseconds (not a struct timeval)
DWORD timeout = 1500;
assert(::setsockopt(s, SOL_SOCKET, SO_RCVTIMEO, (const char*)&timeout, sizeof(timeout)) != SOCKET_ERROR);
assert(::setsockopt(s, SOL_SOCKET, SO_SNDTIMEO, (const char*)&timeout, sizeof(timeout)) != SOCKET_ERROR);
struct tcp_keepalive
{
unsigned long onoff;
unsigned long keepalivetime;
unsigned long keepaliveinterval;
} heartbeat;
heartbeat.onoff = (unsigned long)true;
heartbeat.keepalivetime = (unsigned long)3000;
heartbeat.keepaliveinterval = (unsigned long)3000;
DWORD nob = 0;
assert(0 == ::WSAIoctl(s, SIO_KEEPALIVE_VALS, &heartbeat, sizeof(heartbeat), 0, 0, &nob, 0, 0));
SOCKADDR_IN connection;
connection.sin_family = AF_INET;
connection.sin_port = ::htons(port);
connection.sin_addr.s_addr = ip;
assert(::connect(s, (SOCKADDR*)&connection, sizeof(connection)) != SOCKET_ERROR);
char buffer[100];
int receivedBytes = ::recv(s, buffer, 100, 0);
if (receivedBytes > 0)
{
// process buffer
}
else if (receivedBytes == 0)
{
// peer shutdown
// we will close socket s
}
else if (receivedBytes == SOCKET_ERROR)
{
const int lastError = ::WSAGetLastError();
switch (lastError)
{
case WSA_IO_PENDING:
//.... I get the error!
break;
default:
break;
}
}
Q2: How should I handle it?
Ignore it, or just close the socket as in the usual error case?
From my observation, once I get WSA_IO_PENDING, if I just ignore it the socket eventually becomes unresponsive.
Q3: How about WSAGetOverlappedResult?
Does it make any sense here?
What WSAOVERLAPPED object should I pass, given that there is none that I use for all those blocking socket calls?
I have tried creating a new, empty WSAOVERLAPPED and using it to call WSAGetOverlappedResult. It eventually returns success with 0 bytes transferred.

Q3: How about WSAGetOverlappedResult?
With [WSA]GetOverlappedResult you can only use the pointer to the WSAOVERLAPPED that was passed to the I/O request; using any other pointer is meaningless. WSAGetOverlappedResult gets all of its information about the I/O operation from lpOverlapped: the final status, the number of bytes transferred, and, if it needs to wait, the event stored in that overlapped structure. In general terms: every I/O request passes an OVERLAPPED (really an IO_STATUS_BLOCK) pointer to the kernel, and the kernel writes directly to that memory (final status and, usually, bytes transferred). Because of this, the OVERLAPPED must stay valid until the I/O completes, and it must be unique for every outstanding I/O request. [WSA]GetOverlappedResult inspects that OVERLAPPED (IO_STATUS_BLOCK) memory: it first looks at the status; if it is anything other than STATUS_PENDING, the operation has completed, so the API reads the number of bytes transferred and returns. If the status is still STATUS_PENDING, the I/O has not completed yet; if we asked it to wait, the API waits on the hEvent from the overlapped structure. That event handle was passed to the kernel with the I/O request and is set to the signaled state when the I/O finishes; waiting on any other event would be pointless, because it has no relation to that specific I/O request. This should make clear why we can call [WSA]GetOverlappedResult only with exactly the overlapped pointer that was passed to the I/O request.
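To make the pairing concrete, here is a minimal sketch of a genuinely overlapped WSARecv where WSAGetOverlappedResult is given exactly the WSAOVERLAPPED that was passed to the I/O request (overlappedRecv is a hypothetical helper; WSAStartup and creating the socket with WSA_FLAG_OVERLAPPED are assumed to happen elsewhere):

#include <winsock2.h>

// Receive into 'data' using an explicitly overlapped WSARecv (sketch only).
int overlappedRecv(SOCKET s, char* data, ULONG size)
{
    WSAOVERLAPPED ov = {};              // must stay valid until the I/O completes
    ov.hEvent = ::WSACreateEvent();     // signaled by the kernel when the I/O finishes

    WSABUF wsaBuf{ size, data };
    DWORD flags = 0, bytes = 0;

    int rc = ::WSARecv(s, &wsaBuf, 1, &bytes, &flags, &ov, nullptr);
    if (rc == 0)                        // completed immediately
    {
        ::WSACloseEvent(ov.hEvent);
        return (int)bytes;
    }
    if (::WSAGetLastError() == WSA_IO_PENDING)
    {
        // Still in flight: wait on the SAME overlapped structure passed to WSARecv.
        DWORD transferred = 0;
        BOOL ok = ::WSAGetOverlappedResult(s, &ov, &transferred, TRUE /*fWait*/, &flags);
        ::WSACloseEvent(ov.hEvent);
        return ok ? (int)transferred : SOCKET_ERROR;
    }
    ::WSACloseEvent(ov.hEvent);
    return SOCKET_ERROR;
}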
If we do not pass an OVERLAPPED pointer ourselves (for example, when we use recv or send), the low-level socket API allocates an OVERLAPPED itself, as a local variable on its own stack, and passes that pointer to the I/O. As a result, the API cannot return until the I/O has finished, because the overlapped memory must remain valid until the I/O completes (on completion, the kernel writes data to that memory), and a local variable becomes invalid once we leave the function. So the function must wait in place.
Because of all this, we cannot call [WSA]GetOverlappedResult after send or recv: first, we simply have no pointer to the overlapped structure; second, the OVERLAPPED used in the I/O request has already been "destroyed" (more exactly, it now lies below the top of the stack, in trash space). If the I/O has not actually completed, the kernel will modify data at a random place in the stack; when it finally completes, the effect is unpredictable, anywhere from nothing happening at all to a crash or very unusual side effects. If send or recv returns before the I/O has completed, that has a fatal effect on the process. This must never happen (unless there is a bug in Windows).
Q2: How should I handle it?
As I tried to explain, if WSA_IO_PENDING is really returned by send or recv, this is a system bug. The benign case is that the I/O was actually completed by the device with that status (even though it must not be): then it is simply an unknown (for this situation) error code, and you can handle it like any general error; no special processing is required (unlike the case of asynchronous I/O). But if the I/O really has not yet completed when send or recv returns, your stack can be corrupted at a random time (it may already have been). The effect of this is unpredictable, and nothing can be done about it; this is a critical system error.
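In other words, the error branch of the blocking recv shown in the question needs no WSA_IO_PENDING-specific logic; a minimal sketch, assuming a faulty socket is simply closed, might look like this (handleSocketError is a hypothetical helper):

// Sketch: after a blocking recv()/send() fails, treat WSA_IO_PENDING like any other
// unexpected error rather than giving it special asynchronous-I/O handling.
void handleSocketError(SOCKET& s)
{
    const int lastError = ::WSAGetLastError();
    switch (lastError)
    {
    case WSAETIMEDOUT:
        // SO_RCVTIMEO / SO_SNDTIMEO expired before the operation completed
        break;
    case WSA_IO_PENDING:   // should never be seen from a blocking call
    default:
        ::closesocket(s);  // generic error path: drop the faulty socket
        s = INVALID_SOCKET;
        break;
    }
}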
Q1: Is this expected behavior?
No, this is absolutely not expected.
Q0: Is there any reference on the overlapped attribute combined with blocking calls and a timeout?
First of all, when we create a file handle we either set or do not set the asynchronous attribute on it: with CreateFileW this is FILE_FLAG_OVERLAPPED, with WSASocket it is WSA_FLAG_OVERLAPPED, and with NtOpenFile or NtCreateFile it is FILE_SYNCHRONOUS_IO_[NON]ALERT (which has the reverse effect compared to FILE_FLAG_OVERLAPPED). All of this information is stored in FILE_OBJECT.Flags: FO_SYNCHRONOUS_IO ("the file object is opened for synchronous I/O") is either set or clear.
The effect of the FO_SYNCHRONOUS_IO flag is this: the I/O subsystem calls a driver via IofCallDriver, and if the driver returns STATUS_PENDING and FO_SYNCHRONOUS_IO is set in the FILE_OBJECT, the I/O subsystem waits in place (so in the kernel) until the I/O completes. Otherwise it returns STATUS_PENDING to the caller, which can then wait in place itself, or receive a callback via an APC or an IOCP.
When we use socket(), it internally calls WSASocket, and:
The socket that is created will have the overlapped attribute as a default
This means the file will not have the FO_SYNCHRONOUS_IO attribute, so low-level I/O calls can return STATUS_PENDING from the kernel. Now let's look at how recv actually works:
Internally, WSPRecv is called with lpOverlapped = 0. Because of this, WSPRecv itself allocates an OVERLAPPED on the stack, as a local variable, before making the actual I/O request via ZwDeviceIoControlFile. Because the file (socket) was created without the FO_SYNCHRONOUS_IO flag, STATUS_PENDING is returned from the kernel. In this case WSPRecv checks whether lpOverlapped == 0; if so, it cannot return until the operation completes, so it starts waiting on an event (maintained internally in user mode for this socket) via SockWaitForSingleObject, which calls ZwWaitForSingleObject. The timeout used here is the value you associated with the socket via SO_RCVTIMEO, or 0 (infinite wait) if you did not set SO_RCVTIMEO. If ZwWaitForSingleObject returns STATUS_TIMEOUT (which can only happen if you set a timeout via SO_RCVTIMEO), the I/O operation did not finish in the expected time. In that case WSPRecv calls SockCancelIo (same effect as CancelIo). CancelIo must not return (it waits) until all I/O requests on the file from the current thread have completed. After this, WSPRecv reads the final status from the overlapped structure. It should be STATUS_CANCELLED (although really the specific driver decides with which status a cancelled IRP is completed). WSPRecv converts STATUS_CANCELLED to STATUS_IO_TIMEOUT, and then calls NtStatusToSocketError to convert the NTSTATUS code to a Win32 error; STATUS_IO_TIMEOUT is converted to WSAETIMEDOUT. But if the overlapped still contained STATUS_PENDING after CancelIo, you would get WSA_IO_PENDING, and only in this case. This looks like a device (driver) bug, but I cannot reproduce it on my own Windows 10 (maybe the exact version plays a role).
What can be done here (if you are sure you really get WSA_IO_PENDING)? First, try WSASocket without WSA_FLAG_OVERLAPPED; in this case ZwDeviceIoControlFile never returns STATUS_PENDING and you should never get WSA_IO_PENDING. Check whether the error goes away. If it does, restore the overlapped attribute and remove the SO_RCVTIMEO call instead (all of this is for testing, not a solution for a release product) and check whether the error is gone then. If it is, it looks like the device cancels the IRP incorrectly (leaving STATUS_PENDING?!). The point of all this is to locate the error more precisely. In any case, it would be interesting to build a minimal demo exe that reproduces this situation reliably and to test it on other systems: does it persist? Only on specific versions? If it cannot be reproduced on other machines, you will need to debug on your specific one.
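For that first test, here is a sketch of creating the same TCP socket explicitly without the overlapped attribute (WSASocketW with dwFlags = 0 instead of WSA_FLAG_OVERLAPPED); plain socket() cannot be used for this, since it always sets the overlapped attribute:

#include <winsock2.h>

// Test suggestion: create the socket WITHOUT the overlapped attribute and see
// whether WSA_IO_PENDING still occurs with the same blocking send/recv code.
SOCKET createNonOverlappedTcpSocket()
{
    return ::WSASocketW(AF_INET, SOCK_STREAM, IPPROTO_TCP,
                        nullptr /*lpProtocolInfo*/, 0 /*group*/, 0 /*dwFlags*/);
}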

Related

How to recover from EAGAIN error on a non-blocking socket.send?

I'm writing a simple script for a LoPy4 module. The script is supposed to send a PING message at an interval of X. I'm doing this using socket.send() with a non-blocking socket, but every once in a while I get an EAGAIN error.
Comparing the size of the data that is supposed to be sent with the return value of the send() method shows that everything was sent. So I guess the buffer is supposed to be empty, and still I get this error, which takes approximately 10 seconds to recover from (or a restart of the LoPy).
How can I recover from (or avoid) this error, skipping just one interval?
s = socket.socket(socket.AF_LORA, socket.SOCK_RAW)
s.setblocking(False)
while True:
    try:
        bytes = s.send(MY_DATA)
    except OSError as e:
        if e.args[0] == 11:
            print('EAGAIN error occurred, skipping')
            continue
    time.sleep(INTERVAL)

Discussion about select()

There are some points that I can't understand about select() and I would appreciate your guidance. As I read about this function, I found that:
The select() function gives you a way to simultaneously check
multiple sockets to see if they have data waiting to be recv()d, or if
you can send() data to them without blocking, or if some exception has
occurred.
1) The first thing I understood is that this function can check the sockets in parallel. Now imagine that sock1 and sock2 receive packets at exactly the same time (packet1 from sock1 and packet2 from sock2), and some processing has to be done on each packet. Is the processing of the packets done in parallel, or will packet1 be processed and then packet2? (For example, in the following code.)
int rv = select(maxSd, &readfds, NULL, NULL, NULL);
if (rv == -1) {
perror("select"); // error occurred in select()
} else if (rv == 0) {
printf("Timeout occurred! No data after 10.5 seconds.\n");
} else {
// one or both of the descriptors have data
if (FD_ISSET(sock1, &readfds)) {
printf("socket %i RECEIVED A PACKET \n", sock1);
recvlen = recvfrom(sock1, buf, BUFSIZE, 0, (struct sockaddr *)&remaddr1, &addrlen1);
if (recvlen > 0) {
buf[recvlen] = 0;
printf("received message: \"%s\" (%d bytes)\n", buf, recvlen);
Packet mp;
mp.de_packet((unsigned char *)buf,recvlen);
}
else {
printf("uh oh - something went wrong!\n");
}
}
if (FD_ISSET(sock2, &readfds)) {
printf("socket %i RECEIVED A PACKET \n", sock2);
recvlen2 = recvfrom(sock2, buf2, BUFSIZE, 0, (struct sockaddr *)&remaddr2, &addrlen2);
if (recvlen2 > 0) {
buf2[recvlen2] = 0;
printf("received message2: \"%s\" (%d bytes)\n", buf2, recvlen2);
Packet mp;
mp.de_packet((unsigned char *)buf2,recvlen2);
}
else
printf("uh oh - something went wrong2!\n");
}
}
2) My other doubt about select is related to blocking and non-blocking.
What exactly does blocking mean? Does it mean that the program stops on this line until an event occurs?
I think that to avoid blocking it is possible to use a timeval tv or fcntl(). Is there a better way?
Thanks in advance
Upon return from select, provided it didn't return 0 or -1, your program needs to loop over all elements of readfds and evaluate FD_ISSET; if it is set, the corresponding socket must be processed. So your code is also correct, supposing only sock1 and sock2 were set in readfds. The evaluation of the sockets in readfds is usually done sequentially by the same thread. The packets on each socket can then be processed sequentially or in parallel. It must be clear that two sockets are totally independent of each other, so there is no possibility of a race condition. All this depends on how you program it. For example, for each socket for which FD_ISSET returns true you can spawn a thread that processes it, or you can pass it to a work queue for a set of worker threads to process each one in parallel. There is no limitation of any kind. You could even check readfds in parallel; for example, you could have one thread checking the lower half of the set and another thread checking the upper half. This is just an example. Again, there is no limitation provided you program it well, without generating any race conditions in your application.
Regarding the concept of blocking or non-blocking, select will always block until a socket in the sets has an event to process (read, write, exception) or there is a timeout (if you set the timeout value).
You could also be talking about blocking and non-blocking sockets, which is different. Blocking sockets are those that can block in a read or write operation. A blocking socket will block in a read operation until there is a byte ready to be read, and it will block in a write operation if the send buffer is full and it cannot write the bytes in the buffer (this may happen with STREAM sockets); it will block until it can write its bytes. A non-blocking socket will not block in a read operation if there is nothing to read: the read function will return -1 and errno will be set to EAGAIN or EWOULDBLOCK (see: http://man7.org/linux/man-pages/man2/read.2.html).
select is usually used with non-blocking sockets so that a thread just blocks there until there is a socket ready to be processed. This is good because otherwise your application would need to be polling the non-blocking sockets all the time, which is not efficient.
select will handle all your sockets in parallel, but just to check whether there is any event. select does not process any packet; if you pay attention to your example, after select returns, your application reads the data from the sockets, and this can be done sequentially or in parallel.
I hope this explanation helps you understand the concept.
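To illustrate both points, here is a minimal sketch (assuming sockfd is an already connected socket) that makes the socket non-blocking with fcntl() and lets select() do the blocking, with a timeout:

#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <stdio.h>

/* Make the socket non-blocking, then block only in select(), with a 10.5 s timeout. */
void wait_and_read(int sockfd)
{
    fcntl(sockfd, F_SETFL, fcntl(sockfd, F_GETFL, 0) | O_NONBLOCK);

    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(sockfd, &readfds);

    struct timeval tv = { 10, 500000 };          /* 10.5 seconds */
    int rv = select(sockfd + 1, &readfds, NULL, NULL, &tv);
    if (rv < 0) {
        perror("select");                        /* check errno here, not after read() */
    } else if (rv == 0) {
        printf("Timeout occurred! No data after 10.5 seconds.\n");
    } else if (FD_ISSET(sockfd, &readfds)) {
        char buf[1024];
        ssize_t n = read(sockfd, buf, sizeof(buf));
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            /* nothing to read after all: just go back to select() */
        }
        /* n == 0 means the peer closed the connection; n > 0 means data to process */
    }
}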

Weird Winsock recv() slowdown

I'm writing a little VOIP app like Skype, which works quite well right now, but I've run into a very strange problem.
In one thread, within a while(true) loop, I'm calling the Winsock recv() function twice per iteration to get data from a socket.
The first call gets 2 bytes which are cast to a (short), while the second call gets the rest of the message, which looks like:
Complete Message: [2 Byte Header | Message, length determined by the 2Byte Header]
These packets arrive at roughly 49/sec, which is roughly 3000 bytes/sec.
The content of these packets is audio-data that gets converted into wave.
With ioctlsocket() I determine whether or not there is some data on the socket after each "message" I receive (2 bytes + data). If there is something on the socket right after I received a message within the while(true) loop of the thread, that message is received but thrown away, to work against latency building up.
This concept works very well, but here's the problem:
While my VOIP program is running and I download a file in parallel (e.g. via a browser), too much data always piles up on the socket, because while downloading, the recv() loop actually seems to slow down. This happens in every download/upload situation besides the actual VOIP up/download.
I don't know where this behaviour comes from, but when I cancel every up/download besides the VOIP traffic of my application, my app works perfectly again.
If the program runs perfectly, the ioctlsocket() function writes 0 into the bytesLeft var, defined within the class where the receive function comes from.
Does somebody know where this comes from? I'll attach my receive function down below:
std::string D_SOCKETS::receive_message(){
recv(ClientSocket,(char*)&val,sizeof(val),MSG_WAITALL);
receivedBytes = recv(ClientSocket,buffer,val,MSG_WAITALL);
if (receivedBytes != val){
printf("SHORT: %d PAKET: %d ERROR: %d",val,receivedBytes,WSAGetLastError());
exit(128);
}
ioctlsocket(ClientSocket,FIONREAD,&bytesLeft);
cout<<"Bytes left on the Socket:"<<bytesLeft<<endl;
if(bytesLeft>20)
{
// message gets received, but ignored/thrown away to throw away
return std::string();
}
else
return std::string(buffer, receivedBytes);
}
There is no need to use ioctlsocket() to discard data. That would indicate a bug in your protocol design. Assuming you are using TCP (you did not say), there should not be any leftover data if your 2-byte header is always accurate. After reading the 2-byte header and then reading the specified number of bytes, the next bytes you receive constitute your next message and should not be discarded simply because they exist.
The fact that ioctlsocket() reports more bytes available means that you are receiving messages faster than you are reading them from the socket. Make your reading code run faster, don't throw away good data due to your slowness.
Your reading model is not efficient. Instead of reading 2 bytes, then X bytes, then 2 bytes, and so on, you should instead use a larger buffer to read more raw data from the socket at one time (use ioctlsocket() to know how many bytes are available, and then read at least that many bytes at one time and append them to the end of your buffer), and then parse as many complete messages are in the buffer before then reading more raw data from the socket again. The more data you can read at a time, the faster you can receive data.
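Here is a rough sketch of that buffered approach for the same 2-byte length header (process_message is a placeholder for whatever handles a complete message; the ioctlsocket() sizing hint is omitted and a blocking TCP socket is assumed):

#include <winsock2.h>
#include <string>
#include <cstring>

void process_message(const char* data, int len);   // placeholder for the real handler

// Sketch: read as much raw data as recv() hands over, append it to a growing buffer,
// then peel off every complete [2-byte length | payload] message before reading again.
void receive_loop(SOCKET s)
{
    std::string pending;                            // raw bytes not yet parsed
    char chunk[4096];

    for (;;)
    {
        int n = ::recv(s, chunk, sizeof(chunk), 0); // grabs whatever has arrived
        if (n <= 0)
            break;                                  // error or connection closed
        pending.append(chunk, n);

        // Parse every complete message currently in the buffer.
        while (pending.size() >= sizeof(short))
        {
            short len;
            std::memcpy(&len, pending.data(), sizeof(len));
            if (pending.size() < sizeof(short) + (size_t)len)
                break;                              // header seen, payload incomplete
            process_message(pending.data() + sizeof(short), len);
            pending.erase(0, sizeof(short) + len);
        }
    }
}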
To help speed up the code even more, don't process the messages inside the loop directly, either. Do the processing in another thread instead. Have the reading loop put complete messages in a queue and go back to reading, and then have a processing thread pull from the queue whenever messages are available for processing.
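And a minimal sketch of handing complete messages to a processing thread through a queue (plain standard C++; this is just one way to decouple reading from processing):

#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>

// Minimal thread-safe queue: the socket-reading thread pushes complete messages,
// a separate processing thread pops and handles them.
class MessageQueue
{
public:
    void push(std::string msg)
    {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(msg));
        }
        cv_.notify_one();
    }
    std::string pop()                    // blocks until a message is available
    {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        std::string msg = std::move(q_.front());
        q_.pop();
        return msg;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::string> q_;
};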

Select and read sockets (Unix)

I have an intermittent problem with a telnet based server on Unix (the problem crops up on both AIX and Linux).
The server opens two sockets, one to a client telnet session, and one to a program running on the same machine as the server. The idea is that the data is passed through the server to and from this program.
The current setup has a loop using select to wait for a "read" file descriptor to become available, then uses select to wait for a "write" file descriptor to become available.
Then the program reads from the incoming file descriptor, then processes the data before writing to the outgoing descriptor.
The snippet below shows what is going on. The problem is that very occasionally the read fails, with errno being set to ECONNRESET or ETIMEDOUT. Neither of these are codes documented by read, so where are they coming from?
The real question is, how can I either stop this happening, or handle it gracefully?
Could doing two selects in a row be the problem?
The current handling behaviour is to shut down and restart. One point to note is that once this happens it normally happens three or four times, then clears up. The system load doesn't really seem to be that high (it's a big server).
if (select(8, &readset, NULL, NULL, NULL) < 0)
{
break;
}
if (select(8, NULL, &writeset, NULL, NULL) < 0)
{
break;
}
if (FD_ISSET(STDIN_FILENO, &readset)
&& FD_ISSET(fdout, &writeset))
{
if ((nread = read(STDIN_FILENO, buff, BUFFSIZE)) < 0)
{
/* This sometimes fails with errno =
ECONNRESET or ETIMEDOUT */
break;
}
}
Look at the comments in http://lxr.free-electrons.com/source/arch/mips/include/asm/errno.h on lines 85 and 98: these basically say there was a network connection reset or timeout. Check whether there are timeouts you can adjust on the remote network program, or send some periodic filler bytes to ensure that the connection stays awake consistently. You may just be a victim of an error in the network transit path between the remote client and your local server (this happens to me when my DSL line hiccups).
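If you go the keep-the-connection-busy route, one standard alternative to application-level filler bytes is TCP keepalive; a sketch (TCP_KEEPIDLE / TCP_KEEPINTVL / TCP_KEEPCNT are Linux-specific knobs, so treat their availability on other platforms as an assumption):

#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Sketch: enable TCP keepalive so the kernel sends periodic probes on an idle
 * connection instead of the application sending filler bytes. */
int enable_keepalive(int fd)
{
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
#ifdef TCP_KEEPIDLE
    int idle = 60, intvl = 10, cnt = 5;   /* probe after 60 s idle, every 10 s, 5 tries */
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle));
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl));
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt));
#endif
    return 0;
}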
EDIT: not sure what the downvote is for. The man page for read explicitly says:
Other errors may occur, depending on the object connected to fd.
The error is probably occurring in the select, not in the read: you're not checking errors after the select, you're just proceeding to the read, which will fail if the select returned an error. I'm betting that if you check the errno value after the select call you'll see the errors: you don't need to wait for the read to see them.
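A minimal sketch of what that check could look like (checked_select is a hypothetical wrapper; readfds as in the question):

#include <sys/select.h>
#include <errno.h>
#include <string.h>
#include <stdio.h>

/* Sketch: surface errno as soon as select() fails, before any read() is attempted,
 * so the real source of ECONNRESET / ETIMEDOUT is visible. */
static int checked_select(fd_set *readfds)
{
    int rv = select(8, readfds, NULL, NULL, NULL);   /* same arguments as the question */
    if (rv < 0)
        fprintf(stderr, "select failed: %s (errno=%d)\n", strerror(errno), errno);
    return rv;
}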

epoll_wait() receives socket closed twice (read()/recv() returns 0)

We have an application that uses epoll to listen for and process http connections. Sometimes epoll_wait() receives a close event on an fd twice in a "row", meaning: epoll_wait() returns a connection fd on which read()/recv() returns 0. This is a problem, since I have a malloc'ed pointer saved in the epoll_event struct (struct epoll_event.data.ptr) which is freed when the fd (socket) is detected as closed the first time. The second time, it crashes.
This problem occurs very rarely in real use (except on one site, which actually has around 500-1000 users per server). I can replicate the problem using http siege with >1000 simultaneous connections per second. In this case the application segfaults (because of the invalid pointer) very randomly, sometimes after a few seconds, usually after tens of minutes. I have been able to replicate the problem with fewer connections per second, but for that I have to run the application a long time, many days, even weeks.
All new accept() connection fds are set to non-blocking and added to epoll as one-shot, edge-triggered, waiting for read() to become available. So why, when the server load is high, does epoll think that my application didn't get the close event and queue a new one?
epoll_wait() is running in its own thread and queues fd events to be handled elsewhere. I noticed that multiple closes were coming in, using simple code that checks whether an event arrives twice in a row from epoll for the same fd. It did happen, and the events were both closes (recv(.., MSG_PEEK) told me this :)).
epoll fd is created: epoll_create(1024);
epoll_wait() is run as follows: epoll_wait(epoll_fd, events, 256, 300);
new fd is set as non-blocking after accept():
int flags = fcntl(fd, F_GETFL, 0);
err = fcntl(fd, F_SETFL, flags | O_NONBLOCK);
new fd is added to epoll (client is malloc:ed struct pointer):
static struct epoll_event ev;
ev.events = EPOLLIN | EPOLLONESHOT | EPOLLET;
ev.data.ptr = client;
err = epoll_ctl(epoll_fd, EPOLL_CTL_ADD, client->fd, &ev);
And after receiving and handling data from the fd, it is re-armed (of course, since EPOLLONESHOT). At first I wasn't using edge-triggering and non-blocking I/O, but I tested them and got a nice performance boost. This problem existed before adding them, though. Btw, shutdown(fd, SHUT_RDWR) is used from other threads to trigger a proper close event to be received through epoll when the server needs to close the fd because of some http error etc. (I don't actually know if this is the right way to do it, but it has worked perfectly).
As soon as the first read() returns 0, this means that the connection was closed by the peer. Why does the kernel generate an EPOLLIN event for this case? Well, there's no other way to indicate the socket's closure when you're only subscribed to EPOLLIN. You can add EPOLLRDHUP, which is basically the same as checking for read() returning 0. However, make sure to test for this flag before you test for EPOLLIN.
if (flag & EPOLLRDHUP) {
/* Connection was closed. */
deleteConnectionData(...);
close(fd); /* Will unregister yourself from epoll. */
return;
}
if (flag & EPOLLIN) {
readData(...);
}
if (flag & EPOLLOUT) {
writeData(...);
}
The way I've ordered these blocks is relevant and the return for EPOLLRDHUP is important too, because it is likely that deleteConnectionData() may have destroyed internal structures. As EPOLLIN is set as well in case of a closure, this could lead to some problems. Ignoring EPOLLIN is safe because it won't yield any data anyway. Same for EPOLLOUT as it's never sent in conjunction with EPOLLRDHUP!
epoll_wait() is running in its own thread and queues fd events to be handled elsewhere.
... So why, when the server load is high, does epoll think that my application didn't get the close event and queue a new one?
Assuming that EPOLLONESHOT is bug free (I haven't searched for associated bugs though), the fact that you are processing your epoll events in another thread and that it crashes sporadically or under heavy load may mean that there is a race condition somewhere in your application.
Maybe the object pointed to by epoll_event.data.ptr gets deallocated prematurely, before the epoll event is unregistered in another thread, when your server does an active close of the client connection.
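If that is what is happening, here is a sketch of a safer teardown path (struct client here is a stand-in for the question's malloc'ed per-connection struct):

#include <sys/epoll.h>
#include <unistd.h>
#include <stdlib.h>

struct client { int fd; /* plus whatever per-connection data the question stores */ };

/* Sketch: unregister the fd from epoll before closing and freeing.  EPOLL_CTL_DEL
 * stops new events for this fd; any events already handed out by epoll_wait() in
 * the other thread still have to be drained (or the struct reference-counted)
 * before free() is safe. */
static void destroy_client(int epoll_fd, struct client *c)
{
    epoll_ctl(epoll_fd, EPOLL_CTL_DEL, c->fd, NULL);
    close(c->fd);
    free(c);
}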
My first try would be to run it under valgrind and see if it reports any errors.
I would re-check myself against the following sections from epoll(7):
Q6: Will closing a file descriptor cause it to be removed from all epoll sets automatically?
and
o If using an event cache...
There're some good points there.
Removing EPOLLONESHOT made the problem disappear, after a few other changes. Unfortunately I'm not totally sure what caused it. Using EPOLLONESHOT with threads and adding the fd back into the epoll queue manually was quite certainly the problem. Also, the data pointer in the epoll struct is now released after a delay. It works perfectly now.
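For reference, the documented way to re-arm a one-shot descriptor is EPOLL_CTL_MOD on the already registered fd rather than a second EPOLL_CTL_ADD; a sketch using the question's variables:

#include <sys/epoll.h>
#include <stdio.h>

struct client { int fd; /* per-connection data as in the question */ };

/* Sketch: re-arm a one-shot fd after its event has been handled.  EPOLL_CTL_MOD is
 * the documented way; EPOLL_CTL_ADD on a still-registered fd fails with EEXIST. */
static void rearm_client(int epoll_fd, struct client *client)
{
    struct epoll_event ev;
    ev.events = EPOLLIN | EPOLLRDHUP | EPOLLONESHOT | EPOLLET;
    ev.data.ptr = client;                      /* same per-connection pointer as before */
    if (epoll_ctl(epoll_fd, EPOLL_CTL_MOD, client->fd, &ev) == -1)
        perror("epoll_ctl(EPOLL_CTL_MOD)");
}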
Register EPOLLRDHUP (value 0x2000) to be notified when the remote host closes the connection, e.g.
ev.events = EPOLLIN | EPOLLONESHOT | EPOLLET | EPOLLRDHUP;
and check if (flag & EPOLLRDHUP) for the remote host closing the connection.