Weird Winsock recv() slowdown - sockets

I'm writing a little VOIP app like Skype, which works quite good right now, but I've run into a very strange problem.
In one thread, I'm calling within a while(true) loop the winsock recv() function twice per run to get data from a socket.
The first call gets 2 bytes which will be casted into a (short) while the second call gets the rest of the message which looks like:
Complete Message: [2 Byte Header | Message, length determined by the 2Byte Header]
These packets are round about 49/sec which will be round about 3000bytes/sec.
The content of these packets is audio-data that gets converted into wave.
With ioctlsocket() I determine wether there is some data on the socket or not at each "message" I receive (2byte+data). If there's something on the socket right after I received a message within the while(true) loop of the thread, the message will be received, but thrown away to work against upstacking latency.
This concept works very well, but here's the problem:
While my VOIP program is running and when I parallely download (e.g. via browser) a file, there always gets too much data stacked on the socket, because while downloading, the recv() loop seems actually to slow down. This happens in every download/upload situation besides the actual voip up/download.
I don't know where this behaviour comes from, but when I actually cancel every up/download besides the voip traffic of my application, my apps works again perfectly.
If the program runs perfectly, the ioctlsocket() function writes 0 into the bytesLeft var, defined within the class where the receive function comes from.
Does somebody know where this comes from? I'll attach my receive function down below:
std::string D_SOCKETS::receive_message(){
recv(ClientSocket,(char*)&val,sizeof(val),MSG_WAITALL);
receivedBytes = recv(ClientSocket,buffer,val,MSG_WAITALL);
if (receivedBytes != val){
printf("SHORT: %d PAKET: %d ERROR: %d",val,receivedBytes,WSAGetLastError());
exit(128);
}
ioctlsocket(ClientSocket,FIONREAD,&bytesLeft);
cout<<"Bytes left on the Socket:"<<bytesLeft<<endl;
if(bytesLeft>20)
{
// message gets received, but ignored/thrown away to throw away
return std::string();
}
else
return std::string(buffer,receivedBytes);}

There is no need to use ioctlsocket() to discard data. That would indicate a bug in your protocol design. Assuming you are using TCP (you did not say), there should not be any left over data if your 2byte header is always accurate. After reading the 2byte header and then reading the specified number of bytes, the next bytes you receive after that constitute your next message and should not be discarded simply because it exists.
The fact that ioctlsocket() reports more bytes available means that you are receiving messages faster than you are reading them from the socket. Make your reading code run faster, don't throw away good data due to your slowness.
Your reading model is not efficient. Instead of reading 2 bytes, then X bytes, then 2 bytes, and so on, you should instead use a larger buffer to read more raw data from the socket at one time (use ioctlsocket() to know how many bytes are available, and then read at least that many bytes at one time and append them to the end of your buffer), and then parse as many complete messages are in the buffer before then reading more raw data from the socket again. The more data you can read at a time, the faster you can receive data.
To help speed up the code even more, don't process the messages inside the loop directly, either. Do the processing in another thread instead. Have the reading loop put complete messages in a queue and go back to reading, and then have a processing thread pull from the queue whenever messages are available for processing.

Related

Reading from Socket Stream Blocking After Retrieval

I'm currently attempting to read an incoming message from a client socket, that, prior to the below procedure has already been connected to the server socket. The below procedure outputs the message, one character at a time, as it retrieves it from the stream.
The problem is that, when the stream is out of information, the call to Ada.Streams.Read is blocking, and stops the application flow completely. According to some examples, it would appear as though Offset should be set to 0 automatically at the end of the stream, but that never happens. Instead the application stops at the call to Read.
procedure Read_From (Channel : Sockets.Stream_Access) is
use Ada.Text_IO;
use Ada.Streams;
Data : Stream_Element_Array (1 .. 1);
Offset : Stream_Element_Offset;
begin
loop
Read (Channel.All, Data, Offset);
exit when Offset = 0;
Put (Character'Val (Data (1)));
end loop;
-- The application never reaches this point.
New_Line;
Put_Line ("Finished reading from client!");
end Read_From;
-- #param Channel `GNAT.Sockets.Stream (Client_Socket)`
I've also attempted the same process with GNAT.Sockets.Receive_Socket, but the same issue remains: the application flow is stopped completely, assumably awaiting further information from the stream, even though there is nothing more to retrieve.
Any pointers in the right direction would be highly appreciated!
Normally, you’d read a (binary) message from a stream knowing how much data needed to be read, so you could read until you’d got that much.
But, if you’re reading a text message from an externally-defined source, as it might be an HTTP request, there needs to be some terminator sequence so you can read character-by-character until you’ve read the terminator. In the case of an HTTP request, that’s a CR/LF/CR/LF sequence. Or it could be a null-terminated C string, in which case you’d be looking for the ASCII.NUL.
The Ada way to transfer variable-length text is to use String’Output/String’Input (see ARM 13.13.2(18)ff). What happens for a String (an array of Character) is that first the bounds are sent, then the content; on reception, the bounds are read, a String with those bounds is created, and the required number of bytes are read into the new String, which is then returned.
Basically that's how Ada streams work. The end of the stream only comes once you reach the final end of the stream, not just the current end of a buffer.
If you want to be able to interrupt reading, you have to use another representation of the connection than GNAT.Sockets.Stream_Access.

Triggering Interrupt for any byte received

I'm trying to get a code to work that triggers an interrupt for a variable data size coming to a RX input of a STM32 board (not discovery) in DMA Circular mode. ex.:CONNECTED\r\nDATAREQUEST\r\n
So far so good, I'm being able to receive data and all, while also triggering the DMA interrupt.
I will then create a sub RX message processing buffer breaking down each \r\n to a different char array pointer.
msgProcessingBuffer[0] = "COM_OK"
msgProcessingBuffer[1] = "DATAREQUEST"
msgProcessingBuffer[n] = "BlahBlahBlah"
My problem comes actually from the trigger of the interrupt. I would like to trigger the interrupt from any amount of data and processing any data received.
If I use the interrupt request bellow:
HAL_UART_Receive_DMA(&huart1,uart1RxMsgBuffer, 30);
The input buffer will take 30 bytes to trigger the interrupt, but that's too much time to wait because I would like to process the RX data as soon as a \r\n is found in the string. So I cannot wait for the full buffer to fill to begin processing it.
If I use the interrupt request bellow:
HAL_UART_Receive_DMA(&huart1,uart1RxMsgBuffer, 1;
It will trigger as I want, but there is no point on using DMA in this case because it will trigger the interrupt for every byte and will create a buffer of just 1 byte (duh) just like in "polling mode".
So my question is, how do I trigger the DMA for the first byte received but still receive/process all data that might come after it in a single interrupt? I believe I might be missing some basic concept here.
Best regards,
Blukrr
In short: HAL/SPL libraries don't provide such feachures.
Generally some MCUs, for example STM32F091VCT6 have hardware supporting of Modbus and byte flow analysis (interrupt by recieve some control byte) - so if you will use such MCU in you project, you can configure receive by circular DMA with interrupts by receive '\r' or '\n' byte.
And I repeat: HAL or SPL don't support this features, you can use it only throught work with registers (see reference manuals).
I was taking a look at some other forums and I've found there a work around for this problem.
I'm using a DMA in circular mode and then I monitor the NDTR which updates its value every time a byte is received through the UART interface. Then I cyclically call a function (in while 1 loop or in a cyclic interrupt handler) that break down each message part always looking for /n /r chars. This function also saves the current NDTR value for comparison if it has changed since the last "while 1" cycle. If the NDTR has changed since last cycle I wait a couple milliseconds to receive the remaining message (UART it's too slow to transmit) and then save those received messages in a char buffer array for post processing.
If you create a circular DMA buffer of about 50 bytes (HAL_UART_Receive_DMA(&huart1,uart1RxMsgBuffer, 50)) I think it's enough to compensate any fluctuations in the program cycle.
In the mean time I opened a ticket to ST and they confirmed what you just said they also added:
SOLUTION PROPOSED BY SUPPORTER - 14/4/2016 16:45:22 :
Hi Gilberto,
The DMA interrupt requests available are listed on Table 50 of the Reference Manual, RM0090, http://www.st.com/web/en/resource/technical/document/reference_manual/DM00031020.pdf. Therefore, basically, the DMA interrupt can only trigger at the end of one of these events.
• Half-transfer reached
• Transfer complete
• Transfer error
• Fifo error (overrun, underrun or FIFO level error)
• Direct mode error
Getting a DMA interrupt to trigger upon reception of a specific character in your receive data stream is not possible. You may want to trigger the interrupt when you receive packets of say 30 bytes each and then process the datastring to check if your \r\n chars have arrived so you can process the data block.
Regards,
MCU Tech Support

Using `chan pending output` instead of writable fileevent

Yo, I've written a server with a simple protocol: the client sends a line, the server sends a line back in response, repeat. To prevent a client from filling Tcl's output buffer by sending lots of lines but not accepting data back, can I just check chan pending output instead of using the writable fileevent?
proc respond {stream msg} {
if {[chan pending output $stream] <= 1024} {
puts $stream $msg
} else {
#close $stream
}
}
For output, chan pending output will correctly describe the number of bytes waiting in the output queue. Normally, that value will be bounded by the -buffersize value that you chan configure (or fconfigure) it to have.
That value will only be exceeded when the channel is non-blocking; with a blocking channel, when the value would go over it, instead there's a blocking write to the underlying device (socket, pipe, file, serial line, whatever) so by the time you could see that it went over, it's back under the limit again.
But if you're using non-blocking channels, you really should use chan event (or fileevent). Luckily for the actual writes, Tcl will actually do this for you automatically; the single most useful thing you could want from a writable event is already there. In practice, the most common actual use of a writable event is in detecting when an async socket connection becomes ready for service.
So what you are doing will work, but you'll have to think carefully about what to do if the output buffer is “getting full”; the idea that a message can need to be delayed is a place where a simple abstraction tends to become leaky. With 8.6's coroutines, you could (probably) do a transparent suspend or something like that, but getting that sort of thing right can take a little thought. (For example, a GUI client might need to show a busy indicator and put things into a state where the user can't enter more requests.)

socket receive loop never returns

I have a loop that reads from a socket in Lua:
socket = nmap.new_socket()
socket:connect(host, port)
socket:set_timeout(15000)
socket:send(command)
repeat
response,data = socket:receive_buf("\n", true)
output = output..data
until data == nil
Basically, the last line of the data does not contain a "\n" character, so is never read from the socket. But this loop just hangs and never completes. I basically need it to return whenever the "\n" delimeter is not recognised. Does anyone know a way to do this?
Cheers
Updated
to include socket code
Update2
OK I have got around the initial problem of waiting for a "\n" character by using the "receive_bytes" method.
New code:
--socket set as above
repeat
data = nil
response,data = socket:receive_bytes(5000)
output = output..data
until data == nil
return output
This works and I get the large complete block of data back. But I need to reduce the buffer size from 5000 bytes, as this is used in a recursive function and memory usage could get very high. I'm still having problems with my "until" condition however, and if I reduce the buffer size to a size that will require the method to loop, it just hangs after one iteration.
Update3
I have gotten around this problem using string.match and receive_bytes. I take in at least 80 bytes at a time. Then string.match checks to see if the data variable conatins a certain pattern. If so it exits. Its not the cleanest solution, but it works for what I need it to do. Here is the code:
repeat
response,data = socket:receive_bytes(80)
output = output..data
until string.match(data, "pattern")
return output
I believe the only way to deal with this situation in a socket is to set a timeout.
The following link has a little bit of info, but it's on http socket: lua http socket timeout
There is also this one (9.4 - Non-Preemptive Multithreading): http://www.lua.org/pil/9.4.html
And this question: http://lua-list.2524044.n2.nabble.com/luasocket-howto-read-write-Non-blocking-TPC-socket-td5792021.html
A good discussion on Socket can be found on this link:
http://nitoprograms.blogspot.com/2009/04/tcpip-net-sockets-faq.html
It's .NET but the concepts are general.
See update 3. Because the last part of the data is always the same pattern, I can read in a block of bytes and each time check if that block has the pattern. If it has the pattern it will mean that it is the end of the data, append to the output variable and exit.

epoll_wait() receives socket closed twice (read()/recv() returns 0)

We have an application that uses epoll to listen and process http-connections. Sometimes epoll_wait() receives close event on fd twice in a "row". Meaning: epoll_wait() returns connection fd on which read()/recv() returns 0. This is a problem, since I have malloc:ed pointer saved in epoll_event struct (struct epoll_event.data.ptr) and which is freed when fd(socket) is detected as closed the first time. Second time it crashes.
This problem occurs very rarely in real use (except one site, which actually has around 500-1000 users per server). I can replicate the problem using http siege with >1000 simultaneous connections per second. In this case application segfaults (because of invalid pointer) very randomly, sometimes after few seconds, usually after tens of minutes. I have been able to replicate the problem with fewer connections per second, but for that I have to run the application a long time, many days, even weeks.
All new accept() connection fd:s are set as non-blocking and added to epoll as one-shot, edge-triggering and waiting for read() to be available. So somewhy when the server load is high, epoll thinks that my application didn't get the close-event and queues new one?
epoll_wait() is running in it's own thread and queues fd events to be handled elsewhere. I noticed that there was multiple closes incoming with simple code that checks if there comes event twice in a row from epoll to same fd. It did happen and the events where both closes (recv(.., MSG_PEEK) told this to me :)).
epoll fd is created: epoll_create(1024);
epoll_wait() is run as follows: epoll_wait(epoll_fd, events, 256, 300);
new fd is set as non-blocking after accept():
int flags = fcntl(fd, F_GETFL, 0);
err = fcntl(fd, F_SETFL, flags | O_NONBLOCK);
new fd is added to epoll (client is malloc:ed struct pointer):
static struct epoll_event ev;
ev.events = EPOLLIN | EPOLLONESHOT | EPOLLET;
ev.data.ptr = client;
err = epoll_ctl(epoll_fd, EPOLL_CTL_ADD, client->fd, &ev);
And after receiving and handling data from fd, it is re-armed (of course since EPOLLONESHOT). At first I wasn't using edge-triggering and non-blocking io, but I tested it and got a nice perfomance boost using those. This problem existed before adding them though. Btw. shutdown(fd, SHUT_RDWR) is used on other threads to trigger proper close event to be received trough epoll when the server needs to close the fd because of some http-error etc (I don't actually know if this is the right way to do it, but it has worked perfectly).
As soon as the first read() returns 0, this means that the connection was closed by the peer. Why does the kernel generate a EPOLLIN event for this case? Well, there's no other way to indicate the socket's closure when you're only subscribed to EPOLLIN. You can add EPOLLRDHUP which is basically the same as checking for read() returning 0. However, make sure to test for this flag before you test for EPOLLIN.
if (flag & EPOLLRDHUP) {
/* Connection was closed. */
deleteConnectionData(...);
close(fd); /* Will unregister yourself from epoll. */
return;
}
if (flag & EPOLLIN) {
readData(...);
}
if (flag & EPOLLOUT) {
writeData(...);
}
The way I've ordered these blocks is relevant and the return for EPOLLRDHUP is important too, because it is likely that deleteConnectionData() may have destroyed internal structures. As EPOLLIN is set as well in case of a closure, this could lead to some problems. Ignoring EPOLLIN is safe because it won't yield any data anyway. Same for EPOLLOUT as it's never sent in conjunction with EPOLLRDHUP!
epoll_wait() is running in it's own thread and queues fd events to be handled elsewhere.
... So why when the server load is high, epoll thinks that my application didn't get the close-event and queues new one?
Assuming that EPOLLONESHOT is bug free (I haven't searched for associated bugs though), the fact that you are processing your epoll events in another thread and that it crashes sporadically or under heavy load may mean that there is a race condition somewhere in your application.
May be the object pointed to by epoll_event.data.ptr gets deallocated prematurely before the epoll event is unregistered in another thread when you server does an active close of the client connection.
My first try would be to run it under valgrind and see if it reports any errors.
I would re-check myself against the following sections from epoll(7):
Q6Will closing a file descriptor cause it to be removed from all epoll sets automatically?
and
o If using an event cache...
There're some good points there.
Removing EPOLLONESHOT made the problem disappear after few other changes. Unfortunately I'm not totally sure what caused it. Using EPOLLONESHOT with threads and adding the fd again manually into the epoll queue was quite certainly the problem. Also the data pointer in epoll struct is released after a delay. Works perfectly now.
Register Signal 0x2000 for Remote host closed connection
ex ev.events = EPOLLIN | EPOLLONESHOT | EPOLLET | 0x2000
and check if (flag & 0x2000) for Remote Host Close Connection