Socket SO_SNDTIMEO Timeout is double the set value

I recently patched my copy of GStreamer 0.10.36 to time out the tcpclientsink if the network connection is switched between wired/wireless (More information at Method to Cancel/Abort GStreamer tcpclientsink Timeout). It's a simple change. I just added the following to the gst_tcp_client_sink_start() function of gsttcpclientsink.c:
struct timeval timeout;
timeout.tv_sec = 60;
timeout.tv_usec = 0;
...
setsockopt (this->sock_fd.fd, SOL_SOCKET, SO_SNDTIMEO, (char *)&timeout, sizeof(timeout));
The strange thing is that the actual timeout (measured by wall clock time) is always double the value I set. If I disrupt the network connection with the timeout set to 60 seconds, it will take 120 seconds for GStreamer/socket to abort. If I set the timeout to 30 seconds, it will take 60 seconds. If I set the timeout to 180 seconds, it will take 360 seconds. Is there something about sockets that I don't understand that might be causing this behavior? I'd really like to know what's going on here.

This might be a duplicate of Socket SO_RCVTIMEO Timeout is double the set value in C++/VC++
I'm pasting my answer below since I think I had a similar problem.
Pasted answer
SO_RCVTIMEO and SO_SNDTIMEO do not apply to all socket operations; you should use non-blocking mode and select() instead.
The behaviour may also differ between operating systems and configurations.
On my system, connect() times out after twice the value I set in SO_RCVTIMEO. A quick hack such as setting SO_RCVTIMEO to x/2 before connect() and back to x afterwards works, but the proper solution is to use select().
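For illustration, a select()-based connect with an explicit timeout might look roughly like this (a sketch, not part of the original patch; the function name and error handling are assumptions):
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/socket.h>

/* Connect with an explicit timeout instead of relying on SO_SNDTIMEO.
 * Returns 0 on success, -1 on error or timeout; the socket stays non-blocking. */
static int
connect_with_timeout (int fd, const struct sockaddr *addr, socklen_t addrlen,
    int timeout_sec)
{
  int flags = fcntl (fd, F_GETFL, 0);
  fcntl (fd, F_SETFL, flags | O_NONBLOCK);

  if (connect (fd, addr, addrlen) == 0)
    return 0;                    /* connected immediately */
  if (errno != EINPROGRESS)
    return -1;                   /* hard failure */

  fd_set wfds;
  FD_ZERO (&wfds);
  FD_SET (fd, &wfds);
  struct timeval tv = { timeout_sec, 0 };

  if (select (fd + 1, NULL, &wfds, NULL, &tv) <= 0)
    return -1;                   /* timed out or select() failed */

  int err = 0;
  socklen_t len = sizeof (err);
  getsockopt (fd, SOL_SOCKET, SO_ERROR, &err, &len);
  return (err == 0) ? 0 : -1;    /* check the asynchronous connect result */
}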
References
Discussion on this problem (read comments to answer):
https://stackoverflow.com/a/4182564/4074995
How to use select() to achieve the desired result:
http://beej.us/guide/bgnet/output/html/multipage/advanced.html#select
C: socket connection timeout


STM32 UART in DMA mode stops receiving after receiving from a host with wrong baud rate

The scenario: I have an STM32 MCU which uses a UART in DMA mode with the idle interrupt for RS485 data transfer. The baud rate of the UART is set in CubeMX, in this case to 115200. My code works fine when the host uses the correct baud rate; it is also stable over long runs, no issues or worries.
BUT: when I set the wrong baud rate at the host, e.g. 57600 instead of 115200, the UART stops receiving data. Even if I later set the host back to the same baud rate the microcontroller uses, it won't work. The only way to recover so far is to reset the MCU and connect again with the correct baud rate.
To give you some (Pseudo-)Code:
uint8_t UART_Buf[128];                        // DMA receive buffer
HAL_UART_Receive_DMA(&huart2, UART_Buf, 128); // start (circular) DMA reception
__HAL_UART_ENABLE_IT(&huart2, UART_IT_IDLE);  // enable the idle-line interrupt
Or in plain words: there is a UART buffer for DMA (UART_Buf[128]), the UART is started with HAL_UART_Receive_DMA(...), DMA Rx is set to circular mode in CubeMX, and the idle interrupt is activated using the HAL macro __HAL_UART_ENABLE_IT(...). This code works fine so far.
Works fine means:
when I transmit data from my PC to the Micro, the (one) Idle Interrupt is triggered (correctly) by the MCU. In the ISR I set a flag, to start the data parsing afterwards. I receive exactly the number of bytes I have sent, and all is fine.
BUT: when I make the wrong setting in my terminal program, and the baud rate select menu is set to e.g. 57600 instead of the (correct) 115200, the trouble begins:
The idle interrupt will still trigger after each transmission.
But it triggers 2-4 times in a quick "burst" (depending on the baud rate) and the number of bytes received is 0. I'd expect at least some garbage data, but there are exactly 0 bytes in the buffer, which I can check with the debugger. Obviously nothing is received. When I change the baud rate in my terminal program back and restart it, there is still nothing received on the MCU.
I could live with 0 received bytes, if the baud rate of the host is incorrect, but it's pretty uncool that one incoming transmission of a host with the wrong baud rate disables the UART until a hardware reset is done.
My attempts to resolve this so far:
count the "idle interrupt bursts" in combination with 0 received bytes to trigger a "self reset" routine that stops the UART and restarts it using the MX_USART2_UART_Init() routine. With zero effect: I can see the idle interrupt is still triggered correctly, but the buffer remains empty and no data is transferred into it. The UART remains in a non-receiving state.
The Question
Has anyone out there experienced similar issues, and if yes: how did you solve that?
Additional info: this happens on an STM32F030 as well as on an STM32G03x.
When you send to the UART at the wrong baud rate it will appear to the receiver as framing errors and/or noise errors. It could also appear as random characters being received correctly, but this is less likely so don't be surprised to have nothing in your buffer.
When you are receiving with DMA, it is normal to turn the error interrupt on or else poll the error bits. When an error is detected you would then re-initialize everything and restart the DMA. This sounds like what you are trying to do by counting the idle interrupts, but you are just not checking the right bits.
If you don't want to do that, you could do nothing at the driver level and try to resynchronise at a higher level (e.g. start reading again and discard everything until a newline character), but you will have to bear in mind at least two things:
First, make sure you clear the DDRE bit in the USART_CR3 register. The name "DMA Disable on Reception Error" speaks for itself.
Second, the UART peripheral is able to self resynchronize, as long as you have an idle gap between bytes. If you switch the transmitter to the correct baud rate but keep blasting out data then the receiver may never correctly identify which bit is a start bit.
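A rough sketch of the "poll the error bits, then re-initialize and restart the DMA" idea at the HAL level, reusing huart2 and UART_Buf from the question (an illustration, not the poster's actual fix):
/* Check for framing/overrun/noise errors and restart DMA reception if any are
 * latched (e.g. called from the idle-interrupt handling path). */
static void uart_check_and_recover(void)
{
    if (__HAL_UART_GET_FLAG(&huart2, UART_FLAG_FE) ||
        __HAL_UART_GET_FLAG(&huart2, UART_FLAG_ORE) ||
        __HAL_UART_GET_FLAG(&huart2, UART_FLAG_NE))
    {
        __HAL_UART_CLEAR_FEFLAG(&huart2);   // clear the latched error flags
        __HAL_UART_CLEAR_OREFLAG(&huart2);
        __HAL_UART_CLEAR_NEFLAG(&huart2);

        HAL_UART_DMAStop(&huart2);          // tear down the current transfer
        HAL_UART_Receive_DMA(&huart2, UART_Buf, sizeof(UART_Buf));  // restart reception
    }
}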
After investigating this issue a little further, I found a solution.
Abstract:
When a host sends to the MCU's UART with a different baud rate than the UART is configured for, the UART goes into an error state and stops DMA transfers to the RX buffer. You can check whether there is an error with the HAL_UART_GetError(...) function. If there is an error, stop the UART/DMA and restart it.
The Details:
First of all, it was not the DDRE bit in the USART_CR3 register; this was already set to 0 by CubeMX. But the hint from Tom V pointed me in the right direction.
I tried to recover the UART by playing around with the register bits. I read through the UART section of the reference manual multiple times and tried to figure out which bits to set, and in which order, to resolve the error condition manually.
What I found out:
When a transmission with the wrong baud rate is received by the UART, the following changes in the UART registers occur (on an STM32F030):
Control register 1 (USART_CR1) - Bit 8 (PEIE) goes from 1 to 0. PEIE is the parity error interrupt enable bit.
Control register 2 (USART_CR2) - remains unchanged
Control register 3 (USART_CR3) - changes from 0d16449 to 0d16384, which means
Bit 0 (EIE - Error Interrupt enable) goes from 1 to 0
Bit 6 (DMAR - DMA enable receiver) goes from 1 to 0
Bit 14 (DEM - Driver enable mode) remains unchanged at 1
USART_CR3.DEM staying set makes sense: I am using the RS485 functionality of the F030, so the UART handles the driver-enable GPIO by itself.
The transitions from 1 to 0 at USART_CR3.EIE and USART_CR3.DMAR are most probably the reason why no more data is transferred to the DMA buffer.
Besides that, the error flags for ORE and FE in the interrupt and status register (USART_ISR) are set. ORE stands for overrun error and FE for framing error. Although these bits can be cleared by writing a 1 to the corresponding bit of the interrupt flag clear register (USART_ICR), the ErrorCode in the huart struct remains at the initial error value.
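For illustration, clearing those two flags at register level would look something like this (a sketch of the bits described above; as the next paragraph explains, this alone did not bring reception back):
USART2->ICR = USART_ICR_FECF | USART_ICR_ORECF;   // write 1s to clear FE and ORE
USART2->CR3 |= USART_CR3_DMAR | USART_CR3_EIE;    // re-enable DMA reception and the error interrupt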
At the end of my trial-and-error process, I managed to get all registers back to the same values they had during valid transmissions, but still no bytes were received. Whatever I tried, it had no effect. The UART remained in a non-receiving state. So I decided to use the "brute force" approach and use the HAL functions, which I know work.
Finally the solution is pretty simple:
if an idle interrupt is detected, but the number of received bytes is 0
=> check the error status of the UART with HAL_UART_GetError(...)
If there is an error, stop the UART with HAL_UART_DMAStop(...) and restart it with HAL_UART_Receive_DMA(...)
The code:
if (RxLen) {
    // normal execution, number of received bytes > 0
    if (UA_RXCallback[i]) (*UA_RXCallback[i])(hUA);        // exec RX callback function
} else {
    if (HAL_UART_GetError(&huart2)) {
        HAL_UART_DMAStop(&huart2);                         // STOP Uart
        MX_USART2_UART_Init();                             // INIT Uart
        HAL_UART_Receive_DMA(&huart2, UA2_Buf, UA2_BufSz); // START Uart DMA
        __HAL_UART_CLEAR_IDLEFLAG(&huart2);                // Clear Idle IT-Flag
        __HAL_UART_ENABLE_IT(&huart2, UART_IT_IDLE);       // Enable Idle Interrupt
    }
}
I had a similar issue. I'm using a DMA to receive data, and then periodically checking how many bytes were received. After a bit error, it would not recover. The solution for me was to first subscribe to ErrorCallback on the UART_HandleTypeDef.
In the error handler, I then call UART_Start_Receive_DMA(...) again. This seems to restart the UART and DMA without issue.
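A minimal sketch of that idea, overriding the weak HAL_UART_ErrorCallback and restarting reception with the public HAL_UART_Receive_DMA rather than the internal UART_Start_Receive_DMA (buffer names reused from the snippet above; adapt to your setup):
/* Overrides the weak HAL callback; the HAL IRQ handler calls it when a
 * parity/framing/noise/overrun or DMA error is flagged. */
void HAL_UART_ErrorCallback(UART_HandleTypeDef *huart)
{
    if (huart->Instance == USART2)
    {
        /* Depending on the error, the HAL may already have aborted the transfer;
         * restarting reception here gets the DMA going again. */
        HAL_UART_Receive_DMA(huart, UA2_Buf, UA2_BufSz);
    }
}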

How to change the default network socket timeout in DCMTK?

The default network socket timeout in DCMTK is 60 seconds.
How to change it to 30?
I could see the code written as below, but could not change it to 30:
extern DCMTK_DCMNET_EXPORT OFGlobal<Sint32> dcmSocketReceiveTimeout; /* default: 60 */
As far as I understand your question, you want to set the timeout programmatically.
You can check how to do this in the DCMTK tools such as echoscu; basically you have to call:
#include "dcmtk/dcmnet/dcmtrans.h"
dcmSocketReceiveTimeout.set(OFstatic_cast(Sint32, new_socket_timeout));
and the global timeout will change accordingly.
The same is true for setting the send timeout, where you use dcmSocketSendTimeout instead.
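So, for the 30-second value asked about, something along these lines should work (a sketch; the include mirrors the snippet above, adjust it to wherever the globals are declared in your DCMTK version):
#include "dcmtk/dcmnet/dcmtrans.h"

// Set both the receive and send socket timeouts to 30 seconds.
dcmSocketReceiveTimeout.set(OFstatic_cast(Sint32, 30));
dcmSocketSendTimeout.set(OFstatic_cast(Sint32, 30));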

Why is my recvfrom function from Winsock so slow even with a timeout?

I have a basic UDP server that's using blocking calls with timeouts.
I set the timeouts using
int timeout = 1; // on Winsock, SO_RCVTIMEO is a DWORD given in milliseconds
setsockopt(serverSocket, SOL_SOCKET, SO_RCVTIMEO, (char *)&timeout, sizeof(int));
Then I have a while loop with nothing else in there except from
while (1)
{
    printf("receiving...");
    int ret = recvfrom(serverSocket, (char*)buffer, 1024, 0, (sockaddr*)&from, &fromlen);
}
But there seems to be a ridiculous delay between each printf and it's definitely coming from recvfrom. Removing this line entirely makes the application speed up.
Everything is hosted locally so there should be no lag.
The error code returned is correct, WSAETIMEDOUT (10060), but the timings are not: a 1 ms timeout should not take around 400 ms.
How can I get rid of this delay?
I still don't know why this is happening so I will leave the question open.
For now I have switched to using select() for the timeout and this works as expected.
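For reference, the select()-based replacement can look roughly like this (a sketch; variable names are taken from the question, and the 1 ms budget matches the original code):
// Wait up to 1 ms for a datagram before calling recvfrom(), instead of using SO_RCVTIMEO.
fd_set readSet;
FD_ZERO(&readSet);
FD_SET(serverSocket, &readSet);

struct timeval tv;
tv.tv_sec = 0;
tv.tv_usec = 1000;   // 1 ms

int ready = select(0, &readSet, NULL, NULL, &tv);   // first argument is ignored by Winsock
if (ready > 0)
{
    int ret = recvfrom(serverSocket, (char*)buffer, 1024, 0, (sockaddr*)&from, &fromlen);
}
else if (ready == 0)
{
    // timed out: no datagram arrived within 1 ms
}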

How to set connection timeout for Mongodb using pymongo?

I tried setting connectTimeoutMS and socketTimeoutMS to a low value but it still takes about 20 seconds before my script times out. Am I not using the options correctly? I want the script to exit after 5 seconds.
def init_mongo():
    mongo_connection = MongoClient('%s' % MONGO_SERVER, connectTimeoutMS=5000, socketTimeoutMS=5000)
    if mongo_connection is None:
        return
    try:
        <code>
    except:
        <code>
So if anyone comes across this later, I was using the wrong option.
What I was looking for is serverSelectionTimeoutMS
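A sketch of the corrected call (MONGO_SERVER is the placeholder from the question; the server_info() call just forces a round trip so the timeout actually applies):
from pymongo import MongoClient
from pymongo.errors import ServerSelectionTimeoutError

# Give up on server selection after 5 seconds instead of the 30-second default.
mongo_connection = MongoClient(MONGO_SERVER, serverSelectionTimeoutMS=5000)
try:
    mongo_connection.server_info()
except ServerSelectionTimeoutError:
    print("MongoDB server not reachable within 5 seconds")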
The web page:
https://api.mongodb.com/python/current/api/pymongo/mongo_client.html
says:
connectTimeoutMS: (integer or None) Controls how long (in milliseconds) the driver will wait during server monitoring when connecting a new socket to a server before concluding the server is unavailable. Defaults to 20000 (20 seconds)
(Where "server monitoring" is undefined)
So what? Is connectTimeoutMS sort of like a decoy to keep out the amateurs (like me)?

epoll_wait() receives socket closed twice (read()/recv() returns 0)

We have an application that uses epoll to listen for and process HTTP connections. Sometimes epoll_wait() receives a close event on an fd twice in a "row", meaning epoll_wait() returns a connection fd on which read()/recv() returns 0. This is a problem, since I have a malloc'ed pointer saved in the epoll_event struct (struct epoll_event.data.ptr) which is freed when the fd (socket) is detected as closed the first time. The second time it crashes.
This problem occurs very rarely in real use (except on one site, which actually has around 500-1000 users per server). I can replicate the problem using HTTP siege with >1000 simultaneous connections per second. In that case the application segfaults (because of the invalid pointer) very randomly, sometimes after a few seconds, usually after tens of minutes. I have been able to replicate the problem with fewer connections per second, but then I have to run the application for a long time, many days, even weeks.
All new accept()ed connection fds are set as non-blocking and added to epoll as one-shot, edge-triggered, waiting for read() to become available. So why, when the server load is high, does epoll think that my application didn't get the close event and queue a new one?
epoll_wait() runs in its own thread and queues fd events to be handled elsewhere. I noticed that multiple closes were coming in by using simple code that checks whether an event arrives twice in a row from epoll for the same fd. It did happen, and both events were closes (recv(.., MSG_PEEK) told me this :)).
epoll fd is created: epoll_create(1024);
epoll_wait() is run as follows: epoll_wait(epoll_fd, events, 256, 300);
new fd is set as non-blocking after accept():
int flags = fcntl(fd, F_GETFL, 0);
err = fcntl(fd, F_SETFL, flags | O_NONBLOCK);
new fd is added to epoll (client is a malloc'ed struct pointer):
static struct epoll_event ev;
ev.events = EPOLLIN | EPOLLONESHOT | EPOLLET;
ev.data.ptr = client;
err = epoll_ctl(epoll_fd, EPOLL_CTL_ADD, client->fd, &ev);
And after receiving and handling data from the fd, it is re-armed (of course, since EPOLLONESHOT). At first I wasn't using edge-triggering and non-blocking I/O, but I tested them and got a nice performance boost. This problem existed before adding them, though. By the way, shutdown(fd, SHUT_RDWR) is used from other threads to trigger a proper close event through epoll when the server needs to close the fd because of some HTTP error etc. (I don't actually know if this is the right way to do it, but it has worked perfectly).
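For reference, the re-arm step with EPOLLONESHOT typically looks like this (a sketch reusing the flags and names from the snippets above):
/* After handling the data, re-arm the one-shot fd so epoll will report it again. */
struct epoll_event ev;
ev.events = EPOLLIN | EPOLLONESHOT | EPOLLET;
ev.data.ptr = client;
err = epoll_ctl(epoll_fd, EPOLL_CTL_MOD, client->fd, &ev);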
As soon as the first read() returns 0, this means that the connection was closed by the peer. Why does the kernel generate an EPOLLIN event for this case? Well, there's no other way to indicate the socket's closure when you're only subscribed to EPOLLIN. You can add EPOLLRDHUP, which is basically the same as checking for read() returning 0. However, make sure to test for this flag before you test for EPOLLIN.
if (flag & EPOLLRDHUP) {
    /* Connection was closed. */
    deleteConnectionData(...);
    close(fd); /* Will unregister yourself from epoll. */
    return;
}
if (flag & EPOLLIN) {
    readData(...);
}
if (flag & EPOLLOUT) {
    writeData(...);
}
The way I've ordered these blocks is relevant and the return for EPOLLRDHUP is important too, because it is likely that deleteConnectionData() may have destroyed internal structures. As EPOLLIN is set as well in case of a closure, this could lead to some problems. Ignoring EPOLLIN is safe because it won't yield any data anyway. Same for EPOLLOUT as it's never sent in conjunction with EPOLLRDHUP!
epoll_wait() runs in its own thread and queues fd events to be handled elsewhere.
... So why, when the server load is high, does epoll think that my application didn't get the close event and queue a new one?
Assuming that EPOLLONESHOT is bug free (I haven't searched for associated bugs though), the fact that you are processing your epoll events in another thread and that it crashes sporadically or under heavy load may mean that there is a race condition somewhere in your application.
Maybe the object pointed to by epoll_event.data.ptr gets deallocated prematurely, before the epoll event is unregistered in another thread, when your server does an active close of the client connection.
My first try would be to run it under valgrind and see if it reports any errors.
I would re-check myself against the following sections from epoll(7):
Q6: Will closing a file descriptor cause it to be removed from all epoll sets automatically?
and
If using an event cache...
There're some good points there.
Removing EPOLLONESHOT made the problem disappear, after a few other changes. Unfortunately I'm not totally sure what caused it. Using EPOLLONESHOT with threads and manually re-adding the fd to the epoll queue was quite certainly the problem. Also, the data pointer in the epoll struct is now released after a delay. It works perfectly now.
Register the event 0x2000 (the value of EPOLLRDHUP) for "remote host closed connection", e.g.:
ev.events = EPOLLIN | EPOLLONESHOT | EPOLLET | 0x2000
and check if (flag & 0x2000) for the remote host closing the connection. If your headers define EPOLLRDHUP, you can use the named constant instead of the raw value.