SO_KEEPALIVE: Detecting connection lost or terminated - sockets

I have multiple threads, each of which holds an open socket to one client application. Each thread receives instructions from a main thread to send commands to its client (run test, stop test, terminate session, exit, and so on). The threads are generic: each one owns a single client socket and simply sends a command when the main thread asks it to.
The client could exit or crash, or the network could be bad.
I have been trying to figure out how to detect that the TCP session for a given client has ended. I have found two solutions that seem appropriate here:
1) Implement my own heartbeat system
2) Use keep-alive via setsockopt.
I have tried 2) as it sounds faster to implement, but I am not sure of one thing: will SO_KEEPALIVE generate a SIGPIPE when the connection is interrupted? I read that it should, but I have never received a SIGPIPE.
This is how my code looks:
void setKeepAlive(int sockfd) {
    int optval;
    optval = 1;
    setsockopt(sockfd, SOL_SOCKET, SO_KEEPALIVE, &optval, sizeof(optval));
    optval = 1;
    setsockopt(sockfd, SOL_TCP, TCP_KEEPIDLE, &optval, sizeof(optval));
    optval = 1;
    setsockopt(sockfd, SOL_TCP, TCP_KEEPCNT, &optval, sizeof(optval));
    optval = 1;
    setsockopt(sockfd, SOL_TCP, TCP_KEEPINTVL, &optval, sizeof(optval));
}
And my code that accepts connection is as follows:
for (mNumberConnectedClients = 0; mNumberConnectedClients < maxConnections; ++mNumberConnectedClients) {
    clientSocket = accept(sockfd, (struct sockaddr *) &client_addr, &clientLength);
    // set KeepAlive on socket
    setKeepAlive(clientSocket);
    pthread_create(&mThread_pool[mNumberConnectedClients], NULL, controlClient, (void*) &clientSocket);
}
signal(SIGPIPE, test);
....
And the test function:
void test(int n) {
    printf("Socket broken: %d\n", n);
}
test() never gets triggered. Is my understanding wrong? I am not sure whether SIGPIPE gets generated at all. Thanks a lot.

If a keep-alive fails, the connection will simply be invalidated by the OS, and any subsequent read/write operations on that socket will fail with an appropriate error code. You need to make sure your reading/writing code is handling errors so it can close the socket, if it is not already doing so.
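As a rough sketch of that error handling: as far as I know, SIGPIPE is only raised when you write to a socket whose connection is already known to be broken, so a keep-alive failure on its own shows up as an error (typically ETIMEDOUT on Linux) from the next read or write. The helper name send_command is made up for illustration, and MSG_NOSIGNAL assumes Linux or another platform that supports that flag.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: send the whole buffer.
 * Returns 0 on success, or -1 once the connection is unusable
 * (errno is left set so the caller can log it and close the socket). */
int send_command(int sockfd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        /* MSG_NOSIGNAL reports a broken connection as EPIPE
         * instead of raising SIGPIPE in this thread. */
        ssize_t n = send(sockfd, p, len, MSG_NOSIGNAL);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: retry */
            return -1;      /* e.g. EPIPE, ECONNRESET, ETIMEDOUT */
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

When send_command() returns -1, the worker thread would close its socket and report the lost client back to the main thread.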

Related

Is a non-blocking connect guaranteed to fail with EINPROGRESS?

If I set up a socket for non-blocking operation, as follows:
int fd = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, IPPROTO_TCP);
int rc = connect(fd, (struct sockaddr *)&addr, sizeof(addr));
...is connect guaranteed to fail with EINPROGRESS, or do I need to handle the case where it succeeds immediately?
Not necessarily. Connecting to 127.0.0.1 may connect or fail immediately.
You need to handle the case where it succeeds immediately. That's why it returns 0 or -1. The documentation doesn't make any exception about that for non-blocking mode.
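A minimal sketch of handling all three outcomes, assuming an IPv4 address given as text. The names start_connect and nb_state are invented for this example, and it sets O_NONBLOCK with fcntl() rather than SOCK_NONBLOCK only to stay portable.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

enum nb_state { NB_CONNECTED, NB_IN_PROGRESS, NB_FAILED };

/* Hypothetical helper: open a non-blocking TCP socket and start
 * connecting to ip:port. On NB_CONNECTED or NB_IN_PROGRESS the
 * socket is stored in *out_fd; on NB_FAILED nothing is left open. */
enum nb_state start_connect(const char *ip, unsigned short port, int *out_fd)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return NB_FAILED;               /* not a valid IPv4 literal */

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return NB_FAILED;

    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd);
        return NB_FAILED;
    }

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0) {
        *out_fd = fd;
        return NB_CONNECTED;            /* completed immediately */
    }
    if (errno == EINPROGRESS) {
        *out_fd = fd;
        return NB_IN_PROGRESS;          /* wait for writability later */
    }
    close(fd);                          /* immediate failure */
    return NB_FAILED;
}
```

The caller only needs to wait for the socket to become writable in the NB_IN_PROGRESS case; NB_CONNECTED can be used right away.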

BSD socket connect + select (client)

There must be something wrong in the code below, but I can't get a non-blocking client connect to work in combination with select. Please ignore the lack of error handling.
I seem to have two issues:
1. select blocks until the timeout (60) if I try to connect to port 80 on an internet server
2. trying to connect to an existing or non-existing port on 127.0.0.1 always makes select return instantly, with no way to distinguish between success and failure to connect.
What am I missing in my understanding of BSD non-blocking sockets in combination with select?
fd_set readfds;
FD_ZERO(&readfds);
struct timeval tv;
tv.tv_sec = 60;
tv.tv_usec = 0;
struct sockaddr_in dest;
int socketFD = socket(AF_INET, SOCK_STREAM, 0);
memset(&dest, 0, sizeof(dest));
dest.sin_family = AF_INET;
dest.sin_addr.s_addr = inet_addr("127.0.0.1");
dest.sin_port = htons(9483);
long arg;
arg = fcntl(socketFD, F_GETFL, NULL);
arg |= O_NONBLOCK;
fcntl(socketFD, F_SETFL, arg);
if (connect(socketFD, (struct sockaddr *)&dest, sizeof(struct sockaddr))<0 && errno == EINPROGRESS) {
    //now add it to the read set
    FD_SET(socketFD, &readfds);
    int res = select(socketFD+1, &readfds, NULL, NULL, &tv);
    int error = errno;
    if (res>0 && FD_ISSET(socketFD, &readfds)) {
        NSLog(@"errno: %d", error); //Always 36
    }
}
errno is set in your original attempt to connect -- legitimately: that is, it's in-progress. You then call select. Since select didn't fail, errno is not being reset. System calls only set errno on failure; they do not clear it on success.
The connect may have completed successfully. You aren't checking that though. You should add a call to getsockopt with SO_ERROR to determine whether it worked. This will return the error state on the socket.
One other important note. According to the manual page (https://www.freebsd.org/cgi/man.cgi?query=connect&sektion=2), you should be using the writefds to await completion of the connect. I don't know whether the readfds will correctly report the status.
[EINPROGRESS] The socket is non-blocking and the connection cannot
be completed immediately. It is possible to select(2)
for completion by selecting the socket for writing.
See also this very similar question: Using select() for non-blocking sockets to connect always returns 1

Fail to Bind Socket

I've been writing a server, and every time I quit and reopen it, it fails to bind to the socket. I connect 2 clients and then disconnect them with close() before I shut down the server. I also quit the clients before reopening the server, just in case. It still fails, and I have to restart my computer. Here is my code:
listenSocket = device = app = 0;
struct sockaddr_in server_addr;
char buffer[1024];
listenSocket = socket(AF_INET, SOCK_STREAM, 0);
memset(&server_addr, '0', sizeof(server_addr));
memset(buffer, '0', sizeof(buffer));
server_addr.sin_family = AF_INET;
server_addr.sin_addr.s_addr = htonl(INADDR_ANY);
server_addr.sin_port = htons(35565);
//bind the socket
if (bind(listenSocket,(struct sockaddr*)&server_addr, sizeof(server_addr)) == -1) {
    NSLog(@"Error binding to socket");
}
if (listen(listenSocket, 5) == -1) {
    NSLog(@"Failed to listen");
}
//launch thread for console
[NSThread detachNewThreadSelector:@selector(console) toTarget:self withObject:nil];
NSLog(@"Starting server");
//socket open, ask for clients
while (true) {
    int client = -1;
    if (!device || !app)
        client = accept(listenSocket, (struct sockaddr*)NULL, NULL);
    //handshake omitted for length
}
And the code to close the server:
close(listenSocket);
close(device);
close(app);
NSLog(@"Clean");
Is there something I'm doing wrong? Any help would be appreciated. Thanks.
EDIT: Here is my error checking code:
NSLog(@"%s",strerror(errno));
int e = bind(listenSocket,(struct sockaddr*)&server_addr, sizeof(server_addr));
NSLog(@"%s",strerror(errno));
You need to set the SO_REUSEADDR option. Otherwise, once you grab the port in a process, there is a significant timeout before the kernel will let you have it again. Much detail to be found in an existing question; I've voted to close as a duplicate.
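A minimal sketch of where the option goes, assuming a plain IPv4 TCP listener. The function name make_listener is invented for this example; the important detail is that setsockopt() runs before bind().

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a listening TCP socket on the given
 * port with SO_REUSEADDR enabled. Returns the socket, or -1 on error. */
int make_listener(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Must be set before bind() to have any effect on it. */
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0) {
        close(fd);
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));     /* zero bytes, not '0' characters */
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 5) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

In the question's code this would replace the socket(), bind() and listen() calls, for example make_listener(35565).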
I had a similar problem which was caused by another process holding on to the ports. Killing that process solved the problem.

WSAConnect returns WSAEINVAL on WindowsXP

I use sockets in non-blocking mode, and sometimes the WSAConnect function returns a WSAEINVAL error.
I investigated the problem and found that it occurs when there is no pause (or only a very short one) between WSAConnect calls.
Does anyone know how to avoid this situation?
Below is source code that reproduces the problem. If I increase the argument to Sleep to 50 or greater, the problem disappears.
P.S. The problem reproduces only on Windows XP; on Windows 7 it works fine.
#undef UNICODE
#include <winsock2.h>
#include <ws2tcpip.h>
#include <stdio.h>
#include <iostream>
#include <windows.h>
#pragma comment(lib, "Ws2_32.lib")
static int getError(SOCKET sock)
{
    DWORD error = WSAGetLastError();
    return error;
}
void main()
{
    SOCKET sock;
    WSADATA wsaData;
    if (WSAStartup(MAKEWORD(2, 2), &wsaData) != 0) {
        fprintf(stderr, "Socket Initialization Error. Program aborted\n");
        return;
    }
    for (int i = 0; i < 1000; ++i) {
        struct addrinfo hints;
        struct addrinfo *res = NULL;
        memset(&hints, 0, sizeof(hints));
        hints.ai_flags = AI_PASSIVE;
        hints.ai_socktype = SOCK_STREAM;
        hints.ai_family = AF_INET;
        hints.ai_protocol = IPPROTO_TCP;
        if (0 != getaddrinfo("172.20.1.59", "8091", &hints, &res)) {
            fprintf(stderr, "GetAddrInfo Error. Program aborted\n");
            closesocket(sock);
            WSACleanup();
            return;
        }
        struct addrinfo *ptr = 0;
        for (ptr=res; ptr != NULL ;ptr=ptr->ai_next) {
            sock = WSASocket(ptr->ai_family, ptr->ai_socktype, ptr->ai_protocol, NULL, 0, NULL); //
            if (sock == INVALID_SOCKET)
                int err = getError(sock);
            else {
                u_long noblock = 1;
                if (ioctlsocket(sock, FIONBIO, &noblock) == SOCKET_ERROR) {
                    int err = getError(sock);
                    closesocket(sock);
                    sock = INVALID_SOCKET;
                }
                break;
            }
        }
        int ret;
        do {
            ret = WSAConnect(sock, ptr->ai_addr, (int)ptr->ai_addrlen, NULL, NULL, NULL, NULL);
            if (ret == SOCKET_ERROR) {
                int error = getError(sock);
                if (error == WSAEWOULDBLOCK) {
                    Sleep(5);
                    continue;
                }
                else if (error == WSAEISCONN) {
                    fprintf(stderr, "+");
                    closesocket(sock);
                    sock = SOCKET_ERROR;
                    break;
                }
                else if (error == 10037) {
                    fprintf(stderr, "-");
                    closesocket(sock);
                    sock = SOCKET_ERROR;
                    break;
                }
                else {
                    fprintf(stderr, "Connect Error. [%d]\n", error);
                    closesocket(sock);
                    sock = SOCKET_ERROR;
                    break;
                }
            }
            else {
                int one = 1;
                setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, (char*)&one, sizeof(one));
                fprintf(stderr, "OK\n");
                break;
            }
        }
        while (1);
    }
    std::cout<<"end";
    char ch;
    std::cin >> ch;
}
You've got a whole basketful of errors and questionable design and coding decisions here. I'm going to have to break them up into two groups:
Outright Errors
I expect if you fix all of the items in this section, your symptom will disappear, but I wouldn't want to speculate about which one is the critical fix:
Calling connect() in a loop on a single socket is simply wrong.
If you mean to establish a connection, drop it, and reestablish it 1000 times, you need to call closesocket() at the end of each loop, then call socket() again to get a fresh socket. You can't keep re-connecting the same socket. Think of it like a power plug: if you want to plug it in twice, you have to unplug (closesocket()) between times.
If instead you mean to establish 1000 simultaneous connections, you need to allocate a new socket with socket() on each iteration, connect() it, then go back around again to get another socket. It's basically the same loop as for the previous case, except without the closesocket() call.
Beware that since XP is a client version of Windows, it's not optimized for handling thousands of simultaneous sockets.
Calling connect() again is not the correct response to WSAEWOULDBLOCK:
if (error == WSAEWOULDBLOCK) {
    Sleep(5);
    continue; /// WRONG!
}
That continue effectively commits the same error as above. Worse, if you fix only the previous error and leave this one in place, your code will start leaking sockets.
WSAEWOULDBLOCK is not an error. All it means after a connect() on a nonblocking socket is that the connection didn't get established immediately. The stack will notify your program when it does.
You get that notification by calling one of select(), WSAEventSelect(), or WSAAsyncSelect(). If you use select(), the socket will be marked writable when the connection gets established. With the other two, you will get an FD_CONNECT event when the connection gets established.
Which of these three APIs to call depends on why you want nonblocking sockets in the first place, and what the rest of the program will look like. What I see so far doesn't need nonblocking sockets at all, but I suppose you have some future plan that will inform your decision. I've written an article, Which I/O Strategy Should I Use (part of the Winsock Programmers' FAQ) which will help you decide which of these options to use; it may instead guide you to another option entirely.
You shouldn't use AI_PASSIVE and connect() on the same socket. Your use of AI_PASSIVE with getaddrinfo() tells the stack you intend to use this socket to accept incoming connections. Then you go and use that socket to make an outgoing connection.
You've basically lied to the stack here. Computers find ways to get revenge when you lie to them.
Sleep() is never the right way to fix problems with Winsock. There are built-in delays within the stack that your program can see, such as TIME_WAIT and the Nagle algorithm, but Sleep() is not the right way to cope with these, either.
Questionable Coding/Design Decisions
This section is for things I don't expect to make your symptom go away, but you should consider fixing them anyway:
The main reason to use getaddrinfo() — as opposed to older, simpler functions like inet_addr() — is if you have to support IPv6. That kind of conflicts with your wish to support XP, since XP's IPv6 stack wasn't nearly as heavily tested during the time XP was the current version of Windows as its IPv4 stack. I would expect XP's IPv6 stack to still have bugs as a result, even if you've got all the patches installed.
If you don't really need IPv6 support, doing it the old way might make your symptoms disappear. You might end up needing an IPv4-only build for XP.
This code:
for (int i = 0; i < 1000; ++i) {
    // ...
    if (0 != getaddrinfo("172.20.1.59", "8091", &hints, &res)) {
...is inefficient. There is no reason you need to keep reinitializing res on each loop.
Even if there is some reason I'm not seeing, you're leaking memory by not calling freeaddrinfo() on res.
You should initialize this data structure once before you enter the loop, then reuse it on each iteration.
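A sketch of what that could look like, with the lookup pulled into a helper that is called once before the loop. The name resolve_once is made up; the non-Windows branch of the #ifdef is there only so the same code also builds against BSD sockets, and on Windows it assumes WSAStartup() has already succeeded.

```c
#include <string.h>
#ifdef _WIN32
#include <winsock2.h>
#include <ws2tcpip.h>
#else
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#endif

/* Hypothetical helper: resolve host/service once for outgoing TCP
 * connections. Returns a list the caller must release with
 * freeaddrinfo(), or NULL if the lookup failed. */
struct addrinfo *resolve_once(const char *host, const char *service)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_protocol = IPPROTO_TCP;
    /* ai_flags stays 0: AI_PASSIVE is meant for sockets that will
     * bind() and accept, not for ones that connect(). */

    if (getaddrinfo(host, service, &hints, &res) != 0)
        return NULL;
    return res;
}
```

The loop then reuses the returned list on every iteration, and a single freeaddrinfo() call after the loop releases it.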
else if (error == 10037) {
Why aren't you using WSAEALREADY here?
You don't need to use WSAConnect() here. You're using the 3-argument subset that Winsock shares with BSD sockets. You might as well use connect() here instead.
There's no sense making your code any more complex than it has to be.
Why aren't you using a switch statement for this?
if (error == WSAEWOULDBLOCK) {
    // ...
}
else if (error == WSAEISCONN) {
    // ...
}
// etc.
You shouldn't disable the Nagle algorithm:
setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, ...);

socket timeout value for iPhone app

I'm writing an iPhone application that uses sockets and connects them with CFSocketConnectToAddress. I need to specify the socket timeout in seconds. What is the best timeout value, in seconds, for an iPhone/iPod on a Wi-Fi/3G/EDGE connection?
Sample code:
#define SOCKET_TIMEOUT_VALUE ?
CFSocketRef sock_id = CFSocketCreate(kCFAllocatorDefault, PF_INET, SOCK_STREAM, IPPROTO_TCP, kCFSocketNoCallBack, NULL, NULL) ;
struct sockaddr_in addr4;
memset(&addr4, 0, sizeof(addr4));
addr4.sin_len = sizeof(addr4);
addr4.sin_family = PF_INET;
addr4.sin_port = htons([[hostValue port_number] intValue]);
inet_pton(AF_INET, inet_ntoa(*(struct in_addr *)host_name->h_addr_list[0]), &addr4.sin_addr);
CFDataRef addr = CFDataCreate(kCFAllocatorDefault, (void*)&addr4, sizeof(struct sockaddr_in));
int retVal = CFSocketConnectToAddress(sock_id, addr, SOCKET_TIMEOUT_VALUE);
if (retVal != 0)
{
    // Failed to Connect!
    errorNumber = FAILED_CONNECT ;
    CFRelease(addr) ;
    CFRelease(sock_id) ;
    goto shutdown2;
}
Apple documentation for CFSocketConnectToAddress
EDIT:
The app will create a few sockets at the same time, for different hosts.
Thanks
Given typical circumstances, a socket will connect within an imperceptibly small amount of time (for the user). Sometimes that isn't the case. The structure of your application (specifically, the threading) should account for this.
The foremost rule is that you shouldn't block the main thread while waiting for the socket to connect. Show an indeterminate progress indicator that you're trying to connect. Give the user the ability to back up and choose not to proceed.
If you're looking for a numerical answer instead of a design answer, I frequently toss in sixty seconds.