How to detect when socket connection is lost? - perl

I have a script (I don't have the code example here at the moment, but I used IO::Async) which connects to a socket on a remote server and listens. The client usually just listens for new data.
The problem is that the client is not able to detect when a network problem occurs and the socket connection is gone.
I used IO::Async and I also tried it with IO::Socket. The handle always appears "connected" after the initial connection is established.
If the network connection is established again, the socket connection is naturally still lost, because the script has no idea that it should reconnect.
I was thinking of creating some kind of "keepalive" which "pings" (syswrite) the socket every X seconds (if nothing new came through the socket) to check whether the connection is still there.
Is this the correct way to do it, or is there a more creative or cleaner solution?

You can set the SO_KEEPALIVE socket option, which, for TCP, sends periodic keepalive messages and may help detect this condition. If the loss is detected, you will be delivered an EOF condition (most likely causing the containing IO::Async::Stream to fire on_read_eof).
For a better solution you might consider some sort of application-level keepalive message, such as IRC's PING command.
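For illustration, a minimal sketch combining both ideas with IO::Async; the host, port, 30-second interval and PING line protocol are assumptions, not part of the question:

use IO::Async::Loop;
use IO::Async::Stream;
use IO::Async::Timer::Periodic;
use IO::Socket::INET;
use Socket qw(SOL_SOCKET SO_KEEPALIVE);

my $loop = IO::Async::Loop->new;

my $sock = IO::Socket::INET->new(
    PeerAddr => 'example.com',           # assumed host and port
    PeerPort => 12345,
    Proto    => 'tcp',
) or die "connect: $!";
$sock->setsockopt(SOL_SOCKET, SO_KEEPALIVE, 1);   # TCP-level keepalive probes

my $stream = IO::Async::Stream->new(
    handle  => $sock,
    on_read => sub {
        my ($self, $buffref, $eof) = @_;
        while ($$buffref =~ s/^(.*)\n//) {
            my $line = $1;                # handle data (and PONG replies) here
        }
        return 0;
    },
    on_read_eof => sub { warn "connection lost (EOF)\n"; $loop->stop },
);
$loop->add($stream);

# Application-level keepalive: write a PING whenever the timer ticks.
my $timer = IO::Async::Timer::Periodic->new(
    interval => 30,
    on_tick  => sub { $stream->write("PING\n") },
);
$timer->start;
$loop->add($timer);

$loop->run;

If the peer or an intermediate firewall drops the connection, the periodic writes give TCP something to fail on, and the stream will eventually report EOF or an error so you can tear down and reconnect.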

The short answer is that there is no default way to automatically detect a dropped socket in Perl.
Your approach of pinging would probably work pretty well; you could run a continuous thread in the background that sends ping requests, and if it doesn't receive a response, the main thread can be notified and a reconnect can be issued.
If you want to get messy, you can work with select() to detect keep-alive messages; however, this may require some OS configuration depending upon your platform.
See this thread for more details: http://www.perlmonks.org/?node_id=566568
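A rough sketch of the ping-when-idle pattern with plain IO::Socket and select() (via IO::Select); the 30-second idle window and the PING line are assumptions, and note that a single successful syswrite only proves the local TCP stack accepted the data, not that the peer is alive:

use IO::Socket::INET;
use IO::Select;

my $sock = IO::Socket::INET->new(
    PeerAddr => 'example.com', PeerPort => 12345, Proto => 'tcp',
) or die "connect: $!";
my $sel = IO::Select->new($sock);

while (1) {
    my @ready = $sel->can_read(30);          # wait up to 30s for data
    if (@ready) {
        my $n = sysread($sock, my $buf, 4096);
        if (!defined $n or $n == 0) {        # error or EOF: peer is gone
            warn "connection lost: " . ($! || 'closed by peer') . "\n";
            last;                            # reconnect logic would go here
        }
        # process $buf ...
    }
    else {                                   # nothing for 30s: probe the link
        syswrite($sock, "PING\n")
            or do { warn "ping failed: $!\n"; last };
    }
}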

Related

Network packet loss causes client code to act strange

I am facing an issue and need some help coming up with the best way to resolve it.
Here is the problem:
I have server code running which has a socket that is listening to accept new incoming connections.
I then attempt to start a client, which also has a socket that is listening to accept new incoming connections.
The client code begins with accepting a new connection on the listening socket file descriptor and gets a new socket file descriptor for I/O.
The server does the same thing and gets a new socket file descriptor for I/O.
Note: The client is not completely up, yet. It needs to receive some bytes from the server and send some before it can start.
I then introduce some packet loss on the TCP/IP network connection. This causes certain errors; for example, the recv() system call in the client process sees no received bytes, so the client closes the socket connection on its side and the associated new socket file descriptor is closed. However, this leaves the client process hanging, since there are other descriptors in the FD_SET but none of them will ever be ready for I/O, so pselect() keeps returning 0 file descriptors ready for I/O. The client needs to send and receive certain bytes over the connection before it can start up.
My question is: what should I do here?
I researched the SO_KEEPALIVE option, which I could set on the new socket obtained from the accept() system call. But I do not think that would resolve my problem here, especially if the network packet loss is ongoing.
Should I kill the client process once I realize there are no file descriptors ready for I/O and never will be? Is there a better way to approach this?
If I'm reading the question correctly, the core of the question is: "what should your client program do when a TCP connection that is central to its functionality has been broken?"
The answer to that question is really a matter of preference -- what would you like your client program to do in that case? Or to put it another way, what behavior would your users find most useful?
In many of my own client programs, I have logic included such that if the TCP connection to the server is ever broken, the client will automatically try to create a new TCP connection to the server and thereby recover its connectivity and useful functionality as soon as possible.
The other obvious option would be to just have the client quit when the connection is broken; perhaps with some sort of error indication so that the user will know why the client went away. (perhaps an error dialog that asks if the user would like to try to reconnect?)
SO_KEEPALIVE is probably not going to help you much in this scenario, by the way. Despite its name, its purpose is to help a program discover in a more timely manner that TCP connectivity has been lost, not to try harder to keep a TCP connection from being lost. (And it doesn't even serve that purpose particularly well, since in many TCP stacks keepalive packets are only sent every couple of hours or so, which means that even with SO_KEEPALIVE enabled it can be a very long time before your program starts receiving errors reflecting the loss of network connectivity.)

Has the client ACK'd all the data I sent to it?

RFC 7230, which defines the HTTP/1.1 protocol, has an interesting passage in section 6.6, "Connection Management: Tear-down":
To avoid the TCP reset problem, servers typically close a connection
in stages. First, the server performs a half-close by closing only the
write side of the read/write connection. The server then continues to
read from the connection until it receives a corresponding close by
the client, or until the server is reasonably certain that its own TCP
stack has received the client's acknowledgement of the packet(s)
containing the server's last response. Finally, the server fully
closes the connection.
Basically it boils down to the following:
shutdown(s, SD_SEND);                    /* half-close: no more sends from us */
while (recv(s, throwaway_buffer, throwaway_buffer_len, 0) > 0)
    ;                                    /* drain whatever the peer still sends */
closesocket(s);                          /* finally release the socket */
which is the standard way of doing a graceful shutdown. However, it also acknowledges that a misbehaving client may exist (one that keeps sending requests even after receiving a response with a Connection: close header), and that the server has to cope with it by resetting the connection once it's sure the client has received the last response.
However, the socket interface doesn't seem to provide a way to learn whether all data passed to send() has actually been sent and ACK'd by the remote host. Is it actually there? Without it, all I can think of is to set up a timer of sorts and call recv() until either it signals that the remote host has closed the connection or the time runs out, whichever comes first. But what would be an appropriate timeout? Is 60 seconds okay?
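For reference, that timed-drain idea might look roughly like this in Perl (a sketch only; the helper name and 4096-byte read size are made up, and $sock is an already-connected socket):

use IO::Select;

sub drain_then_close {
    my ($sock, $timeout) = @_;       # e.g. 60 seconds
    shutdown($sock, 1);              # half-close: we are done sending (SHUT_WR)
    my $sel      = IO::Select->new($sock);
    my $deadline = time() + $timeout;
    while ((my $left = $deadline - time()) > 0) {
        my @ready = $sel->can_read($left);
        last unless @ready;                      # timed out waiting for the peer
        my $n = sysread($sock, my $junk, 4096);
        last if !defined($n) || $n == 0;         # error, or peer closed its side
    }
    close($sock);
}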
The sockets interface provides this means via the little-used and less-understood SO_LINGER option. It allows you, inter alia, to define a timeout during which close() and possibly shutdown() will block while pending data is being sent. It is of little practical use and, as I've stated, it is rarely used ... at least rarely used correctly.
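In Perl, SO_LINGER is set by packing a struct linger (an on/off flag plus a timeout in seconds); a sketch, assuming the common layout of two native ints and an existing socket handle $sock:

use Socket qw(SOL_SOCKET SO_LINGER);

# l_onoff = 1 (enabled), l_linger = 10: close() blocks for up to 10 seconds
# while unsent data is flushed; a value of 0 seconds instead forces an RST.
my $linger = pack('ii', 1, 10);
setsockopt($sock, SOL_SOCKET, SO_LINGER, $linger)
    or die "setsockopt(SO_LINGER): $!";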

select() is not returning on the client side

I have written a client socket program using Linux sockets only. Here is an overview of what I am doing in my program:
* Create the socket.
* Make the connection to the server socket.
* Assign that socket to the read set and exception set for select().
* Call select() with a NULL timeout value, in a separate thread.
The server is running on an external device.
This program works fine for reading and so on. The problem appears when I unplug the power cable of that device.
I assumed that when the power cable of the device is removed, all its sockets would be abruptly closed and the connected client sockets would get a read event; when we try to read, we would receive zero bytes, which means the connection was closed by the server.
But in my client program, when I unplug the power cable of the device, select() does not return, meaning the client socket is not getting any event. I don't understand why.
Any suggestion on how we can find out that the connection was closed by the server, or any information on what the socket's behaviour is when the power supply is shut down, would be appreciated.
I need your help; it's very critical.
Thank you.
When a remote machine is suddenly cut off from the network (network cable unplugged or power loss), there is no way it can inform the other side of the connection about that. What is more, a client side that only performs reads on a half-open socket (like in your case) won't be able to detect this either.
The only way to learn about a connection loss is to send a packet. Since all data being sent must be acknowledged by the other side, TCP on the client computer will keep retrying to send the unconfirmed portion of data until the number of attempts is exhausted; then an ETIMEDOUT error should be returned (via a socket that is expecting read events). You can create one more socket for sending such messages periodically to detect the peer's disappearance (a heartbeat connection) on the client side. But all these retries might still take some time.
Another option could be to use the SO_KEEPALIVE socket option. After the connection has been idle for some time, TCP starts sending probe messages to the server and can detect its disappearance. The default idle times are usually enormously long, so they need to be modified; some related parameters are TCP_KEEPCNT, TCP_KEEPINTVL and TCP_KEEPIDLE. This option may be implemented differently on different systems, or may simply be absent.
I've never personally tried to solve this problem, so all this is just a bunch of thoughts that might give some ideas. Here is one more source of ideas.
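On Linux those knobs can be set per socket. A Perl sketch, assuming your platform's Socket module exports the TCP_KEEP* constants (they are platform-dependent) and that $sock is the connected socket:

use Socket qw(SOL_SOCKET SO_KEEPALIVE IPPROTO_TCP
              TCP_KEEPIDLE TCP_KEEPINTVL TCP_KEEPCNT);

# Enable keepalive, then shrink the huge defaults: start probing after 60s
# of idle time, probe every 10s, and give up after 5 unanswered probes.
setsockopt($sock, SOL_SOCKET,  SO_KEEPALIVE,  1)  or die "SO_KEEPALIVE: $!";
setsockopt($sock, IPPROTO_TCP, TCP_KEEPIDLE,  60) or die "TCP_KEEPIDLE: $!";
setsockopt($sock, IPPROTO_TCP, TCP_KEEPINTVL, 10) or die "TCP_KEEPINTVL: $!";
setsockopt($sock, IPPROTO_TCP, TCP_KEEPCNT,   5)  or die "TCP_KEEPCNT: $!";

With values like these a silent peer is declared dead after roughly 60 + 5 x 10 seconds, and a blocked read on the socket then fails instead of hanging forever.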

TCP recover connection after hardware disconnect

I've got a program that continuously writes to a TCP socket. I want to make sure that if the connection between the client and server is lost for any amount of time, it can be re-established.
Right now, I can disconnect the wire, and while the write() function loops, it returns one "connection reset by peer" error and then the value of ULLONG_MAX (presumably -1 stored in an unsigned variable). Then, once I replug the wire, write() continuously returns "broken pipe" errors. I've tried to close and reopen the connection, but I continue to get the "connection reset by peer" error.
Does anyone know how I could either re-establish the connection, or keep it alive for a certain amount of time (or indefinitely) in the first place?
You cannot re-use the file descriptor here; you have to start from scratch again: create a new socket(2) and call connect(2) on it.
I'm afraid you have to establish a new connection, and that can only be initiated by the client program. You might need some way to ensure it's the same client reconnecting, e.g. check the IP address or exchange a token on the first connection, so that you can handle the initial connection and a recovery connection differently. That solution needs some programming on your part, though.
If TCP is not, for some reason, the only choice, you might want to think about UDP communication, since there the decision about when a connection is lost is left to you. But you'll need to take care of a lot of other things (although since you are aiming for lose-and-recover communication, that might suit your needs better).
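As the first answer says, recovery means a brand-new socket and a fresh connect. A hedged Perl sketch of a reconnect loop with capped back-off (host, port and delays are assumptions); it also ignores SIGPIPE so a write on a dead connection returns an error instead of killing the process:

use IO::Socket::INET;

$SIG{PIPE} = 'IGNORE';     # get EPIPE from syswrite instead of a fatal signal

sub connect_with_retry {
    my ($host, $port) = @_;
    my $delay = 1;
    while (1) {
        my $sock = IO::Socket::INET->new(
            PeerAddr => $host, PeerPort => $port, Proto => 'tcp', Timeout => 5,
        );
        return $sock if $sock;                    # new socket(2) + connect(2) worked
        warn "connect to $host:$port failed ($!), retrying in ${delay}s\n";
        sleep $delay;
        $delay = $delay < 30 ? $delay * 2 : 30;   # exponential back-off, capped
    }
}

my $sock = connect_with_retry('example.com', 12345);
while (1) {
    unless (syswrite($sock, "data\n")) {          # EPIPE/ECONNRESET: peer is gone
        warn "write failed ($!), reconnecting\n";
        close($sock);
        $sock = connect_with_retry('example.com', 12345);
    }
    sleep 1;
}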

Best socket options for client and server that continuously transfer data

I am using Java (although I think these socket options are implemented in most languages) to implement a client and server. The server sends data to the client for processing, which the client acknowledges. On another port the client then sends the results of the processing back to the server. When it comes to options such as
SO_LINGER
SO_KEEPALIVE
SO_NODELAY
SO_REUSEADDRESS
SO_SENDBUFFER
SO_RECBUFFER
TCP_NODELAY
We have noticed that the connection between the client and server occasionally breaks. There will be a timeout on the send or the receive. When this happens we kill the socket and open a new one to continue.
What would be the best options to set for the above scenario, and is there anything that we could do from our side (programmatically or options-wise) to try to minimize the number of times the connection is dropped? We are using normal TCP/IP.
UPDATE:
The bounty on this ends soon. I haven't had a satisfactory answer yet, so it is still open. I think everyone is missing the point of the question. What is the best practice with regard to the options above for sockets that chat continuously? I already have a ping packet in place: if there is no work to be done (hardly ever the case), a normal message is sent with no inner elements, so there is always processing.
Strictly speaking, you don't need any of these socket options:
* SO_LINGER
You need to set SO_LINGER only if your application still has outstanding packets to send when close(2) or shutdown(2) has been called. Not really applicable for your application.
* SO_KEEPALIVE
Sending keepalive-pings every two hours would really only help very long-lived but -very- quiet connections going through stateful firewalls with very long session timeouts. (Two hours between pings is entirely too long to be practical in today's Internet.)
* SO_NODELAY
This (presumably an alias for TCP_NODELAY) disables Nagle's algorithm, which is just a small-packet-avoidance mechanism. Perhaps Nagle is getting in the way in your application, but it takes special sequences of packets to introduce 500ms delays into processing; it never just hangs connections.
* SO_REUSEADDRESS
Useful for all 'servers' that listen on well-known port numbers; use on 'clients' is almost always covering up some bug or other, but it is sometimes necessary if requests must come from a well-known port number.
* SO_SENDBUFFER
* SO_RECBUFFER
These buffer sizes influence the kernel-side buffer sizes maintained for receiving or sending data while your program (receive buffer) or the socket (send buffer) isn't yet ready to accept more data. If these are set too small, your application might not transfer data as smoothly as possible, reducing throughput, but it should not lead to any stalls if these are set smaller than optimal. Of course, too large may put unreasonable demands on kernel memory, but there should be a reasonable system-wide maximum allowed size.
* TCP_NODELAY
Disables Nagle. Not likely to do more than introduce 500ms delays if your application sends multiple small packets before attempting a blocking read.
Really, you shouldn't need to set any socket options.
Can you distill your code into something that could be pasted here and tested or inspected? I'm used to TCP sessions surviving for days or weeks without trouble, so this is pretty surprising.
First I think that this page is relevant, regarding half-open connections.
http://nitoprograms.blogspot.com/2009/05/detection-of-half-open-dropped.html
That being said, TCP is designed to hide connection problems, so you may often find yourself in cases where the connection is broken, but neither side thinks it is. You have addressed this partially by using timeouts and taking that as a sign the connection is broken.
Since you are writing the client and server, I would avoid relying on TCP to tell you when the connection is broken altogether. I would just have the server also acknowledge the receipt of the result from the client. Then both sides will expect immediate responses to their messages, and you can track which messages have been ack'd and set an appropriately small timeout for receiving the ack. This is not a timeout on the send or receive, but a timeout on the time between sending a message and receiving the ack for that message. Then you can set the timeout appropriately depending on the quality of your connection (e.g. very small if you are running on loopback, but large if running over wireless with a weak signal).
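A sketch of that per-message acknowledgement timeout in Perl (the ACK line format, the 4096-byte read and the helper name are made up for illustration):

use IO::Select;

# Send one message and wait up to $timeout seconds for an ACK line.
# A false return means the link should be treated as broken.
sub send_and_wait_ack {
    my ($sock, $msg, $timeout) = @_;
    syswrite($sock, "$msg\n") or return 0;
    my @ready = IO::Select->new($sock)->can_read($timeout);
    return 0 unless @ready;                      # no reply within the deadline
    my $n = sysread($sock, my $buf, 4096);
    return 0 if !defined($n) || $n == 0;         # error or EOF
    return $buf =~ /^ACK\b/;                     # peer acknowledged the message
}

# if (!send_and_wait_ack($sock, "RESULT 42", 5)) { ... tear down and reconnect ... }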
Regarding the options you list, you will want to use SO_REUSEADDRESS so that you won't be prevented from reopening the socket, for example if it hasn't finished closing from a previously killed process.
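In Perl that is just the ReuseAddr flag on the listening socket; a short sketch (the port number is an assumption):

use IO::Socket::INET;

my $listener = IO::Socket::INET->new(
    LocalPort => 12345, Listen => 10, ReuseAddr => 1, Proto => 'tcp',
) or die "listen: $!";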
You probably have, but it is best to check the obvious....
Have you verified that it IS the socket that is timing out, and not your code? Sockets are fairly stable, and while there might be an issue somewhere, it seems more likely that it is in your code. I would use logs, timestamps, and synchronised clocks to be sure.
It may be that you genuinely DO take a long time to do the calculation, so maybe add an 'I'm still thinking about it' message to your protocol, sent regularly, to keep the connection alive?
Of course networks will drop out from time to time regardless of what you do, and it sounds like you are already handling that case nicely.
Try these options:
SO_LINGER - specifies what happens when close() is called on the socket while there is still unsent data in the queue
TCP_NODELAY - disables Nagle's algorithm so small writes are sent without delay
I would strongly encourage you to use a ping/echo model between client and server, so that if no data is sent for x seconds a ping message is sent. A typical reason for a break is a firewall, which shuts down sockets because of inactivity.
The typical situation where the TCP model fails is a physical problem, e.g. a pulled or broken cable: one side hangs, technically still listening, until a queue overrun kicks in (which might never happen given your amount of data).
What are the chances the connection is going through a NAT firewall somewhere along the way? Stateful firewalls maintain a table of open connections so that packets belonging to an allowed connection can quickly pass through the system, without forcing firewall admins to write overly-complex rule sets.
The downside is that this table can grow immensely large, so it must be pruned as connections are closed or as they appear to have simply grown stale and died quietly. A connection that has gone silent for 20 minutes is usually quiet enough to be reaped. (Which is really very quick, as the TCP keepalive interval is typically two hours, making it nearly useless in the face of NAT firewalls.)
So: is this going through a NAT firewall? Is the connection quiet for long stretches? If so, add a ping/pong to your protocol, and fire it every few minutes.