I've got a program that continuously writes to a TCP socket. I want to make sure that if the connection between the client and server is lost for any amount of time, it can be re-established.
Right now, if I disconnect the wire while write() is being called in a loop, it returns one "connection reset by peer" error and then the value of ULLONG_MAX. Once I replug the wire, write() continuously returns "broken pipe" errors. I've tried to close and reopen the connection, but I continue to get the "connection reset by peer" error.
Does anyone know how I could either re-establish the connection or keep it alive for a certain amount of time (or indefinitely) in the first place?
You cannot reuse the file descriptor here. You have to start from scratch: create a new socket with socket(2) and call connect(2) on it.
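A minimal sketch of that "new socket, new connect" pattern, written in Python for brevity (its socket module wraps the same socket(2)/connect(2) calls). The function names, the retry delay, and the 5-second connect timeout are illustrative assumptions, not something taken from the question.

```python
import socket
import time


def connect_with_retry(host, port, retry_delay=1.0, max_attempts=None):
    """Return a freshly connected socket, retrying until one attempt succeeds.

    A new socket is created for every attempt, because a socket whose
    connection has failed cannot simply be connected again.
    """
    attempt = 0
    while max_attempts is None or attempt < max_attempts:
        attempt += 1
        try:
            # create_connection() creates a new socket and calls connect() on it.
            return socket.create_connection((host, port), timeout=5.0)
        except OSError:
            time.sleep(retry_delay)
    raise ConnectionError(f"could not connect to {host}:{port}")


def send_reliably(sock, payload, host, port):
    """Send payload; on failure, discard the socket and reconnect.

    Returns the socket to use for the next call (possibly a new one).
    """
    try:
        sock.sendall(payload)
        return sock
    except OSError:
        # Covers "connection reset by peer", "broken pipe", timeouts, etc.
        sock.close()
        new_sock = connect_with_retry(host, port)
        new_sock.sendall(payload)
        return new_sock
```

The caller keeps whatever socket send_reliably() returns, for example sock = send_reliably(sock, data, host, port). Note that data written just before the failure was noticed may never have reached the server, so the application has to decide what to resend.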
I'm afraid you have to establish a new connection, and that can only be initiated by the client program. You might need some way to ensure that it's the same client reconnecting, for example by checking the IP or exchanging a token on the first connection, so that you can initialize the connection differently for a first connection and for a recovery. That solution needs some programming on your part, though.
If TCP is not, for some reason, the only choice, you might want to think about UDP communication, since there the decision about when a connection is lost is left to you. You'll need to take care of a lot of other things yourself, but since you are aiming for communication that can be lost and recovered, that might suit your needs better.
I am facing an issue and need some help working out the best way to resolve it.
Here is the problem:
I have server code running which has a socket that is listening to accept new incoming connections.
I then attempt to start a client, which also has a socket that is listening to accept new incoming connections.
The client code begins with accepting a new connection on the listening socket file descriptor and gets a new socket file descriptor for I/O.
The server does the same thing and gets a new socket file descriptor for I/O.
Note: The client is not completely up, yet. It needs to receive some bytes from the server and send some before it can start.
I then introduce some packet loss over the TCP/IP network connection. This causes certain errors (for example, the recv() system call in the client process sees no received bytes, so the client closes its side of the connection and the associated new socket file descriptor). However, this leaves the client process hanging, since there are other descriptors in the FD_SET but none of them is ready for I/O, so pselect() keeps returning 0 ready file descriptors. The client needs to send and receive certain bytes over the connection before it can start up.
My question is: what should I do here?
I looked into setting the SO_KEEPALIVE option on the new socket returned by the accept() system call, but I do not think that would resolve my problem, especially if the network packet loss is ongoing.
Should I kill the client process if I realize that no file descriptors are ready for I/O and none ever will be? Is there a better way to approach this?
If I'm reading the question correctly, the core of the question is: "what should your client program do when a TCP connection that is central to its functionality has been broken?"
The answer to that question is really a matter of preference -- what would you like your client program to do in that case? Or to put it another way, what behavior would your users find most useful?
In many of my own client programs, I have logic included such that if the TCP connection to the server is ever broken, the client will automatically try to create a new TCP connection to the server and thereby recover its connectivity and useful functionality as soon as possible.
The other obvious option would be to just have the client quit when the connection is broken; perhaps with some sort of error indication so that the user will know why the client went away. (perhaps an error dialog that asks if the user would like to try to reconnect?)
SO_KEEPALIVE is probably not going to help you much in this scenario, by the way -- despite its name, its purpose is to help a program discover in a more timely manner that TCP connectivity has been lost, not to try harder to keep a TCP connection from being lost. (And it doesn't even serve that purpose particularly well with default settings, since many TCP stacks wait about two hours of idle time before sending the first keepalive probe, which means that even with SO_KEEPALIVE enabled it can be a very long time before your program starts receiving errors reflecting the loss of network connectivity.)
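Whichever behavior you choose, the client first has to notice that the startup exchange is not going to complete, rather than waiting in select forever. Here is a rough Python sketch of putting a deadline on that exchange. The function name and the deadline values are my own illustrative choices, not anything from the question's code.

```python
import select
import socket
import time


def recv_exact_with_deadline(sock, nbytes, deadline_seconds):
    """Read exactly nbytes from sock, or return None on EOF or deadline.

    select() is given the time remaining until the deadline, so the call
    cannot block forever when the peer goes quiet.
    """
    deadline = time.monotonic() + deadline_seconds
    chunks = []
    remaining = nbytes
    while remaining > 0:
        timeout = deadline - time.monotonic()
        if timeout <= 0:
            return None  # deadline passed before all bytes arrived
        readable, _, _ = select.select([sock], [], [], timeout)
        if not readable:
            return None  # select() timed out: nothing arrived in time
        chunk = sock.recv(remaining)
        if not chunk:
            return None  # zero bytes: the peer closed the connection
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

When this returns None, the client knows the handshake failed and can apply whichever policy it prefers: close the socket and reconnect, or exit with an error message.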
Imagine a connection established between a client and a server. If one of the participants loses its connection to the network for a short time, will the socket still be valid? I am mostly interested in the LWIP implementation, but something tells me that the answer is the same for any socket.
By the way, is it a good idea to reduce the KEEP_ALIVE parameters to the order of seconds when very fast disconnection detection is required, but only for a short time?
By "connection lost" I mean physical causes, like losing the connection to a wifi network.
If one of the participants has lost connection with the network for a short time, will socket still be valid?
It depends. Assuming that you mean TCP sockets: if no data had to be exchanged during this time, a short loss of connectivity does not matter at all. If instead data had to be exchanged or TCP keepalive was active, the connection might either degrade (slowing down and retransmitting application data that has not yet been acknowledged) or get closed with an error, depending on how long the physical connection loss lasted.
UDP and raw sockets do not track lost data anyway, so nothing important will happen to the socket.
I have written a client socket program using plain Linux sockets. Here is an outline of what the program does:
Creates the socket
Connects to the server socket
Adds that socket to the read set and the exception set for select
Calls select with a NULL timeout in a separate thread
The server is running on an external device.
The program works fine for reading. The problem appears when I unplug the power cable of that device.
I assumed that when the power cable is removed, all the sockets would be closed abruptly and the connected client sockets would get a read event, and that a subsequent read would return zero bytes, meaning the connection was closed by the server.
But in my program, when I unplug the power cable of the device, select does not return in the client, which means the client socket is not getting any event. I don't understand why.
Any suggestion on how to find out that the connection was closed by the server, or any information on how sockets behave when the power supply is cut, would be appreciated.
I need your help, it's very critical.
Thank you.
When a remote machine is suddenly cut off from the network (network cable unplugged or power loss), there is no way it can inform the other side of the connection about that. What is more, a client that only reads from a half-open socket (as in your case) won't be able to detect this either.
The only way to learn about a connection loss is to send a packet. Since all data being sent should be acknowledged by the other side, TCP on the client computer will keep retrying to send the unacknowledged portion of data until the number of attempts is exhausted. Then an ETIMEDOUT error should be returned (via a socket that is expecting read events). You can create one more socket for sending such messages periodically to detect the peer's disappearance (a heartbeat connection) on the client side. But all these retries might still take some time.
Another option could be to use the SO_KEEPALIVE socket option. After a connection has been idle for some time, TCP starts sending probe messages to the server and can detect its disappearance. The default values for the idle time are usually enormous, so they need to be modified. Some other parameters that might be related are TCP_KEEPCNT, TCP_KEEPINTVL and TCP_KEEPIDLE. It appears that this option might be implemented differently on different systems, or can simply be absent.
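As an untested-in-production sketch of the keepalive idea, here is how those options could be set from Python, whose socket module exposes the same option names on platforms that define them. The helper name and the timer values (10 s idle, 5 s interval, 3 probes) are arbitrary assumptions for illustration. Options the platform lacks or rejects are skipped.

```python
import socket


def enable_keepalive(sock, idle=10, interval=5, count=3):
    """Enable TCP keepalive and tune its timers where the platform allows.

    idle:     seconds of silence before the first probe is sent
    interval: seconds between successive probes
    count:    unanswered probes before the connection is reported dead
    Returns the names of the options that were actually applied.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = ["SO_KEEPALIVE"]
    tuning = (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", count),
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is None:
            continue  # this platform does not define the option
        try:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
            applied.append(name)
        except OSError:
            pass  # defined but rejected by the OS; keep the default
    return applied
```

With these example values, a dead peer on an otherwise idle connection would be reported after roughly idle + interval * count = 25 seconds, at which point a pending select/read on the socket should wake up with an error.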
I've never personally tried to solve this problem so all this is just a bunch of thoughts that might give some ideas. Here is one more source of ideas.
I have a script (I don't have the code at hand at the moment, but I used IO::Async) which connects to a socket on a remote server and listens. The client usually just listens for new data.
The problem is that the client is not able to detect when network problems occur and the socket connection is gone.
I used IO::Async and I also tried it with IO::Socket. The handle is always "connected" after the initial connection is established.
If the network connection is established again the socket connection is naturally still lost because the script has no idea that it should reconnect.
I was thinking to create some kind of "keepAlive" which "pings" (syswrite) the socket every X seconds (if nothing new came through socket) to check whether the connection is still there.
Is this the correct way to do it, or is there a more creative or cleaner solution?
You can set the SO_KEEPALIVE socket option which, for TCP, sends periodic keepalive messages, and may help detect this condition. If this is detected, you will be delivered an EOF condition (most likely causing the containing IO::Async::Stream to fire on_read_eof).
For a better solution you might consider some sort of application-level keepalive message, such as IRC's PING command.
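To make the application-level keepalive concrete, here is a small sketch in Python rather than Perl. It assumes a hypothetical protocol in which the server answers the line PING with the line PONG and no other traffic is in flight. The message format, function name and timeout are all my own assumptions.

```python
import select
import socket


def peer_is_alive(sock, timeout=2.0):
    """Send an application-level PING and wait for the matching PONG.

    Returns False if the send fails, the peer has closed the connection,
    or no reply arrives within `timeout` seconds.
    """
    try:
        sock.sendall(b"PING\n")
        readable, _, _ = select.select([sock], [], [], timeout)
        if not readable:
            return False  # no reply in time: treat the link as dead
        # Simplification: assumes the short reply arrives in one piece.
        # A real client would buffer input until it sees the newline.
        return sock.recv(16) == b"PONG\n"
    except OSError:
        return False  # reset, broken pipe, or similar error
```

The client would call this whenever the connection has been quiet for some number of seconds, and reconnect when it returns False.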
The short answer is that there is no default way to automatically detect a dropped socket in Perl.
Your approach of pinging would probably work pretty well; you could run a continuous thread in the background that sends ping requests and if it doesn't receive a response the main thread can be notified and a reconnect should be issued.
If you want to get messy you can work with select() to detect keep alive messages; however this may require some OS configuration depending upon your platform.
See this thread for more details: http://www.perlmonks.org/?node_id=566568
I am wondering if it is possible to determine whether an accepted socket connection has been disconnected without trying to write to it.
IO::Select still indicates that the socket can be written to with can_write, even after the socket connection has been lost.
Is it possible to check whether a TCP connection has been disconnected without writing to it (in the situation where there is an unplanned internet outage)?
This is more a TCP than a Perl issue.
Events like a disconnected cable or internet connection do not lead to a TCP event. Thus you must write to a TCP connection to be sure that it is still connected. You might add a ping/echo message for the sole purpose of knowing that the connection is still available.
Generally, no. You'll usually only get a failure when you write: if you never write, it will just sit there. If you entirely lose network connectivity, I've seen errors pop up (on Windows; I haven't tried it on Linux), but you're typically required to try writing to the socket to verify that it's alive.
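For contrast, the one case that can be seen without writing is an orderly close (or a reset) by the peer, because that does generate a TCP event: the socket becomes readable and a read returns zero bytes. A small Python sketch of that check follows. The function name is made up for illustration, and the check deliberately says nothing about a silent outage.

```python
import select
import socket


def peer_has_closed(sock):
    """Return True if the peer has closed or reset the connection.

    Does not write and does not consume pending data (MSG_PEEK).
    It cannot detect a silent outage: with a pulled cable the socket
    never becomes readable, so this keeps returning False.
    """
    readable, _, _ = select.select([sock], [], [], 0)
    if not readable:
        return False  # nothing pending: healthy and idle, or silently dead
    try:
        # Zero bytes on a readable socket means the peer sent EOF.
        return sock.recv(1, socket.MSG_PEEK) == b""
    except ConnectionError:
        return True  # e.g. connection reset by peer
```

So a False result here only means "no close has been seen yet". Confirming that the link still works requires sending something, as the answers above say.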