A customer of mine has a Windows application where there is a network connection between two machines. The system is supposed to cope with the connection being lost. It does this by keeping a counter on the client position which is reset every time data is received from the server. If the counter reaches 60 seconds (i.e. we haven't heard from the server for 60 seconds) it performs some expected action to cope with the connection being lost.
The customer has a problem, however, where sometimes the connection will be lost but the client doesn't perform the expected action. Upon investigation, it appears that this is an intermittent problem caused by the client's socket to the server sometimes raising error 10057 (WSAENOTCONN / "Socket is not connected") when the connection is lost. Because the client behaves differently when it gets a socket error the customer doesn't get the desired behaviour when they get this socket error. This is not difficult for me to fix, but I am a bit puzzled by the different behaviour.
To reproduce the problem I'm physically pulling the network cable out of the back of my server machine. The majority of the time, the effect on the client side is that we just don't get any data over the socket, and we don't get an error. Some fraction of the time however error 10057 is raised. Can anyone shed any light on why there is this inconsistency? The client socket is a nonblocking STREAM socket.
I would expect you to get an error only if you try to send something. That is when TCP discovers that it can't reach the other endpoint. Discovering the failure takes a variable amount of time, depending on the network round-trip time. There is also a "keep alive" option (SO_KEEPALIVE) that forces the socket to periodically send something, so that a failure is detected even when the app is idle.
WSAENOTCONN is a bug in your application. It isn't a result of a lost connection. The result of a lost connection is WSAECONNRESET. Your code must have got WSAECONNRESET and then proceeded to use the connection as though it were still valid, at which point you get WSAENOTCONN.
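One way to make the client behave the same whether the loss shows up as silence or as a socket error is to route both into a single check. The following is only a sketch of that idea in Python, not the customer's actual code; the class and method names (IdleWatchdog, data_received, socket_error, connection_lost) are hypothetical, and the 60-second limit is taken from the question.

```python
import time


class IdleWatchdog:
    """Tracks how long it has been since data arrived (hypothetical helper)."""

    def __init__(self, limit_seconds=60.0, clock=time.monotonic):
        self.limit = limit_seconds
        self.clock = clock
        self.last_seen = clock()
        self.failed = False

    def data_received(self):
        # Call whenever bytes arrive from the server: resets the idle counter.
        self.last_seen = self.clock()

    def socket_error(self, error_code):
        # Assumption: any socket error (10057 included) counts as a lost
        # connection, so it leads to the same action as the timeout.
        self.failed = True

    def connection_lost(self):
        idle = self.clock() - self.last_seen
        return self.failed or idle >= self.limit
```

The receive loop would then call connection_lost() in one place and perform the expected action there, instead of handling the error path separately.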
Related
I have a client/server app that maintains a socket continuously. When the client signs off, it sends a 'signing off' message to the server and then closes the socket and cleans up. The server cleans up and closes the socket when it receives this message - and does not reply to the message.
On a fairly regular basis, I see "connection reset by peer" errors getting logged by the server without any complaints from end users, and I figure this must be an occasional timing issue in my sign-off sequence. I do see the same errors when end users complain about their connections actually being dropped, so I'm wondering how to tell the difference between those scenarios - or even better, how to prevent the bogus 'connection reset' scenario in the normal case.
I'm guessing that in some cases the server's getting hit by the closed socket before (or during) receipt of the "signing off" message. Is this possible? Is there a proper sequence you're supposed to follow for letting a server know that the client is about to terminate before actually closing the socket? Some way to check that the last message was delivered prior to closing?
Thanks,
Rob
The shutdown(s, SHUT_RDWR) function should solve your problem. There's a more complete explanation in this document.
This usually means that you have either written to a connection that had already been closed by the peer, or closed a connection without reading all the pending incoming data. In other words, an application protocol error.
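A common way to avoid the bogus reset is an orderly close: send the last message, shut down only the sending direction, then read until the peer closes. The sketch below is in Python and makes some assumptions: it uses SHUT_WR rather than the SHUT_RDWR mentioned above, so that the socket can still be read until end-of-file, and the function names are made up for illustration.

```python
import socket


def close_gracefully(sock, final_message):
    """Client side: send the last message, then wait for the peer to close."""
    sock.sendall(final_message)
    sock.shutdown(socket.SHUT_WR)   # tell the peer no more data is coming
    while sock.recv(4096):          # drain until recv returns b"" (EOF)
        pass
    sock.close()


def serve_until_eof(sock, received):
    """Server side: read everything up to EOF, then close."""
    while True:
        chunk = sock.recv(4096)
        if not chunk:               # b"" means the client shut down its side
            break
        received.append(chunk)
    sock.close()
```

Because each side reads to end-of-file before closing, neither closes with unread data pending, which is one of the causes of "connection reset by peer" named above.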
I have a client and a server program in Java. The server sends data to the client over wifi. When the wifi connection breaks at the client, the client throws an exception and exits its receive loop, but the server keeps sending data. The server sends with oos.writeObject(). How can the server detect that the client's wifi connection has broken, so that it exits its send loop?
The reason why your client (under some types of failure modes of the wifi connection) notices immediately when the connection goes away is because the wifi interface is local to the client and the operating system tears it down as soon as it is detected as down. From the server's point of view, it's a faraway network link that fails.
When the connection goes down, at first the server will continue to buffer data. If the buffers (the server's operating system buffer plus the TCP window size) become full, the server will block if it tries to send more data.
Later, if the connection is still down, the server will declare that the connection has timed out and you will get an exception. If, instead, the connection comes back up, the buffers will empty and normal data flow will resume.
Even if you could, you don't actually want to change things so that the server's connection times out immediately when any faraway network hop fails. If you did that, your connection would be very fragile: every little network reconvergence event or short transient failure would break your connection.
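If the server should give up sooner than the operating system's own timeout, but not instantly, a bounded wait on each send is one middle ground. The original code is Java; the following is only a Python sketch of the same idea, and the function name and return values are hypothetical.

```python
import socket


def send_with_deadline(sock, payload, timeout_seconds):
    """Try to send payload; report "sent", "stalled", or "failed"."""
    sock.settimeout(timeout_seconds)
    try:
        sock.sendall(payload)
        return "sent"
    except socket.timeout:
        # Buffers stayed full for the whole timeout: the peer is not reading.
        # Part of the payload may already be queued, so abandon the connection.
        return "stalled"
    except OSError:
        # The stack has already declared the connection dead.
        return "failed"
```

A send loop could break out on anything other than "sent". The timeout should be generous enough to ride out the short transient failures described above.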
I have written a client socket program using only Linux sockets. Here is an outline of what the program does:
1. Create the socket.
2. Connect to the server socket.
3. Add that socket to the read set and the exception set for select.
4. Call select with a NULL timeout in a separate thread.
The server is running on an external device.
This program works fine for reading and so on. Now I am facing a problem when I unplug the power cable of that device.
I assumed that when the device's power cable is removed, all its sockets are closed abruptly and the connected client sockets get a read event, and that a subsequent read returns zero bytes, meaning the connection was closed by the server.
But in my program, when I unplug the device's power cable, select does not return, meaning the client socket is not getting any event. I don't understand why.
Any suggestion would be appreciated on how to find out that the connection was closed by the server, or any information on how sockets behave when the power supply is cut.
I need your help, it's very critical.
thank you.
When a remote machine is suddenly cut off from the network (network cable unplugged or power lost), there is no way it can inform the other side of the connection. What is more, a client that only reads from a half-open socket (as in your case) won't be able to detect this either.
The only way to learn about a connection loss is to send a packet. Since all sent data must be acknowledged by the other side, TCP on the client will keep retransmitting the unacknowledged data until the retry limit is exhausted. An ETIMEDOUT error should then be returned (via the socket that is waiting for read events). You can also create one more socket and send messages on it periodically to detect the peer's disappearance (a heartbeat connection) on the client side. All these retries might still take some time, though.
Another option is the SO_KEEPALIVE socket option. After a connection has been idle for some time, TCP starts sending probe messages to the server and can detect its disappearance. The default idle time is usually very large (often two hours), so it needs to be lowered. Related parameters include TCP_KEEPCNT, TCP_KEEPINTVL, and TCP_KEEPIDLE. This option appears to be implemented differently on different systems, or may be absent altogether.
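To show what setting these options looks like, here is a small Python sketch. The helper name enable_keepalive and the default values (30 s idle, 10 s interval, 3 probes) are assumptions chosen for illustration, and the TCP_KEEP* constants are applied only where the platform's socket module defines them.

```python
import socket


def enable_keepalive(sock, idle=30, interval=10, count=3):
    """Turn on TCP keepalive and tune it where the platform allows."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = {"SO_KEEPALIVE": 1}
    tuning = (
        ("TCP_KEEPIDLE", idle),       # seconds of idle time before probing
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", count),       # failed probes before giving up
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is None:
            continue                  # not defined on this platform
        try:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
            applied[name] = value
        except OSError:
            pass                      # defined but rejected by the OS
    return applied
```

With these example values, a dead peer would be noticed after roughly idle + interval * count seconds of silence, at which point a select waiting on the socket should wake up with an error or a failed read.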
I've never personally tried to solve this problem so all this is just a bunch of thoughts that might give some ideas. Here is one more source of ideas.
I have a script (I don't have the code example here at the moment but I used IO::Async) which connects to socket on a remote server and listens. Client usually just listens for new data.
Problem is that the client is not able to detect if network problems occur and the socket connection is gone.
I used IO::Async and I also tried it with IO::Socket. Handle is always "connected" after the initial connection is established.
If the network connection is established again the socket connection is naturally still lost because the script has no idea that it should reconnect.
I was thinking to create some kind of "keepAlive" which "pings" (syswrite) the socket every X seconds (if nothing new came through socket) to check whether the connection is still there.
Is this the correct way to do it or is there maybe an another more creative or cleaner solution?
You can set the SO_KEEPALIVE socket option which, for TCP, sends periodic keepalive messages, and may help detect this condition. If this is detected, you will be delivered an EOF condition (most likely causing the containing IO::Async::Stream to fire on_read_eof).
For a better solution you might consider some sort of application-level keepalive message, such as IRC's PING command.
The short answer is there is no default way to automatically detect a dropped socket in perl.
Your approach of pinging would probably work pretty well; you could run a continuous thread in the background that sends ping requests and if it doesn't receive a response the main thread can be notified and a reconnect should be issued.
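The question is about Perl, but the ping idea is language-independent. Here is a minimal sketch in Python, assuming a made-up line protocol in which the client sends "PING" and the server answers "PONG". The function name and the protocol are hypothetical, not part of IO::Async or IO::Socket.

```python
import socket


def peer_is_alive(sock, timeout_seconds=5.0):
    """Send PING and wait for PONG; any error or timeout counts as dead."""
    sock.settimeout(timeout_seconds)
    try:
        sock.sendall(b"PING\n")
        reply = b""
        while not reply.endswith(b"\n"):
            chunk = sock.recv(64)
            if not chunk:           # EOF: the peer closed the connection
                return False
            reply += chunk
        return reply == b"PONG\n"
    except OSError:                 # covers timeouts and connection errors
        return False
```

A real client would have to separate PONG replies from ordinary data arriving on the same socket. This sketch assumes nothing else is in flight while it waits.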
If you want to get messy you can work with select() to detect keep alive messages; however this may require some OS configuration depending upon your platform.
See this thread for more details: http://www.perlmonks.org/?node_id=566568
I've got a program that continuously writes to a TCP socket. I want to make sure that if the connection between the client and server is disconnected for any amount of time, the connection can be re-established.
Right now, I can disconnect the wire, and while the write() function loops, it returns one "connection reset by peer" error, and then the value of ULLONG_MAX. Then, once I replug the wire, write() continuously returns "broken pipe" errors. I've tried to close and reopen the connection but I continue to get the "connection reset by peer" error.
Does anyone know how I could either re-establish the connection or keep it alive for a certain amount of time (or indefinitely) in the first place?
You cannot re-use file descriptor here, you have to start from scratch again - create new socket(2) and call connect(2) on it.
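To illustrate starting from scratch, here is a short Python sketch in which every attempt creates a brand-new socket. The function name, the retry count and the delay are illustrative assumptions; socket.create_connection does the socket() and connect() steps in one call.

```python
import socket
import time


def connect_with_retry(host, port, attempts=5, delay_seconds=1.0):
    """Return a freshly created, connected socket, retrying on failure."""
    last_error = None
    for _ in range(attempts):
        try:
            # A new socket object is created on every attempt; the old,
            # failed descriptor is never reused.
            return socket.create_connection((host, port), timeout=5.0)
        except OSError as exc:
            last_error = exc
            time.sleep(delay_seconds)
    raise ConnectionError(
        f"could not connect after {attempts} attempts"
    ) from last_error
```

When write() reports an error, the writer would close the old socket, call this function, and continue on the socket it returns.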
I'm afraid you have to establish a new connection, and that can only be initiated by the client program. You might need some way to ensure it's the same client reconnecting, such as checking the IP or exchanging a token on the first connection, so that you can handle a first connection and a recovery differently. That solution needs some programming on your part, though.
If TCP is not the only choice for some reason, you might want to think about UDP, since there the decision about when a connection is lost is left to you. But you'll need to take care of a lot of other things (though since you are aiming for communication that can be lost and recovered, that might suit your needs better).