Apparently, whenever the closesocket function fails with WSAENETDOWN, “The network subsystem has failed”. But what exactly does that mean? When does it happen? If it does happen, is the socket descriptor still closed? How should I handle it?
Regarding my first question, the Windows Sockets Error Codes page says that WSAENETDOWN means
Network is down.
A socket operation encountered a dead network. This could indicate a serious failure of the network system (that is, the protocol stack that the Windows Sockets DLL runs over), the network interface, or the local network itself.
But that does not really help me either.
Note that POSIX specifies ENETDOWN only for connect, send, sendto, sendmsg, and write, not for close. The Winsock counterpart appears on far more functions.
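For what it's worth, checking the call looks something like this (a minimal sketch; it deliberately leaves open the recovery question, since that is exactly what is unclear):

```c
#include <stdio.h>
#include <winsock2.h>

/* Sketch: check closesocket() and log the failure. What a meaningful
 * recovery from WSAENETDOWN would look like is exactly what the
 * question above asks. */
void close_with_check(SOCKET s)
{
    if (closesocket(s) == SOCKET_ERROR) {
        int err = WSAGetLastError();
        if (err == WSAENETDOWN)
            fprintf(stderr, "closesocket: network subsystem failed\n");
        else
            fprintf(stderr, "closesocket failed: %d\n", err);
        /* The documentation does not say whether the handle is still
         * valid at this point; most code simply stops using it. */
    }
}
```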
Related
I am facing an issue and need some help coming up with the best way to resolve it.
Here is the problem:
I have server code running which has a socket that is listening to accept new incoming connections.
I then attempt to start a client, which also has a socket that is listening to accept new incoming connections.
The client code begins with accepting a new connection on the listening socket file descriptor and gets a new socket file descriptor for I/O.
The server does the same thing and gets a new socket file descriptor for I/O.
Note: The client is not completely up yet. It needs to receive some bytes from the server and send some before it can start.
I then introduce some packet loss over the TCP/IP network connection. This causes certain errors: for example, the recv() system call in the client process sees zero received bytes, so the client closes the connection and the associated socket file descriptor. However, this leaves the client process hanging, since the other descriptors in the FD_SET are never ready for I/O, so pselect() keeps returning 0 file descriptors ready. The client needs to send and receive certain bytes over the connection before it can start up.
My question is: what should I do here?
I researched the SO_KEEPALIVE option, which I could set when creating the new socket during the accept() system call. But I do not think that would resolve my problem here, especially if the network packet loss is ongoing.
Should I kill the client process when I realize there are no file descriptors ready for I/O and never will be? Is there a better way to approach this?
If I'm reading the question correctly, the core of the question is: "what should your client program do when a TCP connection that is central to its functionality has been broken?"
The answer to that question is really a matter of preference -- what would you like your client program to do in that case? Or to put it another way, what behavior would your users find most useful?
In many of my own client programs, I have logic included such that if the TCP connection to the server is ever broken, the client will automatically try to create a new TCP connection to the server and thereby recover its connectivity and useful functionality as soon as possible.
The other obvious option would be to just have the client quit when the connection is broken; perhaps with some sort of error indication so that the user will know why the client went away. (perhaps an error dialog that asks if the user would like to try to reconnect?)
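A minimal sketch of the auto-reconnect approach, using POSIX sockets (the server address, port, and five-second retry delay are illustrative assumptions):

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Sketch: keep trying to (re)establish the TCP connection.
 * 192.0.2.1, port 5000, and the 5-second delay are placeholders. */
int connect_with_retry(void)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(5000);
    inet_pton(AF_INET, "192.0.2.1", &addr.sin_addr);

    for (;;) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd >= 0 &&
            connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
            return fd;                /* connected: resume normal work */
        if (fd >= 0)
            close(fd);
        sleep(5);                     /* back off, then retry */
    }
}
```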
SO_KEEPALIVE is probably not going to help you much in this scenario, by the way. Despite its name, its purpose is to help a program discover in a more timely manner that TCP connectivity has been lost, not to try harder to keep a TCP connection from being lost. (And it doesn't even serve that purpose particularly well, since in many TCP stacks only one keepalive packet is sent per hour or so, which means that even with SO_KEEPALIVE enabled it can be a very long time before your program starts receiving errors reflecting the loss of network connectivity.)
According to Unix Network Programming by Stevens, EHOSTUNREACH can be returned when readline/recv is used.
However, according to the Linux man pages, EHOSTUNREACH cannot be returned by recv.
Who is right?
If an error occurs in the communication, the error is set on the socket and delivered with the next syscall related to that socket. The EHOSTUNREACH error can be triggered (among other things) by sending a UDP packet to a target and getting an ICMP unreachable message back. Since this ICMP message comes back only after the send call has already returned, the error is not reported for the send but only on the next syscall on the socket, which might well be a recv.
Thus I would suggest that this error can be returned on Linux too, but I might be wrong. In general, Linux is not UNIX; systems evolve and documentation is often flawed. If you look at the documentation for recv on various platforms, you will see that OpenBSD documents EHOSTUNREACH as a possible error while FreeBSD, NetBSD, Linux... do not. I would suggest you expect the unexpected :)
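One way to observe this deferred-error delivery yourself, sketched under the assumption that 192.0.2.1 (a reserved documentation address) is unreachable from your machine; the exact errno you get varies by platform:

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in dst;
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(9);                        /* discard port */
    inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr); /* assumed unreachable */

    /* Connecting a UDP socket makes the kernel queue ICMP errors on it. */
    connect(fd, (struct sockaddr *)&dst, sizeof(dst));
    send(fd, "ping", 4, 0);          /* this send itself usually succeeds */

    char buf[16];
    /* If an ICMP unreachable arrives, the pending error surfaces here
     * (e.g. EHOSTUNREACH or ECONNREFUSED); if nothing ever comes back,
     * this recv simply blocks. */
    if (recv(fd, buf, sizeof(buf), 0) < 0)
        printf("recv: %s\n", strerror(errno));
    return 0;
}
```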
My PC has two gigabit Ethernet connections (NICs): one on the motherboard, and one on a plug-in card. I've never used multiple NICs before, and I'm simply not clear on how the OS resolves which NIC to use, or at what stage that decision occurs. Chances are "you don't have to know" because it happens automatically... but I'd still like to know: does it happen when you call the bind() function, for example, or later during a send or receive? Is it precisely the same process prior to both send and receive? Is it the same for TCP, UDP, or any other protocol? Is it different between Windows and UNIX/Linux or Mac systems?
I'm motivated to ask because I have some Winsock2 code that "was working fine", but which stopped working when I reversed the order of the send and receive on a single socket. I discovered that it only received when there was at least one packet sent first.
I'm 99% sure there will be a bug somewhere, but I'd like to be 100% sure in the unlikely case that this is a "feature", or a bug beyond my code... because the symptoms are consistent with the possibility that the receive functionality is working fine, but somehow waiting to receive on the wrong NIC.
The OS consults the IP routing tables to find the cheapest route, which determines the outbound NIC. For TCP this happens when you connect(). With UDP, if you don't connect (as you usually don't), it happens on each send().
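You can watch the decision being made: connect() a UDP socket (this sends no packets) and ask getsockname() which local address the routing table picked. A minimal sketch, assuming an arbitrary external destination such as 8.8.8.8:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in dst, local;
    socklen_t len = sizeof(local);
    char ip[INET_ADDRSTRLEN];

    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(53);
    inet_pton(AF_INET, "8.8.8.8", &dst.sin_addr);      /* any external host */

    connect(fd, (struct sockaddr *)&dst, sizeof(dst)); /* routing decision here */
    getsockname(fd, (struct sockaddr *)&local, &len);  /* which local IP won? */
    printf("source address: %s\n",
           inet_ntop(AF_INET, &local.sin_addr, ip, sizeof(ip)));
    return 0;
}
```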
I have written a client socket program using Linux sockets only. Here is an outline of what my program does:
Creating the socket.
Making the connection with the server socket.
Adding that socket to the read set and the exception set for select.
Calling select() with a NULL timeout value, in a separate thread.
The server is running on an external device.
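For reference, the flow described above looks roughly like this; it is a reconstruction from the description, not the asker's actual code, and the server address and port are placeholders:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>

void client_loop(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);          /* 1. create socket */
    struct sockaddr_in srv;
    memset(&srv, 0, sizeof(srv));
    srv.sin_family = AF_INET;
    srv.sin_port = htons(5000);                        /* placeholder port */
    inet_pton(AF_INET, "192.0.2.10", &srv.sin_addr);   /* placeholder address */
    connect(fd, (struct sockaddr *)&srv, sizeof(srv)); /* 2. connect */

    for (;;) {
        fd_set rset, eset;
        FD_ZERO(&rset);
        FD_ZERO(&eset);
        FD_SET(fd, &rset);                             /* 3. read set */
        FD_SET(fd, &eset);                             /*    and exception set */
        /* 4. NULL timeout: select() blocks indefinitely, which is why
         * the program hangs when the peer silently loses power. */
        if (select(fd + 1, &rset, NULL, &eset, NULL) > 0) {
            /* handle read/exception events ... */
        }
    }
}
```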
This program works fine for reading and everything else. The problem arises when I unplug the power cable of that device.
I assumed that when the device's power cable is removed, all its sockets would be abruptly closed, so the connected client socket would get a read event; when we try to read, we would receive zero bytes, meaning the connection was closed by the server.
But in my client program, when I unplug the device's power cable, select never returns; the client socket does not get any event. I don't understand why.
Any suggestion on how to detect that the connection was closed by the server, or any information on how sockets behave when the peer's power supply is cut, will be appreciated.
I need your help; it's very critical.
Thank you.
When a remote machine is suddenly cut off from the network (network cable unplugged or power loss), there is no way it can inform the other side of the connection about that. What is more, a client side that performs only reads from a half-open socket (like in your case) won't be able to detect this either.
The only way to know about a connection loss is to send a packet. Since all data being sent must be acknowledged by the other side, TCP on the client computer will keep retrying to send the unconfirmed portion of data until the number of attempts is exhausted; then an ETIMEDOUT error should be returned (via a socket that is expecting read events). You can create one more socket on the client side and send messages over it periodically to detect the peer's disappearance (a heartbeat connection), as sketched below. But all these retries might still take some time.
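A minimal heartbeat sketch along those lines, assuming a Linux client and an already-connected socket fd (the one-byte payload and ten-second interval are illustrative choices):

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Periodically send a 1-byte heartbeat. Once the peer is gone and TCP's
 * retransmissions are exhausted, send() starts failing (e.g. ETIMEDOUT
 * or EPIPE) and we learn the connection is dead. */
int heartbeat_loop(int fd)
{
    const char beat = 0;
    for (;;) {
        if (send(fd, &beat, 1, MSG_NOSIGNAL) < 0) {
            fprintf(stderr, "peer lost: %s\n", strerror(errno));
            return -1;
        }
        sleep(10);   /* illustrative interval */
    }
}
```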
Another option is the SO_KEEPALIVE socket option. After a connection has been idle for some time, TCP starts sending probe messages and can detect the peer's disappearance. The default idle time is usually enormously long (two hours on Linux), so it needs to be modified, along with some related parameters (TCP_KEEPCNT, TCP_KEEPINTVL, TCP_KEEPIDLE). This option may be implemented differently on different systems, or may simply be absent.
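On Linux, tuning those knobs looks roughly like this (a sketch; the timing values are illustrative, and as noted the TCP_KEEP* options may not exist on other systems):

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable TCP keepalive with aggressive (illustrative) timings:
 * first probe after 30s idle, then every 5s, give up after 4 failures. */
int enable_keepalive(int fd)
{
    int on = 1, idle = 30, intvl = 5, cnt = 4;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) < 0)
        return -1;
    return setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt));
}
```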
I've never personally tried to solve this problem so all this is just a bunch of thoughts that might give some ideas. Here is one more source of ideas.
I have set up a TCP/IP client/server connection that opens and closes the connection every time a request is exchanged. It works perfectly: the client app opens the connection, sends the request, and waits; the server application receives the request, produces a response, sends it back, and closes the connection. Client and server apps do that hundreds of times.
Now I was trying to go to the next step: set up the source IP address and port.
The code is supposed to work on both Linux and Windows, so SO_BINDTODEVICE is out of the question, since it is only supported on Linux/Unix.
I tried to bind the source port with INADDR_ANY on the client socket, and it works... for a while. Eventually it throws error 10038. I've read several articles on the internet but found no clear answer... How the source IP is selected remains unclear.
Please note that the same library also has UNICAST and MULTICAST modes (connectionless UDP communication), each with a sender and a receiver. I was able to set up the source port/IP in MULTICAST mode; I haven't tried UNICAST yet.
Anyway, does anyone know anything that could help? I'm using WinSock 2.2 and trying to be as platform independent as possible.
Winsock error 10038 is WSAENOTSOCK, which means you have a bug in your code somewhere. You are trying to do something with a SOCKET handle that is not pointing at a valid socket object. That has nothing to do with the bind() function itself. Either you are calling socket() and not checking its result for an error, or you are trying to use a SOCKET handle that has already been closed by your app, or you have a memory overflow somewhere that is corrupting your SOCKET handle.
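The usual checklist in code form, a sketch assuming WSAStartup() has already been called:

```c
#include <stdio.h>
#include <winsock2.h>

SOCKET make_client_socket(void)
{
    /* Always check socket(): on failure it returns INVALID_SOCKET, and
     * using that handle later yields WSAENOTSOCK (10038). */
    SOCKET s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (s == INVALID_SOCKET)
        fprintf(stderr, "socket() failed: %d\n", WSAGetLastError());
    return s;
}

void close_once(SOCKET *s)
{
    /* Close exactly once, then invalidate the handle so a stale copy
     * can't be passed to bind()/connect() later. */
    if (*s != INVALID_SOCKET) {
        closesocket(*s);
        *s = INVALID_SOCKET;
    }
}
```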