Socket error 10054 (connection reset by peer)

I have a client/server program. The client uses a socket to send a file to the server. After sending roughly 700 KB of data, the client (on Windows 7) gets socket error 10054, which means "Connection reset by peer".
The server runs on CentOS 5.4; the client is a Windows 7 virtual machine running in VirtualBox. The client and server communicate via a virtual network interface.
The command port (used to send logs) works normally, but the data port (used to send the file) has the problem.
Could this be caused by a wrong socket buffer size configuration, or by something else?
Could anyone help me track down the problem? Thanks.
Each call to send passes a 4096-byte buffer:
send(socket, buffer, 4096, 0 )
CentOS socket configuration:
# sysctl -a
...
net.ipv4.tcp_rmem = 4096 87380 4194304
net.ipv4.tcp_wmem = 4096 16384 4194304
net.ipv4.tcp_mem = 196608 262144 393216
net.ipv4.tcp_dsack = 1
net.ipv4.tcp_ecn = 0
net.ipv4.tcp_reordering = 3
net.ipv4.tcp_fack = 1
I don't quite understand what the socket buffer settings mean. Could they cause the receiver to get incomplete data?

It's almost certainly a bug in your code. Most likely, one side thinks the other side has timed out and so closes the connection abnormally. The most common way this happens is that you call a receive function to get data that you have in fact already received without realizing it. You then wait for data that will never come, and time out.
For example:
1) Client sends a message.
2) Client sends another message.
3) Server reads both messages but thinks it only got one, and sends one acknowledgement.
4) Client receives the acknowledgement, then waits for a second acknowledgement which the server will never send.
5) Server waits for second message which it actually already received.
Now the server is waiting for the client and the client is waiting for the server. The server was coded incorrectly and didn't realize that it actually got two messages in one go. TCP does not preserve message boundaries.
If you tell me more about your protocol, I can probably tell you in more detail what went wrong. What constitutes a message? Which side sends when? Are there any acknowledgements? And so on.
But the short version is that each side is probably waiting for the other.
Most likely, the "connection reset by peer" error is a symptom rather than the cause. The real problem occurs first, one side times out and aborts the connection, and the other side then sees a connection reset.
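The point that TCP does not preserve message boundaries can be made concrete with a small sketch. It is written in Python for brevity and assumes a wire format that is not from the question: each message is prefixed with a 4-byte big-endian length. The names frame and extract_messages are made up for this illustration. The receiver keeps a buffer and pulls out every complete message each time data arrives, however the bytes were split or merged in transit.

```python
import struct

HEADER = struct.Struct("!I")  # 4-byte big-endian length prefix (assumed wire format)


def frame(payload: bytes) -> bytes:
    """Prefix a payload with its length so the receiver can find the boundary."""
    return HEADER.pack(len(payload)) + payload


def extract_messages(buffer: bytearray) -> list:
    """Remove and return every complete message currently in buffer.

    Incomplete trailing bytes stay in buffer until more data arrives.
    """
    messages = []
    while len(buffer) >= HEADER.size:
        (length,) = HEADER.unpack_from(buffer, 0)
        end = HEADER.size + length
        if len(buffer) < end:
            break  # only part of this message has arrived so far
        messages.append(bytes(buffer[HEADER.size:end]))
        del buffer[:end]
    return messages


# Two messages arrive in a single read, plus the first part of a third.
third = frame(b"third")
incoming = bytearray(frame(b"first") + frame(b"second") + third[:6])
received = extract_messages(incoming)   # both complete messages, in order
incoming += third[6:]                   # the rest of the third arrives later
received += extract_messages(incoming)
```

In a real receiver, the bytes appended to the buffer would come from each recv() call. A server written this way would acknowledge both messages in step 3 of the example instead of waiting for one it already has.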

Related

UDP Socket writes expired packets when Ethernet is reconnected. How do I flush the write buffer from the socket when Ethernet is disconnected?

I have a "Transmit Thread" that manages a socket (#include <WinSock2.h>) for transmitting all the UDP data my application requires. The application is a C++ Windows app running on Windows 10. I am sending up to 5 or so packets per second to a subnet broadcast address, each packet less than 200 bytes.
The problem is, when I disconnect the Ethernet, there seems to be some unsent data in the write buffer of the socket that I haven't been able to flush out.
When my application detects the ethernet loss, I close (closesocket) and re-open the socket. Immediately upon re-connection, the socket sends several old messages that were "sent" around when the Ethernet was disconnected.
I think this problem is outside my application. I have disconnected ethernet while the application is running and then closed the application. Immediately upon re-connection, I see that several of the messages get transmitted, despite the application no longer running.
Things I have tried (without luck):
I have tried calling shutdown( m_sock, SD_BOTH ); immediately before closesocket( m_sock );
I tried writing a buffer full of zeroes to the socket immediately before closing
I can't set the SO_DONTLINGER option because my socket is SOCK_DGRAM
When I try WSAIoctl( m_sock, SIO_FLUSH, NULL, 0, NULL, 0, &dwBytesRead, &wsol, NULL ), it returns error 997 (WSA_IO_PENDING), and I don't know how to stop the I/O from pending.
Neither WSASendDisconnect( m_sock, NULL ) nor CancelIo( (HANDLE)m_sock ) works, and neither reports an error.
The problem was not with my software, or with my local NIC, as far as I can tell. My NIC was connected to an Ethernet hub, which in turn connected to the other devices on my LAN via a switch.
The presence of the hub caused the described behavior. Without the hub, I don't experience any "flush" or sending expired packets.

How to know if server rebooted or crashed reliably on the client

I have a server and a client that use TCP for communication and file transfer. In the middle of a file transfer from server to client, the server crashes or reboots. The client was reading from the connection's file descriptor when this happened, and the read returned 0, indicating EOF. This seems misleading, since the server might not have sent everything. How can the client tell that the server did not shut down gracefully but crashed?
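No answer to this question appears here. One common approach, offered only as a sketch, is to stop relying on EOF alone: the sender announces the file size up front, and the receiver treats EOF before that many bytes as a truncated transfer. The example below is in Python and assumes an 8-byte big-endian size header. The header layout and the names TruncatedTransfer, recv_exact and recv_file are all made up for this illustration.

```python
import socket
import struct

SIZE_HEADER = struct.Struct("!Q")  # 8-byte big-endian file size (assumed wire format)


class TruncatedTransfer(Exception):
    """The peer closed the connection before the announced bytes arrived."""


def recv_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly count bytes, or raise if EOF arrives first."""
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = sock.recv(min(remaining, 65536))
        if not chunk:  # recv() returning b"" means EOF
            raise TruncatedTransfer(f"EOF with {remaining} of {count} bytes missing")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_file(sock: socket.socket) -> bytes:
    """Receive one size-prefixed file."""
    (size,) = SIZE_HEADER.unpack(recv_exact(sock, SIZE_HEADER.size))
    return recv_exact(sock, size)


# Demonstration on a local socket pair: the "server" announces 1000 bytes
# but goes away after sending only 400.
server, client = socket.socketpair()
server.sendall(SIZE_HEADER.pack(1000) + b"x" * 400)
server.close()
try:
    recv_file(client)
    outcome = "complete"
except TruncatedTransfer:
    outcome = "truncated"
finally:
    client.close()
```

A read of 0 only means the peer's TCP stack closed its sending side, which the operating system also does on behalf of a process that has died. That is why the size check, not the EOF itself, is what separates a finished transfer from an interrupted one in this sketch.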

epoll_wait missing EPOLLIN events on a TCP socket fd

On the server side: I am using epoll_wait to monitor the possible read IO on a TCP socket.
On the client side: I have a single threaded app to write to the socket that's connected to the server.
The problem is that epoll_wait sometimes doesn't report that there is new data to read, even after the client has sent a new message. (I confirmed with Wireshark that the message does reach the server.) So the client hangs, waiting for the server's response. But if I kill the client connection, epoll_wait does get notified!
Originally I was using EPOLLET and thought that might be the problem, but the issue still exists after removing EPOLLET.
Is there any tool that I can use to debug this (e.g., from outside the server process, to confirm that there is data in the server socket's queue that epoll_wait isn't reporting)? Any thoughts or guidance on how to debug this would be appreciated.
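No answer to this question appears here, so the cause of the asker's problem is unknown. One commonly reported cause of this symptom under EPOLLET is a handler that reads once per wake-up instead of reading until the socket reports EAGAIN. The sketch below shows the drain pattern using Python's select.epoll (Linux only). The drain helper and the request payloads are made up for this illustration.

```python
import select
import socket


def drain(sock: socket.socket) -> bytes:
    """Read from a non-blocking socket until the kernel queue is empty."""
    data = bytearray()
    while True:
        try:
            chunk = sock.recv(4096)
        except BlockingIOError:
            break  # EAGAIN: nothing left to read right now
        if not chunk:
            break  # peer closed the connection
        data += chunk
    return bytes(data)


# Demonstration (Linux only): two client writes, one epoll wake-up.
client, server = socket.socketpair()
server.setblocking(False)
ep = select.epoll()
ep.register(server.fileno(), select.EPOLLIN | select.EPOLLET)

client.sendall(b"request-1\n")
client.sendall(b"request-2\n")

events = ep.poll(1.0)        # edge-triggered: the fd is reported readable once
received = drain(server)     # consume everything before polling again
events_after = ep.poll(0.1)  # no new data arrived, so no further event

ep.close()
client.close()
server.close()
```

In level-triggered mode a similar hang can still occur if the server reads several messages into its own buffer, processes only one, and then waits in epoll_wait for bytes it has already consumed. From outside the process, the Recv-Q column of ss -tn (or netstat -tn) shows how many bytes are queued on a socket but not yet read by the application.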

Is TCP Reset (RST) two way?

I have a client-server (Java) application using persistent TCP connections. Sometimes the server gets java.io.IOException: Connection reset by peer when trying to write to the socket, but I don't see any error in the client log.
This RST is probably caused by an intermediate proxy or router, but if that's the case, would it be seen on the client as well?
If the RST is sent by the client, it can be seen on the client using a packet sniffer such as Wireshark. However, it won't show up at the client's user-level socket API, since the OS sends it in response to various erroneous inputs (such as connection attempts to a closed port).
If the RST is sent by the network, then it's pretending to be the client to sever the connection. It can do so in one direction, or in both of them. In that case, the client might not see anything, except for a RST sent by the actual server when the client continues to send data to a connection it perceives as open, while the server sees it as closed.
Try capturing the traffic on both the server and the client, and see where the resets are coming from.

Dealing with intermittent Winsock errors

My client app gets intermittent winsock errors (10060, 10053) against one particular server we interface with. I have it re-trying the request that failed, but sometimes it fails repeatedly, and I give up after 5 re-tries. Would it be likely to help at all if I closed the socket and created a new one? (I know nothing about the server-side.)
Ok, so the errors that you're getting are:
10060 - WSAETIMEDOUT
10053 - WSAECONNABORTED
When do you get them? What are you doing at the time?
You get a WSAECONNABORTED when the remote end of the connection, or possibly an intermediary router, resets the connection and sends an RST. This could simply be the remote end issuing a non lingering close or it could be the remote end aborting or crashing.
You can't continue doing anything with a connection that has had a WSAECONNABORTED on it as the connection has been aborted and is no more; it is a dead connection, it has passed on...
Context matters immensely as to why you might get a WSAETIMEDOUT exception and the context will dictate if retrying is sensible or not.
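The "non lingering close" mentioned above can be reproduced on a loopback connection. The sketch below is Python on Linux or another POSIX system, and the variable names are made up for this illustration. One side closes with SO_LINGER set to a zero timeout, which makes the TCP stack send an RST instead of the normal FIN, and the other side's next read fails with a reset. Which Winsock error code a Windows client would surface for the same event is not shown here.

```python
import socket
import struct

# Loopback listener on an OS-chosen port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)

client = socket.create_connection(listener.getsockname())
accepted, _ = listener.accept()

# l_onoff=1, l_linger=0: close() discards unsent data and sends RST
# instead of the normal FIN. The "ii" layout matches struct linger on
# Linux and other POSIX systems; Windows uses two unsigned shorts.
accepted.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
accepted.close()

try:
    client.recv(4096)
    result = "clean EOF"
except ConnectionResetError:
    result = "connection reset"

# The old socket is dead; the only way forward is a fresh connection.
client.close()
retry = socket.create_connection(listener.getsockname())
reconnected = retry.fileno() != -1
retry.close()
listener.close()
```

This also bears on the original question: retrying on the same socket after an abort cannot succeed, so any retry logic has to close the socket and connect again. Whether a fresh connection then works depends on why the server or network aborted the first one.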
One thing I would try is running tracert to your server.
This error often shows up when you are connected through a VPN and your local and remote IP address ranges overlap.
For example, if your local IP address range is 192.168.1.xxx and the VPN's remote range is also 192.168.1.xxx, you will see this error.
sharrajesh