I have two processes communicating through a Unix domain socket. Both the server and the client socket addresses are abstract socket addresses.
If the client process crashes or is killed, recvmsg() in the server returns 0 or -1. My code handles these two return values differently:
return 0 (works fine)
The server closes the connection fd that was returned by accept() and waits for another client connection (by calling accept() again).
return -1
I get the errno ECONNRESET and the server closes both fds (the ones returned by socket() and accept()). Then the server tries to re-create everything socket-related (socket() -> unlink() -> bind() -> listen() -> accept()). The bind() to the same address fails with the errno EADDRINUSE.
My questions are:
Is my procedure for handling ECONNRESET correct?
Why do I get EADDRINUSE even though I closed the fd beforehand? (I've checked the return value of close(); it closed successfully.)
Is it possible that, when the client side doesn't close its fd (the one used with connect()), the server still sees the address as in use?
I want to know the correct way to handle ECONNRESET without restarting the process.
Thank you for your time!
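For illustration only, here is a minimal sketch of a server layout in which a reset on one connection never touches the listening socket. It is not the asker's code: the helper names (make_abstract_listener, drain_client, serve_forever) are hypothetical, recv() stands in for recvmsg() to keep it short, and it assumes Linux, since abstract addresses are a Linux feature. ECONNRESET is reported on the connected fd only, so the sketch treats it like a return value of 0: close the fd from accept() and go back to accept(). An abstract name has no filesystem entry, so the sketch has no unlink() step.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Create a listening socket bound to an abstract address (Linux only).
 * A leading '\0' in sun_path marks the name as abstract. */
int make_abstract_listener(const char *name)
{
    struct sockaddr_un addr;
    size_t len = strlen(name);
    assert(len > 0 && len < sizeof(addr.sun_path) - 1);

    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    memcpy(addr.sun_path + 1, name, len);   /* sun_path[0] stays '\0' */
    socklen_t addrlen =
        (socklen_t)(offsetof(struct sockaddr_un, sun_path) + 1 + len);

    if (bind(fd, (struct sockaddr *)&addr, addrlen) < 0 || listen(fd, 8) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Read from one client until it goes away.
 * Returns 0 on orderly shutdown or ECONNRESET (the peer is gone either way),
 * -1 on any other error with errno left as set by recv(). */
int drain_client(int conn_fd)
{
    char buf[4096];
    assert(conn_fd >= 0);
    for (;;) {
        ssize_t n = recv(conn_fd, buf, sizeof(buf), 0);
        if (n > 0)
            continue;               /* a real server would process buf here */
        if (n == 0)
            return 0;
        if (errno == EINTR)
            continue;
        if (errno == ECONNRESET)
            return 0;
        return -1;
    }
}

/* Accept loop: the listening fd is created once and reused for every client. */
int serve_forever(int listen_fd)
{
    for (;;) {
        int conn_fd = accept(listen_fd, NULL, NULL);
        if (conn_fd < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        drain_client(conn_fd);
        close(conn_fd);             /* only the per-connection fd is closed */
    }
}
```

A caller would do something like serve_forever(make_abstract_listener("my-service")), where "my-service" is a made-up name.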
Related
For a blocking recv with SO_RCVTIMEO set via setsockopt, what is the difference between EAGAIN and ETIMEDOUT?
I have a blocking recv which is occasionally failing, but it fails (returning -1) in different ways depending on the client which is connected to my server. One client produces "Resource temporarily unavailable", and the other produces "Connection timed out". The socket man page says
if no data has been transferred and the timeout has been reached then
-1 is returned with errno set to EAGAIN or EWOULDBLOCK
with no mention of ETIMEDOUT. I'm guessing that one of the clients is still producing TCP keepalives, but I can't find any docs on this. I'm on Linux 3.10, CentOS 7.5.
ETIMEDOUT is almost certainly a response to a previous send(). send() is asynchronous. If it doesn't return -1, all that means is that data was transferred into the local socket send buffer. It is sent, or not sent, asynchronously, and if there was an error in that process it can only be delivered via the next system call: in this case, recv().
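To make the distinction concrete, here is a small sketch of a recv() wrapper that keeps the two cases apart. The names set_recv_timeout, recv_classified and the enum are made up for this example. The assumption, following the explanation above, is that EAGAIN/EWOULDBLOCK means the SO_RCVTIMEO timer simply expired with no data, while ETIMEDOUT is an error on the connection itself that recv() happens to be the first call to report.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

enum recv_status {
    RECV_DATA,           /* n > 0 bytes were read */
    RECV_EOF,            /* peer closed the connection */
    RECV_IDLE_TIMEOUT,   /* SO_RCVTIMEO expired, connection may still be fine */
    RECV_CONN_TIMEDOUT,  /* ETIMEDOUT: the connection itself has failed */
    RECV_ERROR           /* anything else, inspect errno */
};

/* Set the receive timeout on a blocking socket. */
int set_recv_timeout(int fd, long millis)
{
    struct timeval tv;
    tv.tv_sec = millis / 1000;
    tv.tv_usec = (millis % 1000) * 1000;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}

/* Call recv() once (retrying on EINTR) and classify the outcome. */
enum recv_status recv_classified(int fd, void *buf, size_t cap, ssize_t *out_len)
{
    assert(buf != NULL && out_len != NULL);
    ssize_t n;
    do {
        n = recv(fd, buf, cap, 0);
    } while (n < 0 && errno == EINTR);
    *out_len = n;

    if (n > 0)
        return RECV_DATA;
    if (n == 0)
        return RECV_EOF;
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return RECV_IDLE_TIMEOUT;
    if (errno == ETIMEDOUT)
        return RECV_CONN_TIMEDOUT;
    return RECV_ERROR;
}
```

With this split, a server can retry after RECV_IDLE_TIMEOUT but should close the socket after RECV_CONN_TIMEDOUT.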
It isn't clear that there is any problem here to solve.
I want to make 2 devices communicate via sockets.
I use this code for the client socket:
Socket socket = Gdx.net.newClientSocket(Net.Protocol.TCP, adress, 1337, socketHints);
(SocketHints: timeout = 4000)
I get a GdxRuntimeException every time this line is executed. What is wrong with the socket?
Screenshot of stack trace
You get that message because the socket couldn't be opened.
Note the last line about the return in the API:
newClientSocket:
Socket newClientSocket(Net.Protocol protocol,
java.lang.String host,
int port,
SocketHints hints)
Creates a new TCP client socket that connects to the given host and port.
Parameters:
host - the host address
port - the port
hints - additional SocketHints used to create the socket. Input null to use the default setting provided by the system.
Returns:
GdxRuntimeException in case the socket couldn't be opened
Try doing some debugging to find out why you are getting this error.
Is the port already in use? Are you trying to open more than one connection on the same port? Is the server IP valid? Maybe something else is causing the issue?
My experiment showed that I can write to a non-blocking socket right after the connect() call, with no TCP connection established yet, and that the written data is correctly received by the peer once the connection has been established (asynchronously). Is this guaranteed on Linux / FreeBSD? I mean, will write() return > 0 while the connection is still in progress? Or was I just lucky, and the TCP connection was successfully established between the connect() and write() calls?
The experiment code:
int fd = socket (PF_INET, SOCK_STREAM, 0);
fcntl(fd, F_SETFL, O_NONBLOCK);
struct sockaddr_in addr;
memset(&addr, 0, sizeof(addr));
addr.sin_family = AF_INET;
addr.sin_port = htons(_ip_port.port);
addr.sin_addr.s_addr = htonl(_ip_port.ipv4);
int res = connect(fd, (struct sockaddr*)&addr, sizeof(addr));
// HERE: res == -1, errno == 115 (EINPROGRESS)
int r = ::write(fd, "TEST", 4);
// HERE: r == 4
P.S.
I process multiple listening and connecting sockets (incoming and outgoing connections) in a single thread and manage them with epoll. Usually, when I want to create a new outgoing connection, I call non-blocking connect(), wait for EPOLLOUT (the epoll event) and then write() my data. But I noticed that I can begin writing before the EPOLLOUT and get the expected result. Can I trust this approach, or should I stick to my old-fashioned approach?
P.P.S.
I repeated my experiment with a remote host with 170 ms latency and got different results: the write() (right after connect()) returned -1 with errno == EAGAIN. So, yes, my first experiment was not fair (connecting to a fast localhost), but I still think the "write() right after connect()" approach can be used: if write() returns -1 with EAGAIN, I wait for EPOLLOUT and retry the write. But I agree, this is a dirty and useless approach.
Can I write() to a socket just after connect() call, but before TCP connection established?
Sure, you can. It's just likely to fail.
Per the POSIX specification of write():
[ECONNRESET]
A write was attempted on a socket that is not connected.
Per the Linux man page for write():
EDESTADDRREQ
fd refers to a datagram socket for which a peer address has
not been set using connect(2).
If the TCP connect has not completed, your write() call will fail.
At least on Linux, the socket is marked as not writable until the [SYN, ACK] is received from the peer. This means the system will not send any application data over the network until the [SYN, ACK] is received.
If the socket is in non-blocking mode, you must use select/poll/epoll to wait until it becomes writable (otherwise write calls will fail with EAGAIN and no data will be enqueued). When the socket becomes writable, the kernel has usually already sent an empty [ACK] message to the peer before the application has had time to write the first data, which results in some unnecessary overhead due to the API design.
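As a rough sketch of that wait-until-writable step: the function names wait_connected and connect_nonblocking are made up, and poll() is used instead of epoll only to keep the example short. The part that is easy to miss is the SO_ERROR check. The socket is also reported writable when the connect has failed, so the result has to be fetched with getsockopt() before writing.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Close fd without clobbering errno; always returns -1. */
static int close_keep_errno(int fd)
{
    int saved = errno;
    close(fd);
    errno = saved;
    return -1;
}

/* Wait until a connect() that returned EINPROGRESS has finished.
 * Returns 0 if the connection is established, -1 with errno set otherwise. */
int wait_connected(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLOUT, .revents = 0 };
    int rc;
    assert(fd >= 0);
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);
    if (rc < 0)
        return -1;
    if (rc == 0) {
        errno = ETIMEDOUT;
        return -1;
    }

    /* Writable does not imply success: fetch the outcome of the connect. */
    int err = 0;
    socklen_t errlen = sizeof(err);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &errlen) < 0)
        return -1;
    if (err != 0) {
        errno = err;
        return -1;
    }
    return 0;
}

/* Open a non-blocking TCP socket to ipv4:port (both in host byte order, as in
 * the question's code). Returns a connected fd, or -1 with errno set. */
int connect_nonblocking(uint32_t ipv4, uint16_t port, int timeout_ms)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return close_keep_errno(fd);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(ipv4);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
        return fd;                  /* connected immediately */
    if (errno == EINPROGRESS && wait_connected(fd, timeout_ms) == 0)
        return fd;                  /* safe to write() from here on */
    return close_keep_errno(fd);
}
```

In an epoll-based event loop the same logic applies: register the fd for EPOLLOUT, and when the event fires, check SO_ERROR before the first write().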
What appears to work is this: after calling connect on a non-blocking socket and getting EINPROGRESS, set the socket back to blocking mode and then start writing data. The kernel then waits internally until the [SYN, ACK] is received from the peer and sends the application data and the initial ACK in a single packet, which avoids that empty [ACK]. Note that the write call blocks until the [SYN, ACK] is received, and it returns -1 with errno set to e.g. ECONNREFUSED or ETIMEDOUT if the connection fails. However, this approach does not work in WSL 1 (Windows Subsystem for Linux), where the write just fails with EPIPE immediately (no SIGPIPE though).
In any case, not much can be done to eliminate this initial round-trip time, due to the design of TCP. However, if the TCP Fast Open (TFO) feature is supported by both endpoints and you can accept its security issues, this round trip can be eliminated. See https://lwn.net/Articles/508865/ for more info.
http://erlangcentral.org/wiki/index.php/Building_a_Non-blocking_TCP_server_using_OTP_principles describes how to build a non-blocking TCP server, and I have one question about the inet_async message.
handle_info({inet_async, ListSock, Ref, Error}, #state{listener=ListSock, acceptor=Ref} = State) ->
error_logger:error_msg("Error in socket acceptor: ~p.\n", [Error]),
{stop, Error, State};
If Error = {error, close}, who closed the socket, the client or the server?
It depends. If you get that error, the socket may not have been opened in the first place, so if you try gen_tcp:send(Socket, "Message") you will be told that the connection is closed.
Other reasons for the closed connection could be that the listening socket timed out waiting for a connection, or that gen_tcp:close(Socket) was called before the attempt to send a message.
Also, make sure you are connecting to the same port on which the server opened the listening socket. So, to answer your question: either side could have closed the connection.
client:
    socket(), connect() and then
    for (1 to 1024) {
        write(1024 bytes)
    }
    exit(0);
server:
    socket(), bind(), listen()
    while (1) {
        accept()
        while ((n = read())) {
            if (n == -1) abort(); /* never happened */
            total_read += n
        }
        close()
    }
Now, the client runs on a Mac behind NAT and the server runs on my VPS (abroad).
Generally, it works fine (the client sends all the data and exits, and the server receives all of it).
However, when the network goes down for a couple of minutes while the client is running (and then comes back), the client does not exit even after a very long time. I kill it with Ctrl+C and run it again, and the server no longer seems to read any data (the client is still running).
Here is what netstat shows:
client:
tcp4 0 130312 192.168.1.254.58573 A.B.C.D.8888 ESTABLISHED
server:
tcp 0 0 A.B.C.D:8888 a.b.c.d:54566 ESTABLISHED 10970/a.out
tcp 102136 0 A.B.C.D:8888 a.b.c.d:60916 ESTABLISHED -
A.B.C.D is my VPS address
a.b.c.d is my public client address
My questions are:
1. Why does this happen?
2. The server works fine again after I restart it. How can I write the code so that it recovers without a restart?
In TCP, there's no way to tell that a connection has failed unless you try to send something on the connection. TCP doesn't perform active monitoring of the connection (actually, there are optional "keepalive" packets, but these are not normally sent until the connection has been idle for a couple of hours). When you send something, you'll eventually get an error if there's a timeout waiting for the other machine to return an acknowledgement. But if you're just reading data without sending, you can't tell that the connection has failed -- it just looks like the sender doesn't have anything to send.
You can resolve this by designing your application so that the client is required to send something every N seconds. Then set a timer in the server that detects that you haven't received anything for more than N seconds (you should add a little extra time to allow for transient delays).
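A possible shape for that server-side timer, as a sketch: read_or_idle, drain_until_idle and READ_IDLE are invented names, and idle_ms stands for N seconds plus the extra slack, expressed in milliseconds. The idea is to wait with poll() before every read and to give up on the connection when nothing arrives within the limit.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

#define READ_IDLE (-2)   /* no data arrived within the idle limit */

/* Wait up to idle_ms for data, then read it.
 * Returns bytes read (> 0), 0 on EOF, -1 on error, READ_IDLE on silence. */
ssize_t read_or_idle(int fd, void *buf, size_t cap, int idle_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc;
    assert(buf != NULL && cap > 0);
    do {
        rc = poll(&pfd, 1, idle_ms);
    } while (rc < 0 && errno == EINTR);
    if (rc < 0)
        return -1;
    if (rc == 0)
        return READ_IDLE;

    ssize_t n;
    do {
        n = recv(fd, buf, cap, 0);
    } while (n < 0 && errno == EINTR);
    return n;
}

/* Read one connection until EOF, error, or the peer goes silent.
 * Returns the number of bytes received; *went_idle is set to 1 if the
 * loop ended because of the idle limit. The caller closes fd afterwards. */
long drain_until_idle(int fd, int idle_ms, int *went_idle)
{
    char buf[4096];
    long total = 0;
    assert(went_idle != NULL);
    *went_idle = 0;
    for (;;) {
        ssize_t n = read_or_idle(fd, buf, sizeof(buf), idle_ms);
        if (n == READ_IDLE) {
            *went_idle = 1;
            return total;
        }
        if (n <= 0)
            return total;   /* EOF or error: the connection is finished */
        total += n;
    }
}
```

In the server loop from the question, the inner read loop would be replaced by one call to drain_until_idle(), followed by close() and the next accept().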
When the network is broken, your client keeps sending data, and at some point the socket send buffer fills up (I understand from what you show that you are sending 1024 bytes 1024 times, 1 MB in total). The default send buffer could be 16 KB (surely less than 1 MB). Then, when the client tries to write, it blocks.
BTW, while answering your question I was not sure whether TCP eventually gives up after a number of timeouts and closes the socket, making the socket interface return an error. It does, but only after the retransmission attempts are exhausted, which can take many minutes (on Linux the limit is controlled by net.ipv4.tcp_retries2). So connect fails if there is a problem in the network, but a blocked write can hang for a very long time, and a read that is only waiting for data does not fail at all.
On the server side, the server stays blocked in read because it never receives the EOF.
Solution:
On the client side, use non-blocking sockets. If the network is broken, at some point write will fail with the error EWOULDBLOCK, which tells you that the send buffer is full for some reason. At that point, you could close the connection and try to connect again. If the network is still broken, connect will return an error.
On the server side, also use non-blocking sockets, together with select() and a timeout. After a few timeouts you may decide there is a problem with the connection and close it.
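A sketch of the client half of this idea follows. The name send_all_or_stall and the stall_ms parameter are made up, and send() with MSG_DONTWAIT is used here in place of switching the whole socket to non-blocking mode. When the send buffer is full, the function waits for room with poll(), and if no room appears within stall_ms it reports ETIMEDOUT so the caller can close the socket and reconnect.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

#ifndef MSG_NOSIGNAL
#define MSG_NOSIGNAL 0   /* not available everywhere; then SIGPIPE must be handled elsewhere */
#endif

/* Send len bytes without ever blocking for longer than stall_ms at a time.
 * Returns 0 when everything was queued, -1 with errno set otherwise
 * (errno == ETIMEDOUT means the send buffer stayed full for stall_ms). */
int send_all_or_stall(int fd, const void *data, size_t len, int stall_ms)
{
    const char *p = data;
    assert(data != NULL || len == 0);

    while (len > 0) {
        ssize_t n = send(fd, p, len, MSG_DONTWAIT | MSG_NOSIGNAL);
        if (n > 0) {
            p += n;
            len -= (size_t)n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;              /* real error, e.g. ECONNRESET or EPIPE */

        /* Send buffer is full: wait for room, but not forever. */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT, .revents = 0 };
        int rc = poll(&pfd, 1, stall_ms);
        if (rc < 0 && errno != EINTR)
            return -1;
        if (rc == 0) {
            errno = ETIMEDOUT;      /* nothing drained: treat the link as dead */
            return -1;
        }
    }
    return 0;
}
```

On failure, the client from the question would close() the socket and go back to socket() and connect().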