Socket programming for a bad network

client:
socket(), connect(), and then:
for (1 to 1024) {
    write(1024 bytes)
}
exit(0);
server:
socket(), bind(), listen()
while (1) {
    accept()
    while ((n = read())) {
        if (n == -1) abort(); /* never happened */
        total_read += n
    }
    close()
}
Now, the client runs on a Mac behind NAT and the server runs on my VPS (abroad).
Generally it works fine: the client sends all the data and exits, and the server receives all the data.
However, if the network breaks for a couple of minutes while the client is running (and then comes back), the client doesn't exit even after a very long time. I kill it with Ctrl+C and run it again, but then the server no longer seems to read any data (the new client is still running).
here is what netstat shows:
client:
tcp4 0 130312 192.168.1.254.58573 A.B.C.D.8888 ESTABLISHED
server:
tcp 0 0 A.B.C.D:8888 a.b.c.d:54566 ESTABLISHED 10970/a.out
tcp 102136 0 A.B.C.D:8888 a.b.c.d:60916 ESTABLISHED -
A.B.C.D is my VPS address
a.b.c.d is my public client address
My questions are:
1. Why does this happen?
2. The server works fine again after restarting it; how can I write the code so that it recovers without a restart?

In TCP, there's no way to tell that a connection has failed unless you try to send something on the connection. TCP doesn't perform active monitoring of the connection (actually, there are optional "keepalive" packets, but these are not normally sent until the connection has been idle for a couple of hours). When you send something, you'll eventually get an error if there's a timeout waiting for the other machine to return an acknowledgement. But if you're just reading data without sending, you can't tell that the connection has failed -- it just looks like the sender doesn't have anything to send.
You can resolve this by designing your application so that the client is required to send something every N seconds. Then set a timer in the server that detects that you haven't received anything for more than N seconds (you should add a little extra time to allow for transient delays).
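As a concrete illustration of that idea, here is a minimal sketch (mine, not code from this answer) that gives the server's read() a deadline with SO_RCVTIMEO, so an idle or dead connection is noticed after IDLE_SECONDS instead of blocking forever. IDLE_SECONDS and serve_connection() are names made up for the example.

#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

#define IDLE_SECONDS 30   /* assumed "N seconds plus some slack" */

static void serve_connection(int connfd)
{
    struct timeval tv = { .tv_sec = IDLE_SECONDS, .tv_usec = 0 };
    setsockopt(connfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

    char buf[1024];
    for (;;) {
        ssize_t n = read(connfd, buf, sizeof(buf));
        if (n > 0)
            continue;                 /* got data, keep reading */
        if (n == 0)
            break;                    /* peer closed cleanly (EOF) */
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            fprintf(stderr, "no data for %d s, dropping connection\n", IDLE_SECONDS);
            break;                    /* assume the peer is gone */
        }
        perror("read");               /* some other error */
        break;
    }
    close(connfd);
}

The same effect could be achieved with select()/poll() and a timeout; SO_RCVTIMEO simply keeps the blocking-read structure of the original server.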

When the network breaks, your client keeps sending data and at some point the socket send buffer fills up (from what you show, you are sending 1024 bytes, 1024 times, 1 MB in total). The default send buffer might be around 16 KB (certainly less than 1 MB). When the client then tries to write, it blocks forever.
By the way, while answering your question I'm not sure whether, after a number of TCP retransmission timeouts, TCP eventually gives up and closes the socket so that the socket calls return an error; I don't think that's happening here. :) So connect() fails if there is a problem in the network, but write() and read() do not.
On the server side, the server stays blocked in read() because it never receives the EOF.
Solution:
On the client side, use non-blocking sockets; if the network is broken, at some point write() will fail with EWOULDBLOCK. Then you will realize the send buffer is full for some reason. At that point you could close the connection and try to connect again; if the network is still broken, you will get an error.
On the server side, also use non-blocking sockets together with select() and a timeout. After a few timeouts you can decide there is a problem with the connection and close it.
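Here is a client-side sketch of that non-blocking approach (my illustration, not code from the answer): put the socket in non-blocking mode and give each write() a deadline with select(). WRITE_TIMEOUT_SECONDS and write_with_timeout() are assumed names for the example.

#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <unistd.h>

#define WRITE_TIMEOUT_SECONDS 30

/* Returns 0 when all bytes were handed to the kernel, -1 if the connection
   looks dead (hard error, or send buffer full for too long). */
static int write_with_timeout(int fd, const char *buf, size_t len)
{
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);

    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n > 0) {
            buf += n;
            len -= (size_t)n;
            continue;
        }
        if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;                           /* hard error */

        /* Send buffer is full: wait until the socket is writable again. */
        fd_set wfds;
        FD_ZERO(&wfds);
        FD_SET(fd, &wfds);
        struct timeval tv = { .tv_sec = WRITE_TIMEOUT_SECONDS, .tv_usec = 0 };
        if (select(fd + 1, NULL, &wfds, NULL, &tv) <= 0)
            return -1;   /* timeout or error: assume the network is broken */
    }
    return 0;
}

If write_with_timeout() returns -1, the client can close() the socket and reconnect, as suggested above.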

Related

Windows TCP socket, writing and reading full

We have a situation where the client writes faster than the server can read: say, every second or less the client writes to the server, the TCP socket buffer fills up, and the connection drops.
How should this sort of situation be handled?
Is there a way, on the client side, to check the TCP socket buffer before writing, and to wait until the buffer has drained so we can send again?
Here is sample pseudocode that easily reproduces the issue.
Server
socket = create server Socket at port 7777;
socket->Accept(); // wait for just one connection
while (true)
{
    // just do nothing and let the client fill the buffer
}
Client
socket = connect to localhost 7777
while (true)
{
    socket->write("hello from test");
}
This loops until the write buffer is full, then the client hangs and eventually disconnects with Winsock error 10057.
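No answer to this question is quoted here; as a rough illustration of the "check before writing" idea being asked about, this sketch (mine, not from the thread) probes writability with a zero-timeout select(), so the application can buffer or back off instead of blocking when the send buffer is full.

#include <sys/select.h>

/* Returns 1 if fd can accept more data right now, 0 if the send buffer is
   full, -1 on error. */
static int socket_writable_now(int fd)
{
    fd_set wfds;
    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    struct timeval tv = { 0, 0 };   /* zero timeout: poll, do not block */
    int r = select(fd + 1, NULL, &wfds, NULL, &tv);
    return (r < 0) ? -1 : (r > 0);
}

If it returns 0, the client can queue the data in the application or back off before trying write() again.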

Timed out recv producing both EAGAIN and ETIMEDOUT?

For a blocking recv with SO_RCVTIMEO set via setsockopt, what is the difference between EAGAIN and ETIMEDOUT?
I have a blocking recv which is occasionally failing, but it fails (returning -1) in different ways depending on the client which is connected to my server. One client produces "Resource temporarily unavailable", and the other produces "Connection timed out". The socket man page says
if no data has been transferred and the timeout has been reached then
-1 is returned with errno set to EAGAIN or EWOULDBLOCK
with no mention of ETIMEDOUT. I'm guessing that one of the clients is still producing TCP keepalives, but I can't find any docs on this. I'm on Linux 3.10, CentOS 7.5.
ETIMEDOUT is almost certainly a response to a previous send(). send() is asynchronous. If it doesn't return -1, all that means is that data was transferred into the local socket send buffer. It is sent, or not sent, asynchronously, and if there was an error in that process it can only be delivered via the next system call: in this case, recv().
It isn't clear that there is any problem here to solve.
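A small sketch of the setup being described (my illustration, not code from the question): set a receive timeout with SO_RCVTIMEO, then distinguish a timed-out recv (EAGAIN / EWOULDBLOCK) from a connection error reported asynchronously (such as ETIMEDOUT caused by an earlier send).

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>

static ssize_t recv_with_timeout(int fd, void *buf, size_t len, int seconds)
{
    struct timeval tv = { .tv_sec = seconds, .tv_usec = 0 };
    setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

    ssize_t n = recv(fd, buf, len, 0);
    if (n < 0) {
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            fprintf(stderr, "recv timed out: no data within %d s\n", seconds);
        else
            fprintf(stderr, "recv failed: %s\n", strerror(errno)); /* e.g. ETIMEDOUT */
    }
    return n;
}

In other words, EAGAIN/EWOULDBLOCK here means "the timeout fired", while other errno values (such as ETIMEDOUT) report an error already recorded on the socket.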

Can I write() to a socket just after connect() call, but before TCP connection established?

My experiment showed that I can write to a non-blocking socket just after the connect() call, with no TCP connection established yet, and the written data is correctly received by the peer after the connection is established (asynchronously). Is this guaranteed on Linux / FreeBSD? I mean, will write() return > 0 while the connection is still in progress? Or maybe I was just lucky and the TCP connection happened to be established between the connect() and write() calls?
The experiment code:
int fd = socket (PF_INET, SOCK_STREAM, 0);
fcntl(fd, F_SETFL, O_NONBLOCK);
struct sockaddr_in addr;
memset(&addr, 0, sizeof(addr));
addr.sin_family = AF_INET;
addr.sin_port = htons(_ip_port.port);
addr.sin_addr.s_addr = htonl(_ip_port.ipv4);
int res = connect(fd, (struct sockaddr*)&addr, sizeof(addr));
// HERE: res == -1, errno == 115 (EINPROGRESS)
int r = ::write(fd, "TEST", 4);
// HERE: r == 4
P.S.
I process multiple listening and connecting sockets (incoming and outgoing connections) in a single thread and manage them with epoll. Usually, when I want to create a new outgoing connection, I call non-blocking connect(), wait for EPOLLOUT (the epoll event), and then write() my data. But I noticed that I can begin writing before EPOLLOUT and get the expected result. Can I trust this approach, or should I stick with my old-fashioned approach?
P.P.S.
I repeated my experiment with a remote host with a latency of 170 ms and got a different result: the write() (just after connect()) returned -1 with errno == EAGAIN. So, yes, my first experiment was not fair (it connected to a fast localhost), but I still think "write() right after connect()" can be used: if write() returns -1 with EAGAIN, I wait for EPOLLOUT and retry the write. But I agree, this is a dirty approach with little benefit.
Can I write() to a socket just after connect() call, but before TCP connection established?
Sure, you can. It's just likely to fail.
Per the POSIX specification of write():
[ECONNRESET]
A write was attempted on a socket that is not connected.
Per the Linux man page for write():
EDESTADDRREQ
fd refers to a datagram socket for which a peer address has
not been set using connect(2).
If the TCP connect has not completed, your write() call will fail.
At least on Linux, the socket is marked as not writable until the [SYN, ACK] is received from the peer. This means the system will not send any application data over the network until the [SYN, ACK] is received.
If the socket is in non-blocking mode, you must use select/poll/epoll to wait until it becomes writable (otherwise write calls will fail with EAGAIN and no data will be enqueued). When the socket becomes writable, the kernel has usually already sent an empty [ACK] message to the peer before the application has had time to write the first data, which results in some unnecessary overhead due to the API design.
What appears to work is, after calling connect() on a non-blocking socket and getting EINPROGRESS, to set the socket back to blocking and then start writing data. The kernel will then internally wait until the [SYN, ACK] is received from the peer and then send the application data and the initial ACK in a single packet, which avoids that empty [ACK]. Note that the write call will block until the [SYN, ACK] is received and will, for example, return -1 with errno ECONNREFUSED, ETIMEDOUT etc. if the connection fails. This approach however does not work in WSL 1 (Windows Subsystem for Linux), which just fails with EPIPE immediately (no SIGPIPE though).
In any case, not much can be done to eliminate this initial round-trip time due to the design of TCP. If the TCP Fast Open (TFO) feature is supported by both endpoints however, and can accept its security issues, this round-trip can be eliminated. See https://lwn.net/Articles/508865/ for more info.
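For reference, here is a sketch of the conventional sequence mentioned above (my illustration, not code from the answer): non-blocking connect(), wait for the socket to become writable, check SO_ERROR, then write(). The 5-second handshake timeout is an assumption.

#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

static ssize_t connect_then_write(int fd, const struct sockaddr *sa, socklen_t salen,
                                  const void *data, size_t len)
{
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);

    if (connect(fd, sa, salen) < 0 && errno != EINPROGRESS)
        return -1;                               /* immediate failure */

    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    if (poll(&pfd, 1, 5000) <= 0)                /* wait for writability */
        return -1;                               /* timeout or poll error */

    int err = 0;
    socklen_t errlen = sizeof(err);
    getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &errlen);
    if (err != 0) {                              /* e.g. ECONNREFUSED, ETIMEDOUT */
        errno = err;
        return -1;
    }

    return write(fd, data, len);                 /* connection is established */
}

The blocking-after-EINPROGRESS trick described above would replace the poll()/SO_ERROR step with a blocking write(); its portability is only what the answer itself claims.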

FIN,ACK after PSH,ACK

I'm trying to implement communication between a legacy system and a Linux system, but I constantly get one of the following scenarios:
(The legacy system is the server; the Linux system is the client.)
Function recv(2) returns 0 (the peer has performed an orderly shutdown.)
> SYN
< SYN, ACK
> ACK
< PSH, ACK (the data)
> FIN, ACK
< ACK
> RST
< FIN, ACK
> RST
> RST
Function connect(2) returns -1 (error)
> SYN
< RST, ACK
When the server has sent its data, the client should answer with data, but instead I get a "FIN, ACK".
Why is it like this? How should I interpret this? I'm not that familiar with TCP at this level.
When the server has sent its data, the client should answer with data, but instead I get a "FIN, ACK". Why is it like this? How should I interpret this?
It could be that once the server has sent the data (line 4) the client closes the socket or terminates prematurely and the operating system closes its socket and sends FIN (line 5). The server replies to FIN with ACK but the client has ceased to exist already and its operating system responds with RST. (I would expect the client OS to silently ignore and discard any TCP segments arriving for a closed connection during the notorious TIME-WAIT state, but that doesn't happen for some reason.)
http://en.wikipedia.org/wiki/Transmission_Control_Protocol#Connection_termination:
Some host TCP stacks may implement a half-duplex close sequence, as Linux or HP-UX do. If such a host actively closes a connection but still has not read all the incoming data the stack already received from the link, this host sends a RST instead of a FIN (Section 4.2.2.13 in RFC 1122). This allows a TCP application to be sure the remote application has read all the data the former sent—waiting the FIN from the remote side, when it actively closes the connection. However, the remote TCP stack cannot distinguish between a Connection Aborting RST and this Data Loss RST. Both cause the remote stack to throw away all the data it received, but that the application still didn't read
After FIN, PSH, ACK --> one transaction completed.
A second request is received, but a [RST] seq=140 win=0 len=0 is sent.

Recover a TCP connection

I have a simple Python server which can handle multiple clients:
import select
import socket
import sys
host = ''
port = 50000
backlog = 5
size = 1024
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind((host,port))
server.listen(backlog)
input = [server,sys.stdin]
running = 1
while running:
    inputready, outputready, exceptready = select.select(input, [], [])
    for s in inputready:
        if s == server:
            # handle the server socket
            client, address = server.accept()
            input.append(client)
        elif s == sys.stdin:
            # handle standard input
            junk = sys.stdin.readline()
            running = 0
        else:
            # handle all other sockets
            data = s.recv(size)
            if data:
                s.send(data)
            else:
                s.close()
                input.remove(s)
server.close()
One client connects to it and they can communicate. I have a third box from which I am sending an RST segment to the server (using Scapy). The TCP state diagram does not say whether an endpoint is supposed to try to recover a connection when it sees a RESET. Is there any way I can force the server to recover the connection? (I want it to send back a SYN so that it gets connected to the third client.)
Your question doesn't make much sense. TCP just doesn't work like that.
Re "The TCP state diagram does not say if an endpoint is supposed to try to recover a connection when it sees a RESET": RFC 793 #3.4 explicitly says "If the receiver was in any other state [than LISTEN or SYN-RECEIVED], it aborts the connection and advises the user and goes to the CLOSED state.".
An RST won't disturb a connection unless it arrives over that connection. I guess you could plausibly forge one, but you would have to know the current TCP sequence number, and you can't get that from within either of the peers, let alone a third host.
If you succeeded somehow, the connection would then be dead, finished, kaput. Can't see the point of that either.
I can't attach any meaning to your requirement for the server to send a SYN to the third host, in response to an RST from the third host, that has been made to appear as though it came from the second host. TCP just doesn't work anything like this either.
If you want the server to connect to the third host it will just have to call connect() like everybody else. In which case it becomes a client, of course.