Prevent Replay once errored? - system.reactive

I have a stream that simply emits an established network connection, and when that connection goes down for whatever reason, emits an error. An established connection allows me to obtain any number of streams of data from the server.
Since I want to re-use the connection, and I want it to reconnect on errors, I do the following.
var sharedConnection = connection
    .Retry()
    .Replay(1)
    .RefCount();
It works really well, reconnecting whenever there's a network error. When all the streams have run down, the connection is disposed. This is great, and what I'd expect. However, the first subsequent stream request gets the initial, now-disposed connection replayed to it instead of a new one. Am I doing something wrong?

Half-Established TCP Connections

Half-Established Connections
With a half-established connection I mean a connection for which the client's call to connect() returned successfully, but the server's call to accept() didn't. This can happen the following way: the client calls connect(), resulting in a SYN packet to the server. The server goes into state SYN-RECEIVED and sends a SYN-ACK packet to the client. This causes the client to reply with an ACK, go into state ESTABLISHED and return from the connect() call. If the final ACK is lost (or ignored, due to a full accept queue at the server, which is probably the more likely scenario), the server is still in state SYN-RECEIVED and the accept() does not return. Due to timeouts associated with the SYN-RECEIVED state, the SYN-ACK will be resent, allowing the client to resend the ACK. If the server is able to process the ACK eventually, it will go into state ESTABLISHED as well. Otherwise it will eventually reset the connection (i.e. send an RST to the client).
You can create this scenario by starting lots of connections on a single listen socket (if you do not adjust the backlog and tcp_max_syn_backlog). See this question and this article for more details.
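As a rough illustration only (not the original code), a server along these lines can produce the scenario on Linux: it uses a tiny backlog and never calls accept(), so once the accept queue is full the later clients' final ACKs are ignored and those connections stay in SYN-RECEIVED on the server. Port 9000 and the backlog of 1 are arbitrary placeholders.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(9000);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind");
        return 1;
    }
    listen(fd, 1);   /* tiny backlog, so the accept queue overflows quickly */
    pause();         /* never call accept(); clients beyond the backlog stay half-established */
    return 0;
}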
Experiments
I performed several experiments (with variations of this code) and observed some behaviour I cannot explain. All experiments were performed using Erlang's gen_tcp on a current Linux, but I strongly suspect that the answers are not specific to this setup, so I have tried to keep things general here.
connect() -> wait -> send() -> receive()
My starting point was to establish a connection from the client, wait between 1 and 5 seconds, send a "Ping" message to the server and wait for the reply. With this setup I observed that the receive() failed with the error closed when I had a half-established connection. There was never an error during the send() on a half-established connection. You can find a more detailed description of this setup here.
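For readers not familiar with Erlang, this first experiment corresponds roughly to the following C client (a sketch under my own assumptions; the address 127.0.0.1, port 9000 and the fixed 3-second wait are placeholders, and error handling is reduced to printing):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

int main(void) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(9000);
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");              /* only fails if the handshake itself fails */
        return 1;
    }
    sleep(3);                           /* the 1-5 second wait from the experiment */

    if (send(fd, "Ping", 4, 0) < 0)     /* in the experiments this succeeded even when half-established */
        perror("send");

    char buf[64];
    ssize_t n = recv(fd, buf, sizeof(buf), 0);
    if (n < 0)
        perror("recv");                 /* this is where the half-established case reported an error */
    else if (n == 0)
        printf("connection closed by peer\n");

    close(fd);
    return 0;
}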
connect() -> long wait -> send()
To see if I could get errors while sending data on a half-established connection, I waited for 4 minutes before sending data. The 4 minutes should cover all timeouts and retries associated with the half-established connection. Sending data was still possible, i.e. send() returned without error.
connect() -> receive()
Next I tested what happens if I only call receive() with a very long timeout (5 minutes). My expectation was to get a closed error for the half-established connections, as in the original experiments. Alas, nothing happened; no error was thrown and the receive eventually timed out.
My questions
Is there a common name for what I call a half-established connection?
Why is the send() on a half-established connection successful?
Why does a receive() only fail if I send data first?
Any help, especially links to detailed explanations, is welcome.
From the client's point of view, the session is fully established: it sent SYN, got back SYN/ACK and sent ACK. It is only on the server side that you have a half-established state. (Even if the client gets a repeated SYN/ACK from the server, it will just re-ACK, because it's in the established state.)
The send on this session works fine because, as far as the client is concerned, the session is established. The sent data does not have to be acknowledged by the far side in order to succeed (the send system call is finished when the data is copied into kernel buffers), but see below.
I believe that the send actually generates an error on the connection (probably an RST), because the receiving system cannot ACK data on a session it has not finished establishing. My guess is that any system call referencing the socket on the client side that happens after the send plus a short delay (i.e. when the RST has had a chance to come back) will result in an error.
The receive by itself never causes an error because the client side doesn't need to do anything (TCP protocol-wise) for a receive; it's just idly waiting. But once you send some data, you've forced the server side's hand: it either has completed the session establishment (in which case it can accept the data) or it must send a reset (my guess here is that it can't "hold" undelivered data on a session that isn't fully established).
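A small sketch of how that plays out from the client's side (my own illustration; probe is a made-up name, the errno value mentioned in the comment assumes Linux, and fd is a connected socket whose server side is still in SYN-RECEIVED):

#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

void probe(int fd) {
    char buf[64];
    send(fd, "Ping", 4, 0);                 /* returns 4: the data only has to reach the kernel buffers */
    usleep(200 * 1000);                     /* give the server's RST time to come back */
    if (recv(fd, buf, sizeof(buf), 0) < 0)  /* the reset surfaces on the next call that touches the socket */
        perror("recv");                     /* typically reports ECONNRESET */
}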

Limit connections to server

I'd like to limit the number of connections to the websocket server, namely to one: a new client kicks the old client out.
The code below somewhat represents what I want to do: take whatever is in messages and send it through the websocket. If another client connects, or the browser refreshes (which should close the old connection, but for some reason doesn't), there are suddenly two connections and only every second message arrives at the new client.
I use the snap framework for this.
createServer = forkIO $ httpServe defaultConfig app

app = route [("/", runWebSocketsSnap handler)]

handler pending = do
    connection <- acceptRequest pending
    loop connection

-- Forward every message placed in the shared MVar to this connection.
loop connection = do
    msg <- takeMVar messages
    sendTextData connection msg
    loop connection

-- One global mailbox shared by all connection handlers.
{-# NOINLINE messages #-}
messages = unsafePerformIO newEmptyMVar

sendMessage = putMVar messages
I see two different questions here:
1. how to limit the number of connections, so there are at most N clients at the same time;
2. how to make sure an old connection does not live on forever after a browser refresh.
I think you mean #2. In that case you should check that the connection is alive. The best way to do that is to ping the client periodically, e.g. using forkPingThread.
If you really need #1, then you should keep a shared MVar holding the ThreadId of the current client. When a new client connects, just kill the old one.

tcp keep alive basic query

I have a TCP socket in my app. TCP keep alive is enabled with a 10-second frequency.
In addition, I also have messages flowing between the app and the server every 1 second to get status.
So, since there are messages flowing over the socket at a faster rate anyway, no keep alives will flow at all.
Now, consider this scenario: the remote server is down, so the periodic message send (which happens every 1 second) fails 3-5 times in a row. I don't think that by enabling TCP keep alives we can detect that the socket is broken, can we?
Do we then have to build logic into our code so that, if this periodic message fails a certain number of times in a row, the other end is assumed to be dead?
Let me know.
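For concreteness, a keep alive setup with roughly the 10-second cadence described above usually looks something like this (a sketch; enable_keepalive is a made-up name, the TCP_KEEPIDLE/TCP_KEEPINTVL/TCP_KEEPCNT option names are Linux-specific, and fd is assumed to be a connected socket):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

void enable_keepalive(int fd) {
    int on = 1, idle = 10, interval = 10, count = 3;
    setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));                /* turn keep alive on */
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle));           /* idle seconds before the first probe */
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval, sizeof(interval));  /* seconds between probes */
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count));          /* failed probes before the peer is declared dead */
}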
In your application it makes no sense to enable keep alive.
Keep alive is for applications that have an open connection and don't use it all the time; you are using it all the time, so keep alive is not needed.
When you send something and the other end has crashed, TCP on the client will send its retransmissions with an increasing timeout. Finally, if you have a blocking socket, you will get an error indication on the send operation, at which point you know that you have to close the socket and retry the connection.
An error indication is where the return code of the socket operation is < 0.
I don't know these timeout values by heart, but they can add up to a minute or longer.
When the server is shut down gracefully, meaning it closes its sending side of the socket, you will get that information by receiving 0 bytes on your receiving socket.
You might want to check out my answer from yesterday as well:
Reset TCP connection if server closes/crashes mid connection
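To make these return-code conventions concrete, the periodic status poll from the question could check them roughly like this (a sketch; poll_status and the "STATUS" message are made-up names):

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

int poll_status(int fd) {
    char buf[256];
    if (send(fd, "STATUS", 6, 0) < 0) {
        perror("send");        /* return code < 0: e.g. retransmissions timed out; close and reconnect */
        return -1;
    }
    ssize_t n = recv(fd, buf, sizeof(buf), 0);
    if (n < 0) {
        perror("recv");        /* error indication */
        return -1;
    }
    if (n == 0)
        return -1;             /* 0 bytes: the server closed its sending side gracefully */
    return 0;                  /* got a status reply of n bytes */
}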
No, you don't need to assume anything. The connection will break either because a send times out or because a keep alive times out. Either way, you'll start getting errors on reads and writes.

Receiving FD_CLOSE when there should be FD_READ

I have a strange problem on one of my client's workstations. I have a simple application that exchanges some data over the network between two endpoints.
Basically the transaction goes like this:
Client A listens for incoming connections
Client B connects to A and sends some data
Client A reads this data for further processing
Now the strange part is that client A does not receive the whole data (sometimes it gets part of the buffer, sometimes it is empty).
Client A uses the WSAEventSelect function and waits for FD_READ to read the data sent by B, and for FD_CLOSE to detect disconnection.
Usually (every time, except for this one particular client) FD_READ is signaled, the data is processed, and after that FD_CLOSE is signaled and all is fine; but here, instead of FD_READ, I receive FD_CLOSE.
Can someone tell me how this is possible? Another thing is that the program was working fine for about a year and then suddenly crashed.
Now the strange part is that client A does not receive the whole data (sometimes it gets part of the buffer, sometimes it is empty).
There's nothing strange about that, that's how TCP works, except that you will never receive zero bytes in blocking mode.
Usually (every time, except for this one particular client) FD_READ is signaled, the data is processed, and after that FD_CLOSE is signaled and all is fine; but here, instead of FD_READ, I receive FD_CLOSE.
Note that FD_READ can be signalled any number of times, not just once. You're not guaranteed to receive an entire message in a single read.
Can someone tell me how this is possible?
The client has closed the connection.
Quoting http://msdn.microsoft.com/en-us/library/windows/desktop/ms741576%28v=vs.85%29.aspx
"An application should check for remaining data upon receipt of FD_CLOSE to avoid any possibility of losing data."
So if the error code associated with the FD_CLOSE notification is 0, you should check to see if you still have data to read; that might be where your missing data is.
If the error code is NOT 0, then there was an error and the missing data is probably lost.
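When the error code is 0, the check for remaining data might look roughly like this (a Winsock-style C sketch; drain_on_close is a made-up helper name, and sock is the socket that signalled FD_CLOSE):

#include <winsock2.h>

void drain_on_close(SOCKET sock) {
    char buf[4096];
    int n;
    do {
        n = recv(sock, buf, (int)sizeof(buf), 0);
        if (n > 0) {
            /* hand the remaining bytes to the normal FD_READ processing path */
        }
    } while (n > 0);
    if (n == SOCKET_ERROR) {
        /* recv failed; WSAGetLastError() gives the reason */
    }
    closesocket(sock);
}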

Connection refused sockets. Normal behavior?

I have a socket server which accepts multiple connections from various clients. I'm testing it on localhost with a client application which connects, sends data and closes the connection 10 times every 10 ms. Sometimes the test client raises an error: "Connection refused by the remote server" or something similar.
Is this a normal behavior of the server application ?
10 connects every 10 ms is one connection per millisecond, which seems a rather fast rate. Are these connection attempts being made in parallel? If so, perhaps you are filling up the server's listen() backlog queue; IIRC, clients who try to connect while the backlog queue is full will get a connection-refused error.
To test that hypothesis, try passing in larger or smaller numbers as the second argument to listen() on your server, and see if that makes the connection-refused error occur more or less often.
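For reference, the backlog is simply the second argument to listen() (a sketch; start_listening is a made-up name and server_fd is assumed to be a bound listening socket):

#include <sys/socket.h>

void start_listening(int server_fd) {
    /* A small backlog, e.g. listen(server_fd, 1), makes bursts of connects
     * overflow the queue quickly; SOMAXCONN asks for the system maximum. */
    listen(server_fd, SOMAXCONN);
}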
I'm with Jeremy. You didn't mention the protocol, but I assume it's SOCK_STREAM. It will take longer than 10 ms to do the TCP handshake on anything but the most local connection, eventually causing a backlog (and a subsequent connection-refused error) no matter how high you set your listen backlog.
You'd be way ahead if you could keep the connection open, and not close it down during each of your computation cycles.