nginx detecting sudden disconnect of client - sockets

At this point we are using long poll with keepalives on the tcp level.
If we pull the eth cable out of the client, client does not detect a dead connection until the keepalive counters finish what they are supposed to do.
But what happens on the nginx level ? When and how will nginx detect the same dead connection? How long will the connection be open from the nginx side ?
Should we use keepalive from nginx to client as well ?
Thank you!

Related

Reusing TIME_WAIT socket and the ephemeral port limit

There is a web Service behind Nginx in a reverse proxy mode.
The Service and Nginx are on the same Linux host. Clients are on a separate host.
Nginx is configured to start a new connection on each HTTP request to the service and it sends the "Connection: close" HTTP header so the receiver closes TCP connection after returning the response to Nginx.
Because the Service always actively closes TCP connections the corresponding sockets are left in TIME_WAIT state.
Between the client and Nginx there is a fixed number of TCP connections because Nginx is configured to use 'keepalive' for incoming requests.
Under a heavy load clients produce a lot of HTTP requests and the number of TIME_WAIT sockets increases. One might expect that soon the system runs out of ephemeral ports, but it does not.
When the total number of sockets used between Nginx and the Service (in any states: TIME_WAIT, ESTABLISHED and so on) reaches exactly the half of a maximum number of ephemeral ports - 14115 (it is (60999-32768)/2) - the system starts reusing TIME_WAIT sockets without waiting for 60s timeout.
In details.
The Service processes incoming requests in ~21ms, so when there is only one client sending request one after another there will be ~2900 (60s/21ms = 2857) sockets in TIME_WAIT state after 60s, then they start transferring to the 'CLOSED' state and can be reused. The output of ss command confirms this.
When there two clients the number of TIME_WAITS is ~5800 and so on. But when the number of TIME_WAITs reaches ~14000 it stops increasing. "ss -o" shows that TIME_WAIT sockets are reused before TIME_WAIT timer expiration (60s).
The question is where does this limit of 14000 come from?

Does haproxy buffer tcp request body when backend is down?

I am using haproxy 1.6.4 as TCP(not HTTP) proxy.
My clients are making TCP requests. They do not wait for any response, they just send the data and close the connection.
How haproxy behaves when all back-end nodes are down?
I see that (from the client point of view) haproxy is accepting incomming connections.
Haproxy statistics show that front-end has status OPEN, he is accepting connections.
Number of sessions and bytes-in increases for frontend, but not for back-end (he is DOWN).
Is haproxy buffering incoming TCP requests, and will pass them to the back-end once back-end is up?
If yes, it is possible to configure this buffer size? Where data is buffered (in memory, disk?)
Is this possible to turn off front-end (do not accept incoming TCP connections) when all back-end nodes are DOWN?
Edit:
when backend started, I see that
* backend in-bytes and sessions is equal to front-end number of sessions
* but my one and only back-end node has fever number of bytes-in, fever sessions and has errors.
So, it seems that in default configuration there is no tcp buffering.
Data is accepted by haproxy even if all backend nodes are down, but this data is lost.
I would prefer to turn off tcp front-end when there are no backend servers- so client connections would be rejected. Is that possible?
edit:
haproxy log is
Jul 15 10:02:32 172.17.0.2 haproxy[1]: 185.130.180.3:11319
[15/Jul/2016:10:02:32.335] tcp-in app/ -1/-1/0 0 SC \0/0/0/0/0
0/0 908
my log format is
%ci:%cp\ [%t]\ %ft\ %b/%s\ %Tw/%Tc/%Tt\ %B\ %ts\ \%ac/%fc/%bc/%sc/%rc\ %sq/%bq\ %U
What I understand from log:
there are no backeend servers
termination state SC translates to
S : the TCP session was unexpectedly aborted by the server, or the
server explicitly refused it.
C : the proxy was waiting for the CONNECTION to establish on the
server. The server might at most have noticed a connection attempt.
I don't think what you are looking for is possible. HAproxy handles the two sides of the connection (frontend, backend) separately. The incoming TCP connection is established first and then HAproxy looks for a matching destination for it.

tcp connection issue for unreachable server after connection

I am facing an issue with tcp connection..
I have a number of clients connected to the a remote server over tcp .
Now,If due to any issue i am not able to reach my server , after the successful establishment of the tcp connection , i do not receive any error on the client side .
On client end if i do netstat , it shows me that clients are connected the remote server , even though i am not able to ping the server.
So,now i am in the case where the server shows it is not connected to any client and on another end the client shows it is connected the server.
I have tested this for websocket also with node.js , but the same behavior persists over there also .
I have tried to google it around , but no luck .
Is there any standard solution for that ?
This is by design.
If two endpoints have a successful socket (TCP) connection between each other, but aren't sending any data, then the TCP state machines on both endpoints remains in the CONNECTED state.
Imagine if you had a shell connection open in a terminal window on your PC at work to a remote Unix machine across the Internet. You leave work that evening with the terminal window still logged in and at the shell prompt on the remote server.
Overnight, some router in between your PC and the remote computer goes out. Hours later, the router is fixed. You come into work the next day and start typing at the shell prompt. It's like the loss of connectivity never happened. How is this possible? Because neither socket on either endpoint had anything to send during the outage. Given that, there was no way that the TCP state machine was going to detect a connectivity failure - because no traffic was actually occurring. Now if you had tried to type something at the prompt during the outage, then the socket connection would eventually time out within a minute or two, and the terminal session would end.
One workaround is to to enable the SO_KEEPALIVE option on your socket. YMMV with this socket option - as this mode of TCP does not always send keep-alive messages at a rate in which you control.
A more common approach is to just have your socket send data periodically. Some protocols on top of TCP that I've worked with have their own notion of a "ping" message for this very purpose. That is, the client sends a "ping" message over the TCP socket every minute and the server responds back with "pong" or some equivalent. If neither side gets the expected ping/pong message within N minutes, then the connection, regardless of socket error state, is assumed to be dead. This approach of sending periodic messages also helps with NATs that tend to drop TCP connections for very quiet protocols when it doesn't observe traffic over a period of time.

AWS TCP ELB refuse connection when there is no available back-end server

We have a TCP application that receives connections in a protocol that we did not design and don’t control.
This protocol will assume that if it can establish a TCP connection, then it can send a message and that message is acknowledged.
This works ok if connecting directly to a machine, if the machine or application is down, the tcp connection will be refused or dropped and the client will attempt to redeliver the message.
When we use AWS elastic load balancer, ELB will establish a TCP connection with the client, regardless of whether there is an available back-end server to fulfil the request.
As a result if our application or server crashes then we lose messages.
ELB will close the TCP connection shortly thereafter, but its not good enough.
Is there a way to make ELB, only establish a connection if it can reach the back-end server?
What options do we have (within the AWS ecosystem), of balancing a TCP based service, while still refusing connections if they cannot be served.
I don't think that's achievable through ELB. By design a load balancer will manage 2 sets of connections (frontend - LB and LB - backend). The load balancer will attempt to minimize the time it takes to serve the traffic it receives. This means that the FE-LB connection will be established as the LB looks for a Backend connection to use / reuse. The case in which all of the Backend hosts are dead is such an edge case that you end up with the behavior you are seeing. Normally it's not a big deal as the requested will just get disconnected once the LB figures out that it cannot server the traffic.
Back to your protocol: to me it seem really weird that you would interpret the ability to establish a connection as equal to message delivery. It sounds like you're using TCP but not waiting for the confirmations that the message were actually received at the destination. To me that seems wrong and will get you in trouble eventually with or without a load balancer.
And not to sound too pessimistic (I do understand we are not living in an ideal world) what I would do in this specific scenario, if you can deploy additional software on the client, would be to use a tcp proxy on the client that would get disabled automatically whenever the load balancer is unhealthy/unable to serve traffic. Instruct the client to connect to this proxy. Far from ideal but it should do the trick.
You could create a health check from your ELB to verify if the backend EC2 instances respond on the TCP port. See ELB Health Checks
Then, you monitor the health status of the EC2 instances sent by the ELB to CloudWatch.
Once you determine that none of the EC2 instances are responding on the TCP port, you can remove the TCP listener from the ELB. See Delete ELB Listeners
Hopefully, at that point the ELB stops accepting TCP connections.
Note, I have not tested this solution.

Nginx proxy hangs in a while when idle

We are using nginx proxy_pass feature for bridging RESTful calls to a backend app, and use nginx web socket proxy for the same system at eh same time. Sometimes (guess when the system has no client request for a while) the nginx freezes any request till we restart it and then anything works well. What is the problem? DO I have to change keep-alive settings? I have turned off buffer and cache feature for proxy in nginx.conf.
I found the problem. By checking nginx error log and a bit a hackery sniff and guess, I found out that the web socket connections usually disconnect and reconnect (mobile devices) and the nginx peer tries to keep the connection alive, and then maximum connection limit reaches. I just decreased timeouts and increased max connections.