HAProxy still dispatches connections to a backend server while it is gracefully restarting

We are using HAProxy in TCP mode to load balance our Thrift (RPC) servers, but we've encountered a problem when a backend server restarts.
When our Thrift (RPC) server restarts, it first stops listening on the port HAProxy is configured to connect to, but keeps processing in-flight requests until they are all done (graceful restart).
So during the restart period there are still established connections from clients to the backend server via HAProxy, while the backend server is not accepting any new connections. HAProxy still treats this backend server as healthy and keeps dispatching new connections to it; any new connection dispatched to this server takes a long time to connect and then times out.
Is there any way to notify HAProxy that the server has stopped listening, so that it does not dispatch any new connections to it?
I've tried the following:
timeout connect set very low + option redispatch + retries 3
option tcp-check
Neither solved the problem.
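For reference, a minimal sketch of how those options fit together (the backend name, server addresses and timing values here are only placeholders, not taken from the question); the idea is to fail fast on connect and to have the TCP health check notice the closed listener as quickly as possible:
defaults
    mode tcp
    timeout connect 500ms       # give up quickly on a server that no longer listens
    retries 3
    option redispatch           # retry the connection on another server after a failure
backend thrift_servers
    option tcp-check            # plain TCP connect check against the listening port
    # short check interval and fall count so a stopped listener is marked DOWN fast
    server rpc1 10.0.0.1:9090 check inter 1s fall 1 rise 2
    server rpc2 10.0.0.2:9090 check inter 1s fall 1 rise 2
Even so, connections that arrive between the moment the listener closes and the moment the check marks the server DOWN can still be sent to it, which matches the behaviour described above.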

Related

Understanding keepalive between client and cockroachdb with haproxy

We are facing a problem where our client, let's name it A, is attempting to connect to a DB server (CockroachDB), name it B, load balanced via HAProxy:
A <--> haproxy <--> B
Every now and then, our client A receives a Broken Pipe error, but I'm not able to understand why.
The Cockroach server already has the default value below, i.e. 60 seconds:
COCKROACH_SQL_TCP_KEEP_ALIVE ## enabled, set to 60 seconds
Plus, our HAProxy config has the following settings:
defaults
    mode tcp
    # Timeout values should be configured for your specific use.
    # See: https://cbonte.github.io/haproxy-dconv/1.8/configuration.html#4-timeout%20connect
    timeout connect 10s
    timeout client 1m
    timeout server 1m
    # TCP keep-alive on client side. Server already enables them.
    option clitcpka
So what is causing the TCP connection to drop when keepalive is enabled on every end?
Keepalive is what makes connections go away if one of the end points has died without closing the connection. Investigate in that direction.
The only time keepalive actually keeps the connection alive is in connection with an ill-configured firewall that drops idle connections.
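As an illustration of the knobs involved (the values here are placeholders, not a recommendation from this thread): HAProxy's timeout client and timeout server are idle timeouts of their own, and enabling TCP keepalive on both sides does not stop HAProxy from closing a connection that has been idle longer than those values, so they need to cover the longest idle period you expect:
defaults
    mode tcp
    timeout connect 10s
    # idle timeouts must be longer than the longest expected idle period
    timeout client 30m
    timeout server 30m
    option clitcpka    # TCP keepalive towards the client
    option srvtcpka    # TCP keepalive towards the server as well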

In HAProxy, is it possible to stop routing traffic to a specific server with active sessions by disabling the server?

I am trying to implement the following setup:
HA -|
    |- Redis1
    |- Redis2
At any time only one of the redis instances should serve the incoming requests.
Going by the documentation, it seems that you can disable a server dynamically, and HAProxy would stop directing traffic to the disabled server.
While this worked for new client connections, existing client connections are still served content from the disabled server.
But if I kill the redis instance, even the existing client connections are redirected to the other instance.
Is it possible to achieve this behavior without killing the instance?
Here's my HAProxy config:
global
    stats socket /opt/haproxy/admin.sock mode 660 level admin
    stats socket ipv4@*:19999 level admin
defaults
    log global
    mode tcp
listen myproxy
    mode tcp
    bind *:4444
    balance roundrobin
    server redis1 127.0.0.1:6379 check
    server redis2 127.0.0.1:7379 check
Found the answer. Need to add the following directive:
on-marked-down shutdown-sessions
This closes any existing sessions, e.g.:
server redis1 127.0.0.1:6379 check on-marked-down shutdown-sessions
server redis2 127.0.0.1:7379 check on-marked-down shutdown-sessions
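With the admin-level stats socket from the config above, a server can then be taken out of rotation at runtime; a sketch, assuming socat is installed:
# mark redis1 as disabled; combined with on-marked-down shutdown-sessions,
# existing sessions are shut down and new connections go to redis2
echo "disable server myproxy/redis1" | socat stdio /opt/haproxy/admin.sock
# later, put it back into rotation
echo "enable server myproxy/redis1" | socat stdio /opt/haproxy/admin.sock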

Does haproxy buffer tcp request body when backend is down?

I am using HAProxy 1.6.4 as a TCP (not HTTP) proxy.
My clients are making TCP requests. They do not wait for any response, they just send the data and close the connection.
How does HAProxy behave when all back-end nodes are down?
I see that (from the client's point of view) HAProxy is accepting incoming connections.
HAProxy statistics show that the front-end has status OPEN, i.e. it is accepting connections.
The number of sessions and bytes-in increases for the front-end, but not for the back-end (it is DOWN).
Is HAProxy buffering incoming TCP requests, and will it pass them to the back-end once the back-end is up?
If yes, is it possible to configure this buffer size? Where is the data buffered (in memory, on disk)?
Is it possible to turn off the front-end (not accept incoming TCP connections) when all back-end nodes are DOWN?
Edit:
When the backend started, I see that:
* the backend's bytes-in and session counts are equal to the front-end's number of sessions
* but my one and only back-end node has a lower number of bytes-in, fewer sessions, and has errors.
So it seems that in the default configuration there is no TCP buffering.
Data is accepted by HAProxy even if all backend nodes are down, but this data is lost.
I would prefer to turn off the TCP front-end when there are no backend servers, so client connections would be rejected. Is that possible?
Edit:
The HAProxy log is:
Jul 15 10:02:32 172.17.0.2 haproxy[1]: 185.130.180.3:11319 [15/Jul/2016:10:02:32.335] tcp-in app/ -1/-1/0 0 SC \0/0/0/0/0 0/0 908
My log format is:
%ci:%cp\ [%t]\ %ft\ %b/%s\ %Tw/%Tc/%Tt\ %B\ %ts\ \%ac/%fc/%bc/%sc/%rc\ %sq/%bq\ %U
What I understand from the log:
there are no backend servers
the termination state SC translates to:
S : the TCP session was unexpectedly aborted by the server, or the server explicitly refused it.
C : the proxy was waiting for the CONNECTION to establish on the server. The server might at most have noticed a connection attempt.
I don't think what you are looking for is possible. HAProxy handles the two sides of the connection (frontend, backend) separately. The incoming TCP connection is established first, and then HAProxy looks for a matching destination for it.
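For the last part of the question (rejecting clients when every backend server is DOWN), one thing that may be worth experimenting with, although it is not confirmed in this thread and should be verified on 1.6.4, is rejecting at the connection level based on the number of usable servers. A sketch reusing the frontend and backend names from the log above (the bind port and server address are made up):
frontend tcp-in
    mode tcp
    bind *:4444
    # refuse the client connection outright when the backend has no usable server
    tcp-request connection reject if { nbsrv(app) eq 0 }
    default_backend app
backend app
    mode tcp
    server node1 10.0.0.1:5000 check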

AWS TCP ELB refuse connection when there is no available back-end server

We have a TCP application that receives connections in a protocol that we did not design and don’t control.
This protocol will assume that if it can establish a TCP connection, then it can send a message and that message is acknowledged.
This works OK when connecting directly to a machine: if the machine or application is down, the TCP connection is refused or dropped and the client will attempt to redeliver the message.
When we use AWS elastic load balancer, ELB will establish a TCP connection with the client, regardless of whether there is an available back-end server to fulfil the request.
As a result if our application or server crashes then we lose messages.
ELB will close the TCP connection shortly thereafter, but it's not good enough.
Is there a way to make ELB only establish a connection if it can reach the back-end server?
What options do we have (within the AWS ecosystem) for balancing a TCP-based service while still refusing connections if they cannot be served?
I don't think that's achievable through ELB. By design a load balancer manages two sets of connections (frontend - LB and LB - backend). The load balancer will attempt to minimize the time it takes to serve the traffic it receives. This means that the FE-LB connection is established while the LB looks for a backend connection to use or reuse. The case in which all of the backend hosts are dead is such an edge case that you end up with the behavior you are seeing. Normally it's not a big deal, as the request will just get disconnected once the LB figures out that it cannot serve the traffic.
Back to your protocol: to me it seems really weird that you would interpret the ability to establish a connection as equal to message delivery. It sounds like you're using TCP but not waiting for confirmation that the messages were actually received at the destination. That seems wrong to me and will get you in trouble eventually, with or without a load balancer.
And not to sound too pessimistic (I do understand we are not living in an ideal world): what I would do in this specific scenario, if you can deploy additional software on the client, is use a TCP proxy on the client that gets disabled automatically whenever the load balancer is unhealthy or unable to serve traffic. Instruct the client to connect to this proxy. Far from ideal, but it should do the trick.
You could create a health check from your ELB to verify if the backend EC2 instances respond on the TCP port. See ELB Health Checks
Then, you monitor the health status of the EC2 instances sent by the ELB to CloudWatch.
Once you determine that none of the EC2 instances are responding on the TCP port, you can remove the TCP listener from the ELB. See Delete ELB Listeners
Hopefully, at that point the ELB stops accepting TCP connections.
Note, I have not tested this solution.
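A rough sketch of those steps with the AWS CLI for a Classic ELB (the load balancer name, port and thresholds are made up, and as noted above the approach is untested):
# health check against the backend TCP port
aws elb configure-health-check --load-balancer-name my-elb \
    --health-check Target=TCP:5000,Interval=10,Timeout=5,UnhealthyThreshold=2,HealthyThreshold=2
# check which instances the ELB currently considers InService
aws elb describe-instance-health --load-balancer-name my-elb
# once every instance is OutOfService, remove the listener so new TCP connections are refused
aws elb delete-load-balancer-listeners --load-balancer-name my-elb --load-balancer-ports 5000
# when instances recover, recreate the listener
aws elb create-load-balancer-listeners --load-balancer-name my-elb \
    --listeners Protocol=TCP,LoadBalancerPort=5000,InstanceProtocol=TCP,InstancePort=5000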

How can we remove the CLOSE_WAIT state of a socket without restarting the server?

We have written an application in which client-server communication uses IOCP.
Clients connect to the server through wireless access points.
When a temporary disconnection happens in the network, this can lead to a CLOSE_WAIT state. This indicates that the client properly closed the connection, but the server still has its socket open.
If too many of the connections (on the port the server and clients were talking over) are left in the CLOSE_WAIT state, then at peak load the server stops functioning and rejects new connections, which is totally frustrating. In that case the user has to restart the server to wipe out all the CLOSE_WAIT sockets and free the memory. When the server restarts, clients try to connect again and the server calls accept again. But before accepting a new connection, the previous connection should be closed on the server side. How can we do that?
How can we remove the CLOSE_WAIT state of a socket without restarting the server?
Is there any alternative that avoids a server restart?
We also came to know that if all of the available ephemeral ports are allocated to client applications, the client experiences a condition known as TCP/IP port exhaustion. When TCP/IP port exhaustion occurs, client port reservations cannot be made, and errors occur in client applications that attempt to connect to a server via TCP/IP sockets.
If this is happening, then we need to increase the upper range of ephemeral ports that are dynamically allocated to client TCP/IP socket connections.
Reference :
http://msdn.microsoft.com/en-us/library/aa560610%28v=bts.10%29.aspx
Let us know whether this alternative approach is useful or not.
Thanks in advance.
Regards
Amey
Fix the server code.
The server should be reading with a timeout, and if the timeout expires it should close the socket.
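A minimal sketch of that idea (POSIX sockets for brevity; the function name and timeout value are made up, and the same approach applies to Winsock/IOCP, where SO_RCVTIMEO also exists or a timer can be attached to the outstanding read):
/* Read with a timeout; close the socket if the peer has closed or gone silent.
 * Returns the number of bytes read, or -1 after closing the socket. */
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>
int read_with_timeout(int fd, char *buf, size_t len, int seconds)
{
    struct timeval tv = { .tv_sec = seconds, .tv_usec = 0 };
    setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
    ssize_t n = recv(fd, buf, len, 0);
    if (n > 0)
        return (int)n;               /* got data, the connection is alive */
    /* n == 0: the peer closed its side -- exactly what leaves the server
     * socket in CLOSE_WAIT if we never close ours.
     * n < 0 with EAGAIN/EWOULDBLOCK: the read timed out. */
    close(fd);                       /* closing moves CLOSE_WAIT -> LAST_ACK */
    return -1;
}
The key point is the close(): a socket stays in CLOSE_WAIT only for as long as the application keeps its descriptor open.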