HAProxy : Prevent stickiness to a backup server - haproxy

I'm facing a configuration issue with HAProxy (1.8).
Context:
In a HAProxy config, I have a several severs in a backend, and an additional backup server in case the other servers are down.
Once a client gets an answer from a server, it must stick to this server for its next queries.
For some good reasons, I can't use a cookie for this concern, and I had to use a stick-table instead.
Problem:
When every "normal" server is down, clients are redirected to the backup server, as expected.
BUT the stick-table is then filled with an association between the client and the id of the backup server.
AND when every "normal" server is back, the clients which are present in the stick table and associated with the id of the backup server will continue to get redirected to the backup server instead of the normal ones!
This is really upsetting me...
So my question is: how to prevent HAProxy to stick clients to a backup server in a backend?
Please find below a configuration sample:
defaults
option redispatch
frontend fe_test
bind 127.0.0.1:8081
stick-table type ip size 1m expire 1h
acl acl_test hdr(host) -i whatever.domain.com
...
use_backend be_test if acl_test
...
backend be_test
mode http
balance roundrobin
stick on hdr(X-Real-IP) table fe_test
option httpchk GET /check
server test-01 server-01.lan:8080 check
server test-02 server-02.lan:8080 check
server maintenance 127.0.0.1:8085 backup
(I've already tried to add a lower weight to the backup server, but it didn't solve this issue.)
I read in the documentation that the "stick-on" keyword has some "if/unless" options, and maybe I can use it to write a condition based on the backend server names, but I have no clue about the syntax to use, or even if it is possible.
Any idea is welcome!

So silly of me! I was so obsessed by the stick table configuration that I didn't think to look in the server options...
There is a simple keyword that perfectly solves my problem: non-stick
Never add connections allocated to this sever to a stick-table. This
may be used in conjunction with backup to ensure that stick-table
persistence is disabled for backup servers.
So the last line of my configuration sample simply becomes:
server maintenance 127.0.0.1:8085 backup non-stick
...and everything is now working as I expected.

Related

Deploy a WebApp and always keep it running

I have a web application spread over multiple servers and the incoming traffic is handled by HAProxy in order to balance the load. When we do the distribution, we do it at night because the users are much less and therefore we are less in service. To make the distribution we use the following strategy:
we shut down half of the servers
we deploy on servers that are turned off
we reactivate the servers that are turned off
we perform the same procedure on the other servers
The problem is that in any case I turn off the servers we close connections to users. Is there a better strategy for doing this? How could I improve this and avoid disservices and maybe be able to make distributions even during the day?
I hope I was clear. Thanks
I strongly suggest to use health checks for the servers.
Using HAProxy as an API Gateway, Part 3 [Health Checks]
You should have a URL ("/health") which you can use for health check of the backend server and add option redispatch to the config.
Now when you want to maintain the backend server just "remove" the "/health" URL and haproxy automagically routes the user to the other available servers.

Postgres's tcp_keepalives_idle Not Updating AWS ELB Idle Timeout

I have an Amazon ELB in front of Postgres. This is for Kubernetes-related reasons, see this question. I'm trying to work around the maximum AWS ELB Idle Timeout limit of 1 hour so I can have clients that can execute long-running transactions without being disconnected by the ELB. I have no control over the client configuration in my case, so any workaround needs to happen on the server side.
I've come across the tcp_keepalives_idle setting in Postgres, which in theory should get around this by sending periodic keepalive packets to the client, thus creating activity so the ELB doesn't think the client is idle.
I tried testing this by setting the idle timeout on the ELB to 2 minutes. I set tcp_keepalives_idle to 30 seconds, which should force the server to send the client a keepalive every 30 seconds. I then execute the following query through the load balancer: psql -h elb_dns_name.com -U my_user -c "select pg_sleep(140)". After 2 minutes, the ELB disconnects the client. Why are the keepalives not coming through to the client? Is there something with pg_sleep that might be blocking them? If so, is there a better way to simulate a long running query/transaction?
I fear this might be a deep dive and I may need to bring out tcpdump or similar tools. Unfortunately things do get a bit more complicated to parse with all of the k8s chatter going on as well. So before going down this route I thought it would be good to see if I was missing something obvious. If not, any tips on how to best determine whether a keepalive is actually being sent to the server, through the ELB, and ending up at the client would be much appreciated.
Update: I reached out to Amazon regarding this. Apparently idle is defined as not transferring data over the wire. Data is defined as any network packet that has a payload. Since TCP keep-alives do not have payloads, the client and server keep-alives are considered idle. So unless there's a way to get the server to send data inside of their keep alive payloads, or send data in some other form, this may be impossible.
Keepalives are sent on the TCP level, well below PostgreSQL, so it doesn't make a difference if the server is running a pg_sleep or something else.
Since a hosted database is somewhat of a black box, you could try to control the behavior on the client side. The fortunate thing is that PostgreSQL also offers keepalive parameters on the client side.
Experiment with
psql 'host=elb_dns_name.com user=my_user keepalives_idle=1800' -c 'select pg_sleep(140)'

Haproxy Health Check port

I'm trying to think through the advantages and disadvantages of haproxy health checks happening on a different port from regular traffic.
If a server becomes overloaded having health checks on a different port may mark the server as being up even when overloaded. I think this is a good thing because taking servers offline may make an overloading problem worse, but want to confirm that that makes sense. I can't seem to find any good docs on the tradeoffs though and was wondering if someone has a good analysis on the tradeoffs.
The port keyword is often used with address to send health checks somewhere else than directly to the service you are checking. One example might be enabling option httpchk to monitor a non-HTTP service. What you then do is have a HTTP-compatible service that when queried can execute complex health checks against the service you are actually testing.
The above is often done with agent-check nowdays, but some people prefer to use an HTTP interface.
This also has nothing to do with server load, the only idea is to send health checks to some other service, not the one directly monitored, which is more capable of testing the actual service (possibly by using a more-complex logic) and returning a result. As an example, one could have a MySQL backend which instead of being tested just for authentication by option mysql-check, could be tested by a PHP script that, for example, checks if backup is running and if it is returns a 5xx HTTP error. The configuration could be something like:
backend mysql
mode tcp
option httpchk GET /mysql-status.php
server mysqlserver 10.0.0.1:3306 check port 80

haproxy: set timeout if <condition>

The question seems to be quite straight and easy, however I have not been able to find a proper answer.
In haproxy I have 1 backend, say:
backend-1
and 2 frontends, say:
frontend-1
frontend-2
In the backend stanza I want to set a "timeout server" parameter, but, only if the connection comes from frontend-1.
As I didn't find anything I tried to figure it out myself:
backend backend-1
bind *:80
option <blahblah_option>
timeout server 1d if frontend frontend-1
This syntax does not work, and I am mentioning it to let understand what I am trying to achieve.
This is not doable yet in HAProxy.
Later, you will be able to set timeouts using tcp-request and http-request rules.
What we usually do to workaround this for now, is that we setup 2 backends using the same parameters, but different timeout servers.
This is useful when a few urls only deserve a long server timeout.
Edit followup your comment about multiple health checks:
Well, that's why the server's 'track' directive exists:
backend my_app
server srv1 10.0.0.1:80 check
backend my_app_longtime
server srv1 10.0.0.1:80 track my_app/srv1
In the conf above, the server in my_app_longtime backend won't be checked. That said, it will follow up the same state than srv1 in the backend my_app.
Baptiste
Baptiste
I did it like this and it worked. It made it possible to extend timeout on specific app urls, which are more time consuming. Used that trace health check - thanks Babtiste.
frontend www-http
bind 10.0.0.1:80
default_backend app
acl long_url path_beg -i /long_url
use_backend app-extended if long_url
backend app
server web-1 10.0.0.2:80 check
backend app-extended
server web-1 10.0.0.2:80 trace app/web-1
timeout server 10m

Haproxy: Keepalive connections not balanced evenly

we’ve got a strange little problem we’re experiencing for months now:
The load on our cluster (http, long lasting keepalive connections with a lot of very short (<100ms) requests) is distributed very uneven.
All servers are configured the same way but some connections that push through thousands of requests per second just end up being sent to only one server.
We tried both load balancing strategies but that does not help.
It seems to be strictly keepalive related.
The misbehaving backend has the following settings:
option tcpka
option http-pretend-keepalive
Is the option http-server-close made to cover that issue?
If I get it right it will close and re-open a lot of connections which means load to the systems? Isn't there a way to keep the connections open but evenly balance the traffic anyway?
I tried to enable that option but it kills all of our backends when under load.
HAProxy currently only support keep-alive HTTP-connections toward the client, not the server. If you want to be able to inspect (and balance) each HTTP request, you currently have to use one of the following options
# enable keepalive to the client
option http-server-close
# or
# disable keepalive completely
option httpclose
The option http-pretend-keepalive doesn't change the actual behavior of HAProxy in regards of connection handling. Instead, it is intended as a workaround for servers which don't work well when they see a non-keepalive connection (as is generated by HAProxy to the backend server).
Support for keep-alive towards the backend server is scheduled to be in the final HAProxy 1.5 release. But the actual scope of that might still vary and the final release date is sometime in the future...
Just FYI, it's present in the latest release 1.5-dev20 (but take the fixes with it, as it shipped with a few regressions).