HAProxy: Random Layer4 timeouts in when using httpchk - haproxy

I'm currently investigating an issue of random "Layer4" timeouts being reported by health checks used in HAPRoxy. The actual backend server being checked is proven to be up and responding at the time of these errors, as other trafic to the server is flowing through.
Thus making me suspect there might be issues caused by our configuration.
Server health is currently configured as follow:
option httpchk GET /health HTTP/1.1\r\nHost:\ Haproxy\r\nConnection:\ close
http-check expect string OK
server server1 server1.internal.example.com check check-ssl port 443 verify none inter 3s fall 2 backup
Trying to understand the docs, I see the "http-check connect" and "linger" options being mentioned. Would the "connect" directive make any actual difference to how the connection for the health check is set up compared to our current conf?
Any other feedback/observartions on the above config is welcome.

Related

Haproxy http health check reuse connection

I'm try to config haproxy to reuse connection(http keep alive) for reduce TCP establish connection cost
It's ok for normal request. But for health check, it's alway create new connection to check backend.
How can we re-use a connection for health check purpose.
Here is my config
balance roundrobin
mode http
option http-keep-alive
option httpchk GET /actuator/health HTTP/1.1\r\nHost:\ 10.1.1.2:8003
server micro-dev 10.1.1.2:8003 check maxconn 2000
Thanks
Few days agos, I have the same question about this header Connection: close which is unfortunately always added to the request.
The support replied that this behavior is normal.
In fact, the single request health check tells the server to close the connection once the response is returned.
It is not possible to change this behavior.
For more informations read this https://github.com/haproxy/haproxy/issues/560

HAProxy multiple health check types on one backend

Is it possible to have a httpchk on port 8008 and ssl-hello-chkon port 5432 in a single haproxy tcp backend? From what I can see health checks are defined with option and then by adding check to the server definition which leads me to believe that only one health check may be done per backend.
Am I wrong and is there a way to accomplish what I want?
You could define the same server twice within the backend and set one of each health check.

haproxy: set timeout if <condition>

The question seems to be quite straight and easy, however I have not been able to find a proper answer.
In haproxy I have 1 backend, say:
backend-1
and 2 frontends, say:
frontend-1
frontend-2
In the backend stanza I want to set a "timeout server" parameter, but, only if the connection comes from frontend-1.
As I didn't find anything I tried to figure it out myself:
backend backend-1
bind *:80
option <blahblah_option>
timeout server 1d if frontend frontend-1
This syntax does not work, and I am mentioning it to let understand what I am trying to achieve.
This is not doable yet in HAProxy.
Later, you will be able to set timeouts using tcp-request and http-request rules.
What we usually do to workaround this for now, is that we setup 2 backends using the same parameters, but different timeout servers.
This is useful when a few urls only deserve a long server timeout.
Edit followup your comment about multiple health checks:
Well, that's why the server's 'track' directive exists:
backend my_app
server srv1 10.0.0.1:80 check
backend my_app_longtime
server srv1 10.0.0.1:80 track my_app/srv1
In the conf above, the server in my_app_longtime backend won't be checked. That said, it will follow up the same state than srv1 in the backend my_app.
Baptiste
Baptiste
I did it like this and it worked. It made it possible to extend timeout on specific app urls, which are more time consuming. Used that trace health check - thanks Babtiste.
frontend www-http
bind 10.0.0.1:80
default_backend app
acl long_url path_beg -i /long_url
use_backend app-extended if long_url
backend app
server web-1 10.0.0.2:80 check
backend app-extended
server web-1 10.0.0.2:80 trace app/web-1
timeout server 10m

HAProxy - Request getting Broadcast to every server

I am hosting two different application versions on same servers on different ports. In basic version i expect that following configuration should send request in RoundRobin fashion to different ports. But what i am observing is the request is getting broadcasted to ALL of my server endpoints. Meaning in below example my main request to port 8080 gets FWD to both www.myappdemo.com:5001 and www.myappdemo.com:5002... although the response send by proxy is ALWAYS from www.myappdemo.com:5001.
Can anyone tell what is wrong here?
global
debug
maxconn 256
defaults
mode http
timeout connect 5000ms
timeout client 50000ms
timeout server 50000ms
frontend http-in
bind *:8080
default_backend servers
backend servers
balance roundrobin
server svr_50301 www.myappdemo.com:5001 maxconn 32 check
server svr_50302 www.myappdemo.com:5002 maxconn 32 check
i can advise you to enable logs and web interface, after that you can provide us more logs and you can check in web interface also if haproxy detects you second server(svr_50302) to be alive.
Reference to HAProxy 1.5 Doc's :
Web Interface - http://cbonte.github.io/haproxy-dconv/configuration-1.5.html#4.2-stats%20admin
Good info how to enable login - http://webdevwonders.com/haproxy-load-balancer-setup-including-logging-on-debian/
Best Regards,
Dani

HAProxy random HTTP 503 errors

We've setup 3 servers:
Server A with Nginx + HAproxy to perform load balancing
backend server B
backend server C
Here is our /etc/haproxy/haproxy.cfg:
global
log /dev/log local0
log 127.0.0.1 local1 notice
maxconn 40096
user haproxy
group haproxy
daemon
defaults
log global
mode http
option httplog
option dontlognull
retries 3
option redispatch
maxconn 2000
contimeout 50000
clitimeout 50000
srvtimeout 50000
stats enable
stats uri /lb?stats
stats realm Haproxy\ Statistics
stats auth admin:admin
listen statslb :5054 # choose different names for the 2 nodes
mode http
stats enable
stats hide-version
stats realm Haproxy\ Statistics
stats uri /
stats auth admin:admin
listen Server-A 0.0.0.0:80
mode http
balance roundrobin
cookie JSESSIONID prefix
option httpchk HEAD /check.txt HTTP/1.0
server Server-B <server.ip>:80 cookie app1inst2 check inter 1000 rise 2 fall 2
server Server-C <server.ip>:80 cookie app1inst2 check inter 1000 rise 2 fall 3
All of the three servers have a good amount of RAM and CPU cores to handle requests
Random HTTP 503 errors are shown when browsing: 503 Service Unavailable - No server is available to handle this request.
And also on server's console:
Message from syslogd#server-a at Dec 21 18:27:20 ...
haproxy[1650]: proxy Server-A has no server available!
Note that 90% times of the time there is no errors. These errors happens randomly.
I had the same issue. After days of pulling my hair out I found the issue.
I had two HAProxy instances running. One was a zombie that somehow never got killed during maybe an update or a haproxy restart. I noticed this when refreshing the /haproxy stats page and the PID would change between two different numbers. The page with one of the numbers had absurd connection stats. To confirm I did
netstat -tulpn | grep 80
Or
sudo lsof -i:80
and saw two haproxy processes listening to port 80.
To fix the issue I did a "kill xxxx" where xxxx is the pid with the suspicious statistics.
Adding my answer here for anyone else who encounters this exact same problem but none of the listed solutions above are applicable. Please note that my answer does not apply to the original code listed above.
For anyone else who may have this problem, check your config and see if you might have mistakenly put the same "bind" line in multiple sections of your config. Haproxy does not check this during startup, and I plan to submit this as a recommended validation check to the developers. In my case, I have 3 different sections of the config, and I mistakenly put the same IP binding in two different places. It was about a 50/50 shot on whether or not the correct section would be used or the incorrect section was used. Even when the correct section was used, about half of the requests still got a 503.
It is possible your servers share, perhaps, a common resource that is timing out at certain times, and that your health check requests are being made at the same time (and thus pulling the backend servers out at the same time).
You can try using the HAProxy option spread-checks to randomize health checks.
I had the same issue, due to 2 HAProxy services running in the linux box, but with different name/pid/resources. Unless i stop the unwanted one, the required instances throws 503 error randomly, say 1 in 5 times.
Was trying to use single linux box for multiple URL routing but looks a limitation in haproxy or the config file of haproxy i have defined.
Hard to say without more details, but is it possible you are exceeding the configured maxconn for each backend? The Stats UI shows these stats on both the frontend and on individual backends.
I resolved my intermittent 503s with HAProxy by adding option http-server-close to backend. Looks like uWSGI (which is upstream) is not doing well with keep-alive. Not sure what's really behind the problem, but after adding this option, haven't seen single 503 since.
don't use the "bind" line in multiple sections of your haproxy.cfg
for example, this would be wrong
frontend stats
bind *:443 ssl crt /etc/ssl/certs/your.pem
frontend Main
bind *:443 ssl crt /etc/ssl/certs/your.pem
fix like this
frontend stats
bind *:8443 ssl crt /etc/ssl/certs/your.pem
frontend Main
bind *:443 ssl crt /etc/ssl/certs/your.pem