HAProxy random HTTP 503 errors - haproxy

We've setup 3 servers:
Server A with Nginx + HAproxy to perform load balancing
backend server B
backend server C
Here is our /etc/haproxy/haproxy.cfg:
global
log /dev/log local0
log 127.0.0.1 local1 notice
maxconn 40096
user haproxy
group haproxy
daemon
defaults
log global
mode http
option httplog
option dontlognull
retries 3
option redispatch
maxconn 2000
contimeout 50000
clitimeout 50000
srvtimeout 50000
stats enable
stats uri /lb?stats
stats realm Haproxy\ Statistics
stats auth admin:admin
listen statslb :5054 # choose different names for the 2 nodes
mode http
stats enable
stats hide-version
stats realm Haproxy\ Statistics
stats uri /
stats auth admin:admin
listen Server-A 0.0.0.0:80
mode http
balance roundrobin
cookie JSESSIONID prefix
option httpchk HEAD /check.txt HTTP/1.0
server Server-B <server.ip>:80 cookie app1inst2 check inter 1000 rise 2 fall 2
server Server-C <server.ip>:80 cookie app1inst2 check inter 1000 rise 2 fall 3
All of the three servers have a good amount of RAM and CPU cores to handle requests
Random HTTP 503 errors are shown when browsing: 503 Service Unavailable - No server is available to handle this request.
And also on server's console:
Message from syslogd#server-a at Dec 21 18:27:20 ...
haproxy[1650]: proxy Server-A has no server available!
Note that 90% times of the time there is no errors. These errors happens randomly.

I had the same issue. After days of pulling my hair out I found the issue.
I had two HAProxy instances running. One was a zombie that somehow never got killed during maybe an update or a haproxy restart. I noticed this when refreshing the /haproxy stats page and the PID would change between two different numbers. The page with one of the numbers had absurd connection stats. To confirm I did
netstat -tulpn | grep 80
Or
sudo lsof -i:80
and saw two haproxy processes listening to port 80.
To fix the issue I did a "kill xxxx" where xxxx is the pid with the suspicious statistics.

Adding my answer here for anyone else who encounters this exact same problem but none of the listed solutions above are applicable. Please note that my answer does not apply to the original code listed above.
For anyone else who may have this problem, check your config and see if you might have mistakenly put the same "bind" line in multiple sections of your config. Haproxy does not check this during startup, and I plan to submit this as a recommended validation check to the developers. In my case, I have 3 different sections of the config, and I mistakenly put the same IP binding in two different places. It was about a 50/50 shot on whether or not the correct section would be used or the incorrect section was used. Even when the correct section was used, about half of the requests still got a 503.

It is possible your servers share, perhaps, a common resource that is timing out at certain times, and that your health check requests are being made at the same time (and thus pulling the backend servers out at the same time).
You can try using the HAProxy option spread-checks to randomize health checks.

I had the same issue, due to 2 HAProxy services running in the linux box, but with different name/pid/resources. Unless i stop the unwanted one, the required instances throws 503 error randomly, say 1 in 5 times.
Was trying to use single linux box for multiple URL routing but looks a limitation in haproxy or the config file of haproxy i have defined.

Hard to say without more details, but is it possible you are exceeding the configured maxconn for each backend? The Stats UI shows these stats on both the frontend and on individual backends.

I resolved my intermittent 503s with HAProxy by adding option http-server-close to backend. Looks like uWSGI (which is upstream) is not doing well with keep-alive. Not sure what's really behind the problem, but after adding this option, haven't seen single 503 since.

don't use the "bind" line in multiple sections of your haproxy.cfg
for example, this would be wrong
frontend stats
bind *:443 ssl crt /etc/ssl/certs/your.pem
frontend Main
bind *:443 ssl crt /etc/ssl/certs/your.pem
fix like this
frontend stats
bind *:8443 ssl crt /etc/ssl/certs/your.pem
frontend Main
bind *:443 ssl crt /etc/ssl/certs/your.pem

Related

HAProxy: Random Layer4 timeouts in when using httpchk

I'm currently investigating an issue of random "Layer4" timeouts being reported by health checks used in HAPRoxy. The actual backend server being checked is proven to be up and responding at the time of these errors, as other trafic to the server is flowing through.
Thus making me suspect there might be issues caused by our configuration.
Server health is currently configured as follow:
option httpchk GET /health HTTP/1.1\r\nHost:\ Haproxy\r\nConnection:\ close
http-check expect string OK
server server1 server1.internal.example.com check check-ssl port 443 verify none inter 3s fall 2 backup
Trying to understand the docs, I see the "http-check connect" and "linger" options being mentioned. Would the "connect" directive make any actual difference to how the connection for the health check is set up compared to our current conf?
Any other feedback/observartions on the above config is welcome.

Is it possible in haproxy to have sticky sessions based on cookie and still load balance?

So if this is the backend config:
backend main
mode http
balance leastconn
cookie serverid insert indirect nocache
stick-table type string len 36 size 1m expire 8h
stick on cookie(JSESSIONID)
option httpchk HEAD /web1 HTTP/1.0
http-check expect ! rstatus ^5
server monintdevweb 10.333.33.33:443 check cookie check ssl verify none #web1
server monintdevweb2 10.222.22.122:443 check cookie check ssl verify none #web2
server localmaint 10.100.00.105:9042 backup #maint
option log-health-checks
option redispatch
timeout connect 1s
timeout queue 5s
timeout server 3600s
It seems it always sends all users to web1. i.e. its not evenly load balancing using leastconn algorithm specified. I tried with stick table using src IP and that does what i want - ie persist sessions but each new session should get balanced between servers. Is it not possible to have that using cookies?
Another problem with cookies i noticed was that if I were to bring down services on the web1, all users get redirected to web2 and then redirected back to web1 when web1 is restored!! What real life scenario would find this behavior be useful?
After scouring the internet for a whole day and going thru every link that talks about load balancing and sticky sessions, I found the answer immediately after i posted the question. I need to use the "application ID" which will help load balancer differentiate between each user session so it can continue to load balance requests. I am not using IIS/asp.net so thats why it didn't hit me earlier. But here is the config that works..
change these lines..
cookie serverid insert indirect nocache
stick-table type string len 36 size 1m expire 8h
stick on cookie(JSESSIONID)
to:
stick-table type string len 36 size 1m expire 8h
stick on cookie(DWRSESSION)
where DWRSESSION is the my application session ID.

How to configure haproxy to use a different backend for each request

I have an Haproxy 1.5.4. I would like to configure the haproxy to use a different backend for each request. This way , I want to ensure that a diffeent backend is used for each request. I curently use the following config:
global
daemon
maxconn 500000
nbproc 2
log 127.0.0.1 local0 info
defaults
mode tcp
timeout connect 50000ms
timeout client 500000ms
timeout server 500000ms
timeout check 5s
timeout tunnel 50000ms
option redispatch
listen httptat *:3310
mode http
stats enable
stats refresh 5s
stats uri /httpstat
stats realm HTTPS proxy stats
stats auth https:xxxxxxxxxxx
listen HTTPS *:5008
mode tcp
#maxconn 50000
balance leastconn
server backend1 xxx.xxx.xxx.xxx:125 check
server backend1 xxx.xxx.xxx.xxx:126 check
server backend1 xxx.xxx.xxx.xxx:127 check
server backend1 xxx.xxx.xxx.xxx:128 check
server backend1 xxx.xxx.xxx.xxx:129 check
server backend1 xxx.xxx.xxx.xxx:130 check
......
simply change the balance setting from leastconn to roundrobin
from the haproxy manual for 1.5 :
roundrobin Each server is used in turns, according to their weights.
This is the smoothest and fairest algorithm when the server's
processing time remains equally distributed. This algorithm
is dynamic, which means that server weights may be adjusted
on the fly for slow starts for instance. It is limited by
design to 4095 active servers per backend. Note that in some
large farms, when a server becomes up after having been down
for a very short time, it may sometimes take a few hundreds
requests for it to be re-integrated into the farm and start
receiving traffic. This is normal, though very rare. It is
indicated here in case you would have the chance to observe
it, so that you don't worry.
https://cbonte.github.io/haproxy-dconv/1.5/configuration.html#4-balance

haproxy: set timeout if <condition>

The question seems to be quite straight and easy, however I have not been able to find a proper answer.
In haproxy I have 1 backend, say:
backend-1
and 2 frontends, say:
frontend-1
frontend-2
In the backend stanza I want to set a "timeout server" parameter, but, only if the connection comes from frontend-1.
As I didn't find anything I tried to figure it out myself:
backend backend-1
bind *:80
option <blahblah_option>
timeout server 1d if frontend frontend-1
This syntax does not work, and I am mentioning it to let understand what I am trying to achieve.
This is not doable yet in HAProxy.
Later, you will be able to set timeouts using tcp-request and http-request rules.
What we usually do to workaround this for now, is that we setup 2 backends using the same parameters, but different timeout servers.
This is useful when a few urls only deserve a long server timeout.
Edit followup your comment about multiple health checks:
Well, that's why the server's 'track' directive exists:
backend my_app
server srv1 10.0.0.1:80 check
backend my_app_longtime
server srv1 10.0.0.1:80 track my_app/srv1
In the conf above, the server in my_app_longtime backend won't be checked. That said, it will follow up the same state than srv1 in the backend my_app.
Baptiste
Baptiste
I did it like this and it worked. It made it possible to extend timeout on specific app urls, which are more time consuming. Used that trace health check - thanks Babtiste.
frontend www-http
bind 10.0.0.1:80
default_backend app
acl long_url path_beg -i /long_url
use_backend app-extended if long_url
backend app
server web-1 10.0.0.2:80 check
backend app-extended
server web-1 10.0.0.2:80 trace app/web-1
timeout server 10m

Combine HAProxy stats?

I have two instances of HAProxy. Both instances have stats enabled and are working fine.
I am trying to combine the stats from both instances into one so that I can use a single HAProxy to view the front/backends stats. I've tried to have the stats listener on the same port for both haproxy instances but this isn't working. I've tried using the sockets interface but this only reports on one of the interfaces as well.
Any ideas?
My one haproxy config file looks like this:
global
daemon
maxconn 256
log 127.0.0.1 local0 debug
log-tag haproxy
stats socket /tmp/haproxy
defaults
log global
mode http
timeout connect 5000ms
timeout client 50000ms
timeout server 50000ms
frontend http-in
bind *:8000
default_backend servers
log global
option httplog clf
backend servers
balance roundrobin
server ws8001 localhost:8001
server ws8002 localhost:8002
log global
listen admin
bind *:7000
stats enable
stats uri /
The other haproxy config is the same except the front/backend server IPs are different.
While perhaps not an exact answer to this specific question, I've seen this kind of question enough that I think it deserves to be answered.
When running with nbproc greater than 1, the Stack Exchange guys have a unique solution. They have a listen section that receives SSL traffic and then uses send-proxy to 127.0.0.1:80. They then have a frontend that binds to 127.0.0.1:80 like this: bind 127.0.0.1:80 accept-proxy. Inside of that frontend they then bind that frontend, e.g. bind-process 1 and in the globals section the do the following:
global
stats socket /var/run/haproxy-t1.stat level admin
stats bind-process 1
The advantage of this is that they get multiple cores for SSL offloading and then a single core dedicated to load balancing traffic. All traffic ultimately flows through this frontend and therefore they can accurately measure stats from that frontend.
This can't work. Haproxy keeps stats separated in each process. It has no capabilities to combine stats of multiple processes.
That said, you are of course free to use external monitoring tools like (munin, graphite or even nagios) which can aggregate the CSV data from multiple stats sockets and display them in unified graphs. These tools are however out-of-scope of core haproxy.