Haproxy agent-check DRAIN(agent) status still accepts new connections - haproxy

I am trying to perform a load balancing for my Node.js application which uses websockets. I need haproxy to stop load balance new connections on a server, which has reached its maximum number of connections, keeping the existing ones intact in the same time.
I do this by performing an agent-check for each of my servers. If a server cannot accept new connections it responds with "drain" to an agent-check. If a server is able to respond to a new connection, it responds with "ready" to an agent-check.
Here is my haproxy.cfg configuration file:
global
daemon
maxconn 240000
log /dev/log local0 debug
log /dev/log local1 notice
tune.ssl.default-dh-param 2048
defaults
mode http
log global
option httplog
option dontlognull
option dontlog-normal
option http-server-close
option redispatch
timeout connect 20000ms
timeout http-request 1m
timeout client 2100000ms
timeout server 2100000ms
timeout queue 30s
timeout check 5s
timeout http-keep-alive 180s
timeout tunnel 3600s
timeout tarpit 60s
frontend stats
bind *:8084
stats enable
stats uri /stats
stats refresh 10s
stats admin if TRUE
frontend test
mode http
bind *:5000
default_backend ws
backend ws
mode http
fullconn 100000
balance roundrobin
cookie SERVERID insert indirect nocache
server 1 backend1:9999 check agent-check agent-port 8080 cookie 1 inter 500 fall 1 rise 2
And here is how I respond to haproxy agent-check in my Node.js app:
const healthCheckServer = net.createServer((c) => {
let data = '';
if (currentConn < MAX_CONN) {
data += 'ready';
} else {
data += 'drain';
}
c.write(data + '\r\n');
c.destroy();
});
healthCheckServer.listen(8080, '0.0.0.0');
When number of connections to my application is reaching its maximum, haproxy correctly changes server status to DRAIN (agent) (I can observe this in haproxy web dashboard). The problem is, that new connections are still accepted by the application.
I am new to haproxy, so can someone point me to where I am wrong?

Found out that when server is drained by agent (status is set to DRAIN (agent)) and if server is the only one existing in the backend it will still accept new connections.
When there are multiple servers present and each server is drained the behavior is just like expected: haproxy returns HTTP 503.
UPDATE:
Turned out that I was looking in the wrong direction all the time.
First I had to mark my backend which processes WebSocket connections as a non-http (remove mode http line). My guess is haproxy incorrectly counts current sessions for http backend when using WebSockets. Removing mode http solved my problem.
Second, returning maxconn:<conn> in agent-check looks like much simple and more idiomatic way to limit the number of concurrent connections.
Sources:
https://cbonte.github.io/haproxy-dconv/1.8/configuration.html#5.2-agent-check
https://www.haproxy.com/blog/websockets-load-balancing-with-haproxy/
I hope this will help someone.

Related

haproxy close connecton after 1 minute

this config
frontend https_frontend
bind *:4055
mode tcp
maxconn 8192
use_backend https_web
backend https_web
mode tcp
balance roundrobin
option http-keep-alive
server haproxy2 xxx.xxx.xxx.xxx:4055 send-proxy-v2
new connection send keep-alive packets every 30 seconds. but connection drop after 1 minute
I think this is because you're using mode tcp, but option http-keep-alive is a mode http option. In this case, it would most likely be using whatever value you have for timeout client or timeout server before dropping the connection.
For more details about option http-keep-alive and mode http, see:
https://www.haproxy.com/documentation/aloha/7-5/traffic-management/lb-layer7/http-modes/#http-modes-in-haproxy
frontend https_frontend
bind *:4055
mode tcp
maxconn 8192
use_backend https_web
backend https_web
mode tcp
balance roundrobin
timeout client 600000
timeout server 600000
server haproxy2 147.78.65.172:4055 send-proxy-v2
now i send keep-alive packets and real data every 30 seconds
but steel drop after 2 minutes
its not http/https query. its sample tcp communication with rand data. maybe it problem?

HAProxy environment refusing to connect

I have an installation with 2 webservices behind a load balancer with HAProxy. While on service run by 3 servers responds quite fine, the other service with just one server doesn't.
So basically here's what should happen:
loadbalancer --> rancherPlatformAdministration if certain url is used
loadbalancer --> rancherServices for all other requests
Here's my haproxy.cfg:
#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
log 127.0.0.1 local2
chroot /var/lib/haproxy
pidfile /var/run/haproxy.pid
maxconn 4000
user haproxy
group haproxy
daemon
# turn on stats unix socket
stats socket /var/lib/haproxy/stats
#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
mode http
log global
option httplog
option dontlognull
option http-server-close
option forwardfor except 127.0.0.0/8
option redispatch
retries 3
timeout http-request 10s
timeout queue 1m
timeout connect 10s
timeout client 1m
timeout server 1m
timeout http-keep-alive 10s
timeout check 10s
maxconn 3000
frontend http-in
bind *:80
# Define hosts
acl host_rancherAdmin hdr(host) -i admin.mydomain.tech
use_backend rancherPlatformAdministration if host_rancherAdmin
default_backend rancherServices
backend rancherServices
balance roundrobin
server rancherserver91 192.168.20.91:8080 check
server rancherserver92 192.168.20.92:8080 check
server rancherserver93 192.168.20.93:8080 check
backend rancherPlatformAdministration
server rancherapi01 192.168.20.20:8081 check
wget --server-response foo.mydomain.tech answers with a 401 which is respected behaviour as I am not providing a username nor a password. I can also open up foo.mydomain.tech with my browser an log in. So this part works as I said before.
wget --server-response 192.168.20.20:8081 (yes, this Tomcat really is running under 8081) locally from the loadbalancer responds with 200 and thus works just fine, while trying wget --server-response admin.mydomain.tech results in the following:
--2018-06-10 20:51:56-- http://admin.mydomain.tech/
Aufl"osen des Hostnamens admin.mydomain.tech (admin.mydomain.tech)... <PUBLIC IP>
Verbindungsaufbau zu admin.mydomain.tech (admin.mydomain.tech)|<PUBLIC IP>|:80 ... verbunden.
HTTP-Anforderung gesendet, auf Antwort wird gewartet ...
HTTP/1.0 503 Service Unavailable
Cache-Control: no-cache
Connection: close
Content-Type: text/html
2018-06-10 20:51:56 FEHLER 503: Service Unavailable.
I am pretty sure I am missing something here; I am aware of the differences in forwarding the request as a layer 4 or a layer 7 request – which seems to work just fine. I am providing mode http so I am on layer7...
Any hints on what's happening here or on how I can debug this?
Turns out that in my case the selinux was the showstopper – after putting it to permissive mode by setenforce 0, it just worked...
Since this change is not restart-persistent, I had to follow the instructions found here: https://www.tecmint.com/disable-selinux-temporarily-permanently-in-centos-rhel-fedora/

Why does HAProxy show that a server's check URL is a 404 when running curl on this URL is successful?

I'm setting up HAProxy to load-balance a resource between 3 back-ends. Here is the HAProxy config : (In the following snippets I replaced the actual domain name by example.net)
global
log 127.0.0.1 local2
log-send-hostname
maxconn 2000
pidfile /var/run/haproxy.pid
stats socket /var/run/haproxy.sock mode 600 level admin
stats timeout 30s
daemon
# SSL ciphers
...
defaults
mode http
option forwardfor
option contstats
option http-server-close
option log-health-checks
option redispatch
timeout connect 5000
timeout client 10000
timeout server 10000
...
frontend front
bind *:443 ssl crt /usr/local/etc/haproxy/front.pem
reqadd X-Forwarded-Proto:\ https if { ssl_fc }
stats uri /haproxy?stats
option httpclose
option forwardfor
default_backend back
balance source
backend back
balance roundrobin
option httpchk GET /healthcheck HTTP/1.0
server server1 xxx.xxx.xxx.xxx:80 check inter 5s fall 2 rise 1
server server2 yyy.yyy.yyy.yyy:8003 check backup
server mysite example.net:80 check backup
The issue is the following: even though the first 2 servers respond correctly, the domain-based one always shows as a 404:
What is counter-intuitive to me is that if I use curl to access this same healthcheck, I get an HTTP 200 (like I would expect to see in the HAProxy stats) :
curl -I http://example.net/healthcheck
HTTP/1.1 200 OK
When I ping my site, I get:
# ping example.net
PING example.net (217.160.0.195) 56(84) bytes of data.
64 bytes from 217-160-0-195.elastic-ssl.ui-r.com (217.160.0.195): icmp_seq=1 ttl=50 time=45.7 ms
Is it because the IP of my domain is shared with other domains (1&1 shared hosting) that HAProxy can't access it? Why is that and how to make HAProxy reach it correctly?

jBoss thread count increaed after upgrading haproxy from 1.5dev21 to 1.5.1

I upgraded my haproxy from 1.5dev21 to 1.5.1 stable version with same configuartion. At the backend, I am using jBoss.
As soon as we upgraded, I encountered serious issue regarding jBoss thread counts. It has been increased tremendously.
After rollback to 1.5dev21, everything works fine.
Please find my below configuration file of haproxy. Kindly suggest any changes required to migrate/upgrade to 1.5.1
global
daemon
maxconn 20000
defaults
mode http
timeout connect 15000ms
timeout client 50000ms
timeout server 50000ms
timeout queue 60s
stats enable
stats refresh 5s
backend backend_http
mode http
cookie JSESSIONID prefix
balance leastconn
option forceclose
option persist
option redispatch
option forwardfor
server server3 192.168.58.211:80 cookie server3_cokkie maxconn 1024 check
server server4 192.168.58.212:80 cookie server4_cookie maxconn 1024 check
acl force_sticky_server3 hdr_sub(server3_cookie) TEST=true
force-persist if force_sticky_server3
acl force_sticky_server4 hdr_sub(server4_cookie) TEST=true
force-persist if force_sticky_server4
rspidel ^Server:.*
rspidel ^X-Powered-By:.*
rspidel ^AMF-Ver:.*
listen frontend_http *:80
mode http
maxconn 20000
default_backend backend_http
listen frontend_https
mode http
maxconn 20000
bind *:443 ssl crt /opt/haproxy-ssl/conf/ssl/testsite.pem
reqadd X-Forwarded-Proto:\ https
reqadd X-Forwarded-Protocol:\ https
reqadd X-Forwarded-Port:\ 443
reqadd X-Forwarded-SSL:\ on
acl valid_domains hdr_end(host) -i gateway.testsite.com www.testsite.com m.testsite.com
redirect scheme http if !valid_domains
default_backend backend_http if valid_domains
Found this on the haproxy manual, may be of help:
Option "http-tunnel" disables any HTTP processing past the first request and
the first response. This is the mode which was used by default in versions
1.0 to 1.5-dev21. It is the mode with the lowest processing overhead, which
is normally not needed anymore unless in very specific cases such as when
using an in-house protocol that looks like HTTP but is not compatible, or
just to log one request per client in order to reduce log size. Note that
everything which works at the HTTP level, including header parsing/addition,
cookie processing or content switching will only work for the first request
and will be ignored after the first response.

Haproxy 503 Service Unavailable . No server is available to handle this request

How does haproxy deal with static file , like .css, .js, .jpeg ? When I use my configure file , my brower says :
503 Service Unavailable
No server is available to handle this request.
This my config :
global
daemon
group root
maxconn 4000
pidfile /var/run/haproxy.pid
user root
defaults
log global
option redispatch
maxconn 65535
contimeout 5000
clitimeout 50000
srvtimeout 50000
retries 3
log 127.0.0.1 local3
timeout http-request 10s
timeout queue 1m
timeout connect 10s
timeout client 1m
timeout server 1m
timeout check 10s
listen dashboard_cluster :8888
mode http
stats refresh 5s
balance roundrobin
option httpclose
option tcplog
#stats realm Haproxy \ statistic
acl url_static path_beg -i /static
acl url_static path_end -i .css .jpg .jpeg .gif .png .js
use_backend static_server if url_static
backend static_server
mode http
balance roundrobin
option httpclose
option tcplog
stats realm Haproxy \ statistic
server controller1 10.0.3.139:80 cookie controller1 check inter 2000 rise 2 fall 5
server controller2 10.0.3.113:80 cookie controller2 check inter 2000 rise 2 fall 5
Does my file wrong ? What should I do to solve this problem ? ths !
What I think is the cause:
There was no default_backend defined. 503 will be sent by HAProxy---this will appear as NOSRV in the logs.
Another Possible Cause
Based on one of my experiences, the HTTP 503 error I receive was due to my 2 bindings I have for the same IP and port x.x.x.x:80.
frontend test_fe
bind x.x.x.x:80
bind x.x.x.x:443 ssl blah
# more config here
frontend conflicting_fe
bind x.x.x.x:80
# more config here
Haproxy configuration check does not warn you about it and netstat doesn't show you 2 LISTEN entries, that's why it took a while to realize what's going on.
This can also happen if you have 2 haproxy services running. Please check the running processes and terminate the older one.
Try making the timers bigger and check that the server is reachable.
From the HAproxy docs:
It can happen from many reasons:
The status code is always 3-digit. The first digit indicates a general status :
- 1xx = informational message to be skipped (eg: 100, 101)
- 2xx = OK, content is following (eg: 200, 206)
- 3xx = OK, no content following (eg: 302, 304)
- 4xx = error caused by the client (eg: 401, 403, 404)
- 5xx = error caused by the server (eg: 500, 502, 503)
503 when no server was available to handle the request, or in response to
monitoring requests which match the "monitor fail" condition
When a server's maxconn is reached, connections are left pending in a queue
which may be server-specific or global to the backend. In order not to wait
indefinitely, a timeout is applied to requests pending in the queue. If the
timeout is reached, it is considered that the request will almost never be
served, so it is dropped and a 503 error is returned to the client.
if you see SC in the logs:
SC The server or an equipment between it and haproxy explicitly refused
the TCP connection (the proxy received a TCP RST or an ICMP message
in return). Under some circumstances, it can also be the network
stack telling the proxy that the server is unreachable (eg: no route,
or no ARP response on local network). When this happens in HTTP mode,
the status code is likely a 502 or 503 here.
Check ACLs, check timeouts... and check the logs, that's the most important...