Difference between global maxconn and server maxconn in HAProxy

I have a question about my haproxy config:
#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
    log 127.0.0.1 syslog emerg
    maxconn 4000
    quiet
    user haproxy
    group haproxy
    daemon
#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
    mode http
    log global
    option abortonclose
    option dontlognull
    option httpclose
    option httplog
    option forwardfor
    option redispatch
    timeout connect 10000 # default 10-second timeout if a backend is not found
    timeout client 300000 # 5 min timeout for client
    timeout server 300000 # 5 min timeout for server
    stats enable
listen http_proxy localhost:81
    balance roundrobin
    option httpchk GET /empty.html
    server server1 myip:80 maxconn 15 check inter 10000
    server server2 myip:80 maxconn 15 check inter 10000
As you can see it is straightforward, but I am a bit confused about how the maxconn properties work.
There is the global one, and the maxconn on the server, in the listen block. My thinking is this: the global one manages the total number of connections that haproxy, as a service, will queue or process at one time. If the number gets above that, does it kill the connection, or does it pool in some Linux socket? I have no idea what happens if the number gets higher than 4000.
Then you have the server maxconn property set at 15. First off, I set that at 15 because my php-fpm, which this is forwarding to on a separate server, only has so many child processes it can use, so I make sure I am pooling the requests here, instead of in php-fpm, which I think is faster.
But back on the subject, my theory about this number is that each server in this block will only be sent 15 connections at a time, and then the connections will wait for an open server. If I had cookies on, the connections would wait for the CORRECT open server. But I don't.
So the questions are:
What happens if the global connections get above 4000? Do they die? Or pool in Linux somehow?
Are the global connections related to the server connections, other than the fact that you can't have a total number of server connections greater than global?
When figuring out the global connections, shouldn't it be the number of connections added up in the server section, plus a certain percentage for pooling? And obviously you have other constraints on the connections, but really it is how many you want to send to the proxies?
Thank you in advance.

Willy got me an answer by email. I thought I would share it. His answers follow each quoted part of my question below.
I have a question about my haproxy config: (config snipped; it is the same as quoted above)
As you can see it is straightforward, but I am a bit confused about how the maxconn properties work.
There is the global one and the maxconn on the server, in the listen block.
And there is also another one in the listen block which defaults to something
like 2000.
My thinking is this: the global one manages the total number of connections that haproxy, as a service, will queue or process at one time.
Correct. It's the per-process max number of concurrent connections.
If the number gets above that, it either kills the connection, or pools in some Linux socket?
The latter: it simply stops accepting new connections and they remain in the socket queue in the kernel. The number of queueable sockets is determined by the min of net.core.somaxconn, net.ipv4.tcp_max_syn_backlog, and the listen block's maxconn.
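For reference (not from Willy's mail), on Linux you can inspect those two kernel limits with sysctl; the values below are only illustrative, not recommendations:

sysctl net.core.somaxconn net.ipv4.tcp_max_syn_backlog
# raise them (e.g. via /etc/sysctl.conf) if you want the kernel to be able
# to queue more pending connections than the defaults allow:
sysctl -w net.core.somaxconn=4096
sysctl -w net.ipv4.tcp_max_syn_backlog=4096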
I have no idea what happens if the number gets higher than 4000.
The excess connections wait for another one to complete before being
accepted. However, as long as the kernel's queue is not saturated, the
client does not even notice this, as the connection is accepted at the
TCP level but is not processed. So the client only notices some delay
to process the request.
But in practice, the listen block's maxconn is much more important,
since by default it's smaller than the global one. The listen's maxconn
limits the number of connections per listener. In general it's wise to
configure it for the number of connections you want for the service,
and to configure the global maxconn to the max number of connections
you let the haproxy process handle. When you have only one service,
both can be set to the same value. But when you have many services,
you can easily understand it makes a huge difference, as you don't
want a single service to take all the connections and prevent the
other ones from working.
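To illustrate that last point, here is a minimal sketch (the service names, ports, and numbers are made up): the global maxconn caps the whole process, while each listen's maxconn caps one service:

global
    maxconn 4000          # hard limit for the entire haproxy process

listen service_a
    bind :8001
    maxconn 3000          # service A may use at most 3000 of the 4000

listen service_b
    bind :8002
    maxconn 1000          # service B can never starve service A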
Then you have the server maxconn property set at 15. First off, I set that at 15 because my php-fpm, which this is forwarding to on a separate server, only has so many child processes it can use, so I make sure I am pooling the requests here, instead of in php-fpm, which I think is faster.
Yes, not only should it be faster, but it also allows haproxy to find another available server whenever possible, and to kill the request in the queue if the client hits "stop" before the connection is forwarded to the server.
But back on the subject, my theory about this number is that each server in this block will only be sent 15 connections at a time, and then the connections will wait for an open server. If I had cookies on, the connections would wait for the CORRECT open server. But I don't.
That's exactly the principle. There is a per-proxy queue and a per-server queue. Connections with a persistence cookie go to the server queue and other connections go to the proxy queue. However, since in your case no cookie is configured, all connections go to the proxy queue. You can look at the diagram doc/queuing.fig in the haproxy sources if you want; it explains how/where decisions are taken.
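For illustration (a sketch, not Willy's; the cookie name and addresses are hypothetical), a cookie-based setup would look like this, and connections for a persisted client would then wait in that server's queue rather than the proxy queue:

listen http_proxy
    bind :81
    balance roundrobin
    cookie SRVID insert indirect nocache
    server server1 10.0.0.1:80 cookie s1 maxconn 15 check inter 10000
    server server2 10.0.0.2:80 cookie s2 maxconn 15 check inter 10000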
So questions are:
What happens if the global connections get above 4000? Do they die? Or pool in Linux somehow?
They're queued in Linux. Once you overwhelm the kernel's queue, they're dropped in the kernel.
Are the global connections related to the server connections, other than the fact that you can't have a total number of server connections greater than global?
No, the global and server connection settings are independent.
When figuring out the global connections, shouldn't it be the number of connections added up in the server section, plus a certain percentage for pooling? And obviously you have other constraints on the connections, but really it is how many you want to send to the proxies?
You got it right. If your server's response time is short, there is nothing wrong with queueing thousands of connections to serve only a few at a time, because it substantially reduces the request processing time. Practically, establishing a connection nowadays takes about 5 microseconds on a gigabit LAN. So it makes a lot of sense to let haproxy distribute the connections as fast as possible from its queue to a server with a very small maxconn. I remember one gaming site queuing more than 30000 concurrent connections and running with a queue of 30 per server! It was an apache server, and apache is much faster with small numbers of connections than with large numbers. But for this you really need a fast server, because you don't want all your clients queued, waiting for a connection slot because the server is waiting for a database, for instance.
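A practical addition when queueing this aggressively (not from Willy's mail): HAProxy's timeout queue directive bounds how long a request may wait for a free server slot before being rejected with a 503, so a stalled backend doesn't strand every client indefinitely. A minimal sketch, with an illustrative value:

defaults
    timeout queue 30s   # if unset, this defaults to the value of timeout connect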
Also, something which works very well is to dedicate servers. If your site has many static files, you can direct the static requests to a pool of servers (or caches), so that you don't queue static requests in front of the dynamic servers and the static requests don't eat expensive connection slots.
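A hedged sketch of such a split using content switching (the paths and backend names are made up for illustration):

frontend www
    bind :80
    acl is_static path_beg /static /images
    use_backend static_pool if is_static
    default_backend app_pool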
Hoping this helps,
Willy

Related

Understanding keepalive between client and cockroachdb with haproxy

We are facing a problem where our client (let's name it A) is attempting to connect to a DB server (CockroachDB, name it B), load balanced via HAProxy:
A <--> haproxy <--> B
Every now and then, our client A receives a Broken Pipe error, but I'm not able to understand why.
The Cockroach server already uses the default value below, i.e. 60 seconds:
COCKROACH_SQL_TCP_KEEP_ALIVE ## which is enabled to send every 60 seconds
Plus, our haproxy config has the following settings:
defaults
    mode tcp
    # Timeout values should be configured for your specific use.
    # See: https://cbonte.github.io/haproxy-dconv/1.8/configuration.html#4-timeout%20connect
    timeout connect 10s
    timeout client 1m
    timeout server 1m
    # TCP keep-alive on the client side. The server already enables them.
    option clitcpka
So what is causing the TCP connection to drop when keepalive is enabled on every end?
Keepalive is what makes connections go away if one of the endpoints has died without closing the connection. Investigate in that direction.
The only time keepalive actually keeps the connection alive is in combination with an ill-configured firewall that drops idle connections.
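One way to investigate (a sketch; the interface name is an assumption, and 26257 is CockroachDB's default SQL port) is to watch the wire for whichever side sends the reset that produces the broken pipe:

tcpdump -ni eth0 'tcp port 26257 and (tcp[tcpflags] & tcp-rst != 0)'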

How to configure haproxy to use a different backend for each request

I have HAProxy 1.5.4. I would like to configure haproxy so that a different backend is used for each request. I currently use the following config:
global
    daemon
    maxconn 500000
    nbproc 2
    log 127.0.0.1 local0 info
defaults
    mode tcp
    timeout connect 50000ms
    timeout client 500000ms
    timeout server 500000ms
    timeout check 5s
    timeout tunnel 50000ms
    option redispatch
listen httptat *:3310
    mode http
    stats enable
    stats refresh 5s
    stats uri /httpstat
    stats realm HTTPS proxy stats
    stats auth https:xxxxxxxxxxx
listen HTTPS *:5008
    mode tcp
    #maxconn 50000
    balance leastconn
    server backend1 xxx.xxx.xxx.xxx:125 check
    server backend1 xxx.xxx.xxx.xxx:126 check
    server backend1 xxx.xxx.xxx.xxx:127 check
    server backend1 xxx.xxx.xxx.xxx:128 check
    server backend1 xxx.xxx.xxx.xxx:129 check
    server backend1 xxx.xxx.xxx.xxx:130 check
......
Simply change the balance setting from leastconn to roundrobin.
From the haproxy manual for 1.5:
roundrobin Each server is used in turns, according to their weights.
This is the smoothest and fairest algorithm when the server's
processing time remains equally distributed. This algorithm
is dynamic, which means that server weights may be adjusted
on the fly for slow starts for instance. It is limited by
design to 4095 active servers per backend. Note that in some
large farms, when a server becomes up after having been down
for a very short time, it may sometimes take a few hundreds
requests for it to be re-integrated into the farm and start
receiving traffic. This is normal, though very rare. It is
indicated here in case you would have the chance to observe
it, so that you don't worry.
https://cbonte.github.io/haproxy-dconv/1.5/configuration.html#4-balance
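Applied to the config above, it is a one-line change (a sketch; the rest of the section stays as-is):

listen HTTPS *:5008
    mode tcp
    balance roundrobin   # was: balance leastconn
    server backend1 xxx.xxx.xxx.xxx:125 check
    ......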

Combine HAProxy stats?

I have two instances of HAProxy. Both instances have stats enabled and are working fine.
I am trying to combine the stats from both instances into one, so that I can use a single stats page to view the frontend/backend stats. I've tried having the stats listener on the same port for both haproxy instances, but this isn't working. I've tried using the sockets interface, but this only reports on one of the instances as well.
Any ideas?
My one haproxy config file looks like this:
global
    daemon
    maxconn 256
    log 127.0.0.1 local0 debug
    log-tag haproxy
    stats socket /tmp/haproxy
defaults
    log global
    mode http
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms
frontend http-in
    bind *:8000
    default_backend servers
    log global
    option httplog clf
backend servers
    balance roundrobin
    server ws8001 localhost:8001
    server ws8002 localhost:8002
    log global
listen admin
    bind *:7000
    stats enable
    stats uri /
The other haproxy config is the same except the front/backend server IPs are different.
While perhaps not an exact answer to this specific question, I've seen this kind of question enough that I think it deserves to be answered.
When running with nbproc greater than 1, the Stack Exchange guys have a unique solution. They have a listen section that receives SSL traffic and then uses send-proxy to 127.0.0.1:80. They then have a frontend that binds to 127.0.0.1:80 like this: bind 127.0.0.1:80 accept-proxy. Inside that frontend, they pin it to a single process with bind-process 1, and in the global section they do the following:
global
    stats socket /var/run/haproxy-t1.stat level admin
    stats bind-process 1
The advantage of this is that they get multiple cores for SSL offloading and then a single core dedicated to load balancing traffic. All traffic ultimately flows through this frontend and therefore they can accurately measure stats from that frontend.
This can't work. Haproxy keeps stats separate in each process; it has no capability to combine the stats of multiple processes.
That said, you are of course free to use external monitoring tools (like munin, graphite, or even nagios) which can aggregate the CSV data from multiple stats sockets and display it in unified graphs. These tools are, however, out of scope for core haproxy.
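For example, a minimal sketch of pulling the CSV yourself over the stats sockets (show stat is the real runtime command; the first path is /tmp/haproxy from the config above, the second instance's path is an assumption):

echo "show stat" | socat stdio /tmp/haproxy > stats1.csv
echo "show stat" | socat stdio /tmp/haproxy2 > stats2.csv
# merge or graph the two CSV files with whatever tooling you prefer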

Haproxy: Keepalive connections not balanced evenly

We've got a strange little problem we've been experiencing for months now:
The load on our cluster (http, long-lasting keepalive connections with a lot of very short (<100ms) requests) is distributed very unevenly.
All servers are configured the same way, but some connections that push through thousands of requests per second just end up being sent to only one server.
We tried both load balancing strategies, but that does not help.
It seems to be strictly keepalive-related.
The misbehaving backend has the following settings:
option tcpka
option http-pretend-keepalive
Is the option http-server-close made to cover that issue?
If I get it right, it will close and re-open a lot of connections, which means load on the systems? Isn't there a way to keep the connections open but still balance the traffic evenly?
I tried to enable that option, but it kills all of our backends when under load.
HAProxy currently only supports keep-alive HTTP connections towards the client, not the server. If you want to be able to inspect (and balance) each HTTP request, you currently have to use one of the following options:
# enable keepalive to the client
option http-server-close
# or
# disable keepalive completely
option httpclose
The option http-pretend-keepalive doesn't change the actual behavior of HAProxy with regard to connection handling. Instead, it is intended as a workaround for servers which don't work well when they see a non-keepalive connection (as is generated by HAProxy towards the backend server).
Support for keep-alive towards the backend server is scheduled for the final HAProxy 1.5 release, but the actual scope of that might still vary, and the final release date is sometime in the future...
Just FYI, it's present in the latest release 1.5-dev20 (but take the fixes with it, as it shipped with a few regressions).
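For reference, the directive that shipped for this in the 1.5 line is option http-keep-alive; a minimal sketch of enabling it (assuming a 1.5-dev20 or later build):

defaults
    mode http
    option http-keep-alive   # keep-alive on both the client and server sides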

HAProxy random HTTP 503 errors

We've set up 3 servers:
Server A with Nginx + HAproxy to perform load balancing
backend server B
backend server C
Here is our /etc/haproxy/haproxy.cfg:
global
    log /dev/log local0
    log 127.0.0.1 local1 notice
    maxconn 40096
    user haproxy
    group haproxy
    daemon
defaults
    log global
    mode http
    option httplog
    option dontlognull
    retries 3
    option redispatch
    maxconn 2000
    contimeout 50000
    clitimeout 50000
    srvtimeout 50000
    stats enable
    stats uri /lb?stats
    stats realm Haproxy\ Statistics
    stats auth admin:admin
listen statslb :5054 # choose different names for the 2 nodes
    mode http
    stats enable
    stats hide-version
    stats realm Haproxy\ Statistics
    stats uri /
    stats auth admin:admin
listen Server-A 0.0.0.0:80
    mode http
    balance roundrobin
    cookie JSESSIONID prefix
    option httpchk HEAD /check.txt HTTP/1.0
    server Server-B <server.ip>:80 cookie app1inst2 check inter 1000 rise 2 fall 2
    server Server-C <server.ip>:80 cookie app1inst2 check inter 1000 rise 2 fall 3
All three servers have a good amount of RAM and CPU cores to handle requests.
Random HTTP 503 errors are shown when browsing: 503 Service Unavailable - No server is available to handle this request.
And also on the server's console:
Message from syslogd@server-a at Dec 21 18:27:20 ...
haproxy[1650]: proxy Server-A has no server available!
Note that 90% of the time there are no errors; they happen randomly.
I had the same issue. After days of pulling my hair out, I found the cause.
I had two HAProxy instances running. One was a zombie that somehow never got killed, maybe during an update or a haproxy restart. I noticed this when refreshing the /haproxy stats page: the PID would change between two different numbers, and the page with one of the numbers had absurd connection stats. To confirm, I ran
netstat -tulpn | grep 80
or
sudo lsof -i:80
and saw two haproxy processes listening on port 80.
To fix the issue I did a "kill xxxx", where xxxx is the PID with the suspicious statistics.
Adding my answer here for anyone else who encounters this exact same problem but for whom none of the solutions listed above are applicable. Please note that my answer does not apply to the original code listed above.
For anyone else who may have this problem: check your config and see whether you might have mistakenly put the same "bind" line in multiple sections of your config. HAProxy does not check this during startup, and I plan to submit this as a recommended validation check to the developers. In my case, I had 3 different sections in the config, and I mistakenly put the same IP binding in two different places. It was about a 50/50 shot as to whether the correct or the incorrect section would be used. Even when the correct section was used, about half of the requests still got a 503.
It is possible that your servers share a common resource that times out at certain moments, and that your health check requests are all being made at the same time (thus pulling the backend servers out at the same time).
You can try using the HAProxy option spread-checks to randomize the health checks.
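For reference, spread-checks is a global keyword taking a percentage between 0 and 50; a minimal sketch with an illustrative value:

global
    spread-checks 5   # desynchronize health checks by +/- 5% of the interval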
I had the same issue, due to 2 HAProxy services running on the same Linux box, but with different names/PIDs/resources. Until I stopped the unwanted one, the required instance threw 503 errors randomly, say 1 in 5 times.
I was trying to use a single Linux box for multiple URL routing, but it looks like a limitation of haproxy, or of the haproxy config file I had defined.
Hard to say without more details, but is it possible you are exceeding the configured maxconn for each backend? The Stats UI shows these stats on both the frontend and on individual backends.
I resolved my intermittent 503s with HAProxy by adding option http-server-close to the backend. It looks like uWSGI (which is upstream) does not do well with keep-alive. I'm not sure what's really behind the problem, but after adding this option, I haven't seen a single 503 since.
don't use the "bind" line in multiple sections of your haproxy.cfg
for example, this would be wrong
frontend stats
bind *:443 ssl crt /etc/ssl/certs/your.pem
frontend Main
bind *:443 ssl crt /etc/ssl/certs/your.pem
fix like this
frontend stats
bind *:8443 ssl crt /etc/ssl/certs/your.pem
frontend Main
bind *:443 ssl crt /etc/ssl/certs/your.pem