Issue with HAProxy not retrying despite retry-on - haproxy

We were having issues with Apache mod_proxy getting random 502/503 errors from our backend server (which we don't control), so we decided to give HAProxy a shot in testing. We set up HAProxy and got the same errors, so we decided to try retry-on all-retryable-errors, but we keep getting the same errors. We would have expected HAProxy to retry these requests, but that doesn't seem to be happening.
For testing, we run a wget every half second, 10,000 times; out of those 10,000 tries we get about 10 errors.
Could someone look at our setup and logs and help us determine why the retries aren't occurring?
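The test loop is essentially the following (a sketch; the URL is the same one that appears in the logs below):
for i in $(seq 1 10000); do
    # wget exits non-zero on a 5xx response, so failures are easy to count
    wget -q -O /dev/null "http://localhost:5000/PBI_PBI1151/Login/RemoteInitialize/053103585" || echo "request $i failed"
    sleep 0.5
done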
haproxy.cfg
global
    log 127.0.0.1 local2 debug
    chroot /var/lib/haproxy
    pidfile /var/run/haproxy.pid
    maxconn 4000
    user haproxy
    group haproxy
    daemon
    stats socket /var/lib/haproxy/stats

defaults
    mode http
    log global
    option httplog
    option dontlognull
    option http-server-close
    option redispatch
    retries 3
    timeout http-request 10s
    timeout queue 1m
    timeout connect 10s
    timeout client 1m
    timeout server 1m
    timeout http-keep-alive 10s
    timeout check 10s
    maxconn 3000

frontend main
    bind 127.0.0.1:5000
    default_backend app
    mode http

backend app
    balance roundrobin
    http-send-name-header Host
    retry-on all-retryable-errors
    retries 10
    http-request disable-l7-retry if METH_POST
    server srv1 backend-server:443 ssl verify none
    server srv2 backend-server:443 ssl verify none
    server srv3 backend-server:443 ssl verify none
haproxy.log (the 502 error, logged with an SH-- termination state, is in the middle of the log):
Apr 27 19:01:29 localhost haproxy[26058]: 127.0.0.1:58028 [27/Apr/2022:19:01:29.769] main app/srv1 0/0/5/111/116 200 1932 - - ---- 1/1/0/0/0 0/0 "GET /PBI_PBI1151/Login/RemoteInitialize/053103585 HTTP/1.1"
Apr 27 19:01:30 localhost haproxy[12306]: 127.0.0.1:58032 [27/Apr/2022:19:01:30.430] main app/srv2 0/0/2/119/121 200 1932 - - ---- 1/1/0/0/0 0/0 "GET /PBI_PBI1151/Login/RemoteInitialize/053103585 HTTP/1.1"
Apr 27 19:01:31 localhost haproxy[8726]: 127.0.0.1:58036 [27/Apr/2022:19:01:31.099] main app/srv2 0/0/6/114/120 200 1932 - - ---- 1/1/0/0/0 0/0 "GET /PBI_PBI1151/Login/RemoteInitialize/053103585 HTTP/1.1"
Apr 27 19:01:33 localhost haproxy[26058]: 127.0.0.1:58040 [27/Apr/2022:19:01:31.764] main app/srv2 0/0/6/-1/1385 502 209 - - SH-- 1/1/0/0/0 0/0 "GET /PBI_PBI1151/Login/RemoteInitialize/053103585 HTTP/1.1"
Apr 27 19:01:33 localhost haproxy[8726]: 127.0.0.1:58044 [27/Apr/2022:19:01:33.695] main app/srv3 0/0/10/112/122 200 1932 - - ---- 1/1/0/0/0 0/0 "GET /PBI_PBI1151/Login/RemoteInitialize/053103585 HTTP/1.1"
Apr 27 19:01:34 localhost haproxy[26058]: 127.0.0.1:58048 [27/Apr/2022:19:01:34.362] main app/srv3 0/0/3/113/116 200 1932 - - ---- 1/1/0/0/0 0/0 "GET /PBI_PBI1151/Login/RemoteInitialize/053103585 HTTP/1.1"
Apr 27 19:01:45 localhost haproxy[8726]: 127.0.0.1:58052 [27/Apr/2022:19:01:35.023] main app/srv1 0/0/16/10552/10568 200 1932 - - ---- 1/1/0/0/0 0/0 "GET /PBI_PBI1151/Login/RemoteInitialize/053103585 HTTP/1.1"
Output from the wget:
--2022-04-27 19:01:31-- http://localhost:5000/PBI_PBI1151/Login/RemoteInitialize/053103585
Resolving localhost (localhost)... 127.0.0.1
Connecting to localhost (localhost)|127.0.0.1|:5000... connected.
HTTP request sent, awaiting response... 502 Bad Gateway
2022-04-27 19:01:33 ERROR 502: Bad Gateway.

Related

HAProxy for postgresql Load Balancer start error - cannot bind socket

I am setting up PostgreSQL load balancing using HAProxy and I get the error messages below:
Jun 30 07:57:43 vm0 systemd[1]: Starting HAProxy Load Balancer...
Jun 30 07:57:43 vm0 haproxy[15084]: [ALERT] 180/075743 (15084) : Starting proxy ReadWrite: cannot bind socket [0.0.0.0:8081]
Jun 30 07:57:43 vm0 haproxy[15084]: [ALERT] 180/075743 (15084) : Starting proxy ReadOnly: cannot bind socket [0.0.0.0:8082]
Jun 30 07:57:43 vm0 systemd[1]: haproxy.service: Main process exited, code=exited, status=1/FAILURE
Jun 30 07:57:43 vm0 systemd[1]: haproxy.service: Failed with result 'exit-code'.
Jun 30 07:57:43 vm0 systemd[1]: Failed to start HAProxy Load Balancer.
Below is my haproxy.cfg file. I kept checking all the possibilities but couldn't find the reason for the error. I also checked whether the ports were already in use, but no other process is using ports 8081 and 8082 (a quick way to double-check this is sketched after the config).
-- haproxy.cfg
listen ReadWrite
    bind *:8081
    option httpchk
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server pg1 pg1:5432 maxconn 100 check port 23267

listen ReadOnly
    bind *:8082
    option httpchk
    http-check expect status 206
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server pg2 pg1:5432 maxconn 100 check port 23267
    server pg3 pg2:5432 maxconn 100 check port 23267
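A quick way to double-check the ports (and whether an old haproxy instance still holds them), assuming a systemd host with ss available:
# any process already listening on 8081/8082, and who owns it
ss -ltnp | grep -E ':808[12]'
# an haproxy instance left over from an earlier start would also hold the sockets
pgrep -a haproxy
# validate the configuration without starting the service
haproxy -c -f /etc/haproxy/haproxy.cfg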

Haproxy SSL(https) health checks without terminating ssl

I can't figure out a proper way to do the SSL health check. I am not using certificates; I just need to check against an HTTPS website's URL (google.com, for example).
I've tried multiple combinations without success. Maybe someone has a similar configuration.
Backends using:
> check-sni google.com sni ssl_fc_sni
return: reason: Layer7 wrong status, code: 301, info: "Moved Permanently"
With check port 80 check-ssl:
reason: Layer6 invalid response, info: "SSL handshake failure"
All the others just time out. Here's the complete configuration file:
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

    # Default SSL material locations
    ca-base /etc/ssl/certs
    crt-base /etc/ssl/private
    ssl-server-verify none

    # Default ciphers to use on SSL-enabled listening sockets.
    # For more information, see ciphers(1SSL). This list is from:
    # https://hynek.me/articles/hardening-your-web-servers-ssl-ciphers/
    # An alternative list with additional directives can be obtained from
    # https://mozilla.github.io/server-side-tls/ssl-config-generator/?server=haproxy
    ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS
    ssl-default-bind-options no-sslv3

defaults
    log global
    mode http
    option httplog
    option dontlognull
    timeout connect 5000
    timeout client 50000
    timeout server 50000
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

frontend myfront
    bind *:8000
    mode tcp
    tcp-request inspect-delay 5s
    default_backend backend1

listen stats
    bind :444
    stats enable
    stats uri /
    stats hide-version
    stats auth test:test

backend Backends
    balance roundrobin
    option forwardfor
    option httpchk
    http-check send hdr host google.com meth GET uri /
    http-check expect status 200
    #http-check connect
    #http-check send meth GET uri / ver HTTP/1.1 hdr host haproxy.1wt.eu
    #http-check expect status 200-399
    #http-check connect port 443 ssl sni haproxy.1wt.eu
    #http-check send meth GET uri / ver HTTP/1.1 hdr host haproxy.1wt.eu
    #http-check expect status 200-399
    #http-check connect port 443 ssl sni google.com
    #http-check send meth GET uri / ver HTTP/1.1 hdr host google.com
    default-server fall 10 rise 1
    server Node1011 192.168.0.2:1011 check inter 15s check-ssl check port 443
    server Node1012 192.168.0.2:1012 check inter 15s check-ssl check port 443
    server Node1015 192.168.0.2:1015 check inter 15s check port 443
    server Node1017 192.168.0.2:1017 check inter 15s check-ssl check-sni google.com sni ssl_fc_sni
    server Node1018 192.168.0.2:1018 check inter 15s check-ssl check-sni google.com sni ssl_fc_sni
    server Node1019 192.168.0.2:1019 check inter 15s check-sni google.com sni ssl_fc_sni
    server Node1020 192.168.0.2:1020 check inter 15s check port 443 check-ssl
    server Node1021 192.168.0.2:1021 check inter 15s check port 443 check-ssl
    server Node1027 192.168.0.2:1027 check inter 15s check port 80
    server Node1028 192.168.0.2:1028 check inter 15s check port 80
    server Node1029 192.168.0.2:1029 check inter 15s check port 80
    server Node1030 192.168.0.2:1030 check inter 15s check port 80 check-ssl
    server Node1031 192.168.0.2:1031 check inter 15s check port 80 check-ssl
    server Node1033 192.168.0.2:1033 check inter 15s check port 80 check-ssl verify none
    server Node1034 192.168.0.2:1034 check inter 15s check port 80 check-ssl verify none
    server Node1035 192.168.0.2:1035 check inter 15s check-ssl
    server Node1036 192.168.0.2:1036 check inter 15s check-ssl
    server Node1048 192.168.0.2:1048 check inter 15s check-ssl verify none
    server Node1049 192.168.0.2:1049 check inter 15s check-ssl verify none
P.S. I found a website that explains just what I'm trying to do (https://hodari.be/posts/2020_09_04_configure_sni_for_haproxy_backends/), but that doesn't work either; my HAProxy version is 2.2.3.
P.P.S. I am literally trying to check against www.google.com, just to be clear.
Thank you!
That's really not an error. If you curl https://google.com, it does a 301 redirect to https://www.google.com/. I snipped out some protocol details below for brevity, but you get the idea.
Either change your expect to 301, or use www.google.com.
paul:~ $ curl -vv https://google.com
* Rebuilt URL to: https://google.com/
* Trying 172.217.1.206...
-[snip]-
> GET / HTTP/2
> Host: google.com
> User-Agent: curl/7.58.0
> Accept: */*
>
-[snip]-
< HTTP/2 301
< location: https://www.google.com/
< content-type: text/html; charset=UTF-8
< date: Mon, 18 Jan 2021 03:42:04 GMT
< expires: Wed, 17 Feb 2021 03:42:04 GMT
< cache-control: public, max-age=2592000
< server: gws
< content-length: 220
< x-xss-protection: 0
< x-frame-options: SAMEORIGIN
< alt-svc: h3-29=":443"; ma=2592000,h3-T051=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"
<
* TLSv1.3 (IN), TLS Unknown, Unknown (23):
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
here.
</BODY></HTML>
So, if you want to avoid the 301, use the www.google.com value in your config, like this:
http-check send hdr host www.google.com meth GET uri /
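You can see the difference from the command line before touching the health check at all:
# google.com answers with a 301 redirect...
curl -sI https://google.com/ | head -n 1
# ...while www.google.com answers 200, which matches "http-check expect status 200"
curl -sI https://www.google.com/ | head -n 1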

Nginx ingress: upstream connection timeout (Operation timed out)

I have configured nginx-ingress for UDP load balancing. Below is my configuration in the Helm chart.
udp: {
  "514": "default/syslog-service:514",
  "162": "default/trapreceiver-service:162",
  "123": "default/ntp-service:123"
}
And my ConfigMap:
data:
  max-worker-connections: "65535"
  proxy-body-size: 500m
  proxy-connect-timeout: "50"
  proxy-next-upstream-tries: "2"
  proxy-read-timeout: "3600"
  proxy-send-timeout: "120"
The ingress is not able to send UDP packets to the backend because of the error below:
[1000::601] [09/Nov/2020:17:19:57 +0000] UDP 502 0 48 600.000
[1000::601] [09/Nov/2020:17:19:57 +0000] UDP 502 0 48 600.001
2020/11/09 17:19:57 [error] 1166#1166: *868219 upstream timed out (110: Operation timed out) while proxying connection, udp client: 1000::601, server: [::]:123, upstream: "[fc00::fcc1]:123", bytes from/to client:48/0, bytes from/to upstream:0/48
[1000::601] [09/Nov/2020:17:19:57 +0000] UDP 502 0 387 600.001
2020/11/09 17:19:57 [error] 1166#1166: *868221 upstream timed out (110: Operation timed out) while proxying connection, udp client: 1000::601, server: [::]:162, upstream: "[fc00::696e]:162", bytes from/to client:387/0, bytes from/to upstream:0/387

k8s pod readiness probe fails with connection refused, but pod is serving requests just fine

I'm having a hard time understanding why a pod's readiness probe is failing.
Warning Unhealthy 21m (x2 over 21m) kubelet, REDACTED Readiness probe failed: Get http://192.168.209.74:8081/actuator/health: dial tcp 192.168.209.74:8081: connect: connection refused
If I exec into this pod (or in fact into any other I have for that application), I can run a curl against that very URL without issue:
kubectl exec -it REDACTED-l2z5w /bin/bash
$ curl -v http://192.168.209.74:8081/actuator/health
* Expire in 0 ms for 6 (transfer 0x5611b949ff50)
* Trying 192.168.209.74...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x5611b949ff50)
* Connected to 192.168.209.74 (192.168.209.74) port 8081 (#0)
> GET /actuator/health HTTP/1.1
> Host: 192.168.209.74:8081
> User-Agent: curl/7.64.0
> Accept: */*
>
< HTTP/1.1 200
< Set-Cookie: CM_SESSIONID=E62390F0FF8C26D51C767835988AC690; Path=/; HttpOnly
< X-Content-Type-Options: nosniff
< X-XSS-Protection: 1; mode=block
< Cache-Control: no-cache, no-store, max-age=0, must-revalidate
< Pragma: no-cache
< Expires: 0
< X-Frame-Options: DENY
< Content-Type: application/vnd.spring-boot.actuator.v3+json
< Transfer-Encoding: chunked
< Date: Tue, 02 Jun 2020 15:07:21 GMT
<
* Connection #0 to host 192.168.209.74 left intact
{"status":"UP",...REDACTED..}
I'm getting this behavior both from a Docker-for-Desktop k8s cluster on my Mac and from an OpenShift cluster.
The readiness probe is shown like this in kubectl describe:
Readiness: http-get http://:8081/actuator/health delay=20s timeout=3s period=5s #success=1 #failure=10
The helm chart has this to configure it:
readinessProbe:
  failureThreshold: 10
  httpGet:
    path: /actuator/health
    port: 8081
    scheme: HTTP
  initialDelaySeconds: 20
  periodSeconds: 5
  successThreshold: 1
  timeoutSeconds: 3
I cannot fully rule out that HTTP proxy settings are to blame, but the k8s docs say that HTTP_PROXY is ignored for checks since v1.13, so it shouldn't happen locally.
The OpenShift k8s version is 1.11, my local one is 1.16.
Describing a resource always shows the last events recorded for it. The thing is that the last event logged here was a failed readinessProbe check.
I tested it in my lab with the following pod manifest:
apiVersion: v1
kind: Pod
metadata:
  name: readiness-exec
spec:
  containers:
  - name: readiness
    image: k8s.gcr.io/busybox
    args:
    - /bin/sh
    - -c
    - sleep 30; touch /tmp/healthy; sleep 600
    readinessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
As you can see, the file /tmp/healthy is created in the pod after 30 seconds, while the readinessProbe starts checking for it after 5 seconds and then repeats the check every 5 seconds.
Describing this pod will give me that:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 7m56s default-scheduler Successfully assigned default/readiness-exec to yaki-118-2
Normal Pulling 7m55s kubelet, yaki-118-2 Pulling image "k8s.gcr.io/busybox"
Normal Pulled 7m55s kubelet, yaki-118-2 Successfully pulled image "k8s.gcr.io/busybox"
Normal Created 7m55s kubelet, yaki-118-2 Created container readiness
Normal Started 7m55s kubelet, yaki-118-2 Started container readiness
Warning Unhealthy 7m25s (x6 over 7m50s) kubelet, yaki-118-2 Readiness probe failed: cat: can't open '/tmp/healthy': No such file or directory
The readinessProbe looked for the file 6 times without success, which is exactly right: it checks every 5 seconds and the file was only created after 30 seconds.
What you think is a problem is actually the expected behavior. Your Events output is telling you that the readinessProbe last failed 21 minutes ago, which actually means that your pod has been healthy for the last 21 minutes.
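If you want the pod's current state rather than its past events, query it directly, for example with the readiness-exec pod from above:
# the READY column reflects the latest probe result, not old events
kubectl get pod readiness-exec
# or pull the Ready condition explicitly
kubectl get pod readiness-exec -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'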

proxy Memcache_Servers has no server available

I get "proxy Memcache_Servers has no server available" when I start haproxy.service:
[root@ha-node1 log]# systemctl restart haproxy.service
Message from syslogd@localhost at Aug 2 10:49:23 ...
haproxy[81665]: proxy Memcache_Servers has no server available!
The configuration in my haproxy.cfg:
listen Memcache_Servers
    bind 45.117.40.168:11211
    balance roundrobin
    mode tcp
    option tcpka
    server ha-node1 ha-node1:11211 check inter 10s fastinter 2s downinter 2s rise 30 fall 3
    server ha-node2 ha-node2:11211 check inter 10s fastinter 2s downinter 2s rise 30 fall 3
    server ha-node3 ha-node3:11211 check inter 10s fastinter 2s downinter 2s rise 30 fall 3
Eventually I found that the IPs in my /etc/hosts are as below:
[root@ha-node1 sysconfig]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.8.101 ha-node1 ha-node1.aa.com
192.168.8.102 ha-node2 ha-node2.aa.com
192.168.8.103 ha-node3 ha-node3.aa.com
45.117.40.168 ha-vhost devops.aa.com
192.168.8.104 nfs-backend backend.aa.com
But in my /etc/sysconfig/memcached, the listen IP did not match the host IP shown above, so I changed it to the IP from /etc/hosts.
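The relevant setting is memcached's listen address. On a stock CentOS/RHEL install the file looks roughly like this (a sketch of the change, not the exact original file; your options may differ):
PORT="11211"
USER="memcached"
MAXCONN="1024"
CACHESIZE="64"
# listen on the node address from /etc/hosts instead of the old, wrong IP
OPTIONS="-l 192.168.8.101"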
After restarting memcached and haproxy, everything works normally now.
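To confirm, each node can be checked for a listener on its 192.168.8.x address, and reachability of the check port can be tested from the HAProxy node (plain bash, using the hostnames from /etc/hosts above):
# on each ha-node: memcached should now listen on the node address, not only 127.0.0.1
ss -ltnp | grep 11211
# from the HAProxy node: the TCP port used by the health check should be reachable
timeout 2 bash -c 'cat < /dev/null > /dev/tcp/ha-node1/11211' && echo "ha-node1:11211 reachable"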