HAProxy as load balancer for kube-apiserver - Kubernetes

I've set up a Kubernetes cluster with three masters. The kube-apiserver should be stateless. To access them properly from the worker nodes, I've configured HAProxy to expose the apiserver port (8080):
frontend http_front_8080
    bind *:8080
    stats uri /haproxy?stats
    default_backend http_back_8080

backend http_back_8080
    balance roundrobin
    server m01 192.168.33.21:8080 check
    server m02 192.168.33.22:8080 check
    server m03 192.168.33.23:8080 check
But when I point the worker nodes at the load balancer's IP as the apiserver address, I get these errors:
Apr 20 12:35:07 n01 kubelet[3383]: E0420 12:35:07.308337 3383 reflector.go:271] pkg/kubelet/kubelet.go:240: Failed to watch *api.Service: too old resource version: 4001 (4041)
Apr 20 12:36:48 n01 kubelet[3383]: E0420 12:36:48.321021 3383 reflector.go:271] pkg/kubelet/kubelet.go:240: Failed to watch *api.Service: too old resource version: 4011 (4041)
Apr 20 12:37:31 n01 kube-proxy[3408]: E0420 12:37:31.381042 3408 reflector.go:271] pkg/proxy/config/api.go:47: Failed to watch *api.Service: too old resource version: 4011 (4041)
Apr 20 12:41:42 n01 kube-proxy[3408]: E0420 12:41:42.409604 3408 reflector.go:271] pkg/proxy/config/api.go:47: Failed to watch *api.Service: too old resource version: 4011 (4041)
If I change the load balancer's IP to that of one of the master nodes, it works as expected (without the error messages above).
Am I missing something in my HAProxy configuration that is vital for this setup?

I had the same issue. I assume the watch requires some sort of state on the API server side.
The solution is to change the configuration so that all requests from a given client go to the same server, using balance source. I assume you have multiple API servers to make Kubernetes highly available rather than to balance load.
frontend http_front_8080
    bind *:8080
    stats uri /haproxy?stats
    default_backend http_back_8080

backend http_back_8080
    balance source
    server m01 192.168.33.21:8080 check
    server m02 192.168.33.22:8080 check
    server m03 192.168.33.23:8080 check

Related

HAProxy SSL termination: Layer4 connection problem, info: "Connection refused"

I was trying to implement SSL termination with HAProxy.
This is what my haproxy.cfg looks like:
frontend Local_Server
    bind *:443 ssl crt /home/vagrant/ingress-certificate/k8s.pem
    mode tcp
    reqadd X-Forwarded-Proto:\ https
    default_backend k8s_server

backend k8s_server
    mode tcp
    balance roundrobin
    redirect scheme https if !{ ssl_fc }
    server web1 100.0.0.2:8080 check
I have generated the self-signed certificate k8s.pem.
My normal URL (without HTTPS) works perfectly fine, i.e. http://100.0.0.2/hello.
But when I try to access the same URL over HTTPS, i.e. https://100.0.0.2/hello, I get a 404, and when I check my HAProxy logs I see the following message:
Jul 21 18:10:19 node1 haproxy[10813]: Server k8s_server/web1 is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Jul 21 18:10:19 node1 haproxy[10813]: Server k8s_server/web1 is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Any suggestions for what I can change in my haproxy.cfg?
PS: The microservice I am trying to access is deployed in a Kubernetes cluster, with its Service exposed as a ClusterIP.
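For reference, a minimal HTTP-mode termination setup would look roughly like this (a sketch only, not a verified fix; it assumes the backend really speaks plain HTTP on 100.0.0.2:8080, which the "Connection refused" health check suggests is not currently the case):
frontend Local_Server
    bind *:443 ssl crt /home/vagrant/ingress-certificate/k8s.pem
    mode http
    # reqadd only applies in HTTP mode; set-header is the current equivalent
    http-request set-header X-Forwarded-Proto https
    default_backend k8s_server

backend k8s_server
    mode http
    balance roundrobin
    server web1 100.0.0.2:8080 check
Note that the Layer4 "Connection refused" error is separate from SSL termination: HAProxy's health check cannot reach anything listening on 100.0.0.2:8080, which would be expected if that address is a ClusterIP, since ClusterIP Services are only reachable from inside the cluster.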

Error restoring Rancher: This cluster is currently Unavailable; areas that interact directly with it will not be available until the API is ready

I am trying to back up and restore the Rancher server (single node install), as described here.
After the backup, I turned off the Rancher server node, ran a new Rancher container on a new node (in the same network, but with another IP address), and then restored from the backup file.
After restoring, I logged in to the Rancher UI and it showed the error below:
So I checked the logs of the Rancher server, which showed the following:
2019-10-05 16:41:32.197641 I | http: TLS handshake error from 127.0.0.1:38388: EOF
2019-10-05 16:41:32.202442 I | http: TLS handshake error from 127.0.0.1:38380: EOF
2019-10-05 16:41:32.210378 I | http: TLS handshake error from 127.0.0.1:38376: EOF
2019-10-05 16:41:32.211106 I | http: TLS handshake error from 127.0.0.1:38386: EOF
2019/10/05 16:42:26 [ERROR] ClusterController c-4pgjl [user-controllers-controller] failed with : failed to start user controllers for cluster c-4pgjl: failed to contact server: Get https://192.168.94.154:6443/api/v1/namespaces/kube-system?timeout=30s: waiting for cluster agent to connect
2019/10/05 16:44:34 [ERROR] ClusterController c-4pgjl [user-controllers-controller] failed with : failed to start user controllers for cluster c-4pgjl: failed to contact server: Get https://192.168.94.154:6443/api/v1/namespaces/kube-system?timeout=30s: waiting for cluster agent to connect
2019/10/05 16:48:50 [ERROR] ClusterController c-4pgjl [user-controllers-controller] failed with : failed to start user controllers for cluster c-4pgjl: failed to contact server: Get https://192.168.94.154:6443/api/v1/namespaces/kube-system?timeout=30s: waiting for cluster agent to connect
2019-10-05 16:50:19.114475 I | mvcc: store.index: compact 75951
2019-10-05 16:50:19.137825 I | mvcc: finished scheduled compaction at 75951 (took 22.527694ms)
2019-10-05 16:55:19.120803 I | mvcc: store.index: compact 76282
2019-10-05 16:55:19.124813 I | mvcc: finished scheduled compaction at 76282 (took 2.746382ms)
After that, I checked the logs of the master nodes and found that the Rancher agent still tries to connect to the old Rancher server (the old IP address), not the new one, which makes the cluster unavailable.
How can I fix this?
You need to re-register the node in Rancher using the following steps.
Update the server-url in Rancher by going to Global -> Settings -> server-url.
This should be the full URL, including https://.
Then use this script to re-register the node in Rancher: https://github.com/mattmattox/cluster-agent-tool
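After re-registering, you can verify from the downstream cluster that the agent is pointing at the new Rancher URL (a sketch; the cattle-system namespace and cattle-cluster-agent names assume a default Rancher agent deployment):
# Show which server URL the cluster agent was deployed with
kubectl -n cattle-system get deployment cattle-cluster-agent \
  -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="CATTLE_SERVER")].value}'

# Follow the agent logs to confirm it connects to the new Rancher server
kubectl -n cattle-system logs -l app=cattle-cluster-agent -f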

Minikube is slow and unresponsive

Today, seemingly at random, minikube started taking a very long time to respond to commands via kubectl.
And occasionally even:
kubectl get pods
Unable to connect to the server: net/http: TLS handshake timeout
How can I diagnose this?
Some logs from minikube logs:
==> kube-scheduler <==
I0527 14:16:55.809859 1 serving.go:319] Generated self-signed cert in-memory
W0527 14:16:56.256478 1 authentication.go:387] failed to read in-cluster kubeconfig for delegated authentication: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
W0527 14:16:56.256856 1 authentication.go:249] No authentication-kubeconfig provided in order to lookup client-ca-file in configmap/extension-apiserver-authentication in kube-system, so client certificate authentication won't work.
W0527 14:16:56.257077 1 authentication.go:252] No authentication-kubeconfig provided in order to lookup requestheader-client-ca-file in configmap/extension-apiserver-authentication in kube-system, so request-header client certificate authentication won't work.
W0527 14:16:56.257189 1 authorization.go:177] failed to read in-cluster kubeconfig for delegated authorization: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory
W0527 14:16:56.257307 1 authorization.go:146] No authorization-kubeconfig provided, so SubjectAccessReview of authorization tokens won't work.
I0527 14:16:56.264875 1 server.go:142] Version: v1.14.1
I0527 14:16:56.265228 1 defaults.go:87] TaintNodesByCondition is enabled, PodToleratesNodeTaints predicate is mandatory
W0527 14:16:56.286959 1 authorization.go:47] Authorization is disabled
W0527 14:16:56.286982 1 authentication.go:55] Authentication is disabled
I0527 14:16:56.286995 1 deprecated_insecure_serving.go:49] Serving healthz insecurely on [::]:10251
I0527 14:16:56.287397 1 secure_serving.go:116] Serving securely on 127.0.0.1:10259
I0527 14:16:57.417028 1 controller_utils.go:1027] Waiting for caches to sync for scheduler controller
I0527 14:16:57.524378 1 controller_utils.go:1034] Caches are synced for scheduler controller
I0527 14:16:57.827438 1 leaderelection.go:217] attempting to acquire leader lease kube-system/kube-scheduler...
E0527 14:17:10.865448 1 leaderelection.go:306] error retrieving resource lock kube-system/kube-scheduler: Get https://localhost:8443/api/v1/namespaces/kube-system/endpoints/kube-scheduler?timeout=10s: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
E0527 14:17:43.418910 1 leaderelection.go:306] error retrieving resource lock kube-system/kube-scheduler: Get https://localhost:8443/api/v1/namespaces/kube-system/endpoints/kube-scheduler?timeout=10s: context deadline exceeded (Client.Timeout exceeded while awaiting headers)
I0527 14:18:01.447065 1 leaderelection.go:227] successfully acquired lease kube-system/kube-scheduler
I0527 14:18:29.044544 1 leaderelection.go:263] failed to renew lease kube-system/kube-scheduler: failed to tryAcquireOrRenew context deadline exceeded
E0527 14:18:38.999295 1 server.go:252] lost master
E0527 14:18:39.204637 1 leaderelection.go:306] error retrieving resource lock kube-system/kube-scheduler: Get https://localhost:8443/api/v1/namespaces/kube-system/endpoints/kube-scheduler?timeout=10s: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
lost lease
Update:
To work around this issue I just did a minikube delete followed by minikube start, and the performance issue was resolved.
As a solution has been found, I am posting this as a Community Wiki answer for future users.
1) Debug issues with minikube by adding the -v flag and setting the verbosity level (0, 1, 2, 3, 7).
For example: minikube start --v=1 sets the output to INFO level.
More detailed information here
2) Use the logs command: minikube logs
3) Because Minikube runs in a virtual machine, it is sometimes better to delete it and start it again (that is what helped in this case):
minikube delete
minikube start
4) It might get slow due to a lack of resources.
By default, Minikube uses 2048 MB of memory and 2 CPUs. More details about this can be found here.
In addition, you can tell Minikube to allocate more with the command:
minikube start --cpus 4 --memory 8192
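If you want the larger allocation to persist when the VM is recreated, you can also store it in minikube's config before recreating the cluster (a sketch, assuming a reasonably recent minikube version where these config keys exist):
# Persist the resource settings, then recreate the VM
minikube config set cpus 4
minikube config set memory 8192
minikube delete
minikube start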

Marathon Proxy loop detected

Running Marathon in HA, version 1.1.1-1.0.472.el7.
One of the non-leader nodes is returning 502 "Detecting Proxy Loop".
When checking the logs for each request I am seeing the following:
Oct 17 11:13:10 marathon-2-server marathon[13362]: [2016-10-17 11:13:10,130] INFO Proxying request to GET http://marathon-1-server:8080/favicon.ico from marathon-2-server:8080 (mesosphere.marathon.api.JavaUrlConnectionRequestForwarder$:qtp1485208789-526699)
Oct 17 11:13:10 marathon-2-server marathon[13362]: [2016-10-17 11:13:10,131] INFO Proxying request to GET http://marathon-1-server:8080/favicon.ico from marathon-2-server:8080 (mesosphere.marathon.api.JavaUrlConnectionRequestForwarder$:qtp1485208789-530887)
Oct 17 11:13:10 marathon-2-server marathon[13362]: [2016-10-17 11:13:10,131] ERROR Prevent proxy cycle, rejecting request (mesosphere.marathon.api.JavaUrlConnectionRequestForwarder$:qtp1485208789-530887)
Marathon had some issues with leader proxy detection during leader election, but this should be solved in 1.3 or newer.
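To check which version each node runs and which node currently holds leadership, you can query Marathon's REST API (a sketch; /v2/info and /v2/leader are standard Marathon endpoints):
# Marathon version and framework details for this node
curl -s http://marathon-2-server:8080/v2/info

# The node the cluster currently considers the leader
curl -s http://marathon-2-server:8080/v2/leader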

HAProxy not running stats socket

I installed HAProxy from the AUR on Arch Linux and modified the config file a bit:
global
    maxconn 20000
    log 127.0.0.1 local0
    user haproxy
    stats socket /run/haproxy/haproxy.sock mode 660 level admin
    stats timeout 30s
    chroot /usr/share/haproxy
    pidfile /run/haproxy.pid
    daemon

defaults
    mode http
    stats enable
    stats uri /stats
    stats realm Haproxy\ Statistics

frontend www-http
    bind 127.0.0.1:80
    default_backend www-backend

backend www-backend
    mode http
    balance roundrobin
    timeout connect 5s
    timeout server 30s
    timeout queue 30s
    server app1 127.0.0.1:5001 check
    server app2 127.0.0.1:5002 check
I have made sure that the directory /run/haproxy exists and has permissions for the user haproxy to write to it:
ツ ls -al /run/haproxy
total 0
drwxr-xr-x 2 haproxy root 40 May 13 21:37 .
drwxr-xr-x 27 root root 720 May 13 22:00 ..
When I launch HAProxy using systemctl start haproxy.service, it loads fine. I can even go to the /stats page and view stats; however, socat reports the following error:
ツ sudo socat unix-connect:/run/haproxy/haproxy.sock stdio
2016/05/13 22:04:11 socat[24202] E connect(5, AF=1 "/run/haproxy/haproxy.sock", 27): No such file or directory
I am at my wits' end and unable to understand what is happening. This is what I get from journalctl -xe:
May 13 21:56:31 rohanarch.local systemd[1]: Starting HAProxy Load Balancer...
May 13 21:56:31 rohanarch.local systemd[1]: Started HAProxy Load Balancer.
May 13 21:56:31 rohanarch.local haproxy-systemd-wrapper[20454]: haproxy-systemd-wrapper: executing /usr/bin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds
May 13 21:56:31 rohanarch.local haproxy-systemd-wrapper[20454]: [WARNING] 133/215631 (20456) : config : missing timeouts for frontend 'www-http'.
May 13 21:56:31 rohanarch.local haproxy-systemd-wrapper[20454]: | While not properly invalid, you will certainly encounter various problems
May 13 21:56:31 rohanarch.local haproxy-systemd-wrapper[20454]: | with such a configuration. To fix this, please ensure that all following
May 13 21:56:31 rohanarch.local haproxy-systemd-wrapper[20454]: | timeouts are set to a non-zero value: 'client', 'connect', 'server'.
Basically, there are no errors or warnings, not even so much as a mention of the stats socket. Others who have faced a problem with the stats socket fail to get HAProxy started at all; in my case it starts up fine, but the socket just isn't being created.
You need to create the directory manually. Please ensure
/run/haproxy exists. If it doesn't, first create it with:
sudo mkdir /run/haproxy
This should resolve your issue.
Alternatively, try making SELinux permissive with the command below and then restart the HAProxy service:
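A typical way to do this (assuming SELinux is actually enabled on the host) is:
# Switch SELinux to permissive mode for the current boot (not persistent across reboots)
sudo setenforce 0
# Restart HAProxy so it retries creating the stats socket
sudo systemctl restart haproxy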