CoreDNS has problems getting Endpoints, Services, Namespaces - kubernetes

I have a following problem with CoreDNS from master (also see ready is 0/1 on master):
E0321 22:54:45.590231 1 reflector.go:126] pkg/mod/k8s.io/client-go#v11.0.0+incompatible/tools/cache/reflector.go:94: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: connection refused
E0321 22:54:46.528164 1 reflector.go:126] pkg/mod/k8s.io/client-go#v11.0.0+incompatible/tools/cache/reflector.go:94: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: connection refused
E0321 22:54:46.528164 1 reflector.go:126] pkg/mod/k8s.io/client-go#v11.0.0+incompatible/tools/cache/reflector.go:94: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: connection refused
E0321 22:54:46.528164 1 reflector.go:126] pkg/mod/k8s.io/client-go#v11.0.0+incompatible/tools/cache/reflector.go:94: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: connection refused
E0321 22:54:46.528164 1 reflector.go:126] pkg/mod/k8s.io/client-go#v11.0.0+incompatible/tools/cache/reflector.go:94: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: connection refused
E0321 22:54:46.531540 1 reflector.go:126] pkg/mod/k8s.io/client-go#v11.0.0+incompatible/tools/cache/reflector.go:94: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: connection refused
E0321 22:54:46.531540 1 reflector.go:126] pkg/mod/k8s.io/client-go#v11.0.0+incompatible/tools/cache/reflector.go:94: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: connection refused
E0321 22:54:46.531540 1 reflector.go:126] pkg/mod/k8s.io/client-go#v11.0.0+incompatible/tools/cache/reflector.go:94: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: connection refused
E0321 22:54:46.531540 1 reflector.go:126] pkg/mod/k8s.io/client-go#v11.0.0+incompatible/tools/cache/reflector.go:94: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: connection refused
E0321 22:54:46.591304 1 reflector.go:126] pkg/mod/k8s.io/client-go#v11.0.0+incompatible/tools/cache/reflector.go:94: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: connection refused
E0321 22:54:46.591304 1 reflector.go:126] pkg/mod/k8s.io/client-go#v11.0.0+incompatible/tools/cache/reflector.go:94: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: connection refused
E0321 22:54:46.591304 1 reflector.go:126] pkg/mod/k8s.io/client-go#v11.0.0+incompatible/tools/cache/reflector.go:94: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: connection refused
E0321 22:54:46.591304 1 reflector.go:126] pkg/mod/k8s.io/client-go#v11.0.0+incompatible/tools/cache/reflector.go:94: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: connection refused
Where everything else seem to be running normally and I can also access internet from nodes/pods on cluster
kube-system coredns-776474d56-46fnz 1/1 Running 0 2d23h 10.32.0.3 raspberrypi4-node <none> <none>
kube-system coredns-776474d56-7nlw4 0/1 Running 0 32h 10.36.0.1 raspberrypi4-master <none> <none>
kube-system etcd-raspberrypi4-master 1/1 Running 6 3d22h 192.168.0.192 raspberrypi4-master <none> <none>
kube-system kube-apiserver-raspberrypi4-master 1/1 Running 4 3d22h 192.168.0.192 raspberrypi4-master <none> <none>
kube-system kube-controller-manager-raspberrypi4-master 1/1 Running 9 3d22h 192.168.0.192 raspberrypi4-master <none> <none>
kube-system kube-proxy-6vgm9 1/1 Running 0 3d13h 192.168.0.157 raspberrypi3-node <none> <none>
kube-system kube-proxy-vqqv7 1/1 Running 5 3d22h 192.168.0.192 raspberrypi4-master <none> <none>
kube-system kube-proxy-wj784 1/1 Running 0 3d21h 192.168.0.90 raspberrypi4-node <none> <none>
kube-system kube-scheduler-raspberrypi4-master 1/1 Running 9 3d22h 192.168.0.192 raspberrypi4-master <none> <none>
kube-system weave-net-6db56 2/2 Running 0 3d9h 192.168.0.90 raspberrypi4-node <none> <none>
kube-system weave-net-7t7t6 2/2 Running 0 3d9h 192.168.0.192 raspberrypi4-master <none> <none>
kube-system weave-net-mg79s 2/2 Running 0 3d9h 192.168.0.157 raspberrypi3-node <none> <none>
I have checked the docs and some ports are not open, but this is access to port 443 which is kinda system privileged port, so I am wondering if this is the case where I need to provide access to kubernetes to that port (and maybe forward it to 6443 which in docs is Kubernetes API server). I will also get access from outside of cluster to this port and would like kubernetes services to handle it and would appreciate a simple command to forward 80 and 443 ports to that port.
I just noticed that service is indeed listening to correct IP/port, so no idea why it is refusing connection.
$ kubectl get svc -A
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 3d22h
kube-system kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 3d22h

Accepted answer did not solve my problem. In case someone has similar issues restarting coredns solved my issue.
kubectl rollout restart deployment coredns --namespace kube-system

The problem is with iptables.
Make sure the ip forwarding is enabled on the linux kernel of every node.
Execute command:
$ sysctl net.ipv4.conf.all.forwarding = 1
If your docker's version >=1.13, the default FORWARD chain policy was dropped, you have to set default policy of the FORWARD chain to ACCEPT.
Execute command:
$ sudo iptables -P FORWARD ACCEPT.
Finally pass kube-proxy configuration using flag cluster-cidr:
--cluster-cidr=.
--cluster-cidr flag means:
CIDR Range for Pods in cluster. Requires --allocate-node-cidrs to be
true.
If not provided, no off-cluster bridging will be performed.
Similar problem: kubernetes-coredns-issue.
Please let me know if it helped.

Related

Kubernetes cluster on bare metal by kubeadm

I'm trying to create a single control-plane cluster with kubeadm on 3 bare metal nodes (1 master and 2 workers) running on Debian 10 with Docker as a container runtime. Each node has an external IP and internal IP.
I want to configure a cluster on the internal network and be accessible from the Internet.
Used this command for that (please correct me if something wrong):
kubeadm init --control-plane-endpoint=10.10.0.1 --apiserver-cert-extra-sans={public_DNS_name},10.10.0.1 --pod-network-cidr=192.168.0.0/16
I got:
kubectl get no -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
dev-k8s-master-0.public.dns Ready master 16h v1.18.2 10.10.0.1 <none> Debian GNU/Linux 10 (buster) 4.19.0-8-amd64 docker://19.3.8
Init phase complete successfully and the cluster is accessible from the Internet. All pods are up and running except coredns that should be running after networking will be applied.
kubectl apply -f https://docs.projectcalico.org/v3.11/manifests/calico.yaml
After networking applied, coredns pods still not ready:
kubectl get po -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-75d56dfc47-g8g9g 0/1 CrashLoopBackOff 192 16h
kube-system calico-node-22gtx 1/1 Running 0 16h
kube-system coredns-66bff467f8-87vd8 0/1 Running 0 16h
kube-system coredns-66bff467f8-mv8d9 0/1 Running 0 16h
kube-system etcd-dev-k8s-master-0 1/1 Running 0 16h
kube-system kube-apiserver-dev-k8s-master-0 1/1 Running 0 16h
kube-system kube-controller-manager-dev-k8s-master-0 1/1 Running 0 16h
kube-system kube-proxy-lp6b8 1/1 Running 0 16h
kube-system kube-scheduler-dev-k8s-master-0 1/1 Running 0 16h
Some logs from failed pods:
kubectl -n kube-system logs calico-kube-controllers-75d56dfc47-g8g9g
2020-04-22 08:24:55.853 [INFO][1] main.go 88: Loaded configuration from environment config=&config.Config{LogLevel:"info", ReconcilerPeriod:"5m", CompactionPeriod:"10m", EnabledControllers:"node", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", HealthEnabled:true, SyncNodeLabels:true, DatastoreType:"kubernetes"}
2020-04-22 08:24:55.855 [INFO][1] k8s.go 228: Using Calico IPAM
W0422 08:24:55.855525 1 client_config.go:541] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2020-04-22 08:24:55.856 [INFO][1] main.go 109: Ensuring Calico datastore is initialized
2020-04-22 08:25:05.857 [ERROR][1] client.go 255: Error getting cluster information config ClusterInformation="default" error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-04-22 08:25:05.857 [FATAL][1] main.go 114: Failed to initialize Calico datastore error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
coredns:
[INFO] plugin/ready: Still waiting on: "kubernetes"
I0422 08:29:12.275344 1 trace.go:116] Trace[1050055850]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go#v0.17.2/tools/cache/reflector.go:105 (started: 2020-04-22 08:28:42.274382393 +0000 UTC m=+59491.429700922) (total time: 30.000897581s):
Trace[1050055850]: [30.000897581s] [30.000897581s] END
E0422 08:29:12.275388 1 reflector.go:153] pkg/mod/k8s.io/client-go#v0.17.2/tools/cache/reflector.go:105: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
I0422 08:29:12.276163 1 trace.go:116] Trace[188478428]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go#v0.17.2/tools/cache/reflector.go:105 (started: 2020-04-22 08:28:42.275499997 +0000 UTC m=+59491.430818380) (total time: 30.000606394s):
Trace[188478428]: [30.000606394s] [30.000606394s] END
E0422 08:29:12.276198 1 reflector.go:153] pkg/mod/k8s.io/client-go#v0.17.2/tools/cache/reflector.go:105: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
I0422 08:29:12.277424 1 trace.go:116] Trace[16697023]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go#v0.17.2/tools/cache/reflector.go:105 (started: 2020-04-22 08:28:42.276675998 +0000 UTC m=+59491.431994406) (total time: 30.000689778s):
Trace[16697023]: [30.000689778s] [30.000689778s] END
E0422 08:29:12.277452 1 reflector.go:153] pkg/mod/k8s.io/client-go#v0.17.2/tools/cache/reflector.go:105: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/ready: Still waiting on: "kubernetes"
Any thoughts what's wrong?
This answer is to call attention to #florin suggestion:
I've seen a similar behavior when I had multiple public interfaces on the node and calico selected the wrong one.
What I did is to set IP_AUTODETECT_METHOD in the calico config.
From Calico Configuration on IP_AUTO_DETECT_METHOD:
The method to use to autodetect the IPv4 address for this host. This is only used when the IPv4 address is being autodetected. See IP Autodetection methods for details of the valid methods.
Learn more Here: https://docs.projectcalico.org/reference/node/configuration#ip-autodetection-methods
I am also facing same problem, but following is work for me, try this in you master node.
$ sudo iptables -P INPUT ACCEPT
$ sudo iptables -P FORWARD ACCEPT
$ sudo iptables -P FORWARD ACCEPT
$ sudo iptables -F

coredns crashes with error "Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/****: dial tcp 10.96.0.1:443: connect: no route to host"

CoreDNS pod is not running. Please find below status.
kubectl get po --all-namespaces -o wide | grep -i coredns
kube-system coredns-6955765f44-8qhkr 1/1 Running 0 24m 10.244.0.59 k8s-master <none> <none>
kube-system coredns-6955765f44-lpmjk 0/1 Running 0 24m 10.244.1.43 k8s-worker-node-1 <none> <none>
Please find below logs of pod.
kubectl logs coredns-6955765f44-lpmjk -n kube-system
E0420 03:43:03.855622 1 reflector.go:125] pkg/mod/k8s.io/client-go#v0.0.0-20190620085101-78d2af792bab/tools/cache/reflector.go:98: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: no route to host
E0420 03:43:03.855622 1 reflector.go:125] pkg/mod/k8s.io/client-go#v0.0.0-20190620085101-78d2af792bab/tools/cache/reflector.go:98: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: no route to host
E0420 03:43:03.855622 1 reflector.go:125] pkg/mod/k8s.io/client-go#v0.0.0-20190620085101-78d2af792bab/tools/cache/reflector.go:98: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: no route to host
E0420 03:43:03.855622 1 reflector.go:125] pkg/mod/k8s.io/client-go#v0.0.0-20190620085101-78d2af792bab/tools/cache/reflector.go:98: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: no route to host
E0420 03:43:05.859525 1 reflector.go:125] pkg/mod/k8s.io/client-go#v0.0.0-20190620085101-78d2af792bab/tools/cache/reflector.go:98: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: no route to host
E0420 03:43:05.859525 1 reflector.go:125] pkg/mod/k8s.io/client-go#v0.0.0-20190620085101-78d2af792bab/tools/cache/reflector.go:98: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: no route to host
To solve no route to host issue with CoreDNS pods you have to flush iptables by running:
systemctl stop kubelet
systemctl stop docker
iptables --flush
iptables -tnat --flush
systemctl start kubelet
systemctl start docker
Also mind that flannel has been removed from the list of CNIs in the kubeadm documentation:
The reason for that is that Cluster Lifecycle have been
getting a number of issues related to flannel (either in kubeadm or
kops tickets) and we don't have good answers for the users as the
project is not actively maintained.
- Add note that issues for CNI should be logged in the respective issue
trackers and that Calico is the only CNI we e2e test kubeadm against.
So recommended approach would be also move to Calico CNI.
I was using K8s 1.19.7 with flannel without any error, As soon as I upgraded to 1.21.1 it start showing above mentioned error and following fix works for me
firewall-cmd --permanent --zone=trusted --add-source=10.244.0.0/16
firewall-cmd --reload

Kubernetes using KVM instances on OpenStack via KubeAdm

I have successfully deployed a "working" Kubernetes cluster using the Horizon interface to create the Linux instances:
Having configured the hosts according to: https://kubernetes.io/docs/setup/independent/high-availability/
I can now say I have a Kubernetes cluster:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
kube-apiserver-1 Ready master 1d v1.12.2
kube-apiserver-2 Ready master 1d v1.12.2
kube-apiserver-3 Ready master 1d v1.12.2
kube-node-1 Ready <none> 21h v1.12.2
kube-node-2 Ready <none> 21h v1.12.2
kube-node-3 Ready <none> 21h v1.12.2
kube-node-4 Ready <none> 21h v1.12.2
However, getting beyond this point has proven to be quite a struggle. I can not create usable services and coredns which is an essential component seems unusable:
$ kubectl -n kube-system get pods
NAME READY STATUS RESTARTS AGE
coredns-576cbf47c7-4gdnc 0/1 CrashLoopBackOff 288 23h
coredns-576cbf47c7-x4h4v 0/1 CrashLoopBackOff 288 23h
kube-apiserver-kube-apiserver-1 1/1 Running 0 1d
kube-apiserver-kube-apiserver-2 1/1 Running 0 1d
kube-apiserver-kube-apiserver-3 1/1 Running 0 1d
kube-controller-manager-kube-apiserver-1 1/1 Running 3 1d
kube-controller-manager-kube-apiserver-2 1/1 Running 1 1d
kube-controller-manager-kube-apiserver-3 1/1 Running 0 1d
kube-flannel-ds-amd64-2zdtd 1/1 Running 0 20h
kube-flannel-ds-amd64-7l5mr 1/1 Running 0 20h
kube-flannel-ds-amd64-bmvs9 1/1 Running 0 1d
kube-flannel-ds-amd64-cmhkg 1/1 Running 0 1d
...
Errors in the pod indicate that it cannot reach the kubernetes service:
$ kubectl -n kube-system logs coredns-576cbf47c7-4gdnc
E1121 18:04:48.928055 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1121 18:04:48.928688 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1121 18:04:48.928917 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1121 18:05:19.929869 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1121 18:05:19.930819 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1121 18:05:19.931517 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1121 18:05:50.932159 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1121 18:05:50.932722 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1121 18:05:50.933179 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
2018/11/21 18:06:07 [INFO] SIGTERM: Shutting down servers then terminating
E1121 18:06:21.933058 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1121 18:06:21.934010 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1121 18:06:21.935107 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
$ kubectl -n kube-system describe pod/coredns-576cbf47c7-dk7sh
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 25m default-scheduler Successfully assigned kube-system/coredns-576cbf47c7-dk7sh to kube-node-3
Normal Pulling 25m kubelet, kube-node-3 pulling image "k8s.gcr.io/coredns:1.2.2"
Normal Pulled 25m kubelet, kube-node-3 Successfully pulled image "k8s.gcr.io/coredns:1.2.2"
Normal Created 20m (x3 over 25m) kubelet, kube-node-3 Created container
Normal Killing 20m (x2 over 22m) kubelet, kube-node-3 Killing container with id docker://coredns:Container failed liveness probe.. Container will be killed and recreated.
Normal Pulled 20m (x2 over 22m) kubelet, kube-node-3 Container image "k8s.gcr.io/coredns:1.2.2" already present on machine
Normal Started 20m (x3 over 25m) kubelet, kube-node-3 Started container
Warning Unhealthy 4m (x36 over 24m) kubelet, kube-node-3 Liveness probe failed: HTTP probe failed with statuscode: 503
Warning BackOff 17s (x22 over 8m) kubelet, kube-node-3 Back-off restarting failed container
The kubernetes service is there and seems to be properly autoconfigured:
$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 23h
$ kubectl describe svc/kubernetes
Name: kubernetes
Namespace: default
Labels: component=apiserver
provider=kubernetes
Annotations: <none>
Selector: <none>
Type: ClusterIP
IP: 10.96.0.1
Port: https 443/TCP
TargetPort: 6443/TCP
Endpoints: 192.168.5.19:6443,192.168.5.24:6443,192.168.5.29:6443
Session Affinity: None
Events: <none>
$ kubectl get endpoints
NAME ENDPOINTS AGE
kubernetes 192.168.5.19:6443,192.168.5.24:6443,192.168.5.29:6443 23h
I have a nagging suspicion that I am missing something in the network layer and that this issue has something to do with Neutron. There are plenty of HOWTOs on how to install Kubernetes using other tools and how to install it in OpenStack but I have yet to find one guide that explains how to install it by creating KVMs using the Horizon interface and dealing with security groups and network issues. By the way, ALL IPv4/TCP ports are open between the Masters and Nodes.
Is there anyone out there with a guide that explains this scenario?
The issue here was a polluted etcd cluster. As soon as I rebuilt the EXTERNAL etcd cluster and started from scratch using these instructions: https://kubernetes.io/docs/setup/independent/high-availability/#external-etcd all items were working as expected. There does not seem to be a tool available to reset the etcd entries for a flannel pod network.

Cannot get Traefik to work on Centos 7.5 with Kubernetes

Centos 7.5
I am trying to get Traefik installed in Kubernetes using the guide from here.
I performed the following:
kubectl apply -f https://raw.githubusercontent.com/containous/traefik/master/examples/k8s/traefik-rbac.yaml
kubectl apply -f https://raw.githubusercontent.com/containous/traefik/master/examples/k8s/traefik-ds.yaml
and then checked my pods receiving the following:
kubectl --namespace=kube-system get pods
coredns-78fcdf6894-5bslk 0/1 CrashLoopBackOff 626 2d
coredns-78fcdf6894-fxr28 1/1 Running 627 2d
etcd-nw-h-cos7-n1.natiki 1/1 Running 0 2d
kube-apiserver-nw-h-cos7-n1.natiki 1/1 Running 0 2d
kube-controller-manager-nw-h-cos7-n1.natiki 1/1 Running 0 2d
kube-flannel-ds-amd64-6t2j6 1/1 Running 0 2d
kube-flannel-ds-amd64-gxpg5 1/1 Running 0 2d
kube-flannel-ds-amd64-nf6bv 1/1 Running 0 2d
kube-proxy-5n9lm 1/1 Running 0 2d
kube-proxy-62kkp 1/1 Running 0 2d
kube-proxy-t7dfq 1/1 Running 0 2d
kube-scheduler-nw-h-cos7-n1.natiki 1/1 Running 0 2d
traefik-ingress-controller-7vbkz 1/1 Running 0 41m
traefik-ingress-controller-9qld7 1/1 Running 0 41m
Now as I understand it I should be able to do a basic http connection on the master node as: http://localhost and get to the Traefik dashboard however I get a page not found. Same if I try http://localhost:8001/api/v1/proxy/namespaces/default/services/traefik
I also tried to proxy the ingress as:
kubectl --namespace=kube-system port-forward traefik-ingress-controller-7vbkz 8080:8001
but when I access it that returns:
Handling connection for 8080
E0801 19:26:49.332072 12575 portforward.go:331] an error occurred forwarding 8080 -> 8001: error forwarding port 8001 to pod 46a5574cef4397d0d58c453883c91fc1df5eff94e68234587077f9a1c65d83b5, uid : exit status 1: 2018/08/01 19:26:49 socat[17624] E connect(5, AF=2 127.0.0.1:8001, 16): Connection refused
My topology is as follows:
Master 10.99.99.101
Slave1 10.99.99.102
Slagve2 10.99.99.103
When I look at the ingress logs I see:
E0730 22:02:57.030684 1 reflector.go:205] github.com/containous/traefik/vendor/k8s.io/client-go/informers/factory.go:86: Failed to list *v1beta1.Ingress: Get https://10.96.0.1:443/apis/extensions/v1beta1/ingresses?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
It appears that traefik is trying to connect to 10.96.0.1? No idea why. Perhaps the CoreDNS is the issue because looking at the logs for that pod I see:
kubectl --namespace=kube-system logs coredns-78fcdf6894-5bslk
E0801 09:32:21.267810 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:313: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E0801 09:32:21.267943 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:320: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E0801 09:32:21.268219 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:315: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
I am wondering if this has to do with my use of Flannel or something else? Can someone please help me as to how I get to the dashboard to display?

kube-dns Failed to list *v1.Endpoints getsockopt: connection refused

I have a kubernetes cluster (v1.10) using flannel (not sure if relevant, might be) as CNI provider. Trying to apply kube-dns but it goes to CrashLoopBackOff and the logs for the kubedns pod show, repeatedly:
I0423 17:46:47.045712 1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0423 17:46:47.545729 1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0423 17:46:48.045723 1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0423 17:46:48.545749 1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
E0423 17:46:49.019286 1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:147: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.96.0.1:443: getsockopt: connection refused
E0423 17:46:49.019325 1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:150: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.96.0.1:443: getsockopt: connection refused
I0423 17:46:49.045731 1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
F0423 17:46:49.545707 1 dns.go:167] Timeout waiting for initialization
Nothing in my kube-dns manifest refers to port 443 and kube-apiserver is configured for 6443. What is it trying to get a connection to that is being refused?
I also don't know whether it has anything to do with the kube-dns pod having an ip of 10.88.0.3:
kubectl -n kube-system -o wide get pods
NAME READY STATUS RESTARTS AGE IP NODE
kube-dns-564f9d98-lt9js 2/3 CrashLoopBackOff 13 18m 10.88.0.3 worker1
kube-flannel-ds-5bqm6 1/1 Running 0 35m 10.240.0.12 controller2
kube-flannel-ds-djmld 1/1 Running 0 35m 10.240.0.11 controller1
kube-flannel-ds-nbfhp 1/1 Running 0 35m 10.240.0.23 worker3
kube-flannel-ds-prxdr 1/1 Running 0 35m 10.240.0.22 worker2
kube-flannel-ds-x9cdq 1/1 Running 0 35m 10.240.0.21 worker1
kube-flannel-ds-zjbgb 1/1 Running 0 35m 10.240.0.13 controller3
Again, where is this coming from? It's not something I have configured and it does not sit within either of my service network or pod network CIDR ranges:
kubernetes_dns_domain: kubernetes.local
kubernetes_dns_ip: "{{ kubernetes_cluster_subnet }}.10"
kubernetes_cluster_subnet: 10.96.0
kubernetes_pod_network_cidr: 10.244.0.0/16
kubernetes_service_ip: "{{ kubernetes_cluster_subnet }}.1"
kubernetes_service_ip_range: "{{ kubernetes_cluster_subnet }}.0/24"
kubernetes_service_node_port_range: 30000-32767
kubernetes_secure_port: 6443
I'm thoroughly confused and would be grateful of any explanations as to what is going on.
kube_dns_version: 1.14.10
flannel_version: v0.10.0