Weave CrashLoopBackOff on fresh HA Cluster - kubernetes

I create a HA clusters with kubeadm by following this guides:
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/
https://medium.com/faun/configuring-ha-kubernetes-cluster-on-bare-metal-servers-with-kubeadm-1-2-1e79f0f7857b
I've already have the ETCD nodes up and runnig, the APIserver through the HAproxy and keepalive running.
And 1 master node running with weave-net network.
I use this subnets
networking:
podSubnet: 192.168.240.0/22
serviceSubnet: 192.168.244.0/22
But when I join the second master node to the cluster, the weave pod created got CrashLoopBackOff.
I run the weave-net plugin with this line:
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')&env.IPALLOC_RANGE=192.168.240.0/21"
I also found that the /etc/cni/net.d isn't created by kubelet when apply weave conf.
The Master Nodes
kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
kubemaster01 Ready master 17h v1.18.1 192.168.129.137 <none> Ubuntu 18.04.4 LTS 4.15.0-96-generic docker://19.3.8
kubemaster02 Ready master 83m v1.18.1 192.168.129.138 <none> Ubuntu 18.04.4 LTS 4.15.0-91-generic docker://19.3.8
The Pods
oot#kubemaster01:~# kubectl get pods,svc --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system pod/coredns-66bff467f8-kh4mh 0/1 Running 0 18h 192.168.240.3 kubemaster01 <none> <none>
kube-system pod/coredns-66bff467f8-xhzjk 0/1 Running 0 18h 192.168.240.2 kubemaster01 <none> <none>
kube-system pod/kube-apiserver-kubemaster01 1/1 Running 0 16h 192.168.129.137 kubemaster01 <none> <none>
kube-system pod/kube-apiserver-kubemaster02 1/1 Running 0 104m 192.168.129.138 kubemaster02 <none> <none>
kube-system pod/kube-controller-manager-kubemaster01 1/1 Running 0 16h 192.168.129.137 kubemaster01 <none> <none>
kube-system pod/kube-controller-manager-kubemaster02 1/1 Running 0 104m 192.168.129.138 kubemaster02 <none> <none>
kube-system pod/kube-proxy-sct5x 1/1 Running 0 18h 192.168.129.137 kubemaster01 <none> <none>
kube-system pod/kube-proxy-tsr65 1/1 Running 0 104m 192.168.129.138 kubemaster02 <none> <none>
kube-system pod/kube-scheduler-kubemaster01 1/1 Running 2 18h 192.168.129.137 kubemaster01 <none> <none>
kube-system pod/kube-scheduler-kubemaster02 1/1 Running 0 104m 192.168.129.138 kubemaster02 <none> <none>
kube-system pod/weave-net-4zdg6 2/2 Running 0 3h 192.168.129.137 kubemaster01 <none> <none>
kube-system pod/weave-net-bf8mq 1/2 CrashLoopBackOff 38 104m 192.168.129.138 kubemaster02 <none> <none>
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
default service/kubernetes ClusterIP 192.168.244.1 <none> 443/TCP 20h <none>
kube-system service/kube-dns ClusterIP 192.168.244.10 <none> 53/UDP,53/TCP,9153/TCP 18h k8s-app=kube-dns
IP Routes in Master Nodes
root#kubemaster01:~# ip r
default via 192.168.128.1 dev ens3 proto static
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.128.0/21 dev ens3 proto kernel scope link src 192.168.129.137
192.168.240.0/21 dev weave proto kernel scope link src 192.168.240.1
root#kubemaster02:~# ip r
default via 192.168.128.1 dev ens3 proto static
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.128.0/21 dev ens3 proto kernel scope link src 192.168.129.138
Description of the weave pod running on the second master node
root#kubemaster01:~# kubectl describe pod/weave-net-bf8mq -n kube-system
Name: weave-net-bf8mq
Namespace: kube-system
Priority: 2000001000
Priority Class Name: system-node-critical
Node: kubemaster02./192.168.129.138
Start Time: Fri, 17 Apr 2020 12:28:09 -0300
Labels: controller-revision-hash=79478b764c
name=weave-net
pod-template-generation=1
Annotations: <none>
Status: Running
IP: 192.168.129.138
IPs:
IP: 192.168.129.138
Controlled By: DaemonSet/weave-net
Containers:
weave:
Container ID: docker://93bff012aaebb34dc338001bf73798b5eeefe32a4d50b82731b0ef003c63c786
Image: docker.io/weaveworks/weave-kube:2.6.2
Image ID: docker-pullable://weaveworks/weave-kube#sha256:a1f58e75f24f02e1c2fa2a95b9e55a1b94930f455e75bd5f4799e1a55671971f
Port: <none>
Host Port: <none>
Command:
/home/weave/launch.sh
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Fri, 17 Apr 2020 14:15:59 -0300
Finished: Fri, 17 Apr 2020 14:16:29 -0300
Ready: False
Restart Count: 39
Requests:
cpu: 10m
Readiness: http-get http://127.0.0.1:6784/status delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
HOSTNAME: (v1:spec.nodeName)
IPALLOC_RANGE: 192.168.240.0/21
Mounts:
/host/etc from cni-conf (rw)
/host/home from cni-bin2 (rw)
/host/opt from cni-bin (rw)
/host/var/lib/dbus from dbus (rw)
/lib/modules from lib-modules (rw)
/run/xtables.lock from xtables-lock (rw)
/var/run/secrets/kubernetes.io/serviceaccount from weave-net-token-xp46t (ro)
/weavedb from weavedb (rw)
weave-npc:
Container ID: docker://4de9116cae90cf3f6d59279dd1531938b102adcdd1b76464e5bbe2f2b013b060
Image: docker.io/weaveworks/weave-npc:2.6.2
Image ID: docker-pullable://weaveworks/weave-npc#sha256:5694b0b77003780333ccd1fc79810469434779cd86e926a17675cc5b70470459
Port: <none>
Host Port: <none>
State: Running
Started: Fri, 17 Apr 2020 12:28:24 -0300
Ready: True
Restart Count: 0
Requests:
cpu: 10m
Environment:
HOSTNAME: (v1:spec.nodeName)
Mounts:
/run/xtables.lock from xtables-lock (rw)
/var/run/secrets/kubernetes.io/serviceaccount from weave-net-token-xp46t (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
weavedb:
Type: HostPath (bare host directory volume)
Path: /var/lib/weave
HostPathType:
cni-bin:
Type: HostPath (bare host directory volume)
Path: /opt
HostPathType:
cni-bin2:
Type: HostPath (bare host directory volume)
Path: /home
HostPathType:
cni-conf:
Type: HostPath (bare host directory volume)
Path: /etc
HostPathType:
dbus:
Type: HostPath (bare host directory volume)
Path: /var/lib/dbus
HostPathType:
lib-modules:
Type: HostPath (bare host directory volume)
Path: /lib/modules
HostPathType:
xtables-lock:
Type: HostPath (bare host directory volume)
Path: /run/xtables.lock
HostPathType: FileOrCreate
weave-net-token-xp46t:
Type: Secret (a volume populated by a Secret)
SecretName: weave-net-token-xp46t
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: :NoSchedule
:NoExecute
node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/network-unavailable:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/pid-pressure:NoSchedule
node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unschedulable:NoSchedule
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 11m (x17 over 81m) kubelet, kubemaster02. Container image "docker.io/weaveworks/weave-kube:2.6.2" already present on machine
Warning BackOff 85s (x330 over 81m) kubelet, kubemaster02. Back-off restarting failed container
The logs files complains about timeout, but that is because there is no network running.
root#kubemaster02:~# kubectl logs weave-net-bf8mq -name weave -n kube-system
FATA: 2020/04/17 17:22:04.386233 [kube-peers] Could not get peers: Get https://192.168.244.1:443/api/v1/nodes: dial tcp 192.168.244.1:443: i/o timeout
Failed to get peers
root#kubemaster02:~# kubectl logs weave-net-bf8mq -name weave-npc -n kube-system | more
INFO: 2020/04/17 15:28:24.851287 Starting Weaveworks NPC 2.6.2; node name "kubemaster02"
INFO: 2020/04/17 15:28:24.851469 Serving /metrics on :6781
Fri Apr 17 15:28:24 2020 <5> ulogd.c:408 registering plugin `NFLOG'
Fri Apr 17 15:28:24 2020 <5> ulogd.c:408 registering plugin `BASE'
Fri Apr 17 15:28:24 2020 <5> ulogd.c:408 registering plugin `PCAP'
Fri Apr 17 15:28:24 2020 <5> ulogd.c:981 building new pluginstance stack: 'log1:NFLOG,base1:BASE,pcap1:PCAP'
WARNING: scheduler configuration failed: Function not implemented
DEBU: 2020/04/17 15:28:24.887619 Got list of ipsets: []
ERROR: logging before flag.Parse: E0417 15:28:54.923915 19321 reflector.go:205] github.com/weaveworks/weave/prog/weave-npc/main.go:321: Failed to list *v1.Pod: Get https://192.168.244.1:443/api/v1/pods?limit=500&resourceVersion=0: dial
tcp 192.168.244.1:443: i/o timeout
ERROR: logging before flag.Parse: E0417 15:28:54.923895 19321 reflector.go:205] github.com/weaveworks/weave/prog/weave-npc/main.go:322: Failed to list *v1.NetworkPolicy: Get https://192.168.244.1:443/apis/networking.k8s.io/v1/networkpo
licies?limit=500&resourceVersion=0: dial tcp 192.168.244.1:443: i/o timeout
ERROR: logging before flag.Parse: E0417 15:28:54.924071 19321 reflector.go:205] github.com/weaveworks/weave/prog/weave-npc/main.go:320: Failed to list *v1.Namespace: Get https://192.168.244.1:443/api/v1/namespaces?limit=500&resourceVer
sion=0: dial tcp 192.168.244.1:443: i/o timeout
Any advice or suggestions?
Regards.

The error was in a misconfiguration with CRI-O as CRI runtime. Follow this installation guide corrects the issue.
https://kubernetes.io/docs/setup/production-environment/container-runtimes/

Related

kube-proxy fails with CrashLoopBackOff in minikube

I am running k8s using minikube version v1.18.0 on Ubuntu 20. But kube-proxy fails with CrashLoopBackOff status. What could be the issue? I am using 1.18 version for k8s server and client with minikube. I tried reinstalling the cluster but the issue is persistent. logs are attached for get pod and describe pod.
/home/ravi> kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system coredns-66bff467f8-gtsl7 0/1 Running 0 10m 172.17.0.2 minikube <none> <none>
kube-system etcd-minikube 1/1 Running 0 10m 192.168.49.2 minikube <none> <none>
kube-system kube-apiserver-minikube 1/1 Running 0 10m 192.168.49.2 minikube <none> <none>
kube-system kube-controller-manager-minikube 1/1 Running 0 10m 192.168.49.2 minikube <none> <none>
kube-system kube-proxy-d5dqf 0/1 CrashLoopBackOff 6 10m 192.168.49.2 minikube <none> <none>
kube-system kube-scheduler-minikube 1/1 Running 0 10m 192.168.49.2 minikube <none> <none>
kube-system storage-provisioner 0/1 CrashLoopBackOff 6 11m 192.168.49.2 minikube <none> <none>
/home/ravi>
/home/ravi>
/home/ravi>kubectl describe pod -n kube-system kube-proxy-d5dqf
Name: kube-proxy-d5dqf
Namespace: kube-system
Priority: 2000001000
Priority Class Name: system-node-critical
Node: minikube/192.168.49.2
Start Time: Fri, 13 May 2022 14:08:26 +0530
Labels: controller-revision-hash=5bdc57b48f
k8s-app=kube-proxy
pod-template-generation=1
Annotations: <none>
Status: Running
IP: 192.168.49.2
IPs:
IP: 192.168.49.2
Controlled By: DaemonSet/kube-proxy
Containers:
kube-proxy:
Container ID: docker://dc1554848405254d1ba463fc4ae2ec98fb7e2db1472d3b143b71256dcb7812c3
Image: k8s.gcr.io/kube-proxy:v1.18.20
Image ID: docker://sha256:27f8b8d51985f755cfb3ffea424fa34865cc0da63e99378d8202f923c3c5a8ba
Port: <none>
Host Port: <none>
Command:
/usr/local/bin/kube-proxy
--config=/var/lib/kube-proxy/config.conf
--hostname-override=$(NODE_NAME)
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Fri, 13 May 2022 14:19:30 +0530
Finished: Fri, 13 May 2022 14:19:30 +0530
Ready: False
Restart Count: 7
Environment:
NODE_NAME: (v1:spec.nodeName)
Mounts:
/lib/modules from lib-modules (ro)
/run/xtables.lock from xtables-lock (rw)
/var/lib/kube-proxy from kube-proxy (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-proxy-token-z2nb4 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-proxy:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: kube-proxy
Optional: false
xtables-lock:
Type: HostPath (bare host directory volume)
Path: /run/xtables.lock
HostPathType: FileOrCreate
lib-modules:
Type: HostPath (bare host directory volume)
Path: /lib/modules
HostPathType:
kube-proxy-token-z2nb4:
Type: Secret (a volume populated by a Secret)
SecretName: kube-proxy-token-z2nb4
Optional: false
QoS Class: BestEffort
Node-Selectors: kubernetes.io/os=linux
Tolerations:
CriticalAddonsOnly
node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/network-unavailable:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/pid-pressure:NoSchedule
node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unschedulable:NoSchedule
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 15m default-scheduler Successfully assigned kube-system/kube-proxy-d5dqf to minikube
Normal Pulled 13m (x5 over 15m) kubelet, minikube Container image "k8s.gcr.io/kube-proxy:v1.18.20" already present on machine
Normal Created 13m (x5 over 15m) kubelet, minikube Created container kube-proxy
Normal Started 13m (x5 over 15m) kubelet, minikube Started container kube-proxy
Warning BackOff 5m22s (x49 over 15m) kubelet, minikube Back-off restarting failed container
/home/ravi>
/home/ravi>kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.0", GitCommit:"9e991415386e4cf155a24b1da15becaa390438d8", GitTreeState:"clean", BuildDate:"2020-03-25T14:58:59Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.20", GitCommit:"1f3e19b7beb1cc0110255668c4238ed63dadb7ad", GitTreeState:"clean", BuildDate:"2021-06-16T12:51:17Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
/home/ravi>
/home/ravi/>kubectl logs -n kube-system kube-proxy-d5dqf
W0513 10:01:01.652298 1 server_others.go:559] Unknown proxy mode "", assuming iptables proxy
I0513 10:01:01.673427 1 node.go:136] Successfully retrieved node IP: 192.168.49.2
I0513 10:01:01.673482 1 server_others.go:186] Using iptables Proxier.
I0513 10:01:01.673968 1 server.go:583] Version: v1.18.20
I0513 10:01:01.674955 1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
F0513 10:01:01.675040 1 server.go:497] open /proc/sys/net/netfilter/nf_conntrack_max: permission denied
/home/ravi>

How to resolve this error that nginx-ingress-controller start fail in my k8s cluster?

Rancher v2.4.2
kubernetes version: v1.17.4
In my k8s cluster,nginx-ingress-controller doesn't work and restart always.I don't get anything useful information in the logs, thanks for your help.
cluster nodes:
> kubectl get nodes
NAME STATUS ROLES AGE VERSION
master1 Ready controlplane,etcd,worker 18d v1.17.4
master2 Ready controlplane,etcd,worker 17d v1.17.4
node1 Ready worker 17d v1.17.4
node2 Ready worker 17d v1.17.4
cluster pods in ingress-nginx namespace
> kubectl get pods -n ingress-nginx
NAME READY STATUS RESTARTS AGE
default-http-backend-5bb77998d7-k7gdh 1/1 Running 1 17d
nginx-ingress-controller-6l4jh 0/1 Running 10 27m
nginx-ingress-controller-bh2pg 1/1 Running 0 63m
nginx-ingress-controller-drtzx 1/1 Running 0 63m
nginx-ingress-controller-qndbw 1/1 Running 0 63m
the pod logs of nginx-ingress-controller-6l4jh
> kubectl logs nginx-ingress-controller-6l4jh -n ingress-nginx
-------------------------------------------------------------------------------
NGINX Ingress controller
Release: nginx-0.25.1-rancher1
Build:
Repository: https://github.com/rancher/ingress-nginx.git
nginx version: openresty/1.15.8.1
-------------------------------------------------------------------------------
>
describe info
> kubectl describe pod nginx-ingress-controller-6l4jh -n ingress-nginx
Name: nginx-ingress-controller-6l4jh
Namespace: ingress-nginx
Priority: 0
Node: node2/172.26.13.11
Start Time: Tue, 19 Apr 2022 07:12:16 +0000
Labels: app=ingress-nginx
controller-revision-hash=758cb9dbbc
pod-template-generation=8
Annotations: cattle.io/timestamp: 2022-04-19T07:08:51Z
field.cattle.io/ports:
[[{"containerPort":80,"dnsName":"nginx-ingress-controller-hostport","hostPort":80,"kind":"HostPort","name":"http","protocol":"TCP","source...
field.cattle.io/publicEndpoints:
[{"addresses":["172.26.13.130"],"nodeId":"c-wv692:m-d5802d05bbf0","port":80,"protocol":"TCP"},{"addresses":["172.26.13.130"],"nodeId":"c-w...
prometheus.io/port: 10254
prometheus.io/scrape: true
Status: Running
IP: 172.26.13.11
IPs:
IP: 172.26.13.11
Controlled By: DaemonSet/nginx-ingress-controller
Containers:
nginx-ingress-controller:
Container ID: docker://09a6248edb921b9c9cbab678c793fe1cc3d28322ea6abbb8f15c899351ce4b40
Image: 172.26.13.133:5000/rancher/nginx-ingress-controller:nginx-0.25.1-rancher1
Image ID: docker-pullable://172.26.13.133:5000/rancher/nginx-ingress-controller#sha256:fe50ceea3d1a0bc9a7ccef8d5845c9a30b51f608e411467862dff590185a47d2
Ports: 80/TCP, 443/TCP
Host Ports: 80/TCP, 443/TCP
Args:
/nginx-ingress-controller
--default-backend-service=$(POD_NAMESPACE)/default-http-backend
--configmap=$(POD_NAMESPACE)/nginx-configuration
--tcp-services-configmap=$(POD_NAMESPACE)/tcp-services
--udp-services-configmap=$(POD_NAMESPACE)/udp-services
--annotations-prefix=nginx.ingress.kubernetes.io
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 143
Started: Tue, 19 Apr 2022 07:40:12 +0000
Finished: Tue, 19 Apr 2022 07:41:32 +0000
Ready: False
Restart Count: 11
Liveness: http-get http://:10254/healthz delay=60s timeout=20s period=10s #success=1 #failure=3
Readiness: http-get http://:10254/healthz delay=60s timeout=20s period=10s #success=1 #failure=3
Environment:
POD_NAME: nginx-ingress-controller-6l4jh (v1:metadata.name)
POD_NAMESPACE: ingress-nginx (v1:metadata.namespace)
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from nginx-ingress-serviceaccount-token-2kdbj (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
nginx-ingress-serviceaccount-token-2kdbj:
Type: Secret (a volume populated by a Secret)
SecretName: nginx-ingress-serviceaccount-token-2kdbj
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: :NoExecute
:NoSchedule
node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/network-unavailable:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/pid-pressure:NoSchedule
node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unschedulable:NoSchedule
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned ingress-nginx/nginx-ingress-controller-6l4jh to node2
Normal Pulled 27m (x3 over 30m) kubelet, node2 Container image "172.26.13.133:5000/rancher/nginx-ingress-controller:nginx-0.25.1-rancher1" already present on machine
Normal Created 27m (x3 over 30m) kubelet, node2 Created container nginx-ingress-controller
Normal Started 27m (x3 over 30m) kubelet, node2 Started container nginx-ingress-controller
Normal Killing 27m (x2 over 28m) kubelet, node2 Container nginx-ingress-controller failed liveness probe, will be restarted
Warning Unhealthy 25m (x10 over 29m) kubelet, node2 Liveness probe failed: Get http://172.26.13.11:10254/healthz: dial tcp 172.26.13.11:10254: connect: connection refused
Warning Unhealthy 10m (x21 over 29m) kubelet, node2 Readiness probe failed: Get http://172.26.13.11:10254/healthz: dial tcp 172.26.13.11:10254: connect: connection refused
Warning BackOff 8s (x69 over 20m) kubelet, node2 Back-off restarting failed container
>
It sounds like the ingress controller pod fails the liveness/readiness checks but looks like only on a certain node. You could try:
check the node for firewall on that port
update to newer version than nginx-0.25.1

Kubernetes Dashboard CrashLoopBackOff: timeout error on Raspberry Pi cluster

Should be a simple task, I simply want to run the Kubernetes Dashboard on a clean install of Kubernetes on a Raspberry Pi cluster.
What I've done:
Setup the initial cluster (hostname, static ip, cgroup, swapspace, install and configure docker, install kubernetes, setup kubernetes network and join nodes)
I have flannel installed
I have applied the dashboard
Bunch of random testing trying to figure this out
Obviously, as seen below, the container in the dashboard pod is not working because it cannot access kubernetes-dashboard-csrf. I have no idea why this cannot be accessed, my only thought is that I missed a step when setting up the cluster. I've followed about 6 different guides without success, prioritizing the official guide. I have also seen quite a few people having the same or similar issues that most have not posted a resolution. Thanks!
Nodes: kubectl get nodes
NAME STATUS ROLES AGE VERSION
gus3 Ready <none> 346d v1.23.1
juliet3 Ready <none> 346d v1.23.1
shawn4 Ready <none> 346d v1.23.1
vick4 Ready control-plane,master 346d v1.23.1
All Pods: kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-74ff55c5b-7j2xg 1/1 Running 27 346d
kube-system coredns-74ff55c5b-cb2x8 1/1 Running 27 346d
kube-system etcd-vick4 1/1 Running 2 169m
kube-system kube-apiserver-vick4 1/1 Running 2 169m
kube-system kube-controller-manager-vick4 1/1 Running 2 169m
kube-system kube-flannel-ds-gclmp 1/1 Running 0 11m
kube-system kube-flannel-ds-hshjv 1/1 Running 0 12m
kube-system kube-flannel-ds-kdd4w 1/1 Running 0 11m
kube-system kube-flannel-ds-wzhkt 1/1 Running 0 10m
kube-system kube-proxy-4t25v 1/1 Running 26 346d
kube-system kube-proxy-b6vbx 1/1 Running 26 346d
kube-system kube-proxy-jgj4s 1/1 Running 27 346d
kube-system kube-proxy-n65sl 1/1 Running 26 346d
kube-system kube-scheduler-vick4 1/1 Running 2 169m
kubernetes-dashboard dashboard-metrics-scraper-5b8896d7fc-99wfk 1/1 Running 0 77m
kubernetes-dashboard kubernetes-dashboard-897c7599f-qss5p 0/1 CrashLoopBackOff 18 77m
Resources: kubectl get all -n kubernetes-dashboard
NAME READY STATUS RESTARTS AGE
pod/dashboard-metrics-scraper-5b8896d7fc-99wfk 1/1 Running 0 79m
pod/kubernetes-dashboard-897c7599f-qss5p 0/1 CrashLoopBackOff 19 79m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/dashboard-metrics-scraper ClusterIP 172.20.0.191 <none> 8000/TCP 79m
service/kubernetes-dashboard ClusterIP 172.20.0.15 <none> 443/TCP 79m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/dashboard-metrics-scraper 1/1 1 1 79m
deployment.apps/kubernetes-dashboard 0/1 1 0 79m
NAME DESIRED CURRENT READY AGE
replicaset.apps/dashboard-metrics-scraper-5b8896d7fc 1 1 1 79m
replicaset.apps/kubernetes-dashboard-897c7599f 1 1 0 79m
Notice CrashLoopBackOff
Pod Details: kubectl describe pods kubernetes-dashboard-897c7599f-qss5p -n kubernetes-dashboard
Name: kubernetes-dashboard-897c7599f-qss5p
Namespace: kubernetes-dashboard
Priority: 0
Node: shawn4/192.168.10.71
Start Time: Fri, 17 Dec 2021 18:52:15 +0000
Labels: k8s-app=kubernetes-dashboard
pod-template-hash=897c7599f
Annotations: <none>
Status: Running
IP: 172.19.1.75
IPs:
IP: 172.19.1.75
Controlled By: ReplicaSet/kubernetes-dashboard-897c7599f
Containers:
kubernetes-dashboard:
Container ID: docker://894a354e40ca1a95885e149dcd75415e0f186ead3f2e05ec0787f4b1c7a29622
Image: kubernetesui/dashboard:v2.4.0
Image ID: docker-pullable://kubernetesui/dashboard#sha256:526850ae4ea9aba360e72b6df69fd3126b129d446efe83ac5250282b85f95b7f
Port: 8443/TCP
Host Port: 0/TCP
Args:
--auto-generate-certificates
--namespace=kubernetes-dashboard
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Fri, 17 Dec 2021 20:10:19 +0000
Finished: Fri, 17 Dec 2021 20:10:49 +0000
Ready: False
Restart Count: 19
Liveness: http-get https://:8443/ delay=30s timeout=30s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/certs from kubernetes-dashboard-certs (rw)
/tmp from tmp-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kubernetes-dashboard-token-wq9m8 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kubernetes-dashboard-certs:
Type: Secret (a volume populated by a Secret)
SecretName: kubernetes-dashboard-certs
Optional: false
tmp-volume:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kubernetes-dashboard-token-wq9m8:
Type: Secret (a volume populated by a Secret)
SecretName: kubernetes-dashboard-token-wq9m8
Optional: false
QoS Class: BestEffort
Node-Selectors: kubernetes.io/os=linux
Tolerations: node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 21s (x327 over 79m) kubelet Back-off restarting failed container
Logs: kubectl logs -f -n kubernetes-dashboard kubernetes-dashboard-897c7599f-qss5p
2021/12/17 20:10:19 Starting overwatch
2021/12/17 20:10:19 Using namespace: kubernetes-dashboard
2021/12/17 20:10:19 Using in-cluster config to connect to apiserver
2021/12/17 20:10:19 Using secret token for csrf signing
2021/12/17 20:10:19 Initializing csrf token from kubernetes-dashboard-csrf secret
panic: Get "https://172.20.0.1:443/api/v1/namespaces/kubernetes-dashboard/secrets/kubernetes-dashboard-csrf": dial tcp 172.20.0.1:443: i/o timeout
goroutine 1 [running]:
github.com/kubernetes/dashboard/src/app/backend/client/csrf.(*csrfTokenManager).init(0x400055fae8)
/home/runner/work/dashboard/dashboard/src/app/backend/client/csrf/manager.go:41 +0x350
github.com/kubernetes/dashboard/src/app/backend/client/csrf.NewCsrfTokenManager(...)
/home/runner/work/dashboard/dashboard/src/app/backend/client/csrf/manager.go:66
github.com/kubernetes/dashboard/src/app/backend/client.(*clientManager).initCSRFKey(0x40001fc080)
/home/runner/work/dashboard/dashboard/src/app/backend/client/manager.go:502 +0x8c
github.com/kubernetes/dashboard/src/app/backend/client.(*clientManager).init(0x40001fc080)
/home/runner/work/dashboard/dashboard/src/app/backend/client/manager.go:470 +0x40
github.com/kubernetes/dashboard/src/app/backend/client.NewClientManager(...)
/home/runner/work/dashboard/dashboard/src/app/backend/client/manager.go:551
main.main()
/home/runner/work/dashboard/dashboard/src/app/backend/dashboard.go:95 +0x1dc
If you need any more information please ask!
UPDATE 12/29/21:
Fixed this issue by reinstalling the cluster to the newest versions of Kubernetes and Ubuntu.
Turned out there were several issues:
I was using Ubuntu Buster which is deprecated.
My client/server Kubernetes versions were +/-0.3 out of sync
I was following outdated instructions
I reinstalled the whole cluster following Kubernetes official guide and, with a few snags along the way, it works!

JupyterHub not launching on Helm | K8s

I have a metalLB loadbalancer, k8s clusters (one master and one worker) v1.18.5, helm 3.7, and nfs dynamic volume provisioning using helm. I run up a jupyterhub instance with helm. Within a minute everything is set up but when I use the external IP to open JupyterHub on my browser, noting loads up. here is my kubectl get all
pod/continuous-image-puller-4l5gj 1/1 Running 0 23s
pod/hub-6c9cb48df8-k5t4w 1/1 Running 0 23s
pod/nfs-subdir-external-provisioner-789697969b-hqp46 1/1 Running 0 23h
pod/nginx2-669c86457c-hc5mv 1/1 Running 0 35h
pod/proxy-66cb767659-svwbv 1/1 Running 0 23s
pod/user-scheduler-6d4698dd59-wqw9l 1/1 Running 0 23s
pod/user-scheduler-6d4698dd59-zk4c7 1/1 Running 0 23s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/hub ClusterIP 10.111.196.55 <none> 8081/TCP 23s
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 39h
service/nginx2 LoadBalancer 10.106.241.85 10.0.3.240 80:30746/TCP 32h
service/proxy-api ClusterIP 10.109.211.71 <none> 8001/TCP 23s
service/proxy-public LoadBalancer 10.111.233.85 10.0.3.241 80:31336/TCP 23s
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/continuous-image-puller 1 1 1 1 1 <none> 23s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/hub 1/1 1 1 23s
deployment.apps/nfs-subdir-external-provisioner 1/1 1 1 23h
deployment.apps/nginx2 1/1 1 1 35h
deployment.apps/proxy 1/1 1 1 23s
deployment.apps/user-scheduler 2/2 2 2 23s
NAME DESIRED CURRENT READY AGE
replicaset.apps/hub-6c9cb48df8 1 1 1 23s
replicaset.apps/nfs-subdir-external-provisioner-789697969b 1 1 1 23h
replicaset.apps/nginx2-669c86457c 1 1 1 35h
replicaset.apps/proxy-66cb767659 1 1 1 23s
replicaset.apps/user-scheduler-6d4698dd59 2 2 2 23s
NAME READY AGE
statefulset.apps/user-placeholder 0/0 23s
Also, below is my storage class for reference: kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
nfs-client cluster.local/nfs-subdir-external-provisioner Delete Immediate true 23h
I will not paste the config file as it is very large, basically what I did was
helm show values jupyterhub/jupyterhub > /tmp/jupyterhub.yaml
(after changing some values)
helm install jupyterhub jupyterhub/jupyterhub --values /tmp/jupyterhub.yaml
The only few things I changed was the security-key (hex [as mentioned on the website]) along with writing nfs-client wherever it said storageClass and storageClassName and perhaps altering the storage size (1Gi/2Gi). That's all. The LoadBalancer works fine because I ran nginx and I can easily open it up on my browser. So I decided to check the JupyterHub pod's by first getting the pod's name using: kubectl get pods
NAME READY STATUS RESTARTS AGE
continuous-image-puller-4l5gj 1/1 Running 0 20m
hub-6c9cb48df8-k5t4w 1/1 Running 0 20m
nfs-subdir-external-provisioner-789697969b-hqp46 1/1 Running 0 23h
nginx2-669c86457c-hc5mv 1/1 Running 0 35h
proxy-66cb767659-svwbv 1/1 Running 0 20m
user-scheduler-6d4698dd59-wqw9l 1/1 Running 0 20m
user-scheduler-6d4698dd59-zk4c7 1/1 Running 0 20m
root#master:/home/ubuntu#
and then using kubectl describe pod hub-6c9cb48df8-k5t4w -n default which gave me this:
Name: hub-6c9cb48df8-k5t4w
Namespace: default
Priority: 0
Node: worker/10.0.0.126
Start Time: Sat, 27 Nov 2021 10:21:43 +0000
Labels: app=jupyterhub
component=hub
hub.jupyter.org/network-access-proxy-api=true
hub.jupyter.org/network-access-proxy-http=true
hub.jupyter.org/network-access-singleuser=true
pod-template-hash=6c9cb48df8
release=jupyterhub
Annotations: checksum/config-map: f746d7e563a064e9158fe6f7f59bdbd463ed24ad7a927d75a1f18c022c3afeaf
checksum/secret: 926186a1b18e5cb9aa5b8c0a177f379299bcf0f05ac4de17d1958422054d15e5
cni.projectcalico.org/podIP: 192.168.171.97/32
cni.projectcalico.org/podIPs: 192.168.171.97/32
Status: Running
IP: 192.168.171.97
IPs:
IP: 192.168.171.97
Controlled By: ReplicaSet/hub-6c9cb48df8
Containers:
hub:
Container ID: docker://1d5e3a812f9712f6d59c09d855b034e2f6bc3e058bad4932db87145ec09f70d1
Image: jupyterhub/k8s-hub:1.2.0
Image ID: docker-pullable://jupyterhub/k8s-hub#sha256:e4770285aaf7230b930643986221757c2cc2e9420f5e21ac892582c96a57ce1c
Port: 8081/TCP
Host Port: 0/TCP
Args:
jupyterhub
--config
/usr/local/etc/jupyterhub/jupyterhub_config.py
--upgrade-db
State: Running
Started: Sat, 27 Nov 2021 10:21:45 +0000
Ready: True
Restart Count: 0
Liveness: http-get http://:http/hub/health delay=300s timeout=3s period=10s #success=1 #failure=30
Readiness: http-get http://:http/hub/health delay=0s timeout=1s period=2s #success=1 #failure=1000
Environment:
PYTHONUNBUFFERED: 1
HELM_RELEASE_NAME: jupyterhub
POD_NAMESPACE: default (v1:metadata.namespace)
CONFIGPROXY_AUTH_TOKEN: <set to the key 'hub.config.ConfigurableHTTPProxy.auth_token' in secret 'hub'> Optional: false
Mounts:
/srv/jupyterhub from pvc (rw)
/usr/local/etc/jupyterhub/config/ from config (rw)
/usr/local/etc/jupyterhub/jupyterhub_config.py from config (rw,path="jupyterhub_config.py")
/usr/local/etc/jupyterhub/secret/ from secret (rw)
/usr/local/etc/jupyterhub/z2jh.py from config (rw,path="z2jh.py")
/var/run/secrets/kubernetes.io/serviceaccount from hub-token-zd25x (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: hub
Optional: false
secret:
Type: Secret (a volume populated by a Secret)
SecretName: hub
Optional: false
pvc:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: hub-db-dir
ReadOnly: false
hub-token-zd25x:
Type: Secret (a volume populated by a Secret)
SecretName: hub-token-zd25x
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: hub.jupyter.org/dedicated=core:NoSchedule
hub.jupyter.org_dedicated=core:NoSchedule
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 21m default-scheduler Successfully assigned default/hub-6c9cb48df8-k5t4w to worker
Normal Pulled 21m kubelet, worker Container image "jupyterhub/k8s-hub:1.2.0" already present on machine
Normal Created 21m kubelet, worker Created container hub
Normal Started 21m kubelet, worker Started container hub
Warning Unhealthy 21m (x3 over 21m) kubelet, worker Readiness probe failed: Get http://192.168.171.97:8081/hub/health: dial tcp 192.168.171.97:8081: connect: connection refused
So I know that the pod is unhealthy. But I do not have any other details to debug this. Any help on how to fix or debug this would be highly appreciated.
Thank you!

How to solve CoreDNS always stuck at "waiting for kubernetes"?

Vagrant, vm os: ubuntu/bionic64, swap disabled
Kubernetes version: 1.18.0
infrastructure: 1 haproxy node, 3 external etcd node and 3 kubernetes master node
Attempts: trying to setup ha rancher so I am setting up ha kubernetes cluster first using kubeadm by following the official doc
Expected behavior: all k8s components are up and be able to navigate to weave scope to see all nodes
Actual behavior: CoreDNS is still not ready even after installing CNI (Weave Net) so weave scope (the nice visualization ui) is not working unless networking is working properly (weave net and coredns).
# kubeadm config
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: stable
controlPlaneEndpoint: "172.16.0.30:6443"
etcd:
external:
caFile: /etc/rancher-certs/ca-chain.cert.pem
keyFile: /etc/rancher-certs/etcd.key.pem
certFile: /etc/rancher-certs/etcd.cert.pem
endpoints:
- https://172.16.0.20:2379
- https://172.16.0.21:2379
- https://172.16.0.22:2379
-------------------------------------------------------------------------------
# firewall
vagrant#rancher-0:~$ sudo ufw status
Status: active
To Action From
-- ------ ----
OpenSSH ALLOW Anywhere
Anywhere ALLOW 172.16.0.0/26
OpenSSH (v6) ALLOW Anywhere (v6)
-------------------------------------------------------------------------------
# no swap
vagrant#rancher-0:~$ free -h
total used free shared buff/cache available
Mem: 1.9G 928M 97M 1.4M 966M 1.1G
Swap: 0B 0B 0B
k8s diagnostic output:
vagrant#rancher-0:~$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
rancher-0 Ready master 14m v1.18.0 10.0.2.15 <none> Ubuntu 18.04.4 LTS 4.15.0-99-generic docker://19.3.12
rancher-1 Ready master 9m23s v1.18.0 10.0.2.15 <none> Ubuntu 18.04.4 LTS 4.15.0-99-generic docker://19.3.12
rancher-2 Ready master 4m26s v1.18.0 10.0.2.15 <none> Ubuntu 18.04.4 LTS 4.15.0-99-generic docker://19.3.12
vagrant#rancher-0:~$ kubectl get services --all-namespaces
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
cert-manager cert-manager ClusterIP 10.106.146.236 <none> 9402/TCP 17m
cert-manager cert-manager-webhook ClusterIP 10.102.162.87 <none> 443/TCP 17m
default kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 18m
kube-system kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 18m
weave weave-scope-app NodePort 10.96.110.153 <none> 80:30276/TCP 17m
vagrant#rancher-0:~$ kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cert-manager cert-manager-bd9d585bd-x8qpb 0/1 Pending 0 16m <none> <none> <none> <none>
cert-manager cert-manager-cainjector-76c6657c55-d8fpj 0/1 Pending 0 16m <none> <none> <none> <none>
cert-manager cert-manager-webhook-64b9b4fdfd-sspjx 0/1 Pending 0 16m <none> <none> <none> <none>
kube-system coredns-66bff467f8-9z4f8 0/1 Running 0 10m 10.32.0.2 rancher-1 <none> <none>
kube-system coredns-66bff467f8-zkk99 0/1 Running 0 16m 10.32.0.2 rancher-0 <none> <none>
kube-system kube-apiserver-rancher-0 1/1 Running 0 16m 10.0.2.15 rancher-0 <none> <none>
kube-system kube-apiserver-rancher-1 1/1 Running 0 12m 10.0.2.15 rancher-1 <none> <none>
kube-system kube-apiserver-rancher-2 1/1 Running 0 7m23s 10.0.2.15 rancher-2 <none> <none>
kube-system kube-controller-manager-rancher-0 1/1 Running 0 16m 10.0.2.15 rancher-0 <none> <none>
kube-system kube-controller-manager-rancher-1 1/1 Running 0 12m 10.0.2.15 rancher-1 <none> <none>
kube-system kube-controller-manager-rancher-2 1/1 Running 0 7m24s 10.0.2.15 rancher-2 <none> <none>
kube-system kube-proxy-grts7 1/1 Running 0 12m 10.0.2.15 rancher-1 <none> <none>
kube-system kube-proxy-jv9lm 1/1 Running 0 16m 10.0.2.15 rancher-0 <none> <none>
kube-system kube-proxy-z2lrc 1/1 Running 0 7m25s 10.0.2.15 rancher-2 <none> <none>
kube-system kube-scheduler-rancher-0 1/1 Running 0 16m 10.0.2.15 rancher-0 <none> <none>
kube-system kube-scheduler-rancher-1 1/1 Running 0 12m 10.0.2.15 rancher-1 <none> <none>
kube-system kube-scheduler-rancher-2 1/1 Running 0 7m23s 10.0.2.15 rancher-2 <none> <none>
kube-system weave-net-nnvkd 2/2 Running 0 7m25s 10.0.2.15 rancher-2 <none> <none>
kube-system weave-net-pgxnq 2/2 Running 0 12m 10.0.2.15 rancher-1 <none> <none>
kube-system weave-net-q22bh 2/2 Running 0 16m 10.0.2.15 rancher-0 <none> <none>
weave weave-scope-agent-9gwj2 1/1 Running 0 16m 10.0.2.15 rancher-0 <none> <none>
weave weave-scope-agent-mznp7 1/1 Running 0 7m25s 10.0.2.15 rancher-2 <none> <none>
weave weave-scope-agent-v7jql 1/1 Running 0 12m 10.0.2.15 rancher-1 <none> <none>
weave weave-scope-app-bc7444d59-cjpd8 0/1 Pending 0 16m <none> <none> <none> <none>
weave weave-scope-cluster-agent-5c5dcc8cb-ln4hg 0/1 Pending 0 16m <none> <none> <none> <none>
vagrant#rancher-0:~$ kubectl describe node rancher-0
Name: rancher-0
Roles: master
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=rancher-0
kubernetes.io/os=linux
node-role.kubernetes.io/master=
Annotations: kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Tue, 28 Jul 2020 09:24:17 +0000
Taints: node-role.kubernetes.io/master:NoSchedule
Unschedulable: false
Lease:
HolderIdentity: rancher-0
AcquireTime: <unset>
RenewTime: Tue, 28 Jul 2020 09:35:33 +0000
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Tue, 28 Jul 2020 09:24:47 +0000 Tue, 28 Jul 2020 09:24:47 +0000 WeaveIsUp Weave pod has set this
MemoryPressure False Tue, 28 Jul 2020 09:35:26 +0000 Tue, 28 Jul 2020 09:24:17 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Tue, 28 Jul 2020 09:35:26 +0000 Tue, 28 Jul 2020 09:24:17 +0000 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Tue, 28 Jul 2020 09:35:26 +0000 Tue, 28 Jul 2020 09:24:17 +0000 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Tue, 28 Jul 2020 09:35:26 +0000 Tue, 28 Jul 2020 09:24:52 +0000 KubeletReady kubelet is posting ready status. AppArmor enabled
Addresses:
InternalIP: 10.0.2.15
Hostname: rancher-0
Capacity:
cpu: 2
ephemeral-storage: 10098432Ki
hugepages-2Mi: 0
memory: 2040812Ki
pods: 110
Allocatable:
cpu: 2
ephemeral-storage: 9306714916
hugepages-2Mi: 0
memory: 1938412Ki
pods: 110
System Info:
Machine ID: 9b1bc8a8ef2c4e5b844624a36302d877
System UUID: A282600C-28F8-4D49-A9D3-6F05CA16865E
Boot ID: 77746bf5-7941-4e72-817e-24f149172158
Kernel Version: 4.15.0-99-generic
OS Image: Ubuntu 18.04.4 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://19.3.12
Kubelet Version: v1.18.0
Kube-Proxy Version: v1.18.0
Non-terminated Pods: (7 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system coredns-66bff467f8-zkk99 100m (5%) 0 (0%) 70Mi (3%) 170Mi (8%) 11m
kube-system kube-apiserver-rancher-0 250m (12%) 0 (0%) 0 (0%) 0 (0%) 11m
kube-system kube-controller-manager-rancher-0 200m (10%) 0 (0%) 0 (0%) 0 (0%) 11m
kube-system kube-proxy-jv9lm 0 (0%) 0 (0%) 0 (0%) 0 (0%) 11m
kube-system kube-scheduler-rancher-0 100m (5%) 0 (0%) 0 (0%) 0 (0%) 11m
kube-system weave-net-q22bh 20m (1%) 0 (0%) 0 (0%) 0 (0%) 11m
weave weave-scope-agent-9gwj2 100m (5%) 0 (0%) 100Mi (5%) 2000Mi (105%) 11m
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 770m (38%) 0 (0%)
memory 170Mi (8%) 2170Mi (114%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 11m kubelet, rancher-0 Starting kubelet.
Warning ImageGCFailed 11m kubelet, rancher-0 failed to get imageFs info: unable to find data in memory cache
Normal NodeHasSufficientMemory 11m (x3 over 11m) kubelet, rancher-0 Node rancher-0 status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 11m (x3 over 11m) kubelet, rancher-0 Node rancher-0 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 11m (x2 over 11m) kubelet, rancher-0 Node rancher-0 status is now: NodeHasSufficientPID
Normal NodeAllocatableEnforced 11m kubelet, rancher-0 Updated Node Allocatable limit across pods
Normal Starting 11m kubelet, rancher-0 Starting kubelet.
Normal NodeHasSufficientMemory 11m kubelet, rancher-0 Node rancher-0 status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 11m kubelet, rancher-0 Node rancher-0 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 11m kubelet, rancher-0 Node rancher-0 status is now: NodeHasSufficientPID
Normal NodeAllocatableEnforced 11m kubelet, rancher-0 Updated Node Allocatable limit across pods
Normal Starting 11m kube-proxy, rancher-0 Starting kube-proxy.
Normal NodeReady 10m kubelet, rancher-0 Node rancher-0 status is now: NodeReady
vagrant#rancher-0:~$ kubectl exec -n kube-system weave-net-nnvkd -c weave -- /home/weave/weave --local status
Version: 2.6.5 (failed to check latest version - see logs; next check at 2020/07/28 15:27:34)
Service: router
Protocol: weave 1..2
Name: 5a:40:7b:be:35:1d(rancher-2)
Encryption: disabled
PeerDiscovery: enabled
Targets: 0
Connections: 0
Peers: 1
TrustedSubnets: none
Service: ipam
Status: ready
Range: 10.32.0.0/12
DefaultSubnet: 10.32.0.0/12
vagrant#rancher-0:~$ kubectl logs weave-net-nnvkd -c weave -n kube-system
INFO: 2020/07/28 09:34:15.989759 Command line options: map[conn-limit:200 datapath:datapath db-prefix:/weavedb/weave-net docker-api: expect-npc:true host-root:/host http-addr:127.0.0.1:6784 ipalloc-init:consensus=0 ipalloc-range:10.32.0.0/12 metrics-addr:0.0.0.0:6782 name:5a:40:7b:be:35:1d nickname:rancher-2 no-dns:true port:6783]
INFO: 2020/07/28 09:34:15.989792 weave 2.6.5
INFO: 2020/07/28 09:34:16.178429 Bridge type is bridged_fastdp
INFO: 2020/07/28 09:34:16.178451 Communication between peers is unencrypted.
INFO: 2020/07/28 09:34:16.182442 Our name is 5a:40:7b:be:35:1d(rancher-2)
INFO: 2020/07/28 09:34:16.182499 Launch detected - using supplied peer list: []
INFO: 2020/07/28 09:34:16.196598 Checking for pre-existing addresses on weave bridge
INFO: 2020/07/28 09:34:16.204735 [allocator 5a:40:7b:be:35:1d] No valid persisted data
INFO: 2020/07/28 09:34:16.206236 [allocator 5a:40:7b:be:35:1d] Initialising via deferred consensus
INFO: 2020/07/28 09:34:16.206291 Sniffing traffic on datapath (via ODP)
INFO: 2020/07/28 09:34:16.210065 Listening for HTTP control messages on 127.0.0.1:6784
INFO: 2020/07/28 09:34:16.210471 Listening for metrics requests on 0.0.0.0:6782
INFO: 2020/07/28 09:34:16.275523 Error checking version: Get https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=4.15.0-99-generic&flag_kubernetes-cluster-size=0&flag_kubernetes-cluster-uid=aca5a8cc-27ca-4e8f-9964-4cf3971497c6&flag_kubernetes-version=v1.18.6&os=linux&signature=7uMaGpuc3%2F8ZtHqGoHyCnJ5VfOJUmnL%2FD6UZSqWYxKA%3D&version=2.6.5: dial tcp: lookup checkpoint-api.weave.works on 10.96.0.10:53: write udp 10.0.2.15:43742->10.96.0.10:53: write: operation not permitted
INFO: 2020/07/28 09:34:17.052454 [kube-peers] Added myself to peer list &{[{96:cd:5b:7f:65:73 rancher-1} {5a:40:7b:be:35:1d rancher-2}]}
DEBU: 2020/07/28 09:34:17.065599 [kube-peers] Nodes that have disappeared: map[96:cd:5b:7f:65:73:{96:cd:5b:7f:65:73 rancher-1}]
DEBU: 2020/07/28 09:34:17.065836 [kube-peers] Preparing to remove disappeared peer 96:cd:5b:7f:65:73
DEBU: 2020/07/28 09:34:17.079511 [kube-peers] Noting I plan to remove 96:cd:5b:7f:65:73
DEBU: 2020/07/28 09:34:17.095598 weave DELETE to http://127.0.0.1:6784/peer/96:cd:5b:7f:65:73 with map[]
INFO: 2020/07/28 09:34:17.097095 [kube-peers] rmpeer of 96:cd:5b:7f:65:73: 0 IPs taken over from 96:cd:5b:7f:65:73
DEBU: 2020/07/28 09:34:17.644909 [kube-peers] Nodes that have disappeared: map[]
INFO: 2020/07/28 09:34:17.658557 Assuming quorum size of 1
10.32.0.1
DEBU: 2020/07/28 09:34:17.761697 registering for updates for node delete events
vagrant#rancher-0:~$ kubectl logs coredns-66bff467f8-9z4f8 -n kube-system
.:53
[INFO] plugin/reload: Running configuration MD5 = 4e235fcc3696966e76816bcd9034ebc7
CoreDNS-1.6.7
linux/amd64, go1.13.6, da7f65b
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
I0728 09:31:10.764496 1 trace.go:116] Trace[2019727887]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go#v0.17.2/tools/cache/reflector.go:105 (started: 2020-07-28 09:30:40.763691008 +0000 UTC m=+0.308910646) (total time: 30.000692218s):
Trace[2019727887]: [30.000692218s] [30.000692218s] END
E0728 09:31:10.764526 1 reflector.go:153] pkg/mod/k8s.io/client-go#v0.17.2/tools/cache/reflector.go:105: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
I0728 09:31:10.764666 1 trace.go:116] Trace[1427131847]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go#v0.17.2/tools/cache/reflector.go:105 (started: 2020-07-28 09:30:40.761333538 +0000 UTC m=+0.306553222) (total time: 30.00331917s):
Trace[1427131847]: [30.00331917s] [30.00331917s] END
E0728 09:31:10.764673 1 reflector.go:153] pkg/mod/k8s.io/client-go#v0.17.2/tools/cache/reflector.go:105: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
I0728 09:31:10.767435 1 trace.go:116] Trace[939984059]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go#v0.17.2/tools/cache/reflector.go:105 (started: 2020-07-28 09:30:40.762085835 +0000 UTC m=+0.307305485) (total time: 30.005326233s):
Trace[939984059]: [30.005326233s] [30.005326233s] END
E0728 09:31:10.767569 1 reflector.go:153] pkg/mod/k8s.io/client-go#v0.17.2/tools/cache/reflector.go:105: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
...
vagrant#rancher-0:~$ kubectl describe pod coredns-66bff467f8-9z4f8 -n kube-system
Name: coredns-66bff467f8-9z4f8
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: rancher-1/10.0.2.15
Start Time: Tue, 28 Jul 2020 09:30:38 +0000
Labels: k8s-app=kube-dns
pod-template-hash=66bff467f8
Annotations: <none>
Status: Running
IP: 10.32.0.2
IPs:
IP: 10.32.0.2
Controlled By: ReplicaSet/coredns-66bff467f8
Containers:
coredns:
Container ID: docker://899cfd54a5281939dcb09eece96ff3024a3b4c444e982bda74b8334504a6a369
Image: k8s.gcr.io/coredns:1.6.7
Image ID: docker-pullable://k8s.gcr.io/coredns#sha256:2c8d61c46f484d881db43b34d13ca47a269336e576c81cf007ca740fa9ec0800
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
State: Running
Started: Tue, 28 Jul 2020 09:30:40 +0000
Ready: False
Restart Count: 0
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from coredns-token-znl2p (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
coredns-token-znl2p:
Type: Secret (a volume populated by a Secret)
SecretName: coredns-token-znl2p
Optional: false
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: CriticalAddonsOnly
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 28m default-scheduler Successfully assigned kube-system/coredns-66bff467f8-9z4f8 to rancher-1
Normal Pulled 28m kubelet, rancher-1 Container image "k8s.gcr.io/coredns:1.6.7" already present on machine
Normal Created 28m kubelet, rancher-1 Created container coredns
Normal Started 28m kubelet, rancher-1 Started container coredns
Warning Unhealthy 3m35s (x151 over 28m) kubelet, rancher-1 Readiness probe failed: HTTP probe failed with statuscode: 503
Edit 0:
The issue is solved, so the problem was that I configure ufw rule to allow cidr of my vms network but does not allow from kubernetes(from docker containers), so I configure ufw to allow certain ports documented from kubernetes website and ports documented from weave website so now the cluster is working as expected
As #shadowlegend said the issue is solved, so the problem was with configuration ufw rule to allow cidr of vms network but does not allow from kubernetes(from docker containers). Configure ufw to allow certain ports documented from kubernetes website and ports documented from weave website and the cluster will be working as expected.
Take a look: ufw-firewall-kubernetes.
NOTE:
Those same playbook work as expected on google cloud.