kubeadm upgrade to 1.9.1 kube-dns failure - kubernetes

I attempted to upgrade from 1.7 to 1.9 using kubeadm, and kube-dns went into a crash loop. I removed the deployment and applied a new one using the latest YAML for kube-dns (replacing the cluster IP with 10.96.0.10 and the domain with cluster.local).
The kubedns container fails because it cannot get a valid response from the API server. The 10.96.0.1 IP does respond to a wget on port 443 from all servers in the cluster (403 Forbidden response).
E0104 21:51:42.732805 1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:147: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E0104 21:51:42.732971 1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:150: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
Is this a connection issue, configuration issue, or a security model change that is causing the errors in the log?
Thanks.
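One way to narrow down which of those three it is, sketched here under assumptions: the tail of the client error usually hints at the layer that failed. The helper below is hypothetical (classify_dial_error is an illustrative name, not part of kubectl or kube-dns) and only pattern-matches the message text.

```shell
#!/bin/sh
# Hypothetical helper: classify_dial_error is an illustrative name, not a kubectl feature.
# It maps the text of a client error to the layer that most likely failed.
classify_dial_error() {
  case "$1" in
    *"i/o timeout"*)        echo "network-timeout" ;;   # no reply at all: packets dropped or unrouted
    *"connection refused"*) echo "network-refused" ;;   # host reachable, nothing accepting on the port
    *403*|*Forbidden*)      echo "authorization" ;;     # request reached the API server and was rejected
    *)                      echo "unknown" ;;
  esac
}

# The reflector error from the kubedns log:
classify_dial_error "dial tcp 10.96.0.1:443: i/o timeout"
# The wget result seen from the servers:
classify_dial_error "403 forbidden response"
```

Read this way, the kubedns error is a connection-level timeout from inside the pod, while the 403 was obtained from the servers themselves. The two results are not contradictory, because they were measured from different places in the network.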
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ubuntu80 Ready master 165d v1.9.1
ubuntu81 Ready <none> 165d v1.9.1
ubuntu82 Ready <none> 165d v1.9.1
ubuntu83 Ready <none> 163d v1.9.1
$ kubectl get all --namespace=kube-system
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
ds/kube-flannel-ds 4 4 4 0 4 beta.kubernetes.io/arch=amd64 165d
ds/kube-proxy 4 4 4 4 4 <none> 165d
ds/traefik-ingress-controller 3 3 3 3 3 <none> 165d
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deploy/kube-dns 1 1 1 0 1h
deploy/tiller-deploy 1 1 1 1 163d
NAME DESIRED CURRENT READY AGE
rs/kube-dns-6c857864fb 1 1 0 1h
rs/tiller-deploy-3341511835 1 1 1 105d
NAME READY STATUS RESTARTS AGE
po/etcd-ubuntu80 1/1 Running 1 16d
po/kube-apiserver-ubuntu80 1/1 Running 1 2h
po/kube-controller-manager-ubuntu80 1/1 Running 1 2h
po/kube-dns-6c857864fb-grhxp 1/3 CrashLoopBackOff 52 1h
po/kube-flannel-ds-07npj 2/2 Running 32 165d
po/kube-flannel-ds-169lh 2/2 Running 26 165d
po/kube-flannel-ds-50c56 2/2 Running 27 163d
po/kube-flannel-ds-wkd7j 2/2 Running 29 165d
po/kube-proxy-495n7 1/1 Running 1 2h
po/kube-proxy-9g7d2 1/1 Running 1 2h
po/kube-proxy-d856z 1/1 Running 0 2h
po/kube-proxy-kzmcc 1/1 Running 0 2h
po/kube-scheduler-ubuntu80 1/1 Running 1 2h
po/tiller-deploy-3341511835-m3x26 1/1 Running 2 58d
po/traefik-ingress-controller-51r7d 1/1 Running 4 105d
po/traefik-ingress-controller-sf6lc 1/1 Running 4 105d
po/traefik-ingress-controller-xz1rt 1/1 Running 5 105d
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
svc/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP 1h
svc/kubernetes-dashboard ClusterIP 10.101.112.198 <none> 443/TCP 165d
svc/tiller-deploy ClusterIP 10.98.117.242 <none> 44134/TCP 163d
svc/traefik-web-ui ClusterIP 10.110.215.194 <none> 80/TCP 165d
$ kubectl logs --namespace=kube-system $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name) -c kubedns
I0104 21:51:12.730927 1 dns.go:48] version: 1.14.6-3-gc36cb11
I0104 21:51:12.731643 1 server.go:69] Using configuration read from directory: /kube-dns-config with period 10s
I0104 21:51:12.731673 1 server.go:112] FLAG: --alsologtostderr="false"
I0104 21:51:12.731679 1 server.go:112] FLAG: --config-dir="/kube-dns-config"
I0104 21:51:12.731683 1 server.go:112] FLAG: --config-map=""
I0104 21:51:12.731686 1 server.go:112] FLAG: --config-map-namespace="kube-system"
I0104 21:51:12.731688 1 server.go:112] FLAG: --config-period="10s"
I0104 21:51:12.731693 1 server.go:112] FLAG: --dns-bind-address="0.0.0.0"
I0104 21:51:12.731695 1 server.go:112] FLAG: --dns-port="10053"
I0104 21:51:12.731713 1 server.go:112] FLAG: --domain="cluster.local."
I0104 21:51:12.731717 1 server.go:112] FLAG: --federations=""
I0104 21:51:12.731723 1 server.go:112] FLAG: --healthz-port="8081"
I0104 21:51:12.731726 1 server.go:112] FLAG: --initial-sync-timeout="1m0s"
I0104 21:51:12.731729 1 server.go:112] FLAG: --kube-master-url=""
I0104 21:51:12.731733 1 server.go:112] FLAG: --kubecfg-file=""
I0104 21:51:12.731735 1 server.go:112] FLAG: --log-backtrace-at=":0"
I0104 21:51:12.731740 1 server.go:112] FLAG: --log-dir=""
I0104 21:51:12.731743 1 server.go:112] FLAG: --log-flush-frequency="5s"
I0104 21:51:12.731746 1 server.go:112] FLAG: --logtostderr="true"
I0104 21:51:12.731748 1 server.go:112] FLAG: --nameservers=""
I0104 21:51:12.731751 1 server.go:112] FLAG: --stderrthreshold="2"
I0104 21:51:12.731753 1 server.go:112] FLAG: --v="2"
I0104 21:51:12.731756 1 server.go:112] FLAG: --version="false"
I0104 21:51:12.731761 1 server.go:112] FLAG: --vmodule=""
I0104 21:51:12.731798 1 server.go:194] Starting SkyDNS server (0.0.0.0:10053)
I0104 21:51:12.731979 1 server.go:213] Skydns metrics enabled (/metrics:10055)
I0104 21:51:12.731987 1 dns.go:146] Starting endpointsController
I0104 21:51:12.731991 1 dns.go:149] Starting serviceController
I0104 21:51:12.732457 1 logs.go:41] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]
I0104 21:51:12.732467 1 logs.go:41] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]
I0104 21:51:13.232355 1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0104 21:51:13.732395 1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0104 21:51:14.232389 1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0104 21:51:14.732389 1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0104 21:51:15.232369 1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0104 21:51:42.732629 1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
E0104 21:51:42.732805 1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:147: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E0104 21:51:42.732971 1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:150: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
I0104 21:51:43.232257 1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0104 21:51:51.232379 1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0104 21:51:51.732371 1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0104 21:51:52.232390 1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0104 21:52:11.732376 1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0104 21:52:12.232382 1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
F0104 21:52:12.732377 1 dns.go:167] Timeout waiting for initialization
$ kubectl describe po/kube-dns-6c857864fb-grhxp --namespace=kube-system
Name: kube-dns-6c857864fb-grhxp
Namespace: kube-system
Node: ubuntu82/10.80.82.1
Start Time: Fri, 05 Jan 2018 01:55:48 +0530
Labels: k8s-app=kube-dns
pod-template-hash=2741342096
Annotations: scheduler.alpha.kubernetes.io/critical-pod=
Status: Running
IP: 10.244.2.12
Controlled By: ReplicaSet/kube-dns-6c857864fb
Containers:
kubedns:
Container ID: docker://3daa4233f54fa251abdcdfe73d2e71179356f5da45983d19fe66a3f18bab8d13
Image: gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.7
Image ID: docker-pullable://gcr.io/google_containers/k8s-dns-kube-dns-amd64@sha256:f5bddc71efe905f4e4b96f3ca346414be6d733610c1525b98fff808f93966680
Ports: 10053/UDP, 10053/TCP, 10055/TCP
Args:
--domain=cluster.local.
--dns-port=10053
--config-dir=/kube-dns-config
--v=2
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 255
Started: Fri, 05 Jan 2018 03:21:12 +0530
Finished: Fri, 05 Jan 2018 03:22:12 +0530
Ready: False
Restart Count: 26
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:10054/healthcheck/kubedns delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8081/readiness delay=3s timeout=5s period=10s #success=1 #failure=3
Environment:
PROMETHEUS_PORT: 10055
Mounts:
/kube-dns-config from kube-dns-config (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-dns-token-cpzzw (ro)
dnsmasq:
Container ID: docker://a40a34e6fdf7176ea148fdb1f21d157c5d264e44bd14183ed9d19164a742fb65
Image: gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.7
Image ID: docker-pullable://gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64@sha256:6cfb9f9c2756979013dbd3074e852c2d8ac99652570c5d17d152e0c0eb3321d6
Ports: 53/UDP, 53/TCP
Args:
-v=2
-logtostderr
-configDir=/etc/k8s/dns/dnsmasq-nanny
-restartDnsmasq=true
--
-k
--cache-size=1000
--no-negcache
--log-facility=-
--server=/cluster.local/127.0.0.1#10053
--server=/in-addr.arpa/127.0.0.1#10053
--server=/ip6.arpa/127.0.0.1#10053
State: Running
Started: Fri, 05 Jan 2018 03:24:44 +0530
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Fri, 05 Jan 2018 03:17:33 +0530
Finished: Fri, 05 Jan 2018 03:19:33 +0530
Ready: True
Restart Count: 27
Requests:
cpu: 150m
memory: 20Mi
Liveness: http-get http://:10054/healthcheck/dnsmasq delay=60s timeout=5s period=10s #success=1 #failure=5
Environment: <none>
Mounts:
/etc/k8s/dns/dnsmasq-nanny from kube-dns-config (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-dns-token-cpzzw (ro)
sidecar:
Container ID: docker://c05b33a08344f15b0d1a1e8fee39cc05b6d9de6a24db6d2cd05e92c2706fc03c
Image: gcr.io/google_containers/k8s-dns-sidecar-amd64:1.14.7
Image ID: docker-pullable://gcr.io/google_containers/k8s-dns-sidecar-amd64@sha256:f80f5f9328107dc516d67f7b70054354b9367d31d4946a3bffd3383d83d7efe8
Port: 10054/TCP
Args:
--v=2
--logtostderr
--probe=kubedns,127.0.0.1:10053,kubernetes.default.svc.cluster.local,5,SRV
--probe=dnsmasq,127.0.0.1:53,kubernetes.default.svc.cluster.local,5,SRV
State: Running
Started: Fri, 05 Jan 2018 02:09:25 +0530
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Fri, 05 Jan 2018 01:55:50 +0530
Finished: Fri, 05 Jan 2018 02:08:20 +0530
Ready: True
Restart Count: 1
Requests:
cpu: 10m
memory: 20Mi
Liveness: http-get http://:10054/metrics delay=60s timeout=5s period=10s #success=1 #failure=5
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-dns-token-cpzzw (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
kube-dns-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: kube-dns
Optional: true
kube-dns-token-cpzzw:
Type: Secret (a volume populated by a Secret)
SecretName: kube-dns-token-cpzzw
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: CriticalAddonsOnly
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 46m (x57 over 1h) kubelet, ubuntu82 Readiness probe failed: Get http://10.244.2.12:8081/readiness: dial tcp 10.244.2.12:8081: getsockopt: connection refused
Warning Unhealthy 36m (x42 over 1h) kubelet, ubuntu82 Liveness probe failed: HTTP probe failed with statuscode: 503
Warning BackOff 31m (x162 over 1h) kubelet, ubuntu82 Back-off restarting failed container
Normal Killing 26m (x13 over 1h) kubelet, ubuntu82 Killing container with id docker://dnsmasq:Container failed liveness probe.. Container will be killed and recreated.
Normal SuccessfulMountVolume 21m kubelet, ubuntu82 MountVolume.SetUp succeeded for volume "kube-dns-token-cpzzw"
Normal SuccessfulMountVolume 21m kubelet, ubuntu82 MountVolume.SetUp succeeded for volume "kube-dns-config"
Normal Pulled 21m kubelet, ubuntu82 Container image "gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.7" already present on machine
Normal Started 21m kubelet, ubuntu82 Started container
Normal Created 21m kubelet, ubuntu82 Created container
Normal Started 19m (x2 over 21m) kubelet, ubuntu82 Started container
Normal Created 19m (x2 over 21m) kubelet, ubuntu82 Created container
Normal Pulled 19m (x2 over 21m) kubelet, ubuntu82 Container image "gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.7" already present on machine
Warning Unhealthy 19m (x4 over 20m) kubelet, ubuntu82 Liveness probe failed: HTTP probe failed with statuscode: 503
Warning Unhealthy 16m (x22 over 21m) kubelet, ubuntu82 Readiness probe failed: Get http://10.244.2.12:8081/readiness: dial tcp 10.244.2.12:8081: getsockopt: connection refused
Normal Killing 6m (x6 over 19m) kubelet, ubuntu82 Killing container with id docker://dnsmasq:Container failed liveness probe.. Container will be killed and recreated.
Warning BackOff 1m (x65 over 20m) kubelet, ubuntu82 Back-off restarting failed container

kube-dns 1.14.7 does not work well with Kubernetes 1.9.1. In my case, kubedns was trying to connect to the API server on port 443 rather than on 6443, as configured.
When I changed the image version to 1.14.8 (the newest release on the kube-dns GitHub page), kubedns picked up the API server port properly, and the problem went away:
kubectl edit deploy kube-dns --namespace=kube-system
# change the image version to 1.14.8
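The same image change can be made without opening an editor. The sketch below builds a `kubectl set image` command for the kubedns container and prints it for review rather than running it. The helper name with_tag is hypothetical, the image path is taken from the describe output in the question, and the helper assumes the image reference already carries a tag.

```shell
#!/bin/sh
# Hypothetical helper: with_tag is an illustrative name, not a kubectl feature.
# Replaces the tag (the text after the last ':') of an image reference.
# Assumes the reference already has a tag.
with_tag() {
  image=$1
  new_tag=$2
  echo "${image%:*}:${new_tag}"
}

current="gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.7"
target=$(with_tag "$current" "1.14.8")

# Print the command instead of executing it, so it can be checked first.
echo "kubectl set image deployment/kube-dns kubedns=${target} --namespace=kube-system"
```

The dnsmasq and sidecar containers in the same pod also run 1.14.7 images. Whether they need the same bump is not stated in the answer, so that part is left to the reader.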

Yes, I saw issues with kube-dns 1.14.7 too. Use the latest kube-dns version 1.14.8 in https://github.com/kubernetes/dns/releases by doing:
kubectl edit deploy kube-dns --namespace=kube-system
# change the image version in the "image:" field to 1.14.8
If the issue persists, also run:
kubectl create configmap --namespace=kube-system kube-dns
kubectl delete pod <name of kube-dns pod> --namespace=kube-system
# kube-dns should restart and work now

Related

How do I resolve nginx-ingress-controller failing to start in my k8s cluster?

Rancher v2.4.2
kubernetes version: v1.17.4
In my k8s cluster, one nginx-ingress-controller pod doesn't work and keeps restarting. I can't find any useful information in the logs. Thanks for your help.
cluster nodes:
> kubectl get nodes
NAME STATUS ROLES AGE VERSION
master1 Ready controlplane,etcd,worker 18d v1.17.4
master2 Ready controlplane,etcd,worker 17d v1.17.4
node1 Ready worker 17d v1.17.4
node2 Ready worker 17d v1.17.4
cluster pods in ingress-nginx namespace
> kubectl get pods -n ingress-nginx
NAME READY STATUS RESTARTS AGE
default-http-backend-5bb77998d7-k7gdh 1/1 Running 1 17d
nginx-ingress-controller-6l4jh 0/1 Running 10 27m
nginx-ingress-controller-bh2pg 1/1 Running 0 63m
nginx-ingress-controller-drtzx 1/1 Running 0 63m
nginx-ingress-controller-qndbw 1/1 Running 0 63m
the pod logs of nginx-ingress-controller-6l4jh
> kubectl logs nginx-ingress-controller-6l4jh -n ingress-nginx
-------------------------------------------------------------------------------
NGINX Ingress controller
Release: nginx-0.25.1-rancher1
Build:
Repository: https://github.com/rancher/ingress-nginx.git
nginx version: openresty/1.15.8.1
-------------------------------------------------------------------------------
>
describe info
> kubectl describe pod nginx-ingress-controller-6l4jh -n ingress-nginx
Name: nginx-ingress-controller-6l4jh
Namespace: ingress-nginx
Priority: 0
Node: node2/172.26.13.11
Start Time: Tue, 19 Apr 2022 07:12:16 +0000
Labels: app=ingress-nginx
controller-revision-hash=758cb9dbbc
pod-template-generation=8
Annotations: cattle.io/timestamp: 2022-04-19T07:08:51Z
field.cattle.io/ports:
[[{"containerPort":80,"dnsName":"nginx-ingress-controller-hostport","hostPort":80,"kind":"HostPort","name":"http","protocol":"TCP","source...
field.cattle.io/publicEndpoints:
[{"addresses":["172.26.13.130"],"nodeId":"c-wv692:m-d5802d05bbf0","port":80,"protocol":"TCP"},{"addresses":["172.26.13.130"],"nodeId":"c-w...
prometheus.io/port: 10254
prometheus.io/scrape: true
Status: Running
IP: 172.26.13.11
IPs:
IP: 172.26.13.11
Controlled By: DaemonSet/nginx-ingress-controller
Containers:
nginx-ingress-controller:
Container ID: docker://09a6248edb921b9c9cbab678c793fe1cc3d28322ea6abbb8f15c899351ce4b40
Image: 172.26.13.133:5000/rancher/nginx-ingress-controller:nginx-0.25.1-rancher1
Image ID: docker-pullable://172.26.13.133:5000/rancher/nginx-ingress-controller@sha256:fe50ceea3d1a0bc9a7ccef8d5845c9a30b51f608e411467862dff590185a47d2
Ports: 80/TCP, 443/TCP
Host Ports: 80/TCP, 443/TCP
Args:
/nginx-ingress-controller
--default-backend-service=$(POD_NAMESPACE)/default-http-backend
--configmap=$(POD_NAMESPACE)/nginx-configuration
--tcp-services-configmap=$(POD_NAMESPACE)/tcp-services
--udp-services-configmap=$(POD_NAMESPACE)/udp-services
--annotations-prefix=nginx.ingress.kubernetes.io
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 143
Started: Tue, 19 Apr 2022 07:40:12 +0000
Finished: Tue, 19 Apr 2022 07:41:32 +0000
Ready: False
Restart Count: 11
Liveness: http-get http://:10254/healthz delay=60s timeout=20s period=10s #success=1 #failure=3
Readiness: http-get http://:10254/healthz delay=60s timeout=20s period=10s #success=1 #failure=3
Environment:
POD_NAME: nginx-ingress-controller-6l4jh (v1:metadata.name)
POD_NAMESPACE: ingress-nginx (v1:metadata.namespace)
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from nginx-ingress-serviceaccount-token-2kdbj (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
nginx-ingress-serviceaccount-token-2kdbj:
Type: Secret (a volume populated by a Secret)
SecretName: nginx-ingress-serviceaccount-token-2kdbj
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: :NoExecute
:NoSchedule
node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/network-unavailable:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/pid-pressure:NoSchedule
node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unschedulable:NoSchedule
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned ingress-nginx/nginx-ingress-controller-6l4jh to node2
Normal Pulled 27m (x3 over 30m) kubelet, node2 Container image "172.26.13.133:5000/rancher/nginx-ingress-controller:nginx-0.25.1-rancher1" already present on machine
Normal Created 27m (x3 over 30m) kubelet, node2 Created container nginx-ingress-controller
Normal Started 27m (x3 over 30m) kubelet, node2 Started container nginx-ingress-controller
Normal Killing 27m (x2 over 28m) kubelet, node2 Container nginx-ingress-controller failed liveness probe, will be restarted
Warning Unhealthy 25m (x10 over 29m) kubelet, node2 Liveness probe failed: Get http://172.26.13.11:10254/healthz: dial tcp 172.26.13.11:10254: connect: connection refused
Warning Unhealthy 10m (x21 over 29m) kubelet, node2 Readiness probe failed: Get http://172.26.13.11:10254/healthz: dial tcp 172.26.13.11:10254: connect: connection refused
Warning BackOff 8s (x69 over 20m) kubelet, node2 Back-off restarting failed container
>
It sounds like the ingress controller pod fails its liveness/readiness checks, but apparently only on one node (node2). You could try:
check that node for a firewall rule blocking port 10254
update to a newer version than nginx-0.25.1
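The exit code in the describe output helps tell a crash from a restart forced by the kubelet. As a minimal sketch, assuming the usual convention that exit codes from 129 to 192 mean "terminated by signal (code - 128)", a hypothetical helper (explain_exit_code is an illustrative name) could decode it:

```shell
#!/bin/sh
# Hypothetical helper: explain_exit_code is an illustrative name.
# By convention, exit codes 129-192 mean the process was terminated by
# signal (code - 128); Linux signal numbers run from 1 to 64.
explain_exit_code() {
  code=$1
  if [ "$code" -gt 128 ] && [ "$code" -le 192 ]; then
    sig=$((code - 128))
    case "$sig" in
      9)  echo "killed by SIGKILL" ;;
      15) echo "killed by SIGTERM" ;;
      *)  echo "killed by signal $sig" ;;
    esac
  else
    echo "exited on its own with status $code"
  fi
}

# Exit Code from the nginx-ingress-controller describe output:
explain_exit_code 143
```

Exit code 143 corresponds to SIGTERM, which fits the "failed liveness probe, will be restarted" event: the container was stopped from outside rather than crashing by itself. That points the investigation at why port 10254 refuses connections on node2, not at the controller's own logs.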

Helm chart deployment fails with liveness and readiness probe errors

I have an OpenShift cluster. I have created a custom application and am trying to deploy it using Helm charts. When I deploy it on OpenShift using 'oc new-app', the deployment works perfectly fine, but when I deploy it using a Helm chart, it does not work.
Following is the output of 'oc get all' ----
[root@worker2 ~]# oc get all
NAME READY STATUS RESTARTS AGE
pod/chart-acme-85648d4645-7msdl 1/1 Running 0 3d7h
pod/chart1-acme-f8b65b78d-k2fb6 1/1 Running 0 3d7h
pod/netshoot 1/1 Running 0 3d10h
pod/sample1-buildachart-5b5d9d8649-qqmsf 0/1 CrashLoopBackOff 672 2d9h
pod/sample2-686bb7f969-fx5bk 0/1 CrashLoopBackOff 674 2d9h
pod/vjobs-npm-96b65fcb-b2p27 0/1 CrashLoopBackOff 817 47h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/chart-acme LoadBalancer 172.30.174.208 <pending> 80:30222/TCP 3d7h
service/chart1-acme LoadBalancer 172.30.153.36 <pending> 80:30383/TCP 3d7h
service/sample1-buildachart NodePort 172.30.29.124 <none> 80:32375/TCP 2d9h
service/sample2 NodePort 172.30.19.24 <none> 80:32647/TCP 2d9h
service/vjobs-npm NodePort 172.30.205.30 <none> 80:30476/TCP 47h
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/chart-acme 1/1 1 1 3d7h
deployment.apps/chart1-acme 1/1 1 1 3d7h
deployment.apps/sample1-buildachart 0/1 1 0 2d9h
deployment.apps/sample2 0/1 1 0 2d9h
deployment.apps/vjobs-npm 0/1 1 0 47h
NAME DESIRED CURRENT READY AGE
replicaset.apps/chart-acme-85648d4645 1 1 1 3d7h
replicaset.apps/chart1-acme-f8b65b78d 1 1 1 3d7h
replicaset.apps/sample1-buildachart-5b5d9d8649 1 1 0 2d9h
replicaset.apps/sample2-686bb7f969 1 1 0 2d9h
replicaset.apps/vjobs-npm-96b65fcb 1 1 0 47h
[root@worker2 ~]#
As the output above shows, the pod of the 'vjobs-npm' deployment is in 'CrashLoopBackOff'.
Below is the output of 'oc describe pod' ----
[root@worker2 ~]# oc describe pod vjobs-npm-96b65fcb-b2p27
Name: vjobs-npm-96b65fcb-b2p27
Namespace: vjobs-testing
Priority: 0
Node: worker0/192.168.100.109
Start Time: Mon, 31 Aug 2020 09:30:28 -0400
Labels: app.kubernetes.io/instance=vjobs-npm
app.kubernetes.io/name=vjobs-npm
pod-template-hash=96b65fcb
Annotations: openshift.io/scc: restricted
Status: Running
IP: 10.131.0.107
IPs:
IP: 10.131.0.107
Controlled By: ReplicaSet/vjobs-npm-96b65fcb
Containers:
vjobs-npm:
Container ID: cri-o://c232849eb25bd96ae9343ac3ed1539d492985dd8cdf47a5a4df7d3cf776c4cf3
Image: quay.io/aditya7002/vjobs_local_build_new:latest
Image ID: quay.io/aditya7002/vjobs_local_build_new@sha256:87f18e3a24fc7043a43a143e96b0b069db418ace027d95a5427cf53de56feb4c
Port: 80/TCP
Host Port: 0/TCP
State: Running
Started: Mon, 31 Aug 2020 09:31:23 -0400
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Mon, 31 Aug 2020 09:30:31 -0400
Finished: Mon, 31 Aug 2020 09:31:22 -0400
Ready: False
Restart Count: 1
Liveness: http-get http://:http/ delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:http/ delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from vjobs-npm-token-vw6f7 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
vjobs-npm-token-vw6f7:
Type: Secret (a volume populated by a Secret)
SecretName: vjobs-npm-token-vw6f7
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 68s default-scheduler Successfully assigned vjobs-testing/vjobs-npm-96b65fcb-b2p27 to worker0
Normal Killing 44s kubelet, worker0 Container vjobs-npm failed liveness probe, will be restarted
Normal Pulling 14s (x2 over 66s) kubelet, worker0 Pulling image "quay.io/aditya7002/vjobs_local_build_new:latest"
Normal Pulled 13s (x2 over 65s) kubelet, worker0 Successfully pulled image "quay.io/aditya7002/vjobs_local_build_new:latest"
Normal Created 13s (x2 over 65s) kubelet, worker0 Created container vjobs-npm
Normal Started 13s (x2 over 65s) kubelet, worker0 Started container vjobs-npm
Warning Unhealthy 4s (x4 over 64s) kubelet, worker0 Liveness probe failed: Get http://10.131.0.107:80/: dial tcp 10.131.0.107:80: connect: connection refused
Warning Unhealthy 1s (x7 over 61s) kubelet, worker0 Readiness probe failed: Get http://10.131.0.107:80/: dial tcp 10.131.0.107:80: connect: connection refused
[root@worker2 ~]#
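The probe settings in the describe output (delay=0s, period=10s, #failure=3) leave the container little time to start listening on port 80. The following is a rough sketch of that time budget. It assumes probes begin at initialDelaySeconds and repeat every periodSeconds, which only approximates real kubelet timing, and probe_restart_window is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: probe_restart_window is an illustrative name, not an oc/kubectl command.
# Prints the approximate earliest and latest number of seconds after container
# start at which consecutive liveness failures reach failureThreshold.
probe_restart_window() {
  delay=$1      # initialDelaySeconds
  period=$2     # periodSeconds
  failures=$3   # failureThreshold
  earliest=$((delay + (failures - 1) * period))
  latest=$((delay + failures * period))
  echo "$earliest $latest"
}

# Values from the describe output: delay=0s period=10s #failure=3
probe_restart_window 0 10 3
```

With these values the window is roughly 20 to 30 seconds. If the application takes longer than that to start, or listens on a port other than 80, the probes would fail with "connection refused" as in the events above. The output does not show which, if either, applies here.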

How to solve CoreDNS always stuck at "waiting for kubernetes"?

Vagrant, VM OS: ubuntu/bionic64, swap disabled
Kubernetes version: 1.18.0
Infrastructure: 1 HAProxy node, 3 external etcd nodes, and 3 Kubernetes master nodes
Attempts: I want to set up HA Rancher, so I am first setting up an HA Kubernetes cluster with kubeadm, following the official docs.
Expected behavior: all k8s components are up, and I can navigate to Weave Scope to see all nodes.
Actual behavior: CoreDNS is still not ready even after installing the CNI (Weave Net), so Weave Scope (the visualization UI) does not work; it needs cluster networking (Weave Net and CoreDNS) to be working properly.
# kubeadm config
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: stable
controlPlaneEndpoint: "172.16.0.30:6443"
etcd:
  external:
    caFile: /etc/rancher-certs/ca-chain.cert.pem
    keyFile: /etc/rancher-certs/etcd.key.pem
    certFile: /etc/rancher-certs/etcd.cert.pem
    endpoints:
      - https://172.16.0.20:2379
      - https://172.16.0.21:2379
      - https://172.16.0.22:2379
-------------------------------------------------------------------------------
# firewall
vagrant@rancher-0:~$ sudo ufw status
Status: active
To Action From
-- ------ ----
OpenSSH ALLOW Anywhere
Anywhere ALLOW 172.16.0.0/26
OpenSSH (v6) ALLOW Anywhere (v6)
-------------------------------------------------------------------------------
# no swap
vagrant@rancher-0:~$ free -h
total used free shared buff/cache available
Mem: 1.9G 928M 97M 1.4M 966M 1.1G
Swap: 0B 0B 0B
k8s diagnostic output:
vagrant@rancher-0:~$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
rancher-0 Ready master 14m v1.18.0 10.0.2.15 <none> Ubuntu 18.04.4 LTS 4.15.0-99-generic docker://19.3.12
rancher-1 Ready master 9m23s v1.18.0 10.0.2.15 <none> Ubuntu 18.04.4 LTS 4.15.0-99-generic docker://19.3.12
rancher-2 Ready master 4m26s v1.18.0 10.0.2.15 <none> Ubuntu 18.04.4 LTS 4.15.0-99-generic docker://19.3.12
vagrant@rancher-0:~$ kubectl get services --all-namespaces
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
cert-manager cert-manager ClusterIP 10.106.146.236 <none> 9402/TCP 17m
cert-manager cert-manager-webhook ClusterIP 10.102.162.87 <none> 443/TCP 17m
default kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 18m
kube-system kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 18m
weave weave-scope-app NodePort 10.96.110.153 <none> 80:30276/TCP 17m
vagrant@rancher-0:~$ kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cert-manager cert-manager-bd9d585bd-x8qpb 0/1 Pending 0 16m <none> <none> <none> <none>
cert-manager cert-manager-cainjector-76c6657c55-d8fpj 0/1 Pending 0 16m <none> <none> <none> <none>
cert-manager cert-manager-webhook-64b9b4fdfd-sspjx 0/1 Pending 0 16m <none> <none> <none> <none>
kube-system coredns-66bff467f8-9z4f8 0/1 Running 0 10m 10.32.0.2 rancher-1 <none> <none>
kube-system coredns-66bff467f8-zkk99 0/1 Running 0 16m 10.32.0.2 rancher-0 <none> <none>
kube-system kube-apiserver-rancher-0 1/1 Running 0 16m 10.0.2.15 rancher-0 <none> <none>
kube-system kube-apiserver-rancher-1 1/1 Running 0 12m 10.0.2.15 rancher-1 <none> <none>
kube-system kube-apiserver-rancher-2 1/1 Running 0 7m23s 10.0.2.15 rancher-2 <none> <none>
kube-system kube-controller-manager-rancher-0 1/1 Running 0 16m 10.0.2.15 rancher-0 <none> <none>
kube-system kube-controller-manager-rancher-1 1/1 Running 0 12m 10.0.2.15 rancher-1 <none> <none>
kube-system kube-controller-manager-rancher-2 1/1 Running 0 7m24s 10.0.2.15 rancher-2 <none> <none>
kube-system kube-proxy-grts7 1/1 Running 0 12m 10.0.2.15 rancher-1 <none> <none>
kube-system kube-proxy-jv9lm 1/1 Running 0 16m 10.0.2.15 rancher-0 <none> <none>
kube-system kube-proxy-z2lrc 1/1 Running 0 7m25s 10.0.2.15 rancher-2 <none> <none>
kube-system kube-scheduler-rancher-0 1/1 Running 0 16m 10.0.2.15 rancher-0 <none> <none>
kube-system kube-scheduler-rancher-1 1/1 Running 0 12m 10.0.2.15 rancher-1 <none> <none>
kube-system kube-scheduler-rancher-2 1/1 Running 0 7m23s 10.0.2.15 rancher-2 <none> <none>
kube-system weave-net-nnvkd 2/2 Running 0 7m25s 10.0.2.15 rancher-2 <none> <none>
kube-system weave-net-pgxnq 2/2 Running 0 12m 10.0.2.15 rancher-1 <none> <none>
kube-system weave-net-q22bh 2/2 Running 0 16m 10.0.2.15 rancher-0 <none> <none>
weave weave-scope-agent-9gwj2 1/1 Running 0 16m 10.0.2.15 rancher-0 <none> <none>
weave weave-scope-agent-mznp7 1/1 Running 0 7m25s 10.0.2.15 rancher-2 <none> <none>
weave weave-scope-agent-v7jql 1/1 Running 0 12m 10.0.2.15 rancher-1 <none> <none>
weave weave-scope-app-bc7444d59-cjpd8 0/1 Pending 0 16m <none> <none> <none> <none>
weave weave-scope-cluster-agent-5c5dcc8cb-ln4hg 0/1 Pending 0 16m <none> <none> <none> <none>
vagrant@rancher-0:~$ kubectl describe node rancher-0
Name: rancher-0
Roles: master
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=rancher-0
kubernetes.io/os=linux
node-role.kubernetes.io/master=
Annotations: kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Tue, 28 Jul 2020 09:24:17 +0000
Taints: node-role.kubernetes.io/master:NoSchedule
Unschedulable: false
Lease:
HolderIdentity: rancher-0
AcquireTime: <unset>
RenewTime: Tue, 28 Jul 2020 09:35:33 +0000
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Tue, 28 Jul 2020 09:24:47 +0000 Tue, 28 Jul 2020 09:24:47 +0000 WeaveIsUp Weave pod has set this
MemoryPressure False Tue, 28 Jul 2020 09:35:26 +0000 Tue, 28 Jul 2020 09:24:17 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Tue, 28 Jul 2020 09:35:26 +0000 Tue, 28 Jul 2020 09:24:17 +0000 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Tue, 28 Jul 2020 09:35:26 +0000 Tue, 28 Jul 2020 09:24:17 +0000 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Tue, 28 Jul 2020 09:35:26 +0000 Tue, 28 Jul 2020 09:24:52 +0000 KubeletReady kubelet is posting ready status. AppArmor enabled
Addresses:
InternalIP: 10.0.2.15
Hostname: rancher-0
Capacity:
cpu: 2
ephemeral-storage: 10098432Ki
hugepages-2Mi: 0
memory: 2040812Ki
pods: 110
Allocatable:
cpu: 2
ephemeral-storage: 9306714916
hugepages-2Mi: 0
memory: 1938412Ki
pods: 110
System Info:
Machine ID: 9b1bc8a8ef2c4e5b844624a36302d877
System UUID: A282600C-28F8-4D49-A9D3-6F05CA16865E
Boot ID: 77746bf5-7941-4e72-817e-24f149172158
Kernel Version: 4.15.0-99-generic
OS Image: Ubuntu 18.04.4 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://19.3.12
Kubelet Version: v1.18.0
Kube-Proxy Version: v1.18.0
Non-terminated Pods: (7 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system coredns-66bff467f8-zkk99 100m (5%) 0 (0%) 70Mi (3%) 170Mi (8%) 11m
kube-system kube-apiserver-rancher-0 250m (12%) 0 (0%) 0 (0%) 0 (0%) 11m
kube-system kube-controller-manager-rancher-0 200m (10%) 0 (0%) 0 (0%) 0 (0%) 11m
kube-system kube-proxy-jv9lm 0 (0%) 0 (0%) 0 (0%) 0 (0%) 11m
kube-system kube-scheduler-rancher-0 100m (5%) 0 (0%) 0 (0%) 0 (0%) 11m
kube-system weave-net-q22bh 20m (1%) 0 (0%) 0 (0%) 0 (0%) 11m
weave weave-scope-agent-9gwj2 100m (5%) 0 (0%) 100Mi (5%) 2000Mi (105%) 11m
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 770m (38%) 0 (0%)
memory 170Mi (8%) 2170Mi (114%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 11m kubelet, rancher-0 Starting kubelet.
Warning ImageGCFailed 11m kubelet, rancher-0 failed to get imageFs info: unable to find data in memory cache
Normal NodeHasSufficientMemory 11m (x3 over 11m) kubelet, rancher-0 Node rancher-0 status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 11m (x3 over 11m) kubelet, rancher-0 Node rancher-0 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 11m (x2 over 11m) kubelet, rancher-0 Node rancher-0 status is now: NodeHasSufficientPID
Normal NodeAllocatableEnforced 11m kubelet, rancher-0 Updated Node Allocatable limit across pods
Normal Starting 11m kubelet, rancher-0 Starting kubelet.
Normal NodeHasSufficientMemory 11m kubelet, rancher-0 Node rancher-0 status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 11m kubelet, rancher-0 Node rancher-0 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 11m kubelet, rancher-0 Node rancher-0 status is now: NodeHasSufficientPID
Normal NodeAllocatableEnforced 11m kubelet, rancher-0 Updated Node Allocatable limit across pods
Normal Starting 11m kube-proxy, rancher-0 Starting kube-proxy.
Normal NodeReady 10m kubelet, rancher-0 Node rancher-0 status is now: NodeReady
vagrant@rancher-0:~$ kubectl exec -n kube-system weave-net-nnvkd -c weave -- /home/weave/weave --local status
Version: 2.6.5 (failed to check latest version - see logs; next check at 2020/07/28 15:27:34)
Service: router
Protocol: weave 1..2
Name: 5a:40:7b:be:35:1d(rancher-2)
Encryption: disabled
PeerDiscovery: enabled
Targets: 0
Connections: 0
Peers: 1
TrustedSubnets: none
Service: ipam
Status: ready
Range: 10.32.0.0/12
DefaultSubnet: 10.32.0.0/12
vagrant@rancher-0:~$ kubectl logs weave-net-nnvkd -c weave -n kube-system
INFO: 2020/07/28 09:34:15.989759 Command line options: map[conn-limit:200 datapath:datapath db-prefix:/weavedb/weave-net docker-api: expect-npc:true host-root:/host http-addr:127.0.0.1:6784 ipalloc-init:consensus=0 ipalloc-range:10.32.0.0/12 metrics-addr:0.0.0.0:6782 name:5a:40:7b:be:35:1d nickname:rancher-2 no-dns:true port:6783]
INFO: 2020/07/28 09:34:15.989792 weave 2.6.5
INFO: 2020/07/28 09:34:16.178429 Bridge type is bridged_fastdp
INFO: 2020/07/28 09:34:16.178451 Communication between peers is unencrypted.
INFO: 2020/07/28 09:34:16.182442 Our name is 5a:40:7b:be:35:1d(rancher-2)
INFO: 2020/07/28 09:34:16.182499 Launch detected - using supplied peer list: []
INFO: 2020/07/28 09:34:16.196598 Checking for pre-existing addresses on weave bridge
INFO: 2020/07/28 09:34:16.204735 [allocator 5a:40:7b:be:35:1d] No valid persisted data
INFO: 2020/07/28 09:34:16.206236 [allocator 5a:40:7b:be:35:1d] Initialising via deferred consensus
INFO: 2020/07/28 09:34:16.206291 Sniffing traffic on datapath (via ODP)
INFO: 2020/07/28 09:34:16.210065 Listening for HTTP control messages on 127.0.0.1:6784
INFO: 2020/07/28 09:34:16.210471 Listening for metrics requests on 0.0.0.0:6782
INFO: 2020/07/28 09:34:16.275523 Error checking version: Get https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=4.15.0-99-generic&flag_kubernetes-cluster-size=0&flag_kubernetes-cluster-uid=aca5a8cc-27ca-4e8f-9964-4cf3971497c6&flag_kubernetes-version=v1.18.6&os=linux&signature=7uMaGpuc3%2F8ZtHqGoHyCnJ5VfOJUmnL%2FD6UZSqWYxKA%3D&version=2.6.5: dial tcp: lookup checkpoint-api.weave.works on 10.96.0.10:53: write udp 10.0.2.15:43742->10.96.0.10:53: write: operation not permitted
INFO: 2020/07/28 09:34:17.052454 [kube-peers] Added myself to peer list &{[{96:cd:5b:7f:65:73 rancher-1} {5a:40:7b:be:35:1d rancher-2}]}
DEBU: 2020/07/28 09:34:17.065599 [kube-peers] Nodes that have disappeared: map[96:cd:5b:7f:65:73:{96:cd:5b:7f:65:73 rancher-1}]
DEBU: 2020/07/28 09:34:17.065836 [kube-peers] Preparing to remove disappeared peer 96:cd:5b:7f:65:73
DEBU: 2020/07/28 09:34:17.079511 [kube-peers] Noting I plan to remove 96:cd:5b:7f:65:73
DEBU: 2020/07/28 09:34:17.095598 weave DELETE to http://127.0.0.1:6784/peer/96:cd:5b:7f:65:73 with map[]
INFO: 2020/07/28 09:34:17.097095 [kube-peers] rmpeer of 96:cd:5b:7f:65:73: 0 IPs taken over from 96:cd:5b:7f:65:73
DEBU: 2020/07/28 09:34:17.644909 [kube-peers] Nodes that have disappeared: map[]
INFO: 2020/07/28 09:34:17.658557 Assuming quorum size of 1
10.32.0.1
DEBU: 2020/07/28 09:34:17.761697 registering for updates for node delete events
vagrant@rancher-0:~$ kubectl logs coredns-66bff467f8-9z4f8 -n kube-system
.:53
[INFO] plugin/reload: Running configuration MD5 = 4e235fcc3696966e76816bcd9034ebc7
CoreDNS-1.6.7
linux/amd64, go1.13.6, da7f65b
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
I0728 09:31:10.764496 1 trace.go:116] Trace[2019727887]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105 (started: 2020-07-28 09:30:40.763691008 +0000 UTC m=+0.308910646) (total time: 30.000692218s):
Trace[2019727887]: [30.000692218s] [30.000692218s] END
E0728 09:31:10.764526 1 reflector.go:153] pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
I0728 09:31:10.764666 1 trace.go:116] Trace[1427131847]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105 (started: 2020-07-28 09:30:40.761333538 +0000 UTC m=+0.306553222) (total time: 30.00331917s):
Trace[1427131847]: [30.00331917s] [30.00331917s] END
E0728 09:31:10.764673 1 reflector.go:153] pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
I0728 09:31:10.767435 1 trace.go:116] Trace[939984059]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105 (started: 2020-07-28 09:30:40.762085835 +0000 UTC m=+0.307305485) (total time: 30.005326233s):
Trace[939984059]: [30.005326233s] [30.005326233s] END
E0728 09:31:10.767569 1 reflector.go:153] pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
...
vagrant@rancher-0:~$ kubectl describe pod coredns-66bff467f8-9z4f8 -n kube-system
Name: coredns-66bff467f8-9z4f8
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: rancher-1/10.0.2.15
Start Time: Tue, 28 Jul 2020 09:30:38 +0000
Labels: k8s-app=kube-dns
pod-template-hash=66bff467f8
Annotations: <none>
Status: Running
IP: 10.32.0.2
IPs:
IP: 10.32.0.2
Controlled By: ReplicaSet/coredns-66bff467f8
Containers:
coredns:
Container ID: docker://899cfd54a5281939dcb09eece96ff3024a3b4c444e982bda74b8334504a6a369
Image: k8s.gcr.io/coredns:1.6.7
Image ID: docker-pullable://k8s.gcr.io/coredns@sha256:2c8d61c46f484d881db43b34d13ca47a269336e576c81cf007ca740fa9ec0800
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
State: Running
Started: Tue, 28 Jul 2020 09:30:40 +0000
Ready: False
Restart Count: 0
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from coredns-token-znl2p (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
coredns-token-znl2p:
Type: Secret (a volume populated by a Secret)
SecretName: coredns-token-znl2p
Optional: false
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: CriticalAddonsOnly
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 28m default-scheduler Successfully assigned kube-system/coredns-66bff467f8-9z4f8 to rancher-1
Normal Pulled 28m kubelet, rancher-1 Container image "k8s.gcr.io/coredns:1.6.7" already present on machine
Normal Created 28m kubelet, rancher-1 Created container coredns
Normal Started 28m kubelet, rancher-1 Started container coredns
Warning Unhealthy 3m35s (x151 over 28m) kubelet, rancher-1 Readiness probe failed: HTTP probe failed with statuscode: 503
Edit 0:
The issue is solved. The problem was that I had configured ufw rules to allow traffic from my VMs' network CIDR, but not traffic from Kubernetes (that is, from the Docker containers). I configured ufw to allow the ports documented on the Kubernetes website and the ports documented on the Weave website, and the cluster now works as expected.
As @shadowlegend said, the issue is solved: the ufw rules allowed the VM network CIDR but did not allow traffic from Kubernetes (from the Docker containers). Configure ufw to allow the ports documented on the Kubernetes website and the ports documented on the Weave website, and the cluster will work as expected.
Take a look: ufw-firewall-kubernetes.
NOTE:
The same playbooks work as expected on Google Cloud.
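To make the fix above a bit more concrete, here is a minimal sketch that only prints candidate `ufw allow` commands; it does not run them. The port lists are what I recall from the Kubernetes (kubeadm, v1.18 era) and Weave Net documentation, so verify them against the docs for your versions. The names `ufw_rules`, `CONTROL_PLANE_TCP` and friends are made up for this example, and the final "allow from the pod CIDR" rule is an assumption about what "allow from Kubernetes" needs in your setup.

```python
# Sketch: build `ufw allow` commands for a kubeadm cluster that uses Weave Net.
# Port lists are assumptions taken from the public docs; check your versions.

CONTROL_PLANE_TCP = ["6443", "2379:2380", "10250", "10251", "10252"]
WORKER_TCP = ["10250", "30000:32767"]
WEAVE_TCP = ["6783"]
WEAVE_UDP = ["6783", "6784"]


def ufw_rules(pod_cidr, control_plane=True):
    """Return the ufw commands as strings; nothing is executed."""
    tcp = (CONTROL_PLANE_TCP if control_plane else WORKER_TCP) + WEAVE_TCP
    rules = [f"ufw allow {port}/tcp" for port in tcp]
    rules += [f"ufw allow {port}/udp" for port in WEAVE_UDP]
    # Assumption: traffic originating from pods must also be allowed.
    rules.append(f"ufw allow from {pod_cidr}")
    return rules


if __name__ == "__main__":
    # 10.32.0.0/12 is the Weave IPAM range shown in the status output above.
    for rule in ufw_rules("10.32.0.0/12"):
        print(rule)
```

Review the printed commands before applying any of them; pass `control_plane=False` to get the worker-node variant.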

kube-scheduler Liveness probe failed: Get http://127.0.0.1:10251/healthz: dial tcp 127.0.0.1:10251: connect: connection refused

So I have this unhealthy cluster partially working in the datacenter. This is probably the 10th time I have rebuilt from the instructions at: https://kubernetes.io/docs/setup/independent/high-availability/
I can apply some pods to this cluster and it seems to work but eventually it starts slowing down and crashing as you can see below. Here is the scheduler manifest:
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: kube-scheduler
    tier: control-plane
  name: kube-scheduler
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-scheduler
    - --bind-address=127.0.0.1
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
    - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
    - --leader-elect=true
    image: k8s.gcr.io/kube-scheduler:v1.14.2
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10251
        scheme: HTTP
      initialDelaySeconds: 15
      timeoutSeconds: 15
    name: kube-scheduler
    resources:
      requests:
        cpu: 100m
    volumeMounts:
    - mountPath: /etc/kubernetes/scheduler.conf
      name: kubeconfig
      readOnly: true
  hostNetwork: true
  priorityClassName: system-cluster-critical
  volumes:
  - hostPath:
      path: /etc/kubernetes/scheduler.conf
      type: FileOrCreate
    name: kubeconfig
status: {}
$ kubectl -n kube-system get pods
NAME READY STATUS RESTARTS AGE
coredns-fb8b8dccf-42psn 1/1 Running 9 88m
coredns-fb8b8dccf-x9mlt 1/1 Running 11 88m
docker-registry-dqvzb 1/1 Running 1 2d6h
kube-apiserver-kube-apiserver-1 1/1 Running 44 2d8h
kube-apiserver-kube-apiserver-2 1/1 Running 34 2d7h
kube-controller-manager-kube-apiserver-1 1/1 Running 198 2d2h
kube-controller-manager-kube-apiserver-2 0/1 CrashLoopBackOff 170 2d7h
kube-flannel-ds-amd64-4mbfk 1/1 Running 1 2d7h
kube-flannel-ds-amd64-55hc7 1/1 Running 1 2d8h
kube-flannel-ds-amd64-fvwmf 1/1 Running 1 2d7h
kube-flannel-ds-amd64-ht5wm 1/1 Running 3 2d7h
kube-flannel-ds-amd64-rjt9l 1/1 Running 4 2d8h
kube-flannel-ds-amd64-wpmkj 1/1 Running 1 2d7h
kube-proxy-2n64d 1/1 Running 3 2d7h
kube-proxy-2pq2g 1/1 Running 1 2d7h
kube-proxy-5fbms 1/1 Running 2 2d8h
kube-proxy-g8gmn 1/1 Running 1 2d7h
kube-proxy-wrdrj 1/1 Running 1 2d8h
kube-proxy-wz6gv 1/1 Running 1 2d7h
kube-scheduler-kube-apiserver-1 0/1 CrashLoopBackOff 198 2d2h
kube-scheduler-kube-apiserver-2 1/1 Running 5 18m
nginx-ingress-controller-dz8fm 1/1 Running 3 2d4h
nginx-ingress-controller-sdsgg 1/1 Running 3 2d4h
nginx-ingress-controller-sfrgb 1/1 Running 1 2d4h
$ kubectl -n kube-system describe pod kube-scheduler-kube-apiserver-1
Containers:
kube-scheduler:
Container ID: docker://c04f3c9061cafef8749b2018cd66e6865d102f67c4d13bdd250d0b4656d5f220
Image: k8s.gcr.io/kube-scheduler:v1.14.2
Image ID: docker-pullable://k8s.gcr.io/kube-scheduler@sha256:052e0322b8a2b22819ab0385089f202555c4099493d1bd33205a34753494d2c2
Port: <none>
Host Port: <none>
Command:
kube-scheduler
--bind-address=127.0.0.1
--kubeconfig=/etc/kubernetes/scheduler.conf
--authentication-kubeconfig=/etc/kubernetes/scheduler.conf
--authorization-kubeconfig=/etc/kubernetes/scheduler.conf
--leader-elect=true
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Tue, 28 May 2019 23:16:50 -0400
Finished: Tue, 28 May 2019 23:19:56 -0400
Ready: False
Restart Count: 195
Requests:
cpu: 100m
Liveness: http-get http://127.0.0.1:10251/healthz delay=15s timeout=15s period=10s #success=1 #failure=8
Environment: <none>
Mounts:
/etc/kubernetes/scheduler.conf from kubeconfig (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kubeconfig:
Type: HostPath (bare host directory volume)
Path: /etc/kubernetes/scheduler.conf
HostPathType: FileOrCreate
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: :NoExecute
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Created 4h56m (x104 over 37h) kubelet, kube-apiserver-1 Created container kube-scheduler
Normal Started 4h56m (x104 over 37h) kubelet, kube-apiserver-1 Started container kube-scheduler
Warning Unhealthy 137m (x71 over 34h) kubelet, kube-apiserver-1 Liveness probe failed: Get http://127.0.0.1:10251/healthz: dial tcp 127.0.0.1:10251: connect: connection refused
Normal Pulled 132m (x129 over 37h) kubelet, kube-apiserver-1 Container image "k8s.gcr.io/kube-scheduler:v1.14.2" already present on machine
Warning BackOff 128m (x1129 over 34h) kubelet, kube-apiserver-1 Back-off restarting failed container
Normal SandboxChanged 80m kubelet, kube-apiserver-1 Pod sandbox changed, it will be killed and re-created.
Warning Failed 76m kubelet, kube-apiserver-1 Error: context deadline exceeded
Normal Pulled 36m (x7 over 78m) kubelet, kube-apiserver-1 Container image "k8s.gcr.io/kube-scheduler:v1.14.2" already present on machine
Normal Started 36m (x6 over 74m) kubelet, kube-apiserver-1 Started container kube-scheduler
Normal Created 32m (x7 over 74m) kubelet, kube-apiserver-1 Created container kube-scheduler
Warning Unhealthy 20m (x9 over 40m) kubelet, kube-apiserver-1 Liveness probe failed: Get http://127.0.0.1:10251/healthz: dial tcp 127.0.0.1:10251: connect: connection refused
Warning BackOff 2m56s (x85 over 69m) kubelet, kube-apiserver-1 Back-off restarting failed container
I feel like I am overlooking a simple option or configuration, but I can't find it, and after days of dealing with this problem and reading documentation I am at my wits' end.
The load balancer is a TCP load balancer and seems to be working as expected as I can query the cluster from my desktop.
Any suggestions or troubleshooting tips are definitely welcome at this time.
Thank you.
The problem with our configuration was that a well-intentioned technician had removed one of the rules on the Kubernetes master firewall, which prevented the master from looping back to the ports it needed to probe. This caused all kinds of weird issues and misdiagnosed problems that sent us in the wrong direction. After we allowed all ports on the servers, Kubernetes returned to its normal behavior.
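For anyone chasing a similar symptom, a quick way to see whether a firewall is involved is to probe the port directly from the master itself. The sketch below is only an illustration (the helper name `probe_tcp` is made up); it distinguishes "connection refused" from a timeout. As a rule of thumb, and not a guarantee, a timeout suggests packets are being silently dropped, while a refusal means either nothing is listening or something is actively rejecting the connection.

```python
import socket


def probe_tcp(host, port, timeout=3.0):
    """Classify a TCP connection attempt as 'open', 'refused' or 'timeout'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        # Anything else (no route, permission denied, ...) is reported as-is.
        return f"error: {exc}"


if __name__ == "__main__":
    # 10251 is the scheduler health port used by the liveness probe above.
    print("127.0.0.1:10251 ->", probe_tcp("127.0.0.1", 10251))
```

Running it on the affected master while the scheduler container is up shows whether the loopback port is reachable at all, independent of the kubelet's probe.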

Issue on arm64: no endpoints, code: 503

Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.4", GitCommit:"d6f433224538d4f9ca2f7ae19b252e6fcb66a3ae", GitTreeState:"clean", BuildDate:"2017-05-19T18:44:27Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/arm64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2", GitCommit:"08e099554f3c31f6e6f07b448ab3ed78d0520507", GitTreeState:"clean", BuildDate:"2017-01-12T04:52:34Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/arm64"}
Environment:
OS (e.g. from /etc/os-release):
NAME="Ubuntu"
VERSION="16.04.2 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.2 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
Kernel (e.g. uname -a):
Linux node4 4.11.0-rc6-next-20170411-00286-gcc55807 #0 SMP PREEMPT Mon Jun 5 18:56:20 CST 2017 aarch64 aarch64 aarch64 GNU/Linux
What happened:
I want to use kube-deploy/master.sh to set up a master on ARM64, but I encountered the following error when visiting $myip:8080/ui:
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "no endpoints available for service \"kubernetes-dashboard\"",
  "reason": "ServiceUnavailable",
  "code": 503
}
My branch is 2017-2-7 (c8d6fbfc…)
By the way, the same installation steps work on the x86 (amd64) platform.
Anything else we need to know:
5.1 kubectl get pod --namespace=kube-system
k8s-master-10.193.20.23 4/4 Running 17 1h
k8s-proxy-v1-sk8vd 1/1 Running 0 1h
kube-addon-manager-10.193.20.23 2/2 Running 2 1h
kube-dns-3365905565-xvj7n 2/4 CrashLoopBackOff 65 1h
kubernetes-dashboard-1416335539-lhlhz 0/1 CrashLoopBackOff 22 1h
5.2 kubectl describe pods kubernetes-dashboard-1416335539-lhlhz --namespace=kube-system
Name: kubernetes-dashboard-1416335539-lhlhz
Namespace: kube-system
Node: 10.193.20.23/10.193.20.23
Start Time: Mon, 12 Jun 2017 10:04:07 +0800
Labels: k8s-app=kubernetes-dashboard
pod-template-hash=1416335539
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"kube-system","name":"kubernetes-dashboard-1416335539","uid":"6ab170d2-4f13-11e7-a...
scheduler.alpha.kubernetes.io/critical-pod=
scheduler.alpha.kubernetes.io/tolerations=[{"key":"CriticalAddonsOnly", "operator":"Exists"}]
Status: Running
IP: 10.1.70.2
Controllers: ReplicaSet/kubernetes-dashboard-1416335539
Containers:
kubernetes-dashboard:
Container ID: docker://fbdbe4c047803b0e98ca7412ca617031f1f31d881e3a5838298a1fda24a1ae18
Image: gcr.io/google_containers/kubernetes-dashboard-arm64:v1.5.0
Image ID: docker-pullable://gcr.io/google_containers/kubernetes-dashboard-arm64@sha256:559d58ef0d8e9dbe78f80060401b97d6262462318c0b8e071937a73896ea1d3d
Port: 9090/TCP
State: Running
Started: Mon, 12 Jun 2017 11:30:03 +0800
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Mon, 12 Jun 2017 11:24:28 +0800
Finished: Mon, 12 Jun 2017 11:24:59 +0800
Ready: True
Restart Count: 23
Limits:
cpu: 100m
memory: 50Mi
Requests:
cpu: 100m
memory: 50Mi
Liveness: http-get http://:9090/ delay=30s timeout=30s period=10s #success=1 #failure=3
Environment:
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-0mnn8 (ro)
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
default-token-0mnn8:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-0mnn8
Optional: false
QoS Class: Guaranteed
Node-Selectors:
Tolerations:
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
30m 30m 1 kubelet, 10.193.20.23 spec.containers{kubernetes-dashboard} Normal Killing Killing container with docker id b0562b3640ae: pod "kubernetes-dashboard-1416335539-lhlhz_kube-system(6ab54dba-4f13-11e7-a56b-6805ca369d7f)" container "kubernetes-dashboard" is unhealthy, it will be killed and re-created.
18m 18m 1 kubelet, 10.193.20.23 spec.containers{kubernetes-dashboard} Normal Killing Killing container with docker id 477066c3a00f: pod "kubernetes-dashboard-1416335539-lhlhz_kube-system(6ab54dba-4f13-11e7-a56b-6805ca369d7f)" container "kubernetes-dashboard" is unhealthy, it will be killed and re-created.
12m 12m 1 kubelet, 10.193.20.23 spec.containers{kubernetes-dashboard} Normal Killing Killing container with docker id 3e021d6df31f: pod "kubernetes-dashboard-1416335539-lhlhz_kube-system(6ab54dba-4f13-11e7-a56b-6805ca369d7f)" container "kubernetes-dashboard" is unhealthy, it will be killed and re-created.
11m 11m 1 kubelet, 10.193.20.23 spec.containers{kubernetes-dashboard} Normal Killing Killing container with docker id 43fe3c37817d: pod "kubernetes-dashboard-1416335539-lhlhz_kube-system(6ab54dba-4f13-11e7-a56b-6805ca369d7f)" container "kubernetes-dashboard" is unhealthy, it will be killed and re-created.
5m 5m 1 kubelet, 10.193.20.23 spec.containers{kubernetes-dashboard} Normal Killing Killing container with docker id 23cea72e1f45: pod "kubernetes-dashboard-1416335539-lhlhz_kube-system(6ab54dba-4f13-11e7-a56b-6805ca369d7f)" container "kubernetes-dashboard" is unhealthy, it will be killed and re-created.
1h 5m 7 kubelet, 10.193.20.23 spec.containers{kubernetes-dashboard} Warning Unhealthy Liveness probe failed: Get http://10.1.70.2:9090/: dial tcp 10.1.70.2:9090: getsockopt: connection refused
1h 38s 335 kubelet, 10.193.20.23 spec.containers{kubernetes-dashboard} Warning BackOff Back-off restarting failed docker container
1h 38s 307 kubelet, 10.193.20.23 Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "kubernetes-dashboard" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=kubernetes-dashboard pod=kubernetes-dashboard-1416335539-lhlhz_kube-system(6ab54dba-4f13-11e7-a56b-6805ca369d7f)"
1h 27s 24 kubelet, 10.193.20.23 spec.containers{kubernetes-dashboard} Normal Pulled Container image "gcr.io/google_containers/kubernetes-dashboard-arm64:v1.5.0" already present on machine
59m 23s 15 kubelet, 10.193.20.23 spec.containers{kubernetes-dashboard} Normal Created (events with common reason combined)
59m 22s 15 kubelet, 10.193.20.23 spec.containers{kubernetes-dashboard} Normal Started (events with common reason combined)
5.3 kubectl get svc,ep,rc,rs,deploy,pod -o wide --all-namespaces
NAMESPACE NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
default svc/kubernetes 10.0.0.1 443/TCP 16m
kube-system svc/kube-dns 10.0.0.10 53/UDP,53/TCP 16m k8s-app=kube-dns
kube-system svc/kubernetes-dashboard 10.0.0.95 80/TCP 16m k8s-app=kubernetes-dashboard
NAMESPACE NAME ENDPOINTS AGE
default ep/kubernetes 10.193.20.23:6443 16m
kube-system ep/kube-controller-manager <none> 11m
kube-system ep/kube-dns 16m
kube-system ep/kube-scheduler <none> 11m
kube-system ep/kubernetes-dashboard 16m
NAMESPACE NAME DESIRED CURRENT READY AGE CONTAINER(S) IMAGE(S) SELECTOR
kube-system rs/kube-dns-3365905565 1 1 0 16m kubedns,dnsmasq,dnsmasq-metrics,healthz gcr.io/google_containers/kubedns-arm64:1.9,gcr.io/google_containers/kube-dnsmasq-arm64:1.4,gcr.io/google_containers/dnsmasq-metrics-arm64:1.0,gcr.io/google_containers/exechealthz-arm64:1.2 k8s-app=kube-dns,pod-template-hash=3365905565
kube-system rs/kubernetes-dashboard-1416335539 1 1 0 16m kubernetes-dashboard gcr.io/google_containers/kubernetes-dashboard-arm64:v1.5.0 k8s-app=kubernetes-dashboard,pod-template-hash=1416335539
NAMESPACE NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE CONTAINER(S) IMAGE(S) SELECTOR
kube-system deploy/kube-dns 1 1 1 0 16m kubedns,dnsmasq,dnsmasq-metrics,healthz gcr.io/google_containers/kubedns-arm64:1.9,gcr.io/google_containers/kube-dnsmasq-arm64:1.4,gcr.io/google_containers/dnsmasq-metrics-arm64:1.0,gcr.io/google_containers/exechealthz-arm64:1.2 k8s-app=kube-dns
kube-system deploy/kubernetes-dashboard 1 1 1 0 16m kubernetes-dashboard gcr.io/google_containers/kubernetes-dashboard-arm64:v1.5.0 k8s-app=kubernetes-dashboard
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
kube-system po/k8s-master-10.193.20.23 4/4 Running 50 15m 10.193.20.23 10.193.20.23
kube-system po/k8s-proxy-v1-5b831 1/1 Running 0 16m 10.193.20.23 10.193.20.23
kube-system po/kube-addon-manager-10.193.20.23 2/2 Running 6 15m 10.193.20.23 10.193.20.23
kube-system po/kube-dns-3365905565-jxg4f 1/4 CrashLoopBackOff 20 16m 10.1.5.3 10.193.20.23
kube-system po/kubernetes-dashboard-1416335539-frt3v 0/1 CrashLoopBackOff 7 16m 10.1.5.2 10.193.20.23
5.4 kubectl describe pods kube-dns-3365905565-lb0mq --namespace=kube-system
Name: kube-dns-3365905565-lb0mq
Namespace: kube-system
Node: 10.193.20.23/10.193.20.23
Start Time: Wed, 14 Jun 2017 10:43:46 +0800
Labels: k8s-app=kube-dns
pod-template-hash=3365905565
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"kube-system","name":"kube-dns-3365905565","uid":"4870aec2-50ab-11e7-a420-6805ca36...
scheduler.alpha.kubernetes.io/critical-pod=
scheduler.alpha.kubernetes.io/tolerations=[{"key":"CriticalAddonsOnly", "operator":"Exists"}]
Status: Running
IP: 10.1.20.3
Controllers: ReplicaSet/kube-dns-3365905565
Containers:
kubedns:
Container ID: docker://729562769b48be60a02b62692acd3d1e1c67ac2505f4cb41240067777f45fd77
Image: gcr.io/google_containers/kubedns-arm64:1.9
Image ID: docker-pullable://gcr.io/google_containers/kubedns-arm64@sha256:3c78a2c5b9b86c5aeacf9f5967f206dcf1e64362f3e7f274c1c078c954ecae38
Ports: 10053/UDP, 10053/TCP, 10055/TCP
Args:
--domain=cluster.local.
--dns-port=10053
--config-map=kube-dns
--v=0
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Wed, 14 Jun 2017 10:56:29 +0800
Finished: Wed, 14 Jun 2017 10:58:06 +0800
Ready: False
Restart Count: 6
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/healthz-kubedns delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8081/readiness delay=3s timeout=5s period=10s #success=1 #failure=3
Environment:
PROMETHEUS_PORT: 10055
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-1t5v9 (ro)
dnsmasq:
Container ID: docker://b6d7e98a4af2715294764929f901947ab3b985be45d9f213245bd338ab8c3101
Image: gcr.io/google_containers/kube-dnsmasq-arm64:1.4
Image ID: docker-pullable://gcr.io/google_containers/kube-dnsmasq-arm64@sha256:dff5f9e2a521816aa314d469fd8ef961270fe43b4a74bab490385942103f3728
Ports: 53/UDP, 53/TCP
Args:
--cache-size=1000
--no-resolv
--server=127.0.0.1#10053
--log-facility=-
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Wed, 14 Jun 2017 10:55:50 +0800
Finished: Wed, 14 Jun 2017 10:57:26 +0800
Ready: False
Restart Count: 6
Requests:
cpu: 150m
memory: 10Mi
Liveness: http-get http://:8080/healthz-dnsmasq delay=60s timeout=5s period=10s #success=1 #failure=5
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-1t5v9 (ro)
dnsmasq-metrics:
Container ID: docker://51693aea0e732e488b631dcedc082f5a9e23b5b74857217cf005d1e947375367
Image: gcr.io/google_containers/dnsmasq-metrics-arm64:1.0
Image ID: docker-pullable://gcr.io/google_containers/dnsmasq-metrics-arm64@sha256:fc0e8b676a26ed0056b8c68611b74b9b5f3f00c608e5b11ef1608484ce55dd9a
Port: 10054/TCP
Args:
--v=2
--logtostderr
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: ContainerCannotRun
Exit Code: 128
Started: Wed, 14 Jun 2017 10:57:28 +0800
Finished: Wed, 14 Jun 2017 10:57:28 +0800
Ready: False
Restart Count: 7
Requests:
memory: 10Mi
Liveness: http-get http://:10054/metrics delay=60s timeout=5s period=10s #success=1 #failure=5
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-1t5v9 (ro)
healthz:
Container ID: docker://fab7ef143a95ad4d2f6363d5fcdc162eba1522b92726665916462be765289327
Image: gcr.io/google_containers/exechealthz-arm64:1.2
Image ID: docker-pullable://gcr.io/google_containers/exechealthz-arm64@sha256:e8300fde6c36b454cc00b5fffc96d6985622db4d8eb42a9f98f24873e9535b5c
Port: 8080/TCP
Args:
--cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
--url=/healthz-dnsmasq
--cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1:10053 >/dev/null
--url=/healthz-kubedns
--port=8080
--quiet
State: Running
Started: Wed, 14 Jun 2017 10:44:31 +0800
Ready: True
Restart Count: 0
Limits:
memory: 50Mi
Requests:
cpu: 10m
memory: 50Mi
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-1t5v9 (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
default-token-1t5v9:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-1t5v9
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
15m 15m 1 default-scheduler Normal Scheduled Successfully assigned kube-dns-3365905565-lb0mq to 10.193.20.23
14m 14m 1 kubelet, 10.193.20.23 spec.containers{kubedns} Normal Created Created container with docker id 2fef2db445e6; Security:[seccomp=unconfined]
14m 14m 1 kubelet, 10.193.20.23 spec.containers{kubedns} Normal Started Started container with docker id 2fef2db445e6
14m 14m 1 kubelet, 10.193.20.23 spec.containers{dnsmasq} Normal Created Created container with docker id 41ec998eeb76; Security:[seccomp=unconfined]
14m 14m 1 kubelet, 10.193.20.23 spec.containers{dnsmasq} Normal Started Started container with docker id 41ec998eeb76
14m 14m 1 kubelet, 10.193.20.23 spec.containers{dnsmasq-metrics} Normal Created Created container with docker id 676ef0e877c8; Security:[seccomp=unconfined]
14m 14m 1 kubelet, 10.193.20.23 spec.containers{healthz} Normal Pulled Container image "gcr.io/google_containers/exechealthz-arm64:1.2" already present on machine
14m 14m 1 kubelet, 10.193.20.23 spec.containers{dnsmasq-metrics} Warning Failed Failed to start container with docker id 676ef0e877c8 with error: Error response from daemon: {"message":"linux spec user: unable to find group nobody: no matching entries in group file"}
14m 14m 1 kubelet, 10.193.20.23 spec.containers{healthz} Normal Created Created container with docker id fab7ef143a95; Security:[seccomp=unconfined]
14m 14m 1 kubelet, 10.193.20.23 spec.containers{healthz} Normal Started Started container with docker id fab7ef143a95
14m 14m 1 kubelet, 10.193.20.23 spec.containers{dnsmasq-metrics} Warning Failed Failed to start container with docker id 45f6bd7f1f3a with error: Error response from daemon: {"message":"linux spec user: unable to find group nobody: no matching entries in group file"}
14m 14m 1 kubelet, 10.193.20.23 spec.containers{dnsmasq-metrics} Normal Created Created container with docker id 45f6bd7f1f3a; Security:[seccomp=unconfined]
14m 14m 1 kubelet, 10.193.20.23 Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "dnsmasq-metrics" with CrashLoopBackOff: "Back-off 10s restarting failed container=dnsmasq-metrics pod=kube-dns-3365905565-lb0mq_kube-system(48845c1a-50ab-11e7-a420-6805ca369d7f)"
14m 14m 1 kubelet, 10.193.20.23 spec.containers{dnsmasq-metrics} Normal Created Created container with docker id 2d1e5adb97bb; Security:[seccomp=unconfined]
14m 14m 1 kubelet, 10.193.20.23 spec.containers{dnsmasq-metrics} Warning Failed Failed to start container with docker id 2d1e5adb97bb with error: Error response from daemon: {"message":"linux spec user: unable to find group nobody: no matching entries in group file"}
14m 14m 2 kubelet, 10.193.20.23 Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "dnsmasq-metrics" with CrashLoopBackOff: "Back-off 20s restarting failed container=dnsmasq-metrics pod=kube-dns-3365905565-lb0mq_kube-system(48845c1a-50ab-11e7-a420-6805ca369d7f)"
So it looks like you have hit one (or several) bugs in Kubernetes. I suggest that you retry with a more recent version (and possibly another Docker version too). It would also be a good idea to report these bugs (https://github.com/kubernetes/dashboard/issues).
All in all, bear in mind that Kubernetes on ARM is an advanced topic and you should expect problems and be ready to debug/resolve them :/
There might be a problem with that Docker image (gcr.io/google_containers/dnsmasq-metrics-arm64). Non-amd64 images are not well tested.
Could you try running:
kubectl set image --namespace=kube-system deployment/kube-dns dnsmasq-metrics=lenart/dnsmasq-metrics-arm64:1.0
You can't reach the dashboard because the dashboard Pod is unhealthy and not Ready (it keeps failing its liveness probe and restarting). Because it's not ready, it isn't considered for the dashboard Service, so the Service has no endpoints, which leads to the error message you reported.
The dashboard is most likely unhealthy because kube-dns is not ready (1/4 containers in the Pod ready, should be 4/4).
kube-dns is most likely unhealthy because you have no pod networking (overlay network) deployed.
Go to the add-ons, pick a network add-on and deploy it. Weave has a 1.5-compatible version and requires no setup.
After you have done that, give it a few minutes. If you are impatient, just delete the kubernetes-dashboard and kube-dns pods (not the deployment/controller!). If this does not resolve your problem, please update your question with the new information.
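The chain described in this answer (Pod not ready, so the Service has no endpoints, so the API server returns 503) can be pictured with a toy model. This is a deliberately simplified sketch and not how the real endpoints controller is implemented; `service_endpoints` and `proxy_status` are hypothetical names, and the pod data is copied from the `kubectl get ... -o wide` output in the question.

```python
def service_endpoints(selector, pods):
    """Return IPs of pods that match the selector AND are Ready.

    Simplified model of how a Service gets its endpoints; not the real
    Kubernetes endpoints controller.
    """
    return [
        pod["ip"]
        for pod in pods
        if pod["ready"]
        and all(pod["labels"].get(k) == v for k, v in selector.items())
    ]


def proxy_status(endpoints):
    """Toy rule: 503 when the Service has no endpoints, otherwise 200."""
    return 200 if endpoints else 503


if __name__ == "__main__":
    pods = [
        {"ip": "10.1.5.2", "ready": False,
         "labels": {"k8s-app": "kubernetes-dashboard"}},
        {"ip": "10.1.5.3", "ready": False,
         "labels": {"k8s-app": "kube-dns"}},
    ]
    eps = service_endpoints({"k8s-app": "kubernetes-dashboard"}, pods)
    print(eps, proxy_status(eps))
```

With both pods not ready, the endpoint list is empty and the model returns 503, which matches the empty `ep/kubernetes-dashboard` row in the question. Once the dashboard Pod becomes Ready, its IP shows up in the list.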