kubernetes HA cluster master nodes not ready

I have deployed a kubernetes HA cluster using the following config.yaml:
etcd:
endpoints:
- "http://172.16.8.236:2379"
- "http://172.16.8.237:2379"
- "http://172.16.8.238:2379"
networking:
podSubnet: "192.168.0.0/16"
apiServerExtraArgs:
endpoint-reconciler-type: lease
When I check kubectl get nodes:
NAME STATUS ROLES AGE VERSION
master1 Ready master 22m v1.10.4
master2 NotReady master 17m v1.10.4
master3 NotReady master 16m v1.10.4
If I check the pods, I can see that many of them are failing:
[ikerlan@master1 ~]$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
calico-etcd-5jftb 0/1 NodeLost 0 16m
calico-etcd-kl7hb 1/1 Running 0 16m
calico-etcd-z7sps 0/1 NodeLost 0 16m
calico-kube-controllers-79dccdc4cc-vt5t7 1/1 Running 0 16m
calico-node-dbjl2 2/2 Running 0 16m
calico-node-gkkth 0/2 NodeLost 0 16m
calico-node-rqzzl 0/2 NodeLost 0 16m
kube-apiserver-master1 1/1 Running 0 21m
kube-controller-manager-master1 1/1 Running 0 22m
kube-dns-86f4d74b45-rwchm 1/3 CrashLoopBackOff 17 22m
kube-proxy-226xd 1/1 Running 0 22m
kube-proxy-jr2jq 0/1 ContainerCreating 0 18m
kube-proxy-zmjdm 0/1 ContainerCreating 0 17m
kube-scheduler-master1 1/1 Running 0 21m
If I run kubectl describe node master2:
[ikerlan@master1 ~]$ kubectl describe node master2
Name: master2
Roles: master
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/hostname=master2
node-role.kubernetes.io/master=
Annotations: node.alpha.kubernetes.io/ttl=0
volumes.kubernetes.io/controller-managed-attach-detach=true
CreationTimestamp: Mon, 11 Jun 2018 12:06:03 +0200
Taints: node-role.kubernetes.io/master:NoSchedule
Unschedulable: false
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk Unknown Mon, 11 Jun 2018 12:06:15 +0200 Mon, 11 Jun 2018 12:06:56 +0200 NodeStatusUnknown Kubelet stopped posting node status.
MemoryPressure Unknown Mon, 11 Jun 2018 12:06:15 +0200 Mon, 11 Jun 2018 12:06:56 +0200 NodeStatusUnknown Kubelet stopped posting node status.
DiskPressure Unknown Mon, 11 Jun 2018 12:06:15 +0200 Mon, 11 Jun 2018 12:06:56 +0200 NodeStatusUnknown Kubelet stopped posting node status.
PIDPressure False Mon, 11 Jun 2018 12:06:15 +0200 Mon, 11 Jun 2018 12:06:00 +0200 KubeletHasSufficientPID kubelet has sufficient PID available
Ready Unknown Mon, 11 Jun 2018 12:06:15 +0200 Mon, 11 Jun 2018 12:06:56 +0200 NodeStatusUnknown Kubelet stopped posting node status.
Addresses:
InternalIP: 172.16.8.237
Hostname: master2
Capacity:
cpu: 2
ephemeral-storage: 37300436Ki
Then if I check the pods, kubectl describe pod -n kube-system calico-etcd-5jftb:
[ikerlan@master1 ~]$ kubectl describe pod -n kube-system calico-etcd-5jftb
Name: calico-etcd-5jftb
Namespace: kube-system
Node: master2/
Labels: controller-revision-hash=4283683065
k8s-app=calico-etcd
pod-template-generation=1
Annotations: scheduler.alpha.kubernetes.io/critical-pod=
Status: Terminating (lasts 20h)
Termination Grace Period: 30s
Reason: NodeLost
Message: Node master2 which was running pod calico-etcd-5jftb is unresponsive
IP:
Controlled By: DaemonSet/calico-etcd
Containers:
calico-etcd:
Image: quay.io/coreos/etcd:v3.1.10
Port: <none>
Host Port: <none>
Command:
/usr/local/bin/etcd
Args:
--name=calico
--data-dir=/var/etcd/calico-data
--advertise-client-urls=http://$CALICO_ETCD_IP:6666
--listen-client-urls=http://0.0.0.0:6666
--listen-peer-urls=http://0.0.0.0:6667
--auto-compaction-retention=1
Environment:
CALICO_ETCD_IP: (v1:status.podIP)
Mounts:
/var/etcd from var-etcd (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-tj6d7 (ro)
Volumes:
var-etcd:
Type: HostPath (bare host directory volume)
Path: /var/etcd
HostPathType:
default-token-tj6d7:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-tj6d7
Optional: false
QoS Class: BestEffort
Node-Selectors: node-role.kubernetes.io/master=
Tolerations: CriticalAddonsOnly
node-role.kubernetes.io/master:NoSchedule
node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule
node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/unreachable:NoExecute
Events: <none>
I have tried updating the etcd cluster to version 3.3, and now I can see the following logs (and some more timeouts):
2018-06-12 09:17:51.305960 W | etcdserver: read-only range request "key:\"/registry/apiregistration.k8s.io/apiservices/v1beta1.authentication.k8s.io\" " took too long (190.475363ms) to execute
2018-06-12 09:18:06.788558 W | etcdserver: read-only range request "key:\"/registry/services/endpoints/kube-system/kube-scheduler\" " took too long (109.543763ms) to execute
2018-06-12 09:18:34.875823 W | etcdserver: read-only range request "key:\"/registry/services/endpoints/kube-system/kube-scheduler\" " took too long (136.649505ms) to execute
2018-06-12 09:18:41.634057 W | etcdserver: read-only range request "key:\"/registry/minions\" range_end:\"/registry/miniont\" count_only:true " took too long (106.00073ms) to execute
2018-06-12 09:18:42.345564 W | etcdserver: request "header:<ID:4449666326481959890 > lease_revoke:<ID:4449666326481959752 > " took too long (142.771179ms) to execute
I have checked kubectl get events:
22m 22m 1 master2.15375fdf087fc69f Node Normal Starting kube-proxy, master2 Starting kube-proxy.
22m 22m 1 master3.15375fe744055758 Node Normal Starting kubelet, master3 Starting kubelet.
22m 22m 5 master3.15375fe74d47afa2 Node Normal NodeHasSufficientDisk kubelet, master3 Node master3 status is now: NodeHasSufficientDisk
22m 22m 5 master3.15375fe74d47f80f Node Normal NodeHasSufficientMemory kubelet, master3 Node master3 status is now: NodeHasSufficientMemory
22m 22m 5 master3.15375fe74d48066e Node Normal NodeHasNoDiskPressure kubelet, master3 Node master3 status is now: NodeHasNoDiskPressure
22m 22m 5 master3.15375fe74d481368 Node Normal NodeHasSufficientPID kubelet, master3 Node master3 status is now: NodeHasSufficientPID

I see multiple calico-etcd pods attempting to run. If you have used a calico.yaml that deploys etcd for you, that will not work in a multi-master environment.
That manifest is not intended for production deployment and will not work in a multi-master environment because the etcd it deploys is not configured to attempt to form a cluster.
You could still use that manifest, but you would need to remove the etcd pods it deploys and set the etcd_endpoints to an etcd cluster you have deployed.
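For illustration, a minimal sketch of that change, pointing the calico-config ConfigMap from the manifest at an external etcd cluster (the endpoint list below reuses the addresses from your config.yaml and is an assumption about your environment):

kind: ConfigMap
apiVersion: v1
metadata:
  name: calico-config
  namespace: kube-system
data:
  # Use an existing external etcd cluster instead of the
  # single-node etcd the manifest would otherwise deploy.
  etcd_endpoints: "http://172.16.8.236:2379,http://172.16.8.237:2379,http://172.16.8.238:2379"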

I have solved it:
Adding all the masters IPs and LB IP to the apiServerCertSANs
Copying the kubernetes certificates from the first master to the other masters.
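For reference, a rough sketch of those two steps (the load-balancer address below is a placeholder, not taken from the setup above):

# config.yaml (additions to the kubeadm MasterConfiguration)
apiServerCertSANs:
- "172.16.8.236"
- "172.16.8.237"
- "172.16.8.238"
- "<load-balancer-IP>"

# copy the certificates from the first master to the other masters, e.g.:
scp -r /etc/kubernetes/pki master2:/etc/kubernetes/
scp -r /etc/kubernetes/pki master3:/etc/kubernetes/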

Related

helm chart deployment liveness and readiness failed error

I have an Openshift cluster. I have created one custom application and am trying to deploy it using Helm charts. When I deploy it on Openshift using 'oc new-app', the deployment works perfectly fine. But when I deploy it using the helm chart, it does not work.
Following is the output of 'oc get all' ----
[root@worker2 ~]#
[root@worker2 ~]# oc get all
NAME READY STATUS RESTARTS AGE
pod/chart-acme-85648d4645-7msdl 1/1 Running 0 3d7h
pod/chart1-acme-f8b65b78d-k2fb6 1/1 Running 0 3d7h
pod/netshoot 1/1 Running 0 3d10h
pod/sample1-buildachart-5b5d9d8649-qqmsf 0/1 CrashLoopBackOff 672 2d9h
pod/sample2-686bb7f969-fx5bk 0/1 CrashLoopBackOff 674 2d9h
pod/vjobs-npm-96b65fcb-b2p27 0/1 CrashLoopBackOff 817 47h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/chart-acme LoadBalancer 172.30.174.208 <pending> 80:30222/TCP 3d7h
service/chart1-acme LoadBalancer 172.30.153.36 <pending> 80:30383/TCP 3d7h
service/sample1-buildachart NodePort 172.30.29.124 <none> 80:32375/TCP 2d9h
service/sample2 NodePort 172.30.19.24 <none> 80:32647/TCP 2d9h
service/vjobs-npm NodePort 172.30.205.30 <none> 80:30476/TCP 47h
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/chart-acme 1/1 1 1 3d7h
deployment.apps/chart1-acme 1/1 1 1 3d7h
deployment.apps/sample1-buildachart 0/1 1 0 2d9h
deployment.apps/sample2 0/1 1 0 2d9h
deployment.apps/vjobs-npm 0/1 1 0 47h
NAME DESIRED CURRENT READY AGE
replicaset.apps/chart-acme-85648d4645 1 1 1 3d7h
replicaset.apps/chart1-acme-f8b65b78d 1 1 1 3d7h
replicaset.apps/sample1-buildachart-5b5d9d8649 1 1 0 2d9h
replicaset.apps/sample2-686bb7f969 1 1 0 2d9h
replicaset.apps/vjobs-npm-96b65fcb 1 1 0 47h
[root@worker2 ~]#
[root@worker2 ~]#
As per the above output, you can see that the deployment 'vjobs-npm' gives a 'CrashLoopBackOff' error.
Below is the output of 'oc describe pod' ----
[root@worker2 ~]#
[root@worker2 ~]# oc describe pod vjobs-npm-96b65fcb-b2p27
Name: vjobs-npm-96b65fcb-b2p27
Namespace: vjobs-testing
Priority: 0
Node: worker0/192.168.100.109
Start Time: Mon, 31 Aug 2020 09:30:28 -0400
Labels: app.kubernetes.io/instance=vjobs-npm
app.kubernetes.io/name=vjobs-npm
pod-template-hash=96b65fcb
Annotations: openshift.io/scc: restricted
Status: Running
IP: 10.131.0.107
IPs:
IP: 10.131.0.107
Controlled By: ReplicaSet/vjobs-npm-96b65fcb
Containers:
vjobs-npm:
Container ID: cri-o://c232849eb25bd96ae9343ac3ed1539d492985dd8cdf47a5a4df7d3cf776c4cf3
Image: quay.io/aditya7002/vjobs_local_build_new:latest
Image ID: quay.io/aditya7002/vjobs_local_build_new@sha256:87f18e3a24fc7043a43a143e96b0b069db418ace027d95a5427cf53de56feb4c
Port: 80/TCP
Host Port: 0/TCP
State: Running
Started: Mon, 31 Aug 2020 09:31:23 -0400
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Mon, 31 Aug 2020 09:30:31 -0400
Finished: Mon, 31 Aug 2020 09:31:22 -0400
Ready: False
Restart Count: 1
Liveness: http-get http://:http/ delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:http/ delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from vjobs-npm-token-vw6f7 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
vjobs-npm-token-vw6f7:
Type: Secret (a volume populated by a Secret)
SecretName: vjobs-npm-token-vw6f7
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 68s default-scheduler Successfully assigned vjobs-testing/vjobs-npm-96b65fcb-b2p27 to worker0
Normal Killing 44s kubelet, worker0 Container vjobs-npm failed liveness probe, will be restarted
Normal Pulling 14s (x2 over 66s) kubelet, worker0 Pulling image "quay.io/aditya7002/vjobs_local_build_new:latest"
Normal Pulled 13s (x2 over 65s) kubelet, worker0 Successfully pulled image "quay.io/aditya7002/vjobs_local_build_new:latest"
Normal Created 13s (x2 over 65s) kubelet, worker0 Created container vjobs-npm
Normal Started 13s (x2 over 65s) kubelet, worker0 Started container vjobs-npm
Warning Unhealthy 4s (x4 over 64s) kubelet, worker0 Liveness probe failed: Get http://10.131.0.107:80/: dial tcp 10.131.0.107:80: connect: connection refused
Warning Unhealthy 1s (x7 over 61s) kubelet, worker0 Readiness probe failed: Get http://10.131.0.107:80/: dial tcp 10.131.0.107:80: connect: connection refused
[root@worker2 ~]#

How to solve CoreDNS always stuck at "waiting for kubernetes"?

Vagrant, vm os: ubuntu/bionic64, swap disabled
Kubernetes version: 1.18.0
infrastructure: 1 haproxy node, 3 external etcd nodes and 3 kubernetes master nodes
Attempts: trying to set up HA Rancher, so I am setting up an HA Kubernetes cluster first using kubeadm by following the official doc
Expected behavior: all k8s components are up and I am able to navigate to Weave Scope to see all nodes
Actual behavior: CoreDNS is still not ready even after installing the CNI (Weave Net), so Weave Scope (the nice visualization UI) is not working, since it needs networking (Weave Net and CoreDNS) to be working properly.
# kubeadm config
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: stable
controlPlaneEndpoint: "172.16.0.30:6443"
etcd:
external:
caFile: /etc/rancher-certs/ca-chain.cert.pem
keyFile: /etc/rancher-certs/etcd.key.pem
certFile: /etc/rancher-certs/etcd.cert.pem
endpoints:
- https://172.16.0.20:2379
- https://172.16.0.21:2379
- https://172.16.0.22:2379
-------------------------------------------------------------------------------
# firewall
vagrant@rancher-0:~$ sudo ufw status
Status: active
To Action From
-- ------ ----
OpenSSH ALLOW Anywhere
Anywhere ALLOW 172.16.0.0/26
OpenSSH (v6) ALLOW Anywhere (v6)
-------------------------------------------------------------------------------
# no swap
vagrant@rancher-0:~$ free -h
total used free shared buff/cache available
Mem: 1.9G 928M 97M 1.4M 966M 1.1G
Swap: 0B 0B 0B
k8s diagnostic output:
vagrant@rancher-0:~$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
rancher-0 Ready master 14m v1.18.0 10.0.2.15 <none> Ubuntu 18.04.4 LTS 4.15.0-99-generic docker://19.3.12
rancher-1 Ready master 9m23s v1.18.0 10.0.2.15 <none> Ubuntu 18.04.4 LTS 4.15.0-99-generic docker://19.3.12
rancher-2 Ready master 4m26s v1.18.0 10.0.2.15 <none> Ubuntu 18.04.4 LTS 4.15.0-99-generic docker://19.3.12
vagrant@rancher-0:~$ kubectl get services --all-namespaces
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
cert-manager cert-manager ClusterIP 10.106.146.236 <none> 9402/TCP 17m
cert-manager cert-manager-webhook ClusterIP 10.102.162.87 <none> 443/TCP 17m
default kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 18m
kube-system kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 18m
weave weave-scope-app NodePort 10.96.110.153 <none> 80:30276/TCP 17m
vagrant@rancher-0:~$ kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cert-manager cert-manager-bd9d585bd-x8qpb 0/1 Pending 0 16m <none> <none> <none> <none>
cert-manager cert-manager-cainjector-76c6657c55-d8fpj 0/1 Pending 0 16m <none> <none> <none> <none>
cert-manager cert-manager-webhook-64b9b4fdfd-sspjx 0/1 Pending 0 16m <none> <none> <none> <none>
kube-system coredns-66bff467f8-9z4f8 0/1 Running 0 10m 10.32.0.2 rancher-1 <none> <none>
kube-system coredns-66bff467f8-zkk99 0/1 Running 0 16m 10.32.0.2 rancher-0 <none> <none>
kube-system kube-apiserver-rancher-0 1/1 Running 0 16m 10.0.2.15 rancher-0 <none> <none>
kube-system kube-apiserver-rancher-1 1/1 Running 0 12m 10.0.2.15 rancher-1 <none> <none>
kube-system kube-apiserver-rancher-2 1/1 Running 0 7m23s 10.0.2.15 rancher-2 <none> <none>
kube-system kube-controller-manager-rancher-0 1/1 Running 0 16m 10.0.2.15 rancher-0 <none> <none>
kube-system kube-controller-manager-rancher-1 1/1 Running 0 12m 10.0.2.15 rancher-1 <none> <none>
kube-system kube-controller-manager-rancher-2 1/1 Running 0 7m24s 10.0.2.15 rancher-2 <none> <none>
kube-system kube-proxy-grts7 1/1 Running 0 12m 10.0.2.15 rancher-1 <none> <none>
kube-system kube-proxy-jv9lm 1/1 Running 0 16m 10.0.2.15 rancher-0 <none> <none>
kube-system kube-proxy-z2lrc 1/1 Running 0 7m25s 10.0.2.15 rancher-2 <none> <none>
kube-system kube-scheduler-rancher-0 1/1 Running 0 16m 10.0.2.15 rancher-0 <none> <none>
kube-system kube-scheduler-rancher-1 1/1 Running 0 12m 10.0.2.15 rancher-1 <none> <none>
kube-system kube-scheduler-rancher-2 1/1 Running 0 7m23s 10.0.2.15 rancher-2 <none> <none>
kube-system weave-net-nnvkd 2/2 Running 0 7m25s 10.0.2.15 rancher-2 <none> <none>
kube-system weave-net-pgxnq 2/2 Running 0 12m 10.0.2.15 rancher-1 <none> <none>
kube-system weave-net-q22bh 2/2 Running 0 16m 10.0.2.15 rancher-0 <none> <none>
weave weave-scope-agent-9gwj2 1/1 Running 0 16m 10.0.2.15 rancher-0 <none> <none>
weave weave-scope-agent-mznp7 1/1 Running 0 7m25s 10.0.2.15 rancher-2 <none> <none>
weave weave-scope-agent-v7jql 1/1 Running 0 12m 10.0.2.15 rancher-1 <none> <none>
weave weave-scope-app-bc7444d59-cjpd8 0/1 Pending 0 16m <none> <none> <none> <none>
weave weave-scope-cluster-agent-5c5dcc8cb-ln4hg 0/1 Pending 0 16m <none> <none> <none> <none>
vagrant@rancher-0:~$ kubectl describe node rancher-0
Name: rancher-0
Roles: master
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=rancher-0
kubernetes.io/os=linux
node-role.kubernetes.io/master=
Annotations: kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Tue, 28 Jul 2020 09:24:17 +0000
Taints: node-role.kubernetes.io/master:NoSchedule
Unschedulable: false
Lease:
HolderIdentity: rancher-0
AcquireTime: <unset>
RenewTime: Tue, 28 Jul 2020 09:35:33 +0000
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Tue, 28 Jul 2020 09:24:47 +0000 Tue, 28 Jul 2020 09:24:47 +0000 WeaveIsUp Weave pod has set this
MemoryPressure False Tue, 28 Jul 2020 09:35:26 +0000 Tue, 28 Jul 2020 09:24:17 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Tue, 28 Jul 2020 09:35:26 +0000 Tue, 28 Jul 2020 09:24:17 +0000 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Tue, 28 Jul 2020 09:35:26 +0000 Tue, 28 Jul 2020 09:24:17 +0000 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Tue, 28 Jul 2020 09:35:26 +0000 Tue, 28 Jul 2020 09:24:52 +0000 KubeletReady kubelet is posting ready status. AppArmor enabled
Addresses:
InternalIP: 10.0.2.15
Hostname: rancher-0
Capacity:
cpu: 2
ephemeral-storage: 10098432Ki
hugepages-2Mi: 0
memory: 2040812Ki
pods: 110
Allocatable:
cpu: 2
ephemeral-storage: 9306714916
hugepages-2Mi: 0
memory: 1938412Ki
pods: 110
System Info:
Machine ID: 9b1bc8a8ef2c4e5b844624a36302d877
System UUID: A282600C-28F8-4D49-A9D3-6F05CA16865E
Boot ID: 77746bf5-7941-4e72-817e-24f149172158
Kernel Version: 4.15.0-99-generic
OS Image: Ubuntu 18.04.4 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://19.3.12
Kubelet Version: v1.18.0
Kube-Proxy Version: v1.18.0
Non-terminated Pods: (7 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system coredns-66bff467f8-zkk99 100m (5%) 0 (0%) 70Mi (3%) 170Mi (8%) 11m
kube-system kube-apiserver-rancher-0 250m (12%) 0 (0%) 0 (0%) 0 (0%) 11m
kube-system kube-controller-manager-rancher-0 200m (10%) 0 (0%) 0 (0%) 0 (0%) 11m
kube-system kube-proxy-jv9lm 0 (0%) 0 (0%) 0 (0%) 0 (0%) 11m
kube-system kube-scheduler-rancher-0 100m (5%) 0 (0%) 0 (0%) 0 (0%) 11m
kube-system weave-net-q22bh 20m (1%) 0 (0%) 0 (0%) 0 (0%) 11m
weave weave-scope-agent-9gwj2 100m (5%) 0 (0%) 100Mi (5%) 2000Mi (105%) 11m
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 770m (38%) 0 (0%)
memory 170Mi (8%) 2170Mi (114%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 11m kubelet, rancher-0 Starting kubelet.
Warning ImageGCFailed 11m kubelet, rancher-0 failed to get imageFs info: unable to find data in memory cache
Normal NodeHasSufficientMemory 11m (x3 over 11m) kubelet, rancher-0 Node rancher-0 status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 11m (x3 over 11m) kubelet, rancher-0 Node rancher-0 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 11m (x2 over 11m) kubelet, rancher-0 Node rancher-0 status is now: NodeHasSufficientPID
Normal NodeAllocatableEnforced 11m kubelet, rancher-0 Updated Node Allocatable limit across pods
Normal Starting 11m kubelet, rancher-0 Starting kubelet.
Normal NodeHasSufficientMemory 11m kubelet, rancher-0 Node rancher-0 status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 11m kubelet, rancher-0 Node rancher-0 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 11m kubelet, rancher-0 Node rancher-0 status is now: NodeHasSufficientPID
Normal NodeAllocatableEnforced 11m kubelet, rancher-0 Updated Node Allocatable limit across pods
Normal Starting 11m kube-proxy, rancher-0 Starting kube-proxy.
Normal NodeReady 10m kubelet, rancher-0 Node rancher-0 status is now: NodeReady
vagrant@rancher-0:~$ kubectl exec -n kube-system weave-net-nnvkd -c weave -- /home/weave/weave --local status
Version: 2.6.5 (failed to check latest version - see logs; next check at 2020/07/28 15:27:34)
Service: router
Protocol: weave 1..2
Name: 5a:40:7b:be:35:1d(rancher-2)
Encryption: disabled
PeerDiscovery: enabled
Targets: 0
Connections: 0
Peers: 1
TrustedSubnets: none
Service: ipam
Status: ready
Range: 10.32.0.0/12
DefaultSubnet: 10.32.0.0/12
vagrant@rancher-0:~$ kubectl logs weave-net-nnvkd -c weave -n kube-system
INFO: 2020/07/28 09:34:15.989759 Command line options: map[conn-limit:200 datapath:datapath db-prefix:/weavedb/weave-net docker-api: expect-npc:true host-root:/host http-addr:127.0.0.1:6784 ipalloc-init:consensus=0 ipalloc-range:10.32.0.0/12 metrics-addr:0.0.0.0:6782 name:5a:40:7b:be:35:1d nickname:rancher-2 no-dns:true port:6783]
INFO: 2020/07/28 09:34:15.989792 weave 2.6.5
INFO: 2020/07/28 09:34:16.178429 Bridge type is bridged_fastdp
INFO: 2020/07/28 09:34:16.178451 Communication between peers is unencrypted.
INFO: 2020/07/28 09:34:16.182442 Our name is 5a:40:7b:be:35:1d(rancher-2)
INFO: 2020/07/28 09:34:16.182499 Launch detected - using supplied peer list: []
INFO: 2020/07/28 09:34:16.196598 Checking for pre-existing addresses on weave bridge
INFO: 2020/07/28 09:34:16.204735 [allocator 5a:40:7b:be:35:1d] No valid persisted data
INFO: 2020/07/28 09:34:16.206236 [allocator 5a:40:7b:be:35:1d] Initialising via deferred consensus
INFO: 2020/07/28 09:34:16.206291 Sniffing traffic on datapath (via ODP)
INFO: 2020/07/28 09:34:16.210065 Listening for HTTP control messages on 127.0.0.1:6784
INFO: 2020/07/28 09:34:16.210471 Listening for metrics requests on 0.0.0.0:6782
INFO: 2020/07/28 09:34:16.275523 Error checking version: Get https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=4.15.0-99-generic&flag_kubernetes-cluster-size=0&flag_kubernetes-cluster-uid=aca5a8cc-27ca-4e8f-9964-4cf3971497c6&flag_kubernetes-version=v1.18.6&os=linux&signature=7uMaGpuc3%2F8ZtHqGoHyCnJ5VfOJUmnL%2FD6UZSqWYxKA%3D&version=2.6.5: dial tcp: lookup checkpoint-api.weave.works on 10.96.0.10:53: write udp 10.0.2.15:43742->10.96.0.10:53: write: operation not permitted
INFO: 2020/07/28 09:34:17.052454 [kube-peers] Added myself to peer list &{[{96:cd:5b:7f:65:73 rancher-1} {5a:40:7b:be:35:1d rancher-2}]}
DEBU: 2020/07/28 09:34:17.065599 [kube-peers] Nodes that have disappeared: map[96:cd:5b:7f:65:73:{96:cd:5b:7f:65:73 rancher-1}]
DEBU: 2020/07/28 09:34:17.065836 [kube-peers] Preparing to remove disappeared peer 96:cd:5b:7f:65:73
DEBU: 2020/07/28 09:34:17.079511 [kube-peers] Noting I plan to remove 96:cd:5b:7f:65:73
DEBU: 2020/07/28 09:34:17.095598 weave DELETE to http://127.0.0.1:6784/peer/96:cd:5b:7f:65:73 with map[]
INFO: 2020/07/28 09:34:17.097095 [kube-peers] rmpeer of 96:cd:5b:7f:65:73: 0 IPs taken over from 96:cd:5b:7f:65:73
DEBU: 2020/07/28 09:34:17.644909 [kube-peers] Nodes that have disappeared: map[]
INFO: 2020/07/28 09:34:17.658557 Assuming quorum size of 1
10.32.0.1
DEBU: 2020/07/28 09:34:17.761697 registering for updates for node delete events
vagrant@rancher-0:~$ kubectl logs coredns-66bff467f8-9z4f8 -n kube-system
.:53
[INFO] plugin/reload: Running configuration MD5 = 4e235fcc3696966e76816bcd9034ebc7
CoreDNS-1.6.7
linux/amd64, go1.13.6, da7f65b
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
I0728 09:31:10.764496 1 trace.go:116] Trace[2019727887]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105 (started: 2020-07-28 09:30:40.763691008 +0000 UTC m=+0.308910646) (total time: 30.000692218s):
Trace[2019727887]: [30.000692218s] [30.000692218s] END
E0728 09:31:10.764526 1 reflector.go:153] pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
I0728 09:31:10.764666 1 trace.go:116] Trace[1427131847]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105 (started: 2020-07-28 09:30:40.761333538 +0000 UTC m=+0.306553222) (total time: 30.00331917s):
Trace[1427131847]: [30.00331917s] [30.00331917s] END
E0728 09:31:10.764673 1 reflector.go:153] pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
I0728 09:31:10.767435 1 trace.go:116] Trace[939984059]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105 (started: 2020-07-28 09:30:40.762085835 +0000 UTC m=+0.307305485) (total time: 30.005326233s):
Trace[939984059]: [30.005326233s] [30.005326233s] END
E0728 09:31:10.767569 1 reflector.go:153] pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
...
vagrant@rancher-0:~$ kubectl describe pod coredns-66bff467f8-9z4f8 -n kube-system
Name: coredns-66bff467f8-9z4f8
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: rancher-1/10.0.2.15
Start Time: Tue, 28 Jul 2020 09:30:38 +0000
Labels: k8s-app=kube-dns
pod-template-hash=66bff467f8
Annotations: <none>
Status: Running
IP: 10.32.0.2
IPs:
IP: 10.32.0.2
Controlled By: ReplicaSet/coredns-66bff467f8
Containers:
coredns:
Container ID: docker://899cfd54a5281939dcb09eece96ff3024a3b4c444e982bda74b8334504a6a369
Image: k8s.gcr.io/coredns:1.6.7
Image ID: docker-pullable://k8s.gcr.io/coredns@sha256:2c8d61c46f484d881db43b34d13ca47a269336e576c81cf007ca740fa9ec0800
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
State: Running
Started: Tue, 28 Jul 2020 09:30:40 +0000
Ready: False
Restart Count: 0
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from coredns-token-znl2p (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
coredns-token-znl2p:
Type: Secret (a volume populated by a Secret)
SecretName: coredns-token-znl2p
Optional: false
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: CriticalAddonsOnly
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 28m default-scheduler Successfully assigned kube-system/coredns-66bff467f8-9z4f8 to rancher-1
Normal Pulled 28m kubelet, rancher-1 Container image "k8s.gcr.io/coredns:1.6.7" already present on machine
Normal Created 28m kubelet, rancher-1 Created container coredns
Normal Started 28m kubelet, rancher-1 Started container coredns
Warning Unhealthy 3m35s (x151 over 28m) kubelet, rancher-1 Readiness probe failed: HTTP probe failed with statuscode: 503
Edit 0:
The issue is solved. The problem was that I had configured a ufw rule to allow the CIDR of my VMs' network, but it did not allow traffic from Kubernetes (from the Docker containers). I configured ufw to allow the ports documented on the Kubernetes website and the ports documented on the Weave website, and now the cluster is working as expected.
As @shadowlegend said, the issue is solved: the problem was with the ufw rule configuration, which allowed the CIDR of the VMs' network but did not allow traffic from Kubernetes (from the Docker containers). Configure ufw to allow the ports documented on the Kubernetes website and the ports documented on the Weave website, and the cluster will be working as expected.
Take a look: ufw-firewall-kubernetes.
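For reference, a sketch of the kind of ufw rules this amounts to, based on the control-plane ports documented by Kubernetes and the ports documented by Weave Net (adjust to your topology):

sudo ufw allow 6443/tcp        # Kubernetes API server
sudo ufw allow 2379:2380/tcp   # etcd server client API
sudo ufw allow 10250/tcp       # kubelet API
sudo ufw allow 10251/tcp       # kube-scheduler
sudo ufw allow 10252/tcp       # kube-controller-manager
sudo ufw allow 6783/tcp        # Weave Net control
sudo ufw allow 6783:6784/udp   # Weave Net data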
NOTE:
Those same playbooks work as expected on Google Cloud.

PersistentVolumeClaim is not bound: "nfs-pv-provisioning-demo"

I am setting up a kubernetes lab using one node only and learning to set up kubernetes nfs.
I am following the kubernetes nfs example step by step from the following link:
https://github.com/kubernetes/examples/tree/master/staging/volumes/nfs
Trying the first section, NFS server part, executed 3 commands:
$ kubectl create -f examples/volumes/nfs/provisioner/nfs-server-gce-pv.yaml
$ kubectl create -f examples/volumes/nfs/nfs-server-rc.yaml
$ kubectl create -f examples/volumes/nfs/nfs-server-service.yaml
I experience a problem where I see the following event:
PersistentVolumeClaim is not bound: "nfs-pv-provisioning-demo"
Research done:
https://github.com/kubernetes/kubernetes/issues/43120
https://github.com/kubernetes/examples/pull/30
None of the links above helped me resolve the issue I experience.
I have made sure it is using image 0.8.
Image: gcr.io/google_containers/volume-nfs:0.8
Does anyone know what this message means?
Clues and guidance on how to troubleshoot this issue are very much appreciated.
Thank you.
$ docker version
Client:
Version: 17.09.0-ce
API version: 1.32
Go version: go1.8.3
Git commit: afdb6d4
Built: Tue Sep 26 22:41:23 2017
OS/Arch: linux/amd64
Server:
Version: 17.09.0-ce
API version: 1.32 (minimum version 1.12)
Go version: go1.8.3
Git commit: afdb6d4
Built: Tue Sep 26 22:42:49 2017
OS/Arch: linux/amd64
Experimental: false
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.3", GitCommit:"f0efb3cb883751c5ffdbe6d515f3cb4fbe7b7acd", GitTreeState:"clean", BuildDate:"2017-11-08T18:39:33Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.3", GitCommit:"f0efb3cb883751c5ffdbe6d515f3cb4fbe7b7acd", GitTreeState:"clean", BuildDate:"2017-11-08T18:27:48Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
lab-kube-06 Ready master 2m v1.8.3
$ kubectl describe nodes lab-kube-06
Name: lab-kube-06
Roles: master
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/hostname=lab-kube-06
node-role.kubernetes.io/master=
Annotations: node.alpha.kubernetes.io/ttl=0
volumes.kubernetes.io/controller-managed-attach-detach=true
Taints: <none>
CreationTimestamp: Thu, 16 Nov 2017 16:51:28 +0000
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk False Thu, 16 Nov 2017 17:30:36 +0000 Thu, 16 Nov 2017 16:51:28 +0000 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Thu, 16 Nov 2017 17:30:36 +0000 Thu, 16 Nov 2017 16:51:28 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Thu, 16 Nov 2017 17:30:36 +0000 Thu, 16 Nov 2017 16:51:28 +0000 KubeletHasNoDiskPressure kubelet has no disk pressure
Ready True Thu, 16 Nov 2017 17:30:36 +0000 Thu, 16 Nov 2017 16:51:28 +0000 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 10.0.0.6
Hostname: lab-kube-06
Capacity:
cpu: 2
memory: 8159076Ki
pods: 110
Allocatable:
cpu: 2
memory: 8056676Ki
pods: 110
System Info:
Machine ID: e198b57826ab4704a6526baea5fa1d06
System UUID: 05EF54CC-E8C8-874B-A708-BBC7BC140FF2
Boot ID: 3d64ad16-5603-42e9-bd34-84f6069ded5f
Kernel Version: 3.10.0-693.el7.x86_64
OS Image: Red Hat Enterprise Linux Server 7.4 (Maipo)
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://Unknown
Kubelet Version: v1.8.3
Kube-Proxy Version: v1.8.3
ExternalID: lab-kube-06
Non-terminated Pods: (7 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
kube-system etcd-lab-kube-06 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-apiserver-lab-kube-06 250m (12%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-controller-manager-lab-kube-06 200m (10%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-dns-545bc4bfd4-gmdvn 260m (13%) 0 (0%) 110Mi (1%) 170Mi (2%)
kube-system kube-proxy-68w8k 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-scheduler-lab-kube-06 100m (5%) 0 (0%) 0 (0%) 0 (0%)
kube-system weave-net-7zlbg 20m (1%) 0 (0%) 0 (0%) 0 (0%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
830m (41%) 0 (0%) 110Mi (1%) 170Mi (2%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 39m kubelet, lab-kube-06 Starting kubelet.
Normal NodeAllocatableEnforced 39m kubelet, lab-kube-06 Updated Node Allocatable limit across pods
Normal NodeHasSufficientDisk 39m (x8 over 39m) kubelet, lab-kube-06 Node lab-kube-06 status is now: NodeHasSufficientDisk
Normal NodeHasSufficientMemory 39m (x8 over 39m) kubelet, lab-kube-06 Node lab-kube-06 status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 39m (x7 over 39m) kubelet, lab-kube-06 Node lab-kube-06 status is now: NodeHasNoDiskPressure
Normal Starting 38m kube-proxy, lab-kube-06 Starting kube-proxy.
$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
nfs-pv-provisioning-demo Pending 14s
$ kubectl get events
LAST SEEN FIRST SEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
18m 18m 1 lab-kube-06.14f79f093119829a Node Normal Starting kubelet, lab-kube-06 Starting kubelet.
18m 18m 8 lab-kube-06.14f79f0931d0eb6e Node Normal NodeHasSufficientDisk kubelet, lab-kube-06 Node lab-kube-06 status is now: NodeHasSufficientDisk
18m 18m 8 lab-kube-06.14f79f0931d1253e Node Normal NodeHasSufficientMemory kubelet, lab-kube-06 Node lab-kube-06 status is now: NodeHasSufficientMemory
18m 18m 7 lab-kube-06.14f79f0931d131be Node Normal NodeHasNoDiskPressure kubelet, lab-kube-06 Node lab-kube-06 status is now: NodeHasNoDiskPressure
18m 18m 1 lab-kube-06.14f79f0932f3f1b0 Node Normal NodeAllocatableEnforced kubelet, lab-kube-06 Updated Node Allocatable limit across pods
18m 18m 1 lab-kube-06.14f79f122a32282d Node Normal RegisteredNode controllermanager Node lab-kube-06 event: Registered Node lab-kube-06 in Controller
17m 17m 1 lab-kube-06.14f79f1cdfc4c3b1 Node Normal Starting kube-proxy, lab-kube-06 Starting kube-proxy.
17m 17m 1 lab-kube-06.14f79f1d94ef1c17 Node Normal RegisteredNode controllermanager Node lab-kube-06 event: Registered Node lab-kube-06 in Controller
14m 14m 1 lab-kube-06.14f79f4b91cf73b3 Node Normal RegisteredNode controllermanager Node lab-kube-06 event: Registered Node lab-kube-06 in Controller
58s 11m 42 nfs-pv-provisioning-demo.14f79f766cf887f2 PersistentVolumeClaim Normal FailedBinding persistentvolume-controller no persistent volumes available for this claim and no storage class is set
14s 4m 20 nfs-server-kq44h.14f79fd21b9db5f9 Pod Warning FailedScheduling default-scheduler PersistentVolumeClaim is not bound: "nfs-pv-provisioning-demo"
4m 4m 1 nfs-server.14f79fd21b946027 ReplicationController Normal SuccessfulCreate replication-controller Created pod: nfs-server-kq44h
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nfs-server-kq44h 0/1 Pending 0 16s
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nfs-server-kq44h 0/1 Pending 0 26s
$ kubectl get rc
NAME DESIRED CURRENT READY AGE
nfs-server 1 1 0 40s
$ kubectl describe pods nfs-server-kq44h
Name: nfs-server-kq44h
Namespace: default
Node: <none>
Labels: role=nfs-server
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"default","name":"nfs-server","uid":"5653eb53-caf0-11e7-ac02-000d3a04eb...
Status: Pending
IP:
Created By: ReplicationController/nfs-server
Controlled By: ReplicationController/nfs-server
Containers:
nfs-server:
Image: gcr.io/google_containers/volume-nfs:0.8
Ports: 2049/TCP, 20048/TCP, 111/TCP
Environment: <none>
Mounts:
/exports from mypvc (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-plgv5 (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
mypvc:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: nfs-pv-provisioning-demo
ReadOnly: false
default-token-plgv5:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-plgv5
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.alpha.kubernetes.io/notReady:NoExecute for 300s
node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 39s (x22 over 5m) default-scheduler PersistentVolumeClaim is not bound: "nfs-pv-provisioning-demo"
Each Persistent Volume Claim (PVC) needs a Persistent Volume (PV) that it can bind to. In your example, you have only created a PVC, but not the volume itself.
A PV can either be created manually, or automatically by using a volume class with a provisioner. Have a look at the docs on static and dynamic provisioning for more information:
There are two ways PVs may be provisioned: statically or dynamically.
Static
A cluster administrator creates a number of PVs. They carry the details of the real storage which is available for use by cluster users. [...]
Dynamic
When none of the static PVs the administrator created matches a user’s PersistentVolumeClaim, the cluster may try to dynamically provision a volume specially for the PVC. This provisioning is based on StorageClasses: the PVC must request a class and the administrator must have created and configured that class in order for dynamic provisioning to occur.
In your example, you are creating a storage class provisioner (defined in examples/volumes/nfs/provisioner/nfs-server-gce-pv.yaml) that seems to be tailored for usage within the Google cloud (which it will probably not be able to actually create PVs in your lab setup).
You can create a persistent volume manually on your own. After creating the PV, the PVC should automatically bind itself to the volume and your pods should start. Below is an example for a persistent volume that uses the node's local file system as a volume (which is probably OK for a one-node test setup):
apiVersion: v1
kind: PersistentVolume
metadata:
name: some-volume
spec:
capacity:
storage: 200Gi
accessModes:
- ReadWriteOnce
hostPath:
path: /path/on/host
For a production setup, you'll probably want to choose a different volume type at hostPath, although the volume types available to you will greatly differ depending on the environment that you're in (cloud or self-hosted/bare-metal).
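If you go the manual route, a sketch of a claim that would bind to the PV above might look like this (the claim name is the one from the example manifests; the size and access mode must be compatible with the PV and are assumptions here):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pv-provisioning-demo
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 200Gi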

issue on arm64: no endpoints, code: 503

Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.4", GitCommit:"d6f433224538d4f9ca2f7ae19b252e6fcb66a3ae", GitTreeState:"clean", BuildDate:"2017-05-19T18:44:27Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/arm64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2", GitCommit:"08e099554f3c31f6e6f07b448ab3ed78d0520507", GitTreeState:"clean", BuildDate:"2017-01-12T04:52:34Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/arm64"}
Environment:
OS (e.g. from /etc/os-release):
NAME="Ubuntu"
VERSION="16.04.2 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.2 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
Kernel (e.g. uname -a):
Linux node4 4.11.0-rc6-next-20170411-00286-gcc55807 #0 SMP PREEMPT Mon Jun 5 18:56:20 CST 2017 aarch64 aarch64 aarch64 GNU/Linux
What happened:
I want to use kube-deploy/master.sh to set up the master on ARM64, but I encountered this error when visiting $myip:8080/ui:
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {},
"status": "Failure",
"message": "no endpoints available for service "kubernetes-dashboard"",
"reason": "ServiceUnavailable",
"code": 503
}
My branch is 2017-2-7 (c8d6fbfc…)
By the way, it can work on the x86-amd64 platform using the same installation steps.
Anything else we need to know:
5.1 kubectl get pod --namespace=kube-system
k8s-master-10.193.20.23 4/4 Running 17 1h
k8s-proxy-v1-sk8vd 1/1 Running 0 1h
kube-addon-manager-10.193.20.23 2/2 Running 2 1h
kube-dns-3365905565-xvj7n 2/4 CrashLoopBackOff 65 1h
kubernetes-dashboard-1416335539-lhlhz 0/1 CrashLoopBackOff 22 1h
5.2 kubectl describe pods kubernetes-dashboard-1416335539-lhlhz --namespace=kube-system
Name: kubernetes-dashboard-1416335539-lhlhz
Namespace: kube-system
Node: 10.193.20.23/10.193.20.23
Start Time: Mon, 12 Jun 2017 10:04:07 +0800
Labels: k8s-app=kubernetes-dashboard
pod-template-hash=1416335539
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"kube-system","name":"kubernetes-dashboard-1416335539","uid":"6ab170d2-4f13-11e7-a...
scheduler.alpha.kubernetes.io/critical-pod=
scheduler.alpha.kubernetes.io/tolerations=[{"key":"CriticalAddonsOnly", "operator":"Exists"}]
Status: Running
IP: 10.1.70.2
Controllers: ReplicaSet/kubernetes-dashboard-1416335539
Containers:
kubernetes-dashboard:
Container ID: docker://fbdbe4c047803b0e98ca7412ca617031f1f31d881e3a5838298a1fda24a1ae18
Image: gcr.io/google_containers/kubernetes-dashboard-arm64:v1.5.0
Image ID: docker-pullable://gcr.io/google_containers/kubernetes-dashboard-arm64@sha256:559d58ef0d8e9dbe78f80060401b97d6262462318c0b8e071937a73896ea1d3d
Port: 9090/TCP
State: Running
Started: Mon, 12 Jun 2017 11:30:03 +0800
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Mon, 12 Jun 2017 11:24:28 +0800
Finished: Mon, 12 Jun 2017 11:24:59 +0800
Ready: True
Restart Count: 23
Limits:
cpu: 100m
memory: 50Mi
Requests:
cpu: 100m
memory: 50Mi
Liveness: http-get http://:9090/ delay=30s timeout=30s period=10s #success=1 #failure=3
Environment:
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-0mnn8 (ro)
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
default-token-0mnn8:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-0mnn8
Optional: false
QoS Class: Guaranteed
Node-Selectors:
Tolerations:
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
30m 30m 1 kubelet, 10.193.20.23 spec.containers{kubernetes-dashboard} Normal Killing Killing container with docker id b0562b3640ae: pod "kubernetes-dashboard-1416335539-lhlhz_kube-system(6ab54dba-4f13-11e7-a56b-6805ca369d7f)" container "kubernetes-dashboard" is unhealthy, it will be killed and re-created.
18m 18m 1 kubelet, 10.193.20.23 spec.containers{kubernetes-dashboard} Normal Killing Killing container with docker id 477066c3a00f: pod "kubernetes-dashboard-1416335539-lhlhz_kube-system(6ab54dba-4f13-11e7-a56b-6805ca369d7f)" container "kubernetes-dashboard" is unhealthy, it will be killed and re-created.
12m 12m 1 kubelet, 10.193.20.23 spec.containers{kubernetes-dashboard} Normal Killing Killing container with docker id 3e021d6df31f: pod "kubernetes-dashboard-1416335539-lhlhz_kube-system(6ab54dba-4f13-11e7-a56b-6805ca369d7f)" container "kubernetes-dashboard" is unhealthy, it will be killed and re-created.
11m 11m 1 kubelet, 10.193.20.23 spec.containers{kubernetes-dashboard} Normal Killing Killing container with docker id 43fe3c37817d: pod "kubernetes-dashboard-1416335539-lhlhz_kube-system(6ab54dba-4f13-11e7-a56b-6805ca369d7f)" container "kubernetes-dashboard" is unhealthy, it will be killed and re-created.
5m 5m 1 kubelet, 10.193.20.23 spec.containers{kubernetes-dashboard} Normal Killing Killing container with docker id 23cea72e1f45: pod "kubernetes-dashboard-1416335539-lhlhz_kube-system(6ab54dba-4f13-11e7-a56b-6805ca369d7f)" container "kubernetes-dashboard" is unhealthy, it will be killed and re-created.
1h 5m 7 kubelet, 10.193.20.23 spec.containers{kubernetes-dashboard} Warning Unhealthy Liveness probe failed: Get http://10.1.70.2:9090/: dial tcp 10.1.70.2:9090: getsockopt: connection refused
1h 38s 335 kubelet, 10.193.20.23 spec.containers{kubernetes-dashboard} Warning BackOff Back-off restarting failed docker container
1h 38s 307 kubelet, 10.193.20.23 Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "kubernetes-dashboard" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=kubernetes-dashboard pod=kubernetes-dashboard-1416335539-lhlhz_kube-system(6ab54dba-4f13-11e7-a56b-6805ca369d7f)"
1h 27s 24 kubelet, 10.193.20.23 spec.containers{kubernetes-dashboard} Normal Pulled Container image "gcr.io/google_containers/kubernetes-dashboard-arm64:v1.5.0" already present on machine
59m 23s 15 kubelet, 10.193.20.23 spec.containers{kubernetes-dashboard} Normal Created (events with common reason combined)
59m 22s 15 kubelet, 10.193.20.23 spec.containers{kubernetes-dashboard} Normal Started (events with common reason combined)
5.3 kubectl get svc,ep,rc,rs,deploy,pod -o wide --all-namespaces
NAMESPACE NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
default svc/kubernetes 10.0.0.1 443/TCP 16m
kube-system svc/kube-dns 10.0.0.10 53/UDP,53/TCP 16m k8s-app=kube-dns
kube-system svc/kubernetes-dashboard 10.0.0.95 80/TCP 16m k8s-app=kubernetes-dashboard
NAMESPACE NAME ENDPOINTS AGE
default ep/kubernetes 10.193.20.23:6443 16m
kube-system ep/kube-controller-manager <none> 11m
kube-system ep/kube-dns 16m
kube-system ep/kube-scheduler <none> 11m
kube-system ep/kubernetes-dashboard 16m
NAMESPACE NAME DESIRED CURRENT READY AGE CONTAINER(S) IMAGE(S) SELECTOR
kube-system rs/kube-dns-3365905565 1 1 0 16m kubedns,dnsmasq,dnsmasq-metrics,healthz gcr.io/google_containers/kubedns-arm64:1.9,gcr.io/google_containers/kube-dnsmasq-arm64:1.4,gcr.io/google_containers/dnsmasq-metrics-arm64:1.0,gcr.io/google_containers/exechealthz-arm64:1.2 k8s-app=kube-dns,pod-template-hash=3365905565
kube-system rs/kubernetes-dashboard-1416335539 1 1 0 16m kubernetes-dashboard gcr.io/google_containers/kubernetes-dashboard-arm64:v1.5.0 k8s-app=kubernetes-dashboard,pod-template-hash=1416335539
NAMESPACE NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE CONTAINER(S) IMAGE(S) SELECTOR
kube-system deploy/kube-dns 1 1 1 0 16m kubedns,dnsmasq,dnsmasq-metrics,healthz gcr.io/google_containers/kubedns-arm64:1.9,gcr.io/google_containers/kube-dnsmasq-arm64:1.4,gcr.io/google_containers/dnsmasq-metrics-arm64:1.0,gcr.io/google_containers/exechealthz-arm64:1.2 k8s-app=kube-dns
kube-system deploy/kubernetes-dashboard 1 1 1 0 16m kubernetes-dashboard gcr.io/google_containers/kubernetes-dashboard-arm64:v1.5.0 k8s-app=kubernetes-dashboard
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
kube-system po/k8s-master-10.193.20.23 4/4 Running 50 15m 10.193.20.23 10.193.20.23
kube-system po/k8s-proxy-v1-5b831 1/1 Running 0 16m 10.193.20.23 10.193.20.23
kube-system po/kube-addon-manager-10.193.20.23 2/2 Running 6 15m 10.193.20.23 10.193.20.23
kube-system po/kube-dns-3365905565-jxg4f 1/4 CrashLoopBackOff 20 16m 10.1.5.3 10.193.20.23
kube-system po/kubernetes-dashboard-1416335539-frt3v 0/1 CrashLoopBackOff 7 16m 10.1.5.2 10.193.20.23
5.4 kubectl describe pods kube-dns-3365905565-lb0mq --namespace=kube-system
Name: kube-dns-3365905565-lb0mq
Namespace: kube-system
Node: 10.193.20.23/10.193.20.23
Start Time: Wed, 14 Jun 2017 10:43:46 +0800
Labels: k8s-app=kube-dns
pod-template-hash=3365905565
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"kube-system","name":"kube-dns-3365905565","uid":"4870aec2-50ab-11e7-a420-6805ca36...
scheduler.alpha.kubernetes.io/critical-pod=
scheduler.alpha.kubernetes.io/tolerations=[{"key":"CriticalAddonsOnly", "operator":"Exists"}]
Status: Running
IP: 10.1.20.3
Controllers: ReplicaSet/kube-dns-3365905565
Containers:
kubedns:
Container ID: docker://729562769b48be60a02b62692acd3d1e1c67ac2505f4cb41240067777f45fd77
Image: gcr.io/google_containers/kubedns-arm64:1.9
Image ID: docker-pullable://gcr.io/google_containers/kubedns-arm64@sha256:3c78a2c5b9b86c5aeacf9f5967f206dcf1e64362f3e7f274c1c078c954ecae38
Ports: 10053/UDP, 10053/TCP, 10055/TCP
Args:
--domain=cluster.local.
--dns-port=10053
--config-map=kube-dns
--v=0
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Wed, 14 Jun 2017 10:56:29 +0800
Finished: Wed, 14 Jun 2017 10:58:06 +0800
Ready: False
Restart Count: 6
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/healthz-kubedns delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8081/readiness delay=3s timeout=5s period=10s #success=1 #failure=3
Environment:
PROMETHEUS_PORT: 10055
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-1t5v9 (ro)
dnsmasq:
Container ID: docker://b6d7e98a4af2715294764929f901947ab3b985be45d9f213245bd338ab8c3101
Image: gcr.io/google_containers/kube-dnsmasq-arm64:1.4
Image ID: docker-pullable://gcr.io/google_containers/kube-dnsmasq-arm64@sha256:dff5f9e2a521816aa314d469fd8ef961270fe43b4a74bab490385942103f3728
Ports: 53/UDP, 53/TCP
Args:
--cache-size=1000
--no-resolv
--server=127.0.0.1#10053
--log-facility=-
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Wed, 14 Jun 2017 10:55:50 +0800
Finished: Wed, 14 Jun 2017 10:57:26 +0800
Ready: False
Restart Count: 6
Requests:
cpu: 150m
memory: 10Mi
Liveness: http-get http://:8080/healthz-dnsmasq delay=60s timeout=5s period=10s #success=1 #failure=5
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-1t5v9 (ro)
dnsmasq-metrics:
Container ID: docker://51693aea0e732e488b631dcedc082f5a9e23b5b74857217cf005d1e947375367
Image: gcr.io/google_containers/dnsmasq-metrics-arm64:1.0
Image ID: docker-pullable://gcr.io/google_containers/dnsmasq-metrics-arm64@sha256:fc0e8b676a26ed0056b8c68611b74b9b5f3f00c608e5b11ef1608484ce55dd9a
Port: 10054/TCP
Args:
--v=2
--logtostderr
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: ContainerCannotRun
Exit Code: 128
Started: Wed, 14 Jun 2017 10:57:28 +0800
Finished: Wed, 14 Jun 2017 10:57:28 +0800
Ready: False
Restart Count: 7
Requests:
memory: 10Mi
Liveness: http-get http://:10054/metrics delay=60s timeout=5s period=10s #success=1 #failure=5
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-1t5v9 (ro)
healthz:
Container ID: docker://fab7ef143a95ad4d2f6363d5fcdc162eba1522b92726665916462be765289327
Image: gcr.io/google_containers/exechealthz-arm64:1.2
Image ID: docker-pullable://gcr.io/google_containers/exechealthz-arm64@sha256:e8300fde6c36b454cc00b5fffc96d6985622db4d8eb42a9f98f24873e9535b5c
Port: 8080/TCP
Args:
--cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
--url=/healthz-dnsmasq
--cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1:10053 >/dev/null
--url=/healthz-kubedns
--port=8080
--quiet
State: Running
Started: Wed, 14 Jun 2017 10:44:31 +0800
Ready: True
Restart Count: 0
Limits:
memory: 50Mi
Requests:
cpu: 10m
memory: 50Mi
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-1t5v9 (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
default-token-1t5v9:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-1t5v9
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
15m 15m 1 default-scheduler Normal Scheduled Successfully assigned kube-dns-3365905565-lb0mq to 10.193.20.23
14m 14m 1 kubelet, 10.193.20.23 spec.containers{kubedns} Normal Created Created container with docker id 2fef2db445e6; Security:[seccomp=unconfined]
14m 14m 1 kubelet, 10.193.20.23 spec.containers{kubedns} Normal Started Started container with docker id 2fef2db445e6
14m 14m 1 kubelet, 10.193.20.23 spec.containers{dnsmasq} Normal Created Created container with docker id 41ec998eeb76; Security:[seccomp=unconfined]
14m 14m 1 kubelet, 10.193.20.23 spec.containers{dnsmasq} Normal Started Started container with docker id 41ec998eeb76
14m 14m 1 kubelet, 10.193.20.23 spec.containers{dnsmasq-metrics} Normal Created Created container with docker id 676ef0e877c8; Security:[seccomp=unconfined]
14m 14m 1 kubelet, 10.193.20.23 spec.containers{healthz} Normal Pulled Container image "gcr.io/google_containers/exechealthz-arm64:1.2" already present on machine
14m 14m 1 kubelet, 10.193.20.23 spec.containers{dnsmasq-metrics} Warning Failed Failed to start container with docker id 676ef0e877c8 with error: Error response from daemon: {"message":"linux spec user: unable to find group nobody: no matching entries in group file"}
14m 14m 1 kubelet, 10.193.20.23 spec.containers{healthz} Normal Created Created container with docker id fab7ef143a95; Security:[seccomp=unconfined]
14m 14m 1 kubelet, 10.193.20.23 spec.containers{healthz} Normal Started Started container with docker id fab7ef143a95
14m 14m 1 kubelet, 10.193.20.23 spec.containers{dnsmasq-metrics} Warning Failed Failed to start container with docker id 45f6bd7f1f3a with error: Error response from daemon: {"message":"linux spec user: unable to find group nobody: no matching entries in group file"}
14m 14m 1 kubelet, 10.193.20.23 spec.containers{dnsmasq-metrics} Normal Created Created container with docker id 45f6bd7f1f3a; Security:[seccomp=unconfined]
14m 14m 1 kubelet, 10.193.20.23 Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "dnsmasq-metrics" with CrashLoopBackOff: "Back-off 10s restarting failed container=dnsmasq-metrics pod=kube-dns-3365905565-lb0mq_kube-system(48845c1a-50ab-11e7-a420-6805ca369d7f)"
14m 14m 1 kubelet, 10.193.20.23 spec.containers{dnsmasq-metrics} Normal Created Created container with docker id 2d1e5adb97bb; Security:[seccomp=unconfined]
14m 14m 1 kubelet, 10.193.20.23 spec.containers{dnsmasq-metrics} Warning Failed Failed to start container with docker id 2d1e5adb97bb with error: Error response from daemon: {"message":"linux spec user: unable to find group nobody: no matching entries in group file"}
14m 14m 2 kubelet, 10.193.20.23 Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "dnsmasq-metrics" with CrashLoopBackOff: "Back-off 20s restarting failed container=dnsmasq-metrics pod=kube-dns-3365905565-lb0mq_kube-system(48845c1a-50ab-11e7-a420-6805ca369d7f)"
So it looks like you have hit one (or several) bugs in Kubernetes. I suggest that you retry with a more recent version (possibly another Docker version too). It would be a good idea to report these bugs too (https://github.com/kubernetes/dashboard/issues).
All in all, bear in mind that Kubernetes on ARM is an advanced topic and you should expect problems and be ready to debug/resolve them :/
There might be a problem with that docker image (gcr.io/google_containers/dnsmasq-metrics-amd64). Non-amd64 stuff is not well tested.
Could you try running:
kubectl set image --namespace=kube-system deployment/kube-dns dnsmasq-metrics=lenart/dnsmasq-metrics-arm64:1.0
You can't reach the dashboard because the dashboard Pod is unhealthy and failing the readiness probe. Because it's not ready, it's not considered for the dashboard service, so the service has no endpoints, which leads to the error message you reported.
The dashboard is most likely unhealthy because kube-dns is not ready (1/4 containers in the Pod ready, should be 4/4).
kube-dns is most likely unhealthy because you have no pod networking (overlay network) deployed.
Go to the add-ons, pick a network add-on and deploy it. Weave has a 1.5-compatible version and requires no setup.
After you have done that, give it a few minutes. If you are impatient, just delete the kubernetes-dashboard and kube-dns pods (not the deployment/controller!!), as sketched below. If this does not resolve your problem, then please update your question with the new information.
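A sketch of that pod deletion, using the k8s-app labels visible in your output (the deployments/controllers will recreate the pods):

kubectl delete pod --namespace=kube-system -l k8s-app=kube-dns
kubectl delete pod --namespace=kube-system -l k8s-app=kubernetes-dashboard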

Kubernetes Node status ready but cannot be seen by the scheduler

I've set up a Kubernetes cluster with three nodes. All my nodes show Ready status, but the scheduler does not seem to find one of them. How could this happen?
[root@master1 app]# kubectl get nodes
NAME LABELS STATUS AGE
172.16.0.44 kubernetes.io/hostname=172.16.0.44,pxc=node1 Ready 8d
172.16.0.45 kubernetes.io/hostname=172.16.0.45 Ready 8d
172.16.0.46 kubernetes.io/hostname=172.16.0.46 Ready 8d
I use nodeSelector in my RC file like this:
nodeSelector:
pxc: node1
describe the rc:
Name: mongo-controller
Namespace: kube-system
Image(s): mongo
Selector: k8s-app=mongo
Labels: k8s-app=mongo
Replicas: 1 current / 1 desired
Pods Status: 0 Running / 1 Waiting / 0 Succeeded / 0 Failed
Volumes:
mongo-persistent-storage:
Type: HostPath (bare host directory volume)
Path: /k8s/mongodb
Events:
FirstSeen LastSeen Count From SubobjectPath Reason Message
───────── ──────── ───── ──── ───────────── ────── ───────
25m 25m 1 {replication-controller } SuccessfulCreate Created pod: mongo-controller-0wpwu
kubectl get pods shows the pod pending:
[root@master1 app]# kubectl get pods mongo-controller-0wpwu --namespace=kube-system
NAME READY STATUS RESTARTS AGE
mongo-controller-0wpwu 0/1 Pending 0 27m
describe pod mongo-controller-0wpwu:
[root@master1 app]# kubectl describe pod mongo-controller-0wpwu --namespace=kube-system
Name: mongo-controller-0wpwu
Namespace: kube-system
Image(s): mongo
Node: /
Labels: k8s-app=mongo
Status: Pending
Reason:
Message:
IP:
Replication Controllers: mongo-controller (1/1 replicas created)
Containers:
mongo:
Container ID:
Image: mongo
Image ID:
QoS Tier:
cpu: BestEffort
memory: BestEffort
State: Waiting
Ready: False
Restart Count: 0
Environment Variables:
Volumes:
mongo-persistent-storage:
Type: HostPath (bare host directory volume)
Path: /k8s/mongodb
default-token-7qjcu:
Type: Secret (a secret that should populate this volume)
SecretName: default-token-7qjcu
Events:
FirstSeen LastSeen Count From SubobjectPath Reason Message
───────── ──────── ───── ──── ───────────── ────── ───────
22m 37s 12 {default-scheduler } FailedScheduling pod (mongo-controller-0wpwu) failed to fit in any node
fit failure on node (172.16.0.46): MatchNodeSelector
fit failure on node (172.16.0.45): MatchNodeSelector
27m 9s 67 {default-scheduler } FailedScheduling pod (mongo-controller-0wpwu) failed to fit in any node
fit failure on node (172.16.0.45): MatchNodeSelector
fit failure on node (172.16.0.46): MatchNodeSelector
See the IP list in the events: 172.16.0.44 seems not to be seen by the scheduler. How could that happen?
describe the node 172.16.0.44
[root@master1 app]# kubectl describe nodes --namespace=kube-system
Name: 172.16.0.44
Labels: kubernetes.io/hostname=172.16.0.44,pxc=node1
CreationTimestamp: Wed, 30 Mar 2016 15:58:47 +0800
Phase:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
──── ────── ───────────────── ────────────────── ────── ───────
Ready True Fri, 08 Apr 2016 12:18:01 +0800 Fri, 08 Apr 2016 11:18:52 +0800 KubeletReady kubelet is posting ready status
OutOfDisk Unknown Wed, 30 Mar 2016 15:58:47 +0800 Thu, 07 Apr 2016 17:38:50 +0800 NodeStatusNeverUpdated Kubelet never posted node status.
Addresses: 172.16.0.44,172.16.0.44
Capacity:
cpu: 2
memory: 7748948Ki
pods: 40
System Info:
Machine ID: 45461f76679f48ee96e95da6cc798cc8
System UUID: 2B850D4F-953C-4C20-B182-66E17D5F6461
Boot ID: 40d2cd8d-2e46-4fef-92e1-5fba60f57965
Kernel Version: 3.10.0-123.9.3.el7.x86_64
OS Image: CentOS Linux 7 (Core)
Container Runtime Version: docker://1.10.1
Kubelet Version: v1.2.0
Kube-Proxy Version: v1.2.0
ExternalID: 172.16.0.44
Non-terminated Pods: (1 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
───────── ──── ──────────── ────────── ─────────────── ─────────────
kube-system kube-registry-proxy-172.16.0.44 100m (5%) 100m (5%) 50Mi (0%) 50Mi (0%)
Allocated resources:
(Total limits may be over 100%, i.e., overcommitted. More info: http://releases.k8s.io/HEAD/docs/user-guide/compute-resources.md)
CPU Requests CPU Limits Memory Requests Memory Limits
──────────── ────────── ─────────────── ─────────────
100m (5%) 100m (5%) 50Mi (0%) 50Mi (0%)
Events:
FirstSeen LastSeen Count From SubobjectPath Reason Message
───────── ──────── ───── ──── ───────────── ────── ───────
59m 59m 1 {kubelet 172.16.0.44} Starting Starting kubelet.
If I SSH into 44, I see the disk space is free (I also removed some docker images and containers):
[root@iZ25dqhvvd0Z ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/xvda1 40G 2.6G 35G 7% /
devtmpfs 3.9G 0 3.9G 0% /dev
tmpfs 3.7G 0 3.7G 0% /dev/shm
tmpfs 3.7G 143M 3.6G 4% /run
tmpfs 3.7G 0 3.7G 0% /sys/fs/cgroup
/dev/xvdb 40G 361M 37G 1% /k8s
Still, the docker logs of the scheduler (v1.3.0-alpha.1) show this:
E0408 05:28:42.679448 1 factory.go:387] Error scheduling kube-system mongo-controller-0wpwu: pod (mongo-controller-0wpwu) failed to fit in any node
fit failure on node (172.16.0.45): MatchNodeSelector
fit failure on node (172.16.0.46): MatchNodeSelector
; retrying
I0408 05:28:42.679577 1 event.go:216] Event(api.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"mongo-controller-0wpwu", UID:"2d0f0844-fd3c-11e5-b531-00163e000727", APIVersion:"v1", ResourceVersion:"634139", FieldPath:""}): type: 'Warning' reason: 'FailedScheduling' pod (mongo-controller-0wpwu) failed to fit in any node
fit failure on node (172.16.0.45): MatchNodeSelector
fit failure on node (172.16.0.46): MatchNodeSelector
Thanks for your reply, Robert. I got this resolved by doing the following:
kubectl delete rc
kubectl delete node 172.16.0.44
stop kubelet in 172.16.0.44
rm -rf /k8s/*
restart kubelet
Now the node is ready, and the OutOfDisk condition is gone.
Name: 172.16.0.44
Labels: kubernetes.io/hostname=172.16.0.44,pxc=node1
CreationTimestamp: Fri, 08 Apr 2016 15:14:51 +0800
Phase:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
──── ────── ───────────────── ────────────────── ────── ───────
Ready True Fri, 08 Apr 2016 15:25:33 +0800 Fri, 08 Apr 2016 15:14:50 +0800 KubeletReady kubelet is posting ready status
Addresses: 172.16.0.44,172.16.0.44
Capacity:
cpu: 2
memory: 7748948Ki
pods: 40
System Info:
Machine ID: 45461f76679f48ee96e95da6cc798cc8
System UUID: 2B850D4F-953C-4C20-B182-66E17D5F6461
Boot ID: 40d2cd8d-2e46-4fef-92e1-5fba60f57965
Kernel Version: 3.10.0-123.9.3.el7.x86_64
OS Image: CentOS Linux 7 (Core)
I found this https://github.com/kubernetes/kubernetes/issues/4135, but I still don't know why my disk space is free while the kubelet thinks it is out of disk...
The reason the scheduler failed is that there wasn't space to fit the pod onto the node. If you look at the conditions for your node, it says that the OutOfDisk condition is Unknown. The scheduler is probably not willing to place a pod onto a node that it thinks doesn't have available disk space.
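One quick way to check what the scheduler sees, assuming kubectl access, is to inspect the node's conditions directly:

kubectl describe node 172.16.0.44
# or dump the raw status, including conditions:
kubectl get node 172.16.0.44 -o yaml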
We had the same issue in AWS when they changed DNS from IP=DNS name to IP=IP.eu-central: nodes showed Ready but were not reachable via their name.