Today I created three pods with a deployment.yaml, but the pods' status is always ContainerCreating - kubernetes

This is my deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.12.2
        ports:
        - containerPort: 80
These are my pods:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-deployment-5cc6c7559b-6vk87 0/1 ContainerCreating 0 51m <none> k8s-node2 <none> <none>
nginx-deployment-5cc6c7559b-g7wpz 0/1 ContainerCreating 0 51m <none> k8s-node1 <none> <none>
nginx-deployment-5cc6c7559b-s6k2s 0/1 ContainerCreating 0 51m <none> k8s-node1 <none> <none>
This is the description of one of the pods:
Name: nginx-deployment-5cc6c7559b-6vk87
Namespace: default
Priority: 0
Node: k8s-node2/192.168.74.136
Start Time: Mon, 22 Mar 2021 03:02:36 -0400
Labels: app=nginx
pod-template-hash=5cc6c7559b
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/nginx-deployment-5cc6c7559b
Containers:
nginx:
Container ID:
Image: nginx:1.12.2
Image ID:
Port: 80/TCP
Host Port: 0/TCP
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-s7x98 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-s7x98:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-s7x98
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/nginx-deployment-5cc6c7559b-6vk87 to k8s-node2
Warning FailedCreatePodSandBox 33m kubelet, k8s-node2 Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "688ae6e1b403f8cf0f56bb41ef6e2341044c949304874400a3f4ced159c40f08" network for pod "nginx-deployment-5cc6c7559b-6vk87": networkPlugin cni failed to set up pod "nginx-deployment-5cc6c7559b-6vk87_default" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 33m kubelet, k8s-node2 Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "d8b9f498bf0407ebc5e8e47700af9cec559632f38d12252b1edcde723ce9863f" network for pod "nginx-deployment-5cc6c7559b-6vk87": networkPlugin cni failed to set up pod "nginx-deployment-5cc6c7559b-6vk87_default" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 33m kubelet, k8s-node2 Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "2c72ed28e5672a1da32f7941ba0b638eb459048ff9e70aec42bd125a569faf3f" network for pod "nginx-deployment-5cc6c7559b-6vk87": networkPlugin cni failed to set up pod "nginx-deployment-5cc6c7559b-6vk87_default" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 33m kubelet, k8s-node2 Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "7dd06af29506a6e4f22c9484b47ca23412a57b61398ee6caa89edec59e2dcfa5" network for pod "nginx-deployment-5cc6c7559b-6vk87": networkPlugin cni failed to set up pod "nginx-deployment-5cc6c7559b-6vk87_default" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 33m kubelet, k8s-node2 Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "6c14c33fdbb3bb8e42d7e33c991bc51220dcbfd5acc71115c26f966a759fff29" network for pod "nginx-deployment-5cc6c7559b-6vk87": networkPlugin cni failed to set up pod "nginx-deployment-5cc6c7559b-6vk87_default" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 33m kubelet, k8s-node2 Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "dacb90c7ab07cc55c83dba82286e65dd89e30be569e9b5744202c2ae65f54830" network for pod "nginx-deployment-5cc6c7559b-6vk87": networkPlugin cni failed to set up pod "nginx-deployment-5cc6c7559b-6vk87_default" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 33m kubelet, k8s-node2 Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "d03004bff912d6f9aaf614e892d2b43c153392e8fcc03e7988c43d4dfb46ebf0" network for pod "nginx-deployment-5cc6c7559b-6vk87": networkPlugin cni failed to set up pod "nginx-deployment-5cc6c7559b-6vk87_default" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 33m kubelet, k8s-node2 Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "7eaf53ffba761c30bfa13f2b3cae2ca2957f9fefee47edf6c0b46943bb09d7a3" network for pod "nginx-deployment-5cc6c7559b-6vk87": networkPlugin cni failed to set up pod "nginx-deployment-5cc6c7559b-6vk87_default" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 33m kubelet, k8s-node2 Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "5047e9ad878b99b69090cf96e5534dfe10ec46830cdcc7e73a8afc96dc11e98c" network for pod "nginx-deployment-5cc6c7559b-6vk87": networkPlugin cni failed to set up pod "nginx-deployment-5cc6c7559b-6vk87_default" network: open /run/flannel/subnet.env: no such file or directory
Normal SandboxChanged 18m (x859 over 33m) kubelet, k8s-node2 Pod sandbox changed, it will be killed and re-created.
Warning FailedCreatePodSandBox 3m34s (x1712 over 33m) kubelet, k8s-node2 (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "501ff9b578eac098d6f763a0bc6212423b71714c9d2b1c83ea94b25e7a30e374" network for pod "nginx-deployment-5cc6c7559b-6vk87": networkPlugin cni failed to set up pod "nginx-deployment-5cc6c7559b-6vk87_default" network: open /run/flannel/subnet.env: no such file or directory
I found a similar question: Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod network
I have /etc/cni/net.d and /opt/cni/bin:
[root@k8s-master bin]# cd /etc/cni/net.d/
[root@k8s-master net.d]# ll -a
total 4
drwxr-xr-x. 2 root root 33 Feb 25 11:08 .
drwxr-xr-x. 3 root root 19 Feb 25 11:08 ..
-rw-r--r--. 1 root root 292 Feb 25 11:08 10-flannel.conflist
[root@k8s-master net.d]# cd /opt/cni/bin
[root@k8s-master bin]# ll -a
total 56484
drwxr-xr-x. 2 root root 239 Feb 25 10:01 .
drwxr-xr-x. 3 root root 17 Feb 25 10:01 ..
-rwxr-xr-x. 1 root root 3254624 Sep 9 2020 bandwidth
-rwxr-xr-x. 1 root root 3581192 Sep 9 2020 bridge
-rwxr-xr-x. 1 root root 9837552 Sep 9 2020 dhcp
-rwxr-xr-x. 1 root root 4699824 Sep 9 2020 firewall
-rwxr-xr-x. 1 root root 2650368 Sep 9 2020 flannel
-rwxr-xr-x. 1 root root 3274160 Sep 9 2020 host-device
-rwxr-xr-x. 1 root root 2847152 Sep 9 2020 host-local
-rwxr-xr-x. 1 root root 3377272 Sep 9 2020 ipvlan
-rwxr-xr-x. 1 root root 2715600 Sep 9 2020 loopback
-rwxr-xr-x. 1 root root 3440168 Sep 9 2020 macvlan
-rwxr-xr-x. 1 root root 3048528 Sep 9 2020 portmap
-rwxr-xr-x. 1 root root 3528800 Sep 9 2020 ptp
-rwxr-xr-x. 1 root root 2849328 Sep 9 2020 sbr
-rwxr-xr-x. 1 root root 2503512 Sep 9 2020 static
-rwxr-xr-x. 1 root root 2820128 Sep 9 2020 tuning
-rwxr-xr-x. 1 root root 3377120 Sep 9 2020 vlan
I have three nodes named k8s-master, k8s-node1, and k8s-node2, but I haven't added any rules for the nodes.
Something is not right:
NAME READY STATUS RESTARTS AGE
coredns-7ff77c879f-m7bjr 0/1 CrashLoopBackOff 171 24d
coredns-7ff77c879f-x4xjf 0/1 Running 170 24d
etcd-k8s-master 1/1 Running 0 24d
kube-apiserver-k8s-master 1/1 Running 8 24d
kube-controller-manager-k8s-master 1/1 Running 2 24d
kube-proxy-6wxcp 1/1 Running 1 24d
kube-proxy-cmhn6 1/1 Running 0 24d
kube-proxy-pzhqc 1/1 Running 0 24d
kube-scheduler-k8s-master 1/1 Running 2 24d
My network plugin flannel isn't working; maybe that is the cause of this problem.

I only had to execute one command:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
That resolved the issue.
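To confirm the fix took effect, a quick check like the following should show the flannel pods Running on every node and the previously missing file present (namespace and file path assume the defaults from the upstream kube-flannel manifest):
# flannel runs as a DaemonSet, so expect one Running pod per node
kubectl get pods -n kube-system -o wide | grep flannel
# on each node, the file the kubelet was complaining about should now exist
cat /run/flannel/subnet.env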

As you said, it looks like the issue is with flannel.
Please try to follow the advice in this GitHub issue: https://github.com/kubernetes/kubernetes/issues/70202#issuecomment-481173403
The top-voted answer there is:
Just got the same problem - fixed it by manually adding the file:
/run/flannel/subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
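A minimal sketch of that workaround, run on each affected node (the values below assume the default 10.244.0.0/16 pod network used by the stock flannel manifest, and FLANNEL_SUBNET must match the podCIDR assigned to that particular node):
sudo mkdir -p /run/flannel
sudo tee /run/flannel/subnet.env > /dev/null <<'EOF'
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
EOF
Note that this only masks the symptom: if the flannel DaemonSet itself is not healthy, the file can disappear again after a reboot.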

Related

When I restart a pod, it shows "Volume is already attached by pod minio/minio-3"

When I restart a pod with 'kubectl delete -n minio pod minio-3', kubelet shows "Volume is already attached by pod minio/minio-3". It seems like the volume is still attached to the old pod. How can I make it work?
[root@control01 ~]# kubectl get pod -n minio
NAME READY STATUS RESTARTS AGE
minio-0 0/1 ContainerCreating 0 62m
minio-1 1/1 Running 0 128m
minio-2 1/1 Running 1 6d7h
minio-3 0/1 ContainerCreating 0 96m
[root@control12 ~]# cat /var/log/messages | grep 'Sep 13'
Sep 13 16:48:10 control12 kubelet: E0913 16:48:10.189343 40141 nestedpendingoperations.go:270] Operation for "\"flexvolume-ceph.rook.io/rook-ceph/e51e0fbd-f09f-430a-8b47-2ca1dbdfdc2e-pvc-06a22ce4-cbbb-4cd7-82c5-d7bf9755fbd9\" (\"e51e0fbd-f09f-430a-8b47-2ca1dbdfdc2e\")" failed. No retries permitted until 2022-09-13 16:50:12.189281072 +0800 CST m=+5554.416909175 (durationBeforeRetry 2m2s). Error: "MountVolume.SetUp failed for volume \"pvc-06a22ce4-cbbb-4cd7-82c5-d7bf9755fbd9\" (UniqueName: \"flexvolume-ceph.rook.io/rook-ceph/e51e0fbd-f09f-430a-8b47-2ca1dbdfdc2e-pvc-06a22ce4-cbbb-4cd7-82c5-d7bf9755fbd9\") pod \"minio-3\" (UID: \"e51e0fbd-f09f-430a-8b47-2ca1dbdfdc2e\") : mount command failed, status: Failure, reason: Rook: Mount volume failed: failed to attach volume pvc-06a22ce4-cbbb-4cd7-82c5-d7bf9755fbd9 for pod minio/minio-3. Volume is already attached by pod minio/minio-3. Status Pending"
[root@control01 ~]# kubectl describe pod -n minio minio-3
Name: minio-3
Namespace: minio
Priority: 0
Node: control12/192.168.1.112
Start Time: Tue, 13 Sep 2022 15:18:28 +0800
Labels: app=minio
controller-revision-hash=minio-95c8c444c
statefulset.kubernetes.io/pod-name=minio-3
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: StatefulSet/minio
Containers:
minio:
Container ID:
Image: minio/minio:RELEASE.2021-04-06T23-11-00Z
Image ID:
Port: 9000/TCP
Host Port: 0/TCP
Args:
server
http://minio-{0...3}.minio.minio.svc.cluster.local/minio/data
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
...
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedMount 52m (x13 over 93m) kubelet, control12 Unable to attach or mount volumes: unmounted volumes=[minio-data], unattached volumes=[minio-data tz-config default-token-np5x5]: timed out waiting for the condition
Warning FailedMount 11m (x50 over 97m) kubelet, control12 MountVolume.SetUp failed for volume "pvc-06a22ce4-cbbb-4cd7-82c5-d7bf9755fbd9" : mount command failed, status: Failure, reason: Rook: Mount volume failed: failed to attach volume pvc-06a22ce4-cbbb-4cd7-82c5-d7bf9755fbd9 for pod minio/minio-3. Volume is already attached by pod minio/minio-3. Status Pending
Warning FailedMount 6m36s (x7 over 95m) kubelet, control12 Unable to attach or mount volumes: unmounted volumes=[minio-data], unattached volumes=[tz-config default-token-np5x5 minio-data]: timed out waiting for the condition
Warning FailedMount 2m1s (x9 over 86m) kubelet, control12 Unable to attach or mount volumes: unmounted volumes=[minio-data], unattached volumes=[default-token-np5x5 minio-data tz-config]: timed out waiting for the condition
If you want to delete the attachment:
$ kubectl get pv
# find the pvc NAME by CLAIM
$ kubectl get volumeattachment
# use the NAME from above as PV in the output to look up CSI NAME
$ kubectl delete volumeattachment [CSI NAME]
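A sketch of that lookup chain, using a hypothetical attachment name as the placeholder and the fields that VolumeAttachment objects expose:
# PVC -> PV: the VOLUME column of the bound PVC is the PV name
kubectl get pvc -n minio
# PV -> VolumeAttachment: list attachments with their backing PV and node
kubectl get volumeattachment -o custom-columns=NAME:.metadata.name,PV:.spec.source.persistentVolumeName,NODE:.spec.nodeName
# delete the stale attachment found above
kubectl delete volumeattachment <attachment-name>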

Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container

We are trying to create a Pod, but its status has been stuck at ContainerCreating for a long time.
This is the output we got after running the command: kubectl describe pod
Name: demo-6c59fb8f77-9x6sr
Namespace: default
Priority: 0
Node: k8-slave2/10.0.0.5
Start Time: Wed, 23 Dec 2020 10:16:23 +0000
Labels: app=demo
pod-template-hash=6c59fb8f77
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/demo-6c59fb8f77
Containers:
private-docker-registry:
Container ID:
Image: private-docker-registry:5000/mahin/mof-docker-demo:v1
Image ID:
Port: <none>
Host Port: <none>
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-p94zw (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-p94zw:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-p94zw
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 10m default-scheduler Successfully assigned default/demo-6c59fb8f77-9x6sr to k8-slave2
Warning FailedCreatePodSandBox 10m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "8eee497a2176c7f5782222f804cc63a4abac7f4a2fc7813016793857ae1b1dff" network for pod "demo-6c59fb8f77-9x6sr": networkPlugin cni failed to set up pod "demo-6c59fb8f77-9x6sr_default" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 10m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "95e72bfc6f6c13de7f5c96eb76b012c2e6639ca03f4c2f270b23ed1a09b90413" network for pod "demo-6c59fb8f77-9x6sr": networkPlugin cni failed to set up pod "demo-6c59fb8f77-9x6sr_default" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 10m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "566370012e4a1d32af2ef9035ff64d743cd81f36f25d2724e7b033e393b8247e" network for pod "demo-6c59fb8f77-9x6sr": networkPlugin cni failed to set up pod "demo-6c59fb8f77-9x6sr_default" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 10m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "7d499e40f572cfc29ecfb44f8376493df56a44213b1c1e9333b65499a0c288cd" network for pod "demo-6c59fb8f77-9x6sr": networkPlugin cni failed to set up pod "demo-6c59fb8f77-9x6sr_default" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 10m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "53241e64de1e4470712b4061e2c82f44916d654bc532f8f1d12e5d5d4e136914" network for pod "demo-6c59fb8f77-9x6sr": networkPlugin cni failed to set up pod "demo-6c59fb8f77-9x6sr_default" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 10m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "fd168faab4546f988dc38fc56df2f71cf80c922e86d3f869be15a43f08328f99" network for pod "demo-6c59fb8f77-9x6sr": networkPlugin cni failed to set up pod "demo-6c59fb8f77-9x6sr_default" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 10m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "e578afe329abb0cba64802dfa480e00f2bbbb8c80be537791c24a31c853eb62f" network for pod "demo-6c59fb8f77-9x6sr": networkPlugin cni failed to set up pod "demo-6c59fb8f77-9x6sr_default" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 10m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "a3cb32dba55907ca907fc4f38f7ca05ef6db10a6af2dd1fa3c4db166e4ab9ffe" network for pod "demo-6c59fb8f77-9x6sr": networkPlugin cni failed to set up pod "demo-6c59fb8f77-9x6sr_default" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 10m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "7e4368ba8ec460b3c94de24ab0a04b6c799eb28df885cbbacfc3bb3ffa8c1e67" network for pod "demo-6c59fb8f77-9x6sr": networkPlugin cni failed to set up pod "demo-6c59fb8f77-9x6sr_default" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 10m (x4 over 10m) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "c4aaa8f8cd2dc1eff788baf04774c4ecc845568d00ed1b386df311ec224eb6f3" network for pod "demo-6c59fb8f77-9x6sr": networkPlugin cni failed to set up pod "demo-6c59fb8f77-9x6sr_default" network: open /run/flannel/subnet.env: no such file or directory
Normal SandboxChanged 56s (x551 over 10m) kubelet Pod sandbox changed, it will be killed and re-created.
azureuser#k8-master:~$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default demo-6c59fb8f77-2jq6k 0/1 ContainerCreating 0 5m23s
kube-system coredns-f9fd979d6-q8s9b 1/1 Running 2 27h
kube-system coredns-f9fd979d6-qnm4j 1/1 Running 2 27h
kube-system etcd-k8-master 1/1 Running 2 27h
kube-system kube-apiserver-k8-master 1/1 Running 3 27h
kube-system kube-controller-manager-k8-master 1/1 Running 3 27h
kube-system kube-flannel-ds-kqz4t 0/1 CrashLoopBackOff 92 27h
kube-system kube-flannel-ds-szqzn 1/1 Running 3 27h
kube-system kube-flannel-ds-v9q47 0/1 CrashLoopBackOff 142 27h
kube-system kube-proxy-4mb47 1/1 Running 2 27h
kube-system kube-proxy-54m9b 1/1 Running 2 27h
kube-system kube-proxy-wdxfz 1/1 Running 1 27h
kube-system kube-scheduler-k8-master 1/1 Running 3 27h
kubernetes-dashboard dashboard-metrics-scraper-7b59f7d4df-zmlvs 0/1 ContainerCreating 0 27h
kubernetes-dashboard kubernetes-dashboard-665f4c5ff-cnsvn 0/1 ContainerCreating 0 6h3m
To fix the flannel CrashLoopBackOff we did a kubeadm reset, and after some time this problem showed up again.
Currently we are working with one master and two worker nodes.
My cluster details are as follows:
azureuser#k8-master:~$ kubectl config view
apiVersion: v1
clusters:
- cluster:
certificate-authority-data: DATA+OMITTED
server: https://52.150.11.168:6443
name: kubernetes
contexts:
- context:
cluster: kubernetes
user: kubernetes-admin
name: kubernetes-admin@kubernetes
current-context: kubernetes-admin@kubernetes
kind: Config
preferences: {}
users:
- name: kubernetes-admin
user:
client-certificate-data: REDACTED
client-key-data: REDACTED
Docker version:
azureuser#k8-master:~$ sudo docker version
[sudo] password for azureuser:
Client:
Version: 19.03.6
API version: 1.40
Go version: go1.12.17
Git commit: 369ce74a3c
Built: Wed Oct 14 19:00:27 2020
OS/Arch: linux/amd64
Experimental: false
Server:
Engine:
Version: 19.03.6
API version: 1.40 (minimum version 1.12)
Go version: go1.12.17
Git commit: 369ce74a3c
Built: Wed Oct 14 16:52:50 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.3.3-0ubuntu1~18.04.2
GitCommit:
runc:
Version: spec: 1.0.1-dev
GitCommit:
docker-init:
Version: 0.18.0
GitCommit:
kubeadm version :
azureuser#k8-master:~$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.4", GitCommit:"d360454c9bcd1634cf4cc52d1867af5491dc9c5f", GitTreeState:"clean", BuildDate:"2020-11-11T13:15:05Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
Flannel crashes whenever I try to schedule pod creation.
Background
I think your issue is caused by the CrashLoopBackOff status of 2 of your Flannel CNI pods.
Your error
Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "8eee497a2176c7f5782222f804cc63a4abac7f4a2fc7813016793857ae1b1dff" network for pod "demo-6c59fb8f77-9x6sr": networkPlugin cni failed to set up pod "demo-6c59fb8f77-9x6sr_default" network: open /run/flannel/subnet.env: no such file or directory
indicates that the pod cannot be created because the /run/flannel/subnet.env file is missing.
In the Flannel GitHub documentation you can find:
Flannel runs a small, single binary agent called flanneld on each host, and is responsible for allocating a subnet lease to each host out of a larger, preconfigured address space.
Meaning, to work properly, a Flannel pod should be running on each node, as it holds the subnet information. From your output I can see that only 1 out of 3 Flannel pods is working properly.
NAMESPACE NAME READY STATUS RESTARTS AGE
...
kube-system kube-flannel-ds-kqz4t 0/1 CrashLoopBackOff 92 27h
kube-system kube-flannel-ds-szqzn 1/1 Running 3 27h
kube-system kube-flannel-ds-v9q47 0/1 CrashLoopBackOff 142 27h
If the mentioned pod was scheduled on a node where the flannel pod is not working, it won't be created due to CNI network issues. Besides your demo pod, the kubernetes-dashboard pods have the same issue and are stuck in ContainerCreating status.
Conclusion
Your demo pod cannot be created because Kubernetes encounters network issues related to the flannel configuration file (...network: open /run/flannel/subnet.env: no such file or directory).
Your flannel pods' restart counts are very high for 27 hours. You have to determine why and fix it; it might be a lack of resources, network issues with your infrastructure, or many other reasons. Once all flannel pods are working correctly, you shouldn't encounter this error.
Solution
You have to make the flannel pods work correctly on each node.
Additional Troubleshooting Details
For a detailed investigation, please provide:
$ kubectl describe pod kube-flannel-ds-kqz4t -n kube-system
$ kubectl describe pod kube-flannel-ds-v9q47 -n kube-system
Log details would also be helpful:
$ kubectl logs kube-flannel-ds-kqz4t -n kube-system
$ kubectl logs kube-flannel-ds-v9q47 -n kube-system
Please also replace kubectl get pods --all-namespaces with kubectl get pods -o wide -A, and include the output of kubectl get nodes -o wide.
If you provide that information, it should be possible to determine the root cause of the flannel pod issues, and I will edit this answer with the exact solution.
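While gathering that output, a couple of checks that often explain a flannel CrashLoopBackOff (these assume the stock kube-flannel DaemonSet and a kubeadm-built cluster):
# flannel needs a podCIDR on every node; it stays empty if kubeadm init was run without --pod-network-cidr
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'
# the container logs usually state the exact reason for the crash
kubectl logs kube-flannel-ds-kqz4t -n kube-system --previous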

Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod network

Issue: Redis pod creation on a k8s (v1.10) cluster is stuck at "ContainerCreating".
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 30m default-scheduler Successfully assigned redis to k8snode02
Normal SuccessfulMountVolume 30m kubelet, k8snode02 MountVolume.SetUp succeeded for volume "default-token-f8tcg"
Warning FailedCreatePodSandBox 5m (x1202 over 30m) kubelet, k8snode02 Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "redis_default" network: failed to find plugin "loopback" in path [/opt/loopback/bin /opt/cni/bin]
Normal SandboxChanged 47s (x1459 over 30m) kubelet, k8snode02 Pod sandbox changed, it will be killed and re-created.
When I used Calico as the CNI, I faced a similar issue.
The container remained in the creating state. I checked /etc/cni/net.d and /opt/cni/bin on the master; both are present, but I'm not sure whether they are required on the worker node as well.
root#KubernetesMaster:/opt/cni/bin# kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-5c7588df-5zds6 0/1 ContainerCreating 0 21m
root#KubernetesMaster:/opt/cni/bin# kubectl get nodes
NAME STATUS ROLES AGE VERSION
kubernetesmaster Ready master 26m v1.13.4
kubernetesslave1 Ready <none> 22m v1.13.4
root#KubernetesMaster:/opt/cni/bin#
kubectl describe pods
Name: nginx-5c7588df-5zds6
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: kubernetesslave1/10.0.3.80
Start Time: Sun, 17 Mar 2019 05:13:30 +0000
Labels: app=nginx
pod-template-hash=5c7588df
Annotations: <none>
Status: Pending
IP:
Controlled By: ReplicaSet/nginx-5c7588df
Containers:
nginx:
Container ID:
Image: nginx
Image ID:
Port: <none>
Host Port: <none>
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-qtfbs (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-qtfbs:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-qtfbs
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 18m default-scheduler Successfully assigned default/nginx-5c7588df-5zds6 to kubernetesslave1
Warning FailedCreatePodSandBox 18m kubelet, kubernetesslave1 Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "123d527490944d80f44b1976b82dbae5dc56934aabf59cf89f151736d7ea8adc" network for pod "nginx-5c7588df-5zds6": NetworkPlugin cni failed to set up pod "nginx-5c7588df-5zds6_default" network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
Warning FailedCreatePodSandBox 18m kubelet, kubernetesslave1 Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "8cc5e62ebaab7075782c2248e00d795191c45906cc9579464a00c09a2bc88b71" network for pod "nginx-5c7588df-5zds6": NetworkPlugin cni failed to set up pod "nginx-5c7588df-5zds6_default" network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
Warning FailedCreatePodSandBox 18m kubelet, kubernetesslave1 Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "30ffdeace558b0935d1ed3c2e59480e2dd98e983b747dacae707d1baa222353f" network for pod "nginx-5c7588df-5zds6": NetworkPlugin cni failed to set up pod "nginx-5c7588df-5zds6_default" network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
Warning FailedCreatePodSandBox 18m kubelet, kubernetesslave1 Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "630e85451b6ce2452839c4cfd1ecb9acce4120515702edf29421c123cf231213" network for pod "nginx-5c7588df-5zds6": NetworkPlugin cni failed to set up pod "nginx-5c7588df-5zds6_default" network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
Warning FailedCreatePodSandBox 18m kubelet, kubernetesslave1 Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "820b919b7edcfc3081711bb78b79d33e5be3f7dafcbad29fe46b6d7aa22227aa" network for pod "nginx-5c7588df-5zds6": NetworkPlugin cni failed to set up pod "nginx-5c7588df-5zds6_default" network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
Warning FailedCreatePodSandBox 18m kubelet, kubernetesslave1 Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "abbfb5d2756f12802072039dec20ba52f546ae755aaa642a9a75c86577be589f" network for pod "nginx-5c7588df-5zds6": NetworkPlugin cni failed to set up pod "nginx-5c7588df-5zds6_default" network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
Warning FailedCreatePodSandBox 18m kubelet, kubernetesslave1 Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "dfeb46ffda4d0f8a434f3f3af04328fcc4b6c7cafaa62626e41b705b06d98cc4" network for pod "nginx-5c7588df-5zds6": NetworkPlugin cni failed to set up pod "nginx-5c7588df-5zds6_default" network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
Warning FailedCreatePodSandBox 18m kubelet, kubernetesslave1 Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "9ae3f47bb0282a56e607779d3267127ee8b0ae1d7f416f5a184682119203b1c8" network for pod "nginx-5c7588df-5zds6": NetworkPlugin cni failed to set up pod "nginx-5c7588df-5zds6_default" network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
Warning FailedCreatePodSandBox 18m kubelet, kubernetesslave1 Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "546d07f1864728b2e2675c066775f94d658e221ada5fb4ed6bf6689ec7b8de23" network for pod "nginx-5c7588df-5zds6": NetworkPlugin cni failed to set up pod "nginx-5c7588df-5zds6_default" network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
Normal SandboxChanged 18m (x12 over 18m) kubelet, kubernetesslave1 Pod sandbox changed, it will be killed and re-created.
Warning FailedCreatePodSandBox 3m39s (x829 over 18m) kubelet, kubernetesslave1 (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "f586be437843537a3082f37ad139c88d0eacfbe99ddf00621efd4dc049a268cc" network for pod "nginx-5c7588df-5zds6": NetworkPlugin cni failed to set up pod "nginx-5c7588df-5zds6_default" network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
root#KubernetesMaster:/etc/cni/net.d#
On the worker node NGINX is trying to come up but keeps exiting. I'm not sure what's going on here; I'm a newbie to Kubernetes and not able to fix this issue:
root#kubernetesslave1:/home/ubuntu# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
5ad5500e8270 fadcc5d2b066 "/usr/local/bin/kube…" 3 minutes ago Up 3 minutes k8s_kube-proxy_kube-proxy-f24gd_kube-system_4e2d313a-4873-11e9-a33a-06516e7d78c4_1
b1c9929ebe9e k8s.gcr.io/pause:3.1 "/pause" 3 minutes ago Up 3 minutes k8s_POD_calico-node-749qx_kube-system_4e2d8c9c-4873-11e9-a33a-06516e7d78c4_1
ceb78340b563 k8s.gcr.io/pause:3.1 "/pause" 3 minutes ago Up 3 minutes k8s_POD_kube-proxy-f24gd_kube-system_4e2d313a-4873-11e9-a33a-06516e7d78c4_1
root#kubernetesslave1:/home/ubuntu# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
5ad5500e8270 fadcc5d2b066 "/usr/local/bin/kube…" 3 minutes ago Up 3 minutes k8s_kube-proxy_kube-proxy-f24gd_kube-system_4e2d313a-4873-11e9-a33a-06516e7d78c4_1
b1c9929ebe9e k8s.gcr.io/pause:3.1 "/pause" 3 minutes ago Up 3 minutes k8s_POD_calico-node-749qx_kube-system_4e2d8c9c-4873-11e9-a33a-06516e7d78c4_1
ceb78340b563 k8s.gcr.io/pause:3.1 "/pause" 3 minutes ago Up 3 minutes k8s_POD_kube-proxy-f24gd_kube-system_4e2d313a-4873-11e9-a33a-06516e7d78c4_1
root#kubernetesslave1:/home/ubuntu# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
5ad5500e8270 fadcc5d2b066 "/usr/local/bin/kube…" 3 minutes ago Up 3 minutes k8s_kube-proxy_kube-proxy-f24gd_kube-system_4e2d313a-4873-11e9-a33a-06516e7d78c4_1
b1c9929ebe9e k8s.gcr.io/pause:3.1 "/pause" 3 minutes ago Up 3 minutes k8s_POD_calico-node-749qx_kube-system_4e2d8c9c-4873-11e9-a33a-06516e7d78c4_1
ceb78340b563 k8s.gcr.io/pause:3.1 "/pause" 3 minutes ago Up 3 minutes k8s_POD_kube-proxy-f24gd_kube-system_4e2d313a-4873-11e9-a33a-06516e7d78c4_1
root#kubernetesslave1:/home/ubuntu# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
94b2994401d0 k8s.gcr.io/pause:3.1 "/pause" 1 second ago Up Less than a second k8s_POD_nginx-5c7588df-5zds6_default_677a722b-4873-11e9-a33a-06516e7d78c4_534
5ad5500e8270 fadcc5d2b066 "/usr/local/bin/kube…" 4 minutes ago Up 4 minutes k8s_kube-proxy_kube-proxy-f24gd_kube-system_4e2d313a-4873-11e9-a33a-06516e7d78c4_1
b1c9929ebe9e k8s.gcr.io/pause:3.1 "/pause" 4 minutes ago Up 4 minutes k8s_POD_calico-node-749qx_kube-system_4e2d8c9c-4873-11e9-a33a-06516e7d78c4_1
ceb78340b563 k8s.gcr.io/pause:3.1 "/pause" 4 minutes ago Up 4 minutes k8s_POD_kube-proxy-f24gd_kube-system_4e2d313a-4873-11e9-a33a-06516e7d78c4_1
root#kubernetesslave1:/home/ubuntu# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
5ad5500e8270 fadcc5d2b066 "/usr/local/bin/kube…" 4 minutes ago Up 4 minutes k8s_kube-proxy_kube-proxy-f24gd_kube-system_4e2d313a-4873-11e9-a33a-06516e7d78c4_1
b1c9929ebe9e k8s.gcr.io/pause:3.1 "/pause" 4 minutes ago Up 4 minutes k8s_POD_calico-node-749qx_kube-system_4e2d8c9c-4873-11e9-a33a-06516e7d78c4_1
ceb78340b563 k8s.gcr.io/pause:3.1 "/pause" 4 minutes ago Up 4 minutes k8s_POD_kube-proxy-f24gd_kube-system_4e2d313a-4873-11e9-a33a-06516e7d78c4_1
root#kubernetesslave1:/home/ubuntu# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
f72500cae2b7 k8s.gcr.io/pause:3.1 "/pause" 1 second ago Up Less than a second k8s_POD_nginx-5c7588df-5zds6_default_677a722b-4873-11e9-a33a-06516e7d78c4_585
5ad5500e8270 fadcc5d2b066 "/usr/local/bin/kube…" 4 minutes ago Up 4 minutes k8s_kube-proxy_kube-proxy-f24gd_kube-system_4e2d313a-4873-11e9-a33a-06516e7d78c4_1
b1c9929ebe9e k8s.gcr.io/pause:3.1 "/pause" 4 minutes ago Up 4 minutes k8s_POD_calico-node-749qx_kube-system_4e2d8c9c-4873-11e9-a33a-06516e7d78c4_1
ceb78340b563 k8s.gcr.io/pause:3.1 "/pause" 4 minutes ago Up 4 minutes k8s_POD_kube-proxy-f24gd_kube-system_4e2d313a-4873-11e9-a33a-06516e7d78c4_1
root#kubernetesslave1:/home/ubuntu# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
5ad5500e8270 fadcc5d2b066 "/usr/local/bin/kube…" 5 minutes ago Up 5 minutes k8s_kube-proxy_kube-proxy-f24gd_kube-system_4e2d313a-4873-11e9-a33a-06516e7d78c4_1
b1c9929ebe9e k8s.gcr.io/pause:3.1 "/pause" 5 minutes ago Up 5 minutes k8s_POD_calico-node-749qx_kube-system_4e2d8c9c-4873-11e9-a33a-06516e7d78c4_1
ceb78340b563 k8s.gcr.io/pause:3.1 "/pause" 5 minutes ago Up 5 minutes k8s_POD_kube-proxy-f24gd_kube-system_4e2d313a-4873-11e9-a33a-06516e7d78c4_1
I checked /etc/cni/net.d and /opt/cni/bin on the worker node as well; they are there:
root#kubernetesslave1:/home/ubuntu# cd /etc/cni
root#kubernetesslave1:/etc/cni# ls -ltr
total 4
drwxr-xr-x 2 root root 4096 Mar 17 05:19 net.d
root#kubernetesslave1:/etc/cni# cd /opt/cni
root#kubernetesslave1:/opt/cni# ls -ltr
total 4
drwxr-xr-x 2 root root 4096 Mar 17 05:19 bin
root#kubernetesslave1:/opt/cni# cd bin
root#kubernetesslave1:/opt/cni/bin# ls -ltr
total 107440
-rwxr-xr-x 1 root root 3890407 Aug 17 2017 bridge
-rwxr-xr-x 1 root root 3475802 Aug 17 2017 ipvlan
-rwxr-xr-x 1 root root 3520724 Aug 17 2017 macvlan
-rwxr-xr-x 1 root root 3877986 Aug 17 2017 ptp
-rwxr-xr-x 1 root root 3475750 Aug 17 2017 vlan
-rwxr-xr-x 1 root root 9921982 Aug 17 2017 dhcp
-rwxr-xr-x 1 root root 2605279 Aug 17 2017 sample
-rwxr-xr-x 1 root root 32351072 Mar 17 05:19 calico
-rwxr-xr-x 1 root root 31490656 Mar 17 05:19 calico-ipam
-rwxr-xr-x 1 root root 2856252 Mar 17 05:19 flannel
-rwxr-xr-x 1 root root 3084347 Mar 17 05:19 loopback
-rwxr-xr-x 1 root root 3036768 Mar 17 05:19 host-local
-rwxr-xr-x 1 root root 3550877 Mar 17 05:19 portmap
-rwxr-xr-x 1 root root 2850029 Mar 17 05:19 tuning
root#kubernetesslave1:/opt/cni/bin#
Ensure that /etc/cni/net.d and its /opt/cni/bin friend both exist and are correctly populated with the CNI configuration files and binaries on all Nodes. For flannel specifically, one might make use of the flannel cni repo
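A quick way to sanity-check that on each node (assuming a flannel-based setup like the one in the question):
# the CNI config the kubelet loads, and the plugin binaries it calls
ls -l /etc/cni/net.d/ /opt/cni/bin/
cat /etc/cni/net.d/10-flannel.conflist
# the runtime state file flannel is expected to write
cat /run/flannel/subnet.env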
I had this issue with my GKE cluster on GCP with one of my preemptible node pools. Thanks to @mdaniel's tip of checking the integrity of /etc/cni/net.d, I could reproduce the issue again by SSHing into a node of a testing cluster with the command gcloud compute ssh <name of some node> --zone <zone-of-cluster> --internal-ip. Then I simply edited the file /etc/cni/net.d/10-gke-ptp.conflist and messed with the values in "routes": [ {"dst": "0.0.0.0/0"} ] (changed from 0.0.0.0/0 to 1.0.0.0/0).
After that, I deleted the pods that were running on it, and they all got stuck in ContainerCreating status forever, generating kubelet events with the error Failed create pod sandbox: rpc error: code...
Note that in order to test this I set up my node pool to have a maximum of 1 node. Otherwise it would scale up a new one and the pods would be recreated on the new node. In my production incident the node pool had reached its maximum node count, so setting my tests to a maximum of 1 node reproduced a similar situation.
Since deleting the node from GKE solved the issue in production, I created a Python script that lists all events on the cluster and filters the ones that have the keyword "Failed create pod sandbox: rpc error: code". Then I go over all events and get their pods, and from the pods I get the nodes. Finally I loop over the nodes, deleting them both from the Kubernetes API and from the Compute API with their respective Python clients. For the Python script I used the libs kubernetes and google-cloud-compute.
This is a simpler version of the script. Test it before using it:
import time

from kubernetes import client, config
from google.cloud.compute_v1.services.instances import InstancesClient

ERROR_KEYWORDS = [
    'Failed to create pod sandbox'.lower()
]

# Set these to match your environment before running.
PROJECT_ID = "my-gcp-project"
CLUSTER_ZONE = "us-central1-a"

config.load_kube_config()
v1 = client.CoreV1Api()
gcp_client = InstancesClient()

events_result = v1.list_event_for_all_namespaces()

# filter only the events containing ERROR_KEYWORDS
filtered_events = []
for event in events_result.items:
    for error_keyword in ERROR_KEYWORDS:
        if event.message and error_keyword in event.message.lower():
            filtered_events.append(event)

# gets the list of pods from those events
pods_list = {}
for event in filtered_events:
    try:
        pod = v1.read_namespaced_pod(
            event.involved_object.name,
            namespace=event.involved_object.namespace
        )
        pod_dict = {
            "name": event.involved_object.name,
            "namespace": event.involved_object.namespace,
            "node": pod.spec.node_name
        }
        pods_list[event.involved_object.name] = pod_dict
    except Exception as e:
        pass

# Get the nodes from those pods
broken_nodes = set()
for name, pod_dict in pods_list.items():
    if pod_dict.get('node'):
        broken_nodes.add(pod_dict["node"])
broken_nodes = list(broken_nodes)

# Deletes the nodes from both Kubernetes API and Compute Engine API
if broken_nodes:
    broken_nodes_str = ", ".join(broken_nodes)
    print(f'BROKEN NODES: "{broken_nodes_str}"')
    for node in broken_nodes:
        try:
            api_response = v1.delete_node(node)
        except Exception as e:
            pass
        time.sleep(30)
        try:
            result = gcp_client.delete(project=PROJECT_ID, zone=CLUSTER_ZONE, instance=node)
        except Exception as e:
            pass
AWS EKS doesn't yet support t3a, m5ad, and r5ad instances.
kubectl drain node1 node2 --delete-local-data --force --ignore-daemonsets
Just when I was planning to evict the pods on all nodes, the pods that had been erroring all the time unexpectedly became Running. You can try executing it; I hope it will be useful to you.
This problem appeared for me when I added a PVC on AWS EKS.
Updating the aws-node CNI plugin to the latest version resolved it -
https://docs.aws.amazon.com/eks/latest/userguide/managing-vpc-cni.html
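For reference, one way to check which version of the VPC CNI plugin is currently running (this assumes the default aws-node DaemonSet deployed by EKS):
kubectl describe daemonset aws-node -n kube-system | grep Image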
The following steps reset the Kubernetes cluster and helped me solve my problem (a consolidated sketch of the commands follows the list):
Stop all running pods
Delete all worker nodes from the cluster
Perform kubeadm reset on the master and the nodes
Initialize the master node:
kubeadm init --apiserver-advertise-address
Install the Pod network "WeaveNet":
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')&env.IPALLOC_RANGE=192.168.0.0/16"
Join the nodes to the cluster
Restart all nodes
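A consolidated sketch of those steps (placeholders such as <worker-node> and <master-ip> are illustrative; the WeaveNet command is the one from the list above):
# on the master: drain and remove each worker
kubectl drain <worker-node> --ignore-daemonsets --delete-local-data --force
kubectl delete node <worker-node>
# on every machine: wipe the old state
sudo kubeadm reset
# on the master: re-initialize and install the pod network
sudo kubeadm init --apiserver-advertise-address=<master-ip>
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')&env.IPALLOC_RANGE=192.168.0.0/16"
# print the join command to run on each worker, then reboot the nodes
kubeadm token create --print-join-command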
#-------------------------------------
#Reset the kubernetes environment
#-----------------------------------
#[root#centos8-Master: ~]# k get nodes
#NAME STATUS ROLES AGE VERSION
#centos8-master Ready control-plane 14m v1.24.1
#centos8-slave Ready <none> 11m v1.24.3
#
#Master Node
#1. Delete the nodes
#First delete all pods, deployments, svc
#kubectl delete --all pods
#kubectl delete --all deployments
#kubectl delete --all svc
#kubectl drain centos8-slave --ignore-daemonsets --delete-emptydir-data --force
#kubectl delete node centos8-slave
#
#Worker Node
#2. Go to worker node, stop all the kubelet services.
#[root#centos8-Slave rprasads]# kubectl version --short
#Client Version: v1.24.3
#Kustomize Version: v4.5.4
#[root#centos8-Slave rprasads]# systemctl stop kubelet
#[root#centos8-Slave rprasads]# netstat -tulnp |grep kube
#kill -9 <pid> [kube-proxy]
#
#Master Node
#2. Reset the kubeadm.
#$ sudo kubeadm reset
#$ sudo swapoff -a
#
#Master Node
#3. Get you kubeadm version
#[root#centos8-Master: ~]# kubectl version --short
#Flag --short has been deprecated, and will be removed in the future. The --short output will become the default.
#Client Version: v1.24.1
#Kustomize Version: v4.5.4
#Server Version: v1.24.3
#
#Master Node
#4.On Master Initialize the kubeadm with proper network address and version
#$ kubeadm init --apiserver-advertise-address=192.168.56.101 --pod-network-cidr=192.168.0.0/16
##Download calico yaml file from the site: Refer the documentation https://projectcalico.docs.tigera.io/getting-started/kubernetes/self-managed-onprem/onpremises#install-calico-with-kubernetes-api-datastore-more-than-50-nodes
#
#$ curl https://projectcalico.docs.tigera.io/manifests/calico.yaml -O
#$ kubectl apply -f calico.yaml
#
#Worker Node
#5. Go to worker node and add the node with the command displayed.
# kubeadm join 192.168.56.101:6443 --token h0nuxq.zk9m731nc4ia93pq --discovery-token-ca-cert-hash sha256:1682644baf3433caeb0e6f9099ed487ef48b94ab6a0314df88e3ff42ae501a13
#
#Master Node
#6.On the master node run below commands.
#$ sudo rm -rf $HOME/.kube
#
#$ mkdir -p $HOME/.kube
#$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
#$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
#
#$ sudo systemctl enable docker.service
#$ sudo service kubelet restart
#
#$ kubectl get nodes
#
#
#------------------------------------------------
#Test your new kubernetes cluster environment.
#-----------------------------------------------
#[root#centos8-Master: ~]# kubectl run nginx --image=nginx
#Wait for some time.
#
#[root#centos8-Master: ~]# k describe pods nginx
#Normal Scheduled 21s default-scheduler Successfully assigned default/nginx to centos8-slave
#
#[root#centos8-Master: ~]# k get pods
#NAME READY STATUS RESTARTS AGE
#nginx 1/1 Running 0 25s
#
#*************************************END*************************************

kubernetes HA cluster master nodes not ready

I have deployed a Kubernetes HA cluster using the following config.yaml:
etcd:
  endpoints:
  - "http://172.16.8.236:2379"
  - "http://172.16.8.237:2379"
  - "http://172.16.8.238:2379"
networking:
  podSubnet: "192.168.0.0/16"
apiServerExtraArgs:
  endpoint-reconciler-type: lease
When I check kubectl get nodes:
NAME STATUS ROLES AGE VERSION
master1 Ready master 22m v1.10.4
master2 NotReady master 17m v1.10.4
master3 NotReady master 16m v1.10.4
If I check the pods, I can see that too many are failing:
[ikerlan#master1 ~]$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
calico-etcd-5jftb 0/1 NodeLost 0 16m
calico-etcd-kl7hb 1/1 Running 0 16m
calico-etcd-z7sps 0/1 NodeLost 0 16m
calico-kube-controllers-79dccdc4cc-vt5t7 1/1 Running 0 16m
calico-node-dbjl2 2/2 Running 0 16m
calico-node-gkkth 0/2 NodeLost 0 16m
calico-node-rqzzl 0/2 NodeLost 0 16m
kube-apiserver-master1 1/1 Running 0 21m
kube-controller-manager-master1 1/1 Running 0 22m
kube-dns-86f4d74b45-rwchm 1/3 CrashLoopBackOff 17 22m
kube-proxy-226xd 1/1 Running 0 22m
kube-proxy-jr2jq 0/1 ContainerCreating 0 18m
kube-proxy-zmjdm 0/1 ContainerCreating 0 17m
kube-scheduler-master1 1/1 Running 0 21m
If I run kubectl describe node master2:
[ikerlan#master1 ~]$ kubectl describe node master2
Name: master2
Roles: master
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/hostname=master2
node-role.kubernetes.io/master=
Annotations: node.alpha.kubernetes.io/ttl=0
volumes.kubernetes.io/controller-managed-attach-detach=true
CreationTimestamp: Mon, 11 Jun 2018 12:06:03 +0200
Taints: node-role.kubernetes.io/master:NoSchedule
Unschedulable: false
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk Unknown Mon, 11 Jun 2018 12:06:15 +0200 Mon, 11 Jun 2018 12:06:56 +0200 NodeStatusUnknown Kubelet stopped posting node status.
MemoryPressure Unknown Mon, 11 Jun 2018 12:06:15 +0200 Mon, 11 Jun 2018 12:06:56 +0200 NodeStatusUnknown Kubelet stopped posting node status.
DiskPressure Unknown Mon, 11 Jun 2018 12:06:15 +0200 Mon, 11 Jun 2018 12:06:56 +0200 NodeStatusUnknown Kubelet stopped posting node status.
PIDPressure False Mon, 11 Jun 2018 12:06:15 +0200 Mon, 11 Jun 2018 12:06:00 +0200 KubeletHasSufficientPID kubelet has sufficient PID available
Ready Unknown Mon, 11 Jun 2018 12:06:15 +0200 Mon, 11 Jun 2018 12:06:56 +0200 NodeStatusUnknown Kubelet stopped posting node status.
Addresses:
InternalIP: 172.16.8.237
Hostname: master2
Capacity:
cpu: 2
ephemeral-storage: 37300436Ki
Then if I check the pods, kubectl describe pod -n kube-system calico-etcd-5jftb:
[ikerlan#master1 ~]$ kubectl describe pod -n kube-system calico-etcd-5jftb
Name: calico-etcd-5jftb
Namespace: kube-system
Node: master2/
Labels: controller-revision-hash=4283683065
k8s-app=calico-etcd
pod-template-generation=1
Annotations: scheduler.alpha.kubernetes.io/critical-pod=
Status: Terminating (lasts 20h)
Termination Grace Period: 30s
Reason: NodeLost
Message: Node master2 which was running pod calico-etcd-5jftb is unresponsive
IP:
Controlled By: DaemonSet/calico-etcd
Containers:
calico-etcd:
Image: quay.io/coreos/etcd:v3.1.10
Port: <none>
Host Port: <none>
Command:
/usr/local/bin/etcd
Args:
--name=calico
--data-dir=/var/etcd/calico-data
--advertise-client-urls=http://$CALICO_ETCD_IP:6666
--listen-client-urls=http://0.0.0.0:6666
--listen-peer-urls=http://0.0.0.0:6667
--auto-compaction-retention=1
Environment:
CALICO_ETCD_IP: (v1:status.podIP)
Mounts:
/var/etcd from var-etcd (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-tj6d7 (ro)
Volumes:
var-etcd:
Type: HostPath (bare host directory volume)
Path: /var/etcd
HostPathType:
default-token-tj6d7:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-tj6d7
Optional: false
QoS Class: BestEffort
Node-Selectors: node-role.kubernetes.io/master=
Tolerations: CriticalAddonsOnly
node-role.kubernetes.io/master:NoSchedule
node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule
node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/unreachable:NoExecute
Events: <none>
I have tried to update the etcd cluster to version 3.3, and now I can see the following logs (and some more timeouts):
2018-06-12 09:17:51.305960 W | etcdserver: read-only range request "key:\"/registry/apiregistration.k8s.io/apiservices/v1beta1.authentication.k8s.io\" " took too long (190.475363ms) to execute
2018-06-12 09:18:06.788558 W | etcdserver: read-only range request "key:\"/registry/services/endpoints/kube-system/kube-scheduler\" " took too long (109.543763ms) to execute
2018-06-12 09:18:34.875823 W | etcdserver: read-only range request "key:\"/registry/services/endpoints/kube-system/kube-scheduler\" " took too long (136.649505ms) to execute
2018-06-12 09:18:41.634057 W | etcdserver: read-only range request "key:\"/registry/minions\" range_end:\"/registry/miniont\" count_only:true " took too long (106.00073ms) to execute
2018-06-12 09:18:42.345564 W | etcdserver: request "header:<ID:4449666326481959890 > lease_revoke:<ID:4449666326481959752 > " took too long (142.771179ms) to execute
I have checked: kubectl get events
22m 22m 1 master2.15375fdf087fc69f Node Normal Starting kube-proxy, master2 Starting kube-proxy.
22m 22m 1 master3.15375fe744055758 Node Normal Starting kubelet, master3 Starting kubelet.
22m 22m 5 master3.15375fe74d47afa2 Node Normal NodeHasSufficientDisk kubelet, master3 Node master3 status is now: NodeHasSufficientDisk
22m 22m 5 master3.15375fe74d47f80f Node Normal NodeHasSufficientMemory kubelet, master3 Node master3 status is now: NodeHasSufficientMemory
22m 22m 5 master3.15375fe74d48066e Node Normal NodeHasNoDiskPressure kubelet, master3 Node master3 status is now: NodeHasNoDiskPressure
22m 22m 5 master3.15375fe74d481368 Node Normal NodeHasSufficientPID kubelet, master3 Node master3 status is now: NodeHasSufficientPID
I see multiple calico-etcd pods attempting to run. If you have used a calico.yaml that deploys etcd for you, that will not work in a multi-master environment.
That manifest is not intended for production deployment and will not work in a multi-master environment because the etcd it deploys is not configured to attempt to form a cluster.
You could still use that manifest but you would need to remove the etcd pods it deploys and set the etcd_endpoints to an etcd cluster you have deployed.
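If you go that route, a minimal sketch of repointing the manifest at your existing etcd (this assumes the manifest's usual calico-config ConfigMap; the endpoints are the ones from your config.yaml):
kubectl -n kube-system edit configmap calico-config
# then set:
#   etcd_endpoints: "http://172.16.8.236:2379,http://172.16.8.237:2379,http://172.16.8.238:2379"
# and remove the self-hosted etcd objects the manifest created, e.g.:
kubectl -n kube-system delete daemonset calico-etcd
kubectl -n kube-system delete service calico-etcd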
I have solved it by:
Adding all the master IPs and the LB IP to apiServerCertSANs
Copying the Kubernetes certificates from the first master to the other masters (sketched below)
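A rough sketch of the certificate copy step (the file list is the usual set kubeadm shares between control-plane nodes; paths assume the default /etc/kubernetes/pki layout):
# run on the first master, once per additional master
scp /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/ca.key \
    /etc/kubernetes/pki/sa.key /etc/kubernetes/pki/sa.pub \
    /etc/kubernetes/pki/front-proxy-ca.crt /etc/kubernetes/pki/front-proxy-ca.key \
    root@master2:/etc/kubernetes/pki/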

Kubernetes reports ImagePullBackOff for pod on minikube

I've built a Docker image within the minikube VM. However, I don't understand why Kubernetes is not finding it.
minikube ssh
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
diyapploopback latest 9590c4dc2ed1 2 hours ago 842MB
And if I describe the pod:
kubectl describe pods abcxyz12-6b4d85894-fhb2p
Name: abcxyz12-6b4d85894-fhb2p
Namespace: diyclientapps
Node: minikube/192.168.99.100
Start Time: Wed, 07 Mar 2018 13:49:51 +0000
Labels: appId=abcxyz12
pod-template-hash=260841450
Annotations: <none>
Status: Pending
IP: 172.17.0.6
Controllers: <none>
Containers:
nginx:
Container ID:
Image: diyapploopback:latest
Image ID:
Port: 80/TCP
State: Waiting
Reason: ImagePullBackOff
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-c62fx (ro)
mariadb:
Container ID: docker://fe09e08f98a9f972f2d086b56b55982e96772a2714ad3b4c2adf4f2f06c2986a
Image: mariadb:10.3
Image ID: docker-pullable://mariadb@sha256:8d4b8fd12c86f343b19e29d0fdd0c63a7aa81d4c2335317085ac973a4782c1f5
Port:
State: Running
Started: Wed, 07 Mar 2018 14:21:00 +0000
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Wed, 07 Mar 2018 13:49:54 +0000
Finished: Wed, 07 Mar 2018 14:18:43 +0000
Ready: True
Restart Count: 1
Environment:
MYSQL_ROOT_PASSWORD: passwordTempXyz
Mounts:
/var/lib/mysql from mysql-persistent-storage (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-c62fx (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
mysql-persistent-storage:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: abcxyz12
ReadOnly: false
default-token-c62fx:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-c62fx
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
31m 31m 1 default-scheduler Normal Scheduled Successfully assigned abcxyz12-6b4d85894-fhb2p to minikube
31m 31m 1 kubelet, minikube Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "default-token-c62fx"
31m 31m 1 kubelet, minikube Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "pvc-689f3067-220e-11e8-a244-0800279a9a04"
31m 31m 1 kubelet, minikube spec.containers{mariadb} Normal Pulled Container image "mariadb:10.3" already present on machine
31m 31m 1 kubelet, minikube spec.containers{mariadb} Normal Created Created container
31m 31m 1 kubelet, minikube spec.containers{mariadb} Normal Started Started container
31m 30m 3 kubelet, minikube spec.containers{nginx} Warning Failed Failed to pull image "diyapploopback:latest": rpc error: code = Unknown desc = Error response from daemon: pull access denied for diyapploopback, repository does not exist or may require 'docker login'
31m 30m 3 kubelet, minikube spec.containers{nginx} Warning Failed Error: ErrImagePull
31m 29m 4 kubelet, minikube spec.containers{nginx} Normal Pulling pulling image "diyapploopback:latest"
31m 16m 63 kubelet, minikube spec.containers{nginx} Normal BackOff Back-off pulling image "diyapploopback:latest"
31m 6m 105 kubelet, minikube spec.containers{nginx} Warning Failed Error: ImagePullBackOff
21s 21s 1 kubelet, minikube Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "pvc-689f3067-220e-11e8-a244-0800279a9a04"
20s 20s 1 kubelet, minikube Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "default-token-c62fx"
20s 20s 1 kubelet, minikube Normal SandboxChanged Pod sandbox changed, it will be killed and re-created.
17s 17s 1 kubelet, minikube spec.containers{nginx} Warning Failed Failed to pull image "diyapploopback:latest": rpc error: code = Unknown desc = Error response from daemon: pull access denied for diyapploopback, repository does not exist or may require 'docker login'
17s 17s 1 kubelet, minikube spec.containers{nginx} Warning Failed Error: ErrImagePull
17s 17s 1 kubelet, minikube spec.containers{mariadb} Normal Pulled Container image "mariadb:10.3" already present on machine
17s 17s 1 kubelet, minikube spec.containers{mariadb} Normal Created Created container
16s 16s 1 kubelet, minikube spec.containers{mariadb} Normal Started Started container
16s 15s 2 kubelet, minikube spec.containers{nginx} Normal BackOff Back-off pulling image "diyapploopback:latest"
16s 15s 2 kubelet, minikube spec.containers{nginx} Warning Failed Error: ImagePullBackOff
19s 1s 2 kubelet, minikube spec.containers{nginx} Normal Pulling pulling image "diyapploopback:latest"
It seems I'm able to run it directly (only for debugging/diagnosis purposes):
kubectl run abcxyz123 --image=diyapploopback --image-pull-policy=Never
If I describe the above deployment/container I get:
Name: abcxyz123-6749977548-stvsm
Namespace: diyclientapps
Node: minikube/192.168.99.100
Start Time: Wed, 07 Mar 2018 14:26:33 +0000
Labels: pod-template-hash=2305533104
run=abcxyz123
Annotations: <none>
Status: Running
IP: 172.17.0.9
Controllers: <none>
Containers:
abcxyz123:
Container ID: docker://c9b71667feba21ef259a395c9b8504e3e4968e5b9b35a191963f0576d0631d11
Image: diyapploopback
Image ID: docker://sha256:9590c4dc2ed16cb70a21c3385b7e0519ad0b1fece79e343a19337131600aa866
Port:
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Wed, 07 Mar 2018 14:42:45 +0000
Finished: Wed, 07 Mar 2018 14:42:48 +0000
Ready: False
Restart Count: 8
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-c62fx (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
default-token-c62fx:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-c62fx
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
17m 17m 1 default-scheduler Normal Scheduled Successfully assigned abcxyz123-6749977548-stvsm to minikube
17m 17m 1 kubelet, minikube Normal SuccessfulMountVolume MountVolume.SetUp succeeded for volume "default-token-c62fx"
17m 15m 5 kubelet, minikube spec.containers{abcxyz123} Normal Pulled Container image "diyapploopback" already present on machine
17m 15m 5 kubelet, minikube spec.containers{abcxyz123} Normal Created Created container
17m 15m 5 kubelet, minikube spec.containers{abcxyz123} Normal Started Started container
16m 1m 66 kubelet, minikube spec.containers{abcxyz123} Warning BackOff Back-off restarting failed container
imagePullPolicy: IfNotPresent
The above was not present (and it is required) in the container spec of my deployment.
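For illustration, one way to set it without editing the manifest by hand (assuming the Deployment is named abcxyz12, as the pod name and labels above suggest, and that the container is the nginx one from the describe output):
kubectl patch deployment abcxyz12 -n diyclientapps \
  -p '{"spec":{"template":{"spec":{"containers":[{"name":"nginx","image":"diyapploopback:latest","imagePullPolicy":"IfNotPresent"}]}}}}'
With imagePullPolicy set to IfNotPresent (or Never), the kubelet uses the image already built inside the minikube VM instead of trying to pull it from a registry.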