Kube flannel in CrashLoopBackOff status - kubernetes

We just start to create our cluster on kubernetes.
Now we try to deploy tiller but we have en error:
NetworkPlugin cni failed to set up pod
"tiller-deploy-64c9d747bd-br9j7_kube-system" network: open
/run/flannel/subnet.env: no such file or directory
After that I call:
kubectl get pods --all-namespaces -o wide
And got response:
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
kube-system coredns-78fcdf6894-ksdvt 1/1 Running 2 7d 192.168.0.4 kube-master <none>
kube-system coredns-78fcdf6894-p4l9q 1/1 Running 2 7d 192.168.0.5 kube-master <none>
kube-system etcd-kube-master 1/1 Running 2 7d 10.168.209.20 kube-master <none>
kube-system kube-apiserver-kube-master 1/1 Running 2 7d 10.168.209.20 kube-master <none>
kube-system kube-controller-manager-kube-master 1/1 Running 2 7d 10.168.209.20 kube-master <none>
kube-system kube-flannel-ds-amd64-42rl7 0/1 CrashLoopBackOff 2135 7d 10.168.209.17 node5 <none>
kube-system kube-flannel-ds-amd64-5fx2p 0/1 CrashLoopBackOff 2164 7d 10.168.209.14 node2 <none>
kube-system kube-flannel-ds-amd64-6bw5g 0/1 CrashLoopBackOff 2166 7d 10.168.209.15 node3 <none>
kube-system kube-flannel-ds-amd64-hm826 1/1 Running 1 7d 10.168.209.20 kube-master <none>
kube-system kube-flannel-ds-amd64-thjps 0/1 CrashLoopBackOff 2160 7d 10.168.209.16 node4 <none>
kube-system kube-flannel-ds-amd64-w99ch 0/1 CrashLoopBackOff 2166 7d 10.168.209.13 node1 <none>
kube-system kube-proxy-d6v2n 1/1 Running 0 7d 10.168.209.13 node1 <none>
kube-system kube-proxy-lcckg 1/1 Running 0 7d 10.168.209.16 node4 <none>
kube-system kube-proxy-pgblx 1/1 Running 1 7d 10.168.209.20 kube-master <none>
kube-system kube-proxy-rnqq5 1/1 Running 0 7d 10.168.209.14 node2 <none>
kube-system kube-proxy-wc959 1/1 Running 0 7d 10.168.209.15 node3 <none>
kube-system kube-proxy-wfqqs 1/1 Running 0 7d 10.168.209.17 node5 <none>
kube-system kube-scheduler-kube-master 1/1 Running 2 7d 10.168.209.20 kube-master <none>
kube-system kubernetes-dashboard-6948bdb78-97qcq 0/1 ContainerCreating 0 7d <none> node5 <none>
kube-system tiller-deploy-64c9d747bd-br9j7 0/1 ContainerCreating 0 45m <none> node4 <none>
We have some flannel pods in CrashLoopBackOff status. For example kube-flannel-ds-amd64-42rl7.
When I call:
kubectl describe pod -n kube-system kube-flannel-ds-amd64-42rl7
I've got status Running:
Name: kube-flannel-ds-amd64-42rl7
Namespace: kube-system
Priority: 0
PriorityClassName: <none>
Node: node5/10.168.209.17
Start Time: Wed, 22 Aug 2018 16:47:10 +0300
Labels: app=flannel
controller-revision-hash=911701653
pod-template-generation=1
tier=node
Annotations: <none>
Status: Running
IP: 10.168.209.17
Controlled By: DaemonSet/kube-flannel-ds-amd64
Init Containers:
install-cni:
Container ID: docker://eb7ee47459a54d401969b1770ff45b39dc5768b0627eec79e189249790270169
Image: quay.io/coreos/flannel:v0.10.0-amd64
Image ID: docker-pullable://quay.io/coreos/flannel#sha256:88f2b4d96fae34bfff3d46293f7f18d1f9f3ca026b4a4d288f28347fcb6580ac
Port: <none>
Host Port: <none>
Command:
cp
Args:
-f
/etc/kube-flannel/cni-conf.json
/etc/cni/net.d/10-flannel.conflist
State: Terminated
Reason: Completed
Exit Code: 0
Started: Wed, 22 Aug 2018 16:47:24 +0300
Finished: Wed, 22 Aug 2018 16:47:24 +0300
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/etc/cni/net.d from cni (rw)
/etc/kube-flannel/ from flannel-cfg (rw)
/var/run/secrets/kubernetes.io/serviceaccount from flannel-token-9wmch (ro)
Containers:
kube-flannel:
Container ID: docker://521b457c648baf10f01e26dd867b8628c0f0a0cc0ea416731de658e67628d54e
Image: quay.io/coreos/flannel:v0.10.0-amd64
Image ID: docker-pullable://quay.io/coreos/flannel#sha256:88f2b4d96fae34bfff3d46293f7f18d1f9f3ca026b4a4d288f28347fcb6580ac
Port: <none>
Host Port: <none>
Command:
/opt/bin/flanneld
Args:
--ip-masq
--kube-subnet-mgr
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Thu, 30 Aug 2018 10:15:04 +0300
Finished: Thu, 30 Aug 2018 10:15:08 +0300
Ready: False
Restart Count: 2136
Limits:
cpu: 100m
memory: 50Mi
Requests:
cpu: 100m
memory: 50Mi
Environment:
POD_NAME: kube-flannel-ds-amd64-42rl7 (v1:metadata.name)
POD_NAMESPACE: kube-system (v1:metadata.namespace)
Mounts:
/etc/kube-flannel/ from flannel-cfg (rw)
/run from run (rw)
/var/run/secrets/kubernetes.io/serviceaccount from flannel-token-9wmch (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
run:
Type: HostPath (bare host directory volume)
Path: /run
HostPathType:
cni:
Type: HostPath (bare host directory volume)
Path: /etc/cni/net.d
HostPathType:
flannel-cfg:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: kube-flannel-cfg
Optional: false
flannel-token-9wmch:
Type: Secret (a volume populated by a Secret)
SecretName: flannel-token-9wmch
Optional: false
QoS Class: Guaranteed
Node-Selectors: beta.kubernetes.io/arch=amd64
Tolerations: node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/unreachable:NoExecute
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 51m (x2128 over 7d) kubelet, node5 Container image "quay.io/coreos/flannel:v0.10.0-amd64" already present on machine
Warning BackOff 1m (x48936 over 7d) kubelet, node5 Back-off restarting failed container
here kube-controller-manager.yaml:
apiVersion: v1
kind: Pod
metadata:
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ""
creationTimestamp: null
labels:
component: kube-controller-manager
tier: control-plane
name: kube-controller-manager
namespace: kube-system
spec:
containers:
- command:
- kube-controller-manager
- --address=127.0.0.1
- --allocate-node-cidrs=true
- --cluster-cidr=192.168.0.0/24
- --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt
- --cluster-signing-key-file=/etc/kubernetes/pki/ca.key
- --controllers=*,bootstrapsigner,tokencleaner
- --kubeconfig=/etc/kubernetes/controller-manager.conf
- --leader-elect=true
- --node-cidr-mask-size=24
- --root-ca-file=/etc/kubernetes/pki/ca.crt
- --service-account-private-key-file=/etc/kubernetes/pki/sa.key
- --use-service-account-credentials=true
image: k8s.gcr.io/kube-controller-manager-amd64:v1.11.2
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 8
httpGet:
host: 127.0.0.1
path: /healthz
port: 10252
scheme: HTTP
initialDelaySeconds: 15
timeoutSeconds: 15
name: kube-controller-manager
resources:
requests:
cpu: 200m
volumeMounts:
- mountPath: /etc/ssl/certs
name: ca-certs
readOnly: true
- mountPath: /etc/kubernetes/controller-manager.conf
name: kubeconfig
readOnly: true
- mountPath: /usr/libexec/kubernetes/kubelet-plugins/volume/exec
name: flexvolume-dir
- mountPath: /etc/pki
name: etc-pki
readOnly: true
- mountPath: /etc/kubernetes/pki
name: k8s-certs
readOnly: true
hostNetwork: true
priorityClassName: system-cluster-critical
volumes:
- hostPath:
path: /etc/ssl/certs
type: DirectoryOrCreate
name: ca-certs
- hostPath:
path: /etc/kubernetes/controller-manager.conf
type: FileOrCreate
name: kubeconfig
- hostPath:
path: /usr/libexec/kubernetes/kubelet-plugins/volume/exec
type: DirectoryOrCreate
name: flexvolume-dir
- hostPath:
path: /etc/pki
type: DirectoryOrCreate
name: etc-pki
- hostPath:
path: /etc/kubernetes/pki
type: DirectoryOrCreate
name: k8s-certs
status: {}
OS is CentOS Linux release 7.5.1804
logs from one of pods:
# kubectl logs --namespace kube-system kube-flannel-ds-amd64-5fx2p
main.go:475] Determining IP address of default interface
main.go:488] Using interface with name eth0 and address 10.168.209.14
main.go:505] Defaulting external address to interface address (10.168.209.14)
kube.go:131] Waiting 10m0s for node controller to sync
kube.go:294] Starting kube subnet manager
kube.go:138] Node controller sync successful
main.go:235] Created subnet manager: Kubernetes Subnet Manager - node2
main.go:238] Installing signal handlers
main.go:353] Found network config - Backend type: vxlan
vxlan.go:120] VXLAN config: VNI=1 Port=0 GBP=false DirectRouting=false
main.go:280] Error registering network: failed to acquire lease: node "node2" pod cidr not assigned
main.go:333] Stopping shutdownHandler...
Where error is?

For flannel to work correctly, you must pass --pod-network-cidr=10.244.0.0/16 to kubeadm init.

Try this:
Failed to acquire lease simply means, the pod didn't get the podCIDR. Happened with me as well although the manifest on master-node says podCIDR true but still it wasn't working and funnel going in crashbackloop.
This is what i did to fix it.
From the master-node, first find out your funnel CIDR
sudo cat /etc/kubernetes/manifests/kube-controller-manager.yaml | grep -i cluster-cidr
Output:
- --cluster-cidr=172.168.10.0/24
Then run the following from the master node:
kubectl patch node slave-node-1 -p '{"spec":{"podCIDR":"172.168.10.0/24"}}'
where,
slave-node-1 is your node where acquire lease is failing
podCIDR is the cidr that you found in previous command
Hope this helps.

The reason is that
flannel run with CIDR=10.244.0.0/16 NOT 10.244.0.0/24 !!!
CNI Conflicts because the node installed multiple CNIs Plugin within /etc/cni/net.d/.
The 2 Interface flannel.1 and cni0 did not match each other.
Eg:
flannel.1=10.244.0.0 and cni0=10.244.1.1 will failed. It should be
flannel.1=10.244.0.0 and cni0=10.244.0.1
To fix this, please following the step below:
Step 0: Reset all Nodes within your Cluster. Run all nodes with
kubeadm reset --force;
Step 1: Down Interface cni0 and flannel.1.
sudo ifconfig cni0 down;
sudo ifconfig flannel.1 down;
Step 2: Delete Interface cni0 and flannel.1.
sudo ip link delete cni0;
sudo ip link delete flannel.1;
Step 3: Remove all items within /etc/cni/net.d/.
sudo rm -rf /etc/cni/net.d/;
Step 4: Re-Bootstrap your Kubernetes Cluster again.
kubeadm init --control-plane-endpoint="..." --pod-network-cidr=10.244.0.0/16;
Step 5: Re-deploy CNIs.
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml;
Step 6: Restart your CNIs, here I used Container Daemon (Containerd).
systemctl restart containerd;
This will ensure your Core-DNS working nicely.

I had a similar problem. I did the following steps to make it work:
Delete the nodes from the master by kubeadm reset on the worker node.
Clear the iptables rules by iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X.
Clerar the config file by rm -rf $HOME/.kube/config.
Reboot the worker node.
Disable the Swap on the worker node by swapoff -a.
Join the master node, again.

And also ensure SELinux set to Permissive or disabled.
# getenforce
Permissive

Had the same issue. When followed the solution mentioned by #PanDe, I got the following error.
[root#xxxxxx]# kubectl patch node myslavenode -p '{"spec":{"podCIDR":"10.244.0.0/16"}}'
The Node "myslavenode" is invalid:
spec.podCIDRs: Forbidden: node updates may not change podCIDR except from "" to valid
[]: Forbidden: node updates may only change labels, taints, or capacity (or configSource, if the DynamicKubeletConfig feature gate is enabled).
In the end, when selinux was checked,it was enabled. Setting it to permissive resolved the issue. Thanks #senthil murugan.
Regards,
Vivek

Related

Weave CrashLoopBackOff on fresh HA Cluster

I create a HA clusters with kubeadm by following this guides:
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/
https://medium.com/faun/configuring-ha-kubernetes-cluster-on-bare-metal-servers-with-kubeadm-1-2-1e79f0f7857b
I've already have the ETCD nodes up and runnig, the APIserver through the HAproxy and keepalive running.
And 1 master node running with weave-net network.
I use this subnets
networking:
podSubnet: 192.168.240.0/22
serviceSubnet: 192.168.244.0/22
But when I join the second master node to the cluster, the weave pod created got CrashLoopBackOff.
I run the weave-net plugin with this line:
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')&env.IPALLOC_RANGE=192.168.240.0/21"
I also found that the /etc/cni/net.d isn't created by kubelet when apply weave conf.
The Master Nodes
kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
kubemaster01 Ready master 17h v1.18.1 192.168.129.137 <none> Ubuntu 18.04.4 LTS 4.15.0-96-generic docker://19.3.8
kubemaster02 Ready master 83m v1.18.1 192.168.129.138 <none> Ubuntu 18.04.4 LTS 4.15.0-91-generic docker://19.3.8
The Pods
oot#kubemaster01:~# kubectl get pods,svc --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system pod/coredns-66bff467f8-kh4mh 0/1 Running 0 18h 192.168.240.3 kubemaster01 <none> <none>
kube-system pod/coredns-66bff467f8-xhzjk 0/1 Running 0 18h 192.168.240.2 kubemaster01 <none> <none>
kube-system pod/kube-apiserver-kubemaster01 1/1 Running 0 16h 192.168.129.137 kubemaster01 <none> <none>
kube-system pod/kube-apiserver-kubemaster02 1/1 Running 0 104m 192.168.129.138 kubemaster02 <none> <none>
kube-system pod/kube-controller-manager-kubemaster01 1/1 Running 0 16h 192.168.129.137 kubemaster01 <none> <none>
kube-system pod/kube-controller-manager-kubemaster02 1/1 Running 0 104m 192.168.129.138 kubemaster02 <none> <none>
kube-system pod/kube-proxy-sct5x 1/1 Running 0 18h 192.168.129.137 kubemaster01 <none> <none>
kube-system pod/kube-proxy-tsr65 1/1 Running 0 104m 192.168.129.138 kubemaster02 <none> <none>
kube-system pod/kube-scheduler-kubemaster01 1/1 Running 2 18h 192.168.129.137 kubemaster01 <none> <none>
kube-system pod/kube-scheduler-kubemaster02 1/1 Running 0 104m 192.168.129.138 kubemaster02 <none> <none>
kube-system pod/weave-net-4zdg6 2/2 Running 0 3h 192.168.129.137 kubemaster01 <none> <none>
kube-system pod/weave-net-bf8mq 1/2 CrashLoopBackOff 38 104m 192.168.129.138 kubemaster02 <none> <none>
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
default service/kubernetes ClusterIP 192.168.244.1 <none> 443/TCP 20h <none>
kube-system service/kube-dns ClusterIP 192.168.244.10 <none> 53/UDP,53/TCP,9153/TCP 18h k8s-app=kube-dns
IP Routes in Master Nodes
root#kubemaster01:~# ip r
default via 192.168.128.1 dev ens3 proto static
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.128.0/21 dev ens3 proto kernel scope link src 192.168.129.137
192.168.240.0/21 dev weave proto kernel scope link src 192.168.240.1
root#kubemaster02:~# ip r
default via 192.168.128.1 dev ens3 proto static
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.128.0/21 dev ens3 proto kernel scope link src 192.168.129.138
Description of the weave pod running on the second master node
root#kubemaster01:~# kubectl describe pod/weave-net-bf8mq -n kube-system
Name: weave-net-bf8mq
Namespace: kube-system
Priority: 2000001000
Priority Class Name: system-node-critical
Node: kubemaster02./192.168.129.138
Start Time: Fri, 17 Apr 2020 12:28:09 -0300
Labels: controller-revision-hash=79478b764c
name=weave-net
pod-template-generation=1
Annotations: <none>
Status: Running
IP: 192.168.129.138
IPs:
IP: 192.168.129.138
Controlled By: DaemonSet/weave-net
Containers:
weave:
Container ID: docker://93bff012aaebb34dc338001bf73798b5eeefe32a4d50b82731b0ef003c63c786
Image: docker.io/weaveworks/weave-kube:2.6.2
Image ID: docker-pullable://weaveworks/weave-kube#sha256:a1f58e75f24f02e1c2fa2a95b9e55a1b94930f455e75bd5f4799e1a55671971f
Port: <none>
Host Port: <none>
Command:
/home/weave/launch.sh
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Fri, 17 Apr 2020 14:15:59 -0300
Finished: Fri, 17 Apr 2020 14:16:29 -0300
Ready: False
Restart Count: 39
Requests:
cpu: 10m
Readiness: http-get http://127.0.0.1:6784/status delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
HOSTNAME: (v1:spec.nodeName)
IPALLOC_RANGE: 192.168.240.0/21
Mounts:
/host/etc from cni-conf (rw)
/host/home from cni-bin2 (rw)
/host/opt from cni-bin (rw)
/host/var/lib/dbus from dbus (rw)
/lib/modules from lib-modules (rw)
/run/xtables.lock from xtables-lock (rw)
/var/run/secrets/kubernetes.io/serviceaccount from weave-net-token-xp46t (ro)
/weavedb from weavedb (rw)
weave-npc:
Container ID: docker://4de9116cae90cf3f6d59279dd1531938b102adcdd1b76464e5bbe2f2b013b060
Image: docker.io/weaveworks/weave-npc:2.6.2
Image ID: docker-pullable://weaveworks/weave-npc#sha256:5694b0b77003780333ccd1fc79810469434779cd86e926a17675cc5b70470459
Port: <none>
Host Port: <none>
State: Running
Started: Fri, 17 Apr 2020 12:28:24 -0300
Ready: True
Restart Count: 0
Requests:
cpu: 10m
Environment:
HOSTNAME: (v1:spec.nodeName)
Mounts:
/run/xtables.lock from xtables-lock (rw)
/var/run/secrets/kubernetes.io/serviceaccount from weave-net-token-xp46t (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
weavedb:
Type: HostPath (bare host directory volume)
Path: /var/lib/weave
HostPathType:
cni-bin:
Type: HostPath (bare host directory volume)
Path: /opt
HostPathType:
cni-bin2:
Type: HostPath (bare host directory volume)
Path: /home
HostPathType:
cni-conf:
Type: HostPath (bare host directory volume)
Path: /etc
HostPathType:
dbus:
Type: HostPath (bare host directory volume)
Path: /var/lib/dbus
HostPathType:
lib-modules:
Type: HostPath (bare host directory volume)
Path: /lib/modules
HostPathType:
xtables-lock:
Type: HostPath (bare host directory volume)
Path: /run/xtables.lock
HostPathType: FileOrCreate
weave-net-token-xp46t:
Type: Secret (a volume populated by a Secret)
SecretName: weave-net-token-xp46t
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: :NoSchedule
:NoExecute
node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/network-unavailable:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/pid-pressure:NoSchedule
node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unschedulable:NoSchedule
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 11m (x17 over 81m) kubelet, kubemaster02. Container image "docker.io/weaveworks/weave-kube:2.6.2" already present on machine
Warning BackOff 85s (x330 over 81m) kubelet, kubemaster02. Back-off restarting failed container
The logs files complains about timeout, but that is because there is no network running.
root#kubemaster02:~# kubectl logs weave-net-bf8mq -name weave -n kube-system
FATA: 2020/04/17 17:22:04.386233 [kube-peers] Could not get peers: Get https://192.168.244.1:443/api/v1/nodes: dial tcp 192.168.244.1:443: i/o timeout
Failed to get peers
root#kubemaster02:~# kubectl logs weave-net-bf8mq -name weave-npc -n kube-system | more
INFO: 2020/04/17 15:28:24.851287 Starting Weaveworks NPC 2.6.2; node name "kubemaster02"
INFO: 2020/04/17 15:28:24.851469 Serving /metrics on :6781
Fri Apr 17 15:28:24 2020 <5> ulogd.c:408 registering plugin `NFLOG'
Fri Apr 17 15:28:24 2020 <5> ulogd.c:408 registering plugin `BASE'
Fri Apr 17 15:28:24 2020 <5> ulogd.c:408 registering plugin `PCAP'
Fri Apr 17 15:28:24 2020 <5> ulogd.c:981 building new pluginstance stack: 'log1:NFLOG,base1:BASE,pcap1:PCAP'
WARNING: scheduler configuration failed: Function not implemented
DEBU: 2020/04/17 15:28:24.887619 Got list of ipsets: []
ERROR: logging before flag.Parse: E0417 15:28:54.923915 19321 reflector.go:205] github.com/weaveworks/weave/prog/weave-npc/main.go:321: Failed to list *v1.Pod: Get https://192.168.244.1:443/api/v1/pods?limit=500&resourceVersion=0: dial
tcp 192.168.244.1:443: i/o timeout
ERROR: logging before flag.Parse: E0417 15:28:54.923895 19321 reflector.go:205] github.com/weaveworks/weave/prog/weave-npc/main.go:322: Failed to list *v1.NetworkPolicy: Get https://192.168.244.1:443/apis/networking.k8s.io/v1/networkpo
licies?limit=500&resourceVersion=0: dial tcp 192.168.244.1:443: i/o timeout
ERROR: logging before flag.Parse: E0417 15:28:54.924071 19321 reflector.go:205] github.com/weaveworks/weave/prog/weave-npc/main.go:320: Failed to list *v1.Namespace: Get https://192.168.244.1:443/api/v1/namespaces?limit=500&resourceVer
sion=0: dial tcp 192.168.244.1:443: i/o timeout
Any advice or suggestions?
Regards.
The error was in a misconfiguration with CRI-O as CRI runtime. Follow this installation guide corrects the issue.
https://kubernetes.io/docs/setup/production-environment/container-runtimes/

Improper cni install preventing coredns pods from starting

Just installed a single master cluster using kubeadm v1.15.0. However, coredns seems stuck in pending mode:
coredns-5c98db65d4-4pm65 0/1 Pending 0 2m17s <none> <none> <none> <none>
coredns-5c98db65d4-55hcc 0/1 Pending 0 2m2s <none> <none> <none> <none>
the following is what shows up for the pod:
kubectl describe pods coredns-5c98db65d4-4pm65 --namespace=kube-system
Name: coredns-5c98db65d4-4pm65
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: <none>
Labels: k8s-app=kube-dns
pod-template-hash=5c98db65d4
Annotations: <none>
Status: Pending
IP:
Controlled By: ReplicaSet/coredns-5c98db65d4
Containers:
coredns:
Image: k8s.gcr.io/coredns:1.3.1
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8080/health delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from coredns-token-5t2wn (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
coredns-token-5t2wn:
Type: Secret (a volume populated by a Secret)
SecretName: coredns-token-5t2wn
Optional: false
QoS Class: Burstable
Node-Selectors: beta.kubernetes.io/os=linux
Tolerations: CriticalAddonsOnly
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 61s (x4 over 5m21s) default-scheduler 0/2 nodes are available: 2 node(s) had taints that the pod didn't tolerate.
I removed the taint on the master node, to no avail. Shouldn't I be able to created a single node master without any problems like this. I know scheduling pods on the master is not possible without removing the taint, but this is odd.
I tried adding the latest calico cni, to no avail, too.
I get the following running journalctl (systemctl shows no errors):
sudo journalctl -xn --unit kubelet.service
[sudo] password for gms:
-- Logs begin at Fri 2019-07-12 04:31:34 CDT, end at Tue 2019-07-16 16:58:17 CDT. --
Jul 16 16:57:54 thalia0.ahc.umn.edu kubelet[11250]: E0716 16:57:54.122355 11250 kubelet.go:2169] Container runtime network not ready: NetworkReady=false reason:NetworkPl
Jul 16 16:57:54 thalia0.ahc.umn.edu kubelet[11250]: W0716 16:57:54.400606 11250 cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d
Jul 16 16:57:59 thalia0.ahc.umn.edu kubelet[11250]: E0716 16:57:59.124863 11250 kubelet.go:2169] Container runtime network not ready: NetworkReady=false reason:NetworkPl
Jul 16 16:57:59 thalia0.ahc.umn.edu kubelet[11250]: W0716 16:57:59.400924 11250 cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d
Jul 16 16:58:04 thalia0.ahc.umn.edu kubelet[11250]: E0716 16:58:04.127120 11250 kubelet.go:2169] Container runtime network not ready: NetworkReady=false reason:NetworkPl
Jul 16 16:58:04 thalia0.ahc.umn.edu kubelet[11250]: W0716 16:58:04.401266 11250 cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d
Jul 16 16:58:09 thalia0.ahc.umn.edu kubelet[11250]: E0716 16:58:09.129287 11250 kubelet.go:2169] Container runtime network not ready: NetworkReady=false reason:NetworkPl
Jul 16 16:58:09 thalia0.ahc.umn.edu kubelet[11250]: W0716 16:58:09.401520 11250 cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d
Jul 16 16:58:14 thalia0.ahc.umn.edu kubelet[11250]: E0716 16:58:14.133059 11250 kubelet.go:2169] Container runtime network not ready: NetworkReady=false reason:NetworkPl
Jul 16 16:58:14 thalia0.ahc.umn.edu kubelet[11250]: W0716 16:58:14.402008 11250 cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d
Indeed, when I look in /etc/cni/net.d there is nothing there -> yes, I ran kubectl apply -f https://docs.projectcalico.org/v3.8/manifests/calico.yaml... this is the output when I apply this:
configmap/calico-config created
customresourcedefinition.apiextensions.k8s.io/felixconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamblocks.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/blockaffinities.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamhandles.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamconfigs.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgppeers.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ippools.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/hostendpoints.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/clusterinformations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworksets.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networksets.crd.projectcalico.org created
clusterrole.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrolebinding.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrole.rbac.authorization.k8s.io/calico-node created
clusterrolebinding.rbac.authorization.k8s.io/calico-node created
daemonset.apps/calico-node created
serviceaccount/calico-node created
deployment.apps/calico-kube-controllers created
serviceaccount/calico-kube-controllers created
I ran the following on the pod for calico-node, which is stuck in the following state:
calico-node-tcfhw 0/1 Init:0/3 0 11m 10.32.3.158
describe pods calico-node-tcfhw --namespace=kube-system
Name: calico-node-tcfhw
Namespace: kube-system
Priority: 2000001000
Priority Class Name: system-node-critical
Node: thalia0.ahc.umn.edu/10.32.3.158
Start Time: Tue, 16 Jul 2019 18:08:25 -0500
Labels: controller-revision-hash=844ddd97c6
k8s-app=calico-node
pod-template-generation=1
Annotations: scheduler.alpha.kubernetes.io/critical-pod:
Status: Pending
IP: 10.32.3.158
Controlled By: DaemonSet/calico-node
Init Containers:
upgrade-ipam:
Container ID: docker://1e1bf9e65cb182656f6f06a1bb8291237562f0f5a375e557a454942e81d32063
Image: calico/cni:v3.8.0
Image ID: docker-pullable://docker.io/calico/cni#sha256:decba0501ab0658e6e7da2f5625f1eabb8aba5690f9206caba3bf98caca5094c
Port: <none>
Host Port: <none>
Command:
/opt/cni/bin/calico-ipam
-upgrade
State: Running
Started: Tue, 16 Jul 2019 18:08:26 -0500
Ready: False
Restart Count: 0
Environment:
KUBERNETES_NODE_NAME: (v1:spec.nodeName)
CALICO_NETWORKING_BACKEND: <set to the key 'calico_backend' of config map 'calico-config'> Optional: false
Mounts:
/host/opt/cni/bin from cni-bin-dir (rw)
/var/lib/cni/networks from host-local-net-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-b9c6p (ro)
install-cni:
Container ID:
Image: calico/cni:v3.8.0
Image ID:
Port: <none>
Host Port: <none>
Command:
/install-cni.sh
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Environment:
CNI_CONF_NAME: 10-calico.conflist
CNI_NETWORK_CONFIG: <set to the key 'cni_network_config' of config map 'calico-config'> Optional: false
KUBERNETES_NODE_NAME: (v1:spec.nodeName)
CNI_MTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
SLEEP: false
Mounts:
/host/etc/cni/net.d from cni-net-dir (rw)
/host/opt/cni/bin from cni-bin-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-b9c6p (ro)
flexvol-driver:
Container ID:
Image: calico/pod2daemon-flexvol:v3.8.0
Image ID:
Port: <none>
Host Port: <none>
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/host/driver from flexvol-driver-host (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-b9c6p (ro)
Containers:
calico-node:
Container ID:
Image: calico/node:v3.8.0
Image ID:
Port: <none>
Host Port: <none>
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Requests:
cpu: 250m
Liveness: http-get http://localhost:9099/liveness delay=10s timeout=1s period=10s #success=1 #failure=6
Readiness: exec [/bin/calico-node -bird-ready -felix-ready] delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
DATASTORE_TYPE: kubernetes
WAIT_FOR_DATASTORE: true
NODENAME: (v1:spec.nodeName)
CALICO_NETWORKING_BACKEND: <set to the key 'calico_backend' of config map 'calico-config'> Optional: false
CLUSTER_TYPE: k8s,bgp
IP: autodetect
CALICO_IPV4POOL_IPIP: Always
FELIX_IPINIPMTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
CALICO_IPV4POOL_CIDR: 192.168.0.0/16
CALICO_DISABLE_FILE_LOGGING: true
FELIX_DEFAULTENDPOINTTOHOSTACTION: ACCEPT
FELIX_IPV6SUPPORT: false
FELIX_LOGSEVERITYSCREEN: info
FELIX_HEALTHENABLED: true
Mounts:
/lib/modules from lib-modules (ro)
/run/xtables.lock from xtables-lock (rw)
/var/lib/calico from var-lib-calico (rw)
/var/run/calico from var-run-calico (rw)
/var/run/nodeagent from policysync (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-b9c6p (ro)
Conditions:
Type Status
Initialized False
Ready False
ContainersReady False
PodScheduled True
Volumes:
lib-modules:
Type: HostPath (bare host directory volume)
Path: /lib/modules
HostPathType:
var-run-calico:
Type: HostPath (bare host directory volume)
Path: /var/run/calico
HostPathType:
var-lib-calico:
Type: HostPath (bare host directory volume)
Path: /var/lib/calico
HostPathType:
xtables-lock:
Type: HostPath (bare host directory volume)
Path: /run/xtables.lock
HostPathType: FileOrCreate
cni-bin-dir:
Type: HostPath (bare host directory volume)
Path: /opt/cni/bin
HostPathType:
cni-net-dir:
Type: HostPath (bare host directory volume)
Path: /etc/cni/net.d
HostPathType:
host-local-net-dir:
Type: HostPath (bare host directory volume)
Path: /var/lib/cni/networks
HostPathType:
policysync:
Type: HostPath (bare host directory volume)
Path: /var/run/nodeagent
HostPathType: DirectoryOrCreate
flexvol-driver-host:
Type: HostPath (bare host directory volume)
Path: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds
HostPathType: DirectoryOrCreate
calico-node-token-b9c6p:
Type: Secret (a volume populated by a Secret)
SecretName: calico-node-token-b9c6p
Optional: false
QoS Class: Burstable
Node-Selectors: beta.kubernetes.io/os=linux
Tolerations: :NoSchedule
:NoExecute
CriticalAddonsOnly
node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/network-unavailable:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/pid-pressure:NoSchedule
node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unschedulable:NoSchedule
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 9m15s default-scheduler Successfully assigned kube-system/calico-node-tcfhw to thalia0.ahc.umn.edu
Normal Pulled 9m14s kubelet, thalia0.ahc.umn.edu Container image "calico/cni:v3.8.0" already present on machine
Normal Created 9m14s kubelet, thalia0.ahc.umn.edu Created container upgrade-ipam
Normal Started 9m14s kubelet, thalia0.ahc.umn.edu Started container upgrade-ipam
I tried Flannel as a cni, but that was even worse. The kube-proxy wouldn't even start due to a taint!
EDIT ADDENDUM
Should the kube-controller-manager and kube-scheduler not have defined endpoints?
[gms#thalia0 ~]$ kubectl get ep --namespace=kube-system -o wide
NAME ENDPOINTS AGE
kube-controller-manager <none> 19h
kube-dns <none> 19h
kube-scheduler <none> 19h
[gms#thalia0 ~]$ kubectl get pods --namespace=kube-system
NAME READY STATUS RESTARTS AGE
coredns-5c98db65d4-nmn4g 0/1 Pending 0 19h
coredns-5c98db65d4-qv8fm 0/1 Pending 0 19h
etcd-thalia0.x.x.edu. 1/1 Running 0 19h
kube-apiserver-thalia0.x.x.edu 1/1 Running 0 19h
kube-controller-manager-thalia0.x.x.edu 1/1 Running 0 19h
kube-proxy-4hrdc 1/1 Running 0 19h
kube-proxy-vb594 1/1 Running 0 19h
kube-proxy-zwrst 1/1 Running 0 19h
kube-scheduler-thalia0.x.x.edu 1/1 Running 0 19h
Lastly, for sanity's sake, I tried v1.13.1, and voila! Success:
NAME READY STATUS RESTARTS AGE
calico-node-pbrps 2/2 Running 0 15s
coredns-86c58d9df4-g5944 1/1 Running 0 2m40s
coredns-86c58d9df4-zntjl 1/1 Running 0 2m40s
etcd-thalia0.ahc.umn.edu 1/1 Running 0 110s
kube-apiserver-thalia0.ahc.umn.edu 1/1 Running 0 105s
kube-controller-manager-thalia0.ahc.umn.edu 1/1 Running 0 103s
kube-proxy-qxh2h 1/1 Running 0 2m39s
kube-scheduler-thalia0.ahc.umn.edu 1/1 Running 0 117s
EDIT 2
Tried sudo kubeadm upgrade plan and got an error on api-server's health and bad certs.
Ran this on the api-server:
kubectl logs kube-apiserver-thalia0.x.x.edu --namespace=kube-system1
and got a ton of errors of the sort TLS handshake error from 10.x.x.157:52384: remote error: tls: bad certificate, which were from nodes that have long been deleted from the cluster and, long after several kubeadm resets on the master, along with uninstall/reinstall of kubelet, kubeadm, etc.
Why are these old nodes showing up? Don't the certs get recreated on a kubeadm init?
This issue https://github.com/projectcalico/calico/issues/2699 had similar symptoms and indicates that deleting /var/lib/cni/ fixed the issue. You could see if it exists and delete it if so.
Coreos-dns doesn't start until Calico is started, check if your worker nodes are ready with this command
kubectl get nodes -owide
kubectl describe node <your-node>
or
kubectl get node <your-node> -oyaml
Other thing to check is the following message in the log :
"Unable to update cni config: No networks found in /etc/cni/net.d"
what you have in that directory?
Maybe cni isn't configured properly.
That directory /etc/cni/net.d should contain 2 files :
10-calico.conflist calico-kubeconfig
Below is the content of this two files, check if you have files like this in your directory
[root#master net.d]# cat 10-calico.conflist
{
"name": "k8s-pod-network",
"cniVersion": "0.3.0",
"plugins": [
{
"type": "calico",
"log_level": "info",
"datastore_type": "kubernetes",
"nodename": "master",
"mtu": 1440,
"ipam": {
"type": "host-local",
"subnet": "usePodCidr"
},
"policy": {
"type": "k8s"
},
"kubernetes": {
"kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
}
},
{
"type": "portmap",
"snat": true,
"capabilities": {"portMappings": true}
}
]
}
[root#master net.d]# cat calico-kubeconfig
# Kubeconfig file for Calico CNI plugin.
apiVersion: v1
kind: Config
clusters:
- name: local
cluster:
server: https://[10.20.0.1]:443
certificate-authority-data: LSRt.... tLQJ=
users:
- name: calico
user:
token: "eUJh .... ZBoIA"
contexts:
- name: calico-context
context:
cluster: local
user: calico
current-context: calico-context

Jenkins app is not accessible outside Kubernetes cluster

On CentOS 7.4, I have set up a Kubernetes master node, pulled down jenkins image and deployed it to the cluster defining the jenkins service on a NodePort as below.
I can curl the jenkins app from the worker or master nodes using the IP defined by the service. But, I can not access the Jenkins app (dashboard) from my browser (outside cluster) using the public IP of the master node.
[administrator#abcdefgh ~]$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
abcdefgh Ready master 19h v1.13.1
hgfedcba Ready <none> 19h v1.13.1
[administrator#abcdefgh ~]$ sudo docker pull jenkinsci/jenkins:2.154-alpine
[administrator#abcdefgh ~]$ sudo docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
k8s.gcr.io/kube-proxy v1.13.1 fdb321fd30a0 5 days ago 80.2MB
k8s.gcr.io/kube-controller-manager v1.13.1 26e6f1db2a52 5 days ago 146MB
k8s.gcr.io/kube-apiserver v1.13.1 40a63db91ef8 5 days ago 181MB
k8s.gcr.io/kube-scheduler v1.13.1 ab81d7360408 5 days ago 79.6MB
jenkinsci/jenkins 2.154-alpine aa25058d8320 2 weeks ago 222MB
k8s.gcr.io/coredns 1.2.6 f59dcacceff4 6 weeks ago 40MB
k8s.gcr.io/etcd 3.2.24 3cab8e1b9802 2 months ago 220MB
quay.io/coreos/flannel v0.10.0-amd64 f0fad859c909 10 months ago 44.6MB
k8s.gcr.io/pause 3.1 da86e6ba6ca1 12 months ago 742kB
[administrator#abcdefgh ~]$ ls -l
total 8
-rw------- 1 administrator administrator 678 Dec 18 06:12 jenkins-deployment.yaml
-rw------- 1 administrator administrator 410 Dec 18 06:11 jenkins-service.yaml
[administrator#abcdefgh ~]$ cat jenkins-service.yaml
apiVersion: v1
kind: Service
metadata:
name: jenkins-ui
spec:
type: NodePort
ports:
- protocol: TCP
port: 8080
targetPort: 8080
name: ui
selector:
app: jenkins-master
---
apiVersion: v1
kind: Service
metadata:
name: jenkins-discovery
spec:
selector:
app: jenkins-master
ports:
- protocol: TCP
port: 50000
targetPort: 50000
name: jenkins-slaves
[administrator#abcdefgh ~]$ cat jenkins-deployment.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: jenkins
spec:
replicas: 1
template:
metadata:
labels:
app: jenkins-master
spec:
containers:
- image: jenkins/jenkins:2.154-alpine
name: jenkins
ports:
- containerPort: 8080
name: http-port
- containerPort: 50000
name: jnlp-port
env:
- name: JAVA_OPTS
value: -Djenkins.install.runSetupWizard=false
volumeMounts:
- name: jenkins-home
mountPath: /var/jenkins_home
volumes:
- name: jenkins-home
emptyDir: {}
[administrator#abcdefgh ~]$ kubectl create -f jenkins-service.yaml
service/jenkins-ui created
service/jenkins-discovery created
[administrator#abcdefgh ~]$ kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
jenkins-discovery ClusterIP 10.98.--.-- <none> 50000/TCP 19h
jenkins-ui NodePort 10.97.--.-- <none> 8080:31587/TCP 19h
kubernetes ClusterIP 10.96.--.-- <none> 443/TCP 20h
[administrator#abcdefgh ~]$ kubectl create -f jenkins-deployment.yaml
deployment.extensions/jenkins created
[administrator#abcdefgh ~]$ kubectl get deployments
NAME READY UP-TO-DATE AVAILABLE AGE
jenkins 1/1 1 1 19h
[administrator#abcdefgh ~]$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default jenkins-6497cf9dd4-f9r5b 1/1 Running 0 19h
kube-system coredns-86c58d9df4-jfq5b 1/1 Running 0 20h
kube-system coredns-86c58d9df4-s4k6d 1/1 Running 0 20h
kube-system etcd-abcdefgh 1/1 Running 1 20h
kube-system kube-apiserver-abcdefgh 1/1 Running 1 20h
kube-system kube-controller-manager-abcdefgh 1/1 Running 5 20h
kube-system kube-flannel-ds-amd64-2w68w 1/1 Running 1 20h
kube-system kube-flannel-ds-amd64-6zl4g 1/1 Running 1 20h
kube-system kube-proxy-9r4xt 1/1 Running 1 20h
kube-system kube-proxy-s7fj2 1/1 Running 1 20h
kube-system kube-scheduler-abcdefgh 1/1 Running 8 20h
[administrator#abcdefgh ~]$ kubectl describe pod jenkins-6497cf9dd4-f9r5b
Name: jenkins-6497cf9dd4-f9r5b
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: hgfedcba/10.41.--.--
Start Time: Tue, 18 Dec 2018 06:32:50 -0800
Labels: app=jenkins-master
pod-template-hash=6497cf9dd4
Annotations: <none>
Status: Running
IP: 10.244.--.--
Controlled By: ReplicaSet/jenkins-6497cf9dd4
Containers:
jenkins:
Container ID: docker://55912512a7aa1f782784690b558d74001157f242a164288577a85901ecb5d152
Image: jenkins/jenkins:2.154-alpine
Image ID: docker-pullable://jenkins/jenkins#sha256:b222875a2b788f474db08f5f23f63369b0f94ed7754b8b32ac54b8b4d01a5847
Ports: 8080/TCP, 50000/TCP
Host Ports: 0/TCP, 0/TCP
State: Running
Started: Tue, 18 Dec 2018 07:16:32 -0800
Ready: True
Restart Count: 0
Environment:
JAVA_OPTS: -Djenkins.install.runSetupWizard=false
Mounts:
/var/jenkins_home from jenkins-home (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-wqph5 (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
jenkins-home:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
default-token-wqph5:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-wqph5
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events: <none>
[administrator#abcdefgh ~]$ kubectl describe svc jenkins-ui
Name: jenkins-ui
Namespace: default
Labels: <none>
Annotations: <none>
Selector: app=jenkins-master
Type: NodePort
IP: 10.97.--.--
Port: ui 8080/TCP
TargetPort: 8080/TCP
NodePort: ui 31587/TCP
Endpoints: 10.244.--.--:8080
Session Affinity: None
External Traffic Policy: Cluster
Events: <none>
# Check if NodePort along with Kubernetes ports are open
[administrator#abcdefgh ~]$ sudo su root
[root#abcdefgh administrator]# systemctl start firewalld
[root#abcdefgh administrator]# firewall-cmd --permanent --add-port=6443/tcp # Kubernetes API Server
Warning: ALREADY_ENABLED: 6443:tcp
success
[root#abcdefgh administrator]# firewall-cmd --permanent --add-port=2379-2380/tcp # etcd server client API
Warning: ALREADY_ENABLED: 2379-2380:tcp
success
[root#abcdefgh administrator]# firewall-cmd --permanent --add-port=10250/tcp # Kubelet API
Warning: ALREADY_ENABLED: 10250:tcp
success
[root#abcdefgh administrator]# firewall-cmd --permanent --add-port=10251/tcp # kube-scheduler
Warning: ALREADY_ENABLED: 10251:tcp
success
[root#abcdefgh administrator]# firewall-cmd --permanent --add-port=10252/tcp # kube-controller-manager
Warning: ALREADY_ENABLED: 10252:tcp
success
[root#abcdefgh administrator]# firewall-cmd --permanent --add-port=10255/tcp # Read-Only Kubelet API
Warning: ALREADY_ENABLED: 10255:tcp
success
[root#abcdefgh administrator]# firewall-cmd --permanent --add-port=31587/tcp # NodePort of jenkins-ui service
Warning: ALREADY_ENABLED: 31587:tcp
success
[root#abcdefgh administrator]# firewall-cmd --reload
success
[administrator#abcdefgh ~]$ kubectl cluster-info
Kubernetes master is running at https://10.41.--.--:6443
KubeDNS is running at https://10.41.--.--:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
[administrator#hgfedcba ~]$ curl 10.41.--.--:8080
curl: (7) Failed connect to 10.41.--.--:8080; Connection refused
# Successfully curl jenkins app using its service IP from the worker node
[administrator#hgfedcba ~]$ curl 10.97.--.--:8080
<!DOCTYPE html><html><head resURL="/static/5882d14a" data-rooturl="" data-resurl="/static/5882d14a">
<title>Dashboard [Jenkins]</title><link rel="stylesheet" ...
...
Would you know how to do that? Happy to provide additional logs. Also, I have installed jenkins from yum on another similar machine without any docker or kubernetes and it's possible to access it through 10.20.30.40:8080 in my browser so there is no provider firewall preventing me from doing that.
Your Jenkins Service is of type NodePort. That means that a specific port number, on any node within your cluster, will deliver your Jenkins UI.
When you described your Service, you can see that the port assigned was 31587.
You should be able to browse to http://SOME_IP:31587

Kube-Flannel cant get CIDR although PodCIDR available on node

currently I am setting up Kubernetes on a 1 Master 2 Node enviorement.
I succesfully initialized the Master and added the nodes to the Cluster
kubectl get nodes
When I joined the Nodes to the cluster, the kube-proxy pod started succesfully, but the kube-flannel pod gets an error and runs into a CrashLoopBackOff.
flannel-pod.log:
I0613 09:03:36.820387 1 main.go:475] Determining IP address of default interface,
I0613 09:03:36.821180 1 main.go:488] Using interface with name ens160 and address 172.17.11.2,
I0613 09:03:36.821233 1 main.go:505] Defaulting external address to interface address (172.17.11.2),
I0613 09:03:37.015163 1 kube.go:131] Waiting 10m0s for node controller to sync,
I0613 09:03:37.015436 1 kube.go:294] Starting kube subnet manager,
I0613 09:03:38.015675 1 kube.go:138] Node controller sync successful,
I0613 09:03:38.015767 1 main.go:235] Created subnet manager: Kubernetes Subnet Manager - caasfaasslave1.XXXXXX.local,
I0613 09:03:38.015828 1 main.go:238] Installing signal handlers,
I0613 09:03:38.016109 1 main.go:353] Found network config - Backend type: vxlan,
I0613 09:03:38.016281 1 vxlan.go:120] VXLAN config: VNI=1 Port=0 GBP=false DirectRouting=false,
E0613 09:03:38.016872 1 main.go:280] Error registering network: failed to acquire lease: node "caasfaasslave1.XXXXXX.local" pod cidr not assigned,
I0613 09:03:38.016966 1 main.go:333] Stopping shutdownHandler...,
On the Node, I can verify that the PodCDIR is available:
kubectl get nodes -o jsonpath='{.items[*].spec.podCIDR}'
172.17.12.0/24
On the Masters kube-controller-manager, the pod cidr is also there
[root#caasfaasmaster manifests]# cat kube-controller-manager.yaml
apiVersion: v1
kind: Pod
metadata:
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ""
creationTimestamp: null
labels:
component: kube-controller-manager
tier: control-plane
name: kube-controller-manager
namespace: kube-system
spec:
containers:
- command:
- kube-controller-manager
- --leader-elect=true
- --controllers=*,bootstrapsigner,tokencleaner
- --cluster-signing-key-file=/etc/kubernetes/pki/ca.key
- --address=127.0.0.1
- --use-service-account-credentials=true
- --kubeconfig=/etc/kubernetes/controller-manager.conf
- --root-ca-file=/etc/kubernetes/pki/ca.crt
- --service-account-private-key-file=/etc/kubernetes/pki/sa.key
- --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt
- --allocate-node-cidrs=true
- --cluster-cidr=172.17.12.0/24
- --node-cidr-mask-size=24
env:
- name: http_proxy
value: http://ntlmproxy.XXXXXX.local:3154
- name: https_proxy
value: http://ntlmproxy.XXXXXX.local:3154
- name: no_proxy
value: .XXXXX.local,172.17.11.0/24,172.17.12.0/24
image: k8s.gcr.io/kube-controller-manager-amd64:v1.10.4
livenessProbe:
failureThreshold: 8
httpGet:
host: 127.0.0.1
path: /healthz
port: 10252
scheme: HTTP
initialDelaySeconds: 15
timeoutSeconds: 15
name: kube-controller-manager
resources:
requests:
cpu: 200m
volumeMounts:
- mountPath: /etc/kubernetes/pki
name: k8s-certs
readOnly: true
- mountPath: /etc/ssl/certs
name: ca-certs
readOnly: true
- mountPath: /etc/kubernetes/controller-manager.conf
name: kubeconfig
readOnly: true
- mountPath: /etc/pki
name: ca-certs-etc-pki
readOnly: true
hostNetwork: true
volumes:
- hostPath:
path: /etc/pki
type: DirectoryOrCreate
name: ca-certs-etc-pki
- hostPath:
path: /etc/kubernetes/pki
type: DirectoryOrCreate
name: k8s-certs
- hostPath:
path: /etc/ssl/certs
type: DirectoryOrCreate
name: ca-certs
- hostPath:
path: /etc/kubernetes/controller-manager.conf
type: FileOrCreate
name: kubeconfig
status: {}
XXXXX for anonymization
I initialized the master with the following kubeadm comman (which also went through without any errors)
kubeadm init --pod-network-cidr=172.17.12.0/24 --service-
cidr=172.17.11.129/25 --service-dns-domain=dcs.XXXXX.local
Does anyone know what could cause my issues and how to fix them?
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
kube-system etcd-caasfaasmaster.XXXXXX.local 1/1 Running 0 16h 172.17.11.1 caasfaasmaster.XXXXXX.local
kube-system kube-apiserver-caasfaasmaster.XXXXXX.local 1/1 Running 1 16h 172.17.11.1 caasfaasmaster.XXXXXX.local
kube-system kube-controller-manager-caasfaasmaster.XXXXXX.local 1/1 Running 0 16h 172.17.11.1 caasfaasmaster.XXXXXX.local
kube-system kube-dns-75c5968bf9-qfh96 3/3 Running 0 16h 172.17.12.2 caasfaasmaster.XXXXXX.local
kube-system kube-flannel-ds-4b6kf 0/1 CrashLoopBackOff 205 16h 172.17.11.2 caasfaasslave1.XXXXXX.local
kube-system kube-flannel-ds-j2fz6 0/1 CrashLoopBackOff 191 16h 172.17.11.3 caasfassslave2.XXXXXX.local
kube-system kube-flannel-ds-qjd89 1/1 Running 0 16h 172.17.11.1 caasfaasmaster.XXXXXX.local
kube-system kube-proxy-h4z54 1/1 Running 0 16h 172.17.11.3 caasfassslave2.XXXXXX.local
kube-system kube-proxy-sjwl2 1/1 Running 0 16h 172.17.11.2 caasfaasslave1.XXXXXX.local
kube-system kube-proxy-zc5xh 1/1 Running 0 16h 172.17.11.1 caasfaasmaster.XXXXXX.local
kube-system kube-scheduler-caasfaasmaster.XXXXXX.local 1/1 Running 0 16h 172.17.11.1 caasfaasmaster.XXXXXX.local
Failed to acquire lease simply means, the pod didn't get the podCIDR. Happened with me as well although the manifest on master-node says podCIDR true but still it wasn't working and funnel going in crashbackloop.
This is what i did to fix it.
From the master-node, first find out your funnel CIDR
sudo cat /etc/kubernetes/manifests/kube-controller-manager.yaml | grep -i cluster-cidr
Output:
- --cluster-cidr=172.168.10.0/24
Then run the following from the master node:
kubectl patch node slave-node-1 -p '{"spec":{"podCIDR":"172.168.10.0/24"}}'
where,
slave-node-1 is your node where acquire lease is failing
podCIDR is the cidr that you found in previous command
Hope this helps.
According to Flannel documentation:
At the bare minimum, you must tell flannel an IP range (subnet) that
it should use for the overlay. Here is an example of the minimum
flannel configuration:
{ "Network": "10.1.0.0/16" }
Therefore, you need to specify a network for pods with a minimum size of /16, and it should not be a part of your existing network because Flannel uses encapsulation to connect pods on different nodes to one overlay network.
Here is the part of documentation which describes it:
With Docker, each container is assigned an IP address that can be used
to communicate with other containers on the same host. For
communicating over a network, containers are tied to the IP addresses
of the host machines and must rely on port-mapping to reach the
desired container. This makes it difficult for applications running
inside containers to advertise their external IP and port as that
information is not available to them.
flannel solves the problem by giving each container an IP that can be
used for container-to-container communication. It uses packet
encapsulation to create a virtual overlay network that spans the whole
cluster. More specifically, flannel gives each host an IP subnet
(/24 by default) from which the Docker daemon is able to allocate
IPs to the individual containers.
In other words, you should recreate your cluster with settings like these:
kubeadm init --pod-network-cidr=10.17.0.0/16 --service-cidr=10.18.0.0/24 --service-dns-domain=dcs.XXXXX.local

MountVolume.SetUp failed for volume "nfs" : mount failed: exit status 32

This is 2nd question following 1st question at
PersistentVolumeClaim is not bound: "nfs-pv-provisioning-demo"
I am setting up a kubernetes lab using one node only and learning to setup kubernetes nfs. I am following kubernetes nfs example step by step from the following link: https://github.com/kubernetes/examples/tree/master/staging/volumes/nfs
Based on feedback provided by 'helmbert', I modified the content of
https://github.com/kubernetes/examples/blob/master/staging/volumes/nfs/provisioner/nfs-server-gce-pv.yaml
It works and I don't see the event "PersistentVolumeClaim is not bound: “nfs-pv-provisioning-demo”" anymore.
$ cat nfs-server-local-pv01.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv01
labels:
type: local
spec:
capacity:
storage: 10Gi
accessModes:
- ReadWriteOnce
hostPath:
path: "/tmp/data01"
$ cat nfs-server-local-pvc01.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: nfs-pv-provisioning-demo
labels:
demo: nfs-pv-provisioning
spec:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 5Gi
$ kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pv01 10Gi RWO Retain Available 4s
$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
nfs-pv-provisioning-demo Bound pv01 10Gi RWO 2m
$ kubectl get pod
NAME READY STATUS RESTARTS AGE
nfs-server-nlzlv 1/1 Running 0 1h
$ kubectl describe pods nfs-server-nlzlv
Name: nfs-server-nlzlv
Namespace: default
Node: lab-kube-06/10.0.0.6
Start Time: Tue, 21 Nov 2017 19:32:21 +0000
Labels: role=nfs-server
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"default","name":"nfs-server","uid":"b1b00292-cef2-11e7-8ed3-000d3a04eb...
Status: Running
IP: 10.32.0.3
Created By: ReplicationController/nfs-server
Controlled By: ReplicationController/nfs-server
Containers:
nfs-server:
Container ID: docker://1ea76052920d4560557cfb5e5bfc9f8efc3af5f13c086530bd4e0aded201955a
Image: gcr.io/google_containers/volume-nfs:0.8
Image ID: docker-pullable://gcr.io/google_containers/volume-nfs#sha256:83ba87be13a6f74361601c8614527e186ca67f49091e2d0d4ae8a8da67c403ee
Ports: 2049/TCP, 20048/TCP, 111/TCP
State: Running
Started: Tue, 21 Nov 2017 19:32:43 +0000
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/exports from mypvc (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-grgdz (ro)
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
mypvc:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: nfs-pv-provisioning-demo
ReadOnly: false
default-token-grgdz:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-grgdz
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.alpha.kubernetes.io/notReady:NoExecute for 300s
node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events: <none>
I continued the rest of steps and reached the "Setup the fake backend" section and ran the following command:
$ kubectl create -f examples/volumes/nfs/nfs-busybox-rc.yaml
I see status 'ContainerCreating' and never change to 'Running' for both nfs-busybox pods. Is this because the container image is for Google Cloud as shown in the yaml?
https://github.com/kubernetes/examples/blob/master/staging/volumes/nfs/nfs-server-rc.yaml
containers:
- name: nfs-server
image: gcr.io/google_containers/volume-nfs:0.8
ports:
- name: nfs
containerPort: 2049
- name: mountd
containerPort: 20048
- name: rpcbind
containerPort: 111
securityContext:
privileged: true
volumeMounts:
- mountPath: /exports
name: mypvc
Do I have to replace that 'image' line to something else because I don't use Google Cloud for this lab? I only have a single node in my lab. Do I have to rewrite the definition of 'containers' above? What should I replace the 'image' line with? Do I need to download dockerized 'nfs image' from somewhere?
$ kubectl describe pvc nfs
Name: nfs
Namespace: default
StorageClass:
Status: Bound
Volume: nfs
Labels: <none>
Annotations: pv.kubernetes.io/bind-completed=yes
pv.kubernetes.io/bound-by-controller=yes
Capacity: 1Mi
Access Modes: RWX
Events: <none>
$ kubectl describe pv nfs
Name: nfs
Labels: <none>
Annotations: pv.kubernetes.io/bound-by-controller=yes
StorageClass:
Status: Bound
Claim: default/nfs
Reclaim Policy: Retain
Access Modes: RWX
Capacity: 1Mi
Message:
Source:
Type: NFS (an NFS mount that lasts the lifetime of a pod)
Server: 10.111.29.157
Path: /
ReadOnly: false
Events: <none>
$ kubectl get rc
NAME DESIRED CURRENT READY AGE
nfs-busybox 2 2 0 25s
nfs-server 1 1 1 1h
$ kubectl get pod
NAME READY STATUS RESTARTS AGE
nfs-busybox-lmgtx 0/1 ContainerCreating 0 3m
nfs-busybox-xn9vz 0/1 ContainerCreating 0 3m
nfs-server-nlzlv 1/1 Running 0 1h
$ kubectl get service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 20m
nfs-server ClusterIP 10.111.29.157 <none> 2049/TCP,20048/TCP,111/TCP 9s
$ kubectl describe services nfs-server
Name: nfs-server
Namespace: default
Labels: <none>
Annotations: <none>
Selector: role=nfs-server
Type: ClusterIP
IP: 10.111.29.157
Port: nfs 2049/TCP
TargetPort: 2049/TCP
Endpoints: 10.32.0.3:2049
Port: mountd 20048/TCP
TargetPort: 20048/TCP
Endpoints: 10.32.0.3:20048
Port: rpcbind 111/TCP
TargetPort: 111/TCP
Endpoints: 10.32.0.3:111
Session Affinity: None
Events: <none>
$ kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
nfs 1Mi RWX Retain Bound default/nfs 38m
pv01 10Gi RWO Retain Bound default/nfs-pv-provisioning-demo 1h
I see repeating events - MountVolume.SetUp failed for volume "nfs" : mount failed: exit status 32
$ kubectl describe pod nfs-busybox-lmgtx
Name: nfs-busybox-lmgtx
Namespace: default
Node: lab-kube-06/10.0.0.6
Start Time: Tue, 21 Nov 2017 20:39:35 +0000
Labels: name=nfs-busybox
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"default","name":"nfs-busybox","uid":"15d683c2-cefc-11e7-8ed3-000d3a04e...
Status: Pending
IP:
Created By: ReplicationController/nfs-busybox
Controlled By: ReplicationController/nfs-busybox
Containers:
busybox:
Container ID:
Image: busybox
Image ID:
Port: <none>
Command:
sh
-c
while true; do date > /mnt/index.html; hostname >> /mnt/index.html; sleep $(($RANDOM % 5 + 5)); done
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/mnt from nfs (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-grgdz (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
nfs:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: nfs
ReadOnly: false
default-token-grgdz:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-grgdz
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.alpha.kubernetes.io/notReady:NoExecute for 300s
node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 17m default-scheduler Successfully assigned nfs-busybox-lmgtx to lab-kube-06
Normal SuccessfulMountVolume 17m kubelet, lab-kube-06 MountVolume.SetUp succeeded for volume "default-token-grgdz"
Warning FailedMount 17m kubelet, lab-kube-06 MountVolume.SetUp failed for volume "nfs" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/15d8d6d6-cefc-11e7-8ed3-000d3a04ebcd/volumes/kubernetes.io~nfs/nfs --scope -- mount -t nfs 10.111.29.157:/ /var/lib/kubelet/pods/15d8d6d6-cefc-11e7-8ed3-000d3a04ebcd/volumes/kubernetes.io~nfs/nfs
Output: Running scope as unit run-43641.scope.
mount: wrong fs type, bad option, bad superblock on 10.111.29.157:/,
missing codepage or helper program, or other error
(for several filesystems (e.g. nfs, cifs) you might
need a /sbin/mount.<type> helper program)
In some cases useful info is found in syslog - try
dmesg | tail or so.
Warning FailedMount 9m (x4 over 15m) kubelet, lab-kube-06 Unable to mount volumes for pod "nfs-busybox-lmgtx_default(15d8d6d6-cefc-11e7-8ed3-000d3a04ebcd)": timeout expired waiting for volumes to attach/mount for pod "default"/"nfs-busybox-lmgtx". list of unattached/unmounted volumes=[nfs]
Warning FailedMount 4m (x8 over 15m) kubelet, lab-kube-06 (combined from similar events): Unable to mount volumes for pod "nfs-busybox-lmgtx_default(15d8d6d6-cefc-11e7-8ed3-000d3a04ebcd)": timeout expired waiting for volumes to attach/mount for pod "default"/"nfs-busybox-lmgtx". list of unattached/unmounted volumes=[nfs]
Warning FailedSync 2m (x7 over 15m) kubelet, lab-kube-06 Error syncing pod
$ kubectl describe pod nfs-busybox-xn9vz
Name: nfs-busybox-xn9vz
Namespace: default
Node: lab-kube-06/10.0.0.6
Start Time: Tue, 21 Nov 2017 20:39:35 +0000
Labels: name=nfs-busybox
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"default","name":"nfs-busybox","uid":"15d683c2-cefc-11e7-8ed3-000d3a04e...
Status: Pending
IP:
Created By: ReplicationController/nfs-busybox
Controlled By: ReplicationController/nfs-busybox
Containers:
busybox:
Container ID:
Image: busybox
Image ID:
Port: <none>
Command:
sh
-c
while true; do date > /mnt/index.html; hostname >> /mnt/index.html; sleep $(($RANDOM % 5 + 5)); done
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/mnt from nfs (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-grgdz (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
nfs:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: nfs
ReadOnly: false
default-token-grgdz:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-grgdz
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.alpha.kubernetes.io/notReady:NoExecute for 300s
node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedMount 59m (x6 over 1h) kubelet, lab-kube-06 Unable to mount volumes for pod "nfs-busybox-xn9vz_default(15d7fb5e-cefc-11e7-8ed3-000d3a04ebcd)": timeout expired waiting for volumes to attach/mount for pod "default"/"nfs-busybox-xn9vz". list of unattached/unmounted volumes=[nfs]
Warning FailedMount 7m (x32 over 1h) kubelet, lab-kube-06 (combined from similar events): MountVolume.SetUp failed for volume "nfs" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/15d7fb5e-cefc-11e7-8ed3-000d3a04ebcd/volumes/kubernetes.io~nfs/nfs --scope -- mount -t nfs 10.111.29.157:/ /var/lib/kubelet/pods/15d7fb5e-cefc-11e7-8ed3-000d3a04ebcd/volumes/kubernetes.io~nfs/nfs
Output: Running scope as unit run-59365.scope.
mount: wrong fs type, bad option, bad superblock on 10.111.29.157:/,
missing codepage or helper program, or other error
(for several filesystems (e.g. nfs, cifs) you might
need a /sbin/mount.<type> helper program)
In some cases useful info is found in syslog - try
dmesg | tail or so.
Warning FailedSync 2m (x31 over 1h) kubelet, lab-kube-06 Error syncing pod
Had the same problem,
sudo apt install nfs-kernel-server
directly on the nodes fixed it for ubuntu 18.04 server.
NFS server running on AWS EC2.
My pod was stuck in ContainerCreating state
I was facing this issue because of the Kubernetes cluster node CIDR range was not present in the inbound rule of Security Group of my AWS EC2 instance(where my NFS server was running )
Solution:
Added my Kubernetes cluser Node CIDR range to inbound rule of Security Group.
Installed the following nfs libraries on node machine of CentOS worked for me.
yum install -y nfs-utils nfs-utils-lib
Installing the nfs-common library in ubuntu worked for me.