Coredns in CrashLoopBackOff (kubernetes 1.11) - kubernetes

I'm trying to install Kubernetes on an Ubuntu 16.04 VM. I followed the instructions at https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/, using Weave as my pod network add-on.
I'm seeing an issue similar to the one in "coredns pods have CrashLoopBackOff or Error state", but I didn't see a solution there, and the versions I'm using are different:
kubeadm 1.11.4-00
kubectl 1.11.4-00
kubelet 1.11.4-00
kubernetes-cni 0.6.0-00
Docker version 1.13.1-cs8, build 91ca5f2
weave script 2.5.0
weave 2.5.0
I'm running behind a corporate firewall, so I set my proxy variables, then ran kubeadm init as follows:
# echo $http_proxy
http://135.28.13.11:8080
# echo $https_proxy
http://135.28.13.11:8080
# echo $no_proxy
127.0.0.1,135.21.27.139,135.0.0.0/8,10.96.0.0/12,10.32.0.0/12
# kubeadm init --pod-network-cidr=10.32.0.0/12
# kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
# kubectl taint nodes --all node-role.kubernetes.io/master-
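Since the pods and services live in 10.32.0.0/12 and 10.96.0.0/12, it can help to confirm that the addresses involved really fall inside the no_proxy entries. The following is a minimal Python sketch, not part of the original setup. The helper name covered_by_no_proxy is hypothetical, and it only does the CIDR arithmetic. Whether a particular client actually honors CIDR notation in no_proxy varies by tool, so a True result is necessary rather than sufficient.

```python
import ipaddress

def covered_by_no_proxy(ip, no_proxy):
    """Return True if ip equals or falls inside any IP/CIDR entry of no_proxy."""
    addr = ipaddress.ip_address(ip)
    for entry in no_proxy.split(","):
        entry = entry.strip()
        if not entry:
            continue
        try:
            net = ipaddress.ip_network(entry, strict=False)
        except ValueError:
            # Hostnames and domain suffixes are ignored in this sketch.
            continue
        if addr in net:
            return True
    return False

# Value copied from the `echo $no_proxy` output above.
NO_PROXY = "127.0.0.1,135.21.27.139,135.0.0.0/8,10.96.0.0/12,10.32.0.0/12"
for target in ("10.96.0.1", "10.32.0.7", "135.21.27.139"):
    print(target, covered_by_no_proxy(target, NO_PROXY))
```

With the values from the question, the apiserver cluster IP, a coredns pod IP and the node IP are all covered, so the proxy exclusions themselves look consistent.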
Both coredns pods stay in CrashLoopBackOff:
# kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
default hostnames-674b556c4-2b5h2 1/1 Running 0 5h 10.32.0.6 mtpnjvzonap001 <none>
default hostnames-674b556c4-4bzdj 1/1 Running 0 5h 10.32.0.5 mtpnjvzonap001 <none>
default hostnames-674b556c4-64gx5 1/1 Running 0 5h 10.32.0.4 mtpnjvzonap001 <none>
kube-system coredns-78fcdf6894-s7rvx 0/1 CrashLoopBackOff 18 1h 10.32.0.7 mtpnjvzonap001 <none>
kube-system coredns-78fcdf6894-vxwgv 0/1 CrashLoopBackOff 80 6h 10.32.0.2 mtpnjvzonap001 <none>
kube-system etcd-mtpnjvzonap001 1/1 Running 0 6h 135.21.27.139 mtpnjvzonap001 <none>
kube-system kube-apiserver-mtpnjvzonap001 1/1 Running 0 1h 135.21.27.139 mtpnjvzonap001 <none>
kube-system kube-controller-manager-mtpnjvzonap001 1/1 Running 0 6h 135.21.27.139 mtpnjvzonap001 <none>
kube-system kube-proxy-2c4tx 1/1 Running 0 6h 135.21.27.139 mtpnjvzonap001 <none>
kube-system kube-scheduler-mtpnjvzonap001 1/1 Running 0 1h 135.21.27.139 mtpnjvzonap001 <none>
kube-system weave-net-bpx22 2/2 Running 0 6h 135.21.27.139 mtpnjvzonap001 <none>
The coredns pods have this entry in their log:
E1114 20:59:13.848196 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:313: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
This suggests to me that coredns cannot access apiserver pod using its cluster IP:
# kubectl describe svc/kubernetes
Name: kubernetes
Namespace: default
Labels: component=apiserver
provider=kubernetes
Annotations: <none>
Selector: <none>
Type: ClusterIP
IP: 10.96.0.1
Port: https 443/TCP
TargetPort: 6443/TCP
Endpoints: 135.21.27.139:6443
Session Affinity: None
Events: <none>
I also went through the troubleshooting steps at https://kubernetes.io/docs/tasks/debug-application-cluster/troubleshooting/
I created a busybox pod for testing
I created the hostnames deployment successfully
I exposed the hostnames deployment successfully
From the busybox pod, I accessed the hostnames service by its cluster IP successfully
From the node, I accessed the hostnames service by its cluster IP successfully
So in short, I created the hostnames service which had a cluster IP in 10.96.0.0/12 space (as expected), and it works, but for some reason, pods cannot access the apiserver's cluster IP of 10.96.0.1, though from the node I can access 10.96.0.1:
# wget --no-check-certificate https://10.96.0.1/hello
--2018-11-14 21:44:25-- https://10.96.0.1/hello
Connecting to 10.96.0.1:443... connected.
WARNING: cannot verify 10.96.0.1's certificate, issued by ‘CN=kubernetes’:
Unable to locally verify the issuer's authority.
HTTP request sent, awaiting response... 403 Forbidden
2018-11-14 21:44:25 ERROR 403: Forbidden.
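To compare the node's view with a pod's view, the same TCP connect can be attempted from inside a pod and the failure mode classified. This is a minimal sketch under assumptions: it needs a test pod whose image ships Python (the busybox image does not), and the function name probe is made up for illustration. It separates a timeout (packets silently dropped, as in the coredns log) from a refused connection or a missing route.

```python
import errno
import socket

def probe(host, port, timeout=3.0):
    """Attempt one TCP connect and classify the outcome as a short string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except socket.timeout:
        # No answer at all within the timeout, comparable to "i/o timeout".
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing accepts connections on that port.
        return "refused"
    except OSError as exc:
        if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
            return "no route"
        return "error: %s" % exc

# Example call (run once on the node and once inside a pod):
# print(probe("10.96.0.1", 443))
```

If the node reports "connected" and the pod reports "timeout", the difference lies somewhere on the path from the pod network to the service IP rather than in the apiserver itself.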
Some other things I checked, based on advice from others who reported a similar problem:
# sysctl net.ipv4.conf.all.forwarding
net.ipv4.conf.all.forwarding = 1
# sysctl net.bridge.bridge-nf-call-iptables
net.bridge.bridge-nf-call-iptables = 1
# iptables-save | egrep ':INPUT|:OUTPUT|:POSTROUTING|:FORWARD'
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [11:692]
:POSTROUTING ACCEPT [11:692]
:INPUT ACCEPT [1697:364811]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [1652:363693]
# ls -l /usr/sbin/conntrack
-rwxr-xr-x 1 root root 65632 Jan 24 2016 /usr/sbin/conntrack
# systemctl status firewalld
● firewalld.service
Loaded: not-found (Reason: No such file or directory)
Active: inactive (dead)
I checked the log for kube-proxy, did not see any errors.
I also tried deleting coredns pods, apiserver pod; they are recreated (as expected), but the problem remains.
Here's a copy of the log from the weave container
# kubectl logs -n kube-system weave-net-bpx22 weave
DEBU: 2018/11/14 15:56:10.909921 [kube-peers] Checking peer "aa:53:be:75:71:f7" against list &{[]}
Peer not in list; removing persisted data
INFO: 2018/11/14 15:56:11.041807 Command line options: map[name:aa:53:be:75:71:f7 nickname:mtpnjvzonap001 ipalloc-init:consensus=1 ipalloc-range:10.32.0.0/12 db-prefix:/weavedb/weave-net docker-api: expect-npc:true host-root:/host http-addr:127.0.0.1:6784 metrics-addr:0.0.0.0:6782 conn-limit:100 datapath:datapath no-dns:true port:6783]
INFO: 2018/11/14 15:56:11.042230 weave 2.5.0
INFO: 2018/11/14 15:56:11.198348 Bridge type is bridged_fastdp
INFO: 2018/11/14 15:56:11.198372 Communication between peers is unencrypted.
INFO: 2018/11/14 15:56:11.203206 Our name is aa:53:be:75:71:f7(mtpnjvzonap001)
INFO: 2018/11/14 15:56:11.203249 Launch detected - using supplied peer list: [135.21.27.139]
INFO: 2018/11/14 15:56:11.216398 Checking for pre-existing addresses on weave bridge
INFO: 2018/11/14 15:56:11.229313 [allocator aa:53:be:75:71:f7] No valid persisted data
INFO: 2018/11/14 15:56:11.233391 [allocator aa:53:be:75:71:f7] Initialising via deferred consensus
INFO: 2018/11/14 15:56:11.233443 Sniffing traffic on datapath (via ODP)
INFO: 2018/11/14 15:56:11.234120 ->[135.21.27.139:6783] attempting connection
INFO: 2018/11/14 15:56:11.234302 ->[135.21.27.139:49182] connection accepted
INFO: 2018/11/14 15:56:11.234818 ->[135.21.27.139:6783|aa:53:be:75:71:f7(mtpnjvzonap001)]: connection shutting down due to error: cannot connect to ourself
INFO: 2018/11/14 15:56:11.234843 ->[135.21.27.139:49182|aa:53:be:75:71:f7(mtpnjvzonap001)]: connection shutting down due to error: cannot connect to ourself
INFO: 2018/11/14 15:56:11.236010 Listening for HTTP control messages on 127.0.0.1:6784
INFO: 2018/11/14 15:56:11.236424 Listening for metrics requests on 0.0.0.0:6782
INFO: 2018/11/14 15:56:11.990529 [kube-peers] Added myself to peer list &{[{aa:53:be:75:71:f7 mtpnjvzonap001}]}
DEBU: 2018/11/14 15:56:11.995901 [kube-peers] Nodes that have disappeared: map[]
10.32.0.1
135.21.27.139
DEBU: 2018/11/14 15:56:12.075738 registering for updates for node delete events
INFO: 2018/11/14 15:56:41.279799 Error checking version: Get https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=4.4.0-135-generic&flag_kubernetes-cluster-size=1&flag_kubernetes-cluster-uid=ce66cb23-e825-11e8-abc3-525400314503&flag_kubernetes-version=v1.11.4&os=linux&signature=EJdydeNysrC7LC5xAJAKyDvxXCvkeWUFzepdk3QDfr0%3D&version=2.5.0: dial tcp 74.125.196.121:443: i/o timeout
INFO: 2018/11/14 20:52:47.025412 Error checking version: Get https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=4.4.0-135-generic&flag_kubernetes-cluster-size=1&flag_kubernetes-cluster-uid=ce66cb23-e825-11e8-abc3-525400314503&flag_kubernetes-version=v1.11.4&os=linux&signature=EJdydeNysrC7LC5xAJAKyDvxXCvkeWUFzepdk3QDfr0%3D&version=2.5.0: dial tcp 74.125.196.121:443: i/o timeout
INFO: 2018/11/15 01:46:32.842792 Error checking version: Get https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=4.4.0-135-generic&flag_kubernetes-cluster-size=1&flag_kubernetes-cluster-uid=ce66cb23-e825-11e8-abc3-525400314503&flag_kubernetes-version=v1.11.4&os=linux&signature=EJdydeNysrC7LC5xAJAKyDvxXCvkeWUFzepdk3QDfr0%3D&version=2.5.0: dial tcp 74.125.196.121:443: i/o timeout
INFO: 2018/11/15 09:06:03.624359 Error checking version: Get https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=4.4.0-135-generic&flag_kubernetes-cluster-size=1&flag_kubernetes-cluster-uid=ce66cb23-e825-11e8-abc3-525400314503&flag_kubernetes-version=v1.11.4&os=linux&signature=EJdydeNysrC7LC5xAJAKyDvxXCvkeWUFzepdk3QDfr0%3D&version=2.5.0: dial tcp 172.217.9.147:443: i/o timeout
INFO: 2018/11/15 14:34:02.070893 Error checking version: Get https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=4.4.0-135-generic&flag_kubernetes-cluster-size=1&flag_kubernetes-cluster-uid=ce66cb23-e825-11e8-abc3-525400314503&flag_kubernetes-version=v1.11.4&os=linux&signature=EJdydeNysrC7LC5xAJAKyDvxXCvkeWUFzepdk3QDfr0%3D&version=2.5.0: dial tcp 172.217.9.147:443: i/o timeout
Here are the events for the 2 coredns pods
# kubectl get events -n kube-system --field-selector involvedObject.name=coredns-78fcdf6894-6f9q6
LAST SEEN FIRST SEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
41m 20h 245 coredns-78fcdf6894-6f9q6.1568eab25f0acb02 Pod spec.containers{coredns} Normal Killing kubelet, mtpnjvzonap001 Killing container with id docker://coredns:Container failed liveness probe.. Container will be killed and recreated.
26m 20h 248 coredns-78fcdf6894-6f9q6.1568ea920f72ddd4 Pod spec.containers{coredns} Normal Pulled kubelet, mtpnjvzonap001 Container image "k8s.gcr.io/coredns:1.1.3" already present on machine
5m 20h 1256 coredns-78fcdf6894-6f9q6.1568eaa1fd9216d2 Pod spec.containers{coredns} Warning Unhealthy kubelet, mtpnjvzonap001 Liveness probe failed: HTTP probe failed with statuscode: 503
1m 19h 2963 coredns-78fcdf6894-6f9q6.1568eb75f2b1af3e Pod spec.containers{coredns} Warning BackOff kubelet, mtpnjvzonap001 Back-off restarting failed container
# kubectl get events -n kube-system --field-selector involvedObject.name=coredns-78fcdf6894-skjwz
LAST SEEN FIRST SEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
6m 20h 1259 coredns-78fcdf6894-skjwz.1568eaa181fbeffe Pod spec.containers{coredns} Warning Unhealthy kubelet, mtpnjvzonap001 Liveness probe failed: HTTP probe failed with statuscode: 503
1m 19h 2969 coredns-78fcdf6894-skjwz.1568eb7578188f24 Pod spec.containers{coredns} Warning BackOff kubelet, mtpnjvzonap001 Back-off restarting failed container
#
Any help or further troubleshooting steps are welcome

I had the same problem and needed to allow several ports in my firewall: 22, 53, 6443, 6783, 6784, 8285.
I copied the rules from an existing healthy cluster. Probably only 6443, shown above as the target port of the kubernetes (apiserver) service, is required for this error; the others are for other services I run in my cluster.
On Ubuntu I did this with ufw (Uncomplicated Firewall):
ufw allow 22/tcp # allowed for ssh, included in case you had firewall disabled altogether
ufw allow 6443
ufw allow 53
ufw allow 8285
ufw allow 6783
ufw allow 6784

Related

CoreDNS pods stuck in ContainerCreating - Kubernetes

I am still new to Kubernetes and I was trying to set up a cluster on bare metal servers according to the official documentation.
Right now I am running a configuration with one worker and one master node, but I am struggling to get all the pods running once the cluster initializes. The main problem is the coredns pods, which are stuck in the ContainerCreating state.
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-78fcd69978-4vtsp 0/1 ContainerCreating 0 5s
kube-system coredns-78fcd69978-wtn2c 0/1 ContainerCreating 0 12h
kube-system etcd-dcpoth24213118 1/1 Running 4 12h
kube-system kube-apiserver-dcpoth24213118 1/1 Running 0 12h
kube-system kube-controller-manager-dcpoth24213118 1/1 Running 0 12h
kube-system kube-proxy-8282p 1/1 Running 0 12h
kube-system kube-scheduler-dcpoth24213118 1/1 Running 0 12h
kube-system weave-net-6zz2j 2/2 Running 0 12h
After checking the logs I noticed this error. The problem is that I don't really know what the error is referring to.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 19s default-scheduler Successfully assigned kube-system/coredns-78fcd69978-4vtsp to dcpoth24213118
Warning FailedCreatePodSandBox 13s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "2521c9dd723f3fc50b3510791a8c35cbc9ec19768468eb3da3367274a4dfcbba" network for pod "coredns-78fcd69978-4vtsp": networkPlugin cni failed to set up pod "coredns-78fcd69978-4vtsp_kube-system" network: error getting ClusterInformation: Get "https://[10.43.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default": dial tcp 10.43.0.1:443: connect: no route to host, failed to clean up sandbox container "2521c9dd723f3fc50b3510791a8c35cbc9ec19768468eb3da3367274a4dfcbba" network for pod "coredns-78fcd69978-4vtsp": networkPlugin cni failed to teardown pod "coredns-78fcd69978-4vtsp_kube-system" network: error getting ClusterInformation: Get "https://[10.43.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default": dial tcp 10.43.0.1:443: connect: no route to host]
Normal SandboxChanged 10s (x2 over 12s) kubelet Pod sandbox changed, it will be killed and re-created.
I'm running the Kubernetes cluster behind a corporate proxy. I've set the environment variables as follows.
export https_proxy=http://proxyIP:PORT
export http_proxy=http://proxyIP:PORT
export HTTP_PROXY="${http_proxy}"
export HTTPS_PROXY="${https_proxy}"
export NO_PROXY=localhost,127.0.0.1,master_node_IP,worker_node_IP,10.0.0.0/8,10.96.0.0/16
[root@dcpoth24213118 ~]# kubectl get svc -A
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 12h
kube-system kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 12h
[root@dcpoth24213118 ~]# ip r s
default via 6.48.248.129 dev eth1
6.48.248.128/26 dev eth1 proto kernel scope link src 6.48.248.145
10.32.0.0/12 dev weave proto kernel scope link src 10.32.0.1
10.155.0.0/24 via 6.48.248.129 dev eth1
10.228.0.0/24 via 6.48.248.129 dev eth1
10.229.0.0/24 via 6.48.248.129 dev eth1
10.250.0.0/24 via 6.48.248.129 dev eth1
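One cross-check that can be made from the outputs above: the event names a dial target, and that target can be compared with the cluster IPs that kubectl get svc -A lists. The sketch below is only illustrative (the helper name dial_targets is hypothetical). It shows that the event refers to 10.43.0.1 and to a Calico API path, while the listed services sit at 10.96.0.1 and 10.96.0.10 and the installed plugin is weave. That mismatch may point to a CNI configuration left over from another setup, but that reading is an assumption, not something the outputs prove.

```python
import re

def dial_targets(message):
    """Return the distinct 'ip:port' targets that follow 'dial tcp' in a message."""
    return sorted(set(re.findall(r"dial tcp (\d+\.\d+\.\d+\.\d+:\d+)", message)))

# Shortened copy of the FailedCreatePodSandBox event text from above.
event = ('error getting ClusterInformation: Get '
         '"https://[10.43.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default": '
         'dial tcp 10.43.0.1:443: connect: no route to host')

# CLUSTER-IP column of the `kubectl get svc -A` output above.
service_ips = {"10.96.0.1", "10.96.0.10"}

for target in dial_targets(event):
    ip = target.rsplit(":", 1)[0]
    status = "known service IP" if ip in service_ips else "not a listed service IP"
    print(target, status)
```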
I have the weave network plugin installed. The issue is that I cannot create any other pods; they all get stuck in the ContainerCreating state.
I've run out of ideas for how to fix it. Can someone give me a hint?

Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container

We are trying to create a pod, but its status has been stuck at ContainerCreating for a long time.
This is the output we got after running kubectl describe pod:
Name: demo-6c59fb8f77-9x6sr
Namespace: default
Priority: 0
Node: k8-slave2/10.0.0.5
Start Time: Wed, 23 Dec 2020 10:16:23 +0000
Labels: app=demo
pod-template-hash=6c59fb8f77
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/demo-6c59fb8f77
Containers:
private-docker-registry:
Container ID:
Image: private-docker-registry:5000/mahin/mof-docker-demo:v1
Image ID:
Port: <none>
Host Port: <none>
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-p94zw (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-p94zw:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-p94zw
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 10m default-scheduler Successfully assigned default/demo-6c59fb8f77-9x6sr to k8-slave2
Warning FailedCreatePodSandBox 10m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "8eee497a2176c7f5782222f804cc63a4abac7f4a2fc7813016793857ae1b1dff" network for pod "demo-6c59fb8f77-9x6sr": networkPlugin cni failed to set up pod "demo-6c59fb8f77-9x6sr_default" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 10m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "95e72bfc6f6c13de7f5c96eb76b012c2e6639ca03f4c2f270b23ed1a09b90413" network for pod "demo-6c59fb8f77-9x6sr": networkPlugin cni failed to set up pod "demo-6c59fb8f77-9x6sr_default" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 10m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "566370012e4a1d32af2ef9035ff64d743cd81f36f25d2724e7b033e393b8247e" network for pod "demo-6c59fb8f77-9x6sr": networkPlugin cni failed to set up pod "demo-6c59fb8f77-9x6sr_default" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 10m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "7d499e40f572cfc29ecfb44f8376493df56a44213b1c1e9333b65499a0c288cd" network for pod "demo-6c59fb8f77-9x6sr": networkPlugin cni failed to set up pod "demo-6c59fb8f77-9x6sr_default" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 10m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "53241e64de1e4470712b4061e2c82f44916d654bc532f8f1d12e5d5d4e136914" network for pod "demo-6c59fb8f77-9x6sr": networkPlugin cni failed to set up pod "demo-6c59fb8f77-9x6sr_default" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 10m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "fd168faab4546f988dc38fc56df2f71cf80c922e86d3f869be15a43f08328f99" network for pod "demo-6c59fb8f77-9x6sr": networkPlugin cni failed to set up pod "demo-6c59fb8f77-9x6sr_default" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 10m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "e578afe329abb0cba64802dfa480e00f2bbbb8c80be537791c24a31c853eb62f" network for pod "demo-6c59fb8f77-9x6sr": networkPlugin cni failed to set up pod "demo-6c59fb8f77-9x6sr_default" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 10m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "a3cb32dba55907ca907fc4f38f7ca05ef6db10a6af2dd1fa3c4db166e4ab9ffe" network for pod "demo-6c59fb8f77-9x6sr": networkPlugin cni failed to set up pod "demo-6c59fb8f77-9x6sr_default" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 10m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "7e4368ba8ec460b3c94de24ab0a04b6c799eb28df885cbbacfc3bb3ffa8c1e67" network for pod "demo-6c59fb8f77-9x6sr": networkPlugin cni failed to set up pod "demo-6c59fb8f77-9x6sr_default" network: open /run/flannel/subnet.env: no such file or directory
Warning FailedCreatePodSandBox 10m (x4 over 10m) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "c4aaa8f8cd2dc1eff788baf04774c4ecc845568d00ed1b386df311ec224eb6f3" network for pod "demo-6c59fb8f77-9x6sr": networkPlugin cni failed to set up pod "demo-6c59fb8f77-9x6sr_default" network: open /run/flannel/subnet.env: no such file or directory
Normal SandboxChanged 56s (x551 over 10m) kubelet Pod sandbox changed, it will be killed and re-created.
azureuser@k8-master:~$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default demo-6c59fb8f77-2jq6k 0/1 ContainerCreating 0 5m23s
kube-system coredns-f9fd979d6-q8s9b 1/1 Running 2 27h
kube-system coredns-f9fd979d6-qnm4j 1/1 Running 2 27h
kube-system etcd-k8-master 1/1 Running 2 27h
kube-system kube-apiserver-k8-master 1/1 Running 3 27h
kube-system kube-controller-manager-k8-master 1/1 Running 3 27h
kube-system kube-flannel-ds-kqz4t 0/1 CrashLoopBackOff 92 27h
kube-system kube-flannel-ds-szqzn 1/1 Running 3 27h
kube-system kube-flannel-ds-v9q47 0/1 CrashLoopBackOff 142 27h
kube-system kube-proxy-4mb47 1/1 Running 2 27h
kube-system kube-proxy-54m9b 1/1 Running 2 27h
kube-system kube-proxy-wdxfz 1/1 Running 1 27h
kube-system kube-scheduler-k8-master 1/1 Running 3 27h
kubernetes-dashboard dashboard-metrics-scraper-7b59f7d4df-zmlvs 0/1 ContainerCreating 0 27h
kubernetes-dashboard kubernetes-dashboard-665f4c5ff-cnsvn 0/1 ContainerCreating 0 6h3m
To fix the flannel CrashLoopBackOff we ran kubeadm reset, but after some time the problem showed up again.
Currently we are working with one master and two worker nodes.
My cluster details are as follows:
azureuser@k8-master:~$ kubectl config view
apiVersion: v1
clusters:
- cluster:
certificate-authority-data: DATA+OMITTED
server: https://52.150.11.168:6443
name: kubernetes
contexts:
- context:
cluster: kubernetes
user: kubernetes-admin
name: kubernetes-admin@kubernetes
current-context: kubernetes-admin@kubernetes
kind: Config
preferences: {}
users:
- name: kubernetes-admin
user:
client-certificate-data: REDACTED
client-key-data: REDACTED
Docker version:
azureuser@k8-master:~$ sudo docker version
[sudo] password for azureuser:
Client:
Version: 19.03.6
API version: 1.40
Go version: go1.12.17
Git commit: 369ce74a3c
Built: Wed Oct 14 19:00:27 2020
OS/Arch: linux/amd64
Experimental: false
Server:
Engine:
Version: 19.03.6
API version: 1.40 (minimum version 1.12)
Go version: go1.12.17
Git commit: 369ce74a3c
Built: Wed Oct 14 16:52:50 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.3.3-0ubuntu1~18.04.2
GitCommit:
runc:
Version: spec: 1.0.1-dev
GitCommit:
docker-init:
Version: 0.18.0
GitCommit:
kubeadm version:
azureuser@k8-master:~$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.4", GitCommit:"d360454c9bcd1634cf4cc52d1867af5491dc9c5f", GitTreeState:"clean", BuildDate:"2020-11-11T13:15:05Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
Flannel crashes whenever I try to schedule a pod.
Background
I think your issue is caused by the CrashLoopBackOff status of your 2 Flannel CNI pods.
Your error
Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "8eee497a2176c7f5782222f804cc63a4abac7f4a2fc7813016793857ae1b1dff" network for pod "demo-6c59fb8f77-9x6sr": networkPlugin cni failed to set up pod "demo-6c59fb8f77-9x6sr_default" network: open /run/flannel/subnet.env: no such file or directory
indicates that the pod cannot be created because the /run/flannel/subnet.env file is missing.
In the Flannel GitHub documentation you can find:
Flannel runs a small, single binary agent called flanneld on each host, and is responsible for allocating a subnet lease to each host out of a larger, preconfigured address space.
This means that, to work properly, a Flannel pod should be running on each node, as it holds that node's subnet information. From your output I can see that only 1 out of 3 Flannel pods is working properly.
NAMESPACE NAME READY STATUS RESTARTS AGE
...
kube-system kube-flannel-ds-kqz4t 0/1 CrashLoopBackOff 92 27h
kube-system kube-flannel-ds-szqzn 1/1 Running 3 27h
kube-system kube-flannel-ds-v9q47 0/1 CrashLoopBackOff 142 27h
If the pod in question was scheduled on a node where the flannel pod is not working, it won't be created because of CNI network issues. Besides your demo pod, the kubernetes-dashboard pods have the same issue with the ContainerCreating status.
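To see on a given node whether flannel has written its subnet file, the file can simply be read and its keys inspected. The following is a minimal Python sketch under assumptions: the function name read_subnet_env is hypothetical, and the key names in the comment (FLANNEL_NETWORK, FLANNEL_SUBNET, FLANNEL_MTU, FLANNEL_IPMASQ) are what flanneld typically writes, so they should be checked against a healthy node rather than taken as a specification.

```python
import os

def read_subnet_env(path="/run/flannel/subnet.env"):
    """Return the KEY=VALUE pairs from flannel's subnet file, or None if it is missing."""
    if not os.path.exists(path):
        return None
    values = {}
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            # Skip blank lines, comments and anything that is not KEY=VALUE.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    return values

# Typical keys (assumption): FLANNEL_NETWORK, FLANNEL_SUBNET, FLANNEL_MTU, FLANNEL_IPMASQ
# Example call on a node:
# print(read_subnet_env() or "subnet.env is missing on this node")
```

A result of None on a node corresponds to the "no such file or directory" part of the event and suggests that flanneld never completed its startup there.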
Conclusion
Your demo pod cannot be created because Kubernetes encounters network issues related to the flannel configuration file (...network: open /run/flannel/subnet.env: no such file or directory).
The restart counts of your flannel pods are very high for 27 hours of uptime. You have to determine why and fix it. It might be a lack of resources, network issues in your infrastructure, or many other reasons. Once all flannel pods are working correctly, you shouldn't encounter this error.
Solution
You have to make the flannel pods work correctly on each node.
Additional Troubleshooting Details
For a detailed investigation, please provide:
$ kubectl describe pod kube-flannel-ds-kqz4t -n kube-system
$ kubectl describe pod kube-flannel-ds-v9q47 -n kube-system
The logs would also be helpful:
$ kubectl logs kube-flannel-ds-kqz4t -n kube-system
$ kubectl logs kube-flannel-ds-v9q47 -n kube-system
Please also replace the output of kubectl get pods --all-namespaces with that of kubectl get pods -o wide -A, and add the output of kubectl get nodes -o wide.
If you provide this information, it should be possible to determine the root cause of the flannel pod issues, and I will edit this answer with the exact solution.

Kubernetes cluster on bare metal by kubeadm

I'm trying to create a single control-plane cluster with kubeadm on 3 bare metal nodes (1 master and 2 workers) running Debian 10 with Docker as the container runtime. Each node has an external IP and an internal IP.
I want to configure the cluster on the internal network and have it accessible from the Internet.
I used this command for that (please correct me if something is wrong):
kubeadm init --control-plane-endpoint=10.10.0.1 --apiserver-cert-extra-sans={public_DNS_name},10.10.0.1 --pod-network-cidr=192.168.0.0/16
I got:
kubectl get no -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
dev-k8s-master-0.public.dns Ready master 16h v1.18.2 10.10.0.1 <none> Debian GNU/Linux 10 (buster) 4.19.0-8-amd64 docker://19.3.8
The init phase completed successfully and the cluster is accessible from the Internet. All pods are up and running except coredns, which should start running once networking is applied.
kubectl apply -f https://docs.projectcalico.org/v3.11/manifests/calico.yaml
After networking was applied, the coredns pods are still not ready:
kubectl get po -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-75d56dfc47-g8g9g 0/1 CrashLoopBackOff 192 16h
kube-system calico-node-22gtx 1/1 Running 0 16h
kube-system coredns-66bff467f8-87vd8 0/1 Running 0 16h
kube-system coredns-66bff467f8-mv8d9 0/1 Running 0 16h
kube-system etcd-dev-k8s-master-0 1/1 Running 0 16h
kube-system kube-apiserver-dev-k8s-master-0 1/1 Running 0 16h
kube-system kube-controller-manager-dev-k8s-master-0 1/1 Running 0 16h
kube-system kube-proxy-lp6b8 1/1 Running 0 16h
kube-system kube-scheduler-dev-k8s-master-0 1/1 Running 0 16h
Some logs from failed pods:
kubectl -n kube-system logs calico-kube-controllers-75d56dfc47-g8g9g
2020-04-22 08:24:55.853 [INFO][1] main.go 88: Loaded configuration from environment config=&config.Config{LogLevel:"info", ReconcilerPeriod:"5m", CompactionPeriod:"10m", EnabledControllers:"node", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", HealthEnabled:true, SyncNodeLabels:true, DatastoreType:"kubernetes"}
2020-04-22 08:24:55.855 [INFO][1] k8s.go 228: Using Calico IPAM
W0422 08:24:55.855525 1 client_config.go:541] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2020-04-22 08:24:55.856 [INFO][1] main.go 109: Ensuring Calico datastore is initialized
2020-04-22 08:25:05.857 [ERROR][1] client.go 255: Error getting cluster information config ClusterInformation="default" error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-04-22 08:25:05.857 [FATAL][1] main.go 114: Failed to initialize Calico datastore error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
coredns:
[INFO] plugin/ready: Still waiting on: "kubernetes"
I0422 08:29:12.275344 1 trace.go:116] Trace[1050055850]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105 (started: 2020-04-22 08:28:42.274382393 +0000 UTC m=+59491.429700922) (total time: 30.000897581s):
Trace[1050055850]: [30.000897581s] [30.000897581s] END
E0422 08:29:12.275388 1 reflector.go:153] pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
I0422 08:29:12.276163 1 trace.go:116] Trace[188478428]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105 (started: 2020-04-22 08:28:42.275499997 +0000 UTC m=+59491.430818380) (total time: 30.000606394s):
Trace[188478428]: [30.000606394s] [30.000606394s] END
E0422 08:29:12.276198 1 reflector.go:153] pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
I0422 08:29:12.277424 1 trace.go:116] Trace[16697023]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105 (started: 2020-04-22 08:28:42.276675998 +0000 UTC m=+59491.431994406) (total time: 30.000689778s):
Trace[16697023]: [30.000689778s] [30.000689778s] END
E0422 08:29:12.277452 1 reflector.go:153] pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/ready: Still waiting on: "kubernetes"
Any thoughts what's wrong?
This answer is to call attention to @florin's suggestion:
I've seen a similar behavior when I had multiple public interfaces on the node and calico selected the wrong one.
What I did is to set IP_AUTODETECTION_METHOD in the calico config.
From the Calico configuration reference on IP_AUTODETECTION_METHOD:
The method to use to autodetect the IPv4 address for this host. This is only used when the IPv4 address is being autodetected. See IP Autodetection methods for details of the valid methods.
Learn more here: https://docs.projectcalico.org/reference/node/configuration#ip-autodetection-methods
I was facing the same problem, and the following worked for me. Try this on your master node:
$ sudo iptables -P INPUT ACCEPT
$ sudo iptables -P FORWARD ACCEPT
$ sudo iptables -F

Kubernetes dashboard: Get https://10.96.0.1:443/version: dial tcp 10.96.0.1:443: i/o timeout

I have a Kubernetes cluster in vagrant (1.14.0) and installed calico.
I have installed the kubernetes dashboard. When I use kubectl proxy to visit the dashboard:
Error: 'dial tcp 192.168.1.4:8443: connect: connection refused'
Trying to reach: 'https://192.168.1.4:8443/'
Here are my pods (dashboard is restarting frequently):
$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
calico-etcd-cj928 1/1 Running 0 11m
calico-node-4fnb6 1/1 Running 0 18m
calico-node-qjv7t 1/1 Running 0 20m
calico-policy-controller-b9b6749c6-29c44 1/1 Running 1 11m
coredns-fb8b8dccf-jjbhk 1/1 Running 0 20m
coredns-fb8b8dccf-jrc2l 1/1 Running 0 20m
etcd-k8s-master 1/1 Running 0 19m
kube-apiserver-k8s-master 1/1 Running 0 19m
kube-controller-manager-k8s-master 1/1 Running 0 19m
kube-proxy-8mrrr 1/1 Running 0 18m
kube-proxy-cdsr9 1/1 Running 0 20m
kube-scheduler-k8s-master 1/1 Running 0 19m
kubernetes-dashboard-5f7b999d65-nnztw 1/1 Running 3 2m11s
Logs of the dashboard pod:
2019/03/30 14:36:21 Error while initializing connection to Kubernetes apiserver. This most likely means that the cluster is misconfigured (e.g., it has invalid apiserver certificates or service account's configuration) or the --apiserver-host param points to a server that does not exist. Reason: Get https://10.96.0.1:443/version: dial tcp 10.96.0.1:443: i/o timeout
Refer to our FAQ and wiki pages for more information: https://github.com/kubernetes/dashboard/wiki/FAQ
I can telnet from both master and nodes to 10.96.0.1:443.
What is configured wrongly? The rest of the cluster seems to work fine, although I see this log in the kubelet:
failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/lib/kubelet/config.yaml"
kubelet seems to run fine on the master.
The cluster was created with this command:
kubeadm init --apiserver-advertise-address="192.168.50.10" --apiserver-cert-extra-sans="192.168.50.10" --node-name k8s-master --pod-network-cidr=192.168.0.0/16
You should define your hostname in /etc/hosts:
# hostname
YOUR_HOSTNAME
# nano /etc/hosts
YOUR_IP YOUR_HOSTNAME
If you have set the hostname on your master and it still does not work, try:
# systemctl stop kubelet
# systemctl stop docker
# iptables --flush
# iptables -t nat --flush
# systemctl start kubelet
# systemctl start docker
Also, install the dashboard before joining the worker nodes, disable your firewall, and check that the node has enough free RAM.
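The /etc/hosts step in this answer can be verified with a small check. This is only a sketch: the helper name hosts_has_entry is hypothetical, and it simply looks for the node name among the names listed on any non-comment line of the file.

```shell
# Hypothetical helper: succeeds if the hosts file ($1) maps some address
# to the host name ($2). Comments (everything after '#') are ignored.
hosts_has_entry() {
  awk -v name="$2" '
    { sub(/#.*/, "") }
    { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
    END { exit found ? 0 : 1 }
  ' "$1"
}

# uname -n prints the node's host name, the same value `hostname` reports.
if hosts_has_entry /etc/hosts "$(uname -n)"; then
  echo "host name is listed in /etc/hosts"
else
  echo "host name is missing from /etc/hosts"
fi
```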
Exclude the --node-name parameter from the kubeadm init command.
Try this command:
kubeadm init --apiserver-advertise-address=$(hostname -i) --apiserver-cert-extra-sans="192.168.50.10" --pod-network-cidr=192.168.0.0/16
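One caveat with the command above: hostname -i can return several addresses, or a loopback address such as 127.0.1.1 on Debian and Ubuntu hosts, and neither is usable as an advertise address. The sketch below guards against that. The helper name usable_advertise_addr is hypothetical.

```shell
# Hypothetical helper: succeeds only if $1 is a single, non-loopback address.
usable_advertise_addr() {
  case "$1" in
    ''|*' '*)  return 1 ;;  # empty, or several space-separated addresses
    127.*|::1) return 1 ;;  # loopback
    *)         return 0 ;;
  esac
}

# Fall back to an empty string if hostname -i is unavailable or fails.
addr="$(hostname -i 2>/dev/null || true)"
if usable_advertise_addr "$addr"; then
  echo "advertise address looks usable: $addr"
else
  echo "hostname -i returned '$addr'; pass the node IP explicitly instead" >&2
fi
```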
For me, the issue was that I needed to create a NetworkPolicy that allowed egress traffic to the Kubernetes API.
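A policy of that kind might look like the sketch below, which prints a manifest allowing egress from every pod in a namespace to the API server. All values are assumptions for illustration: the policy name allow-apiserver-egress, the kube-system namespace, the address 192.168.50.10 (the advertise address from the question), and port 6443 (the kubeadm default). The helper name api_egress_policy is hypothetical.

```shell
# Hypothetical helper: prints a NetworkPolicy that allows egress from all pods
# in namespace $1 to the API server at IP $2 on TCP port $3.
api_egress_policy() {
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-apiserver-egress
  namespace: $1
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: $2/32
    ports:
    - protocol: TCP
      port: $3
EOF
}

# Assumed values; adjust them to your cluster.
api_egress_policy kube-system 192.168.50.10 6443
```

The output can be piped into kubectl apply -f - once it has been reviewed. Depending on the network plugin, the policy may be evaluated against the Service address (10.96.0.1:443) rather than the API server's own address, so that destination may need to be allowed instead or as well.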

Setting up Kubernetes - API not reachable from Pods

I'm trying to set up a basic Kubernetes cluster on an Ubuntu 16 VM. I've just followed the getting started docs and would expect a working cluster, but unfortunately, no such luck: pods can't seem to connect to the Kubernetes API. Since I'm new to Kubernetes, it is very tough for me to find where things are going wrong.
Provision script:
apt-get update && apt-get upgrade -y
apt-get install -y apt-transport-https curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF
apt-get update
apt-get install -y kubelet kubeadm kubectl docker.io
apt-mark hold kubelet kubeadm kubectl
swapoff -a
sysctl net.bridge.bridge-nf-call-iptables=1
kubeadm init
mkdir -p /home/ubuntu/.kube
cp -i /etc/kubernetes/admin.conf /home/ubuntu/.kube/config
chown -R ubuntu:ubuntu /home/ubuntu/.kube
runuser -l ubuntu -c "kubectl apply -f \"https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')\""
runuser -l ubuntu -c "kubectl taint nodes --all node-role.kubernetes.io/master-"
Installation seems fine.
ubuntu@packer-Ubuntu-16:~$ kubectl get pods -o wide --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system coredns-86c58d9df4-lbp46 0/1 CrashLoopBackOff 7 18m 10.32.0.2 packer-ubuntu-16 <none> <none>
kube-system coredns-86c58d9df4-t8nnn 0/1 CrashLoopBackOff 7 18m 10.32.0.3 packer-ubuntu-16 <none> <none>
kube-system etcd-packer-ubuntu-16 1/1 Running 0 17m 145.100.100.100 packer-ubuntu-16 <none> <none>
kube-system kube-apiserver-packer-ubuntu-16 1/1 Running 0 18m 145.100.100.100 packer-ubuntu-16 <none> <none>
kube-system kube-controller-manager-packer-ubuntu-16 1/1 Running 0 17m 145.100.100.100 packer-ubuntu-16 <none> <none>
kube-system kube-proxy-dwhhf 1/1 Running 0 18m 145.100.100.100 packer-ubuntu-16 <none> <none>
kube-system kube-scheduler-packer-ubuntu-16 1/1 Running 0 17m 145.100.100.100 packer-ubuntu-16 <none> <none>
kube-system weave-net-sfvz5 2/2 Running 0 18m 145.100.100.100 packer-ubuntu-16 <none> <none>
Question: is it normal that the Kubernetes pods have the IP of the host's eth0 (145.100.100.100)? That seems weird to me; I would expect them to have a virtual IP.
As you can see, the coredns pods are crashing because they cannot reach the API.
This is, as I understand it, the service:
ubuntu@packer-Ubuntu-16:~$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 22m
CoreDNS is crashing because the API is unreachable:
ubuntu@packer-Ubuntu-16:~$ kubectl logs -n kube-system coredns-86c58d9df4-lbp46
.:53
2018-12-06T12:54:28.481Z [INFO] CoreDNS-1.2.6
2018-12-06T12:54:28.481Z [INFO] linux/amd64, go1.11.2, 756749c
CoreDNS-1.2.6
linux/amd64, go1.11.2, 756749c
[INFO] plugin/reload: Running configuration MD5 = f65c4821c8a9b7b5eb30fa4fbc167769
E1206 12:54:53.482269 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:318: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1206 12:54:53.482363 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:311: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1206 12:54:53.482540 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:313: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
I tried launching a simple alpine pod/container, and indeed 10.96.0.1 doesn't respond to pings or anything else.
I'm stuck here. I've tried to google a lot but nothing comes up and my understanding is pretty basic. I guess something's up with the networking, but I don't know what (for me it seems suspicious that when doing get pods, the pods show up with the host IP, but perhaps this is normal also?)
I found that the problem is caused by the host's iptables rules.