Kubernetes API container dies constantly

I just installed from scratch a small Kubernetes test cluster on 4 Armbian/Odroid_MC1 (Debian 10) nodes. The install process is this [1], nothing fancy or special: adding the k8s apt repo and installing with apt.
The problem is that the API server dies constantly, every 5 to 10 minutes, right after the controller-manager and the scheduler, which seem to stop simultaneously just before it. The API then becomes unusable for about a minute. All three services restart, and things run fine for the next four to nine minutes, until the loop repeats. Logs are here [2]. This is an excerpt:
$ kubectl get pods -o wide --all-namespaces
The connection to the server 192.168.1.91:6443 was refused - did you specify the right host or port?
(a minute later)
$ kubectl get pods -o wide --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system coredns-74ff55c5b-8pm9r 1/1 Running 2 88m 10.244.0.7 mc1 <none> <none>
kube-system coredns-74ff55c5b-pxdqz 1/1 Running 2 88m 10.244.0.6 mc1 <none> <none>
kube-system etcd-mc1 1/1 Running 2 88m 192.168.1.91 mc1 <none> <none>
kube-system kube-apiserver-mc1 0/1 Running 12 88m 192.168.1.91 mc1 <none> <none>
kube-system kube-controller-manager-mc1 1/1 Running 5 31m 192.168.1.91 mc1 <none> <none>
kube-system kube-flannel-ds-fxg2s 1/1 Running 5 45m 192.168.1.94 mc4 <none> <none>
kube-system kube-flannel-ds-jvvmp 1/1 Running 5 48m 192.168.1.92 mc2 <none> <none>
kube-system kube-flannel-ds-qlvbc 1/1 Running 6 45m 192.168.1.93 mc3 <none> <none>
kube-system kube-flannel-ds-ssb9t 1/1 Running 3 77m 192.168.1.91 mc1 <none> <none>
kube-system kube-proxy-7t9ff 1/1 Running 2 45m 192.168.1.93 mc3 <none> <none>
kube-system kube-proxy-8jhc7 1/1 Running 2 88m 192.168.1.91 mc1 <none> <none>
kube-system kube-proxy-cg75m 1/1 Running 2 45m 192.168.1.94 mc4 <none> <none>
kube-system kube-proxy-mq8j7 1/1 Running 2 48m 192.168.1.92 mc2 <none> <none>
kube-system kube-scheduler-mc1 1/1 Running 5 31m 192.168.1.91 mc1 <none> <none>
$ docker ps -a # (check the exited and restarted services)
CONTAINER ID NAMES STATUS IMAGE NETWORKS PORTS
0e179c6495db k8s_kube-apiserver_kube-apiserver-mc1_kube-system_c55114bd57b1bf357c8f4c0d749ae105_13 Up About a minute 66eaad223e2c
2ccb014beb73 k8s_kube-scheduler_kube-scheduler-mc1_kube-system_fe362b2b6b08ca576b7416df7f2e7845_6 Up 3 minutes 21e17680ca2d
3322f6ec1546 k8s_kube-controller-manager_kube-controller-manager-mc1_kube-system_17cf17caf36ba27e3d2ec4f113a0cf6f_6 Up 3 minutes a1ab72ce4ba2
583129da455f k8s_kube-apiserver_kube-apiserver-mc1_kube-system_c55114bd57b1bf357c8f4c0d749ae105_12 Exited (137) About a minute ago 66eaad223e2c
72268d8e1503 k8s_install-cni_kube-flannel-ds-ssb9t_kube-system_dbf3513d-dad2-462d-9107-4813acf9c23a_0 Exited (0) 5 minutes ago 263b01b3ca1f
fe013d07f186 k8s_kube-controller-manager_kube-controller-manager-mc1_kube-system_17cf17caf36ba27e3d2ec4f113a0cf6f_5 Exited (255) 3 minutes ago a1ab72ce4ba2
34ef8757b63d k8s_kube-scheduler_kube-scheduler-mc1_kube-system_fe362b2b6b08ca576b7416df7f2e7845_5 Exited (255) 3 minutes ago 21e17680ca2d
fd8e0c0ba27f k8s_coredns_coredns-74ff55c5b-8pm9r_kube-system_3b813dc9-827d-4cf6-88cc-027491b350f1_2 Up 32 minutes 15c1a66b013b
f44e2c45ed87 k8s_coredns_coredns-74ff55c5b-pxdqz_kube-system_c3b7fbf2-2064-4f3f-b1b2-dec5dad904b7_2 Up 32 minutes 15c1a66b013b
04fa4eca1240 k8s_POD_coredns-74ff55c5b-8pm9r_kube-system_3b813dc9-827d-4cf6-88cc-027491b350f1_42 Up 32 minutes k8s.gcr.io/pause:3.2 none
f00c36d6de75 k8s_POD_coredns-74ff55c5b-pxdqz_kube-system_c3b7fbf2-2064-4f3f-b1b2-dec5dad904b7_42 Up 32 minutes k8s.gcr.io/pause:3.2 none
a1d6814e1b04 k8s_kube-flannel_kube-flannel-ds-ssb9t_kube-system_dbf3513d-dad2-462d-9107-4813acf9c23a_3 Up 32 minutes 263b01b3ca1f
94b231456ed7 k8s_kube-proxy_kube-proxy-8jhc7_kube-system_cc637e27-3b14-41bd-9f04-c1779e500a3a_2 Up 33 minutes 377de0f45e5c
df91856450bd k8s_POD_kube-flannel-ds-ssb9t_kube-system_dbf3513d-dad2-462d-9107-4813acf9c23a_2 Up 34 minutes k8s.gcr.io/pause:3.2 host
b480b844671a k8s_POD_kube-proxy-8jhc7_kube-system_cc637e27-3b14-41bd-9f04-c1779e500a3a_2 Up 34 minutes k8s.gcr.io/pause:3.2 host
1d4a7bcaad38 k8s_etcd_etcd-mc1_kube-system_14b7b6d6446e21cc57f0b40571ae3958_2 Up 35 minutes 2e91dde7e952
e5d517a9c29d k8s_POD_kube-controller-manager-mc1_kube-system_17cf17caf36ba27e3d2ec4f113a0cf6f_1 Up 35 minutes k8s.gcr.io/pause:3.2 host
3a3da7dbf3ad k8s_POD_kube-apiserver-mc1_kube-system_c55114bd57b1bf357c8f4c0d749ae105_2 Up 35 minutes k8s.gcr.io/pause:3.2 host
eef29cdebf5f k8s_POD_etcd-mc1_kube-system_14b7b6d6446e21cc57f0b40571ae3958_2 Up 35 minutes k8s.gcr.io/pause:3.2 host
3631d43757bc k8s_POD_kube-scheduler-mc1_kube-system_fe362b2b6b08ca576b7416df7f2e7845_1 Up 35 minutes k8s.gcr.io/pause:3.2 host
I see no weird issues in the logs (I'm a k8s beginner). This was working until a month ago, when I reinstalled everything for practice; this is probably my tenth install attempt. I've tried different options and versions and googled a lot, but can't find a solution.
What could be the reason? What else can I try? How can I get to the root of the problem?
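Some commands that should help dig further (the exited container ID comes from the docker ps -a output above; exit code 137 means the container was SIGKILLed, e.g. by a failed liveness probe or the OOM killer):
$ docker logs 583129da455f # logs of the last exited apiserver container
$ docker inspect 583129da455f | grep -i oomkilled # "true" would point to memory pressure
$ journalctl -u kubelet --since "10 minutes ago" # the kubelet's view of the restarts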
UPDATE 2021/02/06
The problem is not occurring anymore. Apparently, the issue was the OS version in this specific case. I didn't file an issue because I couldn't find clues about what specific issue to report.
The installation procedure in all cases was this:
# swapoff -a
# curl -sL get.docker.com|sh
# usermod -aG docker rodolfoap
# curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
# echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" > /etc/apt/sources.list.d/kubernetes.list
# apt-get update
# apt-get install -y kubeadm kubectl kubectx # Master
# kubeadm config images pull
# kubeadm init --apiserver-advertise-address=0.0.0.0 --pod-network-cidr=10.244.0.0/16
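Not shown above, but implied by the flannel pods in the listings: the CNI manifest gets applied and the workers join with the command kubeadm init prints. Roughly (the token and hash are placeholders, and the flannel manifest URL is the one commonly used at the time; adjust to the version you deploy):
# mkdir -p $HOME/.kube && cp /etc/kubernetes/admin.conf $HOME/.kube/config
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
# kubeadm join 192.168.1.91:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash> # on each worker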
Armbian-20.08.1 worked fine. My installation procedure has not changed since.
Armbian-20.11.3 had the issue: the API server, scheduler and coredns restarted every 5 minutes, blocking access to the API roughly 5 of every 8 minutes on average.
Armbian-21.02.1 works fine. It worked on the first install, same procedure.
All versions were updated to the latest kernel at the moment of the install; the current one is 5.10.12-odroidxu4.
As you can see, after around two hours, no API restarts:
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE LABELS
kube-system coredns-74ff55c5b-gnvf2 1/1 Running 0 173m 10.244.0.2 mc1 k8s-app=kube-dns,pod-template-hash=74ff55c5b
kube-system coredns-74ff55c5b-wvnnz 1/1 Running 0 173m 10.244.0.3 mc1 k8s-app=kube-dns,pod-template-hash=74ff55c5b
kube-system etcd-mc1 1/1 Running 0 173m 192.168.1.91 mc1 component=etcd,tier=control-plane
kube-system kube-apiserver-mc1 1/1 Running 0 173m 192.168.1.91 mc1 component=kube-apiserver,tier=control-plane
kube-system kube-controller-manager-mc1 1/1 Running 0 173m 192.168.1.91 mc1 component=kube-controller-manager,tier=control-plane
kube-system kube-flannel-ds-c4jgv 1/1 Running 0 123m 192.168.1.93 mc3 app=flannel,controller-revision-hash=64465d999,pod-template-generation=1,tier=node
kube-system kube-flannel-ds-cl6n5 1/1 Running 0 75m 192.168.1.94 mc4 app=flannel,controller-revision-hash=64465d999,pod-template-generation=1,tier=node
kube-system kube-flannel-ds-z2nmw 1/1 Running 0 75m 192.168.1.92 mc2 app=flannel,controller-revision-hash=64465d999,pod-template-generation=1,tier=node
kube-system kube-flannel-ds-zqxh7 1/1 Running 0 150m 192.168.1.91 mc1 app=flannel,controller-revision-hash=64465d999,pod-template-generation=1,tier=node
kube-system kube-proxy-bd596 1/1 Running 0 75m 192.168.1.94 mc4 controller-revision-hash=b89db7f56,k8s-app=kube-proxy,pod-template-generation=1
kube-system kube-proxy-n6djp 1/1 Running 0 75m 192.168.1.92 mc2 controller-revision-hash=b89db7f56,k8s-app=kube-proxy,pod-template-generation=1
kube-system kube-proxy-rf4cr 1/1 Running 0 173m 192.168.1.91 mc1 controller-revision-hash=b89db7f56,k8s-app=kube-proxy,pod-template-generation=1
kube-system kube-proxy-xhl95 1/1 Running 0 123m 192.168.1.93 mc3 controller-revision-hash=b89db7f56,k8s-app=kube-proxy,pod-template-generation=1
kube-system kube-scheduler-mc1 1/1 Running 0 173m 192.168.1.91 mc1 component=kube-scheduler,tier=control-plane
Cluster is fully functional :)

I have the same problem, but with Ubuntu:
PRETTY_NAME="Ubuntu 22.04 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04 (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
The cluster works fine on:
Ubuntu 20.04 LTS
Ubuntu 18.04 LTS
Thought it might help someone else who is running Ubuntu instead of Armbian.
The solution for Ubuntu (possibly for Armbian too) is here: Issues with "stability" with Kubernetes cluster before adding networking
Apparently it is a problem with the containerd config on those versions.
UPDATE:
The problem is that if you use sudo apt install containerd, you will install version v1.5.9, which ships with the option SystemdCgroup = false. That default worked, in my case, on Ubuntu 20.04, but on Ubuntu 22.04 it doesn't. If you change it to SystemdCgroup = true, it works (containerd v1.6.2 changed this default to true). This will hopefully fix your problem too.
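A minimal sketch of applying that fix, assuming the stock config layout (regenerate the default config first if /etc/containerd/config.toml doesn't exist):
# containerd config default > /etc/containerd/config.toml
# sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
# systemctl restart containerd
# systemctl restart kubelet
Afterwards, the [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options] section of config.toml should read SystemdCgroup = true.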

Related

calico-kube-controller stays in pending state

I have a new install of Kubernetes 1.24.3 on Ubuntu 18 with Calico. The calico-kube-controllers pod will not start:
$ sudo kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-555bc4b957-z4q2p 0/1 Pending 0 5m14s
kube-system calico-node-jz2j7 1/1 Running 0 5m15s
kube-system coredns-6d4b75cb6d-hwfx9 1/1 Running 0 5m14s
kube-system coredns-6d4b75cb6d-wdh55 1/1 Running 0 5m14s
kube-system etcd-ubuntu-18-extssd 1/1 Running 1 5m27s
kube-system kube-apiserver-ubuntu-18-extssd 1/1 Running 1 5m28s
kube-system kube-controller-manager-ubuntu-18-extssd 1/1 Running 1 5m26s
kube-system kube-proxy-t5z2r 1/1 Running 0 5m15s
kube-system kube-scheduler-ubuntu-18-extssd 1/1 Running 1 5m27s
Someone suggested setting a couple of Calico timeouts to 60 seconds, but that didn't work either.
What could be causing the calico-controller to fail to start, especially since the calico-node is running?
Also, is there a more trouble-free CNI implementation to use? Calico seems very error-prone.
I solved this by installing Weave:
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
after initializing the cluster with this CIDR:
sudo kubeadm init --pod-network-cidr=192.168.0.0/16

There are 2 networking components installed on the master node, Weave and Calico. How can I completely remove Calico from my Kubernetes cluster?

Weave overlaps with the host's IP address and its pod is stuck in the CrashLoopBackOff state. Calico needs to be removed first, as I have no clue how two networking modules can work together on the master!
emo@master:~$ sudo kubectl get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-64897985d-dw6ch 0/1 ContainerCreating 0
kube-system coredns-64897985d-xr6br 0/1 ContainerCreating 0
kube-system etcd-master 1/1 Running 26 (14m ago)
kube-system kube-apiserver-master 1/1 Running 26 (12m ago)
kube-system kube-controller-manager-master 1/1 Running 4 (20m ago)
kube-system kube-proxy-g98ph 1/1 Running 3 (20m ago)
kube-system kube-scheduler-master 1/1 Running 4 (20m ago)
kube-system weave-net-56n8k 1/2 CrashLoopBackOff 76 (54s ago)
tigera-operator tigera-operator-b876f5799-sqzf9 1/1 Running 6 (5m57s ago)
master:
emo@master:~$ kubectl get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
master Ready control-plane,master 6d19h v1.23.5 192.168.71.132 <none> Ubuntu 20.04.3 LTS 5.4.0-81-generic containerd://1.5.5
You may need to re-build your cluster after cleaning it up.
First, run kubectl delete for all the manifests you have applied to configure Calico and Weave (e.g. kubectl delete -f https://projectcalico.docs.tigera.io/manifests/tigera-operator.yaml).
Then run kubeadm reset and delete everything under /etc/cni/net.d/ to remove all of your CNI configurations. After that, you also need to reboot the server to clear out stale ip link entries, or remove them manually with ip link delete {name}.
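A minimal sketch of that cleanup on the master, assuming default paths (interface names vary by CNI; check ip link show first):
# kubectl delete -f https://projectcalico.docs.tigera.io/manifests/tigera-operator.yaml
# kubeadm reset
# rm -rf /etc/cni/net.d/*
# ip link delete weave # if present
# ip link delete vxlan.calico # if present; Calico's overlay interface
# reboot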
Now the new installation should go well.

Kubernetes can't access pod in multi worker nodes

I was following a tutorial on YouTube and the guy said that if you deploy your application in a multi-node setup and your service is of type NodePort, you don't have to worry about which node your pod gets scheduled on. You can access it with any node's IP address, like
worker1IP:servicePort or worker2IP:servicePort or workerNIP:servicePort
But I just tried it, and this is not the case: I can only access the pod on the node where it is scheduled and deployed. Is this the correct behavior?
kubectl version --short
> Client Version: v1.18.5
> Server Version: v1.18.5
kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-66bff467f8-6pt8s 0/1 Running 288 7d22h
coredns-66bff467f8-t26x4 0/1 Running 288 7d22h
etcd-redhat-master 1/1 Running 16 7d22h
kube-apiserver-redhat-master 1/1 Running 17 7d22h
kube-controller-manager-redhat-master 1/1 Running 19 7d22h
kube-flannel-ds-amd64-9mh6k 1/1 Running 16 5d22h
kube-flannel-ds-amd64-g2k5c 1/1 Running 16 5d22h
kube-flannel-ds-amd64-rnvgb 1/1 Running 14 5d22h
kube-proxy-gf8zk 1/1 Running 16 7d22h
kube-proxy-wt7cp 1/1 Running 9 7d22h
kube-proxy-zbw4b 1/1 Running 9 7d22h
kube-scheduler-redhat-master 1/1 Running 18 7d22h
weave-net-6jjd8 2/2 Running 34 7d22h
weave-net-ssqbz 1/2 CrashLoopBackOff 296 7d22h
weave-net-ts2tj 2/2 Running 34 7d22h
[root@redhat-master deployments]# kubectl logs weave-net-ssqbz -c weave -n kube-system
DEBU: 2020/07/05 07:28:04.661866 [kube-peers] Checking peer "b6:01:79:66:7d:d3" against list &{[{e6:c9:b2:5f:82:d1 redhat-master} {b2:29:9a:5b:89:e9 redhat-console-1} {e2:95:07:c8:a0:90 redhat-console-2}]}
Peer not in list; removing persisted data
INFO: 2020/07/05 07:28:04.924399 Command line options: map[conn-limit:200 datapath:datapath db-prefix:/weavedb/weave-net docker-api: expect-npc:true host-root:/host http-addr:127.0.0.1:6784 ipalloc-init:consensus=2 ipalloc-range:10.32.0.0/12 metrics-addr:0.0.0.0:6782 name:b6:01:79:66:7d:d3 nickname:redhat-master no-dns:true port:6783]
INFO: 2020/07/05 07:28:04.924448 weave 2.6.5
FATA: 2020/07/05 07:28:04.938587 Existing bridge type "bridge" is different than requested "bridged_fastdp". Please do 'weave reset' and try again
Update:
So basically the issue is that iptables is deprecated in RHEL 8. But even after downgrading my OS to RHEL 7, I can access the NodePort only on the node where the pod is deployed.
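For reference, what the tutorial describes is the default behavior of a NodePort service with externalTrafficPolicy: Cluster; kube-proxy programs every node (via iptables) to forward the node port, whether or not a pod runs there. A quick way to test, with a hypothetical deployment named my-app:
$ kubectl expose deployment my-app --type=NodePort --port=80
$ kubectl get svc my-app # note the allocated nodePort, e.g. 30080
$ curl http://<any-worker-ip>:<nodePort>
If the service sets externalTrafficPolicy: Local, only nodes actually running a pod will answer; and if kube-proxy can't program iptables (as on RHEL 8 with the nftables backend), NodePorts break in exactly the way described here.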

Gluster cluster in Kubernetes: Glusterd inactive (dead) after node reboot. How to debug?

I don't know what to do to debug it. I have 1 Kubernetes master node and three slave nodes, and I have deployed a Gluster cluster on the three slave nodes just fine with this guide: https://github.com/gluster/gluster-kubernetes/blob/master/docs/setup-guide.md.
I created volumes and everything is working. But when I reboot a slave node, and the node reconnects to the master node, the glusterd.service inside the slave node shows up as dead and nothing works after this.
[root@kubernetes-node-1 /]# systemctl status glusterd.service
● glusterd.service - GlusterFS, a clustered file-system server
Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
Active: inactive (dead)
I don't know what to do from here. For example, /var/log/glusterfs/glusterd.log was last updated 3 days ago (it's not being updated with errors after a reboot or a pod deletion+recreation).
I just want to know where glusterd crashes so I can find out why.
How can I debug this crash?
All the nodes (master + slaves) run on Ubuntu Desktop 18.04 LTS 64-bit VirtualBox VMs.
requested logs (kubectl get all --all-namespaces):
NAMESPACE NAME READY STATUS RESTARTS AGE
glusterfs pod/glusterfs-7nl8l 0/1 Running 62 22h
glusterfs pod/glusterfs-wjnzx 1/1 Running 62 2d21h
glusterfs pod/glusterfs-wl4lx 1/1 Running 112 41h
glusterfs pod/heketi-7495cdc5fd-hc42h 1/1 Running 0 22h
kube-system pod/coredns-86c58d9df4-n2hpk 1/1 Running 0 6d12h
kube-system pod/coredns-86c58d9df4-rbwjq 1/1 Running 0 6d12h
kube-system pod/etcd-kubernetes-master-work 1/1 Running 0 6d12h
kube-system pod/kube-apiserver-kubernetes-master-work 1/1 Running 0 6d12h
kube-system pod/kube-controller-manager-kubernetes-master-work 1/1 Running 0 6d12h
kube-system pod/kube-flannel-ds-amd64-785q8 1/1 Running 5 3d19h
kube-system pod/kube-flannel-ds-amd64-8sj2z 1/1 Running 8 3d19h
kube-system pod/kube-flannel-ds-amd64-v62xb 1/1 Running 0 3d21h
kube-system pod/kube-flannel-ds-amd64-wx4jl 1/1 Running 7 3d21h
kube-system pod/kube-proxy-7f6d9 1/1 Running 5 3d19h
kube-system pod/kube-proxy-7sf9d 1/1 Running 0 6d12h
kube-system pod/kube-proxy-n9qxq 1/1 Running 8 3d19h
kube-system pod/kube-proxy-rwghw 1/1 Running 7 3d21h
kube-system pod/kube-scheduler-kubernetes-master-work 1/1 Running 0 6d12h
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 6d12h
elastic service/glusterfs-dynamic-9ad03769-2bb5-11e9-8710-0800276a5a8e ClusterIP 10.98.38.157 <none> 1/TCP 2d19h
elastic service/glusterfs-dynamic-a77e02ca-2bb4-11e9-8710-0800276a5a8e ClusterIP 10.97.203.225 <none> 1/TCP 2d19h
elastic service/glusterfs-dynamic-ad16ed0b-2bb6-11e9-8710-0800276a5a8e ClusterIP 10.105.149.142 <none> 1/TCP 2d19h
glusterfs service/heketi ClusterIP 10.101.79.224 <none> 8080/TCP 2d20h
glusterfs service/heketi-storage-endpoints ClusterIP 10.99.199.190 <none> 1/TCP 2d20h
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP 6d12h
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
glusterfs daemonset.apps/glusterfs 3 3 0 3 0 storagenode=glusterfs 2d21h
kube-system daemonset.apps/kube-flannel-ds-amd64 4 4 4 4 4 beta.kubernetes.io/arch=amd64 3d21h
kube-system daemonset.apps/kube-flannel-ds-arm 0 0 0 0 0 beta.kubernetes.io/arch=arm 3d21h
kube-system daemonset.apps/kube-flannel-ds-arm64 0 0 0 0 0 beta.kubernetes.io/arch=arm64 3d21h
kube-system daemonset.apps/kube-flannel-ds-ppc64le 0 0 0 0 0 beta.kubernetes.io/arch=ppc64le 3d21h
kube-system daemonset.apps/kube-flannel-ds-s390x 0 0 0 0 0 beta.kubernetes.io/arch=s390x 3d21h
kube-system daemonset.apps/kube-proxy 4 4 4 4 4 <none> 6d12h
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
glusterfs deployment.apps/heketi 1/1 1 0 2d20h
kube-system deployment.apps/coredns 2/2 2 2 6d12h
NAMESPACE NAME DESIRED CURRENT READY AGE
glusterfs replicaset.apps/heketi-7495cdc5fd 1 1 0 2d20h
kube-system replicaset.apps/coredns-86c58d9df4 2 2 2 6d12h
requested:
tasos@kubernetes-master-work:~$ kubectl logs -n glusterfs glusterfs-7nl8l
env variable is set. Update in gluster-blockd.service
Please check these similar topics:
GlusterFS deployment on k8s cluster -- Readiness probe failed: /usr/local/bin/status-probe.sh
and
https://github.com/gluster/gluster-kubernetes/issues/539
Check tcmu-runner.log to debug it.
UPDATE:
I think it will be your issue:
https://github.com/gluster/gluster-kubernetes/pull/557
PR is prepared, but not merged.
UPDATE 2:
https://github.com/gluster/glusterfs/issues/417
Be sure that rpcbind is installed.
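To see why glusterd goes inactive after a reboot, it can also help to pull the unit's own logs instead of the (stale) file log; a generic systemd debugging sketch, not specific to this bug:
# systemctl status glusterd.service -l
# journalctl -u glusterd.service -b # everything since the last boot
# glusterd --debug # run in the foreground with debug logging to watch it exit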

kube-dns and weave-net not starting

I am deploying Kubernetes 1.4 on Ubuntu 16 on Raspberry Pi 3, following the instructions at http://kubernetes.io/docs/getting-started-guides/kubeadm/. The master starts and the minion joins with no problem, but when I add Weave, kube-dns won't start. Here are the pods:
k8s#k8s-master:~$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system etcd-k8s-master 1/1 Running 1 23h
kube-system kube-apiserver-k8s-master 1/1 Running 3 23h
kube-system kube-controller-manager-k8s-master 1/1 Running 1 23h
kube-system kube-discovery-1943570393-ci2m9 1/1 Running 1 23h
kube-system kube-dns-4291873140-ia4y8 0/3 ContainerCreating 0 23h
kube-system kube-proxy-arm-nfvvy 1/1 Running 0 1h
kube-system kube-proxy-arm-tcnta 1/1 Running 1 23h
kube-system kube-scheduler-k8s-master 1/1 Running 1 23h
kube-system weave-net-4gqd1 0/2 CrashLoopBackOff 54 1h
kube-system weave-net-l758i 0/2 CrashLoopBackOff 44 1h
The events log doesn't show anything. Getting logs for kube-dns doesn't return anything either.
What can I do to debug?
kube-dns won't start until the network is up.
Look in the kubelet logs on each machine for more information about the crash that is causing the CrashLoopBackoff.
How did you get ARM images for Weave Net? The weaveworks/weave-kube image on DockerHub is only built for x64.
Edit: as @pidster says, Weave Net now supports ARM.
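For example (standard kubectl and journalctl usage; the pod name comes from the listing above):
$ kubectl logs weave-net-4gqd1 -c weave --namespace=kube-system
$ journalctl -u kubelet --no-pager | tail -50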
UPDATE: As Bryan pointed out, Flannel is not the only overlay network for ARM anymore.
Note these two hints in the kubeadm install documentation:
Flannel is the only overlay network supported on ARM.
If you are on an architecture other than amd64, you should use the Flannel overlay network as described in the multi-platform section.
When using Flannel, you need to run kubeadm init --pod-network-cidr=10.244.0.0/16.
Note: this will autodetect the network interface to advertise the master on as the interface with the default gateway. If you want to use a different interface, specify the --api-advertise-addresses=<ip-address> argument to kubeadm init. If you want to use Flannel as the pod network, specify --pod-network-cidr=10.244.0.0/16 if you're using the daemonset manifest below. However, please note that this is not required for any other networks besides Flannel.
You might also want to check my automated step-by-step installation for Raspberry Pi 3 with Ansible; there is no issue with DNS, and it will probably work with Ubuntu 16 as well:
NAMESPACE NAME READY STATUS RESTARTS AGE
default busybox-894550917-7vj3z 1/1 Running 0 15h
default busybox-894550917-p9vnl 1/1 Running 1 3d
default gogs-3464422143-cf5wb 1/1 Running 0 16h
kube-system dummy-2501624643-pxmgz 1/1 Running 2 3d
kube-system etcd-master.cluster.local 1/1 Running 2 3d
kube-system kube-apiserver-master.cluster.local 1/1 Running 2 3d
kube-system kube-controller-manager-master.cluster.local 1/1 Running 2 3d
kube-system kube-discovery-1659614412-vrhv4 1/1 Running 2 3d
kube-system kube-dns-4211557627-kpsj4 4/4 Running 8 3d
kube-system kube-flannel-ds-d1bgg 2/2 Running 6 3d
kube-system kube-flannel-ds-fcp4b 2/2 Running 6 3d
kube-system kube-flannel-ds-n7p3m 2/2 Running 6 3d
kube-system kube-flannel-ds-tn7nd 2/2 Running 6 3d
kube-system kube-flannel-ds-vpk4w 2/2 Running 6 3d
kube-system kube-proxy-5nmtn 1/1 Running 2 3d
kube-system kube-proxy-gq7bz 1/1 Running 2 3d
kube-system kube-proxy-lkkgm 1/1 Running 2 3d
kube-system kube-proxy-mlh3v 1/1 Running 1 3d
kube-system kube-proxy-sg8n8 1/1 Running 2 3d
kube-system kube-scheduler-master.cluster.local 1/1 Running 2 3d
kube-system kubernetes-dashboard-3507263287-h9q33 1/1 Running 2 3d