Flannel is crashing for Slave node - kubernetes

I am getting this result for the flannel service on my slave node. Flannel is running fine on the master node.
kube-system kube-flannel-ds-amd64-xbtrf 0/1 CrashLoopBackOff 4 3m5s
Kube-proxy on the slave is running fine, but the flannel pod is not.
I have a master and a slave node only. At first the pod says Running, then it goes to Error, and finally CrashLoopBackOff.
godfrey@master:~$ kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system kube-flannel-ds-amd64-jszwx 0/1 CrashLoopBackOff 4 2m17s 192.168.152.104 slave3 <none> <none>
kube-system kube-proxy-hxs6m 1/1 Running 0 18m 192.168.152.104 slave3 <none> <none>
I am also getting this from the logs:
I0515 05:14:53.975822 1 main.go:390] Found network config - Backend type: vxlan
I0515 05:14:53.975856 1 vxlan.go:121] VXLAN config: VNI=1 Port=0 GBP=false Learning=false DirectRouting=false
E0515 05:14:53.976072 1 main.go:291] Error registering network: failed to acquire lease: node "slave3" pod cidr not assigned
I0515 05:14:53.976154 1 main.go:370] Stopping shutdownHandler...
I could not find a solution so far. Help appreciated.

As the solution came from the OP, I'm posting the answer as community wiki.
As reported by the OP in the comments, they didn't pass the podCIDR during kubeadm init.
The following command was used to see that the flannel pod was in "CrashLoopBackOff" state:
sudo kubectl get pods --all-namespaces -o wide
The logs of the kube-flannel-ds-amd64-ksmmh pod (the one in CrashLoopBackOff state) confirmed that the podCIDR was not passed to it:
$ kubectl logs kube-flannel-ds-amd64-ksmmh
kubeadm init --pod-network-cidr=172.168.10.0/24 didn't pass the podCIDR to the slave nodes as expected.
Hence, to solve the problem, the kubectl patch node slave1 -p '{"spec":{"podCIDR":"172.168.10.0/24"}}' command had to be used to pass a podCIDR to each slave node.
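As a quick way to verify the fix (my addition, not part of the original answer), you can list each node together with its assigned podCIDR; an empty value means the node never got one:
# Show every node and its podCIDR assignment
kubectl get nodes -o custom-columns=NAME:.metadata.name,PODCIDR:.spec.podCIDR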
Please see this link: coreos.com/flannel/docs/latest/troubleshooting.html and section "Kubernetes Specific"

The described cluster configuration doesn't look correct in two respects.
First of all, a reasonable minimum PodCIDR subnet size is /16. Each Kubernetes node usually gets a /24 subnet out of it, because a node can run up to 110 pods by default.
PodCIDR and ServicesCIDR (default: "10.96.0.0/12") must not overlap with your existing LAN network or with each other.
So, a correct kubeadm command would look like:
$ sudo kubeadm init --pod-network-cidr=10.244.0.0/16
In your case the PodCIDR subnet is only a /24, and it was assigned to the master node. The slave node didn't get its own /24 subnet, so the Flannel pod showed the error in the logs:
Error registering network: failed to acquire lease: node "slave3" pod cidr not assigned
Assigning the same subnet to several nodes manually will lead to other connectivity problems.
You can find more details on Kubernetes IP subnets in the GKE documentation.
The second problem is the IP subnet itself.
Recent Calico network addon versions are able to detect the correct pod subnet based on the kubeadm parameter --pod-network-cidr. Older versions used the predefined subnet 192.168.0.0/16, and you had to adjust it in the DaemonSet specification of the Calico YAML file:
        - name: CALICO_IPV4POOL_CIDR
          value: "192.168.0.0/16"
Flannel still requires the default subnet (10.244.0.0/16) to be specified for kubeadm init.
To use a custom subnet for your cluster, the Flannel installation YAML file should be adjusted before applying it to the cluster:
...
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
...
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
...
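For example (a sketch; the custom subnet 10.10.0.0/16 is just an illustration), you could download the manifest, substitute the Network value, and apply the edited file. kubeadm init would then need the matching --pod-network-cidr=10.10.0.0/16:
# Download the flannel manifest, switch the pod network, and apply it
curl -sLO https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
sed -i 's|10.244.0.0/16|10.10.0.0/16|g' kube-flannel.yml
kubectl apply -f kube-flannel.yml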
So the following should work for any version of Kubernetes and Calico:
$ sudo kubeadm init --pod-network-cidr=192.168.0.0/16
$ mkdir -p $HOME/.kube
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
# Latest Calico version
$ kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
# or specific version, v3.14 in this case, which is also latest at the moment
# kubectl apply -f https://docs.projectcalico.org/v3.14/manifests/calico.yaml
Same for Flannel:
$ sudo kubeadm init --pod-network-cidr=10.244.0.0/16
$ mkdir -p $HOME/.kube
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
# For Kubernetes v1.7+
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
# For older versions of Kubernetes:
# For RBAC enabled clusters:
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/k8s-manifests/kube-flannel-rbac.yml
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/k8s-manifests/kube-flannel-legacy.yml
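Either way, you can confirm afterwards that the flannel DaemonSet pods reach Running state (my addition; the app=flannel label comes from the flannel manifests):
kubectl get pods -n kube-system -l app=flannel -o wide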
There are many other network addons. You can find the list in the documentation:
Cluster Networking
Installing Addons

Related

Failed to install rook on k8s cluster

I am trying to create a rook cluster inside a k8s cluster.
Set up - 1 master node, 1 worker node
These are the steps I have followed
Master node:
sudo kubeadm init --pod-network-cidr=10.244.0.0/16
sudo sysctl net.bridge.bridge-nf-call-iptables=1
sudo sysctl net.bridge.bridge-nf-call-ip6tables=1
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/32a765fd19ba45b387fdc5e3812c41fff47cfd55/Documentation/kube-flannel.yml
kubeadm token create --print-join-command
Worker node:
kubeadm join {master_ip_address}:6443 --token {token} --discovery-token-ca-cert-hash {hash} --apiserver-advertise-address={worker_private_ip}
Master node - Install rook - (reference - https://rook.github.io/docs/rook/master/ceph-quickstart.html):
kubectl create -f ceph/common.yaml
kubectl create -f ceph/operator.yaml
kubectl create -f ceph/cluster-test.yaml
Error while creating rook-ceph-operator pod:
(combined from similar events): Failed create pod sandbox: rpc error: code =
Unknown desc = failed to set up sandbox container "4a901f12e5af5340f2cc48a976e10e5c310c01a05a4a47371f766a1a166c304f"
network for pod "rook-ceph-operator-fdfbcc5c5-jccc9": networkPlugin cni failed to
set up pod "rook-ceph-operator-fdfbcc5c5-jccc9_rook-ceph" network: failed to set bridge addr:
"cni0" already has an IP address different from 10.244.1.1/24
Can anybody help me with this issue?
This issue starts if you did kubeadm reset and after that reinitialized Kubernetes with kubeadm init. To clean up fully, run:
kubeadm reset
systemctl stop kubelet
systemctl stop docker
rm -rf /var/lib/cni/
rm -rf /var/lib/kubelet/*
rm -rf /etc/cni/
ifconfig cni0 down
ifconfig flannel.1 down
ifconfig docker0 down
After this, start docker and kubelet and run kubeadm init again.
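For example (my addition, assuming the flannel default pod network used in the question):
systemctl start docker
systemctl start kubelet
kubeadm init --pod-network-cidr=10.244.0.0/16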
Workaround
You can also try this as a simpler solution:
ip link delete cni0
ip link delete flannel.1
Which links you need to delete depends on which network plugin you are using inside k8s.
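After removing the links, restart the services so the network plugin can recreate the interfaces with correct addresses (my addition, assuming flannel as above):
systemctl restart docker kubelet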

About: CreateContainerError

I installed a K8S cluster on my laptop. It was running fine in the beginning, but when I restarted my laptop some services were no longer running:
kube-system coredns-5c98db65d4-9nm6m 0/1 Error 594 12d
kube-system coredns-5c98db65d4-qwkk9 0/1 CreateContainerError
kube-system kube-scheduler-kubemaster 0/1 CreateContainerError
I searched online for a solution but could not find an appropriate answer. Please help me resolve this issue.
I encourage you to look at the official Kubernetes documentation. Remember that your kubemaster should have at least the following resources: 2 CPUs or more and 2 GB or more of RAM.
First, install docker and kubeadm (as the root user) on each machine.
Initialize kubeadm (on master):
kubeadm init <args>
For example for Calico to work correctly, you need to pass --pod-network-cidr=192.168.0.0/16 to kubeadm init:
kubeadm init --pod-network-cidr=192.168.0.0/16
Install a pod network add-on (which one depends on what you would like to use) with the following command:
kubectl apply -f <add-on.yaml>
e.g. for Calico:
kubectl apply -f https://docs.projectcalico.org/v3.8/manifests/calico.yaml
To start using your cluster, you need to run the following on the master as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You can now join any number of machines by running the following on each node as root:
kubeadm join <master-ip>:<master-port> --token <token> --discovery-token-ca-cert-hash sha256:<hash>
By default, tokens expire after 24 hours. If you are joining a node to the cluster after the current token has expired, you can create a new token by running the following command on the control-plane node:
kubeadm token create
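A convenient variant (my addition; the same flag is used earlier in this thread) prints the complete join command along with the fresh token:
kubeadm token create --print-join-command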
Please, let me know if it works for you.
Did you check the status of the docker and kubelet services? If not, please run the command below and verify that both services are up and running.
systemctl status docker kubelet
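If either service is stopped, you can start it and enable it at boot (a standard systemd sketch, not from the original answer):
systemctl enable --now docker kubelet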

Flannel restarts very often

Flannel on the node restarts constantly. The log is as follows:
root@debian:~# docker logs faa668852544
I0425 07:14:37.721766 1 main.go:514] Determining IP address of default interface
I0425 07:14:37.724855 1 main.go:527] Using interface with name eth0 and address 192.168.50.19
I0425 07:14:37.815135 1 main.go:544] Defaulting external address to interface address (192.168.50.19)
E0425 07:15:07.825910 1 main.go:241] Failed to create SubnetManager: error retrieving pod spec for 'kube-system/kube-flannel-ds-arm-bg9rn': Get https://10.96.0.1:443/api/v1/namespaces/kube-system/pods/kube-flannel-ds-arm-bg9rn: dial tcp 10.96.0.1:443: i/o timeout
master configuration:
Ubuntu 16.04
node:
embedded system with a Debian rootfs (Linux 4.9)
kubernetes version: v1.14.1
docker version: 18.09
flannel version: v0.11.0
I want flannel to run normally on the node.
First, for flannel to work correctly, you must pass --pod-network-cidr=10.244.0.0/16 to kubeadm init.
kubeadm init --pod-network-cidr=10.244.0.0/16
Set /proc/sys/net/bridge/bridge-nf-call-iptables to 1 by running:
sysctl net.bridge.bridge-nf-call-iptables=1
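To make that setting survive reboots (my addition, assuming a standard sysctl.d setup):
echo 'net.bridge.bridge-nf-call-iptables = 1' | sudo tee /etc/sysctl.d/k8s.conf
sudo sysctl --system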
Next, create the ClusterRole and ClusterRoleBinding by applying the flannel manifest:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/a70459be0084506e4ec919aa1c114638878db11b/Documentation/kube-flannel.yml

Kubernetes 1.6.2 flannel configuration in centos 7

Using the kubeadm command I have configured a 3-node Kubernetes cluster. Unlike earlier versions, the 1.6.2 kubeadm command configures all the Kubernetes processes automatically. For flannel I used the kube-flannel.yml file. My understanding of Kubernetes was that it creates a container and runs each process inside that container, but I see the flannel process running on the node itself, while the /opt/bin/flanneld binary is not on my node. How is Kubernetes running flannel?
How does Kubernetes handle this? Is there a document that explains these concepts?
The flannel pod is running on the master node itself:
[root@master01 ~]# kubectl get pods -o wide --namespace=kube-system -l app=flannel
NAME READY STATUS RESTARTS AGE IP NODE
kube-flannel-ds-3694s 2/2 Running 37 3d 192.168.15.101 master01
kube-flannel-ds-mbh9b 2/2 Running 10 3d 192.168.15.102 node-01
kube-flannel-ds-vlm20 2/2 Running 12 3d 192.168.15.103 node-02
I see the flanneld process:
[root@master01 ~]# ps -fed | grep flan
root 5447 5415 0 May10 ? 00:00:08 /opt/bin/flanneld --ip-masq --kube-subnet-mgr
root 5604 5582 0 May10 ? 00:00:00 /bin/sh -c set -e -x; cp -f /etc/kube-flannel/cni-conf.json /etc/cni/net.d/10-flannel.conf; while true; do sleep 3600; done
but the flanneld binary is not on the master node:
[root@master01 ~]# ls -ld /opt/bin/flanneld
ls: cannot access /opt/bin/flanneld: No such file or directory
Thanks
SR
After some more reading I found the answer: flanneld runs inside the container.
Here are the runtime details:
https://github.com/opencontainers/runc
We can extract the flannel docker image like below:
docker save -o flannel-v0.7.1-amd64.tar quay.io/coreos/flannel:v0.7.1-amd64
tar tvf flannel-v0.7.1-amd64.tar
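To see what the container actually runs, you can also inspect the image's entrypoint (my addition; assumes the image is present locally):
# Print the entrypoint baked into the flannel image
docker inspect --format '{{.Config.Entrypoint}}' quay.io/coreos/flannel:v0.7.1-amd64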

kubectl get nodes not showing workers

I am following this tutorial with 2 vms running CentOS7. Everything looks fine (no errors during installation/setup) but I can't see my nodes.
NOTE:
I am running this on VMWare VMs
kub1 is my master and kub2 my worker node
kubectl cluster-info output:
[root@kub1 ~]# kubectl cluster-info
Kubernetes master is running at http://kub1:8080
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
[root@kub2 ~]# kubectl cluster-info
Kubernetes master is running at http://kub1:8080
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
nodes:
[root@kub1 ~]# kubectl get nodes
[root@kub1 ~]# kubectl get nodes -a
[root@kub1 ~]#
[root@kub2 ~]# kubectl get nodes -a
[root@kub2 ~]# kubectl get no
[root@kub2 ~]#
cluster events:
[root@kub1 ~]# kubectl get events -a
LASTSEEN FIRSTSEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
1h 1h 1 kub2.local Node Normal Starting {kube-proxy kub2.local} Starting kube-proxy.
1h 1h 1 kub2.local Node Normal Starting {kubelet kub2.local} Starting kubelet.
1h 1h 1 node-kub2 Node Normal Starting {kubelet node-kub2} Starting kubelet.
/var/log/messages:
kubelet.go:1194] Unable to construct api.Node object for kubelet: can't get ip address of node node-kub2: lookup node-kub2: no such host
QUESTION: any idea why my nodes are not shown using "kubectl get nodes"?
My issue was that the KUBELET_HOSTNAME value in /etc/kubernetes/kubelet didn't match the hostname.
I commented that line out, then restarted the services, and I could see my worker after that.
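As a quick check (my addition; node-kub2 is the node name from the log above, and /etc/kubernetes/kubelet is the config file this answer refers to), compare the configured name with what actually resolves:
grep KUBELET_HOSTNAME /etc/kubernetes/kubelet
hostname
getent hosts node-kub2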
hope that helps
Not sure about your scenario, but I solved mine after 3-4 hours of effort.
Solved
I was facing this issue because my docker cgroup driver was different from the kubernetes cgroup driver.
I just updated it to cgroupfs using the following commands mentioned in the doc:
cat << EOF > /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=cgroupfs"]
}
EOF
Restart the docker service: service docker restart
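You can confirm which driver docker now reports with (my addition):
docker info --format '{{.CgroupDriver}}'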
Reset kubernetes on the slave node: kubeadm reset
Join the master again: kubeadm join <><>
The node was then visible on the master using kubectl get nodes.
I had a similar problem after installing k8s using kubespray on Fedora 31. To debug the issue, I tried to run a random container directly using docker run, which failed with:
docker: Error response from daemon: cgroups: cgroup mountpoint does not exist: unknown.
This is a known problem caused by the cgroup version on Fedora 31, and the fix is to update grub to use the legacy cgroup hierarchy:
sudo dnf install grubby
sudo grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0"
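The new kernel argument only takes effect after a reboot:
sudo reboot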