flannel restart very often - kubernetes

Flannel on node restarts always.
Log as follows:
root#debian:~# docker logs faa668852544
I0425 07:14:37.721766 1 main.go:514] Determining IP address of default interface
I0425 07:14:37.724855 1 main.go:527] Using interface with name eth0 and address 192.168.50.19
I0425 07:14:37.815135 1 main.go:544] Defaulting external address to interface address (192.168.50.19)
E0425 07:15:07.825910 1 main.go:241] Failed to create SubnetManager: error retrieving pod spec for 'kube-system/kube-flannel-ds-arm-bg9rn': Get https://10.96.0.1:443/api/v1/namespaces/kube-system/pods/kube-flannel-ds-arm-bg9rn: dial tcp 10.96.0.1:443: i/o timeout
master configuration:
ubuntu: 16.04
node:
embedded system with debian rootfs(linux4.9).
kubernetes version:v1.14.1
docker version:18.09
flannel version:v0.11.0
I hope flannel run normal on node.

First, for flannel to work correctly, you must pass --pod-network-cidr=10.244.0.0/16 to kubeadm init.
kubeadm init --pod-network-cidr=10.244.0.0/16
Set /proc/sys/net/bridge/bridge-nf-call-iptables to 1 by running
sysctl net.bridge.bridge-nf-call-iptables=1
Next is to create the clusterrole and clusterrolebinding
as follows:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/a70459be0084506e4ec919aa1c114638878db11b/Documentation/kube-flannel.yml

Related

Flannel is crashing for Slave node

I am getting this result for flannel service on my slave node. Flannel is running fine on master node.
kube-system kube-flannel-ds-amd64-xbtrf 0/1 CrashLoopBackOff 4 3m5s
Kube-proxy running on the slave is fine but not the flannel pod.
I have a master and a slave node only. At first its say running, then it goes to error and finally, crashloopbackoff.
godfrey#master:~$ kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system kube-flannel-ds-amd64-jszwx 0/1 CrashLoopBackOff 4 2m17s 192.168.152.104 slave3 <none> <none>
kube-system kube-proxy-hxs6m 1/1 Running 0 18m 192.168.152.104 slave3 <none> <none>
I am also getting this from the logs:
I0515 05:14:53.975822 1 main.go:390] Found network config - Backend type: vxlan
I0515 05:14:53.975856 1 vxlan.go:121] VXLAN config: VNI=1 Port=0 GBP=false Learning=false DirectRouting=false
E0515 05:14:53.976072 1 main.go:291] Error registering network: failed to acquire lease: node "slave3" pod cidr not assigned
I0515 05:14:53.976154 1 main.go:370] Stopping shutdownHandler...
I could not find a solution so far. Help appreciated.
As solution came from OP, I'm posting answer as community wiki.
As reported by OP in the comments, he didn't passed the podCIDR during kubeadm init.
The following command was used to see that the flannel pod was in "CrashLoopBackoff" state:
sudo kubectl get pods --all-namespaces -o wide
To confirm that podCIDR was not passed to flannel pod kube-flannel-ds-amd64-ksmmh that was in CrashLoopBackoff state.
$ kubectl logs kube-flannel-ds-amd64-ksmmh
kubeadm init --pod-network-cidr=172.168.10.0/24 didn't pass the podCIDR to the slave nodes as expected.
Hence to solve the problem, kubectl patch node slave1 -p '{"spec":{"podCIDR":"172.168.10.0/24"}}' command had to be used to pass podCIDR to each slave node.
Please see this link: coreos.com/flannel/docs/latest/troubleshooting.html and section "Kubernetes Specific"
The described cluster configuration doesn't look correct in two aspects:
First of all, PodCIDR reasonable minimum subnet size is /16. Each Kubernetes node usually gets /24 subnet because it can run up to 100 pods.
PodCIDR and ServicesCIDR (default: "10.96.0.0/12") must not interfere with your existing LAN network and with each other.
So, correct kubeadm command would look like:
$ sudo kubeadm init --pod-network-cidr=10.244.0.0/16
In your case PodCIDR subnet is only /24 and it was assigned to master node. Slave node didn't get its own /24 subnet, so Flannel Pod showed the error in the logs:
Error registering network: failed to acquire lease: node "slave3" pod cidr not assigned
Assigning the same subnet to several nodes manually will lead to the other connectivity problems.
You can find more details on Kubernetes IP subnets in GKE documentation.
The second problem is the IP subnet number.
Recent Calico network addon versions are able to detect the correct Pod subnet based on kubeadm parameter --pod-network-cidr. Older version was using predefined subnet 192.168.0.0/16 and you had to adjust it in its YAML file in the Deaemonset specification :
- name: CALICO_IPV4POOL_CIDR
value: "192.168.0.0/16"
Flannel is still requires default subnet ( 10.244.0.0/16 ) to be specified for kubeadm init.
To use custom subnet for your cluster, Flannel "installation" YAML file should be adjusted before applying to the cluster.
...
---
kind: ConfigMap
apiVersion: v1
metadata:
name: kube-flannel-cfg
namespace: kube-system
...
net-conf.json: |
{
"Network": "10.244.0.0/16",
"Backend": {
"Type": "vxlan"
}
}
...
So the following should work for any version of Kubernetes and Calico:
$ sudo kubeadm init --pod-network-cidr=192.168.0.0/16
$ mkdir -p $HOME/.kube
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
# Latest Calico version
$ kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
# or specific version, v3.14 in this case, which is also latest at the moment
# kubectl apply -f https://docs.projectcalico.org/v3.14/manifests/calico.yaml
Same for Flannel:
$ sudo kubeadm init --pod-network-cidr=10.244.0.0/16
$ mkdir -p $HOME/.kube
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
# For Kubernetes v1.7+
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
# For older versions of Kubernetes:
# For RBAC enabled clusters:
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/k8s-manifests/kube-flannel-rbac.yml
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/k8s-manifests/kube-flannel-legacy.yml
$
There are many other network addons. You can find the list in the documentation:
Cluster Networking
Installing Addons

Can not login to kubernetes dashboard dial tcp 172.17.0.6:8443: connect: connection refused

I deployed Kubernetes v1.15.2 dashboard successfully. Checking the cluster info:
$ kubectl cluster-info
Kubernetes master is running at http://172.19.104.231:8080
kubernetes-dashboard is running at http://172.19.104.231:8080/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
When I access the dashboard, the result is:
[root#ops001 ~]# curl -L http://172.19.104.231:8080/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy
Error: 'dial tcp 172.17.0.6:8443: connect: connection refused'
Trying to reach: 'https://172.17.0.6:8443/'
This is dashboard status:
[root#ops001 ~]# kubectl get pods --namespace kube-system
NAME READY STATUS RESTARTS AGE
kubernetes-dashboard-74d7cc788-mk9c7 1/1 Running 0 92m
What should I do to access the dashboard? When I using proxy to access dashboard UI:
$ kubectl proxy --address='localhost' --port=8086 --accept-hosts='^*$'
Starting to serve on 127.0.0.1:8086
the result is:
[root#ops001 ~]# curl -L http://127.0.0.1:8086/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy
Error: 'dial tcp 172.17.0.6:8443: connect: connection refused'
Trying to reach: 'https://172.17.0.6:8443/'
What should I do to fix this problem?
I finally find the problem is kubernetes dashboard pod container does not communicate with the proxy nginx container. Because the proxy container is deployed before kubernetes flannel,and not in the same network.Trying to add proxy nginx container to flannel network would solve the problem.Check current flannel network:
[root#ops001 conf.d]# cat /run/flannel/subnet.env
FLANNEL_NETWORK=172.30.0.0/16
FLANNEL_SUBNET=172.30.224.1/21
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
generate start parameter of docker:
./mk-docker-opts.sh -d /run/docker_opts.env -c
check the parameter:
[root#ops001 conf.d] cat /run/docker_opts.env
DOCKER_OPTS=" --bip=172.30.224.1/21 --ip-masq=false --mtu=1450"
add paramter to docker service:
# vim /lib/systemd/system/docker.service
EnvironmentFile=/run/docker_opts.env
ExecStart=/usr/bin/dockerd $DOCKER_OPTS -H fd://
restart docker and the container would join the flannel network,could communicate with each other:
systemctl daemon-reload
systemctl restart docker
hope this may help you!

Kube-state-metrics error: Failed to create client: ... i/o timeout

I'm running Kubernetes in virtual machines and going through the basic tutorials, currently Add logging and metrics to the PHP / Redis Guestbook example. I'm trying to install kube-state-metrics:
git clone https://github.com/kubernetes/kube-state-metrics.git kube-state-metrics
kubectl create -f kube-state-metrics/kubernetes
but it fails.
kubectl describe pod --namespace kube-system kube-state-metrics-7d84474f4d-d5dg7
...
Warning Unhealthy 28m (x8 over 30m) kubelet, kubernetes-node1 Readiness probe failed: Get http://192.168.129.102:8080/healthz: dial tcp 192.168.129.102:8080: connect: connection refused
kubectl logs --namespace kube-system kube-state-metrics-7d84474f4d-d5dg7 -c kube-state-metrics
I0514 17:29:26.980707 1 main.go:85] Using default collectors
I0514 17:29:26.980774 1 main.go:93] Using all namespace
I0514 17:29:26.980780 1 main.go:129] metric white-blacklisting: blacklisting the following items:
W0514 17:29:26.980800 1 client_config.go:549] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0514 17:29:26.983504 1 main.go:169] Testing communication with server
F0514 17:29:56.984025 1 main.go:137] Failed to create client: ERROR communicating with apiserver: Get https://10.96.0.1:443/version?timeout=32s: dial tcp 10.96.0.1:443: i/o timeout
I'm unsure if this 10.96.0.1 IP is correct. My virtual machines are in a bridged network 10.10.10.0/24 and a host-only network 192.168.59.0/24. When initializing Kubernetes I used the argument --pod-network-cidr=192.168.0.0/16 so that's one more IP range that I'd expect. But 10.96.0.1 looks unfamiliar.
I'm new to Kubernetes, just doing the basic tutorials, so I don't know what to do now. How to fix it or investigate further?
EDIT - additonal info:
kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
kubernetes-master Ready master 15d v1.14.1 10.10.10.11 <none> Ubuntu 18.04.2 LTS 4.15.0-48-generic docker://18.9.2
kubernetes-node1 Ready <none> 15d v1.14.1 10.10.10.5 <none> Ubuntu 18.04.2 LTS 4.15.0-48-generic docker://18.9.2
kubernetes-node2 Ready <none> 15d v1.14.1 10.10.10.98 <none> Ubuntu 18.04.2 LTS 4.15.0-48-generic docker://18.9.2
The command I used to initialize the cluster:
sudo kubeadm init --apiserver-advertise-address=192.168.59.20 --pod-network-cidr=192.168.0.0/16
The reason for this is probably overlapping of Pod network with Node network - you set Pod network CIDR to 192.168.0.0/16 which your host-only network will be included into as its address is 192.168.59.0/24.
To solve this you can either change the pod network CIDR to 192.168.0.0/24 (it is not recommended as this will give you only 255 addresses for your pod networking)
You can also use different range for your Calico. If you want to do it on a running cluster here is an instruction.
Also other way I tried:
edit Calico manifest to different range (for example 10.0.0.0/8) - sudo kubeadm init --apiserver-advertise-address=192.168.59.20 --pod-network-cidr=10.0.0.0/8) and apply it after the init.
Another way would be using different CNI like Flannel (which uses 10.244.0.0/16).
You can find more information about ranges of CNI plugins here.

Failed to create pod sandbox kubernetes error

I have a Ubuntu 16.04 which is acting as kubernetes master. I have installed kuber v1.13.1 and using weave for networking. I have 2 Raspberry pi devices running the same version of kubernetes. I created a cluster and joined the raspberry pi to Ubuntu kube master. I have started a deployment and everything looks to be working fine.
When I checked the logs of the container, I found out that it was not able to connect to the internet. I tried pinging but got no results. When I run the command to describe the pod, I got following:
Warning FailedCreatePodSandBox 42m (x3 over 42m) kubelet, node02 (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "dea99f80488031b84b7b1f934343e54d877adf931071401651628505d52f55f9" network for pod "deployment-cnfc5": NetworkPlugin cni failed to set up pod "deployment-cnfc5_matrix-device" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/dea99f80488031b84b7b1f934343e54d877adf931071401651628505d52f55f9: dial tcp 127.0.0.1:6784: connect: connection refused
I have checked the directory /etc/cni/net.d and it contains 10-weave.conflist on both master and worker node. I have also checked the directory /opt/cni/bin and found below on master node:
bridge flannel ipvlan macvlan ptp tuning weave-ipam weave-plugin-2.5.1
dhcp host-local loopback portmap sample vlan weave-net
and on worker, I got below:
bridge flannel ipvlan macvlan ptp tuning weave-ipam weave-plugin-2.5.0
dhcp host-local loopback portmap sample vlan weave-net weave-plugin-2.5.1
Please can anyone please let me know what can I do to resolve this issue.? Thanks.
I initiated the kube master by using below commands:
sudo kubeadm init --token-ttl=0 --apiserver-advertise-address=192.168.0.142
and installed weave using:
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"

kubectl get nodes not showing workers

I am following this tutorial with 2 vms running CentOS7. Everything looks fine (no errors during installation/setup) but I can't see my nodes.
NOTE:
I am running this on VMWare VMs
kub1 is my master and kub2 my worker node
kubectl get nodes output:
[root#kub1 ~]# kubectl cluster-info
Kubernetes master is running at http://kub1:8080
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
[root#kub2 ~]# kubectl cluster-info
Kubernetes master is running at http://kub1:8080
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
nodes:
[root#kub1 ~]# kubectl get nodes
[root#kub1 ~]# kubectl get nodes -a
[root#kub1 ~]#
[root#kub2 ~]# kubectl get nodes -a
[root#kub2 ~]# kubectl get no
[root#kub2 ~]#
cluster events:
[root#kub1 ~]# kubectl get events -a
LASTSEEN FIRSTSEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
1h 1h 1 kub2.local Node Normal Starting {kube-proxy kub2.local} Starting kube-proxy.
1h 1h 1 kub2.local Node Normal Starting {kube-proxy kub2.local} Starting kube-proxy.
1h 1h 1 kub2.local Node Normal Starting {kubelet kub2.local} Starting kubelet.
1h 1h 1 node-kub2 Node Normal Starting {kubelet node-kub2} Starting kubelet.
1h 1h 1 node-kub2 Node Normal Starting {kubelet node-kub2} Starting kubelet.
/var/log/messages:
kubelet.go:1194] Unable to construct api.Node object for kubelet: can't get ip address of node node-kub2: lookup node-kub2: no such host
QUESTION: any idea why my nodes are not shown using "kubectl get nodes"?
My issue was that the KUBELET_HOSTNAME on /etc/kubernetes/kubeletvalue didn't match the hostname.
I commented that line, then restarted the services and I could see my worker after that.
hope that helps
Not sure about your scenario, but I have solved it after 3-4 hours of efforts.
Solved
I was facing this issue, because my docker cgroup driver was different than kubernetes cgroup driver.
Just updated it to cgroupfs using following commands mentioned in doc.
cat << EOF > /etc/docker/daemon.json
{
"exec-opts": ["native.cgroupdriver=cgroupfs"]
}
EOF
Restart docker service service docker restart.
Reset kubernetes on slave node: kubeadm reset
Joined master again: kubeadm join <><>
It was visible on master using kubectl get nodes.
I had a similar problem after installing k8s using kubespray on fedora31, and to debug the issue, tried to run a random container directly using docker run that failed with:
docker: Error response from daemon: cgroups: cgroup mountpoint does not exist: unknown.
this is a known problem cause by cgroup version on fedora 31, and the fix is to update grub to use the previous version:
sudo dnf install grubby
sudo grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0"