When I reboot the master and worker nodes, the coredns pod shows the error message below; it seems the kubelet cannot recreate the pod sandbox after the server restarts.
Normal SandboxChanged 12s kubelet, izbp1dyjigsfwmw0dtl85gz Pod sandbox changed, it will be killed and re-created.
Warning FailedCreatePodSandBox 11s kubelet, izbp1dyjigsfwmw0dtl85gz Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "5e850ee3e8bf86688fec2badd9b0272127a0d775620a5783e7c30b4e0d412b01" network for pod "coredns-6955765f44-4xnhj": networkPlugin cni failed to set up pod "coredns-6955765f44-4xnhj_kube-system" network: open /run/flannel/subnet.env: no such file or directory
You can try cleaning up flannel and reinstalling it.
kubectl delete -f https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml
rm -rf /var/lib/cni/
rm -rf /run/flannel
rm -rf /etc/cni/
Remove the interfaces related to flannel. List them first:
ip link
For each flannel-related interface, do the following:
ifconfig <name of interface from ip link> down
ip link delete <name of interface from ip link>
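As a concrete sketch, the flannel-created interfaces are typically named flannel.1 and cni0, though the names on your nodes may differ (check ip link first):
# Hypothetical loop over the usual flannel interface names
for iface in flannel.1 cni0; do
    sudo ip link set "$iface" down
    sudo ip link delete "$iface"
done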
After this, reinstall flannel:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml
For flannel to work correctly, you must pass --pod-network-cidr=10.244.0.0/16 to kubeadm init
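For example, a minimal sketch (the advertise address is a placeholder; use your master's IP):
kubeadm init --apiserver-advertise-address=<master-ip> --pod-network-cidr=10.244.0.0/16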
I am new to Kubernetes. I have created a Kubernetes cluster with one master node and 2 worker nodes. I have installed Helm for the deployment of apps. I am getting the following error while starting the tiller pod:
tiller-deploy-5b4685ffbf-znbdc 0/1 ContainerCreating 0 23h
After describing the pod I got the following result
[root@master-node flannel]# kubectl --namespace kube-system describe pod tiller-deploy-5b4685ffbf-znbdc
Events:
Type Reason Age From Message
Warning FailedCreatePodSandBox 10m (x34020 over 22h) kubelet, worker-node1 (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "cdda0a8ae9200668a2256e8c7b41904dce604f73f0282b0443d972f5e2846059" network for pod "tiller-deploy-5b4685ffbf-znbdc": networkPlugin cni failed to set up pod "tiller-deploy-5b4685ffbf-znbdc_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Normal SandboxChanged 25s (x34556 over 22h) kubelet, worker-node1 Pod sandbox changed, it will be killed and re-created.
Any hint on how I can get rid of this error?
You need to set up a CNI plugin such as Flannel. Verify that all the pods in the kube-system namespace are running.
To apply flannel in your cluster, run the following command:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml
For flannel to work correctly, pod-network-cidr should be 10.244.0.0/16. If you have a different CIDR, you can customize the flannel manifest (kube-flannel.yml) according to your needs.
Example:
net-conf.json: |
  {
    "Network": "10.10.0.0/16",
    "Backend": {
      "Type": "vxlan"
    }
  }
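As a sketch of how you might patch the CIDR before applying (assuming you want 10.10.0.0/16; the sed pattern targets the manifest's default network):
curl -sLO https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml
sed -i 's#10.244.0.0/16#10.10.0.0/16#' kube-flannel.yml
kubectl apply -f kube-flannel.yml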
I have an Ubuntu 16.04 machine which is acting as the Kubernetes master. I have installed Kubernetes v1.13.1 and am using Weave for networking. I have 2 Raspberry Pi devices running the same version of Kubernetes. I created a cluster and joined the Raspberry Pis to the Ubuntu kube master. I started a deployment and everything looked to be working fine.
When I checked the logs of the container, I found out that it was not able to connect to the internet. I tried pinging but got no results. When I ran the command to describe the pod, I got the following:
Warning FailedCreatePodSandBox 42m (x3 over 42m) kubelet, node02 (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "dea99f80488031b84b7b1f934343e54d877adf931071401651628505d52f55f9" network for pod "deployment-cnfc5": NetworkPlugin cni failed to set up pod "deployment-cnfc5_matrix-device" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/dea99f80488031b84b7b1f934343e54d877adf931071401651628505d52f55f9: dial tcp 127.0.0.1:6784: connect: connection refused
I have checked the directory /etc/cni/net.d and it contains 10-weave.conflist on both the master and the worker node. I have also checked the directory /opt/cni/bin and found the below on the master node:
bridge flannel ipvlan macvlan ptp tuning weave-ipam weave-plugin-2.5.1
dhcp host-local loopback portmap sample vlan weave-net
and on the worker, I got the below:
bridge flannel ipvlan macvlan ptp tuning weave-ipam weave-plugin-2.5.0
dhcp host-local loopback portmap sample vlan weave-net weave-plugin-2.5.1
Can anyone please let me know what I can do to resolve this issue? Thanks.
I initialized the kube master using the command below:
sudo kubeadm init --token-ttl=0 --apiserver-advertise-address=192.168.0.142
and installed weave using:
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
I am trying to set up my first cluster using Kubernetes 1.13.1. The master initialized okay, but both of my worker nodes are NotReady. kubectl describe node shows that the kubelet stopped posting node status on both worker nodes. On one of the worker nodes I get log output like:
> kubelet[3680]: E0107 20:37:21.196128 3680 kubelet.go:2266] node
> "xyz" not found.
Here is the full details:
I am using CentOS 7 & Kubernetes 1.13.1.
Initializing was done as follows:
[root@master ~]# kubeadm init --apiserver-advertise-address=10.142.0.4 --pod-network-cidr=10.142.0.0/24
Successfully initialized the cluster:
You can now join any number of machines by running the following on each node
as root:
`kubeadm join 10.142.0.4:6443 --token y0epoc.zan7yp35sow5rorw --discovery-token-ca-cert-hash sha256:f02d43311c2696e1a73e157bda583247b9faac4ffb368f737ee9345412c9dea4`
Deployed the flannel CNI:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
The join command worked fine.
[kubelet-start] Activating the kubelet service
[tlsbootstrap] Waiting for the kubelet to perform the TLS Bootstrap...
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "node01" as an annotation
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the master to see this node join the cluster.
Result of kubectl get nodes:
[root@master ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master Ready master 9h v1.13.1
node01 NotReady <none> 9h v1.13.1
node02 NotReady <none> 9h v1.13.1
on both nodes:
[root@node01 ~]# service kubelet status
Redirecting to /bin/systemctl status kubelet.service
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Tue 2019-01-08 04:49:20 UTC; 32s ago
Docs: https://kubernetes.io/docs/
Main PID: 4224 (kubelet)
Memory: 31.3M
CGroup: /system.slice/kubelet.service
└─4224 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfi
`Jan 08 04:54:10 node01 kubelet[4224]: E0108 04:54:10.957115 4224 kubelet.go:2266] node "node01" not found`
I would appreciate your advice on how to troubleshoot this.
The previous answer sounds correct. You can verify that by running
kubectl describe node node01 on the master, or wherever kubectl is correctly configured.
It seems the cause of this error is an incorrect subnet. The Flannel documentation says that you should use a /16, not a /24, for the pod network.
NOTE: If kubeadm is used, then pass --pod-network-cidr=10.244.0.0/16
to kubeadm init to ensure that the podCIDR is set.
I tried running kubeadm with a /24 and, although the nodes reached the Ready state, the flannel pods did not run properly, which resulted in some issues.
You can check whether your flannel pods are running properly with:
kubectl get pods -n kube-system
If the status is anything other than Running, something is wrong. In that case you can check the details by running kubectl describe pod PODNAME -n kube-system. Try changing the subnet and update us if that fixes the problem.
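A minimal sketch of redoing the init with a /16 (reusing the advertise address from your setup):
kubeadm reset
kubeadm init --apiserver-advertise-address=10.142.0.4 --pod-network-cidr=10.244.0.0/16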
I ran into almost the same problem, and in the end I found that the cause was that the firewall had not been turned off. You can try the following commands:
sudo ufw disable (Ubuntu)
or
systemctl stop firewalld && systemctl disable firewalld (CentOS/RHEL)
or
setenforce 0 (note: this puts SELinux into permissive mode rather than touching the firewall)
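If you would rather not disable the firewall entirely, a sketch of opening the usual ports with firewalld instead (the port list follows the Kubernetes docs; add whatever your CNI needs, e.g. UDP 8472 for flannel's vxlan backend):
sudo firewall-cmd --permanent --add-port=6443/tcp    # API server
sudo firewall-cmd --permanent --add-port=10250/tcp   # kubelet
sudo firewall-cmd --permanent --add-port=8472/udp    # flannel vxlan
sudo firewall-cmd --reload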
I have created a 3-node Azure Kubernetes cluster using the following commands:
az group create --name ResourceGroup --location canadacentral
az provider register -n Microsoft.ContainerService
az provider register -n Microsoft.Compute
az provider register -n Microsoft.Network
az aks create --resource-group ResourceGroup --name ReplicaSet --node-count 3 --kubernetes-version 1.8.7 --node-vm-size Standard_A0 --generate-ssh-keys
kubectl create -f https://raw.githubusercontent.com/openebs/openebs/master/k8s/openebs-operator.yaml
kubectl create -f https://raw.githubusercontent.com/openebs/openebs/master/k8s/openebs-storageclasses.yaml
Subsequently I created a postgres StatefulSet as well, which does not start since open-iscsi is not installed on the kubelet.
Kubelet logs from Node-1 (where the pgset pod is scheduled)
I0313 05:42:41.910525 7845 reconciler.go:257] operationExecutor.MountVolume started for volume "pvc-a980c1e4-2674-11e8-a384-0a58ac1f03e3" (UniqueName: "kubernetes.io/iscsi/10.0.20.229:3260:iqn.2016-09.com.openebs.jiva:pvc-a980c1e4-2674-11e8-a384-0a58ac1f03e3:0") pod "pgset-0" (UID: "a9826973-2674-11e8-a384-0a58ac1f03e3")
I0313 05:42:41.910605 7845 operation_generator.go:416] MountVolume.WaitForAttach entering for volume "pvc-a980c1e4-2674-11e8-a384-0a58ac1f03e3" (UniqueName: "kubernetes.io/iscsi/10.0.20.229:3260:iqn.2016-09.com.openebs.jiva:pvc-a980c1e4-2674-11e8-a384-0a58ac1f03e3:0") pod "pgset-0" (UID: "a9826973-2674-11e8-a384-0a58ac1f03e3") DevicePath ""
E0313 05:42:41.910744 7845 iscsi_util.go:207] iscsi: could not read iface default error:
E0313 05:42:41.910815 7845 nestedpendingoperations.go:264] Operation for "\"kubernetes.io/iscsi/10.0.20.229:3260:iqn.2016-09.com.openebs.jiva:pvc-a980c1e4-2674-11e8-a384-0a58ac1f03e3:0\"" failed. No retries permitted until 2018-03-13 05:44:43.910784094 +0000 UTC (durationBeforeRetry 2m2s). Error: MountVolume.WaitForAttach failed for volume "pvc-a980c1e4-2674-11e8-a384-0a58ac1f03e3" (UniqueName: "kubernetes.io/iscsi/10.0.20.229:3260:iqn.2016-09.com.openebs.jiva:pvc-a980c1e4-2674-11e8-a384-0a58ac1f03e3:0") pod "pgset-0" (UID: "a9826973-2674-11e8-a384-0a58ac1f03e3") : executable file not found in $PATH
E0313 05:43:12.080406 7845 kubelet.go:1628] Unable to mount volumes for pod "pgset-0_default(a9826973-2674-11e8-a384-0a58ac1f03e3)": timeout expired waiting for volumes to attach/mount for pod "default"/"pgset-0". list of unattached/unmounted volumes=[pgdata]; skipping pod
E0313 05:43:12.081262 7845 pod_workers.go:182] Error syncing pod a9826973-2674-11e8-a384-0a58ac1f03e3 ("pgset-0_default(a9826973-2674-11e8-a384-0a58ac1f03e3)"), skipping: timeout expired waiting for volumes to attach/mount for pod "default"/"pgset-0". list of unattached/unmounted volumes=[pgdata]
My question is whether there is a way to configure and ensure that the kubelet comes up by default with the open-iscsi initiator utils installed and running.
The following steps were followed to manually install the iscsi initiator in the kubelet:
SSH into the Kubernetes Nodes
Identify the docker container running the kubelet using sudo docker ps.
Enter the kubelet container shell
sudo docker exec -it kubelet_container_id bash
Install open-iscsi.
apt-get update
apt install -y open-iscsi
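A one-shot sketch of the same steps (the name=kubelet filter is an assumption; confirm the actual container name with sudo docker ps):
CID=$(sudo docker ps --filter name=kubelet --format '{{.ID}}' | head -n 1)
sudo docker exec "$CID" bash -c 'apt-get update && apt-get install -y open-iscsi'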
I am following this tutorial with 2 VMs running CentOS 7. Everything looks fine (no errors during installation/setup) but I can't see my nodes.
NOTE:
I am running this on VMware VMs
kub1 is my master and kub2 my worker node
kubectl cluster-info output:
[root@kub1 ~]# kubectl cluster-info
Kubernetes master is running at http://kub1:8080
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
[root@kub2 ~]# kubectl cluster-info
Kubernetes master is running at http://kub1:8080
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
nodes:
[root@kub1 ~]# kubectl get nodes
[root@kub1 ~]# kubectl get nodes -a
[root@kub1 ~]#
[root@kub2 ~]# kubectl get nodes -a
[root@kub2 ~]# kubectl get no
[root@kub2 ~]#
cluster events:
[root@kub1 ~]# kubectl get events -a
LASTSEEN FIRSTSEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
1h 1h 1 kub2.local Node Normal Starting {kube-proxy kub2.local} Starting kube-proxy.
1h 1h 1 kub2.local Node Normal Starting {kube-proxy kub2.local} Starting kube-proxy.
1h 1h 1 kub2.local Node Normal Starting {kubelet kub2.local} Starting kubelet.
1h 1h 1 node-kub2 Node Normal Starting {kubelet node-kub2} Starting kubelet.
1h 1h 1 node-kub2 Node Normal Starting {kubelet node-kub2} Starting kubelet.
/var/log/messages:
kubelet.go:1194] Unable to construct api.Node object for kubelet: can't get ip address of node node-kub2: lookup node-kub2: no such host
QUESTION: Any idea why my nodes are not shown using "kubectl get nodes"?
My issue was that the KUBELET_HOSTNAME value in /etc/kubernetes/kubelet didn't match the actual hostname.
I commented out that line, then restarted the services, and I could see my worker after that.
Hope that helps.
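For reference, a sketch of the relevant line (the path and variable name come from the old CentOS kubernetes-node packaging; your layout may differ):
# /etc/kubernetes/kubelet
# KUBELET_HOSTNAME="--hostname-override=node-kub2"   <- commented out so the real hostname is used
Then restart the services, e.g. systemctl restart kubelet kube-proxy.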
Not sure about your scenario, but I solved it after 3-4 hours of effort.
Solved
I was facing this issue because my docker cgroup driver was different from the kubernetes cgroup driver.
I just updated it to cgroupfs using the following commands mentioned in the docs:
cat << EOF > /etc/docker/daemon.json
{
"exec-opts": ["native.cgroupdriver=cgroupfs"]
}
EOF
Restart the docker service: service docker restart.
Reset kubernetes on the slave node: kubeadm reset
Join the master again: kubeadm join <><>
The node was then visible on the master using kubectl get nodes.
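You can verify that the two drivers now match with a quick check like this (in kubeadm-based installs the kubelet's driver usually lives in /var/lib/kubelet/config.yaml):
docker info 2>/dev/null | grep -i 'cgroup driver'
grep -i cgroup /var/lib/kubelet/config.yaml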
I had a similar problem after installing k8s using Kubespray on Fedora 31. To debug the issue, I tried to run a random container directly using docker run, which failed with:
docker: Error response from daemon: cgroups: cgroup mountpoint does not exist: unknown.
This is a known problem caused by Fedora 31 defaulting to cgroups v2, and the fix is to update GRUB so the kernel boots with the previous (v1) hierarchy:
sudo dnf install grubby
sudo grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0"
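After rebooting, you can confirm which cgroup hierarchy is active with a check like:
stat -fc %T /sys/fs/cgroup/
# tmpfs     -> cgroup v1 (unified hierarchy disabled)
# cgroup2fs -> still on cgroup v2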