Nginx Kubernetes POD stays in ContainerCreating - kubernetes

I set up a Kubernetes cluster on CentOS 7 with one master and two worker nodes. However, when I try to deploy a pod with nginx, the pod stays in ContainerCreating forever and never gets out of it.
For the pod network I am using Calico.
Can you please help me resolve this issue? I don't feel comfortable moving forward without resolving it. I have been checking forums for the last two days, and asking here is my last resort.
[root@kube-master ~]# kubectl get pods --all-namespaces
[get pods result][1]
However, when I run kubectl describe pods, I see the error below for the nginx pod in the Events section.
Warning FailedCreatePodSandBox 41s (x8 over 11m) kubelet,
kube-worker1 (combined from similar events): Failed to create pod
sandbox: rpc error: code = Unknown desc = failed to set up sandbox
container
"ac77a42270009cba0c508e2fd82a84d6caef287bdb117d288d5193960b52abcb"
network for pod "nginx-6db489d4b7-2r4d2": networkPlugin cni failed to
set up pod "nginx-6db489d4b7-2r4d2_default" network: unable to connect
to Cilium daemon: failed to create cilium agent client after 30.000000
seconds timeout: Get http:///var/run/cilium/cilium.sock/v1/config:
dial unix /var/run/cilium/cilium.sock: connect: no such file or
directory
Hope you can help here.
Edit 1:
The ip address of the master VM is 192.168.40.133
I used the command below to initialize the cluster with kubeadm:
kubeadm init --pod-network-cidr=192.168.0.0/16 --apiserver-advertise-address 192.168.40.133
I used the command below to install the pod network:
kubectl apply -f https://docs.projectcalico.org/v3.11/manifests/calico.yaml
The kubeadm init above gave me the join command that I used to join the workers into the cluster.
All the VMs are connected to host and bridged network adapters.

Your pod subnet (specified by --pod-network-cidr) overlaps with the network your VMs are located in: 192.168.0.0/16 contains the master's address 192.168.40.133. The two ranges have to be distinct. Use something else for the pod subnet, for example 10.244.0.0/16, and then edit calico.yaml before applying it, as described in the official docs:
POD_CIDR="10.244.0.0/16"
kubeadm init --pod-network-cidr=${POD_CIDR} --apiserver-advertise-address 192.168.40.133
curl https://docs.projectcalico.org/manifests/calico.yaml -O
sed -i -e "s?192.168.0.0/16?${POD_CIDR}?g" calico.yaml
kubectl apply -f calico.yaml
Hope this helps :)
Note: you don't really need to specify the --apiserver-advertise-address flag; kubeadm correctly detects the main IP of the machine most of the time.
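To check a candidate pod subnet against the VM network before running kubeadm init, a small overlap test can help. The sketch below is plain POSIX shell and handles IPv4 only; the function names ip_to_int and cidr_overlap are made up for this illustration, and 192.168.40.0/24 is an assumed VM network inferred from the master's address in the question.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1            # split the address into its four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Print "overlap" if two IPv4 CIDR blocks share any address, else "distinct".
# Two blocks overlap exactly when they agree on the shorter of the two prefixes.
cidr_overlap() {
  ip1=${1%/*}; len1=${1#*/}
  ip2=${2%/*}; len2=${2#*/}
  if [ "$len1" -lt "$len2" ]; then len=$len1; else len=$len2; fi
  if [ "$len" -eq 0 ]; then
    mask=0
  else
    # 4294967295 is 0xFFFFFFFF; keep only the top $len bits.
    mask=$(( (4294967295 << (32 - len)) & 4294967295 ))
  fi
  n1=$(ip_to_int "$ip1")
  n2=$(ip_to_int "$ip2")
  if [ $(( n1 & mask )) -eq $(( n2 & mask )) ]; then
    echo overlap
  else
    echo distinct
  fi
}

cidr_overlap 192.168.0.0/16 192.168.40.0/24   # pod subnet from the question: overlap
cidr_overlap 10.244.0.0/16 192.168.40.0/24    # suggested pod subnet: distinct
```

The same check is worth repeating against the service subnet and any other network the nodes can route to.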

Related

I cannot load the node information on kubernetes

When I ran the command below on the worker node, I got the following message:
bistel@BISTelResearchDev-DN03:~$ kubectl get nodes
The connection to the server localhost:8080 was refused - did you specify the right host or port?
On the master node, the same command returns the information below:
bistel@BISTelResearchDev-NN:/etc/kubernetes$ kubectl get nodes
NAME                     STATUS     ROLES    AGE   VERSION
bistelresearchdev-dn03   NotReady   <none>   62s   v1.19.3
bistelresearchdev-nn     Ready      master   57m   v1.19.3
bistelresearchdev-dn03 is the worker node, and the message "The connection to the server localhost:8080 was refused - did you specify the right host or port?" appears whenever I run any kubectl command on it.
I googled it a lot, but nothing I tried worked for me.
Thanks,
By default, kubectl is configured only on the master node of the cluster, so getting this error on a worker node is expected and is not a problem in itself.
The actual issue I can see here is that the node is in NotReady status. For that, you can check the following:
Check that kubelet is running on node bistelresearchdev-dn03 with systemctl status kubelet.
Check that a network plugin is installed on your cluster.
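To pick out the affected nodes quickly, the kubectl get nodes output can be filtered on its STATUS column. This is a minimal sketch; the helper name not_ready_nodes is invented for this example, and it treats any status other than plain "Ready" (including combined values such as "Ready,SchedulingDisabled") as worth a look.

```shell
# Print the names of nodes whose STATUS column is anything other than "Ready".
# Input on stdin: the text printed by `kubectl get nodes`, header line included.
not_ready_nodes() {
  awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Sample input copied from the question. On the master you would instead run:
#   kubectl get nodes | not_ready_nodes
not_ready_nodes <<'EOF'
NAME STATUS ROLES AGE VERSION
bistelresearchdev-dn03 NotReady <none> 62s v1.19.3
bistelresearchdev-nn Ready master 57m v1.19.3
EOF
```

With the sample above this prints bistelresearchdev-dn03, which is the node to inspect with systemctl status kubelet.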
The first machine you ran kubectl on is missing the kubeconfig file.
Normally kubectl expects to find it at
~/.kube/config
If you copy the one from the master node onto that machine, kubectl will find it and be able to use it.
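A quick way to see which file kubectl is looking for, and whether it exists, is sketched below. The helper name kubeconfig_status is hypothetical, and the sketch assumes KUBECONFIG, if set, holds a single path rather than a colon-separated list. The commented copy step assumes a kubeadm-built cluster, where the admin kubeconfig normally lives at /etc/kubernetes/admin.conf on the master; the user and host placeholders must be filled in for your setup.

```shell
# Report which kubeconfig file kubectl would use and whether it exists.
kubeconfig_status() {
  # kubectl honours $KUBECONFIG first, then falls back to ~/.kube/config.
  cfg=${KUBECONFIG:-$HOME/.kube/config}
  if [ -f "$cfg" ]; then
    echo "found: $cfg"
  else
    echo "missing: $cfg"
  fi
}

kubeconfig_status

# If the file is missing, copy it from the master (placeholders are assumptions):
#   mkdir -p "$HOME/.kube"
#   scp <user>@<master-host>:/etc/kubernetes/admin.conf "$HOME/.kube/config"
```

Keep in mind that admin.conf grants full cluster-admin access, so copy it only to machines you trust.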

MountVolume.SetUp failed for volume "rook-ceph-crash-collector-keyring" : secret "rook-ceph-crash-collector-keyring" not found

I am trying to configure Ceph on a Kubernetes cluster using Rook. I have run the following commands:
kubectl apply -f common.yaml
kubectl apply -f operator.yaml
kubectl apply -f cluster.yaml
I have three worker nodes with attached volumes and one master. All the created pods are running except the rook-ceph-crashcollector pods for the three nodes. When I describe these pods I get this message:
MountVolume.SetUp failed for volume "rook-ceph-crash-collector-keyring" : secret "rook-ceph-crash-collector-keyring" not found
However all the nodes are running and working
It is hard to tell exactly what the cause might be, but there are a few possibilities:
Cluster networking problem between nodes
Some possible leftover sockets in the /var/lib/kubelet directory related to rook ceph.
A bug when connecting to an external Ceph cluster.
In order to fix your issue you can:
Use Flannel and make sure it is using the right interface. Check the kube-flannel.yml file and see if it uses the --iface= option. Or alternatively try to use Calico.
Clear the ./var/lib/rook/, ./var/lib/kubelet/plugins/ and ./var/lib/kubelet/plugins_registry/ directories and reinstall the rook service.
Create the rook-ceph-crash-collector-keyring secret manually by executing: kubectl -n rook-ceph create secret generic rook-ceph-crash-collector-keyring.

Kubernetes Nginx Ingress controller Readiness Probe failed

I am trying to set up my very first Kubernetes cluster, and everything seemed to go fine until I got to the nginx-ingress controller.
Here is my cluster information:
Nodes: three RHEL7 and one RHEL8 nodes
Master is running on RHEL7
Kubernetes server version: 1.19.1
Networking used: flannel
coredns is running fine.
selinux and firewall are disabled on all nodes
Here are all my pods running in kube-system:
I then followed the instructions on the following page to install the nginx ingress controller: https://docs.nginx.com/nginx-ingress-controller/installation/installation-with-manifests/
Instead of a deployment, I decided to use a daemon-set, since I am going to have only a few nodes running in my Kubernetes cluster.
After following the instructions, the pod on my RHEL8 node is constantly failing with the following error:
Readiness probe failed: Get "http://10.244.3.2:8081/nginx-ready": dial
tcp 10.244.3.2:8081: connect: connection refused Back-off restarting
failed container
Here is a screenshot showing that the RHEL7 pods are working just fine and the RHEL8 pod is failing:
All nodes are set up exactly the same way; there is no difference between them.
I am very new to Kubernetes and don't know much about its internals. Can someone please point me to how I can debug and fix this issue? I am really willing to learn from issues like this.
This is how I provisioned RHEL7 and RHEL8 nodes
Installed docker version: 19.03.12, build 48a66213fe
Disabled firewalld
Disabled swap
Disabled SELinux
To enable iptables to see bridged traffic, set net.bridge.bridge-nf-call-ip6tables = 1 and net.bridge.bridge-nf-call-iptables = 1
Added hosts entry for all the nodes involved in Kubernetes cluster so that they can find each other without hitting DNS
Added IP address of all nodes in Kubernetes cluster on /etc/environment for no_proxy so that it doesn't hit corporate proxy
Verified docker driver to be "systemd" and NOT "cgroupfs"
Reboot server
Install kubectl, kubeadm, kubelet as per kubernetes guide here at: https://kubernetes.io/docs/tasks/tools/install-kubectl/
Start and enable kubelet service
Initialize master by executing the following:
kubeadm init --pod-network-cidr=10.244.0.0/16 --service-cidr=10.96.0.0/12
Apply node-selector patch for mixed OS scheduling
wget https://raw.githubusercontent.com/Microsoft/SDN/master/Kubernetes/flannel/l2bridge/manifests/node-selector-patch.yml
kubectl patch ds/kube-proxy --patch "$(cat node-selector-patch.yml)" -n=kube-system
Apply flannel CNI
wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Modify net-conf.json section of kube-flannel.yml for a type "host-gw"
kubectl apply -f kube-flannel.yml
Apply node selector patch
kubectl patch ds/kube-flannel-ds-amd64 --patch "$(cat node-selector-patch.yml)" -n=kube-system
Thanks
According to kubernetes documentation the list of supported host operating systems is as follows:
Ubuntu 16.04+
Debian 9+
CentOS 7
Red Hat Enterprise Linux (RHEL) 7
Fedora 25+
HypriotOS v1.0.1+
Flatcar Container Linux (tested with 2512.3.0)
This article mentions that there are network issues on RHEL 8:
(2020/02/11 Update: After installation, I keep facing a pod network issue where a deployed pod is unable to reach the external network, or pods deployed on different workers are unable to ping each other, even though I can see all nodes (master, worker1 and worker2) are ready via kubectl get nodes. After checking the official Kubernetes.io website, I observed that the nftables backend is not compatible with the current kubeadm packages. Please refer to the following link in "Ensure iptables tooling does not use the nftables backend".)
The simplest solution here is to reinstall the node on a supported operating system.
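Before reinstalling, it may be worth confirming which backend the iptables binary on each node uses. iptables 1.8 and later append the backend in parentheses to the output of iptables --version; the sketch below only classifies that string. The function name iptables_backend is made up for this example, and the sample version strings are illustrative rather than taken from the question.

```shell
# Classify the backend reported in an `iptables --version` string.
iptables_backend() {
  case "$1" in
    *"(nf_tables)"*) echo nftables ;;
    *"(legacy)"*)    echo legacy ;;
    # Releases before 1.8 print no suffix at all.
    *)               echo unspecified ;;
  esac
}

# On a node you would run:  iptables_backend "$(iptables --version)"
iptables_backend "iptables v1.8.4 (nf_tables)"
iptables_backend "iptables v1.8.2 (legacy)"
iptables_backend "iptables v1.4.21"
```

If the RHEL 8 node reports nftables while the RHEL 7 nodes do not, that difference matches the incompatibility described in the quoted article.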

Unable to run pods on new node

I had to replace a node (server) with a new one while keeping the same node name. What I did was:
master> kubectl delete no srv1 (removing old node)
srv1> kubeadm join... (joining new node)
After the new node joined the cluster, no pods can be created on it.
Warning FailedCreatePodSandBox 16s kubelet, srv1 Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "b85728b51a18533e9d57f6a1b1808dbb5ad72bff4d516217de04e7dad4ce358d" network for pod "dpl-6f56777485-6jzm6": NetworkPlugin cni failed to set up pod "dpl-6f56777485-6jzm6_default" network: failed to set bridge addr: "cni0" already has an IP address different from 10.244.16.1/24
Ideally, when performing a task like "replacing a node", the steps below should be followed:
Drain the node: kubectl drain NODE_NAME
Reset the node: run kubeadm reset on the old node (optional step, only if the old node is still accessible)
Finally, delete the node: kubectl delete node NODE_NAME
Things to consider when replacing an old node with a new node:
The new node should have the same name as the old node, that is, echo $HOSTNAME should return the same value.
The new node should have the same IP as the old one.
This is because these make up the node's identity.
Finally, if you have already run kubectl delete node ... and replaced the node with a new one, remove the network plugin, clean up the stale interfaces, and re-apply the plugin:
curl -LO https://raw.githubusercontent.com/coreos/flannel/62e44c867a2846fefb68bd5f178daf4da3095ccb/Documentation/kube-flannel.yml
kubectl delete -f kube-flannel.yml
[perform below in the nodes which are having problems]
sudo ip link del cni0
sudo ip link del flannel.1
sudo systemctl restart network
[re-apply network plugin]
kubectl apply -f kube-flannel.yml
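To confirm that a stale bridge address is the problem before deleting interfaces, the address on cni0 can be compared with the one named in the error message. This is a rough sketch: bridge_addr and check_bridge are hypothetical helper names, and the sample line is an invented example of what ip -o -4 addr show dev cni0 might print on an affected node.

```shell
# Extract the IPv4 address (with prefix length) from one line of
# `ip -o -4 addr show dev cni0` output.
bridge_addr() {
  printf '%s\n' "$1" | awk '$3 == "inet" { print $4; exit }'
}

# $1 = address the CNI plugin expects, $2 = output of the ip command
check_bridge() {
  actual=$(bridge_addr "$2")
  if [ "$actual" = "$1" ]; then
    echo "ok: cni0 has $actual"
  else
    echo "stale: cni0 has $actual, expected $1"
  fi
}

# Invented sample line; on the node you would instead run:
#   check_bridge 10.244.16.1/24 "$(ip -o -4 addr show dev cni0)"
check_bridge 10.244.16.1/24 '5: cni0    inet 10.244.1.1/24 brd 10.244.1.255 scope global cni0'
```

A "stale" result means the bridge still carries the address from the node's previous pod subnet, which is what deleting cni0 and flannel.1 clears.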

Something seems to be catching TCP traffic to pods

I'm trying to deploy Kubernetes with Calico (IPIP) with Kubeadm. After deployment is done I'm deploying Calico using these manifests
kubectl apply -f https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/rbac-kdd.yaml
kubectl apply -f https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml
Before applying it, I edit CALICO_IPV4POOL_CIDR and set it to 10.250.0.0/17, and I use the command kubeadm init --pod-cidr 10.250.0.0/17.
After a few seconds the CoreDNS pods (for example, one that gets the address 10.250.2.2) start restarting with the error 10.250.2.2:8080 connection refused.
Now a bit of digging:
from any node in cluster ping 10.250.2.2 works and it reaches pod (tcpdump in pod net namespace shows it).
from different pod (on different node) curl 10.250.2.2:8080 works well
from any node to curl 10.250.2.2:8080 fails with connection refused
Because it's coredns pod it listens on 53 both udp and tcp, so I've tried netcat from nodes
nc 10.250.2.2 53 - connection refused
nc -u 10.250.2.2 53 - works
I then ran tcpdump on each interface of the source node for port 8080, and the curl to the CoreDNS pod doesn't even seem to leave the node... so iptables?
I've also tried weave, canal and flannel; all seem to have the same issue.
I've run out of ideas by now... any pointers, please?
This seems to be a problem with the Calico setup; CoreDNS Pods depend on the CNI network Pods functioning correctly.
For the CNI network plugin to work properly, you have to pass the --pod-network-cidr flag to the kubeadm init command and afterwards set the same value for the CALICO_IPV4POOL_CIDR parameter inside calico.yml.
Moreover, for a successful Pod network installation you have to apply some RBAC rules to grant sufficient permissions in line with the general cluster security restrictions, as described in the official Kubernetes documentation:
For Calico to work correctly, you need to pass --pod-network-cidr=192.168.0.0/16 to kubeadm init or update the calico.yml file to match your Pod network. Note that Calico works on amd64 only.
kubectl apply -f https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/rbac-kdd.yaml
kubectl apply -f https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml
In your case I would switch to the latest Calico version, or at least to v3.3 as given in the example.
If you have confirmed that the Pod network plugin was installed properly, please update the question with your current environment setup, the versions of the Kubernetes components, and their health statuses.
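To verify that the manifest and kubeadm agree on the pod CIDR, the value can be read back from the downloaded calico.yaml and compared. This is a sketch under assumptions: calico_pool_cidr and check_pod_cidr are invented helper names, and the parsing assumes the value sits on the line directly after an uncommented "name: CALICO_IPV4POOL_CIDR" entry, as in the v3.3 manifest.

```shell
# Print the value configured for CALICO_IPV4POOL_CIDR in a Calico manifest.
# Assumes this layout, with the entry not commented out:
#   - name: CALICO_IPV4POOL_CIDR
#     value: "192.168.0.0/16"
calico_pool_cidr() {
  awk '/^[^#]*name: CALICO_IPV4POOL_CIDR/ {
         getline                 # move to the "value:" line
         gsub(/[" ]/, "")        # drop quotes and indentation
         sub(/^value:/, "")
         print
         exit
       }' "$1"
}

# $1 = manifest path, $2 = CIDR passed to kubeadm init --pod-network-cidr
check_pod_cidr() {
  found=$(calico_pool_cidr "$1")
  if [ "$found" = "$2" ]; then
    echo "match: $found"
  else
    echo "mismatch: manifest has '$found', kubeadm was given '$2'"
  fi
}

# Usage, after downloading and editing the manifest:
#   check_pod_cidr calico.yaml 10.250.0.0/17
```

A mismatch here means Calico hands out pod addresses from a range the rest of the cluster was not told about.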