Microk8s fails to AUTOMATICALLY mount pod on Longhorn - kubernetes

I have a single-node Kubernetes setup on Ubuntu 20.04. I am using microk8s with Longhorn storage for this cluster, and I install packages using Helm via the Lens IDE. I have configured everything as per the respective guides, but any time I install a package that requires persistence, e.g. MariaDB or WordPress, the following happens:
The PV and PVC get created and are Bound successfully.
The pod is not created successfully and throws the error below:
MountVolume.SetUp failed for volume "pvc-fdada93c-c4af-4916-942f-abf9897feaf9" : applyFSGroup failed for vol pvc-fdada93c-c4af-4916-942f-abf9897feaf9: lstat /var/snap/microk8s/common/var/lib/kubelet/pods/f69173e1-cd98-4f86-9e52-edf62fa723da/volumes/kubernetes.io~csi/pvc-fdada93c-c4af-4916-942f-abf9897feaf9/mount: no such file or directory
When I manually create the directory using the command below, the pod starts successfully:
mkdir -p /var/snap/microk8s/common/var/lib/kubelet/pods/f69173e1-cd98-4f86-9e52-edf62fa723da/volumes/kubernetes.io~csi/pvc-fdada93c-c4af-4916-942f-abf9897feaf9/mount
The issue then repeats if I reboot the server.
Question: How can I get the pods to mount automatically when I install a package from Helm? I have seen this happen on similar single-node clusters using the same software.
NOTE: nfs-common and open-iscsi are both running

I was able to figure out the issue.
The issue was actually not due to Longhorn itself. It was due to CoreDNS.
Due to firewall restrictions, CoreDNS could not resolve internal Kubernetes DNS names, in particular longhorn-backend.
Because the UI and the driver could not reach longhorn-backend, they could never start. Fixing the CoreDNS issues got the Longhorn services working, and my PVCs and PVs then worked as expected.
The steps to resolve it were as follows:
Check the coredns pod for errors
kubectl logs coredns-7f9c69c78c-7dsjg -n kube-system
Any output other than simply the coredns version means you need to resolve the errors shown.
For me, that meant disabling the firewalls and adding 8.8.8.8 to my node's /etc/resolv.conf file.
Once resolved, you can either wait a minute for coredns to resolve internal DNS or restart it with the command below:
kubectl rollout restart deployment/coredns -n kube-system
Everything worked well after that!
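The two checks described above can be sketched as a pair of shell functions. This is only a sketch under assumptions that the post does not confirm: that the CoreDNS pods carry the label k8s-app=kube-dns, that Longhorn is installed in the longhorn-system namespace, and that a busybox image can be pulled. The function names are hypothetical, and on microk8s you may need microk8s kubectl in place of kubectl.

```shell
# Print only the CoreDNS log lines flagged as errors or warnings.
# Assumption: CoreDNS pods are labelled k8s-app=kube-dns.
coredns_problem_lines() {
  kubectl logs -n kube-system -l k8s-app=kube-dns --tail=200 2>&1 |
    grep -E '\[(ERROR|WARNING)\]'
}

# Resolve the Longhorn backend service from inside the cluster
# using a throwaway pod that is removed when the lookup finishes.
# Assumptions: namespace longhorn-system, busybox:1.36 is pullable.
check_longhorn_dns() {
  kubectl run dns-check --rm -i --restart=Never --image=busybox:1.36 -- \
    nslookup longhorn-backend.longhorn-system.svc.cluster.local
}
```

Running coredns_problem_lines prints nothing and returns non-zero when the recent logs are clean. If check_longhorn_dns fails to resolve the name, the problem is likely still in cluster DNS rather than in Longhorn.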

Related

How to restore an accidentally deleted kube-proxy DaemonSet in a Kubernetes cluster?

I accidentally deleted the kube-proxy DaemonSet, which should run the kube-proxy pods in my cluster, by using the command kubectl delete -n kube-system daemonset kube-proxy. What is the best way to restore it?
That's how it should look
Kubernetes allows you to reinstall kube-proxy by running the following command, which installs the kube-proxy addon components via the API server:
$ kubeadm init phase addon kube-proxy --kubeconfig ~/.kube/config --apiserver-advertise-address string
This will generate the following output:
[addons] Applied essential addon: kube-proxy
The --apiserver-advertise-address flag sets the IP address on which the API server will advertise that it is listening. If it is not set, the default network interface is used.
kube-proxy is thus reinstalled in the cluster by creating a DaemonSet and launching the pods.
The kube-proxy DaemonSet was created at cluster creation time, so you need to write your own DaemonSet manifest unless you have a backup to restore it from.
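For the backup route mentioned above, a minimal sketch could look like the following. The function names and the default file name kube-proxy-daemonset.yaml are hypothetical. The exported manifest also contains server-populated fields such as status, so you may prefer to trim it before keeping it as a long-term backup.

```shell
# Save the live kube-proxy DaemonSet manifest so it can be re-applied later.
# The default file name is a hypothetical choice; pass your own path as $1.
backup_kube_proxy() {
  backup_file="${1:-kube-proxy-daemonset.yaml}"
  kubectl get daemonset kube-proxy -n kube-system -o yaml > "$backup_file"
}

# Re-create the DaemonSet from a saved manifest, then wait for its pods.
restore_kube_proxy() {
  backup_file="${1:-kube-proxy-daemonset.yaml}"
  kubectl apply -f "$backup_file" &&
    kubectl rollout status daemonset/kube-proxy -n kube-system
}
```

Call backup_kube_proxy while the cluster is healthy, and restore_kube_proxy with the same path after an accidental delete.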

Kubernetes pod failed to update

We have a GitLab CI/CD pipeline that deploys pods to Kubernetes. However, the updated pod is always Pending and the deleted pod is always stuck at Terminating.
The controller and scheduler are both okay.
If I describe the pending pod, it shows that it is scheduled but nothing else.
This is the pending pod's logs:
$ kubectl logs -f robo-apis-dev-7b79ccf74b-nr9q2 -n xxx -f
Error from server (BadRequest): container "robo-apis-dev" in pod "robo-apis-dev-7b79ccf74b-nr9q2" is waiting to start: ContainerCreating
What could be the issue? Our Kubernetes cluster never had this issue before.
Okay, it turns out we used to have an NFS server backing a PVC. We recently moved to AWS EKS and cleaned up the NFS servers as part of that. Maybe some resources from the nodes were still on the NFS server. Once we temporarily rolled back the NFS server, the pods started moving to the Running state.
The issue was discussed here - Orphaned pod https://github.com/kubernetes/kubernetes/issues/60987
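While a container is still in ContainerCreating, kubectl logs has nothing to show, so the pod's events are usually the more useful place to look for a stuck volume mount. A small sketch follows; the function name pod_events is hypothetical, and the pod name and namespace in the usage line are the ones from the question.

```shell
# Show the events recorded for one pod, oldest first. These usually name
# the volume or image that is blocking container creation.
# Arguments: $1 = pod name, $2 = namespace.
pod_events() {
  kubectl get events -n "$2" \
    --field-selector "involvedObject.name=$1" \
    --sort-by=.lastTimestamp
}
```

For example, pod_events robo-apis-dev-7b79ccf74b-nr9q2 xxx lists the events for the pending pod above.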

MountVolume.SetUp failed for volume "rook-ceph-crash-collector-keyring" : secret "rook-ceph-crash-collector-keyring" not found

I am trying to configure Ceph on a Kubernetes cluster using Rook. I have run the following commands:
kubectl apply -f common.yaml
kubectl apply -f operator.yaml
kubectl apply -f cluster.yaml
I have three worker nodes with attached volumes and one master. All the created pods are running except the rook-ceph-crashcollector pods for the three nodes. When I describe these pods I get this message:
MountVolume.SetUp failed for volume "rook-ceph-crash-collector-keyring" : secret "rook-ceph-crash-collector-keyring" not found
However, all the nodes are running and working.
It is hard to tell exactly what the cause might be, but there are a few possibilities:
Cluster networking problem between nodes
Some possible leftover sockets in the /var/lib/kubelet directory related to rook ceph.
A bug when connecting to an external Ceph cluster.
In order to fix your issue you can:
Use Flannel and make sure it is using the right interface. Check the kube-flannel.yml file and see if it uses the --iface= option. Or alternatively try to use Calico.
Clear the ./var/lib/rook/, ./var/lib/kubelet/plugins/ and ./var/lib/kubelet/plugins_registry/ directories and reinstall the rook service.
Create the rook-ceph-crash-collector-keyring secret manually by executing: kubectl -n rook-ceph create secret generic rook-ceph-crash-collector-keyring.
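The last option can be made safe to re-run by checking for the secret first. This is a sketch only: the function name is hypothetical, and it creates the same empty placeholder secret as the command above, in the rook-ceph namespace used there.

```shell
# Create the crash collector keyring secret only if it is missing.
ensure_crash_collector_secret() {
  ns="rook-ceph"
  name="rook-ceph-crash-collector-keyring"
  if kubectl -n "$ns" get secret "$name" >/dev/null 2>&1; then
    echo "secret $name already exists"
  else
    kubectl -n "$ns" create secret generic "$name"
  fi
}
```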

Kubernetes Nginx Ingress controller Readiness Probe failed

I am trying to set up my very first Kubernetes cluster, and it seemed to set up fine until I got to the nginx-ingress controller.
Here is my cluster information:
Nodes: three RHEL7 and one RHEL8 nodes
Master is running on RHEL7
Kubernetes server version: 1.19.1
Networking used: flannel
coredns is running fine.
selinux and firewall are disabled on all nodes
Here are all my pods running in kube-system:
I then followed the instructions on the following page to install the nginx ingress controller: https://docs.nginx.com/nginx-ingress-controller/installation/installation-with-manifests/
Instead of a Deployment, I decided to use a DaemonSet since I am going to have only a few nodes running in my Kubernetes cluster.
After following the instructions, the pod on my RHEL8 node is constantly failing with the following error:
Readiness probe failed: Get "http://10.244.3.2:8081/nginx-ready": dial tcp 10.244.3.2:8081: connect: connection refused
Back-off restarting failed container
Here is a screenshot showing that the RHEL7 pods are working just fine and the RHEL8 pod is failing:
All nodes are set up exactly the same way and there is no difference.
I am very new to Kubernetes and don't know much about its internals. Can someone please point me to how I can debug and fix this issue? I am really willing to learn from issues like this.
This is how I provisioned RHEL7 and RHEL8 nodes
Installed docker version: 19.03.12, build 48a66213fe
Disabled firewalld
Disabled swap
Disabled SELinux
To enable iptables to see bridged traffic, set net.bridge.bridge-nf-call-ip6tables = 1 and net.bridge.bridge-nf-call-iptables = 1
Added hosts entry for all the nodes involved in Kubernetes cluster so that they can find each other without hitting DNS
Added IP address of all nodes in Kubernetes cluster on /etc/environment for no_proxy so that it doesn't hit corporate proxy
Verified docker driver to be "systemd" and NOT "cgroupfs"
Reboot server
Install kubectl, kubeadm, kubelet as per kubernetes guide here at: https://kubernetes.io/docs/tasks/tools/install-kubectl/
Start and enable kubelet service
Initialize master by executing the following:
kubeadm init --pod-network-cidr=10.244.0.0/16 --service-cidr=10.96.0.0/12
Apply node-selector patch for mixed OS scheduling
wget https://raw.githubusercontent.com/Microsoft/SDN/master/Kubernetes/flannel/l2bridge/manifests/node-selector-patch.yml
kubectl patch ds/kube-proxy --patch "$(cat node-selector-patch.yml)" -n=kube-system
Apply flannel CNI
wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Modify net-conf.json section of kube-flannel.yml for a type "host-gw"
kubectl apply -f kube-flannel.yml
Apply node selector patch
kubectl patch ds/kube-flannel-ds-amd64 --patch "$(cat node-selector-patch.yml)" -n=kube-system
Thanks
According to the Kubernetes documentation, the list of supported host operating systems is as follows:
Ubuntu 16.04+
Debian 9+
CentOS 7
Red Hat Enterprise Linux (RHEL) 7
Fedora 25+
HypriotOS v1.0.1+
Flatcar Container Linux (tested with 2512.3.0)
This article mentions that there are network issues on RHEL 8:
(2020/02/11 Update: After installation, I keep facing a pod network issue where a deployed pod is unable to reach the external network, or pods deployed on different workers are unable to ping each other, even though I can see all nodes (master, worker1 and worker2) are ready via kubectl get nodes. After checking the Kubernetes.io official website, I observed the nftables backend is not compatible with the current kubeadm packages. Please refer to the following link in "Ensure iptables tooling does not use the nftables backend".)
The simplest solution here is to reinstall the node on a supported operating system.
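Before reinstalling, it can be worth confirming which iptables backend a node actually uses. iptables 1.8 and later print the backend in parentheses in the output of iptables --version, so a small classifier is enough. This is a sketch only; the function name iptables_backend is hypothetical.

```shell
# Classify the backend from the text printed by `iptables --version`,
# e.g. "iptables v1.8.4 (nf_tables)" or "iptables v1.8.4 (legacy)".
# Versions before 1.8 print no backend and are reported as "unknown".
iptables_backend() {
  case "$1" in
    *nf_tables*) echo "nftables" ;;
    *legacy*)    echo "legacy" ;;
    *)           echo "unknown" ;;
  esac
}
```

On a node, run it as iptables_backend "$(iptables --version)". A result of nftables on the RHEL8 node would be consistent with the incompatibility quoted above.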

How to restart kube-proxy in Kubernetes 1.2 (GKE)

As of Kubernetes 1.2, kube-proxy is now a pod running in the kube-system namespace.
The old init script /etc/init.d/kube-proxy has been removed.
Aside from simply resetting the GCE instance, is there a good way to restart kube-proxy?
I just added an annotation to change the proxy mode, and I need to restart kube-proxy for my change to take effect.
The kube-proxy is run as an addon pod, meaning the Kubelet will automatically restart it if it goes away. This means you can restart the kube-proxy pod by simply deleting it:
$ kubectl delete pod --namespace=kube-system kube-proxy-${NODE_NAME}
Where $NODE_NAME is the node you want to restart the proxy on. This assumes a default configuration; otherwise, the output of kubectl get pods --namespace=kube-system should include the list of kube-proxy pods.
If the restarted kube-proxy is missing your annotation change, you may need to update the manifest file, usually found in /etc/kubernetes/manifests on the node.
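If you are unsure of the exact pod names, the lookup and the delete can be sketched as two small functions. The function names are hypothetical, and the sketch assumes the default kube-proxy-NODE naming described above.

```shell
# List kube-proxy pods by name, without relying on any particular label.
list_kube_proxy_pods() {
  kubectl get pods --namespace=kube-system -o name | grep kube-proxy
}

# Delete the kube-proxy pod for one node so that it gets recreated.
# Argument: $1 = node name, as it appears in the pod name.
restart_kube_proxy() {
  kubectl delete pod --namespace=kube-system "kube-proxy-$1"
}
```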