MountVolume.SetUp failed for volume "rook-ceph-crash-collector-keyring" : secret "rook-ceph-crash-collector-keyring" not found - kubernetes

I am trying to configure Ceph on a Kubernetes cluster using Rook. I have run the following commands:
kubectl apply -f common.yaml
kubectl apply -f operator.yaml
kubectl apply -f cluster.yaml
I have three worker nodes with attached volumes and one master. All the created pods are running except the rook-ceph-crashcollector pods for the three nodes; when I describe these pods I get this message:
MountVolume.SetUp failed for volume "rook-ceph-crash-collector-keyring" : secret "rook-ceph-crash-collector-keyring" not found
However, all the nodes are running and working.

It is hard to tell exactly what the cause of this might be, but there are a few possibilities:
A cluster networking problem between nodes.
Leftover sockets in the /var/lib/kubelet directory related to Rook Ceph.
A bug when connecting to an external Ceph cluster.
In order to fix your issue you can:
Use Flannel and make sure it is using the right interface: check the kube-flannel.yml file and see whether it uses the --iface= option. Alternatively, try Calico.
Clear the /var/lib/rook/, /var/lib/kubelet/plugins/ and /var/lib/kubelet/plugins_registry/ directories on the nodes and reinstall the Rook service (see the sketch after this list).
Create the rook-ceph-crash-collector-keyring secret manually by executing: kubectl -n rook-ceph create secret generic rook-ceph-crash-collector-keyring.
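A rough sketch of the last two fixes, assuming three workers reachable over SSH as worker-1, worker-2 and worker-3 and the same common.yaml/operator.yaml/cluster.yaml used above (adjust names and paths to your environment):
for node in worker-1 worker-2 worker-3; do
  # remove leftover Rook and CSI state on each node
  ssh "$node" sudo rm -rf /var/lib/rook /var/lib/kubelet/plugins /var/lib/kubelet/plugins_registry
done
kubectl delete -f cluster.yaml -f operator.yaml -f common.yaml
kubectl apply -f common.yaml -f operator.yaml -f cluster.yaml
# or, if only the keyring is missing, create the secret by hand:
kubectl -n rook-ceph create secret generic rook-ceph-crash-collector-keyring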

Related

Microk8s fails to AUTOMATICALLY mount pod on Longhorn

I have a single-node Kubernetes setup on Ubuntu 20.04. I am using microk8s and Longhorn storage for my single-node cluster. I install packages using Helm via the Lens IDE. I have configured everything as per the respective guides, but any time I install a package that requires persistence, e.g. MariaDB or WordPress, the following happens:
The PV and PVC get created and Bound successfully.
The pod does not get created successfully and throws the error below:
MountVolume.SetUp failed for volume "pvc-fdada93c-c4af-4916-942f-abf9897feaf9" : applyFSGroup failed for vol pvc-fdada93c-c4af-4916-942f-abf9897feaf9: lstat /var/snap/microk8s/common/var/lib/kubelet/pods/f69173e1-cd98-4f86-9e52-edf62fa723da/volumes/kubernetes.io~csi/pvc-fdada93c-c4af-4916-942f-abf9897feaf9/mount: no such file or directory
When I manually create the directory using the command below, the pod starts successfully:
mkdir -p /var/snap/microk8s/common/var/lib/kubelet/pods/f69173e1-cd98-4f86-9e52-edf62fa723da/volumes/kubernetes.io~csi/pvc-fdada93c-c4af-4916-942f-abf9897feaf9/mount
The issue then repeats after a server reboot.
Question: How can I get the pods to mount automatically when I install a package from Helm? I have seen this work on similar single-node clusters using the same software.
NOTE: nfs-common and open-iscsi are both running
I was able to figure out the issue.
The issue was actually not due to Longhorn itself. It was due to CoreDNS.
Due to firewall restrictions, CoreDNS could not resolve internal Kubernetes DNS names, especially longhorn-backend.
Since the UI and Driver could not reach longhorn-backend, they could never start. Fixing the CoreDNS issues got the Longhorn services working, and my PVCs and PVs also worked as expected.
The steps to resolve it were as follows:
Check the coredns pod for errors
kubectl logs coredns-7f9c69c78c-7dsjg -n kube-system
Any output other than simply the CoreDNS version means you need to resolve the errors shown.
For me this was done by disabling the firewall and adding 8.8.8.8 to my node's /etc/resolv.conf file.
Once resolved, you can either wait a minute for CoreDNS to resolve internal DNS or restart it with the command below:
kubectl rollout restart deployment/coredns -n kube-system
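Once CoreDNS is back, a quick way to confirm that in-cluster DNS resolves again is a one-off lookup pod (longhorn-system is an assumption for the namespace holding the longhorn-backend service; adjust it to your install):
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.28 -- nslookup longhorn-backend.longhorn-system.svc.cluster.local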
Everything worked well after that!

Minikube - /var/log/kubeproxy.log: No such file or directory

I am trying to find the kube-proxy logs on minikube, but they do not seem to be where I expected:
sudo cat: /var/log/kubeproxy.log: No such file or directory
A more generic way (besides what hoque described) that you can use on any kubernetes cluster is to check the logs using kubectl.
kubectl logs kube-proxy-s8lcb -n kube-system
Using this solution allows you to check logs for any K8s cluster, even if you don't have access to your nodes.
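If you don't know the exact pod name, you can usually select the kube-proxy pods by label instead; k8s-app=kube-proxy is the label kubeadm and minikube typically apply (verify with kubectl get pods -n kube-system --show-labels):
kubectl logs -n kube-system -l k8s-app=kube-proxy --tail=100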
Pod logs are located in /var/log/pods/.
Run
$ minikube ssh
$ ls /var/log/pods/
default_dapi-test-pod_1566526c-1051-4102-a23b-13b73b1dd904
kube-system_coredns-5d4dd4b4db-7ttnf_59d7b01c-4d7d-40f9-8d6a-ac62b1fa018e
kube-system_coredns-5d4dd4b4db-n8d5t_6aa36b9a-6539-4ef2-b163-c7e713861fa2
kube-system_etcd-minikube_188c8af9ff66b5060895a385b1bb50c2
kube-system_kube-addon-manager-minikube_f7d3bd9bbbbdd48d97a3437e231fff24
kube-system_kube-apiserver-minikube_b15fea5ed20174140af5049ecdd1c59e
kube-system_kube-controller-manager-minikube_d8cdb4170ab1aac172022591866bd7eb
kube-system_kube-proxy-qc4xl_30a6100a-db70-42c1-bbd5-4a818379a004
kube-system_kube-scheduler-minikube_14ff2730e74c595cd255e47190f474fd
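From there the kube-proxy log can usually be read straight from the matching directory. The layout inside it varies between Kubernetes versions (on recent ones it is <container>/<restart-count>.log), so list it first:
ls /var/log/pods/kube-system_kube-proxy-qc4xl_30a6100a-db70-42c1-bbd5-4a818379a004/
sudo cat /var/log/pods/kube-system_kube-proxy-qc4xl_30a6100a-db70-42c1-bbd5-4a818379a004/kube-proxy/0.log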

How to debug kubectl apply for kube-flannel.yml?

I'm trying to create a kubernetes cluster following the document at: https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/
First I installed kubeadm with a Docker image on CoreOS (1520.9.0) inside VirtualBox with Vagrant:
docker run -it \
-v /etc:/rootfs/etc \
-v /opt:/rootfs/opt \
-v /usr/bin:/rootfs/usr/bin \
-e K8S_VERSION=v1.8.4 \
-e CNI_RELEASE=v0.6.0 \
xakra/kubeadm-installer:0.4.7 coreos
This was my kubeadm init:
kubeadm init --pod-network-cidr=10.244.0.0/16
When I run the command:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/v0.9.1/Documentation/kube-flannel.yml
It returns:
clusterrole "flannel" configured
clusterrolebinding "flannel" configured
serviceaccount "flannel" configured
configmap "kube-flannel-cfg" configured
daemonset "kube-flannel-ds" configured
But if I check "kubectl get pods --all-namespaces"
It returns:
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system etcd-coreos1 1/1 Running 0 18m
kube-system kube-apiserver-coreos1 1/1 Running 0 18m
kube-system kube-controller-manager-coreos1 0/1 CrashLoopBackOff 8 19m
kube-system kube-scheduler-coreos1 1/1 Running 0 18m
With journalctl -f -u kubelet I can see this error: Unable to update cni config: No networks found in /etc/cni/net.d
I suspect that something was wrong with the command kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/v0.9.1/Documentation/kube-flannel.yml
Is there a way to know why this command doesn't work? Can I get some logs from anywhere?
Just tonight I used kubespray to provision a vagrant cluster, on CoreOS, using flannel (vxlan), and I was also mystified about how flannel could be a Pod inside Kubernetes.
It turns out, as seen here, that they are using the flannel-cni image from quay.io to write out CNI files using a flannel side-car plus hostDir volume-mounts; it outputs cni-conf.json (which configures CNI to use flannel), and then net-conf.json (which configures the subnet and backend used by flannel).
I hope the jinja2 mustache syntax doesn't obfuscate the answer, but I found it very interesting to see how the Kubernetes folks chose to do it "for real" to compare and contrast against the example DaemonSet given in the flannel-cni README. I guess that's the long way of saying: try the descriptors in the flannel-cni README, then if it doesn't work see if they differ in some way from the known-working kubespray setup
Update: as a concrete example, observe that the Documentation YAML doesn't include the --iface= switch, and if your Vagrant setup is using both NAT and "private_network" then it likely means flannel is binding to eth0 (the NAT one) and not eth1, which has a more static IP. I saw that caveat mentioned in the docs, but can't immediately recall where in order to cite it.
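If that is your situation, one common workaround is to add the --iface switch to the flannel container in kube-flannel.yml before applying it. A sketch of the relevant fragment (eth1 is an assumption for the Vagrant "private_network" interface; the image tag matches the v0.9.1 manifest referenced above):
containers:
- name: kube-flannel
  image: quay.io/coreos/flannel:v0.9.1-amd64
  command: [ "/opt/bin/flanneld", "--ip-masq", "--kube-subnet-mgr", "--iface=eth1" ]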
update 2
Is there a way to know why this command doesn't work? Can I get some logs from anywhere?
One may almost always access the logs of a Pod (even a statically defined one such as kube-controller-manager-coreos1) in the same manner: kubectl --namespace=kube-system logs kube-controller-manager-coreos1. In the CrashLoopBackOff circumstance, adding -p (for "-p"revious) will show the logs from the most recent crash, but only for a few seconds, not indefinitely. Occasionally, kubectl --namespace=kube-system describe pod kube-controller-manager-coreos1 will show helpful information, either in the Events section at the bottom or in the "Status" block near the top if it was Terminated for cause.
In the case of a very bad failure, such as the apiserver failing to come up (and thus kubectl logs doing nothing), ssh to the Node and use a mixture of journalctl -u kubelet.service --no-pager --lines=150 and docker logs ${the_sha_or_name} to try to see any error text. You will almost certainly need docker ps -a in the latter case to find the exited container's sha or name, but that same "only for a few seconds" applies, too, as dead containers will be pruned after some time.
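Put together, a node-side debugging session might look roughly like this (the grep pattern is only an example; the ID to pass to docker logs comes from the docker ps -a output):
journalctl -u kubelet.service --no-pager --lines=150
docker ps -a | grep controller-manager   # find the exited container's ID or name
docker logs <container-id-or-name>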
In the case of vagrant, one can ssh into the VM in one of several ways:
vagrant ssh coreos1
vagrant ssh-config > ssh-config && ssh -F ssh-config coreos1
or if it has a "private_network" address, such as 192.168.99.101 or such, then you can usually ssh -i ~/.vagrant.d/insecure_private_key core#192.168.99.101 but one of the first two are almost always more convenient

How to kill pods on Kubernetes local setup

I am starting to explore running Docker containers with Kubernetes. I did the following:
Docker run etcd
docker run master
docker run service proxy
kubectl run web --image=nginx
To clean up the state, I first stopped all the containers and cleared the downloaded images. However, I still see pods running.
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
web-3476088249-w66jr 1/1 Running 0 16m
How can I remove this?
To delete the pod:
kubectl delete pods web-3476088249-w66jr
If this pod was started via some ReplicaSet, Deployment, or anything else that creates replicas, then find that resource and delete it first.
kubectl get all
This will list all the resources that have been created in your k8s cluster. To get information about the resources created in your namespace, run kubectl get all --namespace=<your_namespace>.
To get info about the resource that is controlling this pod, you can do
kubectl describe pod web-3476088249-w66jr
There will be a "Controlled By" field, or a similar owner field, which you can use to identify which resource created the pod.
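A minimal sketch of that workflow; the pod name is the one from the question, and the ReplicaSet/Deployment names shown are what kubectl run would typically have generated:
kubectl describe pod web-3476088249-w66jr | grep -i "controlled by"
# e.g. Controlled By:  ReplicaSet/web-3476088249
kubectl describe rs web-3476088249 | grep -i "controlled by"
# e.g. Controlled By:  Deployment/web
kubectl delete deployment web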
When you do kubectl run ..., that's a deployment you create, not a pod directly. You can check this with kubectl get deploy. If you want to delete the pod, you need to delete the deployment with kubectl delete deploy DEPLOYMENT.
I would recommend that you create a namespace for testing when doing this kind of thing. Just do kubectl create ns test, then do all your tests in this namespace (by adding -n test to your commands). Once you have finished, you just do kubectl delete ns test, and you are done.
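For example (test is just a throwaway namespace name):
kubectl create ns test
kubectl run web --image=nginx -n test
kubectl get all -n test
kubectl delete ns test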
If you defined your object as a Pod, then
kubectl delete pod <--all | pod name>
will remove all of the generated Pods. But if you wrapped your Pod in a Deployment object, then running the command above will only trigger a re-creation of them.
In that case, you need to run
kubectl delete deployment <--all | deployment name>
Note that deleting the Deployment does not remove a Service that points at it; a related Service has to be deleted separately, e.g. with kubectl delete service <service name>.

Kubernetes - how to tear down cluster?

I've been trying to shut down a Kubernetes cluster, but I couldn't manage to do it.
When I type
kubectl cluster-info
I can see that my cluster is still running.
I tried commands like running the script
kube-down.sh
but it didn't work.
I deleted all the pods. How can I shut it down?
The tear down section of the official documentation says:
To undo what kubeadm did, you should first drain the node and make sure that the node is empty before shutting it down.
Talking to the master with the appropriate credentials, run:
kubectl drain <node name> --delete-local-data --force --ignore-daemonsets
kubectl delete node <node name>
Then, on the node being removed, reset all kubeadm installed state:
kubeadm reset
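Put together, tearing down a small kubeadm cluster looks roughly like this (the node names are placeholders; kubeadm reset then has to run on every node, the master included):
for node in worker-1 worker-2 worker-3; do
  kubectl drain "$node" --delete-local-data --force --ignore-daemonsets
  kubectl delete node "$node"
done
# then on each node (and finally on the master):
sudo kubeadm reset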
You cannot use the kubectl stop command, as it has been deprecated. If you have created pods using a YAML file, I suggest you use
kubectl delete -f <filename>.yml to stop any running pod.
You can also delete service associated with running pods by using the following command:
# Delete pods and services with same names "baz" and "foo"
kubectl delete pod,service baz foo
When using kube-down.sh, you have to make sure that all the environment variables which were adjusted for kube-up.sh are also used during the shutdown.