Recovery from kubectl crash - kubernetes

What is the best way to troubleshoot when kubectl doesn't respond or exits with a timeout? How do I get it working again?
Both kubectl and helm stopped working against my cluster while installing a Helm chart.

General advice:
Check whether kubectl is connecting to the correct kube-apiserver endpoint. Take a look at your kubeconfig, which is stored in $HOME/.kube by default. Try a simple curl against that endpoint to make sure it is not a DNS problem, etc. (a quick sketch follows below).
Take a look at your nodes' logs by SSHing into the nodes you have: see this for more detailed instructions and log locations.
Once you have more information, you can start investigating the actual problem.
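A hedged sketch of those first checks; the endpoint is read from your kubeconfig and the placeholder below stands in for whatever it contains:
kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'
# curl that address directly to rule out DNS/routing problems
# (-k skips TLS verification, only for this connectivity test)
curl -k https://<api-server-host>:6443/version
# raise kubectl's verbosity to see which request it hangs on
kubectl get nodes -v=7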

Related

Permission denied using kubectl but able to run helm

I am facing permission denied errors when using kubectl for all commands, be it get pods or apply, but I am able to use helm and log in with k9s to perform destructive actions. I am using the same context for all of these actions.
kubectl get nodes
# error: You must be logged in to the server (Unauthorized)
kubectl apply -f some-manifest.yaml
# error: You must be logged in to the server (the server has asked for the client to provide credentials)
Does anyone have a hint as to why this is happening or what to look further into? I am using a managed k8s on Vultr, a smaller cloud provider.
I don't know what the specific issue was, but I rebuilt my .kube/config file slowly with all my contexts and it ended up working again.
Very strange that helm worked and kubectl didn't, though...
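For anyone hitting the same thing, a hedged sketch of rebuilding a kubeconfig entry by hand; every name and credential file below is a placeholder for whatever your provider gives you:
kubectl config set-cluster my-cluster --server=https://<api-endpoint>:6443 --certificate-authority=ca.crt
kubectl config set-credentials my-user --client-certificate=client.crt --client-key=client.key
kubectl config set-context my-context --cluster=my-cluster --user=my-user
kubectl config use-context my-context
kubectl get nodes   # should no longer return Unauthorized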
I am pretty sure that this is a "kubernetes context" problem
Check the solution here: helm and kubectl context mismatch
Solution for k9s can be found here: https://k9scli.io/topics/commands/
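A quick, hedged way to verify whether kubectl and helm really end up on the same context (substitute your own context name):
kubectl config current-context
kubectl config get-contexts
kubectl cluster-info
# helm reads the same kubeconfig; pin it to a context explicitly to compare behaviour
helm list --kube-context <context-name>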

Deployed a service on k8s but not showing any pods even when it failed

I have deployed a k8s service, however it's not showing any pods. This is what I see:
kubectl get deployments
It should be created in the default namespace.
kubectl get nodes (this shows me nothing)
How do I troubleshoot a failed deployment? The test-control-plane is the one deployed by kind; that's the k8s distribution I'm using.
kubectl get nodes
If the above command shows nothing, it means there are no nodes in your cluster, so where would your workload run?
You need at least one worker node in a K8s cluster so the Deployment can schedule Pods on it and run the application.
You can check the worker nodes using the same command:
kubectl get nodes
You can debug more and check the reason of issue further using
kubectl describe deployment <name of your deployment>
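Since the question mentions kind: if kubectl get nodes really comes back empty, a hedged sketch of a kind config that adds a worker node next to the control plane (the file name is arbitrary):
# kind-config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
Then recreate the cluster from it:
kind create cluster --config kind-config.yaml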
To find out what really went wrong, first follow the steps described by Harsh Manvar in his answer. Perhaps that information will already point you to the problem. If not, check the logs of your deployment: list your pods, see which ones did not start properly, then check their logs.
You can also use kubectl describe on the pods to see in more detail what went wrong. Since you are using kind, I include a list of known errors for you.
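Roughly, that sequence looks like this (a hedged sketch; <deployment> and <pod> are placeholders for your own names):
kubectl get deployments
kubectl describe deployment <deployment>   # check Replicas, Conditions and Events
kubectl get pods -o wide                   # look for Pending, ImagePullBackOff, CrashLoopBackOff
kubectl describe pod <pod>                 # the Events section usually names the cause
kubectl logs <pod>                         # for containers that started and then failed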
You can also see this visual guide on troubleshooting Kubernetes deployments and 5 Tips for Troubleshooting Kubernetes Deployments.

How to monitor pod preemption event

I have a bunch of Rancher clusters I take care of, and on some of them developers use PriorityClasses to ensure that some of the more important workloads get scheduled. The three PriorityClasses have values in the three-digit range so they will not interfere with the default ones. However, at present none of the PriorityClasses is set as the default, and preemptionPolicy is not set either, so it defaults to PreemptLowerPriority.
None of the rancher, longhorn, prometheus, grafana, etc., workloads have priorityClassName set.
Long story short, I believe this causes havoc on the cluster when resources are in short supply.
Before I take my opinion to the developers I would like to collect some data to back up my story.
The question: How do I detect if the pod was Terminated due to Preemption?
I tried to google the subject but couldn't find anything. I was hoping kube-state-metrics would have something, but I didn't find anything there either.
Any help would be greatly appreciated.
You can try to look for convincing data, such as the pod termination reason, with the help of kubectl.
You can see the logs from before a container's last restart using the following command:
kubectl logs podname -c containername --previous
You can also use the following command to check the lifecycle events sent by the kubelet to the apiserver about the pod.
kubectl describe pod podname
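In addition (hedged, since the exact reason string can vary between Kubernetes versions), the scheduler records an event on the victim pod when it preempts it, so you can filter cluster events by reason:
kubectl get events -A --field-selector reason=Preempted
# or only the events for one particular pod
kubectl get events -n <namespace> --field-selector involvedObject.name=<podname>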
Finally, you can also write a final message to /dev/termination-log, and this will show up as described in the docs.
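A minimal, hedged example of that idea; the pod name and the message are placeholders:
# inside the container's command/args, e.g.:
#   sh -c 'echo "killed while processing batch X" > /dev/termination-log; exit 1'
# then read the message back from the pod status
kubectl get pod <podname> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.message}'
# (use .state.terminated.message instead if the container has not restarted yet)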
To use kubectl commands with Rancher, refer to this documentation page.

How to debug a kubernetes cluster?

As the question shows, I have very little knowledge about Kubernetes. Following a tutorial, I made a Kubernetes cluster to run a web app on a local server using Minikube. I have applied the Kubernetes components and they are running, but the web server does not respond to HTTP requests. My problem is that the whole system I have created is like a black box to me, and I have literally no idea how to open it and see where the problem is. Can you explain how I can debug such implementations in a sensible way? Thanks.
use a tool like https://github.com/kubernetes/kubernetes-dashboard
You can install kubectl and kubernetes-dashboard in a k8s cluster (https://kubernetes.io/docs/tasks/tools/install-kubectl/), and then use the kubectl command to query information about a pod or container, or use the kubernetes-dashboard web UI to query information about the cluster.
For more information, please refer to https://kubernetes.io/
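Since the question mentions Minikube: the dashboard is also available as a Minikube addon, which is probably the quickest way to get it (a hedged shortcut):
minikube dashboard          # enables the addon if needed and opens the UI in a browser
minikube dashboard --url    # or just print the URL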
kubectl get pods
will show you all your pods and their status, a quick check to make sure that everything is at least running.
If there are pods that are unhealthy, then
kubectl describe pod <pod name>
will give some more information, e.g. image not found, etc.
kubectl logs <pod name> --all-containers
is often the next step; use -f to follow the logs as you exercise your API.
It is possible to attach most IDE debuggers to images running in a pod, but the instructions will differ depending on the language and IDE used...
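For the concrete symptom in the question (pods Running but no HTTP response), a hedged sequence that usually narrows it down; my-web is a placeholder for your Service/Deployment name:
kubectl get deployments,pods,svc,endpoints
kubectl describe svc my-web            # do the selector and targetPort actually match the pod?
kubectl port-forward svc/my-web 8080:80
curl -v http://localhost:8080/         # in a second terminal
# with Minikube you can also ask for a reachable URL for a NodePort/LoadBalancer service
minikube service my-web --url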

Kubectl documentation without starting Kubernetes

I have installed a K8s cluster on my laptop using kubeadm and VirtualBox. It seems a bit odd that the cluster has to be up and running just to see the documentation, as shown below.
praveensripati@praveen-ubuntu:~$ kubectl explain pods
Unable to connect to the server: dial tcp 192.168.0.31:6443: connect: no route to host
Any workaround for this?
See "kubectl explain — #HeptioProTip"
Behind the scenes, kubectl just made an API request to my Kubernetes cluster, grabbed the current Swagger documentation of the API version running in the cluster, and output the documentation and object types.
Try kubectl help as an offline alternative, but that won't be as complete (limited to kubectl itself).
So the rather sobering news is that, AFAIK, there's no out-of-the-box way to do it, though you could totally write a kubectl plugin (that has become rather trivial as of 1.12). But for now, the best I can offer is the following:
# figure out which endpoint kubectl uses to retrieve docs:
$ kubectl -v9 explain pods
# from above I learn that in my case it's apparently
# https://192.168.64.11:8443/openapi/v2 so let's curl that:
$ curl -k https://192.168.64.11:8443/openapi/v2 > resources-docs.json
From here you can, for example, use jq to query the descriptions. It's not as nice as a proper explain, but it's a good enough workaround until someone writes an offline docs-query kubectl plugin.
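For example (hedged; this assumes the definitions use the usual io.k8s.api.* keys found in the OpenAPI v2 document):
$ jq -r '.definitions["io.k8s.api.core.v1.Pod"].description' resources-docs.json
# or list the Pod spec fields together with their docs
$ jq -r '.definitions["io.k8s.api.core.v1.PodSpec"].properties | to_entries[] | "\(.key): \(.value.description)"' resources-docs.json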
The 'explain' documentation lives in the kube-apiserver and its resource definitions, hence the need to connect to it through kubectl explain to get any docs. This is different from the standard, very basic CLI help from kubectl, which lives in the kubectl Golang code itself.
So there is no real workaround other than setting up a dummy Kubernetes cluster and having kubectl point to it. Please note that help for CRDs might not be available, since it lives in the deployed CRDs themselves.