How can I access a pod when it goes into CrashLoopBackOff? - kubernetes

Right now, I have deployed some pods on my Kubernetes cluster, but sometimes my image has bugs that prevent the pod from starting correctly.
For example:
nats-1 0/1 CrashLoopBackOff 121 10h
I also cannot see any error in kubectl logs.
So is there any way to access this pod? Or are there any tools or techniques that would allow me to enter the container?
Thanks a lot all! :)

You can run kubectl describe to get the events; it sometimes shows some errors there. Otherwise you can make the deployment/pod run a command like sleep 3600 to keep the container alive so you can exec into it and investigate further.
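A rough sketch of that approach (the deployment and pod names are placeholders, and it assumes the image contains a sleep binary and a shell):
# Override the container's command so it sleeps instead of crashing
kubectl patch deployment <deployment-name> --type=json \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/command", "value": ["sleep", "3600"]}]'
# Once the replacement pod is Running, open a shell inside it
kubectl exec -it <pod-name> -- sh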

Edited after clarification:
You could go into the worker node (kubectl get pod <pod-name> -o wide shows which one) and access the node's syslogs or the pod's logs. That should give you more detailed information about what happened.
But #ho-man's approach is very valid and less cumbersome.
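Roughly, assuming you have SSH access to the worker (the node name is a placeholder):
kubectl get pod <pod-name> -o wide    # the NODE column shows which worker runs it
ssh <node-name>
journalctl -u kubelet                 # kubelet logs on that node
ls /var/log/pods/                     # per-pod log directories kept on the node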

Related

How to monitor pod preemption event

I have a bunch of Rancher clusters I take care of, and on some of them developers use PriorityClasses to ensure that some of the more important workloads get scheduled. The three PriorityClasses have values in the three-digit range, so they will not interfere with the default ones. However, at present none of the PriorityClasses is set as default, and neither is the preemptionPolicy set, so it defaults to PreemptLowerPriority.
None of the rancher, longhorn, prometheus, grafana, etc., workloads have priorityClassName set.
Long story short, I believe this causes havoc on the cluster when resources are in short supply.
Before I take my opinion to the developers I would like to collect some data to back up my story.
The question: How do I detect if the pod was Terminated due to Preemption?
I tried to google the subject but couldn't find anything. I was hoping kube-state-metrics would have something, but I didn't find anything.
Any help would be greatly appreciated.
You can try to look for convincing data, like the pod termination reason, with the help of kubectl.
You can see the last restart logs of a container using the following command:
kubectl logs podname -c containername --previous
You can also use the following command to check the lifecycle events sent by the kubelet to the apiserver about the pod.
kubectl describe pod podname
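If the pod was indeed preempted, recent Kubernetes versions have the scheduler record a Preempted event on the victim pod, so you can also filter for those directly (a sketch; verify the reason string on your cluster version):
kubectl get events --all-namespaces --field-selector reason=Preempted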
Finally, you can also write a final message to /dev/termination-log, and this will show up as described in the docs.
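Reading the termination message back follows the pattern from the official docs (the pod name is a placeholder):
kubectl get pod <pod-name> -o go-template="{{range .status.containerStatuses}}{{.lastState.terminated.message}}{{end}}"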
To use kubectl commands with Rancher, kindly refer to this documentation page.

How to debug a kubernetes cluster?

As the question shows, I have very little knowledge about Kubernetes. Following a tutorial, I made a Kubernetes cluster to run a web app on a local server using Minikube. I have applied the Kubernetes components and they are running, but the web server does not respond to HTTP requests. My problem is that the whole system I have created is like a black box for me, and I have literally no idea how to open it and see where the problem is. Can you explain how I can debug such implementations in a wise way? Thanks.
use a tool like https://github.com/kubernetes/kubernetes-dashboard
You can install kubectl and kubernetes-dashboard in a k8s cluster (https://kubernetes.io/docs/tasks/tools/install-kubectl/), and then use the kubectl command to query information about a pod or container, or use the kubernetes-dashboard web UI to query information about the cluster.
For more information, please refer to https://kubernetes.io/
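Since the question mentions Minikube, note that it can set up and open the dashboard for you (assuming a standard Minikube install):
minikube dashboard   # enables the dashboard addon and opens it in a browser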
kubectl get pods
will show you all your pods and their status. A quick check to make sure that all is at least running.
If there are pods that are unhealthy, then
kubectl describe pod <pod name>
will give some more information, e.g. image not found, etc.
kubectl logs <pod name> --all-containers
is often the next step; use -f to follow the logs as you exercise your API.
It is possible to link up images running in a pod with most IDE debuggers, but instructions will differ depending on the language and IDE used...
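Whatever the debugger, you typically need to reach its listening port inside the pod; a hedged sketch (5005 here is just an illustrative Java-style debug port):
kubectl port-forward <pod name> 5005:5005   # then attach the IDE debugger to localhost:5005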

Getting Unkown target for HPA

I am actually new to Kubernetes, but by now I am comfortable with terms such as deployment, pods, etc.
I was trying an example of HPA (Horizontal Pod Autoscaler); as a prerequisite, metrics-server is already integrated. But after all those things I am not able to see the HPA working as expected.
When I execute the below command:
kubectl get hpa
I get <unknown> in the TARGET column. I have tried all my luck referring to online forums but didn't get any breakthrough.
Any help would be really appreciated
Thank you
I was getting the same issue; it was fixed after adding CPU requests in my pod definition. The points below can be the reason in most cases:
- metrics-server is not installed in your Kubernetes cluster; you can check with the command
(kubectl get deploy,svc -n kube-system | egrep metrics-server)
- you have not provided resource requests in your deployment/sts/pod definitions, as shown in the sketch below
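A minimal sketch of a container spec with a CPU request; without it, a percentage-based HPA target shows <unknown> because there is nothing to compute the percentage against (the name, image, and value are placeholders):
containers:
- name: app
  image: <your-image>
  resources:
    requests:
      cpu: 100m   # the HPA's percentage target is computed against this request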

How to find the reason for a pod crashing?

Is there a way to see why a Kubernetes pod is failing with the status CrashLoopBackOff under a heavy load?
I have a HorizontalPodAutoscaler which never kicks in. In its status it always shows low (under 50%) CPU and memory usage.
Tailing the application logs within the pods doesn't give any insights either.
Try looking at the Kubernetes events: kubectl get events --sort-by='.lastTimestamp'
If you don't get anything meaningful out of the events, go to the specific node and check the kubelet logs: journalctl -u kubelet
To get logs from a pod you should use:
kubectl logs [podname] -p
You can also check the kubelet logs, but those are mostly cluster-level logs.
If there are no logs, that means your application did not produce any logs before the crash. You would need to modify the app, for example to write a memory dump on crash.
You mentioned that the pod is dying under heavy load, but the stats show only 50% utilization. You should log in to the pod and check the load yourself; for example, check how many files are open, because maybe you are hitting a limit.
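One rough way to do that check from inside the container (assuming the image ships a shell and procfs is mounted):
kubectl exec -it <pod-name> -- sh    # open a shell in the container
ulimit -n                            # inside the pod: maximum open files allowed
ls /proc/1/fd | wc -l                # inside the pod: files currently open by the main process (PID 1)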
You can read the Kubernetes docs about Application Introspection and Debugging and go over Debugging CrashLoopBackoffs with Init-Containers.
You can also try running your image in Docker and checking the logs there. There is nice documentation about logs and troubleshooting available.
If you provide more details we might be more helpful.
Below are some obvious reasons for CrashLoopBackOff which I have observed:
- waiting for some condition to be fulfilled, e.g. some secrets, a failing healthcheck, etc.
- the pod is running with Burstable or BestEffort QoS and is getting killed due to non-availability of resources on the node
You can run this script to find the possible issues for pods in a namespace: https://github.com/dguyhasnoname/k8s-day2-ops/blob/master/namespace_scripts/debug_app_namespace.sh

Kubernetes Deployment/Pod/Container statuses

I am currently working on a monitoring service that will monitor Kubernetes deployments and their pods. I want to notify users when a deployment is not running the expected number of replicas and also when pods' containers restart unexpectedly. These may not be the right things to monitor, and I would greatly appreciate some feedback on what I should be monitoring.
Anyway, the main question is about the differences between all of the statuses of pods. And when I say statuses, I mean the STATUS column shown when running kubectl get pods. The statuses in question are:
- ContainerCreating
- ImagePullBackOff
- Pending
- CrashLoopBackOff
- Error
- Running
What causes pod/containers to go into these states?
For the first four Statuses, are these states recoverable without user interaction?
What is the threshold for a CrashLoopBackOff?
Is Running the only status that has a Ready Condition of True?
Any feedback would be greatly appreciated!
Also, would it be bad practice to use kubectl in an automated script for monitoring purposes? For example, every minute log the results of kubectl get pods to Elasticsearch?
You can see the pod lifecycle details in the k8s documentation.
The recommended way of monitoring a Kubernetes cluster and its applications is with Prometheus.
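That said, if you do script around kubectl, structured output is more robust than scraping the human-readable table; a hedged sketch:
kubectl get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}'
# prints each pod's name and phase; note the phase (Pending/Running/...) is coarser than the STATUS column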
I will try to explain what I see hidden behind these terms.
ContainerCreating
Shown while we wait for the image to be downloaded and the container to be created by Docker or another runtime.
ImagePullBackOff
Shown when there is a problem downloading the image from a registry, for example wrong credentials for logging in to Docker Hub.
Pending
The container is starting (if startup takes time), or it has started but the readinessProbe failed.
CrashLoopBackOff
This status shows when container restarts occur too often. For example, a process tries to read a file that does not exist and crashes; the container is then recreated by Kubernetes, and the cycle repeats.
Error
This is pretty clear: there was some error running the container.
Running
All is good: the container is running and the livenessProbe is OK.