View log files of crashed pods in Kubernetes

Any idea how to view the log files of a crashed pod in Kubernetes?
My pod lists its state as "CrashLoopBackOff" after starting the replication controller. I searched the available docs and couldn't find anything.

Assuming that your pod still exists:
kubectl logs <podname> --previous
$ kubectl logs -h
-p, --previous[=false]: If true, print the logs for the previous instance of the container in a pod if it exists.
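If the pod has multiple containers, you can also name the container explicitly with -c (a minimal sketch; the pod and container names are placeholders):
kubectl logs <podname> -c <container-name> --previous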

In many cases, kubectl logs <podname> --previous returns:
Error from server (BadRequest): previous terminated container "<container-name>" in pod "<pod-name>" not found
In that case you can check the namespace's events (kubectl get events ..) like @alltej showed.
If you don't find the reason for the error with kubectl logs / get events and you can't view it with an external logging tool, I would suggest:
1 ) Check on which node that pod was running on with:
$ kubectl get -n <namespace> pod <pod-name> -o=custom-columns=NAME:.metadata.name,STATUS:.status.phase,NODE:.spec.nodeName
NAME              STATUS    NODE
failed-pod-name   Pending   dns-of-node
(If you remove the <pod-name> you can see other pods in the namespace).
2 ) SSH into that node and:
A ) Search for the failed pod's container name in /var/log/containers/, dump its .log file, and search for errors - in most cases the cause of the error will be displayed there along with the actions / events that took place before the error (see the sketch after this list).
B ) If the previous step doesn't help, search for the latest system-level errors by running: sudo journalctl -u kubelet -n 100 --no-pager.
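A minimal sketch of step A on the node; the names are placeholders, and the exact file names depend on your container runtime:
# log files are symlinks named <pod>_<namespace>_<container>-<id>.log
ls /var/log/containers/ | grep <pod-name>
# dump the tail of the matching log and look for errors
sudo tail -n 200 /var/log/containers/<pod-name>_<namespace>_<container-name>-*.log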

The kubectl logs command only works if the pod is up and running. If it is not, you can use kubectl get events.
kubectl get events -n <your_app_namespace> --sort-by='.metadata.creationTimestamp'
By default it does not sort the events, hence the --sort-by flag.
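To narrow the output down to a single pod, you can also filter the events with a field selector (a sketch; the pod name is a placeholder):
kubectl get events -n <your_app_namespace> --field-selector involvedObject.name=<pod-name>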

There was a bug in Kubernetes that prevented obtaining logs for pods in the CrashLoopBackOff state. It looks like it was fixed. Here is the issue on GitHub with additional information.

As discussed in another StackOverflow question, I wrote an open source tool to do this.
The main difference from the other answers is that this is triggered automatically when a pod crashes, so it can help avoid scenarios where you start debugging much later on, when the pod itself no longer exists and logs can't be fetched.

If the pod does not exist anymore:
kubectl describe pod {RUNTIME_NAME_OF_POD}
In the output you should have an "Events" section which contains the error messages that prevented the pod from starting.

Container failures can be due to reaching resource limits:
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Started: Wed, 18 Jan 2023 11:28:14 +0530
Finished: Wed, 18 Jan 2023 11:28:18 +0530
Ready: False
Restart Count: 13
OR
The application ended due to an error:
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Wed, 18 Jan 2023 11:50:59 +0530
Finished: Wed, 18 Jan 2023 11:51:03 +0530
Ready: False
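For the OOMKilled case above (exit code 137), a common remedy is to inspect and raise the container's memory limit. A sketch, assuming the pod is managed by a Deployment (the names and limit value are placeholders):
# inspect the current requests/limits of the first container
kubectl get pod POD_NAME -o jsonpath='{.spec.containers[0].resources}'
# raise the memory limit on the owning deployment
kubectl set resources deployment DEPLOYMENT_NAME --limits=memory=512Mi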
Debugging container failure:
Look at the pod status, which will contain the above status information:
sugumar$ kubectl get pod POD_NAME -o yaml
sugumar$ kubectl get events -w | grep POD_NAME_STRING
For default container logs:
sugumar$ kubectl logs -f POD_NAME
For a specific container (the reason for the application failure):
sugumar$ kubectl logs -f POD_NAME --container CONTAINER_NAME
Look at the events:
sugumar$ kubectl describe deployment DEPLOYMENT_NAME
sugumar$ kubectl describe pod POD_NAME
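To pull just the reason for the last termination out of the pod status, a jsonpath query can help (a sketch; POD_NAME is a placeholder):
kubectl get pod POD_NAME -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'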

Related

Pods are shown as TERMINATED after kubernetes node reboot/restart

We are running a reboot (sudo systemctl reboot) of a worker node with Kubernetes version v1.22.4.
We observe that pods on the rebooted node are presented with status TERMINATED.
kubectl get pod loggingx-7c879bf5c8-nvxls -n footprint
NAME READY STATUS RESTARTS AGE
loggingx-7c879bf5c8-nvxls 3/3 Terminated 10 (44m ago) 29
Question:
All containers are up 3/3. Why is the status still Terminated?
kubectl describe pod loggingx-7c879bf5c8-nvxls -n footprint
Name: loggingx-7c879bf5c8-nvxls
Namespace: footprint
Priority: 0
Node: node-10-63-134-154/10.63.134.154
Start Time: Mon, 08 Aug 2022 07:07:15 +0000
...
Status: Running
Reason: Terminated
Message: Pod was terminated in response to imminent node shutdown.
Question: The status presented by kubectl get pod .. and kubectl describe pod is different. Why?
We used the Lens tool and could confirm that the pods are actually running after the reboot!
This behavior applies to both deployments and statefulsets.
We ran the same test in a cluster with kubernetes v1.20.4:
After the reboot is completed, the node becomes ready again and pods are recreated on a new or the same worker node.
It looks to us as though the new beta feature "Non graceful node shutdown", introduced with v1.21, has a strange impact on the node reboot use case.
Have you had any similar experiences?
BR,
Thomas

How to cause an intentional restart of a single kubernetes pod

I am testing the kubectl logs --previous command, and for that I need a pod to restart.
I can get my pods using a command like
kubectl get pods -n $ns -l $label
which shows that my pods have not restarted so far. I want to test the command:
kubectl logs $podname -n $ns --previous=true
That command fails because my pod did not restart, making the --previous=true switch meaningless.
I am aware of this command to restart pods when configuration changed:
kubectl rollout restart deployment myapp -n $ns
This does not restart the containers in a way that is meaningful for my log command test; rather, it terminates the old pods and creates new pods (which have a restart count of 0).
I tried various versions of exec to see if I can shut them down from within but most commands I would use are not found in that container:
kubectl exec $podname -n $ns -- shutdown
kubectl exec $podname -n $ns -- shutdown now
kubectl exec $podname -n $ns -- halt
kubectl exec $podname -n $ns -- poweroff
How can I use a kubectl command to forcefully restart the pod, with it retaining its identity and the restart counter increasing by one, so that my test log command has a previous instance to return the logs from?
EDIT:
Connecting to the pod is well described.
kubectl -n $ns exec --stdin --tty $podname -- /bin/bash
The process list shows only a handful of running processes:
ls -1 /proc | grep -Eo "^[0-9]{1,5}$"
PID 1 seems to be the one running the pod.
kill 1 does nothing; it does not even kill the process with PID 1.
I am still looking into this at the moment.
There are different ways to achieve your goal. I'll describe the most useful options below.
Crictl
The most correct and efficient way is to restart the pod at the container runtime level.
I tested this on Google Cloud Platform - GKE and on minikube with the docker driver.
You need to SSH into the worker node where the pod is running. Then find its POD ID:
$ crictl ps
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID
9863a993e0396 87a94228f133e 3 minutes ago Running nginx-3 2 6d17dad8111bc
OR
$ crictl pods -s ready
POD ID CREATED STATE NAME NAMESPACE ATTEMPT RUNTIME
6d17dad8111bc About an hour ago Ready nginx-3 default 2 (default)
Then stop it:
$ crictl stopp 6d17dad8111bc
Stopped sandbox 6d17dad8111bc
After some time, the kubelet will start this pod again (with a different POD ID in the CRI; however, the Kubernetes cluster treats it as the same pod):
$ crictl ps
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID
f5f0442841899 87a94228f133e 41 minutes ago Running nginx-3 3 b628e1499da41
This is how it looks in cluster:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-3 1/1 Running 3 48m
Getting logs with the --previous=true flag also confirmed that it's the same pod for Kubernetes.
Kill process 1
This works with most images, however not always.
E.g. I tested it on a simple pod with the nginx image:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx 1/1 Running 0 27h
$ kubectl exec -it nginx -- /bin/bash
root@nginx:/# kill 1
root@nginx:/# command terminated with exit code 137
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx 1/1 Running 1 27h
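The same thing as a one-liner from outside the pod (a sketch; this assumes the image ships /bin/sh and that its PID 1 exits on SIGTERM, as nginx does):
kubectl exec nginx -- /bin/sh -c 'kill 1'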
Useful link:
Debugging Kubernetes nodes with crictl

Why busybox pod keep restarting after completed with exitCode 0

Created a pod using following command
kubectl run bb --image=busybox --generator=run-pod/v1 --command -- sh -c "echo hi"
The pod is getting restarted repeatedly:
bb 1/1 Running 1 7s
bb 0/1 Completed 1 8s
bb 0/1 CrashLoopBackOff 1 9s
bb 0/1 Completed 2 22s
bb 0/1 CrashLoopBackOff 2 23s
bb 0/1 Completed 3 53s
The exit code is 0:
k describe pod bb
...
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 29 Aug 2019 22:58:36 +0000
Finished: Thu, 29 Aug 2019 22:58:36 +0000
Ready: False
Restart Count: 7
Thanks
kubectl run defaults to setting the "restart policy" to "Always"; with the run-pod/v1 generator used here, that means the kubelet keeps restarting the pod's container after it exits.
--restart='Always': The restart policy for this Pod. Legal values
[Always, OnFailure, Never]. If set to 'Always' a deployment is created,
if set to 'OnFailure' a job is created, if set to 'Never', a regular pod
is created. For the latter two --replicas must be 1. Default 'Always',
for CronJobs `Never`.
If you change the command to:
kubectl run bb \
--image=busybox \
--generator=run-pod/v1 \
--restart=Never \
--command -- sh -c "echo hi"
the pod will be created with restartPolicy: Never and won't be restarted after it completes.
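You can confirm the resulting policy on the created pod (a sketch, using the pod name from the question):
kubectl get pod bb -o jsonpath='{.spec.restartPolicy}'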
Outside of kubectl run
All pod specs include a restartPolicy, which defaults to Always, so it must be specified if you want different behaviour.
spec:
  template:
    spec:
      containers:
      - name: something
      restartPolicy: Never
If you are looking to run a task to completion, try a Job instead.
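If you do want a Job, a quick sketch of the equivalent (the job name echo-hi is a placeholder):
kubectl create job echo-hi --image=busybox -- sh -c "echo hi"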
Please see the Last State reason, which is Completed.
Terminated: Indicates that the container completed its execution and has stopped running. A container enters this state when it has successfully completed execution or when it has failed for some reason. Either way, a reason and exit code are displayed, as well as the container's start and finish times. Before a container enters Terminated, the preStop hook (if any) is executed.
...
State: Terminated
Reason: Completed
Exit Code: 0
Started: Wed, 30 Jan 2019 11:45:26 +0530
Finished: Wed, 30 Jan 2019 11:45:26 +0530
...
Please see more details here. You can try something like this, which will show the difference:
kubectl run bb --image=busybox --generator=run-pod/v1 --command -- sh -c "sleep 1000"

GCP GKE: View logs of terminated jobs/pods

I have a few cron jobs on GKE.
One of the pods did terminate and now I am trying to access the logs.
➣ $ kubectl get events
LAST SEEN TYPE REASON KIND MESSAGE
23m Normal SuccessfulCreate Job Created pod: virulent-angelfish-cronjob-netsuite-proservices-15622200008gc42
22m Normal SuccessfulDelete Job Deleted pod: virulent-angelfish-cronjob-netsuite-proservices-15622200008gc42
22m Warning DeadlineExceeded Job Job was active longer than specified deadline
23m Normal Scheduled Pod Successfully assigned default/virulent-angelfish-cronjob-netsuite-proservices-15622200008gc42 to staging-cluster-default-pool-4b4827bf-rpnl
23m Normal Pulling Pod pulling image "gcr.io/my-repo/myimage:v8"
23m Normal Pulled Pod Successfully pulled image "gcr.io/my-repo/my-image:v8"
23m Normal Created Pod Created container
23m Normal Started Pod Started container
22m Normal Killing Pod Killing container with id docker://virulent-angelfish-cronjob:Need to kill Pod
23m Normal SuccessfulCreate CronJob Created job virulent-angelfish-cronjob-netsuite-proservices-1562220000
22m Normal SawCompletedJob CronJob Saw completed job: virulent-angelfish-cronjob-netsuite-proservices-1562220000
So at least one CronJob run happened.
I would like to see the pod's logs, but there is nothing there:
➣ $ kubectl get pods
No resources found.
Given that in my CronJob definition I have:
failedJobsHistoryLimit: 1
successfulJobsHistoryLimit: 3
shouldn't at least one pod be there for me to do forensics?
Your pod is crashing or otherwise unhealthy
First, take a look at the logs of the current container:
kubectl logs ${POD_NAME} ${CONTAINER_NAME}
If your container has previously crashed, you can access the previous container’s crash log with:
kubectl logs --previous ${POD_NAME} ${CONTAINER_NAME}
Alternately, you can run commands inside that container with exec:
kubectl exec ${POD_NAME} -c ${CONTAINER_NAME} -- ${CMD} ${ARG1} ${ARG2} ... ${ARGN}
Note: -c ${CONTAINER_NAME} is optional. You can omit it for pods that only contain a single container.
As an example, to look at the logs from a running Cassandra pod, you might run:
kubectl exec cassandra -- cat /var/log/cassandra/system.log
If none of these approaches work, you can find the host machine that the pod is running on and SSH into that host.
Finally, check Logging on Google Stackdriver.
Debugging Pods
The first step in debugging a pod is taking a look at it. Check the current state of the pod and recent events with the following command:
kubectl describe pods ${POD_NAME}
Look at the state of the containers in the pod. Are they all Running? Have there been recent restarts?
Continue debugging depending on the state of the pods.
Debugging ReplicationControllers
ReplicationControllers are fairly straightforward. They can either create pods or they can’t. If they can’t create pods, then please refer to the instructions above to debug your pods.
You can also use kubectl describe rc ${CONTROLLER_NAME} to inspect events related to the replication controller.
Hope it helps you find the exact problem.
You can use the --previous flag to get the logs for the previous pod.
So, you can use:
kubectl logs --previous virulent-angelfish-cronjob-netsuite-proservices-15622200008gc42
to get the logs for the pod that was there before this one.
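If the Job's pods are still retained by the history limits, you can usually find them through the job-name label that the Job controller adds (a sketch, using the job name from the events above):
kubectl get pods -l job-name=virulent-angelfish-cronjob-netsuite-proservices-1562220000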

kubectl get nodes shows NotReady

I have installed a two-node Kubernetes 1.12.1 cluster on cloud VMs, both behind an internet proxy. Each VM has a floating IP associated to connect over SSH; kube-01 is the master and kube-02 is a node. I executed export:
no_proxy=127.0.0.1,localhost,10.157.255.185,192.168.0.153,kube-02,192.168.0.25,kube-01
before running kubeadm init, but I am getting the following status for kubectl get nodes:
NAME STATUS ROLES AGE VERSION
kube-01 NotReady master 89m v1.12.1
kube-02 NotReady <none> 29s v1.12.2
Am I missing any configuration? Do I need to add 192.168.0.153 and 192.168.0.25 to the respective VMs' /etc/hosts?
It looks like the pod network is not installed on your cluster yet. You can install Weave Net, for example, with the command below:
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
After a few seconds, a Weave Net pod should be running on each Node and any further pods you create will be automatically attached to the Weave network.
You can install a pod network of your choice. Here is a list.
After this, check:
$ kubectl describe nodes
and check that all is fine, like below:
Conditions:
Type             Status
----             ------
OutOfDisk        False
MemoryPressure   False
DiskPressure     False
Ready            True
Capacity:
cpu: 2
memory: 2052588Ki
pods: 110
Allocatable:
cpu: 2
memory: 1950188Ki
pods: 110
Next, SSH to the node which is not ready and observe the kubelet logs. Most likely the errors will be about certificates and authentication.
You can also use journalctl on systemd to check kubelet errors.
$ journalctl -u kubelet
Try this:
If your coredns pods are stuck in the Pending state, check the networking plugin you have used and check that the proper add-ons are added.
Check the Kubernetes troubleshooting guide:
https://kubernetes.io/docs/setup/independent/troubleshooting-kubeadm/#coredns-or-kube-dns-is-stuck-in-the-pending-state
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Install the required add-ons from those pages, and then check:
kubectl get pods -n kube-system
On the off chance it might be the same for someone else, in my case, I was using the wrong AMI image to create the nodegroup.
Run
journalctl -u kubelet
Then check the node logs; if you get the error below, disable swap using swapoff -a:
"Failed to run kubelet" err="failed to run Kubelet: running with swap on is not supported, please disable swap! or set --fa
Main process exited, code=exited, status=1/FAILURE
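A sketch for making the swap change persist across reboots (this assumes the swap entries live in /etc/fstab):
# turn off swap now
sudo swapoff -a
# comment out swap entries so swap stays off after reboot
sudo sed -i '/ swap / s/^/#/' /etc/fstab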