GCP GKE: View logs of terminated jobs/pods - kubernetes

I have a few cron jobs on GKE.
One of the pods was terminated and now I am trying to access its logs.
➣ $ kubectl get events
LAST SEEN TYPE REASON KIND MESSAGE
23m Normal SuccessfulCreate Job Created pod: virulent-angelfish-cronjob-netsuite-proservices-15622200008gc42
22m Normal SuccessfulDelete Job Deleted pod: virulent-angelfish-cronjob-netsuite-proservices-15622200008gc42
22m Warning DeadlineExceeded Job Job was active longer than specified deadline
23m Normal Scheduled Pod Successfully assigned default/virulent-angelfish-cronjob-netsuite-proservices-15622200008gc42 to staging-cluster-default-pool-4b4827bf-rpnl
23m Normal Pulling Pod pulling image "gcr.io/my-repo/myimage:v8"
23m Normal Pulled Pod Successfully pulled image "gcr.io/my-repo/my-image:v8"
23m Normal Created Pod Created container
23m Normal Started Pod Started container
22m Normal Killing Pod Killing container with id docker://virulent-angelfish-cronjob:Need to kill Pod
23m Normal SuccessfulCreate CronJob Created job virulent-angelfish-cronjob-netsuite-proservices-1562220000
22m Normal SawCompletedJob CronJob Saw completed job: virulent-angelfish-cronjob-netsuite-proservices-1562220000
So at least one CronJob run took place.
I would like to see the pod's logs, but there is nothing there:
➣ $ kubectl get pods
No resources found.
Given that my CronJob definition includes:
failedJobsHistoryLimit: 1
successfulJobsHistoryLimit: 3
shouldn't at least one pod be there for me to do forensics?

Your pod is crashing or otherwise unhealthy
First, take a look at the logs of the current container:
kubectl logs ${POD_NAME} ${CONTAINER_NAME}
If your container has previously crashed, you can access the previous container’s crash log with:
kubectl logs --previous ${POD_NAME} ${CONTAINER_NAME}
Alternatively, you can run commands inside that container with exec:
kubectl exec ${POD_NAME} -c ${CONTAINER_NAME} -- ${CMD} ${ARG1} ${ARG2} ... ${ARGN}
Note: -c ${CONTAINER_NAME} is optional. You can omit it for pods that only contain a single container.
As an example, to look at the logs from a running Cassandra pod, you might run:
kubectl exec cassandra -- cat /var/log/cassandra/system.log
If none of these approaches work, you can find the host machine that the pod is running on and SSH into that host.
Finally, check Logging on Google Stackdriver.
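If the cluster has Cloud Logging (Stackdriver) enabled, the deleted pod's container logs should still be queryable there even though the pod is gone. A minimal sketch with gcloud, assuming the default GKE logging setup (the pod name is taken from the events above):
# query Cloud Logging for the deleted pod's container output (assumes Cloud Logging is enabled for the cluster)
gcloud logging read 'resource.type="k8s_container" AND resource.labels.pod_name="virulent-angelfish-cronjob-netsuite-proservices-15622200008gc42"' --limit 50 --format 'value(textPayload)'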
Debugging Pods
The first step in debugging a pod is taking a look at it. Check the current state of the pod and recent events with the following command:
kubectl describe pods ${POD_NAME}
Look at the state of the containers in the pod. Are they all Running? Have there been recent restarts?
Continue debugging depending on the state of the pods.
Debugging ReplicationControllers
ReplicationControllers are fairly straightforward. They can either create pods or they can’t. If they can’t create pods, then please refer to the instructions above to debug your pods.
You can also use kubectl describe rc ${CONTROLLER_NAME} to inspect events related to the replication controller.
Hope this helps you find the exact problem.

You can use the --previous flag to get the logs of the previously terminated container instance.
So, you can use:
kubectl logs --previous virulent-angelfish-cronjob-netsuite-proservices-15622200008gc42
to get the logs from the container instance that ran before the current one in that pod.

Related

K8s cluster deployment error: nc: bad address 'xx'

I have deployed a k8s cluster. When the application pods were installed, they stayed in the Init state. I tried to find out what went wrong, but only got the error below.
pml#pml:~/bfn-mon/k8s$ kubectl get pods
NAME READY STATUS RESTARTS AGE
broker-59f66ff494-lwtxq 0/1 Init:0/2 0 41m
coordinator-9998c64b8-ql7xz 0/1 Init:0/2 0 41m
kafka-0 0/1 Init:0/1 0 41m
host#host:~$ kubectl logs kafka-0 -c init-zookeeper
nc: bad address 'zookeeper-0.zookeeper-headless-service.default.svc.cluster.local'
Can someone tell me what's going wrong? How can I fix it?
I'm hoping someone who has had the same problem, or knows what's going wrong, can give some debugging instructions.
During Pod startup, the kubelet delays running init containers until the networking and storage are ready. Then the kubelet runs the Pod's init containers in the order they appear in the Pod's spec.
For pods stuck in the Init state with a bad address error, the PVC may not have been recycled correctly, so the storage is not ready and the pod stays in the Init state until that is cleared.
From this link, you can try the solutions below (check commands for the first point are sketched after the list):
Check if PVs are created and bound to all expected PVCs.
Run /opt/kubernetes/bin/kube-restart.sh to restart the cluster.
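A minimal sketch of how to verify the first point (the kafka-0 pod name comes from the question; the default namespace is an assumption based on the DNS name in the error):
# list PersistentVolumeClaims and check that each shows STATUS Bound
kubectl get pvc -n default
# list PersistentVolumes and confirm each expected claim is bound to a volume
kubectl get pv
# inspect the init container's events to see why it is still waiting
kubectl describe pod kafka-0 -n default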

How to list containers and its info running a pod using kubectl command

When we run kubectl get pod, it lists the count of containers running inside a pod and the total restart count, so I am not sure which container was restarted. Otherwise I need to log in to the UI or use kubectl describe pods.
NAME READY STATUS RESTARTS AGE
test-pod 2/2 Running 5 14h
But I need to see each container's name and its restart count using a kubectl command, something like below.
NAME STATUS RESTARTS AGE
container-1 Running 2 14h
container-2 Running 3 14h
It would be helpful if someone could help me with this. Thanks in advance!
You can try something like this:
kubectl get pods <pod-name> -o jsonpath='{.spec.containers[*].name} {.status.containerStatuses[*].restartCount} {.status.containerStatuses[*].state}'
The result will contain the container name, restartCount, and state.
Then you will be able to format it in whatever way you need.
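If you want one line per container, similar to the desired output above, a jsonpath range expression may be easier to read (a sketch; <pod-name> is a placeholder):
# print one line per container: name and restart count
kubectl get pod <pod-name> -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\n"}{end}'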

More detailed monitoring of Pod states

Our Pods usually spend at least a minute, and up to several minutes, in the Pending state; the events from kubectl describe pod x yield:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned testing/runner-2zyekyp-project-47-concurrent-0tqwl4 to host
Normal Pulled 55s kubelet, host Container image "registry.com/image:c1d98da0c17f9b1d4ca81713c138ee2e" already present on machine
Normal Created 55s kubelet, host Created container build
Normal Started 54s kubelet, host Started container build
Normal Pulled 54s kubelet, host Container image "gitlab/gitlab-runner-helper:x86_64-6214287e" already present on machine
Normal Created 54s kubelet, host Created container helper
Normal Started 54s kubelet, host Started container helper
The provided information is not detailed enough to figure out exactly what is happening.
Question:
How can we gather more detailed metrics on what happens, and when, while a Pod gets to the Running state, so we can troubleshoot how much time each step takes?
Of special interest would be how long it takes to mount a volume.
Check the kubelet and kube-scheduler logs, because the kube-scheduler assigns the pod to a node and the kubelet on that node starts the pod and reports its status as ready.
journalctl -u kubelet # after logging into the kubernetes node
kubectl logs kube-scheduler -n kube-system
Describe the pod, deployment, replicaset to get more details
kubectl describe pod podname -n namespacename
kubectl describe deploy deploymentname -n namespacename
kubectl describe rs replicasetname -n namespacename
Check events
kubectl get events -n namespacename
Describe the nodes and check the available resources and the node status, which should be Ready.
kubectl describe node nodename
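For the timing question specifically, sorting the namespace events by creation time gives a rough per-step timeline, including volume attach/mount events (a sketch; namespacename and podname are placeholders as above):
# list all events in chronological order
kubectl get events -n namespacename --sort-by=.metadata.creationTimestamp
# restrict the events to a single pod
kubectl get events -n namespacename --field-selector involvedObject.name=podname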

One pod in Kubernetes cluster crashes but other doesn't

Strangely, one pod in the Kubernetes cluster crashes but the other doesn't!
codingjediweb-6d77f46b56-5mffg 0/1 CrashLoopBackOff 3 81s
codingjediweb-6d77f46b56-vcr8q 1/1 Running 0 81s
They should both have the same image and both should work. What could be the reason?
I suspect that the crashing pod has an old image, but I don't know why. I fixed an issue and expected the code to work (which it does on one of the pods).
Is it possible that different pods have different images? Is there a way to check which pod is running which image? Is there a way to "flush" an old image or force K8S to download it even if it is cached?
UPDATE
After Famen's suggestion, I looked at the images. I can see that the crashing container seems to be using an existing image (which might be old). How can I make K8S always pull the image?
manuchadha25#cloudshell:~ (copper-frame-262317)$ kubectl get pods
NAME READY STATUS RESTARTS AGE
busybox 1/1 Running 1 2d1h
codingjediweb-6d77f46b56-5mffg 0/1 CrashLoopBackOff 10 29m
codingjediweb-6d77f46b56-vcr8q 1/1 Running 0 29m
manuchadha25#cloudshell:~ (copper-frame-262317)$ kubectl describe pod codingjediweb-6d77f46b56-vcr8q | grep image
Normal Pulling 29m kubelet, gke-codingjediweb-cluste-default-pool-69be8339-wtjt Pulling image "docker.io/manuchadha25/codingjediweb:08072020v3"
Normal Pulled 29m kubelet, gke-codingjediweb-cluste-default-pool-69be8339-wtjt Successfully pulled image "docker.io/manuchadha25/codingjediweb:08072020v3"
manuchadha25#cloudshell:~ (copper-frame-262317)$ kubectl describe pod codingjediweb-6d77f46b56-5mffg | grep image
Normal Pulled 28m (x5 over 30m) kubelet, gke-codingjediweb-cluste-default-pool-69be8339-p5hx Container image "docker.io/manuchadha25/codingjediweb:08072020v3" already present on machine
manuchadha25#cloudshell:~ (copper-frame-262317)$
Also, the working pod has two entries for the image (Pulling and Pulled). Why are there two?
When you create a Deployment, a ReplicaSet is created in the background. Each pod of that ReplicaSet has the same properties (i.e. image, memory).
When you apply changes by updating the PodTemplateSpec of the Deployment, a new ReplicaSet is created and the Deployment controller moves the Pods from the old ReplicaSet to the new one at a controlled rate. At this point you may find pods from different ReplicaSets with different properties.
To check the image:
# get pod's yaml
$ kubectl get pods -n <namespace> <pod-name> -o yaml
# get deployment's yaml
$ kubectl get deployments -n <namespace> <deployment-name> -o yaml
Set imagePullPolicy to Always in your deployment YAML to force a pull and use the updated image.
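A sketch of the relevant part of the Deployment's pod template (the container name is a guess; adjust it to match your manifest):
spec:
  template:
    spec:
      containers:
      - name: codingjediweb              # hypothetical container name
        image: docker.io/manuchadha25/codingjediweb:08072020v3
        imagePullPolicy: Always          # re-resolve the tag against the registry on every pod start
With Always, a rebuilt image pushed under the same tag will be picked up the next time a pod starts.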

Hide Completed and other finished pods by default

In my professional environment it is common for "completed" pods to outnumber active ones and they often clutter the output of kubectl get pods like so:
$ kubectl get pods
finished-pod-38163 0/1 Completed 2m
errored-pod-83023 0/1 Error 2m
running-pod-20899 1/1 Running 2m
I can filter them out using --show-all=false:
$ kubectl get pods --show-all=false
running-pod-20899 1/1 Running 2m
However I would prefer not to have to type out --show-all=false every time I want to see my running pods. Is it possible to configure kubectl to disable --show-all by default rather than having it enabled by default?
From kubectl get pods --help:
-a, --show-all=true: When printing, show all resources (default show all pods
including terminated one.)
I know I could create some shell alias kgetpo, but this would remove support for tab-completion so I'd prefer native solutions if they exist.
You can try something like this:
kubectl get pods --field-selector=status.phase==Running
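Note that status.phase==Running also hides Pending pods. Since Completed pods have phase Succeeded and errored pods have phase Failed, a variant that hides only finished pods would be (a sketch):
kubectl get pods --field-selector=status.phase!=Succeeded,status.phase!=Failed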