Identify pods which are not in a Ready state - kubernetes

We have deployed a few pods in the cluster across various namespaces. I would like to inspect and identify all pods that are not in a Ready state.
master $ k get pod/nginx1401 -n dev1401
NAME READY STATUS RESTARTS AGE
nginx1401 0/1 Running 0 10m
In the list above, the pod shows a Running status but has some issue (it is 0/1 ready). How can we find the list of such pods? The commands below do not show me the desired output:
kubectl get po -A | grep Pending        # looking for pods that have yet to schedule
kubectl get po -A | grep -v Running     # looking for pods in a state other than Running
kubectl get pods --field-selector=status.phase=Failed

There is a long-standing feature request for this. The latest entry suggests
kubectl get po --all-namespaces | gawk 'match($3, /([0-9])+\/([0-9])+/, a) {if (a[1] < a[2] && $4 != "Completed") print $0}'
for finding pods that are running but not complete.
There are a lot of other suggestions in the thread that might work as well.
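Another option, without gawk, is to check each pod's Ready condition directly with jsonpath (a rough sketch; it prints namespace, name and Ready status, and the final grep drops pods whose condition is True; note that Completed pods will also show up, since their Ready condition is False):
kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}' | grep -v 'True$'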

You can try this:
$ kubectl get po --all-namespaces -w
You will get an update whenever any change (create/update/delete) happens to a pod in any namespace.
Or you can watch all pod by using:
$ watch -n 1 kubectl get po --all-namespaces
This will continuously watch all pods in all namespaces at a 1-second interval.
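If you only care about problem pods, you can combine the watch with a filter (a rough sketch using grep; adjust the patterns to match what you consider healthy):
watch -n 1 "kubectl get po --all-namespaces | grep -vE 'Running|Completed'"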

Related

How to cause an intentional restart of a single kubernetes pod

I am testing the kubectl logs --previous command and for that I need a pod to restart.
I can get my pods using a command like
kubectl get pods -n $ns -l $label
which shows that my pods have not restarted so far. I want to test the command:
kubectl logs $podname -n $ns --previous=true
That command fails because my pod has not restarted, making the --previous=true switch meaningless.
I am aware of this command to restart pods when configuration changed:
kubectl rollout restart deployment myapp -n $ns
This does not restart the containers in a way that is meaningful for my log command test but rather terminates the old pods and creates new pods (which have a restart count of 0).
I tried various versions of exec to see if I can shut them down from within but most commands I would use are not found in that container:
kubectl exec $podname -n $ns -- shutdown
kubectl exec $podname -n $ns -- shutdown now
kubectl exec $podname -n $ns -- halt
kubectl exec $podname -n $ns -- poweroff
How can I use a kubectl command to forcefully restart the pod so that it retains its identity, the restart counter increases by one, and my test log command has a previous instance to return logs from?
EDIT:
Connecting to the pod is well documented:
kubectl -n $ns exec --stdin --tty $podname -- /bin/bash
The process list shows only a handful of running processes:
ls -1 /proc | grep -Eo "^[0-9]{1,5}$"
PID 1 seems to be the main process of the pod.
kill 1 does nothing; it does not even kill the process with PID 1.
I am still looking into this at the moment.
There are different ways to achieve your goal. I'll describe the most useful options below.
Crictl
The most correct and efficient way is to restart the pod at the container runtime level.
I tested this on Google Cloud Platform - GKE and minikube with docker driver.
You need to SSH into the worker node where the pod is running. Then find its POD ID:
$ crictl ps
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID
9863a993e0396 87a94228f133e 3 minutes ago Running nginx-3 2 6d17dad8111bc
OR
$ crictl pods -s ready
POD ID CREATED STATE NAME NAMESPACE ATTEMPT RUNTIME
6d17dad8111bc About an hour ago Ready nginx-3 default 2 (default)
Then stop it:
$ crictl stopp 6d17dad8111bc
Stopped sandbox 6d17dad8111bc
After some time, the kubelet will start this pod again (with a different POD ID in the CRI, however the Kubernetes cluster treats it as the same pod):
$ crictl ps
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID
f5f0442841899 87a94228f133e 41 minutes ago Running nginx-3 3 b628e1499da41
This is how it looks in cluster:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-3 1/1 Running 3 48m
Getting logs with the --previous=true flag also confirmed that it's the same Pod for Kubernetes.
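As a quick check (a minimal sketch, reusing the nginx-3 pod from the example above), you can confirm that the restart counter really increased:
kubectl get pod nginx-3 -o jsonpath='{.status.containerStatuses[0].restartCount}'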
Kill process 1
It works with most images, but not always.
E.g. I tested it on a simple pod with the nginx image:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx 1/1 Running 0 27h
$ kubectl exec -it nginx -- /bin/bash
root@nginx:/# kill 1
root@nginx:/# command terminated with exit code 137
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx 1/1 Running 1 27h
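If an interactive kill 1 is ignored (PID 1 only receives signals it has installed handlers for), a non-interactive variant with an explicit signal is worth a try (a sketch using the nginx pod above; whether the container actually restarts depends on the image and the pod's restartPolicy):
kubectl exec nginx -- /bin/sh -c 'kill -TERM 1'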
Useful link:
Debugging Kubernetes nodes with crictl

kubectl status.phase=Running returns wrong results

When I run:
kubectl get pods --field-selector=status.phase=Running
I see:
NAME READY STATUS RESTARTS AGE
k8s-fbd7b 2/2 Running 0 5m5s
testm-45gfg 1/2 Error 0 22h
I don't understand why this command gives me pods that are in the Error status. According to the K8S API, there is no such thing as STATUS=Error.
How can I get only the pods that are in this Error status?
When I run:
kubectl get pods --field-selector=status.phase=Failed
It tells me that there are no pods in that status.
Using the kubectl get pods --field-selector=status.phase=Failed command you can display all Pods in the Failed phase.
Failed means that all containers in the Pod have terminated, and at least one container has terminated in failure (see: Pod phase):
Failed - All containers in the Pod have terminated, and at least one container has terminated in failure. That is, the container either exited with non-zero status or was terminated by the system.
In your example, both Pods are in the Running phase because at least one container is still running in each of these Pods:
Running - The Pod has been bound to a node, and all of the containers have been created. At least one container is still running, or is in the process of starting or restarting.
You can check the current phase of Pods using the following command:
$ kubectl get pod -o=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}'
Let's check how this command works:
$ kubectl get pods
NAME READY STATUS
app-1 1/2 Error
app-2 0/1 Error
$ kubectl get pod -o=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}'
app-1 Running
app-2 Failed
As you can see, only the app-2 Pod is in the Failed phase. There is still one container running in the app-1 Pod, so this Pod is in the Running phase.
To list all pods with the Error status, you can simply use:
$ kubectl get pods -A | grep Error
default app-1 1/2 Error
default app-2 0/1 Error
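If you only want to filter at the phase level, the field selector also supports negation (a sketch; note that this still won't catch Pods like app-1, whose phase is Running despite a failed container):
kubectl get pods -A --field-selector=status.phase!=Running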
Additionally, it's worth mentioning that you can check the state of all containers in Pods:
$ kubectl get pod -o=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].state}{"\n"}{end}'
app-1 {"terminated":{"containerID":"containerd://f208e2a1ff08c5ce2acf3a33da05603c1947107e398d2f5fbf6f35d8b273ac71","exitCode":2,"finishedAt":"2021-08-11T14:07:21Z","reason":"Error","startedAt":"2021-08-11T14:07:21Z"}} {"running":{"startedAt":"2021-08-11T14:07:21Z"}}
app-2 {"terminated":{"containerID":"containerd://7a66cbbf73985efaaf348ec2f7a14d8e5bf22f891bd655c4b64692005eb0439b","exitCode":2,"finishedAt":"2021-08-11T14:08:50Z","reason":"Error","startedAt":"2021-08-11T14:08:50Z"}}
You can simply grep the Error pods using:
kubectl get pods --all-namespaces | grep Error
To remove all Error pods from the cluster:
kubectl delete pod `kubectl get pods --namespace <yournamespace> | awk '$3 == "Error" {print $1}'` --namespace <yournamespace>
Most Pod failures return explicit error states that can be observed in the status field.
Error:
Your pod crashed; it was scheduled on a node successfully but crashed after that. To debug it further you can use different methods or commands:
kubectl describe pod <Pod name > -n <Namespace>
https://kubernetes.io/docs/tasks/debug-application-cluster/debug-pod-replication-controller/#my-pod-is-crashing-or-otherwise-unhealthy
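If the pod has already crashed at least once, the logs of the previous container instance are often the quickest pointer (a sketch; substitute your pod and namespace):
kubectl logs <Pod name> -n <Namespace> --previous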
Here is an overkill go-template based attempt:
kubectl get pods -o go-template='{{range $index, $element := .items}}{{range .status.containerStatuses}}{{range .state }}{{if .reason }}{{if (eq .reason "Error") }}{{$element.metadata.name}} {{$element.metadata.namespace}}{{"\n"}}{{end}}{{end}}{{end}}{{end}}{{end}}'
job1-stn45 default
My pod status:
k get pod
NAME READY STATUS RESTARTS AGE
foo 1/1 Running 1 2d11h
nginx-0 1/1 Running 3 5d10h
nginx-2 1/1 Running 3 5d10h
nginx-1 1/1 Running 3 5d10h
job1-stn45 0/1 Error 0 113m
update-test-27145740-82z7s 0/1 ImagePullBackOff 0 96m
update-test-27145500-7f2l9 0/1 ImagePullBackOff 0 5h36m

Scale down Kubernetes pods

I am using
kubectl scale --replicas=0 -f deployment.yaml
to stop all my running pods. Please let me know if there are better ways to bring all running pods down to zero while keeping configuration, deployments, etc. intact, so that I can scale up later as required.
You are doing the correct action; traditionally the scale verb is applied just to the resource name, as in kubectl scale deploy my-awesome-deployment --replicas=0, which removes the need to always point at the specific file that describes that deployment, but there's nothing wrong (that I know of) with using the file if that is more convenient for you.
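For example, to scale the same deployment back up later (assuming the my-awesome-deployment name from above):
kubectl scale deploy my-awesome-deployment --replicas=3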
The solution is pretty easy and straightforward
kubectl scale deploy -n <namespace> --replicas=0 --all
Here we go.
Scales down all deployments in a whole namespace:
kubectl get deploy -n <namespace> -o name | xargs -I % kubectl scale % --replicas=0 -n <namespace>
To scale up, set --replicas=1 (or any other required number) accordingly.
Use the following to scale down/up all deployments and stateful sets in the current namespace. Useful in development when switching projects.
kubectl scale statefulset,deployment --all --replicas=0
Add a namespace flag if needed:
kubectl scale statefulset,deployment -n mynamespace --all --replicas=0
kubectl get deployments
NAME READY UP-TO-DATE AVAILABLE AGE
app-gke 3/3 3 3 13m
kubectl scale deploy app-gke --replicas=5
deployment.extensions/app-gke scaled
kubectl get pods
NAME READY STATUS RESTARTS AGE
app-gke-7b768cd6d7-b25px 2/2 Running 0 11m
app-gke-7b768cd6d7-glj5v 0/2 ContainerCreating 0 4s
app-gke-7b768cd6d7-jdt6l 2/2 Running 0 11m
app-gke-7b768cd6d7-ktx87 2/2 Running 0 11m
app-gke-7b768cd6d7-qxpgl 0/2 ContainerCreating 0 4s
If you need more granularity with pipes or grep, here is another shell solution:
for i in $(kubectl get deployments | grep -v NAME | grep -v app | awk '{print $1}'); do kubectl scale --replicas=2 deploy $i; done
If you want generic patch:
namespace=devops-ci-dev
kubectl get deployment -n ${namespace} --no-headers| awk '{print $1}' | xargs -I elhay kubectl patch deployment -n ${namespace} -p '{"spec": {"replicas": 1}}' elhay
Change namespace=devops-ci-dev to your namespace.
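The same patch applied to a single deployment would look like this (a sketch; my-deployment is a placeholder name):
kubectl patch deployment my-deployment -n ${namespace} -p '{"spec": {"replicas": 1}}'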
If your deployments share names with their services:
kubectl get svc --no-headers | awk '{print $1}' | xargs kubectl scale deploy --replicas=0

How do you get a Kubernetes pod's name from its IP address?

How do I get a pod's name from its IP address? What's the magic incantation of kubectl + sed/awk/grep/etc regardless of where kubectl is invoked?
Example:
kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
alpine-3835730047-ggn2v 1/1 Running 0 5d 10.22.19.69 ip-10-35-80-221.ec2.internal
get pod name by IP
kubectl get --all-namespaces --output json pods | jq '.items[] | select(.status.podIP=="10.22.19.69")' | jq .metadata.name
"alpine-3835730047-ggn2v"
get container name by IP
kubectl get --all-namespaces --output json pods | jq '.items[] | select(.status.podIP=="10.22.19.69")' | jq .spec.containers[].name
"alpine"
This can be done without additional tools; kubectl alone is enough:
kubectl get pods -o custom-columns=:metadata.name --no-headers=true --field-selector status.podIP=<pod-ip-address-goes-here>
Another way to get the pod name by IP address is like this:
$ kubectl get pods --all-namespaces -o wide | grep 10.2.6.181
jenkins jenkins-2-7d6d7fd99c-9xgkx 2/2 Running 3 12d 10.2.6.181 ip.ap-southeast-2.compute.internal <none>
In this example, the pod name is "jenkins-2-7d6d7fd99c-9xgkx" for IP address "10.2.6.181".
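To avoid parsing the wide output, the same lookup can be done with a field selector and custom columns (a sketch, reusing the IP from this example):
kubectl get pods -A -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name --field-selector status.podIP=10.2.6.181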

Can't delete pods in pending state?

[root@vpct-k8s-1 kubernetes]# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system kube-ui-v2-ck0yw 0/1 Pending 0 1h
[root@vpct-k8s-1 kubernetes]# kubectl get rc --all-namespaces
NAMESPACE CONTROLLER CONTAINER(S) IMAGE(S) SELECTOR REPLICAS AGE
kube-system kube-ui-v2 kube-ui gcr.io/google_containers/kube-ui:v2 k8s-app=kube-ui,version=v2 1 1h
Can't delete pods in pending state?
First find the Deployment that owns the pod, then delete it so the pod is not recreated:
kubectl get ns
kubectl get pods --all-namespaces
kubectl get deployment -n (namespacename)
kubectl get deployments --all-namespaces
kubectl delete deployment (deploymentname) -n (namespacename)
Try the command below:
kubectl delete pod kube-ui-v2-ck0yw --grace-period=0 --force -n kube-system
To delete a pod in the Pending state, simply delete the Deployment that created it by pointing kubectl at its manifest file.
Please check the command below:
kubectl delete -f deployment-file-name.yaml
Depending on the number of replicas you specified while creating the cluster, you might be able to delete the pending pod but another pod will be recreated automatically. You can delete the pod by running this command:
$ ./cluster/kubectl.sh delete pod kube-ui-v2-ck0yw