"Ghost" kubernetes pod stuck in terminating - kubernetes

The situation
I have a kubernetes pod stuck in "Terminating" state that resists pod deletions
NAME READY STATUS RESTARTS AGE
...
funny-turtle-myservice-xxx-yyy 1/1 Terminating 1 11d
...
Where funny-turtle is the name of the helm release that have since been deleted.
What I have tried
try to delete the pod.
Output: pod "funny-turtle-myservice-xxx-yyy" deleted
Outcome: it still shows up in the same state.
- also tried with --force --grace-period=0, same outcome with extra warning
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
try to read the logs (kubectl logs ...).
Outcome: Error from server (NotFound): nodes "ip-xxx.yyy.compute.internal" not found
try to delete the kubernetes deployment.
but it does not exist.
So I assume this pod somehow got "disconnected" from the aws API, reasoning from the error message that kubectl logs printed.
I'll take any suggestions or guidance to explain what happened here and how I can get rid of it.
EDIT 1
Tried to see if the "ghost" node was still there (kubectl delete node ip-xxx.yyy.compute.internal) but it does not exist.

Try removing the finalizers from the pod:
kubectl patch pod funny-turtle-myservice-xxx-yyy -p '{"metadata":{"finalizers":null}}'

In my case, the solution proposed by the accepted answer did not work, it kept stuck in "Terminating" status. What did the trick for me was:
kubectl delete pods <pod> --grace-period=0 --force

The above solutions did not work in my case, except I didn't try restarting all the nodes.
The error state for my pod was as follows (extra lines omitted):
$ kubectl -n myns describe pod/mypod
Status: Terminating (lasts 41h)
Containers:
runner:
State: Waiting
Reason: ContainerCreating
Last State: Terminated
Reason: ContainerStatusUnknown
Message: The container could not be located when the pod was deleted.
The container used to be Running
Exit Code: 137
$ kubectl -n myns get pod/mypod -o json
"metadata": {
"deletionGracePeriodSeconds": 0,
"deletionTimestamp": "2022-06-07T22:17:20Z",
"finalizers": [
"actions.summerwind.dev/runner-pod"
],
I removed the entry under finalizers (leaving finalizers as empty array) and then the pod was finally gone.
$ kubectl -n myns edit pod/mypod
pod/mypod edited

In my case nothing worked, no logs, no delete, absolutely nothing. I had to restart all the nodes, then the situation cleared up, no more Terminating pods.

Related

using kubectl delete command to remove core-dns pod blocked / No activity

I found my coredns pod throw error: Readiness probe failed: Get http://172.30.224.7:8080/health: net/http: request canceled (Client.Timeout exceeded while awaiting headers) . I am delete pod using this command:
kubectl delete pod coredns-89764d78c-mbcbz -n kube-system
but the command keep waiting and nothing response,how to know the progress of deleting? this is output:
[root#ops001 ~]# kubectl delete pod coredns-89764d78c-mbcbz -n kube-system
pod "coredns-89764d78c-mbcbz" deleted
and the terminal hangs or blocked,when I use browser UI with using kubernetes dashboard the pod exits.how to force delete it? or fix it the right way?
You are deleting a pod which is monitored by deployment controller. That's why when you delete one of the pods, the controller create another to make sure the number of pods equal to the replica count. If you really want to delete the coredns[not recommended], delete the deployment instead of the pods.
$ kubectl delete deployment coredns -n kube-system
Answering another part of your question:
but the command keep waiting and nothing response,how to know the
progress of deleting? this is output:
[root#ops001 ~]# kubectl delete pod coredns-89764d78c-mbcbz -n kube-system
pod "coredns-89764d78c-mbcbz" deleted
and the terminal blocked...
When you're deleting a Pod and you want to see what's going on under the hood, you can additionally provide -v flag and specify the desired verbosity level e.g.:
kubectl delete pod coredns-89764d78c-mbcbz -n kube-system -v 8
If there is some issue with the deletion of specific Pod, it should tell you the details.
I totally agree with #P Ekambaram's comment:
if coredns is not started. you need to check logs and find out why it
is not getting started – P Ekambaram
You can always delete the whole coredns Deployment and re-deploy it again but generally you shouldn't do that. Looking at Pod logs:
kubectl logs coredns-89764d78c-mbcbz -n kube-system
should also tell you some details explaining why it doesn't work properly. I would say that deleting the whole coredns Deployment is a last resort command.

How to see logs of terminated pods

I am running selenium hubs and my pods are getting terminated frequently. I would like to look at the logs of the pods which are terminated. How to do it?
NAME READY STATUS RESTARTS AGE
chrome-75-0-0e5d3b3d-3580-49d1-bc25-3296fdb52666 0/2 Terminating 0 49s
chrome-75-0-29bea6df-1b1a-458c-ad10-701fe44bb478 0/2 Terminating 0 23s
chrome-75-0-8929d8c8-1f7b-4eba-96f2-918f7a0d77f5 0/2 ContainerCreating 0 7s
kubectl logs chrome-75-0-8929d8c8-1f7b-4eba-96f2-918f7a0d77f5
Error from server (NotFound): pods "chrome-75-0-8929d8c8-1f7b-4eba-96f2-918f7a0d77f5" not found
$ kubectl logs chrome-75-0-8929d8c8-1f7b-4eba-96f2-918f7a0d77f5 --previous
Error from server (NotFound): pods "chrome-75-0-8929d8c8-1f7b-4eba-96f2-918f7a0d77f5" not found
Running kubectl logs -p will fetch logs from existing resources at API level. This means that terminated pods' logs will be unavailable using this command.
As mentioned in other answers, the best way is to have your logs centralized via logging agents or directly pushing these logs into an external service.
Alternatively and given the logging architecture in Kubernetes, you might be able to fetch the logs directly from the log-rotate files in the node hosting the pods. However, this option might depend on the Kubernetes implementation as log files might be deleted when the pod eviction is triggered.
From kubernetes docs:
Examples
# Return snapshot logs from pod nginx with only one container
kubectl logs nginx
# Return snapshot of previous terminated ruby container logs from pod web-1
kubectl logs -p -c ruby web-1
# Begin streaming the logs of the ruby container in pod web-1
kubectl logs -f -c ruby web-1
# Display only the most recent 20 lines of output in pod nginx
kubectl logs --tail=20 nginx
# Show all logs from pod nginx written in the last hour
kubectl logs --since=1h nginx
Options
-c, --container="": Print the logs of this container
-f, --follow[=false]: Specify if the logs should be streamed.
--limit-bytes=0: Maximum bytes of logs to return. Defaults to no limit.
-p, --previous[=false]: If true, print the logs for the previous instance of the container in a pod if it exists.
--since=0: Only return logs newer than a relative duration like 5s, 2m, or 3h. Defaults to all logs. Only one of since-time / since may be used.
--since-time="": Only return logs after a specific date (RFC3339). Defaults to all logs. Only one of since-time / since may be used.
--tail=-1: Lines of recent log file to display. Defaults to -1, showing all log lines.
--timestamps[=false]: Include timestamps on each line in the log output
That is just a simple way of doing it. But in production , I would send all the logs of all the pods to a centeral Log management system such as ELK by deploying a log sending client on the kubernetes cluster as daemon-set such as fluentbit , that will keep sending logs to ELk where I am able to filter things base don the namespace , pod , container , or any other label.
kubectl get event -o custom-columns=NAME:.metadata.name -n <namespace> --no-headers
use the above command to get the list of terminated pods in your namespace and use
kubectl logs -f pod-name -n <namespace> -p
to see the terminated pod's logs
P.S.: The above command to fetch terminated pod details will give you the pods which were terminated 1 hour ag
You can try --previous flag on the logs
i.e.
kubectl --namespace namespace logs pod_name --previous
This will show the logs from dump pod logs (stdout) for a previous container accroding to kubernetes docs
A combination of a flag --previous and a container name for a container that was terminated with reason: CrashLoopBackOff:
First find a pod in a namespace. Its status is CrashLoopBackOff:
kubectl get pods -n namespace_name
NAME READY STATUS RESTARTS AGE
crashing_pod_name 0/9 Init:CrashLoopBackOff 17 (105s ago) 63m
Then use describe to find out a name of a container that failed:
kubectl describe pod -n namespace_name crashing_pod_name
Find a name of a container that is terminated with a reason CrashLoopBackOff
Then list logs:
kubectl logs -n namespace_name crashing_pod_name -c failing_container_name --previous

Pod not terminating

We have a Rancher Kubernetes cluster where sometimes the pods get stuck in terminating status when we try to delete the corresponding deployment, as shown below.
$ kubectl get deployments
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
...
storage-manager-deployment 1 0 0 0 1d
...
$ kubectl delete deployments storage-manager-deployment
kubectl delete deployments storage-manager-deployment
deployment.extensions "storage-manager-deployment" deleted
C-c C-c^C
$ kubectl get po
NAME READY STATUS RESTARTS AGE
...
storage-manager-deployment-6d56967cdd-7bgv5 0/1 Terminating 0 23h
...
$ kubectl delete pods storage-manager-deployment-6d56967cdd-7bgv5 --grace-period=0 --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "storage-manager-deployment-6d56967cdd-7bgv5" force deleted
C-c C-c^C
Both the delete commands (for the deployment and the pod) get stuck and need to be stopped manually.
We have tried both
kubectl delete pod NAME --grace-period=0 --force
and
kubectl delete pod NAME --now
without any luck.
We have also set fs.may_detach_mounts=1, so it seems that all the similar questions already on StackOverflow don't apply to our problem.
If we check the node on which the incriminated pod runs, it does not appear in the docker ps list.
Any suggestion?
Thanks
Check the pod spec for an array: 'finalizers'
finalizers:
- cattle-system
If this exists, remove it, and the pod will terminate.

How do I check if a Kubernetes pod was killed for OOM or DEADLINE EXCEEDED?

I have some previously run pods that I think were killed by Kubernetes for OOM or DEADLINE EXCEEDED, what's the most reliable way to confirm that? Especially if the pods weren't recent.
If the pods are still showing up when you type kubectl get pods -a then you get type the following kubectl describe pod PODNAME and look at the reason for termination. The output will look similar to the following (I have extracted parts of the output that are relevant to this discussion):
Containers:
somename:
Container ID: docker://5f0d9e4c8e0510189f5f209cb09de27b7b114032cc94db0130a9edca59560c11
Image: ubuntu:latest
...
State: Terminated
Reason: Completed
Exit Code: 0
In the sample output you will, my pod's terminated reason is Completed but you will see other reasons such as OOMKilled and others over there.
If the pod has already been deleted, you can also check kubernetes events and see what's going on:
$ kubectl get events
LAST SEEN FIRST SEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
59m 59m 1 my-pod-7477dc76c5-p49k4 Pod spec.containers{my-service} Normal Killing kubelet Killing container with id docker://my-service:Need to kill Pod

Pods stuck in Terminating status

I tried to delete a ReplicationController with 12 pods and I could see that some of the pods are stuck in Terminating status.
My Kubernetes cluster consists of one control plane node and three worker nodes installed on Ubuntu virtual machines.
What could be the reason for this issue?
NAME READY STATUS RESTARTS AGE
pod-186o2 1/1 Terminating 0 2h
pod-4b6qc 1/1 Terminating 0 2h
pod-8xl86 1/1 Terminating 0 1h
pod-d6htc 1/1 Terminating 0 1h
pod-vlzov 1/1 Terminating 0 1h
You can use following command to delete the POD forcefully.
kubectl delete pod <PODNAME> --grace-period=0 --force --namespace <NAMESPACE>
The original question is "What could be the reason for this issue?" and the answer is discussed at https://github.com/kubernetes/kubernetes/issues/51835 & https://github.com/kubernetes/kubernetes/issues/65569 & see https://www.bountysource.com/issues/33241128-unable-to-remove-a-stopped-container-device-or-resource-busy
Its caused by docker mount leaking into some other namespace.
You can logon to pod host to investigate.
minikube ssh
docker container ps | grep <id>
docker container stop <id>
Force delete the pod:
kubectl delete pod --grace-period=0 --force --namespace <NAMESPACE> <PODNAME>
The --force flag is mandatory.
I found this command more straightforward:
for p in $(kubectl get pods | grep Terminating | awk '{print $1}'); do kubectl delete pod $p --grace-period=0 --force;done
It will delete all pods in Terminating status in default namespace.
Delete the finalizers block from resource (pod,deployment,ds etc...) yaml:
"finalizers": [
"foregroundDeletion"
]
In my case the --force option didn't quite work. I could still see the pod ! It was stuck in Terminating/Unknown mode. So after running
kubectl -n redis delete pods <pod> --grace-period=0 --force
I ran
kubectl -n redis patch pod <pod> -p '{"metadata":{"finalizers":null}}'
Practical answer -- you can always delete a terminating pod by running:
kubectl delete pod NAME --grace-period=0
Historical answer -- There was an issue in version 1.1 where sometimes pods get stranded in the Terminating state if their nodes are uncleanly removed from the cluster.
I stumbled upon this recently to free up resource in my cluster. here is the command to delete them all.
kubectl get pods --all-namespaces | grep Terminating | while read line; do
pod_name=$(echo $line | awk '{print $2}' ) \
name_space=$(echo $line | awk '{print $1}' ); \
kubectl delete pods $pod_name -n $name_space --grace-period=0 --force
done
hope this help someone who read this
Force delete ALL pods in namespace:
kubectl delete pods --all -n <namespace> --grace-period 0 --force
If --grace-period=0 is not working then you can do:
kubectl delete pods <pod> --grace-period=0 --force
I stumbled upon this recently when removing rook ceph namespace - it got stuck in Terminating state.
The only thing that helped was removing kubernetes finalizer by directly calling k8s api with curl as suggested here.
kubectl get namespace rook-ceph -o json > tmp.json
delete kubernetes finalizer in tmp.json (leave empty array "finalizers": [])
run kubectl proxy in another terminal for auth purposes and run following curl request to returned port
curl -k -H "Content-Type: application/json" -X PUT --data-binary #tmp.json 127.0.0.1:8001/k8s/clusters/c-mzplp/api/v1/namespaces/rook-ceph/finalize
namespace is gone
Detailed rook ceph teardown here.
I used this command to delete the pods
kubectl delete pod --grace-period=0 --force --namespace <NAMESPACE> <PODNAME>
But when I tried run another pod, it didn't work, it was stuck in "Pending" state, it looks like the node itself was stuck.
For me, the solution was to recreate the node. I simply went to GKE console and deleted the node from the cluster and so GKE started another.
After that, everything started to work normally again.
I had to same issue in a production Kubernetes cluster.
A pod was stuck in Terminating phase for a while:
pod-issuing mypod-issuing-0 1/1 Terminating 0 27h
I tried checking the logs and events using the command:
kubectl describe pod mypod-issuing-0 --namespace pod-issuing
kubectl logs mypod-issuing-0 --namespace pod-issuing
but none was available to view
How I fixed it:
I ran the command below to forcefully delete the pod:
kubectl delete pod <PODNAME> --grace-period=0 --force --namespace <NAMESPACE>
This deleted the pod immediately and started creating a new one. However, I ran into the error below when another pod was being created:
Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[data mypod-issuing-token-5swgg aws-iam-token]: timed out waiting for the condition
I had to wait for 7 to 10 minutes for the volume to become detached from the previous pod I deleted so that it can become available for this new pod I was creating.
For my case, I don't like workaround. So there are steps :
k get pod -o wide -> this will show which Node is running the pod
k get nodes -> Check status of that node... I got it NotReady
I went and I fixed that node. For my case, it's just restart kubelet :
ssh that-node -> run swapoff -a && systemctl restart kubelet (Or systemctl restart k3s in case of k3s | or systemctl restart crio in other cases like OCP 4.x (k8s <1.23) )
Now deletion of pod should work without forcing the Poor pod.
Please try below command:
kubectl patch pod <pod>-p '{"metadata":{"finalizers":null}}'
Before doing a force deletion i would first do some checks.
1- node state: get the node name where your node is running, you can see this with the following command:
"kubectl -n YOUR_NAMESPACE describe pod YOUR_PODNAME"
Under the "Node" label you will see the node name.
With that you can do:
kubectl describe node NODE_NAME
Check the "conditions" field if you see anything strange.
If this is fine then you can move to the step, redo:
"kubectl -n YOUR_NAMESPACE describe pod YOUR_PODNAME"
Check the reason why it is hanging, you can find this under the "Events" section.
I say this because you might need to take preliminary actions before force deleting the pod, force deleting the pod only deletes the pod itself not the underlying resource (a stuck docker container for example).
I'd not recommend force deleting pods unless container already exited.
Verify kubelet logs to see what is causing the issue "journalctl -u kubelet"
Verify docker logs: journalctl -u docker.service
Check if pod's volume mount points still exist and if anyone holds lock on it.
Verify if host is out of memory or disk
you can use awk :
kubectl get pods --all-namespaces | awk '{if ($4=="Terminating") print "oc delete pod " $2 " -n " $1 " --force --grace-period=0 ";}' | sh
One reason WHY this happens can be turning off a node (without draining it). Fix in this case is to turn on the node again; then termination should succeed.
My pods stuck in 'Terminating', even after I tried to restart docker & restart server. Resolved after edit the pod & delete items below 'finalizer'
$ kubectl -n mynamespace edit pod/my-pod-name
I am going to try the most extense answer, because none of the above are wrong, but they do not work in all case scenarios.
The usual way to put an end to a terminating pod is:
kubectl delete pod -n ${namespace} ${pod} --grace-period=0
But you may need to remove finalizers that could be preventing the POD from stoppoing using:
kubectl -n ${namespace} patch pod ${pod} -p '{"metadata":{"finalizers":null}}'
If none of that works, you can remove the pod from etcd with etcdctl:
# Define variables
ETCDCTL_API=3
certs-path=${HOME}/.certs/e
etcd-cert-path=${certs-path}/etcd.crt
etcd-key-path=${certs-path}/etcd.key
etcd-cacert-path=${certs-path}/etcd.ca
etcd-endpoints=https://127.0.0.1:2379
namespace=myns
pod=mypod
# Call etcdctl to remove the pod
etcdctl del \
--endpoints=${etcd-endpoints}\
--cert ${etcd-cert-path} \
--key ${etcd-client-key}\
--cacert ${etcd-cacert-path} \
--prefix \
/registry/pods/${namespace}/${pod}
This last case should be used as last resource, in my case I ended having to do it due to a deadlock that prevented calico from starting in the node due to Pods under terminating status. Those pods won't be removed until calico is up, but they have reserved enough CPU to avoid calico, or any other pod, from Initializing.
Following command with awk and xargs can be used along with --grace-period=0 --force to delete all the Pods in Terminating state.
kubectl get pods|grep -i terminating | awk '{print $1}' | xargs kubectl delete --grace-period=0 --force pod
go templates will work without awk, for me it works without --grace-period=0 --force but, add it if you like
this will output the command to delete the Terminated pods.
kubectl get pods --all-namespaces -otemplate='{{ range .items }}{{ if eq .status.reason "Terminated" }}{{printf "kubectl delete pod -n %v %v\n" .metadata.namespace .metadata.name}}{{end}}{{end}}'
if you are happy with the output, you cat add | sh - to execute it.
as follow:
kubectl get pods --all-namespaces -otemplate='{{ range .items }}{{ if eq .status.reason "Terminated" }}{{printf "kubectl delete pod -n %v %v\n" .metadata.namespace .metadata.name}}{{end}}{{end}}' |sh -
for me below command has resolved the issue
oc patch pvc pvc_name -p '{"metadata":{"finalizers":null}}