Pod not terminating - kubernetes

We have a Rancher Kubernetes cluster where sometimes the pods get stuck in terminating status when we try to delete the corresponding deployment, as shown below.
$ kubectl get deployments
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
...
storage-manager-deployment 1 0 0 0 1d
...
$ kubectl delete deployments storage-manager-deployment
kubectl delete deployments storage-manager-deployment
deployment.extensions "storage-manager-deployment" deleted
C-c C-c^C
$ kubectl get po
NAME READY STATUS RESTARTS AGE
...
storage-manager-deployment-6d56967cdd-7bgv5 0/1 Terminating 0 23h
...
$ kubectl delete pods storage-manager-deployment-6d56967cdd-7bgv5 --grace-period=0 --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "storage-manager-deployment-6d56967cdd-7bgv5" force deleted
C-c C-c^C
Both the delete commands (for the deployment and the pod) get stuck and need to be stopped manually.
We have tried both
kubectl delete pod NAME --grace-period=0 --force
and
kubectl delete pod NAME --now
without any luck.
We have also set fs.may_detach_mounts=1, so it seems that all the similar questions already on StackOverflow don't apply to our problem.
If we check the node on which the incriminated pod runs, it does not appear in the docker ps list.
Any suggestion?
Thanks

Check the pod spec for an array: 'finalizers'
finalizers:
- cattle-system
If this exists, remove it, and the pod will terminate.

Related

Does `kubectl log deployment/name` get all pods or just one pod?

I need to see the logs of all the pods in a deployment with N worker pods
When I do kubectl logs deployment/name --tail=0 --follow the command syntax makes me assume that it will tail all pods in the deployment
However when I go to process I don't see any output as expected until I manually view the logs for all N pods in the deployment
Does kubectl log deployment/name get all pods or just one pod?
Yes, if you run kubectl logs with a deployment, it will return the logs of only one pod from the deployment.
However, you can accomplish what you are trying to achieve by using the -l flag to return the logs of all pods matching a label.
For example, let's say you create a deployment using:
kubectl create deployment my-dep --image=nginx --replicas=3
Each of the pods gets a label app=my-dep, as seen here:
$ kubectl get pods -l app=my-dep
NAME READY STATUS RESTARTS AGE
my-dep-6d4ddbf4f7-8jnsw 1/1 Running 0 6m36s
my-dep-6d4ddbf4f7-9jd7g 1/1 Running 0 6m36s
my-dep-6d4ddbf4f7-pqx2w 1/1 Running 0 6m36s
So, if you want to get the combined logs of all pods in this deployment you can use this command:
kubectl logs -l app=my-dep
only one pod seems to be the answer.
i went here How do I get logs from all pods of a Kubernetes replication controller? and it seems that the command kubectl logs deployment/name only shows one pod of N
also when you do execute the kubectl logs on a deployment it does say it only print to console that it is for one pod (not all the pods)

using kubectl delete command to remove core-dns pod blocked / No activity

I found my coredns pod throw error: Readiness probe failed: Get http://172.30.224.7:8080/health: net/http: request canceled (Client.Timeout exceeded while awaiting headers) . I am delete pod using this command:
kubectl delete pod coredns-89764d78c-mbcbz -n kube-system
but the command keep waiting and nothing response,how to know the progress of deleting? this is output:
[root#ops001 ~]# kubectl delete pod coredns-89764d78c-mbcbz -n kube-system
pod "coredns-89764d78c-mbcbz" deleted
and the terminal hangs or blocked,when I use browser UI with using kubernetes dashboard the pod exits.how to force delete it? or fix it the right way?
You are deleting a pod which is monitored by deployment controller. That's why when you delete one of the pods, the controller create another to make sure the number of pods equal to the replica count. If you really want to delete the coredns[not recommended], delete the deployment instead of the pods.
$ kubectl delete deployment coredns -n kube-system
Answering another part of your question:
but the command keep waiting and nothing response,how to know the
progress of deleting? this is output:
[root#ops001 ~]# kubectl delete pod coredns-89764d78c-mbcbz -n kube-system
pod "coredns-89764d78c-mbcbz" deleted
and the terminal blocked...
When you're deleting a Pod and you want to see what's going on under the hood, you can additionally provide -v flag and specify the desired verbosity level e.g.:
kubectl delete pod coredns-89764d78c-mbcbz -n kube-system -v 8
If there is some issue with the deletion of specific Pod, it should tell you the details.
I totally agree with #P Ekambaram's comment:
if coredns is not started. you need to check logs and find out why it
is not getting started – P Ekambaram
You can always delete the whole coredns Deployment and re-deploy it again but generally you shouldn't do that. Looking at Pod logs:
kubectl logs coredns-89764d78c-mbcbz -n kube-system
should also tell you some details explaining why it doesn't work properly. I would say that deleting the whole coredns Deployment is a last resort command.

"Ghost" kubernetes pod stuck in terminating

The situation
I have a kubernetes pod stuck in "Terminating" state that resists pod deletions
NAME READY STATUS RESTARTS AGE
...
funny-turtle-myservice-xxx-yyy 1/1 Terminating 1 11d
...
Where funny-turtle is the name of the helm release that have since been deleted.
What I have tried
try to delete the pod.
Output: pod "funny-turtle-myservice-xxx-yyy" deleted
Outcome: it still shows up in the same state.
- also tried with --force --grace-period=0, same outcome with extra warning
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
try to read the logs (kubectl logs ...).
Outcome: Error from server (NotFound): nodes "ip-xxx.yyy.compute.internal" not found
try to delete the kubernetes deployment.
but it does not exist.
So I assume this pod somehow got "disconnected" from the aws API, reasoning from the error message that kubectl logs printed.
I'll take any suggestions or guidance to explain what happened here and how I can get rid of it.
EDIT 1
Tried to see if the "ghost" node was still there (kubectl delete node ip-xxx.yyy.compute.internal) but it does not exist.
Try removing the finalizers from the pod:
kubectl patch pod funny-turtle-myservice-xxx-yyy -p '{"metadata":{"finalizers":null}}'
In my case, the solution proposed by the accepted answer did not work, it kept stuck in "Terminating" status. What did the trick for me was:
kubectl delete pods <pod> --grace-period=0 --force
The above solutions did not work in my case, except I didn't try restarting all the nodes.
The error state for my pod was as follows (extra lines omitted):
$ kubectl -n myns describe pod/mypod
Status: Terminating (lasts 41h)
Containers:
runner:
State: Waiting
Reason: ContainerCreating
Last State: Terminated
Reason: ContainerStatusUnknown
Message: The container could not be located when the pod was deleted.
The container used to be Running
Exit Code: 137
$ kubectl -n myns get pod/mypod -o json
"metadata": {
"deletionGracePeriodSeconds": 0,
"deletionTimestamp": "2022-06-07T22:17:20Z",
"finalizers": [
"actions.summerwind.dev/runner-pod"
],
I removed the entry under finalizers (leaving finalizers as empty array) and then the pod was finally gone.
$ kubectl -n myns edit pod/mypod
pod/mypod edited
In my case nothing worked, no logs, no delete, absolutely nothing. I had to restart all the nodes, then the situation cleared up, no more Terminating pods.

Kubernetes pod gets recreated when deleted

I have started pods with command
$ kubectl run busybox \
--image=busybox \
--restart=Never \
--tty \
-i \
--generator=run-pod/v1
Something went wrong, and now I can't delete this Pod.
I tried using the methods described below but the Pod keeps being recreated.
$ kubectl delete pods busybox-na3tm
pod "busybox-na3tm" deleted
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
busybox-vlzh3 0/1 ContainerCreating 0 14s
$ kubectl delete pod busybox-vlzh3 --grace-period=0
$ kubectl delete pods --all
pod "busybox-131cq" deleted
pod "busybox-136x9" deleted
pod "busybox-13f8a" deleted
pod "busybox-13svg" deleted
pod "busybox-1465m" deleted
pod "busybox-14uz1" deleted
pod "busybox-15raj" deleted
pod "busybox-160to" deleted
pod "busybox-16191" deleted
$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default busybox-c9rnx 0/1 RunContainerError 0 23s
You need to delete the deployment, which should in turn delete the pods and the replica sets https://github.com/kubernetes/kubernetes/issues/24137
To list all deployments:
kubectl get deployments --all-namespaces
Then to delete the deployment:
kubectl delete -n NAMESPACE deployment DEPLOYMENT
Where NAMESPACE is the namespace it's in, and DEPLOYMENT is the name of the deployment. If NAMESPACE is default, leave off the -n option altogether.
In some cases it could also be running due to a job or daemonset.
Check the following and run their appropriate delete command.
kubectl get jobs
kubectl get daemonsets.app --all-namespaces
kubectl get daemonsets.extensions --all-namespaces
Instead of trying to figure out whether it is a deployment, deamonset, statefulset... or what (in my case it was a replication controller that kept spanning new pods :)
In order to determine what it was that kept spanning up the image I got all the resources with this command:
kubectl get all
Of course you could also get all resources from all namespaces:
kubectl get all --all-namespaces
or define the namespace you would like to inspect:
kubectl get all -n NAMESPACE_NAME
Once I saw that the replication controller was responsible for my trouble I deleted it:
kubectl delete replicationcontroller/CONTROLLER_NAME
If your pod has name like name-xxx-yyy, it could be controlled by a replicasets.apps named name-xxx, you should delete that replicaset first before deleting the pod:
kubectl delete replicasets.apps name-xxx
Obviously something is respawning the pod. While a lot of the other answers have you looking at everything (replica sets, jobs, deployments, stateful sets, ...) to find what may be respawning the pod, you can instead just look at the pod to see what spawned it. For example do:
$ kubectl describe pod $mypod | grep 'Controlled By:'
Controlled By: ReplicaSet/foobar
This tells you exactly what created the pod. You can then go and delete that.
Look out for stateful sets as well
kubectl get sts --all-namespaces
to delete all the stateful sets in a namespace
kubectl --namespace <yournamespace> delete sts --all
to delete them one by one
kubectl --namespace ag1 delete sts mssql1
kubectl --namespace ag1 delete sts mssql2
kubectl --namespace ag1 delete sts mssql3
This will provide information about all the pods,deployments, services and jobs
in the namespace.
kubectl get pods,services,deployments,jobs
pods can either be created by deployments or jobs
kubectl delete job [job_name]
kubectl delete deployment [deployment_name]
If you delete the deployment or job then restart of the pods can be stopped.
Many answers here tells to delete a specific k8s object, but you can delete multiple objects at once, instead of one by one:
kubectl delete deployments,jobs,services,pods --all -n <namespace>
In my case, I'm running OpenShift cluster with OLM - Operator Lifecycle Manager. OLM is the one who controls the deployment, so when I deleted the deployment, it was not sufficient to stop the pods from restarting.
Only when I deleted OLM and its subscription, the deployment, services and pods were gone.
First list all k8s objects in your namespace:
$ kubectl get all -n openshift-submariner
NAME READY STATUS RESTARTS AGE
pod/submariner-operator-847f545595-jwv27 1/1 Running 0 8d
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/submariner-operator-metrics ClusterIP 101.34.190.249 <none> 8383/TCP 8d
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/submariner-operator 1/1 1 1 8d
NAME DESIRED CURRENT READY AGE
replicaset.apps/submariner-operator-847f545595 1 1 1 8d
OLM is not listed with get all, so I search for it specifically:
$ kubectl get olm -n openshift-submariner
NAME AGE
operatorgroup.operators.coreos.com/openshift-submariner 8d
NAME DISPLAY VERSION
clusterserviceversion.operators.coreos.com/submariner-operator Submariner 0.0.1
Now delete all objects, including OLMs, subscriptions, deployments, replica-sets, etc:
$ kubectl delete olm,svc,rs,rc,subs,deploy,jobs,pods --all -n openshift-submariner
operatorgroup.operators.coreos.com "openshift-submariner" deleted
clusterserviceversion.operators.coreos.com "submariner-operator" deleted
deployment.extensions "submariner-operator" deleted
subscription.operators.coreos.com "submariner" deleted
service "submariner-operator-metrics" deleted
replicaset.extensions "submariner-operator-847f545595" deleted
pod "submariner-operator-847f545595-jwv27" deleted
List objects again - all gone:
$ kubectl get all -n openshift-submariner
No resources found.
$ kubectl get olm -n openshift-submariner
No resources found.
After taking an interactive tutorial I ended up with a bunch of pods, services, deployments:
me#pooh ~ > kubectl get pods,services
NAME READY STATUS RESTARTS AGE
pod/kubernetes-bootcamp-5c69669756-lzft5 1/1 Running 0 43s
pod/kubernetes-bootcamp-5c69669756-n947m 1/1 Running 0 43s
pod/kubernetes-bootcamp-5c69669756-s2jhl 1/1 Running 0 43s
pod/kubernetes-bootcamp-5c69669756-v8vd4 1/1 Running 0 43s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 37s
me#pooh ~ > kubectl get deployments --all-namespaces
NAMESPACE NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
default kubernetes-bootcamp 4 4 4 4 1h
docker compose 1 1 1 1 1d
docker compose-api 1 1 1 1 1d
kube-system kube-dns 1 1 1 1 1d
To clean up everything, delete --all worked fine:
me#pooh ~ > kubectl delete pods,services,deployments --all
pod "kubernetes-bootcamp-5c69669756-lzft5" deleted
pod "kubernetes-bootcamp-5c69669756-n947m" deleted
pod "kubernetes-bootcamp-5c69669756-s2jhl" deleted
pod "kubernetes-bootcamp-5c69669756-v8vd4" deleted
service "kubernetes" deleted
deployment.extensions "kubernetes-bootcamp" deleted
That left me with (what I think is) an empty Kubernetes cluster:
me#pooh ~ > kubectl get pods,services,deployments
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 8m
In some cases the pods will still not go away even when deleting the deployment. In that case to force delete them you can run the below command.
kubectl delete pods podname --grace-period=0 --force
When the pod is recreating automatically even after the deletion of the pod manually, then those pods have been created using the Deployment.
When you create a deployment, it automatically creates ReplicaSet and Pods. Depending upon how many replicas of your pod you mentioned in the deployment script, it will create those number of pods initially.
When you try to delete any pod manually, it will automatically create those pod again.
Yes, sometimes you need to delete the pods with force. But in this case force command doesn’t work.
Instead of removing NS you can try removing replicaSet
kubectl get rs --all-namespaces
Then delete the replicaSet
kubectl delete rs your_app_name
The root cause for the question asked was the deployment/job/replicasets spec attribute strategy->type which defines what should happen when the pod will be destroyed (either implicitly or explicitly). In my case, it was Recreate.
As per #nomad's answer, deleting the deployment/job/replicasets is the simple fix to avoid experimenting with deadly combos before messing up the cluster as a novice user.
Try the following commands to understand the behind the scene actions before jumping into debugging :
kubectl get all -A -o name
kubectl get events -A | grep <pod-name>
In my case I deployed via a YAML file like kubectl apply -f deployment.yaml and the solution appears to be to delete via kubectl delete -f deployment.yaml
Firstly list the deployments
kubectl get deployments
After that delete the deployment
kubectl delete deployment <deployment_name>
If you have a job that continues running, you need to search the job and delete it:
kubectl get job --all-namespaces | grep <name>
and
kubectl delete job <job-name>
You can do kubectl get replicasets check for old deployment based on age or time
Delete old deployment based on time if you want to delete same current running pod of application
kubectl delete replicasets <Name of replicaset>
I also faced the issue, I have used below command to delete deployment.
kubectl delete deployments DEPLOYMENT_NAME
but still pods was recreating, So I crossed check the Replica Set by using below command
kubectl get rs
then edit the replicaset to 1 to 0
kubectl edit rs REPICASET_NAME
With deployments that have stateful sets (or services, jobs, etc.) you can use this command:
This command terminates anything that runs in the specified <NAMESPACE>
kubectl -n <NAMESPACE> delete replicasets,deployments,jobs,service,pods,statefulsets --all
And forceful
kubectl -n <NAMESPACE> delete replicasets,deployments,jobs,service,pods,statefulsets --all --cascade=true --grace-period=0 --force
There is basically two ways to remove PODS
kubectl scale --replicas=0 deploy name_of_deployment.
This will set the number of replica to 0 and hence it will not restart the pods again.
Use helm to uninstall the chart which you have implemented in your pipeline.
Do not delete the deployment directly, instead use helm to uninstall the chart which will remove all objects it created.
The fastest solution for me was installing Lens IDE and removing the service under de DEPLOYMENTS tab. Just delete from this tab and the replica will be deleted too.
Best regards
Kubernetes always works in the format like:
deployments >>> replicasets >>> pods
first edit deployment with 0 replicas and then scale deployment with desired replicas(run below command).You will see new replicaset has been created and pods will also run with desired count.
*
IN-Linux:~ anuragmanikkame$ kubectl scale deploy tomcat -n
dev-namespace --replicas=2 deployment.extensions/tomcat scaled
I experienced a similar problem: after deleting the deployment (kubectl delete deploy <name>), the pods kept "Running" and where automatically re-created after deletion (kubectl delete po <name>).
It turned out that the associated replica set was not deleted automatically for some reason, and after deleting that (kubectl delete rs <name>), it was possible to delete the pods.
This has happened to me with some broken 'helm' installs. You might have a bit of a messed up deployment. If none of the previous suggestions work, look for a daemonset and delete that.
eg
kubectl get daemonset --namespace
then delete daemonset
kubectl delete daemonset --namespace <NAMESPACE> --all --force
then try to delete the pods.
kubectl delete pod --namespace <NAMESPACE> --all --force
Check if pods are gone.
kubectl get pods --all-namespaces
In my case I use these below
kubectl get all --all-namespaces
kubectl delete deployment statefulset-deploymentnament(choose your deployment name)
kubectl delete sts -n default(choose your namespace) --all
kubectl get pods --all-namespaces
Problem got resolved

Pods stuck in Terminating status

I tried to delete a ReplicationController with 12 pods and I could see that some of the pods are stuck in Terminating status.
My Kubernetes cluster consists of one control plane node and three worker nodes installed on Ubuntu virtual machines.
What could be the reason for this issue?
NAME READY STATUS RESTARTS AGE
pod-186o2 1/1 Terminating 0 2h
pod-4b6qc 1/1 Terminating 0 2h
pod-8xl86 1/1 Terminating 0 1h
pod-d6htc 1/1 Terminating 0 1h
pod-vlzov 1/1 Terminating 0 1h
You can use following command to delete the POD forcefully.
kubectl delete pod <PODNAME> --grace-period=0 --force --namespace <NAMESPACE>
The original question is "What could be the reason for this issue?" and the answer is discussed at https://github.com/kubernetes/kubernetes/issues/51835 & https://github.com/kubernetes/kubernetes/issues/65569 & see https://www.bountysource.com/issues/33241128-unable-to-remove-a-stopped-container-device-or-resource-busy
Its caused by docker mount leaking into some other namespace.
You can logon to pod host to investigate.
minikube ssh
docker container ps | grep <id>
docker container stop <id>
Force delete the pod:
kubectl delete pod --grace-period=0 --force --namespace <NAMESPACE> <PODNAME>
The --force flag is mandatory.
I found this command more straightforward:
for p in $(kubectl get pods | grep Terminating | awk '{print $1}'); do kubectl delete pod $p --grace-period=0 --force;done
It will delete all pods in Terminating status in default namespace.
Delete the finalizers block from resource (pod,deployment,ds etc...) yaml:
"finalizers": [
"foregroundDeletion"
]
In my case the --force option didn't quite work. I could still see the pod ! It was stuck in Terminating/Unknown mode. So after running
kubectl -n redis delete pods <pod> --grace-period=0 --force
I ran
kubectl -n redis patch pod <pod> -p '{"metadata":{"finalizers":null}}'
Practical answer -- you can always delete a terminating pod by running:
kubectl delete pod NAME --grace-period=0
Historical answer -- There was an issue in version 1.1 where sometimes pods get stranded in the Terminating state if their nodes are uncleanly removed from the cluster.
I stumbled upon this recently to free up resource in my cluster. here is the command to delete them all.
kubectl get pods --all-namespaces | grep Terminating | while read line; do
pod_name=$(echo $line | awk '{print $2}' ) \
name_space=$(echo $line | awk '{print $1}' ); \
kubectl delete pods $pod_name -n $name_space --grace-period=0 --force
done
hope this help someone who read this
Force delete ALL pods in namespace:
kubectl delete pods --all -n <namespace> --grace-period 0 --force
If --grace-period=0 is not working then you can do:
kubectl delete pods <pod> --grace-period=0 --force
I stumbled upon this recently when removing rook ceph namespace - it got stuck in Terminating state.
The only thing that helped was removing kubernetes finalizer by directly calling k8s api with curl as suggested here.
kubectl get namespace rook-ceph -o json > tmp.json
delete kubernetes finalizer in tmp.json (leave empty array "finalizers": [])
run kubectl proxy in another terminal for auth purposes and run following curl request to returned port
curl -k -H "Content-Type: application/json" -X PUT --data-binary #tmp.json 127.0.0.1:8001/k8s/clusters/c-mzplp/api/v1/namespaces/rook-ceph/finalize
namespace is gone
Detailed rook ceph teardown here.
I used this command to delete the pods
kubectl delete pod --grace-period=0 --force --namespace <NAMESPACE> <PODNAME>
But when I tried run another pod, it didn't work, it was stuck in "Pending" state, it looks like the node itself was stuck.
For me, the solution was to recreate the node. I simply went to GKE console and deleted the node from the cluster and so GKE started another.
After that, everything started to work normally again.
I had to same issue in a production Kubernetes cluster.
A pod was stuck in Terminating phase for a while:
pod-issuing mypod-issuing-0 1/1 Terminating 0 27h
I tried checking the logs and events using the command:
kubectl describe pod mypod-issuing-0 --namespace pod-issuing
kubectl logs mypod-issuing-0 --namespace pod-issuing
but none was available to view
How I fixed it:
I ran the command below to forcefully delete the pod:
kubectl delete pod <PODNAME> --grace-period=0 --force --namespace <NAMESPACE>
This deleted the pod immediately and started creating a new one. However, I ran into the error below when another pod was being created:
Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[data mypod-issuing-token-5swgg aws-iam-token]: timed out waiting for the condition
I had to wait for 7 to 10 minutes for the volume to become detached from the previous pod I deleted so that it can become available for this new pod I was creating.
For my case, I don't like workaround. So there are steps :
k get pod -o wide -> this will show which Node is running the pod
k get nodes -> Check status of that node... I got it NotReady
I went and I fixed that node. For my case, it's just restart kubelet :
ssh that-node -> run swapoff -a && systemctl restart kubelet (Or systemctl restart k3s in case of k3s | or systemctl restart crio in other cases like OCP 4.x (k8s <1.23) )
Now deletion of pod should work without forcing the Poor pod.
Please try below command:
kubectl patch pod <pod>-p '{"metadata":{"finalizers":null}}'
Before doing a force deletion i would first do some checks.
1- node state: get the node name where your node is running, you can see this with the following command:
"kubectl -n YOUR_NAMESPACE describe pod YOUR_PODNAME"
Under the "Node" label you will see the node name.
With that you can do:
kubectl describe node NODE_NAME
Check the "conditions" field if you see anything strange.
If this is fine then you can move to the step, redo:
"kubectl -n YOUR_NAMESPACE describe pod YOUR_PODNAME"
Check the reason why it is hanging, you can find this under the "Events" section.
I say this because you might need to take preliminary actions before force deleting the pod, force deleting the pod only deletes the pod itself not the underlying resource (a stuck docker container for example).
I'd not recommend force deleting pods unless container already exited.
Verify kubelet logs to see what is causing the issue "journalctl -u kubelet"
Verify docker logs: journalctl -u docker.service
Check if pod's volume mount points still exist and if anyone holds lock on it.
Verify if host is out of memory or disk
you can use awk :
kubectl get pods --all-namespaces | awk '{if ($4=="Terminating") print "oc delete pod " $2 " -n " $1 " --force --grace-period=0 ";}' | sh
One reason WHY this happens can be turning off a node (without draining it). Fix in this case is to turn on the node again; then termination should succeed.
My pods stuck in 'Terminating', even after I tried to restart docker & restart server. Resolved after edit the pod & delete items below 'finalizer'
$ kubectl -n mynamespace edit pod/my-pod-name
I am going to try the most extense answer, because none of the above are wrong, but they do not work in all case scenarios.
The usual way to put an end to a terminating pod is:
kubectl delete pod -n ${namespace} ${pod} --grace-period=0
But you may need to remove finalizers that could be preventing the POD from stoppoing using:
kubectl -n ${namespace} patch pod ${pod} -p '{"metadata":{"finalizers":null}}'
If none of that works, you can remove the pod from etcd with etcdctl:
# Define variables
ETCDCTL_API=3
certs-path=${HOME}/.certs/e
etcd-cert-path=${certs-path}/etcd.crt
etcd-key-path=${certs-path}/etcd.key
etcd-cacert-path=${certs-path}/etcd.ca
etcd-endpoints=https://127.0.0.1:2379
namespace=myns
pod=mypod
# Call etcdctl to remove the pod
etcdctl del \
--endpoints=${etcd-endpoints}\
--cert ${etcd-cert-path} \
--key ${etcd-client-key}\
--cacert ${etcd-cacert-path} \
--prefix \
/registry/pods/${namespace}/${pod}
This last case should be used as last resource, in my case I ended having to do it due to a deadlock that prevented calico from starting in the node due to Pods under terminating status. Those pods won't be removed until calico is up, but they have reserved enough CPU to avoid calico, or any other pod, from Initializing.
Following command with awk and xargs can be used along with --grace-period=0 --force to delete all the Pods in Terminating state.
kubectl get pods|grep -i terminating | awk '{print $1}' | xargs kubectl delete --grace-period=0 --force pod
go templates will work without awk, for me it works without --grace-period=0 --force but, add it if you like
this will output the command to delete the Terminated pods.
kubectl get pods --all-namespaces -otemplate='{{ range .items }}{{ if eq .status.reason "Terminated" }}{{printf "kubectl delete pod -n %v %v\n" .metadata.namespace .metadata.name}}{{end}}{{end}}'
if you are happy with the output, you cat add | sh - to execute it.
as follow:
kubectl get pods --all-namespaces -otemplate='{{ range .items }}{{ if eq .status.reason "Terminated" }}{{printf "kubectl delete pod -n %v %v\n" .metadata.namespace .metadata.name}}{{end}}{{end}}' |sh -
for me below command has resolved the issue
oc patch pvc pvc_name -p '{"metadata":{"finalizers":null}}