"Waiting for tearing down pods" when Kubernetes turns down - kubernetes

I have a Kubernetes cluster installed in my Ubuntu machines. It consists of three machines: one master/node and two nodes.
When I tear down the cluster, it never stops printing "waiting for tearing down pods":
root@kubernetes01:~/kubernetes/cluster# KUBERNETES_PROVIDER=ubuntu ./kube-down.sh
Bringing down cluster using provider: ubuntu
Identity added: /root/.ssh/id_rsa (/root/.ssh/id_rsa)
No resources found
No resources found
service "kubernetes" deleted
No resources found
waiting for tearing down pods
waiting for tearing down pods
waiting for tearing down pods
... (the same line repeats indefinitely)
There are no pods or services running when I bring it down. In the end, I have to force it to stop by killing processes and stopping services.

First, find out which replication controllers (rc) are still running:
kubectl get rc --namespace=kube-system
Delete each running rc:
kubectl delete rc above_running_rc_name --namespace=kube-system
Then the teardown script, "KUBERNETES_PROVIDER=ubuntu ./kube-down.sh", will complete without hanging on "waiting for tearing down pods".
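If more than one rc is still running, something like this one-liner should clear them all in one go (an untested sketch, adjust as needed):
kubectl get rc --namespace=kube-system -o name | xargs -r -n1 kubectl delete --namespace=kube-system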
Example:
root@ubuntu:~/kubernetes/cluster# KUBERNETES_PROVIDER=ubuntu ./kube-down.sh
Bringing down cluster using provider: ubuntu
Identity added: /root/.ssh/id_rsa (/root/.ssh/id_rsa)
No resources found
No resources found
service "kubernetes" deleted
No resources found
waiting for tearing down pods
waiting for tearing down pods
^C
root@ubuntu:~/kubernetes/cluster# kubectl get rc --namespace=kube-system
CONTROLLER CONTAINER(S) IMAGE(S) SELECTOR REPLICAS AGE
kubernetes-dashboard-v1.0.1 kubernetes-dashboard gcr.io/google_containers/kubernetes-dashboard-amd64:v1.0.1 k8s-app=kubernetes-dashboard 1 44m
root@ubuntu:~/kubernetes/cluster#
root@ubuntu:~/kubernetes/cluster# kubectl delete rc kubernetes-dashboard-v1.0.1 --namespace=kube-system
replicationcontroller "kubernetes-dashboard-v1.0.1" deleted
root@ubuntu:~/kubernetes/cluster# KUBERNETES_PROVIDER=ubuntu ./kube-down.sh
Bringing down cluster using provider: ubuntu
Identity added: /root/.ssh/id_rsa (/root/.ssh/id_rsa)
No resources found
No resources found
service "kubernetes" deleted
No resources found
Cleaning on master 172.27.59.208
26979
etcd stop/waiting
Connection to 172.27.59.208 closed.
Connection to 172.27.59.208 closed.
Connection to 172.27.59.208 closed.
Cleaning on node 172.27.59.233
2165
flanneld stop/waiting
Connection to 172.27.59.233 closed.
Connection to 172.27.59.233 closed.
Done

You can find out which pods it is waiting for by running:
kubectl get pods --show-all --all-namespaces
That's the check the script runs: https://github.com/kubernetes/kubernetes/blob/1c80864913e4b9da957c45eef005b06dba68cec3/cluster/ubuntu/util.sh#L689
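For reference, the teardown loop in that script looks roughly like this (a simplified sketch, not the verbatim code; see the linked util.sh for the actual check):
while [[ -n $(kubectl get pods --show-all --all-namespaces -o name 2>/dev/null) ]]; do
  echo "waiting for tearing down pods"
  sleep 2
done
So anything still reported by that kubectl call, in any namespace, keeps the script waiting.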

Related

Why kubernetes keeps pods in Error/Completed status - preemptible nodes, GKE

I have an issue with my GKE cluster. I am using two node pools: secondary, with a standard set of highmem-n1 nodes, and primary, with preemptible highmem-n1 nodes. The issue is that I have many pods in Error/Completed status which are not cleared by k8s, all of which ran on the preemptible pool. THESE PODS ARE NOT JOBS.
GKE documentation says that:
"Preemptible VMs are Compute Engine VM instances that are priced lower than standard VMs and provide no guarantee of availability. Preemptible VMs offer similar functionality to Spot VMs, but only last up to 24 hours after creation."
"When Compute Engine needs to reclaim the resources used by preemptible VMs, a preemption notice is sent to GKE. Preemptible VMs terminate 30 seconds after receiving a termination notice."
Ref: https://cloud.google.com/kubernetes-engine/docs/how-to/preemptible-vms
And from the kubernetes documentation:
"For failed Pods, the API objects remain in the cluster's API until a human or controller process explicitly removes them.
The Pod garbage collector (PodGC), which is a controller in the control plane, cleans up terminated Pods (with a phase of Succeeded or Failed), when the number of Pods exceeds the configured threshold (determined by terminated-pod-gc-threshold in the kube-controller-manager). This avoids a resource leak as Pods are created and terminated over time."
Ref: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-garbage-collection
So, from my understanding, this set of nodes changes every 24 hours, which kills all the pods running on them, and depending on how graceful the shutdown is, the pods end up in Completed or Error state. Nevertheless, Kubernetes is not clearing or removing them, so I have tons of pods in these statuses in my cluster, which is not expected at all.
I am attaching screenshots for reference.
Example kubectl describe pod output:
Status: Failed
Reason: Terminated
Message: Pod was terminated in response to imminent node shutdown.
Apart from that, no events, logs, etc.
GKE version:
1.24.7-gke.900
Both Node pools versions:
1.24.5-gke.600
Did anyone encounter such an issue or know what's going on here? Is there a solution to clear them other than writing some script and running it periodically?
I tried digging into the GKE logs, but I couldn't find anything. I also tried to look for answers in the docs, but failed.
On clusters running GKE version 1.20 and later with preemptible node pools, the kubelet graceful node shutdown feature is enabled by default. The kubelet notices the termination notice and gracefully terminates Pods that are running on the node. If the Pods are part of a Deployment, the controller creates and schedules new Pods to replace the terminated Pods.
During graceful Pod termination, the kubelet updates the status of the Pod, assigning a Failed phase and a Terminated reason to the terminated Pods. When the number of terminated Pods reaches a threshold, garbage collection cleans up the Pods.
On GKE version 1.21.3-gke.1200 and later, you can also delete shutdown Pods manually:
kubectl get pods --all-namespaces | grep -i NodeShutdown | awk '{print $1, $2}' | xargs -n2 kubectl delete pod -n
kubectl get pods --all-namespaces | grep -i Terminated | awk '{print $1, $2}' | xargs -n2 kubectl delete pod -n
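Alternatively, if you just want to bulk-remove everything in the Failed phase (this is not in the GKE docs quoted above, but the field selector is standard kubectl), you can use:
kubectl delete pods --field-selector=status.phase=Failed --all-namespaces
This catches any pod left in the Failed phase, including the NodeShutdown/Terminated ones, without grepping the table output.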

Kubernetes pod failed to update

We have a GitLab CI/CD pipeline that deploys pods via Kubernetes. However, the updated pod is always Pending and the deleted pod is always stuck at Terminating.
The controller and scheduler are both okay.
If I describe the pending pod, it shows that it is scheduled but nothing else.
This is the pending pod's logs:
$ kubectl logs -f robo-apis-dev-7b79ccf74b-nr9q2 -n xxx -f
Error from server (BadRequest): container "robo-apis-dev" in pod "robo-apis-dev-7b79ccf74b-nr9q2" is waiting to start: ContainerCreating
What could be the issue? Our Kubernetes cluster never had this issue before.
Okay, it turns out we used to have an NFS server backing our PVCs. We recently moved to AWS EKS and cleaned up the NFS servers, but there may have been resources on the nodes that still pointed at the NFS server. Once we temporarily rolled the NFS server back, the pods started to move to the Running state.
The issue was discussed here - Orphaned pod https://github.com/kubernetes/kubernetes/issues/60987
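If you suspect the same orphaned-pod problem, the kubelet log on the affected node is the quickest confirmation (assuming a systemd-managed kubelet; adjust the unit name if yours differs):
journalctl -u kubelet --no-pager | grep -i "orphaned pod"
The messages typically include the pod UID and note that volume paths are still present on disk, which points at the stale NFS-backed volumes.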

How to reschedule pods from a node in Kubernetes (bare-metal servers)?

Kubernetes nodes become unschedulable when I initiate a drain or cordon, but the pods on that node are not moved to a different node immediately.
To be clear, these pods are not created by a DaemonSet.
So how can an application's pods stay 100% available when a node becomes faulty or has issues?
Any input?
Commands used:
To drain/cordon the node to make it unavailable:
kubectl drain node1
kubectl cordon node1
To check the node status :
kubectl get nodes
To check the pod status before / after cordon or drain :
kubectl get pods -o wide
kubectl describe pod <pod-name>
The surprising part is that even when the node is unavailable, the pod status always shows Running. :-)
Pods do not migrate to another node by themselves.
You can use workload resources to create and manage multiple Pods for you. A controller for the resource handles replication and rollout and automatic healing in case of Pod failure. For example, if a Node fails, a controller notices that Pods on that Node have stopped working and creates a replacement Pod. The scheduler places the replacement Pod onto a healthy Node.
Some examples of controllers are:
Deployment
DaemonSet
StatefulSet
Check this link for more information.
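As a small illustration (a hypothetical nginx example, not part of the original answer), running pods under a Deployment rather than as bare pods lets the controller replace them when a node is drained or fails:
kubectl create deployment nginx --image=nginx
kubectl scale deployment nginx --replicas=3
kubectl drain node1 --ignore-daemonsets
kubectl get pods -o wide
After the drain, the Deployment's ReplicaSet schedules replacement pods on the remaining healthy nodes, which is what keeps the application available.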

Rancher 2.0 - Troubleshooting and fixing “Controller Manager Unhealthy Issue”

I have a problem with controller-manager and scheduler not responding, that is not related to github issues I've found (rancher#11496, azure#173, …)
Two days ago we had a memory overflow caused by one pod on one node in our 3-node HA cluster. After that, the Rancher web app was not accessible; we found the offending pod and scaled it to 0 via kubectl, but figuring everything out took some time.
Since then the Rancher web app has been working properly, but there are continuous alerts about the controller-manager and scheduler not working. The alerts are not consistent: sometimes both are healthy, and sometimes their health-check URLs refuse connections.
NAME STATUS MESSAGE ERROR
controller-manager Unhealthy Get http://127.0.0.1:10252/healthz: dial tcp 127.0.0.1:10252: connect: connection refused
scheduler Healthy ok
etcd-0 Healthy {"health": "true"}
etcd-2 Healthy {"health": "true"}
etcd-1 Healthy {"health": "true"}
Restarting controller-manager and scheduler on compromised Node hasn’t been effective. Even reloading all of the components with
docker restart kube-apiserver kubelet kube-controller-manager kube-scheduler kube-proxy
wasn’t effective either.
Can someone please help me figure out the steps towards troubleshooting and fixing this issue without downtime on running containers?
Nodes are hosted on DigitalOcean on servers with 4 Cores and 8GB of RAM each (Ubuntu 16, Docker 17.03.3).
Thanks in advance !
The first area to look at would be your logs... Can you export the following logs and attach them?
/var/log/kube-controller-manager.log
The controller manager is an endpoint, so you will need to do a "get endpoint". Can you run the following:
kubectl -n kube-system get endpoints kube-controller-manager
and
kubectl -n kube-system describe endpoints kube-controller-manager
and
kubectl -n kube-system get endpoints kube-controller-manager -o jsonpath='{.metadata.annotations.control-plane\.alpha\.kubernetes\.io/leader}'
Please run these commands on the master nodes:
sed -i 's|- --port=0|#- --port=0|' /etc/kubernetes/manifests/kube-scheduler.yaml
sed -i 's|- --port=0|#- --port=0|' /etc/kubernetes/manifests/kube-controller-manager.yaml
systemctl restart kubelet
After restarting the kubelet, the problem should be resolved.
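As a quick sanity check after the restart (componentstatuses is deprecated in newer Kubernetes releases, but it works for this purpose):
kubectl get componentstatuses
Both controller-manager and scheduler should now report Healthy.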

Why pods aren't rescheduled when the remote kubelet is unreachable

I'm currently doing some tests on a kubernetes cluster.
I was wondering why pods aren't rescheduled in some cases:
When the node is unreachable
When the remote kubelet doesn't answer
Actually, the only case where a pod gets rescheduled is when the kubelet notifies the master.
Is this on purpose? Why?
If I shut down a server where an rc with a single pod is running, my service is down.
Maybe there's something I misunderstood.
Regards,
Smana
There is quite a long default timeout for detecting unreachable nodes and for rescheduling pods; maybe you did not wait long enough?
You can adjust the timeouts with several flags:
node-status-update-frequency on the kubelet (http://kubernetes.io/v1.0/docs/admin/kubelet.html)
node-monitor-grace-period and pod-eviction-timeout on the kube-controller-manager (http://kubernetes.io/v1.0/docs/admin/kube-controller-manager.html)
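For example (illustrative values only, not recommendations), you could pass flags along these lines:
On each kubelet:
--node-status-update-frequency=4s
On the kube-controller-manager:
--node-monitor-grace-period=20s
--pod-eviction-timeout=30s
With settings in that range, an unreachable node is detected and its pods are evicted and rescheduled within roughly a minute, instead of the several minutes you get with the defaults.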