Auto delete CrashBackoffLoop pods in a deployment - kubernetes

In my kubernetes cluster, there are multiple deployments in a namespace.
For a specific deployment, there is a need to not allow "CrashLoopBackoff" pods to exist.
So basically, when any pod gets to this state, I would want it to be deleted and later a new pod to be created which is already handled by the ReplicaSet.
I tried with custom controllers, with the thought that the SharedInformer would alert about the state of Pod and then I would delete it from that loop.
However, this brings dependency on the pod on which the custom controller would run.
I also tried searching for any option to be configured in the manifest itself, but could not find any.
I am pretty new to Kuberenetes, so need help in the implementation of this behaviour.

Firstly, you should address the reason why the pod has entered the CrashLoopBackOff state rather than just delete it. If you do this, you'll potentially just recreate the problem again and you'll be deleting pods repeatedly. For example, if your pod is trying to access an external DB and that DB is down, it'll CrashLoop, and deleting and restarting the pod won't help fix that.
Secondly, if you want to do this deleting in an automated manner, an easy way would be to run a CronJob resource that goes through your deployment and deletes the CrashLooped pods. You could set the cronjob to run once an hour or whatever schedule you wish.

Deleting the POD and waiting for the New one is like restarting the deployment or POD.
Kubernetes will auto restart your CrashLoopBackoff POD if failing, you can check the Restart count.
NAME READY STATUS RESTARTS AGE
te-pod-1 0/1 CrashLoopBackOff 2 1m44s
This restarts will be similar to what you have mentioned
when any pod gets to this state, I would want it to be deleted and
later a new pod to be created which is already handled by the
ReplicaSet.
If you want to remove Crashing the POD fully and not look for new POD to come up, you have to rollback the deployment.
If there is any issue with your Replicaset and your POD is crashing it would be useless, any number of times you delete and restart the POD it will crash all time, unless you check logs & debug to solve the real issue in replicaset(Deployment).

Related

Job still running even when deleting nodes

I created a two nodes clusters and I created a new job using the busybox image that sleeps for 300 secs. I checked on which node this job is running using
kubectl get pods -o wide
I deleted the node but surprisingly the job was still finishing to run on the same node. Any idea if this is a normal behavior? If not how can I fix it?
Jobs aren't scheduled or running on nodes. The role of a job is just to define a policy by making sure that a pod with certain specifications exists and ensure that it runs till the completion of the task whether it completed successfully or not.
When you create a job, you are declaring a policy that the built-in job-controller will see and will create a pod for. Then the built-in kube-scheduler will see this pod without a node and patch the pod to it with a node's identity. The kubelet will see a pod with a node matching it's own identity and hence a container will be started. As the container will be still running, the control-plane will know that the node and the pod still exist.
There are two ways of breaking a node, one with a drain and the second without a drain. The process of breaking a node without draining is identical to a network cut or a server crash. The api-server will keep the node resource for a while, but it 'll cease being Ready. The pods will be then terminated slowly. However, when you drain a node, it looks as if you are preventing new pods from scheduling on to the node and deleting the pods using kubectl delete pod.
In both ways, the pods will be deleted and you will be having a job that hasn't run to completion and doesn't have a pod, therefore job-controller will make a new pod for the job and the job's failed-attempts will be increased by 1, and the loop will start over again.

How to delete a pod from Kubernetes master node?

Does anyone know how to delete pod from kubernetes master node? I have this one master node on bare-metal ubuntu server. When i'm trying to delete it with "kubectl delete pod .." or force deleting from there: https://kubernetes.io/docs/tasks/run-application/force-delete-stateful-set-pod/ it doesnt work. the pod is creating again and again...
The pods in a Statefulsets are managed by ReplicaSets and will be recreated again if the current and the desired replicas defined in the spec do not match.
The document you linked provides instructions as to how to kill the pods forcefully avoiding the graceful shutdown behaviour which can have unexpected behaviour depending on the application.
The link clearly states the pods will be recreated in the section:
Force deletions do not wait for confirmation from the kubelet that the Pod has been terminated. Irrespective of whether a force deletion is successful in killing a Pod, it will immediately free up the name from the apiserver. This would let the StatefulSet controller create a replacement Pod with that same identity; this can lead to the duplication of a still-running Pod, and if said Pod can still communicate with the other members of the StatefulSet, will violate the at most one semantics that StatefulSet is designed to guarantee.
If you want the pods to be stopped and new pods for the Statefulset do not get created, you need to scale down the Statefulset by changing the replicas to 0.
You can read the official docs for how to scale the Statefulset replicas.
The key to figuring out how to kill the pod will be to understand how it was created. For example, if the pod is part of a deployment with a declared replicas count as 1, Once you kill/ force kill, Kubernetes detects a mismatch between the desired state (the number of replicas defined in the deployment configuration) to the current state and will create a new pod to replace the one that was deleted - therefor in this example you will need to either scale the deployment to 0 or delete the deployment.
If we need to kill any pod we can just scale down the replica set.
kubectl scale deploy <deployment_name> --replicas=<expected_no_of_replicas>
Way of deleting pods will depends on how you created it. If you created it individually ( not part of a ReplicaSet/ReplicationController/Deployment ) then you can delete pod directly. otherwise the only option to delete is the scale option. In production setup what I believe is all are using Deployment option out of ReplicaSet/ReplicationController/Deployment( Please refer documents and understand the difference between all those three options )

Kubernetes rolling deploy: terminate a pod only when there are no containers running

I am trying to deploy updates to pods. However I want the current pods to terminate only when all the containers inside the pod have terminated and their process is complete.
The new pods can keep waiting to start untill all container in the old pods have completed. We have a mechanism to stop old pods from picking up new tasks and therefore they should eventually terminate.
It's okay if twice the pods exist at some instance of time. I tried finding solution for this in kubernetes docs but wan't successful. Pointers on how / if this is possible would be helpful.
well I guess then you may have to create a duplicate kind of deployment with new image as required and change the selector in service to new deployment, which will prevent external traffic from entering pre-existing pods and new calls can go to new pods. Then later you can check for something like -
Kubectl top pods -c containers
and if the load appears to be static and low, then preferrably you can delete the old pods related deployment later.
But for this thing everytime the service selectors have to be updated and likely for keeping track of things you can append the git commit hash to the service selector to keep it unique everytime.
But rollback to previous versions if required from inside Kubernetes cluster will be difficult, so preferably you can trigger the wanted build again.
I hope this makes some sense !!

Duplicate pods / Pods creating without deploy existing

I'm running into an issue managing my Kubernetes pods.
I had a deploy instance which I removed and created a new one. The pod tied to that deploy instance shut down as expected and a new one came up when I created a new deploy, as expected.
However, once I changed the deploy, a second pod began running. I tried to "kubectl delete pod pod-id" but it would just recreate itself again.
I went through the same process again and now I'm stuck with 3 pods, and no deploy. I removed the deploy completely, and I try to delete the pods but they keep recreating themselves. This is an issue because I am exhausting the resources available on my Kubernetes.
Does anyone know how to force remove these pods? I do not know how they are recreating themselves if there's no deploy to go by.
The root cause could be either an existing deployment, replicaset, daemonset, statefulset or a static pod. Check if any of these exist in the affected namespace using kubectl get <RESOURCE-TYPE>
I've had this happen after issuing a rollout restart deployment while a pod was already in an error or creating state, and explicitly deleting the second pod only resulted in a new one getting scheduled (trick birthday candle situation).
I find almost any time I have an issue like this it can be fixed by simply zeroing out the replicaSets in the deployment, applying, then restoring replicaSets to the original value.

Postgres pod suddenly dissapeared after update in gcloud

We changed kubernetes node version because of this message, and because for some reason, pods were unable to be scheduled
Before this message, however, there was a postgres pod running
As you can see the pod is gone, for some reason, why is it so?
I cannot seem to get it back, when I try kubectl get events I get that no resources cannot be found, is there anyway to revive the postgres container, or get information about it, why is it down? What could I do? kubectl logs postgres doesn't seem to work either.
What I want to get is where was this postgres pod running (like the location path), or if the configuration of this pod is still available, or if this is lost forever. If the pod is dead, can I still access to it's "graveyard" (that means the database data), or was this cleaned up?
Update
Okay so it turns out this pod wasn't managed by a controller, so that's why when it died there was no traces of it, but why there is no log information that this pod was killed?
Judging by the name your pod has, it wasn't provisioned using a deployment or a replicaset (if it was, like your other pods, it'd have a random id after its name)
More than likely, it's a standalone pod, which means one the node is gone, the pod is gone.
It might be possible to use kubectl get pods --show-all but it's unlikely.
If your database has a persistent volume, you may still be able to retrieve the data by reattaching that to a new postgres pod.
In future, you might consider setting the termination message and message path and also ensuring all pods are in a replicaset or deployment with persistent volumes attached.