Skipping a pod deployment in statefulset - kubernetes

I have a stateful set of pods, and due to the stateful nature of them one of them cannot be recreated due to some state error that deleting it wouldn't help.
Since it's a stateful set kubernetes will block creation of additional pods until it's able to get the stuck one running.
Statefulsets has podManagementPolicy: "Parallel" but it cannot be changed in runtime.
The question is if there's a way to make kubernetes skip the stuck one?

I belive you are looking for a WA for an known issue which is still open
StatefulSet will continue to wait for the broken Pod to become Ready (which never happens) before it will attempt to revert it back to the working configuration.
In term of upgrade found this on git hub below from official doc
The Pods in the StatefulSet are updated in reverse ordinal order. The StatefulSet controller terminates each Pod, and waits for it to transition to Running and Ready prior to updating the next Pod.
Note that, even though the StatefulSet controller will not proceed to update the next Pod until its ordinal successor is Running and Ready, it will restore any Pod that fails during the update to its current version. Pods that have already received the update will be restored to the updated version, and Pods that have not yet received the update will be restored to the previous version. In this way, the controller attempts to continue to keep the application healthy and the update consistent in the presence of intermittent failures.
Read Forced Rollback
When using Rolling Updates with the default Pod Management Policy (OrderedReady), it’s possible to get into a broken state that requires manual intervention to repair.

Related

Auto delete CrashBackoffLoop pods in a deployment

In my kubernetes cluster, there are multiple deployments in a namespace.
For a specific deployment, there is a need to not allow "CrashLoopBackoff" pods to exist.
So basically, when any pod gets to this state, I would want it to be deleted and later a new pod to be created which is already handled by the ReplicaSet.
I tried with custom controllers, with the thought that the SharedInformer would alert about the state of Pod and then I would delete it from that loop.
However, this brings dependency on the pod on which the custom controller would run.
I also tried searching for any option to be configured in the manifest itself, but could not find any.
I am pretty new to Kuberenetes, so need help in the implementation of this behaviour.
Firstly, you should address the reason why the pod has entered the CrashLoopBackOff state rather than just delete it. If you do this, you'll potentially just recreate the problem again and you'll be deleting pods repeatedly. For example, if your pod is trying to access an external DB and that DB is down, it'll CrashLoop, and deleting and restarting the pod won't help fix that.
Secondly, if you want to do this deleting in an automated manner, an easy way would be to run a CronJob resource that goes through your deployment and deletes the CrashLooped pods. You could set the cronjob to run once an hour or whatever schedule you wish.
Deleting the POD and waiting for the New one is like restarting the deployment or POD.
Kubernetes will auto restart your CrashLoopBackoff POD if failing, you can check the Restart count.
NAME READY STATUS RESTARTS AGE
te-pod-1 0/1 CrashLoopBackOff 2 1m44s
This restarts will be similar to what you have mentioned
when any pod gets to this state, I would want it to be deleted and
later a new pod to be created which is already handled by the
ReplicaSet.
If you want to remove Crashing the POD fully and not look for new POD to come up, you have to rollback the deployment.
If there is any issue with your Replicaset and your POD is crashing it would be useless, any number of times you delete and restart the POD it will crash all time, unless you check logs & debug to solve the real issue in replicaset(Deployment).

How will a scheduled (rolling) restart of a service be affected by an ongoing upgrade (and vice versa)

Due to a memory leak in one of our services I am planning to add a k8s CronJob to schedule a periodic restart of the leaking service. Right now we do not have the resources to look into the mem leak properly, so we need a temporary solution to quickly minimize the issues caused by the leak. It will be a rolling restart, as outlined here:
How to schedule pods restart
I have already tested this in our test cluster, and it seems to work as expected. The service has 2 replicas in test, and 3 in production.
My plan is to schedule the CronJob to run every 2 hours.
I am now wondering: How will the new CronJob behave if it should happen to execute while a service upgrade is already running? We do rolling upgrades to achieve zero downtime, and we sometimes roll out upgrades several times a day. I don't want to limit the people who deploy upgrades by saying "please ensure you never deploy near to 08:00, 10:00, 12:00 etc". That will never work in the long term.
And vice versa, I am also wondering what will happen if an upgrade is started while the CronJob is already running and the pods are restarting.
Does kubernetes have something built-in to handle this kind of conflict?
This answer to the linked question recommends using kubectl rollout restart from a CronJob pod. That command internally works by adding an annotation to the deployment's pod spec; since the pod spec is different, it triggers a new rolling upgrade of the deployment.
Say you're running an ordinary redeployment; that will change the image: setting in the pod spec. At about the same time, the kubectl rollout restart happens that changes an annotation setting in the pod spec. The Kubernetes API forces these two changes to be serialized, so the final deployment object will always have both changes in it.
This question then reduces to "what happens if a deployment changes and needs to trigger a redeployment, while a redeployment is already running?" The Deployment documentation covers this case: it will start deploying new pods on the newest version of the pod spec and treat all older ones as "old", so a pod with the intermediate state might only exist for a couple of minutes before getting replaced.
In short: this should work consistently and you shouldn't need to take any special precautions.

Kubernetes rolling deploy: terminate a pod only when there are no containers running

I am trying to deploy updates to pods. However I want the current pods to terminate only when all the containers inside the pod have terminated and their process is complete.
The new pods can keep waiting to start untill all container in the old pods have completed. We have a mechanism to stop old pods from picking up new tasks and therefore they should eventually terminate.
It's okay if twice the pods exist at some instance of time. I tried finding solution for this in kubernetes docs but wan't successful. Pointers on how / if this is possible would be helpful.
well I guess then you may have to create a duplicate kind of deployment with new image as required and change the selector in service to new deployment, which will prevent external traffic from entering pre-existing pods and new calls can go to new pods. Then later you can check for something like -
Kubectl top pods -c containers
and if the load appears to be static and low, then preferrably you can delete the old pods related deployment later.
But for this thing everytime the service selectors have to be updated and likely for keeping track of things you can append the git commit hash to the service selector to keep it unique everytime.
But rollback to previous versions if required from inside Kubernetes cluster will be difficult, so preferably you can trigger the wanted build again.
I hope this makes some sense !!

Kubernetes Deployment Rolling Updates

I have an application that I deploy on Kubernetes.
This application has 4 replicas and I'm doing a rolling update on each deployment.
This application has a graceful shutdown which can take tens of minutes (it has to wait for running tasks to finish).
My problem is that during updates, I have over-capacity since all the older version pods are stuck at "Terminating" status while all the new pods are created.
During the updates, I end up running with 8 containers and it is something I'm trying to avoid.
I tried to set maxSurge to 0, but this setting doesn't take into consideration the "Terminating" pods, so the load on my servers during the deployment is too high.
The behaviour I'm trying to get is that new pods will only get created after the old version pods finished successfully, so at all times I'm not exceeding the number of replicas I set.
I wonder if there is a way to achieve such behaviour.
What I ended up doing is creating a StatefulSet with podManagementPolicy: Parallel and updateStrategy to OnDelete.
I also set terminationGracePeriodSeconds to the maximum time it takes for a pod to terminate.
As a part of my deployment process, I apply the new StatefulSet with the new image and then delete all the running pods.
This way all the pods are entering Terminating state and whenever a pod finished its task and terminated a new pod with the new image will replace it.
This way I'm able to keep a static number of replicas during the whole deployment process.
Let me suggest the following strategy:
Deployments implement the concept of ready pods to aide rolling updates. Readiness probes allow the deployment to gradually update pods while giving you the control to determine when the rolling update can proceed.
A Ready pod is one that is considered successfully updated by the Deployment and will no longer count towards the surge count for deployment. A pod will be considered ready if its readiness probe is successful and spec.minReadySeconds have passed since the pod was created. The default for these options will result in a pod that is ready as soon as its containers start.
So, what you can do, is implement (if you haven't done so yet) a readiness probe for your pods in addition to setting the spec.minReadySeconds to a value that will make sense (worst case) to the time that it takes for your pods to terminate.
This will ensure rolling out will happen gradually and in coordination to your requirements.
In addition to that, don't forget to configure a deadline for the rollout.
By default, after the rollout can’t make any progress in 10 minutes, it’s considered as failed. The time after which the Deployment is considered failed is configurable through the progressDeadlineSeconds property in the Deployment spec.

What is a rollout in Kubernetes?

I just started to learn Kubernetes. I know what a rollback is, but I have never heard of rollout. Is "rollout" related to rollback in any way? Or "rollout is similar to deploying something?
Rollout simply means rolling update of application. Rolling update means that application is updated gradually, gracefully and with no downtime. So when you push new version of your application's Docker image and then trigger rollout of your deployment Kubernetes first launches new pod with new image while keeping old version still running. When new pod settles down (passes its readiness probe) - Kubernetes kills old pod and switches Service endpoints to point to new version. When you have multiple replicas it will happen gradually until all replicas are replaced with new version.
This behavior however is not the only one possible. You can tune Rolling Update settings in your deployments spec.strategy settings.
Official docs even have interactive tutorial on Rolling Update feature, it perfectly explains how it works: https://kubernetes.io/docs/tutorials/kubernetes-basics/update/update-intro/
Rollout is opposite to Rollback. Yes it means deploying new application or upgrading existing application.
Note: Some more details on the paragraph that you referred. Let's say we have 5 replicas. On rollout, we can configure how many applications should upgrade at a time and what should happen if there is a failure in the new configuration using maxUnavailabe, maxSurge and readinessProbe. Refer refer about all this parameters and tune accordingly.