Kubernetes Deployment Rolling Updates

I have an application that I deploy on Kubernetes.
This application has 4 replicas and I'm doing a rolling update on each deployment.
This application has a graceful shutdown which can take tens of minutes (it has to wait for running tasks to finish).
My problem is that during updates I run over capacity, since all the old-version pods are stuck in "Terminating" status while all the new pods are created.
During an update I end up running 8 containers, which is something I'm trying to avoid.
I tried to set maxSurge to 0, but this setting doesn't take into consideration the "Terminating" pods, so the load on my servers during the deployment is too high.
The behaviour I'm trying to get is that new pods will only get created after the old version pods finished successfully, so at all times I'm not exceeding the number of replicas I set.
I wonder if there is a way to achieve such behaviour.
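For reference, the rollout settings I tried look roughly like this (the names and values are placeholders):

spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1

Even with maxSurge: 0, the pods stuck in Terminating no longer count towards the replica count, so the total number of running containers still exceeds 4 during the update.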

What I ended up doing is creating a StatefulSet with podManagementPolicy: Parallel and updateStrategy to OnDelete.
I also set terminationGracePeriodSeconds to the maximum time it takes for a pod to terminate.
As a part of my deployment process, I apply the new StatefulSet with the new image and then delete all the running pods.
This way all the pods enter the Terminating state, and whenever a pod finishes its task and terminates, a new pod with the new image replaces it.
This way I'm able to keep a static number of replicas during the whole deployment process.
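A minimal sketch of the StatefulSet settings described above (the names, image, and grace period are placeholders to adapt to your workload):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-app
spec:
  replicas: 4
  serviceName: my-app
  podManagementPolicy: Parallel
  updateStrategy:
    type: OnDelete
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      terminationGracePeriodSeconds: 3600   # maximum time a pod may need to finish its tasks
      containers:
      - name: my-app
        image: my-registry/my-app:v2

The deployment step then becomes:

kubectl apply -f statefulset.yaml
kubectl delete pod -l app=my-app

Because the update strategy is OnDelete, each pod is only replaced with the new image after it finishes terminating, so the replica count never exceeds 4.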

Let me suggest the following strategy:
Deployments implement the concept of ready pods to aid rolling updates. Readiness probes allow the deployment to update pods gradually while giving you control over when the rolling update can proceed.
A Ready pod is one that is considered successfully updated by the Deployment and will no longer count towards the surge count for deployment. A pod will be considered ready if its readiness probe is successful and spec.minReadySeconds have passed since the pod was created. The default for these options will result in a pod that is ready as soon as its containers start.
So, what you can do is implement a readiness probe for your pods (if you haven't done so yet), and set spec.minReadySeconds to a value that covers, in the worst case, the time it takes for your pods to terminate.
This will ensure the rollout happens gradually and in coordination with your requirements.
In addition to that, don't forget to configure a deadline for the rollout.
By default, a rollout that can't make any progress for 10 minutes is considered failed. The time after which the Deployment is considered failed is configurable through the progressDeadlineSeconds property in the Deployment spec.
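A hedged sketch of what that could look like in the Deployment spec, with only the relevant fields shown (the probe endpoint, port, and timing values are placeholders to adapt to your workload):

spec:
  replicas: 4
  minReadySeconds: 600            # worst-case time for an old pod to finish terminating
  progressDeadlineSeconds: 1800   # mark the rollout as failed if it stalls for 30 minutes
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    spec:
      containers:
      - name: my-app
        image: my-registry/my-app:v2
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 10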

Related

Is it possible to kill previous replicaset without waiting for new replicaset to be rolled out?

Context
Say we have d.yaml in which a deployment, whose strategy is RollingUpdate, is defined.
We first create a deployment:
kubectl apply -f d.yaml
After some time, we modify d.yaml and re-apply it to update the deployment.
vi d.yaml
kubectl apply -f d.yaml
This starts rolling out a new replicaset R_new.
Normally, the old (previous) replicaset R_old is killed only after R_new has successfully been rolled out.
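For concreteness, d.yaml could look roughly like this (a hedged sketch; the names and image are placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: my-registry/my-app:v1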
Question (tl;dr)
Is it possible to kill R_old without waiting for rolling out R_new to complete?
By "kill", I mean completely stopping a replicaset; it should never restart. (So kubectl delete replicaset didn't help.)
Question (long)
In my specific situation, my containers connect to an external database. This single database is also connected from many containers managed by other teams.
If the maximum number of connections allowed is already reached, new containers associated with R_new fail to start (i.e. CrashLoopBackOff).
If I could forcefully kill R_old, the number of connections would be lowered by N where N is the number of replicas, and thus R_new's containers would successfully start.
FAQ:
Q. Why not temporarily stop using RollingUpdate strategy?
A. Actually I have no permission to edit d.yaml. It is edited by CI/CD.
Q. Why not just make the maximum number of connections larger?
A. I have no permission for the database either...
1. Make the changes to the deployment.
2. Scale the deployment down to replicas=0, so that it is as good as stopping the old replicaset.
3. Scale the deployment back up to the desired number of replicas; new replicasets will be created with the new configuration changes in the deployment.
Steps 1 and 2 can be interchanged based on the requirement.
Technically, you can delete the old replicaset by running kubectl delete replicaset R_old and this would terminate the old pod. I just verified it in my cluster (kubernetes version 1.21.8).
However, terminating a pod doesn't necessarily mean it is killed immediately.
The actual termination of the pod depends on whether or not a preStop lifecycle hook is defined, and on the value of terminationGracePeriodSeconds (see description by typing kubectl explain pod.spec.terminationGracePeriodSeconds).
By default, the kubelet deletes the pod only after all of the processes in its containers are stopped, or the grace period has been over.
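A sketch of where those settings live in the pod spec (the command and timings are placeholders, not a recommendation):

spec:
  terminationGracePeriodSeconds: 60    # default is 30 seconds
  containers:
  - name: my-app
    image: my-registry/my-app:v1
    lifecycle:
      preStop:
        exec:
          command: ["sh", "-c", "sleep 10"]   # e.g. give in-flight work time to drain before SIGTERM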

How to delete a pod from Kubernetes master node?

Does anyone know how to delete a pod from a Kubernetes master node? I have one master node on a bare-metal Ubuntu server. When I try to delete it with "kubectl delete pod .." or force-delete it as described here: https://kubernetes.io/docs/tasks/run-application/force-delete-stateful-set-pod/ it doesn't work; the pod is created again and again...
The pods in a StatefulSet are managed by the StatefulSet controller and will be recreated if the current and the desired replicas defined in the spec do not match.
The document you linked provides instructions on how to kill the pods forcefully, bypassing the graceful shutdown, which can have unexpected consequences depending on the application.
The link clearly states the pods will be recreated in the section:
Force deletions do not wait for confirmation from the kubelet that the Pod has been terminated. Irrespective of whether a force deletion is successful in killing a Pod, it will immediately free up the name from the apiserver. This would let the StatefulSet controller create a replacement Pod with that same identity; this can lead to the duplication of a still-running Pod, and if said Pod can still communicate with the other members of the StatefulSet, will violate the at most one semantics that StatefulSet is designed to guarantee.
If you want the pods to be stopped and new pods for the Statefulset do not get created, you need to scale down the Statefulset by changing the replicas to 0.
You can read the official docs for how to scale the Statefulset replicas.
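For example (the StatefulSet name is a placeholder):

kubectl scale statefulset my-statefulset --replicas=0

This stops all pods of the StatefulSet without the controller recreating them.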
The key to figuring out how to kill the pod is to understand how it was created. For example, if the pod is part of a Deployment with a declared replica count of 1, then once you kill or force-kill it, Kubernetes detects a mismatch between the desired state (the number of replicas defined in the Deployment configuration) and the current state, and creates a new pod to replace the one that was deleted. Therefore, in this example you will need to either scale the Deployment to 0 or delete the Deployment.
If we need to kill any pod we can just scale down the replica set.
kubectl scale deploy <deployment_name> --replicas=<expected_no_of_replicas>
The way to delete a pod depends on how it was created. If you created it individually (not as part of a ReplicaSet/ReplicationController/Deployment), then you can delete the pod directly; otherwise the only option is to scale down. In a production setup, I believe most people use the Deployment option out of ReplicaSet/ReplicationController/Deployment (please refer to the documentation and understand the difference between those three options).
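One way to check how a pod was created is to inspect its ownerReferences (the pod name here is a placeholder):

kubectl get pod my-pod -o jsonpath='{.metadata.ownerReferences[*].kind}'

If this prints nothing, the pod is standalone and kubectl delete pod my-pod removes it for good; if it prints ReplicaSet or StatefulSet, you need to scale or delete the owning controller instead.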

Google kubernetes pods shutdown and start

We are using Google Kubernetes to deploy our microservices and CircleCI for deployment integration; we define our k8s files inside the GitHub repos.
The problem we are facing is that some services take time on startup, loading the database schema and other initial data, but Kubernetes shuts down the old pods before the new pods have fully started.
Can we somehow tell Kubernetes to wait until the new pods are fully loaded, or at least to wait 10 seconds, before shutting down the old pods?
Yes, this is possible. Based on the description it sounds like you are using a single deployment that gets updated. The new pods are created and the old ones are removed before the new ones become ready.
To address this, you need to have a proper readinessProbe configured or readinessGates on the pods so that the pod status only becomes ready once it actually is ready. If you are not sure what to put as the probe, you can also define initialDelaySeconds with a guess at how much time you think the pod needs to start up.
You should also look into using the deployment spec field for minReadySeconds as well as defining a proper deployment strategy. You can ensure that the rolling update creates new pods (by defining the maxSurge field) and ensure that the old pod is not removed until the new one is ready and receiving traffic (using the maxUnavailable field = 0).
an example would be:
spec:
  replicas: 3
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
This will maintain 3 working replicas at any given time. When a new version is pushed, 1 new pod will be created with the new image. No pods will be taken offline until the new one is in ready state. Once it is, one of the old pods will be terminated and the cycle goes again. Feel free to change the maxSurge value to a higher number if you want the rollout to happen in one go.
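If you also want the readiness probe and minimum ready time mentioned above, a fuller (hedged) sketch of the relevant fields could look like this; the probe path, port, and timings are placeholders, and minReadySeconds: 10 corresponds to the "wait at least 10 seconds" requirement from the question:

spec:
  replicas: 3
  minReadySeconds: 10
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    spec:
      containers:
      - name: my-service
        image: my-registry/my-service:latest
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5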

Kubernetes deployment: Don't terminate pod until new pod has been in Running state for 2 minutes

When I do kubectl delete pod or kubectl patch, how can I ensure that the old pod doesn't actually delete itself until the replacement pod has been running for 2 minutes? (And if the replacement pod dies before the 2 minute mark, then don't even delete the old pod)
The reason is that my initialization takes about 2 minutes to pull the latest data and run calculations; after 2 minutes it will reach a point where it will either error out or continue running with the updated calculations.
I want to be able to occasionally delete the pod so that it'll restart and get new versions of stuff (because getting new versions is only done at the beginning of the code).
Is there a way I can do this without an init container? Because I'm concerned that it would be difficult to pass the calculated results from the init container to the main container
We need to tweak two parameters:
1. Set minReadySeconds to 2 minutes. Alternatively, use a readiness probe instead of a hardcoded 2 minutes.
2. Do a rolling update with maxSurge > 0 (default: 1) and maxUnavailable: 0. This brings up the new pod(s), and only once a new pod becomes ready is an old pod killed. The process continues for the rest of the pods.
Note: 0 <= maxSurge <= replicaCount
If you are using a Deployment, you can set minReadySeconds in the spec to 120 (seconds). Kubernetes will not consider it actually ready and in service (and therefore spin down old pods) until the pod reports it has been ready for that long.
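A sketch of those settings combined, assuming a Deployment whose pods already have a readiness probe (only the relevant fields shown):

spec:
  minReadySeconds: 120     # a pod must be ready for 2 minutes before it counts as available
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0    # never remove an old pod until a new one has been available that long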

Why does scaling down a deployment seem to always remove the newest pods?

(Before I start, I'm using minikube v27 on Windows 10.)
I have created a deployment with the nginx 'hello world' container with a desired count of 2.
I actually went into the 2-hour-old pod and edited the index.html file, changing the welcome message to "broken" - I want to play with k8s to see what it would look like if one pod were 'faulty'.
If I scale this deployment up to more instances and then scale down again, I almost expected k8s to remove the oldest pods, but it consistently removes the newest.
How do I make it remove the oldest pods first?
(Ideally, I'd like to be able to just say "redeploy everything as the exact same version/image/desired count in a rolling deployment" if that is possible)
Pod deletion preference is based on an ordered series of checks, defined in code here:
https://github.com/kubernetes/kubernetes/blob/release-1.11/pkg/controller/controller_utils.go#L737
Summarizing, precedence for deletion is given to pods:
that are unassigned to a node, vs assigned to a node
that are in pending or not running state, vs running
that are in not-ready, vs ready
that have been in ready state for fewer seconds
that have higher restart counts
that have newer vs older creation times
These checks are not directly configurable.
Given the rules, if you can make an old pod not ready, or cause an old pod to restart, it will be removed at scale-down time before a newer pod that is ready and has not restarted.
There is discussion around use cases for the ability to control deletion priority, which mostly involve workloads that are a mix of job and service, here:
https://github.com/kubernetes/kubernetes/issues/45509
What about this:
kubectl scale deployment ingress-nginx-controller --replicas=2
Wait until 2 replicas are up.
kubectl delete pod ingress-nginx-controller-oldest-replica
kubectl scale deployment ingress-nginx-controller --replicas=1
I experienced zero downtime while removing the oldest pod this way.