The Kubernetes Vertical Pod Autoscaler (which autoscales the memory and CPU resources of pods) requires a pod restart before the newly assigned resources can be used, which might add a small window of unavailability.
My question: if the pod's Deployment performs a rolling update, would that ensure zero downtime and no window of unavailability when the VPA recommendation is applied?
Thank you.
From the official documentation:
Rolling updates allow Deployments' update to take place with zero downtime by incrementally updating Pods instances with new ones. The new Pods will be scheduled on Nodes with available resources.
In this documentation, you will find a very good rolling update overview:
Rolling updates allow the following actions:
Promote an application from one environment to another (via container image updates)
Rollback to previous versions
Continuous Integration and Continuous Delivery of applications with zero downtime
Here you can find information about Rolling update deployment:
The Deployment updates Pods in a rolling update fashion when .spec.strategy.type==RollingUpdate. You can specify maxUnavailable and maxSurge to control the rolling update process.
These two optional fields work as follows:
.spec.strategy.rollingUpdate.maxUnavailable is an optional field that specifies the maximum number of Pods that can be unavailable during the update process.
.spec.strategy.rollingUpdate.maxSurge is an optional field that specifies the maximum number of Pods that can be created over the desired number of Pods.
Now it's up to you how you set these values. Here are some options:
Deploy by adding a Pod, then remove an old one: maxSurge = 1, maxUnavailable = 0. With this configuration, Kubernetes will spin up an additional Pod, then stop an old one.
Deploy by removing a Pod, then add a new one: maxSurge = 0, maxUnavailable = 1. In that case, Kubernetes will first stop a Pod before starting up a new one.
Deploy by updating pods as fast as possible: maxSurge = 1, maxUnavailable = 1. This configuration drastically reduces the time needed to switch between application versions, but combines the drawbacks of both previous options.
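Both fields also accept percentages instead of absolute numbers. For illustration, a minimal strategy sketch for the first option (the replica count of 4 is just an assumption):

spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%          # 25% of 4 = 1 extra Pod allowed above the desired count (rounded up)
      maxUnavailable: 0      # no Pod may become unavailable during the update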
See also:
good article about zero downtime
guide with examples
Yes. The default RollingUpdate behavior for a Deployment should automatically do that. It brings up some new replicas first, then deletes some old replicas once the new ones are ready. You can control how many pods can be unavailable at once, and how many new pods will be created, using the maxUnavailable and maxSurge fields. You can tune these values to achieve your zero-downtime goal.
Ref:
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#rolling-update-deployment
https://kubernetes.io/blog/2018/04/30/zero-downtime-deployment-kubernetes-jenkins/
In my team, we sometimes scale down to just one pod in OpenShift to make testing easier. If we then do a rolling update with the desired replica count set to 2, OpenShift scales up to two pods before performing the rolling deploy. It is a nuisance, because the extra pod still running the old version can start things that we don't expect to be started before the new deployment rolls out, so we have to remember to take that pod down before deploying.
Is there a way to stop the old deployment from scaling up to the desired replica count while the new deployment is scaled up to the desired replica count? Also, why does it work this way?
OpenShift Master:
v3.11.200
Kubernetes Master:
v1.11.0+d4cacc0
OpenShift Web Console:
3.11.200-1-8a53b1d
From our OpenShift template:
- apiVersion: v1
  kind: DeploymentConfig
  spec:
    replicas: 2
    strategy:
      type: Rolling
This is expected behavior when using the RollingUpdate strategy. It removes old pods one by one while adding new ones at the same time, keeping the application available throughout the whole process and ensuring there's no drop in its capacity to handle requests. Since you have only one pod, Kubernetes scales the deployment up in order to honour the strategy and the zero downtime requested in the manifest.
It scales up to 2 because, if not specified, maxSurge defaults to 25%. That means there can be at most 25% more pod instances than the desired count during an update.
If you want to ensure that the deployment won't be scaled up, change the strategy to Recreate. It deletes all old pods before any new ones are created. Use this strategy when your application doesn't support running multiple versions in parallel and requires the old version to be stopped completely before the new one is started. Please note, however, that this strategy involves a short period of time when your app becomes completely unavailable.
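For example, a minimal sketch of the template above with the strategy switched to Recreate (field values are illustrative):

- apiVersion: v1
  kind: DeploymentConfig
  spec:
    replicas: 2
    strategy:
      type: Recreate    # all old pods are stopped before any new ones are created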
Here's a good document that describes the rolling update strategy. It is also worth checking the official Kubernetes documentation about Deployments.
I have an application that I deploy on Kubernetes.
This application has 4 replicas and I'm doing a rolling update on each deployment.
This application has a graceful shutdown which can take tens of minutes (it has to wait for running tasks to finish).
My problem is that during updates I have over-capacity, since all the old-version pods are stuck in "Terminating" status while all the new pods are created.
During updates I end up running 8 containers, which is something I'm trying to avoid.
I tried setting maxSurge to 0, but this setting doesn't take the "Terminating" pods into consideration, so the load on my servers during the deployment is too high.
The behaviour I'm trying to get is that new pods are only created after the old-version pods have finished successfully, so that at no point do I exceed the number of replicas I set.
I wonder if there is a way to achieve such behaviour.
What I ended up doing is creating a StatefulSet with podManagementPolicy: Parallel and updateStrategy set to OnDelete.
I also set terminationGracePeriodSeconds to the maximum time it takes for a pod to terminate.
As a part of my deployment process, I apply the new StatefulSet with the new image and then delete all the running pods.
This way all the pods enter the Terminating state, and whenever a pod finishes its tasks and terminates, a new pod with the new image replaces it.
This way I'm able to keep a static number of replicas during the whole deployment process.
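A minimal sketch of that setup, assuming the app is called my-app and has a matching headless Service (all names, the image and the grace period are placeholders):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-app                          # hypothetical name
spec:
  serviceName: my-app                   # headless Service assumed to exist
  replicas: 4
  podManagementPolicy: Parallel         # pods are created and replaced in parallel, not one by one
  updateStrategy:
    type: OnDelete                      # a pod only gets the new image after it is deleted
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      terminationGracePeriodSeconds: 3600   # maximum time a pod may take to finish its tasks
      containers:
      - name: worker                        # hypothetical container name
        image: registry.example.com/my-app:v2   # hypothetical image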
Let me suggest the following strategy:
Deployments implement the concept of ready pods to aid rolling updates. Readiness probes allow the deployment to gradually update pods while giving you the control to determine when the rolling update can proceed.
A Ready pod is one that is considered successfully updated by the Deployment and no longer counts towards the surge count for the deployment. A pod is considered ready if its readiness probe is successful and spec.minReadySeconds have passed since the pod was created. The default for these options results in a pod that is ready as soon as its containers start.
So, what you can do (if you haven't done so yet) is implement a readiness probe for your pods, and set spec.minReadySeconds to a value that reflects, in the worst case, the time it takes for your pods to terminate.
This will ensure the rollout happens gradually and in line with your requirements.
In addition to that, don't forget to configure a deadline for the rollout.
By default, if the rollout can't make any progress within 10 minutes, it's considered failed. The time after which the Deployment is considered failed is configurable through the progressDeadlineSeconds property in the Deployment spec.
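Putting it together, a hedged Deployment sketch (the probe endpoint, port and timing values are assumptions you would tune to your workload):

spec:
  minReadySeconds: 120              # a pod must stay ready this long before it counts as available
  progressDeadlineSeconds: 1200     # rollout is marked as failed after 20 minutes without progress
  template:
    spec:
      containers:
      - name: app                          # hypothetical container name
        image: registry.example.com/app:v2 # hypothetical image
        readinessProbe:
          httpGet:
            path: /healthz                 # hypothetical readiness endpoint
            port: 8080
          periodSeconds: 10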
We are using Google Kubernetes Engine to deploy our microservices and CircleCI for deployment integration; we define our k8s files inside the GitHub repos.
The problem we are facing is that some services take time to start up because they load the database schema and other pre-populated data, but Kubernetes shuts down the old pods before the new pods have fully started.
Can we somehow tell Kubernetes to wait until the new pods are fully loaded, or at least to wait 10 seconds before shutting down the old pods?
Yes, this is possible. Based on the description it sounds like you are using a single deployment that gets updated. The new pods are created and the old ones are removed before the new ones become ready.
To address this, you need to have a proper readinessProbe configured or readinessGates on the pods so that the pod status only becomes ready once it actually is ready. If you are not sure what to put as the probe, you can also define initialDelaySeconds with a guess at how much time you think the pod needs to start up.
You should also look into the Deployment spec field minReadySeconds, as well as defining a proper deployment strategy. You can ensure that the rolling update creates new pods first (by defining the maxSurge field) and that an old pod is not removed until the new one is ready and receiving traffic (by setting the maxUnavailable field to 0).
an example would be:
spec:
  replicas: 3
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
This will maintain 3 working replicas at any given time. When a new version is pushed, 1 new pod will be created with the new image. No pods will be taken offline until the new one is in ready state. Once it is, one of the old pods will be terminated and the cycle goes again. Feel free to change the maxSurge value to a higher number if you want the rollout to happen in one go.
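If you are unsure what to use as a probe, a rough container-level sketch could look like the following (the endpoint, port and delay are guesses to be adapted to your actual startup time):

containers:
- name: app                            # hypothetical container name
  image: gcr.io/my-project/app:latest  # hypothetical image
  readinessProbe:
    httpGet:
      path: /ready                     # hypothetical readiness endpoint
      port: 8080
    initialDelaySeconds: 10            # rough guess at how long schema loading takes
    periodSeconds: 5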
Assuming I have a Deployment with a specific value set to the .spec.strategy.rollingUpdate.maxUnavailable field.
Then I deploy a PodDisruptionBudget attached to the deployment above, setting its spec.maxUnavailable field to a value different to the above.
Which one will prevail?
By interpreting the documentation, it seems that it depends on the event.
For a rolling update, the Deployment's maxUnavailable will be in effect, even if the PodDisruptionBudget specifies a smaller value.
But for an eviction, the PodDisruptionBudget's maxUnavailable will prevail, even if the Deployment specifies a smaller value.
The documentation does not explicitly compare these two settings, but from the way the documentation is written, it can be deduced that these are separate settings for different events that don't interact with each other.
For example:
Updating a Deployment
Output of kubectl explain deploy.spec.strategy.rollingUpdate.maxUnavailable
Specifying a PodDisruptionBudget
Output of kubectl explain pdb.spec.maxUnavailable
Also, this is more in the spirit of how Kubernetes works. The Deployment Controller is not going to read a field of a PodDisruptionBudget, and vice versa.
But to be really sure, you would just need to try it out.
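A minimal pair of manifests you could try out, with deliberately different values so you can observe which setting applies to which event (all names and the image are placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                    # hypothetical name
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 2           # consulted only during rolling updates
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: app
        image: registry.example.com/app:v1   # hypothetical image
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  maxUnavailable: 1               # consulted only for voluntary evictions (e.g. node drain)
  selector:
    matchLabels:
      app: my-app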
I believe they updated the docs clarifying your doubt:
Involuntary disruptions cannot be prevented by PDBs; however they do count against the budget.
Pods which are deleted or unavailable due to a rolling upgrade to an application do count against the disruption budget, but workload resources (such as Deployment and StatefulSet) are not limited by PDBs when doing rolling upgrades. Instead, the handling of failures during application updates is configured in the spec for the specific workload resource
Caution: Not all voluntary disruptions are constrained by Pod Disruption Budgets. For example, deleting deployments or pods bypasses Pod Disruption Budgets.
I would like to remove a given, exact, selected pod from a set of pods controlled by the same Replication Controller, or the same Replica Set.
The use case is the following: each pod in the set runs a stateful (but in-memory) application. I would like to remove a pod from the set in a graceful way, i.e. before the removal I would like to be sure that there are no ongoing application sessions handled by the pod. Let's say I can solve the task of draining the pod at the application level, i.e. no new sessions are directed to the selected pod, and I can measure the number of ongoing sessions in the pod, so I can decide when to remove it. The hard part is removing this pod in such a way that the RC or RS does not replace it with a new one based on the value of "replicas".
I could not find a solution for this. The nearest one would be to isolate the pod from the RC or RS as suggested by http://kubernetes.io/docs/user-guide/replication-controller/#isolating-pods-from-a-replication-controller
However, according to the same document, the RC or RS replaces the isolated pod with a new one. And as far as I can tell, there is no way to isolate the pod and decrease the value of "replicas" atomically.
I have checked the coming PetSet support, but my application does not require e.g. persistent storage, or persistent pod ID. Such features are not necessary in my case, so my application is not really a pet from this perspective.
Maybe a new pod state (e.g. "target for removal" - state name is not important for me) would make it, which could be patched via the API, and which would be considered by RC or RS when the value of "replicas" is decreased?
You can achieve this in three steps:
Add a label to all pods except the one you want to delete. Because the labels of these pods still satisfy the selector of the Replica Set, no new pods will be created.
Update the Replica Set: add the new label to its selector and decrease its replicas count in the same (atomic) update. The pod you want to delete won't be selected by the Replica Set any more because it doesn't have the new label.
Delete the selected pod.
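A hedged sketch of these steps, shown for a ReplicationController (whose selector, unlike an apps/v1 Replica Set's, can still be changed in place); the keep label, names and image are placeholders:

# Step 1: label every pod except the one to remove, e.g. kubectl label pod <pod> keep=true
# Step 2: update the controller so that its selector requires the new label
#         and lower the replica count in the same update:
apiVersion: v1
kind: ReplicationController
metadata:
  name: my-app                 # hypothetical name
spec:
  replicas: 2                  # decreased from 3 together with the selector change
  selector:
    app: my-app
    keep: "true"               # the pod to remove does not carry this label
  template:
    metadata:
      labels:
        app: my-app
        keep: "true"
    spec:
      containers:
      - name: app
        image: registry.example.com/app:v1   # hypothetical image
# Step 3: delete the selected pod, e.g. kubectl delete pod <pod>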