I'm using the default deployment strategy for my load-balancer service in Kubernetes, and when I describe my deployment the strategy looks as follows:
Replicas: 2 desired | 2 updated | 2 total | 2 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 1 max unavailable, 1 max surge
So according to the description, there should not be any downtime. However, the service still experiences downtime.
How can I make sure there's zero downtime?
From what I can see, you are using the as-fast-as-possible approach to rolling updates.
While this is a reasonable approach, it's better to use Replicas: 3, because with only 2 replicas and 1 max unavailable, a single Pod may be left to serve all traffic during the update.
You should also implement a readinessProbe, which might look like the following:
readinessProbe:
  httpGet:
    path: /
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  successThreshold: 1
initialDelaySeconds: Number of seconds after the container has started before readiness probes are initiated.
periodSeconds: How often (in seconds) to perform the probe. Defaults to 10 seconds.
successThreshold: Minimum consecutive successes for the probe to be considered successful after having failed. Defaults to 1.
I also recommend reading Enable Rolling updates in Kubernetes with Zero downtime, as it nicely explains the use of rolling updates.
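Putting the advice above together, a minimal sketch of a complete Deployment might look like the following. The name, labels, and image are placeholders rather than anything from the question, and maxUnavailable: 0 is an assumption chosen so that an old Pod is only removed once a new one reports ready.

```yaml
# Hypothetical Deployment; the name, labels, and image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # start one extra Pod before removing an old one
      maxUnavailable: 0    # never drop below the desired replica count
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
    spec:
      containers:
        - name: my-service
          image: example.com/my-service:1.0.0
          ports:
            - containerPort: 8080
          readinessProbe:    # traffic is only routed to the Pod once this succeeds
            httpGet:
              path: /
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
            successThreshold: 1
```

Without the readiness probe, a new Pod counts as ready as soon as its containers have started, which can send traffic to an application that is still initializing.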
Related
How can I speedup the rollout of new images in Kubernetes?
Currently, we have an automated build job that modifies a yaml file to point to a new revision and then runs kubectl apply on it.
It works, but there are long delays (up to 20 minutes PER POD) before all pods with the previous revision are replaced with the latest.
Also, the deployment is configured for 3 replicas. We see that one pod at a time is started with the new revision. (Is this the Kubernetes "surge"?) But that is too slow; I would rather kill all 3 pods and have 3 new ones with the new image.
I would rather kill all 3 pods and have 3 new ones with the new image.
You can do that. Set .spec.strategy.type to Recreate instead of the default RollingUpdate in your Deployment. See strategy.
But you will probably get some downtime during deployment.
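As a minimal sketch, the relevant part of the Deployment spec could look like this (replicas: 3 is taken from the question; everything else in the Deployment is assumed to stay unchanged):

```yaml
spec:
  replicas: 3
  strategy:
    type: Recreate   # all old Pods are terminated before any new Pod is created
    # no rollingUpdate block here: it is only valid with type RollingUpdate
```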
Jonas and SYN are right but I would like to expand this topic with some additional info and examples.
You have two types of strategies to choose from when specifying the way of updating your deployments:
Recreate Deployment: All existing Pods are killed before new ones are created.
Rolling Update Deployment: The Deployment updates Pods in a rolling update fashion.
The default, and generally recommended, strategy is .spec.strategy.type==RollingUpdate. See the examples below:
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
In this example there would be one additional Pod (maxSurge: 1) above the desired number of 3, and the number of available Pods cannot go lower than that number (maxUnavailable: 0).
With this config, Kubernetes will spin up an additional Pod, then stop an “old” one. If there’s another Node available to deploy this Pod, the system will be able to handle the same workload during deployment. If not, the Pod will be deployed on an already used Node at the cost of resources from other Pods hosted on the same Node.
You can also try something like this:
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
With the example above there would be no additional Pods (maxSurge: 0) and only a single Pod at a time would be unavailable (maxUnavailable: 1).
In this case, Kubernetes will first stop a Pod before starting up a new one. The advantage is that the infrastructure doesn’t need to scale up, but the capacity available to serve the workload is reduced during the update.
If you choose to use percentage values for maxSurge and maxUnavailable, remember that:
maxSurge - the absolute number is calculated from the percentage by rounding up
maxUnavailable - the absolute number is calculated from the percentage by rounding down
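To make the rounding concrete, here is a small worked sketch using 25% (the default for both fields) with the 3 replicas from the examples above:

```yaml
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%        # 25% of 3 = 0.75, rounded up to 1 extra Pod
      maxUnavailable: 25%  # 25% of 3 = 0.75, rounded down to 0 unavailable Pods
```

For 3 replicas this works out to the same behavior as maxSurge: 1 with maxUnavailable: 0.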
With RollingUpdate configured correctly, you also have to make sure your application provides an endpoint that Kubernetes can query for the app’s status. Below is a /greeting endpoint that returns HTTP 200 when the app is ready to handle requests and HTTP 500 when it is not:
readinessProbe:
  httpGet:
    path: /greeting
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  successThreshold: 1
  timeoutSeconds: 1
initialDelaySeconds - Time (in seconds) before the first check for readiness is done.
periodSeconds - Time (in seconds) between two readiness checks after the first one.
successThreshold - Minimum consecutive successes for the probe to be considered successful after having failed. Defaults to 1. Must be 1 for liveness. Minimum value is 1.
timeoutSeconds - Number of seconds after which the probe times out. Defaults to 1 second. Minimum value is 1.
More on the topic of liveness/readiness probes can be found here.
Try setting the spec.strategy.rollingUpdate.maxUnavailable (keeping the spec.strategy.type to RollingUpdate).
With it set to 2, the first two pods should be re-deployed together while the third keeps the service running. Or go with 3, if you don't care about downtime.
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#max-unavailable
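A minimal sketch of that suggestion, assuming the 3 replicas from the question and leaving maxSurge at its default:

```yaml
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 2   # up to two of the three Pods may be unavailable at once
```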
We have a service hosted in AKS that uses RWO volumes, with the Deployment strategy set to Recreate.
We recently went live with this new service and we have many features and fixes to deliver every day. Since the deployment strategy is Recreate, the business team experiences some downtime on each release (2 minutes at most), which is annoying. Is there a better approach to managing RWO volumes with a rolling update strategy?
You have two types of strategies to choose from when specifying the way of updating your deployments:
Recreate Deployment: All existing Pods are killed before new ones are created.
Rolling Update Deployment: The Deployment updates Pods in a rolling update fashion.
The default, and generally recommended, strategy is .spec.strategy.type==RollingUpdate. See the examples below:
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
In this example there would be one additional Pod (maxSurge: 1) above the desired number of 2, and the number of available Pods cannot go lower than that number (maxUnavailable: 0).
With this config, Kubernetes will spin up an additional Pod, then stop an “old” one. If there’s another Node available to deploy this Pod, the system will be able to handle the same workload during deployment. If not, the Pod will be deployed on an already used Node at the cost of resources from other Pods hosted on the same Node.
You can also try something like this:
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
With the example above there would be no additional Pods (maxSurge: 0) and only a single Pod at a time would be unavailable (maxUnavailable: 1).
In this case, Kubernetes will first stop a Pod before starting up a new one. The advantage is that the infrastructure doesn’t need to scale up, but the capacity available to serve the workload is reduced during the update.
If you choose to use percentage values for maxSurge and maxUnavailable, remember that:
maxSurge - the absolute number is calculated from the percentage by rounding up
maxUnavailable - the absolute number is calculated from the percentage by rounding down
With RollingUpdate configured correctly, you also have to make sure your application provides an endpoint that Kubernetes can query for the app’s status. Below is a /greeting endpoint that returns HTTP 200 when the app is ready to handle requests and HTTP 500 when it is not:
readinessProbe:
  httpGet:
    path: /greeting
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  successThreshold: 1
  timeoutSeconds: 1
initialDelaySeconds - Time (in seconds) before the first check for readiness is done.
periodSeconds - Time (in seconds) between two readiness checks after the first one.
successThreshold - Minimum consecutive successes for the probe to be considered successful after having failed. Defaults to 1. Must be 1 for liveness. Minimum value is 1.
timeoutSeconds - Number of seconds after which the probe times out. Defaults to 1 second. Minimum value is 1.
More on the topic of liveness/readiness probes can be found here.
These are only examples, but they should give you an idea of how this update strategy can be used to reduce or eliminate downtime.
Here is my Readiness Probe Configuration:
readinessProbe:
  httpGet:
    path: /devops/versioninfo/api
    port: 9001
  initialDelaySeconds: 300
  timeoutSeconds: 3
  periodSeconds: 10
  failureThreshold: 60
Here is my rolling update strategy:
strategy:
  rollingUpdate:
    maxSurge: 2
    maxUnavailable: 0
My pods take a long time to become ready. However, during a rolling update, the old pods are deleted as soon as the new pods reach the Running status, even though their readiness check has not passed yet.
How can I make the rolling update wait until the new pod is ready before deleting the old one?
You can try increasing the minReadySeconds option in the Deployment spec. Basically, it tells the Deployment that a new pod must have been ready for at least X seconds before it is considered available.
✌️
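As a rough sketch, minReadySeconds sits at the top level of the Deployment spec, next to the strategy from the question. The value of 30 is a hypothetical placeholder, and this setting builds on the readiness probe rather than replacing it:

```yaml
spec:
  minReadySeconds: 30        # hypothetical value: a new Pod must stay ready this long to count as available
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2
      maxUnavailable: 0      # old Pods are only removed once new Pods are available
```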
My readinessProbe settings are as follows:
readinessProbe:
  httpGet:
    path: /up
    port: *status-port
  initialDelaySeconds: 5
  periodSeconds: 5
  successThreshold: 1
I want to change periodSeconds to a larger value once my pod is running OK. Is it possible to achieve this? While the pod is starting, it makes sense to probe it every 5 seconds, but once it is running fine, it would be a more efficient use of resources to probe it only every 30 seconds or so.
Such a feature doesn't exist. You can look here for available options.
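The periodSeconds of a single probe is indeed fixed. One possible workaround, assuming a Kubernetes version that supports startupProbe, is to poll frequently with a startup probe and give the readiness probe a longer period. In this sketch, port 8080 stands in for the *status-port alias from the question, and the failureThreshold is a hypothetical value:

```yaml
# Hypothetical container fragment; port 8080 stands in for the *status-port alias.
startupProbe:
  httpGet:
    path: /up
    port: 8080
  periodSeconds: 5
  failureThreshold: 30   # allows up to 30 * 5 = 150 s to start before the container is restarted
readinessProbe:
  httpGet:
    path: /up
    port: 8080
  periodSeconds: 30      # only begins once the startup probe has succeeded
```

Other probes are held back until the startup probe succeeds, so the frequent checks only run during startup.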
Let's say I have a deployment template like this
spec:
minReadySeconds: 15
readinessProbe:
failureThreshold: 3
httpGet:
path: /
port: 80
scheme: HTTP
initialDelaySeconds: 20
periodSeconds: 20
successThreshold: 1
timeoutSeconds: 5
How will this affect new versions of my app? Will minReadySeconds and initialDelaySeconds count down at the same time, or will initialDelaySeconds come first and then minReadySeconds?
From Kubernetes Deployment documentation:
.spec.minReadySeconds is an optional field that specifies the minimum number of seconds for which a newly created Pod should be ready without any of its containers crashing, for it to be considered available. This defaults to 0 (the Pod will be considered available as soon as it is ready). To learn more about when a Pod is considered ready, see Container Probes
So your newly created app Pod has to be ready for .spec.minReadySeconds seconds before it is considered available.
initialDelaySeconds: Number of seconds after the container has started before liveness or readiness probes are initiated.
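So initialDelaySeconds comes before minReadySeconds.
Let's say the container in the Pod starts at t seconds. The readiness probe will be initiated at t + initialDelaySeconds seconds. Assume the Pod becomes ready at t1 seconds (t1 >= t + initialDelaySeconds). The Pod will then be considered available at t1 + minReadySeconds seconds. With the values from the question (initialDelaySeconds: 20, minReadySeconds: 15), the Pod becomes ready no earlier than about 20 seconds after the container starts, and available 15 seconds after that.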
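The two fields live at different levels of the Deployment, which the abbreviated template in the question hides. A sketch of where each one goes, with the timeline as comments (the container name and image are placeholders, not taken from the question):

```yaml
# Hypothetical Deployment fragment; the container name and image are placeholders.
spec:
  minReadySeconds: 15            # Deployment level
  template:
    spec:
      containers:
        - name: web
          image: example.com/web:1.0.0
          readinessProbe:        # container level
            httpGet:
              path: /
              port: 80
              scheme: HTTP
            initialDelaySeconds: 20
            periodSeconds: 20
            successThreshold: 1
            failureThreshold: 3
            timeoutSeconds: 5
# Timeline, with the container starting at time t:
#   t + 20        earliest point at which the first readiness probe runs
#   t1 >= t + 20  a probe succeeds and the Pod becomes ready
#   t1 + 15       the Pod counts as available, if it stayed ready throughout
```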