kubernetes deployment wait between pods on rolling update

So we have a deployment that is using rolling updates. We need it to pause 180 seconds between each pod it brings up. My understanding is that I need to set minReadySeconds: 180, RollingUpdateStrategy.MaxUnavailable: 1, and RollingUpdateStrategy.MaxSurge: 1 for the deployment to wait. With those settings it still brings the pods up as fast as it can. What am I missing?
Relevant part of my deployment:
spec:
  minReadySeconds: 180
  replicas: 9
  revisionHistoryLimit: 20
  selector:
    matchLabels:
      deployment: standard
      name: standard-pod
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate

Assuming that a pod is ready after a certain delay is not very idiomatic within an orchestrator like Kubernetes, as something may prevent the pod from starting successfully, or may delay the start by another few seconds.
Instead, you could use liveness and readiness probes to make sure that the pod is there and ready to serve traffic before the old pod is taken down.
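For illustration, here is a minimal readiness probe sketch (the container name, image, /healthz path, and port 8080 are placeholders; point it at whatever health endpoint your app actually exposes):
spec:
  template:
    spec:
      containers:
      - name: app                  # placeholder container name
        image: example/app:latest  # placeholder image
        readinessProbe:
          httpGet:
            path: /healthz         # placeholder health endpoint
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
          failureThreshold: 3
With a probe in place, the rollout only proceeds once a new pod actually reports Ready (and, combined with minReadySeconds, stays Ready for that long), rather than after a fixed guess at a delay.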

We updated our cluster to a newer version of Kubernetes and it started working.
Posted on behalf of the question asker.

Related

How rollout undo works during rollingUpdate strategy when deployment has crossed 'progressDeadlineSeconds'?

I have created a Kubernetes deployment with the below spec:
replicas: 2
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 0
The rollout-and-undo algorithm I'm experimenting with:
kubectl apply -f deployment.yaml
if ! kubectl rollout status deployment <name>; then
  kubectl rollout undo deployment <name>
fi
I'm trying to test the failure case.
Initially we have 2 pods, old-pod-1 and old-pod-2. We initiate a rolling deployment with the above configuration to maintain at least 2 pods at all times. When we initiate the deployment, an additional pod is created with the new code (new-pod-1). Let us say this succeeds. At this point, one old pod is brought down (old-pod-1 is down). Now, if the 2nd new pod deployment fails, what would happen?
Adding some background:
In a test where new-pod-1 creation is unsuccessful, after progressDeadlineSeconds new-pod-1 is killed, and after rollout undo we are left with old-pod-1 and old-pod-2.
But in the above case, old-pod-1 is already down. So what is the end state of the pods? Will a pod be created with the old deployment version?

How to scale up all OpenShift pods before scaling down old ones

I have a basic OpenShift deployment configuration:
kind: DeploymentConfig
spec:
  replicas: 3
  strategy:
    type: Rolling
Additionally I've put:
maxSurge: 3
maxUnavailable: 0%
because I want to scale up all new pods first and only after that scale down the old pods (so there will be 6 pods running during deployment; that's why I decided to set maxSurge).
I want to have all old pods running until all new pods are up but with this set of parameters there is something wrong. During deployment:
all 3 new pods are initialized at once and are trying to start, while the old pods keep running (as expected)
if the first new pod starts successfully, then one old pod is terminated
if the second new pod is ready, then another old pod is terminated
I want to terminate all old pods ONLY if all new pods are ready to handle requests, otherwise all the old pods should handle requests.
What did I miss in this configuration?
The behavior you describe is expected for a deployment rollout (OpenShift will shut down each old pod as a new pod becomes ready). It will also start routing traffic to the new pods as they become available, which you say you don't want either.
A service is pretty much by definition going to route to pods as they are available. And a deployment pretty much handles pods independently, so I don't believe that anything will really give you the behavior you are looking for there either.
If you want a blue-green style deployment like you describe, you are essentially going to have to deploy the new pods as a separate deployment. Then, once the new deployment is completely up, you can change the corresponding service to point at the new pods. Then you can shut down the old deployment.
Service Mesh can help with some of that. So could an operator. Or you could do it manually.
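As a rough sketch of that blue-green approach (all names and labels below are made up for illustration): run the old and new pods as two separate deployments distinguished by a version label, and keep the Service selector pointing at the old set until the new deployment is fully available:
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app
    version: blue   # switch to "green" once the new deployment is fully available
  ports:
  - port: 80
    targetPort: 8080
Once the new ("green") deployment reports all of its replicas available, change the selector to version: green, verify traffic, and then scale down or delete the old ("blue") deployment.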
You can combine the rollout strategy with readiness checks with an initial delay to ensure that all the new pods have time to start up before the old ones are all shut down at the same time.
In the case below, the new 3 pods will be spun up (for a total of 6 pods) and then after 60 seconds, the readiness check will occur and the old pods will be shut down. You would just want to adjust your readiness delay to a large enough timeframe to give all of your new pods time to start up.
apiVersion: v1
kind: DeploymentConfig
spec:
  replicas: 3
  strategy:
    rollingParams:
      maxSurge: 3
      maxUnavailable: 0
    type: Rolling
  template:
    spec:
      containers:
      - readinessProbe:
          httpGet:
            path: /actuator/health/readiness
            port: 8099
          initialDelaySeconds: 60

How to handle Rolling updates deployment strategy for RWO volume service?

We have a service hosted in AKS which has RWO volumes, with the deployment strategy set to Recreate.
We recently went live with this new service, and we have many features/issues to be delivered every day. Since the deployment strategy is Recreate, the business team is experiencing some downtime (2 min max), but it is annoying. Is there a better approach to managing RWO volumes with a rolling update strategy?
You have two types of strategies to choose from when specifying the way of updating your deployments:
Recreate Deployment: All existing Pods are killed before new ones are created.
Rolling Update Deployment: The Deployment updates Pods in a rolling update fashion.
The default and more recommended one is the .spec.strategy.type==RollingUpdate. See the examples below:
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
In this example there would be one additional Pod (maxSurge: 1) above the desired number of 2, and the number of available Pods cannot go lower than that number (maxUnavailable: 0).
Choosing this config, the Kubernetes will spin up an additional Pod, then stop an “old” one. If there’s another Node available to deploy this Pod, the system will be able to handle the same workload during deployment. If not, the Pod will be deployed on an already used Node at the cost of resources from other Pods hosted on the same Node.
You can also try something like this:
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
With the example above there would be no additional Pods (maxSurge: 0) and only a single Pod at a time would be unavailable (maxUnavailable: 1).
In this case, Kubernetes will first stop a Pod before starting up a new one. The advantage of that is that the infrastructure doesn’t need to scale up but the maximum workload will be less.
If you choose to use percentage values for maxSurge and maxUnavailable, you need to remember that:
maxSurge - the absolute number is calculated from the percentage by rounding up
maxUnavailable - the absolute number is calculated from percentage by rounding down
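For example, with a hypothetical deployment of 10 replicas and both parameters set to 25%:
spec:
  replicas: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%         # 10 * 0.25 = 2.5, rounded up to 3   -> at most 13 pods in total
      maxUnavailable: 25%   # 10 * 0.25 = 2.5, rounded down to 2 -> at least 8 pods available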
With the RollingUpdate defined correctly, you also have to make sure your applications provide endpoints that Kubernetes can query to get the app's status. Below is a /greeting endpoint that returns an HTTP 200 status when it's ready to handle requests, and HTTP 500 when it's not:
readinessProbe:
  httpGet:
    path: /greeting
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  successThreshold: 1
  timeoutSeconds: 1
initialDelaySeconds - Time (in seconds) before the first check for readiness is done.
periodSeconds - Time (in seconds) between two readiness checks after the first one.
successThreshold - Minimum consecutive successes for the probe to be considered successful after having failed. Defaults to 1. Must be 1 for liveness. Minimum value is 1.
timeoutSeconds - Number of seconds after which the probe times out. Defaults to 1 second. Minimum value is 1.
More on the topic of liveness/readiness probes can be found here.
These are only examples, but they should give you an idea of how that particular update strategy can be used to eliminate the possibility of downtime.

kubernetes pod restart takes time and downtime

I have a service and pod in Node.js. Consider a hello world app, with port 80 exposed over HTTP.
I want to seamlessly restart my service/pod, but the pod/service restart is taking a lot of time, so there is downtime.
I'm using kubectl delete and then recreating it with kubectl.
How can I avoid the delay and downtime?
With continuous deployments, your previous Pods will be terminated and new Pods will be created, so downtime of the service is possible.
To avoid this, add a rolling update strategy in your deployment spec.
example:
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
where maxUnavailable: 0 means that at any given time no pods may be unavailable, i.e. the existing pods stay up until their replacements are ready
Extra:
If your service takes some time to become live, you can use a readiness probe in the spec to avoid traffic being routed before the pods are ready.
Example:
readinessProbe:
  tcpSocket:
    port: 80
  initialDelaySeconds: 15
  periodSeconds: 30

How to automatically stop rolling update when CrashLoopBackOff?

I use Google Kubernetes Engine and I intentionally put an error in the code. I was hoping the rolling update would stop when it discovered the CrashLoopBackOff status, but it didn't.
On this page, they say:
The Deployment controller will stop the bad rollout automatically, and
will stop scaling up the new ReplicaSet. This depends on the
rollingUpdate parameters (maxUnavailable specifically) that you have
specified.
But it's not happening. Does it only work if the status is ImagePullBackOff?
Below is my configuration.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: volume-service
  labels:
    group: volume
    tier: service
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 2
      maxSurge: 2
  template:
    metadata:
      labels:
        group: volume
        tier: service
    spec:
      containers:
      - name: volume-service
        image: gcr.io/example/volume-service:latest
P.S. I have already read about liveness/readiness probes, but I don't think they can stop a rolling update, or can they?
Turns out I just needed to set minReadySeconds, and it stops the rolling update when the new ReplicaSet has a status like CrashLoopBackOff or Exited with status code 1. So now the old ReplicaSet is still available and not updated.
Here is the new config.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: volume-service
  labels:
    group: volume
    tier: service
spec:
  replicas: 4
  minReadySeconds: 60
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 2
      maxSurge: 2
  template:
    metadata:
      labels:
        group: volume
        tier: service
    spec:
      containers:
      - name: volume-service
        image: gcr.io/example/volume-service:latest
Thank you everyone for the help!
I agree with #Nicola_Ben - I would also consider changing to the setup below:
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1  # <----- I want at least (4)-[1] = 3 available pods.
      maxSurge: 1        # <----- I want maximum (4)+[1] = 5 total running pods.
Or even change maxSurge to 0.
This will expose fewer potentially nonfunctional pods (as we would in a canary release).
Like #Hana_Alaydrus suggested, it's important to set minReadySeconds.
In addition to that, sometimes we need to take more actions after the rollout execution.
(For example, there are cases when the new pods are not functioning properly but the process running inside the container hasn't crashed.)
A suggestion for a general debug process:
1) First of all, pause the rollout with:
kubectl rollout pause deployment <name>.
2) Debug the relevant pods and decide how to continue (maybe we can continue with the new release, maybe not).
3) We would have to resume the rollout with kubectl rollout resume deployment <name>, because even if we decide to return to the previous release with the undo command (4.B), we first need to resume the rollout.
4.A) Continue with the new release.
4.B) Return to the previous release with: kubectl rollout undo deployment <name>.
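A minimal command sketch of that flow, assuming a deployment named my-app with an app=my-app label (both placeholders):
kubectl rollout pause deployment my-app

# inspect the new pods and decide how to continue
kubectl describe pods -l app=my-app
kubectl logs -l app=my-app --tail=100

# the rollout must be resumed either way
kubectl rollout resume deployment my-app

# then either let the new release finish, or roll back
kubectl rollout undo deployment my-app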
The explanation you quoted is correct, and it means that the new ReplicaSet (the one with the error) will not proceed to completion; it will be stopped once it reaches the maxSurge + maxUnavailable count. And the old ReplicaSet will still be present too.
Here is the example I tried with:
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
And these are the results:
NAME                                  READY   STATUS             RESTARTS   AGE
pod/volume-service-6bb8dd677f-2xpwn   0/1     ImagePullBackOff   0          42s
pod/volume-service-6bb8dd677f-gcwj6   0/1     ImagePullBackOff   0          42s
pod/volume-service-c98fd8d-kfff2      1/1     Running            0          59s
pod/volume-service-c98fd8d-wcjkz      1/1     Running            0          28m
pod/volume-service-c98fd8d-xvhbm      1/1     Running            0          28m

NAME                                              DESIRED   CURRENT   READY   AGE
replicaset.extensions/volume-service-6bb8dd677f   2         2         0       26m
replicaset.extensions/volume-service-c98fd8d      3         3         3       28m
My new ReplicaSet will start only 2 new pods (1 slot from maxUnavailable and 1 slot from maxSurge).
The old ReplicaSet will keep running 3 pods (4 - 1 unavailable).
The two params you set in the rollingUpdate section are the key point, but you can also play with other factors like readinessProbe, livenessProbe, minReadySeconds, and progressDeadlineSeconds.
For them, here is the reference.
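As a rough illustration of how those could be combined (the numbers are arbitrary and would need tuning for a real workload):
spec:
  replicas: 4
  minReadySeconds: 60            # a new pod must stay Ready this long before it counts as available
  progressDeadlineSeconds: 600   # report the rollout as failed if it makes no progress for 10 minutes
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1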