How to scale up all OpenShift pods before scaling down old ones - deployment

I have a basic OpenShift deployment configuration:
kind: DeploymentConfig
spec:
  replicas: 3
  strategy:
    type: Rolling
Additionally I've put:
maxSurge: 3
maxUnavailable: 0%
because I want to scale up all the new pods first and only then scale down the old ones (so there will be 6 pods running during the deployment; that's why I decided to set maxSurge).
I want to keep all the old pods running until all the new pods are up, but with this set of parameters something is wrong. During deployment:
all 3 new pods are initialized at once and try to start, while the old pods keep running (as expected)
if the first new pod starts successfully, then one old pod is terminated
if the second new pod becomes ready, then another old pod is terminated
I want to terminate all old pods ONLY if all new pods are ready to handle requests, otherwise all the old pods should handle requests.
What did I miss in this configuration?

The behavior you describe is expected for a deployment rollout: OpenShift will shut down each old pod as a new pod becomes ready. It will also start routing traffic to the new pods as they become available, which you say you don't want either.
A Service is pretty much by definition going to route to pods as they become available, and a Deployment handles pods independently, so I don't believe anything will really give you the behavior you are looking for there either.
If you want a blue-green style deployment like you describe, you are essentially going to have to deploy the new pods as a separate deployment. Then, once the new deployment is completely up, you can change the corresponding Service to point at the new pods and shut down the old deployment.
Service Mesh can help with some of that. So could an operator. Or you could do it manually.
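A minimal sketch of the manual switch, assuming the blue and green pods are distinguished by a hypothetical `deployment` label (the label names and values here are assumptions, not from the question):

```yaml
# Hypothetical sketch: once the "green" deployment is fully ready,
# repoint the Service selector from blue to green.
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  ports:
  - port: 8080
    targetPort: 8080
  selector:
    app: my-app
    deployment: green   # was "blue"; updating this switches all traffic at once
```

Applying this change (e.g. with oc apply) shifts traffic to the green pods in one step; the blue deployment can then be scaled down.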

You can combine the rolling strategy with readiness checks that have an initial delay, to ensure all the new pods have time to start up before the old ones are shut down at the same time.
In the example below, the 3 new pods are spun up (for a total of 6 pods); after 60 seconds the readiness checks run, and only then are the old pods shut down. You would just want to adjust your readiness delay to a timeframe large enough for all of your new pods to start up.
apiVersion: v1
kind: DeploymentConfig
spec:
  replicas: 3
  strategy:
    rollingParams:
      maxSurge: 3
      maxUnavailable: 0
    type: Rolling
  template:
    spec:
      containers:
      - readinessProbe:
          httpGet:
            path: /actuator/health/readiness
            port: 8099
          initialDelaySeconds: 60

Related

Speed up rollout of Kubernetes deployment

How can I speedup the rollout of new images in Kubernetes?
Currently, we have an automated build job that modifies a yaml file to point to a new revision and then runs kubectl apply on it.
It works, but there are long delays (up to 20 minutes PER POD) before all pods with the previous revision are replaced with the latest.
Also, the deployment is configured for 3 replicas. We see one pod at a time is started with the new revision. (Is this the Kubernetes "surge" ?) But that is too slow, I would rather kill all 3 pods and have 3 new ones with the new image.
I would rather kill all 3 pods and have 3 new ones with the new image.
You can do that. Set strategy.type to Recreate instead of the default RollingUpdate in your Deployment. See strategy.
But you will probably get some downtime during the deployment.
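As a sketch, the relevant fragment of the Deployment would look like this (Recreate takes no maxSurge/maxUnavailable parameters):

```yaml
spec:
  replicas: 3
  strategy:
    type: Recreate   # all old pods are killed before any new pod is created
```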
Jonas and SYN are right but I would like to expand this topic with some additional info and examples.
You have two types of strategies to choose from when specifying the way of updating your deployments:
Recreate Deployment: All existing Pods are killed before new ones are created.
Rolling Update Deployment: The Deployment updates Pods in a rolling update fashion.
The default and recommended one is .spec.strategy.type==RollingUpdate. See the examples below:
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
In this example there would be one additional Pod (maxSurge: 1) above the desired number of 3, and the number of available Pods cannot go lower than that number (maxUnavailable: 0).
With this config, Kubernetes will spin up an additional Pod, then stop an "old" one. If there's another Node available for this Pod, the system will be able to handle the same workload during the deployment. If not, the Pod will be deployed on an already-used Node, at the cost of resources for the other Pods hosted on that Node.
You can also try something like this:
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
With the example above there would be no additional Pods (maxSurge: 0) and only a single Pod at a time would be unavailable (maxUnavailable: 1).
In this case, Kubernetes first stops a Pod before starting a new one. The advantage is that the infrastructure doesn't need to scale up, but the maximum workload capacity is lower during the rollout.
If you chose to use the percentage values for maxSurge and maxUnavailable you need to remember that:
maxSurge - the absolute number is calculated from the percentage by rounding up
maxUnavailable - the absolute number is calculated from percentage by rounding down
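As a worked example of these rounding rules, assume replicas: 3 with percentage values (the percentages below are illustrative):

```yaml
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%        # ceil(3 * 0.25)  = 1 extra pod allowed during the rollout
      maxUnavailable: 25%  # floor(3 * 0.25) = 0 pods may be unavailable
```

So with 3 replicas, 25%/25% behaves exactly like maxSurge: 1, maxUnavailable: 0.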
With RollingUpdate configured correctly, you also have to make sure your applications provide an endpoint for Kubernetes to query that returns the app's status. Below is an example using a /greeting endpoint that returns HTTP 200 when the app is ready to handle requests, and HTTP 500 when it's not:
readinessProbe:
  httpGet:
    path: /greeting
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  successThreshold: 1
  timeoutSeconds: 1
initialDelaySeconds - Time (in seconds) before the first check for readiness is done.
periodSeconds - Time (in seconds) between two readiness checks after the first one.
successThreshold - Minimum consecutive successes for the probe to be considered successful after having failed. Defaults to 1. Must be 1 for liveness. Minimum value is 1.
timeoutSeconds - Number of seconds after which the probe times out. Defaults to 1 second. Minimum value is 1.
More on the topic of liveness/readiness probes can be found here.
Try setting the spec.strategy.rollingUpdate.maxUnavailable (keeping the spec.strategy.type to RollingUpdate).
Setting it to 2, the first two containers will be re-deployed together, keeping the service running on the third one. Or go with 3 if you don't care about downtime.
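For example, with 3 replicas the fragment might look like this (sketch):

```yaml
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 2   # two pods may be replaced at once; one keeps serving
```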
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#max-unavailable

kubectl replace -f creates pod indefinitely in Pending state

I have a k8s deployment - I often deploy a new version to the docker repo - change the image tag - and try to replace the deployment using kubectl replace -f file.yaml. My replicas are set to 1 - I have only 1 pod of the deployment running at a time.
When I change the image tag (e.g changing v1 to v2) and try to replace it - it creates a new pod, but it remains in the 'pending' state indefinitely, while the old pod stays in 'Running' state.
I think the new pod waits for the old pod to be terminated - but it won't terminate by itself. I need it to be deleted by k8s so the new pod can take its place.
Using replace --force fixes this issue - but I'd like it to work using just replace -f. Any ideas how to achieve this?
The issue you see has nothing to do with kubectl replace/apply. The real reason is that Deployments by default use the RollingUpdate strategy, which by default waits for the new pod to be Running before killing the old one. Why the new pod stays Pending is unclear from your question, but in most cases it indicates a lack of compute resources for the new pod.
You may do two different things:
Use RollingUpdate strategy with maxUnavailable=1. This will do what you want - it will kill old pod and then create a new one.
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
OR you can specify Recreate strategy which effectively does the same:
spec:
  strategy:
    type: Recreate
Read more about deployment strategies here: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#strategy

Gitlab Autodevops How to always keep one pod alive

I'm using Gitlab Autodevops to deploy app on my kubernetes cluster. That app should always have only one instance running.
The problem is that during the update process, Helm kills the currently running pod before the new pod is ready. This causes a period of downtime when the old version has already been killed and the new one isn't ready yet. To make it worse, the app needs significant time to start (2+ minutes).
I have tried to set minAvailable: 1 in PodDisruptionBudget, but no help.
Any idea how I can tell Helm to wait for the readiness of the updated pod before killing the old one? (Having 2 instances running simultaneously for a few seconds is not a problem for me.)
You can release a new application version in a few ways; it's necessary to choose the one that fits your needs.
I would recommend one of the following:
Ramped - slow rollout
A ramped deployment updates pods in a rolling update fashion, a secondary ReplicaSet is created with the new version of the application, then the number of replicas of the old version is decreased and the new version is increased until the correct number of replicas is reached.
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2        # how many pods we can add at a time
      maxUnavailable: 0  # how many pods can be unavailable
                         # during the rolling update
Full example and steps can be found here.
Blue/Green - best to avoid API versioning issues
A blue/green deployment differs from a ramped deployment because the “green” version of the application is deployed alongside the “blue” version. After testing that the new version meets the requirements, we update the Kubernetes Service object that plays the role of load balancer to send traffic to the new version by replacing the version label in the selector field.
apiVersion: v1
kind: Service
metadata:
  name: my-app
  labels:
    app: my-app
spec:
  type: NodePort
  ports:
  - name: http
    port: 8080
    targetPort: 8080
  # Note here that we match both the app and the version.
  # When switching traffic, we update the label "version" with
  # the appropriate value, i.e.: v2.0.0
  selector:
    app: my-app
    version: v1.0.0
Full example and steps can be found here.
Canary - for testing
A canary deployment consists of routing a subset of users to a new functionality. In Kubernetes, a canary deployment can be done using two Deployments with common pod labels. One replica of the new version is released alongside the old version. Then after some time and if no error is detected, scale up the number of replicas of the new version and delete the old deployment.
Using this ReplicaSet technique requires spinning up as many pods as necessary to get the right percentage of traffic. For example, if you want to send 1% of traffic to version B, you need one pod running version B and 99 pods running version A. This can be pretty inconvenient to manage, so if you are looking for better-managed traffic distribution, look at load balancers such as HAProxy or service meshes like Linkerd, which provide greater control over traffic.
Manifest for version A:
spec:
  replicas: 3
Manifest for version B:
spec:
  replicas: 1
Full example and steps can be found here.
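The traffic split in a canary works because the Service selects only the labels common to both Deployments. A sketch, assuming a shared app label and per-Deployment version labels:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  ports:
  - port: 8080
    targetPort: 8080
  selector:
    app: my-app   # no "version" label here, so pods from both A and B are endpoints
```

With 3 version-A replicas and 1 version-B replica, roughly 25% of requests hit the canary.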
You can also play with Interactive Tutorial - Updating Your App on Kubernetes.
I recommend reading Deploy, Scale And Upgrade An Application On Kubernetes With Helm.

How do I make Kubernetes scale my deployment based on the "ready"/ "not ready" status of my Pods?

I have a deployment with a defined number of replicas. I use readiness probe to communicate if my Pod is ready/ not ready to handle new connections – my Pods toggle between ready/ not ready state during their lifetime.
I want Kubernetes to scale the deployment up/ down to ensure that there is always the desired number of pods in a ready state.
Example:
If replicas is 4 and there are 4 Pods in ready state, then Kubernetes should keep the current replica count.
If replicas is 4 and there are 2 ready pods and 2 not ready pods, then Kubernetes should add 2 more pods.
How do I make Kubernetes scale my deployment based on the "ready"/ "not ready" status of my Pods?
I don't think this is possible. If a pod is not ready, Kubernetes will not make it ready, since readiness is determined by your application. Even if it created new pods, their readiness could not be guaranteed either, so you have to resolve the reasons behind the not-ready status yourself. The only thing Kubernetes does is keep not-ready pods out of the Service endpoints, so they don't receive traffic and cause request failures.
Ensuring you always have 4 pods running can be done by specifying the replicas property in your deployment definition:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 4  # here we define a requirement for 4 replicas
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
Kubernetes will ensure that if any pods crash, replacement pods will be created so that a total of 4 are always available.
Deployments will not schedule pods on unhealthy nodes: the API server only creates pods on nodes that are healthy, schedulable, and meet the quota criteria for additional pods.
Moreover, what you describe is Kubernetes' self-healing behavior, which in basic terms is already taken care of for you.

Is there a way to speed up the horizontal pod autoscaler metrics scan? Now it takes two minutes to upscale new pods

I have an application with some endpoints that are quite CPU intensive. Because of that I have configured a Horizontal Pod Autoscaler like this:
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: DeploymentConfig
    name: some-app
  targetCPUUtilizationPercentage: 30
The point is, suppose a request keeps a pod working at 100% CPU for 5 minutes. It takes two minutes until OpenShift/Kubernetes schedules new pods.
Is there a way to speed up this process? It forces us to be almost unresponsive for two minutes.
The same thing happens to downscale, having to wait for two minutes until it destroys the unnecessary pods.
Ideally there should be some config option to set this up.
For OpenShift, Please modify /etc/origin/master/master-config.yaml
kubernetesMasterConfig:
  controllerArguments:
    horizontal-pod-autoscaler-downscale-delay: 2m0s
    horizontal-pod-autoscaler-upscale-delay: 2m0s
and restart the OpenShift master.
It's not a scalar; you should set it like this:
kubernetesMasterConfig:
  controllerArguments:
    horizontal-pod-autoscaler-downscale-delay:
    - 2m0s
    horizontal-pod-autoscaler-upscale-delay:
    - 2m0s
At least in OpenShift Origin v3.11
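For reference, on newer Kubernetes (and OpenShift 4.x) these master flags no longer exist; the scale-up/scale-down reaction time is tuned per-HPA via the behavior field of the autoscaling/v2 API. A sketch, reusing the some-app name from the question and illustrative window values:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: some-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: some-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 30
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0    # react to CPU spikes immediately
    scaleDown:
      stabilizationWindowSeconds: 60   # wait 1 minute before scaling down
```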