Google kubernetes pods shutdown and start - kubernetes

We are using Google kubernetes to deploy our microservices, circlci for deployment integration, we define our k8s files inside the githup repos.
The problem we are facing is that some services are taking time on startup by loading database schema and other pre data, but google kubernetes shutdown old pods before the new pods fully started.
can we tell kubernetes somehow to wait till the new pods are fully loaded or at least to wait 10 seconds before shutdown the old pods.

Yes, this is possible. Based on the description it sounds like you are using a single deployment that gets updated. The new pods are created and the old ones are removed before the new ones become ready.
To address this, you need to have a proper readinessProbe configured or readinessGates on the pods so that the pod status only becomes ready once it actually is ready. If you are not sure what to put as the probe, you can also define initialDelaySeconds with a guess at how much time you think the pod needs to start up.
You should also look into using the deployment spec field for minReadySeconds as well as defining a proper deployment strategy. You can ensure that the rolling update creates new pods (by defining the maxSurge field) and ensure that the old pod is not removed until the new one is ready and receiving traffic (using the maxUnavailable field = 0).
an example would be:
spec:
replicas: 3
strategy:
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
This will maintain 3 working replicas at any given time. When a new version is pushed, 1 new pod will be created with the new image. No pods will be taken offline until the new one is in ready state. Once it is, one of the old pods will be terminated and the cycle goes again. Feel free to change the maxSurge value to a higher number if you want the rollout to happen in one go.

Related

Replace the image on one pod manually, while other pods uses the main image

Let's say I have 10 pods running a stable version, and I wish to replace the image of one of them to run a newer version before a full rollout.
Is there a way to do that?
Not as such: every pod managed by a Deployment is expected to be identical, including running the same image. You can't change a pod's image once it's been created, and if you change the Deployment's image, it will try to recreate all of its managed pods.
If the only thing you're worried about is the pod starting up, the default behavior of a deployment is to start 25% of its specified replicas with the new image. The old pods will continue running uninterrupted until the new replicas successfully start and pass their readiness checks. If the new pods immediately go into CrashLoopBackOff state, the old pods will still be running.
If you want to start a pod specifically as a canary deployment, you can create a second Deployment to handle that. You'll need to include some label on the pods (for instance, canary: 'true') where you can distinguish the canary from main pods. This would be present in the pod spec, and in the deployment selector, but it would not be present in the corresponding Service selector: the Service matches both canary and non-canary pods. If this runs successfully then you can remove the canary Deployment and update the image on the main Deployment.
Like the other answer mentioned, It sounds like you are talking about a canary deployment. You can do this with Kubernetes and also with Istio. I prefer Istio as it gives you some great control over traffic weighting. I.e you could send 1% of traffic to the canary and 99% to the control. Great for testing in production. It also lets you route using HTTP headers.
https://istio.io/latest/blog/2017/0.1-canary/
If you want to do it with k8s just create two deployments with unique deployment names (myappv1 & myappv2 for example) with the same app= label. Then you can just create a service with the selector = whatever your app label is. The svc will round robin between the two v1 and v2 deployments.

Why does Openshift scale up old deployment before rolling deployment

In my team, we sometimes scale down to just one pod in Openshift to make testing easier. If we then do a rolling update with the desired replica count set to 2, Openshift scales up to two pods before performing a rolling deploy. It is a nuisance, because the new "old" pod can start things that we don't expect to be started before the new deployment starts, and so we have to remember to take down the one pod before the new deploy.
Is there a way to stop the old deployment from scaling up to the desired replica count while the new deployment is scaled up to the desired replica count? Also, why does it work this way?
OpenShift Master:
v3.11.200
Kubernetes Master:
v1.11.0+d4cacc0
OpenShift Web Console:
3.11.200-1-8a53b1d
From our Openshift template:
- apiVersion: v1
kind: DeploymentConfig
spec:
replicas: 2
strategy:
type: Rolling
This is expected behavior when using RollingUpdate strategy. It removes old pods one by one, while adding new ones at the same time, keeping the application available throughout the whole process, and ensuring there’s no drop in its capacity to handle requests. Since you have only one pod, Kubernetes scales the deployment to keep the strategy and zero-downtime as requested in the manifest.
It scales up to 2, because if not specified maxSurge defaults to 25%. It means that there can be at most 25% more pod instances than the desired count during an update.
If you want to ensure that this won't be scaled you might change the strategy to Recreate. This will cause all old pods to be deleted before the new ones are created. Use this strategy when your application doesn’t support running multiple versions in parallel and requires the old version to be stopped completely before the new one is started. However please note that, this strategy does involve a short period of time when your app becomes completely unavailable.
Here`s a good document that describes rolling update strategy. It is worth also checking official kubernetes documentation about deployments.

Kubernetes Deployment Rolling Updates

I have an application that I deploy on Kubernetes.
This application has 4 replicas and I'm doing a rolling update on each deployment.
This application has a graceful shutdown which can take tens of minutes (it has to wait for running tasks to finish).
My problem is that during updates, I have over-capacity since all the older version pods are stuck at "Terminating" status while all the new pods are created.
During the updates, I end up running with 8 containers and it is something I'm trying to avoid.
I tried to set maxSurge to 0, but this setting doesn't take into consideration the "Terminating" pods, so the load on my servers during the deployment is too high.
The behaviour I'm trying to get is that new pods will only get created after the old version pods finished successfully, so at all times I'm not exceeding the number of replicas I set.
I wonder if there is a way to achieve such behaviour.
What I ended up doing is creating a StatefulSet with podManagementPolicy: Parallel and updateStrategy to OnDelete.
I also set terminationGracePeriodSeconds to the maximum time it takes for a pod to terminate.
As a part of my deployment process, I apply the new StatefulSet with the new image and then delete all the running pods.
This way all the pods are entering Terminating state and whenever a pod finished its task and terminated a new pod with the new image will replace it.
This way I'm able to keep a static number of replicas during the whole deployment process.
Let me suggest the following strategy:
Deployments implement the concept of ready pods to aide rolling updates. Readiness probes allow the deployment to gradually update pods while giving you the control to determine when the rolling update can proceed.
A Ready pod is one that is considered successfully updated by the Deployment and will no longer count towards the surge count for deployment. A pod will be considered ready if its readiness probe is successful and spec.minReadySeconds have passed since the pod was created. The default for these options will result in a pod that is ready as soon as its containers start.
So, what you can do, is implement (if you haven't done so yet) a readiness probe for your pods in addition to setting the spec.minReadySeconds to a value that will make sense (worst case) to the time that it takes for your pods to terminate.
This will ensure rolling out will happen gradually and in coordination to your requirements.
In addition to that, don't forget to configure a deadline for the rollout.
By default, after the rollout can’t make any progress in 10 minutes, it’s considered as failed. The time after which the Deployment is considered failed is configurable through the progressDeadlineSeconds property in the Deployment spec.

Kubernetes deployment: Don't terminate pod until new pod has been in Running state for 2 minutes

When I do kubectl delete pod or kubectl patch, how can I ensure that the old pod doesn't actually delete itself until the replacement pod has been running for 2 minutes? (And if the replacement pod dies before the 2 minute mark, then don't even delete the old pod)
The reason is that my initiation takes about 2 minutes to pull some latest data and run calculations; after 2 minutes it'll reach a point where it'll either error out or continue running with the updated calculations.
I want to be able to occasionally delete the pod so that it'll restart and get new versions of stuff (because getting new versions is only done at the beginning of the code).
Is there a way I can do this without an init container? Because I'm concerned that it would be difficult to pass the calculated results from the init container to the main container
We need to tweak two parameters.
We have set the minReadySeconds as 2 min. Or we can use readiness probe instead of hardcoded 2min.
We have to do rolling update with maxSurge > 0 (default: 1) and maxUnavailable: 0. This will bring the new pod(s) and only if it becomes ready, old pod(s) will be killed. This process continues for rest of the pods.
Note: 0 <= maxSurge <= replicaCount
If you are using a Deployment, you can set minReadySeconds in the spec to 120 (seconds). Kubernetes will not consider it actually ready and in service (and therefore spin down old pods) until the pod reports it has been ready for that long.

Why does scaling down a deployment seem to always remove the newest pods?

(Before I start, I'm using minikube v27 on Windows 10.)
I have created a deployment with the nginx 'hello world' container with a desired count of 2:
I actually went into the '2 hours' old pod and edited the index.html file from the welcome message to "broken" - I want to play with k8s to seem what it would look like if one pod was 'faulty'.
If I scale this deployment up to more instances and then scale down again, I almost expected k8s to remove the oldest pods, but it consistently removes the newest:
How do I make it remove the oldest pods first?
(Ideally, I'd like to be able to just say "redeploy everything as the exact same version/image/desired count in a rolling deployment" if that is possible)
Pod deletion preference is based on a ordered series of checks, defined in code here:
https://github.com/kubernetes/kubernetes/blob/release-1.11/pkg/controller/controller_utils.go#L737
Summarizing- precedence is given to delete pods:
that are unassigned to a node, vs assigned to a node
that are in pending or not running state, vs running
that are in not-ready, vs ready
that have been in ready state for fewer seconds
that have higher restart counts
that have newer vs older creation times
These checks are not directly configurable.
Given the rules, if you can make an old pod to be not ready, or cause an old pod to restart, it will be removed at scale down time before a newer pod that is ready and has not restarted.
There is discussion around use cases for the ability to control deletion priority, which mostly involve workloads that are a mix of job and service, here:
https://github.com/kubernetes/kubernetes/issues/45509
what about this :
kubectl scale deployment ingress-nginx-controller --replicas=2
Wait until 2 replicas are up.
kubectl delete pod ingress-nginx-controller-oldest-replica
kubectl scale deployment ingress-nginx-controller --replicas=1
I experienced zero downtime doing so while removing oldest pod.