Kubernetes pod restart takes time and causes downtime

I have a Node.js service and pod; consider a hello-world app.
Exposed port: 80 over HTTP.
I want to seamlessly restart my service/pod.
Restarting the pod/service takes a lot of time, so there is downtime.
I am currently using kubectl delete and then recreating the resource with kubectl.
How can I avoid the delay and downtime?

With continuous deployments, your previous Pods are terminated and new Pods are created, so downtime of the service is possible.
To avoid this, add a rolling-update strategy to your Deployment spec.
example:
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
where maxUnavailable: 0 means that no existing Pod may become unavailable during the update, and maxSurge: 1 allows one extra Pod to be created above the desired replica count while the rollout is in progress.
Extra:
If your service takes some time to become live, you can add a readiness probe to the Pod spec so that traffic is not routed to a Pod before it is ready.
example :
readinessProbe:
  tcpSocket:
    port: 80
  initialDelaySeconds: 15
  periodSeconds: 30
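Since the example service speaks HTTP on port 80, an httpGet probe may be a closer fit than a raw TCP check. A minimal sketch, assuming the app serves a /healthz path (that path is an assumption, not something from the question):
readinessProbe:
  httpGet:
    path: /healthz    # assumed health path; use whatever your app actually exposes
    port: 80
  initialDelaySeconds: 5
  periodSeconds: 10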

Related

Kubernetes Zero Downtime deployment not working - Gives 503 Service Temporarily Unavailable

I am trying to achieve zero-downtime deployment using Kubernetes, but every time I upgrade the deployment with a new image I see 2-3 seconds of downtime. I am testing this with a Hello-World sort of application, and still could not achieve it. I deploy my application using Helm charts.
Following online blogs and resources, I am using a readiness probe and a RollingUpdate strategy in my Deployment.yaml file, but with no success.
I have created a /health endpoint which simply returns a 200 status code as the readiness-probe check. I expected that with readiness probes and the RollingUpdate strategy I would achieve zero downtime when I upgrade the container image. Requests to my service go through an Amazon ELB.
Deployment.yaml file is as below:
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: wine-deployment
  labels:
    app: wine-store
    chart: {{ .Chart.Name }}-{{ .Chart.Version | replace "+" "_" }}
    release: {{ .Release.Name }}
    heritage: {{ .Release.Service }}
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  selector:
    matchLabels:
      app: wine-store
  replicas: 2
  template:
    metadata:
      labels:
        app: wine-store
    spec:
      containers:
        - name: {{ .Chart.Name }}
          resources:
            limits:
              cpu: 250m
            requests:
              cpu: 200m
          image: "my-private-image-repository-with-tag-and-version-goes-here-which-i-have-hidden-here"
          imagePullPolicy: Always
          env:
            - name: GET_HOSTS_FROM
              value: dns
          ports:
            - containerPort: 8089
              name: testing-port
          readinessProbe:
            httpGet:
              path: /health
              port: 8089
            initialDelaySeconds: 3
            periodSeconds: 3
Service.yaml file:
apiVersion: v1
kind: Service
metadata:
  name: wine-service
  labels:
    app: wine-store
spec:
  ports:
    - port: 80
      targetPort: 8089
      protocol: TCP
  selector:
    app: wine-store
Ingress.yaml file:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: wine-ingress
  annotations:
    kubernetes.io/ingress.class: public-nginx
spec:
  rules:
    - host: my-service-my-internal-domain.com
      http:
        paths:
          - path: /
            backend:
              serviceName: wine-service
              servicePort: 80
I expect zero downtime when I upgrade the image using the helm upgrade command. While the upgrade is in progress, I continuously hit my service with a curl command. The curl command gives me 503 Service Temporarily Unavailable errors for 2-3 seconds and then the service is up again. I expect this downtime not to happen.
This issue is caused by the Service VIP using iptables. You haven't done anything wrong - it's a limitation of current Kubernetes.
When the readiness probe on the new pod passes, the old pod is terminated and kube-proxy rewrites the iptables for the service. However, a request can hit the service after the old pod is terminated but before iptables has been updated resulting in a 503.
A simple workaround is to delay termination by using a preStop lifecycle hook:
lifecycle:
  preStop:
    exec:
      command: ["/bin/bash", "-c", "sleep 10"]
It's probably not relevant in this case, but implementing graceful termination in your application is a good idea: intercept the TERM signal and let your application finish handling any requests it has already received rather than exiting immediately.
Alternatively, more replicas, a low maxUnavailable and a high maxSurge will all reduce the probability of requests hitting a terminating pod.
For more info:
https://kubernetes.io/docs/concepts/services-networking/service/#proxy-mode-iptables
https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods
Another answer mistakenly suggests you need a liveness probe. While it's a good idea to have a liveness probe, it won't affect the issue that you are experiencing. With no liveness probe defined, the default state is Success.
In the context of a rolling deployment a liveness probe is irrelevant: once the readiness probe on the new pod passes, the old pod is sent the TERM signal and iptables is updated. Now that the old pod is terminating, any liveness probe on it is irrelevant, since its only function is to cause a pod to be restarted if the probe fails.
Any liveness probe on the new pod is likewise irrelevant. When the pod first starts it is considered live by default; only after the initialDelaySeconds of the liveness probe would it start being checked, and if it failed, the pod would be terminated.
https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#container-probes
Consider blue-green deployments instead, because even when the new pods are up it may take time for kube-proxy to forward requests to the new Pod IPs.
So set up a new deployment and, once all of its pods are up, update the service selector to the new Pod labels (see the sketch after the link below).
Follow: https://kubernetes.io/blog/2018/04/30/zero-downtime-deployment-kubernetes-jenkins/
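A minimal sketch of the selector switch, assuming the blue and green Deployments label their Pods with app: my-app plus version: blue / version: green (all of these names are assumptions for illustration):
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app
    version: blue      # switch this to "green" once all green Pods are Ready
  ports:
    - port: 80
      targetPort: 8080
Once the new Deployment's Pods are all Ready, editing (or kubectl patch-ing) the version value in the selector moves traffic over in one step.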
The problem you describe indicates an issue with your readiness probes. It is important to understand the difference between liveness and readiness probes, and you should implement and configure both.
The liveness probe checks whether the container is started and alive. If this isn't the case, Kubernetes will eventually restart the container.
The readiness probe in turn also checks dependencies such as database connections or other services your container depends on to fulfill its work. As a developer you have to invest more implementation time here than for the liveness probe: you have to expose an endpoint which also checks the mentioned dependencies when queried.
Your current configuration uses a health endpoint of the kind usually used by liveness probes. It probably doesn't check whether your service is really ready to take traffic.
Kubernetes relies on the readiness probes: during a rolling update, it keeps the old container up and running until the new service declares that it is ready to take traffic. Therefore the readiness probes have to be implemented correctly.
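A minimal sketch of how the two probes could be split, assuming the application exposes /health for a plain liveness check and a separate /ready endpoint that also verifies its dependencies (the /ready path is an assumption, not from the question):
livenessProbe:
  httpGet:
    path: /health      # "is the process alive" only
    port: 8089
  initialDelaySeconds: 10
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ready       # should also check databases / downstream services
    port: 8089
  initialDelaySeconds: 5
  periodSeconds: 5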

Kubernetes - can a Deployment have multiple ReplicaSets?

Just finished reading Nigel Poulton's The Kubernetes Book. I'm left with the question of whether or not a Deployment can specify multiple ReplicaSets.
When I think Deployment, I think of it in the traditional sense of an entire application being deployed. Or is there meant to be a Deployment for each microservice?
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: hello-deploy
spec:
  replicas: 10
  selector:
    matchLabels:
      app: hello-world
  minReadySeconds: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  template:
    metadata:
      labels:
        app: hello-world
    spec:
      containers:
        - name: hello-pod
          image: nigelpoulton/k8sbook:latest
          ports:
            - containerPort: 8080
It is meant to be a Deployment for each microservice.
You can also manage the number of replicas of each microservice type.
So, for instance, if you want to run Service A (a Docker image with a Java service) 5 times, you create a Deployment that results in 5 Pods, each running the Service A image.
If you deploy a new version of Service A (a new Docker image), Kubernetes can do a rolling update, shutting down the existing Pods with the old image and creating 5 new Pods with the new Service A.2 image.
Thus your whole microservices application/infrastructure is built upon multiple Deployments, each generating Pods which are exposed by Kubernetes Services.
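As a rough sketch, a per-microservice Deployment for the hypothetical Service A could look like this (the name and image are made up for illustration):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-a
spec:
  replicas: 5
  selector:
    matchLabels:
      app: service-a
  template:
    metadata:
      labels:
        app: service-a
    spec:
      containers:
        - name: service-a
          image: registry.example.com/service-a:1.0   # hypothetical image
          ports:
            - containerPort: 8080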
A Deployment contains a single Pod template and generates one ReplicaSet per revision.
There can be multiple ReplicaSets, up to a limit of 10 by default, one for each update that has been made through the Deployment; but only the latest ReplicaSet will show a non-zero Pod count, and all of the older ones will show 0.
We can set revisionHistoryLimit to specify how many old replicaSets we want to retain:
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#clean-up-policy
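For example, to keep only the three most recent old ReplicaSets (the value 3 is arbitrary):
spec:
  revisionHistoryLimit: 3   # older ReplicaSets beyond this count are garbage-collected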

How to automatically stop rolling update when CrashLoopBackOff?

I use Google Kubernetes Engine and I intentionally put an error in the code. I was hoping the rolling update would stop when it discovered the CrashLoopBackOff status, but it didn't.
In this page, they say..
The Deployment controller will stop the bad rollout automatically, and will stop scaling up the new ReplicaSet. This depends on the rollingUpdate parameters (maxUnavailable specifically) that you have specified.
But it's not happening. Does it only apply when the status is ImagePullBackOff?
Below is my configuration.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: volume-service
  labels:
    group: volume
    tier: service
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 2
      maxSurge: 2
  template:
    metadata:
      labels:
        group: volume
        tier: service
    spec:
      containers:
        - name: volume-service
          image: gcr.io/example/volume-service:latest
P.S. I have already read about liveness/readiness probes, but I don't think they can stop a rolling update, or can they?
It turns out I just needed to set minReadySeconds; then the rolling update stops when the new ReplicaSet has status CrashLoopBackOff or something like Exited with status code 1, so the old ReplicaSet is still available and not updated.
Here is the new config.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: volume-service
  labels:
    group: volume
    tier: service
spec:
  replicas: 4
  minReadySeconds: 60
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 2
      maxSurge: 2
  template:
    metadata:
      labels:
        group: volume
        tier: service
    spec:
      containers:
        - name: volume-service
          image: gcr.io/example/volume-service:latest
Thank you for everyone's help!
I agree with #Nicola_Ben - I would also consider changing to the setup below:
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # <----- I want at least (4)-[1] = 3 available pods.
      maxSurge: 1         # <----- I want a maximum of (4)+[1] = 5 total running pods.
Or even change maxSurge to 0.
This will help us expose fewer potentially non-functional pods (as we would in a canary release).
As #Hana_Alaydrus suggested, it's important to set minReadySeconds.
In addition to that, sometimes we need to take further action after the rollout has executed.
(For example, there are cases where the new pods are not functioning properly, but the process running inside the container hasn't crashed.)
A suggestion for a general debug process (summarized as commands after the steps):
1 ) First of all, pause the rollout with:
kubectl rollout pause deployment <name>.
2 ) Debug the relevant pods and decide how to continue (maybe we can continue with the new release, maybe not).
3 ) Resume the rollout with kubectl rollout resume deployment <name>; even if we decide to return to the previous release with the undo command (4.B), we first need to resume the rollout.
4.A ) Continue with new release.
4.B ) Return to previous release with: kubectl rollout undo deployment <name>.
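The same flow as plain commands, using volume-service from the question as the deployment name:
kubectl rollout pause deployment volume-service    # 1) stop the rollout where it is
kubectl get pods                                   # 2) inspect / debug the new pods
kubectl rollout resume deployment volume-service   # 3) always resume, even before an undo
kubectl rollout undo deployment volume-service     # 4.B) roll back if the new release is bad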
The explanation you quoted is correct: it means that the new ReplicaSet (the one with the error) will not proceed to completion; its progression will stop at the maxSurge + maxUnavailable count. The old ReplicaSet will still be present too.
Here is the example I tried:
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
And these are the results:
NAME                                  READY   STATUS             RESTARTS   AGE
pod/volume-service-6bb8dd677f-2xpwn   0/1     ImagePullBackOff   0          42s
pod/volume-service-6bb8dd677f-gcwj6   0/1     ImagePullBackOff   0          42s
pod/volume-service-c98fd8d-kfff2      1/1     Running            0          59s
pod/volume-service-c98fd8d-wcjkz      1/1     Running            0          28m
pod/volume-service-c98fd8d-xvhbm      1/1     Running            0          28m

NAME                                                DESIRED   CURRENT   READY   AGE
replicaset.extensions/volume-service-6bb8dd677f     2         2         0       26m
replicaset.extensions/volume-service-c98fd8d        3         3         3       28m
My new replicaSet will start only 2 new pods (1 slot from the maxUnavailable and 1 slot from the maxSurge).
The old replicaSet will keep running 3 pods (4 - 1 unAvailable).
The two parameters you set in the rollingUpdate section are the key point, but you can also play with other factors like readinessProbe, livenessProbe, minReadySeconds and progressDeadlineSeconds.
All of them are described in the Deployment reference.
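As a sketch, all of those knobs live in the Deployment spec; the values below are illustrative only:
spec:
  replicas: 4
  minReadySeconds: 30            # a new pod must stay Ready this long before it counts as available
  progressDeadlineSeconds: 600   # report the rollout as failed if it makes no progress for this long
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1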

Kubernetes - Rolling update killing off old pod without bringing up new one

I am currently using Deployments to manage my pods in my K8S cluster.
Some of my deployments require 2 pods/replicas, some require 3 pods/replicas, and some of them require just 1 pod/replica. The issue I'm having is with the single pod/replica case.
My YAML file is :
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: user-management-backend-deployment
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 2
  selector:
    matchLabels:
      name: user-management-backend
  template:
    metadata:
      labels:
        name: user-management-backend
    spec:
      containers:
        - name: user-management-backend
          image: proj_csdp/user-management_backend:3.1.8
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 8080
          livenessProbe:
            httpGet:
              port: 8080
              path: /user_management/health
            initialDelaySeconds: 300
            timeoutSeconds: 30
          readinessProbe:
            httpGet:
              port: 8080
              path: /user_management/health
            initialDelaySeconds: 10
            timeoutSeconds: 5
          volumeMounts:
            - name: nfs
              mountPath: "/vault"
      volumes:
        - name: nfs
          nfs:
            server: kube-nfs
            path: "/kubenfs/vault"
            readOnly: true
I have the old version running fine.
# kubectl get po | grep user-management-backend-deployment
user-management-backend-deployment-3264073543-mrrvl 1/1 Running 0 4d
Now I want to update the image:
# kubectl set image deployment user-management-backend-deployment user-management-backend=proj_csdp/user-management_backend:3.2.0
Now, as per the RollingUpdate design, K8S should bring up the new pod while keeping the old pod working, and only once the new pod is ready to take traffic should the old pod get deleted. But what I see is that the old pod is deleted immediately, the new pod is then created, and it takes time before it starts taking traffic, meaning that I have to drop traffic.
# kubectl get po | grep user-management-backend-deployment
user-management-backend-deployment-3264073543-l93m9 0/1 ContainerCreating 0 1s
# kubectl get po | grep user-management-backend-deployment
user-management-backend-deployment-3264073543-l93m9 1/1 Running 0 33s
I have used maxSurge: 2 and maxUnavailable: 1, but this does not seem to be working.
Any ideas why is this not working ?
It appears to be the maxUnavailable: 1; I was able to trivially reproduce your experience with that value, and to trivially achieve the correct behaviour by setting it to maxUnavailable: 0.
Here's my "pseudo-proof" of how the scheduler arrived at the behavior you are experiencing:
Because replicas: 1, the desired state for k8s is exactly one Pod in Ready. During a rolling update, which is the strategy you requested, it creates a new Pod, bringing the total to 2. But you granted k8s permission to leave one Pod in an unavailable state, and you instructed it to keep the desired number of Pods at 1. Thus it fulfilled all of those constraints: 1 Pod, the desired count, in an unavailable state, as permitted by the rolling-update strategy.
By setting maxUnavailable to zero, you correctly direct k8s to never let any Pod be unavailable, even if that means surging Pods above the replica count for a short time.
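Applied to the Deployment in the question, only maxUnavailable needs to change:
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 0   # never take the single existing pod away before a new one is Ready
    maxSurge: 2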
With the strategy type set to RollingUpdate, a new pod is created before the old one is deleted, even with a single replica. The Recreate strategy type kills old pods before creating new ones.
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#rolling-update-deployment
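For contrast, the Recreate strategy (which always incurs downtime, because old pods are removed first) is simply:
strategy:
  type: Recreate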
As answered already, you can set the maxUnavailable to 0 to achieve the desired result. A couple of extra notes:
You should not expect this to work when using a stateful service that mounts a single specific volume that is to be used by the new pod. The volume will still be attached to the soon-to-be-replaced pod, so it won't be able to attach to the new pod.
The documentation notes that you cannot set this to 0 if you have set .spec.strategy.rollingUpdate.maxSurge to 0.
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#max-unavailable

kubernetes deployment wait between pods on rolling update

So we have a deployment that uses rolling updates. We need it to pause 180 seconds between each pod it brings up. My understanding is that I need to set minReadySeconds: 180, maxUnavailable: 1 and maxSurge: 1 in the rolling-update strategy for the deployment to wait. With those settings it still brings the pods up as fast as it can. What am I missing?
The relevant part of my deployment:
spec:
  minReadySeconds: 180
  replicas: 9
  revisionHistoryLimit: 20
  selector:
    matchLabels:
      deployment: standard
      name: standard-pod
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
Assuming that a pod is ready after a certain delay is not very idiomatic within an orchestrator like Kubernetes, as something may prevent the pod from starting successfully, or may delay the start by another few seconds.
Instead, you could use liveness and readiness probes to make sure that the pod is there and ready to serve traffic before the old pod is taken down.
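A minimal sketch of such a readiness probe, assuming the pods expose an HTTP health endpoint at /healthz on port 8080 (both are assumptions, not from the question):
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 10
  failureThreshold: 3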
We updated our cluster to a newer version of Kubernetes and it started working.
Posted on behalf of the question asker.