Kubernetes - Rolling update killing off old pod without bringing up new one

I am currently using Deployments to manage my pods in my K8S cluster.
Some of my deployments require 2 pods/replicas, some require 3 pods/replicas, and some of them require just 1 pod/replica. The issue I'm having is with the one that has a single pod/replica.
My YAML file is:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: user-management-backend-deployment
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 2
  selector:
    matchLabels:
      name: user-management-backend
  template:
    metadata:
      labels:
        name: user-management-backend
    spec:
      containers:
      - name: user-management-backend
        image: proj_csdp/user-management_backend:3.1.8
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080
        livenessProbe:
          httpGet:
            port: 8080
            path: /user_management/health
          initialDelaySeconds: 300
          timeoutSeconds: 30
        readinessProbe:
          httpGet:
            port: 8080
            path: /user_management/health
          initialDelaySeconds: 10
          timeoutSeconds: 5
        volumeMounts:
        - name: nfs
          mountPath: "/vault"
      volumes:
      - name: nfs
        nfs:
          server: kube-nfs
          path: "/kubenfs/vault"
          readOnly: true
I have the old version running fine.
# kubectl get po | grep user-management-backend-deployment
user-management-backend-deployment-3264073543-mrrvl 1/1 Running 0 4d
Now I want to update the image:
# kubectl set image deployment user-management-backend-deployment user-management-backend=proj_csdp/user-management_backend:3.2.0
Now, as per the RollingUpdate design, K8S should bring up the new pod while keeping the old pod working, and only once the new pod is ready to take traffic should the old pod get deleted. But what I see is that the old pod is deleted immediately, the new pod is created, and it then takes time before it starts taking traffic, meaning that I have to drop traffic.
# kubectl get po | grep user-management-backend-deployment
user-management-backend-deployment-3264073543-l93m9 0/1 ContainerCreating 0 1s
# kubectl get po | grep user-management-backend-deployment
user-management-backend-deployment-3264073543-l93m9 1/1 Running 0 33s
I have used maxSurge: 2 & maxUnavailable: 1, but this does not seem to be working.
Any ideas why this is not working?

It appears to be the maxUnavailable: 1; I was able to trivially reproduce your experience by setting that value, and trivially achieve the correct experience by setting it to maxUnavailable: 0.
Here's my "pseudo-proof" of how the scheduler arrived at the behavior you are experiencing:
Because replicas: 1, the desired state for k8s is exactly one Pod in Ready. During a Rolling Update operation, which is the strategy you requested, it will create a new Pod, bringing the total to 2. But you granted k8s permission to leave one Pod in an unavailable state, and you instructed it to keep the desired number of Pods at 1. Thus, it fulfilled all of those constraints: 1 Pod, the desired count, in an unavailable state, permitted by the R-U strategy.
By setting the maxUnavailable to zero, you correctly direct k8s to never let any Pod be unavailable, even if that means surging Pods above the replica count for a short time.
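For reference, a minimal sketch of the adjusted strategy block, assuming you keep replicas: 1 and leave the rest of your spec unchanged (maxSurge: 1 is enough headroom for a single-replica Deployment; your existing 2 works as well):
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 0
    maxSurge: 1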

With the strategy type set to RollingUpdate, a new pod is created before the old one is deleted, even with a single replica. The Recreate strategy type instead kills old pods before creating new ones; see the contrast sketch after the link below.
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#rolling-update-deployment
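As a side-by-side illustration (the RollingUpdate values are only examples, not something your spec requires):
# RollingUpdate: create the replacement pod first, then remove the old one
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 0
    maxSurge: 1

# Recreate: delete all old pods before any new pod is created (implies downtime)
strategy:
  type: Recreate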

As answered already, you can set the maxUnavailable to 0 to achieve the desired result. A couple of extra notes:
You should not expect this to work when using a stateful service that mounts a single specific volume that is to be used by the new pod. The volume will still be attached to the soon-to-be-replaced pod, so it won't be able to attach to the new pod.
The documentation notes that you cannot set this to 0 if you have set .spec.strategy.rollingUpdate.maxSurge to 0.
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#max-unavailable

Related

kubernetes pod restart takes time and downtime

I have a service and pod in Node.js; consider it a hello world app.
Exposed port: 80 on HTTP.
I want to seamlessly restart my service/pod.
The pod/service restart is taking a lot of time, so there is downtime.
Currently I use kubectl delete and then recreate it with kubectl.
How can I avoid the delay and downtime?
With continuous deployments, your previous Pods are terminated and new Pods are created, so downtime of the service is possible.
To avoid this, add a rolling update strategy to your Deployment spec.
example:
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
where maxUnavailable: 0 specifies that no pod may become unavailable during the update, so the full desired number of pods stays available at all times
Extra:
If your service takes some time to become live, you can use a readiness probe in the spec to avoid traffic being routed to the pods before they are ready.
example:
readinessProbe:
  tcpSocket:
    port: 80
  initialDelaySeconds: 15
  periodSeconds: 30
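To verify the behaviour, you can watch the rollout from another terminal while updating the image. A sketch, assuming the Deployment and container from the example above are both named api (adjust the names and image to your setup):
kubectl set image deployment/api api=<your-new-image>
kubectl rollout status deployment/api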

How can I ignore failure of a container in multi-container pod?

I have a multi-container application: app + sidecar. Both containers are supposed to be alive all the time, but the sidecar is not really that important.
The sidecar depends on an external resource; if this resource is not available, the sidecar crashes, and it takes the entire pod down. Kubernetes tries to recreate the pod and fails because the sidecar now won't start.
But from my business-logic perspective, a crash of the sidecar is absolutely normal. Having that sidecar is nice but not mandatory.
I don't want the sidecar to take the main app down with it when it crashes.
What would be the best Kubernetes-native way to achieve that?
Is it possible to tell Kubernetes to ignore the failure of the sidecar as a "false positive" event, which is absolutely fine?
I can't find anything in the pod specification that controls that behaviour.
My yaml:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: myapp
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: myapp
    spec:
      volumes:
      - name: logs-dir
        emptyDir: {}
      containers:
      - name: myapp
        image: ${IMAGE}
        ports:
        - containerPort: 9009
        volumeMounts:
        - name: logs-dir
          mountPath: /usr/src/app/logs
        resources:
          limits:
            cpu: "1"
            memory: "512Mi"
        readinessProbe:
          initialDelaySeconds: 60
          failureThreshold: 8
          timeoutSeconds: 1
          periodSeconds: 8
          httpGet:
            scheme: HTTP
            path: /myapp/v1/admin-service/git-info
            port: 9009
      - name: graylog-sidecar
        image: digiapulssi/graylog-sidecar:latest
        volumeMounts:
        - name: logs-dir
          mountPath: /log
        env:
        - name: GS_TAGS
          value: "[\"myapp\"]"
        - name: GS_NODE_ID
          value: "nodeid"
        - name: GS_SERVER_URL
          value: "${GRAYLOG_URL}"
        - name: GS_LIST_LOG_FILES
          value: "[\"/ctwf\"]"
        - name: GS_UPDATE_INTERVAL
          value: "10"
        resources:
          limits:
            memory: "128Mi"
            cpu: "0.1"
Warning: the answer that was flagged as "correct" does not appear to work.
Adding a Liveness Probe to the application container and setting the Restart Policy to "Never" will lead to the Pod being stopped and never restarted in a scenario where the sidecar container has stopped and the application container has failed its Liveness Probe. This is a problem, since you DO want the restarts for the application container.
The problem should be solved as follows:
Tweak the startup command of your sidecar container so that the container keeps running even when the sidecar process fails. This can be done with an extra piece of scripting, e.g. by appending | tail -f /dev/null to the startup command, as sketched below.
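A minimal sketch of that idea; the graylog-sidecar binary name is an assumption for the image's real entrypoint (check the image before relying on this), and a ";" is used so the keep-alive runs after the sidecar exits:
- name: graylog-sidecar
  image: digiapulssi/graylog-sidecar:latest
  # Wrap the original entrypoint so the container stays alive even if the
  # sidecar process exits; "tail -f /dev/null" never terminates.
  command: ["/bin/sh", "-c"]
  args: ["graylog-sidecar; tail -f /dev/null"]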
Adding a Liveness Probe to the application container is in general a good idea. Keep in mind though that it only protects you against a scenario where your application process keeps running without your application being in a correct state. It will certainly not override the restartPolicy:
livenessProbe: Indicates whether the container is running. If the liveness probe fails, the kubelet kills the container, and the container is subjected to its restart policy. If a Container does not provide a liveness probe, the default state is Success.
Container Probes
A custom livenessProbe should help, but for your scenario I would use the liveness probe for your main app container, which is myapp, considering the fact that you don't care about the sidecar (as mentioned). I would set the pod's restartPolicy to Never and then define a custom livenessProbe for your main myapp container. This way the Pod will never restart no matter which container fails, but when your myapp container's liveness probe fails, the kubelet will restart the container! Reference below:
Pod is running and has two Containers. Container 1 exits with failure.
Log failure event.
If restartPolicy is:
Always: Restart Container; Pod phase stays Running.
OnFailure: Restart Container; Pod phase stays Running.
Never: Do not restart Container; Pod phase stays Running.
So the updated (pseudo) YAML should look like the example below:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: myapp
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    ...
    spec:
      ...
      restartPolicy: Never
      containers:
      - name: myapp
        ...
        livenessProbe:
          exec:
            command:
            - /bin/sh
            - -c
            - {{ your custom liveness check command goes here }}
          failureThreshold: 3
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        readinessProbe:
          ...
      - name: graylog-sidecar
        ...
Note: since I don't know your application, I cannot write the command, but as an example, for my JBoss server I use this:
livenessProbe:
  exec:
    command:
    - /bin/sh
    - -c
    - /opt/jboss/wildfly/bin/jboss-cli.sh --connect --commands="read-attribute server-state"
  failureThreshold: 3
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
The best solution, which works for me, is not to fail inside the sidecar container, but just to log an error and rerun.
#!/usr/bin/env bash
set -e
# do some stuff which can fail on start
set +e # needed to not exit if command fails
while ! command; do
  echo "command failed - rerun"
done
This will keep rerunning the command as long as it fails, and exit the loop once the command finishes successfully.
You can also define a custom livenessProbe for your sidecar with a greater failureThreshold / periodSeconds to accommodate what is considered an acceptable failure rate in your environment, or simply ignore all failures; see the sketch after the docs links below.
Docs:
https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.15/#probe-v1-core
kubectl explain deployment.spec.template.spec.containers.livenessProbe
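For illustration, a hedged sketch of such a relaxed probe on the sidecar; the exec command is a placeholder you would replace with a real check, and the thresholds are arbitrary examples:
- name: graylog-sidecar
  image: digiapulssi/graylog-sidecar:latest
  livenessProbe:
    exec:
      # Placeholder: replace with whatever counts as "healthy enough" for you.
      command: ["/bin/sh", "-c", "your-sidecar-health-check"]
    periodSeconds: 120     # probe rarely
    failureThreshold: 30   # tolerate many consecutive failures before acting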

Is there a way to do a load balancing between pod in multiple nodes?

I have a Kubernetes cluster deployed with RKE which is composed of 3 nodes on 3 different servers, and on each of those servers there is 1 pod running yatsukino/healthereum, which is a personal modification of ethereum/client-go:stable.
The problem is that I don't understand how to add an external IP to send requests to the pods.
My pods could be in 3 states:
they are syncing the ethereum blockchain
they restarted because of a sync problem
they are synced and everything is fine
I don't want my load balancer to transfer requests to pods in the first 2 states; only in the third state should a pod be considered up to date.
I've been searching in the Kubernetes docs, but (maybe because of a misunderstanding) I only find load balancing for pods inside a single node.
Here is my deployment file:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: goerli
  name: goerli-deploy
spec:
  replicas: 3
  selector:
    matchLabels:
      app: goerli
  template:
    metadata:
      labels:
        app: goerli
    spec:
      containers:
      - image: yatsukino/healthereum
        name: goerli-geth
        args: ["--goerli", "--datadir", "/app", "--ipcpath", "/root/.ethereum/geth.ipc"]
        env:
        - name: LASTBLOCK
          value: "0"
        - name: FAILCOUNTER
          value: "0"
        ports:
        - containerPort: 30303
          name: geth
        - containerPort: 8545
          name: console
        livenessProbe:
          exec:
            command:
            - /bin/sh
            - /app/health.sh
          initialDelaySeconds: 20
          periodSeconds: 60
        volumeMounts:
        - name: app
          mountPath: /app
      initContainers:
      - name: healthcheck
        image: ethereum/client-go:stable
        command: ["/bin/sh", "-c", "wget -O /app/health.sh http://my-bash-script && chmod 544 /app/health.sh"]
        volumeMounts:
        - name: app
          mountPath: "/app"
      restartPolicy: Always
      volumes:
      - name: app
        hostPath:
          path: /app/
The answers above explain the concepts, but regarding your question about services and external IPs: you must declare the service, for example:
apiVersion: v1
kind: Service
metadata:
  name: goerli
spec:
  selector:
    app: goerli
  ports:
  - port: 8545
  type: LoadBalancer
The type: LoadBalancer will assign an external address if you are in a public cloud or if you use something like MetalLB. Check your address with kubectl get svc goerli. If the external address is "pending", you have a problem...
If this is your own setup, you can use externalIPs to assign your own external IP:
apiVersion: v1
kind: Service
metadata:
  name: goerli
spec:
  selector:
    app: goerli
  ports:
  - port: 8545
  externalIPs:
  - 222.0.0.30
The externalIPs can be used from outside the cluster, but you must route traffic to the nodes yourself, for example:
ip route add 222.0.0.30/32 \
nexthop via 192.168.0.1 \
nexthop via 192.168.0.2 \
nexthop via 192.168.0.3
This assumes your k8s nodes have IPs 192.168.0.x. It will set up ECMP routes to your nodes. When you make a request from outside the cluster to 222.0.0.30:8545, k8s will load-balance between your ready Pods.
For load balancing and exposing your pods, you can use a Service: https://kubernetes.io/docs/concepts/services-networking/service/
And for checking when a pod is ready, you can tweak your liveness and readiness probes as explained at https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/
For probes you might want to consider exec actions, like executing a script that checks what is required and returns 0 or 1 depending on the status.
When a container is started, Kubernetes can be configured to wait for a configurable
amount of time to pass before performing the first readiness check. After that, it
invokes the probe periodically and acts based on the result of the readiness probe. If a
pod reports that it’s not ready, it’s removed from the service. If the pod then becomes
ready again, it’s re-added.
Unlike liveness probes, if a container fails the readiness check, it won’t be killed or
restarted. This is an important distinction between liveness and readiness probes.
Liveness probes keep pods healthy by killing off unhealthy containers and replacing
them with new, healthy ones, whereas readiness probes make sure that only pods that
are ready to serve requests receive them. This is mostly necessary during container
start up, but it’s also useful after the container has been running for a while.
I think you can use a readiness probe for your goal; a sketch follows.
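Following the exec-action suggestion above, a minimal sketch of a readiness probe that gates traffic on a script's exit code. sync-check.sh is a hypothetical script you would have to write yourself (e.g. comparing the local block height against a reference node), and the timings are only examples:
readinessProbe:
  exec:
    command:
    - /bin/sh
    - /app/sync-check.sh   # hypothetical: exits 0 only when the node is fully synced
  initialDelaySeconds: 30
  periodSeconds: 30
With such a probe, the Service only forwards traffic to pods whose readiness check currently succeeds, which covers the "only route to synced nodes" requirement.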

Kubernetes Zero Downtime deployment not working - Gives 503 Service Temporarily Unavailable

I am trying to achieve zero-downtime deployment using Kubernetes. But every time I do the upgrade of the deployment using a new image, I am seeing 2-3 seconds of downtime. I am testing this using a Hello-World sort of application but still could not achieve it. I am deploying my application using the Helm charts.
Following the online blogs and resources, I am using Readiness-Probe and Rolling Update strategy in my Deployment.yaml file. But this gives me no success.
I have created a /health end-point which simply returns 200 status code as a check for readiness probe. I expected that after using readiness probes and RollingUpdate strategy in Kubernetes I would be able to achieve zero-downtime of my service when I upgrade the image of the container. The request to my service goes through an Amazon ELB.
Deployment.yaml file is as below:
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: wine-deployment
  labels:
    app: wine-store
    chart: {{ .Chart.Name }}-{{ .Chart.Version | replace "+" "_" }}
    release: {{ .Release.Name }}
    heritage: {{ .Release.Service }}
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  selector:
    matchLabels:
      app: wine-store
  replicas: 2
  template:
    metadata:
      labels:
        app: wine-store
    spec:
      containers:
      - name: {{ .Chart.Name }}
        resources:
          limits:
            cpu: 250m
          requests:
            cpu: 200m
        image: "my-private-image-repository-with-tag-and-version-goes-here-which-i-have-hidden-here"
        imagePullPolicy: Always
        env:
        - name: GET_HOSTS_FROM
          value: dns
        ports:
        - containerPort: 8089
          name: testing-port
        readinessProbe:
          httpGet:
            path: /health
            port: 8089
          initialDelaySeconds: 3
          periodSeconds: 3
Service.yaml file:
apiVersion: v1
kind: Service
metadata:
  name: wine-service
  labels:
    app: wine-store
spec:
  ports:
  - port: 80
    targetPort: 8089
    protocol: TCP
  selector:
    app: wine-store
Ingress.yaml file:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: wine-ingress
  annotations:
    kubernetes.io/ingress.class: public-nginx
spec:
  rules:
  - host: my-service-my-internal-domain.com
    http:
      paths:
      - path: /
        backend:
          serviceName: wine-service
          servicePort: 80
I expect the downtime to be zero when I upgrade the image using the helm upgrade command. Meanwhile, while the upgrade is in progress, I continuously hit my service using a curl command. This curl command gives me "503 Service Temporarily Unavailable" errors for 2-3 seconds, and then the service is up again. I expect this downtime not to happen.
This issue is caused by the Service VIP using iptables. You haven't done anything wrong - it's a limitation of current Kubernetes.
When the readiness probe on the new pod passes, the old pod is terminated and kube-proxy rewrites the iptables for the service. However, a request can hit the service after the old pod is terminated but before iptables has been updated resulting in a 503.
A simple workaround is to delay termination by using a preStop lifecycle hook:
lifecycle:
  preStop:
    exec:
      command: ["/bin/bash", "-c", "sleep 10"]
It's probably not relevant in this case, but implementing graceful termination in your application is a good idea: intercept the TERM signal and wait for your application to finish handling any requests it has already received rather than just exiting immediately.
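As a rough illustration of that idea in a shell-based entrypoint wrapper (everything here is hypothetical: the wrapper itself, the /usr/local/bin/my-app binary, and the assumption that your app shuts down cleanly when it receives TERM):
#!/bin/sh
# Hypothetical entrypoint: forward SIGTERM to the app and wait for it to
# finish in-flight work before the container exits.
/usr/local/bin/my-app &      # placeholder for your application binary
APP_PID=$!
trap 'kill -TERM "$APP_PID"' TERM
wait "$APP_PID"              # interrupted when TERM arrives and the trap runs
wait "$APP_PID"              # then wait again until the app has actually exited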
Alternatively, more replicas, a low maxUnavailable and a high maxSurge will all reduce the probability of requests hitting a terminating pod.
For more info:
https://kubernetes.io/docs/concepts/services-networking/service/#proxy-mode-iptables
https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods
Another answer mistakenly suggests you need a liveness probe. While it's a good idea to have a liveness probe, it won't affect the issue that you are experiencing. With no liveness probe defined the default state is Success.
In the context of a rolling deployment a liveness probe will be irrelevant - Once the readiness probe on the new pod passes the old pod will be sent the TERM signal and iptables will be updated. Now that the old pod is terminating, any liveness probe is irrelevant as its only function is to cause a pod to be restarted if the liveness probe fails.
Any liveness probe on the new pod again is irrelevant. When the pod is first started it is considered live by default. Only after the initialDelaySeconds of the liveness probe would it start being checked and, if it failed, the pod would be terminated.
https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#container-probes
Work around this with blue-green deployments, because even if pods are up it may take time for kube-proxy to forward requests to the new Pod IPs.
So set up a new deployment, and after all its pods are up, update the service selector to the new Pod labels (see the sketch after the link below).
Follow: https://kubernetes.io/blog/2018/04/30/zero-downtime-deployment-kubernetes-jenkins/
The problem you describe indicates an issue with readiness probes. It is important to understand the differences between liveness and readiness probes, and you should implement and configure both!
The liveness probes check whether the container is started and alive. If this isn't the case, Kubernetes will eventually restart the container.
The readiness probes in turn also check dependencies, like database connections or other services your container depends on to fulfill its work. As a developer you have to invest more time into their implementation than into the liveness probes: you have to expose an endpoint that also checks the mentioned dependencies when queried.
Your current configuration uses a health endpoint of the kind usually used by liveness probes. It probably doesn't check whether your service is really ready to take traffic.
Kubernetes relies on the readiness probes: during a rolling update, it will keep the old container up and running until the new service declares that it is ready to take traffic. Therefore the readiness probes have to be implemented correctly.

How to automatically stop rolling update when CrashLoopBackOff?

I use Google Kubernetes Engine, and I intentionally put an error in the code. I was hoping the rolling update would stop when it discovers the status is CrashLoopBackOff, but it didn't.
In this page, they say..
The Deployment controller will stop the bad rollout automatically, and
will stop scaling up the new ReplicaSet. This depends on the
rollingUpdate parameters (maxUnavailable specifically) that you have
specified.
But it's not happening. Does it only work if the status is ImagePullBackOff?
Below is my configuration.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: volume-service
  labels:
    group: volume
    tier: service
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 2
      maxSurge: 2
  template:
    metadata:
      labels:
        group: volume
        tier: service
    spec:
      containers:
      - name: volume-service
        image: gcr.io/example/volume-service:latest
P.S. I have already read about liveness/readiness probes, but I don't think they can stop a rolling update, or can they?
Turns out I just needed to set minReadySeconds, and then the rolling update stops when the new ReplicaSet has status CrashLoopBackOff or something like Exited with status code 1. So now the old ReplicaSet is still available and not updated.
Here is the new config.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: volume-service
  labels:
    group: volume
    tier: service
spec:
  replicas: 4
  minReadySeconds: 60
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 2
      maxSurge: 2
  template:
    metadata:
      labels:
        group: volume
        tier: service
    spec:
      containers:
      - name: volume-service
        image: gcr.io/example/volume-service:latest
Thank you everyone for the help!
I agree with #Nicola_Ben - I would also consider changing to the setup below:
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # <----- I want at least (4)-[1] = 3 available pods.
      maxSurge: 1         # <----- I want maximum (4)+[1] = 5 total running pods.
Or even change maxSurge to 0.
This will help us expose fewer potentially nonfunctional pods (like we would do in a canary release).
As #Hana_Alaydrus suggested, it's important to set up minReadySeconds.
In addition to that, sometimes we need to take more actions after the rollout execution.
(For example, there are cases when the new pods are not functioning properly, but the process running inside the container hasn't crashed.)
A suggestion for a general debug process:
1 ) First of all, pause the rollout with:
kubectl rollout pause deployment <name>.
2 ) Debug the relevant pods and decide how to continue (maybe we can continue with the new release, maybe not).
3 ) We would have to resume the rollout with: kubectl rollout resume deployment <name>, because even if we decide to return to the previous release with the undo command (4.B), we first need to resume the rollout.
4.A ) Continue with new release.
4.B ) Return to previous release with: kubectl rollout undo deployment <name>.
The explanation you quoted is correct, and it means that the new ReplicaSet (the one with the error) will not proceed to completion; its progression will be stopped at the maxSurge + maxUnavailable count. And the old ReplicaSet will still be present too.
Here is the example I tried with:
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
And these are the results:
NAME READY STATUS RESTARTS AGE
pod/volume-service-6bb8dd677f-2xpwn 0/1 ImagePullBackOff 0 42s
pod/volume-service-6bb8dd677f-gcwj6 0/1 ImagePullBackOff 0 42s
pod/volume-service-c98fd8d-kfff2 1/1 Running 0 59s
pod/volume-service-c98fd8d-wcjkz 1/1 Running 0 28m
pod/volume-service-c98fd8d-xvhbm 1/1 Running 0 28m
NAME DESIRED CURRENT READY AGE
replicaset.extensions/volume-service-6bb8dd677f 2 2 0 26m
replicaset.extensions/volume-service-c98fd8d 3 3 3 28m
My new replicaSet will start only 2 new pods (1 slot from the maxUnavailable and 1 slot from the maxSurge).
The old replicaSet will keep running 3 pods (4 - 1 unAvailable).
The two params you set in the rollingUpdate section are the key point, but you can also play with other factors like readinessProbe, livenessProbe, minReadySeconds and progressDeadlineSeconds.
For them, here is the reference; a combined sketch follows.
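As a closing illustration, a fragment showing where each of those knobs lives on the example Deployment (values are arbitrary, and the probe path/port are hypothetical):
spec:
  replicas: 4
  minReadySeconds: 60            # a new pod must stay Ready this long before it counts as available
  progressDeadlineSeconds: 600   # mark the rollout as failed (not rolled back) after this time
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  template:
    spec:
      containers:
      - name: volume-service
        image: gcr.io/example/volume-service:latest
        readinessProbe:
          httpGet:
            path: /healthz       # hypothetical health endpoint
            port: 8080           # hypothetical port
          periodSeconds: 10
          failureThreshold: 3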