Helm upgrade of custom microservice causes temporary downtime - kubernetes

During deployment of a new version of the application, the 4 pods are terminated sequentially and replaced by newer ones. For those ~10 minutes, another microservice that calls this app keeps hitting the old endpoints, causing 502/404 errors. Does anyone know of a way to deploy 4 new pods first, then drain traffic from the old pods to the new ones, and only terminate the old pods once all connections to the previous version have closed?

This probably means you don't have a readiness probe set up. The default is already to roll only 25% of the pods at once; with a readiness probe, the rollout also waits until the new pods are actually available and Ready, but without one it only waits until they start.
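For example, a minimal readiness probe on the app container might look like the following (the container name, image, path, and port are placeholders for whatever your app actually exposes):

containers:
  - name: my-app                # hypothetical container name
    image: my-app:v2            # hypothetical image
    readinessProbe:
      httpGet:
        path: /healthz          # replace with your app's health endpoint
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5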

Related

Kubernetes Version Upgrades and Downtime

I just tested Rancher RKE, upgrading Kubernetes from 1.13.x to 1.14.x. During the upgrade, an already running nginx Pod got restarted. Is this expected behavior?
Can we have Kubernetes cluster upgrades without user pods restarting?
Which tool supports uninterrupted upgrades?
What are the downtimes that we can never avoid (apart from the control plane)?
The default way Kubernetes upgrades is by doing a rolling upgrade of the nodes, one at a time.
This works by cordoning (marking the node as unschedulable for new pods) and draining each node that is being upgraded, so that no pods are left running on that node.
It does that by recreating the existing pods on another node (if one is available): once a new pod is running (and answering its readiness/health probes), the old pod on the node being upgraded is stopped and removed (each of its containers is sent SIGTERM).
The amount of time Kubernetes waits for a pod to shut down gracefully is controlled by terminationGracePeriodSeconds in the pod spec; if a pod takes longer than that, its containers are killed with SIGKILL.
The point is: to have a graceful Kubernetes upgrade, you need enough nodes available, and your pods must have correct liveness and readiness probes (https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/).
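As a rough sketch of that last point, probes in a pod spec could look like this (container name, image, port, and paths are assumptions):

containers:
  - name: web                   # hypothetical container
    image: web:1.0              # hypothetical image
    readinessProbe:             # gates whether the pod receives Service traffic
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5
    livenessProbe:              # restarts the container if it stops responding
      httpGet:
        path: /live
        port: 8080
      initialDelaySeconds: 10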
Some interesting material that is worth a read:
https://cloud.google.com/blog/products/gcp/kubernetes-best-practices-upgrading-your-clusters-with-zero-downtime (specific to GKE but has some insights)
https://blog.gruntwork.io/zero-downtime-server-updates-for-your-kubernetes-cluster-902009df5b33
Resolved by configuring the container runtime on the hosts not to restart containers on a Docker restart.

Kubernetes in production. Problems with a working pod

I have a question with Kubernetes when deploying a new version.
My Kubernetes YAML configuration uses the RollingUpdate strategy. The problem comes when changing versions this way: if I have a php-fpm pod that is performing an action, does that action get lost when the pod is replaced with the new version?
My main question is whether Kubernetes, with this strategy, takes into consideration that a pod is in use, and if so, whether it waits until the pod finishes what it is doing before replacing it.
Thanks!
If something is dropping your sessions, that would be a bug. Generally speaking, if you have a Service that forwards to multiple backend replicas, an update happens one replica at a time, something like this:
New pod created.
Wait for the new pod to be ready and serviceable.
Put the new pod in the Service pool.
Remove the old pod from the Service pool.
Drain old pod. Don't take any more incoming connections and wait for connections to close.
Take down the old pod.
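The "Service pool" here is simply the set of Ready pods matched by the Service's label selector; a minimal sketch of such a pairing (all names and values are hypothetical):

apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web                    # Ready pods with this label form the pool
  ports:
    - port: 80
      targetPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 4
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web                # matches the Service selector above
    spec:
      containers:
        - name: web
          image: web:2.0        # hypothetical image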

Openshift: trigger pods restart sequentially

My application loads data during startup. So I need to restart application to change data.
Data is loaded from Oracle schema and can be changed by other application.
If the data is changed, application becomes partially functional and needs to be restarted.
Requirement: the restart should happen automatically and without downtime (an old pod should be killed only when a new one passes its readiness check).
How can this requirement be fulfilled?
Notes:
I would really like to use a liveness probe that checks some health-check URL. Issue: AFAIK the liveness probe kills a pod as soon as the check fails, so all pods would be killed simultaneously, which leads to downtime during startup.
The desired behavior can be achieved with a rolling deployment; however, I don't want to perform it manually.
For simplicity, I don't want to implement data loading during pod operation: a pod can load data only during startup. If a pod is not fully functional, it is killed and recreated.
Two ways I can think of:
- Use StatefulSets; the pods will be restarted in order and killed in reverse order.
- Use a Deployment with spec.strategy.type = RollingUpdate and pair it with .spec.strategy.rollingUpdate.maxUnavailable set to 1 or more, as sketched below.
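A minimal sketch of that strategy block (the values are assumptions; tune them to how many pods you can afford to lose at once):

spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1         # at most one pod down at any time
      maxSurge: 1               # allow one extra pod above the desired count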

Configure Kubernetes StatefulSet to start pods first, restart failed containers after start?

Basic info
Hi, I'm encountering a problem with Kubernetes StatefulSets. I'm trying to spin up a set with 3 replicas.
These replicas/pods each have a container which pings a container in the other pods based on their network-id.
The container requires a response from all the pods; if it does not get one, it will fail. In my situation I need all 3 pods/replicas for my setup to work.
Problem description
What happens is the following: Kubernetes starts 2 pods rather quickly. However, since I need 3 pods for a fully functional cluster, the first 2 keep crashing because the 3rd is not up yet.
For some reason Kubernetes keeps restarting both pods instead of adding the 3rd pod, which would let my cluster function.
I've seen my setup run properly after about 15 minutes because Kubernetes added the 3rd pod by then.
Question
So, my question.
Does anyone know a way to delay restarting failed containers until the desired amount of pods/replicas have been booted?
I've since found out the cause of this.
StatefulSets launch pods in a specific order; if one pod fails to launch, the next one is not started.
You can add a podManagementPolicy: "Parallel" to launch the pods without waiting for previous pods to be Running.
See this documentation
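A minimal sketch of a StatefulSet using it (names and the image are placeholders):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-cluster
spec:
  serviceName: my-cluster
  replicas: 3
  podManagementPolicy: Parallel  # launch all pods at once instead of one by one
  selector:
    matchLabels:
      app: my-cluster
  template:
    metadata:
      labels:
        app: my-cluster
    spec:
      containers:
        - name: node
          image: my-cluster:1.0  # hypothetical image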
I think a better way to deal with your problem is to leverage a liveness probe, as described in the documentation, rather than delaying the restart time (which is not configurable in the YAML).
Have your pods respond to the liveness probe right after they start, to let Kubernetes know they are alive and prevent them from being restarted. Meanwhile, the pods keep pinging the others until all of them are up; only when all pods have started do they serve external requests. This is similar to creating a ZooKeeper ensemble.

Does Kubernetes support connection draining?

Does Kubernetes support connection draining?
For example, my deployment rolls out a new version of my web app container.
In connection draining mode Kubernetes should spin up a new container from the new image and route all new traffic coming to my service to this new instance. The old instance should remain alive long enough to send a response for existing connections.
Kubernetes does support connection draining, but how it happens is controlled by the Pods, and is called graceful termination.
Graceful Termination
Let's take an example of a set of Pods serving traffic through a Service. This is a simplified example; the full details can be found in the documentation.
The system (or a user) notifies the API that the Pod needs to stop.
The Pod is set to the Terminating state. This removes it from a Service serving traffic. Existing connections are maintained, but new connections should stop as soon as the load balancers recognize the change.
The system sends SIGTERM to all containers in the Pod.
The system waits terminationGracePeriodSeconds (default 30s), or until the Pod completes on its own.
If containers in the Pod are still running, they are sent SIGKILL and terminated immediately. At this point the Pod is forcefully terminated if it is still running.
This not only covers the simple termination case; the exact same process is used in rolling-update deployments, where each Pod is terminated in the same way and given the opportunity to clean up.
Using Graceful Termination For Connection Draining
If you do not handle SIGTERM in your app, your Pods will terminate immediately, since the default action of SIGTERM is to terminate the process, and the grace period goes unused because the Pod exits on its own.
If you need "connection draining", this is the basic way you would implement it in Kubernetes:
Handle the SIGTERM signal and clean up your connections in whatever way your application decides. This may simply be "do nothing" and allow in-flight connections to clear out. Long-running connections may be terminated in a way that is (more) friendly to client applications.
Set the terminationGracePeriodSeconds long enough for your Pod to clean up after itself.
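A sketch of the second step in a pod spec (the 120s value is an assumption; size it to your longest expected in-flight request):

spec:
  terminationGracePeriodSeconds: 120  # time the app gets after SIGTERM before SIGKILL
  containers:
    - name: app                       # hypothetical; the app itself must handle SIGTERM
      image: app:1.0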
There are some more options which could help to enable a zero downtime deployment. Here's a summary:
1. Pod Graceful Termination
See Kekoa's answer above.
Downside: the Pod receives SIGTERM directly and will not accept any new requests (depending on the implementation).
2. Pre Stop Hook
The Pod receives SIGTERM only after a waiting time, during which it can still accept new requests. You should set terminationGracePeriodSeconds to a value larger than the preStop sleep time.
lifecycle:
  preStop:
    exec:
      command: ["/bin/bash","-c","/bin/sleep 90"]
This is the solution recommended by the Azure Application Gateway Ingress Controller.
Downside: Pod is removed from the list of Endpoints and might not be visible to other pods.
3. Helm Chart Hooks
If you need some cleanup before the Pods are removed from the Endpoints, you need Helm Chart Hooks.
apiVersion: batch/v1
kind: Job
metadata:
  name: graceful-shutdown-mydeployment
  annotations:
    "helm.sh/hook": pre-delete,pre-upgrade,pre-rollback
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
  labels:
    app.kubernetes.io/name: graceful-shutdown-mydeployment
spec:
  template:
    spec:
      containers:
        - name: graceful-shutdown
          image: ...
          command: ...
      restartPolicy: Never
  backoffLimit: 0
For more details see https://stackoverflow.com/a/66733416/1909531
Downside: Helm is required.
Full lifecycle
Here's how these options are executed in the pod's lifecycle.
1. Execute the Helm chart hook.
2. Set the Pod to the Terminating state and execute the pre-stop hook.
3. Send SIGTERM.
4. Wait terminationGracePeriodSeconds.
5. Stop the Pod by sending SIGKILL, if containers are still running.
Choosing an option
I would first try 1. terminationGracePeriodSeconds, then 2. the Pre Stop Hook, and only then 3. the Helm Chart Hook, as the complexity rises in that order.
No, Deployments do not support connection draining per se. Draining happens effectively as old pods stop and new pods start: clients connected to old pods have to reconnect and are routed to new pods. Since clients connect through the Service, this is transparent to them. You do need to ensure that your application can handle different versions running concurrently, but that is a good idea anyway, as it minimises downtime in upgrades and allows you to perform things like A/B testing.
There are a couple of strategies that let you tweak how your upgrades take place: Deployments support two update strategies, Recreate and RollingUpdate.
With Recreate, old pods are stopped before new pods are started. This leads to a period of downtime but ensures that all clients connect to either the old or the new version; there will never be a time when both old and new pods are servicing clients simultaneously. If downtime is acceptable, this may be an option for you.
Most of the time, however, downtime is unacceptable for a service, so RollingUpdate is more appropriate. This starts new pods and, as it does so, stops old ones. As old pods are stopped, clients connected to them have to reconnect. Eventually there will be no old pods, and all clients will have reconnected to new pods.
While there is no option to do connection draining as you suggest, you can configure the rolling update via maxUnavailable and maxSurge options. From http://kubernetes.io/docs/user-guide/deployments/#rolling-update-deployment:
.spec.strategy.rollingUpdate.maxUnavailable is an optional field that specifies the maximum number of Pods that can be unavailable during the update process. The value can be an absolute number (e.g. 5) or a percentage of desired Pods (e.g. 10%). The absolute number is calculated from percentage by rounding up. This can not be 0 if .spec.strategy.rollingUpdate.maxSurge is 0. By default, a fixed value of 1 is used.
For example, when this value is set to 30%, the old Replica Set can be scaled down to 70% of desired Pods immediately when the rolling update starts. Once new Pods are ready, old Replica Set can be scaled down further, followed by scaling up the new Replica Set, ensuring that the total number of Pods available at all times during the update is at least 70% of the desired Pods.
.spec.strategy.rollingUpdate.maxSurge is an optional field that specifies the maximum number of Pods that can be created above the desired number of Pods. Value can be an absolute number (e.g. 5) or a percentage of desired Pods (e.g. 10%). This can not be 0 if MaxUnavailable is 0. The absolute number is calculated from percentage by rounding up. By default, a value of 1 is used.
For example, when this value is set to 30%, the new Replica Set can be scaled up immediately when the rolling update starts, such that the total number of old and new Pods do not exceed 130% of desired Pods. Once old Pods have been killed, the new Replica Set can be scaled up further, ensuring that the total number of Pods running at any time during the update is at most 130% of desired Pods.
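Putting the two fields together, a rolling update matching the documentation's 30% example could be configured like this:

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 30%       # at least 70% of desired Pods stay available
      maxSurge: 30%             # at most 130% of desired Pods exist at once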
Hope that helps.