Kubernetes Helm: waiting before killing the old pods during a Helm deployment

I have a "big" micro-service (website) with 3 pods deployed with Helm Chart in production env, but when I deploy a new version of the Helm chart, during 40 seconds (time to start my big microservice) I have a problem with the website (503 Service Unavailable)
So, I look at a solution to tell to kubernetes do not kill the old pod before the complete start of the new version
I tried the --wait --timeout but it did not work for me.
My EKS version : "v1.14.6-eks-5047ed"

Without more details about the Pods, I'd suggest:
Use a Deployment (if you are not already) so that the Pods are managed by a ReplicaSet, which makes rolling updates possible, and combine that with a configured startup probe (if on Kubernetes v1.16+) or readiness probe so that Kubernetes knows when the new Pods are ready to take on traffic (a Pod is considered ready when all of its containers are ready).
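As a minimal sketch of what that could look like (the image, the port and the /healthz path are placeholders for whatever your website actually exposes), a Deployment with a readiness probe and a rolling-update strategy that keeps the old pods until a replacement is Ready:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: website
spec:
  replicas: 3
  selector:
    matchLabels:
      app: website
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0        # never take an old pod down before a replacement is Ready
      maxSurge: 1              # bring up one extra pod at a time
  template:
    metadata:
      labels:
        app: website
    spec:
      containers:
      - name: website
        image: myrepo/website:1.2.3      # placeholder image
        ports:
        - containerPort: 8080            # placeholder port
        readinessProbe:                  # traffic is only routed once this succeeds
          httpGet:
            path: /healthz               # placeholder health endpoint
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5

With maxUnavailable: 0 the old pods are only terminated after a new pod has passed its readiness probe, which is what removes the 40-second window of 503s.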

Related

While deploying Kafka on an on-premises k8s cluster, the pod status stays Pending for a long time

I am trying to use Helm charts to deploy Kafka and ZooKeeper in a local k8s cluster, but when I check the status of the respective pods they stay in Pending for a long time and are not assigned to any node, even though I have 2 healthy worker nodes running.
I tried deleting the pods and redeploying, but I ended up in the same situation and still cannot get the pods to run. I need help on how to get these pods running.

Grafana & Loki agents not deployed in Tainted nodes

We are running our workloads on AKS. Basically we have two node pools.
1. System-Node-Pool: where all system pods are running
2. Apps-Node-Pool: where our actual workloads/apps run
Our Apps-Node-Pool is tainted, whereas the System-Node-Pool isn't. I deployed the Loki-Grafana stack for monitoring and log analysis, using the Helm command below to install it.
helm upgrade --install loki grafana/loki-stack --set grafana.enabled=true,prometheus.enabled=true,prometheus.alertmanager.persistentVolume.enabled=false,prometheus.server.persistentVolume.enabled=false,loki.persistence.enabled=true,loki.persistence.storageClassName=standard,loki.persistence.size=5Gi
Since no toleration is added in the Helm command (or in values.yaml), all the Grafana and Loki pods get deployed on the System-Node-Pool. The problem is that, because the necessary agents (for example the Promtail pods) aren't deployed on the Apps-Node-Pool, I can't check the logs of my app pods.
Since the taint exists on the Apps-Node-Pool, if we add a toleration along with the Helm command, the monitoring-related pods can get scheduled on the Apps-Node-Pool (though this still isn't guaranteed, as they may end up on the System-Node-Pool, since it doesn't have a taint).
So, given my cluster, what can I do to make sure that the agent pods also run on the tainted nodes?
In my case the requirement was to run the Promtail pods on the Apps-Node-Pool. They did not have a toleration initially, so I had to add a toleration to the Promtail pods; with that in place, the Promtail pods got deployed on the Apps-Node-Pool.
However, adding a toleration to the Promtail pods alone doesn't guarantee that they get deployed on the Apps-Node-Pool, because in my case the System-Node-Pool didn't have any taint.
In this case you can leverage both node affinity and tolerations to deploy the pods exclusively on a specific node pool, as sketched below: the toleration lets them schedule onto the tainted nodes, and the node affinity keeps them off the untainted ones.
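A sketch of the relevant pod-spec fields (the taint key/value app-pool=apps:NoSchedule and the node label agentpool=appsnodepool are placeholders; use whatever taint and labels your node pools actually carry, and pass the equivalent settings through the chart's values.yaml, for example under its promtail section, checking the chart for the exact keys):

tolerations:
- key: "app-pool"                  # must match the taint on the Apps-Node-Pool
  operator: "Equal"
  value: "apps"
  effect: "NoSchedule"
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: agentpool           # assumed AKS node-pool label; verify with kubectl get nodes --show-labels
          operator: In
          values:
          - appsnodepool           # placeholder pool name

The toleration alone only permits scheduling on the tainted pool; the required node affinity is what stops the scheduler from placing the pods on the untainted System-Node-Pool.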

Does "kubectl rollout restart deploy" cause downtime?

I'm trying to restart all the deployments in a namespace for implementation reasons.
I'm using kubectl rollout restart deploy -n <namespace> and it works perfectly, but I'm not sure whether that command causes downtime or whether it behaves like a rolling update, restarting the pods one by one and keeping my services up.
Does anyone know?
In the documentation I can only find this:
Operation: rollout
Syntax: kubectl rollout SUBCOMMAND [options]
Description: Manage the rollout of a resource. Valid resource types include: deployments, daemonsets and statefulsets.
But I can't find details about the specific "rollout restart deploy".
I need to make sure it doesn't cause downtime. Right now it is very hard to tell, because the restart process is very quick.
Update: I know that for one specific deployment (kubectl rollout restart deployment/name) it works as expected and doesn't cause downtime, but I need to apply it to the whole namespace (without specifying a deployment), and that's the case I'm not sure about.
kubectl rollout restart deploy -n namespace1 will restart all deployments in the specified namespace with zero downtime.
The restart command works as follows:
For each deployment, it will create new pods
Once the new pods are up (running and ready) it will terminate the old pods
Add readiness probes to your deployments to configure the initial delays.
@pcsutar's answer is almost correct. kubectl rollout restart $resourcetype $resourcename restarts your deployment, daemonset or stateful set according to its update strategy, so if it is set to RollingUpdate it will behave exactly as the above answer says:
For each deployment, it will create new pods
Once the new pods are up (running and ready) it will terminate the old pods
Add readiness probes to your deployments to configure the initial delays.
However, if the strategy is, for example, type: Recreate, all the currently running pods belonging to the deployment will be terminated before new pods are spun up, so that does cause downtime!
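For reference, the update strategy is set on each Deployment's spec; a sketch of the two variants discussed above (field names from the apps/v1 API, values are illustrative):

# Rolling update: new pods come up before old ones are removed, so a restart is downtime-free
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0      # keep every old pod until a replacement is Ready
      maxSurge: 1

# Recreate: all old pods are terminated first, so a restart does cause downtime
spec:
  strategy:
    type: Recreate

You can check which strategy a deployment uses (for example with kubectl get deploy <name> -o yaml) before restarting a whole namespace.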

Airflow Kubernetes Executor pods go into "NotReady" state instead of being deleted

I installed Airflow in Kubernetes using the repo https://airflow-helm.github.io/charts and the airflow-stable/airflow chart, version 8.1.3, so I have Airflow v2.0.1 installed. It is set up with an external Postgres database and the Kubernetes executor.
What I have noticed is that when Airflow-related pods are done, they go into a "NotReady" status. This happens with the update-db pod at startup and also with pods launched by the Kubernetes executor. When I go into Airflow and look at the tasks, some are successful and some are failures, but either way the related pods end up in "NotReady" status. In the values file I set the options below, thinking they would delete the pods when they are done. I've gone through the logs and made sure one of the DAGs ran as intended, its task was successful, and of course the related pod still went into "NotReady" status when it was done.
The values below are located under Values.airflow.config.
AIRFLOW__KUBERNETES__DELETE_WORKER_PODS: "true"
AIRFLOW__KUBERNETES__DELETE_WORKER_PODS_ON_FAILURE: "true"
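(In the chart's values.yaml this corresponds to roughly the following; a sketch, assuming the chart's airflow.config map is where these environment-variable-style settings go:)

airflow:
  config:
    AIRFLOW__KUBERNETES__DELETE_WORKER_PODS: "true"
    AIRFLOW__KUBERNETES__DELETE_WORKER_PODS_ON_FAILURE: "true"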
So I'm not really sure what I'm missing. Has anyone seen this behavior? It's also really strange that the upgrade-db pod is doing this too.
[Screenshot: kubectl get pods for the namespace Airflow is deployed in, showing the "NotReady" pods]
Figured it out. The k8s namespace had automatic injection of a linkerd sidecar container into each pod. I would have to either just use the Celery executor or set up some sort of k8s job to clean up the completed pods and jobs that don't get cleaned up, because the linkerd container keeps running forever in those pods.

How to roll back a Kubernetes StatefulSet application

Currently, I am migrating one of our microservices from the K8s Deployment type to a StatefulSet.
While updating the Kubernetes deployment config I noticed that StatefulSets don't support revisionHistoryLimit and minReadySeconds.
revisionHistoryLimit is used to keep the previous N ReplicaSets around for rollback.
minReadySeconds is the number of seconds a pod should be ready, without any of its containers crashing, before it is considered available.
I couldn't find any comparable settings for StatefulSets.
So my questions are:
1) How long will the master wait before considering a StatefulSet Pod ready?
2) How do I handle rollback of a StatefulSet application?
After reverting the configuration, you must also delete any Pods that the StatefulSet had already attempted to run with the bad configuration. The new pods will automatically spin up with the correct configuration.
You should define a readiness probe, and the master will wait for it to report the pod as Ready.
StatefulSets currently do not support rollbacks.
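As a sketch of the readiness-probe part (the name, image and port are placeholders), a StatefulSet with a RollingUpdate strategy only moves on to the next pod once the current one is Running and Ready:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-app                        # placeholder name
spec:
  serviceName: my-app                 # assumes a headless Service with this name exists
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  updateStrategy:
    type: RollingUpdate               # pods are replaced one at a time, in reverse ordinal order
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: myrepo/my-app:1.2.3    # placeholder image
        readinessProbe:               # the controller waits for this before updating the next pod
          tcpSocket:
            port: 8080                # placeholder port
          initialDelaySeconds: 15
          periodSeconds: 10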