Currently we have a CronJob to clean up pods deployed by Airflow. The cleanup CronJob in Airflow is defined as follows.
This cleans all completed pods (both successful pods and pods that are marked as Error).
I have a requirement where the cleanup CronJob shouldn't clean pods that are marked as Error.
I checked the Airflow docs but couldn't find anything. Is there any other way I can achieve this?
There are two Airflow environment variables that might help.
AIRFLOW__KUBERNETES__DELETE_WORKER_PODS - If True, all worker pods will be deleted upon termination
AIRFLOW__KUBERNETES__DELETE_WORKER_PODS_ON_FAILURE - If False (and delete_worker_pods is True), failed worker pods will not be deleted so users can investigate them. This only prevents removal of worker pods where the worker itself failed, not when the task it ran failed
For more details, see here.
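For example, a minimal sketch of the combination that matches the requirement above, assuming you set Airflow configuration through environment variables (the values are assumptions based on the descriptions of the two settings; successful worker pods get cleaned up while failed ones are kept for investigation):

AIRFLOW__KUBERNETES__DELETE_WORKER_PODS: "True"
AIRFLOW__KUBERNETES__DELETE_WORKER_PODS_ON_FAILURE: "False"

Note the caveat above: this only preserves pods where the worker itself failed, not pods whose task failed.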
I am trying to use Helm charts to deploy Kafka and ZooKeeper in a local k8s cluster, but when checking the status of the respective pods they show Pending for a long time and are never assigned to a node, even though I have 2 healthy worker nodes running.
I tried deleting the pods and redeploying, but I landed in the same situation and still can't get the pods to run. I need help getting these pods running.
I see that Kubernetes Jobs & Deployments provide very similar configuration: both can deploy one or more pods with a certain configuration. So I have a few queries around these:
Is the pod specification .spec.template different in Job & Deployment?
What is the difference between a Job's completions & a Deployment's replicas?
If a command is run in a Deployment's only container and it completes (no server or daemon process containers), the pod will terminate. The same applies to a Job. So how does the pod lifecycle differ between the two resources?
Many resources in Kubernetes use a Pod template. Both Deployments and Jobs use it, because they manage Pods.
Controllers for workload resources create Pods from a pod template and manage those Pods on your behalf.
PodTemplates are specifications for creating Pods, and are included in workload resources such as Deployments, Jobs, and DaemonSets.
The main difference between Deployments and Jobs is how they handle a Pod that is terminated. A Deployment is intended to be a "service", i.e. it should be up and running, so it will try to restart the Pods it manages to match the desired number of replicas, while a Job is intended to execute and successfully terminate.
Regarding spec.template: both Job and Deployment would include a similar definition. See: https://v1-21.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.21/#podtemplate-v1-core
Job completions and parallelism let you split a task into sub-tasks. See https://kubernetes.io/docs/concepts/workloads/controllers/job/#parallel-jobs and https://kubernetes.io/docs/tasks/job/indexed-parallel-processing-static/ . Replicas in a Deployment do not offer this.
In a Deployment, the restartPolicy of your Pod defaults to Always (the only value a Deployment accepts). In a Job it must be Never or OnFailure. A Job is not meant to restart your container once it has exited; a Deployment is not meant to exit.
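To make the contrast concrete, here is a minimal sketch of both resources (names, images and commands are placeholders, not anything from the question):

apiVersion: batch/v1
kind: Job
metadata:
  name: example-job            # hypothetical name
spec:
  completions: 5               # the task must finish successfully 5 times
  parallelism: 2               # run at most 2 pods at once
  template:
    spec:
      containers:
      - name: worker
        image: busybox
        command: ["sh", "-c", "echo processing && sleep 5"]
      restartPolicy: Never     # a Job only accepts Never or OnFailure
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-deployment    # hypothetical name
spec:
  replicas: 3                 # keep 3 identical pods running at all times
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
      - name: server
        image: nginx          # a long-running server process
      # restartPolicy defaults to Always, the only value a Deployment accepts

Both use the same .spec.template shape; it is the controllers around the template that behave differently.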
I'm trying to get all the deployments of a namespace to be restarted for implementation reasons.
I'm using "kubectl rollout -n restart deploy" and it works perfectly, but I'm not sure it that command causes downtime or if it works as the "rollout update", applying the restart one by one, keeping my services up.
Does anyone know?
In the documentation I can only find this:
Operation: rollout
Syntax: kubectl rollout SUBCOMMAND [options]
Description: Manage the rollout of a resource. Valid resource types include: deployments, daemonsets and statefulsets.
But I can't find details about the specific "rollout restart deploy".
I need to make sure it doesn't cause downtime. Right now it is very hard to tell, because the restart process is very quick.
Update: I know that for one specific deployment (kubectl rollout restart deployment/name) it works as expected and doesn't cause downtime, but I need to apply it to the whole namespace (without specifying the deployments), and that's the case I'm not sure about.
kubectl rollout restart deploy -n namespace1 will restart all deployments in the specified namespace with zero downtime.
The restart command works as follows:
After the restart it will create new pods for each deployment
Once the new pods are up (running and ready) it will terminate the old pods
Add readiness probes to your deployments to configure initial delays (a sketch follows below).
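A minimal sketch of such a probe, assuming an HTTP service with a /healthz endpoint on port 8080 (both are placeholders for whatever your containers actually expose):

readinessProbe:
  httpGet:
    path: /healthz           # hypothetical health endpoint
    port: 8080               # hypothetical container port
  initialDelaySeconds: 10    # wait before the first check
  periodSeconds: 5           # re-check every 5 seconds

Until this probe passes, the new pod does not count as ready, so the rollout will not terminate the old pod.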
#pcsutar's answer is almost correct. kubectl rollout restart $resourcetype $resourcename restarts your deployment, daemonset or stateful set according to its update strategy. So if it is set to RollingUpdate it will behave exactly as the above answer describes:
After the restart it will create new pods for each deployment
Once the new pods are up (running and ready) it will terminate the old pods
Add readiness probes to your deployments to configure initial delays.
However, if the strategy is, for example, type: Recreate, all the currently running pods belonging to the deployment will be terminated before new pods are spun up!
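For reference, a minimal sketch of where this is configured in a Deployment (field names are from the Kubernetes API; the numbers are assumptions you would tune):

spec:
  strategy:
    type: RollingUpdate      # replace pods gradually (the default)
    rollingUpdate:
      maxUnavailable: 0      # never drop below the desired replica count
      maxSurge: 1            # allow one extra pod during the rollout
  # type: Recreate would instead terminate all old pods before starting new ones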
Installed Airflow in Kubernetes using the repo https://airflow-helm.github.io/charts and the airflow-stable/airflow chart, version 8.1.3, so I have Airflow v2.0.1 installed. I have it set up using an external Postgres database and the Kubernetes executor.
What I have noticed is that when Airflow-related pods are done they go into a "NotReady" status. This happens with the update-db pod at startup and also with pods launched by the Kubernetes executor. When I go into Airflow and look at the tasks, some are successful and some are failures, but either way the related pods stay in "NotReady" status. In the values file I set the variables below, thinking it would delete the pods when they are done. I've gone through the logs and made sure one of the DAGs ran as intended, the related task succeeded, and of course the related pod still went into "NotReady" status when it was done.
The values below are located in Values.airflow.config.
AIRFLOW__KUBERNETES__DELETE_WORKER_PODS: "true"
AIRFLOW__KUBERNETES__DELETE_WORKER_PODS_ON_FAILURE: "true"
So I'm not really sure what I'm missing. Has anyone seen this behavior? It's also really strange that the upgrade-db pod is doing this too.
[Screenshot of kubectl get pods for the namespace Airflow is deployed in, showing the "NotReady" pods]
Figured it out. The K8s namespace had automatic injection of a Linkerd sidecar container into each pod. I would have to either use the Celery executor or set up some sort of K8s job to clean up completed pods and jobs that don't get cleaned up, because the Linkerd container keeps running forever in those pods.
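For anyone hitting the same thing: instead of switching executors, one possible way around it is Linkerd's opt-out annotation, which skips sidecar injection for a given pod or namespace. A minimal sketch (the annotation name is from the Linkerd docs; where you attach it depends on how your chart exposes pod annotations):

metadata:
  annotations:
    linkerd.io/inject: disabled   # tell Linkerd not to inject the sidecar here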
I can delete a deployment with the kubectl CLI, but is there a way to make my deployment auto-destroy itself once it has finished? For my situation, we are kicking off a long-running process in a Docker container on AWS EKS. When I check the status, it is 'Running', and then some time later the status is 'Completed'. So is there any way to get the Kubernetes pod to auto-destroy once it has finished?
kubectl run some_deployment_name --image=path_to_image
kubectl get pods
//the above command returns...
some_deployment_name1212-75bfdbb99b-vt622 0/1 Running 2 23s
//and then some time later...
some_deployment_name1212-75bfdbb99b-vt622 0/1 Completed 2 15m
Once it is complete, I would like for it to be destroyed, without me having to call another command.
So the question is really about running Jobs rather than Deployments: not the Kubernetes Deployment abstraction that creates a ReplicaSet, but Kubernetes Jobs.
A Job is created with kubectl run when you specify the --restart=OnFailure option. These Jobs are not cleaned up by the cluster unless you delete them manually with kubectl delete job <job-name>. More info here.
If you are using Kubernetes 1.12 or later, a new field was introduced in the Job spec: ttlSecondsAfterFinished. You can use that to clean up your Jobs automatically; a sketch follows below. Another, more time-consuming option would be to write your own Kubernetes controller that cleans up regular Jobs.
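A minimal sketch of that field (the name and container are placeholders):

apiVersion: batch/v1
kind: Job
metadata:
  name: self-cleaning-job        # hypothetical name
spec:
  ttlSecondsAfterFinished: 100   # delete the Job and its pods 100s after it finishes
  template:
    spec:
      containers:
      - name: task
        image: busybox
        command: ["sh", "-c", "echo done"]
      restartPolicy: Never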
A CronJob is created if you specify both the --restart=OnFailure and --schedule="<cron schedule>" options. These pods get deleted automatically because they run on a regular schedule.
More info on kubectl run here.