Set retention policy for Pods created by Kubernetes CronJob - kubernetes

I understand that Kubernetes CronJobs create pods that run on the schedule specified inside the CronJob. However, the retention policy seems arbitrary, and I don't see a way to retain failed/successful pods for a certain period of time.

I am not sure exactly what you are asking here.
A CronJob does not create pods directly. It creates Jobs (which it also manages), and those Jobs create the pods.
As per the Kubernetes Jobs documentation, if Jobs are managed directly by a higher-level controller such as a CronJob, the Jobs can be cleaned up by the CronJob based on the specified capacity-based cleanup policy. In short, pods and jobs will not be deleted until you remove the CronJob. You will be able to check logs from the Pods/Jobs/CronJob; just use kubectl describe.
By default, a CronJob keeps the history of 3 successful Jobs and only 1 failed Job. You can change these limits in the CronJob spec with the following parameters:
spec:
  successfulJobsHistoryLimit: 10
  failedJobsHistoryLimit: 0
0 means that the CronJob will not keep any history of failed jobs.
10 means that the CronJob will keep the history of the last 10 successful jobs.
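For context, here is a minimal sketch of a complete CronJob manifest with both limits set, assuming a Kubernetes version where CronJob is available in batch/v1; the name, schedule, and image are placeholders:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: example-cron                 # placeholder name
spec:
  schedule: "*/5 * * * *"            # every 5 minutes
  successfulJobsHistoryLimit: 10     # keep the last 10 succeeded Jobs (and their pods)
  failedJobsHistoryLimit: 0          # keep no failed Jobs
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: example
            image: busybox           # placeholder image
            command: ["date"]
          restartPolicy: OnFailure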
You will not be able to retain the pod from a failed job because, when a job fails, it is restarted until it succeeds or reaches the backoffLimit given in the spec.
Another option you have is to suspend the CronJob:
kubectl patch cronjob <name_of_cronjob> -p '{"spec":{"suspend":true}}'
If the value of suspend is true, the CronJob will not create any new jobs or pods. You will still have access to the completed pods and jobs.
If none of the above helped you, could you please give more information about what exactly you expect?
CronJob spec would be helpful.

Related

Can a deployment kind with a replica count = 1 ever result in two Pods in the 'Running' phase?

From what I understand, with the above configuration, it is possible to have 2 pods that exist in the cluster associated with the deployment. However, the old Pod is guaranteed to be in the 'Terminated' state. An example scenario is updating the image version associated with the deployment.
There should not be any scenario where there are 2 Pods that are associated with the deployment and both are in the 'Running' phase. Is this correct?
In the scenarios I tried, for example Pod eviction or updating the Pod spec, the existing Pod enters the 'Terminating' state and a new Pod is deployed.
This is what I expected. Just wanted to make sure that all possible scenarios around updating Pod spec or Pod eviction cannot end up with two Pods in the 'Running' state as it would violate the replica count = 1 config.
It depends on your update strategy. Many times it's desired to have the new pod running and healthy before you shut down the old pod, otherwise you have downtime which may not be acceptable as per business requirements. By default, it's doing rolling updates.
The defaults look like the below, so if you don't specify anything, that's what will be used.
apiVersion: apps/v1
kind: Deployment
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
So usually, you would have a moment where both pods are running: with replicas = 1, a maxSurge of 25% rounds up to 1 extra Pod and a maxUnavailable of 25% rounds down to 0, so the Deployment may briefly run 2 Pods during an update. But Kubernetes will terminate the old pod as soon as the new pod becomes ready, so it will be hard, if not impossible, to literally see both in the Ready state.
You can read about it in the docs: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#updating-a-deployment
Deployment ensures that only a certain number of Pods are down while they are being updated. By default, it ensures that at least 75% of the desired number of Pods are up (25% max unavailable).
Deployment also ensures that only a certain number of Pods are created above the desired number of Pods. By default, it ensures that at most 125% of the desired number of Pods are up (25% max surge).
For example, if you look at the above Deployment closely, you will see that it first creates a new Pod, then deletes an old Pod, and creates another new one. It does not kill old Pods until a sufficient number of new Pods have come up, and does not create new Pods until a sufficient number of old Pods have been killed. It makes sure that at least 3 Pods are available and that at max 4 Pods in total are available. In case of a Deployment with 4 replicas, the number of Pods would be between 3 and 5.
This is also explained here: https://kubernetes.io/docs/tutorials/kubernetes-basics/update/update-intro/
Users expect applications to be available all the time and developers are expected to deploy new versions of them several times a day. In Kubernetes this is done with rolling updates. Rolling updates allow Deployments' update to take place with zero downtime by incrementally updating Pods instances with new ones. The new Pods will be scheduled on Nodes with available resources.
To get the behaviour you described, you would set spec.strategy.type to Recreate.
All existing Pods are killed before new ones are created when .spec.strategy.type==Recreate.
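For reference, a minimal sketch of that setting; only the strategy portion of the Deployment spec is shown, everything else stays as in your existing manifest:
apiVersion: apps/v1
kind: Deployment
spec:
  strategy:
    type: Recreate   # all old Pods are killed before any new Pod is created
With Recreate there is a short window with zero running Pods, so it trades availability during updates for the guarantee that at most one Pod of the Deployment ever runs.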

Airflow Clean Up Pods

Currently we have a CronJob to clean pods deployed by airflow.
Cleanup cron job in airflow is defined as follows
This cleans up all completed pods (successful pods and pods that are marked as Error).
I have a requirement where the CleanUp Pods CronJob shouldn't clean Pods that are marked as ERROR.
I checked the Airflow docs but couldn't find anything. Is there any other way in which I can achieve this?
There are 2 Airflow environment variables that might help.
AIRFLOW__KUBERNETES__DELETE_WORKER_PODS - If True, all worker pods will be deleted upon termination
AIRFLOW__KUBERNETES__DELETE_WORKER_PODS_ON_FAILURE - If False (and delete_worker_pods is True), failed worker pods will not be deleted so users can investigate them. This only prevents removal of worker pods where the worker itself failed, not when the task it ran failed
for more details see here
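As a sketch of how these could be set, assuming the Airflow scheduler runs as a Kubernetes Deployment whose container environment you manage directly (the surrounding manifest is omitted), the relevant env section might look like:
env:
- name: AIRFLOW__KUBERNETES__DELETE_WORKER_PODS
  value: "True"      # completed worker pods are cleaned up
- name: AIRFLOW__KUBERNETES__DELETE_WORKER_PODS_ON_FAILURE
  value: "False"     # failed worker pods are kept so they can be investigated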

Is it possible to run a Kubernetes CronJob inside a container in an existing Pod?

I need to run this Kubernetes CronJob inside the container which is already running in my existing pod.
But this CronJob always creates a Pod and terminates it based on the schedule.
Is it possible to run the Kubernetes cron inside the existing pod's container?
Or, in the existing pod, can we run this Kubernetes cron as a container?
You can trigger a Kubernetes CronJob from within a pod, but it defeats the purpose of the CronJob resource.
There are alternatives to scheduling stuff. Airflow, as m_vemuri suggests, is one option. We've got a few cases here where we have actually set up a pod that runs crond, because we have a job that runs every minute, and the time it takes to scale up the pod, pull the image, run the job and then terminate the pod is often more than the 1 minute between runs.
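A minimal sketch of that crond-in-a-pod pattern, assuming an image that ships crond (the name, image and crontab are placeholders; the actual schedule would live in the image or a mounted file):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: crond-runner             # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: crond-runner
  template:
    metadata:
      labels:
        app: crond-runner
    spec:
      containers:
      - name: crond
        image: alpine:3          # placeholder; any image with crond works
        # run crond in the foreground so the container (and pod) keeps running;
        # the jobs themselves are defined in the container's crontab
        command: ["crond", "-f", "-l", "2"]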

What is difference between Kubernetes Jobs & Deployments

I see that Kubernetes Job & Deployment provide very similar configuration. Both can deploy one or more pods with a certain configuration. So I have a few queries around these:
Is the pod specification .spec.template different in Job & Deployment?
What is difference in Job's completions & Deployment's replicas?
If a command is run in a Deployment's only container and it completes (no server or daemon process containers), the pod would terminate. The same is applicable in a Job as well. So how is the pod lifecycle different in either of the resources?
Many resources in Kubernetes use a Pod template. Both Deployments and Jobs use it, because they manage Pods.
Controllers for workload resources create Pods from a pod template and manage those Pods on your behalf.
PodTemplates are specifications for creating Pods, and are included in workload resources such as Deployments, Jobs, and DaemonSets.
The main difference between Deployments and Jobs is how they handle a Pod that is terminated. A Deployment is intended to be a "service", e.g. it should be up-and-running, so it will try to restart the Pods it manages to match the desired number of replicas, while a Job is intended to execute and successfully terminate.
Regarding spec.template: both Job and Deployment would include a similar definition. See: https://v1-21.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.21/#podtemplate-v1-core
Job completions and parallelism lets you split a task in sub-tasks. See https://kubernetes.io/docs/concepts/workloads/controllers/job/#parallel-jobs , https://kubernetes.io/docs/tasks/job/indexed-parallel-processing-static/ . Replicas in a Deployment would not offer this.
In a Deployment, the restartPolicy of your Pod is Always (the only value a Deployment allows). In a Job, it must be set to Never or OnFailure. A Job is not meant to keep your container running once it has exited successfully; a Deployment is not meant to exit.
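To make the contrast concrete, here is a minimal sketch of a Job using completions and parallelism alongside a Deployment using replicas; names and images are placeholders:
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job             # placeholder
spec:
  completions: 5                # the pod template must run to completion 5 times in total
  parallelism: 2                # at most 2 of those pods run at the same time
  template:
    spec:
      containers:
      - name: worker
        image: busybox          # placeholder
        command: ["sh", "-c", "echo processing one work item"]
      restartPolicy: Never      # a Job requires Never or OnFailure
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-deployment     # placeholder
spec:
  replicas: 5                  # keep 5 identical pods running at all times
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
      - name: server
        image: nginx           # placeholder for a long-running process
      # restartPolicy is Always for a Deployment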

Deploying container as a CronJob to (Google) Kubernetes Engine - How to stop Pod after completing task

I have a container that runs some data fetching from a MySQL database and simply displays the result in console.log(), and want to run this as a cron job in GKE. So far I have the container working on my local machine, and have successfully deployed this to GKE (in terms of there being no errors thrown so far as I can see).
However, the pods that were created were just left as Running instead of stopping after completion of the task. Are the pods supposed to stop automatically after executing all the code, or do they require explicit instruction to stop and if so what is the command to terminate a pod after creation (by the Cron Job)?
I'm reading that there is supposedly some kind of termination grace period of ~30s by default, but after running a minutely-executed cronjob for ~20 minutes, all the pods were still running. I'm not sure if there's a way to terminate the pods from inside the code; otherwise it would be a little silly to have a cronjob generating lots of pods left running idly. My cronjob.yaml is below:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: test
spec:
  schedule: "5 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: test
            image: gcr.io/project/test:v1
            # env:
            # - name: "DELAY"
            #   value: 15
          restartPolicy: OnFailure
A CronJob is essentially a cookie cutter for jobs. That is, it knows how to create jobs and execute them at a certain time. Now, that being said, when looking at garbage collection and clean up behaviour of a CronJob, we can simply look at what the Kubernetes docs have to say about this topic in the context of jobs:
When a Job completes, no more Pods are created, but the Pods are not deleted either. Keeping them around allows you to still view the logs of completed pods to check for errors, warnings, or other diagnostic output. The job object also remains after it is completed so that you can view its status. It is up to the user to delete old jobs after noting their status. Delete the job with kubectl (e.g. kubectl delete jobs/pi or kubectl delete -f ./job.yaml).
Adding a process.kill(); line in the code to explicitly end the process after it has finished executing allowed the pod to stop automatically after execution.
A job in Kubernetes is intended to run a single instance of a pod and ensure it runs to completion. As another answer specifies, a CronJob is a factory for Jobs which knows how and when to spawn a job according to the specified schedule.
Accordingly, and unlike a service which is intended to run forever, the container(s) in the pod created by the job must exit upon completion of the job. There is a notable problem with the sidecar pattern, which often requires manual lifecycle handling: if your main container requires additional containers to provide logging or database access, you must arrange for these to exit upon completion of the main container, otherwise they will remain running and k8s will not consider the job complete. In such circumstances, the pod associated with the job will never terminate.
The termination grace period is not applicable here: this timer applies after Kubernetes has requested that your pod terminate (e.g. if you delete it). It specifies the maximum time the pod is afforded to shutdown gracefully before the kubelet will summarily terminate it. If Kubernetes never considers your job to be complete, this phase of the pod lifecycle will not be entered.
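For reference, a minimal sketch of where that setting lives in the pod spec; 30 seconds is the default, so you would only set it to change the allowance:
spec:
  terminationGracePeriodSeconds: 30   # max time the pod gets to shut down gracefully after a delete/evict
  containers:
  - name: test
    image: gcr.io/project/test:v1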
Furthermore, old pods are kept around after completion for some time to allow perusal of logs and such. You may see pods listed which are not actively running and so not consuming compute resources on your worker nodes.
If your pods are not completing, please provide more information regarding the code they are running so we can assist in determining why the process never exits.