Jobs not getting cleaned up until 30+ minutes after TTL has passed - kubernetes

We are using the Job ttlSecondsAfterFinished attribute to automatically clean up finished jobs. When we had a very small number of jobs (10-50), the jobs (and their pods) would get cleaned up approximately 60 seconds after completion. However, now that we have ~5000 jobs running on our cluster, it takes 30+ minutes for a Job object to get cleaned up after completion.
This is a problem because although the Jobs are just sitting there, not consuming resources, we do use a ResourceQuota (an object-count quota on count/jobs.batch) to control our workload, and those completed jobs are taking up space in the ResourceQuota.
I know that jobs only get marked for deletion once the TTL has passed, and are not guaranteed to be deleted immediately then, but 30 minutes is a very long time. What could be causing this long delay?
Our logs indicate that our k8s API servers are not under heavy load, and that API response times are reasonable.
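For reference, the quota is an object-count quota along these lines (the name and limit here are illustrative):
apiVersion: v1
kind: ResourceQuota
metadata:
  name: job-count-quota          # illustrative name
spec:
  hard:
    count/jobs.batch: "5000"     # caps the number of Job objects in the namespace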

Solution 1
How do you use the Job ttlSecondsAfterFinished? You can set .spec.ttlSecondsAfterFinished to the value you need. Below is the example from the official documentation:
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-with-ttl
spec:
  ttlSecondsAfterFinished: 100
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
And please note this:
Note that the TTL period, e.g. .spec.ttlSecondsAfterFinished field of Jobs, can be modified after the job is created or has finished. However, once the Job becomes eligible to be deleted (when the TTL has expired), the system won't guarantee that the Jobs will be kept, even if an update to extend the TTL returns a successful API response.
For more information: https://kubernetes.io/docs/concepts/workloads/controllers/ttlafterfinished/#updating-ttl-seconds
Solution 2
As mentioned in the comments above, you can try tuning kube-controller-manager and increasing the number of TTL-after-finished controller workers that are allowed to sync concurrently by using the following flag:
kube-controller-manager --concurrent-ttl-after-finished-syncs int32 Default: 5
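For example, on a kubeadm-style control plane the flag would go into the kube-controller-manager static pod manifest. The path, worker count, and surrounding fields below are assumptions for illustration, not a verified config:
# Excerpt from /etc/kubernetes/manifests/kube-controller-manager.yaml (kubeadm layout)
spec:
  containers:
  - name: kube-controller-manager
    command:
    - kube-controller-manager
    - --concurrent-ttl-after-finished-syncs=20   # raised from the default of 5; pick a value suited to your cluster
    # ...keep the rest of the existing flags unchanged
On kubeadm clusters the kubelet restarts the static pod automatically after the manifest changes.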

Related

What is the default .spec.activeDeadlineSeconds in a Kubernetes job if you don't explicitly set it

In Kubernetes job, there is a spec for .spec.activeDeadlineSeconds. If you don't explicitly set it, what will be the default value? 600 secs?
Here is the example from the k8s docs:
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-with-timeout
spec:
  backoffLimit: 5
  activeDeadlineSeconds: 100
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
Assume I remove the line
activeDeadlineSeconds: 100
By default, a Job will run uninterrupted.
If you don't set activeDeadlineSeconds, the job will not have an active deadline limit, which means activeDeadlineSeconds has no default value.
By the way, there are several ways to terminate a job. (Of course, when a Job completes, no more Pods are created.)
Pod backoff failure policy (.spec.backoffLimit)
You can set .spec.backoffLimit to specify the number of retries before considering a Job as failed. The back-off limit is set by default to 6. Failed Pods associated with the Job are recreated by the Job controller with an exponential back-off delay (10s, 20s, 40s ...) capped at six minutes. The back-off count is reset when a Job's Pod is deleted or successful without any other Pods for the Job failing around that time.
Setting an active deadline (.spec.activeDeadlineSeconds)
The activeDeadlineSeconds applies to the duration of the job, no matter how many Pods are created. Once a Job reaches activeDeadlineSeconds, all of its running Pods are terminated and the Job status will become type: Failed with reason: DeadlineExceeded.
Note that a Job's .spec.activeDeadlineSeconds takes precedence over its .spec.backoffLimit. Therefore, a Job that is retrying one or more failed Pods will not deploy additional Pods once it reaches the time limit specified by activeDeadlineSeconds, even if the backoffLimit is not yet reached.
It is not set by default. Here is a note from the changelog:
ActiveDeadlineSeconds is validated in workload controllers now, make sure it's not set anywhere (it shouldn't be set by default and having it set means your controller will restart the Pods at some point) (#38741)

Deploying container as a Job to (Google) Kubernetes Engine - How to terminate Pod after completing task

Goal is to terminate the pod after completion of Job.
This is my YAML file. Currently, my pod status is Completed after running the job.
apiVersion: batch/v1
kind: Job
metadata:
  # Unique key of the Job instance
  name: example-job
spec:
  template:
    metadata:
      name: example-job
    spec:
      containers:
      - name: container-name
        image: my-img
        command: ["python", "main.py"]
      # Do not restart containers after they exit
      restartPolicy: Never
  # of retries before marking as failed.
  backoffLimit: 4
You can configure the jobs to be removed once complete.
Inside the YAML you can configure a limit on how many finished jobs (and their pods) are kept:
successfulJobsHistoryLimit: 0
failedJobsHistoryLimit: 0
You can set the history limits using the above config in the YAML.
The .spec.successfulJobsHistoryLimit and .spec.failedJobsHistoryLimit
fields are optional. These fields specify how many completed and
failed jobs should be kept. By default, they are set to 3 and 1
respectively. Setting a limit to 0 corresponds to keeping none of the
corresponding kind of jobs after they finish.
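As a sketch, assuming the workload is run via a CronJob (these history-limit fields live in the CronJob spec rather than in a plain Job), it would look roughly like this (use apiVersion batch/v1beta1 on older clusters):
apiVersion: batch/v1
kind: CronJob
metadata:
  name: example-cronjob            # illustrative name
spec:
  schedule: "0 * * * *"            # illustrative schedule
  successfulJobsHistoryLimit: 0    # keep no completed Jobs (and their pods)
  failedJobsHistoryLimit: 0        # keep no failed Jobs
  jobTemplate:
    spec:
      backoffLimit: 4
      template:
        spec:
          containers:
          - name: container-name
            image: my-img
            command: ["python", "main.py"]
          restartPolicy: Never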
backoffLimit: 4 will retry the job up to 4 times before marking it as failed.
Read more at : https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#jobs-history-limits
A Job basically terminates itself after the main container of its pod finishes successfully. If it returns a failure exit code, it will retry as many times as you specified in your backoffLimit.
So it seems as if your container does not terminate after it finishes whatever job it is supposed to do. Without knowing anything about your job image I cannot tell you what you need to do exactly.
However, it seems as if you need to adapt your main.py to properly exit after it has done what it is supposed to do.
If you want to delete the pod after completing the task, then just delete the job with kubectl:
$ kubectl delete job example-job
You can also delete the jobs defined in a YAML file by using the following command:
$ kubectl delete -f ./job.yaml
When you delete the job using kubectl, all the created pods get deleted too.
You can check whether these jobs and pods were deleted or not with the following commands:
$ kubectl get jobs
$ kubectl get pods
For more details, refer to the Jobs documentation.
I have tried the above steps in my own environment and it worked for me.

How to run kubernetes cronjob immediately

I'm very new to Kubernetes. Here I tried a CronJob YAML in which the pods are created every 1 minute.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure
But the pods are created only after 1 minute. Is it possible to run the job immediately and after that every 1 minute?
As already stated in the comments, a CronJob is backed by a Job. What you can do is literally launch CronJob and Job resources using the same spec at the same time. You can do that conveniently using a Helm chart or Kustomize.
Alternatively you can place both manifests in the same file or two files in the same directory and then use:
kubectl apply -f <file/dir>
With this workaround the initial Job is started and then, after some time, the CronJob.
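For illustration, the combined file could look roughly like this (the hello-initial name is just an example):
# One-off Job that reuses the CronJob's pod spec, applied together with the CronJob
apiVersion: batch/v1
kind: Job
metadata:
  name: hello-initial              # illustrative name for the immediate run
spec:
  template:
    spec:
      containers:
      - name: hello
        image: busybox
        args:
        - /bin/sh
        - -c
        - date; echo Hello from the Kubernetes cluster
      restartPolicy: OnFailure
---
# ...followed by the CronJob manifest from the question, unchanged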
The downside of this solution is that the first Job is standalone and is not included in the CronJob's history. Another possible side effect is that the standalone Job and the first CronJob-created Job can run in parallel if the Job cannot finish its tasks fast enough; concurrencyPolicy does not take that Job into consideration.
From the documentation:
A cron job creates a job object about once per execution time of its
schedule. We say "about" because there are certain circumstances where
two jobs might be created, or no job might be created. We attempt to
make these rare, but do not completely prevent them.
So if you want to keep the task execution more strict, it may be better to use a Bash wrapper script with a sleep 1 between task executions, or design an app that forks subprocesses after a specified interval, build a container image, and run it as a Deployment.

Is there a way to delete pods automatically through YAML after they have status 'Completed'?

I have a YAML file which creates a pod on execution. This pod extracts data from one of our internal systems and uploads it to GCP. It takes around 12 minutes to do so, after which the status of the pod changes to 'Completed'; however, I would like to delete this pod once it has completed.
apiVersion: v1
kind: Pod
metadata:
  name: xyz
spec:
  restartPolicy: Never
  volumes:
  - name: mount-dir
    hostPath:
      path: /data_in/datos/abc/
  initContainers:
  - name: abc-ext2k8s
    image: registrysecaas.azurecr.io/secaas/oracle-client11c:11.2.0.4-latest
    volumeMounts:
    - mountPath: /media
      name: mount-dir
    command: ["/bin/sh","-c"]
    args: ["sqlplus -s CLOUDERA/MYY4nGJKsf#hal5:1531/dbmk #/media/ext_hal5_lk_org_localfisico.sql"]
  imagePullSecrets:
  - name: regcred
Is there a way to achieve this?
Typically you don't want to create bare Kubernetes pods. The pattern you're describing of running some moderate-length task in a pod, and then having it exit, matches a Job. (Among other properties, a job will reschedule a pod if the node it's on fails.)
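For illustration, a sketch of the same workload wrapped in a Job (moving the sqlplus step into a regular container; the backoffLimit value is an assumption) might look like:
apiVersion: batch/v1
kind: Job
metadata:
  name: xyz
spec:
  backoffLimit: 0                  # illustrative: do not retry a failed run
  template:
    spec:
      restartPolicy: Never
      volumes:
      - name: mount-dir
        hostPath:
          path: /data_in/datos/abc/
      containers:
      - name: abc-ext2k8s
        image: registrysecaas.azurecr.io/secaas/oracle-client11c:11.2.0.4-latest
        volumeMounts:
        - mountPath: /media
          name: mount-dir
        command: ["/bin/sh","-c"]
        args: ["sqlplus -s CLOUDERA/MYY4nGJKsf#hal5:1531/dbmk #/media/ext_hal5_lk_org_localfisico.sql"]
      imagePullSecrets:
      - name: regcred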
Just switching this to a Job doesn't directly address your question, though. The documentation notes:
When a Job completes, no more Pods are created, but the Pods are not deleted either. Keeping them around allows you to still view the logs of completed pods to check for errors, warnings, or other diagnostic output. The job object also remains after it is completed so that you can view its status. It is up to the user to delete old jobs after noting their status.
So whatever task creates the pod (or job) needs to monitor it for completion, and then delete the pod (or job). (Consider using the watch API or equivalently the kubectl get -w option to see when the created objects change state.) There's no way to directly specify this in the YAML file since there is a specific intent that you can get useful information from a completed pod.
If this is actually a nightly task that you want to run at midnight or some such, you do have one more option. A CronJob will run a job on some schedule, which in turn runs a single pod. The important relevant detail here is that CronJobs have an explicit control for how many completed Jobs they keep. So if a CronJob matches your pattern, you can set successfulJobsHistoryLimit: 0 in the CronJob spec, and created jobs and their matching pods will be deleted immediately.

How to set a time limit for a Kubernetes job?

I'd like to launch a Kubernetes job and give it a fixed deadline to finish. If the pod is still running when the deadline comes, I'd like the job to automatically be killed.
Does something like this exist? (At first I thought that the Job spec's activeDeadlineSeconds covered this use case, but now I see that activeDeadlineSeconds only places a limit on when a job is re-tried; it doesn't actively kill a slow/runaway job.)
You can self-impose timeouts on the container's entrypoint command by using the GNU timeout utility.
For example, the following Job, which computes the first 4000 digits of pi, will time out after 10 seconds:
apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    metadata:
      name: pi
    spec:
      containers:
      - name: pi
        image: perl
        command: ["/usr/bin/timeout", "10", "perl", "-Mbignum=bpi", "-wle", "print bpi(4000)"]
      restartPolicy: Never
(Manifest adapted from https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#running-an-example-job)
You can play with the numbers and see it timeout or not. Typically computing 4000 digits of pi takes ~23 seconds on my workstation, so if you set it to 5 seconds it'll probably always fail and if you set it to 120 seconds it will always work.
The way I understand the documentation, activeDeadlineSeconds refers to the active time of a Job, and after this time the Job is considered Failed.
Official doc statement:
The activeDeadlineSeconds applies to the duration of the job, no matter how many Pods are created. Once a Job reaches activeDeadlineSeconds, all of its running Pods are terminated and the Job status will become type: Failed with reason: DeadlineExceeded
https://kubernetes.io/docs/concepts/workloads/controllers/job/#job-termination-and-cleanup
You could instead add activeDeadlineSeconds to the pod spec in the pod template defined as part of the Job. This way the pods that are spawned by the Job are limited by the timeout.
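As a sketch of that suggestion, with activeDeadlineSeconds set on the pod template's spec rather than on the Job itself (the 10-second value is just for illustration):
apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    metadata:
      name: pi
    spec:
      activeDeadlineSeconds: 10    # deadline applied to the pod itself
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(4000)"]
      restartPolicy: Never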