How to set a time limit for a Kubernetes job?

I'd like to launch a Kubernetes job and give it a fixed deadline to finish. If the pod is still running when the deadline comes, I'd like the job to automatically be killed.
Does something like this exist? (At first I thought that the Job spec's activeDeadlineSeconds covered this use case, but now I see that activeDeadlineSeconds only places a limit on when a job is re-tried; it doesn't actively kill a slow/runaway job.)

You can self-impose a timeout on the container's entrypoint command by using the GNU timeout utility.
For example, the following Job, which computes the first 4000 digits of pi, will time out after 10 seconds:
apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    metadata:
      name: pi
    spec:
      containers:
      - name: pi
        image: perl
        command: ["/usr/bin/timeout", "10", "perl", "-Mbignum=bpi", "-wle", "print bpi(4000)"]
      restartPolicy: Never
(Manifest adapted from https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#running-an-example-job)
You can play with the numbers and see it time out or not. Computing 4000 digits of pi typically takes ~23 seconds on my workstation, so if you set the limit to 5 seconds it will probably always fail, and if you set it to 120 seconds it will always work.

The way I understand the documentation, activeDeadlineSeconds refers to the active time of a Job; once that time has passed, the Job is considered Failed.
Official doc statement:
The activeDeadlineSeconds applies to the duration of the job, no matter how many Pods are created. Once a Job reaches activeDeadlineSeconds, all of its running Pods are terminated and the Job status will become type: Failed with reason: DeadlineExceeded
https://kubernetes.io/docs/concepts/workloads/controllers/job/#job-termination-and-cleanup
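In other words, setting activeDeadlineSeconds on the Job spec already gives you the hard time limit asked about here. A minimal sketch, reusing the pi example (the Job name and the 10-second limit are just illustrative values):

apiVersion: batch/v1
kind: Job
metadata:
  name: pi-with-deadline
spec:
  activeDeadlineSeconds: 10    # the Job and all of its running Pods are terminated after 10 seconds
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(4000)"]
      restartPolicy: Never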

You could instead add activeDeadlineSeconds to the pod spec in the pod template defined as part of the Job. This way the Pods spawned by the Job are limited by the timeout.
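For illustration, a sketch of what that looks like; note that activeDeadlineSeconds here sits on the pod spec inside the template, not on the Job spec (the 10-second value is again just an example):

apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    spec:
      activeDeadlineSeconds: 10    # limits each Pod spawned by the Job, not the Job as a whole
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(4000)"]
      restartPolicy: Never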

Related

Jobs not getting cleaned up until 30+ minutes after TTL has passed

We are using the Job ttlSecondsAfterFinished attribute to automatically clean up finished jobs. When we had a very small number of jobs (10-50), the jobs (and their pods) would get cleaned up approximately 60 seconds after completion. However, now that we have ~5000 jobs running on our cluster, it takes 30+ minutes for a Job object to get cleaned up after completion.
This is a problem because although the Jobs are just sitting there, not consuming resources, we do use a ResourceQuota (selector count/jobs.batch) to control our workload, and those completed jobs are taking up space in the ResourceQuota.
I know that jobs only get marked for deletion once the TTL has passed, and are not guaranteed to be deleted immediately then, but 30 minutes is a very long time. What could be causing this long delay?
Our logs indicate that our k8s API servers are not under heavy load, and that API response times are reasonable.
Solution 1
How do you use the Job's ttlSecondsAfterFinished? You can set .spec.ttlSecondsAfterFinished to the value you need. Below is the example from the official documentation:
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-with-ttl
spec:
  ttlSecondsAfterFinished: 100
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
And please note this:
Note that the TTL period, e.g. .spec.ttlSecondsAfterFinished field of Jobs, can be modified after the job is created or has finished. However, once the Job becomes eligible to be deleted (when the TTL has expired), the system won't guarantee that the Jobs will be kept, even if an update to extend the TTL returns a successful API response.
For more information: https://kubernetes.io/docs/concepts/workloads/controllers/ttlafterfinished/#updating-ttl-seconds
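If you want to change the TTL on a Job that already exists, a merge patch should do it; this example reuses the pi-with-ttl Job from above and an arbitrary new value of 300 seconds:

kubectl patch job pi-with-ttl --type=merge -p '{"spec":{"ttlSecondsAfterFinished":300}}'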
Solution 2
As mentioned above in the comments, you can try tuning kube-controller-manager and increasing the number of TTL-after-finished controller workers that are allowed to sync concurrently, using the following flag:
kube-controller-manager --concurrent-ttl-after-finished-syncs int32 Default: 5
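On a kubeadm-based cluster the controller manager usually runs as a static pod, so the flag can be added to its manifest; the path below is the common kubeadm default and the value of 20 is just an example:

# /etc/kubernetes/manifests/kube-controller-manager.yaml (excerpt)
spec:
  containers:
  - command:
    - kube-controller-manager
    - --concurrent-ttl-after-finished-syncs=20   # raised from the default of 5
    # ...other flags unchanged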

What is the default .spec.activeDeadlineSeconds in a Kubernetes Job if you don't explicitly set it?

In a Kubernetes Job there is a spec field, .spec.activeDeadlineSeconds. If you don't explicitly set it, what will the default value be? 600 seconds?
Here is the example from the k8s docs:
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-with-timeout
spec:
  backoffLimit: 5
  activeDeadlineSeconds: 100
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
Assume I remove the line
activeDeadlineSeconds: 100
By default, a Job will run uninterrupted.
If you don't set activeDeadlineSeconds, the Job will not have an active deadline limit; in other words, activeDeadlineSeconds has no default value.
By the way, there are several ways to terminate a Job. (Of course, when a Job completes, no more Pods are created.)
Pod backoff failure policy (.spec.backoffLimit)
You can set .spec.backoffLimit to specify the number of retries before considering a Job as failed. The back-off limit is set by default to 6. Failed Pods associated with the Job are recreated by the Job controller with an exponential back-off delay (10s, 20s, 40s ...) capped at six minutes. The back-off count is reset when a Job's Pod is deleted or successful without any other Pods for the Job failing around that time.
Setting an active deadline (.spec.activeDeadlineSeconds)
The activeDeadlineSeconds applies to the duration of the job, no matter how many Pods are created. Once a Job reaches activeDeadlineSeconds, all of its running Pods are terminated and the Job status will become type: Failed with reason: DeadlineExceeded.
Note that a Job's .spec.activeDeadlineSeconds takes precedence over its .spec.backoffLimit. Therefore, a Job that is retrying one or more failed Pods will not deploy additional Pods once it reaches the time limit specified by activeDeadlineSeconds, even if the backoffLimit is not yet reached.
It is not set by default. Here is a note from the changelog:
ActiveDeadlineSeconds is validated in workload controllers now, make sure it's not set anywhere (it shouldn't be set by default and having it set means your controller will restart the Pods at some point) (#38741)
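If you want to check which limit actually terminated a Job, you can look at the reason on its conditions, for example (using the pi-with-timeout Job from above):

kubectl get job pi-with-timeout -o jsonpath='{.status.conditions[*].reason}'

This should print DeadlineExceeded when activeDeadlineSeconds was hit, and BackoffLimitExceeded when the backoff limit was exhausted.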

How to run a Kubernetes CronJob immediately

I'm very new to Kubernetes. Here I tried a CronJob YAML in which the pods are created every 1 minute.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure
But the pods are created only after 1 minute. Is it possible to run the job immediately and then every 1 minute after that?
As already stated in the comments, a CronJob is backed by a Job. What you can do is literally launch CronJob and Job resources using the same spec at the same time. You can do that conveniently using a Helm chart or Kustomize.
Alternatively you can place both manifests in the same file, or in two files in the same directory, and then use:
kubectl apply -f <file/dir>
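A minimal sketch of such a combined file, reusing the busybox example from the question (the file and resource names are placeholders):

# run-now-and-every-minute.yaml: a plain Job plus a CronJob with the same pod template
apiVersion: batch/v1
kind: Job
metadata:
  name: hello-initial
spec:
  template:
    spec:
      containers:
      - name: hello
        image: busybox
        args: ["/bin/sh", "-c", "date; echo Hello from the Kubernetes cluster"]
      restartPolicy: OnFailure
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args: ["/bin/sh", "-c", "date; echo Hello from the Kubernetes cluster"]
          restartPolicy: OnFailure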
With this workaround the initial Job is started right away, and after that the CronJob takes over on its schedule.
The downside of this solution is that the first Job is standalone and is not included in the CronJob's history. Another possible side effect is that the first Job and the first CronJob-created Job can run in parallel if the first Job cannot finish its tasks fast enough; concurrencyPolicy does not take that standalone Job into consideration.
From the documentation:
A cron job creates a job object about once per execution time of its
schedule. We say "about" because there are certain circumstances where
two jobs might be created, or no job might be created. We attempt to
make these rare, but do not completely prevent them.
So if you want to keep the task execution more strict, it may be better to use a Bash wrapper script with a sleep between task executions, or design an app that forks subprocesses at the specified interval, build a container image, and run it as a Deployment.
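As a rough sketch of that Deployment approach, reusing the busybox container and a fixed 60-second interval (both are illustrative choices):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-loop
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hello-loop
  template:
    metadata:
      labels:
        app: hello-loop
    spec:
      containers:
      - name: hello
        image: busybox
        # run the task immediately, then repeat every 60 seconds
        command: ["/bin/sh", "-c", "while true; do date; echo Hello from the Kubernetes cluster; sleep 60; done"]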

Is it possible to specify a delay for pod restart when Kubernetes liveness probe fails?

I've got a simple REST API server built with Python gunicorn, which runs multiple threads to accept requests. After running for some time, some of these threads crash. I've got a script to detect the number of dead threads (using log files). Once this number crosses some threshold, we want to restart gunicorn. This script is configured to be used as a liveness probe.
The script works fine and restarts the pod as expected. But there are a few live threads that are still processing requests. Also, gunicorn keeps a backlog queue of accepted requests that it cannot process yet, since other requests are processing. Is there a way to specify a delay for the pod restart so the other running threads and the backlog requests have some time to finish processing?
You can use a preStop hook. Official docs here.
How to use it is documented here.
You can also use terminationGracePeriodSeconds to allow graceful termination of the pod.
Best Practices here
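A sketch of how the two fit together on the gunicorn container; the image name, the sleep duration and the grace period are placeholders chosen to give in-flight and backlogged requests time to drain:

spec:
  terminationGracePeriodSeconds: 120        # must be longer than the preStop delay
  containers:
  - name: api
    image: my-gunicorn-image                # placeholder image name
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh", "-c", "sleep 60"]   # delays the TERM signal so running requests can finish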
You can configure graceful pod termination with terminationGracePeriodSeconds
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: test
spec:
  replicas: 1
  template:
    spec:
      containers:
      - name: test
        image: ...
      terminationGracePeriodSeconds: 60

How to fail a (cron) job after a certain number of retries?

We have a Kubernetes cluster of web scraping cron jobs set up. All seems to go well until a cron job starts to fail (e.g., when a site structure changes and our scraper no longer works). It looks like every now and then a few failing cron jobs will continue to retry to the point that they bring down our cluster. Running kubectl get cronjobs (prior to a cluster failure) will show too many jobs running for a failing job.
I've attempted following the note described here regarding a known issue with the pod backoff failure policy; however, that does not seem to work.
Here is our config for reference:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: scrape-al
spec:
  schedule: '*/15 * * * *'
  concurrencyPolicy: Allow
  failedJobsHistoryLimit: 0
  successfulJobsHistoryLimit: 0
  jobTemplate:
    metadata:
      labels:
        app: scrape
        scrape: al
    spec:
      template:
        spec:
          containers:
          - name: scrape-al
            image: 'govhawk/openstates:1.3.1-beta'
            command:
            - /opt/openstates/openstates/pupa-scrape.sh
            args:
            - al bills --scrape
          restartPolicy: Never
      backoffLimit: 3
Ideally we would prefer that a cron job would be terminated after N retries (e.g., something like kubectl delete cronjob my-cron-job after my-cron-job has failed 5 times). Any ideas or suggestions would be much appreciated. Thanks!
You can tell your Job to stop retrying using backoffLimit.
Specifies the number of retries before marking this job failed.
In your case
spec:
  template:
    spec:
      containers:
      - name: scrape-al
        image: 'govhawk/openstates:1.3.1-beta'
        command:
        - /opt/openstates/openstates/pupa-scrape.sh
        args:
        - al bills --scrape
      restartPolicy: Never
  backoffLimit: 3
You set 3 as the backoffLimit of your Job. That means when a Job is created by the CronJob, it will retry 3 times if it fails. This controls the Job, not the CronJob.
When a Job fails, another Job will be created again at the next scheduled time.
You want:
If I am not wrong, you want to stop scheduling new Jobs when your scheduled Jobs have failed 5 times. Right?
Answer:
In that case, this is not possible automatically.
Possible solution:
You need to suspend the CronJob so that it stops scheduling new Jobs:
suspend: true
You can do this manually. If you do not want to do it manually, you need to set up a watcher that will watch your CronJob status and update the CronJob to suspend it when necessary.
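Suspending the CronJob from the question manually is a one-liner, and the same patch could be issued by a watcher once it counts enough failed Jobs:

kubectl patch cronjob scrape-al -p '{"spec":{"suspend":true}}'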