How do I stop a CronJob from recreating failed Jobs? - kubernetes

When, for whatever reason, I delete the pod running the Job that was started by a CronJob, I immediately see a new pod being created. Only once I have deleted something like six pods (the backoffLimit number) do new ones stop being created.
Of course, if I'm actively monitoring the process, I can delete the CronJob, but what if the Pod inside the job fails when I'm not looking? I would like it not to be recreated.
How can I stop the CronJob from persisting in creating new jobs (or pods?), and wait until the next scheduled time if the current job/pod failed? Is there something similar to Jobs' backoffLimit, but for CronJobs?

Set startingDeadlineSeconds to a large value or leave it unset (the default).
At the same time, set .spec.concurrencyPolicy to Forbid, so that the CronJob skips a new Job run while the previously created Job is still running.
With startingDeadlineSeconds large or unset and concurrencyPolicy set to Forbid, the CronJob will not start another Job while the previous one is still running, which includes the time that Job spends retrying failed pods.
The concurrency policy is an optional field in the CronJob specification (.spec.concurrencyPolicy).
It specifies how to treat concurrent executions of a job that is created by this CronJob. The spec may specify only one of these three concurrency policies:
Allow (default) - The cron job allows concurrently running jobs
Forbid - The cron job does not allow concurrent runs; if it is time for a new job run and the previous job run hasn’t finished yet, the cron job skips the new job run
Replace - If it is time for a new job run and the previous job run hasn’t finished yet, the cron job replaces the currently running job run with a new job run
It is good to know that the concurrency policy applies only to the jobs created by the same CronJob.
If there are multiple CronJobs, their respective jobs are always allowed to run concurrently.
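As a minimal sketch of the settings described above, the manifest below sets concurrencyPolicy to Forbid and leaves startingDeadlineSeconds unset. The name, schedule, image and command are placeholders chosen for illustration, not values from the question.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: example-cronjob          # hypothetical name
spec:
  schedule: "*/15 * * * *"       # assumed schedule: every 15 minutes
  concurrencyPolicy: Forbid      # skip a run while the previous Job is still active
  # startingDeadlineSeconds is intentionally left unset (the default)
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: task           # placeholder container
            image: busybox:1.36
            command: ["sh", "-c", "echo working; sleep 30"]
```

Note that this only controls when the CronJob starts new Jobs. How often a single Job retries its pods is governed by the Job's own backoffLimit.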
A CronJob is counted as missed if it has failed to be created at its scheduled time. For example, if concurrencyPolicy is set to Forbid and a CronJob was attempted to be scheduled when there was a previous schedule still running, then it would count as missed.
For every CronJob, the CronJob controller checks how many schedules it missed in the duration from its last scheduled time until now. If there are more than 100 missed schedules, then it does not start the job and logs an error.
More information can be found in the Kubernetes documentation: CronJobs and Automated Tasks.
I hope it helps.

A CronJob creates each Job with a backoffLimit, which in your case has its default value (6). The Job retries failed pods until that limit is reached. Note that the pod template of a Job must set restartPolicy to either Never or OnFailure; the usual pod default (Always) is not allowed for Jobs.
It is better to set backoffLimit to 0 and restartPolicy to Never, and to set startingDeadlineSeconds to a value lower than or equal to your schedule interval. You can adjust these values to control how each CronJob run behaves.
Additionally, you may set concurrencyPolicy to Forbid.
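A sketch of how this suggestion might look as a manifest, reading it as backoffLimit: 0, restartPolicy: Never and concurrencyPolicy: Forbid. The name, the hourly schedule, the deadline value, the image and the command are assumptions made for the example.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: no-retry-cronjob           # hypothetical name
spec:
  schedule: "0 * * * *"            # assumed schedule: once per hour
  concurrencyPolicy: Forbid
  startingDeadlineSeconds: 600     # assumed value, lower than the one-hour interval
  jobTemplate:
    spec:
      backoffLimit: 0              # do not create replacement pods after a failure
      template:
        spec:
          restartPolicy: Never     # do not restart the container in place either
          containers:
          - name: task             # placeholder container
            image: busybox:1.36
            command: ["sh", "-c", "echo running once"]
```

With these settings a failed pod should mark the Job as failed without further retries, and the next attempt happens at the next scheduled time.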

Related

Kubernetes concurrencyPolicy Forbid not preventing concurrent jobs

I have a backup job running, scheduled to run every 24 hours. I have the concurrency policy set to "Forbid." I am testing my backup, and I create jobs manually for testing, but these tests are not forbidding concurrent runs. I use:
kubectl create job --from=cronjob/my-backup manual-backup-(timestamp)
... and when I run them twice in close succession, I find that both begin the work.
Does the concurrency policy only apply to jobs created by the Cron job scheduler? Does it ignore manually-created jobs? If it is ignoring those, are there other ways to manually run the job such that the Cron job scheduler knows they are there?
...Does the concurrency policy only apply to jobs created by the Cron job scheduler?
concurrencyPolicy applies to the CronJob, because it influences how the CronJob starts Jobs. It is part of the CronJob spec, not the Job spec.
...Does it ignore manually-created jobs?
Yes.
...ways to manually run the job such that the Cron job scheduler knows they are there?
Beware that when concurrencyPolicy is set to Forbid and the time comes for the CronJob to start a Job, but it detects that a Job belonging to this CronJob is still running, it counts the current attempt as missed. It is better to temporarily set the CronJob's spec.suspend to true if you manually start a Job based on the CronJob and its execution time will span the next scheduled time.
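For illustration, the fragment below shows where the suspend field sits in the CronJob spec. The name my-backup comes from the question; the schedule is an assumption based on "every 24 hours", and the rest of the spec (jobTemplate and so on) is omitted and would stay as it is.

```yaml
# Fragment only: the existing jobTemplate is left unchanged and not shown.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: my-backup
spec:
  schedule: "0 0 * * *"        # assumed: once every 24 hours
  concurrencyPolicy: Forbid
  suspend: true                # no new scheduled Jobs are started while this is true
```

Setting suspend back to false after the manual run finishes resumes normal scheduling. Jobs that are already running are not affected by the field.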

Kubernetes CronJob - Do ConcurrencyPolicy and manual job execution/creation communicate with one another?

I have a Kubernetes CronJob with a concurrencyPolicy of Replace. As I'd have expected, the documentation suggests that if the previous job is still running when the next scheduled time arrives, the previous job is killed off / cancelled.
What I want to know is, if I manually kick off a job with kubectl create job --from, does the concurrencyPolicy still play a part? It seems as though the answer is no from the testing I've been doing (and then I'll have multiple concurrent jobs), but would like to confirm.
If I'm correct and they don't work together, is there a way to have this functionality? Basically wanting to be able to deploy a job and then test it without having to wait around for it to kick off, but also don't want to have two jobs running at the same time.
Thanks!

What does shutdown look like for a kubernetes cron job pod when it's being terminated by "replace" concurrency policy?

I couldn't find anything in the official kubernetes docs about this. What's the actual low-level process for replacing a long-running cron job? I'd like to understand this so my application can handle it properly.
Is it a clean SIGHUP/SIGTERM signal that gets sent to the app that's running?
Is there a waiting period after that signal gets sent, so the app has time to cleanup/shutdown before it potentially gets killed? If so, what's that timeout in seconds? Or does it wait forever?
For reference, here's the Replace policy explanation in the docs:
https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/
Concurrency Policy
Replace: If it is time for a new job run and the previous job run hasn’t finished yet, the cron job replaces the currently running job run with a new job run
A Job created by a CronJob is just another Pod underneath.
When a CronJob with a concurrency policy of "Replace" still has an active Job at the next scheduled time, that Job will be deleted, which also deletes the Pod.
When a Pod is deleted the Linux container/s will be sent a SIGTERM and then a SIGKILL after a grace period, defaulted to 30 seconds. The terminationGracePeriodSeconds property in a PodSpec can be set to override that default.
Due to the flag added to the DeleteJob call, it sounds like this delete only removes values from the kube key/value store, which would mean the new Job/Pod could be created while the current Job/Pod is still terminating. You could confirm this with a Job that doesn't respect SIGTERM and has terminationGracePeriodSeconds set to a few times your cluster's scheduling speed.
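A rough sketch of how the grace period could be configured for a Replace CronJob, with a container that reacts to SIGTERM. The name, schedule, grace period, image and shell script are all assumptions for the example; a real application would do its own cleanup in its signal handler.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: long-running-task                  # hypothetical name
spec:
  schedule: "0 * * * *"                    # assumed schedule: once per hour
  concurrencyPolicy: Replace
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          terminationGracePeriodSeconds: 120   # assumed value; overrides the 30-second default
          containers:
          - name: worker                   # placeholder container
            image: busybox:1.36
            command:
            - sh
            - -c
            - |
              # Exit cleanly when SIGTERM arrives; otherwise keep "working".
              trap 'echo "SIGTERM received, cleaning up"; exit 0' TERM
              while true; do sleep 1; done
```

If the process has not exited by the end of the grace period, it is sent SIGKILL.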

Getting kubernetes cronjob history

I have a CronJob which runs every 15 minutes. Say it has been running for the last year.
Is it possible to get the complete history using the Kube API? Or is it possible to control the maximum history that can be stored?
Also, can we get the status (success/failure) of each run along with the total completion time?
Does the Pod die after completing the Job?
A CronJob creates a Job object for each execution.
For regular Jobs you can set .spec.ttlSecondsAfterFinished to control how long finished Job instances are retained (older clusters also required the TTLAfterFinished feature gate; the field has been stable since Kubernetes 1.23).
For CronJob you can specify the .spec.successfulJobsHistoryLimit to configure the number of managed Job instances to be retained.
You can get the desired information from these objects.
The pod does not die when the job completes. It is the other way around: if the pod terminates without an error, the job is considered completed.
The .spec.successfulJobsHistoryLimit and .spec.failedJobsHistoryLimit fields are optional.
These fields specify how many completed and failed jobs should be kept.
By default, they are set to 3 and 1 respectively.
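As an example of the history fields described above, the sketch below keeps more finished Jobs than the defaults. The name, the limits, the image and the command are illustrative assumptions; the 15-minute schedule matches the question.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: history-example              # hypothetical name
spec:
  schedule: "*/15 * * * *"           # every 15 minutes, as in the question
  successfulJobsHistoryLimit: 10     # assumed value; the default is 3
  failedJobsHistoryLimit: 5          # assumed value; the default is 1
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: task               # placeholder container
            image: busybox:1.36
            command: ["sh", "-c", "date"]
```

Each retained Job object records status.startTime, status.completionTime and the succeeded/failed pod counts, which gives the outcome and duration of that run. Runs older than the limits are deleted, so a complete year of history would have to be collected somewhere outside the cluster.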

How to run kubernetes pod for a set period of time each day?

I'm looking for a way to deploy a pod on kubernetes to run for a few hours each day. Essentially I want it to run every morning at 8AM and continue running until about 5:30 PM.
I've been researching a lot and haven't found a way to deploy the pod with a specific timeframe in mind. I've found cron jobs, but they seem to be for pods that terminate themselves, whereas mine should be running constantly.
Is there any way to deploy my pod on kubernetes this way? Or should I just set up the pod itself to run its intended application based on its internal clock?
According to the Kubernetes architecture, a Job creates one or more pods and ensures that a specified number of them successfully terminate. As pods successfully complete, the job tracks the successful completions. When a specified number of successful completions is reached, the job itself is complete.
In simple words, Jobs run until completion or failure. That's why there is no option to schedule a Cron Job termination in Kubernetes.
In your case, you can start a Cron Job regularly and terminate it using one of the following options:
The better way is to have the container terminate itself: add that functionality to your application, or use cron inside the container. More information about how to add cron to a Docker container can be found here.
Alternatively, you can use another Cron Job to terminate your Cron Job. You need to run a command inside a Pod to find and delete the Pod related to your Job. For more information, you can look through this link. This is not a good approach, however, because your Cron Job will always end with a failed status.
In both cases, you need to check with what status your Cron Job was finished and use the correct RestartPolicy accordingly.
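A minimal sketch of the first option, assuming the application exits on its own after about nine and a half hours. The name and image are placeholders, and the sleep command only stands in for a real application that stops itself around 5:30 PM.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: daytime-app                  # hypothetical name
spec:
  schedule: "0 8 * * *"              # 08:00 every day, in the time zone the cluster uses for schedules
  concurrencyPolicy: Forbid          # do not start a second copy if yesterday's run is still active
  jobTemplate:
    spec:
      backoffLimit: 0                # assumed: do not retry within the same day
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: app                # placeholder container
            image: busybox:1.36
            # Stand-in for the real application: run for 34200 seconds
            # (9.5 hours, 08:00 to 17:30) and then exit with status 0.
            command: ["sh", "-c", "sleep 34200"]
```

Because the container exits with status 0 by itself, the Job finishes as completed rather than failed.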
It seems you can implement this using a CronJob object:
https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#creating-a-cron-job