Kubernetes CronJob: Reset missed start times after cluster recovery

I have a cluster that includes a Cronjob scheduled to run every 5 minutes.
We recently experienced an issue that incurred downtime and required manual recovery of the cluster. Although the cluster is now healthy again, this particular CronJob is failing to run with the following error:
Cannot determine if job needs to be started: Too many missed start time (> 100). Set or decrease .spec.startingDeadlineSeconds or check clock skew.
I understand that the CronJob has 'missed' a number of scheduled jobs while the cluster was down, and this has passed the threshold beyond which no further jobs will be scheduled.
How can I reset the number of missed start times and have these jobs scheduled again (without all of the missed jobs suddenly being scheduled to run)?

Per the Kubernetes CronJob docs, there does not seem to be a way to cleanly resolve this. Setting the .spec.startingDeadlineSeconds value to a large number will re-schedule all missed occurrences that fall within the increased window.
My solution was just to kubectl delete cronjob x-y-z and recreate it, which worked as desired.
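For reference, a minimal sketch of that recreate approach. Everything below (the name x-y-z, the schedule, the image) comes from the question or is a placeholder; the key point is that deleting the CronJob discards the controller's record of the last schedule time, so the missed runs are forgotten:

# Save the existing spec first:  kubectl get cronjob x-y-z -o yaml > x-y-z.yaml
# Delete the stuck CronJob:      kubectl delete cronjob x-y-z
# Recreate it from the manifest: kubectl apply -f x-y-z.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: x-y-z
spec:
  schedule: "*/5 * * * *"         # every 5 minutes, as in the question
  startingDeadlineSeconds: 200    # optional: only count runs missed in the last 200s
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: task
              image: busybox      # placeholder image
              command: ["sh", "-c", "echo running"]

Setting startingDeadlineSeconds to a small value like this should also prevent the error from recurring, since the controller then only counts schedules missed inside that window instead of everything since the last successful run.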

Related

Prevent Kubernetes from restarting job on OOM

I'm having a problem where a job runs out of memory, and K8s is continually trying to run it again, despite it having no chance of succeeding, since it's going to use the same amount of memory every time. I want it to simply let the job fail and sit there, and I'll take care of creating a new one with a higher memory limit, if desired, and/or deleting the existing failed job.
I have
restartPolicy: Never
backoffLimit: 0
From the not-so-clear things I've read, setting backoffLimit to 1 might do the trick. But is that true? Would that make it restart once, or is the 1 the number of times it can be run, including the first attempt?
Should I switch from jobs to pods? The main issue with that is that I don't think K8s would then restart the pod on another worker node should the one it's running on go down, and that's a situation where I'd want the job to be restarted automatically on another node.
backoffLimit counts retries, not total runs, so backoffLimit should be 1 if you want exactly one retry after the first failed attempt, as shown below:
backoffLimit: 1
Setting backoffLimit to 0 is correct if the Job is supposed to run once and never be retried. Per the docs:
backoffLimit: Specifies the number of retries before marking this job failed.
Switching your workload to a bare Pod would only make sense if you are not interested in restarts in combination with backoff limits.
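For completeness, a minimal sketch of the no-retry variant, assuming a placeholder name, image, and memory limit; with restartPolicy: Never and backoffLimit: 0 the Job is marked failed after the first OOMKilled attempt and simply sits there, as the asker wants:

apiVersion: batch/v1
kind: Job
metadata:
  name: oom-prone-job            # placeholder name
spec:
  backoffLimit: 0                # zero retries: fail after the first attempt
  template:
    spec:
      restartPolicy: Never       # do not restart the container in place
      containers:
        - name: worker
          image: my-worker:latest    # placeholder image
          resources:
            limits:
              memory: 512Mi      # raise this and recreate the Job if it OOMs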

How to terminate only certain pods based on whether or not they have finished a certain task in Kubernetes?

I'm having trouble finding a solution that allows terminating only certain pods in a deployment.
The application running inside the pods does some processing which can take a lot of time to finish.
Let's say I have 10 tasks that are stored in a database and I issue a command to scale the deployment to 10 pods.
Let's say that after some time 3 of the pods have finished their tasks and are no longer required.
How can I scale the deployment down from 10 to 7 while terminating only the pods that have finished their tasks and not the pods that are still processing them?
I don't know if more details are needed, but I will happily edit the question if they would help answer this kind of problem.
In this case a Kubernetes Job might be better suited for this kind of task: pods created by a Job terminate on their own once their work is done, so nothing ever has to pick which pods are safe to kill.
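As a hedged sketch of that idea (all names and the image are placeholders, and it assumes each worker can claim one task from the database and exit when done): a Job with completions equal to the number of tasks lets every pod terminate on its own, so none of the still-working pods are ever picked for deletion the way a Deployment scale-down would pick them.

apiVersion: batch/v1
kind: Job
metadata:
  name: process-tasks            # placeholder name
spec:
  completions: 10                # one completion per task in the database
  parallelism: 10                # run up to 10 worker pods at once
  template:
    spec:
      restartPolicy: OnFailure   # retry a worker in place if it crashes
      containers:
        - name: worker
          image: my-task-worker:latest   # placeholder: claims a task, processes it, exits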

What does shutdown look like for a kubernetes cron job pod when it's being terminated by "replace" concurrency policy?

I couldn't find anything in the official kubernetes docs about this. What's the actual low-level process for replacing a long-running cron job? I'd like to understand this so my application can handle it properly.
Is it a clean SIGHUP/SIGTERM signal that gets sent to the app that's running?
Is there a waiting period after that signal gets sent, so the app has time to cleanup/shutdown before it potentially gets killed? If so, what's that timeout in seconds? Or does it wait forever?
For reference, here's the Replace policy explanation in the docs:
https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/
Concurrency Policy
Replace: If it is time for a new job run and the previous job run hasn’t finished yet, the cron job replaces the currently running job run with a new job run
A CronJob just runs another Pod underneath.
When a CronJob with a concurrency policy of "Replace" is still active, the Job will be deleted, which also deletes the Pod.
When a Pod is deleted the Linux container/s will be sent a SIGTERM and then a SIGKILL after a grace period, defaulted to 30 seconds. The terminationGracePeriodSeconds property in a PodSpec can be set to override that default.
Due to the flag added to the DeleteJob call, it sounds like this delete only removes the objects from the kube key/value store immediately, which would mean the new Job/Pod can be created while the current Job/Pod is still terminating. You could confirm this with a Job that ignores SIGTERM and a terminationGracePeriodSeconds set to a few times your cluster's scheduling speed.
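If you want to observe this yourself, here is a sketch of such a test CronJob (name, schedule, and image are placeholders): the container ignores SIGTERM, so with Replace you can watch whether the next run's pod starts while the old pod is still waiting out its grace period for the SIGKILL.

apiVersion: batch/v1
kind: CronJob
metadata:
  name: replace-test             # placeholder name
spec:
  schedule: "* * * * *"          # fire every minute so runs overlap
  concurrencyPolicy: Replace
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          terminationGracePeriodSeconds: 120   # long grace period to make overlap visible
          containers:
            - name: stubborn
              image: busybox     # placeholder image
              # Ignore SIGTERM and outlive the schedule interval, so the pod
              # only dies when SIGKILL arrives after the 120s grace period.
              command: ["sh", "-c", "trap '' TERM; sleep 600"]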

Prevent K8S HPA from deleting pod after load is reduced

I have Sidekiq custom metrics coming from the Prometheus adapter, and I have set up an HPA using those queue metrics. When the Sidekiq queue goes above, say, 1000 jobs, the HPA triggers 10 new pods, and each pod executes 100 jobs from the queue. When the queue is reduced to, say, 400 jobs, the HPA scales down and kills, say, 4 pods. Those 4 pods were still running jobs (say 30-50 jobs each), so when the HPA deletes them, the jobs running on them are also terminated and are marked as failed in Sidekiq.
So what I want to achieve is to stop the HPA from deleting pods which are still executing jobs. Moreover, I want the HPA not to scale down even after load is reduced to the minimum, and instead delete pods only when the Sidekiq queue metric reaches 0.
Is there any way to achieve this?
Weird usage, honestly: you're wasting resources even while your traffic is in the cool-down phase, but since you didn't provide further details, here it is.
It's not actually possible to achieve exactly what you want, since the HPA's common behavior is built around supporting a growing load against your workload. The only way to approximate it (and this is not recommended) is to set the horizontal-pod-autoscaler-downscale-stabilization flag of the Kubernetes Controller Manager to a higher value.
JFI, the doc warns you:
Note: When tuning these parameter values, a cluster operator should be aware of the possible consequences. If the delay (cooldown) value is set too long, there could be complaints that the Horizontal Pod Autoscaler is not responsive to workload changes. However, if the delay value is set too short, the scale of the replicas set may keep thrashing as usual.
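To make the flag concrete, here is a hypothetical excerpt of a kubeadm-style kube-controller-manager static Pod manifest (the file path, image tag, and chosen window are assumptions for illustration; on managed clusters this flag may not be editable at all):

# /etc/kubernetes/manifests/kube-controller-manager.yaml (excerpt, kubeadm layout assumed)
apiVersion: v1
kind: Pod
metadata:
  name: kube-controller-manager
  namespace: kube-system
spec:
  containers:
    - name: kube-controller-manager
      image: registry.k8s.io/kube-controller-manager:v1.28.0   # your cluster's version
      command:
        - kube-controller-manager
        - --horizontal-pod-autoscaler-downscale-stabilization=30m   # default is 5m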
As per the discussion and the work done by #Hb_1993, it can be done with a preStop hook to delay the eviction, where the delay is based on operation time or some logic that knows whether the processing is done or not.
A preStop hook is a lifecycle handler which is invoked before a pod is terminated; we can attach to this event and perform some logic, such as a ping check, to make sure our pod has completed processing the current request.
PS- Use this solution with a pinch of salt as this might not work in all the cases or produce unintended results.
To do this, we introduce a sleep in the preStop hook that delays the shutdown sequence.
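A minimal sketch of that sleep, assuming a placeholder Deployment and image; the fixed 60-second delay is the crude version, and ideally the command would instead poll the app until its in-flight Sidekiq jobs are drained:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sidekiq-worker           # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sidekiq-worker
  template:
    metadata:
      labels:
        app: sidekiq-worker
    spec:
      terminationGracePeriodSeconds: 90    # must exceed the preStop delay
      containers:
        - name: worker
          image: my-sidekiq:latest         # placeholder image
          lifecycle:
            preStop:
              exec:
                # Delay shutdown so in-flight jobs can finish before SIGTERM.
                command: ["sh", "-c", "sleep 60"]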
More details can be found in this article.
https://blog.gruntwork.io/delaying-shutdown-to-wait-for-pod-deletion-propagation-445f779a8304

How do I stop a CronJob from recreating failed Jobs?

When, for whatever reason, I delete the pod running the Job that was started by a CronJob, I immediately see a new pod being created. It is only once I have deleted pods something like six times (the backoffLimit) that new ones stop being created.
Of course, if I'm actively monitoring the process, I can delete the CronJob, but what if the Pod inside the job fails when I'm not looking? I would like it not to be recreated.
How can I stop the CronJob from persisting in creating new jobs (or pods?), and wait until the next scheduled time if the current job/pod failed? Is there something similar to Jobs' backoffLimit, but for CronJobs?
Set startingDeadlineSeconds to a large value or leave it unset (the default).
At the same time, set .spec.concurrencyPolicy to Forbid, so the CronJob skips the new job run while the previously created job is still running.
If startingDeadlineSeconds is set to a large value or left unset (the default) and concurrencyPolicy is set to Forbid, a failed job will not be re-run before its next scheduled time.
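A sketch of those two settings together (the name, schedule, and deliberately failing command are placeholders):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: my-cron                  # placeholder name
spec:
  schedule: "0 * * * *"          # placeholder schedule
  concurrencyPolicy: Forbid      # skip a run while the previous job still exists
  # startingDeadlineSeconds deliberately left unset (the default)
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: task
              image: busybox     # placeholder image
              command: ["sh", "-c", "exit 1"]   # stand-in for a failing job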
The concurrency policy field (.spec.concurrencyPolicy) can optionally be added to your CronJob's spec definition.
It specifies how to treat concurrent executions of a job created by this CronJob. The spec may specify only one of these three concurrency policies:
Allow (default) - The cron job allows concurrently running jobs
Forbid - The cron job does not allow concurrent runs; if it is time for a new job run and the previous job run hasn’t finished yet, the cron job skips the new job run
Replace - If it is time for a new job run and the previous job run hasn’t finished yet, the cron job replaces the currently running job run with a new job run
It is good to know that the concurrency policy applies only to the jobs created by the same CronJob.
If there are multiple CronJobs, their respective jobs are always allowed to run concurrently.
A CronJob is counted as missed if it has failed to be created at its scheduled time. For example, If concurrencyPolicy is set to Forbid and a CronJob was attempted to be scheduled when there was a previous schedule still running, then it would count as missed.
For every CronJob, the CronJob controller checks how many schedules it missed in the duration from its last scheduled time until now. If there are more than 100 missed schedules, it does not start the job and logs the error.
More information can be found here: CronJobs and Automated Tasks.
I hope it helps.
A CronJob creates Jobs whose backoffLimit defaults to 6, which is what you are hitting here; note also that a plain pod's restart policy defaults to Always, which is not valid for Jobs, so it must be set explicitly.
Better to set backoffLimit to 0 and restartPolicy to Never, and to set startingDeadlineSeconds lower than or equal to your schedule interval (or tune it to your needs) to control the window in which each CronJob run may start.
Additionally, you may set concurrencyPolicy to Forbid.
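Putting those pieces together, a hedged sketch of a CronJob whose run fails once and then simply waits for the next scheduled time (name, schedule, and image are placeholders):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: no-retry-cron            # placeholder name
spec:
  schedule: "*/10 * * * *"       # placeholder: every 10 minutes
  concurrencyPolicy: Forbid      # never start a run while one is still active
  startingDeadlineSeconds: 600   # <= the schedule interval, per the answer above
  jobTemplate:
    spec:
      backoffLimit: 0            # no retries: one failed pod fails the whole run
      template:
        spec:
          restartPolicy: Never   # do not restart the failed container in place
          containers:
            - name: task
              image: my-task:latest    # placeholder image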