I created a k8s CronJob with the following schedule (run every minute):
schedule: "*/1 * * * *"
I can see the pod created by my CronJob:
NAMESPACE NAME READY STATUS RESTARTS AGE
job-staging job-1593017820-tt2sn 2/3 Running 0 10m
My job simply does a Printf to the log, one time, then exits.
When I do a kubectl get cronjob I see:
NAMESPACE NAME SCHEDULE SUSPEND ACTIVE LAST SCHEDULE AGE
job-staging job */1 * * * * False 1 19m 19m
When I look at the logs, it looks like it only ran once, which was the first run. Do I need to prevent my program from exiting?
I assumed k8s would restart my program, but maybe that's a wrong assumption.
Your assumption about the behavior of Kubernetes ("restarting the program") is correct.
As you may know, a Job is essentially a Kubernetes Pod that executes some process and finishes successfully when that process exits with a zero exit code. The "Cron" part of CronJob does what you would expect: it creates a new Job on the given schedule.
Most CronJob manifests set restartPolicy: OnFailure, which tells the kubelet to restart the container in place when it exits with a non-zero code (the hello-world example in the Kubernetes documentation uses this setting).
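For reference, here is a minimal sketch of such a CronJob, closely following the documentation's hello example rather than your exact manifest:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure   # restart the container in place on a non-zero exit
          containers:
          - name: hello
            image: busybox
            args: ["/bin/sh", "-c", "date; echo Hello from the Kubernetes cluster"]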
From the output of your kubectl commands, it looks like your Job may be failing (the Pod has been Running with READY 2/3 for 10 minutes). I would recommend checking the logs of the Pod with kubectl logs -f -n job-staging job-1593017820-tt2sn for any errors in the execution of your script (if your script explicitly sets an exit code, check for a possible non-zero one).
[UPDATE]
CronJobs also have limitations:
A cron job creates a job object about once per execution time of its schedule. We say “about” because there are certain circumstances where two jobs might be created, or no job might be created. We attempt to make these rare, but do not completely prevent them. Therefore, jobs should be idempotent.
I think these are pretty rare scenarios, but maybe you've found yourself in these rare situations. The documentation is here.
Related
I have triggered a CronJob, and even after successful completion of the Job it is not cleaning up the Pods.
Here are two settings that may help to understand it better:
successfulJobsHistoryLimit: 0
terminationGracePeriodSeconds: 30
Here I have set successfulJobsHistoryLimit to 0, but it still does not clean up the Pods even though they are in the Completed state.
But if I use ttlSecondsAfterFinished, then the Pods are cleaned up.
So my question is: even with successfulJobsHistoryLimit set to 0, why is ttlSecondsAfterFinished required to clean up the Pods of a Job that completed successfully?
successfulJobsHistoryLimit refers to the Job entity, which is separate from the Pod entity, while terminationGracePeriodSeconds applies to Pods and ttlSecondsAfterFinished applies to the finished Job and, through it, to the Pods that Job owns.
When successfulJobsHistoryLimit is set to 0 and you run kubectl get job, you should not see entries for that specific Job, but the Pods can still show up in kubectl get pods.
Pods that run to completion stay around until they, or their Job, are deleted. That is why you still need ttlSecondsAfterFinished to delete them: when the TTL controller removes the finished Job, the Pods it created are removed with it.
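A minimal sketch showing where the two settings live (the name, schedule, and image are placeholders): successfulJobsHistoryLimit sits on the CronJob spec, while ttlSecondsAfterFinished sits on the Job template spec and deletes the finished Job together with its Pods:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: cleanup-demo                # placeholder name
spec:
  schedule: "*/5 * * * *"
  successfulJobsHistoryLimit: 0     # keep no completed Job objects in history
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 60   # delete the finished Job (and its Pods) 60s after completion
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: main
            image: busybox          # placeholder image
            command: ["sh", "-c", "echo done"]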
job-termination-and-cleanup states that:
When a Job completes, no more Pods are created, but the Pods are usually not deleted either. Keeping them around allows you to still view the logs of completed pods to check for errors, warnings, or other diagnostic output. The job object also remains after it is completed so that you can view its status. It is up to the user to delete old jobs after noting their status. Delete the job with kubectl (e.g. kubectl delete jobs/pi or kubectl delete -f ./job.yaml). When you delete the job using kubectl, all the pods it created are deleted too.
The implication is that Pods aren't deleted automatically unless the Job is deleted manually, but not necessarily when its history entries are trimmed automatically.
For some context, I'm creating an API in Python that creates Kubernetes Jobs from user input passed in as environment variables.
Sometimes the selected image does not exist or has been deleted, a Secret does not exist, or a Volume hasn't been created, so the Job ends up in a CrashLoopBackOff or ImagePullBackOff state.
First, I'm wondering whether the resources during this state are allocated to the Job.
If yes, I don't want the Job to loop forever and lock resources for a Job that will never start.
I've set backoffLimit to 0, but that only applies when the Job detects a failed Pod and launches another Pod to retry. In my case, I know that if a Pod fails for a Job, it's mostly due to OOM or code that fails because of the user input, so retrying will always fail.
But it doesn't limit the number of retries for CrashLoopBackOff or ImagePullBackOff. Is there a way to terminate or fail the Job? I don't want to delete it, just free the resources and keep the events in (status.container.state.waiting.reason + status.container.state.waiting.message) or (status.container.state.terminated.reason + status.container.state.terminated.exit_code).
Could there be an option, set at creation time, to limit the number of retries so that I can free the resources but not remove the Job, in order to keep the logs?
I have tested your first question, and yes, even when a Pod is in the CrashLoopBackOff state, its requested resources are still allocated to it. Here is my test: Are the Kubernetes requested resources by a pod still allocated to it when it is in CrashLoopBackOff state?
Thanks for your question!
Long story short, unfortunately there is no such option in Kubernetes.
However, you can handle this manually by checking whether a Pod is in CrashLoopBackOff and, if so, freeing its resources by simply deleting the Pod itself.
The following script deletes any Pod in the CrashLoopBackOff state from a specified namespace:
#!/bin/bash
# Check the given namespace and delete pods in the 'CrashLoopBackOff' state.
NAMESPACE="test"
delpods=$(kubectl get pods -n "${NAMESPACE}" |
  grep -i 'CrashLoopBackOff' |
  awk '{print $1}')
for i in ${delpods}; do
  kubectl delete pod "$i" --force --wait=false \
    --grace-period=0 -n "${NAMESPACE}"
done
Since we pass --grace-period=0 together with --force, the Pod is deleted immediately without waiting for graceful termination.
But if, after using this script or running it from a Job, you notice that the Pod keeps restarting and falling back into the CrashLoopBackOff state, there is a workaround: change the restart policy of the Pod:
A PodSpec has a restartPolicy field with possible values Always, OnFailure, and Never. The default value is Always. restartPolicy applies to all Containers in the Pod. restartPolicy only refers to restarts of the Containers by the kubelet on the same node. Exited Containers that are restarted by the kubelet are restarted with an exponential back-off delay (10s, 20s, 40s …) capped at five minutes, and is reset after ten minutes of successful execution. As discussed in the Pods document, once bound to a node, a Pod will never be rebound to another node.
See more details in the documentation or from here.
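For the Job case described in the question, a minimal sketch could look like this (the Job name and image are placeholders): restartPolicy: Never stops the kubelet from restarting the failed container in place, and backoffLimit: 0 stops the Job controller from creating replacement Pods, so the Pod fails once instead of cycling through CrashLoopBackOff:

apiVersion: batch/v1
kind: Job
metadata:
  name: user-job                  # placeholder name
spec:
  backoffLimit: 0                 # do not create replacement Pods after a failure
  template:
    spec:
      restartPolicy: Never        # do not restart the container in place either
      containers:
      - name: main
        image: busybox            # placeholder for the user-selected image
        command: ["sh", "-c", "exit 1"]   # simulates a workload that always fails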
And that is it! Happy hacking.
Regarding the first question, it is already answered by bguess here.
I want to execute a script immediately after the container is completed successfully or terminated due to an error in the pod.
I tried attaching handlers to container lifecycle events like preStop, but it is only called when a container is terminated due to an API request or a management event such as a liveness probe failure, preemption, resource contention, and so on.
Reference - Kubernetes Doc: https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/
Is there an alternative approach to this?
From the official docs, as you said:
Kubernetes only sends the preStop event when a Pod is terminated. This means that the preStop hook is not invoked when the Pod is completed.
However, note that the use of bare Pods is not recommended. Consider using a Job controller:
A Job creates one or more Pods and ensures that a specified number of them successfully terminate. As pods successfully complete, the Job tracks the successful completions.
You can check the Job's conditions and wait for them with:
kubectl wait --for=condition=complete job/your-job
and then run your script. In the meantime, add a preStop hook to your Pod definition to run a script when the Pods are terminated. You can also write an extra script that runs in the background, checks whether the Job has completed, and then runs your main script:
until kubectl get job your-job -o jsonpath='{.status.conditions[?(@.type=="Complete")].status}' | grep -q True; do
  sleep 5
done
<run your main script>
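And for the termination path mentioned above, a minimal preStop sketch could look like this (the Pod name, image, and commands are placeholders, not taken from your setup); note again that the hook fires only when the Pod is terminated, not when the container completes on its own:

apiVersion: v1
kind: Pod
metadata:
  name: worker-with-prestop        # placeholder name
spec:
  containers:
  - name: main
    image: busybox                 # placeholder image
    command: ["sh", "-c", "echo working; sleep 3600"]
    lifecycle:
      preStop:
        exec:
          # Runs when the Pod is being terminated, before the container receives SIGTERM.
          command: ["/bin/sh", "-c", "echo terminating > /tmp/terminated"]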
See more: job-completion-task.
I'm using Azure Kubernetes Service to run a Go application that pulls from RabbitMQ, runs some processing, and returns it. The pods scale to handle an increase of jobs. Pretty run-of-the-mill stuff.
The HPA is setup like this:
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
production Deployment/production 79%/80% 2 10 10 4d11h
staging Deployment/staging 2%/80% 1 2 1 4d11h
What happens is as the HPA scales up and down, there are always 2 pods that will stay running. We've found that after running for so long, the Go app on those pods will time out. Sometimes that's days, sometimes it's weeks. Yes, we could probably dig into the code and figure this out, but it's kind of a low priority for that team.
Another solution I've thought of would be to have the HPA remove the oldest pods first. This would mean that the oldest pod would never be more than a few hours old. A first-in, first-out model.
However, I don't see any clear way to do that. It may well not be possible, but it seems like something that could work.
Am I missing something? Is there a way to make this work?
In my opinion (as I also mentioned in a comment), the simplest (if not the most elegant) way is to have a CronJob that periodically cleans up the timed-out pods.
One CronJob object is like one line of a crontab (cron table) file. It runs a job periodically on a given schedule, written in Cron format. CronJobs are useful for creating periodic and recurring tasks, like running backups or sending emails. CronJobs can also schedule individual tasks for a specific time, such as scheduling a Job for when your cluster is likely to be idle.
CronJob examples and how-tos:
How To Use Kubernetes’ Job and CronJob
Kubernetes: Delete pods older than X days
https://github.com/dignajar/clean-pods <-- real example
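A minimal sketch of such a clean-up CronJob (the namespace, label selector, and ServiceAccount name are assumptions, and the ServiceAccount needs RBAC permissions to list and delete Pods); it deletes the oldest Pod of the deployment every six hours:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: recycle-oldest-pod
spec:
  schedule: "0 */6 * * *"                # every six hours
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: pod-recycler   # assumed SA with pod list/delete permissions
          restartPolicy: OnFailure
          containers:
          - name: recycle
            image: bitnami/kubectl:latest
            command:
            - /bin/sh
            - -c
            # Sort the deployment's Pods by creation time and delete the oldest one.
            - >
              kubectl delete pod -n production
              "$(kubectl get pods -n production -l app=production
              --sort-by=.metadata.creationTimestamp
              -o jsonpath='{.items[0].metadata.name}')"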
Context
We have long-running Kubernetes Jobs based on Docker containers.
The containers need resources (e.g. 15 GB memory, 2 CPUs), and we use the autoscaler to scale up new worker nodes on demand.
Scenario
Users can select the version of the Docker image to be used for a job, e.g. 1.0.0, 1.1.0, or even a commit hash of the code the image was built from in the test environment.
Since we leave the Docker tag as free text, a user can type a non-existing tag. Because of this, the Job's Pod ends up in the ImagePullBackOff state. The Pod stays in this state and keeps the resources locked so that they cannot be reused by any other job.
Question
What is the right solution, applicable within Kubernetes itself, for failing the Pod immediately, or at least quickly, if a pull fails because the image:tag does not exist?
Possibilities
I looked into backoffLimit. I have set it to 0, but this doesn't fail or remove the Job, and the resources are of course kept as well.
Maybe the Pods can be killed by a CronJob. I'm not sure how to do so.
Ideally, resources should not even be allocated for a Job with a non-existent Docker image, but I'm not sure whether there is an easy way to achieve this.
Any other?
After looking at your design, I would recommend adding an initContainer to the Job specification that checks for the existence of the Docker image with the given tag.
If the image with that tag doesn't exist in the registry, the initContainer can report an error and fail the Job's Pod by exiting with a non-zero exit code.
After that, the Job's Pod will be restarted, and after a certain number of attempts the Job will reach the Failed state. By configuring the .spec.ttlSecondsAfterFinished option, failed Jobs can be wiped out.
If a Pod’s init container fails, Kubernetes repeatedly restarts the Pod until the init container succeeds. However, if the Pod has a restartPolicy of Never, Kubernetes does not restart the Pod.
If the image exists, the initContainer script exits with a zero exit code, the main Job container image is pulled, and the container starts.
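A minimal sketch of this idea (the Job name, the registry check via skopeo, and the user-supplied tag are assumptions, not part of your setup):

apiVersion: batch/v1
kind: Job
metadata:
  name: user-job                       # placeholder name
spec:
  backoffLimit: 2                      # a few retries before the Job is marked Failed
  ttlSecondsAfterFinished: 3600        # wipe the finished/failed Job after an hour
  template:
    spec:
      restartPolicy: Never
      initContainers:
      - name: check-image
        image: quay.io/skopeo/stable   # assumed helper image with skopeo installed
        # Exits non-zero if the tag does not exist, failing the Pod quickly
        # instead of leaving it stuck in ImagePullBackOff.
        command: ["skopeo", "inspect", "docker://docker.io/library/busybox:user-supplied-tag"]
      containers:
      - name: main
        image: docker.io/library/busybox:user-supplied-tag   # the user-selected image:tag
        command: ["sh", "-c", "echo running user workload"]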
When a Job completes, no more Pods are created, but the Pods are not deleted either.
By default, a Job will run uninterrupted unless a Pod fails (restartPolicy=Never) or a Container exits in error (restartPolicy=OnFailure), at which point the Job defers to the .spec.backoffLimit described above. Once .spec.backoffLimit has been reached the Job will be marked as failed and any running Pods will be terminated.
Another way to terminate a Job is by setting an active deadline. Do this by setting the .spec.activeDeadlineSeconds field of the Job to a number of seconds. The activeDeadlineSeconds applies to the duration of the job, no matter how many Pods are created. Once a Job reaches activeDeadlineSeconds, all of its running Pods are terminated and the Job status will become type: Failed with reason: DeadlineExceeded.
Note that a Job’s .spec.activeDeadlineSeconds takes precedence over its .spec.backoffLimit. Therefore, a Job that is retrying one or more failed Pods will not deploy additional Pods once it reaches the time limit specified by activeDeadlineSeconds, even if the backoffLimit is not yet reached.
Here is more information: jobs.
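A minimal sketch of that approach (the Job name and image tag are placeholders): with activeDeadlineSeconds set, even a Pod stuck in ImagePullBackOff is terminated and the Job marked Failed once the deadline passes, which frees the requested resources:

apiVersion: batch/v1
kind: Job
metadata:
  name: user-job-with-deadline     # placeholder name
spec:
  backoffLimit: 0                  # do not retry failed Pods
  activeDeadlineSeconds: 300       # fail the whole Job after 5 minutes, whatever state it is in
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: busybox:possibly-missing-tag   # placeholder for the user-supplied tag
        command: ["sh", "-c", "echo running"]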
You can also set the concurrencyPolicy of a CronJob to Replace, which replaces the currently running Job with a new one.
Here is an example:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/2 * * * *"
  concurrencyPolicy: Replace
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster && sleep 420
          restartPolicy: Never
Setting the Replace value for the concurrencyPolicy field means that if it is time for a new Job run and the previous Job run hasn't finished yet, the CronJob replaces the currently running Job with a new one.
Regardless of these solutions, your underlying problem lies in the wrong images, so automatically deleting Pods or Jobs doesn't solve it: if you don't change anything in the definition of the Jobs and images, your Pods will still fail after the Job is created again.
Here is an example of troubleshooting for Error: ImagePullBackOff / Normal BackOff: ImagePullBackOff.
You can use failedJobsHistoryLimit for failed Jobs and successfulJobsHistoryLimit for successful Jobs.
With these two parameters, you can keep your Job history clean.
Use .spec.backoffLimit to specify the number of retries before a Job is considered failed.
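A minimal sketch combining these settings (the name, schedule, and image are placeholders):

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: user-cronjob                # placeholder name
spec:
  schedule: "*/5 * * * *"
  successfulJobsHistoryLimit: 1     # keep only the last successful Job
  failedJobsHistoryLimit: 1         # keep only the last failed Job
  jobTemplate:
    spec:
      backoffLimit: 2               # retry a failed Pod at most twice before marking the Job failed
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: main
            image: busybox          # placeholder image
            command: ["sh", "-c", "echo run"]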