How to delete kubernetes failed/completed jobs - kubernetes

How can I automatically clean up the failed and completed pods created by a Kubernetes Job, without using a CronJob? I want to keep only the last pod created by the Job.
How can we accomplish that?

...clean up the failed and completed pods created by kubernetes job automatically without using cronjob
If you set ttlSecondsAfterFinished to the same period as the Job schedule, you should see only the last pod until the next Job starts. You can prolong the duration to keep more pods in the system this way instead of waiting until they are explicitly deleted.
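As a rough sketch of where that field goes (the Job name, image, and the one-hour TTL are placeholder assumptions, not values from the question):

apiVersion: batch/v1
kind: Job
metadata:
  name: cleanup-demo                  # hypothetical name
spec:
  ttlSecondsAfterFinished: 3600       # delete the Job and its pods 1 hour after it finishes
  template:
    spec:
      containers:
      - name: main
        image: busybox
        command: ["sh", "-c", "echo done"]
      restartPolicy: Never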

Related

kubernetes schedule job after cron

I have a CronJob named "cronX" and a Job named "JobY".
How can I configure Kubernetes to run "JobY" after "cronX" has finished?
I know I can do it using an API call from "cronX" to start "JobY", but I don't want to do that using an API call.
Is there any Kubernetes configuration to schedule this?
Is it possible that this pod will contain 2 containers and one of them will run only after the other container finishes?
Negative, more details here. If you only have 2 containers to run, you can place the first one under initContainers and the other under containers and schedule the pod; see the sketch below.
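A minimal sketch of that idea (the pod name, container names, and commands are placeholder assumptions); the init container must run to completion before the main container starts:

apiVersion: v1
kind: Pod
metadata:
  name: two-step-pod                  # hypothetical name
spec:
  initContainers:
  - name: first-step                  # runs first and must exit successfully
    image: busybox
    command: ["sh", "-c", "echo step one"]
  containers:
  - name: second-step                 # starts only after the init container finishes
    image: busybox
    command: ["sh", "-c", "echo step two"]
  restartPolicy: Never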
No built-in K8s configuration is available for workflow orchestration. You can try Argo Workflows to do this.

Set a Pod as owner reference for another pod in client go program

Is it possible to set a running pod as the owner of another pod which is to be created? I tried, but in that case pod creation fails.
This is not directly supported by Kubernetes. When you have a Pod that depends on the existence of another one (e.g. it needs a database or similar), you can use an init container. This delays the main container's start until the init container finishes, which is a good way to implement waiting conditions.
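A minimal sketch of such a waiting condition (the service name my-database and the pod/container names are placeholder assumptions):

apiVersion: v1
kind: Pod
metadata:
  name: app-with-wait                 # hypothetical name
spec:
  initContainers:
  - name: wait-for-db
    image: busybox
    # loop until the database Service name resolves in cluster DNS
    command: ["sh", "-c", "until nslookup my-database; do echo waiting for database; sleep 2; done"]
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "echo database is reachable, starting app"]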
I think you can use Kubernetes Jobs.
A Job creates one or more Pods and ensures that a specified number of them successfully terminate. As pods successfully complete, the Job tracks the successful completions. When a specified number of successful completions is reached, the task (ie, Job) is complete. Deleting a Job will clean up the Pods it created.
A simple case is to create one Job object in order to reliably run one Pod to completion. The Job object will start a new Pod if the first Pod fails or is deleted (for example due to a node hardware failure or a node reboot).
You can find more information here: jobs-kubernetes.
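For illustration, a minimal Job that runs one pod to completion might look like this (the name, command, and retry count are placeholder assumptions):

apiVersion: batch/v1
kind: Job
metadata:
  name: one-shot-task                 # hypothetical name
spec:
  backoffLimit: 3                     # retry a failed pod up to 3 times
  template:
    spec:
      containers:
      - name: task
        image: busybox
        command: ["sh", "-c", "echo doing the work && sleep 10"]
      restartPolicy: Never            # the Job controller replaces failed pods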

What is the right way to free up kubernetes resources for a kubernetes job that fails pulling the image?

Context
We have long-running Kubernetes jobs based on Docker containers.
The containers need resources (e.g. 15 GB memory, 2 CPU) and we use the autoscaler to scale up new worker nodes on request.
Scenario
Users can select the version of the Docker image to be used for a job, e.g. 1.0.0, 1.1.0, or even a commit hash of the code the image was built from in the test environment.
As we leave the Docker tag as free text, the user can type a non-existing Docker tag. Because of this, the job pod ends up in the ImagePullBackOff state. The pod stays in this state and keeps the resources locked so that they cannot be reused by any other job.
Question
What is the right solution, applicable within Kubernetes itself, for failing the pod immediately, or at least quickly, if a pull fails due to a non-existing docker image:tag?
Possibilities
I looked into backoffLimit. I have set it to 0, but this doesn't fail or remove the job, and the resources are of course kept as well.
Maybe they can be killed by a cron job. Not sure how to do so.
Ideally, resources should not even be allocated for a job with a non-existing Docker image, but I'm not sure if there is a way to easily achieve this.
Any other?
After looking at your design, I would recommend adding an init container to the Job specification to check the existence of the Docker image with the given tag.
If the image with that tag doesn't exist in the registry, the init container can report an error and fail the Job's pod by exiting with a non-zero exit code.
After that, the Job's pod will be restarted, and after a certain number of attempts the Job will reach the Failed state. By configuring the .spec.ttlSecondsAfterFinished option, failed Jobs can then be wiped out.
If a Pod’s init container fails, Kubernetes repeatedly restarts the Pod until the init container succeeds. However, if the Pod has a restartPolicy of Never, Kubernetes does not restart the Pod.
If the image exists, the init container script exits with a zero exit code, the main Job container image is pulled, and the container starts.
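A rough sketch of that check, assuming the registry is reachable from the cluster and using the skopeo tool to query it; the Job name, registry address, and image tag are placeholder assumptions, and a private registry would additionally need credentials:

apiVersion: batch/v1
kind: Job
metadata:
  name: guarded-job                   # hypothetical name
spec:
  backoffLimit: 2
  template:
    spec:
      initContainers:
      - name: check-image
        image: quay.io/skopeo/stable
        # exits non-zero (failing the pod) if the tag does not exist in the registry
        command: ["skopeo", "inspect", "docker://registry.example.com/myapp:1.1.0"]
      containers:
      - name: main
        image: registry.example.com/myapp:1.1.0     # the tag the user requested
        command: ["sh", "-c", "echo running the real workload"]
      restartPolicy: Never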
When a Job completes, no more Pods are created, but the Pods are not deleted either.
By default, a Job will run uninterrupted unless a Pod fails (restartPolicy=Never) or a Container exits in error (restartPolicy=OnFailure), at which point the Job defers to the .spec.backoffLimit described above. Once .spec.backoffLimit has been reached the Job will be marked as failed and any running Pods will be terminated.
Another way to terminate a Job is by setting an active deadline. Do this by setting the .spec.activeDeadlineSeconds field of the Job to a number of seconds. The activeDeadlineSeconds applies to the duration of the job, no matter how many Pods are created. Once a Job reaches activeDeadlineSeconds, all of its running Pods are terminated and the Job status will become type: Failed with reason: DeadlineExceeded.
Note that a Job’s .spec.activeDeadlineSeconds takes precedence over its .spec.backoffLimit. Therefore, a Job that is retrying one or more failed Pods will not deploy additional Pods once it reaches the time limit specified by activeDeadlineSeconds, even if the backoffLimit is not yet reached.
Here is more information: jobs.
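For illustration, both limits can be combined in the Job spec (the name and the specific values here are placeholder assumptions):

apiVersion: batch/v1
kind: Job
metadata:
  name: bounded-job                   # hypothetical name
spec:
  backoffLimit: 3                     # give up after 3 failed pods
  activeDeadlineSeconds: 600          # or after 10 minutes, whichever comes first
  template:
    spec:
      containers:
      - name: main
        image: busybox
        command: ["sh", "-c", "echo working"]
      restartPolicy: Never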
You can also set the concurrencyPolicy of the CronJob to Replace, which replaces the currently running job with a new job.
Here is an example:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/2 * * * *"
  concurrencyPolicy: Replace
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster && sleep 420
          restartPolicy: Never
Setting the Replace value for the concurrencyPolicy field means that if it is time for a new job run and the previous job run hasn't finished yet, the CronJob replaces the currently running job run with a new job run.
Regardless of these solutions, your underlying problem lies in the wrong images, so automatically deleting pods or jobs doesn't solve it: if you don't change anything in the definition of the jobs and images, your pods will still fail after the job is created again.
Here is an example of troubleshooting for the Error: ImagePullBackOff / Normal BackOff: ImagePullBackOff events.
You can use failedJobsHistoryLimit for failed jobs and successfulJobsHistoryLimit for successful jobs.
With these two parameters, you can keep your job history clean.
Use .spec.backoffLimit to specify the number of retries before considering a Job as failed.
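A minimal sketch of those history limits on a CronJob (the name, schedule, and values are placeholder assumptions):

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: tidy-cron                     # hypothetical name
spec:
  schedule: "*/5 * * * *"
  successfulJobsHistoryLimit: 1       # keep only the last successful Job
  failedJobsHistoryLimit: 1           # keep only the last failed Job
  jobTemplate:
    spec:
      backoffLimit: 2                 # retries before the Job is marked failed
      template:
        spec:
          containers:
          - name: main
            image: busybox
            command: ["sh", "-c", "echo periodic work"]
          restartPolicy: Never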

Scheduling a controller to run every one hour in Kubernetes

I have a console application which performs some operations when run, and I generate a Docker image of it. Now, I would like to deploy it to Kubernetes and run it every hour. Is it possible to do that in K8s?
I have read about CronJobs, but they are only offered from version 1.4.
The short answer: sure, you can do it with a CronJob, and yes, it does create a Pod. You can configure the job history limits to control how many failed and completed pods you want to keep before Kubernetes deletes them.
Note that a CronJob builds on the Job resource: it creates Job objects on a schedule.
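As a rough sketch of an hourly run with history limits (the name, image, and limit values are placeholder assumptions):

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hourly-task                   # hypothetical name
spec:
  schedule: "0 * * * *"               # at minute 0 of every hour
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: console-app
            image: myregistry/console-app:latest   # placeholder for the console application image
          restartPolicy: OnFailure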

Kubernetes POD automation - schedule a new deployment based on the successful completion of the old deployment

I'm looking for some help in scheduling deployments in Kubernetes.
I need to schedule a new deployment based on the successful completion of an old deployment. How can I achieve that in Kubernetes?
My first deployment will be running for 3-4 hours fetching some data. Once it completes, I need to delete that deployment and deploy a new one.