Is there a way to automatically remove completed Jobs besides making a CronJob to clean up completed Jobs?
The K8s Job documentation states that the intended behavior of completed Jobs is for them to remain in a completed state until manually deleted. I am running thousands of Jobs a day via CronJobs, and I don't want to keep completed Jobs around.
You can now set history limits, or disable history altogether, so that failed or successful Jobs created by a CronJob are not kept around indefinitely. See my answer here; the documentation is here.
To set the history limits:
The .spec.successfulJobsHistoryLimit and .spec.failedJobsHistoryLimit fields are optional. These fields specify how many completed and failed jobs should be kept. By default, they are set to 3 and 1 respectively. Setting a limit to 0 corresponds to keeping none of the corresponding kind of jobs after they finish.
The config with 0 limits would look like:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  successfulJobsHistoryLimit: 0
  failedJobsHistoryLimit: 0
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure
This has been possible since version 1.12 (alpha) with ttlSecondsAfterFinished. An example from Clean Up Finished Jobs Automatically:
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-with-ttl
spec:
  ttlSecondsAfterFinished: 100
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
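Note that until this feature graduates, it sits behind the TTLAfterFinished feature gate (alpha in 1.12), so the control plane has to have it enabled. A sketch of the flag as it would be passed to the kube-apiserver and kube-controller-manager:

--feature-gates=TTLAfterFinished=true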
Another way is to use a field selector:
kubectl delete jobs --field-selector status.successful=1
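The same command accepts the usual namespace flags if you only want to clean up a single namespace (my-namespace is a placeholder here):

kubectl delete jobs -n my-namespace --field-selector status.successful=1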
Either command could be executed in a CronJob, similar to the other answers.
Create a service account, something like my-sa-name
Create a role with list and delete permissions for the resource jobs
Attach the role to the service account (RoleBinding)
Create a CronJob that uses the service account to check for completed jobs and delete them
# 1. Create a service account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-sa-name
  namespace: default
---
# 2. Create a role
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: my-completed-jobs-cleaner-role
rules:
- apiGroups: ["batch"]   # Jobs live in the batch API group
  resources: ["jobs"]
  verbs: ["list", "delete"]
---
# 3. Attach the role to the service account
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: my-completed-jobs-cleaner-rolebinding
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: my-completed-jobs-cleaner-role
subjects:
- kind: ServiceAccount
  name: my-sa-name
  namespace: default
---
# 4. Create a cronjob (with a crontab schedule) using the service account to check for completed jobs
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: jobs-cleanup
spec:
  schedule: "*/30 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: my-sa-name
          containers:
          - name: kubectl-container
            image: bitnami/kubectl:latest
            # Using bitnami/kubectl because the suggested kubectl image didn't have the --field-selector option
            command: ["sh", "-c", "kubectl delete jobs --field-selector status.successful=1"]
          restartPolicy: Never
I've found the below to work
To remove failed jobs:
kubectl delete job $(kubectl get jobs | awk '$3 ~ 0' | awk '{print $1}')
To remove completed jobs:
kubectl delete job $(kubectl get jobs | awk '$3 ~ 1' | awk '{print $1}')
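Note that these patterns key off the third column of kubectl get jobs, which was SUCCESSFUL in older kubectl releases; newer releases print a COMPLETIONS column such as 1/1 instead. An adjusted sketch for the newer column layout, assuming single-completion jobs:

# delete jobs whose COMPLETIONS column reads 1/1
kubectl delete job $(kubectl get jobs --no-headers | awk '$2 == "1/1" {print $1}')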
I'm using wernight/kubectl's kubectl image.
I scheduled a cron job deleting anything that is:
completed
2 - 9 days old (so I have 2 days to review any failed jobs)
It runs every 30 minutes, so I'm not accounting for jobs that are 10+ days old.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: cleanup
spec:
  schedule: "*/30 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: kubectl-runner
            image: wernight/kubectl
            command: ["sh", "-c", "kubectl get jobs | awk '$4 ~ /[2-9]d$/ || $3 ~ 1' | awk '{print $1}' | xargs kubectl delete job"]
          restartPolicy: Never
I recently built a kubernetes-operator to do this task.
After deployment it will monitor the selected namespace and delete completed jobs/pods if they completed without errors/restarts.
https://github.com/lwolf/kube-cleanup-operator
Using jsonpath:
kubectl delete job $(kubectl get job -o=jsonpath='{.items[?(@.status.succeeded==1)].metadata.name}')
As stated in the documentation "It is up to the user to delete old jobs", see http://kubernetes.io/docs/user-guide/jobs/#job-termination-and-cleanup
I would run a pod to do this cleanup based on job name and certain conditions, thus letting kubernetes at least take care of the availability of your process here. You could run a recurring job for this (assuming you run kubernetes 1.5).
A simple way to delete them by running a cron job:
kubectl get jobs --all-namespaces | sed '1d' | awk '{ print $2, "--namespace", $1 }' | while read line; do kubectl delete jobs $line; done
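If you only want to target a specific set of jobs rather than everything in the cluster, a narrower sketch (assuming the jobs carry an identifying label such as app=my-batch, a hypothetical label, combined with the field selector shown above):

kubectl delete jobs -l app=my-batch --field-selector status.successful=1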
Related
I am trying to create a cronjob. I have written a Spring Boot application for this and have created an abc-dev.yml file for application configuration.
error: unable to recognize "src/java/k8s/abc-dev.yml": no matches for kind "CronJob" in version "apps/v1"
apiVersion: apps/v1
kind: CronJob
metadata:
  name: abc-cron-job
spec:
  schedule: "* * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          container:
          - name: abc-cron-job
            image: busybox
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure
If you are running Kubernetes 1.20 or lower, the correct apiVersion value is:
apiVersion: batch/v1beta1
If you are running Kubernetes 1.21 or higher, it's:
apiVersion: batch/v1
You can check the apiVersion of a resource with the kubectl api-resources command.
In this case:
kubectl api-resources | grep cronjob | awk -v N=3 '{print $N}'
The output is 'batch/v1'.
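Putting that together, a corrected sketch of the manifest from the question (note also that the pod spec field is containers, not container), assuming Kubernetes 1.21+:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: abc-cron-job
spec:
  schedule: "* * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: abc-cron-job
            image: busybox
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure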
CronJob belongs to the batch/v1 Kubernetes API. You should check the API version before creating resources, as it sometimes changes.
https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/
Say that I have a job history limit > 1, is there a way to use kubectl to find which jobs have been spawned by a CronJob?
Use a label:
$ kubectl get jobs -n namespace -l created-by=cronjob
The created-by=cronjob label is whatever you define in your CronJob:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "* * * * *"
  jobTemplate:
    metadata:
      labels:
        created-by: cronjob
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure
Ideally you would use the --field-selector flag to get all Jobs belonging to a particular CronJob, but field selectors do not support all fields. The best way I could find is using jsonpath:
Substitute CronJobName with the name of your CronJob to get all Jobs that belong to it:
kubectl get jobs $(kubectl get jobs -o=jsonpath='{.items[?(@.metadata.ownerReferences[*].name=="CronJobName")].metadata.name}')
I am running a kubernetes job on GKE and want to delete the job automatically after the job is completed.
Here is my configuration file for the job.
I set ttlSecondsAfterFinished: 0 but the job was not deleted automatically.
Am I missing something?
cluster / node version: 1.12.8-gke.10
apiVersion: batch/v1
kind: Job
metadata:
  name: myjob
spec:
  # automatically clean up finished job
  ttlSecondsAfterFinished: 0
  template:
    metadata:
      name: myjob
    spec:
      containers:
      - name: myjob
        image: gcr.io/GCP_PROJECT/myimage:COMMIT_SHA
        command: ["bash"]
        args: ["deploy.sh"]
      # Do not restart containers after they exit
      restartPolicy: Never
It looks like this feature is not available on GKE yet.
https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/
https://cloud.google.com/kubernetes-engine/docs/concepts/alpha-clusters#about_feature_stages
To ensure stability and production quality, normal GKE clusters only enable features that
are beta or higher. Alpha features are not enabled on normal clusters because they are not
production-ready or upgradeable.
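If you need the TTL controller on GKE before it reaches beta, one option is an alpha cluster, which enables all Kubernetes alpha feature gates. A sketch (cluster name and zone are placeholders; alpha clusters cannot be upgraded and are automatically deleted after 30 days):

gcloud container clusters create my-alpha-cluster \
    --enable-kubernetes-alpha \
    --zone us-central1-a \
    --no-enable-autorepair \
    --no-enable-autoupgrade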
It depends on how you created the job.
If you are using a CronJob, you can set spec.successfulJobsHistoryLimit and spec.failedJobsHistoryLimit to 0. This tells Kubernetes not to store any previously finished jobs.
If you are creating Jobs directly from YAML, you have to delete them manually. However, you can also set up a CronJob to run a cleanup command every 5 minutes:
kubectl delete job $(kubectl get job -o=jsonpath='{.items[?(@.status.succeeded==1)].metadata.name}')
It will delete all jobs with status succeeded. A full setup with the RBAC objects and the cleanup CronJob could look like this:
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: jp-runner
rules:
- apiGroups: ["batch"]
  resources: ["jobs"]
  verbs: ["get", "list", "delete"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: jp-runner
subjects:
- kind: ServiceAccount
  name: sa-jp-runner
roleRef:
  kind: Role
  name: jp-runner
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sa-jp-runner
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: clean-jobs
spec:
  concurrencyPolicy: Forbid
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: sa-jp-runner
          containers:
          - name: clean-jobs
            image: bitnami/kubectl:latest
            command:
            - /bin/sh
            - -c
            - kubectl delete jobs $(kubectl get jobs -o=jsonpath='{.items[?(@.status.succeeded==1)].metadata.name}')
          restartPolicy: Never
      backoffLimit: 0
We recently started using Istio to establish a service mesh within our Kubernetes landscape.
We now have the problem that jobs and cronjobs do not terminate and keep running forever if we inject the istio-proxy sidecar container into them. The istio-proxy should be injected, though, to establish proper mTLS connections to the services the job needs to talk to and to comply with our security regulations.
I also noticed the open issues within Istio (istio/issues/6324) and Kubernetes (kubernetes/issues/25908), but neither seems to provide a valid solution anytime soon.
At first a pre-stop hook seemed suitable to solve this issue, but there is some confusion about this concept itself: kubernetes/issues/55807
lifecycle:
  preStop:
    exec:
      command:
        ...
Bottom line: those hooks will not be executed if the container completed successfully.
There are also some relatively new projects on GitHub trying to solve this with a dedicated controller (which I think is the most preferable approach), but to our team they do not feel mature enough to put straight into production:
k8s-controller-sidecars
K8S-job-sidecar-terminator
In the meantime, we ourselves ended up with the following workaround that execs into the sidecar and sends a SIGTERM signal, but only if the main container finished successfully:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: terminate-sidecar-example-service-account
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: terminate-sidecar-example-role
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "delete"]
- apiGroups: [""]
  resources: ["pods/exec"]
  verbs: ["create"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: terminate-sidecar-example-rolebinding
subjects:
- kind: ServiceAccount
  name: terminate-sidecar-example-service-account
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: terminate-sidecar-example-role
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: terminate-sidecar-example-cronjob
  labels:
    app: terminate-sidecar-example
spec:
  schedule: "30 2 * * *"
  jobTemplate:
    metadata:
      labels:
        app: terminate-sidecar-example
    spec:
      template:
        metadata:
          labels:
            app: terminate-sidecar-example
          annotations:
            sidecar.istio.io/inject: "true"
        spec:
          serviceAccountName: terminate-sidecar-example-service-account
          containers:
          - name: ****
            image: ****
            command:
            - "/bin/ash"
            - "-c"
            args:
            - node index.js && kubectl exec -n ${POD_NAMESPACE} ${POD_NAME} -c istio-proxy -- bash -c "sleep 5 && /bin/kill -s TERM 1 &"
            env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          restartPolicy: Never   # Job pod templates must use Never or OnFailure, not the default Always
So, the ultimate question to all of you is: Do you know of any better workaround, solution, controller, ... that would be less hacky / more suitable to terminate the istio-proxy container once the main container finished its work?
- command:
  - /bin/sh
  - -c
  - |
    until curl -fsI http://localhost:15021/healthz/ready; do echo \"Waiting for Sidecar...\"; sleep 3; done;
    echo \"Sidecar available. Running the command...\";
    <YOUR_COMMAND>;
    x=$(echo $?); curl -fsI -X POST http://localhost:15020/quitquitquit && exit $x
Update: the sleep loop can be omitted if holdApplicationUntilProxyStarts is set to true (globally or as an annotation), starting with Istio 1.7.
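For reference, a per-workload way to set that option is the proxy config annotation on the job's pod template. A sketch using the standard Istio annotation (available since 1.7):

annotations:
  proxy.istio.io/config: |
    holdApplicationUntilProxyStarts: true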
This was not a misconfiguration, this was a bug in upstream Kubernetes. As of September of 2019, this has been resolved by Istio by introducing a /quitquitquit endpoint to the Pilot agent.
Unfortunately, Kubernetes has not been so steadfast in solving this issue themselves. So it still does exist in some facets. However, the /quitquitquit endpoint in Istio should have resolved the problem for this specific use case.
I have found a workaround by editing the istio-sidecar-injector ConfigMap, as per the Istio documentation:
https://istio.io/docs/setup/additional-setup/sidecar-injection/
apiVersion: v1
kind: ConfigMap
metadata:
  name: istio-sidecar-injector
data:
  config: |-
    policy: enabled
    neverInjectSelector:
      - matchExpressions:
        - {key: job-name, operator: Exists}
With this change, the sidecar is not injected into any Job's pods, so no Istio policy is applied to them; in our case we don't want any policy enforced by Istio for these jobs.
Note: job-name is a label that is added to the pods by default when a Job creates them.
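If you only want to skip injection for a single CronJob rather than for all Jobs cluster-wide, an alternative sketch is the standard per-pod annotation on the CronJob's pod template:

jobTemplate:
  spec:
    template:
      metadata:
        annotations:
          sidecar.istio.io/inject: "false"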
For those for whom curl is a luxury, here is my wget version of Dimitri's code:
command:
- /bin/sh
- -c
- |
  until wget -q --spider http://127.0.0.1:15021/healthz/ready 2>/dev/null; do echo "Waiting for Istio sidecar..."; sleep 3; done;
  echo \"Sidecar available. Running...\";
  <COMMAND>;
  x=$?; wget -q --post-data='' -S -O /dev/null http://127.0.0.1:15020/quitquitquit && exit $x
Is it possible to restart pods automatically based on the time?
For example, I would like to restart the pods of my cluster every morning at 8.00 AM.
Use a CronJob, but not to run your pods; rather, to schedule a Kubernetes API command that restarts the deployment every day (kubectl rollout restart). That way, if something goes wrong, the old pods will not be down or removed.
Rollouts create new ReplicaSets and wait for them to be up before killing off old pods and rerouting the traffic, so the service continues uninterrupted.
You have to set up RBAC so that the Kubernetes client running from inside the cluster has permission to make the needed calls to the Kubernetes API.
---
# Service account the client will use to reset the deployment,
# by default the pods running inside the cluster can do no such things.
kind: ServiceAccount
apiVersion: v1
metadata:
  name: deployment-restart
  namespace: <YOUR NAMESPACE>
---
# allow getting status and patching only the one deployment you want
# to restart
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deployment-restart
  namespace: <YOUR NAMESPACE>
rules:
- apiGroups: ["apps", "extensions"]
  resources: ["deployments"]
  resourceNames: ["<YOUR DEPLOYMENT NAME>"]
  verbs: ["get", "patch", "list", "watch"] # "list" and "watch" are only needed
                                           # if you want to use `rollout status`
---
# bind the role to the service account
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: deployment-restart
  namespace: <YOUR NAMESPACE>
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: deployment-restart
subjects:
- kind: ServiceAccount
  name: deployment-restart
  namespace: <YOUR NAMESPACE>
And the cronjob specification itself:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: deployment-restart
  namespace: <YOUR NAMESPACE>
spec:
  concurrencyPolicy: Forbid
  schedule: '0 8 * * *' # cron spec of time, here, 8 o'clock
  jobTemplate:
    spec:
      backoffLimit: 2 # this has very low chance of failing, as all this does
                      # is prompt kubernetes to schedule new replica set for
                      # the deployment
      activeDeadlineSeconds: 600 # timeout, makes most sense with
                                 # "waiting for rollout" variant specified below
      template:
        spec:
          serviceAccountName: deployment-restart # name of the service
                                                 # account configured above
          restartPolicy: Never
          containers:
          - name: kubectl
            image: bitnami/kubectl # probably any kubectl image will do,
                                   # optionally specify version, but this
                                   # should not be necessary, as long as the
                                   # version of kubectl is new enough to
                                   # have `rollout restart`
            command:
            - 'kubectl'
            - 'rollout'
            - 'restart'
            - 'deployment/<YOUR DEPLOYMENT NAME>'
Optionally, if you want the cronjob to wait for the deployment to roll out, change the cronjob command to:
command:
- bash
- -c
- >-
  kubectl rollout restart deployment/<YOUR DEPLOYMENT NAME> &&
  kubectl rollout status deployment/<YOUR DEPLOYMENT NAME>
Another quick and dirty option for a pod that has a restart policy of Always (which cron jobs are not supposed to handle; see creating a cron job spec pod template) is a livenessProbe that simply tests the time and restarts the pod on a specified schedule.
For example: after startup, wait an hour, then check the hour every minute; if the hour is 3 (AM), fail the probe and restart, otherwise pass.
livenessProbe:
  exec:
    command:
    - /bin/sh   # run via a shell so the $(...) substitutions work
    - -c
    - exit $(test $(date +%H) -eq 3 && echo 1 || echo 0)
  failureThreshold: 1
  initialDelaySeconds: 3600
  periodSeconds: 60
Time granularity is up to how you return the date and test ;)
Of course this does not work if you are already utilizing the liveness probe as an actual liveness probe ¯\_(ツ)_/¯
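For example, to restart at a specific minute rather than at any point within the 3 AM hour, the probe could compare hours and minutes together. A sketch in the same style (the 03:30 value is just an illustration; with a 60-second period the probe samples each minute once):

livenessProbe:
  exec:
    command:
    - /bin/sh
    - -c
    - exit $(test "$(date +%H:%M)" = "03:30" && echo 1 || echo 0)
  failureThreshold: 1
  initialDelaySeconds: 3600
  periodSeconds: 60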
I borrowed the idea from @Ryan Lowe but modified it a bit. It will restart pods older than 24 hours:
livenessProbe:
  exec:
    command:
    - /bin/sh
    - -c
    - "end=$(date -u +%s);start=$(stat -c %Z /proc/1 | awk '{print int($1)}'); test $(($end-$start)) -lt 86400"
There's a specific resource for that: CronJob
Here is an example:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: your-cron
spec:
  schedule: "*/20 8-19 * * 1-5"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            app: your-periodic-batch-job
        spec:
          containers:
          - name: my-image
            image: your-image
            imagePullPolicy: IfNotPresent
          restartPolicy: OnFailure
Change spec.concurrencyPolicy to Replace if you want to replace the old pod when starting a new one. With Forbid, creation of the new pod is skipped if the old pod is still running.
According to cronjob-in-kubernetes-to-restart-delete-the-pod-in-a-deployment, you could create a kind: CronJob with a jobTemplate that has containers. Your CronJob would then start those containers with an activeDeadlineSeconds of one day (until the restart). Per your example, the schedule would be 0 8 * * * for 8:00 AM.
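A minimal sketch of that approach (the name and image are placeholders; activeDeadlineSeconds caps each run at one day):

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: daily-worker
spec:
  schedule: "0 8 * * *"        # every day at 8:00 AM
  concurrencyPolicy: Replace   # a new run supersedes the previous day's pod
  jobTemplate:
    spec:
      activeDeadlineSeconds: 86400   # kill the pod after at most 24 hours
      template:
        spec:
          containers:
          - name: worker
            image: your-image
          restartPolicy: Never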
We were able to do this by modifying the deployment's manifest (passing a random param every 3 hours) from a CRON job.
We specifically used Spinnaker for triggering deployments: we created a CRON job in Spinnaker with a Patch Manifest stage that applies the change (configuration screenshots omitted).
Kubernetes restarts the Pods whenever the YAML changes. Since all pods could otherwise restart at the same time and cause downtime, we use a rolling-update strategy where maxUnavailable is 0%:
spec:
  # replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 50%
      maxUnavailable: 0%
This spawns new pods and then terminates old ones.