I have a job definition based on an example from the Kubernetes website.
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-with-timeout-6
spec:
  activeDeadlineSeconds: 30
  completions: 1
  parallelism: 1
  template:
    metadata:
      name: pi
    spec:
      containers:
      - name: pi
        image: perl
        command: ["sh", "-c", "exit 1"] # "exit" is a shell builtin, so it must run via a shell
      restartPolicy: Never
I would like to run this job once and not have it restart if it fails. With the command exit 1, Kubernetes keeps creating new pods, trying to get exit code 0, until the activeDeadlineSeconds timeout is reached. How can I avoid that? I would like to run build commands in Kubernetes to check compilation; if compilation fails I'll get a non-zero exit code, and I don't want the compilation to run again.
Is this possible? How?
This is now possible by setting backoffLimit: 0, which tells the controller to perform no retries (the default is 6).
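Applied to the Job above, a minimal sketch (backoffLimit is a standard batch/v1 Job field; the rest mirrors the asker's definition):
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-with-timeout-6
spec:
  backoffLimit: 0 # fail the Job after the first failed pod, no retries
  activeDeadlineSeconds: 30
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["sh", "-c", "exit 1"]
      restartPolicy: Never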
If you want a one-try command runner, you should probably create a bare Pod, because a Job will keep trying to execute the command until it succeeds or the active deadline is met.
Just create the pod from your template:
apiVersion: v1
kind: Pod
metadata:
  name: pi
spec:
  containers:
  - name: pi
    image: perl
    command: ["sh", "-c", "exit 1"]
  restartPolicy: Never
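You can then read the container's exit code from the pod status, for example with a standard jsonpath query:
kubectl get pod pi -o jsonpath='{.status.containerStatuses[0].state.terminated.exitCode}'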
Sadly, there is currently no way to prevent the job controller from simply respawning new pods when they fail, but the Kubernetes community is working on a solution; see:
"Backoff policy and failed pod limit" https://github.com/kubernetes/community/pull/583
Related
When deploying NebulaGraph from binary packages (RPM/DEB), I could leverage logrotate from the OS, which is the basic, expected solution for cleaning up the generated logs.
In a K8s deployment, however, there is no such OS-level layer anymore. What is the state-of-the-art thing to do here, or is this a missing piece in Nebula-Operator?
I think we could also attach the log dir to a pod running logrotate, but that doesn't look elegant to me (or am I wrong?).
After some study, I think the best way could be to leverage what the K8s CronJob API provides.
We could create it like this:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: log-cleanup
spec:
  schedule: "0 0 * * *" # run the job every day at midnight
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: log-cleanup
            image: your-log-cleanup-image:latest
            command: ["/bin/sh", "-c", "./cleanup.sh /path/to/log"]
          restartPolicy: OnFailure
And in cleanup.sh we could put either simple log-removal logic or log-archiving logic (say, moving the logs to S3).
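For example, a minimal sketch of what cleanup.sh could contain, assuming the logs are plain files and the 7-day retention window is just a placeholder:
#!/bin/sh
# Delete log files older than 7 days under the directory passed as $1.
LOG_DIR="${1:?usage: cleanup.sh /path/to/log}"
find "$LOG_DIR" -type f -name '*.log' -mtime +7 -delete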
Kubernetes doesn't delete a manually created, completed Job when a history limit is set, when using newer versions of the Kubernetes client.
mycron.yaml:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
  namespace: myjob
spec:
  schedule: "* * 10 * *"
  successfulJobsHistoryLimit: 0
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure
Creating the CronJob:
kubectl create -f mycron.yaml
Creating a Job manually:
kubectl create job -n myjob --from=cronjob/hello hello-job
Result:
The Job is completed but not removed:
NAME COMPLETIONS DURATION AGE
hello-job 1/1 2s 6m
Tested with Kubernetes server and client versions 1.19.3 and 1.20.0.
However, when I used an older client version (1.15.5) against the server's 1.19/1.20, it worked well.
Comparing the differences when using the different client versions:
kubernetes-controller log:
Using client v1.15.5, I have this line in the log (but it is missing when using client v1.19/1.20):
1 event.go:291] "Event occurred" object="myjob/hello" kind="CronJob" apiVersion="batch/v1beta1" type="Normal" reason="SuccessfulDelete" message="Deleted job hello-job"
Job YAML:
Exactly the same, except for the ownerReferences part:
For client v1.19/1.20
ownerReferences:
- apiVersion: batch/v1beta1
  kind: CronJob
  name: hello
  uid: bb567067-3bd4-4e5f-9ca2-071010013727
For client v1.15
ownerReferences:
- apiVersion: apps/v1
  blockOwnerDeletion: true
  controller: true
  kind: CronJob
  name: hello
  uid: bb567067-3bd4-4e5f-9ca2-071010013727
And that is it. No other information in the logs, no errors, no warnings... nothing (I checked all the pod logs in kube-system).
Summary:
It seems to be a bug in the kubectl client itself, not in the Kubernetes server, but I don't know how to proceed further.
Edit:
When I let the CronJob create the job itself (i.e., when the schedule expression fires), it removes the completed job successfully.
I have just started with Kubernetes.
I need to run a Deployment in Kubernetes with a container that completes execution after ~10-15 minutes.
When I tried it, restartPolicy: Never does not hold true for Deployments.
The reason for opting for a Deployment is to use replicas.
Please provide your input on how I can achieve multiple replicas of my Deployment with a container that completes execution and does not keep running.
You can run a Job as below, where the container runs a sleep command for 15 minutes. After 15 minutes the container exits and the pod is terminated.
apiVersion: batch/v1
kind: Job
metadata:
  name: job
spec:
  template:
    spec:
      containers:
      - name: job # a container name is required
        command:
        - sh
        - -c
        - sleep 15m
        image: bash:5.1.0
      restartPolicy: Never
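Since the question asks for replicas, note that a Job can also run several pods at once via the standard completions and parallelism fields; a minimal sketch (the name run-in-parallel is just a placeholder):
apiVersion: batch/v1
kind: Job
metadata:
  name: run-in-parallel
spec:
  completions: 3 # run 3 pods to completion in total
  parallelism: 3 # run them at the same time
  template:
    spec:
      containers:
      - name: worker
        image: bash:5.1.0
        command: ["sh", "-c", "sleep 15m"]
      restartPolicy: Never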
My YAML file:
apiVersion: batch/v1
kind: Job
metadata:
  name: auto
  labels:
    app: auto
spec:
  backoffLimit: 5
  activeDeadlineSeconds: 100
  template:
    metadata:
      labels:
        app: auto
    spec:
      containers:
      - name: auto
        image: busybox
        imagePullPolicy: Always
        ports:
        - containerPort: 9080
      imagePullSecrets:
      - name: imageregistery
      restartPolicy: Never
The pods are killed appropriately, but the Job itself is not removed after the 100 seconds.
Is there anything we could do to delete the Job once the container/pod's work is completed?
kubectl version --short
Client Version: v1.6.1
Server Version: v1.13.10+IKS
kubectl get jobs --namespace abc
NAME DESIRED SUCCESSFUL AGE
auto 1 1 26m
Thank you,
The default way to delete jobs after they are done is to use the kubectl delete command.
As mentioned by @Erez:
Kubernetes is keeping pods around so you can get the logs, configuration, etc. from them.
If you don't want to do that manually, you could write a script, running in your cluster, that checks for jobs with Completed status and then deletes them.
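A minimal sketch of such a cleanup step, assuming the abc namespace from the question (the jsonpath filter picks jobs whose status.succeeded count is 1):
kubectl get jobs -n abc -o jsonpath='{range .items[?(@.status.succeeded==1)]}{.metadata.name}{"\n"}{end}' \
  | xargs -r kubectl delete job -n abc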
Another way would be to use the TTL feature, which deletes jobs automatically a specified number of seconds after they finish. If you set it to zero, they are cleaned up immediately after completion. For more details on how to set it up, look here.
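Applied to the Job from the question, a minimal sketch using the standard ttlSecondsAfterFinished field (on clusters as old as the asker's 1.13 server this requires the TTLAfterFinished feature gate to be enabled):
apiVersion: batch/v1
kind: Job
metadata:
  name: auto
spec:
  ttlSecondsAfterFinished: 0 # delete the Job as soon as it finishes
  backoffLimit: 5
  activeDeadlineSeconds: 100
  template:
    spec:
      containers:
      - name: auto
        image: busybox
      restartPolicy: Never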
Please let me know if that helped.
What I would like to do is run some backup scripts on each of the Kubernetes nodes periodically. I want it to run inside the Kubernetes cluster, in contrast to just adding the script to each node's crontab, because I will store the backup on a volume mounted to the node by Kubernetes. The volume differs between configurations: it could be a CIFS filesystem mounted by a Flex plugin, or an awsElasticBlockStore.
It would be perfect if a CronJob were able to template a DaemonSet (instead of fixing it as a jobTemplate), and if it were possible to set the DaemonSet restart policy to OnFailure.
I would like to avoid defining n different CronJobs, one for each of the n nodes, tied together by nodeSelectors, since that would be inconvenient to maintain in an environment where the node count changes dynamically.
From what I can see, the problem was discussed here without any clear conclusion: https://github.com/kubernetes/kubernetes/issues/36601
Maybe you have some hacks or tricks to achieve this?
You can use a DaemonSet with a bash script along the following lines:
#!/bin/bash
# Poor man's cron: loop forever, run the work once a day
# inside the 23:00-23:05 window, and report failures.
while :; do
    currenttime=$(date +%H:%M)
    if [[ "$currenttime" > "23:00" ]] && [[ "$currenttime" < "23:05" ]]; then
        do_something
        test "$?" -gt 0 && notify_failed_job
        sleep 300 # sleep past the window so the work runs only once per day
    else
        sleep 60
    fi
done
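A minimal sketch of the DaemonSet wrapping such a script; the image name, script path, and hostPath volume are placeholders for whatever your backup actually needs:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-backup
spec:
  selector:
    matchLabels:
      app: node-backup
  template:
    metadata:
      labels:
        app: node-backup
    spec:
      containers:
      - name: backup
        image: your-backup-image:latest
        command: ["/bin/bash", "/scripts/backup-loop.sh"]
        volumeMounts:
        - name: backup-target
          mountPath: /backup
      volumes:
      - name: backup-target
        hostPath:
          path: /mnt/backup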
I know I am late to the party.
First option:
Use parallelism to run multiple Job pods, with topologySpreadConstraints to spread/schedule the pods across all the nodes.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: mycronjob
  labels:
    jobgroup: parallel
spec:
  schedule: "*/5 * * * *"
  successfulJobsHistoryLimit: 0
  failedJobsHistoryLimit: 0
  concurrencyPolicy: Allow
  jobTemplate:
    spec:
      parallelism: 5
      template:
        metadata:
          name: kubejob
          labels:
            jobgroup: parallel
        spec:
          topologySpreadConstraints:
          - maxSkew: 2
            topologyKey: kubernetes.io/hostname
            whenUnsatisfiable: DoNotSchedule
            labelSelector:
              matchLabels:
                jobgroup: parallel
          containers:
          - name: mycron-container
            image: alpine
            imagePullPolicy: IfNotPresent
            command: ['sh', '-c', 'echo Job Pod is Running ; sleep 10']
          restartPolicy: OnFailure
          terminationGracePeriodSeconds: 0
Second option:
Using a CronJob, you can apply the YAML template of a DaemonSet and delete it after a certain duration, which effectively works as a job running on all nodes. Also, if a custom Docker image runs inside the DaemonSet, its container can simply exit once it is done with its work.
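For instance, a minimal sketch of what the CronJob's container could run, assuming its service account has RBAC permissions to manage DaemonSets and that the manifest path is a placeholder:
#!/bin/sh
# Roll the DaemonSet out to every node, give it time to finish, then remove it.
kubectl apply -f /manifests/backup-daemonset.yaml
sleep 600
kubectl delete -f /manifests/backup-daemonset.yaml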
Extra:
I would also suggest checking out this CRD: https://github.com/AmitKumarDas/metac/tree/master/examples/daemonjob
Read more about CRDs: https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/
You can write your own custom resource and add it to Kubernetes.