How to verify a cronjob successfully completed in Kubernetes

I am trying to create a cronjob that runs the command date in a single busybox container. The command should run every minute and must complete within 17 seconds or be terminated by Kubernetes. The cronjob name and container name should both be hello.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  jobTemplate:
    metadata:
      name: hello
    spec:
      completions: 1
      activeDeadlineSeconds: 17
      template:
        metadata:
          creationTimestamp: null
        spec:
          containers:
          - image: busybox
            name: hello
            command: ["/bin/sh","-c","date"]
            resources: {}
          restartPolicy: OnFailure
  schedule: '*/1 * * * *'
status: {}
I want to verify that the job executed successfully at least once.
I tried it using the command k get cronjob -w which gives me this result.
Is there another way to verify that the job executed successfully? And is adding the date command to the container a good way to do this?

A CronJob internally creates a Job, which in turn creates a Pod. Watch for the job that gets created by the CronJob:
kubectl get jobs --watch
The output is similar to this:
NAME               COMPLETIONS   DURATION   AGE
hello-4111706356   0/1                      0s
hello-4111706356   0/1           0s         0s
hello-4111706356   1/1           5s         5s
And you can see the number of COMPLETIONS
# Replace "hello-4111706356" with the job name in your system
pods=$(kubectl get pods --selector=job-name=hello-4111706356 --output=jsonpath='{.items[*].metadata.name}')
Check the pod logs
kubectl logs $pods
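
If you prefer a one-shot check over watching, a small sketch (reusing the hello-4111706356 job name from above; replace it with the job name on your system):

# Prints the number of successfully completed pods (1 once the job has succeeded)
kubectl get job hello-4111706356 -o jsonpath='{.status.succeeded}'

# Or block until the job reports the Complete condition (non-zero exit if the timeout expires first)
kubectl wait --for=condition=complete job/hello-4111706356 --timeout=60s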

You can check the logs of the pods created by the cronjob resource. Have a look at this question and let me know if this solves your query.

You can directly check the status of the jobs. A CronJob just controls a Kubernetes Job.
Run kubectl get jobs and it will give you the completion status.
> kubectl get jobs
NAME        COMPLETIONS   DURATION   AGE
datee-job   0/1 of 3      24m        24m

Related

How to make a Kubernetes job fail when it cannot find the pod image?

When a container image is not present on the cluster, the pod fails with the error ErrImageNeverPull, but the job never fails. Is there a configuration that I can add to make sure the job fails if the pod startup fails?
apiVersion: batch/v1
kind: Job
metadata:
  name: image-not-present
spec:
  backoffLimit: 0
  ttlSecondsAfterFinished: 120
  template:
    spec:
      serviceAccountName: consolehub
      containers:
      - name: image-not-present
        image: aipaintr/image_not_present:latest
        imagePullPolicy: Never
      restartPolicy: OnFailure
You can configure activeDeadlineSeconds for this case. However, you need to know how long your job takes to reach the Complete status, so that this timeout does not kill a pod that is still processing.
From the documentation:
The activeDeadlineSeconds applies to the duration of the job, no matter how many Pods are created. Once a Job reaches activeDeadlineSeconds, all of its running Pods are terminated and the Job status will become type: Failed with reason: DeadlineExceeded.
For example, I created a job with a wrong image and activeDeadlineSeconds: 100. As expected, the pod got stuck in the Pending status because of the wrong image (visible in kubectl describe pod).
After 100 seconds, the Job had failed and the pod was killed as well (visible in kubectl describe job).
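
For reference, a minimal sketch of the Job manifest from the question with the field added (the 100-second value is only an example and should be tuned to how long the job normally runs):

apiVersion: batch/v1
kind: Job
metadata:
  name: image-not-present
spec:
  backoffLimit: 0
  activeDeadlineSeconds: 100   # the Job becomes Failed with reason DeadlineExceeded after 100s
  ttlSecondsAfterFinished: 120
  template:
    spec:
      serviceAccountName: consolehub
      containers:
      - name: image-not-present
        image: aipaintr/image_not_present:latest
        imagePullPolicy: Never
      restartPolicy: OnFailure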

How can the pod reflect the application exit code of a K8S Job

I'm running a K8S job, with the following flags:
apiVersion: batch/v1
kind: Job
metadata:
  name: my-EP
spec:
  template:
    metadata:
      labels:
        app: EP
    spec:
      restartPolicy: "Never"
      containers:
      - name: EP
        image: myImage
The Job starts, runs my script that runs some application that sends me an email and then terminates. The application returns the exit code to the bash script.
When I run the command kubectl get pods, I get the following:
NAME          READY   STATUS      RESTARTS   AGE
my-EP-94rh8   0/1     Completed   0          2m2s
Sometimes there are issues, e.g. the network is not connected or no license is available.
I would like that to be visible to the pod user.
My question is, can I propagate the script exit code to be seen when I run the above get pods command?
I.e. instead of the "Completed" status, I would like to see my application exit code: 0, 1, 2, 3...
Or maybe there is a way to see it in the Pods Statuses, in the describe command?
Currently I see:
Pods Statuses: 0 Running / 1 Succeeded / 0 Failed
Is this possible?
A non-zero exit code on a k8s job will put the pod into the Failed status. There really isn't a way for you to have the exit code shown with kubectl get pods, but you could output the pod status with -o json and then pipe it into jq, looking for the exit code. Something like the following from this post might work:
kubectl get pod pod_name -n namespace -o json | jq '.status.containerStatuses[].state.terminated.exitCode'
Or this, with the items[] in the json:
kubectl get pods -o json | jq '.items[].status.containerStatuses[].state.terminated.exitCode'
Alternatively, as u/blaimi mentioned, you can do it without jq, like this:
kubectl get pod pod_name -o jsonpath --template='{.status.containerStatuses[*].state.terminated.exitCode}'
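
If you want the exit codes for all pods of a particular Job rather than a single pod, a sketch using the job-name label that the Job controller adds to the pods it creates (the my-EP job name is taken from the question):

kubectl get pods --selector=job-name=my-EP -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].state.terminated.exitCode}{"\n"}{end}'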

Kubernetes doesn't remove completed jobs for a Cronjob

Kubernetes doesn't delete a manually created, completed job when a history limit is set and a newer version of the Kubernetes client is used.
mycron.yaml:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
  namespace: myjob
spec:
  schedule: "* * 10 * *"
  successfulJobsHistoryLimit: 0
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure
Creating cronjob:
kubectl create -f mycron.yaml
Creating job manually:
kubectl create job -n myjob --from=cronjob/hello hello-job
Result:
Job is completed but not removed
NAME        COMPLETIONS   DURATION   AGE
hello-job   1/1           2s         6m
Tested with kubernetes server+client versions of 1.19.3 and 1.20.0
However, when I used an older client version (1.15.5) against the 1.19/1.20 server, it worked well.
Comparing the differences while using different client versions:
kubernetes-controller log:
Using client v1.15.5 I have this line in the log (but it is missing when using client v1.19/1.20):
1 event.go:291] "Event occurred" object="myjob/hello" kind="CronJob" apiVersion="batch/v1beta1" type="Normal" reason="SuccessfulDelete" message="Deleted job hello-job"
Job yaml:
Exactly the same, except the ownerReference part:
For client v1.19/1.20:
ownerReferences:
- apiVersion: batch/v1beta1
  kind: CronJob
  name: hello
  uid: bb567067-3bd4-4e5f-9ca2-071010013727
For client v1.15:
ownerReferences:
- apiVersion: apps/v1
  blockOwnerDeletion: true
  controller: true
  kind: CronJob
  name: hello
  uid: bb567067-3bd4-4e5f-9ca2-071010013727
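
To reproduce this comparison on your own cluster, the ownerReferences of the manually created job can be dumped with something like this (job and namespace names taken from above):

kubectl get job hello-job -n myjob -o jsonpath='{.metadata.ownerReferences}'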
And that is it. No other information in the logs, no errors, no warnings, nothing (I checked all the pod logs in kube-system).
Summary:
It seems to be a bug in the kubectl client itself, not in the Kubernetes server, but I don't know how to proceed further.
Edit:
When I let the cronjob itself create the job (i.e. when the scheduled time in the cron expression is hit), it removes the completed job successfully.

How can I set a Kubernetes Cronjob to run at a specific time

When I set the CronJob schedule to */1 * * * *, it works.
When I set any number from 0-59 as the crontab minute, such as 30 * * * *, it works as well.
However, when I set the CronJob schedule to 30 11 * * *, it doesn't even create a job at 11:30.
The full config follows:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "33 11 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello-cronjob
            image: busybox
            command: ["bash","-c","date;echo Hello from the Kubernetes cluster"]
          restartPolicy: OnFailure
This is probably because your cluster is running in a different timezone than the one you are using.
You can check what timezone is set in a Pod using:
kubectl run -i --tty busybox --image=busybox --restart=Never -- date
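
For example, if the date output above shows the cluster is running on UTC and your local timezone is UTC+2 (an assumed offset, purely for illustration), a job meant for 11:30 local time would be scheduled two hours earlier:

schedule: "30 9 * * *"   # 09:30 UTC, i.e. 11:30 in a UTC+2 timezone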
As for your yaml it looks good; there is no need to change anything with the spec.schedule value.
A small hint that might be helpful to you: check the logs from the Jobs.
When a CronJob hits its schedule it will spawn a Job; you can see them using kubectl get jobs.
$ kubectl get jobs
NAME               DESIRED   SUCCESSFUL   AGE
hello-1552390680   1         1            7s
If you take the name of that job, hello-1552390680, and use it to store the pod name in a variable, you can check the logs from that job.
$ pods=$(kubectl get pods --selector=job-name=hello-1552390680 --output=jsonpath={.items..metadata.name})
You can later check logs:
$ kubectl logs $pods
Tue Mar 12 11:38:04 UTC 2019
Hello from the Kubernetes cluster
Try this expression once and test the result: 0 30 11 1/1 * ? *
It was generated with http://www.cronmaker.com/. Note that this is a Quartz-style expression; Kubernetes CronJobs expect the standard five-field cron format.

Why do pods with completed status still show up in kubectl get pods?

I have executed the samples from the book "Kubernetes Up and Running" where a pod with a work queue is run, then a k8s job is created with 5 pods to consume all the work on the queue. I have reproduced the yaml api objects below.
My expectation is that once a k8s job completes, its pods would be deleted, but kubectl get pods -o wide shows the pods are still around, even though it reports 0/1 containers ready and they still seem to have IP addresses assigned; see the output below.
When will completed job pods be removed from the output of kubectl get pods, and why is that not right after all the containers in the pod finish?
Are the pods consuming any resources when they complete, like an IP address, or is the printed info historical?
Output from kubectl after all the pods have consumed all the messages:
kubectl get pods -o wide
NAME              READY   STATUS      RESTARTS   AGE   IP           NODE
consumers-bws9f   0/1     Completed   0          6m    10.32.0.35   gke-cluster1-default-pool-3796b2ee-rtcr
consumers-d25cs   0/1     Completed   0          6m    10.32.0.33   gke-cluster1-default-pool-3796b2ee-rtcr
consumers-jcwr8   0/1     Completed   0          6m    10.32.2.26   gke-cluster1-default-pool-3796b2ee-tpml
consumers-l9rkf   0/1     Completed   0          6m    10.32.0.34   gke-cluster1-default-pool-3796b2ee-rtcr
consumers-mbd5c   0/1     Completed   0          6m    10.32.2.27   gke-cluster1-default-pool-3796b2ee-tpml
queue-wlf8v       1/1     Running     0          22m   10.32.0.32   gke-cluster1-default-pool-3796b2ee-rtcr
The following three k8s API objects were created; they are cut and pasted from the book samples.
Run a pod with a work queue
apiVersion: extensions/v1beta1
kind: ReplicaSet
metadata:
  labels:
    app: work-queue
    component: queue
    chapter: jobs
  name: queue
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: work-queue
        component: queue
        chapter: jobs
    spec:
      containers:
      - name: queue
        image: "gcr.io/kuar-demo/kuard-amd64:1"
        imagePullPolicy: Always
Expose the pod as a service so that the worker pods can get to it.
apiVersion: v1
kind: Service
metadata:
  labels:
    app: work-queue
    component: queue
    chapter: jobs
  name: queue
spec:
  ports:
  - port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    app: work-queue
    component: queue
Post 100 items to the queue then run a job with 5 pods executing in parallel until the queue is empty.
apiVersion: batch/v1
kind: Job
metadata:
  labels:
    app: message-queue
    component: consumer
    chapter: jobs
  name: consumers
spec:
  parallelism: 5
  template:
    metadata:
      labels:
        app: message-queue
        component: consumer
        chapter: jobs
    spec:
      containers:
      - name: worker
        image: "gcr.io/kuar-demo/kuard-amd64:1"
        imagePullPolicy: Always
        args:
        - "--keygen-enable"
        - "--keygen-exit-on-complete"
        - "--keygen-memq-server=http://queue:8080/memq/server"
        - "--keygen-memq-queue=keygen"
      restartPolicy: OnFailure
The docs say it pretty well:
When a Job completes, no more Pods are created, but the Pods are not deleted either. Keeping them around allows you to still view the logs of completed pods to check for errors, warnings, or other diagnostic output. The job object also remains after it is completed so that you can view its status. It is up to the user to delete old jobs after noting their status. Delete the job with kubectl (e.g. kubectl delete jobs/pi or kubectl delete -f ./job.yaml). When you delete the job using kubectl, all the pods it created are deleted too.
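
If you don't want to clean these up by hand, one option (a sketch; the ttlSecondsAfterFinished field also appears in one of the manifests earlier on this page, and the 120-second value is just an example) is to let the TTL controller garbage-collect finished Jobs and their pods:

spec:
  ttlSecondsAfterFinished: 120   # the finished Job and its pods are removed about 120s after completion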
It shows the Completed status when the container actually terminated. If you set restartPolicy: Never (when you don't want it to run more than once), then it goes to this state.
Terminated: Indicates that the container completed its execution and has stopped running. A container enters into this when it has successfully completed execution or when it has failed for some reason. Regardless, a reason and exit code is displayed, as well as the container’s start and finish time. Before a container enters into Terminated, preStop hook (if any) is executed.
...
State:          Terminated
  Reason:       Completed
  Exit Code:    0
  Started:      Wed, 30 Jan 2019 11:45:26 +0530
  Finished:     Wed, 30 Jan 2019 11:45:26 +0530
...