How can the pod reflect the application exit code of a K8S Job - kubernetes

I'm running a K8S Job with the following manifest:
apiVersion: batch/v1
kind: Job
metadata:
  name: my-EP
spec:
  template:
    metadata:
      labels:
        app: EP
    spec:
      restartPolicy: "Never"
      containers:
      - name: EP
        image: myImage
The Job starts and runs my script, which runs an application that sends me an email and then terminates. The application returns its exit code to the bash script.
When I run kubectl get pods, I get the following:
NAME          READY   STATUS      RESTARTS   AGE
my-EP-94rh8   0/1     Completed   0          2m2s
Sometimes there are issues, such as the network not being connected or no license being available. I would like that to be visible to the pod user.
My question is: can I propagate the script's exit code so it is visible when I run the get pods command above?
I.e., instead of the "Completed" status, I would like to see my application's exit code - 0, 1, 2, 3....
Or maybe there is a way to see it in the Pods Statuses, in the describe command?
Currently I see:
Pods Statuses: 0 Running / 1 Succeeded / 0 Failed
Is this possible?

A non-zero exit code on a k8s Job will fall into the Failed pod status. There really isn't a way for you to have the exit code shown with kubectl get pods, but you could output the pod status with -o json and then pipe it into jq, looking for the exit code. Something like the following from this post might work:
kubectl get pod pod_name -n namespace -o json | jq '.status.containerStatuses[].state.terminated.exitCode'
Or this, with the items[] in the JSON:
kubectl get pods -o json | jq '.items[].status.containerStatuses[].state.terminated.exitCode'
Alternatively, as u/blaimi mentioned, you can do it without jq, like this:
kubectl get pod pod_name -o jsonpath --template='{.status.containerStatuses[*].state.terminated.exitCode}'
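If you don't want to look up the generated pod name first, a minimal sketch (assuming the Job is named my-EP, as in the question) is to select the Job's pods via the job-name label that the Job controller adds to them automatically:
# List the terminated exit codes of all pods belonging to the Job "my-EP",
# without needing the generated pod name (the job-name label is set by the Job controller).
kubectl get pods -l job-name=my-EP -o jsonpath='{.items[*].status.containerStatuses[*].state.terminated.exitCode}'
This prints one exit code per terminated container, which you could wrap in a small alias or script for your pod users.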

Related

Azure DevOps Deploy to Kubernetes task succeeds if Kubernetes Job exitCode is nonzero

When deploying the below Job to Azure AKS, the exitCode is nonzero but the deploy task succeeds. The question is:
how to make the Deploy to Kubernetes task fail if the exit code of the Job or Pod is nonzero?
#kubectl get pod --selector=job-name=job-pod-failure-policy-example -o jsonpath='{.items[-1:]..exitCode}'
apiVersion: batch/v1
kind: Job
metadata:
  name: job-pod-failure-policy-example
spec:
  completions: 12
  parallelism: 3
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: docker.io/library/bash:5
        command: ["bash"] # example command simulating a bug which triggers the FailJob action
        args:
        - -c
        - echo "Hello world!" && sleep 5 && exit 1
I tried to reproduce the same issue in my environment and got the results below.
I created the Job and deployed it into the cluster:
vi job-pod-failure-policy-example.yaml
kubectl apply -f filename.yaml
kubectl get pods #to check the pods
To check the pod's events and status, use the command below:
kubectl describe pod pod_name
When I check the exit code using the command below, it is 0 because the pod ran successfully:
kubectl get pod --selector=job-name=job-pod-failure-policy-example -o jsonpath='{.items[-1:]..exitCode}'
Then I created and deployed the file that exits with code 1.
When I check again, I get exit code 1 because the application failed.
We can also have other types of jobs or pods where the exit code is non-zero.
I created the sample file below and deployed it:
#vi crashbackoop.yaml
#kubectl apply -f filename
apiVersion: v1
kind: Pod
metadata:
  name: privetae-image-testing
spec:
  containers:
  - name: private-image-test
    image: buildforjenkin.azurecr.io/nginx:latest
    imagePullPolicy: IfNotPresent
    command: ['echo','success','sleep 1000000']
When I check the pod, I get the ErrImagePull error.
We will get the above error if the image path is not correct or the kubelet does not succeed in authenticating with the container registry.
For more about non-zero exit codes, please refer to this link.
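To directly answer the original question of making the pipeline fail, one sketch (not specific to the Azure Deploy to Kubernetes task) is to add a script step after the deployment that waits on the Job and propagates its outcome:
# Wait for the Job to reach the 'complete' condition. A failed Job never becomes complete,
# so kubectl wait times out and exits non-zero, which in turn fails the pipeline step.
kubectl wait --for=condition=complete job/job-pod-failure-policy-example --timeout=300s
Alternatively, you could reuse the jsonpath check from the question and call exit 1 yourself whenever the reported exitCode is non-zero.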

K8s Job being constantly recreated

I have a cronjob that keeps restarting, despite its RestartPolicy set to Never:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: cron-zombie-pod-killer
spec:
  schedule: "*/9 * * * *"
  successfulJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        metadata:
          name: cron-zombie-pod-killer
        spec:
          containers:
          - name: cron-zombie-pod-killer
            image: bitnami/kubectl
            command:
            - "/bin/sh"
            args:
            - "-c"
            - "kubectl get pods --all-namespaces --field-selector=status.phase=Failed | awk '{print $2 \" --namespace=\" $1}' | xargs kubectl delete pod > /dev/null"
          serviceAccountName: pod-read-and-delete
          restartPolicy: Never
I would expect it to run every 9th minute, but that's not the case.
What happens is that when there are pods to clean up (so, when there's something for the pod to do) it runs normally. Once everything is cleaned up, it keeps restarting -> failing -> starting, etc. in a loop every second.
Is there something I need to do to tell k8s that the job has been successful, even if there's nothing to do (no pods to clean up)? What makes the job loop in restarts and failures?
That is by design. restartPolicy is not applied to the CronJob, but to the Pods it creates.
If restartPolicy is set to Never, it will just create new Pods if the previous one failed. Setting it to OnFailure causes the Pod to be restarted, and prevents the stream of new Pods.
This was discussed in this GitHub issue: Job being constanly recreated despite RestartPolicy: Never #20255
Your kubectl pipeline results in exit code 123 (xargs: any invocation exited with a non-zero status) if there are no Pods in the Failed state. This causes the Job to fail, and the constant restarts.
You can fix that by forcing kubectl command to exit with exit code 0. Add || exit 0 to the end of it:
kubectl get pods --all-namespaces --field-selector=status.phase=Failed | awk '{print $2 \" --namespace=\" $1}' | xargs kubectl delete pod > /dev/null || exit 0
...Once everything is cleared up, it keeps restarting -> failing -> starting, etc. in a loop every second.
When your first command returns no pod, the trailing commands (e.g. awk, xargs) fail and return a non-zero exit code. Such an exit code is perceived by the controller as the job having failed, so it starts a new pod to re-run the job. You should just exit with zero when there is no pod returned.
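Another option that avoids masking real failures with || exit 0 is to tell xargs not to run kubectl delete at all when there is nothing to delete. A sketch, assuming the image's xargs supports -r/--no-run-if-empty (GNU xargs does, and bitnami/kubectl is Debian-based):
# --no-headers drops the header row so awk only sees real pod lines;
# xargs -r skips the kubectl delete invocation entirely when its input is empty,
# so the pipeline exits 0 and the Job is marked successful even with nothing to clean up.
kubectl get pods --all-namespaces --field-selector=status.phase=Failed --no-headers | awk '{print $2 " --namespace=" $1}' | xargs -r kubectl delete pod > /dev/null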

Deleting failed jobs in kubernetes (gke)

How to delete the failed jobs in the Kubernetes cluster using a CronJob in GKE? When I tried to delete the failed jobs using the following YAML, it deleted all the jobs (including running ones):
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: XXX
  namespace: XXX
spec:
  schedule: "*/30 * * * *"
  failedJobsHistoryLimit: 1
  successfulJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: XXX
          containers:
          - name: kubectl-runner
            image: bitnami/kubectl:latest
            command: ["sh", "-c", "kubectl delete jobs $(kubectl get jobs | awk '$2 ~ 1/1' | awk '{print $1}')"]
          restartPolicy: OnFailure
To delete failed Jobs in GKE you will need to use the following command:
$ kubectl delete job $(kubectl get job -o=jsonpath='{.items[?(@.status.failed==1)].metadata.name}')
This command will output the JSON for all jobs and search for jobs that have the status.failed field set to 1. It will then pass the failed jobs to $ kubectl delete jobs.
This command, run in a CronJob, will fail when there are no jobs with status: failed.
As a workaround you can use:
command: ["sh", "-c", "kubectl delete job --ignore-not-found=true $(kubectl get job -o=jsonpath='{.items[?(@.status.failed==1)].metadata.name}'); exit 0"]
exit 0 was added to make sure that the Pod exits with status code 0.
As for part of the comments made under the question:
You will need to modify it to support "Failed" Jobs
I have already tried the following, but it's not deleting the jobs. kubectl delete job $(kubectl get job -o=jsonpath='{.items[?(@.status.Failed==1)].metadata.name}')
@.status.Failed==1 <-- incorrect as JSON is case sensitive
@.status.failed==1 <-- correct
If you were to run the incorrect version of this command on the following Pods (shown to demonstrate that they failed and aren't still running to completion):
NAME              READY   STATUS      RESTARTS   AGE
job-four-9w5h9    0/1     Error       0          5s
job-one-n9trm     0/1     Completed   0          6s
job-three-nhqb6   0/1     Error       0          5s
job-two-hkz8r     0/1     Error       0          6s
You should get the following error:
error: resource(s) were provided, but no name, label selector, or --all flag specified
The above error will also show when no jobs were passed to $ kubectl delete job.
Running the correct version of this command should delete all the jobs that failed:
job.batch "job-four" deleted
job.batch "job-three" deleted
job.batch "job-two" deleted
I encourage you to check additional resources:
StackOverflow.com: Questions: Kubectl list delete all completed jobs
Kubernetes.io: Doc
This one visually looks better for me:
kubectl delete job --field-selector=status.phase==Failed
@Dawid Kruk's answer is excellent but works on a specific namespace only and not on all namespaces as I needed.
In order to solve it, I've created a simple bash script that gets all failed jobs and deletes them:
# Delete failed jobs
failedJobs=$(kubectl get job -A -o=jsonpath='{range .items[?(@.status.failed>=1)]}{.metadata.name}{"\t"}{.metadata.namespace}{"\n"}{end}')
echo "$failedJobs" | while read each
do
    array=($each)
    jobName=${array[0]}
    namespace=${array[1]}
    echo "Debug: job name: $jobName is deleted on namespace $namespace"
    kubectl delete job $jobName -n $namespace
done
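Before letting the script delete anything, a quick sketch using the same jsonpath filter to preview which Jobs would be affected (printed as namespace/name, one per line):
# Preview only: list failed Jobs across all namespaces without deleting anything
kubectl get job -A -o=jsonpath='{range .items[?(@.status.failed>=1)]}{.metadata.namespace}{"/"}{.metadata.name}{"\n"}{end}'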

How to verify a cronjob successfully completed in Kubernetes

I am trying to create a cronjob that runs the command date in a single busybox container. The command should run every minute and must complete within 17 seconds or be terminated by Kubernetes. The cronjob name and container name should both be hello.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  jobTemplate:
    metadata:
      name: hello
    spec:
      completions: 1
      activeDeadlineSeconds: 17
      template:
        metadata:
          creationTimestamp: null
        spec:
          containers:
          - image: busybox
            name: hello
            command: ["/bin/sh","-c","date"]
            resources: {}
          restartPolicy: OnFailure
  schedule: '*/1 * * * *'
status: {}
I want to verify that the job executed successfully at least once.
I tried it using the command k get cronjob -w, which gives me this result.
Is there another way to verify that the job executes successfully? Is adding the date command to the container a good way to do this?
A CronJob internally creates a Job, which internally creates a Pod. Watch for the Job that gets created by the CronJob:
kubectl get jobs --watch
The output is similar to this:
NAME               COMPLETIONS   DURATION   AGE
hello-4111706356   0/1                      0s
hello-4111706356   0/1           0s         0s
hello-4111706356   1/1           5s         5s
And you can see the number of COMPLETIONS
#Replace "hello-4111706356" with the job name in your system
pods=$(kubectl get pods --selector=job-name=hello-4111706356 --output=jsonpath={.items[*].metadata.name})
Check the pod logs
kubectl logs $pods
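To verify success without reading logs, a small sketch (replace hello-4111706356 with the Job name on your system) that reads the Job's succeeded count directly:
# Prints the number of successfully completed pods for this Job; 1 here means the run succeeded.
# This is the same value that drives the first number in the COMPLETIONS column above.
kubectl get job hello-4111706356 -o jsonpath='{.status.succeeded}'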
You can check the logs of the pods created by the CronJob resource. Have a look at this question and let me know if this solves your query.
You can directly check the status of the jobs. A CronJob just controls a Kubernetes Job.
Run kubectl get jobs and it will give you the completion status.
> kubectl get jobs
NAME        COMPLETIONS   DURATION   AGE
datee-job   0/1 of 3      24m        24m

How can I debug why my single-job pod ends with status = "Error"?

I'm setting up a Kubernetes cluster and am testing a small container. This is my YAML file for the pod:
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  restartPolicy: Never
  containers:
  - name: node
    image: 'node:5'
    command: ['node']
    args: ['-e', 'console.log(1234)']
I deploy it with kubectl create -f example.yml and sure enough it runs as expected:
$ kubectl logs example
1234
However, the pod's status is "Error":
$ kubectl get po example
NAME      READY   STATUS   RESTARTS   AGE
example   0/1     Error    0          16m
How can I investigate why the status is "Error"?
kubectl describe pod example will give you more info on what's going on.
Also,
kubectl get events can get you more details too, although it is not dedicated to the given pod.
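A sketch for digging a bit deeper (the pod name example comes from the question): filter the events down to that one pod and read the recorded termination details directly:
# Events for only this pod (involvedObject.name is a supported field selector for events)
kubectl get events --field-selector involvedObject.name=example
# Exit code and reason recorded for the terminated container
kubectl get pod example -o jsonpath='{.status.containerStatuses[0].state.terminated.exitCode} {.status.containerStatuses[0].state.terminated.reason}'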