Why is the pod status for my Kubernetes job not "Failed" when I purposefully ran a command to exit with a non-zero exit code?

I am running a CronJob on Kubernetes (version 1.20). These are the relevant details:
concurrencyPolicy: Forbid
backoffLimit: 6
- command:
  - /bin/bash
  - -c
  - |
    exit 1;
restartPolicy: OnFailure
When I run the job, the pod that spins up restarts up to six times and then terminates. I describe the job, expecting to see that it failed, but instead I see this line:
Pods Statuses: 0 Running / 0 Succeeded / 0 Failed
I am really expecting to see this instead:
Pods Statuses: 0 Running / 0 Succeeded / 1 Failed
I have played around with the backoffLimit, but it was to no avail. The other settings need to be what they are at the moment.

The pod terminated with a non-zero return code (from the exit 1), which is considered a failure.
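This follows from restartPolicy: OnFailure: the kubelet restarts the failing container inside the same pod, and once backoffLimit is reached the job controller terminates that pod, so no Failed pod remains for kubectl describe to count. For contrast, a minimal sketch (not the questioner's required settings, where OnFailure must stay): with restartPolicy: Never, each failed attempt is a separate pod, and those pods remain and show up in the Failed count.

restartPolicy: Never # each retry creates a fresh pod; failed pods stay visible in the Job's Pods Statuses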

Your YAML file is missing some details such as spec, apiVersion, and metadata. Follow the syntax below; it worked for me:
apiVersion: v1
kind: Pod
metadata:
  name: command-demo
  labels:
    purpose: demonstrate-command
spec:
  containers:
  - name: command-demo-container
    image: debian
    command:
    - /bin/bash
    - -c
    - |
      exit 1
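To verify, apply the manifest and watch the status; note that for a bare Pod the default restartPolicy is Always, so this example is expected to cycle through Error and CrashLoopBackOff (the commands assume the file is saved as command-demo.yaml):

kubectl apply -f command-demo.yaml
kubectl get pod command-demo --watch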

Related

Azure devops Deploy to Kubernetes task succeeds if kubernetes Job exitCode is nonzero

When deploying the below Job to Azure AKS, the exitCode is nonzero but the deploy task succeeds. The question is
how to make the Deploy to Kubernetes task fail if the exit code of the Job or Pod is nonzero?
#kubectl get pod --selector=job-name=job-pod-failure-policy-example -o jsonpath='{.items[-1:]..exitCode}'
apiVersion: batch/v1
kind: Job
metadata:
  name: job-pod-failure-policy-example
spec:
  completions: 12
  parallelism: 3
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: docker.io/library/bash:5
        command: ["bash"] # example command simulating a bug which triggers the FailJob action
        args:
        - -c
        - echo "Hello world!" && sleep 5 && exit 1
I tried to reproduce the same issue in my environment and got the below results.
I created the pod and deployed it into the cluster:
vi job-pod-failure-policy-example.yaml
kubectl apply -f filename.yaml
kubectl get pods #to check the pods
To check the logs, use the below command:
kubectl describe pod pod_name
When I check the exit code using the below command, it is 0 because the pod ran successfully:
kubectl get pod --selector=job-name=job-pod-failure-policy-example -o jsonpath='{.items[-1:]..exitCode}'
Here I have created and deployed the file with exit code 1.
When I check the logs below, I get exit code 1 because the application failed.
We can also have different types of jobs or pods whose exit code is non-zero.
I created the sample file below and deployed it:
#vi crashbackoop.yaml
#kubectl apply -f filename
apiVersion: v1
kind: Pod
metadata:
  name: private-image-testing
spec:
  containers:
  - name: private-image-test
    image: buildforjenkin.azurecr.io/nginx:latest
    imagePullPolicy: IfNotPresent
    command: ['echo','success','sleep 1000000']
When I check the pod status, I get the ErrImagePull error.
We will get the above error if the image path is not correct or the kubelet does not succeed in authenticating with the container registry.
For more about non-zero exit codes, please refer to this link.
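To make the deploy task itself fail on a non-zero Job exit code, a follow-up script step along these lines could work (a sketch reusing the jsonpath query from the question; wiring it into the Azure DevOps task is left as an assumption):

# Fail the pipeline step if the job's last pod exited non-zero
exitCode=$(kubectl get pod --selector=job-name=job-pod-failure-policy-example -o jsonpath='{.items[-1:]..exitCode}')
if [ "${exitCode:-0}" -ne 0 ]; then
  echo "Job pod exited with code ${exitCode}"
  exit 1
fi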

Kubernetes Pod with Sleep command takes time to get deleted

Currently it takes quite a long time before the pod can be terminated after a kubectl delete command. I have the feeling that it could be because of the sleep command.
How can I make the container stop faster?
What best practices should I use here?
apiVersion: apps/v1
kind: Deployment
...
spec:
  template:
    spec:
      containers:
      - image: alpine
        ..
        command:
        - /bin/sh
        - -c
        - |
          trap : TERM INT
          while true; do
            # some code to check something
            sleep 10
          done
Is my approach with "trap : TERM INT" correct? At the moment I don't see any positive effect...
When I terminate the pod it takes several seconds for the command to come back.
kubectl delete pod my-pod
Adding terminationGracePeriodSeconds to your spec will do:
...
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 10 # <-- default is 30; this is how long the pod gets between SIGTERM and SIGKILL (0 kills immediately).
      containers:
      - image: alpine
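Separately, the trap in the question shows no effect because sleep runs in the foreground: the shell only acts on TERM after sleep finishes. A common workaround (a sketch, assuming the loop body itself is quick) is to background the sleep and wait on it, since wait is interruptible by the trap:

command:
- /bin/sh
- -c
- |
  trap 'exit 0' TERM INT
  while true; do
    # some code to check something
    sleep 10 &
    wait $!
  done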

Is there a way to enable shareProcessNamespace for helm post-install hook?

I'm running a pod with 3 containers (telegraf, fluentd and an in-house agent) that makes use of shareProcessNamespace: true.
I've written a python script to fetch the initial config for telegraf and fluentd from a central controller API endpoint. Since this is a one time operation, I plan to use helm post-install hook.
apiVersion: batch/v1
kind: Job
metadata:
  name: agent-postinstall
  annotations:
    "helm.sh/hook-weight": "3"
    "helm.sh/hook": "post-install"
spec:
  template:
    spec:
      containers:
      - name: agent-postinstall
        image: "{{ .Values.image.agent.repository }}:{{ .Values.image.agent.tag | default .Chart.AppVersion }}"
        imagePullPolicy: IfNotPresent
        command: ['python3', 'getBaseCfg.py']
        volumeMounts:
        - name: config-agent-volume
          mountPath: /etc/config
      volumes:
      - name: config-agent-volume
        configMap:
          name: agent-cm
      restartPolicy: Never
  backoffLimit: 1
It is required for the python script to check if telegraf/fluentd/agent processes are up, before getting the config. I intend to wait (with a timeout) until pgrep <telegraf/fluentd/agent> returns true and then fire APIs. Is there a way to enable shareProcessNamespace for the post-install hook as well? Thanks.
PS: Currently, the agent calls the python script along with its own startup script. It works, but it is kludgy. I'd like to move it out of agent container.
shareProcessNamespace
The most important thing about this flag is that it works only within one pod: all containers within one pod share processes with each other.
The described approach uses a Job, but a Job creates a separate pod, so it won't work this way. The container would need to be part of the "main" pod, alongside all the other containers, to have access to that pod's running processes.
More details about process sharing.
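For reference, the flag sits at the pod spec level of the main pod; a minimal sketch (container names taken from the question, images illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: agent-pod
spec:
  shareProcessNamespace: true # all containers in this pod see each other's processes
  containers:
  - name: telegraf
    image: telegraf
  - name: fluentd
    image: fluentd
  - name: agent
    image: in-house-agent # placeholder for the in-house agent image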
A possible way to solve it
It's possible to check the processes in the containers directly using the kubectl command.
Below is an example of how to check the state of the processes using the pgrep command. The pgrepContainer container needs to have the pgrep command already installed.
job.yaml:
apiVersion: batch/v1
kind: Job
metadata:
  name: "{{ .Release.Name }}-postinstall-hook"
  annotations:
    "helm.sh/hook": post-install
spec:
  template:
    spec:
      serviceAccountName: config-user # service account with appropriate permissions is required using this approach
      volumes:
      - name: check-script
        configMap:
          name: check-script
      restartPolicy: Never
      containers:
      - name: post-install-job
        image: "bitnami/kubectl" # using this image with kubectl so we can connect to the cluster
        command: ["bash", "/mnt/script/checkScript.sh"]
        volumeMounts:
        - name: check-script
          mountPath: /mnt/script
And configmap.yaml, which contains the script and the logic that checks the three processes in a loop, for up to 60 iterations of 10 seconds each:
apiVersion: v1
kind: ConfigMap
metadata:
  name: check-script
data:
  checkScript.sh: |
    #!/bin/bash
    podName=test
    pgrepContainer=app-1
    process1=sleep
    process2=pause
    process3=postgres
    attempts=0
    until [ $attempts -eq 60 ]; do
      kubectl exec ${podName} -c ${pgrepContainer} -- pgrep ${process1} 1>/dev/null 2>&1 \
      && kubectl exec ${podName} -c ${pgrepContainer} -- pgrep ${process2} 1>/dev/null 2>&1 \
      && kubectl exec ${podName} -c ${pgrepContainer} -- pgrep ${process3} 1>/dev/null 2>&1
      if [ $? -eq 0 ]; then
        break
      fi
      attempts=$((attempts + 1))
      sleep 10
      echo "Waiting for all containers to be ready...$(( attempts * 10 )) s"
    done
    if [ $attempts -eq 60 ]; then
      echo "ERROR: Timeout"
      exit 1
    fi
    echo "All containers are ready !"
    echo "Configuring telegraf and fluentd services"
The final result will look like this:
$ kubectl get pods
NAME                          READY   STATUS      RESTARTS   AGE
test                          2/2     Running     0          20m
test-postinstall-hook-dgrc9   0/1     Completed   0          20m
$ kubectl logs test-postinstall-hook-dgrc9
Waiting for all containers to be ready...10 s
All containers are ready !
Configuring telegraf and fluentd services
The above is another approach; you can use its logic as a base to achieve your end goal.
postStart
A postStart hook can also be considered as a place for this logic. It runs after the container is created. Since the main application takes time to start and there's already logic that waits for it, it's not an issue that:
there is no guarantee that the hook will execute before the container ENTRYPOINT
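A sketch of what that could look like on the agent container (getBaseCfg.py is the script from the question; the surrounding fields are illustrative):

containers:
- name: agent
  image: in-house-agent # placeholder
  lifecycle:
    postStart:
      exec:
        command: ["python3", "getBaseCfg.py"]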

How the pod can reflect the application exit code of K8S Job

I'm running a K8S job, with the following flags:
apiVersion: batch/v1
kind: Job
metadata:
  name: my-EP
spec:
  template:
    metadata:
      labels:
        app: EP
    spec:
      restartPolicy: "Never"
      containers:
      - name: EP
        image: myImage
The Job starts, runs my script that runs some application that sends me an email and then terminates. The application returns the exit code to the bash script.
When I run the command kubectl get pods, I get the following:
NAME          READY   STATUS      RESTARTS   AGE
my-EP-94rh8   0/1     Completed   0          2m2s
Sometimes there are issues, e.g. the network is not connected or no license is available. I would like that to be visible to the pod user.
My question is: can I propagate the script exit code so that it is seen when I run the above get pods command?
I.e. instead of the "Completed" status, I would like to see my application exit code: 0, 1, 2, 3...
Or maybe there is a way to see it in the Pods Statuses, in the describe command?
Currently I see:
Pods Statuses: 0 Running / 1 Succeeded / 0 Failed
Is this possible?
A non-zero exit code on k8s jobs will fall into the Failed pod status. There really isn't a way for you to have the exit code shown with kubectl get pods, but you could output the pod status with -ojson and then pipe it into jq, looking for the exit code. Something like the following from this post might work:
kubectl get pod pod_name -n namespace -ojson | jq '.status.containerStatuses[].state.terminated.exitCode'
or this, with items[] in the json:
kubectl get pods -ojson | jq '.items[].status.containerStatuses[].state.terminated.exitCode'
Alternatively, as u/blaimi mentioned, you can do it without jq, like this:
kubectl get pod pod_name -o jsonpath --template='{.status.containerStatuses[*].state.terminated.exitCode}'
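Along the same lines (a variant not mentioned in the thread), kubectl's custom-columns output can list the exit code next to each pod name:

kubectl get pods -o custom-columns='NAME:.metadata.name,EXITCODE:.status.containerStatuses[0].state.terminated.exitCode'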

Kubernetes - how to run job only once

I have a job definition based on an example from the Kubernetes website.
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-with-timeout-6
spec:
  activeDeadlineSeconds: 30
  completions: 1
  parallelism: 1
  template:
    metadata:
      name: pi
    spec:
      containers:
      - name: pi
        image: perl
        command: ["sh", "-c", "exit 1"] # exit is a shell builtin, so it needs a shell to run
      restartPolicy: Never
I would like to run this job once and not restart it if it fails. With the command exit 1, Kubernetes keeps starting new pods trying to get exit code 0 until the activeDeadlineSeconds timeout is reached. How can I avoid that? I would like to run build commands in Kubernetes to check compilation; if the compilation fails, I'll get an exit code different from 0. I don't want to run the compilation again.
Is this possible? How?
By now this is possible by setting backoffLimit: 0, which tells the controller to do 0 retries (the default is 6).
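Applied to the manifest from the question, only one field changes (a sketch; the template is omitted and stays as before):

apiVersion: batch/v1
kind: Job
metadata:
  name: pi-with-timeout-6
spec:
  backoffLimit: 0 # fail the Job after the first failed pod, no retries
  activeDeadlineSeconds: 30
  completions: 1
  parallelism: 1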
If you want a one-try command runner, you should probably create a bare pod, because the job will try to execute the command until it's successful or the active deadline is met.
Just create the pod from your template:
apiVersion: v1
kind: Pod
metadata:
  name: pi
spec:
  containers:
  - name: pi
    image: perl
    command: ["sh", "-c", "exit 1"] # exit is a shell builtin, so it needs a shell to run
  restartPolicy: Never
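A failed bare pod is not recreated; it simply shows the Error status, and the exit code stays readable with the jsonpath approach shown earlier on this page:

kubectl get pod pi -o jsonpath='{.status.containerStatuses[*].state.terminated.exitCode}'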
Sadly, there is currently no way to prevent the job controller from just respawning new pods when they fail, but the Kubernetes community is working on a solution; see:
"Backoff policy and failed pod limit": https://github.com/kubernetes/community/pull/583