Azure devops Deploy to Kubernetes task succeeds if kubernetes Job exitCode is nonzero - kubernetes

When deploying the below Job to Azure AKS the exitCode is nonzero but the deploy task succeeds. The question is
how to make Deploy to Kubernetes task fail if exit code of Job or Pod is nonzero?
#kubectl get pod --selector=job-name=job-pod-failure-policy-example -o jsonpath='{.items[-1:]..exitCode}'
apiVersion: batch/v1
kind: Job
metadata:
name: job-pod-failure-policy-example
spec:
completions: 12
parallelism: 3
template:
spec:
restartPolicy: Never
containers:
- name: main
image: docker.io/library/bash:5
command: ["bash"] # example command simulating a bug which triggers the FailJob action
args:
- -c
- echo "Hello world!" && sleep 5 && exit 1

I tried to reproduce the same issue in my environment and got the below results
I have created the pod and deployed into the cluster
vi job-pod-failure-policy-example.yaml
kubectl apply -f filename.yaml
kubectl get pods #to check the pods
To check the logs use the below command
kubectl describe pod pod_name
When I check the the exit code using below command its 0 because of pod run successfully
kubectl get pod --selector=job-name=job-pod-failure-policy-example -o jsonpath='{.items[-1:]..exitCode}'
Here I have created and deployed the file with exit code 1
When I check the logs below I got exit code 1 because the application failed
We can also have different types of jobs or pods with exit code is non zero
I have created the sample file and deployed
#vi crashbackoop.yaml
#kubectl apply -f filename
apiVersion: v1
kind: Pod
metadata:
name: privetae-image-testing
spec:
containers:
- name: private-image-test
image: buildforjenkin.azurecr.io/nginx:latest
imagePullPolicy: IfNotPresent
command: ['echo','success','sleep 1000000']
When I check the logs I got the ErrImage error
The above error we will get if the image path is not correct or kubelet does not succeeded in authenticating with container registry
For more about exit code non zero please refer this link

Related

Is there a way to enable shareProcessNamespace for helm post-install hook?

I'm running a pod with 3 containers (telegraf, fluentd and an in-house agent) that makes use of shareProcessNamespace: true.
I've written a python script to fetch the initial config for telegraf and fluentd from a central controller API endpoint. Since this is a one time operation, I plan to use helm post-install hook.
apiVersion: batch/v1
kind: Job
metadata:
name: agent-postinstall
annotations:
"helm.sh/hook-weight": "3"
"helm.sh/hook": "post-install"
spec:
template:
spec:
containers:
- name: agent-postinstall
image: "{{ .Values.image.agent.repository }}:{{ .Values.image.agent.tag | default .Chart.AppVersion }}"
imagePullPolicy: IfNotPresent
command: ['python3', 'getBaseCfg.py']
volumeMounts:
- name: config-agent-volume
mountPath: /etc/config
volumes:
- name: config-agent-volume
configMap:
name: agent-cm
restartPolicy: Never
backoffLimit: 1
It is required for the python script to check if telegraf/fluentd/agent processes are up, before getting the config. I intend to wait (with a timeout) until pgrep <telegraf/fluentd/agent> returns true and then fire APIs. Is there a way to enable shareProcessNamespace for the post-install hook as well? Thanks.
PS: Currently, the agent calls the python script along with its own startup script. It works, but it is kludgy. I'd like to move it out of agent container.
shareProcessNamespace
Most important part of this flag is it works only within one pod, all containers within one pod will share processes between each other.
In described approach job is supposed to be used. Job creates a separate pod so it won't work this way. Container should be a part of the "main" pod with all other containers to have access to running processes of that pod.
More details about process sharing.
Possible way to solution it
It's possible to get processes from the containers directly using kubectl command.
Below is an example how to check state of the processes using pgrep command. The pgrepContainer container needs to have the pgrep command already installed.
job.yaml:
apiVersion: batch/v1
kind: Job
metadata:
name: "{{ .Release.Name }}-postinstall-hook"
annotations: "helm.sh/hook": post-install
spec:
template:
spec:
serviceAccountName: config-user # service account with appropriate permissions is required using this approach
volumes:
- name: check-script
configMap:
name: check-script
restartPolicy: Never
containers:
- name: post-install-job
image: "bitnami/kubectl" # using this image with kubectl so we can connect to the cluster
command: ["bash", "/mnt/script/checkScript.sh"]
volumeMounts:
- name: check-script
mountPath: /mnt/script
And configmap.yaml which contains script and logic which check three processes in loop for 60 iterations per 10 seconds each:
apiVersion: v1
kind: ConfigMap
metadata:
name: check-script
data:
checkScript.sh: |
#!/bin/bash
podName=test
pgrepContainer=app-1
process1=sleep
process2=pause
process3=postgres
attempts=0
until [ $attempts -eq 60 ]; do
kubectl exec ${podName} -c ${pgrepContainer} -- pgrep ${process1} 1>/dev/null 2>&1 \
&& kubectl exec ${podName} -c ${pgrepContainer} -- pgrep ${process2} 1>/dev/null 2>&1 \
&& kubectl exec ${podName} -c ${pgrepContainer} -- pgrep ${process3} 1>/dev/null 2>&1
if [ $? -eq 0 ]; then
break
fi
attempts=$((attempts + 1))
sleep 10
echo "Waiting for all containers to be ready...$[ ${attempts}*10 ] s"
done
if [ $attempts -eq 60 ]; then
echo "ERROR: Timeout"
exit 1
fi
echo "All containers are ready !"
echo "Configuring telegraf and fluentd services"
Final result will look like:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
test 2/2 Running 0 20m
test-postinstall-hook-dgrc9 0/1 Completed 0 20m
$ kubectl logs test-postinstall-hook-dgrc9
Waiting for all containers to be ready...10 s
All containers are ready !
Configuring telegraf and fluentd services
Above is an another approach, you can use its logic as base to achieve your end goal.
postStart
Also postStart hook can be considered to be used where some logic will be located. It will run after container is created. Since main application takes time to start and there's already logic which waits for it, it's not an issue that:
there is no guarantee that the hook will execute before the container ENTRYPOINT

How to achieve Automatic Rollback in Kubernetes?

Let's say I've a deployment. For some reason it's not responding after sometime. Is there any way to tell Kubernetes to rollback to previous version automatically on failure?
You mentioned that:
I've a deployment. For some reason it's not responding after sometime.
In this case, you can use liveness and readiness probes:
The kubelet uses liveness probes to know when to restart a container. For example, liveness probes could catch a deadlock, where an application is running, but unable to make progress. Restarting a container in such a state can help to make the application more available despite bugs.
The kubelet uses readiness probes to know when a container is ready to start accepting traffic. A Pod is considered ready when all of its containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.
The above probes may prevent you from deploying a corrupted version, however liveness and readiness probes aren't able to rollback your Deployment to the previous version. There was a similar issue on Github, but I am not sure there will be any progress on this matter in the near future.
If you really want to automate the rollback process, below I will describe a solution that you may find helpful.
This solution requires running kubectl commands from within the Pod.
In short, you can use a script to continuously monitor your Deployments, and when errors occur you can run kubectl rollout undo deployment DEPLOYMENT_NAME.
First, you need to decide how to find failed Deployments. As an example, I'll check Deployments that perform the update for more than 10s with the following command:
NOTE: You can use a different command depending on your need.
kubectl rollout status deployment ${deployment} --timeout=10s
To constantly monitor all Deployments in the default Namespace, we can create a Bash script:
#!/bin/bash
while true; do
sleep 60
deployments=$(kubectl get deployments --no-headers -o custom-columns=":metadata.name" | grep -v "deployment-checker")
echo "====== $(date) ======"
for deployment in ${deployments}; do
if ! kubectl rollout status deployment ${deployment} --timeout=10s 1>/dev/null 2>&1; then
echo "Error: ${deployment} - rolling back!"
kubectl rollout undo deployment ${deployment}
else
echo "Ok: ${deployment}"
fi
done
done
We want to run this script from inside the Pod, so I converted it to ConfigMap which will allow us to mount this script in a volume (see: Using ConfigMaps as files from a Pod):
$ cat check-script-configmap.yml
apiVersion: v1
kind: ConfigMap
metadata:
name: check-script
data:
checkScript.sh: |
#!/bin/bash
while true; do
sleep 60
deployments=$(kubectl get deployments --no-headers -o custom-columns=":metadata.name" | grep -v "deployment-checker")
echo "====== $(date) ======"
for deployment in ${deployments}; do
if ! kubectl rollout status deployment ${deployment} --timeout=10s 1>/dev/null 2>&1; then
echo "Error: ${deployment} - rolling back!"
kubectl rollout undo deployment ${deployment}
else
echo "Ok: ${deployment}"
fi
done
done
$ kubectl apply -f check-script-configmap.yml
configmap/check-script created
I've created a separate deployment-checker ServiceAccount with the edit Role assigned and our Pod will run under this ServiceAccount:
NOTE: I've created a Deployment instead of a single Pod.
$ cat all-in-one.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: deployment-checker
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: deployment-checker-binding
subjects:
- kind: ServiceAccount
name: deployment-checker
namespace: default
roleRef:
kind: ClusterRole
name: edit
apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: deployment-checker
name: deployment-checker
spec:
selector:
matchLabels:
app: deployment-checker
template:
metadata:
labels:
app: deployment-checker
spec:
serviceAccountName: deployment-checker
volumes:
- name: check-script
configMap:
name: check-script
containers:
- image: bitnami/kubectl
name: test
command: ["bash", "/mnt/checkScript.sh"]
volumeMounts:
- name: check-script
mountPath: /mnt
After applying the above manifest, the deployment-checker Deployment was created and started monitoring Deployment resources in the default Namespace:
$ kubectl apply -f all-in-one.yaml
serviceaccount/deployment-checker created
clusterrolebinding.rbac.authorization.k8s.io/deployment-checker-binding created
deployment.apps/deployment-checker created
$ kubectl get deploy,pod | grep "deployment-checker"
deployment.apps/deployment-checker 1/1 1
pod/deployment-checker-69c8896676-pqg9h 1/1 Running
Finally, we can check how it works. I've created three Deployments (app-1, app-2, app-3):
$ kubectl create deploy app-1 --image=nginx
deployment.apps/app-1 created
$ kubectl create deploy app-2 --image=nginx
deployment.apps/app-2 created
$ kubectl create deploy app-3 --image=nginx
deployment.apps/app-3 created
Then I changed the image for the app-1 to the incorrect one (nnnginx):
$ kubectl set image deployment/app-1 nginx=nnnginx
deployment.apps/app-1 image updated
In the deployment-checker logs we can see that the app-1 has been rolled back to the previous version:
$ kubectl logs -f deployment-checker-69c8896676-pqg9h
...
====== Thu Oct 7 09:20:15 UTC 2021 ======
Ok: app-1
Ok: app-2
Ok: app-3
====== Thu Oct 7 09:21:16 UTC 2021 ======
Error: app-1 - rolling back!
deployment.apps/app-1 rolled back
Ok: app-2
Ok: app-3
I stumbled upon Argo Rollout which addresses this non automatic rollback and many other deployment related things.
https://argoproj.github.io/argo-rollouts/

How the pod can reflect the application exit code of K8S Job

I'm running a K8S job, with the following flags:
apiVersion: batch/v1
kind: Job
metadata:
name: my-EP
spec:
template:
metadata:
labels:
app: EP
spec:
restartPolicy: "Never"
containers:
- name: EP
image: myImage
The Job starts, runs my script that runs some application that sends me an email and then terminates. The application returns the exit code to the bash script.
when I run the command:
kubectl get pods, I get the following:
NAME READY STATUS RESTARTS AGE
my-EP-94rh8 0/1 Completed 0 2m2s
Sometimes there are issues, and the network not connected or no license available.
I would like that to be visible to the pod user.
My question is, can I propagate the script exit code to be seen when I run the above get pods command?
I.E instead of the "Completed" status, I would like to see my application exit code - 0, 1, 2, 3....
or maybe there is a way to see it in the Pods Statuses, in the describe command?
currently I see:
Pods Statuses: 0 Running / 1 Succeeded / 0 Failed
Is this possible?
The a non-zero exit code on k8s jobs will fall into the Failed pod status. There really isn't a way for you to have the exit code shown with kubectl get pods but you could output the pod status with -ojson and then pipe it into jq looking for the exit code. Something like the following from this post might work
kubectl get pod pod_name -c container_name-n namespace -ojson | jq .status.containerStatuses[].state.terminated.exitCode
or this, with the items[] in the json
kubectl get pods -ojson | jq .items[].status.containerStatuses[].state.terminated.exitCode
Alternatively, as u/blaimi mentioned, you can do it without jq, like this:
kubectl get pod pod_name -o jsonpath --template='{.status.containerStatuses[*].state.terminated.exitCode}

Imperative command for creating job and cronjob in Kubernetes

Is this a valid imperative command for creating job?
kubectl create job my-job --image=busybox
I see this in https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands. But the command is not working. I am getting error as bellow:
Error: unknown flag: --image
What is the correct imperative command for creating job?
Try this one
kubectl create cronjob my-job --schedule="0,15,30,45 * * * *" --image=busy-box
What you have should work, though it not recommended as an approach anymore. I would check what version of kubectl you have, and possibly upgrade it if you aren't using the latest.
That said, the more common approach these days is to write a YAML file containing the Job definition and then run kubectl apply -f myjob.yaml or similar. This file-driven approach allowed for more natural version control, editing, review, etc.
Using correct value for --restart field on "kubectl run" will result run command to create an deployment or job or cronjob
--restart='Always': The restart policy for this Pod. Legal values [Always, OnFailure, Never]. If set to 'Always'
a deployment is created, if set to 'OnFailure' a job is created, if set to 'Never', a regular pod is created. For the
latter two --replicas must be 1. Default 'Always', for CronJobs `Never`.
Use "kubectl run" for creating basic kubernetes job using imperatively command as below
master $ kubectl run nginx --image=nginx --restart=OnFailure --dry-run -o yaml > output.yaml
Above should result an "output.yaml" as below example, you can edit this yaml for advance configurations as needed and create job by "kubectl create -f output.yaml or if you just need basic job then remove --dry-run option from above command and you will get basic job created.
apiVersion: batch/v1
kind: Job
metadata:
creationTimestamp: null
labels:
run: nginx
name: nginx
spec:
template:
metadata:
creationTimestamp: null
labels:
run: nginx
spec:
containers:
- image: nginx
name: nginx
resources: {}
restartPolicy: OnFailure
status: {}

Not able to see Pod when I create a Job

When I try to create Deployment as Type Job, it's not pulling any image.
Below is .yaml:
apiVersion: batch/v1
kind: Job
metadata:
name: copyartifacts
spec:
backoffLimit: 1
template:
metadata:
name: copyartifacts
spec:
restartPolicy: "Never"
volumes:
- name: sharedvolume
persistentVolumeClaim:
claimName: shared-pvc
- name: dockersocket
hostPath:
path: /var/run/docker.sock
containers:
- name: copyartifacts
image: alpine:3.7
imagePullPolicy: Always
command: ["sh", "-c", "ls -l /shared; rm -rf /shared/*; ls -l /shared; while [ ! -d /shared/artifacts ]; do echo Waiting for artifacts to be copied; sleep 2; done; sleep 10; ls -l /shared/artifacts; "]
volumeMounts:
- mountPath: /shared
name: sharedvolume
Can you please guide here?
Regards,
Vikas
There could be two possible reasons for not seeing pod.
The pod hasn't been created yet.
The pod has completed it's task and terminated before you have noticed.
1. Pod hasn't been created:
If pod hasn't been created yet, you have to find out why the job failed to create pod. You can view job's events to see if there are any failure event. Use following command to describe a job.
kubectl describe job <job-name> -n <namespace>
Then, check the Events: field. There might be some events showing pod creation failure with respective reason.
2. Pod has completed and terminated:
Job's are used to perform one-time task rather than serving an application that require to maintain a desired state. When the task is complete, pod goes to completed state then terminate (but not deleted). If your Job is intended for a task that does not take much time, the pod may terminate after completing the task before you have noticed.
As the pod is terminated, kubectl get pods will not show that pod. However, you will able to see the pod using kubectl get pods -a command as it hasn't been deleted.
You can also describe the job and check for completion or success event.
if you use kind created the K8s cluster, all the cluster node run as docker. If you had reboot you computer or VM, the cluster (pod) ip address may change, leeding to the cluster node internet communication failed. In this case, see the cluster manager logs, it has error message. Job created, but pod not.
try to re-create the cluster, or change the node config about ip address.