Kubernetes job vs pod in ContainerCreating state

I am trying to figure out if there is a way to force a pod that is stuck in the ContainerCreating state (for valid reasons, such as being unable to mount an inaccessible NFS share) to move to a failed state after a specific amount of time.
I have Kubernetes jobs that I'm running through a Jenkins pipeline. I'm using the job state (type: Complete|Failed) to determine the outcome, and then I gather the results of the jobs (kubectl get pods + kubectl logs). It works well as long as the pods go into a known failed state, like ContainerCannotRun or BackoffLimitExceeded, and the job state therefore goes to Failed.
The problem arises when a pod goes into the ContainerCreating state and stays that way. Then the job state stays Active and will never change. Is there a way, in the job manifest, to force a pod that's stuck in ContainerCreating to move to a failed state after a certain amount of time?
Example:
pod status
- image: myimage
  imageID: ""
  lastState: {}
  name: primary
  ready: false
  restartCount: 0
  state:
    waiting:
      reason: ContainerCreating
hostIP: x.y.z.y
phase: Pending
qosClass: BestEffort
startTime: "2020-05-06T17:09:58Z"
job status
active: 1
startTime: "2020-05-06T17:09:58Z"
Thanks for any input.

As documented here, use activeDeadlineSeconds or backoffLimit.
The activeDeadlineSeconds applies to the duration of the job, no matter how many Pods are created. Once a Job reaches activeDeadlineSeconds, all of its running Pods are terminated and the Job status will become type: Failed with reason: DeadlineExceeded.
Once backoffLimit has been reached the Job will be marked as failed and any running Pods will be terminated.
Note that a Job’s activeDeadlineSeconds takes precedence over its backoffLimit. Therefore, a Job that is retrying one or more failed Pods will not deploy additional Pods once it reaches the time limit specified by activeDeadlineSeconds, even if the backoffLimit is not yet reached.
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-with-timeout
spec:
  backoffLimit: 5
  activeDeadlineSeconds: 100
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never

Related

How to make a Kubernetes job fail when it cannot find the pod image?

When a container image is not present on the cluster, the pod fails with the error ErrImageNeverPull, but the job never fails. Is there a configuration that I can add to make sure the job fails if the pod startup fails?
apiVersion: batch/v1
kind: Job
metadata:
  name: image-not-present
spec:
  backoffLimit: 0
  ttlSecondsAfterFinished: 120
  template:
    spec:
      serviceAccountName: consolehub
      containers:
      - name: image-not-present
        image: aipaintr/image_not_present:latest
        imagePullPolicy: Never
      restartPolicy: OnFailure
You can configure activeDeadlineSeconds for this case. However, you need to know roughly how long your job takes to reach Complete status, so that the timeout does not kill a pod that is still processing.
From the documentation:
The activeDeadlineSeconds applies to the duration of the job, no matter how many Pods are created. Once a Job reaches activeDeadlineSeconds, all of its running Pods are terminated and the Job status will become type: Failed with reason: DeadlineExceeded.
For example, I created a job with a wrong image and activeDeadlineSeconds: 100. As expected, the pod was stuck in status Pending because of the wrong image (visible in kubectl describe pod).
After 100 seconds, the Job was marked Failed and the pod was killed as well (visible in kubectl describe job).
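A minimal sketch of the manifest from the question with activeDeadlineSeconds added (the 120-second value is an assumption; pick a value longer than a healthy run of your job):
apiVersion: batch/v1
kind: Job
metadata:
  name: image-not-present
spec:
  backoffLimit: 0
  ttlSecondsAfterFinished: 120
  # Assumption: a healthy run finishes well within 120s.
  activeDeadlineSeconds: 120
  template:
    spec:
      serviceAccountName: consolehub
      containers:
      - name: image-not-present
        image: aipaintr/image_not_present:latest
        imagePullPolicy: Never
      restartPolicy: OnFailure
With this, a pod that never starts (ErrImageNeverPull, stuck Pending) still causes the Job to be marked Failed with reason DeadlineExceeded once the deadline passes.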

What is the default .spec.activeDeadlineSeconds in a Kubernetes job if you don't explicitly set it?

In a Kubernetes job, there is a .spec.activeDeadlineSeconds field. If you don't explicitly set it, what will be the default value? 600 seconds?
Here is the example from the k8s docs:
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-with-timeout
spec:
  backoffLimit: 5
  activeDeadlineSeconds: 100
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
assume I remove the line
activeDeadlineSeconds: 100
By default, a Job will run uninterrupted.
If you don't set activeDeadlineSeconds, the job has no active deadline limit; in other words, activeDeadlineSeconds has no default value.
By the way, there are several ways to terminate a job. (Of course, when a Job completes, no more Pods are created.)
Pod backoff failure policy (.spec.backoffLimit)
You can set .spec.backoffLimit to specify the number of retries before considering a Job as failed. The back-off limit is set by default to 6. Failed Pods associated with the Job are recreated by the Job controller with an exponential back-off delay (10s, 20s, 40s ...) capped at six minutes. The back-off count is reset when a Job's Pod is deleted or successful without any other Pods for the Job failing around that time.
Setting an active deadline (.spec.activeDeadlineSeconds)
The activeDeadlineSeconds applies to the duration of the job, no matter how many Pods are created. Once a Job reaches activeDeadlineSeconds, all of its running Pods are terminated and the Job status will become type: Failed with reason: DeadlineExceeded.
Note that a Job's .spec.activeDeadlineSeconds takes precedence over its .spec.backoffLimit. Therefore, a Job that is retrying one or more failed Pods will not deploy additional Pods once it reaches the time limit specified by activeDeadlineSeconds, even if the backoffLimit is not yet reached.
It is not set by default. Here is a note from the changelog:
ActiveDeadlineSeconds is validated in workload controllers now, make sure it's not set anywhere (it shouldn't be set by default and having it set means your controller will restart the Pods at some point) (#38741)
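As a quick sketch of what removing the line means in practice (the Job name here is assumed): without activeDeadlineSeconds the Job has no time limit and stays Active until it completes or exhausts its backoffLimit.
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-without-timeout   # assumed name for illustration
spec:
  # No activeDeadlineSeconds: the Job has no time limit and stays Active
  # until it completes or backoffLimit (default 6) is reached.
  backoffLimit: 5
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never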

Why is the pod status coming up as CrashLoopBackOff in my case?

I created a pod which just calculates pi and exits, and it should run again. On monitoring, I observed the status was Running, then turned to Completed, and finally CrashLoopBackOff.
I tried different images, but the issue is the same.
apiVersion: v1
kind: Pod
metadata:
  name: pi
spec:
  containers:
  - name: pi
    image: perl
    command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
I expected to see the Running and Completed statuses in a series, but I see CrashLoopBackOff.
qvamjak#qvamjak:~/Jobs$ kubectl get pods
NAME READY STATUS RESTARTS AGE
pi 0/1 CrashLoopBackOff 16 63m
This is due to the restartPolicy of your pod: by default it is Always, which means the pod expects all its containers to be long-running (e.g., an HTTP server), so if any container in the pod exits (even successfully with code 0), it will be restarted. In this case, you want to set restartPolicy to OnFailure. See https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy.
In addition to that, a Job rather than a Pod is the resource type you want for run-to-completion applications, because a Job manages pods for you and offers more guarantees than an unmanaged Pod.
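A minimal sketch of the same pi workload wrapped in a Job instead of a bare Pod (the Job name is assumed; the container spec is taken from the question):
apiVersion: batch/v1
kind: Job
metadata:
  name: pi   # assumed name
spec:
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      # A Job's pod template only allows Never or OnFailure (never Always),
      # so a successfully completed run is not restarted into CrashLoopBackOff.
      restartPolicy: Never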

Pods stuck in PodInitializing state indefinitely

I've got a k8s cronjob that consists of an init container and one main container. If the init container fails, the main container never gets started, and the Pod stays in "PodInitializing" indefinitely.
My intent is for the job to fail if the init container fails.
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: job-name
  namespace: default
  labels:
    run: job-name
spec:
  schedule: "15 23 * * *"
  startingDeadlineSeconds: 60
  concurrencyPolicy: "Forbid"
  successfulJobsHistoryLimit: 30
  failedJobsHistoryLimit: 10
  jobTemplate:
    spec:
      # only try twice
      backoffLimit: 2
      activeDeadlineSeconds: 60
      template:
        spec:
          initContainers:
          - name: init-name
            image: init-image:1.0
            restartPolicy: Never
          containers:
          - name: some-name
            image: someimage:1.0
            restartPolicy: Never
A kubectl describe on the pod that's stuck results in:
Name:               job-name-1542237120-rgvzl
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               my-node-98afffbf-0psc/10.0.0.0
Start Time:         Wed, 14 Nov 2018 23:12:16 +0000
Labels:             controller-uid=ID
                    job-name=job-name-1542237120
Annotations:        kubernetes.io/limit-ranger:
                      LimitRanger plugin set: cpu request for container elasticsearch-metrics; cpu request for init container elasticsearch-repo-setup; cpu requ...
Status:             Failed
IP:                 10.0.0.0
Controlled By:      Job/job-1542237120
Init Containers:
  init-container-name:
    Container ID:   docker://ID
    Image:          init-image:1.0
    Image ID:       init-imageID
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Wed, 14 Nov 2018 23:12:21 +0000
      Finished:     Wed, 14 Nov 2018 23:12:32 +0000
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:  100m
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-wwl5n (ro)
Containers:
  some-name:
    Container ID:
    Image:          someimage:1.0
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:  100m
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-wwl5n (ro)
Conditions:
  Type              Status
  Initialized       False
  Ready             False
  ContainersReady   False
  PodScheduled      True
To try and figure this out I would run the command:
kubectl get pods - Add the namespace param if required.
Then copy the pod name and run:
kubectl describe pod {POD_NAME}
That should give you some information as to why it's stuck in the initializing state.
A Pod can be stuck in Init status due to many reasons.
PodInitializing or Init status means that the Pod contains an init container that hasn't finished (init containers are specialized containers that run before the app containers in a Pod; they can contain utilities or setup scripts). If the pod's status is Init:0/1, it means that one init container has not finished; Init:N/M means the Pod has M init containers, and N have completed so far.
Gathering information
For these scenarios, the best approach is to gather information, as the root cause can be different in every PodInitializing issue.
kubectl describe pods pod-XXX: with this command you can get a lot of information about the pod and check whether there are any meaningful events. Note the init container name.
kubectl logs pod-XXX: this command prints the logs for a container in a pod or specified resource.
kubectl logs pod-XXX -c init-container-xxx: this is the most useful, as it prints the logs of the init container itself. You can get the init container name by describing the pod and substitute it for "init-container-xxx" (for example, "copy-default-config").
The output of kubectl logs pod-XXX -c init-container-xxx can reveal meaningful information about the issue.
In the referenced example, the root cause was that the init container couldn't download the plugins from Jenkins (a timeout); from there you can check the connection config, proxy, or DNS, or just modify the yaml to deploy the container without the plugins.
Additional:
kubectl describe node node-XXX: describing the pod will give you the name of its node, which you can also inspect with this command.
kubectl get events: lists the cluster events.
journalctl -xeu kubelet | tail -n 10: kubelet logs on systemd (journalctl -xeu docker | tail -n 1 for docker).
Solutions
The solution depends on the information gathered once the root cause is found.
When you find a log entry that gives insight into the root cause, you can investigate that specific root cause.
Some examples:
1 > There, this happened when the init container was deleted; it can be fixed by deleting the pod so that it is recreated, or by redeploying it. The same scenario appears in 1.1.
2 > If you find "bad address 'kube-dns.kube-system'", the PVC may not have been recycled correctly; the solution provided in 2 is to run /opt/kubernetes/bin/kube-restart.sh.
3 > There, an sh file was not found; the solution would be to modify the yaml file or remove the container if it is unnecessary.
4 > A FailedSync was found, and it was solved by restarting docker on the node.
In general you can modify the yaml, for example to avoid using an outdated URL, try to recreate the affected resource, or just remove the init container that causes the issue from your deployment. However, the specific solution will depend on the specific root cause.
I think you may have missed that this is the expected behavior of init containers.
The rule is that in the case of an initContainers failure, a Pod will not be restarted if restartPolicy is set to Never; otherwise Kubernetes will keep restarting it until it succeeds.
Also:
If the init container fails, the main container never gets started, and the Pod stays in "PodInitializing" indefinitely.
According to documentation:
A Pod cannot be Ready until all Init Containers have succeeded. The ports on an Init Container are not aggregated under a service. A Pod that is initializing is in the Pending state but should have a condition Initializing set to true.
I can see that you tried to change this behavior, but I am not sure whether you can do that with a CronJob; I have only seen examples with Jobs. I am just theorizing, and if this post did not help you solve your issue, I can try to recreate it in a lab environment.
Since you have already figured out that init containers are meant to run to completion successfully: if you can't get rid of the init containers, what I would do in this case is make sure the init container always ends successfully. The result of the init container can be written to an emptyDir volume, something like a status file, shared by both your init container and your work container.
I would delegate to the work container the responsibility of deciding what to do in case the init container ends unsuccessfully.
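A minimal sketch of that idea, reusing the image names from the question; do-setup and run-main-work are placeholders for the real commands:
apiVersion: batch/v1
kind: Job
metadata:
  name: job-with-status-file   # assumed name
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      volumes:
      - name: status
        emptyDir: {}           # shared scratch space for the status file
      initContainers:
      - name: init-name
        image: init-image:1.0
        # Always exit 0; record the real outcome in /status/init instead.
        command: ["sh", "-c", "if do-setup; then echo ok > /status/init; else echo failed > /status/init; fi"]
        volumeMounts:
        - name: status
          mountPath: /status
      containers:
      - name: some-name
        image: someimage:1.0
        # The work container decides what to do when init did not succeed.
        command: ["sh", "-c", "grep -q ok /status/init || { echo 'init failed'; exit 1; }; run-main-work"]
        volumeMounts:
        - name: status
          mountPath: /status
Because the work container exits non-zero when the status file says the init step failed, the Job's normal backoffLimit/activeDeadlineSeconds handling can then mark the Job as failed.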

Why do pods with Completed status still show up in kubectl get pods?

I have executed the samples from the book "Kubernetes Up and Running" where a pod with a work queue is run, then a k8s job creates 5 pods to consume all the work on the queue. I have reproduced the yaml API objects below.
My expectation is that once a k8s job completes, its pods would be deleted, but kubectl get pods -o wide shows the pods are still around, even though it reports 0/1 containers ready and they still seem to have IP addresses assigned; see the output below.
When will completed job pods be removed from the output of kubectl get pods, and why doesn't that happen right after all the containers in the pod finish?
Are the pods consuming any resources when they complete, like an IP address, or is the printed info just historical?
Output from kubectl after all the pods have consumed all the messages.
kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
consumers-bws9f 0/1 Completed 0 6m 10.32.0.35 gke-cluster1-default-pool-3796b2ee-rtcr
consumers-d25cs 0/1 Completed 0 6m 10.32.0.33 gke-cluster1-default-pool-3796b2ee-rtcr
consumers-jcwr8 0/1 Completed 0 6m 10.32.2.26 gke-cluster1-default-pool-3796b2ee-tpml
consumers-l9rkf 0/1 Completed 0 6m 10.32.0.34 gke-cluster1-default-pool-3796b2ee-rtcr
consumers-mbd5c 0/1 Completed 0 6m 10.32.2.27 gke-cluster1-default-pool-3796b2ee-tpml
queue-wlf8v 1/1 Running 0 22m 10.32.0.32 gke-cluster1-default-pool-3796b2ee-rtcr
The following three k8s API objects were created; these are cut and pasted from the book samples.
Run a pod with a work queue
apiVersion: extensions/v1beta1
kind: ReplicaSet
metadata:
  labels:
    app: work-queue
    component: queue
    chapter: jobs
  name: queue
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: work-queue
        component: queue
        chapter: jobs
    spec:
      containers:
      - name: queue
        image: "gcr.io/kuar-demo/kuard-amd64:1"
        imagePullPolicy: Always
Expose the pod as a service so that the worker pods can get to it.
apiVersion: v1
kind: Service
metadata:
  labels:
    app: work-queue
    component: queue
    chapter: jobs
  name: queue
spec:
  ports:
  - port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    app: work-queue
    component: queue
Post 100 items to the queue then run a job with 5 pods executing in parallel until the queue is empty.
apiVersion: batch/v1
kind: Job
metadata:
  labels:
    app: message-queue
    component: consumer
    chapter: jobs
  name: consumers
spec:
  parallelism: 5
  template:
    metadata:
      labels:
        app: message-queue
        component: consumer
        chapter: jobs
    spec:
      containers:
      - name: worker
        image: "gcr.io/kuar-demo/kuard-amd64:1"
        imagePullPolicy: Always
        args:
        - "--keygen-enable"
        - "--keygen-exit-on-complete"
        - "--keygen-memq-server=http://queue:8080/memq/server"
        - "--keygen-memq-queue=keygen"
      restartPolicy: OnFailure
The docs say it pretty well:
When a Job completes, no more Pods are created, but the Pods are not deleted either. Keeping them around allows you to still view the logs of completed pods to check for errors, warnings, or other diagnostic output. The job object also remains after it is completed so that you can view its status. It is up to the user to delete old jobs after noting their status. Delete the job with kubectl (e.g. kubectl delete jobs/pi or kubectl delete -f ./job.yaml). When you delete the job using kubectl, all the pods it created are deleted too.
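If you prefer the cleanup to happen automatically, newer Kubernetes versions support ttlSecondsAfterFinished on the Job spec. A minimal sketch based on the consumers Job above (the 300-second TTL and the trimmed container spec are assumptions):
apiVersion: batch/v1
kind: Job
metadata:
  name: consumers
spec:
  parallelism: 5
  # Assumption: delete the finished Job (and its pods) 300s after completion.
  ttlSecondsAfterFinished: 300
  template:
    spec:
      containers:
      - name: worker
        image: "gcr.io/kuar-demo/kuard-amd64:1"
      restartPolicy: OnFailure
Until the Job is deleted (manually or via TTL), the completed pods remain visible in kubectl get pods, as described in the quote above.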
It shows Completed status when the container has actually terminated. If you set restartPolicy: Never (when you don't want it to run more than once), then it goes into this state.
Terminated: Indicates that the container completed its execution and has stopped running. A container enters into this when it has successfully completed execution or when it has failed for some reason. Regardless, a reason and exit code is displayed, as well as the container’s start and finish time. Before a container enters into Terminated, preStop hook (if any) is executed.
...
State:          Terminated
  Reason:       Completed
  Exit Code:    0
  Started:      Wed, 30 Jan 2019 11:45:26 +0530
  Finished:     Wed, 30 Jan 2019 11:45:26 +0530
...