I am able to successfully run a Kubernetes Job with multiple parallel worker processes by following the example in "Fine Parallel Processing Using a Work Queue" in the official Kubernetes documentation
(https://kubernetes.io/docs/tasks/job/fine-parallel-processing-work-queue/)
For example, with parallelism: 2 in the Job definition yaml file, I am able to complete the task on 2 worker pods in parallel.
Hence, the command:
kubectl get jobs
returns:
NAME     COMPLETIONS   DURATION   AGE
worker   2/1 of 2      1h         6h
My question is: how exactly should the notation 2/1 of 2 in the COMPLETIONS column be interpreted?
(In particular, what does the /1 part mean?) I cannot find anything helpful about this in the official documentation.
Thank you for your assistance.
[Update] The status of the pods, when the job is completed, is the following:
kubectl get pods
returns:
NAME READY STATUS RESTARTS AGE
worker-dt2ss 0/1 Completed 0 6h
worker-qm56f 0/1 Completed 0 6h
A Job is completed when a certain number of Pods terminate successfully; .spec.completions specifies how many Pods must terminate successfully before the Job is considered complete.
For a work-queue Job like yours, where .spec.completions is left unset, kubectl prints the COMPLETIONS column in the form <succeeded>/1 of <parallelism>. So 2/1 of 2 means that two pods terminated successfully, out of two run in parallel; the /1 part is simply how kubectl renders the unspecified completions count for this kind of Job.
DURATION indicates how long the workload in the Job has been running, which is useful for performance analysis.
AGE is the current time minus the Job's creation time, i.e. the time elapsed since the Job was created.
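If you want to read those numbers programmatically rather than eyeballing the table, the COMPLETIONS string can be pulled apart with plain shell. A minimal sketch, with the `kubectl get jobs` row hard-coded as sample data (in a cluster it would come from kubectl itself):

```shell
# Sample row as printed by `kubectl get jobs` for a work-queue Job
line="worker   2/1 of 2   1h   6h"

# The COMPLETIONS column spans fields 2-4: "<succeeded>/1 of <parallelism>"
completions=$(echo "$line" | awk '{print $2, $3, $4}')
succeeded=${completions%%/*}     # everything before the first "/"
parallelism=${completions##* }   # everything after the last space

echo "succeeded=$succeeded parallelism=$parallelism"
```

For scripting it is usually more robust to skip the printed table entirely and ask the API directly, e.g. kubectl get job worker -o jsonpath='{.status.succeeded}'.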
For some context, I'm creating an API in Python that creates K8s Jobs with user input passed in via ENV variables.
Sometimes the selected image does not exist or has been deleted, a Secret does not exist, or a Volume hasn't been created. This puts the Job's pod into a CrashLoopBackOff or ImagePullBackOff state.
First, I am wondering: are resources still allocated to the Job while it is in this state?
If so, I don't want the Job to loop forever and lock resources for a Job that will never start.
I've set backoffLimit to 0, but that only applies when the Job detects a failed Pod and launches another Pod to retry. In my case, I know that if a Pod fails for one of these Jobs, it is mostly due to OOM or code that fails deterministically because of the user input, so retrying will always fail.
But backoffLimit doesn't limit the number of retries for CrashLoopBackOff or ImagePullBackOff. Is there a way to make the Job terminate or fail? I don't want to kill it, just free the resources while keeping the events in (status.container.state.waiting.reason + status.container.state.waiting.message) or (status.container.state.terminated.reason + status.container.state.terminated.exit_code).
Could there be an option to set at creation time that limits the number of retries, so I can free resources without removing the Job and losing its logs?
I have tested your first question, and yes: even if a pod is in the CrashLoopBackOff state, the resources are still allocated to it! Here is my test: Are the Kubernetes requested resources by a pod still allocated to it when it is in crashLoopBackOff state?
Thanks for your question!
Long story short, unfortunately there is no such option in Kubernetes.
However, you can handle this manually: check whether the pod is in a CrashLoopBackOff state and, if so, free its resources by simply deleting the pod.
The following script deletes any pod in the CrashLoopBackOff state from a specified namespace:
#!/bin/bash
# This script checks the passed namespace and deletes pods in the 'CrashLoopBackOff' state
NAMESPACE="test"
delpods=$(sudo kubectl get pods -n "${NAMESPACE}" |
  grep -i 'CrashLoopBackOff' |
  awk '{print $1}')
for i in ${delpods}; do
  sudo kubectl delete pod "$i" --force=true --wait=false \
    --grace-period=0 -n "${NAMESPACE}"
done
Since we have passed the option --grace-period=0 the pod won't automatically restart again.
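A slightly more defensive variant of the filter matches CrashLoopBackOff only in the STATUS column instead of anywhere in the line, so a pod whose name happens to contain that string is not deleted by accident. A sketch, with the kubectl call stubbed out by sample output (the pod names are made up):

```shell
# Stand-in for `kubectl get pods -n "$NAMESPACE"`; replace with the real call in a cluster
kubectl_output() {
cat <<'EOF'
NAME                   READY   STATUS             RESTARTS   AGE
web-5d4f8c9b6d-abcde   0/1     CrashLoopBackOff   12         1h
api-6f7b8d9c5e-fghij   1/1     Running            0          2h
EOF
}

# Match only when the STATUS field ($3) is exactly CrashLoopBackOff
delpods=$(kubectl_output | awk '$3 == "CrashLoopBackOff" {print $1}')
echo "$delpods"
```

The awk field test replaces the case-insensitive grep, which would also match e.g. a pod named crashloopbackoff-tester.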
But if, after using this script or running it as a job, you notice that the pod keeps restarting and falling into the CrashLoopBackOff state again for some reason, there is a workaround: change the restart policy of the pod:
A PodSpec has a restartPolicy field with possible values Always,
OnFailure, and Never. The default value is Always. restartPolicy
applies to all Containers in the Pod. restartPolicy only refers to
restarts of the Containers by the kubelet on the same node. Exited
Containers that are restarted by the kubelet are restarted with an
exponential back-off delay (10s, 20s, 40s …) capped at five minutes,
and is reset after ten minutes of successful execution. As discussed
in the Pods document, once bound to a node, a Pod will never be
rebound to another node.
See more details in the documentation.
And that is it! Happy hacking.
Regarding the first question, it is already answered by bguess here.
When you run "kubectl get pods -A -o wide" you get a list of pods and a STATUS column.
Where can I get a list of the possible status options?
What I'm trying to do is generate a list of statuses and how many pods are in each status. If I had a list of the possible status values, I could do what I need.
Thanks.
If you also want the result broken down per container, you can try this command:
kubectl get pods -A -o wide --no-headers | cut -b 85-108 | sort | uniq -c
The output looks something like:
2 0/1 CrashLoopBackOff
1 0/3 Pending
260 1/1 Running
4 2/2 Running
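Note that cut -b 85-108 depends on the exact byte offsets of the columns, which vary with pod and namespace name lengths. A sketch of the same grouping keyed on the READY and STATUS fields instead, with the kubectl call simulated by a heredoc (pod names are made up):

```shell
# Stand-in for `kubectl get pods -A --no-headers`; with -A, READY is field 3 and STATUS field 4
sample() {
cat <<'EOF'
ns1   pod-a   0/1   CrashLoopBackOff   3   1h
ns1   pod-b   1/1   Running            0   2h
ns2   pod-c   1/1   Running            0   5m
ns2   pod-d   0/3   Pending            0   1m
EOF
}

# Group by READY + STATUS without relying on byte offsets
sample | awk '{print $3, $4}' | sort | uniq -c
```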
As suggested in a comment on Complete list of pod statuses:
$ kubectl get pod -A --no-headers |awk '{arr[$4]++}END{for (a in arr) print a, arr[a]}'
Evicted 1
Running 121
CrashLoopBackOff 4
Completed 5
Pending 1
This command shows how many pods are currently in each state.
But how do you get the possible values of all the states?
In my view, there is no API or command to get that list.
The STATUS column ("The aggregate status of the containers in this pod") is computed in the kubectl printer source code, which you can find at https://github.com/kubernetes/kubernetes/blob/master/pkg/printers/internalversion/printers.go#L741. It starts from pod.Status.Phase, may be overridden by container state reasons, and can change between versions.
A phase of a Pod is a simple, high-level summary of where the Pod is in its Lifecycle.
The phase is not intended to be a comprehensive rollup of observations of Container or Pod state, nor is it intended to be a comprehensive state machine.
Here are the possible values for phase:
Pending: The Pod has been accepted by the Kubernetes system, but one or more of the Container images has not been created. This includes time before being scheduled as well as time spent downloading images over the network, which could take a while.
Running: The Pod has been bound to a node, and all the Containers have been created. At least one Container is still running, or is working on starting or restarting.
Succeeded: All Containers in the Pod have terminated in success, and will not be restarted.
Failed: All Containers in the Pod have terminated, and at least one Container has terminated in failure. That is, the Container either exited with non-zero status or was terminated by the system.
Unknown: For some reason the state of the Pod could not be obtained, typically due to an error in communicating with the host of the Pod.
If you are interested in detailed arrays with Pod conditions, I suggest looking at Pod Lifecycle from Kubernetes documentation and inspect source code for remaining information.
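If what you want to count is the phase field itself (rather than the printed STATUS, which mixes in reasons like CrashLoopBackOff), you can extract .status.phase for every pod with jsonpath and aggregate. A sketch, with the kubectl call replaced by sample data so the pipeline is self-contained:

```shell
# In a real cluster this function body would be:
#   kubectl get pods -A -o jsonpath='{range .items[*]}{.status.phase}{"\n"}{end}'
phases() {
cat <<'EOF'
Running
Running
Pending
Succeeded
Failed
EOF
}

# Count pods per phase, most common first
phases | sort | uniq -c | sort -rn
```

Since the phase values are fixed by the API (Pending, Running, Succeeded, Failed, Unknown), this aggregation is stable across kubectl versions, unlike the printed STATUS column.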
I created a k8s CronJob with the following schedule (run every minute):
schedule: "*/1 * * * *"
I see my CronJob created:
NAMESPACE NAME READY STATUS RESTARTS AGE
job-staging job-1593017820-tt2sn 2/3 Running 0 10m
My job simply does a Printf to the log, one time, then exits.
When I do a kubectl get cronjob I see:
NAMESPACE     NAME   SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
job-staging   job    */1 * * * *   False     1        19m             19m
When I look at the logs, it looks like it only ran once, which was the first run. Do I need to prevent my program from exiting?
I assumed k8s would restart my program, but maybe that's a wrong assumption.
Your assumption about the behavior of Kubernetes ("restarting the program") is correct.
As you may know, a Job is basically a Kubernetes Pod that executes some process and successfully finishes when it exits with a zero exit code. The "Cron" part of CronJob is the most obvious, scheduling the Job to execute in a particular time pattern.
Most YAML examples for CronJobs include the restartPolicy: OnFailure key, which tells the kubelet to restart the Job's container if it exits with a non-zero code (the hello-world example in the Kubernetes documentation uses this setting).
From the output of your kubectl command, it looks like your Job may be failing. I would recommend checking the logs of the CronJob's pod using kubectl logs -f -n default job-1593017820-tt2sn for any possible errors in the execution of your script (if your script explicitly exits with an exit code, check for a possible non-zero code).
[UPDATE]
CronJobs also have limitations:
A cron job creates a job object about once per execution time of its schedule. We say “about” because there are certain circumstances where two jobs might be created, or no job might be created. We attempt to make these rare, but do not completely prevent them. Therefore, jobs should be idempotent.
I think these are pretty rare scenarios, but maybe you've found yourself in these rare situations. The documentation is here.
I have a requirement: I want to know every part of the time a Pod spends starting up.
How much time does it take to pull the Docker image? A Pod may have multiple initContainers and containers, and I want to know the timing for each of them.
Maybe I can analyze the Events using
kubectl describe pod <pod-name>
How much time does a Pod take to become ready, from being created to its readiness probe passing?
For a bare Pod, I can read the Pod's startTime and the time at which it finished, then calculate the duration.
But for Pods created by a Deployment, StatefulSet, or DaemonSet, I cannot find any timestamp indicating the first time the Pod became ready.
I want to know how much time was spent getting the Pod ready, not the age of the Pod.
The easiest method is to subscribe to the API server so that it notifies you whenever something changes in your cluster.
For example, I issued:
$ kubectl get pods --output-watch-events --watch
and then created a new pod. Here is the output:
EVENT NAME READY STATUS RESTARTS AGE
ADDED example-pod 0/1 Pending 0 0s
MODIFIED example-pod 0/1 ContainerCreating 0 0s
MODIFIED example-pod 0/1 Running 0 19s
MODIFIED example-pod 1/1 Running 0 23s
and here is a little explanation:
As you can see, the first event is ADDED with the pod in the Pending state, which means the pod object has just been created.
The second event is MODIFIED with the ContainerCreating status and age 0, which means it took less than 1s to assign/schedule the pod to a node. Now the kubelet starts downloading the container image.
The third event has the Running status, meaning the container has started running. Looking at the AGE column, 19s have passed since the previous event, so it took around 19s to download the image and start the container. The READY column still shows 0/1, so the container is running but not yet ready.
The fourth event has the READY column set to 1/1, so the readiness probe has passed successfully. Looking at the AGE column, it took around 4s (23-19) for the readiness probe to pass and the pod status to change.
If this information is not enough, you can use the --output= parameter to receive the full pod specification on every change.
You can also play with kubectl get events to receive more events, and by adding the --watch flag you can watch events in real time.
If you want a higher level of flexibility, use a dedicated Kubernetes client library instead of kubectl to receive and process this information.
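The watch output above still requires reading ages by eye. To turn "time to ready" into a number, you can subtract the Pod's creationTimestamp from the lastTransitionTime of its Ready condition; both are real fields in the Pod status. A sketch with the two timestamps hard-coded as made-up values (in a cluster they would come from the commented kubectl calls), assuming GNU date:

```shell
# In a real cluster:
#   created=$(kubectl get pod example-pod -o jsonpath='{.metadata.creationTimestamp}')
#   ready=$(kubectl get pod example-pod -o jsonpath='{.status.conditions[?(@.type=="Ready")].lastTransitionTime}')
created="2023-05-01T10:00:00Z"   # made-up timestamps for illustration
ready="2023-05-01T10:00:23Z"

# GNU date converts RFC 3339 timestamps to epoch seconds
start=$(date -d "$created" +%s)
end=$(date -d "$ready" +%s)
echo "time to ready: $((end - start))s"
```

Note that lastTransitionTime only reflects the most recent Ready transition, so this measures first-time readiness only if the pod has not flapped since.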
I am working on two servers, each with a number of pods. The first server is the validation environment, where I can use kubectl commands, but the second server is the production environment, where I do not have full rights, and getting full rights on it is out of the question.
I am putting together platform stability statistics, and I need information about the last restart of the pods. I can see the AGE column, but I cannot use a screenshot in my statistics, so I need a command that outputs every pod's age or last restart time.
P.S. Every night at 00:00 the pods' logs are saved and archived in a separate folder.
kubectl get pods already gives you that info:
$ kubectl get po
NAME READY STATUS RESTARTS AGE
nginx-7cdbd8cdc9-8pnzq 1/1 Running 0 36s
$ kubectl delete po nginx-7cdbd8cdc9-8pnzq
pod "nginx-7cdbd8cdc9-8pnzq" deleted
$ kubectl get po
NAME READY STATUS RESTARTS AGE
nginx-7cdbd8cdc9-67l8l 1/1 Running 0 4s
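If you need the creation times as copy-pastable text rather than the relative AGE column, custom-columns can print the raw creationTimestamp per pod. A sketch, with the kubectl call simulated by a heredoc (pod names and times are made up):

```shell
# Stand-in for:
#   kubectl get pods -o custom-columns='NAME:.metadata.name,CREATED:.metadata.creationTimestamp' --no-headers
pods_created() {
cat <<'EOF'
nginx-7cdbd8cdc9-67l8l   2023-05-01T09:59:56Z
worker-dt2ss             2023-05-01T04:00:00Z
EOF
}

# RFC 3339 timestamps sort correctly as plain strings, so the oldest pod comes first
pods_created | sort -k2
```

Because the timestamps are absolute rather than relative, the output can be pasted directly into a report or processed later.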
I found a solution:
command:
zgrep "All subsystems started successfully" 201911??/*ota*
response:
23:23:37,429 [INFO ] main c.o.c.a.StartUp - All subsystems started successfully
P.S. "ota" is my pod's name.