I have a requirement: I want to know how much time each stage of a Pod's startup takes.
How much time does it take to pull a Docker image? A Pod may have multiple initContainers and containers, and I want to know the time spent on each of them.
Maybe I can analyze the Events using
kubectl describe pod <pod-name> ...
How much time does a Pod take to become ready, from being created until its readiness probe passes?
For a bare Pod, I can read the startTime of the Pod and the time at which it finished, and then calculate the duration.
But for Pods created by a Deployment, StatefulSet, or DaemonSet, I cannot find any timestamp that indicates the first time the Pod became ready.
I want to know how much time it took to get the Pod ready, not the age of the Pod.
The easiest method would be to subscribe to the API server so that it notifies you whenever something changes in your cluster.
For example, I issued:
$ kubectl get pods --output-watch-events --watch
and then created a new pod. Here is the output:
EVENT      NAME          READY   STATUS              RESTARTS   AGE
ADDED      example-pod   0/1     Pending             0          0s
MODIFIED   example-pod   0/1     ContainerCreating   0          0s
MODIFIED   example-pod   0/1     Running             0          19s
MODIFIED   example-pod   1/1     Running             0          23s
and here is a little explanation:
As you can see, the first event is ADDED and the pod is in the Pending state, which means that the pod object has just been created.
The second event is MODIFIED with the ContainerCreating status and an age of 0s, which means it took less than 1s to assign/schedule the pod to a node. Now the kubelet starts downloading the container image.
The third event has the Running status, meaning the container has started running. Looking at the AGE column, you can see that 19s have passed since the previous event, so it took around 19s to download the image and start the container. The READY column still shows 0/1, so the container is running but is not yet in the ready state.
The fourth event has the READY column set to 1/1, so the readiness probe has passed. Looking at the AGE column again, you can see it took around 4s (23-19) for the readiness probe to succeed and the pod status to change.
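The arithmetic above can be sketched in Python. This is only an illustration: the (STATUS, READY, AGE) tuples are copied by hand from the watch output, and the function name step_durations is made up for this example.

```python
# Hand-copied from the watch output above: (STATUS, READY, AGE in seconds).
events = [
    ("Pending", "0/1", 0),
    ("ContainerCreating", "0/1", 0),
    ("Running", "0/1", 19),
    ("Running", "1/1", 23),
]


def step_durations(rows):
    """Return (from_state, to_state, seconds) for each consecutive pair of events."""
    steps = []
    for (prev_status, prev_ready, prev_age), (status, ready, age) in zip(rows, rows[1:]):
        steps.append((f"{prev_status} {prev_ready}", f"{status} {ready}", age - prev_age))
    return steps


for before, after, seconds in step_durations(events):
    print(f"{before} -> {after}: ~{seconds}s")
```

The AGE column has a resolution of one second at best, so the differences are approximate.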
If this information is not enough, you can use the --output= parameter to receive the full pod specification on every change.
You can also play with kubectl get events to receive more events. And of course, by adding the --watch flag you can watch events in real time.
If you want a higher level of flexibility, use the dedicated Kubernetes client libraries instead of kubectl to receive and process this information.
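For a pod that already exists, the timestamps stored in the pod object may be enough. Below is a minimal sketch, assuming the pod JSON was obtained with kubectl get pod <pod-name> -o json. It reads metadata.creationTimestamp and the lastTransitionTime of each entry in status.conditions. Note that lastTransitionTime records the most recent transition, so for a pod whose readiness has flapped it is not the first time the pod became ready. The function names and the sample data are made up for illustration.

```python
from datetime import datetime, timezone


def parse_ts(value):
    """Parse a Kubernetes timestamp such as 2024-01-01T10:00:00Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def seconds_until_conditions(pod):
    """Map each condition that is True to the seconds elapsed since pod creation."""
    created = parse_ts(pod["metadata"]["creationTimestamp"])
    result = {}
    for cond in pod.get("status", {}).get("conditions", []):
        if cond.get("status") == "True" and cond.get("lastTransitionTime"):
            delta = parse_ts(cond["lastTransitionTime"]) - created
            result[cond["type"]] = delta.total_seconds()
    return result


# Made-up sample in the shape returned by: kubectl get pod <pod-name> -o json
sample_pod = {
    "metadata": {"creationTimestamp": "2024-01-01T10:00:00Z"},
    "status": {
        "conditions": [
            {"type": "PodScheduled", "status": "True", "lastTransitionTime": "2024-01-01T10:00:00Z"},
            {"type": "Initialized", "status": "True", "lastTransitionTime": "2024-01-01T10:00:02Z"},
            {"type": "ContainersReady", "status": "True", "lastTransitionTime": "2024-01-01T10:00:23Z"},
            {"type": "Ready", "status": "True", "lastTransitionTime": "2024-01-01T10:00:23Z"},
        ]
    },
}

print(seconds_until_conditions(sample_pod))
```

In this sample, the gap between Initialized and ContainersReady covers image pull and container start, which is roughly what the watch output shows between ContainerCreating and Running.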
We have an Argo Rollout for one of our services. I used this command to update the image:
kubectl-argo-rollouts -n ddash5 set image detector detector=starry-academy-177207/detector:deepak-detector-8
I was expecting this to update the existing pod, but it created a new one.
NAME                        READY   STATUS    RESTARTS   AGE
detector-5d96bc8456-h2x7p   1/1     Running   0          35m
detector-68f89d8b45-j465j   0/1     Running   0          35m
Even if I delete detector-5d96bc8456-h2x7p, the pod gets recreated with the older image,
and detector-68f89d8b45-j465j stays in the 0/1 state.
I am new to Kubernetes. Can someone give me some insight into this?
Thanks!!!
Deepak
You are using Argo Rollouts. As with a Deployment rolling update, the existing Pods are not changed in place: Pod instances are replaced with new ones, and the new Pods are scheduled on Nodes with available resources. That is why a new Pod was created when you changed the image. The old Pod most likely keeps being recreated because the new Pod has not become ready (it is still at 0/1), so the old version is kept running.
Alternatively, you can use the kubectl set image command, which updates the image of an existing Deployment without recreating the Deployment object itself (its Pods are still replaced through a rolling update). Use the following command:
kubectl set image deployment/<deployment-name> <container-name>=<image>:<tag>
In your case:
kubectl set image deployment/detector detector=starry-academy-177207/detector:deepak-detector-8
This will update the existing Deployment. Try it and let me know if it works. There is also Argo CD Image Updater, which you can check out.
For some context, I'm creating an API in Python that creates K8s Jobs with user input in ENV variables.
Sometimes the selected image does not exist or has been deleted, a Secret does not exist, or a Volume has not been created. This leaves the Job's Pod in a CrashLoopBackOff or ImagePullBackOff state.
First, I am wondering whether resources are allocated to the Job while it is in this state.
If yes, I don't want the Job to loop forever and lock resources for a Job that never starts.
I've set backoffLimit to 0, but that only applies when the Job detects a Pod that has failed and tries to launch another Pod to retry. In my case, I know that if a Pod fails for a Job, it is mostly due to OOM or to code that fails because of the user input and will always fail. So retrying will always fail.
But it doesn't limit the number of retries in CrashLoopBackOff or ImagePullBackOff. Is there a way to make the Job terminate or fail? I don't want to kill it, just to free the resources and keep the events in (status.container.state.waiting.reason + status.container.state.waiting.message) or (status.container.state.terminated.reason + status.container.state.terminated.exit_code).
Is there an option I can set at creation time to limit the number of retries, so that I can free the resources without removing the Job and still keep its logs?
I have tested your first question and YES, even if a pod is in the CrashLoopBackOff state, the resources are still allocated to it! Here is my test: Are the Kubernetes requested resources by a pod still allocated to it when it is in crashLoopBackOff state?
Thanks for your question!
Long story short, unfortunately there is no such option in Kubernetes.
However, you can do this manually by checking whether the pod is in CrashLoopBackOff and then either freeing its resources or simply deleting the pod itself.
The following script deletes every pod in the CrashLoopBackOff state from a specified namespace:
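#!/bin/bash
# Delete every pod in the given namespace whose STATUS contains 'CrashLoopBackOff'.
NAMESPACE="test"
# With --no-headers, column 1 is the pod name and column 3 is STATUS.
delpods=$(sudo kubectl get pods -n "${NAMESPACE}" --no-headers |
  awk '$3 ~ /CrashLoopBackOff/ {print $1}')
# Pod names contain no whitespace, so word splitting on the list is safe here.
for pod in ${delpods}; do
  # Force-delete immediately instead of waiting for graceful termination.
  sudo kubectl delete pod "${pod}" --force=true --wait=false \
    --grace-period=0 -n "${NAMESPACE}"
done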
The options --force and --grace-period=0 delete the pod immediately instead of waiting for a graceful shutdown. They do not by themselves stop a controller from creating a replacement pod.
So if, after using this script or running it from a job, you notice that the pod keeps restarting and falls into the CrashLoopBackOff state again, there is a workaround: change the restart policy of the pod:
A PodSpec has a restartPolicy field with possible values Always,
OnFailure, and Never. The default value is Always. restartPolicy
applies to all Containers in the Pod. restartPolicy only refers to
restarts of the Containers by the kubelet on the same node. Exited
Containers that are restarted by the kubelet are restarted with an
exponential back-off delay (10s, 20s, 40s …) capped at five minutes,
and is reset after ten minutes of successful execution. As discussed
in the Pods document, once bound to a node, a Pod will never be
rebound to another node.
See more details in the documentation or from here.
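The delay sequence described in the quote can be written out as a small calculation. This is a sketch only: it reproduces the numbers from the quoted text (start at 10s, double each time, cap at five minutes) and is not the kubelet's actual implementation. The function name backoff_delays is invented for this example.

```python
def backoff_delays(restarts, base=10, cap=300):
    """Return the delay in seconds before each of the first `restarts` restarts."""
    return [min(base * 2 ** n, cap) for n in range(restarts)]


print(backoff_delays(7))  # [10, 20, 40, 80, 160, 300, 300]
```

So after about five failed restarts, a crashing container is retried only once every five minutes.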
And that is it! Happy hacking.
Regarding the first question, it is already answered by bguess here.
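Since the API in the question is written in Python, the check that the script does with text matching could also be done on the pod object itself. Below is a minimal sketch, assuming the pod is available as a dict in the shape returned by kubectl get pod <pod-name> -o json, where the waiting reason lives under status.containerStatuses[].state.waiting. The set of reasons, the function name, and the sample data are assumptions made for illustration. What to do with a stuck pod (delete it, delete the Job, record the message) is left out.

```python
# Waiting reasons treated as "will probably never start" (an assumption for this sketch).
STUCK_REASONS = {"CrashLoopBackOff", "ImagePullBackOff", "ErrImagePull"}


def stuck_containers(pod):
    """Return (container_name, reason, message) for containers waiting with a stuck reason."""
    status = pod.get("status", {})
    statuses = status.get("initContainerStatuses", []) + status.get("containerStatuses", [])
    found = []
    for cs in statuses:
        waiting = cs.get("state", {}).get("waiting") or {}
        if waiting.get("reason") in STUCK_REASONS:
            found.append((cs["name"], waiting["reason"], waiting.get("message", "")))
    return found


# Made-up sample pod with one stuck container and one healthy container.
sample_pod = {
    "metadata": {"name": "job-abc"},
    "status": {
        "containerStatuses": [
            {"name": "main", "state": {"waiting": {"reason": "ImagePullBackOff", "message": "Back-off pulling image"}}},
            {"name": "sidecar", "state": {"running": {"startedAt": "2024-01-01T10:00:00Z"}}},
        ]
    },
}

print(stuck_containers(sample_pod))
```

Returning the reason and message together means they can be stored before the pod is removed, which is what the question asks to keep.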
I am able to successfully run a Kubernetes Job with multiple parallel worker processes by following the example provided in "Fine Parallel Processing Using a Work Queue" in the official Kubernetes documentation
(https://kubernetes.io/docs/tasks/job/fine-parallel-processing-work-queue/)
For example, with parallelism: 2 in the Job definition yaml file, I am able to complete the task on 2 worker pods in parallel.
Hence, the command:
kubectl get jobs
returns:
NAME     COMPLETIONS   DURATION   AGE
worker   2/1 of 2      1h         6h
My question is: how exactly should I interpret the notation 2/1 of 2 in the COMPLETIONS column
(especially the /1 part)? I cannot find anything helpful in the official documentation about this.
Thank you for your assistance.
[Update] The status of the pods, when the job is completed, is the following:
kubectl get pods
returns:
NAME           READY   STATUS      RESTARTS   AGE
worker-dt2ss   0/1     Completed   0          6h
worker-qm56f   0/1     Completed   0          6h
A Job is completed when a certain number of Pods terminate successfully. The completions field (.spec.completions) specifies how many Pods must terminate successfully before the Job is complete.
COMPLETIONS shows the number of Pods that have succeeded / the number of successful completions required. A work-queue Job like yours leaves .spec.completions unset, so kubectl prints 1 as the required count (the success of any one Pod completes the Job), followed by "of <parallelism>". In your case, 2/1 of 2 means that 2 Pods succeeded, 1 success was required, and the Job ran with parallelism 2.
DURATION indicates how long the Job took to run, from its start time to its completion time (or to the current time if it is still running). This is useful for performance optimization.
AGE is obtained by subtracting the creation time of the Job from the current time, that is, it is the time elapsed since the Job was created.
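As far as I can tell from kubectl's printer code, the COMPLETIONS cell is built from the number of succeeded Pods, .spec.completions, and .spec.parallelism roughly as in the sketch below. This is a Python paraphrase for illustration, not the actual Go source, and the function name completions_column is made up.

```python
def completions_column(succeeded, completions=None, parallelism=None):
    """Format the COMPLETIONS cell for a Job (paraphrase of kubectl's behaviour)."""
    if completions is not None:
        # Fixed completion count: succeeded / required.
        return f"{succeeded}/{completions}"
    if parallelism is not None and parallelism > 1:
        # Work-queue Job: one success is enough, shown together with the parallelism.
        return f"{succeeded}/1 of {parallelism}"
    return f"{succeeded}/1"


print(completions_column(2, parallelism=2))   # the work-queue Job from the question
print(completions_column(1, completions=3))   # a Job with completions: 3
```

With parallelism: 2 and no completions field, this gives the same "2/1 of 2" string as in the question once both worker Pods have succeeded.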
When you run "kubectl get pods -A -o wide" you get a list of pods and a STATUS column.
Where can I get a list of the possible status options?
What I am trying to do is generate a list of statuses and how many pods are in each status. If I had a list of the possible statuses, I could do what I need.
Thanks.
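If you also want the result on a per-container basis, try this command (the byte range passed to cut depends on the column widths of your output, so adjust it as needed):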
kubectl get pods -A -o wide --no-headers | cut -b 85-108 | sort | uniq -c
The output looks like this:
2 0/1 CrashLoopBackOff
1 0/3 Pending
260 1/1 Running
4 2/2 Running
As suggested in a comment on "Complete list of pod statuses":
$ kubectl get pod -A --no-headers |awk '{arr[$4]++}END{for (a in arr) print a, arr[a]}'
Evicted 1
Running 121
CrashLoopBackOff 4
Completed 5
Pending 1
This command shows how many pods are currently in each state.
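The same tally can be sketched in Python. This assumes the text comes from kubectl get pod -A --no-headers, where the fourth whitespace-separated field is STATUS. The function name and the sample output are made up for illustration.

```python
from collections import Counter


def count_statuses(kubectl_output):
    """Count the STATUS field (4th column) of 'kubectl get pod -A --no-headers' output."""
    counts = Counter()
    for line in kubectl_output.splitlines():
        fields = line.split()
        if len(fields) >= 4:
            counts[fields[3]] += 1
    return counts


# Made-up sample output: NAMESPACE NAME READY STATUS RESTARTS AGE
sample = """\
default       web-1     1/1   Running            0     3d
default       web-2     0/1   CrashLoopBackOff   12    3d
kube-system   dns-abc   1/1   Running            0     9d
batch         job-x     0/1   Completed          0     1h
"""

for status, count in count_statuses(sample).most_common():
    print(status, count)
```

Like the awk version, this only reports statuses that currently occur in the cluster.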
But how do you get all the possible values of the status?
In my view, there is no API or command to get them.
The STATUS column is described as "The aggregate status of the containers in this pod." Its source code can be found at https://github.com/kubernetes/kubernetes/blob/master/pkg/printers/internalversion/printers.go#L741. The value starts from pod.Status.Phase and is then overridden by more specific reasons (for example, the waiting or terminated reason of a container), so it is not limited to a fixed list.
A phase of a Pod is a simple, high-level summary of where the Pod is in its Lifecycle.
The phase is not intended to be a comprehensive rollup of observations of Container or Pod state, nor is it intended to be a comprehensive state machine.
Here are the possible values for phase:
Pending: The Pod has been accepted by the Kubernetes system, but one or more of the Container images has not been created. This includes time before being scheduled as well as time spent downloading images over the network, which could take a while.
Running: The Pod has been bound to a node, and all the Containers have been created. At least one Container is still running, or is working on starting or restarting.
Succeeded: All Containers in the Pod have terminated in success, and will not be restarted.
Failed: All Containers in the Pod have terminated, and at least one Container has terminated in failure. That is, the Container either exited with non-zero status or was terminated by the system.
Unknown: For some reason the state of the Pod could not be obtained, typically due to an error in communicating with the host of the Pod.
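Because the phase, unlike the STATUS column, has a fixed set of values, a per-phase count can include phases that currently have no pods. Below is a minimal sketch, assuming the input is the parsed JSON of kubectl get pods -A -o json (a dict with an items list). Counting a pod with no phase field as Unknown is an assumption of this sketch, and the function name is made up.

```python
from collections import Counter

# The five phase values listed above.
POD_PHASES = ("Pending", "Running", "Succeeded", "Failed", "Unknown")


def count_phases(pod_list):
    """Count pods per phase, including phases with zero pods."""
    seen = Counter(
        item.get("status", {}).get("phase", "Unknown")  # assumption: missing phase -> Unknown
        for item in pod_list.get("items", [])
    )
    return {phase: seen.get(phase, 0) for phase in POD_PHASES}


# Made-up sample with three pods.
sample_list = {
    "items": [
        {"status": {"phase": "Running"}},
        {"status": {"phase": "Running"}},
        {"status": {"phase": "Pending"}},
    ]
}

print(count_phases(sample_list))
```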
If you are interested in the detailed arrays with Pod conditions, I suggest looking at Pod Lifecycle in the Kubernetes documentation and inspecting the source code for the remaining information.
I am working on two servers, each with a number of pods. The first server is the Validation Env, where I can use Kubernetes commands, but the second server is the Prod Env, where I do not have full rights. Getting full rights on the latter is out of the question.
I am putting together platform stability statistics and I need information about the last reset of the pods. I can see the "Age", but I cannot use a screenshot in my statistics, so I need a command that outputs every pod's age or its last reset.
P.S. Every night at 00:00 the pods are saved and archived in a separate folder.
Get pods already gives you that info:
$ kubectl get po
NAME                     READY   STATUS    RESTARTS   AGE
nginx-7cdbd8cdc9-8pnzq   1/1     Running   0          36s
$ kubectl delete po nginx-7cdbd8cdc9-8pnzq
pod "nginx-7cdbd8cdc9-8pnzq" deleted
$ kubectl get po
NAME                     READY   STATUS    RESTARTS   AGE
nginx-7cdbd8cdc9-67l8l   1/1     Running   0          4s
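If read access to the pod objects is available, the age and the last container restart could also be computed from the pod JSON instead of being read off the table. Below is a minimal sketch, assuming a pod dict in the shape returned by kubectl get pod <pod-name> -o json. It treats state.running.startedAt of a container with restartCount > 0 as the time of its last restart, which is an assumption of this sketch. The function names and the sample data are made up.

```python
from datetime import datetime, timezone


def parse_ts(value):
    """Parse a Kubernetes timestamp such as 2019-11-01T00:00:00Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def pod_age_and_last_restart(pod, now):
    """Return (age in seconds, time of the last container restart or None)."""
    created = parse_ts(pod["metadata"]["creationTimestamp"])
    restart_times = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        running = (cs.get("state") or {}).get("running") or {}
        # Only containers that have restarted at least once count as a "reset".
        if cs.get("restartCount", 0) > 0 and running.get("startedAt"):
            restart_times.append(parse_ts(running["startedAt"]))
    last_restart = max(restart_times) if restart_times else None
    return (now - created).total_seconds(), last_restart


# Made-up sample: a pod created at midnight whose container restarted at noon.
sample_pod = {
    "metadata": {"creationTimestamp": "2019-11-01T00:00:00Z"},
    "status": {
        "containerStatuses": [
            {"name": "app", "restartCount": 2, "state": {"running": {"startedAt": "2019-11-01T12:00:00Z"}}}
        ]
    },
}

now = datetime(2019, 11, 2, 0, 0, tzinfo=timezone.utc)
print(pod_age_and_last_restart(sample_pod, now))
```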
I found a solution:
command:
zgrep "All subsystems started successfully" 201911??/*ota*
response:
23:23:37,429 [INFO ] main c.o.c.a.StartUp - All subsystems started successfully
P.S. "ota" is my pod's name.