Quick question regarding Kubernetes job status.
Let's assume I submit my resource to 10 pods and want to check whether my Job completed successfully.
What are the best available options we can use from kubectl commands?
I thought of kubectl get jobs, but the problem here is that you only have two codes, 0 and 1: 1 for completed, 0 for failed or still running, so we cannot really depend on this.
The other option is kubectl describe to check the pod status, i.e. out of 10 pods how many completed/failed.
Is there any other effective way of monitoring the pods? Please let me know.
Anything that can talk to the Kubernetes API can query for the Job object and look at its JobStatus field, which has info on which pods are running, completed, failed, or unavailable. kubectl is probably the easiest, as you mentioned, but you could write something more specialized using any client library if you wanted/needed to.
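For example, a minimal kubectl sketch of that idea (my-job is a placeholder name, not something from the question):

# number of pods that have completed successfully, straight from the JobStatus field
kubectl get job my-job -o jsonpath='{.status.succeeded}'

# block until the Job reports the Complete condition, or give up after the timeout
kubectl wait --for=condition=complete --timeout=600s job/my-job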
I was able to achieve it by running the following, which will show the status of the containers:
kubectl describe job
I have a bunch of Rancher clusters I take care of, and on some of them developers use PriorityClasses to ensure that some of the more important workloads get scheduled. The three PriorityClasses are in the three-digit range so they will not interfere with the default ones. However, at present none of the PriorityClasses is set as default, and neither is the preemptionPolicy set, so it defaults to PreemptLowerPriority.
None of the Rancher, Longhorn, Prometheus, Grafana, etc. workloads have priorityClassName set.
Long story short, I believe this causes havoc on the cluster when resources are in short supply.
Before I take my opinion to the developers I would like to collect some data to back up my story.
The question: How do I detect if the pod was Terminated due to Preemption?
I tried to google the subject but couldn't find anything useful. I was hoping kube-state-metrics would have something, but I didn't find anything there either.
Any help would be greatly appreciated.
You can try to look for convincing data, like the pod termination reason, with the help of kubectl.
You can see the last restart logs of a container using the following command:
kubectl logs podname -c containername --previous
You can also use the following command to check the lifecycle events sent by the kubelet to the apiserver about the pod.
kubectl describe pod podname
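If you want to narrow that down, here is a hedged sketch; the Preempted event reason depends on your scheduler and Kubernetes version, so treat it as an assumption to verify against the events you actually see on your cluster:

# last termination state (reason, exit code, message) of the first container in the pod
kubectl get pod podname -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'

# events across all namespaces whose reason is Preempted
kubectl get events -A --field-selector reason=Preempted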
Finally, you can also write a final message to /dev/termination-log, and this will show up as described in the docs.
To use kubectl commands with Rancher, kindly refer to this documentation page.
I was wondering if there is a kubectl command to quickly get the history of all STATUS values for a given pod?
For example, let's say a pod named my-test-pod went from ContainerCreating to Running to OOMKilled to Terminating.
I was wondering if there is a command that experts use to get this lineage. Appreciate a nudge.
Using kubectl get events you can only see events from the last hour. If you want to persist events for a longer duration you can use eventrouter. The event router serves as an active watcher of event resources in the Kubernetes system; it takes those events and pushes them to a user-specified sink. This is useful for a number of different purposes, but most notably for long-term behavioral analysis of the workloads running on your Kubernetes cluster.
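While the events are still retained, a small sketch for pulling just one pod's history (using the my-test-pod name from the question) could look like this:

# all events recorded for my-test-pod, oldest first
kubectl get events --field-selector involvedObject.name=my-test-pod --sort-by=.lastTimestamp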
You can use kubectl get events, or kubectl describe pod, which shows the events for the pod at the bottom. However, events are only kept for a little while, so it's not a permanent history; for that you would need some webhooks or a tool like Prometheus.
I'm wondering about a distributed batch job I need to run. If I use a Job, a StatefulSet, or whatever, is there a way in K8s for the pod itself (via an env var or whatever) to know it is 1 of X pods run for this job?
I need to chunk up some data and have each process fetch the stuff it needs.
--
I guess the StatefulSet hostname setting is one way of doing it. Is there a better option?
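For what it's worth, a minimal sketch of that idea, assuming the pods are managed by a StatefulSet (whose pods are named <statefulset-name>-0, -1, and so on):

# inside the container: derive this replica's index from the StatefulSet hostname
ORDINAL="${HOSTNAME##*-}"
echo "replica ${ORDINAL}: fetching chunk ${ORDINAL} of the data"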
This is planned but, as far as I know, not yet implemented. You probably want to look into higher-order layers like Argo Workflows or Airflow instead for now.
You could write some infrastructure as code using Ansible that will perform the following tasks in order:
kubectl create -f jobs.yml
kubectl wait --for=condition=complete job/job1
kubectl wait --for=condition=complete job/job2
kubectl wait --for=condition=complete job/job3
kubectl create -f pod.yml
kubectl wait can be used in situations like this to halt progress until an action has been performed; in this case, until a job has completed its run.
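It is probably worth adding a timeout so a stuck job does not block the pipeline forever; a sketch, reusing job1 from the snippet above:

# give up after 10 minutes instead of waiting indefinitely; exits non-zero on timeout
kubectl wait --for=condition=complete --timeout=600s job/job1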
Here is a similar question that someone asked on StackOverflow before.
I am currently working on a monitoring service that will monitor Kubernetes deployments and their pods. I want to notify users when a deployment is not running the expected number of replicas and also when pods' containers restart unexpectedly. These may not be the right things to monitor, and I would greatly appreciate some feedback on what I should be monitoring.
Anyway, the main question is about the differences between all of the statuses of pods. And when I say statuses I mean the STATUS column shown when running kubectl get pods. The statuses in question are:
- ContainerCreating
- ImagePullBackOff
- Pending
- CrashLoopBackOff
- Error
- Running
What causes pod/containers to go into these states?
For the first four Statuses, are these states recoverable without user interaction?
What is the threshold for a CrashLoopBackOff?
Is Running the only status that has a Ready Condition of True?
Any feedback would be greatly appreciated!
Also, would it be bad practice to use kubectl in an automated script for monitoring purposes? For example, every minute log the results of kubectl get pods to Elasticsearch?
You can see the pod lifecycle details in the k8s documentation.
The recommended way of monitoring a Kubernetes cluster and its applications is with Prometheus.
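As a rough sketch of what that looks like in practice, assuming kube-state-metrics is installed and Prometheus is reachable at localhost:9090 (both are assumptions about your setup):

# pods per phase, via the Prometheus HTTP API and the kube-state-metrics metric kube_pod_status_phase
curl -s 'http://localhost:9090/api/v1/query' --data-urlencode 'query=sum by (phase) (kube_pod_status_phase)'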
I will try to explain what I see hidden behind these terms; a quick command for checking them follows the list.
ContainerCreating
Shown while we wait for the image to be downloaded and the container to be created by Docker or another container runtime.
ImagePullBackOff
Shown when there is a problem downloading the image from a registry, for example wrong credentials for logging in to Docker Hub.
Pending
The container is still starting (if startup takes time), or it has started but the readinessProbe is failing.
CrashLoopBackOff
This status shows up when container restarts happen too often. For example, we have a process that tries to read a file that does not exist and crashes; the container is then recreated by Kubernetes, and this repeats.
Error
This is pretty clear: there was some error running the container.
Running
All is good: the container is running and the livenessProbe is OK.
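As mentioned above, a quick way to check which of these states your pods are in from the command line (the REASON column will simply be empty for containers that are not in a waiting state):

# pod phase plus the waiting reason of the first container, e.g. ImagePullBackOff or CrashLoopBackOff
kubectl get pods -o custom-columns=NAME:.metadata.name,PHASE:.status.phase,REASON:.status.containerStatuses[0].state.waiting.reason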
I am building a deploy pipeline. I need a kubectl command that tells me the rollout has completed to all of the pods, so I can then deploy to the next stage.
The Deployment documentation suggests kubectl rollout status, which among other things will return a non-zero exit code if the deployment isn't complete. kubectl get deployment will print out similar information (how many replicas are expected, available, and up-to-date), and you can add a -w option to watch it.
For this purpose you can also consider using one of the Kubernetes APIs. You can "get" or "watch" the deployment object, and get back something matching the structure of a Deployment object. Using that you can again monitor the replica count, or the embedded condition list, and decide if it's ready or not. If you're using the "watch" API you'll continue to get updates as the object status changes.
The one trick here is detecting failed deployments. Say you're deploying a pod that depends on a database; usual practice is to configure the pod with the hostname you expect the database to have, and just crash (and get restarted) if it's not there yet. You can briefly wind up in CrashLoopBackOff state when this happens. If your application or deployment is totally wrong, of course, you'll also wind up in CrashLoopBackOff state, and your deployment will stop progressing. There's not an easy way to tell these two cases apart; consider an absolute timeout.
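One hedged way to wire that absolute timeout into a pipeline (the deployment name and timeout value are placeholders):

# fail the pipeline step if the rollout has not completed within 5 minutes
kubectl rollout status deployment/my-deployment --timeout=300s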