How to find out if a K8s job failed or succeeded using kubectl?

I have a Kubernetes job that runs for some time, and I need to check if it failed or was successful.
I am checking this periodically:
kubectl describe job/myjob | grep "1 Succeeded"
This works, but I am concerned that a change in Kubernetes could break it; say the message is changed to "1 completed with success" (silly text, but you know what I mean) and now my grep will not find what it is looking for.
Any suggestions? This is being done in a bash script.

You can get this information from the job using JSONPath filtering to select the .status.succeeded field of the job you are interested in; it returns only that value.
From kubectl explain job.status.succeeded:
The number of pods which reached phase Succeeded.
This command will get you that field for the particular job specified:
kubectl get job <jobname> -o jsonpath='{.status.succeeded}'
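In a bash script you can poll that field until the job finishes one way or the other. A minimal sketch, assuming the job is named myjob, runs a single completion, and a 10-second poll interval is acceptable:
# Poll until the job either succeeds or terminally fails.
while true; do
  # becomes 1 once a pod finishes successfully
  succeeded=$(kubectl get job myjob -o jsonpath='{.status.succeeded}')
  # becomes "True" once the job has terminally failed (e.g. backoffLimit exceeded)
  failed=$(kubectl get job myjob -o jsonpath='{.status.conditions[?(@.type=="Failed")].status}')
  if [ "$succeeded" = "1" ]; then
    echo "job succeeded"; exit 0
  elif [ "$failed" = "True" ]; then
    echo "job failed"; exit 1
  fi
  sleep 10
done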

Related

kubectl wait not working for creation of resources

How do you get around waiting on resources not yet created?
In a script I get:
kubectl wait --for=condition=ready --timeout=60s -n <some namespace> --all pods
error: no matching resources found
This is a community wiki answer posted for better visibility. Feel free to expand it.
As documented:
Experimental: Wait for a specific condition on one or many resources.
The command takes multiple resources and waits until the specified
condition is seen in the Status field of every given resource.
Alternatively, the command can wait for the given set of resources to
be deleted by providing the "delete" keyword as the value to the --for
flag.
A successful message will be printed to stdout indicating when the
specified condition has been met. One can use -o option to change to
output destination.
This command will not work for resources that haven't been created yet. @EmruzHossain has posted two valid points:
Make sure you have provided a valid namespace.
First wait for the resource to get created, probably with a loop running kubectl get periodically. When the desired resource is found, break the loop. Then run kubectl wait to wait for the resource to be ready (see the sketch below).
Also, there is the open issue kubectl wait for un-existed resource #83242, which is still waiting (no pun intended) to be implemented.
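A minimal sketch of that approach for the pod example above (the namespace, poll interval, and timeout are placeholders; you may also want an overall deadline so the loop cannot spin forever):
# poll until at least one pod exists in the namespace...
until kubectl get pods -n <some namespace> --no-headers 2>/dev/null | grep -q . ; do
  sleep 2
done
# ...then wait for all of them to become ready
kubectl wait --for=condition=ready --timeout=60s -n <some namespace> --all pods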

How to get start, finish time and status kubernetes jobs

I tried to use
kubectl get jobs -o custom-columns=TIMESTAMP:.metadata.creationTimestamp,NAME:.metadata.name
but can't find custom columns for 'ending at' and 'status'.
Where can I find a columns list for kubectl get and describe job?
You can use the status section of the job to get the required details.
The completion time is present at .status.completionTime and the status at .status.conditions[].type.
kubectl get jobs dummy-xxxx-xxxx -o custom-columns=TIMESTAMP:.metadata.creationTimestamp,NAME:.metadata.name,COMPLETIONTIME:.status.completionTime,STATUS:.status.conditions[].type
TIMESTAMP              NAME              COMPLETIONTIME         STATUS
2021-02-08T14:40:03Z   dummy-xxxx-xxxx   2021-02-08T14:40:50Z   Complete
kubectl get jobs dummy-yyyy-yyyy -o custom-columns=TIMESTAMP:.metadata.creationTimestamp,NAME:.metadata.name,COMPLETIONTIME:.status.completionTime,STATUS:.status.conditions[].type
TIMESTAMP              NAME              COMPLETIONTIME   STATUS
2021-02-08T12:00:08Z   dummy-yyyy-yyyy   <none>           Failed
Note that the completion time is not set for failed jobs; there is an existing issue that mentions this problem.
When a Job completes (finished successfully), its
.status.completionTime will be set, and Job conditions will have one
with type==Complete.
When a Job fails, Job conditions will have one with type==Failed.
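If you also need the start time, the job status carries a .status.startTime field as well. A sketch combining all three (the job name is a placeholder):
kubectl get job <jobname> -o custom-columns=NAME:.metadata.name,START:.status.startTime,END:.status.completionTime,STATUS:.status.conditions[].type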

How to check a job is completed before it's been created using kubectl?

I can use kubectl wait --for=condition=complete --timeout=<some time> job/<job-name> to wait for a job to reach the completed state. However, if the job has not yet been created (sometimes, this is because k8s takes some time to schedule the job), kubectl will exit with an error immediately.
Is there a way to wait for the job to be created and then transition into the completed state? What is the most common way to do this in industry?
kubectl wait does not include the functionality to wait on a non-existent resource yet.
For anything complex, try to use a Kubernetes API client: run a watch on a resource and you receive a stream of events for it, continuing on once the event criteria have been met.
If you are stuck in shell land, kubectl doesn't seem to respect SIGPIPE signals when handling the output of a kubectl get x --watch, so maybe a simple loop...
timeout=$(( $(date +%s) + 60 ))
# poll until the job exists, giving up after the deadline
while ! kubectl get job whatever >/dev/null 2>&1; do
  [ "$(date +%s)" -gt "$timeout" ] && exit 1
  sleep 5
done
kubectl wait --for=condition=complete --timeout=<some time> job/whatever
Edit: If I'm not mistaken, kubectl wait is still experimental anyway, but the condition you name should match whatever you're expecting to see in a standard status output (for Jobs, that's Complete or Failed).
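As an aside, worth checking against your client version: if I recall correctly, newer kubectl releases (v1.31+) add a create condition to kubectl wait, which would remove the need for the polling loop entirely:
kubectl wait --for=create --timeout=<some time> job/<job-name>
kubectl wait --for=condition=complete --timeout=<some time> job/<job-name>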

Is it possible to stop a job in Kubernetes without deleting it

Because Kubernetes handles situations where there's a typo in the job spec (and therefore a container image can't be found) by leaving the job in a running state forever, I've got a process that monitors job events to detect cases like this and deletes the job when one occurs.
I'd prefer to just stop the job so there's a record of it. Is there a way to stop a job?
1) According to the K8S documentation here.
Finished Jobs are usually no longer needed in the system. Keeping them around in the system will put pressure on the API server. If the Jobs are managed directly by a higher level controller, such as CronJobs, the Jobs can be cleaned up by CronJobs based on the specified capacity-based cleanup policy.
Here are the details for the failedJobsHistoryLimit property in the CronJobSpec.
This is another way of retaining the details of a failed job for a specific duration. The failedJobsHistoryLimit property can be set based on the approximate number of jobs run per day and the number of days the logs have to be retained. Agreed, the Jobs will still be there and put pressure on the API server.
This is interesting: once the job completes with failure, as in the case of a typo in the image name, the pod gets deleted and the resources are no longer blocked or consumed. I'm not sure exactly what a kubectl job stop would achieve in this case. But when a Job with a proper image runs to success, I can still see the pod in kubectl get pods.
2) Another approach without using the CronJob is to specify the ttlSecondsAfterFinished as mentioned here.
Another way to clean up finished Jobs (either Complete or Failed) automatically is to use a TTL mechanism provided by a TTL controller for finished resources, by specifying the .spec.ttlSecondsAfterFinished field of the Job.
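For illustration, a minimal sketch of a Job using that field (the name, image, command, and the 300-second TTL are arbitrary placeholders):
cat <<'EOF' | kubectl apply -f -
apiVersion: batch/v1
kind: Job
metadata:
  name: ttl-demo                   # hypothetical name
spec:
  ttlSecondsAfterFinished: 300     # delete the Job (and its pods) 5 minutes after it finishes
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: busybox             # placeholder image
        command: ["sh", "-c", "echo done"]
EOF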
Not really, no such mechanism exists in Kubernetes yet afaik.
A workaround is to SSH into the machine and run the following (if you are using Docker):
# Save the logs
$ docker logs <container-id-that-is-running-your-job> > save.log 2>&1
$ docker stop <main-container-id-for-your-job>
It's better to stream logs with something like Fluentd, logspout, or Filebeat, and forward them to an ELK or EFK stack.
In any case, I've opened this
You can suspend cronjobs by using the suspend attribute. From the Kubernetes documentation:
https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#suspend
Documentation says:
The .spec.suspend field is also optional. If it is set to true, all
subsequent executions are suspended. This setting does not apply to
already started executions. Defaults to false.
So, to pause a cron you could:
Run kubectl edit cronjob CRON_NAME (if it is not in the default namespace, add "-n NAMESPACE_NAME" at the end) and edit "suspend" from false to true.
You could potentially create a loop using "for" or whatever you like, and have them all changed at once.
Or you could save the YAML file locally and then just run:
kubectl create -f cron_YAML
and this would recreate the cron.
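Alternatively, a non-interactive sketch of the same change using kubectl patch (CRON_NAME is a placeholder):
kubectl patch cronjob CRON_NAME -p '{"spec":{"suspend":true}}'
Set it back to false the same way to resume the schedule.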
The other answers hint around the .spec.suspend solution for the CronJob API, which works, but since the OP asked specifically about Jobs it is worth noting the solution that does not require a CronJob.
As of Kubernetes 1.21, there is alpha support for the .spec.suspend field in the Job API as well (see the docs here). The feature is behind the SuspendJob feature gate.
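A sketch of suspending a Job that way (myjob is a placeholder, and your cluster version must have the feature enabled):
kubectl patch job myjob -p '{"spec":{"suspend":true}}'
Suspending a Job terminates its running pods but keeps the Job object (and its record) around; patching suspend back to false resumes it.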

What is a use case for kubernetes job?

I'm looking to fully understand the jobs in kubernetes.
I have successfully created and executed a job, but I do not see the use case.
Not being able to rerun a job, or to actively listen for its completion, makes me think it is a bit difficult to manage.
Is anyone using them? What is the use case?
Thank you.
A job retries pods until they complete, so that you can tolerate errors that cause pods to be deleted.
If you want to run a job repeatedly and periodically, you can use CronJob alpha or cronetes.
Some Helm Charts use Jobs to run install, setup, or test commands on clusters, as part of installing services. (Example).
If you save the YAML for the job, then you can re-run it by deleting the old job and creating it again, or by editing the YAML to change the name (or use e.g. sed in a script).
You can watch a job's status with this command:
kubectl get jobs myjob -w
The -w option watches for changes. You are looking for the SUCCESSFUL column to show 1.
Here is a shell command loop to wait for job completion (e.g. in a script):
until kubectl get jobs myjob -o jsonpath='{.status.conditions[?(@.type=="Complete")].status}' | grep True ; do sleep 1 ; done
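Note that this loop never exits if the job fails rather than completes. A sketch that also stops on failure, by checking the condition types instead of a single filter (myjob is the same placeholder):
until kubectl get job myjob -o jsonpath='{.status.conditions[*].type}' | grep -qE 'Complete|Failed' ; do sleep 1 ; done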
One of the use cases can be taking a backup of a DB. But as already mentioned, there is some overhead to running a Job: e.g. when a Job completes, the pods are not deleted, so you need to manually delete the job (which will also delete the pods created by the job). The recommended option for recurring work like this is therefore a CronJob instead of a plain Job.
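For illustration, a minimal sketch of such a backup CronJob (the name, schedule, image, and command are hypothetical placeholders; a real backup would also need credentials and persistent storage):
cat <<'EOF' | kubectl apply -f -
apiVersion: batch/v1
kind: CronJob
metadata:
  name: db-backup                  # hypothetical name
spec:
  schedule: "0 2 * * *"            # every day at 02:00
  successfulJobsHistoryLimit: 3    # keep records of recent runs, then clean up
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: backup
            image: postgres:15     # placeholder image
            command: ["sh", "-c", "pg_dump mydb > /backup/dump.sql"]   # placeholder command
EOF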