How do you get around waiting on resources not yes created?
In script I get:
kubectl wait --for=condition=ready --timeout=60s -n <some namespace> --all pods
error: no matching resources found
This is a community wiki answer posted for better visibility. Feel free to expand it.
As documented:
Experimental: Wait for a specific condition on one or many resources.
The command takes multiple resources and waits until the specified
condition is seen in the Status field of every given resource.
Alternatively, the command can wait for the given set of resources to
be deleted by providing the "delete" keyword as the value to the --for
flag.
A successful message will be printed to stdout indicating when the
specified condition has been met. One can use -o option to change to
output destination.
This command will not work for the resources that hasn't been created yet. #EmruzHossain has posted two valid points:
Make sure you have provided a valid namespace.
First wait for the resource to get created. Probably a loop running kubectl get periodically. When the desired resource is found, break the loop. Then, run kubectl wait to wait for the resource to be ready.
Also, there is this open thread: kubectl wait for un-existed resource. #83242 which is still waiting (no pun intended) to be implemented.
Related
I can use kubectl wait --for=condition=complete --timeout=<some time> job/<job-name> to wait for a job in completed state. However, if the job has not yet been created (sometimes, it's due to k8s takes some time to schedule the job), kubectl will exit with error immediately.
Is there a way to wait for the job been created and then transit into completed state? What's the most common way to do this in industry?
kubectl wait does not include the functionality to wait on a non existent resource yet.
For anything complex try and use a kube API client. Run a watch on a resource group and you receive a stream of events for it, and continue on when the event criteria has been met.
If you are stuck in shell land, kubectl doesn't seem to respect SIGPIPE signals when when handling the output of a kubectl get x --watch so maybe a simple loop...
timeout=$(( $(date +%s) + 60 ))
while ! kubectl get job whatever 2>/dev/null; do
[ $(date +%s) -gt $timeout ] && exit 1
sleep 5
done
kubectl wait --for=condition=created --timeout=<some time> job/<job-name>
Edit: If I'm not mistaken, kubectl wait is still experimental anyway - but the condition you name should match whatever you're expecting in a standard status output.
Second Edit: wrong character update
I have a simple containerised python script which I am trying to parallelise with Kubernetes. This script guesses hashes until it finds a hashed value below a certain threshold.
I am only interested in the first such value, so I wish to create a Kubernetes job that spawns n worker pods and completes as soon as one worker pod finds a suitable value.
By default, Kubernetes jobs wait until all worker pods complete before marking the job as complete. I have so far been unable to find a way around this (no mention of this job pattern in the documentation), and have been relying on checking the logs of bare pods via a bash script to determine whether one has completed.
Is there a native means to achieve this? And, if not, what would be the best approach?
Hi look this link https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#parallel-jobs.
I've never tried it but it seems possible to launch several pods and configure the end of the job when x pods have finished. In your case x is 1.
We can define two specifications for parallel Jobs:
1. Parallel Jobs with a fixed completion count:
specify a non-zero positive value for .spec.completions.
the Job represents the overall task, and is complete when there is
one successful Pod for each value in the range 1 to
.spec.completions
not implemented yet: Each Pod is passed a different index in the
range 1 to .spec.completions.
2. Parallel Jobs with a work queue:
do not specify .spec.completions, default to .spec.parallelism
the Pods must coordinate amongst themselves or an external service to
determine what each should work on.
For example, a Pod might fetch a batch of up to N items from the work queue.
each Pod is independently capable of determining whether or not all its peers are done, and thus that the entire Job is done.
when any Pod from the Job terminates with success, no new Pods are
created
once at least one Pod has terminated with success and all Pods are
terminated, then the Job is completed with success
once any Pod has exited with success, no other Pod should still be
doing any work for this task or writing any output. They should all
be in the process of exiting
For a fixed completion count Job, you should set .spec.completions to the number of completions needed. You can set .spec.parallelism, or leave it unset and it will default to 1.
For a work queue Job, you must leave .spec.completions unset, and set .spec.parallelism to a non-negative integer.
For more information about how to make use of the different types of job, see the job patterns section.
You can also take a look on single job which starts controller pod:
This pattern is for a single Job to create a Pod which then creates other Pods, acting as a sort of custom controller for those Pods. This allows the most flexibility, but may be somewhat complicated to get started with and offers less integration with Kubernetes.
One example of this pattern would be a Job which starts a Pod which runs a script that in turn starts a Spark master controller (see spark example), runs a spark driver, and then cleans up.
An advantage of this approach is that the overall process gets the completion guarantee of a Job object, but complete control over what Pods are created and how work is assigned to them.
At the same time take under consideration that completition status of Job set by dafault - when specified number of successful completions is reached it ensure that all tasks are processed properly. Applying this status before all tasks are finished is not secure solution.
You should also know that finished Jobs are usually no longer needed in the system. Keeping them around in the system will put pressure on the API server. If the Jobs are managed directly by a higher level controller, such as CronJobs, the Jobs can be cleaned up by CronJobs based on the specified capacity-based cleanup policy.
Here is official documentations: jobs-parallel-processing , parallel-jobs.
Useful blog: article-parallel job.
EDIT:
Another option is that you can create special script which will continuously check values you look for. Using job then will not be necessary, you can simply use deployment.
I have a Kubernetes job that runs for some time, and I need to check if it failed or was successful.
I am checking this periodically:
kubectl describe job/myjob | grep "1 Succeeded"
This works but I am concerned that a change in kubernetes can break this; say, the message is changed to "1 completed with success" (stupid text but you know what I mean) and now my grep will not find what it is looking for.
Any suggestions? this is being done in a bash script.
You can get this information from the job using jsonpath filtering to select the .status.succeeded field of the job you are interested in. It will only return the value you are interested in.
from kubectl explain job.status.succeeded:
The number of pods which reached phase Succeeded.
This command will get you that field for the particular job specified:
kubectl get job <jobname> -o jsonpath={.status.succeeded}
Because Kubernetes handles situations where there's a typo in the job spec, and therefore a container image can't be found, by leaving the job in a running state forever, I've got a process that monitors job events to detect cases like this and deletes the job when one occurs.
I'd prefer to just stop the job so there's a record of it. Is there a way to stop a job?
1) According to the K8S documentation here.
Finished Jobs are usually no longer needed in the system. Keeping them around in the system will put pressure on the API server. If the Jobs are managed directly by a higher level controller, such as CronJobs, the Jobs can be cleaned up by CronJobs based on the specified capacity-based cleanup policy.
Here are the details for the failedJobsHistoryLimit property in the CronJobSpec.
This is another way of retaining the details of the failed job for a specific duration. The failedJobsHistoryLimit property can be set based on the approximate number of jobs run per day and the number of days the logs have to be retained. Agree that the Jobs will be still there and put pressure on the API server.
This is interesting. Once the job completes with failure as in the case of a wrong typo for image, the pod is getting deleted and the resources are not blocked or consumed anymore. Not sure exactly what kubectl job stop will achieve in this case. But, when the Job with a proper image is run with success, I can still see the pod in kubectl get pods.
2) Another approach without using the CronJob is to specify the ttlSecondsAfterFinished as mentioned here.
Another way to clean up finished Jobs (either Complete or Failed) automatically is to use a TTL mechanism provided by a TTL controller for finished resources, by specifying the .spec.ttlSecondsAfterFinished field of the Job.
Not really, no such mechanism exists in Kubernetes yet afaik.
You can workaround is to ssh into the machine and run a: (if you're are using Docker)
# Save the logs
$ docker log <container-id-that-is-running-your-job> 2>&1 > save.log
$ docker stop <main-container-id-for-your-job>
It's better to stream log with something like Fluentd, or logspout, or Filebeat and forward the logs to an ELK or EFK stack.
In any case, I've opened this
You can suspend cronjobs by using the suspend attribute. From the Kubernetes documentation:
https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#suspend
Documentation says:
The .spec.suspend field is also optional. If it is set to true, all
subsequent executions are suspended. This setting does not apply to
already started executions. Defaults to false.
So, to pause a cron you could:
run and edit "suspend" from False to True.
kubectl edit cronjob CRON_NAME (if not in default namespace, then add "-n NAMESPACE_NAME" at the end)
you could potentially create a loop using "for" or whatever you like, and have them all changed at once.
you could just save the yaml file locally and then just run:
kubectl create -f cron_YAML
and this would recreate the cron.
The other answers hint around the .spec.suspend solution for the CronJob API, which works, but since the OP asked specifically about Jobs it is worth noting the solution that does not require a CronJob.
As of Kubernetes 1.21, there alpha support for the .spec.suspend field in the Job API as well, (see docs here). The feature is behind the SuspendJob feature gate.
I'm looking to fully understand the jobs in kubernetes.
I have successfully create and executed a job, but I do not see the use case.
Not being able to rerun a job or not being able to actively listen to it completion makes me think it is a bit difficult to manage.
Anyone using them? Which is the use case?
Thank you.
A job retries pods until they complete, so that you can tolerate errors that cause pods to be deleted.
If you want to run a job repeatedly and periodically, you can use CronJob alpha or cronetes.
Some Helm Charts use Jobs to run install, setup, or test commands on clusters, as part of installing services. (Example).
If you save the YAML for the job then you can re-run it by deleting the old job an creating it again, or by editing the YAML to change the name (or use e.g. sed in a script).
You can watch a job's status with this command:
kubectl get jobs myjob -w
The -w option watches for changes. You are looking for the SUCCESSFUL column to show 1.
Here is a shell command loop to wait for job completion (e.g. in a script):
until kubectl get jobs myjob -o jsonpath='{.status.conditions[?(#.type=="Complete")].status}' | grep True ; do sleep 1 ; done
One of the use case can be to take a backup of a DB. But as already mentioned that are some overheads to run a job e.g. When a Job completes the Pods are not deleted . so you need to manually delete the job(which will also delete the pods created by job). so recommended option will be to use Cron instead of Jobs