Is there a way for a Deployment or Job to delete itself automatically after finishing in Kubernetes?

Is there a way for a Deployment or Job to completely delete itself upon completion?
I want it such that when I do kubectl get jobs/deployments/pods they don't show up after they finished (container exited).
One possible way I thought of was to call kubectl delete jobs/deployments/pods from within the container but I'm not sure if that's safe.

I found a related question and answer, but it turned out to be about CronJobs.
Following that answer, there is a TTL controller, introduced as alpha in 1.12 and promoted to beta in 1.21.
Before v1.21 you have to enable it manually via the TTLAfterFinished feature gate.
According to the docs:
provides a TTL (time to live) mechanism to limit the lifetime of resource objects that have finished execution. TTL controller only handles Jobs for now, and may be expanded to handle other resources that will finish execution, such as Pods and custom resources.
Currently it can clean up Jobs only; support for other finished resources is planned.
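For illustration, here is a minimal sketch of a Job that cleans itself up, assuming a cluster at v1.21+ or with the TTLAfterFinished feature gate enabled (the Job name and image are placeholders):
$ kubectl apply -f - <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: self-cleaning-job          # placeholder name
spec:
  ttlSecondsAfterFinished: 100     # Job and its Pods are deleted 100s after it finishes
  template:
    spec:
      containers:
      - name: main
        image: busybox             # placeholder image
        command: ["sh", "-c", "echo done"]
      restartPolicy: Never
EOF
Once the Job completes (or fails), the TTL controller removes the Job object and its Pods, so they no longer show up in kubectl get jobs or kubectl get pods.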

Related

Whole Application level rolling update

My Kubernetes application is made of several flavors of nodes, a couple of “schedulers” which send tasks to quite a few more “worker” nodes. In order for this app to work correctly, all the nodes must be on exactly the same code version.
The deployment is performed using a standard ReplicaSet, and when my CI/CD kicks in it just does a simple rolling update. This causes a problem, though: during the rolling update, nodes of different code versions co-exist for a few seconds, so a few tasks during this time get wrong results.
Ideally what I would want is that deploying a new version would create a completely new application that only communicates with itself and has time to warm its cache, then on a flick of a switch this new app would become active and start to get new client requests. The old app would remain active for a few more seconds and then shut down.
I’m using Istio sidecar for mesh communication.
Is there a standard way to do this? How is such a requirement usually handled?
I also had such a situation. Kubernetes alone cannot satisfy your requirement, and I was not able to find any tool that allows coordinating multiple Deployments together (although Flagger looks promising).
So the only way I found was by using CI/CD: Jenkins in my case. I don't have the code, but the idea is the following:
Deploy all application Deployments using a single Helm chart. Every Helm release name and the corresponding Kubernetes labels must be based on some sequential number, e.g. Jenkins' $BUILD_NUMBER. The Helm release can be named like example-app-${BUILD_NUMBER} and all Deployments must have the label version: $BUILD_NUMBER. The important part here is that your Services should not be part of your Helm chart, because they will be handled by Jenkins.
Start your build with detecting the current version of the app (using bash script or you can store it in ConfigMap).
Run helm install example-app-${BUILD_NUMBER} with the --atomic flag set. The --atomic flag makes sure the release is properly removed on failure. Don't delete the previous version of the app yet.
Wait for Helm to complete and, in case of success, run kubectl set selector service/example-app version=$BUILD_NUMBER. That instantly switches the Kubernetes Service from one version to the other. If you have multiple Services you can issue multiple set selector commands (each command takes effect immediately).
Delete the previous Helm release and optionally update the ConfigMap with the new app version. (A shell sketch of steps 3-5 follows the notes below.)
Depending on your app, you may want to run tests on the non-user-facing Services as part of step 4 (after the Helm release succeeds).
Another good idea is to have preStop hooks on your worker pods so that they can finish their jobs before being deleted.
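A rough shell sketch of steps 3-5, assuming Helm 3, a Jenkins-provided $BUILD_NUMBER, and a Jenkins-managed Service named example-app; passing the version through --set assumes the chart wires that value into the version label (all names here are illustrative):
# Step 3: install the new release side by side with the old one (--atomic rolls back on failure)
$ helm install example-app-${BUILD_NUMBER} ./chart --atomic --set version=${BUILD_NUMBER}
# Step 4: point the Jenkins-managed Service at the new pods
$ kubectl set selector service/example-app version=${BUILD_NUMBER}
# Step 5: once traffic has switched, remove the previous release
$ helm uninstall example-app-${PREVIOUS_BUILD_NUMBER}   # previous release name, e.g. read back from the ConfigMap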
You should consider the Blue/Green deployment strategy.

Argo Workflows semaphore with value 0

I'm learning about semaphores in the Argo Workflows project, to avoid concurrent workflows using the same resource.
My use case is that I have several external resources which only one workflow can use at a time. So far so good, but sometimes a resource needs maintenance, and during that period I don't want Argo to start any workflow.
I guess I have two options:
I tested manually setting the semaphore value in the configMap to the value 0, but Argo started one workflow anyway.
I can start a workflow that runs forever, until it is deleted, claiming the synchronization lock, but that requires some extra overhead to have workflows running that don't do anything.
So I wonder how it is supposed to work if I set the semaphore value to 0; I think no workflow should be started then, since it says 0. Does anyone have any info about this?
These are the steps I carried out:
First I apply my ConfigMap with kubectl apply -f (a sketch of it is shown after these steps).
I then submit some workflows, and since they all use the same semaphore, Argo starts one and the rest are executed in order, one at a time.
I then change the value of the semaphore with kubectl edit configmap.
I submit a new workflow, which Argo then executes anyway.
Perhaps Argo does not reload the ConfigMap when I update it through kubectl edit? I would like to update the ConfigMap programmatically in the future, but used kubectl edit for testing now.
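For reference, the semaphore ConfigMap I'm editing looks roughly like this; the ConfigMap name and key are just examples, and it lives in the namespace where the workflows run:
$ kubectl apply -f - <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-semaphore-config   # example name, referenced by the workflows' semaphore configMapKeyRef
data:
  workflow: "0"               # semaphore limit; with 0 I expect no workflow to be started
EOF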
Quick fix: after applying the ConfigMap change, cycle the workflow-controller pod. That will force it to reload semaphore state.
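Assuming Argo was installed with the default names into the argo namespace, cycling the controller is a one-liner:
# Restart the controller so it re-reads the semaphore ConfigMap
$ kubectl -n argo rollout restart deployment workflow-controller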
I couldn't reproduce your exact issue. After using kubectl edit to set the semaphore to 0, any newly submitted workflows remained Pending.
I did encounter an issue where using kubectl edit to bump up the semaphore limit did not automatically kick off any of the Pending workflows. Cycling the workflow controller pod allowed the workflows to start running again.
Besides using the quick fix, I'd recommend submitting an issue. Synchronization is a newer feature, and it's possible it's not 100% robust yet.

How to terminate only certain pods based on whether or not they have finished a certain task in Kubernetes?

I'm having trouble finding a solution that allows terminating only certain pods in a deployment.
The application running inside the pods does some processing which can take a lot of time to finish.
Let's say I have 10 tasks that are stored in a database and I issue a command to scale the deployment to 10 pods.
Let's say that after some time 3 of the pods have finished their tasks and are no longer required.
How can I scale the deployment down from 10 to 7 while terminating only the pods that have finished their tasks and not the pods that are still processing?
I don't know if more details are needed, but I will happily edit the question if more information is required to answer this kind of problem.
In this case a Kubernetes Job might be better suited for this kind of task.
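As a sketch, a Job with one completion per task lets each pod exit on its own when its task is done, instead of scaling a Deployment down and hoping the right pods are removed; the name, image, and task count below are placeholders:
$ kubectl apply -f - <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: task-processor                       # placeholder name
spec:
  completions: 10                            # one completion per task in the database
  parallelism: 10                            # run all tasks at once
  template:
    spec:
      containers:
      - name: worker
        image: my-registry/worker:latest     # placeholder image; each pod picks one task and exits when done
      restartPolicy: Never
EOF
Each pod that finishes its task and exits successfully counts as one completion, so nothing has to be scaled down manually and no in-flight work is killed.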

Is it possible to stop a job in Kubernetes without deleting it

Kubernetes handles situations where there's a typo in the job spec (and therefore a container image can't be found) by leaving the job in a running state forever, so I've got a process that monitors job events to detect such cases and deletes the job when one occurs.
I'd prefer to just stop the job so there's a record of it. Is there a way to stop a job?
1) According to the K8S documentation here.
Finished Jobs are usually no longer needed in the system. Keeping them around in the system will put pressure on the API server. If the Jobs are managed directly by a higher level controller, such as CronJobs, the Jobs can be cleaned up by CronJobs based on the specified capacity-based cleanup policy.
Here are the details for the failedJobsHistoryLimit property in the CronJobSpec.
This is another way of retaining the details of failed Jobs for a specific duration. The failedJobsHistoryLimit property can be set based on the approximate number of Jobs run per day and the number of days the logs have to be retained. Agreed, the Jobs will still be there and put pressure on the API server.
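A hedged sketch of such a CronJob spec, assuming a cluster recent enough for batch/v1 CronJobs (use batch/v1beta1 on older clusters); the name, schedule, and image are examples:
$ kubectl apply -f - <<'EOF'
apiVersion: batch/v1
kind: CronJob
metadata:
  name: hourly-task                # example name
spec:
  schedule: "0 * * * *"            # example schedule: every hour
  successfulJobsHistoryLimit: 3    # keep the last 3 successful Jobs
  failedJobsHistoryLimit: 5        # keep the last 5 failed Jobs for inspection
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: main
            image: busybox         # placeholder image
            command: ["sh", "-c", "echo hello"]
          restartPolicy: OnFailure
EOF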
This is interesting: once the Job fails, as in the case of a typo in the image name, the pod gets deleted and the resources are no longer blocked or consumed. I'm not sure exactly what a kubectl job stop would achieve in this case. But when a Job with a proper image runs to success, I can still see the pod in kubectl get pods.
2) Another approach, without using a CronJob, is to specify ttlSecondsAfterFinished as mentioned here.
Another way to clean up finished Jobs (either Complete or Failed) automatically is to use a TTL mechanism provided by a TTL controller for finished resources, by specifying the .spec.ttlSecondsAfterFinished field of the Job.
Not really, no such mechanism exists in Kubernetes yet, as far as I know.
A workaround is to SSH into the node and run the following (if you're using Docker):
# Save the logs
$ docker logs <container-id-that-is-running-your-job> > save.log 2>&1
$ docker stop <main-container-id-for-your-job>
It's better to stream the logs with something like Fluentd, logspout, or Filebeat and forward them to an ELK or EFK stack.
In any case, I've opened this
You can suspend cronjobs by using the suspend attribute. From the Kubernetes documentation:
https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#suspend
The documentation says:
The .spec.suspend field is also optional. If it is set to true, all subsequent executions are suspended. This setting does not apply to already started executions. Defaults to false.
So, to pause a cron you could:
Run kubectl edit cronjob CRON_NAME (if it's not in the default namespace, add "-n NAMESPACE_NAME" at the end) and edit "suspend" from false to true.
You could potentially create a loop using "for" or whatever you like, and have them all changed at once.
You could also just save the YAML file locally and then run:
kubectl create -f cron_YAML
and this would recreate the cron.
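If you'd rather not open an editor, a kubectl patch one-liner does the same thing (the cron name is a placeholder):
# Pause the CronJob
$ kubectl patch cronjob CRON_NAME -p '{"spec":{"suspend":true}}'
# Resume it later
$ kubectl patch cronjob CRON_NAME -p '{"spec":{"suspend":false}}'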
The other answers hint around the .spec.suspend solution for the CronJob API, which works, but since the OP asked specifically about Jobs it is worth noting the solution that does not require a CronJob.
As of Kubernetes 1.21, there is alpha support for the .spec.suspend field in the Job API as well (see the docs here). The feature is behind the SuspendJob feature gate.
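On a cluster with that gate enabled, a sketch of suspending and resuming an existing Job could look like this (the Job name is a placeholder):
# Suspend: the controller terminates the Job's running pods and creates no new ones
$ kubectl patch job MY_JOB -p '{"spec":{"suspend":true}}'
# Resume: pods are created again and the Job continues
$ kubectl patch job MY_JOB -p '{"spec":{"suspend":false}}'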

Confusion about how to update kubernetes jobs

I am eagerly awaiting the release of Kubernetes v1.3 in mid to late June, so that I can access cron scheduling for jobs. In the meantime, what I plan to do is the following:
Deploy a job on my Kubernetes cluster
Use Jenkins as a cron tool to trigger the job at defined intervals (e.g. 1 hour).
I have two questions:
How do I update a job? For replication controllers, I would simply do a rolling update, but in the Jobs API spec (http://kubernetes.io/docs/user-guide/jobs/) there are no details about how to do this. For example, let's say that I want to use my Jenkins deploy system to update the job whenever I do a git commit.
Is it possible to use the Kubernetes API to trigger jobs? For example, I have a job that runs and then the pod is terminated on completion. Then, 1 hour later, I want to use Jenkins to trigger the job again.
Thanks so much!
I am not sure if there is any fancy way to trigger a completed job, but one way to do it is to delete and recreate the job.
Re: rolling update: that is required for long-running pods, which is what RCs control.
For Jobs: you can update the podTemplateSpec in the jobSpec, and that will ensure that any new pod created by the Job after the update will have the updated podTemplateSpec (note: already running pods will not be affected).
Hope this helps!
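As a minimal sketch of the delete-and-recreate approach, e.g. from a Jenkins step (the Job name and file are placeholders):
# Remove the previous run (ignore the error if it does not exist)
$ kubectl delete job my-job --ignore-not-found
# Re-create it from the (possibly updated) spec, which triggers a fresh run
$ kubectl create -f my-job.yaml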