What I would like to do
I would like to create a Kubernetes workflow where users can POST Jobs whenever they want, at any time, without scheduling anything in advance (CronJobs) or specifying parallelism or completion requirements, i.e., users could create Jobs on demand.
How I would do it right now
The way I'm thinking about accomplishing this is by simply applying the Jobs to the Kubernetes cluster (I also have to make sure each Job doesn't have the same name as an existing one, because otherwise Kubernetes will treat it as a duplicate and won't create another). However, this feels improper because the Jobs would end up scattered across the cluster and I would lose track of them (even though Kubernetes would presumably manage them sensibly on its own).
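For reference, a rough sketch of that approach with the official Kubernetes Python client might look like the following (the namespace, image and workload are placeholders); asking the API server to generate the name with generateName on create avoids the name-collision problem:

from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running inside the cluster
batch = client.BatchV1Api()

# Describe a one-off Job; generate_name asks the API server to append a random
# suffix, so every submission gets a unique name and duplicates are never rejected.
job = client.V1Job(
    api_version="batch/v1",
    kind="Job",
    metadata=client.V1ObjectMeta(generate_name="on-demand-job-"),
    spec=client.V1JobSpec(
        backoff_limit=2,
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[client.V1Container(
                    name="worker",
                    image="busybox",              # placeholder image
                    command=["echo", "hello"],    # placeholder workload
                )],
            )
        ),
    ),
)

created = batch.create_namespaced_job(namespace="default", body=job)
print("created", created.metadata.name)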
Is there a better, more proper way?
I imagine a more proper way of configuring all this is to create some sort of Deployment and Service on top of the Jobs, but is that an existing feature in Kubernetes? Huge companies have probably had this problem in the past, so I wonder: what are the best practices for this Kubernetes Jobs on demand use case?
Not a full answer but you might be interested in this project: https://github.com/ivoscc/kubernetes-task-runner.
It provides an API to launch one-time tasks as Jobs on a Kubernetes cluster, handles input/output files via GCS and periodically cleans up finished Jobs.
Related
I have read about Kubernetes CronJobs, but I'm looking for a more flexible scheduling solution (I'm using GKE). In particular, I have a web app where, when a user sets a checkbox on a dashboard, I want to trigger some service every X minutes; if the user clears the checkbox, the trigger should stop. I was hoping there are ready-made services for this. What's the best approach here?
I want to trigger some service every X minutes. If the user clears the checkbox, the trigger will stop
The simplest way to do this would be to have your web app create a CronJob via the Kubernetes API when the user enables this, and delete that object when they disable it.
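A minimal sketch of that with the official Python client (the schedule, image and naming scheme are placeholders; on clusters older than 1.21 the CronJob API lives under batch/v1beta1 rather than batch/v1):

from kubernetes import client, config

config.load_kube_config()
batch = client.BatchV1Api()  # batch/v1 CronJobs (Kubernetes >= 1.21)

def enable_user_schedule(user_id, minutes):
    # One CronJob per user, created when the checkbox is ticked.
    cron = client.V1CronJob(
        api_version="batch/v1",
        kind="CronJob",
        metadata=client.V1ObjectMeta(name=f"user-{user_id}-task"),
        spec=client.V1CronJobSpec(
            schedule=f"*/{minutes} * * * *",
            job_template=client.V1JobTemplateSpec(
                spec=client.V1JobSpec(
                    template=client.V1PodTemplateSpec(
                        spec=client.V1PodSpec(
                            restart_policy="Never",
                            containers=[client.V1Container(
                                name="task",
                                image="busybox",           # placeholder image
                                command=["echo", "tick"],  # placeholder workload
                            )],
                        )
                    )
                )
            ),
        ),
    )
    batch.create_namespaced_cron_job(namespace="default", body=cron)

def disable_user_schedule(user_id):
    # Deleted again when the checkbox is cleared.
    batch.delete_namespaced_cron_job(name=f"user-{user_id}-task", namespace="default")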
Though I'm not sure something like this would scale very well; it depends on your app. Going with Kubernetes CronJobs, each run would create a Pod: allocate resources, pull the image, start the container, run the work, terminate. That's overhead that could be avoided, so depending on what you're doing this may or may not make sense. Another way to do this would be to implement a job queue in your application.
E.g. in NodeJS, I would use something like bee-queue, bull or kue. A single "worker" could then process jobs from multiple users, in parallel and/or with some concurrency limit. A timer (e.g. node-schedule) could trigger the jobs. The web frontend deals with enabling or disabling timers on behalf of users, and the user selection may be kept in whatever SQL/NoSQL database you have available, or even in a ConfigMap (the data has size limitations!).
With a couple of workers (running as Deployments or a StatefulSet) and some master/replica Redis setup, I should be able to deal with lots of different jobs. Maybe add a HorizontalPodAutoscaler, allowing workers to be added or removed depending on their CPU or memory usage.
Whereas if I were to create Kubernetes CronJobs for each user requesting something, that could make for a lot of Pods to schedule, potentially wasting resources or testing my cluster's limits.
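To make the in-app alternative more concrete, here is the same timer idea sketched in Python with APScheduler instead of node-schedule (run_task, the interval and the job ids are placeholders; a real setup would persist the enabled/disabled state in a database):

from apscheduler.schedulers.background import BackgroundScheduler

scheduler = BackgroundScheduler()
scheduler.start()

def run_task(user_id):
    # Placeholder for whatever the per-user job actually does.
    print(f"running scheduled work for {user_id}")

def enable_timer(user_id, minutes):
    # Checkbox ticked: register a repeating job for this user.
    scheduler.add_job(run_task, "interval", minutes=minutes,
                      args=[user_id], id=f"user-{user_id}",
                      replace_existing=True)

def disable_timer(user_id):
    # Checkbox cleared: remove the job again.
    scheduler.remove_job(f"user-{user_id}")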
Triggering schedules is a typical use case for Google Cloud Functions, i.e. the serverless approach.
I think it's also more cost-effective than GKE.
Look at these docs:
https://cloud.google.com/scheduler/docs/tut-pub-sub
You might use a Cloud Function to invoke a GKE CronJob, or to create a Kubernetes ReplicaSet with replicas set to 1, using an image for the scheduled job. It might be a Spring Boot microservice with @Scheduled and the actual schedule loaded from parameters. To disable the schedule you scale the pod down to 0 replicas.
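Scaling the Deployment down to 0 (and back to 1) could look roughly like this with the official Python client, assuming a Deployment named scheduled-service in the default namespace:

from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

def set_schedule_enabled(enabled):
    # 1 replica keeps the scheduled microservice running, 0 stops it.
    apps.patch_namespaced_deployment_scale(
        name="scheduled-service",
        namespace="default",
        body={"spec": {"replicas": 1 if enabled else 0}},
    )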
Remember that in order to reach the VPC of the GKE nodes you need VPC access (a Serverless VPC Access connector), because Cloud Functions are serverless.
Anyway, you can see that GKE is a cumbersome and costly approach for this.
In general, I have a single workflow that I want to be able to monitor. The workflow should start whenever new files arrive, or alternatively at certain scheduled times, i.e. I want to be able to insert new "jobs" into the workflow as they come and process each file through multiple different tasks and steps. I want to be able to monitor each file going through the tasks.
The queues and the load distribution for each task might be managed by Celery, but that hasn't been decided yet either.
I've looked at Apache Airflow, and as far as I understand at the moment it is geared more towards monitoring many different workflows, where each workflow mostly runs from start to end, rather than having new files added to the beginning of the flow before the previous run has ended.
Cadence Workflow seems like it can do what I need, but it also seems to be a bit of overkill.
I'm not expecting a specific final solution here, but I would appreciate suggestions to more such solutions that I can look into and can fit the above.
Luigi - https://luigi.readthedocs.io/en/stable/
Extremely lightweight and fast compared to Airflow.
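To give a feel for it, a minimal Luigi pipeline for the "each file goes through several tasks" case might look like this (the task names, paths and processing steps are just placeholders):

import luigi

class ExtractText(luigi.Task):
    input_path = luigi.Parameter()

    def output(self):
        return luigi.LocalTarget(self.input_path + ".txt")

    def run(self):
        # Placeholder first step for a newly arrived file.
        with self.output().open("w") as out:
            out.write("extracted text")

class Summarize(luigi.Task):
    input_path = luigi.Parameter()

    def requires(self):
        return ExtractText(input_path=self.input_path)

    def output(self):
        return luigi.LocalTarget(self.input_path + ".summary")

    def run(self):
        # Placeholder second step, consuming the first step's output.
        with self.input().open() as f, self.output().open("w") as out:
            out.write(f.read()[:100])

if __name__ == "__main__":
    # Each incoming file becomes a new parameterised task graph.
    luigi.build([Summarize(input_path="incoming/file1")], local_scheduler=True)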
I have an app I'm building on Kubernetes which needs to dynamically add and remove worker pods (which can't be known at initial deployment time). These pods are not interchangeable (so increasing the replica count wouldn't make sense). My question is: what is the right way to do this?
One possible solution would be to call the Kubernetes API to dynamically start and stop these worker pods as needed. However, I've heard that this might be a bad way to go since, if those dynamically-created pods are not in a replica set or deployment, then if they die, nothing is around to restart them (I have not yet verified for certain if this is true or not).
Alternatively, I could use the Kubernetes API to dynamically spin up a higher-level abstraction (like a replica set or deployment). Is this a better solution? Or is there some other more preferable alternative?
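For concreteness, the second option I have in mind (dynamically creating a per-worker Deployment through the API, so that something restarts the pod if it dies) would look roughly like this with the official Python client; the names, labels and image are placeholders:

from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

def create_worker(worker_id):
    labels = {"app": "worker", "worker-id": worker_id}
    deployment = client.V1Deployment(
        api_version="apps/v1",
        kind="Deployment",
        metadata=client.V1ObjectMeta(name=f"worker-{worker_id}"),
        spec=client.V1DeploymentSpec(
            replicas=1,
            selector=client.V1LabelSelector(match_labels=labels),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels=labels),
                spec=client.V1PodSpec(containers=[client.V1Container(
                    name="worker",
                    image="my-worker-image:latest",     # placeholder image
                    args=["--worker-id", worker_id],    # placeholder per-worker config
                )]),
            ),
        ),
    )
    apps.create_namespaced_deployment(namespace="default", body=deployment)

def delete_worker(worker_id):
    apps.delete_namespaced_deployment(name=f"worker-{worker_id}", namespace="default")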
If I understand you correctly, you need ConfigMaps.
From the official documentation:
The ConfigMap API resource stores configuration data as key-value pairs. The data can be consumed in pods or provide the configurations for system components such as controllers. ConfigMap is similar to Secrets, but provides a means of working with strings that don't contain sensitive information. Users and system components alike can store configuration data in ConfigMap.
Here you can find some examples of how to set it up.
Please try it and let me know if that helped.
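As a small illustration of the quoted description, creating and reading a ConfigMap through the official Python client could look like this (the name, namespace, keys and values are placeholders):

from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

# Store the worker configuration as plain key-value pairs.
cm = client.V1ConfigMap(
    api_version="v1",
    kind="ConfigMap",
    metadata=client.V1ObjectMeta(name="worker-config"),
    data={"workers": "worker-a,worker-b", "poll_interval": "30"},
)
core.create_namespaced_config_map(namespace="default", body=cm)

# Pods or controllers can later read it back and act on it.
fetched = core.read_namespaced_config_map(name="worker-config", namespace="default")
print(fetched.data["workers"])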
One of the documented best practices for Kubernetes is to store the configuration in version control. It is mentioned in the official best practices and also summed up in this Stack Overflow question. The reason is that this is supposed to speed up rollbacks if necessary.
My question is, why do we need to store this configuration if this is already stored by Kubernetes and there are ways with which we can easily go back to a previous version of the configuration using for example kubectl? An example is a command like:
kubectl rollout history deployment/nginx-deployment
Isn't storing the configuration an unnecessary duplication of a piece of information that we will then have to keep synchronized?
The reason I am asking is that we are building a configuration service on top of Kubernetes. Users will interact with it to configure multiple deployments, and I was wondering whether we should keep a history of the Kubernetes configuration and the contents of ConfigMaps in a database for possible rollbacks, or whether we should just rely on Kubernetes to retrieve the current configuration and roll back to previous versions of it.
To your point, you can use Kubernetes as your store of configuration; it's just that you probably shouldn't want to. By storing configuration as code, you get several benefits:
Configuration changes get regular code reviews.
They get versioned, are diffable, etc.
They can be tested, linted, and whatever else you desire.
They can be refactored, share code, and be documented.
And all this happens before actually being pushed to Kubernetes.
That may seem bad ("but then my configuration is out of date!"), but keep in mind that configuration is actually never in date - just because you told Kubernetes you want 3 replicas running doesn't mean there are, or if there were that 1 isn't temporarily down right now, and so on.
Configuration expresses intent. It takes a different process to actually notice when your intent changes or doesn't match reality, and make it so. For Kubernetes, that storage is etcd and it's up to the master to, in a loop forever, ensure the stored intent matches reality. For you, the storage is source control and whatever process you want, automated or not, can, in a loop forever, ensure your code eventually becomes reflected in Kubernetes.
The rollback command, then, is just a very fast shortcut to "please do this right now!". It's for when your configuration intent was wrong and you don't have time to fix it. As soon as you roll back, you should go back to your configuration and update it there as well. In a sense, this is indeed duplication, but it's a rare event compared to the normal flow, and the overall benefits outweigh this downside.
A Kubernetes cluster doesn't store your configuration, it runs it, just as your server runs your application code.
I'm interested in scientific workflow scheduling. I'm trying to figure out and modify the existing scheduling algorithm inside the Pegasus workflow management system from http://pegasus.isi.edu/, but I don't know where it is or how to do so. Thanks!
Pegasus has a notion of site selection during its mapping phase, where it maps the jobs to the various sites defined in the site catalog. The site selection is explained in the documentation here:
https://pegasus.isi.edu/wms/docs/latest/running_workflows.php#mapping_refinement_steps
Internally, there is a site selector interface that you can implement to incorporate your own scheduling algorithms.
You can access the javadoc at
https://pegasus.isi.edu/wms/docs/latest/javadoc/edu/isi/pegasus/planner/selector/SiteSelector.html
There are some implementations included in this package.
There is also a version of Heft implemented there; the algorithm is in the following class:
edu.isi.pegasus.planner.selector.site.heft.Algorithm
Looking at the Heft implementation of the site selector will give you a good template for how to incorporate other site selection algorithms.
However, you need to keep in mind that Pegasus maps the workflow to various sites and then hands the workflow over to Condor DAGMan for execution. Condor DAGMan looks at which jobs are ready to run and then releases them to the local Condor queue (managed by the Condor schedd). The jobs are then submitted to the remote sites by the Condor schedd. The actual node on which a job gets executed is determined by the local resource scheduler on the site. For example, if you submit the jobs in a workflow to a site that is running PBS, then PBS decides the actual nodes on which a job runs.
In the case of Condor, you can associate requirements with your jobs that help you steer jobs to specific nodes, etc.
Within a workflow, you can also associate job priorities that determine the priority of a job in the local Condor queue on the submit host. You can use that to control which job the schedd submits first when there are multiple jobs in the queue.