How to schedule jobs in Kubeflow?

I'm setting up a Kubeflow cluster on AWS EKS. Is there a native way in Kubeflow to automatically schedule jobs, e.g. run the workflow every X hours, fetch data every X hours, etc.?
I have looked at other tools like Airflow, but I'm not sure whether it would integrate well with the Kubeflow environment.

That is what a recurring run is for.
Under the hood it uses a run trigger, which has a cron field for specifying the schedule with cron semantics.
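For example, with the KFP Python SDK you can create a recurring run programmatically. A minimal sketch, assuming a compiled pipeline package and a reachable KFP endpoint (the host, file name, and experiment name are placeholders):

```python
import kfp

# Sketch only: the host, pipeline package, and schedule below are placeholders.
client = kfp.Client(host="http://<your-kfp-endpoint>")

experiment = client.create_experiment(name="scheduled-runs")

# A recurring run is backed by a run trigger; KFP cron expressions use six
# fields (seconds first), so this one fires at minute 0 of every second hour.
client.create_recurring_run(
    experiment_id=experiment.id,
    job_name="my-pipeline-every-2h",
    cron_expression="0 0 */2 * * *",
    pipeline_package_path="my_pipeline.yaml",  # compiled pipeline package
    max_concurrency=1,
)
```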

Related

Google Kubernetes API Cron Job

I have a cluster in Google Kubernetes Engine, and in that cluster there is a workload that runs every 4 hours; it's a CronJob that was set up by someone. I want to be able to run it whenever I need it. I am trying to achieve this with the Google Kubernetes API, sending requests from my app whenever a button is clicked to run that cron job, but unfortunately the API has no apparent way to do that, or perhaps no way at all. What would be some good advice to achieve my goal?
The CronJob resource in Kubernetes is not meant for one-off tasks that are run on demand; it is configured to run on a regular schedule.
Manuel Polacek has already mentioned that in his comment:
For this scenario you don't need a cron job. A simple bare pod or a job would be enough, I would say. You can apply a resource on button push, for example with kubectl. – Manuel Polacek
So rather than trying to find a way to run your CronJob on demand, regardless of how it is originally scheduled (usually to be repeated at regular intervals), you should reuse its job template and find a different way of running it. A Job fits this use case ideally, as it is designed to run one-off tasks.
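A minimal sketch of that pattern with the official Kubernetes Python client, assuming a CronJob named my-cronjob in the default namespace (all names are placeholders); it mirrors what kubectl create job --from=cronjob/my-cronjob does:

```python
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() when running in-cluster
batch = client.BatchV1Api()

# Read the existing CronJob (CronJobs live in batch/v1 since Kubernetes 1.21).
cron_job = batch.read_namespaced_cron_job(name="my-cronjob", namespace="default")

# Build a one-off Job from the CronJob's job template and submit it on demand,
# e.g. from the handler behind the button in your app.
job = client.V1Job(
    api_version="batch/v1",
    kind="Job",
    metadata=client.V1ObjectMeta(name="my-cronjob-manual-run"),
    spec=cron_job.spec.job_template.spec,
)
batch.create_namespaced_job(namespace="default", body=job)
```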

Use Airflow to run parametrized jobs on-demand and with a schedule

I have a reporting application that uses Celery to process thousands of jobs per day. There is a Python module for each report type that encapsulates all job steps. Jobs take customer-specific parameters and typically complete within a few minutes. Currently, jobs are triggered by customers on demand when they create a new report or request a refresh of an existing one.
Now, I would like to add scheduling, so the jobs run daily, and reports get refreshed automatically. I understand that Airflow shines at task orchestration and scheduling. I also like the idea of expressing my jobs as DAGs and getting the benefit of task retries. I can see how I can use Airflow to run scheduled batch-processing jobs, but I am unsure about my use case.
If I express my jobs as Airflow DAGs, I will still need to run them parametrized for each customer. That means, if a customer creates a new report, I will need a way to trigger a DAG with the customer-specific configuration. And with scheduled execution, I will need to enumerate all customers and create a parametrized (sub-)DAG for each of them. My understanding is that this should be possible, since Airflow supports dynamically created DAGs; however, I am not sure whether this is an efficient and correct way to use Airflow.
I wonder if anyone has considered using Airflow for a scenario similar to mine.
Celery workflows do literally the same thing, and you can create and run them at any point in time. Also, Celery has a pretty good scheduler (I have never seen it fail in 5 years of using Celery): Celery Beat.
Sure, Airflow can be used to do what you need without any problems.
You can use Airflow to create DAGs dynamically, though I am not sure whether this will work at a scale of 1,000 DAGs. There are some good examples on astronomer.io of dynamically generating DAGs in Airflow.
I have some DAGs and tasks that are dynamically generated from a YAML configuration with different schedules and configurations, and it all works without any issues.
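A minimal sketch of that dynamic-generation pattern with Airflow 2-style imports; the customer mapping and callable below are invented for illustration and would normally come from your YAML configuration or a database:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholder customer configuration; in practice, load this from YAML or a DB.
CUSTOMERS = {
    "acme": {"report_type": "sales"},
    "globex": {"report_type": "usage"},
}

def generate_report(customer_id, report_type, **_):
    print(f"Generating {report_type} report for {customer_id}")

# One DAG per customer; Airflow discovers every DAG object placed in globals().
for customer_id, conf in CUSTOMERS.items():
    dag = DAG(
        dag_id=f"report_{customer_id}",
        schedule_interval="@daily",
        start_date=datetime(2021, 1, 1),
        catchup=False,
    )
    PythonOperator(
        task_id="generate_report",
        python_callable=generate_report,
        op_kwargs={"customer_id": customer_id, **conf},
        dag=dag,
    )
    globals()[f"report_{customer_id}"] = dag
```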
The only thing that might be challenging is the "jobs are triggered by customers on demand" part. You could trigger any DAG with Airflow's REST API, but it's still in an experimental state.
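For reference, on Airflow 2.x the stable REST API can trigger a DAG run with a per-customer conf payload; a hedged sketch (the host, credentials, and DAG id are placeholders):

```python
import requests

# This is the stable REST API introduced in Airflow 2.0; Airflow 1.x only had
# the experimental /api/experimental/dags/<dag_id>/dag_runs endpoint.
resp = requests.post(
    "http://airflow.example.com/api/v1/dags/report_acme/dagRuns",
    json={"conf": {"customer_id": "acme"}},
    auth=("username", "password"),
)
resp.raise_for_status()
print(resp.json()["dag_run_id"])
```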

Run task definition after stack creation

The question seems simple enough. I have a bunch of task definitions and a cluster in my CloudFormation template. When setting things up manually, I would create a task based on any definition and provide it with a cron schedule, and it would then start to run.
I can't seem to find this option in CF. I found Service, but that only works for tasks that run indefinitely, which mine do not (they run once per day for approx. 10-20 minutes).
After some research I found out about AWS::Events::Rule, which people seem to use only in conjunction with Lambda, which I do not. I was unable to find any example that referenced FARGATE tasks, so I'm not sure it's even possible.
If anyone has any examples of running tasks on a cron schedule using CF, that would be great.
I think that ECS scheduled tasks (cron) would suit you:
Amazon ECS supports the ability to schedule tasks on either a cron-like schedule or in response to CloudWatch Events. This is supported for Amazon ECS tasks using both the Fargate and EC2 launch types.
This is based on CloudWatch Events, which can be used to schedule many things, not only Lambda.
To set this up with CloudFormation, you can use AWS::Events::Rule with a target that specifies EcsParameters.
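For illustration, here is the same rule-plus-target wiring expressed with boto3 instead of CloudFormation; the ARNs, subnet, and names are placeholders, and the referenced role must be allowed to call ecs:RunTask:

```python
import boto3

events = boto3.client("events")

# Create a cron rule (every day at 03:00 UTC); this is the boto3 equivalent
# of an AWS::Events::Rule resource in CloudFormation.
events.put_rule(
    Name="daily-report-task",
    ScheduleExpression="cron(0 3 * * ? *)",
)

# Point the rule at a Fargate task; in CloudFormation this maps to the
# EcsParameters block of the rule's target. All ARNs are placeholders.
events.put_targets(
    Rule="daily-report-task",
    Targets=[{
        "Id": "run-report-task",
        "Arn": "arn:aws:ecs:us-east-1:123456789012:cluster/my-cluster",
        "RoleArn": "arn:aws:iam::123456789012:role/ecsEventsRole",
        "EcsParameters": {
            "TaskDefinitionArn": "arn:aws:ecs:us-east-1:123456789012:task-definition/report:1",
            "LaunchType": "FARGATE",
            "NetworkConfiguration": {
                "awsvpcConfiguration": {
                    "Subnets": ["subnet-0123456789abcdef0"],
                    "AssignPublicIp": "ENABLED",
                },
            },
        },
    }],
)
```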

Running very lightweight tasks periodically with kubernetes

Consider a requirement where we need to run very simple, lightweight tasks, say running a curl command every 10 minutes.
If this were to run in a Kubernetes cluster, is it efficient to create a container every 10 minutes just to execute a task that may take a few seconds or even milliseconds? Is it overkill from a time and cost angle?
Please note that, unfortunately, Lambda functions or cloud functions are not an option.
You can use a CronJob to run Jobs on a time-based schedule. These automated jobs run like Cron tasks on a Linux or UNIX system. Cron jobs are useful for creating periodic and recurring tasks.
https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/
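A minimal sketch of such a CronJob created with the official Kubernetes Python client (recent clients expose CronJob under batch/v1); the image and URL are placeholders:

```python
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() when running in-cluster

# A CronJob that runs `curl` every 10 minutes; image and URL are placeholders.
container = client.V1Container(
    name="curl",
    image="curlimages/curl:latest",
    command=["curl", "-fsS", "https://example.com/healthz"],
)
cron_job = client.V1CronJob(
    metadata=client.V1ObjectMeta(name="curl-every-10m"),
    spec=client.V1CronJobSpec(
        schedule="*/10 * * * *",
        job_template=client.V1JobTemplateSpec(
            spec=client.V1JobSpec(
                template=client.V1PodTemplateSpec(
                    spec=client.V1PodSpec(
                        containers=[container],
                        restart_policy="Never",  # required for Job pods
                    ),
                ),
            ),
        ),
    ),
)
client.BatchV1Api().create_namespaced_cron_job(namespace="default", body=cron_job)
```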

How to restart service at scheduled time in Marathon?

Is there any way to have a Docker-based service on Marathon restart itself at a given time every day? What I'd like is a way to say something like "scale to 0 at midnight and scale to 1 at 6am" or something like that.
On DC/OS there is the notion of jobs, but it isn't clear to me whether a job can restart a running service.
As far as I know, Marathon has no such feature. Marathon is used to manage (create/delete/scale/health-check) apps on a Mesos cluster, much like an init process (e.g. systemd) does for Linux. Scheduled jobs are delegated to other frameworks: the scheduled-jobs functionality on DC/OS mentioned in your question is provided by Metronome, and there is also a more sophisticated framework, Chronos, which plays the role that cron jobs do on Linux.
Even though Marathon has no such built-in feature, it provides a rich RESTful API, so you can easily solve your problem by using Chronos and Marathon together (a sketch of the script from step 1 follows the list):
Create a script to stop/start your app through the Marathon API
Create a Chronos job to run your script at midnight to stop your app
Create a Chronos job to run your script at 6 AM to start the app
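A minimal sketch of the script from step 1, assuming Marathon's /v2/apps scaling endpoint; the host and app id are placeholders:

```python
import sys

import requests

# Scaling an app to 0 stops it; scaling back to 1 starts it again, which
# matches the "scale to 0 at midnight, scale to 1 at 6am" idea above.
MARATHON = "http://marathon.example.com:8080"  # placeholder host

def scale_app(app_id, instances):
    resp = requests.put(
        f"{MARATHON}/v2/apps/{app_id}",
        json={"instances": instances},
        params={"force": "true"},  # override any stuck deployment
    )
    resp.raise_for_status()

if __name__ == "__main__":
    # e.g. `python scale.py /my-service 0` from the midnight Chronos job
    scale_app(sys.argv[1], int(sys.argv[2]))
```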
You can use Mesos Chronos for scheduling jobs; Docker-based jobs can be scheduled with it as well. More details at https://mesos.github.io/chronos/
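A hedged sketch of submitting such a job through Chronos's REST API; the host, image, schedule, and owner below are placeholders, and newer Chronos versions prefix the endpoint with /v1:

```python
import requests

# Placeholder job definition: a Docker container run once per day.
job = {
    "name": "nightly-docker-task",
    "schedule": "R/2021-01-01T00:00:00Z/P1D",  # ISO 8601: repeat daily
    "container": {"type": "DOCKER", "image": "alpine:3"},
    "command": "echo hello from chronos",
    "cpus": "0.1",
    "mem": "64",
    "owner": "ops@example.com",
}

resp = requests.post("http://chronos.example.com:4400/scheduler/iso8601", json=job)
resp.raise_for_status()
```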