Hi I'm running batch jobs via SCDF in openshift environment. All the jobs have been scheduled through the scheduling option in SCDF. Is there way to pause or Hold those jobs from executing instead of destroying the schedules ? Since the number of jobs are more, everytime we have to recreated the schedules for all of them.
Thanks.
We have an open issue: spring-cloud/spring-cloud-dataflow#3276 to add support for it.
Feel free to update the issue with your use-case requirements and the acceptance criteria. Better yet, it'd be great if you can contribute adding support for it in a PR; we would love to collaborate and release it.
Related
We have some spring-batch jobs are triggered by autosys with shell scripts as short lived processes.
Right now there's no way to view what is going on in the spring-batch process so I was exploring ways to view the status & manage(stop) the jobs.
Spring Cloud Data Flow is one of the options that I was exploring - but it seems that may not work when jobs are scheduled with Autosys.
What are the other options that I can explore in this regard and what is the recommended approach to manage spring-batch jobs now?
To stop a job, you first need to get the ID of the job execution to stop. This can be done using the JobExplorer API that allows you to explore meta-data that Spring Batch is aware of in the job repository. Once you get the job execution ID, you can stop it by calling the JobOperator#stop method, please refer to the Stopping a job section of the reference documentation.
This is independent of any method you used to launch the job (either manually, or via a scheduler or a graphical tool) and allows you to gracefully stop a job and leave the repository in a consistent state (ready for a restart if needed).
I have a cluster in Google Kubernetes Engine, in that cluster there is a workload which runs every 4 hours, its a cron job that was set up by someone. I want to make that run whenever I need it. I am trying to achieve this by using the google Kubernetes API, sending requests from my app whenever a button is clicked to run that cron job, unfortunately the API has no apparent way to do that, or does not have a way at all. What would be some good advice to achieve my goal?
This is a Community Wiki answer, posted for better visibility, so feel free to edit it and add any additional details you consider important.
CronJob resource in kubernetes is not meant to be used one-off tasks, that are run on demand. It is rather configured to run on a regular schedule.
Manuel Polacek has already mentioned that in his comment:
For this scenario you don't need a cron job. A simple bare pod or a
job would be enough, i would say. You can apply a resource on button
push, for example with kubectl – Manuel Polacek Apr 24 at 19:25
So rather than trying to find a way to run your CronJobs on demand, regardless of how they are originally scheduled (usually to be repeated at regular intervals), you should copy the code of such CronJob and find a different way of running it. A Job fits ideally to such use case as it is designed to run one-off tasks.
We have an issue where currently we have a single release running and one release queued. Normally this would be ok, but we run self-hosted agents and have currently 9 agents sitting idle to run the release. I keep seeing parallel jobs, but I don't know if that solves this, as this feels more like something that I have misconfigured.
Since this post we have purchased 10 parallel jobs for the self-hosted pool and still are seeing this issue.
In your current situation, we recommend you can try to check your Deployment queue settings, this setting is used to configure actions when multiple releases are queued for deployment. You can select the 'Unlimited':
I have a reporting application that uses Celery to process thousands of jobs per day. There is a python module per each report type that encapsulates all job steps. Jobs take customer-specific parameters and typically complete within a few minutes. Currently, jobs are triggered by customers on-demand when they create a new report or request a refresh of an existing one.
Now, I would like to add scheduling, so the jobs run daily, and reports get refreshed automatically. I understand that Airflow shines at task orchestration and scheduling. I also like the idea of expressing my jobs as DAGs and getting the benefit of task retries. I can see how I can use Airflow to run scheduled batch-processing jobs, but I am unsure about my use case.
If I express my jobs as Airflow DAGs, I will still need to run them parametrized for each customer. It means, if the customer creates a new report, I will need to have a way to trigger a DAG with the customer-specific configuration. And with a scheduled execution, I will need to enumerate all customers and create a parametrized (sub-)DAG for each of them. My understanding this should be possible since Airflow supports DAGs created dynamically, however, I am not sure if this is an efficient and correct way to use Airflow.
I wonder if anyway considered using Airflow for a scenario similar to mine.
Celery workflows do literally the same, and you can create and run them at any point of time. Also, Celery has a pretty good scheduler (I have never seen it failing in 5 years of using Celery) - Celery Beat.
Sure, Airflow can be used to do what you need without any problems.
You can use Airflow to create DAGs dynamically, I am not sure if this will work with a scale of 1000 of DAGs though. There are some good examples on astronomer.io on Dynamically Generating DAGs in Airflow.
I have some DAGs and task that are dynamically generated by a yaml configuration with different schedules and configurations. It all works without any issue.
Only thing that might be challenging is the "jobs are triggered by customers on-demand" - I guess you could trigger any DAG with Airflow's REST API, but it's still in a experimental state.
I am working to migrate from Quartz 1.6 to 2.1 and use a JDBCJobStore. Previously, the the jobs were loaded via an xml file when the webapp started. The scheduler is now running using the JDBCJobStore but I don't understand how to add the jobs to the database which need to run on an ongoing basis (not one-off jobs).
My first thought is to create a servlet which runs on startup which adds the jobs to the database. But my concern is that this will be executed every time I need to restart the app and the jobs will get duplicated.
Thanks,
steve
The Jobs wont disappear from the database when you do a restart. So within your servlet, when it starts up before adding any jobs check to see if they already exist. When you create your jobs you can give them identities. Using the identities and some quartz methods you check if they already exist.
It sounds like the memory based scheduler is a better fit for these fixed jobs. You can create more than one scheduler, one memory, one JDBC if that makes sense for your application.