We have a bunch of Spring Batch jobs that we need to invoke in a specific order. Is there a best practice we should follow? I was thinking of using Autosys or a cron scheduler that checks the status of each job to decide whether to invoke the next one, but I'm open to other suggestions.
The approach sounds right, though it's harder to build something like this with cron. A scheduler tool like Autosys or Control-M usually provides this orchestration feature out of the box.
I have used cron to schedule Spring Batch jobs. I had to schedule around 3 main jobs, and 6 jobs in all.
I had the same scenario, where the next job depended on the first. In that case, you can use the Spring Batch meta-data tables to check whether the previous job has COMPLETED.
You will find the details of the batch tables here: http://docs.spring.io/spring-batch/reference/html/metaDataSchema.html
The tables are:
BATCH_JOB_INSTANCE
BATCH_JOB_EXECUTION
BATCH_JOB_EXECUTION_CONTEXT
BATCH_STEP_EXECUTION
BATCH_STEP_EXECUTION_CONTEXT
From cron it will then be easy to schedule the jobs, as in the sketch below. But somehow, managing jobs in cron is quite a pain.
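For illustration, here is a minimal sketch of a Java launcher that a cron entry could invoke, assuming two hypothetical jobs where "jobB" should run only after the last instance of "jobA" has COMPLETED. It uses the JobExplorer API, which reads the meta-data tables listed above:

```java
// Hypothetical launcher invoked from cron. The job names ("jobA"/"jobB") and
// the wiring of JobExplorer/JobLauncher are assumptions for illustration.
import java.util.List;

import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobInstance;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.explore.JobExplorer;
import org.springframework.batch.core.launch.JobLauncher;

public class DependentJobRunner {

    private final JobExplorer jobExplorer;
    private final JobLauncher jobLauncher;
    private final Job nextJob; // e.g. the bean for "jobB"

    public DependentJobRunner(JobExplorer jobExplorer, JobLauncher jobLauncher, Job nextJob) {
        this.jobExplorer = jobExplorer;
        this.jobLauncher = jobLauncher;
        this.nextJob = nextJob;
    }

    public void runIfPredecessorCompleted(String predecessorJobName) throws Exception {
        // Most recent instance of the predecessor (reads BATCH_JOB_INSTANCE)
        List<JobInstance> instances = jobExplorer.getJobInstances(predecessorJobName, 0, 1);
        if (instances.isEmpty()) {
            return; // predecessor never ran, do nothing
        }
        // Its executions (reads BATCH_JOB_EXECUTION)
        List<JobExecution> executions = jobExplorer.getJobExecutions(instances.get(0));
        boolean completed = executions.stream()
                .anyMatch(e -> e.getStatus() == BatchStatus.COMPLETED);
        if (completed) {
            // Launch the dependent job with a unique identifying parameter
            jobLauncher.run(nextJob, new JobParametersBuilder()
                    .addLong("run.id", System.currentTimeMillis())
                    .toJobParameters());
        }
    }
}
```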
Using a scheduler tool requires configuration, which takes a good amount of time. But once the scheduler tool is up, it is easy to schedule and manage jobs.
In most cases, scheduling is a one-time activity, so I guess it is better not to waste time on a scheduler tool; go for cron instead.
We have some spring-batch jobs that are triggered by Autosys via shell scripts, as short-lived processes.
Right now there's no way to view what is going on inside the spring-batch process, so I was exploring ways to view the status of the jobs and manage (stop) them.
Spring Cloud Data Flow is one of the options I was exploring, but it seems it may not work when jobs are scheduled with Autosys.
What other options can I explore in this regard, and what is the recommended approach to manage spring-batch jobs now?
To stop a job, you first need the ID of the job execution to stop. This can be done using the JobExplorer API, which lets you explore the meta-data that Spring Batch keeps in the job repository. Once you have the job execution ID, you can stop it by calling the JobOperator#stop method; please refer to the Stopping a job section of the reference documentation.
This is independent of the method you used to launch the job (manually, via a scheduler, or from a graphical tool) and allows you to stop a job gracefully, leaving the repository in a consistent state (ready for a restart if needed).
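A minimal sketch of those two steps, assuming the JobExplorer and JobOperator beans are already wired and the job name is known:

```java
// Sketch: find running executions for a job name via JobExplorer, then send
// each one a stop signal via JobOperator. The job name is an assumption.
import java.util.Set;

import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.explore.JobExplorer;
import org.springframework.batch.core.launch.JobOperator;

public class JobStopper {

    private final JobExplorer jobExplorer;
    private final JobOperator jobOperator;

    public JobStopper(JobExplorer jobExplorer, JobOperator jobOperator) {
        this.jobExplorer = jobExplorer;
        this.jobOperator = jobOperator;
    }

    public void stopRunningExecutions(String jobName) throws Exception {
        // All executions of this job that are currently in a running state
        Set<JobExecution> running = jobExplorer.findRunningJobExecutions(jobName);
        for (JobExecution execution : running) {
            // Sends a stop signal; the job stops gracefully, leaving the
            // repository in a restartable state
            jobOperator.stop(execution.getId());
        }
    }
}
```

Note that JobOperator#stop only sends a stop signal; the running job then stops at the next opportunity, typically a chunk boundary.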
We are trying to implement a few batch jobs using Spring Batch. Our application is batch-heavy; currently, our jobs are shell scripts. Now we are trying to move to Spring Batch, and we are looking for a scheduler with monitoring.
We are evaluating various schedulers like Spring Cloud Data Flow, Airflow, and Argo.
We are checking the feasibility of running these jobs on both Kubernetes and OpenShift.
We are not sure which one is best; can someone suggest which we should go for?
Things we expect:
A list of batch jobs showing which stage each is in, along with logs
Monitor jobs (Dynatrace/Prometheus)
Complex cron scheduling (flexibility similar to that of Unix cron jobs)
I have a reporting application that uses Celery to process thousands of jobs per day. There is a Python module per report type that encapsulates all the job steps. Jobs take customer-specific parameters and typically complete within a few minutes. Currently, jobs are triggered by customers on demand when they create a new report or request a refresh of an existing one.
Now, I would like to add scheduling, so the jobs run daily, and reports get refreshed automatically. I understand that Airflow shines at task orchestration and scheduling. I also like the idea of expressing my jobs as DAGs and getting the benefit of task retries. I can see how I can use Airflow to run scheduled batch-processing jobs, but I am unsure about my use case.
If I express my jobs as Airflow DAGs, I will still need to run them parametrized for each customer. This means that if a customer creates a new report, I will need a way to trigger a DAG with the customer-specific configuration. And with scheduled execution, I will need to enumerate all customers and create a parametrized (sub-)DAG for each of them. My understanding is that this should be possible, since Airflow supports dynamically created DAGs; however, I am not sure whether this is an efficient and correct way to use Airflow.
I wonder if anyone has considered using Airflow for a scenario similar to mine.
Celery workflows do literally the same thing, and you can create and run them at any point in time. Also, Celery has a pretty good scheduler, Celery Beat (I have never seen it fail in 5 years of using Celery).
Sure, Airflow can be used to do what you need without any problems.
You can use Airflow to create DAGs dynamically; I am not sure it will work at a scale of 1000s of DAGs, though. There are some good examples on astronomer.io of dynamically generating DAGs in Airflow.
I have some DAGs and tasks that are dynamically generated from a YAML configuration, with different schedules and configurations. It all works without any issues.
The only thing that might be challenging is the "jobs are triggered by customers on-demand" part. I guess you could trigger any DAG with Airflow's REST API, but it's still in an experimental state.
I have a few months of experience working with Spring Batch, but a question came up a few days ago. I have to process a file and then update a database from it, but this is not a scheduled batch process because it has to be executed just once.
Is Spring Batch recommended for executing non-scheduled processes like this one? Or does the fact that it is not scheduled have nothing to do with whether to use Spring Batch?
Thanks
Is Spring Batch recommended for executing non-scheduled processes like this one? Or does the fact that it is not scheduled have nothing to do with whether to use Spring Batch?
Yes, the fact that your job has to be executed only once has nothing to do with whether to use Spring Batch. There is a difference between developing the job (with Spring Batch or not) and scheduling the job (with cron, Quartz, etc.).
For your use case (process a file and then update a database), I would recommend using Spring Batch to develop your job. Then, you can choose to run it:
only once or on demand (Spring Batch provides APIs to run the job; see the sketch after this list)
or schedule it to run repeatedly using your favourite scheduler
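For example, here is a minimal sketch of the on-demand case, assuming a job bean (hypothetically named "fileToDatabaseJob") is defined elsewhere in the application context:

```java
// Sketch of launching the job once, on demand, through the JobLauncher API.
// The bean name "fileToDatabaseJob" and the parameter key are assumptions.
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;

public class OneOffLauncher {

    private final JobLauncher jobLauncher;
    private final Job fileToDatabaseJob;

    public OneOffLauncher(JobLauncher jobLauncher, Job fileToDatabaseJob) {
        this.jobLauncher = jobLauncher;
        this.fileToDatabaseJob = fileToDatabaseJob;
    }

    public void launch(String inputFile) throws Exception {
        // Identifying parameters define the job instance: launching again with
        // the same file restarts that instance instead of creating a new one
        JobParameters params = new JobParametersBuilder()
                .addString("input.file", inputFile)
                .toJobParameters();
        jobLauncher.run(fileToDatabaseJob, params);
    }
}
```

Spring Batch also ships a CommandLineJobRunner for launching a job from a shell, which fits the "run it once" case as well.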
I am working on migrating from Quartz 1.6 to 2.1 and using a JDBCJobStore. Previously, the jobs were loaded via an XML file when the webapp started. The scheduler is now running using the JDBCJobStore, but I don't understand how to add to the database the jobs that need to run on an ongoing basis (not one-off jobs).
My first thought is to create a servlet that runs on startup and adds the jobs to the database. But my concern is that this will be executed every time I restart the app, and the jobs will get duplicated.
Thanks,
steve
The jobs won't disappear from the database when you do a restart. So within your servlet, when it starts up, check whether the jobs already exist before adding any. When you create your jobs you can give them identities; using those identities and the Quartz API, you can check whether they already exist.
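A minimal sketch with the Quartz 2.x API (the job class, names, and cron expression here are assumptions):

```java
// Register a job only if it is not already stored. All identities and the
// schedule below are hypothetical examples.
import org.quartz.CronScheduleBuilder;
import org.quartz.Job;
import org.quartz.JobBuilder;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.JobKey;
import org.quartz.Scheduler;
import org.quartz.SchedulerException;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;

public class StartupJobRegistrar {

    // Hypothetical job class, just for illustration
    public static class NightlyReportJob implements Job {
        @Override
        public void execute(JobExecutionContext context) {
            // ... report logic ...
        }
    }

    public void registerIfAbsent(Scheduler scheduler) throws SchedulerException {
        JobKey jobKey = new JobKey("nightlyReport", "reports");

        // checkExists() consults the JDBCJobStore, so a job registered during
        // a previous startup is found and not scheduled a second time
        if (!scheduler.checkExists(jobKey)) {
            JobDetail job = JobBuilder.newJob(NightlyReportJob.class)
                    .withIdentity(jobKey)
                    .build();
            Trigger trigger = TriggerBuilder.newTrigger()
                    .withIdentity("nightlyReportTrigger", "reports")
                    .withSchedule(CronScheduleBuilder.cronSchedule("0 0 2 * * ?"))
                    .build();
            scheduler.scheduleJob(job, trigger);
        }
    }
}
```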
It sounds like the memory-based job store is a better fit for these fixed jobs. You can create more than one scheduler (one in-memory, one JDBC) if that makes sense for your application.