What is the current recommended approach to manage/stop a spring-batch job? - spring-batch

We have some spring-batch jobs that are triggered by Autosys through shell scripts, as short-lived processes.
Right now there's no way to view what is going on in the spring-batch process, so I was exploring ways to view the status and manage (stop) the jobs.
Spring Cloud Data Flow is one of the options that I was exploring - but it seems that may not work when jobs are scheduled with Autosys.
What are the other options that I can explore in this regard and what is the recommended approach to manage spring-batch jobs now?

To stop a job, you first need to get the ID of the job execution to stop. This can be done using the JobExplorer API, which lets you explore the metadata that Spring Batch keeps in the job repository. Once you have the job execution ID, you can stop the execution by calling the JobOperator#stop method. Please refer to the Stopping a job section of the reference documentation.
This is independent of any method you used to launch the job (either manually, or via a scheduler or a graphical tool) and allows you to gracefully stop a job and leave the repository in a consistent state (ready for a restart if needed).
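
As a rough sketch of the two steps above (look up the running execution IDs, then send each one a stop signal): the two small interfaces below are hypothetical stand-ins, included only so the example compiles without Spring on the classpath. In a real application their roles would be played by JobExplorer#findRunningJobExecutions(String), which returns JobExecution objects that each expose getId(), and by JobOperator#stop(long). The class and method names here are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the lookup normally done through JobExplorer.
interface RunningExecutionLookup {
    Set<Long> runningExecutionIds(String jobName);
}

// Hypothetical stand-in mirroring JobOperator#stop(long):
// returns true if the stop signal was sent successfully.
interface StopSignal {
    boolean stop(long executionId) throws Exception;
}

final class GracefulStopper {
    private final RunningExecutionLookup lookup;
    private final StopSignal operator;

    GracefulStopper(RunningExecutionLookup lookup, StopSignal operator) {
        this.lookup = lookup;
        this.operator = operator;
    }

    /** Signals every running execution of the job; returns the IDs that accepted the signal. */
    List<Long> stopRunning(String jobName) {
        List<Long> signalled = new ArrayList<>();
        for (long id : lookup.runningExecutionIds(jobName)) {
            try {
                if (operator.stop(id)) {
                    signalled.add(id);
                }
            } catch (Exception e) {
                // For example, the execution finished between the lookup and the stop call.
                System.err.println("Could not stop execution " + id + ": " + e.getMessage());
            }
        }
        return signalled;
    }
}
```

Note that the stop is not immediate: it takes effect once control returns from your code to the framework, which then records the execution as stopped in the repository.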

Related

Spring Cloud Data Flow UI

We have a Spring Batch Application that is triggered by a Task Command Line Runner that is periodically triggered. We are looking for a UI to view the Job Execution status, can we use the Spring Cloud Data Flow UI dependency and get the UI view capability of these Job Executions?
You cannot use the SCDF GUI on its own, outside of SCDF; the two are tightly coupled.
When tasks/batch jobs are launched from SCDF, the task/job executions are automatically tracked in the common datasource; likewise, the SCDF GUI shows task and batch-job details automatically [see task executions / job executions].
Whether you use a scheduler or launch the jobs manually, as long as the launch goes through SCDF in both cases, everything should just work.

Batch Processing on Kubernetes

Does anyone here have experience with batch processing (e.g. Spring Batch) on Kubernetes? Is it a good idea? How do we prevent batch processes from processing the same data if we use the Kubernetes auto scaling feature? Thank you.
Does anyone here have experience with batch processing (e.g. Spring Batch) on Kubernetes? Is it a good idea?
For Spring Batch, we (the Spring Batch team) do have some experience on the matter which we share in the following talks:
Cloud Native Batch Processing on Kubernetes, by Michael Minella
Spring Batch on Kubernetes, by me.
Running batch jobs on kubernetes can be tricky:
pods may be re-scheduled by k8s on different nodes in the middle of processing
cron jobs might be triggered twice
etc
This requires additional non-trivial work on the developer's side to make sure the batch application is fault-tolerant (resilient to node failure, pod re-scheduling, etc) and safe against duplicate job execution in a clustered environment.
Spring Batch takes care of this additional work for you and can be a good choice to run batch workloads on k8s for several reasons:
Cost efficiency: Spring Batch jobs maintain their state in an external database, which makes it possible to restart them from the last save point in case of job/node failure or pod re-scheduling
Robustness: Safe against duplicate job executions thanks to a centralized job repository
Fault-tolerance: Retry/Skip failed items in case of transient errors like a call to a web service that might be temporarily down or being re-scheduled in a cloud environment
I wrote a blog post in which I explain all these aspects in detail, with code examples. You can find it here: Spring Batch on Kubernetes: Efficient batch processing at scale
How do we prevent batch processes from processing the same data if we use the Kubernetes auto scaling feature?
Making each job process a different data set is the way to go (a job per file for example). But there are different patterns that you might be interested in, see Job Patterns from k8s docs.

Spring batch jobOperator - how are multiple concurrent instances of a job from the same XML file controlled?

When we run multiple concurrent jobs with different parameters, how can we control (stop, restart) the appropriate jobs? Our internal code provides the jobExecution object, but under the covers the jobOperator uses the job name to get the job instance.
In our case all of the jobs are from "do-stuff.xml" (okay, it's sanitized and not very original). After looking at the spring-batch source code, our concern is that if there is more than one job running and we stop a job, it will take the most recently submitted job and stop it.
The JobOperator will allow you to fetch all running executions of the job using getRunningExecutions(String jobName). You should be able to iterate over that list to find the one you want. Then, just call stop(long executionId) on the one you want.
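
A minimal sketch of that iteration, under some assumptions: the interface below is a hypothetical stand-in that mirrors the two JobOperator methods named above, so the example compiles without Spring (the real methods also declare checked exceptions, which are left out here). How you recognise "the one you want" is application-specific, so it is passed in as a predicate; it could, for instance, compare the job parameters of each execution.

```java
import java.util.Optional;
import java.util.Set;
import java.util.function.LongPredicate;

// Hypothetical stand-in mirroring JobOperator#getRunningExecutions(String)
// and JobOperator#stop(long), without their checked exceptions.
interface RunningJobs {
    Set<Long> getRunningExecutions(String jobName);
    boolean stop(long executionId);
}

final class SelectiveStopper {
    /**
     * Stops the first running execution of the job that the predicate accepts.
     * Returns the stopped execution ID, or empty if nothing matched or the
     * stop signal was not accepted.
     */
    static Optional<Long> stopFirstMatching(RunningJobs operator, String jobName, LongPredicate isTarget) {
        for (long id : operator.getRunningExecutions(jobName)) {
            if (isTarget.test(id)) {
                return operator.stop(id) ? Optional.of(id) : Optional.empty();
            }
        }
        return Optional.empty();
    }
}
```

Because the execution ID is selected explicitly, the stop does not depend on which job was submitted most recently.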
Alternatively, we've also implemented listeners (both at step and chunk level) to check an outage status table. When we want to implement a system-wide outage, we add the outage there and have our listener throw an exception to bring our jobs down. Once the outage is lifted, all "failed" executions may be restarted.
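
The core of such a listener might look like the sketch below. It is not the poster's actual code: the outage lookup is a hypothetical BooleanSupplier standing in for a query against the outage status table, and the check() method is meant to be called from a listener callback such as ChunkListener#beforeChunk or StepExecutionListener#beforeStep.

```java
import java.util.function.BooleanSupplier;

final class OutageGuard {
    // Hypothetical: in practice this would query the outage status table.
    private final BooleanSupplier outageActive;

    OutageGuard(BooleanSupplier outageActive) {
        this.outageActive = outageActive;
    }

    /** Throws while an outage is active, so that the calling listener fails the execution. */
    void check() {
        if (outageActive.getAsBoolean()) {
            throw new IllegalStateException(
                "System outage in effect; failing this execution so it can be restarted later");
        }
    }
}
```

Checking at chunk level reacts sooner than checking only at step level, at the cost of one lookup per chunk.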

spring batch job scheduling best practice

We have a bunch of spring batch jobs and we need to invoke them in a specific order. Is there any best practice we should follow? I was thinking of using Autosys or a cron scheduler to check the status of each job and decide whether to invoke the next one, but I am open to other suggestions.
The approach sounds right, though it's harder to build something like this in cron. Scheduler tools like Autosys or Control-M usually provide this orchestration feature out of the box.
I have used cron to schedule Spring Batch jobs. I had to schedule around 3 main jobs and 6 jobs in all of them.
I had the same scenario, where the next job depends on the first.
In that case, you can use the Spring Batch tables to check whether the previous job has completed.
You will find the batch tables details here - http://docs.spring.io/spring-batch/reference/html/metaDataSchema.html
The tables include:
BATCH_JOB_INSTANCE
BATCH_JOB_EXECUTION
BATCH_JOB_EXECUTION_CONTEXT
BATCH_STEP_EXECUTION
BATCH_STEP_EXECUTION_CONTEXT
With these, it is easy to schedule the jobs from cron. Still, managing jobs in cron is quite a pain.
To use a scheduler tool, you need to configure it, which takes a fair amount of time. But once the scheduler tool is up, it is easy to schedule and manage jobs.
In most cases, scheduling is a one-time activity. So I guess it is better not to waste time on a scheduler tool and to go with cron instead.
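
The status check against the metadata tables described above could look roughly like this. It is a sketch that assumes the default BATCH_ table prefix and a plain JDBC connection to the job repository database; the class and method names are made up. The job name is stored in BATCH_JOB_INSTANCE and the status in BATCH_JOB_EXECUTION, so the query joins the two and looks at the most recent execution.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Optional;

final class PreviousJobCheck {
    // Most recent execution first; assumes the default BATCH_ table prefix.
    static final String LATEST_STATUS_SQL =
        "SELECT e.STATUS FROM BATCH_JOB_EXECUTION e "
      + "JOIN BATCH_JOB_INSTANCE i ON i.JOB_INSTANCE_ID = e.JOB_INSTANCE_ID "
      + "WHERE i.JOB_NAME = ? ORDER BY e.CREATE_TIME DESC";

    /** Returns the status of the latest execution of the job, or empty if it never ran. */
    static Optional<String> latestStatus(Connection connection, String jobName) throws SQLException {
        try (PreparedStatement ps = connection.prepareStatement(LATEST_STATUS_SQL)) {
            ps.setMaxRows(1); // only the newest row is needed
            ps.setString(1, jobName);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? Optional.ofNullable(rs.getString(1)) : Optional.empty();
            }
        }
    }

    /** The next job may start only if the previous one finished with status COMPLETED. */
    static boolean mayRunNext(Optional<String> previousStatus) {
        return previousStatus.filter("COMPLETED"::equals).isPresent();
    }
}
```

A wrapper launched from cron could call latestStatus for the upstream job and skip the launch when mayRunNext returns false.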

how to load/bootstrap ongoing jobs with a quartz jdbcStore

I am working to migrate from Quartz 1.6 to 2.1 and use a JDBCJobStore. Previously, the jobs were loaded via an xml file when the webapp started. The scheduler is now running using the JDBCJobStore, but I don't understand how to add the jobs to the database which need to run on an ongoing basis (not one-off jobs).
My first thought is to create a servlet which runs on startup which adds the jobs to the database. But my concern is that this will be executed every time I need to restart the app and the jobs will get duplicated.
Thanks,
steve
The jobs won't disappear from the database when you do a restart. So within your servlet, when it starts up, check whether the jobs already exist before adding any. When you create your jobs, you can give them identities. Using the identities and some Quartz methods, you can check whether they already exist.
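
The check-before-add logic can be sketched as follows. The interface is a hypothetical stand-in so the example compiles without Quartz; in Quartz 2.x the corresponding calls are Scheduler#checkExists(JobKey) and Scheduler#scheduleJob(JobDetail, Trigger), with the identity assigned through JobBuilder#withIdentity(name, group).

```java
// Hypothetical stand-in for the Quartz Scheduler calls mentioned above,
// reduced to a job identity made of name and group.
interface JobRegistry {
    boolean checkExists(String name, String group);
    void scheduleJob(String name, String group);
}

final class StartupJobLoader {
    /** Registers the job only if its identity is not stored yet; returns true if it was added. */
    static boolean ensureScheduled(JobRegistry scheduler, String name, String group) {
        if (scheduler.checkExists(name, group)) {
            return false; // already in the job store from a previous start
        }
        scheduler.scheduleJob(name, group);
        return true;
    }
}
```

Calling this for each fixed job from the startup servlet makes the registration safe to repeat on every restart.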
It sounds like the memory based scheduler is a better fit for these fixed jobs. You can create more than one scheduler, one memory, one JDBC if that makes sense for your application.