Use Spring Batch for a non-scheduled process?

I have been working with Spring Batch for a few months, but a question came up a few days ago. I have to process a file and then update a database from it, but this is not a scheduled batch process because it has to be executed just once.
Is Spring Batch recommended for non-scheduled processes like this one? Or does the fact that it is not scheduled have nothing to do with whether to use Spring Batch?
Thanks

Yes, the fact that your job has to be executed only once has nothing to do with whether you use Spring Batch. Developing the job (with Spring Batch or not) and scheduling the job (with cron, Quartz, etc.) are two separate concerns.
For your use case (process a file and then update a database), I would recommend using Spring Batch to develop your job. Then, you can choose to run it:
only once or on demand (Spring Batch provides APIs to run the job)
or schedule it to run repeatedly using your favourite scheduler
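
To illustrate the split between developing a job and scheduling it, here is a minimal plain-Java sketch. It does not use Spring Batch: CountingJob is a hypothetical stand-in for a real job, and the JDK's ScheduledExecutorService stands in for whatever scheduler you prefer. The only point is that the job code is identical in both cases.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class RunOnceOrScheduled {

    // Hypothetical stand-in for a batch job: it only counts how often it ran.
    static final class CountingJob implements Runnable {
        final AtomicInteger runs = new AtomicInteger();

        @Override
        public void run() {
            runs.incrementAndGet(); // real work (read file, update database) would go here
        }
    }

    // Option 1: run the job exactly once, on demand.
    static int runOnce(CountingJob job) {
        job.run();
        return job.runs.get();
    }

    // Option 2: hand the same, unchanged job to a scheduler and let it fire repeatedly.
    static int runScheduled(CountingJob job, int times, long periodMillis) {
        CountDownLatch latch = new CountDownLatch(times);
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        try {
            scheduler.scheduleAtFixedRate(() -> {
                job.run();
                latch.countDown();
            }, 0, periodMillis, TimeUnit.MILLISECONDS);
            latch.await(); // wait until the job has fired at least `times` times
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdownNow();
        }
        return job.runs.get();
    }

    public static void main(String[] args) {
        System.out.println("once: " + runOnce(new CountingJob()));
        System.out.println("scheduled: " + runScheduled(new CountingJob(), 3, 10));
    }
}
```

With Spring Batch itself, the on-demand run would typically go through JobLauncher#run(job, jobParameters) instead of calling run() directly.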

Related

What is the current recommended approach to manage/stop a spring-batch job?

We have some Spring Batch jobs that are triggered by AutoSys through shell scripts and run as short-lived processes.
Right now there is no way to see what is going on inside a Spring Batch process, so I was exploring ways to view the status of the jobs and to manage (stop) them.
Spring Cloud Data Flow is one of the options I was exploring, but it seems it may not work when jobs are scheduled with AutoSys.
What other options can I explore, and what is the currently recommended approach to managing Spring Batch jobs?
To stop a job, you first need the ID of the job execution to stop. You can get it through the JobExplorer API, which lets you explore the meta-data that Spring Batch keeps in the job repository. Once you have the job execution ID, you can stop the execution by calling the JobOperator#stop method. Please refer to the Stopping a job section of the reference documentation.
This is independent of any method you used to launch the job (either manually, or via a scheduler or a graphical tool) and allows you to gracefully stop a job and leave the repository in a consistent state (ready for a restart if needed).
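
The lookup-then-stop flow described above can be sketched roughly as follows. To keep the example runnable without Spring on the classpath, the two interfaces are hypothetical stand-ins, not Spring Batch types: in a real application the lookup would likely be backed by JobExplorer#findRunningJobExecutions(jobName) and the stop request by JobOperator#stop(executionId). The job name importJob and the execution IDs are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class StopRunningJobs {

    // Hypothetical stand-in for the lookup role played by Spring Batch's JobExplorer.
    interface RunningExecutionFinder {
        Set<Long> findRunningExecutionIds(String jobName);
    }

    // Hypothetical stand-in for the stop role played by Spring Batch's JobOperator.
    interface ExecutionStopper {
        boolean requestStop(long executionId);
    }

    // Looks up the running executions of a job and requests a stop for each one;
    // returns the IDs (in ascending order) for which the request was accepted.
    static List<Long> stopAll(String jobName, RunningExecutionFinder finder, ExecutionStopper stopper) {
        List<Long> accepted = new ArrayList<>();
        for (long id : new TreeSet<>(finder.findRunningExecutionIds(jobName))) {
            if (stopper.requestStop(id)) {
                accepted.add(id);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // In-memory fakes: executions 7 and 9 are running, only 7 accepts the stop request.
        RunningExecutionFinder finder = name -> Set.of(9L, 7L);
        ExecutionStopper stopper = id -> id == 7L;
        System.out.println(stopAll("importJob", finder, stopper)); // prints [7]
    }
}
```

Keep in mind that a stop request is only a signal. An accepted request does not mean the job has already stopped, so check the execution status afterwards before attempting a restart.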

Better job scheduler

We are trying to implement a few batch jobs using Spring Batch. Our application is batch heavy, and our jobs currently live in shell scripts that we now want to move to Spring Batch. We are looking for a scheduler with monitoring.
We are evaluating various schedulers such as Spring Cloud Data Flow, Airflow and Argo.
We are checking whether these jobs can run on both Kubernetes and OpenShift.
We are not sure which one is a good fit. Can someone suggest which one we should go for?
Things we expect:
List batch jobs with the stage each one is in, along with logs
Monitor jobs (Dynatrace/Prometheus)
Complex cron jobs scheduling (more flexibility similar to that of unix cron jobs)

Talend Automation Job taking too much time

I developed a job in Talend, built it, and automated it to run through the Windows batch file from the build below.
When the job's start batch file is executed, it invokes the dimtableinsert job and, after that finishes, invokes fact_dim_combine. This takes just minutes to run in Talend Open Studio, but when I invoke the batch file via the Task Scheduler the process takes hours to finish.
Time Taken
Manual -- 5 Minutes
Automation -- 4 hours (on invoking Windows batch file)
Can someone please tell me what is wrong with this automation process?
The delay in execution is probably a latency issue. Talend might be installed on the same server as the database instance, so whenever you execute the job in Talend it completes as expected. The scheduler, however, might be installed on another server, so when you call the job through the scheduler it takes longer to insert the data.
Make sure your scheduler and database instance are on the same server
Execute the job directly in the Windows terminal and check whether you have the same issue
The easiest way to know what is taking so much time is to add some logs to your job.
First, add some tWarn at the start and finish of each of the subjobs (dimtableinsert and fact_dim_combine) to know which one is the longest.
Then add more logs before/after the components inside the jobs.
This way you should get a better idea of what is responsible for the slowdown (DB access, writing of some files, etc.)

Convert non-launchable job to launchable job in Spring Batch Admin

I have a Spring Batch job developed with Spring Boot (1.4.1.RELEASE).
It successfully runs from command line and writes job execution data to MySQL. It shows up as non-launchable job in Spring Batch Admin (2.0.0.M1, pointing to MySQL) and I can see job execution metrics.
Now I'd like to turn it into a launchable job so I can run it within Spring Batch Admin.
I wonder if anyone has done that before. The documentation has a section called Add your Own Jobs For Launching, but it does not specify where to add the implementation jar(s) for the job.
Is it spring-batch-admin/WEB-INF/lib?
With Spring Boot, the non-launchable job is one big, all-in-one executable jar. Its dependencies overlap with Spring Batch Admin. For example, they both have spring-batch*.jar, spring*.jar but different versions.
Is there a way, like the job definition xml file, to keep them in separate contexts? Thank you.
Spring Batch Admin looks for your job definitions in the src/main/resources/META-INF/spring/batch/jobs folder. You could add your job-definition.xml file to that folder and define your batch jobs in that XML file.

Spring Batch job scheduling best practice

We have a bunch of Spring Batch jobs and we need to invoke them in a specific order. Is there any best practice we should follow? I was thinking of using AutoSys or a cron scheduler that checks the status of each job and decides whether to invoke the next one, but I am open to other suggestions.
The approach sounds right, though it is harder to build something like this in cron. Scheduler tools like AutoSys or Control-M usually provide this kind of orchestration out of the box.
I have used cron to schedule Spring Batch jobs. I had to schedule about 3 main jobs, and 6 jobs in total.
I had the same scenario, where the next job depends on the previous one.
In that case you can use the Spring Batch meta-data tables to check whether the previous job has completed.
You will find the batch tables details here - http://docs.spring.io/spring-batch/reference/html/metaDataSchema.html
The tables are -
BATCH_JOB_INSTANCE
BATCH_JOB_EXECUTION
BATCH_JOB_EXECUTION_CONTEXT
BATCH_STEP_EXECUTION
BATCH_STEP_EXECUTION_CONTEXT
With these tables it is easy to schedule the jobs from cron. Still, managing jobs in cron is quite a pain.
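
As a rough illustration of the status check, here is a plain-JDBC sketch that a cron wrapper could call before starting the next job. The table and column names come from the Spring Batch meta-data schema. Treating the highest JOB_EXECUTION_ID as the latest execution is an assumption, and the class and method names are hypothetical. You would need to supply your own Connection to the job repository database.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Optional;

class PreviousJobGate {

    // Newest execution first (assumes execution IDs grow over time).
    static final String LATEST_STATUS_SQL =
        "SELECT e.STATUS FROM BATCH_JOB_EXECUTION e "
      + "JOIN BATCH_JOB_INSTANCE i ON e.JOB_INSTANCE_ID = i.JOB_INSTANCE_ID "
      + "WHERE i.JOB_NAME = ? ORDER BY e.JOB_EXECUTION_ID DESC";

    // Returns the status of the most recent execution of the job, if there is one.
    static Optional<String> latestStatus(Connection connection, String jobName) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(LATEST_STATUS_SQL)) {
            statement.setString(1, jobName);
            statement.setMaxRows(1); // only the newest row is needed
            try (ResultSet rows = statement.executeQuery()) {
                return rows.next() ? Optional.ofNullable(rows.getString(1)) : Optional.empty();
            }
        }
    }

    // The next job may start only if the previous one finished with status COMPLETED.
    // A null status (no execution found) counts as "do not start".
    static boolean mayLaunchNext(String previousStatus) {
        return "COMPLETED".equals(previousStatus);
    }

    // Exit code for a cron wrapper script: 0 = go ahead, 1 = skip the next job.
    static int exitCodeFor(String previousStatus) {
        return mayLaunchNext(previousStatus) ? 0 : 1;
    }

    public static void main(String[] args) {
        System.out.println(mayLaunchNext("COMPLETED")); // prints true
        System.out.println(mayLaunchNext("FAILED"));    // prints false
        System.out.println(mayLaunchNext(null));        // prints false
    }
}
```

In a wrapper you would combine the two parts, for example mayLaunchNext(latestStatus(connection, jobName).orElse(null)), and let the script start the next job only on exit code 0.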
To use a scheduler tool you need to configure it, and that takes a good amount of time. But once the scheduler tool is up, it is easy to schedule and manage jobs.
In most cases scheduling is a one-time activity, so I guess it is better not to waste time on a scheduler tool and to go for cron instead.