Currently a Spring Batch job runs every 20 seconds, and three jobs run concurrently. As a result, the size of the Spring Batch metadata tables below increases rapidly. Is there a way to disable this? If not, how can we clean up these tables from time to time?
BATCH_JOB_INSTANCE,
BATCH_JOB_EXECUTION,
BATCH_JOB_EXECUTION_PARAMS,
and BATCH_STEP_EXECUTION
The RemoveSpringBatchHistoryTasklet can be used in a Spring Batch job that you schedule to run periodically to purge the Spring Batch metadata tables.
See https://github.com/arey/spring-batch-toolkit
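For illustration, a manual purge can also be a small custom tasklet that deletes old rows in foreign-key order. The sketch below is hypothetical and not the toolkit's implementation; the retention period, bean wiring, and the cutoff column choice (CREATE_TIME) are assumptions.

    import java.sql.Timestamp;
    import java.time.LocalDateTime;

    import org.springframework.batch.core.StepContribution;
    import org.springframework.batch.core.scope.context.ChunkContext;
    import org.springframework.batch.core.step.tasklet.Tasklet;
    import org.springframework.batch.repeat.RepeatStatus;
    import org.springframework.jdbc.core.JdbcTemplate;

    /**
     * Hypothetical tasklet that purges Spring Batch metadata older than a cutoff.
     * Child tables (including the two *_CONTEXT tables) are deleted first to
     * respect the foreign keys.
     */
    public class PurgeBatchMetadataTasklet implements Tasklet {

        private final JdbcTemplate jdbcTemplate;
        private final int retentionDays;

        public PurgeBatchMetadataTasklet(JdbcTemplate jdbcTemplate, int retentionDays) {
            this.jdbcTemplate = jdbcTemplate;
            this.retentionDays = retentionDays;
        }

        @Override
        public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
            Timestamp cutoff = Timestamp.valueOf(LocalDateTime.now().minusDays(retentionDays));

            jdbcTemplate.update(
                "DELETE FROM BATCH_STEP_EXECUTION_CONTEXT WHERE STEP_EXECUTION_ID IN " +
                "(SELECT SE.STEP_EXECUTION_ID FROM BATCH_STEP_EXECUTION SE " +
                " JOIN BATCH_JOB_EXECUTION JE ON SE.JOB_EXECUTION_ID = JE.JOB_EXECUTION_ID " +
                " WHERE JE.CREATE_TIME < ?)", cutoff);
            jdbcTemplate.update(
                "DELETE FROM BATCH_STEP_EXECUTION WHERE JOB_EXECUTION_ID IN " +
                "(SELECT JOB_EXECUTION_ID FROM BATCH_JOB_EXECUTION WHERE CREATE_TIME < ?)", cutoff);
            jdbcTemplate.update(
                "DELETE FROM BATCH_JOB_EXECUTION_CONTEXT WHERE JOB_EXECUTION_ID IN " +
                "(SELECT JOB_EXECUTION_ID FROM BATCH_JOB_EXECUTION WHERE CREATE_TIME < ?)", cutoff);
            jdbcTemplate.update(
                "DELETE FROM BATCH_JOB_EXECUTION_PARAMS WHERE JOB_EXECUTION_ID IN " +
                "(SELECT JOB_EXECUTION_ID FROM BATCH_JOB_EXECUTION WHERE CREATE_TIME < ?)", cutoff);
            jdbcTemplate.update("DELETE FROM BATCH_JOB_EXECUTION WHERE CREATE_TIME < ?", cutoff);
            jdbcTemplate.update(
                "DELETE FROM BATCH_JOB_INSTANCE WHERE JOB_INSTANCE_ID NOT IN " +
                "(SELECT JOB_INSTANCE_ID FROM BATCH_JOB_EXECUTION)");
            return RepeatStatus.FINISHED;
        }
    }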
Related
I have a Spring Batch application that uses the master-slave pattern.
Currently, this batch job runs one at a time, for a single client/customer.
I want to redesign the job so that, still using the master-slave pattern, it supports parallel jobs for the respective clients/customers.
Please suggest a path forward.
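One common direction, sketched here under assumptions about your bean names (clientBatchJob is a placeholder for your existing job), is to launch the same Job concurrently with a distinct clientId job parameter per client, so each client run becomes its own JobInstance:

    import java.util.Date;

    import org.springframework.batch.core.Job;
    import org.springframework.batch.core.JobParameters;
    import org.springframework.batch.core.JobParametersBuilder;
    import org.springframework.batch.core.launch.JobLauncher;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.stereotype.Service;

    @Service
    public class ClientJobLauncher {

        @Autowired
        private JobLauncher jobLauncher;   // needs an async TaskExecutor for true parallelism

        @Autowired
        private Job clientBatchJob;        // hypothetical name for your existing master-slave job

        /**
         * Launches one JobInstance per client; the clientId parameter makes each
         * instance unique, so several clients can run at the same time.
         */
        public void launchForClient(String clientId) throws Exception {
            JobParameters params = new JobParametersBuilder()
                    .addString("clientId", clientId)
                    .addDate("requestedAt", new Date())   // keeps re-runs for the same client unique
                    .toJobParameters();
            jobLauncher.run(clientBatchJob, params);
        }
    }

For the launches to actually overlap, the JobLauncher (SimpleJobLauncher, or TaskExecutorJobLauncher in newer versions) must be configured with an asynchronous TaskExecutor such as SimpleAsyncTaskExecutor; otherwise run() blocks until each job finishes.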
I have a Spring Batch application on Spring Boot that processes 60-70 million records. The application was built using Spring Batch partitioning. I need to read customer IDs from a file, read some reference data from Redis and an Oracle DB, apply some business logic, and write to a PostgreSQL DB.
The application works as expected and all of our system testing is complete. But when we moved to PT testing, we saw a few slave steps hang at random points (not consistently the same file or line number). The version column in the BATCH_STEP_EXECUTION table keeps incrementing, but no data is processed. I have tried between 50 and 1,000 partitions with 5-25 million records. Only with 1 million records and 36 partitions was I able to get a COMPLETED status for all slaves and the partition step. What might be the reason some slave steps hang? If I re-run the job, the issue is not consistent: it is not always the same file (slave) that hangs, nor the same number of slaves.
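For context, a local-partitioning master step in this kind of setup usually looks roughly like the sketch below (Spring Batch 4.x style; the bean names, grid size, and pool sizes are placeholders, not a diagnosis of the hang):

    import org.springframework.batch.core.Step;
    import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
    import org.springframework.batch.core.partition.support.Partitioner;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.core.task.TaskExecutor;
    import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

    @Configuration
    public class PartitionedJobConfig {

        private final StepBuilderFactory stepBuilderFactory;

        public PartitionedJobConfig(StepBuilderFactory stepBuilderFactory) {
            this.stepBuilderFactory = stepBuilderFactory;
        }

        @Bean
        public TaskExecutor partitionTaskExecutor() {
            ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
            // Pool size should be chosen deliberately relative to gridSize and to the
            // connection pools for Oracle, Redis and PostgreSQL.
            executor.setCorePoolSize(8);
            executor.setMaxPoolSize(8);
            executor.setThreadNamePrefix("partition-");
            executor.initialize();
            return executor;
        }

        @Bean
        public Step masterStep(Partitioner customerIdFilePartitioner, Step workerStep) {
            return stepBuilderFactory.get("masterStep")
                    .partitioner("workerStep", customerIdFilePartitioner) // splits the customer-id file
                    .step(workerStep)   // chunk step: read ids, enrich from Redis/Oracle, write to PostgreSQL
                    .gridSize(36)
                    .taskExecutor(partitionTaskExecutor())
                    .build();
        }
    }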
I am currently working on a Spring Boot and Spring Batch application to read 200,000 records from a database, process them, and generate XML output.
I wrote a single-threaded Spring Batch program which uses JdbcPagingItemReader to read batches of 10K records from the database and StaxEventItemWriter to generate the output. The total process takes 30 minutes. I want to enhance this program by using Spring Batch local partitioning. Could anyone share Java configuration code for Spring Batch partitioning that splits the processing across multiple threads and multiple files? I tried a multi-threaded Java configuration, but StaxEventItemWriter is single-threaded, so it didn't work. The only way I see is partitioning.
Appreciate the help.
You are correct that partitioning is the way to approach this problem. I don't have a JDBC-to-XML example of how to configure a partitioned batch job, but I do have one that is CSV-to-JDBC, in which you should be able to just replace the ItemReader and ItemWriter with the ones you need (JdbcPagingItemReader and StaxEventItemWriter, respectively). That example actually uses Spring Cloud Task to launch the workers as remote processes, but if you replace the partitionHandler with the TaskExecutorPartitionHandler (instead of the DeployerPartitionHandler as configured), it would execute the partitions internally as threads.
https://github.com/mminella/S3JDBC
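As a rough illustration of that reader/writer swap (a sketch only, not taken from the S3JDBC repo: Customer is assumed to be a JAXB-annotated domain class, and the query, bean names, and stepExecutionContext keys such as minId/maxId/partitionNumber are placeholders set by your Partitioner):

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;

    import javax.sql.DataSource;

    import org.springframework.batch.core.configuration.annotation.StepScope;
    import org.springframework.batch.item.database.JdbcPagingItemReader;
    import org.springframework.batch.item.database.Order;
    import org.springframework.batch.item.database.builder.JdbcPagingItemReaderBuilder;
    import org.springframework.batch.item.xml.StaxEventItemWriter;
    import org.springframework.batch.item.xml.builder.StaxEventItemWriterBuilder;
    import org.springframework.beans.factory.annotation.Value;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.core.io.FileSystemResource;
    import org.springframework.jdbc.core.BeanPropertyRowMapper;
    import org.springframework.oxm.jaxb.Jaxb2Marshaller;

    @Configuration
    public class WorkerStepConfig {

        // Step-scoped so each partition gets its own reader bound to its id range.
        @Bean
        @StepScope
        public JdbcPagingItemReader<Customer> pagedCustomerReader(
                DataSource dataSource,
                @Value("#{stepExecutionContext['minId']}") Long minId,
                @Value("#{stepExecutionContext['maxId']}") Long maxId) {

            Map<String, Object> params = new HashMap<>();
            params.put("minId", minId);
            params.put("maxId", maxId);

            return new JdbcPagingItemReaderBuilder<Customer>()
                    .name("pagedCustomerReader")
                    .dataSource(dataSource)
                    .selectClause("SELECT ID, NAME, EMAIL")
                    .fromClause("FROM CUSTOMER")
                    .whereClause("WHERE ID BETWEEN :minId AND :maxId")
                    .sortKeys(Collections.singletonMap("ID", Order.ASCENDING))
                    .parameterValues(params)
                    .rowMapper(new BeanPropertyRowMapper<>(Customer.class))
                    .pageSize(10_000)
                    .build();
        }

        // Step-scoped so each partition writes its own XML file.
        @Bean
        @StepScope
        public StaxEventItemWriter<Customer> partitionXmlWriter(
                @Value("#{stepExecutionContext['partitionNumber']}") Integer partitionNumber) {

            Jaxb2Marshaller marshaller = new Jaxb2Marshaller();
            marshaller.setClassesToBeBound(Customer.class);

            return new StaxEventItemWriterBuilder<Customer>()
                    .name("partitionXmlWriter")
                    .resource(new FileSystemResource("output/customers-" + partitionNumber + ".xml"))
                    .marshaller(marshaller)
                    .rootTagName("customers")
                    .build();
        }
    }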
I would like to use the Hypersonic (HSQLDB) in-memory DB to persist jobs, as I need to run the same job multiple times on separate threads. The SimpleJobRepository cannot be used, as I run into an optimistic locking issue.
Does anyone have a sample Java configuration file showing how to wire Hypersonic into a Spring Batch job using annotations?
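A minimal annotation-based sketch, assuming @EnableBatchProcessing is allowed to build the JobRepository from the DataSource (the schema script path is the one shipped inside spring-batch-core; everything else is a placeholder):

    import javax.sql.DataSource;

    import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.jdbc.datasource.DataSourceTransactionManager;
    import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseBuilder;
    import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseType;
    import org.springframework.transaction.PlatformTransactionManager;

    @Configuration
    @EnableBatchProcessing
    public class HsqlBatchConfig {

        /**
         * Embedded HSQLDB (Hypersonic) initialized with the Spring Batch metadata
         * schema that ships inside spring-batch-core. The JobRepository is then
         * built on top of this DataSource, so concurrent job launches share one
         * in-memory, JDBC-backed repository.
         */
        @Bean
        public DataSource dataSource() {
            return new EmbeddedDatabaseBuilder()
                    .setType(EmbeddedDatabaseType.HSQL)
                    .addScript("classpath:org/springframework/batch/core/schema-hsqldb.sql")
                    .build();
        }

        @Bean
        public PlatformTransactionManager transactionManager(DataSource dataSource) {
            return new DataSourceTransactionManager(dataSource);
        }
    }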
I am running the Spring Batch job on three machines. For example, if the database has 30 records, the batch job on each machine has to pick up a unique set of 10 records and process them.
I read about partitioning and parallel processing and am a bit confused: which one is suitable?
Appreciate your help.
What you are describing is partitioning. Partitioning is when the input is broken up into partitions and each partition is processed in parallel. Spring Batch offers two different ways to execute partitioning: one is local, using threads (via the TaskExecutorPartitionHandler). The other distributes the partitions via messages so they can be executed either locally or remotely via the MessageChannelPartitionHandler found in Spring Batch Admin's spring-batch-integration project. You can learn more about remote partitioning in my talk on multi-JVM batch processing here: http://www.youtube.com/watch?v=CYTj5YT7CZU
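To make the partitioning idea concrete, here is a minimal sketch of a Partitioner that splits a primary-key range into equal slices (the table and column names are placeholders). Each resulting ExecutionContext drives one worker step, executed locally via TaskExecutorPartitionHandler or handed out to the three machines via MessageChannelPartitionHandler.

    import java.util.HashMap;
    import java.util.Map;

    import org.springframework.batch.core.partition.support.Partitioner;
    import org.springframework.batch.item.ExecutionContext;
    import org.springframework.jdbc.core.JdbcTemplate;

    /**
     * Splits the primary-key range of a table into gridSize partitions.
     * Each worker step reads only the rows whose ids fall inside its slice.
     */
    public class IdRangePartitioner implements Partitioner {

        private final JdbcTemplate jdbcTemplate;

        public IdRangePartitioner(JdbcTemplate jdbcTemplate) {
            this.jdbcTemplate = jdbcTemplate;
        }

        @Override
        public Map<String, ExecutionContext> partition(int gridSize) {
            long min = jdbcTemplate.queryForObject("SELECT MIN(ID) FROM RECORDS", Long.class);
            long max = jdbcTemplate.queryForObject("SELECT MAX(ID) FROM RECORDS", Long.class);
            long sliceSize = (max - min) / gridSize + 1;

            Map<String, ExecutionContext> partitions = new HashMap<>();
            long start = min;
            for (int i = 0; i < gridSize; i++) {
                ExecutionContext context = new ExecutionContext();
                context.putLong("minId", start);
                context.putLong("maxId", Math.min(start + sliceSize - 1, max));
                partitions.put("partition" + i, context);
                start += sliceSize;
            }
            return partitions;
        }
    }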