In Spring Batch (see https://docs.spring.io/spring-batch/3.0.x/reference/html/metaDataSchema.html) there are six tables:
BATCH_JOB_EXECUTION
BATCH_JOB_EXECUTION_CONTEXT
BATCH_JOB_EXECUTION_PARAMS
BATCH_JOB_INSTANCE
BATCH_STEP_EXECUTION
BATCH_STEP_EXECUTION_CONTEXT
Are there no additional tables for things like the JobRegistry, JobRepository, JobExplorer, etc.?
Thanks
JobRegistry, JobExplorer, and JobRepository are interfaces you can use to query the Spring Batch tables; they are not actual tables themselves.
See https://docs.spring.io/spring-batch/3.0.x/reference/html/configureJob.html, section 4.6.1, "Querying the Repository".
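As a hedged illustration of what that looks like in code (the job name "myJob" and the wiring are placeholders), the JobExplorer interface reads the metadata tables for you, so you rarely need raw SQL:

```java
// Hedged sketch: reading Spring Batch metadata through the JobExplorer
// interface instead of querying the BATCH_* tables directly.
// The job name "myJob" is a placeholder.
@Autowired
private JobExplorer jobExplorer;

public void printRecentExecutions() {
    // Fetch the most recent instance of the job...
    List<JobInstance> instances = jobExplorer.getJobInstances("myJob", 0, 1);
    for (JobInstance instance : instances) {
        // ...and every execution recorded for it in BATCH_JOB_EXECUTION.
        for (JobExecution execution : jobExplorer.getJobExecutions(instance)) {
            System.out.println(execution.getId() + " -> " + execution.getStatus());
        }
    }
}
```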
Related
Currently a Spring Batch job runs every 20 seconds, and 3 jobs run concurrently, so the Spring Batch metadata tables below grow abruptly in size. Is there a way to disable this? If not, how can we clean up these tables from time to time?
BATCH_JOB_INSTANCE,
BATCH_JOB_EXECUTION,
BATCH_JOB_EXECUTION_PARAMS,
and BATCH_STEP_EXECUTION
The RemoveSpringBatchHistoryTasklet can be used in a Spring Batch job that you schedule to run periodically to purge the Spring Batch metadata tables.
See https://github.com/arey/spring-batch-toolkit
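If pulling in the toolkit is not an option, here is a hedged sketch of a home-grown purge tasklet. It assumes the default "BATCH_" table prefix, the 30-day retention is an illustrative value, and the deletion order matters because of the foreign keys between the metadata tables:

```java
// Hedged sketch of a cleanup tasklet (an alternative to the toolkit's
// RemoveSpringBatchHistoryTasklet). Deletes executions older than a cutoff,
// children first, then removes job instances left with no executions.
public class PurgeBatchMetadataTasklet implements Tasklet {

    private final JdbcTemplate jdbcTemplate;

    public PurgeBatchMetadataTasklet(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
        // Illustrative retention: keep the last 30 days of history.
        Date cutoff = new Date(System.currentTimeMillis() - TimeUnit.DAYS.toMillis(30));
        String oldExecutions =
            "(SELECT JOB_EXECUTION_ID FROM BATCH_JOB_EXECUTION WHERE CREATE_TIME < ?)";

        jdbcTemplate.update("DELETE FROM BATCH_STEP_EXECUTION_CONTEXT WHERE STEP_EXECUTION_ID IN "
            + "(SELECT STEP_EXECUTION_ID FROM BATCH_STEP_EXECUTION WHERE JOB_EXECUTION_ID IN "
            + oldExecutions + ")", cutoff);
        jdbcTemplate.update("DELETE FROM BATCH_STEP_EXECUTION WHERE JOB_EXECUTION_ID IN "
            + oldExecutions, cutoff);
        jdbcTemplate.update("DELETE FROM BATCH_JOB_EXECUTION_CONTEXT WHERE JOB_EXECUTION_ID IN "
            + oldExecutions, cutoff);
        jdbcTemplate.update("DELETE FROM BATCH_JOB_EXECUTION_PARAMS WHERE JOB_EXECUTION_ID IN "
            + oldExecutions, cutoff);
        jdbcTemplate.update("DELETE FROM BATCH_JOB_EXECUTION WHERE CREATE_TIME < ?", cutoff);
        jdbcTemplate.update("DELETE FROM BATCH_JOB_INSTANCE WHERE JOB_INSTANCE_ID NOT IN "
            + "(SELECT JOB_INSTANCE_ID FROM BATCH_JOB_EXECUTION)");
        return RepeatStatus.FINISHED;
    }
}
```

Wire this tasklet into its own step in a housekeeping job and trigger it from your scheduler.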
I am currently working on a Spring Boot and Spring Batch application that reads 200,000 records from a database, processes them, and generates XML output.
I wrote a single-threaded Spring Batch program that uses JdbcPagingItemReader to read batches of 10K records from the database and StaxEventItemWriter to generate the output. The whole process takes 30 minutes. I want to improve this program using Spring Batch local partitioning. Could anyone share Java configuration code for Spring Batch partitioning that splits the processing across multiple threads and multiple files? I tried a multi-threaded Java configuration, but StaxEventItemWriter is not thread-safe, so it didn't work. The only way I see is partitioning.
Appreciate the help.
You are correct that partitioning is the way to approach this problem. I don't have a JDBC-to-XML example of how to configure a partitioned batch job, but I do have a CSV-to-JDBC one in which you should be able to just replace the ItemReader and ItemWriter with the ones you need (JdbcPagingItemReader and StaxEventItemWriter, respectively). That example actually uses Spring Cloud Task to launch the workers as remote processes, but if you replace the partitionHandler with the TaskExecutorPartitionHandler (instead of the DeployerPartitionHandler as configured), it will execute the partitions internally as threads.
https://github.com/mminella/S3JDBC
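Not the linked example itself, but a hedged sketch of what the local-partitioning wiring might look like (Spring Batch 4-style Java config; bean names, the id range, and the grid size are all illustrative):

```java
// Hedged sketch of local partitioning: a column-range Partitioner plus a
// TaskExecutorPartitionHandler, so each partition runs on its own thread.
@Bean
public Partitioner partitioner() {
    // Split the 200,000-row id range into one sub-range per partition.
    return gridSize -> {
        Map<String, ExecutionContext> partitions = new HashMap<>();
        long rangeSize = 200_000L / gridSize;
        for (int i = 0; i < gridSize; i++) {
            ExecutionContext context = new ExecutionContext();
            context.putLong("minId", i * rangeSize + 1);
            context.putLong("maxId", (i + 1) * rangeSize);
            partitions.put("partition" + i, context);
        }
        return partitions;
    };
}

@Bean
public PartitionHandler partitionHandler(Step workerStep) {
    // Runs each partition as a local thread instead of a remote worker.
    TaskExecutorPartitionHandler handler = new TaskExecutorPartitionHandler();
    handler.setTaskExecutor(new SimpleAsyncTaskExecutor());
    handler.setStep(workerStep);
    handler.setGridSize(4); // 4 threads, 4 output files
    return handler;
}

@Bean
public Step masterStep(StepBuilderFactory steps, Partitioner partitioner,
                       PartitionHandler partitionHandler) {
    return steps.get("masterStep")
            .partitioner("workerStep", partitioner)
            .partitionHandler(partitionHandler)
            .build();
}
```

The worker step's JdbcPagingItemReader and StaxEventItemWriter would then be @StepScope beans that pull minId/maxId from `#{stepExecutionContext['minId']}` / `#{stepExecutionContext['maxId']}`, so each thread queries its own range and writes its own XML file.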
In my Spring Batch project I read data from a DB and write data to a DB. When I checked Dynatrace after the job completed, I could not see the DB select and insert queries in Dynatrace.
Is any configuration needed in Spring Batch to get these queries into Dynatrace?
Thanks!
In the job I read from a file and store something in a database.
I would like to have many running jars of the batch job in different processes and partition the data from the file among the running instances.
I would also like to be able to keep adding files to be processed and also distribute the reads from those.
I read that Spring XD might be a good fit, but I can't find good tutorials on it.
Yes, I am also new to Spring Batch and XD.
The first thing to understand is how to remotely partition batch jobs. See the Spring Batch Integration section of the batch documentation and its support for remote partitioning, which builds on basic batch partitioning.
Spring XD provides out-of-the-box support for single-step partitioned work-loads.
You just have to import singlestep-partition-support.xml and provide partitioner and tasklet beans. See the XD Documentation for an example.
I've used Spring Batch with MySQL before, and the availability of Spring Batch Admin makes starting, stopping, and restarting jobs a lot easier. But my current company is considering moving from Derby to MongoDB for the obvious NoSQL benefits, and it also wants to move its existing messy batch applications to the Spring Batch framework. It would also like to use Spring Batch Admin to manage the jobs.
Question:
What are the trade-offs that we will have to make for using Spring Batch with MongoDB versus Spring Batch with MySQL?
After doing a bit of research, I've gathered the following trade-offs for using MongoDB with Spring Batch:
Since MongoDB does not support transactions, Spring Batch Admin will not work, because it requires the metadata schema, which is not available for MongoDB.
We will not be able to stop, start, and restart jobs.
If a step's writer tries to commit 20 documents and the commit for 1 document fails, the other 19 documents will not be rolled back automatically and will have to be handled by the application.
Can you please tell me if I am right about the above, and whether there are any other trade-offs I have not mentioned?