I've used Spring Batch with MySQL before, and Spring Batch Admin makes starting, stopping, and restarting jobs a lot easier. My current company is considering moving from a Derby database to MongoDB for the usual NoSQL benefits, and also wants to migrate its existing, messy batch applications to the Spring Batch framework. They would also like to use Spring Batch Admin to manage the jobs.
Question:
What trade-offs will we have to make when using Spring Batch with MongoDB rather than Spring Batch with MySQL?
After a bit of research, I've gathered the following trade-offs of using MongoDB with Spring Batch:
MongoDB does not support transactions, and Spring Batch Admin will not work because it requires the meta-data schema, which is not available for MongoDB.
We will not be able to stop, start, and restart jobs.
If a step's writer tries to commit 20 documents and the write fails for one of them, the other 19 will not be rolled back automatically; the application will have to handle that itself.
Can you please tell me whether the above is right, and whether there are any other trade-offs I have not mentioned?
Related
I am using Spring Batch and my database is Couchbase. Is there any way to read documents from Couchbase in batch or bulk mode?
Have a look at the Spring Batch Extensions project:
https://github.com/spring-projects/spring-batch-extensions/pull/5/commits
I need to read data from a MongoDB collection periodically and publish it to a Kafka topic using Spring Boot. I have created a collection in MongoDB and inserted a few records. I'm very new to Spring Batch and scheduling. Can you please suggest a way to achieve this?
Thanks in advance.
What you are describing is a better fit for Spring Integration: https://spring.io/projects/spring-integration#overview
You configure a MongoDbMessageSource with a poller to read the collection periodically.
Then you add a service activator based on KafkaProducerMessageHandler to dump the data into a Kafka topic.
See more in docs:
https://docs.spring.io/spring-integration/docs/5.3.2.RELEASE/reference/html/mongodb.html#mongodb
https://docs.spring.io/spring-integration/docs/5.4.0-M3/reference/html/kafka.html#kafka
Not sure though how to do that with Spring Batch...
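The poller-plus-handler setup described above could be sketched with the Spring Integration Java DSL roughly as follows. This is only a sketch under assumptions: it targets Spring Integration 5.x with the MongoDB and Kafka modules on the classpath, a MongoOperations bean and a KafkaTemplate<String, String> bean are already configured, and the collection name "records", the topic name "records-topic", and the 60-second delay are made up for illustration.

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.mongodb.core.MongoOperations;
import org.springframework.expression.common.LiteralExpression;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.dsl.IntegrationFlows;
import org.springframework.integration.dsl.Pollers;
import org.springframework.integration.dsl.Transformers;
import org.springframework.integration.kafka.outbound.KafkaProducerMessageHandler;
import org.springframework.integration.mongodb.inbound.MongoDbMessageSource;
import org.springframework.kafka.core.KafkaTemplate;

@Configuration
public class MongoToKafkaFlowConfig {

    // Polls MongoDB; "{}" matches every document, so each poll re-reads the
    // whole collection. A real job would likely narrow the query.
    @Bean
    public MongoDbMessageSource mongoSource(MongoOperations mongoTemplate) {
        MongoDbMessageSource source =
                new MongoDbMessageSource(mongoTemplate, new LiteralExpression("{}"));
        // "records" is a hypothetical collection name.
        source.setCollectionNameExpression(new LiteralExpression("records"));
        return source;
    }

    // Publishes each message payload to a fixed Kafka topic.
    @Bean
    public KafkaProducerMessageHandler<String, String> kafkaHandler(
            KafkaTemplate<String, String> kafkaTemplate) {
        KafkaProducerMessageHandler<String, String> handler =
                new KafkaProducerMessageHandler<>(kafkaTemplate);
        // "records-topic" is a hypothetical topic name.
        handler.setTopicExpression(new LiteralExpression("records-topic"));
        return handler;
    }

    @Bean
    public IntegrationFlow mongoToKafkaFlow(
            MongoDbMessageSource mongoSource,
            KafkaProducerMessageHandler<String, String> kafkaHandler) {
        return IntegrationFlows
                // Poll 60 seconds after the previous poll finished (assumed interval).
                .from(mongoSource, c -> c.poller(Pollers.fixedDelay(60_000)))
                // The source emits a list of documents; send them one by one.
                .split()
                // Assumes the documents map cleanly to JSON strings.
                .transform(Transformers.toJson())
                .handle(kafkaHandler)
                .get();
    }
}
```

How the documents should be serialized, and how to avoid publishing the same records on every poll, depends on the data and is left open here.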
I went through "Introducing Spring Cloud Task", but the following points are still unclear to me.
I'm using Spring Batch.
What is the use of Spring Cloud Task when we already have the metadata provided by Spring Batch?
We're planning to use Spring Cloud Data Flow (SCDF) to monitor our Spring Batch jobs. All the batch jobs can be imported into SCDF as tasks and scheduled there, but I don't see support for MongoDB. I hope MySQL works well.
What is the difference between Spring Cloud Task and Spring Batch?
Spring Cloud Task has a broader scope than Spring Batch. It is designed for any short-lived task, including but not limited to (Spring) Batch jobs. A short-lived task could be a Java process, a shell script, a Docker container, etc. Spring Cloud Task has its own meta-data tables to track the progress, status, and statistics of tasks.
In the context of Spring Batch, Spring Cloud Task provides a number of additional features:
Batch informational messages: ability to emit messages based on Spring Batch listeners events. Those messages can be consumed by streaming apps and make it possible to bridge tasks and streaming apps.
DeployerPartitionHandler: an additional partition handler, suited to cloud environments, that dynamically deploys workers in a remote partitioning setup.
I am currently working on a Spring Boot and Spring Batch application that reads 200,000 records from a database, processes them, and generates XML output.
I wrote a single-threaded Spring Batch program that uses JdbcPagingItemReader to read batches of 10K records from the database and StaxEventItemWriter to generate the output. The whole process takes 30 minutes. I would like to speed it up using Spring Batch local partitioning. Could anyone share Java configuration for partitioning that splits the processing across multiple threads and multiple files? I tried a multi-threaded Java configuration, but StaxEventItemWriter is single-threaded, so it didn't work. The only way I see is partitioning.
Appreciate help.
You are correct that partitioning is the way to approach this problem. I don't have a JDBC-to-XML example of a partitioned batch job, but I do have a CSV-to-JDBC one in which you should be able to simply replace the ItemReader and ItemWriter with the ones you need (JdbcPagingItemReader and StaxEventItemWriter, respectively). The example actually uses Spring Cloud Task to launch the workers as remote processes, but if you replace the configured DeployerPartitionHandler with a TaskExecutorPartitionHandler, the partitions will execute internally as threads.
https://github.com/mminella/S3JDBC
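To make the partitioning idea concrete, here is a plain-Java sketch of the range-splitting logic that a Spring Batch Partitioner implementation typically contains. It is not taken from the linked repository. It assumes the records have a numeric id column that is reasonably dense, and the class and method names are hypothetical. In a real job, each range would be stored in an ExecutionContext returned from Partitioner.partition(int gridSize), each worker's JdbcPagingItemReader would restrict its query to that range, and each worker would write its own XML file.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: splits an inclusive id range into contiguous chunks,
// one per worker partition.
class IdRangePartitions {

    // Inclusive id range handled by one worker.
    static final class Range {
        public final long minId;
        public final long maxId;

        Range(long minId, long maxId) {
            this.minId = minId;
            this.maxId = maxId;
        }

        @Override
        public String toString() {
            return "[" + minId + ", " + maxId + "]";
        }
    }

    // Splits [minId, maxId] into at most gridSize non-overlapping ranges
    // that together cover every id exactly once.
    static Map<String, Range> split(long minId, long maxId, int gridSize) {
        if (gridSize < 1 || maxId < minId) {
            throw new IllegalArgumentException("invalid range or grid size");
        }
        long total = maxId - minId + 1;
        long size = (total + gridSize - 1) / gridSize; // ceiling division
        Map<String, Range> partitions = new LinkedHashMap<>();
        long start = minId;
        int index = 0;
        while (start <= maxId) {
            long end = Math.min(start + size - 1, maxId);
            partitions.put("partition" + index, new Range(start, end));
            start = end + 1;
            index++;
        }
        return partitions;
    }

    public static void main(String[] args) {
        // 200,000 records split across 4 workers, as in the question.
        split(1, 200_000, 4)
                .forEach((name, range) -> System.out.println(name + " -> " + range));
    }
}
```

Splitting by id range keeps the partitions independent, which is what lets every worker use its own non-thread-safe writer. If the ids are sparse or skewed, the partitions will be uneven and a different split key may work better.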
In the job, I read from a file and store something in a database.
I would like to run many jars of the batch job in different processes and partition the data from the file among the running instances.
I would also like to be able to keep adding files to be processed and distribute the reads from those as well.
I read that Spring XD might be a good fit, but I can't find good tutorials on it.
And yes, I am new to both Spring Batch and Spring XD.
The first thing to understand is how to remotely partition batch jobs. See the batch documentation for Spring Batch Integration and its support for remote partitioning, based on basic batch partitioning.
Spring XD provides out-of-the-box support for single-step partitioned work-loads.
You just have to import singlestep-partition-support.xml and provide partitioner and tasklet beans. See the XD Documentation for an example.