Can Spring Batch use a NoSQL database to store batch metadata?

Can Spring Batch use a NoSQL database (e.g. Firestore, MongoDB, etc.) to store batch metadata? If so, can you share a sample?

It can, but it does not provide an implementation yet. We have a feature request for that here: https://github.com/spring-projects/spring-batch/issues/877.
That said, it is a matter of implementing a single interface: JobRepository. Alternatively, you can implement the four DAOs required by Spring Batch (JobInstanceDao, JobExecutionDao, StepExecutionDao, ExecutionContextDao) using your NoSQL database and use them with the provided SimpleJobRepository.
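To make the DAO route a little more concrete, here is a dependency-free sketch of the contract a document-store JobInstanceDao has to honor: a job instance is identified by its job name plus its identifying parameters, so creating the same one twice must be rejected. The HashMap stands in for a NoSQL collection with a unique index, the sorted parameter string is a simplified stand-in for Spring Batch's real job key, and all names here (JobInstanceDoc, InMemoryDocJobInstanceDao, jobKey) are hypothetical. A real implementation would implement Spring Batch's JobInstanceDao interface against your database driver and be passed, together with the other three DAOs, to SimpleJobRepository.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical document shape for a "job_instances" collection.
final class JobInstanceDoc {
    final long id;
    final String jobName;
    final String jobKey;

    JobInstanceDoc(long id, String jobName, String jobKey) {
        this.id = id;
        this.jobName = jobName;
        this.jobKey = jobKey;
    }
}

// Hypothetical sketch: the HashMap plays the role of a NoSQL collection with a
// unique index on (jobName, jobKey). This is NOT Spring Batch's JobInstanceDao API.
final class InMemoryDocJobInstanceDao {
    private final Map<String, JobInstanceDoc> collection = new HashMap<>();
    private final AtomicLong idSequence = new AtomicLong();

    // Simplified stand-in for Spring Batch's job key: sorting makes the key
    // independent of the order in which parameters were added.
    static String jobKey(Map<String, String> identifyingParams) {
        return new TreeMap<>(identifyingParams).toString();
    }

    private static String documentKey(String jobName, Map<String, String> params) {
        return jobName + "|" + jobKey(params);
    }

    synchronized JobInstanceDoc createJobInstance(String jobName, Map<String, String> params) {
        String key = documentKey(jobName, params);
        if (collection.containsKey(key)) {
            // Same job name + identifying parameters means the same JobInstance
            throw new IllegalStateException("JobInstance already exists: " + key);
        }
        JobInstanceDoc doc = new JobInstanceDoc(idSequence.incrementAndGet(), jobName, jobKey(params));
        collection.put(key, doc);
        return doc;
    }

    synchronized JobInstanceDoc getJobInstance(String jobName, Map<String, String> params) {
        return collection.get(documentKey(jobName, params)); // null when not found
    }

    synchronized int getJobInstanceCount(String jobName) {
        int count = 0;
        for (JobInstanceDoc doc : collection.values()) {
            if (doc.jobName.equals(jobName)) {
                count++;
            }
        }
        return count;
    }
}
```

The other three DAOs follow the same idea: each maps Spring Batch's domain objects (JobExecution, StepExecution, ExecutionContext) to documents and back, and SimpleJobRepository keeps the orchestration logic unchanged.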

Related

Can persistent databases and in-memory database work together?

My requirement is to use two different persistent database sources and serve their data to a frontend. To make API responses faster, can I use an in-memory database like H2 or GemFire to store data that is known to be frequently accessed (for example, loaded by a cron job at a set time), and go to the databases for all other calls? The challenge for me is transferring the data from the persistent store to the in-memory one, as Spring needs two copies of the same POJO with different annotations (e.g. @Document for Mongo, @Entity for H2 or GemFire). Right now, it does not make sense to manually go through each record of an array received from Mongo and save it in the in-memory store.
You can utilize the Spring Framework's Cache Abstraction and different forms of caching patterns when using Apache Geode (or alternatively, VMware Tanzu GemFire). See a more general description about caching starting here.
NOTE: VMware Tanzu GemFire, up to 9.15, is built on Apache Geode, and the two are virtually interchangeable by swapping the bits.
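The simplest pattern behind Spring's Cache Abstraction is look-aside caching: when a method annotated with @Cacheable is called, the cache is checked first, the method (and therefore the database) is only invoked on a miss, and the result is stored for the next call. The plain-Java sketch below mimics that behavior with a ConcurrentHashMap so it runs without Spring or Geode; LookAsideCache and its loader function are hypothetical names, not the framework's API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical plain-Java analog of look-aside caching as @Cacheable applies it.
final class LookAsideCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    // Stands in for the (slow) call to the persistent database
    private final Function<K, V> loadFromDatabase;

    LookAsideCache(Function<K, V> loadFromDatabase) {
        this.loadFromDatabase = loadFromDatabase;
    }

    V get(K key) {
        // The database is consulted only when the key is not cached yet
        return cache.computeIfAbsent(key, loadFromDatabase);
    }

    void evict(K key) {
        // Analogous to @CacheEvict: the next get() goes back to the database
        cache.remove(key);
    }
}
```

In a Spring application you would not write this class yourself: you would enable caching with @EnableCaching, annotate the data-access method with @Cacheable, and configure Apache Geode (or GemFire) as the caching provider, so no second set of annotated POJOs is needed.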
I also provide many Samples of the different caching patterns in action, Guide and Source included, which are referenced in the relevant section of the reference documentation.
Finally, in the Spring Boot for Apache Geode project, I also have 2 test classes testing the Inline Caching Pattern, which you may refer to:
The first test class uses Apache Geode as a caching provider in Spring's Cache Abstraction to cache data from a database, HSQLDB (predecessor of H2) in this case.
The second test class is similar in that Apache Geode caches data from Apache Cassandra.
The choice of primary backend data store used with Inline Caching does not matter, since Inline Caching uses the Spring Data Repository abstraction to interface with any data store supported by that abstraction (referred to as "modules" in the Spring Data portfolio; see here).
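In Inline Caching, the application talks only to the cache, and the cache itself reads through to the backing store on a miss and writes through to it on updates (in Apache Geode this is done with a CacheLoader and a CacheWriter). The plain-Java sketch below models that flow; RecordStore is a hypothetical stand-in for a Spring Data repository (loosely mirroring findById/save), and InlineCache and MapRecordStore are illustrative names, not classes from Spring Boot for Apache Geode.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the System of Record behind a repository-like interface.
interface RecordStore<K, V> {
    Optional<V> findById(K id);
    void save(K id, V value);
}

// The application only calls get/put; the cache handles the backing store.
final class InlineCache<K, V> {
    private final Map<K, V> region = new ConcurrentHashMap<>();
    private final RecordStore<K, V> store;

    InlineCache(RecordStore<K, V> store) {
        this.store = store;
    }

    Optional<V> get(K id) {
        V cached = region.get(id);
        if (cached != null) {
            return Optional.of(cached);
        }
        Optional<V> loaded = store.findById(id);   // read-through (Geode: CacheLoader)
        loaded.ifPresent(v -> region.put(id, v));
        return loaded;
    }

    void put(K id, V value) {
        store.save(id, value);                      // write-through (Geode: CacheWriter)
        region.put(id, value);
    }
}

// Trivial in-memory "database" that counts reads, for illustration only.
final class MapRecordStore<K, V> implements RecordStore<K, V> {
    final Map<K, V> rows = new HashMap<>();
    int reads = 0;

    @Override public Optional<V> findById(K id) {
        reads++;
        return Optional.ofNullable(rows.get(id));
    }

    @Override public void save(K id, V value) {
        rows.put(id, value);
    }
}
```

Swapping MapRecordStore for a different RecordStore implementation does not change InlineCache at all, which is the same reason the backing data store is interchangeable when a Spring Data repository sits behind the cache.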
Caching is but 1 technique to keep relevant data in-memory in order to improve response times in a Spring (Boot) application using a backing data store, such as an RDBMS, as the primary System of Record (SOR). Of course, there are other approaches, too.
Apache Geode can even be configured for durability (persistence) and redundancy. In some cases, it might even completely replace your database for some function. So, you have a lot of options.
Hopefully this will get you started and give you more ideas.

Spring batch with MongoDB and transactions

I have a Spring Batch application with two databases: one SQL DB for the Spring Batch metadata, and a MongoDB database where all the business data is stored. The relational DB still uses DataSourceTransactionManager.
However, I don't think the Mongo writes are done within an active transaction that supports rollback. Here is the excerpt from the official Spring Batch documentation on MongoItemWriter:
A ItemWriter implementation that writes to a MongoDB store using an implementation of Spring Data's MongoOperations. Since MongoDB is not a transactional store, a best effort is made to persist written data at the last moment, yet still honor job status contracts. No attempt to roll back is made if an error occurs during writing.
However, this is no longer the case: MongoDB introduced multi-document ACID transactions in version 4.0.
How do I go about adding transactions to my writes? I could use @Transactional on my service methods when I use ItemWriterAdapter, but I still don't know what to do with MongoItemWriter. What is the right configuration here? Thank you.
Regarding your setup (one SQL database for the Spring Batch metadata and a MongoDB database for the business data), I invite you to take a look at the following posts to understand the implications of this design choice:
How to java-configure separate datasources for spring batch data and business data? Should I even do it?
How does Spring Batch transaction management work?
In your case, you have a distributed transaction across two data sources:
SQL datasource for the job repository, which is managed by a DataSourceTransactionManager
MongoDB for your step (using the MongoItemWriter), which is managed by a MongoTransactionManager
If you want technical meta-data and business data to be committed/rolled back in the scope of the same distributed transaction, you need to use a JtaTransactionManager that coordinates the DataSourceTransactionManager and MongoTransactionManager. You can find some resources about the matter here: https://stackoverflow.com/a/56547839/5019386.
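JTA transaction managers typically coordinate resources with the two-phase commit (XA) protocol: in the first phase every participant is asked to prepare and vote, and only if all vote yes are they told to commit; otherwise they are all rolled back. The sketch below is a conceptual, in-memory illustration of that protocol only, not of JtaTransactionManager itself; TxParticipant, TwoPhaseCommitCoordinator and RecordingParticipant are hypothetical names.

```java
import java.util.List;

// Hypothetical participant, e.g. "job repository DB" or "MongoDB" in this scenario.
interface TxParticipant {
    boolean prepare();   // phase 1: can this resource guarantee a commit?
    void commit();
    void rollback();
}

final class TwoPhaseCommitCoordinator {
    // Returns true if the transaction committed everywhere, false if it was rolled back everywhere.
    static boolean execute(List<? extends TxParticipant> participants) {
        boolean allPrepared = true;
        for (TxParticipant p : participants) {
            if (!p.prepare()) {
                allPrepared = false;   // a single "no" vote aborts the whole transaction
                break;
            }
        }
        // Phase 2: the same decision is applied to every participant
        for (TxParticipant p : participants) {
            if (allPrepared) {
                p.commit();
            } else {
                p.rollback();
            }
        }
        return allPrepared;
    }
}

// Illustrative participant that records the outcome it was told to apply.
final class RecordingParticipant implements TxParticipant {
    private final boolean canCommit;
    String outcome = "pending";

    RecordingParticipant(boolean canCommit) {
        this.canCommit = canCommit;
    }

    @Override public boolean prepare() { return canCommit; }
    @Override public void commit() { outcome = "committed"; }
    @Override public void rollback() { outcome = "rolled back"; }
}
```

The point of the coordinator is the all-or-nothing outcome: the job repository update and the business writes either both become visible or neither does.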
BTW, there is a feature request to use MongoDB as a job repository in Spring Batch: https://github.com/spring-projects/spring-batch/issues/877. When this is implemented, you could store both business data and technical meta-data in the same datasource (so no need for a distributed transaction anymore) and you would be able to use the same MongoTransactionManager for both the job repository and your step.

Benefits of using QueryDSL vs. Spring Data?

I'm thinking about using QueryDSL in my project, where I am already using Spring Data. I am programming a microservice that includes a REST interface.
What are the main differences between Spring Data and QueryDSL? What are the big benefits of using QueryDSL instead of Spring Data?
Querydsl and Spring Data go well together. While both deal with the domain of persistence, they have very different goals.
Querydsl provides a type-safe query API.
Spring Data provides a consistent API to accessing persistent stores, inspired by the ideas of Domain Driven Design, without getting in the way of the user and how she wants to formulate queries.
Therefore, there is an extension point to combine Spring Data and Querydsl, and you can always implement non-standard queries using Querydsl if they go beyond what can easily be formulated using the built-in Spring Data repositories.
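To illustrate what "type-safe" buys you, the sketch below hand-writes a tiny metamodel in the spirit of the Q-classes Querydsl generates, plus a repository method that accepts a predicate, loosely mirroring the shape of Spring Data's QuerydslPredicateExecutor.findAll(Predicate). Everything here is hypothetical and dependency-free: Person, TypedPath, QPersonSketch and InMemoryPersonRepository are illustrative names, and the predicates are evaluated in memory, whereas real Querydsl predicates are translated into queries for the underlying store.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;
import java.util.function.Predicate;

final class Person {
    final String lastName;
    final int age;

    Person(String lastName, int age) {
        this.lastName = lastName;
        this.age = age;
    }
}

// A typed property path: eq() only accepts values of the property's own type T.
final class TypedPath<E, T> {
    private final Function<E, T> getter;

    TypedPath(Function<E, T> getter) {
        this.getter = getter;
    }

    Predicate<E> eq(T value) {
        return entity -> Objects.equals(getter.apply(entity), value);
    }
}

// Hand-written stand-in for a generated metamodel class such as QPerson.
final class QPersonSketch {
    static final TypedPath<Person, String> lastName = new TypedPath<>(p -> p.lastName);
    static final TypedPath<Person, Integer> age = new TypedPath<>(p -> p.age);

    private QPersonSketch() {}
}

// Hypothetical repository exposing a predicate-based finder.
final class InMemoryPersonRepository {
    private final List<Person> people = new ArrayList<>();

    void save(Person person) {
        people.add(person);
    }

    List<Person> findAll(Predicate<Person> predicate) {
        List<Person> result = new ArrayList<>();
        for (Person person : people) {
            if (predicate.test(person)) {
                result.add(person);
            }
        }
        return result;
    }
}
```

For example, repo.findAll(QPersonSketch.lastName.eq("Smith").and(QPersonSketch.age.eq(30))) combines two conditions, while QPersonSketch.age.eq("30") does not compile, which is the kind of mistake a type-safe query API catches before any query runs.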

Insert data using the MongoDB REST interface

I am using the MongoDB REST interface to get data from the server.
Does anyone have an idea how to insert data into a particular collection using the REST interface?
If you are using the built-in REST interface, then there is no support for inserting new documents. This is stated in the documentation:
The mongod process includes a simple REST interface, with no support for insert/update/remove operations, as a convenience – it is generally used for monitoring/alerting scripts or administrative tasks.

Database Crawler using JPA

We have a requirement for building a database crawler. The application parses the tnsnames, connects to each database and retrieves some information like version, accounts, etc. We are trying to use JPA across the other parts of the application and to persist this data into the application's database.
So far, the only approach I see is creating an EntityManagerFactory programmatically for every database. Are there any other options?
We are using Spring, are there any benefits that Spring brings to the table in this scenario?
Thanks
JPA is clearly not the right tool for this job. JPA lets you create functional entities that map a well-known database schema. Your tool doesn't know anything about the schemas and tables it will find: there could be 0 tables or 5000, with completely unknown names.
You need a much lower-level API to do what you want, like JDBC.
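As a rough illustration of the lower-level approach, the sketch below uses only java.sql: DatabaseMetaData reports the product name, version and connected user without any knowledge of the schema, and getSchemas() lists schema names (on Oracle, schemas correspond to user accounts). The aliasToUrl map is a hypothetical result of your tnsnames parsing, a single user/password for all databases is a simplifying assumption, a suitable JDBC driver is assumed to be on the classpath, and DbInfo and JdbcCrawler are illustrative names.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain value object with what the crawler collects per database.
final class DbInfo {
    final String alias, product, version, user;

    DbInfo(String alias, String product, String version, String user) {
        this.alias = alias;
        this.product = product;
        this.version = version;
        this.user = user;
    }
}

final class JdbcCrawler {
    // Works against any JDBC connection; no schema knowledge required.
    static DbInfo describe(String alias, Connection connection) throws SQLException {
        DatabaseMetaData md = connection.getMetaData();
        return new DbInfo(alias, md.getDatabaseProductName(),
                md.getDatabaseProductVersion(), md.getUserName());
    }

    // Lists schema names via the standard TABLE_SCHEM column.
    static List<String> schemas(Connection connection) throws SQLException {
        List<String> names = new ArrayList<>();
        try (ResultSet rs = connection.getMetaData().getSchemas()) {
            while (rs.next()) {
                names.add(rs.getString("TABLE_SCHEM"));
            }
        }
        return names;
    }

    // aliasToUrl: hypothetical output of parsing tnsnames, alias -> JDBC URL.
    static Map<String, DbInfo> crawl(Map<String, String> aliasToUrl, String user, String password) {
        Map<String, DbInfo> results = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : aliasToUrl.entrySet()) {
            try (Connection c = DriverManager.getConnection(entry.getValue(), user, password)) {
                results.put(entry.getKey(), describe(entry.getKey(), c));
            } catch (SQLException ex) {
                // One unreachable database should not stop the whole crawl
                System.err.println("Skipping " + entry.getKey() + ": " + ex.getMessage());
            }
        }
        return results;
    }
}
```

Because the collected DbInfo values are plain objects, persisting them into your application's own, well-known schema is exactly where JPA fits.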
You could use JPA to store the results of your crawlings in a single schema, though.