Could I use the stored persistence context if the DB temporarily shuts down? - jpa

I have to implement some kind of cache, or temporary storage, that meets the following conditions:
The cache stores 4 tables from a specific (MariaDB) database; each table has fewer than 50 columns and fewer than 1000 rows.
The cache is built as soon as the Spring web server starts. On startup, the server immediately crawls the data from the DB and stores it in the cache, to minimize direct DB querying.
The Spring web server fetches data from the cache when it receives an HTTP GET request.
On HTTP POST, DELETE, and PUT requests, the server updates the DB by first updating the data in the cache and then writing the changes through to the tables.
The Spring web server must not throw an exception when the DB suddenly shuts down and the connection is lost. It must keep handling HTTP requests from the cached data, defer any direct DB access, and synchronize the data back to the DB once the DB is restored.
I'm not familiar with JPA, but it seems that Spring JPA itself supports the first four conditions through the EntityManager and the persistence context.
However, I cannot find any information on making this context fault tolerant, nor any option that makes the whole JPA structure check whether the DB connection is alive and only flush updates after that check returns true.
Since I'm asked to use JPA as far as possible, I want to find out whether it is possible to meet the conditions above with JPA.
Thanks for any information provided.

Could I use the stored persistence context if the DB temporarily shuts down?
No, not really.
The persistence context gets flushed on every commit, which you don't want, because you want to serve queries from the cache.
Also, it doesn't cache the results of arbitrary queries; it only holds entities.
And most importantly: when a flush happens and the database is not available, you will get an exception.
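A minimal sketch of what that means in practice, assuming a hypothetical Product entity and a Spring Data ProductRepository (none of these names come from the question): the commit-time flush still needs the database, so serving stale data during an outage has to be your own application logic layered on top of JPA, for example a hand-rolled map used as a fallback.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.springframework.dao.DataAccessException;
import org.springframework.stereotype.Service;
import org.springframework.transaction.TransactionException;
import org.springframework.transaction.annotation.Transactional;

@Service
public class ProductService {

    private final ProductRepository repository;                   // assumed Spring Data JPA repository
    private final Map<Long, Product> fallbackCache = new ConcurrentHashMap<>();

    public ProductService(ProductRepository repository) {
        this.repository = repository;
    }

    @Transactional
    public Product save(Product product) {
        // The INSERT/UPDATE is flushed at commit; if the DB is down, the commit fails
        // with a runtime exception -- JPA does not buffer the change until the DB returns.
        return repository.save(product);
    }

    public Product find(Long id) {
        try {
            Product product = repository.findById(id).orElseThrow();
            fallbackCache.put(id, product);                        // keep our own copy
            return product;
        } catch (DataAccessException | TransactionException dbDown) {
            // Serving cached data while the DB is unavailable is application logic,
            // not something the persistence context does for you.
            return fallbackCache.get(id);
        }
    }
}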

Related

Spring batch with MongoDB and transactions

I have a Spring Batch application with two databases: one SQL DB for the Spring Batch meta data, and another which is a MongoDB where all the business data is stored. The relational DB still uses a DataSourceTransactionManager.
However, I don't think the Mongo writes are done within an active transaction with rollbacks. Here is an excerpt from the official Spring Batch documentation on MongoItemWriter:
A ItemWriter implementation that writes to a MongoDB store using an implementation of Spring Data's MongoOperations. Since MongoDB is not a transactional store, a best effort is made to persist written data at the last moment, yet still honor job status contracts. No attempt to roll back is made if an error occurs during writing.
However, this is no longer the case; MongoDB introduced ACID transactions in version 4.
How do I go about adding transactions to my writes? I could use @Transactional on my service methods when I use an ItemWriterAdapter, but I still don't know what to do with MongoItemWriter... What is the right configuration here? Thank you.
I have a Spring Batch application with two databases: one SQL DB for the Spring Batch meta data, and another which is a MongoDB where all the business data is stored.
I invite you to take a look at the following posts to understand the implications of this design choice:
How to java-configure separate datasources for spring batch data and business data? Should I even do it?
How does Spring Batch transaction management work?
In your case, you have a distributed transaction across two data sources:
SQL datasource for the job repository, which is managed by a DataSourceTransactionManager
MongoDB for your step (using the MongoItemWriter), which is managed by a MongoTransactionManager
If you want technical meta-data and business data to be committed/rolled back in the scope of the same distributed transaction, you need to use a JtaTransactionManager that coordinates the DataSourceTransactionManager and MongoTransactionManager. You can find some resources about the matter here: https://stackoverflow.com/a/56547839/5019386.
BTW, there is a feature request to use MongoDB as a job repository in Spring Batch: https://github.com/spring-projects/spring-batch/issues/877. When this is implemented, you could store both business data and technical meta-data in the same datasource (so no need for a distributed transaction anymore) and you would be able to use the same MongoTransactionManager for both the job repository and your step.
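As a sketch of the non-distributed option (Spring Batch 4 style builders; the bean, type, and step names below are assumptions, not from the question), the Mongo-writing step can be given a MongoTransactionManager so that each chunk's writes run in a MongoDB transaction (MongoDB 4+ on a replica set). The job repository keeps its own DataSourceTransactionManager, so this alone is not a distributed transaction; for that you would still need the JtaTransactionManager setup described above.

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.data.MongoItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.mongodb.MongoDatabaseFactory;
import org.springframework.data.mongodb.MongoTransactionManager;

@Configuration
public class MongoStepConfiguration {

    @Bean
    public MongoTransactionManager mongoTransactionManager(MongoDatabaseFactory mongoDatabaseFactory) {
        // Spring Data MongoDB 3.x; older versions take a MongoDbFactory instead
        return new MongoTransactionManager(mongoDatabaseFactory);
    }

    @Bean
    public Step mongoWriteStep(StepBuilderFactory steps,
                               ItemReader<MyDocument> reader,           // MyDocument is a placeholder type
                               MongoItemWriter<MyDocument> writer,
                               MongoTransactionManager mongoTransactionManager) {
        return steps.get("mongoWriteStep")
                .<MyDocument, MyDocument>chunk(100)
                .reader(reader)
                .writer(writer)
                .transactionManager(mongoTransactionManager)            // chunk transactions go through Mongo
                .build();
    }
}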

Do JPA's flush and JDBC batching work the same internally?

As per my understanding, the flush() method of JPA's EntityManager syncs the data in the persistence context with the database in a single DB network call. Thus it avoids multiple DB calls when somebody is trying to persist a large number of records. Why can't I consider this the batch equivalent of a JDBC batch insert (I know flush() may not be implemented for that purpose)? After all, a JDBC batch insert works on the same idea: it makes only a single DB call for all the statements added to the statement object.
From a performance point of view, are the two comparable? Do they work with the same technique? Internally, on the database side, will both generate the same number of queries?
Could somebody explain the difference?
the flush() method of JPA's EntityManager syncs the data in the persistence context with the database in a single DB network call
No, not at all. That isn't possible. A flush could possibly delete from several tables, insert into several tables, and update several tables. That can't be done in a single network call.
A flush can use batch statements to execute multiple similar inserts or updates though.
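What is true is that, with Hibernate as the JPA provider, a flush can hand its pending INSERTs/UPDATEs to the driver as JDBC batches when hibernate.jdbc.batch_size is set. A rough sketch (entity, service, and field names are made up for illustration) of the usual pattern for persisting many records:

import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class RecordImportService {

    @PersistenceContext
    private EntityManager entityManager;

    @Transactional
    public void importRecords(List<RecordEntity> records) {
        int batchSize = 50;                       // should match hibernate.jdbc.batch_size=50
        for (int i = 0; i < records.size(); i++) {
            entityManager.persist(records.get(i));
            if ((i + 1) % batchSize == 0) {
                entityManager.flush();            // pending INSERTs go out as one JDBC batch per table
                entityManager.clear();            // keep the persistence context from growing unbounded
            }
        }
        // anything left over is flushed at commit
    }
}

Note that Hibernate silently disables insert batching when the entity uses IDENTITY id generation, so even this is not guaranteed to collapse into batched statements.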

Rolling back the transaction when the API call has already been executed

Recently I've encountered a problem working with microservices. My main application works with relational databases; the microservice works with MongoDB and provides a REST API with CRUD methods for some model. The CRUD methods are also implemented in the application. A call from the front end goes to the application first, a new record is created in the relational DB (only some of the fields are saved there), then the model is saved externally in MongoDB, and at the end the transaction is committed. So if something goes wrong and the transaction is rolled back, the API call has already been executed. In the case of creation I can just delete the newly created record from MongoDB, but in the case of an edit I have no idea what to do.
One idea was to overwrite the model in MongoDB with the record from the relational database, but in this case the data would be inconsistent, as not all the fields are saved there.
Any ideas about this?
There are multiple ways of doing this:
Using 2PC (two-phase commit): you basically prepare an operation in each distributed service and then confirm or roll it back afterwards.
Using sagas: with sagas, you are meant to provide some sort of "rollback operation" (a compensating action) for the operations you perform in your distributed services. When you need to roll back an operation that has already been performed, you call the service and indicate a rollback (see the sketch after the link below).
More information about sagas here: https://microservices.io/patterns/data/saga.html
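A rough compensation sketch for the "edit" case described in the question; ModelClient, ModelDto and ModelRepository are made-up names (a wrapper around the microservice's REST API, the transferred model, and the relational-side repository):

public class ModelUpdateService {

    interface ModelClient {                         // hypothetical wrapper around the microservice's REST API
        ModelDto get(String id);
        void update(ModelDto model);
    }

    private final ModelClient modelClient;
    private final ModelRepository modelRepository;  // hypothetical relational repository

    public ModelUpdateService(ModelClient modelClient, ModelRepository modelRepository) {
        this.modelClient = modelClient;
        this.modelRepository = modelRepository;
    }

    public ModelDto update(ModelDto updated) {
        ModelDto previous = modelClient.get(updated.getId());  // remember the old external state
        modelClient.update(updated);                            // external (MongoDB) update via the API
        try {
            modelRepository.save(updated);                      // local relational write
            return updated;
        } catch (RuntimeException ex) {
            modelClient.update(previous);                       // compensating action: restore the old state
            throw ex;
        }
    }
}

The compensation restores exactly the state the microservice held before the edit, which is why keeping the previous external model (rather than rebuilding it from the relational record) avoids the inconsistency mentioned in the question.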

Why does fetching from a DB cursor always return the same result set with MyBatis and Spring transactions

My setup is a Postgres database connected via the JDBC driver to a Tomcat server (which is responsible for connection pooling), which in turn serves this data source via JNDI to a Spring application.
In the Java application I use MyBatis and MyBatis-Spring to query the database.
Now I want to page through a table using a cursor as shown in this simple example http://www.postgresql.org/docs/9.3/static/sql-fetch.html.
Since a cursor needs to run within a DB transaction, I annotated the relevant method with the @Transactional annotation backed by Spring's DataSourceTransactionManager (see http://mybatis.github.io/spring/transactions.html).
This is where the crazy part starts. At runtime, every FETCH FORWARD 1000 FROM CURSOR query issued through the MyBatis mapper returns one and the same result set. It seems the cursor position gets rolled back on every call, so it returns the first 1000 rows of the table every time.
Why do the following fetches not return the next chunks of records?
I figured out that MyBatis uses a caching mechanism which isn't quite intelligent in my eyes: https://mybatis.github.io/mybatis-3/configuration.html.
In fact, MyBatis by default caches all queries executed during a session, where "session" means the transaction or the connection. With auto-commit this is no problem, but it is a problem with a cursor, where the fetch statement does not change within the transaction.
So once the first data had been fetched from the cursor, the result was cached in memory and no subsequent fetches were sent to the DB.
The solution is the following line in mybatis-config.xml:
<setting name="localCacheScope" value="STATEMENT"/>
With this setting, the local session cache is used just for statement execution; no data is shared between two different calls to the same SqlSession.
To me it seems like a bug, since the default caching scope makes no sense for DB cursors.
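If the SqlSessionFactory is built through MyBatis-Spring rather than mybatis-config.xml, the equivalent setting can be applied in Java configuration (bean and variable names below are just illustrative; requires a reasonably recent MyBatis-Spring version with setConfiguration support):

import javax.sql.DataSource;
import org.apache.ibatis.session.LocalCacheScope;
import org.apache.ibatis.session.SqlSessionFactory;
import org.mybatis.spring.SqlSessionFactoryBean;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MyBatisConfig {

    @Bean
    public SqlSessionFactory sqlSessionFactory(DataSource dataSource) throws Exception {
        // fully qualified to avoid clashing with Spring's @Configuration
        org.apache.ibatis.session.Configuration configuration = new org.apache.ibatis.session.Configuration();
        configuration.setLocalCacheScope(LocalCacheScope.STATEMENT); // don't reuse cached results within the transaction

        SqlSessionFactoryBean factoryBean = new SqlSessionFactoryBean();
        factoryBean.setDataSource(dataSource);
        factoryBean.setConfiguration(configuration);
        return factoryBean.getObject();
    }
}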

Is there a way to persist HSQLDB data?

We have all of our unit tests written so that they create and populate tables in HSQLDB. I want the developers who use this to be able to write queries against this HSQL DB (1) because by writing queries they can better understand the data model, and those not as familiar with SQL can play with the data before writing the runtime statements, and (2) because they don't have access to the test DB for security reasons. Is there a way to persist the test data so that it may be examined and analyzed with an SQL client?
Right now I am jury-rigging it by switching the data source to a different DB (like DB2/MySQL, then connecting to that DB on my machine so I can play with persistent data), but it would be easier if HSQL supported persisting this than to have to explain how to do this to every new developer.
Just to be clear, I need an SQL client to interact with the persisted data, so debugging and inspecting memory won't do. This has more to do with initial development than with debugging/maintenance/testing.
If you use an HSQLDB Server instance for your tests, the data will survive the test run.
If the server uses a jdbc:hsqldb:mem:aname (all-in-memory) url for its database, then the data will be available while the server is running. Alternatively the server can use a jdbc:hsqldb:file:filepath url and the data is persisted to files.
The latest HSQLDB docs explain the different options. Most of the observations also apply to older (1.8.x) versions. However, the latest version 2.0.1 supports starting a server and creating databases dynamically upon the first connection, which can simplify testing a lot.
http://hsqldb.org/doc/2.0/guide/deployment-chapt.html#N13C3D
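A small sketch (paths, database name, and port are assumptions) of starting such a Server instance from test setup code with HSQLDB 2.x, where the server class is org.hsqldb.server.Server (1.8.x used org.hsqldb.Server):

import org.hsqldb.server.Server;

public class TestHsqlServer {

    public static Server start() {
        Server server = new Server();
        server.setDatabaseName(0, "testdb");
        // a file: path persists the data to disk; "mem:testdb" would keep it in memory
        // only for as long as the server runs
        server.setDatabasePath(0, "file:target/hsqldb/testdb");
        server.setPort(9001);          // HSQLDB's default server port
        server.setSilent(true);
        server.start();
        return server;
    }
}

Tests and any SQL client can then connect to the same data with a URL such as jdbc:hsqldb:hsql://localhost:9001/testdb.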