What's the bottleneck in Spring r2dbc database connection? - spring-data

I've set up a sample project using Spring Boot, WebFlux, and R2DBC. I've been able to stream rows from a Postgres table to the client.
Is there a memory bottleneck in this server implementation (for storing the results of the query)? Or do the rows stream through?
PS: I'm not claiming any level of quality here. I know pagination and so on would be essential; I'm just wondering how the DB query interacts with the reactive framework.

Pagination is not essential with R2DBC. If you have a lot of rows to process, you can issue a single query instead of fetching batches. The driver uses backpressure for flow control, so rows reach your application only as fast as it requests them and do not overwhelm it. You could read here about how backpressure is applied on such queries.
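To make the demand-driven flow concrete, here is a minimal sketch using only the JDK's java.util.concurrent.Flow interfaces, which mirror the Reactive Streams contract that R2DBC drivers and Reactor build on. It is not R2DBC or driver code: RowPublisher, BatchingSubscriber and the "row-N" strings are hypothetical stand-ins, and the publisher is synchronous and single-threaded for brevity. The point is only that the consumer controls how many rows are in flight by calling request(n).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Flow;

// Hypothetical stand-in for a driver: it emits a row only when demand is outstanding.
final class RowPublisher implements Flow.Publisher<String> {
    private final int totalRows;

    RowPublisher(int totalRows) {
        this.totalRows = totalRows;
    }

    @Override
    public void subscribe(Flow.Subscriber<? super String> subscriber) {
        subscriber.onSubscribe(new Flow.Subscription() {
            private int next = 0;             // index of the next row to emit
            private long pending = 0;         // demand requested but not yet satisfied
            private boolean emitting = false; // guards against re-entrant request() calls
            private boolean done = false;

            @Override
            public void request(long n) {
                // Simplification: a non-positive n is ignored here; the spec requires onError.
                if (n <= 0 || done) return;
                pending += n;
                if (emitting) return; // called from inside onNext: the loop below picks it up
                emitting = true;
                while (pending > 0 && !done) {
                    if (next < totalRows) {
                        pending--;
                        subscriber.onNext("row-" + next++);
                    } else {
                        // Simplification: completion is signalled on the first request
                        // that finds no rows left.
                        done = true;
                        subscriber.onComplete();
                    }
                }
                emitting = false;
            }

            @Override
            public void cancel() {
                done = true;
            }
        });
    }
}

// Consumer that never has more than batchSize rows outstanding.
final class BatchingSubscriber implements Flow.Subscriber<String> {
    private final int batchSize;
    private Flow.Subscription subscription;
    private int receivedInBatch = 0;
    final List<String> rows = new ArrayList<>();
    int requestCalls = 0;
    boolean completed = false;

    BatchingSubscriber(int batchSize) {
        this.batchSize = batchSize;
    }

    @Override
    public void onSubscribe(Flow.Subscription s) {
        subscription = s;
        requestCalls++;
        s.request(batchSize); // initial demand
    }

    @Override
    public void onNext(String row) {
        rows.add(row);
        if (++receivedInBatch == batchSize) {
            receivedInBatch = 0;
            requestCalls++;
            subscription.request(batchSize); // ask for more only after the batch is handled
        }
    }

    @Override
    public void onError(Throwable t) {
        throw new IllegalStateException(t);
    }

    @Override
    public void onComplete() {
        completed = true;
    }
}

final class BackpressureDemo {
    public static void main(String[] args) {
        BatchingSubscriber subscriber = new BatchingSubscriber(4);
        new RowPublisher(10).subscribe(subscriber);
        System.out.println(subscriber.rows.size() + " rows, "
                + subscriber.requestCalls + " request calls");
    }
}
```

In a WebFlux application you normally do not write subscribers like this yourself; the framework subscribes to the Flux returned by the repository and signals demand as the HTTP response can be written.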


Could I use stored persistence context if DB temporarily shut down?

I have to implement some kind of cache, or temporary storage, that meets the following conditions:
The cache stores 4 tables from a specific MariaDB database. len(column) < 50, len(row) < 1000
The cache is populated as soon as the Spring web server starts. On startup, the server immediately loads the data from the DB and stores it in the cache, to minimize direct DB querying.
The Spring web server fetches data from the cache when it receives an HTTP GET request.
The Spring web server updates the DB when it receives an HTTP POST, DELETE, or PUT request, by updating the data in the cache and then writing the changes to the tables.
The Spring web server must not throw an exception when the DB suddenly shuts down and the connection is lost. It must serve HTTP requests from cached data, defer any logic that connects directly to the DB, and synchronize the data once the DB is restored.
I'm not familiar with JPA, but it seems that Spring JPA itself supports the first 4 conditions, through the EntityManager and the persistence context.
However, I cannot find any information on making this context fault-tolerant. I cannot find any option that makes JPA check whether the DB connection is alive and only update once that check returns true.
Since I've been asked to use JPA as far as possible, I want to find out whether it is possible to meet the conditions above with JPA.
Thanks for any information provided.
Could I use stored persistence context if DB temporarily shut down?
No, not really.
The persistence context gets flushed on every commit, which you don't want, because you want to serve queries from the cache.
Also it doesn't serve the result of any kind of query, it just serves entities.
And most importantly: when a flush event happens and the database is not available you will get an exception.
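For comparison, the behaviour described in the question is closer to a write-behind cache that you build yourself on top of whatever performs the database access. The following is a rough plain-Java sketch of that idea, not a JPA feature: BackingStore, WriteBehindCache and FakeDatabase are hypothetical names, keys and values are plain strings, and the pending queue lives in memory only, so queued writes would be lost if the server itself restarted.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical abstraction over the real database access code (for example a JPA repository).
interface BackingStore {
    void write(String key, String value) throws Exception; // expected to throw while the DB is down
}

final class WriteBehindCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Deque<Map.Entry<String, String>> pendingWrites = new ArrayDeque<>();
    private final BackingStore store;

    WriteBehindCache(BackingStore store) {
        this.store = store;
    }

    // Reads are always served from memory and never touch the database.
    Optional<String> get(String key) {
        return Optional.ofNullable(cache.get(key));
    }

    // Writes update the cache first, then try to reach the database.
    synchronized void put(String key, String value) {
        cache.put(key, value);
        pendingWrites.addLast(Map.entry(key, value));
        flush();
    }

    // Replays queued writes in order; returns how many are still pending.
    // Call this again (for example from a scheduled task) once the database is back.
    synchronized int flush() {
        while (!pendingWrites.isEmpty()) {
            Map.Entry<String, String> write = pendingWrites.peekFirst();
            try {
                store.write(write.getKey(), write.getValue());
                pendingWrites.removeFirst();
            } catch (Exception e) {
                break; // database unavailable: keep the write queued and stop for now
            }
        }
        return pendingWrites.size();
    }
}

// Hypothetical fake database, used only to exercise the sketch.
final class FakeDatabase implements BackingStore {
    final Map<String, String> rows = new HashMap<>();
    boolean available = true;

    @Override
    public void write(String key, String value) throws Exception {
        if (!available) throw new Exception("database unavailable");
        rows.put(key, value);
    }
}
```

With a FakeDatabase whose available flag is false, put still succeeds and get returns the new value, while flush reports the queued writes; after setting available back to true, the next flush drains the queue. A real implementation would also need to decide how to handle conflicts with changes made to the database by other clients during the outage.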

Can persistent databases and in-memory database work together?

My requirement is to use 2 different (persistent) database sources and serve their data to the frontend. To make API responses faster, can I use an in-memory database like H2 or GemFire to store data that is known to be frequently accessed (populated, for example, by a cron job at some set time), and go to the databases for all other calls? The challenge for me is transferring the data from the persistent store to the in-memory one, since Spring needs 2 copies of the same POJO with different annotations (e.g. @Document for Mongo, @Entity for H2 or GemFire). As of now, it does not make sense to manually go through each record in an array received from Mongo and save it to the in-memory store.
You can utilize the Spring Framework's Cache Abstraction and different forms of caching patterns when using Apache Geode (or alternatively, VMware Tanzu GemFire). See a more general description about caching starting here.
NOTE: VMware Tanzu GemFire, up to 9.15, is built on Apache Geode, and the two are virtually interchangeable by swapping the bits.
I also provide many Samples of the different caching patterns in action, Guide and Source included, which are referenced in the relevant section of the reference documentation.
Finally, in the Spring Boot for Apache Geode project, I also have 2 test classes testing the Inline Caching Pattern, which you may refer to:
The first test class uses Apache Geode as a caching provider in Spring's Cache Abstraction to cache data from a database, HSQLDB (predecessor of H2) in this case.
The second test class is similar in that Apache Geode caches data from Apache Cassandra.
The primary, backend data store used in Inline Caching is made irrelevant since it uses the Spring Data Repository abstraction to interface with any data store (referred to as "modules" in the Spring Data portfolio; see here) supported by the SD Repository abstraction.
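As a rough illustration of the read side of this pattern, here is a plain-Java sketch of a read-through cache. It uses neither Spring's Cache Abstraction nor Apache Geode; ReadThroughCache and its loader function are hypothetical, with the loader standing in for a Spring Data Repository lookup against the backend store. Note that the cached value is the same object the loader returned, so no second annotated copy of the POJO is involved.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical read-through cache: on a miss, the value is loaded from the
// backend store and kept in memory for later calls.
final class ReadThroughCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // stands in for a repository lookup

    ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Returns the cached value, calling the loader only if the key is absent.
    V get(K key) {
        return entries.computeIfAbsent(key, loader);
    }

    // Drops an entry so that the next get() goes back to the backend store.
    void evict(K key) {
        entries.remove(key);
    }
}

final class ReadThroughDemo {
    public static void main(String[] args) {
        ReadThroughCache<String, String> cache =
                new ReadThroughCache<>(id -> "customer loaded for " + id);
        System.out.println(cache.get("42")); // first call invokes the loader
        System.out.println(cache.get("42")); // second call is served from memory
    }
}
```

With Spring's Cache Abstraction the same effect is usually declared on a service method rather than hand-coded, and the caching provider (Apache Geode in the test classes mentioned above) holds the entries.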
Caching is but 1 technique to keep relevant data in-memory in order to improve response times in a Spring (Boot) application using a backing data store, such as an RDBMS, as the primary System of Record (SOR). Of course, there are other approaches, too.
Apache Geode can even be configured for durability (persistence) and redundancy. In some cases, it might even completely replace your database for some function. So, you have a lot of options.
Hopefully this will get you started and give you more ideas.

Does reactive database return all data at once?

My question is pretty straightforward: when we make an asynchronous call with a reactive repository from Spring, do we:
Receive all the data from the database (MongoDB) at once and put it into the reactive type Flux?
Receive the data in chunks, at the rate the driver reads them from the database?
My confusion is whether the database driver works like a traditional one, or whether it works on the Producer/Subscriber pattern that we use inside our Spring WebFlux application.
Thank you for your answers.
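An R2DBC-compliant driver is a non-blocking driver that streams items as they arrive when the result is returned as a Flux. (R2DBC covers relational databases; for MongoDB, the reactive driver implements the Reactive Streams API and delivers results in the same demand-driven way rather than all at once.)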

Spring batch with MongoDB and transactions

I have a Spring Batch application with two databases: one SQL DB for the Spring Batch meta data, and another which is a MongoDB where all the business data is stored. The relational DB still uses DataSourceTransactionManager.
However, I don't think the Mongo writes are done within an active transaction with rollbacks. Here is the excerpt from the official Spring Batch documentation on MongoItemWriter:
A ItemWriter implementation that writes to a MongoDB store using an implementation of Spring Data's MongoOperations. Since MongoDB is not a transactional store, a best effort is made to persist written data at the last moment, yet still honor job status contracts. No attempt to roll back is made if an error occurs during writing.
However this is not the case any more; MongoDB introduced ACID transactions in version 4.
How do I go about adding transactions to my writes? I could use @Transactional on my service methods when I use ItemWriterAdapter, but I still don't know what to do with MongoItemWriter. What is the right configuration here? Thank you.
I have a Spring Batch application with two databases: one SQL DB for the Spring Batch meta data, and another which is a MongoDB where all the business data is stored.
I invite you to take a look at the following posts to understand the implications of this design choice:
How to java-configure separate datasources for spring batch data and business data? Should I even do it?
How does Spring Batch transaction management work?
In your case, you have a distributed transaction across two data sources:
SQL datasource for the job repository, which is managed by a DataSourceTransactionManager
MongoDB for your step (using the MongoItemWriter), which is managed by a MongoTransactionManager
If you want technical meta-data and business data to be committed/rolled back in the scope of the same distributed transaction, you need to use a JtaTransactionManager that coordinates the DataSourceTransactionManager and MongoTransactionManager. You can find some resources about the matter here: https://stackoverflow.com/a/56547839/5019386.
BTW, there is a feature request to use MongoDB as a job repository in Spring Batch: https://github.com/spring-projects/spring-batch/issues/877. When this is implemented, you could store both business data and technical meta-data in the same datasource (so no need for a distributed transaction anymore) and you would be able to use the same MongoTransactionManager for both the job repository and your step.

Apache Spark and MongoDB integration using pymongo

I have a problem working with Apache Spark and MongoDB using the PyMongo library. I am processing thousands of records, and for each record I need to read its corresponding data from the database, update certain info, and save it back to the database. Because of these reads and writes, I chose PyMongo instead of the Spark-Mongo Connector, which apparently isn't well suited for this task. Unfortunately, when performing writes, MongoDB always reports the write as successful, but when I check the database, some updates were not performed. After debugging for over a week, I realized that when I set the server to a single-core processor, all writes succeeded and were written to the database, but the application became tremendously slow.
I would like to know if anyone knows how to solve this issue. Thanks in advance.