Optimizing save time for bulk save in Spring Data JPA - spring-data-jpa

I have a use case (probably a very common one) where I have to insert 300k records as a daily refresh. I am using Spring Data JPA's save with batching, and it currently takes more than an hour to save all the records.
I have enabled batching, but it didn't help much. The database is MariaDB.
Is there a better approach to optimize the save time?
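For illustration (not a drop-in fix), here is a minimal Java sketch of chunked persists with a plain EntityManager, assuming the javax.persistence namespace (Spring Boot 2.x), a hypothetical Record entity, and a batch size of 100. Two things frequently undo JDBC batching: Hibernate silently disables it for entities that use an IDENTITY id generator, and depending on the MariaDB driver version a URL flag such as rewriteBatchedStatements=true may be needed before batches are actually collapsed on the wire.

```java
// A minimal sketch of chunked inserts, assuming a JPA entity named Record.
// Assumed properties (application.properties):
//   spring.jpa.properties.hibernate.jdbc.batch_size=100
//   spring.jpa.properties.hibernate.order_inserts=true
//   spring.datasource.url=jdbc:mariadb://host/db?rewriteBatchedStatements=true  (driver-version dependent)
import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class BulkInsertService {

    private static final int BATCH_SIZE = 100; // keep in sync with hibernate.jdbc.batch_size

    @PersistenceContext
    private EntityManager entityManager;

    @Transactional
    public void insertAll(List<Record> records) {
        for (int i = 0; i < records.size(); i++) {
            entityManager.persist(records.get(i));
            if ((i + 1) % BATCH_SIZE == 0) {
                // Push the current batch to the database and detach the entities,
                // so the persistence context does not grow to 300k managed objects.
                entityManager.flush();
                entityManager.clear();
            }
        }
        entityManager.flush();
        entityManager.clear();
    }
}
```

The flush/clear cadence matters as much as the properties: without it, dirty checking over hundreds of thousands of managed entities dominates the run time even when JDBC batching is on.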

Related

Sync Elasticsearch Postgresql on a Springboot application

I have PostgreSQL as my primary database and I would like to take advantage of Elasticsearch as a search engine for my Spring Boot application.
Problem: The queries are quite complex and with millions of rows in each table, most of the search queries are timing out.
Partial solution: I utilized materialized views in PostgreSQL and have a job running that refreshes them every X minutes. But on systems with huge amounts of data and with other database transactions (especially writes) in progress, the views take a long time to refresh (about 10 minutes to refresh 5 views). I realized that the current views are at their capacity and I cannot add more.
That's when I started exploring other options just for the search and landed on Elasticsearch, and it works great with the amount of data I have. As a POC, I used Logstash's JDBC input plugin, but it doesn't support the DELETE operation (bummer).
From there, soft delete is the remaining option, but I cannot take it because:
A) Almost all the tables in the PostgreSQL DB are updated every few minutes, and some of them have constraints on the "name" key, which in this case would stick around until a clean-up job runs.
B) Many tables in my PostgreSQL DB are referenced with CASCADE DELETE, and it's not feasible for me to update 220 tables' schemas and JPA queries to check for a soft-delete boolean.
The same question mentioned in the link above also suggests PgSync, which syncs PostgreSQL with Elasticsearch periodically. However, I cannot go with that either, since it has an LGPL license, which is forbidden in our organization.
I'm starting to wonder if anyone else has encountered this strange limitation between Elasticsearch and an RDBMS.
I'm open to options other than Elasticsearch to solve my need. I just don't know what the right stack to use is. Any help here is much appreciated!
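For what it's worth, one pattern that keeps hard deletes in sync without soft-delete columns is a scheduled one-way sync job: re-index rows changed since the last run, then remove Elasticsearch documents whose ids no longer exist in PostgreSQL. The sketch below is only an outline under assumed names (Product, ProductRepository, ProductSearchRepository, an updated_at column, a findAllIds query method); diffing all ids will not scale to the very largest tables, but it shows the shape of the approach.

```java
// A rough sketch of a periodic one-way sync for a single entity; all names are hypothetical.
import java.time.Instant;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class ProductSyncJob {

    private final ProductRepository jpaRepository;          // reads from PostgreSQL
    private final ProductSearchRepository searchRepository; // writes to Elasticsearch
    private Instant lastRun = Instant.EPOCH;

    public ProductSyncJob(ProductRepository jpaRepository,
                          ProductSearchRepository searchRepository) {
        this.jpaRepository = jpaRepository;
        this.searchRepository = searchRepository;
    }

    @Scheduled(fixedDelayString = "${sync.delay:60000}")
    public void sync() {
        Instant now = Instant.now();

        // Upserts: index everything modified since the last run
        // (assumes an updated_at column mapped on the entity).
        List<Product> changed = jpaRepository.findByUpdatedAtAfter(lastRun);
        searchRepository.saveAll(changed);

        // Hard deletes: drop documents whose ids no longer exist in PostgreSQL.
        // findAllIds() is an assumed custom query, e.g. "select p.id from Product p".
        Set<Long> liveIds = new HashSet<>(jpaRepository.findAllIds());
        searchRepository.findAll().forEach(doc -> {
            if (!liveIds.contains(doc.getId())) {
                searchRepository.deleteById(doc.getId());
            }
        });

        lastRun = now;
    }
}
```

A more scalable variant of the same idea tracks deletes explicitly (for example via a trigger-filled "deleted_ids" table or an outbox) instead of scanning the whole index, but the upsert-then-prune loop above is usually enough to evaluate whether Elasticsearch fits the workload.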

Is it possible to configure Hibernate to flush only but never commit (a kind of commit simulation)?

I need to migrate from an old PostgreSQL database with an old schema (58 tables) to a new database with a new schema (40 tables). The two schemas are completely different.
It is not a simple migration (copy and paste), but rather copy-transform-paste.
I decided to write a batch job using Spring Batch, Spring Data, and JPA. So I have two DataSources and a chained transaction. My Spring configuration is mainly made up of a chunk-oriented step with a JpaPagingItemReader and an ItemWriterAdapter.
For performance, I also configured a Partitioner, which lets me partition my source tables into several sub-tables, with a chunk size of 500000.
Everything works smoothly. But considering the size of my old tables, it takes me a week to migrate all the data.
I would like to run a test that consists of running my batch without committing: Hibernate would generate all the SQL statements into a ".sql" file but not commit the data to the database.
This would allow me to see whether the commit is what costs the execution time.
Is it possible to configure Hibernate to flush only but never commit? A kind of commit simulation?
Thanks
Usually, the costly part is foreign key and unique key checks as well as index maintenance, but since you don't say how you fetch the data, it could very well be that you are accessing your data in an inefficient manner.
In general, I would recommend creating a dump with pg_dump, restoring it, and then trying to do the migration in an SQL-only way. That way, no data has to flow around; it can stay on the database machine, which is generally much more efficient.
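If the goal is only to measure what the commit itself costs, one hedged option (there is no official Hibernate "dry run" mode) is to run each chunk in a transaction that is marked rollback-only: Hibernate still flushes, so the SQL is generated and can be captured with hibernate.show_sql or a logging JDBC proxy, but the transaction never commits. The writeChunk method and the injected transaction manager below are placeholders for the existing writer setup.

```java
// A minimal sketch, assuming a Spring-managed PlatformTransactionManager for the
// target database; writeChunk stands in for the existing ItemWriter logic.
import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import org.springframework.stereotype.Component;
import org.springframework.transaction.PlatformTransactionManager;
import org.springframework.transaction.support.TransactionTemplate;

@Component
public class DryRunWriter {

    // Simplified: with two DataSources you would point this at the target persistence unit.
    @PersistenceContext
    private EntityManager entityManager;

    private final TransactionTemplate transactionTemplate;

    public DryRunWriter(PlatformTransactionManager targetTxManager) {
        this.transactionTemplate = new TransactionTemplate(targetTxManager);
    }

    public void writeChunk(List<Object> items) {
        transactionTemplate.execute(status -> {
            items.forEach(entityManager::persist);
            // Flush so Hibernate generates and sends the SQL (visible with
            // hibernate.show_sql=true or a statement-logging JDBC proxy)...
            entityManager.flush();
            // ...but never commit: the transaction is rolled back instead.
            status.setRollbackOnly();
            return null;
        });
    }
}
```

Note that the statements are still sent to the database before being rolled back, so this isolates the cost of the commit itself rather than the cost of all database I/O.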

Unloading huge data from Cassandra table

We have a table with 15 million records, and one of the columns stores a huge XML document. The requirement is to generate 30 different text files with different fields of the XML for all the data (15+ million rows) in the table.
All 30 of these jobs will run at the same time.
We often run into ReadTimeoutException. Due to time constraints, we can't consider caching solutions.
How can we mitigate these ReadTimeoutExceptions? Any help will be greatly appreciated.
Below are the Spring Batch and Cassandra versions used:
Cassandra 3.11, using Spring Batch 3.x as the unload framework.
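Assuming the DataStax Java driver 3.x that pairs with Cassandra 3.11, two knobs that usually matter for long scans are the client-side read timeout and the page size (fetch size): small pages keep each individual request well under the timeout while the driver pages through the table transparently. The contact point, keyspace, table, and column names below are placeholders.

```java
// A hedged sketch with the DataStax Java driver 3.x; names and addresses are placeholders.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.QueryOptions;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.SocketOptions;
import com.datastax.driver.core.Statement;

public class XmlUnloader {

    public static void main(String[] args) {
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")
                // Raise the client-side read timeout (the driver default is 12 seconds).
                .withSocketOptions(new SocketOptions().setReadTimeoutMillis(60_000))
                // Small pages keep each request cheap; the driver fetches the next
                // page transparently while the ResultSet is iterated.
                .withQueryOptions(new QueryOptions().setFetchSize(500))
                .build();

        try (Session session = cluster.connect("my_keyspace")) {
            Statement stmt = new SimpleStatement("SELECT id, xml_payload FROM big_table");
            for (Row row : session.execute(stmt)) {
                process(row.getString("xml_payload")); // write out the 30 field sets here
            }
        } finally {
            cluster.close();
        }
    }

    private static void process(String xml) {
        // placeholder for the field extraction and file writing
    }
}
```

If the timeouts come from the coordinator rather than the client, the server-side settings in cassandra.yaml (read_request_timeout_in_ms and range_request_timeout_in_ms) may need raising as well; running 30 full-table scans concurrently also multiplies the load, so staggering the jobs is worth testing.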

Bulk data insertion and updating from one db server to another db server

I have a set of tables with 20 million records on a Postgres server. As of now I am migrating some table data from one server to another using insert and update queries, with dependent tables handled in functions. It takes around 2 hours even after optimizing the queries. I need a solution to migrate the data faster, possibly using MongoDB or Cassandra. How?
Try putting your updates and inserts into a file and then loading the file. I understand PostgreSQL will optimize loading the file contents. It has always worked for me, although I haven't used that quantity of data.
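The fastest way to load such a file into PostgreSQL is usually COPY rather than replaying individual INSERT statements. As a rough sketch (the table name, file path, and connection details are made up), the pgJDBC CopyManager can stream a prepared CSV straight into the target table:

```java
// A hedged sketch of bulk-loading a prepared CSV via PostgreSQL's COPY and pgJDBC.
import java.io.FileReader;
import java.io.Reader;
import java.sql.Connection;
import java.sql.DriverManager;
import org.postgresql.copy.CopyManager;
import org.postgresql.core.BaseConnection;

public class CopyLoader {

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://localhost:5432/targetdb", "user", "password");
             Reader csv = new FileReader("/tmp/table_dump.csv")) {

            CopyManager copyManager = new CopyManager(conn.unwrap(BaseConnection.class));
            // COPY streams the rows through the wire protocol in bulk, which is far
            // faster than millions of individual INSERT statements.
            long rows = copyManager.copyIn(
                    "COPY target_table FROM STDIN WITH (FORMAT csv)", csv);
            System.out.println("Loaded " + rows + " rows");
        }
    }
}
```

Dependent tables can be loaded in dependency order (or with constraints temporarily deferred), which keeps the whole migration on the COPY fast path.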

Jsondb performance

Good day
I'm using QtJsonDb from http://qt-project.org/wiki/Building_QtJsonDb_from_Git as a JsonDb backend NoSQL database.
It used to work very well, but now I have over 10,000 records and it's becoming very, very slow.
I'm saving somewhat complex objects to the DB.
1. How fast should the DB be when retrieving the details?
2. Is there a third-party application or framework where I can load the JSON files, test the queries on them, and see what the performance is like there?
Thanks!
Look at MongoDB: it can store data as JSON and it has the ability to add custom indexes for quick retrieval.
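To make the index point concrete, here is a small sketch with the MongoDB Java sync driver showing a JSON-like document plus a custom index on the field being queried; the database, collection, and field names are invented for the example.

```java
// A small sketch with the MongoDB Java sync driver; all names are placeholders.
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Indexes;
import org.bson.Document;

public class MongoIndexDemo {

    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> items =
                    client.getDatabase("appdb").getCollection("items");

            // Index the field queried most often, so lookups stop scanning every document.
            items.createIndex(Indexes.ascending("customerId"));

            items.insertOne(new Document("customerId", 42).append("payload", "{...}"));
            Document found = items.find(Filters.eq("customerId", 42)).first();
            System.out.println(found);
        }
    }
}
```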