When I use a secondary table, saving the entities takes 2 times the time it takes without a secondary table.
However, the inserts to the secondary table (postgresql) takes less than 1ms according to postgres logs. So, I guess it's something with hibernate itself. Is there any known performance issue with secondary tables in hibernate? I'm using hibernate 2.1
Related
I'm using db view that is carrying values from multiple tables using postgresql db, mapped by hibernate, and managed by JPA.
I want to make a join with the view right after updating a value in table column that is mapped to the view.
My question is how to guarantee instant view update? Also, does it happen after finishing the transaction by the hibernate?
I have Postgresql as my primary database and I would like to take advantage of the Elasticsearch as a search engine for my SpringBoot application.
Problem: The queries are quite complex and with millions of rows in each table, most of the search queries are timing out.
Partial solution: I utilized the materialized views concept in the Postgresql and have a job running that refreshes them every X minutes. But on systems with huge amounts of data and with other database transactions (especially writes) in progress, the views tend to take long times to refresh (about 10 minutes to refresh 5 views). I realized that the current views are at it's capacity and I cannot add more.
That's when I started exploring other options just for the search and landed on Elasticsearch and it works great with the amount of data I have. As a POC, I used the Logstash's Jdbc input plugin but then it doesn't support the DELETE operation (bummer).
From here the soft delete is the option which I cannot take because:
A) Almost all the tables in the postgresql DB are updated every few minutes and some of them have constraints on the "name" key which in this case will stay until a clean-up job runs.
B) Many tables in my Postgresql Db are referenced with CASCADE DELETE and it's not possible for me to update 220 table's Schema and JPA queries to check for the soft delete boolean.
The same question mentioned in the link above also provides PgSync that syncs the postgresql with elasticsearch periodically. However, I cannot go with that either since it has LGPL license which is forbidden in our organization.
I'm starting to wonder if anyone else encountered this strange limitation of elasticsearch and RDMS.
I'm open to other options rather than elasticsearch to solve my need. I just don't know what's the right stack to use. Any help here is much appreciated!
I have some set of tables which has 20 million records in a postgres server. As of now i m migrating some table data from one server to another server using insert and update queries with dependent tables in functions. It takes around 2 hours even after optimizing the query. I need a solution to migrate the data faster by using mongodb or cassandra. How?
Try putting your updates and inserts into a file and then load the file. I understand Postgresql will optimise loading the file contents. It's always worked for me although I haven't used that quantity.
We have a Spring Boot project that uses Spring-JPA for data access. We have a couple of tables where we create/update rows once (or a few times, all within minutes). We don't update rows that are older than a day. These tables (like audit table) can get very large and we want to use Postgres' table partitioning features to help break up the data by month. So the main table always has this calendar month's data but if the query requires retrieval from previous months it would somehow read it from other partitions.
Two questions:
1) Is this a good idea for archiving older data but still leave it query-able?
2) Does Spring-JPA work with partitioned tables? Or do we have to figure out how to break up the query and do native queries and concatenate the restult set?
Thanks.
I am working with postgres partitioning with Hibernate & Spring JPA for a period of time. So I think, I can try to answer your questions.
1) Is this a good idea for archiving older data but still leave it query-able?
If you are applying indexes and not re-indexing table frequently, then partitioning of data may result faster query results.
Also you can use clustered index feature in postgres as well to fetch the data faster.
Because table with older data will not going to be updated, so clustered index will improve the performance efficiently.
2) Does Spring-JPA work with partitioned tables? Or do we have to figure out how to break up the query and do native queries and concatenate the restult set?
Spring JPA will work out of the box with partitioned table. It will retrieve the data from master as well as child tables and returns the concatenated result set.
Note : Issue with partitioned table
The only issue you will face with partitioned table is insertion in partitioned table.
Let me explain, when you partition a table, you will create a trigger over master table, and that trigger will return null. This is the key behind insertion issue in partitioned table using Spring JPA / Hibernate.
When you try to insert a row using Spring JPA or Hibernate you will face below issue
Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1
To overcome this issue you need to override implementation of Batching batcher.
In hibernate you can provide the custom implementation of batcher factory using below configuration
hibernate.jdbc.factory_class=path.to.my.batcher.factory.implementation
In spring JPA you can achieve the same by custom implementation of batch builder using below configuration
hibernate.jdbc.batch.builder=path.to.my.batch.builder.implementation
References :
Custom Batch Builder/Batch in Spring-JPA
Demo Application
In addition to the #Anil Agrawal answer.
If you are using spring boot 2 then you need to define the customBatcher using the property.
spring.jpa.properties.hibernate.jdbc.batch.builder=net.xyz.jdbc.CustomBatchBuilder
You do not have to break down the JDBC query with postgres 11+.
If you execute select on the main table with plain jdbc, the DB would return the aggregated results from the partitioned tables.
In other words, the work is done by the Postgres DB, so Spring JPA will simply get the result and map it to objects as if there were no partitioning.
For having inserts work in a partitioned table you need to make sure that your partitions are already created, i think spring data will not create them for you.
I have just started using JPA (with EclipseLink implementation). I have a very simply select query, like
(1) entityManager.find(SomeEntity.class, SomeEntityPK);
(2) entityManager.createQuery("Select x from SomeEntity x where x.isDefault = true").getResultList();
The number of records in SomeEntity table is approx 50 (very small table).
Query (1) initially takes 3s, but subsequent hit just takes 200ms. Obviously cache is at play.
However Query (2) takes 2s for all invocations- wonder why cache is not used. I understand Query (those not using Id or Index) always hits DB and Entity relationships are utilized from Cache.
Is there any way to improve the performance? A simple JDBC select just takes <300ms to fetch data for Query (2).
[UPDATE]
I think I have solved the issue. One of the columbs in table 'SomeEntity' was Oracle XMLType. Due to some issue, I had to remove this field and instead use a CLOB field to store xml data. and voila, JPA suddenly started caching the query result. Although I don't know reason why JPA doesn't caches XMLType.
EclipseLink has a number of caches at different levels that can be used. I think the query cache is what you might be looking for described here
http://docs.oracle.com/cd/E25054_01/core.1111/e10108/toplink.htm#BCGEGHGE
And explained a bit here
http://wiki.eclipse.org/Introduction_to_EclipseLink_Queries_%28ELUG%29#How_to_Cache_Query_Results_in_the_Query_Cache