JPA (with EclipseLink) Performance issue - jpa

I have just started using JPA (with EclipseLink implementation). I have a very simply select query, like
(1) entityManager.find(SomeEntity.class, SomeEntityPK);
(2) entityManager.createQuery("Select x from SomeEntity x where x.isDefault = true").getResultList();
The number of records in SomeEntity table is approx 50 (very small table).
Query (1) initially takes 3s, but subsequent hit just takes 200ms. Obviously cache is at play.
However Query (2) takes 2s for all invocations- wonder why cache is not used. I understand Query (those not using Id or Index) always hits DB and Entity relationships are utilized from Cache.
Is there any way to improve the performance? A simple JDBC select just takes <300ms to fetch data for Query (2).
[UPDATE]
I think I have solved the issue. One of the columbs in table 'SomeEntity' was Oracle XMLType. Due to some issue, I had to remove this field and instead use a CLOB field to store xml data. and voila, JPA suddenly started caching the query result. Although I don't know reason why JPA doesn't caches XMLType.

EclipseLink has a number of caches at different levels that can be used. I think the query cache is what you might be looking for described here
http://docs.oracle.com/cd/E25054_01/core.1111/e10108/toplink.htm#BCGEGHGE
And explained a bit here
http://wiki.eclipse.org/Introduction_to_EclipseLink_Queries_%28ELUG%29#How_to_Cache_Query_Results_in_the_Query_Cache

Related

handle sql exception for large data insert

I have a Spring 2.5 application that takes a large (275K) file and parses it. Each record is then inserted into a Postgres db. There is a unique column (not the primaryKey/#Id) that will kick out the attempted record insert. This results in a DataContraintViolationException, which seems natural enough.
The problem I have is this kills the process. Is there a good way to continue processing the entire file, and just log the exception and move onto the next record for insert? I tried wrapping the respository.save(record) in a try/catch, but it still kills the process with a transaction rollback.
A ConstraintViolationException will be wrapped in a PersistenceException and Hibernate will generally mark the transaction for rollback - even if the exception was registered to not cause a rollback at the spring transaction handling level, e.g. via #Transactional(noRollbackFor = PersistenceException.class).
So there needs to be a different solution. Some ideas:
explicitly look whether a corresponding row is already present (one additional select per item)
try every insert in a dedicated transaction (e.g. annotating a corresponding service method with #Transactional(propagation = Propagation.REQUIRES_NEW) (one additional transaction per item)
handle the constraint violation in a custom DB statement (e.g. ON CONFLICT DO NOTHING / other "upsert" / "merge" behavior the DB offers)
The 1st and the 2nd option should offer some potential for parallelization, since selects / inserts can be issued independently from each other and there is no need to wait for unrelated DB roundtrips.
The 3rd option could be the fastest, as it requires no selects, the least amount of DB roundtrips, and statements could be batched; however it probably also needs the most amount of custom setup: Spring JPA bulk upserts is slow (1,000 entities took 20 seconds) (Reporting back which number or even which entities were actually inserted would likely even increase the complexity: How can I get the INSERTED and UPDATED rows for an UPSERT operation in postgres)

Multiple updates performance improvement

I have built an application with Spring Boot and JPA to migrate a Jira postgres database.
Basically, I have 5000 users that I need to migrate. Each user means 67 update queries in different tables.
Each query uses the LOWER function to compare ignoring case.
Some pseudo-code:
for (user : users){
for (query : queries) {
jdbcTemplate.execute(query.replace(user....
I ignore any errors, so if a single query fails, I still go on and execute the other 66.
I am running this in 10 separate threads and each user is taking roughly 120 seconds to migrate. (20 threads resulted in database dead lock)
At this pace, it's gonna take more than a day, which is not acceptable (I am running this in a test environment before doing in production).
The queries looks like this:
UPDATE table SET column = 'NEWUSERNAME' where LOWER(column) = LOWER('CURRENTUSERNAME');
Is there anything I can do to try and optimize this migration?
UPDATE:
I changed my approach. First, I select every element with the CURRENTUSERNAME and get it's ID. Then I create the UPDATE queries using the ID as the "where" clause.
Other than that, it is still taking a long time (4+ hours) to execute.
I am running millions of UPDATEs, each at a time. I know jdbcTemplate has a bulk method, but if a single UPDATE fails, I believe it roll's back every successful update too. Also, I am not aware of the performance improvement it would have, if any.
So, to update the question, given that I have millions of UPDATE queries to run, what would be the best way execute them? (bulk, multi threading, something else)

Can Spring-JPA work with Postgres partitioning?

We have a Spring Boot project that uses Spring-JPA for data access. We have a couple of tables where we create/update rows once (or a few times, all within minutes). We don't update rows that are older than a day. These tables (like audit table) can get very large and we want to use Postgres' table partitioning features to help break up the data by month. So the main table always has this calendar month's data but if the query requires retrieval from previous months it would somehow read it from other partitions.
Two questions:
1) Is this a good idea for archiving older data but still leave it query-able?
2) Does Spring-JPA work with partitioned tables? Or do we have to figure out how to break up the query and do native queries and concatenate the restult set?
Thanks.
I am working with postgres partitioning with Hibernate & Spring JPA for a period of time. So I think, I can try to answer your questions.
1) Is this a good idea for archiving older data but still leave it query-able?
If you are applying indexes and not re-indexing table frequently, then partitioning of data may result faster query results.
Also you can use clustered index feature in postgres as well to fetch the data faster.
Because table with older data will not going to be updated, so clustered index will improve the performance efficiently.
2) Does Spring-JPA work with partitioned tables? Or do we have to figure out how to break up the query and do native queries and concatenate the restult set?
Spring JPA will work out of the box with partitioned table. It will retrieve the data from master as well as child tables and returns the concatenated result set.
Note : Issue with partitioned table
The only issue you will face with partitioned table is insertion in partitioned table.
Let me explain, when you partition a table, you will create a trigger over master table, and that trigger will return null. This is the key behind insertion issue in partitioned table using Spring JPA / Hibernate.
When you try to insert a row using Spring JPA or Hibernate you will face below issue
Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1
To overcome this issue you need to override implementation of Batching batcher.
In hibernate you can provide the custom implementation of batcher factory using below configuration
hibernate.jdbc.factory_class=path.to.my.batcher.factory.implementation
In spring JPA you can achieve the same by custom implementation of batch builder using below configuration
hibernate.jdbc.batch.builder=path.to.my.batch.builder.implementation
References :
Custom Batch Builder/Batch in Spring-JPA
Demo Application
In addition to the #Anil Agrawal answer.
If you are using spring boot 2 then you need to define the customBatcher using the property.
spring.jpa.properties.hibernate.jdbc.batch.builder=net.xyz.jdbc.CustomBatchBuilder
You do not have to break down the JDBC query with postgres 11+.
If you execute select on the main table with plain jdbc, the DB would return the aggregated results from the partitioned tables.
In other words, the work is done by the Postgres DB, so Spring JPA will simply get the result and map it to objects as if there were no partitioning.
For having inserts work in a partitioned table you need to make sure that your partitions are already created, i think spring data will not create them for you.

How many rows in a SQL Server table is too much?

I am parsing and storing some OSM (open street map) data in a SQL table, using Entity framework.
I've estimated there will be around 11 million records in this table. Which will be bound a to a model with EF etc. Is this too many?
What can I do to make this amount of data useable and not too slow?
The total number of rows in the DB is not the deciding factor regarding EF usage. What does become an issue with EF is when you need to work with many records at once. If you regularly manipulate many records at once, eg insert 10k, delete 10k, or update 10k at once, daily, then you will want to use SQL stored procs.
With Context, context objects and proxies and even change tracking, all nice with small transactions, large volume activity becomes slow.
My personal rule of thumb is around 1000 objects loaded at once. Use direct sql.
I use direct SQL side by side EF. I use EF for 95% of activity.
For data loads, extracts, table copies etc, all with SQL scripts/Sps.
Also Since EF6 you can tell EF to add extra indexes beyond the foreign keys, so that SQL generated performs ok.

Eclipselink Batch vs fetch-join reading in Fetch Groups is this possible?

I have a situation whereby I would like to create a query against an entity using EclipsLink JPA, I require 5 fields from this entity of which it has many. 2 of those fields are joined OneToMany relationships. I only require 2 primitive fields from each of the joins.
What is the most efficient way to do this?
I've considered a number of possiblities, batch reading seems the best bet based on what I have read however I believe this will only work if I retrieve the full entity i.e. SELECT a FROM Entity a... and the reason I don't want to do this is I have LOB and BLOB types that will eat dangerously into the memory.
Join-fetch is another but the entity has ~10 joined tables and I don't want to duplicate all of this data.
I have been using fetch groups (http://wiki.eclipse.org/EclipseLink/Examples/JPA/AttributeGroup) and specifying the fields I want which causes cached lazing loading. This is workable and the memory footprint is better. The issue is though when I do entity.getCollection() it must do a single SQL statement for each call and this is where I feel it is inefficient. If I could do SELECT a.Field, a.Field2 from Entity A using some form of batching or join-fetching or better still apply this to my fetch group this would be best I would imagine but not sure if I could ensure that it would not load all related tables and only give me the ones I want.
Help/thoughts would be much appreciated.
I think batch fetching with also work with nested fetch groups, did you try this?
You can also set a defaultFetchGroup on your FetchGroupManager (either directly or by adding fetch=LAZY to your fields you do not want in your fetch group, i.e. add fetch=LAZY to your LOB fields).