Issue with timing of SQL query firing in JPA

I am using JPA for data persistence.
I am unable to explain a behaviour in my program.
I have an entity A which has another entity B as its member. In my code I create a new instance of A, set an instance of B (fetched from the database) on it, and then save A using the EntityManager. I am using container-managed transactions, so the transaction is supposed to commit at the end of the method.
In the very same method, after persisting A, I try to fetch an entity of class C. C, like A, has B as its member. I use a JPQL query to fetch C by the id of the B instance I associated with A's instance previously.
The issue is that while fetching C, JPA also executes the SQL query to save A. I expect that to happen at the end of the transaction (i.e. when the method ends).
But it happens while I try to fetch C. If I don't fetch C, the SQL query for saving A is issued when the method ends.
What can be the reason for this behaviour?

The JPA provider needs to flush the persistence context before query execution if there is a possibility that the query results would otherwise not be consistent with the current persistence context state.
You can set the flush mode to COMMIT for the desired queries (or the whole persistence context). Just keep in mind to flush manually if a query depends on dirty persistence context state. The default flush mode is AUTO, meaning the persistence context may be flushed before query execution.
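As an illustration, here is a minimal sketch of both options using the standard JPA API; the C entity and its b association are made up to match the question:

import javax.persistence.EntityManager;
import javax.persistence.FlushModeType;
import javax.persistence.TypedQuery;

public class FlushModeExample {

    // Query-level COMMIT: the provider will not flush pending changes
    // (e.g. the newly persisted A) before running this particular query.
    public long countWithoutTriggeringFlush(EntityManager em, Long bId) {
        TypedQuery<Long> query = em.createQuery(
                "select count(c) from C c where c.b.id = :bId", Long.class);
        query.setParameter("bId", bId);
        query.setFlushMode(FlushModeType.COMMIT);
        return query.getSingleResult();
    }

    // Persistence-context-level COMMIT: queries no longer trigger a flush;
    // flush by hand whenever a query must see the pending changes.
    public void deferFlushing(EntityManager em) {
        em.setFlushMode(FlushModeType.COMMIT);
        // ... modify entities ...
        em.flush(); // only needed if a later query depends on the changes
    }
}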

The reason is the database isolation level. It's READ COMMITTED by default.
Read more about isolation levels here:
https://en.wikipedia.org/wiki/Isolation_%28database_systems%29#Read_committed
So as not to break this isolation, JPA must execute all buffered SQL statements, so that all data changed in the transaction has reached the database before the query runs.

Related

Spring data jpa - how to guarantee flush order?

I have a quite complex save process using Spring Data JPA repositories in one transaction:
mainRepo.save();
relatedRepo1.save();
relatedRepoOfRelatedRepo1.save();
...
And in the end I call (on mainRepo):
@Modifying
@Query("update mainEntity set finished = true where id = :id")
void setFinishedTrue(@Param("id") UUID id);
I want to guarantee that when setFinishedTrue(id) is called, all the related data is actually in the database, because it will start an integration process that requires all the needed data to be available.
If you are using standard settings, JPA will flush data before executing queries, so you are fine.
If you want to be really, really sure, you can add an explicit flush operation.
You can do this either by using the JpaRepository.flush operation or by injecting the EntityManager and calling flush on it explicitly.
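For illustration, a minimal sketch of the explicit variant; the repository and entity types are the hypothetical ones from the question:

import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import org.springframework.transaction.annotation.Transactional;

public class SaveProcess {

    @PersistenceContext
    private EntityManager entityManager;

    private MainRepo mainRepo; // hypothetical repository from the question

    @Transactional
    public void saveAllAndFinish(MainEntity main) {
        mainRepo.save(main);
        // ... save related entities ...

        // Force all pending INSERTs/UPDATEs out to the database now,
        // before the bulk update below runs. mainRepo.flush() would do
        // the same via JpaRepository.
        entityManager.flush();

        mainRepo.setFinishedTrue(main.getId());
    }
}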

What's the difference between DbUpdateConcurrencyException and 40001 postgres code?

I have two transactions.
In the first I select an entity, do validations, upload a file provided by the client to S3, and then update this entity with info about the S3 file.
The second transaction simply deletes this entity.
Now assume that someone called the first transaction and immediately the second. The second one proceeds faster, and the first one throws DbUpdateConcurrencyException, as the selected entity no longer exists when the update query runs.
I get DbUpdateConcurrencyException when my transaction has IsolationLevel.ReadCommitted. But if I set IsolationLevel.Serializable, it throws InvalidOperationException with the 40001 Postgres code. Could someone explain why I get different errors? It seems to me that the outcome should be the same, as both errors are caused by updating a non-existing entity.
The 40001 error corresponds to the SQLSTATE serialization_failure (see the table of error codes).
It's generated by the database engine in serializable isolation level when it detects that there are concurrent transactions and this transaction may have produced a result that could not have been obtained if the concurrent transactions had been run serially.
When using IsolationLevel.ReadCommitted, it's impossible to obtain this error, because choosing this isolation level precisely means that the client side doesn't want these isolation checks to be done by the database.
On the other hand, the DbUpdateConcurrencyException is probably not generated by the database engine. It's generated by Entity Framework. The database itself is fine with an UPDATE updating zero rows; it's not an error at the SQL level.
I think you get the serialization failure if the database errors out first, and the DbUpdateConcurrencyException error if the database doesn't error out but the next layer up (EF) does.
The typical way to deal with serialization failures, at the serializable isolation level, is for the client-side to retry the transaction when it gets a 40001 error. The retried transaction will have a fresh view of the data and hopefully will pass (otherwise, loop on retrying).
The typical way to deal with concurrency at lesser isolation levels like Read Committed is to explicitly lock objects before accessing them, to force the serialization of concurrent transactions.
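The retry pattern is the same regardless of the client stack; here is a minimal sketch in plain JDBC terms (the TransactionalWork callback and the DataSource are assumptions for the example):

import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

public class SerializationRetry {

    // PostgreSQL SQLSTATE for serialization_failure.
    private static final String SERIALIZATION_FAILURE = "40001";

    public static void runWithRetry(DataSource ds, TransactionalWork work)
            throws SQLException {
        while (true) {
            try (Connection con = ds.getConnection()) {
                con.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
                con.setAutoCommit(false);
                try {
                    work.execute(con);
                    con.commit();
                    return; // success
                } catch (SQLException e) {
                    con.rollback();
                    if (!SERIALIZATION_FAILURE.equals(e.getSQLState())) {
                        throw e; // a different error: do not retry
                    }
                    // 40001: loop and retry with a fresh view of the data
                }
            }
        }
    }

    @FunctionalInterface
    public interface TransactionalWork {
        void execute(Connection con) throws SQLException;
    }
}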

When does flush and clear commit?

I'm using JPA EclipseLink 2.0 with Glassfish 3.1.2.2
I want to know if after I call
em.flush()
em.clear()
the objects are immediately committed to the database. My problem is that I'm doing so many transactions that I'm getting an OutOfMemoryError. I want to avoid this by flushing the transaction's objects.
After I flush and clear, I can't see any entity immediately committed to the database; I can only see them AFTER the whole process is done, which tells me this isn't actually committing.
If flush and clear don't commit:
1) What do they actually do?
2) Why am I no longer getting the OutOfMemoryError?
Please tell me if I'm right:
The objects that were allocated in my RAM are sent to the database, but the changes are not yet committed. This only means I cleared my RAM; the objects are now on the DB server but the transaction is not yet committed.
Entities are synchronized to the connected database at transaction commit time. If you only have one ongoing transaction (here: JTA/container-managed), changes on one or more entities get written to the DB the moment you call flush() on the EntityManager instance.
However, changes become "visible" only after the transaction has been properly committed by the container (here: Glassfish), which is responsible for transaction handling. For reference, see section 7.6.1 (p. 294) of the JPA 2.0 Spec, which defines:
A new persistence context begins when the container-managed entity manager is invoked (Specifically, when one of the methods of the EntityManager interface is invoked) in the scope of an active JTA transaction, and there is no current persistence context already associated with the JTA transaction. The persistence context is created and then associated with the JTA transaction.
The persistence context ends when the associated JTA transaction commits or rolls back, and all entities that were managed by the EntityManager become detached.
In section 3.2.4 (Synchronization to the Database) of the JPA Spec 2.0 we find:
The state of persistent entities is synchronized to the database at transaction commit.
[..]
The persistence provider runtime is permitted to perform synchronization to the database at other times as well when a transaction is active. The flush method can be used by the application to force synchronization.
It applies to entities associated with the persistence context. The EntityManager and Query setFlushMode methods can be used to control synchronization semantics. The effect of FlushModeType.AUTO is defined in section 3.8.7. If FlushModeType.COMMIT is specified, flushing will occur at transaction commit; the persistence provider is permitted, but not required, to perform flush at other times. If there is no transaction active, the persistence provider must not flush to the database.
Most likely in your scenario, the container (Glassfish) and/or your application is configured for FlushModeType.COMMIT(*1). In case FlushModeType.AUTO is in place, it is up to the Persistence Provider (EclipseLink) which "is responsible for ensuring that all updates to the state of all entities in the persistence context which could potentially affect the result of the query are visible to the processing of the query." (Section 3.8.7, p. 122)
By contrast, the clear() method does NOT commit anything by itself. It simply detaches all managed entities from the current persistence context, thus causing any changes on entities which have not been flushed yet to get lost. For reference, see p. 70 of the linked JPA Spec.
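This is also why the classic remedy for an OutOfMemoryError during mass persistence is to flush and clear in batches. A minimal sketch, assuming a single long transaction and a made-up Item entity:

import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

public class BatchImporter {

    private static final int BATCH_SIZE = 50;

    @PersistenceContext
    private EntityManager em;

    // Persist a large number of entities without accumulating them all
    // in the persistence context (and thus in memory).
    public void importAll(List<Item> items) {
        int i = 0;
        for (Item item : items) {
            em.persist(item);
            if (++i % BATCH_SIZE == 0) {
                em.flush(); // push pending INSERTs to the DB (not yet committed)
                em.clear(); // detach the flushed entities so they can be GC'd
            }
        }
        // whatever remains is flushed automatically at commit
    }
}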
With respect to the OutOfMemoryError, it's hard to tell what's causing this under which circumstances, as you did not provide much detail either. However, I would:
read the aforementioned sections of the JPA specification
check how your environment is configured and
reevaluate how your application is written/implemented, potentially making false assumptions on the transaction handling of the container it is running in.
Related to 2., you might check your persistence.xml whether it configures
<property name="eclipselink.persistence-context.flush-mode" value="COMMIT" />
and change it to AUTO to see if there is any difference.
Hope it helps.
Footnotes
*1: But that's a good guess, as you did not provide that much detail on your setup/environment.
On transaction commit, JPA flushes automatically. You should see the objects in the DB right after the first transaction ends, not only after the whole process ends. Check whether you really run multiple transactions or just one.

Create new or update existing entity at one go with JPA

I have a JPA entity that has a timestamp field and is distinguished by a complex identifier field. What I need is to update the timestamp in an entity that has already been stored, and otherwise create and store a new entity with the current timestamp.
As it turns out, the task is not as simple as it seems at first sight. The problem is that in a concurrent environment I get a nasty "Unique index or primary key violation" exception. Here's my code:
// Load existing entity, if any.
Entity e = entityManager.find(Entity.class, id);
if (e == null) {
    // Could not find an entity with the specified id in the database, so create a new one.
    e = entityManager.merge(new Entity(id));
}
// Set current time...
e.setTimestamp(new Date());
// ...and finally save the entity.
entityManager.flush();
Please note that in this example the entity identifier is not generated on insert; it is known in advance.
When two or more threads run this block of code in parallel, they may simultaneously get null from the entityManager.find(Entity.class, id) call, so they will attempt to save two or more entities at the same time with the same identifier, resulting in the error.
I think there are a few solutions to the problem.
Sure, I could synchronize this code block with a global lock to prevent concurrent access to the database, but would that be the most efficient way?
Some databases support a very handy MERGE statement that updates an existing row or creates a new one if none exists. But I doubt that OpenJPA (the JPA implementation of my choice) supports it.
Even if JPA does not support SQL MERGE, I can always fall back to plain old JDBC and do whatever I want with the database. But I don't want to leave the comfortable API and mess with a hairy JDBC+SQL combination.
There is a magic trick to fix it using the standard JPA API only, but I don't know it yet.
Please help.
You are referring to the transaction isolation of JPA transactions, i.e. how transactions behave when they access other transactions' resources.
According to this article:
READ_COMMITTED is the expected default Transaction Isolation level for using [..] EJB3 JPA
This means that, yes, you will have problems with the above code.
But JPA doesn't support custom isolation levels.
This thread discusses the topic more extensively. Depending on whether you use Spring or EJB, I think you can make use of the proper transaction strategy.
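One JPA-only mitigation combines a pessimistic lock on the existing row with a retry on the insert race. A hedged sketch using the Entity class from the question; the retry on a unique-key violation must be wired up by the caller in a new transaction:

import java.util.Date;
import javax.persistence.EntityManager;
import javax.persistence.LockModeType;

public class UpsertHelper {

    // One attempt at "update if present, else insert". Run inside a
    // transaction; if it fails with a unique-key violation, retry it in a
    // NEW transaction, where find() will see the row the competing
    // thread just inserted.
    public void touch(EntityManager em, Object id) {
        // PESSIMISTIC_WRITE issues SELECT ... FOR UPDATE, so concurrent
        // updaters of an existing row serialize instead of racing.
        Entity e = em.find(Entity.class, id, LockModeType.PESSIMISTIC_WRITE);
        if (e == null) {
            e = new Entity(id); // not found: insert a new row
            em.persist(e);
        }
        e.setTimestamp(new Date());
    }
}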

JPA NamedQuery does not pick up changes to modified Entity

I have a method that retrieves entities using a NamedQuery. I update a value of each entity and then run another named query (in the same method and transaction) filtering by the old value, and it returns the same entities as if I had not changed them.
I understand that the EntityManager needs to be flushed and also that this should happen automatically, but that doesn't make any difference.
I enabled Hibernate SQL logging and can see that the entities are not updated when I call flush, but only when the container transaction commits.
EntityManager entityManager = getPrimaryEntityManager();
MyEntity myEntity = entityManager.find(MyEntityImpl.class, allocationId);
myEntity.setStateId(State.ACTIVE);
// Flush the entity manager to pick up any changes to entity states before we run this query.
entityManager.flush();
// We're telling the persistence provider that we want the query to do automatic
// flushing before this particular query is executed.
Query countQuery = entityManager.createNamedQuery("MyEntity.getCountByState");
countQuery.setParameter("stateId", State.CHECKING);
Long count = (Long) countQuery.getSingleResult();
// Count should be zero but isn't. It doesn't see my change above.
To be honest I'm not that familiar with JPA, but I ran into similar problems with Hibernate's session manager. My fix was to manually remove the specified object from Hibernate's session before querying it again, so it's forced to do a lookup from the database and doesn't get the object from the cache. You might try doing the same with JPA's EntityManager.
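In JPA terms, the equivalents of evicting from the Hibernate session are detach and refresh; a minimal sketch using the names from the question:

import javax.persistence.EntityManager;

public class CacheBypass {

    // Re-read an entity from the database instead of getting the cached
    // instance back from the persistence context (first-level cache).
    public MyEntity reload(EntityManager em, MyEntity myEntity, Long allocationId) {
        // Option 1: overwrite the managed instance's state in place.
        em.refresh(myEntity);

        // Option 2: drop the cached instance and load a fresh copy.
        em.detach(myEntity);
        return em.find(MyEntityImpl.class, allocationId);
    }
}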
I've just had the same issue and discovered two things:
Firstly, you should check the FlushMode for the persistence context and/or the query.
Secondly, make sure that the entity manager is exactly the same object for both transaction management and query execution. In my case, I had a Mockito spy on the entityManager, which was enough to break the transaction management.