What is the relation between a thread and a transaction in JPA? - spring-data-jpa

I am aware that JPA works with the default isolation level set for the database, if no isolation level is explicitly specified using the @Transactional annotation.
So, for a simple JPA query like findById(someId), is the transaction limited to the JPA query, or is the transaction applicable throughout the request thread?
If I execute the findById() method twice in the same thread, does it execute within the same transaction?

If you don't specify transaction boundaries with annotations or programmatic transactions, then each query executes in its own transaction.
JPA flushes before the transaction commits, and the persistence context (the first-level cache) is discarded when the transaction ends, so each findById makes its own database query rather than reusing a cached result. If you call findById twice, it will result in two queries.
You can verify this by turning on logging of transactions; see Showing a Spring Transaction in log.
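To make both lookups share one transaction (and one persistence context), wrap them in a @Transactional method. A minimal sketch, assuming a hypothetical User entity and Spring Data UserRepository:

```java
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class UserService {

    private final UserRepository userRepository; // hypothetical repository

    public UserService(UserRepository userRepository) {
        this.userRepository = userRepository;
    }

    // Without @Transactional, each findById below would open and commit its
    // own transaction and issue its own SELECT.
    @Transactional(readOnly = true)
    public void readTwice(Long id) {
        User first = userRepository.findById(id).orElseThrow();
        // Same transaction, same persistence context: this second call is
        // served from the first-level cache, so no second SELECT is issued.
        User second = userRepository.findById(id).orElseThrow();
    }
}
```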
Isolation level is a different issue from transaction boundaries. Most transaction properties (the A, C, and D in ACID) are all-or-nothing, but isolation isn't: it can be dialed up or down. The isolation level determines how changes made in one transaction become visible to other transactions in progress.

Related

Are two identical sequential reads in a Postgres transaction guaranteed to return the same records?

I was wondering if transactions in Postgres freeze the state of a table, similar to the way a point-in-time search does in Elasticsearch.
If I have a query with a WHERE clause and, inside a transaction, I run it first as a count(*), then as a SELECT, and finally as an UPDATE, do I need to be worried about a different process inserting a record into the db and throwing off the results?
In the default transaction isolation level READ COMMITTED, each statement in a transaction sees a different state (snapshot) of the database.
If you want all statements in a transaction to see the same snapshot, you will have to use the REPEATABLE READ isolation level. However, there is the possibility that concurrent data manipulations cause the UPDATE to fail with a serialization error, forcing you to repeat the transaction.
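From Spring, the isolation level can be requested per transaction. A minimal sketch (hypothetical AccountRepository and service names) in which both statements are guaranteed to see the same snapshot:

```java
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Isolation;
import org.springframework.transaction.annotation.Transactional;

@Service
public class ReportService {

    private final AccountRepository accountRepository; // hypothetical repository

    public ReportService(AccountRepository accountRepository) {
        this.accountRepository = accountRepository;
    }

    // Under READ COMMITTED the two counts could differ if another transaction
    // commits an insert in between; under REPEATABLE READ they cannot. Be
    // prepared to retry the whole method on a serialization error.
    @Transactional(isolation = Isolation.REPEATABLE_READ)
    public long stableCount() {
        long before = accountRepository.count(); // statement 1
        long after = accountRepository.count();  // statement 2: same snapshot
        assert before == after;
        return before;
    }
}
```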

handle sql exception for large data insert

I have a Spring 2.5 application that takes a large (275K) file and parses it. Each record is then inserted into a Postgres db. There is a unique column (not the primary key/@Id) that will kick out the attempted record insert. This results in a DataIntegrityViolationException, which seems natural enough.
The problem I have is that this kills the process. Is there a good way to continue processing the entire file, and just log the exception and move on to the next record to insert? I tried wrapping repository.save(record) in a try/catch, but it still kills the process with a transaction rollback.
A ConstraintViolationException will be wrapped in a PersistenceException, and Hibernate will generally mark the transaction for rollback, even if the exception was registered not to cause a rollback at the Spring transaction-handling level, e.g. via @Transactional(noRollbackFor = PersistenceException.class).
So there needs to be a different solution. Some ideas:
1. explicitly check whether a corresponding row is already present (one additional select per item)
2. try every insert in a dedicated transaction, e.g. by annotating a corresponding service method with @Transactional(propagation = Propagation.REQUIRES_NEW) (one additional transaction per item); see the sketch below
3. handle the constraint violation in a custom DB statement (e.g. ON CONFLICT DO NOTHING or other "upsert"/"merge" behavior the DB offers)
The 1st and the 2nd option should offer some potential for parallelization, since the selects/inserts can be issued independently of each other and there is no need to wait for unrelated DB roundtrips.
The 3rd option could be the fastest, as it requires no selects and the least amount of DB roundtrips, and statements could be batched; however, it probably also needs the most custom setup: Spring JPA bulk upserts is slow (1,000 entities took 20 seconds). (Reporting back the number, or even which entities were actually inserted, would likely increase the complexity further: How can I get the INSERTED and UPDATED rows for an UPSERT operation in postgres.)
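A minimal sketch of the 2nd option, with hypothetical ImportRecord/ImportRecordRepository names. Note that the REQUIRES_NEW method has to live in a different bean than the loop; a self-invocation would bypass the Spring proxy and no new transaction would be started:

```java
import java.util.List;

import org.springframework.dao.DataIntegrityViolationException;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Propagation;
import org.springframework.transaction.annotation.Transactional;

@Service
public class RecordInserter {

    private final ImportRecordRepository repository; // hypothetical repository

    public RecordInserter(ImportRecordRepository repository) {
        this.repository = repository;
    }

    // Each record gets its own transaction: a constraint violation rolls back
    // only this single insert, not the whole import.
    @Transactional(propagation = Propagation.REQUIRES_NEW)
    public void insertOne(ImportRecord record) {
        repository.save(record);
    }
}

@Service
class FileImporter {

    private final RecordInserter inserter;

    FileImporter(RecordInserter inserter) {
        this.inserter = inserter;
    }

    public void importAll(List<ImportRecord> records) {
        for (ImportRecord record : records) {
            try {
                inserter.insertOne(record);
            } catch (DataIntegrityViolationException e) {
                // Log the duplicate and continue with the next record.
            }
        }
    }
}
```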

EF consistency between two or more reads

On this page in Microsoft's documentation on EF it is stated literally:
Entity Framework does not wrap queries in a transaction
If I am right, this means that SQL reads are not wrapped in transactions and thus every select in our code is executed independently. But if this is so, can we ensure that two reads are consistent with each other? In the typical scenario, is there a guarantee that the sum of the loaded amount of A and the loaded amount of B will be right (in some connection) if a transfer between A and B is started (in a different connection) between the read of A and the read of B? Would Entity Framework be able to solve this case in some way?
The built-in solution in EF is client-side optimistic concurrency. On update EF will build a query that ensures that the row to be updated has not been changed since it was read.
Properties configured as concurrency tokens are used to implement optimistic concurrency control: whenever an update or delete operation is performed during SaveChanges, the value of the concurrency token on the database is compared against the original value read by EF Core. If the values match, the operation can complete. If the values do not match, EF Core assumes that another user has performed a conflicting operation and aborts the current transaction.
You can also opt in to transactions at whatever isolation level you choose, which may provide similar protections. Or use raw SQL queries with lock hints for your target database.
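Since this thread is tagged spring-data-jpa, it may help to note that the JPA analogue of EF's concurrency tokens is a @Version attribute. A minimal sketch with a hypothetical Account entity:

```java
import java.math.BigDecimal;

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class Account {

    @Id
    private Long id;

    private BigDecimal balance;

    // The provider appends "AND version = ?" to the UPDATE; if no row matches
    // because another transaction changed it first, an OptimisticLockException
    // is thrown instead of silently overwriting the concurrent change.
    @Version
    private long version;
}
```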

JPA locks and database isolation levels

Is there any mutual influence between JPA locks (optimistic/pessimistic) and database isolation levels (for example http://www.postgresql.org/docs/9.1/static/transaction-iso.html)?
The EJB 3.2 spec (8.3.2 "Isolation levels") says that the Bean Provider is responsible for setting the isolation level of a transaction, so generally I shouldn't care, but I am still confused anyway. For example, in PostgreSQL, according to the mentioned source, the default isolation level is "read committed". Does this mean that when I do not lock any entity, the transaction isolation level will still be "read committed"?
By having a @Version column on your entities and using no locking (the equivalent of using LockModeType.NONE), you are implicitly working with READ_COMMITTED isolation. This is achieved in the JPA layer because all updates are usually deferred until commit time, or an OptimisticLockException is thrown in case of an update conflict (I'm still assuming no explicit locking).
It assumes ... that writes to the database will typically occur only when the flush method has been invoked—whether explicitly by the application, or by the persistence provider runtime in accordance with the flush mode setting
On the database layer, the JPA specification also assumes you have READ_COMMITTED isolation.
It assumes that the databases to which persistence units are mapped will be accessed by the implementation using read-committed isolation (or a vendor equivalent in which long-term read locks are not held)
Of course, manual flush/refresh, queries, and the flush mode settings (AUTO, COMMIT) complicate the situation. Second-level and query cache configuration might also play a role. However, with all defaults, JPA READ_COMMITTED behaves pretty predictably, and as a rule of thumb it is safe to accompany it with READ_COMMITTED isolation at the db level.
In order to achieve REPEATABLE_READ with JPA you have to use locks (but that's another story).
Lock modes are intended to provide a facility that enables the effect of “repeatable read” semantics
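With Spring Data JPA, a lock mode can be requested declaratively on a repository method (hypothetical repository; plain EntityManager.find(Account.class, id, lockMode) works as well):

```java
import java.util.Optional;

import javax.persistence.LockModeType;

import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Lock;

public interface AccountRepository extends JpaRepository<Account, Long> {

    // PESSIMISTIC_READ takes a shared database lock (SELECT ... FOR SHARE on
    // PostgreSQL), so the row cannot change under this transaction;
    // LockModeType.OPTIMISTIC would instead re-check the @Version at commit.
    @Lock(LockModeType.PESSIMISTIC_READ)
    Optional<Account> findWithLockById(Long id);
}
```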

concurrent transaction management in EJB/JPA

I am working on EJB 3.0, where entity beans are managed by JPA. My question is: if two or more users try to insert into the same table using the same form at the same time, how will JPA handle that situation?
It will manage it just fine, by using database transactions. If two threads try to create the same row (i.e. with the same primary key) at the same time, one will succeed, and the other will get an exception from the database, which will cause a rollback of its transaction. That means that all the other inserts, updates and deletes made in the same transaction will also be rolled back, or cancelled if you prefer, leaving the database in a coherent state. That's the A in ACID.
If two threads insert two different rows at the same time in the same table, then the database will handle that just fine, and both rows will be inserted.
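A minimal sketch (hypothetical bean and entity names) of the container-managed transaction behaviour described above:

```java
import javax.ejb.Stateless;
import javax.ejb.TransactionAttribute;
import javax.ejb.TransactionAttributeType;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

@Stateless
public class OrderBean {

    @PersistenceContext
    private EntityManager em;

    // REQUIRED is the default: the container starts a JTA transaction if the
    // caller has none. If two users persist a row with the same primary key
    // concurrently, one commit fails and only that transaction is rolled back.
    @TransactionAttribute(TransactionAttributeType.REQUIRED)
    public void placeOrder(Order order) { // Order is a hypothetical JPA entity
        em.persist(order);
    }
}
```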