EF consistency between two or more reads

On this page in Microsoft's documentation on EF, it is stated literally:
Entity Framework does not wrap queries in a transaction
If I am right, this means that SQL reads are not wrapped in transactions, and thus every SELECT in our code is executed independently. But if this is so, how can we ensure that two reads are consistent with each other? In the typical scenario, is there a guarantee that the sum of the loaded amount of A and the loaded amount of B will be right (in one connection) if a transfer between A and B is started (in a different connection) between the read of A and the read of B? Would Entity Framework be able to solve this case in some way?

The built-in solution in EF is client-side optimistic concurrency. On update, EF will build a query that ensures the row being updated has not changed since it was read.
Properties configured as concurrency tokens are used to implement optimistic concurrency control: whenever an update or delete operation is performed during SaveChanges, the value of the concurrency token on the database is compared against the original value read by EF Core. If the values match, the operation can complete. If the values do not match, EF Core assumes that another user has performed a conflicting operation and aborts the current transaction.
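For example, in EF Core a rowversion column is typically configured as the concurrency token like this (a minimal sketch; the Account entity is invented for illustration):

// assumes: using System.ComponentModel.DataAnnotations;
public class Account
{
    public int Id { get; set; }
    public decimal Amount { get; set; }

    [Timestamp] // maps to a SQL Server rowversion column and acts as the concurrency token
    public byte[] RowVersion { get; set; }
}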
You can also opt in to Transactions at whatever isolation level you choose, which may provide similar protections. Or use Raw SQL queries with lock hints for your target database.
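For the two-reads question specifically, a transaction is what gives you a consistent pair of reads. A minimal sketch, assuming an EF Core DbContext with an Accounts set (names invented); under SERIALIZABLE isolation, a transfer committed on another connection cannot land between the two reads:

// assumes: using System.Data; using System.Linq; using Microsoft.EntityFrameworkCore;
using var tx = context.Database.BeginTransaction(IsolationLevel.Serializable);

var a = context.Accounts.Single(x => x.Id == idA);
var b = context.Accounts.Single(x => x.Id == idB);
var total = a.Amount + b.Amount; // consistent sum

tx.Commit();

On SQL Server, SNAPSHOT isolation (if enabled on the database) gives the same point-in-time consistency without blocking concurrent writers.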

Related

Is optimistic locking equivalent to Select For Update?

It is my first time using EF Core and DDD concepts. Our database is Microsoft SQL Server. We use optimistic concurrency based on the RowVersion for user requests. This handles concurrent reads and writes by users.
With the DDD paradigm, user changes are not written directly to the database, nor is the logic handled in the database with a stored procedure. It is a three-step process:
1. get the aggregate from the repository, which pulls it from the database
2. update the aggregate through domain commands that implement business logic
3. save the aggregate back to the repository, which writes it to the database
The separation of read and write in the application logic can again lead to race conditions between parallel commands.
Since the time between read and write in the backend is normally fairly short, those race conditions can be handled with either optimistic or pessimistic locking.
To my understanding, optimistic concurrency using RowVersion is sufficient for the lost update problem, but not for write skew, as shown in Martin Kleppmann's book "Designing Data-Intensive Applications". Preventing write skew would require locking the records that were read.
To prevent write skew, a common solution is to lock the records in step 1 with FOR UPDATE or, in SQL Server, with the hints UPDLOCK and HOLDLOCK.
EF Core supports neither FOR UPDATE nor SQL Server's WITH hints.
If I'm not able to lock records with EF Core, does that mean there is no way to prevent write skew except using raw SQL or stored procedures?
If I use RowVersion, I first check the RowVersion after getting the aggregate from the database. If it doesn't match, I can fail fast. If it matches, it is checked again by EF Core in step 3 when updating the database. Is this pattern sufficient to eliminate all race conditions except write skew?
Since the write skew race condition occurs when the read and the write are on different records, it seems that a transaction that makes a decision based on a read could always be added later during development. In a complex system I would not feel safe unless it is just simple CRUD access. Is there another solution with EF Core to prevent write skew without locking records for update?
If you tell EF Core about the RowVersion attribute, it will use it in any update statement. But you have to be careful to preserve the RowVersion value from your original data retrieval. The usual work pattern would retrieve the data, let the user potentially edit it, and then save it. When the user saves, you would normally have EF retrieve the entity again, apply the user's changes, and save the updates. EF uses the RowVersion in a WHERE clause to ensure nothing has changed since you read the data. This is the tricky part: you want to make sure the RowVersion compared is still the one from your initial data retrieval, not from the second retrieval used to update the entity before saving; see the sketches below.
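In EF Core, the usual way to implement that is to overwrite the tracked original value with the RowVersion from the first read before saving. A minimal sketch (entity, property, and variable names are invented; rowVersionFromFirstRead is whatever the client sent back with its edit):

// assumes: using Microsoft.EntityFrameworkCore;
var entity = await context.Items.FindAsync(id);   // second retrieval
entity.Name = userInput.Name;                     // apply the user's changes

// Make EF compare against the FIRST read, not this second retrieval.
context.Entry(entity).Property(e => e.RowVersion).OriginalValue = rowVersionFromFirstRead;

try
{
    await context.SaveChangesAsync();             // UPDATE ... WHERE RowVersion = <original>
}
catch (DbUpdateConcurrencyException)
{
    // the row changed since the first read: fail fast, or reload and retry
}

As for the locking part of the question: EF Core has no first-class FOR UPDATE, but a commonly used workaround is to issue the read as raw SQL with lock hints inside an explicit transaction, while still materializing tracked entities. A sketch for SQL Server (table and entity names invented):

using var tx = context.Database.BeginTransaction();
var order = context.Orders
    .FromSqlInterpolated($"SELECT * FROM dbo.Orders WITH (UPDLOCK, HOLDLOCK) WHERE Id = {id}")
    .Single();
// apply domain commands to the aggregate, then:
context.SaveChanges();
tx.Commit();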

Handle SQL exception for large data insert

I have a Spring 2.5 application that takes a large (275K) file and parses it. Each record is then inserted into a Postgres DB. There is a unique column (not the primary key / @Id) that will reject an attempted duplicate insert. This results in a DataIntegrityViolationException, which seems natural enough.
The problem I have is that this kills the process. Is there a good way to continue processing the entire file, and just log the exception and move on to the next record to insert? I tried wrapping repository.save(record) in a try/catch, but it still kills the process with a transaction rollback.
A ConstraintViolationException will be wrapped in a PersistenceException, and Hibernate will generally mark the transaction for rollback, even if the exception was registered not to cause a rollback at the Spring transaction-handling level, e.g. via @Transactional(noRollbackFor = PersistenceException.class).
So there needs to be a different solution. Some ideas:
explicitly check whether a corresponding row is already present (one additional select per item)
try every insert in a dedicated transaction, e.g. by annotating a corresponding service method with @Transactional(propagation = Propagation.REQUIRES_NEW) (one additional transaction per item)
handle the constraint violation in a custom DB statement (e.g. ON CONFLICT DO NOTHING or other "upsert"/"merge" behavior the DB offers; see the sketch below)
The 1st and the 2nd option should offer some potential for parallelization, since selects/inserts can be issued independently of each other and there is no need to wait for unrelated DB round trips.
The 3rd option could be the fastest, as it requires no selects and the fewest DB round trips, and statements could be batched; however, it probably also needs the most custom setup: Spring JPA bulk upserts is slow (1,000 entities took 20 seconds). Reporting back how many or even which entities were actually inserted would likely increase the complexity further: How can I get the INSERTED and UPDATED rows for an UPSERT operation in postgres.
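To make the 3rd option concrete: Postgres's statement-level answer is INSERT ... ON CONFLICT DO NOTHING. A minimal sketch, shown with C#/Npgsql purely for illustration since the SQL is what matters here (table and column names are invented):

// assumes: using Npgsql;
await using var conn = new NpgsqlConnection(connectionString);
await conn.OpenAsync();

// The unique column decides the conflict; duplicates are silently skipped
// instead of raising a constraint violation and poisoning the transaction.
await using var cmd = new NpgsqlCommand(
    "INSERT INTO records (external_id, payload) VALUES (@id, @payload) " +
    "ON CONFLICT (external_id) DO NOTHING", conn);
cmd.Parameters.AddWithValue("id", record.ExternalId);
cmd.Parameters.AddWithValue("payload", record.Payload);
await cmd.ExecuteNonQueryAsync();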

Does a SQL statement ensure atomicity in Postgres?

I have a simple bug in my program that has multi-user support. I'm using knex to build SQL queries, and here is pseudocode that depicts the scenario:
const value = await queryBuilder().readDataFromTheDatabase(); // executes first
// do some other work and derive the new value
await queryBuilder().writeValueToTheDatabase(updateValue(value));
This piece of code is used in a sort of middleware function. And as you can see, there is a possible race condition here: when multiple users hit it at roughly the same time, one of them reads a stale value.
My solution
So, I was thinking a possible solution would be to create a single queryBuilder statement:
queryBuilder().readAndUpdateValueInTheDatabase();
So I'll probably have to use a little bit of plpgsql. I was wondering if this solution will be sufficient. Will the statement be executed atomically? That is, when one request has read but not yet finished its write, does another request wait to both read and write, or does it wait only to write while reading the stale value?
I think what you are looking for here is isolation, not atomicity. You could set all transactions to the highest isolation level, serializable (which is higher than the usual default level). With that level, if data that a transaction read (and presumably relied upon) is changed, then when it tries to commit it might get a serialization failure error. I say "might", because the system could conclude the situation would be consistent with the data change having happened after the commit, in which case the commit is allowed to stand.
To avoid a race condition with such a setup, you must run both the read and the write in the same database transaction.
There are two ways to do that:
Use the default READ COMMITTED isolation level and lock the rows when you read them:
SELECT ... FROM ... FOR NO KEY UPDATE;
That locks the rows against concurrent modifications, and the lock is held until the end of the transaction.
Use the REPEATABLE READ isolation level and don't lock anything. Then your UPDATE will receive a serialization error (SQLSTATE 40001) if somebody modified the row concurrently. In that case, you roll the transaction back and try again in a new REPEATABLE READ transaction.
The first solution is usually better if you expect conflicts frequently, while the second is better if conflicts are rare.
Note that you should keep the database transaction as short as possible in both cases to keep the risk of conflicts low.
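A minimal sketch of the first approach (shown with C#/Npgsql for illustration; the same two statements apply from knex as long as both run on the same transaction, and the counters table is invented):

// assumes: using Npgsql;
await using var conn = new NpgsqlConnection(connectionString);
await conn.OpenAsync();
await using var tx = await conn.BeginTransactionAsync();

// FOR NO KEY UPDATE holds a row lock until COMMIT, so a concurrent request
// blocks on this SELECT instead of reading a stale value.
var read = new NpgsqlCommand(
    "SELECT value FROM counters WHERE id = @id FOR NO KEY UPDATE", conn, tx);
read.Parameters.AddWithValue("id", id);
var value = (int)(await read.ExecuteScalarAsync())!;

var write = new NpgsqlCommand(
    "UPDATE counters SET value = @v WHERE id = @id", conn, tx);
write.Parameters.AddWithValue("id", id);
write.Parameters.AddWithValue("v", value + 1);
await write.ExecuteNonQueryAsync();

await tx.CommitAsync();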
Transactions in PostgreSQL use an optimistic locking model when accessing tables, while some other DBMSs do pessimistic locking (IBM Db2) or offer both locking models (MS SQL Server).
Optimistic locking takes a snapshot of the data you are working on, and modifications are made to that snapshot until the transaction ends. When the transaction finishes, the snapshot modifications are applied to the real database (table rows), but if some other user has made a change between the moment of the snapshot capture and the commit, the commit cannot apply and is rejected as a ROLLBACK.
You can try raising the ISOLATION LEVEL (REPEATABLE READ or SERIALIZABLE) to avoid the trouble.

Locking for JPA entities without @Version

Env:
Java EE 7
JPA 2.1
EJB 3.1
Hibernate 4
Recently we have been experiencing data problems in one of the tables. A couple of points:
The table is mapped to a JPA entity.
Neither the table nor the entity has a "version" column/attribute.
In other words, there is no optimistic locking available for this table. On doing RCA, it turned out that we had concurrent data modification issues.
Questions :
In such cases where @Version is not available/used (in other words, no optimistic locking), is using a singleton repository class the only option to make sure data consistency is maintained?
What about pessimistic locking in such cases ?
I believe it's a general use case that an application (especially a legacy one) can have some tables with a version column and some without. Are there any known patterns for handling tables/entities without a version column?
Thanks in advance,
Rakesh
JPA supports pessimistic locking and you are free to use it in case you cannot or do not want to use optimistic locking.
In short, EntityManager provides lock methods to lock an already retrieved entity, and the overloaded em.find and em.refresh, as well as Query.setLockMode, provide a means to supply lock options so that the lock is applied atomically at the time the data is retrieved from the DB.
However, with pessimistic locking you should be aware that you need to prevent deadlocks. The best way to tackle this is to always lock at most one entity per transaction.
You might also consider setting a timeout for the attempt to lock an entity, so that your transaction does not wait for a long time if the entity is already locked.
In more detail, a very intelligible explanation of optimistic and pessimistic locking with JPA is provided here, including the differences between READ and WRITE lock modes and setting the lock timeout.

Entity Framework - add or subtract set amount from DB field

I am working on my first project using an ORM (currently Entity Framework, although that's not set in stone) and am unsure of the best practice when I need to add or subtract a given amount from a database field, when I am not interested in the new value and I know the field in question is frequently updated, so concurrency conflicts are a concern.
For example, in a retail system where I am recording a sale, as well as creating records for the sale and each of the line items, I need to update the quantity on hand of the items sold. It seems unnecessary to query the database for the existing quantity on hand, just so that I can populate the entity model before saving the updated quantity - and in the time taken for that round-trip, there is a chance that the same item will have been sold through another checkout or the website, so I either have a conflict or (if using a transaction) the other sale is blocked until I complete my update.
In SQL I would simply write
UPDATE Item SET Quantity=Quantity-1 WHERE ...
It seems the best option in this case is to fall back to ADO.NET + stored procedure for this one update, but is there a better way within Entity Framework?
You're right. ORMs specialize in tracking changes to each individual entity and applying those changes to the DB individually. Some ORMs support sending the changes in batches but, even so, modifying all the records in a table implies reading them all, modifying each one, and sending the changes back to the DB as individual UPDATEs.
And that's a big no-no, as you have correctly thought. It means loading all the rows into memory, modifying all of them, tracking their changes, and sending them back to the DB as individual updates, which is far more expensive than running a single UPDATE on the DB.
As to the final question: to run a SQL command you don't need to fall back to traditional ADO.NET. You can run SQL directly from an EF DbContext using ExecuteSqlCommand, like this:
MyDbContext.Database.ExecuteSqlCommand("Your SQL here!!");
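Applied to the quantity update from the question, a sketch (the ItemId column name is assumed; in EF Core the equivalent methods are ExecuteSqlRaw / ExecuteSqlInterpolated):

// The {0}/{1} placeholders are converted to DbParameters by EF, and the
// single UPDATE is atomic, so there is no read-modify-write race.
int rows = MyDbContext.Database.ExecuteSqlCommand(
    "UPDATE Item SET Quantity = Quantity - {0} WHERE ItemId = {1}",
    quantitySold, itemId);

ExecuteSqlCommand returns the number of affected rows, so you can also detect the case where the item was not found.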
I recommend looking at the MSDN docs for the Database class to learn everything it can do: for example, managing transactions, executing commands that return no data (as in the previous example), or executing queries that return data and even mapping the results to entities (classes) in your model with SqlQuery().
So you can run SQL commands and queries without using a different technology.