Is optimistic locking equivalent to Select For Update? - entity-framework-core

It is my first time using EF Core and DDD concepts. Our database is Microsoft SQL Server. We use optimistic concurrency based on a RowVersion column for user requests; this handles concurrent reads and writes by users.
With the DDD paradigm, user changes are not written directly to the database, nor is the logic handled in the database with a stored procedure. It is a three-step process:
1. get the aggregate from a repository, which pulls it from the database
2. update the aggregate through domain commands that implement the business logic
3. save the aggregate back to the repository, which writes it to the database
The separation of read and write in the application logic can again lead to race conditions between parallel commands.
Since the time between read and write in the backend is normally fairly short, these race conditions can be handled with either optimistic or pessimistic locking.
To my understanding, optimistic concurrency using RowVersion is sufficient for the lost update problem, but not for write skew, as shown in Martin Kleppmann's book "Designing Data-Intensive Applications". Preventing write skew would require locking the records that were read.
To prevent write skew, a common solution is to lock the records in step 1 with FOR UPDATE or, in SQL Server, with the hints UPDLOCK and HOLDLOCK.
EF Core supports neither FOR UPDATE nor SQL Server's WITH table hints.
If I'm not able to lock records with EF Core, does that mean there is no way to prevent write skew except using raw SQL or stored procedures?
If I use RowVersion, I first check the RowVersion after getting the aggregate from the database; if it doesn't match, I can fail fast. If it matches, it is checked again by EF Core in step 3 when updating the database. Is this pattern sufficient to eliminate all race conditions except write skew?
Since the write skew race condition occurs when the read and the write are on different records, it seems a transaction that makes a decision based on a read could always be added later during development. In a complex system I would not feel safe unless it is just simple CRUD access. Is there another solution with EF Core to prevent write skew without locking records for update?
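For illustration, the raw SQL fallback I mean would look roughly like this sketch (Orders, Apply, command, and orderId are placeholders, not our real model):

using (var transaction = context.Database.BeginTransaction())
{
    // The lock hints are applied when the rows are read (step 1);
    // UPDLOCK + HOLDLOCK hold the locks until the transaction ends.
    var order = context.Orders
        .FromSqlRaw("SELECT * FROM dbo.Orders WITH (UPDLOCK, HOLDLOCK) WHERE Id = {0}", orderId)
        .AsEnumerable()
        .Single();

    order.Apply(command);    // step 2: domain logic

    context.SaveChanges();   // step 3: write back
    transaction.Commit();    // locks released here
}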

If you tell EF Core about the RowVersion attribute, it will use it in any update statement. But you have to be careful to preserve the RowVersion value from your original data retrieval. The usual work pattern is: retrieve the data, let the user edit it, and then save. When the user saves, you would normally have EF retrieve the entity again, apply the user's changes, and save the update. EF uses the RowVersion in a WHERE clause to ensure nothing has changed since you read the data. This is the tricky part: you want to make sure the comparison uses the RowVersion from your initial retrieval, not from the second retrieval used to update the entity before saving.
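A minimal sketch of that pattern with EF Core (Product, dto, and context are illustrative names): the RowVersion the client received with the first read is pushed into the change tracker as the original value, so the UPDATE's WHERE clause compares against the first read rather than the second one.

using System.ComponentModel.DataAnnotations;
using Microsoft.EntityFrameworkCore;

public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }

    [Timestamp]                        // maps to rowversion on SQL Server
    public byte[] RowVersion { get; set; }
}

// Disconnected update: dto carries the user's edits plus the RowVersion
// from the FIRST read.
var product = context.Products.Single(p => p.Id == dto.Id);   // second retrieval
product.Name = dto.Name;                                      // apply the edits

// The crucial step: compare against the first read, not this second one.
context.Entry(product).Property(p => p.RowVersion).OriginalValue = dto.RowVersion;

context.SaveChanges();   // UPDATE ... WHERE Id = @p0 AND RowVersion = @original;
                         // throws DbUpdateConcurrencyException on a conflict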

Related

EF consistency between two or more reads

On this page of Microsoft's documentation on EF, it is stated verbatim:
Entity Framework does not wrap queries in a transaction
If I am right, this means that SQL reads are not wrapped in transactions, and thus every SELECT in our code is executed independently. But if this is so, can we ensure that two reads are consistent with each other? In the typical scenario, is there a guarantee that the sum of the loaded amount of A and the loaded amount of B will be right (in one connection) if a transfer between A and B is started (in a different connection) between the read of A and the read of B? Would Entity Framework be able to solve this case in some way?
The built-in solution in EF is client-side optimistic concurrency. On update, EF builds a query that ensures the row being updated has not changed since it was read.
Properties configured as concurrency tokens are used to implement optimistic concurrency control: whenever an update or delete operation is performed during SaveChanges, the value of the concurrency token on the database is compared against the original value read by EF Core. If the values match, the operation can complete. If the values do not match, EF Core assumes that another user has performed a conflicting operation and aborts the current transaction.
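In application code, the aborted operation surfaces as a DbUpdateConcurrencyException that you can catch and resolve. A minimal sketch (the refresh-and-retry policy shown here is just one option):

try
{
    context.SaveChanges();
}
catch (DbUpdateConcurrencyException ex)
{
    foreach (var entry in ex.Entries)
    {
        // Re-read the row's current values and decide: abort, overwrite, or merge.
        // Here we refresh the original values so that a retry can succeed.
        var databaseValues = entry.GetDatabaseValues();
        entry.OriginalValues.SetValues(databaseValues);
    }
    // Retry SaveChanges() or surface the conflict to the user.
}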
You can also opt in to transactions at whatever isolation level you choose, which may provide similar protections. Or use raw SQL queries with lock hints for your target database.
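For the A-and-B scenario from the question, the transaction approach might look like this (Accounts, Amount, idA, and idB are placeholder names):

using System.Data;

using (var transaction = context.Database.BeginTransaction(IsolationLevel.Serializable))
{
    // Under SERIALIZABLE, a concurrent transfer between A and B cannot
    // interleave between these two reads; it lands entirely before or after.
    var amountA = context.Accounts.Single(a => a.Id == idA).Amount;
    var amountB = context.Accounts.Single(a => a.Id == idB).Amount;

    var total = amountA + amountB;   // consistent sum

    transaction.Commit();
}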

Does a SQL statement ensure atomicity in Postgres?

I have a simple bug in my program, which supports multiple users. I'm using knex to build SQL queries, and I have pseudocode that depicts the scenario:
const value = queryBuilder().readDataFromTheDatabase(); // executes the read
// do some other work and compute the new value
queryBuilder().writeValueToTheDatabase(updateValue(value));
This piece of code is used in a sort of middleware function. As you can see, there is a possible race condition: when multiple users hit this at roughly the same time, one of them gets a stale value.
My solution
So, I was thinking a possible solution would be to create a single queryBuilder statement:
queryBuilder().readAndUpdateValueInTheDatabase();
So I'll probably have to use a little bit of PL/pgSQL. I was wondering whether this solution would be sufficient. Will the statement be executed atomically? That is, when one request has read but not yet finished its write, does another request wait to both read and write, or does it only wait to write while reading the stale value?
I think what you are looking for here is isolation, not atomicity. You could set all transactions to the highest isolation level, serializable (which is higher than the usual default level). With that level, if data that a transaction read (and presumably relied upon) is changed, then when it tries to commit it might get a serialization failure error. I say "might", because the system could conclude the situation would be consistent with the data change having happened after the commit, in which case the commit is allowed to stand.
To avoid a race condition with such a setup, you must run both the read and the write in the same database transaction.
There are two ways to do that:
1. Use the default READ COMMITTED isolation level and lock the rows when you read them:
SELECT ... FROM ... FOR NO KEY UPDATE;
That locks the rows against concurrent modifications, and the lock is held until the end of the transaction.
2. Use the REPEATABLE READ isolation level and don't lock anything. Your UPDATE will then receive a serialization error (SQLSTATE 40001) if somebody modified the row concurrently. In that case, you roll the transaction back and try again in a new REPEATABLE READ transaction, as in the sketch after this answer.
The first solution is usually better if you expect conflicts frequently, while the second is better if conflicts are rare.
Note that you should keep the database transaction as short as possible in both cases to keep the risk of conflicts low.
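A sketch of that second, retry-based approach in C# with Npgsql (the question uses knex, but the pattern is language-independent; the counters table, counterId, and connectionString are invented for the example):

using Npgsql;
using System.Data;

while (true)
{
    using var conn = new NpgsqlConnection(connectionString);
    conn.Open();
    using var tx = conn.BeginTransaction(IsolationLevel.RepeatableRead);
    try
    {
        long value;
        using (var read = new NpgsqlCommand("SELECT value FROM counters WHERE id = @id", conn, tx))
        {
            read.Parameters.AddWithValue("id", counterId);
            value = (long)read.ExecuteScalar();   // the read...
        }

        using (var write = new NpgsqlCommand("UPDATE counters SET value = @v WHERE id = @id", conn, tx))
        {
            write.Parameters.AddWithValue("v", value + 1);   // ...and the dependent write
            write.Parameters.AddWithValue("id", counterId);
            write.ExecuteNonQuery();
        }

        tx.Commit();
        break;   // success
    }
    catch (PostgresException e) when (e.SqlState == "40001")
    {
        // Serialization failure: the disposed transaction rolls back,
        // and the loop retries in a fresh REPEATABLE READ transaction.
    }
}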
Transactions in PostgreSQL use an optimistic locking model when accessing tables, while some other DBMSs do pessimistic locking (IBM Db2) or support both models (MS SQL Server).
Optimistic locking takes a snapshot of the data you are working on, and the modifications are made on that snapshot until the transaction ends. When the transaction finishes, the snapshot's modifications are applied to the real database (the table rows); but if some other user made a change between the moment the snapshot was taken and the commit, the changes cannot be applied, and the COMMIT is rejected with a rollback.
You can raise the isolation level (REPEATABLE READ or SERIALIZABLE) to avoid the trouble.

Entity Framework - add or subtract set amount from DB field

I am working on my first project using an ORM (currently Entity Framework, although that's not set in stone) and am unsure of the best practice when I need to add or subtract a given amount from a database field, when I am not interested in the new value and I know the field in question is frequently updated, so concurrency conflicts are a concern.
For example, in a retail system where I am recording a sale, as well as creating records for the sale and each of the line items, I need to update the quantity on hand of the items sold. It seems unnecessary to query the database for the existing quantity on hand just so that I can populate the entity model before saving the updated quantity, and in the time taken for that round trip there is a chance that the same item will have been sold through another checkout or the website, so I either get a conflict or (if using a transaction) the other sale is blocked until I complete my update.
In SQL I would simply write
UPDATE Item SET Quantity=Quantity-1 WHERE ...
It seems the best option in this case is to fall back to ADO.NET + stored procedure for this one update, but is there a better way within Entity Framework?
You're right. ORMs specialize in tracking changes to each individual entity and applying those changes to the DB individually. Some ORMs support sending the changes in batches, but even so, modifying all the records in a table implies reading them all, modifying each one, and sending the changes back to the DB as individual UPDATEs.
And that's a big no-no, as you have correctly thought. It means loading all the rows into memory, modifying all of them, tracking their changes, and sending them back to the DB as individual updates, which is far more expensive than running a single UPDATE on the DB.
As to the final question: to run a SQL command you don't need to fall back to traditional ADO.NET. You can run SQL directly from an EF DbContext using ExecuteSqlCommand, like this:
myDbContext.Database.ExecuteSqlCommand("Your SQL here");
I recommend looking at the MSDN docs for the Database class to learn everything it can do, for example managing transactions, executing commands that return no data (as in the previous example), or executing queries that return data and even mapping the results to entities (classes) in your model with SqlQuery().
So you can run SQL commands and queries without using a different technology.
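Applied to the quantity example from the question, a parameterized version might look like this (the ItemId column and the stock guard in the WHERE clause are assumptions):

using System.Data.SqlClient;

// One round trip, atomic on the server, no concurrency conflict to manage.
int rows = context.Database.ExecuteSqlCommand(
    "UPDATE Item SET Quantity = Quantity - @qty WHERE ItemId = @id AND Quantity >= @qty",
    new SqlParameter("@qty", quantitySold),
    new SqlParameter("@id", itemId));

if (rows == 0)
{
    // Unknown item or insufficient stock: handle as a business error.
}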

I'm accessing a MongoDB database using the repository pattern. Where should I check for data integrity?

I'm kind of new to MongoDB and NoSQL data design in general.
I'm building a MongoDB database that will have some denormalized data. For example, my "User" documents contain a reference (just the id) to zero or more "Article" documents, and my Article documents contain references to zero or more Users.
Since I'm using the repository pattern, no part of my Data Access Layer knows about both Articles AND Users. Where in my code should I check to make sure that all my documents are consistent with each other? Should I simply let the code that uses the DAL do the checks?
Would it be a good idea to have a data integrity script run once in a while to check that everything is consistent?
Here is Microsoft's write-up on the Repository Pattern. From that document:
Use a repository to separate the logic that retrieves the data and maps it to the entity model from the business logic that acts on the model.
You have a couple of questions:
Where in my code should I check to make sure that all my documents are consistent with each other?
Based on the statement above, I think it's clear that this logic belongs in the repository. The relationship between these objects only exists at the business-logic layer; the database cannot enforce these kinds of rules.
Should I simply let the code that uses the DAL do the checks?
How could it? As the writer of the repository, you are the DAL's user. For MongoDB, the DAL is basically the driver.
You could possibly write a wrapper around the driver that wraps the multiple writes in some form of transaction, but you would have to write it yourself: MongoDB has no notion of transactions.
Would it be a good idea to have a data integrity script run once in a while to check that everything is consistent?
At the end of the day, whoever writes the repository is going to be responsible for the integrity of the data. Such a script might be useful, but it would definitely suck a lot of CPU cycles.
My suggestion for N:M mappings is to start building some basic blocks for handling the multiple writes required to keep the two sides in sync. One idea is to queue the changes and let a background job make the updates, as in the sketch below. That way you don't have to worry about multiple writes and rollbacks causing bad data.
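A very rough sketch of that idea with the C# MongoDB driver (the collection names, array fields, and in-memory queue are all assumptions; a durable queue would be safer in production):

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

// All denormalized writes go through one queue, so the two sides of the
// User<->Article mapping are always updated together and can be retried.
var queue = new BlockingCollection<Func<Task>>();

var consumer = Task.Run(async () =>
{
    foreach (var change in queue.GetConsumingEnumerable())
        await change();   // add logging/retries here as needed
});

void LinkUserToArticle(IMongoDatabase db, ObjectId userId, ObjectId articleId)
{
    queue.Add(async () =>
    {
        var users = db.GetCollection<BsonDocument>("users");
        var articles = db.GetCollection<BsonDocument>("articles");

        await users.UpdateOneAsync(
            Builders<BsonDocument>.Filter.Eq("_id", userId),
            Builders<BsonDocument>.Update.AddToSet("articleIds", articleId));

        await articles.UpdateOneAsync(
            Builders<BsonDocument>.Filter.Eq("_id", articleId),
            Builders<BsonDocument>.Update.AddToSet("userIds", userId));
    });
}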

Entity Framework Self Tracking Entities - Synchronize between 2 databases

I am using Self-Tracking Entities with Entity Framework 4. I have two databases with exactly the same schema. However, the tables in one database will be added to and edited (meaning the data will be added and edited, not the actual table definitions), and at certain points of the day I will need to synchronize all the changes between this database and the other one.
I can create a separate context for each of them. But if I read a large graph from one database, how can I update the other database with that graph? Is there an easy way?
My database model is large, complex, and fully relational, so it would be a big job to go through every single entity, read from the other database to see whether it exists, update or insert it as needed, and carry on like that through the full object graph!
Any ideas?
This is not a use case for EF. With EF you would have to do exactly what you've described. Self-tracking entities can track changes to their own object instances; they know nothing about changes made to their database over time, and they will not know anything about the state of your second database either.
Have a look at SQL Server's native features (including mirroring, transaction log shipping, and SSIS) and the MS Sync Framework. Depending on your detailed requirements, these tools may suit you better.