Understanding EF Core row/table locks and isolation - entity-framework-core

I am working on an exhibit booth reservation system for a conference. This works similarly to concert tickets. Basically, each booth can be reserved by a user. I am using Entity Framework Core, and inside my logic to purchase a booth, I want to make sure no one else can read or write while I am in this transaction in the most performant way.
I have been reading up on isolation levels and table hints, but I am a bit confused about what I need to do in this case. I THINK I want the Serializable isolation level, but I'm not sure. What about the UPDLOCK and ROWLOCK query hints?
Basically, I want this block of code below to only be able to run if no one else is updating the record inside this method. Reads should be blocked in this method too, but not in other places where I query this table.
await using var tx = await context.Database.BeginTransactionAsync(System.Data.IsolationLevel.Serializable);
var product = await context.SponsorshipProducts.FirstOrDefaultAsync(x => x.Id == productId);
product.Purchase(); // checks whether the booth has already been purchased; if not, marks it purchased to block other purchases
await context.SaveChangesAsync();
await tx.CommitAsync();
This table has a ReservedById column which indicates if the exhibit has been reserved. I want to make sure this can only be updated by one user at a time for a given record.

There are typically four isolation levels:
Read Uncommitted (lowest level)
At this level a transaction can read data that another operation has modified but not yet committed.
Read Committed (the default isolation level)
At this level a transaction can only read data after the other operation has committed it.
Repeatable Read
At this level the data a transaction reads is locked until the transaction completes, so re-reading it returns the same values.
Serializable (highest level)
Serializable is very similar to the Repeatable Read isolation level, but it additionally prevents new rows from being added to the range being read during the transaction.
I want this block of code below to only be able to run if no one else is updating the record inside this method. Reads should be blocked in this method too, but not in other places where I query this table.
For this scenario either the Serializable or the Repeatable Read isolation level is applicable.
For more information you can refer to this document.
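If you also want the UPDLOCK/ROWLOCK hints mentioned in the question: EF Core has no first-class API for table hints, but you can issue the locking read as raw SQL inside the transaction. Below is a minimal sketch, assuming EF Core 3+ on SQL Server and that the entity maps to a SponsorshipProducts table (adjust the names to your schema); it is one possible approach, not the only one.
// Pessimistic variant: UPDLOCK/ROWLOCK lock just this row until the transaction
// ends, so a second purchase attempt blocks on the same read instead of racing.
await using var tx = await context.Database
    .BeginTransactionAsync(System.Data.IsolationLevel.Serializable);

var product = await context.SponsorshipProducts
    .FromSqlInterpolated(
        $"SELECT * FROM SponsorshipProducts WITH (UPDLOCK, ROWLOCK) WHERE Id = {productId}")
    .FirstOrDefaultAsync();

product.Purchase();                 // marks the booth as reserved if it is still free
await context.SaveChangesAsync();
await tx.CommitAsync();             // commits and releases the row lock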

Related

Snapshot taking and restore strategies

I've been reading about CQRS + Event Sourcing patterns (which I wish to apply in the near future), and one point common to all the decks and presentations I found is to take snapshots of your model state in order to restore it, but none of them share patterns/strategies for doing that.
I wonder if you could share your thoughts and experience in this matter particularly in terms of:
When to snapshot
How to model a snapshot store
Application/cache cold start
TL;DR: How have you implemented Snapshotting in your CQRS+EventSourcing application? Pros and Cons?
Rule #1: Don't.
Rule #2: Don't.
Snapshotting an event sourced model is a performance optimization. The first rule of performance optimization? Don't.
Specifically, snapshotting reduces the amount of time you lose in your repository trying to reload the history of your model from your event store.
If your repository can keep the model in memory, then you aren't going to be reloading it very often. So the win from snapshotting will be small. Therefore: don't.
If you can decompose your model into aggregates, which is to say that you can decompose the history of your model into a number of entities that have non-overlapping histories, then your one long model history becomes many short histories that each describe the changes to a single entity. Each entity history that you need to load will be pretty short, so the win from a snapshot will be small. Therefore: don't.
The kind of systems I'm working on today require high performance but not 24x7 availability. So in a situation where I shut down my system for maintenance and restart it, I'd have to load and reprocess my entire event store, since my fresh system doesn't know which aggregate ids to process the events against. I need a better starting point so my systems restart more efficiently.
You are worried about missing a write SLA when the repository memory caches are cold, and you have long model histories with lots of events to reload. Bolting on snapshotting might be a lot more reasonable than trying to refactor your model history into smaller streams. OK....
The snapshot store is a read model -- at any point in time, you should be able to blow away the model and rebuild it from the persisted history in the event store.
From the perspective of the repository, the snapshot store is a cache; if no snapshot is available, or if the store itself doesn't respond within the SLA, you want to fall back to reprocessing the entire event history, starting from the initial seed state.
The service provider interface is going to look something like
interface SnapshotClient {
    SnapshotRecord getSnapshot(Identifier id);
}
SnapshotRecord is going to provide to the repository the information it needs to consume the snapshot. That's going to include at a minimum
a memento that allows the repository to rehydrate the snapshotted state
a description of the last event processed by the snapshot projector when building the snapshot.
The model will then re-hydrate the snapshotted state from the memento, load the history from the event store, scanning backwards (ie, starting from the most recent event) looking for the event documented in the SnapshotRecord, then apply the subsequent events in order.
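A minimal C# sketch of that load path follows; SnapshotClient, SnapshotRecord, Model, and the event store calls are hypothetical placeholders for the pieces described above, not the API of any particular library.
// Sketch only: all types and members here are assumed, not a real library API.
public Model Load(string id)
{
    SnapshotRecord snapshot = snapshotClient.GetSnapshot(id);   // may be null, or time out

    // No usable snapshot: fall back to replaying the entire history from the seed state.
    Model model = snapshot == null
        ? Model.Seed(id)
        : Model.FromMemento(snapshot.Memento);                  // re-hydrate the snapshotted state

    // Apply only the events recorded after the one the snapshot captured.
    long after = snapshot?.LastEventNumber ?? 0;
    foreach (var e in eventStore.LoadHistory(id, after))
        model = model.Apply(e);

    return model;
}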
The SnapshotRepository itself could be a key-value store (at most one record for any given id), but a relational database with blob support will work fine too
select *
from snapshots s
where id = ?
order by s.total_events desc
limit 1
The snapshot projector and the repository are tightly coupled -- they need to agree on what the state of the entity should be for all possible histories, they need to agree how to de/re-hydrate the memento, and they need to agree which id will be used to locate the snapshot.
The tight coupling also means that you don't need to worry particularly about the schema for the memento; a byte array will be fine.
They don't, however, need to agree with previous incarnations of themselves. Snapshot Projector 2.0 discards/ignores any snapshots left behind by Snapshot Projector 1.0 -- the snapshot store is just a cache after all.
I'm designing an application that will probably generate millions of events a day. What can we do if we need to rebuild a view 6 months later?
One of the more compelling answers here is to model time explicitly. Do you have one entity that lives for six months, or do you have 180+ entities that each live for one day? Accounting is a good domain to reference here: at the end of the fiscal year, the books are closed, and the next year's books are opened with the carryover.
Yves Reynhout frequently talks about modeling time and scheduling; Evolving a Model may be a good starting point.
There are few instances where you truly need to snapshot. But there are a couple - a common example is an account in a ledger. You'll have thousands, maybe millions, of credit/debit events producing the final BALANCE state of the account - it would be insane not to snapshot that every so often.
My approach to snapshotting when I designed Aggregates.NET was that it's off by default; to enable it, your aggregates or entities must inherit from AggregateWithMemento or EntityWithMemento, which in turn means your entity must define RestoreSnapshot, TakeSnapshot, and ShouldTakeSnapshot.
The decision whether to take a snapshot or not is left up to the entity itself. A common pattern is
bool ShouldTakeSnapshot() {
    return this.Version % 50 == 0;
}
Which of course would take a snapshot every 50 events.
When reading the entity stream, the first thing we do is check for a snapshot, then read the rest of the entity's stream from the moment the snapshot was taken. I.e., don't ask for the entire stream, just the part we have not snapshotted.
As for the store - you can use literally anything. VOU is right, though, that a key-value store is best, because you only need to 1. check if one exists and 2. load the entire thing - which is ideal for kv.
For system restarts - I'm not really following what your described problem is. There's no reason for your domain server to be stateful in the sense that it's doing something different at different points in time. It should do just one thing - process the next command. In the course of handling a command it loads data from the event store, including a snapshot, and runs the command against the entity, which either produces a business exception or domain events that are recorded to the store.
I think you may be trying to optimize too much with this talk of caching and cold starts.

Why no explicit transaction handling for JPA read methods like find() and so on

I have an application-managed persistence context. Why don't I have to begin and commit a transaction in my code when I call a read method on a JPA EntityManager, like find(), refresh(), JPQL queries, Criteria API queries, and so on?
After all, SELECT statements also need a transaction because of problems like dirty reads, non-repeatable reads...
Thanks!
Dirty reads and non-repeatable reads will obviously be prevented by the same strategy that prevents the lost update problem - with optimistic and pessimistic locks. Isolation levels do not matter in JPA.
// no transaction
em.find(Foo.class, 1, LockModeType.PESSIMISTIC_READ);
-> Here my lock plays no role, because there is no transaction during which the lock could be held. My lock is wasted. But ...
em.getTransaction().begin();
em.find(Foo.class, 1, LockModeType.PESSIMISTIC_READ);
// some stuff
em.getTransaction().commit();
-> Here there is a period during which the lock is held. The lock makes sense.
Right?

NEventStore issue with replaying events

We are using CQRS + ES. The ES is NEventStore (formerly JOliver EventStore). We have 2 aggregates handled by different commands. The projections of the second AR depend on the data written to the read model by the first AR's projections. The problem is that when we run the software everything goes so fast that sometimes the two aggregates are persisted in the event store with an identical datetime (the CommitStamp column). When replaying the events we get them from the beginning ordered by the CommitStamp column. But if the two streams have an identical CommitStamp and are taken in the wrong order, the read model projections blow up with exceptions.
Any idea how to solve this problem?
===============================
Here is the discussion about this problem at github
https://github.com/NEventStore/NEventStore/issues/170
===============================
EDIT: This is how we currently replay events. I looked into how GetFrom(...) works and it turned out that the CommitStamp column is not used for ordering. After all, there is no commit order. So if I start replaying events it may return an event from today, then an event recorded 2 years ago, and so on.
public void ReplayEvents(Action<List<UncommittedEvent>> whatToDoWithEvents, DateTime loadEventsAfterDate)
{
    // GetFrom(DateTime) filters commits by date, but their order is not guaranteed
    var eventPortion = store.Advanced.GetFrom(loadEventsAfterDate);
    var uncommitedEventStream = new UncommittedEventStream();
    foreach (var commit in eventPortion)
    {
        foreach (var eventMessage in commit.Events)
        {
            uncommitedEventStream.Append(new UncommittedEvent(eventMessage.Body));
        }
    }
    whatToDoWithEvents(uncommitedEventStream.ToList());
}
In NEventStore, the consistency boundary is the stream. As of version 3.2 (as #Marijn mentioned, issue #159) the CommitSequence column is used to order CommitMessages (and the EventMessages contained therein) when reading from a stream, across all persistence engines.
EventMessage ordering is guaranteed on a per-stream basis. There is no implied ordering of messages across streams. Any actual ordering that may occur as a result of some aspect of the chosen persistence engine is accidental and must not be relied upon.
To guarantee ordering across streams will severely restrict the distributed-friendly aspects of the library. Even if we were to consider such a feature, it would have to work with all supported persistence engines, which includes NoSQL stores.
If you are practising Domain Driven Design, where each stream represents an aggregate root, and you need to guarantee ordering across 2 or more aggregates, this points to a design issue in your domain model.
If your projections need to merge values from multiple sources (streams), you can rely on ordering intra-source, but you need to be flexible on ordering inter-source. You should also account for the possibility of duplicate messages, especially if you are replaying through an external bus or queue.
If you attempt to re-order multiple streams on the receiver end using a timestamp (CommitStamp), that will be fragile. Timestamps have a fixed resolution (ms, tick, etc). Even with a single writer, things may still happen 'at the same time'.
Damian added a checkpoint column to the database. This is in the current master branch. When the events are replayed with GetFromCheckpoint(int) the results are correct.
At the database level, while the CommitStamp is fine for filtering, the CommitSequence column is the one that should guide the ordering.
As for what that translates to in terms of API calls on whatever version of the libs you're using -- I'll leave that as an exercise for you (or if you fill in a code snippet and/or mention the version, perhaps someone else can step in).
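For illustration, here is a minimal rewrite of the ReplayEvents method from the question using the checkpoint-based read mentioned above. The exact method name and signature vary between NEventStore versions, so treat this as a sketch against the GetFromCheckpoint(int) overload described earlier, not a definitive API reference.
public void ReplayEvents(Action<List<UncommittedEvent>> whatToDoWithEvents, int checkpoint)
{
    // Commits come back in checkpoint (insertion) order rather than by CommitStamp,
    // so two commits written in the same millisecond can no longer swap places.
    var commits = store.Advanced.GetFromCheckpoint(checkpoint);

    var uncommitedEventStream = new UncommittedEventStream();
    foreach (var commit in commits)
    {
        foreach (var eventMessage in commit.Events)
        {
            uncommitedEventStream.Append(new UncommittedEvent(eventMessage.Body));
        }
    }
    whatToDoWithEvents(uncommitedEventStream.ToList());
}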

Doctrine: avoid collision in update

I have a product table accessed by many applications, each with several users. I want to avoid collisions, but in a very small portion of code I have detected that collisions can occur.
$item = $em->getRepository('MyProjectProductBundle:Item')
    ->findOneBy(array('product' => $this, 'state' => 1));
if ($item)
{
    $item->setState(3);
    $item->setDateSold(new \DateTime("now"));
    $item->setDateSent(new \DateTime("now"));

    $dateC = new \DateTime("now");
    $dateC->add(new \DateInterval('P1Y'));
    $item->setDateGuarantee($dateC);

    $em->persist($item);
    $em->flush();

    // ...after this, set up customer data, etc.
}
One option could be to do two persist()/flush() calls, the first one right after the state change, but before doing that I would like to know if there is an approach that offers a stronger guarantee.
I don't think a transaction is the solution, as there are many other actions involved in the process, so wrapping them all in a transaction would force many rollbacks and failed sales, making it worse.
The database is PostgreSQL.
Any other ideas?
My first thought would be to look at optimistic locking. The end result is that if someone changes the underlying data out from under you, Doctrine will throw an exception on flush. However, this might not be easy, since you say you have multiple applications operating on a central database -- it's not clear whether you control those applications or not, and you'll need to, because they'll all need to play along with the optimistic locking scheme and update the version column when they run updates.

Entity Framework and some general doubts about the optimistic concurrency exception

I have some doubts about the optimistic concurrency exception.
Well, for example, I retrieve some data from the database, modify some records, and then submit the changes. If someone updates those records between my request and my update, I get an optimistic concurrency exception. The classic concurrency problem.
My first doubt is the following. To decide whether the information has changed, EF retrieves the data from the database and compares the original data that I obtained with the data that was just retrieved. If there are differences, the optimistic concurrency exception is thrown.
When I catch the optimistic concurrency exception, I decide whether the client wins or the store wins. In this step, does EF retrieve the information again or use the data from the first retrieval? Because if it retrieves the data again, that would be inefficient.
The second doubt is how to control the optimistic concurrency exception. In the catch block, I decide whether the client wins or the store wins. If the client wins, I call SaveChanges again. But between the moment I decide that the client wins and the SaveChanges call, another user could change the data, so I would get an optimistic concurrency exception again. In theory, it could be an infinite loop.
Would it be a good idea to use a transaction (scope) to ensure that the client updates the information in the database? Another solution could be a loop that tries N times to update the data and, if it is still not possible, gives up and tells the user.
Would the transaction be a good idea? Does it consume a lot of database resources? Although the transaction briefly blocks the database, it ensures that the update operation finishes. The loop of N attempts calls the database up to N times and could perhaps need more resources.
Thanks.
Daimroc.
EDIT: I forgot to ask: is it possible to set the context to use client wins by default, instead of waiting for the concurrency exception?
My first doubt is the following. To decide whether the information has changed, EF retrieves the data from the database ...
It doesn't retrieve any additional data from the database. It takes the original values of your entity that are used for concurrency handling and uses them in the WHERE condition of the UPDATE command. The update command is followed by a check of the number of modified rows. If the number is 0, it either means the record doesn't exist or somebody has changed it.
The second doubt is how to control the optimistic concurrency exception.
You simply call Refresh and SaveChanges. You can repeat the pattern a few times if needed. If your application is so highly concurrent that multiple threads fight to update the same record within a fraction of a second, you most probably need to architect your data storage in a different way.
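A minimal sketch of that retry pattern, assuming the classic ObjectContext API (RefreshMode lives in System.Data.Objects in EF 4/5 and System.Data.Entity.Core.Objects in EF 6); entity stands in for whichever object you are saving, and the retry cap avoids the infinite loop the question worries about.
const int maxRetries = 3;
var saved = false;

for (var attempt = 0; attempt < maxRetries && !saved; attempt++)
{
    try
    {
        context.SaveChanges();
        saved = true;                                    // update went through
    }
    catch (OptimisticConcurrencyException)
    {
        // "Client wins": reload the store values as the new originals but keep
        // the current (pending) values, then retry the update.
        context.Refresh(RefreshMode.ClientWins, entity);
    }
}

if (!saved)
{
    // Give up after a few attempts and surface the conflict to the user.
    throw new InvalidOperationException("Could not save after repeated concurrency conflicts.");
}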
Would it be a good idea to use a transaction (scope) to ensure that the client updates the information in the database?
SaveChanges always uses a database transaction. TransactionScope will not add any additional value unless you want to use a transaction over multiple calls to SaveChanges, use a distributed transaction, or change, for example, the isolation level of the transaction.
Is it possible to set the context to use client wins by default instead of waiting for the concurrency exception?
It is set by default. Simply don't mark any of your properties with ConcurrencyMode.Fixed and you will have no concurrency handling = client wins.
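For the code-first equivalent, opting a property in to concurrency checking is done with data annotations rather than ConcurrencyMode.Fixed; leave them off and you keep the default client-wins behaviour. A small sketch with a hypothetical entity:
using System.ComponentModel.DataAnnotations;

public class Booking                    // hypothetical entity, not from the question
{
    public int Id { get; set; }
    public string CustomerName { get; set; }

    // Opting in: EF adds this column to the UPDATE's WHERE clause, so a concurrent
    // change makes SaveChanges throw a concurrency exception instead of silently winning.
    [Timestamp]
    public byte[] RowVersion { get; set; }
}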