We have two SELECT statements that need to run in parallel, meaning they should be awaited together.
var t1 = GetSomeData1(someParams);        // read 1: task started, not awaited yet
var tWrite = await WriteData(someParams); // write in between, awaited immediately
var t2 = GetSomeData2(someParams);        // read 2: task started, not awaited yet
await Task.WhenAll(t1, t2);               // both reads awaited together
Note that one write sits between the two reads. In our error logs we sometimes see this exception:
System.InvalidOperationException: A second operation was started on this context before a previous operation completed. This is usually caused by different threads concurrently using the same instance of DbContext. For more information on how to avoid threading issues with DbContext, see https://go.microsoft.com/fwlink/?linkid=2097913.
   at Microsoft.EntityFrameworkCore.Internal.ConcurrencyDetector.EnterCriticalSection()
   at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.Enumerator.MoveNext()
According to https://learn.microsoft.com/en-us/ef/ef6/saving/transactions, the default transaction isolation level is READ COMMITTED.
I know that mixing reads and writes under READ COMMITTED can cause blocking, but do we get this same error when we mix reads and writes?
What is the recommended way to avoid this error while still running the reads in parallel?
As the documentation linked from your exception message says:
Entity Framework Core does not support multiple parallel operations being run on the same DbContext instance. This includes both parallel execution of async queries and any explicit concurrent use from multiple threads. Therefore, always await async calls immediately, or use separate DbContext instances for operations that execute in parallel.
When EF Core detects an attempt to use a DbContext instance concurrently, you'll see an InvalidOperationException with a message like this:
A second operation started on this context before a previous operation completed. This is usually caused by different threads using the same instance of DbContext, however instance members are not guaranteed to be thread safe.
When concurrent access goes undetected, it can result in undefined behavior, application crashes and data corruption.
https://learn.microsoft.com/en-us/ef/core/dbcontext-configuration/#avoiding-dbcontext-threading-issues
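In your snippet, t1 is still running against the context when WriteData starts, which is exactly the concurrent use the documentation forbids. A minimal sketch of the "await immediately" fix, using the method names from your question:

var r1 = await GetSomeData1(someParams); // read 1 completes before anything else starts
await WriteData(someParams);             // then the write
var r2 = await GetSomeData2(someParams); // then read 2

If you want the two reads to genuinely overlap, each read needs its own DbContext instance.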
I have a problem with the concept of scope in dependency injection. I have registered my DbContext as scoped, and I save user activity to a table using an asynchronous method without awaiting it.
// In Startup:
services.AddScoped<IDbContext, StorageSystemDbContext>();
services.AddScoped<IUserActivityService, UserActivityService>();

// In UserActivityService:
public async void LogUserActivityAsync(string controllerName, string actionName,
    ActionType actionType = ActionType.View, string data = "", string description = "")
{
    await InsertAsync(new UserActivity
    {
        ControllerName = controllerName,
        ActionName = actionName,
        ActionType = actionType,
        CreatedDateTime = DateTime.Now,
        Description = description,
        UserId = (await _workContext.CurrentUserAsync())?.Id
    });
}

// In Controller:
_userActivityService.LogUserActivityAsync(CurrentControllerName, CurrentActionName, data);
I get the following error when I call the same action twice in quick succession:
InvalidOperationException: A second operation was started on this context before a previous operation completed. This is usually caused by different threads concurrently using the same instance of DbContext. For more information on how to avoid threading issues with DbContext, see https://go.microsoft.com/fwlink/?linkid=2097913.
Given the scoped registration, I expected a new DbContext to be created for the second request, but according to this error a new context was not created; the second request used the previous one.
What is the reason for this?
I'm using ASP.NET Core MVC and EF Core on .NET 5.
A DbContext injected into a service via constructor injection is a single reference regardless of scoping; calling multiple methods on that service will always use the same instance.
AddScoped with ASP.NET Core scopes the services (and the DbContext) to the web request. This is the recommended scoping for a DbContext: it ensures that all entities loaded during a request are tracked by the same DbContext instance, and that the DbContext stays alive for the life of the request (e.g., to provide lazy-loading support if needed).
A transient-scoped dependency would mean the DbContext passed to two different services would be two distinct references. This leads to problems where service A calls another service to retrieve entities that it wants to associate with an entity it has loaded and is trying to update; those entities are attached to a different DbContext, resulting in errors or issues like duplicate data being created.
Even with a transient-scoped DbContext you would still have exactly the same problem when running two calls from the same service in parallel, and there are many good reasons referenced in the comments not to use un-awaited async calls to do so. Even if your intention is to await multiple calls together, the only way to enable something like that is to scope the DbContext internally, within the method call itself. This typically involves injecting a DbContext-factory type class rather than a DbContext into the service, where the factory is a dependency that can initialize and provide a new DbContext. Then:
using (var context = _contextFactory.Create())
{
    // operations with the DbContext (context)
}
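Since EF Core 5, this factory pattern is built in via AddDbContextFactory and IDbContextFactory<TContext>. Here is a minimal sketch of parallel reads with it; StorageSystemDbContext is taken from the question above, while ReportService, Order, and Customer are assumptions for illustration:

// Requires Microsoft.EntityFrameworkCore (EF Core 5+).
// In Startup / Program:
// services.AddDbContextFactory<StorageSystemDbContext>(
//     options => options.UseSqlServer(connectionString));

public class ReportService
{
    private readonly IDbContextFactory<StorageSystemDbContext> _contextFactory;

    public ReportService(IDbContextFactory<StorageSystemDbContext> contextFactory)
        => _contextFactory = contextFactory;

    public async Task<(List<Order> Orders, List<Customer> Customers)> LoadInParallelAsync()
    {
        // Each parallel query gets its own short-lived DbContext instance,
        // so no single context is ever used concurrently.
        async Task<List<Order>> GetOrdersAsync()
        {
            using var context = _contextFactory.CreateDbContext();
            return await context.Orders.ToListAsync();
        }

        async Task<List<Customer>> GetCustomersAsync()
        {
            using var context = _contextFactory.CreateDbContext();
            return await context.Customers.ToListAsync();
        }

        var ordersTask = GetOrdersAsync();
        var customersTask = GetCustomersAsync();
        await Task.WhenAll(ordersTask, customersTask);
        return (ordersTask.Result, customersTask.Result);
    }
}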
Even then you need to consider the DB synchronization guards like row and table locks / deadlocks which could rear their heads if you have a significant number of operations happening in parallel. Keep in mind with web applications the web server can be responding to a significant number of requests in parallel, each of which could be kicking off these processes at any time. (Works fine during development with 1 client, crawls/dies out in the real world.)
I found the answer here:
https://stackoverflow.com/a/44121808/4604557
If for some reason you want to run parallel database operations (and think you can avoid deadlocks, concurrency conflicts etc.), make sure each one has its own DbContext instance. Note however, that parallelization is mainly useful for CPU-bound processes, not IO-bound processes like database interaction. Maybe you can benefit from parallel independent read operations but I would certainly never execute parallel write processes. Apart from deadlocks etc. it also makes it much harder to run all operations in one transaction.
I have two transactions.
In the first, I select an entity, run validations, upload a file provided by the client to S3, and then update the entity with info about the S3 file.
The second transaction simply deletes the entity.
Now assume someone calls the first transaction and immediately afterwards the second. The second one completes faster, and the first one throws DbUpdateConcurrencyException because the selected entity no longer exists when the update query runs.
I get DbUpdateConcurrencyException when my transaction uses IsolationLevel.ReadCommitted. But if I set IsolationLevel.Serializable, it throws InvalidOperationException with Postgres error code 40001. Could someone explain why I get different errors? It seems to me that the outcome should be the same, since both errors are triggered by updating a non-existing entity.
The 40001 error corresponds to the SQLSTATE serialization_failure (see the table of error codes).
It's generated by the database engine in serializable isolation level when it detects that there are concurrent transactions and this transaction may have produced a result that could not have been obtained if the concurrent transactions had been run serially.
When using IsolationLevel.ReadCommitted, it's impossible to get this error, because choosing that isolation level precisely means the client side doesn't want the database to perform these isolation checks.
On the other hand, the DbUpdateConcurrencyException is probably not generated by the database engine; it's generated by Entity Framework. The database itself is fine with an UPDATE affecting zero rows; that's not an error at the SQL level.
I think you get the serialization failure when the database errors out first, and the DbUpdateConcurrencyException when the database doesn't error out but the next layer up (EF) does.
The typical way to deal with serialization failures, at the serializable isolation level, is for the client-side to retry the transaction when it gets a 40001 error. The retried transaction will have a fresh view of the data and hopefully will pass (otherwise, loop on retrying).
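A minimal retry sketch, assuming the Npgsql provider; RunSerializableTransactionAsync is a placeholder for your actual transactional work:

const int maxRetries = 3;
for (var attempt = 1; ; attempt++)
{
    try
    {
        await RunSerializableTransactionAsync(); // placeholder for the real work
        break;
    }
    catch (Exception ex) when (attempt < maxRetries && IsSerializationFailure(ex))
    {
        // 40001: the transaction saw a state that serial execution could not
        // have produced; retry so it gets a fresh view of the data.
    }
}

// Depending on how the error surfaces, the PostgresException may be wrapped
// (e.g., in a DbUpdateException), so check inner exceptions as well.
static bool IsSerializationFailure(Exception ex) =>
    ex is PostgresException { SqlState: "40001" }
    || ex.InnerException is PostgresException { SqlState: "40001" };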
The typical way to deal with concurrency at lesser isolation levels like Read Committed is to explicitly lock objects before accessing them, to force the serialization of concurrent transactions.
I'm aware of two different scenarios in which exceptions can be produced when working with an Entity Framework DbContext:
Enumerating a query (which could throw an EntityCommandExecutionException)
Calling SaveChanges (which could throw a DbUpdateException)
Within a single instance of DbContext, I want to catch these exceptions, try to recover where applicable, and then repeat the operation.
Specifically, if a call to SaveChanges throws an exception because of a deadlock, I would like to retry the call to SaveChanges. I already know how to detect this situation and perform the retry.
I saw this answer here, which indicates that a SQL connection shouldn't be used after a deadlock. That suggests I should recreate the entire DbContext and restart the higher-level operation to recover from such exceptions.
What I'm not sure about is whether it's safe to continue using the DbContext after it has thrown an exception such as this. Will it enter an unusable state? Will it still work but not function correctly? Will SaveChanges no longer occur transactionally?
If you don't supply the DbContext with an already opened SQL connection, the DbContext will open and close the connection for you when you call SaveChanges. In that case there is no danger in keeping the DbContext around, except of course that the entities the DbContext holds on to might be in an invalid state (because this could be the reason that the SQL exception was thrown).
Here's an example of a DbContext that is supplied with an already opened SQL connection and transaction:
using (var connection = new SqlConnection("my connection"))
{
    connection.Open();
    using (var transaction = connection.BeginTransaction())
    {
        // EF6: the context must not own the externally supplied connection,
        // and it must be enlisted in the externally started transaction.
        using (var context = new DbContext(connection, contextOwnsConnection: false))
        {
            context.Database.UseTransaction(transaction);

            // Do useful stuff.
            context.SaveChanges();
        }
        transaction.Commit();
    }
}
If you supply the DbContext with a SqlConnection that runs in the context of a transaction, this answer holds.
Note that Entity Framework will not create a nested transaction. It simply checks whether the connection is "enlisted in user transaction"; if SaveChanges already runs inside a transaction, no new transaction is started. Entity Framework, however, is unable to detect that the database has aborted the transaction because of a severe failure (such as a database deadlock). So if a first call to SaveChanges fails with something like a deadlock and you catch the exception and call SaveChanges again, Entity Framework still thinks it is running inside a transaction.
This means that the second call executes without a transaction, and that when the operation fails halfway through, the statements that already executed will NOT be rolled back, since there is no transaction to roll back.
The problem of the torn SaveChanges operation could have been prevented if Entity Framework used nested transactions, but it still wouldn't solve the general problem of consistency.
Entity Framework creates connections and transactions for us when we do not supply them explicitly. We only need to supply a connection and transaction explicitly when the call to SaveChanges is part of a bigger overall transaction. So even if EF created a nested transaction for us and committed it before returning from SaveChanges, we'd be in trouble if we called SaveChanges a second time, because this 'nested' transaction isn't actually nested at all. When EF commits this 'nested' transaction, it actually commits the only transaction there is, which means the entire operation we needed to be atomic is torn: all changes made by SaveChanges are committed, while the operations that might come after that call didn't run. Obviously this is not a good place to be.
So the moral of the story is: either let Entity Framework handle connections and transactions for you, in which case you can safely redo calls to SaveChanges, or handle transactions yourself and fail fast when the database throws an exception; in that case you shouldn't call SaveChanges again.
We are contemplating the use of Entity Framework for a bulk upload job. Because we are interested in tracking the individual result of each record, I am trying to find out whether all items are inserted as part of a single transaction and a single failure would cause a rollback, or whether there is a way to track the result of each individual insert.
I was contemplating looping through the items, calling SaveChanges() for each one, and wrapping that in a try/catch block, but a single SaveChanges() call on the context seems more efficient, if I can still tap into the individual results.
Thanks!
By default, each time you call SaveChanges, all pending records are saved at once. All operations within a SaveChanges call run within a transaction (source), so if a particular SaveChanges call fails, it rolls back the queries in that particular call, but obviously not previously successful calls.
Ultimately it depends on where you call SaveChanges as to how much of a rollback you'd get. You can extend the length of a transaction with TransactionScope; see this article for some examples of how to extend the scope.
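A minimal sketch of per-record result tracking, assuming an items collection and a context with a matching DbSet; Item and Items are illustrative names:

var failures = new List<(Item Item, Exception Error)>();

foreach (var item in items)
{
    context.Items.Add(item);
    try
    {
        context.SaveChanges(); // one implicit transaction per record
    }
    catch (DbUpdateException ex)
    {
        failures.Add((item, ex));
        // Detach the failed entity so the next SaveChanges doesn't retry it.
        context.Entry(item).State = EntityState.Detached;
    }
}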
My own experience is that calling SaveChanges too often can really hurt performance and slow the entire process down, but waiting too long means a greater memory footprint. I found that EF can easily handle several thousand rows without much of a memory problem.
Placing the code in a try...catch will assist you, but you may also want to look at entity validation: you can place restrictions on the objects, and validation then occurs on them before any SQL is generated.
Assuming I am never writing with EF (I will never call SaveChanges, for example), is EF safe for concurrent reads from the same ObjectContext?
It may still be initializing a database connection and reading new objects, or updating existing objects (or deleting!), but it won't be writing anything to the db, so no transactions (I assume).
Thanks
ObjectContext and related EF classes are not thread-safe, so don't use them for concurrent operations. If you need to run concurrent data access, use a new context for every thread.
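For example, a minimal sketch in which each parallel task creates and disposes its own context; MyEntities and Widgets are assumptions for illustration, and FirstOrDefaultAsync requires System.Data.Entity in EF6:

var results = await Task.WhenAll(ids.Select(async id =>
{
    using (var context = new MyEntities()) // one context per task, never shared
    {
        return await context.Widgets.FirstOrDefaultAsync(w => w.Id == id);
    }
}));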