I want to configure Entity Framework so that SaveChanges() doesn't wrap all its generated SQL in a BEGIN and a COMMIT. I came across this suggestion in a related SO question:
using (var transaction = new TransactionScope(TransactionScopeOption.Suppress))
{
ObjectContext.SaveChanges();
}
But that doesn't work for me. We are using the MySQL connector, so I did some digging by stepping through its source code, and I found that it is Entity Framework asking for a new transaction to be created, not the connector, as the stack trace showed.
Next I started looking through EF's source code (EF 6 is open source, and the relevant code looks the same in the decompiler):
var needLocalTransaction = false;
var connection = (EntityConnection)Connection;
if (connection.CurrentTransaction == null
&& !connection.EnlistedInUserTransaction
&& _lastTransaction == null)
{
needLocalTransaction = startLocalTransaction;
}
...
if (needLocalTransaction)
{
localTransaction = connection.BeginTransaction();
}
Okay, so it looks like if a transaction doesn't exist, EF creates one itself. Stepping further into how the current transaction is set up, I get to the code where EF sets up the connection:
var currentTransaction = Transaction.Current;
EnsureContextIsEnlistedInCurrentTransaction(
currentTransaction,
() =>
{
Connection.Open();
_openedConnection = true;
_connectionRequestCount++;
return true;
},
false);
And this seems to be the only place where the TransactionScopeOption.Suppress option comes into play. However, all that does is set the ambient transaction (Transaction.Current) to null, which makes EF see no transaction at all and do the opposite of what I want: create a new one.
Has anyone else had any luck with turning off transactions in EF 5, or is my only option to hack together and build my own version of the SQL connector?
Thanks!
EF5 is hard-coded to wrap INSERTS and UPDATES into transactions. And in almost all cases, this is the desired behavior when batching writes of multiple entities.
If you want to change that you will have to write new code for yourself, compile it to an appropriate DLL, and reference the new code instead. A lot of work to get around a very important safety feature. I don't recommend it.
The question is why do you want to disable transactions so much?
Is it causing performance problems? Do you find the extra safety of the transaction to be detrimental to the business process?
It sounds like you don't care if lost writes, phantom reads, etc. happen (which I assume because you want to skip transactions altogether), and that's a bad idea.
If that's not the case (I hope) and you're worried about performance, maybe your transaction isolation level is too high for what you need. You may want to wrap your calls in transactions that use a much lower (less safe, but with less locking and therefore better performance) isolation level such as READ UNCOMMITTED.
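For example, here is a minimal sketch of lowering the isolation level with a TransactionScope (the context and entity names are placeholders, not from your code):
using System.Linq;
using System.Transactions;

// Sketch only: run a read under READ UNCOMMITTED to reduce locking.
// "MyContext" and "Orders" are illustrative names.
static int CountUnprocessed(MyContext context)
{
    var options = new TransactionOptions { IsolationLevel = IsolationLevel.ReadUncommitted };

    using (var scope = new TransactionScope(TransactionScopeOption.Required, options))
    {
        var count = context.Orders.Count(o => !o.Processed); // dirty read, fewer shared locks
        scope.Complete();
        return count;
    }
}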
I have a .NET Core 1.1 API with EF Core 1.1 and using Microsoft's vanilla setup of using Dependency Injection to provide the DbContext to my services. (Reference: https://learn.microsoft.com/en-us/aspnet/core/data/ef-mvc/intro#register-the-context-with-dependency-injection)
Now I am looking into parallelizing database reads as an optimization, using WhenAll.
So instead of:
var result1 = await _dbContext.TableModel1.FirstOrDefaultAsync(x => x.SomeId == AnId);
var result2 = await _dbContext.TableModel2.FirstOrDefaultAsync(x => x.SomeOtherProp == AProp);
I use:
var repositoryTask1 = _dbContext.TableModel1.FirstOrDefaultAsync(x => x.SomeId == AnId);
var repositoryTask2 = _dbContext.TableModel2.FirstOrDefaultAsync(x => x.SomeOtherProp == AProp);
(var result1, var result2) = await (repositoryTask1, repositoryTask2).WhenAll();
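(The tuple .WhenAll() above is not part of the BCL; it comes from a small extension method. A minimal sketch of such an extension, for context:)
using System.Threading.Tasks;

// Sketch of the kind of extension assumed above: await both tasks,
// then return their results as a tuple.
public static class TaskTupleExtensions
{
    public static async Task<(T1, T2)> WhenAll<T1, T2>(this (Task<T1> first, Task<T2> second) tasks)
    {
        await Task.WhenAll(tasks.first, tasks.second);
        return (tasks.first.Result, tasks.second.Result);
    }
}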
This is all well and good, until I use the same strategy outside of these DB Repository access classes and call these same methods with WhenAll in my controller across multiple services:
var serviceTask1 = _service1.GetSomethingsFromDb(Id);
var serviceTask2 = _service2.GetSomeMoreThingsFromDb(Id);
(var dataForController1, var dataForController2) = await (serviceTask1, serviceTask2).WhenAll();
Now when I call this from my controller, randomly I will get concurrency errors like:
System.InvalidOperationException: ExecuteReader requires an open and available Connection. The connection's current state is closed.
The reason, I believe, is that these threads sometimes try to use the same connection at the same time. I know that this is by design in EF Core, and if I wanted to I could create a new DbContext every time, but I am trying to see if there is a workaround. That's when I found this good post by Mehdi El Gueddari: http://mehdi.me/ambient-dbcontext-in-ef6/
In which he acknowledges this limitation:
an injected DbContext prevents you from being able to introduce multi-threading or any sort of parallel execution flows in your services.
And offers a custom workaround with DbContextScope.
However, he presents a caveat even with DbContextScope in that it won't work in parallel (what I'm trying to do above):
if you attempt to start multiple parallel tasks within the context of a DbContextScope (e.g. by creating multiple threads or multiple TPL Task), you will get into big trouble. This is because the ambient DbContextScope will flow through all the threads your parallel tasks are using.
His final point here leads me to my question:
In general, parallelizing database access within a single business transaction has little to no benefits and only adds significant complexity. Any parallel operation performed within the context of a business transaction should not access the database.
Should I not be using WhenAll in this case in my controllers and stick with awaiting one call at a time? Or is dependency injection of the DbContext the more fundamental problem here, meaning a new context should instead be created/supplied every time by some kind of factory?
Using any context.XyzAsync() method is only useful if you either await the called method or return control to a calling thread that doesn't have the context in its scope.
A DbContext instance isn't thread-safe: you should never use it in parallel threads. To be safe, never use it in multiple threads at all, even if they don't run in parallel. Don't try to work around it.
If for some reason you want to run parallel database operations (and think you can avoid deadlocks, concurrency conflicts etc.), make sure each one has its own DbContext instance. Note however, that parallelization is mainly useful for CPU-bound processes, not IO-bound processes like database interaction. Maybe you can benefit from parallel independent read operations but I would certainly never execute parallel write processes. Apart from deadlocks etc. it also makes it much harder to run all operations in one transaction.
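For illustration, here is a minimal sketch of two parallel reads that each get their own context (the options-accepting constructor and the wrapper method are my own assumptions; TableModel1/TableModel2 mirror the naming in the question):
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

// Sketch: each parallel read gets its own DbContext instance.
// "AppDbContext" and its DbContextOptions constructor are assumed here.
public async Task<(TableModel1, TableModel2)> LoadInParallelAsync(
    DbContextOptions<AppDbContext> options, int anId, string aProp)
{
    using (var ctx1 = new AppDbContext(options))
    using (var ctx2 = new AppDbContext(options))
    {
        var task1 = ctx1.TableModel1.FirstOrDefaultAsync(x => x.SomeId == anId);
        var task2 = ctx2.TableModel2.FirstOrDefaultAsync(x => x.SomeOtherProp == aProp);

        await Task.WhenAll(task1, task2); // the reads run concurrently, one context each
        return (task1.Result, task2.Result);
    }
}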
In ASP.NET Core you'd generally use the context-per-request pattern (ServiceLifetime.Scoped, see here), but even that can't stop you from handing the context to multiple threads. In the end, only the programmer can prevent that.
If you're worried about the performance cost of creating new contexts all the time: don't be. Creating a context is a lightweight operation, because the underlying model (store model, conceptual model and the mappings between them) is created once and then cached in the application domain. Also, a new context doesn't open a physical connection to the database; ASP.NET database operations run through the connection pool, which manages the physical connections.
If all this implies that you have to reconfigure your DI to align with best practices, so be it. If your current setup passes contexts to multiple threads, a poor design decision was made in the past. Resist the temptation to postpone the inevitable refactoring with work-arounds. The only real work-around is to de-parallelize your code, so in the end it may even be slower than if you redesign your DI and your code to adhere to one context per thread.
It got to the point where the only way to resolve the debate was to run a performance/load test and gather comparable, empirical, statistical evidence, so I could settle this once and for all.
Here is what I tested:
Cloud load test with VSTS: 200 concurrent users max for 4 minutes against a Standard Azure web app.
Test #1: 1 API call with Dependency Injection of the DbContext and async/await for each service.
Results for Test #1:
Test #2: 1 API call with new creation of the DbContext within each service method call and using parallel thread execution with WhenAll.
Results for Test #2:
Conclusion:
For those who doubt the results, I ran these tests several times with varying user loads, and the averages were basically the same every time.
In my opinion the performance gains from parallel processing are insignificant, and they do not justify abandoning Dependency Injection, which would create development overhead and maintenance debt, introduce potential for bugs if handled wrong, and mean a departure from Microsoft's official recommendations.
One more thing to note: as you can see, there were actually a few failed requests with the WhenAll strategy, even when ensuring a new context is created every time. I am not sure of the reason, but I would much rather have no 500 errors than a 10 ms performance gain.
Recently I have been doing some simple EF work. Very simple:
first,
List<Book> books = entity.Books.Where(c => c.processed == false).ToList();
then
foreach(var book in books)
{
// Do some logic, update some properties
book.ISBN = "ISBN " + randomNumber();
book.processed = true;
entity.SaveChanges();
}
I put entity.SaveChanges() within the foreach because it is a large list, around 100k records. If a record is processed without problems I flag it by setting book.processed = true, so that if the process is interrupted by an exception, next time I don't have to process the good records again.
Everything seemed OK to me, and it is fast when you process hundreds of records. But when we moved to 100k records, entity.SaveChanges() became very, very slow, around 1-3 seconds per record. We then kept the entity model but replaced entity.SaveChanges() with a classic SqlHelper.ExecuteNonQuery("update_book", sqlparams), and it is very fast.
Could anyone tell me why Entity Framework is so slow here? And if I still want to use entity.SaveChanges(), what is the best way to improve its performance?
Thank you
Turn off change tracking before you perform your inserts. This will improve your performance significantly (by orders of magnitude). Putting SaveChanges() outside your loop will help as well, but turning off change tracking will help even more.
using (var context = new CustomerContext())
{
context.Configuration.AutoDetectChangesEnabled = false;
// A loop to add all your new entities
context.SaveChanges();
}
See this page for some more information.
I would take the SaveChanges(book) call outside of the foreach. Since book is part of a list on the entity, you can move the save outside the loop and EF will work better with the resulting code.
The list is an attribute on the entity, and EF is designed to optimize updates/creates/deletes against the back-end database. If you do this, I'd be curious whether it helps.
I too would advise you to take the SaveChanges() call out of the loop: calling it n times means the context has to run through its checkpoints and validations n times.
var books = entity.Books.Where(c => c.processed == false).ToList();
books.ForEach(b =>
{
b.ISBN = "ISBN " + randomNumber();
b.processed = true;
//DoSomelogic, update some properties
});
entity.SaveChanges();
Entity Framework is, in my opinion, a poor choice for bulk operations, both from a performance and a memory-consumption standpoint. Once you get beyond a few thousand records, the SaveChanges method really starts to break down.
You can try to partition your work into smaller transactions, but again, I think you're working too hard for this.
A much better approach is to leverage the bulk operations your DBMS already provides. SQL Server exposes bulk copy through .NET. Oracle provides bulk copy for Oracle.DataAccess and for unmanaged data access; for Oracle.ManagedDataAccess the bulk copy library unfortunately isn't available, but you can create an Oracle stored procedure using BULK COLLECT/FORALL that inserts thousands of records within seconds and with a much lower memory footprint in your application. Within the .NET app you can pass PL/SQL associative arrays as parameters, etc.
The benefit of leveraging the BULK facilities within your DBMS is reducing the context switches between your application, the query processor and the database engine.
I'm sure other database vendors provide something similar.
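For SQL Server, for example, a minimal SqlBulkCopy sketch could look like this (the table and column names are made up for illustration):
using System.Data;
using System.Data.SqlClient;

// Sketch: bulk-insert rows with SqlBulkCopy instead of per-entity SaveChanges.
// "dbo.Books" and its columns are illustrative, not from the original question.
static void BulkInsertBooks(string connectionString, DataTable books)
{
    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();

        using (var bulkCopy = new SqlBulkCopy(connection))
        {
            bulkCopy.DestinationTableName = "dbo.Books";
            bulkCopy.BatchSize = 5000;                      // send rows in batches
            bulkCopy.ColumnMappings.Add("ISBN", "ISBN");
            bulkCopy.ColumnMappings.Add("Processed", "Processed");
            bulkCopy.WriteToServer(books);                  // one bulk operation instead of row-by-row inserts
        }
    }
}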
"AsNoTracking" works for me
ex:
Item itemctx = ctx.Items.AsNoTracking().Single(i=>i.idItem == item.idItem);
ctx.Entry(itemctx).CurrentValues.SetValues(item);
itemctx.images = item.images;
ctx.SaveChanges();
Without "AsNoTracking" the updates is very slow.
I just execute the insert command directly.
// The Id property is the primary key, so IDENTITY_INSERT has to be handled around these inserts
db.Database.ExecuteSqlCommand("SET IDENTITY_INSERT [dbo].[MyTable] ON");
foreach (var p in itemsToSave)
{
    db.Database.ExecuteSqlCommand(
        "INSERT INTO [dbo].[MyTable] ([Property1], [Property2]) VALUES (@Property1, @Property2)",
        new SqlParameter("@Property1", p.Property1),
        new SqlParameter("@Property2", p.Property2));
}
db.Database.ExecuteSqlCommand("SET IDENTITY_INSERT [dbo].[MyTable] OFF");
Works really quickly. EF is just impossibly slow with bulk updates, completely unusable in my case for more than a dozen or so items.
Use this Nuget package:
Z.EntityFramework.Extensions
It has extension methods that you can call on the DbContext, like DbContext.BulkSaveChanges, which works amazingly fast.
Note: this is NOT a free package but it has a trial period.
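A minimal usage sketch, with placeholder context and entity names:
// Sketch: BulkSaveChanges is used in place of SaveChanges for large change sets.
// "BookContext" and "Books" are placeholder names.
using (var context = new BookContext())
{
    foreach (var book in context.Books.Where(b => !b.Processed))
    {
        book.Processed = true;
    }

    context.BulkSaveChanges(); // batches the generated commands instead of one round-trip per entity
}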
I have a product table accessed by many applications, each with several users. I want to avoid collisions, but in a very small portion of code I have detected that collisions can occur.
$item = $em->getRepository('MyProjectProductBundle:Item')
->findOneBy(array('product'=>$this, 'state'=>1));
if ($item)
{
$item->setState(3);
$item->setDateSold(new \DateTime("now"));
$item->setDateSent(new \DateTime("now"));
$dateC = new \DateTime("now");
$dateC->add(new \DateInterval('P1Y'));
$item->setDateGuarantee($dateC);
$em->persist($item);
$em->flush();
//...after this, set up customer data, etc.
}
One option could be to do two persist() and flush() calls, the first one just after the state change, but before doing that I would like to know if there is an approach that offers a stronger guarantee.
I don't think a transaction is a solution, as there are many other actions involved in the process; wrapping them all in a transaction would force many rollbacks and failed sales, making things worse.
The database is PostgreSQL.
Any other ideas?
My first thought would be to look at optimistic locking. The end result is that if someone changes the underlying data out from under you, doctrine will throw an exception on flush. However, this might not be easy, since you say you have multiple applications operating on a central database -- it's not clear if you can control those applications or not, and you'll need to, because they'll all need to play along with the optimistic locking scheme and update the version column when they run updates.
I have a strange problem: when I try to use ObjectSet<TEntity>.AddObject() to add a new entity, it is very slow compared to the normal behavior, taking around 500 ms, and it happens in just one particular case. I am sure that the instruction taking the time is AddObject and not SaveChanges.
The code is very simple:
ObjectEntity obj = businessObj.ExtractEntityObj();
context.ObjectTable.AddObject(obj);
I do some work on the ObjectContext before calling this part of the code, but not much more than in all the other operations. It seems to depend on how many rows with the same foreign key are stored in the database; I know that sounds weird, but it's the only variable that changes the performance of this same instruction. I have also noticed that if I try this on a new DataContext the performance goes back to normal (a few milliseconds), but for architectural reasons I can't have a new DataContext each time I have to do something.
Could someone help me?
Thanks
I resolved my issue; maybe it will be helpful to someone. It was due to a simple query I was executing before that instruction, this one:
int id = context.ObjectTable.Where(t => t.Something == somethingElse).FirstOrDefault().ID;
I still don't know why, but this instruction attaches all the "ObjectTable" data to the context, making it very slow. "ObjectTable" is also the table into which I was adding the new entity.
Setting the ObjectQuery MergeOption to NoTracking resolved the problem.
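Roughly, the kind of change looks like this (simplified names; MergeOption lives in System.Data.Objects for EF 4/5 and in System.Data.Entity.Core.Objects for EF 6):
// Sketch: query with NoTracking so the returned entities are not attached
// to the context. "ObjectTable" and "Something" mirror the snippet above.
context.ObjectTable.MergeOption = MergeOption.NoTracking;

int id = context.ObjectTable
    .Where(t => t.Something == somethingElse)
    .FirstOrDefault()
    .ID;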
Bye
Modify your query
int id = context.ObjectTable.Where(t => t.Something == somethingElse).FirstOrDefault().ID;
to
int id = context.ObjectTable.FirstOrDefault(t => t.Something == somethingElse).ID;
It shouldn't make any significant difference in performance, but why use Where() and then FirstOrDefault()?
I have a business case in which, for each record in a given source table, a number of changes need to be made in different tables, and each of these sourceTable records needs to be processed in isolation.
So I have the following pseudocode:
MyEntityFrameworkContext ctx;
foreach (sourceRecord sr in ctx.sourceTable)
{
    try
    {
        using (MyEntityFrameworkContext tctx = new MyEntityFrameworkContext())
        {
            string result1 = MakeUpdatesToSomeOtherTable1(tctx);
            sr.Result1 = result1;
            string result2 = MakeUpdatesToSomeOtherTable2(tctx);
            sr.Result2 = result2;
            // there will be more tables here.

            using (TransactionScope ts = new TransactionScope())
            {
                tctx.SaveChanges(); // to save changes made to OtherTable1 and OtherTable2
                tctx.ExecuteStoreCommand("SQL that makes a few other changes related to sourceRecord to tables that are NOT in the EF context");
                ts.Complete();
            }
        }
    }
    catch (Exception ex)
    {
        sr.ExceptionResult = ex.Message;
    }
}
ctx.SaveChanges(); // to save all changes made to sourceTable.
The reason for the tctx and the TransactionScope within the loop is that I need the changes to OtherTable1 and OtherTable2 to be saved in one transaction together with the tctx.ExecuteStoreCommand() call for each sourceRecord being processed.
Note also that I need the results written to sourceTable to be saved independently of the changes made to the tables updated in the TransactionScope. Hence I can't include the update of sourceTable in the same TransactionScope, since if that txn rolls back, I won't have a record of the exception. This way, at the end of the whole process, I can see which sourceRecords failed and which ones succeeded.
The above pseudocode works perfectly.
However, I would like to take advantage of parallelism here, and have converted the foreach to a Parallel.ForEach(). But then I run into very unexpected errors: a TransactionAbortedException after calling ts.Complete(), a NullReferenceException when calling ctx.SaveChanges(), or, when setting one of the Result properties of sourceRecord, sometimes an InvalidOperationException: EntityMemberChanged or EntityComplexMemberChanged was called without first calling EntityMemberChanging or EntityComplexMemberChanging on the same change tracker with the same property name.
So I'm thinking that parallelism, while great for queries, does not lend itself well to data updates in Entity Framework? What am I missing or not understanding about parallelism? I don't understand why my approach above breaks when converted to use it. Any advice would be appreciated.
The EF object context isn't thread safe, so there is a high potential for catastrophic errors when multiple threads are buzzing around in the same context.
It looks like you have at least one context object outside the foreach loop and shared among threads.
Based on your description I'd guess that updating properties on the sourceRecord entities from multiple threads is corrupting some internal state in the context - probably the collections of data it maintains for change tracking.
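One way to restructure it, just as a sketch (I'm assuming you load the source record keys up front into a sourceIds list and that sourceRecord has an Id property; everything else keeps your names), is to do all the parallel work against per-iteration contexts and only touch the shared ctx and its tracked entities on a single thread afterwards:
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;
using System.Transactions;

// Sketch: keep the shared ctx (and the sourceRecord entities it tracks) out of
// the parallel body. Each iteration gets its own context; results are collected
// and applied to the tracked entities on a single thread afterwards.
var results = new ConcurrentBag<Tuple<int, string, string, string>>(); // id, result1, result2, error

Parallel.ForEach(sourceIds, id =>
{
    try
    {
        using (var tctx = new MyEntityFrameworkContext())
        {
            string r1 = MakeUpdatesToSomeOtherTable1(tctx);
            string r2 = MakeUpdatesToSomeOtherTable2(tctx);

            using (var ts = new TransactionScope())
            {
                tctx.SaveChanges();
                tctx.ExecuteStoreCommand("SQL that makes a few other changes related to sourceRecord");
                ts.Complete();
            }

            results.Add(Tuple.Create(id, r1, r2, (string)null));
        }
    }
    catch (Exception ex)
    {
        results.Add(Tuple.Create(id, (string)null, (string)null, ex.Message));
    }
});

// Back on one thread: write the outcomes to the tracked sourceTable entities.
foreach (var r in results)
{
    var sr = ctx.sourceTable.Single(s => s.Id == r.Item1);
    sr.Result1 = r.Item2;
    sr.Result2 = r.Item3;
    sr.ExceptionResult = r.Item4;
}
ctx.SaveChanges();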
Parallelism only benefits operations that are CPU-bound. In your case it seems that all you are doing is I/O (database updates, etc.), so I don't think you can gain much by making it parallel. If there are CPU-bound operations in this code, you can run those parts in parallel for each update, but the updates themselves should stay sequential.