When calling DbContext.SaveChanges(), is there any performance benefit to checking whether there are any changes to be saved? For example:
if (context.ChangeTracker.HasChanges())
context.SaveChanges()
Or is this the kind of check that EF is performing internally when SaveChanges() is called?
Internally, EF checks whether any changes have been made, and if there are none it will not send anything to the database, so no resources are wasted.
I hope this helps
Stopwatch sw = new Stopwatch();
Stopwatch sw2 = new Stopwatch();
sw.Start();
var b = db.ChangeTracker.HasChanges();
sw.Stop();
sw2.Start();
var c = db.SaveChanges();
sw2.Stop();
Console.WriteLine("Elapsed(HasChanges)={0}", sw.Elapsed); //takes 0.0000889 miliseconds
Console.WriteLine("Elapsed(SaveChanges)={0}", sw2.Elapsed); // takes 0.0003563 miliseconds
Console.ReadLine();
Is it possible to take a snapshot of the current state of EF Core's ChangeTracker and then reset later to that snapshot if needed?
Let's say I want to make the following code into reality:
public async Task ExecuteTransactionAsync(DbContext context)
{
var executionStrategy = new SqlServerRetryingExecutionStrategy(context);
var retries = 0;
object snapshot = null;
await executionStrategy.ExecuteAsync(async () =>
{
var txOptions = new TransactionOptions
{
IsolationLevel = IsolationLevel.ReadCommitted
};
using var transaction = new TransactionScope(TransactionScopeOption.RequiresNew, txOptions, TransactionScopeAsyncFlowOption.Enabled);
if (retries > 0 && snapshot != null)
{
// This method in reality doesn't exist.
context.ChangeTracker.ResetToSnapshot(snapshot);
}
// This method in reality doesn't exist.
snapshot = context.ChangeTracker.TakeSnapshot();
retries += 1;
// Let's imagine that we are doing an insert operation on the context.
context.DoInsert();
await context.SaveChangesAsync(false);
// Let's imagine that we are doing an update operation on the context.
context.DoUpdate();
// Let's say this SaveChanges fails the first time and the transaction will retry.
await context.SaveChangesAsync(false);
transaction.Complete();
});
}
Why would this be useful?
I have found it hard to work with the ChangeTracker and retrying transactions. The transaction itself of course works fine; the changes are rolled back on the database side between retries. However, the ChangeTracker doesn't seem to have a similar rollback functionality.
There is the SaveChanges(false) option, which is supposed to be used with transactions in order to preserve the state of the ChangeTracker between retries. However, if I did an insert on the first run of the transaction, then on the second run (the first retry) another entity would be added to the ChangeTracker, and on a successful SaveChanges two entities would be inserted into the database, while my expectation from the code was to have one entity inserted.
public async Task ExecuteTransactionAsync(DbContext context)
{
var executionStrategy = new SqlServerRetryingExecutionStrategy(context);
await executionStrategy.ExecuteAsync(async () =>
{
var txOptions = new TransactionOptions
{
IsolationLevel = IsolationLevel.ReadCommitted
};
using var transaction = new TransactionScope(TransactionScopeOption.RequiresNew, txOptions, TransactionScopeAsyncFlowOption.Enabled);
// On the first retry, the ChangeTracker will already have a new entity added.
// DoInsert will add another one and now I will have two new entities
// when my expectation was just one.
context.DoInsert();
await context.SaveChangesAsync(false);
context.DoUpdate();
// Let's say this SaveChanges fails the first time and the transaction will retry.
await context.SaveChangesAsync(false);
transaction.Complete();
});
}
A similar problem would occur when updating an entity with the += or -= operators. If, for example, an entity had Price = 10 and DoUpdate did Price -= 2, then after the first retry (so after the code wrapped in the transaction had run twice) I would have Price = 6 because the subtraction happened twice, when my expectation at the end of the transaction was Price = 8.
So it got me thinking that a complete reset of the ChangeTracker to its state at some earlier point in time would be extremely useful.
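For illustration, something along these lines is roughly what I imagine such a helper would have to do by hand (TrackerSnapshot is a made-up name, and this sketch only captures each entry's EntityState, not its original property values, so it is not a complete restore):
using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;
// Made-up helper: remembers which entities were tracked and in what state,
// and puts the ChangeTracker back into that shape later. Property values are not restored.
public sealed class TrackerSnapshot
{
    private readonly List<(object Entity, EntityState State)> _entries;
    private TrackerSnapshot(List<(object Entity, EntityState State)> entries) => _entries = entries;
    public static TrackerSnapshot Take(DbContext context) =>
        new TrackerSnapshot(context.ChangeTracker.Entries()
            .Select(e => (e.Entity, e.State))
            .ToList());
    public void RestoreTo(DbContext context)
    {
        // Detach anything that started being tracked after the snapshot was taken.
        foreach (var entry in context.ChangeTracker.Entries().ToList())
        {
            if (!_entries.Any(s => ReferenceEquals(s.Entity, entry.Entity)))
                entry.State = EntityState.Detached;
        }
        // Put the remembered entries back into their previous state.
        foreach (var (entity, state) in _entries)
            context.Entry(entity).State = state;
    }
}
I know EF Core 5+ has ChangeTracker.Clear(), but that wipes everything rather than restoring a previous state, which is not quite what I am after.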
I have a C# library which connects to 59 servers with the same database structure and imports data into the same table in my local DB. At the moment I am retrieving data server by server in a foreach loop:
foreach (var systemDto in systems)
{
var sourceConnectionString = _systemService.GetConnectionStringAsync(systemDto.Ip).Result;
var dbConnectionFactory = new DbConnectionFactory(sourceConnectionString,
"System.Data.SqlClient");
var dbContext = new DbContext(dbConnectionFactory);
var storageRepository = new StorageRepository(dbContext);
var usedStorage = storageRepository.GetUsedStorageForCurrentMonth();
var dtUsedStorage = new DataTable();
dtUsedStorage.Load(usedStorage);
var dcIp = new DataColumn("IP", typeof(string)) {DefaultValue = systemDto.Ip};
var dcBatchDateTime = new DataColumn("BatchDateTime", typeof(string))
{
DefaultValue = batchDateTime
};
dtUsedStorage.Columns.Add(dcIp);
dtUsedStorage.Columns.Add(dcBatchDateTime);
using (var blkCopy = new SqlBulkCopy(destinationConnectionString))
{
blkCopy.DestinationTableName = "dbo.tbl";
blkCopy.WriteToServer(dtUsedStorage);
}
}
Because there are many systems to retrieve data from, I wonder if it is possible to use a Parallel.ForEach loop. Will SqlBulkCopy lock the table during WriteToServer, so that the next WriteToServer has to wait until the previous one completes?
-- EDIT 1
I've changed the foreach to Parallel.ForEach, but I face one problem. Inside this loop I call an async method: _systemService.GetConnectionStringAsync(systemDto.Ip)
and this line throws an error:
System.NotSupportedException: A second operation started on this
context before a previous asynchronous operation completed. Use
'await' to ensure that any asynchronous operations have completed
before calling another method on this context. Any instance members
are not guaranteed to be thread safe.
Any ideas how I can handle this?
In general, it will be blocked and will wait until the previous operation completes.
There are several factors that affect whether SqlBulkCopy can be run in parallel.
I remember that when I added the parallel feature to my .NET Bulk Operations library, I had a hard time making it work correctly in parallel, and it only worked well when the table had no index (which is almost never the case).
Even when it worked, the performance gain was not large.
Perhaps you will find more information here: MSDN - Importing Data in Parallel with Table Level Locking
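Regarding the NotSupportedException in EDIT 1: a DbContext (and the service behind GetConnectionStringAsync) is not thread-safe, so it must not be used from several Parallel.ForEach iterations at once. A rough sketch, assuming the same types as in the question (DbConnectionFactory, StorageRepository, batchDateTime, destinationConnectionString), is to resolve all the connection strings first on a single thread and only parallelize the bulk-copy part, giving each iteration its own context:
// Sketch only: 1) resolve the connection strings one at a time, so the shared service
// is never used concurrently; 2) run the copies in parallel, each with its own DbContext.
var connectionStrings = new Dictionary<string, string>();
foreach (var systemDto in systems)
{
    connectionStrings[systemDto.Ip] =
        _systemService.GetConnectionStringAsync(systemDto.Ip).Result;
}
Parallel.ForEach(systems, systemDto =>
{
    var dbConnectionFactory = new DbConnectionFactory(
        connectionStrings[systemDto.Ip], "System.Data.SqlClient");
    // One DbContext per iteration: contexts are not thread-safe and must not be shared.
    using (var dbContext = new DbContext(dbConnectionFactory))
    using (var blkCopy = new SqlBulkCopy(destinationConnectionString))
    {
        var storageRepository = new StorageRepository(dbContext);
        var usedStorage = storageRepository.GetUsedStorageForCurrentMonth();
        var dtUsedStorage = new DataTable();
        dtUsedStorage.Load(usedStorage);
        dtUsedStorage.Columns.Add(new DataColumn("IP", typeof(string)) { DefaultValue = systemDto.Ip });
        dtUsedStorage.Columns.Add(new DataColumn("BatchDateTime", typeof(string)) { DefaultValue = batchDateTime });
        blkCopy.DestinationTableName = "dbo.tbl";
        blkCopy.WriteToServer(dtUsedStorage);
    }
});
Whether the parallel WriteToServer calls then actually overlap at the destination table depends on the locking behaviour described above (TableLock option, indexes on the target table, and so on).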
I have an EF code-first context which represents a queue of jobs that processing applications can retrieve and run. These processing applications can be running on different machines but pointing at the same database.
The context provides a method, CollectQueueItem, that returns a QueueItem if there is any work to do, or null otherwise.
To ensure no two applications can pick up the same job, the collection takes place in a transaction with an isolation level of REPEATABLE READ. This means that if there are two attempts to pick up the same job at the same time, one will be chosen as the deadlock victim and rolled back. We can handle this by catching the DbUpdateException and returning null.
Here is the code for the CollectQueueItem method:
public QueueItem CollectQueueItem()
{
using (var transaction = new TransactionScope(TransactionScopeOption.Required, new TransactionOptions { IsolationLevel = IsolationLevel.RepeatableRead }))
{
try
{
var queueItem = this.QueueItems.FirstOrDefault(qi => !qi.IsLocked);
if (queueItem != null)
{
queueItem.DateCollected = DateTime.UtcNow;
queueItem.IsLocked = true;
this.SaveChanges();
transaction.Complete();
return queueItem;
}
}
catch (DbUpdateException) //we might have been the deadlock victim. No matter.
{ }
return null;
}
}
I ran a test in LinqPad to check that this is working as expected. Here is the test below:
var ids = Enumerable.Range(0, 8).AsParallel().SelectMany(i =>
Enumerable.Range(0, 100).Select(j => {
using (var context = new QueueContext())
{
var queueItem = context.CollectQueueItem();
return queueItem == null ? -1 : queueItem.OperationId;
}
})
);
var sw = Stopwatch.StartNew();
var results = ids.GroupBy(i => i).ToDictionary(g => g.Key, g => g.Count());
sw.Stop();
Console.WriteLine("Elapsed time: {0}", sw.Elapsed);
Console.WriteLine("Deadlocked: {0}", results.Where(r => r.Key == -1).Select(r => r.Value).SingleOrDefault());
Console.WriteLine("Duplicates: {0}", results.Count(r => r.Key > -1 && r.Value > 1));
//IsolationLevel = IsolationLevel.RepeatableRead:
//Elapsed time: 00:00:26.9198440
//Deadlocked: 634
//Duplicates: 0
//IsolationLevel = IsolationLevel.ReadUncommitted:
//Elapsed time: 00:00:00.8457558
//Deadlocked: 0
//Duplicates: 234
I ran the test a few times. Without the REPEATABLE READ isolation level, the same job is retrieved by different threads (seen in the 234 duplicates). With REPEATABLE READ, jobs are only retrieved once, but performance suffers and there are 634 deadlocked transactions.
My question is: is there a way to get this behaviour in EF without the risk of deadlocks or conflicts? I know in real life there will be less contention as the processors won't be continually hitting the database, but nonetheless, is there a way to do this safely without having to handle the DbUpdateException? Can I get performance closer to that of the version without the REPEATABLE READ isolation level? Or are deadlocks in fact not that bad, so that I can safely ignore the exception, let the processor retry after a few milliseconds, and accept that performance will be OK as long as not all the transactions happen at the same time?
Thanks in advance!
I'd recommend a different approach.
a) sp_getapplock
Use the SQL Server stored procedure that provides an application-lock feature.
This gives you exclusive application behaviour, which might involve reading from the DB or whatever other activity you need to control. It also lets you use EF in a normal way.
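A rough sketch of what taking an application lock from EF might look like (EF 6 style, inside a DbContext method such as CollectQueueItem; the resource name and timeout are just illustrative):
// Requires System.Data and System.Data.SqlClient.
// Take an app lock for the duration of a transaction, do the normal EF work, then commit.
using (var tx = this.Database.BeginTransaction())
{
    var result = new SqlParameter("@result", SqlDbType.Int) { Direction = ParameterDirection.Output };
    // sp_getapplock returns >= 0 when the lock was granted.
    this.Database.ExecuteSqlCommand(
        "EXEC @result = sp_getapplock @Resource = 'CollectQueueItem', " +
        "@LockMode = 'Exclusive', @LockOwner = 'Transaction', @LockTimeout = 5000;",
        result);
    if ((int)result.Value >= 0)
    {
        // ... normal EF work: query the queue, mark the item, SaveChanges() ...
        tx.Commit(); // committing the transaction releases the lock
    }
    else
    {
        tx.Rollback(); // lock not acquired within the timeout; give up or retry
    }
}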
OR
b) Optimistic concurrency
http://msdn.microsoft.com/en-us/data/jj592904
//Object Property:
public byte[] RowVersion { get; set; }
//Object Configuration:
Property(p => p.RowVersion).IsRowVersion().IsConcurrencyToken();
A logical extension to the app lock, or something that can be used just by itself, is a rowversion concurrency field in the DB. Allow the dirty read, but when someone goes to update the record as collected, the update fails if someone else beat them to it. That's out-of-the-box EF optimistic locking.
You can delete "collected" job records later easily.
This might be the better approach unless you expect high levels of concurrency.
As suggested by Phil, I used optimistic concurrency to ensure the job could not be processed more than once. I realised that rather than having to add a dedicated rowversion column I could use the IsLocked bit column as the ConcurrencyToken. Semantically, if this value has changed since we retrieved the row, the update should fail since only one processor should ever be able to lock it. I used the fluent API as below to configure this, although I could also have used the ConcurrencyCheck data annotation.
protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
modelBuilder.Entity<QueueItem>()
.Property(p => p.IsLocked)
.IsConcurrencyToken();
}
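For comparison, a sketch of the equivalent configuration using the data annotation instead of the fluent API (the key property shown is just assumed for illustration):
using System;
using System.ComponentModel.DataAnnotations;
public class QueueItem
{
    public int Id { get; set; } // key property assumed for illustration
    [ConcurrencyCheck] // same effect as .IsConcurrencyToken() in the fluent API
    public bool IsLocked { get; set; }
    public DateTime? DateCollected { get; set; }
    public int OperationId { get; set; }
    // ... remaining properties ...
}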
I was then able to simplify the CollectQueueItem method, losing the TransactionScope entirely and catching the more specific DbUpdateConcurrencyException.
public OperationQueueItem CollectQueueItem()
{
try
{
var queueItem = this.QueueItems.FirstOrDefault(qi => !qi.IsLocked);
if (queueItem != null)
{
queueItem.DateCollected = DateTime.UtcNow;
queueItem.IsLocked = true;
this.SaveChanges();
return queueItem;
}
}
catch (DbUpdateConcurrencyException) //someone else grabbed the job.
{ }
return null;
}
I reran the tests, and you can see it's a great compromise: no duplicates, over 50x faster than with REPEATABLE READ, and no deadlocks, so the DBAs won't be on my case. Awesome! (The "Deadlocked" counter in the output below simply counts the null results, which are now concurrency conflicts rather than actual database deadlocks.)
//Optimistic Concurrency:
//Elapsed time: 00:00:00.5065586
//Deadlocked: 624
//Duplicates: 0
I am looking for the best way to handle concurrency while using Entity Framework. The simplest and most commonly recommended (also on Stack Overflow) solution is described here:
http://msdn.microsoft.com/en-us/library/bb399228.aspx
And it looks like:
try
{
// Try to save changes, which may cause a conflict.
int num = context.SaveChanges();
Console.WriteLine("No conflicts. " +
num.ToString() + " updates saved.");
}
catch (OptimisticConcurrencyException)
{
// Resolve the concurrency conflict by refreshing the
// object context before re-saving changes.
context.Refresh(RefreshMode.ClientWins, orders);
// Save changes.
context.SaveChanges();
Console.WriteLine("OptimisticConcurrencyException "
+ "handled and changes saved");
}
But is it enough? What if something changes between Refresh() and the second SaveChanges()? Won't there be an uncaught OptimisticConcurrencyException?
EDIT 2:
I think this would be the final solution:
int savesCounter = 100;
Boolean saveSuccess = false;
while (!saveSuccess && savesCounter > 0)
{
savesCounter--;
try
{
// Try to save changes, which may cause a conflict.
int num = context.SaveChanges();
saveSuccess = true;
Console.WriteLine("Save success. " + num.ToString() + " updates saved.");
}
catch (OptimisticConcurrencyException)
{
// Resolve the concurrency conflict by refreshing the
// object context before re-saving changes.
Console.WriteLine("OptimisticConcurrencyException, refreshing context.");
context.Refresh(RefreshMode.ClientWins, orders);
}
}
I am not sure if I understand how Refresh() works. Does it refresh the whole context? If so, why does it take additional arguments (entity objects)? Or does it refresh only the objects specified?
For example, in this situation, what should be passed as the second argument of Refresh():
Order dbOrder = dbContext.Orders.First(x => x.ID == orderID);
dbOrder.Name = "new name";
//here whole the code written above to save changes
should it be dbOrder?
Yes, even the second save may cause an OptimisticConcurrencyException if - as you say - something changes between Refresh() and SaveChanges().
The example given is just very simple retry logic; if you need to retry more than once or resolve the conflict in a more complex way, you're better off creating a loop that retries n times than nesting try/catch beyond this single level.
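As for what to pass as the second argument: Refresh() does not refresh the whole context, only the entity (or collection of entities) you hand it, so in your example it would be dbOrder. A minimal sketch, assuming the ObjectContext-style API from the MSDN sample:
try
{
    // Try to save the modified order; this may conflict with another update.
    context.SaveChanges();
}
catch (OptimisticConcurrencyException)
{
    // Refresh only the conflicted entity (an overload also accepts an IEnumerable of entities),
    // keeping the client's pending changes, then save again.
    context.Refresh(RefreshMode.ClientWins, dbOrder);
    context.SaveChanges();
}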
Say I have this code:
private void CreateSnapshots(IEnumerable<StreamHead> streams)
{
foreach (StreamHead head in streams)
{
IAggregate aggregate = ???;
IMemento memento = aggregate.GetSnapshot();
var snapshot = new Snapshot(head.StreamId, head.SnapshotRevision + 1, memento);
eventStore.AddSnapshot(snapshot);
observer.Notify(new SnapshotTaken(head.StreamId, head.HeadRevision));
}
}
How do I know which aggregate to load for the current stream? I'm also using CommonDomain; is there something in there that helps?
Thanks
The snapshotting aspect of the EventStore needs a bit of love. I have tried to keep the IStoreEvents interface geared toward working with an individual aggregate. I have also tried to ensure that snapshotting does not interfere with or get in the way of normal use.
Since the release of v2.0, I have now turned my attention toward v2.1 and I will be able to make a few small API changes related to this. In the meantime, your best option is probably to bypass IStoreEvents altogether when doing snapshotting.
Another alternative is to have the snapshotting code run in-process with your regular code. When an aggregate that needs a snapshot is loaded, you could easily push a reference to that aggregate asynchronously to your snapshotting code. That way, you don't actually have to do a separate load because you already have the aggregate.
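A minimal sketch of that in-process hand-off, assuming CommonDomain's IAggregate (which exposes Id, Version and GetSnapshot()) and a made-up SnapshotWorker name:
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;
using CommonDomain;
using EventStore;
// Made-up worker: aggregates that need a snapshot are pushed here as they are loaded,
// and a background task writes the snapshots, so no separate load is required.
public sealed class SnapshotWorker : IDisposable
{
    private readonly BlockingCollection<IAggregate> _queue = new BlockingCollection<IAggregate>();
    private readonly Task _pump;
    public SnapshotWorker(IStoreEvents eventStore)
    {
        _pump = Task.Run(() =>
        {
            foreach (var aggregate in _queue.GetConsumingEnumerable())
            {
                IMemento memento = aggregate.GetSnapshot();
                eventStore.AddSnapshot(new Snapshot(aggregate.Id, aggregate.Version, memento));
            }
        });
    }
    // Call this from the regular load path whenever an aggregate needs a snapshot.
    public void Enqueue(IAggregate aggregate)
    {
        _queue.Add(aggregate);
    }
    public void Dispose()
    {
        _queue.CompleteAdding();
        _pump.Wait();
    }
}
On the load path you would then just call Enqueue(aggregate) whenever the stream has grown past your snapshot threshold.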
I found a solution that works for me (it is most definitely a hack). It is still out-of-band snapshotting. Here's a sample of it in action.
private void CreateSnapshots(IEnumerable<StreamHead> streams)
{
foreach (StreamHead head in streams)
{
//NOTE: This uses a patched version of EventStore that loads commit headers in OptimisticEventStream.PopulateStream()
// <code>
// this.identifiers.Add(commit.CommitId);
// this.headers = this.headers.Union(commit.Headers).ToDictionary(k => k.Key, k => k.Value);
// </code>
var stream = eventStore.OpenStream(head.StreamId, int.MinValue, int.MaxValue);
//NOTE: Nasty hack but it works.
var aggregateType = stream.UncommittedHeaders.First(p => p.Key == "AggregateType");
var type = aggregateTypeResolver(aggregateType.Value.ToString());
MethodInfo methodInfo = typeof(IRepository).GetMethod("GetById");
MethodInfo method = methodInfo.MakeGenericMethod(type);
object o = method.Invoke(repository, new object[]{head.StreamId, head.HeadRevision});
var aggregate = (IAggregate) o;
IMemento memento = aggregate.GetSnapshot();
var snapshot = new Snapshot(head.StreamId, head.HeadRevision, memento);
eventStore.AddSnapshot(snapshot);
observer.Notify(new SnapshotTaken(head.StreamId, head.HeadRevision));
}
}