Entity Framework performance issue, SaveChanges is very slow - entity-framework

Recently I have been doing some simple EF work. Very simple:
first,
List<Book> books = entity.Books.Where(c => c.processed == false).ToList();
then
foreach (var book in books)
{
    // Do some logic, update some properties
    book.ISBN = "ISBN " + randomNumber();
    book.processed = true;
    entity.SaveChanges();
}
I put entity.SaveChanges() inside the foreach because it is a large list, around 100k records. If a record is processed without problems I flag it by setting book.processed = true, so that if the process is interrupted by an exception, I don't have to process the good records again next time.
Everything seemed OK to me. It is fast when you process hundreds of records, but when we moved to 100k records, entity.SaveChanges became very, very slow - around 1-3 seconds per record. We then kept the entity model but replaced entity.SaveChanges with a classic SqlHelper.ExecuteNonQuery("update_book", sqlparams), and that is very fast.
Could anyone tell me why Entity Framework is so slow here? And if I still want to use entity.SaveChanges, what is the best way to improve the performance?
Thank you

Turn off change tracking before you perform your inserts. This will improve your performance significantly (orders of magnitude). Putting SaveChanges() outside your loop will help as well, but turning off change tracking will help even more.
using (var context = new CustomerContext())
{
context.Configuration.AutoDetectChangesEnabled = false;
// A loop to add all your new entities
context.SaveChanges();
}
See this page for some more information.

I would take the SaveChanges() call outside of the foreach. Since the books are already tracked by the context as a list, you can move the call outside the loop and EF will work better with the resulting code.
EF is designed to optimize updates/creates/deletes against the back-end database when the changes are saved together. If you do this, I'd be curious whether it helps.

I too would advise you to take SaveChanges() out of the loop, as calling it 'n' times does 'n' round trips to the database and forces the context to run its change detection and validation 'n' times.
var books = entity.Books.Where(c => c.processed == false).ToList();
books.ForEach(b =>
{
    b.ISBN = "ISBN " + randomNumber();
    b.processed = true;
    // Do some logic, update some properties
});
entity.SaveChanges();

The Entity Framework, in my opinion, is a poor choice for BULK operations, both from a performance and a memory-consumption standpoint. Once you get beyond a few thousand records, the SaveChanges method really starts to break down.
You can try to partition your work into smaller transactions, but again, I think you're working too hard for what you get.
A much better approach is to leverage the BULK operations that are already provided by your DBMS. SQL Server exposes bulk copy through the SqlBulkCopy class in .NET. Oracle provides bulk copy for Oracle.DataAccess (the unmanaged provider); for Oracle.ManagedDataAccess the bulk copy library unfortunately isn't available, but you can create an Oracle stored procedure using BULK COLLECT/FORALL that inserts thousands of records within seconds and with a much lower memory footprint in your application. Within the .NET app you can pass PL/SQL associative arrays as parameters, etc.
The benefit of leveraging the BULK facilities within your DBMS is reducing the context switches between your application, the query processor, and the database engine.
I'm sure other database vendors provide something similar.
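As a concrete example, here is a minimal SqlBulkCopy sketch (the connection string, table and column names are hypothetical, and it covers inserts; for bulk updates you would typically bulk copy into a staging table and then run a set-based UPDATE):
// Requires: using System.Data; using System.Data.SqlClient;
var table = new DataTable();
table.Columns.Add("ISBN", typeof(string));
table.Columns.Add("Processed", typeof(bool));
foreach (var book in books)
{
    // One row per record; no tracked entities are involved
    table.Rows.Add(book.ISBN, true);
}
using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var bulkCopy = new SqlBulkCopy(connection))
    {
        bulkCopy.DestinationTableName = "dbo.BooksStaging";
        bulkCopy.ColumnMappings.Add("ISBN", "ISBN");
        bulkCopy.ColumnMappings.Add("Processed", "Processed");
        bulkCopy.WriteToServer(table); // one streaming bulk insert instead of 100k statements
    }
}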

"AsNoTracking" works for me
ex:
Item itemctx = ctx.Items.AsNoTracking().Single(i=>i.idItem == item.idItem);
ctx.Entry(itemctx).CurrentValues.SetValues(item);
itemctx.images = item.images;
ctx.SaveChanges();
Without "AsNoTracking" the updates is very slow.

I just execute the insert command directly.
// The Id property is the identity primary key, so IDENTITY_INSERT needs to be switched on around these inserts
db.Database.ExecuteSqlCommand("SET IDENTITY_INSERT [dbo].[MyTable] ON");
foreach (var p in itemsToSave)
{
    db.Database.ExecuteSqlCommand("INSERT INTO [dbo].[MyTable] ([Property1], [Property2]) VALUES (@Property1, @Property2)",
        new SqlParameter("@Property1", p.Property1),
        new SqlParameter("@Property2", p.Property2));
}
db.Database.ExecuteSqlCommand("SET IDENTITY_INSERT [dbo].[MyTable] OFF");
Works really quickly. EF is just impossibly slow with bulk updates, completely unusable in my case for more than a dozen or so items.

Use this Nuget package:
Z.EntityFramework.Extensions
It has extension methods that you can call on the DbContext like DbContext.BulkSaveChanges which works amazingly fast.
Note: this is NOT a free package but it has a trial period.
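Usage is essentially a drop-in replacement for SaveChanges (a quick sketch; BookContext is a placeholder name, and randomNumber comes from the original question):
using (var context = new BookContext())
{
    foreach (var book in context.Books.Where(b => b.processed == false))
    {
        book.ISBN = "ISBN " + randomNumber();
        book.processed = true;
    }
    // BulkSaveChanges comes from Z.EntityFramework.Extensions and batches the generated UPDATEs
    context.BulkSaveChanges();
}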

Related

In the Entity Framework Code First approach - perform UpdateRange()

I need to perform the update operation on a list of model objects.
As of now I am able to update while looping through them:
public virtual void UpdateList(List<TEntity> entity)
{
    foreach (TEntity ent in entity)
    {
        if (Entities.Any(h => h.Id == ent.Id))
        {
            Entities.Attach(ent);
            _context.Entry(ent).State = EntityState.Modified;
        }
    }
}
Is there any direct way I can update the list without looping through it?
It seems like what you're looking for is bulk operations. Entity Framework isn't suited for bulk operations. As the number of changes EF has to track increases, performance degrades.
There are some possible workarounds:
1. Commit your changes at intervals as you enumerate through the list you're updating, i.e. call SaveChanges after you have inserted or updated 1,000 items in the context.
2. The more items that are tracked by EF, the harder EF has to work. Part of this is alleviated by option 1, but you can also disable change tracking. Fair warning: there are some catches to this, so be certain to read up on everything you have to account for when disabling change detection.
3. If your process requires a massive number of changes, you might be better off using stored procedures than EF.
Note: the 1,000 items in option 1 is an arbitrary number. If you choose to go this route, you should run tests to see what range works best for the objects you are working with.
What you'll find is that for a list of size listSize, there is a number n between 1 and listSize such that calling SaveChanges after every n items is much faster than calling SaveChanges after every item. If listSize is on the order of tens of thousands or hundreds of thousands of updates, then n is most likely less than listSize. The goal is to find the value of n that lets you update the entire list the fastest.
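As a sketch of option 1 applied to the UpdateList method above (the per-item existence check is omitted for brevity, and the batch size is the arbitrary number discussed in the note):
public virtual void UpdateList(List<TEntity> entities)
{
    const int batchSize = 1000; // arbitrary; measure to find the n that works best
    var pending = 0;
    foreach (TEntity ent in entities)
    {
        Entities.Attach(ent);
        _context.Entry(ent).State = EntityState.Modified;
        if (++pending % batchSize == 0)
        {
            _context.SaveChanges(); // flush this batch of updates
        }
    }
    _context.SaveChanges(); // flush whatever is left over
}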

Entity Framework 5 without transactions

I want to set up Entity Framework so that SaveChanges() doesn't wrap all its generated SQL in a BEGIN and a COMMIT. I came across this suggestion from a related SO question:
using (var transaction = new TransactionScope(TransactionScopeOption.Suppress))
{
    ObjectContext.SaveChanges();
}
But that doesn't work for me. We are using the MySQL connector, so I did some digging by stepping through its source code, and I found that Entity Framework is the one asking for a new transaction to be created, and it is not the fault of the connector; see the following stack trace.
Next I started looking through EF's source code (EF 6 is open source, and the relevant code looks the same in the decompiler)
var needLocalTransaction = false;
var connection = (EntityConnection)Connection;
if (connection.CurrentTransaction == null
&& !connection.EnlistedInUserTransaction
&& _lastTransaction == null)
{
needLocalTransaction = startLocalTransaction;
}
...
if (needLocalTransaction)
{
localTransaction = connection.BeginTransaction();
}
Okay, so it looks like if a transaction doesn't exist, EF creates one itself. Stepping further into how the current transaction is set up, I get to the code where EF sets up the connection:
var currentTransaction = Transaction.Current;
EnsureContextIsEnlistedInCurrentTransaction(
currentTransaction,
() =>
{
Connection.Open();
_openedConnection = true;
_connectionRequestCount++;
return true;
},
false);
And this seems to be the only place where the TransactionScopeOption.Suppress line comes into play; however, all that does is set the ambient transaction (Transaction.Current) to null, forcing EF not to see a transaction and therefore doing the opposite of what I want by creating a new one.
Has anyone else had any luck with turning off transactions in EF 5, or is my only solution to hack together my own build of the SQL connector?
Thanks!
EF5 is hard-coded to wrap INSERTS and UPDATES into transactions. And in almost all cases, this is the desired behavior when batching writes of multiple entities.
If you want to change that you will have to write new code for yourself, compile it to an appropriate DLL, and reference the new code instead. A lot of work to get around a very important safety feature. I don't recommend it.
The question is why do you want to disable transactions so much?
Is it causing performance problems? Do you find the extra safety of the transaction to be detrimental to the business process?
It sounds like you don't care if lost writes, phantom reads, etc. happen (which I assume because you want to skip transactions altogether), which is a bad idea.
If that's not the case (I hope) and you're worried about performance, maybe your transaction isolation level is too high for what you need? You may want to wrap your other calls in transactions that set a much lower (less safe, but with less locking and therefore better performance) isolation level like READ UNCOMMITTED.
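For example, something along these lines (a sketch only; whether READ UNCOMMITTED is acceptable depends entirely on your data):
// Requires: using System.Transactions;
var options = new TransactionOptions
{
    IsolationLevel = IsolationLevel.ReadUncommitted
};
using (var scope = new TransactionScope(TransactionScopeOption.Required, options))
{
    context.SaveChanges(); // EF enlists in the ambient transaction instead of opening its own
    scope.Complete();
}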

Performing multiple updates in EF

I'm currently using Entity Framework 4.3, Asp.Net 4.0, Sql Server 2008 Web.
I have a method that loops through 10 to 20 loan providers. Each loan provider has a chance to bid for the loan - if they do, the loop terminates; if not, the loop continues until all loan providers are exhausted and a "no offers" message is returned.
I want to start logging this data - simply the lender and customer IDs. So for each customer who submits an application there may be 1 to 20 inserts to perform. I can collect these in a list and save them in one go.
As I'm using EF, I'm a little concerned about performance - I presume an insert will be done for each separate item added to the context.
Firstly - should I be worried about performance?
Secondly, are there any techniques I can use to make this process more efficient?
Thanks in advance.
This has nothing to do with Entity Framework. You must insert one row at a time in SQL; that's the way SQL works. Each insert is a separate statement. There is no way (outside of special bulk-insert methods, which would be pointless for 20 records) of doing it differently.
Now, yes, you can send those 20 inserts in a single SaveChanges call, which is what EF will do.
I don't understand your comment "As I'm using EF, I'm a little concerned about performance". EF performs no worse than anything else in this respect. It's a simple 20-record insert; there's nothing special here, and no complexity that could cause performance issues.
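In other words, add all the log rows to the context and call SaveChanges once (a sketch; LenderLog and its properties are made-up names):
foreach (var bid in bidsToLog)
{
    context.LenderLogs.Add(new LenderLog
    {
        LenderId = bid.LenderId,
        CustomerId = customerId
    });
}
context.SaveChanges(); // all pending inserts go out in a single call, inside one transaction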

Doctrine: avoid collision in update

I have a product table accessed by many applications, with several users in each one. I want to avoid collisions, but in a very small portion of code I have detected that collisions can occur.
$item = $em->getRepository('MyProjectProductBundle:Item')
           ->findOneBy(array('product' => $this, 'state' => 1));
if ($item)
{
    $item->setState(3);
    $item->setDateSold(new \DateTime("now"));
    $item->setDateSent(new \DateTime("now"));
    $dateC = new \DateTime("now");
    $dateC->add(new \DateInterval('P1Y'));
    $item->setDateGuarantee($dateC);
    $em->persist($item);
    $em->flush();
    //...after this, set up customer data, etc.
}
One option could be to call persist() and flush() twice, the first time just after the state change, but before doing that I would like to know if there is an approach that offers more of a guarantee.
I don't think a transaction is the solution, as there are actually many other actions involved in the process, so wrapping them all in a transaction would force many rollbacks and failed sales, making things worse.
The database is PostgreSQL.
Any other ideas?
My first thought would be to look at optimistic locking. The end result is that if someone changes the underlying data out from under you, Doctrine will throw an exception on flush. However, this might not be easy, since you say you have multiple applications operating on a central database. It's not clear whether you can control those applications, and you'll need to, because they'll all have to play along with the optimistic locking scheme and update the version column when they run updates.

Entity Framework 5 Get All method

In our EF implementation for a brand new project, we have a GetAll method in the repository. But as our application grows we are going to have, let's say, 5000 products; getting them all the way it works now would be a performance problem, wouldn't it?
If so, what is a good implementation to tackle this future problem?
Any help will be highly appreciated.
It could become a performance problem if GetAll enumerates the collection, causing the entire data set to be returned as an IEnumerable. This all depends on the number of joins, the size of the data set, whether you have lazy loading enabled, and how SQL Server performs, just to name a few factors.
Some people will say this is not desirable; you could have GetAll() return an IQueryable instead, which would defer the query until something caused the collection to be filled by going to SQL. That way you could refine the results with further Where(), Take(), Skip(), etc. statements to allow for paging without having to retrieve all 5000+ products from the database.
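A sketch of that shape (assuming a DbContext-backed repository; Product and its properties are placeholder names):
// No SQL is executed here; the caller gets a composable query
public IQueryable<Product> GetAll()
{
    return context.Products;
}
// Callers can then filter and page before anything hits the database
var page = repository.GetAll()
    .Where(p => p.IsActive)
    .OrderBy(p => p.Name)
    .Skip(40)
    .Take(20)
    .ToList(); // SQL is generated and executed only at this point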
It depends on how your repository class is set up. If you're performing the query immediately, i.e. if your GetAll() method returns something like IEnumerable<T> or IList<T>, then yes, that could easily be a performance problem, and you should generally avoid that sort of thing unless you really want to load all records at once.
On the other hand, if your GetAll() method returns an IQueryable<T>, then there may not be a problem at all, depending on whether you trust the people writing queries. Returning an IQueryable<T> would allow callers to further refine the search criteria before the SQL code is actually generated. Performance-wise, it would only be a problem if developers using your code didn't apply any filters before executing the query. If you trust them enough to give them enough rope to hang themselves (and potentially take your database performance down with them), then just returning IQueryable<T> might be good enough.
If you don't trust them, then, as others have pointed out, you could limit the number of records returned by your query by using the Skip() and Take() extension methods to implement simple pagination, but note that it's possible for records to slip through the cracks if people make changes to the database before you move on to the next page. Making pagination work seamlessly with an ever-changing database is much harder than a lot of people think.
Another approach would be to replace your GetAll() method with one that requires the caller to apply a filter before returning results:
public IQueryable<T> GetMatching<T>(Expression<Func<T, bool>> filter)
{
// Replace myQuery with Context.Set<T>() or whatever you're using for data access
return myQuery.Where(filter);
}
and then use it like var results = GetMatching(x => x.Name == "foo");, or whatever you want to do. Note that this could be easily bypassed by calling GetMatching(x => true), but at least it makes the intention clear. You could also combine this with the first method to put a firm cap on the number of records returned.
My personal feeling, though, is that all of these ways of limiting queries are just insurance against bad developers getting their hands on your application, and if you have bad developers working on your project, they'll find a way to cause problems no matter what you try to do. So my vote is to just return an IQueryable<T> and trust that it will be used responsibly. If people abuse it, take away the GetAll() method and give them training-wheels methods like GetRecentPosts(int count) or GetPostsWithTag(string tag, int count) or something like that, where the query logic is out of their hands.
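For instance, a training-wheels method might look like this (Post and CreatedOn are placeholder names):
// The query logic stays inside the repository; callers only choose how many rows they get
public List<Post> GetRecentPosts(int count)
{
    return context.Posts
        .OrderByDescending(p => p.CreatedOn)
        .Take(count)
        .ToList();
}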
One way to improve this is by using pagination
context.Products.Skip(n).Take(m);
What you're referring to is known as paging, and it's pretty trivial to do using LINQ via the Skip/Take methods.
EF queries use deferred execution, which means they won't actually hit the database until they are evaluated, so the following would only pull 10 rows after skipping the first 10:
context.Table.Skip(10).Take(10);