In the Entity Framework Code First approach - perform UpdateRange() - entity-framework

I need to perform an update operation on a list of model objects.
As of now I am able to update them by looping through them:
public virtual void UpdateList(List<TEntity> entity)
{
    foreach (TEntity ent in entity)
    {
        if (Entities.Any(h => h.Id == ent.Id))
        {
            Entities.Attach(ent);
            _context.Entry(ent).State = EntityState.Modified;
        }
    }
}
Is there any direct way I can update the list without looping through it?

It seems like what you're looking for is bulk operations. Entity Framework isn't suited for bulk operations. As the number of changes EF has to track increases, performance degrades.
There are some possible workarounds:
1. Commit your changes at intervals as you enumerate through the list that you're updating, i.e. call SaveChanges after you have inserted or updated 1,000 more items in the context (see the sketch below).
2. The more items that are tracked by EF, the harder EF has to work. Part of this is alleviated by option 1, but you can also disable change tracking. Fair warning: there are some catches to this, so be certain to read up on everything you have to account for when disabling change detection.
3. If your process requires a massive amount of changes, you might be better off using stored procedures than EF.
Note: the 1,000 items in option 1 is an arbitrary number. If you choose to go this route, you should run tests to see what range works best for the objects that you are working with.
What you'll find is that, for a list of size listSize and some number n between 1 and listSize, it's much faster to call SaveChanges after every n items than to call SaveChanges after every single item. If listSize is on the order of tens of thousands or hundreds of thousands of updates, then n is most likely less than listSize. The goal is to find the value of n that will allow you to update the entire list the fastest.
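Something like the following, a rough sketch of option 1 (MyContext, itemsToUpdate, and the batch size of 1,000 are placeholder names and values, not code from the question):
// Sketch of option 1: save in batches rather than per item.
const int batchSize = 1000; // arbitrary; measure to find the n that works best for you
using (var context = new MyContext())
{
    context.Configuration.AutoDetectChangesEnabled = false; // optional, see option 2
    int pending = 0;
    foreach (var item in itemsToUpdate)
    {
        context.Entry(item).State = EntityState.Modified; // attaches and marks as modified
        if (++pending % batchSize == 0)
        {
            context.SaveChanges(); // flush this batch
        }
    }
    context.SaveChanges(); // flush whatever is left
}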

Entity framework performance issue, saveChanges is very slow

Recently, I have been doing some simple EF work. Very simple.
first,
List<Book> books = entity.Books.Where(c => c.processed == false).ToList();
then
foreach(var book in books)
{
//DoSomelogic, update some properties,
book.ISBN = "ISBN " + randomNumber();
book.processed = true;
entity.SaveChanges();
}
I put entity.SaveChanges() within the foreach because it is a large list, around 100k records. If a record is processed without problems I flag it by setting book.processed = true, so that if the process is interrupted by an exception, the next time around I don't have to process those good records again.
Everything seems OK to me. It is fast when you process hundreds of records, but when we move to 100k records, entity.SaveChanges is very, very slow, around 1-3 seconds per record. If we keep the entity model but replace entity.SaveChanges with a classic SqlHelper.ExecuteNonQuery("update_book", sqlparams), it is very fast.
Could anyone tell me why Entity Framework is that slow here? And if I still want to use entity.SaveChanges, what is the best way to improve its performance?
Thank you
Turn off change tracking before you perform your inserts. This will improve your performance significantly (orders of magnitude). Putting SaveChanges() outside your loop will help as well, but turning off change tracking will help even more.
using (var context = new CustomerContext())
{
context.Configuration.AutoDetectChangesEnabled = false;
// A loop to add all your new entities
context.SaveChanges();
}
See this page for some more information.
I would take the SaveChanges(book) call outside of the foreach. Since the books come from the entity context as a list, you can save once after the loop and EF will work better with the resulting code.
The list is tracked by the context, and EF is designed to optimize updates/creates/deletes against the back-end database. If you do this, I'd be curious whether it helps.
I would also advise you to take the SaveChanges() call out of the loop: as written it performs 'n' separate saves against the database, so the context has to run through its required checks and validations 'n' times.
var books = entity.Books.Where(c => c.processed == false).ToList();
books.ForEach(b =>
{
b.ISBN = "ISBN " + randomNumber();
b.processed = true;
//DoSomelogic, update some properties
});
entity.SaveChanges();
The Entity Framework, in my opinion, is a poor choice for BULK operations both from a performance and a memory consumption standpoint. Once you get beyond a few thousand records, the SaveChanges method really starts to break down.
You can try to partition your work into smaller transactions, but again, I think you're working too hard to make this approach scale.
A much better approach is to leverage the BULK operations that are already provided by your DBMS. SQL Server provides bulk copy via .NET (SqlBulkCopy). Oracle provides bulk copy for Oracle.DataAccess (the unmanaged data access provider); for Oracle.ManagedDataAccess, the bulk copy library unfortunately isn't available, but you can create an Oracle stored procedure using BULK COLLECT/FORALL that allows you to insert thousands of records within seconds with a much lower memory footprint in your application. Within the .NET app you can pass PL/SQL associative arrays as parameters, etc.
The benefit of leveraging the BULK facilities within your DBMS is reducing the context switches between your application, the query processor and the database engine.
I'm sure other database vendors provide something similar.
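As a rough illustration of the SQL Server route, a minimal SqlBulkCopy sketch might look like the following (the table name, column names, itemsToSave, and connectionString are placeholders, not code from the question):
// using System.Data;
// using System.Data.SqlClient;
// Map the in-memory list into a DataTable for bulk insert.
var table = new DataTable();
table.Columns.Add("Property1", typeof(string));
table.Columns.Add("Property2", typeof(string));
foreach (var p in itemsToSave)
{
    table.Rows.Add(p.Property1, p.Property2);
}
using (var connection = new SqlConnection(connectionString))
using (var bulkCopy = new SqlBulkCopy(connection))
{
    connection.Open();
    bulkCopy.DestinationTableName = "dbo.MyTable";
    bulkCopy.BatchSize = 5000;      // rows per round trip; tune as needed
    bulkCopy.WriteToServer(table);  // streams all rows to the server
}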
"AsNoTracking" works for me
ex:
Item itemctx = ctx.Items.AsNoTracking().Single(i=>i.idItem == item.idItem);
ctx.Entry(itemctx).CurrentValues.SetValues(item);
itemctx.images = item.images;
ctx.SaveChanges();
Without "AsNoTracking" the updates is very slow.
I just execute the insert command directly.
// the Id property is an identity primary key; turning IDENTITY_INSERT on lets us insert it explicitly
db.Database.ExecuteSqlCommand("SET IDENTITY_INSERT [dbo].[MyTable] ON");
foreach (var p in itemsToSave)
{
    db.Database.ExecuteSqlCommand(
        "INSERT INTO [dbo].[MyTable] ([Id], [Property1], [Property2]) VALUES (@Id, @Property1, @Property2)",
        new SqlParameter("@Id", p.Id),
        new SqlParameter("@Property1", p.Property1),
        new SqlParameter("@Property2", p.Property2));
}
db.Database.ExecuteSqlCommand("SET IDENTITY_INSERT [dbo].[MyTable] OFF");
Works really quickly. EF is just impossibly slow with bulk updates, completely unusable in my case for more than a dozen or so items.
Use this Nuget package:
Z.EntityFramework.Extensions
It has extension methods that you can call on the DbContext like DbContext.BulkSaveChanges which works amazingly fast.
Note: this is NOT a free package but it has a trial period.

Doctrine: avoid collision in update

I have a product table accessed by many applications, with several users in each one. I want to avoid collisions, but in a very small portion of code I have detected that collisions can occur.
$item = $em->getRepository('MyProjectProductBundle:Item')
->findOneBy(array('product'=>$this, 'state'=>1));
if ($item)
{
$item->setState(3);
$item->setDateSold(new \DateTime("now"));
$item->setDateSent(new \DateTime("now"));
$dateC = new \DateTime("now");
$dateC->add(new \DateInterval('P1Y'));
$item->setDateGuarantee($dateC);
$em->persist($item);
$em->flush();
//...after this, set up customer data, etc.
}
One option could be to make two persist()/flush() calls, the first one just after the state change, but before doing that I would like to know if there is an approach that offers a stronger guarantee.
I don't think a transaction is a solution, as there are actually many other actions involved in the process, so wrapping them all in a transaction would force many rollbacks and failed sales, making it worse.
The database is PostgreSQL.
Any other ideas?
My first thought would be to look at optimistic locking. The end result is that if someone changes the underlying data out from under you, doctrine will throw an exception on flush. However, this might not be easy, since you say you have multiple applications operating on a central database -- it's not clear if you can control those applications or not, and you'll need to, because they'll all need to play along with the optimistic locking scheme and update the version column when they run updates.

Entity Framework 5 Get All method

In our EF implementation for a brand-new project, we have a GetAll method in the repository. But as our application grows we are going to have, let's say, 5000 products; getting all of them the way it works now would be a performance problem, wouldn't it?
If so, what is a good implementation to tackle this future problem?
Any help will be highly appreciated.
It could become a performance problem if GetAll enumerates the collection, causing the entire data set to be returned as an IEnumerable. It all depends on the number of joins, the size of the data set, whether you have lazy loading enabled, and how SQL Server performs, just to name a few factors.
Some people will say this is not desirable, but you could have GetAll() return an IQueryable, which would defer the query until something causes the collection to be materialized by going to SQL. That way you could refine the results with further Where(), Take(), Skip(), etc. calls to allow paging without having to retrieve all 5000+ products from the database.
It depends on how your repository class is set up. If you're performing the query immediately, i.e. if your GetAll() method returns something like IEnumerable<T> or IList<T>, then yes, that could easily be a performance problem, and you should generally avoid that sort of thing unless you really want to load all records at once.
On the other hand, if your GetAll() method returns an IQueryable<T>, then there may not be a problem at all, depending on whether you trust the people writing queries. Returning an IQueryable<T> would allow callers to further refine the search criteria before the SQL code is actually generated. Performance-wise, it would only be a problem if developers using your code didn't apply any filters before executing the query. If you trust them enough to give them enough rope to hang themselves (and potentially take your database performance down with them), then just returning IQueryable<T> might be good enough.
If you don't trust them, then, as others have pointed out, you could limit the number of records returned by your query by using the Skip() and Take() extension methods to implement simple pagination, but note that it's possible for records to slip through the cracks if people make changes to the database before you move on to the next page. Making pagination work seamlessly with an ever-changing database is much harder than a lot of people think.
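For illustration, a minimal sketch of an IQueryable-returning repository method and how a caller might page over it (Context, repository, Product, pageIndex, and pageSize are assumed names, not code from the question):
// Sketch of an IQueryable-based GetAll; Context stands in for your DbContext.
public IQueryable<T> GetAll<T>() where T : class
{
    return Context.Set<T>();   // no SQL runs yet; the query is composed by the caller
}

// Caller side: the filter and paging are translated into a single SQL query.
var page = repository.GetAll<Product>()
    .Where(p => p.IsActive)
    .OrderBy(p => p.Name)      // Skip/Take need a defined ordering
    .Skip(pageIndex * pageSize)
    .Take(pageSize)
    .ToList();                 // SQL executes here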
Another approach would be to replace your GetAll() method with one that requires the caller to apply a filter before returning results:
public IQueryable<T> GetMatching<T>(Expression<Func<T, bool>> filter)
{
// Replace myQuery with Context.Set<T>() or whatever you're using for data access
return myQuery.Where(filter);
}
and then use it like var results = GetMatching(x => x.Name == "foo");, or whatever you want to do. Note that this could be easily bypassed by calling GetMatching(x => true), but at least it makes the intention clear. You could also combine this with the first method to put a firm cap on the number of records returned.
My personal feeling, though, is that all of these ways of limiting queries are just insurance against bad developers getting their hands on your application, and if you have bad developers working on your project, they'll find a way to cause problems no matter what you try to do. So my vote is to just return an IQueryable<T> and trust that it will be used responsibly. If people abuse it, take away the GetAll() method and give them training-wheels methods like GetRecentPosts(int count) or GetPostsWithTag(string tag, int count) or something like that, where the query logic is out of their hands.
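For what it's worth, a rough sketch of what those training-wheels methods might look like (Post, CreatedOn, and Tags are hypothetical names, and Context stands in for whatever data access you use):
// Hypothetical wrappers that keep the query logic inside the repository.
public IList<Post> GetRecentPosts(int count)
{
    return Context.Set<Post>()
        .OrderByDescending(p => p.CreatedOn)
        .Take(count)
        .ToList();
}

public IList<Post> GetPostsWithTag(string tag, int count)
{
    return Context.Set<Post>()
        .Where(p => p.Tags.Any(t => t.Name == tag))
        .OrderByDescending(p => p.CreatedOn)
        .Take(count)
        .ToList();
}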
One way to improve this is by using pagination
context.Products.Skip(n).Take(m);
What you're referring to is known as paging, and it's pretty trivial to do using LINQ via the Skip/Take methods.
EF queries are evaluated lazily (deferred execution), which means they won't actually hit the database until they are enumerated, so the following would skip the first 10 rows and pull only the next 10:
context.Table.Skip(10).Take(10);
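One caveat worth adding (my note, not from the answer above): in LINQ to Entities, Skip() requires the query to be ordered first, so a paged query is usually written along these lines:
// Skip/Take paging needs a deterministic ordering; Id is a placeholder key column.
var page = context.Table
    .OrderBy(t => t.Id)
    .Skip(10)
    .Take(10)
    .ToList();   // the query executes here and returns rows 11-20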

EF: Lazy loading, eager loading, and "enumerating the enumerable"

I find I'm confused about lazy loading, etc.
First, are these two statements equivalent:
(1) Lazy loading:
_flaggedDates = context.FlaggedDates.Include("scheduledSchools")
.Include ("interviews").Include("partialDayAvailableBlocks")
.Include("visit").Include("events");
(2) Eager loading:
_flaggedDates = context.FlaggedDates;
In other words, in (1) the "Includes" cause the navigation collections/properties to be loaded along with the specific collection requested, regardless of the fact that you are using lazy loading ... right?
And in (2), the statement will load all the navigation entities even though you do not specifically request them, because you are using eager loading ... right?
Second: even if you are using eager loading, the data will not actually be downloaded from the database until you "enumerate the enumerable", as in the following code:
var dates = from d in _flaggedDates
where d.dateID == 2
select d;
foreach (FlaggedDate date in dates)
{
... etc.
}
The data will not actually be downloaded ("enumerated") until the foreach loop ... right? In other words, the "var dates" line defines the query, but the query is not executed until the foreach loop.
Given that (if my assumptions are correct), what's the real difference between eager loading and lazy loading?? It seems that in either case, the data does not appear until the enumeration. Am I missing something?
(My specific experience is with code-first, POCO development, by the way ... though the questions may apply more generally.)
Your description of (1) is correct, but it is an example of Eager Loading rather than Lazy Loading.
Your description of (2) is incorrect. (2) is technically using no loading at all, but will use Lazy Loading if you try to access any non-scalar values on your FlaggedDates.
In either case, you are correct that no data will be loaded from your data store until you attempt to "do something" with the _flaggedDates. However, what happens is different in each case.
(1): Eager loading: as soon as you begin your for loop, every one of the objects that you have specified will get pulled from the database and built into a gigantic in-memory data structure. This will be a very expensive operation, pulling an enormous amount of data from your database. However, it will all happen in one database round trip, with a single SQL query getting executed.
(2): Lazy loading: When your for loop begins, it will only load the FlaggedDates objects. However, if you access related objects inside your for loop, it will not have those objects loaded into memory yet. The first attempt to retrieve the scheduledSchools for a given FlaggedDate will result in either a new database roundtrip to retrieve the schools, or an Exception being thrown because your context has already been disposed. Since you'd be accessing the scheduledSchools collection inside a for loop, you would have a new database round trip for every FlaggedDate that you initially loaded at the beginning of the for loop.
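To make the round-trip difference concrete, here is a small sketch using the property names from the question (the query shapes and counts are illustrative, not taken from the answer):
// Eager loading: one query with joins, everything materialized up front.
var eagerDates = context.FlaggedDates
    .Include("scheduledSchools")
    .ToList();                                   // 1 database round trip

// Lazy loading: one query for the dates, plus one per date as the
// navigation property is first accessed (the classic N+1 pattern).
var lazyDates = context.FlaggedDates.ToList();   // 1 round trip
foreach (var date in lazyDates)
{
    var schools = date.scheduledSchools;         // +1 round trip per FlaggedDate
}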
Response to Comments
Disabling Lazy Loading is not the same as Enabling Eager Loading. In this example:
context.ContextOptions.LazyLoadingEnabled = false;
var schools = context.FlaggedDates.First().scheduledSchools;
The schools variable will contain an empty EntityCollection, because I didn't Include them in the original query (FlaggedDates.First()), and I disabled lazy loading so that they couldn't be loaded after the initial query had been executed.
You are correct that the where d.dateID == 2 would mean that only the objects related to that specific FlaggedDate object would be pulled in. However, depending on how many objects are related to that FlaggedDate, you could still end up with a lot of data going over that wire. This is due to the way the EntityFramework builds out its SQL query. SQL Query results are always in a tabular format, meaning you must have the same number of columns for every row. For every scheduledSchool object, there needs to be at least one row in the result set, and since every row has to contain at least some value for every column, you end up with every scalar value on your FlaggedDate object being repeated. So if you have 10 scheduledSchools and 10 interviews associated with your FlaggedDate, you'll end up with 20 rows that each contain every scalar value on FlaggedDate. Half of the rows will have null values for all the ScheduledSchool columns, and the other half will have null values for all of the Interviews columns.
Where this gets really bad, though, is if you go "deep" in the data you're including. For example, if each ScheduledSchool had a students property, which you included as well, then suddenly you would have a row for each Student in each ScheduledSchool, and on each of those rows, every scalar value for the Student's ScheduledSchool would be included (even though only the first row's values end up getting used), along with every scalar value on the original FlaggedDate object. It can add up quickly.
It's difficult to explain in writing, but if you look at the actual data coming back from a query with multiple Includes, you will see that there is a lot of duplicate data. You can use LinqPad to see the SQL Queries generated by your EF code.
No difference. This was not true in EF 1.0, which didn't support eager loading (at least not automatically). In 1.0, you had to either modify the property to load automatically, or call the Load() method on the property reference.
One thing to keep in mind is that those Includes can go up in smoke if you query across multiple objects like so:
from d in ctx.ObjectDates.Include("MyObjectProperty")
from da in d.Days
select new { d, da };
ObjectDate.MyObjectProperty will not be automatically loaded.

How to keep track of objects deleted from an ObservableCollection in CRUD scenarios?

In our multi-tier business application we have ObservableCollections of Self-Tracking Entities that are returned from service calls.
The idea is we want to be able to get entities, add, update and remove them from the collection client side, and then send these changes to the server side, where they will be persisted to the database.
Self-Tracking Entities, as their name might suggest, track their state themselves.
When a new STE is created it has the Added state; when you modify a property it sets the Modified state; it can also have the Deleted state, but this state is not set when the entity is removed from an ObservableCollection (obviously). If you want this behavior you need to code it yourself.
In my current implementation, when an entity is removed from the ObservableCollection, I keep it in a shadow collection, so that when the ObservableCollection is sent back to the server, I can send the deleted items along, so Entity Framework knows to delete them.
Something along the lines of:
protected IDictionary<int, IList> DeletedCollections = new Dictionary<int, IList>();
protected void SubscribeDeletionHandler<TEntity>(ObservableCollection<TEntity> collection)
{
var deletedEntities = new List<TEntity>();
DeletedCollections[collection.GetHashCode()] = deletedEntities;
collection.CollectionChanged += (o, a) =>
{
if (a.OldItems != null)
{
deletedEntities.AddRange(a.OldItems.Cast<TEntity>());
}
};
}
Now if the user decides to save his changes to the server, I can get the list of removed items, and send them along:
ObservableCollection<Customer> customers = MyServiceProxy.GetCustomers();
customers.RemoveAt(0);
MyServiceProxy.UpdateCustomers(customers);
At this point the UpdateCustomers method checks my shadow collection for any items that were removed, and sends them along to the server side.
This approach works fine, until you start to think about the life-cycle of these shadow collections. Basically, when the ObservableCollection is garbage collected there is no way of knowing that we need to remove the shadow collection from our dictionary.
I came up with a somewhat complicated solution that basically does manual memory management: I keep a WeakReference to the ObservableCollection, and every few seconds I check whether the reference is still alive; if it isn't, I remove the shadow collection.
But this seems like a terrible solution... I hope the collective genius of StackOverflow can shed light on a better solution.
EDIT:
In the end I decided to go with subclassing the ObservableCollection. The service proxy code is generated so it was a relatively simple task to change it to return my derived type.
Thanks for all the help!
Instead of rolling your own "weak reference + poll Is it Dead, Is it Alive" logic, you could use the HttpRuntime.Cache (available from all project types, not just web projects).
Add each shadow collection to the Cache, either with a generous timeout, or a delegate that can check if the original collection is still alive (or both).
It isn't dreadfully different to your own solution, but it does use tried and trusted .Net components.
Other than that, you're looking at extending ObservableCollection and using that new class instead (which I'd imagine is no small change), or changing/wrapping the UpdateCustomers method to remove the shadow collection from DeletedCollections.
Sorry I can't think of anything else, but hope this helps.
BW
If replacing ObservableCollection is a possibility (e.g. if you are using a common factory for all the collection instances) then you could subclass ObservableCollection and add a finalizer which cleans up the deleted items that belong to this collection.
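A rough sketch of one subclassing variant (not exactly the finalizer approach above; here the deleted items simply live on the collection itself, so their lifetime is tied to it and no shadow dictionary is needed; all names are made up):
// using System.Collections.Generic;
// using System.Collections.ObjectModel;
// using System.Collections.Specialized;
// using System.Linq;
public class TrackingObservableCollection<TEntity> : ObservableCollection<TEntity>
{
    private readonly List<TEntity> _deletedItems = new List<TEntity>();

    public IEnumerable<TEntity> DeletedItems
    {
        get { return _deletedItems; }
    }

    protected override void OnCollectionChanged(NotifyCollectionChangedEventArgs e)
    {
        // Remember anything removed so it can be sent to the server as Deleted.
        if (e.OldItems != null)
        {
            _deletedItems.AddRange(e.OldItems.Cast<TEntity>());
        }
        base.OnCollectionChanged(e);
    }
}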
Another alternative is to change the way you compute which items are deleted. You could maintain the original collection and give the client a shallow copy. When the collection comes back, you can compare the two to see which items are no longer present. If the collections are sorted, the comparison can be done in time linear in the size of the collection. If they're not sorted, the returned collection's values can be put in a hash table and used to look up each value in the original collection. If the entities have a natural id, using that as the key is a safe way of determining which items are not present in the returned collection, that is, have been deleted. This also runs in linear time.
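A small sketch of that id-based comparison (Customer, Id, originalCustomers, and returnedCustomers are placeholder names):
// Anything whose Id is missing from the returned collection is treated as deleted.
// Runs in linear time using a hash set of the returned ids.
var returnedIds = new HashSet<int>(returnedCustomers.Select(c => c.Id));
var deletedCustomers = originalCustomers
    .Where(c => !returnedIds.Contains(c.Id))
    .ToList();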
Otherwise, your original solution doesn't sound that bad. In Java, a WeakReference can be registered with a ReferenceQueue so you get notified when the reference is cleared. There is no similar feature in .NET, but polling is a close approximation. I don't think this approach is so bad, and if it's working, then why change it?
As an aside, aren't you concerned about GetHashCode() returning the same value for distinct collections? Using a weak reference to the collection might be more appropriate as the key, then there is no chance of a collision.
I think you're on a good path; I'd consider refactoring in this situation. My experience is that in 99% of cases the garbage collector makes memory management awesome - almost no real work needed.
But in the other 1% of cases it takes someone to realize that they've got to up the ante and go "old school" by firming up their caching/memory management in those areas. Hats off to you for realizing you're in that situation and for trying to avoid the IDisposable/WeakReference tricks. I think you'll really help the next guy who works in your code.
As for getting to a solution, I think you've got a good grip on the situation:
- be clear when your objects need to be created
- be clear when your objects need to be destroyed
- be clear when your objects need to be pushed to the server
Good luck! Tell us how it goes :)