EF: Lazy loading, eager loading, and "enumerating the enumerable" - entity-framework

I find I'm confused about lazy loading, etc.
First, are these two statements equivalent:
(1) Lazy loading:
_flaggedDates = context.FlaggedDates.Include("scheduledSchools")
.Include("interviews").Include("partialDayAvailableBlocks")
.Include("visit").Include("events");
(2) Eager loading:
_flaggedDates = context.FlaggedDates;
In other words, in (1) the "Includes" cause the navigation collections/properties to be loaded along with the specific collection requested, regardless of the fact that you are using lazy loading ... right?
And in (2), the statement will load all the navigation entities even though you do not specifically request them, because you are using eager loading ... right?
Second: even if you are using eager loading, the data will not actually be downloaded from the database until you "enumerate the enumerable", as in the following code:
var dates = from d in _flaggedDates
where d.dateID == 2
select d;
foreach (FlaggedDate date in dates)
{
... etc.
}
The data will not actually be downloaded ("enumerated") until the foreach loop ... right? In other words, the "var dates" line defines the query, but the query is not executed until the foreach loop.
Given that (if my assumptions are correct), what's the real difference between eager loading and lazy loading?? It seems that in either case, the data does not appear until the enumeration. Am I missing something?
(My specific experience is with code-first, POCO development, by the way ... though the questions may apply more generally.)

Your description of (1) is correct, but it is an example of Eager Loading rather than Lazy Loading.
Your description of (2) is incorrect. (2) is technically using no loading at all, but will use Lazy Loading if you try to access any non-scalar values on your FlaggedDates.
In either case, you are correct that no data will be loaded from your data store until you attempt to "do something" with the _flaggedDates. However, what happens is different in each case.
(1): Eager loading: as soon as you begin your foreach loop, every one of the objects that you have specified will get pulled from the database and built into a gigantic in-memory data structure. This will be a very expensive operation, pulling an enormous amount of data from your database. However, it will all happen in one database round trip, with a single SQL query getting executed.
(2): Lazy loading: When your foreach loop begins, it will only load the FlaggedDates objects. However, if you access related objects inside the loop, those objects will not have been loaded into memory yet. The first attempt to retrieve the scheduledSchools for a given FlaggedDate will result in either a new database round trip to retrieve the schools, or an exception being thrown because your context has already been disposed. Since you'd be accessing the scheduledSchools collection inside the loop, you would have a new database round trip for every FlaggedDate that was loaded at the beginning of the loop.
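To make the difference in round trips concrete, here is a minimal sketch that does not use Entity Framework at all. It fakes the data store with an in-memory dictionary and counts every simulated query; `QueryDates`, `LazyLoadSchools`, `QueryDatesWithSchools` and the sample data are invented for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Fake "database": three flagged dates, each with two scheduled schools.
var schoolsByDate = new Dictionary<int, string[]>
{
    [1] = new[] { "North", "South" },
    [2] = new[] { "East", "West" },
    [3] = new[] { "Central", "Harbor" },
};

int roundTrips = 0;

// Simulates one SQL query that returns only the FlaggedDate ids.
// It is an iterator, so nothing runs until someone enumerates it.
IEnumerable<int> QueryDates()
{
    roundTrips++;
    foreach (var id in schoolsByDate.Keys) yield return id;
}

// Simulates the extra query a lazy-loaded navigation property triggers.
string[] LazyLoadSchools(int dateId)
{
    roundTrips++;
    return schoolsByDate[dateId];
}

// Simulates a single query that brings dates and schools back together.
IEnumerable<(int Id, string[] Schools)> QueryDatesWithSchools()
{
    roundTrips++;
    foreach (var pair in schoolsByDate) yield return (pair.Key, pair.Value);
}

// Defining the query costs nothing: no round trip has happened yet.
var lazyQuery = QueryDates();
int tripsAfterDefinition = roundTrips;

// Lazy loading: one query for the dates, then one more per date for its schools.
foreach (var id in lazyQuery)
{
    var schools = LazyLoadSchools(id);
}
int lazyTrips = roundTrips;

// Eager loading: everything arrives in a single round trip.
roundTrips = 0;
foreach (var date in QueryDatesWithSchools())
{
    var schools = date.Schools;
}
int eagerTrips = roundTrips;

Console.WriteLine($"after definition: {tripsAfterDefinition}, lazy: {lazyTrips}, eager: {eagerTrips}");
```

With three dates, the lazy path costs four simulated round trips and the eager path costs one. In both cases the counter stays at zero until the foreach starts.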
Response to Comments
Disabling Lazy Loading is not the same as Enabling Eager Loading. In this example:
context.ContextOptions.LazyLoadingEnabled = false;
var schools = context.FlaggedDates.First().scheduledSchools;
The schools variable will contain an empty EntityCollection, because I didn't Include them in the original query (FlaggedDates.First()), and I disabled lazy loading so that they couldn't be loaded after the initial query had been executed.
You are correct that the where d.dateID == 2 would mean that only the objects related to that specific FlaggedDate object would be pulled in. However, depending on how many objects are related to that FlaggedDate, you could still end up with a lot of data going over that wire. This is due to the way the EntityFramework builds out its SQL query. SQL Query results are always in a tabular format, meaning you must have the same number of columns for every row. For every scheduledSchool object, there needs to be at least one row in the result set, and since every row has to contain at least some value for every column, you end up with every scalar value on your FlaggedDate object being repeated. So if you have 10 scheduledSchools and 10 interviews associated with your FlaggedDate, you'll end up with 20 rows that each contain every scalar value on FlaggedDate. Half of the rows will have null values for all the ScheduledSchool columns, and the other half will have null values for all of the Interviews columns.
Where this gets really bad, though, is if you go "deep" in the data you're including. For example, if each ScheduledSchool had a students property, which you included as well, then suddenly you would have a row for each Student in each ScheduledSchool, and on each of those rows, every scalar value for the Student's ScheduledSchool would be included (even though only the first row's values end up getting used), along with every scalar value on the original FlaggedDate object. It can add up quickly.
It's difficult to explain in writing, but if you look at the actual data coming back from a query with multiple Includes, you will see that there is a lot of duplicate data. You can use LINQPad to see the SQL queries generated by your EF code.
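The row counts described above can be checked with a small simulation. This sketch runs neither EF nor SQL; it only builds rows in the shape this answer describes (one row per included child, with the parent's scalar columns repeated on each row), using made-up data. The three students per school are an arbitrary assumption.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// One FlaggedDate with 10 scheduledSchools and 10 interviews (sample data).
var flaggedDate = (DateId: 2, Note: "site visit");
var schools = Enumerable.Range(1, 10).Select(i => $"School {i}").ToList();
var interviews = Enumerable.Range(1, 10).Select(i => $"Interview {i}").ToList();

// Each row repeats the parent's scalars; only one set of child columns is filled.
var rows = new List<(int DateId, string Note, string? School, string? Interview)>();
rows.AddRange(schools.Select(s => (flaggedDate.DateId, flaggedDate.Note, (string?)s, (string?)null)));
rows.AddRange(interviews.Select(i => (flaggedDate.DateId, flaggedDate.Note, (string?)null, (string?)i)));

int siblingRows = rows.Count;                                   // 10 + 10
int repeatedParentCells = rows.Count(r => r.Note == flaggedDate.Note);
int rowsWithNullSchool = rows.Count(r => r.School is null);

// Going "deep": if every school also had 3 students included, the school half
// would need one row per student instead of one row per school.
int studentsPerSchool = 3;
int deepRows = schools.Count * studentsPerSchool + interviews.Count;

Console.WriteLine($"sibling includes: {siblingRows} rows, parent scalars repeated {repeatedParentCells} times");
Console.WriteLine($"rows with null school columns: {rowsWithNullSchool}");
Console.WriteLine($"with students included: {deepRows} rows");
```

The parent's scalar values appear on all 20 rows even though only one copy is needed. Adding one more level under the schools doubles the row count in this example.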

No difference. This was not true in EF 1.0, which didn't support lazy loading (at least not automatically). In 1.0, you had to either modify the property to load automatically, or call the Load() method on the property reference.
One thing to keep in mind is that those Includes can go up in smoke if you query across multiple objects like so:
from d in ctx.ObjectDates.Include("MyObjectProperty")
from da in d.Days
ObjectDate.MyObjectProperty will not be automatically loaded.

Related

Materialize partial set of results with EF Core 2.1

Let's say I have a large collection of tasks stored in a DB and I want to retrieve the latest one according to the requesting user's permissions. The permission-checking logic is complex and not related to the persistence layer, hence I can't put it in an SQL query. What I'm doing today is retrieving ALL tasks from the DB ordered by descending date, then filtering them by the permission set and taking the first one. Not a perfect solution: I retrieve thousands of objects when I need only one.
My question is: how can I materialize objects coming from the DB only until I find one that matches my criteria, and discard the rest of the results?
I thought about one solution, but couldn't find information regarding EF Core's behavior in this case and don't know how to check it myself:
Build the IQueryable, cast it to IEnumerable, then iterate over it and take the first good task. I know that the IQueryable part will be executed on the server and the IEnumerable part on the client, but I don't know whether all tasks will be materialized before FilterByPermissions is applied, or whether materialization will happen on demand. I also don't like the synchronous nature of this solution.
IQueryable<MyTask> allTasksQuery = ...;
IEnumerable<MyTask> allTasksEnumerable = allTasksQuery.AsEnumerable();
IEnumerable<MyTask> filteredTasks = FilterByPermissions(allTasksEnumerable);
MyTask latestTask = filteredTasks.FirstOrDefault();
The workaround could be retrieving small sets of data (pages of 50 for example) until one good task is found but I don't like it.
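One way to probe the "on demand" half of this question without a database is to swap the query for a plain C# iterator and count how many items it hands out. This sketch only shows how LINQ to Objects behaves once the sequence is an IEnumerable; it says nothing about what EF Core or the database provider buffers internally. `ReadTasks`, the tuple shape and the permission rule are hypothetical stand-ins.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int materialized = 0;

// Stand-in for allTasksQuery.AsEnumerable(): yields tasks newest first
// and counts each one as it is handed to the caller.
IEnumerable<(int Id, string Owner)> ReadTasks()
{
    for (int id = 1000; id >= 1; id--)
    {
        materialized++;
        yield return (id, id % 7 == 0 ? "alice" : "bob");
    }
}

// Hypothetical permission check, written with a deferred operator (Where)
// so it pulls one task at a time instead of building a list first.
IEnumerable<(int Id, string Owner)> FilterByPermissions(
    IEnumerable<(int Id, string Owner)> tasks, string user) =>
    tasks.Where(t => t.Owner == user);

var latest = FilterByPermissions(ReadTasks(), "alice").FirstOrDefault();
int pulledWhenStreaming = materialized;

// If the filter buffered its input (for example with ToList), every task
// would be materialized before the first match is returned.
materialized = 0;
var bufferedLatest = ReadTasks().ToList().Where(t => t.Owner == "alice").FirstOrDefault();
int pulledWhenBuffered = materialized;

Console.WriteLine($"latest: {latest.Id}, streaming pulled {pulledWhenStreaming}, buffered pulled {pulledWhenBuffered}");
```

In this simulation the deferred filter stops after 7 items, while the buffered variant touches all 1000. So the shape of FilterByPermissions matters: if it calls ToList, Count, or OrderBy internally, the whole sequence is consumed regardless of what the source does.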

In the Entity Framework Code First Approach - perform Updaterange()

I need to perform the update operation on the List of Model Object.
As of now I am able to update while looping through them.
public virtual void UpdateList(List<TEntity> entity)
{
    foreach (TEntity ent in entity)
    {
        if (Entities.Any(h => h.Id == ent.Id))
        {
            Entities.Attach(ent);
            _context.Entry(ent).State = EntityState.Modified;
        }
    }
}
Is there any direct way I can update List without looping through them?
It seems like what you're looking for is bulk operations. Entity Framework isn't suited for bulk operations. As the number of changes EF has to track increases, performance degrades.
There are some possible workarounds:
1. Commit your changes at intervals as you enumerate through the list that you're updating, i.e. call SaveChanges after you have inserted or updated 1000 more items in the context.
2. The more items that are tracked by EF, the harder EF has to work. Part of this is alleviated by option 1, but you can also disable tracking. Fair warning: there are some catches to this, so be certain to read up on everything you have to account for when disabling change detection.
3. If your process requires a massive amount of changes, you might be better off using stored procedures than EF.
Note: 1000 items in option one is an arbitrary number. If you choose to go this route, you should run tests to see what range works best for the objects that you are working with.
Given a list of size listSize and a number n between 1 and listSize, what you'll find is that calling SaveChanges after every n items is much faster than calling SaveChanges after every item. If listSize is on the order of tens of thousands or hundreds of thousands of updates, then the best n is most likely less than listSize. The goal is to find the value of n that will allow you to update the entire list the fastest.
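As a rough illustration of the "save every n items" idea, the sketch below counts how many saves would be triggered for a given list size and batch size. It has no EF dependency: `saveChanges` is a hypothetical stand-in for calling SaveChanges on your context, `UpdateInBatches` is an invented helper, and the numbers are arbitrary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int saveCalls = 0;
Action saveChanges = () => saveCalls++;   // stand-in for context.SaveChanges()

// Walks the list and commits after every n items,
// plus once at the end for any remainder.
void UpdateInBatches(IReadOnlyList<int> items, int n)
{
    int pending = 0;
    foreach (var item in items)
    {
        // In real code: attach the entity and mark it as modified here.
        pending++;
        if (pending == n)
        {
            saveChanges();
            pending = 0;
        }
    }
    if (pending > 0) saveChanges();
}

var list = Enumerable.Range(1, 2500).ToList();

UpdateInBatches(list, 1000);
int callsWithBatches = saveCalls;        // batches of 1000, 1000 and 500

saveCalls = 0;
UpdateInBatches(list, 1);
int callsPerItem = saveCalls;            // one save per item

Console.WriteLine($"n = 1000: {callsWithBatches} saves, n = 1: {callsPerItem} saves");
```

This only counts calls; it does not measure time. Finding the fastest n for real entities still requires timing the actual SaveChanges calls against your own data.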

Loading a context

Have I understood this correctly, please?
When you are running a web application to view pages and you create an instance of the context, is that instance loading all the database data into it?
If it does, does that not take up a lot of memory? A blog with five years of entries could have 1,500 to 2,000 (or more) posts in it; with all the comments, tags, etc., that would be a great deal of data.
So what does happen when you create the instance of a context?
A context only loads the records that you request, so when you first instantiate one it will be empty and won't perform any queries against the database until you tell it to. Any entities you load through it will (usually) be cached within the context, though, so they use more and more memory every time you run a query and can become very large over time.
For that reason, and because contexts are relatively cheap to instantiate, it's a good idea to only keep them alive while you actually need them, and dispose of them as soon as you're done. This is part of the "unit of work" pattern -- basically using a new context for each set of operations that go together as one unit or transaction.
Edited to add:
If you're performing read-only queries (i.e. you just want to display data, you don't need to make changes and save them back to the database), you might check out non-tracking queries (e.g. the .AsNoTracking() method if you're using a DbContext/DbSet, or setting the query's MergeOption to MergeOption.NoTracking if you're using an ObjectContext/ObjectSet) -- that will avoid caching the results in the context, increasing performance and reducing memory use.

Why does my Entity Framework query return a record twice?

I have two tables, one with Contacts (people) and another with Addresses.
Gregory Alderson has one Contact entry and two Address entries.
This is the code:
that returns two records for Gregory Alderson:
If I leave LazyLoadingEnabled set to ‘true’, it does the same thing but both records contain both addresses:
The book I’m learning from (Programming Entity Framework 2nd edition – good book BTW) explained that LazyLoading is disabled so the Count method does not impact the results, but so far has not explained why it would do so.
Can someone explain to me why I get two records with LazyLoading turned off, and two records (both with both addresses) with LazyLoading turned on?
A good way to get a better understanding of what's going on is run up Query Analyzer and watch what SQL statements are executed against the db or better yet get a copy of Ayende's EF Profiler.
Essentially with eager loading you need to be more explicit on what related entities you want returned. This is done using the Include method on the context object. Without lazy loading enabled you're making a single hit against the db and then evaluating only against the locally held data rather than making another request to the db for further data used in the Count().
The issue here seems to be due to what you're selecting. Specifically:
select new {a, a.Contact}
Contact is actually a navigation property of a in this case. When you select a, you're selecting everything on a, including Contact. Also selecting a.Contact means you get contact twice.
With lazy loading enabled you don't have to select it. If you select a and then simply use a.Contact somewhere else in your code, EF will go load it for you. The "lazy" in lazy loading is that it's not loaded unless you actually try to use it. With lazy loading on, you just need this:
select a
With lazy loading off, that doesn't happen. But you still don't want to select it. Instead you'd use Include:
from a in context.Addresses.Include("Contact") select a
That tells EF that you always want it to load the Contact navigation property and to do so immediately. It'll be available right away and will still be available if you dispose of the context (which isn't the case with lazy loading).
I suspect the problem here is that by selecting all of a AND a property of a, you're getting a weird side effect.
The OP's question, summed up in the last two paragraphs, was:
The book I’m learning from (Programming Entity Framework 2nd edition – good book BTW) explained that LazyLoading is disabled so the Count method does not impact the results, but so far has not explained why it would do so.
Can someone explain to me why I get two records with LazyLoading turned off, and two records (both with both addresses) with LazyLoading turned on?
The part about the effects of LazyLoading on Count() was explained by Daz. But neither Daz nor Tridus explained why the OP was getting two Contact records for Gregory Alderson in the output regardless of LazyLoading. That is the question that I will answer here.
The problem is that the iteration was essentially happening over Addresses. That is, the outer foreach loop was executing once for each Address in Canada. So because Gregory Alderson has two addresses in Canada, the loop is executed twice for him.
Note that if Gregory Alderson also had an address in the US, the loop would still be executed twice, but all three addresses would be printed out, not just the addresses in Canada.
If the intention was actually to list each Contact once and then list each Address for that Contact, a more appropriate design would be the following:
var contacts = context.Contacts
.Where(c => c.Addresses.Any(a => a.CountryRegion == "Canada"));
foreach (var c in contacts)
{
Console.WriteLine("{0} {1}, Addresses: {2}",
c.FirstName.Trim(),
c.LastName.Trim(),
c.Addresses.Count());
foreach (var a in c.Addresses)
{
Console.WriteLine("...{0}, {1}\n", a.Street1.Trim(), a.City.Trim());
}
}
(I tried to keep the code as close to identical as I could, but I couldn't think of how to write the query using query syntax, so I used method syntax because I'm much more familiar with it...)
This code will result in a distinct list of Contacts being returned from the database and then each Contact will be output one time along with each of the child Addresses.
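The difference between iterating over addresses and iterating over contacts can be reproduced with plain in-memory data. The following is only a sketch using LINQ to Objects and invented sample rows (the street names are made up); it is not the original query from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample data: one contact with two Canadian addresses and one US address.
var addresses = new List<(string Contact, string Street, string Country)>
{
    ("Gregory Alderson", "1 Maple St", "Canada"),
    ("Gregory Alderson", "2 Birch Ave", "Canada"),
    ("Gregory Alderson", "3 Elm Rd", "United States"),
};

// Iterating over addresses: the contact shows up once per matching address.
var perAddress = addresses
    .Where(a => a.Country == "Canada")
    .Select(a => a.Contact)
    .ToList();

// Iterating over contacts: each contact appears once, with all of its addresses.
var perContact = addresses
    .GroupBy(a => a.Contact)
    .Where(g => g.Any(a => a.Country == "Canada"))
    .Select(g => (Contact: g.Key, Addresses: g.Select(a => a.Street).ToList()))
    .ToList();

Console.WriteLine($"address-driven loop visits the contact {perAddress.Count} times");
foreach (var c in perContact)
    Console.WriteLine($"{c.Contact}: {c.Addresses.Count} addresses");
```

The address-driven version visits the same contact twice. The contact-driven version visits him once and lists all three addresses, including the one outside Canada, because the filter applies to contacts rather than to their addresses.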
Hope that helps someone who might be dealing with this and didn't find the other answers helpful on this aspect.

Entity Framework - Object Properties

I'm trying to use Entity Framework for my model/data access, and running into speed issues, hopefully someone can assist?
What I've been doing is using the EF diagram with the default code generator to generate partial classes describing anything that is going to be persisted. I then have partial classes with methods and non-persisted properties. These might be simple things like full name as the concatenated first/last names, or derived from related entities, such as total stock as the sum of the collection of stock locations' quantity.
Any methods accessing related entities do work, but seem to be very slow. Here's an example of an especially slow one, it takes about 6-7 seconds:
Quick description of entities involved:
Supplier --> supplies many SupplierLines, each has cost price
SupplierLine --> broken down into StockLines
StockLine --> has many Locations, each location has quantity
So I'm trying to add a method to get the total stock value from a supplier, i.e. mySupplier.StockValue() which should obviously be the total of cost price x total quantity for each supplier line and its stock lines.
I've done this as a function in Supplier as:
Public Function StockValue() As Decimal
Return SupplierLines.
Sum(Function(sul) sul.LastPrice * sul.StockLines.Sum(Function(skl) skl.Locations.Sum(Function(l) l.Quantity)))
End Function
Which gives correct results, but takes forever to do so.
Any thoughts as to how I can get better results?
I want to keep my model classes persistence-ignorant
I want to keep all my logic compile-checked
I want everything to be easily unit-testable using fake data sources
I don't really want to pre-load this information, because it isn't always needed
Your problem is most probably lazy loading. If you load only the Supplier entity, it doesn't load its related SupplierLine instances, their related StockLine instances, or their related Location instances. If this is really the case (if you didn't Include them in the query retrieving the Supplier), the situation is as follows:
SupplierLines. - executes query to database to get all lines for current Supplier
sul.StockLines. - executes separate query for each supplier line to get its stock lines
skl.Locations. - executes separate query for each stock line to get its locations
So depending on the amount of data you have in these collections, you can end up with tens to thousands of SQL queries executed in your first call to StockValue. The next call will be fast because the data are already loaded.
If you want to avoid this, you must retrieve the supplier with all its related data:
context.Supplier.Include("SupplierLines.StockLines.Locations").Where(...);
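The query counts described in this answer can be estimated with a small simulation. It uses no EF types: a dictionary stands in for the context's cache, a counter stands in for database round trips, and the sample sizes (5 supplier lines with 4 stock lines each) and the 1.5 price are arbitrary. `LazyLoad` is an invented helper, not an EF method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int queries = 0;
var cache = new Dictionary<string, List<int>>();

// Returns already-loaded rows if present; otherwise counts one simulated query.
List<int> LazyLoad(string key, Func<List<int>> fetch)
{
    if (!cache.TryGetValue(key, out var rows))
    {
        queries++;
        rows = fetch();
        cache[key] = rows;
    }
    return rows;
}

// Simulated navigation properties: ids of children, or quantities per location.
List<int> SupplierLines() =>
    LazyLoad("lines", () => Enumerable.Range(1, 5).ToList());
List<int> StockLines(int line) =>
    LazyLoad($"stock:{line}", () => Enumerable.Range(1, 4).Select(k => line * 10 + k).ToList());
List<int> Quantities(int stockLine) =>
    LazyLoad($"loc:{stockLine}", () => new List<int> { 2, 3 });

// Same shape as StockValue(): price x total quantity, summed over supplier lines.
decimal StockValue() =>
    SupplierLines().Sum(sul => 1.5m * StockLines(sul).Sum(skl => Quantities(skl).Sum()));

decimal firstResult = StockValue();
int queriesFirstCall = queries;                        // 1 + 5 + 20

decimal secondResult = StockValue();
int queriesSecondCall = queries - queriesFirstCall;    // everything already loaded

Console.WriteLine($"first call: {queriesFirstCall} queries, second call: {queriesSecondCall} queries, value: {firstResult}");
```

Even with this tiny data set, the first call issues 26 simulated queries: one for the lines, five for the stock lines, and twenty for the locations. Preloading the whole graph with a single Include corresponds to collapsing those 26 into one.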
I found an Include()... method that extends IEnumerable, and use that. It resolves the performance issues, while maintaining ignorance of the context.