I am using EF 4.1, with POCOs which are lazy loaded.
Some sample queries that I run:
var discountsCount = product.Discounts.Count();
var currentDiscountsCount = product.Discounts.Count(d=>d.IsCurrent);
var expiredDiscountsCount = product.Discounts.Count(d=>d.IsExpired);
What I'd like to know, is whether my queries make sense, or are poorly performant:
Am I hitting the database each time, or will the results come from cached data in the DbContext?
Is it okay to access the navigation properties "from scratch" each time, as above, or should I be caching them and then performing more queries on them, for example:
var discounts = product.Discounts;
var current = discounts.Count(d=>d.IsCurrent);
var expired = discounts.Count(d=>d.Expired);
What about a complicated case like below, does it pull the whole collection and then perform local operations on it, or does it construct a specialised SQL query which means that I cannot reuse the results to avoid hitting the database again:
var chained = discounts.OrderBy(d=>d.CreationDate).Where(d=>d.CreationDate < DateTime.Now).Count();
Thanks for the advice!
EDIT based on comments below
So once I call a navigation property (which is a collection), it will load the entire object graph. But what if I filtered that collection using .Count(d=>d...) or Select(d=>d...) or Min(d=>d...), etc. Does it load the entire graph as well, or only the final data?
product.Discounts (or any other navigation collection) isn't an IQueryable but only an IEnumerable. LINQ operations you perform on product.Discounts will never issue a query to the database - with the only exception that in case of lazy loading product.Discounts will be loaded once from the database into memory. It will be loaded completely - no matter which LINQ operation or filter you perform.
If you want to perform filters or any queries on navigation collections without loading the collection completely into memory you must not access the navigation collection but create a query through the context, for instance in your example:
var chained = context.Entry(product).Collection(p => p.Discounts).Query()
.Where(d => d.CreationDate < DateTime.Now).Count();
This would not load the Discounts collection of the product into memory but perform a query in the database and then return a single number as result. The next query of this kind would go to the database again.
In your examples above the Discounts collection should be populated by Ef the first time you access it. The subsequent linq queries on the Discount collection should then be performed in memory. This will even include the last complex expression.
You can also use the Include method to make sure you are getting back associated collection first time. example .Include("Discounts");
If your worried about performance I would recommend using SQL Profiler to have a look at what SQL is being executed.
Related
I have read about lazy loading from this web site.
Enable or disable LazyLoading
"If we request a list of Students with LazyLoading enabled, the data provider will get all of our students from the DB but each StudentAddress property won’t be loaded until the property will be explicitly accessed."
This statement says that when I set Lazy Loading Enabled = true the related data won't be loaded. However
List<Students> stdList = Datacontext.Students.ToList();
if I set lazy loading enabled = true the above code returns all stundents with their teachers and address. What is the point that I am missing here? Please could someone explain it?
No matter what setting you have, if you use .ToList() it will "enumerate the enumerable". This is very significant, and this phrase should become common knowledge to you.
When .ToList() is used, many things occur. Enumerating the enumerable means that the previous set of how to enumerate the set is now being used to actually iterate through the set and populate the data. What that means is that the previous enumerator (which was stored internally as an Expression Tree) is now going to be sent from Entity Framework to your SQLProvider Factory. That will then convert the object graph from the Expression Tree into SQL and execute the query on the server, thus returning the data and populating your list.
Lazy loading instead of using ToList() would be if you had this IQueryable enumerable, and then iterated that manually loading each element in the set, or only partial elements in the set.
Once you have the list of elements returned, lazy loading will only come in to play if there are navigational properties. If there were related properties, for example if you have an Invoice and you want to get the related Customer information from the customer table. The relation will not explicitly be returned at first, only the invoices. So to get the customer data you could then (while the context was still open, i.e. not disposed) access that via the .Customer reference on your object and it would load. Conversely, to load all the customers during the original enumeration, you could use the .Include() functionality on your queryable, and that would then tell the sql provider factory to use a join when issuing the query.
In your specific example,
List<Students> stdList = Datacontext.Students.ToList();
This will actually not load all of the Teachers and Addresses regardless of if lazy loading is enabled or not. It will only load the students. If you want to lazy load a Teacher, while the Datacontext is still not disposed, you could then use
var firstStudent = stdList.First();
var teacher = firstStudent.Teacher;
//and at this point lazy loading will fetch the teacher
//by issuing **another** query (round trip) to the database
That would only be possible if lazy loading were enabled.
The alternative to this is to eager load, which would include the teachers and addresses. That would look like this
List<Students> stdList = Datacontext.Students
.Include( s => s.Teacher )
.Include( s => s.Address ).ToList();
And then later on if you were to try to access a teacher the context could be disposed and access would still be possible because the data was already loaded.
var firstStudent = stdList.First();
var teacher = firstStudent.Teacher;
//and at this point the teacher was already
//loaded and as a result no additional round trip is required
How was that you noticed that property was loaded, using the debugger? If that is the case, then you already have the answer. With the debugger you are accessing to that property too, so, that also triggers lazy loading.
How this works?
If your entities meets these requirements, then EF will create a proxy class for each of your entities that support change tracking or lazy loading. This way you can load the related entities only when these are accessed. As I explained earlier, the debugger will also trigger lazy loading.
Now, be careful with lazy loading, once you context has been disposed, you will get an exception when you try to get access to one of the related properties. So, I would suggest to use eager loading in that case.
I've created a stored procedure to manage searching on my application. The stored procedure returns a collection of "cases".
When displaying my search results I reference a property of each "case" that is a linked table in the database. At the moment lazy loading makes the EF framework load the linked property for each "case".
Mini profiler shows this as duplicate trips to the database one for each object returned (possibly 200 objects) is there a way to say load all the linked properties in one go rather than leaving lazy loading to do it?
I don't think you can use .Include() directly but once you get the results from the stored procs you can run a single Linq query to get all the related results - something like:
var relatedReults = ctx.RelatedResults.Where(r => ids.Contains(r.Id)).ToArray();
where ids is an array of ids you build based on the results returned by your stored proc. EF should be able to fix up the relations so after running the query you should be able to access related entities without sending additional queries.
I'm using EF (dll version is 4.4) to query against a database. The database contains several tables with course information. When having a look what actually is sent to the db I see a massive, almost 1300 line SQL query (which I'm not going to paste here because of it's size). The query I'm running on the context looks like:
entities.Plans
.Include("program")
.Include("program.offers")
.Include("program.fees")
.Include("program.intakes")
.Include("program.requirements")
.Include("program.codes")
.Include("focuses")
.Include("codes")
.Include("exceptions")
.Include("requirements")
where plans.Code == planCode
select plans).SingleOrDefault();
I want to avoid having to go back to the server when collecting information from each of the related tables but with such a large query I'm wondering if there is there a better way of doing this?
Thanks.
A bit late but, as your data is only changing once a day, look at putting everything you need into an indexed view and place this view in your EF model.
You can usually add the .Include() after the where clause. This means that you're only pulling out information that matches what you want, see if that reduces your query any.
As you are performing an eager loading, so if you are choosing the required entities then its fine. Otherwise you can go with Lazy Loading, but as you specified that you don't want database round trip, so you can avoid it.
I would suggest, If this query is used multiple times then you can use compiled query. So that it will increase the performance.
Go through this link, if you want..
http://msdn.microsoft.com/en-us/library/bb896297.aspx
If you're using DbContext, you can use the .Local property on the context to look if your entity is already retrieved and therefore attached to the context.
If the query had run before and your root Plan entities are already attached based on Plan.Code == planId, presumably its sub-entities are also already attached since you did eager loading, so referring to them via the navigation properties won't hit the DB for them again during the lifetime of that context.
This article may be helpful in using .Local.
You may be able to get a slightly more concise SQL query by using projection rather than Include to pull back your referenced entities:
var planAggregate =
(from plan in entities.Plans
let program = plan.Program
let offers = program.Offers
let fees = program.Fees
//...
where plan.Code == planCode
select new {
plan
program,
offers,
fees,
//...
})
.SingleOrDefault();
If you disable lazy loading on your context this kind of query will result in the navigation properties of your entities being populated with the entities which were included in your query.
(I've only tested this on EF.dll v5.0, but it should behave the same on EF.dll v4.4, which is just EF5 on .NET 4.0. When I tested using this pattern rather than Include on a similarly shaped query it cut about 70 lines off of 500 lines of SQL. Your mileage may vary.)
I have a very deep relational tree in my model design, that is, the root entity contains a collection of entities that contains more collections of other entities that contains more collections and on an on ... I develop a business layer that other developers have to use to perform operations, including get/save data.
Then, I am thinking about what is the best strategy to cope with this situation. I cannot allow that when retrieving a entity, EF resolves all the dependency tree, since it will end in a lot of useless JOIN (useless because maybe I do not need that data in the next level).
If I disable lazy loading and enforce eager loading for what is needed, it works as expected, but if other developer calls child.Parent.Id instead of child.ParentId trying to do something new (like a new requirement or feature not considered at the beggining), it will get a NullReferenceException if that dependency was not included, which is bad... but it will be a "fast error", and it could be fixed straight away.
If I enable lazy loading, accessing child.Parent.Id instead of child.ParentId will end in a standalone query to the DB each time it is accessed. It won't fail, but it is worse because there is no error, only a decrement in the performance, and all the code should be reviewed.
I am not happy with any of these two solutions.
I am not happy having entities that contains null or empty collections, when in reality, it is not true.
I am not happy with letting EF perform arbitrary queries to the DB at any moment. I would like to get all the information in one shoot if possible.
So, I come up with several possible solutions that involve disabling lazy loading and enforcing eager loading, but not sure which is better:
I can create a EntityBase class, that contains the data in the table without the collections, so they cannot be accessed. And concrete implementations that contains the relationships, the problem is that you do not have much flexibility since C# does not allow multi-inheritance.
I can create interfaces that "mask" the objects hidding the properties that are not available at that method call. For example, if I have a User.Roles property, in order to show a grid will all users, I do not need to resolve the .Roles property, so I could create an interface 'IUserData' that does not contain such property.
But I do not if this additional work is worth, maybe a fast NullReferenceException indicating "This property has not been loaded" would be enough.
Would it be possible to throw a specific exception type if the property is virtual and it has not been overridden/set ?
What method do you use?
Thanks.
In my opinion you are trying to protect the developers from the need to understand what they are doing when they access data and what performance implications it can have - which might result in an unnecessary convoluted API with a lot of helper classes, base classes, interfaces, etc.
If a developer uses user.MiddleName.Trim() and MiddleName is null he gets a NullReferenceException and did something wrong, either didn't check for null or didn't make sure that the MiddleName is set to a value. The same when he accesses user.Roles and gets a NullReferenceException: He didn't check for null or didn't call the appropriate method of your API that loads the Roles of the user.
I would say: Explain how navigation properties work and that they have to be requested explicitly and let the application crash if a developer doesn't follow the rules. He needs to understand the mistake and fix it.
As a help you could make loading related data explicit somehow in the API, for example with methods like:
public User GetUser(int userId);
public User GetUserWithRoles(int userId);
Or:
public User GetUser(int userId, params Expression<Func<User,object>>[] includes);
which could be called with:
var userWithoutRoles = layer.GetUser(1);
var userWithRoles = layer.GetUser(2, u => u.Roles);
You could also leverage explicit loading instead of lazy loading to force the developers to call a method when they want to load a navigation property and not just access the property.
Two additional remarks:
...lazy loading ... will end in a standalone query to the DB each time
it is accessed.
"...and not yet loaded" to complete this. If the navigation property has already been loaded within the same context, accessing the property again won't trigger a query to the database.
I would like to get all the information in one shoot if possible.
Multiple queries do not necessarily result in worse performance than one query with a lot of Includes. In fact complex eager loading can lead to data multiplication on the wire and make entity materialization very time consuming and slower than multiple lazy or explicit loading queries. (Here is an example where a query's performance has been improved by a factor of 50 by changing it from a single query with Includes to more than 1000 queries without Include.) Quintessence is: You cannot reliably predict what's the best loading strategy in a specific situation without measuring the performance (if the performance matters in that situation).
Is it possible to load an entity excluding some properties? One of this entity's properties is expensive to select. I would like to lazy load this property. Is that possible?
Now that you have read everyone's reply, I will give you the correct answer. EF does not support lazy loading of properties. However it does support a much powerful concept then this. It's called table splitting where you can map a table to two entities. Say a product table in the the database can be mapped to product entity and ProductDetail entity. You can then move the expensive fields to the ProductDetail entity and then create a 1..1 association between prodcut and productdetail entity. You can then lazy load the productdetail association only when you need it.
In my performance chapter of my book, I have a recipe called.
13-9. Moving an Expensive Property to Another Entity
Hope that helps!
Julie Lerman has an article on how to split a table
With a scalar property, the only way to selectively not load a certain property is to project in ESQL or L2E:
var q = from p in Context.People
select new
{
Id = p.Id,
Name = p.Name // note no Biography
};
+1 to Dan; doing this lazily is worse than loading it up-front. If you want to control loading, be explicit.
stimms is correct, but be careful while using lazy loading. You may have performance issues and not realize the property is getting loaded at a specific location in your code. This is because it loads the data when you use the property
I prefer to use explicit loading. This way you know when they get loaded and where. Here's a link that gives an example for the LoadProperty http://sankarsan.wordpress.com/2010/05/09/ado-net-entity-framework-data-loading-part-2/
You can also you Eager Loading by using the Include method. Example here:http://wildermuth.com/2008/12/28/Caution_when_Eager_Loading_in_the_Entity_Framework
Given a query over an EntityFramework DbSet, where the targeted entity contains a BigProperty and a SmallProperty,
When you're trying to only access the SmallProperty without loading the BigProperty in memory :
//this query loads the entire entity returned by FirstOrDefault() in memory
//the execution is deferred during Where; the execution happens at FirstOrDefault
db.BigEntities.Where(filter).FirstOrDefault()?.SmallProperty;
//this query only loads the SmallProperty in memory
//the execution is still deferred during Select; the execution happens at FirstOrDefault
//a subset of properties can be selected from the entity, and only those will be loaded in memory
db.BigEntities.Where(filter).Select(e=>e.SmallProperty).FirstOrDefault();
Therefore you could exploit this behaviour to only query the BigProperty where you actually need it, and use select statements to explicitly filter it out everywhere else.
I tested this with the Memory Usage functionality from the Visual Studio debug Diagnostic Tools.