IQueryable and lazy loading - entity-framework

I'm having a hard time determining the best way to handle this... With Entity Framework (and L2S), LINQ queries return IQueryable. I have read various opinions on whether the DAL/BLL should return IQueryable, IEnumerable or IList. Assuming we go with IList, then the query is run immediately and that control is not passed on to the next layer. This makes it easier to unit test, etc. You lose the ability to refine the query at higher levels, but you could simply create another method that allows you to refine the query and still return IList. And there are many more pros/cons. So far so good.
Now comes Entity Framework and lazy loading. I am using POCO objects with proxies in .NET 4/VS 2010. In the presentation layer I do:
foreach (Order order in bll.GetOrders())
{
foreach (OrderLine orderLine in order.OrderLines)
{
// Do something
}
}
In this case, GetOrders() returns IList so it executes immediately before returning to the PL. But in the next foreach, you have lazy loading which executes multiple SQL queries as it gets all the OrderLines. So basically, the PL is running SQL queries "on demand" in the wrong layer.
Is there any sensible way to avoid this? I could turn lazy loading off, but then what's the point of having this "feature" that everyone was complaining EF1 didn't have? And I'll admit it is very useful in many scenarios. So I see several options:
Somehow remove all associations in the entities and add methods to return them. This goes against the default EF behavior/code generation and makes it harder to do some composite (multiple entity) LINQ queries. It seems like a step backwards. I vote no.
If we have lazy loading anyway which makes it hard to unit test, then go all the way and return IQueryable. You'll have more control farther up the layers. I still don't think this is a good option because IQueryable ties you to L2S, L2E, or your own full implementation of IQueryable. Lazy loading may run queries "on demand", but doesn't tie you to any specific interface. I vote no.
Turn off lazy loading. You'll have to handle your associations manually. This could be with eager loading's .Include(). I vote yes in some specific cases.
Keep IList and lazy loading. I vote yes in many cases, only due to the troubles with the others.
Any other options or suggestions? I haven't found an option that really convinces me.

You could make your methods accept some sort of load strategy.
Func<ObjectSet<Order>, ObjectQuery<Order>> loadSpan =
orders=> orders.Include("OrderLines");
foreach (Order order in bll.GetOrders(loadSpan))
{
foreach (OrderLine orderLine in order.OrderLines)
{
// Do something
}
}
And inside your GetOrders method, you do something like
public IList<Order> GetOrders(
    Func<ObjectSet<Order>, ObjectQuery<Order>> loadSpan)
{
    // Apply the caller's load strategy (e.g. Include) before filtering.
    var ordersWithSpan = loadSpan(context.OrderSet);
    var orders = from order in ordersWithSpan
                 where ...your normal filters etc
                 select order;
    return orders.ToList();
}
This will allow you to specify entire load graphs per use case.
You can of course also wrap those strategies in a wrapper class, so you would write:
//wrapped in a static class "OrderLoadSpans"
foreach (Order order in bll.GetOrders(OrderLoadSpans.WithOrderLines))
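A minimal sketch of what such a wrapper class might look like, assuming the EF 4 ObjectContext API used above. The Order stub and the None strategy are hypothetical additions for illustration; only the WithOrderLines name comes from the usage line above.

```csharp
using System;
using System.Data.Objects; // EF 4 ObjectContext API; in EF 6 the namespace is System.Data.Entity.Core.Objects

// Hypothetical stub so the sketch stands alone; the real generated Order entity would be used instead.
public class Order
{
    public int Id { get; set; }
}

// Hypothetical wrapper: one named load strategy per use case.
public static class OrderLoadSpans
{
    // No eager loading: the ObjectSet is passed through unchanged.
    public static readonly Func<ObjectSet<Order>, ObjectQuery<Order>> None =
        orders => orders;

    // Eagerly load the OrderLines navigation property along with each Order.
    public static readonly Func<ObjectSet<Order>, ObjectQuery<Order>> WithOrderLines =
        orders => orders.Include("OrderLines");
}
```

Callers then choose a strategy by name, and GetOrders stays the only place where the query is actually executed.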
HTH

Related

How to solve initial very slow EF Entity call which uses TPH and Complex Types?

I am using EF6
I have a generic table which holds data for different types of class objects using the "Table Per Hierarchy" Approach. In addition these class objects use complex types for defining types for their properties.
So using a made up example,
Table = Person
"Mike the Teacher" is a "Teacher" instance of Person with a personType of "Teacher"
The "Teacher" instance has 2 properties, complextypePersonalDetails and complextypeAddress.
complextypePersonalDetails contains
First Name, Surname and Age.
complextypeAddress contains
HouseName, Street, Town, City, County.
I admit that this design may be over the top, and the problem may be of my making, but that aside I wanted to check whether I could do any more with EF6 before I rewrite it.
I am performance profiling the code with JetBrains DotTrace.
On first call, say on
personTeacher = db.person.OfType<Teacher>().First()
I get a massive delay of around 150,000ms
around:
SerializedGeneratedViewOfType (150,000ms)
TryGenerateQueryViewOfType
GenerateTypeSpecificQueryView
GenerateQueryViewForSingleExtent
GenerateQueryViewForExtentAndType
GenerateViewsForExtentAndType
GenerateViewComponents
EnsureExtentIsFullyMapped (90,000ms)
GenerateCaseStatements (60,000ms)
I have created a pregenerated View using the "InteractivePreGeneratedViews" nuget package which creates the SQL. However even with this I still need to incur my first hit. Also this hit seems to happen every time the Webserver/Website/AppPool is restarted.
I am not totally sure of the EF process, but I guess there is some further form of runtime compilation or caching which happens when the web app starts. Where could this be happening, and is there a proactive method that I could use to pregenerate/precompile/precache this problem away?
In the medium term, we will rewrite this code in Dapper or EF.Core. So for now, any thoughts on what can be done?
Thanks.
I had commented on this before but retracted it; I was just agreeing that "this design may be over the top, and the problem may be of my making", and I thought I'd see if anyone else jumped in.
The initial spin-up cost is due to EF needing to resolve the mapping for your schema. This happens once, the first time a DbSet on the context is accessed. You can mitigate this by executing a query on your application start, i.e.
void Application_Start(object sender, EventArgs e)
{
// Initialization stuff...
using (var context = new MyContext())
{
var result = context.MyTable.Any(); // Spin up will happen here, not when the first user attempts to access a query.
}
}
You actually need to run a query for the DbContext to resolve the mapping; just new-ing one up won't do it.
For larger, or more complex schemas you can also look to utilize bounded contexts where each context maps a particular set of relationships for a specific area of the application. The less complex/comprehensive a context is, the faster it initializes.
As far as the design goes, TPH is for representing inheritance, which is where you need to establish an "is-a" relation between like entities. Relational models, and ORMs by definition can support this, but they're geared more towards "has-a" relationships. Rather than having a model where you go "is-a person with an address", the relation is best mapped out that a person may "have-an" address. I've worked on a system that was designed by a team of engineers where an entire reporting system with dynamic rules was represented by 6 tables. Honestly, those designs are a nightmare to maintain.
I don't know why OfType() is so slow, but I found a fast and easy workaround by replacing it with a cast; EntityFramework seems to support that just fine and without the performance penalty.
var stopwatch = Stopwatch.StartNew();
using (var db = new MyDbContext())
{
// warm up
Console.WriteLine(db.People.Count());
Console.WriteLine($"{stopwatch.ElapsedMilliseconds} ms");
// fast
Console.WriteLine(db.People.Select(p => p as Teacher).Where(p => p != null).Count());
Console.WriteLine($"{stopwatch.ElapsedMilliseconds} ms");
// slow
Console.WriteLine(db.People.OfType<Teacher>().Count());
Console.WriteLine($"{stopwatch.ElapsedMilliseconds} ms");
}
20
3308 ms
2
3796 ms
2
10026 ms

Disabling Lazy Loading is dangerous?

I have an ASP.NET MVC application utilizing Entity Framework for the data layer.
In one of my methods I retrieve the seasonal availability data for a product, and afterwards, the best tax rate for the product.
public ProductList FetchProductSearchList(ProductSearchCriteria criteria)
{
...
var avail = ProductAvailabilityTemplate.Get(criteria.ProductID);
...
var tr = TaxRate.BestMatchFor(criteria.ProductID, criteria.TaxCode);
...
}
In the data layer for ProductAvailabilityTemplate.Get, I had been optimizing the performance of my LINQ code. In particular, I had set ctx.ObjectContext.ContextOptions.LazyLoadingEnabled = false; to prevent EF from loading some entities (via navigation properties) that I don't need in this scenario.
However, once this change was made I noticed that my TaxRates weren't loading fully, because ctx.ObjectContext.ContextOptions.LazyLoadingEnabled was still false in my Tax data layer code. This meant that an entity linked to TaxRate via a navigation property wasn't being loaded.
To overcome this problem I simply set ctx.ObjectContext.ContextOptions.LazyLoadingEnabled = true; in the Tax data layer method, but I am concerned that an unrelated change could cause a problem like this. It seems that you can't safely disable lazy loading for one feature without potentially affecting the operation of whatever is called afterwards. I am tempted to remove all navigation properties, disable lazy loading, and use good old fashioned joins to load exactly what I need for each data layer call, no more no less.
Would welcome any advice.
It's a trade off:
Lazy Loading
gives you the benefit of not needing to specify the depth of graph during loading
will need to return to the database to retrieve missing results
requires some pollution of the POCOs (e.g. virtual properties with proxies)
requires the DbContext to be longer lived, for the duration of all data accesses.
Eager Loading
requires a lot more thought into the depth of loading during each fetch
will typically generate fewer queries with wider joins to fetch the graph at once
does not require any alteration or ceremony around your entities
allows much shorter-lived connections and DbContexts
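The round-trip difference in the two lists above can be made concrete with a small simulation. This is only a sketch: FakeOrderStore is a hypothetical in-memory stand-in that counts "queries", not Entity Framework itself, and all names in it are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for a database; every method call counts as one round trip.
public class FakeOrderStore
{
    private readonly Dictionary<int, List<string>> _linesByOrder;
    public int QueryCount { get; private set; }

    public FakeOrderStore(int orderCount)
    {
        _linesByOrder = Enumerable.Range(1, orderCount)
            .ToDictionary(id => id, id => new List<string> { "line A", "line B" });
    }

    // One round trip: the orders only.
    public List<int> GetOrderIds()
    {
        QueryCount++;
        return _linesByOrder.Keys.ToList();
    }

    // One round trip per call: the lines of a single order (the lazy-loading pattern).
    public List<string> GetLines(int orderId)
    {
        QueryCount++;
        return _linesByOrder[orderId];
    }

    // One round trip: all orders together with their lines (the eager-loading pattern).
    public Dictionary<int, List<string>> GetOrdersWithLines()
    {
        QueryCount++;
        return _linesByOrder;
    }
}

public static class LoadingDemo
{
    // Lazy pattern: 1 query for the orders plus 1 query per order for its lines.
    public static int CountLazyQueries(int orderCount)
    {
        var store = new FakeOrderStore(orderCount);
        int totalLines = 0;
        foreach (int id in store.GetOrderIds())
            totalLines += store.GetLines(id).Count;
        return store.QueryCount;
    }

    // Eager pattern: a single query returns the whole graph.
    public static int CountEagerQueries(int orderCount)
    {
        var store = new FakeOrderStore(orderCount);
        int totalLines = store.GetOrdersWithLines().Values.Sum(lines => lines.Count);
        return store.QueryCount;
    }
}
```

With 20 orders, the lazy pattern makes 21 round trips and the eager pattern makes 1. Whether the single wider query is actually faster still depends on the shape of the data.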
FWIW, I've generally done prototype work with Lazy Loading enabled, to get software to a demonstrable state, and once the data access patterns stabilize, I switch off Lazy Loading and move to explicitly Included references. A few unit tests checking for null references will also do wonders at this point. I am loath to deliver a production system with Lazy Loading still enabled, as there is an element of non-determinism (i.e. it is difficult to fully test), and the need to return to the DB for further data will hurt performance.
Either way, I wouldn't switch off all navigation and do explicit joins: you would lose the power of navigability that an ORM provides. When you switch off Lazy Loading, simply define the entities to be eager loaded explicitly with the applicable Includes.
I was fond of lazy loading when I started using EF, but after a while I realized that it was affecting performance since it effectively disables joins and pulls all subdata in separate queries even if you need to consume it all at once.
So now I prefer using Includes to eagerly load the sub-entities that I'm interested in. You could also do this somewhat dynamically, for instance by providing an includeDetails parameter:
public IEnumerable<Customer> LoadCustomersStartingWithName(string name, bool includeDetails)
{
    using (var db = new MyContext())
    {
        // Declare as IQueryable so the results of Include/Where can be reassigned.
        IQueryable<Customer> customers = db.Customers;
        if (includeDetails)
            customers = customers.Include(x => x.Orders).Include(x => x.ContactPersons);
        customers = customers.Where(x => x.Name.StartsWith(name));
        // Materialize the results before the context is disposed.
        return customers.ToList();
    }
}
For the code to work in EF6, you would also need to include
using System.Data.Entity;
at the top of the file.

How do I avoid large generated SQL queries in EF when using Include()

I'm using EF (dll version is 4.4) to query against a database. The database contains several tables with course information. When I look at what is actually sent to the db, I see a massive, almost 1300-line SQL query (which I'm not going to paste here because of its size). The query I'm running on the context looks like:
(from plans in entities.Plans
.Include("program")
.Include("program.offers")
.Include("program.fees")
.Include("program.intakes")
.Include("program.requirements")
.Include("program.codes")
.Include("focuses")
.Include("codes")
.Include("exceptions")
.Include("requirements")
where plans.Code == planCode
select plans).SingleOrDefault();
I want to avoid having to go back to the server when collecting information from each of the related tables, but with such a large query I'm wondering if there is a better way of doing this?
Thanks.
A bit late but, as your data is only changing once a day, look at putting everything you need into an indexed view and place this view in your EF model.
You can usually add the .Include() after the where clause. This means that you're only pulling out information that matches what you want, see if that reduces your query any.
Since you are eager loading, the query is fine as long as you actually need all of the included entities. Lazy loading is the alternative, but you said you want to avoid extra database round trips, so it does not fit here.
If this query is run many times, I would suggest using a compiled query to improve performance.
See this link for details:
http://msdn.microsoft.com/en-us/library/bb896297.aspx
If you're using DbContext, you can use the .Local property on the context to look if your entity is already retrieved and therefore attached to the context.
If the query had run before and your root Plan entities are already attached based on Plan.Code == planCode, presumably its sub-entities are also already attached since you did eager loading, so referring to them via the navigation properties won't hit the DB for them again during the lifetime of that context.
This article may be helpful in using .Local.
You may be able to get a slightly more concise SQL query by using projection rather than Include to pull back your referenced entities:
var planAggregate =
(from plan in entities.Plans
let program = plan.Program
let offers = program.Offers
let fees = program.Fees
//...
where plan.Code == planCode
select new {
plan,
program,
offers,
fees,
//...
})
.SingleOrDefault();
If you disable lazy loading on your context this kind of query will result in the navigation properties of your entities being populated with the entities which were included in your query.
(I've only tested this on EF.dll v5.0, but it should behave the same on EF.dll v4.4, which is just EF5 on .NET 4.0. When I tested using this pattern rather than Include on a similarly shaped query it cut about 70 lines off of 500 lines of SQL. Your mileage may vary.)

How do you handle deep relational trees in Entity Framework?

I have a very deep relational tree in my model design; that is, the root entity contains a collection of entities, each of which contains more collections of other entities, and so on. I am developing a business layer that other developers have to use to perform operations, including getting and saving data.
So I am thinking about the best strategy to cope with this situation. I cannot allow EF to resolve the whole dependency tree when retrieving an entity, since that would end in a lot of useless JOINs (useless because I may not need the data at the next level).
If I disable lazy loading and enforce eager loading for what is needed, it works as expected. But if another developer calls child.Parent.Id instead of child.ParentId while doing something new (like a new requirement or feature not considered at the beginning), they will get a NullReferenceException if that dependency was not included, which is bad... but it will be a "fast error", and it could be fixed straight away.
If I enable lazy loading, accessing child.Parent.Id instead of child.ParentId will end in a standalone query to the DB each time it is accessed. It won't fail, but it is worse because there is no error, only a drop in performance, and all the code would have to be reviewed.
I am not happy with any of these two solutions.
I am not happy having entities that contains null or empty collections, when in reality, it is not true.
I am not happy with letting EF perform arbitrary queries to the DB at any moment. I would like to get all the information in one shot if possible.
So, I come up with several possible solutions that involve disabling lazy loading and enforcing eager loading, but not sure which is better:
I can create an EntityBase class that contains the data in the table without the collections, so they cannot be accessed, plus concrete implementations that contain the relationships. The problem is that you do not have much flexibility, since C# does not allow multiple inheritance.
I can create interfaces that "mask" the objects, hiding the properties that are not available at that method call. For example, if I have a User.Roles property, I do not need to resolve the .Roles property in order to show a grid with all users, so I could create an interface 'IUserData' that does not contain that property.
But I do not know whether this additional work is worth it; maybe a fast NullReferenceException indicating "This property has not been loaded" would be enough.
Would it be possible to throw a specific exception type if the property is virtual and it has not been overridden/set ?
What method do you use?
Thanks.
In my opinion you are trying to protect the developers from the need to understand what they are doing when they access data and what performance implications it can have, which might result in an unnecessarily convoluted API with a lot of helper classes, base classes, interfaces, etc.
If a developer uses user.MiddleName.Trim() and MiddleName is null he gets a NullReferenceException and did something wrong, either didn't check for null or didn't make sure that the MiddleName is set to a value. The same when he accesses user.Roles and gets a NullReferenceException: He didn't check for null or didn't call the appropriate method of your API that loads the Roles of the user.
I would say: Explain how navigation properties work and that they have to be requested explicitly and let the application crash if a developer doesn't follow the rules. He needs to understand the mistake and fix it.
As a help you could make loading related data explicit somehow in the API, for example with methods like:
public User GetUser(int userId);
public User GetUserWithRoles(int userId);
Or:
public User GetUser(int userId, params Expression<Func<User,object>>[] includes);
which could be called with:
var userWithoutRoles = layer.GetUser(1);
var userWithRoles = layer.GetUser(2, u => u.Roles);
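A minimal sketch of how the params-based overload could be implemented, assuming the EF 6 DbContext API. The User, Role, UserContext and UserRepository types are hypothetical stand-ins for illustration; the lambda-based Include extension method comes from the System.Data.Entity namespace.

```csharp
using System;
using System.Collections.Generic;
using System.Data.Entity;      // DbContext, DbSet and the lambda-based Include extension
using System.Linq;
using System.Linq.Expressions;

// Hypothetical entities and context, shown only so the sketch is complete.
public class Role
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class User
{
    public int Id { get; set; }
    public virtual ICollection<Role> Roles { get; set; }
}

public class UserContext : DbContext
{
    public DbSet<User> Users { get; set; }
}

public class UserRepository
{
    private readonly UserContext _context;

    public UserRepository(UserContext context)
    {
        _context = context;
    }

    // Loads one user and eagerly loads only the navigation properties the caller asked for.
    public User GetUser(int userId, params Expression<Func<User, object>>[] includes)
    {
        IQueryable<User> query = _context.Users;

        // Each Include call returns a new query, so the result must be reassigned.
        foreach (var include in includes)
            query = query.Include(include);

        // The query runs here; returns null when no user has this id.
        return query.SingleOrDefault(u => u.Id == userId);
    }
}
```

The caller decides the load depth per use case, while the query still executes inside the data layer.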
You could also leverage explicit loading instead of lazy loading to force the developers to call a method when they want to load a navigation property and not just access the property.
Two additional remarks:
...lazy loading ... will end in a standalone query to the DB each time it is accessed.
"...and not yet loaded" to complete this. If the navigation property has already been loaded within the same context, accessing the property again won't trigger a query to the database.
I would like to get all the information in one shot if possible.
Multiple queries do not necessarily result in worse performance than one query with a lot of Includes. In fact, complex eager loading can lead to data multiplication on the wire and make entity materialization very time consuming, slower than multiple lazy or explicit loading queries. (Here is an example where a query's performance has been improved by a factor of 50 by changing it from a single query with Includes to more than 1000 queries without Include.) The bottom line is: you cannot reliably predict the best loading strategy for a specific situation without measuring the performance (if the performance matters in that situation).
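On the first remark above (a navigation property that is already loaded does not trigger another query), the load-once behaviour can be illustrated with plain Lazy<T>. This is an analogy only, not how EF's proxies are implemented; LazyUser and its hard-coded role names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class: Roles is loaded on first access and cached afterwards.
public class LazyUser
{
    private readonly Lazy<List<string>> _roles;

    // Counts how often the simulated "database query" actually ran.
    public int LoadCount { get; private set; }

    public LazyUser()
    {
        _roles = new Lazy<List<string>>(() =>
        {
            LoadCount++; // stands in for one round trip to the database
            return new List<string> { "Admin", "Editor" };
        });
    }

    // The first access triggers the load; later accesses return the cached list.
    public List<string> Roles
    {
        get { return _roles.Value; }
    }
}
```

LoadCount stays at 0 until Roles is first read and stays at 1 however often it is read afterwards.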

Entity Framework 4 selective lazy loading properties

Is it possible to load an entity excluding some properties? One of this entity's properties is expensive to select. I would like to lazy load this property. Is that possible?
Now that you have read everyone's reply, I will give you the correct answer. EF does not support lazy loading of properties. However, it does support a much more powerful concept than this. It's called table splitting, where you can map a table to two entities. Say a product table in the database can be mapped to a Product entity and a ProductDetail entity. You can then move the expensive fields to the ProductDetail entity and create a 1..1 association between the Product and ProductDetail entities. You can then lazy load the ProductDetail association only when you need it.
In the performance chapter of my book, I have a recipe called:
13-9. Moving an Expensive Property to Another Entity
Hope that helps!
Julie Lerman has an article on how to split a table
With a scalar property, the only way to selectively not load a certain property is to project in ESQL or L2E:
var q = from p in Context.People
select new
{
Id = p.Id,
Name = p.Name // note no Biography
};
+1 to Dan; doing this lazily is worse than loading it up-front. If you want to control loading, be explicit.
stimms is correct, but be careful when using lazy loading. You may have performance issues and not realize the property is getting loaded at a specific location in your code. This is because it loads the data when you use the property.
I prefer to use explicit loading. This way you know when they get loaded and where. Here's a link that gives an example for the LoadProperty http://sankarsan.wordpress.com/2010/05/09/ado-net-entity-framework-data-loading-part-2/
You can also use Eager Loading by using the Include method. Example here: http://wildermuth.com/2008/12/28/Caution_when_Eager_Loading_in_the_Entity_Framework
Consider a query over an Entity Framework DbSet whose entity contains a BigProperty and a SmallProperty,
where you only want to access the SmallProperty without loading the BigProperty into memory:
//this query loads the entire entity returned by FirstOrDefault() in memory
//the execution is deferred during Where; the execution happens at FirstOrDefault
db.BigEntities.Where(filter).FirstOrDefault()?.SmallProperty;
//this query only loads the SmallProperty in memory
//the execution is still deferred during Select; the execution happens at FirstOrDefault
//a subset of properties can be selected from the entity, and only those will be loaded in memory
db.BigEntities.Where(filter).Select(e=>e.SmallProperty).FirstOrDefault();
Therefore you could exploit this behaviour to only query the BigProperty where you actually need it, and use select statements to explicitly filter it out everywhere else.
I tested this with the Memory Usage functionality from the Visual Studio debug Diagnostic Tools.
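The deferred-execution part of the comments above can be checked without a database. The sketch below uses LINQ to Objects, so it only shows when the filter runs; it cannot show the column selection that EF performs when translating the Select to SQL. BigEntity and DeferredDemo are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity mirroring the BigProperty/SmallProperty example.
public class BigEntity
{
    public int Id { get; set; }
    public string SmallProperty { get; set; }
    public byte[] BigProperty { get; set; }
}

public static class DeferredDemo
{
    // Returns { filter calls after building the query, filter calls after executing it }.
    public static int[] CountFilterCalls()
    {
        var entities = new List<BigEntity>
        {
            new BigEntity { Id = 1, SmallProperty = "a", BigProperty = new byte[1024] },
            new BigEntity { Id = 2, SmallProperty = "b", BigProperty = new byte[1024] },
            new BigEntity { Id = 3, SmallProperty = "c", BigProperty = new byte[1024] }
        };

        int filterCalls = 0;

        // Building the query runs nothing: Where and Select are both deferred.
        var query = entities
            .Where(e => { filterCalls++; return e.Id >= 2; })
            .Select(e => e.SmallProperty);
        int afterBuild = filterCalls;

        // FirstOrDefault executes the query and stops at the first match (Id 2).
        string first = query.FirstOrDefault();

        return new[] { afterBuild, filterCalls };
    }
}
```

The filter has not run at all after the query is built, and it runs only twice during execution because FirstOrDefault stops at the first matching element.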