EF: how to eager load associations correctly - entity-framework

I'm using EF for Data Access in my App. Data Model (greatly simplified):
class Project
{
virtual ICollection<Foo> Foos {get;set;}
virtual ICollection<Bar> Bars {get;set;}
// actually I have many more kinds of Project's data
}
class Foo
{
virtual Project Project {get;set;}
virtual ICollection<Bar> LinkedBars {get;set;}
//And they are greatly intervened with each other
}
//etc etc.
Point is:
All Foo, Bar etc etc have references to each other
This references never leaves Project, i.e. following is always right:
foo.Bars.All(bar => bar.Project == foo.Project) == true
I load projects as follows:
var project = Db.Set<Project>
.Include(p => p.Foos)
.Include(p => p.Bars);
So, after that, if I accessing some foo.Bars I trigger lazy load. That's expected, after all, I have eagerly loaded all required data from Project, Foo and Bar, but not Foo_Bar link table.
Let's modify:
var project = Db.Set<Project>
.Include(p => p.Foos.Select(f => f.Bars))
.Include(p => p.Bars);
Now I have all required data in memory, but (!) generated SQL is clearly not optimal. Actually, EF is loading Project, Foo, Foo_Bar tables and scans Bar twice — once for p.Bars and once for foo.Bars. So, I can't eager load foo.Bar_Ids without eager loading foo.Bars, can I?
What should I do to improve?

AFAIK only way to obtain this is to declare Foo_Bars as explicit link table (not autogenerated by EF6), and Include it

Related

Do I need a foreign key in the EF core code first?

I'm new at code first in entity framework and reading up on relationships, I see everyone does it differently. It might be because of earlier versions, might be the same or might be because of performance.
Let's say I have two tables Company and User.
I would set the company-to-user relationship like this:
public List<User> Users { get; set; } = new List<User>();
Then if I from the user-to-company perspective needed to find the company, I would have this in the User:
public Company Company { get; set; }
And do this query:
return await _clientContext.Users.Where(x => x.Company.Id == companyId).ToListAsync();
Or I could have this:
public int CompanyId { get; set; }
[ForeignKey("CompanyId")]
public Company Company { get; set; }
And have this query:
return await _clientContext.Users.Where(x => x.CompanyId == companyId).ToListAsync();
Also some define Company with the keyword virtual like this:
public virtual Company Company { get; set; }
I'm not sure if every scenario is the same and doing x.CompanyId instead of x.Company.Id would actually be the same. What is used normally?
I generally recommend the first option using Shadow Properties for FKs over the second when using navigation properties. The main reason is that with the second approach there are two sources of truth. For instance with a Team referencing a Coach, some code may use team.CoachId while other code uses team.Coach.CoachId. These two values are not guaranteed to always be in sync. (depending on when you happen to check them when one or the other is updated.)
Updating references between entities via a FK property can have varied behaviour depending on whether the referenced entity is loaded or not.
What is the expected difference between if want to update a team's coach:
var teamA = context.Teams.Single(x => x.TeamId == teamId);
If Team has a Coach navigation property and a CoachId FK reference I could do...
teamA.CoachId = newCoachId;
If TeamA's old coach ID was 1, and the newCoachId = 2, what do you think happens if I have code that lazy loads the coach before SaveChanges?
var coachName = teamA.Coach.Name;
You might expect that since the Coach hadn't been loaded yet it would load in Coach #2's name, but it loads Coach #1 because the change hasn't been committed even though teamA.CoachId == 2. If you check the Coach reference after SaveChanges you get Coach #2.
Depending on whether lazy loading is enabled or not you can get a bit strange behaviour by setting a FK property where navigation properties are nulled. Even when eager loading, changing a FK property will potentially trigger a new lazy load if that new entity isn't already tracked:
var teamA = context.Teams.Include(x => x.Coach).Single(x => x.TeamId == teamId);
teamA.CoachId == newCoachId;
var coachName = teamA.Coach.Name; // Still points to Coach #1's name as expected.
context.SaveChanges();
coachName = teamA.Coach.Name; // Triggers lazy load and return new coach's name.
Saving a FK against an entity that has eager loaded the reference does not automatically re-populate referenced entities. So for instance if you have lazy loading disabled, the same above code:
context.SaveChanges();
coachName = teamA.Coach.Name; // Potential NullReferenceException on teamA.Coach.
This will potentially trigger a null reference exception unless the new coach happens to be tracked by the DbContext prior to SaveChanges being called. If the DbContext is tracking the entity, the new reference will be swapped in on SaveChanges, otherwise it is nulled. (With lazy loading this is covered by the new lazy load call after it was nulled)
When working with navigation properties my default recommendation is to hide FK properties as Shadow Properties. (For EF6 this means using .Map(x => x.MapKey()). For relationships where I only care about the ID, I will expose the FK with no navigation property. So, one or the other. (Such as lookups or bounded contexts where I want raw speed.)
I will deviate sparingly from this for exposing FKs for relationships I may inspect by ID frequently, and treat it as read-only, but still have infrequent need of the navigation property. An example of this would be CreatedBy / CreatedByUserId. Many queries may inspect the CreatedByUserId for data filtering, while some projections may want the CreatedBy.Name etc. A record's CreatedBy doesn't change so I avoid potential pitfalls of the data getting out of sync.
Your second scenario is used normally.
i.e.
public int CompanyId { get; set; }
[ForeignKey("CompanyId")]
public Company Company { get; set; }
And have this query:
return await _clientContext.Users.Where(x => x.CompanyId == companyId).ToListAsync();

When to use Include in EF? Not needed in projection?

I have the following in Entity Framework Core:
public class Book {
public Int32 Id { get; set; }
public String Title { get; set; }
public virtual Theme Theme { get; set; }
}
public class Theme {
public Int32 Id { get; set; }
public String Name { get; set; }
public Byte[] Illustration { get; set; }
public virtual ICollection<Ebook> Ebooks { get; set; }
}
And I have the following linq query:
List<BookModel> books = await context.Books.Select(x =>
new BookModel {
Id = x.Id,
Name = x.Name,
Theme = new ThemeModel {
Id = x.Theme.Id,
Name = x.Theme.Name
}
}).ToListAsync();
I didn't need to include the Theme to make this work, e.g:
List<BookModel> books = await context.Books.Include(x => x.Theme).Select(x => ...
When will I need to use Include in Entity Framework?
UPDATE
I added a column of type Byte[] Illustration in Theme. In my projection I am not including that column so will it be loaded if I use Include? Or is never loaded unless I have it in the projection?
In search for an official answer to your question from Microsoft's side, I found this quote from Diego Vega (part of the Entity Framework and .NET team) made at the aspnet/Announcements github
repository:
A very common issue we see when looking at user LINQ queries is the use of Include() where it is unnecessary and cannot be honored. The typical pattern usually looks something like this:
var pids = context.Orders
.Include(o => o.Product)
.Where(o => o.Product.Name == "Baked Beans")
.Select(o =>o.ProductId)
.ToList();
One might assume that the Include operation here is required because of the reference to the Product navigation property in the Where and Select operations. However, in EF Core, these two things are orthogonal: Include controls which navigation properties are loaded in entities returned in the final results, and our LINQ translator can directly translate expressions involving navigation properties.
You didn't need Include because you were working inside EF context. When you reference Theme inside the anonymous object you are creating, that's not using lazy loading, that's telling EF to do a join.
If you return a list of books and you don't include the themes, then when you try to get the theme you'll notice that it's null. If the EF connection is open and you have lazy loading, it will go to the DB and grab it for you. But, if the connection is not opened, then you have to get it explicitely.
On the other hand, if you use Include, you get the data right away. Under the hood it's gonna do a JOIN to the necessary table and get the data right there.
You can check the SQL query that EF is generating for you and that's gonna make things clearer for you. You'll see only one SQL query.
If you Include a child, it is loaded as part of the original query, which makes it larger.
If you don't Include or reference the child in some other way in the query, the initial resultset is smaller, but each child you later reference will lazy load through a new request to the database.
If you loop through 1000 users in one request and then ask for their 10 photos each, you will make 1001 database requests if you don't Include the child...
Also, lazy loading requires the context hasn't been disposed. Always an unpleasant surprise when you pass an Entity to a view for UI rendering for example.
update
Try this for example and see it fail:
var book = await context.Books.First();
var theme = book.Theme;
Then try this:
var book = await context.Books.Include(b => b.Theme).First();
var theme = book.Theme;

EF Filtering a Child Table with Lazy Load

I'm using entity framework with POCOs and the repository pattern and am wondering if there is any way to filter a child list lazy load. Example:
class Person
{
public virtual Organisation organisation {set; get;}
}
class Organisation
{
public virtual ICollection<Product> products {set; get;}
}
class Product
{
public bool active {set; get;}
}
Currently I only have a person repository because I'm always starting from that point, so ideally I would like to do the following:
Person person = personRepo.GetById(Id);
var products = person.organisation.products;
And have it only load products where active = true from the database.
Is this possible and if so how?
EDIT My best guess would be either a filter can be added to the configuration of the entity. Or there might be a way to intercept/override the lazy load call and modify it. Obviously if I created an Organisation Repository I could manually load it as I please but I am trying to avoid that.
There's not a direct way to do this via lazy loading, but if you were willing to explicitly load the collection, you could follow whats in this blog, see the Applying filters when explicitly loading related entities section.
context.Entry(person)
.Collection(p => p.organisation.products)
.Query()
.Where(u => u.IsActive)
.Load();
You can do what Mark Oreta and luksan suggest while keeping all the query logic within the repository.
All you have to do is pass a Lazy<ICollection<Product>> into the organization constructor, and use the logic they provided. It will not evaluate until you access the value property of the lazy instance.
UPDATE
/*
First, here are your changes to the Organisation class:
Add a constructor dependency on the delegate to load the products to your
organization class. You will create this object in the repository method
and assign it to the Person.Organization property
*/
public class Organisation
{
private readonly Lazy<ICollection<Product>> lazyProducts;
public Organisation(Func<ICollection<Product>> loadProducts){
this.lazyProducts = new Lazy<ICollection<Product>>(loadProducts);
}
// The underlying lazy field will not invoke the load delegate until this property is accessed
public virtual ICollection<Product> Products { get { return this.lazyProducts.Value; } }
}
Now, in your repository method, when you construct the Person object you will assign the Organisation property with an Organisation object containing the lazy loading field.
So, without seeing your whole model, it will looks something like
public Person GetById(int id){
var person = context.People.Single(p => p.Id == id);
/* Now, I'm not sure about the cardinality of the person-organization or organisation
product relationships, but let's assume you have some way to access the PK of the
organization record from the Person and that the Product has a reference to
its Organisation. I may be misinterpreting your model, but hopefully you
will get the idea
*/
var organisationId = /* insert the aforementioned magic here */
Func<ICollection<Product>> loadProducts = () => context.Products.Where(product => product.IsActive && product.OrganisationId == organisationId).ToList();
person.Organisation = new Organisation( loadProducts );
return person;
}
By using this approach, the query for the products will not be loaded until you access the Products property on the Organisationinstance, and you can keep all your logic in the repository. There's a good chance that I made incorrect assumptions about your model (as the sample code is quite incomplete), but I think there is enough here for you to see how to use the pattern. Let me know if any of this is unclear.
This might be related:
Using CreateSourceQuery in CTP4 Code First
If you were to redefine your properties as ICollection<T> rather than IList<T> and enable change-tracking proxies, then you might be able to cast them to EntityCollection<T> and then call CreateSourceQuery() which would allow you to execute LINQ to Entities queries against them.
Example:
var productsCollection = (EntityCollection<Product>)person.organisation.products;
var productsQuery = productsCollection.CreateSourceQuery();
var activeProducts = products.Where(p => p.Active);
Is your repository using something like:
IQueryable<T> Find(System.Linq.Expressions.Expression<Func<T, bool>> expression)
If so you can do something like this:
var person = personRepo.Find(p => p.organisation.products.Any(e => e.active)).FirstOrDefault();
You could possibly use Query() method to achieve this. Something like:
context.Entry(person)
.Collection(p => p.organisation.products)
.Query()
.Where(pro=> pro.Active==true)
.Load();
Have a look at this page click here

EF Code First forced eager loading

I'm using EF 5 with Code First. I have a class that I want to always eager load some properties. I removed the virtual keyword but it's not eager loading:
public class Person
{
public ICollection<Email> Emails { get; set; }
public Profile Profile {get;set;}
}
So by turning off lazy loading, it won't auto eager load right? If so, how do I archive that without using Include()?
Thanks!
No, turning off lazy loading by removing the virtual keyword will not automatically enable eager loading. You have to Include the related Entity or Collection like so:
var personWithProfile = ctx.People.Include(x => x.Profile).First();
var personWithProfileAndEmails = ctx.People.
.Include(x => x.Profile)
.Include(x => x.Emails)
.First();
This is a great read from the ADO.NET team blog: http://blogs.msdn.com/b/adonet/archive/2011/01/31/using-dbcontext-in-ef-feature-ctp5-part-6-loading-related-entities.aspx

Entity Framework doesn't query derived classes - Error in DbOfTypeExpression

I have a base class and two derived classes.
Each of the derived classes implements the same type as a property - the only difference is the property name.
Sadly I don't have much influence on the class design -> they have been generated from a wsdl file.
I then have a property on the BaseType to encapsulate the common property. The plan was to use this property in my web views etc.
I have used the famous "Fruit-Example" to demonstrate the problem:
public class FruitBase
{
public virtual int ID { get; set; }
//
// The plan is to use this property in mvc view
//
[NotMapped]
public virtual FruitnessFactor Fruitness
{
get
{
if (this.GetType().BaseType == typeof(Apple))
return ((Apple)this).AppleFruitness;
else if (this.GetType().BaseType == typeof(Orange))
return ((Orange)this).OrangeFruitness;
else
return null;
}
}
}
public class FruitnessFactor { }
In my MVC controller, the following query works absolutely fine:
return View(context.FruitEntities
.OfType<Apple>().Include(a =>a.AppleFruitness)
.ToList());
But this one doesn't:
return View(context.FruitEntities
.OfType<Apple>().Include(a =>a.AppleFruitness)
.OfType<Orange>().Include(o => o.OrangeFruitness)
.ToList());
The error message I get is:
DbOfTypeExpression requires an expression argument with a polymorphic result type that is compatible with the type argument.
I am using EF 5.0 RC and the Code First approach.
Any help is much appreciated!
As far as I can tell you can't apply Include on multiple subtypes in a single database query. You can query one type (OfType<Apple>().Include(a => a.AppelFruitness)) and the same for another subtype. The problem is that you can't concat the results in the same query because the result collections have different generic types (apples and oranges).
One option would be to run two queries and copy the result collection into a new collection of the base type - as you already indicated in the comment section under your question.
The other option (which would only need a single query) is a projection. You would have to define a projection type (you could also project into an anonymous type)...
public class FruitViewModel
{
public FruitBase Fruit { get; set; }
public FruitnessFactor Factor { get; set; }
}
...and then can use the query:
List<FruitViewModel> fruitViewModels = context.FruitEntities
.OfType<Apple>()
.Select(a => new FruitViewModel
{
Fruit = a,
Factor = a.AppleFruitness
})
.Concat(context.FruitEntities
.OfType<Orange>()
.Select(o => new FruitViewModel
{
Fruit = o,
Factor = o.OrangeFruitness
}))
.ToList();
If you don't disable change tracking (by using AsNoTracking) the navigation properties get populated automatically when the entities get attached to the context ("Relationship fixup") which means that you can extract the fruits from the viewModel collection...
IEnumerable<FruitBase> fruits = fruitViewModels.Select(fv => fv.Fruit);
...and you'll get the fruits including the FruitnessFactor properties.
This code is pretty awkward but a direct approach without using a projection has been asked for several times without success:
bottleneck using entity framework inheritance
Entity Framework - Eager loading of subclass related objects
How do I deeply eager load an entity with a reference to an instance of a persistent base type (Entity Framework 4)