I have a very deep relational tree in my model design, that is, the root entity contains a collection of entities that contains more collections of other entities that contains more collections and on an on ... I develop a business layer that other developers have to use to perform operations, including get/save data.
Then, I am thinking about what is the best strategy to cope with this situation. I cannot allow that when retrieving a entity, EF resolves all the dependency tree, since it will end in a lot of useless JOIN (useless because maybe I do not need that data in the next level).
If I disable lazy loading and enforce eager loading for what is needed, it works as expected, but if other developer calls child.Parent.Id instead of child.ParentId trying to do something new (like a new requirement or feature not considered at the beggining), it will get a NullReferenceException if that dependency was not included, which is bad... but it will be a "fast error", and it could be fixed straight away.
If I enable lazy loading, accessing child.Parent.Id instead of child.ParentId will end in a standalone query to the DB each time it is accessed. It won't fail, but it is worse because there is no error, only a decrement in the performance, and all the code should be reviewed.
I am not happy with any of these two solutions.
I am not happy having entities that contains null or empty collections, when in reality, it is not true.
I am not happy with letting EF perform arbitrary queries to the DB at any moment. I would like to get all the information in one shoot if possible.
So, I come up with several possible solutions that involve disabling lazy loading and enforcing eager loading, but not sure which is better:
I can create a EntityBase class, that contains the data in the table without the collections, so they cannot be accessed. And concrete implementations that contains the relationships, the problem is that you do not have much flexibility since C# does not allow multi-inheritance.
I can create interfaces that "mask" the objects hidding the properties that are not available at that method call. For example, if I have a User.Roles property, in order to show a grid will all users, I do not need to resolve the .Roles property, so I could create an interface 'IUserData' that does not contain such property.
But I do not if this additional work is worth, maybe a fast NullReferenceException indicating "This property has not been loaded" would be enough.
Would it be possible to throw a specific exception type if the property is virtual and it has not been overridden/set ?
What method do you use?
Thanks.
In my opinion you are trying to protect the developers from the need to understand what they are doing when they access data and what performance implications it can have - which might result in an unnecessary convoluted API with a lot of helper classes, base classes, interfaces, etc.
If a developer uses user.MiddleName.Trim() and MiddleName is null he gets a NullReferenceException and did something wrong, either didn't check for null or didn't make sure that the MiddleName is set to a value. The same when he accesses user.Roles and gets a NullReferenceException: He didn't check for null or didn't call the appropriate method of your API that loads the Roles of the user.
I would say: Explain how navigation properties work and that they have to be requested explicitly and let the application crash if a developer doesn't follow the rules. He needs to understand the mistake and fix it.
As a help you could make loading related data explicit somehow in the API, for example with methods like:
public User GetUser(int userId);
public User GetUserWithRoles(int userId);
Or:
public User GetUser(int userId, params Expression<Func<User,object>>[] includes);
which could be called with:
var userWithoutRoles = layer.GetUser(1);
var userWithRoles = layer.GetUser(2, u => u.Roles);
You could also leverage explicit loading instead of lazy loading to force the developers to call a method when they want to load a navigation property and not just access the property.
Two additional remarks:
...lazy loading ... will end in a standalone query to the DB each time
it is accessed.
"...and not yet loaded" to complete this. If the navigation property has already been loaded within the same context, accessing the property again won't trigger a query to the database.
I would like to get all the information in one shoot if possible.
Multiple queries do not necessarily result in worse performance than one query with a lot of Includes. In fact complex eager loading can lead to data multiplication on the wire and make entity materialization very time consuming and slower than multiple lazy or explicit loading queries. (Here is an example where a query's performance has been improved by a factor of 50 by changing it from a single query with Includes to more than 1000 queries without Include.) Quintessence is: You cannot reliably predict what's the best loading strategy in a specific situation without measuring the performance (if the performance matters in that situation).
Related
I am using EF6
I have a generic table which holds data for different types of class objects using the "Table Per Hierarchy" Approach. In addition these class objects use complex types for defining types for their properties.
So using a made up example,
Table = Person
"Mike the Teacher" is a "Teacher" instance of Person with a personType of "Teacher"
The "Teacher" instance has 2 properties, complextypePersonalDetails and complextypeAddress.
complextypePersonalDetails contains
First Name, Surname and Age.
complextypeAddress contains
HouseName, Street, Town, City, County.
I admit that this design may be over the top, and the problem may be of my making, but that aside I wanted to check to see whether I could do anymore with EF6 before I rewrite it.
I am performance profiling the code with JetBrains DotTrace.
On first call, say on
personTeacher = db.person.OfType().First()
I get a massive delay of around 150,000ms
around:
SerializedGeneratedViewOfType (150,000ms)
TryGenerateQueryViewOfType
GenerateTypeSpecificQueryView
GenerateQueryViewForSingleExtent
GenerateQueryViewForExtentAndType
GenerateViewsForExtentAndType
GenerateViewComponents
EnsureExtentIsFullyMapped (90,000ms)
GenerateCaseStatements (60,000ms)
I have created a pregenerated View using the "InteractivePreGeneratedViews" nuget package which creates the SQL. However even with this I still need to incur my first hit. Also this hit seems to happen every time the Webserver/Website/AppPool is restarted.
I am not totally sure of the EF process, but I guess there is some further form of runtime compilation or caching which happens when web app starts. Where could this be happening and is there a proactive method that I could use to pregenerate/precompile/precache this problem away.
In the medium term, we will rewrite this code in Dapper or EF.Core. So for now, any thoughts on what can be done?
Thanks.
I had commented on this before, but retracted it, but just agreeing with "this design may be over the top, and the problem may be of my making", but I thought I'd see if anyone else jumped in.
The initial spin-up cost is due to EF needing to resolve the mapping for your schema. This happens once, the first time a DBSet on the context is accessed. You can mitigate this by executing a query on your application start, I.e.
void Application_Start(object sender, EventArgs e)
{
// Initialization stuff...
using (var context = new MyContext())
{
var result = context.MyTable.Any(); // Spin up will happen here, not when the first user attempts to access a query.
}
}
You actually need to run a query for the the DbContext to resolve the mapping, just new-ing one up won't do it.
For larger, or more complex schemas you can also look to utilize bounded contexts where each context maps a particular set of relationships for a specific area of the application. The less complex/comprehensive a context is, the faster it initializes.
As far as the design goes, TPH is for representing inheritance, which is where you need to establish an "is-a" relation between like entities. Relational models, and ORMs by definition can support this, but they're geared more towards "has-a" relationships. Rather than having a model where you go "is-a person with an address", the relation is best mapped out that a person may "have-an" address. I've worked on a system that was designed by a team of engineers where an entire reporting system with dynamic rules was represented by 6 tables. Honestly, those designs are a nightmare to maintain.
I don't know why OfType() is so slow, but I found a fast and easy workaround by replacing it with a cast; EntityFramework seems to support that just fine and without the performance penalty.
var stopwatch = Stopwatch.StartNew();
using (var db = new MyDbContext())
{
// warn up
Console.WriteLine(db.People.Count());
Console.WriteLine($"{stopwatch.ElapsedMilliseconds} ms");
// fast
Console.WriteLine(db.People.Select(p => p as Teacher).Where(p => p != null).Count());
Console.WriteLine($"{stopwatch.ElapsedMilliseconds} ms");
// slow
Console.WriteLine(db.People.OfType<Teacher>().Count());
Console.WriteLine($"{stopwatch.ElapsedMilliseconds} ms");
}
20
3308 ms
2
3796 ms
2
10026 ms
I have an ASP.NET MVC application utilizing Entity Framework for the data layer.
In one of my methods I retrieve the seasonal availability data for a product, and afterwards, the best tax rate for the product.
public ProductList FetchProductSearchList(ProductSearchCriteria criteria)
{
...
var avail = ProductAvailabilityTemplate.Get(criteria.ProductID);
...
var tr = TaxRate.BestMatchFor(criteria.ProductID, criteria.TaxCode);
...
}
In the data layer for ProductAvailabilityTemplate.Get, I had been optimizing the performance of my LINQ code. In particular, I had set ctx.ObjectContext.ContextOptions.LazyLoadingEnabled = false; to prevent EF from loading some entities (via navigation properties) that I don't need in this scenario.
However, once this change was made I noticed that my TaxRates weren't loading fully, because ctx.ObjectContext.ContextOptions.LazyLoadingEnabled was still false in my Tax data layer code. This meant that an entity linked to TaxRate via a navigation property wasn't being loaded.
To overcome this problem I simply set ctx.ObjectContext.ContextOptions.LazyLoadingEnabled = true; in the Tax data layer method, but I am concerned that an unrelated change could cause a problem like this. It seems that you can't safely disable lazy loading for one feature without potentially affecting the operation of whatever is called afterwards. I am tempted to remove all navigation properties, disable lazy loading, and use good old fashioned joins to load exactly what I need for each data layer call, no more no less.
Would welcome any advice.
It's a trade off:
Lazy Loading
gives you the benefit of not needing to specify the depth of graph during loading
will need to return to the database to retrieve missing results
requires some pollution of the POCO's (e.g. virtual properties with proxies)
requires the DbContext to be longer lived, for the duration of all data accesses.
Eager Loading
requires a lot more thought into the depth of loading during each fetch
will typically generate fewer queries with wider joins to fetch the graph at once
does not require any alteration or ceremony around your entities
Allows much shorter lived connections and DbContexts
FWIW, I've generally done prototype work with Lazy Loading enabled, to get software to a demonstrable state, and once the data access patterns stabilize, then switch off Lazy Loading and move to explicitly Included references. A few Unit Tests checking for null references will also do wonders at this point. I am loathe to deliver a production system with Lazy Loading still enabled, as there is an element of non-determinism (e.g. difficult to fully test), and the need to return to the DB for further data will hurt performance.
Either way, I wouldn't switch off all Navigation and do explicit Joins - you are losing the power of navigability that an ORM provides. When you switch out of Lazy Loading, simply explicitly define the entities to be eager loaded with applicable Includes
I was fond of lazy loading when I started using EF, but after a while I realized that it was affecting performance since it effectively disables joins and pulls all subdata in separate queries even if you need to consume it all at once.
So now I'm rather using Includes to eagerly load the sub-entities that I'm interested in. You could also do this somewhat dynamic, for instance by providing a includeDetails parameter:
public IEnumerable<Customer> LoadCustomersStartingWithName(string name, bool includeDetails)
{
using (var db = new MyContext())
{
var customers = db.Customers;
if (includeDetails)
customers = customers.Include(x => x.Orders).Include(x => x.ContactPersons);
customers = customers.Where(x => x.Name.StartsWith(name));
return customers;
}
}
For the code to work in EF6, you would also need to include
using System.Data.Entity;
at the top of the class
I'm researching data layer underpinnings for a new web-based reporting system and have spent a lot of time evaluating ORM's over the last few days. That said, I've never dealt with "lazy loading" before and am confused at why its the default setting for LINQ queries in the Entity Framework. It seems like it creates a lot of network traffic and unnecessarily tasks the database with additional queries that could otherwise be resolved with joins.
Can someone describe a scenario in which lazy loading would be beneficial?
Some meta:
The new system will be working against a database with hundreds of tables and many terabytes of data in a production environment with over 3,000 concurrent users on the system 24 hours a day. They will be retrieving large datasets continuously. Is it possible that an ORM just isn't the right solution for our needs, especially since the app will be web-based?
When we talk about lazy loading we are talking about Navigation Properties (how we follow foreign keys). What lazy loading will do for us is to populate the entity from a remote table as we attempt to access that entity. For example if we have a model like this
public class TestEntity
{
public int Id{get;set;}
public AnotherEntity RemoteEntity{get;set;}
}
And call the following
var something = WhateverContext.TestEntities.First().RemoteEntity;
We will get 2 database calls, one for WhateverContext.TestEntities.First() and one for loading the remote entity.
I'm a web guy, (and more specifically an MVC guy) and for web stuff I don't think there is ever a good reason for wanting to do this, One database call is always going to be quicker than two if we require the same set of data.
The situation where I think that lazy loading is actually worth considering is when you don't know when you do your first query if you will need the second entity at all. In my opinion this is much more relevant for windows applications where we have a user who is performing actions in real time (rather than stateless MVC where users are requesting whole pages at once). For example I think lazy loading shines when we have a list of data with a details link, then we don't load the details until the user decides they want to see them.
I don't feel this extends to paging, sorting and filtering, IMO there should be one specifically crafted database query per page of data you are displaying, which returns exactly the data set required to display that page.
In terms of your performance question, I feel that EF (or another ORM) can probably meet your needs here but you want to be careful with how you are retrieving large datasets due to the way EF tracks entities. Check out my EF performance tuning cheat sheet, and read up on DetectChanges and AsNoTracking if you do decide to use EF with large queries.
Most ORMs will give you the option, when you're building up your object selections, to say "don't be lazy, go ahead and join", so if you're worried about it from an efficiency perspective, don't be. You can make it work (usually).
There are 2 particular cases I know of where lazy loading helps:
Chaining commands
What if you want to create a basic select, but then you want to run it through a sort and a filter function that's based on user input. You can simply pass the ORM object in, and attach the sort and filtering functionality to it. Instead of evaluating it each time, it only evaluates when it's actually used.
Avoiding huge, deep, highly-relational queries
What if you just need the IDs of some related fields? If it loads lazily, you don't have to worry about it joining a whole bunch of data and tables that you don't need, potentially slowing down the query and overusing bandwidth. Of course, if you DID want everything else, then you'll need to be explicit, or you may run into a problem where it lazily runs a query for each detail record. Like I mentioned at the outset, that's easily overcome in any ORM worth using.
A simple case is a result set of N records which you do not want to bring to the client at once. The benefit is that you are able to lazily load only what is needed for the clients demands, such as sorting, filtering, etc... An example would be a paging view where one could page through records and sort them accordingly, thus the client only needs N amount at a given time.
When you perform the LINQ query it translates that to SQL commands on the server side to provide only what is needed in the given context. It boils down to offloading work to the database and minimizing what you need to send back to the client.
Some will argue that ORM based lazy loading is wrong however that starts to move to semantics fairly quick and should be more about approach to design versus what is right and wrong.
We develop the back office application with quite large Db.
It's not reasonable to load everything from DB to memory so when model's proprties are requested we read from DB (via EF)
But many of our UIs are just simple lists of entities with some (!) properties presented to the user.
For example, we just want to show Id, Title and Name.
And later when user select the item and want to perform some actions the whole object is needed. Now we have list of items stored in memory.
Some properties contain large textst, images or other data.
EF works with entities and reading a bunch of large objects degrades performance notably.
As far as I understand, the problem can be solved by creating lightweight entities and using them in appropriate context.
First.
I'm afraid that each view will make us create new LightweightEntity and we eventually will end with bloated object context.
Second. As the Model wraps EF we need to provide methods for various entities.
Third. ViewModels communicate and pass entities to each other.
So I'm stuck with all these considerations and need good architectural design advice.
Any ideas?
For images an large textst you may consider table splitting, which is commonly used to split a table in a lightweight entity and a "heavy" entity.
But I think what you call lightweight "entities" are data transfer objects (DTO's). These are not supplied by the context (so it won't get bloated) but by projection from entities, which is done in a repository or service.
For projection you can use AutoMapper, especially its newer feature that I describe here. This allows you to reduce the number of methods you need to provide "for various entities" (DTO's), because the type to project to can be given in a generic type parameter.
I am currently in the process of attempting to integrate an Entity Framework application with a legacy database that is about ten years old or so. One of the many problems that this database has (alongside having no relations or constraints whatsoever) is that almost every column is set to null, even though in almost all cases, this wouldn't make sense.
Invariably, I will encounter an exception along these lines:
The 'SortOrder' property on 'MyRecord' could not be set to a 'null' value. You must set this property to a non-null value of type 'Int32'.
I have seen many questions that refer to the exception above, but these all seem to be genuine mistakes where the developer did not write classes that properly represent the data in the database. I would like to deliberately write a class that does not properly represent the data in the database. I am fully aware that this is against the rules of Entity Framework, and that is most likely why I am having so much difficulty doing it.
It is not possible to change the schema at this point as it will break existing applications. It is also not possible to fix the data, because new data will be inserted by old applications. I would like to map the database with Entity Framework as it should be, slowly move all the applications over the next couple of years or so to rely on it for data access before finally being able to move on to the database redesign phase.
One method I have used to get around this is to transparently proxy the variable:
internal int? SortOrderInternal { get; set; }
public int SortOrder
{
get { return this.SortOrderInternal ?? 0; }
set { this.SortOrderInternal = value; }
}
I can then map the field in CodeFirst:
entity.Ignore(model => model.SortOrder);
entity.Property(model => model.SortOrderInternal).HasColumnName("SortOrder");
Using the internal keyword in this method does allow me to nicely encapsulate this nastiness so I can at the very least keep it from leaking outside my data access assembly.
But unfortunately I am now unable to use the proxy field in a query as a NotSupportedException will be thrown:
The specified type member 'SortOrder' is not supported in LINQ to Entities. Only initializers, entity members, and entity navigation properties are supported.
Perhaps it might be possible to transparently rewrite the expression once it is received by the DbSet? I would be interested to hear if this would even work; I'm not skilled enough with expression trees to say. I have so far been unsuccessful in finding a method in DbSet that I could override to manipulate the expression, but I'm not above making a new class that implements IDbSet and passes through to DbSet, horrible though that would be.
Whilst investigating the stack trace, I found a reference to an internal Entity Framework concept called a Shaper, which appears to be the thing that takes the data and inputs it to A quick bit of Googling on this concept doesn't yield anything, but investigating System.Data.Entity.dll with dotPeek indicates that this would certainly be something that would help me... assuming Shaper<T> wasn't internal and sealed. I'm almost certainly barking up the wrong tree here, but I'd be interested to hear if anyone has encountered this before.
That's a fairly tough nut to crack, but you might be able to do it via Microsoft.Linq.Translations.