I have an unclear understanding of eager loading in a relational database.
My example is Children - Teacher, which is a many-to-many relationship. My question is: if we eager load a children entity, it will contain a collection of teachers, and those teachers will contain a list of children. How far will this go, and how do we stop it at a certain level?
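You don't need to stop this cycle, and you can't: EF/EF Core fixes up the navigation properties between the entities it has already loaded, so the children and teachers simply reference each other in memory and nothing further is queried. This is the default behavior of eager loading in EF/EF Core and there is no way to turn it off. What you can stop is the self-referencing loop when serializing a JSON response.
Details are here: Related data and serialization in EF Core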
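As a rough illustration of stopping the loop at serialization time, here is a minimal sketch that does not touch EF at all. It assumes .NET 6 or later and System.Text.Json; the Child and Teacher classes and their property names are hypothetical stand-ins for the entities in the question. The linked article may describe a different serializer setting, so treat this as one possible approach.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Build the same kind of cycle that navigation fix-up produces:
// child -> teacher -> child -> ...
var teacher = new Teacher { Name = "Ms. Lee" };
var child = new Child { Name = "Sam" };
child.Teachers.Add(teacher);
teacher.Children.Add(child);

// Without a reference handler, Serialize would throw on the cycle.
// IgnoreCycles writes null where an object would repeat on the current path.
var options = new JsonSerializerOptions
{
    ReferenceHandler = ReferenceHandler.IgnoreCycles
};

Console.WriteLine(JsonSerializer.Serialize(child, options));

// Hypothetical entity shapes, for illustration only.
public class Child
{
    public string Name { get; set; } = "";
    public List<Teacher> Teachers { get; } = new();
}

public class Teacher
{
    public string Name { get; set; } = "";
    public List<Child> Children { get; } = new();
}
```

The in-memory graph stays cyclic; only the JSON output is cut off where it would start repeating.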
I'm trying to understand how EntityFramework Core manages data internally because it influences how I call DbSets. Particularly, does it refer to in-memory data or re-query the database every time?
Example 1)
If I call _context.ToDo.Where(x => x.id == 123).First() and then in a different procedure call the same command again, will EF give me the in-memory value or re-query the DB?
Example 2)
If I call _context.ToDo.Where(x => x.id == 123).First() and then a few lines later call _context.ToDo.Where(x => x.id == 123).Include(x => x.Children).First(), will it use the in-memory entity and only query the DB for "Children", or does it fetch the entire dataset again?
I guess I'm wondering if it matters if I duplicate a call or not?
Is this affected by the AsNoTracking() switch?
What you're really asking is how caching works in EF Core, not how DbContext manages data.
EF has always offered first-level caching: it keeps the entities it has loaded in memory for as long as the context is alive. That's how it can track changes and save all of them when SaveChanges is called.
It doesn't cache the query itself, so it doesn't know that Where(....).First() is meant to return those specific entities, and the query is sent to the database again. To get the in-memory entity without a round-trip you'd have to use Find() instead. If tracking is disabled, no entities are kept around.
This is explained in Querying and Finding Entities, especially Finding entities using primary keys:
The Find method on DbSet uses the primary key value to attempt to find an entity tracked by the context. If the entity is not found in the context then a query will be sent to the database to find the entity there. Null is returned if the entity is not found in the context or in the database.
Find is different from using a query in two significant ways:
A round-trip to the database will only be made if the entity with the given key is not found in the context.
Find will return entities that are in the Added state. That is, Find will return entities that have been added to the context but have not yet been saved to the database.
In Example #2 the queries are different though. Include forces eager loading, so the results and entities returned are different. There's no need to run the query a second time, though, if the first entity and context are still around. If lazy loading is enabled, you could just iterate over the Children property and EF would load the related entities on first access.
With lazy loading, EF executes a separate query for each entity whose related data it has to load. If you need the related data of many entities, this is slow. Slow enough to have its own name, the N+1 selects problem. To avoid these implicit queries you can load a related collection yourself using explicit loading, e.g.:
_context.Entry(todo).Collection(t=>t.Children).Load();
When you know you're going to use all children though, it's better to eagerly load all entities with Include().
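The behaviour described in this answer can be tried out with a small console program. This is only a sketch under some assumptions: it uses EF Core with the in-memory provider (the Microsoft.EntityFrameworkCore.InMemory package), and the ToDoItem and Child classes, AppDbContext, and the Id property name are hypothetical stand-ins for the asker's model.

```csharp
// Assumes the Microsoft.EntityFrameworkCore.InMemory NuGet package is installed.
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;

var options = new DbContextOptionsBuilder<AppDbContext>()
    .UseInMemoryDatabase("todo-demo")
    .Options;

// Seed with a separate context so the reading context starts with nothing tracked.
using (var seed = new AppDbContext(options))
{
    seed.ToDo.Add(new ToDoItem
    {
        Id = 123,
        Children = { new Child { Id = 1 }, new Child { Id = 2 } }
    });
    seed.SaveChanges();
}

using (var context = new AppDbContext(options))
{
    // Both calls send a query, but the second returns the instance already tracked.
    var first = context.ToDo.Where(x => x.Id == 123).First();
    var again = context.ToDo.Where(x => x.Id == 123).First();
    Console.WriteLine($"Same instance from repeated query: {ReferenceEquals(first, again)}");

    // Find checks the tracked entities first and skips the round-trip on a hit.
    var found = context.ToDo.Find(123);
    Console.WriteLine($"Same instance from Find: {ReferenceEquals(first, found)}");

    // A no-tracking query bypasses the tracked entities and materializes a new object.
    var untracked = context.ToDo.AsNoTracking().First(x => x.Id == 123);
    Console.WriteLine($"Same instance with AsNoTracking: {ReferenceEquals(first, untracked)}");

    // Explicit loading: one query that fills Children on the tracked entity.
    context.Entry(first).Collection(t => t.Children).Load();
    Console.WriteLine($"Children after explicit load: {first.Children.Count}");
}

// Hypothetical model, for illustration only.
public class ToDoItem
{
    public int Id { get; set; }
    public List<Child> Children { get; set; } = new();
}

public class Child
{
    public int Id { get; set; }
}

public class AppDbContext : DbContext
{
    public AppDbContext(DbContextOptions<AppDbContext> options) : base(options) { }
    public DbSet<ToDoItem> ToDo => Set<ToDoItem>();
}
```

The in-memory provider is used only to keep the sketch self-contained; logging the generated SQL against a real relational provider is a better way to confirm which calls actually hit the database.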
Using EF6.
I have a list of entity items that are detached from a dbContext. I'd like to explicitly load several of the entities related to them, sometimes through 2 or 3 levels of navigation.
I'd also like to do all this object graph loading in a single DB call if possible.
If it's not possible with explicit loading techniques, I will just re-query the database for those specific item Ids and use eager loading at that point (since there is no other state I need to worry about at this point).
How can we tell Entity Framework about Aggregates?
when saving an aggregate, save entities within the aggregate
when deleting an aggregate, delete entities within the aggregate
raise a concurrency error when two different users attempt to modify two different entities within the same aggregate
when loading an aggregate, provide a consistent point-in-time view of the aggregate even if there is some time delay before we access all entities within the aggregate
(Entity Framework 4.3.1 Code First)
EF provides features that allow you to define your aggregates and use them:
Saving an aggregate: This is the most painful part. EF works with entity graphs. If you have an entity like Invoice and this entity has a collection of related InvoiceLine entities, you can treat it as an aggregate. In the attached scenario everything works as expected, but in the detached scenario (either the aggregate was not loaded by EF or it was loaded by a different context instance) you must attach the aggregate to the context instance and tell it exactly what you changed, that is, set the state of every entity and independent association in the object graph.
Deleting an aggregate: This is handled by cascade delete. If you have the related entities loaded, EF will delete them, but if you don't, you must have cascade delete configured on the relation in the database.
Concurrency: This is handled by concurrency tokens in the database, most commonly either timestamp or rowversion columns.
Loading an aggregate: You must either use eager loading and load all data together at the beginning (= a consistent point-in-time view), or use lazy loading, in which case you will not have a consistent view, because lazy loading loads the current state of the relations but does not update the other parts of the aggregate you have already loaded (and I consider it a performance killer if you try to implement such refreshing with EF).
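To make the deleting, concurrency, and loading points a little more concrete, here is a hedged Code First sketch. The Invoice and InvoiceLine names come from the answer above; the property names (Lines, InvoiceId, RowVersion), the BillingContext class, and the LoadInvoice helper are assumptions made for illustration, and the mapping calls are the DbContext-era fluent API as I understand it rather than anything the answer itself prescribes.

```csharp
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;
using System.Data.Entity;
using System.Linq;

public class Invoice
{
    public int Id { get; set; }

    // Concurrency token: mapped to a rowversion/timestamp column.
    [Timestamp]
    public byte[] RowVersion { get; set; }

    public virtual ICollection<InvoiceLine> Lines { get; set; }
}

public class InvoiceLine
{
    public int Id { get; set; }
    public int InvoiceId { get; set; }
    public decimal Amount { get; set; }
    public virtual Invoice Invoice { get; set; }
}

public class BillingContext : DbContext
{
    public DbSet<Invoice> Invoices { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        // Cascade delete: removing an Invoice also removes its lines,
        // even when the lines were never loaded into the context.
        modelBuilder.Entity<Invoice>()
            .HasMany(i => i.Lines)
            .WithRequired(l => l.Invoice)
            .HasForeignKey(l => l.InvoiceId)
            .WillCascadeOnDelete(true);
    }

    // Hypothetical helper: eager-load the whole aggregate in one query
    // so that root and lines come from the same point in time.
    public Invoice LoadInvoice(int id)
    {
        return Invoices.Include(i => i.Lines).Single(i => i.Id == id);
    }
}
```

Note that a token on Invoice alone only detects conflicting changes to the Invoice row itself; whether that is enough for the "two users, two different entities" requirement depends on how updates to the lines are routed through the root.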
I wrote GraphDiff specifically for this purpose. It allows you to define an 'aggregate boundary' on update by providing a fluent mapping. I have used it in cases where I needed to pass detached entity graphs back and forth.
For example:
// Update method of repository
public void Update(Order order)
{
    // Update the order and merge its OrderItems as part of the same aggregate
    context.UpdateGraph(order, map => map
        .OwnedCollection(p => p.OrderItems));
}
The above would tell the Entity Framework to update the order entity and also merge the collection of OrderItems. Mapping in this fashion allows us to ensure that the Entity Framework only manages the graph within the bounds that we define on the aggregate and ignores all other properties. It supports optimistic concurrency checking of all entities. It handles much more complicated scenarios and can also handle updating references in many to many scenarios (via AssociatedCollections).
Hope this can be of use.
When I use Entity Framework, I want to query a record in one context and add it to another context with the same schema. After querying the record I detach it from the context, but its related entities are all gone. Is there any way to solve this?
Thanks in advance!
This is "by design". EF can detach entities only one by one, but at the same time EF doesn't support object graphs composed of both attached and detached entities. Because of that, detaching an entity breaks all of its relations to the rest of the attached object graph. Detaching a whole object graph is currently not supported, but you can vote for this feature on Data UserVoice.
As a workaround you can turn off lazy loading on your context and use eager loading, as described by @CodeWarrior, to load exactly the data you need to pass to the other context. Once you have the data loaded, serialize it to a stream and immediately deserialize it into a new instance of the object graph. This is a way to make a deep clone of the entity graph that is detached but has all relations intact (turning lazy loading off is needed, otherwise serialization will load all other navigation properties as well, which can result in a much bigger object graph than expected). The only requirement is that your entities must be serializable by the serializer of your choice (be aware of circular references, which usually require some special handling or additional attributes on your entities).
Are you asking how to load the child entities? If so, you can do eager loading with the .Include method. Given a Person class and a PhoneNumber class where Person has a collection of PhoneNumber, you could do the following:
List<Person> People = db.People
    .Include("PhoneNumbers")
    .Where(p => p.Name == "Henry")
    .ToList();
Or you can do what is called explicit loading, where you load your entities and then call the .Load method on the collections of child and related entities that you want to load. Generally you do this when you do not have LazyLoading enabled (LazyLoading is enabled by default in 4.0+; I don't recall for previous versions).
Regardless of how you query and load them, you will have to detach entities that you want to attach to a different context.
Here is a link to a pretty good MSDN article on loading entities.
I've been trying to familiarize myself with the Entity Framework. Most of it seems straightforward, but I'm a bit confused about the difference between eager loading with the Include method and default lazy loading. Both seem to load related entities, so on the surface it looks like they do the same thing. What am I missing?
Let's say you have two entities with a one-to-many relationship: Customer and Order, where each Customer can have multiple Orders.
When loading up a Customer entity, Entity Framework allows you to either eager load or lazy load the Customer's Orders collection. If you choose to eager load the Orders collection, when you retrieve a Customer out of the database Entity Framework will generate SQL that retrieves both the Customer's information and the Customer's Orders in one query. However, if you choose to lazy load the Orders collection, when you retrieve a Customer out of the database Entity Framework will generate SQL that only pulls the Customer's information (Entity Framework will then generate a separate SQL statement if you access the Customer's Orders collection later in your code).
Determining when to use eager loading and when to use lazy loading all comes down to what you expect to do with the entities you retrieve. If you know you only need a Customer's information, then you should lazy-load the Orders collection (so that the SQL query can be efficient by only retrieving the Customer's information). Conversely, if you know you'll need to traverse through a Customer's Orders, then you should eager-load the Orders (so you'll save yourself an extra database hit once you access the Customer's Orders in your code).
P.S. Be very careful when using lazy-loading as it can lead to the N+1 problem. For example, let's say you have a page that displays a list of Customers and their Orders. However, you decide to use lazy-loading when fetching the Orders. When you iterate over the Customers collection, then over each Customer's Orders, you'll perform a database hit for each Customer to lazy-load in their Orders collection. This means that for N customers, you'll have N+1 database hits (1 database hit to load up all the Customers, then N database hits to load up each of their Orders) instead of just 1 database hit had you used eager loading (which would have retrieved all Customers and their Orders in one query).
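The N+1 arithmetic above can be seen without any ORM. The following sketch uses a hypothetical FakeDatabase class that merely counts simulated round trips; it is not EF code and makes no claim about the SQL that EF generates. It only shows where the N extra hits come from.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var db = new FakeDatabase();

// Lazy-loading style: one hit for the customers, then one hit per customer.
var customers = db.GetCustomers();
foreach (var customer in customers)
{
    var orders = db.GetOrdersFor(customer.Id);
}
Console.WriteLine($"Lazy style round trips:  {db.RoundTrips}");   // 4 for 3 customers

// Eager-loading style: customers and their orders come back together.
db.RoundTrips = 0;
var customersWithOrders = db.GetCustomersWithOrders();
Console.WriteLine($"Eager style round trips: {db.RoundTrips}");   // 1

public record Customer(int Id, string Name);
public record Order(int Id, int CustomerId);

// Hypothetical stand-in for a database; every public method is one round trip.
public class FakeDatabase
{
    private readonly List<Customer> _customers = new()
    {
        new Customer(1, "Ann"), new Customer(2, "Bob"), new Customer(3, "Cy")
    };

    private readonly List<Order> _orders = new()
    {
        new Order(10, 1), new Order(11, 1), new Order(12, 2), new Order(13, 3)
    };

    public int RoundTrips { get; set; }

    public List<Customer> GetCustomers()
    {
        RoundTrips++;
        return _customers.ToList();
    }

    public List<Order> GetOrdersFor(int customerId)
    {
        RoundTrips++;
        return _orders.Where(o => o.CustomerId == customerId).ToList();
    }

    public List<(Customer Customer, List<Order> Orders)> GetCustomersWithOrders()
    {
        RoundTrips++;
        return _customers
            .Select(c => (Customer: c,
                          Orders: _orders.Where(o => o.CustomerId == c.Id).ToList()))
            .ToList();
    }
}
```

With 3 customers the lazy style costs 3 + 1 = 4 round trips, and the count grows linearly with the number of customers, while the eager style stays at 1.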
If you come from the SQL world, think about JOIN.
If you have to show 10 orders in a grid, along with the customer who placed each order, you have 2 choices:
1) LAZY LOAD ( = 11 queries = SLOW PERFORMANCE)
EF will fire one query to retrieve the orders and one query per order to retrieve the customer data.
Select * from orders where order=1
+
10 x (Select * from customers where id = (order.customerId))
2) EAGER LOAD ( = 1 query = HIGH PERFORMANCE)
EF will fire a single query to retrieve the orders and customers with a JOIN.
Select * from orders INNER JOIN customers on orders.customerId=customers.Id where order=1
PS:
When you retrieve an object from the db, the object is stored in a cache while the context is active.
In the LAZY LOAD example above, if all 10 orders relate to the same customer you will see only 2 queries, because when you ask EF to retrieve an object it first checks whether the object is in the cache, and if it finds it there it does not fire another SQL query at the DB.
Eager loading is intended to solve the N+1 Selects problem endemic to ORMs. The short version is this: If you are going to directly retrieve some number of entities and you know you will be accessing certain related entities via the retrieved entities, it is much more efficient to retrieve all the related entities up-front in one pass, as compared to retrieving them incrementally via lazy loading.
An important issue is serialization. Microsoft recommends NOT using the default lazy loading if you're dealing with serialized objects. Serialization causes ALL related properties to be called, which can start a chain reaction of related entities being queried. This really comes into play if you're returning JSON data from a controller, since JSON data is obviously serialized. You'd want to either return the data immediately via eager loading, or turn lazy loading off in the context and use explicit loading.