Entity Framework - what's the difference between using Include/eager loading and lazy loading? - entity-framework

I've been trying to familiarize myself with the Entity Framework. Most of it seems straightforward, but I'm a bit confused about the difference between eager loading with the Include method and default lazy loading. Both seem to load related entities, so on the surface it looks like they do the same thing. What am I missing?

Let's say you have two entities with a one-to-many relationship: Customer and Order, where each Customer can have multiple Orders.
When loading up a Customer entity, Entity Framework allows you to either eager load or lazy load the Customer's Orders collection. If you choose to eager load the Orders collection, when you retrieve a Customer out of the database Entity Framework will generate SQL that retrieves both the Customer's information and the Customer's Orders in one query. However, if you choose to lazy load the Orders collection, when you retrieve a Customer out of the database Entity Framework will generate SQL that only pulls the Customer's information (Entity Framework will then generate a separate SQL statement if you access the Customer's Orders collection later in your code).
Determining when to use eager loading and when to use lazy loading all comes down to what you expect to do with the entities you retrieve. If you know you only need a Customer's information, then you should lazy-load the Orders collection (so that the SQL query can be efficient by only retrieving the Customer's information). Conversely, if you know you'll need to traverse through a Customer's Orders, then you should eager-load the Orders (so you'll save yourself an extra database hit once you access the Customer's Orders in your code).
P.S. Be very careful when using lazy-loading as it can lead to the N+1 problem. For example, let's say you have a page that displays a list of Customers and their Orders. However, you decide to use lazy-loading when fetching the Orders. When you iterate over the Customers collection, then over each Customer's Orders, you'll perform a database hit for each Customer to lazy-load in their Orders collection. This means that for N customers, you'll have N+1 database hits (1 database hit to load up all the Customers, then N database hits to load up each of their Orders) instead of just 1 database hit had you used eager loading (which would have retrieved all Customers and their Orders in one query).
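To make the N+1 arithmetic concrete, here is a minimal C# sketch that only counts round trips. It does not use Entity Framework at all: `FakeDb`, `Customer` and `NPlusOneDemo` are hypothetical stand-ins, and the lazy collection is simulated with `Lazy<T>`. In real EF code the eager path would be the one written with Include.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity whose Orders are fetched on first access (simulated lazy loading).
public sealed class Customer
{
    private readonly Lazy<List<string>> _orders;

    public Customer(string name, Func<List<string>> loadOrders)
    {
        Name = name;
        _orders = new Lazy<List<string>>(loadOrders);
    }

    public string Name { get; }
    public List<string> Orders => _orders.Value;
}

// Hypothetical in-memory stand-in for a database; it only counts round trips.
public sealed class FakeDb
{
    private readonly Dictionary<int, string> _customers = new Dictionary<int, string>
    {
        { 1, "Ann" }, { 2, "Bob" }, { 3, "Cy" }
    };

    private readonly List<(int CustomerId, string Item)> _orders =
        new List<(int CustomerId, string Item)>
        {
            (1, "book"), (1, "pen"), (2, "lamp"), (3, "desk")
        };

    public int QueryCount { get; private set; }

    // Lazy: one query for the customers; each first Orders access costs another query.
    public List<Customer> LoadCustomersLazy()
    {
        QueryCount++;
        return _customers
            .Select(kv => new Customer(kv.Value, () => QueryOrders(kv.Key)))
            .ToList();
    }

    // Eager: customers and orders arrive together in a single query.
    public List<Customer> LoadCustomersEager()
    {
        QueryCount++;
        return _customers
            .Select(kv =>
            {
                var orders = OrdersOf(kv.Key); // already part of the same result set
                return new Customer(kv.Value, () => orders);
            })
            .ToList();
    }

    private List<string> QueryOrders(int customerId)
    {
        QueryCount++;
        return OrdersOf(customerId);
    }

    private List<string> OrdersOf(int customerId) =>
        _orders.Where(o => o.CustomerId == customerId).Select(o => o.Item).ToList();
}

public static class NPlusOneDemo
{
    // Walks every customer's orders and reports how many round trips that took.
    public static int CountQueries(bool eager)
    {
        var db = new FakeDb();
        var customers = eager ? db.LoadCustomersEager() : db.LoadCustomersLazy();
        customers.Sum(c => c.Orders.Count); // touch every Orders collection
        return db.QueryCount;
    }
}
```

With the three sample customers, `NPlusOneDemo.CountQueries(eager: false)` returns 4 (1 + N) and `NPlusOneDemo.CountQueries(eager: true)` returns 1.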

If you come from the SQL world, think about JOIN.
If you have to show 10 orders in a grid, along with the customer that placed each order, you have 2 choices:
1) LAZY LOAD ( = 11 queries = SLOW PERFORMANCE)
EF will fire one query to retrieve the orders, plus one query per order to retrieve the customer data.
Select * from orders where order=1
+
10 x (Select * from customers where id = (order.customerId))
2) EAGER LOAD ( = 1 query = HIGH PERFORMANCE)
EF will fire a single query to retrieve the orders and customers with a JOIN.
Select * from orders INNER JOIN customers on orders.customerId=customers.Id where order=1
PS:
When you retrieve an object from the db, the object is stored in a cache while the context is active.
In the LAZY LOAD example above, if all 10 orders relate to the same customer you will see only 2 queries, because when you ask EF to retrieve an object it first checks whether the object is in the cache and, if it finds it there, does not fire another SQL query to the DB.
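The effect of the cache on the query count can be illustrated with a small simulation. This is a sketch only and does not touch EF: `CachingContext` and `CacheDemo` are hypothetical names, and each order is represented by nothing more than its customer id.

```csharp
using System.Collections.Generic;

// Hypothetical context that remembers every customer it has already loaded.
public sealed class CachingContext
{
    private readonly Dictionary<int, string> _customerCache = new Dictionary<int, string>();

    public int QueryCount { get; private set; }

    // Simulates the single query that retrieves the orders
    // (each order is represented only by its customer id).
    public IReadOnlyList<int> LoadOrders(IReadOnlyList<int> customerIdPerOrder)
    {
        QueryCount++;
        return customerIdPerOrder;
    }

    // Simulates lazy-loading order.Customer: hit the database only on a cache miss.
    public string GetCustomer(int customerId)
    {
        if (_customerCache.TryGetValue(customerId, out var cached))
        {
            return cached;
        }

        QueryCount++;
        var loaded = "customer-" + customerId;
        _customerCache[customerId] = loaded;
        return loaded;
    }
}

public static class CacheDemo
{
    // Loads the orders, then resolves the customer of each one.
    public static int CountQueries(IReadOnlyList<int> customerIdPerOrder)
    {
        var context = new CachingContext();
        foreach (var customerId in context.LoadOrders(customerIdPerOrder))
        {
            context.GetCustomer(customerId);
        }
        return context.QueryCount;
    }
}
```

Ten orders that all point at the same customer give 2 queries, while ten orders with ten different customers give 11.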

Eager loading is intended to solve the N+1 Selects problem endemic to ORMs. The short version is this: If you are going to directly retrieve some number of entities and you know you will be accessing certain related entities via the retrieved entities, it is much more efficient to retrieve all the related entities up-front in one pass, as compared to retrieving them incrementally via lazy loading.

An important issue is serialization. Microsoft recommends NOT using the default lazy loading if you're dealing with serialized objects. Serialization reads ALL related properties, which can start a chain reaction of related entities being queried. This really comes into play if you're returning JSON data from a controller, since JSON data is obviously serialized. You'd either want to load the data up front via eager loading, or turn lazy loading off in the context and use explicit loading.
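Here is a rough sketch of why a serializer and lazy loading interact badly. It assumes .NET Core 3.0 or later for `System.Text.Json` and simulates the entity by hand: `CustomerEntity`, its `LazyLoadingEnabled` flag and `LoadOrders` are hypothetical stand-ins for what an EF proxy and explicit loading would do, not EF APIs.

```csharp
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical entity that mimics a lazy-loading proxy.
public sealed class CustomerEntity
{
    private List<string> _orders;

    public string Name { get; set; }

    [JsonIgnore] public bool LazyLoadingEnabled { get; set; } = true;
    [JsonIgnore] public int LoadCount { get; private set; }

    // With lazy loading on, merely reading this property triggers a load.
    public List<string> Orders
    {
        get
        {
            if (_orders == null && LazyLoadingEnabled)
            {
                LoadOrders();
            }
            return _orders;
        }
    }

    // Explicit load: the caller decides when the round trip happens.
    public void LoadOrders()
    {
        LoadCount++;
        _orders = new List<string> { "book", "pen" };
    }
}

public static class SerializationDemo
{
    // The serializer reads every public property, including Orders.
    public static string ToJson(CustomerEntity customer) => JsonSerializer.Serialize(customer);
}
```

Serializing an entity with `LazyLoadingEnabled` left on bumps `LoadCount` even though the calling code never asked for the orders. With the flag off, `Orders` is only populated if `LoadOrders()` was called beforehand, so the serializer cannot start extra loads on its own.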

Related

Google Cloud Datastore for a Products and Stock

I'm trying Google Cloud Datastore, but I have some doubts. I know that the ideal is to use a relational database to build an online shop, but I would like to try using Google Cloud Datastore.
How would a database of 2 tables be modeled? Stock and Products. The stock table has 2 columns (ref and units) and the product table has 3 columns (name, ref and price).
How would you get all the products that have stock? Something like a join; I know that we do not have joins, hence my doubt.
There has to be an efficient way to get the stock related to the products.
There are no tables in the Datastore, you have just entities with properties. And, depending on the client library you use, you might have entity models.
The Stock entities can have a Key property pointing to the corresponding Product entities. You query for the Stock entities, from the results you obtain the Product keys with which you pull the respective entities.
Or, if they're always in a 1:1 relationship, you could use the exact same entity IDs for the corresponding Stock and Product entities, so you can make a Stock query and, from the Stock entities in the result (or rather from their keys/IDs, as you'd probably make keys_only queries), immediately compute the Product keys and get the respective entities (see re-using an entity's ID for other entities of different kinds - sane idea?).
But, in general, you might want to reconsider the general SQL approach of querying data to generate a report when you need it (and expecting that to be fast) and instead make a habit of performing the necessary computations ahead of time - whenever the data used in those computations changes. This is a much more scalable approach which works hand in hand with the datastore (and I guess with NoSQL in general), and for which you do not need to perform the equivalent of SQL-style join ops. Basically, raise the stock empty flag for a product right when you decrement its stock value, when you already know the product in question, so that you don't have to query for it later. While you are there, also add it to the report (so that you'll have it ready when needed) and maybe trigger the restocking activity as well.
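The "compute at write time" idea can be sketched without any Datastore client. The C# class below is an in-memory illustration only: `Inventory` and its members are hypothetical names, and a real implementation would presumably update the stock entity and the flag inside a single Datastore transaction.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory model: the out-of-stock flag is maintained on every write,
// so producing the report never needs a join-like query.
public sealed class Inventory
{
    private readonly Dictionary<string, int> _unitsByRef = new Dictionary<string, int>();
    private readonly HashSet<string> _outOfStock = new HashSet<string>();

    // The report is always ready; reading it does no computation.
    public IReadOnlyCollection<string> OutOfStockReport => _outOfStock;

    public bool IsOutOfStock(string productRef) => _outOfStock.Contains(productRef);

    public void SetStock(string productRef, int units)
    {
        if (units < 0) throw new ArgumentOutOfRangeException(nameof(units));
        _unitsByRef[productRef] = units;
        UpdateFlag(productRef, units);
    }

    public void Decrement(string productRef, int quantity)
    {
        if (!_unitsByRef.TryGetValue(productRef, out var current) || quantity < 0 || quantity > current)
        {
            throw new InvalidOperationException("Not enough stock for " + productRef);
        }
        SetStock(productRef, current - quantity);
    }

    // Raise or clear the flag at the moment the stock value changes.
    private void UpdateFlag(string productRef, int units)
    {
        if (units == 0)
        {
            _outOfStock.Add(productRef);
        }
        else
        {
            _outOfStock.Remove(productRef);
        }
    }
}
```

The flag is raised in the same call that brings the stock to zero and cleared again on restocking, so `OutOfStockReport` can be read at any time without scanning products.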

EF eager loading from stored procedure

I've created a stored procedure to manage searching on my application. The stored procedure returns a collection of "cases".
When displaying my search results I reference a property of each "case" that is a linked table in the database. At the moment lazy loading makes EF load the linked property for each "case".
MiniProfiler shows this as duplicate trips to the database, one for each object returned (possibly 200 objects). Is there a way to load all the linked properties in one go rather than leaving lazy loading to do it?
I don't think you can use .Include() directly, but once you get the results from the stored proc you can run a single LINQ query to get all the related results - something like:
var relatedResults = ctx.RelatedResults.Where(r => ids.Contains(r.Id)).ToArray();
where ids is an array of ids you build based on the results returned by your stored proc. EF should be able to fix up the relations, so after running the query you should be able to access the related entities without sending additional queries.

Aggregate Root support in Entity Framework

How can we tell Entity Framework about Aggregates?
when saving an aggregate, save entities within the aggregate
when deleting an aggregate, delete entities within the aggregate
raise a concurrency error when two different users attempt to modify two different entities within the same aggregate
when loading an aggregate, provide a consistent point-in-time view of the aggregate even if there is some time delay before we access all entities within the aggregate
(Entity Framework 4.3.1 Code First)
EF provides features which allow you to define your aggregates and use them. Taking your points in order:
Saving: This is the most painful part. EF works with entity graphs. If you have an entity like Invoice and this entity has a collection of related InvoiceLine entities, you can treat it as an aggregate. In the attached scenario everything works as expected, but in the detached scenario (either the aggregate was not loaded by EF or it was loaded by a different context instance) you must attach the aggregate to the context instance and tell it exactly what you changed, i.e. set the state for every entity and independent association in the object graph.
Deleting: This is handled by cascade delete - if you have the related entities loaded, EF will delete them, but if you don't, you must have cascade delete configured on the relation in the database.
Concurrency: This is handled by concurrency tokens in the database - most commonly either timestamp or rowversion columns.
Loading: You must either use eager loading and load all the data together at the beginning (= consistent point-in-time view), or use lazy loading, in which case you will not have a consistent view, because lazy loading will load the current state of the relations but will not refresh the other parts of the aggregate you have already loaded (and I consider it a performance killer if you try to implement such refreshing with EF).
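For the concurrency token, a Code First mapping might look like the fragment below. It is a minimal sketch, assuming the DbContext API, where a `byte[]` property marked with `[Timestamp]` is treated as a rowversion-style concurrency token. The shapes of `Invoice` and `InvoiceLine` and all property names are illustrative, not taken from a real model.

```csharp
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

// Aggregate root (illustrative shape).
public class Invoice
{
    public int Id { get; set; }
    public string Number { get; set; }

    // Concurrency token: the database changes this value on every update of the row.
    [Timestamp]
    public byte[] RowVersion { get; set; }

    public virtual ICollection<InvoiceLine> Lines { get; set; } = new List<InvoiceLine>();
}

// Child entity inside the aggregate (illustrative shape).
public class InvoiceLine
{
    public int Id { get; set; }
    public int InvoiceId { get; set; }
    public decimal Amount { get; set; }
}
```

As far as I know, EF then compares the original token value when it issues the UPDATE or DELETE and reports a concurrency exception when no row matches. Note that a token on the root only guards the root row: for edits to two different lines to conflict, each save would presumably also have to modify the Invoice itself.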
I wrote GraphDiff specifically for this purpose. It allows you to define an 'aggregate boundary' on update by providing a fluent mapping. I have used it in cases where I needed to pass detached entity graphs back and forth.
For example:
// Update method of repository
public void Update(Order order)
{
context.UpdateGraph(order, map => map
.OwnedCollection(p => p.OrderItems));
}
The above would tell the Entity Framework to update the order entity and also merge the collection of OrderItems. Mapping in this fashion allows us to ensure that the Entity Framework only manages the graph within the bounds that we define on the aggregate and ignores all other properties. It supports optimistic concurrency checking of all entities. It handles much more complicated scenarios and can also handle updating references in many to many scenarios (via AssociatedCollections).
Hope this can be of use.

Optimizing multiple LINQ to Entity Framework queries

I have 2 Entity Framework entity sets with a many-to-many relationship (composed of 3 DB tables). For the sake of the example let's say I have a Worker entity and a Company entity, where each worker can work at multiple companies and each company can have many workers.
I want to retrieve for each worker all the companies that he/she works for. The straightforward way would be to create a query for each worker that will fetch the companies using a join between the association table and the companies table, but this results in a round trip to the DB for each worker.
I am sure this can be done in a better more optimized way. Any help will be appreciated.
Thank you.
If your joining table doesn't have any extra info (just the Ids of Worker and Company), you should have only two entities in your model: Worker and Company. Note that EF 4 does not eager load the entity graph by default. If lazy loading is enabled (context.ContextOptions.LazyLoadingEnabled = true;), each worker's companies are fetched the first time you access the navigation property, which costs one extra query per worker:
var workers = context.Workers.ToList();
// Companies are loaded lazily on first access - one round trip per worker
var companiesForWorker0 = workers[0].Companies; // Don't forget to check
... // for null in real code
To avoid those round trips, tell EF directly to eager load the companies when querying for workers (this is also necessary if lazy loading is disabled, because the collection is then not populated automatically):
var workers = context.Workers.Include("Companies").ToList();
Here is what I do in LINQ to SQL, and it might work for you.
Do query #1.
Collect all the worker 'ids' in a list.
Use this list to pass to the secondary query (in other words where list.Contains(item)).
Now it should take only 2 queries.
You could probably combine them both into a single query with a bit more effort if needed.
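The steps above can be sketched as follows. This is an in-memory simulation with LINQ to Objects rather than real LINQ to SQL or EF code: `WorkerDb` and `TwoQueryDemo` are hypothetical names, and `QueryCount` only counts what would be round trips.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for the three tables; it only counts round trips.
public sealed class WorkerDb
{
    private readonly List<(int Id, string Name)> _workers =
        new List<(int Id, string Name)> { (1, "Ann"), (2, "Bob") };

    private readonly List<(int Id, string Name)> _companies =
        new List<(int Id, string Name)> { (10, "Acme"), (20, "Globex") };

    private readonly List<(int WorkerId, int CompanyId)> _workerCompanies =
        new List<(int WorkerId, int CompanyId)> { (1, 10), (1, 20), (2, 20) };

    public int QueryCount { get; private set; }

    // Query #1: the workers.
    public List<(int Id, string Name)> GetWorkers()
    {
        QueryCount++;
        return _workers.ToList();
    }

    // Query #2: every (worker, company) pair for the given ids, in one batch.
    public List<(int WorkerId, string Company)> GetCompaniesFor(ICollection<int> workerIds)
    {
        QueryCount++;
        return (from link in _workerCompanies
                where workerIds.Contains(link.WorkerId)
                join company in _companies on link.CompanyId equals company.Id
                select (WorkerId: link.WorkerId, Company: company.Name)).ToList();
    }
}

public static class TwoQueryDemo
{
    public static (Dictionary<string, List<string>> CompaniesByWorker, int Queries) Run()
    {
        var db = new WorkerDb();
        var workers = db.GetWorkers();
        var ids = workers.Select(w => w.Id).ToList();

        // Group the batched result in memory instead of querying per worker.
        var byWorker = db.GetCompaniesFor(ids).ToLookup(p => p.WorkerId, p => p.Company);
        var result = workers.ToDictionary(w => w.Name, w => byWorker[w.Id].ToList());
        return (result, db.QueryCount);
    }
}
```

`TwoQueryDemo.Run()` reports 2 queries regardless of how many workers there are; the grouping per worker happens in memory via `ToLookup`.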

How to do the opposite of eager-loading in Entity Framework?

I understand in Entity Framework you can specify relationships that need to be joined with Include:
Product firstProduct =
db.Product.Include("OrderDetail").Include("Supplier").First();
But we have the opposite issue: a simple LINQ statement is making too many JOINs on the SQL server.
So how do we do the opposite, i.e. tell Entity to not do any deep loading of joined tables when it gets all the orders, so that on the SQL Server it executes:
SELECT * FROM Orders
The Entity Framework often goes ahead and loads basic relationship information too.
It does this so users can make updates easily, without violating the EF's unique concurrency policy for relationships.
You can turn this off however by doing a no tracking query.
See Tip 11 - How to avoid relationship span for more information.
Alex