One of the few valid complaints I hear about EF4 vis-a-vis NHibernate is that EF4 is poor at handling lazily loaded collections. For example, on a lazily-loaded collection, if I say:
if (MyAccount.Orders.Count() > 0) ;
EF will pull the whole collection down (if it's not already loaded), while NH will be smart enough to issue a select count(*).
NH also has some nice batch fetching to help with the select n + 1 problem. As I understand it, the closest EF4 can come to this is with the Include method.
Has the EF team let slip any indication that this will be fixed in their next iteration? I know they're hard at work on POCO, but this seems like it would be a popular fix.
What you describe is not the N+1 problem. An example of the N+1 problem is here. N+1 means that you execute N+1 selects instead of one (or two). In your example it would most probably mean:
// Lazy loads all N Orders in a single select
foreach (var order in MyAccount.Orders)
{
    // Lazy loads all Items for a single order => executed N times
    foreach (var orderItem in order.Items)
    {
        ...
    }
}
This is easily solved by:
// Eager load the accounts with all their Orders and the orders' Items in a single query
foreach (var account in context.Accounts.Include("Orders.Items").Where(...))
{
    ...
}
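The difference in query counts can be sketched without a database. The following is a minimal simulation, not EF code: the dictionary stands in for the Orders and Items tables, and every name in it (SelectNPlusOneDemo, ItemsByOrder, the two methods) is hypothetical and exists only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for a database; all names are illustrative.
public static class SelectNPlusOneDemo
{
    // Order id -> item names, playing the role of the Orders and Items tables.
    private static readonly Dictionary<int, string[]> ItemsByOrder =
        new Dictionary<int, string[]>
        {
            { 1, new[] { "a", "b" } },
            { 2, new[] { "c" } },
            { 3, new[] { "d", "e", "f" } },
        };

    // Lazy style: one select for the orders, then one more select per order.
    public static int QueriesWithLazyLoading()
    {
        int queries = 1;                       // select all orders
        int itemsSeen = 0;
        foreach (int orderId in ItemsByOrder.Keys.ToList())
        {
            queries++;                         // select items for this order
            itemsSeen += ItemsByOrder[orderId].Length;
        }
        Console.WriteLine($"lazy: {itemsSeen} items, {queries} queries");
        return queries;
    }

    // Eager style: orders and items arrive together in one joined select.
    public static int QueriesWithEagerLoading()
    {
        int queries = 1;                       // one joined select
        int itemsSeen = ItemsByOrder.Values.Sum(items => items.Length);
        Console.WriteLine($"eager: {itemsSeen} items, {queries} queries");
        return queries;
    }
}
```

With three orders the lazy variant counts N + 1 = 4 simulated queries, while the eager variant counts one.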
Your example looks valid to me. You have a collection that exposes IEnumerable and you execute the Count operation on it. The collection is lazily loaded and the count is executed in memory. Translating a LINQ query to SQL is only possible on IQueryable, where expression trees represent the query. But IQueryable represents a query, so each access means a new execution in the database; for example, checking Count in a loop would execute a database query in each iteration.
So it is more about the implementation of the dynamic proxy.
Counting related entities without loading them is already possible in Code-first CTP5 (the final release will be called EF 4.1) when using DbContext instead of ObjectContext, but not by interacting with the collection directly. You have to use something like:
int count = context.Entry(myAccount).Collection(a => a.Orders).Query().Count();
The Query method returns a prepared IQueryable, which is probably what EF runs when you use lazy loading, but you can modify the query further; here I used Count.
Consider a table [Person] that has two relationships ([Phone_Numbers], [Business_Information]). When using EF Core, we can find a Person with the Find method, like var person = await dbContext.FindAsync<Person>(1); however, Find returns only the entity itself (from the tracking cache when it is already tracked) and does not load navigation properties. To solve this, we can call the Entry method to load those properties, like dbContext.Entry<Person>(person).Reference(x => x.Business_Information).Load(). For this example we have to call Entry twice:
dbContext.Entry<Person>(person).Reference(x=> x.Business_Information).Load();
dbContext.Entry<Person>(person).Collection(x=> x.Phone_Numbers).Load();
An alternative solution is to use the Include method:
var person = await dbContext.Set<Person>().Include("Business_Information").Include("Phone_Numbers").FirstOrDefaultAsync(x=> x.id == id);
The first solution sends two requests to the database (I think the Find method does not send a request if the entity is already being tracked); however, I'm not sure how the second one works, so I'm also unsure whether it has any performance advantage. I've been thinking the first solution could be faster and more efficient. I'd appreciate it if someone could clarify this for me.
It really depends on the number of related properties, their type (reference or collection) and, in the first case, whether they are already loaded.
Let's say your entity has N reference navigation properties and M collection navigation properties that you want to load.
The approach with Include will always execute 1 + M db queries: one for the entity plus its reference properties (whose data is retrieved with JOINs to the corresponding tables and returned as columns in the query result) and one for each collection, regardless of whether the entity or any of the related entities/collections is already loaded.
The approach with explicit loading is more dynamic.
It will execute 1 db query for the entity if it's not loaded in the context, 0 otherwise.
For each reference navigation property it will execute 1 db query if the referenced entity is not already loaded in the context, 0 otherwise.
For each collection navigation property, it will execute 1 db query if the collection is not marked as loaded (db.Entry(entity).Collection(e => e.Collection).IsLoaded == false), 0 otherwise.
At the end, the explicit loading approach could execute between 0 and 1 + N + M db queries.
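The arithmetic above can be written down as a small helper. This is only a sketch of the counts as stated in this answer; QueryCountEstimate and its methods are hypothetical names, not part of EF Core. Newer EF Core versions may combine collections into a single joined query by default, in which case the Include count would differ.

```csharp
using System;

// Hypothetical helper that restates the query counts described above.
public static class QueryCountEstimate
{
    // Include: one query for the entity and its joined references,
    // plus one query per collection navigation property (M).
    public static int IncludeQueries(int collections)
    {
        return 1 + collections;
    }

    // Explicit loading: a query is only issued for what is not loaded yet.
    public static int ExplicitLoadQueries(
        bool entityAlreadyTracked,
        int referencesNotLoaded,
        int collectionsNotLoaded)
    {
        int entityQuery = entityAlreadyTracked ? 0 : 1;
        return entityQuery + referencesNotLoaded + collectionsNotLoaded;
    }
}
```

For the Person example (N = 1 reference, M = 1 collection), Include gives 2 queries, while explicit loading ranges from 0 (everything already loaded) to 3 (nothing loaded).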
With all that being said, it's not clear which one is better. If you are using relatively short-lived DbContext instances, so that the chance of skipping the related queries is low, I would go with the Include approach because it is deterministic.
Using Entity Framework 6.
Suppose I have an entity Parent with two nested collections ICollection<Child> and ICollection<Child2>. I want to fetch both eagerly:
dbContext.Parent.Include(p => p.Child).Include(p => p.Child2).ToList()
This generates a big query, which looks like this at a high level:
SELECT ... FROM (
    SELECT (parent columns), (child columns), NULL as (child2 columns)
    FROM Parent left join Child on ...
    WHERE (filter on Parent)
    UNION ALL
    SELECT (parent columns), NULL as (child columns), (child2 columns)
    FROM Parent left join Child2 on ...
    WHERE (filter on Parent)
)
Is there a way to get Entity Framework to behave like batch fetch in NHibernate (or JPA, EclipseLink, Hibernate etc.) where you can specify that you want to query the parent table first, then each child table separately?
SELECT ... from Parent -- as usual
SELECT ... from Child where parent_id in (list of IDs)
SELECT ... from Child2 where parent_id in (list of IDs)
-- alternatively, you can specify EXISTS instead of IN LIST:
SELECT ... from Child where exists (select 1 from Parent where child.parent_id = parent.id and (where clause for parent))
I find this easier to understand and reason about, since it more closely resembles the SQL you would write if you were writing it by hand. Also, it prevents the redundant parent table rows in the result set. On the other hand, it's more round trips.
I do not believe this is possible with Entity Framework, at least using LINQ. At the end of the day the ORM attempts to generate the most efficient query possible, at least by its own measure. That being said, ORMs like Entity Framework don't always generate the nicest-looking or the most efficient SQL. My guess, and this is just a guess, is that Entity Framework is trying to reduce the number of round trips and the I/O, because I/O is expensive, relatively speaking.
If you are looking for fine-grained control over your SQL, I recommend you avoid ORMs, or do as I do: use Entity Framework for your basic CRUD and simple queries, and use stored procedures for your complex queries, such as complex reports. There is always plain ADO.NET too, but it seems you are more intent on using an ORM.
You may find this useful as well. Basically, not much tuning is available. https://stackoverflow.com/a/22390400/2272004
Entity Framework misses out on many sophisticated features NHibernate offers. EF's unique selling point is its versatile LINQ support, but if you need declarative control over how an ORM fetches your data spanning multiple tables, EF is not the tool of choice. With EF, you can only try to find procedural tricks.
Let me illustrate that by showing what you'd need to do to achieve "batch fetching" with EF:
context.Configuration.LazyLoadingEnabled = false;
context.Children1.Where(c1 => parentIds.Contains(c1.ParentId)).Load();
context.Children2.Where(c2 => parentIds.Contains(c2.ParentId)).Load();
var parents = context.Parent.Where(p => parentIds.Contains(p.Id)).ToList();
This loads the required data into the context and EF connects the parents and children by relationship fixup. The result is a parents list with their two child collections populated. But of course, it has several downsides:
You need to disable lazy loading, because even though the child collections are populated, they are not marked as loaded. Accessing them would still trigger lazy loading when it is enabled.
Repetitive code: you need to repeat the predicates three times. It's not really easy to avoid this.
Too specific. For each different scenario, even if they are almost identical, you have to write a new set of statements. Or make it configurable, which is still a procedural solution.
EF's current main production version (6) doesn't have a query batch facility. You need third-party tools like EntityFramework.Extended to run these queries in one database roundtrip.
I am working with Entity Framework and pretty new with it.
I have a table named Order and a table named Products.
Each order has a lot of products.
When generating the entities I get an Order object with an ICollection of products.
The problem is that I have a lot of products for each order (20K), and when I do
order.Products.Where(......)
EF runs a select statement filtered only by orderId = 123 and applies the rest of the where in code.
Because I have a lot of results, the select takes a lot of time. How can I change the code so that the select in the DB includes the where conditions?
This statement:
var prods = order.Products.Where(...);
is equivalent to:
var temps = order.Products;
var prods = temps.Where(...);
Accessing order.Products triggers lazy loading, which is executed immediately (it is not deferred) and produces an ICollection. Calling Where(...) on that collection uses the LINQ to Objects overload, which returns an IEnumerable rather than an IQueryable, so it cannot be translated to SQL. So it's the order.Products part that generates the select statement you see. It fetches all the products belonging to that order into memory. Then the Where(...) part is executed in memory, hence the bad performance.
To avoid this, you should use order.Products only if you really want all the products on an order. If you want only a subset of them, do something like the following:
ctx.Products.Where(prod => prod.Order.Id == order.Id && ...)
Note that ctx is the database context, not the order object.
If you think that the prod.Order.Id == order.Id clause above looks a little dirty, here's a purer but longer alternative:
ctx.Entry(order).Collection(ord => ord.Products).Query().Where(...)
which produces exactly the same SQL query.
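Which Where overload the compiler picks can be checked without a database. The sketch below uses plain lists and AsQueryable() as a stand-in for an EF query; WhereOverloadDemo and its members are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo: the static type of the source decides which Where runs.
public static class WhereOverloadDemo
{
    private static bool IsQueryable<T>(IEnumerable<T> source)
    {
        return source is IQueryable<T>;
    }

    // ICollection<T> source: binds to Enumerable.Where, which filters in memory.
    public static bool FilterOnCollectionIsQueryable()
    {
        ICollection<int> products = new List<int> { 1, 2, 3 };
        var filtered = products.Where(p => p > 1);
        return IsQueryable(filtered);
    }

    // IQueryable<T> source: binds to Queryable.Where, which builds an
    // expression tree that a query provider could translate (for example to SQL).
    public static bool FilterOnQueryableIsQueryable()
    {
        IQueryable<int> products = new List<int> { 1, 2, 3 }.AsQueryable();
        var filtered = products.Where(p => p > 1);
        return IsQueryable(filtered);
    }
}
```

The first method returns false and the second returns true, which mirrors why order.Products.Where(...) filters in memory while ctx.Products.Where(...) and Query().Where(...) can be sent to the database.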
Using EF 6.1 database first with model. Without respect to the merits or demerits of having scoped models, I'd like to know what is going on. If I run the exact same query on a model that only contains the entities/tables in my query compared to a model that contains all entities/tables (assuming lazy loading is turned off) both queries should take the same time to execute.
Logically the only difference should be that the model/object with the ability to query all entities will have the additional overhead of all the code to deal with all entities. A heavier class, but it shouldn't affect query performance.
Here is the query:
using ( var context = new MyEntities() )
{
    context.Configuration.LazyLoadingEnabled = false;
    var parentIds = new List<int> { 1, 2 };
    var entities = ( from e in context.Entities.Include( "SomeRelatedEntity" )
                     join eParent in context.Parents on e.ParentId equals eParent.Id
                     where parentIds.Contains( eParent.Id )
                     select e )
                   .ToList();
}
Using the full model, this takes over 2X as long to get 80K entities. When run against the full model, upon executing the line with ToList, it sits there for 10 seconds before ever hitting the database!
If I add a call to .AsNoTracking() after each entity set, the lag goes away.
If I run the same query with the same large model using reverse-engineered code first, it takes a whole lot more time still. This, no doubt, is due to tracking proxy generation (at least that would make sense).
Why is EF taking this long to perform a query of tracked entities? Is there something I can do to speed it up? How can this not be a bug (there isn't any DBMS whereby existing queries are penalized by the addition of new tables)?
I have a DbContext (called "MyContext") with about 100 DbSets within it.
Among the domain classes, I have a Document class with 10 direct subclasses (like PurchaseOrder, RequestForQuotation etc).
The hierarchy is mapped with a TPT strategy.
That is, in my database, there is a Document table, with other tables like PurchaseOrder, RequestForQuotation for the subclasses.
When I do a query like:
Document document = myContext.Documents.First();
the query takes 5 seconds, no matter whether it's the first time I run it or a subsequent run.
A query like:
Document document = myContext.Documents.Where(o => o.ID == 2).First();
also takes as long.
Is this an issue with EF4.1 (if so, will EF4.2 help) or is this an issue with the query code?
Did you try using SQL Profiler to see what is actually sent to the DB? It could be that you have too many joins on your Document that are not set to lazy load, so the query has to do all the joins in one go, bringing back too many columns. Try sending a simple query with just one return column.
As you can read here, there are some performance issues regarding TPT in EF.
The EF Team announced several fixes in the June 2011 CTP, including optimization of TPT queries, but they are not included in EF 4.2, as you can read in the comments to this answer.
In the worst case, these fixes will only be released with .NET 4.5. I'm hoping it will be sooner...
I'm not certain that the DbSet exposed by code-first actually uses ObjectQuery, but you can try to invoke the .ToTraceString() method on it to see what SQL is generated, like so:
var query = myContext.Documents.Where(o => o.ID == 2);
Debug.WriteLine(query.ToTraceString());
Once you get the SQL you can determine whether it's the query or EF that is causing the delay. Depending on the complexity of your base class, the query might include a lot of additional columns, which could be avoided using projection. Using a projection, you can perform a query like this:
var query = from d in myContext.Documents
            where d.ID == 2
            select new
            {
                d.ID
            };
This should basically perform a SELECT ID FROM Documents WHERE ID = 2 query and you can measure how long this takes to gain further information. Of course the projected query might not fit your needs but it might get you on the right track. If this still takes up to 5 seconds you should look into performance problems with the database itself rather than EF.
Update
Apparently with code-first you can use .ToString() instead of .ToTraceString(), thanks Slauma for noticing.
I've just had a 5 sec delay in ExecuteFunction, on a stored procedure that runs instantaneously when called from SQL Management Studio. I fixed it by re-writing the procedure.
It appears that EF (and SSRS BTW) tries to do something like a "prepare" on the stored proc and for some (usually complex) procs that can take a very long time.
A quick and dirty solution is to duplicate and then replace your SP parameters with internal variables:
create proc ListOrders(@CountryID int = 3, @MaxOrderCount int = 20)
as
declare @CountryID1 int, @MaxOrderCount1 int
set @CountryID1 = @CountryID
set @MaxOrderCount1 = @MaxOrderCount
select top (@MaxOrderCount1) *
from Orders
where CountryID = @CountryID1