Why does LINQ to SQL generate SELECT * instead of SELECT COUNT(*) when I only need Count? - entity-framework

We use Entity Framework, and this leads to big performance problems whenever we use Count() on child collections of database entities.
As a workaround I've used joins on the root collections of the data context. The resulting SQL query then uses the desired COUNT(*), but this solution is really ugly.
The slow query is:
var booked = erf.Sessions.All(s => s.Exams.All(e => e.Candidates.Count() >= e.CandidatesExpected));

If by "child collections" you mean navigation properties of type ICollection<T> defined in your entities, then this is LINQ to Entities (not LINQ to SQL, as you specified). Your Count() method is then just an extension method defined in the Enumerable class, which is executed on entities already materialized into memory. To get the results you are expecting, you need to use Count() on DbSet queries.

Related

Entity Framework generated queries for nested collections

Using Entity Framework 6.
Suppose I have an entity Parent with two nested collections ICollection<Child> and ICollection<Child2>. I want to fetch both eagerly:
dbContext.Parent.Include(p => p.Child).Include(p => p.Child2).ToList()
This generates a big query, which looks like this at a high level:
SELECT ... FROM (
SELECT (parent columns), (child columns), NULL as (child2 columns)
FROM Parent left join Child on ...
WHERE (filter on Parent)
UNION ALL
SELECT (parent columns), NULL as (child columns), (child2 columns)
FROM Parent left join Child2 on ...
WHERE (filter on Parent)
)
Is there a way to get Entity Framework to behave like batch fetch in NHibernate (or JPA, EclipseLink, Hibernate etc.) where you can specify that you want to query the parent table first, then each child table separately?
SELECT ... from Parent -- as usual
SELECT ... from Child where parent_id in (list of IDs)
SELECT ... from Child2 where parent_id in (list of IDs)
-- alternatively, you can specify EXISTS instead of IN LIST:
SELECT ... from Child where exists (select 1 from Parent where child.parent_id = parent.id and (where clause for parent))
I find this easier to understand and reason about, since it more closely resembles the SQL you would write if you were writing it by hand. Also, it prevents the redundant parent table rows in the result set. On the other hand, it's more round trips.
I do not believe this is possible with Entity Framework, at least using LINQ. At the end of the day the ORM attempts to generate what it considers the most efficient query. That being said, ORMs like Entity Framework don't always generate the nicest-looking or the most efficient SQL. My guess, and this is just a guess, is that Entity Framework is trying to reduce the number of round trips and the I/O, because I/O is relatively expensive.
If you are looking for fine-grained control over your SQL, I recommend you avoid ORMs, or do as I do: use Entity Framework for your basic CRUD and simple queries, and use stored procedures for your complex queries, such as complex reports. There is always ADO.NET too, but it seems you are more intent on using an ORM.
You may find this useful as well. Basically, not much tuning is available. https://stackoverflow.com/a/22390400/2272004
Entity Framework misses out on many sophisticated features NHibernate offers. EF's unique selling point is its versatile LINQ support, but if you need declarative control over how an ORM fetches your data spanning multiple tables, EF is not the tool of choice. With EF, you can only try to find procedural tricks.
Let me illustrate that by showing what you'd need to do to achieve "batch fetching" with EF:
context.Configuration.LazyLoadingEnabled = false;
context.Children1.Where(c1 => parentIds.Contains(c1.ParentId)).Load();
context.Children2.Where(c2 => parentIds.Contains(c2.ParentId)).Load();
var parents = context.Parent.Where(p => parentIds.Contains(p.Id)).ToList();
This loads the required data into the context, and EF connects the parents and children by relationship fixup. The result is a parents list with their two child collections populated. But of course, it has several downsides:
You need to disable lazy loading, because even though the child collections are populated, they are not marked as loaded. Accessing them would still trigger lazy loading when it is enabled.
Repetitive code: you need to repeat the predicates three times. It's not really easy to avoid this.
Too specific. For each different scenario, even if they are almost identical, you have to write a new set of statements. Or make it configurable, which is still a procedural solution.
EF's current main production version (6) doesn't have a query batch facility. You need third-party tools like EntityFramework.Extended to run these queries in one database roundtrip.
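For anyone wondering what relationship fixup amounts to, its effect can be imitated in plain LINQ to Objects. This is only a sketch under stated assumptions: the Parent, Child1 and Child2 classes and the Stitch helper are hypothetical names invented here, the three lists stand in for the results of the three separate queries, and none of this claims to be how EF implements fixup internally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity shapes, invented for this illustration.
public class Child1 { public int Id; public int ParentId; }
public class Child2 { public int Id; public int ParentId; }
public class Parent
{
    public int Id;
    public List<Child1> Children1 = new List<Child1>();
    public List<Child2> Children2 = new List<Child2>();
}

public static class BatchFetchSketch
{
    // Attaches separately fetched children to their parents by foreign key.
    public static void Stitch(
        IEnumerable<Parent> parents,
        IEnumerable<Child1> children1,
        IEnumerable<Child2> children2)
    {
        // One lookup per child set, keyed by ParentId.
        ILookup<int, Child1> byParent1 = children1.ToLookup(c => c.ParentId);
        ILookup<int, Child2> byParent2 = children2.ToLookup(c => c.ParentId);

        foreach (Parent p in parents)
        {
            // A lookup yields an empty sequence for a key it does not contain.
            p.Children1.AddRange(byParent1[p.Id]);
            p.Children2.AddRange(byParent2[p.Id]);
        }
    }

    public static void Main()
    {
        var parents = new List<Parent> { new Parent { Id = 1 }, new Parent { Id = 2 } };
        var children1 = new List<Child1> { new Child1 { Id = 10, ParentId = 1 } };
        var children2 = new List<Child2> { new Child2 { Id = 20, ParentId = 2 } };

        Stitch(parents, children1, children2);

        foreach (Parent p in parents)
            Console.WriteLine("{0}: {1} / {2}", p.Id, p.Children1.Count, p.Children2.Count);
    }
}
```

The point of the sketch is that each table is read once and the joining happens on the client, which is the trade-off the question asks for: no duplicated parent rows, at the cost of extra round trips.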

Entity Framework table per type - select from only the base type columns

We are using EF 4.3 Code first and have an object model like so:
class Content { }
class Product:Content { }
class News:Content { }
These are mapped as Table per Type.
There are scenarios where I just want to load only the columns belonging to the base table, like say a list of all the content titles. But a query like
from c in Content
where c.IsDeleted == false
select c
results in some really nasty SQL with joins to the other two tables. Is there any way to force EF to just do a select from the base table only without joins to the other tables?
TPT is problematic, and EF-generated queries are usually very inefficient. Moreover, your expectations are probably incorrect. Linq-to-entities always returns the real type of an entity. It cannot return an instance of the Content type if the record is in fact a Product entity. Your query can have only two meanings:
Return all non-deleted contents - this must perform joins to correctly instantiate the real types of the entities. The query will return an enumeration of Content, Product and News instances.
Return all non-deleted Content instances - this probably must again perform joins to correctly instantiate only records mapped to Content directly (without relation to Product and News). No record mapped to Product or News will be returned in the enumeration. This query is not possible with Linq-to-entities - you need to use ESQL and the OFTYPE ONLY operator.
There are a few things you can try:
Upgrade to .NET 4.5 - there are some improvements for TPT queries
Return a projection of properties instead of Content - Product and News are also content, so you will never get a query without joins if you return Content instances from a Linq-to-entities query
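The projection suggestion might look roughly like the sketch below. The Id, Title and IsDeleted properties and the ActiveTitles helper are assumptions made for illustration (the question only mentions titles and IsDeleted in passing). To keep the sketch self-contained, the query runs against an in-memory list through AsQueryable; against a DbSet<Content> the same query shape would be handed to EF for translation, and whether the generated SQL really drops the joins should be checked with a profiler.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model: property names are assumed, not taken from the question.
public class Content
{
    public int Id { get; set; }
    public string Title { get; set; }
    public bool IsDeleted { get; set; }
}
public class Product : Content { }
public class News : Content { }

public static class ProjectionSketch
{
    // Selects a single base-type property instead of whole Content entities,
    // so no derived-type instance has to be materialized.
    public static List<string> ActiveTitles(IQueryable<Content> contents)
    {
        return contents
            .Where(c => !c.IsDeleted)
            .Select(c => c.Title)
            .ToList();
    }

    public static void Main()
    {
        var contents = new List<Content>
        {
            new Product { Id = 1, Title = "Widget" },
            new News { Id = 2, Title = "Launch", IsDeleted = true },
            new Content { Id = 3, Title = "About" },
        };

        foreach (string title in ActiveTitles(contents.AsQueryable()))
            Console.WriteLine(title);
    }
}
```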

Entity Framework 4.1 query takes too long (5 seconds) to complete

I have DbContext (called "MyContext") with about 100 DbSets within it.
Among the domain classes, I have a Document class with 10 direct subclasses (like PurchaseOrder, RequestForQuotation etc).
The hierarchy is mapped with a TPT strategy.
That is, in my database, there is a Document table, with other tables like PurchaseOrder, RequestForQuotation for the subclasses.
When I do a query like:
Document document = myContext.Documents.First();
the query took 5 seconds, no matter whether it's the first time I run it or subsequently.
A query like:
Document document = myContext.Documents.Where(o => o.ID == 2).First();
also took as long.
Is this an issue with EF 4.1 (if so, will EF 4.2 help), or is this an issue with the query code?
Did you try using SQL Profiler to see what is actually sent to the DB? It could be that you have too many joins on your Document that are not set to lazy load, so the query has to do all the joins in one go, bringing back too many columns. Try sending a simple query with just one return column.
As you can read here, there are some performance issues regarding TPT in EF.
The EF team announced several fixes in the June 2011 CTP, including TPT query optimization, but they are not included in EF 4.2, as you can read in the comments to this answer.
In the worst case, these fixes will only be released with .NET 4.5. I'm hoping it will be sooner...
I'm not certain that the DbSet exposed by code-first actually uses ObjectQuery, but you can try to invoke the .ToTraceString() method on it to see what SQL is generated, like so:
var query = myContext.Documents.Where(o => o.ID == 2);
Debug.WriteLine(query.ToTraceString());
Once you get the SQL you can determine whether it's the query or EF which is causing the delay. Depending on the complexity of your base class, the query might include a lot of additional columns, which could be avoided by using a projection. With a projection, you can perform a query like this:
var query = from d in myContext.Documents
            where d.ID == 2
            select new
            {
                d.ID
            };
This should basically perform a SELECT ID FROM Documents WHERE ID = 2 query and you can measure how long this takes to gain further information. Of course the projected query might not fit your needs but it might get you on the right track. If this still takes up to 5 seconds you should look into performance problems with the database itself rather than EF.
Update
Apparently with code-first you can use .ToString() instead of .ToTraceString(), thanks Slauma for noticing.
I've just had a 5 sec delay in ExecuteFunction, on a stored procedure that runs instantaneously when called from SQL Management Studio. I fixed it by re-writing the procedure.
It appears that EF (and SSRS BTW) tries to do something like a "prepare" on the stored proc and for some (usually complex) procs that can take a very long time.
A quick and dirty solution is to duplicate and then replace your SP parameters with internal variables:
create proc ListOrders(@CountryID int = 3, @MaxOrderCount int = 20)
as
declare @CountryID1 int, @MaxOrderCount1 int
set @CountryID1 = @CountryID
set @MaxOrderCount1 = @MaxOrderCount
select top (@MaxOrderCount1) *
from Orders
where CountryID = @CountryID1

Select N+1 in next Entity Framework

One of the few valid complaints I hear about EF4 vis-a-vis NHibernate is that EF4 is poor at handling lazily loaded collections. For example, on a lazily-loaded collection, if I say:
if (MyAccount.Orders.Count() > 0) ;
EF will pull the whole collection down (if it's not already loaded), while NH will be smart enough to issue a select count(*).
NH also has some nice batch fetching to help with the select n + 1 problem. As I understand it, the closest EF4 can come to this is with the Include method.
Has the EF team let slip any indication that this will be fixed in their next iteration? I know they're hard at work on POCO, but this seems like it would be a popular fix.
What you describe is not the N+1 problem. An example of the N+1 problem is here. N+1 means that you execute N+1 selects instead of one (or two). In your example it would most probably mean:
// Lazy loads all N Orders in a single select
foreach(var order in MyAccount.Orders)
{
    // Lazy loads all Items for a single order => executed N times
    foreach(var orderItem in order.Items)
    {
        ...
    }
}
This is easily solved by:
// Eager load all Orders and their items in a single query
foreach(var order in context.Accounts.Include("Orders.Items").Where(...))
{
    ...
}
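The difference in query counts can be made visible without a database. In the sketch below, FakeDb is a hypothetical stand-in that merely counts how often it is asked for data; the method names and the sample data are invented, and nothing here calls EF.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a database that counts the queries it receives.
public class FakeDb
{
    public int QueryCount { get; private set; }

    private readonly Dictionary<int, List<string>> itemsByOrder =
        new Dictionary<int, List<string>>
        {
            { 1, new List<string> { "a", "b" } },
            { 2, new List<string> { "c" } },
            { 3, new List<string>() },
        };

    // One query returning all orders.
    public List<int> LoadOrderIds()
    {
        QueryCount++;
        return itemsByOrder.Keys.ToList();
    }

    // One query per order, as lazy loading of order.Items would issue.
    public List<string> LoadItems(int orderId)
    {
        QueryCount++;
        return itemsByOrder[orderId];
    }

    // One query returning orders together with their items (eager loading).
    public Dictionary<int, List<string>> LoadOrdersWithItems()
    {
        QueryCount++;
        return itemsByOrder.ToDictionary(kv => kv.Key, kv => kv.Value);
    }
}

public static class NPlusOneSketch
{
    public static int LazyStyle(FakeDb db)
    {
        int items = 0;
        foreach (int id in db.LoadOrderIds())   // 1 query
            items += db.LoadItems(id).Count;    // N queries
        return items;
    }

    public static int EagerStyle(FakeDb db)
    {
        return db.LoadOrdersWithItems().Values.Sum(v => v.Count);  // 1 query
    }

    public static void Main()
    {
        var lazy = new FakeDb();
        LazyStyle(lazy);
        var eager = new FakeDb();
        EagerStyle(eager);
        Console.WriteLine("lazy: {0} queries, eager: {1} query",
            lazy.QueryCount, eager.QueryCount);
    }
}
```

With the three sample orders, the lazy variant issues four queries (one for the orders plus one per order) and the eager variant issues one, while both see the same three items.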
Your example looks valid to me. You have a collection which exposes IEnumerable, and you execute a Count operation on it. The collection is lazy loaded and the count is executed in memory. The ability to translate a LINQ query to SQL is available only on IQueryable, where expression trees represent the query. But IQueryable represents a query, so each access means a new execution in the DB; for example, checking Count in a loop will execute a DB query in each iteration.
So it is more about implementation of dynamic proxy.
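The distinction between the two interfaces can be observed without a database. The sketch below uses only System.Linq; the CountSketch class and its method names are made up for illustration. CountInMemory binds to Enumerable.Count and simply runs over the collection, while DescribeQuery shows that an IQueryable carries an expression tree, which is the thing a provider such as EF would translate to SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class CountSketch
{
    // Binds to Enumerable.Count: executed directly against the in-memory collection.
    public static int CountInMemory(IEnumerable<int> orders)
    {
        return orders.Count();
    }

    // Binds to Queryable.Where: nothing is executed, the call is only recorded
    // as an expression tree that a query provider can inspect and translate.
    public static string DescribeQuery(IQueryable<int> orders)
    {
        IQueryable<int> filtered = orders.Where(o => o > 1);
        var call = (MethodCallExpression)filtered.Expression;
        return call.Method.Name;
    }

    public static void Main()
    {
        var orders = new List<int> { 1, 2, 3 };
        Console.WriteLine(CountInMemory(orders));
        Console.WriteLine(DescribeQuery(orders.AsQueryable()));
    }
}
```

A navigation property typed as ICollection<T> only offers the first path, which is why the count ends up being computed in memory after the collection has been loaded.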
Counting related entities without loading them is already possible in Code-first CTP5 (the final release will be called EF 4.1) when using DbContext instead of ObjectContext, but not by direct interaction with the collection. You will have to use something like:
int count = context.Entry(myAccount).Collection(a => a.Orders).Query().Count();
The Query method returns a prepared IQueryable, which is probably what EF runs if you use lazy loading, but you can modify the query further - here I used Count.

Entity framework function import, can't load relations for functions that return entity types

I've created a function import that returns the results of a stored procedure as one of my entities. However, I can't seem to traverse my navigation properties to access the data in other entities. I know that you can use Include() for ObjectQueries, but I can't find anything that will force EF to load my relations for entity results of function imports.
Any ideas??
Thanks in advance.
This is not possible in EF 1.0.
The reason is that EF will consider stored procedure values to be just values and not navigation properties.
For example, Employee entity has multiple Order entities. In Order you have a property called EmployeeID. When the database fills your query using include statements, it creates 1 projection query in SQL to populate all of the Order data that a particular Employee could have.
So if I said
var employee = context.Employees.Include("Orders").Where(e => e.ID == 1).First();
var orders = employee.Orders;
The SQL for the first query will create a projection query which will contain orders where the EmployeeID = 1.
Now when your stored procedure runs, it can execute any code behind the scenes (in other words, it can return any set of data). So when SQL runs the stored procedure, it just runs the code in that stored procedure and does not have any knowledge that EmployeeID on Order is an FK to that property. Additionally, if your stored procedure returns an Employee entity, then you are looking at another scenario where you will not even have an OrderID to pursue.
To work around this, though, you can set up your query in EF using Include statements that mirror the stored procedure. If you use the proper mix of .Select and .Include statements, you should be able to do the same thing.