EF Where(x => x.ColumnVal == 1) vs FirstOrDefault(x => x.Column == 1) - entity-framework

I had a LINQ query that loads a hierarchy of objects like the following.
Query #1
var result = db.Orders
.Include("Customer")
// many other .Include() here
.FirstOrDefault(x => x.Customer.CustomerId == 1 &&
x.OrderId == orderId);
I was having MAJOR performance problem with it.
The CPU usage was near 100% and memory usage was very high.
And I tweaked it to the following and the performance problem was fixed.
Query #2
var result = db.Orders
.Include("Customer")
// many other .Include() here
.Where(x => x.Customer.CustomerId == 1 &&
x.OrderId == orderId)
.FirstOrDefault();
I just want to confirm my suspicion.
Query #1 is probably looping through all my records in memory looking for a matching record
vs
Query #2 filters the records on the Database and then getting the first record only.
Is that why the Query #1 has performance problems?
Just to be safe, do I need to use the .Select(x => x) before the .FirstOrDefault()?
Query #3
var result = db.Orders
.Include("Customer")
// many other .Include() here
.Where(x => x.Customer.CustomerId == 1 &&
x.OrderId == orderId)
.Select(x => x)
.FirstOrDefault();

No, they both should result in a same SQL query when being executed. You can prove it by looking into SQL Profiler and see what is the exact SQL being submitted from EF in both cases. Your performance optimization should have been caused by some other factors. Here is why:
Include method returns an ObjectQuery<T>:
public class ObjectQuery<T> : ObjectQuery, IOrderedQueryable<T>,
IQueryable<T>, IEnumerable<T>,
IOrderedQueryable, IQueryable,
IEnumerable, IListSource
Which means its FirstOrDefault method comes with 2 overloads:
// Defined by Enumerable:
FirstOrDefault(Func<T, Boolean>)
// Defined by Queryable:
FirstOrDefault(Expression<Func<T, Boolean>>)
When you code .FirstOrDefault(x => x.Customer.CustomerId == 1 compiler will go into a process called Overload Resolution to infer the type of the lambda expression x => x.Customer.CustomerId == 1 since it is convertible to the type of both overload's parameter types.
Compiler will use an algorithm (that I am still trying to find in C# Language Specification!), figure out that converting the lambda to the Expression<Func<T, Boolean> is a better conversion than to Func<T, Boolean> so pick the IQueryable overload.
Therefore, you'll see the predicate in the generated SQL when observing it in the SQL Profiler.

I found the culprit. It's the SQL query generated by Entity Framework.
I have a complicated Schema with a lot of many-to-many relationships.
Entity Framework was generating a 32,000 line long SQL string :'(
Now, I am changing my code to load the hierarchy manually for some part.
Please let me know if anyone knows some good articles to read about Eager Loading and Many-to-Many relationships.

I think best would be to use ...Where(condition).Take(1).FirstOrDefault() because Take(1) can be easily translated to SQL as a TOP clause. Anybody with me?

Related

Entity framework 5.0 First or Group By Issue- After upgrading from 2.2 to 5.0

I have a table called Products and I need to find the products with unique title for a particular category. Earlier we used to do with this query in entity framework core 2.2 :
currentContext.Products
.GroupBy(x => x.Title)
.Select(x => x.FirstOrDefault()))
.Select(x => new ProductViewModel
{
Id = x.Id,
Title = x.Title,
CategoryId= x.CategoryId
}).ToList();
But after upgrading to Entity Framework Core 5.0, we get an error for Groupby Shaker exception:
The LINQ expression 'GroupByShaperExpression:KeySelector: t.title, ElementSelector:EntityShaperExpression: EntityType: Project ValueBufferExpression: ProjectionBindingExpression: EmptyProjectionMember IsNullable: False .FirstOrDefault()' could not be translated. Either rewrite the query in a form that can be translated, or switch to client evaluation explicitly by inserting a call to 'AsEnumerable', 'AsAsyncEnumerable', 'ToList', or 'ToListAsync'.
I know there are multiple way to client projection but I am searching for most efficient way to search.
Most likely that LINQ query couldn't be translated in EF Core 2.2 either, because of some limitations that the GroupBy operator has.
From the docs:
Since no database structure can represent an IGrouping, GroupBy operators have no translation in most cases. When an aggregate operator is applied to each group, which returns a scalar, it can be translated to SQL GROUP BY in relational databases. The SQL GROUP BY is restrictive too. It requires you to group only by scalar values. The projection can only contain grouping key columns or any aggregate applied over a column.
What happened in EF Core 2.x is that whenever it couldn't translate an expression, it would automatically switch to client evaluation and give just a warning.
This is listed as the breaking change with highest impact when migrating to EF Core >= 3.x :
Old behavior
Before 3.0, when EF Core couldn't convert an expression that was part of a query to either SQL or a parameter, it automatically evaluated the expression on the client. By default, client evaluation of potentially expensive expressions only triggered a warning.
New behavior
Starting with 3.0, EF Core only allows expressions in the top-level projection (the last Select() call in the query) to be evaluated on the client. When expressions in any other part of the query can't be converted to either SQL or a parameter, an exception is thrown.
So if the performance of that expression was good enough when using EF Core 2.x, it will be as good as before if you decide to explicitly switch to client evaluation when using EF Core 5.x. That's because both are client evaluated, before and now, with the only difference being that you have to be explicit about it now. So the easy way out, if the performance was acceptable previously, would be to just client evaluate the last part of the query using .AsEnumerable() or .ToList().
If client evaluation performance is not acceptable (which will imply that it wasn't before the migration either) then you have to rewrite the query. There are a couple of answers by Ivan Stoev that might get you inspired.
I am a little confused by the description of what you want to achieve: I need to find the products with unique title for a particular category and the code you posted, since I believe it's not doing what you explained. In any case, I will provide possible solutions for both interpretations.
This is my attempt of writing a query to find the products with unique title for a particular category.
var uniqueProductTitlesForCategoryQueryable = currentContext.Products
.Where(x => x.CategoryId == categoryId)
.GroupBy(x => x.Title)
.Where(x => x.Count() == 1)
.Select(x => x.Key); // Key being the title
var productsWithUniqueTitleForCategory = currentContext.Products
.Where(x => x.CategoryId == categoryId)
.Where(x => uniqueProductTitlesForCategoryQueryable .Contains(x.Title))
.Select(x => new ProductViewModel
{
Id = x.Id,
Title = x.Title,
CategoryId= x.CategoryId
}).ToList();
And this is my attempt of rewriting the query you posted:
currentContext.Products
.Select(product => product.Title)
.Distinct()
.SelectMany(uniqueTitle => currentContext.Products.Where(product => product.Title == uniqueTitle ).Take(1))
.Select(product => new ProductViewModel
{
Id = product.Id,
Title = product.Title,
CategoryId= product.CategoryId
})
.ToList();
I am getting the distinct titles in the Product table and per each distinct title I get the first Product that matches it (that should be equivalent as GroupBy(x => x.Title)+ FirstOrDefault AFAIK). You could add some sorting before the Take(1) if needed.
You can use Join for this query as below :
currentContext.Products
.GroupBy(x => x.Title)
.Select(x => new ProductViewModel()
{
Title = x.Key,
Id = x.Min(b => b.Id)
})
.Join(currentContext.Products, a => a.Id, b => b.Id,
(a, b) => new ProductViewModel()
{
Id = a.Id,
Title = a.Title,
CategoryId = b.CategoryId
}).ToList();
If you watch or log translated SQL query, it would be as below:
SELECT [t].[Title], [t].[c] AS [Id], [p0].[CategoryId] AS [CategoryId]
FROM (
SELECT [p].[Title], MIN([p].[Id]) AS [c]
FROM [Product].[Products] AS [p]
GROUP BY [p].[Title]
) AS [t]
INNER JOIN [Product].[Products] AS [p0] ON [t].[c] = [p0].[Id]
As you can see, the entire query is translated into one SQL query and it is highly efficient because GroupBy operation is being performed in database and no additional record is fetched by the client.
As mentioned by Ivan Stoev, EFC 2.x just silently loads full table to the client side and then apply needed logic for extracting needed result. It is resource consuming way and thanks that EFC team uncovered such potential harmful queries.
Most effective way is already known - raw SQL and window functions. SO is full of answers like this.
SELECT
s.Id,
s.Title,
s.CategoryId
FROM
(SELECT
ROW_NUMBER() OVER (PARTITION BY p.Title ORDER BY p.Id) AS RN,
p.*
FROM Products p) s
WHERE s.RN = 1
Not sure that EFC team will invent universal algorithm for generating such SQL in nearest future, but for special edge cases it is doable and maybe it is their plan to do that for EFC 6.0
Anyway if performance and LINQ is priority for such question, I suggest to try our adaptation of linq2db ORM for EF Core projects: linq2db.EntityFrameworkCore
And you can get desired result without leaving LINQ:
urrentContext.Products
.Select(x => new
{
Product = x,
RN = Sql.Ext.RowNumber().Over()
.PartitionBy(x.Title)
.OrderBy(x.Id)
.ToValue()
})
.Where(x => x.RN == 1)
.Select(x => x.Product)
.Select(x => new ProductViewModel
{
Id = x.Id,
Title = x.Title,
CategoryId = x.CategoryId
})
.ToLinqToDB()
.ToList();
Short answer is you deal with breaking changes in EF Core versions.
You should consider the total API and behavior changes for migration from 2.2 to 5.0 as I provided bellow:
Breaking changes included in EF Core 3.x
Breaking changes in EF Core 5.0
You may face other problems to write valid expressions using the newer version. In my opinion, upgrading to a newer version is not important itself. This is important to know how to work with a specific version.
You should use .GroupBy() AFTER materialization. Unfortunately, EF core doesn't support GROUP BY. In version 3 they introduced strict queries which means you can not execute IQeuriables that can't be converted to SQL unless you disable this configuration (which is not recommended). Also, I'm not sure what are you trying to get with GroupBy() and how it will influence your final result. Anyway, I suggest you upgrade your query like this:
currentContext.Products
.Select(x=> new {
x.Id,
x.Title,
x.Category
})
.ToList()
.GroupBy(x=> x.Title)
.Select(x => new Wrapper
{
ProductsTitle = x.Key,
Products = x.Select(p=> new ProductViewModel{
Id = p.Id,
Title = p.Title,
CategoryId= p.CategoryId
}).ToList()
}).ToList();

GroupJoin: exception thrown: System.InvalidOperationException

I'm trying to write a query to get all restaurant tables if exists or not a opened sale on it.
if a sale exists on a table I want to get the sum and couple details.that is my code:
db.SALETABLES
.GroupJoin(
db.SALES.Where(c => c.CLOSEDTIME == null),
t => t.ID,
sa => sa.ID_TABLE,
(ta, s) => new
{
ta.ID,
ta.DESCRIPTION,
NR_SALE = s.Any() ? s.First().NR_SALE : 0,
IDSALE = s.Any() ? s.First().ID : 0,
IDUSER = s.Any() ? s.First().IDUSER : 0,
USERNAME = s.Any() ? s.First().USERS.USERNAME :"" ,
SALESUM = s.Any() ? s.First().SALES_DETAIL.Sum(p => p.PRICE * p.CANT) : 0
}
but got this error:
Exception thrown: 'System.InvalidOperationException' in
System.Private.CoreLib.dll
thanks for any help
You don't specify the exception, but I assume it's about client-side evaluation (CSE), and you configured EF to throw an exception when it occurs.
It may be First() that triggers CSE, or GroupJoin. The former can easily be fixed by using FirstOrDefault(). The GroupJoin has more to it.
In many cases it isn't necessary to use GroupJoin at all, of Join, for that matter. Usually, manually coded joins can and should be replaced by navigation properties. That doesn't only make the code better readable, but also avoids a couple of issues EF 2.x has with GroupJoin.
Your SaleTable class (I'm not gonna follow your database-driven names) should have a property Sales:
public ICollection<Sale> Sales { get; set; }
And if you like, Sale could have the inverse navigation property:
public SaleTable SaleTable { get; set; }
Configured as
modelBuilder.Entity<SaleTable>()
.HasMany(e => e.Sales)
.WithOne(e => e.SaleTable)
.HasForeignKey(e => e.SaleTableId) // map this to ID_TABLE
.IsRequired();
Now using a table's Sales property will have the same effect as GroupJoin —a unique key, here a SaleTable, with an owned collection— but without the issues.
The next improvement is to simplify the query. In two ways. 1. You repeatedly access the first Sale, so use the let statement. 2. The query is translated into SQL, so don't worry about null references, but do prepare for null values. The improved query will clarify what I mean.
var query = from st in db.SaleTables
let firstSale = st.Sales.FirstOrDefault()
select new
{
st.ID,
NrSale = (int?)firstSale.NrSale ?? 0,
IdSale = (int?)firstSale.ID ?? 0,
...
SalesSum = (int?)firstSale.SalesDetails.Sum(p => p.Price * p.Cant) ?? 0
}
Using NrSale = firstSale.NrSale, would throw an exception for SaleTables without Sales (Nullable object must have a value).
Since the exception is by the EF Core infrastructure, apparently you are hitting current EF Core implementation bug.
But you can help EF Core query translator (thus avoiding their bugs caused by missing use cases) by following some rules when writing your LINQ to Entities queries. These rules will also eliminate in most of the cases the client evaluation of the query (or exception in EF Core 3.0+).
One of the rules which is the origin of issues with this specific query is - never use First. The LINQ to Objects behavior of First is to throw exception if the set is empty. This is not natural for SQL which naturally supports and returns NULL even for values which normally do not allow NULL. In order to emulate the LINQ to Objects behavior, EF Core has to evaluate First() client side, which is not good even if it works. Instead, use FirstOrDefault() which has the same semantics as SQL, hence is translated.
To recap, use FirstOrDefault() when you need the result to be a single "object" or null, or Take(1) when you want the result to be a set with 0 or one elements.
In this particular case, it's better to incorporate the 0 or 1 related SALE rule directly into the join subquery, by removing the GroupJoin and replacing it with SelectMany with correlated Where. And the Any() checks are replaced with != null checks.
With that said, the modified working and fully server translated query looks like this:
var query = db.SALETABLES
.SelectMany(ta => db.SALES
.Where(s => ta.ID == s.ID_TABLE && s.CLOSEDTIME == null).Take(1), // <--
(ta, s) => new
{
ta.ID,
ta.DESCRIPTION,
NR_SALE = s != null ? s.NR_SALE : 0,
IDSALE = s != null ? s.ID : 0,
IDUSER = s != null ? s.IDUSER : 0,
USERNAME = s != null ? s.USERS.USERNAME : "",
SALESUM = s != null ? s.SALES_DETAIL.Sum(p => p.PRICE * p.CANT) : 0
});

Why do I have to Include other entities into my Linq query?

In the below query, why do I have to include the related entities in my query to get a value for them? I mean why Lazy-loading does not seem to work and do I have to do Eager-loading instead?
var acceptedHitchRequest = await _acceptedRequestRepository.GetAll()
.Include(p => p.HitchRequest)
.Include(p => p.CarparkRequest)
.Include(p => p.HitchRequest.User)
.Include(p => p.CarparkRequest.User)
.Where(p => (input.HitchRequestId.HasValue ? p.HitchRequest.Id == input.HitchRequestId : p.CarparkRequest.Id == input.CarparkRequestId)
&& p.IsActive).FirstOrDefaultAsync();
if (input.HitchRequestId.HasValue && acceptedHitchRequest.HitchRequest.CreatorUserId == AbpSession.UserId)
The CreatorUserId in the if condition would throw an exception because the HitchRequest would be null if I were not using the Include().
Inclue() method provides eager loading instead of lazy loading. I'm explaining to you the difference between the two based on my knowledge.
Lazy loading. It gives you records only for the entity itself and one each time that related data (in your case HitchRequest) for
the entity must be retrieved. The DbContext class gives you lazy
loading by default.
Eager loading. When the entity is read, related data is retrieved along with it. This typically results in a single join query that
retrieves all of the data that's needed. You specify eager loading by
using the Include method.
The first statement without Include() is equivalent to the below statement, that's why HitchRequest is null if you don't use Include():
SELECT * FROM AcceptedRequest;
The statement which uses Include("HitchRequest.User") is equivalent to the below statement:
SELECT * FROM AcceptedRequest JOIN Orders ON AcceptedRequest.Id = HitchRequest.User.AcceptedRequestId;
You can refer to this very useful article.
Entity Framework Loading Related Entities, Eager Loading and Eager Loading in Entity Framework

Entity Framework DB select with Skip and Take fetches the entire table

I have the following EF code
Func<CxForumArticle, bool> whereClause = a => a.CreatedBy == authorId;
IEnumerable<CxForumArticle> articlesCol = ctx.Articles
.Where(whereClause)
.Where(a => a.PublishingStatus == EnPublishStatus.PUBLISHED)
.OrderByDescending(a => a.ModifiedOn).Skip(offset).Take(pageSize);
It produces the following SQL
SELECT
[Extent1].[ArticleId] AS [ArticleId],
[Extent1].[Alias] AS [Alias],
[Extent1].[MigratedId] AS [MigratedId],
[Extent1].[Title] AS [Title],
[Extent1].[Teaser] AS [Teaser],
[Extent1].[ClobId] AS [ClobId],
[Extent1].[UnifiedContentId] AS [UnifiedContentId],
[Extent1].[EditorComments] AS [EditorComments],
[Extent1].[CreatedOn] AS [CreatedOn],
[Extent1].[CreatedBy] AS [CreatedBy],
[Extent1].[ModifiedOn] AS [ModifiedOn],
[Extent1].[ModifiedBy] AS [ModifiedBy],
[Extent1].[PublishingStatus] AS [PublishingStatus]
FROM [dbo].[ForumArticle] AS [Extent1]
As you see, there is no ordering and paging in this SQL. So EF orders and pages data in memory.
This doesn't seem to be a good thing to do.
I read an article, claiming that I have to use expression in OrderBy clause. I did that
Func<CxForumArticle, bool> whereClause = a => a.CreatedBy == authorId;
Expression<Func<CxForumArticle, DateTime>> orderByFunc = a => a.ModifiedOn;
IEnumerable<CxForumArticle> articlesCol = ctx.Articles
.Where(whereClause)
.Where(a => a.PublishingStatus == EnPublishStatus.PUBLISHED)
.OrderByDescending(orderByFunc.Compile()).Skip(offset).Take(pageSize)
;
But I got the same result. Any ideas how can I force EF to sort and page the data in DB?
I know this is pretty old, but you actually need to have the whereClause use an expression as well. Entity Framework only knows how to convert expressions to SQL. If you use a delegate instead of an expression, it is going to query the entire table and then then filter using LINQ to Objects.

Filtering related entities in entity framework 6

I want to fetch the candidate and the work exp where it is not deleted. I am using repository pattern in my c# app mvc.
Kind of having trouble filtering the record and its related child entities
I have list of candidates which have collection of workexp kind of throws error saying cannot build expression from the body.
I tried putting out anonymous object but error still persist, but if I use a VM or DTO for returning the data the query works.
It's like EF doesn't like newing up of the existing entity within its current context.
var candidate = dbcontext.candidate
.where(c=>c.candiate.ID == id).include(c=>c.WorkExperience)
.select(e=>new candidate
{
WorkExperience = e.WorkExperience.where(k=>k.isdeleted==false).tolist()
});
Is there any workaround for this?
You cannot call ToList in the expression that is traslated to SQL. Alternatively, you can start you query from selecting from WorkExperience table. I'm not aware of the structure of your database, but something like this might work:
var candidate = dbcontext.WorkExperience
.Include(exp => exp.Candidate)
.Where(exp => exp.isdeleted == false && exp.Candidate.ID == id)
.GroupBy(exp => exp.Candidate)
.ToArray() //query actually gets executed and return grouped data from the DB
.Select(groped => new {
Candidate = grouped.Key,
Experience = grouped.ToArray()
});
var candidate =
from(dbcontext.candidate.Include(c=>c.WorkExperience)
where(c=>c.candiate.ID == id)
select c).ToList().Select(cand => new candidate{WorkExperience = cand.WorkExperience.where(k=>k.isdeleted==false).tolist()});