I'm finding that EF6 is generating different SQL when I need to get a count from a table vs a view.
Let's assume I have a table in SQL Server called Students. Assume there's a view that filters that table; we'll call it CurrentStudents.
When I use EF6 to get the count of the table, the query that EF creates is quite different from what it generates when I query the view count. I'm trying to understand why it does it differently.
I have example code below.
Looking at the table count code, we have:
context.Students.Where(x => x.Status == 1).Count()
That generates the EF query:
SELECT [GroupBy1].[A1] AS [C1]
FROM (SELECT
COUNT(1) AS [A1]
FROM [STUDENTS] AS [Extent1]
WHERE 1 = [Extent1].STATUS)
AS [GroupBy1]
EF generates the Count SQL and wraps it in another SELECT. No problem with that.
Now contrast that with changing the code to count rows from the view instead:
context.CurrentStudents.Where(x => x.Status == 1).Count()
Generated view query:
SELECT [GroupBy1].[A1] AS [C1]
FROM (SELECT
COUNT(1) AS [A1]
FROM (SELECT [STUDENTS].[ID] as [ID],
[STUDENTS].[STATUS] AS [STATUS],
[STUDENTS].[FIRSTNAME] AS [FIRSTNAME],
[STUDENTS].[LASTNAME] AS [LASTNAME],
[STUDENTS].[AGE] AS [AGE],
[STUDENTS].[MAJOR] AS [MAJOR],
[STUDENTS].[MINOR] AS [MINOR],
[STUDENTS].[DORM] AS [DORM],
[STUDENTS].[GRADDATE] AS [GRADDATE]
FROM [CURRENTSTUDENTS] AS [CURRENTSTUDENTS]
) AS [Extent1]
WHERE 1 = [Extent1].STATUS)
AS [GroupBy1]
The EF query in this case has added another nested SELECT, which is selecting every column that is available in the view. I'm not sure why EF has decided to add that extra level of verbosity just because a view is the count source. Just wondering if anyone has a better understanding of what's going on behind the scenes here.
EF is producing multiple queries (n+1) instead of a single query with a subquery when the selection contains the entire element instead of just part of it.
Set up a project as per https://learn.microsoft.com/en-us/ef/core/get-started/aspnetcore/new-db?tabs=visual-studio
context.Blogs.Select(a => new { a.Url, a.Posts.Count }).ToList(); runs this
SELECT [a].[Url], (
SELECT COUNT(*)
FROM [Posts] AS [p]
WHERE [a].[BlogId] = [p].[BlogId]
) AS [Count]
FROM [Blogs] AS [a]
But
context.Blogs.Select(a => new { a, a.Posts.Count }).ToList(); runs this
SELECT [a].[BlogId], [a].[Url]
FROM [Blogs] AS [a];
exec sp_executesql N'SELECT COUNT(*)
FROM [Posts] AS [p0]
WHERE @_outer_BlogId = [p0].[BlogId]',N'@_outer_BlogId int',@_outer_BlogId=2
How can I rework the LINQ to select the entire Blog object without generating multiple queries? Using Include isn't helping, from what I can see.
I have this problem and reproduced it with AdventureWorks2008R2 to make it easier to demonstrate. Basically, I want to filter a parent table by a list of IN values, and I thought EF would generate this type of query, but it doesn't:
SELECT * FROM SalesOrderDetail d WHERE EXISTS (SELECT * FROM SalesOrderHeader h WHERE d.id = h.id AND h.rowguid IN ('asdf', 'fff', 'weee'))
Any ideas how to change the LINQ statement to query Header only once?
(ignore the fact I'm matching on Guids - it will actually be integers; I was just quickly looking for a 1-1 table in EF because that's when the problem occurs and I happened to find these)
var guidsToFind = new Guid[] { Guid.NewGuid(), Guid.NewGuid(), Guid.NewGuid()};
AdventureWorks2008R2Entities context = new AdventureWorks2008R2Entities();
var g = context.People.Where(p => guidsToFind.Contains(p.BusinessEntity.rowguid)).ToList();
That produces the following more expensive query:
SELECT [Extent1].[BusinessEntityID] AS [BusinessEntityID],
[Extent1].[PersonType] AS [PersonType],
[Extent1].[NameStyle] AS [NameStyle],
[Extent1].[Title] AS [Title],
[Extent1].[FirstName] AS [FirstName],
[Extent1].[MiddleName] AS [MiddleName],
[Extent1].[LastName] AS [LastName],
[Extent1].[Suffix] AS [Suffix],
[Extent1].[EmailPromotion] AS [EmailPromotion],
[Extent1].[AdditionalContactInfo] AS [AdditionalContactInfo],
[Extent1].[Demographics] AS [Demographics],
[Extent1].[rowguid] AS [rowguid],
[Extent1].[ModifiedDate] AS [ModifiedDate]
FROM [Person].[Person] AS [Extent1]
INNER JOIN [Person].[BusinessEntity] AS [Extent2] ON [Extent1].[BusinessEntityID] = [Extent2].[BusinessEntityID]
LEFT OUTER JOIN [Person].[BusinessEntity] AS [Extent3] ON [Extent1].[BusinessEntityID] = [Extent3].[BusinessEntityID]
WHERE [Extent2].[rowguid] = cast('b95b63f9-6304-4626-8e70-0bd2b73b6b0f' as uniqueidentifier) OR [Extent3].[rowguid] IN (cast('f917a037-b86b-4911-95f4-4afc17433086' as uniqueidentifier),cast('3188557d-5df9-40b3-90ae-f83deee2be05' as uniqueidentifier))
Really odd. Looks like a LINQ limitation.
I don't have a system to try this on right now, but suppose you first get a list of BusinessEntityId values based on the provided guids and then get the persons like this:
var g = context.People.Where(p => businessEntityIdList.Contains(p.BusinessEntityId)).ToList();
Then there should be no reason for the additional unnecessary joins anymore.
If that works, you can try to combine the two steps into one LINQ expression to see if the separation stays intact.
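The two-step idea above could be sketched roughly like this. It is only an illustration under assumptions: the Person and BusinessEntity classes are simplified stand-ins (not the real AdventureWorks model), and the demo runs against in-memory lists via AsQueryable, so it shows the shape of the LINQ rather than the SQL EF would generate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-ins for the mapped entities.
public class BusinessEntity
{
    public int BusinessEntityID { get; set; }
    public Guid Rowguid { get; set; }
}

public class Person
{
    public int BusinessEntityID { get; set; }
    public string FirstName { get; set; }
}

public static class TwoStepLookup
{
    public static List<Person> Find(
        IQueryable<BusinessEntity> entities,
        IQueryable<Person> people,
        IEnumerable<Guid> guidsToFind)
    {
        // Step 1: resolve the guids to keys, touching only BusinessEntity.
        List<int> businessEntityIdList = entities
            .Where(e => guidsToFind.Contains(e.Rowguid))
            .Select(e => e.BusinessEntityID)
            .ToList();

        // Step 2: filter People by key only, so no navigation property is involved.
        return people
            .Where(p => businessEntityIdList.Contains(p.BusinessEntityID))
            .ToList();
    }

    public static void Main()
    {
        Guid a = Guid.NewGuid(), b = Guid.NewGuid(), c = Guid.NewGuid();
        var entities = new List<BusinessEntity>
        {
            new BusinessEntity { BusinessEntityID = 1, Rowguid = a },
            new BusinessEntity { BusinessEntityID = 2, Rowguid = b },
            new BusinessEntity { BusinessEntityID = 3, Rowguid = c },
        }.AsQueryable();
        var people = new List<Person>
        {
            new Person { BusinessEntityID = 1, FirstName = "Ann" },
            new Person { BusinessEntityID = 2, FirstName = "Bob" },
            new Person { BusinessEntityID = 3, FirstName = "Cid" },
        }.AsQueryable();

        var found = Find(entities, people, new List<Guid> { a, c });
        Console.WriteLine(string.Join(",", found.Select(p => p.FirstName))); // Ann,Cid
    }
}
```

The cost is a second round trip for the key list, so it is worth checking the generated SQL in a profiler before settling on it.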
I am trying to get the entities of my table grouped by an ID and filtered by MAX date, but I have not been able to do it with Entity Framework. This is the SQL I want to reproduce:
SELECT * FROM My_table UI
INNER JOIN
(
SELECT idgroup, MAX(filter_date) AS DateFilter FROM My_table
WHERE license = 1
GROUP BY idgroup
) T
ON T.idgroup= UI.idgroup AND T.DateFilter = UI.filter_date
WHERE license = 1
My current solution is a stored procedure. I think this should be possible with EF, but I do not know how to do it.
Does anyone know how to write this as an EF query?
My_table is already mapped as an EF entity in my project.
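One way the SQL above might be expressed in LINQ is a GroupBy for the derived table, joined back on both columns. This is a sketch under assumptions: the MyRow class and its property names are hypothetical (they just mirror the SQL columns), and the demo runs in memory through AsQueryable, so whether EF 4 translates it into the same SQL shape has not been verified here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity shape; property names mirror the SQL columns.
public class MyRow
{
    public int Id { get; set; }
    public int IdGroup { get; set; }
    public DateTime FilterDate { get; set; }
    public int License { get; set; }
}

public static class LatestPerGroup
{
    public static List<MyRow> Query(IQueryable<MyRow> table)
    {
        var licensed = table.Where(r => r.License == 1);

        // Derived table T: the latest FilterDate per IdGroup.
        var latest = licensed
            .GroupBy(r => r.IdGroup)
            .Select(g => new { IdGroup = g.Key, DateFilter = g.Max(r => r.FilterDate) });

        // Join back on both columns, as in the ON clause of the SQL.
        return licensed
            .Join(latest,
                  r => new { r.IdGroup, DateFilter = r.FilterDate },
                  t => new { t.IdGroup, t.DateFilter },
                  (r, t) => r)
            .ToList();
    }

    public static void Main()
    {
        var rows = new List<MyRow>
        {
            new MyRow { Id = 1, IdGroup = 1, FilterDate = new DateTime(2020, 1, 1),  License = 1 },
            new MyRow { Id = 2, IdGroup = 1, FilterDate = new DateTime(2020, 2, 1),  License = 1 },
            new MyRow { Id = 3, IdGroup = 2, FilterDate = new DateTime(2020, 1, 15), License = 1 },
            new MyRow { Id = 4, IdGroup = 2, FilterDate = new DateTime(2020, 3, 1),  License = 0 },
        }.AsQueryable();

        var result = Query(rows);
        Console.WriteLine(string.Join(",", result.Select(r => r.Id))); // 2,3
    }
}
```

Row 4 has the latest date in its group but is excluded by the license filter, which matches the WHERE license = 1 in both the inner and outer SQL queries.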
I am using EF4.0, and I wrote a query:
var query = context.Post.Where(p => p.Id == postId).SingleOrDefault();
I need only one post from this query. I thought SingleOrDefault() would generate "SELECT TOP(1) ...", but when I looked in SQL Profiler, it was:
exec sp_executesql N'SELECT TOP (2)
[Extent1].[Id] AS [Id],
[Extent1].[Title] AS [Title],
[Extent1].[Slug] AS [Slug],
[Extent1].[PubDate] AS [PubDate],
[Extent1].[PostContent] AS [PostContent],
[Extent1].[Author] AS [Author],
[Extent1].[CommentEnabled] AS [CommentEnabled],
[Extent1].[AttachmentId] AS [AttachmentId],
[Extent1].[IsPublished] AS [IsPublished],
[Extent1].[Hits] AS [Hits],
[Extent1].[CategoryId] AS [CategoryId]
FROM [dbo].[Post] AS [Extent1]
WHERE [Extent1].[Id] = @p__linq__0',N'@p__linq__0 uniqueidentifier',@p__linq__0='ECD9F3BE-3CA9-462E-AE79-2B28C8A16E32'
Why does EF generate SELECT TOP (2)? I only need one post.
It selects the top 2 so that an exception can be thrown if there are actually two or more matching records in the database. If it only selected the top 1, there would be no way to detect that and error out.
By asking for the SingleOrDefault of a sequence, you are asking for this behaviour:
if the sequence has exactly 0 elements, return the default for the sequence's element type
if the sequence has exactly 1 element, return the element
if the sequence has more than 1 element, throw
Doing a TOP (1) would support the first two parts of this, but not the third. Only by doing a TOP (2) can we differentiate between exactly 1 record and more than 1 record.
If you don't want or need the third part of the above behaviour, use FirstOrDefault instead.
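To make the three cases concrete, here is a small in-memory sketch. SingleOrDefaultViaTopTwo is a hypothetical helper written for this illustration (it is not how EF is implemented internally); it only shows why fetching two rows is enough to tell "exactly one" from "more than one".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SingleDemo
{
    // Fetch at most two items (the TOP (2) analogue), then decide between
    // "none", "exactly one" and "more than one".
    public static T SingleOrDefaultViaTopTwo<T>(IEnumerable<T> source)
    {
        List<T> firstTwo = source.Take(2).ToList();
        if (firstTwo.Count == 0) return default(T);
        if (firstTwo.Count == 1) return firstTwo[0];
        throw new InvalidOperationException("Sequence contains more than one element");
    }

    public static void Main()
    {
        var none = new string[0];
        var one = new[] { "post-a" };
        var two = new[] { "post-a", "post-b" };

        Console.WriteLine(SingleOrDefaultViaTopTwo(none) == null); // True
        Console.WriteLine(SingleOrDefaultViaTopTwo(one));          // post-a

        // FirstOrDefault never needs to look past the first item.
        Console.WriteLine(two.FirstOrDefault());                   // post-a

        try
        {
            SingleOrDefaultViaTopTwo(two);
        }
        catch (InvalidOperationException)
        {
            Console.WriteLine("threw");                            // threw
        }
    }
}
```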
I have two entities, User and UserPermission. The User entity contains all your normal fields, Id, Username, Email, etc and the UserPermission entity has two values, UserId and PermissionId. I have written a repository method GetUserWithPermissions that originally utilized the Include extension and did something like this:
return dbContext.Users.Include(u => u.UserPermissions).Where(u => u.Username.Equals(username)).FirstOrDefault();
It works great, but the issue is that there are going to be a bunch of UserPermission entities associated with any given user, and using the Include extension essentially just flattens the two tables into one, so ALL of the user fields are repeated for every single UserPermission associated with a User. The returned data looks something like this:
Id  Username  Email           ...  PermissionId
1   johndoe   john@email.com       1
1   johndoe   john@email.com       2
1   johndoe   john@email.com       3
1   johndoe   john@email.com       4
1   johndoe   john@email.com       5
1   johndoe   john@email.com       6
1   johndoe   john@email.com       7
The only difference between each row is the last column PermissionId. If we have 50 permissions defined for the user, that is a large chunk of repeated data being returned when I do not think it is necessary. Obviously my other option is to do something like this:
User user = dbContext.Users.Where(u => u.Username.Equals(username)).FirstOrDefault();
if (user != null)
user.UserPermissions.ToList();
return user;
The above code accomplishes the same thing with drastically less data being returned, but with the trade-off that two trips are made to the database.
Which method is better? Returning a lot of repeated data or making two trips to the database?
Here is the SQL query that Entity Framework generates for the Include version:
SELECT
[Project2].[Id] AS [Id],
[Project2].[Username] AS [Username],
[Project2].[LoweredUsername] AS [LoweredUsername],
[Project2].[CompanyId] AS [CompanyId],
[Project2].[FirstName] AS [FirstName],
[Project2].[LastName] AS [LastName],
[Project2].[Email] AS [Email],
[Project2].[C1] AS [C1],
[Project2].[UserId] AS [UserId],
[Project2].[PermissionValue] AS [PermissionValue]
FROM ( SELECT
[Limit1].[Id] AS [Id],
[Limit1].[Username] AS [Username],
[Limit1].[LoweredUsername] AS [LoweredUsername],
[Limit1].[CompanyId] AS [CompanyId],
[Limit1].[FirstName] AS [FirstName],
[Limit1].[LastName] AS [LastName],
[Limit1].[Email] AS [Email],
[Extent2].[UserId] AS [UserId],
[Extent2].[PermissionValue] AS [PermissionValue],
CASE WHEN ([Extent2].[PermissionValue] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C1]
FROM (SELECT TOP (1)
[Extent1].[Id] AS [Id],
[Extent1].[Username] AS [Username],
[Extent1].[LoweredUsername] AS [LoweredUsername],
[Extent1].[CompanyId] AS [CompanyId],
[Extent1].[FirstName] AS [FirstName],
[Extent1].[LastName] AS [LastName],
[Extent1].[Email] AS [Email]
FROM [dbo].[Users] AS [Extent1]
WHERE [Extent1].[LoweredUsername] = (LOWER(LTRIM(RTRIM(@p__linq__0)))) ) AS [Limit1]
LEFT OUTER JOIN [dbo].[UserPermissions] AS [Extent2] ON [Limit1].[Id] = [Extent2].[UserId]
) AS [Project2]
ORDER BY [Project2].[Id] ASC, [Project2].[C1] ASC
Thanks
Nick
That's simply how it works. Including a collection does indeed duplicate the columns of the parent entity (for a great example and explanation, see the question "How many Include I can use on ObjectSet in EntityFramework to retain performance?").
And you have a trade-off without a general rule which way is better: One roundtrip with Include but duplicated data or two roundtrips without duplicated data. What is better/more performant? I think you have to measure it case by case if you want an exact answer.
I could imagine that as a rule of thumb we could say: if the parent has many columns, the child collection has only a few, and the child collection can possibly be very long, then this is a candidate for preferring two roundtrips to avoid the data duplication.
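As a rough back-of-the-envelope illustration of that rule of thumb, the number of cells returned can be compared for both approaches. This is a simplification under stated assumptions: it counts cells only, ignoring column widths, the extra computed column in the generated SQL, and the latency of the second roundtrip. The column counts (7 user columns, 2 permission columns) are taken from the generated query in the question; the helper names are made up for this sketch.

```csharp
using System;

public static class IncludeCost
{
    // One joined query: every returned row carries parent and child columns.
    public static long JoinedCells(int parentColumns, int childColumns, int childRows)
    {
        // A LEFT OUTER JOIN still yields one row when there are no children.
        int rows = Math.Max(childRows, 1);
        return (long)rows * (parentColumns + childColumns);
    }

    // Two queries: the parent row once, then only the child rows.
    public static long TwoQueryCells(int parentColumns, int childColumns, int childRows)
    {
        return parentColumns + (long)childRows * childColumns;
    }

    public static void Main()
    {
        // 7 user columns, 2 permission columns, 50 permissions for the user.
        Console.WriteLine(JoinedCells(7, 2, 50));   // 450
        Console.WriteLine(TwoQueryCells(7, 2, 50)); // 107
    }
}
```

The gap grows with the number of parent columns and child rows, which is exactly the situation the rule of thumb describes. Actual timings still have to be measured case by case.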
If you don't want eager loading with Include, you can either rely on lazy loading or use explicit loading:
User user = dbContext.Users.Where(u => u.Username.Equals(username))
.FirstOrDefault();
if (user != null)
dbContext.Entry(user).Collection(u => u.UserPermissions).Load();
return user;
Just wondering if you could do a query that selects a new object with the permissions as a list.
This is untested and was not written in an editor, so I have not checked that it compiles (modify as you need if you try it ;) ):
var userinfo = dbContext.Users
    .Where(u => u.Username.Equals(username))
    .Select(u => new { User = u, Permissions = u.UserPermissions.ToList() });
Just a quick shot from the hip. An idea to consider?
I asked a similar question. There are some suggestions there on how to limit the duplication, but I guess it would be difficult to make Entity Framework generate those queries.