Selecting Grouped Items in EF Core - entity-framework

This worked in EF 6.4:
from a in Addresses
group a by new {a.StreetName, a.StreetNumber} into agrp
where agrp.Count() > 3
from aitem in agrp
select aitem
In EF Core 5 I get:
InvalidOperationException: The LINQ expression 'agrp => agrp' could
not be translated. Either rewrite the query in a form that can be
translated, or switch to client evaluation explicitly by inserting a
call to 'AsEnumerable', 'AsAsyncEnumerable', 'ToList', or
'ToListAsync'. See https://go.microsoft.com/fwlink/?linkid=2101038 for
more information.
Why? Is there a different way to write this?

If you're OK with the data being loaded into memory, a simple solution could be to add .ToList() or .AsEnumerable() after Addresses:
from a in Addresses.ToList() // or .AsEnumerable()
group a by new {a.StreetName, a.StreetNumber} into agrp
where agrp.Count() > 3
from aitem in agrp
select aitem
Note that this (in SqlServer) translates into:
SELECT [a].[Id], [a].[StreetName], [a].[StreetNumber]
FROM [Addresses] AS [a]
In EF Core, GroupBy is (in many cases) not translated to SQL, but is run in memory.
(To avoid accidentally loading a lot of data into memory, EF will throw an exception unless .ToList() or .AsEnumerable() is called to indicate that this is intentional.)
(...) Since no database structure can represent an IGrouping, GroupBy operators have no translation in most cases. When an aggregate operator is applied to each group, which returns a scalar, it can be translated to SQL GROUP BY in relational databases. (...)
- Complex query operators, GroupBy
The article also has an example of a query which translates into group by with a filter on Count (included below).
Unfortunately, the example doesn't fully cover the query in the question: it returns only the group key and count, not the matching Address objects.
var query = from p in context.Set<Post>()
group p by p.AuthorId into g
where g.Count() > 0
orderby g.Key
select new
{
g.Key,
Count = g.Count()
};
SELECT [p].[AuthorId] AS [Key], COUNT(*) AS [Count]
FROM [Posts] AS [p]
GROUP BY [p].[AuthorId]
HAVING COUNT(*) > 0
ORDER BY [p].[AuthorId]

This seems to work:
var addressGroupQuery = from a in Addresses
group a by new {a.StreetName, a.StreetType} into agrp
where agrp.Count() > 3
select agrp.Max(a => a.AddressID);
var addresses = from a in Addresses
where addressGroupQuery.Contains(a.AddressID)
select a;
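Note that addressGroupQuery remains a deferred query (no ToList), so EF composes it into the outer query as a single statement. Because it selects only Max(AddressID), the result is one address per qualifying group rather than every address in the group. It generates some clean-looking SQL too: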
SELECT [a].[AddressID], [a].[City], [a].[Country], [a].[StreetName], [a].[StreetNumber], [a].[StreetType], [a].[UnitNumber]
FROM [Address] AS [a]
WHERE EXISTS (
SELECT 1
FROM [Address] AS [a0]
GROUP BY [a0].[StreetName], [a0].[StreetType]
HAVING (COUNT(*) > 3) AND (MAX([a0].[AddressID]) = [a].[AddressID]))
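If every address in a qualifying group is needed, as in the original question, one possible shape is to select the keys of the large groups first and then join back to the addresses. The sketch below runs against a made-up in-memory array (LINQ to Objects) only to show that shape; the sample data and the names addresses and bigGroupKeys are assumptions, and whether EF Core translates the same shape into one SQL statement depends on the version and provider, so check the generated SQL.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-in for the Addresses set (sample data is made up).
var addresses = new[]
{
    new { AddressID = 1, StreetName = "Main", StreetNumber = 10 },
    new { AddressID = 2, StreetName = "Main", StreetNumber = 10 },
    new { AddressID = 3, StreetName = "Main", StreetNumber = 10 },
    new { AddressID = 4, StreetName = "Main", StreetNumber = 10 },
    new { AddressID = 5, StreetName = "Oak",  StreetNumber = 7 },
};

// Step 1: keys of the groups that contain more than 3 addresses.
var bigGroupKeys =
    from a in addresses
    group a by new { a.StreetName, a.StreetNumber } into agrp
    where agrp.Count() > 3
    select agrp.Key;

// Step 2: join back on the same composite key to get every member of those groups.
var result =
    (from a in addresses
     join k in bigGroupKeys
         on new { a.StreetName, a.StreetNumber } equals k
     select a).ToList();

foreach (var a in result)
    Console.WriteLine($"{a.AddressID}: {a.StreetName} {a.StreetNumber}");
```

With this sample data the four "Main 10" addresses are returned and the single "Oak 7" address is dropped.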

Related

Multiple queries being generated when entire element selected with efcore

EF is producing multiple queries (n+1) instead of a single query with a subquery when the selection contains the entire element instead of just part of it.
Set up a project as per https://learn.microsoft.com/en-us/ef/core/get-started/aspnetcore/new-db?tabs=visual-studio
context.Blogs.Select(a => new { a.Url, a.Posts.Count }).ToList(); runs this
SELECT [a].[Url], (
SELECT COUNT(*)
FROM [Posts] AS [p]
WHERE [a].[BlogId] = [p].[BlogId]
) AS [Count]
FROM [Blogs] AS [a]
But
context.Blogs.Select(a => new { a, a.Posts.Count }).ToList(); runs this
SELECT [a].[BlogId], [a].[Url]
FROM [Blogs] AS [a];
exec sp_executesql N'SELECT COUNT(*)
FROM [Posts] AS [p0]
WHERE @_outer_BlogId = [p0].[BlogId]',N'@_outer_BlogId int',@_outer_BlogId=2
How can I rework the LINQ to select the entire Blog object without generating multiple queries? Using Include isn't helping from what I can see.

linq minus of to sum values

I want the SQL query below to be written in Entity Framework or LINQ. Can anyone please help me with this?
SQL Query:
select sum(CreditAmount)-sum(DebitAmount)
from [dbo].[JournalEntries]
where FKSubscriberID =3 and FKAccountID =1
In general, assuming C#, convert SQL to LINQ by converting phrases in LINQ comprehension syntax order, and if SQL has table aliases, use them in LINQ as range variables. Functions such as DISTINCT or TOP are called as functions over the whole query at the end. Doing multiple SUM in one query without the optimization @JonSkeet suggests requires an empty group...by to create an aggregate and then First() to reduce to a value:
(from je in dbo.JournalEntries
where je.FKSubscriberID == 3 && je.FKAccountID == 1
group je by 1 into jeg
select jeg.Sum(je => je.CreditAmount) - jeg.Sum(je => je.DebitAmount)).First()
Without the single group, you can aggregate the difference with
(from je in dbo.JournalEntries
where je.FKSubscriberID == 3 && je.FKAccountID == 1
select je.CreditAmount-je.DebitAmount).Sum()
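As a quick check that the two shapes agree, here is a minimal sketch over a made-up in-memory array (LINQ to Objects). The sample rows and the name journalEntries are assumptions for illustration. FirstOrDefault() is used here instead of First() so that an empty filter result yields 0 rather than throwing when run in memory.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory rows standing in for JournalEntries (values are made up).
var journalEntries = new[]
{
    new { FKSubscriberID = 3, FKAccountID = 1, CreditAmount = 100m, DebitAmount = 40m },
    new { FKSubscriberID = 3, FKAccountID = 1, CreditAmount = 25m,  DebitAmount = 10m },
    new { FKSubscriberID = 3, FKAccountID = 2, CreditAmount = 999m, DebitAmount = 0m },
};

// Shape 1: a constant group key produces a single group, so both sums come from one query.
var viaGroup =
    (from je in journalEntries
     where je.FKSubscriberID == 3 && je.FKAccountID == 1
     group je by 1 into jeg
     select jeg.Sum(e => e.CreditAmount) - jeg.Sum(e => e.DebitAmount))
    .FirstOrDefault();

// Shape 2: subtract per row, then sum the differences.
var viaRowDifference =
    (from je in journalEntries
     where je.FKSubscriberID == 3 && je.FKAccountID == 1
     select je.CreditAmount - je.DebitAmount).Sum();

Console.WriteLine($"{viaGroup} {viaRowDifference}");
```

Both values are 75 for this data. If either column were nullable, the per-row subtraction could differ from the difference of sums, because a null on one side makes the whole row's difference null.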

Translate nested query SQL to EF

I have this table structure:
Classic many to many relationship. I want to get all the orders for products belonging to the category for a small number of products I provide. It may be easier to show the SQL that does exactly what I want:
select o.*
from [Order] o join Product p2 on o.FKCatalogNumber=p2.CatalogNumber
where p2.FKCategoryId IN
(select c.Id
from Category c join Product p1 on p1.FKCategoryId=c.Id
where p1.CatalogNumber in ('0001', '0002'))
This example gives me all the orders belonging to the categories that catalog #'s 0001 and 0002 are in.
But I am unable to wrap my head around the equivalent EF syntax for this query. I'm embarrassed to say I spent half the day on this. I bet it's easy for someone out there.
I came up with this but it's not working (and probably not even close):
string[] catNumbers = {"0001", "0002"};
var orders = ctx.Categories
.SelectMany(c => c.Products, (c, p) => new {c, p})
.Where(@t => catNumbers.Contains(@t.p.CatalogNumber))
.Select(@t => @t.p.Orders)
.ToList();
You can use query syntax (which looks very similar to SQL) in LINQ, so if you're more comfortable with SQL then you may prefer to write your query like this:
string[] catNumbers = {"0001", "0002"};
var orders = from o in ctx.Orders
join p2 in ctx.Products on o.FKCatalogNumber equals p2.CatalogNumber
where
(
from c in ctx.Categories
join p1 in ctx.Products on c.ID equals p1.FKCategoryId
where catNumbers.Contains(p1.CatalogNumber)
select c.ID
).Contains(p2.FKCategoryId)
select o;
As you can see, it's actually just your SQL query rearranged slightly, but it compiles as C#.
Note that:
the [Order] o syntax for referencing tables is replaced by o in ctx.Orders
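LINQ's join uses the equals keyword instead of =, and it enforces which way round the condition is written (the outer range variable on the left, the joined one on the right), so your on o.FKCatalogNumber=p2.CatalogNumber becomes on o.FKCatalogNumber equals p2.CatalogNumber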
instead of your where p2.FKCategoryId IN (...), the equivalent c# is (...).Contains(p2.FKCategoryId)
the select comes last, not first
but those are the only major changes. Otherwise, it's written just like SQL.
I'd also draw your attention to a distinction regarding this comment:
the equivalent EF syntax for this query
The syntax here isn't specific to EF, but is just LINQ (Language Integrated Query). It has two flavours: query syntax (sometimes called declarative) and method syntax (sometimes called fluent). LINQ works on just about any collection that implements IEnumerable or IQueryable, including EF's DbSet.
For more info on the different ways of querying, this MSDN page is a decent place to start. There's also this handy reference table showing the equivalent query syntax for each method-syntax operator, where applicable.
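To make the two flavours concrete, here is a small sketch that writes the same filter in query syntax and in method syntax. It runs against a made-up in-memory array rather than a DbSet; the products sample data is an assumption for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical sample data; in EF this would be a DbSet such as ctx.Products.
var products = new[]
{
    new { CatalogNumber = "0001", CategoryId = 1 },
    new { CatalogNumber = "0002", CategoryId = 2 },
    new { CatalogNumber = "0003", CategoryId = 1 },
    new { CatalogNumber = "0004", CategoryId = 3 },
};
string[] catNumbers = { "0001", "0002" };

// Query syntax: reads much like SQL, with select at the end.
var querySyntax =
    (from p in products
     where catNumbers.Contains(p.CatalogNumber)
     orderby p.CategoryId
     select p.CategoryId).ToList();

// Method syntax: the same operators as chained extension-method calls.
var methodSyntax = products
    .Where(p => catNumbers.Contains(p.CatalogNumber))
    .OrderBy(p => p.CategoryId)
    .Select(p => p.CategoryId)
    .ToList();

Console.WriteLine(querySyntax.SequenceEqual(methodSyntax)); // True
```

The compiler rewrites query syntax into the corresponding method calls, so choosing between them is a matter of readability.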
You can still nest queries in EF. The following looks like it works for me:
string[] catNumbers = {"0001", "0002"};
var orders = ctx.Orders
.Where(o => ctx.Products
.Where(p => catNumbers.Contains(p.CatalogNumber))
.Select(p => p.CategoryId)
.Contains(o.Product.CategoryId)
);
This produces the following SQL:
SELECT
[Extent1].[Id] AS [Id],
[Extent1].[CatalogNumber] AS [CatalogNumber]
FROM [dbo].[Orders] AS [Extent1]
WHERE EXISTS (SELECT
1 AS [C1]
FROM [dbo].[Products] AS [Extent2]
INNER JOIN [dbo].[Products] AS [Extent3] ON [Extent2].[CategoryId] = [Extent3].[CategoryId]
WHERE ([Extent1].[CatalogNumber] = [Extent3].[CatalogNumber]) AND ([Extent2].[CatalogNumber] IN (N'0001', N'0002'))
)

Entity Framework .Any does not generate expected SQL WHERE clause

Entity Framework and Linq-To-Entities are really giving me some headaches. I have a fairly simple query:
var result = feed.FeedItems.Any(ei => ei.ServerId == "12345");
feed is a single EF entity I selected earlier in a separate query from the same context.
But the generated SQL just throws away the .Any condition and requests all FeedItems of the feed object, which can be several thousand records and is a waste of network bandwidth. It seems the actual .Any comparison is done in C#:
exec sp_executesql N'SELECT [t0].[Id], [t0].[FeedId], [t0].[ServerId], [t0].[Published], [t0].[Inserted], [t0].[Title], [t0].[Content], [t0].[Author], [t0].[WebUri], [t0].[CommentsUri]
FROM [dbo].[FeedItem] AS [t0]
WHERE [t0].[FeedId] = @p0',N'@p0 int',@p0=3
I also tried:
!feed.FeedItems.Where(ei => ei.ServerId == "12345").Any();
But it doesn't change anything. Even removing Any() and querying for the complete list of items does not change the query.
I don't get it ... why isn't this working as I would expect? There should be a
WHERE ServerId == 1234
clause in the SQL statement.
Thanks very much for any help/clarification :)
As Nicholas already noticed, it looks like the query is executed by the FeedItems property itself (possibly because it returns a List or IEnumerable), so the whole list of items is loaded from the database. Any is then applied to that in-memory collection, which is why you don't see WHERE ServerId == 1234 in the SQL query.
When you apply Any to an IQueryable, the generated query looks like this:
SELECT
(CASE
WHEN EXISTS(
SELECT NULL AS [EMPTY]
FROM [dbo].[FeedItem] AS [t0]
WHERE [t0].[ServerId] = @p0
) THEN 1
ELSE 0
END) AS [value]
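The difference between the two cases can be seen without a database. In the sketch below (made-up in-memory data; the feedItems array and its property values are assumptions), Any on an IEnumerable receives a compiled delegate and can only walk objects that are already loaded, while the same predicate on an IQueryable is captured as an expression tree, which is what a provider such as EF needs in order to produce a WHERE or EXISTS clause.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical items standing in for the rows behind feed.FeedItems.
var feedItems = new[]
{
    new { FeedId = 3, ServerId = "12345" },
    new { FeedId = 3, ServerId = "67890" },
};

// IEnumerable<T>: the lambda is a delegate, evaluated over in-memory objects.
bool inMemory = feedItems.Any(fi => fi.ServerId == "12345");

// IQueryable<T>: the lambda is kept as an expression tree that a provider can inspect.
var queryable = feedItems.AsQueryable();
var filtered = queryable.Where(fi => fi.ServerId == "12345");
Console.WriteLine(filtered.Expression.NodeType); // Call

// Here the in-memory provider runs it; an EF provider would translate it to SQL instead.
bool viaQueryable = filtered.Any();
Console.WriteLine($"{inMemory} {viaQueryable}");
```

In EF terms this usually means starting the query from the context's queryable set and filtering by the feed's key, rather than calling Any on the already-loaded navigation collection.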

Convert SQL to LINQ, nested select, top, "distinct" using group by and multiple order bys

I have the following SQL query, which I'm struggling to convert to LINQ.
Purpose: Get the top 10 coupons from the table, ordered by the date they expire (i.e. list the ones that are about to expire first) and then randomly choosing one of those for publication.
Notes: Because of the way the database is structured, there may be duplicate Codes in the Coupon table. Therefore, I am using a GROUP BY to enforce distinctness, because I can't use DISTINCT in the sub select query (which I think is correct). The SQL query works.
SELECT TOP 1
c1.*
FROM
Coupon c1
WHERE
Code IN (
SELECT TOP 10
c2.Code
FROM
Coupon c2
WHERE
c2.Published = 0
GROUP BY
c2.Code,
c2.Expires
ORDER BY
c2.Expires
)
ORDER BY NEWID()
Update:
This is as close as I have got, but in two queries:
var result1 = (from c in Coupons
where c.Published == false
orderby c.Expires
group c by new { c.Code, c.Expires } into coupon
select coupon.FirstOrDefault()).Take(10);
var result2 = (from c in result1
orderby Guid.NewGuid()
select c).Take(1);
Here's one possible way:
from c in Coupons
from cs in
((from c in coupons
where c.published == false
select c).Distinct()
).Take(10)
where cs.ID == c.ID
select c
Keep in mind that LINQ creates a strongly-typed data set, so an IN statement has no general equivalent. I understand trying to keep the SQL tight, but LINQ may not be the best answer for this. If you are using MS SQL Server (not SQL Server Compact) you might want to consider doing this as a Stored Procedure.
Using MercurioJ's slightly buggy response in combination with another random-row solution suggested on SO, my solution was:
var result3 = (from c in _dataContext.Coupons
from cs in
((from c1 in _dataContext.Coupons
where
c1.IsPublished == false
select c1).Distinct()
).Take(10)
where cs.CouponId == c.CouponId
orderby _dataContext.NewId()
select c).Take(1);
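For comparison, the logic of the original SQL (distinct codes among unpublished coupons, the ten soonest to expire, then one random coupon with one of those codes) can be sketched in memory as follows. This is LINQ to Objects over generated sample data, not a translation that any particular provider is known to accept; the coupons array, its property names, and the soonestCodes variable are assumptions for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical sample data: 30 coupons sharing 15 codes, some already published.
var start = new DateTime(2024, 1, 1);
var coupons = Enumerable.Range(1, 30)
    .Select(i => new
    {
        CouponId = i,
        Code = "C" + (i % 15),
        Expires = start.AddDays(i % 15),
        Published = i % 4 == 0,
    })
    .ToArray();

// Inner query: group unpublished coupons by (Code, Expires) to remove duplicates,
// order by expiry date, and keep the first 10 codes.
var soonestCodes = coupons
    .Where(c => !c.Published)
    .GroupBy(c => new { c.Code, c.Expires })
    .OrderBy(g => g.Key.Expires)
    .Select(g => g.Key.Code)
    .Take(10)
    .ToList();

// Outer query: any coupon whose code is in that list, in random order, first one wins.
// As in the SQL, the outer query does not filter on Published again.
var pick = coupons
    .Where(c => soonestCodes.Contains(c.Code))
    .OrderBy(_ => Guid.NewGuid())
    .First();

Console.WriteLine($"{pick.CouponId} {pick.Code}");
```

The picked coupon varies between runs because of the random ordering.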