EFCore returning too many columns for a simple LEFT OUTER join - entity-framework

I am currently using EFCore 1.1 (preview release) with SQL Server.
I am doing what I thought was a simple OUTER JOIN between an Order and OrderItem table.
var orders = from order in ctx.Order
             join orderItem in ctx.OrderItem
                 on order.OrderId equals orderItem.OrderId into tmp
             from oi in tmp.DefaultIfEmpty()
             select new
             {
                 order.OrderDt,
                 Sku = (oi == null) ? null : oi.Sku,
                 Qty = (oi == null) ? (int?) null : oi.Qty
             };
The actual data returned is correct (I know earlier versions had issues with OUTER JOINS not working at all). However the SQL is horrible and includes every column in Order and OrderItem which is problematic considering one of them is a large XML Blob.
SELECT [order].[OrderId], [order].[OrderStatusTypeId],
[order].[OrderSummary], [order].[OrderTotal], [order].[OrderTypeId],
[order].[ParentFSPId], [order].[ParentOrderId],
[order].[PayPalECToken], [order].[PaymentFailureTypeId] ....
...[orderItem].[OrderId], [orderItem].[OrderItemType], [orderItem].[Qty],
[orderItem].[SKU] FROM [Order] AS [order] LEFT JOIN [OrderItem] AS
[orderItem] ON [order].[OrderId] = [orderItem].[OrderId] ORDER BY
[order].[OrderId]
(There are many more columns not shown here.)
On the other hand - if I make it an INNER JOIN then the SQL is as expected with only the columns in my select clause:
SELECT [order].[OrderDt], [orderItem].[SKU], [orderItem].[Qty] FROM
[Order] AS [order] INNER JOIN [OrderItem] AS [orderItem] ON
[order].[OrderId] = [orderItem].[OrderId]
I tried reverting to EFCore 1.0.1, but got some horrible NuGet package errors and gave up on that.
It's not clear whether this is an actual regression or an incomplete feature in EFCore, and I couldn't find any further information about it elsewhere.
Edit: EFCore 2.1 has addressed a lot of issues with grouping and also N+1 type issues where a separate query is made for every child entity. Very impressed with the performance in fact.
3/14/18 - 2.1 Preview 1 of EFCore isn't recommended because the GROUP BY SQL has some issues when using OrderBy() but it's fixed in nightly builds and Preview 2.

The following applies to EF Core 1.1.0 (release).
Although one shouldn't have to do such things, I tried several alternative query syntaxes (using a navigation property instead of a manual join, joining subqueries containing an anonymous type projection, using let / an intermediate Select, using Concat / Union to emulate a left join, the alternative left join syntax, etc.). The result was either the same as in the post, and/or more than one query being executed, and/or invalid SQL queries, and/or strange runtime exceptions like IndexOutOfRange, InvalidArgument, etc.
What I can say based on tests is that most likely the problem is related to bug(s) (regression, incomplete implementation - does it really matter) in GroupJoin translation. For instance, #7003: Wrong SQL generated for query with group join on a subquery that is not present in the final projection or #6647 - Left Join (GroupJoin) always materializes elements resulting in unnecessary data pulling etc.
Until it gets fixed (when?), as a (far from perfect) workaround I can suggest using the alternative left outer join syntax (from a in A from b in B.Where(b => b.Key == a.Key).DefaultIfEmpty()):
var orders = from o in ctx.Order
             from oi in ctx.OrderItem.Where(oi => oi.OrderId == o.OrderId).DefaultIfEmpty()
             select new
             {
                 OrderDt = o.OrderDt,
                 Sku = oi.Sku,
                 Qty = (int?)oi.Qty
             };
which produces the following SQL:
SELECT [o].[OrderDt], [t1].[Sku], [t1].[Qty]
FROM [Order] AS [o]
CROSS APPLY (
    SELECT [t0].*
    FROM (
        SELECT NULL AS [empty]
    ) AS [empty0]
    LEFT JOIN (
        SELECT [oi0].*
        FROM [OrderItem] AS [oi0]
        WHERE [oi0].[OrderId] = [o].[OrderId]
    ) AS [t0] ON 1 = 1
) AS [t1]
As you can see, the projection is OK, but instead of a LEFT JOIN it uses a strange CROSS APPLY, which might introduce another performance issue.
Also note that you have to use casts for value types and nothing for strings when accessing the right joined table as shown above. If you use null checks as in the original query, you'll get ArgumentNullException at runtime (yet another bug).

Using "into" will create a temporary identifier to store the results.
Reference: MSDN: into (C# Reference)
So removing the "into tmp from oi in tmp.DefaultIfEmpty()" will result in clean SQL with just the three columns. Note, however, that without the into clause the join is an INNER JOIN, so orders without any items are no longer returned, and the oi variable (and its null checks) no longer exists:
var orders = from order in ctx.Order
             join orderItem in ctx.OrderItem
                 on order.OrderId equals orderItem.OrderId
             select new
             {
                 order.OrderDt,
                 Sku = orderItem.Sku,
                 Qty = orderItem.Qty
             };
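Independently of the SQL that EF Core generates, it helps to check what dropping the into clause does to the result set. The following is only a LINQ to Objects sketch: the in-memory arrays and their values are made up for illustration, and it says nothing about the SQL translation, only about which rows each query shape returns.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-ins for the Order and OrderItem tables.
var orders = new[]
{
    new { OrderId = 1, OrderDt = new DateTime(2017, 1, 1) },
    new { OrderId = 2, OrderDt = new DateTime(2017, 1, 2) }, // has no items
};
var orderItems = new[]
{
    new { OrderId = 1, Sku = "ABC", Qty = 3 },
};

// Plain join (no "into"): behaves like an INNER JOIN, so order 2 is dropped.
var inner = (from o in orders
             join oi in orderItems on o.OrderId equals oi.OrderId
             select new { o.OrderId, oi.Sku, oi.Qty }).ToList();

// Group join + DefaultIfEmpty: behaves like a LEFT OUTER JOIN, so order 2 is kept.
var left = (from o in orders
            join oi in orderItems on o.OrderId equals oi.OrderId into tmp
            from item in tmp.DefaultIfEmpty()
            select new
            {
                o.OrderId,
                Sku = item?.Sku, // null when the order has no items
                Qty = item?.Qty  // int? because of the null-conditional access
            }).ToList();

Console.WriteLine($"inner join rows: {inner.Count}"); // 1
Console.WriteLine($"left join rows:  {left.Count}");  // 2
```

So the shorter query is only a substitute when every order is guaranteed to have at least one item.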

Related

How do I return only the first object in an HQL cross join?

I have the following HQL
from
com.kable.web.allotment.model.Issue i
inner join fetch i.title
inner join fetch i.title.magazine
inner join fetch i.barcodes bcs
, Wholesaler w
LEFT join fetch w.localCurrencies c
inner join fetch w.location
where
w.id = :wholesalerId
and i.title.id = :titleid
and i.distributionStatus = :status
and (
(
i.distributionDate is null
and i.onSaleDate >= TRUNC(CURRENT_DATE)
)
or i.distributionDate >= TRUNC(CURRENT_DATE)
)
and bcs.type.id = w.location.id
and (bcs.localCurrency.id = c.localCurrencyType.id OR c.localCurrencyType.id IS NULL)
and i.onSaleDate BETWEEN COALESCE(c.effectiveDate, i.onSaleDate) and COALESCE(c.expirationDate, i.onSaleDate)
order by
i.distributionDate
, i.onSaleDate
All of my previously written code expects to get a List<Issue> back, but with the query above I am also getting the wholesaler and its joins. In my results, I only want Issue, Title, Magazine, and Barcodes. I am using Hibernate version 4.2.18.Final. How do I return only the first object graph? I found something about CROSS JOIN ON, but it is only for Hibernate 5 or later, and I can't switch because of the project's size and its Java dependencies.
You simply need to add an explicit SELECT i clause.
As a side note, JOIN FETCH for the Wholesaler associations doesn't make sense if Wholesaler is not going to be present in the result anyway.

How to use GROUP BY with Firebird?

I'm trying to create a SELECT with GROUP BY in Firebird, but I haven't had any success. How could I do this?
Exception
Can't format message 13:896 -- message file C:\firebird.msg not found.
Dynamic SQL Error.
SQL error code = -104.
Invalid expression in the select list (not contained in either an aggregate function or the GROUP BY clause).
(49,765 sec)
trying
SELECT FA_DATA, FA_CODALUNO, FA_MATERIA, FA_TURMA, FA_QTDFALTA,
ALU_CODIGO, ALU_NOME,
M_CODIGO, M_DESCRICAO,
FT_CODIGO, FT_ANOLETIVO, FT_TURMA
FROM FALTAS Falta
INNER JOIN ALUNOS Aluno ON (Falta.FA_CODALUNO = Aluno.ALU_CODIGO)
INNER JOIN MATERIAS Materia ON (Falta.FA_MATERIA = Materia.M_CODIGO)
INNER JOIN FORMACAOTURMAS Turma ON (Falta.FA_TURMA = Turma.FT_CODIGO)
WHERE (Falta.FA_CODALUNO = 238) AND (Turma.FT_ANOLETIVO = 2015)
GROUP BY Materia.M_CODIGO
A simple use of GROUP BY in Firebird is to group by all columns:
select * from T1 t
where t.id in
(SELECT t.id FROM T1 t
INNER JOIN T2 j ON j.id = t.jid
WHERE t.id = 1
GROUP BY t.id)
Using GROUP BY doesn't make sense in your example code. It is only useful when using aggregate functions (+ some other minor uses). In any case, Firebird requires every column in the SELECT list, except those inside aggregate functions, to appear in the GROUP BY clause.
Note that this is more restrictive than the SQL standard, which allows you to leave out functionally dependent columns (i.e. if you group by a primary key or unique key, you don't need to list the other columns of that table).
You don't specify why you want to group (because it doesn't make much sense to do it with this query). Maybe instead you want to ORDER BY, or you want the first row for each M_CODIGO.

Translate nested query SQL to EF

I have this table structure:
Classic many to many relationship. I want to get all the orders for products belonging to the category for a small number of products I provide. It may be easier to show the SQL that does exactly what I want:
select o.*
from [Order] o join Product p2 on o.FKCatalogNumber=p2.CatalogNumber
where p2.FKCategoryId IN
    (select c.Id
     from Category c join Product p1 on p1.FKCategoryId=c.Id
     where p1.CatalogNumber in ('0001', '0002'))
This example gives me all the orders belonging to the categories that catalog #'s 0001 and 0002 are in.
But I am unable to wrap my head around the equivalent EF syntax for this query. I'm embarrassed to say I spent half the day on this. I bet it's easy for someone out there.
I came up with this but it's not working (and probably not even close):
string[] catNumbers = {"0001", "0002"};
var orders = ctx.Categories
    .SelectMany(c => c.Products, (c, p) => new {c, p})
    .Where(@t => catNumbers.Contains(@t.p.CatalogNumber))
    .Select(@t => @t.p.Orders)
    .ToList();
You can use query syntax (which looks very similar to SQL) in LINQ, so if you're more comfortable with SQL then you may prefer to write your query like this:
string[] catNumbers = {"0001", "0002"};
var orders = from o in ctx.Orders
             join p2 in ctx.Products on o.FKCatalogNumber equals p2.CatalogNumber
             where
             (
                 from c in ctx.Categories
                 join p1 in ctx.Products on c.ID equals p1.FKCategoryId
                 where catNumbers.Contains(p1.CatalogNumber)
                 select c.ID
             ).Contains(p2.FKCategoryId)
             select o;
As you can see, it's actually just your SQL query rearranged slightly, but it compiles as C#.
Note that:
the [Order] o syntax for referencing tables is replaced by o in ctx.Orders
LINQ enforces which way round you do the join condition so I had to flip your on o.FKCatalogNumber=p2.CatalogNumber to be on o.FKCatalogNumber equals p2.CatalogNumber
instead of your where p2.FKCategoryId IN (...), the equivalent C# is (...).Contains(p2.FKCategoryId)
the select comes last, not first
but those are the only major changes. Otherwise, it's written just like SQL.
I'd also draw your attention to a distinction regarding this comment:
the equivalent EF syntax for this query
The syntax here isn't specific to EF, but is just LINQ - Language Integrated Query. It has two flavours: query syntax (sometimes called declarative) and method syntax (sometimes called fluent). LINQ works on just about any collection that implements IEnumerable or IQueryable, including EF's DbSet.
For more info on the different ways of querying, this MSDN page is a decent place to start. There's also this handy reference table showing the equivalent query syntax for each method-syntax operator, where applicable.
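To make the two flavours concrete, here is a small sketch that runs the same nested "IN" style filter in both query syntax and method syntax. It uses LINQ to Objects over made-up in-memory arrays (the property names mirror the question's columns, the data is hypothetical), so it shows only that the two forms are equivalent, not what SQL EF would emit.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory data; property names mirror the question's columns.
var products = new[]
{
    new { CatalogNumber = "0001", FKCategoryId = 1 },
    new { CatalogNumber = "0002", FKCategoryId = 2 },
    new { CatalogNumber = "0003", FKCategoryId = 1 },
    new { CatalogNumber = "0004", FKCategoryId = 3 },
};
var orders = new[]
{
    new { Id = 10, FKCatalogNumber = "0003" }, // product in category 1
    new { Id = 11, FKCatalogNumber = "0004" }, // product in category 3
    new { Id = 12, FKCatalogNumber = "0002" }, // product in category 2
};
string[] catNumbers = { "0001" }; // catalog number 0001 is in category 1

// Query syntax: reads much like the SQL, with select at the end.
var viaQuery = (from o in orders
                join p2 in products on o.FKCatalogNumber equals p2.CatalogNumber
                where (from p1 in products
                       where catNumbers.Contains(p1.CatalogNumber)
                       select p1.FKCategoryId).Contains(p2.FKCategoryId)
                select o.Id).ToList();

// Method syntax: the same operators written as chained calls.
var viaMethods = orders
    .Join(products, o => o.FKCatalogNumber, p2 => p2.CatalogNumber, (o, p2) => new { o, p2 })
    .Where(x => products
        .Where(p1 => catNumbers.Contains(p1.CatalogNumber))
        .Select(p1 => p1.FKCategoryId)
        .Contains(x.p2.FKCategoryId))
    .Select(x => x.o.Id)
    .ToList();

Console.WriteLine(string.Join(", ", viaQuery));        // 10
Console.WriteLine(viaQuery.SequenceEqual(viaMethods)); // True
```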
You can still nest queries in EF. The following looks like it works for me:
string[] catNumbers = {"0001", "0002"};
var orders = ctx.Orders
    .Where(o => ctx.Products
        .Where(p => catNumbers.Contains(p.CatalogNumber))
        .Select(p => p.CategoryId)
        .Contains(o.Product.CategoryId)
    );
This produces the following SQL:
SELECT
[Extent1].[Id] AS [Id],
[Extent1].[CatalogNumber] AS [CatalogNumber]
FROM [dbo].[Orders] AS [Extent1]
WHERE EXISTS (SELECT
1 AS [C1]
FROM [dbo].[Products] AS [Extent2]
INNER JOIN [dbo].[Products] AS [Extent3] ON [Extent2].[CategoryId] = [Extent3].[CategoryId]
WHERE ([Extent1].[CatalogNumber] = [Extent3].[CatalogNumber]) AND ([Extent2].[CatalogNumber] IN (N'0001', N'0002'))
)

Query produced for IN filter on 1-1 relation joins to parent table twice

I have this problem and reproduced it with AdventureWorks2008R2 to make it easier to demonstrate. Basically, I want to filter a parent table for a list of IN values. I thought it would generate this type of query, but it doesn't:
SELECT * FROM SalesOrderDetail d WHERE EXISTS (SELECT * FROM SalesOrderHeader h WHERE d.id = h.id AND h.rowguid IN ('asdf', 'fff', 'weee'))
Any ideas how to change the LINQ statement to query Header only once?
(ignore the fact I'm matching on Guids - it will actually be integers; I was just quickly looking for a 1-1 table in EF because that's when the problem occurs and I happened to find these)
var guidsToFind = new Guid[] { Guid.NewGuid(), Guid.NewGuid(), Guid.NewGuid()};
AdventureWorks2008R2Entities context = new AdventureWorks2008R2Entities();
var g = context.People.Where(p => guidsToFind.Contains(p.BusinessEntity.rowguid)).ToList();
That produces the following more expensive query:
SELECT [Extent1].[BusinessEntityID] AS [BusinessEntityID],
[Extent1].[PersonType] AS [PersonType],
[Extent1].[NameStyle] AS [NameStyle],
[Extent1].[Title] AS [Title],
[Extent1].[FirstName] AS [FirstName],
[Extent1].[MiddleName] AS [MiddleName],
[Extent1].[LastName] AS [LastName],
[Extent1].[Suffix] AS [Suffix],
[Extent1].[EmailPromotion] AS [EmailPromotion],
[Extent1].[AdditionalContactInfo] AS [AdditionalContactInfo],
[Extent1].[Demographics] AS [Demographics],
[Extent1].[rowguid] AS [rowguid],
[Extent1].[ModifiedDate] AS [ModifiedDate]
FROM [Person].[Person] AS [Extent1]
INNER JOIN [Person].[BusinessEntity] AS [Extent2] ON [Extent1].[BusinessEntityID] = [Extent2].[BusinessEntityID]
LEFT OUTER JOIN [Person].[BusinessEntity] AS [Extent3] ON [Extent1].[BusinessEntityID] = [Extent3].[BusinessEntityID]
WHERE [Extent2].[rowguid] = cast('b95b63f9-6304-4626-8e70-0bd2b73b6b0f' as uniqueidentifier) OR [Extent3].[rowguid] IN (cast('f917a037-b86b-4911-95f4-4afc17433086' as uniqueidentifier),cast('3188557d-5df9-40b3-90ae-f83deee2be05' as uniqueidentifier))
Really odd. Looks like a LINQ limitation.
I don't have a system to try this on right now but if you first get a list of BusinessEntityId values based on the provided guids and then get the persons like this
var g = context.People.Where(p => businessEntityIdList.Contains(p.BusinessEntityId)).ToList();
there should not be a reason for additional unnecessary joins anymore.
If that works, you can try to combine the two steps into one LINQ expression to see if the separation stays intact.
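The two-step idea can be sketched as follows. This is LINQ to Objects over made-up in-memory arrays rather than the AdventureWorks context (the data and the businessEntities / people variables are hypothetical), so it only shows the shape of the two queries. Whether EF then avoids the extra joins would still have to be checked by inspecting the generated SQL.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-ins for BusinessEntity and Person.
var g1 = Guid.NewGuid();
var g2 = Guid.NewGuid();
var businessEntities = new[]
{
    new { BusinessEntityID = 1, rowguid = g1 },
    new { BusinessEntityID = 2, rowguid = g2 },
    new { BusinessEntityID = 3, rowguid = Guid.NewGuid() },
};
var people = new[]
{
    new { BusinessEntityID = 1, FirstName = "Ann" },
    new { BusinessEntityID = 2, FirstName = "Bob" },
    new { BusinessEntityID = 3, FirstName = "Cid" },
};
var guidsToFind = new[] { g1, g2 };

// Step 1: resolve the guids to key values using the parent table only.
var businessEntityIdList = businessEntities
    .Where(b => guidsToFind.Contains(b.rowguid))
    .Select(b => b.BusinessEntityID)
    .ToList();

// Step 2: filter the child table on its own key column, no navigation property involved.
var found = people
    .Where(p => businessEntityIdList.Contains(p.BusinessEntityID))
    .ToList();

Console.WriteLine(string.Join(", ", found.Select(p => p.FirstName))); // Ann, Bob
```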

Convert SQL to LINQ, nested select, top, "distinct" using group by and multiple order bys

I have the following SQL query, which I'm struggling to convert to LINQ.
Purpose: Get the top 10 coupons from the table, ordered by the date they expire (i.e. list the ones that are about to expire first) and then randomly choosing one of those for publication.
Notes: Because of the way the database is structured, there may be duplicate Codes in the Coupon table. Therefore, I am using a GROUP BY to enforce distinctness, because I can't use DISTINCT in the sub-select query (which I think is correct). The SQL query works.
SELECT TOP 1
c1.*
FROM
Coupon c1
WHERE
Code IN (
SELECT TOP 10
c2.Code
FROM
Coupon c2
WHERE
c2.Published = 0
GROUP BY
c2.Code,
c2.Expires
ORDER BY
c2.Expires
)
ORDER BY NEWID()
Update:
This is as close as I have got, but in two queries:
var result1 = (from c in Coupons
               where c.Published == false
               orderby c.Expires
               group c by new { c.Code, c.Expires } into coupon
               select coupon.FirstOrDefault()).Take(10);
var result2 = (from c in result1
               orderby Guid.NewGuid()
               select c).Take(1);
Here's one possible way:
from c in Coupons
from cs in
((from c in coupons
where c.published == false
select c).Distinct()
).Take(10)
where cs.ID == c.ID
select c
Keep in mind that LINQ creates a strongly-typed data set, so an IN statement has no general equivalent. I understand trying to keep the SQL tight, but LINQ may not be the best answer for this. If you are using MS SQL Server (not SQL Server Compact) you might want to consider doing this as a Stored Procedure.
Using MercurioJ's slightly buggy response in combination with another random-row solution suggested on SO, my solution was:
var result3 = (from c in _dataContext.Coupons
               from cs in
                   ((from c1 in _dataContext.Coupons
                     where
                         c1.IsPublished == false
                     select c1).Distinct()
                   ).Take(10)
               where cs.CouponId == c.CouponId
               orderby _dataContext.NewId()
               select c).Take(1);
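For comparison, here is a sketch that follows the structure of the original SQL more literally: group unpublished coupons by Code and Expires, take the ten soonest-expiring codes, then pick one matching coupon at random. It is LINQ to Objects over made-up data, and it uses System.Random in place of NEWID(), so treat it as an illustration of the intent rather than something a LINQ provider would translate to that SQL.

```csharp
using System;
using System.Linq;

// Hypothetical data: 12 unpublished coupons plus one duplicate of code C01.
var start = new DateTime(2020, 1, 1);
var coupons = Enumerable.Range(1, 12)
    .Select(i => new { Id = i, Code = $"C{i:00}", Expires = start.AddDays(i), Published = false })
    .Concat(new[] { new { Id = 13, Code = "C01", Expires = start.AddDays(1), Published = false } })
    .ToList();

// Inner query: TOP 10 codes, grouped by (Code, Expires), ordered by expiry.
var topCodes = coupons
    .Where(c => !c.Published)
    .GroupBy(c => new { c.Code, c.Expires })
    .OrderBy(g => g.Key.Expires)
    .Take(10)
    .Select(g => g.Key.Code)
    .ToList();

// Outer query: all coupons whose Code is IN the list above, then one at random.
var pool = coupons.Where(c => topCodes.Contains(c.Code)).ToList();
var chosen = pool[new Random().Next(pool.Count)];

Console.WriteLine($"{topCodes.Count} codes, {pool.Count} candidate rows, picked {chosen.Code}");
```

Because the duplicate C01 row shares both Code and Expires with the original, it collapses into one group, yet both rows remain candidates in the outer query, as they would with the SQL's IN clause.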