Query produced for IN filter on 1-1 relation joins to parent table twice - entity-framework

I have this problem and reproduced it with AdventureWorks2008R2 to make it more easy. Basically, I want to filter a parent table for a list of IN values and I thought it would generate this type of query
but it doesn't.
SELECT * FROM SalesOrderDetail where EXISTS( select * from SalesOrderHeader where d.id=h.id and rowguid IN ('asdf', 'fff', 'weee' )
Any ideas how to change the LINQ statement to query Header only once?
(ignore the fact I'm matching on Guids - it will actually be integers; I was just quickly looking for a 1-1 table in EF because that's when the problem occurs and I happened to find these)
var guidsToFind = new Guid[] { Guid.NewGuid(), Guid.NewGuid(), Guid.NewGuid()};
AdventureWorks2008R2Entities context = new AdventureWorks2008R2Entities();
var g = context.People.Where(p => guidsToFind.Contains(p.BusinessEntity.rowguid)).ToList();
That produces the following more expensive query:
SELECT [Extent1].[BusinessEntityID] AS [BusinessEntityID],
[Extent1].[PersonType] AS [PersonType],
[Extent1].[NameStyle] AS [NameStyle],
[Extent1].[Title] AS [Title],
[Extent1].[FirstName] AS [FirstName],
[Extent1].[MiddleName] AS [MiddleName],
[Extent1].[LastName] AS [LastName],
[Extent1].[Suffix] AS [Suffix],
[Extent1].[EmailPromotion] AS [EmailPromotion],
[Extent1].[AdditionalContactInfo] AS [AdditionalContactInfo],
[Extent1].[Demographics] AS [Demographics],
[Extent1].[rowguid] AS [rowguid],
[Extent1].[ModifiedDate] AS [ModifiedDate]
FROM [Person].[Person] AS [Extent1]
INNER JOIN [Person].[BusinessEntity] AS [Extent2] ON [Extent1].[BusinessEntityID] = [Extent2].[BusinessEntityID]
LEFT OUTER JOIN [Person].[BusinessEntity] AS [Extent3] ON [Extent1].[BusinessEntityID] = [Extent3].[BusinessEntityID]
WHERE [Extent2].[rowguid] = cast('b95b63f9-6304-4626-8e70-0bd2b73b6b0f' as uniqueidentifier) OR [Extent3].[rowguid] IN (cast('f917a037-b86b-4911-95f4-4afc17433086' as uniqueidentifier),cast('3188557d-5df9-40b3-90ae-f83deee2be05' as uniqueidentifier))

Really odd. Looks like a LINQ limitation.
I don't have a system to try this on right now but if you first get a list of BusinessEntityId values based on the provided guids and then get the persons like this
var g = context.People.Where(p => businessEntityIdList.Contains(p.BusinessEntityId)).ToList();
there should not be a reason for additional unnecessary joins anymore.
If that works, you can try to combine the to steps into one LINQ expression to see if the separation stays intact.

Related

Linq GroupJoin (join...into) results in INNER JOIN?

I am referencing the accepted answer to this question:
LINQ to SQL multiple tables left outer join
In my example, I need all of the Person records regardless if there is a matching Staff record.
I am using the following query (simplified for illustation's sake):
var result = from person in context.Person
join staffQ in context.Staff
on person.StaffID equals staffQ.ID into staffStaffIDGroup
from staff in staffStaffIDGroup.DefaultIfEmpty()
select new PersonModel()
{
ID = person.ID,
Fname = person.Fname,
Lname = person.Lname,
Sex = person.Sex,
Username = staff != null ? staff.Username : ""
};
However, contrary to my expectations, the query results in the following SQL with an INNER JOIN, which eliminates records I need in the the result set.
SELECT
[Extent1].[ID] AS [ID],
[Extent1].[fname] AS [fname],
[Extent1].[lname] AS [lname],
[Extent1].[sex] AS [sex],
[Extent2].[username] AS [username]
FROM [dbo].[Person] AS [Extent1]
INNER JOIN [dbo].[Staff] AS [Extent2] ON [Extent1].[StaffID] = [Extent2].[ID]
I thought that GroupJoin (or join...into) is supposed to get around this? I know I must have made a dumb mistake here, but I can't see it.
In general the query should generate left outer join.
But remember, this is EF, and it has additional information coming from the model. In this case looks like the StaffID property of Person is an enforced FK constraint to Stuff, so EF knows that there is always a corresponding record in Staff table, hence ignoring your left outer join construct and generates inner join instead.
Again, the model (properties, whether they are required or not, the relationships - required or not etc.) allows EF to perform similar smart decisons and optimizations.
Use a Navigation Property instead of a Join. If you're using a Join in EF LINQ you're almost always doing the wrong thing.
Something like
var result = from person in context.Person
select new PersonModel()
{
ID = person.ID,
Fname = person.Fname,
Lname = person.Lname,
Sex = person.Sex,
Username = person.StaffId != null ? Person.Staff.Username : ""
};

EFCore returning too many columns for a simple LEFT OUTER join

I am currently using EFCore 1.1 (preview release) with SQL Server.
I am doing what I thought was a simple OUTER JOIN between an Order and OrderItem table.
var orders = from order in ctx.Order
join orderItem in ctx.OrderItem
on order.OrderId equals orderItem.OrderId into tmp
from oi in tmp.DefaultIfEmpty()
select new
{
order.OrderDt,
Sku = (oi == null) ? null : oi.Sku,
Qty = (oi == null) ? (int?) null : oi.Qty
};
The actual data returned is correct (I know earlier versions had issues with OUTER JOINS not working at all). However the SQL is horrible and includes every column in Order and OrderItem which is problematic considering one of them is a large XML Blob.
SELECT [order].[OrderId], [order].[OrderStatusTypeId],
[order].[OrderSummary], [order].[OrderTotal], [order].[OrderTypeId],
[order].[ParentFSPId], [order].[ParentOrderId],
[order].[PayPalECToken], [order].[PaymentFailureTypeId] ....
...[orderItem].[OrderId], [orderItem].[OrderItemType], [orderItem].[Qty],
[orderItem].[SKU] FROM [Order] AS [order] LEFT JOIN [OrderItem] AS
[orderItem] ON [order].[OrderId] = [orderItem].[OrderId] ORDER BY
[order].[OrderId]
(There are many more columns not shown here.)
On the other hand - if I make it an INNER JOIN then the SQL is as expected with only the columns in my select clause:
SELECT [order].[OrderDt], [orderItem].[SKU], [orderItem].[Qty] FROM
[Order] AS [order] INNER JOIN [OrderItem] AS [orderItem] ON
[order].[OrderId] = [orderItem].[OrderId]
I tried reverting to EFCore 1.01, but got some horrible nuget package errors and gave up with that.
Not clear whether this is an actual regression issue or an incomplete feature in EFCore. But couldn't find any further information about this elsewhere.
Edit: EFCore 2.1 has addressed a lot of issues with grouping and also N+1 type issues where a separate query is made for every child entity. Very impressed with the performance in fact.
3/14/18 - 2.1 Preview 1 of EFCore isn't recommended because the GROUP BY SQL has some issues when using OrderBy() but it's fixed in nightly builds and Preview 2.
The following applies to EF Core 1.1.0 (release).
Although shouldn't be doing such things, tried several alternative syntax queries (using navigation property instead of manual join, joining subqueries containing anonymous type projection, using let / intermediate Select, using Concat / Union to emulate left join, alternative left join syntax etc.) The result - either the same as in the post, and/or executing more than one query, and/or invalid SQL queries, and/or strange runtime exceptions like IndexOutOfRange, InvalidArgument etc.
What I can say based on tests is that most likely the problem is related to bug(s) (regression, incomplete implementation - does it really matter) in GroupJoin translation. For instance, #7003: Wrong SQL generated for query with group join on a subquery that is not present in the final projection or #6647 - Left Join (GroupJoin) always materializes elements resulting in unnecessary data pulling etc.
Until it get fixed (when?), as a (far from perfect) workaround I could suggest using the alternative left outer join syntax (from a in A from b in B.Where(b = b.Key == a.Key).DefaultIfEmpty()):
var orders = from o in ctx.Order
from oi in ctx.OrderItem.Where(oi => oi.OrderId == o.OrderId).DefaultIfEmpty()
select new
{
OrderDt = o.OrderDt,
Sku = oi.Sku,
Qty = (int?)oi.Qty
};
which produces the following SQL:
SELECT [o].[OrderDt], [t1].[Sku], [t1].[Qty]
FROM [Order] AS [o]
CROSS APPLY (
SELECT [t0].*
FROM (
SELECT NULL AS [empty]
) AS [empty0]
LEFT JOIN (
SELECT [oi0].*
FROM [OrderItem] AS [oi0]
WHERE [oi0].[OrderId] = [o].[OrderId]
) AS [t0] ON 1 = 1
) AS [t1]
As you can see, the projection is ok, but instead of LEFT JOIN it uses strange CROSS APPLY which might introduce another performance issue.
Also note that you have to use casts for value types and nothing for strings when accessing the right joined table as shown above. If you use null checks as in the original query, you'll get ArgumentNullException at runtime (yet another bug).
Using "into" will create a temporary identifier to store the results.
Reference : MDSN: into (C# Reference)
So removing the "into tmp from oi in tmp.DefaultIfEmpty()" will result in the clean sql with the three columns.
var orders = from order in ctx.Order
join orderItem in ctx.OrderItem
on order.OrderId equals orderItem.OrderId
select new
{
order.OrderDt,
Sku = (oi == null) ? null : oi.Sku,
Qty = (oi == null) ? (int?) null : oi.Qty
};

Translate nested query SQL to EF

I have this table structure:
Classic many to many relationship. I want to get all the orders for products belonging to the category for a small number of products I provide. It may be easier to show the SQL that does exactly what I want:
select o.*
from [Order] o join Product p2 on o.FKCatalogNumber=p2.CatalogNumber
where p2.FKCategoryId IN
(select c.Id
from Category c join Product p1 on p1.FKCategoryId=c.Id
where p1.CatalogNumber in ('0001', '0002')
This example gives me all the orders belonging to the categories that catalog #'s 0001 and 0002 are in.
But I am unable to wrap my head around the equivalent EF syntax for this query. I'm embarrassed to say I spent half the day on this. I bet it's easy for someone out there.
I came up with this but it's not working (and probably not even close):
string[] catNumbers = {"0001", "0002"};
var orders = ctx.Categories
.SelectMany(c => c.Products, (c, p) => new {c, p})
.Where(#t => catNumbers.Contains(#t.p.CatalogNumber))
.Select(#t => #t.p.Orders)
.ToList();
You can use query syntax (which looks very similar to SQL) in LINQ, so if you're more comfortable with SQL then you may prefer to write your query like this:
string[] catNumbers = {"0001", "0002"};
var orders = from o in ctx.Orders
join p2 in ctx.Products on o.FKCatalogNumber equals p2.CatalogNumber
where
(
from c in ctx.Categories
join p1 in ctx.Products on c.ID equals p1.FKCategoryId
where catNumbers.Contains(p1.CatalogNumber)
select c.ID
).Contains(p2.FKCategoryId)
select o;
As you can see, it's actually just your SQL query rearranged slightly, but it compiles as C#.
Note that:
the [Order] o syntax for referencing tables is replaced by o in ctx.Orders
LINQ enforces which way round you do the join condition so I had to flip your on o.FKCatalogNumber=p2.CatalogNumber to be on o.FKCatalogNumber equals p2.CatalogNumber
instead of your where p2.FKCategoryId IN (...), the equivalent c# is (...).Contains(p2.FKCategoryId)
the select comes last, not first
but those are the only major changes. Otherwise, it's written just like SQL.
I'd also draw your attention to a distinction regarding this comment:
the equivalent EF syntax for this query
The syntax here isn't specific to EF, but is just LINQ - Language Integrated Querying. It has two flavours: query syntax (sometimes called declarative) and method syntax (sometimes called fluent). LINQ works on just about any collection that implements IEnumerable or IQueryable, including EF's DbSet.
For more info on the different ways of querying, this MSDN page is a decent place to start. There's also this handy reference table showing the equivalent query syntax for each method-syntax operator, where applicable.
You can still nest queries in EF. The following looks like it works for me:
string[] catNumbers = {"0001", "0002"};
var orders = ctx.Orders
.Where(o => ctx.Products
.Where(p => catNumbers.Contains(p.CatalogNumber))
.Select(p => p.CategoryId)
.Contains(o.Product.CategoryId)
);
This produces the following SQL:
SELECT
[Extent1].[Id] AS [Id],
[Extent1].[CatalogNumber] AS [CatalogNumber]
FROM [dbo].[Orders] AS [Extent1]
WHERE EXISTS (SELECT
1 AS [C1]
FROM [dbo].[Products] AS [Extent2]
INNER JOIN [dbo].[Products] AS [Extent3] ON [Extent2].[CategoryId] = [Extent3].[CategoryId]
WHERE ([Extent1].[CatalogNumber] = [Extent3].[CatalogNumber]) AND ([Extent2].[CatalogNumber] IN (N'0001', N'0002'))
)

Entity Framework 5: How to Outer Join Table Valued Function

I'm attempting to outer join a table with an inline table valued function in my LINQ query, but I get a query compilation error at runtime:
"The query attempted to call 'OuterApply' over a nested query, but 'OuterApply' did not have the appropriate keys."
My linq statement looks like this:
var testQuery = (from accountBase in ViewContext.AccountBases
join advisorConcatRaw in ViewContext.UFN_AccountAdvisorsConcatenated_Get()
on accountBase.AccountId equals advisorConcatRaw.AccountId into advisorConcatOuter
from advisorConcat in advisorConcatOuter.DefaultIfEmpty()
select new
{
accountBase.AccountId,
advisorConcat.Advisors
}).ToList();
The function definition is as follows:
CREATE FUNCTION dbo.UFN_AccountAdvisorsConcatenated_Get()
RETURNS TABLE
AS
RETURN
SELECT AP.AccountId,
LEFT(AP.Advisors, LEN(AP.Advisors) - 1) AS Advisors
FROM ( SELECT DISTINCT
AP.AccountId,
( SELECT AP2.PropertyValue + ', '
FROM dbo.AccountProperty AP2 WITH (NOLOCK)
WHERE AP2.AccountId = AP.AccountId
AND AP2.AccountPropertyTypeId = 1 -- Advisor
FOR XML PATH('')) AS Advisors
FROM dbo.AccountProperty AP WITH (NOLOCK)) AP;
I can successfully perform the join directly in sql as follows:
SELECT ab.accountid,
advisorConcat.Advisors
FROM accountbase ab
LEFT OUTER JOIN dbo.Ufn_accountadvisorsconcatenated_get() advisorConcat
ON ab.accountid = advisorConcat.accountid
Does anyone have a working example of left outer joining an inline TVF to a table in LINQ to entities - or is this a known defect, etc? Many thanks.
Entity Framework needs to know what the primary key columns of the TVF results are to do a left join. Basically you need to create a fake table which has same schema as your TVF results and update TVF in model browser to return the new created table type instead of default complex type. You could refer to this answer to get more details.

Joining several tables: table specified more than once error

I am attempting to call data after joining all of my tables in a postgreSQL query.
I am getting the following error:
Error in postgresqlExecStatement(conn, statement, ...) :
RS-DBI driver: (could not Retrieve the result : ERROR: table name "place" specified more than once
)
Failed to execute SQL chunk
After referring to similar posts and online resources, I have attempted setting aliases (which only create new errors about not referring to the other tables in the FROM clause), reordering the table names (the cleanest, reordered chunk is posted below), and experimenting with UPDATE/FROM/JOIN as an alternative to SELECT/FROM/JOIN. However, I think I ultimately need to be using the SELECT clause.
SELECT *
FROM place
INNER JOIN place ON place.key = form.key
INNER JOIN form ON form.id = items.formid
INNER JOIN items ON items.recid = rec.id
INNER JOIN rec ON rec.id = sub.recid
INNER JOIN sub ON sub.id = type.subid
INNER JOIN type ON type.name = det.typeid;
You have "place" in your query twice, which is allowed but you would have to alias the second use of the table "place". Also you reference something called "det", but it's not a table or alias anywhere in your query. If you want only one place that's fine, just remove the INNER JOIN place and move your place.key = form.key to the form join or to a where clause.
If you want to place tables in their because you are trying to join the table to itself the alias the second one, but you will want to make sure that you then have a clause to join those two tables on (could be part of an on or a where)
The issue ended up being the order of table names. The code below joined everything just fine:
SELECT *
FROM place
INNER JOIN form ON place.key = form.key
INNER JOIN items ON form.id = items.formid
INNER JOIN rec ON items.recid = rec.id
INNER JOIN sub ON rec.id = sub.recid
INNER JOIN type ON sub.id = type.subid
INNER JOIN det ON type.name = det.typeid;