Linq GroupJoin (join...into) results in INNER JOIN? - entity-framework

I am referencing the accepted answer to this question:
LINQ to SQL multiple tables left outer join
In my example, I need all of the Person records regardless of whether there is a matching Staff record.
I am using the following query (simplified for illustration's sake):
var result = from person in context.Person
             join staffQ in context.Staff
                 on person.StaffID equals staffQ.ID into staffStaffIDGroup
             from staff in staffStaffIDGroup.DefaultIfEmpty()
             select new PersonModel()
             {
                 ID = person.ID,
                 Fname = person.Fname,
                 Lname = person.Lname,
                 Sex = person.Sex,
                 Username = staff != null ? staff.Username : ""
             };
However, contrary to my expectations, the query results in the following SQL with an INNER JOIN, which eliminates records I need in the result set.
SELECT
[Extent1].[ID] AS [ID],
[Extent1].[fname] AS [fname],
[Extent1].[lname] AS [lname],
[Extent1].[sex] AS [sex],
[Extent2].[username] AS [username]
FROM [dbo].[Person] AS [Extent1]
INNER JOIN [dbo].[Staff] AS [Extent2] ON [Extent1].[StaffID] = [Extent2].[ID]
I thought that GroupJoin (or join...into) is supposed to get around this? I know I must have made a dumb mistake here, but I can't see it.

In general, this query should generate a LEFT OUTER JOIN.
But remember, this is EF, and it has additional information coming from the model. In this case it looks like the StaffID property of Person is a required (enforced) FK to Staff, so EF assumes there is always a corresponding record in the Staff table; it therefore ignores your left outer join construct and generates an INNER JOIN instead.
Again, the model (properties and whether they are required, relationships and whether they are required, etc.) allows EF to make similar smart decisions and optimizations.
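To make the expected semantics concrete, here is a minimal in-memory sketch in plain Java (not EF or LINQ, and assuming Java 16+ for records) of what join...into followed by DefaultIfEmpty() is meant to return: every Person appears in the result, with an empty Username when no Staff row matches. The record and method names are hypothetical stand-ins for the tables in the question; the sketch says nothing about the SQL that EF actually generates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class LeftJoinSketch {
    // Hypothetical in-memory stand-ins for the Person and Staff tables.
    record Person(int id, String fname, Integer staffId) {}
    record Staff(int id, String username) {}
    record PersonModel(int id, String fname, String username) {}

    // Left outer join: keep every person; use "" when no staff row matches.
    static List<PersonModel> leftJoin(List<Person> people, List<Staff> staff) {
        Map<Integer, Staff> staffById = staff.stream()
                .collect(Collectors.toMap(Staff::id, Function.identity()));
        List<PersonModel> result = new ArrayList<>();
        for (Person p : people) {
            // A null staffId can never match, just like a NULL key in SQL.
            Staff s = p.staffId() == null ? null : staffById.get(p.staffId());
            result.add(new PersonModel(p.id(), p.fname(), s != null ? s.username() : ""));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Person> people = List.of(new Person(1, "Ann", 10), new Person(2, "Bob", null));
        List<Staff> staff = List.of(new Staff(10, "ann.s"));
        leftJoin(people, staff).forEach(System.out::println);
    }
}
```

An inner join would drop the second person here. If every Person is guaranteed to have a matching Staff row, as a required FK implies, the two joins return the same rows, which is why EF can substitute one for the other.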

Use a Navigation Property instead of a Join. If you're using a Join in EF LINQ you're almost always doing the wrong thing.
Something like
var result = from person in context.Person
             select new PersonModel()
             {
                 ID = person.ID,
                 Fname = person.Fname,
                 Lname = person.Lname,
                 Sex = person.Sex,
                 // use the range variable "person", not the Person type
                 Username = person.StaffID != null ? person.Staff.Username : ""
             };

Related

T-SQL: Joining on two separate fields based on specific criteria in a query

I have a query in which I am trying to get additional fields from another table through a join field that I manually create. The issue is when the field I create is null, then I want to use another field to join on. I am not sure how to do that without getting duplicate results. I tried a UNION query, but that just displays everything where the values are null when the manually created field value is null. Here is the query:
SELECT
BU = m.BU,
BUFBA = m.BUFBA,
a.CostCenter,
Delegate = m.Delegate,
a.DistrictLookup,
PCOwner = m.PCOwner,
a.PGr,
a.POrg,
PrimaryContact = m.PrimaryContact,
WarehouseManager = m.WarehouseManager,
Zone = m.Zone,
ZoneFBA = m.ZoneFBA
FROM
(SELECT
e.CostCenter,
e.District,
DistrictLookup =
CASE
WHEN e.PGr IN ('N01','BQE','BQA') THEN 'GSS'
WHEN e.PGr = 'BQB' THEN 'BG'
WHEN e.PGr = 'BQF' THEN 'FP'
ELSE e.District
END,
e.PGr,
e.POrg
FROM dbo.E1P e (NOLOCK)
WHERE
e.CoCd = '4433'
) a
LEFT JOIN dbo.Mapping m (NOLOCK) ON m.District = a.DistrictLookup
When the DistrictLookup field is NULL, I need a different join to occur so that the additional fields populate. That join would be:
LEFT JOIN dbo.Mapping m (NOLOCK) ON m.CostCenter = a.CostCenter
How can I write in this second join and not get duplicate results? This is a separate join on different fields, and I think it differs from the other methods of doing a conditional join. If it doesn't, can someone please explain how to implement that logic in my query?
I believe this is what you are after...
LEFT JOIN dbo.Mapping m (NOLOCK)
ON (a.DistrictLookup IS NOT NULL AND m.District = a.DistrictLookup)
OR (a.DistrictLookup IS NULL AND m.CostCenter = a.CostCenter)
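As a rough illustration of why this ON clause avoids duplicates, here is a small in-memory sketch in plain Java (assuming Java 16+ for records). The record names and sample values are hypothetical; only the predicate mirrors the ON clause above. Each row from the derived table is tested by exactly one branch of the OR, so a Mapping row can match it through District or through CostCenter, but never through both.

```java
import java.util.ArrayList;
import java.util.List;

public class ConditionalJoinSketch {
    // Hypothetical stand-ins for the derived table "a" and dbo.Mapping "m".
    record Row(String costCenter, String districtLookup) {}
    record Mapping(String district, String costCenter, String zone) {}
    record Joined(Row row, Mapping mapping) {}

    // Mirrors the ON clause: join on District when DistrictLookup is set,
    // otherwise fall back to CostCenter. Nulls never compare equal, as in SQL.
    static boolean matches(Row a, Mapping m) {
        if (a.districtLookup() != null) {
            return a.districtLookup().equals(m.district());
        }
        return a.costCenter() != null && a.costCenter().equals(m.costCenter());
    }

    // Left join: every row is kept; unmatched rows get a null mapping.
    static List<Joined> leftJoin(List<Row> rows, List<Mapping> mappings) {
        List<Joined> out = new ArrayList<>();
        for (Row a : rows) {
            boolean matched = false;
            for (Mapping m : mappings) {
                if (matches(a, m)) {
                    out.add(new Joined(a, m));
                    matched = true;
                }
            }
            if (!matched) {
                out.add(new Joined(a, null));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(new Row("CC1", "GSS"), new Row("CC2", null));
        List<Mapping> mappings = List.of(
                new Mapping("GSS", "CCX", "Z1"),
                new Mapping("BG", "CC2", "Z2"));
        leftJoin(rows, mappings).forEach(System.out::println);
    }
}
```

Duplicates can still appear if Mapping itself holds several rows with the same District or CostCenter; the OR condition only guarantees that the two join paths do not overlap for a given row.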

EFCore returning too many columns for a simple LEFT OUTER join

I am currently using EFCore 1.1 (preview release) with SQL Server.
I am doing what I thought was a simple OUTER JOIN between an Order and OrderItem table.
var orders = from order in ctx.Order
join orderItem in ctx.OrderItem
on order.OrderId equals orderItem.OrderId into tmp
from oi in tmp.DefaultIfEmpty()
select new
{
order.OrderDt,
Sku = (oi == null) ? null : oi.Sku,
Qty = (oi == null) ? (int?) null : oi.Qty
};
The actual data returned is correct (I know earlier versions had issues with OUTER JOINS not working at all). However the SQL is horrible and includes every column in Order and OrderItem which is problematic considering one of them is a large XML Blob.
SELECT [order].[OrderId], [order].[OrderStatusTypeId],
[order].[OrderSummary], [order].[OrderTotal], [order].[OrderTypeId],
[order].[ParentFSPId], [order].[ParentOrderId],
[order].[PayPalECToken], [order].[PaymentFailureTypeId] ....
...[orderItem].[OrderId], [orderItem].[OrderItemType], [orderItem].[Qty],
[orderItem].[SKU] FROM [Order] AS [order] LEFT JOIN [OrderItem] AS
[orderItem] ON [order].[OrderId] = [orderItem].[OrderId] ORDER BY
[order].[OrderId]
(There are many more columns not shown here.)
On the other hand - if I make it an INNER JOIN then the SQL is as expected with only the columns in my select clause:
SELECT [order].[OrderDt], [orderItem].[SKU], [orderItem].[Qty] FROM
[Order] AS [order] INNER JOIN [OrderItem] AS [orderItem] ON
[order].[OrderId] = [orderItem].[OrderId]
I tried reverting to EFCore 1.0.1, but got some horrible NuGet package errors and gave up on that.
It's not clear whether this is an actual regression or an incomplete feature in EFCore, and I couldn't find any further information about it elsewhere.
Edit: EFCore 2.1 has addressed a lot of issues with grouping and also N+1 type issues where a separate query is made for every child entity. Very impressed with the performance in fact.
3/14/18 - 2.1 Preview 1 of EFCore isn't recommended because the GROUP BY SQL has some issues when using OrderBy() but it's fixed in nightly builds and Preview 2.
The following applies to EF Core 1.1.0 (release).
Although one shouldn't have to do such things, I tried several alternative query syntaxes (using a navigation property instead of a manual join, joining subqueries containing an anonymous type projection, using let / an intermediate Select, using Concat / Union to emulate a left join, the alternative left join syntax, etc.). The result was either the same as in the post, and/or more than one query being executed, and/or invalid SQL queries, and/or strange runtime exceptions like IndexOutOfRange, InvalidArgument, etc.
What I can say based on tests is that most likely the problem is related to bug(s) (regression, incomplete implementation - does it really matter) in GroupJoin translation. For instance, #7003: Wrong SQL generated for query with group join on a subquery that is not present in the final projection or #6647 - Left Join (GroupJoin) always materializes elements resulting in unnecessary data pulling etc.
Until it gets fixed (when?), as a (far from perfect) workaround I could suggest using the alternative left outer join syntax (from a in A from b in B.Where(b => b.Key == a.Key).DefaultIfEmpty()):
var orders = from o in ctx.Order
from oi in ctx.OrderItem.Where(oi => oi.OrderId == o.OrderId).DefaultIfEmpty()
select new
{
OrderDt = o.OrderDt,
Sku = oi.Sku,
Qty = (int?)oi.Qty
};
which produces the following SQL:
SELECT [o].[OrderDt], [t1].[Sku], [t1].[Qty]
FROM [Order] AS [o]
CROSS APPLY (
SELECT [t0].*
FROM (
SELECT NULL AS [empty]
) AS [empty0]
LEFT JOIN (
SELECT [oi0].*
FROM [OrderItem] AS [oi0]
WHERE [oi0].[OrderId] = [o].[OrderId]
) AS [t0] ON 1 = 1
) AS [t1]
As you can see, the projection is OK, but instead of a LEFT JOIN it uses a strange CROSS APPLY, which might introduce another performance issue.
Also note that you have to use casts for value types and nothing for strings when accessing the right joined table as shown above. If you use null checks as in the original query, you'll get ArgumentNullException at runtime (yet another bug).
Using "into" will create a temporary identifier to store the results.
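Reference: MSDN: into (C# Reference)
So removing the "into tmp from oi in tmp.DefaultIfEmpty()" will result in clean SQL with the three columns. Note that this turns the query into an INNER JOIN, so orders without any order items are no longer returned.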
var orders = from order in ctx.Order
             join orderItem in ctx.OrderItem
                 on order.OrderId equals orderItem.OrderId
             select new
             {
                 order.OrderDt,
                 // without "into", the range variable is orderItem ("oi" does not exist)
                 Sku = orderItem.Sku,
                 Qty = (int?) orderItem.Qty
             };

Query produced for IN filter on 1-1 relation joins to parent table twice

I have this problem and reproduced it with AdventureWorks2008R2 to make it easier. Basically, I want to filter a parent table for a list of IN values, and I thought it would generate this type of query, but it doesn't:
SELECT * FROM SalesOrderDetail d WHERE EXISTS (SELECT * FROM SalesOrderHeader h WHERE d.id = h.id AND h.rowguid IN ('asdf', 'fff', 'weee'))
Any ideas how to change the LINQ statement to query Header only once?
(ignore the fact I'm matching on Guids - it will actually be integers; I was just quickly looking for a 1-1 table in EF because that's when the problem occurs and I happened to find these)
var guidsToFind = new Guid[] { Guid.NewGuid(), Guid.NewGuid(), Guid.NewGuid()};
AdventureWorks2008R2Entities context = new AdventureWorks2008R2Entities();
var g = context.People.Where(p => guidsToFind.Contains(p.BusinessEntity.rowguid)).ToList();
That produces the following more expensive query:
SELECT [Extent1].[BusinessEntityID] AS [BusinessEntityID],
[Extent1].[PersonType] AS [PersonType],
[Extent1].[NameStyle] AS [NameStyle],
[Extent1].[Title] AS [Title],
[Extent1].[FirstName] AS [FirstName],
[Extent1].[MiddleName] AS [MiddleName],
[Extent1].[LastName] AS [LastName],
[Extent1].[Suffix] AS [Suffix],
[Extent1].[EmailPromotion] AS [EmailPromotion],
[Extent1].[AdditionalContactInfo] AS [AdditionalContactInfo],
[Extent1].[Demographics] AS [Demographics],
[Extent1].[rowguid] AS [rowguid],
[Extent1].[ModifiedDate] AS [ModifiedDate]
FROM [Person].[Person] AS [Extent1]
INNER JOIN [Person].[BusinessEntity] AS [Extent2] ON [Extent1].[BusinessEntityID] = [Extent2].[BusinessEntityID]
LEFT OUTER JOIN [Person].[BusinessEntity] AS [Extent3] ON [Extent1].[BusinessEntityID] = [Extent3].[BusinessEntityID]
WHERE [Extent2].[rowguid] = cast('b95b63f9-6304-4626-8e70-0bd2b73b6b0f' as uniqueidentifier) OR [Extent3].[rowguid] IN (cast('f917a037-b86b-4911-95f4-4afc17433086' as uniqueidentifier),cast('3188557d-5df9-40b3-90ae-f83deee2be05' as uniqueidentifier))
Really odd. Looks like a LINQ limitation.
I don't have a system to try this on right now but if you first get a list of BusinessEntityId values based on the provided guids and then get the persons like this
var g = context.People.Where(p => businessEntityIdList.Contains(p.BusinessEntityId)).ToList();
there should not be a reason for additional unnecessary joins anymore.
If that works, you can try to combine the two steps into one LINQ expression to see if the separation stays intact.

JPA Query over a join table

I have 3 tables like:
A AB B
------------- ------------ ---------------
a1 a1,b1 b1
AB is a transition table between A and B
With this, my two classes hold no references to each other. But I want to know, with a JPQL query, whether any records exist in the AB table for my element from the A table. A count or a boolean value is all I need.
Because AB is a transition table, there is no model object for it, and I want to know if I can do this with a @Query in my Repository object.
The AB table must be modeled in an entity to be queried in JPQL, so you must model it as its own entity class or as an association in your A and/or your B entity.
I suggest using a native query instead of JPQL (JPA supports native queries too). Let us assume table A is Customer, table B is Product, and AB is Sale. Here is a query for getting the list of products ordered by a customer.
entityManager.createNativeQuery("SELECT PRODUCT_ID FROM SALE WHERE CUSTOMER_ID = 'C_123'");
Actually, the answer to this situation is simpler than you might think. It's a simple matter of using the right tool for the right job. JPA was not designed for implementing complicated SQL queries, that's what SQL is for! So you need a way to get JPA to access a production-level SQL query;
em.createNativeQuery
So in your case what you want to do is access the AB table looking only for the id field. Once you have retrieved your query results, take the id field and look up the Java object with it. It's a second search, true, but trivial by SQL standards.
Let's assume you are looking for an A object based on the number of times a B object references it. Say you are wanting a semi-complicated (but typical) SQL query to group type A objects based on the number of B objects and in descending order. This would be a typical popularity query that you might want to implement as per project requirements.
Your native SQL query would be as such:
select a_id as id from AB group by a_id order by count(*) desc;
Now what you want to do is tell JPA to expect the id list to come back in a form that JPA can accept. You need to put together an extra JPA entity, one that will never be used in the normal JPA fashion, because JPA needs a way to get the queried objects back to you. You would put together an entity for this search query as such:
@Entity
public class IdSearch {
    @Id
    @Column
    Long id;

    public Long getId() {
        return id;
    }

    public void setId(Long id) {
        this.id = id;
    }
}
Now you implement a little bit of code to bring the two technologies together;
@SuppressWarnings("unchecked")
public List<IdSearch> findMostPopularA() {
    // the SQL string is concatenated because a Java string literal cannot span lines
    return em.createNativeQuery(
            "select a_id as id from AB group by a_id "
            + "order by count(*) desc", IdSearch.class).getResultList();
}
There, that's all you have to do to get JPA to complete your query successfully. To get at your A objects, you would simply look each id up using the traditional JPA approach, as such:
List<IdSearch> list = producer.findMostPopularA();
Iterator<IdSearch> it = list.iterator();
while (it.hasNext()) {
    IdSearch a = it.next();
    A object = em.find(A.class, a.getId());
    // you're in business!
}
Still, a little more refinement of the above can simplify things further, given the many capabilities of SQL. A slightly more complicated SQL query will give you an even more direct JPA interface to your actual data:
@SuppressWarnings("unchecked")
public List<A> findMostPopularA() {
    // select only A's columns and group by A's key so the rows map onto A
    return em.createNativeQuery(
            "select A.* from A, AB "
            + "where A.id = AB.a_id "
            + "group by A.id "
            + "order by count(*) desc", A.class).getResultList();
}
This removes the need for an interim IdSearch entity!
List<A> list = producer.findMostPopularA();
Iterator<A> it = list.iterator();
while (it.hasNext()) {
    A a = it.next();
    // you're in business!
}
What may not be clear to the naked eye is the wonderfully simplified way JPA allows you to make use of complicated SQL structures inside the JPA interface. Imagine you have an SQL query as follows:
SELECT array_agg(players), player_teams
FROM (
SELECT DISTINCT t1.t1player AS players, t1.player_teams
FROM (
SELECT
p.playerid AS t1id,
concat(p.playerid,':', p.playername, ' ') AS t1player,
array_agg(pl.teamid ORDER BY pl.teamid) AS player_teams
FROM player p
LEFT JOIN plays pl ON p.playerid = pl.playerid
GROUP BY p.playerid, p.playername
) t1
INNER JOIN (
SELECT
p.playerid AS t2id,
array_agg(pl.teamid ORDER BY pl.teamid) AS player_teams
FROM player p
LEFT JOIN plays pl ON p.playerid = pl.playerid
GROUP BY p.playerid, p.playername
) t2 ON t1.player_teams=t2.player_teams AND t1.t1id <> t2.t2id
) innerQuery
GROUP BY player_teams
The point is that with createNativeQuery interface, you can still retrieve precisely the data you are looking for and straight into the desired object for easy access by Java.
@SuppressWarnings("unchecked")
public List<A> findMostPopularA() {
    // A text block (Java 15+) lets the SQL span several lines.
    // The result class must be mapped to the columns this query selects.
    return em.createNativeQuery("""
            SELECT array_agg(players), player_teams
            FROM (
              SELECT DISTINCT t1.t1player AS players, t1.player_teams
              FROM (
                SELECT
                  p.playerid AS t1id,
                  concat(p.playerid,':', p.playername, ' ') AS t1player,
                  array_agg(pl.teamid ORDER BY pl.teamid) AS player_teams
                FROM player p
                LEFT JOIN plays pl ON p.playerid = pl.playerid
                GROUP BY p.playerid, p.playername
              ) t1
              INNER JOIN (
                SELECT
                  p.playerid AS t2id,
                  array_agg(pl.teamid ORDER BY pl.teamid) AS player_teams
                FROM player p
                LEFT JOIN plays pl ON p.playerid = pl.playerid
                GROUP BY p.playerid, p.playername
              ) t2 ON t1.player_teams=t2.player_teams AND t1.t1id <> t2.t2id
            ) innerQuery
            GROUP BY player_teams
            """, A.class).getResultList();
}

In JPA 2.0 JPQL, when one returns a NEW object, how may one make use of FETCH JOINs?

A colleague of mine has the following (apparently invalid) JPQL query:
SELECT NEW com.foobar.jpa.DonationAllocationDTOEntity(a.id, a.campaign, a.campAppeal, a.campDivision, a.divisionFund)
FROM DonationAllocation a JOIN a.donation d JOIN a.allocationType t
JOIN FETCH a.campaign
WHERE d.id = :donationId
AND (t.code = 'Pledge' OR t.code = 'MatchingPledge')
It is worth noting (for later in this message) that DonationAllocation's relationship with a Campaign entity is many-to-one, and is marked as FetchType.LAZY. My colleague's intent with this query is to (among other things) ensure that a.campaign is "inflated" (eagerly fetched).
Hibernate (obviously just one JPA implementation of several), when faced with this query, says:
query specified join fetching, but the owner of the fetched association was not present in the select list
This makes sense, as the select list contains only NEW DonationAllocationDTOEntity(), and section 4.4.5.3 of the JPA 2.0 specification says:
The association referenced by the right side of the FETCH JOIN clause must be an association or element collection that is referenced from an entity or embeddable that is returned as a result of the query.
So since there is no "entity or embeddable that is returned as a result of the query" (it's a DTO constructed using the NEW operator), it follows that there is no possible association for a FETCH JOIN to reference, and hence this query is invalid.
How, given this limitation, should one construct a JPQL query in this case such that a.campaign--passed into the constructor expression--is fetched eagerly?
I would simply select the entity and its association, and loop over the results to invoke the DTO constructor explicitly. You would have the additional advantage of compile-time checks and refactorable code:
select a from DonationAllocation a
JOIN a.donation d
JOIN a.allocationType t
JOIN FETCH a.campaign
WHERE d.id = :donationId
AND (t.code = 'Pledge' OR t.code = 'MatchingPledge')
...
for (DonationAllocation a : list) {
result.add(new DonationAllocationDTOEntity(a.id,
a.campaign,
a.campAppeal,
a.campDivision,
a.divisionFund));
}
EDIT:
This query should also select what's needed, and avoid selecting the whole DonationAllocation entity:
select a.id, a.campaign, a.campAppeal, a.campDivision, a.divisionFund
from DonationAllocation a
JOIN a.donation d
JOIN a.allocationType t
WHERE d.id = :donationId
AND (t.code = 'Pledge' OR t.code = 'MatchingPledge')
and you might just add the DTO constructor in the query if you want:
select new com.foobar.jpa.DonationAllocationDTOEntity(a.id, a.campaign, a.campAppeal, a.campDivision, a.divisionFund)
from DonationAllocation a
JOIN a.donation d
JOIN a.allocationType t
WHERE d.id = :donationId
AND (t.code = 'Pledge' OR t.code = 'MatchingPledge')
The fact that a.campaign is in the select clause should be sufficient to tell Hibernate to load the entity. At least that's how it behaves in my tests.