JPQL: Inner Join without duplicate records - jpa

Below is a question which supposedly was part of the official exam from Sun:
A Reader entity has a one-to-many, bidirectional relationship with a
Book entity. Two Reader entities are persisted, each having two Book
entities associated with them. For example, reader 1 has book a and
book b, while reader 2 has book c and book d. Which query returns a
Collection of fewer than four elements?
A. SELECT b.reader FROM Book b
B. SELECT r FROM Book b INNER JOIN b.reader r
C. SELECT r FROM Reader r INNER JOIN r.books b
D. SELECT r from Book b LEFT JOIN b.reader r LEFT JOIN FETCH r.books
Given answer is C, which I believe is incorrect. From what I understand, SQL with inner join of two tables will be generated by JPA provider. Therefore, in all cases we will get 4 records. I've run a test with one-to-many relation and duplicates were included.
Who is wrong, me or Sun?

Answer from Mike Keith, EJB 3.0 co-specification lead:
There are a couple of statements related to duplicates in the spec.
The JOIN FETCH is a variation of the JOIN, but it does state that similar JOIN semantics apply (except that more data is selected). The spec (section 4.4.5.3 of JPA v2.0) gives an example of duplicate Department rows being returned despite the fact that the Employee objects are not in the select clause.
The more direct reference is in the SELECT section (section 4.8 of JPA v2.0), where it clearly states
"If DISTINCT is not specified, duplicate values are not eliminated."
Many JPA providers do in fact remove the duplicates for a few reasons:
a) Convenience of the users because some users are not knowledgable enough in SQL and are not expecting them
b) There is not typically a use case for requiring dups
c) They may be added to a result set and if object identity is maintained the dups get eliminated automatically

C is correct, joins to ToMany relationships should not return duplicates. The JPA provider should automatically use a distinct to filter these out. I believe this is what the spec requires, although it may be one of those less well defined areas of the spec.
If a join fetch is used, the I believe the spec actually requires the duplicates to be returned. Which is odd, can see why you would every want duplicates. If you put a distinct on a join fetch, then they will be filtered (in memory, as all rows need to be selected).
This is how EclipseLink works anyway.
All of the other cases select Books not readers, so get the duplicates, C selects Readers so should not get duplicates.

Related

Looking for the best option to limit JPA query volume

I have an entity model where I need to dig down two collections to get the data required on the screen - in Hibernate that triggered first an exception when I created a JPA join fetch query until I changed the collections to sets. It is still a cartesian product though. I am trying to write a JPARepository function to execute the following query now, but only with specific fields, because the resulting query returns more than 100 fields of which I need less than 10.
SELECT p FROM Person p
JOIN FETCH p.responsibilities r
JOIN FETCH r.configurations c
JOIN FETCH c.type
WHERE p.id = :id
My JPA Repository method is already using projection interfaces but I think because I am using the #Query annotation it does not optimize the query. When I tried it as a plain, typed findById() method with projection interfaces it did execute a load of subqueries which costs much more time.
Any recommendations

Projection with Entity Splitting in Entity Framework 6

I am using entity splitting to split properties across multiple tables - http://msdn.microsoft.com/en-us/data/jj591617
This adds an inner join to the resulting SQL query.
I expected this join would only be included when the query projection includes the properties in the secondary table. This is not the case when I use an anonymous type to isolate (project) a subset of needed fields. The resulting SQL query only selects columns from the base table but still includes the join.
Is there anyway to continue to use entity splitting and only include the join when necessary?
As far as I can tell, no. With Entity Framework, you can't lazy-load simple properties that are mapped to a column (which, in your case, would theoretically help you avoid the join); only navigation properties can be lazy-loaded. Perhaps you would need to employ table splitting instead to achieve the goal of eliminating the join.
For reference:
http://www.eidias.com/blog/2013/11/18/entity-framework-lazy-loading-properties

Do canonical LINQ queries guard against N+1

With lazy loading used by default, I know that you should call .Include() on your Entity Framework entities to pull in associated entities you want in your queries to reduce the number of calls to the db if you're calling LINQ methods on your entities. If you don't, you run the risk of repeated database calls for each row (the N+1 problem)
Can someone confirm that if I write a canonical LINQ query, with the joins defined explicitly, that we guard against N+1?
from x in _context.tblOrder
join y in _context.tblCustomer equals y.id = x.customerId
select x
Is there any way N+1 could creep in when we're loading in all the required entities with joins?
EDIT
As background, someone asked how junior developers could guard against N+1. I mentioned the simplest way would be to write out your queries and define your joins, I want confirmation that was I indicated was 100% accurate.
If what you are really asking is
Will this query hit the database once?
Then the answer is yes. LINQ to EF translates your expression to raw SQL and only when you evaluate the query will it send anything to the database e.g. ToList()/foreach/for etc. and once that query is sent nothing else is unless you explicitly tell it otherwise.
Your LINQ statement could be simplified using a Lambda expression e.g.
_context.tblOrder.Include("Customer").ToList();
This would give you all the order details, including all related customer details, in a single database trip.
Just because you specify tables in a join doesn't mean that you can't run into a n+1 issue when you iterate over the values. Consider the following extension to your query:
var query = from o in Orders
join c in Customers on o.CustomerID equals c.CustomerID
select o;
foreach (var o in query)
{
Console.WriteLine(String.Format("{0}: {1}", o.OrderDate, o.Employee.FirstName));
}
In this case, each time you navigate through the order's Employee object, the employee is fetched from the database for that order. If you wanted to avoid the issue, you could project the values you want in the select clause:
var query = from o in Orders
join c in Customers on o.CustomerID equals c.CustomerID
select new {o.OrderDate, o.Employee.FirstName};
foreach (var o in query)
{
Console.WriteLine(String.Format("{0}: {1}", o.OrderDate, o.FirstName));
}
Note, in this case, you don't even need the join as you can just use the navigation properties instead. Of course, if you don't allow navigation properties in your entities and rely only on joins, you can avoid the n+1 situation, but that is not a very OOP way of solving the problem.
I think you would be safe guaranteeing against n+1 if you only return anonymous types from your queries, but that would be rather restrictive as well.
The best option is to make sure to profile your application's generated SQL and know precisely when and why you are hitting the database. I discuss some of the profilers available at http://www.thinqlinq.com/Post.aspx/Title/LINQ-to-Database-Performance-hints.

How do I do a "deep" fetch join in JPQL?

I don't think I will ever fully understand fetch joins.
I have a query where I'm attempting to eagerly "inflate" references down two levels.
That is, my A has an optional Collection of Bs, and each B has either 0 or 1 C. The size of the B collection is known to be small (10-20 tops). I'd like to prefetch this graph.
A's B relationship is marked as FetchType.LAZY and is optional. B's relationship to C is also optional and FetchType.LAZY.
I was hoping I could do:
SELECT a
FROM A a
LEFT JOIN FETCH a.bs // look, no alias; JPQL forbids it
LEFT JOIN a.bs b // "repeated" join necessary since you can't alias fetch joins
LEFT JOIN FETCH b.c // this doesn't seem to do anything
WHERE a.id = :id
When I run this, I see that As B collection is indeed fetched (I see a LEFT JOIN in the SQL referencing the table to which B is mapped).
However, I see no such evidence that C's table is fetched.
How can I prefetch all Cs and all Bs and all Cs that are "reachable" from a given A? I can't see any way to do this.
The JPA spec does not allow aliasing a fetch join, but some JPA providers do.
EclipseLink does as of 2.4. EclipseLink also allow nested join fetch using the dot notation (i.e. "JOIN FETCH a.bs.c"), and supports a query hint "eclipselink.join-fetch" that allows nested joins (you can specify multiple hints of the same hint name).
In general you need to be careful when using an alias on a fetch join, as you can affect the data that is returned.
See,
http://java-persistence-performance.blogspot.com/2012/04/objects-vs-data-and-filtering-join.html
I'm using Hibernate (and this may be specific to it) and I've had success with this:
SELECT DISTINCT a, b
FROM A a
LEFT JOIN a.bs b
LEFT JOIN FETCH a.bs
LEFT JOIN FETCH b.c
WHERE a.id = :id
(Note the b in the select list).
This was the only way I found this would work for me, note that this returns Object[] for me and I then filter it in code like so:
(List<A>) q.getResultList().stream().map(pair -> (A) (((Object[])pair)[0])).distinct().collect(Collectors.toList());
Not exactly JPQL, but you can achieve that in pure JPA with Criteria queries:
CriteriaBuilder cb = entityManager.getCriteriaBuilder();
CriteriaQuery<MyEntity> q = cb.createQuery(MyEntity.class);
Root<MyEntity> root = q.from(MyEntity.class);
q.select(root);
Fetch bsFetch = root.fetch("b", JoinType.LEFT); //fetch b, property of MyEntity and hold fetch object
bsFetch.fetch("c", JoinType.LEFT); //fetch c, property of b
Support for this kind of nested fetch is vendor-specific (as JPA doesn't require them to do so), but both eclipselink and hibernate do support it, and this way your code remains vendor independant.
JPA does not allow nested join fetches, nor allow an alias on a join
fetch, so this is probably JPA provider specific.
In EclipseLink, you can specify a query hint to perform nested join
fetches.
You can't make it recursive in JPQL though, you could go only at best
n levels. In EclipseLink you could use #JoinFetch or #BatchFetch on
the mapping to make the querying recursive.
See,
http://java-persistence-performance.blogspot.com/2010/08/batch-fetching-optimizing-object-graph.html
Source: http://www.coderanch.com/t/570828/ORM/databases/Recursive-fetch-join-recursively-fetching

jpa lazy fetch entities over multiple levels with criteria api

I am using JPA2 with it's Criteria API to select my entities from the database. The implementation is OpenJPA on WebSphere Application Server. All my entities are modeled with Fetchtype=Lazy.
I select an entity with some criteria from the database and want to load all nested data from sub-tables at once.
If I have a datamodel where table A is joined oneToMany to table B, I can use a Fetch-clause in my criteria query:
CriteriaBuilder cb = entityManager.getCriteriaBuilder();
CriteriaQuery<A> cq = cb.createQuery(A.class);
Root<A> root = cq.from(A.class);
Fetch<A,B> fetch = root.fetch(A_.elementsOfB, JoinType.LEFT);
This works fine. I get an element A and all of its elements of B are filled correctly.
Now table B has a oneToMany-relationship to table C and I want to load them too. So I add the following statement to my query:
Fetch<B,C> fetch2 = fetch.fetch(B_.elementsOfC, JoinType.LEFT);
But this wont do anything.
Does anybody know how to fetch multi level entities in one query?
It does not work with JPQL and there is no way to make it work in CriteriaQueries either. Specification limits fetched entities to the ones in that are referenced directly from the returned entity:
About fetch join with CriteriaQuery:
An association or attribute referenced by the fetch method must be
referenced from an entity or embeddable that is returned as the result
of the query.
About fetch join in JPQL:
The association referenced by the right side of the FETCH JOIN clause
must be an association or ele ment collection that is referenced from
an entity or embeddable that is returned as a result of the query.
Same limitation is also told in OpenJPA documentation.
For what is worth. I do this all the time and it works just fine.
Several points:
I'm using jpa 2.1, but I'm almost sure it used to work in jpa 2.0 as well.
I'm using the criteria api, and I know some things work diferent in jpql. So don't think it works some way or doesn't work because that's what happens in jpql. Most often they do behave in the same way, but not always.
(Also i'm using plain criteria api, no querydsl or anything. Sometimes it makes a difference)
My associations tend to be SINGULAR_ATTRIBUTE. So maybe that's the problem here. Try a test with the joins in reverse "c.fetch(b).fetch(a)" and see if that works. I know it's not the same, but just to see if it gives you any hint. I'm almost sure I have done it with onetomany left fetch joins too, though.
Yep. I just checked and found it: root.fetch("targets", LEFT).fetch("destinations", LEFT).fetch("internal", LEFT)
This has been working without problems for months, maybe more than a year.
I just run a test and it generates this query:
select -- all fields from all tables
...
from agreement a
left outer join target t on a.id = t.agreement_id
left outer join destination d on t.id = d.target_id
left outer join internal i on d.id = i.destination_id
And returns all rows with all associations with all fields.
Maybe the problem is a different thing. You just say "it wont do anyhting". I don't know if it throws an exception or what, but maybe it executes the query properly but doesn't return the rows you expect because of some conditions or something like that.
You could design a view in the DB joining tables b and c, create the entity and fetchit insted of the original entity.