Do canonical LINQ queries guard against N+1 - entity-framework

With lazy loading used by default, I know that you should call .Include() on your Entity Framework entities to pull in associated entities you want in your queries to reduce the number of calls to the db if you're calling LINQ methods on your entities. If you don't, you run the risk of repeated database calls for each row (the N+1 problem)
Can someone confirm that if I write a canonical LINQ query, with the joins defined explicitly, that we guard against N+1?
from x in _context.tblOrder
join y in _context.tblCustomer on x.customerId equals y.id
select x
Is there any way N+1 could creep in when we're loading in all the required entities with joins?
EDIT
As background, someone asked how junior developers could guard against N+1. I said the simplest way would be to write out your queries and define your joins explicitly. I want confirmation that what I indicated was 100% accurate.

If what you are really asking is
Will this query hit the database once?
Then the answer is yes. LINQ to EF translates your expression to raw SQL and only when you evaluate the query will it send anything to the database e.g. ToList()/foreach/for etc. and once that query is sent nothing else is unless you explicitly tell it otherwise.
Your LINQ statement could be simplified by using method syntax with Include, e.g.
_context.tblOrder.Include("Customer").ToList();
This would give you all the order details, including all related customer details, in a single database trip.
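The deferred evaluation described above has a rough analogue in Java streams, which can make the idea easier to see in isolation. The sketch below is only an analogy and does not involve Entity Framework or a database: building the pipeline does no work, and the per-row work runs only when a terminal operation asks for results. The class name DeferredExecutionDemo and the sample values are made up for illustration.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Analogy only: a lazy Java stream stands in for a deferred LINQ query.
final class DeferredExecutionDemo {
    static final AtomicInteger evaluations = new AtomicInteger();

    // Builds the pipeline; the map step is recorded but not executed yet.
    static Stream<String> buildQuery() {
        return Stream.of("order-1", "order-2", "order-3")
                .map(o -> {
                    evaluations.incrementAndGet();
                    return o.toUpperCase();
                });
    }

    // Defining the pipeline alone triggers no per-row work.
    static int evaluationsBeforeTerminalOp() {
        evaluations.set(0);
        buildQuery();
        return evaluations.get();
    }

    // Collecting (comparable to ToList()) runs the whole pipeline once.
    static int evaluationsAfterTerminalOp() {
        evaluations.set(0);
        List<String> rows = buildQuery().collect(Collectors.toList());
        return evaluations.get();
    }

    public static void main(String[] args) {
        System.out.println("before terminal op: " + evaluationsBeforeTerminalOp());
        System.out.println("after terminal op: " + evaluationsAfterTerminalOp());
    }
}
```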

Just because you specify tables in a join doesn't mean that you can't run into an N+1 issue when you iterate over the values. Consider the following extension to your query:
var query = from o in Orders
            join c in Customers on o.CustomerID equals c.CustomerID
            select o;
foreach (var o in query)
{
    Console.WriteLine(String.Format("{0}: {1}", o.OrderDate, o.Employee.FirstName));
}
In this case, each time you navigate through the order's Employee object, the employee is fetched from the database for that order. If you wanted to avoid the issue, you could project the values you want in the select clause:
var query = from o in Orders
            join c in Customers on o.CustomerID equals c.CustomerID
            select new {o.OrderDate, o.Employee.FirstName};
foreach (var o in query)
{
    Console.WriteLine(String.Format("{0}: {1}", o.OrderDate, o.FirstName));
}
Note, in this case, you don't even need the join as you can just use the navigation properties instead. Of course, if you don't allow navigation properties in your entities and rely only on joins, you can avoid the n+1 situation, but that is not a very OOP way of solving the problem.
I think you would be safe guaranteeing against n+1 if you only return anonymous types from your queries, but that would be rather restrictive as well.
The best option is to make sure to profile your application's generated SQL and know precisely when and why you are hitting the database. I discuss some of the profilers available at http://www.thinqlinq.com/Post.aspx/Title/LINQ-to-Database-Performance-hints.
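To make the difference between the two loops above concrete, here is a minimal sketch in Java that counts simulated round trips. It is not Entity Framework code: the in-memory maps stand in for the database, and every name in it (NPlusOneDemo, lazyEmployeeName, loadOrderLines, the sample rows) is an assumption made for illustration. The lazy path pays one round trip for the orders plus one per order, while the projected path pays one in total.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a database; each lookup counts as one round trip.
final class NPlusOneDemo {
    static int roundTrips = 0;

    static final Map<Integer, String> EMPLOYEES = new HashMap<>();
    static final List<int[]> ORDERS = new ArrayList<>(); // each row is {orderId, employeeId}

    static {
        EMPLOYEES.put(1, "Nancy");
        EMPLOYEES.put(2, "Andrew");
        ORDERS.add(new int[] {100, 1});
        ORDERS.add(new int[] {101, 2});
        ORDERS.add(new int[] {102, 1});
    }

    // One round trip that returns every order row.
    static List<int[]> loadOrders() {
        roundTrips++;
        return new ArrayList<>(ORDERS);
    }

    // Simulates a lazy navigation property: one extra round trip per call.
    static String lazyEmployeeName(int employeeId) {
        roundTrips++;
        return EMPLOYEES.get(employeeId);
    }

    // Simulates a projection: the join is done "in the database" in one round trip.
    static List<String> loadOrderLines() {
        roundTrips++;
        List<String> lines = new ArrayList<>();
        for (int[] o : ORDERS) {
            lines.add(o[0] + ": " + EMPLOYEES.get(o[1]));
        }
        return lines;
    }

    // Iterating orders and touching the lazy property costs 1 + N round trips.
    static int lazyRoundTrips() {
        roundTrips = 0;
        for (int[] o : loadOrders()) {
            lazyEmployeeName(o[1]);
        }
        return roundTrips;
    }

    // Fetching the projected values costs a single round trip.
    static int projectedRoundTrips() {
        roundTrips = 0;
        loadOrderLines();
        return roundTrips;
    }

    public static void main(String[] args) {
        System.out.println("lazy: " + lazyRoundTrips());
        System.out.println("projected: " + projectedRoundTrips());
    }
}
```

A counter like this is only a teaching aid; in a real application the equivalent check is watching the generated SQL in a profiler, as suggested above.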

Related

Join database query Vs Handling Joins in API

I am developing an API and I am unsure which is the more efficient way to handle a join query.
I want to join data from two tables and return the response. Either I can query the database with a join, fetch the result, and return the response, or I can fire two separate queries, handle the join in the API on the fly, and return the response. Which is the efficient and correct way?
Databases are generally much faster at joining than application code that queries the tables separately and joins the results as class instances. Always do joins in the database and map the results in code. Also consider lazy loading where it is available, because in a situation like the one below:
@Entity
@Table(name = "USER")
public class UserLazy implements Serializable {
    @Id
    @GeneratedValue
    @Column(name = "USER_ID")
    private Long userId;

    @OneToMany(fetch = FetchType.LAZY, mappedBy = "user")
    private Set<OrderDetail> orderDetail = new HashSet<>();

    // standard setters and getters
    // also override equals and hashcode
}
you might not want order details when you want the initial results.
Usually it's more efficient to do the join in the database, but there are some corner cases, mostly due to the fact that application CPU time is cheaper than database CPU time. Here are a few examples that come to mind, with a query like "table A join table B":
B is a small table that rarely changes.
In this case it can be profitable to cache the contents of this table in the application, and not query it at all.
Rows in A are quite large, and many rows of B are selected for each row of A.
This will cause useless network traffic and load as rows from A are duplicated many times in each result row.
Rows in B are quite large, and there are few distinct b_id's in A
Same as above, except this time the same few rows from B are duplicated in the result set.
In the previous two examples, it could be useful to perform the query on table A, then gather a set of unique b_id's from the result, and SELECT FROM b WHERE b_id IN (list).
Data structure and ORMs
If each table contains a different object type, and they have a "belongs to" relationship (like category and product) and you use an ORM which will instantiate objects for each selected row, then perhaps you only want one instance of each category, and not one per selected product. In this case, you could select the products, gather a list of unique category_ids, and select the categories from there. The ORM may even do that for you behind the scene.
Complicated aggregates
Sometimes, you want some stuff, and some aggregates of other stuff related to the first stuff, but it just won't fit in a neat GROUP BY, or you may need several ones.
So basically, the join usually works better in the database, so that should be the default. If you do it in the application, then you should know why you're doing it, and decide it's a good reason. If it is, then fine. I gave a few reasons, based on performance, data model, and SQL constraints; these are only examples, of course.
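The "query A, gather the unique b_id's, then SELECT FROM b WHERE b_id IN (list)" approach mentioned above can be sketched as follows. This is a minimal illustration with in-memory collections standing in for the two tables; the class AppSideJoinDemo, the product and category data, and the helper names are all hypothetical, and a real implementation would issue the two SQL statements shown in the comments.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory tables; real code would run the SQL named in the comments.
final class AppSideJoinDemo {
    static final class Product {
        final String name;
        final int categoryId;

        Product(String name, int categoryId) {
            this.name = name;
            this.categoryId = categoryId;
        }
    }

    static final Map<Integer, String> CATEGORY_TABLE = new HashMap<>();
    static int lastInListSize = -1; // size of the most recent IN (...) list

    static {
        CATEGORY_TABLE.put(1, "fruit");
        CATEGORY_TABLE.put(2, "tools");
        CATEGORY_TABLE.put(3, "unused");
    }

    // Stand-in for: SELECT name, category_id FROM product
    static List<Product> selectProducts() {
        List<Product> rows = new ArrayList<>();
        rows.add(new Product("apple", 1));
        rows.add(new Product("pear", 1));
        rows.add(new Product("hammer", 2));
        return rows;
    }

    // Stand-in for: SELECT id, name FROM category WHERE id IN (ids)
    static Map<Integer, String> selectCategoriesIn(Set<Integer> ids) {
        lastInListSize = ids.size();
        Map<Integer, String> found = new HashMap<>();
        for (Integer id : ids) {
            if (CATEGORY_TABLE.containsKey(id)) {
                found.put(id, CATEGORY_TABLE.get(id));
            }
        }
        return found;
    }

    // Two queries, then the join is resolved with a map lookup in the application.
    static List<String> joinInApplication() {
        List<Product> products = selectProducts();
        Set<Integer> ids = new LinkedHashSet<>();
        for (Product p : products) {
            ids.add(p.categoryId); // duplicates collapse, so each category is fetched once
        }
        Map<Integer, String> categories = selectCategoriesIn(ids);
        List<String> joined = new ArrayList<>();
        for (Product p : products) {
            joined.add(p.name + " -> " + categories.get(p.categoryId));
        }
        return joined;
    }

    public static void main(String[] args) {
        joinInApplication().forEach(System.out::println);
    }
}
```

The point of the de-duplicated id set is that each category row crosses the wire once, instead of being repeated on every joined product row.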

How do I do a "deep" fetch join in JPQL?

I don't think I will ever fully understand fetch joins.
I have a query where I'm attempting to eagerly "inflate" references down two levels.
That is, my A has an optional Collection of Bs, and each B has either 0 or 1 C. The size of the B collection is known to be small (10-20 tops). I'd like to prefetch this graph.
A's B relationship is marked as FetchType.LAZY and is optional. B's relationship to C is also optional and FetchType.LAZY.
I was hoping I could do:
SELECT a
FROM A a
LEFT JOIN FETCH a.bs // look, no alias; JPQL forbids it
LEFT JOIN a.bs b // "repeated" join necessary since you can't alias fetch joins
LEFT JOIN FETCH b.c // this doesn't seem to do anything
WHERE a.id = :id
When I run this, I see that A's B collection is indeed fetched (I see a LEFT JOIN in the SQL referencing the table to which B is mapped).
However, I see no such evidence that C's table is fetched.
How can I prefetch all Bs and all Cs that are "reachable" from a given A? I can't see any way to do this.
The JPA spec does not allow aliasing a fetch join, but some JPA providers do.
EclipseLink does as of 2.4. EclipseLink also allows nested join fetch using the dot notation (i.e. "JOIN FETCH a.bs.c"), and supports a query hint "eclipselink.join-fetch" that allows nested joins (you can specify multiple hints of the same hint name).
In general you need to be careful when using an alias on a fetch join, as you can affect the data that is returned.
See,
http://java-persistence-performance.blogspot.com/2012/04/objects-vs-data-and-filtering-join.html
I'm using Hibernate (and this may be specific to it) and I've had success with this:
SELECT DISTINCT a, b
FROM A a
LEFT JOIN a.bs b
LEFT JOIN FETCH a.bs
LEFT JOIN FETCH b.c
WHERE a.id = :id
(Note the b in the select list).
This was the only way I found that worked for me. Note that this returns Object[] rows, which I then filter in code like so:
(List<A>) q.getResultList().stream().map(pair -> (A) (((Object[])pair)[0])).distinct().collect(Collectors.toList());
Not exactly JPQL, but you can achieve that in pure JPA with Criteria queries:
CriteriaBuilder cb = entityManager.getCriteriaBuilder();
CriteriaQuery<MyEntity> q = cb.createQuery(MyEntity.class);
Root<MyEntity> root = q.from(MyEntity.class);
q.select(root);
Fetch bsFetch = root.fetch("b", JoinType.LEFT); //fetch b, property of MyEntity and hold fetch object
bsFetch.fetch("c", JoinType.LEFT); //fetch c, property of b
Support for this kind of nested fetch is vendor-specific (as JPA doesn't require them to do so), but both EclipseLink and Hibernate do support it, and this way your code remains vendor independent.
JPA does not allow nested join fetches, nor does it allow an alias on a join fetch, so this is probably JPA provider specific.
In EclipseLink, you can specify a query hint to perform nested join fetches.
You can't make it recursive in JPQL, though; at best you can go n levels deep. In EclipseLink you could use @JoinFetch or @BatchFetch on the mapping to make the querying recursive.
See,
http://java-persistence-performance.blogspot.com/2010/08/batch-fetching-optimizing-object-graph.html
Source: http://www.coderanch.com/t/570828/ORM/databases/Recursive-fetch-join-recursively-fetching

jpa lazy fetch entities over multiple levels with criteria api

I am using JPA2 with its Criteria API to select my entities from the database. The implementation is OpenJPA on WebSphere Application Server. All my entities are modeled with FetchType.LAZY.
I select an entity with some criteria from the database and want to load all nested data from sub-tables at once.
If I have a datamodel where table A is joined oneToMany to table B, I can use a Fetch-clause in my criteria query:
CriteriaBuilder cb = entityManager.getCriteriaBuilder();
CriteriaQuery<A> cq = cb.createQuery(A.class);
Root<A> root = cq.from(A.class);
Fetch<A,B> fetch = root.fetch(A_.elementsOfB, JoinType.LEFT);
This works fine. I get an element A and all of its elements of B are filled correctly.
Now table B has a oneToMany-relationship to table C and I want to load them too. So I add the following statement to my query:
Fetch<B,C> fetch2 = fetch.fetch(B_.elementsOfC, JoinType.LEFT);
But this won't do anything.
Does anybody know how to fetch multi level entities in one query?
It does not work with JPQL, and there is no way to make it work in CriteriaQueries either. The specification limits fetched entities to the ones that are referenced directly from the returned entity.
About fetch join with CriteriaQuery:
"An association or attribute referenced by the fetch method must be referenced from an entity or embeddable that is returned as the result of the query."
About fetch join in JPQL:
"The association referenced by the right side of the FETCH JOIN clause must be an association or element collection that is referenced from an entity or embeddable that is returned as a result of the query."
The same limitation is also stated in the OpenJPA documentation.
For what it's worth, I do this all the time and it works just fine.
Several points:
I'm using JPA 2.1, but I'm almost sure it used to work in JPA 2.0 as well.
I'm using the Criteria API, and I know some things work differently in JPQL. So don't assume it works (or doesn't work) a certain way just because that's what happens in JPQL. Most often they behave the same way, but not always.
(Also, I'm using the plain Criteria API, no QueryDSL or anything. Sometimes it makes a difference.)
My associations tend to be SINGULAR_ATTRIBUTE, so maybe that's the problem here. Try a test with the joins in reverse, "c.fetch(b).fetch(a)", and see if that works. I know it's not the same, but it may give you a hint. I'm almost sure I have done it with one-to-many left fetch joins too, though.
Yep. I just checked and found it: root.fetch("targets", LEFT).fetch("destinations", LEFT).fetch("internal", LEFT)
This has been working without problems for months, maybe more than a year.
I just ran a test and it generates this query:
select -- all fields from all tables
...
from agreement a
left outer join target t on a.id = t.agreement_id
left outer join destination d on t.id = d.target_id
left outer join internal i on d.id = i.destination_id
And returns all rows with all associations with all fields.
Maybe the problem is something different. You just say "it won't do anything". I don't know if it throws an exception or what, but maybe it executes the query properly and doesn't return the rows you expect because of some conditions or something like that.
You could design a view in the DB joining tables b and c, create an entity for it, and fetch it instead of the original entity.

JPQL: Inner Join without duplicate records

Below is a question which supposedly was part of the official exam from Sun:
A Reader entity has a one-to-many, bidirectional relationship with a
Book entity. Two Reader entities are persisted, each having two Book
entities associated with them. For example, reader 1 has book a and
book b, while reader 2 has book c and book d. Which query returns a
Collection of fewer than four elements?
A. SELECT b.reader FROM Book b
B. SELECT r FROM Book b INNER JOIN b.reader r
C. SELECT r FROM Reader r INNER JOIN r.books b
D. SELECT r from Book b LEFT JOIN b.reader r LEFT JOIN FETCH r.books
Given answer is C, which I believe is incorrect. From what I understand, SQL with inner join of two tables will be generated by JPA provider. Therefore, in all cases we will get 4 records. I've run a test with one-to-many relation and duplicates were included.
Who is wrong, me or Sun?
Answer from Mike Keith, EJB 3.0 co-specification lead:
There are a couple of statements related to duplicates in the spec.
The JOIN FETCH is a variation of the JOIN, but it does state that similar JOIN semantics apply (except that more data is selected). The spec (section 4.4.5.3 of JPA v2.0) gives an example of duplicate Department rows being returned despite the fact that the Employee objects are not in the select clause.
The more direct reference is in the SELECT section (section 4.8 of JPA v2.0), where it clearly states
"If DISTINCT is not specified, duplicate values are not eliminated."
Many JPA providers do in fact remove the duplicates for a few reasons:
a) Convenience of the users because some users are not knowledgeable enough in SQL and are not expecting them
b) There is not typically a use case for requiring dups
c) They may be added to a result set and if object identity is maintained the dups get eliminated automatically
C is correct, joins to ToMany relationships should not return duplicates. The JPA provider should automatically use a distinct to filter these out. I believe this is what the spec requires, although it may be one of those less well defined areas of the spec.
If a join fetch is used, then I believe the spec actually requires the duplicates to be returned. Which is odd; I can't see why you would ever want duplicates. If you put a distinct on a join fetch, then they will be filtered (in memory, as all rows need to be selected).
This is how EclipseLink works anyway.
All of the other cases select Books not readers, so get the duplicates, C selects Readers so should not get duplicates.
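The disagreement above comes down to whether a provider returns one result per joined row or collapses them. A minimal Java sketch of the two behaviours, using the exam's data (reader 1 has books a and b, reader 2 has books c and d), is shown below. It emulates the row-per-match semantics with plain collections and is not JPA code; the class JoinDuplicatesDemo and its method names are made up, and actual provider behaviour may differ as the answers note.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Emulates "SELECT r FROM Reader r INNER JOIN r.books b" over the exam's sample data.
final class JoinDuplicatesDemo {
    // Each row is {book, owning reader}.
    static final String[][] BOOKS = {
        {"a", "reader1"}, {"b", "reader1"}, {"c", "reader2"}, {"d", "reader2"}
    };

    // Without DISTINCT: one result per joined row, so each reader appears once per book.
    static List<String> joinedReaders() {
        List<String> result = new ArrayList<>();
        for (String[] book : BOOKS) {
            result.add(book[1]);
        }
        return result;
    }

    // With DISTINCT (or a provider that filters duplicates): one result per reader.
    static List<String> distinctReaders() {
        return new ArrayList<>(new LinkedHashSet<>(joinedReaders()));
    }

    public static void main(String[] args) {
        System.out.println(joinedReaders());
        System.out.println(distinctReaders());
    }
}
```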

How to sort related entities with eager loading in ADO.NET Entity Framework

Greetings,
Considering the Northwind sample tables Customers, Orders, and OrderDetails, I would like to eager load the related entities corresponding to the tables mentioned above, and yet I need to order the child entities on the database before fetching entities.
Basic case:
var someQueryable = from customer in northwindContext.Customers.Include("Orders.OrderDetails")
select customer;
but I also need to sort Orders and OrderDetails on the database side (before fetching those entities into memory) with respect to some random column on those tables. Is it possible without some projection, like it is in T-SQL? It doesn't matter whether the solution uses e-SQL or LINQ to Entities. I searched the web but I wasn't satisfied with the answers I found since they mainly involve projecting data to some anonymous type and then re-query that anonymous type to get the child entities in the order you like. Also using CreateSourceQuery() doesn't seem to be an option for me since I need to get the data as it is on the database side, with eager loading but just by ordering child entities. That is I want to do the "ORDER BY" before executing any query and then fetch the entities in the order I'd like. Thanks in advance for any guidance. As a personal note, please excuse the direct language since I am kinda pissed at Microsoft for releasing the EF in such an immature shape even compared to Linq to SQL (which they seem to be getting away slowly). I hope this EF thingie will get much better and without significant bugs in the release version of .NET FX 4.0.
Actually I have a Tip that addresses exactly this issue.
Sorting of related entities is not 'supported', but using the projection approach Craig shows AND relying on something called 'Relationship Fixup' you can get something very similar working:
If you do this:
var projection = from c in ctx.Customers
                 select new {
                     Customer = c,
                     Orders = c.Orders.OrderByDescending(
                         o => o.OrderDate
                     )
                 };
foreach (var anon in projection)
{
    // anon.Orders is sorted (because of the projection)
    // anon.Customer.Orders is sorted too, because of relationship fixup
}
Which means if you do this:
var customers = projection.AsEnumerable().Select(x => x.Customer);
you will have customers that have sorted orders!
See the tip for more info.
Hope this helps
Alex
You are confusing two different problems. The first is how to materialize entities in the database, the second is how to retrieve an ordered list. The EntityCollection type is not an ordered list. In your example, customer.Orders is an EntityCollection.
On the other hand, if you want to get a list in a particular order, you can certainly do that; it just can't be in a property of type EntityCollection. For example:
from c in northwindContext.Customers
orderby c.SomeField
select new {
    Name = c.Name,
    Orders = from o in c.Orders
             orderby o.SomeField
             select new {
                 SomeField = o.SomeField
             }
}
Note that there is no call to Include. Because I am projecting, it is unnecessary.
The Entity Framework may not work in the way you expect, coming from a LINQ to SQL background, but it does work. Be careful about condemning it before you understand it; deciding that it doesn't work will prevent you from learning how it does work.
Thank you both. I understand that I can use projection to achieve what I wanted, but I thought there might be an easy way to do it, since in the T-SQL world it's perfectly possible with a few nested queries (or joins) and ORDER BYs. On the other hand, separation of concerns sounds reasonable and we are in the entity domain now, so I will use the approach you both recommended, though I have to admit this is easier and cleaner to achieve in LINQ to SQL by using AssociateWith.
Kind regards.