QueryDSL: querying relations and properties - jpa

I'm using QueryDSL with JPA.
I want to query some properties of an entity, like this:
QPost post = QPost.post;
JPAQuery q = new JPAQuery(em);
List<Object[]> rows = q.from(post).where(...).list(post.id, post.name);
It works fine.
If I want to query a relation property, e.g. the comments of a post:
List<Set<Comment>> rows = q.from(post).where(...).list(post.comments);
It's also fine.
But when I want to query relation and simple properties together, e.g.
List<Object[]> rows = q.from(post).where(...).list(post.id, post.name, post.comments);
Then something went wrong, generating bad SQL syntax.
Then I realized that it's not possible to query them together in one SQL statement.
Is it possible that QueryDSL would somehow deal with relations and generate additional queries (just like what hibernate does with lazy relations), and load the results in?
Or should I just query twice, and then merge both result lists?
P.S. What I actually want is each post with its comments' ids. So a function that concatenates each post's comment ids would be even better. Is this kind of expression possible?
q.list(post.id, post.name, post.comments.all().id.join())
and would it generate a subquery like (select group_concat(c.id) from comments as c where c.post_id = post.id)?

Querydsl JPA is restricted to the expressiveness of JPQL, so what you are asking for is not possible with Querydsl JPA. You can, though, try to express it with Querydsl SQL; it should be possible there. Also, since you don't project entities but literals and collections, it might work just fine.
Alternatively you can load the Posts with only the Comment ids loaded and then project the id, name and comment ids to something else. This should work when accessors are annotated.

The simplest thing would be to query for Posts and use fetchJoin for comments, but I'm assuming that's too slow for your use case.
I think you ought to simply project required properties of posts and comments and group the results by hand (if required). E.g.
QPost post=...;
QComment comment=..;
List<Tuple> rows = q.from(post)
    // Or leftJoin if you also want posts without comments
    .innerJoin(comment).on(comment.postId.eq(post.id))
    .orderBy(post.id) // Could be used to optimize grouping
    .list(new QTuple(post.id, post.name, comment.id));
Map<Long, PostWithComments> results=...;
for (Tuple row : rows) {
    PostWithComments res = results.get(row.get(post.id));
    if (res == null) {
        res = new PostWithComments(row.get(post.id), row.get(post.name));
        results.put(res.getPostId(), res);
    }
    res.addCommentId(row.get(comment.id));
}
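To check the grouping logic outside of Querydsl, here is a plain-Java sketch of the same fold, assuming the query result has already been flattened into rows. PostCommentRow, GroupedPost and RowGrouping are hypothetical stand-ins (GroupedPost plays the role of PostWithComments), not Querydsl types. computeIfAbsent replaces the explicit get/put pair, and a null comment id, which is what a leftJoin yields for a post without comments, is skipped.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical flat row: one (post id, post name, comment id) triple per joined row.
final class PostCommentRow {
    final long postId;
    final String postName;
    final Long commentId; // null when a left join found no comment

    PostCommentRow(long postId, String postName, Long commentId) {
        this.postId = postId;
        this.postName = postName;
        this.commentId = commentId;
    }
}

// Hypothetical result type standing in for PostWithComments.
final class GroupedPost {
    final long postId;
    final String postName;
    final List<Long> commentIds = new ArrayList<>();

    GroupedPost(long postId, String postName) {
        this.postId = postId;
        this.postName = postName;
    }
}

final class RowGrouping {
    // Folds joined rows into one GroupedPost per post id.
    // LinkedHashMap keeps posts in the order they first appear (e.g. after orderBy(post.id)).
    static Map<Long, GroupedPost> group(List<PostCommentRow> rows) {
        Map<Long, GroupedPost> results = new LinkedHashMap<>();
        for (PostCommentRow row : rows) {
            GroupedPost res = results.computeIfAbsent(
                    row.postId, id -> new GroupedPost(id, row.postName));
            if (row.commentId != null) {
                res.commentIds.add(row.commentId);
            }
        }
        return results;
    }
}
```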
NOTE: You cannot use limit or offset with this kind of query, because they would apply to the joined post/comment rows rather than to posts.
As an alternative, it might be possible to tune your mappings so that 1) Comments are always lazy proxies, so that (with property access) Comment.getId() can be called without initializing the actual object, and 2) batch fetching* is used on Post.comments to optimize collection loading. This way you could simply query for Posts and then access the ids of their comments with little performance hit. In most cases you shouldn't even need those lazy proxies unless your Comment is very fat. That kind of code would certainly look nicer without low-level row handling, and you could also use limit and offset in your queries. Just keep an eye on your query log to make sure everything works as intended.
*) Batch fetching isn't directly supported by JPA, but Hibernate supports it through mapping and EclipseLink through query hints.
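The idea behind batch fetching can be sketched without any JPA provider: instead of one collection query per Post (the N+1 pattern), the comments of up to batchSize posts are loaded with a single IN query. BatchFetch, fetchInBatches and its loadBatch callback are hypothetical; loadBatch stands in for something like select ... from comments where post_id in (...), and that column name is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class BatchFetch {
    // Loads children for all parent ids, calling loadBatch once per chunk of at most batchSize ids.
    // loadBatch is a hypothetical stand-in for one "where post_id in (:ids)" query per chunk.
    static <C> Map<Long, List<C>> fetchInBatches(List<Long> parentIds, int batchSize,
            Function<List<Long>, Map<Long, List<C>>> loadBatch) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        Map<Long, List<C>> byParent = new HashMap<>();
        for (int from = 0; from < parentIds.size(); from += batchSize) {
            List<Long> chunk = parentIds.subList(from, Math.min(from + batchSize, parentIds.size()));
            Map<Long, List<C>> loaded = loadBatch.apply(chunk);
            for (Long id : chunk) {
                // Parents without children still get an empty list, like an initialized empty collection.
                byParent.put(id, loaded.getOrDefault(id, new ArrayList<>()));
            }
        }
        return byParent;
    }
}
```

With 5 posts and batchSize 2, this issues 3 queries instead of 5.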
Maybe some day Querydsl will support this kind of results grouping post processing out-of-box...

Related

SQL pagination issues, and specifically with GraphQL data-loading

I've been doing some research on how to set up a new GraphQL API project, but am running into some basic (conceptual?) problems figuring out how to do pagination and nested database queries efficiently.
I'd appreciate any pointers or advice!
Let's say we get a GraphQL query like so:
articles(limit: 10) {
  title
  content
  comments(limit: 5) {
    postedAt
    text
  }
}
A typical ORM, assuming eager loading of the nested type, could translate this kind of query into an SQL query like the following, and then loop over the results to group the comments manually and hydrate everything.
select a.title, a.content, c.posted_at, c.text
from articles as a
left join comments as c on c.article_id = a.id
limit ???
But so far, I've only ever seen ORMs like Doctrine (PHP) and Sequelize (JS) fail to do pagination correctly in these cases. They can't handle page sizes correctly, because there's no way to express the limit in this SQL query's setup: a limit here counts joined article/comment rows, not articles.
=> Am I correct in seeing this problem? Or am I missing something crucial, are ORMs able to do pagination with eagerly loaded data somehow?
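One common workaround is to paginate in two steps: apply the limit/offset to the articles alone, then load the comments only for the article ids on that page. In SQL this corresponds roughly to select ... from articles order by id limit ? offset ? followed by select ... from comments where article_id in (...). The sketch below shows the idea in plain Java, with hypothetical ArticleRow and CommentRow types and in-memory lists standing in for the tables; it costs one extra round trip but keeps page sizes exact.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical row types standing in for the articles and comments tables.
final class ArticleRow {
    final long id;
    final String title;
    ArticleRow(long id, String title) { this.id = id; this.title = title; }
}

final class CommentRow {
    final long articleId;
    final String text;
    CommentRow(long articleId, String text) { this.articleId = articleId; this.text = text; }
}

final class TwoStepPagination {
    // Returns article id -> comments for one page of articles (input must be sorted by id).
    static Map<Long, List<CommentRow>> page(List<ArticleRow> articlesSortedById,
                                            List<CommentRow> comments, int offset, int limit) {
        // Step 1: limit/offset apply to articles only,
        // like: select ... from articles order by id limit :limit offset :offset
        int from = Math.min(offset, articlesSortedById.size());
        int to = Math.min(from + limit, articlesSortedById.size());
        Map<Long, List<CommentRow>> result = new LinkedHashMap<>();
        for (ArticleRow a : articlesSortedById.subList(from, to)) {
            result.put(a.id, new ArrayList<>());
        }
        // Step 2: attach comments only for the ids on this page,
        // like: select ... from comments where article_id in (:pageIds)
        for (CommentRow c : comments) {
            List<CommentRow> bucket = result.get(c.articleId);
            if (bucket != null) {
                bucket.add(c);
            }
        }
        return result;
    }
}
```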
So I just recently came across the lateral join type in Postgres, which seems to solve this issue, provided we also add some JSON trickery:
select a.title, a.content, t.data as comments
from articles as a
join lateral (
  select json_agg(sub.*) as data
  from (
    select c.posted_at, c.text
    from comments as c
    where c.article_id = a.id
    limit 5
  ) sub
) t on true
limit 20;
(I think I've seen this kind of lateral + JSON trickery in how Hasura and PostGraphile translate queries to SQL, so I don't think it's unwarranted / bad engineering.)
=> Is there any ORM out there (except Hasura/PostGraphile), possibly Postgres-specific, that uses this kind of lateral + JSON approach instead of the typical method described above?
Lastly, my research has taught me that when building a GraphQL API, you'll typically find yourself data-loading (batching) nested queries instead of eager-loading them from the "parent" query. So, for example, this would be without data-loading:
class ArticleResolver {
  comments(article) {
    return db.query("select ... from comments where ... = {article.id}");
  }
}
and then this would be with data-loading:
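class ArticleResolver {
  // DataLoader collects the ids requested during one tick and calls this once with all of them;
  // the result must contain one entry per id, in the same order as articleIds.
  commentsDataLoader = new DataLoader(articleIds => {
    return db.query("select ... from comments where ... in {articleIds}");
  });

  comments(article) {
    return this.commentsDataLoader.load(article.id);
  }
}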
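The DataLoader pattern itself is small. Below is a minimal, hypothetical Java version (SimpleBatchLoader is not the API of any particular library): load() only queues a key and returns a future, and dispatch() issues one batch call for every queued key. The JavaScript DataLoader dispatches automatically after the current tick of the event loop; this sketch makes that step explicit, is not thread-safe, and does no caching across dispatches.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical DataLoader-style batcher, for illustration only.
final class SimpleBatchLoader<K, V> {
    // Must return exactly one value per key, in the same order as the keys.
    private final Function<List<K>, List<V>> batchFn;
    private final Map<K, CompletableFuture<V>> pending = new LinkedHashMap<>();

    SimpleBatchLoader(Function<List<K>, List<V>> batchFn) {
        this.batchFn = batchFn;
    }

    // Queues a key; repeated keys before the next dispatch share one future.
    CompletableFuture<V> load(K key) {
        return pending.computeIfAbsent(key, k -> new CompletableFuture<>());
    }

    // Issues a single batch call for all queued keys and completes their futures.
    void dispatch() {
        if (pending.isEmpty()) {
            return;
        }
        List<K> keys = new ArrayList<>(pending.keySet());
        List<CompletableFuture<V>> futures = new ArrayList<>(pending.values());
        pending.clear();
        List<V> values = batchFn.apply(keys);
        if (values.size() != keys.size()) {
            IllegalStateException e =
                    new IllegalStateException("batch function must return one value per key");
            futures.forEach(f -> f.completeExceptionally(e));
            return;
        }
        for (int i = 0; i < keys.size(); i++) {
            futures.get(i).complete(values.get(i));
        }
    }
}
```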
But, as soon as you want to start adding parameters like limit: 5 to nested queries, this data-loading query gets as complicated as the original question, so we're back where we were :)
=> Is there a conventional way, or some standard practices, for dealing with this setup? Is there any known way / library to easily write resolvers like, for example, this:
class ArticleResolver {
  ...
  comments(article, limit) {
    return db.somehowMagicallyDataloaded("select * from comments ... = {article.id} limit {limit}")
  }
}
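One way to keep a nested limit inside a batched loader is to fetch the children for all requested parent ids in one query and cap each parent's list afterwards. The sketch below does the capping in plain Java; TopNPerParent.limitPerParent is a hypothetical helper, and it assumes the rows are already sorted so that the ones to keep come first. In Postgres the same cap can usually be pushed into SQL by filtering on row_number() over (partition by article_id order by posted_at desc) in a subquery. Because different limit arguments need different batches, the limit would also have to be part of the loader key, or each limit would need its own loader.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class TopNPerParent {
    // Caps the children of each parent at `limit`, keeping rows in input order, so the
    // input should already be sorted (e.g. by postedAt descending) to pick the right ones.
    // Returns one list per requested id, in the order of parentIds, which is the shape a
    // DataLoader-style batch function has to return.
    static <C> List<List<C>> limitPerParent(List<Long> parentIds, List<C> rows,
                                            Function<C, Long> parentOf, int limit) {
        Map<Long, List<C>> byParent = new HashMap<>();
        for (Long id : parentIds) {
            byParent.putIfAbsent(id, new ArrayList<>());
        }
        for (C row : rows) {
            List<C> bucket = byParent.get(parentOf.apply(row));
            // Rows of parents that were not requested, or beyond the cap, are dropped.
            if (bucket != null && bucket.size() < limit) {
                bucket.add(row);
            }
        }
        List<List<C>> result = new ArrayList<>();
        for (Long id : parentIds) {
            result.add(byParent.get(id));
        }
        return result;
    }
}
```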

Spring JPA repository and QueryDsl how to force left join

Let's say I have two entities, User and Task, and each user can have one task.
The issue I'm facing: I have one record in the user table whose email starts with "a", and there are no records at all in the task table.
The snippet below returns no records, although I would expect it to return the user whose email starts with "a".
In this example, UserRepository extends QuerydslPredicateExecutor.
userRepository.findAll(
QUser.user.email.startsWith("a")
.or(QUser.user.task.text.contains("something"))
)
Checking the logs, I see that Hibernate creates a cross join with user.task_id = task.id as part of the WHERE clause. This effectively behaves like an inner join, so users whose emails start with "a" are discarded if they don't have a task assigned.
Is there a way to force a left join instead of a cross join in the repository's findAll method?
I know I can do it with JPAQuery, but then I would have to reimplement the paging functionality...
JPAQuery query = new JPAQuery(entityManager);
query
    .from(QUser.user)
    .leftJoin(QUser.user.task, QTask.task)
    // ...
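Why the cross join loses rows can be shown with a small in-memory simulation. JoinSemantics is a hypothetical helper: users map an email to a task id (null for no task), and tasks map a task id to its text. With the join condition in the WHERE clause, a user without a matching task never produces a row, so the OR on email is never evaluated for it; a left join keeps that user with a null task text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class JoinSemantics {
    // Cross join + "user.task_id = task.id" in WHERE: a user without a matching task yields no row.
    static List<String> innerJoin(Map<String, Long> users, Map<Long, String> tasks) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Long> u : users.entrySet()) {
            String text = u.getValue() == null ? null : tasks.get(u.getValue());
            if (text == null) {
                continue; // no task row to pair with, so the user is dropped before the OR runs
            }
            if (matches(u.getKey(), text)) out.add(u.getKey());
        }
        return out;
    }

    // Left join: a user without a task still yields one row, with task.text = null.
    static List<String> leftJoin(Map<String, Long> users, Map<Long, String> tasks) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Long> u : users.entrySet()) {
            String text = u.getValue() == null ? null : tasks.get(u.getValue());
            if (matches(u.getKey(), text)) out.add(u.getKey());
        }
        return out;
    }

    // email like 'a%' OR task.text like '%something%'; a null text can never satisfy the
    // second condition, so only the email part can make such a row match.
    private static boolean matches(String email, String taskText) {
        return email.startsWith("a") || (taskText != null && taskText.contains("something"));
    }
}
```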
I am not sure we can do that, since the findAll implementation is generated for us. However, we can pass a predicate to findAll that works around the issue you are encountering.
You can try to do something like this:
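import com.querydsl.jpa.JPAExpressions;
import com.querydsl.jpa.JPQLQuery;

QUser qUser = QUser.user;
QTask qTask = QTask.task;
// Subquery with an explicit left join, so users without a task are kept
JPQLQuery<UserEntity> userJpqlQuery = JPAExpressions.selectFrom(qUser)
    .leftJoin(qUser.task, qTask)
    // where(a, b) would AND the conditions; the question needs OR
    .where(qUser.email.startsWith("a").or(qTask.text.contains("something")));
userRepository.findAll(qUser.in(userJpqlQuery));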
In the code above I have used Querydsl, which is a type-safe alternative to CriteriaBuilder. I then created a subquery that makes the selection I want, and returned all users matching the subquery.
In the end, Hibernate should generate something like this:
select * from User qUser0 where qUser0.id in (
    select qUser1.id from User qUser1
    left join Task qTask0 on
        qUser1.taskId = qTask0.id
    where ...
);

Faster/efficient alternative to IN clause in custom/native queries in spring data jpa

I have a custom query along these lines. I get the list of orderIds from outside. I have the entire order object list with me, so I can change the query in any way, if needed.
@Query("SELECT p FROM Person p INNER JOIN p.orders o WHERE o.orderId in :orderIds")
public List<Person> findByOrderIds(@Param("orderIds") List<String> orderIds);
This query works fine, but the orderIds list sent from the outside function may contain anywhere between 50 and 1000 entries, and then it becomes very slow, taking as much as 5-6 seconds, which is not fast enough. My question is: is there a better, faster way to do this? Searching Google and this site, I see that we can use ANY or EXISTS (Postgresql: alternative to WHERE IN respective WHERE NOT IN), create a temporary table (https://dba.stackexchange.com/questions/12607/ways-to-speed-up-in-queries-under-postgresql), or join against a VALUES clause (Alternative when IN clause is inputed A LOT of values (postgreSQL)). All these answers are tailored towards direct SQL calls; none is based on JPA. The ANY keyword is not supported by spring-data. I'm not sure about creating temporary tables in custom queries; I think I can do it with native queries, but I have not tried it. I am using spring-data + OpenJPA + PostgreSQL.
Can you please suggest a solution or give pointers? I apologize if I missed anything.
thanks,
Alice
You can use WHERE EXISTS instead of an IN clause, both in a native SQL query and in JPQL/HQL, which can bring significant performance benefits. Please see the sample below.
Sample JPA Query:
SELECT emp FROM Employee emp JOIN emp.projects p where NOT EXISTS (SELECT project from Project project where p = project AND project.status <> 'Active')

ormlite select count(*) as typeCount group by type

I want to do something like this in ORMLite:
SELECT *, COUNT(title) as titleCount from table1 group by title;
Is there any way to do this via QueryBuilder without the need for queryRaw?
The documentation states that the use of COUNT() and the like necessitates the use of selectRaw(). I hoped for a way around this - not having to write my SQL as strings is the main reason I chose to use ORMLite.
http://ormlite.com/docs/query-builder
selectRaw(String... columns):
Add raw columns or aggregate functions (COUNT, MAX, ...) to the query. This will turn the query into something only suitable for using as a raw query. This can be called multiple times to add more columns to select. See section Issuing Raw Queries.
Some further information on the use of selectRaw(), since I was attempting much the same thing:
The documentation states that if you use selectRaw(), it will "turn the query into" one that is supposed to be called by queryRaw().
What it does not explain is that, while multiple calls to selectColumns() or to selectRaw() are valid (as long as you use only one or the other),
calling selectRaw() after selectColumns() has a 'hidden' side effect: it wipes out any selectColumns() calls you made previously.
I believe that the ORMLite documentation for selectRaw() would be improved by a note that its use is not intended to be mixed with selectColumns().
QueryBuilder<EmailMessage, String> qb = emailDao.queryBuilder();
qb.selectColumns("emailAddress"); // This column is not selected due to later use of selectRaw()!
qb.selectRaw("COUNT (emailAddress)");
ORMLite examples are not as plentiful as I'd like, so here is a complete example of something that works:
QueryBuilder<EmailMessage, String> qb = emailDao.queryBuilder();
qb.selectRaw("emailAddress"); // This can also be done with a single call to selectRaw()
qb.selectRaw("COUNT (emailAddress)");
qb.groupBy("emailAddress");
GenericRawResults<String[]> rawResults = qb.queryRaw(); // Returns results with two columns
Is there any way to do this via QueryBuilder without the need for queryRaw(...)?
The short answer is no because ORMLite wouldn't know what to do with the extra count value. If you had a Table1 entity with a DAO definition, what field would the COUNT(title) go into? Raw queries give you the power to select various fields but then you need to process the results.
With the code right now (v5.1), you can define a custom RawRowMapper and then use the dao.getRawRowMapper() method to process the results for Table1 and tack on the titleCount field by hand.
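A raw row arrives as String[] values plus the column names, so mapping it by hand is mostly index lookup and parsing. The sketch below mirrors the parameters of ORMLite's RawRowMapper.mapRow(String[] columnNames, String[] resultColumns) but, to stay self-contained, does not implement the ORMLite interface. TitleCount and TitleCountMapper are hypothetical, and the titleCount alias comes from the query in the question; in real code the method body would go into a RawRowMapper<TitleCount> passed to the Dao's queryRaw overload that accepts a mapper.

```java
import java.util.Arrays;

// Hypothetical result type for one row of "select title, count(title) as titleCount ... group by title".
final class TitleCount {
    final String title;
    final long titleCount;

    TitleCount(String title, long titleCount) {
        this.title = title;
        this.titleCount = titleCount;
    }
}

final class TitleCountMapper {
    // Same parameter shape as RawRowMapper.mapRow; raw result values arrive as strings.
    static TitleCount mapRow(String[] columnNames, String[] resultColumns) {
        int titleIdx = indexOf(columnNames, "title");
        int countIdx = indexOf(columnNames, "titleCount");
        return new TitleCount(resultColumns[titleIdx], Long.parseLong(resultColumns[countIdx]));
    }

    // Looks a column up by name so the mapper does not depend on column order.
    private static int indexOf(String[] columnNames, String wanted) {
        for (int i = 0; i < columnNames.length; i++) {
            if (columnNames[i].equalsIgnoreCase(wanted)) {
                return i;
            }
        }
        throw new IllegalArgumentException(
                "column not found: " + wanted + " in " + Arrays.toString(columnNames));
    }
}
```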
I've got an idea how to accomplish this in a better way in ORMLite. I'll look into it.

How to use "DISTINCT ON (field)" in Doctrine 2?

I know how to use "DISTINCT" in Doctrine 2, but I really need to use "DISTINCT ON (field)" and I don't know how to do this with the QueryBuilder.
My SQL query looks like:
SELECT DISTINCT ON (currency) currency, amount FROM payments ORDER BY currency
This query works perfectly, but I can't express it with the QueryBuilder. Maybe I could write this query in some other way?
I would suggest that the SELECT DISTINCT ON (..) construct that PostgreSQL supports is outside the Object Relational Model (ORM) that is central to Doctrine. Or, put another way, because SELECT DISTINCT ON (..) is rare in SQL implementations, Doctrine hasn't coded for it.
Regardless of the actual reason it doesn't work, I would suggest you try Doctrine's "Native SQL". You need to map the results of your query to the ORM.
With NativeQuery you can execute native SELECT SQL statements and map the results to Doctrine entities or any other result format supported by Doctrine.
In order to make this mapping possible, you need to describe to Doctrine what columns in the result map to which entity property. This description is represented by a ResultSetMapping object.
With this feature you can map arbitrary SQL code to objects, such as highly vendor-optimized SQL or stored-procedures.
SELECT DISTINCT ON (..) falls into vendor-optimized SQL I think, so using NativeQuery should allow you to access it.
The Doctrine QueryBuilder has some limitations. Although I didn't check whether this is possible with the query builder, I don't hesitate to use DQL when I don't know how to write a query with the query builder.
Check these examples at
http://doctrine-orm.readthedocs.org/en/latest/reference/dql-doctrine-query-language.html#dql-select-examples
Hope this helps.
INDEX BY can be used in DQL; result rows indexed by the given string/int field are then overwritten by later rows with the same index value:
SELECT
p.currency,
p.amount
FROM Namespace\To\Payments p INDEX BY p.currency
ORDER BY p.currency ASC
DQL - EBNF - INDEX BY
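Note that the two constructs keep different rows: DISTINCT ON keeps the first row per key in ORDER BY order, while INDEX BY, as described above, lets later rows overwrite earlier ones, so the last row per key survives (and all rows are still fetched from the database). The plain-Java sketch below, using a hypothetical Payment type, shows the difference; to make INDEX BY pick the row DISTINCT ON would pick, the secondary sort order would have to be reversed. With ORDER BY currency alone, which row counts as first or last is unspecified in both cases.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical row type for "select currency, amount from payments".
final class Payment {
    final String currency;
    final long amount;

    Payment(String currency, long amount) {
        this.currency = currency;
        this.amount = amount;
    }
}

final class DistinctOnVsIndexBy {
    // DISTINCT ON (currency): the FIRST row of each currency, in ORDER BY order, is kept.
    static Map<String, Payment> distinctOn(List<Payment> orderedRows) {
        Map<String, Payment> out = new LinkedHashMap<>();
        for (Payment p : orderedRows) {
            out.putIfAbsent(p.currency, p);
        }
        return out;
    }

    // INDEX BY p.currency during hydration: each row overwrites the previous one with the
    // same key, so the LAST row per currency survives.
    static Map<String, Payment> indexBy(List<Payment> orderedRows) {
        Map<String, Payment> out = new LinkedHashMap<>();
        for (Payment p : orderedRows) {
            out.put(p.currency, p);
        }
        return out;
    }
}
```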