Faster/efficient alternative to IN clause in custom/native queries in spring data jpa - postgresql

I have a custom query along these lines. I get the list of orderIds from outside. I have the entire order object list with me, so I can change the query in any way, if needed.
#Query("SELECT p FROM Person p INNER JOIN p.orders o WHERE o.orderId in :orderIds)")
public List<Person> findByOrderIds(#Param("orderIds") List<String> orderIds);
This query works fine, but sometimes it may have anywhere between 50-1000 entries in the orderIds list sent from outside function. So it becomes very slow, taking as much as 5-6 seconds which is not fast enough. My question is, is there a better, faster way to do this? When I googled, and on this site, I see we can use ANY, EXISTS: Postgresql: alternative to WHERE IN respective WHERE NOT IN or create a temporary table: https://dba.stackexchange.com/questions/12607/ways-to-speed-up-in-queries-under-postgresql or join this to VALUES clause: Alternative when IN clause is inputed A LOT of values (postgreSQL). All these answers are tailored towards direct SQL calls, nothing based on JPA. ANY keyword is not supported by spring-data. Not sure about creating temporary tables in custom queries. I think I can do it with native queries, but have not tried it. I am using spring-data + OpenJPA + PostgresSQL.
Can you please suggest a solution or give pointers? I apologize if I missed anything.
thanks,
Alice

You can use WHERE EXISTS instead of IN Clause in a native SQL Query as well as in HQL in JPA which results in a lot of performance benefits. Please see sample below
Sample JPA Query:
SELECT emp FROM Employee emp JOIN emp.projects p where NOT EXISTS (SELECT project from Project project where p = project AND project.status <> 'Active')

Related

JPQL equivalent of SQL query using unions and selecting constants

I've written a SQL query that basically selects from a number of tables to determine which ones have rows that were created since a particular date. My SQL looks something like this:
SELECT widget_type FROM(
SELECT 'A' as widget_type
FROM widget_a
WHERE creation_timestamp > :cutoff
UNION
SELECT 'B' as widget_type
FROM widget_b
WHERE creation_timestamp > :cutoff
) types
GROUP BY widget_type
HAVING count(*)>0
That works well in SQL but I recently found that, while JPA may use unions to perform "table per class" polymorphic queries, JPQL does not support unions in queries. So that leaves me wondering whether JPA has an alternative I could use to accomplish the same thing.
In reality, I would be querying against a dozen tables, not just two, so I would like to avoid doing separate queries. I would also like to avoid doing a native SQL query for portability reasons.
In the question I linked to above, it was asked whether the entities that map to widget_a and widget_b are part of the same inheritance tree. Yes, they are. However, if I selected from their base class, I don't believe I would have a way of specifying different string constants for the different child entities, would I? If I could select an entity's class name instead of a string I provide, that might serve my purpose too. But I don't know if that's possible either. Thoughts?
I did a little more searching and found a (seemingly obscure) feature of JPA that serves my purpose perfectly. What I found is that JPA 2 has a type keyword that allows you to limit polymorphic queries to a particular subclass, like so:
SELECT widget
FROM BaseWidget widget
WHERE TYPE(widget) in (WidgetB, WidgetC)
I've found that JPA (or at least Hibernate as a JPA implementation) allows you to use type not only in constraints but also in select lists. This is approximately what my query ended up looking like:
SELECT DISTINCT TYPE(widget)
FROM BaseWidget widget
WHERE widget.creationTimestamp > :cutoff
That query returns a list of Class objects. My original query was selecting string literals because that's closest to what I might have done in SQL. Selecting Class is actually preferable in my case. But if I did prefer to select a constant based on an entity's type, that is the exact scenario that Oracle's documentation uses to illustrate case statements:
SELECT p.name
CASE TYPE(p)
WHEN Student THEN 'kid'
WHEN Guardian THEN 'adult'
WHEN Staff THEN 'adult'
ELSE 'unknown'
END
FROM Person p
Some JPA providers do support UNION,
http://wiki.eclipse.org/EclipseLink/UserGuide/JPA/Basic_JPA_Development/Querying/JPQL#UNION
but your query seems very complex, and non object-oriented, so using a native SQL query would probably be best.

How to use "DISTINCT ON (field)" in Doctrine 2?

I know how to use "DISTINCT" in Doctrine 2, but I really need to use "DISTINCT ON (field)" and I don't know how to do this with the QueryBuilder.
My SQL query looks like:
SELECT DISTINCT ON (currency) currency, amount FROM payments ORDER BY currency
And this query works perfect, but I can't use it with the QueryBuilder. Maybe I could write this query on some other way?
I would suggest that the SELECT DISTINCT ON (..) construct that PostgreSQL supports is outside the Object Relational Model (ORM) that is central to Doctrine. Or, perhaps put another way, because SELECT DISTINCT ON (..) is rare in SQL implementations Doctrine haven't coded for it.
Regardless of the actual logic for it not working, I would suggest you try Doctrine's "Native SQL". You need to map the results of your query to the ORM.
With NativeQuery you can execute native SELECT SQL statements and map
the results to Doctrine entities or any other result format supported
by Doctrine.
In order to make this mapping possible, you need to describe to
Doctrine what columns in the result map to which entity property. This
description is represented by a ResultSetMapping object.
With this feature you can map arbitrary SQL code to objects, such as
highly vendor-optimized SQL or stored-procedures.
SELECT DISTINCT ON (..) falls into vendor-optimized SQL I think, so using NativeQuery should allow you to access it.
Doctrine QueryBuilder has some limitations. Even if I didn't check if it's was possible with query builder, I do not hesitate to use DQL when I do not know how to write the query with query builder.
Check theses examples at
http://doctrine-orm.readthedocs.org/en/latest/reference/dql-doctrine-query-language.html#dql-select-examples
Hope this help.
INDEX BY can be used in DQL, allowing first result rows indexed by the defined string/int field to be overwritten by following ones with the same index:
SELECT
p.currency,
p.amount
FROM Namespace\To\Payments p INDEX BY p.currency
ORDER BY p.currency ASC
DQL - EBNF - INDEX BY

QueryDSL: querying relations and properties

I'm using QueryDSL with JPA.
I want to query some properties of an entity, it's like this:
QPost post = QPost.post;
JPAQuery q = new JPAQuery(em);
List<Object[]> rows = q.from(post).where(...).list(post.id, post.name);
It works fine.
If i want to query a relation property, e.g. comments of a post:
List<Set<Comment>> rows = q.from(post).where(...).list(post.comments);
It's also fine.
But when I want to query relation and simple properties together, e.g.
List<Object[]> rows = q.from(post).where(...).list(post.id, post.name, post.comments);
Then something went wrong, generiting a bad SQL syntax.
Then I realized that it's not possible to query them together in one SQL statement.
Is it possible that QueryDSL would somehow deal with relations and generate additional queries (just like what hibernate does with lazy relations), and load the results in?
Or should I just query twice, and then merge both result lists?
P.S. what i actually want is each post with its comments' ids. So a function to concat each post's comment ids is better, is this kind of expressin possible?
q.list(post.id, post.name, post.comments.all().id.join())
and generate a subquery sql like (select group_concat(c.id) from comments as c inner join post where c.id = post.id)
Querydsl JPA is restricted to the expressivity of JPQL, so what you are asking for is not possible with Querydsl JPA. You can though try to express it with Querydsl SQL. It should be possible. Also as you don't project entities, but literals and collections it might work just fine.
Alternatively you can load the Posts with only the Comment ids loaded and then project the id, name and comment ids to something else. This should work when accessors are annotated.
The simplest thing would be to query for Posts and use fetchJoin for comments, but I'm assuming that's too slow for you use case.
I think you ought to simply project required properties of posts and comments and group the results by hand (if required). E.g.
QPost post=...;
QComment comment=..;
List<Tuple> rows = q.from(post)
// Or leftJoin if you want also posts without comments
.innerJoin(comment).on(comment.postId.eq(post.id))
.orderBy(post.id) // Could be used to optimize grouping
.list(new QTuple(post.id, post.name, comment.id));
Map<Long, PostWithComments> results=...;
for (Tuple row : rows) {
PostWithComments res = results.get(row.get(post.id));
if (res == null) {
res = new PostWithComments(row.get(post.id), row.get(post.name));
results.put(res.getPostId(), res);
}
res.addCommentId(row.get(comment.id));
}
NOTE: You cannot use limit nor offset with this kind of queries.
As an alternative, it might be possible to tune your mappings so that 1) Comments are always lazy proxies so that (with property access) Comment.getId() is possible without initializing the actual object and 2) using batch fetch* on Post.comments to optimize collection fetching. This way you could just query for Posts and then access id's of their comments with little performance hit. In most cases you shouldn't even need those lazy proxies unless your Comment is very fat. That kind of code would certainly look nicer without low level row handling and you could also use limit and offset in your queries. Just keep an eye on your query log to make sure everything works as intended.
*) Batch fetching isn't directly supported by JPA, but Hibernate supports it through mapping and Eclipselink through query hints.
Maybe some day Querydsl will support this kind of results grouping post processing out-of-box...

How to build a select using Zend with a DISTINCT specific column?

I'm using Zend Framework for my website and I'd like to retrieve some data from my PostgreSQL database.
I have a request like :
SELECT DISTINCT ON(e.id) e.*, f.*, g.* FROM e, f, g
WHERE e.id = f.id_e AND f.id = g.id_f
This request works well but I don't know how to convert the DISTINCT ON(e.id) with Zend.
It seems that I can get DISTINCT rows but no distinct columns.
$select->distinct()->from("e")->join("f", "e.id = f.id_e")
->join("g", "f.id = g.id_f");
Any idea on how to make a select with distinct column ?
Thanks for help
You probably can't do this with Zend Framework since distinct on is not part of the SQL standard (end of page in Postgres documentation). Although Postgres supports it, I would assume its not part of Zend Framework because you could in theory configure another database connection which does not offer support.
If you know in advance that you're developing for a specific database (Postgres in this case), you could use manually written statements instead. You'll gain more flexibility within the queries and better performance at the cost of no longer being able to switch databases.
You would then instantiate a Zend_Db_Apdapter for Postgres. There a various methods available to get results for SQL queries which are described in the frameworks documentation starting at section Reading Query Results. If you choose to go this route I'd recommend to create an own subclass of the Zend_Db_Adapter_Pgsql class. This is to be able to convert data types and throw exceptions in case of errors instead of returning ambiguous null values and hiding error causes.

Optimising (My)SQL Query

I usually use ORM instead of SQL and I am slightly out of touch on the different JOINs...
SELECT `order_invoice`.*
, `client`.*
, `order_product`.*
, SUM(product.cost) as net
FROM `order_invoice`
LEFT JOIN `client`
ON order_invoice.client_id = client.client_id
LEFT JOIN `order_product`
ON order_invoice.invoice_id = order_product.invoice_id
LEFT JOIN `product`
ON order_product.product_id = product.product_id
WHERE (order_invoice.date_created >= '2009-01-01')
AND (order_invoice.date_created <= '2009-02-01')
GROUP BY `order_invoice`.`invoice_id`
The tables/ columns are logically names... it's an shop type application... the query works... it's just very very slow...
I use the Zend Framework and would usually use Zend_Db_Table_Row::find(Parent|Dependent)Row(set)('TableClass') but I have to make lots of joins and I thought it'll improve performance by doing it all in one query instead of hundreds...
Can I improve the above query by using more appropriate JOINs or a different implementation? Many thanks.
The query is wrong, the GROUP BY is wrong. All columns in the SELECT-part that are not in an aggregate function, have to be in the GROUP BY. You mention only one column.
Change the SQL Mode, set it to ONLY_FULL_GROUP_BY.
When this is done and you have a correct query, use EXPLAIN to find out how the query is executed and what indexes are used. Then start optimizing.