How to prevent OpenJPA 2.2.2 add extra cross join for multiple-root query - openjpa

I have two tables that don't have a defined relation, but I'm still trying to join them together in a single query, using criteria builder API.
This query work as I want it to - synchronizing the rows from both tables:
Root<E_Application> root = q.from(E_Application.class);
Root<E_Searcher> root2 = q.from(E_Searcher.class);
q.where(cb.equal(root.get(E_Application_.packageName),
root2.get(E_Searcher_.packageName)));
q.select(cb.sum(cb.literal(1)));
the query that comes out is: select sum(1) from application t0 cross join application_searcher t1 where (t0.package_name = t1.package_name);
However, if I add another join:
Root<E_Application> root = q.from(E_Application.class);
Root<E_Searcher> root2 = q.from(E_Searcher.class);
Join<E_Application, E_AppState> j1 =
root.join(E_Application_.publicAppState);
q.where(cb.equal(root.get(E_Application_.packageName),
root2.get(E_Searcher_.packageName)));
q.select(cb.sum(cb.literal(1)));
I get an extra cross join, and end up with a Cartesian product: SELECT SUM(1) FROM application t0 CROSS JOIN application t1 CROSS JOIN application_search t3 INNER JOIN application_state t2 ON t1.PUBLICAPPSTATE_ID = t2.id WHERE (t0.package_name = t3.package_name)
Is there any way to prevent this (except for defining the proper relationship between Application and Searcher)? Is this a proper implementation of the spec? It's sort of weird that the resulting query effectively has a root that I didn't explicitly request, and don't have control over...
The database is postgres, if it matters.
P.S. The reason there is no relationship is because these two tables have their PKs the same, and you can't have a PK that is also an entity reference.

Related

(JPA) Append condition in ON clause during Joins

I need the JPQL in this format:
SELECT *
FROM Person p
LEFT JOIN Address a ON p.id = a.id AND a.flat = 100;
But when i run the code the query being slightly modified henceforth the output of data changed..
SELECT * FROM Person p LEFT JOIN Address a ON p.id = a.id where a.flat = 100;
I suppose you want to format the sql output.
So for hibernate, you can set hibernate.format_sql=true to well format the sql.
If you are using spring boot, you can config it via spring.jpa.properties.hibernate.format_sql=true simply.
JPQL does not allow joining to arbitrary other root objects, so you can't do that in JPQL; only allowing you to join along relations. Individual vendors may offer a vendor extension to allow joining across arbitrary roots but you then lose portability.
To provide more than that you have to post the actual JPA entities, and you haven't.

Can I apply predicates to the same columns against multiple tables in a JOIN only once?

I want to join two tables together and add additional information from two other tables to the same columns in both queried tables. I've come up with the below code, which works, but I don't feel comfortable about having to add another JOIN clause for each table, as it would make the query substantially long if I wanted to join/add more things.
Is there a way to combine it, so that I can join additional tables only once (just use S and E aliases every time)?
SELECT
J.JobId,
J.StandardJobId,
S.JobName,
J.EngineerId,
E.EngineerName,
JF.JobId AS FollowUpJobId,
JF.StandardJobId AS FollowUpStandardJobId,
SF.JobName AS FollowUpJobName,
JF.EngineerId AS FollowUpEngineerId,
EF.EngineerName AS FollowUpEngineerName
FROM
Jobs J
INNER JOIN
Jobs JF
ON
J.FollowUpJobId = JF.JobId
INNER JOIN
StandardJobs S
ON
J.StandardJobId = S.StandardJobId
INNER JOIN
Engineers E
ON
E.EngineerId = J.EngineerId
INNER JOIN
StandardJobs SF
ON
SF.StandardJobId = JF.StandardJobId
INNER JOIN
Engineers EF
ON
EF.EngineerId = JF.EngineerId
One approach would be to use a Common Table Expression (CTE) - something like:
with cte as
(SELECT J.JobId,
J.StandardJobId,
S.JobName,
J.EngineerId,
E.EngineerName,
J.FollowUpJobId
FROM Jobs J
INNER JOIN StandardJobs S ON J.StandardJobId = S.StandardJobId
INNER JOIN Engineers E ON E.EngineerId = J.EngineerId)
SELECT O.*,
F.StandardJobId AS FollowUpStandardJobId,
F.JobName AS FollowUpJobName,
F.EngineerId AS FollowUpEngineerId,
F.EngineerName AS FollowUpEngineerName
FROM CTE AS O
JOIN CTE AS F ON O.FollowUpJobId = F.JobId
You can sort of do this with either a CTE (Common Table Expressions, the WITH clause) or a View:
;WITH Jobs_Extended As
(
SELECT j.*,
s.JobName,
E.EngineerName
FROM Jobs As j
JOIN StandardJobs As s ON s.StandardJobId = j.StandardJobId
JOIN Engineer As e ON e.EngineerId = j.EngineerId
)
SELECT
J.JobId,
J.StandardJobId,
J.JobName,
J.EngineerId,
J.EngineerName,
JF.JobId AS FollowUpJobId,
JF.StandardJobId AS FollowUpStandardJobId,
JF.JobName AS FollowUpJobName,
JF.EngineerId AS FollowUpEngineerId,
JF.EngineerName AS FollowUpEngineerName
FROM Jobs_Extended J
JOIN Jobs_Extended JF ON J.FollowUpJobId = JF.JobId
In this example the CTE Jobs_Extended becomes a defined alias for the relationship between the Jobs, Engineers and StandardJobs tables. Then once defined, you can use it multiple times in the query without having to redefine those interior relations.
You can do the same thing by change the WITH to a View, which will make the defined alias permannet in your database.
No, you cannot avoid JOINing related tables each time a separate reference is needed. The issue is that you are not working with the tables in a general sense but instead working with the specific rows of each table, even more specifically, just those rows that match the JOIN and WHERE conditions.
There is no way to specify the references to either StandardJobs or Engineers only once because you are needing to work with two rows from each table at the same time, at least in the given example.
However, depending on which direction you are wanting to go with "additional tables" (more references to Jobs or more lookups like StandardJobs and Engineers for the given 2 references of Jobs), the CTE construct shown by Mark is the probably the easiest / best way to abstract it. I posted this answer mainly to explain the issue at hand.

Postgres: left join with order by and limit 1

I have the situation:
Table1 has a list of companies.
Table2 has a list of addresses.
Table3 is a N relationship of Table1 and Table2, with fields 'begin' and 'end'.
Because companies may move over time, a LEFT JOIN among them results in multiple records for each company.
begin and end fields are never NULL. The solution to find the latest address is use a ORDER BY being DESC, and to remove older addresses is a LIMIT 1.
That works fine if the query can bring only 1 company. But I need a query that brings all Table1 records, joined with their current Table2 addresses. Therefore, the removal of outdated data must be done (AFAIK) in LEFT JOIN's ON clause.
Any idea how I can build the clause to not create duplicated Table1 companies and bring latest address?
Use a dependent subquery with max() function in a join condition.
Something like in this example:
SELECT *
FROM companies c
LEFT JOIN relationship r
ON c.company_id = r.company_id
AND r."begin" = (
SELECT max("begin")
FROM relationship r1
WHERE c.company_id = r1.company_id
)
INNER JOIN addresses a
ON a.address_id = r.address_id
demo: http://sqlfiddle.com/#!15/f80c6/2
Since PostgreSQL 9.3 there is JOIN LATERAL (https://www.postgresql.org/docs/9.4/queries-table-expressions.html) that allows to make a sub-query to join, so it solves your issue in an elegant way:
SELECT * FROM companies c
JOIN LATERAL (
SELECT * FROM relationship r
WHERE c.company_id = r.company_id
ORDER BY r."begin" DESC LIMIT 1
) r ON TRUE
JOIN addresses a ON a.address_id = r.address_id
The disadvantage of this approach is the indexes of the tables inside LATERAL do not work outside.
I managed to solve it using Windows Function:
WITH ranked_relationship AS(
SELECT
*
,row_number() OVER (PARTITION BY fk_company ORDER BY dt_start DESC) as dt_last_addr
FROM relationship
)
SELECT
company.*
address.*,
dt_last_addr as dt_relationship
FROM
company
LEFT JOIN ranked_relationship as relationship
ON relationship.fk_company = company.pk_company AND dt_last_addr = 1
LEFT JOIN address ON address.pk_address = relationship.fk_address
row_number() creates an int counter for each record, inside each window based to fk_company. For each window, the record with latest date comes first with rank 1, then dt_last_addr = 1 makes sure the JOIN happens only once for each fk_company, with the record with latest address.
Window Functions are very powerful and few ppl use them, they avoid many complex joins and subqueries!

SQL Server 2008: many to many tables with relational tables ordering field + grouping

To understand what I need, here is the table's I'm using diagram: http://pascalc.nougen.com/stuffs/diagram.png
I need to get a project's properties + all it's relation, all listed based on the corresponding relational tables' OrderNumber column.
Let's say I need "Project Z", I want to get:
Project's BaseUrl, ... where ID = #ID
All testsuites associated to that project, listed by ProjectsToTestSuites.OrderNumber
All testcases associated to the matching testsuites, listed by TestSuitesToTestCases.OrderNumber
All testaction associated to the matching testcases, listed by TestCasesToTestActions.OrderNumber
So far, all my attempts are returning back results with mixed ordering. A testcase is mixed inside a testsuite it doesn't belong to and alike.
I try to avoid using cursors (loop each relation in specific order required), tried the use of UNION but couldn't get it to work either.
I wouldn't have troubles with cursor but if a solution exists withuout the need to use it, I prefer of course.
Thanks
If you want a flat result you could start with something like this (untested)
select
p.Id,
p.BaseUrl,
ts.*,
tc.*,
ta.*,
pts.OrderNumber as SuitesOrder,
tstc.OrderNumber as CasesOrder,
tcta.OrderNumber as ActionsOrder
from
Projects p
join
ProjectToTestSuites pts
on pts.Projects_Id = p.Id
join
TestSuites ts
on ts.Id = pts.TestSuites_Id
join
TestSuitesToTestCases tstc
on tstc.TestSuites_Id = ts.Id
join
TestCases tc
on tc.Id = tstc.TestCases_Id
join
TestCasesToTestActions tcta
on tcta.TestCases_Id = tc.Id
join
TestActions ta
on ta.Id = tcta.TestActions_Id
where
p.Id = #Id
order by
pts.OrderNumber,
tstc.OrderNumber,
tcta.OrderNumber

JOIN that doesn't exclude all records if one side is null

I have a fairly conventional set of order entry tables divided by:
Orders
OrdersRows
OrdersRowsOptions
The record in OrderRowOptions is not created unless needed. When I create a set of joins like
select * from orders o
inner join OrdersRows r on r.idOrder = o.idOrder
inner join ordersrowsoptions ro on ro.idOrderRow = r.idOrderRow
where r.idProduct = [foo]
My full resultset is blank if no ordersrowsoptions records exist for the given product.
what's the correct syntax to return records even if no records exist at one of the join clauses?
thx
select * from orders o
inner join OrdersRows r on r.idOrder = o.idOrder
left join ordersrowsoptions ro on ro.idOrderRow = r.idOrderRow
where r.idProduct = [foo]
Of course you should not use select * in any query but especially never when doing a join. The repeated fields are just wasting server and network resources.
Since you seem unfamiliar with left joins, you probably also need to understand the concepts in this:
http://wiki.lessthandot.com/index.php/WHERE_conditions_on_a_LEFT_JOIN
LEFT JOIN / RIGHT JOIN.
Edit: yes, the following answer, given earlier, is correct:
select * from orders o
inner join OrdersRows r on r.idOrder = o.idOrder
left join ordersrowsoptions ro on ro.idOrderRow = r.idOrderRow
where r.idProduct = [foo]
LEFT JOIN (or RIGHT JOIN) are probably what you are looking for, depending on which side of the join no rows may appear.
Interesting, do you want to get all orders that have that product in them? The other post is correct that you have to use LEFT or RIGHT OUTER JOINS. But if you want to get entire orders that have that product then you'd need a more complex where clause.