Joining on different columns from multiple tables and combining the results; UNION is too costly, need an alternative approach - PostgreSQL

There are two massive tables from which I have to query a subset of interest. Both share several columns, but those columns contain a lot of NULLs. I want to join on each of these columns in turn and then combine the result sets. Doing this with UNION is too expensive, and the database does not allow the query to run. Could someone suggest a smarter technique to optimize this?
My query looks like this:
select col1,col2,col3,col4,col5 from tab1 T1
left join tab2 T2 on T1.col1=T2.col1
Union
select col1,col2,col3,col4,col5 from tab1 T1
left join tab2 T2 on T1.col2=T2.col2
Union
select col1,col2,col3,col4,col5 from tab1 T1
left join tab2 T2 on T1.col3=T2.col3
Thanks for your support.

Related

Select distinct join in HQL Request

I have a somewhat complicated query to write in HQL and I can't manage to get the result I want.
Here is what I want to do:
I have two entities, t1 and t2, with a OneToMany relation between them.
I want to select some information from both tables in the same query, but here is the issue: I don't want any duplicates of t1.
So basically I want 4 properties, 3 from t1 and 1 from t2, but since there are several t2 records for the same t1 object, I just want the first one from t2 so that no t1 record is duplicated.
Here is what I did:
SELECT DISTINCT(t1.a , t1.b, t1.c, t2.z) FROM t1 LEFT JOIN t2
Obviously, that worked when I did not need any t2 property, but now some (a, b, c) records are duplicated for different values of t2.z.
I can't find any way to do this in HQL (I can't use a SELECT ... LIMIT 1, which would work in SQL).
Does anybody have an idea how to resolve this?
Thanks

Postgres: left join with order by and limit 1

I have the situation:
Table1 has a list of companies.
Table2 has a list of addresses.
Table3 is a N relationship of Table1 and Table2, with fields 'begin' and 'end'.
Because companies may move over time, a LEFT JOIN among them results in multiple records for each company.
The begin and end fields are never NULL. To find the latest address I use ORDER BY begin DESC, and to drop the older addresses I use LIMIT 1.
That works fine if the query returns only one company. But I need a query that returns all Table1 records, joined with their current Table2 addresses. Therefore the outdated rows must be removed (AFAIK) in the LEFT JOIN's ON clause.
Any idea how I can build the clause so that Table1 companies are not duplicated and each one gets its latest address?
Use a dependent subquery with max() function in a join condition.
Something like in this example:
SELECT *
FROM companies c
LEFT JOIN relationship r
ON c.company_id = r.company_id
AND r."begin" = (
SELECT max("begin")
FROM relationship r1
WHERE c.company_id = r1.company_id
)
-- LEFT JOIN (not INNER JOIN), so companies without any address row are still returned
LEFT JOIN addresses a
ON a.address_id = r.address_id
demo: http://sqlfiddle.com/#!15/f80c6/2
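In case the fiddle link goes away, here is a minimal runnable sketch of the same correlated max() idea. It uses Python's built-in sqlite3 module instead of PostgreSQL, and the sample rows (Acme, Globex, Initech and their streets) are invented purely for illustration. Both joins are LEFT JOINs here so that a company with no address still shows up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE companies (company_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE addresses (address_id INTEGER PRIMARY KEY, street TEXT);
    CREATE TABLE relationship (
        company_id INTEGER, address_id INTEGER, "begin" TEXT, "end" TEXT
    );
    -- Hypothetical sample data: Acme moved once, Initech has no address at all.
    INSERT INTO companies VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
    INSERT INTO addresses VALUES
        (10, 'Old Street 1'), (11, 'New Street 2'), (12, 'Main Road 3');
    INSERT INTO relationship VALUES
        (1, 10, '2010-01-01', '2015-01-01'),
        (1, 11, '2015-01-02', '9999-12-31'),
        (2, 12, '2012-06-01', '9999-12-31');
""")

# The ON clause keeps only the relationship row whose "begin" equals the
# latest "begin" for that company, so each company appears exactly once.
rows = conn.execute("""
    SELECT c.name, a.street
    FROM companies c
    LEFT JOIN relationship r
           ON r.company_id = c.company_id
          AND r."begin" = (SELECT max(r1."begin")
                           FROM relationship r1
                           WHERE r1.company_id = c.company_id)
    LEFT JOIN addresses a ON a.address_id = r.address_id
    ORDER BY c.company_id
""").fetchall()

for name, street in rows:
    print(name, street)
```

Note that if two relationship rows for the same company shared the same latest "begin" value, this approach would return both of them.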
Since PostgreSQL 9.3 there is JOIN LATERAL (https://www.postgresql.org/docs/9.4/queries-table-expressions.html), which lets the joined subquery refer to columns of the tables before it, so it solves your issue in an elegant way:
SELECT * FROM companies c
-- LEFT JOIN LATERAL ... ON TRUE keeps companies that have no relationship row
LEFT JOIN LATERAL (
SELECT * FROM relationship r
WHERE c.company_id = r.company_id
ORDER BY r."begin" DESC LIMIT 1
) r ON TRUE
LEFT JOIN addresses a ON a.address_id = r.address_id
The disadvantage of this approach is that the indexes of the tables inside LATERAL do not work outside it.
I managed to solve it using a window function:
WITH ranked_relationship AS(
SELECT
*
,row_number() OVER (PARTITION BY fk_company ORDER BY dt_start DESC) as dt_last_addr
FROM relationship
)
SELECT
company.*,
address.*,
dt_last_addr as dt_relationship
FROM
company
LEFT JOIN ranked_relationship as relationship
ON relationship.fk_company = company.pk_company AND dt_last_addr = 1
LEFT JOIN address ON address.pk_address = relationship.fk_address
row_number() creates an integer counter for each record within its window, where the windows are partitioned by fk_company. In each window the record with the latest date comes first, with rank 1, so dt_last_addr = 1 makes sure the JOIN matches only once per fk_company, on the record with the latest address.
Window functions are very powerful and few people use them; they avoid many complex joins and subqueries!
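To make the counter visible, here is a small sketch that first prints the rank row_number() assigns and then runs the join. It is written against Python's built-in sqlite3 module rather than PostgreSQL (this needs SQLite 3.25 or newer for window functions), and the rows are made-up sample data; the column rn plays the role of dt_last_addr above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (pk_company INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE address (pk_address INTEGER PRIMARY KEY, street TEXT);
    CREATE TABLE relationship (fk_company INTEGER, fk_address INTEGER, dt_start TEXT);
    -- Hypothetical sample data: company 1 has two addresses over time.
    INSERT INTO company VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO address VALUES
        (10, 'Old Street 1'), (11, 'New Street 2'), (12, 'Main Road 3');
    INSERT INTO relationship VALUES
        (1, 10, '2010-01-01'), (1, 11, '2015-01-02'), (2, 12, '2012-06-01');
""")

# Step 1: the counter restarts at 1 for every fk_company, newest date first.
ranked = conn.execute("""
    SELECT fk_company, fk_address,
           row_number() OVER (PARTITION BY fk_company ORDER BY dt_start DESC) AS rn
    FROM relationship
    ORDER BY fk_company, rn
""").fetchall()
print(ranked)  # [(1, 11, 1), (1, 10, 2), (2, 12, 1)]

# Step 2: joining only on rn = 1 leaves exactly one address per company.
latest = conn.execute("""
    WITH ranked_relationship AS (
        SELECT *,
               row_number() OVER (PARTITION BY fk_company ORDER BY dt_start DESC) AS rn
        FROM relationship
    )
    SELECT company.name, address.street
    FROM company
    LEFT JOIN ranked_relationship AS rel
           ON rel.fk_company = company.pk_company AND rel.rn = 1
    LEFT JOIN address ON address.pk_address = rel.fk_address
    ORDER BY company.pk_company
""").fetchall()
print(latest)  # [('Acme', 'New Street 2'), ('Globex', 'Main Road 3')]
```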

How does COUNT(*) behave in an inner join

Take this query:
SELECT c.CustomerID, c.AccountNumber, COUNT(*) AS CountOfOrders,
SUM(s.TotalDue) AS SumOfTotalDue
FROM Sales.Customer AS c
INNER JOIN Sales.SalesOrderheader AS s ON c.CustomerID = s.CustomerID
GROUP BY c.CustomerID, c.AccountNumber
ORDER BY c.CustomerID;
I expected COUNT(*) to count the rows in Sales.Customer but to my surprise it counts the number of rows in the joined table.
Any idea why this is? Also, is there a way to be explicit in specifying which table COUNT() should operate on?
Query Processing Order...
The FROM clause is processed before the SELECT clause -- which is to say -- by the time SELECT comes into play, there is only one (virtual) table it is selecting from -- namely, the individual tables after they're joined (JOIN), filtered (WHERE), etc.
If you just want to count over the one table, then you might try a couple of things...
COUNT(DISTINCT table1.id)
Or turn the table you want to count into a sub-query with count() inside of it
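A quick sketch of both suggestions, using Python's built-in sqlite3 module and two small made-up tables (customer and orders stand in for Sales.Customer and Sales.SalesOrderHeader; the names and rows are assumptions for illustration only):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, account TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, total_due REAL);
    -- Hypothetical sample data: 2 customers, 4 orders in total.
    INSERT INTO customer VALUES (1, 'AW1'), (2, 'AW2');
    INSERT INTO orders VALUES (100, 1, 10.0), (101, 1, 20.0), (102, 1, 30.0), (103, 2, 5.0);
""")

# COUNT(*) counts rows of the joined (virtual) table, one per order;
# COUNT(DISTINCT c.customer_id) counts the customers behind those rows.
totals = conn.execute("""
    SELECT COUNT(*), COUNT(DISTINCT c.customer_id), SUM(o.total_due)
    FROM customer c
    INNER JOIN orders o ON o.customer_id = c.customer_id
""").fetchone()
print(totals)  # (4, 2, 65.0)

# Alternative: do the counting inside a subquery on the one table you care
# about, then join the already-aggregated result.
per_customer = conn.execute("""
    SELECT c.customer_id, o.n_orders
    FROM customer c
    INNER JOIN (SELECT customer_id, COUNT(*) AS n_orders
                FROM orders
                GROUP BY customer_id) o ON o.customer_id = c.customer_id
    ORDER BY c.customer_id
""").fetchall()
print(per_customer)  # [(1, 3), (2, 1)]
```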

Intersect of Select Statements based on a particular column

I have a question about the INTERSECT clause between two SELECT statements in SQL Server 2008.
Select 1 a,b,c ..... INTERSECT Select 2 a,b,c....
Here, the rows of the two queries must match exactly to be returned as common elements.
But I want only column a of both SELECT statements to match.
If column a has the same value in both queries, the entire row should appear in the result set.
Can I do that, and how?
Thanks,
Marcus..
The best thing to do is to look at the queries themselves. Do they need an INTERSECT, or is it possible to use a join instead?
For example:
An INTERSECT looks like this
select columnA
from tableA
INTERSECT
select columnAreference
from tableB
Your result would contain all values that are in BOTH tables, so a join would be more useful:
select columnA
from tableA a
inner join tableB b
on b.columnAReference = a.columnA
If you look at the execution plan you'll see that the INTERSECT is executed as a left semi join, and the inner join, as expected, as an inner join. A left semi join isn't something you can explicitly ask the query optimizer for, but it can be faster: it returns each row from the left table at most once, however many matches it has, whereas a normal join returns one row per match. In this particular case it will be faster.
So INTERSECT isn't a bad thing that should always be replaced with an INNER JOIN construction; sometimes it will even perform better.
However, to give you the best answer, I will need some more details about your query :)
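The difference in results is easy to see on a toy example. The sketch below uses Python's built-in sqlite3 module rather than SQL Server, with invented rows and a hypothetical extra column named payload; it also shows an EXISTS filter, which is one way to write the semi-join behaviour by hand while still returning whole rows matched on a single column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (columnA INTEGER, payload TEXT);
    CREATE TABLE tableB (columnAReference INTEGER);
    -- Hypothetical sample data: value 1 appears twice in tableB.
    INSERT INTO tableA VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO tableB VALUES (1), (1), (2), (4);
""")

# INTERSECT compares whole result rows and removes duplicates.
intersect_rows = conn.execute("""
    SELECT columnA FROM tableA
    INTERSECT
    SELECT columnAReference FROM tableB
    ORDER BY 1
""").fetchall()
print(intersect_rows)  # [(1,), (2,)]

# An inner join returns one row per match, so value 1 comes back twice.
join_rows = conn.execute("""
    SELECT a.columnA, a.payload
    FROM tableA a
    INNER JOIN tableB b ON b.columnAReference = a.columnA
    ORDER BY a.columnA
""").fetchall()
print(join_rows)  # [(1, 'x'), (1, 'x'), (2, 'y')]

# EXISTS matches on columnA only, returns the entire tableA row,
# and yields each row at most once (semi-join semantics).
semi_rows = conn.execute("""
    SELECT a.columnA, a.payload
    FROM tableA a
    WHERE EXISTS (SELECT 1 FROM tableB b WHERE b.columnAReference = a.columnA)
    ORDER BY a.columnA
""").fetchall()
print(semi_rows)  # [(1, 'x'), (2, 'y')]
```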
select * from table1 t1 inner join Table2 t2
on t1.col1=t2.col1

How to select distinct-columns along with one nondistinct-column in DB2?

I need to perform a distinct select on a few columns, one of which is non-distinct. Can I specify which columns make up the distinct group in my SQL statement?
Currently I am doing this:
Select distinct a,b,c,d from TABLE_1 inner join TABLE_2 on TABLE_1.a = TABLE_2.a where TABLE_2.d IS NOT NULL;
The problem is that I get 2 rows from the above SQL because column d holds different values. How can I form a distinct group of columns (a, b and c), ignoring column d, but still have column d in my select clause?
FYI: I am using DB2
Thanks
Sandeep
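-- a exists in both tables, so it must be qualified to avoid an ambiguity error
SELECT table_1.a, b, c, MAX(d)
FROM table_1
INNER JOIN table_2 ON table_1.a = table_2.a
GROUP BY table_1.a, b, c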
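Here is a small sketch comparing the original DISTINCT query with the GROUP BY version. It runs on Python's built-in sqlite3 module instead of DB2, and it assumes (this is not stated in the question) that b and c live in table_1 and d lives in table_2; the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (a INTEGER, b TEXT, c TEXT);
    CREATE TABLE table_2 (a INTEGER, d TEXT);
    -- Hypothetical sample data: a = 1 has two different d values.
    INSERT INTO table_1 VALUES (1, 'b1', 'c1'), (2, 'b2', 'c2');
    INSERT INTO table_2 VALUES (1, 'd-early'), (1, 'd-late'), (2, NULL), (2, 'd-only');
""")

# DISTINCT applies to the whole row, so differing d values keep both rows.
distinct_rows = conn.execute("""
    SELECT DISTINCT table_1.a, b, c, d
    FROM table_1
    INNER JOIN table_2 ON table_1.a = table_2.a
    WHERE table_2.d IS NOT NULL
    ORDER BY table_1.a, d
""").fetchall()
print(distinct_rows)

# GROUP BY (a, b, c) collapses each group to one row; MAX(d) picks one d.
grouped_rows = conn.execute("""
    SELECT table_1.a, b, c, MAX(d)
    FROM table_1
    INNER JOIN table_2 ON table_1.a = table_2.a
    WHERE table_2.d IS NOT NULL
    GROUP BY table_1.a, b, c
    ORDER BY table_1.a
""").fetchall()
print(grouped_rows)  # [(1, 'b1', 'c1', 'd-late'), (2, 'b2', 'c2', 'd-only')]
```

MAX(d) is just one choice of aggregate; MIN(d) would work the same way if the smallest value is the one you want.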
Well, your question, even with refinements, is still pretty general. So, you get a general answer.
Without knowing more about your table structure or your desired results, it may be impossible to give a meaningful answer, but here goes:
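-- a is qualified because it exists in both tables;
-- [some_timestamp_column] is a placeholder for a real column name
SELECT t1.a, b, c, d
FROM table_1 as t1
JOIN table_2 as t2
  ON t2.a = t1.a
 AND t2.[some_timestamp_column] = (SELECT MAX(t3.[some_timestamp_column])
                                   FROM table_2 as t3
                                   WHERE t3.a = t2.a)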
This assumes that table_1 is populated with single rows to retrieve, and that the one-to-many relationship between table_1 and table_2 is created because of different values of d, populated at unique [some_timestamp_column] times. If this is the case, it will get the most-recent table_2 record that matches to table_1.