Self join on CTE - tsql

I have doubts about how a self join on a CTE is executed:
;with data as
(
select col1, col2, …, Weighting
from maTable
-- some joins … (on an EAV model :( )
where ...
)
select * from data t1
left join data t2
on t1.id = t2.id
and t1.Weighting > t2.Weighting
The CTE will return all valid data, the self join will get the most recent values.
Will SQL Server scan maTable two times, or place the result in memory and use it for the join?
The execution plan shows that it scans my index two times...
What is the best way to do this with the least impact on CPU and IO?
Using a
- temp table instead of the CTE is quicker but uses more CPU and IO
- table variable instead of the CTE is slower and uses a lot more CPU and IO
Thanks in advance.

Related

How to optimize the script

I have the following SQL code:
select t1.*
from t1
join t3 on t3.id = t1.id
join t2 on t1.num = t2.num and coalesce(t1.date,t3.date) >= t2.date
but this script is not optimal at all, probably because of the inequality in the join. Is there a way to rewrite it? Nothing comes to mind.
You can add indexes on columns t1.id, t3.id, t1.num, t2.num,
t1.date, t2.date, t3.date so the planner can use index scans during
query execution.
Postgresql - Index optimization for Date columns
Alternatively, if one exists, also add the direct join condition between
t2 and t3.
Return specific columns instead of "*".

Using LIMIT Statement in INNER JOIN (postgreSQL)

I am having trouble using the LIMIT Statement. I would really appreciate your help.
I am trying to INNER JOIN three tables and use the LIMIT statement to only query a few lines because the tables are so huge.
So, basically, this is what I am trying to accomplish:
SELECT *
FROM ((scheme1.table1
INNER JOIN scheme1.table2
ON scheme1.table1.column1 = scheme1.table2.column1 LIMIT 1)
INNER JOIN scheme1.table3
ON scheme1.table1.column1 = scheme1.table3.column1)
LIMIT 1;
I get a syntax error on the LIMIT in the first INNER JOIN. Why? How can I limit the results I get from each of the INNER JOINs? If I only use the second "LIMIT 1" at the bottom, I will query the entire table.
Thanks a lot!
LIMIT can only be applied to queries, not to a table reference. So you need to use a complete SELECT query for table2 in order to be able to use the LIMIT clause:
SELECT *
FROM schema1.table1 as t1
INNER JOIN (
select *
from schema1.table2
order by ???
limit 1
) as t2 ON t1.column1 = t2.column1
INNER JOIN schema1.table3 as t3 ON t1.column1 = t3.column1
order by ???
limit 1;
Note that LIMIT without an ORDER BY typically makes no sense as results of a query have no inherent sort order. You should think about applying the necessary ORDER BY in the derived table (aka sub-query) and the outer query to get consistent and deterministic results.
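To make the derived-table idea concrete, here is a minimal runnable sketch. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, and the column names (label, created, note) and sample rows are made up for illustration. The ORDER BY columns are also an assumption, since the answer leaves them open as ???.

```python
import sqlite3

# In-memory SQLite database standing in for the PostgreSQL schema (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (column1 INTEGER, label TEXT);
    CREATE TABLE table2 (column1 INTEGER, created INTEGER);
    CREATE TABLE table3 (column1 INTEGER, note TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b');
    INSERT INTO table2 VALUES (1, 10), (2, 20), (2, 30);
    INSERT INTO table3 VALUES (1, 'x'), (2, 'y');
""")

# LIMIT is applied to a complete SELECT (the derived table t2),
# not to the bare table reference inside the join.
rows = conn.execute("""
    SELECT t1.column1, t1.label, t2.created, t3.note
    FROM table1 AS t1
    INNER JOIN (
        SELECT column1, created
        FROM table2
        ORDER BY created DESC   -- hypothetical sort column
        LIMIT 1
    ) AS t2 ON t1.column1 = t2.column1
    INNER JOIN table3 AS t3 ON t1.column1 = t3.column1
    ORDER BY t1.column1
    LIMIT 1
""").fetchall()

print(rows)
```

With this sample data the derived table keeps only the table2 row with the highest created value, so only the matching table1 and table3 rows survive the joins.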

Joining on different columns from multiple tables and combining the results; Union method is too costly, need alternative approach - Postgresql

There are two massive tables from which I have to query a subset of interest. Both have multiple common columns, but with a lot of nulls. I want to join with multiple join conditions on these columns and then combine the result sets. The UNION approach costs too much and the database will not allow the query to run. Could someone help me optimize this with a smarter technique?
My query is like
select col1,col2,col3,col4,col5 from tab1 T1
left join tab2 T2 on T1.col1=T2.col1
Union
select col1,col2,col3,col4,col5 from tab1 T1
left join tab2 T2 on T1.col2=T2.col2
Union
select col1,col2,col3,col4,col5 from tab1 T1
left join tab2 T2 on T1.col3=T2.col3
Thanks for your support.

Postgres: left join with order by and limit 1

I have the situation:
Table1 has a list of companies.
Table2 has a list of addresses.
Table3 is a N relationship of Table1 and Table2, with fields 'begin' and 'end'.
Because companies may move over time, a LEFT JOIN among them results in multiple records for each company.
The begin and end fields are never NULL. The way to find the latest address is to use ORDER BY begin DESC, and to remove older addresses, LIMIT 1.
That works fine if the query can bring only 1 company. But I need a query that brings all Table1 records, joined with their current Table2 addresses. Therefore, the removal of outdated data must be done (AFAIK) in LEFT JOIN's ON clause.
Any idea how I can build the clause to not create duplicated Table1 companies and bring latest address?
Use a dependent subquery with max() function in a join condition.
Something like in this example:
SELECT *
FROM companies c
LEFT JOIN relationship r
ON c.company_id = r.company_id
AND r."begin" = (
SELECT max("begin")
FROM relationship r1
WHERE c.company_id = r1.company_id
)
INNER JOIN addresses a
ON a.address_id = r.address_id
demo: http://sqlfiddle.com/#!15/f80c6/2
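For anyone who wants to try the correlated max() approach locally, here is a small sketch using Python's sqlite3 module as a stand-in for PostgreSQL. The name and street columns and the sample rows are invented. One deliberate deviation from the query above: addresses is joined with LEFT JOIN instead of INNER JOIN, so that a company with no address at all still shows up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE companies (company_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE addresses (address_id INTEGER PRIMARY KEY, street TEXT);
    CREATE TABLE relationship (
        company_id INTEGER, address_id INTEGER, "begin" TEXT, "end" TEXT
    );
    INSERT INTO companies VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
    INSERT INTO addresses VALUES (10, 'Old St'), (11, 'New St'), (12, 'Main St');
    INSERT INTO relationship VALUES
        (1, 10, '2010-01-01', '2015-01-01'),
        (1, 11, '2015-01-02', '2099-12-31'),
        (2, 12, '2012-06-01', '2099-12-31');
""")

# The dependent subquery keeps only the relationship row with the
# latest "begin" date for each company (ISO dates sort as text).
latest = conn.execute("""
    SELECT c.name, a.street
    FROM companies c
    LEFT JOIN relationship r
           ON c.company_id = r.company_id
          AND r."begin" = (
              SELECT max("begin")
              FROM relationship r1
              WHERE c.company_id = r1.company_id
          )
    LEFT JOIN addresses a ON a.address_id = r.address_id
    ORDER BY c.company_id
""").fetchall()

print(latest)
```

Each company appears once: Acme with its newer address, Globex with its only address, and Initech with no address.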
Since PostgreSQL 9.3 there is JOIN LATERAL (https://www.postgresql.org/docs/9.4/queries-table-expressions.html), which lets a subquery in the join reference columns of the tables before it, so it solves your issue in an elegant way:
SELECT * FROM companies c
JOIN LATERAL (
SELECT * FROM relationship r
WHERE c.company_id = r.company_id
ORDER BY r."begin" DESC LIMIT 1
) r ON TRUE
JOIN addresses a ON a.address_id = r.address_id
The disadvantage of this approach is that the indexes of the tables inside LATERAL do not work outside.
I managed to solve it using a window function:
WITH ranked_relationship AS(
SELECT
*
,row_number() OVER (PARTITION BY fk_company ORDER BY dt_start DESC) as dt_last_addr
FROM relationship
)
SELECT
company.*,
address.*,
dt_last_addr as dt_relationship
FROM
company
LEFT JOIN ranked_relationship as relationship
ON relationship.fk_company = company.pk_company AND dt_last_addr = 1
LEFT JOIN address ON address.pk_address = relationship.fk_address
row_number() creates an integer counter for each record within each window, partitioned by fk_company. In each window, the record with the latest date comes first with rank 1, so dt_last_addr = 1 makes sure the JOIN matches only once per fk_company, on the record with the latest address.
Window functions are very powerful and few people use them. They avoid many complex joins and subqueries!
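Here is a self-contained sketch of the row_number() approach, run through Python's sqlite3 module rather than PostgreSQL (this assumes the bundled SQLite is 3.25 or newer, which is when it gained window functions). The name and street columns and the sample rows are hypothetical. The table and key names follow the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (pk_company INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE address (pk_address INTEGER PRIMARY KEY, street TEXT);
    CREATE TABLE relationship (fk_company INTEGER, fk_address INTEGER, dt_start TEXT);
    INSERT INTO company VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
    INSERT INTO address VALUES (10, 'Old St'), (11, 'New St'), (12, 'Main St');
    INSERT INTO relationship VALUES
        (1, 10, '2010-01-01'),
        (1, 11, '2015-01-02'),
        (2, 12, '2012-06-01');
""")

# rn = 1 marks the most recent relationship row per company,
# so the LEFT JOIN matches at most one row for each company.
current = conn.execute("""
    WITH ranked_relationship AS (
        SELECT fk_company, fk_address, dt_start,
               row_number() OVER (
                   PARTITION BY fk_company ORDER BY dt_start DESC
               ) AS rn
        FROM relationship
    )
    SELECT company.name, address.street, rr.dt_start
    FROM company
    LEFT JOIN ranked_relationship AS rr
           ON rr.fk_company = company.pk_company AND rr.rn = 1
    LEFT JOIN address ON address.pk_address = rr.fk_address
    ORDER BY company.pk_company
""").fetchall()

print(current)
```

Companies without any relationship row (Initech here) are kept with NULL address columns because both joins are LEFT JOINs.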

Nested select statement in FROM clause? Inner Join statements? or just table name?

I'm building a query that needs data from 5 tables.
I've been told by a DBA in the past that specifying a list of columns vs getting all columns (*) is preferred from some performance/memory aspect.
I've also been told that the database performs a JOIN operation behind the scenes when there's a list of tables in the FROM clause, to create one table (or view).
The existing database has very little data at the moment, as we're at a very initial point. So not sure I can measure the performance hit in practice.
I am not a database pro. I can get what data I need. The dilemma is at what price.
Added: At the moment I'm working with MS SQL Server 2008 R2.
My questions are:
Is there a performance difference and why, between the following:
a. SELECT ... FROM tbl1, tbl2, tbl3 etc for simplicity? (somehow I feel that this might be a performance hit)
b. SELECT ... FROM tbl1 inner join tbl2 on ... inner join tbl3 on ... etc (would this be more explicit to the server and save on performance/memory)?
c. SELECT ... FROM (select x,y,z from tbl1) as t1 inner join ... etc (would this save anything? or is it just extra select statements that create more work for the server and for us)?
Is there yet a better way to do this?
Below are two queries that both get the slice of data that I need. One includes more nested select statements.
I apologize if they are not written in a standard form or helplessly overcomplicated - hopefully you can decipher. I try to keep them organized as much as possible.
Insights would be most appreciated as well.
Thanks for checking this out.
5 tables: devicepool, users, trips, TripTracker, and order
Query 1 (more select statements):
SELECT
username,
base.devid devid,
tripstatus,
stops,
stopnumber,
[time],
[orderstatus],
[destaddress]
FROM
((
( SELECT
username,
devicepool.devid devid,
groupid
FROM
devicepool INNER JOIN users
ON devicepool.userid = users.userid
WHERE devicepool.groupid = 1
)
AS [base]
INNER JOIN
(
SELECT
tripid,
[status] tripstatus,
stops,
devid,
groupid
FROM
trips
)
AS [base2]
ON base.devid = base2.devid AND base2.groupid = base.groupid
INNER JOIN
(
SELECT
stopnumber,
devid,
[time],
MAX([time]) OVER (PARTITION BY devid) latesttime
FROM
TripTracker
)
AS [tracker]
ON tracker.devid = base.devid AND [time] = latesttime)
INNER JOIN
(
SELECT
[status] [orderstatus],
[address] [destaddress],
[tripid],
stopnumber orderstopnumber
FROM [order]
)
AS [orders]
ON orders.orderstopnumber = tracker.stopnumber)
Query 2:
SELECT
username,
base.devid devid,
tripstatus,
stops,
stopnumber,
[time],
[orderstatus],
[destaddress]
FROM
((
( SELECT
username,
devicepool.devid devid,
groupid
FROM
devicepool INNER JOIN users
ON devicepool.userid = users.userid
WHERE devicepool.groupid = 1
)
AS [base]
INNER JOIN
trips
ON base.devid = trips.devid AND trips.groupid = base.groupid
INNER JOIN
(
SELECT
stopnumber,
devid,
[time],
MAX([time]) OVER (PARTITION BY devid) latesttime
FROM
TripTracker
)
AS [tracker]
ON tracker.devid = base.devid AND [time] = latesttime)
INNER JOIN
[order]
ON [order].stopnumber = tracker.stopnumber)
Is there a performance difference and why, between the following: a.
SELECT ... FROM tbl1, tbl2, tbl3 etc for simplicity? (somehow I feel
that this might be a performance hit) b. SELECT ... FROM tbl1 inner
join tbl2 on ... inner join tbl3 on ... etc (would this be more
explicit to the server and save on performance/memory)? c. SELECT ...
FROM (select x,y,z from tbl1) as t1 inner join ... etc (would this
save anything? or is it just extra select statements that create more
work for the server and for us)?
a) and b) should result in the same query plan (although this is db-specific). b) is much preferred for portability and readability over a). c) is a horrible idea that hurts readability and, if anything, will result in worse performance. Let us never speak of it again.
Is there yet a better way to do this?
b) is the standard approach. In general, writing the plainest ANSI SQL will result in the best performance, as it allows the query parser to easily understand what you are trying to do. Trying to outsmart the compiler with tricks may work in a given situation, but does not mean that it will still work when the cardinality or amount of data changes, or the database engine is upgraded. So, avoid doing that unless you are absolutely forced to.
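A quick way to convince yourself that a) and b) are the same query is to run both and compare. The sketch below does that with Python's sqlite3 module. SQLite is only a stand-in here, so its plan output says nothing definite about what SQL Server 2008 R2 will do, and the columns x and y and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl1 (id INTEGER PRIMARY KEY, x TEXT);
    CREATE TABLE tbl2 (id INTEGER PRIMARY KEY, y TEXT);
    INSERT INTO tbl1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO tbl2 VALUES (2, 'B'), (3, 'C'), (4, 'D');
""")

# a) comma-separated table list with the join condition in WHERE
comma_sql = ("SELECT tbl1.id, x, y FROM tbl1, tbl2 "
             "WHERE tbl1.id = tbl2.id ORDER BY tbl1.id")
# b) explicit INNER JOIN with the condition in ON
join_sql = ("SELECT tbl1.id, x, y FROM tbl1 "
            "INNER JOIN tbl2 ON tbl1.id = tbl2.id ORDER BY tbl1.id")

comma_rows = conn.execute(comma_sql).fetchall()
join_rows = conn.execute(join_sql).fetchall()
print(comma_rows == join_rows)

# Print the plan SQLite picks for each form so they can be compared by eye.
for sql in (comma_sql, join_sql):
    for step in conn.execute("EXPLAIN QUERY PLAN " + sql):
        print(step[-1])
```

On SQL Server the equivalent check would be comparing the actual execution plans of the two statements in Management Studio.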