Criteria in WHERE vs. JOIN - tsql

Is there any difference/limitations/considerations that need to be made when adding additional criteria to a JOIN rather than including it in a WHERE clause. Example...
SELECT
*
FROM
TABLE1 t1
INNER JOIN
TABLE2 t2
ON t1.a = t2.a
AND
t1.DATE_TIME < 06/01/2015
versus
SELECT
*
FROM
TABLE1 t1
INNER JOIN
TABLE2 t2
ON t1.a = t2.a
WHERE
t1.DATE_TIME < 06/01/2015

All the optimizers of DBMS threats the two queries in the same way, so there is no difference in performance between them. The most commonly used form is the second one.

The optimize most likely will treat those two the same
But if you get into 4 or more join what can happen is for the query optimizer to go into a loop join and in that case they might process differently
The safer bet is to have the condition in the join (the first)

Related

How to optimize the script

I have the following SQL code:
select t1.*
from t1
join t3 on t3.id = t1.id
join t2 on t1.num = t2.num and coalesce(t1.date,t3.date) >= t2.date
but this script is not optimal at all, probably because of inequality in join.Is there a way to rewrite this, nothing comes to my mind
you can add indexes to columns t1.id, t3.id, t1.num, t2.num,
t1.date, t2.date, t3.date to perform an index scan while query
execution.
Postgresql - Index optimization for Date columns
Alternatively, if exists, also add the Direct join condition between
t2 & t3.
Use specific columns to be returned instead of "*".

Using LIMIT Statement in INNER JOIN (postgreSQL)

I am having trouble using the LIMIT Statement. I would really appreciate your help.
I am trying to INNER JOIN three tables and use the LIMIT statement to only query a few lines because the tables are so huge.
So, basically, this is what I am trying to accomplish:
SELECT *
FROM ((scheme1.table1
INNER JOIN scheme1.table2
ON scheme1.table1.column1 = scheme1.table2.column1 LIMIT 1)
INNER JOIN scheme1.table3
ON scheme1.table1.column1 = scheme1.table3.column1)
LIMIT 1;
I get an syntax error on the LIMIT from the first INNER JOIN. Why? How can I limit the results I get from each of the INNER JOINS. If I only use the second "LIMIT 1" at the bottom, I will query the entire table.
Thanks a lot!
LIMIT can only be applied to queries, not to a table reference. So you need to use a complete SELECT query for table2 in order to be able to use the LIMIT clause:
SELECT *
FROM schema1.table1 as t1
INNER JOIN (
select *
from schema1.table2
order by ???
limit 1
) as t2 ON t1.column1 = t2.column1
INNER JOIN schema1.table3 as t3 on ON t1.column1 = t3.column1
order by ???
limit 1;
Note that LIMIT without an ORDER BY typically makes no sense as results of a query have no inherent sort order. You should think about applying the necessary ORDER BY in the derived table (aka sub-query) and the outer query to get consistent and deterministic results.

Update a Table with value from another Table. PL SQL

I frequently use below JOIN and UPDATE to bring value from one table to another in TSQL.
UPDATE T1
SET T1.Mobile = T2.Mobile
FROM Table1 T1 INNER JOIN Table2 T2 ON T1.ID = T2.ID
Then I found in PL SQL, the syntax to do such update is either by using a MERGE, or a nest statement.
Either way appear to be not as straight forward as the TSQL solution, which makes me wondering if PL SQL developer actually perform such cross table update, or if there is other development principle making such update unnecessary.
One comment I got from a PL SQL developer is that they'd rather create a view like
CREATE VIEW MyView AS
(
SELECT T1.Filed1, T1.Field2, T2.Mobile
FROM Table1 T1 INNER JOIN Table2 T2 ON T1.ID = T2.ID
);
This looks like a viable solution taking the fact that joining does not introduce duplicates into Table1/MyView, or putting a dedup logic above.
One obvious benefit for this is that we can continue refreshing Table2.Mobile, and MyView will always have the updated value.
I am seeking comment on coding principle. :)
You may update using a correlated subquery:
UPDATE Table1 T1
SET T1.Mobile = (SELECT Mobile FROM Table2 T2 WHERE T1.ID = T2.ID);
You should use this query to make an inner join.
Without the where clause it's just a left join, that can consists of empty mobile data from T2
UPDATE Table1 T1
SET T1.Mobile = (SELECT min(Mobile) FROM Table2 T2 WHERE T1.ID = T2.ID)
where T1.ID in (SELECT T2.ID FROM Table2 T2)
;
And with the max(Mobile) you prevent to get an error when you have a
1:n relation for T1:T2.
To speedup the statements you should create indexes on T1.ID and T2.ID

INNER JOIN, LEFT/RIGHT OUTER JOIN

Apology in advance for a long question, but doing this just for the sake of learning:
i'm new to SQL and researching on JOIN for now. I'm getting two different behaviors when using INNER and OUTER JOIN. What I know is, INNER JOIN gives an intersection kind of result while returning only common rows among tables, and (LEFT/RIGHT) OUTER JOIN is outputting what is common and remaining rows in LEFT or RIGHT tables, depending upon LEFT/RIGHT clause respectively.
While working with MS Training Kit and trying to solve this practice: "Practice 2: In this practice, you identify rows that appear in one table but have no matches in another. You are given a task to return the IDs of employees from the HR.Employees table who did not handle orders (in the Sales.Orders table) on February 12, 2008. Write three different solutions using the following: joins, subqueries, and set
operators. To verify the validity of your solution, you are supposed to return employeeIDs: 1, 2, 3, 5, 7, and 9."
I'm successful doing this with subqueries and set operators but with JOIN is returning something not expected. I've written the following query:
USE TSQL2012;
SELECT
E.empid
FROM
HR.Employees AS H
JOIN Sales.Orders AS O
ON H.empid = O.empid
AND O.orderdate = '20080212'
JOIN HR.Employees AS E
ON E.empid <> H.empid
ORDER BY
E.empid
;
I'm expecting results as: 1, 2, 3, 5, 7, and 9 (6 rows)
But what i'm getting is: 1,1,1,2,2,2,3,3,3,4,4,5,5,5,6,6,7,7,7,8,8,9,9,9 (24 rows)
I tried some videos but could not understand this side of INNER/OUTER JOIN. I'll be grateful if someone could help this side of JOIN, why is it so and what should I try to understand while working with JOIN.
you can also use left outer join to get not matching
*** The LEFT JOIN keyword returns all rows from the left table (table1), with the matching rows in the right table (table2). The result is NULL in the right side when there is no match.
SELECT
H.empid
FROM
HR.Employees AS H
LEFT OUTER JOIN Sales.Orders AS O
ON H.empid = O.empid
AND O.orderdate = '20080212'
WHERE O.empid IS NULL
Above script will return emp id who did not handle orders on specify date
here you can see all kind of join
Diagram taken from: http://dsin.wordpress.com/2013/03/16/sql-join-cheat-sheet/
adjust your query to be like this
USE TSQL2012;
SELECT
E.empid
FROM
HR.Employees AS H
JOIN Sales.Orders AS O
ON H.empid = O.empid
where O.orderdate = '2008-02-12' AND O.empid IN null
ORDER BY
E.empid
;
USE TSQL2012;
SELECT
distinct E.empid
FROM
HR.Employees AS H
JOIN Sales.Orders AS O
ON H.empid = O.empid
AND O.orderdate = '20080212'
JOIN HR.Employees AS E
ON E.empid <> H.empid
ORDER BY
E.empid
;
Primary things to always remind yourself when working with SQL JOINs:
INNER JOINs require a match in the join in order for result set rows produced prior to the INNER JOIN to remain in the result set. When no match is found for a row, the row is discarded from the result set.
For a row fed to an INNER JOIN that matches to ONLY one row, only one copy of that row fed to the result set is delivered.
For a row fed to an INNER JOIN that matches to multiple rows, the row will be delivered multiple times, once for each row match from the INNER JOIN table.
OUTER JOINs will not discard rows fed to them in the result set, whether or not the OUTER JOIN results in a match or not.
Just like INNER JOINs, if an OUTER JOIN matches to more than one row, it will increase the number of rows in the result set by duplicating rows equal to the number of rows matched from the OUTER JOIN table.
Ask yourself "if I get NO match on the JOIN, do I want the row discarded or not?" If the answer is NO, use an OUTER JOIN. If the answer is YES, use an INNER JOIN.
If you don't need to reference any of the columns from a JOIN table, don't perform a JOIN at all. Instead, use a WHERE EXISTS, WHERE NOT EXISTS, WHERE IN, WHERE NOT IN, etc. or similar, depending on your database engine in use. Don't rely on the database engine to be smart enough to discard unreferenced columns resulting from JOINs from the result set. Some databases may be smart enough to do that, some not. There's no reason to pull columns into a result set only to not reference them. Doing so increases chance of reduced performance.
Your JOIN of:
JOIN HR.Employees AS E
ON E.empid <> H.empid
...is matching to all Employees rows with a DIFFERENT EMPID to all rows fed to that join. Use of NOT EQUAL on an INNER JOIN is a very rare thing to do or need, especially if the JOIN predicate is testing only ONE condition. That is why your getting duplicate rows in the result set.
On DB2, we could perform an EXCEPTION JOIN to accomplish that using a JOIN alone. Normally, on DB2, I would use a WHERE NOT EXISTS for that. On SQL Server you could do a JOIN to a query where the query set is all employees without orders in SALES.ORDERS on the specified date, but I don't know if that violates the rules of your tutorial.
Naveen posted the solution it appears your tutorial is looking for!

Postgres: left join with order by and limit 1

I have the situation:
Table1 has a list of companies.
Table2 has a list of addresses.
Table3 is a N relationship of Table1 and Table2, with fields 'begin' and 'end'.
Because companies may move over time, a LEFT JOIN among them results in multiple records for each company.
begin and end fields are never NULL. The solution to find the latest address is use a ORDER BY being DESC, and to remove older addresses is a LIMIT 1.
That works fine if the query can bring only 1 company. But I need a query that brings all Table1 records, joined with their current Table2 addresses. Therefore, the removal of outdated data must be done (AFAIK) in LEFT JOIN's ON clause.
Any idea how I can build the clause to not create duplicated Table1 companies and bring latest address?
Use a dependent subquery with max() function in a join condition.
Something like in this example:
SELECT *
FROM companies c
LEFT JOIN relationship r
ON c.company_id = r.company_id
AND r."begin" = (
SELECT max("begin")
FROM relationship r1
WHERE c.company_id = r1.company_id
)
INNER JOIN addresses a
ON a.address_id = r.address_id
demo: http://sqlfiddle.com/#!15/f80c6/2
Since PostgreSQL 9.3 there is JOIN LATERAL (https://www.postgresql.org/docs/9.4/queries-table-expressions.html) that allows to make a sub-query to join, so it solves your issue in an elegant way:
SELECT * FROM companies c
JOIN LATERAL (
SELECT * FROM relationship r
WHERE c.company_id = r.company_id
ORDER BY r."begin" DESC LIMIT 1
) r ON TRUE
JOIN addresses a ON a.address_id = r.address_id
The disadvantage of this approach is the indexes of the tables inside LATERAL do not work outside.
I managed to solve it using Windows Function:
WITH ranked_relationship AS(
SELECT
*
,row_number() OVER (PARTITION BY fk_company ORDER BY dt_start DESC) as dt_last_addr
FROM relationship
)
SELECT
company.*
address.*,
dt_last_addr as dt_relationship
FROM
company
LEFT JOIN ranked_relationship as relationship
ON relationship.fk_company = company.pk_company AND dt_last_addr = 1
LEFT JOIN address ON address.pk_address = relationship.fk_address
row_number() creates an int counter for each record, inside each window based to fk_company. For each window, the record with latest date comes first with rank 1, then dt_last_addr = 1 makes sure the JOIN happens only once for each fk_company, with the record with latest address.
Window Functions are very powerful and few ppl use them, they avoid many complex joins and subqueries!